[PATCH] powerpc/fsl_msi: clean up and document calculation of MSIIR address
Commit 3da34aae (powerpc/fsl: Support unique MSI addresses per PCIe Root
Complex) redefined the meanings of msi->msi_addr_hi and msi->msi_addr_lo
to be an offset rather than an address. To help clarify the code, we make
the following changes:

1) Get rid of msi_addr_hi, which is always zero anyway.

2) Rename msi_addr_lo to ccsr_msiir_offset, to indicate that it's an
   offset relative to the beginning of CCSR.

3) Calculate 64-bit addresses using actual 64-bit math.

4) Document some of the code and assumptions we make.

Signed-off-by: Timur Tabi
---
 arch/powerpc/sysdev/fsl_msi.c | 26 ++
 arch/powerpc/sysdev/fsl_msi.h |  3 +--
 2 files changed, 19 insertions(+), 10 deletions(-)

diff --git a/arch/powerpc/sysdev/fsl_msi.c b/arch/powerpc/sysdev/fsl_msi.c
index 419a772..d824230 100644
--- a/arch/powerpc/sysdev/fsl_msi.c
+++ b/arch/powerpc/sysdev/fsl_msi.c
@@ -30,7 +30,7 @@ LIST_HEAD(msi_head);

 struct fsl_msi_feature {
 	u32 fsl_pic_ip;
-	u32 msiir_offset;
+	u32 msiir_offset; /* offset of MSIIR, relative to start of MSI regs */
 };

 struct fsl_msi_cascade_data {
@@ -120,16 +120,23 @@ static void fsl_teardown_msi_irqs(struct pci_dev *pdev)
 	return;
 }

+/*
+ * Initialize the address and data fields of an MSI message object
+ */
 static void fsl_compose_msi_msg(struct pci_dev *pdev, int hwirq,
 				struct msi_msg *msg,
-				struct fsl_msi *fsl_msi_data)
+				struct fsl_msi *msi_data)
 {
-	struct fsl_msi *msi_data = fsl_msi_data;
 	struct pci_controller *hose = pci_bus_to_host(pdev->bus);
-	u64 base = fsl_pci_immrbar_base(hose);

-	msg->address_lo = msi_data->msi_addr_lo + lower_32_bits(base);
-	msg->address_hi = msi_data->msi_addr_hi + upper_32_bits(base);
+	/*
+	 * The PCI address of MSIIR is equal to the PCI base address of CCSR
+	 * plus the offset of MSIIR.
+	 */
+	u64 addr = fsl_pci_immrbar_base(hose) + msi_data->ccsr_msiir_offset;
+
+	msg->address_hi = upper_32_bits(addr);
+	msg->address_lo = lower_32_bits(addr);

 	msg->data = hwirq;
@@ -359,8 +366,11 @@ static int __devinit fsl_of_msi_probe(struct platform_device *dev)

 	msi->irqhost->host_data = msi;

-	msi->msi_addr_hi = 0x0;
-	msi->msi_addr_lo = features->msiir_offset + (res.start & 0xfffff);
+	/*
+	 * We assume that the 'reg' property of the MSI node contains an
+	 * offset that has five (or fewer) digits, hence the 0xfffff.
+	 */
+	msi->ccsr_msiir_offset = features->msiir_offset + (res.start & 0xfffff);

 	rc = fsl_msi_init_allocator(msi);
 	if (rc) {
diff --git a/arch/powerpc/sysdev/fsl_msi.h b/arch/powerpc/sysdev/fsl_msi.h
index 624580c..eb68c42 100644
--- a/arch/powerpc/sysdev/fsl_msi.h
+++ b/arch/powerpc/sysdev/fsl_msi.h
@@ -28,8 +28,7 @@
 struct fsl_msi {
 	unsigned long cascade_irq;

-	u32 msi_addr_lo;
-	u32 msi_addr_hi;
+	u32 ccsr_msiir_offset; /* offset of MSIIR, relative to start of CCSR */

 	void __iomem *msi_regs;
 	u32 feature;
 	int msi_virqs[NR_MSI_REG];
--
1.7.3.4

___
Linuxppc-dev mailing list
Linuxppc-dev@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/linuxppc-dev
Re: kvm PCI assignment & VFIO ramblings
* Aaron Fabbri (aafab...@cisco.com) wrote:
> On 8/26/11 12:35 PM, "Chris Wright" wrote:
> > * Aaron Fabbri (aafab...@cisco.com) wrote:
> >> Each process will open vfio devices on the fly, and they need to be able to
> >> share IOMMU resources.
> >
> > How do you share IOMMU resources w/ multiple processes, are the processes
> > sharing memory?
>
> Sorry, bad wording. I share IOMMU domains *within* each process.

Ah, got it. Thanks.

> E.g. If one process has 3 devices and another has 10, I can get by with two
> iommu domains (and can share buffers among devices within each process).
>
> If I ever need to share devices across processes, the shared memory case
> might be interesting.
>
> >> So I need the ability to dynamically bring up devices and assign them to a
> >> group. The number of actual devices and how they map to iommu domains is
> >> not known ahead of time. We have a single piece of silicon that can expose
> >> hundreds of pci devices.
> >
> > This does not seem fundamentally different from the KVM use case.
> >
> > We have 2 kinds of groupings.
> >
> > 1) low-level system or topology grouping
> >
> >    Some may have multiple devices in a single group
> >
> >    * the PCIe-PCI bridge example
> >    * the POWER partitionable endpoint
> >
> >    Many will not
> >
> >    * singleton group, e.g. typical x86 PCIe function (majority of
> >      assigned devices)
> >
> >    Not sure it makes sense to have these administratively defined as
> >    opposed to system defined.
> >
> > 2) logical grouping
> >
> >    * multiple low-level groups (singleton or otherwise) attached to same
> >      process, allowing things like single set of io page tables where
> >      applicable.
> >
> >    These are nominally administratively defined. In the KVM case, there
> >    is likely a privileged task (i.e. libvirtd) involved w/ making the
> >    device available to the guest and can do things like group merging.
> >    In your userspace case, perhaps it should be directly exposed.
>
> Yes. In essence, I'd rather not have to run any other admin processes.
> Doing things programmatically, on the fly, from each process, is the
> cleanest model right now.

I don't see an issue w/ this. As long as it cannot add devices to the
system defined groups, it's not a privileged operation. So we still need
the iommu domain concept exposed in some form to logically put groups
into a single iommu domain (if desired). In fact, I believe Alex covered
this in his most recent recap:

...The group fd will provide interfaces for enumerating the devices in
the group, returning a file descriptor for each device in the group (the
"device fd"), binding groups together, and returning a file descriptor
for iommu operations (the "iommu fd").

thanks,
-chris
[PATCH] powerpc/eeh: fix /proc/ppc64/eeh creation
Since commit 188917e183cf9ad0374b571006d0fc6d48a7f447, /proc/ppc64 is a
symlink to /proc/powerpc/. That means that creating /proc/ppc64/eeh will
end up with an inaccessible file that is not listed under /proc/powerpc/
and, therefore, not listed under /proc/ppc64/ either. Creating
/proc/powerpc/eeh fixes that problem and maintains the compatibility
intended with the ppc64 symlink.

Signed-off-by: Thadeu Lima de Souza Cascardo
---
 arch/powerpc/platforms/pseries/eeh.c | 2 +-
 1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/arch/powerpc/platforms/pseries/eeh.c b/arch/powerpc/platforms/pseries/eeh.c
index ada6e07..d42f37d 100644
--- a/arch/powerpc/platforms/pseries/eeh.c
+++ b/arch/powerpc/platforms/pseries/eeh.c
@@ -1338,7 +1338,7 @@ static const struct file_operations proc_eeh_operations = {
 static int __init eeh_init_proc(void)
 {
 	if (machine_is(pseries))
-		proc_create("ppc64/eeh", 0, NULL, &proc_eeh_operations);
+		proc_create("powerpc/eeh", 0, NULL, &proc_eeh_operations);
 	return 0;
 }
 __initcall(eeh_init_proc);
--
1.7.4.4
Re: kvm PCI assignment & VFIO ramblings
* Aaron Fabbri (aafab...@cisco.com) wrote:
> On 8/26/11 7:07 AM, "Alexander Graf" wrote:
> > Forget the KVM case for a moment and think of a user space device driver. I as
> > a user am not root. But I as a user when having access to /dev/vfioX want to
> > be able to access the device and manage it - and only it. The admin of that
> > box needs to set it up properly for me to be able to access it.
> >
> > So having two steps is really the correct way to go:
> >
> > * create VFIO group
> > * use VFIO group
> >
> > because the two are done by completely different users.
>
> This is not the case for my userspace drivers using VFIO today.
>
> Each process will open vfio devices on the fly, and they need to be able to
> share IOMMU resources.

How do you share IOMMU resources w/ multiple processes, are the processes
sharing memory?

> So I need the ability to dynamically bring up devices and assign them to a
> group. The number of actual devices and how they map to iommu domains is
> not known ahead of time. We have a single piece of silicon that can expose
> hundreds of pci devices.

This does not seem fundamentally different from the KVM use case.

We have 2 kinds of groupings.

1) low-level system or topology grouping

   Some may have multiple devices in a single group

   * the PCIe-PCI bridge example
   * the POWER partitionable endpoint

   Many will not

   * singleton group, e.g. typical x86 PCIe function (majority of
     assigned devices)

   Not sure it makes sense to have these administratively defined as
   opposed to system defined.

2) logical grouping

   * multiple low-level groups (singleton or otherwise) attached to same
     process, allowing things like single set of io page tables where
     applicable.

   These are nominally administratively defined. In the KVM case, there
   is likely a privileged task (i.e. libvirtd) involved w/ making the
   device available to the guest and can do things like group merging.
   In your userspace case, perhaps it should be directly exposed.

> In my case, the only administrative task would be to give my processes/users
> access to the vfio groups (which are initially singletons), and the
> application actually opens them and needs the ability to merge groups
> together to conserve IOMMU resources (assuming we're not going to expose
> uiommu).

I agree, we definitely need to expose _some_ way to do this.

thanks,
-chris
Re: kvm PCI assignment & VFIO ramblings
On 8/26/11 12:35 PM, "Chris Wright" wrote:
> * Aaron Fabbri (aafab...@cisco.com) wrote:
>> On 8/26/11 7:07 AM, "Alexander Graf" wrote:
>>> Forget the KVM case for a moment and think of a user space device driver. I
>>> as a user am not root. But I as a user when having access to /dev/vfioX want
>>> to be able to access the device and manage it - and only it. The admin of
>>> that box needs to set it up properly for me to be able to access it.
>>>
>>> So having two steps is really the correct way to go:
>>>
>>> * create VFIO group
>>> * use VFIO group
>>>
>>> because the two are done by completely different users.
>>
>> This is not the case for my userspace drivers using VFIO today.
>>
>> Each process will open vfio devices on the fly, and they need to be able to
>> share IOMMU resources.
>
> How do you share IOMMU resources w/ multiple processes, are the processes
> sharing memory?

Sorry, bad wording. I share IOMMU domains *within* each process.

E.g. If one process has 3 devices and another has 10, I can get by with two
iommu domains (and can share buffers among devices within each process).

If I ever need to share devices across processes, the shared memory case
might be interesting.

>> So I need the ability to dynamically bring up devices and assign them to a
>> group. The number of actual devices and how they map to iommu domains is
>> not known ahead of time. We have a single piece of silicon that can expose
>> hundreds of pci devices.
>
> This does not seem fundamentally different from the KVM use case.
>
> We have 2 kinds of groupings.
>
> 1) low-level system or topology grouping
>
>    Some may have multiple devices in a single group
>
>    * the PCIe-PCI bridge example
>    * the POWER partitionable endpoint
>
>    Many will not
>
>    * singleton group, e.g. typical x86 PCIe function (majority of
>      assigned devices)
>
>    Not sure it makes sense to have these administratively defined as
>    opposed to system defined.
>
> 2) logical grouping
>
>    * multiple low-level groups (singleton or otherwise) attached to same
>      process, allowing things like single set of io page tables where
>      applicable.
>
>    These are nominally administratively defined. In the KVM case, there
>    is likely a privileged task (i.e. libvirtd) involved w/ making the
>    device available to the guest and can do things like group merging.
>    In your userspace case, perhaps it should be directly exposed.

Yes. In essence, I'd rather not have to run any other admin processes.
Doing things programmatically, on the fly, from each process, is the
cleanest model right now.

>> In my case, the only administrative task would be to give my processes/users
>> access to the vfio groups (which are initially singletons), and the
>> application actually opens them and needs the ability to merge groups
>> together to conserve IOMMU resources (assuming we're not going to expose
>> uiommu).
>
> I agree, we definitely need to expose _some_ way to do this.
>
> thanks,
> -chris
Re: Kernel boot up
On 08/26/2011 01:00 AM, smitha.va...@wipro.com wrote:
>
> Thanks scott.
>
> There was an issue with the file system. Now my board is up with the
> linux boot prompt.
> But ping is not working.

You still haven't set your MAC address. U-Boot should be fixing this up
in the device tree.

> The local loopback ping works. My phy chip
> BCM5221 is connected on port A

Your device tree describes a connection on port C. You need to update
the mdio node's reg to point to port A's registers (0x10d00), and
fsl,mdio-pin and fsl,mdc-pin need to be set to the particular pins your
board uses.

-Scott
Re: kvm PCI assignment & VFIO ramblings
On Thu, 2011-08-25 at 20:05 +0200, Joerg Roedel wrote:
> On Thu, Aug 25, 2011 at 11:20:30AM -0600, Alex Williamson wrote:
> > On Thu, 2011-08-25 at 12:54 +0200, Roedel, Joerg wrote:
> > >
> > > We need to solve this differently. ARM is starting to use the iommu-api
> > > too and this definitely does not work there. One possible solution might
> > > be to make the iommu-ops per-bus.
> >
> > That sounds good. Is anyone working on it? It seems like it doesn't
> > hurt to use this in the interim, we may just be watching the wrong bus
> > and never add any sysfs group info.
>
> I'll cook something up for RFC over the weekend.
>
> > > Also the return type should not be long but something that fits into
> > > 32bit on all platforms. Since you use -ENODEV, probably s32 is a good
> > > choice.
> >
> > The convenience of using seg|bus|dev|fn was too much to resist, too bad
> > it requires a full 32bits. Maybe I'll change it to:
> > int iommu_device_group(struct device *dev, unsigned int *group)
>
> If we really expect segment numbers that need the full 16 bit then this
> would be the way to go. Otherwise I would prefer returning the group-id
> directly and partition the group-id space for the error values (s32 with
> negative numbers being errors).

It's unlikely to have segments using the top bit, but it would be broken
for an iommu driver to define its group numbers using pci s:b:d.f if we
don't have that bit available. Ben/David, do PEs have an identifier of a
convenient size? I'd guess any hardware based identifier is going to use
a full unsigned bit width. Thanks,

Alex
Re: kvm PCI assignment & VFIO ramblings
On 8/26/11 7:07 AM, "Alexander Graf" wrote:
>
> Forget the KVM case for a moment and think of a user space device driver. I as
> a user am not root. But I as a user when having access to /dev/vfioX want to
> be able to access the device and manage it - and only it. The admin of that
> box needs to set it up properly for me to be able to access it.
>
> So having two steps is really the correct way to go:
>
> * create VFIO group
> * use VFIO group
>
> because the two are done by completely different users.

This is not the case for my userspace drivers using VFIO today.

Each process will open vfio devices on the fly, and they need to be able to
share IOMMU resources.

So I need the ability to dynamically bring up devices and assign them to a
group. The number of actual devices and how they map to iommu domains is
not known ahead of time. We have a single piece of silicon that can expose
hundreds of pci devices.

In my case, the only administrative task would be to give my processes/users
access to the vfio groups (which are initially singletons), and the
application actually opens them and needs the ability to merge groups
together to conserve IOMMU resources (assuming we're not going to expose
uiommu).

-Aaron
VFIO v2 design plan
I don't think too much has changed since the previous email went out, but
it seems like a good idea to post a summary in case there were suggestions
or objections that I missed.

VFIO v2 will rely on the platform iommu driver reporting grouping
information. Again, a group is a set of devices for which the iommu cannot
differentiate transactions. An example would be a set of devices behind a
PCI-to-PCI bridge. All transactions appear to be from the bridge itself
rather than devices behind the bridge. Platforms are free to have whatever
constraints they need to for what constitutes a group.

I posted a rough draft of a patch to implement that for the base iommu
driver and VT-d, adding an iommu_device_group callback on iommu ops. The
iommu base driver also populates an iommu_group sysfs file for each device
that's part of a group. Members of the same group return the same value
via either the sysfs or iommu_device_group. The value returned is
arbitrary, should not be assumed to be persistent across boots, and is
left to the iommu driver to generate. There are some implementation
details around how to do this without favoring one bus over another, but
the interface should be bus/device type agnostic in the end.

When the vfio module is loaded, character devices will be created for each
group in /dev/vfio/$GROUP. Setting file permissions on these files should
be sufficient for providing a user with complete access to the group.
Opening this device file provides what we'll call the "group fd". The
group fd is restricted to only work with a single mm context. Concurrent
opens will be denied if the opening process mm does not match. The group
fd will provide interfaces for enumerating the devices in the group,
returning a file descriptor for each device in the group (the "device
fd"), binding groups together, and returning a file descriptor for iommu
operations (the "iommu fd"). A group is "viable" when all member devices
of the group are bound to the vfio driver.
Until that point, the group fd only allows enumeration interfaces (ie.
listing of group devices). I'm currently thinking enumeration will be done
by a simple read() on the device file returning a list of dev_name()s.

Once the group is viable, the user may bind the group to another group,
retrieve the iommu fd, or retrieve device fds. Internally, each of these
operations will result in an iommu domain being allocated and all of the
devices attached to the domain.

The purpose of binding groups is to share the iommu domain. Groups making
use of incompatible iommu domains will fail to bind. Groups making use of
different mm's will fail to bind. The vfio driver may reject some bindings
based on domain capabilities, but final veto power is left to the iommu
driver[1].

If a user makes use of a group independently and later wishes to bind it
to another group, all the device fds and the iommu fd must first be
closed. This prevents using a stale iommu fd or accessing devices while
the iommu is being switched. Operations on any group fds of a merged group
are performed globally on the group (ie. enumerating the devices lists all
devices in the merged group, retrieving the iommu fd from any group fd
results in the same fd, device fds from any group can be retrieved from
any group fd[2]).

Groups can be merged and unmerged dynamically. Unmerging a group requires
the device fds for the outgoing group to be closed. The iommu fd will
remain persistent for the remaining merged group.

If a device within a group is unbound from the vfio driver while it's in
use (iommu fd refcnt > 0 || device fd refcnt > 0), vfio will block the
release and send netlink remove requests for every opened device in the
group (or merged group). If the device fds are not released and
subsequently the iommu fd released as well, vfio will kill the user
process after some delay. At some point in the future we may be able to
adapt this to perform a hard removal and revoke all device access without
killing the user.
The iommu fd supports dma mapping and unmapping ioctls as well as some,
yet to be defined and possibly architecture specific, iommu description
interfaces. At some point we may also make use of read/write/mmap on the
iommu fd as a means to set up dma.

The device fds will largely support the existing vfio interface, with
generalizations to make it non-pci specific. We'll access mmio/pio/pci
config using a segmented offset into the device fd. Interrupts will use
the existing mechanisms (eventfds/irqfd). We'll need to add ioctls to
describe the type of device, the number, size, and type of each resource,
and the available interrupts.

We still have outstanding questions with how devices are exposed in qemu,
but I think that's largely a qemu-vfio problem and the vfio kernel
interface described here supports all the interesting ways that devices
can be exposed as individuals or sets. I'm currently working on code
changes to support the above and will post as I complete useful chunks.
Re: kvm PCI assignment & VFIO ramblings
On 26.08.2011, at 10:24, Joerg Roedel wrote:

> On Fri, Aug 26, 2011 at 09:07:35AM -0500, Alexander Graf wrote:
>> On 26.08.2011, at 04:33, Roedel, Joerg wrote:
>>>
>>> The reason is that you mean the usability for the programmer and I mean
>>> it for the actual user of qemu :)
>>
>> No, we mean the actual user of qemu. The reason being that making a
>> device available for any user space application is an administrative
>> task.
>>
>> Forget the KVM case for a moment and think of a user space device
>> driver. I as a user am not root. But I as a user when having access to
>> /dev/vfioX want to be able to access the device and manage it - and
>> only it. The admin of that box needs to set it up properly for me to
>> be able to access it.
>
> Right, and that task is being performed by attaching the device(s) in
> question to the vfio driver. The rights-management happens on the
> /dev/vfio/$group file.

Yup :)

>> So having two steps is really the correct way to go:
>>
>> * create VFIO group
>> * use VFIO group
>>
>> because the two are done by completely different users. It's similar
>> to how tun/tap works in Linux too. Of course nothing keeps you from
>> also creating a group on the fly, but it shouldn't be the only
>> interface available. The persistent setup is definitely more useful.
>
> I see the use-case. But to make it as easy as possible for the end-user
> we can do both.
>
> So the user of (qemu again) does this:
>
> # vfio-ctl attach 00:01.0
> vfio-ctl: attached to group 8
> # vfio-ctl attach 00:02.0
> vfio-ctl: attached to group 16
> $ qemu -device vfio-pci,host=00:01.0 -device vfio,host=00:02.0 ...
>
> which should cover the usecase you prefer. Qemu still creates the
> meta-group that allows the devices to share the same page-table. But what
> should also be possible is:
>
> # qemu -device vfio-pci,host=00:01.0 -device vfio-pci,host=00:02.0
>
> In that case qemu detects that the devices are not yet bound to vfio and
> will do so and also unbinds them afterwards (essentially the developer
> use-case).

I agree. It's the same way tun works today. You can either have qemu
spawn a tun device dynamically or have a preallocated one you use. If you
run qemu as a user (which I always do), I preallocate a tun device and
attach qemu to it.

> Your interface which requires pre-binding of devices into one group by
> the administrator only makes sense if you want to force userspace to
> use certain devices (which do not belong to the same hw-group) only
> together. But I don't see a usecase for defining such constraints (yet).

Agreed. As long as the kernel backend can always figure out the
hw-groups, we're good :)

Alex
Re: kvm PCI assignment & VFIO ramblings
On Fri, Aug 26, 2011 at 09:07:35AM -0500, Alexander Graf wrote:
> On 26.08.2011, at 04:33, Roedel, Joerg wrote:
> >
> > The reason is that you mean the usability for the programmer and I mean
> > it for the actual user of qemu :)
>
> No, we mean the actual user of qemu. The reason being that making a
> device available for any user space application is an administrative
> task.
>
> Forget the KVM case for a moment and think of a user space device
> driver. I as a user am not root. But I as a user when having access to
> /dev/vfioX want to be able to access the device and manage it - and
> only it. The admin of that box needs to set it up properly for me to
> be able to access it.

Right, and that task is being performed by attaching the device(s) in
question to the vfio driver. The rights-management happens on the
/dev/vfio/$group file.

> So having two steps is really the correct way to go:
>
> * create VFIO group
> * use VFIO group
>
> because the two are done by completely different users. It's similar
> to how tun/tap works in Linux too. Of course nothing keeps you from
> also creating a group on the fly, but it shouldn't be the only
> interface available. The persistent setup is definitely more useful.

I see the use-case. But to make it as easy as possible for the end-user
we can do both.

So the user of (qemu again) does this:

# vfio-ctl attach 00:01.0
vfio-ctl: attached to group 8
# vfio-ctl attach 00:02.0
vfio-ctl: attached to group 16
$ qemu -device vfio-pci,host=00:01.0 -device vfio,host=00:02.0 ...

which should cover the usecase you prefer. Qemu still creates the
meta-group that allows the devices to share the same page-table. But what
should also be possible is:

# qemu -device vfio-pci,host=00:01.0 -device vfio-pci,host=00:02.0

In that case qemu detects that the devices are not yet bound to vfio and
will do so and also unbinds them afterwards (essentially the developer
use-case).

Your interface which requires pre-binding of devices into one group by
the administrator only makes sense if you want to force userspace to
use certain devices (which do not belong to the same hw-group) only
together. But I don't see a usecase for defining such constraints (yet).

Joerg
Re: kvm PCI assignment & VFIO ramblings
On 26.08.2011, at 04:33, Roedel, Joerg wrote: > On Fri, Aug 26, 2011 at 12:20:00AM -0400, David Gibson wrote: >> On Wed, Aug 24, 2011 at 01:03:32PM +0200, Roedel, Joerg wrote: >>> On Wed, Aug 24, 2011 at 05:33:00AM -0400, David Gibson wrote: On Wed, Aug 24, 2011 at 11:14:26AM +0200, Roedel, Joerg wrote: >>> > I don't see a reason to make this meta-grouping static. It would harm > flexibility on x86. I think it makes things easier on power but there > are options on that platform to get the dynamic solution too. I think several people are misreading what Ben means by "static". I would prefer to say 'persistent', in that the meta-groups lifetime is not tied to an fd, but they can be freely created, altered and removed during runtime. >>> >>> Even if it can be altered at runtime, from a usability perspective it is >>> certainly the best to handle these groups directly in qemu. Or are there >>> strong reasons to do it somewhere else? >> >> Funny, Ben and I think usability demands it be the other way around. > > The reason is that you mean the usability for the programmer and I mean > it for the actual user of qemu :) No, we mean the actual user of qemu. The reason being that making a device available for any user space application is an administrative task. Forget the KVM case for a moment and think of a user space device driver. I as a user am not root. But I as a user when having access to /dev/vfioX want to be able to access the device and manage it - and only it. The admin of that box needs to set it up properly for me to be able to access it. So having two steps is really the correct way to go: * create VFIO group * use VFIO group because the two are done by completely different users. It's similar to how tun/tap works in Linux too. Of course nothing keeps you from also creating a group on the fly, but it shouldn't be the only interface available. The persistent setup is definitely more useful. 
>
>> If the meta-groups are transient - that is lifetime tied to an fd -
>> then any program that wants to use meta-groups *must* know the
>> interfaces for creating one, whatever they are.
>>
>> But if they're persistent, the admin can use other tools to create the
>> meta-group then just hand it to a program to use, since the interfaces
>> for _using_ a meta-group are identical to those for an atomic group.
>>
>> This doesn't preclude a program from being meta-group aware, and
>> creating its own if it wants to, of course. My guess is that qemu
>> would not want to build its own meta-groups, but libvirt probably
>> would.
>
> Doing it in libvirt makes it really hard for a plain user of qemu to
> assign more than one device to a guest. What I want is that a user just
> types
>
> qemu -device vfio,host=00:01.0 -device vfio,host=00:02.0 ...
>
> and it just works. Qemu creates the meta-groups and they are
> automatically destroyed when qemu exits. That the programs are not aware
> of meta-groups is not a big problem because all software using vfio
> needs still to be written :)
>
> Btw, with this concept the programmer can still decide to not use
> meta-groups and just multiplex the mappings to all open device-fds it
> uses.

What I want to see is:

# vfio-create 00:01.0
/dev/vfio0
# vfio-create -a /dev/vfio0 00:02.0
/dev/vfio0

$ qemu -vfio dev=/dev/vfio0,id=vfio0 -device vfio,vfio=vfio0.0 -device vfio,vfio=vfio0.1

Alex
Re: [PATCH 1/2] [hw-breakpoint] Use generic hw-breakpoint interfaces for new PPC ptrace flags
On Wed, Aug 24, 2011 at 01:59:39PM +1000, David Gibson wrote: > On Tue, Aug 23, 2011 at 02:55:13PM +0530, K.Prasad wrote: > > On Tue, Aug 23, 2011 at 03:08:50PM +1000, David Gibson wrote: > > > On Fri, Aug 19, 2011 at 01:21:36PM +0530, K.Prasad wrote: > > > > PPC_PTRACE_GETHWDBGINFO, PPC_PTRACE_SETHWDEBUG and > > > > PPC_PTRACE_DELHWDEBUG are > > > > PowerPC specific ptrace flags that use the watchpoint register. While > > > > they are > > > > targeted primarily towards BookE users, user-space applications such as > > > > GDB > > > > have started using them for BookS too. > > > > > > > > This patch enables the use of generic hardware breakpoint interfaces > > > > for these > > > > new flags. The version number of the associated data structures > > > > "ppc_hw_breakpoint" and "ppc_debug_info" is incremented to denote new > > > > semantics. > > > > > > So, the structure itself doesn't seem to have been extended. I don't > > > understand what the semantic difference is - your patch comment needs > > > to explain this clearly. > > > > > > > We had a request to extend the structure but thought it was dangerous to > > do so. For instance if the user-space used version1 of the structure, > > while kernel did a copy_to_user() pertaining to version2, then we'd run > > into problems. Unfortunately the ptrace flags weren't designed to accept > > a version number as input from the user through the > > PPC_PTRACE_GETHWDBGINFO flag (which would have solved this issue). > > I still don't follow you. > Two things here. One, the change of semantics warranted an increment of the version number. The new semantics accepts PPC_BREAKPOINT_MODE_RANGE_INCLUSIVE on BookS, while the old version number did not. I've added a small comment in the code to this effect. Two, regarding changes in the "ppc_hw_breakpoint" and "ppc_debug_info" structures - we would like to add more members to it if we can (GDB has a pending request to add more members to it). 
However, the problem foreseen is that there could be a mismatch between the versions of the structure used by user space vs. kernel space. That is, if a new version of the structure, known to the kernel, had an extra member while user space still had the old version, it becomes dangerous: __copy_to_user() would overflow the buffer in user space. This could have been avoided if PPC_PTRACE_GETHWDBGINFO had originally been designed to accept a version number (and provide the corresponding "struct ppc_debug_info") rather than send a populated "ppc_debug_info" structure along with the version number.

> > I'll add a comment w.r.t. the change in semantics - such as the ability to accept 'range' breakpoints in BookS.
> >
> > > > Apart from the usual benefits of using generic hw-breakpoint interfaces, these changes allow debuggers (such as GDB) to use a common set of ptrace flags for their watchpoint needs and allow more precise breakpoint specification (the length of the variable can be specified).
> > >
> > > What is the mechanism for implementing the range breakpoint on book3s?
> >
> > The hw-breakpoint interface accepts length as an argument in BookS (any value <= 8 bytes) and filters out, inside the hw_breakpoint_handler function, extraneous interrupts arising from accesses outside the requested range.
> >
> > We put that ability to use here.
>
> Ah, so in hardware the breakpoints are always 8 bytes long, but you filter out false hits on a shorter range? Of course, the utility of range breakpoints is questionable when length <= 8, but the start must be aligned on an 8-byte boundary.
Yes, we ensure that through:

	+	attr.bp_addr = (unsigned long)bp_info->addr & ~HW_BREAKPOINT_ALIGN;

> [snip]
> > > > 	if ((unsigned long)bp_info->addr >= TASK_SIZE)
> > > > 		return -EIO;
> > > >
> > > > @@ -1398,15 +1400,86 @@ static long ppc_set_hwdebug(struct task_struct *child,
> > > > 		dabr |= DABR_DATA_READ;
> > > > 	if (bp_info->trigger_type & PPC_BREAKPOINT_TRIGGER_WRITE)
> > > > 		dabr |= DABR_DATA_WRITE;
> > > > +#ifdef CONFIG_HAVE_HW_BREAKPOINT
> > > > +	if (bp_info->version == 1)
> > > > +		goto version_one;
> > >
> > > There are several legitimate uses of goto in the kernel, but this is definitely not one of them. You're essentially using it to put the old and new versions of the same function in one block. Nasty.
> >
> > Maybe it's the label that's causing bother here. It might look elegant if it was called something like exit_* or error_* :-)
> >
> > The goto here helps reduce code, and is similar to the error exits we use everywhere.
>
> Rubbish, it is not an exception exit at all, it is two separate code paths for the different versions which would be much clearer as two different functions.

I've re-written this part of the code to avoid a goto statement.
Re: kvm PCI assignment & VFIO ramblings
On Fri, Aug 26, 2011 at 12:20:00AM -0400, David Gibson wrote:
> On Wed, Aug 24, 2011 at 01:03:32PM +0200, Roedel, Joerg wrote:
> > On Wed, Aug 24, 2011 at 05:33:00AM -0400, David Gibson wrote:
> > > On Wed, Aug 24, 2011 at 11:14:26AM +0200, Roedel, Joerg wrote:
> > > > I don't see a reason to make this meta-grouping static. It would harm flexibility on x86. I think it makes things easier on power, but there are options on that platform to get the dynamic solution too.
> > >
> > > I think several people are misreading what Ben means by "static". I would prefer to say 'persistent', in that the meta-group's lifetime is not tied to an fd, but they can be freely created, altered and removed during runtime.
> >
> > Even if it can be altered at runtime, from a usability perspective it is certainly best to handle these groups directly in qemu. Or are there strong reasons to do it somewhere else?
>
> Funny, Ben and I think usability demands it be the other way around.

The reason is that you mean usability for the programmer and I mean it for the actual user of qemu :)

> If the meta-groups are transient - that is, lifetime tied to an fd - then any program that wants to use meta-groups *must* know the interfaces for creating one, whatever they are.
>
> But if they're persistent, the admin can use other tools to create the meta-group then just hand it to a program to use, since the interfaces for _using_ a meta-group are identical to those for an atomic group.
>
> This doesn't preclude a program from being meta-group aware, and creating its own if it wants to, of course. My guess is that qemu would not want to build its own meta-groups, but libvirt probably would.

Doing it in libvirt makes it really hard for a plain user of qemu to assign more than one device to a guest. What I want is that a user just types

	qemu -device vfio,host=00:01.0 -device vfio,host=00:02.0 ...

and it just works.
Qemu creates the meta-groups and they are automatically destroyed when qemu exits. That the programs are not aware of meta-groups is not a big problem, because all software using vfio still needs to be written :)

Btw, with this concept the programmer can still decide not to use meta-groups and just multiplex the mappings to all open device-fds it uses.

Joerg

-- AMD Operating System Research Center Advanced Micro Devices GmbH Einsteinring 24 85609 Dornach General Managers: Alberto Bozzo, Andrew Bowd Registration: Dornach, Landkr. Muenchen; Registerger. Muenchen, HRB Nr. 43632 ___ Linuxppc-dev mailing list Linuxppc-dev@lists.ozlabs.org https://lists.ozlabs.org/listinfo/linuxppc-dev
Re: kvm PCI assignment & VFIO ramblings
On Fri, Aug 26, 2011 at 12:24:23AM -0400, David Gibson wrote:
> On Thu, Aug 25, 2011 at 08:25:45AM -0500, Alexander Graf wrote:
> > On 25.08.2011, at 07:31, Roedel, Joerg wrote:
> > > For mmio we could stop the guest and replace the mmio region with a region that is filled with 0xff, no?
> >
> > Sure, but that happens in user space. The question is how does kernel space enforce an MMIO region to not be mapped after the hotplug event occurred? Keep in mind that user space is pretty much untrusted here - it doesn't have to be QEMU. It could just as well be a generic user space driver. And that can just ignore hotplug events.
>
> We're saying you hard-yank the mapping from the userspace process. That is, you invalidate all its PTEs mapping the MMIO space, and don't let it fault them back in.
>
> As I see it there are two options: (a) make subsequent accesses from userspace or the guest result in either a SIGBUS that userspace must either deal with or die, or (b) replace the mapping with a dummy RO mapping containing 0xff, with any trapped writes emulated as nops.

The biggest problem with this approach is that it has to happen in the context of the given process. Linux can't really modify an mm which belongs to another context in a safe way.

The more I think about this, the more I come to the conclusion that it would be best to just kill the process accessing the device if it is manually de-assigned from vfio. It should be a non-standard path anyway, so it doesn't make a lot of sense to implement complicated handling semantics for it, no?

Joerg
[PATCH 3/3 v2] mmc: Use mmc_delay() instead of mdelay() for time delay
The mmc_delay() is a wrapper function for mdelay() and msleep():

 o mdelay() -- blocks the whole system while busy-waiting.
 o msleep() -- suspends the currently running task so the CPU can process other tasks; it is non-blocking with respect to the rest of the system.

When the desired delay is longer than one timer-interrupt period, just use msleep(). Change mdelay() to mmc_delay() to avoid chewing up the CPU on busy-waits.

Signed-off-by: Shengzhou Liu
Signed-off-by: Chunhe Lan
Cc: Chris Ball
---
 drivers/mmc/host/sdhci.c | 10 +-
 1 files changed, 5 insertions(+), 5 deletions(-)

diff --git a/drivers/mmc/host/sdhci.c b/drivers/mmc/host/sdhci.c
index 0e02cc1..b0cf79f 100644
--- a/drivers/mmc/host/sdhci.c
+++ b/drivers/mmc/host/sdhci.c
@@ -186,7 +186,7 @@ static void sdhci_reset(struct sdhci_host *host, u8 mask)
 			return;
 		}
 		timeout--;
-		mdelay(1);
+		mmc_delay(1);
 	}
 
 	if (host->ops->platform_reset_exit)
@@ -957,7 +957,7 @@ static void sdhci_send_command(struct sdhci_host *host, struct mmc_command *cmd)
 			return;
 		}
 		timeout--;
-		mdelay(1);
+		mmc_delay(1);
 	}
 
 	mod_timer(&host->timer, jiffies + 10 * HZ);
@@ -1127,7 +1127,7 @@ static void sdhci_set_clock(struct sdhci_host *host, unsigned int clock)
 			return;
 		}
 		timeout--;
-		mdelay(1);
+		mmc_delay(1);
 	}
 
 	clk |= SDHCI_CLOCK_CARD_EN;
@@ -1192,7 +1192,7 @@ static void sdhci_set_power(struct sdhci_host *host, unsigned short power)
 	 * can apply clock after applying power
 	 */
 	if (host->quirks & SDHCI_QUIRK_DELAY_AFTER_POWER)
-		mdelay(10);
+		mmc_delay(10);
 }
 
 /*\
@@ -1712,7 +1712,7 @@ static int sdhci_execute_tuning(struct mmc_host *mmc)
 		ctrl = sdhci_readw(host, SDHCI_HOST_CONTROL2);
 		tuning_loop_counter--;
 		timeout--;
-		mdelay(1);
+		mmc_delay(1);
 	} while (ctrl & SDHCI_CTRL_EXEC_TUNING);
 
 	/*
-- 
1.7.5.1
[PATCH 2/3 v2] mmc: Use mmc_delay() instead of mdelay() for time delay
The mmc_delay() is a wrapper function for mdelay() and msleep():

 o mdelay() -- blocks the whole system while busy-waiting.
 o msleep() -- suspends the currently running task so the CPU can process other tasks; it is non-blocking with respect to the rest of the system.

When the desired delay is longer than one timer-interrupt period, just use msleep(). Change mdelay() to mmc_delay() to avoid chewing up the CPU on busy-waits.

Signed-off-by: Shengzhou Liu
Signed-off-by: Chunhe Lan
Cc: Chris Ball
---
 drivers/mmc/host/sdhci-esdhc.h | 4 ++--
 1 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/drivers/mmc/host/sdhci-esdhc.h b/drivers/mmc/host/sdhci-esdhc.h
index c3b08f1..b8c0ab1 100644
--- a/drivers/mmc/host/sdhci-esdhc.h
+++ b/drivers/mmc/host/sdhci-esdhc.h
@@ -1,7 +1,7 @@
 /*
  * Freescale eSDHC controller driver generics for OF and pltfm.
  *
- * Copyright (c) 2007 Freescale Semiconductor, Inc.
+ * Copyright (c) 2007, 2011 Freescale Semiconductor, Inc.
  * Copyright (c) 2009 MontaVista Software, Inc.
  * Copyright (c) 2010 Pengutronix e.K.
  * Author: Wolfram Sang
@@ -73,7 +73,7 @@ static inline void esdhc_set_clock(struct sdhci_host *host, unsigned int clock)
 		| (div << ESDHC_DIVIDER_SHIFT)
 		| (pre_div << ESDHC_PREDIV_SHIFT));
 	sdhci_writel(host, temp, ESDHC_SYSTEM_CONTROL);
-	mdelay(100);
+	mmc_delay(100);
 out:
 	host->clock = clock;
 }
-- 
1.7.5.1
[PATCH 1/3 v2] mmc: Move mmc_delay() to include/linux/mmc/core.h
Move mmc_delay() from drivers/mmc/core/core.h to include/linux/mmc/core.h, so that other code can include it by its absolute path instead of the relative "../core/core.h".

Signed-off-by: Chunhe Lan
Cc: Chris Ball
---
 drivers/mmc/core/core.h  | 12
 include/linux/mmc/core.h | 11 +++
 2 files changed, 11 insertions(+), 12 deletions(-)

diff --git a/drivers/mmc/core/core.h b/drivers/mmc/core/core.h
index d9411ed..58c3f10 100644
--- a/drivers/mmc/core/core.h
+++ b/drivers/mmc/core/core.h
@@ -11,8 +11,6 @@
 #ifndef _MMC_CORE_CORE_H
 #define _MMC_CORE_CORE_H
 
-#include
-
 #define MMC_CMD_RETRIES	3
 
 struct mmc_bus_ops {
@@ -44,16 +42,6 @@ int mmc_set_signal_voltage(struct mmc_host *host, int signal_voltage,
 void mmc_set_timing(struct mmc_host *host, unsigned int timing);
 void mmc_set_driver_type(struct mmc_host *host, unsigned int drv_type);
 
-static inline void mmc_delay(unsigned int ms)
-{
-	if (ms < 1000 / HZ) {
-		cond_resched();
-		mdelay(ms);
-	} else {
-		msleep(ms);
-	}
-}
-
 void mmc_rescan(struct work_struct *work);
 void mmc_start_host(struct mmc_host *host);
 void mmc_stop_host(struct mmc_host *host);
diff --git a/include/linux/mmc/core.h b/include/linux/mmc/core.h
index b8b1b7a..7bc2798 100644
--- a/include/linux/mmc/core.h
+++ b/include/linux/mmc/core.h
@@ -10,6 +10,7 @@
 #include
 #include
+#include
 
 struct request;
 struct mmc_data;
@@ -182,6 +183,16 @@ static inline void mmc_claim_host(struct mmc_host *host)
 	__mmc_claim_host(host, NULL);
 }
 
+static inline void mmc_delay(unsigned int ms)
+{
+	if (ms < 1000 / HZ) {
+		cond_resched();
+		mdelay(ms);
+	} else {
+		msleep(ms);
+	}
+}
+
 extern u32 mmc_vddrange_to_ocrmask(int vdd_min, int vdd_max);
 
 #endif /* LINUX_MMC_CORE_H */
-- 
1.7.5.1
RE: [PATCH 2/2 v2] eSDHC: Fix errors when booting kernel with fsl esdhc
> -----Original Message-----
> From: Anton Vorontsov [mailto:cbouatmai...@gmail.com]
> Sent: Friday, August 12, 2011 18:05 PM
> To: Zang Roy-R61911
> Cc: linux-...@vger.kernel.org; linuxppc-dev@lists.ozlabs.org; akpm@linux-foundation.org; Xu Lei-B33228; Kumar Gala; Wood Scott-B07421
> Subject: Re: [PATCH 2/2 v2] eSDHC: Fix errors when booting kernel with fsl esdhc
>
> Hello,
>
> On Fri, Aug 12, 2011 at 09:44:26AM +, Zang Roy-R61911 wrote:
> [...]
> > > We try to not pollute the generic sdhci.c driver with chip-specific quirks.
> > >
> > > Maybe you can do the fixups via IO accessors? Or by introducing some additional sdhci op?
> >
> > Anton, thanks for the comment. As we discussed, the original code uses 8-bit byte operations, while in fact, on some powerpc platforms, 32-bit operations are needed.
> > Should it be possible to fix this by adding a wrapper in the IO accessors, or by introducing an additional sdhci op?
>
> I would do it in the IO accessors.

I may have missed your email - I never saw your patch doing it in the IO accessors.

Thanks.
Roy
Re: [PATCH 2/2] mmc: Use mmc_delay() instead of mdelay() for time delay
On Thu, 25 Aug 2011 18:23:35 +0800, Kyungmin Park wrote:
> On Thu, Aug 25, 2011 at 5:54 PM, Chunhe Lan wrote:
> > The mmc_delay() is a wrapper function for mdelay() and msleep().
> >
> > o mdelay() -- blocks the system when busy-waiting.
> > o msleep() -- suspends the currently running task so the CPU can process other tasks; it is non-blocking with respect to the whole system.
> >
> > When the desired delay is longer than one timer-interrupt period, just use msleep(). Change mdelay() to mmc_delay() to avoid chewing up the CPU on busy-waits.
> >
> > Signed-off-by: Shengzhou Liu
> > Signed-off-by: Chunhe Lan
> > Cc: Chris Ball
> > ---
> >  drivers/mmc/host/sdhci.c | 11 ++-
> >  1 files changed, 6 insertions(+), 5 deletions(-)
> >
> > diff --git a/drivers/mmc/host/sdhci.c b/drivers/mmc/host/sdhci.c
> > index 0e02cc1..0cb5dc1 100644
> > --- a/drivers/mmc/host/sdhci.c
> > +++ b/drivers/mmc/host/sdhci.c
> > @@ -27,6 +27,7 @@
> >  #include
> >  #include "sdhci.h"
> > +#include "../core/core.h"
>
> Wouldn't it be better to move mmc_delay() to "include/linux/mmc/core.h" and include that? I don't think it's proper include syntax to use a relative path.

Yes, your suggestion is very good. I will move mmc_delay() to "include/linux/mmc/core.h".

Thanks.
-Jack Lan

> Thank you,
> Kyungmin Park
>
> >  #define DRIVER_NAME "sdhci"