Re: kvm: irqchip: fix memory leak in -stable
On Wed, Nov 04, 2015 at 02:11:54PM +0100, William Dauchy wrote: > Hello stable release team, > > The commit ba60c41 kvm: irqchip: fix memory leak > is fixing commit e73f61e kvm: irqchip: Break up high order allocations of > kvm_irq_routing_table > > I believe commit ba60c41 kvm: irqchip: fix memory leak > is a good candidate for -stable. I also got an agreement from Paolo. Now queued up. -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [request for stable inclusion][Patch 3.4.y] x86, kvm: use kernel_fpu_begin/end() in kvm_load/put_guest_fpu()
On Fri, Jul 31, 2015 at 11:03:48AM +0800, Rui Xiang wrote: ping On 2015/7/29 19:03, Rui Xiang wrote: 2 days ago? For a very old and slowly updated kernel release? Hah, be happy if you get a response in a month... -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH] MIPS: KVM: do not sign extend on unsigned MMIO load
On Mon, Jun 08, 2015 at 09:33:50AM +0100, James Hogan wrote: Hi stable folk, On 08/05/15 15:16, James Hogan wrote: On 07/05/15 13:47, Nicholas Mc Guire wrote: Fix possible unintended sign extension in unsigned MMIO loads by casting to uint16_t in the case of mmio_needed != 2. Signed-off-by: Nicholas Mc Guire hof...@osadl.org Looks good to me. I wrote an MMIO test to reproduce the issue, and this fixes it. Reviewed-by: James Hogan james.ho...@imgtec.com Tested-by: James Hogan james.ho...@imgtec.com It looks suitable for stable too (3.10+). This has reached mainline, commit ed9244e6c534612d2b5ae47feab2f55a0d4b4ced Please could it be added to stable (3.10+). It does not apply to 3.10 or 3.14-stable, so please provide a backport if you want it there. thanks, greg k-h -- To unsubscribe from this list: send the line unsubscribe kvm in
Re: [PATCH] KVM: VMX: Preserve host CR4.MCE value while in guest mode.
On Thu, Apr 16, 2015 at 11:58:05AM -0700, Ben Serebrin wrote: The host's decision to enable machine check exceptions should remain in force during non-root mode. KVM was writing 0 to cr4 on VCPU reset and passed a slightly-modified 0 to the vmcs.guest_cr4 value. Tested: Built. On earlier version, tested by injecting machine check while a guest is spinning. Before the change, if guest CR4.MCE==0, then the machine check is escalated to Catastrophic Error (CATERR) and the machine dies. If guest CR4.MCE==1, then the machine check causes VMEXIT and is handled normally by host Linux. After the change, injecting a machine check causes normal Linux machine check handling. Signed-off-by: Ben Serebrin sereb...@google.com --- arch/x86/kvm/vmx.c | 12 ++-- 1 file changed, 10 insertions(+), 2 deletions(-) formletter This is not the correct way to submit patches for inclusion in the stable kernel tree. Please read Documentation/stable_kernel_rules.txt for how to do this properly. /formletter -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH for stable] x86/spinlocks/paravirt: Fix memory corruption on unlock
On Tue, Feb 24, 2015 at 03:47:37PM +0100, Ingo Molnar wrote: * Greg KH gre...@linuxfoundation.org wrote: On Tue, Feb 24, 2015 at 02:54:59PM +0530, Raghavendra K T wrote: Paravirt spinlock clears slowpath flag after doing unlock. As explained by Linus currently it does: prev = *lock; add_smp(lock-tickets.head, TICKET_LOCK_INC); /* add_smp() is a full mb() */ if (unlikely(lock-tickets.tail TICKET_SLOWPATH_FLAG)) __ticket_unlock_slowpath(lock, prev); which is *exactly* the kind of things you cannot do with spinlocks, because after you've done the add_smp() and released the spinlock for the fast-path, you can't access the spinlock any more. Exactly because a fast-path lock might come in, and release the whole data structure. Linus suggested that we should not do any writes to lock after unlock(), and we can move slowpath clearing to fastpath lock. So this patch implements the fix with: 1. Moving slowpath flag to head (Oleg): Unlocked locks don't care about the slowpath flag; therefore we can keep it set after the last unlock, and clear it again on the first (try)lock. -- this removes the write after unlock. note that keeping slowpath flag would result in unnecessary kicks. By moving the slowpath flag from the tail to the head ticket we also avoid the need to access both the head and tail tickets on unlock. 2. use xadd to avoid read/write after unlock that checks the need for unlock_kick (Linus): We further avoid the need for a read-after-release by using xadd; the prev head value will include the slowpath flag and indicate if we need to do PV kicking of suspended spinners -- on modern chips xadd isn't (much) more expensive than an add + load. Result: setup: 16core (32 cpu +ht sandy bridge 8GB 16vcpu guest) benchmark overcommit %improve kernbench 1x -0.13 kernbench 2x0.02 dbench 1x -1.77 dbench 2x -0.63 [Jeremy: hinted missing TICKET_LOCK_INC for kick] [Oleg: Moving slowpath flag to head, ticket_equals idea] [PeterZ: Detailed changelog] Reported-by: Sasha Levin sasha.le...@oracle.com Suggested-by: Linus Torvalds torva...@linux-foundation.org Signed-off-by: Raghavendra K T raghavendra...@linux.vnet.ibm.com Reviewed-by: Oleg Nesterov o...@redhat.com Acked-by: David Vrabel david.vra...@citrix.com --- arch/x86/include/asm/spinlock.h | 94 - arch/x86/kernel/kvm.c | 7 ++- arch/x86/xen/spinlock.c | 7 ++- 3 files changed, 58 insertions(+), 50 deletions(-) Changes for stable: - Don't replace the ACCESS_ONCE to READ_ONCE which would cause horraneous Compiler warnings (Linus, David Vbriel, PeterZ, Ingo) What is the git commit id of this in Linus's tree? What stable tree(s) do you want this applied to? It's: d6abfdb20223 x86/spinlocks/paravirt: Fix memory corruption on unlock You'll also need this fix from Linus to avoid (harmless) build warnings: dd36929720f4 kernel: make READ_ONCE() valid on const arguments Great. But what stable kernel trees should it be backported to? Just 3.19? Or anything older? thanks, greg k-h -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH for stable] x86/spinlocks/paravirt: Fix memory corruption on unlock
On Tue, Feb 24, 2015 at 02:54:59PM +0530, Raghavendra K T wrote: Paravirt spinlock clears slowpath flag after doing unlock. As explained by Linus currently it does: prev = *lock; add_smp(lock-tickets.head, TICKET_LOCK_INC); /* add_smp() is a full mb() */ if (unlikely(lock-tickets.tail TICKET_SLOWPATH_FLAG)) __ticket_unlock_slowpath(lock, prev); which is *exactly* the kind of things you cannot do with spinlocks, because after you've done the add_smp() and released the spinlock for the fast-path, you can't access the spinlock any more. Exactly because a fast-path lock might come in, and release the whole data structure. Linus suggested that we should not do any writes to lock after unlock(), and we can move slowpath clearing to fastpath lock. So this patch implements the fix with: 1. Moving slowpath flag to head (Oleg): Unlocked locks don't care about the slowpath flag; therefore we can keep it set after the last unlock, and clear it again on the first (try)lock. -- this removes the write after unlock. note that keeping slowpath flag would result in unnecessary kicks. By moving the slowpath flag from the tail to the head ticket we also avoid the need to access both the head and tail tickets on unlock. 2. use xadd to avoid read/write after unlock that checks the need for unlock_kick (Linus): We further avoid the need for a read-after-release by using xadd; the prev head value will include the slowpath flag and indicate if we need to do PV kicking of suspended spinners -- on modern chips xadd isn't (much) more expensive than an add + load. Result: setup: 16core (32 cpu +ht sandy bridge 8GB 16vcpu guest) benchmark overcommit %improve kernbench 1x -0.13 kernbench 2x0.02 dbench 1x -1.77 dbench 2x -0.63 [Jeremy: hinted missing TICKET_LOCK_INC for kick] [Oleg: Moving slowpath flag to head, ticket_equals idea] [PeterZ: Detailed changelog] Reported-by: Sasha Levin sasha.le...@oracle.com Suggested-by: Linus Torvalds torva...@linux-foundation.org Signed-off-by: Raghavendra K T raghavendra...@linux.vnet.ibm.com Reviewed-by: Oleg Nesterov o...@redhat.com Acked-by: David Vrabel david.vra...@citrix.com --- arch/x86/include/asm/spinlock.h | 94 - arch/x86/kernel/kvm.c | 7 ++- arch/x86/xen/spinlock.c | 7 ++- 3 files changed, 58 insertions(+), 50 deletions(-) Changes for stable: - Don't replace the ACCESS_ONCE to READ_ONCE which would cause horraneous Compiler warnings (Linus, David Vbriel, PeterZ, Ingo) What is the git commit id of this in Linus's tree? What stable tree(s) do you want this applied to? thanks, greg k-h -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH for stable] x86/spinlocks/paravirt: Fix memory corruption on unlock
On Tue, Feb 24, 2015 at 11:49:13PM +0530, Raghavendra K T wrote: On 02/24/2015 08:17 PM, Ingo Molnar wrote: * Greg KH gre...@linuxfoundation.org wrote: On Tue, Feb 24, 2015 at 02:54:59PM +0530, Raghavendra K T wrote: Paravirt spinlock clears slowpath flag after doing unlock. As explained by Linus currently it does: prev = *lock; add_smp(lock-tickets.head, TICKET_LOCK_INC); /* add_smp() is a full mb() */ if (unlikely(lock-tickets.tail TICKET_SLOWPATH_FLAG)) __ticket_unlock_slowpath(lock, prev); which is *exactly* the kind of things you cannot do with spinlocks, because after you've done the add_smp() and released the spinlock for the fast-path, you can't access the spinlock any more. Exactly because a fast-path lock might come in, and release the whole data structure. Linus suggested that we should not do any writes to lock after unlock(), and we can move slowpath clearing to fastpath lock. So this patch implements the fix with: 1. Moving slowpath flag to head (Oleg): Unlocked locks don't care about the slowpath flag; therefore we can keep it set after the last unlock, and clear it again on the first (try)lock. -- this removes the write after unlock. note that keeping slowpath flag would result in unnecessary kicks. By moving the slowpath flag from the tail to the head ticket we also avoid the need to access both the head and tail tickets on unlock. 2. use xadd to avoid read/write after unlock that checks the need for unlock_kick (Linus): We further avoid the need for a read-after-release by using xadd; the prev head value will include the slowpath flag and indicate if we need to do PV kicking of suspended spinners -- on modern chips xadd isn't (much) more expensive than an add + load. Result: setup: 16core (32 cpu +ht sandy bridge 8GB 16vcpu guest) benchmark overcommit %improve kernbench 1x -0.13 kernbench 2x0.02 dbench 1x -1.77 dbench 2x -0.63 [Jeremy: hinted missing TICKET_LOCK_INC for kick] [Oleg: Moving slowpath flag to head, ticket_equals idea] [PeterZ: Detailed changelog] Reported-by: Sasha Levin sasha.le...@oracle.com Suggested-by: Linus Torvalds torva...@linux-foundation.org Signed-off-by: Raghavendra K T raghavendra...@linux.vnet.ibm.com Reviewed-by: Oleg Nesterov o...@redhat.com Acked-by: David Vrabel david.vra...@citrix.com --- arch/x86/include/asm/spinlock.h | 94 - arch/x86/kernel/kvm.c | 7 ++- arch/x86/xen/spinlock.c | 7 ++- 3 files changed, 58 insertions(+), 50 deletions(-) Changes for stable: - Don't replace the ACCESS_ONCE to READ_ONCE which would cause horraneous Compiler warnings (Linus, David Vbriel, PeterZ, Ingo) What is the git commit id of this in Linus's tree? What stable tree(s) do you want this applied to? It's: d6abfdb20223 x86/spinlocks/paravirt: Fix memory corruption on unlock Yes, This is the original patch. Please note I have taken out the READ_ONCE changes from the original patch to avoid build warnings mentioned below. (Those READ_ONCE changes were cosmetic and was not present in the previous versions) You'll also need this fix from Linus to avoid (harmless) build warnings: dd36929720f4 kernel: make READ_ONCE() valid on const arguments So this may not be absolutely necessary with the current patch. I'd prefer to be as close as possible to the upstream patch. So if applying both of these patches will work, I'd much rather do that. Changing patches when backporting them to stable for no good reason than to clean things up, just confuses everyone involved. Let's keep our messy history :) thanks, greg k-h -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 1/9] x86: export get_xsave_addr
On Thu, Dec 04, 2014 at 04:57:06PM +0100, Paolo Bonzini wrote: get_xsave_addr is the API to access XSAVE states, and KVM would like to use it. Export it. Use it in what way? Cc: sta...@vger.kernel.org Why is this a stable patch? -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH] driver core: platform: add device binding path 'driver_override'
On Mon, Jun 02, 2014 at 07:42:58PM -0500, Kim Phillips wrote: Needed by platform device drivers, such as the upcoming vfio-platform driver, in order to bypass the existing OF, ACPI, id_table and name string matches, and successfully be able to be bound to any device, like so: echo vfio-platform /sys/bus/platform/devices/fff51000.ethernet/driver_override echo fff51000.ethernet /sys/bus/platform/devices/fff51000.ethernet/driver/unbind echo fff51000.ethernet /sys/bus/platform/drivers_probe This mimics PCI: Introduce new device binding path using pci_dev.driver_override, which is an interface enhancement for more deterministic PCI device binding, e.g., when in the presence of hotplug. Reviewed-by: Alex Williamson alex.william...@redhat.com Reviewed-by: Alexander Graf ag...@suse.de Reviewed-by: Stuart Yoder stuart.yo...@freescale.com Signed-off-by: Kim Phillips kim.phill...@freescale.com --- Greg, This is largely identical to the PCI version of the same that has been accepted for v3.16 and ack'd by you: https://lists.cs.columbia.edu/pipermail/kvmarm/2014-May/009674.html and applied to Bjorn Helgaas' PCI tree: https://git.kernel.org/cgit/linux/kernel/git/helgaas/pci.git/commit/?h=pci/virtualizationid=782a985d7af26db39e86070d28f987cad21313c0 You are the platform driver core maintainer: can you apply this to your driver-core tree now? Sorry for the very long delay, it's now merged in my tree. Thanks for being persistant. greg k-h -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH] driver core: platform: add device binding path 'driver_override'
On Mon, Jun 02, 2014 at 07:42:58PM -0500, Kim Phillips wrote: Needed by platform device drivers, such as the upcoming vfio-platform driver, in order to bypass the existing OF, ACPI, id_table and name string matches, and successfully be able to be bound to any device, like so: echo vfio-platform /sys/bus/platform/devices/fff51000.ethernet/driver_override echo fff51000.ethernet /sys/bus/platform/devices/fff51000.ethernet/driver/unbind echo fff51000.ethernet /sys/bus/platform/drivers_probe This mimics PCI: Introduce new device binding path using pci_dev.driver_override, which is an interface enhancement for more deterministic PCI device binding, e.g., when in the presence of hotplug. Reviewed-by: Alex Williamson alex.william...@redhat.com Reviewed-by: Alexander Graf ag...@suse.de Reviewed-by: Stuart Yoder stuart.yo...@freescale.com Signed-off-by: Kim Phillips kim.phill...@freescale.com --- Greg, This is largely identical to the PCI version of the same that has been accepted for v3.16 and ack'd by you: https://lists.cs.columbia.edu/pipermail/kvmarm/2014-May/009674.html and applied to Bjorn Helgaas' PCI tree: https://git.kernel.org/cgit/linux/kernel/git/helgaas/pci.git/commit/?h=pci/virtualizationid=782a985d7af26db39e86070d28f987cad21313c0 You are the platform driver core maintainer: can you apply this to your driver-core tree now? Yes, I will after this merge window ends, it's too late for 3.16-rc1 with the window opening up a week early, sorry. greg k-h -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH v3] PCI: Introduce new device binding path using pci_dev.driver_override
On Tue, May 27, 2014 at 09:07:42PM -0600, Bjorn Helgaas wrote: On Tue, May 20, 2014 at 08:53:21AM -0600, Alex Williamson wrote: The driver_override field allows us to specify the driver for a device rather than relying on the driver to provide a positive match of the device. This shortcuts the existing process of looking up the vendor and device ID, adding them to the driver new_id, binding the device, then removing the ID, but it also provides a couple advantages. First, the above existing process allows the driver to bind to any device matching the new_id for the window where it's enabled. This is often not desired, such as the case of trying to bind a single device to a meta driver like pci-stub or vfio-pci. Using driver_override we can do this deterministically using: echo pci-stub /sys/bus/pci/devices/:03:00.0/driver_override echo :03:00.0 /sys/bus/pci/devices/:03:00.0/driver/unbind echo :03:00.0 /sys/bus/pci/drivers_probe Previously we could not invoke drivers_probe after adding a device to new_id for a driver as we get non-deterministic behavior whether the driver we intend or the standard driver will claim the device. Now it becomes a deterministic process, only the driver matching driver_override will probe the device. To return the device to the standard driver, we simply clear the driver_override and reprobe the device: echo /sys/bus/pci/devices/:03:00.0/driver_override echo :03:00.0 /sys/bus/pci/devices/:03:00.0/driver/unbind echo :03:00.0 /sys/bus/pci/drivers_probe Another advantage to this approach is that we can specify a driver override to force a specific binding or prevent any binding. For instance when an IOMMU group is exposed to userspace through VFIO we require that all devices within that group are owned by VFIO. However, devices can be hot-added into an IOMMU group, in which case we want to prevent the device from binding to any driver (override driver = none) or perhaps have it automatically bind to vfio-pci. With driver_override it's a simple matter for this field to be set internally when the device is first discovered to prevent driver matches. Signed-off-by: Alex Williamson alex.william...@redhat.com Cc: Greg Kroah-Hartman gre...@linuxfoundation.org Greg, are you going to weigh in on this? It does seem to solve some real problems. ISTR you had an opinion once, but I don't know your current thoughts. This looks good to me: Acked-by: Greg Kroah-Hartman gre...@linuxfoundation.org -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH v3] PCI: Introduce new device binding path using pci_dev.driver_override
On Tue, May 27, 2014 at 09:07:42PM -0600, Bjorn Helgaas wrote: On Tue, May 20, 2014 at 08:53:21AM -0600, Alex Williamson wrote: The driver_override field allows us to specify the driver for a device rather than relying on the driver to provide a positive match of the device. This shortcuts the existing process of looking up the vendor and device ID, adding them to the driver new_id, binding the device, then removing the ID, but it also provides a couple advantages. First, the above existing process allows the driver to bind to any device matching the new_id for the window where it's enabled. This is often not desired, such as the case of trying to bind a single device to a meta driver like pci-stub or vfio-pci. Using driver_override we can do this deterministically using: echo pci-stub /sys/bus/pci/devices/:03:00.0/driver_override echo :03:00.0 /sys/bus/pci/devices/:03:00.0/driver/unbind echo :03:00.0 /sys/bus/pci/drivers_probe Previously we could not invoke drivers_probe after adding a device to new_id for a driver as we get non-deterministic behavior whether the driver we intend or the standard driver will claim the device. Now it becomes a deterministic process, only the driver matching driver_override will probe the device. To return the device to the standard driver, we simply clear the driver_override and reprobe the device: echo /sys/bus/pci/devices/:03:00.0/driver_override echo :03:00.0 /sys/bus/pci/devices/:03:00.0/driver/unbind echo :03:00.0 /sys/bus/pci/drivers_probe Another advantage to this approach is that we can specify a driver override to force a specific binding or prevent any binding. For instance when an IOMMU group is exposed to userspace through VFIO we require that all devices within that group are owned by VFIO. However, devices can be hot-added into an IOMMU group, in which case we want to prevent the device from binding to any driver (override driver = none) or perhaps have it automatically bind to vfio-pci. With driver_override it's a simple matter for this field to be set internally when the device is first discovered to prevent driver matches. Signed-off-by: Alex Williamson alex.william...@redhat.com Cc: Greg Kroah-Hartman gre...@linuxfoundation.org Greg, are you going to weigh in on this? It does seem to solve some real problems. ISTR you had an opinion once, but I don't know your current thoughts. Give me a few more days, still digging through my patch backlog... thanks, greg k-h -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH] virtio-scsi: Skip setting affinity on uninitialized vq
On Fri, Apr 11, 2014 at 03:23:45PM +0800, Fam Zheng wrote: virtscsi_init calls virtscsi_remove_vqs on err, even before initializing the vqs. The latter calls virtscsi_set_affinity, so let's check the pointer there before setting affinity on it. This fixes a panic when setting device's num_queues=2 on RHEL 6.5: qemu-system-x86_64 ... \ -device virtio-scsi-pci,id=scsi0,addr=0x13,...,num_queues=2 \ -drive file=/stor/vm/dummy.raw,id=drive-scsi-disk,... \ -device scsi-hd,drive=drive-scsi-disk,... [0.354734] scsi0 : Virtio SCSI HBA [0.379504] BUG: unable to handle kernel NULL pointer dereference at 0020 [0.380141] IP: [814741ef] __virtscsi_set_affinity+0x4f/0x120 [0.380141] PGD 0 [0.380141] Oops: [#1] SMP [0.380141] CPU: 0 PID: 1 Comm: swapper/0 Not tainted 3.14.0+ #5 [0.380141] Hardware name: Red Hat KVM, BIOS 0.5.1 01/01/2007 [0.380141] task: 88003c9f ti: 88003c9f8000 task.ti: 88003c9f8000 [0.380141] RIP: 0010:[814741ef] [814741ef] __virtscsi_set_affinity+0x4f/0x120 [0.380141] RSP: :88003c9f9c08 EFLAGS: 00010256 [0.380141] RAX: RBX: 88003c3a9d40 RCX: 1070 [0.380141] RDX: 0002 RSI: RDI: [0.380141] RBP: 88003c9f9c28 R08: 000136c0 R09: 88003c801c00 [0.380141] R10: 81475229 R11: 0008 R12: [0.380141] R13: 81cc7ca8 R14: 88003cac3d40 R15: 88003cac37a0 [0.380141] FS: () GS:88003e40() knlGS: [0.380141] CS: 0010 DS: ES: CR0: 8005003b [0.380141] CR2: 0020 CR3: 01c0e000 CR4: 06f0 [0.380141] Stack: [0.380141] 88003c3a9d40 88003cac3d80 88003cac3d40 [0.380141] 88003c9f9c48 814742e8 88003c26d000 88003c26d000 [0.380141] 88003c9f9c68 81474321 88003c26d000 88003c3a9d40 [0.380141] Call Trace: [0.380141] [814742e8] virtscsi_set_affinity+0x28/0x40 [0.380141] [81474321] virtscsi_remove_vqs+0x21/0x50 [0.380141] [81475231] virtscsi_init+0x91/0x240 [0.380141] [81365290] ? vp_get+0x50/0x70 [0.380141] [81475544] virtscsi_probe+0xf4/0x280 [0.380141] [81363ea5] virtio_dev_probe+0xe5/0x140 [0.380141] [8144c669] driver_probe_device+0x89/0x230 [0.380141] [8144c8ab] __driver_attach+0x9b/0xa0 [0.380141] [8144c810] ? driver_probe_device+0x230/0x230 [0.380141] [8144c810] ? driver_probe_device+0x230/0x230 [0.380141] [8144ac1c] bus_for_each_dev+0x8c/0xb0 [0.380141] [8144c499] driver_attach+0x19/0x20 [0.380141] [8144bf28] bus_add_driver+0x198/0x220 [0.380141] [8144ce9f] driver_register+0x5f/0xf0 [0.380141] [81d27c91] ? spi_transport_init+0x79/0x79 [0.380141] [8136403b] register_virtio_driver+0x1b/0x30 [0.380141] [81d27d19] init+0x88/0xd6 [0.380141] [81d27c18] ? scsi_init_procfs+0x5b/0x5b [0.380141] [81ce88a7] do_one_initcall+0x7f/0x10a [0.380141] [81ce8aa7] kernel_init_freeable+0x14a/0x1de [0.380141] [81ce8b3b] ? kernel_init_freeable+0x1de/0x1de [0.380141] [817dec20] ? rest_init+0x80/0x80 [0.380141] [817dec29] kernel_init+0x9/0xf0 [0.380141] [817e68fc] ret_from_fork+0x7c/0xb0 [0.380141] [817dec20] ? rest_init+0x80/0x80 [0.380141] RIP [814741ef] __virtscsi_set_affinity+0x4f/0x120 [0.380141] RSP 88003c9f9c08 [0.380141] CR2: 0020 [0.380141] ---[ end trace 8074b70c3d5e1d73 ]--- [0.475018] Kernel panic - not syncing: Attempted to kill init! exitcode=0x0009 [0.475018] [0.475068] Kernel Offset: 0x0 from 0x8100 (relocation range: 0x8000-0x9fff) [0.475068] ---[ end Kernel panic - not syncing: Attempted to kill init! exitcode=0x0009 Signed-off-by: Fam Zheng f...@redhat.com --- drivers/scsi/virtio_scsi.c | 6 +- 1 file changed, 5 insertions(+), 1 deletion(-) formletter This is not the correct way to submit patches for inclusion in the stable kernel tree. Please read Documentation/stable_kernel_rules.txt for how to do this properly. /formletter -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [RFC PATCH] PCI: Introduce new device binding path using pci_dev.driver_override
On Tue, Apr 01, 2014 at 10:28:54AM -0600, Alex Williamson wrote: The driver_override field allows us to specify the driver for a device rather than relying on the driver to provide a positive match of the device. This shortcuts the existing process of looking up the vendor and device ID, adding them to the driver new_id, binding the device, then removing the ID, but it also provides a couple advantages. First, the above process allows the driver to bind to any device matching the new_id for the window where it's enabled. This is often not desired, such as the case of trying to bind a single device to a meta driver like pci-stub or vfio-pci. Using driver_override we can do this deterministically using: echo pci-stub /sys/bus/pci/devices/:03:00.0/driver_override echo :03:00.0 /sys/bus/pci/devices/:03:00.0/driver/unbind echo :03:00.0 /sys/bus/pci/drivers_probe Previously we could not invoke drivers_probe after adding a device to new_id for a driver as we get non-deterministic behavior whether the driver we intend or the standard driver will claim the device. Now it becomes a deterministic process, only the driver matching driver_override will probe the device. To return the device to the standard driver, we simply clear the driver_override and reprobe the device, ex: echo /sys/bus/pci/devices/:03:00.0/preferred_driver echo :03:00.0 /sys/bus/pci/devices/:03:00.0/driver/unbind echo :03:00.0 /sys/bus/pci/drivers_probe Another advantage to this approach is that we can specify a driver override to force a specific binding or prevent any binding. For instance when an IOMMU group is exposed to userspace through VFIO we require that all devices within that group are owned by VFIO. However, devices can be hot-added into an IOMMU group, in which case we want to prevent the device from binding to any driver (preferred driver = none) or perhaps have it automatically bind to vfio-pci. With driver_override it's a simple matter for this field to be set internally when the device is first discovered to prevent driver matches. Signed-off-by: Alex Williamson alex.william...@redhat.com --- Apologies for the exceptionally long cc list, this is a follow-up to Stuart's Subject: mechanism to allow a driver to bind to any device thread. This is effectively a v2 of the proof-of-concept patch I posted in that thread. This version changes to use a dummy id struct to return on an override match, which removes the collateral damage and greatly simplifies the patch. This feels fairly well baked for PCI and I would expect that platform drivers could do a similar implementation. From there perhaps we can discuss whether there's any advantage to placing driver_override on struct device. The logic for incorporating it into the match still needs to happen per bus driver, so it might only contribute to consistency of the show/store sysfs attributes to move it up to struct device. Please comment. Thanks, Alex drivers/pci/pci-driver.c | 25 ++--- drivers/pci/pci-sysfs.c | 40 include/linux/pci.h |1 + 3 files changed, 63 insertions(+), 3 deletions(-) No Documentation/ABI/ update to reflect the ABI you are creating? thanks, greg k-h -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [RFC PATCH] PCI: Introduce new device binding path using pci_dev.driver_override
On Tue, Apr 01, 2014 at 06:52:12PM -0500, Kim Phillips wrote: On Tue, 01 Apr 2014 10:28:54 -0600 Alex Williamson alex.william...@redhat.com wrote: The driver_override field allows us to specify the driver for a device rather than relying on the driver to provide a positive match of the device. This shortcuts the existing process of looking up the vendor and device ID, adding them to the driver new_id, binding the device, then removing the ID, but it also provides a couple advantages. First, the above process allows the driver to bind to any device matching the new_id for the window where it's enabled. This is often not desired, such as the case of trying to bind a single device to a meta driver like pci-stub or vfio-pci. Using driver_override we can do this deterministically using: echo pci-stub /sys/bus/pci/devices/:03:00.0/driver_override echo :03:00.0 /sys/bus/pci/devices/:03:00.0/driver/unbind echo :03:00.0 /sys/bus/pci/drivers_probe Previously we could not invoke drivers_probe after adding a device to new_id for a driver as we get non-deterministic behavior whether the driver we intend or the standard driver will claim the device. Now it becomes a deterministic process, only the driver matching driver_override will probe the device. To return the device to the standard driver, we simply clear the driver_override and reprobe the device, ex: echo /sys/bus/pci/devices/:03:00.0/preferred_driver echo :03:00.0 /sys/bus/pci/devices/:03:00.0/driver/unbind echo :03:00.0 /sys/bus/pci/drivers_probe Another advantage to this approach is that we can specify a driver override to force a specific binding or prevent any binding. For instance when an IOMMU group is exposed to userspace through VFIO we require that all devices within that group are owned by VFIO. However, devices can be hot-added into an IOMMU group, in which case we want to prevent the device from binding to any driver (preferred driver = none) or perhaps have it automatically bind to vfio-pci. With driver_override it's a simple matter for this field to be set internally when the device is first discovered to prevent driver matches. Signed-off-by: Alex Williamson alex.william...@redhat.com --- Apologies for the exceptionally long cc list, this is a follow-up to Stuart's Subject: mechanism to allow a driver to bind to any device thread. This is effectively a v2 of the proof-of-concept patch I posted in that thread. This version changes to use a dummy id struct to return on an override match, which removes the collateral damage and greatly simplifies the patch. This feels fairly well baked for PCI and I would expect that platform drivers could do a similar implementation. From there perhaps we can discuss whether there's any advantage to placing driver_override on struct device. The logic for incorporating it into the match still needs to happen per bus driver, so it might only contribute to consistency of the show/store sysfs attributes to move it up to struct device. Please comment. Sounds like Greg likes this approach more than {drv,dev}_sysfs_only. I have made no such judgement, I only pointed out that if you modify/add/remove a sysfs file, it needs to have documentation for it. The diff below is the result of duplicating and converting this patch for platform devices, and, indeed, binding a device to the vfio-platform driver succeeds with: echo vfio-platform /sys/bus/platform/devices/fff51000.ethernet/driver_override echo fff51000.ethernet /sys/bus/platform/devices/fff51000.ethernet/driver/unbind echo fff51000.ethernet /sys/bus/platform/drivers_probe However, it's almost pure duplication modulo the bus match code. The only other place I can see where to put the common bus check is drivers/base/base.h:driver_match_device(), which I'm guessing is off-limits? So should we leave this as per-bus code, and somehow refactor driver_override_{show,store}? If you can provide a way for this to be done in a bus-independant way, like we did for new_id and the like, I'd be open to reviewing it. greg k-h -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [RFC PATCH v4 01/10] driver core: export driver_probe_device()
On Sat, Feb 08, 2014 at 06:29:31PM +0100, Antonios Motakis wrote: From: Kim Phillips kim.phill...@linaro.org Needed by drivers, such as the vfio platform driver [1], seeking to bypass bind_store()'s driver_match_device(), and bind to any device via a private sysfs bind file. [1] https://lkml.org/lkml/2013/12/11/522 note: the EXPORT_SYMBOL is needed because vfio-platform can be built as a module. No code outside of drivers/base/ should be calling this function, you are doing something wrong in your bus if you want to do this, please fix your bus code. sorry, I can't accept this at all. greg k-h -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH] KVM: SVM: fix NMI window after iret
On Fri, Jan 17, 2014 at 08:52:42PM +0100, Radim Krčmář wrote: We should open NMI window right after an iret, but SVM exits before it. We wanted to single step using the trap flag and then open it. (or we could emulate the iret instead) We don't do it since commit 3842d135ff2 (likely), because the iret exit handler does not request an event, so NMI window remains closed until the next exit. Fix this by making KVM_REQ_EVENT request in the iret handler. Signed-off-by: Radim Krčmář rkrc...@redhat.com --- (btw. kvm-unit-tests weren't executed on SVM since Nov 2010, at least) arch/x86/kvm/svm.c | 1 + 1 file changed, 1 insertion(+) formletter This is not the correct way to submit patches for inclusion in the stable kernel tree. Please read Documentation/stable_kernel_rules.txt for how to do this properly. /formletter -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH backport hints for 3.10] KVM: x86: Convert vapic synchronization to _cached functions (CVE-2013-6368)
On Mon, Dec 16, 2013 at 12:38:17PM +0100, Paolo Bonzini wrote: The KVM patch fix vapic memory corruption applies to most kernels that have KVM, but the fix does not apply on many older branches. The APIs it uses are available in 3.1, but until 3.9 kvm_gfn_to_hva_cache_init had one fewer parameter. The comments in this patch should help fixing kvm_lapic_set_vapic_addr in older kernels. I will review the backports as they are posted to LKML (if I am CCed...). This fails to build on 3.4-stable, could you provide a working backport for that kernel? thanks, greg k-h -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH backport hints for 3.10] KVM: x86: Convert vapic synchronization to _cached functions (CVE-2013-6368)
On Mon, Dec 16, 2013 at 12:38:17PM +0100, Paolo Bonzini wrote: The KVM patch fix vapic memory corruption applies to most kernels that have KVM, but the fix does not apply on many older branches. The APIs it uses are available in 3.1, but until 3.9 kvm_gfn_to_hva_cache_init had one fewer parameter. The comments in this patch should help fixing kvm_lapic_set_vapic_addr in older kernels. I will review the backports as they are posted to LKML (if I am CCed...). Signed-off-by: Paolo Bonzini pbonz...@redhat.com --- arch/x86/kvm/lapic.c | 38 +- arch/x86/kvm/lapic.h | 4 ++-- arch/x86/kvm/x86.c | 33 + 3 files changed, 32 insertions(+), 43 deletions(-) I don't understand, what are people supposed to do with this? greg k-h -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 3.12-rc7] KVM: Fix modprobe failure for kvm_intel/kvm_amd
On Wed, Oct 30, 2013 at 07:31:05PM +0530, Raghavendra K T wrote: On 10/30/2013 01:03 AM, Linus Torvalds wrote: On Tue, Oct 29, 2013 at 12:27 PM, Raghavendra K T raghavendra...@linux.vnet.ibm.com wrote: Could one solution be cascading actual error that is lost in fs/debugfs/inode.c:__create_file(), so that we could take correct action in case of failure of debugfs_create_dir()? (ugly side is we increase total number of params for __create_file to 6). or I hope there could be some better solution. The solution to this would be to simply return an error-pointer. See linux/err.h. That's what we do for most complex subsystems that return a pointer to a struct: rather than returning NULL as an error, return the actual error number encoded in the pointer itself. Thank you Linus. Yes, this would have been ideal. But that would require every user of debugfs_create_dir() to be updated to check errors using IS_ERR() instead of checking against NULL, and there's quite a few of them. So I think just making the error be EEXIST is a simpler solution right now. Agree. I had below patch, and just before sending as formal mail I saw Paolo's pull request which already took care of it. ---8--- From ac5d9f038c273f27bea7a54aab6af79b57f56317 Mon Sep 17 00:00:00 2001 From: Raghavendra K T raghavendra...@linux.vnet.ibm.com Date: Wed, 30 Oct 2013 18:59:46 +0530 Subject: [PATCH] Return EEXIST on debugfs_create_dir failure in kvm As quoted by Linus, EFAULT means user passed in an invalid virtual address pointer, which is why the error string is Bad address. But when a debugfs directory creation fails, the above error is not valid. Signed-off-by: Raghavendra K T raghavendra...@linux.vnet.ibm.com --- I understand that Tim's patch that renames directory to something like kvm-pv would solve kvm-amd/kvm-intel modules insertion problem. This patch is to address error code change complained by Linus. arch/x86/kernel/kvm.c | 2 +- virt/kvm/kvm_main.c | 3 ++- 2 files changed, 3 insertions(+), 2 deletions(-) diff --git a/arch/x86/kernel/kvm.c b/arch/x86/kernel/kvm.c index a0e2a8a..e475fdb 100644 --- a/arch/x86/kernel/kvm.c +++ b/arch/x86/kernel/kvm.c @@ -622,7 +622,7 @@ static int __init kvm_spinlock_debugfs(void) d_kvm = kvm_init_debugfs(); if (d_kvm == NULL) - return -ENOMEM; + return -EEXIST; Why even error out at all here? You should never care about debugfs file creation return values, so why pass any error back up the stack? greg k-h -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 3.12-rc7] KVM: Fix modprobe failure for kvm_intel/kvm_amd
On Wed, Oct 30, 2013 at 04:46:28PM +0100, Paolo Bonzini wrote: Il 30/10/2013 16:39, Raghavendra K T ha scritto: Why even error out at all here? You should never care about debugfs file creation return values, so why pass any error back up the stack? We could change this to return 0, but we will still be left with kvm_main.c: kvm_init_debug() function which creates the kvm debugfs directory. (I thought Paolo and Gleb's ack would be needed for that change since it would be a bigger decision for me) So I just made the patch to fix only Linus's concern. Even if it is okay to exit and not create the files (and I think it's a bit surprising), I'd like at least a printk to signal what's happening. But there should be no reason for debugfs directory creation to fail in the end, except for basic mistakes such as the one that Tim reported, so I think it's better to report the failure. Creation will fail if debugfs is not enabled, so spiting out printk messages in that case is not a good idea. greg k-h -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 3.12-rc7] KVM: Fix modprobe failure for kvm_intel/kvm_amd
On Wed, Oct 30, 2013 at 05:08:09PM +0100, Paolo Bonzini wrote: Il 30/10/2013 16:59, Greg KH ha scritto: Even if it is okay to exit and not create the files (and I think it's a bit surprising), I'd like at least a printk to signal what's happening. But there should be no reason for debugfs directory creation to fail in the end, except for basic mistakes such as the one that Tim reported, so I think it's better to report the failure. Creation will fail if debugfs is not enabled, so spiting out printk messages in that case is not a good idea. Interestingly, if debugfs is not enabled we are already returning an error-valued pointer: static inline struct dentry *debugfs_create_dir(const char *name, struct dentry *parent) { return ERR_PTR(-ENODEV); } which would oops a lot of the current callers. It will oops? Really? Where? That shouldn't happen at all. Very few places use the currently correct idiom if (IS_ERR(root) || !root) but it's very ugly... Perhaps debugfs_create_dir *should* return an error-valued pointer after all. Or just don't care about the return value, and all will work out just fine, right? thanks, greg k-h -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 3.12-rc7] KVM: Fix modprobe failure for kvm_intel/kvm_amd
On Wed, Oct 30, 2013 at 12:57:32AM +0530, Raghavendra K T wrote: Adding Greg/AI too since we touch debugfs code. You do? [...] sudo modprobe kvm_amd modprobe: ERROR: could not insert 'kvm_amd': Bad address Bad address? Christ people, are you guys making up error numbers with some kind of dice-roll? I can just see it now, somebody sitting there with a D20, playing some kind of kernel-specific DD, and rolling a ten means that you get to slay the orc, and pick an error number of EFAULT for some random kernel function. Because quite frankly, random dice roll is the _only_ thing that explains Bad address sufficiently. Please, whoever wrote virt/kvm/kvm_main.c:: kvm_init_debug(), WTF? EFAULT means user passed in an invalid virtual address pointer, which is why the error string is Bad address. It makes absolutely NO SENSE here. Perhaps EEXIST or EBUSY. Right. In current scenario it should have been EEXIST :(. debugfs_create_dir() currently returns NULL dentry on both EEXIST, ENOMEM ... cases. Could one solution be cascading actual error that is lost in fs/debugfs/inode.c:__create_file(), so that we could take correct action in case of failure of debugfs_create_dir()? What would you do here? You shouldn't really care about debugfs files, if there's an error, keep on going, no code path should really care, right? thanks, greg k-h -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH] intel-iommu: Fix leaks in pagetable freeing
On Wed, Oct 02, 2013 at 10:44:31AM +0200, Borislav Petkov wrote: On Sat, Jun 15, 2013 at 10:27:19AM -0600, Alex Williamson wrote: At best the current code only seems to free the leaf pagetables and the root. If you're unlucky enough to have a large gap (like any QEMU guest with more than 3G of memory), only the first chunk of leaf pagetables are freed (plus the root). This is a massive memory leak. This patch re-writes the pagetable freeing function to use a recursive algorithm and manages to not only free all the pagetables, but does it without any apparent performance loss versus the current broken version. Signed-off-by: Alex Williamson alex.william...@redhat.com Cc: sta...@vger.kernel.org --- Suggesting for stable, would like to see some soak time, but it's hard to imagine this being any worse than the current code. Btw, I have a backport for the 3.0.x series which builds fine here, in case you guys are interested :) Thanks, now applied. greg k-h -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [stable-3.4] possibly revert KVM: X86 emulator: fix source operand decoding...
On Tue, Sep 10, 2013 at 01:05:21PM +0200, Paolo Bonzini wrote: Il 04/09/2013 18:44, Paul Gortmaker ha scritto: Hi Greg, The 3.4.44+ cherry pick: commit 5b5b30580218eae22609989546bac6e44d0eda6e Author: Gleb Natapov g...@redhat.com Date: Wed Apr 24 13:38:36 2013 +0300 KVM: X86 emulator: fix source operand decoding for 8bit mov[zs]x instructions commit 660696d1d16a71e15549ce1bf74953be1592bcd3 upstream. Source operand for one byte mov[zs]x is decoded incorrectly if it is in high byte register. Fix that. Signed-off-by: Gleb Natapov g...@redhat.com Signed-off-by: Greg Kroah-Hartman gre...@linuxfoundation.org introduces the following: arch/x86/kvm/emulate.c: In function ‘decode_operand’: arch/x86/kvm/emulate.c:3974:4: warning: passing argument 1 of ‘decode_register’ makes integer from pointer without a cast [enabled by default] arch/x86/kvm/emulate.c:789:14: note: expected ‘u8’ but argument is of type ‘struct x86_emulate_ctxt *’ arch/x86/kvm/emulate.c:3974:4: warning: passing argument 2 of ‘decode_register’ makes pointer from integer without a cast [enabled by default] arch/x86/kvm/emulate.c:789:14: note: expected ‘long unsigned int *’ but argument is of type ‘u8’ Based on the severity of the warnings above, I'm reasonably sure there will be some kind of runtime regressions due to this, but I stopped to investigate the warnings as soon as I saw them, before any run time testing. It happens because mainline v3.7-rc1~113^2~40 (dd856efafe60) does this: -static void *decode_register(u8 modrm_reg, unsigned long *regs, +static void *decode_register(struct x86_emulate_ctxt *ctxt, u8 modrm_reg, Since 660696d1d16a71e1 was only applied to stable 3.4, 3.8, and 3.9 -- and the prerequisite above is in 3.7+, the issue should be limited to 3.4.44+ Right, the fix is not important to have for 3.4 kernels. Thanks for letting me know, I've now reverted it. greg k-h -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCHSET] kthread_worker: reimplement flush_kthread_work() to allow freeing during execution
On Mon, Sep 17, 2012 at 01:28:50PM -0700, Greg KH wrote: On Mon, Sep 17, 2012 at 12:40:16PM -0700, Tejun Heo wrote: On Fri, Sep 14, 2012 at 03:50:40PM -0700, Colin Cross wrote: This patch set fixes a reproducible crash I'm seeing on a 3.4.10 kernel. flush_kthread_worker (which is different from flush_kthread_work) is initializing a kthread_work and a completion on the stack, then queuing it and calling wait_for_completion. Once the completion is signaled, flush_kthread_worker exits and the stack region used by the kthread_work may be immediately reused by another object on the stack, but kthread_worker_fn continues accessing its work pointer: work-func(work); - calls complete, effectively frees work smp_wmb(); /* wmb worker-b0 paired with flush-b1 */ work-done_seq = work-queue_seq; - overwrites a new stack object smp_mb(); /* mb worker-b1 paired with flush-b0 */ if (atomic_read(work-flushing)) wake_up_all(work-done); - or crashes here These patches fix the problem by not accessing work after work-func is called, and should be backported to stable. They apply cleanly to 3.4.10. Upstream commits are 9a2e03d8ed518a61154f18d83d6466628e519f94 and 46f3d976213452350f9d10b0c2780c2681f7075b. Yeah, you're right. I wonder why this didn't come up before. Greg, can you please pick up these two commits? Ok, will do, thanks for letting me know. Now applied, thanks. greg k-h -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCHSET] kthread_worker: reimplement flush_kthread_work() to allow freeing during execution
On Mon, Sep 17, 2012 at 12:40:16PM -0700, Tejun Heo wrote: On Fri, Sep 14, 2012 at 03:50:40PM -0700, Colin Cross wrote: This patch set fixes a reproducible crash I'm seeing on a 3.4.10 kernel. flush_kthread_worker (which is different from flush_kthread_work) is initializing a kthread_work and a completion on the stack, then queuing it and calling wait_for_completion. Once the completion is signaled, flush_kthread_worker exits and the stack region used by the kthread_work may be immediately reused by another object on the stack, but kthread_worker_fn continues accessing its work pointer: work-func(work); - calls complete, effectively frees work smp_wmb(); /* wmb worker-b0 paired with flush-b1 */ work-done_seq = work-queue_seq; - overwrites a new stack object smp_mb(); /* mb worker-b1 paired with flush-b0 */ if (atomic_read(work-flushing)) wake_up_all(work-done); - or crashes here These patches fix the problem by not accessing work after work-func is called, and should be backported to stable. They apply cleanly to 3.4.10. Upstream commits are 9a2e03d8ed518a61154f18d83d6466628e519f94 and 46f3d976213452350f9d10b0c2780c2681f7075b. Yeah, you're right. I wonder why this didn't come up before. Greg, can you please pick up these two commits? Ok, will do, thanks for letting me know. greg k-h -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: linux-next: manual merge of the staging tree with the target-merge tree
On Sun, Jul 22, 2012 at 02:44:23PM -0700, Nicholas A. Bellinger wrote: So Linus has merged target-pending/for-next this afternoon, so now we are just waiting on net-next to hit mainline with the vhost patches already ACK'ed by MST. Hopefully that makes things easier for you to considering taking tcm_vhost upstream via staging. ;) Also, MST asked for an RFC-v5 for the initial merge commit with some minor debug wrapper changes that will be going out next week. This will include a move into drivers/staging/tcm_vhost/ against a rebased staging.git patch with the necessary -rc0 mainline dependencies. Please let me know if your OK with this, otherwise I'll just plan to keep -v5 against target-pending/for-next-merge for now, and send a GIT PULL after MST gets back from holiday on the 29th - 30th. I have no idea what any of the above three paragraphs are asking for, or talking about, sorry. Note, the merge window is closed for taking new stuff into the staging tree. If it isn't already in my staging-next tree, it isn't going to go into 3.6 unless it's bug fixes, sorry. How about we figure all of this out after 3.6-rc1 is out so we can understand what is going on for 3.7? thanks, greg k-h -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH v4 3/3] Documentation: Add ABI entry for vmcs sysfs interface
On Wed, Jul 04, 2012 at 06:06:28PM +0800, Yanfei Zhang wrote: Signed-off-by: zhangyanfei zhangyan...@cn.fujitsu.com --- Documentation/ABI/testing/sysfs-devices-system-cpu | 21 1 files changed, 21 insertions(+), 0 deletions(-) diff --git a/Documentation/ABI/testing/sysfs-devices-system-cpu b/Documentation/ABI/testing/sysfs-devices-system-cpu index 5dab364..6efbd6c 100644 --- a/Documentation/ABI/testing/sysfs-devices-system-cpu +++ b/Documentation/ABI/testing/sysfs-devices-system-cpu @@ -9,6 +9,27 @@ Description: /sys/devices/system/cpu/cpu#/ +What:/sys/devices/system/cpu/vmcs_id +Date:June 2012 +KernelVersion: 3.5.0 3.5.0 will not have this feature in it, so you should probably change these lines in this patch. greg k-h -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH v4 3/3] Documentation: Add ABI entry for vmcs sysfs interface
On Wed, Jul 04, 2012 at 06:06:28PM +0800, Yanfei Zhang wrote: Signed-off-by: zhangyanfei zhangyan...@cn.fujitsu.com --- Documentation/ABI/testing/sysfs-devices-system-cpu | 21 1 files changed, 21 insertions(+), 0 deletions(-) diff --git a/Documentation/ABI/testing/sysfs-devices-system-cpu b/Documentation/ABI/testing/sysfs-devices-system-cpu index 5dab364..6efbd6c 100644 --- a/Documentation/ABI/testing/sysfs-devices-system-cpu +++ b/Documentation/ABI/testing/sysfs-devices-system-cpu @@ -9,6 +9,27 @@ Description: /sys/devices/system/cpu/cpu#/ +What:/sys/devices/system/cpu/vmcs_id Wait, aren't these a per-cpu value? You have them as a global value for all cpus, is that really the case? Shouldn't they be under /sys/devices/system/cpu/cpu#/ instead? thanks, greg k-h -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH v3 4/5] Sysfs: Export VMCSINFO via sysfs
On Thu, Jun 28, 2012 at 05:54:30PM +0800, Yanfei Zhang wrote: 于 2012年06月28日 03:22, Greg KH 写道: On Wed, Jun 27, 2012 at 04:54:54PM +0800, Yanfei Zhang wrote: This patch export offsets of fields via /sys/devices/cpu/vmcs/. Individual offsets are contained in subfiles named by the filed's encoding, e.g.: /sys/devices/cpu/vmcs/0800 Signed-off-by: zhangyanfei zhangyan...@cn.fujitsu.com --- drivers/base/core.c | 13 + 1 files changed, 13 insertions(+), 0 deletions(-) diff --git a/drivers/base/core.c b/drivers/base/core.c index 346be8b..dd05ee7 100644 --- a/drivers/base/core.c +++ b/drivers/base/core.c @@ -26,6 +26,7 @@ #include linux/async.h #include linux/pm_runtime.h #include linux/netdevice.h +#include asm/vmcsinfo.h Did you just break the build on all other arches? Not nice. @@ -1038,6 +1039,11 @@ int device_add(struct device *dev) error = dpm_sysfs_add(dev); if (error) goto DPMError; +#if defined(CONFIG_KVM_INTEL) || defined(CONFIG_KVM_INTEL_MODULE) + error = vmcs_sysfs_add(dev); + if (error) + goto VMCSError; +#endif Oh my no, that's no way to ever do this, you know better than that, please fix. greg k-h Sorry for my thoughtless, Here is the new patch. --- drivers/base/core.c | 13 + 1 files changed, 13 insertions(+), 0 deletions(-) diff --git a/drivers/base/core.c b/drivers/base/core.c index 346be8b..7b5266a 100644 --- a/drivers/base/core.c +++ b/drivers/base/core.c @@ -30,6 +30,13 @@ #include base.h #include power/power.h +#if defined(CONFIG_KVM_INTEL) || defined(CONFIG_KVM_INTEL_MODULE) +#include asm/vmcsinfo.h +#else +static inline int vmcs_sysfs_add(struct device *dev) { return 0; } +static inline void vmcs_sysfs_remove(struct device *dev) { } +#endif {sigh} No, again, you know better, don't do this. greg k-h -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH v3 4/5] Sysfs: Export VMCSINFO via sysfs
On Thu, Jun 28, 2012 at 04:37:38AM -0700, Greg KH wrote: On Thu, Jun 28, 2012 at 05:54:30PM +0800, Yanfei Zhang wrote: 于 2012年06月28日 03:22, Greg KH 写道: On Wed, Jun 27, 2012 at 04:54:54PM +0800, Yanfei Zhang wrote: This patch export offsets of fields via /sys/devices/cpu/vmcs/. Individual offsets are contained in subfiles named by the filed's encoding, e.g.: /sys/devices/cpu/vmcs/0800 Signed-off-by: zhangyanfei zhangyan...@cn.fujitsu.com --- drivers/base/core.c | 13 + 1 files changed, 13 insertions(+), 0 deletions(-) diff --git a/drivers/base/core.c b/drivers/base/core.c index 346be8b..dd05ee7 100644 --- a/drivers/base/core.c +++ b/drivers/base/core.c @@ -26,6 +26,7 @@ #include linux/async.h #include linux/pm_runtime.h #include linux/netdevice.h +#include asm/vmcsinfo.h Did you just break the build on all other arches? Not nice. @@ -1038,6 +1039,11 @@ int device_add(struct device *dev) error = dpm_sysfs_add(dev); if (error) goto DPMError; +#if defined(CONFIG_KVM_INTEL) || defined(CONFIG_KVM_INTEL_MODULE) +error = vmcs_sysfs_add(dev); +if (error) +goto VMCSError; +#endif Oh my no, that's no way to ever do this, you know better than that, please fix. greg k-h Sorry for my thoughtless, Here is the new patch. --- drivers/base/core.c | 13 + 1 files changed, 13 insertions(+), 0 deletions(-) diff --git a/drivers/base/core.c b/drivers/base/core.c index 346be8b..7b5266a 100644 --- a/drivers/base/core.c +++ b/drivers/base/core.c @@ -30,6 +30,13 @@ #include base.h #include power/power.h +#if defined(CONFIG_KVM_INTEL) || defined(CONFIG_KVM_INTEL_MODULE) +#include asm/vmcsinfo.h +#else +static inline int vmcs_sysfs_add(struct device *dev) { return 0; } +static inline void vmcs_sysfs_remove(struct device *dev) { } +#endif {sigh} No, again, you know better, don't do this. Ok, as others have rightly pointed out, this wasn't the most helpful review comment, sorry about that. In Linux, we don't put ifdefs in .c files, we put them in .h files. See many examples of this all over the place. That's my main complaints the past two times of this patch. But, for this, I would question why you even want / need to do this in the drivers/base/core/ file in the first place. Shouldn't it be in some arch or cpu specific file instead that already handles the cpu files? thanks, greg k-h -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH v3 4/5] Sysfs: Export VMCSINFO via sysfs
On Wed, Jun 27, 2012 at 04:54:54PM +0800, Yanfei Zhang wrote: This patch export offsets of fields via /sys/devices/cpu/vmcs/. Individual offsets are contained in subfiles named by the filed's encoding, e.g.: /sys/devices/cpu/vmcs/0800 Signed-off-by: zhangyanfei zhangyan...@cn.fujitsu.com --- drivers/base/core.c | 13 + 1 files changed, 13 insertions(+), 0 deletions(-) diff --git a/drivers/base/core.c b/drivers/base/core.c index 346be8b..dd05ee7 100644 --- a/drivers/base/core.c +++ b/drivers/base/core.c @@ -26,6 +26,7 @@ #include linux/async.h #include linux/pm_runtime.h #include linux/netdevice.h +#include asm/vmcsinfo.h Did you just break the build on all other arches? Not nice. @@ -1038,6 +1039,11 @@ int device_add(struct device *dev) error = dpm_sysfs_add(dev); if (error) goto DPMError; +#if defined(CONFIG_KVM_INTEL) || defined(CONFIG_KVM_INTEL_MODULE) + error = vmcs_sysfs_add(dev); + if (error) + goto VMCSError; +#endif Oh my no, that's no way to ever do this, you know better than that, please fix. greg k-h -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 01/13] driver core: Add iommu_group tracking to struct device
On Fri, May 11, 2012 at 04:55:35PM -0600, Alex Williamson wrote: IOMMU groups allow IOMMU drivers to represent DMA visibility and isolation of devices. Multiple devices may be grouped together for the purposes of DMA. Placing a pointer on struct device enable easy access for things like streaming DMA programming and drivers like VFIO. Signed-off-by: Alex Williamson alex.william...@redhat.com Can't you get this today from the iommu_ops pointer that is on the bus that the device is associated with? Or can devices on a bus have different iommu_group pointers? thanks, greg k-h -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 02/13] iommu: IOMMU Groups
On Fri, May 11, 2012 at 04:55:41PM -0600, Alex Williamson wrote: IOMMU device groups are currently a rather vague associative notion with assembly required by the user or user level driver provider to do anything useful. This patch intends to grow the IOMMU group concept into something a bit more consumable. To do this, we first create an object representing the group, struct iommu_group. This structure is allocated (iommu_group_alloc) and filled (iommu_group_add_device) by the iommu driver. The iommu driver is free to add devices to the group using it's own set of policies. This allows inclusion of devices based on physical hardware or topology limitations of the platform, as well as soft requirements, such as multi-function trust levels or peer-to-peer protection of the interconnects. Each device may only belong to a single iommu group, which is linked from struct device.iommu_group. IOMMU groups are maintained using kobject reference counting, allowing for automatic removal of empty, unreferenced groups. It is the responsibility of the iommu driver to remove devices from the group (iommu_group_remove_device). IOMMU groups also include a userspace representation in sysfs under /sys/kernel/iommu_groups. When allocated, each group is given a dynamically assign ID (int). The ID is managed by the core IOMMU group code to support multiple heterogeneous iommu drivers, which could potentially collide in group naming/numbering. This also keeps group IDs to small, easily managed values. A directory is created under /sys/kernel/iommu_groups for each group. A further subdirectory named devices contains links to each device within the group. The iommu_group file in the device's sysfs directory, which formerly contained a group number when read, is now a link to the iommu group. Example: $ ls -l /sys/kernel/iommu_groups/26/devices/ snip As you are creating new sysfs files and directories, you need to also add the proper Documentation/ABI/ files at the same time. thanks, greg k-h -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 01/13] driver core: Add iommu_group tracking to struct device
On Fri, May 11, 2012 at 05:58:01PM -0600, Alex Williamson wrote: On Fri, 2012-05-11 at 16:38 -0700, Greg KH wrote: On Fri, May 11, 2012 at 04:55:35PM -0600, Alex Williamson wrote: IOMMU groups allow IOMMU drivers to represent DMA visibility and isolation of devices. Multiple devices may be grouped together for the purposes of DMA. Placing a pointer on struct device enable easy access for things like streaming DMA programming and drivers like VFIO. Signed-off-by: Alex Williamson alex.william...@redhat.com Can't you get this today from the iommu_ops pointer that is on the bus that the device is associated with? Or can devices on a bus have different iommu_group pointers? The latter, each device on a bus might be it's own group. This is often the case on x86 unless PCIe-to-PCI bridges obscure the device visibility. Thanks, Ah, ok, then I have no objection to add this to struct device: Acked-by: Greg Kroah-Hartman gre...@linuxfoundation.org -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 3/4] ksysfs: export VMCSINFO via sysfs
On Tue, Apr 17, 2012 at 09:52:42AM +0800, zhangyanfei wrote: 于 2012年04月13日 07:00, Greg KH 写道: On Wed, Apr 11, 2012 at 09:57:34AM +0800, zhangyanfei wrote: This patch creates sysfs file to export where VMCSINFO is allocated, as below: $ cat /sys/kernel/vmcsinfo 1cb88a0 2000 number on the left-hand side is the physical address of VMCSINFO, while the one on the right-hand side is the max size of VMCSINFO. Ick, why do you have 2 values in one sysfs file, that's not nice, or good. What's wrong with 2 different files? The reason why I put the 2 values in one sysfs file is that there is a similar file 'vmcoreinfo' in sysfs. # cat /sys/kernel/vmcoreinfo 1d75380 1000 Then that should be fixed as well, using two different file names now :( thanks, greg k-h -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 3/4] ksysfs: export VMCSINFO via sysfs
On Wed, Apr 11, 2012 at 09:57:34AM +0800, zhangyanfei wrote: This patch creates sysfs file to export where VMCSINFO is allocated, as below: $ cat /sys/kernel/vmcsinfo 1cb88a0 2000 number on the left-hand side is the physical address of VMCSINFO, while the one on the right-hand side is the max size of VMCSINFO. Ick, why do you have 2 values in one sysfs file, that's not nice, or good. What's wrong with 2 different files? Also, any new sysfs file you add needs to also have a Documentation/ABI entry added as well. But we can't accept this as-is, sorry, please split it up into 2 files. greg k-h -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH v2 0/2] kvm: iommu unmap fixes
On Wed, Apr 11, 2012 at 09:51:43AM -0600, Alex Williamson wrote: This is a documentation change only from the previous version. After discussing it, there is a potential page leak as noted in the updated changelog for the first patch. Thanks, formletter This is not the correct way to submit patches for inclusion in the stable kernel tree. Please read Documentation/stable_kernel_rules.txt for how to do this properly. /formletter -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [Xen-devel] [PATCH RFC V5 1/6] debugfs: Add support to print u32 array in debugfs
On Fri, Mar 30, 2012 at 04:49:12PM -0400, Konrad Rzeszutek Wilk wrote: On Fri, Mar 23, 2012 at 01:36:28PM +0530, Raghavendra K T wrote: From: Srivatsa Vaddagiri va...@linux.vnet.ibm.com Move the code from Xen to debugfs to make the code common for other users as well. Signed-off-by: Srivatsa Vaddagiri va...@linux.vnet.ibm.com Signed-off-by: Suzuki Poulose suz...@in.ibm.com Signed-off-by: Raghavendra K T raghavendra...@linux.vnet.ibm.com Signed-off-by: Konrad Rzeszutek Wilk konrad.w...@oracle.com Greg, I was thinking to stick this patch in my queue, but I need your OK since it touches fs/debugfs/file.c. That's fine with me: Acked-by: Greg Kroah-Hartman gre...@linuxfoundation.org -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [v3.0.y 1/2] KVM: x86: extend struct x86_emulate_ops with get_cpuid
On Thu, Mar 22, 2012 at 09:50:43AM +0100, Stefan Bader wrote: From eaee58e1433e1b16e686cfcdcbc207d4310a239f Mon Sep 17 00:00:00 2001 From: =?UTF-8?q?Stephan=20B=C3=A4rwolf?= stephan.baerw...@tu-ilmenau.de Date: Thu, 12 Jan 2012 16:43:03 +0100 Subject: [PATCH 7/8] KVM: x86: extend struct x86_emulate_ops with get_cpuid Stefan, what's with the crappy header here? And the Subject? Ick. Come on, forcing me to hand-edit your patches just makes me grumpy... greg k-h -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [v3.0.y 1/2] KVM: x86: extend struct x86_emulate_ops with get_cpuid
On Fri, Mar 23, 2012 at 06:47:38PM +0100, Stefan Bader wrote: On 23.03.2012 18:22, Greg KH wrote: On Thu, Mar 22, 2012 at 09:50:43AM +0100, Stefan Bader wrote: From eaee58e1433e1b16e686cfcdcbc207d4310a239f Mon Sep 17 00:00:00 2001 From: =?UTF-8?q?Stephan=20B=C3=A4rwolf?= stephan.baerw...@tu-ilmenau.de Date: Thu, 12 Jan 2012 16:43:03 +0100 Subject: [PATCH 7/8] KVM: x86: extend struct x86_emulate_ops with get_cpuid Stefan, what's with the crappy header here? And the Subject? Ick. Come on, forcing me to hand-edit your patches just makes me grumpy... greg k-h Greg, this is exactly the patch you get by using git format-patch and send it away with git send-email. So I would say any grumpy or constructive comments should go to the git development mailing list. ;) No it isn't at all. That's not how the patch looks using git-send-email for anything I've ever sent out. Don't blame the tool, use it correctly. Hint, look at the 7/8 crap, what is that there for? greg k-h -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: CVE-2012-0045 for 3.2.y, 3.0.y and 2.6.32.y (again)
On Thu, Mar 22, 2012 at 09:50:40AM +0100, Stefan Bader wrote: Resubmitting with more of the proper maintainers subscribed (note that Marcelo is one of them)... Again, for the others on the cc:, I can't take these in the stable tree(s) until I get an ack from the maintainers of the code. Especially given that the KVM developers have told me in the past to not take any patches unless they send them to me themselves... greg k-h -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 1/1 v3] PCI: Device specific reset function
On Mon, Mar 05, 2012 at 10:00:49AM +, Tadeusz Struk wrote: --- drivers/pci/pci.h|1 + drivers/pci/quirks.c | 33 +++-- include/linux/pci.h |1 + 3 files changed, 29 insertions(+), 6 deletions(-) Please read Documentation/SubmittingPatches for how to properly create, and send, patches that are in a format that can be accepted. Hint, also run your patch through scripts/checkpatch.pl to find the obvious problems that are in it, to keep us from having to do that for you... This e-mail and any attachments may contain confidential material for the sole use of the intended recipient(s). Any review or distribution by others is strictly prohibited. If you are not the intended recipient, please contact the sender and delete all copies. I have been told that such email footers require that the patch be deleted and never be accepted. Please fix this if you expect your patches to be able to be applied. greg k-h -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[18/48] KVM: Device assignment permission checks
3.1-stable review patch. If anyone has any objections, please let me know. -- From: Alex Williamson alex.william...@redhat.com (cherry picked from commit 3d27e23b17010c668db311140b1770c78fb9) Only allow KVM device assignment to attach to devices which: - Are not bridges - Have BAR resources (assume others are special devices) - The user has permissions to use Assigning a bridge is a configuration error, it's not supported, and typically doesn't result in the behavior the user is expecting anyway. Devices without BAR resources are typically chipset components that also don't have host drivers. We don't want users to hold such devices captive or cause system problems by fencing them off into an iommu domain. We determine permission to use by testing whether the user has access to the PCI sysfs resource files. By default a normal user will not have access to these files, so it provides a good indication that an administration agent has granted the user access to the device. [Yang Bai: add missing #include] [avi: fix comment style] Signed-off-by: Alex Williamson alex.william...@redhat.com Signed-off-by: Yang Bai hamo...@gmail.com Signed-off-by: Marcelo Tosatti mtosa...@redhat.com Signed-off-by: Greg Kroah-Hartman gre...@suse.de --- Documentation/virtual/kvm/api.txt |4 ++ virt/kvm/assigned-dev.c | 75 ++ 2 files changed, 79 insertions(+) --- a/Documentation/virtual/kvm/api.txt +++ b/Documentation/virtual/kvm/api.txt @@ -1134,6 +1134,10 @@ following flags are specified: The KVM_DEV_ASSIGN_ENABLE_IOMMU flag is a mandatory option to ensure isolation of the device. Usages not specifying this flag are deprecated. +Only PCI header type 0 devices with PCI BAR resources are supported by +device assignment. The user requesting this ioctl must have read/write +access to the PCI sysfs resource files associated with the device. + 4.49 KVM_DEASSIGN_PCI_DEVICE Capability: KVM_CAP_DEVICE_DEASSIGNMENT --- a/virt/kvm/assigned-dev.c +++ b/virt/kvm/assigned-dev.c @@ -17,6 +17,8 @@ #include linux/pci.h #include linux/interrupt.h #include linux/slab.h +#include linux/namei.h +#include linux/fs.h #include irq.h static struct kvm_assigned_dev_kernel *kvm_find_assigned_dev(struct list_head *head, @@ -474,12 +476,73 @@ out: return r; } +/* + * We want to test whether the caller has been granted permissions to + * use this device. To be able to configure and control the device, + * the user needs access to PCI configuration space and BAR resources. + * These are accessed through PCI sysfs. PCI config space is often + * passed to the process calling this ioctl via file descriptor, so we + * can't rely on access to that file. We can check for permissions + * on each of the BAR resource files, which is a pretty clear + * indicator that the user has been granted access to the device. + */ +static int probe_sysfs_permissions(struct pci_dev *dev) +{ +#ifdef CONFIG_SYSFS + int i; + bool bar_found = false; + + for (i = PCI_STD_RESOURCES; i = PCI_STD_RESOURCE_END; i++) { + char *kpath, *syspath; + struct path path; + struct inode *inode; + int r; + + if (!pci_resource_len(dev, i)) + continue; + + kpath = kobject_get_path(dev-dev.kobj, GFP_KERNEL); + if (!kpath) + return -ENOMEM; + + /* Per sysfs-rules, sysfs is always at /sys */ + syspath = kasprintf(GFP_KERNEL, /sys%s/resource%d, kpath, i); + kfree(kpath); + if (!syspath) + return -ENOMEM; + + r = kern_path(syspath, LOOKUP_FOLLOW, path); + kfree(syspath); + if (r) + return r; + + inode = path.dentry-d_inode; + + r = inode_permission(inode, MAY_READ | MAY_WRITE | MAY_ACCESS); + path_put(path); + if (r) + return r; + + bar_found = true; + } + + /* If no resources, probably something special */ + if (!bar_found) + return -EPERM; + + return 0; +#else + return -EINVAL; /* No way to control the device without sysfs */ +#endif +} + static int kvm_vm_ioctl_assign_device(struct kvm *kvm, struct kvm_assigned_pci_dev *assigned_dev) { int r = 0, idx; struct kvm_assigned_dev_kernel *match; struct pci_dev *dev; + u8 header_type; if (!(assigned_dev-flags KVM_DEV_ASSIGN_ENABLE_IOMMU)) return -EINVAL; @@ -510,6 +573,18 @@ static int kvm_vm_ioctl_assign_device(st r = -EINVAL; goto out_free; } + + /* Don't allow bridges to be assigned */ + pci_read_config_byte(dev, PCI_HEADER_TYPE, header_type); + if
[17/48] KVM: Remove ability to assign a device without iommu support
3.1-stable review patch. If anyone has any objections, please let me know. -- From: Alex Williamson alex.william...@redhat.com (cherry picked from commit 423873736b78f549fbfa2f715f2e4de7e6c5e1e9) This option has no users and it exposes a security hole that we can allow devices to be assigned without iommu protection. Make KVM_DEV_ASSIGN_ENABLE_IOMMU a mandatory option. Signed-off-by: Alex Williamson alex.william...@redhat.com Signed-off-by: Marcelo Tosatti mtosa...@redhat.com Signed-off-by: Greg Kroah-Hartman gre...@suse.de --- Documentation/virtual/kvm/api.txt |3 +++ virt/kvm/assigned-dev.c | 18 +- 2 files changed, 12 insertions(+), 9 deletions(-) --- a/Documentation/virtual/kvm/api.txt +++ b/Documentation/virtual/kvm/api.txt @@ -1131,6 +1131,9 @@ following flags are specified: /* Depends on KVM_CAP_IOMMU */ #define KVM_DEV_ASSIGN_ENABLE_IOMMU(1 0) +The KVM_DEV_ASSIGN_ENABLE_IOMMU flag is a mandatory option to ensure +isolation of the device. Usages not specifying this flag are deprecated. + 4.49 KVM_DEASSIGN_PCI_DEVICE Capability: KVM_CAP_DEVICE_DEASSIGNMENT --- a/virt/kvm/assigned-dev.c +++ b/virt/kvm/assigned-dev.c @@ -481,6 +481,9 @@ static int kvm_vm_ioctl_assign_device(st struct kvm_assigned_dev_kernel *match; struct pci_dev *dev; + if (!(assigned_dev-flags KVM_DEV_ASSIGN_ENABLE_IOMMU)) + return -EINVAL; + mutex_lock(kvm-lock); idx = srcu_read_lock(kvm-srcu); @@ -538,16 +541,14 @@ static int kvm_vm_ioctl_assign_device(st list_add(match-list, kvm-arch.assigned_dev_head); - if (assigned_dev-flags KVM_DEV_ASSIGN_ENABLE_IOMMU) { - if (!kvm-arch.iommu_domain) { - r = kvm_iommu_map_guest(kvm); - if (r) - goto out_list_del; - } - r = kvm_assign_device(kvm, match); + if (!kvm-arch.iommu_domain) { + r = kvm_iommu_map_guest(kvm); if (r) goto out_list_del; } + r = kvm_assign_device(kvm, match); + if (r) + goto out_list_del; out: srcu_read_unlock(kvm-srcu, idx); @@ -587,8 +588,7 @@ static int kvm_vm_ioctl_deassign_device( goto out; } - if (match-flags KVM_DEV_ASSIGN_ENABLE_IOMMU) - kvm_deassign_device(kvm, match); + kvm_deassign_device(kvm, match); kvm_free_assigned_device(kvm, match); -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[15/48] KVM guest: prevent tracing recursion with kvmclock
3.1-stable review patch. If anyone has any objections, please let me know. -- From: Avi Kivity a...@redhat.com (cherry picked from commit 95ef1e52922cf75b1ea2eae54ef886f2cc47eecb) Prevent tracing of preempt_disable() in get_cpu_var() in kvm_clock_read(). When CONFIG_DEBUG_PREEMPT is enabled, preempt_disable/enable() are traced and this causes the function_graph tracer to go into an infinite recursion. By open coding the preempt_disable() around the get_cpu_var(), we can use the notrace version which prevents preempt_disable/enable() from being traced and prevents the recursion. Based on a similar patch for Xen from Jeremy Fitzhardinge. Tested-by: Gleb Natapov g...@redhat.com Acked-by: Steven Rostedt rost...@goodmis.org Signed-off-by: Avi Kivity a...@redhat.com Signed-off-by: Greg Kroah-Hartman gre...@suse.de --- arch/x86/kernel/kvmclock.c |5 +++-- 1 file changed, 3 insertions(+), 2 deletions(-) --- a/arch/x86/kernel/kvmclock.c +++ b/arch/x86/kernel/kvmclock.c @@ -74,9 +74,10 @@ static cycle_t kvm_clock_read(void) struct pvclock_vcpu_time_info *src; cycle_t ret; - src = get_cpu_var(hv_clock); + preempt_disable_notrace(); + src = __get_cpu_var(hv_clock); ret = pvclock_clocksource_read(src); - put_cpu_var(hv_clock); + preempt_enable_notrace(); return ret; } -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[16/48] KVM: x86: Prevent starting PIT timers in the absence of irqchip support
3.1-stable review patch. If anyone has any objections, please let me know. -- From: Jan Kiszka jan.kis...@siemens.com (cherry picked from commit 0924ab2cfa98b1ece26c033d696651fd62896c69) User space may create the PIT and forgets about setting up the irqchips. In that case, firing PIT IRQs will crash the host: BUG: unable to handle kernel NULL pointer dereference at 0128 IP: [a10f6280] kvm_set_irq+0x30/0x170 [kvm] ... Call Trace: [a11228c1] pit_do_work+0x51/0xd0 [kvm] [81071431] process_one_work+0x111/0x4d0 [81071bb2] worker_thread+0x152/0x340 [81075c8e] kthread+0x7e/0x90 [815a4474] kernel_thread_helper+0x4/0x10 Prevent this by checking the irqchip mode before starting a timer. We can't deny creating the PIT if the irqchips aren't set up yet as current user land expects this order to work. Signed-off-by: Jan Kiszka jan.kis...@siemens.com Signed-off-by: Marcelo Tosatti mtosa...@redhat.com Signed-off-by: Greg Kroah-Hartman gre...@suse.de --- arch/x86/kvm/i8254.c | 10 +++--- 1 file changed, 7 insertions(+), 3 deletions(-) --- a/arch/x86/kvm/i8254.c +++ b/arch/x86/kvm/i8254.c @@ -338,11 +338,15 @@ static enum hrtimer_restart pit_timer_fn return HRTIMER_NORESTART; } -static void create_pit_timer(struct kvm_kpit_state *ps, u32 val, int is_period) +static void create_pit_timer(struct kvm *kvm, u32 val, int is_period) { + struct kvm_kpit_state *ps = kvm-arch.vpit-pit_state; struct kvm_timer *pt = ps-pit_timer; s64 interval; + if (!irqchip_in_kernel(kvm)) + return; + interval = muldiv64(val, NSEC_PER_SEC, KVM_PIT_FREQ); pr_debug(create pit timer, interval is %llu nsec\n, interval); @@ -394,13 +398,13 @@ static void pit_load_count(struct kvm *k /* FIXME: enhance mode 4 precision */ case 4: if (!(ps-flags KVM_PIT_FLAGS_HPET_LEGACY)) { - create_pit_timer(ps, val, 0); + create_pit_timer(kvm, val, 0); } break; case 2: case 3: if (!(ps-flags KVM_PIT_FLAGS_HPET_LEGACY)){ - create_pit_timer(ps, val, 1); + create_pit_timer(kvm, val, 1); } break; default: -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 0/4] KVM updates for Linux 3.1.8
On Thu, Jan 12, 2012 at 12:39:50PM +0200, Avi Kivity wrote: The following patches want to be applied to the 3.1 branch. All users must update. Do any of these also need to go to the 3.0 branch? thanks, greg k-h -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 0/4] KVM updates for Linux 3.1.8
On Thu, Jan 12, 2012 at 06:25:43PM +0200, Avi Kivity wrote: On 01/12/2012 06:15 PM, Greg KH wrote: On Thu, Jan 12, 2012 at 12:39:50PM +0200, Avi Kivity wrote: The following patches want to be applied to the 3.1 branch. All users must update. Do any of these also need to go to the 3.0 branch? Yes. But for kvm we regression test stable patches, and I haven't done this yet. I find the normal stable process, while very convenient, frightening. What do you mean by this? And thanks for the patches, they are all now queued up. greg k-h -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH RFC V2 2/5] debugfs: Renaming of xen functions and change unsigned to u32
On Mon, Oct 24, 2011 at 02:58:47PM +0530, Raghavendra K T wrote: On 10/24/2011 03:49 AM, Greg KH wrote: On Mon, Oct 24, 2011 at 12:34:59AM +0530, Raghavendra K T wrote: Renaming of xen functions and change unsigned to u32. Why not just rename when you move the functions? Why the extra step? Intention was only clarity. Yes, if this patch is an overhead, I 'll combine both the patches. Yeah, it makes more sense as it originally confused me why you were adding a xen_* function to the debugfs core code :) thanks, greg k-h -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH RFC V2 1/5] debugfs: Add support to print u32 array in debugfs
On Mon, Oct 24, 2011 at 12:34:04AM +0530, Raghavendra K T wrote: Add debugfs support to print u32-arrays in debugfs. Move the code from Xen to debugfs to make the code common for other users as well. You forgot the kerneldoc for the function explaining what it is and how to use it, and the EXPORT_SYMBOL_GPL() marking for the global function as that's the only way it will be able to be used, right? thanks, greg k-h -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH RFC V2 2/5] debugfs: Renaming of xen functions and change unsigned to u32
On Mon, Oct 24, 2011 at 12:34:59AM +0530, Raghavendra K T wrote: Renaming of xen functions and change unsigned to u32. Why not just rename when you move the functions? Why the extra step? greg k-h -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 02/10] Driver core: Add iommu_ops to bus_type
On Tue, Sep 13, 2011 at 04:54:02PM +0200, Roedel, Joerg wrote: --- a/include/linux/device.h +++ b/include/linux/device.h @@ -22,6 +22,7 @@ #include linux/types.h #include linux/module.h #include linux/pm.h +#include linux/iommu.h Ick, please don't add new #includes to device.h, it makes the whole build slower. Just pre-declare the structure and all should be fine. Don't you need to also redo the other patches in this series based on the other comments you received? greg k-h -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 02/10] Driver core: Add iommu_ops to bus_type
On Tue, Sep 13, 2011 at 05:38:11PM +0200, Roedel, Joerg wrote: On Tue, Sep 13, 2011 at 10:58:55AM -0400, Greg KH wrote: On Tue, Sep 13, 2011 at 04:54:02PM +0200, Roedel, Joerg wrote: --- a/include/linux/device.h +++ b/include/linux/device.h @@ -22,6 +22,7 @@ #include linux/types.h #include linux/module.h #include linux/pm.h +#include linux/iommu.h Ick, please don't add new #includes to device.h, it makes the whole build slower. Just pre-declare the structure and all should be fine. Hmm, since linux/iommu.h provides 'struct iommu_ops', and this patch adds a 'struct iommu_ops' to 'struct bus_type', wouldn't a simple forward declaration make the bus_type incomplete in most other places? No, just like it doesn't make iommu.h incomplete as you used a struct bus_type there. To lower the impact I can move the 'struct iommu_ops' to a seperate header file instead. No, that should not be necessary. greg k-h -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 02/10] Driver core: Add iommu_ops to bus_type
On Wed, Sep 07, 2011 at 05:41:45PM +0200, Joerg Roedel wrote: This is the starting point to make the iommu_ops used for the iommu-api a per-bus-type structure. It is required to easily implement bus-specific setup in the iommu-layer. The first user will be the iommu-group attribute in sysfs. Signed-off-by: Joerg Roedel joerg.roe...@amd.com --- drivers/base/bus.c | 16 drivers/iommu/iommu.c |4 include/linux/device.h |9 + include/linux/iommu.h |2 ++ 4 files changed, 31 insertions(+), 0 deletions(-) diff --git a/drivers/base/bus.c b/drivers/base/bus.c index 000e7b2..34ac706 100644 --- a/drivers/base/bus.c +++ b/drivers/base/bus.c @@ -1028,6 +1028,22 @@ void bus_sort_breadthfirst(struct bus_type *bus, } EXPORT_SYMBOL_GPL(bus_sort_breadthfirst); +#ifdef CONFIG_IOMMU_API +int bus_set_iommu(struct bus_type *bus, struct iommu_ops *ops) +{ + if (bus-iommu_ops != NULL) + return -EBUSY; Busy? + + bus-iommu_ops = ops; + + /* Do IOMMU specific setup for this bus-type */ + iommu_bus_init(bus, ops); + + return 0; +} +EXPORT_SYMBOL_GPL(bus_set_iommu); I don't understand what this function is for, and who would call it. Please provide kerneldoc that explains this. +#endif + int __init buses_init(void) { bus_kset = kset_create_and_add(bus, bus_uevent_ops, NULL); diff --git a/drivers/iommu/iommu.c b/drivers/iommu/iommu.c index 30b0644..3b24a5b 100644 --- a/drivers/iommu/iommu.c +++ b/drivers/iommu/iommu.c @@ -34,6 +34,10 @@ void register_iommu(struct iommu_ops *ops) iommu_ops = ops; } +void iommu_bus_init(struct bus_type *bus, struct iommu_ops *ops) +{ +} + bool iommu_found(void) { return iommu_ops != NULL; diff --git a/include/linux/device.h b/include/linux/device.h index c20dfbf..8240b2a 100644 --- a/include/linux/device.h +++ b/include/linux/device.h @@ -22,6 +22,7 @@ #include linux/types.h #include linux/module.h #include linux/pm.h +#include linux/iommu.h #include linux/atomic.h #include asm/device.h @@ -67,6 +68,9 @@ extern void bus_remove_file(struct bus_type *, struct bus_attribute *); * @resume: Called to bring a device on this bus out of sleep mode. * @pm: Power management operations of this bus, callback the specific * device driver's pm-ops. + * @iommu_ops IOMMU specific operations for this bus, used to attach IOMMU + * driver implementations to a bus and allow the driver to do + * bus-specific setup So why is this just not set by the bus itself, making the above function not needed at all? confused, greg k-h -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 0/10] IOMMU: Make iommu_ops per-bus_type
On Wed, Sep 07, 2011 at 05:41:43PM +0200, Joerg Roedel wrote: Hi, here is the new version of the patch-set to make the iommu_ops used in the iommu-api a bus_type property. This will allow us to move code out of the iommu drivers into generic code and it simplifies the implementation of the Alex' device-group property. Greg, can you have a look at patch 2 please and tell me if you have any objections? I object, please see my comments. With this version the patch-set is complete (not as the first RFC post). It converts all iommu drivers to use the new registration interface and completly removes the register_iommu interface. Regards, Joerg Diffstat: arch/ia64/kvm/kvm-ia64.c |3 +- arch/x86/kvm/x86.c |3 +- drivers/base/bus.c | 16 ++ drivers/iommu/amd_iommu.c |2 +- drivers/iommu/intel-iommu.c|2 +- drivers/iommu/iommu.c | 58 drivers/iommu/msm_iommu.c |2 +- drivers/iommu/omap-iommu.c |2 +- drivers/media/video/omap3isp/isp.c |2 +- include/linux/device.h |9 + include/linux/iommu.h | 21 +++-- virt/kvm/iommu.c |4 +- 12 files changed, 86 insertions(+), 38 deletions(-) So the overall work here makes for more code, right? I fail to see the benifit, what am I missing? greg k-h -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 02/10] Driver core: Add iommu_ops to bus_type
On Wed, Sep 07, 2011 at 09:19:19PM +0200, Joerg Roedel wrote: Hi Greg, the bus_set_iommu() function will be called by the IOMMU driver. There can be different drivers for the same bus, depending on the hardware. On PCI for example, there can be the Intel or the AMD IOMMU driver that implement the iommu-api and that register for that bus. Why are you pushing this down into the driver core? What other busses becides PCI use/need this? If you can have a different IOMMU driver on the same bus, then wouldn't this be a per-device thing instead of a per-bus thing? On Wed, Sep 07, 2011 at 11:47:50AM -0700, Greg KH wrote: +#ifdef CONFIG_IOMMU_API +int bus_set_iommu(struct bus_type *bus, struct iommu_ops *ops) +{ + if (bus-iommu_ops != NULL) + return -EBUSY; Busy? Yes, it signals to the IOMMU driver that another driver has already registered for that bus. In the previous register_iommu() interface this was just a BUG(), but I think returning an error to the caller is better. It can be turned back into a BUG() if it is considered better, though. Can you ever have more than one IOMMU driver per bus? If so, this seems wrong (see above.) + + bus-iommu_ops = ops; + + /* Do IOMMU specific setup for this bus-type */ + iommu_bus_init(bus, ops); + + return 0; +} +EXPORT_SYMBOL_GPL(bus_set_iommu); I don't understand what this function is for, and who would call it. It is called by the IOMMU driver. Please provide kerneldoc that explains this. Will do. @@ -67,6 +68,9 @@ extern void bus_remove_file(struct bus_type *, struct bus_attribute *); * @resume: Called to bring a device on this bus out of sleep mode. * @pm: Power management operations of this bus, callback the specific * device driver's pm-ops. + * @iommu_ops IOMMU specific operations for this bus, used to attach IOMMU + * driver implementations to a bus and allow the driver to do + * bus-specific setup So why is this just not set by the bus itself, making the above function not needed at all? The IOMMUs are usually devices on the bus itself, so they are initialized after the bus is set up and the devices on it are populated. So the function can not be called on bus initialization because the IOMMU is not ready at this point. Ok, that makes more sense, please state as much in the documentation. thanks, greg k-h -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH v2] staging: zcache: support multiple clients, prep for KVM and RAMster
On Fri, Jul 01, 2011 at 07:58:45PM +0300, Dan Carpenter wrote: On Fri, Jul 01, 2011 at 07:31:54AM -0700, Dan Magenheimer wrote: Off by one errors are kind of insidious. People cut and paste them and they spread. If someone adds a new list of chunks then there are now two examples that are correct and two which have an extra element, so it's 50/50 that he'll copy the right one. True, but these are NOT off-by-one errors... they are correct-but-slightly-ugly code snippets. (To clarify, I said the *ugliness* arose when debugging an off-by-one error.) What I meant was the new arrays are *one* element too large. Patches always welcome, and I agree that these should be fixed eventually, assuming the code doesn't go away completely first.. I'm simply stating the position that going through another test/submit cycling to fix correct-but-slightly-ugly code which is present only to surface information for experiments is not high on my priority list right now... unless GregKH says he won't accept the patch. Btw, looking at it again, this seems like maybe a similar issue in zbud_evict_zbpg(): 516 for (i = 0; i MAX_CHUNK; i++) { 517 retry_unbud_list_i: MAX_CHUNKS is NCHUNKS - 1. Shouldn't that be i NCHUNKS so that we reach the last element in the list? No, the last element in that list is unused. There is a comment to that effect someplace in the code. (These lists are keeping track of pages with chunks of available space and the last entry would have no available space so is always empty.) The comment says that the first element isn't used. Perhaps the comment is out of date and now it's the last element that isn't used. To me, it makes sense to have an unused first element, but it doesn't make sense to have an unused last element. Why not just make the array smaller? Also if the last element of the original arrays isn't used, then does that mean the last *two* elements of the new arrays aren't used? Getting array sizes wrong is not a correct-but-slightly-ugly thing. *grumble* *grumble* *grumble*. But it doesn't crash the system so I'm fine with it going in as is... I'm not. Please fix this up. I'll not accept it until it is. thanks, greg k-h -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH v2] staging: zcache: support multiple clients, prep for KVM and RAMster
On Thu, Jun 30, 2011 at 12:01:08PM -0700, Dan Magenheimer wrote: Hi Greg -- I think this patch is now ready for staging-next and for merging when the 3.1 window opens. Please let me know if you need any logistics done differently. Ok, thanks, I'll queue it up later this week for 3.1 greg k-h -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 1/1] [virt] virtio-blk: Use ida to allocate disk index
On Thu, Jun 09, 2011 at 08:51:05AM +0930, Rusty Russell wrote: On Wed, 08 Jun 2011 09:08:29 -0400, Mark Wu d...@redhat.com wrote: Hi Rusty, Yes, I can't figure out an instance of disk probing in parallel either, but as per the following commit, I think we still need use lock for safety. What's your opinion? commit 4034cc68157bfa0b6622efe368488d3d3e20f4e6 Author: Tejun Heo t...@kernel.org Date: Sat Feb 21 11:04:45 2009 +0900 [SCSI] sd: revive sd_index_lock Commit f27bac2761cab5a2e212dea602d22457a9aa6943 which converted sd to use ida instead of idr incorrectly removed sd_index_lock around id allocation and free. idr/ida do have internal locks but they protect their free object lists not the allocation itself. The caller is responsible for that. This missing synchronization led to the same id being assigned to multiple devices leading to oops. I'm confused. Tejun, Greg, anyone can probes happen in parallel? If so, I'll have to review all my drivers. I know we've tried it in the past, at the PCI device level, and ran into some issues, but I don't remember if that code ever made it into the mainline kernel or not. greg k-h -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [stable] [PATCH] kvm: fix crash on irqfd deassign
On Thu, Mar 17, 2011 at 10:53:33AM +0200, Michael S. Tsirkin wrote: irqfd in kvm used flush_work incorrectly: it assumed that work scheduled previously can't run after flush_work, but since kvm uses a non-reentrant workqueue (by means of schedule_work) we need flush_work_sync to get that guarantee. Signed-off-by: Michael S. Tsirkin m...@redhat.com Reported-by: Jean-Philippe Menil jean-philippe.me...@univ-nantes.fr Tested-by: Jean-Philippe Menil jean-philippe.me...@univ-nantes.fr --- Note: this is needed for kernel 2.6.39 and earlier. This is not the correct way to submit patches for inclusion in the stable kernel tree. Please read Documentation/stable_kernel_rules.txt for how to do this properly. thanks, greg k-h -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH V3] VFIO driver: Non-privileged user level PCI drivers
On Sat, Jul 17, 2010 at 10:45:23AM +0200, Piotr Jaroszy??ski wrote: On 16 July 2010 23:58, Tom Lyon p...@cisco.com wrote: The VFIO driver is used to allow privileged AND non-privileged processes to implement user-level device drivers for any well-behaved PCI, PCI-X, and PCIe devices. Thanks for working on that! I wonder whether it's possible to say what are the chances of it being merged to mainline and which version we might be talking about? We still have a long way to go before you need to worry about what kernel version it's going to show up in... thanks, greg k-h -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH V2] VFIO driver: Non-privileged user level PCI drivers
On Thu, Jun 10, 2010 at 06:58:37PM -0700, Tom Lyon wrote: On Thursday 10 June 2010 10:27:36 am Konrad Rzeszutek Wilk wrote: +EXPORT_SYMBOL(uiommu_fdget); EXPORT_SYMBOL_GPL .. snip +EXPORT_SYMBOL(uiommu_put); ditto. Is there a definitive explanation somewhere of when to use each? For a driver like this, that is very tied to the way that the kernel works, I would recommend the _GPL marking, like the UIO interface has. But that's just me. :) thanks, greg k-h -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH V2] VFIO driver: Non-privileged user level PCI drivers
On Wed, Jun 09, 2010 at 02:04:53PM +0300, Avi Kivity wrote: On 06/09/2010 12:21 AM, Tom Lyon wrote: The VFIO driver is used to allow privileged AND non-privileged processes to implement user-level device drivers for any well-behaved PCI, PCI-X, and PCIe devices. Signed-off-by: Tom Lyonp...@cisco.com --- This version now requires an IOMMU domain to be set before any access to device registers is granted (except that config space may be read). In addition, the VFIO_DMA_MAP_ANYWHERE is dropped - it used the dma_map_sg API which does not have sufficient controls around IOMMU usage. The IOMMU domain is obtained from the 'uiommu' driver which is included in this patch. Various locking, security, and documentation issues have also been fixed. Please commit - it or me! But seriously, who gets to commit this? Avi for KVM? Definitely not me. or GregKH for drivers? I guess. If this ever gets that far, I'll be glad to take it. thanks, greg k-h -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [stable] [PATCH 3/3][STABLE] KVM: add schedule check to napi_enable call
On Thu, Jun 03, 2010 at 01:38:31PM -0600, Bruce Rogers wrote: virtio_net: Add schedule check to napi_enable call Under harsh testing conditions, including low memory, the guest would stop receiving packets. With this patch applied we no longer see any problems in the driver while performing these tests for extended periods of time. Make sure napi is scheduled subsequent to each napi_enable. Signed-off-by: Bruce Rogers brog...@novell.com Signed-off-by: Olaf Kirch o...@suse.de I need a git commit id for this one as well. thanks, greg k-h -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [stable] [PATCH 3/3][STABLE] KVM: add schedule check to napi_enable call
On Thu, Jun 03, 2010 at 04:17:34PM -0600, Bruce Rogers wrote: On 6/3/2010 at 03:03 PM, Greg KH g...@kroah.com wrote: On Thu, Jun 03, 2010 at 01:38:31PM -0600, Bruce Rogers wrote: virtio_net: Add schedule check to napi_enable call Under harsh testing conditions, including low memory, the guest would stop receiving packets. With this patch applied we no longer see any problems in the driver while performing these tests for extended periods of time. Make sure napi is scheduled subsequent to each napi_enable. Signed-off-by: Bruce Rogers brog...@novell.com Signed-off-by: Olaf Kirch o...@suse.de I need a git commit id for this one as well. This one is not upstream. Then I can't include it in the -stable tree, so why are you sending it to me? :) thanks, greg k-h -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 2/2] pci: allow sysfs file owner to read device dependent config space
On Wed, May 12, 2010 at 06:29:57PM -0700, Chris Wright wrote: The PCI config space bin_attr read handler has a hardcoded CAP_SYS_ADMIN check to verify privileges before allowing a user to read device dependent config space. This is meant to protect from an unprivileged user potentially locking up the box. When assigning a PCI device directly to a guest with libvirt and KVM, the sysfs config space file is chown'd to the unprivileged user that the KVM guest will run as. The guest needs to have full access to the device's config space since it's responsible for driving the device. However, despite being the owner of the sysfs file, the CAP_SYS_ADMIN check will not allow read access beyond the config header. With this patch the sysfs file owner is also considered privileged enough to read all of the config space. Signed-off-by: Chris Wright chr...@sous-sol.org --- drivers/pci/pci-sysfs.c |4 +++- Jesse, any objection to this going through my tree as it will depend on the sysfs change? thanks, greg k-h -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 2/2 v2] pci: check caps from sysfs file open to read device dependent config space
On Thu, May 13, 2010 at 10:43:07AM -0700, Chris Wright wrote: * Alan Cox (a...@lxorguk.ukuu.org.uk) wrote: I agree with the problem - but IMHO the fix is to require opening the file checks CAP_SYS_something instead: not to hack the read method and make it even weirder and more un-Linux than it is now. This patch does that. Not as convenient from the KVM/libvirt point of view because it is not prepared to do this setup before dropping privileges and launching the VM. So does that mean that this patch doesn't solve your original problem here? thanks, greg k-h -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [RFC PATCH] sysfs: bin_attr permission checking
On Wed, May 12, 2010 at 11:47:13AM -0700, Chris Wright wrote: The PCI config space bin_attr read handler has a hardcoded CAP_SYS_ADMIN check to verify privileges before allowing a user to read device dependent config space. This is meant to protect from an unprivileged user potentially locking up the box. When assigning a PCI device directly to a guest with libvirt and KVM, the sysfs config space file is chown'd to the user that the KVM guest will run as. The guest needs to have full access to the device's config space since it's responsible for driving the device. However, despite being the owner of the sysfs file, the CAP_SYS_ADMIN check will not allow read access beyond the config header. This patch adds a new bin_attr-read_file() callback which adds a struct file to the normal argument list. This allows an implementation such as PCI config space bin_attr read_file handler to check both inode permission as well as privileges to determine whether to allow privileged actions through the handler. Ick, this is all because we like showing different information if the user is privileged or not :( Turns out, that this probably isn't the best user api to implement, remind me never to do that again... This is just RFC, although I've tested that it does allow the chown + read to work as expected. Any other ideas of how to handle this are welcome. Can we just pass in the 'file' for all users of the bin files instead of the dentry? You can always get the dentry from the file (as your patch showes), and there isn't that many users of this interface. I'd really rather not have two different types of callbacks here. thanks, greg k-h -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [RFC PATCH] sysfs: bin_attr permission checking
On Wed, May 12, 2010 at 12:28:28PM -0700, Chris Wright wrote: * Greg KH (g...@kroah.com) wrote: On Wed, May 12, 2010 at 11:47:13AM -0700, Chris Wright wrote: The PCI config space bin_attr read handler has a hardcoded CAP_SYS_ADMIN check to verify privileges before allowing a user to read device dependent config space. This is meant to protect from an unprivileged user potentially locking up the box. When assigning a PCI device directly to a guest with libvirt and KVM, the sysfs config space file is chown'd to the user that the KVM guest will run as. The guest needs to have full access to the device's config space since it's responsible for driving the device. However, despite being the owner of the sysfs file, the CAP_SYS_ADMIN check will not allow read access beyond the config header. This patch adds a new bin_attr-read_file() callback which adds a struct file to the normal argument list. This allows an implementation such as PCI config space bin_attr read_file handler to check both inode permission as well as privileges to determine whether to allow privileged actions through the handler. Ick, this is all because we like showing different information if the user is privileged or not :( yup Turns out, that this probably isn't the best user api to implement, remind me never to do that again... Yeah, it's challenging to deal with. Alternative here is a new config sysfs entry that doesn't have this 'feature'. (I looked into trying to allow manageing the internal capable() check externally, not so pretty). That would require people to update libpci and maybe their scripts as well, which wouldn't be as good. This is just RFC, although I've tested that it does allow the chown + read to work as expected. Any other ideas of how to handle this are welcome. Can we just pass in the 'file' for all users of the bin files instead of the dentry? The dentry doesn't currently go beyond sysfs/bin.c. So, yes, I pushed 'file' through to last level in bin.c before -read(), and can certinaly just push through to -read() as well. That would be better than having a 'read_file' callback, right? You can always get the dentry from the file (as your patch showes), and there isn't that many users of this interface. I'd really rather not have two different types of callbacks here. Absolutely, this is just RFC (i.e. quicker to compile and test). What about write()? Sure, might as well make it symmetrical :) thanks, greg k-h -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: 2.6.33.3: possible recursive locking detected
On Wed, May 12, 2010 at 12:34:20PM +0800, Américo Wang wrote: On Tue, May 11, 2010 at 08:03:20AM -0700, Greg KH wrote: On Tue, May 11, 2010 at 09:33:50PM +1000, CaT wrote: On Wed, May 05, 2010 at 10:52:50AM +0800, Américo Wang wrote: On Wed, May 5, 2010 at 10:32 AM, Yong Zhang yong.zh...@windriver.com wrote: On Tue, May 04, 2010 at 11:37:37AM +0300, Avi Kivity wrote: On 05/04/2010 10:03 AM, CaT wrote: I'm currently running 2.6.33.3 in a KVM instance emulating a core2duo on 1 cpu with virtio HDs running on top of a core2duo host running 2.6.33.3. qemu-kvm version 0.12.3. Can you try commit 6992f5334995af474c2b58d010d08bc597f0f2fe in the latest kernel? Hmm, 2.6.33 -stable has commit 846f99749ab68bbc7f75c74fec305de675b1a1bf? Actually, these 3 commits fixed it: 6992f5334995af474c2b58d010d08bc597f0f2fe sysfs: Use one lockdep class per sysfs ttribute. a2db6842873c8e5a70652f278d469128cb52db70 sysfs: Only take active references on attributes. e72ceb8ccac5f770b3e696e09bb673dca7024b20 sysfs: Remove sysfs_get/put_active_two However, there are many other patches needed to amend these, so I think it's not suitable for -stable to include, perhaps a revert of 846f99749ab68bbc7f75c74fec305de675b1a1bf is better. Slightly at a loss as to what to do, now. It's a virt instance so I can apply patches at will but, well, clarity is good. :) Just ignore the lockdep warnings as they are bogus, or turn them off, or use .34-rc7, as they are resolved there. How about reverting that patch for 2.6.33 stable tree? No, as that patch is not reverted in Linus's tree, right? Just turn off lockdep if this is bothering you. thanks, greg k-h -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: 2.6.33.3: possible recursive locking detected
On Tue, May 11, 2010 at 09:33:50PM +1000, CaT wrote: On Wed, May 05, 2010 at 10:52:50AM +0800, Américo Wang wrote: On Wed, May 5, 2010 at 10:32 AM, Yong Zhang yong.zh...@windriver.com wrote: On Tue, May 04, 2010 at 11:37:37AM +0300, Avi Kivity wrote: On 05/04/2010 10:03 AM, CaT wrote: I'm currently running 2.6.33.3 in a KVM instance emulating a core2duo on 1 cpu with virtio HDs running on top of a core2duo host running 2.6.33.3. qemu-kvm version 0.12.3. Can you try commit 6992f5334995af474c2b58d010d08bc597f0f2fe in the latest kernel? Hmm, 2.6.33 -stable has commit 846f99749ab68bbc7f75c74fec305de675b1a1bf? Actually, these 3 commits fixed it: 6992f5334995af474c2b58d010d08bc597f0f2fe sysfs: Use one lockdep class per sysfs ttribute. a2db6842873c8e5a70652f278d469128cb52db70 sysfs: Only take active references on attributes. e72ceb8ccac5f770b3e696e09bb673dca7024b20 sysfs: Remove sysfs_get/put_active_two However, there are many other patches needed to amend these, so I think it's not suitable for -stable to include, perhaps a revert of 846f99749ab68bbc7f75c74fec305de675b1a1bf is better. Slightly at a loss as to what to do, now. It's a virt instance so I can apply patches at will but, well, clarity is good. :) Just ignore the lockdep warnings as they are bogus, or turn them off, or use .34-rc7, as they are resolved there. thanks, greg k-h -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[69/98] KVM: remove unused load_segment_descriptor_to_kvm_desct
2.6.32-stable review patch. If anyone has any objections, please let us know. -- From: Marcelo Tosatti mtosa...@redhat.com Commit 78ce64a384 in v2.6.32.12 introduced a warning due to unused load_segment_descriptor_to_kvm_desct helper, which has been opencoded by this commit. On upstream, the helper was removed as part of a different commit. Remove the now unused function. Signed-off-by: Marcelo Tosatti mtosa...@redhat.com Signed-off-by: Greg Kroah-Hartman gre...@suse.de --- arch/x86/kvm/x86.c | 12 1 file changed, 12 deletions(-) --- a/arch/x86/kvm/x86.c +++ b/arch/x86/kvm/x86.c @@ -4155,18 +4155,6 @@ static u16 get_segment_selector(struct k return kvm_seg.selector; } -static int load_segment_descriptor_to_kvm_desct(struct kvm_vcpu *vcpu, - u16 selector, - struct kvm_segment *kvm_seg) -{ - struct desc_struct seg_desc; - - if (load_guest_segment_descriptor(vcpu, selector, seg_desc)) - return 1; - seg_desct_to_kvm_desct(seg_desc, selector, kvm_seg); - return 0; -} - static int kvm_load_realmode_segment(struct kvm_vcpu *vcpu, u16 selector, int seg) { struct kvm_segment segvar = { -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 2.6.32.12] KVM: remove unused load_segment_descriptor_to_kvm_desct
On Tue, Apr 27, 2010 at 11:14:14AM -0300, Marcelo Tosatti wrote: Function is now unused. Signed-off-by: Marcelo Tosatti mtosa...@redhat.com Was this patch also upstream? Why does stable need it? thanks, greg k-h -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 2.6.32.12] KVM: remove unused load_segment_descriptor_to_kvm_desct
On Tue, Apr 27, 2010 at 05:52:20PM +0300, Gleb Natapov wrote: On Tue, Apr 27, 2010 at 07:43:55AM -0700, Greg KH wrote: On Tue, Apr 27, 2010 at 11:14:14AM -0300, Marcelo Tosatti wrote: Function is now unused. Signed-off-by: Marcelo Tosatti mtosa...@redhat.com Was this patch also upstream? Why does stable need it? It is upstream, but as part of another commit. My guess is that when patches were ported to stable this chunk was lost due to wrong merge resolving. Ok, but as this patch doesn't do anything, why is it needed in -stable? Also, this type of information (why it isn't upstream), is REQUIRED in order to be accepted. Please include it. thanks, greg k-h -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH] KVM: VMX: Disable unrestricted guest when EPT disabled
On Thu, Mar 18, 2010 at 02:11:19PM +0800, Sheng Yang wrote: Otherwise would cause VMEntry failure when using ept=0 on unrestricted guest supported processors. Signed-off-by: Sheng Yang sh...@linux.intel.com Now included through a different submission. thanks, greg k-h -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 0/1] uio_pci_generic: extensions to allow access for non-privileged processes
On Fri, Apr 02, 2010 at 09:43:35AM +0300, Avi Kivity wrote: On 04/01/2010 10:24 PM, Tom Lyon wrote: But there are multiple msi-x interrupts, how do you know which one triggered? You don't. This would suck for KVM, I guess, but we'd need major rework of the generic UIO stuff to have a separate event channel for each MSI-X. Doesn't it suck for non-kvm in the same way? Multiple vectors are there for a reason. For example, if you have a multiqueue NIC, you'd have to process all queues instead of just the one that triggered. For my purposes, collapsing all the MSI-Xs into one MSI-look-alike is fine, because I'd be using MSI anyways if I could. The weird Intel 82599 VF only supports MSI-X. So one big question is - do we expand the whole UIO framework for KVM requirements, or do we split off either KVM or non-VM into a separate driver? Hans or Greg - care to opine? Currently kvm does device assignment with its own code, I'd like to unify it with uio, not split it off. Separate notifications for msi-x interrupts are just as useful for uio as they are for kvm. I agree, there should not be a difference here for KVM vs. the normal version. thanks, greg k-h -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[027/116] hrtimer: Tune hrtimer_interrupt hang logic
2.6.32-stable review patch. If anyone has any objections, please let us know. -- From: Thomas Gleixner t...@linutronix.de commit 41d2e494937715d3150e5c75d01f0e75ae899337 upstream. The hrtimer_interrupt hang logic adjusts min_delta_ns based on the execution time of the hrtimer callbacks. This is error-prone for virtual machines, where a guest vcpu can be scheduled out during the execution of the callbacks (and the callbacks themselves can do operations that translate to blocking operations in the hypervisor), which in can lead to large min_delta_ns rendering the system unusable. Replace the current heuristics with something more reliable. Allow the interrupt code to try 3 times to catch up with the lost time. If that fails use the total time spent in the interrupt handler to defer the next timer interrupt so the system can catch up with other things which got delayed. Limit that deferment to 100ms. The retry events and the maximum time spent in the interrupt handler are recorded and exposed via /proc/timer_list Inspired by a patch from Marcelo. Reported-by: Michael Tokarev m...@tls.msk.ru Signed-off-by: Thomas Gleixner t...@linutronix.de Tested-by: Marcelo Tosatti mtosa...@redhat.com Cc: kvm@vger.kernel.org Cc: Jeremy Fitzhardinge jer...@goop.org Signed-off-by: Greg Kroah-Hartman gre...@suse.de --- include/linux/hrtimer.h | 13 -- kernel/hrtimer.c | 96 +++ kernel/time/timer_list.c |5 +- 3 files changed, 70 insertions(+), 44 deletions(-) --- a/include/linux/hrtimer.h +++ b/include/linux/hrtimer.h @@ -162,10 +162,11 @@ struct hrtimer_clock_base { * @expires_next: absolute time of the next event which was scheduled * via clock_set_next_event() * @hres_active: State of high resolution mode - * @check_clocks: Indictator, when set evaluate time source and clock - * event devices whether high resolution mode can be - * activated. - * @nr_events: Total number of timer interrupt events + * @hang_detected: The last hrtimer interrupt detected a hang + * @nr_events: Total number of hrtimer interrupt events + * @nr_retries:Total number of hrtimer interrupt retries + * @nr_hangs: Total number of hrtimer interrupt hangs + * @max_hang_time: Maximum time spent in hrtimer_interrupt */ struct hrtimer_cpu_base { spinlock_t lock; @@ -173,7 +174,11 @@ struct hrtimer_cpu_base { #ifdef CONFIG_HIGH_RES_TIMERS ktime_t expires_next; int hres_active; + int hang_detected; unsigned long nr_events; + unsigned long nr_retries; + unsigned long nr_hangs; + ktime_t max_hang_time; #endif }; --- a/kernel/hrtimer.c +++ b/kernel/hrtimer.c @@ -557,7 +557,7 @@ hrtimer_force_reprogram(struct hrtimer_c static int hrtimer_reprogram(struct hrtimer *timer, struct hrtimer_clock_base *base) { - ktime_t *expires_next = __get_cpu_var(hrtimer_bases).expires_next; + struct hrtimer_cpu_base *cpu_base = __get_cpu_var(hrtimer_bases); ktime_t expires = ktime_sub(hrtimer_get_expires(timer), base-offset); int res; @@ -582,7 +582,16 @@ static int hrtimer_reprogram(struct hrti if (expires.tv64 0) return -ETIME; - if (expires.tv64 = expires_next-tv64) + if (expires.tv64 = cpu_base-expires_next.tv64) + return 0; + + /* +* If a hang was detected in the last timer interrupt then we +* do not schedule a timer which is earlier than the expiry +* which we enforced in the hang detection. We want the system +* to make progress. +*/ + if (cpu_base-hang_detected) return 0; /* @@ -590,7 +599,7 @@ static int hrtimer_reprogram(struct hrti */ res = tick_program_event(expires, 0); if (!IS_ERR_VALUE(res)) - *expires_next = expires; + cpu_base-expires_next = expires; return res; } @@ -1217,29 +1226,6 @@ static void __run_hrtimer(struct hrtimer #ifdef CONFIG_HIGH_RES_TIMERS -static int force_clock_reprogram; - -/* - * After 5 iteration's attempts, we consider that hrtimer_interrupt() - * is hanging, which could happen with something that slows the interrupt - * such as the tracing. Then we force the clock reprogramming for each future - * hrtimer interrupts to avoid infinite loops and use the min_delta_ns - * threshold that we will overwrite. - * The next tick event will be scheduled to 3 times we currently spend on - * hrtimer_interrupt(). This gives a good compromise, the cpus will spend - * 1/4 of their time to process the hrtimer interrupts. This is enough to - *
[49/74] KVM: allow userspace to adjust kvmclock offset
2.6.32-stable review patch. If anyone has any objections, please let us know. -- From: Glauber Costa glom...@redhat.com (cherry picked from afbcf7ab8d1bc8c2d04792f6d9e786e0adeb328d) When we migrate a kvm guest that uses pvclock between two hosts, we may suffer a large skew. This is because there can be significant differences between the monotonic clock of the hosts involved. When a new host with a much larger monotonic time starts running the guest, the view of time will be significantly impacted. Situation is much worse when we do the opposite, and migrate to a host with a smaller monotonic clock. This proposed ioctl will allow userspace to inform us what is the monotonic clock value in the source host, so we can keep the time skew short, and more importantly, never goes backwards. Userspace may also need to trigger the current data, since from the first migration onwards, it won't be reflected by a simple call to clock_gettime() anymore. [marcelo: future-proof abi with a flags field] [jan: fix KVM_GET_CLOCK by clearing flags field instead of checking it] Signed-off-by: Glauber Costa glom...@redhat.com Signed-off-by: Marcelo Tosatti mtosa...@redhat.com Signed-off-by: Avi Kivity a...@redhat.com Signed-off-by: Greg Kroah-Hartman gre...@suse.de --- Documentation/kvm/api.txt | 36 ++ arch/x86/include/asm/kvm_host.h |1 arch/x86/kvm/x86.c | 42 +++- include/linux/kvm.h |9 4 files changed, 87 insertions(+), 1 deletion(-) --- a/arch/x86/include/asm/kvm_host.h +++ b/arch/x86/include/asm/kvm_host.h @@ -412,6 +412,7 @@ struct kvm_arch{ unsigned long irq_sources_bitmap; unsigned long irq_states[KVM_IOAPIC_NUM_PINS]; u64 vm_init_tsc; + s64 kvmclock_offset; }; struct kvm_vm_stat { --- a/arch/x86/kvm/x86.c +++ b/arch/x86/kvm/x86.c @@ -680,7 +680,8 @@ static void kvm_write_guest_time(struct /* With all the info we got, fill in the values */ vcpu-hv_clock.system_time = ts.tv_nsec + -(NSEC_PER_SEC * (u64)ts.tv_sec); +(NSEC_PER_SEC * (u64)ts.tv_sec) + v-kvm-arch.kvmclock_offset; + /* * The interface expects us to write an even number signaling that the * update is finished. Since the guest won't see the intermediate @@ -1227,6 +1228,7 @@ int kvm_dev_ioctl_check_extension(long e case KVM_CAP_PIT2: case KVM_CAP_PIT_STATE2: case KVM_CAP_SET_IDENTITY_MAP_ADDR: + case KVM_CAP_ADJUST_CLOCK: r = 1; break; case KVM_CAP_COALESCED_MMIO: @@ -2424,6 +2426,44 @@ long kvm_arch_vm_ioctl(struct file *filp r = 0; break; } + case KVM_SET_CLOCK: { + struct timespec now; + struct kvm_clock_data user_ns; + u64 now_ns; + s64 delta; + + r = -EFAULT; + if (copy_from_user(user_ns, argp, sizeof(user_ns))) + goto out; + + r = -EINVAL; + if (user_ns.flags) + goto out; + + r = 0; + ktime_get_ts(now); + now_ns = timespec_to_ns(now); + delta = user_ns.clock - now_ns; + kvm-arch.kvmclock_offset = delta; + break; + } + case KVM_GET_CLOCK: { + struct timespec now; + struct kvm_clock_data user_ns; + u64 now_ns; + + ktime_get_ts(now); + now_ns = timespec_to_ns(now); + user_ns.clock = kvm-arch.kvmclock_offset + now_ns; + user_ns.flags = 0; + + r = -EFAULT; + if (copy_to_user(argp, user_ns, sizeof(user_ns))) + goto out; + r = 0; + break; + } + default: ; } --- a/Documentation/kvm/api.txt +++ b/Documentation/kvm/api.txt @@ -593,6 +593,42 @@ struct kvm_irqchip { } chip; }; +4.27 KVM_GET_CLOCK + +Capability: KVM_CAP_ADJUST_CLOCK +Architectures: x86 +Type: vm ioctl +Parameters: struct kvm_clock_data (out) +Returns: 0 on success, -1 on error + +Gets the current timestamp of kvmclock as seen by the current guest. In +conjunction with KVM_SET_CLOCK, it is used to ensure monotonicity on scenarios +such as migration. + +struct kvm_clock_data { + __u64 clock; /* kvmclock current value */ + __u32 flags; + __u32 pad[9]; +}; + +4.28 KVM_SET_CLOCK + +Capability: KVM_CAP_ADJUST_CLOCK +Architectures: x86 +Type: vm ioctl +Parameters: struct kvm_clock_data (in) +Returns: 0 on success, -1 on error + +Sets the current timestamp of kvmclock to the valued specific in its parameter. +In conjunction with KVM_GET_CLOCK, it is used to ensure monotonicity on scenarios +such as migration.
Re: kernel memory allocation bug in 2.6.27.32-2.6.27.41 kvm section
On Sun, Dec 27, 2009 at 12:13:00PM +0200, Avi Kivity wrote: On 12/17/2009 05:35 PM, Oscon wrote: Hello! I can't register new account in bugzilla.kernel.org. / my ISP's spamfilter problem (?) maybe./ -- I sent this mail to Greg KH (2.6.27.y maintainer), he sent me: Can you get the kvm maintainers to agree that this is correct? thanks, greg k-h --- So the bug : I found a memory allocation bug in kvm/mmu.c kvm_main.c. /in kvm_destroy_vm()/ Affected kernel: 2.6.27.32-2.6.27.41 Mainline kernel (2.6.32) is not affected. (Modified kvm subsystem.) Cause: http://git.kernel.org/?p=linux%2Fkernel%2Fgit%2Fstable%2Flinux-2.6.27.y.git;a=commitdiff_plain;h=d2127c8300fb1ec54af56faee17170e7a525326d Solution: Revert this patch. This bug can cause local DoS in the host system. Looks like some other patch is missing in 2.6.27.y. Not sure what it is. But it's safer to revert this patch for now. Ok, I've now reverted it. thanks, greg k-h -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH RFC] pci: expose function reset capability in sysfs
On Sun, Jul 26, 2009 at 08:11:39PM +0300, Michael S. Tsirkin wrote: Some devices allow an individual function to be reset without affecting other functions in the same device: that's what pci_reset_function does. For devices that have this support, expose reset attribite in sysfs. Please add the proper documentation to Documentation/ABI for any new sysfs file you create. static int pci_create_capabilities_sysfs(struct pci_dev *dev) { int retval; @@ -943,7 +965,21 @@ static int pci_create_capabilities_sysfs(struct pci_dev *dev) /* Active State Power Management */ pcie_aspm_create_sysfs_dev_files(dev); + if (!pci_probe_reset_function(dev)) { + retval = device_create_file(dev-dev, reset_attr); + if (retval) + goto error; + } So you only add the file if there is a reset function, which is fine, but later on: @@ -1037,6 +1073,7 @@ static void pci_remove_capabilities_sysfs(struct pci_dev *dev) } pcie_aspm_remove_sysfs_dev_files(dev); + device_remove_file(dev-dev, reset_attr); You always remove the file, if it has been created or not. That could cause problems in the future, please only remove a file if you have actually added it to the sysfs tree. thanks, greg k-h -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH RFC] pci: expose function reset capability in sysfs
On Mon, Jul 27, 2009 at 06:52:38PM +0300, Michael S. Tsirkin wrote: On Mon, Jul 27, 2009 at 08:01:46AM -0700, Greg KH wrote: + if (!pci_probe_reset_function(dev)) { + retval = device_create_file(dev-dev, reset_attr); + if (retval) + goto error; + } So you only add the file if there is a reset function, which is fine, but later on: @@ -1037,6 +1073,7 @@ static void pci_remove_capabilities_sysfs(struct pci_dev *dev) } pcie_aspm_remove_sysfs_dev_files(dev); + device_remove_file(dev-dev, reset_attr); You always remove the file, if it has been created or not. That could cause problems in the future, please only remove a file if you have actually added it to the sysfs tree. thanks, greg k-h OK. I think, however, that it's better to avoid probing the function on cleanup path just to figure out whether the file needs to be removed. Can this info be stored in struct pci_dev? Along the lines of: Fine with me. You forgot the documentation though :) thanks, greg k-h -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH RFC] pci: expose function reset capability in sysfs
On Mon, Jul 27, 2009 at 11:37:48PM +0300, Michael S. Tsirkin wrote: On Mon, Jul 27, 2009 at 09:14:23AM -0700, Greg KH wrote: Fine with me. You forgot the documentation though :) This enough? pci: expose function reset capability in sysfs Some devices allow an individual function to be reset without affecting other functions in the same device: that's what pci_reset_function does. For devices that have this support, expose reset attribite in sysfs. This is useful e.g. for virtualization, where a qemu userspace process wants to reset the device when the guest is reset, to emulate machine reboot as closely as possible. Signed-off-by: Michael S. Tsirkin m...@redhat.com Looks good to me: Acked-by: Greg Kroah-Hartman gre...@suse.de -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCHv5] uio: add generic driver for PCI 2.3 devices
On Mon, Jul 20, 2009 at 10:52:27PM +0300, Michael S. Tsirkin wrote: On Mon, Jul 20, 2009 at 09:09:43PM +0200, Hans J. Koch wrote: On Mon, Jul 20, 2009 at 10:29:34AM +0300, Michael S. Tsirkin wrote: This adds a generic uio driver that can bind to any PCI device. First user will be virtualization where a qemu userspace process needs to give guest OS access to the device. Interrupts are handled using the Interrupt Disable bit in the PCI command register and Interrupt Status bit in the PCI status register. All devices compliant to PCI 2.3 (circa 2002) and all compliant PCI Express devices should support these bits. Driver detects this support, and won't bind to devices which do not support the Interrupt Disable Bit in the command register. It's expected that more features of interest to virtualization will be added to this driver in the future. Possibilities are: mmap for device resources, MSI/MSI-X, eventfd (to interface with kvm), iommu. Thanks for adding the docs! Looks alright to me. Thanks, Hans Signed-off-by: Michael S. Tsirkin m...@redhat.com Acked-by: Chris Wright chr...@redhat.com Signed-off-by: Hans J. Koch h...@linutronix.de Jesse just acked this patch in a private mail, as well. Acked-by: Jesse Barnes jbar...@virtuousgeek.org Odd, but ok... I'll queue it up now. thanks, greg k-h -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCHv4] uio: add generic driver for PCI 2.3 devices
On Thu, Jul 16, 2009 at 05:12:55PM +0200, Hans J. Koch wrote: On Thu, Jul 16, 2009 at 05:07:10PM +0300, Michael S. Tsirkin wrote: On Wed, Jul 15, 2009 at 03:08:29PM -0700, Greg KH wrote: How about moving that documentation into a place that people will notice it, like the rest of the UIO documentation? Greg, would it make more sense to add this to Documentation/DocBook/uio-howto.xml, or to create Documentation/uio_pci_generic.txt ? Hi Michael, I'd prefer to have it in uio-howto.xml so that people only have to look in one place. In does not have to be very detailled, just a short explanation what this driver is all about and a short example how to use it. I agree, that would be the best place for it. thanks, greg k-h -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCHv4] uio: add generic driver for PCI 2.3 devices
On Thu, Jul 16, 2009 at 08:03:46PM +0300, Michael S. Tsirkin wrote: On Thu, Jul 16, 2009 at 08:52:08AM -0700, Greg KH wrote: On Thu, Jul 16, 2009 at 05:12:55PM +0200, Hans J. Koch wrote: On Thu, Jul 16, 2009 at 05:07:10PM +0300, Michael S. Tsirkin wrote: On Wed, Jul 15, 2009 at 03:08:29PM -0700, Greg KH wrote: How about moving that documentation into a place that people will notice it, like the rest of the UIO documentation? Greg, would it make more sense to add this to Documentation/DocBook/uio-howto.xml, or to create Documentation/uio_pci_generic.txt ? Hi Michael, I'd prefer to have it in uio-howto.xml so that people only have to look in one place. In does not have to be very detailled, just a short explanation what this driver is all about and a short example how to use it. I agree, that would be the best place for it. thanks, greg k-h OK. Could you take a look at the following please? Looks good to me. Want to resend both of the patches so I can apply them to my tree? thanks, greg k-h -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCHv4] uio: add generic driver for PCI 2.3 devices
On Thu, Jul 16, 2009 at 03:31:01PM +0300, Michael S. Tsirkin wrote: On Wed, Jul 15, 2009 at 03:08:29PM -0700, Greg KH wrote: On Wed, Jul 15, 2009 at 11:13:40PM +0300, Michael S. Tsirkin wrote: This adds a generic uio driver that can bind to any PCI device. First user will be virtualization where a qemu userspace process needs to give guest OS access to the device. Interrupts are handled using the Interrupt Disable bit in the PCI command register and Interrupt Status bit in the PCI status register. All devices compliant to PCI 2.3 (circa 2002) and all compliant PCI Express devices should support these bits. Driver detects this support, and won't bind to devices which do not support the Interrupt Disable Bit in the command register. It's expected that more features of interest to virtualization will be added to this driver in the future. Possibilities are: mmap for device resources, MSI/MSI-X, eventfd (to interface with kvm), iommu. Acked-by: Chris Wright chr...@redhat.com Signed-off-by: Michael S. Tsirkin m...@redhat.com --- Hans, Greg, please review and consider for upstream. This is intended to solve the problem in virtualization that shared interrupts do not work with assigned devices. Earlier versions of this patch have circulated on k...@vger. How does this play with the pci-stub driver that I thought was written to solve this very problem? AFAIK the problem pci stub was written to solve is simply to bind to a device. You then have to use another kernel module which looks the device up with something like pci_get_bus_and_slot to do anything useful. In particular, for non-shared interrupts, we can disable the interrupt in the apic. But this does not work well for shared interrupts. Thus this work. The uio driver will be used in virtualization scenarious, a couple of possible ones that have been mentioned on the kvm list are: - device assignment (guest access to device) for simple devices with shared interrupts: emulating PCI is tricky enough to better be done in userspace. shared interrupt support is important as it happens with real devices - simple communication between guest and host: we create a virtual device in host, and userspace driver in guest gets events and passes them on to e.g. dbus. shared interrupt support is important to avoid wasting irqs Ah, ok, thanks for all of the explanation, that makes sense. greg k-h -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCHv4] uio: add generic driver for PCI 2.3 devices
On Wed, Jul 15, 2009 at 11:13:40PM +0300, Michael S. Tsirkin wrote: This adds a generic uio driver that can bind to any PCI device. First user will be virtualization where a qemu userspace process needs to give guest OS access to the device. Interrupts are handled using the Interrupt Disable bit in the PCI command register and Interrupt Status bit in the PCI status register. All devices compliant to PCI 2.3 (circa 2002) and all compliant PCI Express devices should support these bits. Driver detects this support, and won't bind to devices which do not support the Interrupt Disable Bit in the command register. It's expected that more features of interest to virtualization will be added to this driver in the future. Possibilities are: mmap for device resources, MSI/MSI-X, eventfd (to interface with kvm), iommu. Acked-by: Chris Wright chr...@redhat.com Signed-off-by: Michael S. Tsirkin m...@redhat.com --- Hans, Greg, please review and consider for upstream. This is intended to solve the problem in virtualization that shared interrupts do not work with assigned devices. Earlier versions of this patch have circulated on k...@vger. How does this play with the pci-stub driver that I thought was written to solve this very problem? Will it conflict? In fact, it looks like you copied the comments for this driver directly from the pci-stub driver :) How about moving that documentation into a place that people will notice it, like the rest of the UIO documentation? And right now you are just sending the irq to userspace, what is userspace supposed to do with it? Do you have a userspace program that uses this interface today to verify that everything works? If so, care to provide a pointer to it? thanks, greg k-h -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [RFC PATCH v2 09/19] net: Add vbus_enet driver
On Thu, Apr 09, 2009 at 09:37:10AM -0700, Stephen Hemminger wrote: +static int tx_ringlen = 256; +module_param(tx_ringlen, int, 0444); + +#undef PDEBUG /* undef it, just in case */ +#ifdef VBUS_ENET_DEBUG +# define PDEBUG(fmt, args...) printk(KERN_DEBUG vbus_enet: fmt, ## args) +#else +# define PDEBUG(fmt, args...) /* not debugging: nothing */ +#endif Why reinvent pr_debug()? Even more important, use dev_dbg() instead please, that uniquly describes your device and driver together, which is what you need/want, and it ties into the dynamic debug work, so you don't need a special kernel config option. thanks, greg k-h -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH v10 4/7] PCI: add SR-IOV API for Physical Function driver
On Tue, Mar 10, 2009 at 09:37:53AM +0800, Yu Zhao wrote: On Tue, Mar 10, 2009 at 03:39:01AM +0800, Greg KH wrote: On Mon, Mar 09, 2009 at 04:25:05PM +0800, Yu Zhao wrote: + pci_device_add(virtfn, virtfn-bus); Greg is probably going to ding you here for adding the device, then creating the symlinks. I believe it's now best practice to create the symlinks first, so there's no window where userspace can get confused. Yes, but unfortunately we can't create links before adding a device. I double checked device_add(), there is no place for those links to be created before it sends uevent. So for now, we have to trigger another uevent for those links. What exactly are you trying to do with a symlink here that you need to do it this way? I vaguely remember you mentioning this in the past, but I thought you had dropped the symlinks after our conversation about this very problem. I'd like to create some symlinks to reflect the relationship between Physical Function and its associated Virtual Functions. The Physical Function is like a master device that controls the allocation of its Virtual Functions and owns the device physical resource. The Virtual Functions are like slave devices of the Physical Function. For example, if 01:00.0 is a Physical Function and 02:00.0 is a Virtual Function associated with 01:00.0. Then the symlinks (virtfnN and physfn) would look like: $ ls -l /sys/bus/pci/devices/:01:00.0/ ... ... virtfn0 - ../:02:00.0 ... virtfn1 - ../:02:00.1 ... virtfn2 - ../:02:00.2 ... $ ls -l /sys/bus/pci/devices/:02:00.0/ ... ... physfn - ../:01:00.0 ... This is very useful for userspace applications, both KVM and Xen need to know this kind of relationship so they can request the permission from a Physical Function before using its associated Virtual Functions. Ok, but then make sure you never rely on a udev rule or notifier to see these symlinks when the device is added to the kernel, as there will be a nice race condition there :) thanks, greg k-h -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH v10 1/7] PCI: initialize and release SR-IOV capability
On Tue, Mar 10, 2009 at 09:19:44AM +0800, Yu Zhao wrote: On Sat, Mar 07, 2009 at 10:38:45AM +0800, Greg KH wrote: On Fri, Mar 06, 2009 at 01:08:10PM -0700, Matthew Wilcox wrote: On Fri, Feb 20, 2009 at 02:54:42PM +0800, Yu Zhao wrote: + list_for_each_entry(pdev, dev-bus-devices, bus_list) + if (pdev-sriov) + break; + if (list_empty(dev-bus-devices) || !pdev-sriov) + pdev = NULL; + ctrl = 0; + if (!pdev pci_ari_enabled(dev-bus)) + ctrl |= PCI_SRIOV_CTRL_ARI; + I don't like this loop. At the end of a list_for_each_entry() loop, pdev will not be pointing at a pci_device, it'll be pointing to some offset from dev-bus-devices. So checking pdev-sriov at this point is really, really bad. I would prefer to see something like this: ctrl = 0; list_for_each_entry(pdev, dev-bus-devices, bus_list) { if (pdev-sriov) goto ari_enabled; } if (pci_ari_enabled(dev-bus)) ctrl = PCI_SRIOV_CTRL_ARI; ari_enabled: pci_write_config_word(dev, pos + PCI_SRIOV_CTRL, ctrl); No, please use bus_for_each_dev() instead, or bus_find_device(), don't walk the bus list by hand. I'm kind of surprised that even builds. Hm, in looking at the 2.6.29-rc kernels, I notice it will not even build at all, you are now forced to use those functions, which is good. The devices haven't been added at this time, so we can't use bus_for_each_dev(). I guess that's why the `bus-devices' exists, and actually pci_bus_add_devices() walks the bus list same way to retrieve the devices and add them. ah, this is struct pci_bus, not struct bus_type, my mistake. sorry for the noise, greg k-h -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH v10 4/7] PCI: add SR-IOV API for Physical Function driver
On Mon, Mar 09, 2009 at 04:25:05PM +0800, Yu Zhao wrote: + pci_device_add(virtfn, virtfn-bus); Greg is probably going to ding you here for adding the device, then creating the symlinks. I believe it's now best practice to create the symlinks first, so there's no window where userspace can get confused. Yes, but unfortunately we can't create links before adding a device. I double checked device_add(), there is no place for those links to be created before it sends uevent. So for now, we have to trigger another uevent for those links. What exactly are you trying to do with a symlink here that you need to do it this way? I vaguely remember you mentioning this in the past, but I thought you had dropped the symlinks after our conversation about this very problem. thanks, greg k-h -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH v10 0/7] PCI: Linux kernel SR-IOV support
On Sun, Mar 08, 2009 at 09:01:09AM -0600, Matthew Wilcox wrote: On Sun, Mar 08, 2009 at 04:30:16PM +0200, Avi Kivity wrote: Matthew Wilcox wrote: On Tue, Feb 24, 2009 at 12:47:38PM +0200, Avi Kivity wrote: Do those patches allow using a VF on the host (in other words, does the kernel emulate config space accesses)? SR-IOV hardware handles config space accesses to virtual functions. No kernel changes needed for that aspect of it. Patches 2 and 3 of the patchset that enables SR/IOV in kvm [1] suggest that at the config space is only partially implemented. [1] http://thread.gmane.org/gmane.comp.emulators.kvm.devel/29034 I believe Red Hat are now members of the PCI SIG, you really should download the SR-IOV spec that Yu Zhao keeps pointing at. It's going to be very hard to have a sensible discussion if you haven't read it. But if any kernel developers are working for companies that are not members of the PCI SIG, and wish to read the PCI specs, please let me know as we are working to enable this to happen through the Linux Foundation. thanks, greg k-h -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH v10 0/7] PCI: Linux kernel SR-IOV support
On Fri, Mar 06, 2009 at 12:44:11PM -0700, Matthew Wilcox wrote: Physical Function driver patches for Intel 82576 NIC are available: http://patchwork.kernel.org/patch/8063/ http://patchwork.kernel.org/patch/8064/ http://patchwork.kernel.org/patch/8065/ http://patchwork.kernel.org/patch/8066/ I need to review this driver; I haven't done that yet. Has anyone else? The driver was rejected by the upstream developers, who said it would never be accepted. thanks, greg k-h -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH v10 1/7] PCI: initialize and release SR-IOV capability
On Fri, Mar 06, 2009 at 01:08:10PM -0700, Matthew Wilcox wrote: On Fri, Feb 20, 2009 at 02:54:42PM +0800, Yu Zhao wrote: + list_for_each_entry(pdev, dev-bus-devices, bus_list) + if (pdev-sriov) + break; + if (list_empty(dev-bus-devices) || !pdev-sriov) + pdev = NULL; + ctrl = 0; + if (!pdev pci_ari_enabled(dev-bus)) + ctrl |= PCI_SRIOV_CTRL_ARI; + I don't like this loop. At the end of a list_for_each_entry() loop, pdev will not be pointing at a pci_device, it'll be pointing to some offset from dev-bus-devices. So checking pdev-sriov at this point is really, really bad. I would prefer to see something like this: ctrl = 0; list_for_each_entry(pdev, dev-bus-devices, bus_list) { if (pdev-sriov) goto ari_enabled; } if (pci_ari_enabled(dev-bus)) ctrl = PCI_SRIOV_CTRL_ARI; ari_enabled: pci_write_config_word(dev, pos + PCI_SRIOV_CTRL, ctrl); No, please use bus_for_each_dev() instead, or bus_find_device(), don't walk the bus list by hand. I'm kind of surprised that even builds. Hm, in looking at the 2.6.29-rc kernels, I notice it will not even build at all, you are now forced to use those functions, which is good. Has anyone even tried to build this patch recently? thanks, greg k-h -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH v10 4/7] PCI: add SR-IOV API for Physical Function driver
On Fri, Mar 06, 2009 at 01:37:18PM -0700, Matthew Wilcox wrote: On Fri, Feb 20, 2009 at 02:54:45PM +0800, Yu Zhao wrote: + virtfn-sysdata = dev-bus-sysdata; + virtfn-dev.parent = dev-dev.parent; + virtfn-dev.bus = dev-dev.bus; + virtfn-devfn = devfn; + virtfn-hdr_type = PCI_HEADER_TYPE_NORMAL; + virtfn-cfg_size = PCI_CFG_SPACE_EXP_SIZE; + virtfn-error_state = pci_channel_io_normal; + virtfn-current_state = PCI_UNKNOWN; + virtfn-is_pcie = 1; + virtfn-pcie_type = PCI_EXP_TYPE_ENDPOINT; + virtfn-dma_mask = 0x; + virtfn-vendor = dev-vendor; + virtfn-subsystem_vendor = dev-subsystem_vendor; + virtfn-class = dev-class; There seems to be a certain amount of commonality between this and pci_scan_device(). Have you considered trying to make a common helper function, or does it not work out well? + pci_device_add(virtfn, virtfn-bus); Greg is probably going to ding you here for adding the device, then creating the symlinks. I believe it's now best practice to create the symlinks first, so there's no window where userspace can get confused. If the uevent gets sent before the symlinks are created, it's a bug. thanks, greg k-h -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH v3 0/6] ATS capability support for Intel IOMMU
On Thu, Feb 26, 2009 at 10:50:35AM +0800, Yu Zhao wrote: On Sun, Feb 15, 2009 at 06:59:10AM +0800, Grant Grundler wrote: On Thu, Feb 12, 2009 at 08:50:32PM +0800, Yu Zhao wrote: This patch series implements Address Translation Service support for the Intel IOMMU. ATS makes the PCI Endpoint be able to request the DMA address translation from the IOMMU and cache the translation in the Endpoint, thus alleviate IOMMU pressure and improve the hardware performance in the I/O virtualization environment. Changelog: v2 - v3 1, throw error message if VT-d hardware detects invalid descriptor on Queued Invalidation interface (David Woodhouse) 2, avoid using pci_find_ext_capability every time when reading ATS Invalidate Queue Depth (Matthew Wilcox) Changelog: v1 - v2 added 'static' prefix to a local LIST_HEAD (Andrew Morton) Yu Zhao (6): PCI: support the ATS capability VT-d: parse ATSR in DMA Remapping Reporting Structure VT-d: add queue invalidation fault status support VT-d: add device IOTLB invalidation support VT-d: cleanup iommu_flush_iotlb_psi and flush_unmaps VT-d: support the device IOTLB drivers/pci/dmar.c | 230 ++ Yu, Can you please add something to Documentation/PCI/pci.txt? New API I'm seeing are: +extern int pci_enable_ats(struct pci_dev *dev, int ps); +extern void pci_disable_ats(struct pci_dev *dev); +extern int pci_ats_queue_depth(struct pci_dev *dev); Yes, I'll document these new API. Do these also need to be EXPORT_SYMBOL_GPL() as well? Or are drivers never expected to call the above? PCI device driver shouldn't use these API, only IOMMU driver (can't be module) would use them. Anyway it's a good idea to export them :-) Don't export them if no one is using them, that's just a waste of space. thanks, greg k-h -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 1/2] PCI: add some sysfs ABI docs
On Mon, Feb 23, 2009 at 06:17:25PM -0800, Chris Wright wrote: Add sysfs ABI docs for driver entries bind, unbind and new_id. These entries are pretty old, from 2.6.0 onwards AFAIK, so this documents current behaviour. Signed-off-by: Chris Wright chr...@sous-sol.org --- Documentation/ABI/testing/sysfs-bus-pci | 41 1 file changed, 41 insertions(+) --- a/Documentation/ABI/testing/sysfs-bus-pci +++ b/Documentation/ABI/testing/sysfs-bus-pci @@ -1,3 +1,44 @@ +What:/sys/bus/pci/drivers/.../bind +Date:December 2003 +Contact: linux-...@vger.kernel.org +Description: + Writing a device location to this file will cause + the driver to attempt to bind to the device found at + this location. This is useful for overriding default + bindings. The format for the location is: :BB:DD.F. + That is Domain:Bus:Device.Function and is the same as + found in /sys/bus/pci/devices/. For example: + # echo :00:19.0 /sys/bus/pci/drivers/foo/bind Don't you need 'echo -n' instead? Or did we fix that problem? Or is that just for the new_id file? If so, feel free to ignore the comment and add: Acked-by: Greg Kroah-Hartman gre...@suse.de Thanks a lot for doing this, it is much needed. greg k-h -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 2/2] PCI: add remove_id sysfs entry
On Mon, Feb 23, 2009 at 06:18:29PM -0800, Chris Wright wrote: This adds a remove_id sysfs entry to allow users of new_id to later remove the added dynid. One use case is management tools that want to dynamically bind/unbind devices to pci-stub driver while devices are assigned to KVM guests. Rather than having to track which driver was originally bound to the driver, a mangement tool can simply: # echo 8086 10f5 /sys/bus/pci/drivers/pci-stub/new_id # echo -n :00:19.0 /sys/bus/pci/devices/:00:19.0/driver/unbind # echo -n :00:19.0 /sys/bus/pci/drivers/pci-stub/bind Ah, you do need -n with unbind and bind, you might want to change your documentation :) thanks, greg k-h -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html