Re: kvm: irqchip: fix memory leak in -stable

2015-11-05 Thread Greg KH
On Wed, Nov 04, 2015 at 02:11:54PM +0100, William Dauchy wrote:
> Hello stable release team,
> 
> The commit ba60c41 kvm: irqchip: fix memory leak
> is fixing commit e73f61e kvm: irqchip: Break up high order allocations of 
> kvm_irq_routing_table
> 
> I believe commit ba60c41 kvm: irqchip: fix memory leak
> is a good candidate for -stable. I also  got an agreement from Paolo.

Now queued up.
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [request for stable inclusion][Patch 3.4.y] x86, kvm: use kernel_fpu_begin/end() in kvm_load/put_guest_fpu()

2015-07-30 Thread Greg KH
On Fri, Jul 31, 2015 at 11:03:48AM +0800, Rui Xiang wrote:
 ping
 
 On 2015/7/29 19:03, Rui Xiang wrote:

2 days ago?  For a very old and slowly updated kernel release?  Hah, be
happy if you get a response in a month...

--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] MIPS: KVM: do not sign extend on unsigned MMIO load

2015-06-19 Thread Greg KH
On Mon, Jun 08, 2015 at 09:33:50AM +0100, James Hogan wrote:
 Hi stable folk,
 
 On 08/05/15 15:16, James Hogan wrote:
  On 07/05/15 13:47, Nicholas Mc Guire wrote:
  Fix possible unintended sign extension in unsigned MMIO loads by casting
  to uint16_t in the case of mmio_needed != 2.
 
  Signed-off-by: Nicholas Mc Guire hof...@osadl.org
  
  Looks good to me. I wrote an MMIO test to reproduce the issue, and this
  fixes it.
  
  Reviewed-by: James Hogan james.ho...@imgtec.com
  Tested-by: James Hogan james.ho...@imgtec.com
  
  It looks suitable for stable too (3.10+).
 
 This has reached mainline, commit ed9244e6c534612d2b5ae47feab2f55a0d4b4ced
 
 Please could it be added to stable (3.10+).

It does not apply to 3.10 or 3.14-stable, so please provide a backport
if you want it there.

thanks,

greg k-h
--
To unsubscribe from this list: send the line unsubscribe kvm in


Re: [PATCH] KVM: VMX: Preserve host CR4.MCE value while in guest mode.

2015-04-17 Thread Greg KH
On Thu, Apr 16, 2015 at 11:58:05AM -0700, Ben Serebrin wrote:
 The host's decision to enable machine check exceptions should remain
 in force during non-root mode.  KVM was writing 0 to cr4 on VCPU reset
 and passed a slightly-modified 0 to the vmcs.guest_cr4 value.
 
 Tested: Built.
 On earlier version, tested by injecting machine check
 while a guest is spinning.
 
 Before the change, if guest CR4.MCE==0, then the machine check is
 escalated to Catastrophic Error (CATERR) and the machine dies.
 If guest CR4.MCE==1, then the machine check causes VMEXIT and is
 handled normally by host Linux. After the change, injecting a machine
 check causes normal Linux machine check handling.
 
 Signed-off-by: Ben Serebrin sereb...@google.com
 ---
  arch/x86/kvm/vmx.c | 12 ++--
  1 file changed, 10 insertions(+), 2 deletions(-)

formletter

This is not the correct way to submit patches for inclusion in the
stable kernel tree.  Please read Documentation/stable_kernel_rules.txt
for how to do this properly.

/formletter
--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH for stable] x86/spinlocks/paravirt: Fix memory corruption on unlock

2015-02-24 Thread Greg KH
On Tue, Feb 24, 2015 at 03:47:37PM +0100, Ingo Molnar wrote:
 
 * Greg KH gre...@linuxfoundation.org wrote:
 
  On Tue, Feb 24, 2015 at 02:54:59PM +0530, Raghavendra K T wrote:
   Paravirt spinlock clears slowpath flag after doing unlock.
   As explained by Linus currently it does:
   prev = *lock;
   add_smp(lock-tickets.head, TICKET_LOCK_INC);
   
   /* add_smp() is a full mb() */
   
   if (unlikely(lock-tickets.tail  TICKET_SLOWPATH_FLAG))
   __ticket_unlock_slowpath(lock, prev);
   
   which is *exactly* the kind of things you cannot do with spinlocks,
   because after you've done the add_smp() and released the spinlock
   for the fast-path, you can't access the spinlock any more.  Exactly
   because a fast-path lock might come in, and release the whole data
   structure.
   
   Linus suggested that we should not do any writes to lock after unlock(),
   and we can move slowpath clearing to fastpath lock.
   
   So this patch implements the fix with:
   1. Moving slowpath flag to head (Oleg):
   Unlocked locks don't care about the slowpath flag; therefore we can keep
   it set after the last unlock, and clear it again on the first (try)lock.
   -- this removes the write after unlock. note that keeping slowpath flag 
   would
   result in unnecessary kicks.
   By moving the slowpath flag from the tail to the head ticket we also avoid
   the need to access both the head and tail tickets on unlock.
   
   2. use xadd to avoid read/write after unlock that checks the need for
   unlock_kick (Linus):
   We further avoid the need for a read-after-release by using xadd;
   the prev head value will include the slowpath flag and indicate if we
   need to do PV kicking of suspended spinners -- on modern chips xadd
   isn't (much) more expensive than an add + load.
   
   Result:
setup: 16core (32 cpu +ht sandy bridge 8GB 16vcpu guest)
benchmark overcommit %improve
kernbench  1x   -0.13
kernbench  2x0.02
dbench 1x   -1.77
dbench 2x   -0.63
   
   [Jeremy: hinted missing TICKET_LOCK_INC for kick]
   [Oleg: Moving slowpath flag to head, ticket_equals idea]
   [PeterZ: Detailed changelog]
   
   Reported-by: Sasha Levin sasha.le...@oracle.com
   Suggested-by: Linus Torvalds torva...@linux-foundation.org
   Signed-off-by: Raghavendra K T raghavendra...@linux.vnet.ibm.com
   Reviewed-by: Oleg Nesterov o...@redhat.com
   Acked-by: David Vrabel david.vra...@citrix.com
   ---
arch/x86/include/asm/spinlock.h | 94 
   -
arch/x86/kernel/kvm.c   |  7 ++-
arch/x86/xen/spinlock.c |  7 ++-
3 files changed, 58 insertions(+), 50 deletions(-)
   
   Changes for stable:
 - Don't replace the ACCESS_ONCE to READ_ONCE which would cause 
   horraneous
   Compiler warnings (Linus, David Vbriel, PeterZ, Ingo)
  
  What is the git commit id of this in Linus's tree?  What 
  stable tree(s) do you want this applied to?
 
 It's:
 
  d6abfdb20223 x86/spinlocks/paravirt: Fix memory corruption on unlock
 
 You'll also need this fix from Linus to avoid (harmless) 
 build warnings:
 
  dd36929720f4 kernel: make READ_ONCE() valid on const arguments

Great.  But what stable kernel trees should it be backported to?  Just
3.19?  Or anything older?

thanks,

greg k-h
--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH for stable] x86/spinlocks/paravirt: Fix memory corruption on unlock

2015-02-24 Thread Greg KH
On Tue, Feb 24, 2015 at 02:54:59PM +0530, Raghavendra K T wrote:
 Paravirt spinlock clears slowpath flag after doing unlock.
 As explained by Linus currently it does:
 prev = *lock;
 add_smp(lock-tickets.head, TICKET_LOCK_INC);
 
 /* add_smp() is a full mb() */
 
 if (unlikely(lock-tickets.tail  TICKET_SLOWPATH_FLAG))
 __ticket_unlock_slowpath(lock, prev);
 
 which is *exactly* the kind of things you cannot do with spinlocks,
 because after you've done the add_smp() and released the spinlock
 for the fast-path, you can't access the spinlock any more.  Exactly
 because a fast-path lock might come in, and release the whole data
 structure.
 
 Linus suggested that we should not do any writes to lock after unlock(),
 and we can move slowpath clearing to fastpath lock.
 
 So this patch implements the fix with:
 1. Moving slowpath flag to head (Oleg):
 Unlocked locks don't care about the slowpath flag; therefore we can keep
 it set after the last unlock, and clear it again on the first (try)lock.
 -- this removes the write after unlock. note that keeping slowpath flag would
 result in unnecessary kicks.
 By moving the slowpath flag from the tail to the head ticket we also avoid
 the need to access both the head and tail tickets on unlock.
 
 2. use xadd to avoid read/write after unlock that checks the need for
 unlock_kick (Linus):
 We further avoid the need for a read-after-release by using xadd;
 the prev head value will include the slowpath flag and indicate if we
 need to do PV kicking of suspended spinners -- on modern chips xadd
 isn't (much) more expensive than an add + load.
 
 Result:
  setup: 16core (32 cpu +ht sandy bridge 8GB 16vcpu guest)
  benchmark overcommit %improve
  kernbench  1x   -0.13
  kernbench  2x0.02
  dbench 1x   -1.77
  dbench 2x   -0.63
 
 [Jeremy: hinted missing TICKET_LOCK_INC for kick]
 [Oleg: Moving slowpath flag to head, ticket_equals idea]
 [PeterZ: Detailed changelog]
 
 Reported-by: Sasha Levin sasha.le...@oracle.com
 Suggested-by: Linus Torvalds torva...@linux-foundation.org
 Signed-off-by: Raghavendra K T raghavendra...@linux.vnet.ibm.com
 Reviewed-by: Oleg Nesterov o...@redhat.com
 Acked-by: David Vrabel david.vra...@citrix.com
 ---
  arch/x86/include/asm/spinlock.h | 94 
 -
  arch/x86/kernel/kvm.c   |  7 ++-
  arch/x86/xen/spinlock.c |  7 ++-
  3 files changed, 58 insertions(+), 50 deletions(-)
 
 Changes for stable:
   - Don't replace the ACCESS_ONCE to READ_ONCE which would cause horraneous
 Compiler warnings (Linus, David Vbriel, PeterZ, Ingo)

What is the git commit id of this in Linus's tree?  What stable tree(s)
do you want this applied to?

thanks,

greg k-h
--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH for stable] x86/spinlocks/paravirt: Fix memory corruption on unlock

2015-02-24 Thread Greg KH
On Tue, Feb 24, 2015 at 11:49:13PM +0530, Raghavendra K T wrote:
 On 02/24/2015 08:17 PM, Ingo Molnar wrote:
 
 * Greg KH gre...@linuxfoundation.org wrote:
 
 On Tue, Feb 24, 2015 at 02:54:59PM +0530, Raghavendra K T wrote:
 Paravirt spinlock clears slowpath flag after doing unlock.
 As explained by Linus currently it does:
  prev = *lock;
  add_smp(lock-tickets.head, TICKET_LOCK_INC);
 
  /* add_smp() is a full mb() */
 
  if (unlikely(lock-tickets.tail  TICKET_SLOWPATH_FLAG))
  __ticket_unlock_slowpath(lock, prev);
 
 which is *exactly* the kind of things you cannot do with spinlocks,
 because after you've done the add_smp() and released the spinlock
 for the fast-path, you can't access the spinlock any more.  Exactly
 because a fast-path lock might come in, and release the whole data
 structure.
 
 Linus suggested that we should not do any writes to lock after unlock(),
 and we can move slowpath clearing to fastpath lock.
 
 So this patch implements the fix with:
 1. Moving slowpath flag to head (Oleg):
 Unlocked locks don't care about the slowpath flag; therefore we can keep
 it set after the last unlock, and clear it again on the first (try)lock.
 -- this removes the write after unlock. note that keeping slowpath flag 
 would
 result in unnecessary kicks.
 By moving the slowpath flag from the tail to the head ticket we also avoid
 the need to access both the head and tail tickets on unlock.
 
 2. use xadd to avoid read/write after unlock that checks the need for
 unlock_kick (Linus):
 We further avoid the need for a read-after-release by using xadd;
 the prev head value will include the slowpath flag and indicate if we
 need to do PV kicking of suspended spinners -- on modern chips xadd
 isn't (much) more expensive than an add + load.
 
 Result:
   setup: 16core (32 cpu +ht sandy bridge 8GB 16vcpu guest)
   benchmark overcommit %improve
   kernbench  1x   -0.13
   kernbench  2x0.02
   dbench 1x   -1.77
   dbench 2x   -0.63
 
 [Jeremy: hinted missing TICKET_LOCK_INC for kick]
 [Oleg: Moving slowpath flag to head, ticket_equals idea]
 [PeterZ: Detailed changelog]
 
 Reported-by: Sasha Levin sasha.le...@oracle.com
 Suggested-by: Linus Torvalds torva...@linux-foundation.org
 Signed-off-by: Raghavendra K T raghavendra...@linux.vnet.ibm.com
 Reviewed-by: Oleg Nesterov o...@redhat.com
 Acked-by: David Vrabel david.vra...@citrix.com
 ---
   arch/x86/include/asm/spinlock.h | 94 
  -
   arch/x86/kernel/kvm.c   |  7 ++-
   arch/x86/xen/spinlock.c |  7 ++-
   3 files changed, 58 insertions(+), 50 deletions(-)
 
 Changes for stable:
- Don't replace the ACCESS_ONCE to READ_ONCE which would cause 
  horraneous
  Compiler warnings (Linus, David Vbriel, PeterZ, Ingo)
 
 What is the git commit id of this in Linus's tree?  What
 stable tree(s) do you want this applied to?
 
 It's:
 
   d6abfdb20223 x86/spinlocks/paravirt: Fix memory corruption on unlock
 
 Yes, This is the original patch. Please note I have taken out the
 READ_ONCE changes from the original patch to avoid build warnings
 mentioned below.
 (Those READ_ONCE changes were cosmetic and was not present in the
 previous versions)
 
 
 You'll also need this fix from Linus to avoid (harmless)
 build warnings:
 
   dd36929720f4 kernel: make READ_ONCE() valid on const arguments
 
 So this may not be absolutely necessary with the current patch.

I'd prefer to be as close as possible to the upstream patch.  So if
applying both of these patches will work, I'd much rather do that.
Changing patches when backporting them to stable for no good reason than
to clean things up, just confuses everyone involved.

Let's keep our messy history :)

thanks,

greg k-h
--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 1/9] x86: export get_xsave_addr

2014-12-04 Thread Greg KH
On Thu, Dec 04, 2014 at 04:57:06PM +0100, Paolo Bonzini wrote:
 get_xsave_addr is the API to access XSAVE states, and KVM would
 like to use it.  Export it.

Use it in what way?

 Cc: sta...@vger.kernel.org

Why is this a stable patch?

--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] driver core: platform: add device binding path 'driver_override'

2014-07-08 Thread Greg KH
On Mon, Jun 02, 2014 at 07:42:58PM -0500, Kim Phillips wrote:
 Needed by platform device drivers, such as the upcoming
 vfio-platform driver, in order to bypass the existing OF, ACPI,
 id_table and name string matches, and successfully be able to be
 bound to any device, like so:
 
 echo vfio-platform  
 /sys/bus/platform/devices/fff51000.ethernet/driver_override
 echo fff51000.ethernet  
 /sys/bus/platform/devices/fff51000.ethernet/driver/unbind
 echo fff51000.ethernet  /sys/bus/platform/drivers_probe
 
 This mimics PCI: Introduce new device binding path using
 pci_dev.driver_override, which is an interface enhancement
 for more deterministic PCI device binding, e.g., when in the
 presence of hotplug.
 
 Reviewed-by: Alex Williamson alex.william...@redhat.com
 Reviewed-by: Alexander Graf ag...@suse.de
 Reviewed-by: Stuart Yoder stuart.yo...@freescale.com
 Signed-off-by: Kim Phillips kim.phill...@freescale.com
 ---
 Greg,
 
 This is largely identical to the PCI version of the same that has
 been accepted for v3.16 and ack'd by you:
 
 https://lists.cs.columbia.edu/pipermail/kvmarm/2014-May/009674.html
 
 and applied to Bjorn Helgaas' PCI tree:
 
 https://git.kernel.org/cgit/linux/kernel/git/helgaas/pci.git/commit/?h=pci/virtualizationid=782a985d7af26db39e86070d28f987cad21313c0
 
 You are the platform driver core maintainer: can you apply this to
 your driver-core tree now?

Sorry for the very long delay, it's now merged in my tree.

Thanks for being persistant.

greg k-h
--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] driver core: platform: add device binding path 'driver_override'

2014-06-02 Thread Greg KH
On Mon, Jun 02, 2014 at 07:42:58PM -0500, Kim Phillips wrote:
 Needed by platform device drivers, such as the upcoming
 vfio-platform driver, in order to bypass the existing OF, ACPI,
 id_table and name string matches, and successfully be able to be
 bound to any device, like so:
 
 echo vfio-platform  
 /sys/bus/platform/devices/fff51000.ethernet/driver_override
 echo fff51000.ethernet  
 /sys/bus/platform/devices/fff51000.ethernet/driver/unbind
 echo fff51000.ethernet  /sys/bus/platform/drivers_probe
 
 This mimics PCI: Introduce new device binding path using
 pci_dev.driver_override, which is an interface enhancement
 for more deterministic PCI device binding, e.g., when in the
 presence of hotplug.
 
 Reviewed-by: Alex Williamson alex.william...@redhat.com
 Reviewed-by: Alexander Graf ag...@suse.de
 Reviewed-by: Stuart Yoder stuart.yo...@freescale.com
 Signed-off-by: Kim Phillips kim.phill...@freescale.com
 ---
 Greg,
 
 This is largely identical to the PCI version of the same that has
 been accepted for v3.16 and ack'd by you:
 
 https://lists.cs.columbia.edu/pipermail/kvmarm/2014-May/009674.html
 
 and applied to Bjorn Helgaas' PCI tree:
 
 https://git.kernel.org/cgit/linux/kernel/git/helgaas/pci.git/commit/?h=pci/virtualizationid=782a985d7af26db39e86070d28f987cad21313c0
 
 You are the platform driver core maintainer: can you apply this to
 your driver-core tree now?

Yes, I will after this merge window ends, it's too late for 3.16-rc1
with the window opening up a week early, sorry.

greg k-h
--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v3] PCI: Introduce new device binding path using pci_dev.driver_override

2014-05-28 Thread Greg KH
On Tue, May 27, 2014 at 09:07:42PM -0600, Bjorn Helgaas wrote:
 On Tue, May 20, 2014 at 08:53:21AM -0600, Alex Williamson wrote:
  The driver_override field allows us to specify the driver for a device
  rather than relying on the driver to provide a positive match of the
  device.  This shortcuts the existing process of looking up the vendor
  and device ID, adding them to the driver new_id, binding the device,
  then removing the ID, but it also provides a couple advantages.
  
  First, the above existing process allows the driver to bind to any
  device matching the new_id for the window where it's enabled.  This is
  often not desired, such as the case of trying to bind a single device
  to a meta driver like pci-stub or vfio-pci.  Using driver_override we
  can do this deterministically using:
  
  echo pci-stub  /sys/bus/pci/devices/:03:00.0/driver_override
  echo :03:00.0  /sys/bus/pci/devices/:03:00.0/driver/unbind
  echo :03:00.0  /sys/bus/pci/drivers_probe
  
  Previously we could not invoke drivers_probe after adding a device
  to new_id for a driver as we get non-deterministic behavior whether
  the driver we intend or the standard driver will claim the device.
  Now it becomes a deterministic process, only the driver matching
  driver_override will probe the device.
  
  To return the device to the standard driver, we simply clear the
  driver_override and reprobe the device:
  
  echo  /sys/bus/pci/devices/:03:00.0/driver_override
  echo :03:00.0  /sys/bus/pci/devices/:03:00.0/driver/unbind
  echo :03:00.0  /sys/bus/pci/drivers_probe
  
  Another advantage to this approach is that we can specify a driver
  override to force a specific binding or prevent any binding.  For
  instance when an IOMMU group is exposed to userspace through VFIO
  we require that all devices within that group are owned by VFIO.
  However, devices can be hot-added into an IOMMU group, in which case
  we want to prevent the device from binding to any driver (override
  driver = none) or perhaps have it automatically bind to vfio-pci.
  With driver_override it's a simple matter for this field to be set
  internally when the device is first discovered to prevent driver
  matches.
  
  Signed-off-by: Alex Williamson alex.william...@redhat.com
  Cc: Greg Kroah-Hartman gre...@linuxfoundation.org
 
 Greg, are you going to weigh in on this?  It does seem to solve some real
 problems.  ISTR you had an opinion once, but I don't know your current
 thoughts.

This looks good to me:

Acked-by: Greg Kroah-Hartman gre...@linuxfoundation.org
--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v3] PCI: Introduce new device binding path using pci_dev.driver_override

2014-05-27 Thread Greg KH
On Tue, May 27, 2014 at 09:07:42PM -0600, Bjorn Helgaas wrote:
 On Tue, May 20, 2014 at 08:53:21AM -0600, Alex Williamson wrote:
  The driver_override field allows us to specify the driver for a device
  rather than relying on the driver to provide a positive match of the
  device.  This shortcuts the existing process of looking up the vendor
  and device ID, adding them to the driver new_id, binding the device,
  then removing the ID, but it also provides a couple advantages.
  
  First, the above existing process allows the driver to bind to any
  device matching the new_id for the window where it's enabled.  This is
  often not desired, such as the case of trying to bind a single device
  to a meta driver like pci-stub or vfio-pci.  Using driver_override we
  can do this deterministically using:
  
  echo pci-stub  /sys/bus/pci/devices/:03:00.0/driver_override
  echo :03:00.0  /sys/bus/pci/devices/:03:00.0/driver/unbind
  echo :03:00.0  /sys/bus/pci/drivers_probe
  
  Previously we could not invoke drivers_probe after adding a device
  to new_id for a driver as we get non-deterministic behavior whether
  the driver we intend or the standard driver will claim the device.
  Now it becomes a deterministic process, only the driver matching
  driver_override will probe the device.
  
  To return the device to the standard driver, we simply clear the
  driver_override and reprobe the device:
  
  echo  /sys/bus/pci/devices/:03:00.0/driver_override
  echo :03:00.0  /sys/bus/pci/devices/:03:00.0/driver/unbind
  echo :03:00.0  /sys/bus/pci/drivers_probe
  
  Another advantage to this approach is that we can specify a driver
  override to force a specific binding or prevent any binding.  For
  instance when an IOMMU group is exposed to userspace through VFIO
  we require that all devices within that group are owned by VFIO.
  However, devices can be hot-added into an IOMMU group, in which case
  we want to prevent the device from binding to any driver (override
  driver = none) or perhaps have it automatically bind to vfio-pci.
  With driver_override it's a simple matter for this field to be set
  internally when the device is first discovered to prevent driver
  matches.
  
  Signed-off-by: Alex Williamson alex.william...@redhat.com
  Cc: Greg Kroah-Hartman gre...@linuxfoundation.org
 
 Greg, are you going to weigh in on this?  It does seem to solve some real
 problems.  ISTR you had an opinion once, but I don't know your current
 thoughts.

Give me a few more days, still digging through my patch backlog...

thanks,

greg k-h
--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] virtio-scsi: Skip setting affinity on uninitialized vq

2014-04-11 Thread Greg KH
On Fri, Apr 11, 2014 at 03:23:45PM +0800, Fam Zheng wrote:
 virtscsi_init calls virtscsi_remove_vqs on err, even before initializing
 the vqs. The latter calls virtscsi_set_affinity, so let's check the
 pointer there before setting affinity on it.
 
 This fixes a panic when setting device's num_queues=2 on RHEL 6.5:
 
 qemu-system-x86_64 ... \
 -device virtio-scsi-pci,id=scsi0,addr=0x13,...,num_queues=2 \
 -drive file=/stor/vm/dummy.raw,id=drive-scsi-disk,... \
 -device scsi-hd,drive=drive-scsi-disk,...
 
 [0.354734] scsi0 : Virtio SCSI HBA
 [0.379504] BUG: unable to handle kernel NULL pointer dereference at 
 0020
 [0.380141] IP: [814741ef] __virtscsi_set_affinity+0x4f/0x120
 [0.380141] PGD 0
 [0.380141] Oops:  [#1] SMP
 [0.380141] CPU: 0 PID: 1 Comm: swapper/0 Not tainted 3.14.0+ #5
 [0.380141] Hardware name: Red Hat KVM, BIOS 0.5.1 01/01/2007
 [0.380141] task: 88003c9f ti: 88003c9f8000 task.ti: 
 88003c9f8000
 [0.380141] RIP: 0010:[814741ef]  [814741ef] 
 __virtscsi_set_affinity+0x4f/0x120
 [0.380141] RSP: :88003c9f9c08  EFLAGS: 00010256
 [0.380141] RAX:  RBX: 88003c3a9d40 RCX: 
 1070
 [0.380141] RDX: 0002 RSI:  RDI: 
 
 [0.380141] RBP: 88003c9f9c28 R08: 000136c0 R09: 
 88003c801c00
 [0.380141] R10: 81475229 R11: 0008 R12: 
 
 [0.380141] R13: 81cc7ca8 R14: 88003cac3d40 R15: 
 88003cac37a0
 [0.380141] FS:  () GS:88003e40() 
 knlGS:
 [0.380141] CS:  0010 DS:  ES:  CR0: 8005003b
 [0.380141] CR2: 0020 CR3: 01c0e000 CR4: 
 06f0
 [0.380141] Stack:
 [0.380141]  88003c3a9d40  88003cac3d80 
 88003cac3d40
 [0.380141]  88003c9f9c48 814742e8 88003c26d000 
 88003c26d000
 [0.380141]  88003c9f9c68 81474321 88003c26d000 
 88003c3a9d40
 [0.380141] Call Trace:
 [0.380141]  [814742e8] virtscsi_set_affinity+0x28/0x40
 [0.380141]  [81474321] virtscsi_remove_vqs+0x21/0x50
 [0.380141]  [81475231] virtscsi_init+0x91/0x240
 [0.380141]  [81365290] ? vp_get+0x50/0x70
 [0.380141]  [81475544] virtscsi_probe+0xf4/0x280
 [0.380141]  [81363ea5] virtio_dev_probe+0xe5/0x140
 [0.380141]  [8144c669] driver_probe_device+0x89/0x230
 [0.380141]  [8144c8ab] __driver_attach+0x9b/0xa0
 [0.380141]  [8144c810] ? driver_probe_device+0x230/0x230
 [0.380141]  [8144c810] ? driver_probe_device+0x230/0x230
 [0.380141]  [8144ac1c] bus_for_each_dev+0x8c/0xb0
 [0.380141]  [8144c499] driver_attach+0x19/0x20
 [0.380141]  [8144bf28] bus_add_driver+0x198/0x220
 [0.380141]  [8144ce9f] driver_register+0x5f/0xf0
 [0.380141]  [81d27c91] ? spi_transport_init+0x79/0x79
 [0.380141]  [8136403b] register_virtio_driver+0x1b/0x30
 [0.380141]  [81d27d19] init+0x88/0xd6
 [0.380141]  [81d27c18] ? scsi_init_procfs+0x5b/0x5b
 [0.380141]  [81ce88a7] do_one_initcall+0x7f/0x10a
 [0.380141]  [81ce8aa7] kernel_init_freeable+0x14a/0x1de
 [0.380141]  [81ce8b3b] ? kernel_init_freeable+0x1de/0x1de
 [0.380141]  [817dec20] ? rest_init+0x80/0x80
 [0.380141]  [817dec29] kernel_init+0x9/0xf0
 [0.380141]  [817e68fc] ret_from_fork+0x7c/0xb0
 [0.380141]  [817dec20] ? rest_init+0x80/0x80
 [0.380141] RIP  [814741ef] __virtscsi_set_affinity+0x4f/0x120
 [0.380141]  RSP 88003c9f9c08
 [0.380141] CR2: 0020
 [0.380141] ---[ end trace 8074b70c3d5e1d73 ]---
 [0.475018] Kernel panic - not syncing: Attempted to kill init! 
 exitcode=0x0009
 [0.475018]
 [0.475068] Kernel Offset: 0x0 from 0x8100 (relocation range: 
 0x8000-0x9fff)
 [0.475068] ---[ end Kernel panic - not syncing: Attempted to kill init! 
 exitcode=0x0009
 
 Signed-off-by: Fam Zheng f...@redhat.com
 ---
  drivers/scsi/virtio_scsi.c | 6 +-
  1 file changed, 5 insertions(+), 1 deletion(-)

formletter

This is not the correct way to submit patches for inclusion in the
stable kernel tree.  Please read Documentation/stable_kernel_rules.txt
for how to do this properly.

/formletter
--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [RFC PATCH] PCI: Introduce new device binding path using pci_dev.driver_override

2014-04-01 Thread Greg KH
On Tue, Apr 01, 2014 at 10:28:54AM -0600, Alex Williamson wrote:
 The driver_override field allows us to specify the driver for a device
 rather than relying on the driver to provide a positive match of the
 device.  This shortcuts the existing process of looking up the vendor
 and device ID, adding them to the driver new_id, binding the device,
 then removing the ID, but it also provides a couple advantages.
 
 First, the above process allows the driver to bind to any device
 matching the new_id for the window where it's enabled.  This is often
 not desired, such as the case of trying to bind a single device to a
 meta driver like pci-stub or vfio-pci.  Using driver_override we can
 do this deterministically using:
 
 echo pci-stub  /sys/bus/pci/devices/:03:00.0/driver_override
 echo :03:00.0  /sys/bus/pci/devices/:03:00.0/driver/unbind
 echo :03:00.0  /sys/bus/pci/drivers_probe
 
 Previously we could not invoke drivers_probe after adding a device
 to new_id for a driver as we get non-deterministic behavior whether
 the driver we intend or the standard driver will claim the device.
 Now it becomes a deterministic process, only the driver matching
 driver_override will probe the device.
 
 To return the device to the standard driver, we simply clear the
 driver_override and reprobe the device, ex:
 
 echo  /sys/bus/pci/devices/:03:00.0/preferred_driver
 echo :03:00.0  /sys/bus/pci/devices/:03:00.0/driver/unbind
 echo :03:00.0  /sys/bus/pci/drivers_probe
 
 Another advantage to this approach is that we can specify a driver
 override to force a specific binding or prevent any binding.  For
 instance when an IOMMU group is exposed to userspace through VFIO
 we require that all devices within that group are owned by VFIO.
 However, devices can be hot-added into an IOMMU group, in which case
 we want to prevent the device from binding to any driver (preferred
 driver = none) or perhaps have it automatically bind to vfio-pci.
 With driver_override it's a simple matter for this field to be set
 internally when the device is first discovered to prevent driver
 matches.
 
 Signed-off-by: Alex Williamson alex.william...@redhat.com
 ---
 
 Apologies for the exceptionally long cc list, this is a follow-up to
 Stuart's Subject: mechanism to allow a driver to bind to any device
 thread.  This is effectively a v2 of the proof-of-concept patch I
 posted in that thread.  This version changes to use a dummy id struct
 to return on an override match, which removes the collateral damage
 and greatly simplifies the patch.  This feels fairly well baked for
 PCI and I would expect that platform drivers could do a similar
 implementation.  From there perhaps we can discuss whether there's
 any advantage to placing driver_override on struct device.  The logic
 for incorporating it into the match still needs to happen per bus
 driver, so it might only contribute to consistency of the show/store
 sysfs attributes to move it up to struct device.  Please comment.
 Thanks,
 
 Alex
 
  drivers/pci/pci-driver.c |   25 ++---
  drivers/pci/pci-sysfs.c  |   40 
  include/linux/pci.h  |1 +
  3 files changed, 63 insertions(+), 3 deletions(-)

No Documentation/ABI/ update to reflect the ABI you are creating?

thanks,

greg k-h
--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [RFC PATCH] PCI: Introduce new device binding path using pci_dev.driver_override

2014-04-01 Thread Greg KH
On Tue, Apr 01, 2014 at 06:52:12PM -0500, Kim Phillips wrote:
 On Tue, 01 Apr 2014 10:28:54 -0600
 Alex Williamson alex.william...@redhat.com wrote:
 
  The driver_override field allows us to specify the driver for a device
  rather than relying on the driver to provide a positive match of the
  device.  This shortcuts the existing process of looking up the vendor
  and device ID, adding them to the driver new_id, binding the device,
  then removing the ID, but it also provides a couple advantages.
  
  First, the above process allows the driver to bind to any device
  matching the new_id for the window where it's enabled.  This is often
  not desired, such as the case of trying to bind a single device to a
  meta driver like pci-stub or vfio-pci.  Using driver_override we can
  do this deterministically using:
  
  echo pci-stub  /sys/bus/pci/devices/:03:00.0/driver_override
  echo :03:00.0  /sys/bus/pci/devices/:03:00.0/driver/unbind
  echo :03:00.0  /sys/bus/pci/drivers_probe
  
  Previously we could not invoke drivers_probe after adding a device
  to new_id for a driver as we get non-deterministic behavior whether
  the driver we intend or the standard driver will claim the device.
  Now it becomes a deterministic process, only the driver matching
  driver_override will probe the device.
  
  To return the device to the standard driver, we simply clear the
  driver_override and reprobe the device, ex:
  
  echo  /sys/bus/pci/devices/:03:00.0/preferred_driver
  echo :03:00.0  /sys/bus/pci/devices/:03:00.0/driver/unbind
  echo :03:00.0  /sys/bus/pci/drivers_probe
  
  Another advantage to this approach is that we can specify a driver
  override to force a specific binding or prevent any binding.  For
  instance when an IOMMU group is exposed to userspace through VFIO
  we require that all devices within that group are owned by VFIO.
  However, devices can be hot-added into an IOMMU group, in which case
  we want to prevent the device from binding to any driver (preferred
  driver = none) or perhaps have it automatically bind to vfio-pci.
  With driver_override it's a simple matter for this field to be set
  internally when the device is first discovered to prevent driver
  matches.
  
  Signed-off-by: Alex Williamson alex.william...@redhat.com
  ---
  
  Apologies for the exceptionally long cc list, this is a follow-up to
  Stuart's Subject: mechanism to allow a driver to bind to any device
  thread.  This is effectively a v2 of the proof-of-concept patch I
  posted in that thread.  This version changes to use a dummy id struct
  to return on an override match, which removes the collateral damage
  and greatly simplifies the patch.  This feels fairly well baked for
  PCI and I would expect that platform drivers could do a similar
  implementation.  From there perhaps we can discuss whether there's
  any advantage to placing driver_override on struct device.  The logic
  for incorporating it into the match still needs to happen per bus
  driver, so it might only contribute to consistency of the show/store
  sysfs attributes to move it up to struct device.  Please comment.
 
 Sounds like Greg likes this approach more than {drv,dev}_sysfs_only.

I have made no such judgement, I only pointed out that if you
modify/add/remove a sysfs file, it needs to have documentation for it.

 The diff below is the result of duplicating and converting this patch
 for platform devices, and, indeed, binding a device to the
 vfio-platform driver succeeds with:
 
 echo vfio-platform  
 /sys/bus/platform/devices/fff51000.ethernet/driver_override
 echo fff51000.ethernet  
 /sys/bus/platform/devices/fff51000.ethernet/driver/unbind
 echo fff51000.ethernet  /sys/bus/platform/drivers_probe
 
 However, it's almost pure duplication modulo the bus match code.  The
 only other place I can see where to put the common bus check is
 drivers/base/base.h:driver_match_device(), which I'm guessing is
 off-limits?  So should we leave this as per-bus code, and somehow
 refactor driver_override_{show,store}?

If you can provide a way for this to be done in a bus-independant way,
like we did for new_id and the like, I'd be open to reviewing it.


greg k-h
--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [RFC PATCH v4 01/10] driver core: export driver_probe_device()

2014-02-14 Thread Greg KH
On Sat, Feb 08, 2014 at 06:29:31PM +0100, Antonios Motakis wrote:
 From: Kim Phillips kim.phill...@linaro.org
 
 Needed by drivers, such as the vfio platform driver [1], seeking to
 bypass bind_store()'s driver_match_device(), and bind to any device
 via a private sysfs bind file.
 
 [1] https://lkml.org/lkml/2013/12/11/522
 
 note: the EXPORT_SYMBOL is needed because vfio-platform can be built
 as a module.

No code outside of drivers/base/ should be calling this function, you
are doing something wrong in your bus if you want to do this, please fix
your bus code.

sorry, I can't accept this at all.

greg k-h
--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] KVM: SVM: fix NMI window after iret

2014-01-17 Thread Greg KH
On Fri, Jan 17, 2014 at 08:52:42PM +0100, Radim Krčmář wrote:
 We should open NMI window right after an iret, but SVM exits before it.
 We wanted to single step using the trap flag and then open it.
 (or we could emulate the iret instead)
 We don't do it since commit 3842d135ff2 (likely), because the iret exit
 handler does not request an event, so NMI window remains closed until
 the next exit.
 
 Fix this by making KVM_REQ_EVENT request in the iret handler.
 
 Signed-off-by: Radim Krčmář rkrc...@redhat.com
 ---
  (btw. kvm-unit-tests weren't executed on SVM since Nov 2010, at least)
 
  arch/x86/kvm/svm.c | 1 +
  1 file changed, 1 insertion(+)


formletter

This is not the correct way to submit patches for inclusion in the
stable kernel tree.  Please read Documentation/stable_kernel_rules.txt
for how to do this properly.

/formletter
--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH backport hints for 3.10] KVM: x86: Convert vapic synchronization to _cached functions (CVE-2013-6368)

2013-12-18 Thread Greg KH
On Mon, Dec 16, 2013 at 12:38:17PM +0100, Paolo Bonzini wrote:
 The KVM patch fix vapic memory corruption applies to most kernels that
 have KVM, but the fix does not apply on many older branches.  The APIs it
 uses are available in 3.1, but until 3.9 kvm_gfn_to_hva_cache_init had
 one fewer parameter.
 
 The comments in this patch should help fixing kvm_lapic_set_vapic_addr
 in older kernels.  I will review the backports as they are posted to
 LKML (if I am CCed...).

This fails to build on 3.4-stable, could you provide a working backport
for that kernel?

thanks,

greg k-h
--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH backport hints for 3.10] KVM: x86: Convert vapic synchronization to _cached functions (CVE-2013-6368)

2013-12-16 Thread Greg KH
On Mon, Dec 16, 2013 at 12:38:17PM +0100, Paolo Bonzini wrote:
 The KVM patch fix vapic memory corruption applies to most kernels that
 have KVM, but the fix does not apply on many older branches.  The APIs it
 uses are available in 3.1, but until 3.9 kvm_gfn_to_hva_cache_init had
 one fewer parameter.
 
 The comments in this patch should help fixing kvm_lapic_set_vapic_addr
 in older kernels.  I will review the backports as they are posted to
 LKML (if I am CCed...).
 
 Signed-off-by: Paolo Bonzini pbonz...@redhat.com
 ---
  arch/x86/kvm/lapic.c | 38 +-
  arch/x86/kvm/lapic.h |  4 ++--
  arch/x86/kvm/x86.c   | 33 +
  3 files changed, 32 insertions(+), 43 deletions(-)

I don't understand, what are people supposed to do with this?

greg k-h
--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 3.12-rc7] KVM: Fix modprobe failure for kvm_intel/kvm_amd

2013-10-30 Thread Greg KH
On Wed, Oct 30, 2013 at 07:31:05PM +0530, Raghavendra K T wrote:
 On 10/30/2013 01:03 AM, Linus Torvalds wrote:
  On Tue, Oct 29, 2013 at 12:27 PM, Raghavendra K T
  raghavendra...@linux.vnet.ibm.com wrote:
 
  Could one solution be cascading actual error
  that is lost in fs/debugfs/inode.c:__create_file(), so that we could
  take correct action in case of failure of debugfs_create_dir()?
 
  (ugly side is we increase total number of params for __create_file to
  6). or I hope there could be some better solution.
 
  The solution to this would be to simply return an error-pointer. See
  linux/err.h. That's what we do for most complex subsystems that
  return a pointer to a struct: rather than returning NULL as an
  error, return the actual error number encoded in the pointer itself.
 
 Thank you Linus. Yes, this would have been ideal.
 
 
  But that would require every user of debugfs_create_dir() to be
  updated to check errors using IS_ERR() instead of checking against
  NULL, and there's quite a few of them.
 
  So I think just making the error be EEXIST is a simpler solution right now.
 
 
 Agree.
 I had below patch, and just before sending as formal mail I saw
 Paolo's pull request which already took care of it.
 ---8---
 
 

 From ac5d9f038c273f27bea7a54aab6af79b57f56317 Mon Sep 17 00:00:00 2001
 From: Raghavendra K T raghavendra...@linux.vnet.ibm.com
 Date: Wed, 30 Oct 2013 18:59:46 +0530
 Subject: [PATCH] Return EEXIST on debugfs_create_dir failure in kvm
 
 As quoted by Linus,  EFAULT means user passed in an invalid
 virtual address pointer, which is why the error string is Bad address.
 But when a debugfs directory creation fails, the above error is not valid.
 
 Signed-off-by: Raghavendra K T raghavendra...@linux.vnet.ibm.com
 ---
  I understand that Tim's patch that renames directory to something like
  kvm-pv would solve kvm-amd/kvm-intel modules insertion problem. 
  This patch is to address error code  change complained by Linus.
 
  arch/x86/kernel/kvm.c | 2 +-
  virt/kvm/kvm_main.c   | 3 ++-
  2 files changed, 3 insertions(+), 2 deletions(-)
 
 diff --git a/arch/x86/kernel/kvm.c b/arch/x86/kernel/kvm.c
 index a0e2a8a..e475fdb 100644
 --- a/arch/x86/kernel/kvm.c
 +++ b/arch/x86/kernel/kvm.c
 @@ -622,7 +622,7 @@ static int __init kvm_spinlock_debugfs(void)
  
   d_kvm = kvm_init_debugfs();
   if (d_kvm == NULL)
 - return -ENOMEM;
 + return -EEXIST;

Why even error out at all here?  You should never care about debugfs
file creation return values, so why pass any error back up the stack?

greg k-h
--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 3.12-rc7] KVM: Fix modprobe failure for kvm_intel/kvm_amd

2013-10-30 Thread Greg KH
On Wed, Oct 30, 2013 at 04:46:28PM +0100, Paolo Bonzini wrote:
 Il 30/10/2013 16:39, Raghavendra K T ha scritto:
 
  Why even error out at all here?  You should never care about debugfs
  file creation return values, so why pass any error back up the stack?
  
  We could change this to return 0, but we will still be left with
  kvm_main.c: kvm_init_debug() function which creates the kvm  debugfs
  directory. (I thought Paolo and Gleb's ack would be needed for
  that change since it would be a bigger decision for me)
  
  So I just made the patch to fix only Linus's concern.
 
 Even if it is okay to exit and not create the files (and I think it's a
 bit surprising), I'd like at least a printk to signal what's happening.
  But there should be no reason for debugfs directory creation to fail in
 the end, except for basic mistakes such as the one that Tim reported, so
 I think it's better to report the failure.

Creation will fail if debugfs is not enabled, so spiting out printk
messages in that case is not a good idea.

greg k-h
--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 3.12-rc7] KVM: Fix modprobe failure for kvm_intel/kvm_amd

2013-10-30 Thread Greg KH
On Wed, Oct 30, 2013 at 05:08:09PM +0100, Paolo Bonzini wrote:
 Il 30/10/2013 16:59, Greg KH ha scritto:
   Even if it is okay to exit and not create the files (and I think it's a
   bit surprising), I'd like at least a printk to signal what's happening.
But there should be no reason for debugfs directory creation to fail in
   the end, except for basic mistakes such as the one that Tim reported, so
   I think it's better to report the failure.
  Creation will fail if debugfs is not enabled, so spiting out printk
  messages in that case is not a good idea.
 
 Interestingly, if debugfs is not enabled we are already returning an
 error-valued pointer:
 
 static inline struct dentry *debugfs_create_dir(const char *name,
 struct dentry *parent)
 {
 return ERR_PTR(-ENODEV);
 }
 
 which would oops a lot of the current callers.

It will oops?  Really?  Where?  That shouldn't happen at all.

 Very few places use the currently correct idiom
 
   if (IS_ERR(root) || !root)
 
 but it's very ugly...  Perhaps debugfs_create_dir *should* return an
 error-valued pointer after all.

Or just don't care about the return value, and all will work out just
fine, right?

thanks,

greg k-h
--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 3.12-rc7] KVM: Fix modprobe failure for kvm_intel/kvm_amd

2013-10-29 Thread Greg KH
On Wed, Oct 30, 2013 at 12:57:32AM +0530, Raghavendra K T wrote:
 Adding Greg/AI too since we touch debugfs code.

You do?

 [...]
 
 sudo modprobe kvm_amd
 modprobe: ERROR: could not insert 'kvm_amd': Bad address
 
 Bad address? Christ people, are you guys making up error numbers
 with some kind of dice-roll? I can just see it now, somebody sitting
 there with a D20, playing some kind of kernel-specific DD, and
 rolling a ten means that you get to slay the orc, and pick an error
 number of EFAULT for some random kernel function. Because quite
 frankly, random dice roll is the _only_ thing that explains Bad
 address sufficiently.
 
 Please, whoever wrote virt/kvm/kvm_main.c:: kvm_init_debug(), WTF?
 EFAULT means user passed in an invalid virtual address pointer,
 which is why the error string is Bad address. It makes absolutely NO
 SENSE here. Perhaps EEXIST or EBUSY.
 
 
 Right. In current scenario it should have been EEXIST :(.
 
 debugfs_create_dir() currently returns NULL dentry on both
 EEXIST, ENOMEM ... cases.
 
 Could one solution be cascading actual error
 that is lost in fs/debugfs/inode.c:__create_file(), so that we could
 take correct action in case of failure of debugfs_create_dir()?

What would you do here?  You shouldn't really care about debugfs files,
if there's an error, keep on going, no code path should really care,
right?

thanks,

greg k-h
--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] intel-iommu: Fix leaks in pagetable freeing

2013-10-06 Thread Greg KH
On Wed, Oct 02, 2013 at 10:44:31AM +0200, Borislav Petkov wrote:
 On Sat, Jun 15, 2013 at 10:27:19AM -0600, Alex Williamson wrote:
  At best the current code only seems to free the leaf pagetables and
  the root.  If you're unlucky enough to have a large gap (like any
  QEMU guest with more than 3G of memory), only the first chunk of leaf
  pagetables are freed (plus the root).  This is a massive memory leak.
  This patch re-writes the pagetable freeing function to use a
  recursive algorithm and manages to not only free all the pagetables,
  but does it without any apparent performance loss versus the current
  broken version.
  
  Signed-off-by: Alex Williamson alex.william...@redhat.com
  Cc: sta...@vger.kernel.org
  ---
  
  Suggesting for stable, would like to see some soak time, but it's
  hard to imagine this being any worse than the current code.
 
 Btw, I have a backport for the 3.0.x series which builds fine here, in
 case you guys are interested :)

Thanks, now applied.

greg k-h
--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [stable-3.4] possibly revert KVM: X86 emulator: fix source operand decoding...

2013-09-12 Thread Greg KH
On Tue, Sep 10, 2013 at 01:05:21PM +0200, Paolo Bonzini wrote:
 Il 04/09/2013 18:44, Paul Gortmaker ha scritto:
  Hi Greg,
  
  The 3.4.44+ cherry pick:
  

commit 5b5b30580218eae22609989546bac6e44d0eda6e
Author: Gleb Natapov g...@redhat.com
Date:   Wed Apr 24 13:38:36 2013 +0300
  
  KVM: X86 emulator: fix source operand decoding for 8bit mov[zs]x 
  instructions
  
  commit 660696d1d16a71e15549ce1bf74953be1592bcd3 upstream.
  
  Source operand for one byte mov[zs]x is decoded incorrectly if it is in
  high byte register. Fix that.
  
  Signed-off-by: Gleb Natapov g...@redhat.com
  Signed-off-by: Greg Kroah-Hartman gre...@linuxfoundation.org

  
  introduces the following:
  
  arch/x86/kvm/emulate.c: In function ‘decode_operand’:
  arch/x86/kvm/emulate.c:3974:4: warning: passing argument 1 of 
  ‘decode_register’ makes integer from pointer without a cast [enabled by 
  default]
  arch/x86/kvm/emulate.c:789:14: note: expected ‘u8’ but argument is of type 
  ‘struct x86_emulate_ctxt *’
  arch/x86/kvm/emulate.c:3974:4: warning: passing argument 2 of 
  ‘decode_register’ makes pointer from integer without a cast [enabled by 
  default]
  arch/x86/kvm/emulate.c:789:14: note: expected ‘long unsigned int *’ but 
  argument is of type ‘u8’
  
  Based on the severity of the warnings above, I'm reasonably sure there will
  be some kind of runtime regressions due to this, but I stopped to 
  investigate
  the warnings as soon as I saw them, before any run time testing.
  
  It happens because mainline v3.7-rc1~113^2~40 (dd856efafe60) does this:
  
  -static void *decode_register(u8 modrm_reg, unsigned long *regs,
  +static void *decode_register(struct x86_emulate_ctxt *ctxt, u8 modrm_reg,
  
  Since 660696d1d16a71e1 was only applied to stable 3.4, 3.8, and 3.9 -- and
  the prerequisite above is in 3.7+, the issue should be limited to 3.4.44+
 
 Right, the fix is not important to have for 3.4 kernels.

Thanks for letting me know, I've now reverted it.

greg k-h
--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCHSET] kthread_worker: reimplement flush_kthread_work() to allow freeing during execution

2012-09-27 Thread Greg KH
On Mon, Sep 17, 2012 at 01:28:50PM -0700, Greg KH wrote:
 On Mon, Sep 17, 2012 at 12:40:16PM -0700, Tejun Heo wrote:
  On Fri, Sep 14, 2012 at 03:50:40PM -0700, Colin Cross wrote:
   This patch set fixes a reproducible crash I'm seeing on a 3.4.10
   kernel.  flush_kthread_worker (which is different from
   flush_kthread_work) is initializing a kthread_work and a completion on
   the stack, then queuing it and calling wait_for_completion.  Once the
   completion is signaled, flush_kthread_worker exits and the stack
   region used by the kthread_work may be immediately reused by another
   object on the stack, but kthread_worker_fn continues accessing its
   work pointer:
   work-func(work); - calls complete,
   effectively frees work
   smp_wmb();  /* wmb worker-b0 paired with flush-b1 */
   work-done_seq = work-queue_seq;   - overwrites a
   new stack object
   smp_mb();   /* mb worker-b1 paired with flush-b0 */
   if (atomic_read(work-flushing))
   wake_up_all(work-done);  - or crashes here
   
   These patches fix the problem by not accessing work after work-func
   is called, and should be backported to stable.  They apply cleanly to
   3.4.10.  Upstream commits are 9a2e03d8ed518a61154f18d83d6466628e519f94
   and 46f3d976213452350f9d10b0c2780c2681f7075b.
  
  Yeah, you're right.  I wonder why this didn't come up before.  Greg,
  can you please pick up these two commits?
 
 Ok, will do, thanks for letting me know.

Now applied, thanks.

greg k-h
--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCHSET] kthread_worker: reimplement flush_kthread_work() to allow freeing during execution

2012-09-17 Thread Greg KH
On Mon, Sep 17, 2012 at 12:40:16PM -0700, Tejun Heo wrote:
 On Fri, Sep 14, 2012 at 03:50:40PM -0700, Colin Cross wrote:
  This patch set fixes a reproducible crash I'm seeing on a 3.4.10
  kernel.  flush_kthread_worker (which is different from
  flush_kthread_work) is initializing a kthread_work and a completion on
  the stack, then queuing it and calling wait_for_completion.  Once the
  completion is signaled, flush_kthread_worker exits and the stack
  region used by the kthread_work may be immediately reused by another
  object on the stack, but kthread_worker_fn continues accessing its
  work pointer:
  work-func(work); - calls complete,
  effectively frees work
  smp_wmb();  /* wmb worker-b0 paired with flush-b1 */
  work-done_seq = work-queue_seq;   - overwrites a
  new stack object
  smp_mb();   /* mb worker-b1 paired with flush-b0 */
  if (atomic_read(work-flushing))
  wake_up_all(work-done);  - or crashes here
  
  These patches fix the problem by not accessing work after work-func
  is called, and should be backported to stable.  They apply cleanly to
  3.4.10.  Upstream commits are 9a2e03d8ed518a61154f18d83d6466628e519f94
  and 46f3d976213452350f9d10b0c2780c2681f7075b.
 
 Yeah, you're right.  I wonder why this didn't come up before.  Greg,
 can you please pick up these two commits?

Ok, will do, thanks for letting me know.

greg k-h
--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: linux-next: manual merge of the staging tree with the target-merge tree

2012-07-23 Thread Greg KH
On Sun, Jul 22, 2012 at 02:44:23PM -0700, Nicholas A. Bellinger wrote:
 So Linus has merged target-pending/for-next this afternoon, so now we
 are just waiting on net-next to hit mainline with the vhost patches
 already ACK'ed by MST.  Hopefully that makes things easier for you to
 considering taking tcm_vhost upstream via staging.  ;)
 
 Also, MST asked for an RFC-v5 for the initial merge commit with some
 minor debug wrapper changes that will be going out next week.  This will
 include a move into drivers/staging/tcm_vhost/ against a rebased
 staging.git patch with the necessary -rc0 mainline dependencies.
 
 Please let me know if your OK with this, otherwise I'll just plan to
 keep -v5 against target-pending/for-next-merge for now, and send a GIT
 PULL after MST gets back from holiday on the 29th - 30th.

I have no idea what any of the above three paragraphs are asking for, or
talking about, sorry.

Note, the merge window is closed for taking new stuff into the staging
tree.  If it isn't already in my staging-next tree, it isn't going to go
into 3.6 unless it's bug fixes, sorry.  How about we figure all of this
out after 3.6-rc1 is out so we can understand what is going on for 3.7?

thanks,

greg k-h
--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v4 3/3] Documentation: Add ABI entry for vmcs sysfs interface

2012-07-04 Thread Greg KH
On Wed, Jul 04, 2012 at 06:06:28PM +0800, Yanfei Zhang wrote:
 Signed-off-by: zhangyanfei zhangyan...@cn.fujitsu.com
 ---
  Documentation/ABI/testing/sysfs-devices-system-cpu |   21 
 
  1 files changed, 21 insertions(+), 0 deletions(-)
 
 diff --git a/Documentation/ABI/testing/sysfs-devices-system-cpu 
 b/Documentation/ABI/testing/sysfs-devices-system-cpu
 index 5dab364..6efbd6c 100644
 --- a/Documentation/ABI/testing/sysfs-devices-system-cpu
 +++ b/Documentation/ABI/testing/sysfs-devices-system-cpu
 @@ -9,6 +9,27 @@ Description:
  
   /sys/devices/system/cpu/cpu#/
  
 +What:/sys/devices/system/cpu/vmcs_id
 +Date:June 2012
 +KernelVersion:   3.5.0

3.5.0 will not have this feature in it, so you should probably change
these lines in this patch.

greg k-h
--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v4 3/3] Documentation: Add ABI entry for vmcs sysfs interface

2012-07-04 Thread Greg KH
On Wed, Jul 04, 2012 at 06:06:28PM +0800, Yanfei Zhang wrote:
 Signed-off-by: zhangyanfei zhangyan...@cn.fujitsu.com
 ---
  Documentation/ABI/testing/sysfs-devices-system-cpu |   21 
 
  1 files changed, 21 insertions(+), 0 deletions(-)
 
 diff --git a/Documentation/ABI/testing/sysfs-devices-system-cpu 
 b/Documentation/ABI/testing/sysfs-devices-system-cpu
 index 5dab364..6efbd6c 100644
 --- a/Documentation/ABI/testing/sysfs-devices-system-cpu
 +++ b/Documentation/ABI/testing/sysfs-devices-system-cpu
 @@ -9,6 +9,27 @@ Description:
  
   /sys/devices/system/cpu/cpu#/
  
 +What:/sys/devices/system/cpu/vmcs_id

Wait, aren't these a per-cpu value?  You have them as a global value
for all cpus, is that really the case?  Shouldn't they be under
/sys/devices/system/cpu/cpu#/ instead?

thanks,

greg k-h
--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v3 4/5] Sysfs: Export VMCSINFO via sysfs

2012-06-28 Thread Greg KH
On Thu, Jun 28, 2012 at 05:54:30PM +0800, Yanfei Zhang wrote:
 于 2012年06月28日 03:22, Greg KH 写道:
  On Wed, Jun 27, 2012 at 04:54:54PM +0800, Yanfei Zhang wrote:
  This patch export offsets of fields via /sys/devices/cpu/vmcs/.
  Individual offsets are contained in subfiles named by the filed's
  encoding, e.g.: /sys/devices/cpu/vmcs/0800
 
  Signed-off-by: zhangyanfei zhangyan...@cn.fujitsu.com
  ---
   drivers/base/core.c |   13 +
   1 files changed, 13 insertions(+), 0 deletions(-)
 
  diff --git a/drivers/base/core.c b/drivers/base/core.c
  index 346be8b..dd05ee7 100644
  --- a/drivers/base/core.c
  +++ b/drivers/base/core.c
  @@ -26,6 +26,7 @@
   #include linux/async.h
   #include linux/pm_runtime.h
   #include linux/netdevice.h
  +#include asm/vmcsinfo.h
  
  Did you just break the build on all other arches?  Not nice.
  
  @@ -1038,6 +1039,11 @@ int device_add(struct device *dev)
 error = dpm_sysfs_add(dev);
 if (error)
 goto DPMError;
  +#if defined(CONFIG_KVM_INTEL) || defined(CONFIG_KVM_INTEL_MODULE)
  +  error = vmcs_sysfs_add(dev);
  +  if (error)
  +  goto VMCSError;
  +#endif
  
  Oh my no, that's no way to ever do this, you know better than that,
  please fix.
  
  greg k-h
  
 
 Sorry for my thoughtless, Here is the new patch.
 
 ---
  drivers/base/core.c |   13 +
  1 files changed, 13 insertions(+), 0 deletions(-)
 
 diff --git a/drivers/base/core.c b/drivers/base/core.c
 index 346be8b..7b5266a 100644
 --- a/drivers/base/core.c
 +++ b/drivers/base/core.c
 @@ -30,6 +30,13 @@
  #include base.h
  #include power/power.h
  
 +#if defined(CONFIG_KVM_INTEL) || defined(CONFIG_KVM_INTEL_MODULE)
 +#include asm/vmcsinfo.h
 +#else
 +static inline int vmcs_sysfs_add(struct device *dev) { return 0; }
 +static inline void vmcs_sysfs_remove(struct device *dev) { }
 +#endif

{sigh}  No, again, you know better, don't do this.

greg k-h
--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v3 4/5] Sysfs: Export VMCSINFO via sysfs

2012-06-28 Thread Greg KH
On Thu, Jun 28, 2012 at 04:37:38AM -0700, Greg KH wrote:
 On Thu, Jun 28, 2012 at 05:54:30PM +0800, Yanfei Zhang wrote:
  于 2012年06月28日 03:22, Greg KH 写道:
   On Wed, Jun 27, 2012 at 04:54:54PM +0800, Yanfei Zhang wrote:
   This patch export offsets of fields via /sys/devices/cpu/vmcs/.
   Individual offsets are contained in subfiles named by the filed's
   encoding, e.g.: /sys/devices/cpu/vmcs/0800
  
   Signed-off-by: zhangyanfei zhangyan...@cn.fujitsu.com
   ---
drivers/base/core.c |   13 +
1 files changed, 13 insertions(+), 0 deletions(-)
  
   diff --git a/drivers/base/core.c b/drivers/base/core.c
   index 346be8b..dd05ee7 100644
   --- a/drivers/base/core.c
   +++ b/drivers/base/core.c
   @@ -26,6 +26,7 @@
#include linux/async.h
#include linux/pm_runtime.h
#include linux/netdevice.h
   +#include asm/vmcsinfo.h
   
   Did you just break the build on all other arches?  Not nice.
   
   @@ -1038,6 +1039,11 @@ int device_add(struct device *dev)
error = dpm_sysfs_add(dev);
if (error)
goto DPMError;
   +#if defined(CONFIG_KVM_INTEL) || defined(CONFIG_KVM_INTEL_MODULE)
   +error = vmcs_sysfs_add(dev);
   +if (error)
   +goto VMCSError;
   +#endif
   
   Oh my no, that's no way to ever do this, you know better than that,
   please fix.
   
   greg k-h
   
  
  Sorry for my thoughtless, Here is the new patch.
  
  ---
   drivers/base/core.c |   13 +
   1 files changed, 13 insertions(+), 0 deletions(-)
  
  diff --git a/drivers/base/core.c b/drivers/base/core.c
  index 346be8b..7b5266a 100644
  --- a/drivers/base/core.c
  +++ b/drivers/base/core.c
  @@ -30,6 +30,13 @@
   #include base.h
   #include power/power.h
   
  +#if defined(CONFIG_KVM_INTEL) || defined(CONFIG_KVM_INTEL_MODULE)
  +#include asm/vmcsinfo.h
  +#else
  +static inline int vmcs_sysfs_add(struct device *dev) { return 0; }
  +static inline void vmcs_sysfs_remove(struct device *dev) { }
  +#endif
 
 {sigh}  No, again, you know better, don't do this.

Ok, as others have rightly pointed out, this wasn't the most helpful
review comment, sorry about that.

In Linux, we don't put ifdefs in .c files, we put them in .h files.  See
many examples of this all over the place.  That's my main complaints the
past two times of this patch.

But, for this, I would question why you even want / need to do this in
the drivers/base/core/ file in the first place.  Shouldn't it be in some
arch or cpu specific file instead that already handles the cpu files?

thanks,

greg k-h
--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v3 4/5] Sysfs: Export VMCSINFO via sysfs

2012-06-27 Thread Greg KH
On Wed, Jun 27, 2012 at 04:54:54PM +0800, Yanfei Zhang wrote:
 This patch export offsets of fields via /sys/devices/cpu/vmcs/.
 Individual offsets are contained in subfiles named by the filed's
 encoding, e.g.: /sys/devices/cpu/vmcs/0800
 
 Signed-off-by: zhangyanfei zhangyan...@cn.fujitsu.com
 ---
  drivers/base/core.c |   13 +
  1 files changed, 13 insertions(+), 0 deletions(-)
 
 diff --git a/drivers/base/core.c b/drivers/base/core.c
 index 346be8b..dd05ee7 100644
 --- a/drivers/base/core.c
 +++ b/drivers/base/core.c
 @@ -26,6 +26,7 @@
  #include linux/async.h
  #include linux/pm_runtime.h
  #include linux/netdevice.h
 +#include asm/vmcsinfo.h

Did you just break the build on all other arches?  Not nice.

 @@ -1038,6 +1039,11 @@ int device_add(struct device *dev)
   error = dpm_sysfs_add(dev);
   if (error)
   goto DPMError;
 +#if defined(CONFIG_KVM_INTEL) || defined(CONFIG_KVM_INTEL_MODULE)
 + error = vmcs_sysfs_add(dev);
 + if (error)
 + goto VMCSError;
 +#endif

Oh my no, that's no way to ever do this, you know better than that,
please fix.

greg k-h
--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 01/13] driver core: Add iommu_group tracking to struct device

2012-05-11 Thread Greg KH
On Fri, May 11, 2012 at 04:55:35PM -0600, Alex Williamson wrote:
 IOMMU groups allow IOMMU drivers to represent DMA visibility
 and isolation of devices.  Multiple devices may be grouped
 together for the purposes of DMA.  Placing a pointer on
 struct device enable easy access for things like streaming
 DMA programming and drivers like VFIO.
 
 Signed-off-by: Alex Williamson alex.william...@redhat.com

Can't you get this today from the iommu_ops pointer that is on the bus
that the device is associated with?  Or can devices on a bus have
different iommu_group pointers?

thanks,

greg k-h
--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 02/13] iommu: IOMMU Groups

2012-05-11 Thread Greg KH
On Fri, May 11, 2012 at 04:55:41PM -0600, Alex Williamson wrote:
 IOMMU device groups are currently a rather vague associative notion
 with assembly required by the user or user level driver provider to
 do anything useful.  This patch intends to grow the IOMMU group concept
 into something a bit more consumable.
 
 To do this, we first create an object representing the group, struct
 iommu_group.  This structure is allocated (iommu_group_alloc) and
 filled (iommu_group_add_device) by the iommu driver.  The iommu driver
 is free to add devices to the group using it's own set of policies.
 This allows inclusion of devices based on physical hardware or topology
 limitations of the platform, as well as soft requirements, such as
 multi-function trust levels or peer-to-peer protection of the
 interconnects.  Each device may only belong to a single iommu group,
 which is linked from struct device.iommu_group.  IOMMU groups are
 maintained using kobject reference counting, allowing for automatic
 removal of empty, unreferenced groups.  It is the responsibility of
 the iommu driver to remove devices from the group
 (iommu_group_remove_device).
 
 IOMMU groups also include a userspace representation in sysfs under
 /sys/kernel/iommu_groups.  When allocated, each group is given a
 dynamically assign ID (int).  The ID is managed by the core IOMMU group
 code to support multiple heterogeneous iommu drivers, which could
 potentially collide in group naming/numbering.  This also keeps group
 IDs to small, easily managed values.  A directory is created under
 /sys/kernel/iommu_groups for each group.  A further subdirectory named
 devices contains links to each device within the group.  The iommu_group
 file in the device's sysfs directory, which formerly contained a group
 number when read, is now a link to the iommu group.  Example:
 
 $ ls -l /sys/kernel/iommu_groups/26/devices/

snip

As you are creating new sysfs files and directories, you need to also
add the proper Documentation/ABI/ files at the same time.

thanks,

greg k-h
--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 01/13] driver core: Add iommu_group tracking to struct device

2012-05-11 Thread Greg KH
On Fri, May 11, 2012 at 05:58:01PM -0600, Alex Williamson wrote:
 On Fri, 2012-05-11 at 16:38 -0700, Greg KH wrote:
  On Fri, May 11, 2012 at 04:55:35PM -0600, Alex Williamson wrote:
   IOMMU groups allow IOMMU drivers to represent DMA visibility
   and isolation of devices.  Multiple devices may be grouped
   together for the purposes of DMA.  Placing a pointer on
   struct device enable easy access for things like streaming
   DMA programming and drivers like VFIO.
   
   Signed-off-by: Alex Williamson alex.william...@redhat.com
  
  Can't you get this today from the iommu_ops pointer that is on the bus
  that the device is associated with?  Or can devices on a bus have
  different iommu_group pointers?
 
 The latter, each device on a bus might be it's own group.  This is often
 the case on x86 unless PCIe-to-PCI bridges obscure the device
 visibility.  Thanks,

Ah, ok, then I have no objection to add this to struct device:

Acked-by: Greg Kroah-Hartman gre...@linuxfoundation.org

--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 3/4] ksysfs: export VMCSINFO via sysfs

2012-04-16 Thread Greg KH
On Tue, Apr 17, 2012 at 09:52:42AM +0800, zhangyanfei wrote:
 于 2012年04月13日 07:00, Greg KH 写道:
  On Wed, Apr 11, 2012 at 09:57:34AM +0800, zhangyanfei wrote:
  This patch creates sysfs file to export where VMCSINFO is allocated,
  as below:
  $ cat /sys/kernel/vmcsinfo
  1cb88a0 2000
  number on the left-hand side is the physical address of VMCSINFO,
  while the one on the right-hand side is the max size of VMCSINFO.
  
  Ick, why do you have 2 values in one sysfs file, that's not nice, or
  good.
  
  What's wrong with 2 different files?
  
 
 The reason why I put the 2 values in one sysfs file is that there is a similar
 file 'vmcoreinfo' in sysfs.
 # cat /sys/kernel/vmcoreinfo 
 1d75380 1000

Then that should be fixed as well, using two different file names now :(

thanks,

greg k-h
--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 3/4] ksysfs: export VMCSINFO via sysfs

2012-04-12 Thread Greg KH
On Wed, Apr 11, 2012 at 09:57:34AM +0800, zhangyanfei wrote:
 This patch creates sysfs file to export where VMCSINFO is allocated,
 as below:
 $ cat /sys/kernel/vmcsinfo
 1cb88a0 2000
 number on the left-hand side is the physical address of VMCSINFO,
 while the one on the right-hand side is the max size of VMCSINFO.

Ick, why do you have 2 values in one sysfs file, that's not nice, or
good.

What's wrong with 2 different files?

Also, any new sysfs file you add needs to also have a Documentation/ABI
entry added as well.

But we can't accept this as-is, sorry, please split it up into 2 files.

greg k-h
--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v2 0/2] kvm: iommu unmap fixes

2012-04-11 Thread Greg KH
On Wed, Apr 11, 2012 at 09:51:43AM -0600, Alex Williamson wrote:
 This is a documentation change only from the previous version.
 After discussing it, there is a potential page leak as noted
 in the updated changelog for the first patch.  Thanks,

formletter

This is not the correct way to submit patches for inclusion in the
stable kernel tree.  Please read Documentation/stable_kernel_rules.txt
for how to do this properly.

/formletter
--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [Xen-devel] [PATCH RFC V5 1/6] debugfs: Add support to print u32 array in debugfs

2012-03-30 Thread Greg KH
On Fri, Mar 30, 2012 at 04:49:12PM -0400, Konrad Rzeszutek Wilk wrote:
 On Fri, Mar 23, 2012 at 01:36:28PM +0530, Raghavendra K T wrote:
  From: Srivatsa Vaddagiri va...@linux.vnet.ibm.com
  
  Move the code from Xen to debugfs to make the code common
  for other users as well.
  
  Signed-off-by: Srivatsa Vaddagiri va...@linux.vnet.ibm.com
  Signed-off-by: Suzuki Poulose suz...@in.ibm.com
  Signed-off-by: Raghavendra K T raghavendra...@linux.vnet.ibm.com
  Signed-off-by: Konrad Rzeszutek Wilk konrad.w...@oracle.com
 
 Greg,
 
 I was thinking to stick this patch in my queue, but I need your
 OK since it touches fs/debugfs/file.c.

That's fine with me:

Acked-by: Greg Kroah-Hartman gre...@linuxfoundation.org

--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [v3.0.y 1/2] KVM: x86: extend struct x86_emulate_ops with get_cpuid

2012-03-23 Thread Greg KH
On Thu, Mar 22, 2012 at 09:50:43AM +0100, Stefan Bader wrote:
 From eaee58e1433e1b16e686cfcdcbc207d4310a239f Mon Sep 17 00:00:00 2001
 From: =?UTF-8?q?Stephan=20B=C3=A4rwolf?= stephan.baerw...@tu-ilmenau.de
 Date: Thu, 12 Jan 2012 16:43:03 +0100
 Subject: [PATCH 7/8] KVM: x86: extend struct x86_emulate_ops with
  get_cpuid

Stefan, what's with the crappy header here?  And the Subject?  Ick.

Come on, forcing me to hand-edit your patches just makes me grumpy...

greg k-h
--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [v3.0.y 1/2] KVM: x86: extend struct x86_emulate_ops with get_cpuid

2012-03-23 Thread Greg KH
On Fri, Mar 23, 2012 at 06:47:38PM +0100, Stefan Bader wrote:
 On 23.03.2012 18:22, Greg KH wrote:
  On Thu, Mar 22, 2012 at 09:50:43AM +0100, Stefan Bader wrote:
  From eaee58e1433e1b16e686cfcdcbc207d4310a239f Mon Sep 17 00:00:00 2001
  From: =?UTF-8?q?Stephan=20B=C3=A4rwolf?=
  stephan.baerw...@tu-ilmenau.de Date: Thu, 12 Jan 2012 16:43:03 +0100 
  Subject: [PATCH 7/8] KVM: x86: extend struct x86_emulate_ops with 
  get_cpuid
  
  Stefan, what's with the crappy header here?  And the Subject?  Ick.
  
  Come on, forcing me to hand-edit your patches just makes me grumpy...
  
  greg k-h
 
 Greg, this is exactly the patch you get by using git format-patch and send it
 away with git send-email. So I would say any grumpy or constructive comments
 should go to the git development mailing list. ;)

No it isn't at all.  That's not how the patch looks using git-send-email
for anything I've ever sent out.

Don't blame the tool, use it correctly.

Hint, look at the 7/8 crap, what is that there for?

greg k-h
--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: CVE-2012-0045 for 3.2.y, 3.0.y and 2.6.32.y (again)

2012-03-22 Thread Greg KH
On Thu, Mar 22, 2012 at 09:50:40AM +0100, Stefan Bader wrote:
 Resubmitting with more of the proper maintainers
 subscribed (note that Marcelo is one of them)...

Again, for the others on the cc:, I can't take these in the stable
tree(s) until I get an ack from the maintainers of the code.

Especially given that the KVM developers have told me in the past to not
take any patches unless they send them to me themselves...

greg k-h
--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 1/1 v3] PCI: Device specific reset function

2012-03-05 Thread Greg KH
On Mon, Mar 05, 2012 at 10:00:49AM +, Tadeusz Struk wrote:
 
 ---
  drivers/pci/pci.h|1 +
  drivers/pci/quirks.c |   33 +++--
  include/linux/pci.h  |1 +
  3 files changed, 29 insertions(+), 6 deletions(-)

Please read Documentation/SubmittingPatches for how to properly create,
and send, patches that are in a format that can be accepted.

Hint, also run your patch through scripts/checkpatch.pl to find the
obvious problems that are in it, to keep us from having to do that for
you...

 
 This e-mail and any attachments may contain confidential material for the 
 sole use of the intended recipient(s). Any review or distribution by others 
 is strictly prohibited. If you are not the intended recipient, please contact 
 the sender and delete all copies.

I have been told that such email footers require that the patch be
deleted and never be accepted.  Please fix this if you expect your
patches to be able to be applied.

greg k-h
--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[18/48] KVM: Device assignment permission checks

2012-01-16 Thread Greg KH
3.1-stable review patch.  If anyone has any objections, please let me know.

--


From: Alex Williamson alex.william...@redhat.com

(cherry picked from commit 3d27e23b17010c668db311140b1770c78fb9)

Only allow KVM device assignment to attach to devices which:

 - Are not bridges
 - Have BAR resources (assume others are special devices)
 - The user has permissions to use

Assigning a bridge is a configuration error, it's not supported, and
typically doesn't result in the behavior the user is expecting anyway.
Devices without BAR resources are typically chipset components that
also don't have host drivers.  We don't want users to hold such devices
captive or cause system problems by fencing them off into an iommu
domain.  We determine permission to use by testing whether the user
has access to the PCI sysfs resource files.  By default a normal user
will not have access to these files, so it provides a good indication
that an administration agent has granted the user access to the device.

[Yang Bai: add missing #include]
[avi: fix comment style]

Signed-off-by: Alex Williamson alex.william...@redhat.com
Signed-off-by: Yang Bai hamo...@gmail.com
Signed-off-by: Marcelo Tosatti mtosa...@redhat.com
Signed-off-by: Greg Kroah-Hartman gre...@suse.de
---
 Documentation/virtual/kvm/api.txt |4 ++
 virt/kvm/assigned-dev.c   |   75 ++
 2 files changed, 79 insertions(+)

--- a/Documentation/virtual/kvm/api.txt
+++ b/Documentation/virtual/kvm/api.txt
@@ -1134,6 +1134,10 @@ following flags are specified:
 The KVM_DEV_ASSIGN_ENABLE_IOMMU flag is a mandatory option to ensure
 isolation of the device.  Usages not specifying this flag are deprecated.
 
+Only PCI header type 0 devices with PCI BAR resources are supported by
+device assignment.  The user requesting this ioctl must have read/write
+access to the PCI sysfs resource files associated with the device.
+
 4.49 KVM_DEASSIGN_PCI_DEVICE
 
 Capability: KVM_CAP_DEVICE_DEASSIGNMENT
--- a/virt/kvm/assigned-dev.c
+++ b/virt/kvm/assigned-dev.c
@@ -17,6 +17,8 @@
 #include linux/pci.h
 #include linux/interrupt.h
 #include linux/slab.h
+#include linux/namei.h
+#include linux/fs.h
 #include irq.h
 
 static struct kvm_assigned_dev_kernel *kvm_find_assigned_dev(struct list_head 
*head,
@@ -474,12 +476,73 @@ out:
return r;
 }
 
+/*
+ * We want to test whether the caller has been granted permissions to
+ * use this device.  To be able to configure and control the device,
+ * the user needs access to PCI configuration space and BAR resources.
+ * These are accessed through PCI sysfs.  PCI config space is often
+ * passed to the process calling this ioctl via file descriptor, so we
+ * can't rely on access to that file.  We can check for permissions
+ * on each of the BAR resource files, which is a pretty clear
+ * indicator that the user has been granted access to the device.
+ */
+static int probe_sysfs_permissions(struct pci_dev *dev)
+{
+#ifdef CONFIG_SYSFS
+   int i;
+   bool bar_found = false;
+
+   for (i = PCI_STD_RESOURCES; i = PCI_STD_RESOURCE_END; i++) {
+   char *kpath, *syspath;
+   struct path path;
+   struct inode *inode;
+   int r;
+
+   if (!pci_resource_len(dev, i))
+   continue;
+
+   kpath = kobject_get_path(dev-dev.kobj, GFP_KERNEL);
+   if (!kpath)
+   return -ENOMEM;
+
+   /* Per sysfs-rules, sysfs is always at /sys */
+   syspath = kasprintf(GFP_KERNEL, /sys%s/resource%d, kpath, i);
+   kfree(kpath);
+   if (!syspath)
+   return -ENOMEM;
+
+   r = kern_path(syspath, LOOKUP_FOLLOW, path);
+   kfree(syspath);
+   if (r)
+   return r;
+
+   inode = path.dentry-d_inode;
+
+   r = inode_permission(inode, MAY_READ | MAY_WRITE | MAY_ACCESS);
+   path_put(path);
+   if (r)
+   return r;
+
+   bar_found = true;
+   }
+
+   /* If no resources, probably something special */
+   if (!bar_found)
+   return -EPERM;
+
+   return 0;
+#else
+   return -EINVAL; /* No way to control the device without sysfs */
+#endif
+}
+
 static int kvm_vm_ioctl_assign_device(struct kvm *kvm,
  struct kvm_assigned_pci_dev *assigned_dev)
 {
int r = 0, idx;
struct kvm_assigned_dev_kernel *match;
struct pci_dev *dev;
+   u8 header_type;
 
if (!(assigned_dev-flags  KVM_DEV_ASSIGN_ENABLE_IOMMU))
return -EINVAL;
@@ -510,6 +573,18 @@ static int kvm_vm_ioctl_assign_device(st
r = -EINVAL;
goto out_free;
}
+
+   /* Don't allow bridges to be assigned */
+   pci_read_config_byte(dev, PCI_HEADER_TYPE, header_type);
+   if 

[17/48] KVM: Remove ability to assign a device without iommu support

2012-01-16 Thread Greg KH
3.1-stable review patch.  If anyone has any objections, please let me know.

--


From: Alex Williamson alex.william...@redhat.com

(cherry picked from commit 423873736b78f549fbfa2f715f2e4de7e6c5e1e9)

This option has no users and it exposes a security hole that we
can allow devices to be assigned without iommu protection.  Make
KVM_DEV_ASSIGN_ENABLE_IOMMU a mandatory option.

Signed-off-by: Alex Williamson alex.william...@redhat.com
Signed-off-by: Marcelo Tosatti mtosa...@redhat.com
Signed-off-by: Greg Kroah-Hartman gre...@suse.de
---
 Documentation/virtual/kvm/api.txt |3 +++
 virt/kvm/assigned-dev.c   |   18 +-
 2 files changed, 12 insertions(+), 9 deletions(-)

--- a/Documentation/virtual/kvm/api.txt
+++ b/Documentation/virtual/kvm/api.txt
@@ -1131,6 +1131,9 @@ following flags are specified:
 /* Depends on KVM_CAP_IOMMU */
 #define KVM_DEV_ASSIGN_ENABLE_IOMMU(1  0)
 
+The KVM_DEV_ASSIGN_ENABLE_IOMMU flag is a mandatory option to ensure
+isolation of the device.  Usages not specifying this flag are deprecated.
+
 4.49 KVM_DEASSIGN_PCI_DEVICE
 
 Capability: KVM_CAP_DEVICE_DEASSIGNMENT
--- a/virt/kvm/assigned-dev.c
+++ b/virt/kvm/assigned-dev.c
@@ -481,6 +481,9 @@ static int kvm_vm_ioctl_assign_device(st
struct kvm_assigned_dev_kernel *match;
struct pci_dev *dev;
 
+   if (!(assigned_dev-flags  KVM_DEV_ASSIGN_ENABLE_IOMMU))
+   return -EINVAL;
+
mutex_lock(kvm-lock);
idx = srcu_read_lock(kvm-srcu);
 
@@ -538,16 +541,14 @@ static int kvm_vm_ioctl_assign_device(st
 
list_add(match-list, kvm-arch.assigned_dev_head);
 
-   if (assigned_dev-flags  KVM_DEV_ASSIGN_ENABLE_IOMMU) {
-   if (!kvm-arch.iommu_domain) {
-   r = kvm_iommu_map_guest(kvm);
-   if (r)
-   goto out_list_del;
-   }
-   r = kvm_assign_device(kvm, match);
+   if (!kvm-arch.iommu_domain) {
+   r = kvm_iommu_map_guest(kvm);
if (r)
goto out_list_del;
}
+   r = kvm_assign_device(kvm, match);
+   if (r)
+   goto out_list_del;
 
 out:
srcu_read_unlock(kvm-srcu, idx);
@@ -587,8 +588,7 @@ static int kvm_vm_ioctl_deassign_device(
goto out;
}
 
-   if (match-flags  KVM_DEV_ASSIGN_ENABLE_IOMMU)
-   kvm_deassign_device(kvm, match);
+   kvm_deassign_device(kvm, match);
 
kvm_free_assigned_device(kvm, match);
 


--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[15/48] KVM guest: prevent tracing recursion with kvmclock

2012-01-16 Thread Greg KH
3.1-stable review patch.  If anyone has any objections, please let me know.

--

From: Avi Kivity a...@redhat.com

(cherry picked from commit 95ef1e52922cf75b1ea2eae54ef886f2cc47eecb)

Prevent tracing of preempt_disable() in get_cpu_var() in
kvm_clock_read(). When CONFIG_DEBUG_PREEMPT is enabled,
preempt_disable/enable() are traced and this causes the function_graph
tracer to go into an infinite recursion. By open coding the
preempt_disable() around the get_cpu_var(), we can use the notrace
version which prevents preempt_disable/enable() from being traced and
prevents the recursion.

Based on a similar patch for Xen from Jeremy Fitzhardinge.

Tested-by: Gleb Natapov g...@redhat.com
Acked-by: Steven Rostedt rost...@goodmis.org
Signed-off-by: Avi Kivity a...@redhat.com
Signed-off-by: Greg Kroah-Hartman gre...@suse.de
---
 arch/x86/kernel/kvmclock.c |5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)

--- a/arch/x86/kernel/kvmclock.c
+++ b/arch/x86/kernel/kvmclock.c
@@ -74,9 +74,10 @@ static cycle_t kvm_clock_read(void)
struct pvclock_vcpu_time_info *src;
cycle_t ret;
 
-   src = get_cpu_var(hv_clock);
+   preempt_disable_notrace();
+   src = __get_cpu_var(hv_clock);
ret = pvclock_clocksource_read(src);
-   put_cpu_var(hv_clock);
+   preempt_enable_notrace();
return ret;
 }
 


--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[16/48] KVM: x86: Prevent starting PIT timers in the absence of irqchip support

2012-01-16 Thread Greg KH
3.1-stable review patch.  If anyone has any objections, please let me know.

--


From: Jan Kiszka jan.kis...@siemens.com

(cherry picked from commit 0924ab2cfa98b1ece26c033d696651fd62896c69)

User space may create the PIT and forgets about setting up the irqchips.
In that case, firing PIT IRQs will crash the host:

BUG: unable to handle kernel NULL pointer dereference at 0128
IP: [a10f6280] kvm_set_irq+0x30/0x170 [kvm]
...
Call Trace:
 [a11228c1] pit_do_work+0x51/0xd0 [kvm]
 [81071431] process_one_work+0x111/0x4d0
 [81071bb2] worker_thread+0x152/0x340
 [81075c8e] kthread+0x7e/0x90
 [815a4474] kernel_thread_helper+0x4/0x10

Prevent this by checking the irqchip mode before starting a timer. We
can't deny creating the PIT if the irqchips aren't set up yet as
current user land expects this order to work.

Signed-off-by: Jan Kiszka jan.kis...@siemens.com
Signed-off-by: Marcelo Tosatti mtosa...@redhat.com
Signed-off-by: Greg Kroah-Hartman gre...@suse.de
---
 arch/x86/kvm/i8254.c |   10 +++---
 1 file changed, 7 insertions(+), 3 deletions(-)

--- a/arch/x86/kvm/i8254.c
+++ b/arch/x86/kvm/i8254.c
@@ -338,11 +338,15 @@ static enum hrtimer_restart pit_timer_fn
return HRTIMER_NORESTART;
 }
 
-static void create_pit_timer(struct kvm_kpit_state *ps, u32 val, int is_period)
+static void create_pit_timer(struct kvm *kvm, u32 val, int is_period)
 {
+   struct kvm_kpit_state *ps = kvm-arch.vpit-pit_state;
struct kvm_timer *pt = ps-pit_timer;
s64 interval;
 
+   if (!irqchip_in_kernel(kvm))
+   return;
+
interval = muldiv64(val, NSEC_PER_SEC, KVM_PIT_FREQ);
 
pr_debug(create pit timer, interval is %llu nsec\n, interval);
@@ -394,13 +398,13 @@ static void pit_load_count(struct kvm *k
 /* FIXME: enhance mode 4 precision */
case 4:
if (!(ps-flags  KVM_PIT_FLAGS_HPET_LEGACY)) {
-   create_pit_timer(ps, val, 0);
+   create_pit_timer(kvm, val, 0);
}
break;
case 2:
case 3:
if (!(ps-flags  KVM_PIT_FLAGS_HPET_LEGACY)){
-   create_pit_timer(ps, val, 1);
+   create_pit_timer(kvm, val, 1);
}
break;
default:


--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 0/4] KVM updates for Linux 3.1.8

2012-01-12 Thread Greg KH
On Thu, Jan 12, 2012 at 12:39:50PM +0200, Avi Kivity wrote:
 The following patches want to be applied to the 3.1 branch.  All users must
 update.

Do any of these also need to go to the 3.0 branch?

thanks,

greg k-h
--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 0/4] KVM updates for Linux 3.1.8

2012-01-12 Thread Greg KH
On Thu, Jan 12, 2012 at 06:25:43PM +0200, Avi Kivity wrote:
 On 01/12/2012 06:15 PM, Greg KH wrote:
  On Thu, Jan 12, 2012 at 12:39:50PM +0200, Avi Kivity wrote:
   The following patches want to be applied to the 3.1 branch.  All users 
   must
   update.
 
  Do any of these also need to go to the 3.0 branch?
 
 Yes.  But for kvm we regression test stable patches, and I haven't done
 this yet.  I find the normal stable process, while very convenient,
 frightening.

What do you mean by this?

And thanks for the patches, they are all now queued up.

greg k-h
--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH RFC V2 2/5] debugfs: Renaming of xen functions and change unsigned to u32

2011-10-24 Thread Greg KH
On Mon, Oct 24, 2011 at 02:58:47PM +0530, Raghavendra K T wrote:
 On 10/24/2011 03:49 AM, Greg KH wrote:
 On Mon, Oct 24, 2011 at 12:34:59AM +0530, Raghavendra K T wrote:
 Renaming of xen functions and change unsigned to u32.
 
 Why not just rename when you move the functions?  Why the extra step?
 
 Intention was only clarity. Yes, if this patch is an overhead, I 'll
 combine both the patches.

Yeah, it makes more sense as it originally confused me why you were
adding a xen_* function to the debugfs core code :)

thanks,

greg k-h
--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH RFC V2 1/5] debugfs: Add support to print u32 array in debugfs

2011-10-23 Thread Greg KH
On Mon, Oct 24, 2011 at 12:34:04AM +0530, Raghavendra K T wrote:
 Add debugfs support to print u32-arrays in debugfs. Move the code from Xen to 
 debugfs
 to make the code common for other users as well.

You forgot the kerneldoc for the function explaining what it is and how
to use it, and the EXPORT_SYMBOL_GPL() marking for the global function
as that's the only way it will be able to be used, right?

thanks,

greg k-h
--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH RFC V2 2/5] debugfs: Renaming of xen functions and change unsigned to u32

2011-10-23 Thread Greg KH
On Mon, Oct 24, 2011 at 12:34:59AM +0530, Raghavendra K T wrote:
 Renaming of xen functions and change unsigned to u32.

Why not just rename when you move the functions?  Why the extra step?

greg k-h
--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 02/10] Driver core: Add iommu_ops to bus_type

2011-09-13 Thread Greg KH
On Tue, Sep 13, 2011 at 04:54:02PM +0200, Roedel, Joerg wrote:
 --- a/include/linux/device.h
 +++ b/include/linux/device.h
 @@ -22,6 +22,7 @@
  #include linux/types.h
  #include linux/module.h
  #include linux/pm.h
 +#include linux/iommu.h

Ick, please don't add new #includes to device.h, it makes the whole
build slower.  Just pre-declare the structure and all should be fine.

Don't you need to also redo the other patches in this series based on
the other comments you received?

greg k-h
--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 02/10] Driver core: Add iommu_ops to bus_type

2011-09-13 Thread Greg KH
On Tue, Sep 13, 2011 at 05:38:11PM +0200, Roedel, Joerg wrote:
 On Tue, Sep 13, 2011 at 10:58:55AM -0400, Greg KH wrote:
  On Tue, Sep 13, 2011 at 04:54:02PM +0200, Roedel, Joerg wrote:
   --- a/include/linux/device.h
   +++ b/include/linux/device.h
   @@ -22,6 +22,7 @@
#include linux/types.h
#include linux/module.h
#include linux/pm.h
   +#include linux/iommu.h
  
  Ick, please don't add new #includes to device.h, it makes the whole
  build slower.  Just pre-declare the structure and all should be fine.
 
 Hmm, since linux/iommu.h provides 'struct iommu_ops', and this patch
 adds a 'struct iommu_ops' to 'struct bus_type', wouldn't a simple
 forward declaration make the bus_type incomplete in most other places?

No, just like it doesn't make iommu.h incomplete as you used a struct
bus_type there.

 To lower the impact I can move the 'struct iommu_ops' to a seperate
 header file instead.

No, that should not be necessary.

greg k-h
--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 02/10] Driver core: Add iommu_ops to bus_type

2011-09-07 Thread Greg KH
On Wed, Sep 07, 2011 at 05:41:45PM +0200, Joerg Roedel wrote:
 This is the starting point to make the iommu_ops used for
 the iommu-api a per-bus-type structure. It is required to
 easily implement bus-specific setup in the iommu-layer.
 The first user will be the iommu-group attribute in sysfs.
 
 Signed-off-by: Joerg Roedel joerg.roe...@amd.com
 ---
  drivers/base/bus.c |   16 
  drivers/iommu/iommu.c  |4 
  include/linux/device.h |9 +
  include/linux/iommu.h  |2 ++
  4 files changed, 31 insertions(+), 0 deletions(-)
 
 diff --git a/drivers/base/bus.c b/drivers/base/bus.c
 index 000e7b2..34ac706 100644
 --- a/drivers/base/bus.c
 +++ b/drivers/base/bus.c
 @@ -1028,6 +1028,22 @@ void bus_sort_breadthfirst(struct bus_type *bus,
  }
  EXPORT_SYMBOL_GPL(bus_sort_breadthfirst);
  
 +#ifdef CONFIG_IOMMU_API
 +int bus_set_iommu(struct bus_type *bus, struct iommu_ops *ops)
 +{
 + if (bus-iommu_ops != NULL)
 + return -EBUSY;

Busy?

 +
 + bus-iommu_ops = ops;
 +
 + /* Do IOMMU specific setup for this bus-type */
 + iommu_bus_init(bus, ops);
 +
 + return 0;
 +}
 +EXPORT_SYMBOL_GPL(bus_set_iommu);

I don't understand what this function is for, and who would call it.

Please provide kerneldoc that explains this.


 +#endif
 +
  int __init buses_init(void)
  {
   bus_kset = kset_create_and_add(bus, bus_uevent_ops, NULL);
 diff --git a/drivers/iommu/iommu.c b/drivers/iommu/iommu.c
 index 30b0644..3b24a5b 100644
 --- a/drivers/iommu/iommu.c
 +++ b/drivers/iommu/iommu.c
 @@ -34,6 +34,10 @@ void register_iommu(struct iommu_ops *ops)
   iommu_ops = ops;
  }
  
 +void iommu_bus_init(struct bus_type *bus, struct iommu_ops *ops)
 +{
 +}
 +
  bool iommu_found(void)
  {
   return iommu_ops != NULL;
 diff --git a/include/linux/device.h b/include/linux/device.h
 index c20dfbf..8240b2a 100644
 --- a/include/linux/device.h
 +++ b/include/linux/device.h
 @@ -22,6 +22,7 @@
  #include linux/types.h
  #include linux/module.h
  #include linux/pm.h
 +#include linux/iommu.h
  #include linux/atomic.h
  #include asm/device.h
  
 @@ -67,6 +68,9 @@ extern void bus_remove_file(struct bus_type *, struct 
 bus_attribute *);
   * @resume:  Called to bring a device on this bus out of sleep mode.
   * @pm:  Power management operations of this bus, callback the 
 specific
   *   device driver's pm-ops.
 + * @iommu_ops   IOMMU specific operations for this bus, used to attach IOMMU
 + *  driver implementations to a bus and allow the driver to do
 + *  bus-specific setup

So why is this just not set by the bus itself, making the above function
not needed at all?

confused,

greg k-h
--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 0/10] IOMMU: Make iommu_ops per-bus_type

2011-09-07 Thread Greg KH
On Wed, Sep 07, 2011 at 05:41:43PM +0200, Joerg Roedel wrote:
 Hi,
 
 here is the new version of the patch-set to make the iommu_ops used in
 the iommu-api a bus_type property. This will allow us to move code out
 of the iommu drivers into generic code and it simplifies the
 implementation of the Alex' device-group property.
 
 Greg, can you have a look at patch 2 please and tell me if you have any
 objections?

I object, please see my comments.

 With this version the patch-set is complete (not as the first RFC post).
 It converts all iommu drivers to use the new registration interface and
 completly removes the register_iommu interface.
 
 Regards,
 
   Joerg
 
 Diffstat:
 
  arch/ia64/kvm/kvm-ia64.c   |3 +-
  arch/x86/kvm/x86.c |3 +-
  drivers/base/bus.c |   16 ++
  drivers/iommu/amd_iommu.c  |2 +-
  drivers/iommu/intel-iommu.c|2 +-
  drivers/iommu/iommu.c  |   58 
 
  drivers/iommu/msm_iommu.c  |2 +-
  drivers/iommu/omap-iommu.c |2 +-
  drivers/media/video/omap3isp/isp.c |2 +-
  include/linux/device.h |9 +
  include/linux/iommu.h  |   21 +++--
  virt/kvm/iommu.c   |4 +-
  12 files changed, 86 insertions(+), 38 deletions(-)

So the overall work here makes for more code, right?  I fail to see the
benifit, what am I missing?

greg k-h
--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 02/10] Driver core: Add iommu_ops to bus_type

2011-09-07 Thread Greg KH
On Wed, Sep 07, 2011 at 09:19:19PM +0200, Joerg Roedel wrote:
 Hi Greg,
 
 the bus_set_iommu() function will be called by the IOMMU driver. There
 can be different drivers for the same bus, depending on the hardware. On
 PCI for example, there can be the Intel or the AMD IOMMU driver that
 implement the iommu-api and that register for that bus.

Why are you pushing this down into the driver core?  What other busses
becides PCI use/need this?

If you can have a different IOMMU driver on the same bus, then wouldn't
this be a per-device thing instead of a per-bus thing?


 On Wed, Sep 07, 2011 at 11:47:50AM -0700, Greg KH wrote:
   +#ifdef CONFIG_IOMMU_API
   +int bus_set_iommu(struct bus_type *bus, struct iommu_ops *ops)
   +{
   + if (bus-iommu_ops != NULL)
   + return -EBUSY;
  
  Busy?
 
 Yes, it signals to the IOMMU driver that another driver has already
 registered for that bus. In the previous register_iommu() interface this
 was just a BUG(), but I think returning an error to the caller is
 better. It can be turned back into a BUG() if it is considered better,
 though.

Can you ever have more than one IOMMU driver per bus?  If so, this seems
wrong (see above.)

   +
   + bus-iommu_ops = ops;
   +
   + /* Do IOMMU specific setup for this bus-type */
   + iommu_bus_init(bus, ops);
   +
   + return 0;
   +}
   +EXPORT_SYMBOL_GPL(bus_set_iommu);
  
  I don't understand what this function is for, and who would call it.
 
 It is called by the IOMMU driver.
 
  Please provide kerneldoc that explains this.
 
 Will do.
 
   @@ -67,6 +68,9 @@ extern void bus_remove_file(struct bus_type *, struct 
   bus_attribute *);
 * @resume:  Called to bring a device on this bus out of sleep mode.
 * @pm:  Power management operations of this bus, callback the 
   specific
 *   device driver's pm-ops.
   + * @iommu_ops   IOMMU specific operations for this bus, used to attach 
   IOMMU
   + *  driver implementations to a bus and allow the driver to 
   do
   + *  bus-specific setup
  
  So why is this just not set by the bus itself, making the above function
  not needed at all?
 
 The IOMMUs are usually devices on the bus itself, so they are
 initialized after the bus is set up and the devices on it are
 populated.  So the function can not be called on bus initialization
 because the IOMMU is not ready at this point.

Ok, that makes more sense, please state as much in the documentation.

thanks,

greg k-h
--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v2] staging: zcache: support multiple clients, prep for KVM and RAMster

2011-07-05 Thread Greg KH
On Fri, Jul 01, 2011 at 07:58:45PM +0300, Dan Carpenter wrote:
 On Fri, Jul 01, 2011 at 07:31:54AM -0700, Dan Magenheimer wrote:
   Off by one errors are kind of insidious.  People cut and paste them
   and they spread.  If someone adds a new list of chunks then there
   are now two examples that are correct and two which have an extra
   element, so it's 50/50 that he'll copy the right one.
  
  True, but these are NOT off-by-one errors... they are
  correct-but-slightly-ugly code snippets.  (To clarify, I said
  the *ugliness* arose when debugging an off-by-one error.)
  
 
 What I meant was the new arrays are *one* element too large.
 
  Patches always welcome, and I agree that these should be
  fixed eventually, assuming the code doesn't go away completely
  first.. I'm simply stating the position
  that going through another test/submit cycling to fix
  correct-but-slightly-ugly code which is present only to
  surface information for experiments is not high on my priority
  list right now... unless GregKH says he won't accept the patch.
   
   Btw, looking at it again, this seems like maybe a similar issue in
   zbud_evict_zbpg():
   
  516  for (i = 0; i  MAX_CHUNK; i++) {
  517  retry_unbud_list_i:
   
   
   MAX_CHUNKS is NCHUNKS - 1.  Shouldn't that be i  NCHUNKS so that we
   reach the last element in the list?
  
  No, the last element in that list is unused.  There is a comment
  to that effect someplace in the code.  (These lists are keeping
  track of pages with chunks of available space and the last
  entry would have no available space so is always empty.)
 
 The comment says that the first element isn't used.  Perhaps the
 comment is out of date and now it's the last element that isn't
 used.  To me, it makes sense to have an unused first element, but it
 doesn't make sense to have an unused last element.  Why not just
 make the array smaller?
 
 Also if the last element of the original arrays isn't used, then
 does that mean the last *two* elements of the new arrays aren't
 used?
 
 Getting array sizes wrong is not a correct-but-slightly-ugly
 thing.  *grumble* *grumble* *grumble*.  But it doesn't crash the
 system so I'm fine with it going in as is...

I'm not.  Please fix this up.  I'll not accept it until it is.

thanks,

greg k-h
--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v2] staging: zcache: support multiple clients, prep for KVM and RAMster

2011-06-30 Thread Greg KH
On Thu, Jun 30, 2011 at 12:01:08PM -0700, Dan Magenheimer wrote:
 Hi Greg --
 
 I think this patch is now ready for staging-next and for merging when
 the 3.1 window opens.  Please let me know if you need any logistics
 done differently.

Ok, thanks, I'll queue it up later this week for 3.1

greg k-h
--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 1/1] [virt] virtio-blk: Use ida to allocate disk index

2011-06-08 Thread Greg KH
On Thu, Jun 09, 2011 at 08:51:05AM +0930, Rusty Russell wrote:
 On Wed, 08 Jun 2011 09:08:29 -0400, Mark Wu d...@redhat.com wrote:
  Hi Rusty,
  Yes, I can't figure out an instance of disk probing in parallel either, but 
  as
  per the following commit, I think we still need use lock for safety. What's 
  your opinion?
  
  commit 4034cc68157bfa0b6622efe368488d3d3e20f4e6
  Author: Tejun Heo t...@kernel.org
  Date:   Sat Feb 21 11:04:45 2009 +0900
  
  [SCSI] sd: revive sd_index_lock
  
  Commit f27bac2761cab5a2e212dea602d22457a9aa6943 which converted sd to
  use ida instead of idr incorrectly removed sd_index_lock around id
  allocation and free.  idr/ida do have internal locks but they protect
  their free object lists not the allocation itself.  The caller is
  responsible for that.  This missing synchronization led to the same id
  being assigned to multiple devices leading to oops.
 
 I'm confused.  Tejun, Greg, anyone can probes happen in parallel?
 
 If so, I'll have to review all my drivers.

I know we've tried it in the past, at the PCI device level, and ran into
some issues, but I don't remember if that code ever made it into the
mainline kernel or not.

greg k-h
--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [stable] [PATCH] kvm: fix crash on irqfd deassign

2011-03-17 Thread Greg KH
On Thu, Mar 17, 2011 at 10:53:33AM +0200, Michael S. Tsirkin wrote:
 irqfd in kvm used flush_work incorrectly:
 it assumed that work scheduled previously can't run
 after flush_work, but since kvm uses a non-reentrant
 workqueue (by means of schedule_work)
 we need flush_work_sync to get that guarantee.
 
 Signed-off-by: Michael S. Tsirkin m...@redhat.com
 Reported-by: Jean-Philippe Menil jean-philippe.me...@univ-nantes.fr
 Tested-by: Jean-Philippe Menil jean-philippe.me...@univ-nantes.fr
 ---
 
 Note: this is needed for kernel 2.6.39 and earlier.

This is not the correct way to submit patches for inclusion in the
stable kernel tree.  Please read Documentation/stable_kernel_rules.txt
for how to do this properly.

thanks,

greg k-h
--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH V3] VFIO driver: Non-privileged user level PCI drivers

2010-07-20 Thread Greg KH
On Sat, Jul 17, 2010 at 10:45:23AM +0200, Piotr Jaroszy??ski wrote:
 On 16 July 2010 23:58, Tom Lyon p...@cisco.com wrote:
  The VFIO driver is used to allow privileged AND non-privileged processes 
  to
  implement user-level device drivers for any well-behaved PCI, PCI-X, and 
  PCIe
  devices.
 
 Thanks for working on that! I wonder whether it's possible to say what
 are the chances of it being merged to mainline and which version we
 might be talking about?

We still have a long way to go before you need to worry about what
kernel version it's going to show up in...

thanks,

greg k-h
--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH V2] VFIO driver: Non-privileged user level PCI drivers

2010-06-10 Thread Greg KH
On Thu, Jun 10, 2010 at 06:58:37PM -0700, Tom Lyon wrote:
 On Thursday 10 June 2010 10:27:36 am Konrad Rzeszutek Wilk wrote:
   +EXPORT_SYMBOL(uiommu_fdget);
  
  EXPORT_SYMBOL_GPL
  .. snip
   +EXPORT_SYMBOL(uiommu_put);
  
  ditto.
  
 
 Is there a definitive explanation somewhere of when to use each?

For a driver like this, that is very tied to the way that the kernel
works, I would recommend the _GPL marking, like the UIO interface has.

But that's just me. :)

thanks,

greg k-h
--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH V2] VFIO driver: Non-privileged user level PCI drivers

2010-06-09 Thread Greg KH
On Wed, Jun 09, 2010 at 02:04:53PM +0300, Avi Kivity wrote:
 On 06/09/2010 12:21 AM, Tom Lyon wrote:
 The VFIO driver is used to allow privileged AND non-privileged processes to
 implement user-level device drivers for any well-behaved PCI, PCI-X, and PCIe
 devices.
  Signed-off-by: Tom Lyonp...@cisco.com
 ---
 This version now requires an IOMMU domain to be set before any access to
 device registers is granted (except that config space may be read).  In
 addition, the VFIO_DMA_MAP_ANYWHERE is dropped - it used the dma_map_sg API
 which does not have sufficient controls around IOMMU usage.  The IOMMU domain
 is obtained from the 'uiommu' driver which is included in this patch.
 
 Various locking, security, and documentation issues have also been fixed.
 
 Please commit - it or me!
 But seriously, who gets to commit this? Avi for KVM?
 
 Definitely not me.
 
 or GregKH for drivers?
 
 I guess.

If this ever gets that far, I'll be glad to take it.

thanks,

greg k-h
--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [stable] [PATCH 3/3][STABLE] KVM: add schedule check to napi_enable call

2010-06-03 Thread Greg KH
On Thu, Jun 03, 2010 at 01:38:31PM -0600, Bruce Rogers wrote:
 virtio_net: Add schedule check to napi_enable call
 Under harsh testing conditions, including low memory, the guest would
 stop receiving packets. With this patch applied we no longer see any
 problems in the driver while performing these tests for extended periods
 of time.
 
 Make sure napi is scheduled subsequent to each napi_enable.
 
 Signed-off-by: Bruce Rogers brog...@novell.com
 Signed-off-by: Olaf Kirch o...@suse.de

I need a git commit id for this one as well.

thanks,

greg k-h
--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [stable] [PATCH 3/3][STABLE] KVM: add schedule check to napi_enable call

2010-06-03 Thread Greg KH
On Thu, Jun 03, 2010 at 04:17:34PM -0600, Bruce Rogers wrote:
   On 6/3/2010 at 03:03 PM, Greg KH g...@kroah.com wrote: 
  On Thu, Jun 03, 2010 at 01:38:31PM -0600, Bruce Rogers wrote:
  virtio_net: Add schedule check to napi_enable call
  Under harsh testing conditions, including low memory, the guest would
  stop receiving packets. With this patch applied we no longer see any
  problems in the driver while performing these tests for extended 
  periods
  of time.
  
  Make sure napi is scheduled subsequent to each napi_enable.
  
  Signed-off-by: Bruce Rogers brog...@novell.com
  Signed-off-by: Olaf Kirch o...@suse.de
  
  I need a git commit id for this one as well.
  
 
 This one is not upstream.

Then I can't include it in the -stable tree, so why are you sending it
to me?  :)

thanks,

greg k-h
--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 2/2] pci: allow sysfs file owner to read device dependent config space

2010-05-14 Thread Greg KH
On Wed, May 12, 2010 at 06:29:57PM -0700, Chris Wright wrote:
 The PCI config space bin_attr read handler has a hardcoded CAP_SYS_ADMIN
 check to verify privileges before allowing a user to read device
 dependent config space.  This is meant to protect from an unprivileged
 user potentially locking up the box.
 
 When assigning a PCI device directly to a guest with libvirt and KVM,
 the sysfs config space file is chown'd to the unprivileged user that
 the KVM guest will run as.  The guest needs to have full access to the
 device's config space since it's responsible for driving the device.
 However, despite being the owner of the sysfs file, the CAP_SYS_ADMIN
 check will not allow read access beyond the config header.
 
 With this patch the sysfs file owner is also considered privileged enough
 to read all of the config space.
 
 Signed-off-by: Chris Wright chr...@sous-sol.org
 ---
  drivers/pci/pci-sysfs.c |4 +++-

Jesse, any objection to this going through my tree as it will depend on
the sysfs change?

thanks,

greg k-h
--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 2/2 v2] pci: check caps from sysfs file open to read device dependent config space

2010-05-13 Thread Greg KH
On Thu, May 13, 2010 at 10:43:07AM -0700, Chris Wright wrote:
 * Alan Cox (a...@lxorguk.ukuu.org.uk) wrote:
  I agree with the problem - but IMHO the fix is to require opening the file
  checks CAP_SYS_something instead: not to hack the read method and make it
  even weirder and more un-Linux than it is now.
 
 This patch does that.  Not as convenient from the KVM/libvirt point of view
 because it is not prepared to do this setup before dropping privileges
 and launching the VM.

So does that mean that this patch doesn't solve your original problem
here?

thanks,

greg k-h
--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [RFC PATCH] sysfs: bin_attr permission checking

2010-05-12 Thread Greg KH
On Wed, May 12, 2010 at 11:47:13AM -0700, Chris Wright wrote:
 The PCI config space bin_attr read handler has a hardcoded CAP_SYS_ADMIN
 check to verify privileges before allowing a user to read device
 dependent config space.  This is meant to protect from an unprivileged
 user potentially locking up the box.
 
 When assigning a PCI device directly to a guest with libvirt and KVM, the 
 sysfs config space file is chown'd to the user that the KVM guest will
 run as.  The guest needs to have full access to the device's config
 space since it's responsible for driving the device.  However, despite
 being the owner of the sysfs file, the CAP_SYS_ADMIN check will not
 allow read access beyond the config header.
 
 This patch adds a new bin_attr-read_file() callback which adds a struct
 file to the normal argument list.  This allows an implementation such as
 PCI config space bin_attr read_file handler to check both inode
 permission as well as privileges to determine whether to allow
 privileged actions through the handler.

Ick, this is all because we like showing different information if the
user is privileged or not :(

Turns out, that this probably isn't the best user api to implement,
remind me never to do that again...

 This is just RFC, although I've tested that it does allow the chown +
 read to work as expected.  Any other ideas of how to handle this are
 welcome.

Can we just pass in the 'file' for all users of the bin files instead of
the dentry?  You can always get the dentry from the file (as your patch
showes), and there isn't that many users of this interface.  I'd really
rather not have two different types of callbacks here.

thanks,

greg k-h
--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [RFC PATCH] sysfs: bin_attr permission checking

2010-05-12 Thread Greg KH
On Wed, May 12, 2010 at 12:28:28PM -0700, Chris Wright wrote:
 * Greg KH (g...@kroah.com) wrote:
  On Wed, May 12, 2010 at 11:47:13AM -0700, Chris Wright wrote:
   The PCI config space bin_attr read handler has a hardcoded CAP_SYS_ADMIN
   check to verify privileges before allowing a user to read device
   dependent config space.  This is meant to protect from an unprivileged
   user potentially locking up the box.
   
   When assigning a PCI device directly to a guest with libvirt and KVM, the 
   sysfs config space file is chown'd to the user that the KVM guest will
   run as.  The guest needs to have full access to the device's config
   space since it's responsible for driving the device.  However, despite
   being the owner of the sysfs file, the CAP_SYS_ADMIN check will not
   allow read access beyond the config header.
   
   This patch adds a new bin_attr-read_file() callback which adds a struct
   file to the normal argument list.  This allows an implementation such as
   PCI config space bin_attr read_file handler to check both inode
   permission as well as privileges to determine whether to allow
   privileged actions through the handler.
  
  Ick, this is all because we like showing different information if the
  user is privileged or not :(
 
 yup
 
  Turns out, that this probably isn't the best user api to implement,
  remind me never to do that again...
 
 Yeah, it's challenging to deal with.  Alternative here is a new config
 sysfs entry that doesn't have this 'feature'.  (I looked into trying to
 allow manageing the internal capable() check externally, not so pretty).

That would require people to update libpci and maybe their scripts as
well, which wouldn't be as good.

   This is just RFC, although I've tested that it does allow the chown +
   read to work as expected.  Any other ideas of how to handle this are
   welcome.
  
  Can we just pass in the 'file' for all users of the bin files instead of
  the dentry? 
 
 The dentry doesn't currently go beyond sysfs/bin.c.  So, yes, I pushed
 'file' through to last level in bin.c before -read(), and can certinaly
 just push through to -read() as well.

That would be better than having a 'read_file' callback, right?

   You can always get the dentry from the file (as your patch
  showes), and there isn't that many users of this interface.  I'd really
  rather not have two different types of callbacks here.
 
 Absolutely, this is just RFC (i.e. quicker to compile and test).  What
 about write()?

Sure, might as well make it symmetrical :)

thanks,

greg k-h
--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: 2.6.33.3: possible recursive locking detected

2010-05-12 Thread Greg KH
On Wed, May 12, 2010 at 12:34:20PM +0800, Américo Wang wrote:
 On Tue, May 11, 2010 at 08:03:20AM -0700, Greg KH wrote:
 On Tue, May 11, 2010 at 09:33:50PM +1000, CaT wrote:
  On Wed, May 05, 2010 at 10:52:50AM +0800, Américo Wang wrote:
   On Wed, May 5, 2010 at 10:32 AM, Yong Zhang yong.zh...@windriver.com 
   wrote:
On Tue, May 04, 2010 at 11:37:37AM +0300, Avi Kivity wrote:
On 05/04/2010 10:03 AM, CaT wrote:
I'm currently running 2.6.33.3 in a KVM instance emulating a core2duo
on 1 cpu with virtio HDs running on top of a core2duo host running 
2.6.33.3.
qemu-kvm version 0.12.3.
   
Can you try commit 6992f5334995af474c2b58d010d08bc597f0f2fe in the 
latest
kernel?
   
   
   Hmm, 2.6.33 -stable has commit 846f99749ab68bbc7f75c74fec305de675b1a1bf?
   
   Actually, these 3 commits fixed it:
   
   6992f5334995af474c2b58d010d08bc597f0f2fe sysfs: Use one lockdep class
   per sysfs ttribute.
   a2db6842873c8e5a70652f278d469128cb52db70 sysfs: Only take active
   references on attributes.
   e72ceb8ccac5f770b3e696e09bb673dca7024b20 sysfs: Remove 
   sysfs_get/put_active_two
   
   However, there are many other patches needed to amend these, so I think
   it's not suitable for -stable to include, perhaps a revert of
   846f99749ab68bbc7f75c74fec305de675b1a1bf is better.
  
  Slightly at a loss as to what to do, now. It's a virt instance so I can
  apply patches at will but, well, clarity is good. :)
 
 Just ignore the lockdep warnings as they are bogus, or turn them off, or
 use .34-rc7, as they are resolved there.
 
 
 How about reverting that patch for 2.6.33 stable tree?

No, as that patch is not reverted in Linus's tree, right?  Just turn off
lockdep if this is bothering you.

thanks,

greg k-h
--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: 2.6.33.3: possible recursive locking detected

2010-05-11 Thread Greg KH
On Tue, May 11, 2010 at 09:33:50PM +1000, CaT wrote:
 On Wed, May 05, 2010 at 10:52:50AM +0800, Américo Wang wrote:
  On Wed, May 5, 2010 at 10:32 AM, Yong Zhang yong.zh...@windriver.com 
  wrote:
   On Tue, May 04, 2010 at 11:37:37AM +0300, Avi Kivity wrote:
   On 05/04/2010 10:03 AM, CaT wrote:
   I'm currently running 2.6.33.3 in a KVM instance emulating a core2duo
   on 1 cpu with virtio HDs running on top of a core2duo host running 
   2.6.33.3.
   qemu-kvm version 0.12.3.
  
   Can you try commit 6992f5334995af474c2b58d010d08bc597f0f2fe in the latest
   kernel?
  
  
  Hmm, 2.6.33 -stable has commit 846f99749ab68bbc7f75c74fec305de675b1a1bf?
  
  Actually, these 3 commits fixed it:
  
  6992f5334995af474c2b58d010d08bc597f0f2fe sysfs: Use one lockdep class
  per sysfs ttribute.
  a2db6842873c8e5a70652f278d469128cb52db70 sysfs: Only take active
  references on attributes.
  e72ceb8ccac5f770b3e696e09bb673dca7024b20 sysfs: Remove 
  sysfs_get/put_active_two
  
  However, there are many other patches needed to amend these, so I think
  it's not suitable for -stable to include, perhaps a revert of
  846f99749ab68bbc7f75c74fec305de675b1a1bf is better.
 
 Slightly at a loss as to what to do, now. It's a virt instance so I can
 apply patches at will but, well, clarity is good. :)

Just ignore the lockdep warnings as they are bogus, or turn them off, or
use .34-rc7, as they are resolved there.

thanks,

greg k-h
--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[69/98] KVM: remove unused load_segment_descriptor_to_kvm_desct

2010-05-10 Thread Greg KH
2.6.32-stable review patch.  If anyone has any objections, please let us know.

--

From: Marcelo Tosatti mtosa...@redhat.com

Commit 78ce64a384 in v2.6.32.12 introduced a warning due to unused
load_segment_descriptor_to_kvm_desct helper, which has been opencoded by
this commit.

On upstream, the helper was removed as part of a different commit.

Remove the now unused function.

Signed-off-by: Marcelo Tosatti mtosa...@redhat.com
Signed-off-by: Greg Kroah-Hartman gre...@suse.de

---
 arch/x86/kvm/x86.c |   12 
 1 file changed, 12 deletions(-)

--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -4155,18 +4155,6 @@ static u16 get_segment_selector(struct k
return kvm_seg.selector;
 }
 
-static int load_segment_descriptor_to_kvm_desct(struct kvm_vcpu *vcpu,
-   u16 selector,
-   struct kvm_segment *kvm_seg)
-{
-   struct desc_struct seg_desc;
-
-   if (load_guest_segment_descriptor(vcpu, selector, seg_desc))
-   return 1;
-   seg_desct_to_kvm_desct(seg_desc, selector, kvm_seg);
-   return 0;
-}
-
 static int kvm_load_realmode_segment(struct kvm_vcpu *vcpu, u16 selector, int 
seg)
 {
struct kvm_segment segvar = {


--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 2.6.32.12] KVM: remove unused load_segment_descriptor_to_kvm_desct

2010-04-27 Thread Greg KH
On Tue, Apr 27, 2010 at 11:14:14AM -0300, Marcelo Tosatti wrote:
 
 Function is now unused.
 
 Signed-off-by: Marcelo Tosatti mtosa...@redhat.com

Was this patch also upstream?  Why does stable need it?

thanks,

greg k-h
--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 2.6.32.12] KVM: remove unused load_segment_descriptor_to_kvm_desct

2010-04-27 Thread Greg KH
On Tue, Apr 27, 2010 at 05:52:20PM +0300, Gleb Natapov wrote:
 On Tue, Apr 27, 2010 at 07:43:55AM -0700, Greg KH wrote:
  On Tue, Apr 27, 2010 at 11:14:14AM -0300, Marcelo Tosatti wrote:
   
   Function is now unused.
   
   Signed-off-by: Marcelo Tosatti mtosa...@redhat.com
  
  Was this patch also upstream?  Why does stable need it?
  
 It is upstream, but as part of another commit. My guess is that when
 patches were ported to stable this chunk was lost due to wrong merge
 resolving.

Ok, but as this patch doesn't do anything, why is it needed in -stable?

Also, this type of information (why it isn't upstream), is REQUIRED in
order to be accepted.  Please include it.

thanks,

greg k-h
--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] KVM: VMX: Disable unrestricted guest when EPT disabled

2010-04-07 Thread Greg KH
On Thu, Mar 18, 2010 at 02:11:19PM +0800, Sheng Yang wrote:
 Otherwise would cause VMEntry failure when using ept=0 on unrestricted guest
 supported processors.
 
 Signed-off-by: Sheng Yang sh...@linux.intel.com

Now included through a different submission.

thanks,

greg k-h
--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 0/1] uio_pci_generic: extensions to allow access for non-privileged processes

2010-04-02 Thread Greg KH
On Fri, Apr 02, 2010 at 09:43:35AM +0300, Avi Kivity wrote:
 On 04/01/2010 10:24 PM, Tom Lyon wrote:

 But there are multiple msi-x interrupts, how do you know which one
 triggered?
  
 You don't. This would suck for KVM, I guess, but we'd need major rework of 
 the
 generic UIO stuff to have a separate event channel for each MSI-X.


 Doesn't it suck for non-kvm in the same way?  Multiple vectors are there 
 for a reason.  For example, if you have a multiqueue NIC, you'd have to 
 process all queues instead of just the one that triggered.

 For my purposes, collapsing all the MSI-Xs into one MSI-look-alike is fine,
 because I'd be using MSI anyways if I could. The weird Intel 82599 VF only
 supports MSI-X.

 So one big question is - do we expand the whole UIO framework for KVM
 requirements, or do we split off either KVM or non-VM into a separate driver?
 Hans or Greg - care to opine?


 Currently kvm does device assignment with its own code, I'd like to unify 
 it with uio, not split it off.

 Separate notifications for msi-x interrupts are just as useful for uio as 
 they are for kvm.

I agree, there should not be a difference here for KVM vs. the normal
version.

thanks,

greg k-h
--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[027/116] hrtimer: Tune hrtimer_interrupt hang logic

2010-03-30 Thread Greg KH
2.6.32-stable review patch.  If anyone has any objections, please let us know.

--

From: Thomas Gleixner t...@linutronix.de

commit 41d2e494937715d3150e5c75d01f0e75ae899337 upstream.

The hrtimer_interrupt hang logic adjusts min_delta_ns based on the
execution time of the hrtimer callbacks.

This is error-prone for virtual machines, where a guest vcpu can be
scheduled out during the execution of the callbacks (and the callbacks
themselves can do operations that translate to blocking operations in
the hypervisor), which in can lead to large min_delta_ns rendering the
system unusable.

Replace the current heuristics with something more reliable. Allow the
interrupt code to try 3 times to catch up with the lost time. If that
fails use the total time spent in the interrupt handler to defer the
next timer interrupt so the system can catch up with other things
which got delayed. Limit that deferment to 100ms.

The retry events and the maximum time spent in the interrupt handler
are recorded and exposed via /proc/timer_list

Inspired by a patch from Marcelo.

Reported-by: Michael Tokarev m...@tls.msk.ru
Signed-off-by: Thomas Gleixner t...@linutronix.de
Tested-by: Marcelo Tosatti mtosa...@redhat.com
Cc: kvm@vger.kernel.org
Cc: Jeremy Fitzhardinge jer...@goop.org
Signed-off-by: Greg Kroah-Hartman gre...@suse.de

---
 include/linux/hrtimer.h  |   13 --
 kernel/hrtimer.c |   96 +++
 kernel/time/timer_list.c |5 +-
 3 files changed, 70 insertions(+), 44 deletions(-)

--- a/include/linux/hrtimer.h
+++ b/include/linux/hrtimer.h
@@ -162,10 +162,11 @@ struct hrtimer_clock_base {
  * @expires_next:  absolute time of the next event which was scheduled
  * via clock_set_next_event()
  * @hres_active:   State of high resolution mode
- * @check_clocks:  Indictator, when set evaluate time source and clock
- * event devices whether high resolution mode can be
- * activated.
- * @nr_events: Total number of timer interrupt events
+ * @hang_detected: The last hrtimer interrupt detected a hang
+ * @nr_events: Total number of hrtimer interrupt events
+ * @nr_retries:Total number of hrtimer interrupt retries
+ * @nr_hangs:  Total number of hrtimer interrupt hangs
+ * @max_hang_time: Maximum time spent in hrtimer_interrupt
  */
 struct hrtimer_cpu_base {
spinlock_t  lock;
@@ -173,7 +174,11 @@ struct hrtimer_cpu_base {
 #ifdef CONFIG_HIGH_RES_TIMERS
ktime_t expires_next;
int hres_active;
+   int hang_detected;
unsigned long   nr_events;
+   unsigned long   nr_retries;
+   unsigned long   nr_hangs;
+   ktime_t max_hang_time;
 #endif
 };
 
--- a/kernel/hrtimer.c
+++ b/kernel/hrtimer.c
@@ -557,7 +557,7 @@ hrtimer_force_reprogram(struct hrtimer_c
 static int hrtimer_reprogram(struct hrtimer *timer,
 struct hrtimer_clock_base *base)
 {
-   ktime_t *expires_next = __get_cpu_var(hrtimer_bases).expires_next;
+   struct hrtimer_cpu_base *cpu_base = __get_cpu_var(hrtimer_bases);
ktime_t expires = ktime_sub(hrtimer_get_expires(timer), base-offset);
int res;
 
@@ -582,7 +582,16 @@ static int hrtimer_reprogram(struct hrti
if (expires.tv64  0)
return -ETIME;
 
-   if (expires.tv64 = expires_next-tv64)
+   if (expires.tv64 = cpu_base-expires_next.tv64)
+   return 0;
+
+   /*
+* If a hang was detected in the last timer interrupt then we
+* do not schedule a timer which is earlier than the expiry
+* which we enforced in the hang detection. We want the system
+* to make progress.
+*/
+   if (cpu_base-hang_detected)
return 0;
 
/*
@@ -590,7 +599,7 @@ static int hrtimer_reprogram(struct hrti
 */
res = tick_program_event(expires, 0);
if (!IS_ERR_VALUE(res))
-   *expires_next = expires;
+   cpu_base-expires_next = expires;
return res;
 }
 
@@ -1217,29 +1226,6 @@ static void __run_hrtimer(struct hrtimer
 
 #ifdef CONFIG_HIGH_RES_TIMERS
 
-static int force_clock_reprogram;
-
-/*
- * After 5 iteration's attempts, we consider that hrtimer_interrupt()
- * is hanging, which could happen with something that slows the interrupt
- * such as the tracing. Then we force the clock reprogramming for each future
- * hrtimer interrupts to avoid infinite loops and use the min_delta_ns
- * threshold that we will overwrite.
- * The next tick event will be scheduled to 3 times we currently spend on
- * hrtimer_interrupt(). This gives a good compromise, the cpus will spend
- * 1/4 of their time to process the hrtimer interrupts. This is enough to
- * 

[49/74] KVM: allow userspace to adjust kvmclock offset

2010-02-04 Thread Greg KH
2.6.32-stable review patch.  If anyone has any objections, please let us know.

--

From: Glauber Costa glom...@redhat.com

(cherry picked from afbcf7ab8d1bc8c2d04792f6d9e786e0adeb328d)

When we migrate a kvm guest that uses pvclock between two hosts, we may
suffer a large skew. This is because there can be significant differences
between the monotonic clock of the hosts involved. When a new host with
a much larger monotonic time starts running the guest, the view of time
will be significantly impacted.

Situation is much worse when we do the opposite, and migrate to a host with
a smaller monotonic clock.

This proposed ioctl will allow userspace to inform us what is the monotonic
clock value in the source host, so we can keep the time skew short, and
more importantly, never goes backwards. Userspace may also need to trigger
the current data, since from the first migration onwards, it won't be
reflected by a simple call to clock_gettime() anymore.

[marcelo: future-proof abi with a flags field]
[jan: fix KVM_GET_CLOCK by clearing flags field instead of checking it]

Signed-off-by: Glauber Costa glom...@redhat.com
Signed-off-by: Marcelo Tosatti mtosa...@redhat.com
Signed-off-by: Avi Kivity a...@redhat.com
Signed-off-by: Greg Kroah-Hartman gre...@suse.de

---
 Documentation/kvm/api.txt   |   36 ++
 arch/x86/include/asm/kvm_host.h |1 
 arch/x86/kvm/x86.c  |   42 +++-
 include/linux/kvm.h |9 
 4 files changed, 87 insertions(+), 1 deletion(-)

--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -412,6 +412,7 @@ struct kvm_arch{
unsigned long irq_sources_bitmap;
unsigned long irq_states[KVM_IOAPIC_NUM_PINS];
u64 vm_init_tsc;
+   s64 kvmclock_offset;
 };
 
 struct kvm_vm_stat {
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -680,7 +680,8 @@ static void kvm_write_guest_time(struct 
/* With all the info we got, fill in the values */
 
vcpu-hv_clock.system_time = ts.tv_nsec +
-(NSEC_PER_SEC * (u64)ts.tv_sec);
+(NSEC_PER_SEC * (u64)ts.tv_sec) + 
v-kvm-arch.kvmclock_offset;
+
/*
 * The interface expects us to write an even number signaling that the
 * update is finished. Since the guest won't see the intermediate
@@ -1227,6 +1228,7 @@ int kvm_dev_ioctl_check_extension(long e
case KVM_CAP_PIT2:
case KVM_CAP_PIT_STATE2:
case KVM_CAP_SET_IDENTITY_MAP_ADDR:
+   case KVM_CAP_ADJUST_CLOCK:
r = 1;
break;
case KVM_CAP_COALESCED_MMIO:
@@ -2424,6 +2426,44 @@ long kvm_arch_vm_ioctl(struct file *filp
r = 0;
break;
}
+   case KVM_SET_CLOCK: {
+   struct timespec now;
+   struct kvm_clock_data user_ns;
+   u64 now_ns;
+   s64 delta;
+
+   r = -EFAULT;
+   if (copy_from_user(user_ns, argp, sizeof(user_ns)))
+   goto out;
+
+   r = -EINVAL;
+   if (user_ns.flags)
+   goto out;
+
+   r = 0;
+   ktime_get_ts(now);
+   now_ns = timespec_to_ns(now);
+   delta = user_ns.clock - now_ns;
+   kvm-arch.kvmclock_offset = delta;
+   break;
+   }
+   case KVM_GET_CLOCK: {
+   struct timespec now;
+   struct kvm_clock_data user_ns;
+   u64 now_ns;
+
+   ktime_get_ts(now);
+   now_ns = timespec_to_ns(now);
+   user_ns.clock = kvm-arch.kvmclock_offset + now_ns;
+   user_ns.flags = 0;
+
+   r = -EFAULT;
+   if (copy_to_user(argp, user_ns, sizeof(user_ns)))
+   goto out;
+   r = 0;
+   break;
+   }
+
default:
;
}
--- a/Documentation/kvm/api.txt
+++ b/Documentation/kvm/api.txt
@@ -593,6 +593,42 @@ struct kvm_irqchip {
} chip;
 };
 
+4.27 KVM_GET_CLOCK
+
+Capability: KVM_CAP_ADJUST_CLOCK
+Architectures: x86
+Type: vm ioctl
+Parameters: struct kvm_clock_data (out)
+Returns: 0 on success, -1 on error
+
+Gets the current timestamp of kvmclock as seen by the current guest. In
+conjunction with KVM_SET_CLOCK, it is used to ensure monotonicity on scenarios
+such as migration.
+
+struct kvm_clock_data {
+   __u64 clock;  /* kvmclock current value */
+   __u32 flags;
+   __u32 pad[9];
+};
+
+4.28 KVM_SET_CLOCK
+
+Capability: KVM_CAP_ADJUST_CLOCK
+Architectures: x86
+Type: vm ioctl
+Parameters: struct kvm_clock_data (in)
+Returns: 0 on success, -1 on error
+
+Sets the current timestamp of kvmclock to the valued specific in its parameter.
+In conjunction with KVM_GET_CLOCK, it is used to ensure monotonicity on 
scenarios
+such as migration.

Re: kernel memory allocation bug in 2.6.27.32-2.6.27.41 kvm section

2010-01-05 Thread Greg KH
On Sun, Dec 27, 2009 at 12:13:00PM +0200, Avi Kivity wrote:
 On 12/17/2009 05:35 PM, Oscon wrote:
  Hello!
 
  I can't register new account in bugzilla.kernel.org. / my ISP's spamfilter
  problem (?) maybe./
 
  --
 
  I sent this mail to Greg KH (2.6.27.y maintainer), he sent me:
 
  Can you get the kvm maintainers to agree that this is correct?
 
  thanks,
 
  greg k-h
 
  ---
  So the bug :
 
  I found a memory allocation bug in kvm/mmu.c  kvm_main.c. /in
  kvm_destroy_vm()/
 
  Affected kernel: 2.6.27.32-2.6.27.41
 
  Mainline kernel (2.6.32) is not affected. (Modified kvm subsystem.)
 
  Cause:
  http://git.kernel.org/?p=linux%2Fkernel%2Fgit%2Fstable%2Flinux-2.6.27.y.git;a=commitdiff_plain;h=d2127c8300fb1ec54af56faee17170e7a525326d
 
  Solution: Revert this patch.
 
  This bug can cause local DoS in the host system.
 
 
 
 
 Looks like some other patch is missing in 2.6.27.y.  Not sure what it is.
 
 But it's safer to revert this patch for now.

Ok, I've now reverted it.

thanks,

greg k-h
--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH RFC] pci: expose function reset capability in sysfs

2009-07-27 Thread Greg KH
On Sun, Jul 26, 2009 at 08:11:39PM +0300, Michael S. Tsirkin wrote:
 Some devices allow an individual function to be reset without affecting
 other functions in the same device: that's what pci_reset_function does.
 For devices that have this support, expose reset attribite in sysfs.

Please add the proper documentation to Documentation/ABI for any new
sysfs file you create.

  static int pci_create_capabilities_sysfs(struct pci_dev *dev)
  {
   int retval;
 @@ -943,7 +965,21 @@ static int pci_create_capabilities_sysfs(struct pci_dev 
 *dev)
   /* Active State Power Management */
   pcie_aspm_create_sysfs_dev_files(dev);
  
 + if (!pci_probe_reset_function(dev)) {
 + retval = device_create_file(dev-dev, reset_attr);
 + if (retval)
 + goto error;
 + }

So you only add the file if there is a reset function, which is fine,
but later on:

 @@ -1037,6 +1073,7 @@ static void pci_remove_capabilities_sysfs(struct 
 pci_dev *dev)
   }
  
   pcie_aspm_remove_sysfs_dev_files(dev);
 + device_remove_file(dev-dev, reset_attr);

You always remove the file, if it has been created or not.  That could
cause problems in the future, please only remove a file if you have
actually added it to the sysfs tree.

thanks,

greg k-h
--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH RFC] pci: expose function reset capability in sysfs

2009-07-27 Thread Greg KH
On Mon, Jul 27, 2009 at 06:52:38PM +0300, Michael S. Tsirkin wrote:
 On Mon, Jul 27, 2009 at 08:01:46AM -0700, Greg KH wrote:
   + if (!pci_probe_reset_function(dev)) {
   + retval = device_create_file(dev-dev, reset_attr);
   + if (retval)
   + goto error;
   + }
  
  So you only add the file if there is a reset function, which is fine,
  but later on:
  
   @@ -1037,6 +1073,7 @@ static void pci_remove_capabilities_sysfs(struct 
   pci_dev *dev)
 }

 pcie_aspm_remove_sysfs_dev_files(dev);
   + device_remove_file(dev-dev, reset_attr);
  
  You always remove the file, if it has been created or not.  That could
  cause problems in the future, please only remove a file if you have
  actually added it to the sysfs tree.
  
  thanks,
  
  greg k-h
 
 OK. I think, however, that it's better to avoid probing the function
 on cleanup path just to figure out whether the file needs to be removed.
 Can this info be stored in struct pci_dev?
 Along the lines of:

Fine with me.  You forgot the documentation though :)

thanks,

greg k-h
--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH RFC] pci: expose function reset capability in sysfs

2009-07-27 Thread Greg KH
On Mon, Jul 27, 2009 at 11:37:48PM +0300, Michael S. Tsirkin wrote:
 On Mon, Jul 27, 2009 at 09:14:23AM -0700, Greg KH wrote:
  Fine with me.  You forgot the documentation though :)
 
 This enough?
 
 pci: expose function reset capability in sysfs
 
 Some devices allow an individual function to be reset without affecting
 other functions in the same device: that's what pci_reset_function does.
 For devices that have this support, expose reset attribite in sysfs.
 
 This is useful e.g. for virtualization, where a qemu userspace
 process wants to reset the device when the guest is reset,
 to emulate machine reboot as closely as possible.
 
 Signed-off-by: Michael S. Tsirkin m...@redhat.com

Looks good to me:
Acked-by: Greg Kroah-Hartman gre...@suse.de


--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCHv5] uio: add generic driver for PCI 2.3 devices

2009-07-20 Thread Greg KH
On Mon, Jul 20, 2009 at 10:52:27PM +0300, Michael S. Tsirkin wrote:
 On Mon, Jul 20, 2009 at 09:09:43PM +0200, Hans J. Koch wrote:
  On Mon, Jul 20, 2009 at 10:29:34AM +0300, Michael S. Tsirkin wrote:
   This adds a generic uio driver that can bind to any PCI device.  First
   user will be virtualization where a qemu userspace process needs to give
   guest OS access to the device.
   
   Interrupts are handled using the Interrupt Disable bit in the PCI command
   register and Interrupt Status bit in the PCI status register.  All devices
   compliant to PCI 2.3 (circa 2002) and all compliant PCI Express devices 
   should
   support these bits.  Driver detects this support, and won't bind to 
   devices
   which do not support the Interrupt Disable Bit in the command register.
   
   It's expected that more features of interest to virtualization will be
   added to this driver in the future. Possibilities are: mmap for device
   resources, MSI/MSI-X, eventfd (to interface with kvm), iommu.
  
  Thanks for adding the docs! Looks alright to me.
  
  Thanks,
  Hans
  
   
   Signed-off-by: Michael S. Tsirkin m...@redhat.com
   Acked-by: Chris Wright chr...@redhat.com
  
  Signed-off-by: Hans J. Koch h...@linutronix.de
 
 Jesse just acked this patch in a private mail, as well.
 Acked-by: Jesse Barnes jbar...@virtuousgeek.org

Odd, but ok...

I'll queue it up now.

thanks,

greg k-h
--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCHv4] uio: add generic driver for PCI 2.3 devices

2009-07-16 Thread Greg KH
On Thu, Jul 16, 2009 at 05:12:55PM +0200, Hans J. Koch wrote:
 On Thu, Jul 16, 2009 at 05:07:10PM +0300, Michael S. Tsirkin wrote:
  On Wed, Jul 15, 2009 at 03:08:29PM -0700, Greg KH wrote:
   How about moving that documentation into a place that people will notice
   it, like the rest of the UIO documentation?
  
  Greg,
  
  would it make more sense to add this to
  Documentation/DocBook/uio-howto.xml, or to create
  Documentation/uio_pci_generic.txt ?
 
 Hi Michael,
 I'd prefer to have it in uio-howto.xml so that people only have to look in
 one place. In does not have to be very detailled, just a short explanation
 what this driver is all about and a short example how to use it.

I agree, that would be the best place for it.

thanks,

greg k-h
--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCHv4] uio: add generic driver for PCI 2.3 devices

2009-07-16 Thread Greg KH
On Thu, Jul 16, 2009 at 08:03:46PM +0300, Michael S. Tsirkin wrote:
 On Thu, Jul 16, 2009 at 08:52:08AM -0700, Greg KH wrote:
  On Thu, Jul 16, 2009 at 05:12:55PM +0200, Hans J. Koch wrote:
   On Thu, Jul 16, 2009 at 05:07:10PM +0300, Michael S. Tsirkin wrote:
On Wed, Jul 15, 2009 at 03:08:29PM -0700, Greg KH wrote:
 How about moving that documentation into a place that people will 
 notice
 it, like the rest of the UIO documentation?

Greg,

would it make more sense to add this to
Documentation/DocBook/uio-howto.xml, or to create
Documentation/uio_pci_generic.txt ?
   
   Hi Michael,
   I'd prefer to have it in uio-howto.xml so that people only have to look in
   one place. In does not have to be very detailled, just a short explanation
   what this driver is all about and a short example how to use it.
  
  I agree, that would be the best place for it.
  
  thanks,
  
  greg k-h
 
 OK. Could you take a look at the following please?

Looks good to me.

Want to resend both of the patches so I can apply them to my tree?

thanks,

greg k-h
--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCHv4] uio: add generic driver for PCI 2.3 devices

2009-07-16 Thread Greg KH
On Thu, Jul 16, 2009 at 03:31:01PM +0300, Michael S. Tsirkin wrote:
 On Wed, Jul 15, 2009 at 03:08:29PM -0700, Greg KH wrote:
  On Wed, Jul 15, 2009 at 11:13:40PM +0300, Michael S. Tsirkin wrote:
   This adds a generic uio driver that can bind to any PCI device.  First
   user will be virtualization where a qemu userspace process needs to give
   guest OS access to the device.
   
   Interrupts are handled using the Interrupt Disable bit in the PCI command
   register and Interrupt Status bit in the PCI status register.  All devices
   compliant to PCI 2.3 (circa 2002) and all compliant PCI Express devices 
   should
   support these bits.  Driver detects this support, and won't bind to 
   devices
   which do not support the Interrupt Disable Bit in the command register.
   
   It's expected that more features of interest to virtualization will be
   added to this driver in the future. Possibilities are: mmap for device
   resources, MSI/MSI-X, eventfd (to interface with kvm), iommu.
   
   Acked-by: Chris Wright chr...@redhat.com
   Signed-off-by: Michael S. Tsirkin m...@redhat.com
   ---
   
   Hans, Greg, please review and consider for upstream.
   
   This is intended to solve the problem in virtualization that shared
   interrupts do not work with assigned devices. Earlier versions of this
   patch have circulated on k...@vger.
  
  How does this play with the pci-stub driver that I thought was written
  to solve this very problem?
 
 
 AFAIK the problem pci stub was written to solve is simply to bind to a
 device. You then have to use another kernel module which looks the
 device up with something like pci_get_bus_and_slot to do anything
 useful. In particular, for non-shared interrupts, we can disable the
 interrupt in the apic. But this does not work well for shared
 interrupts. Thus this work.
 
 The uio driver will be used in virtualization scenarious, a couple
 of possible ones that have been mentioned on the kvm list are:
 - device assignment (guest access to device) for simple devices with
   shared interrupts: emulating PCI is tricky enough to better be done in
   userspace. shared interrupt support is important as it happens
   with real devices
 - simple communication between guest and host:
   we create a virtual device in host, and userspace
   driver in guest gets events and passes them on
   to e.g. dbus. shared interrupt support is important
   to avoid wasting irqs

Ah, ok, thanks for all of the explanation, that makes sense.

greg k-h
--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCHv4] uio: add generic driver for PCI 2.3 devices

2009-07-15 Thread Greg KH
On Wed, Jul 15, 2009 at 11:13:40PM +0300, Michael S. Tsirkin wrote:
 This adds a generic uio driver that can bind to any PCI device.  First
 user will be virtualization where a qemu userspace process needs to give
 guest OS access to the device.
 
 Interrupts are handled using the Interrupt Disable bit in the PCI command
 register and Interrupt Status bit in the PCI status register.  All devices
 compliant to PCI 2.3 (circa 2002) and all compliant PCI Express devices should
 support these bits.  Driver detects this support, and won't bind to devices
 which do not support the Interrupt Disable Bit in the command register.
 
 It's expected that more features of interest to virtualization will be
 added to this driver in the future. Possibilities are: mmap for device
 resources, MSI/MSI-X, eventfd (to interface with kvm), iommu.
 
 Acked-by: Chris Wright chr...@redhat.com
 Signed-off-by: Michael S. Tsirkin m...@redhat.com
 ---
 
 Hans, Greg, please review and consider for upstream.
 
 This is intended to solve the problem in virtualization that shared
 interrupts do not work with assigned devices. Earlier versions of this
 patch have circulated on k...@vger.

How does this play with the pci-stub driver that I thought was written
to solve this very problem?  Will it conflict?

In fact, it looks like you copied the comments for this driver directly
from the pci-stub driver :)

How about moving that documentation into a place that people will notice
it, like the rest of the UIO documentation?

And right now you are just sending the irq to userspace, what is
userspace supposed to do with it?  Do you have a userspace program that
uses this interface today to verify that everything works?  If so, care
to provide a pointer to it?

thanks,

greg k-h
--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [RFC PATCH v2 09/19] net: Add vbus_enet driver

2009-04-09 Thread Greg KH
On Thu, Apr 09, 2009 at 09:37:10AM -0700, Stephen Hemminger wrote:
  +static int tx_ringlen = 256;
  +module_param(tx_ringlen, int, 0444);
  +
  +#undef PDEBUG /* undef it, just in case */
  +#ifdef VBUS_ENET_DEBUG
  +#  define PDEBUG(fmt, args...) printk(KERN_DEBUG vbus_enet:  fmt, ## 
  args)
  +#else
  +#  define PDEBUG(fmt, args...) /* not debugging: nothing */
  +#endif
 
 Why reinvent pr_debug()?

Even more important, use dev_dbg() instead please, that uniquly
describes your device and driver together, which is what you need/want,
and it ties into the dynamic debug work, so you don't need a special
kernel config option.

thanks,

greg k-h
--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v10 4/7] PCI: add SR-IOV API for Physical Function driver

2009-03-10 Thread Greg KH
On Tue, Mar 10, 2009 at 09:37:53AM +0800, Yu Zhao wrote:
 On Tue, Mar 10, 2009 at 03:39:01AM +0800, Greg KH wrote:
  On Mon, Mar 09, 2009 at 04:25:05PM +0800, Yu Zhao wrote:
 + pci_device_add(virtfn, virtfn-bus);

Greg is probably going to ding you here for adding the device, then
creating the symlinks.  I believe it's now best practice to create the
symlinks first, so there's no window where userspace can get confused.
   
   Yes, but unfortunately we can't create links before adding a device.
   I double checked device_add(), there is no place for those links to be
   created before it sends uevent. So for now, we have to trigger another
   uevent for those links.
  
  What exactly are you trying to do with a symlink here that you need to
  do it this way?  I vaguely remember you mentioning this in the past, but
  I thought you had dropped the symlinks after our conversation about this
  very problem.
 
 I'd like to create some symlinks to reflect the relationship between
 Physical Function and its associated Virtual Functions. The Physical
 Function is like a master device that controls the allocation of its
 Virtual Functions and owns the device physical resource. The Virtual
 Functions are like slave devices of the Physical Function. For example,
 if 01:00.0 is a Physical Function and 02:00.0 is a Virtual Function
 associated with 01:00.0. Then the symlinks (virtfnN and physfn) would
 look like:
 
   $ ls -l /sys/bus/pci/devices/:01:00.0/
   ...
   ...  virtfn0 - ../:02:00.0
   ...  virtfn1 - ../:02:00.1
   ...  virtfn2 - ../:02:00.2
   ...
 
   $ ls -l /sys/bus/pci/devices/:02:00.0/
   ...
   ... physfn - ../:01:00.0
   ...
 
 This is very useful for userspace applications, both KVM and Xen need
 to know this kind of relationship so they can request the permission
 from a Physical Function before using its associated Virtual Functions.

Ok, but then make sure you never rely on a udev rule or notifier to see
these symlinks when the device is added to the kernel, as there will be
a nice race condition there :)

thanks,

greg k-h
--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v10 1/7] PCI: initialize and release SR-IOV capability

2009-03-10 Thread Greg KH
On Tue, Mar 10, 2009 at 09:19:44AM +0800, Yu Zhao wrote:
 On Sat, Mar 07, 2009 at 10:38:45AM +0800, Greg KH wrote:
  On Fri, Mar 06, 2009 at 01:08:10PM -0700, Matthew Wilcox wrote:
   On Fri, Feb 20, 2009 at 02:54:42PM +0800, Yu Zhao wrote:
+   list_for_each_entry(pdev, dev-bus-devices, bus_list)
+   if (pdev-sriov)
+   break;
+   if (list_empty(dev-bus-devices) || !pdev-sriov)
+   pdev = NULL;
+   ctrl = 0;
+   if (!pdev  pci_ari_enabled(dev-bus))
+   ctrl |= PCI_SRIOV_CTRL_ARI;
+
   
   I don't like this loop.  At the end of a list_for_each_entry() loop,
   pdev will not be pointing at a pci_device, it'll be pointing to some
   offset from dev-bus-devices.  So checking pdev-sriov at this point
   is really, really bad.  I would prefer to see something like this:
   
   ctrl = 0;
   list_for_each_entry(pdev, dev-bus-devices, bus_list) {
   if (pdev-sriov)
   goto ari_enabled;
   }
   
   if (pci_ari_enabled(dev-bus))
   ctrl = PCI_SRIOV_CTRL_ARI;
ari_enabled:
   pci_write_config_word(dev, pos + PCI_SRIOV_CTRL, ctrl);
  
  No, please use bus_for_each_dev() instead, or bus_find_device(), don't
  walk the bus list by hand.  I'm kind of surprised that even builds.  Hm,
  in looking at the 2.6.29-rc kernels, I notice it will not even build at
  all, you are now forced to use those functions, which is good.
 
 The devices haven't been added at this time, so we can't use
 bus_for_each_dev(). I guess that's why the `bus-devices' exists, and
 actually pci_bus_add_devices() walks the bus list same way to retrieve
 the devices and add them.

ah, this is struct pci_bus, not struct bus_type, my mistake.

sorry for the noise,

greg k-h
--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v10 4/7] PCI: add SR-IOV API for Physical Function driver

2009-03-09 Thread Greg KH
On Mon, Mar 09, 2009 at 04:25:05PM +0800, Yu Zhao wrote:
   + pci_device_add(virtfn, virtfn-bus);
  
  Greg is probably going to ding you here for adding the device, then
  creating the symlinks.  I believe it's now best practice to create the
  symlinks first, so there's no window where userspace can get confused.
 
 Yes, but unfortunately we can't create links before adding a device.
 I double checked device_add(), there is no place for those links to be
 created before it sends uevent. So for now, we have to trigger another
 uevent for those links.

What exactly are you trying to do with a symlink here that you need to
do it this way?  I vaguely remember you mentioning this in the past, but
I thought you had dropped the symlinks after our conversation about this
very problem.

thanks,

greg k-h
--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v10 0/7] PCI: Linux kernel SR-IOV support

2009-03-08 Thread Greg KH
On Sun, Mar 08, 2009 at 09:01:09AM -0600, Matthew Wilcox wrote:
 On Sun, Mar 08, 2009 at 04:30:16PM +0200, Avi Kivity wrote:
  Matthew Wilcox wrote:
  On Tue, Feb 24, 2009 at 12:47:38PM +0200, Avi Kivity wrote:
  Do those patches allow using a VF on the host (in other words, does the 
  kernel emulate config space accesses)?
  
  SR-IOV hardware handles config space accesses to virtual functions.  No
  kernel changes needed for that aspect of it.
  
  Patches 2 and 3 of the patchset that enables SR/IOV in kvm [1] suggest 
  that at the config space is only partially implemented.
  
  [1] http://thread.gmane.org/gmane.comp.emulators.kvm.devel/29034
 
 I believe Red Hat are now members of the PCI SIG, you really should
 download the SR-IOV spec that Yu Zhao keeps pointing at.  It's going to
 be very hard to have a sensible discussion if you haven't read it.

But if any kernel developers are working for companies that are not
members of the PCI SIG, and wish to read the PCI specs, please let me
know as we are working to enable this to happen through the Linux
Foundation.

thanks,

greg k-h
--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v10 0/7] PCI: Linux kernel SR-IOV support

2009-03-06 Thread Greg KH
On Fri, Mar 06, 2009 at 12:44:11PM -0700, Matthew Wilcox wrote:
  Physical Function driver patches for Intel 82576 NIC are available:
http://patchwork.kernel.org/patch/8063/
http://patchwork.kernel.org/patch/8064/
http://patchwork.kernel.org/patch/8065/
http://patchwork.kernel.org/patch/8066/
 
 I need to review this driver; I haven't done that yet.  Has anyone else?

The driver was rejected by the upstream developers, who said it would
never be accepted.

thanks,

greg k-h
--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v10 1/7] PCI: initialize and release SR-IOV capability

2009-03-06 Thread Greg KH
On Fri, Mar 06, 2009 at 01:08:10PM -0700, Matthew Wilcox wrote:
 On Fri, Feb 20, 2009 at 02:54:42PM +0800, Yu Zhao wrote:
  +   list_for_each_entry(pdev, dev-bus-devices, bus_list)
  +   if (pdev-sriov)
  +   break;
  +   if (list_empty(dev-bus-devices) || !pdev-sriov)
  +   pdev = NULL;
  +   ctrl = 0;
  +   if (!pdev  pci_ari_enabled(dev-bus))
  +   ctrl |= PCI_SRIOV_CTRL_ARI;
  +
 
 I don't like this loop.  At the end of a list_for_each_entry() loop,
 pdev will not be pointing at a pci_device, it'll be pointing to some
 offset from dev-bus-devices.  So checking pdev-sriov at this point
 is really, really bad.  I would prefer to see something like this:
 
 ctrl = 0;
 list_for_each_entry(pdev, dev-bus-devices, bus_list) {
 if (pdev-sriov)
 goto ari_enabled;
 }
 
 if (pci_ari_enabled(dev-bus))
 ctrl = PCI_SRIOV_CTRL_ARI;
  ari_enabled:
 pci_write_config_word(dev, pos + PCI_SRIOV_CTRL, ctrl);

No, please use bus_for_each_dev() instead, or bus_find_device(), don't
walk the bus list by hand.  I'm kind of surprised that even builds.  Hm,
in looking at the 2.6.29-rc kernels, I notice it will not even build at
all, you are now forced to use those functions, which is good.

Has anyone even tried to build this patch recently?

thanks,

greg k-h
--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v10 4/7] PCI: add SR-IOV API for Physical Function driver

2009-03-06 Thread Greg KH
On Fri, Mar 06, 2009 at 01:37:18PM -0700, Matthew Wilcox wrote:
 On Fri, Feb 20, 2009 at 02:54:45PM +0800, Yu Zhao wrote:
  +   virtfn-sysdata = dev-bus-sysdata;
  +   virtfn-dev.parent = dev-dev.parent;
  +   virtfn-dev.bus = dev-dev.bus;
  +   virtfn-devfn = devfn;
  +   virtfn-hdr_type = PCI_HEADER_TYPE_NORMAL;
  +   virtfn-cfg_size = PCI_CFG_SPACE_EXP_SIZE;
  +   virtfn-error_state = pci_channel_io_normal;
  +   virtfn-current_state = PCI_UNKNOWN;
  +   virtfn-is_pcie = 1;
  +   virtfn-pcie_type = PCI_EXP_TYPE_ENDPOINT;
  +   virtfn-dma_mask = 0x;
  +   virtfn-vendor = dev-vendor;
  +   virtfn-subsystem_vendor = dev-subsystem_vendor;
  +   virtfn-class = dev-class;
 
 There seems to be a certain amount of commonality between this and
 pci_scan_device().  Have you considered trying to make a common helper
 function, or does it not work out well?
 
  +   pci_device_add(virtfn, virtfn-bus);
 
 Greg is probably going to ding you here for adding the device, then
 creating the symlinks.  I believe it's now best practice to create the
 symlinks first, so there's no window where userspace can get confused.

If the uevent gets sent before the symlinks are created, it's a bug.

thanks,

greg k-h
--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v3 0/6] ATS capability support for Intel IOMMU

2009-02-25 Thread Greg KH
On Thu, Feb 26, 2009 at 10:50:35AM +0800, Yu Zhao wrote:
 On Sun, Feb 15, 2009 at 06:59:10AM +0800, Grant Grundler wrote:
  On Thu, Feb 12, 2009 at 08:50:32PM +0800, Yu Zhao wrote:
   This patch series implements Address Translation Service support for
   the Intel IOMMU. ATS makes the PCI Endpoint be able to request the
   DMA address translation from the IOMMU and cache the translation in
   the Endpoint, thus alleviate IOMMU pressure and improve the hardware
   performance in the I/O virtualization environment.
   
   
   Changelog: v2 - v3
 1, throw error message if VT-d hardware detects invalid descriptor
on Queued Invalidation interface (David Woodhouse)
 2, avoid using pci_find_ext_capability every time when reading ATS
Invalidate Queue Depth (Matthew Wilcox)
   Changelog: v1 - v2
 added 'static' prefix to a local LIST_HEAD (Andrew Morton)
   
   
   Yu Zhao (6):
 PCI: support the ATS capability
 VT-d: parse ATSR in DMA Remapping Reporting Structure
 VT-d: add queue invalidation fault status support
 VT-d: add device IOTLB invalidation support
 VT-d: cleanup iommu_flush_iotlb_psi and flush_unmaps
 VT-d: support the device IOTLB
   
drivers/pci/dmar.c   |  230 
   ++
  
  Yu,
  Can you please add something to Documentation/PCI/pci.txt?
  New API I'm seeing are:
  +extern int pci_enable_ats(struct pci_dev *dev, int ps);
  +extern void pci_disable_ats(struct pci_dev *dev);
  +extern int pci_ats_queue_depth(struct pci_dev *dev);
 
 Yes, I'll document these new API.
 
  Do these also need to be EXPORT_SYMBOL_GPL() as well?
  Or are drivers never expected to call the above?
 
 PCI device driver shouldn't use these API, only IOMMU driver (can't be module)
 would use them. Anyway it's a good idea to export them :-)

Don't export them if no one is using them, that's just a waste of space.

thanks,

greg k-h
--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 1/2] PCI: add some sysfs ABI docs

2009-02-23 Thread Greg KH
On Mon, Feb 23, 2009 at 06:17:25PM -0800, Chris Wright wrote:
 Add sysfs ABI docs for driver entries bind, unbind and new_id.  These
 entries are pretty old, from 2.6.0 onwards AFAIK, so this documents
 current behaviour.
 
 Signed-off-by: Chris Wright chr...@sous-sol.org
 ---
  Documentation/ABI/testing/sysfs-bus-pci |   41 
 
  1 file changed, 41 insertions(+)
 
 --- a/Documentation/ABI/testing/sysfs-bus-pci
 +++ b/Documentation/ABI/testing/sysfs-bus-pci
 @@ -1,3 +1,44 @@
 +What:/sys/bus/pci/drivers/.../bind
 +Date:December 2003
 +Contact: linux-...@vger.kernel.org
 +Description:
 + Writing a device location to this file will cause
 + the driver to attempt to bind to the device found at
 + this location.  This is useful for overriding default
 + bindings.  The format for the location is: :BB:DD.F.
 + That is Domain:Bus:Device.Function and is the same as
 + found in /sys/bus/pci/devices/.  For example:
 + # echo :00:19.0  /sys/bus/pci/drivers/foo/bind

Don't you need 'echo -n' instead?  Or did we fix that problem?  Or is
that just for the new_id file?

If so, feel free to ignore the comment and add:
Acked-by: Greg Kroah-Hartman gre...@suse.de

Thanks a lot for doing this, it is much needed.

greg k-h
--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 2/2] PCI: add remove_id sysfs entry

2009-02-23 Thread Greg KH
On Mon, Feb 23, 2009 at 06:18:29PM -0800, Chris Wright wrote:
 This adds a remove_id sysfs entry to allow users of new_id to later
 remove the added dynid.  One use case is management tools that want to
 dynamically bind/unbind devices to pci-stub driver while devices are
 assigned to KVM guests.  Rather than having to track which driver was
 originally bound to the driver, a mangement tool can simply:
 
 # echo 8086 10f5  /sys/bus/pci/drivers/pci-stub/new_id
 # echo -n :00:19.0  /sys/bus/pci/devices/:00:19.0/driver/unbind
 # echo -n :00:19.0  /sys/bus/pci/drivers/pci-stub/bind

Ah, you do need -n with unbind and bind, you might want to change your
documentation :)

thanks,

greg k-h
--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


  1   2   >