Re: [PATCH 2/7] PCI: Add mask bit definition for MSI-X table

2010-11-15 Thread Sheng Yang
On Friday 12 November 2010 01:29:29 Jesse Barnes wrote:
 On Thu, 11 Nov 2010 15:46:55 +0800
 
 Sheng Yang sh...@linux.intel.com wrote:
  Then we can use it instead of magic number 1.
  
  Reviewed-by: Hidetoshi Seto seto.hideto...@jp.fujitsu.com
  Cc: Matthew Wilcox wi...@linux.intel.com
  Cc: Jesse Barnes jbar...@virtuousgeek.org
  Cc: linux-...@vger.kernel.org
  Signed-off-by: Sheng Yang sh...@linux.intel.com
  ---
  
   drivers/pci/msi.c|5 +++--
   include/linux/pci_regs.h |1 +
   2 files changed, 4 insertions(+), 2 deletions(-)
  
  diff --git a/drivers/pci/msi.c b/drivers/pci/msi.c
  index 69b7be3..095634e 100644
  --- a/drivers/pci/msi.c
  +++ b/drivers/pci/msi.c
  @@ -158,8 +158,9 @@ static u32 __msix_mask_irq(struct msi_desc *desc, u32 flag)
  
      u32 mask_bits = desc->masked;
      unsigned offset = desc->msi_attrib.entry_nr * PCI_MSIX_ENTRY_SIZE +
                                              PCI_MSIX_ENTRY_VECTOR_CTRL;
  
  -   mask_bits &= ~1;
  -   mask_bits |= flag;
  +   mask_bits &= ~PCI_MSIX_ENTRY_CTRL_MASKBIT;
  +   if (flag)
  +           mask_bits |= PCI_MSIX_ENTRY_CTRL_MASKBIT;
  
      writel(mask_bits, desc->mask_base + offset);
  
      return mask_bits;
  
  diff --git a/include/linux/pci_regs.h b/include/linux/pci_regs.h
  index acfc224..ff51632 100644
  --- a/include/linux/pci_regs.h
  +++ b/include/linux/pci_regs.h
  @@ -313,6 +313,7 @@
  
   #define  PCI_MSIX_ENTRY_UPPER_ADDR    4
   #define  PCI_MSIX_ENTRY_DATA          8
   #define  PCI_MSIX_ENTRY_VECTOR_CTRL   12
  
  +#define   PCI_MSIX_ENTRY_CTRL_MASKBIT 1
  
   /* CompactPCI Hotswap Register */
 
 Applied 1/7 and 2/7 to my linux-next tree, thanks.
 
 If it's easier to push them both through the kvm tree let me know; you
 can just add my acked-by in that case.

Thanks Jesse!

Avi, which way do you prefer?

--
regards
Yang, Sheng
--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
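
The behavioural change in the msi.c hunk quoted above can be checked outside
the kernel. What follows is a minimal user-space sketch, not kernel code:
mask_old() and mask_new() are hypothetical helper names, and only the constant
PCI_MSIX_ENTRY_CTRL_MASKBIT is taken from the patch. It suggests that the two
versions agree for flag values 0 and 1 and differ only when a caller passes
some other non-zero value.

```c
#include <assert.h>
#include <stdint.h>

#define PCI_MSIX_ENTRY_CTRL_MASKBIT 1u

/* Pre-patch logic: clear bit 0, then OR the caller's flag in verbatim. */
static uint32_t mask_old(uint32_t masked, uint32_t flag)
{
    uint32_t mask_bits = masked;

    mask_bits &= ~1u;
    mask_bits |= flag;
    return mask_bits;
}

/* Patched logic: any non-zero flag sets the mask bit and nothing else. */
static uint32_t mask_new(uint32_t masked, uint32_t flag)
{
    uint32_t mask_bits = masked;

    mask_bits &= ~PCI_MSIX_ENTRY_CTRL_MASKBIT;
    if (flag)
        mask_bits |= PCI_MSIX_ENTRY_CTRL_MASKBIT;
    return mask_bits;
}
```

With these helpers, mask_old(0, 2) yields 2 (bit 1 leaks into the vector
control word) while mask_new(0, 2) yields 1.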


Re: [PATCHv4 15/15] Pass boot device list to firmware.

2010-11-15 Thread Gleb Natapov
On Mon, Nov 15, 2010 at 09:53:50AM +0200, Michael S. Tsirkin wrote:
 On Mon, Nov 15, 2010 at 09:40:08AM +0200, Gleb Natapov wrote:
  On Sun, Nov 14, 2010 at 10:40:33PM -0500, Kevin O'Connor wrote:
   On Sun, Nov 14, 2010 at 05:39:41PM +0200, Gleb Natapov wrote:
+/*
+ * This function returns the device list as an array in the format below:
+ * +-----+-----+---------------+-----+---------------+--
+ * |  n  |  l1 |   devpath1    |  l2 |  devpath2     | ...
+ * +-----+-----+---------------+-----+---------------+--
+ * where:
+ *   n - the number of device paths (one byte)
+ *   l - length of the following device path string (one byte)
+ *   devpath - non-null-terminated string of length l representing
+ *             one device path
+ */
   
   Why not just return a newline separated list that is null terminated?
   
  Doing it like this will needlessly complicate firmware side. How do you
  know how much memory to allocate before reading device list?
 
 Do a memory scan, count newlines until you reach 0?
 
To do a memory scan you need to read it into memory first. To read it into
memory you need to know how much memory to allocate. To know how much
memory to allocate you need to do a memory scan... Notice the pattern here :)
Of course you can scan IO space too, discarding everything you read the first
time, but why introduce a broken interface in the first place?

  Doing it
  like Blue suggests (have BOOTINDEX_LEN and BOOTINDEX_STRING) solves this.
  To create a nice array from the bootindex string, your firmware will still
  have to do an additional pass over it though.
 
 Why is this a problem? Pass over memory is cheap, isn't it?
 
More code, and each line of code potentially introduces a bug. But I will go
with Blue's suggestion anyway since we already use it for other things.

  With format like above the code
  would look like that:
  
  qemu_cfg_read(&n, 1);
  arr = alloc(n);
  for (i=0; i<n; i++) {
   qemu_cfg_read(&l, 1);
   arr[i] = zalloc(l+1);
   qemu_cfg_read(arr[i], l);
  }
   
  
  --
  Gleb.
 
 
 At this point I don't care about format.
I do.

 But I would like one without 1-byte-length limitations,
 just so we can cover whatever pci can throw at us.
 
I agree. 1 byte for one device string may be too limiting. It is still
more than 15 PCI bridges on a PC, and if your pci bus goes that
deep you are doing something very wrong. But according to the spec a device
name can be 32 bytes long and a device address may be a 64-bit physical
address, and that makes the length of one device element 50 bytes.

--
Gleb.
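
The length-prefixed format discussed above can be sketched as a small parser.
This is only an illustration under stated assumptions: parse_boot_list() and
free_boot_list() are hypothetical names, and the input is taken from a memory
buffer rather than read from the fw_cfg port as real firmware would do. The
point it tries to show is that every allocation size is known before the
corresponding bytes are read.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Release an array returned by parse_boot_list(). */
static void free_boot_list(char **arr, size_t count)
{
    size_t i;

    if (!arr)
        return;
    for (i = 0; i < count; i++)
        free(arr[i]);
    free(arr);
}

/*
 * Parse "n | l1 | devpath1 | l2 | devpath2 | ..." from buf.
 * Returns an array of n NUL-terminated copies of the device paths,
 * or NULL if the buffer is truncated or an allocation fails.
 * *count receives n.
 */
static char **parse_boot_list(const uint8_t *buf, size_t len, size_t *count)
{
    size_t pos = 0, i;
    char **arr;

    if (len < 1)
        return NULL;
    *count = buf[pos++];                 /* n: one byte */
    arr = calloc(*count ? *count : 1, sizeof(*arr));
    if (!arr)
        return NULL;

    for (i = 0; i < *count; i++) {
        size_t l;

        if (pos >= len)
            goto fail;
        l = buf[pos++];                  /* l: one byte */
        if (l > len - pos)
            goto fail;
        arr[i] = calloc(l + 1, 1);       /* +1 for the added terminator */
        if (!arr[i])
            goto fail;
        memcpy(arr[i], buf + pos, l);
        pos += l;
    }
    return arr;

fail:
    free_boot_list(arr, *count);         /* unset slots are NULL */
    return NULL;
}
```

The one-byte length field is what caps each device path at 255 bytes, which is
the limitation raised in the reply above.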


Re: [PATCH 6/7] KVM: assigned dev: MSI-X mask support

2010-11-15 Thread Sheng Yang
On Monday 15 November 2010 16:03:53 Michael S. Tsirkin wrote:
 On Mon, Nov 15, 2010 at 03:48:46PM +0800, Sheng Yang wrote:
  On Monday 15 November 2010 15:42:50 Michael S. Tsirkin wrote:
   On Mon, Nov 15, 2010 at 03:37:21PM +0800, Sheng Yang wrote:
  We can get back to them if someone really does it that
  way. But for all hypervisors using QEMU, I think we haven't seen
  such behavior yet.
 
 I would rather stick to the spec than go figure out what do
 BSD/Sun/Mac do, or will do.

Sure, but there is no hurry for that. It isn't similar to the API case, so
we can achieve it incrementally.
   
   Isn't the proposed way to solve this to move vector address/data
   handling into kernel too? If yes it does affect the API.
  
  It doesn't affect the API used by this patch. So the code can still be
  modified later.
 
 Then won't we have to support two APIs, forever?

In fact, for this patch the logic is pretty straightforward. I don't think this
patch would trouble us if we really have to support two APIs (userspace and
in-kernel routing table) in the end. Just check msix_mmio_write(): you will find
it costs just a few lines (maybe just one line, r = -EOPNOTSUPP) to get the
modification handled in userspace. All other logic mostly remains the same as
in this patch.

IMO mask bit support and irq routing are two separate things. It makes no sense
to block the mask bit support patch because of one idea for a possible irq
routing change in the future.

--
regards
Yang, Sheng

 
  --
  regards
  Yang, Sheng


Re: [PATCHv4 15/15] Pass boot device list to firmware.

2010-11-15 Thread Gleb Natapov
On Sun, Nov 14, 2010 at 10:50:13PM +, Blue Swirl wrote:
 On Sun, Nov 14, 2010 at 3:39 PM, Gleb Natapov g...@redhat.com wrote:
 
  Signed-off-by: Gleb Natapov g...@redhat.com
  ---
   hw/fw_cfg.c |   14 ++
   hw/fw_cfg.h |    4 +++-
   sysemu.h    |    1 +
   vl.c        |   51 +++
   4 files changed, 69 insertions(+), 1 deletions(-)
 
  diff --git a/hw/fw_cfg.c b/hw/fw_cfg.c
  index 7b9434f..f6a67db 100644
  --- a/hw/fw_cfg.c
  +++ b/hw/fw_cfg.c
  @@ -53,6 +53,7 @@ struct FWCfgState {
      FWCfgFiles *files;
      uint16_t cur_entry;
      uint32_t cur_offset;
  +    Notifier machine_ready;
   };
 
   static void fw_cfg_write(FWCfgState *s, uint8_t value)
  @@ -315,6 +316,15 @@ int fw_cfg_add_file(FWCfgState *s,  const char *filename, uint8_t *data,
      return 1;
   }
 
  +static void fw_cfg_machine_ready(struct Notifier* n)
  +{
  +    uint32_t len;
  +    char *bootindex = get_boot_devices_list(&len);
  +
  +    fw_cfg_add_bytes(container_of(n, FWCfgState, machine_ready),
  +                     FW_CFG_BOOTINDEX, (uint8_t*)bootindex, len);
  +}
  +
   FWCfgState *fw_cfg_init(uint32_t ctl_port, uint32_t data_port,
                          target_phys_addr_t ctl_addr, target_phys_addr_t data_addr)
   {
  @@ -343,6 +353,10 @@ FWCfgState *fw_cfg_init(uint32_t ctl_port, uint32_t data_port,
      fw_cfg_add_i16(s, FW_CFG_MAX_CPUS, (uint16_t)max_cpus);
      fw_cfg_add_i16(s, FW_CFG_BOOT_MENU, (uint16_t)boot_menu);
 
  +
  +    s->machine_ready.notify = fw_cfg_machine_ready;
  +    qemu_add_machine_init_done_notifier(&s->machine_ready);
  +
      return s;
   }
 
  diff --git a/hw/fw_cfg.h b/hw/fw_cfg.h
  index 856bf91..4d61410 100644
  --- a/hw/fw_cfg.h
  +++ b/hw/fw_cfg.h
  @@ -30,7 +30,9 @@
 
   #define FW_CFG_FILE_FIRST       0x20
   #define FW_CFG_FILE_SLOTS       0x10
  -#define FW_CFG_MAX_ENTRY        (FW_CFG_FILE_FIRST+FW_CFG_FILE_SLOTS)
  +#define FW_CFG_FILE_LAST_SLOT   (FW_CFG_FILE_FIRST+FW_CFG_FILE_SLOTS)
  +#define FW_CFG_BOOTINDEX        (FW_CFG_FILE_LAST_SLOT + 1)
  +#define FW_CFG_MAX_ENTRY        FW_CFG_BOOTINDEX
 
 This should be
 #define FW_CFG_MAX_ENTRY        (FW_CFG_BOOTINDEX + 1)
 because the check is like this:
     if ((key & FW_CFG_ENTRY_MASK) >= FW_CFG_MAX_ENTRY) {
         s->cur_entry = FW_CFG_INVALID;
 
Yeah, will fix.

 With that change, I got the bootindex passed to OpenBIOS:
 OpenBIOS for Sparc64
 Configuration device id QEMU version 1 machine id 0
 kernel cmdline
 CPUs: 1 x SUNW,UltraSPARC-IIi
 UUID: ----
 bootindex num_strings 1
 bootindex /p...@01fe/i...@5/dr...@1/d...@0
 
 The device path does not match exactly, but it's close:
 /p...@1fe,0/pci-...@5/i...@600/d...@0

pbm-pci should be solvable by the patch at the end. Where in the spec
is it allowed to abbreviate 1fe as 1fe,0? The spec allows dropping
leading zeroes, but the TARGET_FMT_plx definition in targphys.h has 0 after
%. I can define another one without leading zeroes. Can you suggest
a name?  TARGET_FMT_lx is poisoned. As for ATA, there is no Open Firmware
binding spec for ATA, so everyone does what he pleases. I based my
implementation on what Open Firmware shows when running on qemu x86.
pci-ata should be ide according to the PCI binding spec :)

diff --git a/hw/apb_pci.c b/hw/apb_pci.c
index c619112..643aa49 100644
--- a/hw/apb_pci.c
+++ b/hw/apb_pci.c
@@ -453,6 +453,7 @@ static PCIDeviceInfo pbm_pci_host_info = {
 
 static SysBusDeviceInfo pbm_host_info = {
     .qdev.name = "pbm",
+    .qdev.fw_name = "pci",
     .qdev.size = sizeof(APBState),
     .qdev.reset = pci_pbm_reset,
     .init = pci_pbm_init_device,
--
Gleb.
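
The off-by-one pointed out in the review above can be reproduced in a few
lines. This is a hedged sketch rather than the QEMU source: the FW_CFG_FILE_*
and FW_CFG_BOOTINDEX values are copied from the quoted hunk, the value of
FW_CFG_ENTRY_MASK is an assumption made for illustration, and
fw_cfg_key_valid() is a hypothetical helper that mirrors the quoted ">=" check
with the limit passed in as a parameter.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define FW_CFG_FILE_FIRST      0x20
#define FW_CFG_FILE_SLOTS      0x10
#define FW_CFG_FILE_LAST_SLOT  (FW_CFG_FILE_FIRST + FW_CFG_FILE_SLOTS)
#define FW_CFG_BOOTINDEX       (FW_CFG_FILE_LAST_SLOT + 1)

/* Assumed value, for illustration only. */
#define FW_CFG_ENTRY_MASK      0x3fff

/*
 * Mirror of the quoted selector check: a key is rejected when its
 * masked value is greater than or equal to max_entry.
 */
static bool fw_cfg_key_valid(uint16_t key, uint16_t max_entry)
{
    return !((key & FW_CFG_ENTRY_MASK) >= max_entry);
}
```

With max_entry equal to FW_CFG_BOOTINDEX the new key is rejected by its own
limit; with FW_CFG_BOOTINDEX + 1 it is accepted, which is the fix requested in
the review.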


Re: New @ Proxmox: -device, vhost... Docs, notes?

2010-11-15 Thread Markus Armbruster
linux_...@proinbox.com writes:

 Hi Everyone,

 I'm impressed with all the activity I see here since joining the list
 this year.
 It helps to reinforce that I chose the right technology. Thanks.



 The -device method and vhost=on option recently became available to us at
 the ProxmoxVE project, and I'm preparing to start making use of them this
 coming week.

 I found some docs on linux-kvm.org, and still have a bit to look through
 and see what I can find.

 I don't want to state on the record that I will or I won't but it's
 crossed my mind to assemble something for our wiki, and if I do decide
 to it would be nice to have some fresh material.
 So I'm just writing to see if anyone would like to share any of your
 favorite reference material- ie. links, benchmarks, drawings, diagrams,
 etc.


 Kind Regards,

For -device, check out docs/qdev-device-use.txt.  It could use a minor
update.

Other obvious resources:
http://wiki.qemu.org/Main_Page
http://www.linux-kvm.org/page/Main_Page

Fresh material:
http://www.linux-kvm.org/page/KVM_Forum_2010#Presentations


Re: Why so many vm exits caused by ept violation

2010-11-15 Thread Avi Kivity

On 11/15/2010 09:24 AM, lidong chen wrote:

the address is the Region 1 of virtio_net.

why virtio_net use this address caused ept violation?


It's probably the MSIX mask bit.  Older kernels program this bit twice 
on every interrupt.  Newer kernels do this much less frequently.  Try 
with a new kernel and see.


--
error compiling committee.c: too many arguments to function



[PATCH 4/6] KVM: Add kvm_get_irq_routing_entry() func

2010-11-15 Thread Sheng Yang
We need to query the entry later.

Signed-off-by: Sheng Yang sh...@linux.intel.com
---
 include/linux/kvm_host.h |2 ++
 virt/kvm/irq_comm.c  |   20 
 2 files changed, 22 insertions(+), 0 deletions(-)

diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
index 9da2f1a..274655b 100644
--- a/include/linux/kvm_host.h
+++ b/include/linux/kvm_host.h
@@ -669,6 +669,8 @@ int kvm_set_irq_routing(struct kvm *kvm,
const struct kvm_irq_routing_entry *entries,
unsigned nr,
unsigned flags);
+int kvm_get_irq_routing_entry(struct kvm *kvm, int gsi,
+   struct kvm_kernel_irq_routing_entry *entry);
 void kvm_free_irq_routing(struct kvm *kvm);
 
 #else
diff --git a/virt/kvm/irq_comm.c b/virt/kvm/irq_comm.c
index 8edca91..ae1dc7c 100644
--- a/virt/kvm/irq_comm.c
+++ b/virt/kvm/irq_comm.c
@@ -421,6 +421,26 @@ out:
return r;
 }
 
+int kvm_get_irq_routing_entry(struct kvm *kvm, int gsi,
+               struct kvm_kernel_irq_routing_entry *entry)
+{
+       int count = 0;
+       struct kvm_kernel_irq_routing_entry *ei = NULL;
+       struct kvm_irq_routing_table *irq_rt;
+       struct hlist_node *n;
+
+       rcu_read_lock();
+       irq_rt = rcu_dereference(kvm->irq_routing);
+       if (gsi < irq_rt->nr_rt_entries)
+               hlist_for_each_entry(ei, n, &irq_rt->map[gsi], link)
+                       count++;
+       if (count == 1)
+               *entry = *ei;
+       rcu_read_unlock();
+
+       return (count != 1);
+}
+
 #define IOAPIC_ROUTING_ENTRY(irq) \
{ .gsi = irq, .type = KVM_IRQ_ROUTING_IRQCHIP,  \
  .u.irqchip.irqchip = KVM_IRQCHIP_IOAPIC, .u.irqchip.pin = (irq) }
-- 
1.7.0.1
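
The lookup rule in the patch above (succeed only when exactly one routing
entry maps to the GSI) can be sketched in user space. This is a simplified
illustration, not the kernel code: struct route and get_unique_route() are
hypothetical, and a flat array replaces the RCU-protected hlist. The return
convention follows the patch: 0 when a unique entry was copied out, 1
otherwise.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for struct kvm_kernel_irq_routing_entry. */
struct route {
    int gsi;
    int type;
};

/*
 * Copy the route for gsi into *out if and only if exactly one table
 * entry matches. Returns 0 on success and 1 when there is no match
 * or more than one match (in which case *out is left untouched).
 */
static int get_unique_route(const struct route *tbl, size_t n, int gsi,
                            struct route *out)
{
    const struct route *found = NULL;
    int count = 0;
    size_t i;

    for (i = 0; i < n; i++) {
        if (tbl[i].gsi == gsi) {
            found = &tbl[i];
            count++;
        }
    }
    if (count == 1)
        *out = *found;
    return count != 1;
}
```

Treating "several entries" as a failure keeps the caller from silently picking
one of several routes for the same GSI.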



[PATCH 0/6 v5] MSI-X mask support for assigned device

2010-11-15 Thread Sheng Yang
Change from v4:
1. Rebased on latest KVM
2. Fix minor comments.
3. Drop the big-endian patch because its correctness cannot be guaranteed. Add
a TODO for it.

Change from v3:
1. Re-design the userspace API.
2. Add big-endian support for msix_mmio_read/write()(untested!)

Change from v2:
1. Move all mask handling to kernel, and userspace has to access it using API.
2. Discard userspace mask bit operation API.

Sheng Yang (6):
  PCI: MSI: Move MSI-X entry definition to pci_regs.h
  PCI: Add mask bit definition for MSI-X table
  KVM: Move struct kvm_io_device to kvm_host.h
  KVM: Add kvm_get_irq_routing_entry() func
  KVM: assigned dev: Clean up assigned_device's flag
  KVM: assigned dev: MSI-X mask support

 arch/x86/kvm/x86.c   |1 +
 drivers/pci/msi.c|5 +-
 drivers/pci/msi.h|6 -
 include/linux/kvm.h  |   32 +
 include/linux/kvm_host.h |   31 +
 include/linux/pci_regs.h |8 +
 virt/kvm/assigned-dev.c  |  325 +-
 virt/kvm/iodev.h |   25 +
 virt/kvm/irq_comm.c  |   20 +++
 9 files changed, 417 insertions(+), 36 deletions(-)



[PATCH 6/6] KVM: assigned dev: MSI-X mask support

2010-11-15 Thread Sheng Yang
This patch enables per-vector masking for assigned devices using MSI-X.

This patch provides two new APIs: one is for the guest to specify the device's
MSI-X table address in MMIO, the other is for userspace to get information
about the mask bit.

All mask bit operations are kept in the kernel, in order to accelerate them.
Userspace shouldn't access the device MMIO directly for the information;
instead it should use the provided API to do so.

Signed-off-by: Sheng Yang sh...@linux.intel.com
---
 arch/x86/kvm/x86.c   |1 +
 include/linux/kvm.h  |   32 +
 include/linux/kvm_host.h |5 +
 virt/kvm/assigned-dev.c  |  318 +-
 4 files changed, 355 insertions(+), 1 deletions(-)

diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index fc29223..37602e2 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -1966,6 +1966,7 @@ int kvm_dev_ioctl_check_extension(long ext)
case KVM_CAP_X86_ROBUST_SINGLESTEP:
case KVM_CAP_XSAVE:
case KVM_CAP_ASYNC_PF:
+   case KVM_CAP_MSIX_MASK:
r = 1;
break;
case KVM_CAP_COALESCED_MMIO:
diff --git a/include/linux/kvm.h b/include/linux/kvm.h
index ea2dc1a..b3e5ffe 100644
--- a/include/linux/kvm.h
+++ b/include/linux/kvm.h
@@ -541,6 +541,9 @@ struct kvm_ppc_pvinfo {
 #define KVM_CAP_PPC_GET_PVINFO 57
 #define KVM_CAP_PPC_IRQ_LEVEL 58
 #define KVM_CAP_ASYNC_PF 59
+#ifdef __KVM_HAVE_MSIX
+#define KVM_CAP_MSIX_MASK 60
+#endif
 
 #ifdef KVM_CAP_IRQ_ROUTING
 
@@ -672,6 +675,9 @@ struct kvm_clock_data {
 #define KVM_XEN_HVM_CONFIG        _IOW(KVMIO,  0x7a, struct kvm_xen_hvm_config)
 #define KVM_SET_CLOCK             _IOW(KVMIO,  0x7b, struct kvm_clock_data)
 #define KVM_GET_CLOCK             _IOR(KVMIO,  0x7c, struct kvm_clock_data)
+/* Available with KVM_CAP_MSIX_MASK */
+#define KVM_GET_MSIX_ENTRY        _IOWR(KVMIO,  0x7d, struct kvm_msix_entry)
+#define KVM_UPDATE_MSIX_MMIO      _IOW(KVMIO,  0x7e, struct kvm_msix_mmio)
 /* Available with KVM_CAP_PIT_STATE2 */
 #define KVM_GET_PIT2              _IOR(KVMIO,  0x9f, struct kvm_pit_state2)
 #define KVM_SET_PIT2              _IOW(KVMIO,  0xa0, struct kvm_pit_state2)
@@ -795,4 +801,30 @@ struct kvm_assigned_msix_entry {
__u16 padding[3];
 };
 
+#define KVM_MSIX_TYPE_ASSIGNED_DEV     1
+
+#define KVM_MSIX_FLAG_MASKBIT          (1 << 0)
+#define KVM_MSIX_FLAG_QUERY_MASKBIT    (1 << 0)
+
+struct kvm_msix_entry {
+       __u32 id;
+       __u32 type;
+       __u32 entry; /* The index of entry in the MSI-X table */
+       __u32 flags;
+       __u32 query_flags;
+       __u32 reserved[5];
+};
+
+#define KVM_MSIX_MMIO_FLAG_REGISTER    (1 << 0)
+#define KVM_MSIX_MMIO_FLAG_UNREGISTER  (1 << 1)
+
+struct kvm_msix_mmio {
+       __u32 id;
+       __u32 type;
+       __u64 base_addr;
+       __u32 max_entries_nr;
+       __u32 flags;
+       __u32 reserved[6];
+};
+
 #endif /* __LINUX_KVM_H */
diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
index f09db87..57a437a 100644
--- a/include/linux/kvm_host.h
+++ b/include/linux/kvm_host.h
@@ -501,6 +501,7 @@ struct kvm_guest_msix_entry {
 };
 
 #define KVM_ASSIGNED_ENABLED_IOMMU             (1 << 0)
+#define KVM_ASSIGNED_ENABLED_MSIX_MMIO         (1 << 1)
 struct kvm_assigned_dev_kernel {
struct kvm_irq_ack_notifier ack_notifier;
struct work_struct interrupt_work;
@@ -521,6 +522,10 @@ struct kvm_assigned_dev_kernel {
struct pci_dev *dev;
struct kvm *kvm;
spinlock_t assigned_dev_lock;
+   DECLARE_BITMAP(msix_mask_bitmap, KVM_MAX_MSIX_PER_DEV);
+   gpa_t msix_mmio_base;
+   struct kvm_io_device msix_mmio_dev;
+   int msix_max_entries_nr;
 };
 
 struct kvm_irq_mask_notifier {
diff --git a/virt/kvm/assigned-dev.c b/virt/kvm/assigned-dev.c
index 5c6b96d..76a1f12 100644
--- a/virt/kvm/assigned-dev.c
+++ b/virt/kvm/assigned-dev.c
@@ -226,12 +226,27 @@ static void kvm_free_assigned_irq(struct kvm *kvm,
        kvm_deassign_irq(kvm, assigned_dev, assigned_dev->irq_requested_type);
 }
 
+static void unregister_msix_mmio(struct kvm *kvm,
+                                struct kvm_assigned_dev_kernel *adev)
+{
+       if (adev->flags & KVM_ASSIGNED_ENABLED_MSIX_MMIO) {
+               mutex_lock(&kvm->slots_lock);
+               kvm_io_bus_unregister_dev(kvm, KVM_MMIO_BUS,
+                               &adev->msix_mmio_dev);
+               mutex_unlock(&kvm->slots_lock);
+               adev->flags &= ~KVM_ASSIGNED_ENABLED_MSIX_MMIO;
+       }
+}
+
 static void kvm_free_assigned_device(struct kvm *kvm,
                                     struct kvm_assigned_dev_kernel
                                     *assigned_dev)
 {
        kvm_free_assigned_irq(kvm, assigned_dev);
 
+#ifdef __KVM_HAVE_MSIX
+       unregister_msix_mmio(kvm, assigned_dev);
+#endif
        pci_reset_function(assigned_dev->dev);
 
        pci_release_regions(assigned_dev->dev);
@@ -504,7 +519,7 @@ out:
 static int kvm_vm_ioctl_assign_device(struct kvm *kvm,

[PATCH 5/6] KVM: assigned dev: Clean up assigned_device's flag

2010-11-15 Thread Sheng Yang
Reusing KVM_DEV_ASSIGN_ENABLE_IOMMU for an in-kernel struct doesn't make much
sense.

Signed-off-by: Sheng Yang sh...@linux.intel.com
---
 include/linux/kvm_host.h |1 +
 virt/kvm/assigned-dev.c  |7 ---
 2 files changed, 5 insertions(+), 3 deletions(-)

diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
index 274655b..f09db87 100644
--- a/include/linux/kvm_host.h
+++ b/include/linux/kvm_host.h
@@ -500,6 +500,7 @@ struct kvm_guest_msix_entry {
u16 flags;
 };
 
+#define KVM_ASSIGNED_ENABLED_IOMMU             (1 << 0)
 struct kvm_assigned_dev_kernel {
struct kvm_irq_ack_notifier ack_notifier;
struct work_struct interrupt_work;
diff --git a/virt/kvm/assigned-dev.c b/virt/kvm/assigned-dev.c
index 7c98928..5c6b96d 100644
--- a/virt/kvm/assigned-dev.c
+++ b/virt/kvm/assigned-dev.c
@@ -552,7 +552,8 @@ static int kvm_vm_ioctl_assign_device(struct kvm *kvm,
        match->host_segnr = assigned_dev->segnr;
        match->host_busnr = assigned_dev->busnr;
        match->host_devfn = assigned_dev->devfn;
-       match->flags = assigned_dev->flags;
+       if (assigned_dev->flags & KVM_DEV_ASSIGN_ENABLE_IOMMU)
+               match->flags |= KVM_ASSIGNED_ENABLED_IOMMU;
        match->dev = dev;
        spin_lock_init(&match->assigned_dev_lock);
        match->irq_source_id = -1;
@@ -563,7 +564,7 @@ static int kvm_vm_ioctl_assign_device(struct kvm *kvm,
 
        list_add(&match->list, &kvm->arch.assigned_dev_head);
 
-       if (assigned_dev->flags & KVM_DEV_ASSIGN_ENABLE_IOMMU) {
+       if (assigned_dev->flags & KVM_ASSIGNED_ENABLED_IOMMU) {
                if (!kvm->arch.iommu_domain) {
                        r = kvm_iommu_map_guest(kvm);
                        if (r)
@@ -609,7 +610,7 @@ static int kvm_vm_ioctl_deassign_device(struct kvm *kvm,
                goto out;
        }
 
-       if (match->flags & KVM_DEV_ASSIGN_ENABLE_IOMMU)
+       if (match->flags & KVM_ASSIGNED_ENABLED_IOMMU)
                kvm_deassign_device(kvm, match);
 
        kvm_free_assigned_device(kvm, match);
-- 
1.7.0.1
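
The idea behind this cleanup, keeping the userspace flag namespace separate
from the kernel-internal one, can be sketched as follows. The macro names,
their values and kernel_flags_from_uapi() are hypothetical stand-ins chosen
for illustration; they are not the definitions from kvm.h or kvm_host.h.

```c
#include <assert.h>
#include <stdint.h>

/* Stand-in for the userspace-visible flag (hypothetical value). */
#define UAPI_ENABLE_IOMMU        (1u << 0)

/* Stand-ins for kernel-internal flags (hypothetical values). */
#define KERN_ENABLED_IOMMU       (1u << 0)
#define KERN_ENABLED_MSIX_MMIO   (1u << 1)

/*
 * Translate userspace flags into kernel-internal flags bit by bit
 * instead of copying the word, so that unknown userspace bits can
 * never alias an internal flag.
 */
static uint32_t kernel_flags_from_uapi(uint32_t uapi_flags)
{
    uint32_t flags = 0;

    if (uapi_flags & UAPI_ENABLE_IOMMU)
        flags |= KERN_ENABLED_IOMMU;
    return flags;
}
```

With a plain copy, a userspace caller setting bit 1 would appear to have
enabled the internal MSI-X MMIO state; with the translation that bit is
dropped.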



[PATCH 2/6] PCI: Add mask bit definition for MSI-X table

2010-11-15 Thread Sheng Yang
Then we can use it instead of magic number 1.

Reviewed-by: Hidetoshi Seto seto.hideto...@jp.fujitsu.com
Acked-by: Jesse Barnes jbar...@virtuousgeek.org
Cc: Matthew Wilcox wi...@linux.intel.com
Cc: linux-...@vger.kernel.org
Signed-off-by: Sheng Yang sh...@linux.intel.com
---
 drivers/pci/msi.c|5 +++--
 include/linux/pci_regs.h |1 +
 2 files changed, 4 insertions(+), 2 deletions(-)

diff --git a/drivers/pci/msi.c b/drivers/pci/msi.c
index 7c24dce..44b0aee 100644
--- a/drivers/pci/msi.c
+++ b/drivers/pci/msi.c
@@ -168,8 +168,9 @@ static u32 __msix_mask_irq(struct msi_desc *desc, u32 flag)
        u32 mask_bits = desc->masked;
        unsigned offset = desc->msi_attrib.entry_nr * PCI_MSIX_ENTRY_SIZE +
                                                PCI_MSIX_ENTRY_VECTOR_CTRL;
-       mask_bits &= ~1;
-       mask_bits |= flag;
+       mask_bits &= ~PCI_MSIX_ENTRY_CTRL_MASKBIT;
+       if (flag)
+               mask_bits |= PCI_MSIX_ENTRY_CTRL_MASKBIT;
        writel(mask_bits, desc->mask_base + offset);
 
        return mask_bits;
diff --git a/include/linux/pci_regs.h b/include/linux/pci_regs.h
index b21d33e..d4f2c80 100644
--- a/include/linux/pci_regs.h
+++ b/include/linux/pci_regs.h
@@ -315,6 +315,7 @@
 #define  PCI_MSIX_ENTRY_UPPER_ADDR    4
 #define  PCI_MSIX_ENTRY_DATA          8
 #define  PCI_MSIX_ENTRY_VECTOR_CTRL   12
+#define   PCI_MSIX_ENTRY_CTRL_MASKBIT 1
 
 /* CompactPCI Hotswap Register */
 
-- 
1.7.0.1



[PATCH 1/6] PCI: MSI: Move MSI-X entry definition to pci_regs.h

2010-11-15 Thread Sheng Yang
Then it can be used by others.

Reviewed-by: Hidetoshi Seto seto.hideto...@jp.fujitsu.com
Reviewed-by: Matthew Wilcox wi...@linux.intel.com
Acked-by: Jesse Barnes jbar...@virtuousgeek.org
Cc: linux-...@vger.kernel.org
Signed-off-by: Sheng Yang sh...@linux.intel.com
---
 drivers/pci/msi.h|6 --
 include/linux/pci_regs.h |7 +++
 2 files changed, 7 insertions(+), 6 deletions(-)

diff --git a/drivers/pci/msi.h b/drivers/pci/msi.h
index feff3be..65c42f8 100644
--- a/drivers/pci/msi.h
+++ b/drivers/pci/msi.h
@@ -6,12 +6,6 @@
 #ifndef MSI_H
 #define MSI_H
 
-#define PCI_MSIX_ENTRY_SIZE            16
-#define  PCI_MSIX_ENTRY_LOWER_ADDR     0
-#define  PCI_MSIX_ENTRY_UPPER_ADDR     4
-#define  PCI_MSIX_ENTRY_DATA           8
-#define  PCI_MSIX_ENTRY_VECTOR_CTRL    12
-
 #define msi_control_reg(base)          (base + PCI_MSI_FLAGS)
 #define msi_lower_address_reg(base)    (base + PCI_MSI_ADDRESS_LO)
 #define msi_upper_address_reg(base)    (base + PCI_MSI_ADDRESS_HI)
diff --git a/include/linux/pci_regs.h b/include/linux/pci_regs.h
index af83076..b21d33e 100644
--- a/include/linux/pci_regs.h
+++ b/include/linux/pci_regs.h
@@ -309,6 +309,13 @@
 #define PCI_MSIX_PBA   8
 #define  PCI_MSIX_FLAGS_BIRMASK        (7 << 0)
 
+/* MSI-X entry's format */
+#define PCI_MSIX_ENTRY_SIZE            16
+#define  PCI_MSIX_ENTRY_LOWER_ADDR     0
+#define  PCI_MSIX_ENTRY_UPPER_ADDR     4
+#define  PCI_MSIX_ENTRY_DATA           8
+#define  PCI_MSIX_ENTRY_VECTOR_CTRL    12
+
 /* CompactPCI Hotswap Register */
 
 #define PCI_CHSWP_CSR  2   /* Control and Status Register */
-- 
1.7.0.1
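
Once these offsets live in pci_regs.h, code outside drivers/pci can decode an
access to the MSI-X table. The following is a hedged user-space sketch of such
a decode: the PCI_MSIX_ENTRY_* values are the ones from the patch, while
struct msix_access and msix_decode() are hypothetical names invented for this
example.

```c
#include <assert.h>
#include <stdint.h>

#define PCI_MSIX_ENTRY_SIZE         16
#define PCI_MSIX_ENTRY_LOWER_ADDR   0
#define PCI_MSIX_ENTRY_UPPER_ADDR   4
#define PCI_MSIX_ENTRY_DATA         8
#define PCI_MSIX_ENTRY_VECTOR_CTRL  12

/* Result of decoding one MMIO address inside the MSI-X table. */
struct msix_access {
    uint32_t entry;   /* index of the table entry */
    uint32_t field;   /* byte offset inside that entry */
};

/*
 * Split addr into (entry, field) for a table starting at base with
 * nr_entries entries. Returns 0 on success, -1 if addr lies outside
 * the table.
 */
static int msix_decode(uint64_t base, uint32_t nr_entries, uint64_t addr,
                       struct msix_access *out)
{
    uint64_t off;

    if (addr < base)
        return -1;
    off = addr - base;
    if (off / PCI_MSIX_ENTRY_SIZE >= nr_entries)
        return -1;
    out->entry = (uint32_t)(off / PCI_MSIX_ENTRY_SIZE);
    out->field = (uint32_t)(off % PCI_MSIX_ENTRY_SIZE);
    return 0;
}
```

An access whose field equals PCI_MSIX_ENTRY_VECTOR_CTRL is the one that
carries the per-vector mask bit.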



Re: [PATCH 3/4] KVM: MMU: notrap it if gpte's reserved is set

2010-11-15 Thread Avi Kivity

On 11/15/2010 07:41 AM, Xiao Guangrong wrote:

On 11/14/2010 06:56 PM, Avi Kivity wrote:
  On 11/12/2010 12:34 PM, Xiao Guangrong wrote:
  We can pass the page fault to the guest directly if the gpte's reserved
  bit is set


  How can that work? shadow_notrap_nonpresent_pte causes a fault with
  PFEC.P=PFEC.RSVD=0, while we need PFEC.P=PFEC.RSVD=1.


Ah, I missed it for a long time; thanks for pointing it out.

The same mistake is in 'prefetch' path, i'll fix it in the v2 version.


Doesn't access.flat catch this?

Ideally we'd have a test case to catch this, but it may be hard to write.
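
The mismatch described above can be made concrete with the x86 page-fault
error code bits. This is an illustrative sketch only: the bit positions are
the architectural ones (P is bit 0, W/R bit 1, U/S bit 2, RSVD bit 3), while
guest_pfec() is a hypothetical helper and not a KVM function.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define PFERR_PRESENT_MASK  (1u << 0)
#define PFERR_WRITE_MASK    (1u << 1)
#define PFERR_USER_MASK     (1u << 2)
#define PFERR_RSVD_MASK     (1u << 3)

/*
 * Error code the guest should observe for a faulting access.
 * A not-present entry reports P=0 and RSVD=0; a present entry with
 * reserved bits set reports P=1 and RSVD=1.
 */
static uint32_t guest_pfec(bool present, bool rsvd_set, bool write, bool user)
{
    uint32_t pfec = 0;

    if (write)
        pfec |= PFERR_WRITE_MASK;
    if (user)
        pfec |= PFERR_USER_MASK;
    if (present) {
        pfec |= PFERR_PRESENT_MASK;
        if (rsvd_set)
            pfec |= PFERR_RSVD_MASK;
    }
    return pfec;
}
```

For a read from supervisor mode, the not-present case gives 0x0 and the
reserved-bit case gives 0x9, so a fault injected through the not-present path
cannot stand in for a reserved-bit fault.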

--
error compiling committee.c: too many arguments to function



[PATCH 6/6 v5 updated] KVM: assigned dev: MSI-X mask support

2010-11-15 Thread Sheng Yang
This patch enables per-vector masking for assigned devices using MSI-X.

This patch provides two new APIs: one is for the guest to specify the device's
MSI-X table address in MMIO, the other is for userspace to get information
about the mask bit.

All mask bit operations are kept in the kernel, in order to accelerate them.
Userspace shouldn't access the device MMIO directly for the information;
instead it should use the provided API to do so.

Signed-off-by: Sheng Yang sh...@linux.intel.com
---
 arch/x86/kvm/x86.c   |1 +
 include/linux/kvm.h  |   32 +
 include/linux/kvm_host.h |5 +
 virt/kvm/assigned-dev.c  |  323 +-
 4 files changed, 360 insertions(+), 1 deletions(-)

diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index fc29223..37602e2 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -1966,6 +1966,7 @@ int kvm_dev_ioctl_check_extension(long ext)
case KVM_CAP_X86_ROBUST_SINGLESTEP:
case KVM_CAP_XSAVE:
case KVM_CAP_ASYNC_PF:
+   case KVM_CAP_MSIX_MASK:
r = 1;
break;
case KVM_CAP_COALESCED_MMIO:
diff --git a/include/linux/kvm.h b/include/linux/kvm.h
index ea2dc1a..b3e5ffe 100644
--- a/include/linux/kvm.h
+++ b/include/linux/kvm.h
@@ -541,6 +541,9 @@ struct kvm_ppc_pvinfo {
 #define KVM_CAP_PPC_GET_PVINFO 57
 #define KVM_CAP_PPC_IRQ_LEVEL 58
 #define KVM_CAP_ASYNC_PF 59
+#ifdef __KVM_HAVE_MSIX
+#define KVM_CAP_MSIX_MASK 60
+#endif
 
 #ifdef KVM_CAP_IRQ_ROUTING
 
@@ -672,6 +675,9 @@ struct kvm_clock_data {
 #define KVM_XEN_HVM_CONFIG        _IOW(KVMIO,  0x7a, struct kvm_xen_hvm_config)
 #define KVM_SET_CLOCK             _IOW(KVMIO,  0x7b, struct kvm_clock_data)
 #define KVM_GET_CLOCK             _IOR(KVMIO,  0x7c, struct kvm_clock_data)
+/* Available with KVM_CAP_MSIX_MASK */
+#define KVM_GET_MSIX_ENTRY        _IOWR(KVMIO,  0x7d, struct kvm_msix_entry)
+#define KVM_UPDATE_MSIX_MMIO      _IOW(KVMIO,  0x7e, struct kvm_msix_mmio)
 /* Available with KVM_CAP_PIT_STATE2 */
 #define KVM_GET_PIT2              _IOR(KVMIO,  0x9f, struct kvm_pit_state2)
 #define KVM_SET_PIT2              _IOW(KVMIO,  0xa0, struct kvm_pit_state2)
@@ -795,4 +801,30 @@ struct kvm_assigned_msix_entry {
__u16 padding[3];
 };
 
+#define KVM_MSIX_TYPE_ASSIGNED_DEV     1
+
+#define KVM_MSIX_FLAG_MASKBIT          (1 << 0)
+#define KVM_MSIX_FLAG_QUERY_MASKBIT    (1 << 0)
+
+struct kvm_msix_entry {
+       __u32 id;
+       __u32 type;
+       __u32 entry; /* The index of entry in the MSI-X table */
+       __u32 flags;
+       __u32 query_flags;
+       __u32 reserved[5];
+};
+
+#define KVM_MSIX_MMIO_FLAG_REGISTER    (1 << 0)
+#define KVM_MSIX_MMIO_FLAG_UNREGISTER  (1 << 1)
+
+struct kvm_msix_mmio {
+       __u32 id;
+       __u32 type;
+       __u64 base_addr;
+       __u32 max_entries_nr;
+       __u32 flags;
+       __u32 reserved[6];
+};
+
 #endif /* __LINUX_KVM_H */
diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
index f09db87..57a437a 100644
--- a/include/linux/kvm_host.h
+++ b/include/linux/kvm_host.h
@@ -501,6 +501,7 @@ struct kvm_guest_msix_entry {
 };
 
 #define KVM_ASSIGNED_ENABLED_IOMMU             (1 << 0)
+#define KVM_ASSIGNED_ENABLED_MSIX_MMIO         (1 << 1)
 struct kvm_assigned_dev_kernel {
struct kvm_irq_ack_notifier ack_notifier;
struct work_struct interrupt_work;
@@ -521,6 +522,10 @@ struct kvm_assigned_dev_kernel {
struct pci_dev *dev;
struct kvm *kvm;
spinlock_t assigned_dev_lock;
+   DECLARE_BITMAP(msix_mask_bitmap, KVM_MAX_MSIX_PER_DEV);
+   gpa_t msix_mmio_base;
+   struct kvm_io_device msix_mmio_dev;
+   int msix_max_entries_nr;
 };
 
 struct kvm_irq_mask_notifier {
diff --git a/virt/kvm/assigned-dev.c b/virt/kvm/assigned-dev.c
index 5c6b96d..a96a74d 100644
--- a/virt/kvm/assigned-dev.c
+++ b/virt/kvm/assigned-dev.c
@@ -226,12 +226,27 @@ static void kvm_free_assigned_irq(struct kvm *kvm,
        kvm_deassign_irq(kvm, assigned_dev, assigned_dev->irq_requested_type);
 }
 
+static void unregister_msix_mmio(struct kvm *kvm,
+                                struct kvm_assigned_dev_kernel *adev)
+{
+       if (adev->flags & KVM_ASSIGNED_ENABLED_MSIX_MMIO) {
+               mutex_lock(&kvm->slots_lock);
+               kvm_io_bus_unregister_dev(kvm, KVM_MMIO_BUS,
+                               &adev->msix_mmio_dev);
+               mutex_unlock(&kvm->slots_lock);
+               adev->flags &= ~KVM_ASSIGNED_ENABLED_MSIX_MMIO;
+       }
+}
+
 static void kvm_free_assigned_device(struct kvm *kvm,
                                     struct kvm_assigned_dev_kernel
                                     *assigned_dev)
 {
        kvm_free_assigned_irq(kvm, assigned_dev);
 
+#ifdef __KVM_HAVE_MSIX
+       unregister_msix_mmio(kvm, assigned_dev);
+#endif
        pci_reset_function(assigned_dev->dev);
 
        pci_release_regions(assigned_dev->dev);
@@ -504,7 +519,7 @@ out:
 static int kvm_vm_ioctl_assign_device(struct kvm *kvm,

Re: [PATCH v2 5/5] KVM: MMU: retry #PF for softmmu

2010-11-15 Thread Avi Kivity

On 11/15/2010 07:25 AM, Xiao Guangrong wrote:

On 11/14/2010 06:46 PM, Avi Kivity wrote:
  On 11/12/2010 08:50 AM, Xiao Guangrong wrote:
  Retry #PF for softmmu only when the current vcpu has the same
  root shadow page as at the time when the #PF occurred. It means they
  have the same paging environment



Hi Avi,

Thanks for your review.

  The process could have been killed and replaced by another using the
  same cr3.

Yeah, this 'retry' is unnecessary if the process is killed, but this
case is infrequent; in most cases the process keeps running and tries
to access the fault address later.


The problem is that if we retry in this case, we install an incorrect spte?


And we can get a few advantages even if the process has been killed,
since we can fix the page mapping for other processes which have
the same CR3; if another process accesses the fault address, the #PF
can be avoided. (Of course we can't predict whether another process will
access the fault address later.)

After all, this is a speculative path; I think it can work well in most
cases. :-)

  Or we may be running a guest that uses the same cr3 for all
  processes.

We can allow retrying the #PF in the same CR3 even if there are different
processes, since these processes have the same page mapping; the later #PF
can be avoided if the page mapping has been fixed.


The guest may have changed page directories or other levels.


  Or another thread may have mmap()ed something else over the
  same address.

The mmap virtual address is also visible for other threads since the threads
have the same page table, so i think this case is the same as above?


Again, don't we install the wrong spte in this case?

--
error compiling committee.c: too many arguments to function

--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v2 5/5] KVM: MMU: retry #PF for softmmu

2010-11-15 Thread Xiao Guangrong
On 11/15/2010 05:30 PM, Avi Kivity wrote:

> > Yeah, this 'retry' is unnecessary if the process is killed, but this
> > case is infrequent, the most case is the process keeps running and try
> > to access the fault address later.
>
> The problem is that if we retry in this case, we install an incorrect spte?
>

..

> > can avoid if the page mapping have been fixed.
>
> The guest may have changed page directories or other levels.
>

..

> > > Or another thread may have mmap()ed something else over the
> > > same address.
> >
> > The mmap virtual address is also visible for other threads since the
> > threads
> > have the same page table, so i think this case is the same as above?
>
> Again, don't we install the wrong spte in this case?
>

I think it doesn't corrupts spte since we will walk guest page table again
and map it to shadow pages when we retry #PF.


Re: [PATCH v2 5/5] KVM: MMU: retry #PF for softmmu

2010-11-15 Thread Gleb Natapov
On Mon, Nov 15, 2010 at 05:55:25PM +0800, Xiao Guangrong wrote:
> On 11/15/2010 05:30 PM, Avi Kivity wrote:
>
> > > Yeah, this 'retry' is unnecessary if the process is killed, but this
> > > case is infrequent, the most case is the process keeps running and try
> > > to access the fault address later.
> >
> > The problem is that if we retry in this case, we install an incorrect spte?
> >
>
> ..
>
> > > can avoid if the page mapping have been fixed.
> >
> > The guest may have changed page directories or other levels.
> >
>
> ..
>
> > > > Or another thread may have mmap()ed something else over the
> > > > same address.
> > >
> > > The mmap virtual address is also visible for other threads since the
> > > threads
> > > have the same page table, so i think this case is the same as above?
> >
> > Again, don't we install the wrong spte in this case?
> >
>
> I think it doesn't corrupts spte since we will walk guest page table again
> and map it to shadow pages when we retry #PF.
But if the page is not mapped by new process we can inject #PF into a
guest.

--
Gleb.


Re: [PATCH v2 5/5] KVM: MMU: retry #PF for softmmu

2010-11-15 Thread Avi Kivity

On 11/15/2010 11:55 AM, Xiao Guangrong wrote:
> > > > Or another thread may have mmap()ed something else over the
> > > > same address.
> > >
> > > The mmap virtual address is also visible for other threads since the
> > > threads
> > > have the same page table, so i think this case is the same as above?
> >
> > Again, don't we install the wrong spte in this case?
> >
>
> I think it doesn't corrupts spte since we will walk guest page table again
> and map it to shadow pages when we retry #PF.


Well, you're right, we don't use any gfn/pfn info from the async page fault.

However, we're still not modelling the cpu accurately.  For example we 
will set dirty and accessed bits, or inject a page fault if the gpte 
turns out to be not present.
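
To make the concern concrete, here is a minimal sketch (not KVM code) of the side effects a guest page-table walk has on x86. The helper name walk_gpte and its return convention are assumptions made up for illustration; only the bit positions (present = bit 0, accessed = bit 5, dirty = bit 6) come from the architecture.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* x86 page-table entry bits (architectural). */
#define PTE_PRESENT  (UINT64_C(1) << 0)
#define PTE_ACCESSED (UINT64_C(1) << 5)
#define PTE_DIRTY    (UINT64_C(1) << 6)

/*
 * Hypothetical helper: emulate the side effects of touching one guest PTE.
 * Returns false if a #PF would have to be injected (gpte not present);
 * otherwise sets the accessed bit, plus the dirty bit for a write.
 */
bool walk_gpte(uint64_t *gpte, bool write)
{
    assert(gpte != NULL);

    if (!(*gpte & PTE_PRESENT))
        return false;   /* a speculative retry would inject a fault here */

    *gpte |= PTE_ACCESSED;
    if (write)
        *gpte |= PTE_DIRTY;
    return true;
}
```

If the retry runs this walk for an access the guest never actually repeats, the guest sees A/D bits (or a fault) that real hardware would not have produced.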


--
error compiling committee.c: too many arguments to function



Re: [PATCH -v2] Monitor command: pfa2hva, translate guest physical address to host virtual address

2010-11-15 Thread Andi Kleen
> The issue of d) is that there are multiple ways to inject MCE. Now one
> software based, one APEI based, and maybe some others in the future.
> They all use different interfaces. And as debug interface, there are not
> considered kernel ABI too (some are in debugfs). So I think it is better
> to use these ABI only in some test suite.

In some cases the injection may be also through external hardware
debugging mechanisms. So yes requiring that in qemu isn't a good
idea.

-Andi
-- 
a...@linux.intel.com -- Speaking for myself only.


Re: [PATCH v2 5/5] KVM: MMU: retry #PF for softmmu

2010-11-15 Thread Xiao Guangrong
On 11/15/2010 05:59 PM, Avi Kivity wrote:
> On 11/15/2010 11:55 AM, Xiao Guangrong wrote:
> > > > > Or another thread may have mmap()ed something else over the
> > > > > same address.
> > > >
> > > > The mmap virtual address is also visible for other threads since the
> > > > threads
> > > > have the same page table, so i think this case is the same as above?
> > >
> > > Again, don't we install the wrong spte in this case?
> > >
> >
> > I think it doesn't corrupts spte since we will walk guest page table
> > again
> > and map it to shadow pages when we retry #PF.
>
> Well, you're right, we don't use any gfn/pfn info from the async page
> fault.
>
> However, we're still not modelling the cpu accurately.  For example we
> will set dirty and accessed bits, or inject a page fault if the gpte
> turns out to be not present.
>

Yes, i missed this, will cook it. Thanks.


Re: [PATCHv4 13/15] Add bootindex for option roms.

2010-11-15 Thread Gleb Natapov
On Sun, Nov 14, 2010 at 09:33:06PM +0000, Blue Swirl wrote:
> On Sun, Nov 14, 2010 at 3:39 PM, Gleb Natapov g...@redhat.com wrote:
> > Extend -option-rom command to have additional parameter ,bootindex=.
>
> This patch is broken:
>   CC    arm-softmmu/palm.o
> /src/qemu/hw/palm.c: In function 'palmte_init':
> /src/qemu/hw/palm.c:237: error: incompatible type for argument 1 of
> 'get_image_size'
> /src/qemu/hw/palm.c:245: error: incompatible type for argument 1 of
> 'load_image_targphys'
> cc1: warnings being treated as errors
> /src/qemu/hw/palm.c:250: error: format '%s' expects type 'char *', but
> argument 4 has type 'QEMUOptionRom'
>   CC    arm-softmmu/nseries.o
> /src/qemu/hw/nseries.c: In function 'n8x0_init':
> /src/qemu/hw/nseries.c:1346: error: incompatible type for argument 1
> of 'load_image_targphys'
Fixed in new version.

--
Gleb.


[RFC][PATCH] qemu-kvm: Drop vga dirty logging workarounds

2010-11-15 Thread Jan Kiszka
These diffs to upstream should all date back to the days qemu-kvm
supported vga dirty logging with restricted/broken kvm kernel modules.
We no longer do, so there is no need for those workarounds. Even worse
they can trigger internal bug checks these days:

BUG: kvm_dirty_pages_log_change: invalid parameters 
000a8000-000a

Signed-off-by: Jan Kiszka jan.kis...@siemens.com
---
 hw/cirrus_vga.c |   16 ----------------
 hw/vga-pci.c    |    2 --
 hw/vga.c        |   44 ++++++++------------------------------------
 3 files changed, 8 insertions(+), 54 deletions(-)

diff --git a/hw/cirrus_vga.c b/hw/cirrus_vga.c
index a580b57..35b8b0e 100644
--- a/hw/cirrus_vga.c
+++ b/hw/cirrus_vga.c
@@ -32,7 +32,6 @@
 #include "console.h"
 #include "vga_int.h"
 #include "kvm.h"
-#include "qemu-kvm.h"
 #include "loader.h"
 
 /*
@@ -2553,7 +2552,6 @@ static CPUWriteMemoryFunc * const cirrus_linear_bitblt_write[3] = {
 
 static void map_linear_vram(CirrusVGAState *s)
 {
-    vga_dirty_log_stop(&s->vga);
     if (!s->vga.map_addr && s->vga.lfb_addr && s->vga.lfb_end) {
         s->vga.map_addr = s->vga.lfb_addr;
         s->vga.map_end = s->vga.lfb_end;
@@ -2566,16 +2564,11 @@ static void map_linear_vram(CirrusVGAState *s)
 #ifndef TARGET_IA64
     s->vga.lfb_vram_mapped = 0;
 
-    cpu_register_physical_memory(isa_mem_base + 0xa0000, 0x8000,
-                                (s->vga.vram_offset + s->cirrus_bank_base[0]) | IO_MEM_UNASSIGNED);
-    cpu_register_physical_memory(isa_mem_base + 0xa8000, 0x8000,
-                                (s->vga.vram_offset + s->cirrus_bank_base[1]) | IO_MEM_UNASSIGNED);
     if (!(s->cirrus_srcptr != s->cirrus_srcptr_end)
         && !((s->vga.sr[0x07] & 0x01) == 0)
         && !((s->vga.gr[0x0B] & 0x14) == 0x14)
         && !(s->vga.gr[0x0B] & 0x02)) {
 
-        vga_dirty_log_stop(&s->vga);
         cpu_register_physical_memory(isa_mem_base + 0xa0000, 0x8000,
                                      (s->vga.vram_offset + s->cirrus_bank_base[0]) | IO_MEM_RAM);
         cpu_register_physical_memory(isa_mem_base + 0xa8000, 0x8000,
@@ -2594,7 +2587,6 @@ static void map_linear_vram(CirrusVGAState *s)
 
 static void unmap_linear_vram(CirrusVGAState *s)
 {
-    vga_dirty_log_stop(&s->vga);
     if (s->vga.map_addr && s->vga.lfb_addr && s->vga.lfb_end) {
         s->vga.map_addr = s->vga.map_end = 0;
         cpu_register_physical_memory(s->vga.lfb_addr, s->vga.vram_size,
@@ -2602,8 +2594,6 @@ static void unmap_linear_vram(CirrusVGAState *s)
     }
     cpu_register_physical_memory(isa_mem_base + 0xa0000, 0x20000,
                                  s->vga.vga_io_memory);
-
-    vga_dirty_log_start(&s->vga);
 }
 
 /* Compute the memory access functions */
@@ -3156,8 +3146,6 @@ static void cirrus_pci_lfb_map(PCIDevice *d, int region_num,
 {
     CirrusVGAState *s = &DO_UPCAST(PCICirrusVGAState, dev, d)->cirrus_vga;
 
-    vga_dirty_log_stop(&s->vga);
-
     /* XXX: add byte swapping apertures */
     cpu_register_physical_memory(addr, s->vga.vram_size,
                                  s->cirrus_linear_io_addr);
@@ -3189,14 +3177,10 @@ static void pci_cirrus_write_config(PCIDevice *d,
     PCICirrusVGAState *pvs = DO_UPCAST(PCICirrusVGAState, dev, d);
     CirrusVGAState *s = &pvs->cirrus_vga;
 
-    vga_dirty_log_stop(&s->vga);
-
     pci_default_write_config(d, address, val, len);
     if (s->vga.map_addr && d->io_regions[0].addr == PCI_BAR_UNMAPPED)
         s->vga.map_addr = 0;
     cirrus_update_memory_access(s);
-
-    vga_dirty_log_start(&s->vga);
 }
 
 static int pci_cirrus_vga_initfn(PCIDevice *dev)
diff --git a/hw/vga-pci.c b/hw/vga-pci.c
index 3907871..2315f70 100644
--- a/hw/vga-pci.c
+++ b/hw/vga-pci.c
@@ -68,11 +68,9 @@ static void pci_vga_write_config(PCIDevice *d,
     PCIVGAState *pvs = container_of(d, PCIVGAState, dev);
     VGACommonState *s = &pvs->vga;
 
-    vga_dirty_log_stop(s);
     pci_default_write_config(d, address, val, len);
     if (s->map_addr && pvs->dev.io_regions[0].addr == -1)
         s->map_addr = 0;
-    vga_dirty_log_start(s);
 }
 
 static int pci_vga_initfn(PCIDevice *dev)
diff --git a/hw/vga.c b/hw/vga.c
index c316f72..36763df 100644
--- a/hw/vga.c
+++ b/hw/vga.c
@@ -1282,8 +1282,6 @@ static void vga_draw_text(VGACommonState *s, int full_update)
     vga_draw_glyph8_func *vga_draw_glyph8;
     vga_draw_glyph9_func *vga_draw_glyph9;
 
-    vga_dirty_log_stop(s);
-
     /* compute font data address (in plane 2) */
     v = s->sr[3];
     offset = (((v >> 4) & 1) | ((v << 1) & 6)) * 8192 * 4 + 2;
@@ -1595,65 +1593,40 @@ static void vga_sync_dirty_bitmap(VGACommonState *s)
     }
 #endif
 
-    vga_dirty_log_start(s);
-}
-
-static int s1, s2, s3;
-
-static void mark_dirty(target_phys_addr_t start, target_phys_addr_t len)
-{
-    target_phys_addr_t end = start + len;
-
-    while (start < end) {
-        cpu_physical_memory_set_dirty(cpu_get_physical_page_desc(start));
-        start += TARGET_PAGE_SIZE;
-    }
 }
 
 void vga_dirty_log_start(VGACommonState *s)
 {
     if (kvm_enabled() && s->map_addr)
-        if (!s1) {
-

Re: [Qemu-devel] Re: [PATCH 2/3] virtio-pci: Use ioeventfd for virtqueue notify

2010-11-15 Thread Stefan Hajnoczi
On Sun, Nov 14, 2010 at 12:19 PM, Avi Kivity a...@redhat.com wrote:
 On 11/14/2010 01:05 PM, Avi Kivity wrote:

 I agree, but let's enable virtio-ioeventfd carefully because bad code
 is out there.


 Sure.  Note as long as the thread waiting on ioeventfd doesn't consume too
 much cpu, it will awaken quickly and we won't have the transaction per
 timeslice effect.

 btw, what about virtio-blk with linux-aio?  Have you benchmarked that with
 and without ioeventfd?


 And, what about efficiency?  As in bits/cycle?

We are running benchmarks with this latest patch and will report results.

Stefan


Re: [PATCH] device-assignment: register a reset function

2010-11-15 Thread Jan Kiszka
[Wrong list, it's not upstream yet. I'm migrating the thread to kvm.]

Am 15.11.2010 12:33, Bernhard Kohl wrote:
> This is necessary because during reboot of a VM the assigned devices
> continue DMA transfers which causes memory corruption.
> 
> Signed-off-by: Thomas Ostler thomas.ost...@nsn.com
> Signed-off-by: Bernhard Kohl bernhard.k...@nsn.com
> ---
> Sorry for the long delay. Finally we added Alex' suggestions
> and rebased the patch.
> 
> Thanks
> Bernhard
> ---
>  hw/device-assignment.c |   12 ++++++++++++
>  1 files changed, 12 insertions(+), 0 deletions(-)
> 
> diff --git a/hw/device-assignment.c b/hw/device-assignment.c
> index 5f5bde1..3f8de66 100644
> --- a/hw/device-assignment.c
> +++ b/hw/device-assignment.c
> @@ -1434,6 +1434,17 @@ static void assigned_dev_unregister_msix_mmio(AssignedDevice *dev)
>      dev->msix_table_page = NULL;
>  }
>  
> +static void reset_assigned_device(DeviceState *dev)
> +{
> +    PCIDevice *d = DO_UPCAST(PCIDevice, qdev, dev);
> +    uint32_t conf;
> +
> +    /* reset the bus master bit to avoid further DMA transfers */
> +    conf = assigned_dev_pci_read_config(d, PCI_COMMAND, 2);
> +    conf &= ~PCI_COMMAND_MASTER;
> +    assigned_dev_pci_write_config(d, PCI_COMMAND, conf, 2);

What about writing to /sys/bus/pci/devices/$DEVICE/reset? You probably
still need to put the command word into the reset state (ie. no RMW in
any case, just write 0), but the hardware should receive a reset as well
- if it is capable of doing a function-level reset, but we should at
least try.
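
A rough sketch of the two pieces discussed here, as standalone host-side C rather than qemu code: the difference between the read-modify-write in the patch and writing the reset state, and a write of "1" to the sysfs reset attribute. All function names below are made up for illustration, and the device address format ("0000:01:00.0") is an assumption; the kernel only exposes the reset file when it knows a reset method for the function.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

#define PCI_COMMAND_MASTER 0x4 /* bus master enable bit, as in pci_regs.h */

/* What the patch does: read-modify-write, clearing only bus mastering. */
uint16_t command_clear_master(uint16_t cmd)
{
    return cmd & (uint16_t)~PCI_COMMAND_MASTER;
}

/* What the reply suggests: the reset state of the command word, i.e. 0. */
uint16_t command_reset_state(void)
{
    return 0;
}

/*
 * Build the sysfs reset path for a device address such as "0000:01:00.0".
 * Returns 0 on success, -1 if buf is too small.
 */
int pci_reset_path(char *buf, size_t len, const char *bdf)
{
    int n;

    assert(buf != NULL && bdf != NULL && strlen(bdf) > 0);
    n = snprintf(buf, len, "/sys/bus/pci/devices/%s/reset", bdf);
    return (n < 0 || (size_t)n >= len) ? -1 : 0;
}

/* Ask the host kernel to reset the function by writing "1" to that file. */
int pci_trigger_reset(const char *path)
{
    FILE *f = fopen(path, "w");

    if (f == NULL)
        return -1;      /* no reset method known, or no permission */
    if (fputs("1", f) == EOF) {
        fclose(f);
        return -1;
    }
    return fclose(f) == 0 ? 0 : -1;
}
```

A caller would treat a failing pci_trigger_reset() as "no reset available" and still write the reset-state command word, so DMA stops either way.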

> +}
> +
>  static int assigned_initfn(struct PCIDevice *pci_dev)
>  {
>      AssignedDevice *dev = DO_UPCAST(AssignedDevice, dev, pci_dev);
> @@ -1544,6 +1555,7 @@ static PCIDeviceInfo assign_info = {
>      .qdev.name    = "pci-assign",
>      .qdev.desc    = "pass through host pci devices to the guest",
>      .qdev.size    = sizeof(AssignedDevice),
> +    .qdev.reset   = reset_assigned_device,
>      .init         = assigned_initfn,
>      .exit         = assigned_exitfn,
>      .config_read  = assigned_dev_pci_read_config,

Jan

-- 
Siemens AG, Corporate Technology, CT T DE IT 1
Corporate Competence Center Embedded Linux


Re: Why so many vm exits caused by ept violation

2010-11-15 Thread lidong chen
2010/11/15, Avi Kivity a...@redhat.com:
> On 11/15/2010 09:24 AM, lidong chen wrote:
> > the address is the Region 1 of virtio_net.
> >
> > why virtio_net use this address caused ept violation?
>
> It's probably the MSIX mask bit.  Older kernels program this bit twice
> on every interrupt.  Newer kernels do this much less frequently.  Try
> with a new kernel and see.
>
> --
> error compiling committee.c: too many arguments to function




Re: Why so many vm exits caused by ept violation

2010-11-15 Thread lidong chen
I think the address may be initialized in virtio_pci_probe():

    err = pci_request_regions(pci_dev, "virtio-pci");

But I do not know where this address is used afterwards.


2010/11/15, lidong chen chen.lidong.ker...@gmail.com:
> 2010/11/15, Avi Kivity a...@redhat.com:
> > On 11/15/2010 09:24 AM, lidong chen wrote:
> > > the address is the Region 1 of virtio_net.
> > >
> > > why virtio_net use this address caused ept violation?
> >
> > It's probably the MSIX mask bit.  Older kernels program this bit twice
> > on every interrupt.  Newer kernels do this much less frequently.  Try
> > with a new kernel and see.
> >
> > --
> > error compiling committee.c: too many arguments to function





Re: [RFC][PATCH] qemu-kvm: Drop vga dirty logging workarounds

2010-11-15 Thread Avi Kivity

On 11/15/2010 12:32 PM, Jan Kiszka wrote:
> These diffs to upstream should all date back to the days qemu-kvm
> supported vga dirty logging with restricted/broken kvm kernel modules.
> We no longer do, so there is no need for those workarounds. Even worse
> they can trigger internal bug checks these days:
>
> BUG: kvm_dirty_pages_log_change: invalid parameters
> 000a8000-000a


I'd like to apply this.  What kind of testing has this seen?  autotest 
likely isn't a good enough test.


--
error compiling committee.c: too many arguments to function



Re: [PATCHv4 15/15] Pass boot device list to firmware.

2010-11-15 Thread Kevin O'Connor
On Mon, Nov 15, 2010 at 09:40:08AM +0200, Gleb Natapov wrote:
> On Sun, Nov 14, 2010 at 10:40:33PM -0500, Kevin O'Connor wrote:
> > Why not just return a newline separated list that is null terminated?
> >
> Doing it like this will needlessly complicate firmware side. How do you
> know how much memory to allocate before reading device list?

My preference would be for the size to be exposed via the
QEMU_CFG_FILE_DIR selector.  (My preference would be for all objects
in fw_cfg to have entries in QEMU_CFG_FILE_DIR describing their size
in a reliable manner.)

-Kevin


Re: [RFC][PATCH] qemu-kvm: Drop vga dirty logging workarounds

2010-11-15 Thread Jan Kiszka
Am 15.11.2010 14:16, Avi Kivity wrote:
> On 11/15/2010 12:32 PM, Jan Kiszka wrote:
> > These diffs to upstream should all date back to the days qemu-kvm
> > supported vga dirty logging with restricted/broken kvm kernel modules.
> > We no longer do, so there is no need for those workarounds. Even worse
> > they can trigger internal bug checks these days:
> >
> > BUG: kvm_dirty_pages_log_change: invalid parameters
> > 000a8000-000a
>
> I'd like to apply this.  What kind of testing has this seen?  autotest
> likely isn't a good enough test.

No systematic testing.

It's based on the fact that I'm not aware of any VGA issues in upstream
when KVM is enabled, on the fact that explicit log enable/disable became
obsolete when switching to upstream's logging backend, and the annoying
BUG messages only issued under qemu-kvm. There just remains a slight
uncertainty about what mark_dirty and s[123] were once doing - or may
still be useful for.

Jan

-- 
Siemens AG, Corporate Technology, CT T DE IT 1
Corporate Competence Center Embedded Linux


Re: [RFC][PATCH] qemu-kvm: Drop vga dirty logging workarounds

2010-11-15 Thread Avi Kivity

On 11/15/2010 03:26 PM, Jan Kiszka wrote:
> Am 15.11.2010 14:16, Avi Kivity wrote:
> > On 11/15/2010 12:32 PM, Jan Kiszka wrote:
> > > These diffs to upstream should all date back to the days qemu-kvm
> > > supported vga dirty logging with restricted/broken kvm kernel modules.
> > > We no longer do, so there is no need for those workarounds. Even worse
> > > they can trigger internal bug checks these days:
> > >
> > > BUG: kvm_dirty_pages_log_change: invalid parameters
> > > 000a8000-000a
> >
> > I'd like to apply this.  What kind of testing has this seen?  autotest
> > likely isn't a good enough test.
>
> No systematic testing.
>
> It's based on the fact that I'm not aware of any VGA issues in upstream
> when KVM is enabled,

I don't think upstream+kvm sees a lot of testing.

> on the fact that explicit log enable/disable became
> obsolete when switching to upstream's logging backend, and the annoying
> BUG messages only issued under qemu-kvm. There just remains a slight
> uncertainty about what mark_dirty and s[123] were once doing - or may
> still be useful for.


I remember re-adding some kvm-specific dirty logging code after 
regressions were uncovered, so I'm a little worried.  I guess we can 
risk it, we can always revert it (or fix) if it turns out badly.


--
error compiling committee.c: too many arguments to function



Re: [PATCHv4 15/15] Pass boot device list to firmware.

2010-11-15 Thread Gleb Natapov
On Mon, Nov 15, 2010 at 08:26:35AM -0500, Kevin O'Connor wrote:
> On Mon, Nov 15, 2010 at 09:40:08AM +0200, Gleb Natapov wrote:
> > On Sun, Nov 14, 2010 at 10:40:33PM -0500, Kevin O'Connor wrote:
> > > Why not just return a newline separated list that is null terminated?
> > >
> > Doing it like this will needlessly complicate firmware side. How do you
> > know how much memory to allocate before reading device list?
>
> My preference would be for the size to be exposed via the
> QEMU_CFG_FILE_DIR selector.  (My preference would be for all objects
> in fw_cfg to have entries in QEMU_CFG_FILE_DIR describing their size
> in a reliable manner.)
>
Would the interface suggested by Blue work for you? It is the one with two
fw_cfg ids: BOOTINDEX_LEN for the length and BOOTINDEX_DATA for the device
list. I have already changed my implementation to this one. Using FILE_DIR
requires us to generate a synthetic name. Hmm, BTW, I do not see proper
endianness handling in FILE_DIR.
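
To illustrate why a separate length helps the firmware side, here is a minimal sketch, not the code from the patch. The function build_boot_list and the newline-separated layout are assumptions for illustration; the length it returns stands in for what a BOOTINDEX_LEN entry would expose, so the reader can allocate before fetching the data.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/*
 * Hypothetical helper: join n device paths into one newline-separated,
 * NUL-terminated buffer. The byte count (including the final NUL) is
 * stored in *len. Returns a malloc()ed buffer the caller must free(),
 * or NULL if allocation fails.
 */
char *build_boot_list(const char *const *paths, size_t n, size_t *len)
{
    size_t total = 1;   /* room for the final NUL */
    size_t off = 0;
    size_t i;
    char *buf;

    assert(paths != NULL && len != NULL);

    for (i = 0; i < n; i++)
        total += strlen(paths[i]) + 1;  /* path plus '\n' */

    buf = malloc(total);
    if (buf == NULL)
        return NULL;

    for (i = 0; i < n; i++) {
        size_t l = strlen(paths[i]);

        memcpy(buf + off, paths[i], l);
        off += l;
        buf[off++] = '\n';
    }
    buf[off] = '\0';
    *len = total;
    return buf;
}
```

With only the data entry, the firmware would have to read byte by byte until it hits the terminator, or guess a maximum size up front.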

--
Gleb.


seabios 0.6.1 regression

2010-11-15 Thread Avi Kivity
Installing Windows XP with seabios 0.6.1, immediately after the first 
reboot, Windows hangs in protected mode instead of proceeding with 
installation.


I'm bisecting this, but if anyone can point to a likely culprit, I can 
try it first.


--
error compiling committee.c: too many arguments to function



Re: [PATCHv4 15/15] Pass boot device list to firmware.

2010-11-15 Thread Gleb Natapov
On Mon, Nov 15, 2010 at 03:36:25PM +0200, Gleb Natapov wrote:
> Hmm BTW I do not see proper endianness
> handling in FILE_DIR.
>
That's just me. Everything is OK there with endianness.

--
Gleb.


[PATCHv5 02/15] Introduce new BusInfo callback get_fw_dev_path.

2010-11-15 Thread Gleb Natapov
The new get_fw_dev_path callback will be used to build a device path usable
by firmware, in contrast to the qdev qemu-internal device path.

Signed-off-by: Gleb Natapov g...@redhat.com
---
 hw/qdev.h |    2 ++
 1 files changed, 2 insertions(+), 0 deletions(-)

diff --git a/hw/qdev.h b/hw/qdev.h
index 9f90efe..dc669b3 100644
--- a/hw/qdev.h
+++ b/hw/qdev.h
@@ -49,12 +49,14 @@ struct DeviceState {
 
 typedef void (*bus_dev_printfn)(Monitor *mon, DeviceState *dev, int indent);
 typedef char *(*bus_get_dev_path)(DeviceState *dev);
+typedef char *(*bus_get_fw_dev_path)(DeviceState *dev);
 
 struct BusInfo {
     const char *name;
     size_t size;
     bus_dev_printfn print_dev;
     bus_get_dev_path get_dev_path;
+    bus_get_fw_dev_path get_fw_dev_path;
     Property *props;
 };
 
-- 
1.7.1



[PATCHv5 01/15] Introduce fw_name field to DeviceInfo structure.

2010-11-15 Thread Gleb Natapov
Add fw_name to DeviceInfo for use in device path building. In
contrast to name, fw_name should refer to the functionality a device
provides rather than to a particular device model, as name does.

Signed-off-by: Gleb Natapov g...@redhat.com
---
 hw/fdc.c        |    1 +
 hw/ide/isa.c    |    1 +
 hw/ide/qdev.c   |    1 +
 hw/isa-bus.c    |    1 +
 hw/lance.c      |    1 +
 hw/piix_pci.c   |    1 +
 hw/qdev.h       |    6 ++++++
 hw/usb-hub.c    |    1 +
 hw/usb-net.c    |    1 +
 hw/virtio-pci.c |    1 +
 10 files changed, 15 insertions(+), 0 deletions(-)

diff --git a/hw/fdc.c b/hw/fdc.c
index c159dcb..a467c4b 100644
--- a/hw/fdc.c
+++ b/hw/fdc.c
@@ -2040,6 +2040,7 @@ static const VMStateDescription vmstate_isa_fdc ={
 static ISADeviceInfo isa_fdc_info = {
     .init = isabus_fdc_init1,
     .qdev.name  = "isa-fdc",
+    .qdev.fw_name  = "fdc",
     .qdev.size  = sizeof(FDCtrlISABus),
     .qdev.no_user = 1,
     .qdev.vmsd  = &vmstate_isa_fdc,
diff --git a/hw/ide/isa.c b/hw/ide/isa.c
index 6b57e0d..9856435 100644
--- a/hw/ide/isa.c
+++ b/hw/ide/isa.c
@@ -98,6 +98,7 @@ ISADevice *isa_ide_init(int iobase, int iobase2, int isairq,
 
 static ISADeviceInfo isa_ide_info = {
     .qdev.name  = "isa-ide",
+    .qdev.fw_name  = "ide",
     .qdev.size  = sizeof(ISAIDEState),
     .init   = isa_ide_initfn,
     .qdev.reset = isa_ide_reset,
diff --git a/hw/ide/qdev.c b/hw/ide/qdev.c
index 0808760..6d27b60 100644
--- a/hw/ide/qdev.c
+++ b/hw/ide/qdev.c
@@ -134,6 +134,7 @@ static int ide_drive_initfn(IDEDevice *dev)
 
 static IDEDeviceInfo ide_drive_info = {
     .qdev.name  = "ide-drive",
+    .qdev.fw_name  = "drive",
     .qdev.size  = sizeof(IDEDrive),
     .init   = ide_drive_initfn,
     .qdev.props = (Property[]) {
diff --git a/hw/isa-bus.c b/hw/isa-bus.c
index 4e306de..26036e0 100644
--- a/hw/isa-bus.c
+++ b/hw/isa-bus.c
@@ -153,6 +153,7 @@ static int isabus_bridge_init(SysBusDevice *dev)
 static SysBusDeviceInfo isabus_bridge_info = {
     .init = isabus_bridge_init,
     .qdev.name  = "isabus-bridge",
+    .qdev.fw_name  = "isa",
     .qdev.size  = sizeof(SysBusDevice),
     .qdev.no_user = 1,
 };
diff --git a/hw/lance.c b/hw/lance.c
index dc12144..1a3bb1a 100644
--- a/hw/lance.c
+++ b/hw/lance.c
@@ -141,6 +141,7 @@ static void lance_reset(DeviceState *dev)
 static SysBusDeviceInfo lance_info = {
     .init   = lance_init,
     .qdev.name  = "lance",
+    .qdev.fw_name  = "ethernet",
     .qdev.size  = sizeof(SysBusPCNetState),
     .qdev.reset = lance_reset,
     .qdev.vmsd  = &vmstate_lance,
diff --git a/hw/piix_pci.c b/hw/piix_pci.c
index b5589b9..38f9d9e 100644
--- a/hw/piix_pci.c
+++ b/hw/piix_pci.c
@@ -365,6 +365,7 @@ static PCIDeviceInfo i440fx_info[] = {
 static SysBusDeviceInfo i440fx_pcihost_info = {
     .init = i440fx_pcihost_initfn,
     .qdev.name    = "i440FX-pcihost",
+    .qdev.fw_name = "pci",
     .qdev.size    = sizeof(I440FXState),
     .qdev.no_user = 1,
 };
diff --git a/hw/qdev.h b/hw/qdev.h
index 579328a..9f90efe 100644
--- a/hw/qdev.h
+++ b/hw/qdev.h
@@ -139,6 +139,7 @@ typedef void (*qdev_resetfn)(DeviceState *dev);
 
 struct DeviceInfo {
     const char *name;
+    const char *fw_name;
     const char *alias;
     const char *desc;
     size_t size;
@@ -288,6 +289,11 @@ void qdev_prop_set_defaults(DeviceState *dev, Property *props);
 void qdev_prop_register_global_list(GlobalProperty *props);
 void qdev_prop_set_globals(DeviceState *dev);
 
+static inline const char *qdev_fw_name(DeviceState *dev)
+{
+    return dev->info->fw_name ? : dev->info->alias ? : dev->info->name;
+}
+
 /* This is a nasty hack to allow passing a NULL bus to qdev_create.  */
 extern struct BusInfo system_bus_info;
 
diff --git a/hw/usb-hub.c b/hw/usb-hub.c
index 2a1edfc..8e3a96b 100644
--- a/hw/usb-hub.c
+++ b/hw/usb-hub.c
@@ -545,6 +545,7 @@ static int usb_hub_initfn(USBDevice *dev)
 static struct USBDeviceInfo hub_info = {
     .product_desc   = "QEMU USB Hub",
     .qdev.name  = "usb-hub",
+    .qdev.fw_name    = "hub",
     .qdev.size  = sizeof(USBHubState),
     .init   = usb_hub_initfn,
     .handle_packet  = usb_hub_handle_packet,
diff --git a/hw/usb-net.c b/hw/usb-net.c
index 70f9263..2287ee1 100644
--- a/hw/usb-net.c
+++ b/hw/usb-net.c
@@ -1496,6 +1496,7 @@ static USBDevice *usb_net_init(const char *cmdline)
 static struct USBDeviceInfo net_info = {
     .product_desc   = "QEMU USB Network Interface",
     .qdev.name  = "usb-net",
+    .qdev.fw_name    = "network",
     .qdev.size  = sizeof(USBNetState),
     .init   = usb_net_initfn,
     .handle_packet  = usb_generic_handle_packet,
diff --git a/hw/virtio-pci.c b/hw/virtio-pci.c
index 729917d..be2c92f 100644
--- a/hw/virtio-pci.c
+++ b/hw/virtio-pci.c
@@ -697,6 +697,7 @@ static int virtio_9p_init_pci(PCIDevice *pci_dev)
 static PCIDeviceInfo virtio_info[] = {
     {
         .qdev.name = "virtio-blk-pci",
+        .qdev.alias = "virtio-blk",
         .qdev.size = sizeof(VirtIOPCIProxy),
         .init  = virtio_blk_init_pci,
 .exit  = 

[PATCHv5 04/15] Add get_fw_dev_path callback to ISA bus in qdev.

2010-11-15 Thread Gleb Natapov
Use device ioports to create unique device path.

Signed-off-by: Gleb Natapov g...@redhat.com
---
 hw/isa-bus.c |   16 ++++++++++++++++
 1 files changed, 16 insertions(+), 0 deletions(-)

diff --git a/hw/isa-bus.c b/hw/isa-bus.c
index c0ac7e9..c423c1b 100644
--- a/hw/isa-bus.c
+++ b/hw/isa-bus.c
@@ -31,11 +31,13 @@ static ISABus *isabus;
 target_phys_addr_t isa_mem_base = 0;
 
 static void isabus_dev_print(Monitor *mon, DeviceState *dev, int indent);
+static char *isabus_get_fw_dev_path(DeviceState *dev);
 
 static struct BusInfo isa_bus_info = {
     .name  = "ISA",
     .size  = sizeof(ISABus),
     .print_dev = isabus_dev_print,
+    .get_fw_dev_path = isabus_get_fw_dev_path,
 };
 
 ISABus *isa_bus_new(DeviceState *dev)
@@ -188,4 +190,18 @@ static void isabus_register_devices(void)
     sysbus_register_withprop(&isabus_bridge_info);
 }
 
+static char *isabus_get_fw_dev_path(DeviceState *dev)
+{
+    ISADevice *d = (ISADevice*)dev;
+    char path[40];
+    int off;
+
+    off = snprintf(path, sizeof(path), "%s", qdev_fw_name(dev));
+    if (d->nioports) {
+        snprintf(path + off, sizeof(path) - off, "@%04x", d->ioports[0]);
+    }
+
+    return strdup(path);
+}
+
 device_init(isabus_register_devices)
-- 
1.7.1



[PATCHv5 00/15] boot order specification

2010-11-15 Thread Gleb Natapov
I am using open firmware naming scheme to specify device path names.
In this version: fixed compilation problem, changed how device list
is passed into firmware.

Names look like this on pci machine:
/p...@i0cf8/i...@1,1/dr...@1/d...@0
/p...@i0cf8/i...@1/f...@03f1/flo...@1
/p...@i0cf8/i...@1/f...@03f1/flo...@0
/p...@i0cf8/i...@1,1/dr...@1/d...@1
/p...@i0cf8/i...@1,1/dr...@0/d...@0
/p...@i0cf8/s...@3/d...@0
/p...@i0cf8/ether...@4/ethernet-...@0
/p...@i0cf8/ether...@5/ethernet-...@0
/p...@i0cf8/i...@1,1/dr...@0/d...@1
/p...@i0cf8/i...@1/i...@01e8/dr...@0/d...@0
/p...@i0cf8/u...@1,2/netw...@0/ether...@0
/p...@i0cf8/u...@1,2/h...@1/netw...@0/ether...@0
/r...@genroms/linuxboot.bin

and on isa machine:
/isa/i...@0170/dr...@0/d...@0
/isa/f...@03f1/flo...@1
/isa/f...@03f1/flo...@0
/isa/i...@0170/dr...@0/d...@1


Instead of using the get_dev_path() callback I introduce another one,
get_fw_dev_path. Unfortunately the way the get_dev_path() callback is used
in the migration code makes it hard to reuse for other purposes. First
of all, it is not called recursively, so the caller expects it to provide
a unique name by itself. A device path, though, is inherently recursive: each
individual element may not be unique, but the whole path will be. On
the other hand, to call get_dev_path() recursively in the migration code we
would first have to implement it for all possible buses. The other problem is
compatibility: if we change the get_dev_path() output format now, we will not
be able to migrate from an old qemu to a new one without some additional
compatibility layer.
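
To illustrate the recursive composition described above, here is a minimal
standalone sketch. It is not the qdev code from this series: struct node and
build_fw_path() are hypothetical stand-ins, and the device names are taken
from the fw_name table in patch 08 purely as sample data.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical stand-in for a device node; not the real DeviceState. */
struct node {
    const struct node *parent;  /* NULL for the root of the tree */
    const char *fw_name;        /* element name, e.g. "pci" or "fdc" */
    const char *unit;           /* unit address, or NULL if there is none */
};

/*
 * Write the path of n into buf and return the number of characters stored
 * (not counting the terminating NUL). Parents are emitted first, so the
 * result reads from root to leaf. Output is truncated if buf is too small.
 */
int build_fw_path(const struct node *n, char *buf, int len)
{
    int off = 0;
    int remaining, ret;

    assert(n != NULL && buf != NULL);
    if (len <= 0) {
        return 0;
    }
    if (n->parent) {
        off = build_fw_path(n->parent, buf, len);
    } else {
        buf[0] = '\0';
    }

    remaining = len - off;
    if (remaining <= 1) {
        return off;             /* no room left for another element */
    }
    if (n->unit) {
        ret = snprintf(buf + off, remaining, "/%s@%s", n->fw_name, n->unit);
    } else {
        ret = snprintf(buf + off, remaining, "/%s", n->fw_name);
    }
    if (ret < 0) {
        return off;
    }
    if (ret >= remaining) {
        return len - 1;         /* element was truncated */
    }
    return off + ret;
}
```

No single element has to be unique here; only the concatenation from the
root down identifies the device.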

Gleb Natapov (15):
  Introduce fw_name field to DeviceInfo structure.
  Introduce new BusInfo callback get_fw_dev_path.
  Keep track of ISA ports ISA device is using in qdev.
  Add get_fw_dev_path callback to ISA bus in qdev.
  Store IDE bus id in IDEBus structure for easy access.
  Add get_fw_dev_path callback to IDE bus.
  Add get_dev_path callback for system bus.
  Add get_fw_dev_path callback for pci bus.
  Record which USBDevice USBPort belongs too.
  Add get_dev_path callback for usb bus.
  Add bootindex parameter to net/block/fd device
  Change fw_cfg_add_file() to get full file path as a parameter.
  Add bootindex for option roms.
  Add notifier that will be called when machine is fully created.
  Pass boot device list to firmware.

 block_int.h   |4 +-
 hw/cs4231a.c  |1 +
 hw/e1000.c|4 ++
 hw/eepro100.c |3 +
 hw/fdc.c  |   12 ++
 hw/fw_cfg.c   |   31 +--
 hw/fw_cfg.h   |9 +++-
 hw/gus.c  |4 ++
 hw/ide/cmd646.c   |4 +-
 hw/ide/internal.h |3 +-
 hw/ide/isa.c  |5 ++-
 hw/ide/piix.c |4 +-
 hw/ide/qdev.c |   22 ++-
 hw/ide/via.c  |4 +-
 hw/isa-bus.c  |   42 +++
 hw/isa.h  |4 ++
 hw/lance.c|1 +
 hw/loader.c   |   32 +++---
 hw/loader.h   |8 ++--
 hw/m48t59.c   |1 +
 hw/mc146818rtc.c  |1 +
 hw/multiboot.c|3 +-
 hw/ne2000-isa.c   |3 +
 hw/ne2000.c   |5 ++-
 hw/nseries.c  |4 +-
 hw/palm.c |6 +-
 hw/parallel.c |5 ++
 hw/pc.c   |7 ++-
 hw/pci.c  |  110 +++---
 hw/pci_host.c |2 +
 hw/pckbd.c|3 +
 hw/pcnet.c|6 ++-
 hw/piix_pci.c |1 +
 hw/qdev.c |   32 +++
 hw/qdev.h |9 
 hw/rtl8139.c  |4 ++
 hw/sb16.c |4 ++
 hw/serial.c   |1 +
 hw/sysbus.c   |   30 ++
 hw/sysbus.h   |4 ++
 hw/usb-bus.c  |   45 -
 hw/usb-hub.c  |3 +-
 hw/usb-musb.c |2 +-
 hw/usb-net.c  |3 +
 hw/usb-ohci.c |2 +-
 hw/usb-uhci.c |2 +-
 hw/usb.h  |3 +-
 hw/virtio-blk.c   |2 +
 hw/virtio-net.c   |2 +
 hw/virtio-pci.c   |1 +
 net.h |4 +-
 qemu-config.c |   17 
 sysemu.h  |   11 +-
 vl.c  |  115 -
 54 files changed, 569 insertions(+), 81 deletions(-)



[PATCHv5 08/15] Add get_fw_dev_path callback for pci bus.

2010-11-15 Thread Gleb Natapov

Signed-off-by: Gleb Natapov g...@redhat.com
---
 hw/pci.c |  108 -
 1 files changed, 85 insertions(+), 23 deletions(-)

diff --git a/hw/pci.c b/hw/pci.c
index 962886e..114b435 100644
--- a/hw/pci.c
+++ b/hw/pci.c
@@ -43,12 +43,14 @@
 
 static void pcibus_dev_print(Monitor *mon, DeviceState *dev, int indent);
 static char *pcibus_get_dev_path(DeviceState *dev);
+static char *pcibus_get_fw_dev_path(DeviceState *dev);
 
 struct BusInfo pci_bus_info = {
 .name   = PCI,
 .size   = sizeof(PCIBus),
 .print_dev  = pcibus_dev_print,
 .get_dev_path = pcibus_get_dev_path,
+.get_fw_dev_path = pcibus_get_fw_dev_path,
 .props  = (Property[]) {
 DEFINE_PROP_PCI_DEVFN(addr, PCIDevice, devfn, -1),
 DEFINE_PROP_STRING(romfile, PCIDevice, romfile),
@@ -1061,45 +1063,63 @@ void pci_msi_notify(PCIDevice *dev, unsigned int vector)
 typedef struct {
 uint16_t class;
 const char *desc;
+const char *fw_name;
+uint16_t fw_ign_bits;
 } pci_class_desc;
 
 static const pci_class_desc pci_class_descriptions[] =
 {
-    { 0x0100, "SCSI controller"},
-    { 0x0101, "IDE controller"},
-    { 0x0102, "Floppy controller"},
-    { 0x0103, "IPI controller"},
-    { 0x0104, "RAID controller"},
+    { 0x0001, "VGA controller", "display"},
+    { 0x0100, "SCSI controller", "scsi"},
+    { 0x0101, "IDE controller", "ide"},
+    { 0x0102, "Floppy controller", "fdc"},
+    { 0x0103, "IPI controller", "ipi"},
+    { 0x0104, "RAID controller", "raid"},
     { 0x0106, "SATA controller"},
     { 0x0107, "SAS controller"},
     { 0x0180, "Storage controller"},
-    { 0x0200, "Ethernet controller"},
-    { 0x0201, "Token Ring controller"},
-    { 0x0202, "FDDI controller"},
-    { 0x0203, "ATM controller"},
+    { 0x0200, "Ethernet controller", "ethernet"},
+    { 0x0201, "Token Ring controller", "token-ring"},
+    { 0x0202, "FDDI controller", "fddi"},
+    { 0x0203, "ATM controller", "atm"},
     { 0x0280, "Network controller"},
-    { 0x0300, "VGA controller"},
+    { 0x0300, "VGA controller", "display", 0x00ff},
     { 0x0301, "XGA controller"},
     { 0x0302, "3D controller"},
     { 0x0380, "Display controller"},
-    { 0x0400, "Video controller"},
-    { 0x0401, "Audio controller"},
+    { 0x0400, "Video controller", "video"},
+    { 0x0401, "Audio controller", "sound"},
     { 0x0402, "Phone"},
     { 0x0480, "Multimedia controller"},
-    { 0x0500, "RAM controller"},
-    { 0x0501, "Flash controller"},
+    { 0x0500, "RAM controller", "memory"},
+    { 0x0501, "Flash controller", "flash"},
     { 0x0580, "Memory controller"},
-    { 0x0600, "Host bridge"},
-    { 0x0601, "ISA bridge"},
-    { 0x0602, "EISA bridge"},
-    { 0x0603, "MC bridge"},
-    { 0x0604, "PCI bridge"},
-    { 0x0605, "PCMCIA bridge"},
-    { 0x0606, "NUBUS bridge"},
-    { 0x0607, "CARDBUS bridge"},
+    { 0x0600, "Host bridge", "host"},
+    { 0x0601, "ISA bridge", "isa"},
+    { 0x0602, "EISA bridge", "eisa"},
+    { 0x0603, "MC bridge", "mca"},
+    { 0x0604, "PCI bridge", "pci"},
+    { 0x0605, "PCMCIA bridge", "pcmcia"},
+    { 0x0606, "NUBUS bridge", "nubus"},
+    { 0x0607, "CARDBUS bridge", "cardbus"},
     { 0x0608, "RACEWAY bridge"},
     { 0x0680, "Bridge"},
-    { 0x0c03, "USB controller"},
+    { 0x0700, "Serial port", "serial"},
+    { 0x0701, "Parallel port", "parallel"},
+    { 0x0800, "Interrupt controller", "interrupt-controller"},
+    { 0x0801, "DMA controller", "dma-controller"},
+    { 0x0802, "Timer", "timer"},
+    { 0x0803, "RTC", "rtc"},
+    { 0x0900, "Keyboard", "keyboard"},
+    { 0x0901, "Pen", "pen"},
+    { 0x0902, "Mouse", "mouse"},
+    { 0x0A00, "Dock station", "dock", 0x00ff},
+    { 0x0B00, "i386 cpu", "cpu", 0x00ff},
+    { 0x0c00, "Firewire controller", "firewire"},
+    { 0x0c01, "Access bus controller", "access-bus"},
+    { 0x0c02, "SSA controller", "ssa"},
+    { 0x0c03, "USB controller", "usb"},
+    { 0x0c04, "Fibre channel controller", "fibre-channel"},
     { 0, NULL}
 };
 
@@ -1825,6 +1845,48 @@ static void pcibus_dev_print(Monitor *mon, DeviceState 
*dev, int indent)
 }
 }
 
+static char *pci_dev_fw_name(DeviceState *dev, char *buf, int len)
+{
+    PCIDevice *d = (PCIDevice *)dev;
+    const char *name = NULL;
+    const pci_class_desc *desc = pci_class_descriptions;
+    int class = pci_get_word(d->config + PCI_CLASS_DEVICE);
+
+    while (desc->desc &&
+          (class & ~desc->fw_ign_bits) !=
+          (desc->class & ~desc->fw_ign_bits)) {
+        desc++;
+    }
+
+    if (desc->desc) {
+        name = desc->fw_name;
+    }
+
+    if (name) {
+        pstrcpy(buf, len, name);
+    } else {
+        snprintf(buf, len, "pci%04x,%04x",
+                 pci_get_word(d->config + PCI_VENDOR_ID),
+                 pci_get_word(d->config + PCI_DEVICE_ID));
+    }
+
+    return buf;
+}
+
+static char *pcibus_get_fw_dev_path(DeviceState *dev)
+{
+    PCIDevice *d = (PCIDevice *)dev;
+    char path[50], name[33];
+    int off;
+
+    off = snprintf(path, sizeof(path), "%s@%x",
+                   pci_dev_fw_name(dev, name, sizeof name),
+                   PCI_SLOT(d->devfn));
+    if (PCI_FUNC(d->devfn))
+        snprintf(path + off, 

[PATCHv5 07/15] Add get_dev_path callback for system bus.

2010-11-15 Thread Gleb Natapov
Print out the MMIO or PIO address used to access the child device.

Signed-off-by: Gleb Natapov g...@redhat.com
---
 hw/pci_host.c |2 ++
 hw/sysbus.c   |   30 ++
 hw/sysbus.h   |4 
 3 files changed, 36 insertions(+), 0 deletions(-)

diff --git a/hw/pci_host.c b/hw/pci_host.c
index bc5b771..28d45bf 100644
--- a/hw/pci_host.c
+++ b/hw/pci_host.c
@@ -197,6 +197,7 @@ void pci_host_conf_register_ioport(pio_addr_t ioport, 
PCIHostState *s)
 {
 pci_host_init(s);
 register_ioport_simple(s-conf_noswap_handler, ioport, 4, 4);
+sysbus_init_ioports(s-busdev, ioport, 4);
 }
 
 int pci_host_data_register_mmio(PCIHostState *s, int swap)
@@ -215,4 +216,5 @@ void pci_host_data_register_ioport(pio_addr_t ioport, 
PCIHostState *s)
 register_ioport_simple(s-data_noswap_handler, ioport, 4, 1);
 register_ioport_simple(s-data_noswap_handler, ioport, 4, 2);
 register_ioport_simple(s-data_noswap_handler, ioport, 4, 4);
+sysbus_init_ioports(s-busdev, ioport, 4);
 }
diff --git a/hw/sysbus.c b/hw/sysbus.c
index d817721..1583bd8 100644
--- a/hw/sysbus.c
+++ b/hw/sysbus.c
@@ -22,11 +22,13 @@
 #include monitor.h
 
 static void sysbus_dev_print(Monitor *mon, DeviceState *dev, int indent);
+static char *sysbus_get_fw_dev_path(DeviceState *dev);
 
 struct BusInfo system_bus_info = {
 .name   = System,
 .size   = sizeof(BusState),
 .print_dev  = sysbus_dev_print,
+.get_fw_dev_path = sysbus_get_fw_dev_path,
 };
 
 void sysbus_connect_irq(SysBusDevice *dev, int n, qemu_irq irq)
@@ -106,6 +108,16 @@ void sysbus_init_mmio_cb(SysBusDevice *dev, 
target_phys_addr_t size,
 dev-mmio[n].cb = cb;
 }
 
+void sysbus_init_ioports(SysBusDevice *dev, pio_addr_t ioport, pio_addr_t size)
+{
+    pio_addr_t i;
+
+    for (i = 0; i < size; i++) {
+        assert(dev->num_pio < QDEV_MAX_PIO);
+        dev->pio[dev->num_pio++] = ioport++;
+    }
+}
+
 static int sysbus_device_init(DeviceState *dev, DeviceInfo *base)
 {
 SysBusDeviceInfo *info = container_of(base, SysBusDeviceInfo, qdev);
@@ -171,3 +183,21 @@ static void sysbus_dev_print(Monitor *mon, DeviceState 
*dev, int indent)
indent, , s-mmio[i].addr, s-mmio[i].size);
 }
 }
+
+static char *sysbus_get_fw_dev_path(DeviceState *dev)
+{
+    SysBusDevice *s = sysbus_from_qdev(dev);
+    char path[40];
+    int off;
+
+    off = snprintf(path, sizeof(path), "%s", qdev_fw_name(dev));
+
+    if (s->num_mmio) {
+        snprintf(path + off, sizeof(path) - off, "@"TARGET_FMT_plx,
+                 s->mmio[0].addr);
+    } else if (s->num_pio) {
+        snprintf(path + off, sizeof(path) - off, "@i%04x", s->pio[0]);
+    }
+
+    return strdup(path);
+}
diff --git a/hw/sysbus.h b/hw/sysbus.h
index 5980901..e9eb618 100644
--- a/hw/sysbus.h
+++ b/hw/sysbus.h
@@ -6,6 +6,7 @@
 #include qdev.h
 
 #define QDEV_MAX_MMIO 32
+#define QDEV_MAX_PIO 32
 #define QDEV_MAX_IRQ 256
 
 typedef struct SysBusDevice SysBusDevice;
@@ -23,6 +24,8 @@ struct SysBusDevice {
 mmio_mapfunc cb;
 ram_addr_t iofunc;
 } mmio[QDEV_MAX_MMIO];
+int num_pio;
+pio_addr_t pio[QDEV_MAX_PIO];
 };
 
 typedef int (*sysbus_initfn)(SysBusDevice *dev);
@@ -45,6 +48,7 @@ void sysbus_init_mmio_cb(SysBusDevice *dev, 
target_phys_addr_t size,
 mmio_mapfunc cb);
 void sysbus_init_irq(SysBusDevice *dev, qemu_irq *p);
 void sysbus_pass_irq(SysBusDevice *dev, SysBusDevice *target);
+void sysbus_init_ioports(SysBusDevice *dev, pio_addr_t ioport, pio_addr_t 
size);
 
 
 void sysbus_connect_irq(SysBusDevice *dev, int n, qemu_irq irq);
-- 
1.7.1



[PATCHv5 15/15] Pass boot device list to firmware.

2010-11-15 Thread Gleb Natapov

Signed-off-by: Gleb Natapov g...@redhat.com
---
 hw/fw_cfg.c |   15 +++
 hw/fw_cfg.h |5 -
 sysemu.h|1 +
 vl.c|   49 +
 4 files changed, 69 insertions(+), 1 deletions(-)

diff --git a/hw/fw_cfg.c b/hw/fw_cfg.c
index 7b9434f..4eea338 100644
--- a/hw/fw_cfg.c
+++ b/hw/fw_cfg.c
@@ -53,6 +53,7 @@ struct FWCfgState {
 FWCfgFiles *files;
 uint16_t cur_entry;
 uint32_t cur_offset;
+Notifier machine_ready;
 };
 
 static void fw_cfg_write(FWCfgState *s, uint8_t value)
@@ -315,6 +316,16 @@ int fw_cfg_add_file(FWCfgState *s,  const char *filename, 
uint8_t *data,
 return 1;
 }
 
+static void fw_cfg_machine_ready(struct Notifier* n)
+{
+    uint32_t len;
+    FWCfgState *s = container_of(n, FWCfgState, machine_ready);
+    char *bootindex = get_boot_devices_list(&len);
+
+    fw_cfg_add_i32(s, FW_CFG_BOOTINDEX_LEN, len);
+    fw_cfg_add_bytes(s, FW_CFG_BOOTINDEX_DATA, (uint8_t*)bootindex, len);
+}
+
 FWCfgState *fw_cfg_init(uint32_t ctl_port, uint32_t data_port,
 target_phys_addr_t ctl_addr, target_phys_addr_t 
data_addr)
 {
@@ -343,6 +354,10 @@ FWCfgState *fw_cfg_init(uint32_t ctl_port, uint32_t 
data_port,
 fw_cfg_add_i16(s, FW_CFG_MAX_CPUS, (uint16_t)max_cpus);
 fw_cfg_add_i16(s, FW_CFG_BOOT_MENU, (uint16_t)boot_menu);
 
+
+    s->machine_ready.notify = fw_cfg_machine_ready;
+    qemu_add_machine_init_done_notifier(&s->machine_ready);
+
 return s;
 }
 
diff --git a/hw/fw_cfg.h b/hw/fw_cfg.h
index 856bf91..b951f6b 100644
--- a/hw/fw_cfg.h
+++ b/hw/fw_cfg.h
@@ -30,7 +30,10 @@
 
 #define FW_CFG_FILE_FIRST   0x20
 #define FW_CFG_FILE_SLOTS   0x10
-#define FW_CFG_MAX_ENTRY        (FW_CFG_FILE_FIRST+FW_CFG_FILE_SLOTS)
+#define FW_CFG_FILE_LAST_SLOT   (FW_CFG_FILE_FIRST+FW_CFG_FILE_SLOTS)
+#define FW_CFG_BOOTINDEX_LEN    (FW_CFG_FILE_LAST_SLOT + 1)
+#define FW_CFG_BOOTINDEX_DATA   (FW_CFG_FILE_LAST_SLOT + 2)
+#define FW_CFG_MAX_ENTRY        (FW_CFG_BOOTINDEX_DATA + 1)
 
 #define FW_CFG_WRITE_CHANNEL0x4000
 #define FW_CFG_ARCH_LOCAL   0x8000
diff --git a/sysemu.h b/sysemu.h
index c42f33a..38a20a3 100644
--- a/sysemu.h
+++ b/sysemu.h
@@ -196,4 +196,5 @@ void register_devices(void);
 
 void add_boot_device_path(int32_t bootindex, DeviceState *dev,
   const char *suffix);
+char *get_boot_devices_list(uint32_t *size);
 #endif
diff --git a/vl.c b/vl.c
index 918d988..ab36f9f 100644
--- a/vl.c
+++ b/vl.c
@@ -735,6 +735,55 @@ void add_boot_device_path(int32_t bootindex, DeviceState 
*dev,
 QTAILQ_INSERT_TAIL(fw_boot_order, node, link);
 }
 
+/*
+ * This function returns the device list as an array in the format below:
+ * +-----------+-------------+-------
+ * | devpath1  |  devpath2   | ...
+ * +-----------+-------------+-------
+ * where:
+ *   devpath - null-terminated string representing one device path
+ *
+ * The memory pointed to by size is assigned the total length of the array
+ * in bytes.
+ */
+char *get_boot_devices_list(uint32_t *size)
+{
+    FWBootEntry *i;
+    uint32_t total = 0;
+    char *list = NULL;
+
+    QTAILQ_FOREACH(i, &fw_boot_order, link) {
+        char *devpath = NULL, *bootpath;
+        int len;
+
+        if (i->dev) {
+            devpath = qdev_get_fw_dev_path(i->dev);
+            assert(devpath);
+        }
+
+        if (i->suffix && devpath) {
+            bootpath = qemu_malloc(strlen(devpath) + strlen(i->suffix) + 2);
+            sprintf(bootpath, "%s/%s", devpath, i->suffix);
+            qemu_free(devpath);
+        } else if (devpath) {
+            bootpath = devpath;
+        } else {
+            bootpath = strdup(i->suffix);
+            assert(bootpath);
+        }
+
+        len = strlen(bootpath) + 1;
+        list = qemu_realloc(list, total + len);
+        memcpy(&list[total], bootpath, len);
+        total += len;
+        qemu_free(bootpath);
+    }
+
+    *size = total;
+
+    return list;
+}
+
 static void numa_add(const char *optarg)
 {
 char option[128];
-- 
1.7.1
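
For readers on the consuming side, the list format documented in the comment
above (NUL-terminated paths packed back to back, plus a total byte length)
can be walked as in the sketch below. This is only an illustration under that
assumption: boot_list_entry() is a hypothetical helper, not part of this
series or of any firmware, and the sample paths are made up.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/*
 * Return a pointer to the index-th NUL-terminated path inside a list of
 * `size` bytes, or NULL if the list has fewer entries or its tail is not
 * terminated. The returned pointer aliases the list; nothing is copied.
 */
const char *boot_list_entry(const char *list, uint32_t size, unsigned index)
{
    uint32_t pos = 0;

    assert(list != NULL || size == 0);

    while (pos < size) {
        /* Find the terminator of the entry starting at pos. */
        const char *end = memchr(list + pos, '\0', size - pos);

        if (!end) {
            return NULL;        /* unterminated tail: treat as malformed */
        }
        if (index == 0) {
            return list + pos;
        }
        index--;
        pos = (uint32_t)(end - list) + 1;
    }
    return NULL;
}
```

Entries come back in the order they were appended, which in this series is
the order of the sorted fw_boot_order queue.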



[PATCHv5 11/15] Add bootindex parameter to net/block/fd device

2010-11-15 Thread Gleb Natapov
If bootindex is specified on the command line, a string that describes the
device in a firmware-readable way is added to a sorted list. Later this list
will be passed to the firmware to control boot order.

Signed-off-by: Gleb Natapov g...@redhat.com
---
 block_int.h |4 +++-
 hw/e1000.c  |4 
 hw/eepro100.c   |3 +++
 hw/fdc.c|8 
 hw/ide/qdev.c   |5 +
 hw/ne2000.c |3 +++
 hw/pcnet.c  |4 
 hw/qdev.c   |   32 
 hw/qdev.h   |1 +
 hw/rtl8139.c|4 
 hw/usb-net.c|2 ++
 hw/virtio-blk.c |2 ++
 hw/virtio-net.c |2 ++
 net.h   |4 +++-
 sysemu.h|2 ++
 vl.c|   40 
 16 files changed, 118 insertions(+), 2 deletions(-)

diff --git a/block_int.h b/block_int.h
index 3c3adb5..0a0e47d 100644
--- a/block_int.h
+++ b/block_int.h
@@ -227,6 +227,7 @@ typedef struct BlockConf {
 uint16_t logical_block_size;
 uint16_t min_io_size;
 uint32_t opt_io_size;
+int32_t bootindex;
 } BlockConf;
 
 static inline unsigned int get_physical_block_exp(BlockConf *conf)
@@ -249,6 +250,7 @@ static inline unsigned int get_physical_block_exp(BlockConf 
*conf)
 DEFINE_PROP_UINT16(physical_block_size, _state,   \
_conf.physical_block_size, 512), \
 DEFINE_PROP_UINT16(min_io_size, _state, _conf.min_io_size, 0),  \
-DEFINE_PROP_UINT32(opt_io_size, _state, _conf.opt_io_size, 0)
+DEFINE_PROP_UINT32(opt_io_size, _state, _conf.opt_io_size, 0),\
+DEFINE_PROP_INT32(bootindex, _state, _conf.bootindex, -1) \
 
 #endif /* BLOCK_INT_H */
diff --git a/hw/e1000.c b/hw/e1000.c
index 532efdc..053f33e 100644
--- a/hw/e1000.c
+++ b/hw/e1000.c
@@ -30,6 +30,7 @@
 #include net.h
 #include net/checksum.h
 #include loader.h
+#include sysemu.h
 
 #include e1000_hw.h
 
@@ -1148,6 +1149,9 @@ static int pci_e1000_init(PCIDevice *pci_dev)
   d-dev.qdev.info-name, d-dev.qdev.id, d);
 
 qemu_format_nic_info_str(d-nic-nc, macaddr);
+
+add_boot_device_path(d-conf.bootindex, pci_dev-qdev, ethernet-...@0);
+
 return 0;
 }
 
diff --git a/hw/eepro100.c b/hw/eepro100.c
index 41d792a..80adac6 100644
--- a/hw/eepro100.c
+++ b/hw/eepro100.c
@@ -46,6 +46,7 @@
 #include pci.h
 #include net.h
 #include eeprom93xx.h
+#include sysemu.h
 
 #define KiB 1024
 
@@ -1907,6 +1908,8 @@ static int e100_nic_init(PCIDevice *pci_dev)
 s-vmstate-name = s-nic-nc.model;
 vmstate_register(pci_dev-qdev, -1, s-vmstate, s);
 
+add_boot_device_path(s-conf.bootindex, pci_dev-qdev, ethernet-...@0);
+
 return 0;
 }
 
diff --git a/hw/fdc.c b/hw/fdc.c
index 5ab754b..7b1349f 100644
--- a/hw/fdc.c
+++ b/hw/fdc.c
@@ -35,6 +35,7 @@
 #include sysbus.h
 #include qdev-addr.h
 #include blockdev.h
+#include sysemu.h
 
 //
 /* debug Floppy devices */
@@ -523,6 +524,8 @@ typedef struct FDCtrlSysBus {
 typedef struct FDCtrlISABus {
 ISADevice busdev;
 struct FDCtrl state;
+int32_t bootindexA;
+int32_t bootindexB;
 } FDCtrlISABus;
 
 static uint32_t fdctrl_read (void *opaque, uint32_t reg)
@@ -1992,6 +1995,9 @@ static int isabus_fdc_init1(ISADevice *dev)
 qdev_set_legacy_instance_id(dev-qdev, iobase, 2);
 ret = fdctrl_init_common(fdctrl);
 
+add_boot_device_path(isa-bootindexA, dev-qdev, flo...@0);
+add_boot_device_path(isa-bootindexB, dev-qdev, flo...@1);
+
 return ret;
 }
 
@@ -2051,6 +2057,8 @@ static ISADeviceInfo isa_fdc_info = {
 .qdev.props = (Property[]) {
 DEFINE_PROP_DRIVE(driveA, FDCtrlISABus, state.drives[0].bs),
 DEFINE_PROP_DRIVE(driveB, FDCtrlISABus, state.drives[1].bs),
+DEFINE_PROP_INT32(bootindexA, FDCtrlISABus, bootindexA, -1),
+DEFINE_PROP_INT32(bootindexB, FDCtrlISABus, bootindexB, -1),
 DEFINE_PROP_END_OF_LIST(),
 },
 };
diff --git a/hw/ide/qdev.c b/hw/ide/qdev.c
index 01a181b..69a00e2 100644
--- a/hw/ide/qdev.c
+++ b/hw/ide/qdev.c
@@ -21,6 +21,7 @@
 #include qemu-error.h
 #include hw/ide/internal.h
 #include blockdev.h
+#include sysemu.h
 
 /* - */
 
@@ -143,6 +144,10 @@ static int ide_drive_initfn(IDEDevice *dev)
 if (!dev-serial) {
 dev-serial = qemu_strdup(s-drive_serial_str);
 }
+
+add_boot_device_path(dev-conf.bootindex, dev-qdev,
+ dev-unit ? d...@1 : d...@0);
+
 return 0;
 }
 
diff --git a/hw/ne2000.c b/hw/ne2000.c
index 126e7cf..f4bbac2 100644
--- a/hw/ne2000.c
+++ b/hw/ne2000.c
@@ -26,6 +26,7 @@
 #include net.h
 #include ne2000.h
 #include loader.h
+#include sysemu.h
 
 /* debug NE2000 card */
 //#define DEBUG_NE2000
@@ -746,6 +747,8 @@ static int pci_ne2000_init(PCIDevice *pci_dev)
 }
 }
 
+add_boot_device_path(s-c.bootindex, pci_dev-qdev, ethernet-...@0);
+
 return 0;
 }
 
diff --git a/hw/pcnet.c b/hw/pcnet.c

[PATCHv5 09/15] Record which USBDevice USBPort belongs too.

2010-11-15 Thread Gleb Natapov
Ports on the root hub will have NULL here. This is needed to reconstruct the
path from a device to its root hub when building the device path.

Signed-off-by: Gleb Natapov g...@redhat.com
---
 hw/usb-bus.c  |3 ++-
 hw/usb-hub.c  |2 +-
 hw/usb-musb.c |2 +-
 hw/usb-ohci.c |2 +-
 hw/usb-uhci.c |2 +-
 hw/usb.h  |3 ++-
 6 files changed, 8 insertions(+), 6 deletions(-)

diff --git a/hw/usb-bus.c b/hw/usb-bus.c
index b692503..256b881 100644
--- a/hw/usb-bus.c
+++ b/hw/usb-bus.c
@@ -110,11 +110,12 @@ USBDevice *usb_create_simple(USBBus *bus, const char 
*name)
 }
 
 void usb_register_port(USBBus *bus, USBPort *port, void *opaque, int index,
-   usb_attachfn attach)
+   USBDevice *pdev, usb_attachfn attach)
 {
 port-opaque = opaque;
 port-index = index;
 port-attach = attach;
+port-pdev = pdev;
 QTAILQ_INSERT_TAIL(bus-free, port, next);
 bus-nfree++;
 }
diff --git a/hw/usb-hub.c b/hw/usb-hub.c
index 8e3a96b..8a3f829 100644
--- a/hw/usb-hub.c
+++ b/hw/usb-hub.c
@@ -535,7 +535,7 @@ static int usb_hub_initfn(USBDevice *dev)
 for (i = 0; i  s-nb_ports; i++) {
 port = s-ports[i];
 usb_register_port(usb_bus_from_device(dev),
-  port-port, s, i, usb_hub_attach);
+  port-port, s, i, s-dev, usb_hub_attach);
 port-wPortStatus = PORT_STAT_POWER;
 port-wPortChange = 0;
 }
diff --git a/hw/usb-musb.c b/hw/usb-musb.c
index 7f15842..9efe7a6 100644
--- a/hw/usb-musb.c
+++ b/hw/usb-musb.c
@@ -343,7 +343,7 @@ struct MUSBState {
 }
 
 usb_bus_new(s-bus, NULL /* FIXME */);
-usb_register_port(s-bus, s-port, s, 0, musb_attach);
+usb_register_port(s-bus, s-port, s, 0, NULL, musb_attach);
 
 return s;
 }
diff --git a/hw/usb-ohci.c b/hw/usb-ohci.c
index c60fd8d..59604cf 100644
--- a/hw/usb-ohci.c
+++ b/hw/usb-ohci.c
@@ -1705,7 +1705,7 @@ static void usb_ohci_init(OHCIState *ohci, DeviceState 
*dev,
 usb_bus_new(ohci-bus, dev);
 ohci-num_ports = num_ports;
 for (i = 0; i  num_ports; i++) {
-usb_register_port(ohci-bus, ohci-rhport[i].port, ohci, i, 
ohci_attach);
+usb_register_port(ohci-bus, ohci-rhport[i].port, ohci, i, NULL, 
ohci_attach);
 }
 
 ohci-async_td = 0;
diff --git a/hw/usb-uhci.c b/hw/usb-uhci.c
index 1d83400..b9b822f 100644
--- a/hw/usb-uhci.c
+++ b/hw/usb-uhci.c
@@ -1115,7 +1115,7 @@ static int usb_uhci_common_initfn(UHCIState *s)
 
 usb_bus_new(s-bus, s-dev.qdev);
 for(i = 0; i  NB_PORTS; i++) {
-usb_register_port(s-bus, s-ports[i].port, s, i, uhci_attach);
+usb_register_port(s-bus, s-ports[i].port, s, i, NULL, uhci_attach);
 }
 s-frame_timer = qemu_new_timer(vm_clock, uhci_frame_timer, s);
 s-expire_time = qemu_get_clock(vm_clock) +
diff --git a/hw/usb.h b/hw/usb.h
index 00d2802..0b32d77 100644
--- a/hw/usb.h
+++ b/hw/usb.h
@@ -203,6 +203,7 @@ struct USBPort {
 USBDevice *dev;
 usb_attachfn attach;
 void *opaque;
+USBDevice *pdev;
 int index; /* internal port index, may be used with the opaque */
 QTAILQ_ENTRY(USBPort) next;
 };
@@ -312,7 +313,7 @@ USBDevice *usb_create(USBBus *bus, const char *name);
 USBDevice *usb_create_simple(USBBus *bus, const char *name);
 USBDevice *usbdevice_create(const char *cmdline);
 void usb_register_port(USBBus *bus, USBPort *port, void *opaque, int index,
-   usb_attachfn attach);
+   USBDevice *pdev, usb_attachfn attach);
 void usb_unregister_port(USBBus *bus, USBPort *port);
 int usb_device_attach(USBDevice *dev);
 int usb_device_detach(USBDevice *dev);
-- 
1.7.1



[PATCHv5 05/15] Store IDE bus id in IDEBus structure for easy access.

2010-11-15 Thread Gleb Natapov

Signed-off-by: Gleb Natapov g...@redhat.com
---
 hw/ide/cmd646.c   |4 ++--
 hw/ide/internal.h |3 ++-
 hw/ide/isa.c  |2 +-
 hw/ide/piix.c |4 ++--
 hw/ide/qdev.c |3 ++-
 hw/ide/via.c  |4 ++--
 6 files changed, 11 insertions(+), 9 deletions(-)

diff --git a/hw/ide/cmd646.c b/hw/ide/cmd646.c
index ff80dd5..b2cbdbc 100644
--- a/hw/ide/cmd646.c
+++ b/hw/ide/cmd646.c
@@ -257,8 +257,8 @@ static int pci_cmd646_ide_initfn(PCIDevice *dev)
 pci_conf[PCI_INTERRUPT_PIN] = 0x01; // interrupt on pin 1
 
 irq = qemu_allocate_irqs(cmd646_set_irq, d, 2);
-ide_bus_new(d-bus[0], d-dev.qdev);
-ide_bus_new(d-bus[1], d-dev.qdev);
+ide_bus_new(d-bus[0], d-dev.qdev, 0);
+ide_bus_new(d-bus[1], d-dev.qdev, 1);
 ide_init2(d-bus[0], irq[0]);
 ide_init2(d-bus[1], irq[1]);
 
diff --git a/hw/ide/internal.h b/hw/ide/internal.h
index d652e06..c0a1abc 100644
--- a/hw/ide/internal.h
+++ b/hw/ide/internal.h
@@ -448,6 +448,7 @@ struct IDEBus {
 IDEDevice *slave;
 BMDMAState *bmdma;
 IDEState ifs[2];
+int bus_id;
 uint8_t unit;
 uint8_t cmd;
 qemu_irq irq;
@@ -565,7 +566,7 @@ void ide_init2_with_non_qdev_drives(IDEBus *bus, DriveInfo 
*hd0,
 void ide_init_ioport(IDEBus *bus, int iobase, int iobase2);
 
 /* hw/ide/qdev.c */
-void ide_bus_new(IDEBus *idebus, DeviceState *dev);
+void ide_bus_new(IDEBus *idebus, DeviceState *dev, int bus_id);
 IDEDevice *ide_create_drive(IDEBus *bus, int unit, DriveInfo *drive);
 
 #endif /* HW_IDE_INTERNAL_H */
diff --git a/hw/ide/isa.c b/hw/ide/isa.c
index 4206afd..8c59c5a 100644
--- a/hw/ide/isa.c
+++ b/hw/ide/isa.c
@@ -67,7 +67,7 @@ static int isa_ide_initfn(ISADevice *dev)
 {
 ISAIDEState *s = DO_UPCAST(ISAIDEState, dev, dev);
 
-ide_bus_new(s-bus, s-dev.qdev);
+ide_bus_new(s-bus, s-dev.qdev, 0);
 ide_init_ioport(s-bus, s-iobase, s-iobase2);
 isa_init_irq(dev, s-irq, s-isairq);
 isa_init_ioport_range(dev, s-iobase, 8);
diff --git a/hw/ide/piix.c b/hw/ide/piix.c
index 07483e8..d0b04a3 100644
--- a/hw/ide/piix.c
+++ b/hw/ide/piix.c
@@ -129,8 +129,8 @@ static int pci_piix_ide_initfn(PCIIDEState *d)
 
 vmstate_register(d-dev.qdev, 0, vmstate_ide_pci, d);
 
-ide_bus_new(d-bus[0], d-dev.qdev);
-ide_bus_new(d-bus[1], d-dev.qdev);
+ide_bus_new(d-bus[0], d-dev.qdev, 0);
+ide_bus_new(d-bus[1], d-dev.qdev, 1);
 ide_init_ioport(d-bus[0], 0x1f0, 0x3f6);
 ide_init_ioport(d-bus[1], 0x170, 0x376);
 
diff --git a/hw/ide/qdev.c b/hw/ide/qdev.c
index 6d27b60..88ff657 100644
--- a/hw/ide/qdev.c
+++ b/hw/ide/qdev.c
@@ -29,9 +29,10 @@ static struct BusInfo ide_bus_info = {
 .size  = sizeof(IDEBus),
 };
 
-void ide_bus_new(IDEBus *idebus, DeviceState *dev)
+void ide_bus_new(IDEBus *idebus, DeviceState *dev, int bus_id)
 {
 qbus_create_inplace(idebus-qbus, ide_bus_info, dev, NULL);
+idebus-bus_id = bus_id;
 }
 
 static int ide_qdev_init(DeviceState *qdev, DeviceInfo *base)
diff --git a/hw/ide/via.c b/hw/ide/via.c
index b2c7cad..cc48b2b 100644
--- a/hw/ide/via.c
+++ b/hw/ide/via.c
@@ -158,8 +158,8 @@ static int vt82c686b_ide_initfn(PCIDevice *dev)
 
 vmstate_register(dev-qdev, 0, vmstate_ide_pci, d);
 
-ide_bus_new(d-bus[0], d-dev.qdev);
-ide_bus_new(d-bus[1], d-dev.qdev);
+ide_bus_new(d-bus[0], d-dev.qdev, 0);
+ide_bus_new(d-bus[1], d-dev.qdev, 1);
 ide_init2(d-bus[0], isa_reserve_irq(14));
 ide_init2(d-bus[1], isa_reserve_irq(15));
 ide_init_ioport(d-bus[0], 0x1f0, 0x3f6);
-- 
1.7.1



[PATCHv5 10/15] Add get_dev_path callback for usb bus.

2010-11-15 Thread Gleb Natapov

Signed-off-by: Gleb Natapov g...@redhat.com
---
 hw/usb-bus.c |   42 ++
 1 files changed, 42 insertions(+), 0 deletions(-)

diff --git a/hw/usb-bus.c b/hw/usb-bus.c
index 256b881..8b4583c 100644
--- a/hw/usb-bus.c
+++ b/hw/usb-bus.c
@@ -5,11 +5,13 @@
 #include monitor.h
 
 static void usb_bus_dev_print(Monitor *mon, DeviceState *qdev, int indent);
+static char *usbbus_get_fw_dev_path(DeviceState *dev);
 
 static struct BusInfo usb_bus_info = {
 .name  = USB,
 .size  = sizeof(USBBus),
 .print_dev = usb_bus_dev_print,
+.get_fw_dev_path = usbbus_get_fw_dev_path,
 };
 static int next_usb_bus = 0;
 static QTAILQ_HEAD(, USBBus) busses = QTAILQ_HEAD_INITIALIZER(busses);
@@ -307,3 +309,43 @@ USBDevice *usbdevice_create(const char *cmdline)
 }
 return usb-usbdevice_init(params);
 }
+
+static int usbbus_get_fw_dev_path_helper(USBDevice *d, USBBus *bus, char *p,
+                                         int len)
+{
+    int l = 0;
+    USBPort *port;
+
+    QTAILQ_FOREACH(port, &bus->used, next) {
+        if (port->dev == d) {
+            if (port->pdev) {
+                l = usbbus_get_fw_dev_path_helper(port->pdev, bus, p, len);
+            }
+            l += snprintf(p + l, len - l, "%s@%x/", qdev_fw_name(&d->qdev),
+                          port->index);
+            break;
+        }
+    }
+
+    return l;
+}
+
+static char *usbbus_get_fw_dev_path(DeviceState *dev)
+{
+    USBDevice *d = (USBDevice*)dev;
+    USBBus *bus = usb_bus_from_device(d);
+    char path[100];
+    int l;
+
+    assert(d->attached != 0);
+
+    l = usbbus_get_fw_dev_path_helper(d, bus, path, sizeof(path));
+
+    if (l == 0) {
+        abort();
+    }
+
+    path[l-1] = '\0';
+
+    return strdup(path);
+}
-- 
1.7.1



[PATCHv5 12/15] Change fw_cfg_add_file() to get full file path as a parameter.

2010-11-15 Thread Gleb Natapov
Change fw_cfg_add_file() to take the full file path as a parameter instead
of building one internally. There are two reasons for that. First, the caller
may need to know how the file is named. Second, this moves the file naming
policy out of fw_cfg. A platform may want to use more than two levels of
directories, for instance.

Signed-off-by: Gleb Natapov g...@redhat.com
---
 hw/fw_cfg.c |   16 
 hw/fw_cfg.h |4 ++--
 hw/loader.c |   16 ++--
 3 files changed, 20 insertions(+), 16 deletions(-)

diff --git a/hw/fw_cfg.c b/hw/fw_cfg.c
index 72866ae..7b9434f 100644
--- a/hw/fw_cfg.c
+++ b/hw/fw_cfg.c
@@ -277,10 +277,9 @@ int fw_cfg_add_callback(FWCfgState *s, uint16_t key, 
FWCfgCallback callback,
 return 1;
 }
 
-int fw_cfg_add_file(FWCfgState *s,  const char *dir, const char *filename,
-uint8_t *data, uint32_t len)
+int fw_cfg_add_file(FWCfgState *s,  const char *filename, uint8_t *data,
+uint32_t len)
 {
-const char *basename;
 int i, index;
 
 if (!s-files) {
@@ -297,15 +296,8 @@ int fw_cfg_add_file(FWCfgState *s,  const char *dir, const 
char *filename,
 
 fw_cfg_add_bytes(s, FW_CFG_FILE_FIRST + index, data, len);
 
-basename = strrchr(filename, '/');
-if (basename) {
-basename++;
-} else {
-basename = filename;
-}
-
-snprintf(s-files-f[index].name, sizeof(s-files-f[index].name),
- %s/%s, dir, basename);
+pstrcpy(s-files-f[index].name, sizeof(s-files-f[index].name),
+filename);
 for (i = 0; i  index; i++) {
 if (strcmp(s-files-f[index].name, s-files-f[i].name) == 0) {
 FW_CFG_DPRINTF(%s: skip duplicate: %s\n, __FUNCTION__,
diff --git a/hw/fw_cfg.h b/hw/fw_cfg.h
index 4d13a4f..856bf91 100644
--- a/hw/fw_cfg.h
+++ b/hw/fw_cfg.h
@@ -60,8 +60,8 @@ int fw_cfg_add_i32(FWCfgState *s, uint16_t key, uint32_t 
value);
 int fw_cfg_add_i64(FWCfgState *s, uint16_t key, uint64_t value);
 int fw_cfg_add_callback(FWCfgState *s, uint16_t key, FWCfgCallback callback,
 void *callback_opaque, uint8_t *data, size_t len);
-int fw_cfg_add_file(FWCfgState *s, const char *dir, const char *filename,
-uint8_t *data, uint32_t len);
+int fw_cfg_add_file(FWCfgState *s, const char *filename, uint8_t *data,
+uint32_t len);
 FWCfgState *fw_cfg_init(uint32_t ctl_port, uint32_t data_port,
 target_phys_addr_t crl_addr, target_phys_addr_t 
data_addr);
 
diff --git a/hw/loader.c b/hw/loader.c
index 49ac1fa..1e98326 100644
--- a/hw/loader.c
+++ b/hw/loader.c
@@ -592,8 +592,20 @@ int rom_add_file(const char *file, const char *fw_dir,
 }
 close(fd);
 rom_insert(rom);
-    if (rom->fw_file && fw_cfg)
-        fw_cfg_add_file(fw_cfg, rom->fw_dir, rom->fw_file, rom->data, rom->romsize);
+    if (rom->fw_file && fw_cfg) {
+        const char *basename;
+        char fw_file_name[56];
+
+        basename = strrchr(rom->fw_file, '/');
+        if (basename) {
+            basename++;
+        } else {
+            basename = rom->fw_file;
+        }
+        snprintf(fw_file_name, sizeof(fw_file_name), "%s/%s", rom->fw_dir,
+                 basename);
+        fw_cfg_add_file(fw_cfg, fw_file_name, rom->data, rom->romsize);
+    }
 return 0;
 
 err:
-- 
1.7.1
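
The naming policy that this patch moves into the caller can be tried in
isolation with the sketch below. It is a standalone illustration, not the
loader.c code: make_fw_file_name() is a hypothetical helper, and unlike the
snprintf() call in the patch it reports truncation to the caller.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Build "<dir>/<basename of file>" in out. Returns 0 on success and -1 if
 * the result did not fit into len bytes (out then holds a truncated name).
 */
int make_fw_file_name(char *out, size_t len, const char *dir, const char *file)
{
    const char *base;
    int n;

    assert(out != NULL && dir != NULL && file != NULL);

    /* Keep only the part after the last '/', if there is one. */
    base = strrchr(file, '/');
    base = base ? base + 1 : file;

    n = snprintf(out, len, "%s/%s", dir, base);
    return (n >= 0 && (size_t)n < len) ? 0 : -1;
}
```

A platform that wants deeper directory nesting only has to change how it
builds the name; fw_cfg_add_file() stores whatever string it is given.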



[PATCHv5 14/15] Add notifier that will be called when machine is fully created.

2010-11-15 Thread Gleb Natapov
Actions that depend on a fully initialized device model should register
with this notifier chain.

Signed-off-by: Gleb Natapov g...@redhat.com
---
 sysemu.h |2 ++
 vl.c |   15 +++
 2 files changed, 17 insertions(+), 0 deletions(-)

diff --git a/sysemu.h b/sysemu.h
index 48f8eee..c42f33a 100644
--- a/sysemu.h
+++ b/sysemu.h
@@ -60,6 +60,8 @@ void qemu_system_reset(void);
 void qemu_add_exit_notifier(Notifier *notify);
 void qemu_remove_exit_notifier(Notifier *notify);
 
+void qemu_add_machine_init_done_notifier(Notifier *notify);
+
 void do_savevm(Monitor *mon, const QDict *qdict);
 int load_vmstate(const char *name);
 void do_delvm(Monitor *mon, const QDict *qdict);
diff --git a/vl.c b/vl.c
index e8ada75..918d988 100644
--- a/vl.c
+++ b/vl.c
@@ -253,6 +253,9 @@ static void *boot_set_opaque;
 static NotifierList exit_notifiers =
 NOTIFIER_LIST_INITIALIZER(exit_notifiers);
 
+static NotifierList machine_init_done_notifiers =
+NOTIFIER_LIST_INITIALIZER(machine_init_done_notifiers);
+
 int kvm_allowed = 0;
 uint32_t xen_domid;
 enum xen_mode xen_mode = XEN_EMULATE;
@@ -1778,6 +1781,16 @@ static void qemu_run_exit_notifiers(void)
 notifier_list_notify(exit_notifiers);
 }
 
+void qemu_add_machine_init_done_notifier(Notifier *notify)
+{
+    notifier_list_add(&machine_init_done_notifiers, notify);
+}
+
+static void qemu_run_machine_init_done_notifiers(void)
+{
+    notifier_list_notify(&machine_init_done_notifiers);
+}
+
 static const QEMUOption *lookup_opt(int argc, char **argv,
 const char **poptarg, int *poptind)
 {
@@ -3023,6 +3036,8 @@ int main(int argc, char **argv, char **envp)
 exit(1);
 }
 
+qemu_run_machine_init_done_notifiers();
+
 qemu_system_reset();
 if (loadvm) {
 if (load_vmstate(loadvm) < 0) {
-- 
1.7.1
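
The pattern this patch relies on can be sketched in isolation: callbacks are queued on a list while devices are being created and all of them run once when machine setup is complete. The Demo* types and functions below are hypothetical stand-ins written for this sketch; they are not QEMU's Notifier or NotifierList definitions.

```c
#include <stddef.h>

/* Hypothetical stand-ins for a notifier and a notifier list. */
typedef struct DemoNotifier DemoNotifier;
struct DemoNotifier {
    void (*notify)(DemoNotifier *n);
    DemoNotifier *next;
};

typedef struct {
    DemoNotifier *head;
} DemoNotifierList;

/* Queue a callback; nothing runs at registration time. */
static void demo_list_add(DemoNotifierList *list, DemoNotifier *n)
{
    n->next = list->head;
    list->head = n;
}

/* Run every registered callback once. */
static void demo_list_notify(DemoNotifierList *list)
{
    for (DemoNotifier *n = list->head; n; n = n->next) {
        n->notify(n);
    }
}

/* Example callback: counts how often it was invoked. */
static int boot_order_published;

static void publish_boot_order(DemoNotifier *n)
{
    (void)n;
    boot_order_published++;
}
```

A caller would register publish_boot_order with demo_list_add() during device setup and call demo_list_notify() once, at the point where the patch calls qemu_run_machine_init_done_notifiers().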



[PATCHv5 06/15] Add get_fw_dev_path callback to IDE bus.

2010-11-15 Thread Gleb Natapov

Signed-off-by: Gleb Natapov g...@redhat.com
---
 hw/ide/qdev.c |   13 +
 1 files changed, 13 insertions(+), 0 deletions(-)

diff --git a/hw/ide/qdev.c b/hw/ide/qdev.c
index 88ff657..01a181b 100644
--- a/hw/ide/qdev.c
+++ b/hw/ide/qdev.c
@@ -24,9 +24,12 @@
 
 /* - */
 
+static char *idebus_get_fw_dev_path(DeviceState *dev);
+
 static struct BusInfo ide_bus_info = {
 .name  = "IDE",
 .size  = sizeof(IDEBus),
+.get_fw_dev_path = idebus_get_fw_dev_path,
 };
 
 void ide_bus_new(IDEBus *idebus, DeviceState *dev, int bus_id)
@@ -35,6 +38,16 @@ void ide_bus_new(IDEBus *idebus, DeviceState *dev, int bus_id)
 idebus->bus_id = bus_id;
 }
 
+static char *idebus_get_fw_dev_path(DeviceState *dev)
+{
+char path[30];
+
+snprintf(path, sizeof(path), "%s@%d", qdev_fw_name(dev),
+ ((IDEBus*)dev->parent_bus)->bus_id);
+
+return strdup(path);
+}
+
 static int ide_qdev_init(DeviceState *qdev, DeviceInfo *base)
 {
 IDEDevice *dev = DO_UPCAST(IDEDevice, qdev, qdev);
-- 
1.7.1



[PATCHv5 13/15] Add bootindex for option roms.

2010-11-15 Thread Gleb Natapov
Extend -option-rom command to have additional parameter ,bootindex=.

Signed-off-by: Gleb Natapov g...@redhat.com
---
 hw/loader.c|   16 +++-
 hw/loader.h|8 
 hw/multiboot.c |3 ++-
 hw/ne2000.c|2 +-
 hw/nseries.c   |4 ++--
 hw/palm.c  |6 +++---
 hw/pc.c|7 ---
 hw/pci.c   |2 +-
 hw/pcnet.c |2 +-
 qemu-config.c  |   17 +
 sysemu.h   |6 +-
 vl.c   |   11 +--
 12 files changed, 60 insertions(+), 24 deletions(-)

diff --git a/hw/loader.c b/hw/loader.c
index 1e98326..eb198f6 100644
--- a/hw/loader.c
+++ b/hw/loader.c
@@ -107,7 +107,7 @@ int load_image_targphys(const char *filename,
 
 size = get_image_size(filename);
 if (size > 0)
-rom_add_file_fixed(filename, addr);
+rom_add_file_fixed(filename, addr, -1);
 return size;
 }
 
@@ -557,10 +557,11 @@ static void rom_insert(Rom *rom)
 }
 
 int rom_add_file(const char *file, const char *fw_dir,
- target_phys_addr_t addr)
+ target_phys_addr_t addr, int32_t bootindex)
 {
 Rom *rom;
 int rc, fd = -1;
+char devpath[100];
 
 rom = qemu_mallocz(sizeof(*rom));
 rom->name = qemu_strdup(file);
@@ -605,7 +606,12 @@ int rom_add_file(const char *file, const char *fw_dir,
 snprintf(fw_file_name, sizeof(fw_file_name), "%s/%s", rom->fw_dir,
  basename);
 fw_cfg_add_file(fw_cfg, fw_file_name, rom->data, rom->romsize);
+snprintf(devpath, sizeof(devpath), "/rom@%s", fw_file_name);
+} else {
+snprintf(devpath, sizeof(devpath), "/rom@" TARGET_FMT_plx, addr);
 }
+
+add_boot_device_path(bootindex, NULL, devpath);
 return 0;
 
 err:
@@ -635,12 +641,12 @@ int rom_add_blob(const char *name, const void *blob, size_t len,
 
 int rom_add_vga(const char *file)
 {
-return rom_add_file(file, "vgaroms", 0);
+return rom_add_file(file, "vgaroms", 0, -1);
 }
 
-int rom_add_option(const char *file)
+int rom_add_option(const char *file, int32_t bootindex)
 {
-return rom_add_file(file, "genroms", 0);
+return rom_add_file(file, "genroms", 0, bootindex);
 }
 
 static void rom_reset(void *unused)
diff --git a/hw/loader.h b/hw/loader.h
index 1f82fc5..fc6bdff 100644
--- a/hw/loader.h
+++ b/hw/loader.h
@@ -22,7 +22,7 @@ void pstrcpy_targphys(const char *name,
 
 
 int rom_add_file(const char *file, const char *fw_dir,
- target_phys_addr_t addr);
+ target_phys_addr_t addr, int32_t bootindex);
 int rom_add_blob(const char *name, const void *blob, size_t len,
  target_phys_addr_t addr);
 int rom_load_all(void);
@@ -31,8 +31,8 @@ int rom_copy(uint8_t *dest, target_phys_addr_t addr, size_t size);
 void *rom_ptr(target_phys_addr_t addr);
 void do_info_roms(Monitor *mon);
 
-#define rom_add_file_fixed(_f, _a)  \
-rom_add_file(_f, NULL, _a)
+#define rom_add_file_fixed(_f, _a, _i)  \
+rom_add_file(_f, NULL, _a, _i)
 #define rom_add_blob_fixed(_f, _b, _l, _a)  \
 rom_add_blob(_f, _b, _l, _a)
 
@@ -43,6 +43,6 @@ void do_info_roms(Monitor *mon);
 #define PC_ROM_SIZE(PC_ROM_MAX - PC_ROM_MIN_VGA)
 
 int rom_add_vga(const char *file);
-int rom_add_option(const char *file);
+int rom_add_option(const char *file, int32_t bootindex);
 
 #endif
diff --git a/hw/multiboot.c b/hw/multiboot.c
index f9097a2..b438019 100644
--- a/hw/multiboot.c
+++ b/hw/multiboot.c
@@ -325,7 +325,8 @@ int load_multiboot(void *fw_cfg,
 fw_cfg_add_bytes(fw_cfg, FW_CFG_INITRD_DATA, mb_bootinfo_data,
  sizeof(bootinfo));
 
-option_rom[nb_option_roms] = "multiboot.bin";
+option_rom[nb_option_roms].name = "multiboot.bin";
+option_rom[nb_option_roms].bootindex = 0;
 nb_option_roms++;
 
 return 1; /* yes, we are multiboot */
diff --git a/hw/ne2000.c b/hw/ne2000.c
index f4bbac2..67e0cb0 100644
--- a/hw/ne2000.c
+++ b/hw/ne2000.c
@@ -742,7 +742,7 @@ static int pci_ne2000_init(PCIDevice *pci_dev)
 if (!pci_dev->qdev.hotplugged) {
 static int loaded = 0;
 if (!loaded) {
-rom_add_option("pxe-ne2k_pci.bin");
+rom_add_option("pxe-ne2k_pci.bin", -1);
 loaded = 1;
 }
 }
diff --git a/hw/nseries.c b/hw/nseries.c
index 04a028d..2f6f473 100644
--- a/hw/nseries.c
+++ b/hw/nseries.c
@@ -1326,7 +1326,7 @@ static void n8x0_init(ram_addr_t ram_size, const char *boot_device,
 qemu_register_reset(n8x0_boot_init, s);
 }
 
-if (option_rom[0] && (boot_device[0] == 'n' || !kernel_filename)) {
+if (option_rom[0].name && (boot_device[0] == 'n' || !kernel_filename)) {
 int rom_size;
 uint8_t nolo_tags[0x1];
 /* No, wait, better start at the ROM.  */
@@ -1341,7 +1341,7 @@ static void n8x0_init(ram_addr_t ram_size, const char 
*boot_device,
  *
  * The code above is for loading the `zImage' file from Nokia
  * images.  */
-rom_size = 

[PATCHv5 03/15] Keep track of ISA ports ISA device is using in qdev.

2010-11-15 Thread Gleb Natapov
Store all io ports used by device in ISADevice structure.

Signed-off-by: Gleb Natapov g...@redhat.com
---
 hw/cs4231a.c |1 +
 hw/fdc.c |3 +++
 hw/gus.c |4 
 hw/ide/isa.c |2 ++
 hw/isa-bus.c |   25 +
 hw/isa.h |4 
 hw/m48t59.c  |1 +
 hw/mc146818rtc.c |1 +
 hw/ne2000-isa.c  |3 +++
 hw/parallel.c|5 +
 hw/pckbd.c   |3 +++
 hw/sb16.c|4 
 hw/serial.c  |1 +
 13 files changed, 57 insertions(+), 0 deletions(-)

diff --git a/hw/cs4231a.c b/hw/cs4231a.c
index 4d5ce5c..598f032 100644
--- a/hw/cs4231a.c
+++ b/hw/cs4231a.c
@@ -645,6 +645,7 @@ static int cs4231a_initfn (ISADevice *dev)
 isa_init_irq (dev, &s->pic, s->irq);
 
 for (i = 0; i < 4; i++) {
+isa_init_ioport(dev, i);
 register_ioport_write (s->port + i, 1, 1, cs_write, s);
 register_ioport_read (s->port + i, 1, 1, cs_read, s);
 }
diff --git a/hw/fdc.c b/hw/fdc.c
index a467c4b..5ab754b 100644
--- a/hw/fdc.c
+++ b/hw/fdc.c
@@ -1983,6 +1983,9 @@ static int isabus_fdc_init1(ISADevice *dev)
   fdctrl_write_port, fdctrl);
 register_ioport_write(iobase + 0x07, 1, 1,
   fdctrl_write_port, fdctrl);
+isa_init_ioport_range(dev, iobase + 1, 5);
+isa_init_ioport(dev, iobase + 7);
+
 isa_init_irq(&isa->busdev, &fdctrl->irq, isairq);
 fdctrl->dma_chann = dma_chann;
 
diff --git a/hw/gus.c b/hw/gus.c
index e9016d8..ff9e7c7 100644
--- a/hw/gus.c
+++ b/hw/gus.c
@@ -264,20 +264,24 @@ static int gus_initfn (ISADevice *dev)
 
 register_ioport_write (s->port, 1, 1, gus_writeb, s);
 register_ioport_write (s->port, 1, 2, gus_writew, s);
+isa_init_ioport_range(dev, s->port, 2);
 
 register_ioport_read ((s->port + 0x100) & 0xf00, 1, 1, gus_readb, s);
 register_ioport_read ((s->port + 0x100) & 0xf00, 1, 2, gus_readw, s);
+isa_init_ioport_range(dev, (s->port + 0x100) & 0xf00, 2);
 
 register_ioport_write (s->port + 6, 10, 1, gus_writeb, s);
 register_ioport_write (s->port + 6, 10, 2, gus_writew, s);
 register_ioport_read (s->port + 6, 10, 1, gus_readb, s);
 register_ioport_read (s->port + 6, 10, 2, gus_readw, s);
+isa_init_ioport_range(dev, s->port + 6, 10);
 
 
 register_ioport_write (s->port + 0x100, 8, 1, gus_writeb, s);
 register_ioport_write (s->port + 0x100, 8, 2, gus_writew, s);
 register_ioport_read (s->port + 0x100, 8, 1, gus_readb, s);
 register_ioport_read (s->port + 0x100, 8, 2, gus_readw, s);
+isa_init_ioport_range(dev, s->port + 0x100, 8);
 
 DMA_register_channel (s->emu.gusdma, GUS_read_DMA, s);
 s->emu.himemaddr = s->himem;
diff --git a/hw/ide/isa.c b/hw/ide/isa.c
index 9856435..4206afd 100644
--- a/hw/ide/isa.c
+++ b/hw/ide/isa.c
@@ -70,6 +70,8 @@ static int isa_ide_initfn(ISADevice *dev)
 ide_bus_new(&s->bus, &s->dev.qdev);
 ide_init_ioport(&s->bus, s->iobase, s->iobase2);
 isa_init_irq(dev, &s->irq, s->isairq);
+isa_init_ioport_range(dev, s->iobase, 8);
+isa_init_ioport(dev, s->iobase2);
 ide_init2(&s->bus, s->irq);
 vmstate_register(&dev->qdev, 0, &vmstate_ide_isa, s);
 return 0;
diff --git a/hw/isa-bus.c b/hw/isa-bus.c
index 26036e0..c0ac7e9 100644
--- a/hw/isa-bus.c
+++ b/hw/isa-bus.c
@@ -92,6 +92,31 @@ void isa_init_irq(ISADevice *dev, qemu_irq *p, int isairq)
 dev->nirqs++;
 }
 
+static void isa_init_ioport_one(ISADevice *dev, uint16_t ioport)
+{
+assert(dev->nioports < ARRAY_SIZE(dev->ioports));
+dev->ioports[dev->nioports++] = ioport;
+}
+
+static int isa_cmp_ports(const void *p1, const void *p2)
+{
+return *(uint16_t*)p1 - *(uint16_t*)p2;
+}
+
+void isa_init_ioport_range(ISADevice *dev, uint16_t start, uint16_t length)
+{
+int i;
+for (i = start; i < start + length; i++) {
+isa_init_ioport_one(dev, i);
+}
+qsort(dev->ioports, dev->nioports, sizeof(dev->ioports[0]), isa_cmp_ports);
+}
+
+void isa_init_ioport(ISADevice *dev, uint16_t ioport)
+{
+isa_init_ioport_range(dev, ioport, 1);
+}
+
 static int isa_qdev_init(DeviceState *qdev, DeviceInfo *base)
 {
 ISADevice *dev = DO_UPCAST(ISADevice, qdev, qdev);
diff --git a/hw/isa.h b/hw/isa.h
index aaf0272..4794b76 100644
--- a/hw/isa.h
+++ b/hw/isa.h
@@ -14,6 +14,8 @@ struct ISADevice {
 DeviceState qdev;
 uint32_t isairq[2];
 int nirqs;
+uint16_t ioports[32];
+int nioports;
 };
 
 typedef int (*isa_qdev_initfn)(ISADevice *dev);
@@ -26,6 +28,8 @@ ISABus *isa_bus_new(DeviceState *dev);
 void isa_bus_irqs(qemu_irq *irqs);
 qemu_irq isa_reserve_irq(int isairq);
 void isa_init_irq(ISADevice *dev, qemu_irq *p, int isairq);
+void isa_init_ioport(ISADevice *dev, uint16_t ioport);
+void isa_init_ioport_range(ISADevice *dev, uint16_t start, uint16_t length);
 void isa_qdev_register(ISADeviceInfo *info);
 ISADevice *isa_create(const char *name);
 ISADevice *isa_create_simple(const char *name);
diff --git a/hw/m48t59.c b/hw/m48t59.c
index c7492a6..75a94e1 100644
--- 

Re: seabios 0.6.1 regression

2010-11-15 Thread Avi Kivity

On 11/15/2010 03:39 PM, Avi Kivity wrote:
Installing Windows XP with seabios 0.6.1, immediately after the first 
reboot, Windows hangs in protected mode instead of proceeding with 
installation.


I'm bisecting this, but if anyone can point to a likely culprit, I can 
try it first.




Bisect says:

commit 9a01a9c3eb336eca37c17fd74c79806ee0bda05b
Author: Kevin O'Connor ke...@koconnor.net
Date:   Wed Aug 25 21:07:48 2010 -0400

Only show bootsplash during boot menu.

When the bootsplash picture is shown, it's not possible to see text.
So, only display the picture while prompting the user for the boot
menu.

git bisect start 'rel-0.6.1' '17d3e46511aeedc9f09a8216d194d749187b80aa'
# good: [b4525a0ec176426788f293cce92160e6573e86b6] Handle unaligned 
sizes in iomemcpy().

git bisect good b4525a0ec176426788f293cce92160e6573e86b6
# good: [e2074bf6ec2956e1d803e62dcb052b7c88c214f0] Add ACPI SSDT/DSDT 
support for CPU hotplug.

git bisect good e2074bf6ec2956e1d803e62dcb052b7c88c214f0
# bad: [7ce09ae6542f0f4187024ae3267b61a0cf6ebd39] Make 
tools/transdump.py more resilient to unknown input.

git bisect bad 7ce09ae6542f0f4187024ae3267b61a0cf6ebd39
# good: [5feb83c8a55744397b4dd208fb4016a5c051222e] add write support to 
virtio-blk

git bisect good 5feb83c8a55744397b4dd208fb4016a5c051222e
# bad: [6039fc55274deb7202060d08e0f23b9f3dcface4] Update qemu_cfg_read 
to use rep insb.

git bisect bad 6039fc55274deb7202060d08e0f23b9f3dcface4


--
error compiling committee.c: too many arguments to function



Re: seabios 0.6.1 regression

2010-11-15 Thread Avi Kivity

On 11/15/2010 05:04 PM, Avi Kivity wrote:

On 11/15/2010 03:39 PM, Avi Kivity wrote:
Installing Windows XP with seabios 0.6.1, immediately after the first 
reboot, Windows hangs in protected mode instead of proceeding with 
installation.


I'm bisecting this, but if anyone can point to a likely culprit, I 
can try it first.




Bisect says:

commit 9a01a9c3eb336eca37c17fd74c79806ee0bda05b
Author: Kevin O'Connor ke...@koconnor.net
Date:   Wed Aug 25 21:07:48 2010 -0400

Only show bootsplash during boot menu.

When the bootsplash picture is shown, it's not possible to see text.
So, only display the picture while prompting the user for the boot
menu.

git bisect start 'rel-0.6.1' '17d3e46511aeedc9f09a8216d194d749187b80aa'
# good: [b4525a0ec176426788f293cce92160e6573e86b6] Handle unaligned 
sizes in iomemcpy().

git bisect good b4525a0ec176426788f293cce92160e6573e86b6
# good: [e2074bf6ec2956e1d803e62dcb052b7c88c214f0] Add ACPI SSDT/DSDT 
support for CPU hotplug.

git bisect good e2074bf6ec2956e1d803e62dcb052b7c88c214f0
# bad: [7ce09ae6542f0f4187024ae3267b61a0cf6ebd39] Make 
tools/transdump.py more resilient to unknown input.

git bisect bad 7ce09ae6542f0f4187024ae3267b61a0cf6ebd39
# good: [5feb83c8a55744397b4dd208fb4016a5c051222e] add write support 
to virtio-blk

git bisect good 5feb83c8a55744397b4dd208fb4016a5c051222e
# bad: [6039fc55274deb7202060d08e0f23b9f3dcface4] Update qemu_cfg_read 
to use rep insb.

git bisect bad 6039fc55274deb7202060d08e0f23b9f3dcface4




That was premature, the real culprit (if I didn't mess up the bisect) is:

commit 6039fc55274deb7202060d08e0f23b9f3dcface4
Author: Kevin O'Connor ke...@koconnor.net
Date:   Wed Aug 25 21:43:19 2010 -0400

Update qemu_cfg_read to use rep insb.

Use rep insb instead of manual loop - the host may be able to optimize
the rep insb instruction.

Which doesn't make any sense - will do more testing.

--
error compiling committee.c: too many arguments to function



Re: seabios 0.6.1 regression

2010-11-15 Thread Avi Kivity

On 11/15/2010 05:12 PM, Avi Kivity wrote:


That was premature, the real culprit (if I didn't mess up the bisect) is:

commit 6039fc55274deb7202060d08e0f23b9f3dcface4
Author: Kevin O'Connor ke...@koconnor.net
Date:   Wed Aug 25 21:43:19 2010 -0400

Update qemu_cfg_read to use rep insb.

Use rep insb instead of manual loop - the host may be able to 
optimize

the rep insb instruction.

Which doesn't make any sense - will do more testing.



Confirmed with multiple tests.

--
error compiling committee.c: too many arguments to function



Re: seabios 0.6.1 regression

2010-11-15 Thread Avi Kivity

On 11/15/2010 05:34 PM, Avi Kivity wrote:

On 11/15/2010 05:12 PM, Avi Kivity wrote:


That was premature, the real culprit (if I didn't mess up the bisect) 
is:


commit 6039fc55274deb7202060d08e0f23b9f3dcface4
Author: Kevin O'Connor ke...@koconnor.net
Date:   Wed Aug 25 21:43:19 2010 -0400

Update qemu_cfg_read to use rep insb.

Use rep insb instead of manual loop - the host may be able to 
optimize

the rep insb instruction.

Which doesn't make any sense - will do more testing.



Confirmed with multiple tests.



I think it's a miscompile.

out/code16.o:
 1a4:   3e  ds
 1a5:   6c  insb   (%dx),%es:(%edi)

Note no 66 prefix.

--
error compiling committee.c: too many arguments to function



Re: seabios 0.6.1 regression

2010-11-15 Thread Avi Kivity

On 11/15/2010 05:41 PM, Avi Kivity wrote:


I think it's a miscompile.

out/code16.o:
 1a4:   3e  ds
 1a5:   6c  insb   (%dx),%es:(%edi)

Note no 66 prefix.



It isn't, that was random crap.  All the insb() code is 32-bit.

--
error compiling committee.c: too many arguments to function



Re: seabios 0.6.1 regression

2010-11-15 Thread Avi Kivity

On 11/15/2010 05:49 PM, Avi Kivity wrote:

On 11/15/2010 05:41 PM, Avi Kivity wrote:


I think it's a miscompile.

out/code16.o:
 1a4:   3e  ds
 1a5:   6c  insb   (%dx),%es:(%edi)

Note no 66 prefix.



It isn't, that was random crap.  All the insb() code is 32-bit.



Rewriting it to use inb / stos works (jecxz ; insb; loop doesn't) so it 
looks like a kernel bug in insb emulation.
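
For reference, the two variants compared here are architecturally meant to do the same thing: with the direction flag clear, rep insb performs (E)CX single-byte reads from port DX and stores each byte at ES:(E)DI, advancing the destination after every read. A user-space sketch of that expected behaviour follows; fake_inb() and its data are invented for illustration and no real port I/O takes place.

```c
#include <stddef.h>
#include <stdint.h>

/* Fake data port, for illustration only: each read hands out the next
 * byte of a fixed buffer and wraps around at the end. */
static const uint8_t fake_dev[] = { 0x51, 0x45, 0x4d, 0x55 };
static size_t fake_pos;

static uint8_t fake_inb(void)
{
    return fake_dev[fake_pos++ % sizeof(fake_dev)];
}

/* Expected effect of "rep insb" with DF=0: count single-byte reads,
 * each stored at the destination pointer, which then moves forward.
 * An inb + stos loop has to produce exactly the same buffer. */
static void read_like_rep_insb(uint8_t *dst, size_t count)
{
    while (count--) {
        *dst++ = fake_inb();
    }
}
```

If an emulated rep insb leaves a different buffer or a different final count than this loop would, the emulation rather than the guest code is the likely suspect.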


--
error compiling committee.c: too many arguments to function

--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html



Re: [PATCH v3] virtio-9p: fix build on !CONFIG_UTIMENSAT

2010-11-15 Thread M. Mohan Kumar
 This patch introduces a fallback mechanism for old systems that do not
 support utimensat().  This fixes the build failure with the following warnings:
 
 hw/virtio-9p-local.c: In function 'local_utimensat':
 hw/virtio-9p-local.c:479: warning: implicit declaration of function 'utimensat'
 hw/virtio-9p-local.c:479: warning: nested extern declaration of 'utimensat'
 
 and:
 
 hw/virtio-9p.c: In function 'v9fs_setattr_post_chmod':
 hw/virtio-9p.c:1410: error: 'UTIME_NOW' undeclared (first use in this function)
 hw/virtio-9p.c:1410: error: (Each undeclared identifier is reported only once
 hw/virtio-9p.c:1410: error: for each function it appears in.)
 hw/virtio-9p.c:1413: error: 'UTIME_OMIT' undeclared (first use in this function)
 hw/virtio-9p.c: In function 'v9fs_wstat_post_chmod':
 hw/virtio-9p.c:2905: error: 'UTIME_OMIT' undeclared (first use in this function)
 
 v3:
   - Use better alternative handling for UTIME_NOW/OMIT
   - Move qemu_utimensat() to cutils.c
 v2:
   - Introduce qemu_utimensat()
 
 Signed-off-by: Hidetoshi Seto seto.hideto...@jp.fujitsu.com

Looks good to me.

Acked-by: M. Mohan Kumar mo...@in.ibm.com


[PATCH] make-release: fix mtime for a wider range of git versions

2010-11-15 Thread Bernhard Kohl
With the latest git versions, e.g. 1.7.2.3, git still prints out
the tag info in addition to the requested format. So let's simply
fetch the first line from the output.

In addition I use the --pretty option instead of --format which
is not recognized in very old git versions, e.g. 1.5.5.6.

Tested with git versions 1.5.5.6 and 1.7.2.3.

Signed-off-by: Bernhard Kohl bernhard.k...@nsn.com
---
 kvm/scripts/make-release |2 +-
 1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/kvm/scripts/make-release b/kvm/scripts/make-release
index 56302c3..2d050fc 100755
--- a/kvm/scripts/make-release
+++ b/kvm/scripts/make-release
@@ -51,7 +51,7 @@ cd $(dirname $0)/../..
 mkdir -p $(dirname $tarball)
 git archive --prefix=$name/ --format=tar $commit > $tarball
 
-mtime=`git show --format=%ct $commit^{commit} --`
+mtime=`git show --pretty=format:%ct $commit^{commit} -- | head -n 1`
 tarargs="--owner=root --group=root"
 
 mkdir -p $tmpdir/$name
-- 
1.7.2.3



Re: [PATCH] ceph/rbd block driver for qemu-kvm (v7)

2010-11-15 Thread Christian Brunner
Hi Stefan,

thanks for your feedback. Yehuda and Sage have already committed some
patches to our git repository.

What I'm not sure about is the rados_(de)initialization for multiple
rbd images. I suspect that _deinitialize should only be called for the
last rbd image.

Yehuda and Sage know librados a lot better than I do. I'm pretty sure
that they will give some feedback about this remaining issue. After
that we will send an updated patch.

Regards,
Christian

2010/11/11 Stefan Hajnoczi stefa...@gmail.com:
 On Fri, Oct 15, 2010 at 8:54 PM, Christian Brunner c...@muc.de wrote:
 [...]
 +
 +    if ((r = rados_initialize(0, NULL)) < 0) {
 +        error_report("error initializing");
 +        return r;
 +    }

 Does rados_initialize() work when called multiple times?  This would happen if
 the VM has several rbd devices attached.

 [...]

 Stefan
 --
 To unsubscribe from this list: send the line unsubscribe ceph-devel in
 the body of a message to majord...@vger.kernel.org
 More majordomo info at  http://vger.kernel.org/majordomo-info.html



trace_printk() support in trace-cmd

2010-11-15 Thread Avi Kivity

trace-cmd doesn't like trace_printk():

<...>-23775 [000] 26343.288803: kvm_emulate_insn: 0:f14e9: rep insb
<...>-23775 [000] 26343.288804: bprint:   x86_emulate_insn : (NO FORMAT FOUND at a0131460)
<...>-23775 [000] 26343.288807: bprint:   x86_emulate_insn : (NO FORMAT FOUND at a0131460)


any chance to get it to work with custom printks?

I guess I should use 'perf probe' instead.

--
error compiling committee.c: too many arguments to function



Re: [PATCH] ceph/rbd block driver for qemu-kvm (v7)

2010-11-15 Thread Yehuda Sadeh Weinraub
On Mon, Nov 15, 2010 at 9:04 AM, Christian Brunner
c.m.brun...@gmail.com wrote:
 Hi Stefan,

 thanks for your feedback. Yehuda and Sage have already committed some
 patches to our git repository.

 What I'm not sure about is the rados_(de)initialization for multiple
 rbd images. I suspect that _deinitialize should only be called for the
 last rbd image.

 Yehuda and Sage know librados a lot better than I do. I'm pretty sure
 that they will give some feedback about this remaining issue. After
 that we will send an updated patch.

 Regards,
 Christian

 2010/11/11 Stefan Hajnoczi stefa...@gmail.com:
 On Fri, Oct 15, 2010 at 8:54 PM, Christian Brunner c...@muc.de wrote:
 [...]
 +
 +    if ((r = rados_initialize(0, NULL)) < 0) {
 +        error_report("error initializing");
 +        return r;
 +    }

 Does rados_initialize() work when called multiple times?  This would happen 
 if
 the VM has several rbd devices attached.

 [...]

 Stefan
 --
 To unsubscribe from this list: send the line unsubscribe ceph-devel in
 the body of a message to majord...@vger.kernel.org
 More majordomo info at  http://vger.kernel.org/majordomo-info.html

 --
 To unsubscribe from this list: send the line unsubscribe ceph-devel in
 the body of a message to majord...@vger.kernel.org
 More majordomo info at  http://vger.kernel.org/majordomo-info.html


The rados (de)initialization is refcounted and it is safe to call it
multiple times.
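
As a rough illustration of what "refcounted" means for paired init/deinit calls, here is a minimal sketch. The demo_* names are hypothetical and this is not librados code; it is also not thread-safe, which a real library would have to handle.

```c
#include <stddef.h>

/* Hypothetical names; this only illustrates the counting scheme. */
static int demo_refcount;
static int demo_backend_up;

/* The first caller performs the real setup; later callers only bump
 * the counter. */
static int demo_initialize(void)
{
    if (demo_refcount++ == 0) {
        demo_backend_up = 1; /* real setup would happen here, once */
    }
    return 0;
}

/* Only the call that drops the counter to zero tears things down. */
static void demo_deinitialize(void)
{
    if (demo_refcount > 0 && --demo_refcount == 0) {
        demo_backend_up = 0;
    }
}
```

With this scheme each rbd device can call the init and deinit functions on its own open and close paths, and the shared state goes away only when the last device is closed.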


Re: trace_printk() support in trace-cmd

2010-11-15 Thread Steven Rostedt
On Mon, 2010-11-15 at 19:09 +0200, Avi Kivity wrote:
 trace-cmd doesn't like trace_printk():
 
 ...-23775 [000] 26343.288803: kvm_emulate_insn: 0:f14e9: rep insb
 ...-23775 [000] 26343.288804: bprint:   x86_emulate_insn : 
 (NO FORMAT FOUND at a0131460)
 
 ...-23775 [000] 26343.288807: bprint:   x86_emulate_insn : 
 (NO FORMAT FOUND at a0131460)
 
 any chance to get it to work with custom printks?
 
 I guess I should use 'perf probe' instead.
 

Which kernel are you using, and/or which trace-cmd? It works fine for
me. But there have been bugs with older kernels and older trace-cmds.

-- Steve




Re: trace_printk() support in trace-cmd

2010-11-15 Thread Avi Kivity

On 11/15/2010 07:11 PM, Steven Rostedt wrote:

On Mon, 2010-11-15 at 19:09 +0200, Avi Kivity wrote:
  trace-cmd doesn't like trace_printk():

  ...-23775 [000] 26343.288803: kvm_emulate_insn: 0:f14e9: rep insb
  ...-23775 [000] 26343.288804: bprint:   x86_emulate_insn :
  (NO FORMAT FOUND at a0131460)

  ...-23775 [000] 26343.288807: bprint:   x86_emulate_insn :
  (NO FORMAT FOUND at a0131460)

  any chance to get it to work with custom printks?

  I guess I should use 'perf probe' instead.


Which kernel are you using, and/or which trace-cmd? It works fine for
me. But there have been bugs with older kernels and older trace-cmds.


kernel 2.6.37-rc1+, I think latest trace-cmd; will retry.

Meanwhile 'perf probe' grumbles on anonymous unions:


src(tyep:operand) has no member val.
Failed to find 'ctxt' in this function.
  Error: Failed to add events. (-22)


For

--add 'insb=arch/x86/kvm/emulate.c:3369 insn=ctxt->decode.b 
bytes=ctxt->decode.dst.bytes port=ctxt->decode.src.val 
cx=ctxt->decode.regs[1] di=ctxt->decode.regs[7] 
addr=ctxt->decode.dst.mem.addr cpos=ctxt->decode.io_read.pos 
cend=ctxt->decode.io_read.end'


(ctxt->decode.src.val is a member of an anonymous union in 
ctxt->decode.src, which is of type 'struct operand')
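
To make the failing case concrete, here is a reduced sketch of a struct whose member lives in an anonymous union. struct demo_operand is a simplified, hypothetical layout and not the kernel's struct operand; anonymous unions need C11 or the GNU extension.

```c
#include <stdint.h>

/* Simplified, hypothetical layout for illustration. */
struct demo_operand {
    unsigned int bytes;
    union {                 /* anonymous: the union itself has no name */
        unsigned long val;
        uint8_t valptr[sizeof(unsigned long)];
    };
};

/* In C the member is reached as src->val, as if it belonged to the
 * struct directly. A tool that resolves members by walking debug info
 * has to descend into the unnamed union to find it. */
static unsigned long demo_port(const struct demo_operand *src)
{
    return src->val;
}
```

This matches the error text quoted above, where the lookup stops at the struct level and reports that the type has no member val.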


--
error compiling committee.c: too many arguments to function



[PATCH] device-assignment: Register as un-migratable

2010-11-15 Thread Alex Williamson
Use register_device_unmigratable() to declare ourselves as
non-migratable.

Signed-off-by: Alex Williamson alex.william...@redhat.com
---

 hw/device-assignment.c |   15 +++
 1 files changed, 15 insertions(+), 0 deletions(-)

diff --git a/hw/device-assignment.c b/hw/device-assignment.c
index bde231d..cd93941 100644
--- a/hw/device-assignment.c
+++ b/hw/device-assignment.c
@@ -1434,6 +1434,13 @@ static void assigned_dev_unregister_msix_mmio(AssignedDevice *dev)
 dev->msix_table_page = NULL;
 }
 
+/* This should never get called, but we're required to create a save_state
+ * handler or else the no_migrate flag will never be checked. */
+static void assigned_save(QEMUFile* f, void *opaque)
+{
+abort();
+}
+
 static int assigned_initfn(struct PCIDevice *pci_dev)
 {
 AssignedDevice *dev = DO_UPCAST(AssignedDevice, dev, pci_dev);
@@ -1490,6 +1497,13 @@ static int assigned_initfn(struct PCIDevice *pci_dev)
 
 assigned_dev_load_option_rom(dev);
 QLIST_INSERT_HEAD(&devs, dev, next);
+
+/* Assigned devices are not migratable, register a save
+ * state entry so that we can mark it unmigratable. */
+register_savevm(&dev->dev.qdev, "pci-assign", 0, 0,
+assigned_save, NULL, dev);
+register_device_unmigratable(&dev->dev.qdev, "pci-assign", dev);
+
 return 0;
 
 assigned_out:
@@ -1503,6 +1517,7 @@ static int assigned_exitfn(struct PCIDevice *pci_dev)
 {
 AssignedDevice *dev = DO_UPCAST(AssignedDevice, dev, pci_dev);
 
+unregister_savevm(&dev->dev.qdev, "pci-assign", dev);
 QLIST_REMOVE(dev, next);
 deassign_device(dev);
 free_assigned_device(dev);

--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] device-assignment: Register as un-migratable

2010-11-15 Thread Jan Kiszka
Am 15.11.2010 20:41, Alex Williamson wrote:
 Use register_device_unmigratable() to declare ourselves as
 non-migratable.
 
 Signed-off-by: Alex Williamson alex.william...@redhat.com
 ---
 
  hw/device-assignment.c |   15 +++
  1 files changed, 15 insertions(+), 0 deletions(-)
 
 diff --git a/hw/device-assignment.c b/hw/device-assignment.c
 index bde231d..cd93941 100644
 --- a/hw/device-assignment.c
 +++ b/hw/device-assignment.c
 @@ -1434,6 +1434,13 @@ static void 
 assigned_dev_unregister_msix_mmio(AssignedDevice *dev)
  dev-msix_table_page = NULL;
  }
  
 +/* This should never get called, but we're required to create a save_state
 + * handler or else the no_migrate flag will never be checked. */
 +static void assigned_save(QEMUFile* f, void *opaque)
 +{
 +abort();
 +}
 +
  static int assigned_initfn(struct PCIDevice *pci_dev)
  {
  AssignedDevice *dev = DO_UPCAST(AssignedDevice, dev, pci_dev);
 @@ -1490,6 +1497,13 @@ static int assigned_initfn(struct PCIDevice *pci_dev)
  
  assigned_dev_load_option_rom(dev);
  QLIST_INSERT_HEAD(devs, dev, next);
 +
 +/* Assigned devices are not migratable, register a save
 + * state entry so that we can mark it unmigratable. */
 +register_savevm(&dev->dev.qdev, "pci-assign", 0, 0,
 +assigned_save, NULL, dev);
 +register_device_unmigratable(&dev->dev.qdev, "pci-assign", dev);
 +

Isn't this expressible via some VMStateDescription? If not, that should
be changed first.

Jan

  return 0;
  
  assigned_out:
 @@ -1503,6 +1517,7 @@ static int assigned_exitfn(struct PCIDevice *pci_dev)
  {
  AssignedDevice *dev = DO_UPCAST(AssignedDevice, dev, pci_dev);
  
 +unregister_savevm(dev-dev.qdev, pci-assign, dev);
  QLIST_REMOVE(dev, next);
  deassign_device(dev);
  free_assigned_device(dev);
 



signature.asc
Description: OpenPGP digital signature


KVM call agenda for Nov 16

2010-11-15 Thread Chris Wright
Please send in any agenda items you are interested in covering.

thanks,
-chris


Re: [PATCH] device-assignment: Register as un-migratable

2010-11-15 Thread Alex Williamson
On Mon, 2010-11-15 at 21:05 +0100, Jan Kiszka wrote:
 Am 15.11.2010 20:41, Alex Williamson wrote:
  Use register_device_unmigratable() to declare ourselves as
  non-migratable.
  
  Signed-off-by: Alex Williamson alex.william...@redhat.com
  ---
  
   hw/device-assignment.c |   15 +++
   1 files changed, 15 insertions(+), 0 deletions(-)
  
  diff --git a/hw/device-assignment.c b/hw/device-assignment.c
  index bde231d..cd93941 100644
  --- a/hw/device-assignment.c
  +++ b/hw/device-assignment.c
  @@ -1434,6 +1434,13 @@ static void 
  assigned_dev_unregister_msix_mmio(AssignedDevice *dev)
   dev-msix_table_page = NULL;
   }
   
  +/* This should never get called, but we're required to create a save_state
  + * handler or else the no_migrate flag will never be checked. */
  +static void assigned_save(QEMUFile* f, void *opaque)
  +{
  +abort();
  +}
  +
   static int assigned_initfn(struct PCIDevice *pci_dev)
   {
   AssignedDevice *dev = DO_UPCAST(AssignedDevice, dev, pci_dev);
  @@ -1490,6 +1497,13 @@ static int assigned_initfn(struct PCIDevice *pci_dev)
   
   assigned_dev_load_option_rom(dev);
   QLIST_INSERT_HEAD(&devs, dev, next);
  +
  +/* Assigned devices are not migratable, register a save
  + * state entry so that we can mark it unmigratable. */
  +register_savevm(&dev->dev.qdev, "pci-assign", 0, 0,
  +assigned_save, NULL, dev);
  +register_device_unmigratable(&dev->dev.qdev, "pci-assign", dev);
  +
 
 Isn't this expressible via some VMStateDescription? If not, that should
 be changed first.

Nope, save state handlers aren't allowed to fail.  I tried to fix it:

http://lists.nongnu.org/archive/html/qemu-devel/2010-11/msg00417.html

(you can find more discussion in other branches of that subject)  I've
succumbed to not getting that series in, so now I'm just trying to use
the code as it exists.  Thanks,

Alex



Re: [PATCHv4 15/15] Pass boot device list to firmware.

2010-11-15 Thread Blue Swirl
2010/11/15 Gleb Natapov g...@redhat.com:
 On Sun, Nov 14, 2010 at 10:50:13PM +, Blue Swirl wrote:
 On Sun, Nov 14, 2010 at 3:39 PM, Gleb Natapov g...@redhat.com wrote:
 
  Signed-off-by: Gleb Natapov g...@redhat.com
  ---
   hw/fw_cfg.c |   14 ++
   hw/fw_cfg.h |    4 +++-
   sysemu.h    |    1 +
   vl.c        |   51 +++
   4 files changed, 69 insertions(+), 1 deletions(-)
 
  diff --git a/hw/fw_cfg.c b/hw/fw_cfg.c
  index 7b9434f..f6a67db 100644
  --- a/hw/fw_cfg.c
  +++ b/hw/fw_cfg.c
  @@ -53,6 +53,7 @@ struct FWCfgState {
      FWCfgFiles *files;
      uint16_t cur_entry;
      uint32_t cur_offset;
  +    Notifier machine_ready;
   };
 
   static void fw_cfg_write(FWCfgState *s, uint8_t value)
  @@ -315,6 +316,15 @@ int fw_cfg_add_file(FWCfgState *s,  const char 
  *filename, uint8_t *data,
      return 1;
   }
 
  +static void fw_cfg_machine_ready(struct Notifier* n)
  +{
  +    uint32_t len;
  +    char *bootindex = get_boot_devices_list(&len);
  +
  +    fw_cfg_add_bytes(container_of(n, FWCfgState, machine_ready),
  +                     FW_CFG_BOOTINDEX, (uint8_t*)bootindex, len);
  +}
  +
   FWCfgState *fw_cfg_init(uint32_t ctl_port, uint32_t data_port,
                          target_phys_addr_t ctl_addr, target_phys_addr_t 
  data_addr)
   {
  @@ -343,6 +353,10 @@ FWCfgState *fw_cfg_init(uint32_t ctl_port, uint32_t 
  data_port,
      fw_cfg_add_i16(s, FW_CFG_MAX_CPUS, (uint16_t)max_cpus);
      fw_cfg_add_i16(s, FW_CFG_BOOT_MENU, (uint16_t)boot_menu);
 
  +
  +    s->machine_ready.notify = fw_cfg_machine_ready;
  +    qemu_add_machine_init_done_notifier(&s->machine_ready);
  +
      return s;
   }
 
  diff --git a/hw/fw_cfg.h b/hw/fw_cfg.h
  index 856bf91..4d61410 100644
  --- a/hw/fw_cfg.h
  +++ b/hw/fw_cfg.h
  @@ -30,7 +30,9 @@
 
   #define FW_CFG_FILE_FIRST       0x20
   #define FW_CFG_FILE_SLOTS       0x10
  -#define FW_CFG_MAX_ENTRY        (FW_CFG_FILE_FIRST+FW_CFG_FILE_SLOTS)
  +#define FW_CFG_FILE_LAST_SLOT   (FW_CFG_FILE_FIRST+FW_CFG_FILE_SLOTS)
  +#define FW_CFG_BOOTINDEX        (FW_CFG_FILE_LAST_SLOT + 1)
  +#define FW_CFG_MAX_ENTRY        FW_CFG_BOOTINDEX

 This should be
 #define FW_CFG_MAX_ENTRY        (FW_CFG_BOOTINDEX + 1)
 because the check is like this:
     if ((key & FW_CFG_ENTRY_MASK) >= FW_CFG_MAX_ENTRY) {
         s->cur_entry = FW_CFG_INVALID;

 Yeah, will fix.

 With that change, I got the bootindex passed to OpenBIOS:
 OpenBIOS for Sparc64
 Configuration device id QEMU version 1 machine id 0
 kernel cmdline
 CPUs: 1 x SUNW,UltraSPARC-IIi
 UUID: ----
 bootindex num_strings 1
 bootindex /p...@01fe/i...@5/dr...@1/d...@0

 The device path does not match exactly, but it's close:
 /p...@1fe,0/pci-...@5/i...@600/d...@0

 pbm->pci should be solvable by the patch at the end. Where in the spec
 is it allowed to abbreviate 1fe as 1fe,0? The spec allows dropping
 leading zeroes, but the TARGET_FMT_plx definition in targphys.h has 0 after
 %. I can define another one without leading zeroes. Can you suggest
 a name?

I think OpenBIOS for Sparc64 is not correct here, so it may be a bad
reference architecture. OBP on a real Ultra-5 used a path like this:
/p...@1f,0/p...@1,1/i...@3/d...@0,0

p...@1f,0 specifies the PCI host bridge at UPA bus port ID of 0x1f.
p...@1,1 specifies a PCI-PCI bridge.

 TARGET_FMT_lx is poisoned. As for ATA, there is no Open Firmware
 binding spec for it, so everyone does as they please. I based my
 implementation on what Open Firmware shows when running on qemu x86.
 pci-ata should be ide according to the PCI binding spec :)

Yes, for example there is no ATA in the Ultra-5 tree but in UltraAX it exists:
/p...@1f,4000/i...@3/a...@0,0/c...@0,0

 diff --git a/hw/apb_pci.c b/hw/apb_pci.c
 index c619112..643aa49 100644
 --- a/hw/apb_pci.c
 +++ b/hw/apb_pci.c
 @@ -453,6 +453,7 @@ static PCIDeviceInfo pbm_pci_host_info = {

  static SysBusDeviceInfo pbm_host_info = {
     .qdev.name = "pbm",
 +    .qdev.fw_name = "pci",

Perhaps the FW path should use device class names if no name is specified.

I'll try Sparc32 to see how this fits there.
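
A minimal sketch of the off-by-one discussed above, assuming only the macro values from the quoted hw/fw_cfg.h hunk. The helper key_accepted() is hypothetical and leaves out the FW_CFG_ENTRY_MASK step; it only mirrors the ">= FW_CFG_MAX_ENTRY" rejection test:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Values taken from the quoted hw/fw_cfg.h hunk. */
#define FW_CFG_FILE_FIRST      0x20
#define FW_CFG_FILE_SLOTS      0x10
#define FW_CFG_FILE_LAST_SLOT  (FW_CFG_FILE_FIRST + FW_CFG_FILE_SLOTS)
#define FW_CFG_BOOTINDEX       (FW_CFG_FILE_LAST_SLOT + 1)

/*
 * Hypothetical helper, not part of QEMU: a key is accepted only when it is
 * strictly below the limit, i.e. it is rejected when key >= max_entry.
 */
static bool key_accepted(uint16_t key, uint16_t max_entry)
{
    return key < max_entry;
}
```

With the limit set to FW_CFG_BOOTINDEX, key_accepted(FW_CFG_BOOTINDEX, FW_CFG_BOOTINDEX) is false, so the new entry could never be selected. With the limit set to FW_CFG_BOOTINDEX + 1 the same key is accepted.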


Re: [PATCH] device-assignment: register a reset function

2010-11-15 Thread Alex Williamson
On Mon, 2010-11-15 at 13:08 +0100, Jan Kiszka wrote:
 [Wrong list, it's not upstream yet. I'm migrating the thread to kvm.]
 
 Am 15.11.2010 12:33, Bernhard Kohl wrote:
  This is necessary because during reboot of a VM the assigned devices
  continue DMA transfers which causes memory corruption.
  
  Signed-off-by: Thomas Ostler thomas.ost...@nsn.com
  Signed-off-by: Bernhard Kohl bernhard.k...@nsn.com
  ---
  Sorry for the long delay. We finally added Alex's suggestions
  and rebased the patch.
  
  Thanks
  Bernhard
  ---
   hw/device-assignment.c |   12 
   1 files changed, 12 insertions(+), 0 deletions(-)
  
  diff --git a/hw/device-assignment.c b/hw/device-assignment.c
  index 5f5bde1..3f8de66 100644
  --- a/hw/device-assignment.c
  +++ b/hw/device-assignment.c
  @@ -1434,6 +1434,17 @@ static void assigned_dev_unregister_msix_mmio(AssignedDevice *dev)
   dev->msix_table_page = NULL;
   }
   
  +static void reset_assigned_device(DeviceState *dev)
  +{
  +PCIDevice *d = DO_UPCAST(PCIDevice, qdev, dev);
  +uint32_t conf;
  +
  +/* reset the bus master bit to avoid further DMA transfers */
  +conf = assigned_dev_pci_read_config(d, PCI_COMMAND, 2);
  +conf &= ~PCI_COMMAND_MASTER;
  +assigned_dev_pci_write_config(d, PCI_COMMAND, conf, 2);
 
 What about writing to /sys/bus/pci/devices/$DEVICE/reset? You probably
 still need to put the command word into the reset state (ie. no RMW in
 any case, just write 0), but the hardware should receive a reset as well
 - if it is capable of doing a function-level reset, but we should at
 least try.

libvirt doesn't currently give us write access to that file, so it'd
require changes up the stack too.  We could accomplish the same by
deassigning and reassigning the device through KVM, but that seems error
prone.  I'm not entirely convinced it's really necessary to go that far,
I expect there's some physical systems out there that don't reset the
device on a warm reset.  In any case, I think doing this much is at
least a good start.  Thanks,

Alex

  +}
  +
   static int assigned_initfn(struct PCIDevice *pci_dev)
   {
   AssignedDevice *dev = DO_UPCAST(AssignedDevice, dev, pci_dev);
  @@ -1544,6 +1555,7 @@ static PCIDeviceInfo assign_info = {
   .qdev.name    = "pci-assign",
   .qdev.desc    = "pass through host pci devices to the guest",
   .qdev.size    = sizeof(AssignedDevice),
  +.qdev.reset   = reset_assigned_device,
   .init         = assigned_initfn,
   .exit         = assigned_exitfn,
   .config_read  = assigned_dev_pci_read_config,
 
 Jan
 





Re: [PATCH] device-assignment: register a reset function

2010-11-15 Thread Jan Kiszka
Am 15.11.2010 21:38, Alex Williamson wrote:
 On Mon, 2010-11-15 at 13:08 +0100, Jan Kiszka wrote:
 [Wrong list, it's not upstream yet. I'm migrating the thread to kvm.]

 Am 15.11.2010 12:33, Bernhard Kohl wrote:
 This is necessary because during reboot of a VM the assigned devices
 continue DMA transfers which causes memory corruption.

 Signed-off-by: Thomas Ostler thomas.ost...@nsn.com
 Signed-off-by: Bernhard Kohl bernhard.k...@nsn.com
 ---
 Sorry for the long delay. We finally added Alex's suggestions
 and rebased the patch.

 Thanks
 Bernhard
 ---
  hw/device-assignment.c |   12 
  1 files changed, 12 insertions(+), 0 deletions(-)

 diff --git a/hw/device-assignment.c b/hw/device-assignment.c
 index 5f5bde1..3f8de66 100644
 --- a/hw/device-assignment.c
 +++ b/hw/device-assignment.c
 @@ -1434,6 +1434,17 @@ static void assigned_dev_unregister_msix_mmio(AssignedDevice *dev)
  dev->msix_table_page = NULL;
  }
  
 +static void reset_assigned_device(DeviceState *dev)
 +{
 +PCIDevice *d = DO_UPCAST(PCIDevice, qdev, dev);
 +uint32_t conf;
 +
 +/* reset the bus master bit to avoid further DMA transfers */
 +conf = assigned_dev_pci_read_config(d, PCI_COMMAND, 2);
 +conf &= ~PCI_COMMAND_MASTER;
 +assigned_dev_pci_write_config(d, PCI_COMMAND, conf, 2);

 What about writing to /sys/bus/pci/devices/$DEVICE/reset? You probably
 still need to put the command word into the reset state (ie. no RMW in
 any case, just write 0), but the hardware should receive a reset as well
 - if it is capable of doing a function-level reset, but we should at
 least try.
 
 libvirt doesn't currently give us write access to that file, so it'd
 require changes up the stack too.  We could accomplish the same by
 deassigning and reassigning the device through KVM, but that seems error
 prone.  I'm not entirely convinced it's really necessary to go that far,
 I expect there's some physical systems out there that don't reset the
 device on a warm reset.  In any case, I think doing this much is at
 least a good start.  Thanks,


OK, can be done on top of it - but should be done as most systems
perform a reset that is even stronger than pci_reset_function (I've seen
devices only recovering after warm reboot).

Still, I would suggest

assigned_dev_pci_write_config(d, PCI_COMMAND, 0, 2);

i.e. reset command word to specified reset state.

Jan
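
The difference between the two approaches in this thread can be sketched as follows. This is an illustration only: the helper names are hypothetical, and the bit values are the standard PCI command register bits as defined in pci_regs.h.

```c
#include <assert.h>
#include <stdint.h>

/* Standard PCI command register bits (same values as in pci_regs.h). */
#define PCI_COMMAND_IO      0x1
#define PCI_COMMAND_MEMORY  0x2
#define PCI_COMMAND_MASTER  0x4

/* Read-modify-write as in the patch: only bus mastering is disabled. */
static uint16_t clear_bus_master(uint16_t cmd)
{
    return cmd & (uint16_t)~PCI_COMMAND_MASTER;
}

/* Suggested alternative: put the whole command word into its reset state. */
static uint16_t reset_command_word(void)
{
    return 0;
}
```

Starting from a command word of 0x7 (I/O, memory and bus master enabled), the read-modify-write leaves 0x3, so I/O and memory decoding stay on, while the plain write of 0 disables all three.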





Re: [PATCH] device-assignment: Register as un-migratable

2010-11-15 Thread Jan Kiszka
Am 15.11.2010 21:25, Alex Williamson wrote:
 On Mon, 2010-11-15 at 21:05 +0100, Jan Kiszka wrote:
 Am 15.11.2010 20:41, Alex Williamson wrote:
 Use register_device_unmigratable() to declare ourselves as
 non-migratable.

 Signed-off-by: Alex Williamson alex.william...@redhat.com
 ---

  hw/device-assignment.c |   15 +++
  1 files changed, 15 insertions(+), 0 deletions(-)

 diff --git a/hw/device-assignment.c b/hw/device-assignment.c
 index bde231d..cd93941 100644
 --- a/hw/device-assignment.c
 +++ b/hw/device-assignment.c
 @@ -1434,6 +1434,13 @@ static void assigned_dev_unregister_msix_mmio(AssignedDevice *dev)
  dev->msix_table_page = NULL;
  }
  
 +/* This should never get called, but we're required to create a save_state
 + * handler or else the no_migrate flag will never be checked. */
 +static void assigned_save(QEMUFile* f, void *opaque)
 +{
 +abort();
 +}
 +
  static int assigned_initfn(struct PCIDevice *pci_dev)
  {
  AssignedDevice *dev = DO_UPCAST(AssignedDevice, dev, pci_dev);
 @@ -1490,6 +1497,13 @@ static int assigned_initfn(struct PCIDevice *pci_dev)
  
  assigned_dev_load_option_rom(dev);
  QLIST_INSERT_HEAD(&devs, dev, next);
 +
 +/* Assigned devices are not migratable, register a save
 + * state entry so that we can mark it unmigratable. */
 +register_savevm(&dev->dev.qdev, "pci-assign", 0, 0,
 +assigned_save, NULL, dev);
 +register_device_unmigratable(&dev->dev.qdev, "pci-assign", dev);
 +

 Isn't this expressible via some VMStateDescription? If not, that should
 be changed first.
 
 Nope, save state handlers aren't allowed to fail.  I tried to fix it:
 
 http://lists.nongnu.org/archive/html/qemu-devel/2010-11/msg00417.html
 
 (you can find more discussion in other branches of that subject)  I've
 succumbed to not getting that series in, so now I'm just trying to use
 the code as it exists.  Thanks,

Hmm, didn't get why you need that series for the purpose of no_migration
declaration. My point is:

struct VMStateDescription {
const char *name;
int version_id;
int minimum_version_id;
int minimum_version_id_old;
int no_migrate; /* or 'flags' */
...

so that you can specify an empty vmstate with that flag set, and you do
not need to register/unregister things via to-be-deprecated service
calls. Or am I missing some subtle detail?

Jan
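
A rough sketch of the idea proposed above, under loud assumptions: SketchVMState is a hypothetical, cut-down stand-in for VMStateDescription, the no_migrate field is the proposed flag rather than an existing one, and sketch_migration_allowed() is an invented check, not a QEMU function.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for VMStateDescription with the proposed flag. */
typedef struct SketchVMState {
    const char *name;
    int version_id;
    int no_migrate;   /* proposed in the thread; assumed here */
} SketchVMState;

/* An otherwise empty description that only carries the flag. */
static const SketchVMState sketch_assigned_device = {
    .name = "pci-assign",
    .no_migrate = 1,
};

/* Invented check: migration is refused if any description sets the flag. */
static bool sketch_migration_allowed(const SketchVMState *const *devs, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        if (devs[i]->no_migrate) {
            return false;
        }
    }
    return true;
}
```

The point of the design is that the device declares its status statically in its description, so no register/unregister calls are needed in the init and exit paths.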





Re: [PATCH] device-assignment: Register as un-migratable

2010-11-15 Thread Alex Williamson
On Mon, 2010-11-15 at 23:04 +0100, Jan Kiszka wrote:
 Am 15.11.2010 21:25, Alex Williamson wrote:
  On Mon, 2010-11-15 at 21:05 +0100, Jan Kiszka wrote:
  Am 15.11.2010 20:41, Alex Williamson wrote:
  Use register_device_unmigratable() to declare ourselves as
  non-migratable.
 
  Signed-off-by: Alex Williamson alex.william...@redhat.com
  ---
 
   hw/device-assignment.c |   15 +++
   1 files changed, 15 insertions(+), 0 deletions(-)
 
  diff --git a/hw/device-assignment.c b/hw/device-assignment.c
  index bde231d..cd93941 100644
  --- a/hw/device-assignment.c
  +++ b/hw/device-assignment.c
  @@ -1434,6 +1434,13 @@ static void assigned_dev_unregister_msix_mmio(AssignedDevice *dev)
   dev->msix_table_page = NULL;
   }
   
  +/* This should never get called, but we're required to create a save_state
  + * handler or else the no_migrate flag will never be checked. */
  +static void assigned_save(QEMUFile* f, void *opaque)
  +{
  +abort();
  +}
  +
   static int assigned_initfn(struct PCIDevice *pci_dev)
   {
   AssignedDevice *dev = DO_UPCAST(AssignedDevice, dev, pci_dev);
  @@ -1490,6 +1497,13 @@ static int assigned_initfn(struct PCIDevice *pci_dev)
   
   assigned_dev_load_option_rom(dev);
   QLIST_INSERT_HEAD(&devs, dev, next);
  +
  +/* Assigned devices are not migratable, register a save
  + * state entry so that we can mark it unmigratable. */
  +register_savevm(&dev->dev.qdev, "pci-assign", 0, 0,
  +assigned_save, NULL, dev);
  +register_device_unmigratable(&dev->dev.qdev, "pci-assign", dev);
  +
 
  Isn't this expressible via some VMStateDescription? If not, that should
  be changed first.
  
  Nope, save state handlers aren't allowed to fail.  I tried to fix it:
  
  http://lists.nongnu.org/archive/html/qemu-devel/2010-11/msg00417.html
  
  (you can find more discussion in other branches of that subject)  I've
  succumbed to not getting that series in, so now I'm just trying to use
  the code as it exists.  Thanks,
 
 Hmm, didn't get why you need that series for the purpose of no_migration
 declaration.
  My point is:

I was hoping you were going down the path I started, that we don't need
to special case non-migratable devices if we just allow save to return
an error.

  My point is:
 
 struct VMStateDescription {
 const char *name;
 int version_id;
 int minimum_version_id;
 int minimum_version_id_old;
 int no_migrate; /* or 'flags' */
 ...
 
 so that you can specify an empty vmstate with that flag set, and you do
 not need to register/unregister things via to-be-deprecated service
 calls. Or am I missing some subtle detail?

We don't seem to be enforcing that new drivers should use vmsd vs the
old style handling and there are some drivers that are currently too
complicated for vmsd to handle, which means no_migrate has to be
registered on the save state entry, not the vmsd.  So I could create a
dummy vmsd, then call register_device_unmigratable, to achieve roughly
the same effect.  Six of one, half dozen of the other... I can switch to
a vmsd dummy save if it's preferred.  Thanks,

Alex



Re: [PATCH] device-assignment: Register as un-migratable

2010-11-15 Thread Jan Kiszka
Am 15.11.2010 23:25, Alex Williamson wrote:
 On Mon, 2010-11-15 at 23:04 +0100, Jan Kiszka wrote:
 Am 15.11.2010 21:25, Alex Williamson wrote:
 On Mon, 2010-11-15 at 21:05 +0100, Jan Kiszka wrote:
 Am 15.11.2010 20:41, Alex Williamson wrote:
 Use register_device_unmigratable() to declare ourselves as
 non-migratable.

 Signed-off-by: Alex Williamson alex.william...@redhat.com
 ---

  hw/device-assignment.c |   15 +++
  1 files changed, 15 insertions(+), 0 deletions(-)

 diff --git a/hw/device-assignment.c b/hw/device-assignment.c
 index bde231d..cd93941 100644
 --- a/hw/device-assignment.c
 +++ b/hw/device-assignment.c
 @@ -1434,6 +1434,13 @@ static void assigned_dev_unregister_msix_mmio(AssignedDevice *dev)
  dev->msix_table_page = NULL;
  }
  
 +/* This should never get called, but we're required to create a save_state
 + * handler or else the no_migrate flag will never be checked. */
 +static void assigned_save(QEMUFile* f, void *opaque)
 +{
 +abort();
 +}
 +
  static int assigned_initfn(struct PCIDevice *pci_dev)
  {
  AssignedDevice *dev = DO_UPCAST(AssignedDevice, dev, pci_dev);
 @@ -1490,6 +1497,13 @@ static int assigned_initfn(struct PCIDevice *pci_dev)
  
  assigned_dev_load_option_rom(dev);
  QLIST_INSERT_HEAD(&devs, dev, next);
 +
 +/* Assigned devices are not migratable, register a save
 + * state entry so that we can mark it unmigratable. */
 +register_savevm(&dev->dev.qdev, "pci-assign", 0, 0,
 +assigned_save, NULL, dev);
 +register_device_unmigratable(&dev->dev.qdev, "pci-assign", dev);
 +

 Isn't this expressible via some VMStateDescription? If not, that should
 be changed first.

 Nope, save state handlers aren't allowed to fail.  I tried to fix it:

 http://lists.nongnu.org/archive/html/qemu-devel/2010-11/msg00417.html

 (you can find more discussion in other branches of that subject)  I've
 succumbed to not getting that series in, so now I'm just trying to use
 the code as it exists.  Thanks,

 Hmm, didn't get why you need that series for the purpose of no_migration
 declaration.
  My point is:
 
 I was hoping you were going down the path I started, that we don't need
 to special case non-migratable devices if we just allow save to return
 an error.

Ah, of course. Still, vmstate is the way to go IMO.

 
  My point is:

 struct VMStateDescription {
 const char *name;
 int version_id;
 int minimum_version_id;
 int minimum_version_id_old;
 int no_migrate; /* or 'flags' */
 ...

 so that you can specify an empty vmstate with that flag set, and you do
 not need to register/unregister things via to-be-deprecated service
 calls. Or am I missing some subtle detail?
 
 We don't seem to be enforcing that new drivers should use vmsd vs the
 old style handling

Don't we? We are enforcing qdev but not vmstate? Sounds silly. I think
new devices should all be expressible this way, no?

 and there are some drivers that are currently too
 complicated for vmsd to handle, which means no_migrate has to be
 registered on the save state entry, not the vmsd.  So I could create a
 dummy vmsd, then call register_device_unmigratable, to achieve roughly
 the same effect.  Six of one, half dozen of the other... I can switch to
 a vmsd dummy save if it's preferred.  Thanks,

IMHO, new designs should not use register_savevm anymore.

Jan





Re: PCI passthrough on Sony Vaio F11 laptop...

2010-11-15 Thread Jan Kiszka
Am 15.11.2010 22:55, Erik Brakkee wrote:
 Jan Kiszka wrote:
 Am 14.11.2010 14:21, Erik Brakkee wrote:
   
 Jan Kiszka wrote:
 
 Strange, should work. I would suggest to post your full kernel log,
 maybe there is some enlightening message hidden.

 I don't think it is a problem of your kernel version, but I'm able to
 pass through devices on OpenSUSE 11.3 with
 kernel-desktop-2.6.36-90.1.x86_64 from their kernel repository.

 Jan



 Exactly what server logs do you need. Is this only /var/log/messages or
 more? And do I need to set specific options there?
 Any other log files that you need?
  
 dmesg > log-file

   
 Before, generating these logs I will upgrade to a later kernel. As far
 as I can tell, that will still be a 2.6.34 kernel. Perhaps I should try
 the 2.6.36 kernel as well. Do you have the URL for the kernel repository
 I should use? (cannot find an obvious kernel repository in YAST2).
  
 http://download.opensuse.org/repositories/Kernel:/HEAD/openSUSE_11.3

 Jan


 I have attached the logs of /var/log/messages, dmesg, and qemu log
 (other.log), as well as the kernel config parameters (/proc/config.gz).
 I did the test with two kernels (see the tar.gz file): one kernel a
 2.6.34 and the other a 2.6.36 kernel.


Comparing the dmesg with my kernel log, I'm missing messages like

[0.023960] DMAR: Host address width 36
[0.023962] DMAR: DRHD base: 0x00fed9 flags: 0x0
[0.023968] IOMMU 0: reg_base_addr fed9 ver 1:0 cap c9008020e30272 ecap 
1000
[0.023970] DMAR: DRHD base: 0x00fed93000 flags: 0x1
[0.023974] IOMMU 1: reg_base_addr fed93000 ver 1:0 cap c9008020630272 ecap 
1000
[0.023976] DMAR: RMRR base: 0x00bf6e9000 end: 0x00bf6f
[0.023978] DMAR: No ATSR found

about the Intel DMAR (IOMMU) setup.

Are you sure that you have an Intel chipset with the required features?
And have you checked that VT-d is enabled in the BIOS (or however it
may be called there)?

Jan





[PATCH v2] device-assignment: Register as un-migratable

2010-11-15 Thread Alex Williamson
Use register_device_unmigratable() to declare ourselves as
non-migratable.

Signed-off-by: Alex Williamson alex.william...@redhat.com
---

 v2: Use dummy vmsd instead of dummy save_state

 hw/device-assignment.c |   10 ++
 1 files changed, 10 insertions(+), 0 deletions(-)

diff --git a/hw/device-assignment.c b/hw/device-assignment.c
index bde231d..154bb1a 100644
--- a/hw/device-assignment.c
+++ b/hw/device-assignment.c
@@ -1434,6 +1434,10 @@ static void assigned_dev_unregister_msix_mmio(AssignedDevice *dev)
     dev->msix_table_page = NULL;
 }
 
+static const VMStateDescription vmstate_assigned_device = {
+    .name = "pci-assign"
+};
+
 static int assigned_initfn(struct PCIDevice *pci_dev)
 {
     AssignedDevice *dev = DO_UPCAST(AssignedDevice, dev, pci_dev);
@@ -1490,6 +1494,11 @@ static int assigned_initfn(struct PCIDevice *pci_dev)
 
     assigned_dev_load_option_rom(dev);
     QLIST_INSERT_HEAD(&devs, dev, next);
+
+    /* Register a vmsd so that we can mark it unmigratable. */
+    vmstate_register(&dev->dev.qdev, 0, &vmstate_assigned_device, dev);
+    register_device_unmigratable(&dev->dev.qdev, "pci-assign", dev);
+
     return 0;
 
 assigned_out:
@@ -1503,6 +1512,7 @@ static int assigned_exitfn(struct PCIDevice *pci_dev)
 {
     AssignedDevice *dev = DO_UPCAST(AssignedDevice, dev, pci_dev);
 
+    vmstate_unregister(&dev->dev.qdev, &vmstate_assigned_device, dev);
     QLIST_REMOVE(dev, next);
     deassign_device(dev);
     free_assigned_device(dev);



[PATCH v3] device-assignment: Register as un-migratable

2010-11-15 Thread Alex Williamson
Use register_device_unmigratable() to declare ourselves as
non-migratable.

Signed-off-by: Alex Williamson alex.william...@redhat.com
---

 v3: Use .name instead of repeating pci-assign
 v2: Use dummy vmsd instead of dummy save_state

 hw/device-assignment.c |   11 +++
 1 files changed, 11 insertions(+), 0 deletions(-)

diff --git a/hw/device-assignment.c b/hw/device-assignment.c
index bde231d..41dec36 100644
--- a/hw/device-assignment.c
+++ b/hw/device-assignment.c
@@ -1434,6 +1434,10 @@ static void assigned_dev_unregister_msix_mmio(AssignedDevice *dev)
     dev->msix_table_page = NULL;
 }
 
+static const VMStateDescription vmstate_assigned_device = {
+    .name = "pci-assign"
+};
+
 static int assigned_initfn(struct PCIDevice *pci_dev)
 {
     AssignedDevice *dev = DO_UPCAST(AssignedDevice, dev, pci_dev);
@@ -1490,6 +1494,12 @@ static int assigned_initfn(struct PCIDevice *pci_dev)
 
     assigned_dev_load_option_rom(dev);
     QLIST_INSERT_HEAD(&devs, dev, next);
+
+    /* Register a vmsd so that we can mark it unmigratable. */
+    vmstate_register(&dev->dev.qdev, 0, &vmstate_assigned_device, dev);
+    register_device_unmigratable(&dev->dev.qdev,
+                                 vmstate_assigned_device.name, dev);
+
     return 0;
 
 assigned_out:
@@ -1503,6 +1513,7 @@ static int assigned_exitfn(struct PCIDevice *pci_dev)
 {
     AssignedDevice *dev = DO_UPCAST(AssignedDevice, dev, pci_dev);
 
+    vmstate_unregister(&dev->dev.qdev, &vmstate_assigned_device, dev);
     QLIST_REMOVE(dev, next);
     deassign_device(dev);
     free_assigned_device(dev);



Re: [PATCH v2] device-assignment: Register as un-migratable

2010-11-15 Thread Jan Kiszka
Am 16.11.2010 00:06, Alex Williamson wrote:
 Use register_device_unmigratable() to declare ourselves as
 non-migratable.
 
 Signed-off-by: Alex Williamson alex.william...@redhat.com
 ---
 
  v2: Use dummy vmsd instead of dummy save_state
 
  hw/device-assignment.c |   10 ++
  1 files changed, 10 insertions(+), 0 deletions(-)
 
 diff --git a/hw/device-assignment.c b/hw/device-assignment.c
 index bde231d..154bb1a 100644
 --- a/hw/device-assignment.c
 +++ b/hw/device-assignment.c
 @@ -1434,6 +1434,10 @@ static void assigned_dev_unregister_msix_mmio(AssignedDevice *dev)
  dev->msix_table_page = NULL;
  }
  
 +static const VMStateDescription vmstate_assigned_device = {
 +.name = "pci-assign"
 +};
 +
  static int assigned_initfn(struct PCIDevice *pci_dev)
  {
  AssignedDevice *dev = DO_UPCAST(AssignedDevice, dev, pci_dev);
 @@ -1490,6 +1494,11 @@ static int assigned_initfn(struct PCIDevice *pci_dev)
  
  assigned_dev_load_option_rom(dev);
  QLIST_INSERT_HEAD(&devs, dev, next);
 +
 +/* Register a vmsd so that we can mark it unmigratable. */
 +vmstate_register(&dev->dev.qdev, 0, &vmstate_assigned_device, dev);

Almost: You can register this vmstate description via assign_info
(.qdev.vmsd = ).

Jan

 +register_device_unmigratable(&dev->dev.qdev, "pci-assign", dev);
 +
  return 0;
  
  assigned_out:
 @@ -1503,6 +1512,7 @@ static int assigned_exitfn(struct PCIDevice *pci_dev)
  {
  AssignedDevice *dev = DO_UPCAST(AssignedDevice, dev, pci_dev);
  
 +vmstate_unregister(&dev->dev.qdev, &vmstate_assigned_device, dev);
  QLIST_REMOVE(dev, next);
  deassign_device(dev);
  free_assigned_device(dev);
 






Re: [PATCH v2] device-assignment: Register as un-migratable

2010-11-15 Thread Alex Williamson
On Tue, 2010-11-16 at 00:11 +0100, Jan Kiszka wrote:
 Am 16.11.2010 00:06, Alex Williamson wrote:
  Use register_device_unmigratable() to declare ourselves as
  non-migratable.
  
  Signed-off-by: Alex Williamson alex.william...@redhat.com
  ---
  
   v2: Use dummy vmsd instead of dummy save_state
  
   hw/device-assignment.c |   10 ++
   1 files changed, 10 insertions(+), 0 deletions(-)
  
  diff --git a/hw/device-assignment.c b/hw/device-assignment.c
  index bde231d..154bb1a 100644
  --- a/hw/device-assignment.c
  +++ b/hw/device-assignment.c
  @@ -1434,6 +1434,10 @@ static void assigned_dev_unregister_msix_mmio(AssignedDevice *dev)
   dev->msix_table_page = NULL;
   }
   
  +static const VMStateDescription vmstate_assigned_device = {
  +.name = "pci-assign"
  +};
  +
   static int assigned_initfn(struct PCIDevice *pci_dev)
   {
   AssignedDevice *dev = DO_UPCAST(AssignedDevice, dev, pci_dev);
  @@ -1490,6 +1494,11 @@ static int assigned_initfn(struct PCIDevice *pci_dev)
   
   assigned_dev_load_option_rom(dev);
   QLIST_INSERT_HEAD(&devs, dev, next);
  +
  +/* Register a vmsd so that we can mark it unmigratable. */
  +vmstate_register(&dev->dev.qdev, 0, &vmstate_assigned_device, dev);
 
 Almost: You can register this vmstate description via assign_info
 (.qdev.vmsd = ).

Only if you have some other suggestion on where to call
register_device_unmigratable rather than init.  qdev_init looks like
this:

int qdev_init(DeviceState *dev)
{
    int rc;

    assert(dev->state == DEV_STATE_CREATED);
    rc = dev->info->init(dev, dev->info);
    if (rc < 0) {
        qdev_free(dev);
        return rc;
    }
    qemu_register_reset(qdev_reset, dev);
    if (dev->info->vmsd) {
        vmstate_register_with_alias_id(dev, -1, dev->info->vmsd, dev,
                                       dev->instance_id_alias,
                                       dev->alias_required_for_version);
    }
    dev->state = DEV_STATE_INITIALIZED;
    return 0;
}

So the save state entry hasn't been inserted yet for me to attach the
no_migrate flag to from init :(

Alex
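
The ordering problem described above can be modelled with a small sketch. Everything here is hypothetical: SketchDevice and the sketch_* functions are invented stand-ins, and the assumption that marking a device fails when no save-state entry exists yet is taken from the description in this message, not from the real register_device_unmigratable() source.

```c
#include <assert.h>
#include <stdbool.h>

/* Invented model of a device and its save-state registration. */
typedef struct SketchDevice {
    bool vmsd_registered;      /* stands in for the save-state entry existing */
    bool no_migrate;           /* flag attached to that entry */
    bool mark_failed_in_init;  /* records what happened inside init */
} SketchDevice;

/* Assumed behaviour: the flag can only be attached to an existing entry. */
static int sketch_mark_unmigratable(SketchDevice *d)
{
    if (!d->vmsd_registered) {
        return -1;
    }
    d->no_migrate = true;
    return 0;
}

/* Device init callback that tries to mark itself unmigratable. */
static int sketch_init(SketchDevice *d)
{
    d->mark_failed_in_init = (sketch_mark_unmigratable(d) != 0);
    return 0;
}

/* Same ordering as the quoted qdev_init(): init first, vmsd afterwards. */
static int sketch_qdev_init(SketchDevice *d)
{
    int rc = sketch_init(d);
    if (rc < 0) {
        return rc;
    }
    d->vmsd_registered = true;
    return 0;
}
```

After sketch_qdev_init() returns, the entry exists but no_migrate is still unset, because the attempt made from init came too early. That is why the patch registers the vmsd explicitly inside init instead of through .qdev.vmsd.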



Re: [PATCH v2] device-assignment: Register as un-migratable

2010-11-15 Thread Jan Kiszka
Am 16.11.2010 00:24, Alex Williamson wrote:
 On Tue, 2010-11-16 at 00:11 +0100, Jan Kiszka wrote:
 Am 16.11.2010 00:06, Alex Williamson wrote:
 Use register_device_unmigratable() to declare ourselves as
 non-migratable.

 Signed-off-by: Alex Williamson alex.william...@redhat.com
 ---

  v2: Use dummy vmsd instead of dummy save_state

  hw/device-assignment.c |   10 ++
  1 files changed, 10 insertions(+), 0 deletions(-)

 diff --git a/hw/device-assignment.c b/hw/device-assignment.c
 index bde231d..154bb1a 100644
 --- a/hw/device-assignment.c
 +++ b/hw/device-assignment.c
 @@ -1434,6 +1434,10 @@ static void assigned_dev_unregister_msix_mmio(AssignedDevice *dev)
  dev->msix_table_page = NULL;
  }
  
 +static const VMStateDescription vmstate_assigned_device = {
 +.name = "pci-assign"
 +};
 +
  static int assigned_initfn(struct PCIDevice *pci_dev)
  {
  AssignedDevice *dev = DO_UPCAST(AssignedDevice, dev, pci_dev);
 @@ -1490,6 +1494,11 @@ static int assigned_initfn(struct PCIDevice *pci_dev)
  
  assigned_dev_load_option_rom(dev);
  QLIST_INSERT_HEAD(&devs, dev, next);
 +
 +/* Register a vmsd so that we can mark it unmigratable. */
 +vmstate_register(&dev->dev.qdev, 0, &vmstate_assigned_device, dev);

 Almost: You can register this vmstate description via assign_info
 (.qdev.vmsd = ).
 
 Only if you have some other suggestion on where to call
 register_device_unmigratable rather than init.  qdev_init looks like
 this:
 
 int qdev_init(DeviceState *dev)
 {
 int rc;
 
 assert(dev->state == DEV_STATE_CREATED);
 rc = dev->info->init(dev, dev->info);
 if (rc < 0) {
 qdev_free(dev);
 return rc;
 }
 qemu_register_reset(qdev_reset, dev);
 if (dev->info->vmsd) {
 vmstate_register_with_alias_id(dev, -1, dev->info->vmsd, dev,
 dev->instance_id_alias,
 dev->alias_required_for_version);
 }
 dev->state = DEV_STATE_INITIALIZED;
 return 0;
 }
 
 So the save state entry hasn't been inserted yet for me to attach the
 no_migrate flag to from init :(

I see. I think that's a sign register_device_unmigratable should be
obsoleted as well by introducing no_migrate to vmstate (one day, vmsd ==
NULL could replace this flag).

BTW, ivshmem could resolve its need for dynamic no_migrate by
introducing two device types: one that is migratable (ivshmem-master)
and one that isn't (normal peer devices).

Jan





Re: seabios 0.6.1 regression

2010-11-15 Thread Kevin O'Connor
On Mon, Nov 15, 2010 at 06:09:45PM +0200, Avi Kivity wrote:
 On 11/15/2010 05:49 PM, Avi Kivity wrote:
 On 11/15/2010 05:41 PM, Avi Kivity wrote:
 
 I think it's a miscompile.
 
 out/code16.o:
  1a4:   3e  ds
  1a5:   6c  insb   (%dx),%es:(%edi)
 
 Note no 66 prefix.
 
 
 It isn't, that was random crap.  All the insb() code is 32-bit.
 
 
 Rewriting it to use inb / stos works (jecxz ; insb; loop doesn't) so
 it looks like a kernel bug in insb emulation.

Ughh.  I can revert that change on stable-0.6.1 if needed.  It sounds
like we really do want this on the main branch though for the speed
benefit it provides.

-Kevin


[PATCH -v3] Monitor command: x-gpa2hva, translate guest physical address to host virtual address

2010-11-15 Thread Huang Ying
Author: Max Asbock masb...@linux.vnet.ibm.com

Add command x-gpa2hva to translate guest physical address to host
virtual address. Because gpa to hva translation is not consistent,
this command is only used for debugging.

The x-gpa2hva command provides one step in a chain of translations from
guest virtual to guest physical to host virtual to host physical. Host
physical is then used to inject a machine check error. As a
consequence the HWPOISON code on the host and the MCE injection code
in qemu-kvm are exercised.

v3:

- Rename to x-gpa2hva
- Remove QMP support, because gpa2hva is not consistent

v2:

- Add QMP support

Signed-off-by: Max Asbock masb...@linux.vnet.ibm.com
Signed-off-by: Jiajia Zheng jiajia.zh...@intel.com
Signed-off-by: Huang Ying ying.hu...@intel.com
---
 hmp-commands.hx |   15 +++
 monitor.c   |   22 ++
 2 files changed, 37 insertions(+)

--- a/monitor.c
+++ b/monitor.c
@@ -2272,6 +2272,28 @@ static void do_inject_mce(Monitor *mon,
 }
 #endif
 
+static void do_gpa2hva_print(Monitor *mon, const QObject *data)
+{
+    QInt *qint;
+
+    qint = qobject_to_qint(data);
+    monitor_printf(mon, "0x%lx\n", (unsigned long)qint->value);
+}
+
+static int do_gpa2hva(Monitor *mon, const QDict *qdict, QObject **ret_data)
+{
+    target_phys_addr_t paddr;
+    target_phys_addr_t size = TARGET_PAGE_SIZE;
+    void *vaddr;
+
+    paddr = qdict_get_int(qdict, "addr");
+    vaddr = cpu_physical_memory_map(paddr, &size, 0);
+    cpu_physical_memory_unmap(vaddr, size, 0, 0);
+    *ret_data = qobject_from_jsonf("%ld", (unsigned long)vaddr);
+
+    return 0;
+}
+
 static int do_getfd(Monitor *mon, const QDict *qdict, QObject **ret_data)
 {
     const char *fdname = qdict_get_str(qdict, "fdname");
--- a/hmp-commands.hx
+++ b/hmp-commands.hx
@@ -293,6 +293,21 @@ Start gdbserver session (default @var{po
 ETEXI
 
     {
+        .name       = "x-gpa2hva",
+        .args_type  = "fmt:/,addr:l",
+        .params     = "/fmt addr",
+        .help       = "translate guest physical 'addr' to host virtual address, only for debugging",
+        .user_print = do_gpa2hva_print,
+        .mhandler.cmd_new = do_gpa2hva,
+    },
+
+STEXI
+@item x-gpa2hva @var{addr}
+@findex x-gpa2hva
+Translate guest physical @var{addr} to host virtual address, only for debugging.
+ETEXI
+
+    {
         .name       = "x",
         .args_type  = "fmt:/,addr:l",
         .params     = "/fmt addr",
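
For readers unfamiliar with the gpa to hva step described in the commit
message, the following is a rough, self-contained sketch of the arithmetic
involved. It does not use QEMU's memory API: ToySlot and toy_gpa2hva() are
hypothetical names, and the model assumes guest RAM is described by a small
table of contiguous ranges, each backed by one host buffer. The returned
pointer is only meaningful while that table does not change, which is one way
to read "not consistent" above.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical slot: a contiguous guest-physical range backed by host memory. */
typedef struct ToySlot {
    uint64_t guest_phys_addr;  /* first guest physical address in the slot */
    uint64_t size;             /* length of the slot in bytes */
    uint8_t *host_base;        /* host virtual address backing the slot */
} ToySlot;

/*
 * Translate a guest physical address to a host virtual address by finding
 * the slot that contains it and applying the offset within the slot.
 * Returns NULL when no slot covers the address.
 */
static void *toy_gpa2hva(const ToySlot *slots, size_t n, uint64_t gpa)
{
    for (size_t i = 0; i < n; i++) {
        const ToySlot *s = &slots[i];
        if (gpa >= s->guest_phys_addr && gpa - s->guest_phys_addr < s->size) {
            return s->host_base + (gpa - s->guest_phys_addr);
        }
    }
    return NULL;
}
```

The result of such a lookup is what would then be fed into the next step of
the chain (host virtual to host physical) before injecting the error.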




Re: [PATCHv4 15/15] Pass boot device list to firmware.

2010-11-15 Thread Kevin O'Connor
On Mon, Nov 15, 2010 at 03:36:25PM +0200, Gleb Natapov wrote:
 On Mon, Nov 15, 2010 at 08:26:35AM -0500, Kevin O'Connor wrote:
  On Mon, Nov 15, 2010 at 09:40:08AM +0200, Gleb Natapov wrote:
   On Sun, Nov 14, 2010 at 10:40:33PM -0500, Kevin O'Connor wrote:
Why not just return a newline separated list that is null terminated?

   Doing it like this will needlessly complicate firmware side. How do you
   know how much memory to allocate before reading device list?
  
  My preference would be for the size to be exposed via the
  QEMU_CFG_FILE_DIR selector.  (My preference would be for all objects
  in fw_cfg to have entries in QEMU_CFG_FILE_DIR describing their size
  in a reliable manner.)
  
 Would the interface suggested by Blue work for you? The one with two
 fw_cfg ids: BOOTINDEX_LEN for len and BOOTINDEX_DATA for device list. I

I dislike how different fw_cfg objects pass the length in different
ways (eg, QEMU_CFG_E820_TABLE passes length as first 4 bytes).  This
is a common problem - I'd prefer if we could adopt one uniform way of
passing length.  I think QEMU_CFG_FILE_DIR solves this problem well.
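
As a rough illustration of what "one uniform way of passing length" buys the
firmware, here is a sketch of a directory lookup. It is a simplified model,
not the actual fw_cfg layout: ToyCfgFile, toy_cfg_find(), the entry names and
the selector values are all hypothetical. The idea is that the firmware finds
an entry by name, learns its size and selector from the directory, and only
then allocates and reads.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Simplified directory entry: size, selector and name of one object. */
typedef struct ToyCfgFile {
    uint32_t size;     /* length of the object in bytes */
    uint16_t select;   /* selector used to read the object's data */
    char name[56];     /* NUL-padded name */
} ToyCfgFile;

/* Find a directory entry by name; returns NULL if it is not present. */
static const ToyCfgFile *toy_cfg_find(const ToyCfgFile *dir, size_t count,
                                      const char *name)
{
    for (size_t i = 0; i < count; i++) {
        if (strncmp(dir[i].name, name, sizeof(dir[i].name)) == 0) {
            return &dir[i];
        }
    }
    return NULL;
}
```

With this shape, every object answers the "how much memory do I allocate?"
question the same way, regardless of what the object contains.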

I also have an ulterior motive here.  If the boot order is exposed as
a newline separated list via an entry in QEMU_CFG_FILE_DIR, then this
becomes free for coreboot users as well.  (On coreboot, the boot order
could be placed in a file in flash with no change to the seabios
code.)
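
A minimal sketch of how firmware might consume such a list, assuming the
whole file has already been read into a writable, NUL-terminated buffer. The
function name toy_split_boot_order() and the entry strings in the usage note
are made up for illustration; nothing here is taken from seabios.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Split a NUL-terminated, newline-separated list in place.
 * Each '\n' is overwritten with '\0' and entries[] receives a pointer to
 * the start of every non-empty line. Returns the number of entries stored,
 * which never exceeds max_entries.
 */
static size_t toy_split_boot_order(char *list, char **entries,
                                   size_t max_entries)
{
    size_t n = 0;
    char *p = list;

    while (*p != '\0' && n < max_entries) {
        char *nl = strchr(p, '\n');
        if (nl != NULL) {
            *nl = '\0';          /* terminate the current line */
        }
        if (*p != '\0') {
            entries[n++] = p;    /* skip empty lines */
        }
        if (nl == NULL) {
            break;               /* last line had no trailing newline */
        }
        p = nl + 1;
    }
    return n;
}
```

Given the buffer "first-device\nsecond-device\n", the function stores two
pointers and returns 2. The same code works whether the buffer came from
fw_cfg or from a file in flash, which is the portability argument made above.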

-Kevin


RE: [PATCH] KVM: VMX: Inform user about INTEL_TXT dependency

2010-11-15 Thread Wang, Shane
Avi Kivity wrote:
 On 11/14/2010 12:41 PM, Jan Kiszka wrote:
 Am 14.11.2010 11:30, Avi Kivity wrote:
  On 11/14/2010 11:18 AM, Jan Kiszka wrote:
  From: Jan Kiszka jan.kis...@siemens.com
 
  Without CONFIG_INTEL_TXT, the user must not enable this feature
  in the BIOS. Otherwise, KVM will not work. Explain this
  dependency via a kernel log message.
 
  Signed-off-by: Jan Kiszka jan.kis...@siemens.com
  ---
arch/x86/kvm/vmx.c |7 ++-
1 files changed, 6 insertions(+), 1 deletions(-)
 
  diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
  index 9367abc..ebafd57 100644
  --- a/arch/x86/kvm/vmx.c
  +++ b/arch/x86/kvm/vmx.c
  @@ -1306,8 +1306,13 @@ static __init int vmx_disabled_by_bios(void)
                && tboot_enabled())
                return 1;
            if (!(msr & FEATURE_CONTROL_VMXON_ENABLED_OUTSIDE_SMX)
  -             && !tboot_enabled())
  +             && !tboot_enabled()) {
  +#ifndef CONFIG_INTEL_TXT
  +             printk(KERN_INFO "kvm: if TXT is enabled in the bios, "
  +                    "kvm depends on CONFIG_INTEL_TXT\n");
  +#endif
                return 1;
  +     }
    }
 
Why do we need this?
Enabling TXT in the BIOS does not mean TXT has been launched, only that it is
available. tboot_enabled() means TXT has been launched, and without
CONFIG_INTEL_TXT, tboot_enabled() is always 0.
If you enable VT in the BIOS, FEATURE_CONTROL_VMXON_ENABLED_OUTSIDE_SMX is set.


 
  Maybe reword to an instruction?
 
  Something like
 
kvm: TXT enabled in the bios.  Either disable TXT in the bios, or
  enable CONFIG_INTEL_TXT in your kernel.
 
 
 I always get an aching head when thinking about these dependencies:
 Does FEATURE_CONTROL_LOCKED &&
 !FEATURE_CONTROL_VMXON_ENABLED_OUTSIDE_SMX imply that the BIOS uses
 TXT?
No

Or could it also mean that it just disabled VT-x explicitly?
 
 Probably the latter, at least that's what we took it to mean before it
 was renamed to that long string.
Yes, it is.

 
 As CONFIG_INTEL_TXT is off, we do not know if
 tboot_enabled is off as well.

If CONFIG_INTEL_TXT is off, tboot_enabled() must be off.


 
 I guess, if FEATURE_CONTROL_VMXON_ENABLED_INSIDER_SMX_YADA_YADA_YADA
 is set, then the bios wants us to enable TXT. 
Yes. In most cases, if TXT is enabled in the BIOS the bit is set; otherwise it
is clear.
FEATURE_CONTROL_VMXON_ENABLED_INSIDE_SMX and
FEATURE_CONTROL_VMXON_ENABLED_OUTSIDE_SMX can be set at the same time. That
doesn't mean the BIOS wants us to enable TXT.
Here we just check the following logic from the spec for the feature control MSR:

- Bit 1 enables VMXON in SMX operation. If this bit is clear, execution of
VMXON in SMX operation causes a general-protection exception.
- Bit 2 enables VMXON outside SMX operation. If this bit is clear, execution of
VMXON outside SMX operation causes a general-protection exception.

 But if both bits are
 clear, the bios really doesn't want us to play with vmx.
Yes it is.
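
To summarize the check being discussed, here is a small standalone sketch of
the decision as a pure function. It mirrors the logic quoted from the spec
(bit 0 is the lock bit, bit 1 enables VMXON in SMX operation, bit 2 enables
VMXON outside SMX operation), but it is not the kernel code: the TOY_* macro
names and toy_vmx_disabled_by_bios() are made up here, and the MSR value and
the tboot state are passed in as plain arguments.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Bit positions in the feature control MSR; the TOY_ names are illustrative. */
#define TOY_FC_LOCKED                    (1ULL << 0)
#define TOY_FC_VMXON_ENABLED_INSIDE_SMX  (1ULL << 1)
#define TOY_FC_VMXON_ENABLED_OUTSIDE_SMX (1ULL << 2)

/*
 * Return true when VMXON would fault for the current launch mode.
 * tboot_enabled stands for "TXT has been launched"; it is always false
 * when the kernel is built without CONFIG_INTEL_TXT.
 */
static bool toy_vmx_disabled_by_bios(uint64_t msr, bool tboot_enabled)
{
    if (!(msr & TOY_FC_LOCKED)) {
        return false;   /* not locked: the OS can still set the enable bits */
    }
    if (tboot_enabled) {
        return !(msr & TOY_FC_VMXON_ENABLED_INSIDE_SMX);
    }
    return !(msr & TOY_FC_VMXON_ENABLED_OUTSIDE_SMX);
}
```

The case the proposed message targets is a locked MSR with only bit 1 set
while tboot is not running: the function returns true, and the sketch cannot
tell a TXT-only BIOS setup from VT-x being switched off, which is the
ambiguity raised in the thread.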


 But it
 would be good to get Intel guidance before we pass our confusion on
 to users. 

Thanks.
Shane


Bounty offered to add Cloud Storage Mount option to a VM using Scality Droplet

2010-11-15 Thread Marc Villemade
Hi,

First of all, if this is not the correct list for this kind of information,
please forgive me; I'd be very appreciative if you could let me know which
one I should post this on.

I'm with Scality, developer of an object-based storage software platform, 
called Scality RING. A month ago, at SNIA's Software Developer Conference, we 
announced our Open-Source program, SCOP, the immediate release of our first 
open-source library, Scality Droplet, and the launch of a bounty program 
offering contributing developers grants from a $100,000 fund. More information 
on http://scop.scality.com/

One of the bounties we have designed concerns adding the option to mount an 
object storage system on a VM. The bounty for this is $10,000.

The goal is to provide a VM running in KVM with the ability to mount an object 
storage system from a public cloud (like Amazon S3 or the many others popping 
up around the globe) or a private, on-premise object-based storage using 
Scality Droplet library.
A virtual machine is usually set up with virtual network interfaces, CPU and 
memory. A number of virtual disks can be configured, and with this extension a 
new type of virtual disk becomes available: cloud-based storage. A mix of 
regular local storage and cloud-based storage is possible: for example, a VM 
with a root partition on local disks and a /mnt partition on the cloud with a 
20GB maximum local cache.

For more information on the bounty, please have a look at http://bit.ly/aF4WOw
You can register for the bounty on the site as well at this address: 
http://bit.ly/9ZNxTk

Please feel free to drop us an email at dr...@scality.com if you have more 
questions, or to continue the discussion on this mailing list. We welcome 
comments of any kind on our initiative.

Read the original announcement at http://bit.ly/bkNcbH

-Marc Villemade
Community Manager
Scality US






MQ performance on other cards (cxgb3)

2010-11-15 Thread Krishna Kumar2
I had sent this mail to Michael last week - he agrees that I should
share this information on the list:

On latest net-next-2.6, virtio-net (guest-host) results are:
________________________________________________________________________
                          SQ vs MQ (#txqs=8)
#    BW1     BW2     (%)      CPU1  CPU2   (%)       RCPU1  RCPU2  (%)
________________________________________________________________________
1    105774  112256  (6.1)    257   255    (-.7)     532    549    (3.1)
2    20842   30674   (47.1)   107   150    (40.1)    208    279    (34.1)
4    22500   31953   (42.0)   241   409    (69.7)    467    619    (32.5)
8    22416   44507   (98.5)   477   1039   (117.8)   960    1459   (51.9)
16   22605   45372   (100.7)  905   2060   (127.6)   1895   2962   (56.3)
24   23192   44201   (90.5)   1360  3028   (122.6)   2833   4437   (56.6)
32   23158   43394   (87.3)   1811  3957   (118.4)   3770   5936   (57.4)
40   23322   42550   (82.4)   2276  4986   (119.0)   4711   7417   (57.4)
48   23564   41931   (77.9)   2757  5966   (116.3)   5653   8896   (57.3)
64   23949   41092   (71.5)   3788  7898   (108.5)   7609   11826  (55.4)
80   23256   41343   (77.7)   4597  9887   (115.0)   9503   14801  (55.7)
96   23310   40645   (74.3)   5588  11758  (110.4)   11381  17761  (56.0)
128  24095   41082   (70.5)   7587  15574  (105.2)   15029  23716  (57.8)
________________________________________________________________________
Avg:  BW: (58.3)  CPU: (110.8)  RCPU: (55.9)
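
For reference, the per-row (%) columns are consistent with a plain relative
change of the MQ value over the SQ value. The helper below is only a sketch
of that arithmetic; the name toy_percent_change() is made up, and it makes no
claim about how the Avg line was computed.

```c
#include <assert.h>

/* Relative change from 'before' to 'after', expressed in percent. */
static double toy_percent_change(double before, double after)
{
    assert(before != 0.0);
    return (after - before) / before * 100.0;
}
```

For the 16-session row, toy_percent_change(22605, 45372) gives roughly 100.7
for BW and toy_percent_change(905, 2060) gives roughly 127.6 for CPU, matching
the table.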

It's true that average CPU% on guest is almost double that of the BW
improvement. But I don't think this is due to the patch (driver does no
synchronization, etc). To compare MQ vs SQ on a 10G card, I ran the
same test from host to remote host across cxgb3. The results are
somewhat similar:

(I changed cxgb_open on the client system to:
    netif_set_real_num_tx_queues(dev, 1);
    err = netif_set_real_num_rx_queues(dev, 1);
to simulate single queue (SQ))
______________________________________________
            cxgb3 SQ vs cxgb3 MQ
#    BW1    BW2    (%)     CPU1  CPU2  (%)
______________________________________________
1    8301   8315   (.1)    5     4.66  (-6.6)
2    9395   9380   (-.1)   16    16    (0)
4    9411   9414   (0)     33    26    (-21.2)
8    9411   9398   (-.1)   60    62    (3.3)
16   9412   9413   (0)     116   117   (.8)
24   9442   9963   (5.5)   179   198   (10.6)
32   10031  10025  (0)     230   249   (8.2)
40   9953   10024  (.7)    300   312   (4.0)
48   10002  10015  (.1)    351   376   (7.1)
64   10022  10024  (0)     494   515   (4.2)
80   8894   10011  (12.5)  537   630   (17.3)
96   8465   9907   (17.0)  612   749   (22.3)
128  7541   9617   (27.5)  760   989   (30.1)
______________________________________________
Avg: BW: (3.8) CPU: (14.8)

(Each case runs once for 60 secs)

The BW increased modestly but CPU increased much more. I assume
the change I made above to convert the driver from MQ to SQ is not
incorrect.

Thanks,

- KK



Re: [PATCHv4 15/15] Pass boot device list to firmware.

2010-11-15 Thread Gleb Natapov
On Mon, Nov 15, 2010 at 09:52:19PM -0500, Kevin O'Connor wrote:
 On Mon, Nov 15, 2010 at 03:36:25PM +0200, Gleb Natapov wrote:
  On Mon, Nov 15, 2010 at 08:26:35AM -0500, Kevin O'Connor wrote:
   On Mon, Nov 15, 2010 at 09:40:08AM +0200, Gleb Natapov wrote:
On Sun, Nov 14, 2010 at 10:40:33PM -0500, Kevin O'Connor wrote:
 Why not just return a newline separated list that is null terminated?
 
Doing it like this will needlessly complicate firmware side. How do you
know how much memory to allocate before reading device list?
   
   My preference would be for the size to be exposed via the
   QEMU_CFG_FILE_DIR selector.  (My preference would be for all objects
   in fw_cfg to have entries in QEMU_CFG_FILE_DIR describing their size
   in a reliable manner.)
   
  Would the interface suggested by Blue work for you? The one with two
  fw_cfg ids: BOOTINDEX_LEN for len and BOOTINDEX_DATA for device list. I
 
 I dislike how different fw_cfg objects pass the length in different
 ways (eg, QEMU_CFG_E820_TABLE passes length as first 4 bytes).  This
 is a common problem - I'd prefer if we could adopt one uniform way of
 passing length.  I think QEMU_CFG_FILE_DIR solves this problem well.

Looking at the available fw cfg options I see that _SIZE/_DATA is also a
common pattern. The problem with QEMU_CFG_FILE_DIR is that we have very
few available slots right now. If we are going to require everything to
use it, we had better grow the number of available slots considerably now,
while it is easily done (no option is defined above the file slots yet).

I personally have no preference one way or the other. Blue, are you
OK with using QEMU_CFG_FILE_DIR?

 I also have an ulterior motive here.  If the boot order is exposed as
 a newline separated list via an entry in QEMU_CFG_FILE_DIR, then this
 becomes free for coreboot users as well.  (On coreboot, the boot order
 could be placed in a file in flash with no change to the seabios
 code.)
 
You can define a get_boot_order() function and implement it differently
for qemu and coreboot. For coreboot it will be a one-liner: just call
cbfs_copyfile(bootorder). BTW, why is newline separation important?

--
Gleb.