Re: [Qemu-devel] [RFC/PATCH] Fix guest OS panic when 64bit BAR is present

2012-02-01 Thread Alexey Korolev
On 01/02/12 20:04, Michael S. Tsirkin wrote:
 On Wed, Feb 01, 2012 at 06:44:42PM +1300, Alexey Korolev wrote:
 On 31/01/12 22:43, Avi Kivity wrote:
 On 01/31/2012 11:40 AM, Avi Kivity wrote:
 On 01/27/2012 06:42 AM, Alexey Korolev wrote:
 On 27/01/12 04:12, Avi Kivity wrote:
 On 01/26/2012 04:36 PM, Michael S. Tsirkin wrote:
 On Thu, Jan 26, 2012 at 03:52:27PM +0200, Avi Kivity wrote:
 On 01/26/2012 11:14 AM, Michael S. Tsirkin wrote:
 On Wed, Jan 25, 2012 at 06:46:03PM +1300, Alexey Korolev wrote:
 Hi, 
 In this post
 http://lists.gnu.org/archive/html/qemu-devel/2011-12/msg03171.html I've
 mentioned the issues seen when a 64bit PCI BAR is present and a 32bit
 address range is selected for it.
 The issue affects all recent qemu releases and all
 old and recent guest Linux kernel versions.

 We've done some investigation. Let me explain what happens.
 Assume we have a 64bit BAR of size 32MB mapped at [0xF0000000 -
 0xF2000000].

 When a Linux guest starts it does PCI bus enumeration.
 The OS enumerates 64bit BARs using the following procedure:
 1. Write all FF's to the lower half of the 64bit BAR
 2. Write the address back to the lower half of the 64bit BAR
 3. Write all FF's to the higher half of the 64bit BAR
 4. Write the address back to the higher half of the 64bit BAR

 Linux code is here:
 http://lxr.linux.no/#linux+v3.2.1/drivers/pci/probe.c#L149

 What does this mean for qemu?

 At step 1, qemu's pci_default_write_config() receives all FF's for the
 lower part of the 64bit BAR. It then applies the mask and converts the
 value to all FF's - size + 1 (0xFE000000 if the size is 32MB).
 Then pci_bar_address() checks whether the BAR address is valid. Since it
 is a 64bit BAR it reads 0xFE000000 - this address is valid. So qemu
 updates the topology and sends a request to update the mappings in KVM
 with the new range for the 64bit BAR, 0xFE000000 - 0xFFFFFFFF. This
 usually means a kernel panic on boot if there is another mapping in the
 0xFE000000 - 0xFFFFFFFF range, which is quite common.
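
 For reference, a minimal sketch (plain C, not qemu code) of the size
 arithmetic described above, assuming the 32MB BAR from the example:

 #include <stdint.h>
 #include <stdio.h>

 int main(void)
 {
     uint32_t size = 0x02000000;            /* 32MB BAR */
     uint32_t mask = ~(size - 1);           /* address bits the BAR decodes */
     uint32_t readback = 0xFFFFFFFF & mask; /* guest reads back 0xFE000000 */
     uint32_t decoded = ~readback + 1;      /* guest recovers 0x02000000 */
     printf("readback=0x%08X size=0x%08X\n", readback, decoded);
     return 0;
 }

 Until the real address is written back, 0xFE000000 is exactly the transient
 value qemu sees and, as described above, maps.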
 Do you know why it panics? As far as I can see
 from code at
 http://lxr.linux.no/#linux+v2.6.35.9/drivers/pci/probe.c#L162

  171  pci_read_config_dword(dev, pos, &l);
  172  pci_write_config_dword(dev, pos, l | mask);
  173  pci_read_config_dword(dev, pos, &sz);
  174  pci_write_config_dword(dev, pos, l);

 BAR is restored: what triggers an access between lines 172 and 174?
 Random interrupt reading the time, likely.
 Weird, what the backtrace shows is init, unrelated
 to interrupts.

 It's a bug then.  qemu doesn't undo the mapping correctly.

 If you have clear instructions, I'll try to reproduce it.

 Well the easiest way to reproduce this is:


 1. Get a kernel bzImage (version < 2.6.36)
 2. Apply patch to ivshmem.c

 ---
 diff --git a/hw/ivshmem.c b/hw/ivshmem.c
 index 1aa9e3b..71f8c21 100644
 --- a/hw/ivshmem.c
 +++ b/hw/ivshmem.c
 @@ -341,7 +341,7 @@ static void create_shared_memory_BAR(IVShmemState *s, int fd) {
      memory_region_add_subregion(&s->bar, 0, &s->ivshmem);
  
      /* region for shared memory */
 -    pci_register_bar(&s->dev, 2, PCI_BASE_ADDRESS_SPACE_MEMORY, &s->bar);
 +    pci_register_bar(&s->dev, 2, PCI_BASE_ADDRESS_SPACE_MEMORY | PCI_BASE_ADDRESS_MEM_TYPE_64, &s->bar);
  }
  
  static void close_guest_eventfds(IVShmemState *s, int posn)
 ---

 3. Launch qemu with a command like this:

 /usr/bin/qemu-system-x86_64 -M pc-0.14 -enable-kvm -m 2048 -smp 
 1,socket=1,cores=1,threads=1 -name centos54 -uuid
 d37daefd-75bd-4387-cee1-7f0b153ee2af -nodefconfig -nodefaults -chardev
 socket,id=charmonitor,path=/var/lib/libvirt/qemu/centos54.monitor,server,nowait
  -mon chardev=charmonitor,id=monitor,mode=readline -rtc
 base=utc -drive 
 file=/dev/dock200-1/centos54,if=none,id=drive-ide0-0-0,format=raw -device
 ide-drive,bus=ide.0,unit=0,drive=drive-ide0-0-0,id=ide0-0-0,bootindex=1 
 -drive
 file=/data/CentOS-5.4-x86_64-bin-DVD.iso,if=none,media=cdrom,id=drive-ide0-1-0,readonly=on,format=raw
  -device
 ide-drive,bus=ide.1,unit=0,drive=drive-ide0-1-0,id=ide0-1-0 -chardev 
 file,id=charserial0,path=/home/alexey/cent54.log -device
 isa-serial,chardev=charserial0,id=serial0 -usb -vnc 127.0.0.1:0 -k en-us 
 -vga cirrus -device
 virtio-balloon-pci,id=balloon0,bus=pci.0,multifunction=on,addr=0x4.0x0 
 --device ivshmem,size=32,shm=shm -kernel bzImage -append
 root=/dev/hda1 console=ttyS0,115200n8 console=tty0

 in other words add: --device ivshmem,size=32,shm=shm

 That is all.

 Note: it won't necessarily produce a panic message; on some kernels it just
 hangs or reboots.

 In fact qemu segfaults for me, since registering a ram region not on a
 page boundary is broken.  This happens when the ivshmem bar is split by
 the hpet region, which is less than a page long.
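
 A quick way to see the alignment problem described here, using the
 addresses from this thread (the HPET register block is 1KB, so carving it
 out of the ivshmem RAM region leaves a piece starting on a non-page
 boundary); an illustrative sketch, not qemu code:

 #include <stdint.h>
 #include <stdio.h>

 int main(void)
 {
     uint64_t hpet_end = 0xFED00000 + 0x400; /* end of the 1KB HPET block */
     uint64_t page = 4096;
     printf("split piece starts at 0x%llx, page aligned: %s\n",
            (unsigned long long)hpet_end,
            hpet_end % page == 0 ? "yes" : "no");
     return 0;
 }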

 Happens only with qemu-kvm for some reason.  Two separate bugs.

 Well it's quite possible that there are two separate problems.

 1. Page boundary related
 2. Another is related to invalid mapping, when we request the region size
 on a 64bit BAR.
 The patch sent previously addresses this sizing behaviour, and so avoids
 the mapping error.

Re: [Qemu-devel] [RFC/PATCH] Fix guest OS panic when 64bit BAR is present

2012-01-31 Thread Avi Kivity
On 01/27/2012 06:42 AM, Alexey Korolev wrote:
 [...]

 Note: it won't necessarily produce a panic message; on some kernels it just
 hangs or reboots.


In fact qemu segfaults for me, since registering a ram region not on a
page boundary is broken.  This happens when the ivshmem bar is split by
the hpet region, which is less than a page long.

-- 
error compiling committee.c: too many arguments to function




Re: [Qemu-devel] [RFC/PATCH] Fix guest OS panic when 64bit BAR is present

2012-01-31 Thread Avi Kivity
On 01/31/2012 11:40 AM, Avi Kivity wrote:
 [...]

 In fact qemu segfaults for me, since registering a ram region not on a
 page boundary is broken.  This happens when the ivshmem bar is split by
  the hpet region, which is less than a page long.


Happens only with qemu-kvm for some reason.  Two separate bugs.

-- 
error compiling committee.c: too many arguments to function




Re: [Qemu-devel] [RFC/PATCH] Fix guest OS panic when 64bit BAR is present

2012-01-31 Thread Avi Kivity
On 01/27/2012 06:42 AM, Alexey Korolev wrote:
 [...]



I have some patches that fix this, but they're very hacky since they're
dealing with the old and rotten core.  I much prefer to let this resolve
itself in my continuing rewrite.  Is this an urgent problem for you or
can you live with this for a while?

-- 
error compiling committee.c: too many arguments to function




Re: [Qemu-devel] [RFC/PATCH] Fix guest OS panic when 64bit BAR is present

2012-01-31 Thread Alexey Korolev
On 31/01/12 22:43, Avi Kivity wrote:
 [...]

 In fact qemu segfaults for me, since registering a ram region not on a
 page boundary is broken.  This happens when the ivshmem bar is split by
 the hpet region, which is less than a page long.

 Happens only with qemu-kvm for some reason.  Two separate bugs.

Well it's quite possible that there are two separate problems.

1. Page boundary related
2. Another is related to invalid mapping, when we request the region size on
a 64bit BAR.
The patch sent previously addresses this sizing behaviour, and so avoids the
mapping error.
Not sure if it is valid to temporarily occupy a completely wrong memory
region when we request the size.

Re: [Qemu-devel] [RFC/PATCH] Fix guest OS panic when 64bit BAR is present

2012-01-31 Thread Michael S. Tsirkin
On Wed, Feb 01, 2012 at 06:44:42PM +1300, Alexey Korolev wrote:
 [...]
  In fact qemu segfaults for me, since registering a ram region not on a
  page boundary is broken.  This happens when the ivshmem bar is split by
  the hpet region, which is less than a page long.
 
  Happens only with qemu-kvm for some reason.  Two separate bugs.
 
 Well it's quite possible that there are two separate problems.
 
 1. Page boundary related
 2. Another is related to invalid mapping, when we request the region size
 on a 64bit BAR.

Re: [Qemu-devel] [RFC/PATCH] Fix guest OS panic when 64bit BAR is present

2012-01-26 Thread Michael S. Tsirkin
On Wed, Jan 25, 2012 at 06:46:03PM +1300, Alexey Korolev wrote:
 [...]
 At step 1, qemu's pci_default_write_config() receives all FF's for the
 lower part of the 64bit BAR. It then applies the mask and converts the
 value to all FF's - size + 1 (0xFE000000 if the size is 32MB).
 Then pci_bar_address() checks whether the BAR address is valid. Since it
 is a 64bit BAR it reads 0xFE000000 - this address is valid. So qemu
 updates the topology and sends a request to update the mappings in KVM
 with the new range for the 64bit BAR, 0xFE000000 - 0xFFFFFFFF. This
 usually means a kernel panic on boot if there is another mapping in the
 0xFE000000 - 0xFFFFFFFF range, which is quite common.

Do you know why it panics? As far as I can see
from code at
http://lxr.linux.no/#linux+v2.6.35.9/drivers/pci/probe.c#L162

 171  pci_read_config_dword(dev, pos, &l);
 172  pci_write_config_dword(dev, pos, l | mask);
 173  pci_read_config_dword(dev, pos, &sz);
 174  pci_write_config_dword(dev, pos, l);

BAR is restored: what triggers an access between lines 172 and 174?


Also, what you describe happens on a 32 bit BAR in the same way, no?

-- 
MST



Re: [Qemu-devel] [RFC/PATCH] Fix guest OS panic when 64bit BAR is present

2012-01-26 Thread Avi Kivity
On 01/26/2012 05:19 AM, Alexey Korolev wrote:
 If you apply the following patch and add to qemu command: --device 
 ivshmem,size=32,shm=shm
 ---
 diff --git a/hw/ivshmem.c b/hw/ivshmem.c
 index 1aa9e3b..71f8c21 100644
 --- a/hw/ivshmem.c
 +++ b/hw/ivshmem.c
 @@ -341,7 +341,7 @@ static void create_shared_memory_BAR(IVShmemState *s, int fd) {
      memory_region_add_subregion(&s->bar, 0, &s->ivshmem);
  
      /* region for shared memory */
 -    pci_register_bar(&s->dev, 2, PCI_BASE_ADDRESS_SPACE_MEMORY, &s->bar);
 +    pci_register_bar(&s->dev, 2, PCI_BASE_ADDRESS_SPACE_MEMORY | PCI_BASE_ADDRESS_MEM_TYPE_64, &s->bar);
  }
  
  static void close_guest_eventfds(IVShmemState *s, int posn)
 ---

 You can get the following bootup log:


 Bootdata ok (command line is root=/dev/hda1 console=ttyS0,115200n8 
 console=tty0)
 Linux version 2.6.18 (root@localhost.localdomain) (gcc version 4.1.2 20080704 
 (Red Hat 4.1.2-48)) #3 SMP Tue Jan 17 16:37:33 NZDT 2012
 BIOS-provided physical RAM map:
  BIOS-e820: 0000000000000000 - 000000000009f400 (usable)
  BIOS-e820: 000000000009f400 - 00000000000a0000 (reserved)
  BIOS-e820: 00000000000f0000 - 0000000000100000 (reserved)
  BIOS-e820: 0000000000100000 - 000000007fffd000 (usable)
  BIOS-e820: 000000007fffd000 - 0000000080000000 (reserved)
  BIOS-e820: 00000000feffc000 - 00000000ff000000 (reserved)
  BIOS-e820: 00000000fffc0000 - 0000000100000000 (reserved)
 DMI 2.4 present.
 No NUMA configuration found
 Faking a node at 0000000000000000-000000007fffd000
 Bootmem setup node 0 0000000000000000-000000007fffd000
 ACPI: PM-Timer IO Port: 0xb008
 ACPI: LAPIC (acpi_id[0x00] lapic_id[0x00] enabled)
 Processor #0 6:2 APIC version 17
 ACPI: IOAPIC (id[0x01] address[0xfec00000] gsi_base[0])
 IOAPIC[0]: apic_id 1, version 17, address 0xfec00000, GSI 0-23
 ACPI: INT_SRC_OVR (bus 0 bus_irq 0 global_irq 2 dfl dfl)
 ACPI: INT_SRC_OVR (bus 0 bus_irq 5 global_irq 5 high level)
 ACPI: INT_SRC_OVR (bus 0 bus_irq 9 global_irq 9 high level)
 ACPI: INT_SRC_OVR (bus 0 bus_irq 10 global_irq 10 high level)
 ACPI: INT_SRC_OVR (bus 0 bus_irq 11 global_irq 11 high level)
 Setting APIC routing to physical flat
 ACPI: HPET id: 0x8086a201 base: 0xfed00000
 Using ACPI (MADT) for SMP configuration information
 Allocating PCI resources starting at 88000000 (gap: 80000000:7effc000)
 SMP: Allowing 1 CPUs, 0 hotplug CPUs
 Built 1 zonelists.  Total pages: 515393
 Kernel command line: root=/dev/hda1 console=ttyS0,115200n8 console=tty0
 Initializing CPU#0
 PID hash table entries: 4096 (order: 12, 32768 bytes)
 time.c: Using 100.00 MHz WALL HPET GTOD HPET/TSC timer.
 time.c: Detected 2500.081 MHz processor.
 Console: colour VGA+ 80x25
 Dentry cache hash table entries: 262144 (order: 9, 2097152 bytes)
 Inode-cache hash table entries: 131072 (order: 8, 1048576 bytes)
 Checking aperture...
 Memory: 2058096k/2097140k available (3256k kernel code, 38656k reserved, 
 2266k data, 204k init)
 Calibrating delay using timer specific routine.. 5030.07 BogoMIPS 
 (lpj=10060155)
 Mount-cache hash table entries: 256
 CPU: L1 I cache: 32K, L1 D cache: 32K
 CPU: L2 cache: 4096K
 MCE: warning: using only 10 banks
 SMP alternatives: switching to UP code
 Freeing SMP alternatives: 36k freed
 ACPI: Core revision 20060707
 activating NMI Watchdog ... done.
 Using local APIC timer interrupts.
 result 62501506
 Detected 62.501 MHz APIC timer.
 Brought up 1 CPUs
 testing NMI watchdog ... OK.
 migration_cost=0
 NET: Registered protocol family 16
 ACPI: bus type pci registered
 PCI: Using configuration type 1
 ACPI: Interpreter enabled
 ACPI: Using IOAPIC for interrupt routing
 ACPI: PCI Root Bridge [PCI0] (:00)
 ACPI: Assume root bridge [\_SB_.PCI0] bus is 0
 PCI quirk: region b000-b03f claimed by PIIX4 ACPI
 PCI quirk: region b100-b10f claimed by PIIX4 SMB
 ACPI: PCI Interrupt Link [LNKA] (IRQs 5 *10 11)
 ACPI: PCI Interrupt Link [LNKB] (IRQs 5 *10 11)
 ACPI: PCI Interrupt Link [LNKC] (IRQs 5 10 *11)
 ACPI: PCI Interrupt Link [LNKD] (IRQs 5 10 *11)
 ACPI: PCI Interrupt Link [LNKS] (IRQs 9) *0, disabled.
 SCSI subsystem initialized
 usbcore: registered new driver usbfs
 usbcore: registered new driver hub
 PCI: Using ACPI for IRQ routing
 PCI: If a device doesn't work, try pci=routeirq.  If it helps, post a report
 divide error: 0000 [1] SMP
 CPU 0
 Modules linked in:
 Pid: 1, comm: swapper Not tainted 2.6.18 #3
 RIP: 0010:[<ffffffff80388299>]  [<ffffffff80388299>] hpet_alloc+0x12a/0x30c
 RSP: 0000:ffff81007e3a1e20  EFLAGS: 00010246
 RAX: 00038d7ea4c68000 RBX: 0000000000000000 RCX: 0000000000000000
 RDX: 0000000000000000 RSI: 0000000000000000 RDI: ffffffff8057fc2b
 RBP: ffff81007e2e28c0 R08: ffffffff8055b492 R09: ffff81007e39f510
 R10: ffff81007e3a1e50 R11: 0000000000000098 R12: ffff81007e3a1e50
 R13: 0000000000000000 R14: ffffffffff5fe000 R15: 0000000000000000
 FS:  0000000000000000(0000) GS:ffffffff807fc000(0000) knlGS:0000000000000000
 CS:  0010 DS: 0018 ES: 0018 CR0: 000000008005003b
 CR2: 0000000000000000 CR3: 0000000000201000 CR4: 

Re: [Qemu-devel] [RFC/PATCH] Fix guest OS panic when 64bit BAR is present

2012-01-26 Thread Avi Kivity
On 01/26/2012 11:14 AM, Michael S. Tsirkin wrote:
 On Wed, Jan 25, 2012 at 06:46:03PM +1300, Alexey Korolev wrote:
  [...]

 Do you know why it panics? As far as I can see
 from code at
 http://lxr.linux.no/#linux+v2.6.35.9/drivers/pci/probe.c#L162

  171  pci_read_config_dword(dev, pos, &l);
  172  pci_write_config_dword(dev, pos, l | mask);
  173  pci_read_config_dword(dev, pos, &sz);
  174  pci_write_config_dword(dev, pos, l);

 BAR is restored: what triggers an access between lines 172 and 174?

Random interrupt reading the time, likely.

 Also, what you describe happens on a 32 bit BAR in the same way, no?

So it seems.  Btw, is this procedure correct for sizing a BAR which is
larger than 4GB?

-- 
error compiling committee.c: too many arguments to function




Re: [Qemu-devel] [RFC/PATCH] Fix guest OS panic when 64bit BAR is present

2012-01-26 Thread Michael S. Tsirkin
On Thu, Jan 26, 2012 at 03:51:06PM +0200, Avi Kivity wrote:
  Please look at the HPET lines. HPET is mapped at 0xfed00000.
  The size of ivshmem is 32MB. During PCI enumeration ivshmem will corrupt
  the range from 0xfe000000 - 0xffffffff.
  It overlaps the HPET memory. When Linux does late_hpet_init, it finds
  garbage there, and this is what causes the panic.
 
 
 Let me see if I get this right: during BAR sizing, the guest sets the
 BAR to ~1, which means 4GB-32MB - 4GB, which overlaps the HPET.  If so,
 that's expected behaviour.

Yes BAR sizing temporarily sets the BAR to an invalid value then
restores it.  What I don't understand is how come something accesses the
HPET range in between.

 If the guest doesn't want this memory there,
 it should disable mmio.

Recent kernels do this for most devices, but not for
platform devices.

 -- 
 error compiling committee.c: too many arguments to function



Re: [Qemu-devel] [RFC/PATCH] Fix guest OS panic when 64bit BAR is present

2012-01-26 Thread Avi Kivity
On 01/26/2012 04:05 PM, Michael S. Tsirkin wrote:
  
  Let me see if I get this right: during BAR sizing, the guest sets the
  BAR to ~1, which means 4GB-32MB - 4GB, which overlaps the HPET.  If so,
  that's expected behaviour.

 Yes BAR sizing temporarily sets the BAR to an invalid value then
 restores it.  What I don't understand is how come something accesses the
 HPET range in between.

Interrupt -> read time.

  If the guest doesn't want this memory there,
  it should disable mmio.

 Recent kernels do this for most devices, but not for
 platform devices.

Then they are vulnerable to this issue.

The i440fx spec states that the entire range from the top of memory to 4GB
is forwarded to PCI, so qemu appears to be correct here.

-- 
error compiling committee.c: too many arguments to function




Re: [Qemu-devel] [RFC/PATCH] Fix guest OS panic when 64bit BAR is present

2012-01-26 Thread Michael S. Tsirkin
On Thu, Jan 26, 2012 at 03:52:27PM +0200, Avi Kivity wrote:
 On 01/26/2012 11:14 AM, Michael S. Tsirkin wrote:
  On Wed, Jan 25, 2012 at 06:46:03PM +1300, Alexey Korolev wrote:
   [...]
 
  Do you know why it panics? As far as I can see
  from code at
  http://lxr.linux.no/#linux+v2.6.35.9/drivers/pci/probe.c#L162
 
   171  pci_read_config_dword(dev, pos, &l);
   172  pci_write_config_dword(dev, pos, l | mask);
   173  pci_read_config_dword(dev, pos, &sz);
   174  pci_write_config_dword(dev, pos, l);
 
  BAR is restored: what triggers an access between lines 172 and 174?
 
 Random interrupt reading the time, likely.

Weird, what the backtrace shows is init, unrelated
to interrupts.

  Also, what you describe happens on a 32 bit BAR in the same way, no?
 
 So it seems.  Btw, is this procedure correct for sizing a BAR which is
 larger than 4GB?

There's more code sizing 64 bit BARs, but generally
software is allowed to write any junk into enabled BARs
as long as there aren't any memory accesses.
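
For completeness, a sketch (plain C, not kernel code) of how the two halves
combine once both have been sized, using the 32MB example from this thread;
BARs larger than 4GB work the same way, just with more high bits clear:

#include <stdint.h>
#include <stdio.h>

int main(void)
{
    uint32_t lo = 0xFE000000;  /* low dword readback (address bits only) */
    uint32_t hi = 0xFFFFFFFF;  /* high dword readback */
    uint64_t mask = ((uint64_t)hi << 32) | lo;
    uint64_t size = ~mask + 1; /* 0x2000000 = 32MB */
    printf("size=0x%llx\n", (unsigned long long)size);
    return 0;
}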

 -- 
 error compiling committee.c: too many arguments to function



Re: [Qemu-devel] [RFC/PATCH] Fix guest OS panic when 64bit BAR is present

2012-01-26 Thread Avi Kivity
On 01/26/2012 04:36 PM, Michael S. Tsirkin wrote:
 On Thu, Jan 26, 2012 at 03:52:27PM +0200, Avi Kivity wrote:
  On 01/26/2012 11:14 AM, Michael S. Tsirkin wrote:
   On Wed, Jan 25, 2012 at 06:46:03PM +1300, Alexey Korolev wrote:
    [...]
  
   Do you know why it panics? As far as I can see
   from code at
   http://lxr.linux.no/#linux+v2.6.35.9/drivers/pci/probe.c#L162
  
    171  pci_read_config_dword(dev, pos, &l);
    172  pci_write_config_dword(dev, pos, l | mask);
    173  pci_read_config_dword(dev, pos, &sz);
    174  pci_write_config_dword(dev, pos, l);
  
   BAR is restored: what triggers an access between lines 172 and 174?
  
  Random interrupt reading the time, likely.

 Weird, what the backtrace shows is init, unrelated
 to interrupts.


It's a bug then.  qemu doesn't undo the mapping correctly.

If you have clear instructions, I'll try to reproduce it.

-- 
error compiling committee.c: too many arguments to function




Re: [Qemu-devel] [RFC/PATCH] Fix guest OS panic when 64bit BAR is present

2012-01-26 Thread Alexey Korolev
On 27/01/12 03:36, Michael S. Tsirkin wrote:
 On Thu, Jan 26, 2012 at 03:52:27PM +0200, Avi Kivity wrote:
 On 01/26/2012 11:14 AM, Michael S. Tsirkin wrote:
 On Wed, Jan 25, 2012 at 06:46:03PM +1300, Alexey Korolev wrote:
 [...]
 Do you know why it panics? As far as I can see
 from code at
 http://lxr.linux.no/#linux+v2.6.35.9/drivers/pci/probe.c#L162

  171  pci_read_config_dword(dev, pos, &l);
  172  pci_write_config_dword(dev, pos, l | mask);
  173  pci_read_config_dword(dev, pos, &sz);
  174  pci_write_config_dword(dev, pos, l);

 BAR is restored: what triggers an access between lines 172 and 174?
 Random interrupt reading the time, likely.
 Weird, what the backtrace shows is init, unrelated
 to interrupts.
Yes, it fails during the ordered late_hpet_init() call, which is part of the
kernel fs_initcall list. So no timer interrupts are involved here.
Basically, once the region is programmed (even temporarily), the area behind
it is lost.
I mean, even if we only temporarily overlap the HPET region with our BAR,
backed by host user space memory, and commit a mapping request to kvm, the
information about the old mappings belonging to the HPET is lost.
That holds even if we do this for a short period of time and later restore
the original address.
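
To illustrate the window: the transient range programmed during sizing
covers the HPET MMIO page, so committing it to kvm clobbers that mapping.
A sketch with the addresses from this thread (illustrative, not qemu code):

#include <stdint.h>
#include <stdio.h>

int main(void)
{
    uint64_t bar_start = 0xFE000000, bar_end = 0x100000000ULL;       /* transient BAR */
    uint64_t hpet_start = 0xFED00000, hpet_end = 0xFED00000 + 0x400; /* HPET MMIO */
    int overlap = bar_start < hpet_end && hpet_start < bar_end;
    printf("transient BAR %s the HPET range\n", overlap ? "covers" : "misses");
    return 0;
}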

 Also, what you describe happens on a 32 bit BAR in the same way, no?
 So it seems.  Btw, is this procedure correct for sizing a BAR which is
 larger than 4GB?
 There's more code sizing 64 bit BARs, but generally
 software is allowed to write any junk into enabled BARs
 as long as there aren't any memory accesses.





Re: [Qemu-devel] [RFC/PATCH] Fix guest OS panic when 64bit BAR is present

2012-01-26 Thread Alexey Korolev
On 27/01/12 04:12, Avi Kivity wrote:
 On 01/26/2012 04:36 PM, Michael S. Tsirkin wrote:
 On Thu, Jan 26, 2012 at 03:52:27PM +0200, Avi Kivity wrote:
 On 01/26/2012 11:14 AM, Michael S. Tsirkin wrote:
 On Wed, Jan 25, 2012 at 06:46:03PM +1300, Alexey Korolev wrote:
 [...]
 Do you know why it panics? As far as I can see
 from code at
 http://lxr.linux.no/#linux+v2.6.35.9/drivers/pci/probe.c#L162

  171  pci_read_config_dword(dev, pos, &l);
  172  pci_write_config_dword(dev, pos, l | mask);
  173  pci_read_config_dword(dev, pos, &sz);
  174  pci_write_config_dword(dev, pos, l);

 BAR is restored: what triggers an access between lines 172 and 174?
 Random interrupt reading the time, likely.
 Weird, what the backtrace shows is init, unrelated
 to interrupts.

 It's a bug then.  qemu doesn't undo the mapping correctly.

 If you have clear instructions, I'll try to reproduce it.

Well the easiest way to reproduce this is:


1. Get a kernel bzImage (version < 2.6.36)
2. Apply patch to ivshmem.c

---
diff --git a/hw/ivshmem.c b/hw/ivshmem.c
index 1aa9e3b..71f8c21 100644
--- a/hw/ivshmem.c
+++ b/hw/ivshmem.c
@@ -341,7 +341,7 @@ static void create_shared_memory_BAR(IVShmemState *s, int fd) {
     memory_region_add_subregion(&s->bar, 0, &s->ivshmem);
 
     /* region for shared memory */
-    pci_register_bar(&s->dev, 2, PCI_BASE_ADDRESS_SPACE_MEMORY, &s->bar);
+    pci_register_bar(&s->dev, 2, PCI_BASE_ADDRESS_SPACE_MEMORY | PCI_BASE_ADDRESS_MEM_TYPE_64, &s->bar);
 }
 
 static void close_guest_eventfds(IVShmemState *s, int posn)
---

3. Launch qemu with a command like this:

/usr/bin/qemu-system-x86_64 -M pc-0.14 -enable-kvm -m 2048 -smp 
1,socket=1,cores=1,threads=1 -name centos54 -uuid
d37daefd-75bd-4387-cee1-7f0b153ee2af -nodefconfig -nodefaults -chardev
socket,id=charmonitor,path=/var/lib/libvirt/qemu/centos54.monitor,server,nowait 
-mon chardev=charmonitor,id=monitor,mode=readline -rtc
base=utc -drive 
file=/dev/dock200-1/centos54,if=none,id=drive-ide0-0-0,format=raw -device
ide-drive,bus=ide.0,unit=0,drive=drive-ide0-0-0,id=ide0-0-0,bootindex=1 -drive
file=/data/CentOS-5.4-x86_64-bin-DVD.iso,if=none,media=cdrom,id=drive-ide0-1-0,readonly=on,format=raw
 -device
ide-drive,bus=ide.1,unit=0,drive=drive-ide0-1-0,id=ide0-1-0 -chardev 
file,id=charserial0,path=/home/alexey/cent54.log -device
isa-serial,chardev=charserial0,id=serial0 -usb -vnc 127.0.0.1:0 -k en-us -vga 
cirrus -device
virtio-balloon-pci,id=balloon0,bus=pci.0,multifunction=on,addr=0x4.0x0 --device 
ivshmem,size=32,shm=shm -kernel bzImage -append
root=/dev/hda1 console=ttyS0,115200n8 console=tty0

in other words add: --device ivshmem,size=32,shm=shm

That is all.

Note: it won't necessarily produce a panic message; on some kernels it just
hangs or reboots.







Re: [Qemu-devel] [RFC/PATCH] Fix guest OS panic when 64bit BAR is present

2012-01-25 Thread Michael S. Tsirkin
On Wed, Jan 25, 2012 at 06:46:03PM +1300, Alexey Korolev wrote:
 Hi, 
 In this post
 http://lists.gnu.org/archive/html/qemu-devel/2011-12/msg03171.html I've
 mentioned the issues seen when a 64bit PCI BAR is present and a 32bit
 address range is selected for it.
 The issue affects all recent qemu releases and all
 old and recent guest Linux kernel versions.
 
 We've done some investigation. Let me explain what happens.
 Assume we have a 64bit BAR of size 32MB mapped at [0xF0000000 -
 0xF2000000].
 
 When a Linux guest starts it does PCI bus enumeration.
 The OS enumerates 64bit BARs using the following procedure:
 1. Write all FF's to the lower half of the 64bit BAR
 2. Write the address back to the lower half of the 64bit BAR
 3. Write all FF's to the higher half of the 64bit BAR
 4. Write the address back to the higher half of the 64bit BAR
 
 Linux code is here:
 http://lxr.linux.no/#linux+v3.2.1/drivers/pci/probe.c#L149
 
 What does this mean for qemu?
 
 At step 1, qemu's pci_default_write_config() receives all FF's for the
 lower part of the 64bit BAR. It then applies the mask and converts the
 value to all FF's - size + 1 (0xFE000000 if the size is 32MB).
 Then pci_bar_address() checks whether the BAR address is valid. Since it
 is a 64bit BAR it reads 0xFE000000 - this address is valid. So qemu
 updates the topology and sends a request to update the mappings in KVM
 with the new range for the 64bit BAR, 0xFE000000 - 0xFFFFFFFF. This
 usually means a kernel panic on boot if there is another mapping in the
 0xFE000000 - 0xFFFFFFFF range, which is quite common.
 
 
 The following patch fixes the issue. It affects 64bit PCI BARs only.
 The idea of the patch is: we introduce states for the low and high BARs
 which can have 3 possible values: BAR_VALID; PCIBAR64_PARTIAL_SIZE_QUERY
 - someone has requested the size of one half of the 64bit PCI BAR; and
 PCIBAR64_PARTIAL_ADDR_PROGRAM - someone has sent a request to update the
 address of one half of the 64bit PCI BAR. The state becomes BAR_VALID
 when both halves are in the same state. We ignore the BAR value until both
 states become BAR_VALID.
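 
 For illustration, a minimal sketch of the state tracking described above,
 reconstructed from this description rather than taken from the patch
 itself:
 
 typedef enum {
     BAR_VALID,                     /* this half holds a real address */
     PCIBAR64_PARTIAL_SIZE_QUERY,   /* all-FF's size probe written here */
     PCIBAR64_PARTIAL_ADDR_PROGRAM, /* a new address was written here */
 } Bar64HalfState;
 
 typedef struct {
     Bar64HalfState lo, hi;
 } Bar64State;
 
 /* Act on the BAR only when both halves are valid again; the transient
  * value written during sizing is ignored until then. */
 static int bar64_settled(const Bar64State *s)
 {
     return s->lo == BAR_VALID && s->hi == BAR_VALID;
 }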
 
 Note: Please use the latest SeaBIOS version (commit
 139d5ac037de828f89c36e39c6dd15610650cede or later), as older versions
 didn't initialize the high part of the 64bit BAR.
 
 The patch is tested on Linux 2.6.18 - 3.1.0 and Windows 2008 Server
 
 Signed-off-by: Alexey Korolev alexey.koro...@endace.com

Interesting. However, looking at guest code,
I note that memory and io are disabled
during BAR sizing unless mmio always on is set.
pci_bar_address should return PCI_BAR_UNMAPPED
in this case, and we should never map this BAR
until it's enabled. What's going on?
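
A self-contained sketch of the check being referred to (simplified; the real
qemu pci_bar_address() does more, so treat the details as approximate):
while memory decoding is disabled in the command register, the BAR address
is reported as unmapped and nothing should be committed to KVM.

#include <stdint.h>

typedef uint64_t pcibus_t;                    /* stand-ins for qemu types */
#define PCI_BAR_UNMAPPED   (~(pcibus_t)0)
#define PCI_COMMAND        0x04               /* command register offset */
#define PCI_COMMAND_MEMORY 0x2                /* memory decode enable bit */
typedef struct { uint8_t config[256]; } PCIDevice;

static uint16_t pci_get_word(const uint8_t *p)
{
    return (uint16_t)(p[0] | (p[1] << 8));
}

static pcibus_t bar_address_sketch(PCIDevice *d, pcibus_t new_addr)
{
    uint16_t cmd = pci_get_word(d->config + PCI_COMMAND);
    return (cmd & PCI_COMMAND_MEMORY) ? new_addr : PCI_BAR_UNMAPPED;
}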





Re: [Qemu-devel] [RFC/PATCH] Fix guest OS panic when 64bit BAR is present

2012-01-25 Thread Michael S. Tsirkin
On Wed, Jan 25, 2012 at 06:46:03PM +1300, Alexey Korolev wrote:
 Hi, 
 In this post
 http://lists.gnu.org/archive/html/qemu-devel/2011-12/msg03171.html I've
 mentioned the issues seen when a 64bit PCI BAR is present and a 32bit
 address range is selected for it.
 The issue affects all recent qemu releases and all
 old and recent guest Linux kernel versions.
 

For testing, I applied the following patch to qemu,
converting the msix BAR to 64 bit.
The guest did not seem to crash.
I booted a Fedora Live CD 32 bit guest on a 32 bit host
to runlevel 3 without a crash, and verified that
the BAR is a 64 bit one, and that it got assigned an address
at fe000000.
command line I used:
qemu-system-x86_64 -bios /scm/seabios/out/bios.bin -snapshot -drive
file=qemu-images/f15-test.qcow2,if=none,id=diskid,cache=unsafe
-device virtio-blk-pci,drive=diskid -net user -net nic,model=ne2k_pci
-cdrom Fedora-15-i686-Live-LXDE.iso

At the boot prompt press Tab and add '3' to the kernel command line
to have the guest boot into a fast text console instead
of a graphical one, which is very slow.

diff --git a/hw/virtio-pci.c b/hw/virtio-pci.c
index 2ac87ea..5271394 100644
--- a/hw/virtio-pci.c
+++ b/hw/virtio-pci.c
@@ -711,7 +711,8 @@ void virtio_init_pci(VirtIOPCIProxy *proxy, VirtIODevice *vdev)
     memory_region_init(&proxy->msix_bar, "virtio-msix", 4096);
     if (vdev->nvectors && !msix_init(&proxy->pci_dev, vdev->nvectors,
                                      &proxy->msix_bar, 1, 0)) {
-        pci_register_bar(&proxy->pci_dev, 1, PCI_BASE_ADDRESS_SPACE_MEMORY,
+        pci_register_bar(&proxy->pci_dev, 1, PCI_BASE_ADDRESS_SPACE_MEMORY |
+                         PCI_BASE_ADDRESS_MEM_TYPE_64,
                         &proxy->msix_bar);
    } else
        vdev->nvectors = 0;



Re: [Qemu-devel] [RFC/PATCH] Fix guest OS panic when 64bit BAR is present

2012-01-25 Thread Alex Williamson
On Wed, 2012-01-25 at 17:38 +0200, Michael S. Tsirkin wrote:
 On Wed, Jan 25, 2012 at 06:46:03PM +1300, Alexey Korolev wrote:
  [...]

I was also able to add MEM64 BARs to device assignment pretty trivially
and it seems to work, guest sees 64bit BARs for an 82576 VF, programs it
to an fexx address and it works.

Alex




Re: [Qemu-devel] [RFC/PATCH] Fix guest OS panic when 64bit BAR is present

2012-01-25 Thread Alexey Korolev
Hi Alex and Michael
 For testing, I applied the following patch to qemu,
 converting the msix BAR to 64 bit.
 The guest did not seem to crash.
 I booted a 32-bit Fedora Live CD guest on a 32-bit host
 to runlevel 3 without a crash, and verified that
 the BAR is a 64-bit one and that it was assigned an address
 at 0xfe000000.
 The command line I used:
 qemu-system-x86_64 -bios /scm/seabios/out/bios.bin -snapshot -drive
 file=qemu-images/f15-test.qcow2,if=none,id=diskid,cache=unsafe
 -device virtio-blk-pci,drive=diskid -net user -net nic,model=ne2k_pci
 -cdrom Fedora-15-i686-Live-LXDE.iso

 At the boot prompt, press Tab and add '3' to the kernel command line
 so the guest boots into a fast text console instead of the very slow
 graphical one.

 diff --git a/hw/virtio-pci.c b/hw/virtio-pci.c
 index 2ac87ea..5271394 100644
 --- a/hw/virtio-pci.c
 +++ b/hw/virtio-pci.c
 @@ -711,7 +711,8 @@ void virtio_init_pci(VirtIOPCIProxy *proxy, VirtIODevice *vdev)
      memory_region_init(&proxy->msix_bar, "virtio-msix", 4096);
      if (vdev->nvectors && !msix_init(&proxy->pci_dev, vdev->nvectors,
                                       &proxy->msix_bar, 1, 0)) {
 -        pci_register_bar(&proxy->pci_dev, 1, PCI_BASE_ADDRESS_SPACE_MEMORY,
 +        pci_register_bar(&proxy->pci_dev, 1, PCI_BASE_ADDRESS_SPACE_MEMORY |
 +                         PCI_BASE_ADDRESS_MEM_TYPE_64,
                           &proxy->msix_bar);
      } else
          vdev->nvectors = 0;

 I was also able to add MEM64 BARs to device assignment pretty trivially
 and it seems to work: the guest sees 64-bit BARs for an 82576 VF, programs
 them to an address in the 0xfe000000 range, and it works.

 Alex


I'd suggest using ivshmem with a 32MB buffer to reproduce the problem, in a
2.6.18 guest for example.

The msix case is not failing because:
1. The buffer size is just 4KB, so sizing only reprograms the range
0xFFFFE000-0xFFFFFFFF, which doesn't overlap critical resources and so
doesn't cause an immediate panic (see the sketch below).
2. The memory_region_init() function doesn't create a backing user memory
region, so kvm does nothing about remapping in this case.
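
To make the arithmetic concrete, here is a minimal standalone sketch (not
qemu code, just the standard BAR-sizing formula) of the value a guest reads
back after writing all 1s to a memory BAR:

#include <stdio.h>
#include <stdint.h>

/* Value read back from a memory BAR after the guest writes all 1s:
 * the device returns ~(size - 1) in the address bits. The low four
 * bits carry the BAR type flags and are masked off here. */
static uint32_t bar_sizing_value(uint32_t size)
{
    return ~(size - 1) & ~0xfU;
}

int main(void)
{
    /* 32MB ivshmem BAR: reads back 0xfe000000, so a naively applied
     * mapping covers 0xfe000000-0xffffffff and tramples other
     * resources in that range. */
    printf("32MB -> 0x%08x\n", (unsigned)bar_sizing_value(32u << 20));

    /* 4KB msix BAR: reads back 0xfffff000, a tiny window at the very
     * top of the 32-bit space that happens to overlap nothing
     * critical. */
    printf("4KB  -> 0x%08x\n", (unsigned)bar_sizing_value(4096));
    return 0;
}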

If you apply the following patch and add --device ivshmem,size=32,shm=shm
to the qemu command line:
---
diff --git a/hw/ivshmem.c b/hw/ivshmem.c
index 1aa9e3b..71f8c21 100644
--- a/hw/ivshmem.c
+++ b/hw/ivshmem.c
@@ -341,7 +341,7 @@ static void create_shared_memory_BAR(IVShmemState *s, int fd) {
     memory_region_add_subregion(&s->bar, 0, &s->ivshmem);
 
     /* region for shared memory */
-    pci_register_bar(&s->dev, 2, PCI_BASE_ADDRESS_SPACE_MEMORY, &s->bar);
+    pci_register_bar(&s->dev, 2, PCI_BASE_ADDRESS_SPACE_MEMORY | PCI_BASE_ADDRESS_MEM_TYPE_64, &s->bar);
 }
 
 static void close_guest_eventfds(IVShmemState *s, int posn)
---

You can get the following bootup log:


Bootdata ok (command line is root=/dev/hda1 console=ttyS0,115200n8 console=tty0)
Linux version 2.6.18 (root@localhost.localdomain) (gcc version 4.1.2 20080704 
(Red Hat 4.1.2-48)) #3 SMP Tue Jan 17 16:37:33 NZDT 2012
BIOS-provided physical RAM map:
 BIOS-e820: 0000000000000000 - 000000000009f400 (usable)
 BIOS-e820: 000000000009f400 - 00000000000a0000 (reserved)
 BIOS-e820: 00000000000f0000 - 0000000000100000 (reserved)
 BIOS-e820: 0000000000100000 - 000000007fffd000 (usable)
 BIOS-e820: 000000007fffd000 - 0000000080000000 (reserved)
 BIOS-e820: 00000000feffc000 - 00000000ff000000 (reserved)
 BIOS-e820: 00000000fffc0000 - 0000000100000000 (reserved)
DMI 2.4 present.
No NUMA configuration found
Faking a node at 0000000000000000-000000007fffd000
Bootmem setup node 0 0000000000000000-000000007fffd000
ACPI: PM-Timer IO Port: 0xb008
ACPI: LAPIC (acpi_id[0x00] lapic_id[0x00] enabled)
Processor #0 6:2 APIC version 17
ACPI: IOAPIC (id[0x01] address[0xfec00000] gsi_base[0])
IOAPIC[0]: apic_id 1, version 17, address 0xfec00000, GSI 0-23
ACPI: INT_SRC_OVR (bus 0 bus_irq 0 global_irq 2 dfl dfl)
ACPI: INT_SRC_OVR (bus 0 bus_irq 5 global_irq 5 high level)
ACPI: INT_SRC_OVR (bus 0 bus_irq 9 global_irq 9 high level)
ACPI: INT_SRC_OVR (bus 0 bus_irq 10 global_irq 10 high level)
ACPI: INT_SRC_OVR (bus 0 bus_irq 11 global_irq 11 high level)
Setting APIC routing to physical flat
ACPI: HPET id: 0x8086a201 base: 0xfed00000
Using ACPI (MADT) for SMP configuration information
Allocating PCI resources starting at 88000000 (gap: 80000000:7effc000)
SMP: Allowing 1 CPUs, 0 hotplug CPUs
Built 1 zonelists.  Total pages: 515393
Kernel command line: root=/dev/hda1 console=ttyS0,115200n8 console=tty0
Initializing CPU#0
PID hash table entries: 4096 (order: 12, 32768 bytes)
time.c: Using 100.00 MHz WALL HPET GTOD HPET/TSC timer.
time.c: Detected 2500.081 MHz processor.
Console: colour VGA+ 80x25
Dentry cache hash table entries: 262144 (order: 9, 2097152 bytes)
Inode-cache hash table entries: 131072 (order: 8, 1048576 bytes)
Checking aperture...
Memory: 2058096k/2097140k available (3256k kernel code, 38656k reserved, 2266k 
data, 204k init)
Calibrating delay using timer specific routine.. 5030.07 BogoMIPS (lpj=10060155)
Mount-cache hash table entries: 256
CPU: L1 I cache: 32K, L1 D 

Re: [Qemu-devel] [RFC/PATCH] Fix guest OS panic when 64bit BAR is present

2012-01-25 Thread Alexey Korolev
On 26/01/12 01:51, Michael S. Tsirkin wrote:
 On Wed, Jan 25, 2012 at 06:46:03PM +1300, Alexey Korolev wrote:
 Hi, 
 In this post
 http://lists.gnu.org/archive/html/qemu-devel/2011-12/msg03171.html I
 mentioned the issues that arise when a 64-bit PCI BAR is present and a
 32-bit address range is selected for it.
 The issue affects all recent qemu releases and all
 old and recent guest Linux kernel versions.

 We've done some investigations. Let me explain what happens.
 Assume we have a 64-bit BAR of size 32MB mapped at [0xF0000000 -
 0xF2000000].

 When a Linux guest starts, it does PCI bus enumeration.
 The OS enumerates 64-bit BARs using the following procedure:
 1. Write all FF's to lower half of 64bit BAR
 2. Write address back to lower half of 64bit BAR
 3. Write all FF's to higher half of 64bit BAR
 4. Write address back to higher half of 64bit BAR

 Linux code is here: 
 http://lxr.linux.no/#linux+v3.2.1/drivers/pci/probe.c#L149

 What does this mean for qemu?

 At step 1, qemu's pci_default_write_config() receives all FFs for the lower
 part of the 64-bit BAR. Then it applies the mask and converts the value
 to all FFs - size + 1 (0xFE000000 if the size is 32MB).
 Then pci_bar_address() checks whether the BAR address is valid. Since it is
 a 64-bit BAR it reads 0xFE000000 - this address is valid. So qemu
 updates the topology and sends a request to update mappings in KVM with the
 new range for the 64-bit BAR, 0xFE000000 - 0xFFFFFFFF. This usually means a
 kernel panic on boot, if there is another mapping in the 0xFE000000 -
 0xFFFFFFFF range, which is quite common.
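
For reference, a minimal sketch of that sizing sequence as a guest might
issue it. The pci_cfg_read32/pci_cfg_write32 helpers are hypothetical
stand-ins for real config-space accessors; the point is the window between
the writes, during which qemu sees a half-updated BAR:

#include <stdint.h>

/* Hypothetical config-space accessors (real guests go through
 * ports 0xCF8/0xCFC or MMCONFIG). */
uint32_t pci_cfg_read32(int bdf, int off);
void     pci_cfg_write32(int bdf, int off, uint32_t val);

/* Size a 64-bit memory BAR at config offset 'off' (steps 1-4 above). */
static uint64_t size_bar64(int bdf, int off)
{
    uint32_t lo = pci_cfg_read32(bdf, off);
    uint32_t hi = pci_cfg_read32(bdf, off + 4);

    pci_cfg_write32(bdf, off, 0xffffffff);     /* 1. all FFs to low half  */
    uint32_t szlo = pci_cfg_read32(bdf, off);
    pci_cfg_write32(bdf, off, lo);             /* 2. restore low half     */

    pci_cfg_write32(bdf, off + 4, 0xffffffff); /* 3. all FFs to high half */
    uint32_t szhi = pci_cfg_read32(bdf, off + 4);
    pci_cfg_write32(bdf, off + 4, hi);         /* 4. restore high half    */

    /* Combine both halves, masking off the low type/flag bits. */
    uint64_t mask = ((uint64_t)szhi << 32) | (szlo & ~0xfULL);
    return ~mask + 1;
}

Between step 1 and step 2 the BAR transiently reads as 0xFE000000 with a
zero high half, which is exactly the value qemu maps.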


 The following patch fixes the issue. It affects 64-bit PCI BARs only. 
 The idea of the patch is: we introduce states for the low and high BARs,
 which can have 3 possible values: BAR_VALID; PCIBAR64_PARTIAL_SIZE_QUERY
 - someone has requested the size of one half of the 64-bit PCI BAR; and
 PCIBAR64_PARTIAL_ADDR_PROGRAM - someone has sent a request to update the
 address of one half of the 64-bit PCI BAR. The state becomes BAR_VALID
 when both halves are in the same state. We ignore the BAR value until both
 states become BAR_VALID.
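
A minimal sketch of that state tracking, using the names from the
description (the surrounding structures are simplified assumptions, not
the actual patch):

/* Per-half state of a 64-bit BAR, as described above. */
typedef enum {
    BAR_VALID,                      /* half holds a real address        */
    PCIBAR64_PARTIAL_SIZE_QUERY,    /* guest wrote all FFs to this half */
    PCIBAR64_PARTIAL_ADDR_PROGRAM,  /* guest wrote an address back      */
} PCIBar64State;

typedef struct {
    PCIBar64State lo_state;
    PCIBar64State hi_state;
} PCIBar64;

/* Called on a config write to one half; 'other' is the other half. */
static void bar64_half_written(PCIBar64State *half, PCIBar64State *other,
                               int wrote_all_ones)
{
    *half = wrote_all_ones ? PCIBAR64_PARTIAL_SIZE_QUERY
                           : PCIBAR64_PARTIAL_ADDR_PROGRAM;
    /* Once both halves are in the same state, the BAR is coherent
     * again and can be considered valid. */
    if (*half == *other) {
        *half = *other = BAR_VALID;
    }
}

/* Only remap when both halves are valid; in between, the guest is
 * mid-way through sizing and the combined 64-bit value is garbage. */
static int bar64_mapping_valid(const PCIBar64 *bar)
{
    return bar->lo_state == BAR_VALID && bar->hi_state == BAR_VALID;
}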

 Note: Please use the latest Seabios version (commit
 139d5ac037de828f89c36e39c6dd15610650cede and later), as older versions
 didn't initialize the high part of 64-bit BARs. 

 The patch has been tested on Linux 2.6.18 - 3.1.0 and on Windows 2008 Server.

 Signed-off-by: Alexey Korolev alexey.koro...@endace.com
 Interesting. However, looking at the guest code,
 I note that memory and I/O decode are disabled
 during BAR sizing unless "mmio always on" is set.
 pci_bar_address should return PCI_BAR_UNMAPPED
 in this case, and we should never map this BAR
 until decode is enabled. What's going on?
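
qemu's pci_bar_address() does check the command register; a simplified
sketch of that decode check follows (the structure around it is assumed,
not copied from qemu):

/* Sketch: while decode is off in the command register -- as it should
 * be during BAR sizing -- the BAR stays unmapped no matter what the
 * BAR registers hold. */
static uint64_t bar_address_sketch(PCIDevice *d, uint64_t addr,
                                   int is_memory_bar)
{
    uint16_t cmd = pci_get_word(d->config + PCI_COMMAND);

    if (is_memory_bar && !(cmd & PCI_COMMAND_MEMORY)) {
        return PCI_BAR_UNMAPPED;
    }
    if (!is_memory_bar && !(cmd & PCI_COMMAND_IO)) {
        return PCI_BAR_UNMAPPED;
    }
    return addr;
}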


Oh. Good point. You are right here. Linux developers
added protection for the lower part of the PCI BAR starting with 2.6.36.
So this issue affects all guest kernels before 2.6.36.
Sorry about the confusion.

The code without protection is here:

http://lxr.linux.no/#linux+v2.6.35.9/drivers/pci/probe.c#L162
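
For comparison, the protection added in 2.6.36 disables decode around the
sizing writes, roughly like this (a paraphrase of the kernel's
__pci_read_base, not the exact code):

u16 orig_cmd;

/* Turn off memory/IO decode before the sizing writes so that a
 * half-programmed BAR is never live on the bus. */
pci_read_config_word(dev, PCI_COMMAND, &orig_cmd);
if (orig_cmd & (PCI_COMMAND_MEMORY | PCI_COMMAND_IO))
    pci_write_config_word(dev, PCI_COMMAND,
                          orig_cmd & ~(PCI_COMMAND_MEMORY | PCI_COMMAND_IO));

/* ... the l | mask write, sz read, and restore of l happen here ... */

pci_write_config_word(dev, PCI_COMMAND, orig_cmd); /* restore decode */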


To solve this issue for older kernel versions, the submitted patch is still
relevant.