Re: [Qemu-devel] [Xen-devel] [BUG 1747] Guest couldn't find bootable device with memory more than 3600M

2013-06-14 Thread George Dunlap
On Thu, Jun 13, 2013 at 6:22 PM, Ian Campbell ian.campb...@citrix.com wrote:
 On Thu, 2013-06-13 at 17:55 +0100, Stefano Stabellini wrote:

   We could have a xenstore flag somewhere that enables the old behaviour
   so that people can revert back to qemu-xen-traditional and make the pci
   hole below 4G even bigger than 448MB, but I think that keeping the old
   behaviour around is going to make the code more difficult to maintain.
 
  The downside of that is that things which worked with the old scheme may
  not work with the new one though. Early in a release cycle when we have
  time to discover what has broken then that might be OK, but is post rc4
  really the time to be risking it?

 Yes, you are right: there are some scenarios that would have worked
 before that wouldn't work anymore with the new scheme.
 Are they important enough to have a workaround, pretty difficult to
 identify for a user?

 That question would be reasonable early in the development cycle. At rc4
 the question should be: do we think this problem is so critical that we
 want to risk breaking something else which currently works for people.

 Remember that we are invalidating whatever passthrough testing people
 have already done up to this point of the release.

 It is also worth noting that the things which this change ends up
 breaking may for all we know be equally difficult for a user to identify
 (they are after all approximately the same class of issue).

 The problem here is that the risk is difficult to evaluate, we just
 don't know what will break with this change, and we don't know therefore
 if the cure is worse than the disease. The conservative approach at this
 point in the release would be to not change anything, or to change the
 minimal possible number of things (which would preclude changes which
 impact qemu-trad IMHO).



 WRT "pretty difficult to identify" -- the root of this thread suggests the
 guest entered a reboot loop with "No bootable device", which sounds
 eminently release-notable to me. I also note that it was changing the
 size of the PCI hole which caused the issue -- which does somewhat
 underscore the risks involved in this sort of change.

But that bug was a bug in the first attempt to fix the root problem.
The root problem shows up as qemu crashing at some point because it
tried to access invalid guest gpfn space; see
http://lists.xen.org/archives/html/xen-devel/2013-03/msg00559.html.

Stefano tried to fix it with the above patch, just changing the hole
to start at 0xe0000000; but that was incomplete, as it didn't match
hvmloader's and SeaBIOS's view of the world.  That's what this bug
report is about.  This thread is an attempt to find a better fix.

So the root problem is that if we revert this patch, and someone
passes through a pci device using qemu-xen (the default) and the MMIO
hole is resized, at some point in the future qemu will randomly die.

If it's a choice between users experiencing "My VM randomly crashes"
and experiencing "I tried to pass through this device but the guest
OS doesn't see it", I'd rather choose the latter.

 -George



Re: [Qemu-devel] [Xen-devel] [BUG 1747] Guest couldn't find bootable device with memory more than 3600M

2013-06-14 Thread Ian Campbell
On Fri, 2013-06-14 at 11:53 +0100, George Dunlap wrote:
 On Thu, Jun 13, 2013 at 6:22 PM, Ian Campbell ian.campb...@citrix.com wrote:
  On Thu, 2013-06-13 at 17:55 +0100, Stefano Stabellini wrote:
 
We could have a xenstore flag somewhere that enables the old behaviour
so that people can revert back to qemu-xen-traditional and make the pci
hole below 4G even bigger than 448MB, but I think that keeping the old
behaviour around is going to make the code more difficult to maintain.
  
   The downside of that is that things which worked with the old scheme may
   not work with the new one though. Early in a release cycle when we have
   time to discover what has broken then that might be OK, but is post rc4
   really the time to be risking it?
 
  Yes, you are right: there are some scenarios that would have worked
  before that wouldn't work anymore with the new scheme.
  Are they important enough to have a workaround, pretty difficult to
  identify for a user?
 
  That question would be reasonable early in the development cycle. At rc4
  the question should be: do we think this problem is so critical that we
  want to risk breaking something else which currently works for people.
 
  Remember that we are invalidating whatever passthrough testing people
  have already done up to this point of the release.
 
  It is also worth noting that the things which this change ends up
  breaking may for all we know be equally difficult for a user to identify
  (they are after all approximately the same class of issue).
 
  The problem here is that the risk is difficult to evaluate, we just
  don't know what will break with this change, and we don't know therefore
  if the cure is worse than the disease. The conservative approach at this
  point in the release would be to not change anything, or to change the
  minimal possible number of things (which would preclude changes which
  impact qemu-trad IMHO).
 
 
 
  WRT pretty difficult to identify -- the root of this thread suggests the
  guest entered a reboot loop with No bootable device, that sounds
  eminently release notable to me. I also not that it was changing the
  size of the PCI hole which caused the issue -- which does somewhat
  underscore the risks involved in this sort of change.
 
 But that bug was a bug in the first attempt to fix the root problem.
 The root problem shows up as qemu crashing at some point because it
 tried to access invalid guest gpfn space; see
 http://lists.xen.org/archives/html/xen-devel/2013-03/msg00559.html.
 
 Stefano tried to fix it with the above patch, just changing the hole
 to start at 0xe; but that was incomplete, as it didn't match with
 hvmloader and seabios's view of the world.  That's what this bug
 report is about.  This thread is an attempt to find a better fix.
 
 So the root problem is that if we revert this patch, and someone
 passes through a pci device using qemu-xen (the default) and the MMIO
 hole is resized, at some point in the future qemu will randomly die.

Right, I see, thanks for explaining.

 If it's a choice between users experiencing, My VM randomly crashes
 and experiencing, I tried to pass through this device but the guest
 OS doesn't see it, I'd rather choose the latter.

All other things being equal, obviously we all would. But the point I've
been trying to make is that we don't know the other consequences of
making that fix -- e.g. on existing working configurations. So the
choice is between "some VMs randomly crash, but other stuff works fine
and we have had a reasonable amount of user testing" and "those
particular VMs don't crash any more, but we don't know what other stuff
no longer works and the existing test base has been at least partially
invalidated".

I think that post rc4 in a release we ought to be pretty conservative
about the risks of this sort of change, especially wrt invalidating
testing and the unknowns involved.

Aren't the configurations which might trip over this issue going to
be in the minority compared to those which we risk breaking?

Ian.





Re: [Qemu-devel] [Xen-devel] [BUG 1747] Guest couldn't find bootable device with memory more than 3600M

2013-06-14 Thread George Dunlap

On 14/06/13 12:34, Ian Campbell wrote:

On Fri, 2013-06-14 at 11:53 +0100, George Dunlap wrote:

On Thu, Jun 13, 2013 at 6:22 PM, Ian Campbell ian.campb...@citrix.com wrote:

On Thu, 2013-06-13 at 17:55 +0100, Stefano Stabellini wrote:


We could have a xenstore flag somewhere that enables the old behaviour
so that people can revert back to qemu-xen-traditional and make the pci
hole below 4G even bigger than 448MB, but I think that keeping the old
behaviour around is going to make the code more difficult to maintain.

The downside of that is that things which worked with the old scheme may
not work with the new one though. Early in a release cycle when we have
time to discover what has broken then that might be OK, but is post rc4
really the time to be risking it?

Yes, you are right: there are some scenarios that would have worked
before that wouldn't work anymore with the new scheme.
Are they important enough to have a workaround, pretty difficult to
identify for a user?

That question would be reasonable early in the development cycle. At rc4
the question should be: do we think this problem is so critical that we
want to risk breaking something else which currently works for people.

Remember that we are invalidating whatever passthrough testing people
have already done up to this point of the release.

It is also worth noting that the things which this change ends up
breaking may for all we know be equally difficult for a user to identify
(they are after all approximately the same class of issue).

The problem here is that the risk is difficult to evaluate, we just
don't know what will break with this change, and we don't know therefore
if the cure is worse than the disease. The conservative approach at this
point in the release would be to not change anything, or to change the
minimal possible number of things (which would preclude changes which
impact qemu-trad IMHO).




WRT pretty difficult to identify -- the root of this thread suggests the
guest entered a reboot loop with No bootable device, that sounds
eminently release notable to me. I also not that it was changing the
size of the PCI hole which caused the issue -- which does somewhat
underscore the risks involved in this sort of change.

But that bug was a bug in the first attempt to fix the root problem.
The root problem shows up as qemu crashing at some point because it
tried to access invalid guest gpfn space; see
http://lists.xen.org/archives/html/xen-devel/2013-03/msg00559.html.

Stefano tried to fix it with the above patch, just changing the hole
to start at 0xe; but that was incomplete, as it didn't match with
hvmloader and seabios's view of the world.  That's what this bug
report is about.  This thread is an attempt to find a better fix.

So the root problem is that if we revert this patch, and someone
passes through a pci device using qemu-xen (the default) and the MMIO
hole is resized, at some point in the future qemu will randomly die.

Right, I see, thanks for explaining.


If it's a choice between users experiencing, My VM randomly crashes
and experiencing, I tried to pass through this device but the guest
OS doesn't see it, I'd rather choose the latter.

All other things being equal, obviously we all would. But the point I've
been trying to make is that we don't know the other consequences of
making that fix -- e.g. on existing working configurations. So the
choice is some VMs randomly crash, but other stuff works fine and we
have had a reasonable amount of user testing and those particular VMs
don't crash any more, but we don't know what other stuff no longer works
and the existing test base has been at least partially invalidated.

I think that at post rc4 in a release we ought to be being pretty
conservative about the risks of this sort of change, especially wrt
invalidating testing and the unknowns involved.

Aren't the configurations which might trip over this issue are going to
be in the minority compared to those which we risk breaking?


So here are the technical proposals we've been discussing, each of
which has different risks.


1. Set the default MMIO hole to start at 0xe0000000.
2. If possible, relocate PCI devices that don't fit in the hole to the
64-bit hole.
 - Here "if possible" will mean a) the device has a 64-bit BAR, and b)
this hasn't been disabled by libxl (probably via a xenstore key).

3. If possible, resize the MMIO hole; otherwise refuse to map the device.
 - Currently "if possible" is always true; the new thing here would be
making it possible for libxl to disable this, probably via a xenstore
key (see the sketch below).
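
To make the interplay between #2 and #3 concrete, here is a minimal,
self-contained sketch of the kind of placement logic being discussed.
It is illustrative only: the struct, the allow_resize/allow_relocate
flags (which libxl would really communicate via a xenstore key) and the
example BAR sizes are assumptions, not hvmloader's actual code.

/* Hypothetical sketch combining proposals #2 and #3 -- not hvmloader code. */
#include <stdio.h>
#include <stdint.h>
#include <stdbool.h>

#define PCI_MEM_END 0xfc000000u   /* assumed top of the below-4G MMIO hole */

struct bar { uint64_t size; bool is_64bit; };

static void place_bars(struct bar *bars, int n, uint32_t pci_mem_start,
                       bool allow_resize, bool allow_relocate)
{
    uint64_t below_4g = 0;
    for (int i = 0; i < n; i++)
        below_4g += bars[i].size;

    /* Proposal #3: grow the hole downwards (0xf0000000 -> 0xe0000000 -> ...). */
    while (allow_resize && below_4g > PCI_MEM_END - pci_mem_start &&
           (uint32_t)(pci_mem_start << 1) != 0)
        pci_mem_start <<= 1;

    /* Proposal #2: push 64-bit-capable BARs above 4G until the rest fits. */
    for (int i = 0; i < n && below_4g > PCI_MEM_END - pci_mem_start; i++)
        if (allow_relocate && bars[i].is_64bit) {
            below_4g -= bars[i].size;
            printf("BAR %d (%llu MB) relocated above 4G\n",
                   i, (unsigned long long)(bars[i].size >> 20));
        }

    if (below_4g > PCI_MEM_END - pci_mem_start)
        printf("error: not everything fits below 4G; some device(s) unusable\n");
    else
        printf("hole starts at 0x%08x with %llu MB of BARs below 4G\n",
               pci_mem_start, (unsigned long long)(below_4g >> 20));
}

int main(void)
{
    /* Example: a 512MB 64-bit GPU BAR plus a 256MB 32-bit-only BAR. */
    struct bar bars[] = { { 512u << 20, true }, { 256u << 20, false } };
    place_bars(bars, 2, 0xe0000000u,
               /*allow_resize=*/false, /*allow_relocate=*/true);
    return 0;
}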


Each of these will have different risks for qemu-traditional and qemu-xen.

Implementing #3 would have no risk for qemu-traditional, because we 
won't be changing the way anything works; what works will still work, 
what is broken (if anything) will still be broken.


Implementing #3 for qemu-xen only trades one kind of failure for
another.  If you resize the MMIO hole for qemu-xen, then you *will* 

Re: [Qemu-devel] [Xen-devel] [BUG 1747] Guest couldn't find bootable device with memory more than 3600M

2013-06-14 Thread George Dunlap

On 14/06/13 15:14, George Dunlap wrote:

On 14/06/13 12:34, Ian Campbell wrote:

On Fri, 2013-06-14 at 11:53 +0100, George Dunlap wrote:
On Thu, Jun 13, 2013 at 6:22 PM, Ian Campbell 
ian.campb...@citrix.com wrote:

On Thu, 2013-06-13 at 17:55 +0100, Stefano Stabellini wrote:

We could have a xenstore flag somewhere that enables the old 
behaviour
so that people can revert back to qemu-xen-traditional and make 
the pci
hole below 4G even bigger than 448MB, but I think that keeping 
the old
behaviour around is going to make the code more difficult to 
maintain.
The downside of that is that things which worked with the old 
scheme may
not work with the new one though. Early in a release cycle when 
we have
time to discover what has broken then that might be OK, but is 
post rc4

really the time to be risking it?

Yes, you are right: there are some scenarios that would have worked
before that wouldn't work anymore with the new scheme.
Are they important enough to have a workaround, pretty difficult to
identify for a user?
That question would be reasonable early in the development cycle. 
At rc4
the question should be: do we think this problem is so critical 
that we

want to risk breaking something else which currently works for people.

Remember that we are invalidating whatever passthrough testing people
have already done up to this point of the release.

It is also worth noting that the things which this change ends up
breaking may for all we know be equally difficult for a user to 
identify

(they are after all approximately the same class of issue).

The problem here is that the risk is difficult to evaluate, we just
don't know what will break with this change, and we don't know 
therefore
if the cure is worse than the disease. The conservative approach at 
this

point in the release would be to not change anything, or to change the
minimal possible number of things (which would preclude changes which
impact qemu-trad IMHO).



WRT pretty difficult to identify -- the root of this thread 
suggests the

guest entered a reboot loop with No bootable device, that sounds
eminently release notable to me. I also not that it was changing the
size of the PCI hole which caused the issue -- which does somewhat
underscore the risks involved in this sort of change.

But that bug was a bug in the first attempt to fix the root problem.
The root problem shows up as qemu crashing at some point because it
tried to access invalid guest gpfn space; see
http://lists.xen.org/archives/html/xen-devel/2013-03/msg00559.html.

Stefano tried to fix it with the above patch, just changing the hole
to start at 0xe; but that was incomplete, as it didn't match with
hvmloader and seabios's view of the world.  That's what this bug
report is about.  This thread is an attempt to find a better fix.

So the root problem is that if we revert this patch, and someone
passes through a pci device using qemu-xen (the default) and the MMIO
hole is resized, at some point in the future qemu will randomly die.

Right, I see, thanks for explaining.


If it's a choice between users experiencing, My VM randomly crashes
and experiencing, I tried to pass through this device but the guest
OS doesn't see it, I'd rather choose the latter.

All other things being equal, obviously we all would. But the point I've
been trying to make is that we don't know the other consequences of
making that fix -- e.g. on existing working configurations. So the
choice is some VMs randomly crash, but other stuff works fine and we
have had a reasonable amount of user testing and those particular VMs
don't crash any more, but we don't know what other stuff no longer works
and the existing test base has been at least partially invalidated.

I think that at post rc4 in a release we ought to be being pretty
conservative about the risks of this sort of change, especially wrt
invalidating testing and the unknowns involved.

Aren't the configurations which might trip over this issue are going to
be in the minority compared to those which we risk breaking?


So there are the technical proposals we've been discussing, each of 
which has different risks.


1. Set the default MMIO hole size to 0xe000.
2. If possible, relocate PCI devices that don't fit in the hole to the 
64-bit hole.
 - Here if possible will mean a) the device has a 64-bit BAR, and b) 
this hasn't been disabled by libxl (probably via a xenstore key).

3. If possible, resize the MMIO hole; otherwise refuse to map the device
 - Currently if possible is always true; the new thing here would be 
making it possible for libxl to disable this, probably via a xenstore 
key.


Each of these will have different risks for qemu-traditional and 
qemu-xen.


Implementing #3 would have no risk for qemu-traditional, because we 
won't be changing the way anything works; what works will still work, 
what is broken (if anything) will still be broken.


Implementing #3 for qemu-xen only changes one kind of failure for 
another.  If you 

Re: [Qemu-devel] [Xen-devel] [BUG 1747] Guest couldn't find bootable device with memory more than 3600M

2013-06-13 Thread Stefano Stabellini
On Wed, 12 Jun 2013, George Dunlap wrote:
 On 12/06/13 08:25, Jan Beulich wrote:
 On 11.06.13 at 19:26, Stefano Stabellini
 stefano.stabell...@eu.citrix.com wrote:
   I went through the code that maps the PCI MMIO regions in hvmloader
   (tools/firmware/hvmloader/pci.c:pci_setup) and it looks like it already
   maps the PCI region to high memory if the PCI bar is 64-bit and the MMIO
   region is larger than 512MB.
   
   Maybe we could just relax this condition and map the device memory to
   high memory no matter the size of the MMIO region if the PCI bar is
   64-bit?
  I can only recommend not to: For one, guests not using PAE or
  PSE-36 can't map such space at all (and older OSes may not
  properly deal with 64-bit BARs at all). And then one would generally
  expect this allocation to be done top down (to minimize risk of
  running into RAM), and doing so is going to present further risks of
  incompatibilities with guest OSes (Linux for example learned only in
  2.6.36 that PFNs in ioremap() can exceed 32 bits, but even in
  3.10-rc5 ioremap_pte_range(), while using u64 pfn, passes the
  PFN to pfn_pte(), the respective parameter of which is
  unsigned long).
  
  I think this ought to be done in an iterative process - if all MMIO
  regions together don't fit below 4G, the biggest one should be
  moved up beyond 4G first, followed by the next to biggest one
  etc.
 
 First of all, the proposal to move the PCI BAR up to the 64-bit range is a
 temporary work-around.  It should only be done if a device doesn't fit in the
 current MMIO range.
 
 We have four options here:
 1. Don't do anything
 2. Have hvmloader move PCI devices up to the 64-bit MMIO hole if they don't
 fit
 3. Convince qemu to allow MMIO regions to mask memory (or what it thinks is
 memory).
 4. Add a mechanism to tell qemu that memory is being relocated.
 
 Number 4 is definitely the right answer long-term, but we just don't have time
 to do that before the 4.3 release.  We're not sure yet if #3 is possible; even
 if it is, it may have unpredictable knock-on effects.
 
 Doing #2, it is true that many guests will be unable to access the device
 because of 32-bit limitations.  However, in #1, *no* guests will be able to
 access the device.  At least in #2, *many* guests will be able to do so.  In
 any case, apparently #2 is what KVM does, so having the limitation on guests
 is not without precedent.  It's also likely to be a somewhat tested
 configuration (unlike #3, for example).

I would avoid #3, because I don't think it is a good idea to rely on that
behaviour.
I would also avoid #4 because, having seen QEMU's code, it wouldn't be
easy and certainly not doable in time for 4.3.

So we are left to play with the PCI MMIO region size and location in
hvmloader.

I agree with Jan that we shouldn't relocate unconditionally all the
devices to the region above 4G. I meant to say that we should relocate
only the ones that don't fit. And we shouldn't try to dynamically
increase the PCI hole below 4G because clearly that doesn't work.
However we could still increase the size of the PCI hole below 4G by
default, from starting at 0xf0000000 to starting at 0xe0000000.
Why do we know that is safe? Because in the current configuration
hvmloader *already* increases the PCI hole size by decreasing the start
address every time a device doesn't fit.
So it's already common for hvmloader to set pci_mem_start to
0xe0000000; you just need to assign a device with BARs big
enough.
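
In code terms, that existing behaviour boils down to something like the
following simplified sketch (assuming the hole is bounded above at
0xfc000000; this is an illustration, not a verbatim copy of
tools/firmware/hvmloader/pci.c):

#include <stdio.h>
#include <stdint.h>

/*
 * Simplified sketch: if the BARs don't fit in the default hole
 * [pci_mem_start, pci_mem_end), hvmloader lowers the start address
 * (0xf0000000 -> 0xe0000000 -> 0xc0000000 -> ...) until they do.
 */
int main(void)
{
    uint32_t pci_mem_start = 0xf0000000u;       /* default hole start */
    const uint32_t pci_mem_end = 0xfc000000u;   /* assumed hole end   */
    uint64_t mmio_total = 400u << 20;           /* example: 400MB of BARs */

    while (mmio_total > (uint64_t)(pci_mem_end - pci_mem_start) &&
           (uint32_t)(pci_mem_start << 1) != 0)
        pci_mem_start <<= 1;  /* shifting out the top bit lowers the start */

    printf("pci_mem_start = 0x%08x\n", pci_mem_start);  /* 0xe0000000 here */
    return 0;
}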


My proposed solution is:

- set 0xe0000000 as the default PCI hole start for everybody, including
qemu-xen-traditional
- move above 4G everything that doesn't fit and supports 64-bit BARs
- print an error if a device doesn't fit and doesn't support 64-bit
BARs



Re: [Qemu-devel] [Xen-devel] [BUG 1747] Guest couldn't find bootable device with memory more than 3600M

2013-06-13 Thread George Dunlap

On 13/06/13 14:44, Stefano Stabellini wrote:

On Wed, 12 Jun 2013, George Dunlap wrote:

On 12/06/13 08:25, Jan Beulich wrote:

On 11.06.13 at 19:26, Stefano Stabellini
stefano.stabell...@eu.citrix.com wrote:

I went through the code that maps the PCI MMIO regions in hvmloader
(tools/firmware/hvmloader/pci.c:pci_setup) and it looks like it already
maps the PCI region to high memory if the PCI bar is 64-bit and the MMIO
region is larger than 512MB.

Maybe we could just relax this condition and map the device memory to
high memory no matter the size of the MMIO region if the PCI bar is
64-bit?

I can only recommend not to: For one, guests not using PAE or
PSE-36 can't map such space at all (and older OSes may not
properly deal with 64-bit BARs at all). And then one would generally
expect this allocation to be done top down (to minimize risk of
running into RAM), and doing so is going to present further risks of
incompatibilities with guest OSes (Linux for example learned only in
2.6.36 that PFNs in ioremap() can exceed 32 bits, but even in
3.10-rc5 ioremap_pte_range(), while using u64 pfn, passes the
PFN to pfn_pte(), the respective parameter of which is
unsigned long).

I think this ought to be done in an iterative process - if all MMIO
regions together don't fit below 4G, the biggest one should be
moved up beyond 4G first, followed by the next to biggest one
etc.

First of all, the proposal to move the PCI BAR up to the 64-bit range is a
temporary work-around.  It should only be done if a device doesn't fit in the
current MMIO range.

We have three options here:
1. Don't do anything
2. Have hvmloader move PCI devices up to the 64-bit MMIO hole if they don't
fit
3. Convince qemu to allow MMIO regions to mask memory (or what it thinks is
memory).
4. Add a mechanism to tell qemu that memory is being relocated.

Number 4 is definitely the right answer long-term, but we just don't have time
to do that before the 4.3 release.  We're not sure yet if #3 is possible; even
if it is, it may have unpredictable knock-on effects.

Doing #2, it is true that many guests will be unable to access the device
because of 32-bit limitations.  However, in #1, *no* guests will be able to
access the device.  At least in #2, *many* guests will be able to do so.  In
any case, apparently #2 is what KVM does, so having the limitation on guests
is not without precedent.  It's also likely to be a somewhat tested
configuration (unlike #3, for example).

I would avoid #3, because I don't think is a good idea to rely on that
behaviour.
I would also avoid #4, because having seen QEMU's code, it's wouldn't be
easy and certainly not doable in time for 4.3.

So we are left to play with the PCI MMIO region size and location in
hvmloader.

I agree with Jan that we shouldn't relocate unconditionally all the
devices to the region above 4G. I meant to say that we should relocate
only the ones that don't fit. And we shouldn't try to dynamically
increase the PCI hole below 4G because clearly that doesn't work.
However we could still increase the size of the PCI hole below 4G by
default from starting at 0xf0000000 to starting at 0xe0000000.
Why do we know that is safe? Because in the current configuration
hvmloader *already* increases the PCI hole size by decreasing the start
address every time a device doesn't fit.
So it's already common for hvmloader to set pci_mem_start to
0xe0000000, you just need to assign a device with a PCI hole size big
enough.


My proposed solution is:

- set 0xe0000000 as the default PCI hole start for everybody, including
qemu-xen-traditional
- move above 4G everything that doesn't fit and support 64-bit bars
- print an error if the device doesn't fit and doesn't support 64-bit
bars


Also, as I understand it, at the moment:
1. Some operating systems (32-bit XP) won't be able to use relocated devices
2. Some devices (without 64-bit BARs) can't be relocated
3. qemu-traditional is fine with a resized below-4GiB MMIO hole.

So if we have #1 or #2, at the moment an option for a work-around is to 
use qemu-traditional.


However, if we add your "print an error if the device doesn't fit", then
this option will go away -- this will be a regression in functionality
from 4.2.


I thought that what we had proposed was to have an option in xenstore,
that libxl would set, which would instruct hvmloader whether to expand
the MMIO hole and whether to relocate devices into the 64-bit hole?
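
As a sketch of what that could look like -- the key paths and the stub
helper below are hypothetical, not an existing hvmloader or libxl
interface -- hvmloader would consult keys written by libxl before
deciding whether to expand the hole or relocate BARs:

/* Hypothetical sketch: hvmloader consulting libxl-written xenstore keys.   */
/* The key paths and the stub below are illustrative, not a real interface. */
#include <stdio.h>
#include <string.h>
#include <stdbool.h>

static const char *xenstore_read_default(const char *path, const char *def)
{
    (void)path;   /* real code would read the value libxl wrote for the domain */
    return def;
}

static bool allow_hole_expand(void)
{
    return strcmp(xenstore_read_default("hvmloader/pci/allow-hole-expand",
                                        "1"), "1") == 0;
}

static bool allow_bar_relocate(void)
{
    return strcmp(xenstore_read_default("hvmloader/pci/allow-bar-relocate",
                                        "1"), "1") == 0;
}

int main(void)
{
    printf("expand MMIO hole: %d, relocate BARs above 4G: %d\n",
           allow_hole_expand(), allow_bar_relocate());
    return 0;
}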


 -George



Re: [Qemu-devel] [Xen-devel] [BUG 1747] Guest couldn't find bootable device with memory more than 3600M

2013-06-13 Thread Paolo Bonzini
On 13/06/2013 09:54, George Dunlap wrote:
 
 Also, as I understand it, at the moment:
 1. Some operating systems (32-bit XP) won't be able to use relocated
 devices
 2. Some devices (without 64-bit BARs) can't be relocated

Are there really devices with huge 32-bit BARs?  I think #1 is the only
real problem; so far, though, it has never been one for KVM.

SeaBIOS sorts the BARs from smallest to largest alignment, and then from
smallest to largest size.  Typically only the GPU will be relocated.
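
As an illustration of that ordering (not SeaBIOS's actual code), a
comparator that sorts example BARs by alignment and then by size could
look like this; the largest BAR ends up last, so it is the first
candidate to spill above 4G when the hole runs out:

#include <stdio.h>
#include <stdlib.h>
#include <stdint.h>

struct bar { uint64_t align, size; };

/* Order BARs by alignment, then by size (both ascending), as described. */
static int bar_cmp(const void *a, const void *b)
{
    const struct bar *x = a, *y = b;
    if (x->align != y->align)
        return x->align < y->align ? -1 : 1;
    if (x->size != y->size)
        return x->size < y->size ? -1 : 1;
    return 0;
}

int main(void)
{
    /* Example BARs: a small NIC BAR, an option ROM, and a large GPU BAR. */
    struct bar bars[] = {
        { 1u << 17, 1u << 17 },       /* 128KB */
        { 1u << 16, 1u << 16 },       /* 64KB  */
        { 1u << 28, 1u << 28 },       /* 256MB */
    };
    qsort(bars, 3, sizeof(bars[0]), bar_cmp);
    for (int i = 0; i < 3; i++)
        printf("align=%llu size=%llu\n",
               (unsigned long long)bars[i].align,
               (unsigned long long)bars[i].size);
    return 0;
}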

Paolo

 3. qemu-traditional is fine with a resized 4GiB MMIO hole.
 
 So if we have #1 or #2, at the moment an option for a work-around is to
 use qemu-traditional.
 
 However, if we add your print an error if the device doesn't fit, then
 this option will go away -- this will be a regression in functionality
 from 4.2.
 
 I thought that what we had proposed was to have an option in xenstore,
 that libxl would set, which would instruct hvmloader whether to expand
 the MMIO hole and whether to relocate devices above 64-bit?




Re: [Qemu-devel] [Xen-devel] [BUG 1747] Guest couldn't find bootable device with memory more than 3600M

2013-06-13 Thread Stefano Stabellini
On Thu, 13 Jun 2013, George Dunlap wrote:
 On 13/06/13 14:44, Stefano Stabellini wrote:
  On Wed, 12 Jun 2013, George Dunlap wrote:
   On 12/06/13 08:25, Jan Beulich wrote:
   On 11.06.13 at 19:26, Stefano Stabellini
   stefano.stabell...@eu.citrix.com wrote:
 I went through the code that maps the PCI MMIO regions in hvmloader
 (tools/firmware/hvmloader/pci.c:pci_setup) and it looks like it
 already
 maps the PCI region to high memory if the PCI bar is 64-bit and the
 MMIO
 region is larger than 512MB.
 
 Maybe we could just relax this condition and map the device memory to
 high memory no matter the size of the MMIO region if the PCI bar is
 64-bit?
I can only recommend not to: For one, guests not using PAE or
PSE-36 can't map such space at all (and older OSes may not
properly deal with 64-bit BARs at all). And then one would generally
expect this allocation to be done top down (to minimize risk of
running into RAM), and doing so is going to present further risks of
incompatibilities with guest OSes (Linux for example learned only in
2.6.36 that PFNs in ioremap() can exceed 32 bits, but even in
3.10-rc5 ioremap_pte_range(), while using u64 pfn, passes the
PFN to pfn_pte(), the respective parameter of which is
unsigned long).

I think this ought to be done in an iterative process - if all MMIO
regions together don't fit below 4G, the biggest one should be
moved up beyond 4G first, followed by the next to biggest one
etc.
   First of all, the proposal to move the PCI BAR up to the 64-bit range is a
   temporary work-around.  It should only be done if a device doesn't fit in
   the
   current MMIO range.
   
   We have three options here:
   1. Don't do anything
   2. Have hvmloader move PCI devices up to the 64-bit MMIO hole if they
   don't
   fit
   3. Convince qemu to allow MMIO regions to mask memory (or what it thinks
   is
   memory).
   4. Add a mechanism to tell qemu that memory is being relocated.
   
   Number 4 is definitely the right answer long-term, but we just don't have
   time
   to do that before the 4.3 release.  We're not sure yet if #3 is possible;
   even
   if it is, it may have unpredictable knock-on effects.
   
   Doing #2, it is true that many guests will be unable to access the device
   because of 32-bit limitations.  However, in #1, *no* guests will be able
   to
   access the device.  At least in #2, *many* guests will be able to do so.
   In
   any case, apparently #2 is what KVM does, so having the limitation on
   guests
   is not without precedent.  It's also likely to be a somewhat tested
   configuration (unlike #3, for example).
  I would avoid #3, because I don't think is a good idea to rely on that
  behaviour.
  I would also avoid #4, because having seen QEMU's code, it's wouldn't be
  easy and certainly not doable in time for 4.3.
  
  So we are left to play with the PCI MMIO region size and location in
  hvmloader.
  
  I agree with Jan that we shouldn't relocate unconditionally all the
  devices to the region above 4G. I meant to say that we should relocate
  only the ones that don't fit. And we shouldn't try to dynamically
  increase the PCI hole below 4G because clearly that doesn't work.
  However we could still increase the size of the PCI hole below 4G by
  default from start at 0xf000 to starting at 0xe000.
  Why do we know that is safe? Because in the current configuration
  hvmloader *already* increases the PCI hole size by decreasing the start
  address every time a device doesn't fit.
  So it's already common for hvmloader to set pci_mem_start to
  0xe000, you just need to assign a device with a PCI hole size big
  enough.
  
  
  My proposed solution is:
  
  - set 0xe000 as the default PCI hole start for everybody, including
  qemu-xen-traditional
  - move above 4G everything that doesn't fit and support 64-bit bars
  - print an error if the device doesn't fit and doesn't support 64-bit
  bars
 
 Also, as I understand it, at the moment:
 1. Some operating systems (32-bit XP) won't be able to use relocated devices
 2. Some devices (without 64-bit BARs) can't be relocated
 3. qemu-traditional is fine with a resized 4GiB MMIO hole.
 
 So if we have #1 or #2, at the moment an option for a work-around is to use
 qemu-traditional.
 
 However, if we add your print an error if the device doesn't fit, then this
 option will go away -- this will be a regression in functionality from 4.2.

Keep in mind that if we start the pci hole at 0xe0000000, the number of
cases for which any workarounds are needed is going to be dramatically
decreased, to the point that I don't think we need a workaround anymore.

The algorithm is going to work like this, in detail:

- the pci hole size is set to 0xfc000000 - 0xe0000000 = 448MB
- we calculate the total mmio size; if it's bigger than the pci hole we
raise a 64 bit relocation flag
- if the 64 bit relocation 

Re: [Qemu-devel] [Xen-devel] [BUG 1747] Guest couldn't find bootable device with memory more than 3600M

2013-06-13 Thread Jan Beulich
 On 13.06.13 at 16:50, Stefano Stabellini stefano.stabell...@eu.citrix.com 
 wrote:
 The algorithm is going to work like this in details:
 
 - the pci hole size is set to 0xfc000000 - 0xe0000000 = 448MB
 - we calculate the total mmio size, if it's bigger than the pci hole we
 raise a 64 bit relocation flag
 - if the 64 bit relocation is enabled, we relocate above 4G the first
 device that is 64-bit capable and has an MMIO size greater or equal to
 512MB
 - if the pci hole size is now big enough for the remaining devices we
 stop the above 4G relocation, otherwise keep relocating devices that are
 64 bit capable and have an MMIO size greater or equal to 512MB
 - if one or more devices don't fit we print an error and continue (it's
 not a critical failure, one device won't be used)

Devices with 512MB BARs won't fit in a 448MB hole in any case,
so there's no point in trying. Any such BARs need to be relocated.
Then for 256MB BARs, you could see whether there's just one and it fits.
Else relocate it, and all the others (except for perhaps one). Then
halve the size again and start over, etc.
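
A rough, self-contained sketch of the size-class iteration being
described (the BAR list and helpers are assumptions for illustration,
not hvmloader code):

#include <stdio.h>
#include <stdint.h>
#include <stdbool.h>

struct bar { uint64_t size; bool is_64bit; bool relocated; };

/*
 * Walk BAR size classes from 512MB downwards, relocating 64-bit-capable
 * BARs of each class above 4G until what remains fits in the hole.
 */
static void relocate_by_size_class(struct bar *bars, int n, uint64_t hole)
{
    uint64_t below_4g = 0;
    for (int i = 0; i < n; i++)
        below_4g += bars[i].size;

    for (uint64_t cls = 512u << 20; cls >= (1u << 20); cls >>= 1) {
        for (int i = 0; i < n; i++) {
            if (below_4g <= hole)
                return;                        /* everything left fits */
            if (bars[i].relocated || !bars[i].is_64bit || bars[i].size < cls)
                continue;
            bars[i].relocated = true;          /* move this BAR above 4G */
            below_4g -= bars[i].size;
        }
    }
    if (below_4g > hole)
        printf("warning: %llu MB of BARs still do not fit below 4G\n",
               (unsigned long long)(below_4g >> 20));
}

int main(void)
{
    struct bar bars[] = {
        { 512u << 20, true,  false },   /* e.g. a GPU BAR  */
        { 256u << 20, true,  false },
        {  16u << 20, false, false },   /* 32-bit-only BAR */
    };
    relocate_by_size_class(bars, 3, 448u << 20);
    for (int i = 0; i < 3; i++)
        printf("BAR %d: %s\n", i, bars[i].relocated ? "above 4G" : "below 4G");
    return 0;
}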

Jan




Re: [Qemu-devel] [Xen-devel] [BUG 1747] Guest couldn't find bootable device with memory more than 3600M

2013-06-13 Thread Ian Campbell
On Thu, 2013-06-13 at 14:54 +0100, George Dunlap wrote:
 On 13/06/13 14:44, Stefano Stabellini wrote:
  On Wed, 12 Jun 2013, George Dunlap wrote:
  On 12/06/13 08:25, Jan Beulich wrote:
  On 11.06.13 at 19:26, Stefano Stabellini
  stefano.stabell...@eu.citrix.com wrote:
  I went through the code that maps the PCI MMIO regions in hvmloader
  (tools/firmware/hvmloader/pci.c:pci_setup) and it looks like it already
  maps the PCI region to high memory if the PCI bar is 64-bit and the MMIO
  region is larger than 512MB.
 
  Maybe we could just relax this condition and map the device memory to
  high memory no matter the size of the MMIO region if the PCI bar is
  64-bit?
  I can only recommend not to: For one, guests not using PAE or
  PSE-36 can't map such space at all (and older OSes may not
  properly deal with 64-bit BARs at all). And then one would generally
  expect this allocation to be done top down (to minimize risk of
  running into RAM), and doing so is going to present further risks of
  incompatibilities with guest OSes (Linux for example learned only in
  2.6.36 that PFNs in ioremap() can exceed 32 bits, but even in
  3.10-rc5 ioremap_pte_range(), while using u64 pfn, passes the
  PFN to pfn_pte(), the respective parameter of which is
  unsigned long).
 
  I think this ought to be done in an iterative process - if all MMIO
  regions together don't fit below 4G, the biggest one should be
  moved up beyond 4G first, followed by the next to biggest one
  etc.
  First of all, the proposal to move the PCI BAR up to the 64-bit range is a
  temporary work-around.  It should only be done if a device doesn't fit in 
  the
  current MMIO range.
 
  We have three options here:
  1. Don't do anything
  2. Have hvmloader move PCI devices up to the 64-bit MMIO hole if they don't
  fit
  3. Convince qemu to allow MMIO regions to mask memory (or what it thinks is
  memory).
  4. Add a mechanism to tell qemu that memory is being relocated.
 
  Number 4 is definitely the right answer long-term, but we just don't have 
  time
  to do that before the 4.3 release.  We're not sure yet if #3 is possible; 
  even
  if it is, it may have unpredictable knock-on effects.
 
  Doing #2, it is true that many guests will be unable to access the device
  because of 32-bit limitations.  However, in #1, *no* guests will be able to
  access the device.  At least in #2, *many* guests will be able to do so.  
  In
  any case, apparently #2 is what KVM does, so having the limitation on 
  guests
  is not without precedent.  It's also likely to be a somewhat tested
  configuration (unlike #3, for example).
  I would avoid #3, because I don't think is a good idea to rely on that
  behaviour.
  I would also avoid #4, because having seen QEMU's code, it's wouldn't be
  easy and certainly not doable in time for 4.3.
 
  So we are left to play with the PCI MMIO region size and location in
  hvmloader.
 
  I agree with Jan that we shouldn't relocate unconditionally all the
  devices to the region above 4G. I meant to say that we should relocate
  only the ones that don't fit. And we shouldn't try to dynamically
  increase the PCI hole below 4G because clearly that doesn't work.
  However we could still increase the size of the PCI hole below 4G by
  default from start at 0xf000 to starting at 0xe000.
  Why do we know that is safe? Because in the current configuration
  hvmloader *already* increases the PCI hole size by decreasing the start
  address every time a device doesn't fit.
  So it's already common for hvmloader to set pci_mem_start to
  0xe000, you just need to assign a device with a PCI hole size big
  enough.

Isn't this the exact case which is broken? And therefore not known safe
at all?

 
 
  My proposed solution is:
 
  - set 0xe000 as the default PCI hole start for everybody, including
  qemu-xen-traditional

What is the impact on existing qemu-trad guests?

It does mean that guests which were installed with a bit less than 4GB
RAM may now find a little bit of RAM moves above 4GB to make room for
the bigger hole. If they can dynamically enable PAE that might be OK.
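
As a concrete worked example of that effect (the 3.75GB figure is just
an illustration): lowering the hole start from 0xf0000000 to 0xe0000000
pushes an extra 256MB of such a guest's RAM above the 4GB boundary:

#include <stdio.h>
#include <stdint.h>

int main(void)
{
    const uint64_t MB = 1ull << 20;
    uint64_t guest_ram = 3840 * MB;        /* e.g. a guest with 3.75GB RAM */
    uint64_t old_start = 0xf0000000ull;    /* old hole start  (3.75GB)     */
    uint64_t new_start = 0xe0000000ull;    /* proposed start  (3.5GB)      */

    /* RAM that would overlap the hole has to be placed above 4GB instead. */
    uint64_t old_high = guest_ram > old_start ? guest_ram - old_start : 0;
    uint64_t new_high = guest_ram > new_start ? guest_ram - new_start : 0;

    printf("RAM above 4GB: was %llu MB, becomes %llu MB\n",
           (unsigned long long)(old_high / MB),
           (unsigned long long)(new_high / MB));
    return 0;
}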

Does this have any impact on Windows activation?

  - move above 4G everything that doesn't fit and support 64-bit bars
  - print an error if the device doesn't fit and doesn't support 64-bit
  bars
 
 Also, as I understand it, at the moment:
 1. Some operating systems (32-bit XP) won't be able to use relocated devices
 2. Some devices (without 64-bit BARs) can't be relocated
 3. qemu-traditional is fine with a resized 4GiB MMIO hole.
 
 So if we have #1 or #2, at the moment an option for a work-around is to 
 use qemu-traditional.
 
 However, if we add your print an error if the device doesn't fit, then 
 this option will go away -- this will be a regression in functionality 
 from 4.2.

Only if "print an error" also involves aborting. It could print an error
(let's call it a warning) and continue, which would leave the workaround
viable.

 I 

Re: [Qemu-devel] [Xen-devel] [BUG 1747] Guest couldn't find bootable device with memory more than 3600M

2013-06-13 Thread George Dunlap

On 13/06/13 15:50, Stefano Stabellini wrote:

Keep in mind that if we start the pci hole at 0xe0000000, the number of
cases for which any workarounds are needed is going to be dramatically
decreased to the point that I don't think we need a workaround anymore.


You don't think anyone is going to want to pass through a card with 
1GiB+ of RAM?




The algorithm is going to work like this in details:

- the pci hole size is set to 0xfc000000 - 0xe0000000 = 448MB
- we calculate the total mmio size, if it's bigger than the pci hole we
raise a 64 bit relocation flag
- if the 64 bit relocation is enabled, we relocate above 4G the first
device that is 64-bit capable and has an MMIO size greater or equal to
512MB
- if the pci hole size is now big enough for the remaining devices we
stop the above 4G relocation, otherwise keep relocating devices that are
64 bit capable and have an MMIO size greater or equal to 512MB
- if one or more devices don't fit we print an error and continue (it's
not a critical failure, one device won't be used)

We could have a xenstore flag somewhere that enables the old behaviour
so that people can revert back to qemu-xen-traditional and make the pci
hole below 4G even bigger than 448MB, but I think that keeping the old
behaviour around is going to make the code more difficult to maintain.


We'll only need to do that for one release, until we have a chance to 
fix it properly.




Also it's difficult for people to realize that they need the workaround
because hvmloader logs aren't enabled by default and only go to the Xen
serial console.


Well, if key people know about it (Pasi, David Techer, &c), and we put it
on the wikis related to VGA pass-through, I think the information will get
around.



The value of this workaround is pretty low in my view.
Finally it's worth noting that Windows XP is going EOL in less than a
year.


That's a year during which a configuration with a currently-supported OS
that worked with 4.2 won't work with Xen 4.3.  Apart from that, one of the
reasons for doing virtualization in the first place is to be able to run
older, unsupported OSes on current hardware; so "XP isn't important"
doesn't really cut it for me. :-)






I thought that what we had proposed was to have an option in xenstore, that
libxl would set, which would instruct hvmloader whether to expand the MMIO
hole and whether to relocate devices above 64-bit?

I think it's right to have this discussion in public on the mailing
list, rather than behind closed doors.
Also I don't agree on the need for a workaround, as explained above.


I see -- you thought it was a bad idea and so were letting someone else 
bring it up -- or maybe hoping no one would remember to bring it up. :-)


(Obviously the decision needs to be made in public, but sometimes having 
technical solutions hashed out in a face-to-face meeting is more efficient.)


 -George



Re: [Qemu-devel] [Xen-devel] [BUG 1747] Guest couldn't find bootable device with memory more than 3600M

2013-06-13 Thread George Dunlap

On 13/06/13 16:16, Ian Campbell wrote:

On Thu, 2013-06-13 at 14:54 +0100, George Dunlap wrote:

On 13/06/13 14:44, Stefano Stabellini wrote:

On Wed, 12 Jun 2013, George Dunlap wrote:

On 12/06/13 08:25, Jan Beulich wrote:

On 11.06.13 at 19:26, Stefano Stabellini
stefano.stabell...@eu.citrix.com wrote:

I went through the code that maps the PCI MMIO regions in hvmloader
(tools/firmware/hvmloader/pci.c:pci_setup) and it looks like it already
maps the PCI region to high memory if the PCI bar is 64-bit and the MMIO
region is larger than 512MB.

Maybe we could just relax this condition and map the device memory to
high memory no matter the size of the MMIO region if the PCI bar is
64-bit?

I can only recommend not to: For one, guests not using PAE or
PSE-36 can't map such space at all (and older OSes may not
properly deal with 64-bit BARs at all). And then one would generally
expect this allocation to be done top down (to minimize risk of
running into RAM), and doing so is going to present further risks of
incompatibilities with guest OSes (Linux for example learned only in
2.6.36 that PFNs in ioremap() can exceed 32 bits, but even in
3.10-rc5 ioremap_pte_range(), while using u64 pfn, passes the
PFN to pfn_pte(), the respective parameter of which is
unsigned long).

I think this ought to be done in an iterative process - if all MMIO
regions together don't fit below 4G, the biggest one should be
moved up beyond 4G first, followed by the next to biggest one
etc.

First of all, the proposal to move the PCI BAR up to the 64-bit range is a
temporary work-around.  It should only be done if a device doesn't fit in the
current MMIO range.

We have three options here:
1. Don't do anything
2. Have hvmloader move PCI devices up to the 64-bit MMIO hole if they don't
fit
3. Convince qemu to allow MMIO regions to mask memory (or what it thinks is
memory).
4. Add a mechanism to tell qemu that memory is being relocated.

Number 4 is definitely the right answer long-term, but we just don't have time
to do that before the 4.3 release.  We're not sure yet if #3 is possible; even
if it is, it may have unpredictable knock-on effects.

Doing #2, it is true that many guests will be unable to access the device
because of 32-bit limitations.  However, in #1, *no* guests will be able to
access the device.  At least in #2, *many* guests will be able to do so.  In
any case, apparently #2 is what KVM does, so having the limitation on guests
is not without precedent.  It's also likely to be a somewhat tested
configuration (unlike #3, for example).

I would avoid #3, because I don't think is a good idea to rely on that
behaviour.
I would also avoid #4, because having seen QEMU's code, it's wouldn't be
easy and certainly not doable in time for 4.3.

So we are left to play with the PCI MMIO region size and location in
hvmloader.

I agree with Jan that we shouldn't relocate unconditionally all the
devices to the region above 4G. I meant to say that we should relocate
only the ones that don't fit. And we shouldn't try to dynamically
increase the PCI hole below 4G because clearly that doesn't work.
However we could still increase the size of the PCI hole below 4G by
default from start at 0xf000 to starting at 0xe000.
Why do we know that is safe? Because in the current configuration
hvmloader *already* increases the PCI hole size by decreasing the start
address every time a device doesn't fit.
So it's already common for hvmloader to set pci_mem_start to
0xe000, you just need to assign a device with a PCI hole size big
enough.

Isn't this the exact case which is broken? And therefore not known safe
at all?



My proposed solution is:

- set 0xe000 as the default PCI hole start for everybody, including
qemu-xen-traditional

What is the impact on existing qemu-trad guests?

It does mean that guest which were installed with a bit less than 4GB
RAM may now find a little bit of RAM moves above 4GB to make room for
the bigger whole. If they can dynamically enable PAE that might be ok.

Does this have any impact on Windows activation?


- move above 4G everything that doesn't fit and support 64-bit bars
- print an error if the device doesn't fit and doesn't support 64-bit
bars

Also, as I understand it, at the moment:
1. Some operating systems (32-bit XP) won't be able to use relocated devices
2. Some devices (without 64-bit BARs) can't be relocated
3. qemu-traditional is fine with a resized 4GiB MMIO hole.

So if we have #1 or #2, at the moment an option for a work-around is to
use qemu-traditional.

However, if we add your print an error if the device doesn't fit, then
this option will go away -- this will be a regression in functionality
from 4.2.

Only if print an error also involves aborting. It could print an error
(lets call it a warning) and continue, which would leave the workaround
viable.


No, because if hvmloader doesn't increase the size of the MMIO hole, 
then the device won't actually work.  The guest will boot, but 

Re: [Qemu-devel] [Xen-devel] [BUG 1747] Guest couldn't find bootable device with memory more than 3600M

2013-06-13 Thread Ian Campbell
On Thu, 2013-06-13 at 15:50 +0100, Stefano Stabellini wrote:
 On Thu, 13 Jun 2013, George Dunlap wrote:
  On 13/06/13 14:44, Stefano Stabellini wrote:
   On Wed, 12 Jun 2013, George Dunlap wrote:
On 12/06/13 08:25, Jan Beulich wrote:
On 11.06.13 at 19:26, Stefano Stabellini
stefano.stabell...@eu.citrix.com wrote:
  I went through the code that maps the PCI MMIO regions in hvmloader
  (tools/firmware/hvmloader/pci.c:pci_setup) and it looks like it
  already
  maps the PCI region to high memory if the PCI bar is 64-bit and the
  MMIO
  region is larger than 512MB.
  
  Maybe we could just relax this condition and map the device memory 
  to
  high memory no matter the size of the MMIO region if the PCI bar is
  64-bit?
 I can only recommend not to: For one, guests not using PAE or
 PSE-36 can't map such space at all (and older OSes may not
 properly deal with 64-bit BARs at all). And then one would generally
 expect this allocation to be done top down (to minimize risk of
 running into RAM), and doing so is going to present further risks of
 incompatibilities with guest OSes (Linux for example learned only in
 2.6.36 that PFNs in ioremap() can exceed 32 bits, but even in
 3.10-rc5 ioremap_pte_range(), while using u64 pfn, passes the
 PFN to pfn_pte(), the respective parameter of which is
 unsigned long).
 
 I think this ought to be done in an iterative process - if all MMIO
 regions together don't fit below 4G, the biggest one should be
 moved up beyond 4G first, followed by the next to biggest one
 etc.
First of all, the proposal to move the PCI BAR up to the 64-bit range 
is a
temporary work-around.  It should only be done if a device doesn't fit 
in
the
current MMIO range.

We have three options here:
1. Don't do anything
2. Have hvmloader move PCI devices up to the 64-bit MMIO hole if they
don't
fit
3. Convince qemu to allow MMIO regions to mask memory (or what it thinks
is
memory).
4. Add a mechanism to tell qemu that memory is being relocated.

Number 4 is definitely the right answer long-term, but we just don't 
have
time
to do that before the 4.3 release.  We're not sure yet if #3 is 
possible;
even
if it is, it may have unpredictable knock-on effects.

Doing #2, it is true that many guests will be unable to access the 
device
because of 32-bit limitations.  However, in #1, *no* guests will be able
to
access the device.  At least in #2, *many* guests will be able to do so.
In
any case, apparently #2 is what KVM does, so having the limitation on
guests
is not without precedent.  It's also likely to be a somewhat tested
configuration (unlike #3, for example).
   I would avoid #3, because I don't think is a good idea to rely on that
   behaviour.
   I would also avoid #4, because having seen QEMU's code, it's wouldn't be
   easy and certainly not doable in time for 4.3.
   
   So we are left to play with the PCI MMIO region size and location in
   hvmloader.
   
   I agree with Jan that we shouldn't relocate unconditionally all the
   devices to the region above 4G. I meant to say that we should relocate
   only the ones that don't fit. And we shouldn't try to dynamically
   increase the PCI hole below 4G because clearly that doesn't work.
   However we could still increase the size of the PCI hole below 4G by
   default from start at 0xf000 to starting at 0xe000.
   Why do we know that is safe? Because in the current configuration
   hvmloader *already* increases the PCI hole size by decreasing the start
   address every time a device doesn't fit.
   So it's already common for hvmloader to set pci_mem_start to
   0xe000, you just need to assign a device with a PCI hole size big
   enough.
   
   
   My proposed solution is:
   
   - set 0xe000 as the default PCI hole start for everybody, including
   qemu-xen-traditional
   - move above 4G everything that doesn't fit and support 64-bit bars
   - print an error if the device doesn't fit and doesn't support 64-bit
   bars
  
  Also, as I understand it, at the moment:
  1. Some operating systems (32-bit XP) won't be able to use relocated devices
  2. Some devices (without 64-bit BARs) can't be relocated
  3. qemu-traditional is fine with a resized 4GiB MMIO hole.
  
  So if we have #1 or #2, at the moment an option for a work-around is to use
  qemu-traditional.
  
  However, if we add your print an error if the device doesn't fit, then 
  this
  option will go away -- this will be a regression in functionality from 4.2.
 
 Keep in mind that if we start the pci hole at 0xe000, the number of
 cases for which any workarounds are needed is going to be dramatically
 decreased to the point that I don't think we need a workaround anymore.

Starting at 0xe0000000 leaves, as you say 

Re: [Qemu-devel] [Xen-devel] [BUG 1747] Guest couldn't find bootable device with memory more than 3600M

2013-06-13 Thread Ian Campbell
On Thu, 2013-06-13 at 16:30 +0100, George Dunlap wrote:
 On 13/06/13 16:16, Ian Campbell wrote:
  On Thu, 2013-06-13 at 14:54 +0100, George Dunlap wrote:
  On 13/06/13 14:44, Stefano Stabellini wrote:
  On Wed, 12 Jun 2013, George Dunlap wrote:
  On 12/06/13 08:25, Jan Beulich wrote:
  On 11.06.13 at 19:26, Stefano Stabellini
  stefano.stabell...@eu.citrix.com wrote:
  I went through the code that maps the PCI MMIO regions in hvmloader
  (tools/firmware/hvmloader/pci.c:pci_setup) and it looks like it already
  maps the PCI region to high memory if the PCI bar is 64-bit and the 
  MMIO
  region is larger than 512MB.
 
  Maybe we could just relax this condition and map the device memory to
  high memory no matter the size of the MMIO region if the PCI bar is
  64-bit?
  I can only recommend not to: For one, guests not using PAE or
  PSE-36 can't map such space at all (and older OSes may not
  properly deal with 64-bit BARs at all). And then one would generally
  expect this allocation to be done top down (to minimize risk of
  running into RAM), and doing so is going to present further risks of
  incompatibilities with guest OSes (Linux for example learned only in
  2.6.36 that PFNs in ioremap() can exceed 32 bits, but even in
  3.10-rc5 ioremap_pte_range(), while using u64 pfn, passes the
  PFN to pfn_pte(), the respective parameter of which is
  unsigned long).
 
  I think this ought to be done in an iterative process - if all MMIO
  regions together don't fit below 4G, the biggest one should be
  moved up beyond 4G first, followed by the next to biggest one
  etc.
  First of all, the proposal to move the PCI BAR up to the 64-bit range is 
  a
  temporary work-around.  It should only be done if a device doesn't fit 
  in the
  current MMIO range.
 
  We have three options here:
  1. Don't do anything
  2. Have hvmloader move PCI devices up to the 64-bit MMIO hole if they 
  don't
  fit
  3. Convince qemu to allow MMIO regions to mask memory (or what it thinks 
  is
  memory).
  4. Add a mechanism to tell qemu that memory is being relocated.
 
  Number 4 is definitely the right answer long-term, but we just don't 
  have time
  to do that before the 4.3 release.  We're not sure yet if #3 is 
  possible; even
  if it is, it may have unpredictable knock-on effects.
 
  Doing #2, it is true that many guests will be unable to access the device
  because of 32-bit limitations.  However, in #1, *no* guests will be able 
  to
  access the device.  At least in #2, *many* guests will be able to do so. 
   In
  any case, apparently #2 is what KVM does, so having the limitation on 
  guests
  is not without precedent.  It's also likely to be a somewhat tested
  configuration (unlike #3, for example).
  I would avoid #3, because I don't think is a good idea to rely on that
  behaviour.
  I would also avoid #4, because having seen QEMU's code, it's wouldn't be
  easy and certainly not doable in time for 4.3.
 
  So we are left to play with the PCI MMIO region size and location in
  hvmloader.
 
  I agree with Jan that we shouldn't relocate unconditionally all the
  devices to the region above 4G. I meant to say that we should relocate
  only the ones that don't fit. And we shouldn't try to dynamically
  increase the PCI hole below 4G because clearly that doesn't work.
  However we could still increase the size of the PCI hole below 4G by
  default from start at 0xf000 to starting at 0xe000.
  Why do we know that is safe? Because in the current configuration
  hvmloader *already* increases the PCI hole size by decreasing the start
  address every time a device doesn't fit.
  So it's already common for hvmloader to set pci_mem_start to
  0xe000, you just need to assign a device with a PCI hole size big
  enough.
  Isn't this the exact case which is broken? And therefore not known safe
  at all?
 
 
  My proposed solution is:
 
  - set 0xe000 as the default PCI hole start for everybody, including
  qemu-xen-traditional
  What is the impact on existing qemu-trad guests?
 
  It does mean that guest which were installed with a bit less than 4GB
  RAM may now find a little bit of RAM moves above 4GB to make room for
  the bigger whole. If they can dynamically enable PAE that might be ok.
 
  Does this have any impact on Windows activation?
 
  - move above 4G everything that doesn't fit and support 64-bit bars
  - print an error if the device doesn't fit and doesn't support 64-bit
  bars
  Also, as I understand it, at the moment:
  1. Some operating systems (32-bit XP) won't be able to use relocated 
  devices
  2. Some devices (without 64-bit BARs) can't be relocated
  3. qemu-traditional is fine with a resized 4GiB MMIO hole.
 
  So if we have #1 or #2, at the moment an option for a work-around is to
  use qemu-traditional.
 
  However, if we add your print an error if the device doesn't fit, then
  this option will go away -- this will be a regression in functionality
  from 4.2.
  Only if print an error 

Re: [Qemu-devel] [Xen-devel] [BUG 1747] Guest couldn't find bootable device with memory more than 3600M

2013-06-13 Thread Ian Campbell
On Thu, 2013-06-13 at 16:40 +0100, George Dunlap wrote:
 On 13/06/13 16:36, Ian Campbell wrote:
  On Thu, 2013-06-13 at 16:30 +0100, George Dunlap wrote:
  On 13/06/13 16:16, Ian Campbell wrote:
  On Thu, 2013-06-13 at 14:54 +0100, George Dunlap wrote:
  On 13/06/13 14:44, Stefano Stabellini wrote:
  On Wed, 12 Jun 2013, George Dunlap wrote:
  On 12/06/13 08:25, Jan Beulich wrote:
  On 11.06.13 at 19:26, Stefano Stabellini
  stefano.stabell...@eu.citrix.com wrote:
  I went through the code that maps the PCI MMIO regions in hvmloader
  (tools/firmware/hvmloader/pci.c:pci_setup) and it looks like it 
  already
  maps the PCI region to high memory if the PCI bar is 64-bit and the 
  MMIO
  region is larger than 512MB.
 
  Maybe we could just relax this condition and map the device memory to
  high memory no matter the size of the MMIO region if the PCI bar is
  64-bit?
  I can only recommend not to: For one, guests not using PAE or
  PSE-36 can't map such space at all (and older OSes may not
  properly deal with 64-bit BARs at all). And then one would generally
  expect this allocation to be done top down (to minimize risk of
  running into RAM), and doing so is going to present further risks of
  incompatibilities with guest OSes (Linux for example learned only in
  2.6.36 that PFNs in ioremap() can exceed 32 bits, but even in
  3.10-rc5 ioremap_pte_range(), while using u64 pfn, passes the
  PFN to pfn_pte(), the respective parameter of which is
  unsigned long).
 
  I think this ought to be done in an iterative process - if all MMIO
  regions together don't fit below 4G, the biggest one should be
  moved up beyond 4G first, followed by the next to biggest one
  etc.
  First of all, the proposal to move the PCI BAR up to the 64-bit range 
  is a
  temporary work-around.  It should only be done if a device doesn't fit 
  in the
  current MMIO range.
 
  We have four options here:
  1. Don't do anything
  2. Have hvmloader move PCI devices up to the 64-bit MMIO hole if they 
  don't
  fit
  3. Convince qemu to allow MMIO regions to mask memory (or what it 
  thinks is
  memory).
  4. Add a mechanism to tell qemu that memory is being relocated.
 
  Number 4 is definitely the right answer long-term, but we just don't 
  have time
  to do that before the 4.3 release.  We're not sure yet if #3 is 
  possible; even
  if it is, it may have unpredictable knock-on effects.
 
  Doing #2, it is true that many guests will be unable to access the 
  device
  because of 32-bit limitations.  However, in #1, *no* guests will be 
  able to
  access the device.  At least in #2, *many* guests will be able to do 
  so.  In
  any case, apparently #2 is what KVM does, so having the limitation on 
  guests
  is not without precedent.  It's also likely to be a somewhat tested
  configuration (unlike #3, for example).
  I would avoid #3, because I don't think it's a good idea to rely on that
  behaviour.
  I would also avoid #4, because having seen QEMU's code, it wouldn't be
  easy and certainly not doable in time for 4.3.
 
  So we are left to play with the PCI MMIO region size and location in
  hvmloader.
 
  I agree with Jan that we shouldn't relocate unconditionally all the
  devices to the region above 4G. I meant to say that we should relocate
  only the ones that don't fit. And we shouldn't try to dynamically
  increase the PCI hole below 4G because clearly that doesn't work.
  However we could still increase the size of the PCI hole below 4G by
  default from starting at 0xf0000000 to starting at 0xe0000000.
  Why do we know that is safe? Because in the current configuration
  hvmloader *already* increases the PCI hole size by decreasing the start
  address every time a device doesn't fit.
  So it's already common for hvmloader to set pci_mem_start to
  0xe0000000, you just need to assign a device with a PCI hole size big
  enough.
  Isn't this the exact case which is broken? And therefore not known safe
  at all?
 
  My proposed solution is:
 
  - set 0xe0000000 as the default PCI hole start for everybody, including
  qemu-xen-traditional
  What is the impact on existing qemu-trad guests?
 
  It does mean that guests which were installed with a bit less than 4GB
  RAM may now find a little bit of RAM moved above 4GB to make room for
  the bigger hole. If they can dynamically enable PAE that might be ok.
 
  Does this have any impact on Windows activation?
 
  - move above 4G everything that doesn't fit and support 64-bit bars
  - print an error if the device doesn't fit and doesn't support 64-bit
  bars
  Also, as I understand it, at the moment:
  1. Some operating systems (32-bit XP) won't be able to use relocated 
  devices
  2. Some devices (without 64-bit BARs) can't be relocated
  3. qemu-traditional is fine with a resized 4GiB MMIO hole.
 
  So if we have #1 or #2, at the moment an option for a work-around is to
  use qemu-traditional.
 
  However, if we add your print an error if the device doesn't fit, then
  this 

Re: [Qemu-devel] [Xen-devel] [BUG 1747]Guest could't find bootable device with memory more than 3600M

2013-06-13 Thread Stefano Stabellini
On Thu, 13 Jun 2013, Ian Campbell wrote:
 On Thu, 2013-06-13 at 14:54 +0100, George Dunlap wrote:
  On 13/06/13 14:44, Stefano Stabellini wrote:
   On Wed, 12 Jun 2013, George Dunlap wrote:
   On 12/06/13 08:25, Jan Beulich wrote:
   On 11.06.13 at 19:26, Stefano Stabellini
   stefano.stabell...@eu.citrix.com wrote:
   I went through the code that maps the PCI MMIO regions in hvmloader
   (tools/firmware/hvmloader/pci.c:pci_setup) and it looks like it already
   maps the PCI region to high memory if the PCI bar is 64-bit and the 
   MMIO
   region is larger than 512MB.
  
   Maybe we could just relax this condition and map the device memory to
   high memory no matter the size of the MMIO region if the PCI bar is
   64-bit?
   I can only recommend not to: For one, guests not using PAE or
   PSE-36 can't map such space at all (and older OSes may not
   properly deal with 64-bit BARs at all). And then one would generally
   expect this allocation to be done top down (to minimize risk of
   running into RAM), and doing so is going to present further risks of
   incompatibilities with guest OSes (Linux for example learned only in
   2.6.36 that PFNs in ioremap() can exceed 32 bits, but even in
   3.10-rc5 ioremap_pte_range(), while using u64 pfn, passes the
   PFN to pfn_pte(), the respective parameter of which is
   unsigned long).
  
   I think this ought to be done in an iterative process - if all MMIO
   regions together don't fit below 4G, the biggest one should be
   moved up beyond 4G first, followed by the next to biggest one
   etc.
   First of all, the proposal to move the PCI BAR up to the 64-bit range is 
   a
   temporary work-around.  It should only be done if a device doesn't fit 
   in the
   current MMIO range.
  
   We have four options here:
   1. Don't do anything
   2. Have hvmloader move PCI devices up to the 64-bit MMIO hole if they 
   don't
   fit
   3. Convince qemu to allow MMIO regions to mask memory (or what it thinks 
   is
   memory).
   4. Add a mechanism to tell qemu that memory is being relocated.
  
   Number 4 is definitely the right answer long-term, but we just don't 
   have time
   to do that before the 4.3 release.  We're not sure yet if #3 is 
   possible; even
   if it is, it may have unpredictable knock-on effects.
  
   Doing #2, it is true that many guests will be unable to access the device
   because of 32-bit limitations.  However, in #1, *no* guests will be able 
   to
   access the device.  At least in #2, *many* guests will be able to do so. 
In
   any case, apparently #2 is what KVM does, so having the limitation on 
   guests
   is not without precedent.  It's also likely to be a somewhat tested
   configuration (unlike #3, for example).
   I would avoid #3, because I don't think it's a good idea to rely on that
   behaviour.
   I would also avoid #4, because having seen QEMU's code, it wouldn't be
   easy and certainly not doable in time for 4.3.
  
   So we are left to play with the PCI MMIO region size and location in
   hvmloader.
  
   I agree with Jan that we shouldn't relocate unconditionally all the
   devices to the region above 4G. I meant to say that we should relocate
   only the ones that don't fit. And we shouldn't try to dynamically
   increase the PCI hole below 4G because clearly that doesn't work.
   However we could still increase the size of the PCI hole below 4G by
   default from starting at 0xf0000000 to starting at 0xe0000000.
   Why do we know that is safe? Because in the current configuration
   hvmloader *already* increases the PCI hole size by decreasing the start
   address every time a device doesn't fit.
   So it's already common for hvmloader to set pci_mem_start to
   0xe0000000, you just need to assign a device with a PCI hole size big
   enough.
 
 Isn't this the exact case which is broken? And therefore not known safe
 at all?

hvmloader sets pci_mem_start to 0xe0000000 and works with
qemu-xen-traditional but it doesn't with qemu-xen (before the patch that
increases the default pci hole size in QEMU).
What I was trying to say is: it's already common for hvmloader to set
pci_mem_start to 0xe0000000 with qemu-xen-traditional.
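
A minimal C sketch of that existing behaviour, using the constants quoted
in this thread (0xf0000000 default start, 0xfc000000 hole end); it only
mirrors what tools/firmware/hvmloader/pci.c is described as doing here, it
is not the actual source:

    #include <stdint.h>

    #define PCI_MEM_END 0xfc000000u   /* top of the below-4G hole */

    /* Grow the below-4G MMIO hole by lowering its start address until the
     * devices fit.  Shifting the power-of-two aligned start left as a
     * 32-bit value doubles the hole: 0xf0000000 -> 0xe0000000 ->
     * 0xc0000000 -> 0x80000000. */
    static uint32_t grow_pci_hole(uint32_t pci_mem_start, uint64_t mmio_total)
    {
        while ( mmio_total > (uint64_t)(PCI_MEM_END - pci_mem_start) &&
                (uint32_t)(pci_mem_start << 1) != 0 )
            pci_mem_start <<= 1;

        /* RAM that now overlaps [pci_mem_start, 4G) has to be relocated
         * above 4G -- which is exactly the step qemu-xen doesn't know
         * about. */
        return pci_mem_start;
    }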


   My proposed solution is:
  
   - set 0xe0000000 as the default PCI hole start for everybody, including
   qemu-xen-traditional
 
 What is the impact on existing qemu-trad guests?
 
 It does mean that guests which were installed with a bit less than 4GB
 RAM may now find a little bit of RAM moved above 4GB to make room for
 the bigger hole. If they can dynamically enable PAE that might be ok.

Yes, the amount of below 4G ram is going to be a bit less.


 Does this have any impact on Windows activation?

I don't think so: I assigned graphics cards with less than 512MB of
videoram to Windows guests before without compromising the Windows
license. I'll get more info on this.


   - move above 4G everything that doesn't fit and support 64-bit bars
   - print an error if the device doesn't fit and 

Re: [Qemu-devel] [Xen-devel] [BUG 1747]Guest could't find bootable device with memory more than 3600M

2013-06-13 Thread George Dunlap

On 13/06/13 16:36, Ian Campbell wrote:

On Thu, 2013-06-13 at 16:30 +0100, George Dunlap wrote:

On 13/06/13 16:16, Ian Campbell wrote:

On Thu, 2013-06-13 at 14:54 +0100, George Dunlap wrote:

On 13/06/13 14:44, Stefano Stabellini wrote:

On Wed, 12 Jun 2013, George Dunlap wrote:

On 12/06/13 08:25, Jan Beulich wrote:

On 11.06.13 at 19:26, Stefano Stabellini
stefano.stabell...@eu.citrix.com wrote:

I went through the code that maps the PCI MMIO regions in hvmloader
(tools/firmware/hvmloader/pci.c:pci_setup) and it looks like it already
maps the PCI region to high memory if the PCI bar is 64-bit and the MMIO
region is larger than 512MB.

Maybe we could just relax this condition and map the device memory to
high memory no matter the size of the MMIO region if the PCI bar is
64-bit?

I can only recommend not to: For one, guests not using PAE or
PSE-36 can't map such space at all (and older OSes may not
properly deal with 64-bit BARs at all). And then one would generally
expect this allocation to be done top down (to minimize risk of
running into RAM), and doing so is going to present further risks of
incompatibilities with guest OSes (Linux for example learned only in
2.6.36 that PFNs in ioremap() can exceed 32 bits, but even in
3.10-rc5 ioremap_pte_range(), while using u64 pfn, passes the
PFN to pfn_pte(), the respective parameter of which is
unsigned long).

I think this ought to be done in an iterative process - if all MMIO
regions together don't fit below 4G, the biggest one should be
moved up beyond 4G first, followed by the next to biggest one
etc.

First of all, the proposal to move the PCI BAR up to the 64-bit range is a
temporary work-around.  It should only be done if a device doesn't fit in the
current MMIO range.

We have four options here:
1. Don't do anything
2. Have hvmloader move PCI devices up to the 64-bit MMIO hole if they don't
fit
3. Convince qemu to allow MMIO regions to mask memory (or what it thinks is
memory).
4. Add a mechanism to tell qemu that memory is being relocated.

Number 4 is definitely the right answer long-term, but we just don't have time
to do that before the 4.3 release.  We're not sure yet if #3 is possible; even
if it is, it may have unpredictable knock-on effects.

Doing #2, it is true that many guests will be unable to access the device
because of 32-bit limitations.  However, in #1, *no* guests will be able to
access the device.  At least in #2, *many* guests will be able to do so.  In
any case, apparently #2 is what KVM does, so having the limitation on guests
is not without precedent.  It's also likely to be a somewhat tested
configuration (unlike #3, for example).

I would avoid #3, because I don't think it's a good idea to rely on that
behaviour.
I would also avoid #4, because having seen QEMU's code, it wouldn't be
easy and certainly not doable in time for 4.3.

So we are left to play with the PCI MMIO region size and location in
hvmloader.

I agree with Jan that we shouldn't relocate unconditionally all the
devices to the region above 4G. I meant to say that we should relocate
only the ones that don't fit. And we shouldn't try to dynamically
increase the PCI hole below 4G because clearly that doesn't work.
However we could still increase the size of the PCI hole below 4G by
default from starting at 0xf0000000 to starting at 0xe0000000.
Why do we know that is safe? Because in the current configuration
hvmloader *already* increases the PCI hole size by decreasing the start
address every time a device doesn't fit.
So it's already common for hvmloader to set pci_mem_start to
0xe0000000, you just need to assign a device with a PCI hole size big
enough.

Isn't this the exact case which is broken? And therefore not known safe
at all?


My proposed solution is:

- set 0xe0000000 as the default PCI hole start for everybody, including
qemu-xen-traditional

What is the impact on existing qemu-trad guests?

It does mean that guests which were installed with a bit less than 4GB
RAM may now find a little bit of RAM moved above 4GB to make room for
the bigger hole. If they can dynamically enable PAE that might be ok.

Does this have any impact on Windows activation?


- move above 4G everything that doesn't fit and support 64-bit bars
- print an error if the device doesn't fit and doesn't support 64-bit
bars

Also, as I understand it, at the moment:
1. Some operating systems (32-bit XP) won't be able to use relocated devices
2. Some devices (without 64-bit BARs) can't be relocated
3. qemu-traditional is fine with a resized 4GiB MMIO hole.

So if we have #1 or #2, at the moment an option for a work-around is to
use qemu-traditional.

However, if we add your print an error if the device doesn't fit, then
this option will go away -- this will be a regression in functionality
from 4.2.

Only if "print an error" also involves aborting. It could print an error
(let's call it a warning) and continue, which would leave the workaround
viable.

No, because if hvmloader doesn't 

Re: [Qemu-devel] [Xen-devel] [BUG 1747]Guest could't find bootable device with memory more than 3600M

2013-06-13 Thread Stefano Stabellini
On Thu, 13 Jun 2013, George Dunlap wrote:
 On 13/06/13 15:50, Stefano Stabellini wrote:
  Keep in mind that if we start the pci hole at 0xe0000000, the number of
  cases for which any workarounds are needed is going to be dramatically
  decreased to the point that I don't think we need a workaround anymore.
 
 You don't think anyone is going to want to pass through a card with 1GiB+ of
 RAM?

Yes, but as Paolo pointed out, those devices are going to be 64-bit
capable so they'll relocate above 4G just fine.


  The algorithm is going to work like this in detail:
  
  - the pci hole size is set to 0xfc000000 - 0xe0000000 = 448MB
  - we calculate the total mmio size; if it's bigger than the pci hole we
  raise a 64-bit relocation flag
  - if the 64-bit relocation is enabled, we relocate above 4G the first
  device that is 64-bit capable and has an MMIO size greater than or equal
  to 512MB
  - if the pci hole size is now big enough for the remaining devices we
  stop the above-4G relocation, otherwise keep relocating devices that are
  64-bit capable and have an MMIO size greater than or equal to 512MB
  - if one or more devices don't fit we print an error and continue (it's
  not a critical failure, one device won't be used)
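
  A short C sketch of the policy above, with illustrative types (struct
  bar and the helper are assumptions, not hvmloader's real data
  structures); the 448MB hole and the 512MB threshold are the values
  listed above:

    #include <stdbool.h>
    #include <stdint.h>
    #include <stdio.h>

    struct bar {
        uint64_t size;      /* MMIO size of the device */
        bool     is_64bit;  /* device exposes a 64-bit memory BAR */
        bool     above_4g;  /* out: placed in the high MMIO hole */
    };

    #define HOLE_SIZE (0xfc000000ull - 0xe0000000ull)  /* 448MB */
    #define RELOC_MIN (512ull << 20)                   /* 512MB threshold */

    static void place_bars(struct bar *bars, int n)
    {
        uint64_t total = 0;
        for (int i = 0; i < n; i++)
            total += bars[i].size;

        /* 64-bit relocation flag: raised when everything doesn't fit. */
        bool relocate = (total > HOLE_SIZE);

        /* Move big 64-bit capable devices above 4G until the rest fits. */
        for (int i = 0; relocate && i < n; i++) {
            if (bars[i].is_64bit && bars[i].size >= RELOC_MIN) {
                bars[i].above_4g = true;
                total -= bars[i].size;
                relocate = (total > HOLE_SIZE);
            }
        }

        /* Not a critical failure: the leftover device just won't be used. */
        if (total > HOLE_SIZE)
            printf("pci: not enough MMIO space below 4G for all devices\n");
    }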
  
  We could have a xenstore flag somewhere that enables the old behaviour
  so that people can revert back to qemu-xen-traditional and make the pci
  hole below 4G even bigger than 448MB, but I think that keeping the old
  behaviour around is going to make the code more difficult to maintain.
 
 We'll only need to do that for one release, until we have a chance to fix it
 properly.

There is nothing more lasting than a temporary workaround :-)
Also it's not very clear what the proper solution would be like in this
case.
However keeping the old behaviour is certainly possible. It would just
be a bit harder to also keep the old (smaller) default pci hole around.



  Also it's difficult for people to realize that they need the workaround
  because hvmloader logs aren't enabled by default and only go to the Xen
  serial console.
 
 Well if key people know about it (Pasi, David Techer, etc.), and we put it on
 the wikis related to VGA pass-through, I think information will get around.

It's not that I don't value documentation, but given that the average
user won't see any logs and the error is completely non-informative,
many people are going to be lost in a wild goose chase on google.


  The value of this workaround is pretty low in my view.
  Finally it's worth noting that Windows XP is going EOL in less than an
  year.
 
 That's 1 year that a configuration with a currently-supported OS won't work
 for Xen 4.3 that worked for 4.2.  Apart from that, one of the reasons for
 doing virtualization in the first place is to be able to run older,
  unsupported OSes on current hardware; so "XP isn't important" doesn't really
  cut it for me. :-)

fair enough


   I thought that what we had proposed was to have an option in xenstore,
   that
   libxl would set, which would instruct hvmloader whether to expand the MMIO
   hole and whether to relocate devices above 64-bit?
  I think it's right to have this discussion in public on the mailing
  list, rather than behind closed doors.
  Also I don't agree on the need for a workaround, as explained above.
 
 I see -- you thought it was a bad idea and so were letting someone else bring
 it up -- or maybe hoping no one would remember to bring it up. :-)
 
Nothing that Machiavellian: I didn't consider all the implications at
the time and I thought I managed to come up with a better plan.


 (Obviously the decision needs to be made in public, but sometimes having
 technical solutions hashed out in a face-to-face meeting is more efficient.)

But it's also easier to overlook something, at least it is easier for
me.



Re: [Qemu-devel] [Xen-devel] [BUG 1747]Guest could't find bootable device with memory more than 3600M

2013-06-13 Thread Stefano Stabellini
On Thu, 13 Jun 2013, Ian Campbell wrote:
 On Thu, 2013-06-13 at 15:50 +0100, Stefano Stabellini wrote:
  On Thu, 13 Jun 2013, George Dunlap wrote:
   On 13/06/13 14:44, Stefano Stabellini wrote:
On Wed, 12 Jun 2013, George Dunlap wrote:
 On 12/06/13 08:25, Jan Beulich wrote:
 On 11.06.13 at 19:26, Stefano Stabellini
 stefano.stabell...@eu.citrix.com wrote:
   I went through the code that maps the PCI MMIO regions in 
   hvmloader
   (tools/firmware/hvmloader/pci.c:pci_setup) and it looks like it
   already
   maps the PCI region to high memory if the PCI bar is 64-bit and 
   the
   MMIO
   region is larger than 512MB.
   
   Maybe we could just relax this condition and map the device 
   memory to
   high memory no matter the size of the MMIO region if the PCI bar 
   is
   64-bit?
  I can only recommend not to: For one, guests not using PAE or
  PSE-36 can't map such space at all (and older OSes may not
  properly deal with 64-bit BARs at all). And then one would generally
  expect this allocation to be done top down (to minimize risk of
  running into RAM), and doing so is going to present further risks of
  incompatibilities with guest OSes (Linux for example learned only in
  2.6.36 that PFNs in ioremap() can exceed 32 bits, but even in
  3.10-rc5 ioremap_pte_range(), while using u64 pfn, passes the
  PFN to pfn_pte(), the respective parameter of which is
  unsigned long).
  
  I think this ought to be done in an iterative process - if all MMIO
  regions together don't fit below 4G, the biggest one should be
  moved up beyond 4G first, followed by the next to biggest one
  etc.
 First of all, the proposal to move the PCI BAR up to the 64-bit range 
 is a
 temporary work-around.  It should only be done if a device doesn't 
 fit in
 the
 current MMIO range.
 
 We have four options here:
 1. Don't do anything
 2. Have hvmloader move PCI devices up to the 64-bit MMIO hole if they
 don't
 fit
 3. Convince qemu to allow MMIO regions to mask memory (or what it 
 thinks
 is
 memory).
 4. Add a mechanism to tell qemu that memory is being relocated.
 
 Number 4 is definitely the right answer long-term, but we just don't 
 have
 time
 to do that before the 4.3 release.  We're not sure yet if #3 is 
 possible;
 even
 if it is, it may have unpredictable knock-on effects.
 
 Doing #2, it is true that many guests will be unable to access the 
 device
 because of 32-bit limitations.  However, in #1, *no* guests will be 
 able
 to
 access the device.  At least in #2, *many* guests will be able to do 
 so.
 In
 any case, apparently #2 is what KVM does, so having the limitation on
 guests
 is not without precedent.  It's also likely to be a somewhat tested
 configuration (unlike #3, for example).
I would avoid #3, because I don't think it's a good idea to rely on that
behaviour.
I would also avoid #4, because having seen QEMU's code, it wouldn't be
easy and certainly not doable in time for 4.3.

So we are left to play with the PCI MMIO region size and location in
hvmloader.

I agree with Jan that we shouldn't relocate unconditionally all the
devices to the region above 4G. I meant to say that we should relocate
only the ones that don't fit. And we shouldn't try to dynamically
increase the PCI hole below 4G because clearly that doesn't work.
However we could still increase the size of the PCI hole below 4G by
default from starting at 0xf0000000 to starting at 0xe0000000.
Why do we know that is safe? Because in the current configuration
hvmloader *already* increases the PCI hole size by decreasing the start
address every time a device doesn't fit.
So it's already common for hvmloader to set pci_mem_start to
0xe0000000, you just need to assign a device with a PCI hole size big
enough.


My proposed solution is:

- set 0xe0000000 as the default PCI hole start for everybody, including
qemu-xen-traditional
- move above 4G everything that doesn't fit and support 64-bit bars
- print an error if the device doesn't fit and doesn't support 64-bit
bars
   
   Also, as I understand it, at the moment:
   1. Some operating systems (32-bit XP) won't be able to use relocated 
   devices
   2. Some devices (without 64-bit BARs) can't be relocated
   3. qemu-traditional is fine with a resized 4GiB MMIO hole.
   
   So if we have #1 or #2, at the moment an option for a work-around is to 
   use
   qemu-traditional.
   
   However, if we add your print an error if the device doesn't fit, then 
   this
   option will go away -- this will be a regression in functionality from 
   4.2.
  
  Keep in mind that if we start the pci hole at 

Re: [Qemu-devel] [Xen-devel] [BUG 1747]Guest could't find bootable device with memory more than 3600M

2013-06-13 Thread Ian Campbell
On Thu, 2013-06-13 at 17:55 +0100, Stefano Stabellini wrote:

   We could have a xenstore flag somewhere that enables the old behaviour
   so that people can revert back to qemu-xen-traditional and make the pci
   hole below 4G even bigger than 448MB, but I think that keeping the old
   behaviour around is going to make the code more difficult to maintain.
  
  The downside of that is that things which worked with the old scheme may
  not work with the new one though. Early in a release cycle when we have
  time to discover what has broken then that might be OK, but is post rc4
  really the time to be risking it?
 
 Yes, you are right: there are some scenarios that would have worked
 before that wouldn't work anymore with the new scheme.
 Are they important enough to have a workaround, pretty difficult to
 identify for a user?

That question would be reasonable early in the development cycle. At rc4
the question should be: do we think this problem is so critical that we
want to risk breaking something else which currently works for people.

Remember that we are invalidating whatever passthrough testing people
have already done up to this point of the release.

It is also worth noting that the things which this change ends up
breaking may for all we know be equally difficult for a user to identify
(they are after all approximately the same class of issue).

The problem here is that the risk is difficult to evaluate, we just
don't know what will break with this change, and we don't know therefore
if the cure is worse than the disease. The conservative approach at this
point in the release would be to not change anything, or to change the
minimal possible number of things (which would preclude changes which
impact qemu-trad IMHO).

WRT "pretty difficult to identify" -- the root of this thread suggests the
guest entered a reboot loop with "No bootable device", which sounds
eminently release-notable to me. I also note that it was changing the
size of the PCI hole which caused the issue -- which does somewhat
underscore the risks involved in this sort of change.

   Also it's difficult for people to realize that they need the workaround
   because hvmloader logs aren't enabled by default and only go to the Xen
   serial console. The value of this workaround is pretty low in my view.
   Finally it's worth noting that Windows XP is going EOL in less than an
   year.
  
  That's been true for something like 5 years...
  
  Also, apart from XP, doesn't Windows still pick a HAL at install time,
  so even a modern guest installed under the old scheme may not get a PAE
  capable HAL. If you increase the amount of RAM I think Windows will
  upgrade the HAL, but is changing the MMIO layout enough to trigger
  this? Or maybe modern Windows all use PAE (or even 64 bit) anyway?
  
  There are also performance implications of enabling PAE over 2 level
  paging. Not sure how significant they are with HAP though. Made a big
  difference with shadow IIRC.
  
  Maybe I'm worrying about nothing but while all of these unknowns might
  be OK towards the start of a release cycle rc4 seems awfully late in the
  day to be risking it.
 
 Keep in mind that all these configurations are perfectly valid even with
 the code that we have out there today. We aren't doing anything new,
 just modifying the default.

I don't think that is true. We are changing the behaviour, calling it
just a default doesn't make it any less worrying or any less of a
change.

 One just needs to assign a PCI device with more than 190MB to trigger it.
 I am trusting the fact that given that we had this behaviour for many
 years now, and it's pretty common to assign a device only some of the
 times you are booting your guest, any problems would have already come
 up.

With qemu-trad perhaps, although that's not completely obvious TBH. In
any case should we really be crossing our fingers and trusting that
it'll be ok at rc4?

Ian.




Re: [Qemu-devel] [Xen-devel] [BUG 1747]Guest could't find bootable device with memory more than 3600M

2013-06-12 Thread Jan Beulich
 On 11.06.13 at 19:26, Stefano Stabellini stefano.stabell...@eu.citrix.com 
 wrote:
 I went through the code that maps the PCI MMIO regions in hvmloader
 (tools/firmware/hvmloader/pci.c:pci_setup) and it looks like it already
 maps the PCI region to high memory if the PCI bar is 64-bit and the MMIO
 region is larger than 512MB.
 
 Maybe we could just relax this condition and map the device memory to
 high memory no matter the size of the MMIO region if the PCI bar is
 64-bit?

I can only recommend not to: For one, guests not using PAE or
PSE-36 can't map such space at all (and older OSes may not
properly deal with 64-bit BARs at all). And then one would generally
expect this allocation to be done top down (to minimize risk of
running into RAM), and doing so is going to present further risks of
incompatibilities with guest OSes (Linux for example learned only in
2.6.36 that PFNs in ioremap() can exceed 32 bits, but even in
3.10-rc5 ioremap_pte_range(), while using u64 pfn, passes the
PFN to pfn_pte(), the respective parameter of which is
unsigned long).

I think this ought to be done in an iterative process - if all MMIO
regions together don't fit below 4G, the biggest one should be
moved up beyond 4G first, followed by the next to biggest one
etc.
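
A minimal C sketch of that biggest-first iteration, using illustrative
types rather than hvmloader's real ones:

    #include <stdint.h>
    #include <stdlib.h>

    struct bar {
        uint64_t size;
        int      above_4g;
    };

    static int by_size_desc(const void *a, const void *b)
    {
        const struct bar *x = a, *y = b;
        return (x->size < y->size) - (x->size > y->size);
    }

    /* While the combined BARs don't fit below 4G, move the biggest
     * remaining one above 4G, then the next biggest, and so on. */
    static void place_biggest_first(struct bar *bars, int n,
                                    uint64_t hole_below_4g)
    {
        uint64_t total = 0;
        for (int i = 0; i < n; i++)
            total += bars[i].size;

        qsort(bars, n, sizeof(*bars), by_size_desc);

        for (int i = 0; i < n && total > hole_below_4g; i++) {
            bars[i].above_4g = 1;
            total -= bars[i].size;
        }
    }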

And, just like many BIOSes have, there ought to be a guest
(config) controlled option to shrink the RAM portion below 4G
allowing more MMIO blocks to fit.

Finally we shouldn't forget the option of not doing any assignment
at all in the BIOS, allowing/forcing the OS to use suitable address
ranges. Of course any OS is permitted to re-assign resources, but
I think they will frequently prefer to avoid re-assignment if already
done by the BIOS.

Jan




Re: [Qemu-devel] [Xen-devel] [BUG 1747]Guest could't find bootable device with memory more than 3600M

2013-06-12 Thread Ian Campbell
On Wed, 2013-06-12 at 08:25 +0100, Jan Beulich wrote:
  On 11.06.13 at 19:26, Stefano Stabellini 
  stefano.stabell...@eu.citrix.com wrote:
  I went through the code that maps the PCI MMIO regions in hvmloader
  (tools/firmware/hvmloader/pci.c:pci_setup) and it looks like it already
  maps the PCI region to high memory if the PCI bar is 64-bit and the MMIO
  region is larger than 512MB.
  
  Maybe we could just relax this condition and map the device memory to
  high memory no matter the size of the MMIO region if the PCI bar is
  64-bit?
 
 I can only recommend not to: For one, guests not using PAE or
 PSE-36 can't map such space at all (and older OSes may not
 properly deal with 64-bit BARs at all). And then one would generally
 expect this allocation to be done top down (to minimize risk of
 running into RAM), and doing so is going to present further risks of
 incompatibilities with guest OSes (Linux for example learned only in
 2.6.36 that PFNs in ioremap() can exceed 32 bits, but even in
 3.10-rc5 ioremap_pte_range(), while using u64 pfn, passes the
 PFN to pfn_pte(), the respective parameter of which is
 unsigned long).
 
 I think this ought to be done in an iterative process - if all MMIO
 regions together don't fit below 4G, the biggest one should be
 moved up beyond 4G first, followed by the next to biggest one
 etc.
 
 And, just like many BIOSes have, there ought to be a guest
 (config) controlled option to shrink the RAM portion below 4G
 allowing more MMIO blocks to fit.
 
 Finally we shouldn't forget the option of not doing any assignment
 at all in the BIOS, allowing/forcing the OS to use suitable address
 ranges. Of course any OS is permitted to re-assign resources, but
 I think they will frequently prefer to avoid re-assignment if already
 done by the BIOS.

Is bios=assign-busses on the guest command line suitable as a
workaround then? Or possibly bios=realloc

Ian.




Re: [Qemu-devel] [Xen-devel] [BUG 1747]Guest could't find bootable device with memory more than 3600M

2013-06-12 Thread Jan Beulich
 On 12.06.13 at 10:31, Ian Campbell ian.campb...@citrix.com wrote:
 On Wed, 2013-06-12 at 08:25 +0100, Jan Beulich wrote:
  On 11.06.13 at 19:26, Stefano Stabellini 
  stefano.stabell...@eu.citrix.com 
 wrote:
  I went through the code that maps the PCI MMIO regions in hvmloader
  (tools/firmware/hvmloader/pci.c:pci_setup) and it looks like it already
  maps the PCI region to high memory if the PCI bar is 64-bit and the MMIO
  region is larger than 512MB.
  
  Maybe we could just relax this condition and map the device memory to
  high memory no matter the size of the MMIO region if the PCI bar is
  64-bit?
 
 I can only recommend not to: For one, guests not using PAE or
 PSE-36 can't map such space at all (and older OSes may not
 properly deal with 64-bit BARs at all). And then one would generally
 expect this allocation to be done top down (to minimize risk of
 running into RAM), and doing so is going to present further risks of
 incompatibilities with guest OSes (Linux for example learned only in
 2.6.36 that PFNs in ioremap() can exceed 32 bits, but even in
 3.10-rc5 ioremap_pte_range(), while using u64 pfn, passes the
 PFN to pfn_pte(), the respective parameter of which is
 unsigned long).
 
 I think this ought to be done in an iterative process - if all MMIO
 regions together don't fit below 4G, the biggest one should be
 moved up beyond 4G first, followed by the next to biggest one
 etc.
 
 And, just like many BIOSes have, there ought to be a guest
 (config) controlled option to shrink the RAM portion below 4G
 allowing more MMIO blocks to fit.
 
 Finally we shouldn't forget the option of not doing any assignment
 at all in the BIOS, allowing/forcing the OS to use suitable address
 ranges. Of course any OS is permitted to re-assign resources, but
 I think they will frequently prefer to avoid re-assignment if already
 done by the BIOS.
 
 Is bios=assign-busses on the guest command line suitable as a
 workaround then? Or possibly bios=realloc

Which command line? Getting passed to hvmloader? In that case,
doing the assignment is the default, so an inverse option would be
needed. And not doing any assignment would be wrong too - all
devices involved in booting need (some of) their resources
assigned. That's particularly a potential problem since the graphics
card is the most likely candidate for wanting an extremely large
area, and I'm not sure whether booting with an assigned graphics
card would use that card instead of the emulated one.

As to realloc - that can hardly be meant as an option to
hvmloader, so I'm really unsure what command line you think
about here.

Jan




Re: [Qemu-devel] [Xen-devel] [BUG 1747]Guest could't find bootable device with memory more than 3600M

2013-06-12 Thread Ian Campbell
On Wed, 2013-06-12 at 10:02 +0100, Jan Beulich wrote:
  On 12.06.13 at 10:31, Ian Campbell ian.campb...@citrix.com wrote:
  On Wed, 2013-06-12 at 08:25 +0100, Jan Beulich wrote:
   On 11.06.13 at 19:26, Stefano Stabellini 
   stefano.stabell...@eu.citrix.com 
  wrote:
   I went through the code that maps the PCI MMIO regions in hvmloader
   (tools/firmware/hvmloader/pci.c:pci_setup) and it looks like it already
   maps the PCI region to high memory if the PCI bar is 64-bit and the MMIO
   region is larger than 512MB.
   
   Maybe we could just relax this condition and map the device memory to
   high memory no matter the size of the MMIO region if the PCI bar is
   64-bit?
  
  I can only recommend not to: For one, guests not using PAE or
  PSE-36 can't map such space at all (and older OSes may not
  properly deal with 64-bit BARs at all). And then one would generally
  expect this allocation to be done top down (to minimize risk of
  running into RAM), and doing so is going to present further risks of
  incompatibilities with guest OSes (Linux for example learned only in
  2.6.36 that PFNs in ioremap() can exceed 32 bits, but even in
  3.10-rc5 ioremap_pte_range(), while using u64 pfn, passes the
  PFN to pfn_pte(), the respective parameter of which is
  unsigned long).
  
  I think this ought to be done in an iterative process - if all MMIO
  regions together don't fit below 4G, the biggest one should be
  moved up beyond 4G first, followed by the next to biggest one
  etc.
  
  And, just like many BIOSes have, there ought to be a guest
  (config) controlled option to shrink the RAM portion below 4G
  allowing more MMIO blocks to fit.
  
  Finally we shouldn't forget the option of not doing any assignment
  at all in the BIOS, allowing/forcing the OS to use suitable address
  ranges. Of course any OS is permitted to re-assign resources, but
  I think they will frequently prefer to avoid re-assignment if already
  done by the BIOS.
  
  Is bios=assign-busses on the guest command line suitable as a
  workaround then? Or possibly bios=realloc
 
 Which command line? Getting passed to hvmloader?

I meant the guest kernel command line.

  In that case,
 doing the assignment is the default, so an inverse option would be
 needed. And not doing any assignment would be wrong too - all
 devices involved in booting need (some of) their resources
 assigned. That's particularly a potential problem since the graphics
 card is the most likely candidate for wanting an extremely large
 area, and I'm not sure whether booting with an assigned graphics
 card would use that card instead of the emulated one.
 
 As to realloc - that can hardly be meant as an option to
 hvmloader, so I'm really unsure what command line you think
 about here.
 
 Jan
 





Re: [Qemu-devel] [Xen-devel] [BUG 1747]Guest could't find bootable device with memory more than 3600M

2013-06-12 Thread George Dunlap

On 12/06/13 08:25, Jan Beulich wrote:

On 11.06.13 at 19:26, Stefano Stabellini stefano.stabell...@eu.citrix.com 
wrote:

I went through the code that maps the PCI MMIO regions in hvmloader
(tools/firmware/hvmloader/pci.c:pci_setup) and it looks like it already
maps the PCI region to high memory if the PCI bar is 64-bit and the MMIO
region is larger than 512MB.

Maybe we could just relax this condition and map the device memory to
high memory no matter the size of the MMIO region if the PCI bar is
64-bit?

I can only recommend not to: For one, guests not using PAE or
PSE-36 can't map such space at all (and older OSes may not
properly deal with 64-bit BARs at all). And then one would generally
expect this allocation to be done top down (to minimize risk of
running into RAM), and doing so is going to present further risks of
incompatibilities with guest OSes (Linux for example learned only in
2.6.36 that PFNs in ioremap() can exceed 32 bits, but even in
3.10-rc5 ioremap_pte_range(), while using u64 pfn, passes the
PFN to pfn_pte(), the respective parameter of which is
unsigned long).

I think this ought to be done in an iterative process - if all MMIO
regions together don't fit below 4G, the biggest one should be
moved up beyond 4G first, followed by the next to biggest one
etc.


First of all, the proposal to move the PCI BAR up to the 64-bit range is 
a temporary work-around.  It should only be done if a device doesn't fit 
in the current MMIO range.


We have four options here:
1. Don't do anything
2. Have hvmloader move PCI devices up to the 64-bit MMIO hole if they 
don't fit
3. Convince qemu to allow MMIO regions to mask memory (or what it thinks 
is memory).

4. Add a mechanism to tell qemu that memory is being relocated.

Number 4 is definitely the right answer long-term, but we just don't 
have time to do that before the 4.3 release.  We're not sure yet if #3 
is possible; even if it is, it may have unpredictable knock-on effects.


Doing #2, it is true that many guests will be unable to access the 
device because of 32-bit limitations.  However, in #1, *no* guests will 
be able to access the device.  At least in #2, *many* guests will be 
able to do so.  In any case, apparently #2 is what KVM does, so having 
the limitation on guests is not without precedent.  It's also likely to 
be a somewhat tested configuration (unlike #3, for example).


 -George



Re: [Qemu-devel] [Xen-devel] [BUG 1747]Guest could't find bootable device with memory more than 3600M

2013-06-12 Thread Jan Beulich
 On 12.06.13 at 11:22, Ian Campbell ian.campb...@citrix.com wrote:
 On Wed, 2013-06-12 at 10:02 +0100, Jan Beulich wrote:
  On 12.06.13 at 10:31, Ian Campbell ian.campb...@citrix.com wrote:
  On Wed, 2013-06-12 at 08:25 +0100, Jan Beulich wrote:
   On 11.06.13 at 19:26, Stefano Stabellini 
   stefano.stabell...@eu.citrix.com 
  wrote:
   I went through the code that maps the PCI MMIO regions in hvmloader
   (tools/firmware/hvmloader/pci.c:pci_setup) and it looks like it already
   maps the PCI region to high memory if the PCI bar is 64-bit and the MMIO
   region is larger than 512MB.
   
   Maybe we could just relax this condition and map the device memory to
   high memory no matter the size of the MMIO region if the PCI bar is
   64-bit?
  
  I can only recommend not to: For one, guests not using PAE or
  PSE-36 can't map such space at all (and older OSes may not
  properly deal with 64-bit BARs at all). And then one would generally
  expect this allocation to be done top down (to minimize risk of
  running into RAM), and doing so is going to present further risks of
  incompatibilities with guest OSes (Linux for example learned only in
  2.6.36 that PFNs in ioremap() can exceed 32 bits, but even in
  3.10-rc5 ioremap_pte_range(), while using u64 pfn, passes the
  PFN to pfn_pte(), the respective parameter of which is
  unsigned long).
  
  I think this ought to be done in an iterative process - if all MMIO
  regions together don't fit below 4G, the biggest one should be
  moved up beyond 4G first, followed by the next to biggest one
  etc.
  
  And, just like many BIOSes have, there ought to be a guest
  (config) controlled option to shrink the RAM portion below 4G
  allowing more MMIO blocks to fit.
  
  Finally we shouldn't forget the option of not doing any assignment
  at all in the BIOS, allowing/forcing the OS to use suitable address
  ranges. Of course any OS is permitted to re-assign resources, but
  I think they will frequently prefer to avoid re-assignment if already
  done by the BIOS.
  
  Is bios=assign-busses on the guest command line suitable as a
  workaround then? Or possibly bios=realloc
 
 Which command line? Getting passed to hvmloader?
 
 I meant the guest kernel command line.

As there's no accessible guest kernel command line for HVM guests,
did you mean to require the guest admin to put something on the
command line manually?

And then - this might cover Linux, but what about other OSes,
namely Windows? Oh, and for Linux you confused me by using
bios= instead of pci=... And pci=realloc only exists as of 3.0.

Jan




Re: [Qemu-devel] [Xen-devel] [BUG 1747]Guest could't find bootable device with memory more than 3600M

2013-06-12 Thread Jan Beulich
 On 12.06.13 at 12:05, George Dunlap george.dun...@eu.citrix.com wrote:
 On 12/06/13 08:25, Jan Beulich wrote:
 On 11.06.13 at 19:26, Stefano Stabellini 
 stefano.stabell...@eu.citrix.com 
 wrote:
 I went through the code that maps the PCI MMIO regions in hvmloader
 (tools/firmware/hvmloader/pci.c:pci_setup) and it looks like it already
 maps the PCI region to high memory if the PCI bar is 64-bit and the MMIO
 region is larger than 512MB.

 Maybe we could just relax this condition and map the device memory to
 high memory no matter the size of the MMIO region if the PCI bar is
 64-bit?
 I can only recommend not to: For one, guests not using PAE or
 PSE-36 can't map such space at all (and older OSes may not
 properly deal with 64-bit BARs at all). And then one would generally
 expect this allocation to be done top down (to minimize risk of
 running into RAM), and doing so is going to present further risks of
 incompatibilities with guest OSes (Linux for example learned only in
 2.6.36 that PFNs in ioremap() can exceed 32 bits, but even in
 3.10-rc5 ioremap_pte_range(), while using u64 pfn, passes the
 PFN to pfn_pte(), the respective parameter of which is
 unsigned long).

 I think this ought to be done in an iterative process - if all MMIO
 regions together don't fit below 4G, the biggest one should be
 moved up beyond 4G first, followed by the next to biggest one
 etc.
 
 First of all, the proposal to move the PCI BAR up to the 64-bit range is 
 a temporary work-around.  It should only be done if a device doesn't fit 
 in the current MMIO range.
 
 We have four options here:
 1. Don't do anything
 2. Have hvmloader move PCI devices up to the 64-bit MMIO hole if they 
 don't fit
 3. Convince qemu to allow MMIO regions to mask memory (or what it thinks 
 is memory).
 4. Add a mechanism to tell qemu that memory is being relocated.
 
 Number 4 is definitely the right answer long-term, but we just don't 
 have time to do that before the 4.3 release.  We're not sure yet if #3 
 is possible; even if it is, it may have unpredictable knock-on effects.
 
 Doing #2, it is true that many guests will be unable to access the 
 device because of 32-bit limitations.  However, in #1, *no* guests will 
 be able to access the device.  At least in #2, *many* guests will be 
 able to do so.  In any case, apparently #2 is what KVM does, so having 
 the limitation on guests is not without precedent.  It's also likely to 
 be a somewhat tested configuration (unlike #3, for example).

That's all fine with me. My objection was to Stefano's consideration
to assign high addresses to _all_ 64-bit capable BARs up, not just
the biggest one(s).

Jan




Re: [Qemu-devel] [Xen-devel] [BUG 1747]Guest could't find bootable device with memory more than 3600M

2013-06-12 Thread George Dunlap

On 12/06/13 11:11, Jan Beulich wrote:

On 12.06.13 at 12:05, George Dunlap george.dun...@eu.citrix.com wrote:

On 12/06/13 08:25, Jan Beulich wrote:

On 11.06.13 at 19:26, Stefano Stabellini stefano.stabell...@eu.citrix.com

wrote:

I went through the code that maps the PCI MMIO regions in hvmloader
(tools/firmware/hvmloader/pci.c:pci_setup) and it looks like it already
maps the PCI region to high memory if the PCI bar is 64-bit and the MMIO
region is larger than 512MB.

Maybe we could just relax this condition and map the device memory to
high memory no matter the size of the MMIO region if the PCI bar is
64-bit?

I can only recommend not to: For one, guests not using PAE or
PSE-36 can't map such space at all (and older OSes may not
properly deal with 64-bit BARs at all). And then one would generally
expect this allocation to be done top down (to minimize risk of
running into RAM), and doing so is going to present further risks of
incompatibilities with guest OSes (Linux for example learned only in
2.6.36 that PFNs in ioremap() can exceed 32 bits, but even in
3.10-rc5 ioremap_pte_range(), while using u64 pfn, passes the
PFN to pfn_pte(), the respective parameter of which is
unsigned long).

I think this ought to be done in an iterative process - if all MMIO
regions together don't fit below 4G, the biggest one should be
moved up beyond 4G first, followed by the next to biggest one
etc.

First of all, the proposal to move the PCI BAR up to the 64-bit range is
a temporary work-around.  It should only be done if a device doesn't fit
in the current MMIO range.

We have four options here:
1. Don't do anything
2. Have hvmloader move PCI devices up to the 64-bit MMIO hole if they
don't fit
3. Convince qemu to allow MMIO regions to mask memory (or what it thinks
is memory).
4. Add a mechanism to tell qemu that memory is being relocated.

Number 4 is definitely the right answer long-term, but we just don't
have time to do that before the 4.3 release.  We're not sure yet if #3
is possible; even if it is, it may have unpredictable knock-on effects.

Doing #2, it is true that many guests will be unable to access the
device because of 32-bit limitations.  However, in #1, *no* guests will
be able to access the device.  At least in #2, *many* guests will be
able to do so.  In any case, apparently #2 is what KVM does, so having
the limitation on guests is not without precedent.  It's also likely to
be a somewhat tested configuration (unlike #3, for example).

That's all fine with me. My objection was to Stefano's consideration
to assign high addresses to _all_ 64-bit capable BARs up, not just
the biggest one(s).


Oh right -- I understood him to mean, *allow* hvmloader to map the 
device memory to high memory *if necessary* if the BAR is 64-bit. I 
agree, mapping them all at 64-bit even if there's room in the 32-bit 
hole isn't a good idea.


 -George



Re: [Qemu-devel] [Xen-devel] [BUG 1747]Guest could't find bootable device with memory more than 3600M

2013-06-12 Thread Ian Campbell
On Wed, 2013-06-12 at 11:07 +0100, Jan Beulich wrote:
  On 12.06.13 at 11:22, Ian Campbell ian.campb...@citrix.com wrote:
  On Wed, 2013-06-12 at 10:02 +0100, Jan Beulich wrote:
   On 12.06.13 at 10:31, Ian Campbell ian.campb...@citrix.com wrote:
   On Wed, 2013-06-12 at 08:25 +0100, Jan Beulich wrote:
On 11.06.13 at 19:26, Stefano Stabellini 
stefano.stabell...@eu.citrix.com 
   wrote:
I went through the code that maps the PCI MMIO regions in hvmloader
(tools/firmware/hvmloader/pci.c:pci_setup) and it looks like it 
already
maps the PCI region to high memory if the PCI bar is 64-bit and the 
MMIO
region is larger than 512MB.

Maybe we could just relax this condition and map the device memory to
high memory no matter the size of the MMIO region if the PCI bar is
64-bit?
   
   I can only recommend not to: For one, guests not using PAE or
   PSE-36 can't map such space at all (and older OSes may not
   properly deal with 64-bit BARs at all). And then one would generally
   expect this allocation to be done top down (to minimize risk of
   running into RAM), and doing so is going to present further risks of
   incompatibilities with guest OSes (Linux for example learned only in
   2.6.36 that PFNs in ioremap() can exceed 32 bits, but even in
   3.10-rc5 ioremap_pte_range(), while using u64 pfn, passes the
   PFN to pfn_pte(), the respective parameter of which is
   unsigned long).
   
   I think this ought to be done in an iterative process - if all MMIO
   regions together don't fit below 4G, the biggest one should be
   moved up beyond 4G first, followed by the next to biggest one
   etc.
   
   And, just like many BIOSes have, there ought to be a guest
   (config) controlled option to shrink the RAM portion below 4G
   allowing more MMIO blocks to fit.
   
   Finally we shouldn't forget the option of not doing any assignment
   at all in the BIOS, allowing/forcing the OS to use suitable address
   ranges. Of course any OS is permitted to re-assign resources, but
   I think they will frequently prefer to avoid re-assignment if already
   done by the BIOS.
   
   Is bios=assign-busses on the guest command line suitable as a
   workaround then? Or possibly bios=realloc
  
  Which command line? Getting passed to hvmloader?
  
  I meant the guest kernel command line.
 
 As there's no accessible guest kernel command for HVM guests,
 did you mean to require the guest admin to put something on the
 command line manually?

Yes, as a workaround for this shortcoming of 4.3, not as a long term
solution. It's only people using passthrough with certain devices with
large BARs who will ever trip over this, right?

 And then - this might cover Linux, but what about other OSes,
 namely Windows?

True, I'm not sure if/how this can be done.

The only reference I could find was at
http://windows.microsoft.com/en-id/windows7/using-system-configuration
which says under Advanced boot options:
PCI Lock. Prevents Windows from reallocating I/O and IRQ
resources on the PCI bus. The I/O and memory resources set by
the BIOS are preserved.

Which seems to suggest the default is to reallocate, but I don't know.

  Oh, and for Linux you confused me by using
 bios= instead of pci=... And pci=realloc only exists as of 3.0.

Oops, sorry.

Ian.




Re: [Qemu-devel] [Xen-devel] [BUG 1747]Guest could't find bootable device with memory more than 3600M

2013-06-12 Thread Jan Beulich
 On 12.06.13 at 13:23, Ian Campbell ian.campb...@citrix.com wrote:
 On Wed, 2013-06-12 at 11:07 +0100, Jan Beulich wrote:
 As there's no accessible guest kernel command line for HVM guests,
 did you mean to require the guest admin to put something on the
 command line manually?
 
 Yes, as a workaround for this shortcoming of 4.3, not as a long term
 solution. It's only people using passthrough with certain devices with
 large BARs who will ever trip over this, right?

Yes.

 And then - this might cover Linux, but what about other OSes,
 namely Windows?
 
 True, I'm not sure if/how this can be done.
 
 The only reference I could find was at
 http://windows.microsoft.com/en-id/windows7/using-system-configuration 
 which says under Advanced boot options:
 PCI Lock. Prevents Windows from reallocating I/O and IRQ
 resources on the PCI bus. The I/O and memory resources set by
 the BIOS are preserved.
 
 Which seems to suggest the default is to reallocate, but I don't know.

Without having known of this option, I would take it as may, not
will reallocate.

Jan





Re: [Qemu-devel] [Xen-devel] [BUG 1747]Guest could't find bootable device with memory more than 3600M

2013-06-12 Thread Ian Campbell
On Wed, 2013-06-12 at 12:56 +0100, Jan Beulich wrote:
  On 12.06.13 at 13:23, Ian Campbell ian.campb...@citrix.com wrote:
  On Wed, 2013-06-12 at 11:07 +0100, Jan Beulich wrote:

  And then - this might cover Linux, but what about other OSes,
  namely Windows?
  
  True, I'm not sure if/how this can be done.
  
  The only reference I could find was at
  http://windows.microsoft.com/en-id/windows7/using-system-configuration 
  which says under Advanced boot options:
  PCI Lock. Prevents Windows from reallocating I/O and IRQ
  resources on the PCI bus. The I/O and memory resources set by
  the BIOS are preserved.
  
  Which seems to suggest the default is to reallocate, but I don't know.
 
 Without having known of this option, I would take it as may, not
 will reallocate.

That does seem likely, yes.

Ian.




Re: [Qemu-devel] [Xen-devel] [BUG 1747]Guest could't find bootable device with memory more than 3600M

2013-06-12 Thread Paolo Bonzini
On 12/06/2013 06:05, George Dunlap wrote:
 On 12/06/13 08:25, Jan Beulich wrote:
 On 11.06.13 at 19:26, Stefano Stabellini
 stefano.stabell...@eu.citrix.com wrote:
 I went through the code that maps the PCI MMIO regions in hvmloader
 (tools/firmware/hvmloader/pci.c:pci_setup) and it looks like it already
 maps the PCI region to high memory if the PCI bar is 64-bit and the MMIO
 region is larger than 512MB.

 Maybe we could just relax this condition and map the device memory to
 high memory no matter the size of the MMIO region if the PCI bar is
 64-bit?
 I can only recommend not to: For one, guests not using PAE or
 PSE-36 can't map such space at all (and older OSes may not
 properly deal with 64-bit BARs at all). And then one would generally
 expect this allocation to be done top down (to minimize risk of
 running into RAM), and doing so is going to present further risks of
 incompatibilities with guest OSes (Linux for example learned only in
 2.6.36 that PFNs in ioremap() can exceed 32 bits, but even in
 3.10-rc5 ioremap_pte_range(), while using u64 pfn, passes the
 PFN to pfn_pte(), the respective parameter of which is
 unsigned long).

 I think this ought to be done in an iterative process - if all MMIO
 regions together don't fit below 4G, the biggest one should be
 moved up beyond 4G first, followed by the next to biggest one
 etc.
 
 First of all, the proposal to move the PCI BAR up to the 64-bit range is
 a temporary work-around.  It should only be done if a device doesn't fit
 in the current MMIO range.
 
 We have four options here:
 1. Don't do anything
 2. Have hvmloader move PCI devices up to the 64-bit MMIO hole if they
 don't fit
 3. Convince qemu to allow MMIO regions to mask memory (or what it thinks
 is memory).
 4. Add a mechanism to tell qemu that memory is being relocated.
 
 Number 4 is definitely the right answer long-term, but we just don't
 have time to do that before the 4.3 release.  We're not sure yet if #3
 is possible; even if it is, it may have unpredictable knock-on effects.

#3 should be possible or even the default (would need to check), but #4
is probably a bit harder to do.  Perhaps you can use a magic I/O port
for the xen platform PV driver, but if you can simply use two PCI
windows it would be much simpler, because that's what TCG and
KVM already do.  The code is all there for you to lift in SeaBIOS.

Only Windows XP and older had problems with that because they didn't
like something in the ASL; but the 64-bit window is placed at the end of
RAM, so in principle any PAE-enabled OS can use it.
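
A rough C sketch of such a two-window layout, assuming the below-4G hole
discussed in this thread (0xe0000000-0xfc000000) and an arbitrarily sized
64-bit window starting at the end of RAM; the names and bounds are
illustrative, not SeaBIOS or hvmloader code:

    #include <stdint.h>

    struct pci_window {
        uint64_t base, size;
    };

    #define GB (1ull << 30)

    /* ram_size is the total guest RAM in bytes; the low window is the
     * fixed below-4G hole, the high window starts where RAM ends. */
    static void pci_windows(uint64_t ram_size,
                            struct pci_window *low, struct pci_window *high)
    {
        low->base = 0xe0000000ull;              /* below-4G hole start */
        low->size = 0xfc000000ull - low->base;  /* 448MB */

        /* RAM that doesn't fit under the hole is remapped above 4G. */
        uint64_t low_ram  = ram_size < low->base ? ram_size : low->base;
        uint64_t high_ram = ram_size - low_ram;

        /* 64-bit window placed at the end of RAM. */
        high->base = 4 * GB + high_ram;
        high->size = 512 * GB;                  /* size is arbitrary here */
    }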

Paolo

 Doing #2, it is true that many guests will be unable to access the
 device because of 32-bit limitations.  However, in #1, *no* guests will
 be able to access the device.  At least in #2, *many* guests will be
 able to do so.  In any case, apparently #2 is what KVM does, so having
 the limitation on guests is not without precedent.  It's also likely to
 be a somewhat tested configuration (unlike #3, for example).
 
  -George




Re: [Qemu-devel] [Xen-devel] [BUG 1747]Guest could't find bootable device with memory more than 3600M

2013-06-12 Thread Jan Beulich
 On 12.06.13 at 15:23, Paolo Bonzini pbonz...@redhat.com wrote:
 On 12/06/2013 06:05, George Dunlap wrote:
 On 12/06/13 08:25, Jan Beulich wrote:
 On 11.06.13 at 19:26, Stefano Stabellini
 stefano.stabell...@eu.citrix.com wrote:
 I went through the code that maps the PCI MMIO regions in hvmloader
 (tools/firmware/hvmloader/pci.c:pci_setup) and it looks like it already
 maps the PCI region to high memory if the PCI bar is 64-bit and the MMIO
 region is larger than 512MB.

 Maybe we could just relax this condition and map the device memory to
 high memory no matter the size of the MMIO region if the PCI bar is
 64-bit?
 I can only recommend not to: For one, guests not using PAE or
 PSE-36 can't map such space at all (and older OSes may not
 properly deal with 64-bit BARs at all). And then one would generally
 expect this allocation to be done top down (to minimize risk of
 running into RAM), and doing so is going to present further risks of
 incompatibilities with guest OSes (Linux for example learned only in
 2.6.36 that PFNs in ioremap() can exceed 32 bits, but even in
 3.10-rc5 ioremap_pte_range(), while using u64 pfn, passes the
 PFN to pfn_pte(), the respective parameter of which is
 unsigned long).

 I think this ought to be done in an iterative process - if all MMIO
 regions together don't fit below 4G, the biggest one should be
 moved up beyond 4G first, followed by the next biggest one
 etc.
 
 First of all, the proposal to move the PCI BAR up to the 64-bit range is
 a temporary work-around.  It should only be done if a device doesn't fit
 in the current MMIO range.
 
 We have four options here:
 1. Don't do anything
 2. Have hvmloader move PCI devices up to the 64-bit MMIO hole if they
 don't fit
 3. Convince qemu to allow MMIO regions to mask memory (or what it thinks
 is memory).
 4. Add a mechanism to tell qemu that memory is being relocated.
 
 Number 4 is definitely the right answer long-term, but we just don't
 have time to do that before the 4.3 release.  We're not sure yet if #3
 is possible; even if it is, it may have unpredictable knock-on effects.
 
 #3 should be possible or even the default (would need to check), but #4
 is probably a bit harder to do.  Perhaps you can use a magic I/O port
 for the xen platform PV driver, but if you can simply use two PCI
 windows it would be much simpler because that's the same that TCG and
 KVM already do.  The code is all there for you to lift in SeaBIOS.

What is the connection here to the platform PV driver?

 Only Windows XP and older had problems with that because they didn't
 like something in the ASL; but the 64-bit window is placed at the end of
 RAM, so in principle any PAE-enabled OS can use it.

At the end of _RAM_???

Jan




Re: [Qemu-devel] [Xen-devel] [BUG 1747]Guest could't find bootable device with memory more than 3600M

2013-06-12 Thread Paolo Bonzini
On 12/06/2013 09:49, Jan Beulich wrote:
 #3 should be possible or even the default (would need to check), but #4
 is probably a bit harder to do.  Perhaps you can use a magic I/O port
 for the xen platform PV driver, but if you can simply use two PCI
 windows it would be much simpler because that's the same that TCG and
 KVM already do.  The code is all there for you to lift in SeaBIOS.
 
 What is the connection here to the platform PV driver?

It's just a hook you already have for Xen-specific stuff in QEMU.

 Only Windows XP and older had problems with that because they didn't
 like something in the ASL; but the 64-bit window is placed at the end of
 RAM, so in principle any PAE-enabled OS can use it.
 
 At the end of _RAM_???

Why the question marks? :)

If you have 4GB of RAM it will end at 0x140000000 (or something like
that) and that's where the 64-bit window starts.  Of course if you have
no RAM above the PCI hole, the 64-bit window will start at 0x100000000.

Paolo
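
For the record, the arithmetic Paolo describes above can be written down in a few lines; the 3GiB low-RAM limit (i.e. a 1GiB hole below 4GiB) is an assumed example value chosen so the numbers match the ones above, not necessarily QEMU's exact default:

#include <stdint.h>
#include <stdio.h>

#define GiB            (1ULL << 30)
#define LOW_RAM_LIMIT  (3ULL * GiB)   /* assumed RAM limit below the PCI hole */

/* The 64-bit PCI window starts right after guest RAM, and never below 4GiB. */
static uint64_t pci64_window_base(uint64_t ram_bytes)
{
    uint64_t high_ram = ram_bytes > LOW_RAM_LIMIT ? ram_bytes - LOW_RAM_LIMIT : 0;

    return 4 * GiB + high_ram;
}

int main(void)
{
    /* 4GiB guest: 1GiB of RAM is pushed above 4GiB, window at 0x140000000. */
    printf("4GiB guest: %#llx\n", (unsigned long long)pci64_window_base(4 * GiB));
    /* 2GiB guest: no RAM above the hole, window at 0x100000000. */
    printf("2GiB guest: %#llx\n", (unsigned long long)pci64_window_base(2 * GiB));
    return 0;
}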



Re: [Qemu-devel] [Xen-devel] [BUG 1747]Guest could't find bootable device with memory more than 3600M

2013-06-12 Thread Jan Beulich
 On 12.06.13 at 16:02, Paolo Bonzini pbonz...@redhat.com wrote:
 On 12/06/2013 09:49, Jan Beulich wrote:
 #3 should be possible or even the default (would need to check), but #4
 is probably a bit harder to do.  Perhaps you can use a magic I/O port
 for the xen platform PV driver, but if you can simply use two PCI
 windows it would be much simpler because that's the same that TCG and
 KVM already do.  The code is all there for you to lift in SeaBIOS.
 
 What is the connection here to the platform PV driver?
 
 It's just a hook you already have for Xen-specific stuff in QEMU.

Oh, sorry, I'm generally taking this term to refer to a Linux
component.

 Only Windows XP and older had problems with that because they didn't
 like something in the ASL; but the 64-bit window is placed at the end of
 RAM, so in principle any PAE-enabled OS can use it.
 
 At the end of _RAM_???
 
 Why the question marks? :)

Ah, so you mean right after RAM. "At the end of RAM" reads like
overlapping (and discarding) the tail of it.

 If you have 4GB of RAM it will end at 0x140000000 (or something like
 that) and that's where the 64-bit window starts.  Of course if you have
 no RAM above the PCI hole, the 64-bit window will start at 0x100000000.

So there's no provision whatsoever for extending the amount of RAM
a guest may see? This is why I'd expect any such allocation strategy to
start at the end of the physical address space, moving downwards.

Jan
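
A sketch of the top-down strategy Jan is arguing for, assuming power-of-two BAR sizes and a known guest physical address width (36 bits is just an example); illustrative only, not hvmloader's implementation:

#include <stdint.h>
#include <stdio.h>

static uint64_t next_free_top;          /* current top of the unused area */

static void pci64_init(unsigned int phys_bits)
{
    next_free_top = 1ULL << phys_bits;  /* start at the end of the address space */
}

/* Allocate downwards, keeping each BAR naturally aligned to its size. */
static uint64_t pci64_alloc(uint64_t size)
{
    uint64_t base = (next_free_top - size) & ~(size - 1);

    next_free_top = base;
    return base;
}

int main(void)
{
    pci64_init(36);                                 /* e.g. a 36-bit guest */
    printf("1GiB BAR at %#llx\n", (unsigned long long)pci64_alloc(1ULL << 30));
    printf("256MiB BAR at %#llx\n", (unsigned long long)pci64_alloc(1ULL << 28));
    return 0;
}

Growing downwards from the top of the address space leaves the area just above RAM free, which is what would make extending or hot-plugging RAM later possible.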




Re: [Qemu-devel] [Xen-devel] [BUG 1747]Guest could't find bootable device with memory more than 3600M

2013-06-12 Thread George Dunlap

On 12/06/13 15:19, Jan Beulich wrote:

On 12.06.13 at 16:02, Paolo Bonzini pbonz...@redhat.com wrote:

On 12/06/2013 09:49, Jan Beulich wrote:

#3 should be possible or even the default (would need to check), but #4
is probably a bit harder to do.  Perhaps you can use a magic I/O port
for the xen platform PV driver, but if you can simply use two PCI
windows it would be much simpler because that's the same that TCG and
KVM already do.  The code is all there for you to lift in SeaBIOS.

What is the connection here to the platform PV driver?

It's just a hook you already have for Xen-specific stuff in QEMU.

Oh, sorry, I'm generally taking this term to refer to a Linux
component.


Only Windows XP and older had problems with that because they didn't
like something in the ASL; but the 64-bit window is placed at the end of
RAM, so in principle any PAE-enabled OS can use it.

At the end of _RAM_???

Why the question marks? :)

Ah, so you mean right after RAM. "At the end of RAM" reads like
overlapping (and discarding) the tail of it.


If you have 4GB of RAM it will end at 0x140000000 (or something like
that) and that's where the 64-bit window starts.  Of course if you have
no RAM above the PCI hole, the 64-bit window will start at 0x100000000.

So there's no provision whatsoever for extending the amount of RAM
a guest may see? This is why I'd expect any such allocation strategy to
start at the end of the physical address space, moving downwards.


Is there a mechanism to do memory hot-plug in qemu at the moment? If 
not, then there's no reason to put it anywhere else.


 -George




Re: [Qemu-devel] [Xen-devel] [BUG 1747]Guest could't find bootable device with memory more than 3600M

2013-06-12 Thread Paolo Bonzini
On 12/06/2013 11:25, George Dunlap wrote:
 If you have 4GB of RAM it will end at 0x140000000 (or something like
 that) and that's where the 64-bit window starts.  Of course if you have
 no RAM above the PCI hole, the 64-bit window will start at 0x100000000.
 So there's no provision whatsoever for extending the amount of RAM
 a guest may see? This is why I'd expect any such allocation strategy to
 start at the end of the physical address space, moving downwards.

That'd work too, I guess.

 Is there a mechanism to do memory hot-plug in qemu at the moment? If
 not, then there's no reason to put it anywhere else.

Not yet, but then memory could also be discontiguous as long as you
describe it correctly in the tables.

Paolo
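
A sketch of what "describe it correctly in the tables" could look like in an e820-style memory map: RAM split into two ranges, with the PCI hole (and any 64-bit window) left out of the RAM type. Types 1 and 2 are the standard e820 "usable" and "reserved" values; the exact addresses are illustrative:

#include <stdint.h>
#include <stdio.h>

#define E820_RAM       1
#define E820_RESERVED  2

struct e820entry {
    uint64_t addr;
    uint64_t size;
    uint32_t type;
};

static const struct e820entry e820_map[] = {
    { 0x0000000000000000ULL, 0xe0000000ULL, E820_RAM      },  /* RAM below the PCI hole */
    { 0x00000000e0000000ULL, 0x1c000000ULL, E820_RESERVED },  /* 32-bit MMIO hole       */
    { 0x0000000100000000ULL, 0x40000000ULL, E820_RAM      },  /* 1GiB of RAM above 4GiB */
    /* a further RAM range could follow above the 64-bit PCI window */
};

int main(void)
{
    for (unsigned int i = 0; i < sizeof(e820_map) / sizeof(e820_map[0]); i++)
        printf("%#018llx +%#llx type %u\n",
               (unsigned long long)e820_map[i].addr,
               (unsigned long long)e820_map[i].size, e820_map[i].type);
    return 0;
}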




Re: [Qemu-devel] [Xen-devel] [BUG 1747]Guest could't find bootable device with memory more than 3600M

2013-06-11 Thread Stefano Stabellini
On Mon, 10 Jun 2013, George Dunlap wrote:
 On Sat, Jun 8, 2013 at 8:27 AM, Hao, Xudong xudong@intel.com wrote:
 
  What happens on real hardware when the BIOS boots and finds out that the
  PCI hole is too small to contain all the MMIO regions of the PCI devices?
 
 
  I do not know the details of what the BIOS does, but I think it goes above 4G
  if the pci hole is too small. I saw a native OS program the pci hole to a very
  high addr 0x3c0c when the pci device has a large bar size.
 
  For Xen hvmloader, the policy is already: if the PCI hole is too small,
  another MMIO region will be created starting from high_mem_pgend.
 
 Yes, and that, as I understand it, is the problem: that this change of
 the MMIO regions is not communicated to qemu.
 
 But it seems like this same problem would occur on real hardware --
 i.e., you add a new video card with 2GiB of video ram, and now you
 need a 2.5 GiB MMIO hole. It seems like the BIOS would be the logical
 place to sort it out.  But I'm not familiar enough with SeaBIOS / qemu
 to really know for sure.

I went through the code that maps the PCI MMIO regions in hvmloader
(tools/firmware/hvmloader/pci.c:pci_setup) and it looks like it already
maps the PCI region to high memory if the PCI bar is 64-bit and the MMIO
region is larger than 512MB.

Maybe we could just relax this condition and map the device memory to
high memory no matter the size of the MMIO region if the PCI bar is
64-bit?
If the Nvidia Quadro that Gonglei was originally trying to assign to the
guest is 64-bit capable, then this would fix the issue in the simplest
possible way.
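
A simplified sketch of the existing check described above and of the proposed relaxation; the shape is paraphrased from the description, and the names are illustrative rather than the exact hvmloader code:

#include <stdint.h>

#define MiB (1ULL << 20)

/* Current behaviour (roughly): only 64-bit BARs larger than 512MB are
 * mapped above 4GiB. */
static int place_high_current(int bar_is_64bit, uint64_t bar_size)
{
    return bar_is_64bit && bar_size > 512 * MiB;
}

/* Proposed relaxation: any 64-bit BAR may be mapped above 4GiB,
 * regardless of its size. */
static int place_high_relaxed(int bar_is_64bit, uint64_t bar_size)
{
    (void)bar_size;
    return bar_is_64bit;
}

int main(void)
{
    /* A 256MiB 64-bit BAR stays low today, but would go high if relaxed. */
    return (place_high_current(1, 256 * MiB) == 0 &&
            place_high_relaxed(1, 256 * MiB) == 1) ? 0 : 1;
}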

Are there actually any PCI devices that people might want to assign to
guests that don't have 64-bit bars and don't fit in
0xe0000000-0xfc000000?