Re: [Xen-devel] maybe revert commit c275a57f5ec3 "xen/balloon: Set balloon's initial state to number of existing RAM pages"

2017-07-09 Thread Juergen Gross
On 08/07/17 02:59, Konrad Rzeszutek Wilk wrote:
> On Tue, Mar 28, 2017 at 05:30:24PM +0200, Juergen Gross wrote:
>> On 28/03/17 16:27, Boris Ostrovsky wrote:
>>> On 03/28/2017 04:08 AM, Jan Beulich wrote:
>>>>>>> On 28.03.17 at 03:57,  wrote:
>>>>> I think there is indeed a disconnect between target memory (provided by
>>>>> the toolstack) and current memory (i.e actual pages available to the
>>>>> guest).
>>>>>
>>>>> For example
>>>>>
>>>>> [0.000000] BIOS-e820: [mem 0x000000000009e000-0x000000000009ffff]
>>>>> reserved
>>>>> [0.000000] BIOS-e820: [mem 0x00000000000e0000-0x00000000000fffff]
>>>>> reserved
>>>>>
>>>>> are missed in target calculation. The hvmloader marks them as RESERVED
>>>>> (in build_e820_table()) but target value is not aware of this action.
>>>>>
>>>>> And then the same problem repeats when kernel removes
>>>>> 0x000a0000-0x000fffff chunk.
>>>> But this is all in-guest behavior, i.e. nothing an entity outside the
>>>> guest (tool stack or hypervisor) should need to be aware of. That
>>>> said, there is still room for improvement in the tools I think:
>>>> Regions which architecturally aren't RAM (namely the
>>>> 0xa0000-0xfffff range) would probably better not be accounted
>>>> for as RAM as far as ballooning is concerned. In the hypervisor,
>>>> otoh, all memory assigned to the guest (i.e. including such backing
>>>> ROMs) needs to be accounted.
>>>
>>> On the Linux side we should not include in balloon calculations pages
>>> reserved by trim_bios_range(), i.e. (BIOS_END-BIOS_BEGIN) + 1.
>>>
>>> Which leaves hvmloader's special pages (and possibly memory under
>>> 0xA0000 which may get reserved). Can we pass this info to guests via
>>> xenstore?
>>
>> I'd rather keep an internal difference between online pages and E820-map
>> count value in the balloon driver. This should work always.
> 
> Did we ever come up with a patch for this?

Yes, I've sent V2 recently:

https://lists.xen.org/archives/html/xen-devel/2017-07/msg00530.html


Juergen


Re: [Xen-devel] maybe revert commit c275a57f5ec3 "xen/balloon: Set balloon's initial state to number of existing RAM pages"

2017-07-07 Thread Konrad Rzeszutek Wilk
On Tue, Mar 28, 2017 at 05:30:24PM +0200, Juergen Gross wrote:
> On 28/03/17 16:27, Boris Ostrovsky wrote:
> > On 03/28/2017 04:08 AM, Jan Beulich wrote:
> >>>>> On 28.03.17 at 03:57,  wrote:
> >>> I think there is indeed a disconnect between target memory (provided by
> >>> the toolstack) and current memory (i.e actual pages available to the
> >>> guest).
> >>>
> >>> For example
> >>>
> >>> [0.000000] BIOS-e820: [mem 0x000000000009e000-0x000000000009ffff]
> >>> reserved
> >>> [0.000000] BIOS-e820: [mem 0x00000000000e0000-0x00000000000fffff]
> >>> reserved
> >>>
> >>> are missed in target calculation. The hvmloader marks them as RESERVED
> >>> (in build_e820_table()) but target value is not aware of this action.
> >>>
> >>> And then the same problem repeats when kernel removes
> >>> 0x000a0000-0x000fffff chunk.
> >> But this is all in-guest behavior, i.e. nothing an entity outside the
> >> guest (tool stack or hypervisor) should need to be aware of. That
> >> said, there is still room for improvement in the tools I think:
> >> Regions which architecturally aren't RAM (namely the
> >> 0xa0000-0xfffff range) would probably better not be accounted
> >> for as RAM as far as ballooning is concerned. In the hypervisor,
> >> otoh, all memory assigned to the guest (i.e. including such backing
> >> ROMs) needs to be accounted.
> >
> > On the Linux side we should not include in balloon calculations pages
> > reserved by trim_bios_range(), i.e. (BIOS_END-BIOS_BEGIN) + 1.
> >
> > Which leaves hvmloader's special pages (and possibly memory under
> > 0xA0000 which may get reserved). Can we pass this info to guests via
> > xenstore?
> 
> I'd rather keep an internal difference between online pages and E820-map
> count value in the balloon driver. This should work always.

Did we ever come up with a patch for this?


Re: [Xen-devel] maybe revert commit c275a57f5ec3 "xen/balloon: Set balloon's initial state to number of existing RAM pages"

2017-03-28 Thread Juergen Gross
On 28/03/17 18:32, Boris Ostrovsky wrote:
> On 03/28/2017 11:30 AM, Juergen Gross wrote:
>> On 28/03/17 16:27, Boris Ostrovsky wrote:
>>> On 03/28/2017 04:08 AM, Jan Beulich wrote:
>>>>>>> On 28.03.17 at 03:57,  wrote:
>>>>> I think there is indeed a disconnect between target memory (provided by
>>>>> the toolstack) and current memory (i.e actual pages available to the
>>>>> guest).
>>>>>
>>>>> For example
>>>>>
>>>>> [0.000000] BIOS-e820: [mem 0x000000000009e000-0x000000000009ffff]
>>>>> reserved
>>>>> [0.000000] BIOS-e820: [mem 0x00000000000e0000-0x00000000000fffff]
>>>>> reserved
>>>>>
>>>>> are missed in target calculation. The hvmloader marks them as RESERVED
>>>>> (in build_e820_table()) but target value is not aware of this action.
>>>>>
>>>>> And then the same problem repeats when kernel removes
>>>>> 0x000a0000-0x000fffff chunk.
>>>> But this is all in-guest behavior, i.e. nothing an entity outside the
>>>> guest (tool stack or hypervisor) should need to be aware of. That
>>>> said, there is still room for improvement in the tools I think:
>>>> Regions which architecturally aren't RAM (namely the
>>>> 0xa0000-0xfffff range) would probably better not be accounted
>>>> for as RAM as far as ballooning is concerned. In the hypervisor,
>>>> otoh, all memory assigned to the guest (i.e. including such backing
>>>> ROMs) needs to be accounted.
>>> On the Linux side we should not include in balloon calculations pages
>>> reserved by trim_bios_range(), i.e. (BIOS_END-BIOS_BEGIN) + 1.
>>>
>>> Which leaves hvmloader's special pages (and possibly memory under
>>> 0xA0000 which may get reserved). Can we pass this info to guests via
>>> xenstore?
>> I'd rather keep an internal difference between online pages and E820-map
>> count value in the balloon driver. This should work always.
> 
> We could indeed base calculation on initial state of e820 and not count
> the holes toward ballooning needs. I am not sure this will work for
> memory unplug though, where a hole can be created in the map and we will
> be supposed to handle disappearing memory via ballooning.
> 
> Or am I creating a problem where none exists?

I'm rather sure memory has to be offlined before being deleted from
the E820 map.


Juergen



Re: [Xen-devel] maybe revert commit c275a57f5ec3 "xen/balloon: Set balloon's initial state to number of existing RAM pages"

2017-03-28 Thread Boris Ostrovsky
On 03/28/2017 11:30 AM, Juergen Gross wrote:
> On 28/03/17 16:27, Boris Ostrovsky wrote:
>> On 03/28/2017 04:08 AM, Jan Beulich wrote:
>>>>>> On 28.03.17 at 03:57,  wrote:
>>>> I think there is indeed a disconnect between target memory (provided by
>>>> the toolstack) and current memory (i.e actual pages available to the
>>>> guest).
>>>>
>>>> For example
>>>>
>>>> [0.000000] BIOS-e820: [mem 0x000000000009e000-0x000000000009ffff]
>>>> reserved
>>>> [0.000000] BIOS-e820: [mem 0x00000000000e0000-0x00000000000fffff]
>>>> reserved
>>>>
>>>> are missed in target calculation. The hvmloader marks them as RESERVED
>>>> (in build_e820_table()) but target value is not aware of this action.
>>>>
>>>> And then the same problem repeats when kernel removes
>>>> 0x000a0000-0x000fffff chunk.
>>> But this is all in-guest behavior, i.e. nothing an entity outside the
>>> guest (tool stack or hypervisor) should need to be aware of. That
>>> said, there is still room for improvement in the tools I think:
>>> Regions which architecturally aren't RAM (namely the
>>> 0xa0000-0xfffff range) would probably better not be accounted
>>> for as RAM as far as ballooning is concerned. In the hypervisor,
>>> otoh, all memory assigned to the guest (i.e. including such backing
>>> ROMs) needs to be accounted.
>> On the Linux side we should not include in balloon calculations pages
>> reserved by trim_bios_range(), i.e. (BIOS_END-BIOS_BEGIN) + 1.
>>
>> Which leaves hvmloader's special pages (and possibly memory under
>> 0xA0000 which may get reserved). Can we pass this info to guests via
>> xenstore?
> I'd rather keep an internal difference between online pages and E820-map
> count value in the balloon driver. This should work always.

We could indeed base calculation on initial state of e820 and not count
the holes toward ballooning needs. I am not sure this will work for
memory unplug though, where a hole can be created in the map and we will
be supposed to handle disappearing memory via ballooning.

Or am I creating a problem where none exists?

-boris


Re: [Xen-devel] maybe revert commit c275a57f5ec3 "xen/balloon: Set balloon's initial state to number of existing RAM pages"

2017-03-28 Thread Juergen Gross
On 28/03/17 16:27, Boris Ostrovsky wrote:
> On 03/28/2017 04:08 AM, Jan Beulich wrote:
>>>>> On 28.03.17 at 03:57,  wrote:
>>> I think there is indeed a disconnect between target memory (provided by
>>> the toolstack) and current memory (i.e actual pages available to the guest).
>>>
>>> For example
>>>
>>> [0.000000] BIOS-e820: [mem 0x000000000009e000-0x000000000009ffff]
>>> reserved
>>> [0.000000] BIOS-e820: [mem 0x00000000000e0000-0x00000000000fffff]
>>> reserved
>>>
>>> are missed in target calculation. The hvmloader marks them as RESERVED
>>> (in build_e820_table()) but target value is not aware of this action.
>>>
>>> And then the same problem repeats when kernel removes
>>> 0x000a0000-0x000fffff chunk.
>> But this is all in-guest behavior, i.e. nothing an entity outside the
>> guest (tool stack or hypervisor) should need to be aware of. That
>> said, there is still room for improvement in the tools I think:
>> Regions which architecturally aren't RAM (namely the
>> 0xa0000-0xfffff range) would probably better not be accounted
>> for as RAM as far as ballooning is concerned. In the hypervisor,
>> otoh, all memory assigned to the guest (i.e. including such backing
>> ROMs) needs to be accounted.
>
> On the Linux side we should not include in balloon calculations pages
> reserved by trim_bios_range(), i.e. (BIOS_END-BIOS_BEGIN) + 1.
>
> Which leaves hvmloader's special pages (and possibly memory under
> 0xA0000 which may get reserved). Can we pass this info to guests via
> xenstore?

I'd rather keep an internal difference between online pages and E820-map
count value in the balloon driver. This should work always.
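
One possible reading of this proposal, as a minimal sketch rather than the actual patch: the driver records once how many pages the E820 map covers beyond the pages it has online, and subtracts that fixed difference from every target it receives. The class and attribute names are hypothetical. The figure 0x1fff9d is the current page count from the debug output posted earlier in this thread, and 0x200000 is that count plus the reported credit of 0x63.

```python
class BalloonAccounting:
    """Hypothetical model of target handling with a fixed internal offset."""

    def __init__(self, e820_map_pages, online_pages):
        # Pages the toolstack counts as RAM but the guest never has online
        # (reserved ranges, BIOS area, first page). Assumed constant.
        self.hidden_pages = e820_map_pages - online_pages
        self.current_pages = online_pages

    def credit(self, target_pages):
        # Positive credit: balloon must give pages back to the guest.
        # Negative credit: balloon must take pages away from the guest.
        return (target_pages - self.hidden_pages) - self.current_pages


acct = BalloonAccounting(e820_map_pages=0x200000, online_pages=0x1fff9d)
print(hex(acct.hidden_pages))      # pages excluded from the comparison
print(acct.credit(0x200000))       # initial target: nothing to do
print(hex(acct.credit(0x200100)))  # target raised by 0x100 pages
```

With the offset in place, the initial target yields a credit of zero, so the driver would not try to hotplug memory at boot in this model.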


Juergen



Re: [Xen-devel] maybe revert commit c275a57f5ec3 "xen/balloon: Set balloon's initial state to number of existing RAM pages"

2017-03-28 Thread Jan Beulich
>>> On 28.03.17 at 16:27,  wrote:
> Which leaves hvmloader's special pages (and possibly memory under
> 0xA0000 which may get reserved). Can we pass this info to guests via
> xenstore?

I'm perhaps the wrong one to ask regarding xenstore, but for
in-guest communication this seems an at least strange approach
to me.

Jan



Re: [Xen-devel] maybe revert commit c275a57f5ec3 "xen/balloon: Set balloon's initial state to number of existing RAM pages"

2017-03-28 Thread Boris Ostrovsky
On 03/28/2017 04:08 AM, Jan Beulich wrote:
>>>> On 28.03.17 at 03:57,  wrote:
>> I think there is indeed a disconnect between target memory (provided by
>> the toolstack) and current memory (i.e actual pages available to the guest).
>>
>> For example
>>
>> [0.000000] BIOS-e820: [mem 0x000000000009e000-0x000000000009ffff]
>> reserved
>> [0.000000] BIOS-e820: [mem 0x00000000000e0000-0x00000000000fffff]
>> reserved
>>
>> are missed in target calculation. The hvmloader marks them as RESERVED
>> (in build_e820_table()) but target value is not aware of this action.
>>
>> And then the same problem repeats when kernel removes
>> 0x000a0000-0x000fffff chunk.
> But this is all in-guest behavior, i.e. nothing an entity outside the
> guest (tool stack or hypervisor) should need to be aware of. That
> said, there is still room for improvement in the tools I think:
> Regions which architecturally aren't RAM (namely the
> 0xa0000-0xfffff range) would probably better not be accounted
> for as RAM as far as ballooning is concerned. In the hypervisor,
> otoh, all memory assigned to the guest (i.e. including such backing
> ROMs) needs to be accounted.

On the Linux side we should not include in balloon calculations pages
reserved by trim_bios_range(), i.e. (BIOS_END-BIOS_BEGIN) + 1.

Which leaves hvmloader's special pages (and possibly memory under
0xA0000 which may get reserved). Can we pass this info to guests via
xenstore?

-boris




Re: [Xen-devel] maybe revert commit c275a57f5ec3 "xen/balloon: Set balloon's initial state to number of existing RAM pages"

2017-03-28 Thread Jan Beulich
>>> On 28.03.17 at 03:57,  wrote:
> I think there is indeed a disconnect between target memory (provided by 
> the toolstack) and current memory (i.e actual pages available to the guest).
> 
> For example
> 
> [0.000000] BIOS-e820: [mem 0x000000000009e000-0x000000000009ffff]
> reserved
> [0.000000] BIOS-e820: [mem 0x00000000000e0000-0x00000000000fffff]
> reserved
>
> are missed in target calculation. The hvmloader marks them as RESERVED
> (in build_e820_table()) but target value is not aware of this action.
>
> And then the same problem repeats when kernel removes
> 0x000a0000-0x000fffff chunk.

But this is all in-guest behavior, i.e. nothing an entity outside the
guest (tool stack or hypervisor) should need to be aware of. That
said, there is still room for improvement in the tools I think:
Regions which architecturally aren't RAM (namely the
0xa0000-0xfffff range) would probably better not be accounted
for as RAM as far as ballooning is concerned. In the hypervisor,
otoh, all memory assigned to the guest (i.e. including such backing
ROMs) needs to be accounted.

Jan



Re: maybe revert commit c275a57f5ec3 "xen/balloon: Set balloon's initial state to number of existing RAM pages"

2017-03-27 Thread Boris Ostrovsky



On 03/27/2017 03:57 PM, Dan Streetman wrote:
On Fri, Mar 24, 2017 at 9:33 PM, Boris Ostrovsky wrote:

I think we can all agree that the *ideal* situation would be, for the
balloon driver to not immediately hotplug memory so it can add 11 more
pages, so maybe I just need to figure out why the balloon driver
thinks it needs 11 more pages, and fix that.

How does the new memory appear in the guest? Via online_pages()?

Or is ballooning triggered from watch_target()?


yes, it's triggered from watch_target() which then calls
online_pages() with the new memory.  I added some debug (all numbers
are in hex):

[0.500080] xen:balloon: Initialising balloon driver
[0.503027] xen:balloon: balloon_init: current/target pages 1fff9d
[0.504044] xen_balloon: Initialising balloon driver
[0.508046] xen_balloon: watch_target: new target 800000 kb
[0.508046] xen:balloon: balloon_set_new_target: target 200000
[0.524024] xen:balloon: current_credit: target pages 200000 current pages 1fff9d credit 63
[0.567055] xen:balloon: balloon_process: current_credit 63
[0.568005] xen:balloon: reserve_additional_memory: adding memory resource for 8000 pages
[3.694443] online_pages: pfn 210000 nr_pages 8000 type 0
[3.701072] xen:balloon: current_credit: target pages 200000 current pages 1fff9d credit 63
[3.701074] xen:balloon: balloon_process: current_credit 63
[3.701075] xen:balloon: increase_reservation: nr_pages 63
[3.701170] xen:balloon: increase_reservation: done, current_pages 1fffa8
[3.701172] xen:balloon: current_credit: target pages 200000 current pages 1fffa8 credit 58
[3.701173] xen:balloon: balloon_process: current_credit 58
[3.701173] xen:balloon: increase_reservation: nr_pages 58
[3.701180] xen:balloon: increase_reservation: XENMEM_populate_physmap err 0
[5.708085] xen:balloon: current_credit: target pages 200000 current pages 1fffa8 credit 58
[5.708088] xen:balloon: balloon_process: current_credit 58
[5.708089] xen:balloon: increase_reservation: nr_pages 58
[5.708106] xen:balloon: increase_reservation: XENMEM_populate_physmap err 0
[9.716065] xen:balloon: current_credit: target pages 200000 current pages 1fffa8 credit 58
[9.716068] xen:balloon: balloon_process: current_credit 58
[9.716069] xen:balloon: increase_reservation: nr_pages 58
[9.716087] xen:balloon: increase_reservation: XENMEM_populate_physmap err 0


and that continues forever at the max interval (32), since
max_retry_count is unlimited.  So I think I understand things now;
first, the current_pages is set properly based on the e820 map:

$ dmesg|grep -i e820
[0.000000] e820: BIOS-provided physical RAM map:
[0.000000] BIOS-e820: [mem 0x0000000000000000-0x000000000009dfff] usable
[0.000000] BIOS-e820: [mem 0x000000000009e000-0x000000000009ffff] reserved
[0.000000] BIOS-e820: [mem 0x00000000000e0000-0x00000000000fffff] reserved
[0.000000] BIOS-e820: [mem 0x0000000000100000-0x00000000efffffff] usable
[0.000000] BIOS-e820: [mem 0x00000000fc000000-0x00000000ffffffff] reserved
[0.000000] BIOS-e820: [mem 0x0000000100000000-0x000000020fffffff] usable
[0.000000] e820: update [mem 0x00000000-0x00000fff] usable ==> reserved
[0.000000] e820: remove [mem 0x000a0000-0x000fffff] usable
[0.000000] e820: last_pfn = 0x210000 max_arch_pfn = 0x400000000
[0.000000] e820: last_pfn = 0xf0000 max_arch_pfn = 0x400000000
[0.000000] e820: [mem 0xf0000000-0xfbffffff] available for PCI devices
[0.528007] e820: reserve RAM buffer [mem 0x0009e000-0x0009ffff]
ubuntu@ip-172-31-60-112:~$ printf "%x\n" $[ 0x210000 - 0x100000 +
0xf0000 - 0x100 + 0x9e - 1 ]
1fff9d
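
The same page count can be derived directly from the usable E820 ranges instead of by hand. The following is only an illustrative sketch: it assumes 4 KiB pages and the full 64-bit addresses of the three usable ranges (0x0 to 0x9dfff, 0x100000 to 0xefffffff, 0x100000000 to 0x20fffffff), and the helper name `pages` is made up for this example.

```python
PAGE_SHIFT = 12  # 4 KiB pages, as on x86

# Usable ranges as inclusive byte addresses (assumed full-width values).
usable = [
    (0x0000000000000000, 0x000000000009dfff),
    (0x0000000000100000, 0x00000000efffffff),
    (0x0000000100000000, 0x000000020fffffff),
]


def pages(start, end):
    """Number of whole pages in the inclusive byte range [start, end]."""
    return (end + 1 - start) >> PAGE_SHIFT


total = sum(pages(start, end) for start, end in usable)

# The kernel additionally turns the first page (0x0-0xfff) into reserved.
current_pages = total - 1
print(f"{current_pages:x}")
```

The result agrees with the `current/target pages` value that `balloon_init` prints in the debug output.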


then, the xen balloon notices its target has been set to 200000 by the
hypervisor.  That target does account for the hole at 0xf0000 to
0x100000, but it doesn't account for the hole at 0xe0 to 0x100 ( 0x20
pages), nor the hole at 0x9e to 0xa0 ( 2 pages ), nor the unlisted
hole (that the kernel removes) at 0xa0 to 0xe0 ( 0x40 pages).  That's
0x62 pages, plus the 1-page hole at addr 0 that the kernel always
reserves, is 0x63 pages of holes, which aren't accounted for in the
hypervisor's target.
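
As a quick cross-check of that arithmetic, here is a small sketch that sums the holes (as page frame numbers) and compares the total with the gap between target and current pages. The target of 0x200000 pages is an assumption derived from the logged current page count 0x1fff9d plus the logged credit 0x63; the dictionary keys are just labels for this example.

```python
# Holes below 1 MiB that the target does not account for, in pages.
holes = {
    "reserved 0xe0-0x100": 0x100 - 0xe0,
    "reserved 0x9e-0xa0": 0xa0 - 0x9e,
    "removed by kernel 0xa0-0xe0": 0xe0 - 0xa0,
    "first page reserved by kernel": 1,
}

missing = sum(holes.values())

target_pages = 0x200000   # assumed: current pages + logged credit
current_pages = 0x1fff9d  # from the balloon_init debug line

print(hex(missing), hex(target_pages - current_pages))
```

Both values come out equal, which is why the driver computes a credit of 0x63 as soon as it sees the target.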

so the balloon driver hotplugs the memory, and tries to increase its
reservation to provide the needed pages to get the current_pages up to
the target.  However, when it calls the hypervisor to populate the
physmap, the hypervisor only allows 11 (0xb) pages to be populated;
all calls after that get back 0 from the hypervisor.
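
The resulting behaviour can be modelled roughly as follows. This is a simplified sketch, not the driver's code: the function name and the `grantable` parameter are hypothetical, and the doubling back-off capped at 32 is an assumption based on the retry timestamps and the "max interval (32)" mentioned above.

```python
def simulate(credit, grantable, max_delay=32, steps=8):
    """Model repeated reservation attempts.

    Returns a list of (pages_granted, remaining_credit, next_delay) tuples.
    """
    delay = 1
    log = []
    for _ in range(steps):
        if credit == 0:
            break
        granted = min(credit, grantable)  # hypervisor grants what is left
        grantable -= granted
        credit -= granted
        if granted == 0:
            # Nothing populated: back off, doubling up to the maximum.
            delay = min(delay * 2, max_delay)
        else:
            delay = 1
        log.append((granted, credit, delay))
    return log


log = simulate(credit=0x63, grantable=0xb)
for granted, remaining, delay in log:
    print(f"granted {granted:#x}, credit {remaining:#x}, next try in {delay}s")
```

In this model the first attempt obtains 0xb pages and leaves a credit of 0x58, after which every attempt fails and the retry delay settles at the maximum, matching the pattern in the debug log.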

Do you think the hypervisor's balloon target should account for the
e820 holes (and for the kernel's added hole at addr 0)?
Alternately/additionally, if the hypervisor doesn't want to support
ballooning, should it just return error from the call to populate the
physmap, and not allow those 11 pages?

At this point, it doesn't seem to me like the kernel is doing anything
wrong, correct?

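I think there is indeed a disconnect between target memory (provided by
the toolstack) and current memory (i.e actual pages available to the guest).

For example

[0.000000] BIOS-e820: [mem 0x000000000009e000-0x000000000009ffff] reserved
[0.000000] BIOS-e820: [mem 0x00000000000e0000-0x00000000000fffff] reserved

are missed in target calculation. The hvmloader marks them as RESERVED
(in build_e820_table()) but target value is not aware of this action.

And then the same problem repeats when kernel removes
0x000a0000-0x000fffff chunk.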

Re: maybe revert commit c275a57f5ec3 "xen/balloon: Set balloon's initial state to number of existing RAM pages"

2017-03-27 Thread Dan Streetman
On Fri, Mar 24, 2017 at 9:33 PM, Boris Ostrovsky
 wrote:
>
>>
>> I think we can all agree that the *ideal* situation would be, for the
>> balloon driver to not immediately hotplug memory so it can add 11 more
>> pages, so maybe I just need to figure out why the balloon driver
>> thinks it needs 11 more pages, and fix that.
>
>
>
> How does the new memory appear in the guest? Via online_pages()?
>
> Or is ballooning triggered from watch_target()?

yes, it's triggered from watch_target() which then calls
online_pages() with the new memory.  I added some debug (all numbers
are in hex):

[0.500080] xen:balloon: Initialising balloon driver
[0.503027] xen:balloon: balloon_init: current/target pages 1fff9d
[0.504044] xen_balloon: Initialising balloon driver
[0.508046] xen_balloon: watch_target: new target 800000 kb
[0.508046] xen:balloon: balloon_set_new_target: target 200000
[0.524024] xen:balloon: current_credit: target pages 200000 current pages 1fff9d credit 63
[0.567055] xen:balloon: balloon_process: current_credit 63
[0.568005] xen:balloon: reserve_additional_memory: adding memory resource for 8000 pages
[3.694443] online_pages: pfn 210000 nr_pages 8000 type 0
[3.701072] xen:balloon: current_credit: target pages 200000 current pages 1fff9d credit 63
[3.701074] xen:balloon: balloon_process: current_credit 63
[3.701075] xen:balloon: increase_reservation: nr_pages 63
[3.701170] xen:balloon: increase_reservation: done, current_pages 1fffa8
[3.701172] xen:balloon: current_credit: target pages 200000 current pages 1fffa8 credit 58
[3.701173] xen:balloon: balloon_process: current_credit 58
[3.701173] xen:balloon: increase_reservation: nr_pages 58
[3.701180] xen:balloon: increase_reservation: XENMEM_populate_physmap err 0
[5.708085] xen:balloon: current_credit: target pages 200000 current pages 1fffa8 credit 58
[5.708088] xen:balloon: balloon_process: current_credit 58
[5.708089] xen:balloon: increase_reservation: nr_pages 58
[5.708106] xen:balloon: increase_reservation: XENMEM_populate_physmap err 0
[9.716065] xen:balloon: current_credit: target pages 200000 current pages 1fffa8 credit 58
[9.716068] xen:balloon: balloon_process: current_credit 58
[9.716069] xen:balloon: increase_reservation: nr_pages 58
[9.716087] xen:balloon: increase_reservation: XENMEM_populate_physmap err 0
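
To make the units in that debug output concrete: the target arrives in kb and the driver works in pages. A minimal sketch, assuming 4 KiB pages and assuming the target is 0x800000 kb (0x200000 pages), which is what the reported credit of 0x63 implies; the helper name is hypothetical and not taken from the driver:

```python
PAGE_SHIFT = 12  # assumption: 4 KiB pages, as on x86


def kib_to_pages(target_kib: int) -> int:
    """Convert a memory target in KiB to a page count."""
    return target_kib >> (PAGE_SHIFT - 10)


target_pages = kib_to_pages(0x800000)   # assumed watch_target value, in KiB
current_pages = 0x1fff9d                # balloon_init value from the log
credit = target_pages - current_pages   # pages the driver still wants

print(hex(target_pages), hex(credit))
```

With those inputs the credit comes out as 0x63 pages, matching the first current_credit line.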


and that continues forever at the max interval (32), since
max_retry_count is unlimited.  So I think I understand things now;
first, the current_pages is set properly based on the e820 map:

$ dmesg|grep -i e820
[0.000000] e820: BIOS-provided physical RAM map:
[0.000000] BIOS-e820: [mem 0x0000000000000000-0x000000000009dfff] usable
[0.000000] BIOS-e820: [mem 0x000000000009e000-0x000000000009ffff] reserved
[0.000000] BIOS-e820: [mem 0x00000000000e0000-0x00000000000fffff] reserved
[0.000000] BIOS-e820: [mem 0x0000000000100000-0x00000000efffffff] usable
[0.000000] BIOS-e820: [mem 0x00000000fc000000-0x00000000ffffffff] reserved
[0.000000] BIOS-e820: [mem 0x0000000100000000-0x000000020fffffff] usable
[0.000000] e820: update [mem 0x00000000-0x00000fff] usable ==> reserved
[0.000000] e820: remove [mem 0x000a0000-0x000fffff] usable
[0.000000] e820: last_pfn = 0x210000 max_arch_pfn = 0x400000000
[0.000000] e820: last_pfn = 0xf0000 max_arch_pfn = 0x400000000
[0.000000] e820: [mem 0xf0000000-0xfbffffff] available for PCI devices
[0.528007] e820: reserve RAM buffer [mem 0x0009e000-0x0009ffff]
ubuntu@ip-172-31-60-112:~$ printf "%x\n" $[ 0x210000 - 0x100000 + 0xf0000 - 0x100 + 0x9e - 1 ]
1fff9d
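
The same count can be cross-checked by summing the usable e820 ranges directly. This is only a sketch: it assumes 4 KiB pages, and the full 16-digit range addresses below are my reading of the map (usable RAM below 0x9e000, from 1 MiB up to the hole at 0xf0000000, and from 4 GiB up to last_pfn 0x210000), not something to rely on as exact output:

```python
PAGE_SIZE = 0x1000  # assumption: 4 KiB pages

# (start, end_inclusive) of the ranges marked "usable".
# Assumption: addresses written out in full from the e820 map.
USABLE = [
    (0x0000000000000000, 0x000000000009dfff),
    (0x0000000000100000, 0x00000000efffffff),
    (0x0000000100000000, 0x000000020fffffff),
]


def pages(start: int, end_inclusive: int) -> int:
    """Number of whole pages in an inclusive byte range."""
    return (end_inclusive + 1 - start) // PAGE_SIZE


usable_pages = sum(pages(start, end) for start, end in USABLE)
kernel_reserved = 1  # page 0, per "update ... usable ==> reserved"
current_pages = usable_pages - kernel_reserved

print(hex(current_pages))
```

Under those assumptions the sum agrees with the 1fff9d that balloon_init reports.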


then, the xen balloon notices its target has been set to 200000 by the
hypervisor.  That target does account for the hole at 0xf0000 to
0x100000, but it doesn't account for the hole at 0xe0 to 0x100 ( 0x20
pages), nor the hole at 0x9e to 0xa0 ( 2 pages ), nor the unlisted
hole (that the kernel removes) at 0xa0 to 0xe0 ( 0x40 pages).  That's
0x62 pages, plus the 1-page hole at addr 0 that the kernel always
reserves, is 0x63 pages of holes, which aren't accounted for in the
hypervisor's target.
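
Spelling out that hole arithmetic as a quick check (a sketch only; the pfn ranges are the ones listed in the paragraph above, and the dictionary labels are just descriptive):

```python
# Holes as (first_pfn, end_pfn_exclusive), taken from the paragraph above.
HOLES_NOT_IN_TARGET = {
    "reserved, 0xe0 to 0x100": (0xe0, 0x100),
    "reserved, 0x9e to 0xa0": (0x9e, 0xa0),
    "removed by the kernel, 0xa0 to 0xe0": (0xa0, 0xe0),
    "page at addr 0 reserved by the kernel": (0x0, 0x1),
}

missing = sum(end - start for start, end in HOLES_NOT_IN_TARGET.values())

for name, (start, end) in HOLES_NOT_IN_TARGET.items():
    print(f"{name}: {end - start:#x} pages")
print(f"total: {missing:#x} pages")
```

The total is 0x63 pages, the same as the credit the balloon driver computes at boot.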

so the balloon driver hotplugs the memory, and tries to increase its
reservation to provide the needed pages to get the current_pages up to
the target.  However, when it calls the hypervisor to populate the
physmap, the hypervisor only allows 11 (0xb) pages to be populated;
all calls after that get back 0 from the hypervisor.
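
A toy model of that retry behaviour, to show why it never settles. Everything here is an assumption for illustration: the hypervisor is modelled as a fixed quota of 0xb pages, and the delay is assumed to start at 1, double after each failed attempt and cap at 32, which is what the roughly 2 s and 4 s gaps in my debug timestamps suggest. The function name and parameters are hypothetical:

```python
def simulate(credit: int, quota: int, attempts: int, max_delay: int = 32):
    """Model balloon retries against a hypervisor granting at most `quota` pages.

    Returns one (time_s, granted, credit_left) tuple per attempt.
    """
    time_s, delay, trace = 0, 1, []
    for _ in range(attempts):
        if credit == 0:
            break  # target reached, nothing left to do
        granted = min(credit, quota)
        quota -= granted
        credit -= granted
        trace.append((time_s, granted, credit))
        if granted:
            delay = 1  # progress: try again right away
        else:
            delay = min(delay * 2, max_delay)  # no pages: back off
            time_s += delay
    return trace


for time_s, granted, left in simulate(credit=0x63, quota=0xb, attempts=8):
    print(f"t={time_s:3d}s granted={granted:#x} credit={left:#x}")
```

In this model the credit stays at 0x58 after the first attempt and the retries simply space out until they hit the cap.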

Do you think the hypervisor's balloon target should account for the
e820 holes (and for the kernel's added hole at addr 0)?
Alternately/additionally, if the hypervisor doesn't want to support
ballooning, should it just return error from the call to populate the
physmap, and not allow those 11 pages?

At this point, it doesn't seem to me like the kernel is doing anything
wrong, correct?


Re: maybe revert commit c275a57f5ec3 "xen/balloon: Set balloon's initial state to number of existing RAM pages"

2017-03-24 Thread Boris Ostrovsky




> I think we can all agree that the *ideal* situation would be, for the
> balloon driver to not immediately hotplug memory so it can add 11 more
> pages, so maybe I just need to figure out why the balloon driver
> thinks it needs 11 more pages, and fix that.



How does the new memory appear in the guest? Via online_pages()?

Or is ballooning triggered from watch_target()?

-boris


Re: maybe revert commit c275a57f5ec3 "xen/balloon: Set balloon's initial state to number of existing RAM pages"

2017-03-24 Thread Dan Streetman
On Fri, Mar 24, 2017 at 5:10 PM, Konrad Rzeszutek Wilk
 wrote:
> On Fri, Mar 24, 2017 at 04:34:23PM -0400, Dan Streetman wrote:
>> On Wed, Mar 22, 2017 at 10:13 PM, Boris Ostrovsky
>>  wrote:
>> >
>> >
>> > On 03/22/2017 05:16 PM, Dan Streetman wrote:
>> >>
>> >> I have a question about a problem introduced by this commit:
>> >> c275a57f5ec3056f732843b11659d892235faff7
>> >> "xen/balloon: Set balloon's initial state to number of existing RAM pages"
>> >>
>> >> It changed the xen balloon current_pages calculation to start with the
>> >> number of physical pages in the system, instead of max_pfn.  Since
>> >> get_num_physpages() does not include holes, it's always less than the
>> >> e820 map's max_pfn.
>> >>
>> >> However, the problem that commit introduced is, if the hypervisor sets
>> >> the balloon target to equal to the e820 map's max_pfn, then the
>> >> balloon target will *always* be higher than the initial current pages.
>> >> Even if the hypervisor sets the target to (e820 max_pfn - holes), if
>> >> the OS adds any holes, the balloon target will be higher than the
>> >> current pages.  This is the situation, for example, for Amazon AWS
>> >> instances.  The result is, the xen balloon will always immediately
>> >> hotplug some memory at boot, but then make only (max_pfn -
>> >> get_num_physpages()) available to the system.
>> >>
>> >> This balloon-hotplugged memory can cause problems, if the hypervisor
>> >> wasn't expecting it; specifically, the system's physical page
>> >> addresses now will exceed the e820 map's max_pfn, due to the
>> >> balloon-hotplugged pages; if the hypervisor isn't expecting pt-device
>> >> DMA to/from those physical pages above the e820 max_pfn, it causes
>> >> problems.  For example:
>> >> https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1668129
>> >>
>> >> The additional small amount of balloon memory can cause other problems
>> >> as well, for example:
>> >> https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1518457
>> >>
>> >> Anyway, I'd like to ask, was the original commit added because
>> >> hypervisors are supposed to set their balloon target to the guest
>> >> system's number of phys pages (max_pfn - holes)?  The mailing list
>> >> discussion and commit description seem to indicate that.
>> >
>> >
>> >
>> > IIRC the problem that this was trying to fix was that since max_pfn includes
>> > holes, upon booting we'd immediately balloon down by the (typically, MMIO)
>> > hole size.
>> >
>> > If you boot a guest with ~4+GB memory you should see this.
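
To put numbers on both behaviours, here is a small sketch comparing the initial credit before and after the commit. The figures are assumptions taken from one guest elsewhere in this thread (last_pfn 0x210000, 0x1fff9d RAM pages, target 0x200000 pages); the function name is hypothetical:

```python
MAX_PFN = 0x210000     # assumed e820 last_pfn, includes the MMIO hole
PHYS_PAGES = 0x1fff9d  # assumed pages actually usable as RAM
TARGET = 0x200000      # assumed target set by the toolstack


def initial_credit(current_pages: int, target: int = TARGET) -> int:
    """Positive: the balloon wants more pages; negative: it gives pages back."""
    return target - current_pages


before = initial_credit(MAX_PFN)     # pre-commit: start from max_pfn
after = initial_credit(PHYS_PAGES)   # post-commit: start from RAM pages

print(before, after)
```

With these figures the pre-commit start point balloons down by 0x10000 pages (the hole size), while the post-commit start point asks for 0x63 pages more.
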
>> >
>> >
>> >> However I'm
>> >> not sure how that is possible, because the kernel reserves its own
>> >> holes, regardless of any predefined holes in the e820 map; for
>> >> example, the kernel reserves 64k (by default) at phys addr 0 (the
>> >> amount of reservation is configurable via CONFIG_X86_RESERVE_LOW).  So
>> >> the hypervisor really has no way to know what the "right" target to
>> >> specify is; unless it knows the exact guest OS and kernel version, and
>> >> kernel config values, it will never be able to correctly specify its
>> >> target to be exactly (e820 max_pfn - all holes).
>> >>
>> >> Should this commit be reverted?  Should the xen balloon target be
>> >> adjusted based on kernel-added e820 holes?
>> >
>> >
>> > I think the second one but shouldn't current_pages be updated, and not the
>> > target? The latter is set by Xen (toolstack, via xenstore usually).
>> >
>> > Also, the bugs above (at least one of them) talk about NVMe and I wonder
>> > whether the memory that they add is of RAM type --- I believe it has its own
>> > type and so perhaps that introduces additional inconsistencies. AWS may have
>> > added their own support for that, which we don't have upstream yet.
>>
>> The type of memory doesn't have anything to do with it.
>>
>> The problem with NVMe is it's a passthrough device, so the guest talks
>> directly to the NVMe controller and does DMA with it.  But the
>> hypervisor does swiotlb translation between the guest physical memory,
>
> Um, the hypervisor does not have SWIOTLB support, only IOMMU support.

heh, well I have no special insight into Amazon's hypervisor, so I
have no idea what underlying memory remapping mechanism it uses :)

>
>> and the host physical memory, so that the NVMe device can correctly
>> DMA to the right memory in the host.
>>
>> However, the hypervisor only has the guest's physical memory up to the
>> max e820 pfn mapped; it didn't expect the balloon driver to hotplug
>> any additional memory above the e820 max pfn, so when the NVMe driver
>> in the guest tries to tell the NVMe controller to DMA to that
>> balloon-hotplugged memory, the hypervisor fails the NVMe request,
>
> But when the memory hotplug happens the hypercalls are done to
> raise the max pfn.

well...all I can say is it rejects DMA above the e820 range.  so this
very well may be a hypervisor bug, where it should add the balloon
memory region to whatever does the NVMe passthrough 

Re: maybe revert commit c275a57f5ec3 "xen/balloon: Set balloon's initial state to number of existing RAM pages"

2017-03-24 Thread Konrad Rzeszutek Wilk
On Fri, Mar 24, 2017 at 04:34:23PM -0400, Dan Streetman wrote:
> On Wed, Mar 22, 2017 at 10:13 PM, Boris Ostrovsky
>  wrote:
> >
> >
> > On 03/22/2017 05:16 PM, Dan Streetman wrote:
> >>
> >> I have a question about a problem introduced by this commit:
> >> c275a57f5ec3056f732843b11659d892235faff7
> >> "xen/balloon: Set balloon's initial state to number of existing RAM pages"
> >>
> >> It changed the xen balloon current_pages calculation to start with the
> >> number of physical pages in the system, instead of max_pfn.  Since
> >> get_num_physpages() does not include holes, it's always less than the
> >> e820 map's max_pfn.
> >>
> >> However, the problem that commit introduced is, if the hypervisor sets
> >> the balloon target to equal to the e820 map's max_pfn, then the
> >> balloon target will *always* be higher than the initial current pages.
> >> Even if the hypervisor sets the target to (e820 max_pfn - holes), if
> >> the OS adds any holes, the balloon target will be higher than the
> >> current pages.  This is the situation, for example, for Amazon AWS
> >> instances.  The result is, the xen balloon will always immediately
> >> hotplug some memory at boot, but then make only (max_pfn -
> >> get_num_physpages()) available to the system.
> >>
> >> This balloon-hotplugged memory can cause problems, if the hypervisor
> >> wasn't expecting it; specifically, the system's physical page
> >> addresses now will exceed the e820 map's max_pfn, due to the
> >> balloon-hotplugged pages; if the hypervisor isn't expecting pt-device
> >> DMA to/from those physical pages above the e820 max_pfn, it causes
> >> problems.  For example:
> >> https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1668129
> >>
> >> The additional small amount of balloon memory can cause other problems
> >> as well, for example:
> >> https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1518457
> >>
> >> Anyway, I'd like to ask, was the original commit added because
> >> hypervisors are supposed to set their balloon target to the guest
> >> system's number of phys pages (max_pfn - holes)?  The mailing list
> >> discussion and commit description seem to indicate that.
> >
> >
> >
> > IIRC the problem that this was trying to fix was that since max_pfn includes
> > holes, upon booting we'd immediately balloon down by the (typically, MMIO)
> > hole size.
> >
> > If you boot a guest with ~4+GB memory you should see this.
> >
> >
> >> However I'm
> >> not sure how that is possible, because the kernel reserves its own
> >> holes, regardless of any predefined holes in the e820 map; for
> >> example, the kernel reserves 64k (by default) at phys addr 0 (the
> >> amount of reservation is configurable via CONFIG_X86_RESERVE_LOW).  So
> >> the hypervisor really has no way to know what the "right" target to
> >> specify is; unless it knows the exact guest OS and kernel version, and
> >> kernel config values, it will never be able to correctly specify its
> >> target to be exactly (e820 max_pfn - all holes).
> >>
> >> Should this commit be reverted?  Should the xen balloon target be
> >> adjusted based on kernel-added e820 holes?
> >
> >
> > I think the second one but shouldn't current_pages be updated, and not the
> > target? The latter is set by Xen (toolstack, via xenstore usually).
> >
> > Also, the bugs above (at least one of them) talk about NVMe and I wonder
> > whether the memory that they add is of RAM type --- I believe it has its own
> > type and so perhaps that introduces additional inconsistencies. AWS may have
> > added their own support for that, which we don't have upstream yet.
> 
> The type of memory doesn't have anything to do with it.
> 
> The problem with NVMe is it's a passthrough device, so the guest talks
> directly to the NVMe controller and does DMA with it.  But the
> hypervisor does swiotlb translation between the guest physical memory,

Um, the hypervisor does not have SWIOTLB support, only IOMMU support.

> and the host physical memory, so that the NVMe device can correctly
> DMA to the right memory in the host.
> 
> However, the hypervisor only has the guest's physical memory up to the
> max e820 pfn mapped; it didn't expect the balloon driver to hotplug
> any additional memory above the e820 max pfn, so when the NVMe driver
> in the guest tries to tell the NVMe controller to DMA to that
> balloon-hotplugged memory, the hypervisor fails the NVMe request,

But when the memory hotplug happens the hypercalls are done to
raise the max pfn.

> because it can't do the guest-to-host phys mem mapping, since the
> guest phys address is outside the expected max range.
> 
> 
> 
> >
> > -boris
> >
> >
> >
> >> Should something else be
> >> done?
> >>
> >> For context, Amazon Linux has simply disabled Xen ballooning
> >> completely.  Likewise, we're planning to disable Xen ballooning in the
> >> Ubuntu kernel for Amazon AWS-specific kernels (but not for non-AWS
> >> Ubuntu kernels).  However, if reverting this patch makes sense in a
> >> 

Re: maybe revert commit c275a57f5ec3 "xen/balloon: Set balloon's initial state to number of existing RAM pages"

2017-03-24 Thread Dan Streetman
On Wed, Mar 22, 2017 at 10:13 PM, Boris Ostrovsky
 wrote:
>
>
> On 03/22/2017 05:16 PM, Dan Streetman wrote:
>>
>> I have a question about a problem introduced by this commit:
>> c275a57f5ec3056f732843b11659d892235faff7
>> "xen/balloon: Set balloon's initial state to number of existing RAM pages"
>>
>> It changed the xen balloon current_pages calculation to start with the
>> number of physical pages in the system, instead of max_pfn.  Since
>> get_num_physpages() does not include holes, it's always less than the
>> e820 map's max_pfn.
>>
>> However, the problem that commit introduced is, if the hypervisor sets
>> the balloon target to equal to the e820 map's max_pfn, then the
>> balloon target will *always* be higher than the initial current pages.
>> Even if the hypervisor sets the target to (e820 max_pfn - holes), if
>> the OS adds any holes, the balloon target will be higher than the
>> current pages.  This is the situation, for example, for Amazon AWS
>> instances.  The result is, the xen balloon will always immediately
>> hotplug some memory at boot, but then make only (max_pfn -
>> get_num_physpages()) available to the system.
>>
>> This balloon-hotplugged memory can cause problems, if the hypervisor
>> wasn't expecting it; specifically, the system's physical page
>> addresses now will exceed the e820 map's max_pfn, due to the
>> balloon-hotplugged pages; if the hypervisor isn't expecting pt-device
>> DMA to/from those physical pages above the e820 max_pfn, it causes
>> problems.  For example:
>> https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1668129
>>
>> The additional small amount of balloon memory can cause other problems
>> as well, for example:
>> https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1518457
>>
>> Anyway, I'd like to ask, was the original commit added because
>> hypervisors are supposed to set their balloon target to the guest
>> system's number of phys pages (max_pfn - holes)?  The mailing list
>> discussion and commit description seem to indicate that.
>
>
>
> IIRC the problem that this was trying to fix was that since max_pfn includes
> holes, upon booting we'd immediately balloon down by the (typically, MMIO)
> hole size.
>
> If you boot a guest with ~4+GB memory you should see this.
>
>
>> However I'm
>> not sure how that is possible, because the kernel reserves its own
>> holes, regardless of any predefined holes in the e820 map; for
>> example, the kernel reserves 64k (by default) at phys addr 0 (the
>> amount of reservation is configurable via CONFIG_X86_RESERVE_LOW).  So
>> the hypervisor really has no way to know what the "right" target to
>> specify is; unless it knows the exact guest OS and kernel version, and
>> kernel config values, it will never be able to correctly specify its
>> target to be exactly (e820 max_pfn - all holes).
>>
>> Should this commit be reverted?  Should the xen balloon target be
>> adjusted based on kernel-added e820 holes?
>
>
> I think the second one but shouldn't current_pages be updated, and not the
> target? The latter is set by Xen (toolstack, via xenstore usually).
>
> Also, the bugs above (at least one of them) talk about NVMe and I wonder
> whether the memory that they add is of RAM type --- I believe it has its own
> type and so perhaps that introduces additional inconsistencies. AWS may have
> added their own support for that, which we don't have upstream yet.

The type of memory doesn't have anything to do with it.

The problem with NVMe is it's a passthrough device, so the guest talks
directly to the NVMe controller and does DMA with it.  But the
hypervisor does swiotlb translation between the guest physical memory,
and the host physical memory, so that the NVMe device can correctly
DMA to the right memory in the host.

However, the hypervisor only has the guest's physical memory up to the
max e820 pfn mapped; it didn't expect the balloon driver to hotplug
any additional memory above the e820 max pfn, so when the NVMe driver
in the guest tries to tell the NVMe controller to DMA to that
balloon-hotplugged memory, the hypervisor fails the NVMe request,
because it can't do the guest-to-host phys mem mapping, since the
guest phys address is outside the expected max range.



>
> -boris
>
>
>
>> Should something else be
>> done?
>>
>> For context, Amazon Linux has simply disabled Xen ballooning
>> completely.  Likewise, we're planning to disable Xen ballooning in the
>> Ubuntu kernel for Amazon AWS-specific kernels (but not for non-AWS
>> Ubuntu kernels).  However, if reverting this patch makes sense in a
>> bigger context (i.e. Xen users besides AWS), that would allow more
>> Ubuntu kernels to work correctly in AWS instances.
>>
>


Re: maybe revert commit c275a57f5ec3 "xen/balloon: Set balloon's initial state to number of existing RAM pages"

2017-03-24 Thread Dan Streetman
On Thu, Mar 23, 2017 at 3:56 AM, Juergen Gross  wrote:
> On 23/03/17 03:13, Boris Ostrovsky wrote:
>>
>>
>> On 03/22/2017 05:16 PM, Dan Streetman wrote:
>>> I have a question about a problem introduced by this commit:
>>> c275a57f5ec3056f732843b11659d892235faff7
>>> "xen/balloon: Set balloon's initial state to number of existing RAM
>>> pages"
>>>
>>> It changed the xen balloon current_pages calculation to start with the
>>> number of physical pages in the system, instead of max_pfn.  Since
>>> get_num_physpages() does not include holes, it's always less than the
>>> e820 map's max_pfn.
>>>
>>> However, the problem that commit introduced is, if the hypervisor sets
>>> the balloon target to equal to the e820 map's max_pfn, then the
>>> balloon target will *always* be higher than the initial current pages.
>>> Even if the hypervisor sets the target to (e820 max_pfn - holes), if
>>> the OS adds any holes, the balloon target will be higher than the
>>> current pages.  This is the situation, for example, for Amazon AWS
>>> instances.  The result is, the xen balloon will always immediately
>>> hotplug some memory at boot, but then make only (max_pfn -
>>> get_num_physpages()) available to the system.
>>>
>>> This balloon-hotplugged memory can cause problems, if the hypervisor
>>> wasn't expecting it; specifically, the system's physical page
>>> addresses now will exceed the e820 map's max_pfn, due to the
>>> balloon-hotplugged pages; if the hypervisor isn't expecting pt-device
>>> DMA to/from those physical pages above the e820 max_pfn, it causes
>>> problems.  For example:
>>> https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1668129
>>>
>>> The additional small amount of balloon memory can cause other problems
>>> as well, for example:
>>> https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1518457
>>>
>>> Anyway, I'd like to ask, was the original commit added because
>>> hypervisors are supposed to set their balloon target to the guest
>>> system's number of phys pages (max_pfn - holes)?  The mailing list
>>> discussion and commit description seem to indicate that.
>>
>>
>> IIRC the problem that this was trying to fix was that since max_pfn
>> includes holes, upon booting we'd immediately balloon down by the
>> (typically, MMIO) hole size.
>>
>> If you boot a guest with ~4+GB memory you should see this.
>>
>>
>>> However I'm
>>> not sure how that is possible, because the kernel reserves its own
>>> holes, regardless of any predefined holes in the e820 map; for
>>> example, the kernel reserves 64k (by default) at phys addr 0 (the
>>> amount of reservation is configurable via CONFIG_X86_RESERVE_LOW).  So
>>> the hypervisor really has no way to know what the "right" target to
>>> specify is; unless it knows the exact guest OS and kernel version, and
>>> kernel config values, it will never be able to correctly specify its
>>> target to be exactly (e820 max_pfn - all holes).
>>>
>>> Should this commit be reverted?  Should the xen balloon target be
>>> adjusted based on kernel-added e820 holes?
>>
>> I think the second one but shouldn't current_pages be updated, and not
>> the target? The latter is set by Xen (toolstack, via xenstore usually).
>
> Right.
>
> Looking into a HVM domU I can't see any problem related to
> CONFIG_X86_RESERVE_LOW: it is set to 64 on my system. The domU is

sorry I brought that up; I was only giving an example.  It's not
directly relevant to this and may have distracted from the actual
problem; in fact on closer inspection, the X86_RESERVE_LOW is using
memblock_reserve(), which removes it from managed memory but not the
e820 map (and thus doesn't remove it from get_num_physpages()).  Only
phys page 0 is actually reserved in the e820 map.

> configured with 2048 MB of RAM, 8MB being video RAM. Looking into
> /sys/devices/system/xen_memory/xen_memory0 I can see the current
> size and target size do match: both are 2088960 kB (2 GB - 8 MB).
>
> Ballooning down and up to 2048 MB again doesn't change the picture.
>
> So which additional holes are added by the kernel on AWS via which
> functions?

I'll use two AWS types as examples, t2.micro (1G mem) and t2.large (8G mem).

In the micro, the results of ballooning are obvious, because the
hotplugged memory always goes into the Normal zone; but since the base
memory is only 1g, it's contained entirely in the DMA32/DMA zones.  So
we get:

$ grep -E '(start_pfn|present|spanned|managed)' /proc/zoneinfo
spanned  4095
present  3997
managed  3976
  start_pfn: 1
spanned  258048
present  258048
managed  249606
  start_pfn: 4096
spanned  32768
present  32768
managed  11
  start_pfn: 262144
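
For reference, a minimal sketch that converts the page counts above into
sizes. It assumes 4 KiB pages (the usual x86 page size; the output itself
does not state it), and the helper name pages_to_kib is only illustrative:

```python
PAGE_SIZE = 4096  # assumption: 4 KiB pages, not stated in the zoneinfo output


def pages_to_kib(pages):
    """Convert a page count to KiB under the assumed page size."""
    return pages * PAGE_SIZE // 1024


# (start_pfn, spanned, present, managed), copied from the grep output above
zones = [
    (1, 4095, 3997, 3976),
    (4096, 258048, 258048, 249606),
    (262144, 32768, 32768, 11),
]

for start_pfn, spanned, present, managed in zones:
    print(
        f"start_pfn {start_pfn}: "
        f"present {pages_to_kib(present)} KiB, "
        f"managed {pages_to_kib(managed)} KiB"
    )
```

Under that assumption the last zone starts exactly at the 1 GiB boundary
and has 128 MiB present, of which only 44 KiB is managed.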

As you can see, none of the e820 memory went into the Normal zone; the
balloon driver hotplugged 128m (32k pages), but only made 11 pages
available.  Having a memory zone with only 11 pages really screwed
with kswapd, since the zone's memory watermarks were all 0.  That 

Re: maybe revert commit c275a57f5ec3 "xen/balloon: Set balloon's initial state to number of existing RAM pages"

2017-03-23 Thread Juergen Gross
On 23/03/17 03:13, Boris Ostrovsky wrote:
> 
> 
> On 03/22/2017 05:16 PM, Dan Streetman wrote:
>> I have a question about a problem introduced by this commit:
>> c275a57f5ec3056f732843b11659d892235faff7
>> "xen/balloon: Set balloon's initial state to number of existing RAM
>> pages"
>>
>> It changed the xen balloon current_pages calculation to start with the
>> number of physical pages in the system, instead of max_pfn.  Since
>> get_num_physpages() does not include holes, it's always less than the
>> e820 map's max_pfn.
>>
>> However, the problem that commit introduced is, if the hypervisor sets
>> the balloon target to equal to the e820 map's max_pfn, then the
>> balloon target will *always* be higher than the initial current pages.
>> Even if the hypervisor sets the target to (e820 max_pfn - holes), if
>> the OS adds any holes, the balloon target will be higher than the
>> current pages.  This is the situation, for example, for Amazon AWS
>> instances.  The result is, the xen balloon will always immediately
>> hotplug some memory at boot, but then make only (max_pfn -
>> get_num_physpages()) available to the system.
>>
>> This balloon-hotplugged memory can cause problems, if the hypervisor
>> wasn't expecting it; specifically, the system's physical page
>> addresses now will exceed the e820 map's max_pfn, due to the
>> balloon-hotplugged pages; if the hypervisor isn't expecting pt-device
>> DMA to/from those physical pages above the e820 max_pfn, it causes
>> problems.  For example:
>> https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1668129
>>
>> The additional small amount of balloon memory can cause other problems
>> as well, for example:
>> https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1518457
>>
>> Anyway, I'd like to ask, was the original commit added because
>> hypervisors are supposed to set their balloon target to the guest
>> system's number of phys pages (max_pfn - holes)?  The mailing list
>> discussion and commit description seem to indicate that.
> 
> 
> IIRC the problem that this was trying to fix was that since max_pfn
> includes holes, upon booting we'd immediately balloon down by the
> (typically, MMIO) hole size.
> 
> If you boot a guest with ~4+GB memory you should see this.
> 
> 
>> However I'm
>> not sure how that is possible, because the kernel reserves its own
>> holes, regardless of any predefined holes in the e820 map; for
>> example, the kernel reserves 64k (by default) at phys addr 0 (the
>> amount of reservation is configurable via CONFIG_X86_RESERVE_LOW).  So
>> the hypervisor really has no way to know what the "right" target to
>> specify is; unless it knows the exact guest OS and kernel version, and
>> kernel config values, it will never be able to correctly specify its
>> target to be exactly (e820 max_pfn - all holes).
>>
>> Should this commit be reverted?  Should the xen balloon target be
>> adjusted based on kernel-added e820 holes?
> 
> I think the second one but shouldn't current_pages be updated, and not
> the target? The latter is set by Xen (toolstack, via xenstore usually).

Right.

Looking into a HVM domU I can't see any problem related to
CONFIG_X86_RESERVE_LOW: it is set to 64 on my system. The domU is
configured with 2048 MB of RAM, 8MB being video RAM. Looking into
/sys/devices/system/xen_memory/xen_memory0 I can see the current
size and target size do match: both are 2088960 kB (2 GB - 8 MB).
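
The same check can be scripted. This is only a sketch: the attribute
names info/current_kb and target_kb under the directory above are assumed
here rather than taken from this mail, and the helper names are
illustrative:

```python
from pathlib import Path

# Directory named in the mail; the attribute names used below are assumptions.
XEN_MEM = Path("/sys/devices/system/xen_memory/xen_memory0")


def expected_kb(configured_mb, video_mb):
    """Guest RAM in kB once video RAM is subtracted from the configured size."""
    return (configured_mb - video_mb) * 1024


def read_kb(relpath, base=XEN_MEM):
    """Read one integer kB value from a sysfs attribute."""
    return int((base / relpath).read_text().split()[0])


def balloon_delta_kb(current_kb, target_kb):
    """Positive: the balloon driver would add memory; negative: release it."""
    return target_kb - current_kb


if __name__ == "__main__":
    print("expected:", expected_kb(2048, 8), "kB")
    if XEN_MEM.is_dir():  # only present inside a Xen guest
        current = read_kb("info/current_kb")
        target = read_kb("target_kb")
        print("current:", current, "target:", target,
              "delta:", balloon_delta_kb(current, target))
```

A delta of 0 corresponds to the matching values reported above; a
positive delta right after boot would be the situation described for AWS.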

Ballooning down and up to 2048 MB again doesn't change the picture.

So which additional holes are added by the kernel on AWS via which
functions?


Juergen



Re: maybe revert commit c275a57f5ec3 "xen/balloon: Set balloon's initial state to number of existing RAM pages"

2017-03-22 Thread Boris Ostrovsky



On 03/22/2017 05:16 PM, Dan Streetman wrote:

> I have a question about a problem introduced by this commit:
> c275a57f5ec3056f732843b11659d892235faff7
> "xen/balloon: Set balloon's initial state to number of existing RAM pages"
>
> It changed the xen balloon current_pages calculation to start with the
> number of physical pages in the system, instead of max_pfn.  Since
> get_num_physpages() does not include holes, it's always less than the
> e820 map's max_pfn.
>
> However, the problem that commit introduced is, if the hypervisor sets
> the balloon target to equal to the e820 map's max_pfn, then the
> balloon target will *always* be higher than the initial current pages.
> Even if the hypervisor sets the target to (e820 max_pfn - holes), if
> the OS adds any holes, the balloon target will be higher than the
> current pages.  This is the situation, for example, for Amazon AWS
> instances.  The result is, the xen balloon will always immediately
> hotplug some memory at boot, but then make only (max_pfn -
> get_num_physpages()) available to the system.
>
> This balloon-hotplugged memory can cause problems, if the hypervisor
> wasn't expecting it; specifically, the system's physical page
> addresses now will exceed the e820 map's max_pfn, due to the
> balloon-hotplugged pages; if the hypervisor isn't expecting pt-device
> DMA to/from those physical pages above the e820 max_pfn, it causes
> problems.  For example:
> https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1668129
>
> The additional small amount of balloon memory can cause other problems
> as well, for example:
> https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1518457
>
> Anyway, I'd like to ask, was the original commit added because
> hypervisors are supposed to set their balloon target to the guest
> system's number of phys pages (max_pfn - holes)?  The mailing list
> discussion and commit description seem to indicate that.



IIRC the problem that this was trying to fix was that since max_pfn 
includes holes, upon booting we'd immediately balloon down by the 
(typically, MMIO) hole size.


If you boot a guest with ~4+GB memory you should see this.
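
A rough sketch of that arithmetic, with made-up numbers: a guest with
4 GiB of RAM and a 1 GiB MMIO hole, 4 KiB pages assumed. The credit()
helper here just models "target minus current pages"; it is an
illustration, not the driver code:

```python
PAGE_SIZE = 4096  # assumption: 4 KiB pages


def mib_to_pages(mib):
    """Convert MiB to a page count under the assumed page size."""
    return mib * 1024 * 1024 // PAGE_SIZE


def credit(target_pages, current_pages):
    """Positive: balloon up (add pages); negative: balloon down (release pages)."""
    return target_pages - current_pages


# Hypothetical guest: 4 GiB of RAM plus a 1 GiB MMIO hole below 4 GiB,
# so max_pfn covers 5 GiB of address space.
ram_pages = mib_to_pages(4096)
hole_pages = mib_to_pages(1024)
max_pfn = ram_pages + hole_pages

# Before the commit: current_pages starts at max_pfn, which counts the hole.
before = credit(target_pages=ram_pages, current_pages=max_pfn)

# After the commit: current_pages starts at the number of RAM pages.
after = credit(target_pages=ram_pages, current_pages=ram_pages)

# The case raised in this thread: the target is set to max_pfn.
target_at_max_pfn = credit(target_pages=max_pfn, current_pages=ram_pages)

print(before, after, target_at_max_pfn)
```

With these numbers the old calculation balloons down by the hole size at
boot, the new one is balanced when the target equals the RAM page count,
and a target equal to max_pfn asks for a hole's worth of extra pages.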



> However I'm
> not sure how that is possible, because the kernel reserves its own
> holes, regardless of any predefined holes in the e820 map; for
> example, the kernel reserves 64k (by default) at phys addr 0 (the
> amount of reservation is configurable via CONFIG_X86_RESERVE_LOW).  So
> the hypervisor really has no way to know what the "right" target to
> specify is; unless it knows the exact guest OS and kernel version, and
> kernel config values, it will never be able to correctly specify its
> target to be exactly (e820 max_pfn - all holes).
>
> Should this commit be reverted?  Should the xen balloon target be
> adjusted based on kernel-added e820 holes?


I think the second one but shouldn't current_pages be updated, and not 
the target? The latter is set by Xen (toolstack, via xenstore usually).
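
One possible reading of this suggestion, as a toy model rather than kernel code: leave the toolstack's target alone and add the kernel-added holes back into current_pages, so the two values are compared on the same basis. The function name, parameters and page counts below are hypothetical and chosen only for illustration.

```python
def boot_delta(target_pages, physpages, kernel_hole_pages, adjust_current):
    """Pages the balloon would try to add (+) or release (-) at boot.

    adjust_current=True models crediting kernel-added holes to
    current_pages; the target itself is never modified.
    """
    current_pages = physpages
    if adjust_current:
        current_pages += kernel_hole_pages
    return target_pages - current_pages


# Hypothetical guest: 262144 page frames, 256 pages of e820 holes,
# plus 16 pages reserved by the kernel itself.
max_pfn = 262144
e820_hole_pages = 256
kernel_hole_pages = 16

target = max_pfn - e820_hole_pages                        # set by the toolstack
physpages = max_pfn - e820_hole_pages - kernel_hole_pages

unadjusted = boot_delta(target, physpages, kernel_hole_pages, False)
adjusted = boot_delta(target, physpages, kernel_hole_pages, True)
```

In this model the unadjusted driver sees a shortfall equal to the kernel-added holes and would hotplug that many pages, while the adjusted one sees no difference at boot.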


Also, the bugs above (at least one of them) talk about NVMe and I wonder 
whether the memory that they add is of RAM type --- I believe it has its 
own type and so perhaps that introduces additional inconsistencies. AWS 
may have added their own support for that, which we don't have upstream yet.


-boris



> Should something else be
> done?
>
> For context, Amazon Linux has simply disabled Xen ballooning
> completely.  Likewise, we're planning to disable Xen ballooning in the
> Ubuntu kernel for Amazon AWS-specific kernels (but not for non-AWS
> Ubuntu kernels).  However, if reverting this patch makes sense in a
> bigger context (i.e. Xen users besides AWS), that would allow more
> Ubuntu kernels to work correctly in AWS instances.






maybe revert commit c275a57f5ec3 "xen/balloon: Set balloon's initial state to number of existing RAM pages"

2017-03-22 Thread Dan Streetman
I have a question about a problem introduced by this commit:
c275a57f5ec3056f732843b11659d892235faff7
"xen/balloon: Set balloon's initial state to number of existing RAM pages"

It changed the xen balloon current_pages calculation to start with the
number of physical pages in the system, instead of max_pfn.  Since
get_num_physpages() does not include holes, it's always less than the
e820 map's max_pfn.

However, the problem that commit introduced is, if the hypervisor sets
the balloon target equal to the e820 map's max_pfn, then the
balloon target will *always* be higher than the initial current pages.
Even if the hypervisor sets the target to (e820 max_pfn - holes), if
the OS adds any holes, the balloon target will be higher than the
current pages.  This is the situation, for example, for Amazon AWS
instances.  The result is, the xen balloon will always immediately
hotplug some memory at boot, but then make only (max_pfn -
get_num_physpages()) available to the system.
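
To make the arithmetic concrete, here is a toy model (not kernel code) of the initial difference between target and current_pages under the two starting points described above. The 4 KiB page size, the function name and the guest sizes are assumptions for illustration only.

```python
PAGE_SIZE = 4096  # assumed 4 KiB pages


def initial_balloon_delta(target_pages, max_pfn, hole_pages, use_physpages):
    """Pages the balloon would try to add (+) or release (-) at boot.

    use_physpages=True models current_pages starting at
    get_num_physpages() (max_pfn minus holes), as after c275a57f5ec3;
    use_physpages=False models current_pages starting at max_pfn.
    """
    current_pages = max_pfn - hole_pages if use_physpages else max_pfn
    return target_pages - current_pages


# Hypothetical guest: 1 GiB of page frames, 1 MiB of holes.
max_pfn = (1 << 30) // PAGE_SIZE
hole_pages = (1 << 20) // PAGE_SIZE

# Hypervisor sets the target equal to max_pfn.
delta_after_commit = initial_balloon_delta(max_pfn, max_pfn, hole_pages, True)
delta_before_commit = initial_balloon_delta(max_pfn, max_pfn, hole_pages, False)
```

With the target equal to max_pfn, the model gives a positive delta equal to the hole size when starting from get_num_physpages(), which corresponds to the immediate hotplug at boot, and zero when starting from max_pfn.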

This balloon-hotplugged memory can cause problems, if the hypervisor
wasn't expecting it; specifically, the system's physical page
addresses now will exceed the e820 map's max_pfn, due to the
balloon-hotplugged pages; if the hypervisor isn't expecting pt-device
DMA to/from those physical pages above the e820 max_pfn, it causes
problems.  For example:
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1668129

The additional small amount of balloon memory can cause other problems
as well, for example:
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1518457

Anyway, I'd like to ask, was the original commit added because
hypervisors are supposed to set their balloon target to the guest
system's number of phys pages (max_pfn - holes)?  The mailing list
discussion and commit description seem to indicate that.  However I'm
not sure how that is possible, because the kernel reserves its own
holes, regardless of any predefined holes in the e820 map; for
example, the kernel reserves 64k (by default) at phys addr 0 (the
amount of reservation is configurable via CONFIG_X86_RESERVE_LOW).  So
the hypervisor really has no way to know what the "right" target to
specify is; unless it knows the exact guest OS and kernel version, and
kernel config values, it will never be able to correctly specify its
target to be exactly (e820 max_pfn - all holes).
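
As a rough illustration of that point, the sketch below takes this message's premise that kernel-added holes reduce the page count and shows that a target of (e820 max_pfn - e820 holes) still overshoots by whatever the kernel reserves. The 4 KiB page size, the helper names and the guest numbers are assumptions, not values from any real configuration.

```python
PAGE_SIZE_KIB = 4  # assumed 4 KiB pages


def reserved_low_pages(reserve_low_kib=64):
    """Pages covered by a low-memory reservation of the given size in KiB."""
    return reserve_low_kib // PAGE_SIZE_KIB


def target_overshoot(max_pfn, e820_hole_pages, kernel_hole_pages):
    """Target minus current pages when the hypervisor only knows e820 holes."""
    target = max_pfn - e820_hole_pages
    current = max_pfn - e820_hole_pages - kernel_hole_pages
    return target - current


default_reserved = reserved_low_pages()           # 64 KiB default
overshoot = target_overshoot(262144, 256, default_reserved)
```

The overshoot depends only on the kernel-side reservation, which is exactly the quantity the hypervisor cannot see without knowing the guest kernel's configuration.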

Should this commit be reverted?  Should the xen balloon target be
adjusted based on kernel-added e820 holes?  Should something else be
done?

For context, Amazon Linux has simply disabled Xen ballooning
completely.  Likewise, we're planning to disable Xen ballooning in the
Ubuntu kernel for Amazon AWS-specific kernels (but not for non-AWS
Ubuntu kernels).  However, if reverting this patch makes sense in a
bigger context (i.e. Xen users besides AWS), that would allow more
Ubuntu kernels to work correctly in AWS instances.

