Re: [Xen-devel] maybe revert commit c275a57f5ec3 "xen/balloon: Set balloon's initial state to number of existing RAM pages"
On 08/07/17 02:59, Konrad Rzeszutek Wilk wrote:
> On Tue, Mar 28, 2017 at 05:30:24PM +0200, Juergen Gross wrote:
>> On 28/03/17 16:27, Boris Ostrovsky wrote:
>>> On 03/28/2017 04:08 AM, Jan Beulich wrote:
>>>> On 28.03.17 at 03:57, wrote:
>>>>> I think there is indeed a disconnect between target memory (provided by
>>>>> the toolstack) and current memory (i.e actual pages available to the
>>>>> guest).
>>>>>
>>>>> For example
>>>>>
>>>>> [0.00] BIOS-e820: [mem 0x0009e000-0x0009] reserved
>>>>> [0.00] BIOS-e820: [mem 0x000e-0x000f] reserved
>>>>>
>>>>> are missed in target calculation. The hvmloader marks them as RESERVED
>>>>> (in build_e820_table()) but target value is not aware of this action.
>>>>>
>>>>> And then the same problem repeats when kernel removes
>>>>> 0x000a-0x000f chunk.
>>>> But this is all in-guest behavior, i.e. nothing an entity outside the
>>>> guest (tool stack or hypervisor) should need to be aware of. That
>>>> said, there is still room for improvement in the tools I think:
>>>> Regions which architecturally aren't RAM (namely the
>>>> 0xa-0xf range) would probably better not be accounted
>>>> for as RAM as far as ballooning is concerned. In the hypervisor,
>>>> otoh, all memory assigned to the guest (i.e. including such backing
>>>> ROMs) needs to be accounted.
>>>
>>> On the Linux side we should not include in balloon calculations pages
>>> reserved by trim_bios_range(), i.e. (BIOS_END-BIOS_BEGIN) + 1.
>>>
>>> Which leaves hvmloader's special pages (and possibly memory under
>>> 0xA which may get reserved). Can we pass this info to guests via
>>> xenstore?
>>
>> I'd rather keep an internal difference between online pages and E820-map
>> count value in the balloon driver. This should work always.
>
> Did we ever come with a patch for this?

Yes, I've sent V2 recently:

https://lists.xen.org/archives/html/xen-devel/2017-07/msg00530.html

Juergen
Re: [Xen-devel] maybe revert commit c275a57f5ec3 "xen/balloon: Set balloon's initial state to number of existing RAM pages"
On Tue, Mar 28, 2017 at 05:30:24PM +0200, Juergen Gross wrote:
> On 28/03/17 16:27, Boris Ostrovsky wrote:
> > On 03/28/2017 04:08 AM, Jan Beulich wrote:
> >> On 28.03.17 at 03:57, wrote:
> >>> I think there is indeed a disconnect between target memory (provided by
> >>> the toolstack) and current memory (i.e actual pages available to the
> >>> guest).
> >>>
> >>> For example
> >>>
> >>> [0.00] BIOS-e820: [mem 0x0009e000-0x0009] reserved
> >>> [0.00] BIOS-e820: [mem 0x000e-0x000f] reserved
> >>>
> >>> are missed in target calculation. The hvmloader marks them as RESERVED
> >>> (in build_e820_table()) but target value is not aware of this action.
> >>>
> >>> And then the same problem repeats when kernel removes
> >>> 0x000a-0x000f chunk.
> >> But this is all in-guest behavior, i.e. nothing an entity outside the
> >> guest (tool stack or hypervisor) should need to be aware of. That
> >> said, there is still room for improvement in the tools I think:
> >> Regions which architecturally aren't RAM (namely the
> >> 0xa-0xf range) would probably better not be accounted
> >> for as RAM as far as ballooning is concerned. In the hypervisor,
> >> otoh, all memory assigned to the guest (i.e. including such backing
> >> ROMs) needs to be accounted.
> >
> > On the Linux side we should not include in balloon calculations pages
> > reserved by trim_bios_range(), i.e. (BIOS_END-BIOS_BEGIN) + 1.
> >
> > Which leaves hvmloader's special pages (and possibly memory under
> > 0xA which may get reserved). Can we pass this info to guests via
> > xenstore?
>
> I'd rather keep an internal difference between online pages and E820-map
> count value in the balloon driver. This should work always.

Did we ever come with a patch for this?
Re: [Xen-devel] maybe revert commit c275a57f5ec3 "xen/balloon: Set balloon's initial state to number of existing RAM pages"
On 28/03/17 18:32, Boris Ostrovsky wrote:
> On 03/28/2017 11:30 AM, Juergen Gross wrote:
>> On 28/03/17 16:27, Boris Ostrovsky wrote:
>>> On 03/28/2017 04:08 AM, Jan Beulich wrote:
>>>> On 28.03.17 at 03:57, wrote:
>>>>> I think there is indeed a disconnect between target memory (provided by
>>>>> the toolstack) and current memory (i.e actual pages available to the
>>>>> guest).
>>>>>
>>>>> For example
>>>>>
>>>>> [0.00] BIOS-e820: [mem 0x0009e000-0x0009] reserved
>>>>> [0.00] BIOS-e820: [mem 0x000e-0x000f] reserved
>>>>>
>>>>> are missed in target calculation. The hvmloader marks them as RESERVED
>>>>> (in build_e820_table()) but target value is not aware of this action.
>>>>>
>>>>> And then the same problem repeats when kernel removes
>>>>> 0x000a-0x000f chunk.
>>>> But this is all in-guest behavior, i.e. nothing an entity outside the
>>>> guest (tool stack or hypervisor) should need to be aware of. That
>>>> said, there is still room for improvement in the tools I think:
>>>> Regions which architecturally aren't RAM (namely the
>>>> 0xa-0xf range) would probably better not be accounted
>>>> for as RAM as far as ballooning is concerned. In the hypervisor,
>>>> otoh, all memory assigned to the guest (i.e. including such backing
>>>> ROMs) needs to be accounted.
>>> On the Linux side we should not include in balloon calculations pages
>>> reserved by trim_bios_range(), i.e. (BIOS_END-BIOS_BEGIN) + 1.
>>>
>>> Which leaves hvmloader's special pages (and possibly memory under
>>> 0xA which may get reserved). Can we pass this info to guests via
>>> xenstore?
>> I'd rather keep an internal difference between online pages and E820-map
>> count value in the balloon driver. This should work always.
>
> We could indeed base calculation on initial state of e820 and not count
> the holes toward ballooning needs. I am not sure this will work for
> memory unplug though, where a hole can be created in the map and we will
> be supposed to handle disappearing memory via ballooning.
>
> Or am I creating a problem where none exists?

I'm rather sure memory has to be offlined before being deleted from
the E820 map.

Juergen
Re: [Xen-devel] maybe revert commit c275a57f5ec3 "xen/balloon: Set balloon's initial state to number of existing RAM pages"
On 03/28/2017 11:30 AM, Juergen Gross wrote:
> On 28/03/17 16:27, Boris Ostrovsky wrote:
>> On 03/28/2017 04:08 AM, Jan Beulich wrote:
>>> On 28.03.17 at 03:57, wrote:
>>>> I think there is indeed a disconnect between target memory (provided by
>>>> the toolstack) and current memory (i.e actual pages available to the
>>>> guest).
>>>>
>>>> For example
>>>>
>>>> [0.00] BIOS-e820: [mem 0x0009e000-0x0009] reserved
>>>> [0.00] BIOS-e820: [mem 0x000e-0x000f] reserved
>>>>
>>>> are missed in target calculation. The hvmloader marks them as RESERVED
>>>> (in build_e820_table()) but target value is not aware of this action.
>>>>
>>>> And then the same problem repeats when kernel removes
>>>> 0x000a-0x000f chunk.
>>> But this is all in-guest behavior, i.e. nothing an entity outside the
>>> guest (tool stack or hypervisor) should need to be aware of. That
>>> said, there is still room for improvement in the tools I think:
>>> Regions which architecturally aren't RAM (namely the
>>> 0xa-0xf range) would probably better not be accounted
>>> for as RAM as far as ballooning is concerned. In the hypervisor,
>>> otoh, all memory assigned to the guest (i.e. including such backing
>>> ROMs) needs to be accounted.
>> On the Linux side we should not include in balloon calculations pages
>> reserved by trim_bios_range(), i.e. (BIOS_END-BIOS_BEGIN) + 1.
>>
>> Which leaves hvmloader's special pages (and possibly memory under
>> 0xA which may get reserved). Can we pass this info to guests via
>> xenstore?
> I'd rather keep an internal difference between online pages and E820-map
> count value in the balloon driver. This should work always.

We could indeed base calculation on initial state of e820 and not count
the holes toward ballooning needs. I am not sure this will work for
memory unplug though, where a hole can be created in the map and we will
be supposed to handle disappearing memory via ballooning.

Or am I creating a problem where none exists?

-boris
Re: [Xen-devel] maybe revert commit c275a57f5ec3 "xen/balloon: Set balloon's initial state to number of existing RAM pages"
On 28/03/17 16:27, Boris Ostrovsky wrote:
> On 03/28/2017 04:08 AM, Jan Beulich wrote:
>> On 28.03.17 at 03:57, wrote:
>>> I think there is indeed a disconnect between target memory (provided by
>>> the toolstack) and current memory (i.e actual pages available to the guest).
>>>
>>> For example
>>>
>>> [0.00] BIOS-e820: [mem 0x0009e000-0x0009] reserved
>>> [0.00] BIOS-e820: [mem 0x000e-0x000f] reserved
>>>
>>> are missed in target calculation. The hvmloader marks them as RESERVED
>>> (in build_e820_table()) but target value is not aware of this action.
>>>
>>> And then the same problem repeats when kernel removes
>>> 0x000a-0x000f chunk.
>> But this is all in-guest behavior, i.e. nothing an entity outside the
>> guest (tool stack or hypervisor) should need to be aware of. That
>> said, there is still room for improvement in the tools I think:
>> Regions which architecturally aren't RAM (namely the
>> 0xa-0xf range) would probably better not be accounted
>> for as RAM as far as ballooning is concerned. In the hypervisor,
>> otoh, all memory assigned to the guest (i.e. including such backing
>> ROMs) needs to be accounted.
>
> On the Linux side we should not include in balloon calculations pages
> reserved by trim_bios_range(), i.e. (BIOS_END-BIOS_BEGIN) + 1.
>
> Which leaves hvmloader's special pages (and possibly memory under
> 0xA which may get reserved). Can we pass this info to guests via
> xenstore?

I'd rather keep an internal difference between online pages and E820-map
count value in the balloon driver. This should work always.

Juergen
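One possible reading of the "internal difference" idea above, as a toy Python model rather than driver code. The class and method names are hypothetical, and this is not claimed to be what the posted patch does. The page counts are the ones reported elsewhere in this thread (current pages 1fff9d, credit 63); the target of 0x200000 is inferred as their sum. The sketch also assumes the first target the driver sees corresponds to the guest's boot-time allocation.

```python
class BalloonModel:
    """Toy model: the driver remembers how far the toolstack's page count
    is from its own count and applies that offset to every later target."""

    def __init__(self, current_pages):
        self.current_pages = current_pages
        self.target_pages = current_pages
        self.target_diff = None  # unknown until the first target arrives

    def set_target(self, toolstack_target):
        # Record the offset once, on the first target seen (assumption:
        # that first target describes the memory the guest booted with).
        if self.target_diff is None:
            self.target_diff = toolstack_target - self.current_pages
        self.target_pages = toolstack_target - self.target_diff
        return self.credit()

    def credit(self):
        # Positive: pages to balloon in; negative: pages to balloon out.
        return self.target_pages - self.current_pages


model = BalloonModel(current_pages=0x1FFF9D)
first_credit = model.set_target(0x200000)           # boot-time target
later_credit = model.set_target(0x200000 - 0x1000)  # toolstack shrinks guest
print(hex(model.target_diff), first_credit, later_credit)
```

With the offset held inside the driver, the first target produces no credit, so nothing is hotplugged at boot, while a later change of the target is still honoured in full.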
Re: [Xen-devel] maybe revert commit c275a57f5ec3 "xen/balloon: Set balloon's initial state to number of existing RAM pages"
>>> On 28.03.17 at 16:27, wrote:
> Which leaves hvmloader's special pages (and possibly memory under
> 0xA which may get reserved). Can we pass this info to guests via
> xenstore?

I'm perhaps the wrong one to ask regarding xenstore, but for in-guest
communication this seems an at least strange approach to me.

Jan
Re: [Xen-devel] maybe revert commit c275a57f5ec3 "xen/balloon: Set balloon's initial state to number of existing RAM pages"
On 03/28/2017 04:08 AM, Jan Beulich wrote:
> On 28.03.17 at 03:57, wrote:
>> I think there is indeed a disconnect between target memory (provided by
>> the toolstack) and current memory (i.e actual pages available to the guest).
>>
>> For example
>>
>> [0.00] BIOS-e820: [mem 0x0009e000-0x0009] reserved
>> [0.00] BIOS-e820: [mem 0x000e-0x000f] reserved
>>
>> are missed in target calculation. The hvmloader marks them as RESERVED
>> (in build_e820_table()) but target value is not aware of this action.
>>
>> And then the same problem repeats when kernel removes
>> 0x000a-0x000f chunk.
> But this is all in-guest behavior, i.e. nothing an entity outside the
> guest (tool stack or hypervisor) should need to be aware of. That
> said, there is still room for improvement in the tools I think:
> Regions which architecturally aren't RAM (namely the
> 0xa-0xf range) would probably better not be accounted
> for as RAM as far as ballooning is concerned. In the hypervisor,
> otoh, all memory assigned to the guest (i.e. including such backing
> ROMs) needs to be accounted.

On the Linux side we should not include in balloon calculations pages
reserved by trim_bios_range(), i.e. (BIOS_END-BIOS_BEGIN) + 1.

Which leaves hvmloader's special pages (and possibly memory under
0xA which may get reserved). Can we pass this info to guests via
xenstore?

-boris
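To put a number on the trim_bios_range() pages mentioned above, here is a small Python calculation. The values of BIOS_BEGIN and BIOS_END are assumptions recalled from the Linux x86 headers and are not stated in this thread, so check them against your tree. BIOS_END is treated as an exclusive bound here; with an inclusive end address the "+ 1" from the message would apply instead.

```python
PAGE_SIZE = 4096          # x86 base page size
BIOS_BEGIN = 0x000A0000   # assumed value, not taken from this thread
BIOS_END = 0x00100000     # assumed value, treated as an exclusive bound


def pages_in(start, end, page_size=PAGE_SIZE):
    """Number of whole pages in the byte range [start, end)."""
    return (end - start) // page_size


trimmed_pages = pages_in(BIOS_BEGIN, BIOS_END)
print(hex(trimmed_pages))
```

Under those assumptions the result equals the sum of the 0x40-page and 0x20-page holes that Dan Streetman counts between page frames 0xa0 and 0x100 in his message in this thread.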
Re: [Xen-devel] maybe revert commit c275a57f5ec3 "xen/balloon: Set balloon's initial state to number of existing RAM pages"
>>> On 28.03.17 at 03:57, wrote:
> I think there is indeed a disconnect between target memory (provided by
> the toolstack) and current memory (i.e actual pages available to the guest).
>
> For example
>
> [0.00] BIOS-e820: [mem 0x0009e000-0x0009] reserved
> [0.00] BIOS-e820: [mem 0x000e-0x000f] reserved
>
> are missed in target calculation. The hvmloader marks them as RESERVED
> (in build_e820_table()) but target value is not aware of this action.
>
> And then the same problem repeats when kernel removes
> 0x000a-0x000f chunk.

But this is all in-guest behavior, i.e. nothing an entity outside the
guest (tool stack or hypervisor) should need to be aware of. That
said, there is still room for improvement in the tools I think:
Regions which architecturally aren't RAM (namely the
0xa-0xf range) would probably better not be accounted
for as RAM as far as ballooning is concerned. In the hypervisor,
otoh, all memory assigned to the guest (i.e. including such backing
ROMs) needs to be accounted.

Jan
Re: maybe revert commit c275a57f5ec3 "xen/balloon: Set balloon's initial state to number of existing RAM pages"
On 03/27/2017 03:57 PM, Dan Streetman wrote:
> On Fri, Mar 24, 2017 at 9:33 PM, Boris Ostrovsky wrote:
>>> I think we can all agree that the *ideal* situation would be, for the
>>> balloon driver to not immediately hotplug memory so it can add 11 more
>>> pages, so maybe I just need to figure out why the balloon driver
>>> thinks it needs 11 more pages, and fix that.
>>
>> How does the new memory appear in the guest? Via online_pages()?
>> Or is ballooning triggered from watch_target()?
>
> yes, it's triggered from watch_target() which then calls
> online_pages() with the new memory. I added some debug (all numbers
> are in hex):
>
> [0.500080] xen:balloon: Initialising balloon driver
> [0.503027] xen:balloon: balloon_init: current/target pages 1fff9d
> [0.504044] xen_balloon: Initialising balloon driver
> [0.508046] xen_balloon: watch_target: new target 80 kb
> [0.508046] xen:balloon: balloon_set_new_target: target 20
> [0.524024] xen:balloon: current_credit: target pages 20 current pages 1fff9d credit 63
> [0.567055] xen:balloon: balloon_process: current_credit 63
> [0.568005] xen:balloon: reserve_additional_memory: adding memory resource for 8000 pages
> [3.694443] online_pages: pfn 21 nr_pages 8000 type 0
> [3.701072] xen:balloon: current_credit: target pages 20 current pages 1fff9d credit 63
> [3.701074] xen:balloon: balloon_process: current_credit 63
> [3.701075] xen:balloon: increase_reservation: nr_pages 63
> [3.701170] xen:balloon: increase_reservation: done, current_pages 1fffa8
> [3.701172] xen:balloon: current_credit: target pages 20 current pages 1fffa8 credit 58
> [3.701173] xen:balloon: balloon_process: current_credit 58
> [3.701173] xen:balloon: increase_reservation: nr_pages 58
> [3.701180] xen:balloon: increase_reservation: XENMEM_populate_physmap err 0
> [5.708085] xen:balloon: current_credit: target pages 20 current pages 1fffa8 credit 58
> [5.708088] xen:balloon: balloon_process: current_credit 58
> [5.708089] xen:balloon: increase_reservation: nr_pages 58
> [5.708106] xen:balloon: increase_reservation: XENMEM_populate_physmap err 0
> [9.716065] xen:balloon: current_credit: target pages 20 current pages 1fffa8 credit 58
> [9.716068] xen:balloon: balloon_process: current_credit 58
> [9.716069] xen:balloon: increase_reservation: nr_pages 58
> [9.716087] xen:balloon: increase_reservation: XENMEM_populate_physmap err 0
>
> and that continues forever at the max interval (32), since
> max_retry_count is unlimited.
>
> So I think I understand things now; first, the current_pages is set
> properly based on the e820 map:
>
> $ dmesg|grep -i e820
> [0.00] e820: BIOS-provided physical RAM map:
> [0.00] BIOS-e820: [mem 0x-0x0009dfff] usable
> [0.00] BIOS-e820: [mem 0x0009e000-0x0009] reserved
> [0.00] BIOS-e820: [mem 0x000e-0x000f] reserved
> [0.00] BIOS-e820: [mem 0x0010-0xefff] usable
> [0.00] BIOS-e820: [mem 0xfc00-0x] reserved
> [0.00] BIOS-e820: [mem 0x0001-0x00020fff] usable
> [0.00] e820: update [mem 0x-0x0fff] usable ==> reserved
> [0.00] e820: remove [mem 0x000a-0x000f] usable
> [0.00] e820: last_pfn = 0x21 max_arch_pfn = 0x4
> [0.00] e820: last_pfn = 0xf max_arch_pfn = 0x4
> [0.00] e820: [mem 0xf000-0xfbff] available for PCI devices
> [0.528007] e820: reserve RAM buffer [mem 0x0009e000-0x0009]
> ubuntu@ip-172-31-60-112:~$ printf "%x\n" $[ 0x21 - 0x10 + 0xf - 0x100 + 0x9e - 1 ]
> 1fff9d
>
> then, the xen balloon notices its target has been set to 20 by the
> hypervisor. That target does account for the hole at 0xf to
> 0x10, but it doesn't account for the hole at 0xe0 to 0x100 ( 0x20
> pages), nor the hole at 0x9e to 0xa0 ( 2 pages ), nor the unlisted
> hole (that the kernel removes) at 0xa0 to 0xe0 ( 0x40 pages). That's
> 0x62 pages, plus the 1-page hole at addr 0 that the kernel always
> reserves, is 0x63 pages of holes, which aren't accounted for in the
> hypervisor's target.
>
> so the balloon driver hotplugs the memory, and tries to increase its
> reservation to provide the needed pages to get the current_pages up to
> the target.
However, when it calls the hypervisor to populate the physmap, the hypervisor only allows 11 (0xb) pages to be populated; all calls after that get back 0 from the hypervisor. Do you think the hypervisor's balloon target should account for the e820 holes (and for the kernel's added hole at addr 0)? Alternately/additionally, if the hypervisor doesn't want to support ballooning, should it just return error from the call to populate the physmap, and not allow those 11 pages? At this point, it doesn't seem to me like the kernel is doing anything wrong, correct? I think there is indeed a disconnect between target memory (provided by the
Re: maybe revert commit c275a57f5ec3 "xen/balloon: Set balloon's initial state to number of existing RAM pages"
On Fri, Mar 24, 2017 at 9:33 PM, Boris Ostrovsky wrote:
>
>> I think we can all agree that the *ideal* situation would be, for the
>> balloon driver to not immediately hotplug memory so it can add 11 more
>> pages, so maybe I just need to figure out why the balloon driver
>> thinks it needs 11 more pages, and fix that.
>
> How does the new memory appear in the guest? Via online_pages()?
>
> Or is ballooning triggered from watch_target()?

yes, it's triggered from watch_target() which then calls
online_pages() with the new memory. I added some debug (all numbers
are in hex):

[0.500080] xen:balloon: Initialising balloon driver
[0.503027] xen:balloon: balloon_init: current/target pages 1fff9d
[0.504044] xen_balloon: Initialising balloon driver
[0.508046] xen_balloon: watch_target: new target 80 kb
[0.508046] xen:balloon: balloon_set_new_target: target 20
[0.524024] xen:balloon: current_credit: target pages 20 current pages 1fff9d credit 63
[0.567055] xen:balloon: balloon_process: current_credit 63
[0.568005] xen:balloon: reserve_additional_memory: adding memory resource for 8000 pages
[3.694443] online_pages: pfn 21 nr_pages 8000 type 0
[3.701072] xen:balloon: current_credit: target pages 20 current pages 1fff9d credit 63
[3.701074] xen:balloon: balloon_process: current_credit 63
[3.701075] xen:balloon: increase_reservation: nr_pages 63
[3.701170] xen:balloon: increase_reservation: done, current_pages 1fffa8
[3.701172] xen:balloon: current_credit: target pages 20 current pages 1fffa8 credit 58
[3.701173] xen:balloon: balloon_process: current_credit 58
[3.701173] xen:balloon: increase_reservation: nr_pages 58
[3.701180] xen:balloon: increase_reservation: XENMEM_populate_physmap err 0
[5.708085] xen:balloon: current_credit: target pages 20 current pages 1fffa8 credit 58
[5.708088] xen:balloon: balloon_process: current_credit 58
[5.708089] xen:balloon: increase_reservation: nr_pages 58
[5.708106] xen:balloon: increase_reservation: XENMEM_populate_physmap err 0
[9.716065] xen:balloon: current_credit: target pages 20 current pages 1fffa8 credit 58
[9.716068] xen:balloon: balloon_process: current_credit 58
[9.716069] xen:balloon: increase_reservation: nr_pages 58
[9.716087] xen:balloon: increase_reservation: XENMEM_populate_physmap err 0

and that continues forever at the max interval (32), since
max_retry_count is unlimited.

So I think I understand things now; first, the current_pages is set
properly based on the e820 map:

$ dmesg|grep -i e820
[0.00] e820: BIOS-provided physical RAM map:
[0.00] BIOS-e820: [mem 0x-0x0009dfff] usable
[0.00] BIOS-e820: [mem 0x0009e000-0x0009] reserved
[0.00] BIOS-e820: [mem 0x000e-0x000f] reserved
[0.00] BIOS-e820: [mem 0x0010-0xefff] usable
[0.00] BIOS-e820: [mem 0xfc00-0x] reserved
[0.00] BIOS-e820: [mem 0x0001-0x00020fff] usable
[0.00] e820: update [mem 0x-0x0fff] usable ==> reserved
[0.00] e820: remove [mem 0x000a-0x000f] usable
[0.00] e820: last_pfn = 0x21 max_arch_pfn = 0x4
[0.00] e820: last_pfn = 0xf max_arch_pfn = 0x4
[0.00] e820: [mem 0xf000-0xfbff] available for PCI devices
[0.528007] e820: reserve RAM buffer [mem 0x0009e000-0x0009]

ubuntu@ip-172-31-60-112:~$ printf "%x\n" $[ 0x21 - 0x10 + 0xf - 0x100 + 0x9e - 1 ]
1fff9d

then, the xen balloon notices its target has been set to 20 by the
hypervisor. That target does account for the hole at 0xf to 0x10,
but it doesn't account for the hole at 0xe0 to 0x100 ( 0x20 pages),
nor the hole at 0x9e to 0xa0 ( 2 pages ), nor the unlisted hole (that
the kernel removes) at 0xa0 to 0xe0 ( 0x40 pages). That's 0x62 pages,
plus the 1-page hole at addr 0 that the kernel always reserves, is
0x63 pages of holes, which aren't accounted for in the hypervisor's
target.

so the balloon driver hotplugs the memory, and tries to increase its
reservation to provide the needed pages to get the current_pages up to
the target. However, when it calls the hypervisor to populate the
physmap, the hypervisor only allows 11 (0xb) pages to be populated;
all calls after that get back 0 from the hypervisor.

Do you think the hypervisor's balloon target should account for the
e820 holes (and for the kernel's added hole at addr 0)?
Alternately/additionally, if the hypervisor doesn't want to support
ballooning, should it just return error from the call to populate the
physmap, and not allow those 11 pages?

At this point, it doesn't seem to me like the kernel is doing anything
wrong, correct?
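The page accounting in the message above can be replayed in a few lines. This is only a sketch in Python, not kernel code: the PFN ranges, the current_pages value and the 11 granted pages come from the message, while the names hole_pages and retry_delays are made up for illustration. The printf operands as archived (0x21, 0x10, 0xf) appear to have lost trailing zeros, so the values 0x210000, 0x100000 and 0xf0000 used below are an assumption, chosen because they reproduce the printed result 1fff9d. The doubling retry delay is likewise inferred from the log timestamps (roughly 2 s, then 4 s apart) and the stated maximum interval of 32; it is not copied from the driver source.

```python
from itertools import islice

# Holes the target does not account for, as PFN ranges (start, end),
# end exclusive. All four ranges are quoted from the message.
UNACCOUNTED_HOLES = {
    "e820 reserved, 0xe0-0x100": (0xE0, 0x100),
    "e820 reserved, 0x9e-0xa0": (0x9E, 0xA0),
    "removed by the kernel, 0xa0-0xe0": (0xA0, 0xE0),
    "page at address 0 reserved by the kernel": (0x0, 0x1),
}


def hole_pages(holes):
    """Total number of pages covered by the given PFN ranges."""
    return sum(end - start for start, end in holes.values())


def retry_delays(first=2, max_delay=32):
    """Yield retry delays that double until they reach max_delay.

    Assumption: the doubling is inferred from the log timestamps.
    """
    delay = first
    while True:
        yield delay
        delay = min(delay * 2, max_delay)


# Assumed reconstruction of the printf operands (see the note above).
current_pages = 0x210000 - 0x100000 + 0xF0000 - 0x100 + 0x9E - 1

credit = hole_pages(UNACCOUNTED_HOLES)   # pages the driver asks for
implied_target = current_pages + credit  # target implied by the log
granted = 0xB                            # pages the hypervisor allowed

print(f"current_pages  {current_pages:x}")
print(f"credit         {credit:x}")
print(f"implied target {implied_target:x}")
print(f"after grant    {current_pages + granted:x}, credit {credit - granted:x}")
print("retry delays  ", list(islice(retry_delays(), 7)))
```

Under those assumptions the numbers line up with the debug output: a credit of 63, then current_pages 1fffa8 and a remaining credit of 58 once the 11 pages have been granted.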
Re: maybe revert commit c275a57f5ec3 "xen/balloon: Set balloon's initial state to number of existing RAM pages"
> I think we can all agree that the *ideal* situation would be, for the
> balloon driver to not immediately hotplug memory so it can add 11 more
> pages, so maybe I just need to figure out why the balloon driver
> thinks it needs 11 more pages, and fix that.

How does the new memory appear in the guest? Via online_pages()?

Or is ballooning triggered from watch_target()?

-boris
Re: maybe revert commit c275a57f5ec3 "xen/balloon: Set balloon's initial state to number of existing RAM pages"
On Fri, Mar 24, 2017 at 5:10 PM, Konrad Rzeszutek Wilk wrote:
> On Fri, Mar 24, 2017 at 04:34:23PM -0400, Dan Streetman wrote:
>> On Wed, Mar 22, 2017 at 10:13 PM, Boris Ostrovsky wrote:
>> >
>> > On 03/22/2017 05:16 PM, Dan Streetman wrote:
>> >>
>> >> I have a question about a problem introduced by this commit:
>> >> c275a57f5ec3056f732843b11659d892235faff7
>> >> "xen/balloon: Set balloon's initial state to number of existing RAM pages"
>> >>
>> >> It changed the xen balloon current_pages calculation to start with the
>> >> number of physical pages in the system, instead of max_pfn. Since
>> >> get_num_physpages() does not include holes, it's always less than the
>> >> e820 map's max_pfn.
>> >>
>> >> However, the problem that commit introduced is, if the hypervisor sets
>> >> the balloon target to equal to the e820 map's max_pfn, then the
>> >> balloon target will *always* be higher than the initial current pages.
>> >> Even if the hypervisor sets the target to (e820 max_pfn - holes), if
>> >> the OS adds any holes, the balloon target will be higher than the
>> >> current pages. This is the situation, for example, for Amazon AWS
>> >> instances. The result is, the xen balloon will always immediately
>> >> hotplug some memory at boot, but then make only (max_pfn -
>> >> get_num_physpages()) available to the system.
>> >>
>> >> This balloon-hotplugged memory can cause problems, if the hypervisor
>> >> wasn't expecting it; specifically, the system's physical page
>> >> addresses now will exceed the e820 map's max_pfn, due to the
>> >> balloon-hotplugged pages; if the hypervisor isn't expecting pt-device
>> >> DMA to/from those physical pages above the e820 max_pfn, it causes
>> >> problems. For example:
>> >> https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1668129
>> >>
>> >> The additional small amount of balloon memory can cause other problems
>> >> as well, for example:
>> >> https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1518457
>> >>
>> >> Anyway, I'd like to ask, was the original commit added because
>> >> hypervisors are supposed to set their balloon target to the guest
>> >> system's number of phys pages (max_pfn - holes)? The mailing list
>> >> discussion and commit description seem to indicate that.
>> >
>> > IIRC the problem that this was trying to fix was that since max_pfn
>> > includes holes, upon booting we'd immediately balloon down by the
>> > (typically, MMIO) hole size.
>> >
>> > If you boot a guest with ~4+GB memory you should see this.
>> >
>> >> However I'm
>> >> not sure how that is possible, because the kernel reserves its own
>> >> holes, regardless of any predefined holes in the e820 map; for
>> >> example, the kernel reserves 64k (by default) at phys addr 0 (the
>> >> amount of reservation is configurable via CONFIG_X86_RESERVE_LOW). So
>> >> the hypervisor really has no way to know what the "right" target to
>> >> specify is; unless it knows the exact guest OS and kernel version, and
>> >> kernel config values, it will never be able to correctly specify its
>> >> target to be exactly (e820 max_pfn - all holes).
>> >>
>> >> Should this commit be reverted? Should the xen balloon target be
>> >> adjusted based on kernel-added e820 holes?
>> >
>> > I think the second one but shouldn't current_pages be updated, and not the
>> > target? The latter is set by Xen (toolstack, via xenstore usually).
>> >
>> > Also, the bugs above (at least one of them) talk about NVMe and I wonder
>> > whether the memory that they add is of RAM type --- I believe it has its
>> > own type and so perhaps that introduces additional inconsistencies. AWS
>> > may have added their own support for that, which we don't have upstream
>> > yet.
>>
>> The type of memory doesn't have anything to do with it.
>>
>> The problem with NVMe is it's a passthrough device, so the guest talks
>> directly to the NVMe controller and does DMA with it. But the
>> hypervisor does swiotlb translation between the guest physical memory,
>
> Um, the hypervisor does not have SWIOTLB support, only IOMMU support.

heh, well I have no special insight into Amazon's hypervisor, so I
have no idea what underlying memory remapping mechanism it uses :)

>
>> and the host physical memory, so that the NVMe device can correctly
>> DMA to the right memory in the host.
>>
>> However, the hypervisor only has the guest's physical memory up to the
>> max e820 pfn mapped; it didn't expect the balloon driver to hotplug
>> any additional memory above the e820 max pfn, so when the NVMe driver
>> in the guest tries to tell the NVMe controller to DMA to that
>> balloon-hotplugged memory, the hypervisor fails the NVMe request,
>
> But when the memory hotplug happens the hypercalls are done to
> raise the max pfn.

well...all I can say is it rejects DMA above the e820 range. so this
very well may be a hypervisor bug, where it should add the balloon
memory region to whatever does the NVMe passthrough
Re: maybe revert commit c275a57f5ec3 "xen/balloon: Set balloon's initial state to number of existing RAM pages"
On Fri, Mar 24, 2017 at 04:34:23PM -0400, Dan Streetman wrote:
> On Wed, Mar 22, 2017 at 10:13 PM, Boris Ostrovsky wrote:
> >
> > On 03/22/2017 05:16 PM, Dan Streetman wrote:
> >>
> >> I have a question about a problem introduced by this commit:
> >> c275a57f5ec3056f732843b11659d892235faff7
> >> "xen/balloon: Set balloon's initial state to number of existing RAM pages"
> >>
> >> It changed the xen balloon current_pages calculation to start with the
> >> number of physical pages in the system, instead of max_pfn. Since
> >> get_num_physpages() does not include holes, it's always less than the
> >> e820 map's max_pfn.
> >>
> >> However, the problem that commit introduced is, if the hypervisor sets
> >> the balloon target to equal to the e820 map's max_pfn, then the
> >> balloon target will *always* be higher than the initial current pages.
> >> Even if the hypervisor sets the target to (e820 max_pfn - holes), if
> >> the OS adds any holes, the balloon target will be higher than the
> >> current pages. This is the situation, for example, for Amazon AWS
> >> instances. The result is, the xen balloon will always immediately
> >> hotplug some memory at boot, but then make only (max_pfn -
> >> get_num_physpages()) available to the system.
> >>
> >> This balloon-hotplugged memory can cause problems, if the hypervisor
> >> wasn't expecting it; specifically, the system's physical page
> >> addresses now will exceed the e820 map's max_pfn, due to the
> >> balloon-hotplugged pages; if the hypervisor isn't expecting pt-device
> >> DMA to/from those physical pages above the e820 max_pfn, it causes
> >> problems. For example:
> >> https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1668129
> >>
> >> The additional small amount of balloon memory can cause other problems
> >> as well, for example:
> >> https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1518457
> >>
> >> Anyway, I'd like to ask, was the original commit added because
> >> hypervisors are supposed to set their balloon target to the guest
> >> system's number of phys pages (max_pfn - holes)? The mailing list
> >> discussion and commit description seem to indicate that.
> >
> > IIRC the problem that this was trying to fix was that since max_pfn includes
> > holes, upon booting we'd immediately balloon down by the (typically, MMIO)
> > hole size.
> >
> > If you boot a guest with ~4+GB memory you should see this.
> >
> >> However I'm
> >> not sure how that is possible, because the kernel reserves its own
> >> holes, regardless of any predefined holes in the e820 map; for
> >> example, the kernel reserves 64k (by default) at phys addr 0 (the
> >> amount of reservation is configurable via CONFIG_X86_RESERVE_LOW). So
> >> the hypervisor really has no way to know what the "right" target to
> >> specify is; unless it knows the exact guest OS and kernel version, and
> >> kernel config values, it will never be able to correctly specify its
> >> target to be exactly (e820 max_pfn - all holes).
> >>
> >> Should this commit be reverted? Should the xen balloon target be
> >> adjusted based on kernel-added e820 holes?
> >
> > I think the second one but shouldn't current_pages be updated, and not the
> > target? The latter is set by Xen (toolstack, via xenstore usually).
> >
> > Also, the bugs above (at least one of them) talk about NVMe and I wonder
> > whether the memory that they add is of RAM type --- I believe it has its own
> > type and so perhaps that introduces additional inconsistencies. AWS may have
> > added their own support for that, which we don't have upstream yet.
>
> The type of memory doesn't have anything to do with it.
>
> The problem with NVMe is it's a passthrough device, so the guest talks
> directly to the NVMe controller and does DMA with it. But the
> hypervisor does swiotlb translation between the guest physical memory,

Um, the hypervisor does not have SWIOTLB support, only IOMMU support.

> and the host physical memory, so that the NVMe device can correctly
> DMA to the right memory in the host.
>
> However, the hypervisor only has the guest's physical memory up to the
> max e820 pfn mapped; it didn't expect the balloon driver to hotplug
> any additional memory above the e820 max pfn, so when the NVMe driver
> in the guest tries to tell the NVMe controller to DMA to that
> balloon-hotplugged memory, the hypervisor fails the NVMe request,

But when the memory hotplug happens the hypercalls are done to
raise the max pfn.

> because it can't do the guest-to-host phys mem mapping, since the
> guest phys address is outside the expected max range.
>
> >
> > -boris
> >
> >> Should something else be
> >> done?
> >>
> >> For context, Amazon Linux has simply disabled Xen ballooning
> >> completely. Likewise, we're planning to disable Xen ballooning in the
> >> Ubuntu kernel for Amazon AWS-specific kernels (but not for non-AWS
> >> Ubuntu kernels). However, if reverting this patch makes sense in a
Re: maybe revert commit c275a57f5ec3 "xen/balloon: Set balloon's initial state to number of existing RAM pages"
On Wed, Mar 22, 2017 at 10:13 PM, Boris Ostrovsky wrote: > > > On 03/22/2017 05:16 PM, Dan Streetman wrote: >> >> I have a question about a problem introduced by this commit: >> c275a57f5ec3056f732843b11659d892235faff7 >> "xen/balloon: Set balloon's initial state to number of existing RAM pages" >> >> It changed the xen balloon current_pages calculation to start with the >> number of physical pages in the system, instead of max_pfn. Since >> get_num_physpages() does not include holes, it's always less than the >> e820 map's max_pfn. >> >> However, the problem that commit introduced is, if the hypervisor sets >> the balloon target to equal to the e820 map's max_pfn, then the >> balloon target will *always* be higher than the initial current pages. >> Even if the hypervisor sets the target to (e820 max_pfn - holes), if >> the OS adds any holes, the balloon target will be higher than the >> current pages. This is the situation, for example, for Amazon AWS >> instances. The result is, the xen balloon will always immediately >> hotplug some memory at boot, but then make only (max_pfn - >> get_num_physpages()) available to the system. >> >> This balloon-hotplugged memory can cause problems, if the hypervisor >> wasn't expecting it; specifically, the system's physical page >> addresses now will exceed the e820 map's max_pfn, due to the >> balloon-hotplugged pages; if the hypervisor isn't expecting pt-device >> DMA to/from those physical pages above the e820 max_pfn, it causes >> problems. For example: >> https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1668129 >> >> The additional small amount of balloon memory can cause other problems >> as well, for example: >> https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1518457 >> >> Anyway, I'd like to ask, was the original commit added because >> hypervisors are supposed to set their balloon target to the guest >> system's number of phys pages (max_pfn - holes)? 
The mailing list >> discussion and commit description seem to indicate that. > > > > IIRC the problem that this was trying to fix was that since max_pfn includes > holes, upon booting we'd immediately balloon down by the (typically, MMIO) > hole size. > > If you boot a guest with ~4+GB memory you should see this. > > >> However I'm >> not sure how that is possible, because the kernel reserves its own >> holes, regardless of any predefined holes in the e820 map; for >> example, the kernel reserves 64k (by default) at phys addr 0 (the >> amount of reservation is configurable via CONFIG_X86_RESERVE_LOW). So >> the hypervisor really has no way to know what the "right" target to >> specify is; unless it knows the exact guest OS and kernel version, and >> kernel config values, it will never be able to correctly specify its >> target to be exactly (e820 max_pfn - all holes). >> >> Should this commit be reverted? Should the xen balloon target be >> adjusted based on kernel-added e820 holes? > > > I think the second one but shouldn't current_pages be updated, and not the > target? The latter is set by Xen (toolstack, via xenstore usually). > > Also, the bugs above (at least one of them) talk about NVMe and I wonder > whether the memory that they add is of RAM type --- I believe it has its own > type and so perhaps that introduces additional inconsistencies. AWS may have > added their own support for that, which we don't have upstream yet. The type of memory doesn't have anything to do with it. The problem with NVMe is it's a passthrough device, so the guest talks directly to the NVMe controller and does DMA with it. But the hypervisor does swiotlb translation between the guest physical memory, and the host physical memory, so that the NVMe device can correctly DMA to the right memory in the host. 
However, the hypervisor only has the guest's physical memory up to the max e820 pfn mapped; it didn't expect the balloon driver to hotplug any additional memory above the e820 max pfn, so when the NVMe driver in the guest tries to tell the NVMe controller to DMA to that balloon-hotplugged memory, the hypervisor fails the NVMe request, because it can't do the guest-to-host phys mem mapping, since the guest phys address is outside the expected max range. > > -boris > > > >> Should something else be >> done? >> >> For context, Amazon Linux has simply disabled Xen ballooning >> completely. Likewise, we're planning to disable Xen ballooning in the >> Ubuntu kernel for Amazon AWS-specific kernels (but not for non-AWS >> Ubuntu kernels). However, if reverting this patch makes sense in a >> bigger context (i.e. Xen users besides AWS), that would allow more >> Ubuntu kernels to work correctly in AWS instances. >> >
Re: maybe revert commit c275a57f5ec3 "xen/balloon: Set balloon's initial state to number of existing RAM pages"
On Thu, Mar 23, 2017 at 3:56 AM, Juergen Gross wrote:
> On 23/03/17 03:13, Boris Ostrovsky wrote:
>>
>> On 03/22/2017 05:16 PM, Dan Streetman wrote:
>>> I have a question about a problem introduced by this commit:
>>> c275a57f5ec3056f732843b11659d892235faff7
>>> "xen/balloon: Set balloon's initial state to number of existing RAM
>>> pages"
>>>
>>> It changed the xen balloon current_pages calculation to start with the
>>> number of physical pages in the system, instead of max_pfn. Since
>>> get_num_physpages() does not include holes, it's always less than the
>>> e820 map's max_pfn.
>>>
>>> However, the problem that commit introduced is, if the hypervisor sets
>>> the balloon target to equal to the e820 map's max_pfn, then the
>>> balloon target will *always* be higher than the initial current pages.
>>> Even if the hypervisor sets the target to (e820 max_pfn - holes), if
>>> the OS adds any holes, the balloon target will be higher than the
>>> current pages. This is the situation, for example, for Amazon AWS
>>> instances. The result is, the xen balloon will always immediately
>>> hotplug some memory at boot, but then make only (max_pfn -
>>> get_num_physpages()) available to the system.
>>>
>>> This balloon-hotplugged memory can cause problems, if the hypervisor
>>> wasn't expecting it; specifically, the system's physical page
>>> addresses now will exceed the e820 map's max_pfn, due to the
>>> balloon-hotplugged pages; if the hypervisor isn't expecting pt-device
>>> DMA to/from those physical pages above the e820 max_pfn, it causes
>>> problems. For example:
>>> https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1668129
>>>
>>> The additional small amount of balloon memory can cause other problems
>>> as well, for example:
>>> https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1518457
>>>
>>> Anyway, I'd like to ask, was the original commit added because
>>> hypervisors are supposed to set their balloon target to the guest
>>> system's number of phys pages (max_pfn - holes)? The mailing list
>>> discussion and commit description seem to indicate that.
>>
>> IIRC the problem that this was trying to fix was that since max_pfn
>> includes holes, upon booting we'd immediately balloon down by the
>> (typically, MMIO) hole size.
>>
>> If you boot a guest with ~4+GB memory you should see this.
>>
>>> However I'm
>>> not sure how that is possible, because the kernel reserves its own
>>> holes, regardless of any predefined holes in the e820 map; for
>>> example, the kernel reserves 64k (by default) at phys addr 0 (the
>>> amount of reservation is configurable via CONFIG_X86_RESERVE_LOW). So
>>> the hypervisor really has no way to know what the "right" target to
>>> specify is; unless it knows the exact guest OS and kernel version, and
>>> kernel config values, it will never be able to correctly specify its
>>> target to be exactly (e820 max_pfn - all holes).
>>>
>>> Should this commit be reverted? Should the xen balloon target be
>>> adjusted based on kernel-added e820 holes?
>>
>> I think the second one but shouldn't current_pages be updated, and not
>> the target? The latter is set by Xen (toolstack, via xenstore usually).
>
> Right.
>
> Looking into a HVM domU I can't see any problem related to
> CONFIG_X86_RESERVE_LOW: it is set to 64 on my system. The domU is

Sorry I brought that up; I was only giving an example. It's not
directly relevant to this and may have distracted from the actual
problem; in fact, on closer inspection, X86_RESERVE_LOW uses
memblock_reserve(), which removes the range from managed memory but not
from the e820 map (and thus doesn't remove it from
get_num_physpages()). Only phys page 0 is actually reserved in the e820
map.

> configured with 2048 MB of RAM, 8MB being video RAM. Looking into
> /sys/devices/system/xen_memory/xen_memory0 I can see the current
> size and target size do match: both are 2088960 kB (2 GB - 8 MB).
>
> Ballooning down and up to 2048 MB again doesn't change the picture.
>
> So which additional holes are added by the kernel on AWS via which
> functions?

I'll use two AWS types as examples, t2.micro (1G mem) and t2.large (8G
mem).

In the micro, the results of ballooning are obvious, because the
hotplugged memory always goes into the Normal zone; but since the base
memory is only 1G, it's contained entirely in the DMA32/DMA zones. So
we get:

$ grep -E '(start_pfn|present|spanned|managed)' /proc/zoneinfo
        spanned  4095
        present  3997
        managed  3976
  start_pfn:         1
        spanned  258048
        present  258048
        managed  249606
  start_pfn:         4096
        spanned  32768
        present  32768
        managed  11
  start_pfn:         262144

As you can see, none of the e820 memory went into the Normal zone; the
balloon driver hotplugged 128M (32k pages), but only made 11 pages
available. Having a memory zone with only 11 pages really screwed with
kswapd, since the zone's memory watermarks were all 0. That
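
For anyone wanting to sanity-check the zone figures quoted above, here is a minimal Python sketch. It is an illustration only: the zone names (DMA, DMA32, Normal) are inferred from the surrounding text rather than printed by the grep command, and the 4 KiB page size is an assumption that holds for x86.

```python
PAGE_SIZE = 4096  # assumption: 4 KiB pages (x86)

# Figures copied from the quoted grep output. The zone names are inferred
# from the message text; the command itself does not print them.
zones = {
    "DMA":    {"spanned": 4095,   "present": 3997,   "managed": 3976,   "start_pfn": 1},
    "DMA32":  {"spanned": 258048, "present": 258048, "managed": 249606, "start_pfn": 4096},
    "Normal": {"spanned": 32768,  "present": 32768,  "managed": 11,     "start_pfn": 262144},
}


def mib(pages):
    """Convert a page count to MiB."""
    return pages * PAGE_SIZE / (1024 * 1024)


for name, z in zones.items():
    print(f"{name}: present {mib(z['present']):.1f} MiB, managed {z['managed']} pages")

# The Normal zone starts exactly where DMA32 ends, at pfn 262144 (the 1 GiB
# boundary), so everything in it lies above the guest's base memory.
dma32_end = zones["DMA32"]["start_pfn"] + zones["DMA32"]["spanned"]
print(dma32_end == zones["Normal"]["start_pfn"])  # True
print(mib(zones["Normal"]["present"]))            # 128.0
```

The last two lines match the message: 32768 hotplugged pages are 128 MiB, and they sit entirely above pfn 262144.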
Re: maybe revert commit c275a57f5ec3 "xen/balloon: Set balloon's initial state to number of existing RAM pages"
On 23/03/17 03:13, Boris Ostrovsky wrote: > > > On 03/22/2017 05:16 PM, Dan Streetman wrote: >> I have a question about a problem introduced by this commit: >> c275a57f5ec3056f732843b11659d892235faff7 >> "xen/balloon: Set balloon's initial state to number of existing RAM >> pages" >> >> It changed the xen balloon current_pages calculation to start with the >> number of physical pages in the system, instead of max_pfn. Since >> get_num_physpages() does not include holes, it's always less than the >> e820 map's max_pfn. >> >> However, the problem that commit introduced is, if the hypervisor sets >> the balloon target to equal to the e820 map's max_pfn, then the >> balloon target will *always* be higher than the initial current pages. >> Even if the hypervisor sets the target to (e820 max_pfn - holes), if >> the OS adds any holes, the balloon target will be higher than the >> current pages. This is the situation, for example, for Amazon AWS >> instances. The result is, the xen balloon will always immediately >> hotplug some memory at boot, but then make only (max_pfn - >> get_num_physpages()) available to the system. >> >> This balloon-hotplugged memory can cause problems, if the hypervisor >> wasn't expecting it; specifically, the system's physical page >> addresses now will exceed the e820 map's max_pfn, due to the >> balloon-hotplugged pages; if the hypervisor isn't expecting pt-device >> DMA to/from those physical pages above the e820 max_pfn, it causes >> problems. For example: >> https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1668129 >> >> The additional small amount of balloon memory can cause other problems >> as well, for example: >> https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1518457 >> >> Anyway, I'd like to ask, was the original commit added because >> hypervisors are supposed to set their balloon target to the guest >> system's number of phys pages (max_pfn - holes)? 
The mailing list >> discussion and commit description seem to indicate that. > > > IIRC the problem that this was trying to fix was that since max_pfn > includes holes, upon booting we'd immediately balloon down by the > (typically, MMIO) hole size. > > If you boot a guest with ~4+GB memory you should see this. > > >> However I'm >> not sure how that is possible, because the kernel reserves its own >> holes, regardless of any predefined holes in the e820 map; for >> example, the kernel reserves 64k (by default) at phys addr 0 (the >> amount of reservation is configurable via CONFIG_X86_RESERVE_LOW). So >> the hypervisor really has no way to know what the "right" target to >> specify is; unless it knows the exact guest OS and kernel version, and >> kernel config values, it will never be able to correctly specify its >> target to be exactly (e820 max_pfn - all holes). >> >> Should this commit be reverted? Should the xen balloon target be >> adjusted based on kernel-added e820 holes? > > I think the second one but shouldn't current_pages be updated, and not > the target? The latter is set by Xen (toolstack, via xenstore usually). Right. Looking into a HVM domU I can't see any problem related to CONFIG_X86_RESERVE_LOW: it is set to 64 on my system. The domU is configured with 2048 MB of RAM, 8MB being video RAM. Looking into /sys/devices/system/xen_memory/xen_memory0 I can see the current size and target size do match: both are 2088960 kB (2 GB - 8 MB). Ballooning down and up to 2048 MB again doesn't change the picture. So which additional holes are added by the kernel on AWS via which functions? Juergen
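
The 2088960 kB figure quoted above can be checked with a couple of lines of arithmetic. This is only a sketch of the calculation; the variable names are made up for illustration and do not correspond to anything in the balloon driver.

```python
KIB_PER_MIB = 1024

configured_mib = 2048  # domU memory from the message
video_mib = 8          # video RAM from the message

# Memory left for the guest once video RAM is subtracted, in kB.
expected_kib = (configured_mib - video_mib) * KIB_PER_MIB
print(expected_kib)  # 2088960
```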
Re: maybe revert commit c275a57f5ec3 "xen/balloon: Set balloon's initial state to number of existing RAM pages"
On 03/22/2017 05:16 PM, Dan Streetman wrote:
> I have a question about a problem introduced by this commit:
> c275a57f5ec3056f732843b11659d892235faff7
> "xen/balloon: Set balloon's initial state to number of existing RAM pages"
>
> It changed the xen balloon current_pages calculation to start with the
> number of physical pages in the system, instead of max_pfn. Since
> get_num_physpages() does not include holes, it's always less than the
> e820 map's max_pfn.
>
> However, the problem that commit introduced is, if the hypervisor sets
> the balloon target to equal to the e820 map's max_pfn, then the
> balloon target will *always* be higher than the initial current pages.
> Even if the hypervisor sets the target to (e820 max_pfn - holes), if
> the OS adds any holes, the balloon target will be higher than the
> current pages. This is the situation, for example, for Amazon AWS
> instances. The result is, the xen balloon will always immediately
> hotplug some memory at boot, but then make only (max_pfn -
> get_num_physpages()) available to the system.
>
> This balloon-hotplugged memory can cause problems, if the hypervisor
> wasn't expecting it; specifically, the system's physical page
> addresses now will exceed the e820 map's max_pfn, due to the
> balloon-hotplugged pages; if the hypervisor isn't expecting pt-device
> DMA to/from those physical pages above the e820 max_pfn, it causes
> problems. For example:
> https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1668129
>
> The additional small amount of balloon memory can cause other problems
> as well, for example:
> https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1518457
>
> Anyway, I'd like to ask, was the original commit added because
> hypervisors are supposed to set their balloon target to the guest
> system's number of phys pages (max_pfn - holes)? The mailing list
> discussion and commit description seem to indicate that.

IIRC the problem that this was trying to fix was that since max_pfn includes
holes, upon booting we'd immediately balloon down by the (typically, MMIO)
hole size.

If you boot a guest with ~4+GB memory you should see this.

> However I'm
> not sure how that is possible, because the kernel reserves its own
> holes, regardless of any predefined holes in the e820 map; for
> example, the kernel reserves 64k (by default) at phys addr 0 (the
> amount of reservation is configurable via CONFIG_X86_RESERVE_LOW). So
> the hypervisor really has no way to know what the "right" target to
> specify is; unless it knows the exact guest OS and kernel version, and
> kernel config values, it will never be able to correctly specify its
> target to be exactly (e820 max_pfn - all holes).
>
> Should this commit be reverted? Should the xen balloon target be
> adjusted based on kernel-added e820 holes?

I think the second one but shouldn't current_pages be updated, and not the
target? The latter is set by Xen (toolstack, via xenstore usually).

Also, the bugs above (at least one of them) talk about NVMe and I wonder
whether the memory that they add is of RAM type --- I believe it has its own
type and so perhaps that introduces additional inconsistencies. AWS may have
added their own support for that, which we don't have upstream yet.

-boris

> Should something else be
> done?
>
> For context, Amazon Linux has simply disabled Xen ballooning
> completely. Likewise, we're planning to disable Xen ballooning in the
> Ubuntu kernel for Amazon AWS-specific kernels (but not for non-AWS
> Ubuntu kernels). However, if reverting this patch makes sense in a
> bigger context (i.e. Xen users besides AWS), that would allow more
> Ubuntu kernels to work correctly in AWS instances.
maybe revert commit c275a57f5ec3 "xen/balloon: Set balloon's initial state to number of existing RAM pages"
I have a question about a problem introduced by this commit:

c275a57f5ec3056f732843b11659d892235faff7
"xen/balloon: Set balloon's initial state to number of existing RAM pages"

It changed the xen balloon current_pages calculation to start with the number of physical pages in the system, instead of max_pfn. Since get_num_physpages() does not include holes, it's always less than the e820 map's max_pfn.

However, the problem that commit introduced is that if the hypervisor sets the balloon target equal to the e820 map's max_pfn, then the balloon target will *always* be higher than the initial current pages. Even if the hypervisor sets the target to (e820 max_pfn - holes), if the OS adds any holes, the balloon target will be higher than the current pages. This is the situation, for example, for Amazon AWS instances. The result is that the xen balloon will always immediately hotplug some memory at boot, but then make only (max_pfn - get_num_physpages()) available to the system.

This balloon-hotplugged memory can cause problems if the hypervisor wasn't expecting it. Specifically, the system's physical page addresses will now exceed the e820 map's max_pfn, due to the balloon-hotplugged pages; if the hypervisor isn't expecting pt-device DMA to/from those physical pages above the e820 max_pfn, it causes problems. For example:
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1668129

The additional small amount of balloon memory can cause other problems as well, for example:
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1518457

Anyway, I'd like to ask: was the original commit added because hypervisors are supposed to set their balloon target to the guest system's number of phys pages (max_pfn - holes)? The mailing list discussion and commit description seem to indicate that.

However, I'm not sure how that is possible, because the kernel reserves its own holes, regardless of any predefined holes in the e820 map; for example, the kernel reserves 64k (by default) at phys addr 0 (the amount of reservation is configurable via CONFIG_X86_RESERVE_LOW). So the hypervisor really has no way to know what the "right" target to specify is; unless it knows the exact guest OS and kernel version, and kernel config values, it will never be able to correctly specify its target to be exactly (e820 max_pfn - all holes).

Should this commit be reverted? Should the xen balloon target be adjusted based on kernel-added e820 holes? Should something else be done?

For context, Amazon Linux has simply disabled Xen ballooning completely. Likewise, we're planning to disable Xen ballooning in the Ubuntu kernel for Amazon AWS-specific kernels (but not for non-AWS Ubuntu kernels). However, if reverting this patch makes sense in a bigger context (i.e. Xen users besides AWS), that would allow more Ubuntu kernels to work correctly in AWS instances.