Re: [RFC PATCH] xen/x86: Increase xen_e820_map to E820_X_MAX possible entries

2016-11-15 Thread Juergen Gross
On 15/11/16 16:22, Alex Thorlton wrote:
> On Tue, Nov 15, 2016 at 10:55:49AM +0100, Juergen Gross wrote:
>> I'd go with the new error code. What about E2BIG or ENOSPC?
>>
>> I think the hypervisor should fill in the number of entries required
>> in this case.
>>
>> In case nobody objects I can post patches for this purpose (both Xen
>> and Linux).
> 
> This sounds like a good solution to me.  I think it's definitely more
> appropriate than simply bumping up the size of xen_e820_map, especially
> considering the fact that it's theoretically possible for the e820 map
> generated by the hypercall to grow too large, even on a non-EFI machine,
> where my change would have no effect.

Well, it won't help with the current hypervisor, so bumping up the size
of xen_e820_map will still be a good idea. I think using E820_X_MAX is
okay since in the end xen_e820_map will be transferred into a struct
e820map which can't hold more than E820_X_MAX entries (additional
entries are ignored here, so this won't let the boot fail).


Juergen


Re: [RFC PATCH] xen/x86: Increase xen_e820_map to E820_X_MAX possible entries

2016-11-15 Thread Juergen Gross
On 15/11/16 16:22, Alex Thorlton wrote:
> On Tue, Nov 15, 2016 at 10:55:49AM +0100, Juergen Gross wrote:
>> I'd go with the new error code. What about E2BIG or ENOSPC?
>>
>> I think the hypervisor should fill in the number of entries required
>> in this case.
>>
>> In case nobody objects I can post patches for this purpose (both Xen
>> and Linux).
> 
> This sounds like a good solution to me.  I think it's definitely more
> appropriate than simply bumping up the size of xen_e820_map, especially
> considering the fact that it's theoretically possible for the e820 map
> generated by the hypercall to grow too large, even on a non-EFI machine,
> where my change would have no effect.

Well, it won't help with the current hypervisor, so bumping up the size
of xen_e820_map will still be a good idea. I think using E820_X_MAX is
okay since in the end xen_e820_map will be transferred into a struct
e820map which can't hold more than E820_X_MAX entries (additional
entries are ignored here, so this won't let the boot fail).


Juergen


Re: [RFC PATCH] xen/x86: Increase xen_e820_map to E820_X_MAX possible entries

2016-11-15 Thread Alex Thorlton
On Tue, Nov 15, 2016 at 10:55:49AM +0100, Juergen Gross wrote:
> I'd go with the new error code. What about E2BIG or ENOSPC?
> 
> I think the hypervisor should fill in the number of entries required
> in this case.
> 
> In case nobody objects I can post patches for this purpose (both Xen
> and Linux).

This sounds like a good solution to me.  I think it's definitely more
appropriate than simply bumping up the size of xen_e820_map, especially
considering the fact that it's theoretically possible for the e820 map
generated by the hypercall to grow too large, even on a non-EFI machine,
where my change would have no effect.

Thanks for your input!

- Alex


Re: [RFC PATCH] xen/x86: Increase xen_e820_map to E820_X_MAX possible entries

2016-11-15 Thread Alex Thorlton
On Tue, Nov 15, 2016 at 10:55:49AM +0100, Juergen Gross wrote:
> I'd go with the new error code. What about E2BIG or ENOSPC?
> 
> I think the hypervisor should fill in the number of entries required
> in this case.
> 
> In case nobody objects I can post patches for this purpose (both Xen
> and Linux).

This sounds like a good solution to me.  I think it's definitely more
appropriate than simply bumping up the size of xen_e820_map, especially
considering the fact that it's theoretically possible for the e820 map
generated by the hypercall to grow too large, even on a non-EFI machine,
where my change would have no effect.

Thanks for your input!

- Alex


Re: [RFC PATCH] xen/x86: Increase xen_e820_map to E820_X_MAX possible entries

2016-11-15 Thread Jan Beulich
>>> On 15.11.16 at 12:07,  wrote:
> On 15/11/16 11:44, Jan Beulich wrote:
> On 15.11.16 at 10:55,  wrote:
>>> On 15/11/16 10:45, Jan Beulich wrote:
>>> On 15.11.16 at 09:42,  wrote:
> For a fully dynamical solution we'd need a way to get a partial
> E820 map from the hypervisor (e.g. first 128 entries) in order to
> be able to setup at least some memory and later get the rest of
> the memory map using some dynamically allocated memory.

 And we could of course also make the hypercall allow for that (e.g.
 by defining the semantics of a specific error code, so far not used
 by it, to avoid mis-interpretation of output on older hypervisors),
 or introduce a new clone of the existing one(s).
>>>
>>> I'd go with the new error code. What about E2BIG or ENOSPC?
>> 
>> Either seems fine.
>> 
>>> I think the hypervisor should fill in the number of entries required
>>> in this case.
>> 
>> And you'd then mean the caller to imply that the passed in (and now
>> overwritten) count to describe how many entries got filled? That's
>> not that nice an interface. I'd rather return the number of entries
>> filled, and require the sizing variant (NULL handle) to be used to
>> obtain the number of entries. After all if a caller _wants_ to handle
>> a partial map, it doesn't really care how many further entries there
>> are.
> 
> I just wanted to avoid two interface changes.

Just make one at a time.

Jan



Re: [RFC PATCH] xen/x86: Increase xen_e820_map to E820_X_MAX possible entries

2016-11-15 Thread Jan Beulich
>>> On 15.11.16 at 12:07,  wrote:
> On 15/11/16 11:44, Jan Beulich wrote:
> On 15.11.16 at 10:55,  wrote:
>>> On 15/11/16 10:45, Jan Beulich wrote:
>>> On 15.11.16 at 09:42,  wrote:
> For a fully dynamical solution we'd need a way to get a partial
> E820 map from the hypervisor (e.g. first 128 entries) in order to
> be able to setup at least some memory and later get the rest of
> the memory map using some dynamically allocated memory.

 And we could of course also make the hypercall allow for that (e.g.
 by defining the semantics of a specific error code, so far not used
 by it, to avoid mis-interpretation of output on older hypervisors),
 or introduce a new clone of the existing one(s).
>>>
>>> I'd go with the new error code. What about E2BIG or ENOSPC?
>> 
>> Either seems fine.
>> 
>>> I think the hypervisor should fill in the number of entries required
>>> in this case.
>> 
>> And you'd then mean the caller to imply that the passed in (and now
>> overwritten) count to describe how many entries got filled? That's
>> not that nice an interface. I'd rather return the number of entries
>> filled, and require the sizing variant (NULL handle) to be used to
>> obtain the number of entries. After all if a caller _wants_ to handle
>> a partial map, it doesn't really care how many further entries there
>> are.
> 
> I just wanted to avoid two interface changes.

Just make one at a time.

Jan



Re: [RFC PATCH] xen/x86: Increase xen_e820_map to E820_X_MAX possible entries

2016-11-15 Thread Juergen Gross
On 15/11/16 11:44, Jan Beulich wrote:
 On 15.11.16 at 10:55,  wrote:
>> On 15/11/16 10:45, Jan Beulich wrote:
>> On 15.11.16 at 09:42,  wrote:
 For a fully dynamical solution we'd need a way to get a partial
 E820 map from the hypervisor (e.g. first 128 entries) in order to
 be able to setup at least some memory and later get the rest of
 the memory map using some dynamically allocated memory.
>>>
>>> And we could of course also make the hypercall allow for that (e.g.
>>> by defining the semantics of a specific error code, so far not used
>>> by it, to avoid mis-interpretation of output on older hypervisors),
>>> or introduce a new clone of the existing one(s).
>>
>> I'd go with the new error code. What about E2BIG or ENOSPC?
> 
> Either seems fine.
> 
>> I think the hypervisor should fill in the number of entries required
>> in this case.
> 
> And you'd then mean the caller to imply that the passed in (and now
> overwritten) count to describe how many entries got filled? That's
> not that nice an interface. I'd rather return the number of entries
> filled, and require the sizing variant (NULL handle) to be used to
> obtain the number of entries. After all if a caller _wants_ to handle
> a partial map, it doesn't really care how many further entries there
> are.

I just wanted to avoid two interface changes. IMO the caller knows the
buffer size he supplied and it is rather clear that e.g. E2BIG means
it has been filled completely.

And the assumption the caller doesn't care for the further entries is
not necessarily true: he might not care at the moment of the call, but
he will eventually care for them later when he is able to allocate an
appropriate sized buffer.

OTOH I'm not feeling strong in this case and you are the hypervisor
maintainer. Either solution is fine for me.


Juergen


Re: [RFC PATCH] xen/x86: Increase xen_e820_map to E820_X_MAX possible entries

2016-11-15 Thread Juergen Gross
On 15/11/16 11:44, Jan Beulich wrote:
 On 15.11.16 at 10:55,  wrote:
>> On 15/11/16 10:45, Jan Beulich wrote:
>> On 15.11.16 at 09:42,  wrote:
 For a fully dynamical solution we'd need a way to get a partial
 E820 map from the hypervisor (e.g. first 128 entries) in order to
 be able to setup at least some memory and later get the rest of
 the memory map using some dynamically allocated memory.
>>>
>>> And we could of course also make the hypercall allow for that (e.g.
>>> by defining the semantics of a specific error code, so far not used
>>> by it, to avoid mis-interpretation of output on older hypervisors),
>>> or introduce a new clone of the existing one(s).
>>
>> I'd go with the new error code. What about E2BIG or ENOSPC?
> 
> Either seems fine.
> 
>> I think the hypervisor should fill in the number of entries required
>> in this case.
> 
> And you'd then mean the caller to imply that the passed in (and now
> overwritten) count to describe how many entries got filled? That's
> not that nice an interface. I'd rather return the number of entries
> filled, and require the sizing variant (NULL handle) to be used to
> obtain the number of entries. After all if a caller _wants_ to handle
> a partial map, it doesn't really care how many further entries there
> are.

I just wanted to avoid two interface changes. IMO the caller knows the
buffer size he supplied and it is rather clear that e.g. E2BIG means
it has been filled completely.

And the assumption the caller doesn't care for the further entries is
not necessarily true: he might not care at the moment of the call, but
he will eventually care for them later when he is able to allocate an
appropriate sized buffer.

OTOH I'm not feeling strong in this case and you are the hypervisor
maintainer. Either solution is fine for me.


Juergen


Re: [RFC PATCH] xen/x86: Increase xen_e820_map to E820_X_MAX possible entries

2016-11-15 Thread Jan Beulich
>>> On 15.11.16 at 10:55,  wrote:
> On 15/11/16 10:45, Jan Beulich wrote:
> On 15.11.16 at 09:42,  wrote:
>>> For a fully dynamical solution we'd need a way to get a partial
>>> E820 map from the hypervisor (e.g. first 128 entries) in order to
>>> be able to setup at least some memory and later get the rest of
>>> the memory map using some dynamically allocated memory.
>> 
>> And we could of course also make the hypercall allow for that (e.g.
>> by defining the semantics of a specific error code, so far not used
>> by it, to avoid mis-interpretation of output on older hypervisors),
>> or introduce a new clone of the existing one(s).
> 
> I'd go with the new error code. What about E2BIG or ENOSPC?

Either seems fine.

> I think the hypervisor should fill in the number of entries required
> in this case.

And you'd then mean the caller to imply that the passed in (and now
overwritten) count to describe how many entries got filled? That's
not that nice an interface. I'd rather return the number of entries
filled, and require the sizing variant (NULL handle) to be used to
obtain the number of entries. After all if a caller _wants_ to handle
a partial map, it doesn't really care how many further entries there
are.

Jan



Re: [RFC PATCH] xen/x86: Increase xen_e820_map to E820_X_MAX possible entries

2016-11-15 Thread Jan Beulich
>>> On 15.11.16 at 10:55,  wrote:
> On 15/11/16 10:45, Jan Beulich wrote:
> On 15.11.16 at 09:42,  wrote:
>>> For a fully dynamical solution we'd need a way to get a partial
>>> E820 map from the hypervisor (e.g. first 128 entries) in order to
>>> be able to setup at least some memory and later get the rest of
>>> the memory map using some dynamically allocated memory.
>> 
>> And we could of course also make the hypercall allow for that (e.g.
>> by defining the semantics of a specific error code, so far not used
>> by it, to avoid mis-interpretation of output on older hypervisors),
>> or introduce a new clone of the existing one(s).
> 
> I'd go with the new error code. What about E2BIG or ENOSPC?

Either seems fine.

> I think the hypervisor should fill in the number of entries required
> in this case.

And you'd then mean the caller to imply that the passed in (and now
overwritten) count to describe how many entries got filled? That's
not that nice an interface. I'd rather return the number of entries
filled, and require the sizing variant (NULL handle) to be used to
obtain the number of entries. After all if a caller _wants_ to handle
a partial map, it doesn't really care how many further entries there
are.

Jan



Re: [RFC PATCH] xen/x86: Increase xen_e820_map to E820_X_MAX possible entries

2016-11-15 Thread Juergen Gross
On 15/11/16 10:45, Jan Beulich wrote:
 On 15.11.16 at 09:42,  wrote:
>> On 15/11/16 09:01, Jan Beulich wrote:
>> On 15.11.16 at 08:36,  wrote:
 On 15/11/16 08:15, Jan Beulich wrote:
 On 15.11.16 at 07:33,  wrote:
>> In case I'm right the Xen hypervisor should be prepared for a larger
>> e820 map, but this won't help alone as there would still be additional
>> entries for the IOAPICs created.
>>
>> So I think we need something like:
>>
>> #define E820_XEN_MAX (E820_X_MAX + MAX_IO_APICS)
>>
>> and use this for sizing xen_e820_map[].
>
> I would say that if any change gets done here, there shouldn't be
> any static upper limit at all. That could even be viewed as in line
> with recent e820.c changes moving to dynamic allocations. In
> particular I don't see why MAX_IO_APICS would need adding in
> here, but not other (current and future) factors determining the
> (pseudo) E820 map Xen presents to Dom0.

 The hypervisor interface of XENMEM_machine_memory_map requires to
 specify the size of the e820 map where the hypervisor can supply
 entries. The alternative would be try and error: start with a small
 e820 map and in case of error increasing the size until success. I
 don't like this approach. Especially as dynamic memory allocations
 are not possible at this stage (using RESERVE_BRK() isn't any better
 than a static __initdata array IMO).
>>>
>>> Well, I think as a first step we should extend
>>> XENMEM_{,machine_}memory_map to the model used elsewhere
>>> where passing in a NULL handle (and perhaps count being zero) is
>>> a request for the number of needed entries. Granted this doesn't
>>> help with Linux'es way of consuming the output, but it at least
>>> allows for dynamic sizing. And Linux would then be free to prepare
>>> a static array or have a RESERVE_BRK() as it sees fit.
>>
>> This still needs the definition of an upper limit, as RESERVE_BRK()
>> is a compile time feature.
> 
> That's why I did say "as it sees fit".
> 
>> For a fully dynamical solution we'd need a way to get a partial
>> E820 map from the hypervisor (e.g. first 128 entries) in order to
>> be able to setup at least some memory and later get the rest of
>> the memory map using some dynamically allocated memory.
> 
> And we could of course also make the hypercall allow for that (e.g.
> by defining the semantics of a specific error code, so far not used
> by it, to avoid mis-interpretation of output on older hypervisors),
> or introduce a new clone of the existing one(s).

I'd go with the new error code. What about E2BIG or ENOSPC?

I think the hypervisor should fill in the number of entries required
in this case.

In case nobody objects I can post patches for this purpose (both Xen
and Linux).


Juergen


Re: [RFC PATCH] xen/x86: Increase xen_e820_map to E820_X_MAX possible entries

2016-11-15 Thread Juergen Gross
On 15/11/16 10:45, Jan Beulich wrote:
 On 15.11.16 at 09:42,  wrote:
>> On 15/11/16 09:01, Jan Beulich wrote:
>> On 15.11.16 at 08:36,  wrote:
 On 15/11/16 08:15, Jan Beulich wrote:
 On 15.11.16 at 07:33,  wrote:
>> In case I'm right the Xen hypervisor should be prepared for a larger
>> e820 map, but this won't help alone as there would still be additional
>> entries for the IOAPICs created.
>>
>> So I think we need something like:
>>
>> #define E820_XEN_MAX (E820_X_MAX + MAX_IO_APICS)
>>
>> and use this for sizing xen_e820_map[].
>
> I would say that if any change gets done here, there shouldn't be
> any static upper limit at all. That could even be viewed as in line
> with recent e820.c changes moving to dynamic allocations. In
> particular I don't see why MAX_IO_APICS would need adding in
> here, but not other (current and future) factors determining the
> (pseudo) E820 map Xen presents to Dom0.

 The hypervisor interface of XENMEM_machine_memory_map requires to
 specify the size of the e820 map where the hypervisor can supply
 entries. The alternative would be try and error: start with a small
 e820 map and in case of error increasing the size until success. I
 don't like this approach. Especially as dynamic memory allocations
 are not possible at this stage (using RESERVE_BRK() isn't any better
 than a static __initdata array IMO).
>>>
>>> Well, I think as a first step we should extend
>>> XENMEM_{,machine_}memory_map to the model used elsewhere
>>> where passing in a NULL handle (and perhaps count being zero) is
>>> a request for the number of needed entries. Granted this doesn't
>>> help with Linux'es way of consuming the output, but it at least
>>> allows for dynamic sizing. And Linux would then be free to prepare
>>> a static array or have a RESERVE_BRK() as it sees fit.
>>
>> This still needs the definition of an upper limit, as RESERVE_BRK()
>> is a compile time feature.
> 
> That's why I did say "as it sees fit".
> 
>> For a fully dynamical solution we'd need a way to get a partial
>> E820 map from the hypervisor (e.g. first 128 entries) in order to
>> be able to setup at least some memory and later get the rest of
>> the memory map using some dynamically allocated memory.
> 
> And we could of course also make the hypercall allow for that (e.g.
> by defining the semantics of a specific error code, so far not used
> by it, to avoid mis-interpretation of output on older hypervisors),
> or introduce a new clone of the existing one(s).

I'd go with the new error code. What about E2BIG or ENOSPC?

I think the hypervisor should fill in the number of entries required
in this case.

In case nobody objects I can post patches for this purpose (both Xen
and Linux).


Juergen


Re: [RFC PATCH] xen/x86: Increase xen_e820_map to E820_X_MAX possible entries

2016-11-15 Thread Jan Beulich
>>> On 15.11.16 at 09:42,  wrote:
> On 15/11/16 09:01, Jan Beulich wrote:
> On 15.11.16 at 08:36,  wrote:
>>> On 15/11/16 08:15, Jan Beulich wrote:
>>> On 15.11.16 at 07:33,  wrote:
> In case I'm right the Xen hypervisor should be prepared for a larger
> e820 map, but this won't help alone as there would still be additional
> entries for the IOAPICs created.
>
> So I think we need something like:
>
> #define E820_XEN_MAX (E820_X_MAX + MAX_IO_APICS)
>
> and use this for sizing xen_e820_map[].

 I would say that if any change gets done here, there shouldn't be
 any static upper limit at all. That could even be viewed as in line
 with recent e820.c changes moving to dynamic allocations. In
 particular I don't see why MAX_IO_APICS would need adding in
 here, but not other (current and future) factors determining the
 (pseudo) E820 map Xen presents to Dom0.
>>>
>>> The hypervisor interface of XENMEM_machine_memory_map requires to
>>> specify the size of the e820 map where the hypervisor can supply
>>> entries. The alternative would be try and error: start with a small
>>> e820 map and in case of error increasing the size until success. I
>>> don't like this approach. Especially as dynamic memory allocations
>>> are not possible at this stage (using RESERVE_BRK() isn't any better
>>> than a static __initdata array IMO).
>> 
>> Well, I think as a first step we should extend
>> XENMEM_{,machine_}memory_map to the model used elsewhere
>> where passing in a NULL handle (and perhaps count being zero) is
>> a request for the number of needed entries. Granted this doesn't
>> help with Linux'es way of consuming the output, but it at least
>> allows for dynamic sizing. And Linux would then be free to prepare
>> a static array or have a RESERVE_BRK() as it sees fit.
> 
> This still needs the definition of an upper limit, as RESERVE_BRK()
> is a compile time feature.

That's why I did say "as it sees fit".

> For a fully dynamical solution we'd need a way to get a partial
> E820 map from the hypervisor (e.g. first 128 entries) in order to
> be able to setup at least some memory and later get the rest of
> the memory map using some dynamically allocated memory.

And we could of course also make the hypercall allow for that (e.g.
by defining the semantics of a specific error code, so far not used
by it, to avoid mis-interpretation of output on older hypervisors),
or introduce a new clone of the existing one(s).

Jan



Re: [RFC PATCH] xen/x86: Increase xen_e820_map to E820_X_MAX possible entries

2016-11-15 Thread Jan Beulich
>>> On 15.11.16 at 09:42,  wrote:
> On 15/11/16 09:01, Jan Beulich wrote:
> On 15.11.16 at 08:36,  wrote:
>>> On 15/11/16 08:15, Jan Beulich wrote:
>>> On 15.11.16 at 07:33,  wrote:
> In case I'm right the Xen hypervisor should be prepared for a larger
> e820 map, but this won't help alone as there would still be additional
> entries for the IOAPICs created.
>
> So I think we need something like:
>
> #define E820_XEN_MAX (E820_X_MAX + MAX_IO_APICS)
>
> and use this for sizing xen_e820_map[].

 I would say that if any change gets done here, there shouldn't be
 any static upper limit at all. That could even be viewed as in line
 with recent e820.c changes moving to dynamic allocations. In
 particular I don't see why MAX_IO_APICS would need adding in
 here, but not other (current and future) factors determining the
 (pseudo) E820 map Xen presents to Dom0.
>>>
>>> The hypervisor interface of XENMEM_machine_memory_map requires to
>>> specify the size of the e820 map where the hypervisor can supply
>>> entries. The alternative would be try and error: start with a small
>>> e820 map and in case of error increasing the size until success. I
>>> don't like this approach. Especially as dynamic memory allocations
>>> are not possible at this stage (using RESERVE_BRK() isn't any better
>>> than a static __initdata array IMO).
>> 
>> Well, I think as a first step we should extend
>> XENMEM_{,machine_}memory_map to the model used elsewhere
>> where passing in a NULL handle (and perhaps count being zero) is
>> a request for the number of needed entries. Granted this doesn't
>> help with Linux'es way of consuming the output, but it at least
>> allows for dynamic sizing. And Linux would then be free to prepare
>> a static array or have a RESERVE_BRK() as it sees fit.
> 
> This still needs the definition of an upper limit, as RESERVE_BRK()
> is a compile time feature.

That's why I did say "as it sees fit".

> For a fully dynamical solution we'd need a way to get a partial
> E820 map from the hypervisor (e.g. first 128 entries) in order to
> be able to setup at least some memory and later get the rest of
> the memory map using some dynamically allocated memory.

And we could of course also make the hypercall allow for that (e.g.
by defining the semantics of a specific error code, so far not used
by it, to avoid mis-interpretation of output on older hypervisors),
or introduce a new clone of the existing one(s).

Jan



Re: [RFC PATCH] xen/x86: Increase xen_e820_map to E820_X_MAX possible entries

2016-11-15 Thread Juergen Gross
On 15/11/16 09:01, Jan Beulich wrote:
 On 15.11.16 at 08:36,  wrote:
>> On 15/11/16 08:15, Jan Beulich wrote:
>> On 15.11.16 at 07:33,  wrote:
 On 15/11/16 01:11, Alex Thorlton wrote:
> Hey everyone,
>
> We're having problems with large systems hitting a BUG in
> xen_memory_setup, due to extra e820 entries created in the
> XENMEM_machine_memory_map callback.  The change in the patch gets things
> working, but Boris and I wanted to get opinions on whether or not this
> is the appropriate/entire solution, which is why I've sent it as an RFC
> for now.
>
> Boris pointed out to me that E820_X_MAX is only large when CONFIG_EFI=y,
> which is a detail worth discussig.  He proposed possibly adding
> CONFIG_XEN to the conditions under which we set E820_X_MAX to a larger
> value than E820MAX, since the Xen e820 table isn't bound by the
> zero-page memory limitations.
>
> I do *slightly* question the use of E820_X_MAX here, only from a
> cosmetic prospective, as I believe this macro is intended to describe
> the maximum size of the extended e820 table, which, AFAIK, is not used
> by the Xen HV.  That being said, there isn't exactly a "more
> appropriate" macro/variable to use, so this may not really be an issue.
>
> Any input on the patch, or the questions I've raised above is greatly
> appreciated!

 While I think extending the e820 table is the right thing to do I'm
 questioning the assumptions here.

 Looking briefly through the Xen hypervisor sources I think it isn't
 yet ready for such large machines: the hypervisor's e820 map seems to
 be still limited to 128 e820 entries. Jan, did I overlook an EFI
 specific path extending this limitation?
>>>
>>> No, you didn't. I do question the correlation with "large machines"
>>> here though: The issue isn't with large machines afaict, but with
>>> ones having very many entries (i.e. heavily fragmented).
>>
>> Fact is: non-EFI machines are limited to 128 entries.
>>
>> One reason for fragmentation is NUMA - which tends to be present and
>> especially adding many entries only with lots of nodes - on large
>> machines.
> 
> Yes. However, at present Xen only supports up to 64 nodes, and
> with one entry per node (assuming there are holes between nodes,
> but contiguous memory within them) that's still unlikely to overflow
> the table. Yet I'm not meaning to discard the fact that it's been
> known for a long time that since Linux needed a larger table, Xen
> would need to follow suit sooner or later.
> 
 In case I'm right the Xen hypervisor should be prepared for a larger
 e820 map, but this won't help alone as there would still be additional
 entries for the IOAPICs created.

 So I think we need something like:

 #define E820_XEN_MAX (E820_X_MAX + MAX_IO_APICS)

 and use this for sizing xen_e820_map[].
>>>
>>> I would say that if any change gets done here, there shouldn't be
>>> any static upper limit at all. That could even be viewed as in line
>>> with recent e820.c changes moving to dynamic allocations. In
>>> particular I don't see why MAX_IO_APICS would need adding in
>>> here, but not other (current and future) factors determining the
>>> (pseudo) E820 map Xen presents to Dom0.
>>
>> The hypervisor interface of XENMEM_machine_memory_map requires to
>> specify the size of the e820 map where the hypervisor can supply
>> entries. The alternative would be try and error: start with a small
>> e820 map and in case of error increasing the size until success. I
>> don't like this approach. Especially as dynamic memory allocations
>> are not possible at this stage (using RESERVE_BRK() isn't any better
>> than a static __initdata array IMO).
> 
> Well, I think as a first step we should extend
> XENMEM_{,machine_}memory_map to the model used elsewhere
> where passing in a NULL handle (and perhaps count being zero) is
> a request for the number of needed entries. Granted this doesn't
> help with Linux'es way of consuming the output, but it at least
> allows for dynamic sizing. And Linux would then be free to prepare
> a static array or have a RESERVE_BRK() as it sees fit.

This still needs the definition of an upper limit, as RESERVE_BRK()
is a compile time feature.

For a fully dynamical solution we'd need a way to get a partial
E820 map from the hypervisor (e.g. first 128 entries) in order to
be able to setup at least some memory and later get the rest of
the memory map using some dynamically allocated memory.

>> Which other current factors do you see determining the number of
>> e820 entries presented to Dom0?
> 
> Just look at the implementation: The dependency is on iomem_caps,
> and hence any code removing pages from there could affect the
> number of entries. Which means the number of IOMMUs also has an
> effect here. And further things could get added at any time, as 

Re: [RFC PATCH] xen/x86: Increase xen_e820_map to E820_X_MAX possible entries

2016-11-15 Thread Juergen Gross
On 15/11/16 09:01, Jan Beulich wrote:
 On 15.11.16 at 08:36,  wrote:
>> On 15/11/16 08:15, Jan Beulich wrote:
>> On 15.11.16 at 07:33,  wrote:
 On 15/11/16 01:11, Alex Thorlton wrote:
> Hey everyone,
>
> We're having problems with large systems hitting a BUG in
> xen_memory_setup, due to extra e820 entries created in the
> XENMEM_machine_memory_map callback.  The change in the patch gets things
> working, but Boris and I wanted to get opinions on whether or not this
> is the appropriate/entire solution, which is why I've sent it as an RFC
> for now.
>
> Boris pointed out to me that E820_X_MAX is only large when CONFIG_EFI=y,
> which is a detail worth discussig.  He proposed possibly adding
> CONFIG_XEN to the conditions under which we set E820_X_MAX to a larger
> value than E820MAX, since the Xen e820 table isn't bound by the
> zero-page memory limitations.
>
> I do *slightly* question the use of E820_X_MAX here, only from a
> cosmetic prospective, as I believe this macro is intended to describe
> the maximum size of the extended e820 table, which, AFAIK, is not used
> by the Xen HV.  That being said, there isn't exactly a "more
> appropriate" macro/variable to use, so this may not really be an issue.
>
> Any input on the patch, or the questions I've raised above is greatly
> appreciated!

 While I think extending the e820 table is the right thing to do I'm
 questioning the assumptions here.

 Looking briefly through the Xen hypervisor sources I think it isn't
 yet ready for such large machines: the hypervisor's e820 map seems to
 be still limited to 128 e820 entries. Jan, did I overlook an EFI
 specific path extending this limitation?
>>>
>>> No, you didn't. I do question the correlation with "large machines"
>>> here though: The issue isn't with large machines afaict, but with
>>> ones having very many entries (i.e. heavily fragmented).
>>
>> Fact is: non-EFI machines are limited to 128 entries.
>>
>> One reason for fragmentation is NUMA - which tends to be present and
>> especially adding many entries only with lots of nodes - on large
>> machines.
> 
> Yes. However, at present Xen only supports up to 64 nodes, and
> with one entry per node (assuming there are holes between nodes,
> but contiguous memory within them) that's still unlikely to overflow
> the table. Yet I'm not meaning to discard the fact that it's been
> known for a long time that since Linux needed a larger table, Xen
> would need to follow suit sooner or later.
> 
 In case I'm right the Xen hypervisor should be prepared for a larger
 e820 map, but this won't help alone as there would still be additional
 entries for the IOAPICs created.

 So I think we need something like:

 #define E820_XEN_MAX (E820_X_MAX + MAX_IO_APICS)

 and use this for sizing xen_e820_map[].
>>>
>>> I would say that if any change gets done here, there shouldn't be
>>> any static upper limit at all. That could even be viewed as in line
>>> with recent e820.c changes moving to dynamic allocations. In
>>> particular I don't see why MAX_IO_APICS would need adding in
>>> here, but not other (current and future) factors determining the
>>> (pseudo) E820 map Xen presents to Dom0.
>>
>> The hypervisor interface of XENMEM_machine_memory_map requires to
>> specify the size of the e820 map where the hypervisor can supply
>> entries. The alternative would be try and error: start with a small
>> e820 map and in case of error increasing the size until success. I
>> don't like this approach. Especially as dynamic memory allocations
>> are not possible at this stage (using RESERVE_BRK() isn't any better
>> than a static __initdata array IMO).
> 
> Well, I think as a first step we should extend
> XENMEM_{,machine_}memory_map to the model used elsewhere
> where passing in a NULL handle (and perhaps count being zero) is
> a request for the number of needed entries. Granted this doesn't
> help with Linux'es way of consuming the output, but it at least
> allows for dynamic sizing. And Linux would then be free to prepare
> a static array or have a RESERVE_BRK() as it sees fit.

This still needs the definition of an upper limit, as RESERVE_BRK()
is a compile time feature.

For a fully dynamical solution we'd need a way to get a partial
E820 map from the hypervisor (e.g. first 128 entries) in order to
be able to setup at least some memory and later get the rest of
the memory map using some dynamically allocated memory.

>> Which other current factors do you see determining the number of
>> e820 entries presented to Dom0?
> 
> Just look at the implementation: The dependency is on iomem_caps,
> and hence any code removing pages from there could affect the
> number of entries. Which means the number of IOMMUs also has an
> effect here. And further things could get added at any time, as we
> happen to find them.

Its 

Re: [RFC PATCH] xen/x86: Increase xen_e820_map to E820_X_MAX possible entries

2016-11-15 Thread Jan Beulich
>>> On 15.11.16 at 08:36,  wrote:
> On 15/11/16 08:15, Jan Beulich wrote:
> On 15.11.16 at 07:33,  wrote:
>>> On 15/11/16 01:11, Alex Thorlton wrote:
 Hey everyone,

 We're having problems with large systems hitting a BUG in
 xen_memory_setup, due to extra e820 entries created in the
 XENMEM_machine_memory_map callback.  The change in the patch gets things
 working, but Boris and I wanted to get opinions on whether or not this
 is the appropriate/entire solution, which is why I've sent it as an RFC
 for now.

 Boris pointed out to me that E820_X_MAX is only large when CONFIG_EFI=y,
 which is a detail worth discussig.  He proposed possibly adding
 CONFIG_XEN to the conditions under which we set E820_X_MAX to a larger
 value than E820MAX, since the Xen e820 table isn't bound by the
 zero-page memory limitations.

 I do *slightly* question the use of E820_X_MAX here, only from a
 cosmetic prospective, as I believe this macro is intended to describe
 the maximum size of the extended e820 table, which, AFAIK, is not used
 by the Xen HV.  That being said, there isn't exactly a "more
 appropriate" macro/variable to use, so this may not really be an issue.

 Any input on the patch, or the questions I've raised above is greatly
 appreciated!
>>>
>>> While I think extending the e820 table is the right thing to do I'm
>>> questioning the assumptions here.
>>>
>>> Looking briefly through the Xen hypervisor sources I think it isn't
>>> yet ready for such large machines: the hypervisor's e820 map seems to
>>> be still limited to 128 e820 entries. Jan, did I overlook an EFI
>>> specific path extending this limitation?
>> 
>> No, you didn't. I do question the correlation with "large machines"
>> here though: The issue isn't with large machines afaict, but with
>> ones having very many entries (i.e. heavily fragmented).
> 
> Fact is: non-EFI machines are limited to 128 entries.
> 
> One reason for fragmentation is NUMA - which tends to be present and
> especially adding many entries only with lots of nodes - on large
> machines.

Yes. However, at present Xen only supports up to 64 nodes, and
with one entry per node (assuming there are holes between nodes,
but contiguous memory within them) that's still unlikely to overflow
the table. Yet I'm not meaning to discard the fact that it's been
known for a long time that since Linux needed a larger table, Xen
would need to follow suit sooner or later.

>>> In case I'm right the Xen hypervisor should be prepared for a larger
>>> e820 map, but this won't help alone as there would still be additional
>>> entries for the IOAPICs created.
>>>
>>> So I think we need something like:
>>>
>>> #define E820_XEN_MAX (E820_X_MAX + MAX_IO_APICS)
>>>
>>> and use this for sizing xen_e820_map[].
>> 
>> I would say that if any change gets done here, there shouldn't be
>> any static upper limit at all. That could even be viewed as in line
>> with recent e820.c changes moving to dynamic allocations. In
>> particular I don't see why MAX_IO_APICS would need adding in
>> here, but not other (current and future) factors determining the
>> (pseudo) E820 map Xen presents to Dom0.
> 
> The hypervisor interface of XENMEM_machine_memory_map requires to
> specify the size of the e820 map where the hypervisor can supply
> entries. The alternative would be try and error: start with a small
> e820 map and in case of error increasing the size until success. I
> don't like this approach. Especially as dynamic memory allocations
> are not possible at this stage (using RESERVE_BRK() isn't any better
> than a static __initdata array IMO).

Well, I think as a first step we should extend
XENMEM_{,machine_}memory_map to the model used elsewhere
where passing in a NULL handle (and perhaps count being zero) is
a request for the number of needed entries. Granted this doesn't
help with Linux'es way of consuming the output, but it at least
allows for dynamic sizing. And Linux would then be free to prepare
a static array or have a RESERVE_BRK() as it sees fit.

> Which other current factors do you see determining the number of
> e820 entries presented to Dom0?

Just look at the implementation: The dependency is on iomem_caps,
and hence any code removing pages from there could affect the
number of entries. Which means the number of IOMMUs also has an
effect here. And further things could get added at any time, as we
happen to find them.

Jan


Re: [RFC PATCH] xen/x86: Increase xen_e820_map to E820_X_MAX possible entries

2016-11-15 Thread Jan Beulich
>>> On 15.11.16 at 08:36,  wrote:
> On 15/11/16 08:15, Jan Beulich wrote:
> On 15.11.16 at 07:33,  wrote:
>>> On 15/11/16 01:11, Alex Thorlton wrote:
 Hey everyone,

 We're having problems with large systems hitting a BUG in
 xen_memory_setup, due to extra e820 entries created in the
 XENMEM_machine_memory_map callback.  The change in the patch gets things
 working, but Boris and I wanted to get opinions on whether or not this
 is the appropriate/entire solution, which is why I've sent it as an RFC
 for now.

 Boris pointed out to me that E820_X_MAX is only large when CONFIG_EFI=y,
 which is a detail worth discussig.  He proposed possibly adding
 CONFIG_XEN to the conditions under which we set E820_X_MAX to a larger
 value than E820MAX, since the Xen e820 table isn't bound by the
 zero-page memory limitations.

 I do *slightly* question the use of E820_X_MAX here, only from a
 cosmetic prospective, as I believe this macro is intended to describe
 the maximum size of the extended e820 table, which, AFAIK, is not used
 by the Xen HV.  That being said, there isn't exactly a "more
 appropriate" macro/variable to use, so this may not really be an issue.

 Any input on the patch, or the questions I've raised above is greatly
 appreciated!
>>>
>>> While I think extending the e820 table is the right thing to do I'm
>>> questioning the assumptions here.
>>>
>>> Looking briefly through the Xen hypervisor sources I think it isn't
>>> yet ready for such large machines: the hypervisor's e820 map seems to
>>> be still limited to 128 e820 entries. Jan, did I overlook an EFI
>>> specific path extending this limitation?
>> 
>> No, you didn't. I do question the correlation with "large machines"
>> here though: The issue isn't with large machines afaict, but with
>> ones having very many entries (i.e. heavily fragmented).
> 
> Fact is: non-EFI machines are limited to 128 entries.
> 
> One reason for fragmentation is NUMA - which tends to be present and
> especially adding many entries only with lots of nodes - on large
> machines.

Yes. However, at present Xen only supports up to 64 nodes, and
with one entry per node (assuming there are holes between nodes,
but contiguous memory within them) that's still unlikely to overflow
the table. Yet I'm not meaning to discard the fact that it's been
known for a long time that since Linux needed a larger table, Xen
would need to follow suit sooner or later.

>>> In case I'm right the Xen hypervisor should be prepared for a larger
>>> e820 map, but this won't help alone as there would still be additional
>>> entries for the IOAPICs created.
>>>
>>> So I think we need something like:
>>>
>>> #define E820_XEN_MAX (E820_X_MAX + MAX_IO_APICS)
>>>
>>> and use this for sizing xen_e820_map[].
>> 
>> I would say that if any change gets done here, there shouldn't be
>> any static upper limit at all. That could even be viewed as in line
>> with recent e820.c changes moving to dynamic allocations. In
>> particular I don't see why MAX_IO_APICS would need adding in
>> here, but not other (current and future) factors determining the
>> (pseudo) E820 map Xen presents to Dom0.
> 
> The hypervisor interface of XENMEM_machine_memory_map requires to
> specify the size of the e820 map where the hypervisor can supply
> entries. The alternative would be try and error: start with a small
> e820 map and in case of error increasing the size until success. I
> don't like this approach. Especially as dynamic memory allocations
> are not possible at this stage (using RESERVE_BRK() isn't any better
> than a static __initdata array IMO).

Well, I think as a first step we should extend
XENMEM_{,machine_}memory_map to the model used elsewhere
where passing in a NULL handle (and perhaps count being zero) is
a request for the number of needed entries. Granted this doesn't
help with Linux'es way of consuming the output, but it at least
allows for dynamic sizing. And Linux would then be free to prepare
a static array or have a RESERVE_BRK() as it sees fit.

> Which other current factors do you see determining the number of
> e820 entries presented to Dom0?

Just look at the implementation: The dependency is on iomem_caps,
and hence any code removing pages from there could affect the
number of entries. Which means the number of IOMMUs also has an
effect here. And further things could get added at any time, as we
happen to find them.

Jan


Re: [RFC PATCH] xen/x86: Increase xen_e820_map to E820_X_MAX possible entries

2016-11-14 Thread Juergen Gross
On 15/11/16 08:15, Jan Beulich wrote:
 On 15.11.16 at 07:33,  wrote:
>> On 15/11/16 01:11, Alex Thorlton wrote:
>>> Hey everyone,
>>>
>>> We're having problems with large systems hitting a BUG in
>>> xen_memory_setup, due to extra e820 entries created in the
>>> XENMEM_machine_memory_map callback.  The change in the patch gets things
>>> working, but Boris and I wanted to get opinions on whether or not this
>>> is the appropriate/entire solution, which is why I've sent it as an RFC
>>> for now.
>>>
>>> Boris pointed out to me that E820_X_MAX is only large when CONFIG_EFI=y,
>>> which is a detail worth discussig.  He proposed possibly adding
>>> CONFIG_XEN to the conditions under which we set E820_X_MAX to a larger
>>> value than E820MAX, since the Xen e820 table isn't bound by the
>>> zero-page memory limitations.
>>>
>>> I do *slightly* question the use of E820_X_MAX here, only from a
>>> cosmetic prospective, as I believe this macro is intended to describe
>>> the maximum size of the extended e820 table, which, AFAIK, is not used
>>> by the Xen HV.  That being said, there isn't exactly a "more
>>> appropriate" macro/variable to use, so this may not really be an issue.
>>>
>>> Any input on the patch, or the questions I've raised above is greatly
>>> appreciated!
>>
>> While I think extending the e820 table is the right thing to do I'm
>> questioning the assumptions here.
>>
>> Looking briefly through the Xen hypervisor sources I think it isn't
>> yet ready for such large machines: the hypervisor's e820 map seems to
>> be still limited to 128 e820 entries. Jan, did I overlook an EFI
>> specific path extending this limitation?
> 
> No, you didn't. I do question the correlation with "large machines"
> here though: The issue isn't with large machines afaict, but with
> ones having very many entries (i.e. heavily fragmented).

Fact is: non-EFI machines are limited to 128 entries.

One reason for fragmentation is NUMA - which tends to be present and
especially adding many entries only with lots of nodes - on large
machines.

So while you are of course right that the problem isn't the size of
the machine, but the memory fragmentation, the chances to show up
are much higher on large machines.

So I'd rephrase:

Looking briefly through the Xen hypervisor sources I think it isn't yet
ready for machines with many e820 entries due to memory fragmentation
to be found e.g. on very large NUMA machines with lots of nodes: ...

>> In case I'm right the Xen hypervisor should be prepared for a larger
>> e820 map, but this won't help alone as there would still be additional
>> entries for the IOAPICs created.
>>
>> So I think we need something like:
>>
>> #define E820_XEN_MAX (E820_X_MAX + MAX_IO_APICS)
>>
>> and use this for sizing xen_e820_map[].
> 
> I would say that if any change gets done here, there shouldn't be
> any static upper limit at all. That could even be viewed as in line
> with recent e820.c changes moving to dynamic allocations. In
> particular I don't see why MAX_IO_APICS would need adding in
> here, but not other (current and future) factors determining the
> (pseudo) E820 map Xen presents to Dom0.

The hypervisor interface of XENMEM_machine_memory_map requires to
specify the size of the e820 map where the hypervisor can supply
entries. The alternative would be try and error: start with a small
e820 map and in case of error increasing the size until success. I
don't like this approach. Especially as dynamic memory allocations
are not possible at this stage (using RESERVE_BRK() isn't any better
than a static __initdata array IMO).

Which other current factors do you see determining the number of
e820 entries presented to Dom0?


Juergen


Re: [RFC PATCH] xen/x86: Increase xen_e820_map to E820_X_MAX possible entries

2016-11-14 Thread Juergen Gross
On 15/11/16 08:15, Jan Beulich wrote:
 On 15.11.16 at 07:33,  wrote:
>> On 15/11/16 01:11, Alex Thorlton wrote:
>>> Hey everyone,
>>>
>>> We're having problems with large systems hitting a BUG in
>>> xen_memory_setup, due to extra e820 entries created in the
>>> XENMEM_machine_memory_map callback.  The change in the patch gets things
>>> working, but Boris and I wanted to get opinions on whether or not this
>>> is the appropriate/entire solution, which is why I've sent it as an RFC
>>> for now.
>>>
>>> Boris pointed out to me that E820_X_MAX is only large when CONFIG_EFI=y,
>>> which is a detail worth discussig.  He proposed possibly adding
>>> CONFIG_XEN to the conditions under which we set E820_X_MAX to a larger
>>> value than E820MAX, since the Xen e820 table isn't bound by the
>>> zero-page memory limitations.
>>>
>>> I do *slightly* question the use of E820_X_MAX here, only from a
>>> cosmetic prospective, as I believe this macro is intended to describe
>>> the maximum size of the extended e820 table, which, AFAIK, is not used
>>> by the Xen HV.  That being said, there isn't exactly a "more
>>> appropriate" macro/variable to use, so this may not really be an issue.
>>>
>>> Any input on the patch, or the questions I've raised above is greatly
>>> appreciated!
>>
>> While I think extending the e820 table is the right thing to do I'm
>> questioning the assumptions here.
>>
>> Looking briefly through the Xen hypervisor sources I think it isn't
>> yet ready for such large machines: the hypervisor's e820 map seems to
>> be still limited to 128 e820 entries. Jan, did I overlook an EFI
>> specific path extending this limitation?
> 
> No, you didn't. I do question the correlation with "large machines"
> here though: The issue isn't with large machines afaict, but with
> ones having very many entries (i.e. heavily fragmented).

Fact is: non-EFI machines are limited to 128 entries.

One reason for fragmentation is NUMA - which tends to be present and
especially adding many entries only with lots of nodes - on large
machines.

So while you are of course right that the problem isn't the size of
the machine, but the memory fragmentation, the chances to show up
are much higher on large machines.

So I'd rephrase:

Looking briefly through the Xen hypervisor sources I think it isn't yet
ready for machines with many e820 entries due to memory fragmentation
to be found e.g. on very large NUMA machines with lots of nodes: ...

>> In case I'm right the Xen hypervisor should be prepared for a larger
>> e820 map, but this won't help alone as there would still be additional
>> entries for the IOAPICs created.
>>
>> So I think we need something like:
>>
>> #define E820_XEN_MAX (E820_X_MAX + MAX_IO_APICS)
>>
>> and use this for sizing xen_e820_map[].
> 
> I would say that if any change gets done here, there shouldn't be
> any static upper limit at all. That could even be viewed as in line
> with recent e820.c changes moving to dynamic allocations. In
> particular I don't see why MAX_IO_APICS would need adding in
> here, but not other (current and future) factors determining the
> (pseudo) E820 map Xen presents to Dom0.

The hypervisor interface of XENMEM_machine_memory_map requires to
specify the size of the e820 map where the hypervisor can supply
entries. The alternative would be try and error: start with a small
e820 map and in case of error increasing the size until success. I
don't like this approach. Especially as dynamic memory allocations
are not possible at this stage (using RESERVE_BRK() isn't any better
than a static __initdata array IMO).

Which other current factors do you see determining the number of
e820 entries presented to Dom0?


Juergen


Re: [RFC PATCH] xen/x86: Increase xen_e820_map to E820_X_MAX possible entries

2016-11-14 Thread Jan Beulich
>>> On 15.11.16 at 07:33,  wrote:
> On 15/11/16 01:11, Alex Thorlton wrote:
>> Hey everyone,
>> 
>> We're having problems with large systems hitting a BUG in
>> xen_memory_setup, due to extra e820 entries created in the
>> XENMEM_machine_memory_map callback.  The change in the patch gets things
>> working, but Boris and I wanted to get opinions on whether or not this
>> is the appropriate/entire solution, which is why I've sent it as an RFC
>> for now.
>> 
>> Boris pointed out to me that E820_X_MAX is only large when CONFIG_EFI=y,
>> which is a detail worth discussig.  He proposed possibly adding
>> CONFIG_XEN to the conditions under which we set E820_X_MAX to a larger
>> value than E820MAX, since the Xen e820 table isn't bound by the
>> zero-page memory limitations.
>> 
>> I do *slightly* question the use of E820_X_MAX here, only from a
>> cosmetic prospective, as I believe this macro is intended to describe
>> the maximum size of the extended e820 table, which, AFAIK, is not used
>> by the Xen HV.  That being said, there isn't exactly a "more
>> appropriate" macro/variable to use, so this may not really be an issue.
>> 
>> Any input on the patch, or the questions I've raised above is greatly
>> appreciated!
> 
> While I think extending the e820 table is the right thing to do I'm
> questioning the assumptions here.
> 
> Looking briefly through the Xen hypervisor sources I think it isn't
> yet ready for such large machines: the hypervisor's e820 map seems to
> be still limited to 128 e820 entries. Jan, did I overlook an EFI
> specific path extending this limitation?

No, you didn't. I do question the correlation with "large machines"
here though: The issue isn't with large machines afaict, but with
ones having very many entries (i.e. heavily fragmented).

> In case I'm right the Xen hypervisor should be prepared for a larger
> e820 map, but this won't help alone as there would still be additional
> entries for the IOAPICs created.
> 
> So I think we need something like:
> 
> #define E820_XEN_MAX (E820_X_MAX + MAX_IO_APICS)
> 
> and use this for sizing xen_e820_map[].

I would say that if any change gets done here, there shouldn't be
any static upper limit at all. That could even be viewed as in line
with recent e820.c changes moving to dynamic allocations. In
particular I don't see why MAX_IO_APICS would need adding in
here, but not other (current and future) factors determining the
(pseudo) E820 map Xen presents to Dom0.

Jan



Re: [RFC PATCH] xen/x86: Increase xen_e820_map to E820_X_MAX possible entries

2016-11-14 Thread Jan Beulich
>>> On 15.11.16 at 07:33,  wrote:
> On 15/11/16 01:11, Alex Thorlton wrote:
>> Hey everyone,
>> 
>> We're having problems with large systems hitting a BUG in
>> xen_memory_setup, due to extra e820 entries created in the
>> XENMEM_machine_memory_map callback.  The change in the patch gets things
>> working, but Boris and I wanted to get opinions on whether or not this
>> is the appropriate/entire solution, which is why I've sent it as an RFC
>> for now.
>> 
>> Boris pointed out to me that E820_X_MAX is only large when CONFIG_EFI=y,
>> which is a detail worth discussig.  He proposed possibly adding
>> CONFIG_XEN to the conditions under which we set E820_X_MAX to a larger
>> value than E820MAX, since the Xen e820 table isn't bound by the
>> zero-page memory limitations.
>> 
>> I do *slightly* question the use of E820_X_MAX here, only from a
>> cosmetic prospective, as I believe this macro is intended to describe
>> the maximum size of the extended e820 table, which, AFAIK, is not used
>> by the Xen HV.  That being said, there isn't exactly a "more
>> appropriate" macro/variable to use, so this may not really be an issue.
>> 
>> Any input on the patch, or the questions I've raised above is greatly
>> appreciated!
> 
> While I think extending the e820 table is the right thing to do I'm
> questioning the assumptions here.
> 
> Looking briefly through the Xen hypervisor sources I think it isn't
> yet ready for such large machines: the hypervisor's e820 map seems to
> be still limited to 128 e820 entries. Jan, did I overlook an EFI
> specific path extending this limitation?

No, you didn't. I do question the correlation with "large machines"
here though: The issue isn't with large machines afaict, but with
ones having very many entries (i.e. heavily fragmented).

> In case I'm right the Xen hypervisor should be prepared for a larger
> e820 map, but this won't help alone as there would still be additional
> entries for the IOAPICs created.
> 
> So I think we need something like:
> 
> #define E820_XEN_MAX (E820_X_MAX + MAX_IO_APICS)
> 
> and use this for sizing xen_e820_map[].

I would say that if any change gets done here, there shouldn't be
any static upper limit at all. That could even be viewed as in line
with recent e820.c changes moving to dynamic allocations. In
particular I don't see why MAX_IO_APICS would need adding in
here, but not other (current and future) factors determining the
(pseudo) E820 map Xen presents to Dom0.

Jan



Re: [RFC PATCH] xen/x86: Increase xen_e820_map to E820_X_MAX possible entries

2016-11-14 Thread Juergen Gross
On 15/11/16 01:11, Alex Thorlton wrote:
> Hey everyone,
> 
> We're having problems with large systems hitting a BUG in
> xen_memory_setup, due to extra e820 entries created in the
> XENMEM_machine_memory_map callback.  The change in the patch gets things
> working, but Boris and I wanted to get opinions on whether or not this
> is the appropriate/entire solution, which is why I've sent it as an RFC
> for now.
> 
> Boris pointed out to me that E820_X_MAX is only large when CONFIG_EFI=y,
> which is a detail worth discussig.  He proposed possibly adding
> CONFIG_XEN to the conditions under which we set E820_X_MAX to a larger
> value than E820MAX, since the Xen e820 table isn't bound by the
> zero-page memory limitations.
> 
> I do *slightly* question the use of E820_X_MAX here, only from a
> cosmetic prospective, as I believe this macro is intended to describe
> the maximum size of the extended e820 table, which, AFAIK, is not used
> by the Xen HV.  That being said, there isn't exactly a "more
> appropriate" macro/variable to use, so this may not really be an issue.
> 
> Any input on the patch, or the questions I've raised above is greatly
> appreciated!

While I think extending the e820 table is the right thing to do I'm
questioning the assumptions here.

Looking briefly through the Xen hypervisor sources I think it isn't
yet ready for such large machines: the hypervisor's e820 map seems to
be still limited to 128 e820 entries. Jan, did I overlook an EFI
specific path extending this limitation?

In case I'm right the Xen hypervisor should be prepared for a larger
e820 map, but this won't help alone as there would still be additional
entries for the IOAPICs created.

So I think we need something like:

#define E820_XEN_MAX (E820_X_MAX + MAX_IO_APICS)

and use this for sizing xen_e820_map[].


Juergen


Re: [RFC PATCH] xen/x86: Increase xen_e820_map to E820_X_MAX possible entries

2016-11-14 Thread Juergen Gross
On 15/11/16 01:11, Alex Thorlton wrote:
> Hey everyone,
> 
> We're having problems with large systems hitting a BUG in
> xen_memory_setup, due to extra e820 entries created in the
> XENMEM_machine_memory_map callback.  The change in the patch gets things
> working, but Boris and I wanted to get opinions on whether or not this
> is the appropriate/entire solution, which is why I've sent it as an RFC
> for now.
> 
> Boris pointed out to me that E820_X_MAX is only large when CONFIG_EFI=y,
> which is a detail worth discussig.  He proposed possibly adding
> CONFIG_XEN to the conditions under which we set E820_X_MAX to a larger
> value than E820MAX, since the Xen e820 table isn't bound by the
> zero-page memory limitations.
> 
> I do *slightly* question the use of E820_X_MAX here, only from a
> cosmetic prospective, as I believe this macro is intended to describe
> the maximum size of the extended e820 table, which, AFAIK, is not used
> by the Xen HV.  That being said, there isn't exactly a "more
> appropriate" macro/variable to use, so this may not really be an issue.
> 
> Any input on the patch, or the questions I've raised above is greatly
> appreciated!

While I think extending the e820 table is the right thing to do I'm
questioning the assumptions here.

Looking briefly through the Xen hypervisor sources I think it isn't
yet ready for such large machines: the hypervisor's e820 map seems to
be still limited to 128 e820 entries. Jan, did I overlook an EFI
specific path extending this limitation?

In case I'm right the Xen hypervisor should be prepared for a larger
e820 map, but this won't help alone as there would still be additional
entries for the IOAPICs created.

So I think we need something like:

#define E820_XEN_MAX (E820_X_MAX + MAX_IO_APICS)

and use this for sizing xen_e820_map[].


Juergen


[RFC PATCH] xen/x86: Increase xen_e820_map to E820_X_MAX possible entries

2016-11-14 Thread Alex Thorlton
Hey everyone,

We're having problems with large systems hitting a BUG in
xen_memory_setup, due to extra e820 entries created in the
XENMEM_machine_memory_map callback.  The change in the patch gets things
working, but Boris and I wanted to get opinions on whether or not this
is the appropriate/entire solution, which is why I've sent it as an RFC
for now.

Boris pointed out to me that E820_X_MAX is only large when CONFIG_EFI=y,
which is a detail worth discussig.  He proposed possibly adding
CONFIG_XEN to the conditions under which we set E820_X_MAX to a larger
value than E820MAX, since the Xen e820 table isn't bound by the
zero-page memory limitations.

I do *slightly* question the use of E820_X_MAX here, only from a
cosmetic prospective, as I believe this macro is intended to describe
the maximum size of the extended e820 table, which, AFAIK, is not used
by the Xen HV.  That being said, there isn't exactly a "more
appropriate" macro/variable to use, so this may not really be an issue.

Any input on the patch, or the questions I've raised above is greatly
appreciated!

- Alex

Alex Thorlton (1):
  xen/x86: Increase xen_e820_map to E820_X_MAX possible entries

 arch/x86/xen/setup.c | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

-- 
1.8.5.6



[RFC PATCH] xen/x86: Increase xen_e820_map to E820_X_MAX possible entries

2016-11-14 Thread Alex Thorlton
Hey everyone,

We're having problems with large systems hitting a BUG in
xen_memory_setup, due to extra e820 entries created in the
XENMEM_machine_memory_map callback.  The change in the patch gets things
working, but Boris and I wanted to get opinions on whether or not this
is the appropriate/entire solution, which is why I've sent it as an RFC
for now.

Boris pointed out to me that E820_X_MAX is only large when CONFIG_EFI=y,
which is a detail worth discussig.  He proposed possibly adding
CONFIG_XEN to the conditions under which we set E820_X_MAX to a larger
value than E820MAX, since the Xen e820 table isn't bound by the
zero-page memory limitations.

I do *slightly* question the use of E820_X_MAX here, only from a
cosmetic prospective, as I believe this macro is intended to describe
the maximum size of the extended e820 table, which, AFAIK, is not used
by the Xen HV.  That being said, there isn't exactly a "more
appropriate" macro/variable to use, so this may not really be an issue.

Any input on the patch, or the questions I've raised above is greatly
appreciated!

- Alex

Alex Thorlton (1):
  xen/x86: Increase xen_e820_map to E820_X_MAX possible entries

 arch/x86/xen/setup.c | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

-- 
1.8.5.6