Re: [PATCH] arm64/numa: Add more vetting in numa_set_distance()

2018-11-08 Thread Anshuman Khandual



On 11/01/2018 04:57 PM, Will Deacon wrote:
> [ Nit: Please wrap your lines when replying -- I've done it for you here ]
> 
> On Tue, Oct 30, 2018 at 08:16:21AM +0530, Anshuman Khandual wrote:
>> On 10/29/2018 08:14 PM, John Garry wrote:
>>>  I think we should either factor out the sanity check
 into a core helper or make the core code robust to these funny 
 configurations.
>>>
>>> OK, so to me it would make sense to factor out a sanity check into a 
>>> core
>>> helper.
>>
>> That, or have the OF code perform the same validation that slit_valid() 
>> is
>> doing for ACPI. I'm just trying to avoid other architectures running into
>> this problem down the line.
>>
>
> Right, OF code should do this validation job if ACPI is doing it
> (especially since the DT bindings actually specify the distance
> rules), and not rely on the arch NUMA code to accept/reject
> numa_set_distance() combinations.

 I would say this particular condition checking still falls under arch NUMA 
 init
 code sanity check like other basic tests what numa_set_distance() 
 currently does
 already but it should not be a necessity for the OF driver to check these.
>>>
>>> The checks in the arch NUMA code mean that invalid inter-node distance
>>> combinations are ignored.
>>
>> Right and should not this new test (from != to && distance == 
>> LOCAL_DISTANCE) be
>> one of them as well ? numa_set_distance() updates the table or just throws 
>> some
>> warnings while skipping entries it deems invalid. It would be okay to have 
>> this
>> new check there in addition to others like this patch suggests.

I missed this response due to some problem in my mail client and re-iterated
some of the same points again on the V2 (https://lkml.org/lkml/2018/11/8/734).
My apologies for the same.

> 
> If we're changing numa_set_distance(), I'd actually be inclined to do the
> opposite of what you're suggesting and move as much of this boilerplate
> checking into the core code. Perhaps we could add something like
> check_fw_numa_topology() and have both ACPI and OF call that so that the
> arch doesn't need to worry about malformed tables at all.

Right although I doubt that we could ever have a common check_fw_numa_topology()
check as the semantics for the table and individual entries there in for DT and
ACPI might be different. But I agree what you are saying.

> 
>>> However, if any entries in the table are invalid, then the whole table
>>> can be discarded as none of it can be believed, i.e. it's better to
>>> validate the table.
>>>
>>
>> Agreed. slit_valid() on the ACPI parsing is currently enforcing that before
>> acpi_numa_slit_init() which would call into numa_set_distance(). Hence arch
>> NUMA code numa_set_distance() never had the opportunity to do the sanity
>> checks as ACPI slit_valid() has completely invalidated the table.
>>
>> Unlike ACPI path, of_numa_parse_distance_map_v1() does not do any sanity
>> checks on the distance values parse from the "distance-matrix" property
>> and all the checks directly falls on numa_set_distance(). This needs to
>> be fixed in line with ACPI
>>
>> * If (to == from) ---> distance = LOCAL_DISTANCE
>> * If (to != from) ---> distance > LOCAL_DISTANCE
>>
>> At the same time its okay to just enhance numa_set_distance() test coverage
>> to include this new test. If we would have trusted firmware parsing all the
>> way, existing basic checks about node range, distance stuff should not have
>> been there in numa_set_distance(). Hence IMHO even if we fix the OF driver
>> part, we should include this new check there as well.
> 
> I don't see what we gain by duplicating the check. In fact, it has a few
> downsides:
> 
>   (1) It confuses the semantics of the API, because it is no longer clear
>   who "owns" the check
> 
>   (2) It duplicates code in each architecture
> 
>   (3) Some clever-cloggs will remove at least some of the duplication in
>   future

Agreed, it has down sides.

> 
> I'm not willing to accept the check in the arm64 code if we update the
> OF code.

Okay, we should remove them instead.

> 
> I think the way forward here is for John to fix the crash he reported by
> adding the check to the OF code. If somebody wants to follow up with
> subsequent patches to move more of the checking out of the arch code, then
> we can review that as a separate series.

Okay. Anyways I had some other comments related to semantics checking at
the OF driver level but I can probably send them out as a separate patch
later. Once again it was my bad to miss this response in the first place.


Re: [PATCH] arm64/numa: Add more vetting in numa_set_distance()

2018-11-08 Thread Anshuman Khandual



On 11/01/2018 04:57 PM, Will Deacon wrote:
> [ Nit: Please wrap your lines when replying -- I've done it for you here ]
> 
> On Tue, Oct 30, 2018 at 08:16:21AM +0530, Anshuman Khandual wrote:
>> On 10/29/2018 08:14 PM, John Garry wrote:
>>>  I think we should either factor out the sanity check
 into a core helper or make the core code robust to these funny 
 configurations.
>>>
>>> OK, so to me it would make sense to factor out a sanity check into a 
>>> core
>>> helper.
>>
>> That, or have the OF code perform the same validation that slit_valid() 
>> is
>> doing for ACPI. I'm just trying to avoid other architectures running into
>> this problem down the line.
>>
>
> Right, OF code should do this validation job if ACPI is doing it
> (especially since the DT bindings actually specify the distance
> rules), and not rely on the arch NUMA code to accept/reject
> numa_set_distance() combinations.

 I would say this particular condition checking still falls under arch NUMA 
 init
 code sanity check like other basic tests what numa_set_distance() 
 currently does
 already but it should not be a necessity for the OF driver to check these.
>>>
>>> The checks in the arch NUMA code mean that invalid inter-node distance
>>> combinations are ignored.
>>
>> Right and should not this new test (from != to && distance == 
>> LOCAL_DISTANCE) be
>> one of them as well ? numa_set_distance() updates the table or just throws 
>> some
>> warnings while skipping entries it deems invalid. It would be okay to have 
>> this
>> new check there in addition to others like this patch suggests.

I missed this response due to some problem in my mail client and re-iterated
some of the same points again on the V2 (https://lkml.org/lkml/2018/11/8/734).
My apologies for the same.

> 
> If we're changing numa_set_distance(), I'd actually be inclined to do the
> opposite of what you're suggesting and move as much of this boilerplate
> checking into the core code. Perhaps we could add something like
> check_fw_numa_topology() and have both ACPI and OF call that so that the
> arch doesn't need to worry about malformed tables at all.

Right although I doubt that we could ever have a common check_fw_numa_topology()
check as the semantics for the table and individual entries there in for DT and
ACPI might be different. But I agree what you are saying.

> 
>>> However, if any entries in the table are invalid, then the whole table
>>> can be discarded as none of it can be believed, i.e. it's better to
>>> validate the table.
>>>
>>
>> Agreed. slit_valid() on the ACPI parsing is currently enforcing that before
>> acpi_numa_slit_init() which would call into numa_set_distance(). Hence arch
>> NUMA code numa_set_distance() never had the opportunity to do the sanity
>> checks as ACPI slit_valid() has completely invalidated the table.
>>
>> Unlike ACPI path, of_numa_parse_distance_map_v1() does not do any sanity
>> checks on the distance values parse from the "distance-matrix" property
>> and all the checks directly falls on numa_set_distance(). This needs to
>> be fixed in line with ACPI
>>
>> * If (to == from) ---> distance = LOCAL_DISTANCE
>> * If (to != from) ---> distance > LOCAL_DISTANCE
>>
>> At the same time its okay to just enhance numa_set_distance() test coverage
>> to include this new test. If we would have trusted firmware parsing all the
>> way, existing basic checks about node range, distance stuff should not have
>> been there in numa_set_distance(). Hence IMHO even if we fix the OF driver
>> part, we should include this new check there as well.
> 
> I don't see what we gain by duplicating the check. In fact, it has a few
> downsides:
> 
>   (1) It confuses the semantics of the API, because it is no longer clear
>   who "owns" the check
> 
>   (2) It duplicates code in each architecture
> 
>   (3) Some clever-cloggs will remove at least some of the duplication in
>   future

Agreed, it has down sides.

> 
> I'm not willing to accept the check in the arm64 code if we update the
> OF code.

Okay, we should remove them instead.

> 
> I think the way forward here is for John to fix the crash he reported by
> adding the check to the OF code. If somebody wants to follow up with
> subsequent patches to move more of the checking out of the arch code, then
> we can review that as a separate series.

Okay. Anyways I had some other comments related to semantics checking at
the OF driver level but I can probably send them out as a separate patch
later. Once again it was my bad to miss this response in the first place.


Re: [PATCH] arm64/numa: Add more vetting in numa_set_distance()

2018-11-01 Thread John Garry


Agreed. slit_valid() on the ACPI parsing is currently enforcing that before
acpi_numa_slit_init() which would call into numa_set_distance(). Hence arch
NUMA code numa_set_distance() never had the opportunity to do the sanity
checks as ACPI slit_valid() has completely invalidated the table.

Unlike ACPI path, of_numa_parse_distance_map_v1() does not do any sanity
checks on the distance values parse from the "distance-matrix" property
and all the checks directly falls on numa_set_distance(). This needs to
be fixed in line with ACPI

* If (to == from) ---> distance = LOCAL_DISTANCE
* If (to != from) ---> distance > LOCAL_DISTANCE

At the same time its okay to just enhance numa_set_distance() test coverage
to include this new test. If we would have trusted firmware parsing all the
way, existing basic checks about node range, distance stuff should not have
been there in numa_set_distance(). Hence IMHO even if we fix the OF driver
part, we should include this new check there as well.


I don't see what we gain by duplicating the check. In fact, it has a few
downsides:

  (1) It confuses the semantics of the API, because it is no longer clear
  who "owns" the check

  (2) It duplicates code in each architecture

  (3) Some clever-cloggs will remove at least some of the duplication in
  future

I'm not willing to accept the check in the arm64 code if we update the
OF code.

I think the way forward here is for John to fix the crash he reported by
adding the check to the OF code.


I was planning on doing that.

If somebody wants to follow up with

subsequent patches to move more of the checking out of the arch code, then
we can review that as a separate series.


Cheers,
John



Will

___
linux-arm-kernel mailing list
linux-arm-ker...@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

.






Re: [PATCH] arm64/numa: Add more vetting in numa_set_distance()

2018-11-01 Thread John Garry


Agreed. slit_valid() on the ACPI parsing is currently enforcing that before
acpi_numa_slit_init() which would call into numa_set_distance(). Hence arch
NUMA code numa_set_distance() never had the opportunity to do the sanity
checks as ACPI slit_valid() has completely invalidated the table.

Unlike ACPI path, of_numa_parse_distance_map_v1() does not do any sanity
checks on the distance values parse from the "distance-matrix" property
and all the checks directly falls on numa_set_distance(). This needs to
be fixed in line with ACPI

* If (to == from) ---> distance = LOCAL_DISTANCE
* If (to != from) ---> distance > LOCAL_DISTANCE

At the same time its okay to just enhance numa_set_distance() test coverage
to include this new test. If we would have trusted firmware parsing all the
way, existing basic checks about node range, distance stuff should not have
been there in numa_set_distance(). Hence IMHO even if we fix the OF driver
part, we should include this new check there as well.


I don't see what we gain by duplicating the check. In fact, it has a few
downsides:

  (1) It confuses the semantics of the API, because it is no longer clear
  who "owns" the check

  (2) It duplicates code in each architecture

  (3) Some clever-cloggs will remove at least some of the duplication in
  future

I'm not willing to accept the check in the arm64 code if we update the
OF code.

I think the way forward here is for John to fix the crash he reported by
adding the check to the OF code.


I was planning on doing that.

If somebody wants to follow up with

subsequent patches to move more of the checking out of the arch code, then
we can review that as a separate series.


Cheers,
John



Will

___
linux-arm-kernel mailing list
linux-arm-ker...@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

.






Re: [PATCH] arm64/numa: Add more vetting in numa_set_distance()

2018-11-01 Thread Will Deacon
[ Nit: Please wrap your lines when replying -- I've done it for you here ]

On Tue, Oct 30, 2018 at 08:16:21AM +0530, Anshuman Khandual wrote:
> On 10/29/2018 08:14 PM, John Garry wrote:
> >  I think we should either factor out the sanity check
> >> into a core helper or make the core code robust to these funny 
> >> configurations.
> >
> > OK, so to me it would make sense to factor out a sanity check into a 
> > core
> > helper.
> 
>  That, or have the OF code perform the same validation that slit_valid() 
>  is
>  doing for ACPI. I'm just trying to avoid other architectures running into
>  this problem down the line.
> 
> >>>
> >>> Right, OF code should do this validation job if ACPI is doing it
> >>> (especially since the DT bindings actually specify the distance
> >>> rules), and not rely on the arch NUMA code to accept/reject
> >>> numa_set_distance() combinations.
> >>
> >> I would say this particular condition checking still falls under arch NUMA 
> >> init
> >> code sanity check like other basic tests what numa_set_distance() 
> >> currently does
> >> already but it should not be a necessity for the OF driver to check these.
> > 
> > The checks in the arch NUMA code mean that invalid inter-node distance
> > combinations are ignored.
> 
> Right and should not this new test (from != to && distance == LOCAL_DISTANCE) 
> be
> one of them as well ? numa_set_distance() updates the table or just throws 
> some
> warnings while skipping entries it deems invalid. It would be okay to have 
> this
> new check there in addition to others like this patch suggests.

If we're changing numa_set_distance(), I'd actually be inclined to do the
opposite of what you're suggesting and move as much of this boilerplate
checking into the core code. Perhaps we could add something like
check_fw_numa_topology() and have both ACPI and OF call that so that the
arch doesn't need to worry about malformed tables at all.

> > However, if any entries in the table are invalid, then the whole table
> > can be discarded as none of it can be believed, i.e. it's better to
> > validate the table.
> >
> 
> Agreed. slit_valid() on the ACPI parsing is currently enforcing that before
> acpi_numa_slit_init() which would call into numa_set_distance(). Hence arch
> NUMA code numa_set_distance() never had the opportunity to do the sanity
> checks as ACPI slit_valid() has completely invalidated the table.
> 
> Unlike ACPI path, of_numa_parse_distance_map_v1() does not do any sanity
> checks on the distance values parse from the "distance-matrix" property
> and all the checks directly falls on numa_set_distance(). This needs to
> be fixed in line with ACPI
> 
> * If (to == from) ---> distance = LOCAL_DISTANCE
> * If (to != from) ---> distance > LOCAL_DISTANCE
> 
> At the same time its okay to just enhance numa_set_distance() test coverage
> to include this new test. If we would have trusted firmware parsing all the
> way, existing basic checks about node range, distance stuff should not have
> been there in numa_set_distance(). Hence IMHO even if we fix the OF driver
> part, we should include this new check there as well.

I don't see what we gain by duplicating the check. In fact, it has a few
downsides:

  (1) It confuses the semantics of the API, because it is no longer clear
  who "owns" the check

  (2) It duplicates code in each architecture

  (3) Some clever-cloggs will remove at least some of the duplication in
  future

I'm not willing to accept the check in the arm64 code if we update the
OF code.

I think the way forward here is for John to fix the crash he reported by
adding the check to the OF code. If somebody wants to follow up with
subsequent patches to move more of the checking out of the arch code, then
we can review that as a separate series.

Will


Re: [PATCH] arm64/numa: Add more vetting in numa_set_distance()

2018-11-01 Thread Will Deacon
[ Nit: Please wrap your lines when replying -- I've done it for you here ]

On Tue, Oct 30, 2018 at 08:16:21AM +0530, Anshuman Khandual wrote:
> On 10/29/2018 08:14 PM, John Garry wrote:
> >  I think we should either factor out the sanity check
> >> into a core helper or make the core code robust to these funny 
> >> configurations.
> >
> > OK, so to me it would make sense to factor out a sanity check into a 
> > core
> > helper.
> 
>  That, or have the OF code perform the same validation that slit_valid() 
>  is
>  doing for ACPI. I'm just trying to avoid other architectures running into
>  this problem down the line.
> 
> >>>
> >>> Right, OF code should do this validation job if ACPI is doing it
> >>> (especially since the DT bindings actually specify the distance
> >>> rules), and not rely on the arch NUMA code to accept/reject
> >>> numa_set_distance() combinations.
> >>
> >> I would say this particular condition checking still falls under arch NUMA 
> >> init
> >> code sanity check like other basic tests what numa_set_distance() 
> >> currently does
> >> already but it should not be a necessity for the OF driver to check these.
> > 
> > The checks in the arch NUMA code mean that invalid inter-node distance
> > combinations are ignored.
> 
> Right and should not this new test (from != to && distance == LOCAL_DISTANCE) 
> be
> one of them as well ? numa_set_distance() updates the table or just throws 
> some
> warnings while skipping entries it deems invalid. It would be okay to have 
> this
> new check there in addition to others like this patch suggests.

If we're changing numa_set_distance(), I'd actually be inclined to do the
opposite of what you're suggesting and move as much of this boilerplate
checking into the core code. Perhaps we could add something like
check_fw_numa_topology() and have both ACPI and OF call that so that the
arch doesn't need to worry about malformed tables at all.

> > However, if any entries in the table are invalid, then the whole table
> > can be discarded as none of it can be believed, i.e. it's better to
> > validate the table.
> >
> 
> Agreed. slit_valid() on the ACPI parsing is currently enforcing that before
> acpi_numa_slit_init() which would call into numa_set_distance(). Hence arch
> NUMA code numa_set_distance() never had the opportunity to do the sanity
> checks as ACPI slit_valid() has completely invalidated the table.
> 
> Unlike ACPI path, of_numa_parse_distance_map_v1() does not do any sanity
> checks on the distance values parse from the "distance-matrix" property
> and all the checks directly falls on numa_set_distance(). This needs to
> be fixed in line with ACPI
> 
> * If (to == from) ---> distance = LOCAL_DISTANCE
> * If (to != from) ---> distance > LOCAL_DISTANCE
> 
> At the same time its okay to just enhance numa_set_distance() test coverage
> to include this new test. If we would have trusted firmware parsing all the
> way, existing basic checks about node range, distance stuff should not have
> been there in numa_set_distance(). Hence IMHO even if we fix the OF driver
> part, we should include this new check there as well.

I don't see what we gain by duplicating the check. In fact, it has a few
downsides:

  (1) It confuses the semantics of the API, because it is no longer clear
  who "owns" the check

  (2) It duplicates code in each architecture

  (3) Some clever-cloggs will remove at least some of the duplication in
  future

I'm not willing to accept the check in the arm64 code if we update the
OF code.

I think the way forward here is for John to fix the crash he reported by
adding the check to the OF code. If somebody wants to follow up with
subsequent patches to move more of the checking out of the arch code, then
we can review that as a separate series.

Will


Re: [PATCH] arm64/numa: Add more vetting in numa_set_distance()

2018-10-29 Thread Anshuman Khandual



On 10/29/2018 08:18 PM, Will Deacon wrote:
> On Mon, Oct 29, 2018 at 06:15:42PM +0530, Anshuman Khandual wrote:
>> On 10/29/2018 06:02 PM, John Garry wrote:
>>> On 29/10/2018 12:16, Will Deacon wrote:
 On Mon, Oct 29, 2018 at 12:14:09PM +, John Garry wrote:
> On 29/10/2018 11:25, Will Deacon wrote:
>> On Fri, Oct 26, 2018 at 09:57:47PM +0800, John Garry wrote:
>>> Currently it is acceptable to set the distance between 2 separate nodes 
>>> to
>>> LOCAL_DISTANCE.
>>>
>>> Reject this as it is invalid.
>>>
>>> This change avoids a crash reported in [1].
>>>
>>> [1] https://www.spinics.net/lists/arm-kernel/msg683304.html
>>>
>>> Signed-off-by: John Garry 
>>>
>>> diff --git a/arch/arm64/mm/numa.c b/arch/arm64/mm/numa.c
>>> index 146c04c..6092e3d 100644
>>> --- a/arch/arm64/mm/numa.c
>>> +++ b/arch/arm64/mm/numa.c
>>> @@ -335,7 +335,8 @@ void __init numa_set_distance(int from, int to, int 
>>> distance)
>>> }
>>>
>>> if ((u8)distance != distance ||
>>> -    (from == to && distance != LOCAL_DISTANCE)) {
>>> +    (from == to && distance != LOCAL_DISTANCE) ||
>>> +    (from != to && distance == LOCAL_DISTANCE)) {
>>
>> The current code here is more-or-less lifted from the x86 implementation
>> of numa_set_distance().
>
> Right, I did notice this. I didn't think that x86 folks would be so
> concerned since they generally only use ACPI, and the ACPI code already
> validates these distances in drivers/acpi/numa.c: slit_valid() [unlike OF
> code].
>
>  I think we should either factor out the sanity check
>> into a core helper or make the core code robust to these funny 
>> configurations.
>
> OK, so to me it would make sense to factor out a sanity check into a core
> helper.

 That, or have the OF code perform the same validation that slit_valid() is
 doing for ACPI. I'm just trying to avoid other architectures running into
 this problem down the line.

>>>
>>> Right, OF code should do this validation job if ACPI is doing it
>>> (especially since the DT bindings actually specify the distance rules),
>>> and not rely on the arch NUMA code to accept/reject numa_set_distance()
>>> combinations.
>>
>> I would say this particular condition checking still falls under arch NUMA 
>> init
>> code sanity check like other basic tests what numa_set_distance() currently 
>> does
>> already but it should not be a necessity for the OF driver to check these. 
>> It can
>> choose to check but arch NUMA should check basic things like two different 
>> NUMA
>> nodes should not have LOCAL_DISTANCE as distance like in this case.
>>
>>  (from == to && distance != LOCAL_DISTANCE) ||
>>  (from != to && distance == LOCAL_DISTANCE))
>>
>>
>>>
>>> And, in addition to this, I'd say OF should disable NUMA if given an
>>> invalid table (like ACPI does).
>>
>> Taking a decision to disable NUMA should be with kernel (arch NUMA) once 
>> kernel
>> starts booting. Platform should have sent right values, OF driver trying to
>> adjust stuff what platform has sent with FDT once the kernel starts booting 
>> is
>> not right. For example "Kernel NUMA wont like the distance factors lets clean
>> then up before passing on to MM". Disabling NUMA is one such major decision 
>> which
>> should be with arch NUMA code not with OF driver.
> 
> I don't fully understand what you're getting at here, but why would the
> check posted by John be arch-specific? It's already done in the core code
> for ACPI, so there's a discrepancy between ACPI and FDT that should be
> resolved. I'd also argue that the subtleties of this check are actually
> based on what the core code is willing to accept in terms of the NUMA
> description, so it's also the best place to enforce it.

Agreed. I had overlooked the existing semantics with respect to ACPI parsing.
Yes, there is a discrepancy with respect to FDT which should be fixed. But
IMHO its also worth to enhance numa_set_distance() checks with this proposed
new check as well.


Re: [PATCH] arm64/numa: Add more vetting in numa_set_distance()

2018-10-29 Thread Anshuman Khandual



On 10/29/2018 08:18 PM, Will Deacon wrote:
> On Mon, Oct 29, 2018 at 06:15:42PM +0530, Anshuman Khandual wrote:
>> On 10/29/2018 06:02 PM, John Garry wrote:
>>> On 29/10/2018 12:16, Will Deacon wrote:
 On Mon, Oct 29, 2018 at 12:14:09PM +, John Garry wrote:
> On 29/10/2018 11:25, Will Deacon wrote:
>> On Fri, Oct 26, 2018 at 09:57:47PM +0800, John Garry wrote:
>>> Currently it is acceptable to set the distance between 2 separate nodes 
>>> to
>>> LOCAL_DISTANCE.
>>>
>>> Reject this as it is invalid.
>>>
>>> This change avoids a crash reported in [1].
>>>
>>> [1] https://www.spinics.net/lists/arm-kernel/msg683304.html
>>>
>>> Signed-off-by: John Garry 
>>>
>>> diff --git a/arch/arm64/mm/numa.c b/arch/arm64/mm/numa.c
>>> index 146c04c..6092e3d 100644
>>> --- a/arch/arm64/mm/numa.c
>>> +++ b/arch/arm64/mm/numa.c
>>> @@ -335,7 +335,8 @@ void __init numa_set_distance(int from, int to, int 
>>> distance)
>>> }
>>>
>>> if ((u8)distance != distance ||
>>> -    (from == to && distance != LOCAL_DISTANCE)) {
>>> +    (from == to && distance != LOCAL_DISTANCE) ||
>>> +    (from != to && distance == LOCAL_DISTANCE)) {
>>
>> The current code here is more-or-less lifted from the x86 implementation
>> of numa_set_distance().
>
> Right, I did notice this. I didn't think that x86 folks would be so
> concerned since they generally only use ACPI, and the ACPI code already
> validates these distances in drivers/acpi/numa.c: slit_valid() [unlike OF
> code].
>
>  I think we should either factor out the sanity check
>> into a core helper or make the core code robust to these funny 
>> configurations.
>
> OK, so to me it would make sense to factor out a sanity check into a core
> helper.

 That, or have the OF code perform the same validation that slit_valid() is
 doing for ACPI. I'm just trying to avoid other architectures running into
 this problem down the line.

>>>
>>> Right, OF code should do this validation job if ACPI is doing it
>>> (especially since the DT bindings actually specify the distance rules),
>>> and not rely on the arch NUMA code to accept/reject numa_set_distance()
>>> combinations.
>>
>> I would say this particular condition checking still falls under arch NUMA 
>> init
>> code sanity check like other basic tests what numa_set_distance() currently 
>> does
>> already but it should not be a necessity for the OF driver to check these. 
>> It can
>> choose to check but arch NUMA should check basic things like two different 
>> NUMA
>> nodes should not have LOCAL_DISTANCE as distance like in this case.
>>
>>  (from == to && distance != LOCAL_DISTANCE) ||
>>  (from != to && distance == LOCAL_DISTANCE))
>>
>>
>>>
>>> And, in addition to this, I'd say OF should disable NUMA if given an
>>> invalid table (like ACPI does).
>>
>> Taking a decision to disable NUMA should be with kernel (arch NUMA) once 
>> kernel
>> starts booting. Platform should have sent right values, OF driver trying to
>> adjust stuff what platform has sent with FDT once the kernel starts booting 
>> is
>> not right. For example "Kernel NUMA wont like the distance factors lets clean
>> then up before passing on to MM". Disabling NUMA is one such major decision 
>> which
>> should be with arch NUMA code not with OF driver.
> 
> I don't fully understand what you're getting at here, but why would the
> check posted by John be arch-specific? It's already done in the core code
> for ACPI, so there's a discrepancy between ACPI and FDT that should be
> resolved. I'd also argue that the subtleties of this check are actually
> based on what the core code is willing to accept in terms of the NUMA
> description, so it's also the best place to enforce it.

Agreed. I had overlooked the existing semantics with respect to ACPI parsing.
Yes, there is a discrepancy with respect to FDT which should be fixed. But
IMHO its also worth to enhance numa_set_distance() checks with this proposed
new check as well.


Re: [PATCH] arm64/numa: Add more vetting in numa_set_distance()

2018-10-29 Thread Anshuman Khandual



On 10/29/2018 08:14 PM, John Garry wrote:
>
>  I think we should either factor out the sanity check
>> into a core helper or make the core code robust to these funny 
>> configurations.
>
> OK, so to me it would make sense to factor out a sanity check into a core
> helper.

 That, or have the OF code perform the same validation that slit_valid() is
 doing for ACPI. I'm just trying to avoid other architectures running into
 this problem down the line.

>>>
>>> Right, OF code should do this validation job if ACPI is doing it 
>>> (especially since the DT bindings actually specify the distance rules), and 
>>> not rely on the arch NUMA code to accept/reject numa_set_distance() 
>>> combinations.
>>
>> I would say this particular condition checking still falls under arch NUMA 
>> init
>> code sanity check like other basic tests what numa_set_distance() currently 
>> does
>> already but it should not be a necessity for the OF driver to check these.
> 
> The checks in the arch NUMA code mean that invalid inter-node distance 
> combinations are ignored.

Right and should not this new test (from != to && distance == LOCAL_DISTANCE) be
one of them as well ? numa_set_distance() updates the table or just throws some
warnings while skipping entries it deems invalid. It would be okay to have this
new check there in addition to others like this patch suggests.

> 
> However, if any entries in the table are invalid, then the whole table can be 
> discarded as none of it can be believed, i.e. it's better to validate the 
> table.
>

Agreed. slit_valid() on the ACPI parsing is currently enforcing that before
acpi_numa_slit_init() which would call into numa_set_distance(). Hence arch
NUMA code numa_set_distance() never had the opportunity to do the sanity
checks as ACPI slit_valid() has completely invalidated the table.

Unlike ACPI path, of_numa_parse_distance_map_v1() does not do any sanity
checks on the distance values parse from the "distance-matrix" property
and all the checks directly falls on numa_set_distance(). This needs to
be fixed in line with ACPI

* If (to == from) ---> distance = LOCAL_DISTANCE
* If (to != from) ---> distance > LOCAL_DISTANCE

At the same time its okay to just enhance numa_set_distance() test coverage
to include this new test. If we would have trusted firmware parsing all the
way, existing basic checks about node range, distance stuff should not have
been there in numa_set_distance(). Hence IMHO even if we fix the OF driver
part, we should include this new check there as well.

> It can
>> choose to check but arch NUMA should check basic things like two different 
>> NUMA
>> nodes should not have LOCAL_DISTANCE as distance like in this case.
>>
>> (from == to && distance != LOCAL_DISTANCE) ||
>>     (from != to && distance == LOCAL_DISTANCE))
>>
>>
>>>
>>> And, in addition to this, I'd say OF should disable NUMA if given an 
>>> invalid table (like ACPI does).
>>
>> Taking a decision to disable NUMA should be with kernel (arch NUMA) once 
>> kernel
>> starts booting. Platform should have sent right values, OF driver trying to
>> adjust stuff what platform has sent with FDT once the kernel starts booting 
>> is
>> not right. For example "Kernel NUMA wont like the distance factors lets clean
>> then up before passing on to MM".
> 
> Sorry, but I don't know who was advocating this.

I was just giving an example. Invalidating NUMA distance table during firmware
table (ACPI or FDT) parsing forces arm64_numa_init() to fall back on dummy NUMA
node which is like disabling NUMA. But that is the current semantics with ACPI
parsing which I overlooked. Fixing OF driver to do the same wont extend this
any further, hence my previous concern does not stand valid.

> 
> Disabling NUMA is one such major decision which
>> should be with arch NUMA code not with OF driver.
> 
> I meant parsing the table would fail, so arch NUMA would fall back on dummy 
> NUMA.

Right and ACPI parsing does that and can force a fallback on a dummy NUMA node.


Re: [PATCH] arm64/numa: Add more vetting in numa_set_distance()

2018-10-29 Thread Anshuman Khandual



On 10/29/2018 08:14 PM, John Garry wrote:
>
>  I think we should either factor out the sanity check
>> into a core helper or make the core code robust to these funny 
>> configurations.
>
> OK, so to me it would make sense to factor out a sanity check into a core
> helper.

 That, or have the OF code perform the same validation that slit_valid() is
 doing for ACPI. I'm just trying to avoid other architectures running into
 this problem down the line.

>>>
>>> Right, OF code should do this validation job if ACPI is doing it 
>>> (especially since the DT bindings actually specify the distance rules), and 
>>> not rely on the arch NUMA code to accept/reject numa_set_distance() 
>>> combinations.
>>
>> I would say this particular condition checking still falls under arch NUMA 
>> init
>> code sanity check like other basic tests what numa_set_distance() currently 
>> does
>> already but it should not be a necessity for the OF driver to check these.
> 
> The checks in the arch NUMA code mean that invalid inter-node distance 
> combinations are ignored.

Right and should not this new test (from != to && distance == LOCAL_DISTANCE) be
one of them as well ? numa_set_distance() updates the table or just throws some
warnings while skipping entries it deems invalid. It would be okay to have this
new check there in addition to others like this patch suggests.

> 
> However, if any entries in the table are invalid, then the whole table can be 
> discarded as none of it can be believed, i.e. it's better to validate the 
> table.
>

Agreed. slit_valid() on the ACPI parsing is currently enforcing that before
acpi_numa_slit_init() which would call into numa_set_distance(). Hence arch
NUMA code numa_set_distance() never had the opportunity to do the sanity
checks as ACPI slit_valid() has completely invalidated the table.

Unlike ACPI path, of_numa_parse_distance_map_v1() does not do any sanity
checks on the distance values parse from the "distance-matrix" property
and all the checks directly falls on numa_set_distance(). This needs to
be fixed in line with ACPI

* If (to == from) ---> distance = LOCAL_DISTANCE
* If (to != from) ---> distance > LOCAL_DISTANCE

At the same time its okay to just enhance numa_set_distance() test coverage
to include this new test. If we would have trusted firmware parsing all the
way, existing basic checks about node range, distance stuff should not have
been there in numa_set_distance(). Hence IMHO even if we fix the OF driver
part, we should include this new check there as well.

> It can
>> choose to check but arch NUMA should check basic things like two different 
>> NUMA
>> nodes should not have LOCAL_DISTANCE as distance like in this case.
>>
>> (from == to && distance != LOCAL_DISTANCE) ||
>>     (from != to && distance == LOCAL_DISTANCE))
>>
>>
>>>
>>> And, in addition to this, I'd say OF should disable NUMA if given an 
>>> invalid table (like ACPI does).
>>
>> Taking a decision to disable NUMA should be with kernel (arch NUMA) once 
>> kernel
>> starts booting. Platform should have sent right values, OF driver trying to
>> adjust stuff what platform has sent with FDT once the kernel starts booting 
>> is
>> not right. For example "Kernel NUMA wont like the distance factors lets clean
>> then up before passing on to MM".
> 
> Sorry, but I don't know who was advocating this.

I was just giving an example. Invalidating NUMA distance table during firmware
table (ACPI or FDT) parsing forces arm64_numa_init() to fall back on dummy NUMA
node which is like disabling NUMA. But that is the current semantics with ACPI
parsing which I overlooked. Fixing OF driver to do the same wont extend this
any further, hence my previous concern does not stand valid.

> 
> Disabling NUMA is one such major decision which
>> should be with arch NUMA code not with OF driver.
> 
> I meant parsing the table would fail, so arch NUMA would fall back on dummy 
> NUMA.

Right and ACPI parsing does that and can force a fallback on a dummy NUMA node.


Re: [PATCH] arm64/numa: Add more vetting in numa_set_distance()

2018-10-29 Thread Will Deacon
On Mon, Oct 29, 2018 at 06:15:42PM +0530, Anshuman Khandual wrote:
> On 10/29/2018 06:02 PM, John Garry wrote:
> > On 29/10/2018 12:16, Will Deacon wrote:
> >> On Mon, Oct 29, 2018 at 12:14:09PM +, John Garry wrote:
> >>> On 29/10/2018 11:25, Will Deacon wrote:
>  On Fri, Oct 26, 2018 at 09:57:47PM +0800, John Garry wrote:
> > Currently it is acceptable to set the distance between 2 separate nodes 
> > to
> > LOCAL_DISTANCE.
> >
> > Reject this as it is invalid.
> >
> > This change avoids a crash reported in [1].
> >
> > [1] https://www.spinics.net/lists/arm-kernel/msg683304.html
> >
> > Signed-off-by: John Garry 
> >
> > diff --git a/arch/arm64/mm/numa.c b/arch/arm64/mm/numa.c
> > index 146c04c..6092e3d 100644
> > --- a/arch/arm64/mm/numa.c
> > +++ b/arch/arm64/mm/numa.c
> > @@ -335,7 +335,8 @@ void __init numa_set_distance(int from, int to, int 
> > distance)
> > }
> >
> > if ((u8)distance != distance ||
> > -    (from == to && distance != LOCAL_DISTANCE)) {
> > +    (from == to && distance != LOCAL_DISTANCE) ||
> > +    (from != to && distance == LOCAL_DISTANCE)) {
> 
>  The current code here is more-or-less lifted from the x86 implementation
>  of numa_set_distance().
> >>>
> >>> Right, I did notice this. I didn't think that x86 folks would be so
> >>> concerned since they generally only use ACPI, and the ACPI code already
> >>> validates these distances in drivers/acpi/numa.c: slit_valid() [unlike OF
> >>> code].
> >>>
> >>>  I think we should either factor out the sanity check
>  into a core helper or make the core code robust to these funny 
>  configurations.
> >>>
> >>> OK, so to me it would make sense to factor out a sanity check into a core
> >>> helper.
> >>
> >> That, or have the OF code perform the same validation that slit_valid() is
> >> doing for ACPI. I'm just trying to avoid other architectures running into
> >> this problem down the line.
> >>
> > 
> > Right, OF code should do this validation job if ACPI is doing it
> > (especially since the DT bindings actually specify the distance rules),
> > and not rely on the arch NUMA code to accept/reject numa_set_distance()
> > combinations.
> 
> I would say this particular condition checking still falls under arch NUMA 
> init
> code sanity check like other basic tests what numa_set_distance() currently 
> does
> already but it should not be a necessity for the OF driver to check these. It 
> can
> choose to check but arch NUMA should check basic things like two different 
> NUMA
> nodes should not have LOCAL_DISTANCE as distance like in this case.
> 
>   (from == to && distance != LOCAL_DISTANCE) ||
>   (from != to && distance == LOCAL_DISTANCE))
> 
> 
> > 
> > And, in addition to this, I'd say OF should disable NUMA if given an
> > invalid table (like ACPI does).
> 
> Taking a decision to disable NUMA should be with kernel (arch NUMA) once 
> kernel
> starts booting. Platform should have sent right values, OF driver trying to
> adjust stuff what platform has sent with FDT once the kernel starts booting is
> not right. For example "Kernel NUMA wont like the distance factors lets clean
> then up before passing on to MM". Disabling NUMA is one such major decision 
> which
> should be with arch NUMA code not with OF driver.

I don't fully understand what you're getting at here, but why would the
check posted by John be arch-specific? It's already done in the core code
for ACPI, so there's a discrepancy between ACPI and FDT that should be
resolved. I'd also argue that the subtleties of this check are actually
based on what the core code is willing to accept in terms of the NUMA
description, so it's also the best place to enforce it.

Will


Re: [PATCH] arm64/numa: Add more vetting in numa_set_distance()

2018-10-29 Thread Will Deacon
On Mon, Oct 29, 2018 at 06:15:42PM +0530, Anshuman Khandual wrote:
> On 10/29/2018 06:02 PM, John Garry wrote:
> > On 29/10/2018 12:16, Will Deacon wrote:
> >> On Mon, Oct 29, 2018 at 12:14:09PM +, John Garry wrote:
> >>> On 29/10/2018 11:25, Will Deacon wrote:
>  On Fri, Oct 26, 2018 at 09:57:47PM +0800, John Garry wrote:
> > Currently it is acceptable to set the distance between 2 separate nodes 
> > to
> > LOCAL_DISTANCE.
> >
> > Reject this as it is invalid.
> >
> > This change avoids a crash reported in [1].
> >
> > [1] https://www.spinics.net/lists/arm-kernel/msg683304.html
> >
> > Signed-off-by: John Garry 
> >
> > diff --git a/arch/arm64/mm/numa.c b/arch/arm64/mm/numa.c
> > index 146c04c..6092e3d 100644
> > --- a/arch/arm64/mm/numa.c
> > +++ b/arch/arm64/mm/numa.c
> > @@ -335,7 +335,8 @@ void __init numa_set_distance(int from, int to, int 
> > distance)
> > }
> >
> > if ((u8)distance != distance ||
> > -    (from == to && distance != LOCAL_DISTANCE)) {
> > +    (from == to && distance != LOCAL_DISTANCE) ||
> > +    (from != to && distance == LOCAL_DISTANCE)) {
> 
>  The current code here is more-or-less lifted from the x86 implementation
>  of numa_set_distance().
> >>>
> >>> Right, I did notice this. I didn't think that x86 folks would be so
> >>> concerned since they generally only use ACPI, and the ACPI code already
> >>> validates these distances in drivers/acpi/numa.c: slit_valid() [unlike OF
> >>> code].
> >>>
> >>>  I think we should either factor out the sanity check
>  into a core helper or make the core code robust to these funny 
>  configurations.
> >>>
> >>> OK, so to me it would make sense to factor out a sanity check into a core
> >>> helper.
> >>
> >> That, or have the OF code perform the same validation that slit_valid() is
> >> doing for ACPI. I'm just trying to avoid other architectures running into
> >> this problem down the line.
> >>
> > 
> > Right, OF code should do this validation job if ACPI is doing it
> > (especially since the DT bindings actually specify the distance rules),
> > and not rely on the arch NUMA code to accept/reject numa_set_distance()
> > combinations.
> 
> I would say this particular condition checking still falls under arch NUMA 
> init
> code sanity check like other basic tests what numa_set_distance() currently 
> does
> already but it should not be a necessity for the OF driver to check these. It 
> can
> choose to check but arch NUMA should check basic things like two different 
> NUMA
> nodes should not have LOCAL_DISTANCE as distance like in this case.
> 
>   (from == to && distance != LOCAL_DISTANCE) ||
>   (from != to && distance == LOCAL_DISTANCE))
> 
> 
> > 
> > And, in addition to this, I'd say OF should disable NUMA if given an
> > invalid table (like ACPI does).
> 
> Taking a decision to disable NUMA should be with kernel (arch NUMA) once 
> kernel
> starts booting. Platform should have sent right values, OF driver trying to
> adjust stuff what platform has sent with FDT once the kernel starts booting is
> not right. For example "Kernel NUMA wont like the distance factors lets clean
> then up before passing on to MM". Disabling NUMA is one such major decision 
> which
> should be with arch NUMA code not with OF driver.

I don't fully understand what you're getting at here, but why would the
check posted by John be arch-specific? It's already done in the core code
for ACPI, so there's a discrepancy between ACPI and FDT that should be
resolved. I'd also argue that the subtleties of this check are actually
based on what the core code is willing to accept in terms of the NUMA
description, so it's also the best place to enforce it.

Will


Re: [PATCH] arm64/numa: Add more vetting in numa_set_distance()

2018-10-29 Thread John Garry


 I think we should either factor out the sanity check

into a core helper or make the core code robust to these funny configurations.


OK, so to me it would make sense to factor out a sanity check into a core
helper.


That, or have the OF code perform the same validation that slit_valid() is
doing for ACPI. I'm just trying to avoid other architectures running into
this problem down the line.



Right, OF code should do this validation job if ACPI is doing it (especially 
since the DT bindings actually specify the distance rules), and not rely on the 
arch NUMA code to accept/reject numa_set_distance() combinations.


I would say this particular condition checking still falls under arch NUMA init
code sanity check like other basic tests what numa_set_distance() currently does
already but it should not be a necessity for the OF driver to check these.


The checks in the arch NUMA code mean that invalid inter-node distance 
combinations are ignored.


However, if any entries in the table are invalid, then the whole table 
can be discarded as none of it can be believed, i.e. it's better to 
validate the table.


It can

choose to check but arch NUMA should check basic things like two different NUMA
nodes should not have LOCAL_DISTANCE as distance like in this case.

(from == to && distance != LOCAL_DISTANCE) ||
(from != to && distance == LOCAL_DISTANCE))




And, in addition to this, I'd say OF should disable NUMA if given an invalid 
table (like ACPI does).


Taking a decision to disable NUMA should be with kernel (arch NUMA) once kernel
starts booting. Platform should have sent right values, OF driver trying to
adjust stuff what platform has sent with FDT once the kernel starts booting is
not right. For example "Kernel NUMA wont like the distance factors lets clean
then up before passing on to MM".


Sorry, but I don't know who was advocating this.

Disabling NUMA is one such major decision which

should be with arch NUMA code not with OF driver.


I meant parsing the table would fail, so arch NUMA would fall back on 
dummy NUMA.






Thanks,
John




Re: [PATCH] arm64/numa: Add more vetting in numa_set_distance()

2018-10-29 Thread John Garry


 I think we should either factor out the sanity check

into a core helper or make the core code robust to these funny configurations.


OK, so to me it would make sense to factor out a sanity check into a core
helper.


That, or have the OF code perform the same validation that slit_valid() is
doing for ACPI. I'm just trying to avoid other architectures running into
this problem down the line.



Right, OF code should do this validation job if ACPI is doing it (especially 
since the DT bindings actually specify the distance rules), and not rely on the 
arch NUMA code to accept/reject numa_set_distance() combinations.


I would say this particular condition checking still falls under arch NUMA init
code sanity check like other basic tests what numa_set_distance() currently does
already but it should not be a necessity for the OF driver to check these.


The checks in the arch NUMA code mean that invalid inter-node distance 
combinations are ignored.


However, if any entries in the table are invalid, then the whole table 
can be discarded as none of it can be believed, i.e. it's better to 
validate the table.


It can

choose to check but arch NUMA should check basic things like two different NUMA
nodes should not have LOCAL_DISTANCE as distance like in this case.

(from == to && distance != LOCAL_DISTANCE) ||
(from != to && distance == LOCAL_DISTANCE))




And, in addition to this, I'd say OF should disable NUMA if given an invalid 
table (like ACPI does).


Taking a decision to disable NUMA should be with kernel (arch NUMA) once kernel
starts booting. Platform should have sent right values, OF driver trying to
adjust stuff what platform has sent with FDT once the kernel starts booting is
not right. For example "Kernel NUMA wont like the distance factors lets clean
then up before passing on to MM".


Sorry, but I don't know who was advocating this.

Disabling NUMA is one such major decision which

should be with arch NUMA code not with OF driver.


I meant parsing the table would fail, so arch NUMA would fall back on 
dummy NUMA.






Thanks,
John




Re: [PATCH] arm64/numa: Add more vetting in numa_set_distance()

2018-10-29 Thread Anshuman Khandual



On 10/29/2018 06:02 PM, John Garry wrote:
> On 29/10/2018 12:16, Will Deacon wrote:
>> On Mon, Oct 29, 2018 at 12:14:09PM +, John Garry wrote:
>>> On 29/10/2018 11:25, Will Deacon wrote:
 On Fri, Oct 26, 2018 at 09:57:47PM +0800, John Garry wrote:
> Currently it is acceptable to set the distance between 2 separate nodes to
> LOCAL_DISTANCE.
>
> Reject this as it is invalid.
>
> This change avoids a crash reported in [1].
>
> [1] https://www.spinics.net/lists/arm-kernel/msg683304.html
>
> Signed-off-by: John Garry 
>
> diff --git a/arch/arm64/mm/numa.c b/arch/arm64/mm/numa.c
> index 146c04c..6092e3d 100644
> --- a/arch/arm64/mm/numa.c
> +++ b/arch/arm64/mm/numa.c
> @@ -335,7 +335,8 @@ void __init numa_set_distance(int from, int to, int 
> distance)
> }
>
> if ((u8)distance != distance ||
> -    (from == to && distance != LOCAL_DISTANCE)) {
> +    (from == to && distance != LOCAL_DISTANCE) ||
> +    (from != to && distance == LOCAL_DISTANCE)) {

 The current code here is more-or-less lifted from the x86 implementation
 of numa_set_distance().
>>>
>>> Right, I did notice this. I didn't think that x86 folks would be so
>>> concerned since they generally only use ACPI, and the ACPI code already
>>> validates these distances in drivers/acpi/numa.c: slit_valid() [unlike OF
>>> code].
>>>
>>>  I think we should either factor out the sanity check
 into a core helper or make the core code robust to these funny 
 configurations.
>>>
>>> OK, so to me it would make sense to factor out a sanity check into a core
>>> helper.
>>
>> That, or have the OF code perform the same validation that slit_valid() is
>> doing for ACPI. I'm just trying to avoid other architectures running into
>> this problem down the line.
>>
> 
> Right, OF code should do this validation job if ACPI is doing it (especially 
> since the DT bindings actually specify the distance rules), and not rely on 
> the arch NUMA code to accept/reject numa_set_distance() combinations.

I would say this particular condition checking still falls under arch NUMA init
code sanity check like other basic tests what numa_set_distance() currently does
already but it should not be a necessity for the OF driver to check these. It 
can
choose to check but arch NUMA should check basic things like two different NUMA
nodes should not have LOCAL_DISTANCE as distance like in this case.

(from == to && distance != LOCAL_DISTANCE) ||
(from != to && distance == LOCAL_DISTANCE))


> 
> And, in addition to this, I'd say OF should disable NUMA if given an invalid 
> table (like ACPI does).

Taking a decision to disable NUMA should be with kernel (arch NUMA) once kernel
starts booting. Platform should have sent right values, OF driver trying to
adjust stuff what platform has sent with FDT once the kernel starts booting is
not right. For example "Kernel NUMA wont like the distance factors lets clean
then up before passing on to MM". Disabling NUMA is one such major decision 
which
should be with arch NUMA code not with OF driver.


Re: [PATCH] arm64/numa: Add more vetting in numa_set_distance()

2018-10-29 Thread Anshuman Khandual



On 10/29/2018 06:02 PM, John Garry wrote:
> On 29/10/2018 12:16, Will Deacon wrote:
>> On Mon, Oct 29, 2018 at 12:14:09PM +, John Garry wrote:
>>> On 29/10/2018 11:25, Will Deacon wrote:
 On Fri, Oct 26, 2018 at 09:57:47PM +0800, John Garry wrote:
> Currently it is acceptable to set the distance between 2 separate nodes to
> LOCAL_DISTANCE.
>
> Reject this as it is invalid.
>
> This change avoids a crash reported in [1].
>
> [1] https://www.spinics.net/lists/arm-kernel/msg683304.html
>
> Signed-off-by: John Garry 
>
> diff --git a/arch/arm64/mm/numa.c b/arch/arm64/mm/numa.c
> index 146c04c..6092e3d 100644
> --- a/arch/arm64/mm/numa.c
> +++ b/arch/arm64/mm/numa.c
> @@ -335,7 +335,8 @@ void __init numa_set_distance(int from, int to, int 
> distance)
> }
>
> if ((u8)distance != distance ||
> -    (from == to && distance != LOCAL_DISTANCE)) {
> +    (from == to && distance != LOCAL_DISTANCE) ||
> +    (from != to && distance == LOCAL_DISTANCE)) {

 The current code here is more-or-less lifted from the x86 implementation
 of numa_set_distance().
>>>
>>> Right, I did notice this. I didn't think that x86 folks would be so
>>> concerned since they generally only use ACPI, and the ACPI code already
>>> validates these distances in drivers/acpi/numa.c: slit_valid() [unlike OF
>>> code].
>>>
>>>  I think we should either factor out the sanity check
 into a core helper or make the core code robust to these funny 
 configurations.
>>>
>>> OK, so to me it would make sense to factor out a sanity check into a core
>>> helper.
>>
>> That, or have the OF code perform the same validation that slit_valid() is
>> doing for ACPI. I'm just trying to avoid other architectures running into
>> this problem down the line.
>>
> 
> Right, OF code should do this validation job if ACPI is doing it (especially 
> since the DT bindings actually specify the distance rules), and not rely on 
> the arch NUMA code to accept/reject numa_set_distance() combinations.

I would say this particular condition checking still falls under arch NUMA init
code sanity check like other basic tests what numa_set_distance() currently does
already but it should not be a necessity for the OF driver to check these. It 
can
choose to check but arch NUMA should check basic things like two different NUMA
nodes should not have LOCAL_DISTANCE as distance like in this case.

(from == to && distance != LOCAL_DISTANCE) ||
(from != to && distance == LOCAL_DISTANCE))


> 
> And, in addition to this, I'd say OF should disable NUMA if given an invalid 
> table (like ACPI does).

Taking a decision to disable NUMA should be with kernel (arch NUMA) once kernel
starts booting. Platform should have sent right values, OF driver trying to
adjust stuff what platform has sent with FDT once the kernel starts booting is
not right. For example "Kernel NUMA wont like the distance factors lets clean
then up before passing on to MM". Disabling NUMA is one such major decision 
which
should be with arch NUMA code not with OF driver.


Re: [PATCH] arm64/numa: Add more vetting in numa_set_distance()

2018-10-29 Thread John Garry

On 29/10/2018 12:16, Will Deacon wrote:

On Mon, Oct 29, 2018 at 12:14:09PM +, John Garry wrote:

On 29/10/2018 11:25, Will Deacon wrote:

On Fri, Oct 26, 2018 at 09:57:47PM +0800, John Garry wrote:

Currently it is acceptable to set the distance between 2 separate nodes to
LOCAL_DISTANCE.

Reject this as it is invalid.

This change avoids a crash reported in [1].

[1] https://www.spinics.net/lists/arm-kernel/msg683304.html

Signed-off-by: John Garry 

diff --git a/arch/arm64/mm/numa.c b/arch/arm64/mm/numa.c
index 146c04c..6092e3d 100644
--- a/arch/arm64/mm/numa.c
+++ b/arch/arm64/mm/numa.c
@@ -335,7 +335,8 @@ void __init numa_set_distance(int from, int to, int 
distance)
}

if ((u8)distance != distance ||
-   (from == to && distance != LOCAL_DISTANCE)) {
+   (from == to && distance != LOCAL_DISTANCE) ||
+   (from != to && distance == LOCAL_DISTANCE)) {


The current code here is more-or-less lifted from the x86 implementation
of numa_set_distance().


Right, I did notice this. I didn't think that x86 folks would be so
concerned since they generally only use ACPI, and the ACPI code already
validates these distances in drivers/acpi/numa.c: slit_valid() [unlike OF
code].

 I think we should either factor out the sanity check

into a core helper or make the core code robust to these funny configurations.


OK, so to me it would make sense to factor out a sanity check into a core
helper.


That, or have the OF code perform the same validation that slit_valid() is
doing for ACPI. I'm just trying to avoid other architectures running into
this problem down the line.



Right, OF code should do this validation job if ACPI is doing it 
(especially since the DT bindings actually specify the distance rules), 
and not rely on the arch NUMA code to accept/reject numa_set_distance() 
combinations.


And, in addition to this, I'd say OF should disable NUMA if given an 
invalid table (like ACPI does).


Cheers,
John


Will

.






Re: [PATCH] arm64/numa: Add more vetting in numa_set_distance()

2018-10-29 Thread John Garry

On 29/10/2018 12:16, Will Deacon wrote:

On Mon, Oct 29, 2018 at 12:14:09PM +, John Garry wrote:

On 29/10/2018 11:25, Will Deacon wrote:

On Fri, Oct 26, 2018 at 09:57:47PM +0800, John Garry wrote:

Currently it is acceptable to set the distance between 2 separate nodes to
LOCAL_DISTANCE.

Reject this as it is invalid.

This change avoids a crash reported in [1].

[1] https://www.spinics.net/lists/arm-kernel/msg683304.html

Signed-off-by: John Garry 

diff --git a/arch/arm64/mm/numa.c b/arch/arm64/mm/numa.c
index 146c04c..6092e3d 100644
--- a/arch/arm64/mm/numa.c
+++ b/arch/arm64/mm/numa.c
@@ -335,7 +335,8 @@ void __init numa_set_distance(int from, int to, int 
distance)
}

if ((u8)distance != distance ||
-   (from == to && distance != LOCAL_DISTANCE)) {
+   (from == to && distance != LOCAL_DISTANCE) ||
+   (from != to && distance == LOCAL_DISTANCE)) {


The current code here is more-or-less lifted from the x86 implementation
of numa_set_distance().


Right, I did notice this. I didn't think that x86 folks would be so
concerned since they generally only use ACPI, and the ACPI code already
validates these distances in drivers/acpi/numa.c: slit_valid() [unlike OF
code].

 I think we should either factor out the sanity check

into a core helper or make the core code robust to these funny configurations.


OK, so to me it would make sense to factor out a sanity check into a core
helper.


That, or have the OF code perform the same validation that slit_valid() is
doing for ACPI. I'm just trying to avoid other architectures running into
this problem down the line.



Right, OF code should do this validation job if ACPI is doing it 
(especially since the DT bindings actually specify the distance rules), 
and not rely on the arch NUMA code to accept/reject numa_set_distance() 
combinations.


And, in addition to this, I'd say OF should disable NUMA if given an 
invalid table (like ACPI does).


Cheers,
John


Will

.






Re: [PATCH] arm64/numa: Add more vetting in numa_set_distance()

2018-10-29 Thread Will Deacon
On Mon, Oct 29, 2018 at 12:14:09PM +, John Garry wrote:
> On 29/10/2018 11:25, Will Deacon wrote:
> >On Fri, Oct 26, 2018 at 09:57:47PM +0800, John Garry wrote:
> >>Currently it is acceptable to set the distance between 2 separate nodes to
> >>LOCAL_DISTANCE.
> >>
> >>Reject this as it is invalid.
> >>
> >>This change avoids a crash reported in [1].
> >>
> >>[1] https://www.spinics.net/lists/arm-kernel/msg683304.html
> >>
> >>Signed-off-by: John Garry 
> >>
> >>diff --git a/arch/arm64/mm/numa.c b/arch/arm64/mm/numa.c
> >>index 146c04c..6092e3d 100644
> >>--- a/arch/arm64/mm/numa.c
> >>+++ b/arch/arm64/mm/numa.c
> >>@@ -335,7 +335,8 @@ void __init numa_set_distance(int from, int to, int 
> >>distance)
> >>}
> >>
> >>if ((u8)distance != distance ||
> >>-   (from == to && distance != LOCAL_DISTANCE)) {
> >>+   (from == to && distance != LOCAL_DISTANCE) ||
> >>+   (from != to && distance == LOCAL_DISTANCE)) {
> >
> >The current code here is more-or-less lifted from the x86 implementation
> >of numa_set_distance().
> 
> Right, I did notice this. I didn't think that x86 folks would be so
> concerned since they generally only use ACPI, and the ACPI code already
> validates these distances in drivers/acpi/numa.c: slit_valid() [unlike OF
> code].
> 
>  I think we should either factor out the sanity check
> >into a core helper or make the core code robust to these funny 
> >configurations.
> 
> OK, so to me it would make sense to factor out a sanity check into a core
> helper.

That, or have the OF code perform the same validation that slit_valid() is
doing for ACPI. I'm just trying to avoid other architectures running into
this problem down the line.

Will


Re: [PATCH] arm64/numa: Add more vetting in numa_set_distance()

2018-10-29 Thread Will Deacon
On Mon, Oct 29, 2018 at 12:14:09PM +, John Garry wrote:
> On 29/10/2018 11:25, Will Deacon wrote:
> >On Fri, Oct 26, 2018 at 09:57:47PM +0800, John Garry wrote:
> >>Currently it is acceptable to set the distance between 2 separate nodes to
> >>LOCAL_DISTANCE.
> >>
> >>Reject this as it is invalid.
> >>
> >>This change avoids a crash reported in [1].
> >>
> >>[1] https://www.spinics.net/lists/arm-kernel/msg683304.html
> >>
> >>Signed-off-by: John Garry 
> >>
> >>diff --git a/arch/arm64/mm/numa.c b/arch/arm64/mm/numa.c
> >>index 146c04c..6092e3d 100644
> >>--- a/arch/arm64/mm/numa.c
> >>+++ b/arch/arm64/mm/numa.c
> >>@@ -335,7 +335,8 @@ void __init numa_set_distance(int from, int to, int 
> >>distance)
> >>}
> >>
> >>if ((u8)distance != distance ||
> >>-   (from == to && distance != LOCAL_DISTANCE)) {
> >>+   (from == to && distance != LOCAL_DISTANCE) ||
> >>+   (from != to && distance == LOCAL_DISTANCE)) {
> >
> >The current code here is more-or-less lifted from the x86 implementation
> >of numa_set_distance().
> 
> Right, I did notice this. I didn't think that x86 folks would be so
> concerned since they generally only use ACPI, and the ACPI code already
> validates these distances in drivers/acpi/numa.c: slit_valid() [unlike OF
> code].
> 
>  I think we should either factor out the sanity check
> >into a core helper or make the core code robust to these funny 
> >configurations.
> 
> OK, so to me it would make sense to factor out a sanity check into a core
> helper.

That, or have the OF code perform the same validation that slit_valid() is
doing for ACPI. I'm just trying to avoid other architectures running into
this problem down the line.

Will


Re: [PATCH] arm64/numa: Add more vetting in numa_set_distance()

2018-10-29 Thread John Garry

On 29/10/2018 11:25, Will Deacon wrote:

Hi John,



Hi Will,


On Fri, Oct 26, 2018 at 09:57:47PM +0800, John Garry wrote:

Currently it is acceptable to set the distance between 2 separate nodes to
LOCAL_DISTANCE.

Reject this as it is invalid.

This change avoids a crash reported in [1].

[1] https://www.spinics.net/lists/arm-kernel/msg683304.html

Signed-off-by: John Garry 

diff --git a/arch/arm64/mm/numa.c b/arch/arm64/mm/numa.c
index 146c04c..6092e3d 100644
--- a/arch/arm64/mm/numa.c
+++ b/arch/arm64/mm/numa.c
@@ -335,7 +335,8 @@ void __init numa_set_distance(int from, int to, int 
distance)
}

if ((u8)distance != distance ||
-   (from == to && distance != LOCAL_DISTANCE)) {
+   (from == to && distance != LOCAL_DISTANCE) ||
+   (from != to && distance == LOCAL_DISTANCE)) {


The current code here is more-or-less lifted from the x86 implementation
of numa_set_distance().


Right, I did notice this. I didn't think that x86 folks would be so 
concerned since they generally only use ACPI, and the ACPI code already 
validates these distances in drivers/acpi/numa.c: slit_valid() [unlike 
OF code].


 I think we should either factor out the sanity check

into a core helper or make the core code robust to these funny configurations.


OK, so to me it would make sense to factor out a sanity check into a 
core helper.


Cheers,
John



Will

___
linux-arm-kernel mailing list
linux-arm-ker...@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel







Re: [PATCH] arm64/numa: Add more vetting in numa_set_distance()

2018-10-29 Thread John Garry

On 29/10/2018 11:25, Will Deacon wrote:

Hi John,



Hi Will,


On Fri, Oct 26, 2018 at 09:57:47PM +0800, John Garry wrote:

Currently it is acceptable to set the distance between 2 separate nodes to
LOCAL_DISTANCE.

Reject this as it is invalid.

This change avoids a crash reported in [1].

[1] https://www.spinics.net/lists/arm-kernel/msg683304.html

Signed-off-by: John Garry 

diff --git a/arch/arm64/mm/numa.c b/arch/arm64/mm/numa.c
index 146c04c..6092e3d 100644
--- a/arch/arm64/mm/numa.c
+++ b/arch/arm64/mm/numa.c
@@ -335,7 +335,8 @@ void __init numa_set_distance(int from, int to, int 
distance)
}

if ((u8)distance != distance ||
-   (from == to && distance != LOCAL_DISTANCE)) {
+   (from == to && distance != LOCAL_DISTANCE) ||
+   (from != to && distance == LOCAL_DISTANCE)) {


The current code here is more-or-less lifted from the x86 implementation
of numa_set_distance().


Right, I did notice this. I didn't think that x86 folks would be so 
concerned since they generally only use ACPI, and the ACPI code already 
validates these distances in drivers/acpi/numa.c: slit_valid() [unlike 
OF code].


 I think we should either factor out the sanity check

into a core helper or make the core code robust to these funny configurations.


OK, so to me it would make sense to factor out a sanity check into a 
core helper.


Cheers,
John



Will

___
linux-arm-kernel mailing list
linux-arm-ker...@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel







Re: [PATCH] arm64/numa: Add more vetting in numa_set_distance()

2018-10-29 Thread Will Deacon
Hi John,

On Fri, Oct 26, 2018 at 09:57:47PM +0800, John Garry wrote:
> Currently it is acceptable to set the distance between 2 separate nodes to
> LOCAL_DISTANCE.
> 
> Reject this as it is invalid.
> 
> This change avoids a crash reported in [1].
> 
> [1] https://www.spinics.net/lists/arm-kernel/msg683304.html
> 
> Signed-off-by: John Garry 
> 
> diff --git a/arch/arm64/mm/numa.c b/arch/arm64/mm/numa.c
> index 146c04c..6092e3d 100644
> --- a/arch/arm64/mm/numa.c
> +++ b/arch/arm64/mm/numa.c
> @@ -335,7 +335,8 @@ void __init numa_set_distance(int from, int to, int 
> distance)
>   }
>  
>   if ((u8)distance != distance ||
> - (from == to && distance != LOCAL_DISTANCE)) {
> + (from == to && distance != LOCAL_DISTANCE) ||
> + (from != to && distance == LOCAL_DISTANCE)) {

The current code here is more-or-less lifted from the x86 implementation
of numa_set_distance(). I think we should either factor out the sanity check
into a core helper or make the core code robust to these funny configurations.

Will


Re: [PATCH] arm64/numa: Add more vetting in numa_set_distance()

2018-10-29 Thread Will Deacon
Hi John,

On Fri, Oct 26, 2018 at 09:57:47PM +0800, John Garry wrote:
> Currently it is acceptable to set the distance between 2 separate nodes to
> LOCAL_DISTANCE.
> 
> Reject this as it is invalid.
> 
> This change avoids a crash reported in [1].
> 
> [1] https://www.spinics.net/lists/arm-kernel/msg683304.html
> 
> Signed-off-by: John Garry 
> 
> diff --git a/arch/arm64/mm/numa.c b/arch/arm64/mm/numa.c
> index 146c04c..6092e3d 100644
> --- a/arch/arm64/mm/numa.c
> +++ b/arch/arm64/mm/numa.c
> @@ -335,7 +335,8 @@ void __init numa_set_distance(int from, int to, int 
> distance)
>   }
>  
>   if ((u8)distance != distance ||
> - (from == to && distance != LOCAL_DISTANCE)) {
> + (from == to && distance != LOCAL_DISTANCE) ||
> + (from != to && distance == LOCAL_DISTANCE)) {

The current code here is more-or-less lifted from the x86 implementation
of numa_set_distance(). I think we should either factor out the sanity check
into a core helper or make the core code robust to these funny configurations.

Will


[PATCH] arm64/numa: Add more vetting in numa_set_distance()

2018-10-26 Thread John Garry
Currently it is acceptable to set the distance between 2 separate nodes to
LOCAL_DISTANCE.

Reject this as it is invalid.

This change avoids a crash reported in [1].

[1] https://www.spinics.net/lists/arm-kernel/msg683304.html

Signed-off-by: John Garry 

diff --git a/arch/arm64/mm/numa.c b/arch/arm64/mm/numa.c
index 146c04c..6092e3d 100644
--- a/arch/arm64/mm/numa.c
+++ b/arch/arm64/mm/numa.c
@@ -335,7 +335,8 @@ void __init numa_set_distance(int from, int to, int 
distance)
}
 
if ((u8)distance != distance ||
-   (from == to && distance != LOCAL_DISTANCE)) {
+   (from == to && distance != LOCAL_DISTANCE) ||
+   (from != to && distance == LOCAL_DISTANCE)) {
pr_warn_once("Warning: invalid distance parameter, from=%d 
to=%d distance=%d\n",
 from, to, distance);
return;
-- 
1.9.1



[PATCH] arm64/numa: Add more vetting in numa_set_distance()

2018-10-26 Thread John Garry
Currently it is acceptable to set the distance between 2 separate nodes to
LOCAL_DISTANCE.

Reject this as it is invalid.

This change avoids a crash reported in [1].

[1] https://www.spinics.net/lists/arm-kernel/msg683304.html

Signed-off-by: John Garry 

diff --git a/arch/arm64/mm/numa.c b/arch/arm64/mm/numa.c
index 146c04c..6092e3d 100644
--- a/arch/arm64/mm/numa.c
+++ b/arch/arm64/mm/numa.c
@@ -335,7 +335,8 @@ void __init numa_set_distance(int from, int to, int 
distance)
}
 
if ((u8)distance != distance ||
-   (from == to && distance != LOCAL_DISTANCE)) {
+   (from == to && distance != LOCAL_DISTANCE) ||
+   (from != to && distance == LOCAL_DISTANCE)) {
pr_warn_once("Warning: invalid distance parameter, from=%d 
to=%d distance=%d\n",
 from, to, distance);
return;
-- 
1.9.1