Re: [PATCH] arm64: Fix map_range() not splitting mapped blocks

2024-03-19 Thread Pierre-Clément Tosi
Hi Fabio,

On Tue, Mar 19, 2024 at 09:13:12AM -0300, Fabio Estevam wrote:
> Hi Pierre,
> 
> On Tue, Mar 19, 2024 at 8:39 AM Pierre-Clément Tosi  wrote:
> 
> > This means gd->arch.tlb_addr pointing to the live PTs during 
> > setup_pgtables().
> >
> > In arch/arm/cpu/armv8, setup_all_pgtables() runs with SCTLR_ELx.M unset.
> >
> > In arch/arm/cpu/armv8/fsl-layerscape, setup_pgtables() is called twice:
> >
> >  - early_mmu_setup() calls it with SCTLR_ELx.M unset;
> >  - final_mmu_setup() overwrites gd->arch.tlb_addr before calling it iff
> >CFG_SYS_MEM_RESERVE_SECURE is defined i.e. if 
> > CONFIG_SYS_SOC="fsl-layerscape"
> >so that  gets auto-included through
> >.
> >
> > So can CONFIG_FSL_LAYERSCAPE be set while CONFIG_SYS_SOC != 
> > "fsl-layerscape"?
> 
> No, this cannot happen.

Thanks for confirming.

For clarity, it might then make sense to drop that #ifdef in final_mmu_setup().

> Only the following Layerscape SoCs select CONFIG_FSL_LAYERSCAPE
> in arch/arm/cpu/armv8/fsl-layerscape/Kconfig:
> LS1012A, LS1028A, LS1043A, LS1046A, LS1088A, LS2080A, LX2162A and LX2160A
> 
> I saw the original boot problem with the i.MX8QX.
> 
> The i.MX8QX is part of the i.MX family, not the Layerscape family.

Sure.

To be clear, the concern here was that split_block() doesn't perform what the
CPU architecture requires when modifying page tables that the MMU is using and
the question therefore was: can setup_pgtables() be called on such live PTs?

For most AArch64 U-Boot ports (including the i.MX family), the answer is trivial
because they use the arch code i.e. setup_all_pgtables(). However, as
fsl-layerscape re-implements mmu_setup(), it had to be looked at separately,
hence my question, which you answered above.

HTH,

-- 
Pierre


Re: [PATCH] arm64: Fix map_range() not splitting mapped blocks

2024-03-19 Thread Pierre-Clément Tosi
Hi Marc,

On Tue, Mar 19, 2024 at 09:43:03AM +, Marc Zyngier wrote:
> This seems pretty reasonable, thanks for looking into this. However, I
> can't help but notice that this is done without any BBM, and no TLBI
> either.
> 
> Are we guaranteed that the updated page tables are not live at the
> point of update?

This means gd->arch.tlb_addr pointing to the live PTs during setup_pgtables().

In arch/arm/cpu/armv8, setup_all_pgtables() runs with SCTLR_ELx.M unset.

In arch/arm/cpu/armv8/fsl-layerscape, setup_pgtables() is called twice:

 - early_mmu_setup() calls it with SCTLR_ELx.M unset;
 - final_mmu_setup() overwrites gd->arch.tlb_addr before calling it iff
   CFG_SYS_MEM_RESERVE_SECURE is defined i.e. if CONFIG_SYS_SOC="fsl-layerscape"
   so that  gets auto-included through
   .

So can CONFIG_FSL_LAYERSCAPE be set while CONFIG_SYS_SOC != "fsl-layerscape"?

I suppose Fabio and Stefano can answer this and/or help with ensuring that
setup_pgtables() is never called on live PTs.

Thanks,

-- 
Pierre


Re: [PATCH 1/2] arm64: Reduce add_map() complexity

2024-03-18 Thread Pierre-Clément Tosi
Hi Fabio,

On Mon, Mar 18, 2024 at 10:31:25AM -0300, Fabio Estevam wrote:
> Please find the new logs here:
> 
> https://pastebin.com/raw/qF3GbJry

I notice that the mem_map in these logs have overlapping ranges, which results
in unnecessary work when creating the PTs. For this reason, it would make sense
to prune it in arch/arm/mach-imx/imx8/cpu.c before calling dcache_enable(), IMO.
On this point, you also have contiguous ranges with identical attributes
(mem_map[0] and mem_map[6]), which could be merged into a single entry. This
could result in more efficient MMU mappings if the mem_map entries can share a
block mapping. Also note that mem_map[4].size=0 so could be dropped.

In any case, if we assume that overlapping mem_map entries are a valid input to
the arch code (maybe not as it leads to potentially ambiguous behavior?), then
41e2787f5ec4 had removed support for splitting existing block mappings.

In your case, my assumption is that the function was then treating block
mappings in the range 0x1c00-0x8000 (which get mapped, at least
partially, by mem_map[0], mem_map[1], then mem_map[2]) as table mappings and was
issuing read/write instructions in that range (accessing them as PTEs). As those
seem to be device memory (I see they get mapped as MT_DEVICE_NGNRNE), these
accesses might explain the SError you were experiencing.

Would you mind testing [1] and giving it "Tested-by:" if it addresses the issue?

Thanks,

[1]: 
https://lore.kernel.org/u-boot/43haokus4jdxguk4arig5tsqcgq2wbezwpbj7oti6pdkvrfual@wa7vz2iypcv5/

-- 
Pierre


[PATCH] arm64: Fix map_range() not splitting mapped blocks

2024-03-18 Thread Pierre-Clément Tosi
The implementation of map_range() creates the requested mapping by
walking the page tables, iterating over multiple PTEs and/or descending
into existing table mappings as needed. When doing so, it assumes any
pre-existing valid PTE to be a table mapping. This assumption is wrong
if the platform code attempts to successively map two overlapping ranges
where the latter intersects a block mapping created for the former.

As a result, map_range() treats the existing block mapping as a table
mapping and descends into it i.e. starts interpreting the
previously-mapped range as an array of PTEs, writing to them and
potentially even descending further (extra fun with MMIO ranges!).

Instead, pass any valid non-table mapping to split_block(), which
ensures that it actually was a block mapping (calls panic() otherwise)
before splitting it.

Fixes: 41e2787f5ec4 ("arm64: Reduce add_map() complexity")
Signed-off-by: Pierre-Clément Tosi 
---
 arch/arm/cpu/armv8/cache_v8.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/arch/arm/cpu/armv8/cache_v8.c b/arch/arm/cpu/armv8/cache_v8.c
index 697334086f..57d06f0575 100644
--- a/arch/arm/cpu/armv8/cache_v8.c
+++ b/arch/arm/cpu/armv8/cache_v8.c
@@ -326,6 +326,8 @@ static void map_range(u64 virt, u64 phys, u64 size, int 
level,
/* Going one level down */
if (pte_type([i]) == PTE_TYPE_FAULT)
set_pte_table([i], create_table());
+   else if (pte_type([i]) != PTE_TYPE_TABLE)
+   split_block([i], level);
 
next_table = (u64 *)(table[i] & GENMASK_ULL(47, PAGE_SHIFT));
next_size = min(map_size - (virt & (map_size - 1)), size);
-- 
2.44.0.291.gc1ea87d7ee-goog


-- 
Pierre


Re: [PATCH 1/2] arm64: Reduce add_map() complexity

2024-03-15 Thread Pierre-Clément Tosi
Hi Fabio,

On Fri, Mar 15, 2024 at 08:56:17AM -0300, Fabio Estevam wrote:
> Hi Marc,
> 
> On Sat, Mar 9, 2024 at 11:36 AM Fabio Estevam  wrote:
> 
> > Does the log below help?
> >
> > https://pastebin.com/raw/1i1VBA0a
> >
> > If not, please send me a debug patch and I will be glad to run it here.
> 
> I'm sorry to bother you, but have you had a chance to look at the log
> I shared with you?

I had a quick look through your logs and notice that U-Boot master attempts to
map addresses in the high VA range e.g.

  Checking if pte fits for virt=e400 [...]

while the logs that boot successfully only use the low VA range e.g.

  Checking if pte fits for virt=80193000 [...]

Unless that has recently changed (since I last worked with U-Boot), U-Boot on
AArch64 only supports identity mappings and therefore was only taught how to
program TTBR0_ELx (i.e. is only able to map low virtual addresses). This means
that the code - with or without 41e2787f5ec4 - would be unable to map addresses
such as 0xe400.

Now, given that 41e2787f5ec4 only affects implementation details of add_map(),
I am surprised that reverting that commit changes the arguments received by the
function such as virt. As a reminder, add_map() is exclusively used on mem_map:

  for (i = 0; mem_map[i].size || mem_map[i].attrs; i++)
  add_map(_map[i]);

> 
> That's the only issue preventing colibri-imx8x from booting mainline U-Boot.

If I read the U-Boot configs right i.e.

 - configs/colibri-imx8x_defconfig: CONFIG_ARCH_IMX8=y
 - arch/arm/mach-imx/imx8/Makefile: obj-y += cpu.o
 - arch/arm/mach-imx/imx8/cpu.c: struct mm_region *mem_map = imx8_mem_map;

There is a possibility that your mem_map is getting modified by MACH-specific
code. In particular, enable_caches() seems to dynamically get the MMU mappings
from some RPC mechanism (see get_owned_memreg() and sc_rm_get_memreg_info()).

Could it be that whatever services those requests might be returning unexpected
values? Instrumenting arch/arm/mach-imx/imx8/cpu.c and dumping mem_map (and
the RPC messages) with/without the patch would help clear this up.

HTH,

> 
> We would like to get this fixed for U-Boot 2024.04.
> 
> Thanks for your help

-- 
Pierre


Re: [PATCH 0/3] Revert HAFDBS changes

2023-10-27 Thread Pierre-Clément Tosi
Hi Marc,

On Fri, Oct 27, 2023 at 03:11:15PM +0100, Marc Zyngier wrote:
> Hi Pierre-Clément,
> 
> On Fri, 27 Oct 2023 10:49:47 +0100,
> Pierre-Clément Tosi  wrote:
> > 
> > Hi Chris,
> > 
> > On Fri, Oct 27, 2023 at 01:23:51PM +1300, Chris Packham wrote:
> > > As discussed this series reverts the HAFDBS changes that caused an issue
> > > on AC5/AC5X. I think there are some improvements that can be made to the
> > > initial memory map for the AC5/AC5X but so far nothing I've found makes
> > > it compatible with the HAFDBS changes.
> > 
> > Would you mind briefly explaining what those issues are and/or point
> > me to the discussion where it was decided to revert these patches?
> 
> It was me[1] who offered to revert them, as the "fix" that Chris came
> up with didn't make much sense on its own, was provided without much
> of an explanation, and nobody else with sufficient understanding of
> the ARMv8 translation model seem to have access to this HW to debug
> it (though if someone sends me a board, I'll happily look into it).

Thanks for the context.

I have checked with our internal users of the patches being reverted and they
don't mind the performance hit; if no one else is worried about the potential
regression, reverting (instead of properly addressing the root cause) might be
a reasonable option.

> To be clear, I'm pretty much convinced that the issue is due to the
> way page tables are managed on the arm64 port. The whole "emergency
> page table" switch doesn't make much sense, specially when changing
> something as simple as some page attributes, and it isn't like "break
> before make" is a brand new concept.

It might not "make sense" from a functional/performance perspective (as it isn't
strictly necessary) but I was assuming that it at least complied with the
architecture (i.e. should not cause the CPU to lockup, as Chris reports).

Even if the reverts get merged, it would be good to clear up if it was a bug in
the logic (e.g. how is gd->arch.tlb_emerg affected by these patches?) or a
violation of the architecture and if the issue was actually coming from the
HAFDBS changes or if they were triggering it.

Otherwise, these reverts might be masking the symptom instead of addressing the
cause. As a result, we would be left with code that seems to "work" but is still
as fragile as before, where any future patch is at risk of triggering the faulty
behavior and of ending up also being reverted.

> 
> > The feature is quite useful for users of CONFIG_CMO_BY_VA_ONLY, to
> > speed up the boot sequence: instead of reverting these patches
> > altogether, would a reasonable alternative be to put them behind a
> > build option?
> 
> I don't think this is right. Either we have something that works for
> all ARMV8.1+ systems, or it just makes things a lot less maintainable.
> 
> But maybe the u-boot maintainers have a different view on this.
> 
> The core of the problem is that there is a truckload of code that
> already exists in the tree and that somehow relies on the existing
> (and funky) behaviours[2]. How do we go about fixing the core u-boot?
> I have no idea.
> 
> Cheers,
> 
>   M.
> 
> [1] https://lore.kernel.org/r/063a16d559dc9cb327c9459803005...@kernel.org
> [2] 
> https://lore.kernel.org/r/CAFOYHZC_Dveax85n0fLr5BFyZcLqsvUssn=j0ohyvn75bta...@mail.gmail.com
> 
> -- 
> Without deviation from the norm, progress is not possible.

Thanks,

-- 
Pierre


Re: [PATCH 0/3] Revert HAFDBS changes

2023-10-27 Thread Pierre-Clément Tosi
Hi Chris,

On Fri, Oct 27, 2023 at 01:23:51PM +1300, Chris Packham wrote:
> As discussed this series reverts the HAFDBS changes that caused an issue
> on AC5/AC5X. I think there are some improvements that can be made to the
> initial memory map for the AC5/AC5X but so far nothing I've found makes
> it compatible with the HAFDBS changes.

Would you mind briefly explaining what those issues are and/or point me to the
discussion where it was decided to revert these patches?

The feature is quite useful for users of CONFIG_CMO_BY_VA_ONLY, to speed up the
boot sequence: instead of reverting these patches altogether, would a reasonable
alternative be to put them behind a build option?

Also, could you confirm that the "initial memory map" you are referring to above
only describes actual memory as, IIRC, some boards were using mappings **much**
larger than their DRAM address space?

> 
> 
> Chris Packham (3):
>   Revert "armv8: enable HAFDBS for other ELx when FEAT_HAFDBS is
> present"
>   Revert "arm64: Use level-2 for largest block mappings when FEAT_HAFDBS
> is present"
>   Revert "arm64: Use FEAT_HAFDBS to track dirty pages when available"
> 
>  arch/arm/cpu/armv8/cache_v8.c  | 30 +++---
>  arch/arm/include/asm/armv8/mmu.h   | 20 
>  arch/arm/include/asm/global_data.h |  2 --
>  3 files changed, 7 insertions(+), 45 deletions(-)
> 
> -- 
> 2.42.0
> 

-- 
Pierre


Re: [PATCH] pci: Handle failed calloc in decode_regions()

2023-03-22 Thread Pierre-Clément Tosi
Hi,

On Tue, Mar 21, 2023 at 08:57:18AM +0100, Christian Gmeiner wrote:
> Am So., 4. Dez. 2022 um 22:22 Uhr schrieb Pierre-Clément Tosi
> :
> >
> > Hi,
> >
> > On Fri, Dec 02, 2022 at 08:38:37PM +0100, s...@geanix.com wrote:
> > >
> > > Quoting Pierre-Clément Tosi :
> > >
> > > > Add a check for calloc() failing to allocate the requested memory.
> > > >
> > > > Make decode_regions() return an error code.
> > > >
> > > > Cc: Bin Meng 
> > > > Cc: Simon Glass 
> > > > Cc: Stefan Roese 
> > > > Signed-off-by: Pierre-Clément Tosi 
> > > > ---
> > >
> > > Hi,
> > >
> > > We upgraded from v2022.04 -> v2022.10, and this PATCH breaks PCI 
> > > (scsi/sata)
> > > support for x86_64.
> > > I have seen this in qemu, i havn't had the time to test this on real 
> > > hardware.
> > >
> > > Steps to reproduce:
> > > $ make efi-x86_payload64_defconfig && make -j32 && scripts/build-efi.sh 
> > > -sPrp
> >
> 
> Breaks my use case too and is basically a revert of
> https://source.denx.de/u-boot/u-boot/-/commit/f2825f6ec0bb50e7bd9376828a32212f1961f979
> In my use case we are using coreboot for different Intel SoCs with a
> generic U-Boot payload.
> 
> > Decompiling the resulting u-boot.dtb shows
> >
> > pci {
> > compatible = "pci-x86";
> > u-boot,dm-pre-reloc;
> > };
> >
> 
> Yes.. that is what can be found here:
> https://source.denx.de/u-boot/u-boot/-/blob/master/arch/x86/dts/coreboot.dts#L34
> 
> 
> > which isn't supported by the patch as we return -EINVAL when missing 
> > "ranges".
> > The rationale here was that the property seemed mandatory (see [1] and 
> > [2]); was
> > this a misunderstanding of potential corner cases?
> >
> 
> At the moment I would like to revert this change.
> 

Thanks for sharing such a corner case.

IMO, normal operation shouldn't rely on errors being silently skipped because
return values are being blindly ignored. Instead, we could limit EINVAL to
libfdt errors that aren't FDT_ERR_NOTFOUND, which would address your use-case,
where "ranges" is missing from the DT node.

Is your issue fixed by this patch:

https://patchwork.ozlabs.org/project/uboot/patch/20230220194927.476708-8-...@chromium.org/

?

> > OTOH, I see that most DTS using "pci-x86" (a pseudo-device intended to act 
> > as a
> > PCI bus) actually provide "ranges". In particular, a comment added by
> > 0ced70a0bb7a ("x86: tpl: Add a fake PCI bus") states that
> >
> > The BARs are allocated statically in the device tree.
> >
> > So I'll let others comment on this but either:
> >
> > - arch/x86/dts/efi-x86_payload.dts (and a few other DTS) is missing 
> > "ranges"; or
> > - pci_uclass_pre_probe() should treat UCLASS_SIMPLE_BUS differently.
> >
> > [1]: 
> > https://www.kernel.org/doc/Documentation/devicetree/bindings/pci/host-generic-pci.txt
> > [2]: IEEE Std 1275-1994
> >
> > >
> > > Br,
> > > /Sean
> > >
> > > >  drivers/pci/pci-uclass.c | 15 ++-
> > > >  1 file changed, 10 insertions(+), 5 deletions(-)
> > > >
> > > > diff --git a/drivers/pci/pci-uclass.c b/drivers/pci/pci-uclass.c
> > > > index 970ee1adf1..2c85e78a13 100644
> > > > --- a/drivers/pci/pci-uclass.c
> > > > +++ b/drivers/pci/pci-uclass.c
> > > > @@ -954,7 +954,7 @@ int pci_bind_bus_devices(struct udevice *bus)
> > > > return 0;
> > > >  }
> > > >
> > > > -static void decode_regions(struct pci_controller *hose, ofnode 
> > > > parent_node,
> > > > +static int decode_regions(struct pci_controller *hose, ofnode 
> > > > parent_node,
> > > >ofnode node)
> > > >  {
> > > > int pci_addr_cells, addr_cells, size_cells;
> > > > @@ -968,7 +968,7 @@ static void decode_regions(struct pci_controller
> > > > *hose, ofnode parent_node,
> > > > prop = ofnode_get_property(node, "ranges", );
> > > > if (!prop) {
> > > > debug("%s: Cannot decode regions\n", __func__);
> > > > -   return;
> > > > +   return -EINVAL;
> > > > }
> > > >
> > > > pci_addr_cells = ofnode_read_simple_addr_cells(node);
&g

Re: [PATCH] pci: Handle failed calloc in decode_regions()

2022-12-04 Thread Pierre-Clément Tosi
Hi,

On Fri, Dec 02, 2022 at 08:38:37PM +0100, s...@geanix.com wrote:
> 
> Quoting Pierre-Clément Tosi :
> 
> > Add a check for calloc() failing to allocate the requested memory.
> > 
> > Make decode_regions() return an error code.
> > 
> > Cc: Bin Meng 
> > Cc: Simon Glass 
> > Cc: Stefan Roese 
> > Signed-off-by: Pierre-Clément Tosi 
> > ---
> 
> Hi,
> 
> We upgraded from v2022.04 -> v2022.10, and this PATCH breaks PCI (scsi/sata)
> support for x86_64.
> I have seen this in qemu, i havn't had the time to test this on real hardware.
> 
> Steps to reproduce:
> $ make efi-x86_payload64_defconfig && make -j32 && scripts/build-efi.sh -sPrp

Decompiling the resulting u-boot.dtb shows

pci {
compatible = "pci-x86";
u-boot,dm-pre-reloc;
};

which isn't supported by the patch as we return -EINVAL when missing "ranges".
The rationale here was that the property seemed mandatory (see [1] and [2]); was
this a misunderstanding of potential corner cases?

OTOH, I see that most DTS using "pci-x86" (a pseudo-device intended to act as a
PCI bus) actually provide "ranges". In particular, a comment added by
0ced70a0bb7a ("x86: tpl: Add a fake PCI bus") states that

The BARs are allocated statically in the device tree.

So I'll let others comment on this but either:

- arch/x86/dts/efi-x86_payload.dts (and a few other DTS) is missing "ranges"; or
- pci_uclass_pre_probe() should treat UCLASS_SIMPLE_BUS differently.

[1]: 
https://www.kernel.org/doc/Documentation/devicetree/bindings/pci/host-generic-pci.txt
[2]: IEEE Std 1275-1994

> 
> Br,
> /Sean
> 
> >  drivers/pci/pci-uclass.c | 15 ++-
> >  1 file changed, 10 insertions(+), 5 deletions(-)
> > 
> > diff --git a/drivers/pci/pci-uclass.c b/drivers/pci/pci-uclass.c
> > index 970ee1adf1..2c85e78a13 100644
> > --- a/drivers/pci/pci-uclass.c
> > +++ b/drivers/pci/pci-uclass.c
> > @@ -954,7 +954,7 @@ int pci_bind_bus_devices(struct udevice *bus)
> > return 0;
> >  }
> > 
> > -static void decode_regions(struct pci_controller *hose, ofnode parent_node,
> > +static int decode_regions(struct pci_controller *hose, ofnode parent_node,
> >ofnode node)
> >  {
> > int pci_addr_cells, addr_cells, size_cells;
> > @@ -968,7 +968,7 @@ static void decode_regions(struct pci_controller
> > *hose, ofnode parent_node,
> > prop = ofnode_get_property(node, "ranges", );
> > if (!prop) {
> > debug("%s: Cannot decode regions\n", __func__);
> > -   return;
> > +   return -EINVAL;
> > }
> > 
> > pci_addr_cells = ofnode_read_simple_addr_cells(node);
> > @@ -986,6 +986,8 @@ static void decode_regions(struct pci_controller
> > *hose, ofnode parent_node,
> > max_regions = len / cells_per_record + CONFIG_NR_DRAM_BANKS;
> > hose->regions = (struct pci_region *)
> > calloc(1, max_regions * sizeof(struct pci_region));
> > +   if (!hose->regions)
> > +   return -ENOMEM;
> > 
> > for (i = 0; i < max_regions; i++, len -= cells_per_record) {
> > u64 pci_addr, addr, size;
> > @@ -1053,7 +1055,7 @@ static void decode_regions(struct pci_controller
> > *hose, ofnode parent_node,
> > /* Add a region for our local memory */
> > bd = gd->bd;
> > if (!bd)
> > -   return;
> > +   return 0;
> > 
> > for (i = 0; i < CONFIG_NR_DRAM_BANKS; ++i) {
> > if (bd->bi_dram[i].size) {
> > @@ -1068,7 +1070,7 @@ static void decode_regions(struct pci_controller
> > *hose, ofnode parent_node,
> > }
> > }
> > 
> > -   return;
> > +   return 0;
> >  }
> > 
> >  static int pci_uclass_pre_probe(struct udevice *bus)
> > @@ -1097,7 +1099,10 @@ static int pci_uclass_pre_probe(struct udevice *bus)
> > /* For bridges, use the top-level PCI controller */
> > if (!device_is_on_pci_bus(bus)) {
> > hose->ctlr = bus;
> > -   decode_regions(hose, dev_ofnode(bus->parent), dev_ofnode(bus));
> > +   ret = decode_regions(hose, dev_ofnode(bus->parent),
> > +dev_ofnode(bus));
> > +   if (ret)
> > +   return ret;
> > } else {
> > struct pci_controller *parent_hose;
> > 
> > --
> > 2.36.1.124.g0e6072fb45-goog
> 
> 
> 

-- 
Pierre


[PATCH] qfw: Don't fail if setup data size is 0

2022-05-25 Thread Pierre-Clément Tosi
Skip missing setup data (which is valid) rather than failing with an
error.

Cc: Bin Meng 
Cc: Simon Glass 
Reported-by: Andrew Walbran 
Signed-off-by: Pierre-Clément Tosi 
---
 cmd/qfw.c | 10 ++
 1 file changed, 6 insertions(+), 4 deletions(-)

diff --git a/cmd/qfw.c b/cmd/qfw.c
index d58615040c..ccbc967ca9 100644
--- a/cmd/qfw.c
+++ b/cmd/qfw.c
@@ -25,15 +25,17 @@ static int qemu_fwcfg_cmd_setup_kernel(void *load_addr, 
void *initrd_addr)
qfw_read_entry(qfw_dev, FW_CFG_SETUP_SIZE, 4, _size);
qfw_read_entry(qfw_dev, FW_CFG_KERNEL_SIZE, 4, _size);
 
-   if (setup_size == 0 || kernel_size == 0) {
+   if (kernel_size == 0) {
printf("warning: no kernel available\n");
return -1;
}
 
data_addr = load_addr;
-   qfw_read_entry(qfw_dev, FW_CFG_SETUP_DATA,
-  le32_to_cpu(setup_size), data_addr);
-   data_addr += le32_to_cpu(setup_size);
+   if (setup_size != 0) {
+   qfw_read_entry(qfw_dev, FW_CFG_SETUP_DATA,
+  le32_to_cpu(setup_size), data_addr);
+   data_addr += le32_to_cpu(setup_size);
+   }
 
qfw_read_entry(qfw_dev, FW_CFG_KERNEL_DATA,
   le32_to_cpu(kernel_size), data_addr);
-- 
2.36.1.124.g0e6072fb45-goog



Re: [PATCH 10/11] virtio: rng: Check length before copying

2022-04-06 Thread Pierre-Clément Tosi
Hi,

On Thu, Mar 31, 2022 at 10:09:48AM +, Andrew Scull wrote:
> Check the length of data written by the device is consistent with the
> size of the buffers to avoid out-of-bounds memory accesses in case
> values aren't consistent.
> 
> Signed-off-by: Andrew Scull 
> Cc: Sughosh Ganu 
> ---
>  drivers/virtio/virtio_rng.c | 3 +++
>  1 file changed, 3 insertions(+)
> 
> diff --git a/drivers/virtio/virtio_rng.c b/drivers/virtio/virtio_rng.c
> index 9314c0a03e..b85545c2ee 100644
> --- a/drivers/virtio/virtio_rng.c
> +++ b/drivers/virtio/virtio_rng.c
> @@ -41,6 +41,9 @@ static int virtio_rng_read(struct udevice *dev, void *data, 
> size_t len)
>   while (!virtqueue_get_buf(priv->rng_vq, ))
>   ;
>  
> + if (rsize > sg.length)
> + return -EIO;
> +

Although this patch addresses a legitimate concern, could we instead aim for
strengthening the lower-level virtio building blocks (e.g. virtqueue_get_buf())
so that higher-level virtio device drivers such as virtio-{rng,console,net,...}
don't have to be littered with checks of this nature? Could this be achieved by
using the shadow copy introduced in [PATCH 03/11]?

>   memcpy(ptr, buf, rsize);
>   len -= rsize;
>   ptr += rsize;
> -- 
> 2.35.1.1094.g7c7d902a7c-goog
> 

Thanks,

-- 
Pierre


Re: [PATCH 3/9] scripts: Makefile.lib: Pass __UBOOT__ to DTC's CPP

2022-03-17 Thread Pierre-Clément Tosi
On Wed, Mar 16, 2022 at 01:23:43PM -0600, Simon Glass wrote:
> Hi Pierre-Clément,
> 
> On Wed, 16 Mar 2022 at 09:40, Pierre-Clément Tosi  wrote:
> >
> > Some headers included (possibly indirectly) from .dts files might have
> > U-Boot specific content relying on the __UBOOT__ macro passed to CPP
> > when building C code. In that case, it would be sensible for DTC to see
> > that content instead of the non-U-Boot one. To do so, pass the macro to
> > the pre-processor when generate DTC inputs.
> 
> Can you give an example of such a situation?

This patch isn't fixing an existing issue. Instead, it preempts one that
following patches, adding "#ifndef __UBOOT__/{code not supported in U-Boot}" in
Linux headers, would bring. But IMO (and IIUC), it makes sense on its own given
the meaning that CPP macro has in C code and how it's systematically set through
KBUILD_CPPFLAGS.

> 
> >
> > Signed-off-by: Pierre-Clément Tosi 
> > Cc: Simon Glass 
> > ---
> >  scripts/Makefile.lib | 1 +
> >  1 file changed, 1 insertion(+)
> 
> Reviewed-by: Simon Glass 
> 
> 
> >
> > diff --git a/scripts/Makefile.lib b/scripts/Makefile.lib
> > index c14da10de7..d7b548dce8 100644
> > --- a/scripts/Makefile.lib
> > +++ b/scripts/Makefile.lib
> > @@ -192,6 +192,7 @@ dtc_cpp_flags  = -Wp,-MD,$(depfile).pre.tmp -nostdinc   
> >  \
> >  -I$(srctree)/arch/$(ARCH)/include   \
> >  -include $(srctree)/include/linux/kconfig.h \
> >  -D__ASSEMBLY__  \
> > +-D__UBOOT__ \
> >  -undef -D__DTS__
> >
> >  # Finds the multi-part object the current object will be linked into
> > --
> > 2.35.1.723.g4982287a31-goog
> >

-- 
Pierre


[PATCH 5/9] include: Import & Update bitops.h

2022-03-16 Thread Pierre-Clément Tosi
Import the header from version 5.16 of the kernel:

commit df0cc57e057f18e44dac8e6c18aba47ab53202f9

Inline the included  and prevent U-Boot from including
 as BITS_PER_LONG is defined in .

Remove now-duplicate definitions from .

Note: This brings extra compile-time checks through GENMASK_INPUT_CHECK.

Signed-off-by: Pierre-Clément Tosi 
Cc: Simon Glass 
Cc: Tom Rini 
---
 include/linux/bitops.h | 27 ++---
 include/linux/bits.h   | 55 ++
 2 files changed, 62 insertions(+), 20 deletions(-)
 create mode 100644 include/linux/bits.h

diff --git a/include/linux/bitops.h b/include/linux/bitops.h
index d2e5ca026e..6d465077d6 100644
--- a/include/linux/bitops.h
+++ b/include/linux/bitops.h
@@ -5,37 +5,24 @@
 
 #include 
 #include 
+#include 
 #include 
 
 #ifdef __KERNEL__
-#define BIT(nr)(1UL << (nr))
-#define BIT_ULL(nr)(1ULL << (nr))
-#define BIT_MASK(nr)   (1UL << ((nr) % BITS_PER_LONG))
-#define BIT_WORD(nr)   ((nr) / BITS_PER_LONG)
-#define BIT_ULL_MASK(nr)   (1ULL << ((nr) % BITS_PER_LONG_LONG))
-#define BIT_ULL_WORD(nr)   ((nr) / BITS_PER_LONG_LONG)
-#define BITS_PER_BYTE  8
 #define BITS_TO_LONGS(nr)  DIV_ROUND_UP(nr, BITS_PER_BYTE * sizeof(long))
 #endif
 
 /* kernel.h includes log.h which include bitops.h */
 #include 
 
-/*
- * Create a contiguous bitmask starting at bit position @l and ending at
- * position @h. For example
- * GENMASK_ULL(39, 21) gives us the 64bit vector 0x00e0.
- */
 #ifdef CONFIG_SANDBOX
-#define GENMASK(h, l) \
-   (((~0UL) << (l)) & (~0UL >> (CONFIG_SANDBOX_BITS_PER_LONG - 1 - (h
-#else
-#define GENMASK(h, l) \
-   (((~0UL) << (l)) & (~0UL >> (BITS_PER_LONG - 1 - (h
+#ifdef __GENMASK
+#undef __GENMASK
+#endif
+#define __GENMASK(h, l) \
+   (((~UL(0)) - (UL(1) << (l)) + 1) & \
+(~UL(0) >> (CONFIG_SANDBOX_BITS_PER_LONG  - 1 - (h
 #endif
-
-#define GENMASK_ULL(h, l) \
-   (((~0ULL) << (l)) & (~0ULL >> (BITS_PER_LONG_LONG - 1 - (h
 
 /*
  * ffs: find first bit set. This is defined the same way as
diff --git a/include/linux/bits.h b/include/linux/bits.h
new file mode 100644
index 00..04e7dae7f9
--- /dev/null
+++ b/include/linux/bits.h
@@ -0,0 +1,55 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+#ifndef __LINUX_BITS_H
+#define __LINUX_BITS_H
+
+#include 
+#ifndef __UBOOT__
+#include 
+#else
+#define BIT(nr) (UL(1) << (nr))
+#endif
+#ifndef __UBOOT__
+#include 
+#else
+/* U-Boot defines BITS_PER_LONG in . */
+#include 
+#endif
+
+#define BIT_ULL(nr)(ULL(1) << (nr))
+#define BIT_MASK(nr)   (UL(1) << ((nr) % BITS_PER_LONG))
+#define BIT_WORD(nr)   ((nr) / BITS_PER_LONG)
+#define BIT_ULL_MASK(nr)   (ULL(1) << ((nr) % BITS_PER_LONG_LONG))
+#define BIT_ULL_WORD(nr)   ((nr) / BITS_PER_LONG_LONG)
+#define BITS_PER_BYTE  8
+
+/*
+ * Create a contiguous bitmask starting at bit position @l and ending at
+ * position @h. For example
+ * GENMASK_ULL(39, 21) gives us the 64bit vector 0x00e0.
+ */
+#if !defined(__ASSEMBLY__)
+#include 
+#define GENMASK_INPUT_CHECK(h, l) \
+   (BUILD_BUG_ON_ZERO(__builtin_choose_expr( \
+   __is_constexpr((l) > (h)), (l) > (h), 0)))
+#else
+/*
+ * BUILD_BUG_ON_ZERO is not available in h files included from asm files,
+ * disable the input check if that is the case.
+ */
+#define GENMASK_INPUT_CHECK(h, l) 0
+#endif
+
+#define __GENMASK(h, l) \
+   (((~UL(0)) - (UL(1) << (l)) + 1) & \
+(~UL(0) >> (BITS_PER_LONG - 1 - (h
+#define GENMASK(h, l) \
+   (GENMASK_INPUT_CHECK(h, l) + __GENMASK(h, l))
+
+#define __GENMASK_ULL(h, l) \
+   (((~ULL(0)) - (ULL(1) << (l)) + 1) & \
+(~ULL(0) >> (BITS_PER_LONG_LONG - 1 - (h
+#define GENMASK_ULL(h, l) \
+   (GENMASK_INPUT_CHECK(h, l) + __GENMASK_ULL(h, l))
+
+#endif /* __LINUX_BITS_H */
-- 
2.35.1.723.g4982287a31-goog



[PATCH 2/9] lib: crypt: Avoid redefining static_assert

2022-03-16 Thread Pierre-Clément Tosi
Use the macro introduced by commit ef0f4e834c66 ("build_bug.h: add
wrapper for _Static_assert") by importing .

Signed-off-by: Pierre-Clément Tosi 
Cc: Simon Glass 
Cc: Steffen Jaeckel 
---
 lib/crypt/crypt-port.h | 3 +--
 1 file changed, 1 insertion(+), 2 deletions(-)

diff --git a/lib/crypt/crypt-port.h b/lib/crypt/crypt-port.h
index 6b9542d75b..e85d3c132c 100644
--- a/lib/crypt/crypt-port.h
+++ b/lib/crypt/crypt-port.h
@@ -1,6 +1,7 @@
 /* SPDX-License-Identifier: GPL-2.0+ */
 /* Copyright (C) 2020 Steffen Jaeckel  */
 
+#include 
 #include 
 #include 
 
@@ -10,8 +11,6 @@
 
 #define ARG_UNUSED(x) (x)
 
-#define static_assert(a, b) _Static_assert(a, b)
-
 #define strtoul(cp, endp, base) simple_strtoul(cp, endp, base)
 
 extern const unsigned char ascii64[65];
-- 
2.35.1.723.g4982287a31-goog



[PATCH 2/2] armv8/cache.S: Triple with single instruction

2021-08-27 Thread Pierre-Clément Tosi
Replace the current 2-instruction 2-step tripling code by a
corresponding single instruction leveraging ARMv8-A's "flexible second
operand as a register with optional shift". This has the added benefit
(albeit arguably negligible) of reducing the final code size.

Fix the comment as the tripled cache level is placed in x12, not x0.

Signed-off-by: Pierre-Clément Tosi 
---
 arch/arm/cpu/armv8/cache.S | 3 +--
 1 file changed, 1 insertion(+), 2 deletions(-)

diff --git a/arch/arm/cpu/armv8/cache.S b/arch/arm/cpu/armv8/cache.S
index aabb3dff61..5051597f6f 100644
--- a/arch/arm/cpu/armv8/cache.S
+++ b/arch/arm/cpu/armv8/cache.S
@@ -80,8 +80,7 @@ ENTRY(__asm_dcache_all)
/* x15 <- return address */
 
 loop_level:
-   lsl x12, x0, #1
-   add x12, x12, x0/* x0 <- tripled cache level */
+   add x12, x0, x0, lsl #1 /* x12 <- tripled cache level */
lsr x12, x10, x12
and x12, x12, #7/* x12 <- cache type */
cmp x12, #2
-- 
2.33.0.259.gc128427fd7-goog


-- 
Pierre


[PATCH 1/2] armv8/cache.S: Read sysreg fields through ubfx

2021-08-27 Thread Pierre-Clément Tosi
Improve the file's readability and conciseness by using the appropriate
Aarch64 instruction: ubfx (unsigned bitfield extract). This makes the
code easier to follow as it directly manipulates the offsets and widths
of the fields read from system registers, as they are expressed in the
Standard (ARM ARM). This has the added benefit (albeit arguably
negligible) of reducing the final code size.

Signed-off-by: Pierre-Clément Tosi 
---
 arch/arm/cpu/armv8/cache.S | 16 ++--
 1 file changed, 6 insertions(+), 10 deletions(-)

diff --git a/arch/arm/cpu/armv8/cache.S b/arch/arm/cpu/armv8/cache.S
index 443d94c262..aabb3dff61 100644
--- a/arch/arm/cpu/armv8/cache.S
+++ b/arch/arm/cpu/armv8/cache.S
@@ -27,13 +27,11 @@ ENTRY(__asm_dcache_level)
msr csselr_el1, x12 /* select cache level */
isb /* sync change of cssidr_el1 */
mrs x6, ccsidr_el1  /* read the new cssidr_el1 */
-   and x2, x6, #7  /* x2 <- log2(cache line size)-4 */
+   ubfxx2, x6,  #0,  #3/* x2 <- log2(cache line size)-4 */
+   ubfxx3, x6,  #3, #10/* x3 <- number of cache ways - 1 */
+   ubfxx4, x6, #13, #15/* x4 <- number of cache sets - 1 */
add x2, x2, #4  /* x2 <- log2(cache line size) */
-   mov x3, #0x3ff
-   and x3, x3, x6, lsr #3  /* x3 <- max number of #ways */
clz w5, w3  /* bit position of #ways */
-   mov x4, #0x7fff
-   and x4, x4, x6, lsr #13 /* x4 <- max number of #sets */
/* x12 <- cache level << 1 */
/* x2 <- line length offset */
/* x3 <- number of cache ways - 1 */
@@ -72,8 +70,7 @@ ENTRY(__asm_dcache_all)
mov x1, x0
dsb sy
mrs x10, clidr_el1  /* read clidr_el1 */
-   lsr x11, x10, #24
-   and x11, x11, #0x7  /* x11 <- loc */
+   ubfxx11, x10, #24, #3   /* x11 <- loc */
cbz x11, finished   /* if loc is 0, exit */
mov x15, lr
mov x0, #0  /* start flush at cache level 0 */
@@ -131,8 +128,7 @@ ENDPROC(__asm_invalidate_dcache_all)
 .pushsection .text.__asm_flush_dcache_range, "ax"
 ENTRY(__asm_flush_dcache_range)
mrs x3, ctr_el0
-   lsr x3, x3, #16
-   and x3, x3, #0xf
+   ubfxx3, x3, #16, #4
mov x2, #4
lsl x2, x2, x3  /* cache line size */
 
@@ -158,7 +154,7 @@ ENDPROC(__asm_flush_dcache_range)
 .pushsection .text.__asm_invalidate_dcache_range, "ax"
 ENTRY(__asm_invalidate_dcache_range)
mrs x3, ctr_el0
-   ubfmx3, x3, #16, #19
+   ubfxx3, x3, #16, #4
mov x2, #4
lsl x2, x2, x3  /* cache line size */
 
-- 
2.33.0.259.gc128427fd7-goog


-- 
Pierre


[PATCH] env: Make _init() expect _INVALID when _IS_NOWHERE

2021-08-12 Thread Pierre-Clément Tosi
Avoid applying the "fix" introduced by commit 5557eec01cbf ("env: Fix
invalid env handling in env_init()") to the environment "nowhere".

This is necessary as that commit, by setting the return value of
env_init() to -ENOENT if gd->env_valid is ENV_INVALID, forces that
function to reset gd->env_valid to ENV_VALID. By doing so, it breaks the
assumption (required by ENV_IS_NOWHERE) that gd->env_valid must be
ENV_INVALID.

This, in turn, results in env_relocate() calling env_load() (it should
not), which itself, calls U_BOOT_ENV_LOCATION(nowhere).load() i.e.
env_nowhere_load(). That function, being implemented under the
assumption mentioned above, calls env_set_default(), which in turn,
seeing that gd->env_valid is ENV_VALID (it should not), tries to
dereference whatever lies in gd->env_addr (most likely garbage), leading
to a faulty memory access.

Note that other env_locations might be concerned by this bug but that
this commit only intends to fix it for when ENV_IS_NOWHERE.

Fixes: 5557eec01cbf ("env: Fix invalid env handling in env_init()")
Signed-off-by: Pierre-Clément Tosi 
---
 env/env.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/env/env.c b/env/env.c
index e534008006..0a0f234747 100644
--- a/env/env.c
+++ b/env/env.c
@@ -336,7 +336,7 @@ int env_init(void)
debug("%s: Environment %s init done (ret=%d)\n", __func__,
  drv->name, ret);
 
-   if (gd->env_valid == ENV_INVALID)
+   if (gd->env_valid == ENV_INVALID && drv->location != 
ENVL_NOWHERE)
ret = -ENOENT;
}
 
-- 
2.32.0.605.g8dce9f2422-goog


-- 
Pierre