HI Peter

On Tue, Dec 16, 2025 at 2:41 PM Peter Suti
<[email protected]> wrote:
>
> When performing an 'mtd read' on a NAND partition, the presence of bad
> blocks can cause the command to fail with error -22 (EINVAL) if the
> requested size is equal to (or close to) the partition size.
> When size is not provided, the whole partition is read by default.
>
> This issue occurs because the bad block skipping logic increments the
> physical read offset ('off') without decrementing the 'remaining' byte
> count. If enough bad blocks are skipped, the 'off' pointer eventually
> slides past the end of the partition boundary ('mtd->size') while
> 'remaining' is still non-zero. The subsequent call to 'mtd_read_oob'
> then fails its internal boundary checks.
>
> The Scenario:
>   Partition Size: 0x1200000
>   Bad Blocks:     1
>   Command:        mtd read swufit ${loadaddr}
>                   (Defaults to reading full 0x1200000 bytes)
>
>   1. U-Boot attempts to read 0x1200000 valid bytes.
>   2. It encounters the bad block and skips it (off += mtd->erasesize).
>   3. To satisfy the full request, it must read
>      (0x1200000 + mtd->erasesize) physical bytes.
>   4. The read pointer 'off' exceeds 'mtd->size'.
>   5. Command aborts with: "Read on swufit failed with error -22".
>
> Add a boundary check at the start of the read loop. If the read pointer
> 'off' reaches the end of the physical device, check if the user explicitly
> requested a size (argc >= 2).
>
> * If no size was specified (default read): Stop the read gracefully,
>   print a notice, and return success. We assume the user wants "all
>   available data" up to the physical limit.
> * If a size was explicitly specified: Continue standard execution. This
>   will result in an error (-22) as expected, since the specific requirement
>   cannot be met.
>
> It is physically impossible to read the full logical size of a partition
> if it contains bad blocks (as the bad blocks consume physical space
> required for the data). In this case, truncating the read is reasonable.
> For common use cases like loading FIT images (which often
> contain padding at the end), missing the trailing 0xFF data is acceptable,
> whereas failing the entire load operation is not.
>
> Also add some logging when bad blocks are encountered.
>
> NOTE: This changes the behavior of 'mtd read' when no size is provided.
>       Previously, it failed if the full read was impossible. Now, it
>       succeeds with a truncated buffer. The caller is responsible for
>       verifying data integrity (e.g., via fitImage hash verification).
>
> Signed-off-by: Peter Suti <[email protected]>
> [polished the comment and commit message]
> Signed-off-by: Radek Dostál <[email protected]>
> ---
>  cmd/mtd.c | 27 ++++++++++++++++++++++++++-
>  1 file changed, 26 insertions(+), 1 deletion(-)
>
> diff --git a/cmd/mtd.c b/cmd/mtd.c
> index 7f25144098b..1d1845bce44 100644
> --- a/cmd/mtd.c
> +++ b/cmd/mtd.c
> @@ -559,8 +559,10 @@ static int do_mtd_io(struct cmd_tbl *cmdtp, int flag, 
> int argc,
>
>         /* Search for the first good block after the given offset */
>         off = start_off;
> -       while (mtd_block_isbad(mtd, off))
> +       while (mtd_block_isbad(mtd, off)) {
> +               printf("Bad block: failed to read at offset 0x%llx, 
> skipping.\n", off);
>                 off += mtd->erasesize;
> +       }

Would like to avoid adding more debug stuff or printing here.

>
>         led_activity_blink();
>
> @@ -569,9 +571,32 @@ static int do_mtd_io(struct cmd_tbl *cmdtp, int flag, 
> int argc,
>
>         /* Loop over the pages to do the actual read/write */
>         while (remaining) {
> +               /*
> +                * Boundary Check: Bad block skipping may push the physical 
> read offset
> +                * past the partition end (mtd->size).
> +                * If the read pointer 'off' has reached the end of the 
> physical flash,
> +                * we physically cannot read more.
> +                *
> +                * Check 'argc' to see if the user explicitly requested a 
> specific size.
> +                * * If argc < 2, the user did NOT specify a size (defaulted 
> to partition size).
> +                *   In this case, we treat "end of flash" as a natural 
> stopping point
> +                *   and truncate the read gracefully.
> +                * * If argc >= 2, the user explicitly asked for N bytes.
> +                *   In this case, we do NOT truncate. We let the loop 
> continue, which
> +                *   will trigger the standard error (-22) because the 
> request cannot be met.
> +                */
> +               if (argc < 2 && off >= mtd->size) {
> +                       printf("Notice: Reached end of partition with %lld 
> bytes remaining. "
> +                              "Truncating read.\n", remaining);

This is ok as print, to inform about the boundary check

> +                       remaining = 0;
> +                       ret = CMD_RET_SUCCESS;

What is the drawback to failing here, I mean you can still verify the
integrity. GIve a script example
where this is needed in the commit message

> +                       break;
> +               }
> +
>                 /* Skip the block if it is bad */
>                 if (mtd_is_aligned_with_block_size(mtd, off) &&
>                     mtd_block_isbad(mtd, off)) {
> +                       printf("Bad block: failed to read at offset 0x%llx, 
> skipping.\n", off);
>                         off += mtd->erasesize;

This is an extra print for now, you can propose it on a separated patch

Michael

>                         continue;
>                 }
> --
> 2.43.0
>


--
Michael Nazzareno Trimarchi
Co-Founder & Chief Executive Officer
M. +39 347 913 2170
[email protected]
__________________________________

Amarula Solutions BV
Joop Geesinkweg 125, 1114 AB, Amsterdam, NL
T. +31 (0)85 111 9172
[email protected]
www.amarulasolutions.com

Reply via email to