Bug#813995: flash-kernel: writes to nand without being aware of bad blocks
I installed Debian Jessie on a ReadyNAS 102 that has a bad block, as the
boot log shows:

,----
| NAND:  (ID 0xf1ad) 128 MiB
| MMC:   MRVL_MMC: 0
| Bad block table found at page 65472, version 0x01
| Bad block table found at page 65408, version 0x01
| nand_read_bbt: Bad block at 0x0024
`----

Using the current flash-kernel resulted in a non-booting system, with
messages like these:

,----
| nand: nand_erase_nand: attempt to erase a bad block at page 0x0480
| mtdblock: erase of region [0x4, 0x2] on "uImage" failed
`----

I can confirm that Uwe's patched flash-kernel works. Short quote from the
log:

,----
| flash-kernel: appending /usr/lib/linux-image-3.16.0-4-armmp/armada-370-netgear-rn102.dtb to kernel
| Generating kernel u-boot image... done.
| Erasing 128 Kibyte @ 2 -- 2 % complete flash_erase: Skipping bad block at 0004
| Erasing 128 Kibyte @ 5e -- 100 % complete
| Writing data to block 0 at offset 0x0
| Writing data to block 1 at offset 0x2
| Writing data to block 2 at offset 0x4
| Bad block at 4, 1 block(s) from 4 will be skipped
| Writing data to block 3 at offset 0x6
`----

--
Caution: Keep out of reach of children.
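The "will be skipped" lines in the log can be pictured with a small sketch. It is an illustration of the skip-on-bad-block idea only, not nandwrite's actual implementation: `map_blocks` is a hypothetical helper, blocks are counted from 0 within the partition, and the end of the partition is not checked.

```shell
#!/bin/sh
# map_blocks COUNT "BAD1 BAD2 ..."
# Hypothetical helper: print "logical physical" pairs showing on which
# physical erase block each of COUNT logical blocks of data ends up
# when the physical blocks listed as bad are skipped.
map_blocks() {
	count=$1
	bad=" $2 "	# pad with spaces so whole numbers can be matched
	logical=0
	physical=0
	while [ "$logical" -lt "$count" ]; do
		case "$bad" in
			*" $physical "*)
				# Bad block: move on without consuming any data.
				physical=$((physical + 1))
				continue
				;;
		esac
		echo "$logical $physical"
		logical=$((logical + 1))
		physical=$((physical + 1))
	done
}
```

For example, `map_blocks 4 "2"` prints the pairs `0 0`, `1 1`, `2 3` and `3 4`: the third chunk of data lands one block later than it would on a device without bad blocks, so the image needs one more block of room in the partition.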
Bug#813995: flash-kernel: writes to nand without being aware of bad blocks
Hello,

here is a patch for flash-kernel to become nand-aware. I also built a
package for armhf with this patch that is available from

  https://debian.kleine-koenig.org/pool/main/f/flash-kernel/flash-kernel_3.56ukl2_armhf.deb

for testing. On my device (without bad blocks) this works fine.

After talking to some guys in #mtd on OFTC I think this is the right way
to go as upstream won't expand flashcp anytime soon.

Best regards
Uwe

--->8---
[PATCH] Use nand-aware tools to write to nand flashes

This fixes #813995 assuming that mtd-utils won't change
---
 debian/changelog |  3 +++
 functions        | 14 +++++++++++++-
 2 files changed, 16 insertions(+), 1 deletion(-)

diff --git a/debian/changelog b/debian/changelog
index 75c69d0..5bfa975 100644
--- a/debian/changelog
+++ b/debian/changelog
@@ -25,6 +25,9 @@ flash-kernel (3.57) UNRELEASED; urgency=medium
   * Add new marvell kernel flavour as alternative for orion5x and kirkwood.
 
+  [ Uwe Kleine-König ]
+  * use nandwrite when writing to nand flash.
+
  -- Colin Watson  Fri, 29 Jan 2016 13:35:17 +
 
 flash-kernel (3.56) unstable; urgency=medium
diff --git a/functions b/functions
index adfb85a..203edce 100644
--- a/functions
+++ b/functions
@@ -337,7 +337,19 @@ write_mtd() {
 	# Can't really flashcp to /dev/mtd when testing
 	if [ -z "${FK_TESTSUITE_RUNNING}" ]; then
-		flashcp "$input_file" "$output_mtd"
+		local flashtype=$(cat /sys/class/mtd/$base_mtd/type)
+		case "$flashtype" in
+			nand|mlc-nand)
+				flash_erase "$output_mtd" 0 0
+				nandwrite -p "$output_mtd" "$input_file"
+				;;
+			nor)
+				flashcp "$input_file" "$output_mtd"
+				;;
+			*)
+				error "unsupported flash type"
+				;;
+		esac
 	else
 		cp "$input_file" "$output_mtd"
 	fi
-- 
2.1.4
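The hunk reads `$base_mtd` without showing where it is set; presumably that happens earlier in write_mtd(). As a dry-run aid, the selection logic can be sketched in isolation, deriving the sysfs name from the device path. `mtd_write_tool` and the `SYSFS_MTD` override are hypothetical names that are not part of flash-kernel; the override only exists so the function can be pointed at a fake directory tree instead of real hardware.

```shell
#!/bin/sh
# SYSFS_MTD is a hypothetical override for trying this without hardware.
SYSFS_MTD="${SYSFS_MTD:-/sys/class/mtd}"

# mtd_write_tool /dev/mtdN
# Print the tool the patch above would pick for the device, without
# touching the flash. Returns non-zero for unknown or unsupported types.
mtd_write_tool() {
	base_mtd=${1##*/}	# "/dev/mtd3" -> "mtd3"
	type_file="$SYSFS_MTD/$base_mtd/type"
	if [ ! -r "$type_file" ]; then
		echo "unknown"
		return 1
	fi
	read -r flashtype < "$type_file"
	case "$flashtype" in
		nand|mlc-nand)
			echo "nandwrite"	# after flash_erase, as in the patch
			;;
		nor)
			echo "flashcp"
			;;
		*)
			echo "unsupported"
			return 1
			;;
	esac
}
```

Running `mtd_write_tool /dev/mtd3` on the target before installing the patched package shows which branch the device would take.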
Bug#813995: flash-kernel: writes to nand without being aware of bad blocks
Hello *,

On 02/07/2016 12:34 PM, Ian Campbell wrote:
> So far I see no evidence for the claim that flashcp should not be used
> for writing to NAND devices in either its --help or its source (it has
> no man page AFAICS).
>
> Having a tool in Debian called "flashcp" which can (according to this
> report, I haven't checked this myself) destroy some classes of flash
> device with no warning is a clear problem irrespective of
> flash-kernel's use of that tool.
>
> At the very least flashcp should either abort when used on NAND devices
> or should be renamed norwrite (cf. nandwrite) but ideally it would Just
> Work properly when used on a nand device.

"Work properly" might not be well defined here. If flashcp should be
taught to write to nand, how should it behave? Like nandwrite with -p?
What about -m? Probably without -o.

I would guess that renaming flashcp to (say) norwrite isn't an option
for upstream. I think the name was coined before nand flash was widely
adopted.

So I still think the best thing to do here is to teach flash-kernel
about nand chips and let it use nandwrite then. Adding a note to
flashcp -h that it is only supposed to work on nor, and letting it fail
for nand, also sounds right.

> mtd-utils maintainer(s), please let me know if this is either wontfix
> or if the fix is going to take some time, in either case I will
> workaround flashcp in flash-kernel (either permanently or temporarily
> respectively).
>
> I suppose it is also possible that this is a bug in the underlying
> /dev/mtdN and/or mtdblockN device or in the h/w specific driver at the
> bottom of the stack?

No, mtdblockN exposes bad blocks by design:

  http://linux-kernel.2935.n7.nabble.com/PATCH-Make-the-mtdblock-read-write-skip-the-bad-nand-sector-td756524.html

Best regards
Uwe
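For readers unfamiliar with the options: nandwrite's -p pads the final partial page of the input up to the page size, -m marks a block bad if writing to it fails, and -o declares that the input already contains OOB data. The effect of padding on the amount written can be sketched as follows; `padded_size` is a hypothetical helper and the page size in the example is an assumption, not a value taken from this report.

```shell
#!/bin/sh
# padded_size SIZE PAGESIZE
# Hypothetical helper: SIZE rounded up to the next multiple of PAGESIZE,
# i.e. the number of bytes that get written once the last partial page
# has been padded.
padded_size() {
	echo $(( ($1 + $2 - 1) / $2 * $2 ))
}
```

With an assumed 2048-byte page, `padded_size 5000 2048` prints 6144, while an input that is already page-aligned is left unchanged.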
Bug#813995: flash-kernel: writes to nand without being aware of bad blocks
Package: flash-kernel
Version: 3.35+deb8u2
Severity: critical
Justification: causes serious data loss
Control: block 806926 with -1

Hello,

when flash-kernel writes a kernel/initrd to NAND flash it uses plain
write(2) to /dev/mtdX (flash-kernel < 3.52) or flashcp
(flash-kernel >= 3.52). If the device being written to has bad blocks,
both approaches try to erase and write them anyway.

This results in a non-booting system at best. In general, writing to bad
blocks can also affect other (otherwise good) blocks and so result in
loss of unrelated data. I have never seen this in practice, but the
manufacturers of NAND flash say so.

I didn't check which machines are affected, but the Netgear ReadyNAS
102/104 (which aren't in flash-kernel's database yet, but see below for
the obvious entry to add support for them, and #806926) are affected, and
flash-kernel already managed to break a ReadyNAS 102 (non-permanently, by
good fortune, as far as I can tell up to now).
I guess there are several other machines affected though.

The right fix is to use nandwrite to write to NAND flash and only use
flashcp for NOR.

Something like

	test -f /sys/class/mtd/mtdX/oobsize

could be used to detect if the device is NAND or NOR. But there might be
more reliable ways I'm not aware of.

I will debug/test a bit more with the broken rn102 (and its owner :-) to
maybe come up with a patch, but if someone beats me to it that's very
welcome, too.

Best regards
Uwe

-- System Information:
Debian Release: 8.2
  APT prefers stable
  APT policy: (990, 'stable'), (500, 'unstable'), (500, 'testing'), (1, 'experimental')
Architecture: armhf (armv7l)

Kernel: Linux 3.16.0-4-armmp (SMP w/1 CPU core)
Locale: LANG=en_US.UTF-8, LC_CTYPE=en_US.UTF-8 (charmap=UTF-8)
Shell: /bin/sh linked to /bin/dash
Init: systemd (via /run/systemd/system)

Versions of packages flash-kernel depends on:
ii  debconf [debconf-2.0]  1.5.56
ii  devio                  1.2-1
ii  initramfs-tools        0.120
ii  linux-base             3.5
ii  ucf                    3.0030

Versions of packages flash-kernel recommends:
ii  u-boot-tools  2014.10+dfsg1-5

flash-kernel suggests no packages.

-- Configuration Files:
/etc/flash-kernel/db changed:
Machine: NETGEAR ReadyNAS 104
DTB-Id: armada-370-netgear-rn104.dtb
DTB-Append: yes
Mtd-Kernel: uImage
Mtd-Initrd: minirootfs
U-Boot-Kernel-Address: 0x0400
U-Boot-Initrd-Address: 0x0500
Required-Packages: u-boot-tools

-- debconf information:
  flash-kernel/linux_cmdline: quiet
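A minimal sketch of the oobsize idea from the report. It rests on an assumption that should be verified on the target kernel: that sysfs provides the oobsize attribute for every MTD device and reports 0 where there is no OOB area, in which case the bare existence test would not tell NAND from NOR and the value has to be compared instead. `mtd_has_oob` and the `SYSFS_MTD` override are hypothetical names used for illustration only.

```shell
#!/bin/sh
# SYSFS_MTD is a hypothetical override for trying this without hardware.
SYSFS_MTD="${SYSFS_MTD:-/sys/class/mtd}"

# mtd_has_oob mtdN
# Return 0 if the device reports a non-zero OOB size (NAND-like),
# non-zero if the size is 0, missing or not a number.
mtd_has_oob() {
	oob_file="$SYSFS_MTD/$1/oobsize"
	[ -r "$oob_file" ] || return 1
	read -r oobsize < "$oob_file"
	[ "${oobsize:-0}" -gt 0 ] 2>/dev/null
}
```

Usage would look like `if mtd_has_oob mtd3; then echo "treat as NAND"; fi`.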
Processed: Re: Bug#813995: flash-kernel: writes to nand without being aware of bad blocks
Processing control commands:

> reassign -1 mtd-utils
Bug #813995 [flash-kernel] flash-kernel: writes to nand without being aware of bad blocks
Bug reassigned from package 'flash-kernel' to 'mtd-utils'.
No longer marked as found in versions flash-kernel/3.35+deb8u2.
Ignoring request to alter fixed versions of bug #813995 to the same values previously set

--
813995: http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=813995
Debian Bug Tracking System
Contact ow...@bugs.debian.org with problems
Bug#813995: flash-kernel: writes to nand without being aware of bad blocks
Control: reassign -1 mtd-utils

So far I see no evidence for the claim that flashcp should not be used
for writing to NAND devices in either its --help or its source (it has
no man page AFAICS).

Having a tool in Debian called "flashcp" which can (according to this
report, I haven't checked this myself) destroy some classes of flash
device with no warning is a clear problem irrespective of
flash-kernel's use of that tool.

At the very least flashcp should either abort when used on NAND devices
or should be renamed norwrite (cf. nandwrite) but ideally it would Just
Work properly when used on a nand device.

mtd-utils maintainer(s), please let me know if this is either wontfix
or if the fix is going to take some time, in either case I will
workaround flashcp in flash-kernel (either permanently or temporarily
respectively).

I suppose it is also possible that this is a bug in the underlying
/dev/mtdN and/or mtdblockN device or in the h/w specific driver at the
bottom of the stack?

codesearch.debian.net doesn't show any other use in packages other than
flash-kernel, but of course that doesn't account for users calling the
tool directly.

Thanks,
Ian.

On Sun, 2016-02-07 at 12:09 +0100, Uwe Kleine-König wrote:
> Package: flash-kernel
> Version: 3.35+deb8u2
> Severity: critical
> Justification: causes serious data loss
> Control: block 806926 with -1
>
> Hello,
>
> when flash-kernel writes a kernel/initrd to NAND flash it uses plain
> write(2) to /dev/mtdX (flash-kernel < 3.52) or flashcp
> (flash-kernel >= 3.52). If the device being written to has bad blocks
> these are tried to be erased and written by both approaches.
>
> This results in a non-booting system at best. In general writing to bad
> blocks can also affect other (otherwise good) blocks and so result in
> loss of unrelated data. I never saw this in practise, but the
> manufacturers of NAND flash say so.
>
> I didn't check which machines are affected, but Netgear ReadyNAS 102/104
> (which isn't in flash-kernel's database yet, but see below for the
> obvious entry to add support for them and #806926) is affected and flash
> kernel managed to break a ReadyNAS 102 already (non-permanently by good
> fortune as far as I can tell up to now).
> I guess there are several other machines affected though.
>
> The right fix is to use nandwrite to write to NAND flash and only use
> flashcp for NOR.
>
> Something like
>
> 	test -f /sys/class/mtd/mtdX/oobsize
>
> could be used to detect if the device is NAND or NOR. But there might be
> more reliable ways I'm not aware of.
>
> I will debug/test a bit more with the broken rn102 (and its owner :-) to
> maybe come up with a patch, but if someone beats me that's very welcome,
> too.
>
> Best regards
> Uwe
>
> -- System Information:
> Debian Release: 8.2
>   APT prefers stable
>   APT policy: (990, 'stable'), (500, 'unstable'), (500, 'testing'), (1, 'experimental')
> Architecture: armhf (armv7l)
>
> Kernel: Linux 3.16.0-4-armmp (SMP w/1 CPU core)
> Locale: LANG=en_US.UTF-8, LC_CTYPE=en_US.UTF-8 (charmap=UTF-8)
> Shell: /bin/sh linked to /bin/dash
> Init: systemd (via /run/systemd/system)
>
> Versions of packages flash-kernel depends on:
> ii  debconf [debconf-2.0]  1.5.56
> ii  devio                  1.2-1
> ii  initramfs-tools        0.120
> ii  linux-base             3.5
> ii  ucf                    3.0030
>
> Versions of packages flash-kernel recommends:
> ii  u-boot-tools  2014.10+dfsg1-5
>
> flash-kernel suggests no packages.
>
> -- Configuration Files:
> /etc/flash-kernel/db changed:
> Machine: NETGEAR ReadyNAS 104
> DTB-Id: armada-370-netgear-rn104.dtb
> DTB-Append: yes
> Mtd-Kernel: uImage
> Mtd-Initrd: minirootfs
> U-Boot-Kernel-Address: 0x0400
> U-Boot-Initrd-Address: 0x0500
> Required-Packages: u-boot-tools
>
> -- debconf information:
>   flash-kernel/linux_cmdline: quiet