Bug#813995: flash-kernel: writes to nand without being aware of bad blocks

2016-02-13 Thread Gijs Hillenius
I installed Debian Jessie on a ReadyNas102 that has a bad block, like so:

,
| NAND:  (ID 0xf1ad)  128 MiB
| MMC:   MRVL_MMC: 0
| Bad block table found at page 65472, version 0x01
| Bad block table found at page 65408, version 0x01
| nand_read_bbt: Bad block at 0x0024
`

Using the current flash-kernel there resulted in a non-booting
system, with messages like these:

,
| nand: nand_erase_nand: attempt to erase a bad block at page 0x0480
| mtdblock: erase of region [0x4, 0x2] on "uImage" failed
`

I can confirm that Uwe's patched flash-kernel works. Short quote from
the log:

,
| flash-kernel: appending /usr/lib/linux-image-3.16.0-4-armmp/armada-370-netgear-rn102.dtb to kernel
| Generating kernel u-boot image... done.
| Erasing 128 Kibyte @ 2 --  2 % complete flash_erase: Skipping bad block at 0004
| Erasing 128 Kibyte @ 5e -- 100 % complete 
| Writing data to block 0 at offset 0x0
| Writing data to block 1 at offset 0x2
| Writing data to block 2 at offset 0x4
| Bad block at 4, 1 block(s) from 4 will be skipped
| Writing data to block 3 at offset 0x6
`
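
The skipping visible in this log can be modelled with a few lines of
shell. This is only a toy sketch of the idea, not how nandwrite is
implemented: plan_nand_write and its arguments are hypothetical names, it
touches no flash at all, and it does not check whether the partition is
large enough to absorb the skipped blocks.

```shell
#!/bin/sh
# Toy model of bad-block-aware writing: place consecutive image blocks
# onto physical erase blocks, skipping every physical block listed as bad.
# plan_nand_write is a hypothetical helper; it only prints a plan.
plan_nand_write() {
	nblocks="$1"	# number of image blocks to place
	bad="$2"	# space-separated indexes of bad physical blocks
	phys=0
	logical=0
	while [ "$logical" -lt "$nblocks" ]; do
		case " $bad " in
			*" $phys "*)
				# bad block: leave it alone, image data moves on
				echo "skip bad block $phys"
				;;
			*)
				echo "image block $logical -> physical block $phys"
				logical=$((logical + 1))
				;;
		esac
		phys=$((phys + 1))
	done
}
```

Calling `plan_nand_write 3 "2"` (three image blocks, physical block 2
bad) prints two placements, then "skip bad block 2", then places image
block 2 on physical block 3. A writer that is not aware of bad blocks
would instead have put image block 2 onto the bad block.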

-- 
Caution: Keep out of reach of children.



Bug#813995: flash-kernel: writes to nand without being aware of bad blocks

2016-02-10 Thread Uwe Kleine-König
Hello,

here is a patch for flash-kernel to become nand-aware. I also built a
package for armhf with this patch that is available from


https://debian.kleine-koenig.org/pool/main/f/flash-kernel/flash-kernel_3.56ukl2_armhf.deb

for testing. On my device (without bad blocks) this works fine.

After talking to some guys in #mtd on OFTC I think this is the right way
to go as upstream won't expand flashcp anytime soon.

Best regards
Uwe

--->8---
[PATCH] Use nand-aware tools to write to nand flashes

This fixes #813995 assuming that mtd-utils won't change
---
 debian/changelog |  3 +++
 functions        | 14 +++++++++++++-
 2 files changed, 16 insertions(+), 1 deletion(-)

diff --git a/debian/changelog b/debian/changelog
index 75c69d0..5bfa975 100644
--- a/debian/changelog
+++ b/debian/changelog
@@ -25,6 +25,9 @@ flash-kernel (3.57) UNRELEASED; urgency=medium
   * Add new marvell kernel flavour as alternative for orion5x and
     kirkwood.
 
+  [ Uwe Kleine-König ]
+  * use nandwrite when writing to nand flash.
+
  -- Colin Watson   Fri, 29 Jan 2016 13:35:17 +
 
 flash-kernel (3.56) unstable; urgency=medium
diff --git a/functions b/functions
index adfb85a..203edce 100644
--- a/functions
+++ b/functions
@@ -337,7 +337,19 @@ write_mtd() {
 
 	# Can't really flashcp to /dev/mtd when testing
 	if [ -z "${FK_TESTSUITE_RUNNING}" ]; then
-		flashcp "$input_file" "$output_mtd"
+		local flashtype=$(cat /sys/class/mtd/$base_mtd/type)
+		case "$flashtype" in
+			nand|mlc-nand)
+				flash_erase "$output_mtd" 0 0
+				nandwrite -p "$output_mtd" "$input_file"
+				;;
+			nor)
+				flashcp "$input_file" "$output_mtd"
+				;;
+			*)
+				error "unsupported flash type"
+				;;
+		esac
 	else
 		cp "$input_file" "$output_mtd"
 	fi
-- 
2.1.4





Bug#813995: flash-kernel: writes to nand without being aware of bad blocks

2016-02-09 Thread Uwe Kleine-König
Hello *,

On 02/07/2016 12:34 PM, Ian Campbell wrote:
> So far I see no evidence for the claim that flashcp should not be used
> for writing to NAND devices in either its --help or its source (it has
> no man page AFAICS).
> 
> Having a tool in Debian called "flashcp" which can (according to this
> report, I haven't checked this myself) destroy some classes of flash
> device with no warning is a clear problem irrespective of flash
> -kernel's use of that tool.
> 
> At the very least flashcp should either abort when used on NAND devices
> or should be renamed norwrite (cf. nandwrite) but ideally it would Just
> Work properly when used on a nand device.

"Work properly" might not be well defined here. If flashcp should be
taught to write to nand, how should it behave? Like nandwrite with -p?
What about -m? Probably without -o.

I would guess that renaming flashcp to (say) norwrite isn't an option
for upstream. I think the naming was coined before nand flash was widely
adopted. So I still think the best thing to do here is to teach
flash-kernel about nand chips and let it use nandwrite for them. Adding a
note to flashcp -h that it is only supposed to work on nor, and making it
fail for nand, also sounds right.

> mtd-utils maintainer(s), please let me know if this is either wontfix
> or if the fix is going to take some time, in either case I will
> workaround flashcp in flash-kernel (either permanently or temporarily
> respectively).
> 
> I suppose it is also possible that this is a bug in the underlying
> /dev/mtdN and/or mtdblockN device or in the h/w specific driver at the
> bottom of the stack?

No, mtdblockN exposes bad blocks by design:


http://linux-kernel.2935.n7.nabble.com/PATCH-Make-the-mtdblock-read-write-skip-the-bad-nand-sector-td756524.html

Best regards
Uwe



Bug#813995: flash-kernel: writes to nand without being aware of bad blocks

2016-02-07 Thread Uwe Kleine-König
Package: flash-kernel
Version: 3.35+deb8u2
Severity: critical
Justification: causes serious data loss
Control: block 806926 with -1

Hello,

when flash-kernel writes a kernel/initrd to NAND flash it uses plain
write(2) to /dev/mtdX (flash-kernel < 3.52) or flashcp
(flash-kernel >= 3.52). If the device being written to has bad blocks,
both approaches try to erase and write them.

This results in a non-booting system at best. In general writing to bad
blocks can also affect other (otherwise good) blocks and so result in
loss of unrelated data. I have never seen this in practice, but the
manufacturers of NAND flash say so.

I didn't check which machines are affected, but the Netgear ReadyNAS
102/104 (which aren't in flash-kernel's database yet, but see below for
the obvious entry to add support for them, and #806926) are affected, and
flash-kernel has already managed to break a ReadyNAS 102 (non-permanently,
by good fortune, as far as I can tell up to now).
I guess there are several other machines affected though.

The right fix is to use nandwrite to write to NAND flash and only use
flashcp for NOR.

Something like

  test -f /sys/class/mtd/mtdX/oobsize

could be used to detect if the device is NAND or NOR. But there might be
more reliable ways I'm not aware of.
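
One alternative, sketched here under assumptions, is to read the sysfs
"type" attribute, which is what the patch posted later in this bug does.
As far as I know the oobsize attribute exists for every MTD device and
reads 0 on NOR, so merely testing for the file may not tell the two
apart. The function name mtd_flash_kind and the MTD_SYSFS_ROOT override
are hypothetical, introduced only so the sketch can be tried against a
fake directory tree.

```shell
#!/bin/sh
# Sketch: classify an MTD device as nand or nor from its sysfs type file.
# MTD_SYSFS_ROOT is a hypothetical override for testing; on a real system
# the default /sys/class/mtd is used.
mtd_flash_kind() {
	dev="$1"	# device name without /dev/, e.g. mtd3
	root="${MTD_SYSFS_ROOT:-/sys/class/mtd}"
	if [ ! -r "$root/$dev/type" ]; then
		echo unknown
		return 1
	fi
	case "$(cat "$root/$dev/type")" in
		nand|mlc-nand)
			echo nand
			;;
		nor)
			echo nor
			;;
		*)
			# other types are deliberately not guessed at
			echo unknown
			return 1
			;;
	esac
}
```

A caller could then pick nandwrite for "nand", flashcp for "nor", and
refuse to write anything for "unknown".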

I will debug/test a bit more with the broken rn102 (and its owner :-) to
maybe come up with a patch, but if someone beats me to it that's very
welcome, too.

Best regards
Uwe

-- System Information:
Debian Release: 8.2
  APT prefers stable
  APT policy: (990, 'stable'), (500, 'unstable'), (500, 'testing'), (1, 'experimental')
Architecture: armhf (armv7l)

Kernel: Linux 3.16.0-4-armmp (SMP w/1 CPU core)
Locale: LANG=en_US.UTF-8, LC_CTYPE=en_US.UTF-8 (charmap=UTF-8)
Shell: /bin/sh linked to /bin/dash
Init: systemd (via /run/systemd/system)

Versions of packages flash-kernel depends on:
ii  debconf [debconf-2.0]  1.5.56
ii  devio                  1.2-1
ii  initramfs-tools        0.120
ii  linux-base             3.5
ii  ucf                    3.0030

Versions of packages flash-kernel recommends:
ii  u-boot-tools  2014.10+dfsg1-5

flash-kernel suggests no packages.

-- Configuration Files:
/etc/flash-kernel/db changed:
Machine: NETGEAR ReadyNAS 104
DTB-Id: armada-370-netgear-rn104.dtb
DTB-Append: yes
Mtd-Kernel: uImage
Mtd-Initrd: minirootfs
U-Boot-Kernel-Address: 0x0400
U-Boot-Initrd-Address: 0x0500
Required-Packages: u-boot-tools


-- debconf information:
  flash-kernel/linux_cmdline: quiet



Processed: Re: Bug#813995: flash-kernel: writes to nand without being aware of bad blocks

2016-02-07 Thread Debian Bug Tracking System
Processing control commands:

> reassign -1 mtd-utils
Bug #813995 [flash-kernel] flash-kernel: writes to nand without being aware of bad blocks
Bug reassigned from package 'flash-kernel' to 'mtd-utils'.
No longer marked as found in versions flash-kernel/3.35+deb8u2.
Ignoring request to alter fixed versions of bug #813995 to the same values previously set

-- 
813995: http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=813995
Debian Bug Tracking System
Contact ow...@bugs.debian.org with problems



Bug#813995: flash-kernel: writes to nand without being aware of bad blocks

2016-02-07 Thread Ian Campbell
Control: reassign -1 mtd-utils

So far I see no evidence for the claim that flashcp should not be used
for writing to NAND devices in either its --help or its source (it has
no man page AFAICS).

Having a tool in Debian called "flashcp" which can (according to this
report, I haven't checked this myself) destroy some classes of flash
device with no warning is a clear problem irrespective of flash
-kernel's use of that tool.

At the very least flashcp should either abort when used on NAND devices
or should be renamed norwrite (cf. nandwrite) but ideally it would Just
Work properly when used on a nand device.

mtd-utils maintainer(s), please let me know if this is either wontfix
or if the fix is going to take some time, in either case I will
workaround flashcp in flash-kernel (either permanently or temporarily
respectively).

I suppose it is also possible that this is a bug in the underlying
/dev/mtdN and/or mtdblockN device or in the h/w specific driver at the
bottom of the stack?

codesearch.debian.net doesn't show any other use in packages other than
flash-kernel, but of course that doesn't account for users calling the
tool directly.

Thanks,
Ian.
