Bug#870185: FATAL: kernel 4.11.0-0.bpo.1-marvell does not boot on QNAP TS-219P II

2017-08-13 Thread Ian Campbell
On Sun, 2017-08-13 at 11:56 +0200, Robert Schlabbach wrote:
> From: "Ian Campbell" 
> > There is one other option, which is to ask people to adjust their
> > u-boot boot scripts as Robert has done, however the QNAP systems are
> > often run headless and without access to the serial console (it's a
> > special cable which only a minority of users will have access to) so
> > that really is a last resort.
> 
> Note that it is possible to modify the u-boot environment without the
> serial console, using the "fw_setenv" command in a running Debian
> system.

Given a correct /etc/fw_env.config, yes. I think the TS-xxx ones are
pretty well known although I don't think we ship any anywhere and we
certainly don't install one automatically.

> So one possibility would indeed be to modify the flash-kernel scripts
> to use fw_printenv, "parse" the environment to detect affected systems
> and, if needed, use fw_setenv to make the necessary changes.
> 
> That's not really a "pretty" solution, though, and any bugs in that
> function could easily brick the device. Then again, there have been
> "bricking" changes in the past, so it wouldn't be an entirely new
> risk ;-)

Do you know if this particular brickage is undone by the recovery
procedure (http://www.cyrius.com/debian/kirkwood/qnap/ts-219/recovery/)?
I think one of the mtd devices which is backed up is the boot loader
config (mtd4), so I think the answer must be yes, but I've not tried it.

Ian.

> At least making the change _without_ flashing a new kernel should not
> be harmful as the moved initramfs location appears to be backwards
> compatible (though I've tried only 4.9 in practice).
> 
> Best regards,
> -Robert
> 



Bug#870185: FATAL: kernel 4.11.0-0.bpo.1-marvell does not boot on QNAP TS-219P II

2017-08-13 Thread Robert Schlabbach
From: "Ian Campbell" 
> There is one other option, which is to ask people to adjust their
> u-boot boot scripts as Robert has done, however the QNAP systems are
> often run headless and without access to the serial console (it's a
> special cable which only a minority of users will have access to) so
> that really is a last resort.

Note that it is possible to modify the u-boot environment without the
serial console, using the "fw_setenv" command in a running Debian
system.

So one possibility would indeed be to modify the flash-kernel scripts
to use fw_printenv, "parse" the environment to detect affected systems
and, if needed, use fw_setenv to make the necessary changes.

That's not really a "pretty" solution, though, and any bugs in that
function could easily brick the device. Then again, there have been
"bricking" changes in the past, so it wouldn't be an entirely new
risk ;-)

At least making the change _without_ flashing a new kernel should not
be harmful as the moved initramfs location appears to be backwards
compatible (though I've tried only 4.9 in practice).
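
As a rough illustration of the fw_printenv/fw_setenv idea above, here is a
minimal Python sketch. It only parses text in the name=value form that
fw_printenv prints and works out which variables would need rewriting; it
never runs fw_setenv itself. The sample environment, the function names and
the full addresses 0xa00000/0xc00000 are assumptions made for illustration,
not values read from a real TS-219P II.

```python
import re
import shlex

# Hypothetical fw_printenv output; a real device may differ.
SAMPLE_ENV = """\
bootdelay=3
bootcmd=cp.l 0xf8200000 0x800000 0x80000;cp.l 0xf8400000 0xa00000 0x240000;bootm 0x800000
bootargs=console=ttyS0,115200 root=/dev/ram initrd=0xa00000,0x900000 ramdisk=34816
"""


def parse_env(text):
    """Turn "name=value" lines into a dict (split on the first '=' only)."""
    env = {}
    for line in text.splitlines():
        name, sep, value = line.partition("=")
        if sep:
            env[name] = value
    return env


def plan_changes(env, old="0xa00000", new="0xc00000"):
    """Return {name: new value} for every variable that still uses `old`."""
    pattern = re.compile(rf"\b{re.escape(old)}\b", re.IGNORECASE)
    changes = {}
    for name, value in env.items():
        updated = pattern.sub(new, value)
        if updated != value:
            changes[name] = updated
    return changes


def as_commands(changes):
    """Build (but do not run) one fw_setenv argv list per change."""
    return [["fw_setenv", name, value] for name, value in changes.items()]


if __name__ == "__main__":
    for argv in as_commands(plan_changes(parse_env(SAMPLE_ENV))):
        print(shlex.join(argv))
```

Printing the planned commands instead of running them leaves room for a
review before anything is written to flash, which seems prudent given the
bricking risk mentioned above.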

Best regards,
-Robert



Bug#870185: FATAL: kernel 4.11.0-0.bpo.1-marvell does not boot on QNAP TS-219P II

2017-08-13 Thread Ian Campbell
On Sat, 2017-08-12 at 21:49 +0100, Ben Hutchings wrote:
> On Mon, 2017-07-31 at 18:10 +0200, Robert Schlabbach wrote:
> > Ok, I figured it out. I noticed that the 4.11 kernel has a more
> > "generous" memory layout than the 4.9 one:
> > 
> > kernel 4.9:
> > 
> > [    0.000000] Memory: 504492K/524288K available (3777K kernel code, 371K
> > rwdata, 1128K rodata, 296K init, 247K bss, 19796K reserved, 0K
> > cma-reserved, 0K highmem)
> > 
> > kernel 4.11:
> > 
> > [    0.000000] Memory: 502648K/524288K available (4096K kernel code, 398K
> > rwdata, 1132K rodata, 1024K init, 248K bss, 21640K reserved, 0K
> > cma-reserved, 0K highmem)
> > 
> > So I suspected that the 4.11 kernel might be overwriting/corrupting
> > the initrd.img provided in memory before it gets to unpack it, and
> > changed the memory location from 0xa00000 to 0xc00000:
> [...]
> > Voila! It's finally booting!
> > 
> > So, was the 4.11 kernel compiled/linked with a wrong alignment
> > padding setting? Or should the bootloader environment be changed to
> > permanently use the higher address for passing initrd.img to the
> > kernel?
> 
> Should this be assigned to flash-kernel?

Sadly probably not.

There are three relevant load addresses: the one in the u-boot header
added to the kernel (by flash-kernel) and the two baked into the u-boot
boot script, one for the kernel and one for the initrd. On some systems
there is a fourth one in the u-boot header on the initrd binary, but the
QNAP systems don't appear to use that one; the initrd in flash is the
raw one.
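
For reference, the load address and entry point that bootm prints come from
the 64-byte legacy u-boot image header. The Python sketch below reads those
fields. The helper names are made up for this example, and the demo header is
synthetic: it reuses the name, size and addresses shown in the bootm output
in this thread, but its checksum fields are left at zero and are not
verified.

```python
import struct

UIMAGE_MAGIC = 0x27051956
# Big-endian: magic, header CRC, timestamp, data size, load address,
# entry point, data CRC (7 x uint32), then OS, arch, type, compression
# (4 x uint8) and a 32-byte image name. 64 bytes in total.
HEADER = struct.Struct(">7I4B32s")


def parse_uimage_header(blob):
    """Return the name, data size, load address and entry point."""
    if len(blob) < HEADER.size:
        raise ValueError("need at least 64 bytes")
    (magic, _hcrc, _time, size, load, entry, _dcrc,
     _os, _arch, _type, _comp, name) = HEADER.unpack_from(blob)
    if magic != UIMAGE_MAGIC:
        raise ValueError("not a legacy uImage header")
    return {
        "name": name.rstrip(b"\0").decode("ascii", "replace"),
        "size": size,
        "load": load,
        "entry": entry,
    }


def demo_header():
    """Synthetic header: Linux (5), ARM (2), kernel (2), uncompressed (0)."""
    return HEADER.pack(UIMAGE_MAGIC, 0, 0, 2076472, 0x8000, 0x8000, 0,
                       5, 2, 2, 0, b"kernel 4.11.0-0.bpo.1-marvell")


if __name__ == "__main__":
    info = parse_uimage_header(demo_header())
    print(f"{info['name']}: load {info['load']:#010x}, "
          f"entry {info['entry']:#010x}, {info['size']} bytes")
```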

Robert is modifying the boot script load address for the initrd, which
flash-kernel has no control over. flash-kernel only controls the
address in the kernel u-boot header, and IIRC that has to match a
build-time constant in the kernel. So while we could perhaps coordinate
a change here, I don't think there is an appropriate kernel load
address which would help very much, since AIUI the conflicting
addresses which cause the overlap are the boot script ones.

The only thing I can think of would be simply reducing the size of the
armel kernel binary. I believe Roger was already looking into that? 

I don't think looking into reducing the size of the initrd will help
since it is loaded second in RAM. I suppose it is worth double checking
that /etc/initramfs-tools/initramfs.conf is using MODULES=dep (instead
of most). I think d-i arranges that automatically on these platforms, so
it is highly probable that Robert is already using it. Robert, can you
confirm?
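
The MODULES check can be scripted. The sketch below is a hypothetical helper
that takes the text of initramfs.conf and reports the MODULES value. It
assumes plain KEY=value lines and does not look at any overrides under
/etc/initramfs-tools/conf.d/.

```python
def modules_setting(conf_text):
    """Return the last MODULES= value in the text, or None if unset."""
    value = None
    for raw in conf_text.splitlines():
        # Drop trailing comments and surrounding whitespace.
        line = raw.split("#", 1)[0].strip()
        if line.startswith("MODULES="):
            value = line[len("MODULES="):].strip("\"'")
    return value


if __name__ == "__main__":
    with open("/etc/initramfs-tools/initramfs.conf") as conf:
        print("MODULES =", modules_setting(conf.read()))
```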

Relatedly, it does seem here like perhaps the kernel's limit on kernel
size on these platforms needs to be shrunk to take into account the
boot-time RAM layout considerations and not just the flash partition
sizes. Roger, what do you think?

There is one other option, which is to ask people to adjust their
u-boot boot scripts as Robert has done, however the QNAP systems are
often run headless and without access to the serial console (it's a
special cable which only a minority of users will have access to) so
that really is a last resort.

There's also the chainloaded u-boot solution, but realistically no one
appears to be working on that (me included).

Ian.



Bug#870185: FATAL: kernel 4.11.0-0.bpo.1-marvell does not boot on QNAP TS-219P II

2017-08-12 Thread Ben Hutchings
On Mon, 2017-07-31 at 18:10 +0200, Robert Schlabbach wrote:
> Ok, I figured it out. I noticed that the 4.11 kernel has a more
> "generous" memory layout than the 4.9 one:
> 
> kernel 4.9:
> 
> [    0.000000] Memory: 504492K/524288K available (3777K kernel code, 371K
> rwdata, 1128K rodata, 296K init, 247K bss, 19796K reserved, 0K cma-reserved,
> 0K highmem)
> 
> kernel 4.11:
> 
> [    0.000000] Memory: 502648K/524288K available (4096K kernel code, 398K
> rwdata, 1132K rodata, 1024K init, 248K bss, 21640K reserved, 0K cma-reserved,
> 0K highmem)
> 
> So I suspected that the 4.11 kernel might be overwriting/corrupting
> the initrd.img provided in memory before it gets to unpack it, and
> changed the memory location from 0xa00000 to 0xc00000:
[...]
> Voila! It's finally booting!
> 
> So, was the 4.11 kernel compiled/linked with a wrong alignment
> padding setting? Or should the bootloader environment be changed to
> permanently use the higher address for passing initrd.img to the
> kernel?

Should this be assigned to flash-kernel?

Ben.

-- 
Ben Hutchings
Never put off till tomorrow what you can avoid all together.




Bug#870185: FATAL: kernel 4.11.0-0.bpo.1-marvell does not boot on QNAP TS-219P II

2017-07-31 Thread Robert Schlabbach
Ok, I figured it out. I noticed that the 4.11 kernel has a more "generous"
memory layout than the 4.9 one:

kernel 4.9:

[    0.000000] Memory: 504492K/524288K available (3777K kernel code, 371K rwdata, 1128K rodata, 296K init, 247K bss, 19796K reserved, 0K cma-reserved, 0K highmem)

kernel 4.11:

[    0.000000] Memory: 502648K/524288K available (4096K kernel code, 398K rwdata, 1132K rodata, 1024K init, 248K bss, 21640K reserved, 0K cma-reserved, 0K highmem)
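
To put a number on how much larger the 4.11 layout is, the section sizes in
the two "Memory:" lines can simply be added up. The Python sketch below does
that. Which sections to count as part of the kernel image is an assumption
of this example, and a plain sum says nothing about padding between sections
or where each one ends up in RAM.

```python
import re

# The two "Memory:" lines from above, without the timestamp.
LOG_4_9 = ("Memory: 504492K/524288K available (3777K kernel code, "
           "371K rwdata, 1128K rodata, 296K init, 247K bss, "
           "19796K reserved, 0K cma-reserved, 0K highmem)")
LOG_4_11 = ("Memory: 502648K/524288K available (4096K kernel code, "
            "398K rwdata, 1132K rodata, 1024K init, 248K bss, "
            "21640K reserved, 0K cma-reserved, 0K highmem)")

# Assumption: these are the sections that make up the kernel image.
IMAGE_SECTIONS = ("kernel code", "rwdata", "rodata", "init", "bss")


def section_sizes(line):
    """Return {section name: size in K} for one "Memory:" log line."""
    inside = line[line.index("(") + 1:line.rindex(")")]
    sizes = {}
    for part in inside.split(","):
        match = re.fullmatch(r"\s*(\d+)K\s+(.+?)\s*", part)
        if match:
            sizes[match.group(2)] = int(match.group(1))
    return sizes


def image_footprint(line):
    """Sum the sections listed in IMAGE_SECTIONS."""
    sizes = section_sizes(line)
    return sum(sizes[name] for name in IMAGE_SECTIONS)


if __name__ == "__main__":
    old, new = image_footprint(LOG_4_9), image_footprint(LOG_4_11)
    print(f"4.9:    {old}K")
    print(f"4.11:   {new}K")
    print(f"growth: {new - old}K")
```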

So I suspected that the 4.11 kernel might be overwriting/corrupting the
initrd.img provided in memory before it gets to unpack it, and changed the
memory location from 0xa00000 to 0xc00000:

Marvell>> tftpboot 0x800000 C0A80802.img-4.11-bpo
[...]
Marvell>> cp.l 0xf8400000 0xc00000 0x240000
Marvell>> setenv bootargs earlycon console=ttyS0,115200 root=/dev/ram initrd=0xc00000,0x900000 ramdisk=34816 coherent_pool=1M
Marvell>> bootm 0x800000
## Booting image at 00800000 ...
   Image Name:   kernel 4.11.0-0.bpo.1-marvell
   Created:      2017-07-30  23:17:11 UTC
   Image Type:   ARM Linux Kernel Image (uncompressed)
   Data Size:    2076472 Bytes =  2 MB
   Load Address: 00008000
   Entry Point:  00008000
   Verifying Checksum ... OK
OK

Starting kernel ...

Uncompressing Linux... done, booting the kernel.
[    0.000000] Booting Linux on physical CPU 0x0
[    0.000000] Linux version 4.11.0-0.bpo.1-marvell (debian-kernel@lists.debian.org) (gcc version 6.3.0 20170516 (Debian 6.3.0-18) ) #1 Debian 4.11.6-1~bpo9+1 (2017-07-09)
[...]
[    0.267272] Unpacking initramfs...
[    0.597766] Freeing initrd memory: 9216K
[...]
Welcome to Debian GNU/Linux 9 (stretch)!

Voila! It's finally booting!

So, was the 4.11 kernel compiled/linked with a wrong alignment padding setting? 
Or should the bootloader environment be changed to permanently use the higher 
address for passing initrd.img to the kernel?
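
The address arithmetic behind the workaround can be checked with a few lines
of Python. This sketch takes the full addresses to be 0x800000 (kernel
staging), 0xa00000 (old initrd) and 0xc00000 (new initrd), with an initrd
size of 0x900000 and a cp.l count of 0x240000; those full values, and the
helper names, are assumptions of this example. It relies on cp.l counting
32-bit words and on the "Data Size" printed by bootm excluding the 64-byte
image header.

```python
def byte_range(start, length):
    """Half-open byte range [start, start + length)."""
    return (start, start + length)


def overlaps(a, b):
    """True when two half-open ranges share at least one byte."""
    return a[0] < b[1] and b[0] < a[1]


# Assumed full values; see the paragraph above.
KERNEL_STAGING = 0x800000    # tftpboot / bootm address
INITRD_OLD = 0xA00000        # original initrd address
INITRD_NEW = 0xC00000        # relocated initrd address
INITRD_SIZE = 0x900000       # second field of initrd=<addr>,<size>
UIMAGE_SIZE = 64 + 2076472   # image header + "Data Size" from bootm
CP_L_WORDS = 0x240000        # cp.l count, in 32-bit words

staged_kernel = byte_range(KERNEL_STAGING, UIMAGE_SIZE)
initrd_old = byte_range(INITRD_OLD, INITRD_SIZE)
initrd_new = byte_range(INITRD_NEW, INITRD_SIZE)

if __name__ == "__main__":
    print("cp.l copies", hex(CP_L_WORDS * 4), "bytes")
    for label, rng in (("staged kernel", staged_kernel),
                       ("initrd (old)", initrd_old),
                       ("initrd (new)", initrd_new)):
        print(f"{label:14s} {rng[0]:#010x} - {rng[1]:#010x}")
    print("staged kernel overlaps old initrd:",
          overlaps(staged_kernel, initrd_old))
```

Under these assumptions the staged kernel image ends just below 0xa00000, so
the two staged copies do not collide in either layout. Any overwrite would
then have to happen after bootm starts the kernel, which fits the suspicion
above but does not prove it.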



Bug#870185: FATAL: kernel 4.11.0-0.bpo.1-marvell does not boot on QNAP TS-219P II

2017-07-30 Thread Robert Schlabbach
Package: linux-image-4.11.0-0.bpo.1-marvell
Version: 4.11.6-1~bpo9+1: armel

I had Debian Stretch installed on my QNAP TS-219P II and already upgraded to 
the backports kernel linux-image-4.9.0-0.bpo.3-marvell (4.9.30-2+deb9u2~bpo8+1: 
armel). Upgrading that to the package specified above yielded a no-boot 
situation. The full output of the serial console:

         __  __                      _ _
        |  \/  | __ _ _ ____   _____| | |
        | |\/| |/ _` | '__\ \ / / _ \ | |
        | |  | | (_| | |   \ V /  __/ | |
        |_|  |_|\__,_|_|    \_/ \___|_|_|
 _   _     ____              _
| | | |   | __ )  ___   ___ | |_ 
| | | |___|  _ \ / _ \ / _ \| __| 
| |_| |___| |_) | (_) | (_) | |_ 
 \___/    |____/ \___/ \___/ \__|  ** LOADER **
 ** MARVELL BOARD: DB-88F6282A-BP LE TS-219P2+ ,PHY=1.8v 

U-Boot 1.1.4 (Jan  3 2012 - 14:49:37) Marvell version: 3.5.3

U-Boot code: 00600000 -> 0067FFF0  BSS: -> 006CD5C0

Soc: MV88F6282 Rev 1 CPU running @ 2000Mhz L2 running @ 500Mhz
SysClock = 500Mhz , TClock = 200Mhz 

DRAM (DDR3) CAS Latency = 7 tRP = 7 tRAS = 20 tRCD=7
DRAM CS[0] base 0x00000000   size 256MB 
DRAM CS[1] base 0x10000000   size 256MB 
DRAM Total size 512MB  16bit width
Addresses 8M - 0M are saved for the U-Boot usage.
Mem malloc Initialization (8M - 7M): Done
[16384kB@f8000000] Flash: 16 MB

CPU : Marvell Feroceon (Rev 1)
USB 0: host mode
PEX 0: PCI Express Root Complex Interface
PEX interface detected Link X1
PEX 1: PCI Express Root Complex Interface
PEX interface detected Link X1

Reset IDE: 
Marvell Serial ATA Adapter
Integrated Sata device found

Net:   egiga0 [PRIME]
Hit any key to stop autoboot:  0 
QNAP: Recovery Button pressed: 0
Marvell>> boot
Send Cmd : 0x68 to UART1
## Booting image at 00800000 ...
   Image Name:   kernel 4.11.0-0.bpo.1-marvell
   Created:      2017-07-30  18:55:56 UTC
   Image Type:   ARM Linux Kernel Image (uncompressed)
   Data Size:    2076472 Bytes =  2 MB
   Load Address: 00008000
   Entry Point:  00008000
   Verifying Checksum ... OK
OK

Starting kernel ...

Uncompressing Linux... done, booting the kernel.



That's it, no further output, no boot. The kernel seems to die early on.

I've observed such early failures on another platform when a "bad" device tree 
binary was flashed. Is it possible that kernel 4.11 requires a changed dtb, and 
that was not correctly upgraded?