Bug#1051592: Regression: Commit "netfilter: nf_tables: disallow rule addition to bound chain via NFTA_RULE_CHAIN_ID" breaks ruleset loading in linux-stable

2023-09-15 Thread Timo Sigurdsson
Hi,

Salvatore Bonaccorso schrieb am 12.09.2023 21:13 (GMT +02:00):

> Hi Timo,
> 
> On Tue, Sep 12, 2023 at 01:39:59PM +0200, Timo Sigurdsson wrote:
>> Hi Pablo,
>> 
>> Pablo Neira Ayuso schrieb am 12.09.2023 00:57 (GMT +02:00):
>> 
>> > Hi Timo,
>> > 
>> > On Mon, Sep 11, 2023 at 11:37:50PM +0200, Timo Sigurdsson wrote:
>> >> Hi,
>> >> 
>> >> recently, Debian updated their stable kernel from 6.1.38 to 6.1.52
>> >> which broke nftables ruleset loading on one of my machines with lots
>> >> of "Operation not supported" errors. I've reported this to the
>> >> Debian project (see link below) and Salvatore Bonaccorso and I
>> >> identified "netfilter: nf_tables: disallow rule addition to bound
>> >> chain via NFTA_RULE_CHAIN_ID" (0ebc1064e487) as the offending commit
>> >> that introduced the regression. Salvatore also found that this issue
>> >> affects the 5.10 stable tree as well (observed in 5.10.191), but he
>> >> cannot reproduce it on 6.4.13 and 6.5.2.
>> >> 
>> >> The issue only occurs with some rulesets. While I can't trigger it
>> >> with simple/minimal rulesets that I use on some machines, it does
>> >> occur with a more complex ruleset that has been in use for months
>> >> (if not years, for large parts of it). I'm attaching a somewhat
>> >> stripped down version of the ruleset from the machine I originally
>> >> observed this issue on. It's still not a small or simple ruleset,
>> >> but I'll try to reduce it further when I have more time.
>> >> 
>> >> The error messages shown when trying to load the ruleset don't seem
>> >> to be helpful. Just to give two simple examples from the log when
>> >> nftables fails to start:
>> >> /etc/nftables.conf:99:4-44: Error: Could not process rule: Operation not supported
>> >> tcp option maxseg size 1-500 counter drop
>> >> ^
>> >> /etc/nftables.conf:308:4-27: Error: Could not process rule: Operation not supported
>> >> tcp dport sip-tls accept
>> >> 
>> > 
>> > I can reproduce this issue with 5.10.191 and 6.1.52 and nftables v1.0.6,
>> > this is not reproducible with v1.0.7 and v1.0.8.
>> > 
>> >> Since the issue only affects some stable trees, Salvatore thought it
>> >> might be an incomplete backport that causes this.
>> >> 
>> >> If you need further information, please let me know.
>> > 
>> > Userspace nftables v1.0.6 generates incorrect bytecode that hits a new
>> > kernel check that rejects adding rules to bound chains. The incorrect
>> > bytecode adds the chain binding, attaches it to the rule, and then adds the
>> > rules to the chain binding. I have cherry-picked these three patches
>> > for nftables v1.0.6 userspace and your ruleset restores fine.
>> 
>> hmm, that doesn't explain why Salvatore didn't observe this with
>> more recent kernels.
>> 
>> Salvatore, did you use newer userspace components when you tested
>> your 6.4.13 and 6.5.2 builds?
> 
> That explains it, now that I understand the issue better. While one
> should change only one constraint at a time when experimenting, for the
> 6.4.13 and 6.5.2 testing I indeed switched to a Debian unstable
> system, which has a newer userspace nftables and so does not trigger the
> issue. This was misleading for the report.
> 
>> As for the regression and how it should be dealt with: Personally, I don't
>> really care whether the regression is solved in the kernel or
>> userspace. If everybody agrees that this is the best or only viable
>> option and Debian decides to push a nftables update to fix this,
>> that works for me. But I do feel the burden to justify this should
>> be high. A kernel change that leaves users without a working packet
>> filter after upgrading their machines is serious, if you ask me. And
>> since it affects several stable/longterm trees, I would assume this
>> will hit other stable (non-rolling) distributions as well, since
>> they will also use older userspace components (unless this is
>> behavior specific to nftables 1.0.6 but not older versions). They
>> probably should get a heads up then.
> 
> So if it is generally believed on kernel side there should not happen
> any further chan

Bug#1051592: Regression: Commit "netfilter: nf_tables: disallow rule addition to bound chain via NFTA_RULE_CHAIN_ID" breaks ruleset loading in linux-stable

2023-09-12 Thread Timo Sigurdsson
Hi,

Florian Westphal schrieb am 12.09.2023 12:27 (GMT +02:00):

> Linux regression tracking (Thorsten Leemhuis) wrote:
>> On 12.09.23 00:57, Pablo Neira Ayuso wrote:
>> > Userspace nftables v1.0.6 generates incorrect bytecode that hits a new
>> > kernel check that rejects adding rules to bound chains. The incorrect
>> > bytecode adds the chain binding, attaches it to the rule, and then adds the
>> > rules to the chain binding. I have cherry-picked these three patches
>> > for nftables v1.0.6 userspace and your ruleset restores fine.
>> > [...]
>> 
>> Hmm. Well, this sounds like a kernel regression to me that normally
>> should be dealt with on the kernel level, as users after updating the
>> kernel should never have to update any userspace stuff to continue what
>> they have been doing before the kernel update.
> 
> This is a combo of a userspace bug and this new sanity check that
> rejects the incorrect ordering (adding rules to the already-bound
> anonymous chain).
> 
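
The ordering problem described in the quoted paragraph can be illustrated with a toy model. This is only a sketch: the class and function names are hypothetical, it is not kernel or nftables code, and the "populate first, then bind" ordering is my assumption about what an ordering that passes the check looks like.

```python
import errno


class AnonymousChain:
    """Toy stand-in for an anonymous chain (hypothetical, not kernel code)."""

    def __init__(self):
        self.rules = []
        self.bound = False

    def bind(self):
        # Attaching the chain to a rule's jump verdict binds it.
        self.bound = True

    def add_rule(self, rule):
        # Models the new sanity check: no rule additions to a bound chain.
        if self.bound:
            raise OSError(errno.EOPNOTSUPP, "Operation not supported")
        self.rules.append(rule)


def load(order, rules):
    """Replay the steps in order; return None on success, else the errno."""
    chain = AnonymousChain()
    try:
        for step in order:
            if step == "bind":
                chain.bind()
            else:  # "add_rules"
                for rule in rules:
                    chain.add_rule(rule)
    except OSError as exc:
        return exc.errno
    return None


RULES = ["tcp option maxseg size 1-500 counter drop", "tcp sport 0 counter drop"]

# Ordering described for nftables v1.0.6: bind first, then add the rules.
old_result = load(["bind", "add_rules"], RULES)
# Assumed accepted ordering: populate the chain, then bind it.
new_result = load(["add_rules", "bind"], RULES)
```

In this model the first ordering fails with EOPNOTSUPP, which is the errno behind the "Operation not supported" messages, while the second one succeeds.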

Out of curiosity, did the incorrect ordering or bytecode from the older 
userspace components actually lead to a wrong representation of the rules in 
the kernel or did the rules still work despite all that?

Thanks,

Timo 



Bug#1051592: Regression: Commit "netfilter: nf_tables: disallow rule addition to bound chain via NFTA_RULE_CHAIN_ID" breaks ruleset loading in linux-stable

2023-09-12 Thread Timo Sigurdsson
Hi Pablo,

Pablo Neira Ayuso schrieb am 12.09.2023 00:57 (GMT +02:00):

> Hi Timo,
> 
> On Mon, Sep 11, 2023 at 11:37:50PM +0200, Timo Sigurdsson wrote:
>> Hi,
>> 
>> recently, Debian updated their stable kernel from 6.1.38 to 6.1.52
>> which broke nftables ruleset loading on one of my machines with lots
>> of "Operation not supported" errors. I've reported this to the
>> Debian project (see link below) and Salvatore Bonaccorso and I
>> identified "netfilter: nf_tables: disallow rule addition to bound
>> chain via NFTA_RULE_CHAIN_ID" (0ebc1064e487) as the offending commit
>> that introduced the regression. Salvatore also found that this issue
>> affects the 5.10 stable tree as well (observed in 5.10.191), but he
>> cannot reproduce it on 6.4.13 and 6.5.2.
>> 
>> The issue only occurs with some rulesets. While I can't trigger it
>> with simple/minimal rulesets that I use on some machines, it does
>> occur with a more complex ruleset that has been in use for months
>> (if not years, for large parts of it). I'm attaching a somewhat
>> stripped down version of the ruleset from the machine I originally
>> observed this issue on. It's still not a small or simple ruleset,
>> but I'll try to reduce it further when I have more time.
>> 
>> The error messages shown when trying to load the ruleset don't seem
>> to be helpful. Just to give two simple examples from the log when
>> nftables fails to start:
>> /etc/nftables.conf:99:4-44: Error: Could not process rule: Operation not supported
>> tcp option maxseg size 1-500 counter drop
>> ^
>> /etc/nftables.conf:308:4-27: Error: Could not process rule: Operation not supported
>> tcp dport sip-tls accept
>> 
> 
> I can reproduce this issue with 5.10.191 and 6.1.52 and nftables v1.0.6,
> this is not reproducible with v1.0.7 and v1.0.8.
> 
>> Since the issue only affects some stable trees, Salvatore thought it
>> might be an incomplete backport that causes this.
>> 
>> If you need further information, please let me know.
> 
> Userspace nftables v1.0.6 generates incorrect bytecode that hits a new
> kernel check that rejects adding rules to bound chains. The incorrect
> bytecode adds the chain binding, attaches it to the rule, and then adds the
> rules to the chain binding. I have cherry-picked these three patches
> for nftables v1.0.6 userspace and your ruleset restores fine.

hmm, that doesn't explain why Salvatore didn't observe this with more recent 
kernels.

Salvatore, did you use newer userspace components when you tested your 6.4.13 
and 6.5.2 builds?

As for the regression and how it should be dealt with: Personally, I don't really care 
whether the regression is solved in the kernel or userspace. If everybody 
agrees that this is the best or only viable option and Debian decides to push a 
nftables update to fix this, that works for me. But I do feel the burden to 
justify this should be high. A kernel change that leaves users without a 
working packet filter after upgrading their machines is serious, if you ask me. 
And since it affects several stable/longterm trees, I would assume this will 
hit other stable (non-rolling) distributions as well, since they will also use 
older userspace components (unless this is behavior specific to nftables 1.0.6 
but not older versions). They probably should get a heads up then.
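
For anyone who wants to check a machine against the combinations named in this thread, here is a minimal sketch. It assumes the userspace version is available as a string containing a token like `v1.0.6` (for example from the output of `nft --version`). The helper names are hypothetical, and the version sets contain only what was reported above; older releases are deliberately classified as unknown.

```python
import re

# Versions named in this thread; anything else is unknown here.
REPRODUCED_WITH = {(1, 0, 6)}
NOT_REPRODUCED_WITH = {(1, 0, 7), (1, 0, 8)}


def parse_nft_version(text):
    """Extract (major, minor, patch) from a string such as 'nftables v1.0.6'."""
    match = re.search(r"v(\d+)\.(\d+)\.(\d+)", text)
    if match is None:
        raise ValueError("no version token found in %r" % text)
    return tuple(int(part) for part in match.groups())


def classify(text):
    """Say whether this thread reported the issue for the given version."""
    version = parse_nft_version(text)
    if version in REPRODUCED_WITH:
        return "reproduced"
    if version in NOT_REPRODUCED_WITH:
        return "not reproduced"
    return "unknown"
```

Note that this only looks at userspace; per the thread, the failure also requires a kernel that carries the new check (seen with 5.10.191 and 6.1.52).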


Regards,

Timo



Bug#1051592: linux: Regression - upgrade to 6.1.52-1 breaks nftables

2023-09-11 Thread Timo Sigurdsson
Hi Salvatore,

Salvatore Bonaccorso schrieb am 11.09.2023 22:20 (GMT +02:00):

> Bisected the issue:
> 
> $ git bisect log
> git bisect start
> # status: waiting for both good and bad commits
> # good: [61fd484b2cf6bc8022e8e5ea6f693a9991740ac2] Linux 6.1.38
> git bisect good 61fd484b2cf6bc8022e8e5ea6f693a9991740ac2
> # status: waiting for bad commit, 1 good commit known
> # bad: [1321ab403b38366a4cfb283145bb2c005becb1e5] Linux 6.1.45
> git bisect bad 1321ab403b38366a4cfb283145bb2c005becb1e5
> # good: [95d49f79e94d4fa8105c880a266789609f3e791a] ext4: only update i_reserved_data_blocks on successful block allocation
> git bisect good 95d49f79e94d4fa8105c880a266789609f3e791a
> # good: [f8b61a2c29fc70f64daad698cf09c1f79a0e39f9] drm/amd/display: Set minimum requirement for using PSR-SU on Rembrandt
> git bisect good f8b61a2c29fc70f64daad698cf09c1f79a0e39f9
> # bad: [bd2decac7345134ea0bd3f4b978478ef53367cd8] mptcp: ensure subflow is unhashed before cleaning the backlog
> git bisect bad bd2decac7345134ea0bd3f4b978478ef53367cd8
> # bad: [fe3409cd013cfd10d3e6787b49f33a5dda39cffd] RDMA/irdma: Fix op_type reporting in CQEs
> git bisect bad fe3409cd013cfd10d3e6787b49f33a5dda39cffd
> # good: [85c38ac62c1372cc1ab05426315aad61025d33ef] atheros: fix return value check in atl1_tso()
> git bisect good 85c38ac62c1372cc1ab05426315aad61025d33ef
> # bad: [539cf23cb48835c69cc3d22edff28b92bd82bb18] tipc: stop tipc crypto on failure in tipc_node_create
> git bisect bad 539cf23cb48835c69cc3d22edff28b92bd82bb18
> # good: [1ecdbf2467ae4bc4df00c5cfab427cb1aaa5e3e1] x86/traps: Fix load_unaligned_zeropad() handling for shared TDX memory
> git bisect good 1ecdbf2467ae4bc4df00c5cfab427cb1aaa5e3e1
> # bad: [7218974aba07ff60c646d5a512b02b871402b03e] mm: suppress mm fault logging if fatal signal already pending
> git bisect bad 7218974aba07ff60c646d5a512b02b871402b03e
> # good: [89a4d1a89751a0fbd520e64091873e19cc0979e8] netfilter: nft_set_rbtree: fix overlap expiration walk
> git bisect good 89a4d1a89751a0fbd520e64091873e19cc0979e8
> # bad: [268cb07ef3ee17b5454a7c4b23376802c5b00c79] netfilter: nf_tables: disallow rule addition to bound chain via NFTA_RULE_CHAIN_ID
> git bisect bad 268cb07ef3ee17b5454a7c4b23376802c5b00c79
> # good: [4237462a073e24f71c700f3e5929f07b6ee1bcaa] netfilter: nf_tables: skip immediate deactivate in _PREPARE_ERROR
> git bisect good 4237462a073e24f71c700f3e5929f07b6ee1bcaa
> # first bad commit: [268cb07ef3ee17b5454a7c4b23376802c5b00c79] netfilter: nf_tables: disallow rule addition to bound chain via NFTA_RULE_CHAIN_ID
> 
> $ git bisect visualize
> commit 268cb07ef3ee17b5454a7c4b23376802c5b00c79
> Author: Pablo Neira Ayuso 
> Date:   Sun Jul 23 16:41:48 2023 +0200
> 
> netfilter: nf_tables: disallow rule addition to bound chain via
> NFTA_RULE_CHAIN_ID
> 
> [ Upstream commit 0ebc1064e4874d5987722a2ddbc18f94aa53b211 ]
> 
> Bail out with EOPNOTSUPP when adding rule to bound chain via
> NFTA_RULE_CHAIN_ID. The following warning splat is shown when
> adding a rule to a deleted bound chain:
> 
>  WARNING: CPU: 2 PID: 13692 at net/netfilter/nf_tables_api.c:2013
>  nf_tables_chain_destroy+0x1f7/0x210 [nf_tables]
>  CPU: 2 PID: 13692 Comm: chain-bound-rul Not tainted 6.1.39 #1
>  RIP: 0010:nf_tables_chain_destroy+0x1f7/0x210 [nf_tables]
> 
> Fixes: d0e2c7de92c7 ("netfilter: nf_tables: add NFT_CHAIN_BINDING")
> Reported-by: Kevin Rich 
> Signed-off-by: Pablo Neira Ayuso 
> Signed-off-by: Florian Westphal 
> Signed-off-by: Sasha Levin 

Hehe, yes, I was just about to write you the same. My test build with this one 
reverted lets me load the ruleset again.
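
For readers unfamiliar with what the quoted bisect log is doing: it is a binary search over the commits between a known-good and a known-bad kernel. A minimal sketch of that search follows; the function name and the integer "commits" in the usage note are made up for illustration, and this is not how git itself is implemented.

```python
def first_bad(commits, is_bad):
    """Return the first commit for which is_bad() is true.

    Assumes commits[0] is known good, commits[-1] is known bad, and that
    every commit after the first bad one is bad as well.
    """
    good, bad = 0, len(commits) - 1
    while bad - good > 1:
        mid = (good + bad) // 2
        if is_bad(commits[mid]):
            bad = mid   # culprit is at mid or earlier
        else:
            good = mid  # culprit is after mid
    return commits[bad]
```

With a hypothetical history of 100 commits numbered 0 to 99, `first_bad(list(range(100)), lambda c: c >= 73)` narrows the range by half on each test build, in the same way the log above alternates between "good" and "bad" until one commit remains.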

Would you like to take this upstream? I was just about to file a report in 
netfilter's bugzilla, but since you worked on it as well, I don't mean to 
interfere...

I'll try to further reduce my test ruleset to see what actually triggers this.

Thanks and regards,

Timo



Bug#1051592: linux: Regression - upgrade to 6.1.52-1 breaks nftables

2023-09-10 Thread Timo Sigurdsson
Hi,

Salvatore Bonaccorso schrieb am 10.09.2023 12:21 (GMT +02:00):

> Would it be possible to provide a minimal set of rules triggering the
> issue? Can you reproduce the issue with the official build?

So, I did some more testing on a different machine running the official build. 
My findings so far are:
1) Yes, I can reproduce the issue with the official build.
2) The issue depends on the ruleset. The minimal ruleset I have on that 
machine doesn't trigger the issue, but when I copy over the ruleset from the 
machine I first observed this on, I can reproduce it.

I'm attaching a somewhat stripped down version of my original, rather complex 
ruleset. It's by no means a "minimal" reproducer, because I haven't had the time 
yet to further reduce it in order to see what actually triggers it. But you 
should be able to observe that this ruleset loads just fine on linux 6.1.38-4, 
but doesn't anymore on 6.1.52-1.

I also started looking into what commit could have introduced this. My first 
guess "netfilter: nft_dynset: disallow object maps" (23185c6aed1f) is wrong. 
Even with this one reverted, the issue occurs. I'll try another build with 
"netfilter: nf_tables: disallow rule addition to bound chain via 
NFTA_RULE_CHAIN_ID" (0ebc1064e487) reverted tomorrow evening...

Kind regards,

Timo


P.S.: Regarding the severity: Treat it with whatever severity you see fit. I 
was a bit in a hurry and didn't actually look at the definitions for the 
different severity options this morning. 

#!/usr/sbin/nft -f

flush ruleset

define public_if = eth0
define trusted_if = eth1
define voip_if = eth2.10
define guest_if = eth2.20
define home_if = { $trusted_if, $voip_if, $guest_if }
define home_ipv6_if = { $trusted_if, $voip_if, $guest_if }

define masq_ip = { 192.168.1.0/24, 192.168.2.0/24, 192.168.3.0/24, 192.168.4.0/24 }
define masq_if = $public_if

define host1_ip = 192.168.1.10
define host2_ip = 192.168.2.20
define host3_ip = 192.168.3.30
define host4_ip = 192.168.4.40

define proxy_port = 8443

define private_ip = { 192.168.0.0/16, 172.16.0.0/12, 10.0.0.0/8 }
define private_ip6 = { fe80::/64, fd00::/8 }
define bogons_ip = { 0.0.0.0/8, 10.0.0.0/8, 100.64.0.0/10, 127.0.0.0/8, 169.254.0.0/16, 172.16.0.0/12, 192.0.0.0/24, 192.0.2.0/24, 192.168.0.0/16, 198.18.0.0/15, 198.51.100.0/24, 203.0.113.0/24, 224.0.0.0/3 }
define bogons_ip6 = { ::/3, 2001:0002::/48, 2001:0003::/32, 2001:10::/28, 2001:20::/28, 2001::/32, 2001:db8::/32, 2002::/16, 3000::/4, 4000::/2, 8000::/1 }

define sip_whitelist_ip6 = { 2001:db8::1/128, 2001:db8::2/128 }
define smtps_whitelist_ip = 10.0.0.1
define protocol_whitelist = { tcp, udp, icmp, ipv6-icmp }

table inet filter {
	map if_input {
		type ifname : verdict;
		elements = { $public_if : jump public_input, $trusted_if : jump home_input, $voip_if : jump home_input, $guest_if : jump home_input }
	}
	map if_forward {
		type ifname : verdict;
		elements = { $public_if : jump public_forward, $trusted_if : jump trusted_forward, $voip_if : jump voip_forward, $guest_if : jump guest_forward }
	}
	map if_output {
		type ifname : verdict;
		elements = { $public_if : jump public_output, $trusted_if : jump home_output, $voip_if : jump home_output, $guest_if : jump home_output }
	}

	set ipv4_blacklist { type ipv4_addr; flags interval; auto-merge; }
	set ipv6_blacklist { type ipv6_addr; flags interval; auto-merge; }
	set limit_src_ip { type ipv4_addr; flags dynamic, timeout; size 1024; }
	set limit_src_ip6 { type ipv6_addr; flags dynamic, timeout; size 1024; }

	chain PREROUTING_RAW {
		type filter hook prerouting priority raw;

		meta l4proto != $protocol_whitelist counter drop
		tcp flags syn jump {
			tcp option maxseg size 1-500 counter drop
			tcp sport 0 counter drop
		}
		rt type 0 counter drop
	}

	chain PREROUTING_MANGLE {
		type filter hook prerouting priority mangle;

		ct state vmap { invalid : jump ct_invalid_pre, untracked : jump ct_untracked_pre, new : jump ct_new_pre, related : jump rpfilter }
	}
	chain ct_invalid_pre {
		counter drop
	}
	chain ct_untracked_pre {
		icmpv6 type { nd-router-solicit, nd-router-advert, nd-neighbor-solicit, nd-neighbor-advert, mld-listener-query, mld2-listener-report } return
		counter drop
	}
	chain ct_new_pre {
		jump rpfilter

		tcp flags & (fin|syn|rst|ack) != syn counter drop

		iifname $public_if meta nfproto vmap { ipv4 : jump blacklist_input_ipv4, ipv6 : jump blacklist_input_ipv6 }
	}
	chain rpfilter {
		ip saddr 0.0.0.0 ip daddr 255.255.255.255 udp sport bootpc udp dport bootps 

Bug#1051592: linux: Regression - upgrade to 6.1.52-1 breaks nftables

2023-09-10 Thread Timo Sigurdsson
Package: linux
Version: 6.1.52-1
Severity: grave

Dear Maintainers,

linux-image-6.1.0-12-amd64 causes a serious regression in nftables. After 
upgrading one of my machines, nftables fails to start - leaving the system 
without an active firewall.

Doing
`nft -cf /etc/nftables.conf'
throws many "Operation not supported" errors on rulesets that have been in 
place for months without issues.

Just to give two simple examples from the log when nftables fails to start:
/etc/nftables.conf:99:4-44: Error: Could not process rule: Operation not supported
tcp option maxseg size 1-500 counter drop
^
/etc/nftables.conf:308:4-27: Error: Could not process rule: Operation not supported
tcp dport sip-tls accept
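
The error lines above follow a `file:line:start-end: Error: message` layout, which makes it easy to collect the affected ruleset lines from a log. A minimal sketch follows; the type and function names are hypothetical, and the pattern is derived only from the two examples in this report, so other nft messages may not match it.

```python
import re
from typing import NamedTuple, Optional


class NftError(NamedTuple):
    path: str
    line: int
    col_start: int
    col_end: int
    message: str


# Layout taken from the two log lines quoted in this report.
_PATTERN = re.compile(
    r"^(?P<path>.+?):(?P<line>\d+):(?P<start>\d+)-(?P<end>\d+): "
    r"Error: (?P<msg>.*)$"
)


def parse_error(text: str) -> Optional[NftError]:
    """Parse one error header line; return None for any other line."""
    match = _PATTERN.match(text.strip())
    if match is None:
        return None
    return NftError(
        path=match.group("path"),
        line=int(match.group("line")),
        col_start=int(match.group("start")),
        col_end=int(match.group("end")),
        message=match.group("msg"),
    )
```

Lines that merely echo the offending rule, such as `tcp dport sip-tls accept`, return None and can be skipped.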


Downgrading to linux-image-6.1.0-11-amd64 resolves the issue.

Notes: I'm running a local rebuild of linux-image-amd64 with a few additional 
symbols enabled. But since these symbols are totally unrelated to the netfilter 
subsystem and there are no changes to the source itself, I'm certain this 
affects the original Debian build as well. Whether it only affects certain 
architectures or rulesets, I can't say, though.

I'm cc'ing debian-secur...@debian.org because the update came via the 
stable-security channel.


Thanks and regards,

Timo



Bug#922478: have yet to find an armhf board that works with 4.9.144-3

2019-02-18 Thread Timo Sigurdsson
Hi Cyril,

Cyril Brulebois schrieb am 18.02.2019 17:09:

> Based on this suggestion and Julien's suggested patch on IRC a couple
> hours ago, I've tested the attached patch successfully (as in: from a
> busy loop in qemu-system-arm to the “expected” kernel panic, as
> discussed in another subthread).
> 
> I've uploaded linux-image binaries (armmp and armmp-lpae) here, which
> were cross-built through sbuild, thanks to Vagrant's suggestion on IRC:
>  https://people.debian.org/~kibi/linux-bug-922478/
> 
> which is:
>  DEBIAN_KERNEL_DISABLE_DEBUG=yes sbuild -d stretch-proposed-updates -c
>  stretch-amd64-sbuild --build=amd64 --profiles='pkg.linux.notools nodoc
>  nopython cross pkg.linux.nosource' --host=armhf linux_4.9.144-4.dsc
> 
> Checking this on real hardware would be great, trying to put everyone
> involved in the loop through cc.
> 

Thanks for the patch and effort. I can confirm your kernel package boots fine 
on my LeMaker BananaPi.
  $ uname -a
  Linux bananapi 4.9.0-8-armmp-lpae #1 SMP Debian 4.9.144-4 (2019-02-18) armv7l GNU/Linux

journalctl also shows no errors that haven't been there before.


Regards,

Timo



Bug#922478: upgrade linux-image-4.9.0-8-armmp-lpae:armhf from 4.9.130-2 to 4.9.144-3 renders Bananapi and Lamobo R1 unbootable

2019-02-18 Thread Timo Sigurdsson
Hi,

On Mon, 18 Feb 2019 11:28:10 +, Neil Williams  wrote:
> Is it feasible to have a script in devscripts or similar which maps the
> version of the kernel *Candidate* to KernelCI URLs for the same
> version?
> 
> Can we correlate Debian kernel versions to something like
> https://kernelci.org/boot/all/job/stable-rc/branch/linux-4.9.y/kernel/v4.9.144/
> or
> https://kernelci.org/boot/all/job/stable-rc/kernel/v4.9.144/ ?

I just had another look and found they also have a job for the stable releases 
rather than the release candidates:
https://kernelci.org/build/stable/branch/linux-4.9.y/kernel/v4.9.144/

So, as long as the Debian kernel is based on a longterm kernel which is still 
supported upstream, the mapping should work.
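
A minimal sketch of such a mapping, assuming the Debian package version has the form `<upstream>-<revision>` (e.g. 4.9.144-3) with a three-part upstream version. The function name is hypothetical, and the URL layout is simply copied from the link above; I have not checked it against any KernelCI API.

```python
def kernelci_stable_url(debian_version):
    """Map a Debian kernel version like '4.9.144-3' to a KernelCI URL."""
    # Drop the Debian revision: "4.9.144-3" -> "4.9.144"
    upstream = debian_version.split("-", 1)[0]
    parts = upstream.split(".")
    if len(parts) != 3 or not all(part.isdigit() for part in parts):
        raise ValueError("unexpected version format: %r" % debian_version)
    # Stable branches are named after the first two components.
    branch = "linux-{}.{}.y".format(parts[0], parts[1])
    return "https://kernelci.org/build/stable/branch/{}/kernel/v{}/".format(
        branch, upstream
    )
```

For `4.9.144-3` this yields the same URL as the one given above. Versions carrying an epoch or other decorations would need extra handling.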

What might also be worth a thought, though, is to have such automated 
testing for the Debian kernels themselves, either by asking the Kernel CI project 
whether they'd be willing to build and test Debian kernels, too, or by setting 
up an infrastructure similar to theirs just for Debian. Now, I wouldn't expect 
Debian to have as much hardware to test on, but in this particular case it 
would have helped already to test the kernel in a virtualized setup. Based on 
this thread, it seems to me that the armhf kernels haven't received any boot 
testing prior to release. If that's really the case, I guess something along 
these lines might help Debian substantially.


Cheers,

Timo



Bug#922478: upgrade linux-image-4.9.0-8-armmp-lpae:armhf from 4.9.130-2 to 4.9.144-3 renders Bananapi and Lamobo R1 unbootable

2019-02-17 Thread Timo Sigurdsson
Hi,

Cyril Brulebois schrieb am 17.02.2019 19:38:

> Hi folks,
> 
> Jürgen Löb  (2019-02-16):
>> Package: linux-image-4.9.0-8-armmp-lpae
>> Version: 4.9.144-3
>> Severity: serious
>> 
>> Updated my Lamobo R1 board with apt update;apt upgrade
>> 
>> After the update uboot is stuck at "Starting kernel". There is no
>> further output after "Starting kernel". Same happens on Bananapi 1
>> board. Unfortunately there is no more useful information.
> […]
> 
> Summing up, it looks like everybody in cc is confirming the regression
> happens between 4.9.130-2 and 4.9.144-3, with and without lpae, on
> various boards. Any chance you could check what happens with the
> 4.9.135-1 intermediary version that can be found on snapshot.debian.org?
> 
>  https://snapshot.debian.org/package/linux/4.9.135-1/
> 
> This might help narrow it down when the regression happened.
> 
> (And please use reply-all so that everyone is kept in the loop.)

So, I also tested 4.9.135-1 on a Bananapi board and can confirm it works.

I would suspect the issue is caused by Debian's kernel configuration or 
changes. The Kernel CI project has ARM hardware, including the Bananapi board 
and does tests of stable kernel updates to verify that the kernel boots. At 
least with multi_v7_defconfig and sunxi_defconfig, upstream 4.9.144 does boot 
on Allwinner-based hardware, see: 
https://kernelci.org/soc/allwinner/job/stable-rc/kernel/v4.9.144/

On a side note: This issue makes me wonder if Debian's approach to kernel 
updates (i.e. not bumping the version number/ABI and overwriting the kernel 
image and modules) is really the best option. IMHO Ubuntu's handling of kernel 
updates is more robust. It would have made things much easier today if I could 
have simply selected an older kernel in the bootloader, rather than having to 
recover one from a backup.

Kind regards,

Timo



Bug#922478: upgrade linux-image-4.9.0-8-armmp-lpae:armhf from 4.9.130-2 to 4.9.144-3 renders Bananapi and Lamobo R1 unbootable

2019-02-17 Thread Timo Sigurdsson
Hi,

I've also been hit by this bug on two systems (both are LeMaker Bananapi). The 
first system upgraded the kernel via unattended-upgrades and failed to come up 
after reboot. I don't have a serial cable, but I did hook up the board to an 
HDMI display. U-Boot loads the kernel, dtb and initramfs and starts the kernel, 
but that's the last message and nothing happens after that. My first 
suspicion was that something went wrong during the upgrade and the reboot might 
have happened before everything was configured. But I also tried manual package 
upgrade with a second device via apt update && apt full-upgrade followed by a 
manual reboot. That system didn't boot either.

I recovered both systems by replacing the contents of the directories /boot/ 
and /lib/modules/ with those of a recent backup (taken 3 days ago). After 
logging into the systems again, I downgraded the package 
linux-image-4.9.0-8-armmp-lpae to 4.9.130-2 and rebooted again in order to make 
sure no other package upgrade caused the issue. Indeed, with all packages 
up-to-date except linux-image-4.9.0-8-armmp-lpae, the systems work just fine.

So, there must be a serious regression in 4.9.144-3 at least on armmp-lpae. My 
amd64 systems run fine, btw, even with the latest kernel.


Thanks,

Timo