Bug#883413: linux-image-4.14.0-1-amd64: WARN_ON_ONCE in page_counter_cancel() in mm/page_counter.c

2018-01-04 Chris Boot
On 30/12/17 23:24, Chris Boot wrote:
> What makes me suspicious that these are related is that neither happens
> with a 4.13 kernel, but I get both of these cgroup-related problems with
> 4.14.
> 
> I wouldn't mind trying to bisect this, but I haven't done that for many
> years. Is there a nice way to do this with the Debian packaging or am I
> better off seeing if I can reproduce with vanilla upstream kernels and
> bisecting that? Or shall I give 4.15~rc5 from experimental a whirl instead?

I tried with linux-image-4.15.0-rc5-amd64_4.15~rc5-1~exp1 and my cgroup
issues no longer happen, so I think this is likely fixed in 4.15.

Unfortunately I'm now running into a KVM instability that feels like
#885166, so I'm going to go back to 4.13 shortly.

Cheers,
Chris

-- 
Chris Boot
bo...@debian.org

GPG: 8467 53CB 1921 3142 C56D  C918 F5C8 3C05 D9CE 



2017-12-30 Chris Boot
On 25/12/17 23:09, Ben Hutchings wrote:
> On Sat, 2017-12-23 at 12:42 +, Chris Boot wrote:
>> Severity: serious
>> Justification: kernel panic
>>
>> I experimented a little and disabled cgroupv2 on that server. Because I 
>> had some issues during boot I attempted to enable 
>> NetworkManager-wait-online.service using systemd, but that instantly 
>> resulted in the following kernel panic:
> [...]
>> I don't know that this is the same bug at all, but I'm keeping it on
>> this report for now as it seems at least related somehow.
> 
> The log messages don't look even slightly related, so please move this 
> to a separate bug report.

I'm still not so certain - both sets of stack dumps fall somewhere
within cgroup space, and disabling systemd's cgroup accounting (not
enabled by default) avoids these conditions.

I like to run this system with the following all enabled in
/etc/systemd/system.conf:

DefaultCPUAccounting=yes
DefaultIOAccounting=yes
DefaultBlockIOAccounting=yes
DefaultMemoryAccounting=yes

These are useful for tools like systemd-cgtop for example.

With cgroupv2, I can avoid the error by disabling
DefaultMemoryAccounting. I was running for nearly 48 hours with this
configuration before rebooting to try without cgroupv2.

Without cgroupv2, it's DefaultCPUAccounting I need to disable to avoid
the panics when I run 'systemctl daemon-reload'. I have yet to run into
the warning or OOM killer with memory accounting enabled but I'll let
you know if it happens.

What makes me suspicious that these are related is that neither happens
with a 4.13 kernel, but I get both of these cgroup-related problems with
4.14.

I wouldn't mind trying to bisect this, but I haven't done that for many
years. Is there a nice way to do this with the Debian packaging or am I
better off seeing if I can reproduce with vanilla upstream kernels and
bisecting that? Or shall I give 4.15~rc5 from experimental a whirl instead?

Thanks,
Chris

-- 
Chris Boot
bo...@debian.org



2017-12-25 Ben Hutchings
On Sat, 2017-12-23 at 12:42 +, Chris Boot wrote:
> Severity: serious
> Justification: kernel panic
> 
> I experimented a little and disabled cgroupv2 on that server. Because I 
> had some issues during boot I attempted to enable 
> NetworkManager-wait-online.service using systemd, but that instantly 
> resulted in the following kernel panic:
[...]
> I don't know that this is the same bug at all, but I'm keeping it on
> this report for now as it seems at least related somehow.

The log messages don't look even slightly related, so please move this 
to a separate bug report.

Ben.

-- 
Ben Hutchings
The world is coming to an end.  Please log off.


