Re: Recent kernel anomalies
Hi,

On Fri, Nov 16, 2018 at 04:07:32PM +0100, Daniel Bilik wrote:
> On Fri, 16 Nov 2018 16:29:15 +0200
> "karu.pruun" wrote:
>
> > Can you specify what graphics hw are you using?
>
> vgapci0@pci0:0:2:0: class=0x03 card=0x222617aa chip=0x16168086 rev=0x09 hdr=0x00
>     vendor = 'Intel Corporation'
>     device = 'HD Graphics 5500'

This means a Broadwell GPU.

I have a Broadwell laptop on hand; I will try to reproduce and fix whatever
is going wrong.

Cheers,

--
Francois Tigeot
Re: Recent kernel anomalies
On Fri, 16 Nov 2018 16:29:15 +0200
"karu.pruun" wrote:

> Can you specify what graphics hw are you using?

Sure...

vgapci0@pci0:0:2:0: class=0x03 card=0x222617aa chip=0x16168086 rev=0x09 hdr=0x00
    vendor = 'Intel Corporation'
    device = 'HD Graphics 5500'
    class = display
    subclass = VGA

--
Daniel
Re: Recent kernel anomalies
On Fri, Nov 16, 2018 at 3:35 PM Daniel Bilik wrote:
>
> On Fri, 16 Nov 2018 10:49:33 +
> Antonio Huete Jiménez wrote:
>
> > Would you mind pinpointing the commit that may be causing the problems?
> > You can use 'git bisect' for it.
>
> Done, bisect pointed me to commit c9f83a7+:
>
> Author: Francois Tigeot
> Date: Mon Nov 5 22:15:18 2018 +0100
>
>     drm/linux: Fix vmap()
>
>     Page protection information was not being used.
>
> Prior to this change, chromium is stable for me. This modification makes
> chromium crash on various signals (seen 4, 6, 10 and 11). I guess the
> update causes memory corruption under some circumstances, making the
> kernel misbehave (which can possibly lead to a filesystem not being
> unmounted properly).

Can you specify what graphics hw are you using?

Peeter

--
Re: dhcpcd now in dfly - disabled by default, tests needed
On Fri, 16 Nov 2018 21:52:58 +0800
Aaron LI wrote:

> One more thing, I don't know whether you tell dhcpcd to obtain *both*
> IPv4 and IPv6.

It is configured to get just an IPv4 address.

After several more reboots (needed to bisect another problem, see thread
"Recent kernel anomalies" ;-)), I've hit a state where dhcpcd was unable to
acquire an address even with custom flags. There were more errors than just
"transition lost"...

wlan0: ieee80211_new_state_locked: pending SCAN -> AUTH transition lost
iwm0: device timeout
iwm0: dumping device error log
iwm0: errlog not found, skipping
iwm0: could not initiate scan
iwm0: could not initiate scan
iwm0: could not initiate scan

... and the system was left without an address, until I manually restarted
networking via "/etc/rc.d/netif restart".

Reverting back to dhclient made getting the connectivity smooth again for
me.

HTH.

--
Daniel
Re: dhcpcd now in dfly - disabled by default, tests needed
On Fri, 16 Nov 2018 10:10:59 +0100
Daniel Bilik wrote:

> On Fri, 16 Nov 2018 15:59:47 +0800
> Aaron LI wrote:
>
> > Just reporting the success/failure on this mailing list is OK
>
> OK, here we go... I've just tested a setup with dhcpcd following
> instructions from your original post. The system really did choose dhcpcd
> instead of dhclient during boot, but was unable to acquire an address.
> This is an excerpt from a log:
>
> ---
> Starting dhcpcd.
> em0: waiting for carrier
>
> (notably long delay here)
>
> timed out
> dhcpcd exited

OK, dhcpcd will wait for a lease until timeout (seems 30 seconds).
Adding the '-b' option tells dhcpcd to fork to background immediately.

> wlan0: driver didn't set altq_maxlen
> wlan0: MAC address: 34:02:xx:xx:xx:xx
> Starting wpa_supplicant.
> Starting dhcpcd.
> wlan0: waiting for carrier
> wlan0: ieee80211_new_state_locked: pending SCAN -> AUTH transition lost
> wlan0: carrier acquired
> DUID 00:01:00:01:23:81:31:88:34:02:xx:xx:xx:xx
> wlan0: IAID xx:xx:xx:xx
> wlan0: soliciting a DHCP lease
> wlan0: offered 10.x.x.x from 10.x.x.x
> wlan0: probing address 10.x.x.x/21
> timed out
> dhcpcd exited
> ---

I don't know why dhcpcd exited here. I'll find time to test it on my
ThinkPad X200 for the wireless part.

> Timeout on em0 was expected as the cable was not connected, just the delay
> waiting for a carrier was annoying. But dhcpcd failed to get an address
> also on wlan0, even though it got link.
>
> After adding dhcpcd_flags="-b -t 0" to rc.conf, the system successfully
> gets an address during boot. But comparing it to dhclient, it takes a
> little longer to get the connectivity. I suspect this is related to
> "transition lost" error (see above) that I don't get with dhclient.

According to the dhcpcd(8) man page, '-t 0' seems to mean waiting
forever... you can test with other values for the '-t' option.

One more thing, I don't know whether you tell dhcpcd to obtain *both*
IPv4 and IPv6. If so, I expect dhcpcd can take a bit longer than dhclient,
which only supports IPv4.

Thanks for the nice report.

Cheers,

--
Aaron
Re: Recent kernel anomalies
On Fri, 16 Nov 2018 10:49:33 +
Antonio Huete Jiménez wrote:

> Would you mind pinpointing the commit that may be causing the problems?
> You can use 'git bisect' for it.

Done, bisect pointed me to commit c9f83a7+:

Author: Francois Tigeot
Date: Mon Nov 5 22:15:18 2018 +0100

    drm/linux: Fix vmap()

    Page protection information was not being used.

Prior to this change, chromium is stable for me. This modification makes
chromium crash on various signals (seen 4, 6, 10 and 11). I guess the
update causes memory corruption under some circumstances, making the
kernel misbehave (which can possibly lead to a filesystem not being
unmounted properly).

--
Daniel
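
For readers unfamiliar with how a bisect narrows down a regression, the search can be sketched as a binary search over an ordered list of commits. This is only an illustration under stated assumptions: the history is treated as linear, the helper names (`bisect_first_bad`, `crashes`) and the `placeholder-*` entries are hypothetical, and only the hashes 5af112a, c9f83a7 and 21b2a00 come from this thread. Real 'git bisect' works on the commit graph and asks the user to build and test each step.

```python
def bisect_first_bad(commits, is_bad):
    """Return the first bad commit in an oldest-to-newest list.

    Assumes commits[0] is known good, commits[-1] is known bad, and the
    defect stays present in every commit after the one that introduced it.
    """
    lo, hi = 0, len(commits) - 1  # lo: known good, hi: known bad
    while hi - lo > 1:
        mid = (lo + hi) // 2
        if is_bad(commits[mid]):
            hi = mid  # defect already present, look earlier
        else:
            lo = mid  # still good, look later
    return commits[hi]


# Hypothetical linear history. The placeholder entries are made up;
# the endpoints and the culprit are the hashes quoted in the thread.
HISTORY = [
    "5af112a",        # kernel.old from Oct 31, reported good
    "placeholder-1",
    "placeholder-2",
    "c9f83a7",        # "drm/linux: Fix vmap()"
    "placeholder-3",
    "21b2a00",        # current kernel, reported bad
]


def crashes(commit):
    """Stand-in for 'build this kernel, boot it, and run chromium'."""
    return HISTORY.index(commit) >= HISTORY.index("c9f83a7")


if __name__ == "__main__":
    print(bisect_first_bad(HISTORY, crashes))
```

Each round halves the remaining range, so even a few hundred commits between the good and bad kernels need only a handful of build-and-test cycles.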
Re: Recent kernel anomalies
As of 8 Nov (d1dbb0fb), I can't see these issues. This suggests the need to
bisect between 8 and 16 Nov.

Cheers

Peeter

--

On Fri, Nov 16, 2018 at 12:49 PM Antonio Huete Jiménez wrote:
>
> Hi Daniel,
>
> Would you mind pinpointing the commit that may be causing the problems?
> You can use 'git bisect' for it.
>
> Regards,
> Antonio Huete
>
> Daniel Bilik wrote:
>
> > Hi.
> >
> > After updating to current 5.3-DEVELOPMENT (21b2a00+), chromium started to
> > crash on me. Well, not really chromium itself, but "just" extensions are
> > crashing during browser activity, taking down chromium child processes,
> > with the kernel reporting something like this:
> >
> > pid 1123 (chrome), uid 1001: exited on signal 10
> > pid 1717 (chrome), uid 1001: exited on signal 4
> > pid 1512 (chrome), uid 1002: exited on signal 4
> > pid 1146 (chrome), uid 1001: exited on signal 6
> > pid 1149 (chrome), uid 1001: exited on signal 10
> > pid 1811 (chrome), uid 1001: exited on signal 11
> > pid 1724 (chrome), uid 1001: exited on signal 4
> > pid 1806 (chrome), uid 1002: exited on signal 11
> > pid 1310 (chrome), uid 1001: exited on signal 10
> > pid 1343 (chrome), uid 1002: exited on signal 10
> > pid 1021 (chrome), uid 1001: exited on signal 11
> >
> > I've also noticed another anomalous behaviour with this kernel...
> >
> > On reboot, ssh-agent(1) leaves its socket lying on the filesystem, which
> > prevents it from starting after next system boot. And, one time, the
> > Windowmaker state file was sort-of reset on reboot, and I had to recover
> > it from a snapshot.
> >
> > Other applications did not seem to have problems, or at least I did not
> > hit any more.
> >
> > Using kernel.old (5af112a+ from Oct 31) - just kernel, userland remains
> > current (21b2a00+) - everything is back to normal, i.e. chromium processes
> > are not crashing and ssh-agent cleans its socket on reboot.
> >
> > From Oct 31 to now, there have been some modifications to kernel, and also
> > some vfs related ones. But looking at particular commits, I'm not sure
> > what to try to revert. Any hints?
> >
> > Thanks.
> >
> > --
> > Daniel
Re: Recent kernel anomalies
Hi Daniel,

Would you mind pinpointing the commit that may be causing the problems?
You can use 'git bisect' for it.

Regards,
Antonio Huete

Daniel Bilik wrote:

> Hi.
>
> After updating to current 5.3-DEVELOPMENT (21b2a00+), chromium started to
> crash on me. Well, not really chromium itself, but "just" extensions are
> crashing during browser activity, taking down chromium child processes,
> with the kernel reporting something like this:
>
> pid 1123 (chrome), uid 1001: exited on signal 10
> pid 1717 (chrome), uid 1001: exited on signal 4
> pid 1512 (chrome), uid 1002: exited on signal 4
> pid 1146 (chrome), uid 1001: exited on signal 6
> pid 1149 (chrome), uid 1001: exited on signal 10
> pid 1811 (chrome), uid 1001: exited on signal 11
> pid 1724 (chrome), uid 1001: exited on signal 4
> pid 1806 (chrome), uid 1002: exited on signal 11
> pid 1310 (chrome), uid 1001: exited on signal 10
> pid 1343 (chrome), uid 1002: exited on signal 10
> pid 1021 (chrome), uid 1001: exited on signal 11
>
> I've also noticed another anomalous behaviour with this kernel...
>
> On reboot, ssh-agent(1) leaves its socket lying on the filesystem, which
> prevents it from starting after next system boot. And, one time, the
> Windowmaker state file was sort-of reset on reboot, and I had to recover
> it from a snapshot.
>
> Other applications did not seem to have problems, or at least I did not
> hit any more.
>
> Using kernel.old (5af112a+ from Oct 31) - just kernel, userland remains
> current (21b2a00+) - everything is back to normal, i.e. chromium processes
> are not crashing and ssh-agent cleans its socket on reboot.
>
> From Oct 31 to now, there have been some modifications to kernel, and also
> some vfs related ones. But looking at particular commits, I'm not sure
> what to try to revert. Any hints?
>
> Thanks.
>
> --
> Daniel
Recent kernel anomalies
Hi.

After updating to current 5.3-DEVELOPMENT (21b2a00+), chromium started to
crash on me. Well, not really chromium itself, but "just" extensions are
crashing during browser activity, taking down chromium child processes,
with the kernel reporting something like this:

pid 1123 (chrome), uid 1001: exited on signal 10
pid 1717 (chrome), uid 1001: exited on signal 4
pid 1512 (chrome), uid 1002: exited on signal 4
pid 1146 (chrome), uid 1001: exited on signal 6
pid 1149 (chrome), uid 1001: exited on signal 10
pid 1811 (chrome), uid 1001: exited on signal 11
pid 1724 (chrome), uid 1001: exited on signal 4
pid 1806 (chrome), uid 1002: exited on signal 11
pid 1310 (chrome), uid 1001: exited on signal 10
pid 1343 (chrome), uid 1002: exited on signal 10
pid 1021 (chrome), uid 1001: exited on signal 11

I've also noticed another anomalous behaviour with this kernel...

On reboot, ssh-agent(1) leaves its socket lying on the filesystem, which
prevents it from starting after next system boot. And, one time, the
Windowmaker state file was sort-of reset on reboot, and I had to recover
it from a snapshot.

Other applications did not seem to have problems, or at least I did not
hit any more.

Using kernel.old (5af112a+ from Oct 31) - just kernel, userland remains
current (21b2a00+) - everything is back to normal, i.e. chromium processes
are not crashing and ssh-agent cleans its socket on reboot.

From Oct 31 to now, there have been some modifications to kernel, and also
some vfs related ones. But looking at particular commits, I'm not sure
what to try to revert. Any hints?

Thanks.

--
Daniel
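
As a reading aid for the kernel messages in the report above, the following sketch tallies the exits by signal and attaches the conventional BSD signal names. It assumes DragonFly uses the usual BSD numbering (4 = SIGILL, 6 = SIGABRT, 10 = SIGBUS, 11 = SIGSEGV); the function and variable names are hypothetical, and the log text is copied from the report.

```python
import re
from collections import Counter

# Conventional BSD signal numbers; assumed here to apply to DragonFly.
BSD_SIGNALS = {4: "SIGILL", 6: "SIGABRT", 10: "SIGBUS", 11: "SIGSEGV"}

# Matches kernel messages of the form quoted in the report.
EXIT_RE = re.compile(r"pid (\d+) \(([^)]+)\), uid (\d+): exited on signal (\d+)")

LOG = """\
pid 1123 (chrome), uid 1001: exited on signal 10
pid 1717 (chrome), uid 1001: exited on signal 4
pid 1512 (chrome), uid 1002: exited on signal 4
pid 1146 (chrome), uid 1001: exited on signal 6
pid 1149 (chrome), uid 1001: exited on signal 10
pid 1811 (chrome), uid 1001: exited on signal 11
pid 1724 (chrome), uid 1001: exited on signal 4
pid 1806 (chrome), uid 1002: exited on signal 11
pid 1310 (chrome), uid 1001: exited on signal 10
pid 1343 (chrome), uid 1002: exited on signal 10
pid 1021 (chrome), uid 1001: exited on signal 11
"""


def tally_signals(log_text):
    """Count fatal exits per (process name, signal number)."""
    counts = Counter()
    for match in EXIT_RE.finditer(log_text):
        name, signo = match.group(2), int(match.group(4))
        counts[(name, signo)] += 1
    return counts


if __name__ == "__main__":
    for (name, signo), count in sorted(tally_signals(LOG).items()):
        label = BSD_SIGNALS.get(signo, "unknown")
        print(f"{name}: signal {signo} ({label}) x{count}")
```

A mix of illegal-instruction, abort, bus-error and segmentation-fault exits in the same program is the kind of pattern that tends to point at corrupted memory rather than at one specific application bug.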
Re: dhcpcd now in dfly - disabled by default, tests needed
On Fri, 16 Nov 2018 15:59:47 +0800
Aaron LI wrote:

> Just reporting the success/failure on this mailing list is OK

OK, here we go... I've just tested a setup with dhcpcd following
instructions from your original post. The system really did choose dhcpcd
instead of dhclient during boot, but was unable to acquire an address.
This is an excerpt from a log:

---
Starting dhcpcd.
em0: waiting for carrier

(notably long delay here)

timed out
dhcpcd exited
wlan0: driver didn't set altq_maxlen
wlan0: MAC address: 34:02:xx:xx:xx:xx
Starting wpa_supplicant.
Starting dhcpcd.
wlan0: waiting for carrier
wlan0: ieee80211_new_state_locked: pending SCAN -> AUTH transition lost
wlan0: carrier acquired
DUID 00:01:00:01:23:81:31:88:34:02:xx:xx:xx:xx
wlan0: IAID xx:xx:xx:xx
wlan0: soliciting a DHCP lease
wlan0: offered 10.x.x.x from 10.x.x.x
wlan0: probing address 10.x.x.x/21
timed out
dhcpcd exited
---

Timeout on em0 was expected as the cable was not connected, just the delay
waiting for a carrier was annoying. But dhcpcd failed to get an address
also on wlan0, even though it got link.

After adding dhcpcd_flags="-b -t 0" to rc.conf, the system successfully
gets an address during boot. But comparing it to dhclient, it takes a
little longer to get the connectivity. I suspect this is related to
"transition lost" error (see above) that I don't get with dhclient.

--
Daniel
Re: dhcpcd now in dfly - disabled by default, tests needed
On Fri, 16 Nov 2018 08:53:00 +0100
Daniel Bilik wrote:

> > The imported dhcpcd is disabled by default. After more tests, we'll
> > enable it by default
>
> Where should we direct success and/or failure reports?

Hi Daniel,

Just reporting the success/failure on this mailing list is OK, as we don't
get much mail traffic here. If you have more logs or screenshots to show,
then it would be better to open an issue at:
https://bugs.dragonflybsd.org/

Thanks.

--
Aaron