Re: Recent kernel anomalies

2018-11-16 Thread Francois Tigeot
Hi,

On Fri, Nov 16, 2018 at 04:07:32PM +0100, Daniel Bilik wrote:
> On Fri, 16 Nov 2018 16:29:15 +0200
> "karu.pruun"  wrote:
> 
> > Can you specify what graphics hw are you using?
> 
> vgapci0@pci0:0:2:0:   class=0x03 card=0x222617aa chip=0x16168086 rev=0x09 hdr=0x00
> vendor = 'Intel Corporation'
> device = 'HD Graphics 5500'

This means a Broadwell GPU.

I have a Broadwell laptop on hand; I will try to reproduce and fix
whatever is going wrong.

Cheers,

-- 
Francois Tigeot



Re: Recent kernel anomalies

2018-11-16 Thread Daniel Bilik
On Fri, 16 Nov 2018 16:29:15 +0200
"karu.pruun"  wrote:

> Can you specify what graphics hw are you using?

Sure...

vgapci0@pci0:0:2:0: class=0x03 card=0x222617aa chip=0x16168086 rev=0x09 hdr=0x00
vendor = 'Intel Corporation'
device = 'HD Graphics 5500'
class  = display
subclass   = VGA

--
Daniel


Re: Recent kernel anomalies

2018-11-16 Thread karu.pruun
On Fri, Nov 16, 2018 at 3:35 PM Daniel Bilik  wrote:
>
> On Fri, 16 Nov 2018 10:49:33 +
> Antonio Huete Jiménez  wrote:
>
> > Would you mind pinpointing the commit that may be causing the problems?
> > You can use 'git bisect' for it.
>
> Done, bisect pointed me to commit c9f83a7+:
>
> Author: Francois Tigeot 
> Date:   Mon Nov 5 22:15:18 2018 +0100
>
> drm/linux: Fix vmap()
>
> Page protection information was not being used.
>
> Prior to this change, chromium was stable for me. This modification makes
> chromium die on various signals (seen 4, 6, 10 and 11). I guess the
> update causes memory corruption under some circumstances, making the
> kernel misbehave (which could possibly lead to a filesystem not being
> unmounted properly).

Can you specify what graphics hw are you using?

Peeter

--


Re: dhcpcd now in dfly - disabled by default, tests needed

2018-11-16 Thread Daniel Bilik
On Fri, 16 Nov 2018 21:52:58 +0800
Aaron LI  wrote:

> One more thing: I don't know whether you told dhcpcd to obtain *both*
> IPv4 and IPv6.

Configured to get just an IPv4 address.

After several more reboots (needed to bisect another problem, see thread
"Recent kernel anomalies" ;-)), I hit a state where dhcpcd was unable
to acquire an address even with the custom flags. There were more errors
than just "transition lost"...

wlan0: ieee80211_new_state_locked: pending SCAN -> AUTH transition lost
iwm0: device timeout
iwm0: dumping device error log
iwm0: errlog not found, skipping
iwm0: could not initiate scan
iwm0: could not initiate scan
iwm0: could not initiate scan

... and the system was left without an address until I manually
restarted networking via "/etc/rc.d/netif restart".

Reverting to dhclient made getting connectivity smooth again for
me.

HTH.

--
Daniel


Re: dhcpcd now in dfly - disabled by default, tests needed

2018-11-16 Thread Aaron LI
On Fri, 16 Nov 2018 10:10:59 +0100
Daniel Bilik  wrote:

> On Fri, 16 Nov 2018 15:59:47 +0800
> Aaron LI  wrote:
> 
> > Just reporting success/failure on this mailing list is OK
> 
> OK, here we go... I've just tested a setup with dhcpcd following the
> instructions from your original post. The system did choose dhcpcd
> instead of dhclient during boot, but was unable to acquire an address.
> This is an excerpt from the log:
> 
> ---
> Starting dhcpcd.
> em0: waiting for carrier
> 
> (noticeably long delay here)
> 
> timed out
> 
> dhcpcd exited

OK, dhcpcd waits for a lease until it times out (apparently after 30 seconds).
Adding the '-b' option tells dhcpcd to fork to the background immediately.

> wlan0: driver didn't set altq_maxlen
> wlan0: MAC address: 34:02:xx:xx:xx:xx
> Starting wpa_supplicant.
> Starting dhcpcd.
> wlan0: waiting for carrier
> 
> wlan0: ieee80211_new_state_locked: pending SCAN -> AUTH transition lost
> wlan0: carrier acquired
> 
> DUID 00:01:00:01:23:81:31:88:34:02:xx:xx:xx:xx
> 
> wlan0: IAID xx:xx:xx:xx
> 
> wlan0: soliciting a DHCP lease
> 
> wlan0: offered 10.x.x.x from 10.x.x.x
> 
> wlan0: probing address 10.x.x.x/21
> 
> timed out
> 
> dhcpcd exited
> ---

I don't know why dhcpcd exited here.  I'll find time to test it on my
ThinkPad X200 for the wireless part.

> The timeout on em0 was expected, as the cable was not connected; only the
> delay waiting for a carrier was annoying. But dhcpcd also failed to get
> an address on wlan0, even though it got a link.
> 
> After adding dhcpcd_flags="-b -t 0" to rc.conf, the system successfully
> gets an address during boot. But compared to dhclient, it takes a
> little longer to get connectivity. I suspect this is related to the
> "transition lost" error (see above), which I don't get with dhclient.

According to the dhcpcd(8) man page, '-t 0' means waiting forever...
You can test other values for the '-t' option.

One more thing: I don't know whether you told dhcpcd to obtain *both* IPv4
and IPv6.  If so, I would expect dhcpcd to take a bit longer than dhclient,
which only supports IPv4.

Thanks for the nice report.


Cheers,
-- 
Aaron




Re: Recent kernel anomalies

2018-11-16 Thread Daniel Bilik
On Fri, 16 Nov 2018 10:49:33 +
Antonio Huete Jiménez  wrote:

> Would you mind pinpointing the commit that may be causing the problems?
> You can use 'git bisect' for it.

Done, bisect pointed me to commit c9f83a7+:

Author: Francois Tigeot 
Date:   Mon Nov 5 22:15:18 2018 +0100

drm/linux: Fix vmap()

Page protection information was not being used.

Prior to this change, chromium was stable for me. This modification makes
chromium die on various signals (seen 4, 6, 10 and 11). I guess the
update causes memory corruption under some circumstances, making the
kernel misbehave (which could possibly lead to a filesystem not being
unmounted properly).

--
Daniel


Re: Recent kernel anomalies

2018-11-16 Thread karu.pruun
As of 8 Nov (d1dbb0fb), I can't see these issues. This suggests the
need to bisect between 8 and 16 Nov.

Cheers

Peeter

--

On Fri, Nov 16, 2018 at 12:49 PM Antonio Huete Jiménez
 wrote:
>
> Hi Daniel,
>
> Would you mind pinpointing the commit that may be causing the problems?
> You can use 'git bisect' for it.
>
> Regards,
> Antonio Huete
>
> Daniel Bilik  wrote:
>
> > Hi.
> >
> > After updating to current 5.3-DEVELOPMENT (21b2a00+), chromium started to
> > crash on me. Well, not really chromium itself, but "just" extensions are
> > crashing during browser activity, taking down chromium child processes,
> > with the kernel reporting something like this:
> >
> > pid 1123 (chrome), uid 1001: exited on signal 10
> > pid 1717 (chrome), uid 1001: exited on signal 4
> > pid 1512 (chrome), uid 1002: exited on signal 4
> > pid 1146 (chrome), uid 1001: exited on signal 6
> > pid 1149 (chrome), uid 1001: exited on signal 10
> > pid 1811 (chrome), uid 1001: exited on signal 11
> > pid 1724 (chrome), uid 1001: exited on signal 4
> > pid 1806 (chrome), uid 1002: exited on signal 11
> > pid 1310 (chrome), uid 1001: exited on signal 10
> > pid 1343 (chrome), uid 1002: exited on signal 10
> > pid 1021 (chrome), uid 1001: exited on signal 11
> >
> > I've also noticed another anomalous behaviour with this kernel...
> >
> > On reboot, ssh-agent(1) leaves its socket lying on the filesystem, which
> > prevents it from starting after the next system boot. And once, the
> > Windowmaker state file was sort of reset on reboot, and I had to recover
> > it from a snapshot.
> >
> > Other applications did not seem to have problems, or at least I did not
> > run into any.
> >
> > Using kernel.old (5af112a+ from Oct 31), with only the kernel swapped and
> > userland still at current (21b2a00+), everything is back to normal, i.e.
> > chromium processes are not crashing and ssh-agent cleans up its socket on
> > reboot.
> >
> > From Oct 31 to now, there have been some modifications to the kernel,
> > including some VFS-related ones. But looking at the particular commits,
> > I'm not sure what to try to revert. Any hints?
> >
> > Thanks.
> >
> > --
> >   Daniel
>
>
>


Re: Recent kernel anomalies

2018-11-16 Thread Antonio Huete Jiménez

Hi Daniel,

Would you mind pinpointing the commit that may be causing the problems?
You can use 'git bisect' for it.

Regards,
Antonio Huete

Daniel Bilik  wrote:

> Hi.
>
> After updating to current 5.3-DEVELOPMENT (21b2a00+), chromium started to
> crash on me. Well, not really chromium itself, but "just" extensions are
> crashing during browser activity, taking down chromium child processes,
> with the kernel reporting something like this:
>
> pid 1123 (chrome), uid 1001: exited on signal 10
> pid 1717 (chrome), uid 1001: exited on signal 4
> pid 1512 (chrome), uid 1002: exited on signal 4
> pid 1146 (chrome), uid 1001: exited on signal 6
> pid 1149 (chrome), uid 1001: exited on signal 10
> pid 1811 (chrome), uid 1001: exited on signal 11
> pid 1724 (chrome), uid 1001: exited on signal 4
> pid 1806 (chrome), uid 1002: exited on signal 11
> pid 1310 (chrome), uid 1001: exited on signal 10
> pid 1343 (chrome), uid 1002: exited on signal 10
> pid 1021 (chrome), uid 1001: exited on signal 11
>
> I've also noticed another anomalous behaviour with this kernel...
>
> On reboot, ssh-agent(1) leaves its socket lying on the filesystem, which
> prevents it from starting after the next system boot. And once, the
> Windowmaker state file was sort of reset on reboot, and I had to recover
> it from a snapshot.
>
> Other applications did not seem to have problems, or at least I did not
> run into any.
>
> Using kernel.old (5af112a+ from Oct 31), with only the kernel swapped and
> userland still at current (21b2a00+), everything is back to normal, i.e.
> chromium processes are not crashing and ssh-agent cleans up its socket on
> reboot.
>
> From Oct 31 to now, there have been some modifications to the kernel,
> including some VFS-related ones. But looking at the particular commits,
> I'm not sure what to try to revert. Any hints?
>
> Thanks.
>
> --
> Daniel






Recent kernel anomalies

2018-11-16 Thread Daniel Bilik
Hi.

After updating to current 5.3-DEVELOPMENT (21b2a00+), chromium started to
crash on me. Well, not really chromium itself, but "just" extensions are
crashing during browser activity, taking down chromium child processes,
with the kernel reporting something like this:

pid 1123 (chrome), uid 1001: exited on signal 10
pid 1717 (chrome), uid 1001: exited on signal 4
pid 1512 (chrome), uid 1002: exited on signal 4
pid 1146 (chrome), uid 1001: exited on signal 6
pid 1149 (chrome), uid 1001: exited on signal 10
pid 1811 (chrome), uid 1001: exited on signal 11
pid 1724 (chrome), uid 1001: exited on signal 4
pid 1806 (chrome), uid 1002: exited on signal 11
pid 1310 (chrome), uid 1001: exited on signal 10
pid 1343 (chrome), uid 1002: exited on signal 10
pid 1021 (chrome), uid 1001: exited on signal 11

I've also noticed another anomalous behaviour with this kernel...

On reboot, ssh-agent(1) leaves its socket lying on the filesystem, which
prevents it from starting after the next system boot. And once, the
Windowmaker state file was sort of reset on reboot, and I had to recover
it from a snapshot.

Other applications did not seem to have problems, or at least I did not
run into any.

Using kernel.old (5af112a+ from Oct 31), with only the kernel swapped and
userland still at current (21b2a00+), everything is back to normal, i.e.
chromium processes are not crashing and ssh-agent cleans up its socket on
reboot.

From Oct 31 to now, there have been some modifications to the kernel,
including some VFS-related ones. But looking at the particular commits,
I'm not sure what to try to revert. Any hints?

Thanks.

--
Daniel


Re: dhcpcd now in dfly - disabled by default, tests needed

2018-11-16 Thread Daniel Bilik
On Fri, 16 Nov 2018 15:59:47 +0800
Aaron LI  wrote:

> Just reporting success/failure on this mailing list is OK

OK, here we go... I've just tested a setup with dhcpcd following the
instructions from your original post. The system did choose dhcpcd
instead of dhclient during boot, but was unable to acquire an address.
This is an excerpt from the log:

---
Starting dhcpcd.
em0: waiting for carrier

(noticeably long delay here)

timed out

dhcpcd exited

wlan0: driver didn't set altq_maxlen
wlan0: MAC address: 34:02:xx:xx:xx:xx
Starting wpa_supplicant.
Starting dhcpcd.
wlan0: waiting for carrier

wlan0: ieee80211_new_state_locked: pending SCAN -> AUTH transition lost
wlan0: carrier acquired

DUID 00:01:00:01:23:81:31:88:34:02:xx:xx:xx:xx

wlan0: IAID xx:xx:xx:xx

wlan0: soliciting a DHCP lease

wlan0: offered 10.x.x.x from 10.x.x.x

wlan0: probing address 10.x.x.x/21

timed out

dhcpcd exited
---

The timeout on em0 was expected, as the cable was not connected; only the
delay waiting for a carrier was annoying. But dhcpcd also failed to get
an address on wlan0, even though it got a link.

After adding dhcpcd_flags="-b -t 0" to rc.conf, the system successfully
gets an address during boot. But compared to dhclient, it takes a
little longer to get connectivity. I suspect this is related to the
"transition lost" error (see above), which I don't get with dhclient.

--
Daniel


Re: dhcpcd now in dfly - disabled by default, tests needed

2018-11-16 Thread Aaron LI
On Fri, 16 Nov 2018 08:53:00 +0100
Daniel Bilik  wrote:
> 
> > The imported dhcpcd is disabled by default.  After more tests, we'll
> > enable it by default  
> 
> Where should we direct success and/or failure reports?

Hi Daniel,

Just reporting success/failure on this mailing list is OK, as we don't get
a large volume of mail.  If you have more logs or screenshots to show, it
would be better to open an issue at: https://bugs.dragonflybsd.org/

Thanks.

-- 
Aaron

