Re: Panic on a -current from 13/12/2018

2018-12-21 Thread Chavdar Ivanov
On Fri, 21 Dec 2018 at 22:05, Cherry G. Mathew  wrote:
>
> On December 22, 2018 2:24:44 AM GMT+05:30, Chavdar Ivanov  
> wrote:
> ...
> >
> >It is interesting also that when NetBSD is run under XenServer (XCP-NG
> >actually) in PV mode, benchmarked against the same 8.99.28 version
> >running on a physical machine, everything on a 1Gb interface and
> >switch, I get a fully saturated line (~933Mb/s). When the iperf3
> >server is on the same XCP-NG guest and the client - a CentOS guest -
> >the figures approach 2.3Gb/sec.
> >
>
> Do you have jumbo frames on the CentOS VM?
Not as far as I can see:
...
2: eth0:  mtu 1500 qdisc mq state UP group default qlen 1000
    link/ether da:6f:e3:73:da:ce brd ff:ff:ff:ff:ff:ff
    inet 192.168.0.22/24 brd 192.168.0.255 scope global noprefixroute eth0
       valid_lft forever preferred_lft forever
    inet6 fe80::4975:2632:3eb9:916/64 scope link noprefixroute
       valid_lft forever preferred_lft forever
...

the iperf3 figures are as follows:
...

$ iperf3 -c n8x
Connecting to host n8x, port 5201
[  4] local 192.168.0.22 port 46924 connected to 192.168.0.202 port 5201
[ ID] Interval           Transfer     Bandwidth       Retr  Cwnd
[  4]   0.00-1.00   sec   184 MBytes  1.54 Gbits/sec    0   66.5 KBytes
[  4]   1.00-2.00   sec   251 MBytes  2.11 Gbits/sec    0    103 KBytes
[  4]   2.00-3.00   sec   291 MBytes  2.44 Gbits/sec    0    133 KBytes
[  4]   3.00-4.00   sec   329 MBytes  2.76 Gbits/sec    0    164 KBytes
[  4]   4.00-5.00   sec   334 MBytes  2.81 Gbits/sec    0    205 KBytes
[  4]   5.00-6.00   sec   289 MBytes  2.43 Gbits/sec    0    205 KBytes
[  4]   6.00-7.00   sec   327 MBytes  2.74 Gbits/sec    0    205 KBytes
[  4]   7.00-8.00   sec   329 MBytes  2.76 Gbits/sec    0    205 KBytes
[  4]   8.00-9.00   sec   325 MBytes  2.72 Gbits/sec    0    205 KBytes
[  4]   9.00-10.00  sec   331 MBytes  2.77 Gbits/sec    0    205 KBytes
- - - - - - - - - - - - - - - - - - - - - - - - -
[ ID] Interval           Transfer     Bandwidth       Retr
[  4]   0.00-10.00  sec  2.92 GBytes  2.51 Gbits/sec    0             sender
[  4]   0.00-10.00  sec  2.92 GBytes  2.51 Gbits/sec                  receiver

iperf Done.
...
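
As a sanity check on the units in the output above: iperf3 reports
transfer sizes in binary multiples (1 GByte = 2^30 bytes) but rates in
decimal ones (1 Gbit = 10^9 bits), so 2.92 GBytes in 10 seconds works
out to roughly 2.51 Gbits/sec. A minimal C sketch of that arithmetic
(the function name is made up for illustration):

```c
#include <assert.h>

/*
 * Convert an iperf3 "Transfer" figure in GBytes (binary, 2^30 bytes)
 * over a given interval into a "Bandwidth" figure in Gbits/sec
 * (decimal, 10^9 bits). Hypothetical helper, not part of iperf3.
 */
double
iperf_gbits_per_sec(double gbytes, double seconds)
{
        assert(seconds > 0.0);
        return gbytes * 1073741824.0 * 8.0 / seconds / 1e9;
}
```

Calling iperf_gbits_per_sec(2.92, 10.0) gives a value of about 2.51,
in line with the summary line above.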

n8x is actually 8-STABLE. The figures for -current in the same
conditions are a bit slower:

---
$ iperf3 -c hween
Connecting to host hween, port 5201
[  4] local 192.168.0.22 port 47856 connected to 192.168.0.248 port 5201
[ ID] Interval           Transfer     Bandwidth       Retr  Cwnd
[  4]   0.00-1.00   sec   178 MBytes  1.49 Gbits/sec    0   67.9 KBytes
[  4]   1.00-2.00   sec   219 MBytes  1.84 Gbits/sec    0    100 KBytes
[  4]   2.00-3.00   sec   246 MBytes  2.06 Gbits/sec    0    136 KBytes
[  4]   3.00-4.00   sec   263 MBytes  2.20 Gbits/sec    0    165 KBytes
[  4]   4.00-5.00   sec   287 MBytes  2.41 Gbits/sec    0    199 KBytes
[  4]   5.00-6.00   sec   275 MBytes  2.31 Gbits/sec    0    199 KBytes
[  4]   6.00-7.00   sec   264 MBytes  2.22 Gbits/sec    0    199 KBytes
[  4]   7.00-8.00   sec   264 MBytes  2.22 Gbits/sec    0    199 KBytes
[  4]   8.00-9.00   sec   267 MBytes  2.24 Gbits/sec    0    199 KBytes
[  4]   9.00-10.00  sec   274 MBytes  2.30 Gbits/sec    0    199 KBytes
- - - - - - - - - - - - - - - - - - - - - - - - -
[ ID] Interval           Transfer     Bandwidth       Retr
[  4]   0.00-10.00  sec  2.48 GBytes  2.13 Gbits/sec    0             sender
[  4]   0.00-10.00  sec  2.48 GBytes  2.13 Gbits/sec                  receiver

iperf Done.

---

Between the two NetBSD hosts I get just under 1.7Gbits/sec.

Go figure...

>
> Thanks,
>


-- 



Re: Panic on a -current from 13/12/2018

2018-12-21 Thread Cherry G. Mathew
On December 22, 2018 2:24:44 AM GMT+05:30, Chavdar Ivanov  
wrote:
...
>
>It is interesting also that when NetBSD is run under XenServer (XCP-NG
>actually) in PV mode, benchmarked against the same 8.99.28 version
>running on a physical machine, everything on a 1Gb interface and
>switch, I get a fully saturated line (~933Mb/s). When the iperf3
>server is on the same XCP-NG guest and the client - a CentOS guest -
>the figures approach 2.3Gb/sec.
>

Do you have jumbo frames on the CentOS VM?

Thanks,



Re: Panic on a -current from 13/12/2018

2018-12-21 Thread Chavdar Ivanov
I managed to build the VBox v6.0 additions under 8.99.28; now, when
using the vioif interface, I get reasonable results:
...
PS C:\bin\iperf-3.1.3-win64> .\iperf3.exe -c marge
Connecting to host marge, port 5201
[  4] local 192.168.0.35 port 10152 connected to 192.168.0.6 port 5201
[ ID] Interval   Transfer Bandwidth
[  4]   0.00-1.00   sec  68.2 MBytes   572 Mbits/sec
[  4]   1.00-2.00   sec  71.9 MBytes   603 Mbits/sec
[  4]   2.00-3.00   sec  69.8 MBytes   585 Mbits/sec
[  4]   3.00-4.00   sec  71.9 MBytes   603 Mbits/sec
[  4]   4.00-5.00   sec  68.9 MBytes   578 Mbits/sec
[  4]   5.00-6.00   sec  69.4 MBytes   581 Mbits/sec
[  4]   6.00-7.00   sec  70.2 MBytes   590 Mbits/sec
[  4]   7.00-8.00   sec  75.6 MBytes   634 Mbits/sec
[  4]   8.00-9.00   sec  70.1 MBytes   589 Mbits/sec
[  4]   9.00-10.00  sec  73.8 MBytes   619 Mbits/sec
- - - - - - - - - - - - - - - - - - - - - - - - -
[ ID] Interval   Transfer Bandwidth
[  4]   0.00-10.00  sec   710 MBytes   595 Mbits/sec  sender
[  4]   0.00-10.00  sec   710 MBytes   595 Mbits/sec  receiver

iperf Done.
.

It is interesting also that when NetBSD is run under XenServer (XCP-NG
actually) in PV mode, benchmarked against the same 8.99.28 version
running on a physical machine, everything on a 1Gb interface and
switch, I get a fully saturated line (~933Mb/s). When the iperf3
server is on the same XCP-NG guest and the client - a CentOS guest -
the figures approach 2.3Gb/sec.
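
For reference, ~933Mb/s is close to what TCP can carry over gigabit
Ethernet at the standard 1500-byte MTU: each frame costs 38 extra bytes
on the wire (preamble, Ethernet header, FCS, inter-frame gap) and,
assuming IPv4 and TCP with the timestamp option, 52 bytes of headers
inside the MTU. Two guests on the same hypervisor are presumably not
bound by that wire rate, which would be consistent with the 2.3Gb/sec
figure. A rough sketch of the arithmetic, with a hypothetical function
name:

```c
#include <assert.h>

/*
 * Upper bound on TCP payload throughput over Ethernet, in Mbit/s.
 * link_mbps: raw link rate; mtu: IP MTU in bytes;
 * l3l4_overhead: IP + TCP header bytes carried inside the MTU
 * (52 assumes IPv4 plus TCP with timestamps; this is an assumption).
 */
double
tcp_goodput_mbps(double link_mbps, int mtu, int l3l4_overhead)
{
        /* preamble + SFD 8, Ethernet header 14, FCS 4, inter-frame gap 12 */
        const int eth_overhead = 38;

        assert(mtu > l3l4_overhead);
        return link_mbps * (double)(mtu - l3l4_overhead) /
            (double)(mtu + eth_overhead);
}
```

With these assumptions tcp_goodput_mbps(1000.0, 1500, 52) comes to
about 941, and a 9000-byte jumbo MTU would raise the bound to about 990.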

On Wed, 19 Dec 2018 at 12:36, Chavdar Ivanov  wrote:
>
> The workaround is fine. In the mean time I upgraded my VirtualBox
> installation to 6.0 (released yesterday) and will check again.
>
> While here I did some, admittedly not very scientific, benchmarks on
> network performance under VirtualBox. I started a single guest of a
> different type, had iperf3 installed and running as server on the
> guest and tested the iperf3 client connection from the host. All
> guests were configured to use bridged adapter to the active (WiFi, in
> my case Intel AC-7265, but it shouldn't matter), using the first
> (desktop) Intel emulation (82540EM). The results varied wildly between
> different guests, the best being the latest Linux guests (OpenSUSE
> Tumbleweed and Fedora 29), the worst happened to be NetBSD-current. I
> also tested on a few systems the difference in speed between the above
> chosen adapter type and the virtio one; this again showed differences
> - NetBSD was better, on some tests by a factor of two, when using
> virtio, whereas OpenBSD was the other way round - the Intel emulation
> was twice as fast. I've attached the log file of some of these
> attempts for reference. I didn't have Guest additions running on any
> of the BSD guests, which perhaps is relevant; the other systems had it
> configured. I also switched the emulation on the NetBSD host from KVM
> to default, as you suggested.
>
> As I said, we shouldn't be reading too much from this, but it is
> still a point.
>
>
> On Wed, 19 Dec 2018 at 02:35, Masanobu SAITOH  wrote:
> >
> > On 2018/12/18 20:13, Masanobu SAITOH wrote:
> > > Hi!
> > >
> > > On 2018/12/17 19:38, Chavdar Ivanov wrote:
> > >> I went through a series of tests. It is indeed that point the panic
> > >> takes place, the two parts of the screendump are in
> > >>
> > >> http://ci4ic4.tx0.org/nb-panic-wm-03.png and
> > >> http://ci4ic4.tx0.org/nb-panic-wm-04.png .
> > >
> > >   Thanks. This is the workaround code for broken lapic timer
> > > counter which was added in:
> > >
> > >  http://mail-index.netbsd.org/source-changes/2017/11/23/msg089946.html
> > >  
> > > http://cvsweb.netbsd.org/bsdweb.cgi/src/sys/arch/x86/x86/lapic.c.diff?r1=1.63&r2=1.64&f=h
> > >
> > > Your VM is configured act as KVM
> > > (See system->acceleration(L) tab or see .box file's "Paravirt provider=")
> > >
> > > I set up my vm to KVM and
> > >
> > >> VirtualBox gives three Intel NIC options:
> > >>
> > >> Intel PRO/1000 MT Desktop (82540EM)
> > >> Intel PRO/1000 T Server   (82543GC)
> > >> Intel PRO/1000 MT Server  (82545EM)
> > >>
> > >> I was able to get a panic with the same kernel from 13/12/2018 only
> > >> when I select the second option:
> > >
> > >   I changed my VM's setting to use 82543GC. I tried hibernation
> > > three times but I couldn't reproduce the problem. I couldn't reproduce
> > > the same problem, but this problem must be exist because you had the
> > > problem.
> > >
> > >   The possibilities are:
> > >  a) VirtualBox's lapic is not good.
> > >  b) Our workaround code is not perfect or somewhere is not good.
> > >  c) any others
> > >
> > > I suspect this problem is not from if_wm.c. but from
> > >> There was a VirtualBox upgrade a few weeks ago, perhaps the problem is 
> > >> there.
> > >
> > >
> > >   I read vbox/src/VBox/Devices/Network/DevE1000.cpp. One of the
> > > difference between 82543GC emulation and other two is that
> > > it generates interrupt when chip reset occurred. If other ne

Re: Panic on a -current from 13/12/2018

2018-12-19 Thread Chavdar Ivanov
The workaround is fine. In the meantime I upgraded my VirtualBox
installation to 6.0 (released yesterday) and will check again.

While here I did some, admittedly not very scientific, benchmarks on
network performance under VirtualBox. I started a single guest of a
different type each time, with iperf3 installed and running as a server
on the guest, and tested the iperf3 client connection from the host. All
guests were configured to use a bridged adapter to the active host
interface (WiFi, in my case Intel AC-7265, but it shouldn't matter),
using the first (desktop) Intel emulation (82540EM). The results varied
wildly between different guests, the best being the latest Linux guests
(OpenSUSE Tumbleweed and Fedora 29); the worst happened to be
NetBSD-current. On a few systems I also tested the difference in speed
between the above adapter type and the virtio one; this again showed
differences - NetBSD was better, on some tests by a factor of two, when
using virtio, whereas OpenBSD was the other way round - the Intel
emulation was twice as fast. I've attached the log file of some of these
attempts for reference. I didn't have Guest Additions running on any
of the BSD guests, which perhaps is relevant; the other systems had them
configured. I also switched the emulation on the NetBSD guest from KVM
to default, as you suggested.

As I said, we shouldn't read too much into this, but it is
still a data point.


On Wed, 19 Dec 2018 at 02:35, Masanobu SAITOH  wrote:
>
> On 2018/12/18 20:13, Masanobu SAITOH wrote:
> > Hi!
> >
> > On 2018/12/17 19:38, Chavdar Ivanov wrote:
> >> I went through a series of tests. It is indeed that point the panic
> >> takes place, the two parts of the screendump are in
> >>
> >> http://ci4ic4.tx0.org/nb-panic-wm-03.png and
> >> http://ci4ic4.tx0.org/nb-panic-wm-04.png .
> >
> >   Thanks. This is the workaround code for broken lapic timer
> > counter which was added in:
> >
> >  http://mail-index.netbsd.org/source-changes/2017/11/23/msg089946.html
> >  
> > http://cvsweb.netbsd.org/bsdweb.cgi/src/sys/arch/x86/x86/lapic.c.diff?r1=1.63&r2=1.64&f=h
> >
> > Your VM is configured act as KVM
> > (See system->acceleration(L) tab or see .box file's "Paravirt provider=")
> >
> > I set up my vm to KVM and
> >
> >> VirtualBox gives three Intel NIC options:
> >>
> >> Intel PRO/1000 MT Desktop (82540EM)
> >> Intel PRO/1000 T Server   (82543GC)
> >> Intel PRO/1000 MT Server  (82545EM)
> >>
> >> I was able to get a panic with the same kernel from 13/12/2018 only
> >> when I select the second option:
> >
> >   I changed my VM's setting to use 82543GC. I tried hibernation
> > three times but I couldn't reproduce the problem. I couldn't reproduce
> > the same problem, but this problem must be exist because you had the
> > problem.
> >
> >   The possibilities are:
> >  a) VirtualBox's lapic is not good.
> >  b) Our workaround code is not perfect or somewhere is not good.
> >  c) any others
> >
> > I suspect this problem is not from if_wm.c. but from
> >> There was a VirtualBox upgrade a few weeks ago, perhaps the problem is 
> >> there.
> >
> >
> >   I read vbox/src/VBox/Devices/Network/DevE1000.cpp. One of the
> > difference between 82543GC emulation and other two is that
> > it generates interrupt when chip reset occurred. If other network
> > device emulation works well, I suspect that the reset timing in vbox
> > is not good and it makes no update of lapic timer.
> >
> >   Workarounds are:
> >  a) Don't use KVM mode and use "Default" or other.
> > On my Windows7's virtual box, "Default" makes
> > CPUID2_RAZ bit not set. It makes NetBSD recognize
> > it's not on KVM.
>
>   If the lapic timer also stops in "Default" mode, that workaround
> isn't used and delay() won't work. If so, b) is the best way
> to avoid the problem.
>
> >  b) Use Other than 82543GC.
> >  c) any others
> >
> > BTW, when I use 82543GC emulation, I got the following bug:
> >> makphy0 at wm0 phy 0: Marvell 88E1000 Gigabit PHY, rev. 0
> >> makphy0: 10baseT, 10baseT-FDX, 100baseTX, 100baseTX-FDX, auto
> >> makphy1 at wm0 phy 1: Marvell 88E1000 Gigabit PHY, rev. 0
> >> makphy1: 10baseT, 10baseT-FDX, 100baseTX, 100baseTX-FDX, auto
> > (snip)
> >> makphy31 at wm0 phy 31: Marvell 88E1000 Gigabit PHY, rev. 0
> >> makphy31: 10baseT, 10baseT-FDX, 100baseTX, 100baseTX-FDX, auto
> >> ifmedia_match: multiple match for 0x20/0xfbff9ff, selected instance 0
> >
> > This _IS_ a bug of VirtualBox's 82543GC emulation.
> > DevE1000Phy.cpp line 568 says:
> >
> >  /* Note: A single PHY is supported, ignore PHYADR */
> >
> > So I recommend all users not to use 82543GC emulation until this PHY
> > bug is fixed.
> >
> >> ..
> >> -rw--- 1 root wheel   2199810 Dec 17 09:24 netbsd.9
> >> -rw--- 1 root wheel 147348504 Dec 17 09:24 netbsd.9.core
> >> /var/crash # gdb netbsd.9
> >> GNU gdb (GDB) 8.0.1
> >> Copyright (C) 2017 Free Software Foundation, Inc.
> >> License GPLv3+: GNU GPL version 3 or

Re: Panic on a -current from 13/12/2018

2018-12-18 Thread Masanobu SAITOH

On 2018/12/18 20:13, Masanobu SAITOH wrote:

Hi!

On 2018/12/17 19:38, Chavdar Ivanov wrote:

I went through a series of tests. It is indeed that point the panic
takes place, the two parts of the screendump are in

http://ci4ic4.tx0.org/nb-panic-wm-03.png and
http://ci4ic4.tx0.org/nb-panic-wm-04.png .


  Thanks. This is the workaround code for broken lapic timer
counter which was added in:

 http://mail-index.netbsd.org/source-changes/2017/11/23/msg089946.html
 
http://cvsweb.netbsd.org/bsdweb.cgi/src/sys/arch/x86/x86/lapic.c.diff?r1=1.63&r2=1.64&f=h

Your VM is configured to act as KVM
(See system->acceleration(L) tab or see the .vbox file's "Paravirt provider=")

I set up my vm to KVM and


VirtualBox gives three Intel NIC options:

Intel PRO/1000 MT Desktop (82540EM)
Intel PRO/1000 T Server   (82543GC)
Intel PRO/1000 MT Server  (82545EM)

I was able to get a panic with the same kernel from 13/12/2018 only
when I select the second option:


  I changed my VM's setting to use 82543GC. I tried hibernation
three times but couldn't reproduce the problem. Still, the problem
must exist, because you ran into it.

  The possibilities are:
 a) VirtualBox's lapic is not good.
 b) Our workaround code is not perfect or somewhere is not good.
 c) any others

I suspect this problem is not from if_wm.c. but from

There was a VirtualBox upgrade a few weeks ago, perhaps the problem is there.



  I read vbox/src/VBox/Devices/Network/DevE1000.cpp. One of the
differences between the 82543GC emulation and the other two is that
it generates an interrupt when a chip reset occurs. If the other network
device emulations work well, I suspect that the reset timing in vbox
is off and that it stops the lapic timer from being updated.

  Workarounds are:
 a) Don't use KVM mode; use "Default" or another provider.
    On my Windows 7 VirtualBox, "Default" leaves the
    CPUID2_RAZ bit unset, so NetBSD recognizes that
    it's not on KVM.


 If the lapic timer also stops in "Default" mode, that workaround
isn't used and delay() won't work. If so, b) is the best way
to avoid the problem.


 b) Use an emulation other than 82543GC.
 c) any others

BTW, when I use 82543GC emulation, I got the following bug:

makphy0 at wm0 phy 0: Marvell 88E1000 Gigabit PHY, rev. 0
makphy0: 10baseT, 10baseT-FDX, 100baseTX, 100baseTX-FDX, auto
makphy1 at wm0 phy 1: Marvell 88E1000 Gigabit PHY, rev. 0
makphy1: 10baseT, 10baseT-FDX, 100baseTX, 100baseTX-FDX, auto

(snip)

makphy31 at wm0 phy 31: Marvell 88E1000 Gigabit PHY, rev. 0
makphy31: 10baseT, 10baseT-FDX, 100baseTX, 100baseTX-FDX, auto
ifmedia_match: multiple match for 0x20/0xfbff9ff, selected instance 0


This _IS_ a bug of VirtualBox's 82543GC emulation.
DevE1000Phy.cpp line 568 says:

 /* Note: A single PHY is supported, ignore PHYADR */

So I recommend all users not to use 82543GC emulation until this PHY
bug is fixed.
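
The 32 makphy instances in the log above fit that comment: the MII
management bus uses 5-bit PHY addresses, so a driver probing for PHYs
tries addresses 0 through 31, and a device model that ignores the
address answers at every one of them. A toy sketch of the effect (this
is neither NetBSD's nor VirtualBox's code; the register value, the
responding address and all function names are made up for illustration):

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define MII_NPHY 32     /* 5-bit PHY address space */
#define MII_BMSR 1      /* basic mode status register */

typedef uint16_t (*mii_read_fn)(int phy, int reg);

/* Well-behaved bus: one PHY at (hypothetical) address 1, others float high. */
uint16_t
read_single_phy(int phy, int reg)
{
        (void)reg;
        return phy == 1 ? 0x7949 : 0xffff;
}

/* Device model that ignores the PHY address and always answers. */
uint16_t
read_ignoring_phyadr(int phy, int reg)
{
        (void)phy;
        (void)reg;
        return 0x7949;
}

/* Count addresses whose status register looks like a real PHY. */
int
mii_count_phys(mii_read_fn rd)
{
        int n = 0;

        assert(rd != NULL);
        for (int phy = 0; phy < MII_NPHY; phy++) {
                uint16_t bmsr = rd(phy, MII_BMSR);

                if (bmsr != 0x0000 && bmsr != 0xffff)
                        n++;
        }
        return n;
}
```

Here mii_count_phys(read_single_phy) finds one PHY, while
mii_count_phys(read_ignoring_phyadr) finds 32, the same count as
makphy0 through makphy31.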


..
-rw--- 1 root wheel   2199810 Dec 17 09:24 netbsd.9
-rw--- 1 root wheel 147348504 Dec 17 09:24 netbsd.9.core
/var/crash # gdb netbsd.9
GNU gdb (GDB) 8.0.1
Copyright (C) 2017 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later 
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.  Type "show copying"
and "show warranty" for details.
This GDB was configured as "x86_64--netbsd".
Type "show configuration" for configuration details.
For bug reporting instructions, please see:
.
Find the GDB manual and other documentation resources online at:
.
For help, type "help".
Type "apropos word" to search for commands related to "word"...
Reading symbols from netbsd.9...(no debugging symbols found)...done.
(gdb) target kvm netbsd.9.core
0x80222d75 in cpu_reboot ()
(gdb) bt
#0  0x80222d75 in cpu_reboot ()
#1  0x8076e6f7 in db_reboot_cmd ()
#2  0x8076ee92 in db_command ()
#3  0x8076f20c in db_command_loop ()
#4  0x80772b80 in db_trap ()
#5  0x8021f5c2 in kdb_trap ()
#6  0x802244b1 in trap ()
#7  0x8021d568 in alltraps ()
#8  0x8021de45 in breakpoint ()
#9  0x809d54b0 in vpanic ()
#10 0x809d5550 in panic ()
#11 0x802514f0 in lapic_delay ()
#12 0x80353270 in wm_gmii_i82543_readreg ()
#13 0x807b1aa5 in makphy_status ()
#14 0x807b1cf7 in makphy_service ()
#15 0x807a826c in mii_tick ()
#16 0x80360926 in wm_tick ()
#17 0x809b6b96 in callout_softclock ()
#18 0x809aaa55 in softint_dispatch ()
#19 0x8021d21f in Xsoftintr ()


  I rebuilt the kernel (on a different physical host, but there may
have been an update on the 14th there) and tried to get a panic with
the .gdb kernel, but it never happened.

Obviously it is not a problem for me or anyone running NetBSD a

Re: Panic on a -current from 13/12/2018

2018-12-18 Thread Masanobu SAITOH

Hi!

On 2018/12/17 19:38, Chavdar Ivanov wrote:

I went through a series of tests. It is indeed that point the panic
takes place, the two parts of the screendump are in

http://ci4ic4.tx0.org/nb-panic-wm-03.png and
http://ci4ic4.tx0.org/nb-panic-wm-04.png .


 Thanks. This is the workaround code for broken lapic timer
counter which was added in:

http://mail-index.netbsd.org/source-changes/2017/11/23/msg089946.html

http://cvsweb.netbsd.org/bsdweb.cgi/src/sys/arch/x86/x86/lapic.c.diff?r1=1.63&r2=1.64&f=h

Your VM is configured to act as KVM
(See system->acceleration(L) tab or see the .vbox file's "Paravirt provider=")

I set up my vm to KVM and


VirtualBox gives three Intel NIC options:

Intel PRO/1000 MT Desktop (82540EM)
Intel PRO/1000 T Server   (82543GC)
Intel PRO/1000 MT Server  (82545EM)

I was able to get a panic with the same kernel from 13/12/2018 only
when I select the second option:


 I changed my VM's setting to use 82543GC. I tried hibernation
three times but couldn't reproduce the problem. Still, the problem
must exist, because you ran into it.

 The possibilities are:
a) VirtualBox's lapic is not good.
b) Our workaround code is not perfect or somewhere is not good.
c) any others

I suspect this problem is not from if_wm.c. but from

There was a VirtualBox upgrade a few weeks ago, perhaps the problem is there.



 I read vbox/src/VBox/Devices/Network/DevE1000.cpp. One of the
differences between the 82543GC emulation and the other two is that
it generates an interrupt when a chip reset occurs. If the other network
device emulations work well, I suspect that the reset timing in vbox
is off and that it stops the lapic timer from being updated.

 Workarounds are:
a) Don't use KVM mode; use "Default" or another provider.
   On my Windows 7 VirtualBox, "Default" leaves the
   CPUID2_RAZ bit unset, so NetBSD recognizes that
   it's not on KVM.
b) Use an emulation other than 82543GC.
c) any others

BTW, when I use 82543GC emulation, I got the following bug:

makphy0 at wm0 phy 0: Marvell 88E1000 Gigabit PHY, rev. 0
makphy0: 10baseT, 10baseT-FDX, 100baseTX, 100baseTX-FDX, auto
makphy1 at wm0 phy 1: Marvell 88E1000 Gigabit PHY, rev. 0
makphy1: 10baseT, 10baseT-FDX, 100baseTX, 100baseTX-FDX, auto

(snip)

makphy31 at wm0 phy 31: Marvell 88E1000 Gigabit PHY, rev. 0
makphy31: 10baseT, 10baseT-FDX, 100baseTX, 100baseTX-FDX, auto
ifmedia_match: multiple match for 0x20/0xfbff9ff, selected instance 0


This _IS_ a bug of VirtualBox's 82543GC emulation.
DevE1000Phy.cpp line 568 says:

/* Note: A single PHY is supported, ignore PHYADR */

So I recommend all users not to use 82543GC emulation until this PHY
bug is fixed.


..
-rw--- 1 root wheel   2199810 Dec 17 09:24 netbsd.9
-rw--- 1 root wheel 147348504 Dec 17 09:24 netbsd.9.core
/var/crash # gdb netbsd.9
GNU gdb (GDB) 8.0.1
Copyright (C) 2017 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later 
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.  Type "show copying"
and "show warranty" for details.
This GDB was configured as "x86_64--netbsd".
Type "show configuration" for configuration details.
For bug reporting instructions, please see:
.
Find the GDB manual and other documentation resources online at:
.
For help, type "help".
Type "apropos word" to search for commands related to "word"...
Reading symbols from netbsd.9...(no debugging symbols found)...done.
(gdb) target kvm netbsd.9.core
0x80222d75 in cpu_reboot ()
(gdb) bt
#0  0x80222d75 in cpu_reboot ()
#1  0x8076e6f7 in db_reboot_cmd ()
#2  0x8076ee92 in db_command ()
#3  0x8076f20c in db_command_loop ()
#4  0x80772b80 in db_trap ()
#5  0x8021f5c2 in kdb_trap ()
#6  0x802244b1 in trap ()
#7  0x8021d568 in alltraps ()
#8  0x8021de45 in breakpoint ()
#9  0x809d54b0 in vpanic ()
#10 0x809d5550 in panic ()
#11 0x802514f0 in lapic_delay ()
#12 0x80353270 in wm_gmii_i82543_readreg ()
#13 0x807b1aa5 in makphy_status ()
#14 0x807b1cf7 in makphy_service ()
#15 0x807a826c in mii_tick ()
#16 0x80360926 in wm_tick ()
#17 0x809b6b96 in callout_softclock ()
#18 0x809aaa55 in softint_dispatch ()
#19 0x8021d21f in Xsoftintr ()


  I rebuilt the kernel (on a different physical host, but there may
have been an update on the 14th there) and tried to get a panic with
the .gdb kernel, but it never happened.

Obviously it is not a problem for me or anyone running NetBSD as a
VirtualBox guest, as using vioif / virtio is almost twice as fast,
but I reported the panic thinking it may be relevant in other use
cases.


 Thank you for your report!




On Mon

Re: Panic on a -current from 13/12/2018

2018-12-17 Thread Chavdar Ivanov
I went through a series of tests. It is indeed at that point that the
panic takes place; the two parts of the screendump are in

http://ci4ic4.tx0.org/nb-panic-wm-03.png and
http://ci4ic4.tx0.org/nb-panic-wm-04.png .

VirtualBox gives three Intel NIC options:

Intel PRO/1000 MT Desktop (82540EM)
Intel PRO/1000 T Server   (82543GC)
Intel PRO/1000 MT Server  (82545EM)

I was able to get a panic with the same kernel from 13/12/2018 only
when I select the second option:

..
-rw--- 1 root wheel   2199810 Dec 17 09:24 netbsd.9
-rw--- 1 root wheel 147348504 Dec 17 09:24 netbsd.9.core
/var/crash # gdb netbsd.9
GNU gdb (GDB) 8.0.1
Copyright (C) 2017 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later 
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.  Type "show copying"
and "show warranty" for details.
This GDB was configured as "x86_64--netbsd".
Type "show configuration" for configuration details.
For bug reporting instructions, please see:
.
Find the GDB manual and other documentation resources online at:
.
For help, type "help".
Type "apropos word" to search for commands related to "word"...
Reading symbols from netbsd.9...(no debugging symbols found)...done.
(gdb) target kvm netbsd.9.core
0x80222d75 in cpu_reboot ()
(gdb) bt
#0  0x80222d75 in cpu_reboot ()
#1  0x8076e6f7 in db_reboot_cmd ()
#2  0x8076ee92 in db_command ()
#3  0x8076f20c in db_command_loop ()
#4  0x80772b80 in db_trap ()
#5  0x8021f5c2 in kdb_trap ()
#6  0x802244b1 in trap ()
#7  0x8021d568 in alltraps ()
#8  0x8021de45 in breakpoint ()
#9  0x809d54b0 in vpanic ()
#10 0x809d5550 in panic ()
#11 0x802514f0 in lapic_delay ()
#12 0x80353270 in wm_gmii_i82543_readreg ()
#13 0x807b1aa5 in makphy_status ()
#14 0x807b1cf7 in makphy_service ()
#15 0x807a826c in mii_tick ()
#16 0x80360926 in wm_tick ()
#17 0x809b6b96 in callout_softclock ()
#18 0x809aaa55 in softint_dispatch ()
#19 0x8021d21f in Xsoftintr ()


 I rebuilt the kernel (on a different physical host, but there may
have been an update on the 14th there) and tried to get a panic with
the .gdb kernel, but it never happened.

Obviously it is not a problem for me or anyone running NetBSD as a
VirtualBox guest, as using vioif / virtio is almost twice as fast,
but I reported the panic thinking it may be relevant in other use
cases.


On Mon, 17 Dec 2018 at 07:49, Masanobu SAITOH  wrote:
>
> On 2018/12/17 1:09, Chavdar Ivanov wrote:
> > I have no idea. As I said, it is running under VirtualBox on a Windows
> > 10 host; I put the host in hibernation whilst the NetBSD guest is
> > running.
>
> I tested today's -current on VirtualBox 5.2.22 on Windows 7 64bit
> (on Core i7-2600). I tried hibernating (Shutdown -> Hibernate(H)) a few
> times but I couldn't reproduce the problem yet.
>
>         while (deltat > 0) {
>                 xtick = lapic_gettick();
>                 if (lapic_broken_periodic && xtick == 0 && otick == 0) {
>                         lapic_initclocks();
>                         xtick = lapic_gettick();
>                         if (xtick == 0)
>                                 panic("lapic timer stopped ticking");   <=== here!
>                 }
>
> If that panic is from this, lapic_broken_periodic must be true, but it's
> set only when the VM is KVM:
> > /*
> >  * Apply workaround for broken periodic timer under KVM
> >  */
> > if (vm_guest == VM_GUEST_KVM) {
> > lapic_broken_periodic = true;
> > lapic_timecounter.tc_quality = -100;
> > aprint_debug_dev(ci->ci_dev,
> > "applying KVM timer workaround\n");
> > }
>
>   Could you try to reproduce the problem and see the panic message?
> ci4ic4-panic-01.png has backtrace and it wiped out the panic message.
>
>   Regards.
>
> > Previously it survived this, using the Intel Desktop NIC
> > emulation within VirtualBox, even my ssh connections (from the host to
> > the guest) remained active. I switched the NIC emulation for the
> > NetBSD guest to virtio-net, now it behaves as before, surviving a
> > hibernation.
> >
> > There was a VirtualBox upgrade a few weeks ago, perhaps the problem is 
> > there.
> > On Sun, 16 Dec 2018 at 15:55, SAITOH Masanobu  wrote:
> >>
> >> Hi.
> >>
> >> On 2018/12/16 18:09, Chavdar Ivanov wrote:
> >>> Repeated this morning. Happens when the host hibernates when the
> >>> machine is running. The initial trace is slightly different, but the
> >>> lines with wm_gmii are the same, so for now 

Re: Panic on a -current from 13/12/2018

2018-12-16 Thread Masanobu SAITOH

On 2018/12/17 1:09, Chavdar Ivanov wrote:

I have no idea. As I said, it is running under VirtualBox on a Windows
10 host; I put the host in hibernation whilst the NetBSD guest is
running.


I tested today's -current on VirtualBox 5.2.22 on Windows 7 64bit
(on Core i7-2600). I tried hibernating (Shutdown -> Hibernate(H)) a few
times but I couldn't reproduce the problem yet.


        while (deltat > 0) {
                xtick = lapic_gettick();
                if (lapic_broken_periodic && xtick == 0 && otick == 0) {
                        lapic_initclocks();
                        xtick = lapic_gettick();
                        if (xtick == 0)
                                panic("lapic timer stopped ticking");   <=== here!
                }


If that panic is from this, lapic_broken_periodic must be true, but it's
set only when the VM is KVM:

        /*
         * Apply workaround for broken periodic timer under KVM
         */
        if (vm_guest == VM_GUEST_KVM) {
                lapic_broken_periodic = true;
                lapic_timecounter.tc_quality = -100;
                aprint_debug_dev(ci->ci_dev,
                    "applying KVM timer workaround\n");
        }
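
For readers unfamiliar with how a guest kernel can tell it is under
KVM: x86 hypervisors usually set bit 31 of ECX in CPUID leaf 1 and
publish a 12-byte vendor signature in leaf 0x40000000, which for KVM is
"KVMKVMKVM" padded with NULs. The sketch below only illustrates that
convention; it is not the NetBSD code that sets vm_guest, and the enum
and function names are hypothetical:

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

enum guest_kind { GUEST_NONE, GUEST_KVM, GUEST_OTHER };

/*
 * ecx_leaf1: ECX as returned by CPUID leaf 1 (bit 31 = hypervisor present).
 * sig:       12-byte vendor signature from CPUID leaf 0x40000000.
 */
enum guest_kind
classify_guest(uint32_t ecx_leaf1, const char sig[12])
{
        assert(sig != NULL);
        if ((ecx_leaf1 & (UINT32_C(1) << 31)) == 0)
                return GUEST_NONE;      /* no hypervisor advertised */
        if (memcmp(sig, "KVMKVMKVM\0\0\0", 12) == 0)
                return GUEST_KVM;
        return GUEST_OTHER;
}
```

Under this model, a VM whose paravirtualization provider presents the
KVM signature would be classified as KVM, and one that does not set the
hypervisor bit at all would not.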


 Could you try to reproduce the problem and see the panic message?
ci4ic4-panic-01.png has backtrace and it wiped out the panic message.

 Regards.


Previously it survived this, using the Intel Desktop NIC
emulation within VirtualBox, even my ssh connections (from the host to
the guest) remained active. I switched the NIC emulation for the
NetBSD guest to virtio-net, now it behaves as before, surviving a
hibernation.

There was a VirtualBox upgrade a few weeks ago, perhaps the problem is there.
On Sun, 16 Dec 2018 at 15:55, SAITOH Masanobu  wrote:


Hi.

On 2018/12/16 18:09, Chavdar Ivanov wrote:

Repeated this morning. Happens when the host hibernates when the
machine is running. The initial trace is slightly different, but the
lines with wm_gmii are the same, so for now I will switch to a
different NIC emulator.



In your .png:

vpanic()
lapic_delay()
wm_gmii_mdic_readreg()
.
.
.


There is no panic message itself, but I suspect it's:

static void
lapic_delay(unsigned int usec)
{
        int32_t xtick, otick;
        int64_t deltat;         /* XXX may want to be 64bit */

        otick = lapic_gettick();

        if (usec <= 0)
                return;
        if (usec <= 25)
                deltat = lapic_delaytab[usec];
        else
                deltat = (lapic_frac_cycle_per_usec * usec) >> 32;

        while (deltat > 0) {
                xtick = lapic_gettick();
                if (lapic_broken_periodic && xtick == 0 && otick == 0) {
                        lapic_initclocks();
                        xtick = lapic_gettick();
                        if (xtick == 0)
                                panic("lapic timer stopped ticking");   <=== here!
                }
                if (xtick > otick)
                        deltat -= lapic_tval - (xtick - otick);
                else
                        deltat -= otick - xtick;
                otick = xtick;

                x86_pause();
        }
}
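
To make the suspected failure mode concrete, here is a user-space
simulation of the loop above, assuming the timer stops counting and
reads back 0 (for instance after the host resumes). The timer model and
all sim_* names are invented for this sketch; only the loop structure
follows lapic_delay():

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for the periodic local APIC countdown timer. */
struct sim_timer {
        int32_t tval;           /* reload value */
        int32_t cur;            /* current count */
        int32_t step;           /* ticks elapsed between two reads */
        bool    running;        /* false models a stopped timer */
        bool    rearm_works;    /* does re-initialising restart it? */
};

enum sim_result { SIM_DELAY_OK, SIM_TIMER_STOPPED, SIM_NO_PROGRESS };

static int32_t
sim_gettick(struct sim_timer *t)
{
        if (!t->running)
                return 0;               /* stopped timer reads 0 */
        t->cur -= t->step;
        if (t->cur <= 0)
                t->cur += t->tval;      /* periodic reload */
        return t->cur;
}

static void
sim_initclocks(struct sim_timer *t)
{
        if (t->rearm_works) {
                t->running = true;
                t->cur = t->tval;
        }
}

/* max_iter bounds the loop so the simulation always terminates. */
enum sim_result
sim_delay(struct sim_timer *t, int64_t deltat, bool broken_periodic,
    int max_iter)
{
        int32_t xtick, otick;

        assert(t != NULL);
        otick = sim_gettick(t);
        for (int i = 0; i < max_iter && deltat > 0; i++) {
                xtick = sim_gettick(t);
                if (broken_periodic && xtick == 0 && otick == 0) {
                        sim_initclocks(t);
                        xtick = sim_gettick(t);
                        if (xtick == 0)
                                return SIM_TIMER_STOPPED; /* kernel panics here */
                }
                if (xtick > otick)
                        deltat -= t->tval - (xtick - otick);
                else
                        deltat -= otick - xtick;
                otick = xtick;
        }
        return deltat > 0 ? SIM_NO_PROGRESS : SIM_DELAY_OK;
}
```

In this model a stopped timer that cannot be re-armed yields
SIM_TIMER_STOPPED when the workaround flag is set, which corresponds to
the panic, and SIM_NO_PROGRESS when it is not, which corresponds to a
delay loop that never finishes.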


What causes it?



And yes, it used to survive many hibernations of the hosts before. I
only had to adjust the time after waking the host up.
On Sat, 15 Dec 2018 at 10:59, Chavdar Ivanov  wrote:


Hi,

On 8.99.27 AMD64 running under VirtualBox I got this morning the panic
in http://ci4ic4.tx0.org/ci4ic4-panic-01.png

I have the  coredump, if it is of interest. I thought it might be
useful, as it is apparently in the wm driver.

Chavdar
--








--
---
 SAITOH Masanobu (msai...@execsw.org
  msai...@netbsd.org)







--
---
SAITOH Masanobu (msai...@execsw.org
 msai...@netbsd.org)


Re: Panic on a -current from 13/12/2018

2018-12-16 Thread Chavdar Ivanov
I have no idea. As I said, it is running under VirtualBox on a Windows
10 host; I put the host in hibernation whilst the NetBSD guest is
running. Previously it survived this, using the Intel Desktop NIC
emulation within VirtualBox, even my ssh connections (from the host to
the guest) remained active. I switched the NIC emulation for the
NetBSD guest to virtio-net, now it behaves as before, surviving a
hibernation.

There was a VirtualBox upgrade a few weeks ago, perhaps the problem is there.
On Sun, 16 Dec 2018 at 15:55, SAITOH Masanobu  wrote:
>
> Hi.
>
> On 2018/12/16 18:09, Chavdar Ivanov wrote:
> > Repeated this morning. Happens when the host hibernates when the
> > machine is running. The initial trace is slightly different, but the
> > lines with wm_gmii are the same, so for now I will switch to a
> > different NIC emulator.
> >
>
> In your .png:
> >vpanic()
> >lapic_delay()
> >wm_gmii_mdic_readreg()
> >.
> >.
> >.
>
> There is no panic message itself, but I suspect it's:
> > static void
> > lapic_delay(unsigned int usec)
> > {
> > 	int32_t xtick, otick;
> > 	int64_t deltat; /* XXX may want to be 64bit */
> >
> > 	otick = lapic_gettick();
> >
> > 	if (usec <= 0)
> > 		return;
> > 	if (usec <= 25)
> > 		deltat = lapic_delaytab[usec];
> > 	else
> > 		deltat = (lapic_frac_cycle_per_usec * usec) >> 32;
> >
> > 	while (deltat > 0) {
> > 		xtick = lapic_gettick();
> > 		if (lapic_broken_periodic && xtick == 0 && otick == 0) {
> > 			lapic_initclocks();
> > 			xtick = lapic_gettick();
> > 			if (xtick == 0)
> > 				panic("lapic timer stopped ticking");   <=== here!
> > 		}
> > 		if (xtick > otick)
> > 			deltat -= lapic_tval - (xtick - otick);
> > 		else
> > 			deltat -= otick - xtick;
> > 		otick = xtick;
> >
> > 		x86_pause();
> > 	}
> > }
>
> What causes this?
>
>
> > And yes, it used to survive many host hibernations before. I
> > only had to adjust the time after waking the host up.
> > On Sat, 15 Dec 2018 at 10:59, Chavdar Ivanov wrote:
> >>
> >> Hi,
> >>
> >> On 8.99.27 AMD64 running under VirtualBox, I got a panic this morning; see
> >> http://ci4ic4.tx0.org/ci4ic4-panic-01.png
> >>
> >> I have the core dump, if it is of interest. I thought it might be
> >> useful, as the panic is apparently in the wm driver.
> >>
> >> Chavdar
> >> --
> >> 
> >
> >
> >
>
>
> --
> ---
> SAITOH Masanobu (msai...@execsw.org
>  msai...@netbsd.org)



-- 



Re: Panic on a -current from 13/12/2018

2018-12-16 Thread SAITOH Masanobu
Hi.

On 2018/12/16 18:09, Chavdar Ivanov wrote:
> It happened again this morning. It happens when the host hibernates
> while the guest is running. The initial trace is slightly different,
> but the lines with wm_gmii are the same, so for now I will switch to
> a different NIC emulator.
> 

In your .png:
>vpanic()
>lapic_delay()
>wm_gmii_mdic_readreg()
>.
>.
>.

There is no panic message itself, but I suspect it's:
> static void
> lapic_delay(unsigned int usec)
> {
> 	int32_t xtick, otick;
> 	int64_t deltat; /* XXX may want to be 64bit */
>
> 	otick = lapic_gettick();
>
> 	if (usec <= 0)
> 		return;
> 	if (usec <= 25)
> 		deltat = lapic_delaytab[usec];
> 	else
> 		deltat = (lapic_frac_cycle_per_usec * usec) >> 32;
>
> 	while (deltat > 0) {
> 		xtick = lapic_gettick();
> 		if (lapic_broken_periodic && xtick == 0 && otick == 0) {
> 			lapic_initclocks();
> 			xtick = lapic_gettick();
> 			if (xtick == 0)
> 				panic("lapic timer stopped ticking");   <=== here!
> 		}
> 		if (xtick > otick)
> 			deltat -= lapic_tval - (xtick - otick);
> 		else
> 			deltat -= otick - xtick;
> 		otick = xtick;
>
> 		x86_pause();
> 	}
> }

What causes this?
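
For anyone following the arithmetic, the tick accounting in the quoted
loop can be tried out in isolation. What follows is only a rough sketch,
not the kernel code: elapsed_ticks(), fake_delay(), FAKE_TVAL and the
scripted array of reads are hypothetical stand-ins for lapic_gettick()
and lapic_tval, and it assumes the counter counts down and reloads from
its initial value after reaching 0. Unlike the quoted function, the
sketch reports a counter stuck at 0 immediately instead of
re-initialising the timer and reading it again first.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical reload value standing in for lapic_tval. */
#define FAKE_TVAL 1000

/*
 * Ticks elapsed between two reads of a down-counting timer that
 * reloads from tval after reaching 0 (assumed behaviour).
 */
int64_t
elapsed_ticks(int32_t otick, int32_t xtick, int32_t tval)
{
	if (xtick > otick)	/* counter wrapped and was reloaded */
		return (int64_t)tval - (xtick - otick);
	return (int64_t)otick - xtick;
}

/*
 * Consume scripted counter reads until deltat ticks have passed.
 * Returns the number of polls used, -1 if two consecutive reads are 0
 * (the "stopped ticking" condition), or -2 if the script runs out.
 */
int
fake_delay(const int32_t *reads, int nreads, int64_t deltat, int32_t tval)
{
	int32_t otick, xtick;
	int i;

	assert(nreads > 0);
	otick = reads[0];
	for (i = 1; deltat > 0; i++) {
		if (i >= nreads)
			return -2;
		xtick = reads[i];
		if (xtick == 0 && otick == 0)
			return -1;
		deltat -= elapsed_ticks(otick, xtick, tval);
		otick = xtick;
	}
	return i - 1;
}
```

With FAKE_TVAL set to 1000, elapsed_ticks(300, 100, FAKE_TVAL) and
elapsed_ticks(100, 900, FAKE_TVAL) both give 200, the second one across
a reload. A script of reads that are all 0 makes fake_delay() return -1,
which corresponds to the condition the quoted code tests before its
panic() call.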


> And yes, it used to survive many host hibernations before. I
> only had to adjust the time after waking the host up.
> On Sat, 15 Dec 2018 at 10:59, Chavdar Ivanov wrote:
>>
>> Hi,
>>
>> On 8.99.27 AMD64 running under VirtualBox, I got a panic this morning; see
>> http://ci4ic4.tx0.org/ci4ic4-panic-01.png
>>
>> I have the core dump, if it is of interest. I thought it might be
>> useful, as the panic is apparently in the wm driver.
>>
>> Chavdar
>> --
>> 
> 
> 
> 


-- 
---
SAITOH Masanobu (msai...@execsw.org
 msai...@netbsd.org)


Re: Panic on a -current from 13/12/2018

2018-12-16 Thread Chavdar Ivanov
It happened again this morning. It happens when the host hibernates
while the guest is running. The initial trace is slightly different,
but the lines with wm_gmii are the same, so for now I will switch to
a different NIC emulator.

And yes, it used to survive many host hibernations before. I
only had to adjust the time after waking the host up.

On Sat, 15 Dec 2018 at 10:59, Chavdar Ivanov wrote:
>
> Hi,
>
> On 8.99.27 AMD64 running under VirtualBox, I got a panic this morning; see
> http://ci4ic4.tx0.org/ci4ic4-panic-01.png
>
> I have the core dump, if it is of interest. I thought it might be
> useful, as the panic is apparently in the wm driver.
>
> Chavdar
> --
> 



-- 



Re: Panic on a -current from 13/12/2018

2018-12-15 Thread Paul Goyette

On Sat, 15 Dec 2018, Chavdar Ivanov wrote:

> Hi,
>
> On 8.99.27 AMD64 running under VirtualBox, I got a panic this morning; see
> http://ci4ic4.tx0.org/ci4ic4-panic-01.png
>
> I have the core dump, if it is of interest. I thought it might be
> useful, as the panic is apparently in the wm driver.


Oh, dear.

I just updated my one-and-only machine (real hardware, nothing virtual)
to yesterday's -current. And my one-and-only network interface is
(of course) a wm0!

I hope this panic is not frequently encountered.


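+------------------+--------------------------+----------------------------+
| Paul Goyette     | PGP Key fingerprint:     | E-mail addresses:          |
| (Retired)        | FA29 0E3B 35AF E8AE 6651 | paul at whooppee dot com   |
| Kernel Developer | 0786 F758 55DE 53BA 7731 | pgoyette at netbsd dot org |
+------------------+--------------------------+----------------------------+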


Panic on a -current from 13/12/2018

2018-12-15 Thread Chavdar Ivanov
Hi,

On 8.99.27 AMD64 running under VirtualBox, I got a panic this morning; see
http://ci4ic4.tx0.org/ci4ic4-panic-01.png

I have the core dump, if it is of interest. I thought it might be
useful, as the panic is apparently in the wm driver.

Chavdar
--