Re: Ryzen lockup on bhyve was (Re: new Ryzen lockup issue ?)

2018-03-17 Thread Nimrod Levy
Looks like I got almost 4 full weeks before it locked up this morning
:(



On Fri, Feb 23, 2018 at 3:33 PM Nimrod Levy  wrote:

> After a couple of hours of running the iperf commands you were testing
> with, I'm unable to duplicate this so far.
>
> I'm running with FreeBSD stable from 17-Feb with the commits noted in
> https://reviews.freebsd.org/D14347 pulled in.
>
> I've also lowered the memory clock and disabled c-states in the bios.
>
> The bhyve VM is running CentOS.
>
> The system has been up for over 6 days and has been running the iperf3
> loop for over 2 hours.
>
> The hardware is an Asus prime B350-Plus with a Ryzen 5 1600 and 32G of RAM.
>
> --
> Nimrod
>
>
> On Fri, Feb 23, 2018 at 3:22 PM, Mike Tancsa  wrote:
>
>> Actually I can confirm the same sort of hard lockup happens on my Epyc
>> board with RELENG11.  It also happens in current. I will file a PR and
>> post on freebsd-current in case someone has any suggestions on how to
>> try and figure out what's going on.
>>
>> I upgraded the box to
>> 12.0-CURRENT #0 r329866
>> in order to see if it could avoid the lockup, but same deal.  The vmm
>> driver does seem different when loaded, but I get the same lockup under load.
>>
>> CPU: AMD Ryzen 5 1600X Six-Core Processor (3593.35-MHz K8-class CPU)
>>   Origin="AuthenticAMD"  Id=0x800f11  Family=0x17  Model=0x1  Stepping=1
>>   Features=0x178bfbff
>>   Features2=0x7ed8320b
>>   AMD Features=0x2e500800
>>   AMD Features2=0x35c233ff
>>   Structured Extended Features=0x209c01a9
>>   XSAVE Features=0xf
>>   AMD Extended Feature Extensions ID EBX=0x7
>>   SVM: NP,NRIP,VClean,AFlush,DAssist,NAsids=32768
>>   TSC: P-state invariant, performance statistics
>>
>>
>> AMD-Vi: IVRS Info VAsize = 64 PAsize = 48 GVAsize = 2 flags:0
>> driver bug: Unable to set devclass (class: ppc devname: (unknown))
>> ivhd0:  on acpi0
>> ivhd0: Flag:b0
>> ivhd0: Features(type:0x11) MsiNumPPR = 0 PNBanks= 2 PNCounters= 0
>> ivhd0: Extended features[31:0]:22294ada HATS =
>> 0x2 GATS = 0x0 GLXSup = 0x1 SmiFSup = 0x1 SmiFRC = 0x2 GAMSup = 0x1
>> DualPortLogSup = 0x2 DualEventLogSup = 0x2
>> ivhd0: Extended features[62:32]:f77ef Max PASID: 0x2f
>> DevTblSegSup = 0x3 MarcSup = 0x1
>> ivhd0: supported paging level:7, will use only: 4
>> ivhd0: device range: 0x0 - 0x
>> ivhd0: PCI cap 0x190b640f@0x40 feature:19
>>
>>
>>
>> On 2/23/2018 12:35 PM, Nimrod Levy wrote:
>> > Now that is a fascinating data point. My machine that I've been having
>> > issues with has been running a bhyve vm from the beginning.  I never
>> > made the connection. I'll try throwing some network traffic at the VM
>> > and see if I can make it lock up.
>> >
>> > On Fri, Feb 23, 2018 at 10:14 AM, Mike Tancsa wrote:
>> >
>> > On 2/22/2018 3:41 PM, Mike Tancsa wrote:
>> > > On 2/21/2018 3:04 PM, Mike Tancsa wrote:
>> > >> Not sure if I have found another issue specific to Ryzen, or a bug that
>> > >> manifests itself more easily on Ryzen systems.  I installed the latest
>> > >> virtualbox from the ports and was doing some network performance tests
>> > >> between a vm and the hypervisor using iperf3.  The guest is just a
>> > >> RELENG11 image and the network is an em nic bridged to epair1b
>> > >
>> > > This looks possibly related to VirtualBox. Doing the same tests and more
>> > > using bhyve, I don't get any lockup.  Not to mention, network IO is MUCH
>> > > faster.
>> >
>> >
>> > Actually, it just took a little bit longer to lock up the box with bhyve
>> > on RELENG_11 as the hypervisor.  Would be great if anyone can confirm
>> > this locks up their Ryzen boxes? I tried 2 different boxes to eliminate
>> > a hardware issue.  Also tried a similar test on Ubuntu and I can spin up
>> > 4 instances and run without lockups.
>> >
>> > Just grab a copy of
>> >
>> > https://download.freebsd.org/ftp/releases/VM-IMAGES/11.1-RELEASE/amd64/Latest/FreeBSD-11.1-RELEASE-amd64.raw.xz
>> >
>> > and make 2 copies. tmp.raw and tmp2.raw
>> >
>> >
>> > kldload vmm
>> > ifconfig tap0 create
>> > ifconfig tap1 create
>> > ifconfig tap1 up
>> > ifconfig tap0 up
>> > ifconfig bridge0 create addm tap0 addm tap1
>> > ifconfig bridge0 192.168.99.1/24
>> >
>> > screen -d -m sh /usr/share/examples/bhyve/vmrun.sh -c 4 -m 6144M -t tap0 -d tmp.raw BSD11a
>> > screen -d -m sh /usr/share/examples/bhyve/vmrun.sh -c 4 -m 6144M -t tap1 -d tmp2.raw BSD11b
>> >
>> > Install netperf on the 2 VMs and give the vtnet interfaces
>> > 192.168.99.2/24 and 192.168.99.3/24
>> >
>> > In both VMs pkg install iperf3 and start it up as
>> > iperf -s
>> >
>> > In the hy

Re: Ryzen lockup on bhyve was (Re: new Ryzen lockup issue ?)

2018-02-23 Thread Nimrod Levy
I'm not using any FreeBSD VMs, only on the bare metal host.  The VM
instance is CentOS.  The network to the VM is a tap bridge.

On Fri, Feb 23, 2018 at 3:44 PM, Mike Tancsa  wrote:

> On 2/23/2018 3:33 PM, Nimrod Levy wrote:
> > After a couple of hours of running the iperf commands you were testing
> > with, I'm unable to duplicate this so far.
> >
> > I'm running with FreeBSD stable from 17-Feb with the commits noted
> > in https://reviews.freebsd.org/D14347
> >  pulled in.
> >
> > I've also lowered the memory clock and disabled c-states in the bios.
> >
> > The bhyve VM is running CentOS.
> >
> > The system has been up for over 6 days and has been running the iperf3
> > loop for over 2 hours.
> >
> > The hardware is an Asus prime B350-Plus with a Ryzen 5 1600 and 32G of
> RAM.
>
> Hmmm, interesting. Thank you for testing. I have a slightly different
> chipset (X370-PRO).  I wouldn't think that would make a difference. And
> considering the Epyc also crashes, although that takes a LOT longer, I
> am not sure what's going on.
>
> There is yet another BIOS update for this board, so I am trying with that.
> The vmm driver does report things a little differently, so we will see.
>
> ivhd0:  on acpi0
> ivhd0: Flag:b0
> ivhd0: Features(type:0x11) MsiNumPPR = 0 PNBanks= 2 PNCounters= 0
> ivhd0: Extended features[31:0]:22294ada HATS =
> 0x2 GATS = 0x0 GLXSup = 0x1 SmiFSup = 0x1 SmiFRC = 0x2 GAMSup = 0x1
> DualPortLogSup = 0x2 DualEventLogSup = 0x2
> ivhd0: Extended features[62:32]:f77ef Max PASID: 0x2f
> DevTblSegSup = 0x3 MarcSup = 0x1
> ivhd0: supported paging level:7, will use only: 4
> ivhd0: device range: 0x0 - 0x
> ivhd0: PCI cap 0x190b640f@0x40 feature:19
>
> The load also matters. Try with 2 FreeBSD instances. I am not able to
> quickly crash with just one.
>
> ---Mike
>
>
>
> --
> ---
> Mike Tancsa, tel +1 519 651 3400 x203
> Sentex Communications, m...@sentex.net
> Providing Internet services since 1994 www.sentex.net
> Cambridge, Ontario Canada
>
___
freebsd-stable@freebsd.org mailing list
https://lists.freebsd.org/mailman/listinfo/freebsd-stable
To unsubscribe, send any mail to "freebsd-stable-unsubscr...@freebsd.org"


Re: Ryzen lockup on bhyve was (Re: new Ryzen lockup issue ?)

2018-02-23 Thread Mike Tancsa
On 2/23/2018 3:33 PM, Nimrod Levy wrote:
> After a couple of hours of running the iperf commands you were testing
> with, I'm unable to duplicate this so far.
> 
> I'm running with FreeBSD stable from 17-Feb with the commits noted
> in https://reviews.freebsd.org/D14347
>  pulled in.
> 
> I've also lowered the memory clock and disabled c-states in the bios.
> 
> The bhyve VM is running CentOS.
> 
> The system has been up for over 6 days and has been running the iperf3
> loop for over 2 hours.
> 
> The hardware is an Asus prime B350-Plus with a Ryzen 5 1600 and 32G of RAM.

Hmmm, interesting. Thank you for testing. I have a slightly different
chipset (X370-PRO).  I wouldn't think that would make a difference. And
considering the Epyc also crashes, although that takes a LOT longer, I
am not sure what's going on.

There is yet another BIOS update for this board, so I am trying with that.
The vmm driver does report things a little differently, so we will see.

ivhd0:  on acpi0
ivhd0: Flag:b0
ivhd0: Features(type:0x11) MsiNumPPR = 0 PNBanks= 2 PNCounters= 0
ivhd0: Extended features[31:0]:22294ada HATS =
0x2 GATS = 0x0 GLXSup = 0x1 SmiFSup = 0x1 SmiFRC = 0x2 GAMSup = 0x1
DualPortLogSup = 0x2 DualEventLogSup = 0x2
ivhd0: Extended features[62:32]:f77ef Max PASID: 0x2f
DevTblSegSup = 0x3 MarcSup = 0x1
ivhd0: supported paging level:7, will use only: 4
ivhd0: device range: 0x0 - 0x
ivhd0: PCI cap 0x190b640f@0x40 feature:19

The load also matters. Try with 2 FreeBSD instances. I am not able to
quickly crash with just one.
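
A minimal sketch of loading both guests at once, not necessarily the
loop used in this thread. It assumes the two guests answer on
192.168.99.2 and 192.168.99.3 with "iperf3 -s" already running in each.
The function names and the GUESTS and DRY_RUN variables are made up for
illustration.

```shell
#!/bin/sh
# Sketch only: drive short iperf3 tests at several bhyve guests in parallel.
# GUESTS, DRY_RUN and all function names are illustrative assumptions.

GUESTS="${GUESTS:-192.168.99.2 192.168.99.3}"
DRY_RUN="${DRY_RUN:-0}"

# Run a command, or just print it when DRY_RUN=1.
run() {
    if [ "$DRY_RUN" = 1 ]; then
        echo "$*"
    else
        "$@"
    fi
}

# One forward and one reverse (-R) 1-second test against a single guest.
hit_guest() {
    run iperf3 -t 1 -c "$1"
    run iperf3 -t 1 -R -c "$1"
}

# One pass: start a worker per guest in the background, then wait for all.
one_pass() {
    for g in $GUESTS; do
        hit_guest "$g" &
    done
    wait
}

# Repeat one_pass the requested number of times.
loop_passes() {
    n=0
    while [ "$n" -lt "$1" ]; do
        one_pass
        n=$((n + 1))
    done
}
```

With the functions loaded on the hypervisor, "loop_passes 1000" runs a
thousand passes against both guests; setting DRY_RUN=1 first only prints
the iperf3 commands, which is handy for checking the addresses.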

---Mike



-- 
---
Mike Tancsa, tel +1 519 651 3400 x203
Sentex Communications, m...@sentex.net
Providing Internet services since 1994 www.sentex.net
Cambridge, Ontario Canada


Re: Ryzen lockup on bhyve was (Re: new Ryzen lockup issue ?)

2018-02-23 Thread Nimrod Levy
After a couple of hours of running the iperf commands you were testing
with, I'm unable to duplicate this so far.

I'm running with FreeBSD stable from 17-Feb with the commits noted in
https://reviews.freebsd.org/D14347 pulled in.

I've also lowered the memory clock and disabled c-states in the bios.

The bhyve VM is running CentOS.

The system has been up for over 6 days and has been running the iperf3 loop
for over 2 hours.

The hardware is an Asus prime B350-Plus with a Ryzen 5 1600 and 32G of RAM.

--
Nimrod


On Fri, Feb 23, 2018 at 3:22 PM, Mike Tancsa  wrote:

> Actually I can confirm the same sort of hard lockup happens on my Epyc
> board with RELENG11.  It also happens in current. I will file a PR and
> post on freebsd-current in case someone has any suggestions on how to
> try and figure out what's going on.
>
> I upgraded the box to
> 12.0-CURRENT #0 r329866
> in order to see if it could avoid the lockup, but same deal.  The vmm
> driver does seem different when loaded, but I get the same lockup under load.
>
> CPU: AMD Ryzen 5 1600X Six-Core Processor (3593.35-MHz K8-class CPU)
>   Origin="AuthenticAMD"  Id=0x800f11  Family=0x17  Model=0x1  Stepping=1
>   Features=0x178bfbff SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,CLFLUSH,MMX,FXSR,SSE,SSE2,HTT>
>   Features2=0x7ed8320b 1,SSE4.2,MOVBE,POPCNT,AESNI,XSAVE,OSXSAVE,AVX,F16C,RDRAND>
>   AMD Features=0x2e500800
>   AMD Features2=0x35c233ff Prefetch,OSVW,SKINIT,WDT,TCE,Topology,PCXC,PNXC,DBE,PL2I,MWAITX>
>   Structured Extended Features=0x209c01a9 SMAP,CLFLUSHOPT,SHA>
>   XSAVE Features=0xf
>   AMD Extended Feature Extensions ID EBX=0x7
>   SVM: NP,NRIP,VClean,AFlush,DAssist,NAsids=32768
>   TSC: P-state invariant, performance statistics
>
>
> AMD-Vi: IVRS Info VAsize = 64 PAsize = 48 GVAsize = 2 flags:0
> driver bug: Unable to set devclass (class: ppc devname: (unknown))
> ivhd0:  on acpi0
> ivhd0: Flag:b0
> ivhd0: Features(type:0x11) MsiNumPPR = 0 PNBanks= 2 PNCounters= 0
> ivhd0: Extended features[31:0]:22294ada HATS =
> 0x2 GATS = 0x0 GLXSup = 0x1 SmiFSup = 0x1 SmiFRC = 0x2 GAMSup = 0x1
> DualPortLogSup = 0x2 DualEventLogSup = 0x2
> ivhd0: Extended features[62:32]:f77ef Max PASID: 0x2f
> DevTblSegSup = 0x3 MarcSup = 0x1
> ivhd0: supported paging level:7, will use only: 4
> ivhd0: device range: 0x0 - 0x
> ivhd0: PCI cap 0x190b640f@0x40 feature:19
>
>
>
> On 2/23/2018 12:35 PM, Nimrod Levy wrote:
> > Now that is a fascinating data point. My machine that I've been having
> > issues with has been running a bhyve vm from the beginning.  I never
> > made the connection. I'll try throwing some network traffic at the VM
> > and see if I can make it lock up.
> >
> > On Fri, Feb 23, 2018 at 10:14 AM, Mike Tancsa wrote:
> >
> > On 2/22/2018 3:41 PM, Mike Tancsa wrote:
> > > On 2/21/2018 3:04 PM, Mike Tancsa wrote:
> > >> Not sure if I have found another issue specific to Ryzen, or a bug that
> > >> manifests itself more easily on Ryzen systems.  I installed the latest
> > >> virtualbox from the ports and was doing some network performance tests
> > >> between a vm and the hypervisor using iperf3.  The guest is just a
> > >> RELENG11 image and the network is an em nic bridged to epair1b
> > >
> > > This looks possibly related to VirtualBox. Doing the same tests and more
> > > using bhyve, I don't get any lockup.  Not to mention, network IO is MUCH
> > > faster.
> >
> >
> > Actually, it just took a little bit longer to lock up the box with bhyve
> > on RELENG_11 as the hypervisor.  Would be great if anyone can confirm
> > this locks up their Ryzen boxes? I tried 2 different boxes to eliminate
> > a hardware issue.  Also tried a similar test on Ubuntu and I can spin up
> > 4 instances and run without lockups.
> >
> > Just grab a copy of
> >
> > https://download.freebsd.org/ftp/releases/VM-IMAGES/11.1-RELEASE/amd64/Latest/FreeBSD-11.1-RELEASE-amd64.raw.xz
> >
> > and make 2 copies. tmp.raw and tmp2.raw
> >
> >
> > kldload vmm
> > ifconfig tap0 create
> > ifconfig tap1 create
> > ifconfig tap1 up
> > ifconfig tap0 up
> > ifconfig bridge0 create addm tap0 addm tap1
> > ifconfig bridge0 192.168.99.1/24
> >
> > screen -d -m sh /usr/share/examples/bhyve/vmrun.sh -c 4 -m 6144M -t tap0 -d tmp.raw BSD11a
> > screen -d -m sh /usr/share/examples/bhyve/vmrun.sh -c 4 -m 6144M -t tap1 -d tmp2.raw BSD11b
> >
> > Install netperf on the 2 VMs and give the vtnet interfaces
> > 192.168.99.2/24 and 192.168.99.3/24
> >
> > In both VMs pkg install iperf3 and start it up as
> > iperf -s
> >
> > In the hypervisor,
> > iperf -t 1 -R -c 192.168.99.2
> > iperf -t 1 -c 192