Re: Ryzen issues on FreeBSD ?

2018-01-23 Thread Don Lewis
On 23 Jan, Mike Tancsa wrote:
> On 1/22/2018 5:13 PM, Don Lewis wrote:
>>>
>>> I am trying an RMA with AMD.
>> 
>> Something else that you might want to try is 12.0-CURRENT.  There might
>> be some changes in HEAD that need to be merged back to 11.1-STABLE.
> 
> It looks like this thread got mentioned on Phoronix :) In the comments
> section (comment #9) a post makes reference to
> 
> http://blog.programster.org/ubuntu-16-04-compile-custom-kernel-for-ryzen
> 
> I guess Linux is still working through similar lockups too :(

Yes.  Interesting (and fairly concise) thread here:
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1690085

___
freebsd-stable@freebsd.org mailing list
https://lists.freebsd.org/mailman/listinfo/freebsd-stable
To unsubscribe, send any mail to "freebsd-stable-unsubscr...@freebsd.org"


Re: Ryzen issues on FreeBSD ?

2018-01-23 Thread Pete French

On 23/01/2018 19:08, Mike Tancsa wrote:
> It looks like this thread got mentioned on Phoronix :) In the comments
> section (comment #9) a post makes reference to
>
> http://blog.programster.org/ubuntu-16-04-compile-custom-kernel-for-ryzen
>
> I guess Linux is still working through similar lockups too :(

Interesting - do we have anything like RCU implemented in the kernel
which might be worth looking at? From a quick glance it looks like it's
just a software technique, so I can't see which bits of the CPU it's
tickling that might cause issues though.



Re: Ryzen issues on FreeBSD ?

2018-01-23 Thread Mike Tancsa
On 1/23/2018 2:08 PM, Mike Tancsa wrote:
> On 1/22/2018 5:13 PM, Don Lewis wrote:
>>>
>>> I am trying an RMA with AMD.
>>
>> Something else that you might want to try is 12.0-CURRENT.  There might
>> be some changes in HEAD that need to be merged back to 11.1-STABLE.
> 
> It looks like this thread got mentioned on Phoronix :) In the comments
> section (comment #9) a post makes reference to
> 
> http://blog.programster.org/ubuntu-16-04-compile-custom-kernel-for-ryzen
> 
> I guess Linux is still working through similar lockups too :(

Ubuntu has a patch / workaround for these random lockups, which
sound symptomatically very similar to what some of us have been experiencing:

https://bugs.launchpad.net/linux/+bug/1690085/comments/69

---Mike



-- 
---
Mike Tancsa, tel +1 519 651 3400
Sentex Communications, m...@sentex.net
Providing Internet services since 1994 www.sentex.net
Cambridge, Ontario Canada   http://www.tancsa.com/


Re: Ryzen issues on FreeBSD ?

2018-01-23 Thread Mike Tancsa
On 1/23/2018 2:22 PM, Andriy Gapon wrote:
>>
>> ctrl+T shows
> 
> If that works, then maybe you can get procstat -kk -a or a crash dump.
> Maybe this is not a hardware problem at all (or maybe it is).

Unfortunately all 3 CPUs are packed up now and on their way to AMD for
RMA.  As soon as I get some replacements, I will get back to this.  I am
thinking of looking at a Threadripper board in the meantime, as the Epyc
ones are on 2-4 week back order from my suppliers :(


> 
>> load: 1.98  cmd: python2.7 53438 [usem] 54.70r 14.98u 6.04s 0% 230992k
>> make: Working in: /usr/ports/net/samba47
>> load: 0.34  cmd: python2.7 53438 [usem] 168.48r 14.98u 6.04s 0% 230992k
>> make: Working in: /usr/ports/net/samba47
>> load: 0.31  cmd: python2.7 53438 [usem] 174.12r 14.98u 6.04s 0% 230992k
>> make: Working in: /usr/ports/net/samba47
> 
> 
> 


-- 
---
Mike Tancsa, tel +1 519 651 3400
Sentex Communications, m...@sentex.net
Providing Internet services since 1994 www.sentex.net
Cambridge, Ontario Canada   http://www.tancsa.com/


Re: Ryzen issues on FreeBSD ?

2018-01-23 Thread Andriy Gapon
On 23/01/2018 19:15, Mike Tancsa wrote:
> On 1/22/2018 5:13 PM, Don Lewis wrote:
>> On 22 Jan, Mike Tancsa wrote:
>>> On 1/22/2018 1:41 PM, Peter Moody wrote:
>>>> fwiw, I upgraded to 11-STABLE (11.1-STABLE #6 r328223), applied the
>>>> hw.lower_amd64_sharedpage setting to my loader.conf and got a crash
>>>> last night following the familiar high load -> idle. this was with SMT
>>>> re-enabled. no crashdump, so it was the hard crash that I've been
>>>> getting.
>>>
>>> hw.lower_amd64_sharedpage=1 is the default on AMD boxes, no? I didn't
>>> need to set mine to 1
>>>
>>>>
>>>> shrug, I'm at a loss here.
>>>
>>> I am trying an RMA with AMD.
>>
>> Something else that you might want to try is 12.0-CURRENT.  There might
>> be some changes in HEAD that need to be merged back to 11.1-STABLE.
> 
> 
> Temp works as expected now. However, a (similar?) hang building Samba47.
> 
> ctrl+T shows

If that works, then maybe you can get procstat -kk -a or a crash dump.
Maybe this is not a hardware problem at all (or maybe it is).

> load: 1.98  cmd: python2.7 53438 [usem] 54.70r 14.98u 6.04s 0% 230992k
> make: Working in: /usr/ports/net/samba47
> load: 0.34  cmd: python2.7 53438 [usem] 168.48r 14.98u 6.04s 0% 230992k
> make: Working in: /usr/ports/net/samba47
> load: 0.31  cmd: python2.7 53438 [usem] 174.12r 14.98u 6.04s 0% 230992k
> make: Working in: /usr/ports/net/samba47



-- 
Andriy Gapon


Re: Ryzen issues on FreeBSD ?

2018-01-23 Thread Mike Tancsa
On 1/22/2018 5:13 PM, Don Lewis wrote:
>>
>> I am trying an RMA with AMD.
> 
> Something else that you might want to try is 12.0-CURRENT.  There might
> be some changes in HEAD that need to be merged back to 11.1-STABLE.

It looks like this thread got mentioned on Phoronix :) In the comments
section (comment #9) a post makes reference to

http://blog.programster.org/ubuntu-16-04-compile-custom-kernel-for-ryzen

I guess Linux is still working through similar lockups too :(

---Mike


-- 
---
Mike Tancsa, tel +1 519 651 3400
Sentex Communications, m...@sentex.net
Providing Internet services since 1994 www.sentex.net
Cambridge, Ontario Canada   http://www.tancsa.com/


Re: Ryzen issues on FreeBSD ?

2018-01-23 Thread Don Lewis
On 23 Jan, Pete French wrote:
> On 22/01/2018 18:25, Don Lewis wrote:
>> On 22 Jan, Pete French wrote:
>>>
>>>
>>> On 21/01/2018 19:05, Peter Moody wrote:
>>>> hm, so i've got nearly 3 days of uptime with smt disabled.
>>>> unfortunately this means that my otherwise '12' cores is actually only
>>>> '6'. I'm also getting occasional segfaults compiling go programs.
>>>
>>> Isn't go known to have issues on BSD anyway, though? I have seen
>>> complaints of random crashes running go under BSD systems - and
>>> presumably the go compiler itself is written in go, so those issues
>>> might surface when compiling.
>> 
>> Not that I'm aware of.  I'm not a heavy go user on FreeBSD, but I don't
>> recall any unexpected go crashes and I haven't seen problems building
>> go on my older AMD machines.
> 
> 
> From the go 1.9 release notes:
> 
> "Known Issues
> There are some instabilities on FreeBSD that are known but not 
> understood. These can lead to program crashes in rare cases. See issue 
> 15658. Any help in solving this FreeBSD-specific issue would be 
> appreciated."
> 
> ( link is to https://github.com/golang/go/issues/15658 )
> 
> Having said that, we use it internally and have not seen any issues with 
> it ourselves. Just I am wary of the release notes, and that issue report.

Interesting ...

I've only seen problems on my Ryzen machine, which has >= 2x the number
of cores as any of my other machines.  All are AMD CPUs.




Re: Ryzen issues on FreeBSD ?

2018-01-23 Thread Mike Tancsa
On 1/22/2018 5:13 PM, Don Lewis wrote:
> On 22 Jan, Mike Tancsa wrote:
>> On 1/22/2018 1:41 PM, Peter Moody wrote:
>>> fwiw, I upgraded to 11-STABLE (11.1-STABLE #6 r328223), applied the
>>> hw.lower_amd64_sharedpage setting to my loader.conf and got a crash
>>> last night following the familiar high load -> idle. this was with SMT
>>> re-enabled. no crashdump, so it was the hard crash that I've been
>>> getting.
>>
>> hw.lower_amd64_sharedpage=1 is the default on AMD boxes, no? I didn't
>> need to set mine to 1
>>
>>>
>>> shrug, I'm at a loss here.
>>
>> I am trying an RMA with AMD.
> 
> Something else that you might want to try is 12.0-CURRENT.  There might
> be some changes in HEAD that need to be merged back to 11.1-STABLE.


Temp works as expected now. However, a (similar?) hang building Samba47.

ctrl+T shows


load: 1.98  cmd: python2.7 53438 [usem] 54.70r 14.98u 6.04s 0% 230992k
make: Working in: /usr/ports/net/samba47
load: 0.34  cmd: python2.7 53438 [usem] 168.48r 14.98u 6.04s 0% 230992k
make: Working in: /usr/ports/net/samba47
load: 0.31  cmd: python2.7 53438 [usem] 174.12r 14.98u 6.04s 0% 230992k
make: Working in: /usr/ports/net/samba47
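
For anyone who wants to compare ctrl+T samples like these by script
rather than by eye, here is a minimal sketch. It assumes the status line
layout seen in the three samples above; parse_status and looks_stuck are
hypothetical helper names, not an existing tool. It flags the case where
real time keeps advancing while user+sys CPU time stays flat on the same
wait channel.

```python
import re

# Field layout assumed from the ctrl+T (SIGINFO) samples quoted above:
# load, command, pid, [wait channel], real/user/sys seconds, %CPU, RSS in kB.
STATUS_RE = re.compile(
    r"load:\s+(?P<load>[\d.]+)\s+cmd:\s+(?P<cmd>\S+)\s+(?P<pid>\d+)\s+"
    r"\[(?P<wchan>[^\]]+)\]\s+(?P<real>[\d.]+)r\s+(?P<user>[\d.]+)u\s+"
    r"(?P<sys>[\d.]+)s\s+(?P<cpu>\d+)%\s+(?P<rss>\d+)k"
)


def parse_status(line):
    """Return the interesting fields of one status line, or None if no match."""
    m = STATUS_RE.search(line)
    if m is None:
        return None
    return {
        "cmd": m["cmd"],
        "pid": int(m["pid"]),
        "wchan": m["wchan"],
        "real": float(m["real"]),
        "cpu": float(m["user"]) + float(m["sys"]),
    }


def looks_stuck(lines):
    """True if the same pid sits on the same wait channel and burns no CPU
    between the first and last parsable sample while real time advances."""
    samples = [s for s in map(parse_status, lines) if s]
    if len(samples) < 2:
        return False
    first, last = samples[0], samples[-1]
    same_proc = first["pid"] == last["pid"] and first["wchan"] == last["wchan"]
    return same_proc and last["real"] > first["real"] and last["cpu"] == first["cpu"]


SAMPLES = [
    "load: 1.98  cmd: python2.7 53438 [usem] 54.70r 14.98u 6.04s 0% 230992k",
    "make: Working in: /usr/ports/net/samba47",
    "load: 0.34  cmd: python2.7 53438 [usem] 168.48r 14.98u 6.04s 0% 230992k",
    "load: 0.31  cmd: python2.7 53438 [usem] 174.12r 14.98u 6.04s 0% 230992k",
]

if __name__ == "__main__":
    print("stuck" if looks_stuck(SAMPLES) else "making progress")
```

This only says the process is idle in [usem]; it cannot tell a software
deadlock from a hardware fault.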

Going to try the RMA route and see if the replacement CPU avoids this
problem.


 # uname -a
FreeBSD amdtestr12.sentex.ca 12.0-CURRENT FreeBSD 12.0-CURRENT #1
r328282: Tue Jan 23 11:34:18 EST 2018
mdtan...@amdtestr12.sentex.ca:/usr/obj/usr/src/amd64.amd64/sys/server  amd64



dev.amdtemp.0.core0.sensor0: 52.6C
dev.amdtemp.0.sensor_offset: 0
dev.amdtemp.0.%parent: hostb0
dev.amdtemp.0.%pnpinfo:
dev.amdtemp.0.%location:
dev.amdtemp.0.%driver: amdtemp
dev.amdtemp.0.%desc: AMD CPU On-Die Thermal Sensors
dev.amdtemp.%parent:
dev.cpu.11.temperature: 52.6C
dev.cpu.10.temperature: 52.6C
dev.cpu.9.temperature: 52.6C
dev.cpu.8.temperature: 52.6C
dev.cpu.7.temperature: 52.6C
dev.cpu.6.temperature: 52.6C
dev.cpu.5.temperature: 52.6C
dev.cpu.4.temperature: 52.6C
dev.cpu.3.temperature: 52.6C
dev.cpu.2.temperature: 52.6C
dev.cpu.1.temperature: 52.6C
dev.cpu.0.temperature: 52.6C





-- 
---
Mike Tancsa, tel +1 519 651 3400
Sentex Communications, m...@sentex.net
Providing Internet services since 1994 www.sentex.net
Cambridge, Ontario Canada   http://www.tancsa.com/


Re: Clock occasionally jumps backwards on 11.1-RELEASE

2018-01-23 Thread Alan Somers
On Tue, Jan 23, 2018 at 3:48 AM, Mike Pumford  wrote:

> On 22/01/2018 17:07, Alan Somers wrote:
>
>> Since upgrading my jail server to 11.1-RELEASE, the clock occasionally
>> jumps backwards by 5-35 minutes for no apparent reason.  Has anybody seen
>> something like this?
>>
>> Details
>> =======
>>
>> * Happens about once a day on my jail server, and has happened at least
>> once on a separate bhyve server.
>>
>> * The jumps almost always happen between 1 and 3 AM, but I've also seen
>> them happen at 06:30 and 20:15.
>
> That's the window when the periodic scripts are run, which, if you have a
> default configuration and a lot of jails, will put the system under a lot of
> stress.
>

That did not escape my notice.  However, none of the jails'
periodic jobs involve the clock in any way.  And I wouldn't think that a
high CPU load could cause clock drift, could it?  This isn't Windows XP,
after all.


>> * I'm using the default ntp.conf file.
>
> Are you running ntpd inside the jail or on the jail host? On my jail
> systems (which are 10.3 and 11.1) I run ntpd on the jail host (outside all
> jails) and not inside the jails; the jails then get accurate time
> because the underlying host has accurate time.
>

Only on the host.

New info: there is a possibility that my NFS server is hanging for a while.
That would explain my problem's timing.  However, ntpd shouldn't be
accessing any NFS shares, and I wouldn't think that a hung NFS server
should be able to pause the clock.  I'm doing a new experiment that should
be more informative.  But I'll have to wait until the problem recurs to
learn anything.

-Alan


Re: Clock occasionally jumps backwards on 11.1-RELEASE

2018-01-23 Thread Robert Blayzor
On Jan 22, 2018, at 12:07 PM, Alan Somers  wrote:
> 
> * Sometimes the jumps happen immediately after ntpd adds a new server to
> its list, but not always.
> 
> * I'm using the default ntp.conf file.
> 
> * ntpd is running on both, and it should be the only process touching the
> clock.   I have a script running "ntpq -c peers" once a minute, which shows
> the offset for one server suddenly jump to a large negative number.  Then
> the offsets for other servers jump to the same value, then either ntpd
> fixes the clock or exits because the offset is too high.
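
The once-a-minute "ntpq -c peers" check described above could be
automated roughly as follows. This is a sketch only: it assumes the usual
ten-column peers billboard with offsets in milliseconds, the host names
and numbers in the sample text are made up, and parse_peer_offsets and
find_jumps are hypothetical helper names.

```python
# Characters ntpq may print in column one as the peer's selection status.
TALLY_CHARS = " x.-+#*o"


def parse_peer_offsets(text):
    """Map each remote in 'ntpq -c peers' output to its offset (ms)."""
    offsets = {}
    for line in text.splitlines():
        if not line.strip():
            continue
        # Drop the one-character tally code in front of the remote name.
        fields = line[1:].split() if line[0] in TALLY_CHARS else line.split()
        if len(fields) != 10:
            continue  # separator line or unexpected layout
        try:
            offsets[fields[0]] = float(fields[8])
        except ValueError:
            continue  # header line: the offset column holds the word "offset"
    return offsets


def find_jumps(previous, current, threshold_ms=1000.0):
    """Return {peer: (old, new)} for offsets that moved more than the threshold."""
    return {
        peer: (previous[peer], off)
        for peer, off in current.items()
        if peer in previous and abs(off - previous[peer]) > threshold_ms
    }


# Made-up samples, one minute apart; a real script would capture ntpq output.
BEFORE = """\
     remote           refid      st t when poll reach   delay   offset  jitter
==============================================================================
*ntp1.example.net .GPS.           1 u   33   64  377   12.345    0.512   0.456
+ntp2.example.net 192.0.2.10      2 u   12   64  377   20.100   -0.301   0.800
"""

AFTER = """\
     remote           refid      st t when poll reach   delay   offset  jitter
==============================================================================
*ntp1.example.net .GPS.           1 u   29   64  377   12.400 -301234.567 0.500
+ntp2.example.net 192.0.2.10      2 u    8   64  377   20.050   -0.290   0.790
"""

if __name__ == "__main__":
    jumps = find_jumps(parse_peer_offsets(BEFORE), parse_peer_offsets(AFTER))
    for peer, (old, new) in jumps.items():
        print(f"{peer}: offset moved from {old} ms to {new} ms")
```

Logging the first peer to jump, together with a timestamp, would show
whether one server misbehaves or the local clock itself stepped.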


- Lose ntpd running in jails and run it only on the host. Running in the jail 
is totally unnecessary.

- Is this a bare metal server or VM? Lots of clock issues with VMs…

- Stagger your periodic jobs on the host and the jail so they don’t all run at 
the same time
  slamming the host.
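
One way to work out staggered start times for the last point, as a rough
sketch: it assumes each jail keeps the stock /etc/crontab line that runs
"periodic daily" at 03:01, and the jail names, the ten-minute spacing and
the staggered_daily_entries helper are all hypothetical.

```python
def staggered_daily_entries(jails, start_hour=3, start_minute=1, step_minutes=10):
    """Return {jail: crontab line}, each jail's daily run offset from the last."""
    entries = {}
    for index, jail in enumerate(jails):
        # Minutes since midnight, wrapped so late offsets roll over cleanly.
        total = (start_hour * 60 + start_minute + index * step_minutes) % (24 * 60)
        hour, minute = divmod(total, 60)
        # Same field order as /etc/crontab: minute hour mday month wday who command
        entries[jail] = f"{minute}\t{hour}\t*\t*\t*\troot\tperiodic daily"
    return entries


if __name__ == "__main__":
    for jail, line in staggered_daily_entries(["www", "db", "mail"]).items():
        print(f"{jail}: {line}")
```

Each generated line would replace the "periodic daily" entry in that
jail's own /etc/crontab; the weekly and monthly entries can be spread
out the same way.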

--
inoc.net!rblayzor
XMPP: rblayzor.AT.inoc.net
PGP:  https://inoc.net/~rblayzor/



Re: Clock occasionally jumps backwards on 11.1-RELEASE

2018-01-23 Thread Mike Pumford

On 22/01/2018 17:07, Alan Somers wrote:
> Since upgrading my jail server to 11.1-RELEASE, the clock occasionally
> jumps backwards by 5-35 minutes for no apparent reason.  Has anybody seen
> something like this?
>
> Details
> =======
>
> * Happens about once a day on my jail server, and has happened at least
> once on a separate bhyve server.
>
> * The jumps almost always happen between 1 and 3 AM, but I've also seen
> them happen at 06:30 and 20:15.

That's the window when the periodic scripts are run, which, if you have a
default configuration and a lot of jails, will put the system under a lot
of stress.

> * I'm using the default ntp.conf file.

Are you running ntpd inside the jail or on the jail host? On my jail
systems (which are 10.3 and 11.1) I run ntpd on the jail host (outside
all jails) and not inside the jails; the jails then get accurate
time because the underlying host has accurate time.


Mike

--
Mike Pumford | Senior Software Engineer

T: +44 (0) 1225 710635

BSQUARE - The business of IoT

www.bsquare.com 


Re: Ryzen issues on FreeBSD ?

2018-01-23 Thread Pete French

On 22/01/2018 18:25, Don Lewis wrote:
> On 22 Jan, Pete French wrote:
>>
>> On 21/01/2018 19:05, Peter Moody wrote:
>>> hm, so i've got nearly 3 days of uptime with smt disabled.
>>> unfortunately this means that my otherwise '12' cores is actually only
>>> '6'. I'm also getting occasional segfaults compiling go programs.
>>
>> Isn't go known to have issues on BSD anyway, though? I have seen
>> complaints of random crashes running go under BSD systems - and
>> presumably the go compiler itself is written in go, so those issues
>> might surface when compiling.
>
> Not that I'm aware of.  I'm not a heavy go user on FreeBSD, but I don't
> recall any unexpected go crashes and I haven't seen problems building
> go on my older AMD machines.

From the go 1.9 release notes:

"Known Issues
There are some instabilities on FreeBSD that are known but not
understood. These can lead to program crashes in rare cases. See issue
15658. Any help in solving this FreeBSD-specific issue would be
appreciated."

( link is to https://github.com/golang/go/issues/15658 )

Having said that, we use it internally and have not seen any issues with
it ourselves. I am just wary of the release notes, and that issue report.
