Re: Ryzen issues on FreeBSD ?

2018-01-21 Thread Mark Millard via freebsd-stable
On 2018-Jan-21, at 12:17 PM, Don Lewis  wrote:

> On 20 Jan, Mark Millard wrote:
>> Don Lewis truckman at FreeBSD.org wrote on
>> Sat Jan 20 02:35:40 UTC 2018 :
>> 
>>> The only real problem with the old CPUs is the random segfault problem
>>> and some other random strangeness, like the lang/ghc build almost always
>>> failing.
>> 
>> 
>> At one time you had written
>> ( https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=221029
>> comment #103 on 2017-Oct-09):
>> 
>> QUOTE
>> The ghc build failure seems to be gone after upgrading to a
>> more recent 12.0-CURRENT.  I will try to bisect for the fix
>> when I have a chance.
>> END QUOTE
>> 
>> Did that not pan out? Did you conclude it was
>> hardware-context specific?
> 
> I was never able to reproduce the problem.  It seems like it failed on
> the first ports build run after I replaced the CPU.  When I upgraded the
> OS and ports, the build succeeded.  I tried going back to much earlier
> OS and ports versions, but I could never get the ghc build to fail
> again.  I'm baffled by this ...

Sounds like the overall information is then:

Old CPU: frequent problem building ghc (nearly always
 fails as far as I know)

New CPU: rare problem building ghc
 (possibly never for some software version combinations?)

(On a Ryzen Threadripper 1950X I've not seen a failure. For the
above I'm including what I observed under Hyper-V for the 1800X
and 1950X as contributing evidence: the 1800X was an early one
and fit the "Old CPU" case above. AMD has stated that
Threadrippers never had the problems that other, early Ryzen
CPUs did under heavy compiling use. So far, for me, that seems
true.)

So, it sounds like building ghc is still a good test. Back when
I had access to the 1800X Ryzen system, ghc was the most reliable
failure-to-build of what I tried. It may still be useful as a
test for classifying Ryzen CPUs with respect to this one
type of issue.
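
Editorial sketch, not part of the original message: one way to turn "build
ghc and see whether it fails" into a repeatable measurement is to run the
same build command several times and count the non-zero exits. The helper
names (run_trials, TrialSummary) are hypothetical, and the actual build
command is left to the caller because the thread does not say which tool
was used to drive the builds.

```python
import subprocess
from dataclasses import dataclass, field


@dataclass
class TrialSummary:
    runs: int = 0
    failures: int = 0
    # Exit codes of failed runs; on POSIX a negative value means
    # the process was killed by that signal number.
    failed_codes: list = field(default_factory=list)

    @property
    def failure_rate(self) -> float:
        return self.failures / self.runs if self.runs else 0.0


def run_trials(command, runs):
    """Run `command` (an argv list) `runs` times and tally non-zero exits."""
    summary = TrialSummary()
    for _ in range(runs):
        result = subprocess.run(
            command, stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL
        )
        summary.runs += 1
        if result.returncode != 0:
            summary.failures += 1
            summary.failed_codes.append(result.returncode)
    return summary
```

Calling run_trials with the argv of a clean ghc rebuild and a handful of
runs gives a failure rate that can be compared between CPUs. A low count of
runs says little about a rare failure, so "never failed" here only means
"did not fail in this many attempts".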

===
Mark Millard
marklmi at yahoo.com
( markmi at dsl-only.net is
going away in 2018-Feb, late)

___
freebsd-stable@freebsd.org mailing list
https://lists.freebsd.org/mailman/listinfo/freebsd-stable
To unsubscribe, send any mail to "freebsd-stable-unsubscr...@freebsd.org"


Re: Ryzen issues on FreeBSD ?

2018-01-21 Thread Don Lewis
On 21 Jan, Willem Jan Withagen wrote:
> On 21/01/2018 21:24, Don Lewis wrote:
>> On 21 Jan, Willem Jan Withagen wrote:
>>> On 19/01/2018 23:29, Don Lewis wrote:
>>>> On 19 Jan, Pete French wrote:
>>>>> Out of interest, is there anyone out there running Ryzen who *hasn't*
>>>>> seen lockups? I'd be curious if there are a lot of lurkers thinking "mine
>>>>> works fine"
>>>>
>>>> No hangs or silent reboots here with either my original CPU or warranty
>>>> replacement once the shared page fix was in place.
>>>
>>> Perhaps a too weird reference:
>>>
>>> I have supplied a customer with a Ryzen 5 and a B350 motherboard.
>>> He runs Windows 10, though, and I haven't heard him complain about anything
>>> like this.
>>> But I'll ask him specifically.
>> 
>> Only the BSDs were affected by the shared page issue.  I think Linux
>> already had a guard page.  I don't think Windows was affected by the
>> idle C-state issue.  I suspect it is caused by software not doing the
>> right thing during C-state transitions, but the publicly available
>> documentation from AMD is pretty lacking.  The random segfault issue is
>> primarily triggered by heavy parallel software build loads and how many
>> Windows users do that?
> 
> This is an Adobe workstation where several users log in remotely and do
> work. So I would assume that the system is seriously (ab)used.
>
> And, as expected, I'm not aware of any of the details of what
> Windows does when entering less active power states.

It might depend on the scheduler details.  On Linux and the FreeBSD ULE
scheduler, runnable threads migrate between CPUs to balance the loading
across all cores.  When I did some experiments to disable that, the rate
of build failures greatly decreased.  AMD has been very vague about the
cause of the problem (a "performance marginality") and resorted to
replacing CPUs with this problem without suggesting any sort of software
workaround.
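
Editorial sketch, not part of the original message: the message does not say
how thread migration was disabled in those experiments. A loosely related
user-level approximation is to pin each build job to a single CPU with
FreeBSD's cpuset(1), so that job cannot be moved between cores. The helper
names below are hypothetical; the sketch only constructs the command lines
and does not run them.

```python
import shlex


def pinned_commands(build_cmds, cpu_ids):
    """Pair each build command with one CPU, round-robin, via cpuset -l."""
    if not cpu_ids:
        raise ValueError("need at least one CPU id")
    pinned = []
    for i, cmd in enumerate(build_cmds):
        cpu = cpu_ids[i % len(cpu_ids)]
        # `cpuset -l <cpu-list> cmd ...` runs cmd restricted to those CPUs;
        # child processes inherit the restriction.
        pinned.append(["cpuset", "-l", str(cpu)] + list(cmd))
    return pinned


def as_shell(argv):
    """Render an argv list as a copy-pasteable shell line."""
    return " ".join(shlex.quote(arg) for arg in argv)
```

Pinning changes the load pattern as well as preventing migration, so a lower
failure rate under pinning would be suggestive at best, not proof that
migration is the trigger.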




Re: Ryzen issues on FreeBSD ?

2018-01-21 Thread Don Lewis
On 21 Jan, Peter Moody wrote:
> hm, so i've got nearly 3 days of uptime with SMT disabled.
> unfortunately this means that my otherwise '12' cores are actually only
> '6'. I'm also getting occasional segfaults compiling Go programs.

Both my original and replacement CPUs croak on Go, so I don't think an
RMA is likely to help with that.  Go is a heavy user of threads and my
suspicion is that there is some sort of issue with the locking that it
uses. I'm guessing a memory barrier issue of some sort ...



Re: Ancient FreeBSD update path

2018-01-21 Thread krad
Back everything up. Forget about upgrading; you are making your life harder.
Do a ZFS install with BEs, then port your apps. It will be quicker and less
prone to errors, and you will end up with a better system in the end. If the
system is a bit of a rat's nest, then this would be the ideal time to
document all its quirks and tidy up. If it really is that hairy, consider
running the old applications in jails on top of a new, pristine OS build.

If it's on hardware, consider new kit as well if it's an important service, or
consolidate it into a jail on something else.

On 20 January 2018 at 01:16, Bakul Shah  wrote:

> On Fri, 19 Jan 2018 13:28:41 +0100 Andrea Brancatelli <
> abrancate...@schema31.it> wrote:
> Andrea Brancatelli writes:
> > Hello guys.
> >
> > I have a couple of ancient FreeBSD installs that I have to bring into
> > this century (read either 10.4 or 11.1) :-)
> >
> > I'm talking about a FreeBSD 8.0-RELEASE-p4 and a couple of FreeBSD
> > 9.3-RELEASE-p53.
> >
> > What upgrade strategy would you suggest?
> >
> > Direct jump into the future (8 -> 11)? Progressive steps (8 -> 9 -> 10
> > -> 11)? Boiling water on the HDs? :-)
> >
> > Thanks, any suggestion is more than welcome.
>
> Incremental update will take a long time and if something gets
> messed up in the middle, you will be much worse off. You may
> also not find relevant packages any more for an EOLed release.
> And you may have to solve problems that no longer exist on
> newer packages.
>
> What I would do is to make a backup of everything, make a list
> of installed packages and config files, and do a fresh install
> of the latest release. Then get the critical packages working.
> Then add others as needed.
>
> If possible do this on a separate machine so that you can
> check config/program behavior on the original machine.  When
> you are satisfied, either switch to the other machine or copy
> things back to the original. When one of my computers was
> starting to fall apart, I did this with an inexpensive used
> thinkpad.
>
> One more thing to consider: your ancient machine hardware may
> need maintenance/repairs/replacement.  A fully
> operational second (temporary) machine gives you a chance to
> do maintenance such as carefully removing dust and cat hair,
> checking fans and replacing them if needed, replacing
> disks if older than 4 years, etc.
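
Editorial sketch, not part of the original message: the "make a list of ...
config files" step above could be approximated by recording a checksum for
every file under the configuration directories, then comparing the old and
new machines. The function names are hypothetical, and which directories to
scan (for example /etc and /usr/local/etc) is an assumption to adjust per
system. Listing installed packages is deliberately left out, because the
right tool for that differs between the old and new releases.

```python
import hashlib
import os


def config_manifest(roots):
    """Map each regular file under `roots` to its SHA-256 hex digest."""
    manifest = {}
    for root in roots:
        for dirpath, _dirnames, filenames in os.walk(root):
            for name in filenames:
                path = os.path.join(dirpath, name)
                if os.path.islink(path) or not os.path.isfile(path):
                    continue  # skip symlinks, sockets, devices
                try:
                    with open(path, "rb") as fh:
                        digest = hashlib.sha256(fh.read()).hexdigest()
                except OSError:
                    continue  # unreadable file: leave it out of the manifest
                manifest[path] = digest
    return manifest


def diff_manifests(old, new):
    """Return (only_in_old, only_in_new, changed) as sorted path lists."""
    only_old = sorted(set(old) - set(new))
    only_new = sorted(set(new) - set(old))
    changed = sorted(p for p in old.keys() & new.keys() if old[p] != new[p])
    return only_old, only_new, changed
```

Saving the manifest from the old machine and diffing it against the fresh
install gives a checklist of files that were customised and still need to be
reviewed or carried over.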


Re: Ryzen issues on FreeBSD ?

2018-01-21 Thread Willem Jan Withagen

On 21/01/2018 21:24, Don Lewis wrote:
> On 21 Jan, Willem Jan Withagen wrote:
>> On 19/01/2018 23:29, Don Lewis wrote:
>>> On 19 Jan, Pete French wrote:
>>>> Out of interest, is there anyone out there running Ryzen who *hasn't*
>>>> seen lockups? I'd be curious if there are a lot of lurkers thinking "mine
>>>> works fine"
>>>
>>> No hangs or silent reboots here with either my original CPU or warranty
>>> replacement once the shared page fix was in place.
>>
>> Perhaps a too weird reference:
>>
>> I have supplied a customer with a Ryzen 5 and a B350 motherboard.
>> He runs Windows 10, though, and I haven't heard him complain about anything
>> like this.
>> But I'll ask him specifically.
>
> Only the BSDs were affected by the shared page issue.  I think Linux
> already had a guard page.  I don't think Windows was affected by the
> idle C-state issue.  I suspect it is caused by software not doing the
> right thing during C-state transitions, but the publicly available
> documentation from AMD is pretty lacking.  The random segfault issue is
> primarily triggered by heavy parallel software build loads and how many
> Windows users do that?

This is an Adobe workstation where several users log in remotely and do
work. So I would assume that the system is seriously (ab)used.

And, as expected, I'm not aware of any of the details of what
Windows does when entering less active power states.


--WjW




Re: Ryzen issues on FreeBSD ?

2018-01-21 Thread Don Lewis
On 21 Jan, Willem Jan Withagen wrote:
> On 19/01/2018 23:29, Don Lewis wrote:
>> On 19 Jan, Pete French wrote:
>>> Out of interest, is there anyone out there running Ryzen who *hasn't*
>>> seen lockups? I'd be curious if there are a lot of lurkers thinking "mine
>>> works fine"
>> 
>> No hangs or silent reboots here with either my original CPU or warranty
>> replacement once the shared page fix was in place.
> 
> Perhaps a too weird reference:
> 
> I have supplied a customer with a Ryzen 5 and a B350 motherboard.
> He runs Windows 10, though, and I haven't heard him complain about anything
> like this.
> But I'll ask him specifically.

Only the BSDs were affected by the shared page issue.  I think Linux
already had a guard page.  I don't think Windows was affected by the
idle C-state issue.  I suspect it is caused by software not doing the
right thing during C-state transitions, but the publicly available
documentation from AMD is pretty lacking.  The random segfault issue is
primarily triggered by heavy parallel software build loads and how many
Windows users do that?



Re: Ryzen issues on FreeBSD ?

2018-01-21 Thread Don Lewis
On 20 Jan, Mark Millard wrote:
> Don Lewis truckman at FreeBSD.org wrote on
> Sat Jan 20 02:35:40 UTC 2018 :
> 
>> The only real problem with the old CPUs is the random segfault problem
>> and some other random strangeness, like the lang/ghc build almost always
>> failing.
> 
> 
> At one time you had written
> ( https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=221029
> comment #103 on 2017-Oct-09):
> 
> QUOTE
> The ghc build failure seems to be gone after upgrading to a
> more recent 12.0-CURRENT.  I will try to bisect for the fix
> when I have a chance.
> END QUOTE
> 
> Did that not pan out? Did you conclude it was
> hardware-context specific?

I was never able to reproduce the problem.  It seems like it failed on
the first ports build run after I replaced the CPU.  When I upgraded the
OS and ports, the build succeeded.  I tried going back to much earlier
OS and ports versions, but I could never get the ghc build to fail
again.  I'm baffled by this ...



Re: Ryzen issues on FreeBSD ?

2018-01-21 Thread Willem Jan Withagen

On 19/01/2018 23:29, Don Lewis wrote:
> On 19 Jan, Pete French wrote:
>> Out of interest, is there anyone out there running Ryzen who *hasn't*
>> seen lockups? I'd be curious if there are a lot of lurkers thinking "mine
>> works fine"
>
> No hangs or silent reboots here with either my original CPU or warranty
> replacement once the shared page fix was in place.

Perhaps a too weird reference:

I have supplied a customer with a Ryzen 5 and a B350 motherboard.
He runs Windows 10, though, and I haven't heard him complain about anything
like this.

But I'll ask him specifically.

--WjW




Re: Ryzen issues on FreeBSD ?

2018-01-21 Thread Peter Moody
hm, so i've got nearly 3 days of uptime with SMT disabled.
unfortunately this means that my otherwise '12' cores are actually only
'6'. I'm also getting occasional segfaults compiling Go programs.

should I just RMA this beast again?

On Sun, Jan 21, 2018 at 5:25 AM, Nimrod Levy  wrote:
> almost 2 days uptime with a lower memory clock. still holding my breath,
> but this seems promising.


Re: Ryzen issues on FreeBSD ?

2018-01-21 Thread Nimrod Levy
almost 2 days uptime with a lower memory clock. still holding my breath,
but this seems promising.



On Fri, Jan 19, 2018 at 4:02 PM Nimrod Levy  wrote:

> I can try lowering my memory clock and see what happens.  I'm a little
> skeptical because I have been able to run memtest with no errors for some
> time.  I'm glad to give anything a try...
>
> On Fri, Jan 19, 2018 at 3:49 PM Mike Tancsa wrote:
>
>> On 1/19/2018 3:23 PM, Ryan Root wrote:
>> > This looks like the QVL list for your MB ->
>> > http://download.gigabyte.us/FileList/Memory/mb_memory_ga-ax370-Gaming5.pdf
>>
>> It's an Asus MB, but the memory I have is in the above PDF list.
>>
>> I don't see CT16G4DFD824A, but I do see other Crucial products with
>> slower clock speeds. Right now I do have it set to 2133, whereas it was
>> 2400 before.
>>
>> ---Mike
>>
>> > On 1/19/2018 12:13 PM, Mike Tancsa wrote:
>> >> Drag :( I have mine disabled as well as lowering the RAM freq to 2100
>> >> from 2400.  For me the hangs are infrequent.  It's only been a day and a
>> >> half, so not sure if it's gone or I have been "lucky"... Either way,
>> >> this platform feels way too fragile to deploy on anything :(
>> >>
>> >>  ---Mike
>> >>
>> >> On 1/19/2018 3:08 PM, Nimrod Levy wrote:
>> >>> Looks like disabling the C-states in the BIOS didn't change anything.
>> >>>
>> >>> On Wed, Jan 17, 2018 at 9:22 PM Nimrod Levy wrote:
>> >>>
>> >>> That looks promising. I just found that setting in the BIOS and
>> >>> disabled it. I'll see how it runs.
>> >>>
>> >>> Thanks
>> >>>
>> >>> On Wed, Jan 17, 2018, 18:38 Don Lewis wrote:
>> >>>
>> >>> On 17 Jan, Nimrod Levy wrote:
>> >>> > I'm running 11-STABLE from 12/9.  amdtemp works for me.  It also
>> >>> > has the sysctl indicating that it has the shared page fix. I'm
>> >>> > pretty sure I've seen the lockups since then.  I'll update to the
>> >>> > latest STABLE and see what happens.
>> >>> >
>> >>> > One weird thing about my experience is that if I keep something
>> >>> > running continuously, like the distributed.net client on 6 of 12
>> >>> > possible threads, it keeps the system up for MUCH longer than
>> >>> > without.  This is a home server and very lightly loaded (one could
>> >>> > argue insanely overpowered for the use case).
>> >>>
>> >>> This sounds like the problem with the deep Cx states that has been
>> >>> reported by numerous Linux users.  I think some motherboard brands
>> >>> are more likely to have the problem.  See:
>> >>> http://forum.asrock.com/forum_posts.asp?TID=5963=taichi-x370-with-ubuntu-idle-lock-ups-idle-freeze
>> >>>
>> >>> --
>> >>> Nimrod
>>
>> --
>> Mike Tancsa, tel +1 519 651 3400
>> Sentex Communications, m...@sentex.net
>> Providing Internet services since 1994 www.sentex.net
>> Cambridge, Ontario Canada   http://www.tancsa.com/
>
> --
> Nimrod

--
Nimrod


Re: Ryzen issues on FreeBSD ?

2018-01-21 Thread Mark Millard via freebsd-stable
Don Lewis truckman at FreeBSD.org wrote on
Sat Jan 20 02:35:40 UTC 2018 :

> The only real problem with the old CPUs is the random segfault problem
> and some other random strangeness, like the lang/ghc build almost always
> failing.


At one time you had written
( https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=221029
comment #103 on 2017-Oct-09):

QUOTE
The ghc build failure seems to be gone after upgrading to a
more recent 12.0-CURRENT.  I will try to bisect for the fix
when I have a chance.
END QUOTE

Did that not pan out? Did you conclude it was
hardware-context specific?


===
Mark Millard
marklmi26-fbsd at yahoo.com
( markmi at dsl-only.net is
going away in 2018-Feb, late)
