Re: svn commit: r313268 - head/sys/kern [through -r313271 for atomic_fcmpset use and later: fails on PowerMac G5 "Quad Core"; -r313266 works]

2017-03-03 Thread Mark Millard
On 2017-Mar-3, at 6:17 AM, Rodney W. Grimes  wrote:

>> On 2017-Mar-2, at 7:19 AM, Steve Kargl  
>> wrote:
>> 
>> On Thu, Mar 02, 2017 at 01:10:21PM +0100, Mateusz Guzik wrote:
>>> On Wed, Mar 01, 2017 at 09:45:07AM -0800, Mark Millard wrote:
>>>> 
>>>>> Summary of the transition interval:
>>>>> 
>>>>> So for powerpc64 (and powerpc?) It is a good
>>>>> idea to avoid anything that is after -r313254
>>>>> and before -r314474 in head. (Would this be
>>>>> appropriate for a UPDATING notice given its
>>>>> span?)
>>>>> 
>>>>> There may be other architectures that might have
>>>>> a similar status(?): the last fixes involved were
>>>>> not in Machine Dependent code. (Some architectures
>>>>> are apparently insensitive to the errors, such as
>>>>> amd64).
>>>>> 
>>>> 
>>>> When following current you are expected to be on the newest revision,
>>>> so I don't think mentioning interim broken releases makes much sense.
>>>> 
>>> 
>>> Documenting the range may aid those bisecting src/ to find a bug. 
>>> How is one to know that anything in the range that Mark points
>>> out should be skipped on powerpc64?
>>> 
>>> -- 
>>> Steve
>> 
>> I have tested with a TARGET_ARCH=powerpc -r314473 build and
>> its kernel version has locking problems like
>> TARGET_ARCH=powerpc64 does for that version.
>> 
>> [Note: This was run on a PowerMac G5 so-called "Quad Core"
>> so most of the memory was ignored.]
>> 
>> Both TARGET_ARCH=powerpc64 and TARGET_ARCH=powerpc need -r314474
>> or later as of the new locking.
>> 
>> I've not explicitly tested other architectures. As I remember
>> armv6/v7 are classified as having some form of a weak memory
>> model compared to the likes of amd64. If so armv6/v7 might be
>> candidates for having problems. There might be other candidates.
> 
> I also had locking issues on amd64 around this build time that
> sent me down a week long rabbit hole chasing what I thought was
> a bug in the new AMD/IOMMU code.  IMHO if we can at least
> flag prior snapshot builds as "Broken for reason X" it might
> save someone some time and time is a one way depleting resource
> usually worth saving if possible.
> 
> If needed I can dig out the specific build.  Oh, nvm, let me
> just do that, it was r309302.  This revision I believe is
> a November snapshot.  It has kernel panics due to spinlock
> timeout and sporadic deadlock that is undetected.
> 
> 
> -- 
> Rod Grimes rgrimes at 
> freebsd.org

Sounds like that amd64 -r309302 problem might be another
good example.

Locking tends to be central and heavily used. When it
breaks many other things tend to also end up broken.
This is the sort of context I was thinking about if it
goes on very long.

I'm not sure that the -r309302 problem would reproduce
at -r313259 so -r309302 might be a separate issue. I've
no clue what revision range had the amd64 -r309302
problem.


Details that I'm aware of are something like:

-r309302 is dated 2016-Nov-30. (the revision of your reported amd64 locking problem)

-r312973 is dated 2017-Jan-30. (example add of an atomic_fcmpset implementation)
   (getting ready for machine independent usage)

-r313259 is dated 2017-Feb-5.  (last before "machine independent use of atomic_fcmpset"?)
   (powerpc64 and powerpc working here)

-r313260 is dated 2017-Feb-5.  (first machine-independent usage of atomic_fcmpset?)

. . . (various machine-independent atomic_fcmpset usage check-ins) . . .

-r313271 is dated 2017-Feb-5.  (observed powerpc64 failures for this version)
   (powerpc would fail too)

. . . (various machine-independent atomic_fcmpset usage check-ins) . . .
. . . (powerpc64 [and powerpc] continuing to fail) . . .

-r314474 is dated 2017-Mar-1.  (powerpc64 and powerpc started working)
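
For anyone scripting around this, the interval can be written as a small check. This is only a sketch: the function and macro names are made up for illustration, and the bounds are simply the ones reported in this thread for powerpc64 and powerpc (after -r313254, before -r314474). Other architectures may have a different range or none.

```c
#include <stdbool.h>

/* Bounds as reported in this thread for powerpc64/powerpc; not an official list. */
#define LAST_REV_BEFORE_INTERVAL	313254L	/* last revision suggested as safe */
#define FIRST_REV_AFTER_INTERVAL	314474L	/* revision reported to work again */

/*
 * Hypothetical helper: true if head revision `rev` lies strictly inside
 * the interval the thread suggests avoiding on powerpc64/powerpc.
 */
bool
rev_in_avoid_interval(long rev)
{
	return (rev > LAST_REV_BEFORE_INTERVAL &&
	    rev < FIRST_REV_AFTER_INTERVAL);
}
```

With these bounds, -r313271 and -r314473 are inside the interval, while -r313254, -r314474 and the separate amd64 report at -r309302 are outside it.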



===
Mark Millard
markmi at dsl-only.net

___
freebsd-current@freebsd.org mailing list
https://lists.freebsd.org/mailman/listinfo/freebsd-current
To unsubscribe, send any mail to "freebsd-current-unsubscr...@freebsd.org"


Re: svn commit: r313268 - head/sys/kern [through -r313271 for atomic_fcmpset use and later: fails on PowerMac G5 "Quad Core"; -r313266 works]

2017-03-03 Thread Rodney W. Grimes
> On 2017-Mar-2, at 7:19 AM, Steve Kargl  
> wrote:
> 
> On Thu, Mar 02, 2017 at 01:10:21PM +0100, Mateusz Guzik wrote:
> > On Wed, Mar 01, 2017 at 09:45:07AM -0800, Mark Millard wrote:
> >> 
> >>> Summary of the transition interval:
> >>> 
> >>> So for powerpc64 (and powerpc?) It is a good
> >>> idea to avoid anything that is after -r313254
> >>> and before -r314474 in head. (Would this be
> >>> appropriate for a UPDATING notice given its
> >>> span?)
> >>> 
> >>> There may be other architectures that might have
> >>> a similar status(?): the last fixes involved were
> >>> not in Machine Dependent code. (Some architectures
> >>> are apparently insensitive to the errors, such as
> >>> amd64).
> >>> 
> >> 
> >> When following current you are expected to be on the newest revision,
> >> so I don't think mentioning interim broken releases makes much sense.
> >> 
> > 
> > Documenting the range may aid those bisecting src/ to find a bug. 
> > How is one to know that anything in the range that Mark points
> > out should be skipped on powerpc64?
> > 
> > -- 
> > Steve
> 
> I have tested with a TARGET_ARCH=powerpc -r314473 build and
> its kernel version has locking problems like
> TARGET_ARCH=powerpc64 does for that version.
> 
> [Note: This was run on a PowerMac G5 so-called "Quad Core"
> so most of the memory was ignored.]
> 
> Both TARGET_ARCH=powerpc64 and TARGET_ARCH=powerpc need -r314474
> or later as of the new locking.
> 
> I've not explicitly tested other architectures. As I remember
> armv6/v7 are classified as having some form of a weak memory
> model compared to the likes of amd64. If so armv6/v7 might be
> candidates for having problems. There might be other candidates.

I also had locking issues on amd64 around this build time that
sent me down a week long rabbit hole chasing what I thought was
a bug in the new AMD/IOMMU code.  IMHO if we can at least
flag prior snapshot builds as "Broken for reason X" it might
save someone some time and time is a one way depleting resource
usually worth saving if possible.

If needed I can dig out the specific build.  Oh, nvm, let me
just do that, it was r309302.  This revision I believe is
a November snapshot.  It has kernel panics due to spinlock
timeout and sporadic deadlock that is undetected.


-- 
Rod Grimes rgri...@freebsd.org


Re: svn commit: r313268 - head/sys/kern [through -r313271 for atomic_fcmpset use and later: fails on PowerMac G5 "Quad Core"; -r313266 works]

2017-03-03 Thread Mark Millard
On 2017-Mar-2, at 7:19 AM, Steve Kargl  
wrote:

On Thu, Mar 02, 2017 at 01:10:21PM +0100, Mateusz Guzik wrote:
> On Wed, Mar 01, 2017 at 09:45:07AM -0800, Mark Millard wrote:
>> 
>>> Summary of the transition interval:
>>> 
>>> So for powerpc64 (and powerpc?) It is a good
>>> idea to avoid anything that is after -r313254
>>> and before -r314474 in head. (Would this be
>>> appropriate for a UPDATING notice given its
>>> span?)
>>> 
>>> There may be other architectures that might have
>>> a similar status(?): the last fixes involved were
>>> not in Machine Dependent code. (Some architectures
>>> are apparently insensitive to the errors, such as
>>> amd64).
>>> 
>> 
>> When following current you are expected to be on the newest revision,
>> so I don't think mentioning interim broken releases makes much sense.
>> 
> 
> Documenting the range may aid those bisecting src/ to find a bug. 
> How is one to know that anything in the range that Mark points
> out should be skipped on powerpc64?
> 
> -- 
> Steve

I have tested with a TARGET_ARCH=powerpc -r314473 build and
its kernel version has locking problems like
TARGET_ARCH=powerpc64 does for that version.

[Note: This was run on a PowerMac G5 so-called "Quad Core"
so most of the memory was ignored.]

Both TARGET_ARCH=powerpc64 and TARGET_ARCH=powerpc need -r314474
or later as of the new locking.

I've not explicitly tested other architectures. As I remember
armv6/v7 are classified as having some form of a weak memory
model compared to the likes of amd64. If so armv6/v7 might be
candidates for having problems. There might be other candidates.


===
Mark Millard
markmi at dsl-only.net



Re: svn commit: r313268 - head/sys/kern [through -r313271 for atomic_fcmpset use and later: fails on PowerMac G5 "Quad Core"; -r313266 works]

2017-03-02 Thread Steve Kargl
On Thu, Mar 02, 2017 at 01:10:21PM +0100, Mateusz Guzik wrote:
> On Wed, Mar 01, 2017 at 09:45:07AM -0800, Mark Millard wrote:
> > 
> > Summary of the transition interval:
> > 
> > So for powerpc64 (and powerpc?) It is a good
> > idea to avoid anything that is after -r313254
> > and before -r314474 in head. (Would this be
> > appropriate for a UPDATING notice given its
> > span?)
> > 
> > There may be other architectures that might have
> > a similar status(?): the last fixes involved were
> > not in Machine Dependent code. (Some architectures
> > are apparently insensitive to the errors, such as
> > amd64).
> > 
> 
> When following current you are expected to be on the newest revision,
> so I don't think mentioning interim broken releases makes much sense.
> 

Documenting the range may aid those bisecting src/ to find a bug. 
How is one to know that anything in the range that Mark points
out should be skipped on powerpc64?
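
As a rough illustration of why a documented range helps, here is a minimal sketch of how a script driving a bisection might choose the next revision to test while steering clear of a range known to be broken for an unrelated reason. The function name and its parameters are hypothetical, and nothing here corresponds to an existing svn or FreeBSD tool.

```c
/*
 * Hypothetical bisection helper: choose the next revision to test between
 * a known-good `good` and a known-bad `bad` (good < bad), avoiding the
 * closed range [skip_lo, skip_hi] of revisions that are broken for an
 * unrelated reason. Returns -1 if no testable revision remains.
 */
long
next_bisect_candidate(long good, long bad, long skip_lo, long skip_hi)
{
	long mid = good + (bad - good) / 2;

	if (bad - good < 2)
		return (-1);	/* nothing strictly between good and bad */
	if (mid < skip_lo || mid > skip_hi)
		return (mid);	/* the midpoint itself is testable */
	/* The midpoint must be skipped: try the nearest revisions outside the range. */
	if (skip_lo - 1 > good)
		return (skip_lo - 1);
	if (skip_hi + 1 < bad)
		return (skip_hi + 1);
	return (-1);		/* everything in between has to be skipped */
}
```

For example, with good = 312761, bad = 313864 and a skip range of 313255 through 314473, the plain midpoint 313312 falls in the skipped range, so the helper proposes 313254 instead.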

-- 
Steve


Re: svn commit: r313268 - head/sys/kern [through -r313271 for atomic_fcmpset use and later: fails on PowerMac G5 "Quad Core"; -r313266 works]

2017-03-02 Thread Mateusz Guzik
On Wed, Mar 01, 2017 at 09:45:07AM -0800, Mark Millard wrote:
> 
> On 2017-Feb-28, at 10:13 PM, Mateusz Guzik  wrote:
> 
> On Sat, Feb 25, 2017 at 08:31:04PM +0100, Mateusz Guzik wrote:
> >> On Sat, Feb 25, 2017 at 09:58:39AM -0800, Mark Millard wrote:
> >>> Thus the PowerMac G5 so-called "Quad Core" is back to
> >>> -r313254 without your patches. (The "Quad Core" really has
> >>> two processors, each with 2 cores.)
> >>> 
> >> 
> >> 
> >> Thanks a lot for testing. I'll have to think what to do with it, worst
> >> case I'll #ifdef changes with powerpc.
> >> 
> > 
> > Should be fixed with r314474. Got a real powerpc to test on (60 cores),
> > was able to lock it up in seconds. Now it is perfectly stable.
> > 
> > -- 
> > Mateusz Guzik 
> 
> The updated so-called "Quad Core" PowerMac G5 used for
> TARGET_ARCH=powerpc64 was able to do a self hosted
> buildworld buildkernel for -r314479 just fine.
> 

Cool.

> Thanks much for the fixes: Now I can track head again
> for powerpc64.
> 

Well it was my breakage to begin with.

> 
> Summary of the transition interval:
> 
> So for powerpc64 (and powerpc?) It is a good
> idea to avoid anything that is after -r313254
> and before -r314474 in head. (Would this be
> appropriate for a UPDATING notice given its
> span?)
> 
> There may be other architectures that might have
> a similar status(?): the last fixes involved were
> not in Machine Dependent code. (Some architectures
> are apparently insensitive to the errors, such as
> amd64).
> 

When following current you are expected to be on the newest revision,
so I don't think mentioning interim broken releases makes much sense.

-- 
Mateusz Guzik 


Re: svn commit: r313268 - head/sys/kern [through -r313271 for atomic_fcmpset use and later: fails on PowerMac G5 "Quad Core"; -r313266 works]

2017-03-01 Thread Mark Millard

On 2017-Feb-28, at 10:13 PM, Mateusz Guzik  wrote:

On Sat, Feb 25, 2017 at 08:31:04PM +0100, Mateusz Guzik wrote:
>> On Sat, Feb 25, 2017 at 09:58:39AM -0800, Mark Millard wrote:
>>> Thus the PowerMac G5 so-called "Quad Core" is back to
>>> -r313254 without your patches. (The "Quad Core" really has
>>> two processors, each with 2 cores.)
>>> 
>> 
>> 
>> Thanks a lot for testing. I'll have to think what to do with it, worst
>> case I'll #ifdef changes with powerpc.
>> 
> 
> Should be fixed with r314474. Got a real powerpc to test on (60 cores),
> was able to lock it up in seconds. Now it is perfectly stable.
> 
> -- 
> Mateusz Guzik 

The updated so-called "Quad Core" PowerMac G5 used for
TARGET_ARCH=powerpc64 was able to do a self hosted
buildworld buildkernel for -r314479 just fine.

Thanks much for the fixes: Now I can track head again
for powerpc64.


Summary of the transition interval:

So for powerpc64 (and powerpc?) It is a good
idea to avoid anything that is after -r313254
and before -r314474 in head. (Would this be
appropriate for a UPDATING notice given its
span?)

There may be other architectures that might have
a similar status(?): the last fixes involved were
not in Machine Dependent code. (Some architectures
are apparently insensitive to the errors, such as
amd64).

===
Mark Millard
markmi at dsl-only.net



Re: svn commit: r313268 - head/sys/kern [through -r313271 for atomic_fcmpset use and later: fails on PowerMac G5 "Quad Core"; -r313266 works]

2017-02-28 Thread Mateusz Guzik
On Sat, Feb 25, 2017 at 08:31:04PM +0100, Mateusz Guzik wrote:
> On Sat, Feb 25, 2017 at 09:58:39AM -0800, Mark Millard wrote:
> > Thus the PowerMac G5 so-called "Quad Core" is back to
> > -r313254 without your patches. (The "Quad Core" really has
> > two processors, each with 2 cores.)
> > 
> 
> 
> Thanks a lot for testing. I'll have to think what to do with it, worst
> case I'll #ifdef changes with powerpc.
> 

Should be fixed with r314474. Got a real powerpc to test on (60 cores),
was able to lock it up in seconds. Now it is perfectly stable.

-- 
Mateusz Guzik 


Re: svn commit: r313268 - head/sys/kern [through -r313271 for atomic_fcmpset use and later: fails on PowerMac G5 "Quad Core"; -r313266 works]

2017-02-25 Thread Mateusz Guzik
On Sat, Feb 25, 2017 at 09:58:39AM -0800, Mark Millard wrote:
> Thus the PowerMac G5 so-called "Quad Core" is back to
> -r313254 without your patches. (The "Quad Core" really has
> two processors, each with 2 cores.)
> 


Thanks a lot for testing. I'll have to think what to do with it, worst
case I'll #ifdef changes with powerpc.

-- 
Mateusz Guzik 


Re: svn commit: r313268 - head/sys/kern [through -r313271 for atomic_fcmpset use and later: fails on PowerMac G5 "Quad Core"; -r313266 works]

2017-02-25 Thread Mark Millard
On 2017-Feb-25, at 5:49 AM, Mark Millard  wrote:

> On 2017-Feb-25, at 1:05 AM, Mark Millard  wrote:
> 
>> On 2017-Feb-24, at 11:46 PM, Mark Millard  wrote:
>> 
>>> On 2017-Feb-24, at 8:25 PM, Mark Millard  wrote:
>>> 
>>>> On 2017-Feb-24, at 4:23 PM, Mateusz Guzik wrote:
>>>>> 
>>>>> On Tue, Feb 21, 2017 at 01:37:25AM -0800, Mark Millard wrote:
>>>>>> [Back to the powerpc64 context.]
>>>>>> 
>>>>>> On 2017-Feb-20, at 11:10 AM, Mateusz Guzik wrote:
>>>>>> 
>>>>>>> On Sat, Feb 18, 2017 at 04:18:05AM -0800, Mark Millard wrote:
>>>>>>>> [Note: I experiment with clang based powerpc64 builds,
>>>>>>>> reporting problems that I find. Justin is familiar
>>>>>>>> with this, as is Nathan.]
>>>>>>>> 
>>>>>>>> I tried to update the PowerMac G5 (a so-called "Quad Core")
>>>>>>>> that I have access to from head -r312761 to -r313864 and
>>>>>>>> ended up with random panics and hang ups in fairly short
>>>>>>>> order after booting.
>>>>>>>> 
>>>>>>>> Some approximate bisecting for the kernel led to:
>>>>>>>> (sometimes getting part way into a buildkernel attempt
>>>>>>>> for a different version before a failure happens)
>>>>>>>> 
>>>>>>>> -r313266: works (just before use of atomic_fcmpset)
>>>>>>>> vs.
>>>>>>>> -r313271: fails (last of the "use atomic_fcmpset" check-ins)
>>>>>>>> 
>>>>>>>> (I did not try -r313268 through -r313270 as the use was
>>>>>>>> gradually added.)
>>>>>>>> 
>>>>>>>> So I'm currently running a -r313864 world with a -r313266
>>>>>>>> kernel.
>>>>>>>> 
>>>>>>>> No kernel that I tried that was from before -r313266 had the
>>>>>>>> problems.
>>>>>>>> 
>>>>>>>> Any kernel that I tried that was from after -r313271 had the
>>>>>>>> problems.
>>>>>>>> 
>>>>>>>> Of course I did not try them all in either direction. :)
>>>>>>>> 
>>>>>>> 
>>>>>>> I found that spin mutexes were not properly handling this, fixed in
>>>>>>> r313996.
>>>>>>> 
>>>>>>> Locally I added an if (cpu_tick() % 2) return (0); snippet to amd64
>>>>>>> fcmpset to simulate failures. Everything works, while it would easily
>>>>>>> fail without the patch.
>>>>>>> 
>>>>>>> That said, I hope this concludes the 'missing check for not-reread value
>>>>>>> of failed fcmpset' saga.
>>>>>>> 
>>>>>>> -- 
>>>>>>> Mateusz Guzik
>>>>>> 
>>>>>> -r313999 is an improvement for powerpc64: it boots and I can
>>>>>> log in on the old PowerMac G5 so-called "Quad Core".
>>>>>> 
>>>>>> But, e.g., buildworld buildkernel eventually hangs and later
>>>>>> the powerpc64 panics for "spin lock held too long".
>>>>>> 
>>>>> 
>>>>> All right, play time is over.
>>>>> 
>>>>> Can you please:
>>>>> 1. verify r313254 is stable for you
>>>>> 2. apply https://people.freebsd.org/~mjg/patches/complete-locks.diff and
>>>>> https://people.freebsd.org/~mjg/.junk/ppc.diff on top of it and retry
>>>>> the test?
>>>>> 
>>>>> This is a workaround which effectively disables the powerpc-specific
>>>>> primitive and makes it use a cmpset wrapper instead. I don't have the
>>>>> hardware to test right now and my attempts to boot in qemu also failed.
>>>>> 
>>>>> That said, does not look like there are general fcmpset bugs left and
>>>>> the remaining issue seems powerpc-specific.
>>>>> 
>>>>> If this works, I'll commit the workaround for the time being as in few
>>>>> weeks I'd like to start merging the work back to stable/11.
>>>>> 
>>>>> -- 
>>>>> Mateusz Guzik
>>>> 
>>>> I've started a self-hosted powerpc64 -r313254 build
>>>> based on running the -r313266 kernel. (The context I
>>>> sometimes do cross builds in is tied up with other
>>>> things. -r313266 is what my prior bisection came up
>>>> with as the last apparently-working kernel at the
>>>> time.)
>>>> 
>>>> So it will be a while before I have a -r313254 in
>>>> place to try: the self-hosted build takes longer
>>>> and so will not be installed for a while.
>>>> 
>>>> To judge stability I'll probably have -r313254 build
>>>> the patched update that you want me to test, initially
>>>> doing a cleanworld. So that too will take a while.
>>>> 
>>>> (The above wording presumes all goes well.)
>>>> 
>>>> I'll let you know as I go along if I run into anything
>>>> interesting.
>>>> 
>>>> 
>>>> My builds are rebuilding both world and kernel since
>>>> what turns into /usr/include/sys/* has changes in your
>>>> patch.
>>>> 
>>>> The builds are without MALLOC_PRODUCTION but are
>>>> otherwise not debug builds.
>>>> 
>>>> 
>>>> I've not seen anything indicating that anyone has
>>>> been trying TARGET_ARCH=powerpc. I've been trying
>>>> TARGET_ARCH=powerpc64 .
>>>> 
>>>> While I do not have access to a true
>>>> TARGET_ARCH=powerpc machine currently, such a build
>>>> can be used on a PowerMac G5 so-called "Quad Core".
>>>> So I could eventually build and try such on the one
>>>> powerpc family machine that I currently have access
>>>> to.
>>>> 
>>>> clang 3.9.1 has a significant code generation problem
>>>> for TARGET_ARCH=powerpc and so I'd have to use
>>>> a gcc 4.2.1 based build for that sort

Re: svn commit: r313268 - head/sys/kern [through -r313271 for atomic_fcmpset use and later: fails on PowerMac G5 "Quad Core"; -r313266 works]

2017-02-25 Thread Mark Millard
On 2017-Feb-25, at 1:05 AM, Mark Millard  wrote:

> On 2017-Feb-24, at 11:46 PM, Mark Millard  wrote:
> 
>> On 2017-Feb-24, at 8:25 PM, Mark Millard  wrote:
>> 
>>> On 2017-Feb-24, at 4:23 PM, Mateusz Guzik wrote:
>>>> 
>>>> On Tue, Feb 21, 2017 at 01:37:25AM -0800, Mark Millard wrote:
>>>>> [Back to the powerpc64 context.]
>>>>> 
>>>>> On 2017-Feb-20, at 11:10 AM, Mateusz Guzik wrote:
>>>>> 
>>>>>> On Sat, Feb 18, 2017 at 04:18:05AM -0800, Mark Millard wrote:
>>>>>>> [Note: I experiment with clang based powerpc64 builds,
>>>>>>> reporting problems that I find. Justin is familiar
>>>>>>> with this, as is Nathan.]
>>>>>>> 
>>>>>>> I tried to update the PowerMac G5 (a so-called "Quad Core")
>>>>>>> that I have access to from head -r312761 to -r313864 and
>>>>>>> ended up with random panics and hang ups in fairly short
>>>>>>> order after booting.
>>>>>>> 
>>>>>>> Some approximate bisecting for the kernel led to:
>>>>>>> (sometimes getting part way into a buildkernel attempt
>>>>>>> for a different version before a failure happens)
>>>>>>> 
>>>>>>> -r313266: works (just before use of atomic_fcmpset)
>>>>>>> vs.
>>>>>>> -r313271: fails (last of the "use atomic_fcmpset" check-ins)
>>>>>>> 
>>>>>>> (I did not try -r313268 through -r313270 as the use was
>>>>>>> gradually added.)
>>>>>>> 
>>>>>>> So I'm currently running a -r313864 world with a -r313266
>>>>>>> kernel.
>>>>>>> 
>>>>>>> No kernel that I tried that was from before -r313266 had the
>>>>>>> problems.
>>>>>>> 
>>>>>>> Any kernel that I tried that was from after -r313271 had the
>>>>>>> problems.
>>>>>>> 
>>>>>>> Of course I did not try them all in either direction. :)
>>>>>>> 
>>>>>> 
>>>>>> I found that spin mutexes were not properly handling this, fixed in
>>>>>> r313996.
>>>>>> 
>>>>>> Locally I added an if (cpu_tick() % 2) return (0); snippet to amd64
>>>>>> fcmpset to simulate failures. Everything works, while it would easily
>>>>>> fail without the patch.
>>>>>> 
>>>>>> That said, I hope this concludes the 'missing check for not-reread value
>>>>>> of failed fcmpset' saga.
>>>>>> 
>>>>>> -- 
>>>>>> Mateusz Guzik
>>>>> 
>>>>> -r313999 is an improvement for powerpc64: it boots and I can
>>>>> log in on the old PowerMac G5 so-called "Quad Core".
>>>>> 
>>>>> But, e.g., buildworld buildkernel eventually hangs and later
>>>>> the powerpc64 panics for "spin lock held too long".
>>>>> 
>>>> 
>>>> All right, play time is over.
>>>> 
>>>> Can you please:
>>>> 1. verify r313254 is stable for you
>>>> 2. apply https://people.freebsd.org/~mjg/patches/complete-locks.diff and
>>>> https://people.freebsd.org/~mjg/.junk/ppc.diff on top of it and retry
>>>> the test?
>>>> 
>>>> This is a workaround which effectively disables the powerpc-specific
>>>> primitive and makes it use a cmpset wrapper instead. I don't have the
>>>> hardware to test right now and my attempts to boot in qemu also failed.
>>>> 
>>>> That said, does not look like there are general fcmpset bugs left and
>>>> the remaining issue seems powerpc-specific.
>>>> 
>>>> If this works, I'll commit the workaround for the time being as in few
>>>> weeks I'd like to start merging the work back to stable/11.
>>>> 
>>>> -- 
>>>> Mateusz Guzik
>>> 
>>> I've started a self-hosted powerpc64 -r313254 build
>>> based on running the -r313266 kernel. (The context I
>>> sometimes do cross builds in is tied up with other
>>> things. -r313266 is what my prior bisection came up
>>> with as the last apparently-working kernel at the
>>> time.)
>>> 
>>> So it will be a while before I have a -r313254 in
>>> place to try: the self-hosted build takes longer
>>> and so will not be installed for a while.
>>> 
>>> To judge stability I'll probably have -r313254 build
>>> the patched update that you want me to test, initially
>>> doing a cleanworld. So that too will take a while.
>>> 
>>> (The above wording presumes all goes well.)
>>> 
>>> I'll let you know as I go along if I run into anything
>>> interesting.
>>> 
>>> 
>>> My builds are rebuilding both world and kernel since
>>> what turns into /usr/include/sys/* has changes in your
>>> patch.
>>> 
>>> The builds are without MALLOC_PRODUCTION but are
>>> otherwise not debug builds.
>>> 
>>> 
>>> I've not seen anything indicating that anyone has
>>> been trying TARGET_ARCH=powerpc. I've been trying
>>> TARGET_ARCH=powerpc64 .
>>> 
>>> While I do not have access to a true
>>> TARGET_ARCH=powerpc machine currently, such a build
>>> can be used on a PowerMac G5 so-called "Quad Core".
>>> So I could eventually build and try such on the one
>>> powerpc family machine that I currently have access
>>> to.
>>> 
>>> clang 3.9.1 has a significant code generation problem
>>> for TARGET_ARCH=powerpc and so I'd have to use
>>> a gcc 4.2.1 based build for that sort of experiment.
>>> (There is no xtoolchain for 32-bit powerpc.)
>>> 
>>> I use clang 3.9.1 or xtoolchain for
>>> TARGET_ARCH=powerpc64 and have been using clang 3.9.1
>>> in recent times. My primary 

Re: svn commit: r313268 - head/sys/kern [through -r313271 for atomic_fcmpset use and later: fails on PowerMac G5 "Quad Core"; -r313266 works]

2017-02-25 Thread Mark Millard
On 2017-Feb-24, at 11:46 PM, Mark Millard  wrote:

> On 2017-Feb-24, at 8:25 PM, Mark Millard  wrote:
> 
>> On 2017-Feb-24, at 4:23 PM, Mateusz Guzik  wrote:
>>> 
>>> On Tue, Feb 21, 2017 at 01:37:25AM -0800, Mark Millard wrote:
>>>> [Back to the powerpc64 context.]
>>>> 
>>>> On 2017-Feb-20, at 11:10 AM, Mateusz Guzik wrote:
>>>> 
>>>>> On Sat, Feb 18, 2017 at 04:18:05AM -0800, Mark Millard wrote:
>>>>>> [Note: I experiment with clang based powerpc64 builds,
>>>>>> reporting problems that I find. Justin is familiar
>>>>>> with this, as is Nathan.]
>>>>>> 
>>>>>> I tried to update the PowerMac G5 (a so-called "Quad Core")
>>>>>> that I have access to from head -r312761 to -r313864 and
>>>>>> ended up with random panics and hang ups in fairly short
>>>>>> order after booting.
>>>>>> 
>>>>>> Some approximate bisecting for the kernel led to:
>>>>>> (sometimes getting part way into a buildkernel attempt
>>>>>> for a different version before a failure happens)
>>>>>> 
>>>>>> -r313266: works (just before use of atomic_fcmpset)
>>>>>> vs.
>>>>>> -r313271: fails (last of the "use atomic_fcmpset" check-ins)
>>>>>> 
>>>>>> (I did not try -r313268 through -r313270 as the use was
>>>>>> gradually added.)
>>>>>> 
>>>>>> So I'm currently running a -r313864 world with a -r313266
>>>>>> kernel.
>>>>>> 
>>>>>> No kernel that I tried that was from before -r313266 had the
>>>>>> problems.
>>>>>> 
>>>>>> Any kernel that I tried that was from after -r313271 had the
>>>>>> problems.
>>>>>> 
>>>>>> Of course I did not try them all in either direction. :)
>>>>>> 
>>>>> 
>>>>> I found that spin mutexes were not properly handling this, fixed in
>>>>> r313996.
>>>>> 
>>>>> Locally I added an if (cpu_tick() % 2) return (0); snippet to amd64
>>>>> fcmpset to simulate failures. Everything works, while it would easily
>>>>> fail without the patch.
>>>>> 
>>>>> That said, I hope this concludes the 'missing check for not-reread value
>>>>> of failed fcmpset' saga.
>>>>> 
>>>>> -- 
>>>>> Mateusz Guzik
>>>> 
>>>> -r313999 is an improvement for powerpc64: it boots and I can
>>>> log in on the old PowerMac G5 so-called "Quad Core".
>>>> 
>>>> But, e.g., buildworld buildkernel eventually hangs and later
>>>> the powerpc64 panics for "spin lock held too long".
>>>> 
>>> 
>>> All right, play time is over.
>>> 
>>> Can you please:
>>> 1. verify r313254 is stable for you
>>> 2. apply https://people.freebsd.org/~mjg/patches/complete-locks.diff and
>>> https://people.freebsd.org/~mjg/.junk/ppc.diff on top of it and retry
>>> the test?
>>> 
>>> This is a workaround which effectively disables the powerpc-specific
>>> primitive and makes it use a cmpset wrapper instead. I don't have the
>>> hardware to test right now and my attempts to boot in qemu also failed.
>>> 
>>> That said, does not look like there are general fcmpset bugs left and
>>> the remaining issue seems powerpc-specific.
>>> 
>>> If this works, I'll commit the workaround for the time being as in few
>>> weeks I'd like to start merging the work back to stable/11.
>>> 
>>> -- 
>>> Mateusz Guzik 
>> 
>> I've started a self-hosted powerpc64 -r313254 build
>> based on running the -r313266 kernel. (The context I
>> sometimes do cross builds in is tied up with other
>> things. -r313266 is what my prior bisection came up
>> with as the last apparently-working kernel at the
>> time.)
>> 
>> So it will be a while before I have a -r313254 in
>> place to try: the self-hosted build takes longer
>> and so will not be installed for a while.
>> 
>> To judge stability I'll probably have -r313254 build
>> the patched update that you want me to test, initially
>> doing a cleanworld. So that too will take a while.
>> 
>> (The above wording presumes all goes well.)
>> 
>> I'll let you know as I go along if I run into anything
>> interesting.
>> 
>> 
>> My builds are rebuilding both world and kernel since
>> what turns into /usr/include/sys/* has changes in your
>> patch.
>> 
>> The builds are without MALLOC_PRODUCTION but are
>> otherwise not debug builds.
>> 
>> 
>> I've not seen anything indicating that anyone has
>> been trying TARGET_ARCH=powerpc. I've been trying
>> TARGET_ARCH=powerpc64 .
>> 
>> While I do not have access to a true
>> TARGET_ARCH=powerpc machine currently, such a build
>> can be used on a PowerMac G5 so-called "Quad Core".
>> So I could eventually build and try such on the one
>> powerpc family machine that I currently have access
>> to.
>> 
>> clang 3.9.1 has a significant code generation problem
>> for TARGET_ARCH=powerpc and so I'd have to use
>> a gcc 4.2.1 based build for that sort of experiment.
>> (There is no xtoolchain for 32-bit powerpc.)
>> 
>> I use clang 3.9.1 or xtoolchain for
>> TARGET_ARCH=powerpc64 and have been using clang 3.9.1
>> in recent times. My primary powerpc family use has
>> been to experiment with building based on the
>> modern libc++ and reporting issues discovered in the
>> attempts. This explains the clang/xtoolchain context.
>> 
>> clang 3.9.1 has 

Re: svn commit: r313268 - head/sys/kern [through -r313271 for atomic_fcmpset use and later: fails on PowerMac G5 "Quad Core"; -r313266 works]

2017-02-24 Thread Mark Millard
On 2017-Feb-24, at 8:25 PM, Mark Millard  wrote:

> On 2017-Feb-24, at 4:23 PM, Mateusz Guzik  wrote:
>> 
>> On Tue, Feb 21, 2017 at 01:37:25AM -0800, Mark Millard wrote:
>>> [Back to the powerpc64 context.]
>>> 
>>> On 2017-Feb-20, at 11:10 AM, Mateusz Guzik  wrote:
>>> 
>>>> On Sat, Feb 18, 2017 at 04:18:05AM -0800, Mark Millard wrote:
>>>>> [Note: I experiment with clang based powerpc64 builds,
>>>>> reporting problems that I find. Justin is familiar
>>>>> with this, as is Nathan.]
>>>>> 
>>>>> I tried to update the PowerMac G5 (a so-called "Quad Core")
>>>>> that I have access to from head -r312761 to -r313864 and
>>>>> ended up with random panics and hang ups in fairly short
>>>>> order after booting.
>>>>> 
>>>>> Some approximate bisecting for the kernel led to:
>>>>> (sometimes getting part way into a buildkernel attempt
>>>>> for a different version before a failure happens)
>>>>> 
>>>>> -r313266: works (just before use of atomic_fcmpset)
>>>>> vs.
>>>>> -r313271: fails (last of the "use atomic_fcmpset" check-ins)
>>>>> 
>>>>> (I did not try -r313268 through -r313270 as the use was
>>>>> gradually added.)
>>>>> 
>>>>> So I'm currently running a -r313864 world with a -r313266
>>>>> kernel.
>>>>> 
>>>>> No kernel that I tried that was from before -r313266 had the
>>>>> problems.
>>>>> 
>>>>> Any kernel that I tried that was from after -r313271 had the
>>>>> problems.
>>>>> 
>>>>> Of course I did not try them all in either direction. :)
>>>>> 
>>>> 
>>>> I found that spin mutexes were not properly handling this, fixed in
>>>> r313996.
>>>> 
>>>> Locally I added an if (cpu_tick() % 2) return (0); snippet to amd64
>>>> fcmpset to simulate failures. Everything works, while it would easily
>>>> fail without the patch.
>>>> 
>>>> That said, I hope this concludes the 'missing check for not-reread value
>>>> of failed fcmpset' saga.
>>>> 
>>>> -- 
>>>> Mateusz Guzik
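
To illustrate the failure-injection idea quoted above, here is a minimal userland sketch in C11. It is not the kernel code. Everything named model_* is made up for illustration, a simple counter stands in for the cpu_tick() % 2 test, and C11 atomic_compare_exchange_strong stands in for the real primitive. The assumption being modelled is that an fcmpset-style call may fail even though the target value did not change, so a caller has to look at the value handed back rather than treat "failed" as "changed".

```c
#include <stdatomic.h>
#include <stdint.h>

#define MODEL_UNLOCKED	((uintptr_t)0)

static unsigned int model_fail_counter;	/* drives the injected failures */

/*
 * Stand-in for an fcmpset-style primitive (NOT the FreeBSD implementation).
 * On success it stores `newval` and returns 1. On failure it writes the
 * value it observed into *expected and returns 0. Every other call fails
 * on purpose without modifying *p, so *expected may come back unchanged.
 */
int
model_fcmpset(_Atomic uintptr_t *p, uintptr_t *expected, uintptr_t newval)
{
	if (model_fail_counter++ % 2 == 0) {
		*expected = atomic_load(p);
		return (0);	/* injected failure */
	}
	return (atomic_compare_exchange_strong(p, expected, newval));
}

/* Try-lock: returns 1 if `owner` took the lock, 0 if it is held by another owner. */
int
model_trylock(_Atomic uintptr_t *lock, uintptr_t owner)
{
	uintptr_t v = MODEL_UNLOCKED;

	for (;;) {
		if (model_fcmpset(lock, &v, owner))
			return (1);
		/* A failure alone proves nothing: inspect the value handed back. */
		if (v != MODEL_UNLOCKED)
			return (0);	/* really held by someone else */
		/* v is still MODEL_UNLOCKED, so the failure was spurious: retry. */
	}
}
```

A caller that returned "lock is busy" on the first failed call, without checking v, would wrongly give up on a free lock whenever the injected failure hits.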
>>> 
>>> -r313999 is an improvement for powerpc64: it boots and I can
>>> log in on the old PowerMac G5 so-called "Quad Core".
>>> 
>>> But, e.g., buildworld buildkernel eventually hangs and later
>>> the powerpc64 panics for "spin lock held too long".
>>> 
>> 
>> All right, play time is over.
>> 
>> Can you please:
>> 1. verify r313254 is stable for you
>> 2. apply https://people.freebsd.org/~mjg/patches/complete-locks.diff and
>> https://people.freebsd.org/~mjg/.junk/ppc.diff on top of it and retry
>> the test?
>> 
>> This is a workaround which effectively disables the powerpc-specific
>> primitive and makes it use a cmpset wrapper instead. I don't have the
>> hardware to test right now and my attempts to boot in qemu also failed.
>> 
>> That said, it does not look like there are general fcmpset bugs left and
>> the remaining issue seems powerpc-specific.
>> 
>> If this works, I'll commit the workaround for the time being as in a few
>> weeks I'd like to start merging the work back to stable/11.
>> 
>> -- 
>> Mateusz Guzik 
> 
> I've started a self-hosted powerpc64 -r313254 build
> based on running the -r313266 kernel. (The context I
> sometimes do cross builds in is tied up with other
> things. -r313266 is what my prior bisection came up
> with as the last apparently-working kernel at the
> time.)
> 
> So it will be a while before I have a -r313254 in
> place to try: the self-hosted build takes longer
> and so will not be installed for a while.
> 
> To judge stability I'll probably have -r313254 build
> the patched update that you want me to test, initially
> doing a cleanworld. So that too will take a while.
> 
> (The above wording presumes all goes well.)
> 
> I'll let you know as I go along if I run into anything
> interesting.
> 
> 
> My builds are rebuilding both world and kernel since
> what turns into /usr/include/sys/* has changes in your
> patch.
> 
> The builds are without MALLOC_PRODUCTION but are
> otherwise not debug builds.
> 
> 
> I've not seen anything indicating that anyone has
> been trying TARGET_ARCH=powerpc. I've been trying
> TARGET_ARCH=powerpc64 .
> 
> While I do not have access to a true
> TARGET_ARCH=powerpc machine currently, such a build
> can be used on a PowerMac G5 so-called "Quad Core".
> So I could eventually build and try such on the one
> powerpc family machine that I currently have access
> to.
> 
> clang 3.9.1 has a significant code generation problem
> for TARGET_ARCH=powerpc and so I'd have to use
> a gcc 4.2.1 based build for that sort of experiment.
> (There is no xtoolchain for 32-bit powerpc.)
> 
> I use clang 3.9.1 or xtoolchain for
> TARGET_ARCH=powerpc64 and have been using clang 3.9.1
> in recent times. My primary powerpc family use has
> been to experiment with building based on the
> modern libc++ and reporting issues discovered in the
> attempts. This explains the clang/xtoolchain context.
> 
> clang 3.9.1 has major problems for C++ exception
> handling for both powerpc64 and powerpc but a
> lot of FreeBSD is independent of throwing C++
> exceptions. By contrast xtoolchain-based works
> for C++ 

Re: svn commit: r313268 - head/sys/kern [through -r313271 for atomic_fcmpset use and later: fails on PowerMac G5 "Quad Core"; -r313266 works]

2017-02-24 Thread Mark Millard
On 2017-Feb-24, at 4:23 PM, Mateusz Guzik  wrote:
> 
> On Tue, Feb 21, 2017 at 01:37:25AM -0800, Mark Millard wrote:
>> [Back to the powerpc64 context.]
>> 
>> On 2017-Feb-20, at 11:10 AM, Mateusz Guzik  wrote:
>> 
>>> On Sat, Feb 18, 2017 at 04:18:05AM -0800, Mark Millard wrote:
>>>> [Note: I experiment with clang based powerpc64 builds,
>>>> reporting problems that I find. Justin is familiar
>>>> with this, as is Nathan.]
>>>> 
>>>> I tried to update the PowerMac G5 (a so-called "Quad Core")
>>>> that I have access to from head -r312761 to -r313864 and
>>>> ended up with random panics and hang ups in fairly short
>>>> order after booting.
>>>> 
>>>> Some approximate bisecting for the kernel led to:
>>>> (sometimes getting part way into a buildkernel attempt
>>>> for a different version before a failure happens)
>>>> 
>>>> -r313266: works (just before use of atomic_fcmpset)
>>>> vs.
>>>> -r313271: fails (last of the "use atomic_fcmpset" check-ins)
>>>> 
>>>> (I did not try -r313268 through -r313270 as the use was
>>>> gradually added.)
>>>> 
>>>> So I'm currently running a -r313864 world with a -r313266
>>>> kernel.
>>>> 
>>>> No kernel that I tried that was from before -r313266 had the
>>>> problems.
>>>> 
>>>> Any kernel that I tried that was from after -r313271 had the
>>>> problems.
>>>> 
>>>> Of course I did not try them all in other direction. :)
>>>> 
>>> 
>>> I found that spin mutexes were not properly handling this, fixed in
>>> r313996.
>>> 
>>> Locally I added an if (cpu_tick() % 2) return (0); snippet to amd64
>>> fcmpset to simulate failures. Everything works, while it would easily
>>> fail without the patch.
>>> 
>>> That said, I hope this concludes the 'missing check for not-reread value
>>> of failed fcmpset' saga.
>>> 
>>> -- 
>>> Mateusz Guzik 
>> 
>> -r313999 is an improvement for powerpc64: it boots and I can
>> log in on the old PowerMac G5 so-called "Quad Core".
>> 
>> But, e.g., buildworld buildkernel eventually hangs and later
>> the powerpc64 panics for "spin lock held too long".
>> 
> 
> All right, play time is over.
> 
> Can you please:
> 1. verify r313254 is stable for you
> 2. apply https://people.freebsd.org/~mjg/patches/complete-locks.diff and
> https://people.freebsd.org/~mjg/.junk/ppc.diff on top of it and retry
> the test?
> 
> This is a workaround which effectively disables the powerpc-specific
> primitive and makes it use a cmpset wrapper instead. I don't have the
> hardware to test right now and my attempts to boot in qemu also failed.
> 
> That said, it does not look like there are general fcmpset bugs left and
> the remaining issue seems powerpc-specific.
> 
> If this works, I'll commit the workaround for the time being as in a few
> weeks I'd like to start merging the work back to stable/11.
> 
> -- 
> Mateusz Guzik 

I've started a self-hosted powerpc64 -r313254 build
based on running the -r313266 kernel. (The context I
sometimes do cross builds in is tied up with other
things. -r313266 is what my prior bisection came up
with as the last apparently-working kernel at the
time.)

So it will be a while before I have a -r313254 in
place to try: the self-hosted build takes longer
and so will not be installed for a while.

To judge stability I'll probably have -r313254 build
the patched update that you want me to test, initially
doing a cleanworld. So that too will take a while.

(The above wording presumes all goes well.)

I'll let you know as I go along if I run into anything
interesting.


My builds are rebuilding both world and kernel since
what turns into /usr/include/sys/* has changes in your
patch.

The builds are without MALLOC_PRODUCTION but are
otherwise not debug builds.


I've not seen anything indicating that anyone has
been trying TARGET_ARCH=powerpc. I've been trying
TARGET_ARCH=powerpc64 .

While I do not have access to a true
TARGET_ARCH=powerpc machine currently, such a build
can be used on a PowerMac G5 so-called "Quad Core".
So I could eventually build and try such on the one
powerpc family machine that I currently have access
to.

clang 3.9.1 has a significant code generation problem
for TARGET_ARCH=powerpc and so I'd have to use
a gcc 4.2.1 based build for that sort of experiment.
(There is no xtoolchain for 32-bit powerpc.)

I use clang 3.9.1 or xtoolchain for
TARGET_ARCH=powerpc64 and have been using clang 3.9.1
in recent times. My primary powerpc family use has
been to experiment with building based on the
modern libc++ and reporting issues discovered in the
attempts. This explains the clang/xtoolchain context.

clang 3.9.1 has major problems for C++ exception
handling for both powerpc64 and powerpc but a
lot of FreeBSD is independent of throwing C++
exceptions. By contrast xtoolchain-based works
for C++ exception handling but lib32 fails
to operate when built by a xtoolchain build.

===
Mark Millard
markmi at dsl-only.net

___
freebsd-current@freebsd.org mailing list

Re: svn commit: r313268 - head/sys/kern [through -r313271 for atomic_fcmpset use and later: fails on PowerMac G5 "Quad Core"; -r313266 works]

2017-02-24 Thread Mateusz Guzik
On Tue, Feb 21, 2017 at 01:37:25AM -0800, Mark Millard wrote:
> [Back to the powerpc64 context.]
> 
> On 2017-Feb-20, at 11:10 AM, Mateusz Guzik  wrote:
> 
> > On Sat, Feb 18, 2017 at 04:18:05AM -0800, Mark Millard wrote:
> >> [Note: I experiment with clang based powerpc64 builds,
> >> reporting problems that I find. Justin is familiar
> >> with this, as is Nathan.]
> >> 
> >> I tried to update the PowerMac G5 (a so-called "Quad Core")
> >> that I have access to from head -r312761 to -r313864 and
> >> ended up with random panics and hang ups in fairly short
> >> order after booting.
> >> 
> >> Some approximate bisecting for the kernel led to:
> >> (sometimes getting part way into a buildkernel attempt
> >> for a different version before a failure happens)
> >> 
> >> -r313266: works (just before use of atomic_fcmpset)
> >> vs.
> >> -r313271: fails (last of the "use atomic_fcmpset" check-ins)
> >> 
> >> (I did not try -r313268 through -r313270 as the use was
> >> gradually added.)
> >> 
> >> So I'm currently running a -r313864 world with a -r313266
> >> kernel.
> >> 
> >> No kernel that I tried that was from before -r313266 had the
> >> problems.
> >> 
> >> Any kernel that I tried that was from after -r313271 had the
> >> problems.
> >> 
> >> Of course I did not try them all in other direction. :)
> >> 
> > 
> > I found that spin mutexes were not properly handling this, fixed in
> > r313996.
> > 
> > Locally I added an if (cpu_tick() % 2) return (0); snippet to amd64
> > fcmpset to simulate failures. Everything works, while it would easily
> > fail without the patch.
> > 
> > That said, I hope this concludes the 'missing check for not-reread value
> > of failed fcmpset' saga.
> > 
> > -- 
> > Mateusz Guzik 
> 
> -r313999 is an improvement for powerpc64: it boots and I can
> log in on the old PowerMac G5 so-called "Quad Core".
> 
> But, e.g., buildworld buildkernel eventually hangs and later
> the powerpc64 panics for "spin lock held too long".
> 

All right, play time is over.

Can you please:
1. verify r313254 is stable for you
2. apply https://people.freebsd.org/~mjg/patches/complete-locks.diff and
https://people.freebsd.org/~mjg/.junk/ppc.diff on top of it and retry
the test?

This is a workaround which effectively disables the powerpc-specific
primitive and makes it use a cmpset wrapper instead. I don't have the
hardware to test right now and my attempts to boot in qemu also failed.

That said, it does not look like there are general fcmpset bugs left and
the remaining issue seems powerpc-specific.

If this works, I'll commit the workaround for the time being as in a few
weeks I'd like to start merging the work back to stable/11.

-- 
Mateusz Guzik 
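
The workaround described in this message (dropping the powerpc-specific
fcmpset primitive in favor of a wrapper around cmpset) can be sketched
roughly as below. This is only an illustration under stated assumptions,
not the contents of ppc.diff: it uses C11 <stdatomic.h> in userland
instead of the kernel's atomic primitives, and the names sketch_cmpset,
sketch_fcmpset and sketch_demo are hypothetical.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>

/*
 * Hypothetical cmpset-style primitive: store newval only if *dst still
 * equals expect. Returns nonzero on success. The caller learns nothing
 * about the value that was actually found.
 */
static int
sketch_cmpset(_Atomic uintptr_t *dst, uintptr_t expect, uintptr_t newval)
{
	/* expect is a by-value copy, so the caller's variable is untouched. */
	return (atomic_compare_exchange_strong(dst, &expect, newval));
}

/*
 * fcmpset-style wrapper built on cmpset: on failure, re-read the target
 * into *expect so the caller can retry without issuing its own load.
 */
static int
sketch_fcmpset(_Atomic uintptr_t *dst, uintptr_t *expect, uintptr_t newval)
{
	if (sketch_cmpset(dst, *expect, newval))
		return (1);
	*expect = atomic_load(dst);
	return (0);
}

/* Small single-threaded walk-through of the two outcomes. */
static void
sketch_demo(void)
{
	_Atomic uintptr_t word = 5;
	uintptr_t seen = 3;				/* stale guess */

	assert(!sketch_fcmpset(&word, &seen, 9));	/* fails: 3 != 5 */
	assert(seen == 5);				/* guess was refreshed */
	assert(sketch_fcmpset(&word, &seen, 9));	/* now succeeds */
	assert(atomic_load(&word) == 9);
}
```

Calling sketch_demo() from a test harness exercises both the failing and
the succeeding path. A wrapper of this shape gives up the benefit of a
native fcmpset (one fewer explicit load per retry) in exchange for
relying only on the older, already-trusted cmpset behavior.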
___
freebsd-current@freebsd.org mailing list
https://lists.freebsd.org/mailman/listinfo/freebsd-current
To unsubscribe, send any mail to "freebsd-current-unsubscr...@freebsd.org"


Re: svn commit: r313268 - head/sys/kern [through -r313271 for atomic_fcmpset use and later: fails on PowerMac G5 "Quad Core"; -r313266 works]

2017-02-21 Thread Mark Millard
[Back to the powerpc64 context.]

On 2017-Feb-20, at 11:10 AM, Mateusz Guzik  wrote:

> On Sat, Feb 18, 2017 at 04:18:05AM -0800, Mark Millard wrote:
>> [Note: I experiment with clang based powerpc64 builds,
>> reporting problems that I find. Justin is familiar
>> with this, as is Nathan.]
>> 
>> I tried to update the PowerMac G5 (a so-called "Quad Core")
>> that I have access to from head -r312761 to -r313864 and
>> ended up with random panics and hang ups in fairly short
>> order after booting.
>> 
>> Some approximate bisecting for the kernel led to:
>> (sometimes getting part way into a buildkernel attempt
>> for a different version before a failure happens)
>> 
>> -r313266: works (just before use of atomic_fcmpset)
>> vs.
>> -r313271: fails (last of the "use atomic_fcmpset" check-ins)
>> 
>> (I did not try -r313268 through -r313270 as the use was
>> gradually added.)
>> 
>> So I'm currently running a -r313864 world with a -r313266
>> kernel.
>> 
>> No kernel that I tried that was from before -r313266 had the
>> problems.
>> 
>> Any kernel that I tried that was from after -r313271 had the
>> problems.
>> 
>> Of course I did not try them all in other direction. :)
>> 
> 
> I found that spin mutexes were not properly handling this, fixed in
> r313996.
> 
> Locally I added an if (cpu_tick() % 2) return (0); snippet to amd64
> fcmpset to simulate failures. Everything works, while it would easily
> fail without the patch.
> 
> That said, I hope this concludes the 'missing check for not-reread value
> of failed fcmpset' saga.
> 
> -- 
> Mateusz Guzik 

-r313999 is an improvement for powerpc64: it boots and I can
log in on the old PowerMac G5 so-called "Quad Core".

But, e.g., buildworld buildkernel eventually hangs and later
the powerpc64 panics for "spin lock held too long".

===
Mark Millard
markmi at dsl-only.net
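
For readers following the "not-reread value of failed fcmpset" remark
quoted above: an fcmpset-style operation may report failure even though
the target value has not changed (for example when a store-conditional
is lost on an LL/SC machine), so a caller cannot treat a failed attempt
as proof that its local copy now holds a different value. The following
is a minimal userland sketch of that situation, assuming C11
<stdatomic.h>. The names sketch_fcmpset_flaky, sketch_lock,
sketch_calls and SKETCH_UNOWNED are hypothetical, and a call counter
stands in for the cpu_tick() % 2 test so that the behavior is
deterministic.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>

#define SKETCH_UNOWNED	((uintptr_t)0)

/* Call counter used only to inject deterministic spurious failures. */
static unsigned sketch_calls;

/*
 * fcmpset-style primitive with fault injection: every other call fails
 * without touching *dst or *expect, imitating a lost store-conditional.
 */
static int
sketch_fcmpset_flaky(_Atomic uintptr_t *dst, uintptr_t *expect,
    uintptr_t newval)
{
	if (sketch_calls++ % 2 == 0)
		return (0);	/* spurious failure: *expect is NOT refreshed */
	return (atomic_compare_exchange_strong(dst, expect, newval));
}

/*
 * Acquire loop that tolerates spurious failures: after a failed attempt
 * it looks at the value again instead of assuming the lock has an owner.
 * Returns the number of loop iterations that were needed.
 */
static unsigned
sketch_lock(_Atomic uintptr_t *lock, uintptr_t tid)
{
	uintptr_t v = SKETCH_UNOWNED;
	unsigned attempts = 0;

	for (;;) {
		attempts++;
		if (v == SKETCH_UNOWNED) {
			if (sketch_fcmpset_flaky(lock, &v, tid))
				return (attempts);
			continue;	/* v may still read as unowned; retry */
		}
		/* Someone else holds the lock: re-read until it is released. */
		v = atomic_load(lock);
	}
}
```

With sketch_calls reset to 0 and an unowned lock, the first attempt fails
spuriously and the second one succeeds. A caller that instead went
straight to its "lock is held by v" path after the first failure would be
acting on a value that still says "unowned", which is the general kind of
mistake the injected failures are meant to expose.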



Re: svn commit: r313268 - head/sys/kern [through -r313271 for atomic_fcmpset use and later: fails on PowerMac G5 "Quad Core"; -r313266 works]

2017-02-20 Thread Mateusz Guzik
On Mon, Feb 20, 2017 at 09:39:32PM -0800, Mark Millard wrote:
> Looks like some kernel binary interface (as seen by
> emulators/virtualbox-ose-addition ) has changed:
> rebuilding emulators/virtualbox-ose-addition removed
> the booting crash but uname -apKU still lists 1200021
> and 2100021 for the kernel and world for -r313999,
> just like for -r313864.
> 

I think this is r313992.

I don't see why __FreeBSD_version would be modified for this. You are
expected to always recompile your modules while tracking -current.

-- 
Mateusz Guzik 


Re: svn commit: r313268 - head/sys/kern [through -r313271 for atomic_fcmpset use and later: fails on PowerMac G5 "Quad Core"; -r313266 works]

2017-02-20 Thread Mark Millard
On 2017-Feb-20, at 6:36 PM, Mark Millard  wrote:

> On 2017-Feb-20, at 3:35 PM, Mateusz Guzik  wrote:
> 
>> On Mon, Feb 20, 2017 at 03:10:44PM -0800, Mark Millard wrote:
>>> On 2017-Feb-20, at 2:58 PM, Mark Millard  wrote:
>>> 
>>>> On 2017-Feb-20, at 11:10 AM, Mateusz Guzik  wrote:
>>>> 
>>>>> On Sat, Feb 18, 2017 at 04:18:05AM -0800, Mark Millard wrote:
>>>>>> [Note: I experiment with clang based powerpc64 builds,
>>>>>> reporting problems that I find. Justin is familiar
>>>>>> with this, as is Nathan.]
>>>>>> 
>>>>>> I tried to update the PowerMac G5 (a so-called "Quad Core")
>>>>>> that I have access to from head -r312761 to -r313864 and
>>>>>> ended up with random panics and hang ups in fairly short
>>>>>> order after booting.
>>>>>> 
>>>>>> Some approximate bisecting for the kernel led to:
>>>>>> (sometimes getting part way into a buildkernel attempt
>>>>>> for a different version before a failure happens)
>>>>>> 
>>>>>> -r313266: works (just before use of atomic_fcmpset)
>>>>>> vs.
>>>>>> -r313271: fails (last of the "use atomic_fcmpset" check-ins)
>>>>>> 
>>>>>> (I did not try -r313268 through -r313270 as the use was
>>>>>> gradually added.)
>>>>>> 
>>>>>> So I'm currently running a -r313864 world with a -r313266
>>>>>> kernel.
>>>>>> 
>>>>>> No kernel that I tried that was from before -r313266 had the
>>>>>> problems.
>>>>>> 
>>>>>> Any kernel that I tried that was from after -r313271 had the
>>>>>> problems.
>>>>>> 
>>>>>> Of course I did not try them all in other direction. :)
>>>>>> 
>>>>> 
>>>>> I found that spin mutexes were not properly handling this, fixed in
>>>>> r313996.
>>>>> 
>>>>> Locally I added an if (cpu_tick() % 2) return (0); snippet to amd64
>>>>> fcmpset to simulate failures. Everything works, while it would easily
>>>>> fail without the patch.
>>>>> 
>>>>> That said, I hope this concludes the 'missing check for not-reread value
>>>>> of failed fcmpset' saga.
>>>>> 
>>>>> -- 
>>>>> Mateusz Guzik 
>>>> 
>>>> I tried to update from -r313864 to -r313999 in my amd64 context
>>>> (a VirtualBox machine under macOS) but it now crashes late in
>>>> the boot sequence (after it processes a dump if I make one but
>>>> before I can log in).
>>>> 
>>>> This update was via my usual explicit svnlite update; buildworld
>>>> buildkernel; etc. production style build of world and kernel,
>>>> including use of MALLOC_PRODUCTION.
>>>> 
>>>> The window shows:
>>>> 
>>>> _vm_map_lock+0xf
>>>> vm_map_wire+0x32
>>>> rtROMemObjNativeLockInMap+0x8c
>>>> rtROMemObjNativeLockUser+0x51
>>>> RTR0MemObjLockUserTag+0x231
>>>> vbglR0HGCMInternalPreprocessCall+0x65d
>>>> vbglR0HGCMInternalCall+0x17c
>>>> vgdrvIoCtl_HGCMCall+0x43f
>>>> VGDrvCommonIoCtl+0x261
>>>> vgdrvFreeBSDIOCtl+0x2cd
>>>> devfs_ioctl+0xae
>>>> VOP_IOCTL_APV+0x88
>>>> vn_ioctl+0x161
>>>> devfs_ioctl_f+0x1f
>>>> kern_ioctl+0x280
>>>> sys_ioctl+0x13f
>>>> amd64_syscall+0x397
>>>> Xfast_syscall+0xfb
>>> 
>>> More detail from booting with the -r313864 kernel.old
>>> and using kgdb on what the dump produced:
>>> 
>>> # kgdb kernel.debug /var/crash/vmcore.
>>> /var/crash/vmcore.0     /var/crash/vmcore.last
>>> # kgdb kernel.debug /var/crash/vmcore.0
>>> GNU gdb 6.1.1 [FreeBSD]
>>> Copyright 2004 Free Software Foundation, Inc.
>>> GDB is free software, covered by the GNU General Public License, and you are
>>> welcome to change it and/or distribute copies of it under certain 
>>> conditions.
>>> Type "show copying" to see the conditions.
>>> There is absolutely no warranty for GDB.  Type "show warranty" for details.
>>> This GDB was configured as "amd64-marcel-freebsd"...
>>> 
>>> Unread portion of the kernel message buffer:
>>> <118>Starting vboxservice.
>>> <118>VBoxService 5.1.14 r112924 (verbosity: 0) freebsd.amd64 (Jan 20 2017 
>>> 18:37:45) release log
>>> <118>00:00:00.000120 main Log opened 2017-02-20T22:38:46.34808Z
>>> <118>00:00:00.000162 main OS Product: FreeBSD
>>> <118>00:00:00.000171 main OS Release: 12.0-CURRENT
>>> <118>00:00:00.000180 main OS Version: FreeBSD 12.0-CURRENT  r313999M
>>> <118>00:00:00.000192 main Executable: /usr/local/sbin/VBoxService
>>> <118>00:00:00.000194 main Process ID: 609
>>> <118>00:00:00.000196 main Package type: BSD_64BITS_GENERIC (OSE)
>>> 
>>> 
>>> Fatal trap 12: page fault while in kernel mode
>>> cpuid = 2; apic id = 02
>>> fault virtual address   = 0xd6
>>> fault code              = supervisor read data, page not present
>>> instruction pointer     = 0x20:0x80d4ebaf
>>> stack pointer           = 0x28:0xfe0122e2bef0
>>> frame pointer           = 0x28:0xfe0122e2bf00
>>> code segment            = base 0x0, limit 0xf, type 0x1b
>>>                         = DPL 0, pres 1, long 1, def32 0, gran 1
>>> processor eflags        = interrupt enabled, resume, IOPL = 0
>>> current process         = 609 (VBoxService)
>>> 
>> 
>> 
>> 
>>> #9  0x80eb6be1 in calltrap () at 
>>> /usr/src/sys/amd64/amd64/exception.S:236
>>> #10 0x80d4ebaf in _vm_map_lock 

Re: svn commit: r313268 - head/sys/kern [through -r313271 for atomic_fcmpset use and later: fails on PowerMac G5 "Quad Core"; -r313266 works]

2017-02-20 Thread Mark Millard
On 2017-Feb-20, at 3:35 PM, Mateusz Guzik  wrote:

> On Mon, Feb 20, 2017 at 03:10:44PM -0800, Mark Millard wrote:
>> On 2017-Feb-20, at 2:58 PM, Mark Millard  wrote:
>> 
>>> On 2017-Feb-20, at 11:10 AM, Mateusz Guzik  wrote:
>>> 
>>>> On Sat, Feb 18, 2017 at 04:18:05AM -0800, Mark Millard wrote:
>>>>> [Note: I experiment with clang based powerpc64 builds,
>>>>> reporting problems that I find. Justin is familiar
>>>>> with this, as is Nathan.]
>>>>> 
>>>>> I tried to update the PowerMac G5 (a so-called "Quad Core")
>>>>> that I have access to from head -r312761 to -r313864 and
>>>>> ended up with random panics and hang ups in fairly short
>>>>> order after booting.
>>>>> 
>>>>> Some approximate bisecting for the kernel led to:
>>>>> (sometimes getting part way into a buildkernel attempt
>>>>> for a different version before a failure happens)
>>>>> 
>>>>> -r313266: works (just before use of atomic_fcmpset)
>>>>> vs.
>>>>> -r313271: fails (last of the "use atomic_fcmpset" check-ins)
>>>>> 
>>>>> (I did not try -r313268 through -r313270 as the use was
>>>>> gradually added.)
>>>>> 
>>>>> So I'm currently running a -r313864 world with a -r313266
>>>>> kernel.
>>>>> 
>>>>> No kernel that I tried that was from before -r313266 had the
>>>>> problems.
>>>>> 
>>>>> Any kernel that I tried that was from after -r313271 had the
>>>>> problems.
>>>>> 
>>>>> Of course I did not try them all in other direction. :)
>>>>> 
>>>> 
>>>> I found that spin mutexes were not properly handling this, fixed in
>>>> r313996.
>>>> 
>>>> Locally I added an if (cpu_tick() % 2) return (0); snippet to amd64
>>>> fcmpset to simulate failures. Everything works, while it would easily
>>>> fail without the patch.
>>>> 
>>>> That said, I hope this concludes the 'missing check for not-reread value
>>>> of failed fcmpset' saga.
>>>> 
>>>> -- 
>>>> Mateusz Guzik 
>>> 
>>> I tried to update from -r313864 to -r313999 in my amd64 context
>>> (a VirtualBox machine under macOS) but it now crashes late in
>>> the boot sequence (after it processes a dump if I make one but
>>> before I can log in).
>>> 
>>> This update was via my usual explicit svnlite update; buildworld
>>> buildkernel; etc. production style build of world and kernel,
>>> including use of MALLOC_PRODUCTION.
>>> 
>>> The window shows:
>>> 
>>> _vm_map_lock+0xf
>>> vm_map_wire+0x32
>>> rtROMemObjNativeLockInMap+0x8c
>>> rtROMemObjNativeLockUser+0x51
>>> RTR0MemObjLockUserTag+0x231
>>> vbglR0HGCMInternalPreprocessCall+0x65d
>>> vbglR0HGCMInternalCall+0x17c
>>> vgdrvIoCtl_HGCMCall+0x43f
>>> VGDrvCommonIoCtl+0x261
>>> vgdrvFreeBSDIOCtl+0x2cd
>>> devfs_ioctl+0xae
>>> VOP_IOCTL_APV+0x88
>>> vn_ioctl+0x161
>>> devfs_ioctl_f+0x1f
>>> kern_ioctl+0x280
>>> sys_ioctl+0x13f
>>> amd64_syscall+0x397
>>> Xfast_syscall+0xfb
>> 
>> More detail from booting with the -r313864 kernel.old
>> and using kgdb on what the dump produced:
>> 
>> # kgdb kernel.debug /var/crash/vmcore.
>> /var/crash/vmcore.0     /var/crash/vmcore.last
>> # kgdb kernel.debug /var/crash/vmcore.0
>> GNU gdb 6.1.1 [FreeBSD]
>> Copyright 2004 Free Software Foundation, Inc.
>> GDB is free software, covered by the GNU General Public License, and you are
>> welcome to change it and/or distribute copies of it under certain conditions.
>> Type "show copying" to see the conditions.
>> There is absolutely no warranty for GDB.  Type "show warranty" for details.
>> This GDB was configured as "amd64-marcel-freebsd"...
>> 
>> Unread portion of the kernel message buffer:
>> <118>Starting vboxservice.
>> <118>VBoxService 5.1.14 r112924 (verbosity: 0) freebsd.amd64 (Jan 20 2017 
>> 18:37:45) release log
>> <118>00:00:00.000120 main Log opened 2017-02-20T22:38:46.34808Z
>> <118>00:00:00.000162 main OS Product: FreeBSD
>> <118>00:00:00.000171 main OS Release: 12.0-CURRENT
>> <118>00:00:00.000180 main OS Version: FreeBSD 12.0-CURRENT  r313999M
>> <118>00:00:00.000192 main Executable: /usr/local/sbin/VBoxService
>> <118>00:00:00.000194 main Process ID: 609
>> <118>00:00:00.000196 main Package type: BSD_64BITS_GENERIC (OSE)
>> 
>> 
>> Fatal trap 12: page fault while in kernel mode
>> cpuid = 2; apic id = 02
>> fault virtual address   = 0xd6
>> fault code              = supervisor read data, page not present
>> instruction pointer     = 0x20:0x80d4ebaf
>> stack pointer           = 0x28:0xfe0122e2bef0
>> frame pointer           = 0x28:0xfe0122e2bf00
>> code segment            = base 0x0, limit 0xf, type 0x1b
>>                         = DPL 0, pres 1, long 1, def32 0, gran 1
>> processor eflags        = interrupt enabled, resume, IOPL = 0
>> current process         = 609 (VBoxService)
>> 
> 
> 
> 
>> #9  0x80eb6be1 in calltrap () at 
>> /usr/src/sys/amd64/amd64/exception.S:236
>> #10 0x80d4ebaf in _vm_map_lock (map=0x1, file=0x0, line=0) at 
>> /usr/src/sys/vm/vm_map.c:501
> 
> The function is:
> void
> _vm_map_lock(vm_map_t map, const char *file, int line)
> {
> 
>if 

Re: svn commit: r313268 - head/sys/kern [through -r313271 for atomic_fcmpset use and later: fails on PowerMac G5 "Quad Core"; -r313266 works]

2017-02-20 Thread Mateusz Guzik
On Mon, Feb 20, 2017 at 03:10:44PM -0800, Mark Millard wrote:
> On 2017-Feb-20, at 2:58 PM, Mark Millard  wrote:
> 
> > On 2017-Feb-20, at 11:10 AM, Mateusz Guzik  wrote:
> > 
> >> On Sat, Feb 18, 2017 at 04:18:05AM -0800, Mark Millard wrote:
> >>> [Note: I experiment with clang based powerpc64 builds,
> >>> reporting problems that I find. Justin is familiar
> >>> with this, as is Nathan.]
> >>> 
> >>> I tried to update the PowerMac G5 (a so-called "Quad Core")
> >>> that I have access to from head -r312761 to -r313864 and
> >>> ended up with random panics and hang ups in fairly short
> >>> order after booting.
> >>> 
> >>> Some approximate bisecting for the kernel led to:
> >>> (sometimes getting part way into a buildkernel attempt
> >>> for a different version before a failure happens)
> >>> 
> >>> -r313266: works (just before use of atomic_fcmpset)
> >>> vs.
> >>> -r313271: fails (last of the "use atomic_fcmpset" check-ins)
> >>> 
> >>> (I did not try -r313268 through -r313270 as the use was
> >>> gradually added.)
> >>> 
> >>> So I'm currently running a -r313864 world with a -r313266
> >>> kernel.
> >>> 
> >>> No kernel that I tried that was from before -r313266 had the
> >>> problems.
> >>> 
> >>> Any kernel that I tried that was from after -r313271 had the
> >>> problems.
> >>> 
> >>> Of course I did not try them all in other direction. :)
> >>> 
> >> 
> >> I found that spin mutexes were not properly handling this, fixed in
> >> r313996.
> >> 
> >> Locally I added an if (cpu_tick() % 2) return (0); snippet to amd64
> >> fcmpset to simulate failures. Everything works, while it would easily
> >> fail without the patch.
> >> 
> >> That said, I hope this concludes the 'missing check for not-reread value
> >> of failed fcmpset' saga.
> >> 
> >> -- 
> >> Mateusz Guzik 
> > 
> > I tried to update from -r313864 to -r313999 in my amd64 context
> > (a VirtualBox machine under macOS) but it now crashes late in
> > the boot sequence (after it processes a dump if I make one but
> > before I can log in).
> > 
> > This update was via my usual explicit svnlite update; buildworld
> > buildkernel; etc. production style build of world and kernel,
> > including use of MALLOC_PRODUCTION.
> > 
> > The window shows:
> > 
> > _vm_map_lock+0xf
> > vm_map_wire+0x32
> > rtROMemObjNativeLockInMap+0x8c
> > rtROMemObjNativeLockUser+0x51
> > RTR0MemObjLockUserTag+0x231
> > vbglR0HGCMInternalPreprocessCall+0x65d
> > vbglR0HGCMInternalCall+0x17c
> > vgdrvIoCtl_HGCMCall+0x43f
> > VGDrvCommonIoCtl+0x261
> > vgdrvFreeBSDIOCtl+0x2cd
> > devfs_ioctl+0xae
> > VOP_IOCTL_APV+0x88
> > vn_ioctl+0x161
> > devfs_ioctl_f+0x1f
> > kern_ioctl+0x280
> > sys_ioctl+0x13f
> > amd64_syscall+0x397
> > Xfast_syscall+0xfb
> 
> More detail from booting with the -r313864 kernel.old
> and using kgdb on what the dump produced:
> 
> # kgdb kernel.debug /var/crash/vmcore.
> /var/crash/vmcore.0     /var/crash/vmcore.last
> # kgdb kernel.debug /var/crash/vmcore.0
> GNU gdb 6.1.1 [FreeBSD]
> Copyright 2004 Free Software Foundation, Inc.
> GDB is free software, covered by the GNU General Public License, and you are
> welcome to change it and/or distribute copies of it under certain conditions.
> Type "show copying" to see the conditions.
> There is absolutely no warranty for GDB.  Type "show warranty" for details.
> This GDB was configured as "amd64-marcel-freebsd"...
> 
> Unread portion of the kernel message buffer:
> <118>Starting vboxservice.
> <118>VBoxService 5.1.14 r112924 (verbosity: 0) freebsd.amd64 (Jan 20 2017 
> 18:37:45) release log
> <118>00:00:00.000120 main Log opened 2017-02-20T22:38:46.34808Z
> <118>00:00:00.000162 main OS Product: FreeBSD
> <118>00:00:00.000171 main OS Release: 12.0-CURRENT
> <118>00:00:00.000180 main OS Version: FreeBSD 12.0-CURRENT  r313999M
> <118>00:00:00.000192 main Executable: /usr/local/sbin/VBoxService
> <118>00:00:00.000194 main Process ID: 609
> <118>00:00:00.000196 main Package type: BSD_64BITS_GENERIC (OSE)
> 
> 
> Fatal trap 12: page fault while in kernel mode
> cpuid = 2; apic id = 02
> fault virtual address   = 0xd6
> fault code              = supervisor read data, page not present
> instruction pointer     = 0x20:0x80d4ebaf
> stack pointer           = 0x28:0xfe0122e2bef0
> frame pointer           = 0x28:0xfe0122e2bf00
> code segment            = base 0x0, limit 0xf, type 0x1b
>                         = DPL 0, pres 1, long 1, def32 0, gran 1
> processor eflags        = interrupt enabled, resume, IOPL = 0
> current process         = 609 (VBoxService)
> 



> #9  0x80eb6be1 in calltrap () at 
> /usr/src/sys/amd64/amd64/exception.S:236
> #10 0x80d4ebaf in _vm_map_lock (map=0x1, file=0x0, line=0) at 
> /usr/src/sys/vm/vm_map.c:501

The function is:
void
_vm_map_lock(vm_map_t map, const char *file, int line)
{

	if (map->system_map)
		mtx_lock_flags_(&map->system_mtx, 0, file, line);
	else
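
One way to read the numbers in the quoted trap, assuming the map=0x1
argument shown by kgdb is accurate and that the faulting access is the
map->system_map test at the top of this function: the fault address
would be the bogus pointer plus the offset of the field being read, so
0xd6 - 0x1 = 0xd5. The sketch below only illustrates that arithmetic.
struct sketch_map is a hypothetical stand-in whose padding is chosen to
match the inferred offset; it has not been checked against the real
struct vm_map layout.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical layout: only the position of system_map matters here. */
struct sketch_map {
	unsigned char	pad[0xd5];	/* placeholder for preceding fields */
	unsigned char	system_map;
};

/* The offset is an assumption derived from the trap, checked at build time. */
static_assert(offsetof(struct sketch_map, system_map) == 0xd5,
    "sketch layout assumption");

/*
 * Address that a load of p->system_map would touch for a given base
 * pointer value, computed without dereferencing anything.
 */
static uintptr_t
sketch_field_addr(uintptr_t base)
{
	return (base + offsetof(struct sketch_map, system_map));
}
```

Under those assumptions sketch_field_addr(0x1) yields 0xd6, matching the
reported fault virtual address, which would be consistent with a caller
handing in a garbage map pointer rather than with a fault inside the
locking code itself.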

Re: svn commit: r313268 - head/sys/kern [through -r313271 for atomic_fcmpset use and later: fails on PowerMac G5 "Quad Core"; -r313266 works]

2017-02-20 Thread Mark Millard
On 2017-Feb-20, at 2:58 PM, Mark Millard  wrote:

> On 2017-Feb-20, at 11:10 AM, Mateusz Guzik  wrote:
> 
>> On Sat, Feb 18, 2017 at 04:18:05AM -0800, Mark Millard wrote:
>>> [Note: I experiment with clang based powerpc64 builds,
>>> reporting problems that I find. Justin is familiar
>>> with this, as is Nathan.]
>>> 
>>> I tried to update the PowerMac G5 (a so-called "Quad Core")
>>> that I have access to from head -r312761 to -r313864 and
>>> ended up with random panics and hang ups in fairly short
>>> order after booting.
>>> 
>>> Some approximate bisecting for the kernel led to:
>>> (sometimes getting part way into a buildkernel attempt
>>> for a different version before a failure happens)
>>> 
>>> -r313266: works (just before use of atomic_fcmpset)
>>> vs.
>>> -r313271: fails (last of the "use atomic_fcmpset" check-ins)
>>> 
>>> (I did not try -r313268 through -r313270 as the use was
>>> gradually added.)
>>> 
>>> So I'm currently running a -r313864 world with a -r313266
>>> kernel.
>>> 
>>> No kernel that I tried that was from before -r313266 had the
>>> problems.
>>> 
>>> Any kernel that I tried that was from after -r313271 had the
>>> problems.
>>> 
>>> Of course I did not try them all in other direction. :)
>>> 
>> 
>> I found that spin mutexes were not properly handling this, fixed in
>> r313996.
>> 
>> Locally I added an if (cpu_tick() % 2) return (0); snippet to amd64
>> fcmpset to simulate failures. Everything works, while it would easily
>> fail without the patch.
>> 
>> That said, I hope this concludes the 'missing check for not-reread value
>> of failed fcmpset' saga.
>> 
>> -- 
>> Mateusz Guzik 
> 
> I tried to update from -r313864 to -r313999 in my amd64 context
> (a VirtualBox machine under macOS) but it now crashes late in
> the boot sequence (after it processes a dump if I make one but
> before I can log in).
> 
> This update was via my usual explicit svnlite update; buildworld
> buildkernel; etc. production style build of world and kernel,
> including use of MALLOC_PRODUCTION.
> 
> The window shows:
> 
> _vm_map_lock+0xf
> vm_map_wire+0x32
> rtROMemObjNativeLockInMap+0x8c
> rtROMemObjNativeLockUser+0x51
> RTR0MemObjLockUserTag+0x231
> vbglR0HGCMInternalPreprocessCall+0x65d
> vbglR0HGCMInternalCall+0x17c
> vgdrvIoCtl_HGCMCall+0x43f
> VGDrvCommonIoCtl+0x261
> vgdrvFreeBSDIOCtl+0x2cd
> devfs_ioctl+0xae
> VOP_IOCTL_APV+0x88
> vn_ioctl+0x161
> devfs_ioctl_f+0x1f
> kern_ioctl+0x280
> sys_ioctl+0x13f
> amd64_syscall+0x397
> Xfast_syscall+0xfb

More detail from booting with the -r313864 kernel.old
and using kgdb on what the dump produced:

# kgdb kernel.debug /var/crash/vmcore.
/var/crash/vmcore.0     /var/crash/vmcore.last
# kgdb kernel.debug /var/crash/vmcore.0
GNU gdb 6.1.1 [FreeBSD]
Copyright 2004 Free Software Foundation, Inc.
GDB is free software, covered by the GNU General Public License, and you are
welcome to change it and/or distribute copies of it under certain conditions.
Type "show copying" to see the conditions.
There is absolutely no warranty for GDB.  Type "show warranty" for details.
This GDB was configured as "amd64-marcel-freebsd"...

Unread portion of the kernel message buffer:
<118>Starting vboxservice.
<118>VBoxService 5.1.14 r112924 (verbosity: 0) freebsd.amd64 (Jan 20 2017 
18:37:45) release log
<118>00:00:00.000120 main Log opened 2017-02-20T22:38:46.34808Z
<118>00:00:00.000162 main OS Product: FreeBSD
<118>00:00:00.000171 main OS Release: 12.0-CURRENT
<118>00:00:00.000180 main OS Version: FreeBSD 12.0-CURRENT  r313999M
<118>00:00:00.000192 main Executable: /usr/local/sbin/VBoxService
<118>00:00:00.000194 main Process ID: 609
<118>00:00:00.000196 main Package type: BSD_64BITS_GENERIC (OSE)


Fatal trap 12: page fault while in kernel mode
cpuid = 2; apic id = 02
fault virtual address   = 0xd6
fault code  = supervisor read data, page not present
instruction pointer = 0x20:0x80d4ebaf
stack pointer   = 0x28:0xfe0122e2bef0
frame pointer   = 0x28:0xfe0122e2bf00
code segment= base 0x0, limit 0xf, type 0x1b
= DPL 0, pres 1, long 1, def32 0, gran 1
processor eflags= interrupt enabled, resume, IOPL = 0
current process = 609 (VBoxService)

Reading symbols from /boot/kernel/zfs.ko...Reading symbols from 
/usr/lib/debug//boot/kernel/zfs.ko.debug...done.
done.
Loaded symbols for /boot/kernel/zfs.ko
Reading symbols from /boot/kernel/opensolaris.ko...Reading symbols from 
/usr/lib/debug//boot/kernel/opensolaris.ko.debug...done.
done.
Loaded symbols for /boot/kernel/opensolaris.ko
Reading symbols from /boot/modules/vboxguest.ko...done.
Loaded symbols for /boot/modules/vboxguest.ko
#0  doadump (textdump=0) at pcpu.h:232
232 __asm("movq %%gs:%1,%0" : "=r" (td)
(kgdb) bt
#0  doadump (textdump=0) at pcpu.h:232
#1  0x8039dd0b in db_dump (dummy=, dummy2=, dummy3=, dummy4=) at 
/usr/src/sys/ddb/db_command.c:546
#2  

Re: svn commit: r313268 - head/sys/kern [through -r313271 for atomic_fcmpset use and later: fails on PowerMac G5 "Quad Core"; -r313266 works]

2017-02-20 Thread Mark Millard
On 2017-Feb-20, at 11:10 AM, Mateusz Guzik  wrote:

> On Sat, Feb 18, 2017 at 04:18:05AM -0800, Mark Millard wrote:
>> [Note: I experiment with clang based powerpc64 builds,
>> reporting problems that I find. Justin is familiar
>> with this, as is Nathan.]
>> 
>> I tried to update the PowerMac G5 (a so-called "Quad Core")
>> that I have access to from head -r312761 to -r313864 and
>> ended up with random panics and hang ups in fairly short
>> order after booting.
>> 
>> Some approximate bisecting for the kernel led to:
>> (sometimes getting part way into a buildkernel attempt
>> for a different version before a failure happens)
>> 
>> -r313266: works (just before use of atomic_fcmpset)
>> vs.
>> -r313271: fails (last of the "use atomic_fcmpset" check-ins)
>> 
>> (I did not try -r313268 through -r313270 as the use was
>> gradually added.)
>> 
>> So I'm currently running a -r313864 world with a -r313266
>> kernel.
>> 
>> No kernel that I tried that was from before -r313266 had the
>> problems.
>> 
>> Any kernel that I tried that was from after -r313271 had the
>> problems.
>> 
>> Of course I did not try them all in other direction. :)
>> 
> 
> I found that spin mutexes were not properly handling this, fixed in
> r313996.
> 
> Locally I added an if (cpu_tick() % 2) return (0); snippet to amd64
> fcmpset to simulate failures. Everything works, while it would easily
> fail without the patch.
> 
> That said, I hope this concludes the 'missing check for not-reread value
> of failed fcmpset' saga.
> 
> -- 
> Mateusz Guzik 

I tried to update from -r313864 to -r313999 in my amd64 context
(a VirtualBox machine under macOS) but it now crashes late in
the boot sequence (after it processes a dump if I make one but
before I can log in).

This update was via my usual explicit svnlite update; buildworld
buildkernel; etc. production style build of world and kernel,
including use of MALLOC_PRODUCTION.

The window shows:

_vm_map_lock+0xf
vm_map_wire+0x32
rtROMemObjNativeLockInMap+0x8c
rtROMemObjNativeLockUser+0x51
RTR0MemObjLockUserTag+0x231
vbglR0HGCMInternalPreprocessCall+0x65d
vbglR0HGCMInternalCall+0x17c
vgdrvIoCtl_HGCMCall+0x43f
VGDrvCommonIoCtl+0x261
vgdrvFreeBSDIOCtl+0x2cd
devfs_ioctl+0xae
VOP_IOCTL_APV+0x88
vn_ioctl+0x161
devfs_ioctl_f+0x1f
kern_ioctl+0x280
sys_ioctl+0x13f
amd64_syscall+0x397
Xfast_syscall+0xfb


===
Mark Millard
markmi at dsl-only.net

___
freebsd-current@freebsd.org mailing list
https://lists.freebsd.org/mailman/listinfo/freebsd-current
To unsubscribe, send any mail to "freebsd-current-unsubscr...@freebsd.org"


Re: svn commit: r313268 - head/sys/kern [through -r313271 for atomic_fcmpset use and later: fails on PowerMac G5 "Quad Core"; -r313266 works]

2017-02-20 Thread Mateusz Guzik
On Sat, Feb 18, 2017 at 04:18:05AM -0800, Mark Millard wrote:
> [Note: I experiment with clang based powerpc64 builds,
> reporting problems that I find. Justin is familiar
> with this, as is Nathan.]
> 
> I tried to update the PowerMac G5 (a so-called "Quad Core")
> that I have access to from head -r312761 to -r313864 and
> ended up with random panics and hang ups in fairly short
> order after booting.
> 
> Some approximate bisecting for the kernel led to:
> (sometimes getting part way into a buildkernel attempt
> for a different version before a failure happens)
> 
> -r313266: works (just before use of atomic_fcmpset)
> vs.
> -r313271: fails (last of the "use atomic_fcmpset" check-ins)
> 
> (I did not try -r313268 through -r313270 as the use was
> gradually added.)
> 
> So I'm currently running a -r313864 world with a -r313266
> kernel.
> 
> No kernel that I tried that was from before -r313266 had the
> problems.
> 
> Any kernel that I tried that was from after -r313271 had the
> problems.
> 
> Of course I did not try them all in other direction. :)
> 

I found that spin mutexes were not properly handling this, fixed in
r313996.

Locally I added an if (cpu_tick() % 2) return (0); snippet to amd64
fcmpset to simulate failures. Everything works, while it would easily
fail without the patch.
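
A minimal userland sketch of that kind of fault injection, assuming C11
atomics in place of the kernel's atomic_fcmpset. The names
toy_fcmpset_flaky, toy_spin_lock and TOY_UNOWNED are hypothetical, and
this is not the r313996 change itself; it only illustrates why a caller
has to re-test the value after a failed attempt instead of assuming it
now shows an owner:

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

#define TOY_UNOWNED ((uintptr_t)0)	/* hypothetical "nobody holds it" value */

static unsigned inject_calls;		/* drives the injected failures */

/*
 * fcmpset-shaped helper with fault injection: every other call fails
 * without touching *expected, so the caller is left with a value that
 * was never re-read.  Otherwise it behaves like a compare-and-set that
 * reports the value it found on failure.
 */
static bool
toy_fcmpset_flaky(_Atomic uintptr_t *dst, uintptr_t *expected,
    uintptr_t desired)
{
	if (inject_calls++ % 2 == 0)
		return (false);
	return (atomic_compare_exchange_strong(dst, expected, desired));
}

/*
 * Take a toy spin lock and return how many fcmpset attempts it took.
 * The loop never assumes that a failed attempt implies an owner: it
 * re-tests v, which may still read TOY_UNOWNED after an injected failure.
 */
static unsigned
toy_spin_lock(_Atomic uintptr_t *lock, uintptr_t tid)
{
	uintptr_t v = TOY_UNOWNED;
	unsigned attempts = 0;

	assert(tid != TOY_UNOWNED);
	for (;;) {
		if (v == TOY_UNOWNED) {
			attempts++;
			if (toy_fcmpset_flaky(lock, &v, tid))
				return (attempts);
			continue;	/* v is fresh or stale; test it again */
		}
		v = atomic_load(lock);	/* owned by someone else: re-read */
	}
}
```

With an uncontended lock, the first attempt hits the injected failure
and the second one succeeds, so toy_spin_lock() reports two attempts.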

That said, I hope this concludes the 'missing check for not-reread value
of failed fcmpset' saga.

-- 
Mateusz Guzik 


Re: svn commit: r313268 - head/sys/kern [through -r313271 for atomic_fcmpset use and later: fails on PowerMac G5 "Quad Core"; -r313266 works]

2017-02-18 Thread Mateusz Guzik
On Sat, Feb 18, 2017 at 01:58:49PM -0800, Mark Millard wrote:
> On 2017-Feb-18, at 12:58 PM, Mateusz Guzik  wrote:
> > Well either the primitive itself is buggy or the somewhat (now) unusual
> > condition of not providing the failed value (but possibly a stale one)
> > is not handled correctly in locking code.
> > 
> > That said, I would start with putting barriers "on both sides" of
> > powerpc's fcmpset for debugging purposes and if the problem persists I
> > can add some debugs to locking primitives.
> > 
> 
> I currently have the only powerpc64 that I have access
> to for now doing a test that will likely finish tonight
> sometime (if it has no problems).
> 
> Also I'm not so familiar with powerpc64 details as to be
> able to insert proper barriers and the like off the top of
> my head: It is more of a research subject for me.
> 

This was a suggestion to jhibbits@. Looking at the code it is not hard
to slap them in for testing purposes, or maybe there is an "obvious now
that I look at it" braino in there, or maybe he has a better idea.

Now that I wrote it I can get myself access to powerpc boxes. While I
won't be able to run bsd on them, I can hack around in userspace and see.

That's unless jhibbits@ steps in. I have no clue about ppc.

> It looks like contexts like __rw_wlock_hard(c,v,tid,file,line)
> now need the caller to do an equivalent of:
> 
> __rw_wlock_hard(c,RW_READ_VALUE(rwlock2rw(c)),file,line)
> 
> in order for the code behavior to match the old behavior
> that was based on the original local-v's initialization
> before v was used:
> 
> rw = rwlock2rw(c);
> v = RW_READ_VALUE(rw); /* this line no longer exists */
> 
> This means that checking for equivalence is no longer
> local to the routine but involves checking all the
> usage of the routine.
> 

Not reading the argument locally was the entire point of introducing
fcmpset. Otherwise the 'v' argument would be a waste of time.
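
As a rough userland sketch of that calling convention, assuming C11
atomics rather than the kernel primitives: the fast path hands the value
found by its failed compare-and-set to the slow path, which starts from
it instead of loading the lock word again. toy_wlock_try, toy_wlock_hard
and TOY_UNLOCKED are hypothetical names, and the slow path gives up when
it sees an owner (try-lock style) only so the example terminates:

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

#define TOY_UNLOCKED ((uintptr_t)0)	/* hypothetical unowned value */

/*
 * Slow path: starts from the caller-supplied v and does no load of its
 * own on entry.  Reports the owner it saw if the lock is held.
 */
static bool
toy_wlock_hard(_Atomic uintptr_t *lock, uintptr_t v, uintptr_t tid,
    uintptr_t *owner_seen)
{
	while (v == TOY_UNLOCKED) {
		/* On failure the exchange stores the found value into v. */
		if (atomic_compare_exchange_strong(lock, &v, tid))
			return (true);
	}
	*owner_seen = v;
	return (false);
}

/* Fast path: one compare-and-set, then hand the found value over. */
static bool
toy_wlock_try(_Atomic uintptr_t *lock, uintptr_t tid, uintptr_t *owner_seen)
{
	uintptr_t v = TOY_UNLOCKED;

	assert(tid != TOY_UNLOCKED);
	if (atomic_compare_exchange_strong(lock, &v, tid))
		return (true);
	return (toy_wlock_hard(lock, v, tid, owner_seen));
}
```

The point of the shape is that toy_wlock_hard() can inspect the owner
from v right away; whether v is guaranteed to be fresh depends on the
primitive, which is the condition discussed in this thread.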

Some primitives can attempt grabbing the lock and if they fail, we have
the lock value to work with (e.g. check who owns the lock and see if
they are running). In particular amd64 will give us the value it found.
An explicit read requires whoever owns the cacheline to lose the
exclusive ownership and if the lock is contended (multiple cpus doing
fcmpset), this makes the cacheline ping-pong between cores. This
destroys performance especially on systems with many cores and
especially so with multiple numa nodes.

Other primitives don't have inline variants. This concerns read-write
locks which try to:
retry:
        r = lock_value(lock);
        if (!locked(r)) {
                if (!cmpset(lock, r, r + ONE_READER))
                        goto retry;
        }

That is, if multiple cpus try to get the lock for reading, one will fail
and will be forced to compute the new value to be set. The longer the
time between attempts the more likely it is another core showed up trying
to do the same thing with the same value, causing another failed
attempt.

So here there are no inlines so that the time is shorter and fcmpset
allows NOT reading the lock value explicitly - it was already provided
by hardware.
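
The fcmpset flavour of the reader loop can be sketched in userland with
C11 atomics, whose compare-exchange also writes the found value back on
failure. This is only an illustration under assumptions:
TOY_WRITE_LOCKED, TOY_ONE_READER, toy_read_lock and toy_read_unlock are
hypothetical names, the lock word layout is made up, and the function
returns instead of blocking when a writer holds the lock:

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

#define TOY_WRITE_LOCKED ((uintptr_t)1)	/* hypothetical writer bit */
#define TOY_ONE_READER   ((uintptr_t)2)	/* hypothetical reader increment */

static bool
toy_read_lock(_Atomic uintptr_t *lock)
{
	uintptr_t v = atomic_load(lock);	/* the only explicit read */

	while ((v & TOY_WRITE_LOCKED) == 0) {
		if (atomic_compare_exchange_weak(lock, &v, v + TOY_ONE_READER))
			return (true);
		/* Failed: the exchange refreshed v; the loop re-tests it. */
	}
	return (false);		/* a writer holds the lock */
}

static void
toy_read_unlock(_Atomic uintptr_t *lock)
{
	uintptr_t old = atomic_fetch_sub(lock, TOY_ONE_READER);

	assert(old >= TOY_ONE_READER);	/* there must have been a reader */
	(void)old;
}
```

Compared with the cmpset loop, there is no "goto retry" back to a
separate load: a failed attempt already leaves the value to retry with
in v.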

Note this is still significantly slower than it has to be in principle -
the lock can 'blindly increment by ONE_READER and see what happens',
but that requires several changes and is a subject for another e-mail.
I'm working on it though.
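
One possible reading of "blindly increment", sketched in userland with
C11 atomics. The message does not describe the actual design, so
everything here is an assumption: the names toy_read_lock_blind,
TOY_WRITE_LOCKED and TOY_ONE_READER are hypothetical, and the word
layout (a writer bit below a reader count) is chosen so that the
increment cannot disturb the writer bit:

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

#define TOY_WRITE_LOCKED ((uintptr_t)1)	/* hypothetical writer bit */
#define TOY_ONE_READER   ((uintptr_t)2)	/* hypothetical reader increment */

/*
 * Add a reader unconditionally, then look at what was there before.
 * If a writer held the lock, back the increment out and report failure.
 */
static bool
toy_read_lock_blind(_Atomic uintptr_t *lock)
{
	uintptr_t old = atomic_fetch_add(lock, TOY_ONE_READER);

	if ((old & TOY_WRITE_LOCKED) == 0)
		return (true);
	old = atomic_fetch_sub(lock, TOY_ONE_READER);
	assert(old >= TOY_ONE_READER);	/* our own increment was present */
	(void)old;
	return (false);
}
```

In the uncontended case this is a single atomic add with no
compare-and-set loop at all; the cost is that the lock word has to
tolerate the transient increment while a writer holds it.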

-- 
Mateusz Guzik 


Re: svn commit: r313268 - head/sys/kern [through -r313271 for atomic_fcmpset use and later: fails on PowerMac G5 "Quad Core"; -r313266 works]

2017-02-18 Thread Mark Millard
On 2017-Feb-18, at 12:58 PM, Mateusz Guzik  wrote:

> On Sat, Feb 18, 2017 at 12:49:29PM -0800, Mark Millard wrote:
>> On 2017-Feb-18, at 4:18 AM, Mark Millard  wrote:
>> 
>>> [Note: I experiment with clang based powerpc64 builds,
>>> reporting problems that I find. Justin is familiar
>>> with this, as is Nathan.]
>>> 
>>> I tried to update the PowerMac G5 (a so-called "Quad Core")
>>> that I have access to from head -r312761 to -r313864 and
>>> ended up with random panics and hang ups in fairly short
>>> order after booting.
>>> 
>>> Some approximate bisecting for the kernel led to:
>>> (sometimes getting part way into a buildkernel attempt
>>> for a different version before a failure happens)
>>> 
>>> -r313266: works (just before use of atomic_fcmpset)
>>> vs.
>>> -r313271: fails (last of the "use atomic_fcmpset" check-ins)
>>> 
>>> (I did not try -r313268 through -r313270 as the use was
>>> gradually added.)
>>> 
>>> So I'm currently running a -r313864 world with a -r313266
>>> kernel.
>>> 
>>> No kernel that I tried that was from before -r313266 had the
>>> problems.
>>> 
>>> Any kernel that I tried that was from after -r313271 had the
>>> problems.
>>> 
>>> Of course I did not try them all in other direction. :)
>> 
>> [Of course: "either direction".]
>> 
>> I'll note that the -r313864 buildworld was without
>> MALLOC_PRODUCTION being defined. (Unusual for me but
>> I'm testing if a jemalloc assert problem on arm64
>> also happens on powerpc64.)
>> 
>> By contrast the buildkernels were production style
>> (as is normal for me unless I'm trying to track
>> something down that I think might be exposed by
>> the extra checks).
>> 
> 
> Well either the primitive itself is buggy or the somewhat (now) unusual
> condition of not providing the failed value (but possibly a stale one)
> is not handled correctly in locking code.
> 
> That said, I would start with putting barriers "on both sides" of
> powerpc's fcmpset for debugging purposes and if the problem persists I
> can add some debugs to locking primitives.
> 
> -- 
> Mateusz Guzik 

I currently have the only powerpc64 that I have access
to for now doing a test that will likely finish tonight
sometime (if it has no problems).

Also I'm not so familiar with powerpc64 details as to be
able to insert proper barriers and the like off the top of
my head: It is more of a research subject for me.


Side note:

It looks like contexts like __rw_wlock_hard(c,v,tid,file,line)
now need the caller to do an equivalent of:

__rw_wlock_hard(c,RW_READ_VALUE(rwlock2rw(c)),file,line)

in order for the code behavior to match the old behavior
that was based on the original local-v's initialization
before v was used:

rw = rwlock2rw(c);
v = RW_READ_VALUE(rw); /* this line no longer exists */

This means that checking for equivalence is no longer
local to the routine but involves checking all the
usage of the routine.

I've not done such so for all I know such usage is always
in place: This is not a claim of a problem. The other
routines in kern_rwlock.c still have local variables and
the original initializations. I just thought that this
was interesting. I've not looked at other files yet.

===
Mark Millard
markmi at dsl-only.net



Re: svn commit: r313268 - head/sys/kern [through -r313271 for atomic_fcmpset use and later: fails on PowerMac G5 "Quad Core"; -r313266 works]

2017-02-18 Thread Mateusz Guzik
On Sat, Feb 18, 2017 at 12:49:29PM -0800, Mark Millard wrote:
> On 2017-Feb-18, at 4:18 AM, Mark Millard  wrote:
> 
> > [Note: I experiment with clang based powerpc64 builds,
> > reporting problems that I find. Justin is familiar
> > with this, as is Nathan.]
> > 
> > I tried to update the PowerMac G5 (a so-called "Quad Core")
> > that I have access to from head -r312761 to -r313864 and
> > ended up with random panics and hang ups in fairly short
> > order after booting.
> > 
> > Some approximate bisecting for the kernel led to:
> > (sometimes getting part way into a buildkernel attempt
> > for a different version before a failure happens)
> > 
> > -r313266: works (just before use of atomic_fcmpset)
> > vs.
> > -r313271: fails (last of the "use atomic_fcmpset" check-ins)
> > 
> > (I did not try -r313268 through -r313270 as the use was
> > gradually added.)
> > 
> > So I'm currently running a -r313864 world with a -r313266
> > kernel.
> > 
> > No kernel that I tried that was from before -r313266 had the
> > problems.
> > 
> > Any kernel that I tried that was from after -r313271 had the
> > problems.
> > 
> > Of course I did not try them all in other direction. :)
> 
> [Of course: "either direction".]
> 
> I'll note that the -r313864 buildworld was without
> MALLOC_PRODUCTION being defined. (Unusual for me but
> I'm testing if a jemalloc assert problem on arm64
> also happens on powerpc64.)
> 
> By contrast the buildkernels were production style
> (as is normal for me unless I'm trying to track
> something down that I think might be exposed by
> the extra checks).
> 

Well either the primitive itself is buggy or the somewhat (now) unusual
condition of not providing the failed value (but possibly a stale one)
is not handled correctly in locking code.

That said, I would start with putting barriers "on both sides" of
powerpc's fcmpset for debugging purposes and if the problem persists I
can add some debugs to locking primitives.

-- 
Mateusz Guzik 


Re: svn commit: r313268 - head/sys/kern [through -r313271 for atomic_fcmpset use and later: fails on PowerMac G5 "Quad Core"; -r313266 works]

2017-02-18 Thread Mark Millard
On 2017-Feb-18, at 4:18 AM, Mark Millard  wrote:

> [Note: I experiment with clang based powerpc64 builds,
> reporting problems that I find. Justin is familiar
> with this, as is Nathan.]
> 
> I tried to update the PowerMac G5 (a so-called "Quad Core")
> that I have access to from head -r312761 to -r313864 and
> ended up with random panics and hang ups in fairly short
> order after booting.
> 
> Some approximate bisecting for the kernel led to:
> (sometimes getting part way into a buildkernel attempt
> for a different version before a failure happens)
> 
> -r313266: works (just before use of atomic_fcmpset)
> vs.
> -r313271: fails (last of the "use atomic_fcmpset" check-ins)
> 
> (I did not try -r313268 through -r313270 as the use was
> gradually added.)
> 
> So I'm currently running a -r313864 world with a -r313266
> kernel.
> 
> No kernel that I tried that was from before -r313266 had the
> problems.
> 
> Any kernel that I tried that was from after -r313271 had the
> problems.
> 
> Of course I did not try them all in other direction. :)

[Of course: "either direction".]

I'll note that the -r313864 buildworld was without
MALLOC_PRODUCTION being defined. (Unusual for me but
I'm testing if a jemalloc assert problem on arm64
also happens on powerpc64.)

By contrast the buildkernels were production style
(as is normal for me unless I'm trying to track
something down that I think might be exposed by
the extra checks).

===
Mark Millard
markmi at dsl-only.net



Re: svn commit: r313268 - head/sys/kern [through -r313271 for atomic_fcmpset use and later: fails on PowerMac G5 "Quad Core"; -r313266 works]

2017-02-18 Thread Mark Millard
[Note: I experiment with clang based powerpc64 builds,
reporting problems that I find. Justin is familiar
with this, as is Nathan.]

I tried to update the PowerMac G5 (a so-called "Quad Core")
that I have access to from head -r312761 to -r313864 and
ended up with random panics and hang ups in fairly short
order after booting.

Some approximate bisecting for the kernel led to:
(sometimes getting part way into a buildkernel attempt
for a different version before a failure happens)

-r313266: works (just before use of atomic_fcmpset)
vs.
-r313271: fails (last of the "use atomic_fcmpset" check-ins)

(I did not try -r313268 through -r313270 as the use was
gradually added.)

So I'm currently running a -r313864 world with a -r313266
kernel.

No kernel that I tried that was from before -r313266 had the
problems.

Any kernel that I tried that was from after -r313271 had the
problems.

Of course I did not try them all in other direction. :)

===
Mark Millard
markmi at dsl-only.net
