Re: svn commit: r313268 - head/sys/kern [through -r313271 for atomic_fcmpset use and later: fails on PowerMac G5 "Quad Core"; -r313266 works]
On 2017-Mar-3, at 6:17 AM, Rodney W. Grimes wrote: >> On 2017-Mar-2, at 7:19 AM, Steve Kargl >> wrote: >> >> On Thu, Mar 02, 2017 at 01:10:21PM +0100, Mateusz Guzik wrote: >>> On Wed, Mar 01, 2017 at 09:45:07AM -0800, Mark Millard wrote: > Summary of the transition interval: > > So for powerpc64 (and powerpc?) it is a good > idea to avoid anything that is after -r313254 > and before -r314474 in head. (Would this be > appropriate for an UPDATING notice given its > span?) > > There may be other architectures that might have > a similar status(?): the last fixes involved were > not in Machine Dependent code. (Some architectures > are apparently insensitive to the errors, such as > amd64). > When following current you are expected to be on the newest revision, so I don't think mentioning interim broken releases makes much sense. >>> >>> Documenting the range may aid those bisecting src/ to find a bug. >>> How is one to know that anything in the range that Mark points >>> out should be skipped on powerpc64? >>> >>> -- >>> Steve >> >> I have tested with a TARGET_ARCH=powerpc -r314473 build and >> its kernel version has locking problems like >> TARGET_ARCH=powerpc64 does for that version. >> >> [Note: This was run on a PowerMac G5 so-called "Quad Core" >> so most of the memory was ignored.] >> >> Both TARGET_ARCH=powerpc64 and TARGET_ARCH=powerpc need -r314474 >> or later as of the new locking. >> >> I've not explicitly tested other architectures. As I remember >> armv6/v7 are classified as having some form of a weak memory >> model compared to the likes of amd64. If so armv6/v7 might be >> candidates for having problems. There might be other candidates. > > I also had locking issues on amd64 around this build time that > sent me down a week-long rabbit hole chasing what I thought was > a bug in the new AMD/IOMMU code.
IMHO if we can at least > flag prior snapshot builds as "Broken for reason X" it might > save someone some time, and time is a one-way depleting resource > usually worth saving if possible. > > If needed I can dig out the specific build. Oh, nvm, let me > just do that: it was r309302. This revision, I believe, is > a November snapshot. It has kernel panics due to spinlock > timeouts and sporadic deadlocks that go undetected. > > > -- > Rod Grimes rgrimes at freebsd.org

Sounds like that amd64 -r309302 problem might be another good example. Locking tends to be central and heavily used. When it breaks, many other things tend to end up broken too. This is the sort of context I was thinking about if it goes on very long.

I'm not sure that the -r309302 problem would reproduce at -r313259, so -r309302 might be a separate issue. I've no clue what -rdd range had the amd64 -r309302 problem.

Details that I'm aware of are something like:

-r309302 is dated 2016-Nov-30. (your reported amd64 locking problem's -rdd)
-r312973 is dated 2017-Jan-30. (example add of an atomic_fcmpset implementation) (getting ready for machine-independent usage)
-r313259 is dated 2017-Feb-5. (last before "machine-independent use of atomic_fcmpset"?) (powerpc64 and powerpc working here)
-r313260 is dated 2017-Feb-5. (first machine-independent usage of atomic_fcmpset?)
. . . (various machine-independent atomic_fcmpset usage check-ins) . . .
-r313271 is dated 2017-Feb-5. (observed powerpc64 failures for this version) (powerpc would fail too)
. . . (various machine-independent atomic_fcmpset usage check-ins) . . .
. . . (powerpc64 [and powerpc] continuing to fail) . . .
-r314474 is dated 2017-Mar-1. (powerpc64 and powerpc started working)

=== Mark Millard markmi at dsl-only.net

___ freebsd-current@freebsd.org mailing list https://lists.freebsd.org/mailman/listinfo/freebsd-current To unsubscribe, send any mail to "freebsd-current-unsubscr...@freebsd.org"
Re: svn commit: r313268 - head/sys/kern [through -r313271 for atomic_fcmpset use and later: fails on PowerMac G5 "Quad Core"; -r313266 works]
> On 2017-Mar-2, at 7:19 AM, Steve Kargl > wrote: > > On Thu, Mar 02, 2017 at 01:10:21PM +0100, Mateusz Guzik wrote: > > On Wed, Mar 01, 2017 at 09:45:07AM -0800, Mark Millard wrote: > >> > >>> Summary of the transition interval: > >>> > >>> So for powerpc64 (and powerpc?) it is a good > >>> idea to avoid anything that is after -r313254 > >>> and before -r314474 in head. (Would this be > >>> appropriate for an UPDATING notice given its > >>> span?) > >>> > >>> There may be other architectures that might have > >>> a similar status(?): the last fixes involved were > >>> not in Machine Dependent code. (Some architectures > >>> are apparently insensitive to the errors, such as > >>> amd64). > >>> > >> > >> When following current you are expected to be on the newest revision, > >> so I don't think mentioning interim broken releases makes much sense. > >> > > > > Documenting the range may aid those bisecting src/ to find a bug. > > How is one to know that anything in the range that Mark points > > out should be skipped on powerpc64? > > > > -- > > Steve > > I have tested with a TARGET_ARCH=powerpc -r314473 build and > its kernel version has locking problems like > TARGET_ARCH=powerpc64 does for that version. > > [Note: This was run on a PowerMac G5 so-called "Quad Core" > so most of the memory was ignored.] > > Both TARGET_ARCH=powerpc64 and TARGET_ARCH=powerpc need -r314474 > or later as of the new locking. > > I've not explicitly tested other architectures. As I remember > armv6/v7 are classified as having some form of a weak memory > model compared to the likes of amd64. If so armv6/v7 might be > candidates for having problems. There might be other candidates.

I also had locking issues on amd64 around this build time that sent me down a week-long rabbit hole chasing what I thought was a bug in the new AMD/IOMMU code.
IMHO if we can at least flag prior snapshot builds as "Broken for reason X", it might save someone some time, and time is a one-way depleting resource usually worth saving if possible. If needed I can dig out the specific build. Oh, nvm, let me just do that: it was r309302. This revision, I believe, is a November snapshot. It has kernel panics due to spinlock timeouts and sporadic deadlocks that go undetected.

-- Rod Grimes rgri...@freebsd.org
Re: svn commit: r313268 - head/sys/kern [through -r313271 for atomic_fcmpset use and later: fails on PowerMac G5 "Quad Core"; -r313266 works]
On 2017-Mar-2, at 7:19 AM, Steve Kargl wrote: On Thu, Mar 02, 2017 at 01:10:21PM +0100, Mateusz Guzik wrote: > On Wed, Mar 01, 2017 at 09:45:07AM -0800, Mark Millard wrote: >> >>> Summary of the transition interval: >>> >>> So for powerpc64 (and powerpc?) it is a good >>> idea to avoid anything that is after -r313254 >>> and before -r314474 in head. (Would this be >>> appropriate for an UPDATING notice given its >>> span?) >>> >>> There may be other architectures that might have >>> a similar status(?): the last fixes involved were >>> not in Machine Dependent code. (Some architectures >>> are apparently insensitive to the errors, such as >>> amd64). >>> >> >> When following current you are expected to be on the newest revision, >> so I don't think mentioning interim broken releases makes much sense. >> > > Documenting the range may aid those bisecting src/ to find a bug. > How is one to know that anything in the range that Mark points > out should be skipped on powerpc64? > > -- > Steve

I have tested with a TARGET_ARCH=powerpc -r314473 build and its kernel version has locking problems like TARGET_ARCH=powerpc64 does for that version.

[Note: This was run on a PowerMac G5 so-called "Quad Core" so most of the memory was ignored.]

Both TARGET_ARCH=powerpc64 and TARGET_ARCH=powerpc need -r314474 or later as of the new locking.

I've not explicitly tested other architectures. As I remember, armv6/v7 are classified as having some form of a weak memory model compared to the likes of amd64. If so, armv6/v7 might be candidates for having problems. There might be other candidates.

=== Mark Millard markmi at dsl-only.net
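To make the weak-memory-model point concrete, here is a minimal spin lock written with C11 `<stdatomic.h>`. It is a sketch that assumes a hypothetical `demo_spin` type. It is not FreeBSD's mtx(9) and not the code changed in these commits. It shows where acquire and release ordering must be requested explicitly. On amd64, the hardware's stronger (TSO) ordering tends to supply that ordering even when the code forgets to ask for it, which is one general reason ordering bugs can stay hidden there and show up on PowerPC or ARMv7.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical demo lock word: 0 = free, 1 = held. Not FreeBSD's mtx(9). */
typedef struct {
    atomic_uint word;
} demo_spin;

static void demo_spin_init(demo_spin *s)
{
    atomic_init(&s->word, 0u);
}

static bool demo_spin_trylock(demo_spin *s)
{
    unsigned expected = 0;
    /* Acquire on success: accesses in the critical section cannot be
     * performed before the lock is observed as taken. Weakly ordered CPUs
     * (PowerPC, ARMv7) need a barrier for this; on amd64 a locked CAS
     * already acts as a full barrier, which can mask a missing one. */
    return atomic_compare_exchange_strong_explicit(&s->word, &expected, 1u,
        memory_order_acquire, memory_order_relaxed);
}

static void demo_spin_lock(demo_spin *s)
{
    while (!demo_spin_trylock(s)) {
        /* Wait with cheap relaxed loads until the lock looks free. */
        while (atomic_load_explicit(&s->word, memory_order_relaxed) != 0)
            ;
    }
}

static void demo_spin_unlock(demo_spin *s)
{
    assert(atomic_load_explicit(&s->word, memory_order_relaxed) == 1);
    /* Release: stores made in the critical section become visible to the
     * next acquirer before it can observe the lock as free. */
    atomic_store_explicit(&s->word, 0u, memory_order_release);
}
```

Dropping `memory_order_acquire`/`memory_order_release` in favor of `memory_order_relaxed` would still pass a single-threaded test and would often appear to work on amd64. That is exactly why such mistakes tend to surface first on weaker architectures.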
Re: svn commit: r313268 - head/sys/kern [through -r313271 for atomic_fcmpset use and later: fails on PowerMac G5 "Quad Core"; -r313266 works]
On Thu, Mar 02, 2017 at 01:10:21PM +0100, Mateusz Guzik wrote: > On Wed, Mar 01, 2017 at 09:45:07AM -0800, Mark Millard wrote: > > > > Summary of the transition interval: > > > > So for powerpc64 (and powerpc?) it is a good > > idea to avoid anything that is after -r313254 > > and before -r314474 in head. (Would this be > > appropriate for an UPDATING notice given its > > span?) > > > > There may be other architectures that might have > > a similar status(?): the last fixes involved were > > not in Machine Dependent code. (Some architectures > > are apparently insensitive to the errors, such as > > amd64). > > > > When following current you are expected to be on the newest revision, > so I don't think mentioning interim broken releases makes much sense. >

Documenting the range may aid those bisecting src/ to find a bug. How is one to know that anything in the range that Mark points out should be skipped on powerpc64?

-- Steve
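As an illustration of Steve's point, a bisection helper can treat the window Mark summarized (after -r313254 and before -r314474) as untestable on powerpc64/powerpc and step around it. This is a hypothetical sketch and not part of any FreeBSD tool. The names `next_probe` and `in_known_bad_window` are made up, and the window bounds come from the summary in this thread.

```c
#include <assert.h>

/* Window summarized in this thread for powerpc64/powerpc head kernels:
 * after -r313254 and before -r314474. */
enum { BAD_FIRST = 313255, BAD_LAST = 314473 };

static int in_known_bad_window(long rev)
{
    return rev >= BAD_FIRST && rev <= BAD_LAST;
}

/* Hypothetical helper: choose the next svn revision to test strictly
 * between a known-good and a known-bad revision, stepping out of the
 * known-bad window when the midpoint lands inside it.
 * Returns 0 when no usable revision is left. */
static long next_probe(long good, long bad)
{
    assert(good < bad);
    if (bad - good < 2)
        return 0; /* adjacent revisions: bisection is finished */

    long mid = good + (bad - good) / 2;
    if (!in_known_bad_window(mid))
        return mid;
    if (BAD_FIRST - 1 > good)
        return BAD_FIRST - 1; /* last revision before the window */
    if (BAD_LAST + 1 < bad)
        return BAD_LAST + 1;  /* first revision after the window */
    return 0;                 /* only window revisions remain */
}
```

For example, bisecting between -r312761 (good) and -r313864 (bad) lands on -r313312 inside the window, so the helper proposes -r313254 instead. If the bug being hunted was itself introduced inside the window, this approach can only narrow it down to the window's edges.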
Re: svn commit: r313268 - head/sys/kern [through -r313271 for atomic_fcmpset use and later: fails on PowerMac G5 "Quad Core"; -r313266 works]
On Wed, Mar 01, 2017 at 09:45:07AM -0800, Mark Millard wrote: > > On 2017-Feb-28, at 10:13 PM, Mateusz Guzik wrote: > > On Sat, Feb 25, 2017 at 08:31:04PM +0100, Mateusz Guzik wrote: > >> On Sat, Feb 25, 2017 at 09:58:39AM -0800, Mark Millard wrote: > >>> Thus the PowerMac G5 so-called "Quad Core" is back to > >>> -r313254 without your patches. (The "Quad Core" really has > >>> two processors, each with 2 cores.) > >>> > >> > >> > >> Thanks a lot for testing. I'll have to think what to do with it, worst > >> case I'll #ifdef changes with powerpc. > >> > > > > Should be fixed with r314474. Got a real powerpc to test on (60 cores), > > was able to lock it up in seconds. Now it is perfectly stable. > > > > -- > > Mateusz Guzik > > The updated so-called "Quad Core" PowerMac G5 used for > TARGET_ARCH=powerpc64 was able to do a self-hosted > buildworld buildkernel for -r314479 just fine. > Cool. > Thanks much for the fixes: Now I can track head again > for powerpc64. > Well, it was my breakage to begin with. > > Summary of the transition interval: > > So for powerpc64 (and powerpc?) it is a good > idea to avoid anything that is after -r313254 > and before -r314474 in head. (Would this be > appropriate for an UPDATING notice given its > span?) > > There may be other architectures that might have > a similar status(?): the last fixes involved were > not in Machine Dependent code. (Some architectures > are apparently insensitive to the errors, such as > amd64). >

When following current you are expected to be on the newest revision, so I don't think mentioning interim broken releases makes much sense.

-- Mateusz Guzik
Re: svn commit: r313268 - head/sys/kern [through -r313271 for atomic_fcmpset use and later: fails on PowerMac G5 "Quad Core"; -r313266 works]
On 2017-Feb-28, at 10:13 PM, Mateusz Guzik wrote: On Sat, Feb 25, 2017 at 08:31:04PM +0100, Mateusz Guzik wrote: >> On Sat, Feb 25, 2017 at 09:58:39AM -0800, Mark Millard wrote: >>> Thus the PowerMac G5 so-called "Quad Core" is back to >>> -r313254 without your patches. (The "Quad Core" really has >>> two processors, each with 2 cores.) >>> >> >> >> Thanks a lot for testing. I'll have to think what to do with it, worst >> case I'll #ifdef changes with powerpc. >> > > Should be fixed with r314474. Got a real powerpc to test on (60 cores), > was able to lock it up in seconds. Now it is perfectly stable. > > -- > Mateusz Guzik

The updated so-called "Quad Core" PowerMac G5 used for TARGET_ARCH=powerpc64 was able to do a self-hosted buildworld buildkernel for -r314479 just fine.

Thanks much for the fixes: now I can track head again for powerpc64.

Summary of the transition interval:

So for powerpc64 (and powerpc?) it is a good idea to avoid anything that is after -r313254 and before -r314474 in head. (Would this be appropriate for an UPDATING notice given its span?)

There may be other architectures that might have a similar status(?): the last fixes involved were not in Machine Dependent code. (Some architectures are apparently insensitive to the errors, such as amd64.)

=== Mark Millard markmi at dsl-only.net
Re: svn commit: r313268 - head/sys/kern [through -r313271 for atomic_fcmpset use and later: fails on PowerMac G5 "Quad Core"; -r313266 works]
On Sat, Feb 25, 2017 at 08:31:04PM +0100, Mateusz Guzik wrote: > On Sat, Feb 25, 2017 at 09:58:39AM -0800, Mark Millard wrote: > > Thus the PowerMac G5 so-called "Quad Core" is back to > > -r313254 without your patches. (The "Quad Core" really has > > two processors, each with 2 cores.) > > > > > Thanks a lot for testing. I'll have to think what to do with it, worst > case I'll #ifdef changes with powerpc. >

Should be fixed with r314474. Got a real powerpc to test on (60 cores), was able to lock it up in seconds. Now it is perfectly stable.

-- Mateusz Guzik
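The `if (cpu_tick() % 2) return (0);` trick quoted further down in this thread forces fcmpset to fail at random, so lock loops get exercised on their failure path. Below is a minimal user-space model of that idea. It assumes hypothetical `demo_fcmpset`/`demo_lock` names and a lock word where 0 means unowned. An injected failure leaves the expected value unchanged, so a correct loop must look at that value again rather than treat every failure as contention. On LL/SC machines such as PowerPC, a compare-and-set can also fail for reasons other than a changed value (for example, a lost reservation). This is a sketch of the general pattern, not the r313996 or r314474 change.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

static unsigned demo_calls;     /* fcmpset attempts made */
static unsigned demo_contended; /* times the lock was found really owned */

/* fcmpset-style CAS: on a real failure *old receives the current value.
 * With inject set, odd-numbered calls (1st, 3rd, ...) fail without
 * touching *p or *old, mimicking the cpu_tick() % 2 trick. */
static bool demo_fcmpset(atomic_uint *p, unsigned *old, unsigned new_val,
    bool inject)
{
    demo_calls++;
    if (inject && demo_calls % 2 == 1)
        return false;
    return atomic_compare_exchange_strong(p, old, new_val);
}

static void demo_lock(atomic_uint *lock, unsigned tid, bool inject)
{
    unsigned v = 0; /* expect the lock to be unowned */

    for (;;) {
        if (demo_fcmpset(lock, &v, tid, inject))
            return;
        /* A failed fcmpset does not prove the lock is owned: check v. */
        if (v == 0)
            continue;          /* still unowned: just retry */
        demo_contended++;      /* v names a real owner */
        while ((v = atomic_load(lock)) != 0)
            ;                  /* spin until released, then retry */
    }
}

static void demo_unlock(atomic_uint *lock, unsigned tid)
{
    assert(atomic_load(lock) == tid);
    atomic_store(lock, 0u);
}
```

With injection enabled, the first attempt fails spuriously and the second succeeds, and `demo_contended` stays at zero. A loop that treated the failure as "owned" would take its contended path with a bogus owner value.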
Re: svn commit: r313268 - head/sys/kern [through -r313271 for atomic_fcmpset use and later: fails on PowerMac G5 "Quad Core"; -r313266 works]
On Sat, Feb 25, 2017 at 09:58:39AM -0800, Mark Millard wrote: > Thus the PowerMac G5 so-called "Quad Core" is back to > -r313254 without your patches. (The "Quad Core" really has > two processors, each with 2 cores.) >

Thanks a lot for testing. I'll have to think what to do with it, worst case I'll #ifdef changes with powerpc.

-- Mateusz Guzik
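Mateusz's interim workaround (the ppc.diff quoted later in the thread) is described as making powerpc use a cmpset wrapper instead of its own fcmpset primitive. Below is a sketch of what such a wrapper can look like, using C11 atomics. `demo_cmpset_int` and `demo_fcmpset_int` are hypothetical stand-ins modelled on the atomic(9) calling conventions. The actual patch contents are not shown in this thread.

```c
#include <assert.h>
#include <stdatomic.h>

/* Hypothetical user-space stand-ins modelled on atomic_cmpset_int() and
 * atomic_fcmpset_int() from atomic(9); not the kernel implementations. */

/* cmpset: if *p == old, store new_val and return nonzero; else return 0. */
static int demo_cmpset_int(atomic_uint *p, unsigned old, unsigned new_val)
{
    /* 'old' is a local copy, so the CAS failure write-back is discarded. */
    return atomic_compare_exchange_strong(p, &old, new_val);
}

/* fcmpset layered on cmpset: on failure, hand back a freshly loaded value
 * through *old so the caller can decide whether to retry. */
static int demo_fcmpset_int(atomic_uint *p, unsigned *old, unsigned new_val)
{
    assert(p != 0 && old != 0);
    if (demo_cmpset_int(p, *old, new_val))
        return 1;
    *old = atomic_load(p);
    return 0;
}
```

The value loaded after a failed cmpset may already be stale by the time the caller looks at it. That is acceptable only because callers are expected to treat it as a hint and re-examine or retry. The cost is an extra load on the failure path.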
Re: svn commit: r313268 - head/sys/kern [through -r313271 for atomic_fcmpset use and later: fails on PowerMac G5 "Quad Core"; -r313266 works]
On 2017-Feb-25, at 5:49 AM, Mark Millard wrote: > On 2017-Feb-25, at 1:05 AM, Mark Millard wrote: > >> On 2017-Feb-24, at 11:46 PM, Mark Millard wrote: >> >>> On 2017-Feb-24, at 8:25 PM, Mark Millard wrote: >>> On 2017-Feb-24, at 4:23 PM, Mateusz Guzik wrote: > > On Tue, Feb 21, 2017 at 01:37:25AM -0800, Mark Millard wrote: >> [Back to the powerpc64 context.] >> >> On 2017-Feb-20, at 11:10 AM, Mateusz Guzik wrote: >> >>> On Sat, Feb 18, 2017 at 04:18:05AM -0800, Mark Millard wrote: [Note: I experiment with clang based powerpc64 builds, reporting problems that I find. Justin is familiar with this, as is Nathan.] I tried to update the PowerMac G5 (a so-called "Quad Core") that I have access to from head -r312761 to -r313864 and ended up with random panics and hang-ups in fairly short order after booting. Some approximate bisecting for the kernel led to: (sometimes getting part way into a buildkernel attempt for a different version before a failure happens) -r313266: works (just before use of atomic_fcmpset) vs. -r313271: fails (last of the "use atomic_fcmpset" check-ins) (I did not try -r313268 through -r313270 as the use was gradually added.) So I'm currently running a -r313864 world with a -r313266 kernel. No kernel that I tried that was from before -r313266 had the problems. Any kernel that I tried that was from after -r313271 had the problems. Of course I did not try them all in the other direction. :) >>> >>> I found that spin mutexes were not properly handling this, fixed in >>> r313996. >>> >>> Locally I added an if (cpu_tick() % 2) return (0); snippet to amd64 >>> fcmpset to simulate failures. Everything works, while it would easily >>> fail without the patch. >>> >>> That said, I hope this concludes the 'missing check for not-reread value >>> of failed fcmpset' saga. >>> >>> -- >>> Mateusz Guzik >> >> -r313999 is an improvement for powerpc64: it boots and I can >> log in on the old PowerMac G5 so-called "Quad Core".
>> >> But, e.g., buildworld buildkernel eventually hangs and later >> the powerpc64 panics for "spin lock held too long". >> > > Alright, play time is over. > > Can you please: > 1. verify r313254 is stable for you > 2. apply https://people.freebsd.org/~mjg/patches/complete-locks.diff and > https://people.freebsd.org/~mjg/.junk/ppc.diff on top of it and retry > the test? > > This is a workaround which effectively disables the powerpc-specific > primitive and makes it use a cmpset wrapper instead. I don't have the > hardware to test right now and my attempts to boot in qemu also failed. > > That said, it does not look like there are general fcmpset bugs left and > the remaining issue seems powerpc-specific. > > If this works, I'll commit the workaround for the time being as in a few > weeks I'd like to start merging the work back to stable/11. > > -- > Mateusz Guzik I've started a self-hosted powerpc64 -r313254 build based on running the -r313266 kernel. (The context I sometimes do cross builds in is tied up with other things. -r313266 is what my prior bisection came up with as the last apparently-working kernel at the time.) So it will be a while before I have a -r313254 in place to try: the self-hosted build takes longer and so will not be installed for a while. To judge stability I'll probably have -r313254 build the patched update that you want me to test, initially doing a cleanworld. So that too will take a while. (The above wording presumes all goes well.) I'll let you know as I go along if I run into anything interesting. My builds are rebuilding both world and kernel since what turns into /usr/include/sys/* has changes in your patch. The builds are without MALLOC_PRODUCTION but are otherwise not debug builds. I've not seen anything indicating that anyone has been trying TARGET_ARCH=powerpc. I've been trying TARGET_ARCH=powerpc64.
While I do not have access to a true TARGET_ARCH=powerpc machine currently, such a build can be used on a PowerMac G5 so-called "Quad Core". So I could eventually build and try such on the one powerpc family machine that I currently have access to. clang 3.9.1 has a significant code generation problem for TARGET_ARCH=powerpc and so I'd have to use a gcc 4.2.1 based build for that sort
Re: svn commit: r313268 - head/sys/kern [through -r313271 for atomic_fcmpset use and later: fails on PowerMac G5 "Quad Core"; -r313266 works]
On 2017-Feb-25, at 1:05 AM, Mark Millard wrote: > On 2017-Feb-24, at 11:46 PM, Mark Millard wrote: > >> On 2017-Feb-24, at 8:25 PM, Mark Millard wrote: >> >>> On 2017-Feb-24, at 4:23 PM, Mateusz Guzik wrote: On Tue, Feb 21, 2017 at 01:37:25AM -0800, Mark Millard wrote: > [Back to the powerpc64 context.] > > On 2017-Feb-20, at 11:10 AM, Mateusz Guzik wrote: > >> On Sat, Feb 18, 2017 at 04:18:05AM -0800, Mark Millard wrote: >>> [Note: I experiment with clang based powerpc64 builds, >>> reporting problems that I find. Justin is familiar >>> with this, as is Nathan.] >>> >>> I tried to update the PowerMac G5 (a so-called "Quad Core") >>> that I have access to from head -r312761 to -r313864 and >>> ended up with random panics and hang-ups in fairly short >>> order after booting. >>> >>> Some approximate bisecting for the kernel led to: >>> (sometimes getting part way into a buildkernel attempt >>> for a different version before a failure happens) >>> >>> -r313266: works (just before use of atomic_fcmpset) >>> vs. >>> -r313271: fails (last of the "use atomic_fcmpset" check-ins) >>> >>> (I did not try -r313268 through -r313270 as the use was >>> gradually added.) >>> >>> So I'm currently running a -r313864 world with a -r313266 >>> kernel. >>> >>> No kernel that I tried that was from before -r313266 had the >>> problems. >>> >>> Any kernel that I tried that was from after -r313271 had the >>> problems. >>> >>> Of course I did not try them all in the other direction. :) >>> >> >> I found that spin mutexes were not properly handling this, fixed in >> r313996. >> >> Locally I added an if (cpu_tick() % 2) return (0); snippet to amd64 >> fcmpset to simulate failures. Everything works, while it would easily >> fail without the patch. >> >> That said, I hope this concludes the 'missing check for not-reread value >> of failed fcmpset' saga. >> >> -- >> Mateusz Guzik > > -r313999 is an improvement for powerpc64: it boots and I can > log in on the old PowerMac G5 so-called "Quad Core".
> > But, e.g., buildworld buildkernel eventually hangs and later > > the powerpc64 panics for "spin lock held too long". > Alright, play time is over. Can you please: 1. verify r313254 is stable for you 2. apply https://people.freebsd.org/~mjg/patches/complete-locks.diff and https://people.freebsd.org/~mjg/.junk/ppc.diff on top of it and retry the test? This is a workaround which effectively disables the powerpc-specific primitive and makes it use a cmpset wrapper instead. I don't have the hardware to test right now and my attempts to boot in qemu also failed. That said, it does not look like there are general fcmpset bugs left and the remaining issue seems powerpc-specific. If this works, I'll commit the workaround for the time being as in a few weeks I'd like to start merging the work back to stable/11. -- Mateusz Guzik >>> >>> I've started a self-hosted powerpc64 -r313254 build >>> based on running the -r313266 kernel. (The context >>> I sometimes do cross builds in is tied up with other >>> things. -r313266 is what my prior bisection came up >>> with as the last apparently-working kernel at the >>> time.) >>> >>> So it will be a while before I have a -r313254 in >>> place to try: the self-hosted build takes longer >>> and so will not be installed for a while. >>> >>> To judge stability I'll probably have -r313254 build >>> the patched update that you want me to test, initially >>> doing a cleanworld. So that too will take a while. >>> >>> (The above wording presumes all goes well.) >>> >>> I'll let you know as I go along if I run into anything >>> interesting. >>> >>> >>> My builds are rebuilding both world and kernel since >>> what turns into /usr/include/sys/* has changes in your >>> patch. >>> >>> The builds are without MALLOC_PRODUCTION but are >>> otherwise not debug builds. >>> >>> >>> I've not seen anything indicating that anyone has >>> been trying TARGET_ARCH=powerpc. I've been trying >>> TARGET_ARCH=powerpc64.
>>> >>> While I do not have access to a true >>> TARGET_ARCH=powerpc machine currently, such a build >>> can be used on a PowerMac G5 so-called "Quad Core". >>> So I could eventually build and try such on the one >>> powerpc family machine that I currently have access >>> to. >>> >>> clang 3.9.1 has a significant code generation problem >>> for TARGET_ARCH=powerpc and so I'd have to use >>> a gcc 4.2.1 based build for that sort of experiment. >>> (There is no xtoolchain for 32-bit powerpc.) >>> >>> I use clang 3.9.1 or xtoolchain for >>> TARGET_ARCH=powerpc64 and have been using clang 3.9.1 >>> in recent times. My primary
Re: svn commit: r313268 - head/sys/kern [through -r313271 for atomic_fcmpset use and later: fails on PowerMac G5 "Quad Core"; -r313266 works]
On 2017-Feb-24, at 11:46 PM, Mark Millard wrote: > On 2017-Feb-24, at 8:25 PM, Mark Millard wrote: > >> On 2017-Feb-24, at 4:23 PM, Mateusz Guzik wrote: >>> >>> On Tue, Feb 21, 2017 at 01:37:25AM -0800, Mark Millard wrote: [Back to the powerpc64 context.] On 2017-Feb-20, at 11:10 AM, Mateusz Guzik wrote: > On Sat, Feb 18, 2017 at 04:18:05AM -0800, Mark Millard wrote: >> [Note: I experiment with clang based powerpc64 builds, >> reporting problems that I find. Justin is familiar >> with this, as is Nathan.] >> >> I tried to update the PowerMac G5 (a so-called "Quad Core") >> that I have access to from head -r312761 to -r313864 and >> ended up with random panics and hang-ups in fairly short >> order after booting. >> >> Some approximate bisecting for the kernel led to: >> (sometimes getting part way into a buildkernel attempt >> for a different version before a failure happens) >> >> -r313266: works (just before use of atomic_fcmpset) >> vs. >> -r313271: fails (last of the "use atomic_fcmpset" check-ins) >> >> (I did not try -r313268 through -r313270 as the use was >> gradually added.) >> >> So I'm currently running a -r313864 world with a -r313266 >> kernel. >> >> No kernel that I tried that was from before -r313266 had the >> problems. >> >> Any kernel that I tried that was from after -r313271 had the >> problems. >> >> Of course I did not try them all in the other direction. :) >> > > I found that spin mutexes were not properly handling this, fixed in > r313996. > > Locally I added an if (cpu_tick() % 2) return (0); snippet to amd64 > fcmpset to simulate failures. Everything works, while it would easily > fail without the patch. > > That said, I hope this concludes the 'missing check for not-reread value > of failed fcmpset' saga. > > -- > Mateusz Guzik -r313999 is an improvement for powerpc64: it boots and I can log in on the old PowerMac G5 so-called "Quad Core".
But, e.g., buildworld buildkernel eventually hangs and later the powerpc64 panics for "spin lock held too long". >>> >>> Alright, play time is over. >>> >>> Can you please: >>> 1. verify r313254 is stable for you >>> 2. apply https://people.freebsd.org/~mjg/patches/complete-locks.diff and >>> https://people.freebsd.org/~mjg/.junk/ppc.diff on top of it and retry >>> the test? >>> >>> This is a workaround which effectively disables the powerpc-specific >>> primitive and makes it use a cmpset wrapper instead. I don't have the >>> hardware to test right now and my attempts to boot in qemu also failed. >>> >>> That said, it does not look like there are general fcmpset bugs left and >>> the remaining issue seems powerpc-specific. >>> >>> If this works, I'll commit the workaround for the time being as in a few >>> weeks I'd like to start merging the work back to stable/11. >>> >>> -- >>> Mateusz Guzik >> >> I've started a self-hosted powerpc64 -r313254 build >> based on running the -r313266 kernel. (The context >> I sometimes do cross builds in is tied up with other >> things. -r313266 is what my prior bisection came up >> with as the last apparently-working kernel at the >> time.) >> >> So it will be a while before I have a -r313254 in >> place to try: the self-hosted build takes longer >> and so will not be installed for a while. >> >> To judge stability I'll probably have -r313254 build >> the patched update that you want me to test, initially >> doing a cleanworld. So that too will take a while. >> >> (The above wording presumes all goes well.) >> >> I'll let you know as I go along if I run into anything >> interesting. >> >> >> My builds are rebuilding both world and kernel since >> what turns into /usr/include/sys/* has changes in your >> patch. >> >> The builds are without MALLOC_PRODUCTION but are >> otherwise not debug builds. >> >> >> I've not seen anything indicating that anyone has >> been trying TARGET_ARCH=powerpc. I've been trying >> TARGET_ARCH=powerpc64.
>> >> While I do not have access to a true >> TARGET_ARCH=powerpc machine currently, such a build >> can be used on a PowerMac G5 so-called "Quad Core". >> So I could eventually build and try such on the one >> powerpc family machine that I currently have access >> to. >> >> clang 3.9.1 has a significant code generation problem >> for TARGET_ARCH=powerpc and so I'd have to use >> a gcc 4.2.1 based build for that sort of experiment. >> (There is no xtoolchain for 32-bit powerpc.) >> >> I use clang 3.9.1 or xtoolchain for >> TARGET_ARCH=powerpc64 and have been using clang 3.9.1 >> in recent times. My primary powerpc family use has >> been to experiment with building based on the >> modern libc++ and reporting issues discovered in the >> attempts. This explains the clang/xtoolchain context. >> >> clang 3.9.1 has
Re: svn commit: r313268 - head/sys/kern [through -r313271 for atomic_fcmpset use and later: fails on PowerMac G5 "Quad Core"; -r313266 works]
On 2017-Feb-24, at 8:25 PM, Mark Millard wrote: > On 2017-Feb-24, at 4:23 PM, Mateusz Guzik wrote: >> >> On Tue, Feb 21, 2017 at 01:37:25AM -0800, Mark Millard wrote: >>> [Back to the powerpc64 context.] >>> >>> On 2017-Feb-20, at 11:10 AM, Mateusz Guzik wrote: >>> On Sat, Feb 18, 2017 at 04:18:05AM -0800, Mark Millard wrote: > [Note: I experiment with clang based powerpc64 builds, > reporting problems that I find. Justin is familiar > with this, as is Nathan.] > > I tried to update the PowerMac G5 (a so-called "Quad Core") > that I have access to from head -r312761 to -r313864 and > ended up with random panics and hang-ups in fairly short > order after booting. > > Some approximate bisecting for the kernel led to: > (sometimes getting part way into a buildkernel attempt > for a different version before a failure happens) > > -r313266: works (just before use of atomic_fcmpset) > vs. > -r313271: fails (last of the "use atomic_fcmpset" check-ins) > > (I did not try -r313268 through -r313270 as the use was > gradually added.) > > So I'm currently running a -r313864 world with a -r313266 > kernel. > > No kernel that I tried that was from before -r313266 had the > problems. > > Any kernel that I tried that was from after -r313271 had the > problems. > > Of course I did not try them all in the other direction. :) > I found that spin mutexes were not properly handling this, fixed in r313996. Locally I added an if (cpu_tick() % 2) return (0); snippet to amd64 fcmpset to simulate failures. Everything works, while it would easily fail without the patch. That said, I hope this concludes the 'missing check for not-reread value of failed fcmpset' saga. -- Mateusz Guzik >>> >>> -r313999 is an improvement for powerpc64: it boots and I can >>> log in on the old PowerMac G5 so-called "Quad Core". >>> >>> But, e.g., buildworld buildkernel eventually hangs and later >>> the powerpc64 panics for "spin lock held too long". >>> >> >> Alright, play time is over. >> >> Can you please: >> 1.
verify r313254 is stable for you >> 2. apply https://people.freebsd.org/~mjg/patches/complete-locks.diff and >> https://people.freebsd.org/~mjg/.junk/ppc.diff on top of it and retry >> the test? >> >> This is a workaround which effectively disables the powerpc-specific >> primitive and makes it use a cmpset wrapper instead. I don't have the >> hardware to test right now and my attempts to boot in qemu also failed. >> >> That said, it does not look like there are general fcmpset bugs left and >> the remaining issue seems powerpc-specific. >> >> If this works, I'll commit the workaround for the time being as in a few >> weeks I'd like to start merging the work back to stable/11. >> >> -- >> Mateusz Guzik > > I've started a self-hosted powerpc64 -r313254 build > based on running the -r313266 kernel. (The context > I sometimes do cross builds in is tied up with other > things. -r313266 is what my prior bisection came up > with as the last apparently-working kernel at the > time.) > > So it will be a while before I have a -r313254 in > place to try: the self-hosted build takes longer > and so will not be installed for a while. > > To judge stability I'll probably have -r313254 build > the patched update that you want me to test, initially > doing a cleanworld. So that too will take a while. > > (The above wording presumes all goes well.) > > I'll let you know as I go along if I run into anything > interesting. > > > My builds are rebuilding both world and kernel since > what turns into /usr/include/sys/* has changes in your > patch. > > The builds are without MALLOC_PRODUCTION but are > otherwise not debug builds. > > > I've not seen anything indicating that anyone has > been trying TARGET_ARCH=powerpc. I've been trying > TARGET_ARCH=powerpc64. > > While I do not have access to a true > TARGET_ARCH=powerpc machine currently, such a build > can be used on a PowerMac G5 so-called "Quad Core".
> So I could eventually build and try such on the one > powerpc family machine that I currently have access > to. > > clang 3.9.1 has a significant code generation problem > for TARGET_ARCH=powerpc and so I'd have to use > a gcc 4.2.1 based build for that sort of experiment. > (There is no xtoolchain for 32-bit powerpc.) > > I use clang 3.9.1 or xtoolchain for > TARGET_ARCH=powerpc64 and have been using clang 3.9.1 > in recent times. My primary powerpc family use has > been to experiment with building based on the > modern libc++ and reporting issues discovered in the > attempts. This explains the clang/xtoolchain context. > > clang 3.9.1 has major problems for C++ exception > handling for both powerpc64 and powerpc but a > lot of FreeBSD is independent of throwing C++ > exceptions. By contrast xtoolchain-based works > for C++
Re: svn commit: r313268 - head/sys/kern [through -r313271 for atomic_fcmpset use and later: fails on PowerMac G5 "Quad Core"; -r313266 works]
On 2017-Feb-24, at 4:23 PM, Mateusz Guzik wrote:
>
> On Tue, Feb 21, 2017 at 01:37:25AM -0800, Mark Millard wrote:
>> [Back to the powerpc64 context.]
>>
>> On 2017-Feb-20, at 11:10 AM, Mateusz Guzik wrote:
>>
>>> On Sat, Feb 18, 2017 at 04:18:05AM -0800, Mark Millard wrote:
[Note: I experiment with clang based powerpc64 builds, reporting problems that I find. Justin is familiar with this, as is Nathan.]

I tried to update the PowerMac G5 (a so-called "Quad Core") that I have access to from head -r312761 to -r313864 and ended up with random panics and hang ups in fairly short order after booting.

Some approximate bisecting for the kernel led to: (sometimes getting part way into a buildkernel attempt for a different version before a failure happens)

-r313266: works (just before use of atomic_fcmpset)
vs.
-r313271: fails (last of the "use atomic_fcmpset" check-ins)

(I did not try -r313268 through -r313270 as the use was gradually added.)

So I'm currently running a -r313864 world with a -r313266 kernel.

No kernel that I tried that was from before -r313266 had the problems.

Any kernel that I tried that was from after -r313271 had the problems.

Of course I did not try them all in the other direction. :)
>>>
>>> I found that spin mutexes were not properly handling this, fixed in
>>> r313996.
>>>
>>> Locally I added an if (cpu_tick() % 2) return (0); snippet to amd64
>>> fcmpset to simulate failures. Everything works, while it would easily
>>> fail without the patch.
>>>
>>> That said, I hope this concludes the 'missing check for not-reread value
>>> of failed fcmpset' saga.
>>>
>>> --
>>> Mateusz Guzik
>>
>> -r313999 is an improvement for powerpc64: it boots and I can
>> log in on the old PowerMac G5 so-called "Quad Core".
>>
>> But, e.g., buildworld buildkernel eventually hangs and later
>> the powerpc64 panics for "spin lock held too long".
>>
>
> All right, play time is over.
>
> Can you please:
> 1. verify r313254 is stable for you
> 2. apply https://people.freebsd.org/~mjg/patches/complete-locks.diff and
> https://people.freebsd.org/~mjg/.junk/ppc.diff on top of it and retry
> the test?
>
> This is a workaround which effectively disables the powerpc-specific
> primitive and makes it use a cmpset wrapper instead. I don't have the
> hardware to test right now and my attempts to boot in qemu also failed.
>
> That said, it does not look like there are general fcmpset bugs left and
> the remaining issue seems powerpc-specific.
>
> If this works, I'll commit the workaround for the time being as in a few
> weeks I'd like to start merging the work back to stable/11.
>
> --
> Mateusz Guzik

I've started a self-hosted powerpc64 -r313254 build based on running the -r313266 kernel. (The context I sometimes do cross builds in is tied up with other things. -r313266 is what my prior bisection came up with as the last apparently-working kernel at the time.)

So it will be a while before I have a -r313254 in place to try: the self-hosted build takes longer and so will not be installed for a while.

To judge stability I'll probably have -r313254 build the patched update that you want me to test, initially doing a cleanworld. So that too will take a while.

(The above wording presumes all goes well.)

I'll let you know as I go along if I run into anything interesting.

My builds are rebuilding both world and kernel since what turns into /usr/include/sys/* has changes in your patch.

The builds are without MALLOC_PRODUCTION but are otherwise not debug builds.

I've not seen anything indicating that anyone has been trying TARGET_ARCH=powerpc. I've been trying TARGET_ARCH=powerpc64 .

While I do not have access to a true TARGET_ARCH=powerpc machine currently, such a build can be used on a PowerMac G5 so-called "Quad Core". So I could eventually build and try such on the one powerpc family machine that I currently have access to.

clang 3.9.1 has a significant code generation problem for TARGET_ARCH=powerpc and so I'd have to use a gcc 4.2.1 based build for that sort of experiment. (There is no xtoolchain for 32-bit powerpc.)

I use clang 3.9.1 or xtoolchain for TARGET_ARCH=powerpc64 and have been using clang 3.9.1 in recent times. My primary powerpc family use has been to experiment with building based on the modern libc++ and reporting issues discovered in the attempts. This explains the clang/xtoolchain context.

clang 3.9.1 has major problems for C++ exception handling for both powerpc64 and powerpc but a lot of FreeBSD is independent of throwing C++ exceptions. By contrast xtoolchain-based works for C++ exception handling but lib32 fails to operate when built by a xtoolchain build.

===
Mark Millard
markmi at dsl-only.net
Re: svn commit: r313268 - head/sys/kern [through -r313271 for atomic_fcmpset use and later: fails on PowerMac G5 "Quad Core"; -r313266 works]
On Tue, Feb 21, 2017 at 01:37:25AM -0800, Mark Millard wrote:
> [Back to the powerpc64 context.]
>
> On 2017-Feb-20, at 11:10 AM, Mateusz Guzik wrote:
>
> > On Sat, Feb 18, 2017 at 04:18:05AM -0800, Mark Millard wrote:
> >> [Note: I experiment with clang based powerpc64 builds,
> >> reporting problems that I find. Justin is familiar
> >> with this, as is Nathan.]
> >>
> >> I tried to update the PowerMac G5 (a so-called "Quad Core")
> >> that I have access to from head -r312761 to -r313864 and
> >> ended up with random panics and hang ups in fairly short
> >> order after booting.
> >>
> >> Some approximate bisecting for the kernel led to:
> >> (sometimes getting part way into a buildkernel attempt
> >> for a different version before a failure happens)
> >>
> >> -r313266: works (just before use of atomic_fcmpset)
> >> vs.
> >> -r313271: fails (last of the "use atomic_fcmpset" check-ins)
> >>
> >> (I did not try -r313268 through -r313270 as the use was
> >> gradually added.)
> >>
> >> So I'm currently running a -r313864 world with a -r313266
> >> kernel.
> >>
> >> No kernel that I tried that was from before -r313266 had the
> >> problems.
> >>
> >> Any kernel that I tried that was from after -r313271 had the
> >> problems.
> >>
> >> Of course I did not try them all in the other direction. :)
> >>
> >
> > I found that spin mutexes were not properly handling this, fixed in
> > r313996.
> >
> > Locally I added an if (cpu_tick() % 2) return (0); snippet to amd64
> > fcmpset to simulate failures. Everything works, while it would easily
> > fail without the patch.
> >
> > That said, I hope this concludes the 'missing check for not-reread value
> > of failed fcmpset' saga.
> >
> > --
> > Mateusz Guzik
>
> -r313999 is an improvement for powerpc64: it boots and I can
> log in on the old PowerMac G5 so-called "Quad Core".
>
> But, e.g., buildworld buildkernel eventually hangs and later
> the powerpc64 panics for "spin lock held too long".
>

All right, play time is over.

Can you please:
1. verify r313254 is stable for you
2. apply https://people.freebsd.org/~mjg/patches/complete-locks.diff and
https://people.freebsd.org/~mjg/.junk/ppc.diff on top of it and retry
the test?

This is a workaround which effectively disables the powerpc-specific
primitive and makes it use a cmpset wrapper instead. I don't have the
hardware to test right now and my attempts to boot in qemu also failed.

That said, it does not look like there are general fcmpset bugs left and
the remaining issue seems powerpc-specific.

If this works, I'll commit the workaround for the time being as in a few
weeks I'd like to start merging the work back to stable/11.

--
Mateusz Guzik
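The workaround above replaces the powerpc-specific fcmpset primitive with a cmpset wrapper. The contents of ppc.diff are not reproduced in this thread, so the following is only a sketch of how such a wrapper can be built, assuming C11 atomics stand in for the kernel primitives; cmpset_u and fcmpset_via_cmpset are hypothetical names. It illustrates one property worth keeping in mind: after a failed cmpset the value handed back is obtained by a separate load, so it may already be out of date, and callers should treat any failure simply as "retry".

```c
#include <assert.h>
#include <stdatomic.h>

/*
 * Hypothetical cmpset: store newval only if *dst == old, and report
 * success or failure; no value is handed back to the caller.
 */
static int
cmpset_u(_Atomic unsigned *dst, unsigned old, unsigned newval)
{
	/* The local copy of old may be overwritten; it is discarded. */
	return (atomic_compare_exchange_strong(dst, &old, newval));
}

/*
 * fcmpset built on top of cmpset.  On failure *dst is reread with a
 * separate load, so the value stored in *expected can be stale by the
 * time the caller looks at it (it can even equal the old *expected if
 * the word changed back in between).
 */
static int
fcmpset_via_cmpset(_Atomic unsigned *dst, unsigned *expected, unsigned newval)
{
	if (cmpset_u(dst, *expected, newval))
		return (1);
	*expected = atomic_load(dst);
	return (0);
}
```

For example, with the word holding 5 and an expected value of 3, a first call fails and refreshes the expected value to 5; a second call with that refreshed value then succeeds and stores the new value.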
Re: svn commit: r313268 - head/sys/kern [through -r313271 for atomic_fcmpset use and later: fails on PowerMac G5 "Quad Core"; -r313266 works]
[Back to the powerpc64 context.]

On 2017-Feb-20, at 11:10 AM, Mateusz Guzik wrote:

> On Sat, Feb 18, 2017 at 04:18:05AM -0800, Mark Millard wrote:
>> [Note: I experiment with clang based powerpc64 builds,
>> reporting problems that I find. Justin is familiar
>> with this, as is Nathan.]
>>
>> I tried to update the PowerMac G5 (a so-called "Quad Core")
>> that I have access to from head -r312761 to -r313864 and
>> ended up with random panics and hang ups in fairly short
>> order after booting.
>>
>> Some approximate bisecting for the kernel led to:
>> (sometimes getting part way into a buildkernel attempt
>> for a different version before a failure happens)
>>
>> -r313266: works (just before use of atomic_fcmpset)
>> vs.
>> -r313271: fails (last of the "use atomic_fcmpset" check-ins)
>>
>> (I did not try -r313268 through -r313270 as the use was
>> gradually added.)
>>
>> So I'm currently running a -r313864 world with a -r313266
>> kernel.
>>
>> No kernel that I tried that was from before -r313266 had the
>> problems.
>>
>> Any kernel that I tried that was from after -r313271 had the
>> problems.
>>
>> Of course I did not try them all in the other direction. :)
>>
>
> I found that spin mutexes were not properly handling this, fixed in
> r313996.
>
> Locally I added an if (cpu_tick() % 2) return (0); snippet to amd64
> fcmpset to simulate failures. Everything works, while it would easily
> fail without the patch.
>
> That said, I hope this concludes the 'missing check for not-reread value
> of failed fcmpset' saga.
>
> --
> Mateusz Guzik

-r313999 is an improvement for powerpc64: it boots and I can
log in on the old PowerMac G5 so-called "Quad Core".

But, e.g., buildworld buildkernel eventually hangs and later
the powerpc64 panics for "spin lock held too long".

===
Mark Millard
markmi at dsl-only.net
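One way a locking loop can end in a "spin lock held too long" panic is by waiting on a copy of the lock word that is never reread, which is the kind of mistake the "missing check for not-reread value" remark refers to. The sketch below is a single-threaded simulation, not the FreeBSD spin mutex code: struct sim_lock, sim_read(), wait_free_stale() and wait_free_reread() are hypothetical, and the simulated owner "releases" the lock after it has been read a fixed number of times. The stale variant never observes the release and exhausts its spin budget; the variant that rereads on every iteration sees the lock become free.

```c
#include <assert.h>
#include <stdbool.h>

#define LK_UNOWNED 0u	/* hypothetical "free" value of the lock word */

/*
 * Hypothetical stand-in for a contended lock word: the "owner" releases
 * it once it has been read release_after times.
 */
struct sim_lock {
	unsigned owner;
	int	 reads;
	int	 release_after;
};

static unsigned
sim_read(struct sim_lock *lk)
{
	if (++lk->reads >= lk->release_after)
		lk->owner = LK_UNOWNED;
	return (lk->owner);
}

/* Buggy: reads the lock word once, then keeps testing the stale copy. */
static bool
wait_free_stale(struct sim_lock *lk, long budget)
{
	unsigned v = sim_read(lk);

	while (v != LK_UNOWNED) {
		if (--budget <= 0)
			return (false);	/* kernel analogue: "spin lock held too long" */
	}
	return (true);
}

/* Fixed: rereads the lock word on every iteration. */
static bool
wait_free_reread(struct sim_lock *lk, long budget)
{
	unsigned v = sim_read(lk);

	while (v != LK_UNOWNED) {
		if (--budget <= 0)
			return (false);
		v = sim_read(lk);
	}
	return (true);
}
```

With release_after set to 3, the stale variant reads the word once and gives up when its budget runs out, while the rereading variant succeeds on its third read.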
Re: svn commit: r313268 - head/sys/kern [through -r313271 for atomic_fcmpset use and later: fails on PowerMac G5 "Quad Core"; -r313266 works]
On Mon, Feb 20, 2017 at 09:39:32PM -0800, Mark Millard wrote:
> Looks like some kernel binary interface (as seen by
> emulators/virtualbox-ose-addition ) has changed:
> rebuilding emulators/virtualbox-ose-addition removed
> the booting crash but uname -apKU still lists 1200021
> and 2100021 for the kernel and world for -r313999,
> just like for -r313864.
>

I think this is r313992.

I don't see why __FreeBSD_version would be modified for this. You are
expected to always recompile your modules while tracking -current.

--
Mateusz Guzik
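The advice to always recompile modules follows from the fact that an out-of-tree module such as vboxguest.ko bakes in whatever the kernel headers defined when it was built (structure layouts, macros), so a header change can break it even when __FreeBSD_version stays the same. The thread does not say which change in r313992 mattered; the two structures below are purely hypothetical and only show how adding a field moves the offset that a stale module would still use.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical layout that an old module binary was compiled against. */
struct obj_v1 {
	int	 flags;
	void	*lock;
};

/* Hypothetical layout after a kernel change adds a leading field. */
struct obj_v2 {
	void	*added;
	int	 flags;
	void	*lock;
};

/* Offset of "lock" as seen by code built against each layout. */
static size_t
lock_offset_v1(void)
{
	return (offsetof(struct obj_v1, lock));
}

static size_t
lock_offset_v2(void)
{
	return (offsetof(struct obj_v2, lock));
}
```

Code compiled against obj_v1 would keep dereferencing lock at the smaller offset, reading some other field of the new layout; no version number check catches that unless the version is bumped.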
Re: svn commit: r313268 - head/sys/kern [through -r313271 for atomic_fcmpset use and later: fails on PowerMac G5 "Quad Core"; -r313266 works]
On 2017-Feb-20, at 6:36 PM, Mark Millard wrote:
> On 2017-Feb-20, at 3:35 PM, Mateusz Guzik wrote:
>
>> On Mon, Feb 20, 2017 at 03:10:44PM -0800, Mark Millard wrote:
>>> On 2017-Feb-20, at 2:58 PM, Mark Millard wrote:
>>> On 2017-Feb-20, at 11:10 AM, Mateusz Guzik wrote:
> On Sat, Feb 18, 2017 at 04:18:05AM -0800, Mark Millard wrote:
>> [Note: I experiment with clang based powerpc64 builds,
>> reporting problems that I find. Justin is familiar
>> with this, as is Nathan.]
>>
>> I tried to update the PowerMac G5 (a so-called "Quad Core")
>> that I have access to from head -r312761 to -r313864 and
>> ended up with random panics and hang ups in fairly short
>> order after booting.
>>
>> Some approximate bisecting for the kernel led to:
>> (sometimes getting part way into a buildkernel attempt
>> for a different version before a failure happens)
>>
>> -r313266: works (just before use of atomic_fcmpset)
>> vs.
>> -r313271: fails (last of the "use atomic_fcmpset" check-ins)
>>
>> (I did not try -r313268 through -r313270 as the use was
>> gradually added.)
>>
>> So I'm currently running a -r313864 world with a -r313266
>> kernel.
>>
>> No kernel that I tried that was from before -r313266 had the
>> problems.
>>
>> Any kernel that I tried that was from after -r313271 had the
>> problems.
>>
>> Of course I did not try them all in the other direction. :)
>>
>
> I found that spin mutexes were not properly handling this, fixed in
> r313996.
>
> Locally I added an if (cpu_tick() % 2) return (0); snippet to amd64
> fcmpset to simulate failures. Everything works, while it would easily
> fail without the patch.
>
> That said, I hope this concludes the 'missing check for not-reread value
> of failed fcmpset' saga.
>
> --
> Mateusz Guzik
I tried to update from -r313864 to -r313999 in my amd64 context (a VirtualBox machine under macOS) but it now crashes late in the boot sequence (after it processes a dump if I make one but before I can log in).

This update was via my usual explicit svnlite update; buildworld buildkernel; etc. production style build of world and kernel, including use of MALLOC_PRODUCTION.

The window shows:

_vm_map_lock+0xf
vm_map_wire+0x32
rtROMemObjNativeLockInMap+0x8c
rtROMemObjNativeLockUser+0x51
RTR0MemObjLockUserTag+0x231
vbglR0HGCMInternalPreprocessCall+0x65d
vbglR0HGCMInternalCall+0x17c
vgdrvIoCtl_HGCMCall+0x43f
VGDrvCommonIoCtl+0x261
vgdrvFreeBSDIOCtl+0x2cd
devfs_ioctl+0xae
VOP_IOCTL_APV+0x88
vn_ioctl+0x161
devfs_ioctl_f+0x1f
kern_ioctl+0x280
sys_ioctl+0x13f
amd64_syscall+0x397
Xfast_syscall+0xfb
>>>
>>> More detail from booting with the -r313864 kernel.old
>>> and using kgdb on what the dump produced:
>>>
>>> # kgdb kernel.debug /var/crash/vmcore.
>>> /var/crash/vmcore.0/var/crash/vmcore.last
>>> # kgdb kernel.debug /var/crash/vmcore.0
>>> GNU gdb 6.1.1 [FreeBSD]
>>> Copyright 2004 Free Software Foundation, Inc.
>>> GDB is free software, covered by the GNU General Public License, and you are
>>> welcome to change it and/or distribute copies of it under certain
>>> conditions.
>>> Type "show copying" to see the conditions.
>>> There is absolutely no warranty for GDB. Type "show warranty" for details.
>>> This GDB was configured as "amd64-marcel-freebsd"...
>>>
>>> Unread portion of the kernel message buffer:
>>> <118>Starting vboxservice.
>>> <118>VBoxService 5.1.14 r112924 (verbosity: 0) freebsd.amd64 (Jan 20 2017
>>> 18:37:45) release log
>>> <118>00:00:00.000120 main Log opened 2017-02-20T22:38:46.34808Z
>>> <118>00:00:00.000162 main OS Product: FreeBSD
>>> <118>00:00:00.000171 main OS Release: 12.0-CURRENT
>>> <118>00:00:00.000180 main OS Version: FreeBSD 12.0-CURRENT r313999M
>>> <118>00:00:00.000192 main Executable: /usr/local/sbin/VBoxService
>>> <118>00:00:00.000194 main Process ID: 609
>>> <118>00:00:00.000196 main Package type: BSD_64BITS_GENERIC (OSE)
>>>
>>>
>>> Fatal trap 12: page fault while in kernel mode
>>> cpuid = 2; apic id = 02
>>> fault virtual address = 0xd6
>>> fault code = supervisor read data, page not present
>>> instruction pointer = 0x20:0x80d4ebaf
>>> stack pointer = 0x28:0xfe0122e2bef0
>>> frame pointer = 0x28:0xfe0122e2bf00
>>> code segment= base 0x0, limit 0xf, type 0x1b
>>> = DPL 0, pres 1, long 1, def32 0, gran 1
>>> processor eflags= interrupt enabled, resume, IOPL = 0
>>> current process = 609 (VBoxService)
>>>
>>
>>
>>
>>> #9 0x80eb6be1 in calltrap () at
>>> /usr/src/sys/amd64/amd64/exception.S:236
>>> #10 0x80d4ebaf in _vm_map_lock
Re: svn commit: r313268 - head/sys/kern [through -r313271 for atomic_fcmpset use and later: fails on PowerMac G5 "Quad Core"; -r313266 works]
On 2017-Feb-20, at 3:35 PM, Mateusz Guzik wrote:
> On Mon, Feb 20, 2017 at 03:10:44PM -0800, Mark Millard wrote:
>> On 2017-Feb-20, at 2:58 PM, Mark Millard wrote:
>>
>>> On 2017-Feb-20, at 11:10 AM, Mateusz Guzik wrote:
>>> On Sat, Feb 18, 2017 at 04:18:05AM -0800, Mark Millard wrote:
> [Note: I experiment with clang based powerpc64 builds,
> reporting problems that I find. Justin is familiar
> with this, as is Nathan.]
>
> I tried to update the PowerMac G5 (a so-called "Quad Core")
> that I have access to from head -r312761 to -r313864 and
> ended up with random panics and hang ups in fairly short
> order after booting.
>
> Some approximate bisecting for the kernel led to:
> (sometimes getting part way into a buildkernel attempt
> for a different version before a failure happens)
>
> -r313266: works (just before use of atomic_fcmpset)
> vs.
> -r313271: fails (last of the "use atomic_fcmpset" check-ins)
>
> (I did not try -r313268 through -r313270 as the use was
> gradually added.)
>
> So I'm currently running a -r313864 world with a -r313266
> kernel.
>
> No kernel that I tried that was from before -r313266 had the
> problems.
>
> Any kernel that I tried that was from after -r313271 had the
> problems.
>
> Of course I did not try them all in the other direction. :)
>
I found that spin mutexes were not properly handling this, fixed in r313996.

Locally I added an if (cpu_tick() % 2) return (0); snippet to amd64 fcmpset to simulate failures. Everything works, while it would easily fail without the patch.

That said, I hope this concludes the 'missing check for not-reread value of failed fcmpset' saga.

--
Mateusz Guzik
>>>
>>> I tried to update from -r313864 to -r313999 in my amd64 context
>>> (a VirtualBox machine under macOS) but it now crashes late in
>>> the boot sequence (after it processes a dump if I make one but
>>> before I can log in).
>>>
>>> This update was via my usual explicit svnlite update; buildworld
>>> buildkernel; etc. production style build of world and kernel,
>>> including use of MALLOC_PRODUCTION.
>>>
>>> The window shows:
>>>
>>> _vm_map_lock+0xf
>>> vm_map_wire+0x32
>>> rtROMemObjNativeLockInMap+0x8c
>>> rtROMemObjNativeLockUser+0x51
>>> RTR0MemObjLockUserTag+0x231
>>> vbglR0HGCMInternalPreprocessCall+0x65d
>>> vbglR0HGCMInternalCall+0x17c
>>> vgdrvIoCtl_HGCMCall+0x43f
>>> VGDrvCommonIoCtl+0x261
>>> vgdrvFreeBSDIOCtl+0x2cd
>>> devfs_ioctl+0xae
>>> VOP_IOCTL_APV+0x88
>>> vn_ioctl+0x161
>>> devfs_ioctl_f+0x1f
>>> kern_ioctl+0x280
>>> sys_ioctl+0x13f
>>> amd64_syscall+0x397
>>> Xfast_syscall+0xfb
>>
>> More detail from booting with the -r313864 kernel.old
>> and using kgdb on what the dump produced:
>>
>> # kgdb kernel.debug /var/crash/vmcore.
>> /var/crash/vmcore.0/var/crash/vmcore.last
>> # kgdb kernel.debug /var/crash/vmcore.0
>> GNU gdb 6.1.1 [FreeBSD]
>> Copyright 2004 Free Software Foundation, Inc.
>> GDB is free software, covered by the GNU General Public License, and you are
>> welcome to change it and/or distribute copies of it under certain conditions.
>> Type "show copying" to see the conditions.
>> There is absolutely no warranty for GDB. Type "show warranty" for details.
>> This GDB was configured as "amd64-marcel-freebsd"...
>>
>> Unread portion of the kernel message buffer:
>> <118>Starting vboxservice.
>> <118>VBoxService 5.1.14 r112924 (verbosity: 0) freebsd.amd64 (Jan 20 2017
>> 18:37:45) release log
>> <118>00:00:00.000120 main Log opened 2017-02-20T22:38:46.34808Z
>> <118>00:00:00.000162 main OS Product: FreeBSD
>> <118>00:00:00.000171 main OS Release: 12.0-CURRENT
>> <118>00:00:00.000180 main OS Version: FreeBSD 12.0-CURRENT r313999M
>> <118>00:00:00.000192 main Executable: /usr/local/sbin/VBoxService
>> <118>00:00:00.000194 main Process ID: 609
>> <118>00:00:00.000196 main Package type: BSD_64BITS_GENERIC (OSE)
>>
>>
>> Fatal trap 12: page fault while in kernel mode
>> cpuid = 2; apic id = 02
>> fault virtual address = 0xd6
>> fault code = supervisor read data, page not present
>> instruction pointer = 0x20:0x80d4ebaf
>> stack pointer = 0x28:0xfe0122e2bef0
>> frame pointer = 0x28:0xfe0122e2bf00
>> code segment= base 0x0, limit 0xf, type 0x1b
>> = DPL 0, pres 1, long 1, def32 0, gran 1
>> processor eflags= interrupt enabled, resume, IOPL = 0
>> current process = 609 (VBoxService)
>>
>
>
>
>> #9 0x80eb6be1 in calltrap () at
>> /usr/src/sys/amd64/amd64/exception.S:236
>> #10 0x80d4ebaf in _vm_map_lock (map=0x1, file=0x0, line=0) at
>> /usr/src/sys/vm/vm_map.c:501
>
> The function is:
> void
> _vm_map_lock(vm_map_t map, const char *file, int line)
> {
>
>	if
Re: svn commit: r313268 - head/sys/kern [through -r313271 for atomic_fcmpset use and later: fails on PowerMac G5 "Quad Core"; -r313266 works]
On Mon, Feb 20, 2017 at 03:10:44PM -0800, Mark Millard wrote:
> On 2017-Feb-20, at 2:58 PM, Mark Millard wrote:
>
> > On 2017-Feb-20, at 11:10 AM, Mateusz Guzik wrote:
> >
> >> On Sat, Feb 18, 2017 at 04:18:05AM -0800, Mark Millard wrote:
> >>> [Note: I experiment with clang based powerpc64 builds,
> >>> reporting problems that I find. Justin is familiar
> >>> with this, as is Nathan.]
> >>>
> >>> I tried to update the PowerMac G5 (a so-called "Quad Core")
> >>> that I have access to from head -r312761 to -r313864 and
> >>> ended up with random panics and hang ups in fairly short
> >>> order after booting.
> >>>
> >>> Some approximate bisecting for the kernel led to:
> >>> (sometimes getting part way into a buildkernel attempt
> >>> for a different version before a failure happens)
> >>>
> >>> -r313266: works (just before use of atomic_fcmpset)
> >>> vs.
> >>> -r313271: fails (last of the "use atomic_fcmpset" check-ins)
> >>>
> >>> (I did not try -r313268 through -r313270 as the use was
> >>> gradually added.)
> >>>
> >>> So I'm currently running a -r313864 world with a -r313266
> >>> kernel.
> >>>
> >>> No kernel that I tried that was from before -r313266 had the
> >>> problems.
> >>>
> >>> Any kernel that I tried that was from after -r313271 had the
> >>> problems.
> >>>
> >>> Of course I did not try them all in the other direction. :)
> >>>
> >>
> >> I found that spin mutexes were not properly handling this, fixed in
> >> r313996.
> >>
> >> Locally I added an if (cpu_tick() % 2) return (0); snippet to amd64
> >> fcmpset to simulate failures. Everything works, while it would easily
> >> fail without the patch.
> >>
> >> That said, I hope this concludes the 'missing check for not-reread value
> >> of failed fcmpset' saga.
> >>
> >> --
> >> Mateusz Guzik
> >
> > I tried to update from -r313864 to -r313999 in my amd64 context
> > (a VirtualBox machine under macOS) but it now crashes late in
> > the boot sequence (after it processes a dump if I make one but
> > before I can log in).
> >
> > This update was via my usual explicit svnlite update; buildworld
> > buildkernel; etc. production style build of world and kernel,
> > including use of MALLOC_PRODUCTION.
> >
> > The window shows:
> >
> > _vm_map_lock+0xf
> > vm_map_wire+0x32
> > rtROMemObjNativeLockInMap+0x8c
> > rtROMemObjNativeLockUser+0x51
> > RTR0MemObjLockUserTag+0x231
> > vbglR0HGCMInternalPreprocessCall+0x65d
> > vbglR0HGCMInternalCall+0x17c
> > vgdrvIoCtl_HGCMCall+0x43f
> > VGDrvCommonIoCtl+0x261
> > vgdrvFreeBSDIOCtl+0x2cd
> > devfs_ioctl+0xae
> > VOP_IOCTL_APV+0x88
> > vn_ioctl+0x161
> > devfs_ioctl_f+0x1f
> > kern_ioctl+0x280
> > sys_ioctl+0x13f
> > amd64_syscall+0x397
> > Xfast_syscall+0xfb
>
> More detail from booting with the -r313864 kernel.old
> and using kgdb on what the dump produced:
>
> # kgdb kernel.debug /var/crash/vmcore.
> /var/crash/vmcore.0/var/crash/vmcore.last
> # kgdb kernel.debug /var/crash/vmcore.0
> GNU gdb 6.1.1 [FreeBSD]
> Copyright 2004 Free Software Foundation, Inc.
> GDB is free software, covered by the GNU General Public License, and you are
> welcome to change it and/or distribute copies of it under certain conditions.
> Type "show copying" to see the conditions.
> There is absolutely no warranty for GDB. Type "show warranty" for details.
> This GDB was configured as "amd64-marcel-freebsd"...
>
> Unread portion of the kernel message buffer:
> <118>Starting vboxservice.
> <118>VBoxService 5.1.14 r112924 (verbosity: 0) freebsd.amd64 (Jan 20 2017
> 18:37:45) release log
> <118>00:00:00.000120 main Log opened 2017-02-20T22:38:46.34808Z
> <118>00:00:00.000162 main OS Product: FreeBSD
> <118>00:00:00.000171 main OS Release: 12.0-CURRENT
> <118>00:00:00.000180 main OS Version: FreeBSD 12.0-CURRENT r313999M
> <118>00:00:00.000192 main Executable: /usr/local/sbin/VBoxService
> <118>00:00:00.000194 main Process ID: 609
> <118>00:00:00.000196 main Package type: BSD_64BITS_GENERIC (OSE)
>
>
> Fatal trap 12: page fault while in kernel mode
> cpuid = 2; apic id = 02
> fault virtual address = 0xd6
> fault code = supervisor read data, page not present
> instruction pointer = 0x20:0x80d4ebaf
> stack pointer = 0x28:0xfe0122e2bef0
> frame pointer = 0x28:0xfe0122e2bf00
> code segment= base 0x0, limit 0xf, type 0x1b
> = DPL 0, pres 1, long 1, def32 0, gran 1
> processor eflags= interrupt enabled, resume, IOPL = 0
> current process = 609 (VBoxService)
>
> #9 0x80eb6be1 in calltrap () at
> /usr/src/sys/amd64/amd64/exception.S:236
> #10 0x80d4ebaf in _vm_map_lock (map=0x1, file=0x0, line=0) at
> /usr/src/sys/vm/vm_map.c:501

The function is:
void
_vm_map_lock(vm_map_t map, const char *file, int line)
{

	if (map->system_map)
		mtx_lock_flags_(&map->system_mtx, 0, file, line);
	else
Re: svn commit: r313268 - head/sys/kern [through -r313271 for atomic_fcmpset use and later: fails on PowerMac G5 "Quad Core"; -r313266 works]
On 2017-Feb-20, at 2:58 PM, Mark Millard wrote:
> On 2017-Feb-20, at 11:10 AM, Mateusz Guzik wrote:
>
>> On Sat, Feb 18, 2017 at 04:18:05AM -0800, Mark Millard wrote:
>>> [Note: I experiment with clang based powerpc64 builds,
>>> reporting problems that I find. Justin is familiar
>>> with this, as is Nathan.]
>>>
>>> I tried to update the PowerMac G5 (a so-called "Quad Core")
>>> that I have access to from head -r312761 to -r313864 and
>>> ended up with random panics and hang ups in fairly short
>>> order after booting.
>>>
>>> Some approximate bisecting for the kernel led to:
>>> (sometimes getting part way into a buildkernel attempt
>>> for a different version before a failure happens)
>>>
>>> -r313266: works (just before use of atomic_fcmpset)
>>> vs.
>>> -r313271: fails (last of the "use atomic_fcmpset" check-ins)
>>>
>>> (I did not try -r313268 through -r313270 as the use was
>>> gradually added.)
>>>
>>> So I'm currently running a -r313864 world with a -r313266
>>> kernel.
>>>
>>> No kernel that I tried that was from before -r313266 had the
>>> problems.
>>>
>>> Any kernel that I tried that was from after -r313271 had the
>>> problems.
>>>
>>> Of course I did not try them all in the other direction. :)
>>>
>>
>> I found that spin mutexes were not properly handling this, fixed in
>> r313996.
>>
>> Locally I added an if (cpu_tick() % 2) return (0); snippet to amd64
>> fcmpset to simulate failures. Everything works, while it would easily
>> fail without the patch.
>>
>> That said, I hope this concludes the 'missing check for not-reread value
>> of failed fcmpset' saga.
>>
>> --
>> Mateusz Guzik
>
> I tried to update from -r313864 to -r313999 in my amd64 context
> (a VirtualBox machine under macOS) but it now crashes late in
> the boot sequence (after it processes a dump if I make one but
> before I can log in).
>
> This update was via my usual explicit svnlite update; buildworld
> buildkernel; etc. production style build of world and kernel,
> including use of MALLOC_PRODUCTION.
>
> The window shows:
>
> _vm_map_lock+0xf
> vm_map_wire+0x32
> rtROMemObjNativeLockInMap+0x8c
> rtROMemObjNativeLockUser+0x51
> RTR0MemObjLockUserTag+0x231
> vbglR0HGCMInternalPreprocessCall+0x65d
> vbglR0HGCMInternalCall+0x17c
> vgdrvIoCtl_HGCMCall+0x43f
> VGDrvCommonIoCtl+0x261
> vgdrvFreeBSDIOCtl+0x2cd
> devfs_ioctl+0xae
> VOP_IOCTL_APV+0x88
> vn_ioctl+0x161
> devfs_ioctl_f+0x1f
> kern_ioctl+0x280
> sys_ioctl+0x13f
> amd64_syscall+0x397
> Xfast_syscall+0xfb

More detail from booting with the -r313864 kernel.old
and using kgdb on what the dump produced:

# kgdb kernel.debug /var/crash/vmcore.
/var/crash/vmcore.0/var/crash/vmcore.last
# kgdb kernel.debug /var/crash/vmcore.0
GNU gdb 6.1.1 [FreeBSD]
Copyright 2004 Free Software Foundation, Inc.
GDB is free software, covered by the GNU General Public License, and you are
welcome to change it and/or distribute copies of it under certain conditions.
Type "show copying" to see the conditions.
There is absolutely no warranty for GDB. Type "show warranty" for details.
This GDB was configured as "amd64-marcel-freebsd"...

Unread portion of the kernel message buffer:
<118>Starting vboxservice.
<118>VBoxService 5.1.14 r112924 (verbosity: 0) freebsd.amd64 (Jan 20 2017 18:37:45) release log
<118>00:00:00.000120 main Log opened 2017-02-20T22:38:46.34808Z
<118>00:00:00.000162 main OS Product: FreeBSD
<118>00:00:00.000171 main OS Release: 12.0-CURRENT
<118>00:00:00.000180 main OS Version: FreeBSD 12.0-CURRENT r313999M
<118>00:00:00.000192 main Executable: /usr/local/sbin/VBoxService
<118>00:00:00.000194 main Process ID: 609
<118>00:00:00.000196 main Package type: BSD_64BITS_GENERIC (OSE)


Fatal trap 12: page fault while in kernel mode
cpuid = 2; apic id = 02
fault virtual address = 0xd6
fault code = supervisor read data, page not present
instruction pointer = 0x20:0x80d4ebaf
stack pointer = 0x28:0xfe0122e2bef0
frame pointer = 0x28:0xfe0122e2bf00
code segment= base 0x0, limit 0xf, type 0x1b
= DPL 0, pres 1, long 1, def32 0, gran 1
processor eflags= interrupt enabled, resume, IOPL = 0
current process = 609 (VBoxService)
Reading symbols from /boot/kernel/zfs.ko...Reading symbols from /usr/lib/debug//boot/kernel/zfs.ko.debug...done.
done.
Loaded symbols for /boot/kernel/zfs.ko
Reading symbols from /boot/kernel/opensolaris.ko...Reading symbols from /usr/lib/debug//boot/kernel/opensolaris.ko.debug...done.
done.
Loaded symbols for /boot/kernel/opensolaris.ko
Reading symbols from /boot/modules/vboxguest.ko...done.
Loaded symbols for /boot/modules/vboxguest.ko
#0 doadump (textdump=0) at pcpu.h:232
232 __asm("movq %%gs:%1,%0" : "=r" (td)
(kgdb) bt
#0 doadump (textdump=0) at pcpu.h:232
#1 0x8039dd0b in db_dump (dummy=, dummy2=, dummy3=, dummy4=) at /usr/src/sys/ddb/db_command.c:546
#2
Re: svn commit: r313268 - head/sys/kern [through -r313271 for atomic_fcmpset use and later: fails on PowerMac G5 "Quad Core"; -r313266 works]
On 2017-Feb-20, at 11:10 AM, Mateusz Guzik wrote: > On Sat, Feb 18, 2017 at 04:18:05AM -0800, Mark Millard wrote: >> [Note: I experiment with clang based powerpc64 builds, >> reporting problems that I find. Justin is familiar >> with this, as is Nathan.] >> >> I tried to update the PowerMac G5 (a so-called "Quad Core") >> that I have access to from head -r312761 to -r313864 and >> ended up with random panics and hang ups in fairly short >> order after booting. >> >> Some approximate bisecting for the kernel lead to: >> (sometimes getting part way into a buildkernel attempt >> for a different version before a failure happens) >> >> -r313266: works (just before use of atomic_fcmpset) >> vs. >> -r313271: fails (last of the "use atomic_fcmpset" check-ins) >> >> (I did not try -r313268 through -r313270 as the use was >> gradually added.) >> >> So I'm currently running a -r313864 world with a -r313266 >> kernel. >> >> No kernel that I tried that was from before -r313266 had the >> problems. >> >> Any kernel that I tried that was from after -r313271 had the >> problems. >> >> Of course I did not try them all in other direction. :) >> > > I found that spin mutexes were not properly handling this, fixed in > r313996. > > Locally I added a if (cpu_tick() % 2) return (0); snipped to amd64 > fcmpset to simulate failures. Everything works, while it would easily > fail without the patch. > > That said, I hope this concludes the 'missing check for not-reread value > of failed fcmpset' saga. > > -- > Mateusz Guzik I tried to update from -r313864 to -r313999 in my amd64 context (a VirtualBox machine under macOS) but it now crashes late in the boot sequence (after it processes a dump if I make one but before I can log in). This update was via my usual explicit svnlite update; buildworld buildkernel; etc. production style build of world and kernel, including use of MALLOC_PRODUCTION. 
The window shows:

_vm_map_lock+0xf
vm_map_wire+0x32
rtR0MemObjNativeLockInMap+0x8c
rtR0MemObjNativeLockUser+0x51
RTR0MemObjLockUserTag+0x231
vbglR0HGCMInternalPreprocessCall+0x65d
vbglR0HGCMInternalCall+0x17c
vgdrvIoCtl_HGCMCall+0x43f
VGDrvCommonIoCtl+0x261
vgdrvFreeBSDIOCtl+0x2cd
devfs_ioctl+0xae
VOP_IOCTL_APV+0x88
vn_ioctl+0x161
devfs_ioctl_f+0x1f
kern_ioctl+0x280
sys_ioctl+0x13f
amd64_syscall+0x397
Xfast_syscall+0xfb

===
Mark Millard
markmi at dsl-only.net
Re: svn commit: r313268 - head/sys/kern [through -r313271 for atomic_fcmpset use and later: fails on PowerMac G5 "Quad Core"; -r313266 works]
On Sat, Feb 18, 2017 at 04:18:05AM -0800, Mark Millard wrote:
> [Note: I experiment with clang based powerpc64 builds,
> reporting problems that I find. Justin is familiar
> with this, as is Nathan.]
>
> I tried to update the PowerMac G5 (a so-called "Quad Core")
> that I have access to from head -r312761 to -r313864 and
> ended up with random panics and hang ups in fairly short
> order after booting.
>
> Some approximate bisecting for the kernel led to:
> (sometimes getting part way into a buildkernel attempt
> for a different version before a failure happens)
>
> -r313266: works (just before use of atomic_fcmpset)
> vs.
> -r313271: fails (last of the "use atomic_fcmpset" check-ins)
>
> (I did not try -r313268 through -r313270 as the use was
> gradually added.)
>
> So I'm currently running a -r313864 world with a -r313266
> kernel.
>
> No kernel that I tried that was from before -r313266 had the
> problems.
>
> Any kernel that I tried that was from after -r313271 had the
> problems.
>
> Of course I did not try them all in other direction. :)
>

I found that spin mutexes were not properly handling this, fixed in
r313996.

Locally I added an if (cpu_tick() % 2) return (0); snippet to amd64
fcmpset to simulate failures. Everything works, while it would easily
fail without the patch.

That said, I hope this concludes the 'missing check for not-reread value
of failed fcmpset' saga.

--
Mateusz Guzik
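A minimal user-space sketch of the fault-injection idea above, assuming C11 <stdatomic.h> in place of the kernel's atomic(9) primitives. The names test_fcmpset, spin_acquire, spin_release, UNOWNED and inject_counter are all hypothetical; the counter stands in for the cpu_tick() % 2 test. An injected failure leaves *expected untouched, which is the "possibly stale value" case callers must tolerate, so the acquire loop re-examines v after a failure instead of assuming the lock is owned.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

#define UNOWNED ((uintptr_t)0)	/* hypothetical "not owned" lock value */

/* Hypothetical counter standing in for the cpu_tick() % 2 test. */
static unsigned long inject_counter;

/*
 * Hypothetical fcmpset-style primitive with injected failures.  On a real
 * attempt it behaves like fcmpset: success stores newv, failure writes the
 * value found back to *expected.  Every other call instead fails without
 * touching memory or *expected, so the caller keeps a possibly stale value.
 */
static bool
test_fcmpset(_Atomic uintptr_t *p, uintptr_t *expected, uintptr_t newv)
{
	if (inject_counter++ % 2 == 0)
		return (false);
	return (atomic_compare_exchange_strong(p, expected, newv));
}

/*
 * Acquire loop that tolerates such failures: after a failed attempt it
 * re-examines v instead of concluding that the lock must be owned.
 */
static void
spin_acquire(_Atomic uintptr_t *lock, uintptr_t tid)
{
	uintptr_t v = atomic_load(lock);

	for (;;) {
		if (v == UNOWNED) {
			if (test_fcmpset(lock, &v, tid))
				return;
			continue;	/* v may still be UNOWNED: retry */
		}
		/* Owned by another thread: spin, re-reading the lock word. */
		v = atomic_load(lock);
	}
}

static void
spin_release(_Atomic uintptr_t *lock, uintptr_t tid)
{
	assert(atomic_load(lock) == tid);	/* caller must be the owner */
	atomic_store(lock, UNOWNED);
}
```

A loop that instead treated every failed attempt as "someone owns it" would act on a value that may no longer be current; that is presumably the class of mistake the injection is meant to flush out. C11's atomic_compare_exchange_weak() is likewise allowed to fail spuriously, so loops written against it need the same shape.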
Re: svn commit: r313268 - head/sys/kern [through -r313271 for atomic_fcmpset use and later: fails on PowerMac G5 "Quad Core"; -r313266 works]
On Sat, Feb 18, 2017 at 01:58:49PM -0800, Mark Millard wrote:
> On 2017-Feb-18, at 12:58 PM, Mateusz Guzik wrote:
> > Well either the primitive itself is buggy or the somewhat (now) unusual
> > condition of not providing the failed value (but possibly a stale one)
> > is not handled correctly in locking code.
> >
> > That said, I would start with putting barriers "on both sides" of
> > powerpc's fcmpset for debugging purposes and if the problem persists I
> > can add some debugs to locking primitives.
> >
>
> I currently have the only powerpc64 that I have access
> to for now doing a test that will likely finish tonight
> sometime (if it has no problems).
>
> Also I'm not so familiar with powerpc64 details as to be
> able to insert proper barriers and the like off the top of
> my head: It is more of a research subject for me.
>

This was a suggestion to jhibbits@. Looking at the code it is not hard
to slap them in for testing purposes, or maybe there is an "obvious now
that I look at it" braino in there, or maybe he has a better idea.

Now that I wrote it, I can get myself access to powerpc boxes. While I
won't be able to run BSD on them, I can hack around in userspace and
see. That's unless jhibbits@ steps in. I have no clue about ppc.

> It looks like contexts like __rw_wlock_hard(c,v,tid,file,line)
> now need the caller to do an equivalent of:
>
> __rw_wlock_hard(c,RW_READ_VALUE(rwlock2rw(c)),tid,file,line)
>
> in order for the code behavior to match the old behavior
> that was based on the original local-v's initialization
> before v was used:
>
> rw = rwlock2rw(c);
> v = RW_READ_VALUE(rw); /* this line no longer exists */
>
> This means that checking for equivalence is no longer
> local to the routine but involves checking all the
> usage of the routine.
>

Not reading the argument locally was the entire point of introducing
fcmpset. Otherwise the 'v' argument would be a waste of time.
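A sketch of the hand-off being discussed: the fast path's failed fcmpset has already produced a lock value, so the slow path takes it as an argument instead of re-reading the lock word on entry. This assumes C11 atomics; lock_acquire, lock_hard, lock_release, fcmpset_ptr, UNOWNED and slow_calls are hypothetical stand-ins, not the kern_rwlock.c code. Note that the slow path still has to cope with being handed a stale v.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

#define UNOWNED ((uintptr_t)0)	/* hypothetical "not owned" lock value */

static unsigned long slow_calls;	/* counts entries into the slow path */

/* fcmpset-like helper: on failure, *old receives the value found. */
static bool
fcmpset_ptr(_Atomic uintptr_t *p, uintptr_t *old, uintptr_t newv)
{
	return (atomic_compare_exchange_strong(p, old, newv));
}

/*
 * Hypothetical slow path.  Like __rw_wlock_hard() as described in this
 * thread, it trusts the caller-supplied v rather than reading the lock
 * word first; a stale v only costs one extra iteration.
 */
static void
lock_hard(_Atomic uintptr_t *lock, uintptr_t v, uintptr_t tid)
{
	slow_calls++;
	for (;;) {
		if (v == UNOWNED) {
			if (fcmpset_ptr(lock, &v, tid))
				return;
			continue;	/* v now holds the value that beat us */
		}
		/* Owned: a real lock would spin or block here. */
		v = atomic_load(lock);
	}
}

/* Hypothetical fast path: one attempt; on failure hand v to the slow path. */
static void
lock_acquire(_Atomic uintptr_t *lock, uintptr_t tid)
{
	uintptr_t v = UNOWNED;

	if (!fcmpset_ptr(lock, &v, tid))
		lock_hard(lock, v, tid);	/* no extra read of the lock word */
}

static void
lock_release(_Atomic uintptr_t *lock, uintptr_t tid)
{
	assert(atomic_load(lock) == tid);	/* caller must be the owner */
	atomic_store(lock, UNOWNED);
}
```

The equivalence question raised above is visible here: lock_hard() only behaves like a "read it yourself" version if every caller passes a value obtained from the lock (or tolerates a stale one), which is a property of the callers rather than of lock_hard() itself.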
Some primitives can attempt grabbing the lock, and if they fail, we have
the lock value to work with (e.g. check who owns the lock and see if
they are running). In particular amd64 will give us the value it found.
An explicit read requires whoever owns the cacheline to lose the
exclusive ownership, and if the lock is contended (multiple CPUs doing
fcmpset), this makes the cacheline ping-pong between cores. This
destroys performance, especially on systems with many cores and
especially so with multiple NUMA nodes.

Other primitives don't have inline variants. This concerns read-write
locks which try to:

    retry:
        r = lock_value(lock);
        if (!locked(r)) {
            if (!cmpset(lock, r, r + ONE_READER))
                goto retry;
        }

That is, if multiple CPUs try to get the lock for reading, one will fail
and will be forced to compute the new value to be set. The longer the
time between attempts, the more likely it is that another core showed up
trying to do the same thing with the same value, causing another failed
attempt. So here there are no inlines; to keep that time short, fcmpset
allows NOT reading the lock value explicitly, since it was already
provided by the hardware.

Note this is still significantly slower than it has to be in principle:
the lock can 'blindly increment by ONE_READER and see what happens', but
that requires several changes and is a subject for another e-mail. I'm
working on it though.

--
Mateusz Guzik
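The read-lock retry loop above, sketched twice in user space with C11 atomics to contrast the two primitives. With a cmpset-style call every retry re-reads the lock word; with an fcmpset-style call a failed attempt hands back the value it found, so the loop reads the lock explicitly only once. WRITER_LOCKED, ONE_READER, cmpset_ptr, fcmpset_ptr and the rlock_try_* functions are hypothetical names standing in for lock_value(), locked() and cmpset() from the message; the real rwlock word encoding differs.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

#define WRITER_LOCKED ((uintptr_t)1)	/* hypothetical writer bit */
#define ONE_READER    ((uintptr_t)2)	/* hypothetical reader increment */

/* cmpset-style: success/failure only; the value found is discarded. */
static bool
cmpset_ptr(_Atomic uintptr_t *p, uintptr_t old, uintptr_t newv)
{
	return (atomic_compare_exchange_strong(p, &old, newv));
}

/* fcmpset-style: on failure *old is updated to the value found. */
static bool
fcmpset_ptr(_Atomic uintptr_t *p, uintptr_t *old, uintptr_t newv)
{
	return (atomic_compare_exchange_strong(p, old, newv));
}

/* The loop as written in the message: an explicit read on every attempt. */
static bool
rlock_try_cmpset(_Atomic uintptr_t *lock)
{
	uintptr_t r;

	for (;;) {
		r = atomic_load(lock);		/* extra read per retry */
		if (r & WRITER_LOCKED)
			return (false);		/* caller would take a slow path */
		if (cmpset_ptr(lock, r, r + ONE_READER))
			return (true);
	}
}

/* Same loop with fcmpset: a failed attempt has already refreshed r. */
static bool
rlock_try_fcmpset(_Atomic uintptr_t *lock)
{
	uintptr_t r = atomic_load(lock);	/* one initial read only */

	for (;;) {
		if (r & WRITER_LOCKED)
			return (false);
		if (fcmpset_ptr(lock, &r, r + ONE_READER))
			return (true);
	}
}
```

The only difference between the two loops is the atomic_load() inside the cmpset version, which is the explicit read the message argues against under contention.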
Re: svn commit: r313268 - head/sys/kern [through -r313271 for atomic_fcmpset use and later: fails on PowerMac G5 "Quad Core"; -r313266 works]
On 2017-Feb-18, at 12:58 PM, Mateusz Guzik wrote:

> On Sat, Feb 18, 2017 at 12:49:29PM -0800, Mark Millard wrote:
>> On 2017-Feb-18, at 4:18 AM, Mark Millard wrote:
>>
>>> [Note: I experiment with clang based powerpc64 builds,
>>> reporting problems that I find. Justin is familiar
>>> with this, as is Nathan.]
>>>
>>> I tried to update the PowerMac G5 (a so-called "Quad Core")
>>> that I have access to from head -r312761 to -r313864 and
>>> ended up with random panics and hang ups in fairly short
>>> order after booting.
>>>
>>> Some approximate bisecting for the kernel led to:
>>> (sometimes getting part way into a buildkernel attempt
>>> for a different version before a failure happens)
>>>
>>> -r313266: works (just before use of atomic_fcmpset)
>>> vs.
>>> -r313271: fails (last of the "use atomic_fcmpset" check-ins)
>>>
>>> (I did not try -r313268 through -r313270 as the use was
>>> gradually added.)
>>>
>>> So I'm currently running a -r313864 world with a -r313266
>>> kernel.
>>>
>>> No kernel that I tried that was from before -r313266 had the
>>> problems.
>>>
>>> Any kernel that I tried that was from after -r313271 had the
>>> problems.
>>>
>>> Of course I did not try them all in other direction. :)
>>
>> [Of course: "either direction".]
>>
>> I'll note that the -r313864 buildworld was without
>> MALLOC_PRODUCTION being defined. (Unusual for me but
>> I'm testing if a jemalloc assert problem on arm64
>> also happens on powerpc64.)
>>
>> By contrast the buildkernels were production style
>> (as is normal for me unless I'm trying to track
>> something down that I think might be exposed by
>> the extra checks).
>>
>
> Well either the primitive itself is buggy or the somewhat (now) unusual
> condition of not providing the failed value (but possibly a stale one)
> is not handled correctly in locking code.
>
> That said, I would start with putting barriers "on both sides" of
> powerpc's fcmpset for debugging purposes and if the problem persists I
> can add some debugs to locking primitives.
>
> --
> Mateusz Guzik

I currently have the only powerpc64 that I have access
to for now doing a test that will likely finish tonight
sometime (if it has no problems).

Also I'm not so familiar with powerpc64 details as to be
able to insert proper barriers and the like off the top of
my head: It is more of a research subject for me.

Side note:

It looks like contexts like __rw_wlock_hard(c,v,tid,file,line)
now need the caller to do an equivalent of:

__rw_wlock_hard(c,RW_READ_VALUE(rwlock2rw(c)),tid,file,line)

in order for the code behavior to match the old behavior
that was based on the original local-v's initialization
before v was used:

rw = rwlock2rw(c);
v = RW_READ_VALUE(rw); /* this line no longer exists */

This means that checking for equivalence is no longer
local to the routine but involves checking all the
usage of the routine.

I've not done such checking, so for all I know such
usage is always in place: this is not a claim of a
problem.

The other routines in kern_rwlock.c still have local
variables and the original initializations. I just
thought that this was interesting. I've not looked at
other files yet.

===
Mark Millard
markmi at dsl-only.net
Re: svn commit: r313268 - head/sys/kern [through -r313271 for atomic_fcmpset use and later: fails on PowerMac G5 "Quad Core"; -r313266 works]
On Sat, Feb 18, 2017 at 12:49:29PM -0800, Mark Millard wrote:
> On 2017-Feb-18, at 4:18 AM, Mark Millard wrote:
>
> > [Note: I experiment with clang based powerpc64 builds,
> > reporting problems that I find. Justin is familiar
> > with this, as is Nathan.]
> >
> > I tried to update the PowerMac G5 (a so-called "Quad Core")
> > that I have access to from head -r312761 to -r313864 and
> > ended up with random panics and hang ups in fairly short
> > order after booting.
> >
> > Some approximate bisecting for the kernel led to:
> > (sometimes getting part way into a buildkernel attempt
> > for a different version before a failure happens)
> >
> > -r313266: works (just before use of atomic_fcmpset)
> > vs.
> > -r313271: fails (last of the "use atomic_fcmpset" check-ins)
> >
> > (I did not try -r313268 through -r313270 as the use was
> > gradually added.)
> >
> > So I'm currently running a -r313864 world with a -r313266
> > kernel.
> >
> > No kernel that I tried that was from before -r313266 had the
> > problems.
> >
> > Any kernel that I tried that was from after -r313271 had the
> > problems.
> >
> > Of course I did not try them all in other direction. :)
>
> [Of course: "either direction".]
>
> I'll note that the -r313864 buildworld was without
> MALLOC_PRODUCTION being defined. (Unusual for me but
> I'm testing if a jemalloc assert problem on arm64
> also happens on powerpc64.)
>
> By contrast the buildkernels were production style
> (as is normal for me unless I'm trying to track
> something down that I think might be exposed by
> the extra checks).
>

Well, either the primitive itself is buggy or the somewhat (now) unusual
condition of not providing the failed value (but possibly a stale one)
is not handled correctly in locking code.

That said, I would start with putting barriers "on both sides" of
powerpc's fcmpset for debugging purposes and if the problem persists I
can add some debugs to locking primitives.
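A sketch of the debugging idea of putting barriers "on both sides" of fcmpset, written with C11 fences rather than powerpc assembly. That substitution is an assumption: the real experiment would presumably be an edit to the powerpc atomic_fcmpset implementation itself. fcmpset_fenced is a hypothetical name. Compilers typically emit a full sync for a seq_cst atomic_thread_fence on powerpc, so this roughly corresponds to bracketing the lwarx/stwcx. sequence with sync instructions.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Debug-only fcmpset-style primitive bracketed by full fences.  The CAS
 * itself is relaxed on purpose, so that any ordering comes from the two
 * fences: the intent is that memory accesses before the call are ordered
 * before the CAS and accesses after the call are ordered after it.
 */
static bool
fcmpset_fenced(_Atomic uint32_t *p, uint32_t *old, uint32_t newv)
{
	bool ok;

	atomic_thread_fence(memory_order_seq_cst);	/* leading barrier */
	ok = atomic_compare_exchange_strong_explicit(p, old, newv,
	    memory_order_relaxed, memory_order_relaxed);
	atomic_thread_fence(memory_order_seq_cst);	/* trailing barrier */
	return (ok);
}
```

If failures disappear with such heavy fencing, that points at missing ordering in the primitive; if they persist, the locking code's handling of a failed (possibly stale) value becomes the more likely suspect, which matches the two possibilities named above.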
--
Mateusz Guzik
Re: svn commit: r313268 - head/sys/kern [through -r313271 for atomic_fcmpset use and later: fails on PowerMac G5 "Quad Core"; -r313266 works]
On 2017-Feb-18, at 4:18 AM, Mark Millard wrote:

> [Note: I experiment with clang based powerpc64 builds,
> reporting problems that I find. Justin is familiar
> with this, as is Nathan.]
>
> I tried to update the PowerMac G5 (a so-called "Quad Core")
> that I have access to from head -r312761 to -r313864 and
> ended up with random panics and hang ups in fairly short
> order after booting.
>
> Some approximate bisecting for the kernel led to:
> (sometimes getting part way into a buildkernel attempt
> for a different version before a failure happens)
>
> -r313266: works (just before use of atomic_fcmpset)
> vs.
> -r313271: fails (last of the "use atomic_fcmpset" check-ins)
>
> (I did not try -r313268 through -r313270 as the use was
> gradually added.)
>
> So I'm currently running a -r313864 world with a -r313266
> kernel.
>
> No kernel that I tried that was from before -r313266 had the
> problems.
>
> Any kernel that I tried that was from after -r313271 had the
> problems.
>
> Of course I did not try them all in other direction. :)

[Of course: "either direction".]

I'll note that the -r313864 buildworld was without
MALLOC_PRODUCTION being defined. (Unusual for me but
I'm testing if a jemalloc assert problem on arm64
also happens on powerpc64.)

By contrast the buildkernels were production style
(as is normal for me unless I'm trying to track
something down that I think might be exposed by
the extra checks).

===
Mark Millard
markmi at dsl-only.net
Re: svn commit: r313268 - head/sys/kern [through -r313271 for atomic_fcmpset use and later: fails on PowerMac G5 "Quad Core"; -r313266 works]
[Note: I experiment with clang based powerpc64 builds,
reporting problems that I find. Justin is familiar
with this, as is Nathan.]

I tried to update the PowerMac G5 (a so-called "Quad Core")
that I have access to from head -r312761 to -r313864 and
ended up with random panics and hang ups in fairly short
order after booting.

Some approximate bisecting for the kernel led to:
(sometimes getting part way into a buildkernel attempt
for a different version before a failure happens)

-r313266: works (just before use of atomic_fcmpset)
vs.
-r313271: fails (last of the "use atomic_fcmpset" check-ins)

(I did not try -r313268 through -r313270 as the use was
gradually added.)

So I'm currently running a -r313864 world with a -r313266
kernel.

No kernel that I tried that was from before -r313266 had the
problems.

Any kernel that I tried that was from after -r313271 had the
problems.

Of course I did not try them all in other direction. :)

===
Mark Millard
markmi at dsl-only.net