Re: signal 8 (floating point exception) upon resume
On Saturday, March 29, 2014 8:23:44 pm Adrian Chadd wrote: ... nope, just had a process die from SIGFPE. Does it still trigger SIGFPE if you suspend/resume a UP kernel? -a On 29 March 2014 07:32, Adrian Chadd adr...@freebsd.org wrote: Hi! On 26 March 2014 12:00, John Baldwin j...@freebsd.org wrote: i386_fpu_suspend3.patch at the same URL builds for me. I've not had the kernel lose the plot yet with SIGFPE's. I'll do some further testing and let you know if that changes. -a -- John Baldwin ___ freebsd-current@freebsd.org mailing list http://lists.freebsd.org/mailman/listinfo/freebsd-current To unsubscribe, send any mail to freebsd-current-unsubscr...@freebsd.org
Re: signal 8 (floating point exception) upon resume
Hi! On 26 March 2014 12:00, John Baldwin j...@freebsd.org wrote: i386_fpu_suspend3.patch at the same URL builds for me. I've not had the kernel lose the plot yet with SIGFPE's. I'll do some further testing and let you know if that changes. -a ___ freebsd-current@freebsd.org mailing list http://lists.freebsd.org/mailman/listinfo/freebsd-current To unsubscribe, send any mail to freebsd-current-unsubscr...@freebsd.org
Re: signal 8 (floating point exception) upon resume
... nope, just had a process die from SIGFPE. -a On 29 March 2014 07:32, Adrian Chadd adr...@freebsd.org wrote: Hi! On 26 March 2014 12:00, John Baldwin j...@freebsd.org wrote: i386_fpu_suspend3.patch at the same URL builds for me. I've not had the kernel lose the plot yet with SIGFPE's. I'll do some further testing and let you know if that changes. -a ___ freebsd-current@freebsd.org mailing list http://lists.freebsd.org/mailman/listinfo/freebsd-current To unsubscribe, send any mail to freebsd-current-unsubscr...@freebsd.org
Re: signal 8 (floating point exception) upon resume
On Tuesday, March 25, 2014 5:38:50 pm Adrian Chadd wrote: On 25 March 2014 12:46, John Baldwin j...@freebsd.org wrote: On Sunday, March 23, 2014 4:41:24 pm Adrian Chadd wrote: [snip] Hi, As part of this thread, a whole lot of stuff was thrown around to try and fix / improve the correctness of this. But it still happens to me in -HEAD i386. I updated to r263418 and it's now doing it around 30-50% of the time I resume. Yes, nothing has changed in HEAD. So, since I really am trying to avoid getting neck deep in learning (by myself) a new thing right now, would someone be willing to help me through the process of (a) learning how this is all supposed to work (which thanks to jhb and bde, I think I've learnt from the posts in this thread) and (b) some things to try out? I'll be able to report the results of this pretty quickly. You can try www.freebsd.org/~jhb/patches/i386_fpu_suspend2.patch. You could have tried the first patch I posted here earlier when I first posted it as well. :) There was a lot of chatter, I thought it was prudent to let it all settle before jumping in. Anyway: You do understand C well enough to fix simple typos? I don't have any i386 machines around, but I'll work on cross-building. --- npx.o --- /usr/home/adrian/work/freebsd/head/src/sys/i386/isa/npx.c:769:18: error: declaration of 'union safefpu' will not be visible outside of this function [-Werror,-Wvisibility] npxsuspend(union safefpu *addr) ^ s/safe/save/ /usr/home/adrian/work/freebsd/head/src/sys/i386/isa/npx.c:779:8: error: implicit declaration of function 'rcr' is invalid in C99 [-Werror,-Wimplicit-function-declaration] cr0 = rcr(0); ^ Probably just need #include machine/cpufunc.h -- John Baldwin ___ freebsd-current@freebsd.org mailing list http://lists.freebsd.org/mailman/listinfo/freebsd-current To unsubscribe, send any mail to freebsd-current-unsubscr...@freebsd.org
Re: signal 8 (floating point exception) upon resume
On Wednesday, March 26, 2014 1:43:22 pm John Baldwin wrote: On Tuesday, March 25, 2014 5:38:50 pm Adrian Chadd wrote: On 25 March 2014 12:46, John Baldwin j...@freebsd.org wrote: On Sunday, March 23, 2014 4:41:24 pm Adrian Chadd wrote: [snip] Hi, As part of this thread, a whole lot of stuff was thrown around to try and fix / improve the correctness of this. But it still happens to me in -HEAD i386. I updated to r263418 and it's now doing it around 30-50% of the time I resume. Yes, nothing has changed in HEAD. So, since I really am trying to avoid getting neck deep in learning (by myself) a new thing right now, would someone be willing to help me through the process of (a) learning how this is all supposed to work (which thanks to jhb and bde, I think I've learnt from the posts in this thread) and (b) some things to try out? I'll be able to report the results of this pretty quickly. You can try www.freebsd.org/~jhb/patches/i386_fpu_suspend2.patch. You could have tried the first patch I posted here earlier when I first posted it as well. :) There was a lot of chatter, I thought it was prudent to let it all settle before jumping in. Anyway: You do understand C well enough to fix simple typos? I don't have any i386 machines around, but I'll work on cross-building. i386_fpu_suspend3.patch at the same URL builds for me. -- John Baldwin ___ freebsd-current@freebsd.org mailing list http://lists.freebsd.org/mailman/listinfo/freebsd-current To unsubscribe, send any mail to freebsd-current-unsubscr...@freebsd.org
Re: signal 8 (floating point exception) upon resume
On Sunday, March 23, 2014 4:41:24 pm Adrian Chadd wrote: [snip] Hi, As part of this thread, a whole lot of stuff was thrown around to try and fix / improve the correctness of this. But it still happens to me in -HEAD i386. I updated to r263418 and it's now doing it around 30-50% of the time I resume. Yes, nothing has changed in HEAD. So, since I really am trying to avoid getting neck deep in learning (by myself) a new thing right now, would someone be willing to help me through the process of (a) learning how this is all supposed to work (which thanks to jhb and bde, I think I've learnt from the posts in this thread) and (b) some things to try out? I'll be able to report the results of this pretty quickly. You can try www.freebsd.org/~jhb/patches/i386_fpu_suspend2.patch. You could have tried the first patch I posted here earlier when I first posted it as well. :) -- John Baldwin ___ freebsd-current@freebsd.org mailing list http://lists.freebsd.org/mailman/listinfo/freebsd-current To unsubscribe, send any mail to freebsd-current-unsubscr...@freebsd.org
Re: signal 8 (floating point exception) upon resume
On 25 March 2014 12:46, John Baldwin j...@freebsd.org wrote: On Sunday, March 23, 2014 4:41:24 pm Adrian Chadd wrote: [snip] Hi, As part of this thread, a whole lot of stuff was thrown around to try and fix / improve the correctness of this. But it still happens to me in -HEAD i386. I updated to r263418 and it's now doing it around 30-50% of the time I resume. Yes, nothing has changed in HEAD. So, since I really am trying to avoid getting neck deep in learning (by myself) a new thing right now, would someone be willing to help me through the process of (a) learning how this is all supposed to work (which thanks to jhb and bde, I think I've learnt from the posts in this thread) and (b) some things to try out? I'll be able to report the results of this pretty quickly. You can try www.freebsd.org/~jhb/patches/i386_fpu_suspend2.patch. You could have tried the first patch I posted here earlier when I first posted it as well. :) There was a lot of chatter, I thought it was prudent to let it all settle before jumping in. Anyway: --- npx.o --- /usr/home/adrian/work/freebsd/head/src/sys/i386/isa/npx.c:769:18: error: declaration of 'union safefpu' will not be visible outside of this function [-Werror,-Wvisibility] npxsuspend(union safefpu *addr) ^ /usr/home/adrian/work/freebsd/head/src/sys/i386/isa/npx.c:769:1: error: conflicting types for 'npxsuspend' npxsuspend(union safefpu *addr) ^ ./machine/npx.h:59:6: note: previous declaration is here voidnpxsuspend(union savefpu *addr); ^ /usr/home/adrian/work/freebsd/head/src/sys/i386/isa/npx.c:776:9: error: incomplete type 'union safefpu' is not assignable *addr = npx_initialstate; ~ ^ /usr/home/adrian/work/freebsd/head/src/sys/i386/isa/npx.c:769:18: note: forward declaration of 'union safefpu' npxsuspend(union safefpu *addr) ^ /usr/home/adrian/work/freebsd/head/src/sys/i386/isa/npx.c:779:8: error: implicit declaration of function 'rcr' is invalid in C99 [-Werror,-Wimplicit-function-declaration] cr0 = rcr(0); ^ /usr/home/adrian/work/freebsd/head/src/sys/i386/isa/npx.c:781:10: error: incompatible pointer types passing 'union safefpu *' to parameter of type 'union savefpu *' [-Werror,-Wincompatible-pointer-types] fpusave(addr); ^~~~ /usr/home/adrian/work/freebsd/head/src/sys/i386/isa/npx.c:160:36: note: passing argument to parameter here static voidfpusave(union savefpu *); ^ /usr/home/adrian/work/freebsd/head/src/sys/i386/isa/npx.c:782:2: error: implicit declaration of function 'load_cr' is invalid in C99 [-Werror,-Wimplicit-function-declaration] load_cr(0, cr0); ^ 6 errors generated. *** [npx.o] Error code 1 -a ___ freebsd-current@freebsd.org mailing list http://lists.freebsd.org/mailman/listinfo/freebsd-current To unsubscribe, send any mail to freebsd-current-unsubscr...@freebsd.org
Re: signal 8 (floating point exception) upon resume
[snip] Hi, As part of this thread, a whole lot of stuff was thrown around to try and fix / improve the correctness of this. But it still happens to me in -HEAD i386. I updated to r263418 and it's now doing it around 30-50% of the time I resume. So, since I really am trying to avoid getting neck deep in learning (by myself) a new thing right now, would someone be willing to help me through the process of (a) learning how this is all supposed to work (which thanks to jhb and bde, I think I've learnt from the posts in this thread) and (b) some things to try out? I'll be able to report the results of this pretty quickly. I'd like to start work on supporting and power efficiency stuff on some of the chromebook and tablet hardware using Intel stuff but it's going to be totally moot if i386 suspend/resume (and vt/xorg, but that's a different thread) is this busted. :-) Thanks, -a ___ freebsd-current@freebsd.org mailing list http://lists.freebsd.org/mailman/listinfo/freebsd-current To unsubscribe, send any mail to freebsd-current-unsubscr...@freebsd.org
Re: signal 8 (floating point exception) upon resume
On Mon, 10 Mar 2014, John Baldwin wrote: On Tuesday, March 04, 2014 4:50:01 pm Bruce Evans wrote: On Tue, 4 Mar 2014, John Baldwin wrote: % Index: i386/i386/swtch.s % === % --- i386/i386/swtch.s (revision 262711) % +++ i386/i386/swtch.s (working copy) [...savectx()] This function is mostly bogus (see old mails). I was going off of the commit logs for amd64 that removed this code as savectx() is not used for fork(), only for IPI_STOP and suspend/resume. Without fxsave, npxsuspend() cannot be atomic without locking, since fnsave destroys the state in the FPU and you either need a lock to reload the old state atomically enough, or a lock to modify FPCURTHREAD atomically enough. save_ctx() is now only called from IPI handlers or when doing suspend in which case we shouldn't have to worry about being preempted. I don't understand the suspend part. Is sufficient locking held througout suspend/resume to prevent states changing after they have been saved here? % @@ -520,7 +490,16 @@ % movl%eax,%dr7 % % #ifdef DEV_NPX % - /* XXX FIX ME */ % + /* Restore FPU state */ Is the problem just this missing functionality? Possibly. I now think it was just the clobbering of %cr0 so i386 never had the problem. I think on amd64 there was also the desire to have the pcb state be meaningful in dumps (since we IPI_STOP before a dump). OTOH, It should also be meaningful in debuggers. Hopefully stop IPIs put it there form all stopped CPUs. I think it remains in the FPU for the running CPU. the current approach used by amd64 (and this patch for i386) is to not dirty fpcurthread's state during save_ctx(), but to instead leave fpcurthread alone and explicitly save whatever state the FPU is in in the PCB used for IPI_STOP or suspend. Hmm, if kernel debuggers actually supported displaying the FPU state, then they would prefer to find it in the PCB only (after debugger entry puts it there), but this doesn't work in places like the dna trap handler. Similarly for IPIs and suspend. The dna trap handler would be broken unless any saving in the PCB is undone when normal operation is resumed, and it seems more difficult to undo it than to save specially so as not to have anything to undo. It is OK to save in the usual place in the PCB so that debuggers can find it more easily (since that place is not used in normal operation), but not to change the state in the CPU+FPU across the operation. Harmful state changes in the CPU+FPU include toggling CR0_TS and implicit fninit. For suspend/resume, we have no option but to undo everything, since other things may clobber the state. % @@ -761,7 +761,34 @@ % PCPU_SET(fpcurthread, NULL); % } % % +/* % + * Unconditionally save the current co-processor state across suspend and % + * resume. % + */ % void % +npxsuspend(union safefpu *addr) % +{ % + register_t cr0; % + % + if (!hw_float) % + return; % + cr0 = rcr(0); % + clts(); % + fpusave(addr); % + load_cr(0, cr0); % +} In the !fxsave case, this destroys the state in the npx, leaving fpcurthread invalid. It also does the save when the state in the npx is inactive. I think jkim intentionally this state so that resume can load it unconditionally. It must be arranged that there are no interactions with fpcurthread. Given the single-threaded nature of suspend/resume and IPI_STOP / restart_cpus(), those requirements are met, so it should be safe to resume whatever state was in the FPU and leave fpcurthread unchanged. Is the whole suspend/resume really locked? This doesn't work so well without fxsave. When fpcurthread != NULL, reloading CR0 keeps CR0_TS and thus ensures that inconsistent state lives for longer. Things will only be OK if fpcurthread isn't changed until resume. After the save_ctx() the CPU is going to either resume without doing a resume_ctx (IPI_STOP case) leaving fpcurthread unchanged (so save_ctx() just grabbed a snapshot of the FPU state for debugging purposes) or the CPU is going to power off for suspend. If it doesn't restore for IPI_STOP, then it will continue with the state clobbered by fnsave in the !fxsr case. That is rare but can happen. Most CPUs that have IPIs also have fxsr. But on at least i386, there is an option to disable fxsr. During resume it will invoke resume_ctx() which will restore the FPU state (whatever state it was in) and fpcurthread and only after those are true is the CPU able to run other threads which will modify or use the FPU state. You can probably fix this by using the old code here. The old code doesn't need the hw_float test, since fpcurthread != NULL implies hw_float != 0. Actually, I don't see any need to change anything on i386 -- after storing the state for the thread, there should be no need to store it anywhere else across suspend/resume. We intentionally use this method (even on amd64 IIRC), although it is
Re: signal 8 (floating point exception) upon resume
On Tuesday, March 04, 2014 4:50:01 pm Bruce Evans wrote: On Tue, 4 Mar 2014, John Baldwin wrote: % Index: i386/i386/swtch.s % === % --- i386/i386/swtch.s (revision 262711) % +++ i386/i386/swtch.s (working copy) % @@ -417,42 +417,9 @@ % str PCB_TR(%ecx) % % #ifdef DEV_NPX % - /* % -* If fpcurthread == NULL, then the npx h/w state is irrelevant and the % -* state had better already be in the pcb. This is true for forks % -* but not for dumps (the old book-keeping with FP flags in the pcb % -* always lost for dumps because the dump pcb has 0 flags). % -* % -* If fpcurthread != NULL, then we have to save the npx h/w state to % -* fpcurthread's pcb and copy it to the requested pcb, or save to the % -* requested pcb and reload. Copying is easier because we would % -* have to handle h/w bugs for reloading. We used to lose the % -* parent's npx state for forks by forgetting to reload. % -*/ This function is mostly bogus (see old mails). I was going off of the commit logs for amd64 that removed this code as savectx() is not used for fork(), only for IPI_STOP and suspend/resume. Without fxsave, npxsuspend() cannot be atomic without locking, since fnsave destroys the state in the FPU and you either need a lock to reload the old state atomically enough, or a lock to modify FPCURTHREAD atomically enough. save_ctx() is now only called from IPI handlers or when doing suspend in which case we shouldn't have to worry about being preempted. % % movl$1,%eax % ... % @@ -520,7 +490,16 @@ % movl%eax,%dr7 % % #ifdef DEV_NPX % - /* XXX FIX ME */ % + /* Restore FPU state */ Is the problem just this missing functionality? Possibly. I think on amd64 there was also the desire to have the pcb state be meaningful in dumps (since we IPI_STOP before a dump). OTOH, the current approach used by amd64 (and this patch for i386) is to not dirty fpcurthread's state during save_ctx(), but to instead leave fpcurthread alone and explicitly save whatever state the FPU is in in the PCB used for IPI_STOP or suspend. % @@ -761,7 +761,34 @@ % PCPU_SET(fpcurthread, NULL); % } % % +/* % + * Unconditionally save the current co-processor state across suspend and % + * resume. % + */ % void % +npxsuspend(union safefpu *addr) % +{ % + register_t cr0; % + % + if (!hw_float) % + return; % + cr0 = rcr(0); % + clts(); % + fpusave(addr); % + load_cr(0, cr0); % +} In the !fxsave case, this destroys the state in the npx, leaving fpcurthread invalid. It also does the save when the state in the npx is inactive. I think jkim intentionally this state so that resume can load it unconditionally. It must be arranged that there are no interactions with fpcurthread. Given the single-threaded nature of suspend/resume and IPI_STOP / restart_cpus(), those requirements are met, so it should be safe to resume whatever state was in the FPU and leave fpcurthread unchanged. This doesn't work so well without fxsave. When fpcurthread != NULL, reloading CR0 keeps CR0_TS and thus ensures that inconsistent state lives for longer. Things will only be OK if fpcurthread isn't changed until resume. After the save_ctx() the CPU is going to either resume without doing a resume_ctx (IPI_STOP case) leaving fpcurthread unchanged (so save_ctx() just grabbed a snapshot of the FPU state for debugging purposes) or the CPU is going to power off for suspend. During resume it will invoke resume_ctx() which will restore the FPU state (whatever state it was in) and fpcurthread and only after those are true is the CPU able to run other threads which will modify or use the FPU state. You can probably fix this by using the old code here. The old code doesn't need the hw_float test, since fpcurthread != NULL implies hw_float != 0. Actually, I don't see any need to change anything on i386 -- after storing the state for the thread, there should be no need to store it anywhere else across suspend/resume. We intentionally use this method (even on amd64 IIRC), although it is suboptimal, to reduce complications for context switchres and signal handling. npxsave() takes an address, but savectx() didn't abuse this to store directly in the special save area. It made npxsave() store in the pcb, and then copied to the special area. So I guess that is one option is to always clear fpcurthread during suspend and just do an fninit on resume. However, I chose to match what amd64 does for now. I did make one change locally which was to not bother saving the FPU state if fpcurthread was NULL during save_ctx, but to instead store a copy of 'npx_initial_state' in the pcb instead. This is then loaded into the FPU on resume. Is that sufficient for the !fxsave case? -- John Baldwin ___
Re: signal 8 (floating point exception) upon resume
On Monday, March 03, 2014 6:49:08 pm Adrian Chadd wrote: I'll try this soon. I had it fail back to newcons, rather than Xorg normally dying without restoring state. It wouldn't let me spawn a shell. Logging in worked fine, but normal shell exec would eventually and quickly lead to failure, dropping me back to the login prompt. If you have set CPUTYPE in /etc/src.conf such that your userland binaries are built with SSE, etc. then I expect most things to break because the FPU is in a funky state without this patch. I suspect if you don't set CPUTYPE so that your userland binaries do not use the FPU, you can probably resume just fine without this fix. -a On 3 March 2014 11:11, John Baldwin j...@freebsd.org wrote: On Friday, February 28, 2014 9:00:57 pm Adrian Chadd wrote: On 28 February 2014 15:35, Adrian Chadd adr...@freebsd.org wrote: ... how'd this ever work in the past then? .. and I've submitted it as a PR: kern/187152 Complete stab in the dark (not compile tested) here: http://www.FreeBSD.org/~jhb/patches/i386_fpu_suspend.patch -- John Baldwin -- John Baldwin ___ freebsd-current@freebsd.org mailing list http://lists.freebsd.org/mailman/listinfo/freebsd-current To unsubscribe, send any mail to freebsd-current-unsubscr...@freebsd.org
Re: signal 8 (floating point exception) upon resume
I'm not using anything in /etc/src.conf . -a On 4 March 2014 08:24, John Baldwin j...@freebsd.org wrote: On Monday, March 03, 2014 6:49:08 pm Adrian Chadd wrote: I'll try this soon. I had it fail back to newcons, rather than Xorg normally dying without restoring state. It wouldn't let me spawn a shell. Logging in worked fine, but normal shell exec would eventually and quickly lead to failure, dropping me back to the login prompt. If you have set CPUTYPE in /etc/src.conf such that your userland binaries are built with SSE, etc. then I expect most things to break because the FPU is in a funky state without this patch. I suspect if you don't set CPUTYPE so that your userland binaries do not use the FPU, you can probably resume just fine without this fix. -a On 3 March 2014 11:11, John Baldwin j...@freebsd.org wrote: On Friday, February 28, 2014 9:00:57 pm Adrian Chadd wrote: On 28 February 2014 15:35, Adrian Chadd adr...@freebsd.org wrote: ... how'd this ever work in the past then? .. and I've submitted it as a PR: kern/187152 Complete stab in the dark (not compile tested) here: http://www.FreeBSD.org/~jhb/patches/i386_fpu_suspend.patch -- John Baldwin -- John Baldwin ___ freebsd-current@freebsd.org mailing list http://lists.freebsd.org/mailman/listinfo/freebsd-current To unsubscribe, send any mail to freebsd-current-unsubscr...@freebsd.org
Re: signal 8 (floating point exception) upon resume
-BEGIN PGP SIGNED MESSAGE- Hash: SHA1 On 2014-03-04 11:24:04 -0500, John Baldwin wrote: On Monday, March 03, 2014 6:49:08 pm Adrian Chadd wrote: I'll try this soon. I had it fail back to newcons, rather than Xorg normally dying without restoring state. It wouldn't let me spawn a shell. Logging in worked fine, but normal shell exec would eventually and quickly lead to failure, dropping me back to the login prompt. If you have set CPUTYPE in /etc/src.conf such that your userland binaries are built with SSE, etc. then I expect most things to break because the FPU is in a funky state without this patch. I suspect if you don't set CPUTYPE so that your userland binaries do not use the FPU, you can probably resume just fine without this fix. -a On 3 March 2014 11:11, John Baldwin j...@freebsd.org wrote: On Friday, February 28, 2014 9:00:57 pm Adrian Chadd wrote: On 28 February 2014 15:35, Adrian Chadd adr...@freebsd.org wrote: ... how'd this ever work in the past then? .. and I've submitted it as a PR: kern/187152 Complete stab in the dark (not compile tested) here: http://www.FreeBSD.org/~jhb/patches/i386_fpu_suspend.patch The patch for sys/amd64/amd64/cpu_switch.S is committed: http://svnweb.freebsd.org/changeset/base/262746 i386 patches may be reviewed by the original author (CC'ed). Jung-uk Kim -BEGIN PGP SIGNATURE- Version: GnuPG v2.0.22 (FreeBSD) iQEcBAEBAgAGBQJTFi3RAAoJEHyflib82/FGnboH/3qrAp+Eq/0eebEP/9wU0Ke/ y4y4yvw9nDVexKZ+c5VuTxyWvK9O0w2b+r3f5kuHWferOm22NaJCctt3E/OA5Ly2 1p3ZPvqD5cRZfkdh68AwEeJv93lg84VMSUqNUfS9rsrIU+WpHpPR46sdLpq5KxSP cY2522npmoPrwk+PaTJS4uBQeaX/3vnj5996zxavwVqwlYyR+Zqgi6FhGj+F2RJ1 Ry+9icyNx/8lUfRTLCPsCBRjlUKUk/p/8bfbQK4mSef5Gd8ZAiqdyKqgdMBUYhNA ZplkpijJjvlIIc0dYSwg8gMKmaB6amgw/LJGQit9nTkBU2bOd6L05f1dCpYAxDE= =x0sS -END PGP SIGNATURE- ___ freebsd-current@freebsd.org mailing list http://lists.freebsd.org/mailman/listinfo/freebsd-current To unsubscribe, send any mail to freebsd-current-unsubscr...@freebsd.org
Re: signal 8 (floating point exception) upon resume
On Tue, 4 Mar 2014, John Baldwin wrote: On Monday, March 03, 2014 6:49:08 pm Adrian Chadd wrote: I'll try this soon. I had it fail back to newcons, rather than Xorg normally dying without restoring state. It wouldn't let me spawn a shell. Logging in worked fine, but normal shell exec would eventually and quickly lead to failure, dropping me back to the login prompt. If you have set CPUTYPE in /etc/src.conf such that your userland binaries are built with SSE, etc. then I expect most things to break because the FPU is in a funky state without this patch. I suspect if you don't set CPUTYPE so that your userland binaries do not use the FPU, you can probably resume just fine without this fix. Non-SSE FPU state might be broken too. Complete stab in the dark (not compile tested) here: http://www.FreeBSD.org/~jhb/patches/i386_fpu_suspend.patch I forget many details of how this works, but noticed that it seems to break consistency of the state for the !fxsave case and related locking. % Index: i386/i386/swtch.s % === % --- i386/i386/swtch.s (revision 262711) % +++ i386/i386/swtch.s (working copy) % @@ -417,42 +417,9 @@ % str PCB_TR(%ecx) % % #ifdef DEV_NPX % - /* % - * If fpcurthread == NULL, then the npx h/w state is irrelevant and the % - * state had better already be in the pcb. This is true for forks % - * but not for dumps (the old book-keeping with FP flags in the pcb % - * always lost for dumps because the dump pcb has 0 flags). % - * % - * If fpcurthread != NULL, then we have to save the npx h/w state to % - * fpcurthread's pcb and copy it to the requested pcb, or save to the % - * requested pcb and reload. Copying is easier because we would % - * have to handle h/w bugs for reloading. We used to lose the % - * parent's npx state for forks by forgetting to reload. % - */ This function is mostly bogus (see old mails). % - pushfl % - CLI % - movlPCPU(FPCURTHREAD),%eax % - testl %eax,%eax % - je 1f This CLI/STI locking is bogus. Accesses to FPCURTHREAD are now locked by critical_enter(), as on amd64, and perhaps a higher level already did critical_enter() or even CLI. (CLI/STI in swtch.s seems to be bogus too. amd64 doesn't do it, and I think a higher level does mtx_lock_spin() which does too much, including CLI via spinlock_enter().) % - % - pushl %ecx % - movlTD_PCB(%eax),%eax % - movlPCB_SAVEFPU(%eax),%eax % - pushl %eax % - pushl %eax % - callnpxsave % + pushl PCB_FPUSUSPEND(%ecx) % + callnpxsuspend Without fxsave, npxsuspend() cannot be atomic without locking, since fnsave destroys the state in the FPU and you either need a lock to reload the old state atomically enough, or a lock to modify FPCURTHREAD atomically enough. Reloading the old state is problematic because the reload might trap. So the old version uses the second method. It calls npxsave() to handle most of the details. But npxsave() was designed to be efficient for its usual use in cpu_switch(), so it doesn't handle the detail of checking FPCURTHREAD or the locking needed for this check, so the above code had to handle these details. % addl$4,%esp % - popl%eax % - popl%ecx % - % - pushl $PCB_SAVEFPU_SIZE % - lealPCB_USERFPU(%ecx),%ecx % - pushl %ecx % - pushl %eax % - callbcopy % - addl$12,%esp % -1: % - popfl % #endif /* DEV_NPX */ This probably should never have been written in asm. Only the similar code in cpu_switch() is time-critical. % % movl $1,%eax % ... % @@ -520,7 +490,16 @@ % movl%eax,%dr7 % % #ifdef DEV_NPX % - /* XXX FIX ME */ % + /* Restore FPU state */ Is the problem just this missing functionality? % ... % Index: i386/isa/npx.c % === % --- i386/isa/npx.c(revision 262711) % +++ i386/isa/npx.c(working copy) This has many vestiges of support for interrupt handling (mainly in comments and in complications in the probe). CLI/STI was used for locking partly to reduce complications for the IRQ13 case. The comment before npxsave() still says that it needs CLI/STI locking by callers, but it actually needs critical_enter() locking and most callers only provided that. % @@ -761,7 +761,34 @@ % PCPU_SET(fpcurthread, NULL); % } % % +/* % + * Unconditionally save the current co-processor state across suspend and % + * resume. % + */ % void % +npxsuspend(union safefpu *addr) % +{ % + register_t cr0; % + % + if (!hw_float) % + return; % + cr0 = rcr(0); % + clts(); % + fpusave(addr); % + load_cr(0, cr0); % +} In the !fxsave case, this destroys the state in the npx, leaving fpcurthread invalid. It also does the save when the state in the npx is inactive. I think jkim
Re: signal 8 (floating point exception) upon resume
On Friday, February 28, 2014 9:00:57 pm Adrian Chadd wrote: On 28 February 2014 15:35, Adrian Chadd adr...@freebsd.org wrote: ... how'd this ever work in the past then? .. and I've submitted it as a PR: kern/187152 Complete stab in the dark (not compile tested) here: http://www.FreeBSD.org/~jhb/patches/i386_fpu_suspend.patch -- John Baldwin ___ freebsd-current@freebsd.org mailing list http://lists.freebsd.org/mailman/listinfo/freebsd-current To unsubscribe, send any mail to freebsd-current-unsubscr...@freebsd.org
Re: signal 8 (floating point exception) upon resume
I'll try this soon. I had it fail back to newcons, rather than Xorg normally dying without restoring state. It wouldn't let me spawn a shell. Logging in worked fine, but normal shell exec would eventually and quickly lead to failure, dropping me back to the login prompt. -a On 3 March 2014 11:11, John Baldwin j...@freebsd.org wrote: On Friday, February 28, 2014 9:00:57 pm Adrian Chadd wrote: On 28 February 2014 15:35, Adrian Chadd adr...@freebsd.org wrote: ... how'd this ever work in the past then? .. and I've submitted it as a PR: kern/187152 Complete stab in the dark (not compile tested) here: http://www.FreeBSD.org/~jhb/patches/i386_fpu_suspend.patch -- John Baldwin ___ freebsd-current@freebsd.org mailing list http://lists.freebsd.org/mailman/listinfo/freebsd-current To unsubscribe, send any mail to freebsd-current-unsubscr...@freebsd.org
Re: signal 8 (floating point exception) upon resume
On Friday, February 28, 2014 1:15:45 pm Adrian Chadd wrote: Hi, On my i386 -HEAD laptops (running -HEAD as of last night, but it's been a problem for a while) I occasionally hit a point where I get an FPE on _all_ processes upon resume. I can still do a clean shutdown through the power-button method, but I can't do anything else. Has anyone seen this before? Does anyone have an inkling of an idea why I'd be getting FPE's for things like ps and sh? I'm guessing fpcurthread is stale. We should probably be flushing the FPU state on suspend and starting off without any FPU state on resume. Ah, see this bit here in x86/acpica/acpi_wakeup.c: int acpi_sleep_machdep(struct acpi_softc *sc, int state) { ... if (savectx(susppcbs[0])) { #ifdef __amd64__ ctx_fpusave(susppcbs[0]-pcb_fpususpend); #endif ... } Looks like you need to implement ctx_fpusave() for i386. kib@ did it as part of the AVX work, but I wonder if you can just steal the amd64 ctx_fpusave() and have it call npxsave() instead of fpxsave()? Not sure if you'd need it to be in asm as it is on amd64 or if you can do this in C. -- John Baldwin ___ freebsd-current@freebsd.org mailing list http://lists.freebsd.org/mailman/listinfo/freebsd-current To unsubscribe, send any mail to freebsd-current-unsubscr...@freebsd.org
Re: signal 8 (floating point exception) upon resume
... how'd this ever work in the past then? -a On 28 February 2014 13:08, John Baldwin j...@freebsd.org wrote: On Friday, February 28, 2014 1:15:45 pm Adrian Chadd wrote: Hi, On my i386 -HEAD laptops (running -HEAD as of last night, but it's been a problem for a while) I occasionally hit a point where I get an FPE on _all_ processes upon resume. I can still do a clean shutdown through the power-button method, but I can't do anything else. Has anyone seen this before? Does anyone have an inkling of an idea why I'd be getting FPE's for things like ps and sh? I'm guessing fpcurthread is stale. We should probably be flushing the FPU state on suspend and starting off without any FPU state on resume. Ah, see this bit here in x86/acpica/acpi_wakeup.c: int acpi_sleep_machdep(struct acpi_softc *sc, int state) { ... if (savectx(susppcbs[0])) { #ifdef __amd64__ ctx_fpusave(susppcbs[0]-pcb_fpususpend); #endif ... } Looks like you need to implement ctx_fpusave() for i386. kib@ did it as part of the AVX work, but I wonder if you can just steal the amd64 ctx_fpusave() and have it call npxsave() instead of fpxsave()? Not sure if you'd need it to be in asm as it is on amd64 or if you can do this in C. -- John Baldwin ___ freebsd-current@freebsd.org mailing list http://lists.freebsd.org/mailman/listinfo/freebsd-current To unsubscribe, send any mail to freebsd-current-unsubscr...@freebsd.org
Re: signal 8 (floating point exception) upon resume
On 28 February 2014 15:35, Adrian Chadd adr...@freebsd.org wrote: ... how'd this ever work in the past then? .. and I've submitted it as a PR: kern/187152 Thanks, -a -a On 28 February 2014 13:08, John Baldwin j...@freebsd.org wrote: On Friday, February 28, 2014 1:15:45 pm Adrian Chadd wrote: Hi, On my i386 -HEAD laptops (running -HEAD as of last night, but it's been a problem for a while) I occasionally hit a point where I get an FPE on _all_ processes upon resume. I can still do a clean shutdown through the power-button method, but I can't do anything else. Has anyone seen this before? Does anyone have an inkling of an idea why I'd be getting FPE's for things like ps and sh? I'm guessing fpcurthread is stale. We should probably be flushing the FPU state on suspend and starting off without any FPU state on resume. Ah, see this bit here in x86/acpica/acpi_wakeup.c: int acpi_sleep_machdep(struct acpi_softc *sc, int state) { ... if (savectx(susppcbs[0])) { #ifdef __amd64__ ctx_fpusave(susppcbs[0]-pcb_fpususpend); #endif ... } Looks like you need to implement ctx_fpusave() for i386. kib@ did it as part of the AVX work, but I wonder if you can just steal the amd64 ctx_fpusave() and have it call npxsave() instead of fpxsave()? Not sure if you'd need it to be in asm as it is on amd64 or if you can do this in C. -- John Baldwin ___ freebsd-current@freebsd.org mailing list http://lists.freebsd.org/mailman/listinfo/freebsd-current To unsubscribe, send any mail to freebsd-current-unsubscr...@freebsd.org