Re: signal 8 (floating point exception) upon resume

2014-04-10 Thread John Baldwin
On Saturday, March 29, 2014 8:23:44 pm Adrian Chadd wrote:
 ... nope, just had a process die from SIGFPE.

Does it still trigger SIGFPE if you suspend/resume a UP kernel?

 -a
 
 
 On 29 March 2014 07:32, Adrian Chadd adr...@freebsd.org wrote:
  Hi!
 
  On 26 March 2014 12:00, John Baldwin j...@freebsd.org wrote:
 
  i386_fpu_suspend3.patch at the same URL builds for me.
 
  I've not had the kernel lose the plot yet with SIGFPE's.
 
  I'll do some further testing and let you know if that changes.
 
 
 
  -a
 

-- 
John Baldwin
___
freebsd-current@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-current
To unsubscribe, send any mail to freebsd-current-unsubscr...@freebsd.org


Re: signal 8 (floating point exception) upon resume

2014-03-29 Thread Adrian Chadd
Hi!

On 26 March 2014 12:00, John Baldwin j...@freebsd.org wrote:

 i386_fpu_suspend3.patch at the same URL builds for me.

I've not had the kernel lose the plot yet with SIGFPE's.

I'll do some further testing and let you know if that changes.



-a
___
freebsd-current@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-current
To unsubscribe, send any mail to freebsd-current-unsubscr...@freebsd.org


Re: signal 8 (floating point exception) upon resume

2014-03-29 Thread Adrian Chadd
... nope, just had a process die from SIGFPE.


-a


On 29 March 2014 07:32, Adrian Chadd adr...@freebsd.org wrote:
 Hi!

 On 26 March 2014 12:00, John Baldwin j...@freebsd.org wrote:

 i386_fpu_suspend3.patch at the same URL builds for me.

 I've not had the kernel lose the plot yet with SIGFPE's.

 I'll do some further testing and let you know if that changes.



 -a
___
freebsd-current@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-current
To unsubscribe, send any mail to freebsd-current-unsubscr...@freebsd.org


Re: signal 8 (floating point exception) upon resume

2014-03-26 Thread John Baldwin
On Tuesday, March 25, 2014 5:38:50 pm Adrian Chadd wrote:
 On 25 March 2014 12:46, John Baldwin j...@freebsd.org wrote:
  On Sunday, March 23, 2014 4:41:24 pm Adrian Chadd wrote:
  [snip]
 
  Hi,
 
  As part of this thread, a whole lot of stuff was thrown around to try
  and fix / improve the correctness of this.
 
  But it still happens to me in -HEAD i386. I updated to r263418 and
  it's now doing it around 30-50% of the time I resume.
 
  Yes, nothing has changed in HEAD.
 
  So, since I really am trying to avoid getting neck deep in learning
  (by myself) a new thing right now, would someone be willing to help me
  through the process of (a) learning how this is all supposed to work
  (which thanks to jhb and bde, I think I've learnt from the posts in
  this thread) and (b) some things to try out? I'll be able to report
  the results of this pretty quickly.
 
  You can try www.freebsd.org/~jhb/patches/i386_fpu_suspend2.patch.  You
  could have tried the first patch I posted here earlier when I first
  posted it as well. :)
 
 There was a lot of chatter, I thought it was prudent to let it all
 settle before jumping in.
 
 Anyway:

You do understand C well enough to fix simple typos?  I don't have any
i386 machines around, but I'll work on cross-building.

 --- npx.o ---
 /usr/home/adrian/work/freebsd/head/src/sys/i386/isa/npx.c:769:18:
 error: declaration of 'union safefpu' will not be visible outside of
 this function [-Werror,-Wvisibility]
 npxsuspend(union safefpu *addr)
  ^

s/safe/save/

 /usr/home/adrian/work/freebsd/head/src/sys/i386/isa/npx.c:779:8:
 error: implicit declaration of function 'rcr' is invalid in C99
 [-Werror,-Wimplicit-function-declaration]
 cr0 = rcr(0);
   ^

Probably just need #include machine/cpufunc.h

-- 
John Baldwin
___
freebsd-current@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-current
To unsubscribe, send any mail to freebsd-current-unsubscr...@freebsd.org


Re: signal 8 (floating point exception) upon resume

2014-03-26 Thread John Baldwin
On Wednesday, March 26, 2014 1:43:22 pm John Baldwin wrote:
 On Tuesday, March 25, 2014 5:38:50 pm Adrian Chadd wrote:
  On 25 March 2014 12:46, John Baldwin j...@freebsd.org wrote:
   On Sunday, March 23, 2014 4:41:24 pm Adrian Chadd wrote:
   [snip]
  
   Hi,
  
   As part of this thread, a whole lot of stuff was thrown around to try
   and fix / improve the correctness of this.
  
   But it still happens to me in -HEAD i386. I updated to r263418 and
   it's now doing it around 30-50% of the time I resume.
  
   Yes, nothing has changed in HEAD.
  
   So, since I really am trying to avoid getting neck deep in learning
   (by myself) a new thing right now, would someone be willing to help me
   through the process of (a) learning how this is all supposed to work
   (which thanks to jhb and bde, I think I've learnt from the posts in
   this thread) and (b) some things to try out? I'll be able to report
   the results of this pretty quickly.
  
   You can try www.freebsd.org/~jhb/patches/i386_fpu_suspend2.patch.  You
   could have tried the first patch I posted here earlier when I first
   posted it as well. :)
  
  There was a lot of chatter, I thought it was prudent to let it all
  settle before jumping in.
  
  Anyway:
 
 You do understand C well enough to fix simple typos?  I don't have any
 i386 machines around, but I'll work on cross-building.

i386_fpu_suspend3.patch at the same URL builds for me.

-- 
John Baldwin
___
freebsd-current@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-current
To unsubscribe, send any mail to freebsd-current-unsubscr...@freebsd.org


Re: signal 8 (floating point exception) upon resume

2014-03-25 Thread John Baldwin
On Sunday, March 23, 2014 4:41:24 pm Adrian Chadd wrote:
 [snip]
 
 Hi,
 
 As part of this thread, a whole lot of stuff was thrown around to try
 and fix / improve the correctness of this.
 
 But it still happens to me in -HEAD i386. I updated to r263418 and
 it's now doing it around 30-50% of the time I resume.

Yes, nothing has changed in HEAD.

 So, since I really am trying to avoid getting neck deep in learning
 (by myself) a new thing right now, would someone be willing to help me
 through the process of (a) learning how this is all supposed to work
 (which thanks to jhb and bde, I think I've learnt from the posts in
 this thread) and (b) some things to try out? I'll be able to report
 the results of this pretty quickly.

You can try www.freebsd.org/~jhb/patches/i386_fpu_suspend2.patch.  You
could have tried the first patch I posted here earlier when I first
posted it as well. :)

-- 
John Baldwin
___
freebsd-current@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-current
To unsubscribe, send any mail to freebsd-current-unsubscr...@freebsd.org


Re: signal 8 (floating point exception) upon resume

2014-03-25 Thread Adrian Chadd
On 25 March 2014 12:46, John Baldwin j...@freebsd.org wrote:
 On Sunday, March 23, 2014 4:41:24 pm Adrian Chadd wrote:
 [snip]

 Hi,

 As part of this thread, a whole lot of stuff was thrown around to try
 and fix / improve the correctness of this.

 But it still happens to me in -HEAD i386. I updated to r263418 and
 it's now doing it around 30-50% of the time I resume.

 Yes, nothing has changed in HEAD.

 So, since I really am trying to avoid getting neck deep in learning
 (by myself) a new thing right now, would someone be willing to help me
 through the process of (a) learning how this is all supposed to work
 (which thanks to jhb and bde, I think I've learnt from the posts in
 this thread) and (b) some things to try out? I'll be able to report
 the results of this pretty quickly.

 You can try www.freebsd.org/~jhb/patches/i386_fpu_suspend2.patch.  You
 could have tried the first patch I posted here earlier when I first
 posted it as well. :)

There was a lot of chatter, I thought it was prudent to let it all
settle before jumping in.

Anyway:

--- npx.o ---
/usr/home/adrian/work/freebsd/head/src/sys/i386/isa/npx.c:769:18:
error: declaration of 'union safefpu' will not be visible outside of
this function [-Werror,-Wvisibility]
npxsuspend(union safefpu *addr)
 ^
/usr/home/adrian/work/freebsd/head/src/sys/i386/isa/npx.c:769:1:
error: conflicting types for 'npxsuspend'
npxsuspend(union safefpu *addr)
^
./machine/npx.h:59:6: note: previous declaration is here
voidnpxsuspend(union savefpu *addr);
^
/usr/home/adrian/work/freebsd/head/src/sys/i386/isa/npx.c:776:9:
error: incomplete type 'union safefpu' is not assignable
*addr = npx_initialstate;
~ ^
/usr/home/adrian/work/freebsd/head/src/sys/i386/isa/npx.c:769:18:
note: forward declaration of 'union safefpu'
npxsuspend(union safefpu *addr)
 ^
/usr/home/adrian/work/freebsd/head/src/sys/i386/isa/npx.c:779:8:
error: implicit declaration of function 'rcr' is invalid in C99
[-Werror,-Wimplicit-function-declaration]
cr0 = rcr(0);
  ^
/usr/home/adrian/work/freebsd/head/src/sys/i386/isa/npx.c:781:10:
error: incompatible pointer types passing 'union safefpu *' to
parameter of type 'union savefpu *'
[-Werror,-Wincompatible-pointer-types]
fpusave(addr);
^~~~
/usr/home/adrian/work/freebsd/head/src/sys/i386/isa/npx.c:160:36:
note: passing argument to parameter here
static  voidfpusave(union savefpu *);
   ^
/usr/home/adrian/work/freebsd/head/src/sys/i386/isa/npx.c:782:2:
error: implicit declaration of function 'load_cr' is invalid in C99
[-Werror,-Wimplicit-function-declaration]
load_cr(0, cr0);
^
6 errors generated.
*** [npx.o] Error code 1



-a
___
freebsd-current@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-current
To unsubscribe, send any mail to freebsd-current-unsubscr...@freebsd.org


Re: signal 8 (floating point exception) upon resume

2014-03-23 Thread Adrian Chadd
[snip]

Hi,

As part of this thread, a whole lot of stuff was thrown around to try
and fix / improve the correctness of this.

But it still happens to me in -HEAD i386. I updated to r263418 and
it's now doing it around 30-50% of the time I resume.

So, since I really am trying to avoid getting neck deep in learning
(by myself) a new thing right now, would someone be willing to help me
through the process of (a) learning how this is all supposed to work
(which thanks to jhb and bde, I think I've learnt from the posts in
this thread) and (b) some things to try out? I'll be able to report
the results of this pretty quickly.

I'd like to start work on supporting and power efficiency stuff on
some of the chromebook and tablet hardware using Intel stuff but it's
going to be totally moot if i386 suspend/resume (and vt/xorg, but
that's a different thread) is this busted. :-)

Thanks,


-a
___
freebsd-current@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-current
To unsubscribe, send any mail to freebsd-current-unsubscr...@freebsd.org


Re: signal 8 (floating point exception) upon resume

2014-03-11 Thread Bruce Evans

On Mon, 10 Mar 2014, John Baldwin wrote:


On Tuesday, March 04, 2014 4:50:01 pm Bruce Evans wrote:

On Tue, 4 Mar 2014, John Baldwin wrote:
% Index: i386/i386/swtch.s
% ===
% --- i386/i386/swtch.s (revision 262711)
% +++ i386/i386/swtch.s (working copy)

[...savectx()]

This function is mostly bogus (see old mails).


I was going off of the commit logs for amd64 that removed this code as savectx()
is not used for fork(), only for IPI_STOP and suspend/resume.


Without fxsave, npxsuspend() cannot be atomic without locking, since
fnsave destroys the state in the FPU and you either need a lock to
reload the old state atomically enough, or a lock to modify FPCURTHREAD
atomically enough.


save_ctx() is now only called from IPI handlers or when doing suspend in
which case we shouldn't have to worry about being preempted.


I don't understand the suspend part.  Is sufficient locking held througout
suspend/resume to prevent states changing after they have been saved here?


% @@ -520,7 +490,16 @@
%   movl%eax,%dr7
%
%  #ifdef DEV_NPX
% - /* XXX FIX ME */
% + /* Restore FPU state */

Is the problem just this missing functionality?


Possibly.


I now think it was just the clobbering of %cr0 so i386 never had the
problem.


I think on amd64 there was also the desire to have the pcb
state be meaningful in dumps (since we IPI_STOP before a dump).  OTOH,


It should also be meaningful in debuggers.  Hopefully stop IPIs put
it there form all stopped CPUs.  I think it remains in the FPU for
the running CPU.


the current approach used by amd64 (and this patch for i386) is to not
dirty fpcurthread's state during save_ctx(), but to instead leave
fpcurthread alone and explicitly save whatever state the FPU is in
in the PCB used for IPI_STOP or suspend.


Hmm, if kernel debuggers actually supported displaying the FPU state, then
they would prefer to find it in the PCB only (after debugger entry puts
it there), but this doesn't work in places like the dna trap handler.
Similarly for IPIs and suspend.  The dna trap handler would be broken
unless any saving in the PCB is undone when normal operation is resumed,
and it seems more difficult to undo it than to save specially so as not
to have anything to undo.  It is OK to save in the usual place in the PCB
so that debuggers can find it more easily (since that place is not used
in normal operation), but not to change the state in the CPU+FPU across
the operation.  Harmful state changes in the CPU+FPU include toggling
CR0_TS and implicit fninit.  For suspend/resume, we have no option but
to undo everything, since other things may clobber the state.




% @@ -761,7 +761,34 @@
%   PCPU_SET(fpcurthread, NULL);
%  }
%
% +/*
% + * Unconditionally save the current co-processor state across suspend and
% + * resume.
% + */
%  void
% +npxsuspend(union safefpu *addr)
% +{
% + register_t cr0;
% +
% + if (!hw_float)
% + return;
% + cr0 = rcr(0);
% + clts();
% + fpusave(addr);
% + load_cr(0, cr0);
% +}

In the !fxsave case, this destroys the state in the npx, leaving
fpcurthread invalid.  It also does the save when the state in the
npx is inactive.  I think jkim intentionally this state so that
resume can load it unconditionally.  It must be arranged that there
are no interactions with fpcurthread.


Given the single-threaded nature of suspend/resume and IPI_STOP /
restart_cpus(), those requirements are met, so it should be safe
to resume whatever state was in the FPU and leave fpcurthread
unchanged.


Is the whole suspend/resume really locked?


This doesn't work so well
without fxsave.  When fpcurthread != NULL, reloading CR0 keeps
CR0_TS and thus ensures that inconsistent state lives for longer.
Things will only be OK if fpcurthread isn't changed until resume.


After the save_ctx() the CPU is going to either resume without
doing a resume_ctx (IPI_STOP case) leaving fpcurthread unchanged
(so save_ctx() just grabbed a snapshot of the FPU state for
debugging purposes) or the CPU is going to power off for suspend.


If it doesn't restore for IPI_STOP, then it will continue with the
state clobbered by fnsave in the !fxsr case.  That is rare but can
happen.  Most CPUs that have IPIs also have fxsr.  But on at least
i386, there is an option to disable fxsr.


During resume it will invoke resume_ctx() which will restore the
FPU state (whatever state it was in) and fpcurthread and only
after those are true is the CPU able to run other threads which
will modify or use the FPU state.


You can probably fix this by using the old code here.  The old code
doesn't need the hw_float test, since fpcurthread != NULL implies
hw_float != 0.

Actually, I don't see any need to change anything on i386 -- after
storing the state for the thread, there should be no need to store it
anywhere else across suspend/resume.  We intentionally use this method
(even on amd64 IIRC), although it is 

Re: signal 8 (floating point exception) upon resume

2014-03-10 Thread John Baldwin
On Tuesday, March 04, 2014 4:50:01 pm Bruce Evans wrote:
 On Tue, 4 Mar 2014, John Baldwin wrote:
 % Index: i386/i386/swtch.s
 % ===
 % --- i386/i386/swtch.s   (revision 262711)
 % +++ i386/i386/swtch.s   (working copy)
 % @@ -417,42 +417,9 @@
 % str PCB_TR(%ecx)
 % 
 %  #ifdef DEV_NPX
 % -   /*
 % -* If fpcurthread == NULL, then the npx h/w state is irrelevant and the
 % -* state had better already be in the pcb.  This is true for forks
 % -* but not for dumps (the old book-keeping with FP flags in the pcb
 % -* always lost for dumps because the dump pcb has 0 flags).
 % -*
 % -* If fpcurthread != NULL, then we have to save the npx h/w state to
 % -* fpcurthread's pcb and copy it to the requested pcb, or save to the
 % -* requested pcb and reload.  Copying is easier because we would
 % -* have to handle h/w bugs for reloading.  We used to lose the
 % -* parent's npx state for forks by forgetting to reload.
 % -*/
 
 This function is mostly bogus (see old mails).

I was going off of the commit logs for amd64 that removed this code as savectx()
is not used for fork(), only for IPI_STOP and suspend/resume.

 Without fxsave, npxsuspend() cannot be atomic without locking, since
 fnsave destroys the state in the FPU and you either need a lock to
 reload the old state atomically enough, or a lock to modify FPCURTHREAD
 atomically enough.

save_ctx() is now only called from IPI handlers or when doing suspend in
which case we shouldn't have to worry about being preempted.

 % 
 % movl$1,%eax
 % ...
 % @@ -520,7 +490,16 @@
 % movl%eax,%dr7
 % 
 %  #ifdef DEV_NPX
 % -   /* XXX FIX ME */
 % +   /* Restore FPU state */
 
 Is the problem just this missing functionality?

Possibly.  I think on amd64 there was also the desire to have the pcb
state be meaningful in dumps (since we IPI_STOP before a dump).  OTOH,
the current approach used by amd64 (and this patch for i386) is to not
dirty fpcurthread's state during save_ctx(), but to instead leave
fpcurthread alone and explicitly save whatever state the FPU is in
in the PCB used for IPI_STOP or suspend.

 % @@ -761,7 +761,34 @@
 % PCPU_SET(fpcurthread, NULL);
 %  }
 % 
 % +/*
 % + * Unconditionally save the current co-processor state across suspend and
 % + * resume.
 % + */
 %  void
 % +npxsuspend(union safefpu *addr)
 % +{
 % +   register_t cr0;
 % +
 % +   if (!hw_float)
 % +   return;
 % +   cr0 = rcr(0);
 % +   clts();
 % +   fpusave(addr);
 % +   load_cr(0, cr0);
 % +}
 
 In the !fxsave case, this destroys the state in the npx, leaving
 fpcurthread invalid.  It also does the save when the state in the
 npx is inactive.  I think jkim intentionally this state so that
 resume can load it unconditionally.  It must be arranged that there
 are no interactions with fpcurthread.

Given the single-threaded nature of suspend/resume and IPI_STOP /
restart_cpus(), those requirements are met, so it should be safe
to resume whatever state was in the FPU and leave fpcurthread
unchanged.

 This doesn't work so well
 without fxsave.  When fpcurthread != NULL, reloading CR0 keeps
 CR0_TS and thus ensures that inconsistent state lives for longer.
 Things will only be OK if fpcurthread isn't changed until resume.

After the save_ctx() the CPU is going to either resume without
doing a resume_ctx (IPI_STOP case) leaving fpcurthread unchanged
(so save_ctx() just grabbed a snapshot of the FPU state for
debugging purposes) or the CPU is going to power off for suspend.
During resume it will invoke resume_ctx() which will restore the
FPU state (whatever state it was in) and fpcurthread and only
after those are true is the CPU able to run other threads which
will modify or use the FPU state.

 You can probably fix this by using the old code here.  The old code
 doesn't need the hw_float test, since fpcurthread != NULL implies
 hw_float != 0.
 
 Actually, I don't see any need to change anything on i386 -- after
 storing the state for the thread, there should be no need to store it
 anywhere else across suspend/resume.  We intentionally use this method
 (even on amd64 IIRC), although it is suboptimal, to reduce complications
 for context switchres and signal handling.  npxsave() takes an address,
 but savectx() didn't abuse this to store directly in the special save
 area.  It made npxsave() store in the pcb, and then copied to the special
 area.

So I guess that is one option is to always clear fpcurthread during
suspend and just do an fninit on resume.  However, I chose to match
what amd64 does for now.  I did make one change locally which was to
not bother saving the FPU state if fpcurthread was NULL during save_ctx,
but to instead store a copy of 'npx_initial_state' in the pcb instead.
This is then loaded into the FPU on resume.  Is that sufficient for
the !fxsave case?

-- 
John Baldwin
___

Re: signal 8 (floating point exception) upon resume

2014-03-04 Thread John Baldwin
On Monday, March 03, 2014 6:49:08 pm Adrian Chadd wrote:
 I'll try this soon.
 
 I had it fail back to newcons, rather than Xorg normally dying without
 restoring state. It wouldn't let me spawn a shell. Logging in worked
 fine, but normal shell exec would eventually and quickly lead to
 failure, dropping me back to the login prompt.

If you have set CPUTYPE in /etc/src.conf such that your userland binaries
are built with SSE, etc. then I expect most things to break because the FPU
is in a funky state without this patch.  I suspect if you don't set CPUTYPE
so that your userland binaries do not use the FPU, you can probably resume
just fine without this fix.

 -a
 
 
 On 3 March 2014 11:11, John Baldwin j...@freebsd.org wrote:
  On Friday, February 28, 2014 9:00:57 pm Adrian Chadd wrote:
  On 28 February 2014 15:35, Adrian Chadd adr...@freebsd.org wrote:
   ... how'd this ever work in the past then?
  
 
  .. and I've submitted it as a PR:
 
  kern/187152
 
  Complete stab in the dark (not compile tested) here:
 
  http://www.FreeBSD.org/~jhb/patches/i386_fpu_suspend.patch
 
  --
  John Baldwin
 

-- 
John Baldwin
___
freebsd-current@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-current
To unsubscribe, send any mail to freebsd-current-unsubscr...@freebsd.org


Re: signal 8 (floating point exception) upon resume

2014-03-04 Thread Adrian Chadd
I'm not using anything in /etc/src.conf .


-a


On 4 March 2014 08:24, John Baldwin j...@freebsd.org wrote:
 On Monday, March 03, 2014 6:49:08 pm Adrian Chadd wrote:
 I'll try this soon.

 I had it fail back to newcons, rather than Xorg normally dying without
 restoring state. It wouldn't let me spawn a shell. Logging in worked
 fine, but normal shell exec would eventually and quickly lead to
 failure, dropping me back to the login prompt.

 If you have set CPUTYPE in /etc/src.conf such that your userland binaries
 are built with SSE, etc. then I expect most things to break because the FPU
 is in a funky state without this patch.  I suspect if you don't set CPUTYPE
 so that your userland binaries do not use the FPU, you can probably resume
 just fine without this fix.

 -a


 On 3 March 2014 11:11, John Baldwin j...@freebsd.org wrote:
  On Friday, February 28, 2014 9:00:57 pm Adrian Chadd wrote:
  On 28 February 2014 15:35, Adrian Chadd adr...@freebsd.org wrote:
   ... how'd this ever work in the past then?
  
 
  .. and I've submitted it as a PR:
 
  kern/187152
 
  Complete stab in the dark (not compile tested) here:
 
  http://www.FreeBSD.org/~jhb/patches/i386_fpu_suspend.patch
 
  --
  John Baldwin


 --
 John Baldwin
___
freebsd-current@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-current
To unsubscribe, send any mail to freebsd-current-unsubscr...@freebsd.org


Re: signal 8 (floating point exception) upon resume

2014-03-04 Thread Jung-uk Kim
-BEGIN PGP SIGNED MESSAGE-
Hash: SHA1

On 2014-03-04 11:24:04 -0500, John Baldwin wrote:
 On Monday, March 03, 2014 6:49:08 pm Adrian Chadd wrote:
 I'll try this soon.
 
 I had it fail back to newcons, rather than Xorg normally dying 
 without restoring state. It wouldn't let me spawn a shell. 
 Logging in worked fine, but normal shell exec would eventually 
 and quickly lead to failure, dropping me back to the login 
 prompt.
 
 If you have set CPUTYPE in /etc/src.conf such that your userland 
 binaries are built with SSE, etc. then I expect most things to 
 break because the FPU is in a funky state without this patch.  I 
 suspect if you don't set CPUTYPE so that your userland binaries do 
 not use the FPU, you can probably resume just fine without this 
 fix.
 
 -a
 
 
 On 3 March 2014 11:11, John Baldwin j...@freebsd.org wrote:
 On Friday, February 28, 2014 9:00:57 pm Adrian Chadd wrote:
 On 28 February 2014 15:35, Adrian Chadd adr...@freebsd.org 
 wrote:
 ... how'd this ever work in the past then?
 
 
 .. and I've submitted it as a PR:
 
 kern/187152
 
 Complete stab in the dark (not compile tested) here:
 
 http://www.FreeBSD.org/~jhb/patches/i386_fpu_suspend.patch

The patch for sys/amd64/amd64/cpu_switch.S is committed:

http://svnweb.freebsd.org/changeset/base/262746

i386 patches may be reviewed by the original author (CC'ed).

Jung-uk Kim
-BEGIN PGP SIGNATURE-
Version: GnuPG v2.0.22 (FreeBSD)

iQEcBAEBAgAGBQJTFi3RAAoJEHyflib82/FGnboH/3qrAp+Eq/0eebEP/9wU0Ke/
y4y4yvw9nDVexKZ+c5VuTxyWvK9O0w2b+r3f5kuHWferOm22NaJCctt3E/OA5Ly2
1p3ZPvqD5cRZfkdh68AwEeJv93lg84VMSUqNUfS9rsrIU+WpHpPR46sdLpq5KxSP
cY2522npmoPrwk+PaTJS4uBQeaX/3vnj5996zxavwVqwlYyR+Zqgi6FhGj+F2RJ1
Ry+9icyNx/8lUfRTLCPsCBRjlUKUk/p/8bfbQK4mSef5Gd8ZAiqdyKqgdMBUYhNA
ZplkpijJjvlIIc0dYSwg8gMKmaB6amgw/LJGQit9nTkBU2bOd6L05f1dCpYAxDE=
=x0sS
-END PGP SIGNATURE-
___
freebsd-current@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-current
To unsubscribe, send any mail to freebsd-current-unsubscr...@freebsd.org


Re: signal 8 (floating point exception) upon resume

2014-03-04 Thread Bruce Evans

On Tue, 4 Mar 2014, John Baldwin wrote:


On Monday, March 03, 2014 6:49:08 pm Adrian Chadd wrote:

I'll try this soon.

I had it fail back to newcons, rather than Xorg normally dying without
restoring state. It wouldn't let me spawn a shell. Logging in worked
fine, but normal shell exec would eventually and quickly lead to
failure, dropping me back to the login prompt.


If you have set CPUTYPE in /etc/src.conf such that your userland binaries
are built with SSE, etc. then I expect most things to break because the FPU
is in a funky state without this patch.  I suspect if you don't set CPUTYPE
so that your userland binaries do not use the FPU, you can probably resume
just fine without this fix.


Non-SSE FPU state might be broken too.


Complete stab in the dark (not compile tested) here:

http://www.FreeBSD.org/~jhb/patches/i386_fpu_suspend.patch


I forget many details of how this works, but noticed that it seems
to break consistency of the state for the !fxsave case and related
locking.

% Index: i386/i386/swtch.s
% ===
% --- i386/i386/swtch.s (revision 262711)
% +++ i386/i386/swtch.s (working copy)
% @@ -417,42 +417,9 @@
%   str PCB_TR(%ecx)
% 
%  #ifdef DEV_NPX

% - /*
% -  * If fpcurthread == NULL, then the npx h/w state is irrelevant and the
% -  * state had better already be in the pcb.  This is true for forks
% -  * but not for dumps (the old book-keeping with FP flags in the pcb
% -  * always lost for dumps because the dump pcb has 0 flags).
% -  *
% -  * If fpcurthread != NULL, then we have to save the npx h/w state to
% -  * fpcurthread's pcb and copy it to the requested pcb, or save to the
% -  * requested pcb and reload.  Copying is easier because we would
% -  * have to handle h/w bugs for reloading.  We used to lose the
% -  * parent's npx state for forks by forgetting to reload.
% -  */

This function is mostly bogus (see old mails).

% - pushfl
% - CLI
% - movlPCPU(FPCURTHREAD),%eax
% - testl   %eax,%eax
% - je  1f

This CLI/STI locking is bogus.  Accesses to FPCURTHREAD are now locked
by critical_enter(), as on amd64, and perhaps a higher level already
did critical_enter() or even CLI.

(CLI/STI in swtch.s seems to be bogus too.  amd64 doesn't do it, and
I think a higher level does mtx_lock_spin() which does too much, including
CLI via spinlock_enter().)

% -
% - pushl   %ecx
% - movlTD_PCB(%eax),%eax
% - movlPCB_SAVEFPU(%eax),%eax
% - pushl   %eax
% - pushl   %eax
% - callnpxsave
% + pushl   PCB_FPUSUSPEND(%ecx)
% + callnpxsuspend

Without fxsave, npxsuspend() cannot be atomic without locking, since
fnsave destroys the state in the FPU and you either need a lock to
reload the old state atomically enough, or a lock to modify FPCURTHREAD
atomically enough.  Reloading the old state is problematic because
the reload might trap.  So the old version uses the second method.
It calls npxsave() to handle most of the details.  But npxsave() was
designed to be efficient for its usual use in cpu_switch(), so it doesn't
handle the detail of checking FPCURTHREAD or the locking needed for this
check, so the above code had to handle these details.

%   addl$4,%esp
% - popl%eax
% - popl%ecx
% -
% - pushl   $PCB_SAVEFPU_SIZE
% - lealPCB_USERFPU(%ecx),%ecx
% - pushl   %ecx
% - pushl   %eax
% - callbcopy
% - addl$12,%esp
% -1:
% - popfl
%  #endif   /* DEV_NPX */

This probably should never have been written in asm.  Only the similar
code in cpu_switch() is time-critical.

% 
%  	movl	$1,%eax

% ...
% @@ -520,7 +490,16 @@
%   movl%eax,%dr7
% 
%  #ifdef DEV_NPX

% - /* XXX FIX ME */
% + /* Restore FPU state */

Is the problem just this missing functionality?

% ...
% Index: i386/isa/npx.c
% ===
% --- i386/isa/npx.c(revision 262711)
% +++ i386/isa/npx.c(working copy)

This has many vestiges of support for interrupt handling (mainly in
comments and in complications in the probe).  CLI/STI was used for
locking partly to reduce complications for the IRQ13 case.  The
comment before npxsave() still says that it needs CLI/STI locking
by callers, but it actually needs critical_enter() locking and
most callers only provided that.

% @@ -761,7 +761,34 @@
%   PCPU_SET(fpcurthread, NULL);
%  }
% 
% +/*

% + * Unconditionally save the current co-processor state across suspend and
% + * resume.
% + */
%  void
% +npxsuspend(union safefpu *addr)
% +{
% + register_t cr0;
% +
% + if (!hw_float)
% + return;
% + cr0 = rcr(0);
% + clts();
% + fpusave(addr);
% + load_cr(0, cr0);
% +}

In the !fxsave case, this destroys the state in the npx, leaving
fpcurthread invalid.  It also does the save when the state in the
npx is inactive.  I think jkim 

Re: signal 8 (floating point exception) upon resume

2014-03-03 Thread John Baldwin
On Friday, February 28, 2014 9:00:57 pm Adrian Chadd wrote:
 On 28 February 2014 15:35, Adrian Chadd adr...@freebsd.org wrote:
  ... how'd this ever work in the past then?
 
 
 .. and I've submitted it as a PR:
 
 kern/187152

Complete stab in the dark (not compile tested) here:

http://www.FreeBSD.org/~jhb/patches/i386_fpu_suspend.patch

-- 
John Baldwin
___
freebsd-current@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-current
To unsubscribe, send any mail to freebsd-current-unsubscr...@freebsd.org


Re: signal 8 (floating point exception) upon resume

2014-03-03 Thread Adrian Chadd
I'll try this soon.

I had it fail back to newcons, rather than Xorg normally dying without
restoring state. It wouldn't let me spawn a shell. Logging in worked
fine, but normal shell exec would eventually and quickly lead to
failure, dropping me back to the login prompt.


-a


On 3 March 2014 11:11, John Baldwin j...@freebsd.org wrote:
 On Friday, February 28, 2014 9:00:57 pm Adrian Chadd wrote:
 On 28 February 2014 15:35, Adrian Chadd adr...@freebsd.org wrote:
  ... how'd this ever work in the past then?
 

 .. and I've submitted it as a PR:

 kern/187152

 Complete stab in the dark (not compile tested) here:

 http://www.FreeBSD.org/~jhb/patches/i386_fpu_suspend.patch

 --
 John Baldwin
___
freebsd-current@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-current
To unsubscribe, send any mail to freebsd-current-unsubscr...@freebsd.org


Re: signal 8 (floating point exception) upon resume

2014-02-28 Thread John Baldwin
On Friday, February 28, 2014 1:15:45 pm Adrian Chadd wrote:
 Hi,
 
 On my i386 -HEAD laptops (running -HEAD as of last night, but it's
 been a problem for a while) I occasionally hit a point where I get an
 FPE on _all_ processes upon resume.
 
 I can still do a clean shutdown through the power-button method, but I
 can't do anything else.
 
 Has anyone seen this before? Does anyone have an inkling of an idea
 why I'd be getting FPE's for things like ps and sh?

I'm guessing fpcurthread is stale.  We should probably be flushing
the FPU state on suspend and starting off without any FPU state on
resume.

Ah, see this bit here in x86/acpica/acpi_wakeup.c:


int
acpi_sleep_machdep(struct acpi_softc *sc, int state)
{
...
if (savectx(susppcbs[0])) {
#ifdef __amd64__
ctx_fpusave(susppcbs[0]-pcb_fpususpend);
#endif
...
}

Looks like you need to implement ctx_fpusave() for i386.  kib@ did it as part 
of the AVX work, but I wonder if you can just steal the amd64 ctx_fpusave() 
and have it call npxsave() instead of fpxsave()?  Not sure if you'd need it to 
be in asm as it is on amd64 or if you can do this in C.

-- 
John Baldwin
___
freebsd-current@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-current
To unsubscribe, send any mail to freebsd-current-unsubscr...@freebsd.org


Re: signal 8 (floating point exception) upon resume

2014-02-28 Thread Adrian Chadd
... how'd this ever work in the past then?


-a


On 28 February 2014 13:08, John Baldwin j...@freebsd.org wrote:
 On Friday, February 28, 2014 1:15:45 pm Adrian Chadd wrote:
 Hi,

 On my i386 -HEAD laptops (running -HEAD as of last night, but it's
 been a problem for a while) I occasionally hit a point where I get an
 FPE on _all_ processes upon resume.

 I can still do a clean shutdown through the power-button method, but I
 can't do anything else.

 Has anyone seen this before? Does anyone have an inkling of an idea
 why I'd be getting FPE's for things like ps and sh?

 I'm guessing fpcurthread is stale.  We should probably be flushing
 the FPU state on suspend and starting off without any FPU state on
 resume.

 Ah, see this bit here in x86/acpica/acpi_wakeup.c:


 int
 acpi_sleep_machdep(struct acpi_softc *sc, int state)
 {
 ...
 if (savectx(susppcbs[0])) {
 #ifdef __amd64__
 ctx_fpusave(susppcbs[0]-pcb_fpususpend);
 #endif
 ...
 }

 Looks like you need to implement ctx_fpusave() for i386.  kib@ did it as part
 of the AVX work, but I wonder if you can just steal the amd64 ctx_fpusave()
 and have it call npxsave() instead of fpxsave()?  Not sure if you'd need it to
 be in asm as it is on amd64 or if you can do this in C.

 --
 John Baldwin
___
freebsd-current@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-current
To unsubscribe, send any mail to freebsd-current-unsubscr...@freebsd.org


Re: signal 8 (floating point exception) upon resume

2014-02-28 Thread Adrian Chadd
On 28 February 2014 15:35, Adrian Chadd adr...@freebsd.org wrote:
 ... how'd this ever work in the past then?


.. and I've submitted it as a PR:

kern/187152

Thanks,


-a



 -a


 On 28 February 2014 13:08, John Baldwin j...@freebsd.org wrote:
 On Friday, February 28, 2014 1:15:45 pm Adrian Chadd wrote:
 Hi,

 On my i386 -HEAD laptops (running -HEAD as of last night, but it's
 been a problem for a while) I occasionally hit a point where I get an
 FPE on _all_ processes upon resume.

 I can still do a clean shutdown through the power-button method, but I
 can't do anything else.

 Has anyone seen this before? Does anyone have an inkling of an idea
 why I'd be getting FPE's for things like ps and sh?

 I'm guessing fpcurthread is stale.  We should probably be flushing
 the FPU state on suspend and starting off without any FPU state on
 resume.

 Ah, see this bit here in x86/acpica/acpi_wakeup.c:


 int
 acpi_sleep_machdep(struct acpi_softc *sc, int state)
 {
 ...
 if (savectx(susppcbs[0])) {
 #ifdef __amd64__
 ctx_fpusave(susppcbs[0]-pcb_fpususpend);
 #endif
 ...
 }

 Looks like you need to implement ctx_fpusave() for i386.  kib@ did it as part
 of the AVX work, but I wonder if you can just steal the amd64 ctx_fpusave()
 and have it call npxsave() instead of fpxsave()?  Not sure if you'd need it 
 to
 be in asm as it is on amd64 or if you can do this in C.

 --
 John Baldwin
___
freebsd-current@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-current
To unsubscribe, send any mail to freebsd-current-unsubscr...@freebsd.org