Re: arm64 setjmp/longjmp once again.

2017-03-11 Thread Theo de Raadt
> > setjmp.S and _setjmp.S were modified to save the full 128 bit floating
> > point registers instead of only the 64 bit portion of the registers.
> 
> Both Linux and Android
> 
> * Save x19-x30 + sp (x31)
> * Save d8-d15 (so only the 64-bit portion of the registers)
> * Do not save fpcr.
> 
> I don't think we should deviate from that unless we have a good reason
> to do so.

Agree.

> I also think we should make jmpbuf fully opaque, and simply remove the
> _JB_XXX constants from .

Indeed.  Some study of the ports tree happened last year.  Only a few
programs use the fields, and inside #ifdef OS + #ifdef ARCH fields,
typically as hand-built thread API or a debugging/inspection method.
Very few cases of this, generally in very old code.  If I recall only
a few did modification of SP and PC, others did inspection of SP, PC,
and "first argument".

No need to extend such support to new architectures.

> The plan is to "encrypt" the contents of jmpbuf so users shouldn't be
> messing around with its contents.

xor safety is now enforced on most architectures.  That's paving the
way for us to be sure we can do authentication or encryption. Auth
would entail replacing the two-value xor with something like a hmac,
but still exposing the registers for inspection by
debugging/inspection methods.  Encryption would hide the values on the
stack further, not sure what the gain would be yet.  The difficulty is
that this is written in asm, and painful to get right.

> It'd be good to document the layout somewhere in the source code
> though, e.g. in setjmp.S.

If _JB-XXX fields disappear from .h files, then the layout is only
handled inside the .S file.  Correct place to document it, if the asm
doesn't make it clear.

> If we want, I think we can adjust the size of jmpbuf.  No formal
> release yet and I don't think it's a difficial ABI break to overcome.

Our base uses so few setjmp calls, so an ABI break in this area is
really inexpensive, just get through a build and it's done.

Dale, please look at the xor handling in .S files on other architectures.



Re: arm64 setjmp/longjmp once again.

2017-03-11 Thread Mark Kettenis
> Date: Fri, 10 Mar 2017 02:14:00 -0500
> From: Dale Rahn 
> 
> Yesterday's comments about setjmp caused me to dig more deeply into the
> function today. Here are several fixes. There is still one possibly
> outstanding issue with fpsr. When reading thru the calling convention
> it seemed to indicate that the rounding mode was not to be assumed.
> (Document is 
> http://infocenter.arm.com/help/topic/com.arm.doc.ihi0055b/IHI0055B_aapcs64.pdf)
> 
>  These fields [in FPSR] are not preserved across a public interface and may
>  have any value on entry to a subroutine.
> 
> This would seem to indicate that none of those fields need to be preserved,
> however should fpcr (rounding modes) be saved and restored?  The text
> doesn't seem to give any indication about saving that state across routines.
> 
> setjmp.h had never been updated for arm64, the defines were all from arm32.
> It was necessary to move the signal mask because after saving the full
> floating point context, the old save location was used by the fp context.
> Also the context is quite a bit larger than necessary 64 * 8 bytes,
> where the structure is only around 31 * 8 bytes in context.
> 
> setjmp.S and _setjmp.S were modified to save the full 128 bit floating
> point registers instead of only the 64 bit portion of the registers.

Both Linux and Android

* Save x19-x30 + sp (x31)
* Save d8-d15 (so only the 64-bit portion of the registers)
* Do not save fpcr.

I don't think we should deviate from that unless we have a good reason
to do so.

I also think we should make jmpbuf fully opaque, and simply remove the
_JB_XXX constants from .  The plan is to "encrypt"
the contents of jmpbuf so users shouldn't be messing around with its
contents.  Again Linux and Android don't expose such constants either.
It'd be good to document the layout somewhere in the source code
though, e.g. in setjmp.S.

If we want, I think we can adjust the size of jmpbuf.  No formal
release yet and I don't think it's a difficial ABI break to overcome.

> _setjmp.S was modified to have _longjmp always return non-zero, effectively
> the same diff as was applied to setjmp.S recently.
> 
> Finally, the regress/lib/libc/setjmp-signal test was wrong, it was
> attempting to write to a null pointer, however the compiler was optimizing
> away the NULL deref and replacing it with a breakpoint instruction.
> Adding a volatile defeats the compiler optimization, this may not be
> the best fix for this issue. Is there better suggestions?

I think it's perfectly fine.  So ok kettenis@ on that bit.

> Index: sys/arch/arm64/include/setjmp.h
> ===
> RCS file: /cvs/src/sys/arch/arm64/include/setjmp.h,v
> retrieving revision 1.1
> diff -u -p -r1.1 setjmp.h
> --- sys/arch/arm64/include/setjmp.h   17 Dec 2016 23:38:33 -  1.1
> +++ sys/arch/arm64/include/setjmp.h   10 Mar 2017 06:41:15 -
> @@ -16,27 +16,32 @@
>   * Description of the setjmp buffer
>   *
>   * word  0   magic number(dependant on creator)
> - *   1 -  3  f4  fp register 4
> - *4 -  6 f5  fp register 5
> - *7 -  9 f6  fp register 6
> - *   10 - 12 f7  fp register 7
> - *   13  fpsrfp status register
> - *   14  r4  register 4
> - *   15  r5  register 5
> - *   16  r6  register 6
> - *   17  r7  register 7
> - *   18  r8  register 8
> - *   19  r9  register 9
> - *   20  r10 register 10 (sl)
> - *   21  r11 register 11 (fp)
> - *   22  r12 register 12 (ip)
> - *   23  r13 register 13 (sp)
> - *   24  r14 register 14 (lr)
> - *   25  signal mask (dependant on magic)
> - *   26  (con't)
> - *   27  (con't)
> - *   28  (con't)
> - *
> + *   1   sp  stack
> + *   2   x19 register 19
> + *   3   x20 register 20
> + *   4   x21 register 21
> + *   5   x22 register 22
> + *   6   x23 register 23
> + *   7   x24 register 24
> + *   8   x25 register 25
> + *   9   x26 register 26
> + *  10   x27 register 27
> + *  11   x28 register 28
> + *  12   x29 register 29
> + *  13   x30 register 30
> + *   14  q8  fp register 8
> + *   16  q9  fp register 9
> + *   18  q10 fp register 10
> + *   20  q11 fp register 11
> + *   22  q12 fp register 12
> + *   24  q13 fp register 13
> + *   26  q14 fp register 14
> + *   28  v15 fp register 15
> + *   30  signal mask (dependant on magic)
> + *   31  (con't)
> + *   32  (con't)
> + *   33  (con't)
> + *  
>   * 

arm64 setjmp/longjmp once again.

2017-03-09 Thread Dale Rahn
Yesterday's comments about setjmp caused me to dig more deeply into the
function today. Here are several fixes. There is still one possibly
outstanding issue with fpsr. When reading thru the calling convention
it seemed to indicate that the rounding mode was not to be assumed.
(Document is 
http://infocenter.arm.com/help/topic/com.arm.doc.ihi0055b/IHI0055B_aapcs64.pdf)

 These fields [in FPSR] are not preserved across a public interface and may
 have any value on entry to a subroutine.

This would seem to indicate that none of those fields need to be preserved,
however should fpcr (rounding modes) be saved and restored?  The text
doesn't seem to give any indication about saving that state across routines.

setjmp.h had never been updated for arm64, the defines were all from arm32.
It was necessary to move the signal mask because after saving the full
floating point context, the old save location was used by the fp context.
Also the context is quite a bit larger than necessary 64 * 8 bytes,
where the structure is only around 31 * 8 bytes in context.

setjmp.S and _setjmp.S were modified to save the full 128 bit floating
point registers instead of only the 64 bit portion of the registers.

_setjmp.S was modified to have _longjmp always return non-zero, effectively
the same diff as was applied to setjmp.S recently.

Finally, the regress/lib/libc/setjmp-signal test was wrong, it was
attempting to write to a null pointer, however the compiler was optimizing
away the NULL deref and replacing it with a breakpoint instruction.
Adding a volatile defeats the compiler optimization, this may not be
the best fix for this issue. Is there better suggestions?

Unfortunately, these fixes do not explain the sh coredumps that were being
seen, in fact this test is not readily producing them on the RPI3 board.
However other sporatic errors were still being seen while running other
code and working on these changes.


Index: sys/arch/arm64/include/setjmp.h
===
RCS file: /cvs/src/sys/arch/arm64/include/setjmp.h,v
retrieving revision 1.1
diff -u -p -r1.1 setjmp.h
--- sys/arch/arm64/include/setjmp.h 17 Dec 2016 23:38:33 -  1.1
+++ sys/arch/arm64/include/setjmp.h 10 Mar 2017 06:41:15 -
@@ -16,27 +16,32 @@
  * Description of the setjmp buffer
  *
  * word  0 magic number(dependant on creator)
- *   1 -  3f4  fp register 4
- *  4 -  6 f5  fp register 5
- *  7 -  9 f6  fp register 6
- * 10 - 12 f7  fp register 7
- * 13  fpsrfp status register
- * 14  r4  register 4
- * 15  r5  register 5
- * 16  r6  register 6
- * 17  r7  register 7
- * 18  r8  register 8
- * 19  r9  register 9
- * 20  r10 register 10 (sl)
- * 21  r11 register 11 (fp)
- * 22  r12 register 12 (ip)
- * 23  r13 register 13 (sp)
- * 24  r14 register 14 (lr)
- * 25  signal mask (dependant on magic)
- * 26  (con't)
- * 27  (con't)
- * 28  (con't)
- *
+ *   1 sp  stack
+ *   2 x19 register 19
+ *   3 x20 register 20
+ *   4 x21 register 21
+ *   5 x22 register 22
+ *   6 x23 register 23
+ *   7 x24 register 24
+ *   8 x25 register 25
+ *   9 x26 register 26
+ *  10 x27 register 27
+ *  11 x28 register 28
+ *  12 x29 register 29
+ *  13 x30 register 30
+ * 14  q8  fp register 8
+ * 16  q9  fp register 9
+ * 18  q10 fp register 10
+ * 20  q11 fp register 11
+ * 22  q12 fp register 12
+ * 24  q13 fp register 13
+ * 26  q14 fp register 14
+ * 28  v15 fp register 15
+ * 30  signal mask (dependant on magic)
+ * 31  (con't)
+ * 32  (con't)
+ * 33  (con't)
+ *  
  * The magic number number identifies the jmp_buf and
  * how the buffer was created as well as providing
  * a sanity check.
@@ -61,23 +66,28 @@
 /* Valid for all jmp_buf's */
 
 #define _JB_MAGIC   0
-#define _JB_REG_F4  1
-#define _JB_REG_F5  4
-#define _JB_REG_F6  7
-#define _JB_REG_F7 10
-#define _JB_REG_FPSR   13
-#define _JB_REG_R4 14
-#define _JB_REG_R5 15
-#define _JB_REG_R6 16
-#define _JB_REG_R7 17
-#define _JB_REG_R8 18
-#define _JB_REG_R9 19
-#define _JB_REG_R1020
-#define _JB_REG_R1