Re: [RFC PATCH] test-c-stack2.sh: skip if the platform sent SIGILL on an invalid address.

2018-12-29 Thread Bruno Haible
Hi Ivan,

Thank you for sharing your insights!

> > It would mean that Linux/e2k can hardly conform to
> > POSIX well, as Bruno said, because POSIX requires different signals for
> > different cases, and incompatibilities cannot be excused on the grounds of
> > speculative computations in the CPU.

> The compiler would know how to
> replay the faulty speculative computation, so it would be able to
> generate code to do this non-speculatively and trigger the real fault.

Yes, you need to think about the kernel and the compiler together.

As I understand it, the general approach in such cases is to:

  1) See in the hardware manual whether there is a way to retrieve the
 exception details (exception code, and memory address in case of a
 memory access) from the speculative execution. If so, use it in the
 kernel, in linux//mm/fault.c.
  If not:

  2) Implement a proposed solution in the compiler that results in
 discarding the speculative execution results when there was an
 exception during speculative execution.
  3) Implement another proposed solution in the compiler that completely
 disables speculative execution for instructions that may produce
 exceptions (and leave it enabled only for guaranteed exception-free
 instructions, such as integer arithmetic instructions).
 [It is not unheard of that processor features get completely disabled.
 For example, OpenBSD/x86_64 disables hyperthreading, which many
 people previously thought to be a valuable processor feature.]

  4) Benchmark the performance impact of 2) and 3) on programs. Choose
 the one with less impact.

  5) If the impact is high, then invent a compiler option that allows
 the application developer to choose between POSIX-compliant code and
 fast code. [This is the approach used e.g. for floating-point instructions
 on alpha in GCC: The instructions provided by the hardware are not
 IEEE 854 compliant, and the workaround that GCC adds to make them
 IEEE 854 compliant is so much of a performance hit that it is
 only enabled through a compiler option.]

Bruno




Re: [RFC PATCH] test-c-stack2.sh: skip if the platform sent SIGILL on an invalid address.

2018-12-29 Thread Ivan Zakharyaschev
Here is a follow-up to the story, for those curious about what happens on 
the similar IA64 architecture. This should conclude it.

As for the problem on E2K itself, we should discuss it with MCST and/or 
investigate whether the missing information about the faults can be 
recovered to better satisfy POSIX.

On Sat, 29 Dec 2018, Ivan Zakharyaschev wrote:

> > > As for the SIGILL peculiarity, it has a reason in the Elbrus 
> > > architecture. 

> I've studied the assembler code and found the true 
> reason in this specific case: these are faults "hidden" in an explicitly 
> "speculative" computation which ultimately result in SIGILL. (The E2K ISA 
> is reminiscent of IA64; this can help get the idea.) The specific kind of 
> the fault is "forgotten", unfortunately.

> Besides, in many respects, including the explicitly speculative 
> instructions I have just mentioned, E2K resembles IA64.
> 
> And it'd be interesting to look at how faults coming from speculative 
> computations are treated in Linux/ia64, to get an idea of whether it can be 
> done in a manner that conforms better to POSIX.

> * * *
> 
> BTW, saving and forgetting the type of the original fault doesn't seem

I meant "not forgetting".

> to be something expensive to implement (after some thought): when a
> register is marked as invalid, it shouldn't matter anymore what value
> it holds. So, the same register can be used to save the information
> about the type of the fault.

As Dmitry Levin pointed out, probably not, because there can be too much 
information (the fault and the associated address) for a single register.

> * * *
> 
> I wanted to see how Linux/ia64 handles these complications arising
> from speculative computations possibly causing a fault; and powered on
> such a machine, and had a look at the above examples with SIGILL on
> E2K: the third one, and the fifth one (speculative division by zero).
> 
> The third example from above:
> 
> imz@rx2620:~/test-speculative-SIGSEGV$ cc -Wall -O3 -xc - -S -o c.s && cat c.s
> int main(int argc, char ** argv) {
>   if (0 < argc)
>     ++*(char*)0xbad;
>   return 0xbeef;
> }
>   .file   ""
>   .pred.safe_across_calls p1-p5,p16-p63
>   .section .text.startup,"ax",@progbits
>   .align 16
>   .align 64
>   .global main#
>   .type   main#, @function
>   .proc main#
> main:
>   .prologue
>   .body
>   .mmi
>   cmp4.ge p6, p7 = 0, r32
>   addl r14 = 2989, r0
>   addl r8 = 48879, r0
>   ;;
>   .mmi
>   (p7) ld1 r15 = [r14]
>   ;;
>   (p7) adds r15 = 1, r15
>   nop 0
>   ;;
>   .mib
>   (p7) st1 [r14] = r15
>   nop 0
>   br.ret.sptk.many b0
>   .endp main#
>   .ident  "GCC: (Debian 4.6.3-14) 4.6.3"
>   .section .note.GNU-stack,"",@progbits
> imz@rx2620:~/test-speculative-SIGSEGV$ cc -Wall -O3 c.s && ./a.out; echo $?
> Segmentation fault
> 139

> Notes on the assembler: the possible groupings into VLIWs are
> separated by double semicolons (";;"). Predicated execution of
> instructions is marked by a prefix with the corresponding predicate
> register in parentheses, like "(p7)" in the code above:
> 
>   .mmi
>   (p7) ld1 r15 = [r14]
>   ;;
>   (p7) adds r15 = 1, r15
>   nop 0
>   ;;
>   .mib
>   (p7) st1 [r14] = r15
> 
> These are the "load", "add", and "store" instructions corresponding to: 
> ++*(char*)0xbad
> 
> All this shows that gcc-4.6 on IA-64 doesn't generate speculative
> computations for the same examples that had speculative computations
> on E2K. Unfortunately, this means that we couldn't compare the
> interesting bits of the behavior between Linux/e2k and Linux/ia64
> quickly. Perhaps, editing the IA64 assembler code can give a desired
> example.

Cool! Linux/ia64 also produces SIGILL in the same situation; it seems
to have no magic. (But there is a second part of the story!)

imz@rx2620:~/test-speculative-SIGSEGV$ diff c.s c_s.s
18c18
<   (p7) ld1 r15 = [r14]
---
>   (p7) ld1.s r15 = [r14]
imz@rx2620:~/test-speculative-SIGSEGV$ cc c_s.s && ./a.out; echo $?
Illegal instruction
132

"ld1.s" is the "load 1 byte" instruction with the "speculative" flag.

If we do not use the "invalid" register in a "store" instruction, then
there is no fault:

imz@rx2620:~/test-speculative-SIGSEGV$ diff c_s.s c_nost.s
24,25d23
<   (p7) st1 [r14] = r15
<   nop 0
imz@rx2620:~/test-speculative-SIGSEGV$ cc c_nost.s && ./a.out; echo $?
239
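
A note for readers decoding the numbers printed by `echo $?` in these transcripts: bash and dash report a process killed by signal N as 128 + N, so 139 corresponds to SIGSEGV (11 on Linux) and 132 to SIGILL (4 on Linux), while 239 is simply the low 8 bits of the return value 0xbeef. The following is a minimal sketch of the same decoding in C, assuming a POSIX system; the helper names `shell_status_of`, `exit_with_beef`, and `die_by_sigill` are made up for illustration and do not come from the thread.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <signal.h>
#include <stddef.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Run fn in a child process and return what bash or dash would show
   in $? for it: the low 8 bits of the exit code, or 128 + N if the
   child was killed by signal N.  Returns -1 on error. */
int shell_status_of(void (*fn)(void))
{
    int wstatus;
    pid_t pid;

    assert(fn != NULL);
    pid = fork();
    if (pid < 0)
        return -1;
    if (pid == 0) {
        fn();
        _exit(0);               /* only reached if fn returns */
    }
    if (waitpid(pid, &wstatus, 0) < 0)
        return -1;
    if (WIFSIGNALED(wstatus))
        return 128 + WTERMSIG(wstatus);
    if (WIFEXITED(wstatus))
        return WEXITSTATUS(wstatus);
    return -1;
}

/* Like "return 0xbeef;" from main: only the low 8 bits survive. */
void exit_with_beef(void)
{
    _exit(0xbeef);
}

/* Stand-in for a faulting instruction: die by SIGILL. */
void die_by_sigill(void)
{
    signal(SIGILL, SIG_DFL);
    raise(SIGILL);
}
```

With these helpers, `shell_status_of(exit_with_beef)` yields 239 and `shell_status_of(die_by_sigill)` yields 128 + SIGILL, matching the last two transcripts.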


And the second part:

The problem has a solution on IA64. The compiler would know how to
replay the faulty speculative computation, so it would be able to
generate code to do this non-speculatively and trigger the real fault.
And there is an instruction that checks whether a register is
"valid"[1] and helps to jump to the recovery code[2]: "chk.s".
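
To make the idea concrete, here is a toy model in portable C of a speculative load that defers its fault, and of the later validity check. It is only a sketch under stated assumptions: it is not IA-64 code, it is not what any compiler emits, and the names `spec_value`, `toy_memory`, `toy_speculative_load`, and `toy_needs_recovery` are invented for illustration. The "MMU" is a stand-in that treats a single caller-supplied buffer as the only mapped memory.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Result of a modelled speculative load: either a usable value or a
   "deferred fault" token, comparable to an invalid-register mark. */
struct spec_value {
    bool deferred_fault;
    unsigned char value;
};

/* Hypothetical stand-in for the MMU: only [base, base + len) is mapped. */
struct toy_memory {
    unsigned char *base;
    size_t len;
};

static bool toy_is_mapped(const struct toy_memory *m, uintptr_t addr)
{
    uintptr_t lo = (uintptr_t)m->base;
    return addr >= lo && addr - lo < m->len;
}

/* Modelled speculative load: it never faults; a bad address only
   marks the result as invalid. */
struct spec_value toy_speculative_load(const struct toy_memory *m,
                                       uintptr_t addr)
{
    struct spec_value v = { true, 0 };

    assert(m != NULL);
    if (toy_is_mapped(m, addr)) {
        v.deferred_fault = false;
        v.value = *(const unsigned char *)addr;
    }
    return v;
}

/* Modelled check: tells the caller whether it must branch to recovery
   code that repeats the load non-speculatively. */
bool toy_needs_recovery(struct spec_value v)
{
    return v.deferred_fault;
}
```

When `toy_needs_recovery` returns true, real recovery code would repeat the load without the speculative flag, so that the fault is raised with its proper type and address.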

I've implemented this approach manually in c_chk.s like this (but I
have not seen what a compiler would do actually; IA64 has other
flavors of speculative instruct

Re: [RFC PATCH] test-c-stack2.sh: skip if the platform sent SIGILL on an invalid address.

2018-12-29 Thread Dmitry V. Levin
On Sat, Dec 29, 2018 at 06:03:42PM +0300, Ivan Zakharyaschev wrote:
[...]
> BTW, saving and forgetting the type of the original fault doesn't seem
> to be something expensive to implement (after some thought): when a
> register is marked as invalid, it shouldn't matter anymore what value
> it holds. So, the same register can be used to save the information
> about the type of the fault.

Note that SIGILL, SIGFPE, SIGSEGV, and SIGBUS come with si_addr specifying
the memory location which caused the fault.  When a memory fault is
transformed into an illegal operand fault, the location of the original
memory fault is likely lost, too - you can easily check this hypothesis
by installing a signal handler: if si_addr is not 0xbad from your example,
then it's been lost.
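
The check described above could look roughly like the following sketch, assuming a POSIX system with `SA_SIGINFO` support. The names `fault_report` and `probe_increment` are made up for illustration. Jumping out of the handler with `siglongjmp` is done here only so that the probe can return a result instead of dying; it is not a recommendation for production code.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <setjmp.h>
#include <signal.h>
#include <stddef.h>
#include <string.h>

struct fault_report {
    int signo;   /* SIGSEGV, SIGBUS or SIGILL */
    int code;    /* si_code, e.g. SEGV_MAPERR */
    void *addr;  /* si_addr as reported by the kernel */
};

static sigjmp_buf probe_env;
static struct fault_report probe_seen;

static void probe_handler(int signo, siginfo_t *info, void *context)
{
    (void)context;
    probe_seen.signo = signo;
    probe_seen.code = info->si_code;
    probe_seen.addr = info->si_addr;
    siglongjmp(probe_env, 1);   /* leave the faulting access behind */
}

/* Performs ++*p with handlers installed.  Returns 1 and fills *out if
   a fault signal was caught, 0 if the access succeeded, -1 if a
   handler could not be installed. */
int probe_increment(volatile char *p, struct fault_report *out)
{
    static const int sigs[3] = { SIGSEGV, SIGBUS, SIGILL };
    struct sigaction sa, old[3];
    int result;
    size_t i;

    assert(out != NULL);
    memset(&sa, 0, sizeof sa);
    sa.sa_sigaction = probe_handler;
    sa.sa_flags = SA_SIGINFO;
    sigemptyset(&sa.sa_mask);
    for (i = 0; i < 3; i++)
        if (sigaction(sigs[i], &sa, &old[i]) != 0)
            return -1;

    if (sigsetjmp(probe_env, 1) == 0) {
        ++*p;                   /* same read-modify-write as in the thread */
        result = 0;
    } else {
        *out = probe_seen;
        result = 1;
    }
    for (i = 0; i < 3; i++)
        sigaction(sigs[i], &old[i], NULL);
    return result;
}
```

Calling `probe_increment((volatile char *)0xbad, &r)` and then comparing `r.signo` with SIGSEGV and `r.addr` with `(void *)0xbad` tests the hypothesis: a different `r.addr` would indicate that the original location was lost.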


-- 
ldv




Re: [RFC PATCH] test-c-stack2.sh: skip if the platform sent SIGILL on an invalid address.

2018-12-29 Thread Ivan Zakharyaschev
Hi,

On Sat, 29 Dec 2018, Dmitry V. Levin wrote:

> On Fri, Dec 28, 2018 at 05:23:09PM +0300, Ivan Zakharyaschev wrote:

> > As for the SIGILL peculiarity, it has a reason in the Elbrus architecture. 

> No, this particular case (++*argv[argc]) has nothing to do with tagged memory,
> I hope Ivan will share his findings here.

I've done it.

Thanks for your hints regarding a test for another kind of fault (SIGFPE) 
happening speculatively, and regarding the hexadecimal values which are 
easy to detect visually (0xbad etc.)!

-- 
Best regards,
Ivan



Re: [RFC PATCH] test-c-stack2.sh: skip if the platform sent SIGILL on an invalid address.

2018-12-29 Thread Ivan Zakharyaschev
Hi Bruno,

On Sat, 29 Dec 2018, Bruno Haible wrote:

> > "system in development" is the one which suits 
> > Linux/E2k better. The port to E2K (MCST Elbrus general purpose hardware 
> > architecture) is quite mature, but not yet released publicly.
> 
> Thanks for the info. Based on it, I found a couple of other pointers as well:
> [1][2].

> [1] 
> https://linux.slashdot.org/story/99/03/31/2324218/linus-will-move-to-moscow-to-work-with-elbrus

[1] is fun. :)

> > As for the SIGILL peculiarity, it has a reason in the Elbrus architecture. 
> > ...
> > And it's not a segmentation fault.

Meanwhile, I have found out that my explanations about it being the 
consequence of tagged memory (at least, in this specific case of 
test-c-stack.c) were largely incorrect. I'm sorry for that misleading 
information. I've studied the assembler code and found the true 
reason in this specific case: these are faults "hidden" in an explicitly 
"speculative" computation which ultimately result in SIGILL. (The E2K ISA 
is reminiscent of IA64; this can help get the idea.) The specific kind of 
the fault is "forgotten", unfortunately.

Bruno, this discovery makes your claims even stronger and more relevant: 
this kind of fault is normally expected by all programs to be SIGSEGV, and 
they cannot be expected to care whether the computation was done 
speculatively or not (i.e., with immediate effects).

> I believe you should make it signal a SIGSEGV or SIGBUS, not SIGILL, for
> the following reasons:
> 
> * Look at the second table in
>   http://pubs.opengroup.org/onlinepubs/9699919799/basedefs/signal.h.html.
>   It defines a couple of signal codes for SIGILL, SIGSEGV, and SIGBUS.
>   It implies that SIGILL means an invalid instruction (and "illegal operand"
>   means an invalid operand that is in the instruction stream).
>   Whereas SIGSEGV and SIGBUS mean a problem with an instruction in combination
>   with a memory address.

Thanks for the explanation concerning "illegal operand"! This rebuttal is 
more relevant to my first, imagined explanation than to the actual one, but 
it is important to know anyway.

> * The main users of SIGSEGV and SIGBUS are catching stack overflow, garbage
>   collection, and similar (e.g. by use of GNU libsigsegv). The fact that
>   you observe an incompatibility between your Linux adaptation and
>   application programs that work fine across Linux/BSD/AIX/Solaris is a sure
>   indication that you will encounter similar incompatibilities along the lines,
>   until you fix that port, to produce SIGSEGV or SIGBUS instead of SIGILL.

That's what I feel now, too. The only remaining question concerns 
the hardware: whether it can save the type of the fault that happened in a 
speculative computation and give it back when the result of the speculative 
computation is actually needed.

> This is reminiscent of the segmented architectures, such as the ones used by AIX
> and Linux/ia64. In these OSes, SIGSEGV is produced when a memory address
> is used that does not fit with the instruction.

Thanks for the information about similar conditions (to what I wrote about 
tagged memory) in other OSes!

Besides, in many respects, including the explicitly speculative 
instructions I have just mentioned, E2K resembles IA64.

And it'd be interesting to look at how faults coming from speculative 
computations are treated in Linux/ia64, to get an idea of whether it can be 
done in a manner that conforms better to POSIX.

* * *

Here are the actual facts about what happens on E2k (and a little bit on 
IA64) with a set of minimal contrasting examples:

Here are four example programs; the first two write to the memory, the latter
two first read from the memory. (There is an amazing difference
between the last two examples.) Probably, the demonstrated contrasts
do not cover all conditions under which SIGILL can occur.

 $ cc -Wall -xc - && ./a.out; echo $?
 int main(int argc, char ** argv) {
   *(char*)0 = 175;
   return 0;
 }
 Segmentation fault
 139
 $ cc -Wall -xc - && ./a.out; echo $?
 int main(int argc, char ** argv) {
   if (0 < argc)
     *(char*)0 = 175;
   return 0;
 }
 Segmentation fault
 139
 $ cc -Wall -xc - && ./a.out; echo $?
 int main(int argc, char ** argv) {
   if (0 < argc)
     ++*(char*)0;
   return 0;
 }
 Illegal instruction
 132
 $ cc -Wall -xc - && ./a.out; echo $?
 int main(int argc, char ** argv) {
   ++*(char*)0;
   return 0;
 }
 Segmentation fault
 139
 $ cc --version
 lcc:1.23.12:Aug--6-2018:e2k-v4-linux
 gcc (GCC) 5.5.0 compatible
 $

This leads to a suspicion that not only the direction of the memory
access matters (read or write), but also the speculative execution of
the memory access instruction (in the third example) -- for the sake
of optimization, something is done before the actual value of the
condition is computed. (Otherwise, without a speculative computation,
it's unclear how a redundant condition can affect anything.) The
speculative instructions are written explicitly in E2K ISA (and this
is also li

Re: [RFC PATCH] test-c-stack2.sh: skip if the platform sent SIGILL on an invalid address.

2018-12-29 Thread Bruno Haible
I wrote:
> I believe you should make it signal a SIGSEGV or SIGBUS, not SIGILL, for
> the following reasons:

A third reason is that the application will want to react depending on the
memory address which produced the fault. (I mean the memory address of the
data, not of the instruction.) This memory address is available as si_addr
in the siginfo struct only for SIGSEGV and SIGBUS, see
http://pubs.opengroup.org/onlinepubs/9699919799/basedefs/signal.h.html
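
As an illustration of why the data address matters, here is a rough sketch of the kind of decision a handler can make once si_addr is trustworthy: telling a stack overflow from an ordinary program error. This is not the implementation used by gnulib's c-stack module or by libsigsegv; the names `stack_bounds` and `classify_fault`, and the rule "a hit in the guard region just below the stack means overflow", are assumptions made for the example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

enum fault_kind { FAULT_STACK_OVERFLOW, FAULT_PROGRAM_ERROR };

/* Hypothetical description of a downward-growing stack: usable bytes
   are [lowest, highest); guard_size bytes below lowest are unmapped. */
struct stack_bounds {
    uintptr_t lowest;
    uintptr_t highest;
    size_t guard_size;
};

/* Classify a fault by its data address (si_addr): a hit in the guard
   region just below the stack is taken as stack overflow, anything
   else as an ordinary program error. */
enum fault_kind classify_fault(const struct stack_bounds *s,
                               uintptr_t si_addr)
{
    assert(s != NULL && s->lowest <= s->highest);
    assert(s->guard_size <= s->lowest);
    if (si_addr < s->lowest && s->lowest - si_addr <= s->guard_size)
        return FAULT_STACK_OVERFLOW;
    return FAULT_PROGRAM_ERROR;
}
```

If the kernel delivers SIGILL without the data address, a handler has nothing to feed into a function like this, which is the incompatibility being discussed.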

Bruno




Re: [RFC PATCH] test-c-stack2.sh: skip if the platform sent SIGILL on an invalid address.

2018-12-29 Thread Bruno Haible
Andrey Savchenko wrote:
> This is not possible. Four generations of hardware are already
> manufactured and they use SIGILL for such cases. It may be fixed in
> future generations if CPU designers will agree to do so

The mapping from hardware exception code to Unix signal number is done in
software, not in hardware. For an example, look in
linux-4.20/arch/sparc/mm/fault_32.c.
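
The point can be pictured with a small user-space model of such a mapping. This is a sketch only: the `hw_trap` codes are hypothetical and not taken from any real CPU or from the kernel source mentioned above, and `signal_for_trap` is an invented name. The signal numbers and si_code values are the ones defined by POSIX in <signal.h>. The sketch does not address the separate question raised in this thread, namely whether the hardware still knows the original fault type once it has been deferred.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <signal.h>
#include <stddef.h>

/* Hypothetical hardware exception codes; not taken from any real CPU. */
enum hw_trap {
    HW_TRAP_ILLEGAL_OPCODE,
    HW_TRAP_PAGE_NOT_MAPPED,
    HW_TRAP_PAGE_PROTECTION,
    HW_TRAP_MISALIGNED_ACCESS,
    HW_TRAP_INTEGER_DIV_BY_ZERO
};

struct signal_choice {
    int signo;  /* signal to deliver */
    int code;   /* si_code to report with it */
};

/* The policy lives entirely in this software table: changing which
   signal a trap produces means editing a case, not the hardware. */
struct signal_choice signal_for_trap(enum hw_trap trap)
{
    struct signal_choice c = { SIGILL, ILL_ILLOPC };

    assert((int)trap <= HW_TRAP_INTEGER_DIV_BY_ZERO);
    switch (trap) {
    case HW_TRAP_ILLEGAL_OPCODE:
        break;
    case HW_TRAP_PAGE_NOT_MAPPED:
        c.signo = SIGSEGV; c.code = SEGV_MAPERR;
        break;
    case HW_TRAP_PAGE_PROTECTION:
        c.signo = SIGSEGV; c.code = SEGV_ACCERR;
        break;
    case HW_TRAP_MISALIGNED_ACCESS:
        c.signo = SIGBUS; c.code = BUS_ADRALN;
        break;
    case HW_TRAP_INTEGER_DIV_BY_ZERO:
        c.signo = SIGFPE; c.code = FPE_INTDIV;
        break;
    }
    return c;
}
```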

Bruno




Re: [RFC PATCH] test-c-stack2.sh: skip if the platform sent SIGILL on an invalid address.

2018-12-29 Thread Dmitry V. Levin
On Sat, Dec 29, 2018 at 02:31:11PM +0300, Andrey Savchenko wrote:
> On Sat, 29 Dec 2018 12:17:32 +0100 Bruno Haible wrote:
> > > As for the SIGILL peculiarity, it has a reason in the Elbrus 
> > > architecture. 
> > > ...
> > > And it's not a segmentation fault.
> > 
> > I believe you should make it signal a SIGSEGV or SIGBUS, not SIGILL, for
> > the following reasons:
> > 
> > * Look at the second table in
> >   http://pubs.opengroup.org/onlinepubs/9699919799/basedefs/signal.h.html.
> >   It defines a couple of signal codes for SIGILL, SIGSEGV, and SIGBUS.
> >   It implies that SIGILL means an invalid instruction (and "illegal operand"
> >   means an invalid operand that is in the instruction stream).
> > >   Whereas SIGSEGV and SIGBUS mean a problem with an instruction in combination
> >   with a memory address.
> > 
> > * The main users of SIGSEGV and SIGBUS are catching stack overflow, garbage
> >   collection, and similar (e.g. by use of GNU libsigsegv). The fact that
> >   you observe an incompatibility between your Linux adaptation and
> >   application programs that work fine across Linux/BSD/AIX/Solaris is a sure
> > >   indication that you will encounter similar incompatibilities along the lines,
> >   until you fix that port, to produce SIGSEGV or SIGBUS instead of SIGILL.
> 
> This is not possible. Four generations of hardware are already
> manufactured and they use SIGILL for such cases. It may be fixed in
> future generations if CPU designers will agree to do so, but we
> have to deal with already produced and used in production hardware.

It's all up to the kernel what signal to generate in response
to that particular non-SIGSEGV kind of trap.

I agree with Bruno here: as long as the code in question causes SIGILL,
the architecture is not compatible and its users will suffer more
because of this unneeded incompatibility.


-- 
ldv




Re: [RFC PATCH] test-c-stack2.sh: skip if the platform sent SIGILL on an invalid address.

2018-12-29 Thread Dmitry V. Levin
On Fri, Dec 28, 2018 at 05:23:09PM +0300, Ivan Zakharyaschev wrote:
> Hi Bruno,
> 
> On Thu, 20 Dec 2018, Bruno Haible wrote:
> 
> > > +# E2K (elbrus) systems send SIGILL on an access to an invalid address.
> > 
> > This is a bug in the system. Access of an invalid address ought to produce a
> > SIGSEGV or SIGBUS.
> > 
> > 'elbrus' is not an important OS so far, for which it would be worth adding
> > workarounds in the gnulib source.
> > Is it still in development? -> If so, please fix that bug.
> > Or is it a museum system? -> If so, just bear with the test failure.
> 
> Of these descriptions, "system in development" is the one which suits 
> Linux/E2k better. The port to E2K (MCST Elbrus general purpose hardware 
> architecture) is quite mature, but not yet released publicly.
> 
> As for the SIGILL peculiarity, it has a reason in the Elbrus architecture. 
> AFAIU, a different protection mechanism comes into play here. It is based 
> on tagging values/memory: if an attempt is made to use a value in a way 
> which contradicts its tag, then the "illegal operand" condition arises. 
> Namely, a "load" instruction can expect a certain tag, and then there can 
> be a mismatch between the assumptions of the code and the actual value 
> and its tag.

No, this particular case (++*argv[argc]) has nothing to do with tagged memory.
I hope Ivan will share his findings here.


-- 
ldv




Re: [RFC PATCH] test-c-stack2.sh: skip if the platform sent SIGILL on an invalid address.

2018-12-29 Thread Andrey Savchenko
Hi all!

On Sat, 29 Dec 2018 12:17:32 +0100 Bruno Haible wrote:
> > As for the SIGILL peculiarity, it has a reason in the Elbrus architecture. 
> > ...
> > And it's not a segmentation fault.
> 
> I believe you should make it signal a SIGSEGV or SIGBUS, not SIGILL, for
> the following reasons:
> 
> * Look at the second table in
>   http://pubs.opengroup.org/onlinepubs/9699919799/basedefs/signal.h.html.
>   It defines a couple of signal codes for SIGILL, SIGSEGV, and SIGBUS.
>   It implies that SIGILL means an invalid instruction (and "illegal operand"
>   means an invalid operand that is in the instruction stream).
>   Whereas SIGSEGV and SIGBUS mean a problem with an instruction in combination
>   with a memory address.
> 
> * The main users of SIGSEGV and SIGBUS are catching stack overflow, garbage
>   collection, and similar (e.g. by use of GNU libsigsegv). The fact that
>   you observe an incompatibility between your Linux adaptation and
>   application programs that work fine across Linux/BSD/AIX/Solaris is a sure
>   indication that you will encounter similar incompatibilities along the lines,
>   until you fix that port, to produce SIGSEGV or SIGBUS instead of SIGILL.

This is not possible. Four generations of hardware are already
manufactured and they use SIGILL for such cases. It may be fixed in
future generations if CPU designers will agree to do so, but we
have to deal with already produced and used in production hardware.

Best regards,
Andrew Savchenko




Re: [RFC PATCH] test-c-stack2.sh: skip if the platform sent SIGILL on an invalid address.

2018-12-29 Thread Bruno Haible
Hi Ivan,

> "system in development" is the one which suits 
> Linux/E2k better. The port to E2K (MCST Elbrus general purpose hardware 
> architecture) is quite mature, but not yet released publicly.

Thanks for the info. Based on it, I found a couple of other pointers as well:
[1][2].

> As for the SIGILL peculiarity, it has a reason in the Elbrus architecture. 
> ...
> And it's not a segmentation fault.

I believe you should make it signal a SIGSEGV or SIGBUS, not SIGILL, for
the following reasons:

* Look at the second table in
  http://pubs.opengroup.org/onlinepubs/9699919799/basedefs/signal.h.html.
  It defines a couple of signal codes for SIGILL, SIGSEGV, and SIGBUS.
  It implies that SIGILL means an invalid instruction (and "illegal operand"
  means an invalid operand that is in the instruction stream).
  Whereas SIGSEGV and SIGBUS mean a problem with an instruction in combination
  with a memory address.

* The main users of SIGSEGV and SIGBUS are catching stack overflow, garbage
  collection, and similar (e.g. by use of GNU libsigsegv). The fact that
  you observe an incompatibility between your Linux adaptation and
  application programs that work fine across Linux/BSD/AIX/Solaris is a sure
  indication that you will encounter similar incompatibilities along the lines,
  until you fix that port, to produce SIGSEGV or SIGBUS instead of SIGILL.

> But wait, while writing this explanation, I seem to have come to see a way 
> how the code in test-c-stack.c:
> 
>   ++*argv[argc]; /* Intentionally dereference NULL.  */
> 
> could be rewritten to cause the intended SIGSEGV and not SIGILL like now:

If you get SIGSEGV in one case (write to the memory location), you should
also get SIGSEGV in the other case (read from the memory location).
 
> AFAIU, a different protection mechanism comes into play here. It is based 
> on tagging values/memory: if an attempt is made to use a value in a way 
> which contradicts its tag, then the "illegal operand" condition arises.

This is reminiscent of the segmented architectures, such as the ones used by AIX
and Linux/ia64. In these OSes, SIGSEGV is produced when a memory address
is used that does not fit with the instruction.

Bruno

[1] 
https://linux.slashdot.org/story/99/03/31/2324218/linus-will-move-to-moscow-to-work-with-elbrus
[2] http://elbrus2k.wikidot.com/elbrus-operating-system




Re: [RFC PATCH] test-c-stack2.sh: skip if the platform sent SIGILL on an invalid address.

2018-12-28 Thread Ivan Zakharyaschev
Hi Bruno,

On Thu, 20 Dec 2018, Bruno Haible wrote:

> > +  # E2K (elbrus) systems send SIGILL on an access to an invalid address.
> 
> This is a bug in the system. Access of an invalid address ought to produce a
> SIGSEGV or SIGBUS.
> 
> 'elbrus' is not an important OS so far, for which it would be worth adding
> workarounds in the gnulib source.
> Is it still in development? -> If so, please fix that bug.
> Or is it a museum system? -> If so, just bear with the test failure.

Of these descriptions, "system in development" is the one which suits 
Linux/E2k better. The port to E2K (MCST Elbrus general purpose hardware 
architecture) is quite mature, but not yet released publicly.

As for the SIGILL peculiarity, it has a reason in the Elbrus architecture. 
AFAIU, a different protection mechanism comes into play here. It is based 
on tagging values/memory: if an attempt is made to use a value in a way 
which contradicts its tag, then the "illegal operand" condition arises. 
Namely, a "load" instruction can expect a certain tag, and then there can 
be a mismatch between the assumptions of the code and the actual value 
and its tag.

And it's not a segmentation fault.

(This must be just a simple case of the use of tagging in this 
architecture, whereas--AFAIK--MCST has been developing some smarter 
protection modes to make use of tags to track the array bounds along with 
pointers and for other things. The smarter modes are probably not enabled 
by default in the compiler. Now, I could google up a 2018 report on such 
recent work by searching for "elbrus" "e2k" "SIGILL", in Russian.)

But wait, while writing this explanation, I think I have found a way in 
which the code in test-c-stack.c:

  ++*argv[argc]; /* Intentionally dereference NULL.  */

could be rewritten to cause the intended SIGSEGV rather than the SIGILL it 
causes now:

$ ./test-c-stack 1; echo $?
Illegal instruction
132
$ 

The tags that are seen and checked by a "load" instruction must have been 
stored before. So, if we now think about storing values to memory, we see 
that when storing a value, one is not checking the tag, but rather writing 
it initially. So (at least in the simple protection mode), there can be no 
SIGILL when writing.

And I've tested running test-c-stack with this code instead:

  *argv[argc] = 175; /* Intentionally dereference NULL.  */

and it indeed causes a SIGSEGV:

$ ./test-c-stack 1; echo $?
test-c-stack: stack overflow
77
$ 

and with libsigsegv:

$ ./test-c-stack 1; echo $?
test-c-stack: program error
Aborted
134
$ ./test-c-stack2.sh; echo $?
0
$ 

So, now I suggest a patch that replaces reading and then writing a 
value at this place with just writing a value. (A complete patch is 
attached.) This way we don't need a workaround in the test for the 
Linux/E2K platform, and the test should be no worse for it.

It would be possible to follow the write with a read, but this does not 
seem essential: on E2K, and probably on most other architectures, execution 
would never reach the read. (That way, though, the new code would be closer 
to the old code in the operations involved, and who knows, there might be 
some architecture where one needs to read to cause a fault.)

-- 
Best regards,
Ivan

From 057259bd81fbb60233df00d0a2846304088e1d47 Mon Sep 17 00:00:00 2001
From: Ivan Zakharyaschev 
Date: Fri, 28 Dec 2018 17:03:18 +0300
Subject: [PATCH] c-stack tests: Avoid test failure on Linux/E2K.

Reading a value without having initialized it caused a SIGILL on
Linux/E2K rather than SIGSEGV as desired.

This made test-c-stack2.sh fail on E2K. As for test-c-stack2.sh, its
intention is to test whether we can tell a stack overflow from other
cases when SIGSEGV is sent, and the way we cause a SIGSEGV in this
test is just an implementation detail. It turned out that these
implementation details need to be slightly changed for Linux/E2K.
---
 tests/test-c-stack.c | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/tests/test-c-stack.c b/tests/test-c-stack.c
index 1dae74e6c..14fec8e07 100644
--- a/tests/test-c-stack.c
+++ b/tests/test-c-stack.c
@@ -63,7 +63,9 @@ main (int argc, char **argv)
   if (1 < argc)
     {
       exit_failure = 77;
-      ++*argv[argc]; /* Intentionally dereference NULL.  */
+      *argv[argc] = 175; /* Intentionally dereference NULL.  Writing an
+                            arbitrary value, because reading without having
+                            initialized it causes a SIGILL on Linux/E2K.  */
     }
   return recurse (0);
 }
-- 
2.19.2



Re: [RFC PATCH] test-c-stack2.sh: skip if the platform sent SIGILL on an invalid address.

2018-12-19 Thread Bruno Haible
Hi Ivan,

> +# E2K (elbrus) systems send SIGILL on an access to an invalid address.

This is a bug in the system. Access of an invalid address ought to produce a
SIGSEGV or SIGBUS.

'elbrus' is not an important OS so far, for which it would be worth adding
workarounds in the gnulib source.
Is it still in development? -> If so, please fix that bug.
Or is it a museum system? -> If so, just bear with the test failure.

Bruno




[RFC PATCH] test-c-stack2.sh: skip if the platform sent SIGILL on an invalid address.

2018-12-15 Thread Ivan Zakharyaschev
I can see two ways to think about the purpose of this test:

1. distinguish stack overflow from an access to an invalid address
("program error")

2. distinguish stack overflow from other cases when SIGSEGV is sent

Under view 2, the access to an invalid address is just an
implementation detail: a simple way to cause SIGSEGV.

I assume view 2 in this patch and simply consider a platform which
doesn't send SIGSEGV on this condition (but rather sends SIGILL, as
E2K (i.e., elbrus) does) unsuitable for this implementation of the
test. Therefore, the result is a skip.

Under view 1, it could even be considered a success: the distinction
is made, not thanks to our code, but thanks to the platform
sending a different signal.

Here is what it looks like on E2K (i.e., elbrus):

$ ./test-c-stack 1; echo $?
Illegal instruction
132
$ ./test-c-stack; echo $?
test-c-stack: stack overflow
1
$
---
 tests/test-c-stack2.sh | 5 +
 1 file changed, 5 insertions(+)

diff --git a/tests/test-c-stack2.sh b/tests/test-c-stack2.sh
index 0cd49c969..a04d861cd 100755
--- a/tests/test-c-stack2.sh
+++ b/tests/test-c-stack2.sh
@@ -23,6 +23,11 @@ case $? in
 exit 77
   fi
   ;;
+  132) echo 'not applicable if non-SIGSEGV is sent in the case to be told from stack overflow' >&2
+  # E2K (elbrus) systems send SIGILL on an access to an invalid address.
+  # So, this test is skipped:
+  exit 77
+  ;;
   0) (exit 1); exit 1 ;;
 esac
 if grep 'program error' t-c-stack2.tmp >/dev/null ; then
-- 
2.17.1