Yes,
But we have to replace natural crappy code (split a long in 2 ints) that
was once legal, by an even more crappy code (memcpy), so all in all, it's a
crappy art.

Le mer. 11 déc. 2019 à 19:30, [email protected] <[email protected]> a
écrit :

> Hi Nicolas,
>   thanks for the comment, you are right the problem is the bad
> original code. But my opinion is that it just is not reporting the
> situation correctly, generating a warning or non-optimizing the code
> looks more like a expected behavior. Because as I have said, using a
> constant as index in the last statement generates a meaningful warning
> and the non-optimizated version of the function.
>
> And again as you said, the only thing to learn about all this is that
> we should not write crappy code.
>
> On Wed, Dec 11, 2019 at 7:11 PM Nicolas Cellier
> <[email protected]> wrote:
> >
> > Of course, when I say "your" code, it's the code you have shown, and
> probably "our" (VMMaker) code ;)
> >
> > Le mer. 11 déc. 2019 à 19:05, Nicolas Cellier <
> [email protected]> a écrit :
> >>
> >> Hi Pablo (again),
> >> no, not a bug.
> >>
> >> The problem is in the source code. The compiler has the right to
> presume that your code is exempt of UB, because you cannot depend on UB
> (obviously).
> >> So it can eliminate all code which corresponds to UB.
> >>
> >> The compiler has the right to assume that a pointer to an int cannot
> point to a long (UB).
> >> So modifying a long cannot have any sort of impact on the content of
> the int pointer.
> >> So the compiler can decouple both path return int content and assign
> long.
> >> But assigning the long has no effect, so the code can be suppressed
> altogether.
> >>
> >> Le mer. 11 déc. 2019 à 18:54, [email protected] <[email protected]> a
> écrit :
> >>>
> >>> Hi Aliaksei,
> >>>       to me it looks like a bug of GCC optimization. Basically, it is
> >>> assuming that the x variable is used but never read or its value is
> >>> never used. Also it assumes the same of the i variable, as we are only
> >>> accessing indirectly to the memory where it locates (the code is even
> >>> assuming that the variable exists, but it can be optimize out as in
> >>> this scenario). Even though, the original C code is valid C code, we
> >>> are not helping the compiler writing code like that. So I have
> >>> rewritten the code in a way that does not use indirect memory access
> >>> to the stack space.
> >>>
> >>> One thing more that makes me think is a bug, if you use an int
> >>> constant as the index and not a parameter, the error does not occur
> >>> (the code is not badly optimized) and there is a warning about the
> >>> not-so-great access to the stack.
> >>>
> >>> On Wed, Dec 11, 2019 at 6:01 PM Aliaksei Syrel <[email protected]>
> wrote:
> >>> >
> >>> > Hi Pablo,
> >>> >
> >>> > Wow! Thank you for the detective story :)
> >>> >
> >>> > Do I understand correctly that the original code causes undefined
> behavior and therefore can be changed (or even removed) by the compiler?
> >>> > (because it returns something that is referencing memory on the
> stack)
> >>> >
> >>> > Please keep posting similar things in future! It is very educative :)
> >>> >
> >>> > Cheers,
> >>> > Alex
> >>> >
> >>> >
> >>> > On Wed, 11 Dec 2019 at 17:35, [email protected] <[email protected]>
> wrote:
> >>> >>
> >>> >> Hi,
> >>> >>     this mail is related to Pharo because it is knowledge I found
> >>> >> debugging the build of the VM, but the rest is to document it and
> >>> >> perhaps someone will found it interesting (also I couldn't find it
> >>> >> easily using Google). Sorry for the long mail!
> >>> >>
> >>> >> The problem
> >>> >> ==========
> >>> >>
> >>> >> The following code does not produce good code in 8.3 when using
> optimizations:
> >>> >>
> >>> >> long __attribute__ ((noinline)) myFunc(long i, int index){
> >>> >>    long v;
> >>> >>    long x = i >> 3;
> >>> >>
> >>> >>    v = x;
> >>> >>    return ((int*)(&v))[index];
> >>> >> }
> >>> >>
> >>> >> #include <stdio.h>
> >>> >>
> >>> >> int main(){
> >>> >>
> >>> >>     long i;
> >>> >>     int x;
> >>> >>
> >>> >>     scanf("%ld", &i);
> >>> >>     scanf("%d", &x);
> >>> >>
> >>> >>     printf("%ld",myFunc(i,x));
> >>> >> }
> >>> >>
> >>> >> Basically, with -02, it generates the following code:
> >>> >>
> >>> >> myFunc:
> >>> >>      movslq %esi, %rsi
> >>> >>      movslq -8(%rsp,%rsi,4), %rax
> >>> >>      ret
> >>> >>
> >>> >> And with -01 it generates the following code:
> >>> >>
> >>> >> myFunc:
> >>> >>      sarq $3, %rdi
> >>> >>      movq %rdi, -8(%rsp)
> >>> >>      movslq %esi, %rsi
> >>> >>      movslq -8(%rsp,%rsi,4), %rax
> >>> >>      ret
> >>> >>
> >>> >> As you can see, the more optimized version is losing the bit shift
> (or
> >>> >> any other operation).
> >>> >> To detect the problem it was not easy, basically, I have used the
> >>> >> Pharo Tests to detect the failing function and then to understand
> the
> >>> >> generation of code in GCC, we need to debug its generation.
> >>> >>
> >>> >> The first snippet is generated using:
> >>> >>
> >>> >> gcc -O2 prb.c -S -o prb.s -Wall
> >>> >>
> >>> >> and the second using:
> >>> >>
> >>> >> gcc -O1 prb.c -S -o prb.s -Wall
> >>> >>
> >>> >> The GCC pipeline has different steps, I have centered my self in the
> >>> >> tree optimizations.
> >>> >> GCC dumps each of the intermediate states of the compilation,
> working
> >>> >> with C like trees. For example to get the optimized version of the
> >>> >> program, before generating the Assembler we have to do:
> >>> >>
> >>> >> gcc -O2 prb.c -S -o prb.s -Wall -fdump-tree-optimized=/dev/stdout
> >>> >>
> >>> >> This will generate:
> >>> >>
> >>> >> __attribute__((noinline))
> >>> >> myFunc (long int i, int index)
> >>> >> {
> >>> >>   long int v;
> >>> >>   long unsigned int _1;
> >>> >>   long unsigned int _2;
> >>> >>   int * _3;
> >>> >>   int _4;
> >>> >>   long int _8;
> >>> >>
> >>> >>   <bb 2> [local count: 1073741825]:
> >>> >>   _1 = (long unsigned int) index_7(D);
> >>> >>   _2 = _1 * 4;
> >>> >>   _3 = &v + _2;
> >>> >>   _4 = *_3;
> >>> >>   _8 = (long int) _4;
> >>> >>   v ={v} {CLOBBER};
> >>> >>   return _8;
> >>> >>
> >>> >> }
> >>> >>
> >>> >> This is the optimized SSA (static single assign) version of the
> >>> >> function (https://gcc.gnu.org/onlinedocs/gccint/SSA.html)
> >>> >> As you can see in this version the code is already optimized, and
> our
> >>> >> operations not correctly generated. We expect to see something like:
> >>> >>
> >>> >> __attribute__((noinline))
> >>> >> myFunc (long int i, int index)
> >>> >> {
> >>> >>   long int x;
> >>> >>   long int v;
> >>> >>   long unsigned int _1;
> >>> >>   long unsigned int _2;
> >>> >>   int * _3;
> >>> >>   int _4;
> >>> >>   long int _10;
> >>> >>
> >>> >>   <bb 2> [local count: 1073741825]:
> >>> >>   x_6 = i_5(D) >> 3;
> >>> >>   v = x_6;
> >>> >>   _1 = (long unsigned int) index_9(D);
> >>> >>   _2 = _1 * 4;
> >>> >>   _3 = &v + _2;
> >>> >>   _4 = *_3;
> >>> >>   _10 = (long int) _4;
> >>> >>   v ={v} {CLOBBER};
> >>> >>   return _10;
> >>> >>
> >>> >> }
> >>> >>
> >>> >> In the first SSA version, we are lacking the shift operation:
> >>> >>
> >>> >>   x_6 = i_5(D) >> 3;
> >>> >>   v = x_6;
> >>> >>
> >>> >> So, we need to see in which of the optimizations and transformations
> >>> >> our code is lost:
> >>> >> To see all the steps we should execute:
> >>> >>
> >>> >> gcc -O2 prb.c -S -o prb.s -Wall -fdump-tree-all
> >>> >>
> >>> >> This will generate a lot, really a lot, of files in the same
> directory.
> >>> >> They have the name: prb.c.xxx.yyyy
> >>> >> Where xxx is the number of the step, and yyyy is the name of the
> step.
> >>> >> Each of the files contains the result of applying the changes, so
> what
> >>> >> I have done is looking in binary search where my code was lost.
> >>> >>
> >>> >> I have found that the problem was in the first dead code elimination
> >>> >> step (prb.c.041t.cddce1)
> >>> >> GCC does not like the crap code that we are writing, as we are
> copying
> >>> >> a stack variable and then accessing directly to the memory:
> >>> >>
> >>> >> v = x;
> >>> >> return ((int*)(&v))[index];
> >>> >>
> >>> >> So, basically, it was assuming that we were only using v and not x,
> so
> >>> >> all the code modifying x is described (this optimization is called
> >>> >> "dead store code deletion"). Basically tries to remove all the code
> >>> >> that is affecting stack (temporary) variables that are never used.
> >>> >>
> >>> >> I am not sure if this is a bug in GCC, so I will report it. But I
> have
> >>> >> fixed the problem writing the code in a proper way.
> >>> >>
> >>> >> Sorry again for the long mail, I wanted to store the knowledge and
> >>> >> propagate it before I eventually will flush it. I like these things,
> >>> >> but I am not debugging the GCC optimization pipeline all the days.
> >>> >>
> >>> >> --
> >>> >> Pablo Tesone.
> >>> >> [email protected]
> >>> >>
> >>>
> >>>
> >>> --
> >>> Pablo Tesone.
> >>> [email protected]
> >>>
>
>
> --
> Pablo Tesone.
> [email protected]
>
>

Reply via email to