It's the tribute to pay for optimization.
With unbounded pointer and pointer aliasing, the C compiler is in the worse
world for optimizing.
As long as a FORTRAN compiler will beat a C compiler in generated code
efficiency, the C guys will have to amend the standard ;)
So we have plenty of new rules (C89 is 30 years ago, not exactly as new as
we pretend), the restrict keyword, etc...

Le mer. 11 déc. 2019 à 21:14, Eliot Miranda <[email protected]> a
écrit :

>
>
> On Wed, Dec 11, 2019 at 12:03 PM Nicolas Cellier <
> [email protected]> wrote:
>
>> Yes,
>> But we have to replace natural crappy code (split a long in 2 ints) that
>> was once legal, by an even more crappy code (memcpy), so all in all, it's a
>> crappy art.
>>
>
> :-)  Indeed.  Personally I liked it when C was a portable assembler.
> That's what it's fit for and that's what it should be good at.  Trying to
> pretend it's Pascal is, um, pretentious?
>
>
>>
>> Le mer. 11 déc. 2019 à 19:30, [email protected] <[email protected]> a
>> écrit :
>>
>>> Hi Nicolas,
>>>   thanks for the comment, you are right the problem is the bad
>>> original code. But my opinion is that it just is not reporting the
>>> situation correctly, generating a warning or non-optimizing the code
>>> looks more like a expected behavior. Because as I have said, using a
>>> constant as index in the last statement generates a meaningful warning
>>> and the non-optimizated version of the function.
>>>
>>> And again as you said, the only thing to learn about all this is that
>>> we should not write crappy code.
>>>
>>> On Wed, Dec 11, 2019 at 7:11 PM Nicolas Cellier
>>> <[email protected]> wrote:
>>> >
>>> > Of course, when I say "your" code, it's the code you have shown, and
>>> probably "our" (VMMaker) code ;)
>>> >
>>> > Le mer. 11 déc. 2019 à 19:05, Nicolas Cellier <
>>> [email protected]> a écrit :
>>> >>
>>> >> Hi Pablo (again),
>>> >> no, not a bug.
>>> >>
>>> >> The problem is in the source code. The compiler has the right to
>>> presume that your code is exempt of UB, because you cannot depend on UB
>>> (obviously).
>>> >> So it can eliminate all code which corresponds to UB.
>>> >>
>>> >> The compiler has the right to assume that a pointer to an int cannot
>>> point to a long (UB).
>>> >> So modifying a long cannot have any sort of impact on the content of
>>> the int pointer.
>>> >> So the compiler can decouple both path return int content and assign
>>> long.
>>> >> But assigning the long has no effect, so the code can be suppressed
>>> altogether.
>>> >>
>>> >> Le mer. 11 déc. 2019 à 18:54, [email protected] <[email protected]>
>>> a écrit :
>>> >>>
>>> >>> Hi Aliaksei,
>>> >>>       to me it looks like a bug of GCC optimization. Basically, it is
>>> >>> assuming that the x variable is used but never read or its value is
>>> >>> never used. Also it assumes the same of the i variable, as we are
>>> only
>>> >>> accessing indirectly to the memory where it locates (the code is even
>>> >>> assuming that the variable exists, but it can be optimize out as in
>>> >>> this scenario). Even though, the original C code is valid C code, we
>>> >>> are not helping the compiler writing code like that. So I have
>>> >>> rewritten the code in a way that does not use indirect memory access
>>> >>> to the stack space.
>>> >>>
>>> >>> One thing more that makes me think is a bug, if you use an int
>>> >>> constant as the index and not a parameter, the error does not occur
>>> >>> (the code is not badly optimized) and there is a warning about the
>>> >>> not-so-great access to the stack.
>>> >>>
>>> >>> On Wed, Dec 11, 2019 at 6:01 PM Aliaksei Syrel <[email protected]>
>>> wrote:
>>> >>> >
>>> >>> > Hi Pablo,
>>> >>> >
>>> >>> > Wow! Thank you for the detective story :)
>>> >>> >
>>> >>> > Do I understand correctly that the original code causes undefined
>>> behavior and therefore can be changed (or even removed) by the compiler?
>>> >>> > (because it returns something that is referencing memory on the
>>> stack)
>>> >>> >
>>> >>> > Please keep posting similar things in future! It is very educative
>>> :)
>>> >>> >
>>> >>> > Cheers,
>>> >>> > Alex
>>> >>> >
>>> >>> >
>>> >>> > On Wed, 11 Dec 2019 at 17:35, [email protected] <[email protected]>
>>> wrote:
>>> >>> >>
>>> >>> >> Hi,
>>> >>> >>     this mail is related to Pharo because it is knowledge I found
>>> >>> >> debugging the build of the VM, but the rest is to document it and
>>> >>> >> perhaps someone will found it interesting (also I couldn't find it
>>> >>> >> easily using Google). Sorry for the long mail!
>>> >>> >>
>>> >>> >> The problem
>>> >>> >> ==========
>>> >>> >>
>>> >>> >> The following code does not produce good code in 8.3 when using
>>> optimizations:
>>> >>> >>
>>> >>> >> long __attribute__ ((noinline)) myFunc(long i, int index){
>>> >>> >>    long v;
>>> >>> >>    long x = i >> 3;
>>> >>> >>
>>> >>> >>    v = x;
>>> >>> >>    return ((int*)(&v))[index];
>>> >>> >> }
>>> >>> >>
>>> >>> >> #include <stdio.h>
>>> >>> >>
>>> >>> >> int main(){
>>> >>> >>
>>> >>> >>     long i;
>>> >>> >>     int x;
>>> >>> >>
>>> >>> >>     scanf("%ld", &i);
>>> >>> >>     scanf("%d", &x);
>>> >>> >>
>>> >>> >>     printf("%ld",myFunc(i,x));
>>> >>> >> }
>>> >>> >>
>>> >>> >> Basically, with -02, it generates the following code:
>>> >>> >>
>>> >>> >> myFunc:
>>> >>> >>      movslq %esi, %rsi
>>> >>> >>      movslq -8(%rsp,%rsi,4), %rax
>>> >>> >>      ret
>>> >>> >>
>>> >>> >> And with -01 it generates the following code:
>>> >>> >>
>>> >>> >> myFunc:
>>> >>> >>      sarq $3, %rdi
>>> >>> >>      movq %rdi, -8(%rsp)
>>> >>> >>      movslq %esi, %rsi
>>> >>> >>      movslq -8(%rsp,%rsi,4), %rax
>>> >>> >>      ret
>>> >>> >>
>>> >>> >> As you can see, the more optimized version is losing the bit
>>> shift (or
>>> >>> >> any other operation).
>>> >>> >> To detect the problem it was not easy, basically, I have used the
>>> >>> >> Pharo Tests to detect the failing function and then to understand
>>> the
>>> >>> >> generation of code in GCC, we need to debug its generation.
>>> >>> >>
>>> >>> >> The first snippet is generated using:
>>> >>> >>
>>> >>> >> gcc -O2 prb.c -S -o prb.s -Wall
>>> >>> >>
>>> >>> >> and the second using:
>>> >>> >>
>>> >>> >> gcc -O1 prb.c -S -o prb.s -Wall
>>> >>> >>
>>> >>> >> The GCC pipeline has different steps, I have centered my self in
>>> the
>>> >>> >> tree optimizations.
>>> >>> >> GCC dumps each of the intermediate states of the compilation,
>>> working
>>> >>> >> with C like trees. For example to get the optimized version of the
>>> >>> >> program, before generating the Assembler we have to do:
>>> >>> >>
>>> >>> >> gcc -O2 prb.c -S -o prb.s -Wall -fdump-tree-optimized=/dev/stdout
>>> >>> >>
>>> >>> >> This will generate:
>>> >>> >>
>>> >>> >> __attribute__((noinline))
>>> >>> >> myFunc (long int i, int index)
>>> >>> >> {
>>> >>> >>   long int v;
>>> >>> >>   long unsigned int _1;
>>> >>> >>   long unsigned int _2;
>>> >>> >>   int * _3;
>>> >>> >>   int _4;
>>> >>> >>   long int _8;
>>> >>> >>
>>> >>> >>   <bb 2> [local count: 1073741825]:
>>> >>> >>   _1 = (long unsigned int) index_7(D);
>>> >>> >>   _2 = _1 * 4;
>>> >>> >>   _3 = &v + _2;
>>> >>> >>   _4 = *_3;
>>> >>> >>   _8 = (long int) _4;
>>> >>> >>   v ={v} {CLOBBER};
>>> >>> >>   return _8;
>>> >>> >>
>>> >>> >> }
>>> >>> >>
>>> >>> >> This is the optimized SSA (static single assign) version of the
>>> >>> >> function (https://gcc.gnu.org/onlinedocs/gccint/SSA.html)
>>> >>> >> As you can see in this version the code is already optimized, and
>>> our
>>> >>> >> operations not correctly generated. We expect to see something
>>> like:
>>> >>> >>
>>> >>> >> __attribute__((noinline))
>>> >>> >> myFunc (long int i, int index)
>>> >>> >> {
>>> >>> >>   long int x;
>>> >>> >>   long int v;
>>> >>> >>   long unsigned int _1;
>>> >>> >>   long unsigned int _2;
>>> >>> >>   int * _3;
>>> >>> >>   int _4;
>>> >>> >>   long int _10;
>>> >>> >>
>>> >>> >>   <bb 2> [local count: 1073741825]:
>>> >>> >>   x_6 = i_5(D) >> 3;
>>> >>> >>   v = x_6;
>>> >>> >>   _1 = (long unsigned int) index_9(D);
>>> >>> >>   _2 = _1 * 4;
>>> >>> >>   _3 = &v + _2;
>>> >>> >>   _4 = *_3;
>>> >>> >>   _10 = (long int) _4;
>>> >>> >>   v ={v} {CLOBBER};
>>> >>> >>   return _10;
>>> >>> >>
>>> >>> >> }
>>> >>> >>
>>> >>> >> In the first SSA version, we are lacking the shift operation:
>>> >>> >>
>>> >>> >>   x_6 = i_5(D) >> 3;
>>> >>> >>   v = x_6;
>>> >>> >>
>>> >>> >> So, we need to see in which of the optimizations and
>>> transformations
>>> >>> >> our code is lost:
>>> >>> >> To see all the steps we should execute:
>>> >>> >>
>>> >>> >> gcc -O2 prb.c -S -o prb.s -Wall -fdump-tree-all
>>> >>> >>
>>> >>> >> This will generate a lot, really a lot, of files in the same
>>> directory.
>>> >>> >> They have the name: prb.c.xxx.yyyy
>>> >>> >> Where xxx is the number of the step, and yyyy is the name of the
>>> step.
>>> >>> >> Each of the files contains the result of applying the changes, so
>>> what
>>> >>> >> I have done is looking in binary search where my code was lost.
>>> >>> >>
>>> >>> >> I have found that the problem was in the first dead code
>>> elimination
>>> >>> >> step (prb.c.041t.cddce1)
>>> >>> >> GCC does not like the crap code that we are writing, as we are
>>> copying
>>> >>> >> a stack variable and then accessing directly to the memory:
>>> >>> >>
>>> >>> >> v = x;
>>> >>> >> return ((int*)(&v))[index];
>>> >>> >>
>>> >>> >> So, basically, it was assuming that we were only using v and not
>>> x, so
>>> >>> >> all the code modifying x is described (this optimization is called
>>> >>> >> "dead store code deletion"). Basically tries to remove all the
>>> code
>>> >>> >> that is affecting stack (temporary) variables that are never used.
>>> >>> >>
>>> >>> >> I am not sure if this is a bug in GCC, so I will report it. But I
>>> have
>>> >>> >> fixed the problem writing the code in a proper way.
>>> >>> >>
>>> >>> >> Sorry again for the long mail, I wanted to store the knowledge and
>>> >>> >> propagate it before I eventually will flush it. I like these
>>> things,
>>> >>> >> but I am not debugging the GCC optimization pipeline all the days.
>>> >>> >>
>>> >>> >> --
>>> >>> >> Pablo Tesone.
>>> >>> >> [email protected]
>>> >>> >>
>>> >>>
>>> >>>
>>> >>> --
>>> >>> Pablo Tesone.
>>> >>> [email protected]
>>> >>>
>>>
>>>
>>> --
>>> Pablo Tesone.
>>> [email protected]
>>>
>>>
>
> --
> _,,,^..^,,,_
> best, Eliot
>

Reply via email to