It's the tribute to pay for optimization. With unbounded pointer and pointer aliasing, the C compiler is in the worse world for optimizing. As long as a FORTRAN compiler will beat a C compiler in generated code efficiency, the C guys will have to amend the standard ;) So we have plenty of new rules (C89 is 30 years ago, not exactly as new as we pretend), the restrict keyword, etc...
Le mer. 11 déc. 2019 à 21:14, Eliot Miranda <[email protected]> a écrit : > > > On Wed, Dec 11, 2019 at 12:03 PM Nicolas Cellier < > [email protected]> wrote: > >> Yes, >> But we have to replace natural crappy code (split a long in 2 ints) that >> was once legal, by an even more crappy code (memcpy), so all in all, it's a >> crappy art. >> > > :-) Indeed. Personally I liked it when C was a portable assembler. > That's what it's fit for and that's what it should be good at. Trying to > pretend it's Pascal is, um, pretentious? > > >> >> Le mer. 11 déc. 2019 à 19:30, [email protected] <[email protected]> a >> écrit : >> >>> Hi Nicolas, >>> thanks for the comment, you are right the problem is the bad >>> original code. But my opinion is that it just is not reporting the >>> situation correctly, generating a warning or non-optimizing the code >>> looks more like a expected behavior. Because as I have said, using a >>> constant as index in the last statement generates a meaningful warning >>> and the non-optimizated version of the function. >>> >>> And again as you said, the only thing to learn about all this is that >>> we should not write crappy code. >>> >>> On Wed, Dec 11, 2019 at 7:11 PM Nicolas Cellier >>> <[email protected]> wrote: >>> > >>> > Of course, when I say "your" code, it's the code you have shown, and >>> probably "our" (VMMaker) code ;) >>> > >>> > Le mer. 11 déc. 2019 à 19:05, Nicolas Cellier < >>> [email protected]> a écrit : >>> >> >>> >> Hi Pablo (again), >>> >> no, not a bug. >>> >> >>> >> The problem is in the source code. The compiler has the right to >>> presume that your code is exempt of UB, because you cannot depend on UB >>> (obviously). >>> >> So it can eliminate all code which corresponds to UB. >>> >> >>> >> The compiler has the right to assume that a pointer to an int cannot >>> point to a long (UB). >>> >> So modifying a long cannot have any sort of impact on the content of >>> the int pointer. >>> >> So the compiler can decouple both path return int content and assign >>> long. >>> >> But assigning the long has no effect, so the code can be suppressed >>> altogether. >>> >> >>> >> Le mer. 11 déc. 2019 à 18:54, [email protected] <[email protected]> >>> a écrit : >>> >>> >>> >>> Hi Aliaksei, >>> >>> to me it looks like a bug of GCC optimization. Basically, it is >>> >>> assuming that the x variable is used but never read or its value is >>> >>> never used. Also it assumes the same of the i variable, as we are >>> only >>> >>> accessing indirectly to the memory where it locates (the code is even >>> >>> assuming that the variable exists, but it can be optimize out as in >>> >>> this scenario). Even though, the original C code is valid C code, we >>> >>> are not helping the compiler writing code like that. So I have >>> >>> rewritten the code in a way that does not use indirect memory access >>> >>> to the stack space. >>> >>> >>> >>> One thing more that makes me think is a bug, if you use an int >>> >>> constant as the index and not a parameter, the error does not occur >>> >>> (the code is not badly optimized) and there is a warning about the >>> >>> not-so-great access to the stack. >>> >>> >>> >>> On Wed, Dec 11, 2019 at 6:01 PM Aliaksei Syrel <[email protected]> >>> wrote: >>> >>> > >>> >>> > Hi Pablo, >>> >>> > >>> >>> > Wow! Thank you for the detective story :) >>> >>> > >>> >>> > Do I understand correctly that the original code causes undefined >>> behavior and therefore can be changed (or even removed) by the compiler? >>> >>> > (because it returns something that is referencing memory on the >>> stack) >>> >>> > >>> >>> > Please keep posting similar things in future! It is very educative >>> :) >>> >>> > >>> >>> > Cheers, >>> >>> > Alex >>> >>> > >>> >>> > >>> >>> > On Wed, 11 Dec 2019 at 17:35, [email protected] <[email protected]> >>> wrote: >>> >>> >> >>> >>> >> Hi, >>> >>> >> this mail is related to Pharo because it is knowledge I found >>> >>> >> debugging the build of the VM, but the rest is to document it and >>> >>> >> perhaps someone will found it interesting (also I couldn't find it >>> >>> >> easily using Google). Sorry for the long mail! >>> >>> >> >>> >>> >> The problem >>> >>> >> ========== >>> >>> >> >>> >>> >> The following code does not produce good code in 8.3 when using >>> optimizations: >>> >>> >> >>> >>> >> long __attribute__ ((noinline)) myFunc(long i, int index){ >>> >>> >> long v; >>> >>> >> long x = i >> 3; >>> >>> >> >>> >>> >> v = x; >>> >>> >> return ((int*)(&v))[index]; >>> >>> >> } >>> >>> >> >>> >>> >> #include <stdio.h> >>> >>> >> >>> >>> >> int main(){ >>> >>> >> >>> >>> >> long i; >>> >>> >> int x; >>> >>> >> >>> >>> >> scanf("%ld", &i); >>> >>> >> scanf("%d", &x); >>> >>> >> >>> >>> >> printf("%ld",myFunc(i,x)); >>> >>> >> } >>> >>> >> >>> >>> >> Basically, with -02, it generates the following code: >>> >>> >> >>> >>> >> myFunc: >>> >>> >> movslq %esi, %rsi >>> >>> >> movslq -8(%rsp,%rsi,4), %rax >>> >>> >> ret >>> >>> >> >>> >>> >> And with -01 it generates the following code: >>> >>> >> >>> >>> >> myFunc: >>> >>> >> sarq $3, %rdi >>> >>> >> movq %rdi, -8(%rsp) >>> >>> >> movslq %esi, %rsi >>> >>> >> movslq -8(%rsp,%rsi,4), %rax >>> >>> >> ret >>> >>> >> >>> >>> >> As you can see, the more optimized version is losing the bit >>> shift (or >>> >>> >> any other operation). >>> >>> >> To detect the problem it was not easy, basically, I have used the >>> >>> >> Pharo Tests to detect the failing function and then to understand >>> the >>> >>> >> generation of code in GCC, we need to debug its generation. >>> >>> >> >>> >>> >> The first snippet is generated using: >>> >>> >> >>> >>> >> gcc -O2 prb.c -S -o prb.s -Wall >>> >>> >> >>> >>> >> and the second using: >>> >>> >> >>> >>> >> gcc -O1 prb.c -S -o prb.s -Wall >>> >>> >> >>> >>> >> The GCC pipeline has different steps, I have centered my self in >>> the >>> >>> >> tree optimizations. >>> >>> >> GCC dumps each of the intermediate states of the compilation, >>> working >>> >>> >> with C like trees. For example to get the optimized version of the >>> >>> >> program, before generating the Assembler we have to do: >>> >>> >> >>> >>> >> gcc -O2 prb.c -S -o prb.s -Wall -fdump-tree-optimized=/dev/stdout >>> >>> >> >>> >>> >> This will generate: >>> >>> >> >>> >>> >> __attribute__((noinline)) >>> >>> >> myFunc (long int i, int index) >>> >>> >> { >>> >>> >> long int v; >>> >>> >> long unsigned int _1; >>> >>> >> long unsigned int _2; >>> >>> >> int * _3; >>> >>> >> int _4; >>> >>> >> long int _8; >>> >>> >> >>> >>> >> <bb 2> [local count: 1073741825]: >>> >>> >> _1 = (long unsigned int) index_7(D); >>> >>> >> _2 = _1 * 4; >>> >>> >> _3 = &v + _2; >>> >>> >> _4 = *_3; >>> >>> >> _8 = (long int) _4; >>> >>> >> v ={v} {CLOBBER}; >>> >>> >> return _8; >>> >>> >> >>> >>> >> } >>> >>> >> >>> >>> >> This is the optimized SSA (static single assign) version of the >>> >>> >> function (https://gcc.gnu.org/onlinedocs/gccint/SSA.html) >>> >>> >> As you can see in this version the code is already optimized, and >>> our >>> >>> >> operations not correctly generated. We expect to see something >>> like: >>> >>> >> >>> >>> >> __attribute__((noinline)) >>> >>> >> myFunc (long int i, int index) >>> >>> >> { >>> >>> >> long int x; >>> >>> >> long int v; >>> >>> >> long unsigned int _1; >>> >>> >> long unsigned int _2; >>> >>> >> int * _3; >>> >>> >> int _4; >>> >>> >> long int _10; >>> >>> >> >>> >>> >> <bb 2> [local count: 1073741825]: >>> >>> >> x_6 = i_5(D) >> 3; >>> >>> >> v = x_6; >>> >>> >> _1 = (long unsigned int) index_9(D); >>> >>> >> _2 = _1 * 4; >>> >>> >> _3 = &v + _2; >>> >>> >> _4 = *_3; >>> >>> >> _10 = (long int) _4; >>> >>> >> v ={v} {CLOBBER}; >>> >>> >> return _10; >>> >>> >> >>> >>> >> } >>> >>> >> >>> >>> >> In the first SSA version, we are lacking the shift operation: >>> >>> >> >>> >>> >> x_6 = i_5(D) >> 3; >>> >>> >> v = x_6; >>> >>> >> >>> >>> >> So, we need to see in which of the optimizations and >>> transformations >>> >>> >> our code is lost: >>> >>> >> To see all the steps we should execute: >>> >>> >> >>> >>> >> gcc -O2 prb.c -S -o prb.s -Wall -fdump-tree-all >>> >>> >> >>> >>> >> This will generate a lot, really a lot, of files in the same >>> directory. >>> >>> >> They have the name: prb.c.xxx.yyyy >>> >>> >> Where xxx is the number of the step, and yyyy is the name of the >>> step. >>> >>> >> Each of the files contains the result of applying the changes, so >>> what >>> >>> >> I have done is looking in binary search where my code was lost. >>> >>> >> >>> >>> >> I have found that the problem was in the first dead code >>> elimination >>> >>> >> step (prb.c.041t.cddce1) >>> >>> >> GCC does not like the crap code that we are writing, as we are >>> copying >>> >>> >> a stack variable and then accessing directly to the memory: >>> >>> >> >>> >>> >> v = x; >>> >>> >> return ((int*)(&v))[index]; >>> >>> >> >>> >>> >> So, basically, it was assuming that we were only using v and not >>> x, so >>> >>> >> all the code modifying x is described (this optimization is called >>> >>> >> "dead store code deletion"). Basically tries to remove all the >>> code >>> >>> >> that is affecting stack (temporary) variables that are never used. >>> >>> >> >>> >>> >> I am not sure if this is a bug in GCC, so I will report it. But I >>> have >>> >>> >> fixed the problem writing the code in a proper way. >>> >>> >> >>> >>> >> Sorry again for the long mail, I wanted to store the knowledge and >>> >>> >> propagate it before I eventually will flush it. I like these >>> things, >>> >>> >> but I am not debugging the GCC optimization pipeline all the days. >>> >>> >> >>> >>> >> -- >>> >>> >> Pablo Tesone. >>> >>> >> [email protected] >>> >>> >> >>> >>> >>> >>> >>> >>> -- >>> >>> Pablo Tesone. >>> >>> [email protected] >>> >>> >>> >>> >>> -- >>> Pablo Tesone. >>> [email protected] >>> >>> > > -- > _,,,^..^,,,_ > best, Eliot >
