Yes, But we have to replace natural crappy code (split a long in 2 ints) that was once legal, by an even more crappy code (memcpy), so all in all, it's a crappy art.
Le mer. 11 déc. 2019 à 19:30, [email protected] <[email protected]> a écrit : > Hi Nicolas, > thanks for the comment, you are right the problem is the bad > original code. But my opinion is that it just is not reporting the > situation correctly, generating a warning or non-optimizing the code > looks more like a expected behavior. Because as I have said, using a > constant as index in the last statement generates a meaningful warning > and the non-optimizated version of the function. > > And again as you said, the only thing to learn about all this is that > we should not write crappy code. > > On Wed, Dec 11, 2019 at 7:11 PM Nicolas Cellier > <[email protected]> wrote: > > > > Of course, when I say "your" code, it's the code you have shown, and > probably "our" (VMMaker) code ;) > > > > Le mer. 11 déc. 2019 à 19:05, Nicolas Cellier < > [email protected]> a écrit : > >> > >> Hi Pablo (again), > >> no, not a bug. > >> > >> The problem is in the source code. The compiler has the right to > presume that your code is exempt of UB, because you cannot depend on UB > (obviously). > >> So it can eliminate all code which corresponds to UB. > >> > >> The compiler has the right to assume that a pointer to an int cannot > point to a long (UB). > >> So modifying a long cannot have any sort of impact on the content of > the int pointer. > >> So the compiler can decouple both path return int content and assign > long. > >> But assigning the long has no effect, so the code can be suppressed > altogether. > >> > >> Le mer. 11 déc. 2019 à 18:54, [email protected] <[email protected]> a > écrit : > >>> > >>> Hi Aliaksei, > >>> to me it looks like a bug of GCC optimization. Basically, it is > >>> assuming that the x variable is used but never read or its value is > >>> never used. Also it assumes the same of the i variable, as we are only > >>> accessing indirectly to the memory where it locates (the code is even > >>> assuming that the variable exists, but it can be optimize out as in > >>> this scenario). Even though, the original C code is valid C code, we > >>> are not helping the compiler writing code like that. So I have > >>> rewritten the code in a way that does not use indirect memory access > >>> to the stack space. > >>> > >>> One thing more that makes me think is a bug, if you use an int > >>> constant as the index and not a parameter, the error does not occur > >>> (the code is not badly optimized) and there is a warning about the > >>> not-so-great access to the stack. > >>> > >>> On Wed, Dec 11, 2019 at 6:01 PM Aliaksei Syrel <[email protected]> > wrote: > >>> > > >>> > Hi Pablo, > >>> > > >>> > Wow! Thank you for the detective story :) > >>> > > >>> > Do I understand correctly that the original code causes undefined > behavior and therefore can be changed (or even removed) by the compiler? > >>> > (because it returns something that is referencing memory on the > stack) > >>> > > >>> > Please keep posting similar things in future! It is very educative :) > >>> > > >>> > Cheers, > >>> > Alex > >>> > > >>> > > >>> > On Wed, 11 Dec 2019 at 17:35, [email protected] <[email protected]> > wrote: > >>> >> > >>> >> Hi, > >>> >> this mail is related to Pharo because it is knowledge I found > >>> >> debugging the build of the VM, but the rest is to document it and > >>> >> perhaps someone will found it interesting (also I couldn't find it > >>> >> easily using Google). Sorry for the long mail! > >>> >> > >>> >> The problem > >>> >> ========== > >>> >> > >>> >> The following code does not produce good code in 8.3 when using > optimizations: > >>> >> > >>> >> long __attribute__ ((noinline)) myFunc(long i, int index){ > >>> >> long v; > >>> >> long x = i >> 3; > >>> >> > >>> >> v = x; > >>> >> return ((int*)(&v))[index]; > >>> >> } > >>> >> > >>> >> #include <stdio.h> > >>> >> > >>> >> int main(){ > >>> >> > >>> >> long i; > >>> >> int x; > >>> >> > >>> >> scanf("%ld", &i); > >>> >> scanf("%d", &x); > >>> >> > >>> >> printf("%ld",myFunc(i,x)); > >>> >> } > >>> >> > >>> >> Basically, with -02, it generates the following code: > >>> >> > >>> >> myFunc: > >>> >> movslq %esi, %rsi > >>> >> movslq -8(%rsp,%rsi,4), %rax > >>> >> ret > >>> >> > >>> >> And with -01 it generates the following code: > >>> >> > >>> >> myFunc: > >>> >> sarq $3, %rdi > >>> >> movq %rdi, -8(%rsp) > >>> >> movslq %esi, %rsi > >>> >> movslq -8(%rsp,%rsi,4), %rax > >>> >> ret > >>> >> > >>> >> As you can see, the more optimized version is losing the bit shift > (or > >>> >> any other operation). > >>> >> To detect the problem it was not easy, basically, I have used the > >>> >> Pharo Tests to detect the failing function and then to understand > the > >>> >> generation of code in GCC, we need to debug its generation. > >>> >> > >>> >> The first snippet is generated using: > >>> >> > >>> >> gcc -O2 prb.c -S -o prb.s -Wall > >>> >> > >>> >> and the second using: > >>> >> > >>> >> gcc -O1 prb.c -S -o prb.s -Wall > >>> >> > >>> >> The GCC pipeline has different steps, I have centered my self in the > >>> >> tree optimizations. > >>> >> GCC dumps each of the intermediate states of the compilation, > working > >>> >> with C like trees. For example to get the optimized version of the > >>> >> program, before generating the Assembler we have to do: > >>> >> > >>> >> gcc -O2 prb.c -S -o prb.s -Wall -fdump-tree-optimized=/dev/stdout > >>> >> > >>> >> This will generate: > >>> >> > >>> >> __attribute__((noinline)) > >>> >> myFunc (long int i, int index) > >>> >> { > >>> >> long int v; > >>> >> long unsigned int _1; > >>> >> long unsigned int _2; > >>> >> int * _3; > >>> >> int _4; > >>> >> long int _8; > >>> >> > >>> >> <bb 2> [local count: 1073741825]: > >>> >> _1 = (long unsigned int) index_7(D); > >>> >> _2 = _1 * 4; > >>> >> _3 = &v + _2; > >>> >> _4 = *_3; > >>> >> _8 = (long int) _4; > >>> >> v ={v} {CLOBBER}; > >>> >> return _8; > >>> >> > >>> >> } > >>> >> > >>> >> This is the optimized SSA (static single assign) version of the > >>> >> function (https://gcc.gnu.org/onlinedocs/gccint/SSA.html) > >>> >> As you can see in this version the code is already optimized, and > our > >>> >> operations not correctly generated. We expect to see something like: > >>> >> > >>> >> __attribute__((noinline)) > >>> >> myFunc (long int i, int index) > >>> >> { > >>> >> long int x; > >>> >> long int v; > >>> >> long unsigned int _1; > >>> >> long unsigned int _2; > >>> >> int * _3; > >>> >> int _4; > >>> >> long int _10; > >>> >> > >>> >> <bb 2> [local count: 1073741825]: > >>> >> x_6 = i_5(D) >> 3; > >>> >> v = x_6; > >>> >> _1 = (long unsigned int) index_9(D); > >>> >> _2 = _1 * 4; > >>> >> _3 = &v + _2; > >>> >> _4 = *_3; > >>> >> _10 = (long int) _4; > >>> >> v ={v} {CLOBBER}; > >>> >> return _10; > >>> >> > >>> >> } > >>> >> > >>> >> In the first SSA version, we are lacking the shift operation: > >>> >> > >>> >> x_6 = i_5(D) >> 3; > >>> >> v = x_6; > >>> >> > >>> >> So, we need to see in which of the optimizations and transformations > >>> >> our code is lost: > >>> >> To see all the steps we should execute: > >>> >> > >>> >> gcc -O2 prb.c -S -o prb.s -Wall -fdump-tree-all > >>> >> > >>> >> This will generate a lot, really a lot, of files in the same > directory. > >>> >> They have the name: prb.c.xxx.yyyy > >>> >> Where xxx is the number of the step, and yyyy is the name of the > step. > >>> >> Each of the files contains the result of applying the changes, so > what > >>> >> I have done is looking in binary search where my code was lost. > >>> >> > >>> >> I have found that the problem was in the first dead code elimination > >>> >> step (prb.c.041t.cddce1) > >>> >> GCC does not like the crap code that we are writing, as we are > copying > >>> >> a stack variable and then accessing directly to the memory: > >>> >> > >>> >> v = x; > >>> >> return ((int*)(&v))[index]; > >>> >> > >>> >> So, basically, it was assuming that we were only using v and not x, > so > >>> >> all the code modifying x is described (this optimization is called > >>> >> "dead store code deletion"). Basically tries to remove all the code > >>> >> that is affecting stack (temporary) variables that are never used. > >>> >> > >>> >> I am not sure if this is a bug in GCC, so I will report it. But I > have > >>> >> fixed the problem writing the code in a proper way. > >>> >> > >>> >> Sorry again for the long mail, I wanted to store the knowledge and > >>> >> propagate it before I eventually will flush it. I like these things, > >>> >> but I am not debugging the GCC optimization pipeline all the days. > >>> >> > >>> >> -- > >>> >> Pablo Tesone. > >>> >> [email protected] > >>> >> > >>> > >>> > >>> -- > >>> Pablo Tesone. > >>> [email protected] > >>> > > > -- > Pablo Tesone. > [email protected] > >
