On Sun, Sep 8, 2013 at 11:17 AM, Armin Rigo <ar...@tunes.org> wrote:
> Hi again,
>
> On Sun, Sep 8, 2013 at 9:42 AM, Armin Rigo <ar...@tunes.org> wrote:
>> We've been suitably impressed by the results on the new llvm backend
>> during the sprint (well, or suitably un-impressed by both gcc and
>> clang's failure to reconstruct the SSA meaning of the C code).
>
> I have investigated a bit more and it's quite unclear that this would
> be the source of the difference. It seems that the "-flto" option of
> gcc, enabling link-time optimization, actually gives very good
> improvements over the same compilation without this option --- some
> 11-14%, more so than, say, the typical 5% reported with CPython. If I
> had to guess, I'd say it is because of the particularly disorganized
> kind of C code produced by RPython.
>
> About the llvm backend, one detail hints that it might be the reason
> for the speed improvement: the fact that the current llvm backend
> produces most of the source code in a single file. This may be what
> gives llvm extra room for improvements. This is precisely the same
> room for improvement that "-flto" also gives gcc, considering that we
> generate many C files with never-"static" functions.
>
> I tried to compile a no-jit version of PyPy from the
> llvm-translation-backend branch, for comparison, but this fails right
> now with "NotImplementedError: v585190 = debug_offset()". It
> successfully compiles targetrpystonedalone (in -O2 mode), though. I
> get the following results (with the argument "100000000"):
>
>    plain gcc 4.7.3: 1.95 seconds
>    llvm 3.3: 1.75 seconds
>    gcc with -flto: 1.66 seconds
>
> If we get similar results on the whole PyPy, then I fear the llvm
> backend is going back to where it already went to several times: "not
> useful enough". We can simply add the -flto flag to the generated
> Makefiles. Manuel, do you feel like trying to compare? I'm modifying
> the Makefile manually as follows:
>
>    CFLAGS = ...... -flto -fno-fat-lto-objects
>    LDFLAGS = ..... -flto=8 -O3
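The manual Makefile change quoted above could be scripted roughly as follows. This is only a sketch: the function name add_lto_flags is hypothetical, and it assumes the generated Makefile assigns CFLAGS and LDFLAGS each on a single "NAME = value" line, which I have not checked against what the translator actually emits. The flag values are the ones from the quoted message.

```python
import re


def add_lto_flags(makefile_text, jobs=8):
    """Append the LTO flags from the quoted message to CFLAGS and LDFLAGS.

    Assumption: each variable is assigned on one line of the form
    "NAME = value". Lines that already mention -flto are left alone,
    so running the function twice changes nothing.
    """
    extra = {
        "CFLAGS": "-flto -fno-fat-lto-objects",
        "LDFLAGS": "-flto=%d -O3" % jobs,
    }
    out = []
    for line in makefile_text.splitlines():
        match = re.match(r"^(CFLAGS|LDFLAGS)\s*=", line)
        if match and "-flto" not in line:
            line = line.rstrip() + " " + extra[match.group(1)]
        out.append(line)
    return "\n".join(out) + "\n"
```

Reading the Makefile into a string, passing it through add_lto_flags and writing it back would make the comparison repeatable across builds; continuation lines ending in a backslash are not handled here.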
The kind of machine-generated code produced by PyPy is difficult for compilers to optimize (lots of seemingly unstructured gotos, state machines, unusual basic block heuristics) when presented in a high-level language like C. The distribution of the source code across a large number of source files also complicates the optimization process.

GCC and LLVM link-time optimization can overcome some of these problems by allowing the compiler to "see" more of the program and optimize across source files. Directly generating LLVM IR achieves a similar benefit. With some of the recent changes to GCC, one could also generate GCC IR directly. LLVM makes it very convenient to input the IR directly and take advantage of the optimization opportunities that such an input method allows, but the performance benefit is unlikely to be due to other differences in optimization pipelines and code generation capabilities.

In addition to the GCC -flto option, you should consider whether -fwhole-program is also appropriate (I believe that it is).

GCC has additional optimizations that can help with the style of code generated by programs like PyPy. PyPy does not generate code with computed gotos, but its aggressive use of gotos is different from normal user-written code and can probably benefit from non-default compiler optimization heuristics. There is no obvious recommendation, but experiments with enabling / disabling some forms of GCSE (-fgcse, -fgcse-lm, -fgcse-sm, -fgcse-las, -fgcse-after-reload) as well as some of the parameters (crossjumping, goto-duplication, inlining limits) might benefit PyPy.

One can achieve performance gains with either compiler through adjustments to the generated code and the compiler optimization heuristics.

Thanks, David

_______________________________________________
pypy-dev mailing list
pypy-dev@python.org
https://mail.python.org/mailman/listinfo/pypy-dev
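P.S. The GCSE experiments suggested in the reply above amount to a small search over flag combinations. A minimal sketch of how the candidate CFLAGS strings could be enumerated follows; the helper names flag_combinations and cflags_for are hypothetical, and the base flags are simply the ones from the quoted Makefile change. It relies only on GCC's usual convention that an -fNAME option is disabled with -fno-NAME. Timing each build is left out.

```python
import itertools

# GCSE-related options named in the reply, without the leading "-f".
GCSE_FLAGS = ["gcse", "gcse-lm", "gcse-sm", "gcse-las", "gcse-after-reload"]


def flag_combinations(names=GCSE_FLAGS):
    """Yield every on/off combination as a list of -f / -fno- options."""
    for states in itertools.product([True, False], repeat=len(names)):
        yield ["-f" + name if on else "-fno-" + name
               for name, on in zip(names, states)]


def cflags_for(combo, base="-flto -fno-fat-lto-objects"):
    """Join the base flags (an assumption) with one flag combination."""
    return base + " " + " ".join(combo)
```

Five flags give 32 combinations, so building and timing a small target such as the one benchmarked in the quoted message for each of them seems more practical than rebuilding the whole of PyPy.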