Just tested the latest revision of the hg repository. Works nicely (compiling a project with 'debug'):
-> % make UR=$HOME/code/urweb-hg/build/bin/urweb P=ref
/Users/a/code/urweb-hg/build/bin/urweb ref
clang -Wimplicit -Werror -O3 -fno-inline -I /Users/a/code/urweb-hg/build/include/urweb -c /tmp/webapp.c -o /tmp/webapp.o -g
clang -Werror -O3 -lm -pthread -L/Users/a/code/urweb-hg/build/lib/urweb/.. -lurweb -lurweb_http -lssl -lcrypto -lz /tmp/webapp.o -o /Users/a/code/ur/ref.exe -lsqlite3 -g -L/usr/local/lib
clang: warning: argument unused during compilation: '-pthread'
-> %

I guess that for clang on OS X, '-pthread' is redundant, but I don't think it's much to worry about.

So, for the whole 'link time' thing, I decided to put my money where my mouth/brain is. I took the easy way: I booted up a Debian VM and installed GCC 4.6 (with LTO support) and GNU gold from the sid/unstable tree. GCC 4.6 has relatively stable link-time optimization support, stable enough to build working programs like the Linux kernel or Firefox. So I hacked the build system to use LTO on my machine. This was pretty easy, but it required a few changes at build time and at configuration time.

I tested with a statically linked copy of the Ur/Web runtime system and the built-in HTTP server. I used the 'ref' demo from the main site since it seemed appropriate: the compiler will remove all the polymorphism, but there should still be a good bit of code in the resulting executable. Furthermore, the result doesn't necessarily use, say, all of the Ur basis functions that are written in C, so those could be removed by dead code elimination at link time. The SQL tables were backed by SQLite.

The results were quite impressive, just from a size perspective:

a@pylon1:~/ur$ du -h ref.lto.exe
72K     ref.lto.exe
a@pylon1:~/ur$

Versus:

a@pylon1:~/ur$ du -h ref.exe
308K    ref.exe
a@pylon1:~/ur$

So the compiler was able to eliminate a *lot* of dead code! That's pretty awesome. I haven't done any thorough speed tests with an HTTP stress tester yet, although I probably should.
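For a quick sense of scale, the du figures can be turned into a percentage. This is a minimal sketch; 'shrink_pct' and 'file_bytes' are hypothetical helper names, not part of Ur/Web. Note that du -h reports rounded disk usage rather than exact file size, so comparing byte counts from wc -c is the more precise measurement.

```shell
#!/bin/sh
# Hypothetical helpers (not part of Ur/Web) for comparing executable sizes.

# Exact size of a file in bytes; tr strips the padding that BSD wc adds.
file_bytes() {
    wc -c < "$1" | tr -d ' '
}

# Percentage by which size $2 is smaller than size $1 (same units for both).
shrink_pct() {
    awk -v before="$1" -v after="$2" \
        'BEGIN { printf "%.1f\n", (before - after) * 100 / before }'
}

# The du -h figures quoted above, in K:
shrink_pct 308 72    # ref.exe -> ref.lto.exe, prints 76.6
echo $((72 - 64))    # ref.lto.exe vs. ref.dyn.exe, prints 8
```

To compare the actual files instead of the rounded du output: shrink_pct "$(file_bytes ref.exe)" "$(file_bytes ref.lto.exe)".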
Is there a good benchmark for this kind of HTTP stress testing?

To replicate these results, a few changes are needed:

1) In src/c/Makefile.am, add the compiler flags '-flto -fwhole-program'. This tells GCC to emit LTO information into the object files and to optimize every individual object file as if it were a 'whole program'.

2) Regenerate the makefiles.

3) When you configure the Ur/Web compiler, do this:

$ GCCARGS="-flto -fwhole-program -fuse-linker-plugin" CFLAGS="-O3" ./configure ...

The -flto and -fwhole-program flags are thus used on all executables compiled by the Ur/Web compiler. The result of all this is that all of the C files for the runtime, and all executables compiled by the Ur/Web compiler, are compiled with LTO information.

Note the use of -fuse-linker-plugin: this flag is crucial, and it requires the gold linker (aptitude install binutils-gold). It allows the linker to do LTO on object files inside archives. Since the RTS is compiled into archives, this is essential for any real LTO to work, because a good bit of code resides there.

Also note the need for CFLAGS="-O3": by default, autoconf compiles at -O2, while the Ur/Web compiler compiles at -O3. Unless I'm mistaken, GCC requires that object files be compiled with the same optimization settings for LTO to work, so this evens things out.

A final note worth mentioning: I removed the "-fno-inline" flag that the Ur/Web compiler passes to GCC. Is there a reason for this flag, like a bug? The resulting applications I've tried seem to work, but I removed it because I was trying to expose as many optimization opportunities as possible. I have not tested the LTO changes with the flag in effect, so I don't know whether it makes any crucial difference (but this is worth checking).
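Steps 2 and 3 can be scripted. This is a minimal sketch under stated assumptions, not a tested procedure: it assumes the Makefile.am edit from step 1 has already been made by hand, that 'autoreconf -i' is how the makefiles get regenerated, and that 'make install' follows the build. 'run', 'build_urweb_lto' and 'has_lto_sections' are hypothetical helper names. The last one looks for the '.gnu.lto_' sections GCC writes into LTO-enabled objects, as a quick check that the runtime archives really carry LTO information.

```shell
#!/bin/sh
# Sketch of the LTO build sequence, assuming:
#  - GCC 4.6 and the gold linker (binutils-gold) are installed;
#  - src/c/Makefile.am was already edited by hand to add
#    '-flto -fwhole-program' to the runtime's compiler flags;
#  - 'autoreconf -i' regenerates the makefiles (an assumption about the
#    checkout, not a documented Ur/Web step).
# With DRY_RUN set to a non-empty value, commands are printed, not run.

run() {
    echo "+ $*"
    if [ -z "$DRY_RUN" ]; then
        "$@" || exit 1
    fi
}

# Extra arguments (e.g. --prefix=...) are passed through to ./configure.
build_urweb_lto() {
    # Step 2: regenerate the makefiles from the edited Makefile.am.
    run autoreconf -i
    # Step 3: urweb hands GCCARGS to GCC for every application it builds;
    # CFLAGS=-O3 compiles the runtime at the same level urweb uses.
    run env GCCARGS="-flto -fwhole-program -fuse-linker-plugin" \
        CFLAGS="-O3" ./configure "$@"
    run make
    run make install
}

# Succeeds if objdump lists any .gnu.lto_* section in the given object
# file or archive, i.e. GCC emitted LTO bytecode for it.
has_lto_sections() {
    objdump -h "$1" 2>/dev/null | grep -q '\.gnu\.lto_'
}
```

Running DRY_RUN=1 build_urweb_lto from the top of the checkout previews the four commands; after a real build, has_lto_sections /path/to/liburweb.a should succeed if the Makefile.am change took effect.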
Anyway, with LTO, the final command lines that the compiler runs look like this:

gcc -flto=4 -fwhole-program -fuse-linker-plugin -Wimplicit -Werror -O3 -I /home/a/urweb/build/include/urweb -c /tmp/webapp.c -o /tmp/webapp.o
gcc -Werror -O3 -lm -pthread -flto=4 -fwhole-program -fuse-linker-plugin /home/a/urweb/build/lib/urweb/../liburweb_http.a /home/a/urweb/build/lib/urweb/../liburweb.a -L/usr/lib -lssl -lcrypto /tmp/webapp.o -o /home/a/ur/ref.exe -lsqlite3

And without LTO, they are:

gcc -Wimplicit -Werror -O3 -fno-inline -I /home/a/urweb/build/include/urweb -c /tmp/webapp.c -o /tmp/webapp.o -g
gcc -Werror -O3 -lm -pthread /home/a/urweb/build/lib/urweb/../liburweb_http.a /home/a/urweb/build/lib/urweb/../liburweb.a -L/usr/lib -lssl -lcrypto /tmp/webapp.o -o /home/a/ur/ref.exe -lsqlite3 -g

The reduction in executable size is, IMO, quite good. I imagine a lot of it comes from dead code elimination rather than raw optimization, but I could be wrong. Either way, these changes bring the *static* executable to within ~8 KB of the *dynamically* linked version, which I think is insanely good from a dead code perspective:

a@pylon1:~/ur$ du -h ref.dyn.exe
64K     ref.dyn.exe
a@pylon1:~/ur$

Anyway, this email is getting a bit long, so I'll stop here, but hopefully these results seem interesting to you. Like I said, I don't know whether it *really* matters: from what I've seen, I don't think Ur/Web's speed is going to be the limiting factor in web apps any time soon. But hey, more speed and smaller executables are always good.

On Fri, Jun 17, 2011 at 3:34 PM, Adam Chlipala <[email protected]> wrote:
> austin seipp wrote:
>>
>> I was also thinking about making it possible to compile the ur/web
>> runtime as an LLVM bitcode file, and do similar for programs you
>> compile with the compiler. Then the final link step can merge the
>> files and effectively do whole program optimization, before emitting a
>> final executable.
>> Alas, I don't know autoconf very well, nor do I know
>> automake (I'd need to make automake generate LLVM bitcode archives or
>> simply bitcode files instead of object file archives, which would
>> require some infrastructure I think.)
>>
>
> I'd be happy to add support for this, if you can tell me all of the
> appropriate changes that lie outside of Ur/Web's SML/C source code. :)
>
> A good first step would be to demonstrate the sequence of commands used to
> build the final application from all of the C sources that go into it. To
> get the C source for an individual Ur/Web application, you can run 'urweb'
> with the '-debug' flag, in which case the C source is left in /tmp/webapp.c.
> The C-compilation command lines Ur/Web is presently using will also be
> printed in that mode.
>
> _______________________________________________
> Ur mailing list
> [email protected]
> http://www.impredicative.com/cgi-bin/mailman/listinfo/ur

--
Regards,
Austin
