Michael Knight wrote:
I'm starting an honours year at uni this year. My supervisor and I are thinking of doing a paper on trying to determine if memory-mapping source files will speed up the lexical analysis phase of various compilers (instead of whatever buffering method it currently employs).
The problem you've got is that lexical analysis makes heavy use of the C library's ungetc(), so any gains from memory-mapped I/O are likely to be lost implementing an alternative ungetc() (ie, you'll find yourself reimplementing the C stdio library).
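To make the trade-off concrete, here is a rough sketch of what getc()/ungetc()-style access over a memory-mapped file might look like. It is written in Python rather than C purely for brevity, and MappedReader, read_identifier and demo are hypothetical names invented for the illustration, not anything from a real compiler. It says nothing about whether this beats stdio buffering; that is exactly what would need measuring.

```python
import mmap
import os
import tempfile


class MappedReader:
    """Byte reader over a memory-mapped file with ungetc-style pushback."""

    def __init__(self, path):
        self._file = open(path, "rb")
        # Length 0 maps the whole file; mmap raises ValueError on an empty file.
        self._map = mmap.mmap(self._file.fileno(), 0, access=mmap.ACCESS_READ)
        self._pos = 0

    def getc(self):
        """Return the next byte as an int, or -1 at end of file."""
        if self._pos >= len(self._map):
            return -1
        byte = self._map[self._pos]
        self._pos += 1
        return byte

    def ungetc(self):
        """Push back the last byte read by stepping the cursor back."""
        if self._pos > 0:
            self._pos -= 1

    def close(self):
        self._map.close()
        self._file.close()


def read_identifier(reader):
    """Read ASCII [A-Za-z0-9_]+ and push back the first byte that does not fit."""
    chars = []
    while True:
        c = reader.getc()
        if c == -1:
            break
        ch = chr(c)
        if ch.isascii() and (ch.isalnum() or ch == "_"):
            chars.append(ch)
        else:
            reader.ungetc()  # the lookahead byte belongs to the next token
            break
    return "".join(chars)


def demo():
    """Write a small made-up source fragment to a temp file and tokenise it."""
    with tempfile.NamedTemporaryFile(delete=False) as tmp:
        tmp.write(b"foo+bar")
        path = tmp.name
    reader = MappedReader(path)
    try:
        tokens = []
        while True:
            word = read_identifier(reader)
            if word:
                tokens.append(word)
                continue
            c = reader.getc()
            if c == -1:
                break
            tokens.append(chr(c))  # any other byte is a one-character token
        return tokens
    finally:
        reader.close()
        os.unlink(path)
```

Calling demo() splits the fragment into the two identifiers and the operator between them. The pushback here is just a cursor decrement, but a real lexer would also need end-of-file handling, line tracking and so on, which is where the reimplementation cost comes in.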
Note that most of the I/O hit doesn't come from the initial lexical scan, but from the intermediate files. I'd expect C to have a lot more I/O attributable to lexing than most languages (because of its use of .h files, and because it has two lexical passes: one by the preprocessor and one by the compiler).
It's an area worth persisting with though, as almost every effort to drop the cost or amount of I/O when compiling a project has been a big win (see distcc, ccache, the gcc -pipe option, the argument for using include in Makefiles as opposed to hierarchical Makefiles, precompiled .h files).
Operating systems have a hard time with compiling because of the bimodal nature of a compile: it is repeatedly I/O-bound and then CPU-bound. So tuning based on recent activity (ie, most automatic tuning in OSs) fails to optimise.
Personally, since this is just an honours project and you don't want something too open-ended, why not seek to accurately characterise the OS demands of a typical large project build (eg, Samba, Apache, etc)?
There's some nice work going on characterising the demands of the Linux boot sequence, and you could leverage those tools and techniques.
Break the compiling process into stages. The gcc tool chain is pretty modular, so there's a reasonable but not complete mapping of stage to compiler program, and the boot-characterisation tools mentioned above allow you to mark a stage transition with a small mod to the program. Now track the I/O and CPU attributable to each stage for various options (eg, hierarchical v included Makefiles, -O optimisation levels, static v dynamic linking). Do the same for OS tuning options.
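As a rough illustration of per-stage accounting, the sketch below runs one stage as a child process and reports the CPU time and block I/O the kernel charged to it. This is a much simpler technique than the boot-characterisation tools, and measure_stage is a hypothetical helper name. It assumes a Unix system where Python's resource module is available. The block counts only reflect I/O that actually reached a device, so reads satisfied from the page cache can show up as zero.

```python
import resource
import subprocess
import sys


def measure_stage(name, argv):
    """Run one build stage and return the CPU time and block I/O it used."""
    # RUSAGE_CHILDREN accumulates usage of children that have been waited for,
    # so the difference across subprocess.run() is what this stage cost.
    before = resource.getrusage(resource.RUSAGE_CHILDREN)
    subprocess.run(argv, check=True)
    after = resource.getrusage(resource.RUSAGE_CHILDREN)
    return {
        "stage": name,
        "user_s": after.ru_utime - before.ru_utime,
        "sys_s": after.ru_stime - before.ru_stime,
        "blocks_in": after.ru_inblock - before.ru_inblock,
        "blocks_out": after.ru_oublock - before.ru_oublock,
    }


if __name__ == "__main__":
    # Placeholder stage that does nothing; swap in real compiler commands.
    print(measure_stage("noop", [sys.executable, "-c", "pass"]))
```

For a real run you might call it once per stage, for example with ["gcc", "-E", "foo.c", "-o", "foo.i"] for preprocessing and ["gcc", "-S", "foo.i"] for compilation proper, where foo.c stands in for whatever source file you are studying, and repeat across the Makefile, optimisation and linking options.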
Enough raw material there for a very fine paper and thesis.
-- Glen Turner Tel: (08) 8303 3936 or +61 8 8303 3936 Australia's Academic & Research Network www.aarnet.edu.au
