Michael Knight wrote:

> I'm starting an honours year at uni this year. My supervisor and I are
> thinking of doing a paper on trying to determine if memory-mapping
> source files will speed up the lexical analysis phase of various
> compilers (instead of whatever buffering method it currently employs).

The problem you've got is that lexical analysis makes heavy use of the C library's ungetc(), so the gains from memory-mapped I/O are likely to be lost in implementing an alternative ungetc() (i.e., you'll find yourself reimplementing part of the C stdio library).
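
To make that concrete, here is a minimal sketch of what replacing getc()/ungetc() over a memory-mapped file involves. It is in Python only for brevity, and the names MappedReader and read_identifier are made up for illustration; no real compiler is structured this way. Pushback becomes a cursor decrement, but you still end up writing and maintaining the reader layer yourself.

```python
import mmap
import string

# Bytes allowed in an identifier (ASCII letters, digits, underscore).
IDENT_BYTES = frozenset((string.ascii_letters + string.digits + "_").encode())


class MappedReader:
    """Hypothetical byte reader over a memory-mapped file with pushback."""

    def __init__(self, path):
        self._file = open(path, "rb")
        # Length 0 maps the whole file; mmap raises ValueError on an empty file.
        self._map = mmap.mmap(self._file.fileno(), 0, access=mmap.ACCESS_READ)
        self._pos = 0

    def getc(self):
        """Return the next byte as an int, or -1 at end of input."""
        if self._pos >= len(self._map):
            return -1
        byte = self._map[self._pos]
        self._pos += 1
        return byte

    def ungetc(self):
        """Push back the last byte read by stepping the cursor back."""
        if self._pos > 0:
            self._pos -= 1

    def close(self):
        self._map.close()
        self._file.close()


def read_identifier(reader):
    """Read a run of identifier bytes, pushing back the first byte past it."""
    chars = []
    while True:
        c = reader.getc()
        if c == -1:
            break
        if c not in IDENT_BYTES:
            reader.ungetc()  # the lexer has looked one byte too far
            break
        chars.append(chr(c))
    return "".join(chars)
```

How much of the mmap gain survives this extra bookkeeping is exactly the thing that would need measuring.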

Note that most of the I/O hit doesn't come from the initial
lexical scan, but from the intermediate files.  I'd expect C
to have a lot more I/O attributable to lexing than most
languages (because of its use of .h files, and because it has
two lexical passes -- one by the preprocessor and one by the
compiler).

It's an area worth persisting with though, as almost every
effort to drop the cost or amount of I/O when compiling a
project has been a big win (see distcc, ccache, gcc's -pipe
option, the argument for using include in Makefiles as
opposed to hierarchical Makefiles, precompiled .h files).

Operating systems have a hard time with compiling because
of the bimodal nature of a compile -- it is repeatedly I/O-bound
and then CPU-bound. So tuning based on recent activity (i.e.,
most automatic tuning in OSs) fails to optimise.


Personally, since this is just an honours project and you don't want something too open-ended, why not seek to accurately characterise the OS demands of a typical large project build (e.g., Samba or Apache)?

There's some nice work going on characterising the demands
of the Linux boot sequence, and you could leverage those tools
and techniques.

Break the compiling process into stages.  The gcc tool chain is
pretty modular, so there's a reasonable but not complete mapping of
stage to compiler program, and the tools mentioned above allow you
to mark a stage transition with a small mod to the program.  Now
track the I/O and CPU attributable to each stage for various options
(e.g., hierarchical v included Makefiles, -O optimisation levels,
static v dynamic linking).  Similarly for OS tuning.
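
As a rough starting point for the per-stage accounting, the sketch below runs one stage as a child process and reports the CPU time and block I/O charged to it. It assumes a Unix-like system (Python's resource module) and treats each stage as a separate command; measure_stage is a made-up helper name, not part of any existing tool.

```python
import resource
import subprocess


def measure_stage(argv):
    """Run one build stage and return the CPU time and block I/O it used."""
    # RUSAGE_CHILDREN accumulates usage of child processes that have been
    # waited for, so the difference across the run is this stage's share.
    before = resource.getrusage(resource.RUSAGE_CHILDREN)
    completed = subprocess.run(argv, check=False)
    after = resource.getrusage(resource.RUSAGE_CHILDREN)
    return {
        "returncode": completed.returncode,
        "user_cpu_s": after.ru_utime - before.ru_utime,
        "system_cpu_s": after.ru_stime - before.ru_stime,
        # Block counts reflect real device I/O; reads served from the page
        # cache do not show up here.
        "blocks_in": after.ru_inblock - before.ru_inblock,
        "blocks_out": after.ru_oublock - before.ru_oublock,
    }
```

For example, measure_stage(["gcc", "-E", "foo.c", "-o", "foo.i"]) would cover just the preprocessing pass (foo.c being a placeholder file name), and the same call with -S or -c would cover later stages. Because cached reads are invisible to these counters, results on a warm and a cold page cache should be compared separately.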

Enough raw material there for a very fine paper and thesis.

--
 Glen Turner         Tel: (08) 8303 3936 or +61 8 8303 3936
 Australia's Academic & Research Network  www.aarnet.edu.au
--
SLUG - Sydney Linux User's Group Mailing List - http://slug.org.au/
Subscription info and FAQs: http://slug.org.au/faq/mailinglists.html
