Dear QEMU developers -- I have had some email conversation with a few active developers, and with their encouragement, want to open it up for the whole list to comment.
For several years my research group at UMass has been developing generic code-generator generator (CGG) technology. Historic CGGs have always been tied to a particular code-generation framework, that is, to a particular intermediate representation (IR) and compiler. Our tool, called GIST (for Generator of Instruction Selectors Tool), is designed to work from any reasonable IR and to connect to any reasonable framework. More technical details below, but what we are hoping for is to be able to say that if we make this industrial-strength with some funding from the National Science Foundation, the QEMU community will be interested in using it. No commitment -- just that you think it *might* be a good idea if we can make it go. We would use QEMU as one of our "demo" environments. Ok, more details. We have an architecture description language called CISL (CoGenT Instruction Set Language; CoGenT is our overall project's name). It is somewhat like C or Java in appearance. You define the various memories and registers, and the instructions. To generate an instruction selector from input ISA A (generally a compiler IR, but not necessarily) to output ISA B (generally, but not always, a hardware architecture), you start with descriptions of A and B in CISL -- some of which may already be around. You also write what we call a *mapping* from A to B, which simply indicates where on B each memory/register of A should go. The tool then finds instruction selector patterns, at least one for each instruction of the A machine. For any given retargetable *framework* (compiler, interpreter, emulator), we write one *adapter*, that knows how to take GIST patterns in their internal form and write them out in the way that the framework needs them. Here's an example. Suppose we are going from A = QEMU IR to B = MIPS, that is, the same as the TCG back end for an emulator running on the MIPS processor. We have written a CISL description for the QEMU IR (yes, already), and suppose we have one for the MIPS, sufficient for code generation anyway. [Side note: Compilers do not generally use every instruction of their target, e.g., not the privileged mode ones, etc. Also, in the presence of register allocation, they generally target a slightly virtualized machine -- one with a huge number of registers, which register allocation then resolves to real registers and occasional spilled locations.] The mapping would talk about how to find QEMU memory on the MIPS (perhaps a dedicated base register), etc., and would also capture the conventions for calling helper routines, and so on. The adapter for QEMU TCG back ends would generate something like a C switch statement with one case for each QEMU IR instruction. Each case might have some additional case analysis. This is because (as you see in QEMU), a given IR instruction can have special cases depending on values of constants, whether something is in a register, etc. GIST will have found different patterns for each of these, and with each one there would be a *constraint*, indicating when it applies. For example, patterns for adding a constant value on the MIPS would likely have a special case for constants that fit in 16 bits, since then you can use one immediate instruction. Likewise, the constant 0 is a special case since it can just be a move. In addition to constraints, patterns have costs, which one can develop for any given target, but would typically be based on number of instructions, number of instruction bytes, number of memory references, etc. Thus the case analysis for a given instruction would check for the lowest cost patterns first, and would conclude with the most general pattern (but which may be the most expensive). The adapter would also need to generate the information needed by the QEMU TCG register allocator. Now, here are some things of additional interest: - While QEMU IR -> emulation host code-generation is maybe the most obvious case, we can also handle the "front end" emulation target -> QEMU IR generation. This probably requires a slightly different description of machine A than when A is the emulation host -- after all, we must handle *all* instructions, including privileged ones, etc. But it is possible to make the descriptions modular in such a way that instructions used in both cases are not repeated. - I noticed that someone is looking at interpretation rather than compilation. We have seen that we can generate functional simulators (very close to emulators) from CISL descriptions. Thus, it would be possible to generate a simulator for any of the machines of interest. What QEMU provides is a framework with all the memory and device modeling, etc. - An approach like this might make it easier to maintain a range of different models of the same ISA. It might also facilitate moving towards multicore emulation, maybe even heterogeneous multicore. It would also make it easier to change around how the simulated memory is organized and accessed, if that would be helpful. - It would make it particularly simple to build an emulator for a new or extended machine. Of course you still need a compiler for it, but we can use the same description to generate a C compiler, etc. This would be a several year long project, with real support ($$) for three or more years. The goal is for GIST to have its own self-sustaining open-source community after that. We are in conversation with some other software communities of interest concerning whether they would also be in favor of the project. These include the Jikes RVM Java Virtual Machine project (both the optimizing and the non-optimizing compilers), another compiler framework, and a simulator framework. I look forward to your thoughts, questions, and reactions. Regards -- Eliot Moss ============================================================================== J. Eliot B. Moss, Professor http://www.cs.umass.edu/~moss www Director, Arch. and Lang. Impl. Lab. +1-413-545-4206 voice Department of Computer Science +1-413-695-4226 cell 140 Governor's Drive, Room 372 +1-413-545-1249 fax University of Massachusetts at Amherst m...@cs.umass.edu email Amherst, MA 01003-9264 USA +1-413-545-2746 Laurie Downey sec'y ==============================================================================