On Mon, 18 Oct 2004 14:17:59 -0500 (CDT), Michel Pelletier
<[EMAIL PROTECTED]> wrote:

> Okay, note that the code I mentioned (the separation of core from
> core words) is not checked in right now, but the version in CVS does
> do NCG.
Noted.

> Using the direct threading model, this does 2000 global lookups and
> subroutine invokes, which in turn do the actual "work" of 1000
> multiplications and the associated stack traffic.  The lookups and
> invokes are pure inner-loop overhead.
>
> Using NCG this does 1000 multiplications and the associated stack
> traffic (which can be optimized out for the most part) with no
> lookups or invokes.
>
> The overhead of direct threading vs. NCG does not need to be
> benchmarked, it can be proven by argument: both methods execute the
> same code the same way, but the NCG method does 2000 fewer global
> lookups and invokes.

Indeed.  Pardon my ignorance; I hadn't thought things all the way
through.

> The "extra" compiler overhead is trivial, and it only applies to
> compile-time; generally when a program is started.  At run-time
> (when all those lookups and invokes are happening in the direct
> thread case) there is no additional compilation overhead because a
> word is compiled only once.

This still doesn't seem right.  The compilation from Forth to PIR
only happens once, yes.  But each time the defined word is used, its
injected PIR code must be compiled to bytecode again.  You said
earlier that:

> in direct threading this would result in the execution of:
>
>     find_global $P0, "dup"
>     invoke $P0
>     find_global $P0, "mul"
>     invoke $P0
>
> in NCG it would result in the execution of:
>
>     .POP
>     .NOS = .TOS
>     .PUSH2    # this can be optimized out
>     .POP2     # of NCG, but not direct threading
>     .TOS = .TOS * .NOS
>     .PUSH

The second PIR sequence is longer.  It will take IMCC more time to
compile than the first example, and the less trivial the words are,
the more true that becomes.  But like you said, this only happens
(a) at compile time or (b) at the interactive prompt.  And optimizing
out push/pop combos will speed things up more, though I'm not sure
how to implement that optimization using PIR (see the P.S. below for
what I think the optimized output should look like).

So programs may fall on either side of the fence on this issue.
Building words in terms of other words will give NCG an advantage,
while using relatively simple words many times will give direct
threading an advantage.  But I do believe you when you say that NCG
is fastest overall (read: for most programs).

Furthermore, our two models will behave differently when you redefine
a word.  Consider this Forth example:

    : inc 1 + ;
    : print+ inc . ;
    : inc 2 + ;

Should print+ increment by one or by two?  gforth increments by one.
I'd be interested in knowing which is the "correct" behavior.  (The
P.P.S. below sketches what I think each model would emit here.)

-- matt
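
P.S.  On the push/pop elimination: I still don't know how to write
the peephole pass itself in PIR, but just to check that I understand
the target, here is my guess at what your NCG sequence would look
like once the adjacent .PUSH2/.POP2 pair is cancelled out (this
reuses your macro names and is only a sketch, not tested code):

    .POP                  # pop the argument into the TOS register
    .NOS = .TOS           # dup: copy top-of-stack into next-on-stack
    .TOS = .TOS * .NOS    # mul: operate on the registers directly
    .PUSH                 # push the product back

The values never take the round trip through the stack between the
dup and the mul, which is exactly the traffic that can't be avoided
under direct threading.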
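
P.P.S.  On the redefinition question, here is why I expect the two
models to disagree (again only a sketch; I'm guessing at the macro
that pushes a literal).  Under direct threading, print+ compiles the
inc reference into a lookup that happens at run time:

    find_global $P0, "inc"
    invoke $P0

so if redefining inc rebinds that global, print+ picks up the new
definition and increments by two.  Under NCG, the body of inc as it
exists at compile time gets injected into print+, something like:

    # push the literal 1 (whatever macro that expands to)
    .POP2                 # pop the 1 and print+'s argument
    .TOS = .TOS + .NOS    # the + from inc's original body
    .PUSH                 # push the sum back

so print+ keeps the old definition and increments by one, which is
what gforth does.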