Michael Maraist wrote:
> > All --
> >
> > > I've created a varargs-ish example by making a new op, print_s_v.
> > > This is pretty rough, and I haven't updated the assembler, but it
> > > seems to work.
>
>
> With var-args, we could produce highly efficient SIMD instructions.
> printf obviously, multi-push/pop, createArrays/ Hashes.
>
> I toyed with the idea of a stack-based ALU to handle algebra such as:
>
> ( ( a + b*c ) / d + e ) / ( f + g + h )
>
> // compute dest, "fmt", register-names
> compute Rd, "++^*+/+", f, g, h, a, b, c, d, e
>
> Unfortunately I couldn't find a way of garunteeing that it's efficient.
> The best I've come up with so far is to use a mini-switch statement. The
> only reason it might be faster than an optimized switch-statement over all
> op-codes is that the switch has fewer terms.
>
> The above case also gets complex if we don't use multiple registers for
> intermediate calculations.
>
> None the less the possibilities are cool.
>
I had more time to think about it, and I determined how a compute op-code
could be efficient.
Pros:
o First of all, there is no need to assign to temporary registers (as would
be necessary for the compiler).
o Secondly, I'm under the impression that compilers convert algebraic
expressions to reverse polish notation anyway (to reduce the temporaries);
though parallel computing is an obvious exception (when multiple ops can be
executing simultaneously). This theoretically simplifies the job of the
compiler.
o Third, only a single register access is required for each parameter
(where-as an expression might otherwise require multiple accesses).
o Fourth, it's possible to check for a subset of two or more adjacent
operations such as "++++" which would quickly add four number with n-1
physical adds ('accumulator = a+ b + c + d + e') which can be optimized in
for parallel op-code execution within the physical CPU. Or "^+" which
normally pushes a stack-element, adds top two, then pops. Here it just adds
to the top-element. To prevent d^n number of operations, only common ones
are defined. The default block of the switch checks for individual
operations.
o Fifth, the op-code is a lot more dense: (16 + 6 * var) verses (16 * n).
In inner loops, this could add up with cache-savings (don't pay retail).
Cons:
o The overhead of an op-code is probably not much greater than the total
overhead of this instruction (especially with a switch-based op-dispatcher
and -O2 compilation)
o Potentially long op-chains are possible which could thwart responsiveness
to synchronous event-handling. (Unless there is a max-fmt size)
o Potential portability issues if we deal with 16bit "shorts". Might have to
require 32bit ints as the smallest macro-symbol
/** FMT: compute v_arg_size, destination-register, fmt_const_string, ( list
of reg-indexes )
*contents of fmt_string:
* + //add top two stack el's
* - // sub ..
* * // mul ..
* / // divide ..
* % // take modulous
* & // logical and
* | // logical or
* P // push register onto local stack
* C // push constant onto local stack
* D // duplicate top stack element
* S // switch top two elements
* \0 // end-of-fmt-string. Must be word-aligned with such eofs eg
"PPP++\0\0\0"
******
* Note that the above symbols should probably be part of a c-enum so that
the switch statement can be optimized
**/
AUTO_OP computei_varg {
static int stack[MAX_STACK]; // make 1 larger than the max-size of
allowable varg_size
unsigned int top = 0; // points to safe location
STRING *s = Parrot_string_constants[P2];
int args_offset = 3 +code ; // pointer to the start of parameters (could
have just been word_align( strlen(code[4]) ) )
short * dops = (short*)s->bufstart; // get format
int f_first_two_op;
char * sops; // single_char_ops
// sanity check against varg_size
if ( code[1] > MAX_STACK )
die "invalid size-of operation";
stack[ 0 ] = 0; // null-value (should never read)
stack[ 1 ] = 0; // initially stack contains null
// compare pairs of op-codes (theoretically could nest this into the
default statement of a 4-op comparison)
while (*dops && top) {
switch( *dops ) {
case PUSH_PLUS: // "P+" or ('^' << 8) +'+'
stack[top] += INT_REG( code[ args_offset++ ] );
break;
case CONST_PLUS: // "C+"
stack[top] += code[ args_offset++ ];
case PLUS_PLUS: // "++"
stack[top] += stack[top--] + stack[top--];
...
defailt: // fall back to the single-character case for unrecognized
pairs
f_first_two_op = 0;
sops = (char*)dops;
while(!f_first_two_op++) {
switch( *((char*) sops)) {
case '+': ...
case '-': ...
...
default: die "invalid op-code"
} // end 1char-switch
} // end 1char-while
} // end 2char-switch
} // end 2char-while
if ( top )
INT_REG( P2 ) = stack[top];
else
die "invalid number of pushes";
} // computei_varg
-Michael