Michael Maraist wrote:

> > All --
> >
> > > I've created a varargs-ish example by making a new op, print_s_v.
> > > This is pretty rough, and I haven't updated the assembler, but it
> > > seems to work.
>
>
> With var-args, we could produce highly efficient SIMD instructions.
> printf obviously, multi-push/pop, createArrays/ Hashes.
>
> I toyed with the idea of a stack-based ALU to handle algebra such as:
>
> ( ( a + b*c ) / d + e ) / ( f + g + h )
>
> // compute dest, "fmt", register-names
> compute Rd, "++^*+/+", f, g, h, a, b, c, d, e
>
> Unfortunately I couldn't find a way of garunteeing that it's efficient.
> The best I've come up with so far is to use a mini-switch statement.  The
> only reason it might be faster than an optimized switch-statement over all
> op-codes is that the switch has fewer terms.
>
> The above case also gets complex if we don't use multiple registers for
> intermediate calculations.
>
> None the less the possibilities are cool.
>

I had more time to think about it, and I determined how a compute op-code
could be efficient.
Pros:
o First of all, there is no need to assign to temporary registers (as would
be necessary for the compiler).
o Secondly, I'm under the impression that compilers convert algebraic
expressions to reverse polish notation anyway (to reduce the temporaries);
though parallel computing is an obvious exception (when multiple ops can be
executing simultaneously).  This theoretically simplifies the job of the
compiler.
o Third, only a single register access is required for each parameter
(where-as an expression might otherwise require multiple accesses).
o Fourth, it's possible to check for a subset of two or more adjacent
operations such as "++++" which would quickly add four number with n-1
physical adds ('accumulator = a+ b + c + d  + e') which can be optimized in
for parallel op-code execution within the physical CPU.  Or "^+" which
normally pushes a stack-element, adds top two, then pops.  Here it just adds
to the top-element.  To prevent d^n number of operations, only common ones
are defined.  The default block of the switch checks for individual
operations.
o Fifth, the op-code is a lot more dense:  (16 + 6 * var)  verses (16 * n).
In inner loops, this could add up with cache-savings (don't pay retail).

Cons:
o The overhead of an op-code is probably not much greater than the total
overhead of this instruction (especially with a switch-based op-dispatcher
and -O2 compilation)
o Potentially long op-chains are possible which could thwart responsiveness
to synchronous event-handling. (Unless there is a max-fmt size)
o Potential portability issues if we deal with 16bit "shorts".  Might have to
require 32bit ints as the smallest macro-symbol

/** FMT: compute v_arg_size, destination-register, fmt_const_string, ( list
of reg-indexes )
 *contents of fmt_string:
 * + //add top two stack el's
 * - // sub ..
 * * // mul ..
 * / // divide ..
 * % // take modulous
 * & //  logical and
 * | // logical or
 * P // push register onto local stack
 * C // push constant onto local stack
 * D // duplicate top stack element
 * S // switch top two elements
 * \0 // end-of-fmt-string.  Must be word-aligned  with such eofs eg
"PPP++\0\0\0"
 ******
 * Note that the above symbols should probably be part of a c-enum so that
the switch statement can be optimized
 **/
AUTO_OP computei_varg {
  static int stack[MAX_STACK]; // make 1 larger than the max-size of
allowable varg_size
  unsigned int top = 0; // points to safe location

  STRING *s = Parrot_string_constants[P2];
  int args_offset =  3 +code ; // pointer to the start of parameters (could
have just been word_align( strlen(code[4]) ) )
 short * dops = (short*)s->bufstart; // get format

  int f_first_two_op;
  char * sops; // single_char_ops

  // sanity check against varg_size
  if ( code[1] > MAX_STACK )
    die "invalid size-of operation";

  stack[ 0 ] = 0; // null-value (should never read)
  stack[ 1 ] = 0; // initially stack contains null

  // compare pairs of op-codes (theoretically could nest this into the
default statement of a 4-op comparison)
  while (*dops && top) {
    switch( *dops ) {
      case PUSH_PLUS: // "P+" or ('^' << 8) +'+'
         stack[top] += INT_REG( code[ args_offset++ ] );
         break;
      case CONST_PLUS: //  "C+"
         stack[top] += code[ args_offset++ ];
      case PLUS_PLUS: // "++"
        stack[top] += stack[top--] + stack[top--];
      ...
      defailt: // fall back to the single-character case for unrecognized
pairs
        f_first_two_op = 0;
        sops = (char*)dops;
        while(!f_first_two_op++) {
          switch( *((char*) sops))  {
            case '+': ...
            case '-': ...
             ...
            default: die "invalid op-code"
          } // end 1char-switch
        } // end 1char-while
    } // end 2char-switch
  } // end 2char-while

  if ( top )
    INT_REG( P2 ) = stack[top];
  else
    die "invalid number of pushes";
} // computei_varg

-Michael

Reply via email to