Re: [PATCH] Bytecode bounds checking and TRACE_OPS

2001-09-18 Thread Leon Brocard

Simon Cozens sent the following bits through the ether:

 My view is that if you screw up writing assembly code, you should be
 thankful that you get the protection of a segfault.

Indeed, but why not warn about it. The JVM does this by checking that
the bytecodes are wellformed during its bytecode verification stage
(before it runs the bytecode). It does some simple checks to see if
the bytecode is doing reasonable things (no halting problem here, move
along). Something for later, perhaps.

Leon
-- 
Leon Brocard.http://www.astray.com/
Iterative Software...http://www.iterative-software.com/

... All generalizations are false, including this one



[PATCH] Bytecode bounds checking and TRACE_OPS

2001-09-17 Thread Gregor N. Purdy

All --

While I was working on understanding jump_i, I was wishing I could
see a trace of where the PC was going and what it was executing.
(BTW, the jump.pasm test works now that I hard-coded the offsets
as *word* counts not *byte* counts -- D'Oh!)

I was also concerned that even though I was probably jumping out of
the bytecode block I wasn't getting an error message from the
interpreter.

The patch I'm attaching here does the following things:

  * Passes around the bytecode block size to all the functions that
work with it. As we strip off the leading stuff to get to the
actual code to execute, the size is also decremented.

  * Generates BUILD_NAME_TABLE and BUILD_ARG_TABLE macros via
build_interp_starter.pl. These are used later when tracing ops.

  * Additional bounds checks in the runops loop. I know we don't
like to add code to the inner loop, but allowing bogus or evil
bytecode to jump to arbitrary locations outside the bytecode
block is most likely a Bad Thing (TM).

  * If the TRACE_OPS symbol is defined, then additional code will
be enabled which prints out the PC, OP and args for each operation
the interpreter executes.

NOTES:

  * For testing, I hard-coded #define TRACE_OPS. I'm not comfortable
enough with the config stuff to muck with it. I'd like to be
able to enable this stuff with a config option (presuming Dan
et al. like the idea of me committing this modulo some
refinements). Can someone send me a patch with the right
incantations so I can try it out?

  * The bounds checking actually doesn't go quite far enough. Here
are two other things that need consideration:

  * Suppose we jump to the last word in the bytecode, and there
is an op there that expects arguments... BOOM

  * Suppose the bytecode contains opcodes for out-of-range ops...
BOOM.

  * Now, we could consider verifying the bytecode after reading
it and before interpreting it to see that these things don't
occur. BUT, I don't think there is anything to stop someone
from producing a perfectly fine bytecode stream that has
data embedded in the middle or at the end that is never
intended to be executed (unless we make a rule that says
Thou Shalt Not Do That).

I'd like to commit this if I can get the Config support worked out.
I think it would be a big help in debugging new ops, especially as
we start working on subroutines and such.

Any thoughts?


Regards,

-- Gregor
 _ 
/ perl -e 'srand(-2091643526); print chr rand 90 for (0..4)'  \

   Gregor N. Purdy  [EMAIL PROTECTED]
   Focus Research, Inc.http://www.focusresearch.com/
   8080 Beckett Center Drive #203   513-860-3570 vox
   West Chester, OH 45069   513-860-3579 fax
\_/


? op.diff
? trace_ops.diff
? t/inc.pasm
? t/jumpsub.pasm
? t/substr.pasm
Index: build_interp_starter.pl
===
RCS file: /home/perlcvs/parrot/build_interp_starter.pl,v
retrieving revision 1.6
diff -u -r1.6 build_interp_starter.pl
--- build_interp_starter.pl 2001/09/15 16:05:40 1.6
+++ build_interp_starter.pl 2001/09/17 14:17:48
@@ -26,7 +26,39 @@
 }
 print INTERP } while (0);\n;
 
+
+#
+# BUILD_NAME_TABLE macro:
+#
+
+print INTERP CONST;
+#define BUILD_NAME_TABLE(x) do { \\
+CONST
+
+for my $name (sort {$opcodes{$a}{CODE} = $opcodes{$b}{CODE}} keys %opcodes) {
+print INTERP \tx[$opcodes{$name}{CODE}] = \$name\; \\\n;
+}
+print INTERP } while (0);\n;
+
+
+#
+# BUILD_ARG_TABLE macro:
+#
+
+print INTERP CONST;
+#define BUILD_ARG_TABLE(x) do { \\
+CONST
+
+for my $name (sort {$opcodes{$a}{CODE} = $opcodes{$b}{CODE}} keys %opcodes) {
+print INTERP \tx[$opcodes{$name}{CODE}] = $opcodes{$name}{ARGS}; \\\n;
+}
+print INTERP } while (0);\n;
+
+
+#
 # Spit out the DO_OP function
+#
+
 print INTERP EOI;
 
 #define DO_OP(w,x,y,z) do { \\
Index: bytecode.c
===
RCS file: /home/perlcvs/parrot/bytecode.c,v
retrieving revision 1.9
diff -u -r1.9 bytecode.c
--- bytecode.c  2001/09/16 01:45:50 1.9
+++ bytecode.c  2001/09/17 14:17:49
@@ -49,7 +49,8 @@
 */
 
 static int
-check_magic(void** program_code) {
+check_magic(void** program_code, long* program_size) {
+program_size -= sizeof(IV);
 return (GRAB_IV(program_code) == PARROT_MAGIC);
 }
 
@@ -69,12 +70,14 @@
 */
 
 static void
-read_constants_table(void** program_code)
+read_constants_table(void** program_code, long* program_size)
 {
 IV len = GRAB_IV(program_code);
 IV num;
 IV i = 0;
 
+*program_size -= sizeof(IV);
+
 Parrot_num_string_constants = len;
 if (len == 0) {
return;
@@ -82,6 +85,7 @@
 
 num = GRAB_IV(program_code);
  

Re: [PATCH] Bytecode bounds checking and TRACE_OPS

2001-09-17 Thread Dan Sugalski

At 10:33 AM 9/17/2001 -0400, Gregor N. Purdy wrote:
Any thoughts?

The big one is you shouldn't assume that we are only going to have a single 
chunk of bytecode--we may well have several loaded from disk, and more 
created on the fly by eval/do/dynamic recompilation.

Also, we're trying to keep the stuff in the loop to a minimum, so for this 
I'd rather have a separate runops function, as well as having the actual 
funky stuff in the body separated out. (I'd really like it abstracted out 
into a generic debugging runops, but we can do that later)

Other than that it looks pretty good.

Dan

--it's like this---
Dan Sugalski  even samurai
[EMAIL PROTECTED] have teddy bears and even
  teddy bears get drunk




Re: [PATCH] Bytecode bounds checking and TRACE_OPS

2001-09-17 Thread Simon Cozens

On Mon, Sep 17, 2001 at 10:33:35AM -0400, Gregor N. Purdy wrote:
 as *word* counts not *byte* counts -- D'Oh!)

Isn't assembly programming fun? :)

  */
  
  static int
 -check_magic(void** program_code) {
 +check_magic(void** program_code, long* program_size) {
 +program_size -= sizeof(IV);
  return (GRAB_IV(program_code) == PARROT_MAGIC);
  }

And to think, just above that */ is a load of lovely documentation
you haven't touched...

 +#ifdef TRACE_OPS
 +fprintf(stderr, PC=%ld; OP=%ld (%s)\n, code - code_start, *code, 
op_names[*code]);
 +#endif /* TRACE_OPS */

That's neat.

 +#ifdef TRACE_OPS
 +if (code = code_start  code  (code_start + code_size)) {
 +fprintf(stderr, PC=%ld; OP=%ld (%s), code - code_start, *code, 
op_names[*code]);
 +if (op_args[*code]) {
 +fprintf(stderr, ; ARGS=();
 +for(i = 0; i  op_args[*code]; i++) {
 +if (i) { fprintf(stderr, , ); }
 +fprintf(stderr, %ld, *(code + i + 1));
 +}
 +fprintf(stderr, ));
 +}
 +fprintf(stderr, \n);
 +} else {
 +fprintf(stderr, PC=%ld; OP=err\n, code - code_start);
 +}
 +#endif /* TRACE_OPS */

That's less than neat, but I can't think of a better way to do it.

 +if (code  code_start || code = (code_start + code_size)) {
 +fprintf(stderr, Error: Control left bounds of byte-code block (now at 
location %d)!\n, code - code_start);
 +exit(1);

I don't like this check in every iteration. 

Simon

-- 
Heh, heh, heh, heh... the NOISE of a bursar CHEWING Proctors' Memoranda.
- Henry Braun is Oxford Zippy



Re: [PATCH] Bytecode bounds checking and TRACE_OPS

2001-09-17 Thread Gregor N. Purdy

Dan --

 The big one is you shouldn't assume that we are only going to have a single 
 chunk of bytecode--we may well have several loaded from disk, and more 
 created on the fly by eval/do/dynamic recompilation.

Yeah. It seems we should have a parrot_bytecode structure that keeps the
pointer to the block-o-bytes, and also memoizes the pointers to the
other bits, and the length(s).

I didn't mess with that, though, because I didn't want to conflate too
many issues.

 Also, we're trying to keep the stuff in the loop to a minimum, so for this 
 I'd rather have a separate runops function, as well as having the actual 
 funky stuff in the body separated out. (I'd really like it abstracted out 
 into a generic debugging runops, but we can do that later)

So, shall I make a runops_trace() function and modify test_prog to run
that if it is passed an approproate --trace (or something) flag?

 Other than that it looks pretty good.

I'll wait for a little more feedback, modify, and submit another patch.
If that looks good, I'll go ahead and commit.


Regards,
 
-- Gregor
 _ 
/ perl -e 'srand(-2091643526); print chr rand 90 for (0..4)'  \

   Gregor N. Purdy  [EMAIL PROTECTED]
   Focus Research, Inc.http://www.focusresearch.com/
   8080 Beckett Center Drive #203   513-860-3570 vox
   West Chester, OH 45069   513-860-3579 fax
\_/




Re: [PATCH] Bytecode bounds checking and TRACE_OPS

2001-09-17 Thread Simon Cozens

On Mon, Sep 17, 2001 at 11:01:32AM -0400, Dan Sugalski wrote:
 While I'm not fond of segfaults myself, the place to check isn't in the 
 interpreter loop. 

My view is that if you screw up writing assembly code, you should be
thankful that you get the protection of a segfault.

Same happens if you construct horribly malformed bytecode in Perl 5, as
I know only too well. :)

-- 
Writing software is more fun than working.



Re: [PATCH] Bytecode bounds checking and TRACE_OPS

2001-09-17 Thread Dan Sugalski

At 11:02 AM 9/17/2001 -0400, Gregor N. Purdy wrote:
Dan --
  Also, we're trying to keep the stuff in the loop to a minimum, so for this
  I'd rather have a separate runops function, as well as having the actual
  funky stuff in the body separated out. (I'd really like it abstracted out
  into a generic debugging runops, but we can do that later)

So, shall I make a runops_trace() function and modify test_prog to run
that if it is passed an approproate --trace (or something) flag?

Yup. Just checked in a new interpreter.h with an interpreter flags field 
and some defined bits. Go ahead and use those. (flags != 0 means we need to 
check, FWIW)


Dan

--it's like this---
Dan Sugalski  even samurai
[EMAIL PROTECTED] have teddy bears and even
  teddy bears get drunk




Re: [PATCH] Bytecode bounds checking and TRACE_OPS

2001-09-17 Thread Dan Sugalski

At 03:53 PM 9/17/2001 +0100, Simon Cozens wrote:
On Mon, Sep 17, 2001 at 11:01:32AM -0400, Dan Sugalski wrote:
  While I'm not fond of segfaults myself, the place to check isn't in the
  interpreter loop.

My view is that if you screw up writing assembly code, you should be
thankful that you get the protection of a segfault.

I agree, but don't forget that if we're going to be allowing both Safe-ish 
partitions and the execution of potentially untrusted code we need to guard 
against jumps to naughty places by incoming bytecode.

Which, I suppose, argues as much for making the branch and jump code 
guaranteed to vector through the opcode function table as anything, so we 
can pay to check only in compartments we care about.

Dan

--it's like this---
Dan Sugalski  even samurai
[EMAIL PROTECTED] have teddy bears and even
  teddy bears get drunk




Re: [PATCH] Bytecode bounds checking and TRACE_OPS

2001-09-17 Thread Gregor N. Purdy

Dan --

 While I'm not fond of segfaults myself, the place to check isn't in the 
 interpreter loop. It's not unwarranted in assuming that it's told to go OK 
 places. If we want to check the better place is inside the jump ops.

I'm in agreement, although we still need to watch out for falling off
the end of the bytecode block, too. Take blamo.pasm, for example. If
we caused the jump to spit out a warning instead of dying, and fall
through to the next opcode, we'd end up falling off the bytecode block
into never-never land. Now, if we know the maximum number of operands
allowed in Parrot, we could paste that many zeros on then end of the
bytecode stream so that no matter what the last thing in the input
bytecode stream is (even if its an op with its operands missing),
we'll be forcing an 'end'.

It might be nice, though, to have an op called 'err' instead of 'end'
so that if this ever happens we complain rather than just ending as
if everything was right with the world.

Do you want me to do anything like this? Or, shall I fix up the
current patch for now so we've got trace + basic protection and 
then move on to smarter protection as a separate step?


Regards,
 
-- Gregor
 _ 
/ perl -e 'srand(-2091643526); print chr rand 90 for (0..4)'  \

   Gregor N. Purdy  [EMAIL PROTECTED]
   Focus Research, Inc.http://www.focusresearch.com/
   8080 Beckett Center Drive #203   513-860-3570 vox
   West Chester, OH 45069   513-860-3579 fax
\_/