Re: [PATCH] Bytecode bounds checking and TRACE_OPS
Simon Cozens sent the following bits through the ether: My view is that if you screw up writing assembly code, you should be thankful that you get the protection of a segfault. Indeed, but why not warn about it. The JVM does this by checking that the bytecodes are wellformed during its bytecode verification stage (before it runs the bytecode). It does some simple checks to see if the bytecode is doing reasonable things (no halting problem here, move along). Something for later, perhaps. Leon -- Leon Brocard.http://www.astray.com/ Iterative Software...http://www.iterative-software.com/ ... All generalizations are false, including this one
[PATCH] Bytecode bounds checking and TRACE_OPS
All -- While I was working on understanding jump_i, I was wishing I could see a trace of where the PC was going and what it was executing. (BTW, the jump.pasm test works now that I hard-coded the offsets as *word* counts not *byte* counts -- D'Oh!) I was also concerned that even though I was probably jumping out of the bytecode block I wasn't getting an error message from the interpreter. The patch I'm attaching here does the following things: * Passes around the bytecode block size to all the functions that work with it. As we strip off the leading stuff to get to the actual code to execute, the size is also decremented. * Generates BUILD_NAME_TABLE and BUILD_ARG_TABLE macros via build_interp_starter.pl. These are used later when tracing ops. * Additional bounds checks in the runops loop. I know we don't like to add code to the inner loop, but allowing bogus or evil bytecode to jump to arbitrary locations outside the bytecode block is most likely a Bad Thing (TM). * If the TRACE_OPS symbol is defined, then additional code will be enabled which prints out the PC, OP and args for each operation the interpreter executes. NOTES: * For testing, I hard-coded #define TRACE_OPS. I'm not comfortable enough with the config stuff to muck with it. I'd like to be able to enable this stuff with a config option (presuming Dan et al. like the idea of me committing this modulo some refinements). Can someone send me a patch with the right incantations so I can try it out? * The bounds checking actually doesn't go quite far enough. Here are two other things that need consideration: * Suppose we jump to the last word in the bytecode, and there is an op there that expects arguments... BOOM * Suppose the bytecode contains opcodes for out-of-range ops... BOOM. * Now, we could consider verifying the bytecode after reading it and before interpreting it to see that these things don't occur. BUT, I don't think there is anything to stop someone from producing a perfectly fine bytecode stream that has data embedded in the middle or at the end that is never intended to be executed (unless we make a rule that says Thou Shalt Not Do That). I'd like to commit this if I can get the Config support worked out. I think it would be a big help in debugging new ops, especially as we start working on subroutines and such. Any thoughts? Regards, -- Gregor _ / perl -e 'srand(-2091643526); print chr rand 90 for (0..4)' \ Gregor N. Purdy [EMAIL PROTECTED] Focus Research, Inc.http://www.focusresearch.com/ 8080 Beckett Center Drive #203 513-860-3570 vox West Chester, OH 45069 513-860-3579 fax \_/ ? op.diff ? trace_ops.diff ? t/inc.pasm ? t/jumpsub.pasm ? t/substr.pasm Index: build_interp_starter.pl === RCS file: /home/perlcvs/parrot/build_interp_starter.pl,v retrieving revision 1.6 diff -u -r1.6 build_interp_starter.pl --- build_interp_starter.pl 2001/09/15 16:05:40 1.6 +++ build_interp_starter.pl 2001/09/17 14:17:48 @@ -26,7 +26,39 @@ } print INTERP } while (0);\n; + +# +# BUILD_NAME_TABLE macro: +# + +print INTERP CONST; +#define BUILD_NAME_TABLE(x) do { \\ +CONST + +for my $name (sort {$opcodes{$a}{CODE} = $opcodes{$b}{CODE}} keys %opcodes) { +print INTERP \tx[$opcodes{$name}{CODE}] = \$name\; \\\n; +} +print INTERP } while (0);\n; + + +# +# BUILD_ARG_TABLE macro: +# + +print INTERP CONST; +#define BUILD_ARG_TABLE(x) do { \\ +CONST + +for my $name (sort {$opcodes{$a}{CODE} = $opcodes{$b}{CODE}} keys %opcodes) { +print INTERP \tx[$opcodes{$name}{CODE}] = $opcodes{$name}{ARGS}; \\\n; +} +print INTERP } while (0);\n; + + +# # Spit out the DO_OP function +# + print INTERP EOI; #define DO_OP(w,x,y,z) do { \\ Index: bytecode.c === RCS file: /home/perlcvs/parrot/bytecode.c,v retrieving revision 1.9 diff -u -r1.9 bytecode.c --- bytecode.c 2001/09/16 01:45:50 1.9 +++ bytecode.c 2001/09/17 14:17:49 @@ -49,7 +49,8 @@ */ static int -check_magic(void** program_code) { +check_magic(void** program_code, long* program_size) { +program_size -= sizeof(IV); return (GRAB_IV(program_code) == PARROT_MAGIC); } @@ -69,12 +70,14 @@ */ static void -read_constants_table(void** program_code) +read_constants_table(void** program_code, long* program_size) { IV len = GRAB_IV(program_code); IV num; IV i = 0; +*program_size -= sizeof(IV); + Parrot_num_string_constants = len; if (len == 0) { return; @@ -82,6 +85,7 @@ num = GRAB_IV(program_code);
Re: [PATCH] Bytecode bounds checking and TRACE_OPS
At 10:33 AM 9/17/2001 -0400, Gregor N. Purdy wrote: Any thoughts? The big one is you shouldn't assume that we are only going to have a single chunk of bytecode--we may well have several loaded from disk, and more created on the fly by eval/do/dynamic recompilation. Also, we're trying to keep the stuff in the loop to a minimum, so for this I'd rather have a separate runops function, as well as having the actual funky stuff in the body separated out. (I'd really like it abstracted out into a generic debugging runops, but we can do that later) Other than that it looks pretty good. Dan --it's like this--- Dan Sugalski even samurai [EMAIL PROTECTED] have teddy bears and even teddy bears get drunk
Re: [PATCH] Bytecode bounds checking and TRACE_OPS
On Mon, Sep 17, 2001 at 10:33:35AM -0400, Gregor N. Purdy wrote: as *word* counts not *byte* counts -- D'Oh!) Isn't assembly programming fun? :) */ static int -check_magic(void** program_code) { +check_magic(void** program_code, long* program_size) { +program_size -= sizeof(IV); return (GRAB_IV(program_code) == PARROT_MAGIC); } And to think, just above that */ is a load of lovely documentation you haven't touched... +#ifdef TRACE_OPS +fprintf(stderr, PC=%ld; OP=%ld (%s)\n, code - code_start, *code, op_names[*code]); +#endif /* TRACE_OPS */ That's neat. +#ifdef TRACE_OPS +if (code = code_start code (code_start + code_size)) { +fprintf(stderr, PC=%ld; OP=%ld (%s), code - code_start, *code, op_names[*code]); +if (op_args[*code]) { +fprintf(stderr, ; ARGS=(); +for(i = 0; i op_args[*code]; i++) { +if (i) { fprintf(stderr, , ); } +fprintf(stderr, %ld, *(code + i + 1)); +} +fprintf(stderr, )); +} +fprintf(stderr, \n); +} else { +fprintf(stderr, PC=%ld; OP=err\n, code - code_start); +} +#endif /* TRACE_OPS */ That's less than neat, but I can't think of a better way to do it. +if (code code_start || code = (code_start + code_size)) { +fprintf(stderr, Error: Control left bounds of byte-code block (now at location %d)!\n, code - code_start); +exit(1); I don't like this check in every iteration. Simon -- Heh, heh, heh, heh... the NOISE of a bursar CHEWING Proctors' Memoranda. - Henry Braun is Oxford Zippy
Re: [PATCH] Bytecode bounds checking and TRACE_OPS
Dan -- The big one is you shouldn't assume that we are only going to have a single chunk of bytecode--we may well have several loaded from disk, and more created on the fly by eval/do/dynamic recompilation. Yeah. It seems we should have a parrot_bytecode structure that keeps the pointer to the block-o-bytes, and also memoizes the pointers to the other bits, and the length(s). I didn't mess with that, though, because I didn't want to conflate too many issues. Also, we're trying to keep the stuff in the loop to a minimum, so for this I'd rather have a separate runops function, as well as having the actual funky stuff in the body separated out. (I'd really like it abstracted out into a generic debugging runops, but we can do that later) So, shall I make a runops_trace() function and modify test_prog to run that if it is passed an approproate --trace (or something) flag? Other than that it looks pretty good. I'll wait for a little more feedback, modify, and submit another patch. If that looks good, I'll go ahead and commit. Regards, -- Gregor _ / perl -e 'srand(-2091643526); print chr rand 90 for (0..4)' \ Gregor N. Purdy [EMAIL PROTECTED] Focus Research, Inc.http://www.focusresearch.com/ 8080 Beckett Center Drive #203 513-860-3570 vox West Chester, OH 45069 513-860-3579 fax \_/
Re: [PATCH] Bytecode bounds checking and TRACE_OPS
On Mon, Sep 17, 2001 at 11:01:32AM -0400, Dan Sugalski wrote: While I'm not fond of segfaults myself, the place to check isn't in the interpreter loop. My view is that if you screw up writing assembly code, you should be thankful that you get the protection of a segfault. Same happens if you construct horribly malformed bytecode in Perl 5, as I know only too well. :) -- Writing software is more fun than working.
Re: [PATCH] Bytecode bounds checking and TRACE_OPS
At 11:02 AM 9/17/2001 -0400, Gregor N. Purdy wrote: Dan -- Also, we're trying to keep the stuff in the loop to a minimum, so for this I'd rather have a separate runops function, as well as having the actual funky stuff in the body separated out. (I'd really like it abstracted out into a generic debugging runops, but we can do that later) So, shall I make a runops_trace() function and modify test_prog to run that if it is passed an approproate --trace (or something) flag? Yup. Just checked in a new interpreter.h with an interpreter flags field and some defined bits. Go ahead and use those. (flags != 0 means we need to check, FWIW) Dan --it's like this--- Dan Sugalski even samurai [EMAIL PROTECTED] have teddy bears and even teddy bears get drunk
Re: [PATCH] Bytecode bounds checking and TRACE_OPS
At 03:53 PM 9/17/2001 +0100, Simon Cozens wrote: On Mon, Sep 17, 2001 at 11:01:32AM -0400, Dan Sugalski wrote: While I'm not fond of segfaults myself, the place to check isn't in the interpreter loop. My view is that if you screw up writing assembly code, you should be thankful that you get the protection of a segfault. I agree, but don't forget that if we're going to be allowing both Safe-ish partitions and the execution of potentially untrusted code we need to guard against jumps to naughty places by incoming bytecode. Which, I suppose, argues as much for making the branch and jump code guaranteed to vector through the opcode function table as anything, so we can pay to check only in compartments we care about. Dan --it's like this--- Dan Sugalski even samurai [EMAIL PROTECTED] have teddy bears and even teddy bears get drunk
Re: [PATCH] Bytecode bounds checking and TRACE_OPS
Dan -- While I'm not fond of segfaults myself, the place to check isn't in the interpreter loop. It's not unwarranted in assuming that it's told to go OK places. If we want to check the better place is inside the jump ops. I'm in agreement, although we still need to watch out for falling off the end of the bytecode block, too. Take blamo.pasm, for example. If we caused the jump to spit out a warning instead of dying, and fall through to the next opcode, we'd end up falling off the bytecode block into never-never land. Now, if we know the maximum number of operands allowed in Parrot, we could paste that many zeros on then end of the bytecode stream so that no matter what the last thing in the input bytecode stream is (even if its an op with its operands missing), we'll be forcing an 'end'. It might be nice, though, to have an op called 'err' instead of 'end' so that if this ever happens we complain rather than just ending as if everything was right with the world. Do you want me to do anything like this? Or, shall I fix up the current patch for now so we've got trace + basic protection and then move on to smarter protection as a separate step? Regards, -- Gregor _ / perl -e 'srand(-2091643526); print chr rand 90 for (0..4)' \ Gregor N. Purdy [EMAIL PROTECTED] Focus Research, Inc.http://www.focusresearch.com/ 8080 Beckett Center Drive #203 513-860-3570 vox West Chester, OH 45069 513-860-3579 fax \_/