Re: segfault in parrot using pugs PIR output
On Thu, 2005-06-30 at 19:36 +0200, Leopold Toetsch wrote: Brian Wheeler wrote: Its been a while since I tinkered with parrot so I thought I'd start playing again...but I've hit a segfault. Should of course not happen... But it seems that the codegen is mixing old and new calling conventions. The trace: 8233 set I0, 1- I0=1, 8236 set I1, 0- I1=0, 8239 set I2, 0- I2=0, 8242 set I3, 0- I3=0, 8245 set I4, 0- I4=0, 8248 set S1,- S1=, 8251 set P0, PMC_C[733] - P0=Sub=PMC(0x8af24b8 pc:8023), 8254 invokecc clearly shows some wrong function call, like another one I found in the pir source: .namespace ['main'] main() This shorthand call syntax doesn't work together (yet) with the new set_args/get_params ... opcodes. Ok, that seems reasonable, considering the newness of the new calling conventions. So please verify that no function shortcut syntax is used in code generation. leo Thanks! Not knowing Haskell (who said _perl_ looks like line noise?) this patch to src/Pugs/CodeGen/PIR.hs gets it working to the point I can test an optimizer I'm tinkering with: @@ -320,6 +320,7 @@ varInit ('':_) = text $ PerlScalar varInit x = error $ invalid name: ++ x + {-| Compiles the current environment to PIR code. -} genPIR :: Eval Val genPIR = do @@ -374,11 +375,17 @@ -- XXX wrong, should be lexical , InsNew tempPMC PerlScalar , store_global.- [lit $_, tempPMC] -]) ++ [ StmtRaw (text (name ++ ())) | PSub name@('_':'_':_) _ _ _ - globPIL ] ++ -[ StmtRaw (text main()) -, StmtIns (exit .- [lit0]) -] -, DeclSub main [SubANON] [ StmtRaw $ emit mainPIR ] + ] ++ +{- ) ++ [ StmtRaw (text (name ++ ())) | PSub name@('_':'_':_) _ _ _ - globPIL ] ++ -} + [ InsNew tempPMC PerlScalar + , tempPMC -- find_name $ [lit __main] + , set_args.- [lit ()] + , InsNew tempPMC2 PerlScalar + , get_results .-[lit (0), tempPMC2] + , invokecc.-[tempPMC] + , exit.-[lit0] + ]) +, DeclSub __main {- [SubANON] -} [] [ StmtRaw $ emit mainPIR ] ] ] ] where style = MkEvalStyle Brian
Re: eof opcode
Fair enough. However, shouldn't the rest of the opcodes with an IO object as their parameter be methods as well? Its not a lot of ops, but it would trim down the core a bit. Brian On Thu, 2004-11-25 at 08:00 +0100, Leopold Toetsch wrote: Brian Wheeler [EMAIL PROTECTED] wrote: I noticed a hole in the io.ops where the PIO stuff wasn't covered. This patch creates an eof opcode which checks for end of file. Please just use the eof method of the PIO object: $I0 = $P0.eof() leo -- brian wheeler [EMAIL PROTECTED]
Re: Problems with 0.1.1 release on x86-64
Sigh. I'll get this right sometime! Brian Index: config/auto/jit.pl === RCS file: /cvs/public/parrot/config/auto/jit.pl,v retrieving revision 1.33 diff -u -r1.33 jit.pl --- config/auto/jit.pl 8 Mar 2004 08:49:05 - 1.33 +++ config/auto/jit.pl 19 Oct 2004 18:38:41 - @@ -171,9 +171,9 @@ else { Configure::Data-set( jitarchname = 'nojit', - jitcpuarch = 'i386', - jitcpu = 'I386', - jitosname = 'nojit', + jitcpuarch = $cpuarch, + jitcpu = $cpuarch, + jitosname = $osname, jitcapable = 0, execcapable = 0, cc_hasjit = '', Index: config/auto/memalign.pl === RCS file: /cvs/public/parrot/config/auto/memalign.pl,v retrieving revision 1.10 diff -u -r1.10 memalign.pl --- config/auto/memalign.pl 13 Oct 2004 14:37:59 - 1.10 +++ config/auto/memalign.pl 19 Oct 2004 18:38:41 - @@ -42,6 +42,13 @@ Configure::Data-set('malloc_header', 'stdlib.h'); } +if (Configure::Data-get('ptrsize') == Configure::Data-get('intsize')) { + Configure::Data-set('ptrcast','int'); + } +else { + Configure::Data-set('ptrcast','long'); + } + cc_gen('config/auto/memalign/test_c.in'); eval { cc_build(); }; unless ($@ || cc_run_capture() !~ /ok/) { Index: config/auto/memalign/test_c.in === RCS file: /cvs/public/parrot/config/auto/memalign/test_c.in,v retrieving revision 1.4 diff -u -r1.4 test_c.in --- config/auto/memalign/test_c.in 13 Jul 2003 18:52:37 - 1.4 +++ config/auto/memalign/test_c.in 19 Oct 2004 18:38:41 - @@ -9,6 +9,6 @@ int main(int argc, char **argv) { void *ptr = memalign(256, 17); - puts(ptr ((int)ptr 0xff) == 0 ? ok : nix); + puts(ptr ((${ptrcast})ptr 0xff) == 0 ? ok : nix); return 0; } Index: config/auto/memalign/test_c2.in === RCS file: /cvs/public/parrot/config/auto/memalign/test_c2.in,v retrieving revision 1.3 diff -u -r1.3 test_c2.in --- config/auto/memalign/test_c2.in 13 Jul 2003 18:52:37 - 1.3 +++ config/auto/memalign/test_c2.in 19 Oct 2004 18:38:41 - @@ -20,6 +20,6 @@ * arbitrary allocation size) */ int i = posix_memalign(p, s, 177); - puts(((int)p 0xff) == 0 i == 0 ? ok : nix); + puts(((${ptrcast})p 0xff) == 0 i == 0 ? ok : nix); return i; }
Re: Problems with 0.1.1 release on x86-64
Sorry for the delay...work interfered with my playing and I had to transfer my CVS repo to my x86-64 machine. I don't know if I'd classify it as silence thereafter... as in the summary, but its pretty close :) Here's the diff against the current CVS. It doesn't mess with the default class that needs the split for the return exception. Brian Wheeler [EMAIL PROTECTED] Index: config/auto/jit.pl === RCS file: /cvs/public/parrot/config/auto/jit.pl,v retrieving revision 1.33 diff -u -r1.33 jit.pl --- config/auto/jit.pl 8 Mar 2004 08:49:05 - 1.33 +++ config/auto/jit.pl 18 Oct 2004 05:25:57 - @@ -171,9 +171,9 @@ else { Configure::Data-set( jitarchname = 'nojit', - jitcpuarch = 'i386', - jitcpu = 'I386', - jitosname = 'nojit', + jitcpuarch = $cpuarch, + jitcpu = $cpuarch, + jitosname = $osname, jitcapable = 0, execcapable = 0, cc_hasjit = '', Index: config/auto/memalign.pl === RCS file: /cvs/public/parrot/config/auto/memalign.pl,v retrieving revision 1.10 diff -u -r1.10 memalign.pl --- config/auto/memalign.pl 13 Oct 2004 14:37:59 - 1.10 +++ config/auto/memalign.pl 18 Oct 2004 05:25:57 - @@ -42,6 +42,13 @@ Configure::Data-set('malloc_header', 'stdlib.h'); } +if (Configure::Data-get('ptrsize') == Configure::Data-get('intsize')) { + Configure::Data-set('ptrcast','int'); + } +else { + Configure::Data-set('ptrcast','long'); + } + cc_gen('config/auto/mema On Thu, 2004-10-14 at 06:37, Leopold Toetsch wrote: Brian Wheeler wrote: * cast warnings in default.pmc. Changing static int cant_do_method to static long cant_do_method makes it compile without warnings, but its not the right fix. Better would be to split the return statement and the exception in the generated code. Below is a patch which fixes the first 3. Doesn't apply. Please rediff to current CVS and attach the patch. Thanks, leolign/test_c.in'); eval { cc_build(); }; unless ($@ || cc_run_capture() !~ /ok/) { Index: config/auto/memalign/test_c.in === RCS file: /cvs/public/parrot/config/auto/memalign/test_c.in,v retrieving revision 1.4 diff -u -r1.4 test_c.in --- config/auto/memalign/test_c.in 13 Jul 2003 18:52:37 - 1.4 +++ config/auto/memalign/test_c.in 18 Oct 2004 05:25:57 - @@ -9,6 +9,6 @@ int main(int argc, char **argv) { void *ptr = memalign(256, 17); - puts(ptr ((int)ptr 0xff) == 0 ? ok : nix); + puts(ptr ((${ptrcast})ptr 0xff) == 0 ? ok : nix); return 0; } Index: config/auto/memalign/test_c2.in === RCS file: /cvs/public/parrot/config/auto/memalign/test_c2.in,v retrieving revision 1.3 diff -u -r1.3 test_c2.in --- config/auto/memalign/test_c2.in 13 Jul 2003 18:52:37 - 1.3 +++ config/auto/memalign/test_c2.in 18 Oct 2004 05:25:57 - @@ -20,6 +20,6 @@ * arbitrary allocation size) */ int i = posix_memalign(p, s, 177); - puts(((int)p 0xff) == 0 i == 0 ? ok : nix); + puts(((${ptrcast})p 0xff) == 0 i == 0 ? ok : nix); return i; }
Problems with 0.1.1 release on x86-64
* PARROT_CPU_ARCH is defined as i386 and PARROT_OS_NAME is nojit when jit determination fails. It now correctly (?) reports 'x86_64' and 'linux'. * Memory alignment tests warn about a pointer size difference in cast. A new configuration setting ptrcast is either 'int' or 'long' depending on what type a pointer is safe to cast to/from. I suspect that all cases will be 'long', but you never really know. * cast warnings in default.pmc. Changing static int cant_do_method to static long cant_do_method makes it compile without warnings, but its not the right fix. * Build fails with: gcc -shared -fPIC -g \ -o runtime/parrot/dynext/libnci.so src/nci_test.o /usr/bin/ld: src/nci_test.o: relocation R_X86_64_32 can not be used when making a shared object; recompile with -fPIC src/nci_test.o: could not read symbols: Bad value collect2: ld returned 1 exit status make: *** [runtime/parrot/dynext/libnci.so] Error 1 No idea how to fix this one...adding --ccflags=-fPIC doesn't help the compile. Below is a patch which fixes the first 3. Brian Wheeler [EMAIL PROTECTED] diff -ur ../parrot-0.1.1/classes/default.pmc ./classes/default.pmc --- ../parrot-0.1.1/classes/default.pmc 2004-10-06 10:55:36.0 -0500 +++ ./classes/default.pmc 2004-10-13 23:22:01.089477209 -0500 @@ -52,7 +52,7 @@ */ -static int +static long cant_do_method(Parrot_Interp interpreter, PMC * pmc, const char *methname) { internal_exception(ILL_INHERIT, diff -ur ../parrot-0.1.1/config/auto/jit.pl ./config/auto/jit.pl --- ../parrot-0.1.1/config/auto/jit.pl 2004-03-08 04:29:29.0 -0500 +++ ./config/auto/jit.pl2004-10-13 22:59:05.551927726 -0500 @@ -171,9 +171,9 @@ else { Configure::Data-set( jitarchname = 'nojit', - jitcpuarch = 'i386', - jitcpu = 'I386', - jitosname = 'nojit', + jitcpuarch = $cpuarch, + jitcpu = $cpuarch, + jitosname = $osname, jitcapable = 0, execcapable = 0, cc_hasjit = '', diff -ur ../parrot-0.1.1/config/auto/memalign/test_c2.in ./config/auto/memalign/test_c2.in --- ../parrot-0.1.1/config/auto/memalign/test_c2.in 2003-07-13 13:52:58.0 -0500 +++ ./config/auto/memalign/test_c2.in 2004-10-13 23:09:54.87570 -0500 @@ -20,6 +20,6 @@ * arbitrary allocation size) */ int i = posix_memalign(p, s, 177); - puts(((int)p 0xff) == 0 i == 0 ? ok : nix); + puts(((${ptrcast})p 0xff) == 0 i == 0 ? ok : nix); return i; } diff -ur ../parrot-0.1.1/config/auto/memalign/test_c.in ./config/auto/memalign/test_c.in --- ../parrot-0.1.1/config/auto/memalign/test_c.in 2003-07-13 13:52:58.0 -0500 +++ ./config/auto/memalign/test_c.in2004-10-13 23:09:59.903596577 -0500 @@ -9,6 +9,6 @@ int main(int argc, char **argv) { void *ptr = memalign(256, 17); - puts(ptr ((int)ptr 0xff) == 0 ? ok : nix); + puts(ptr ((${ptrcast})ptr 0xff) == 0 ? ok : nix); return 0; } diff -ur ../parrot-0.1.1/config/auto/memalign.pl ./config/auto/memalign.pl --- ../parrot-0.1.1/config/auto/memalign.pl 2004-03-13 13:46:24.0 -0500 +++ ./config/auto/memalign.pl 2004-10-13 23:09:48.870037359 -0500 @@ -39,6 +39,14 @@ Configure::Data-set('malloc_header', 'stdlib.h'); } +if (Configure::Data-get('ptrsize') == Configure::Data-get('intsize')) { + Configure::Data-set('ptrcast','int'); + } +else { + Configure::Data-set('ptrcast','long'); +} + + cc_gen('config/auto/memalign/test_c.in'); eval { cc_build(); }; unless ($@ || cc_run_capture() !~ /ok/) {
Re: Python bytecode notes, part one
On Mon, 2004-07-05 at 19:27, Dan Sugalski wrote: UNARY_NEGATIVE: restore $Px; new $Py, Undef; $Py = $Px * -1; save $Py | $Px = -1 * $Px Wouldn't something this do what is desired? I'm just guessing, though. Brian
Current CVS broken?
Its been a while since I've looked at parrot, so I did a cvs update -d, perl Configure.pl, make clean, make and build failed: $ make gcc -o parrot -L/usr/local/lib test_main.o blib/lib/libparrot.a -lnsl -ldl -lm -lpthread -lcrypt -lutil blib/lib/libparrot.a(jit_cpu.o)(.text+0x2ce0): In function `Parrot_jit_restart_op': : undefined reference to `Parrot_end_jit' collect2: ld returned 1 exit status make: *** [parrot] Error 1 It can't find Parrot_end_jit. Sure enough, its not there: $ nm blib/lib/libparrot.a | grep Parrot_end_jit U Parrot_end_jit $ grep --recursive Parrot_end_jit * include/parrot/jit_emit.h:static void Parrot_end_jit(Parrot_jit_info_t *, struct Parrot_Interp * ); include/parrot/jit_emit.h:Parrot_end_jit(jit_info, interpreter); jit/i386/jit_emit.h:static void Parrot_end_jit(Parrot_jit_info_t *, struct Parrot_Interp * ); jit/i386/jit_emit.h:Parrot_end_jit(jit_info, interpreter); Did I miss something obvious? Brian Wheeler [EMAIL PROTECTED]
Re: Current CVS broken?
On Mon, 2003-06-09 at 15:23, Leopold Toetsch wrote: Brian Wheeler [EMAIL PROTECTED] wrote: Its been a while since I've looked at parrot, so I did a cvs update -d, perl Configure.pl, make clean, make and build failed: $ make realclean should help. We don't have all dependencies in generated files sorted out that far - sorry. Doesn't, I'm afraid. I removed everything but the CVS directory at the top level, repopulated everything and tried to build it again with the same results Any thoughts? Brian It can't find Parrot_end_jit. Sure enough, its not there: Its in jit_cpu.c generated from core.jit. Brian Wheeler leo
RE: Parrot 0.0.9
On Tue, 2002-10-29 at 11:48, Brent Dax wrote: [EMAIL PROTECTED]: # Well, on thinking a bit about this, there's no reason that # we have to # worry--it's perfectly OK for us to declare, unconditionally, that # segment 0 is always bytecode, 1 line number info, and so on, with # everything after position X (for some value of X) left up # in the air. # A bit dodgy, true, as it means that any new known segment # types we add # in will be floating, but I don't think we're going to end # up with too # many performance-critical pieces in the bytecode. (Arguably # it's just # the bytecode itself, the symbols, and the constants, as the # rest are # looked at under exceptional circumstances or on (rare) demand) # # *No* # # This really kills extendability, or makes it at least very # ugly. It needs to prealloc a certain number of segments. Each # of this has a fixed semantic. Extending means consuming on of # the preallocated fields, or using some segment beyond the # preallocated area but then it needs a type field. In fact the # preallocated segments also have a # type-field: the position in the packfile. How about this structure: HEADER SEGMENT 0 CHUNK 0 (DIRECTORY) SIZE: DATA: CHUNK 0 ENTRY TYPE: DIRECTORY (type 0) OFFSET: Is this really necessary? Seems like a chicken-and-egg thing: to know which chuck the directory is in, you need to read the directory. However, since you've defined that the first chunk (0) is always the directory, there's really no need to have it in the directory since you know it has to be the first chunk. Out of curiosity, would I need a separate segments if I was going to have multiple versions of the program (say, one debugging and one optimized) in the same file? It looks that way. Will the segment/chunk ids's be consistent between builds how do I know what they will be in advance (for dynamically loading the 'debugging' version on demand) ? Brian
Re: [netlabs #757] Problem mixing labels, comments and quote-marks
On Sat, 2002-07-13 at 12:32, Tom Hughes wrote: In message 20020703012231$[EMAIL PROTECTED] Simon Glover (via RT) [EMAIL PROTECTED] wrote: This code: A:# prints a print a end doesn't assemble; the assembler dies with the error message: Use of uninitialized value in hash element at assemble.pl line 844. Couldn't find operator '' on line 1. If you remove the s from the comment, it works fine. Likewise, if you put the label, op and comment on the same line, ie: A: print a # prints a end then it assembles and runs OK. Here's a patch that will fix this. I havn't committed it because I'm not sure why the assember wasn't dropping comments that included quotes so I'm giving people who know more about the assembler than me a chance to comment first... I believe it wasn't dropping the comments with quotes as a side effect of not wanting to break things like: print # which breaks with the included patch. I basically had the same patch you do, but wasn't able to figure out how to handle the above case *and* do the right thing with # prints a Brian Tom -- Tom Hughes ([EMAIL PROTECTED]) http://www.compton.nu/ Index: assemble.pl === RCS file: /cvs/public/parrot/assemble.pl,v retrieving revision 1.77 diff -u -r1.77 assemble.pl --- assemble.pl 4 Jul 2002 18:36:17 - 1.77 +++ assemble.pl 13 Jul 2002 17:30:48 - @@ -433,7 +433,7 @@ $self-{pc}++; return if $line=~/^\s*$/ or $line=~/^\s*#/; # Filter out the comments and blank lines - $line=~s/#[^']+$//; # Remove trailing comments + $line=~s/#.*$//; # Remove trailing comments $line=~s/(^\s+|\s+$)//g; # Remove leading and trailing whitespace # # Accumulate lines that only have labels until an instruction is found..
[Patch] typeof op
This patch implements a typeof op which returns the integer type or string type of a PMC. The test I used is: new P0,.PerlInt typeof S0,P0 eq S0,PerlInt,OK_1 print not OK_1: print ok 1\\n typeof I0,P0 eq I0,.PerlInt,OK_2 print not OK_2: print ok 2\\n end This should allow a program to know what kind of PMC they're playing with. Brian diff -u -r1.174 core.ops --- core.ops28 Jun 2002 18:50:05 - 1.174 +++ core.ops29 Jun 2002 18:49:32 - -3884,6 +3884,27 goto NEXT(); } + +=item Btypeof(out STR, in PMC) + +=item Btypeof(out INT, in PMC) + +Return the type of PMC in $2. + +=cut + +inline op typeof (out STR, in PMC) { + PMC *p = $2; + $1 = (p-vtable-name(interpreter, p)); + goto NEXT(); +} + +inline op typeof (out INT, in PMC) { + PMC *p = $2; + $1 = (p-vtable-type(interpreter, p)); + goto NEXT(); +} + =item Bfind_type(out INT, in STR)
Re: .include directive for new assembler
On Sat, 2002-06-22 at 13:06, Jeff wrote: Clinton A Pierce wrote: At 09:37 PM 6/21/2002 -0500, brian wheeler wrote: I've implemented a .include directive for the new assembler. It basically changes the preprocessor to shift through the source file, and when an include is found, the included file is unshifted to the beginning. To the beginning? Do we have any pre-processor directives (.constant, etc...) that are sensitive to where they're done? Either way, I'm fine with it. As soon as it's in, I'll fix BASIC to remove the hand-rolled .include stuff it has now. I hadn't read the 'beginning' part, unfortunately. That actually could change a lot of things. If it's unshifted onto the beginning of the file, then each new file will become the main as it's unshifted onto the beginning. For example, any code outside of a macro declaration would get run before the main file, and potentially initialize in the wrong order. To be fair, this is more of a fault with the current assembler not having built-in support for subroutine boundaries. Correct me please, Brian, but I'm envisioning a situation like this: .include foo.pasm .include bar.pasm MAIN: print hi end Expanding to: Initialize_Bar: set I0,32 branch END_OF_BAR # This is, of course, something of a bogus mechanism to begin with, but it should make my point. .constant IO_VECTOR S31 # Some subroutines END_OF_BAR: Initialize_Foo: set I1,42 branch END_OF_FOO # See above comment .constant IO_VECTOR S30 END_OF_FOO: MAIN: print hi end The code is admittedly highly contrived, but note that initialization is being done in the wrong order with respect to the files (foo.pasm's init should be run then bar.pasm), and IO_VECTOR is defined to be S30 rather than S31. The Initialize_Bar and Initialize_Foo functions have just been called in the wrong order. Now, I realize that modifying that bit of the assembler to count its position in the array of lines is non-trivial, and that's entirely my fault. However, having it attach files to the beginning of the list would violate the principle of least surprise, and could cause problems with conditional macros, which I've designed but haven't seen a need to implement yet. If you wouldn't mind rewriting the patch so that it substitutes files inline at the point of .include'ing I'll be happy to revert the old patch and put your new one in. Sorry about the seeming change of mind, but this is what I get for affirming stuff at obscenely late hours. Its not backwards, it does the right thing. The at the beginning part is correct since the source is treated like a queue, and the first item is shifted off. When an .include is found, the file is inserted at the beginning of the queue, since all of the code prior to the .include have been removed (shifted) off. So this (also contrived) example works as expected: test.pasm -- print yo!\n ..include foo.pasm ..include foo2.pasm print you are here\n end -- foo.pasm -- print hello -- foo2.pasm -- print world\n -- Does that make more sense? Brian -- Jeff [EMAIL PROTECTED]
Re: .include directive for new assembler
On Sat, 2002-06-22 at 20:12, Jeff wrote: brian wheeler wrote: Its not backwards, it does the right thing. Okay, I believe you now :) I was thinking that the insert was done at the beginning of the -file-, not the insertion point of the file. If you haven't committed, feel free to do so. I shouldn't have stuck my nose in :) no problem! I have a tendency to not actually test my code thoroughly! Brian The at the beginning part is correct since the source is treated like a queue, and the first item is shifted off. When an .include is found, the file is inserted at the beginning of the queue, since all of the code prior to the .include have been removed (shifted) off. So this (also contrived) example works as expected: test.pasm -- print yo!\n .include foo.pasm .include foo2.pasm print you are here\n end -- foo.pasm -- print hello -- foo2.pasm -- print world\n -- Does that make more sense? Brian -- Jeff [EMAIL PROTECTED]
.include directive for new assembler
I've implemented a .include directive for the new assembler. It basically changes the preprocessor to shift through the source file, and when an include is found, the included file is unshifted to the beginning. Should I commit it? Brian --- assemble.pl 17 Jun 2002 03:18:17 - 1.74 +++ assemble.pl 22 Jun 2002 02:39:15 - -227,7 +227,9 my $reg_re = qr([INSP]\d+); my $num_re = qr([-+]?\d+(\.\d+([eE][-+]?\d+)?)?); - for({$self-{cur_contents}}) { + my todo=@{$self-{cur_contents}}; + while(scalar(todo)) { +$_=shift(todo); $line++; # -263,16 +265,19 elsif(/^\.include \s+ ([^]+) /x) {# .include {file} -# if(-e $1) { -#open FOO, $1; -#while(FOO) { -# chomp; -#} -#close FOO; -# } -# else { -#print STDERR Couldn't open '$1' for inclusion at line $line: $!.\n; -# } + if(-e $1) { +open FOO, $1; +my include; +while(FOO) { + chomp; + push(include,$_); +} +unshift(todo,include); +close FOO; + } + else { +print STDERR Couldn't open '$1' for inclusion at line $line: $!.\n; + } } elsif(/^\.macro\s+ ($label_re) \s*
Re: quicksort in pasm
On Fri, 2002-05-24 at 15:10, Sean O'Rourke wrote: I was starting with a very simple test to decide how to determine where the memory overuse was coming from, I'm actually looking at this now as well, though with zip2.pasm instead of quicksort. What I've found is that because zip constructs the result with incremental packing, and since the 3-argument pack is implemented through string_concat, which allocates a new string every time, it chews through vast quantities of memory. At some point, these buffers (which are now ~45k each) stop getting freed, and Parrot grows from 10M to about 200M before my laptop dies a horrible death. Sort does things much differently, but I wouldn't be surprised if it's hittng the same things-not-being-reclaimed bug as zip. [interesting sorting results] Am I missing something somewhere? You've probably got it in EBCDIC mode or something. That or it's a randomized quicksort, which is supposed to be faster. /s Actually I was getting the wrong results because I was using ge le to test the bounds when I should have used gt lt. The attached quicksort fixes it. Its still _huge_ when the memory runs. Looking at the profile data, it looks like Parrot_pushs is the big time user... % cumulative self self total time seconds secondscalls ns/call ns/call name 66.67 0.02 0.02 singlebyte_decode 33.33 0.03 0.01 646 15479.88 15479.88 Parrot_pushs Of course, if I understood the calling convention, I might get away with less pushs's :) Brian
Re: quicksort in pasm
Crud, I forgot to attach the quicksort in the last one... Brian On Sat, 2002-05-25 at 21:17, brian wheeler wrote: On Fri, 2002-05-24 at 15:10, Sean O'Rourke wrote: I was starting with a very simple test to decide how to determine where the memory overuse was coming from, I'm actually looking at this now as well, though with zip2.pasm instead of quicksort. What I've found is that because zip constructs the result with incremental packing, and since the 3-argument pack is implemented through string_concat, which allocates a new string every time, it chews through vast quantities of memory. At some point, these buffers (which are now ~45k each) stop getting freed, and Parrot grows from 10M to about 200M before my laptop dies a horrible death. Sort does things much differently, but I wouldn't be surprised if it's hittng the same things-not-being-reclaimed bug as zip. [interesting sorting results] Am I missing something somewhere? You've probably got it in EBCDIC mode or something. That or it's a randomized quicksort, which is supposed to be faster. /s Actually I was getting the wrong results because I was using ge le to test the bounds when I should have used gt lt. The attached quicksort fixes it. Its still _huge_ when the memory runs. Looking at the profile data, it looks like Parrot_pushs is the big time user... % cumulative self self total time seconds secondscalls ns/call ns/call name 66.67 0.02 0.02 singlebyte_decode 33.33 0.03 0.01 646 15479.88 15479.88 Parrot_pushs Of course, if I understood the calling convention, I might get away with less pushs's :) Brian # # quicksort.pasm # # Author: Brian Wheeler ([EMAIL PROTECTED]) # # Usage: #./parrot quicksort.pbc /file/to/quicksort # main: new P0,PerlArray set I1,0 m1: readlineS0,0 length I0,S0 eq I0,0,m2 chopn S0,1 set_keyed P0,I1,S0 inc I1 branch m1 m2: set I0,0 set I1,P0 dec I1 bsr quicksort m3: get_keyed S0,P0,I0 print S0 print \n inc I0 le I0,I1,m3 end # # quicksort # params: # I0 low # I1 high # P0 array to sort # returns: # nothing quicksort: pushi print Starting quicksort with print I0 printand print I1 print \n set I11,I1 # High=high; set I10,I0 # Low=low; le I1,I0,q1# if(highlow) { bsr partition # pivot=partition(p,low,high); set I0,I10 # quicksort(p,Low,pivot-1) set I1,I2 dec I1 bsr quicksort set I0,I2 # quicksort(p,pivot+1,High) inc I0 set I1,I11 bsr quicksort q1: popi# } ret # # partition # params: # I0 low # I1 high # P0 array to sort # returns: # I2 pivot partition: pushi pushs get_keyed S2,P0,I0# S2 is pivot_item set I2,I0 # I2 is pivot index set I10,I0 # I10 is left set I11,I1 # I11 is right p1: ge I10,I11,p1e # while(left right) { p2: get_keyed S0,P0,I10 # while(p[left] = pivot_item gt S0,S2,p2e gt I10,I11,p2e # left = right) { inc I10 # left++; branch p2 # } p2e: p3: get_keyed S1,P0,I11 # while(p[right] pivot_item le S1,S2,p3e lt I11,I10,p3e #right = left) { dec I11 # right--; branch p3 # } p3e: ge I10,I11,p4 # if(left right) { get_keyed S3,P0,I10 # swap(p,left,right); get_keyed S4,P0,I11 set_keyed P0,I11,S3 set_keyed P0,I10,S4 p4: # } branch p1 # } p1e:get_keyed S3,P0,I11 # p[low]=p[right]; set_keyed P0,I0,S3 set_keyed P0,I11,S2 # p[right
quicksort in pasm
1 0.00 0.02 mem_setup_allocator [41] 0.0 0.35 0.001 0.00 0.00 mem_sys_realloc [125] 0.0 0.35 0.001 0.00 0.02 new_hash [44] 0.0 0.35 0.001 0.00 0.00 new_sized_resource_pool [60] 0.0 0.35 0.001 0.00 0.01 new_tracked_header [55] 0.0 0.35 0.001 0.00 0.00 parseflags [126] 0.0 0.35 0.001 0.00 349.75 runops [3] 0.0 0.35 0.001 0.00 349.75 runops_cgoto_core [4] 0.0 0.35 0.001 0.00 349.75 runops_generic [5] 0.0 0.35 0.001 0.00 0.00 sized_index [127] 0.0 0.35 0.001 0.00 0.00 string_grow [64] 0.0 0.35 0.001 0.00 0.00 string_init [128] I don't see anything that really stands out (unlike earlier builds where string_make was taking the most time) Any thoughts? Brian # # quicksort.pasm # # Author: Brian Wheeler ([EMAIL PROTECTED]) # # Usage: #./parrot quicksort.pbc /file/to/quicksort # main: new P0,PerlArray set I1,0 m1: readlineS0,0 length I0,S0 eq I0,0,m2 chopn S0,1 set_keyed P0,I1,S0 inc I1 branch m1 m2: set I0,0 set I1,P0 dec I1 bsr quicksort m3: get_keyed S0,P0,I0 print S0 print \n inc I0 le I0,I1,m3 end # # quicksort # params: # I0 low # I1 high # P0 array to sort # returns: # nothing quicksort: pushi print Starting quicksort with print I0 printand print I1 print \n set I11,I1 # High=high; set I10,I0 # Low=low; le I1,I0,q1# if(highlow) { bsr partition # pivot=partition(p,low,high); set I0,I10 # quicksort(p,Low,pivot-1) set I1,I2 dec I1 bsr quicksort set I0,I2 # quicksort(p,pivot+1,High) inc I0 set I1,I11 bsr quicksort q1: popi# } ret # # partition # params: # I0 low # I1 high # P0 array to sort # returns: # I2 pivot partition: pushi pushs get_keyed S2,P0,I0# S2 is pivot_item set I2,I0 # I2 is pivot index set I10,I0 # I10 is left set I11,I1 # I11 is right p1: ge I10,I11,p1e # while(left right) { p2: get_keyed S0,P0,I10 # while(p[left] = pivot_item gt S0,S2,p2e ge I10,I11,p2e # left right) { inc I10 # left++; branch p2 # } p2e: p3: get_keyed S1,P0,I11 # while(p[right] pivot_item le S1,S2,p3e le I11,I10,p3e #right left) { dec I11 # right--; branch p3 # } p3e: ge I10,I11,p4 # if(left right) { get_keyed S3,P0,I10 # swap(p,left,right); get_keyed S4,P0,I11 set_keyed P0,I11,S3 set_keyed P0,I10,S4 p4: # } branch p1 # } p1e:get_keyed S3,P0,I11 # p[low]=p[right]; set_keyed P0,I0,S3 set_keyed P0,I11,S2 # p[right]=pivot_item; saveI11 pops popi restore I2 # return right; ret
Non-vtable functionality on PMCs?
I've been trying to catch up with parrot again (darn it, babies take more time than I thought :) and I've come up with a question... how do you do other things to PMCs that aren't normal ops? In particular, I was wondering about shift/unshift, push/pop on the PerlArray PMC. Am I missing something obvious? Brian
Re: .NET CLR and Parrot
On Sat, 2002-02-23 at 13:12, Dan Sugalski wrote: At 12:22 PM -0500 2/23/02, Melvin Smith wrote: At 11:53 AM 2/23/2002 +, Simon Cozens wrote: I was very lucky recently to attend a talk by Ganesh Sittampalam introducing Microsoft .NET and the Common Language Runtime. A lot of what CLR is trying to do is quite similar to what we're doing with Parrot, so I thought it would be a good idea to briefly recap what's going on with CLR. I've read several specs on CLR and ILAsm; there are some good ideas. rant I wouldn't want Parrot to look like ILA but I wish we would at least have more dialogue on things like pbc format, adding directives to the language, etc. Then lets start. I've not been giving the assembler much attention mainly because I've been assuming that it'll be marginalized quickly, but I think that assumption's a bad one. We also need a PDD for the bytecode file format. We need notation for globals, notation for metadata about objects and types, etc. etc. but frankly the last time I looked at the assembler I sort of got lost. You're not alone here. The assembler gives me headaches, and I think it's stymied Simon recently as well. ;) me too [uh oh, I wrote bits of it!] I've been playing around with a new assembler that might be a bit better laid out. Here's what I've got done so far: * object oriented: * you can load more than one assembler at once * assembly can be done in parts, i.e. $a=new Assembler; $a-assemble(set I0,3); $a-assemble(set I1,4); is the same as: $a=new Assembler; $a-assemble(set I0,3\nset I1,4); * the preprocessor is a separate routine * directives must be in upper case (per PDD) * local labels, plus the ability to use local labels from outside the scope: global_label.local_label from anywhere will find the right one. * argument types are fully determined prior to looking up signatures I've not got these things finished * label arithmetic [is this really needed?] * macros * actual bytecode output I just kind of set it aside because I wasn't sure if it was really needed. I can dig it back up if there is interest... Brian
Re: We have PMCs. Time to start work.
On Fri, 2001-11-23 at 13:41, Simon Cozens wrote: On Fri, Nov 23, 2001 at 06:04:29PM +, Simon Cozens wrote: * Rewrite mops.pasm to use integer PMCs, and compare the speeds. I couldn't wait. :) % ../../test_prog mops.pbc Iterations:1 Estimated ops: 2 Elapsed time: 9.948440 M op/s:20.103654 % ../../test_prog mops_p.pbc Iterations:1 Estimated ops: 2 done Elapsed time: 20.994231 M op/s:9.526427 I don't get it. What kind of machine was that on? Here are my numbers: [bdwheele@thor parrot]$ ./test_prog mops.pbc Iterations:1 Estimated ops: 2 Elapsed time: 11.016154 M op/s:18.155156 [bdwheele@thor parrot]$ ./test_prog mops_p.pbc Iterations:1 Estimated ops: 2 done Elapsed time: 14.288639 M op/s:13.997134 My I-regs one is slower than yours, and my P-regs one is faster...this is on a P4/1.7G under Linux 2.4.15pre8 Any ideas? Brian
Re: sizeof(INTVAL), sizeof(void*), sizeof(opcode_t)
On Tue, 2001-11-20 at 12:19, Ken Fox wrote: James Mastros wrote: In byteswapping the bytecode ... I propose that we make INTVAL and opcode_t the same size, and gaurrenteed to be able to hold a void*. It sounds like you want portable byte code. Is that a goal? It seems like we can have either mmap'able byte code or portable byte code, but not both. Personally, I'd rather have portable byte code because memory is cheap and self-modifiying byte code opens up a lot of optimization potential. I know others disagree. Hmm. It wouldn't necessarily be portable, though it probably would be on machines with the same size endianness. So, on an alpha, you'd have: sizeof(INTVAL)=sizeof(opcode_t)=sizeof(void *)=64 bits whereas on x86 (and other 32 bit machines) sizeof(INTVAL)=sizeof(opcode_t)=sizeof(void *)=32 bits and on the Parrot/C-64 VM you'd have sizeof(INTVAL)=sizeof(opcode_t)=sizeof(void *)=16 bits Is that right? Brian Are we looking at two different byte code formats? Dan? - Ken
RE: Size of integer register vs sizeof(void *)
On Mon, 2001-11-19 at 12:43, Hong Zhang wrote: Are there any cases where a void * cannot be placed into an integer register? It seems like it shouldn't happen, especially since jump and jsr are supposed to take an integer register and they point to a host-machine-address... What register are you talking about? The 16-bit x86 has 16-bit integer registers, but uses segment:offset as address. So the sizeof(void*) may be 32-bit, plus you have to deal with far, near, huge, ... (total 6 types) of pointers. Hong Sorry: should be size of parrot integer vs sizeof(void *) Should have clarified that :) Brian
Re: sizeof(INTVAL), sizeof(void*), sizeof(opcode_t)
On Mon, 2001-11-19 at 19:59, James Mastros wrote: Hey all. In parellel to splitting out features (yeah, I like that better then platforms too) (which is going well this time, I think (I'm being a lot better about checking against clean checkouts, but having problems thinking of a good generic interface for open() and friends), I'm thinking about a new packfile format. I'm comeing across a common problem to both of them. That is, that the sizeof things don't come in nice easy units. In byteswapping the bytecode, it's a Very Good Thing if you can just treat it as an array of some specifc type, so you don't have to figure out how man args each opcode takes (this may, in fact, be impossible, if oplibs are dynamicly loadable during the bytecode's runtime). In the feature-splitting, for example in file opening, some interfaces (POSIX open(), for example) want to return an integer (the fd), and some (win32 CreateFile(), for example) want to return a void* (the file handle). (This is a pointer to a kernel-allocated structure that will cause a segfault to directly access, BTW, so no GC problems here.) I propose that we make INTVAL and opcode_t the same size, and gaurrenteed to be able to hold a void*. Seems reasonable to me, since jsr and jump are slated to use an I register to jump to a host-machine-address and start interpreting there. Brian
Size of integer register vs sizeof(void *)
Are there any cases where a void * cannot be placed into an integer register? It seems like it shouldn't happen, especially since jump and jsr are supposed to take an integer register and they point to a host-machine-address... Brian
Ooops, sorry for that blank log message.
Darn it, I fat fingered the log message. This is a fix which changes the way op variants are handled. The old method forgot the last variant, so thing(i,i|ic,i|ic) would generate: thing(i,i,i) thing(i,i,ic) thing(i,ic,i) but not thing(i,ic,ic) The new one does. Brian
[patch] changes to trace
This patch makes trace a little more useful. It prints the constant referred to, as well as the value of the register being accessed. This string reverse program trace 1 set S0,Hello world set S1, set S2, length I0,S0 dec I0 $loop: substr S2,S0,I0,1 concat S1,S2 dec I0 ge I0,0,$loop set S0,S1 end produces this output (I removed some spaces on PC=16 to stop the word wrap in my mailer): PC=2; OP=9 (set_s_sc); ARGS=(S0=(null), Hello world) PC=5; OP=9 (set_s_sc); ARGS=(S1=(null), ) PC=8; OP=9 (set_s_sc); ARGS=(S2=(null), ) PC=11; OP=115 (length_i_s); ARGS=(I0=0, S0=Hello world) PC=14; OP=66 (dec_i); ARGS=(I0=11) PC=16; OP=117 (substr_s_s_i_ic); ARGS=(S2=, S0=Hello world, I0=10,1) PC=21; OP=110 (concat_s_s); ARGS=(S1=, S2=d) PC=24; OP=66 (dec_i); ARGS=(I0=10) PC=26; OP=43 (ge_i_ic_ic); ARGS=(I0=9, 0, -10) PC=16; OP=117 (substr_s_s_i_ic); ARGS=(S2=d, S0=Hello world, I0=9,1) PC=21; OP=110 (concat_s_s); ARGS=(S1=d, S2=l) PC=24; OP=66 (dec_i); ARGS=(I0=9) PC=26; OP=43 (ge_i_ic_ic); ARGS=(I0=8, 0, -10) PC=16; OP=117 (substr_s_s_i_ic); ARGS=(S2=l, S0=Hello world, I0=8,1) PC=21; OP=110 (concat_s_s); ARGS=(S1=dl, S2=r) PC=24; OP=66 (dec_i); ARGS=(I0=8) PC=26; OP=43 (ge_i_ic_ic); ARGS=(I0=7, 0, -10) PC=16; OP=117 (substr_s_s_i_ic); ARGS=(S2=r, S0=Hello world, I0=7,1) PC=21; OP=110 (concat_s_s); ARGS=(S1=dlr, S2=o) PC=24; OP=66 (dec_i); ARGS=(I0=7) PC=26; OP=43 (ge_i_ic_ic); ARGS=(I0=6, 0, -10) PC=16; OP=117 (substr_s_s_i_ic); ARGS=(S2=o, S0=Hello world, I0=6,1) PC=21; OP=110 (concat_s_s); ARGS=(S1=dlro, S2=w) PC=24; OP=66 (dec_i); ARGS=(I0=6) PC=26; OP=43 (ge_i_ic_ic); ARGS=(I0=5, 0, -10) PC=16; OP=117 (substr_s_s_i_ic); ARGS=(S2=w, S0=Hello world, I0=5,1) PC=21; OP=110 (concat_s_s); ARGS=(S1=dlrow, S2= ) PC=24; OP=66 (dec_i); ARGS=(I0=5) PC=26; OP=43 (ge_i_ic_ic); ARGS=(I0=4, 0, -10) PC=16; OP=117 (substr_s_s_i_ic); ARGS=(S2= , S0=Hello world, I0=4,1) PC=21; OP=110 (concat_s_s); ARGS=(S1=dlrow , S2=o) PC=24; OP=66 (dec_i); ARGS=(I0=4) PC=26; OP=43 (ge_i_ic_ic); ARGS=(I0=3, 0, -10) PC=16; OP=117 (substr_s_s_i_ic); ARGS=(S2=o, S0=Hello world, I0=3 1) PC=21; OP=110 (concat_s_s); ARGS=(S1=dlrow o, S2=l) PC=24; OP=66 (dec_i); ARGS=(I0=3) PC=26; OP=43 (ge_i_ic_ic); ARGS=(I0=2, 0, -10) PC=16; OP=117 (substr_s_s_i_ic); ARGS=(S2=l, S0=Hello world, I0=2,1) PC=21; OP=110 (concat_s_s); ARGS=(S1=dlrow ol, S2=l) PC=24; OP=66 (dec_i); ARGS=(I0=2) PC=26; OP=43 (ge_i_ic_ic); ARGS=(I0=1, 0, -10) PC=16; OP=117 (substr_s_s_i_ic); ARGS=(S2=l, S0=Hello world, I0=1,1) PC=21; OP=110 (concat_s_s); ARGS=(S1=dlrow oll, S2=e) PC=24; OP=66 (dec_i); ARGS=(I0=1) PC=26; OP=43 (ge_i_ic_ic); ARGS=(I0=0, 0, -10) PC=16; OP=117 (substr_s_s_i_ic); ARGS=(S2=e, S0=Hello world, I0=0,1) PC=21; OP=110 (concat_s_s); ARGS=(S1=dlrow olle, S2=H) PC=24; OP=66 (dec_i); ARGS=(I0=0) PC=26; OP=43 (ge_i_ic_ic); ARGS=(I0=-1, 0, -10) PC=30; OP=8 (set_s_s); ARGS=(S0=Hello world, S1=dlrow olleH) PC=33; OP=0 (end) If it looks reasonable, I'll commit it. Brian Index: interpreter.c === RCS file: /home/perlcvs/parrot/interpreter.c,v retrieving revision 1.29 diff -u -r1.29 interpreter.c --- interpreter.c 2001/10/14 23:47:39 1.29 +++ interpreter.c 2001/10/17 20:15:42 @@ -81,7 +81,38 @@ fprintf(stderr, ; ARGS=(); for(i = 1; i interpreter-opcode_info[*pc].arg_count; i++) { if (i 1) { fprintf(stderr, , ); } -fprintf(stderr, %ld, (long) *(pc + i)); +switch(interpreter-opcode_info[*pc].types[i]) { +case PARROT_ARG_IC: +fprintf(stderr, %ld, (long) *(pc + i)); +break; +case PARROT_ARG_NC: +fprintf(stderr, %f, interpreter-code-const_table-constants[*(pc ++ i)]-number); +break; +case PARROT_ARG_PC: +/* what is a PMC constant look like? */ +fprintf(stderr, %ld, (long) *(pc + i)); +break; +case PARROT_ARG_SC: +fprintf(stderr, \%s\, +interpreter-code-const_table-constants[*(pc + i)]-string-bufstart); +break; +case PARROT_ARG_I: +fprintf(stderr, I%ld=%ld, (long) *(pc + i), (long) +interpreter-int_reg-registers[*(pc + i)]); +break; +case PARROT_ARG_N: +fprintf(stderr, N%ld=%f, (long) *(pc + i), +interpreter-num_reg-registers[*(pc + i)]); +break; +case PARROT_ARG_P: +/* what does a PMC constant look like? */ +fprintf(stderr, P%ld=???, (long) *(pc + i)); +break; +case PARROT_ARG_S: +if(interpreter-string_reg-registers[*(pc + i)]) { +fprintf(stderr, S%ld=\%s\, (long) *(pc + i), +interpreter-string_reg-registers[*(pc + i)]-bufstart); +
Re: string weirdness
On Mon, 2001-10-15 at 21:12, Dan Sugalski wrote: On 15 Oct 2001, Brian Wheeler wrote: With the addition of clone, I started writing some generic routines which might be useful (index,lc,uc,reverse,abs,tr,etc)...and I came across some weirdness: doing: save S0 restore S1 (since there's no set S1,S0) binds the registers together, so a change to one is a change to both...which doesn't happen on int registers. Right. Save on a string register pushes the pointer to the string structure in the register onto the stack. The same thing happens with PMCs, or will when they're implemented. The assumption is that, when you push a register onto the stack, you'll then stomp on the contents of the register. (Rather than what the register points to...) Otherwise a push would need to create a copy of the string structure and a copy of the string contents. Dan Ok, that's fair... With that in mind I've implemented set Sx,Sy which does a string copy. This is what I was after in the first place, but then noticed the unexpected behavior of save/restore of S registers. It makes sense now :) Brian
missing opcodes
After writing a couple of library functions, I realized that we have to do alot of data shuffling to do common tasks. Reserving a register to hold 0 or 1 and/or filling up registers with constants just takes up cpu time and could better be handled if the opcodes took constants directly as well as registers. substr was really annoying in this regard: set I0,0 set I1,1 substr S0,S1,I0,I1 pre-allocating the registers (I.e. I31=1, I30=0) helped a bit, but it took away from the size of the general purpose register pool, and obscured the code. So, with that in mind, here are some opcodes to fill in the blanks. I tried to make sure that none of the opcodes below could be computed in advance (i.e. there's no add(i,ic,ic)) and when the order of the args doesn't matter (in add, for example), the constant comes last. ** Suggested Opcodes to minimize data shuffling ** -- arithmetic operations [unification of add inc, except in the 'inc I0' case] add(i,i,ic) add(n,n,nc) cmod(i,i,ic) cmod(i,ic,i) cmod(n,n,nc) cmod(n,nc,n) div(i,i,ic) div(i,ic,i) div(n,n,nc) div(n,nc,n) mod(i,i,ic) mod(i,ic,i) mod(n,n,nc) mod(n,n,nc) mul(i,i,ic) mul(n,n,nc) pow(n,i,ic) pow(n,ic,i) pow(n,i,nc) pow(n,ic,n) pow(n,n,ic) pow(n,nc,i) pow(n,n,nc) pow(n,nc,n) [unification of sub dec, except in the 'dec I0' case] sub(i,i,ic) sub(n,n,nc) -- string operations chopn(s,i) concat(s,sc) substr(s,s,i,ic) substr(s,s,ic,i) substr(s,s,ic,ic) substr(s,sc,i,i) substr(s,sc,i,ic) substr(s,sc,ic,i) -- trancendental operations atan(n,i,ic) atan(n,ic,i) atan(n,i,nc) atan(n,ic,n) atan(n,n,ic) atan(n,nc,i) atan(n,n,nc) atan(n,nc,n) -- bitwise logical operations and(i,i,ic) or(i,i,ic) shl(i,i,i) shr(i,i,i) xor(i,i,ic) WHEW! That's alot of opcodes. However, it would lead to more compact (and probably faster) bytecode to be able to avoid having to load registers all the time with values you know in advance. Thoughts? Brian
[PATCH] missing opcodes
Here's a patch which adds the 'missing' opcodes from the earlier email. It also adds the 3 arg variant of concat. Dan/Simon/Anyone, if it seems ok, I'll commit it, but since it adds 52 op variants, I wasn't sure if it would be ok. Brian Ops follow +AUTO_OP add(i, i, ic) { +AUTO_OP add(n, n, nc) { +AUTO_OP cmod(i, i, ic) { +AUTO_OP cmod(i, ic, i) { +AUTO_OP cmod(n, n, nc) { +AUTO_OP cmod(n, nc, n) { +AUTO_OP div(i, i, ic) { +AUTO_OP div(i, ic, i) { +AUTO_OP div(n, n, nc) { +AUTO_OP div(n, nc, n) { +AUTO_OP mod(i, i, ic) { +AUTO_OP mod(i, ic, i) { +AUTO_OP mod(n, n, nc) { +AUTO_OP mod(n, nc, n) { +AUTO_OP mul(i, i, ic) { +AUTO_OP mul(n, n, nc) { +AUTO_OP pow(n, i, ic) { +AUTO_OP pow(n, ic, i) { +AUTO_OP pow(n, i, nc) { +AUTO_OP pow(n, ic, n) { +AUTO_OP pow(n, n, ic) { +AUTO_OP pow(n, nc, i) { +AUTO_OP pow(n, n, nc) { +AUTO_OP pow(n, nc, n) { +AUTO_OP sub(i, i, ic) { +AUTO_OP sub(i, ic, i) { +AUTO_OP sub(n, n, nc) { +AUTO_OP sub(n, nc, n) { +AUTO_OP chopn(s, i) { +AUTO_OP concat(s, sc) { +AUTO_OP concat(s, s, s) { +AUTO_OP concat(s, s, sc) { +AUTO_OP concat(s, sc, s) { +AUTO_OP substr(s, s, i, ic) { +AUTO_OP substr(s, s, ic, i) { +AUTO_OP substr(s, s, ic, ic) { +AUTO_OP substr(s, sc, i, i) { +AUTO_OP substr(s, sc, i, ic) { +AUTO_OP substr(s, sc, ic, i) { +AUTO_OP atan(n, i, ic) { +AUTO_OP atan(n, ic, i) { +AUTO_OP atan(n, i, nc) { +AUTO_OP atan(n, ic, n) { +AUTO_OP atan(n, n, ic) { +AUTO_OP atan(n, nc, i) { +AUTO_OP atan(n, n, nc) { +AUTO_OP atan(n, nc, n) { +AUTO_OP and(i, i, ic) { +AUTO_OP or(i, i, ic) { +AUTO_OP shl(i, i, i) { +AUTO_OP shr(i, i, i) { +AUTO_OP xor(i, i, ic) { Index: core.ops === RCS file: /home/perlcvs/parrot/core.ops,v retrieving revision 1.8 diff -u -r1.8 core.ops --- core.ops2001/10/16 14:12:41 1.8 +++ core.ops2001/10/16 15:35:36 @@ -661,8 +661,12 @@ =item Badd(i, i, i) +=item Badd(i, i, ic) + =item Badd(n, n, n) +=item Badd(n, n, nc) + Set $1 to the sum of $2 and $3. =cut @@ -670,16 +674,28 @@ AUTO_OP add(i, i, i) { $1 = $2 + $3; } + +AUTO_OP add(i, i, ic) { + $1 = $2 + $3; +} AUTO_OP add(n, n, n) { $1 = $2 + $3; } +AUTO_OP add(n, n, nc) { + $1 = $2 + $3; +} + =item Bcmod(i, i, i) +=item Bcmod(i, i, ic) + +=item Bcmod(i, ic, i) + NOTE: This uncorrected mod algorithm uses the C language's built-in mod operator (x % y), which is @@ -716,11 +732,23 @@ $1 = $2 % $3; } +AUTO_OP cmod(i, i, ic) { + $1 = $2 % $3; +} + +AUTO_OP cmod(i, ic, i) { + $1 = $2 % $3; +} + =item Bcmod(n, n, n) +=item Bcmod(n, n, nc) + +=item Bcmod(n, nc, n) + NOTE: This uncorrected mod algorithm uses the built-in C math library's fmod() function, which computes @@ -752,7 +780,15 @@ $1 = fmod($2, $3); } +AUTO_OP cmod(n, n, nc) { + $1 = fmod($2, $3); +} +AUTO_OP cmod(n, nc, n) { + $1 = fmod($2, $3); +} + + =item Bdec(i) @@ -795,8 +831,16 @@ =item Bdiv(i, i, i) +=item Bdiv(i, i, ic) + +=item Bdiv(i, ic, i) + =item Bdiv(n, n, n) +=item Bdiv(n, n, nc) + +=item Bdiv(n, nc, n) + Set $1 to the quotient of $2 divided by $3. In the case of INTVAL division, the result is truncated (NOT rounded or floored). @@ -806,11 +850,27 @@ $1 = $2 / $3; } +AUTO_OP div(i, i, ic) { + $1 = $2 / $3; +} + +AUTO_OP div(i, ic, i) { + $1 = $2 / $3; +} + AUTO_OP div(n, n, n) { $1 = $2 / $3; } +AUTO_OP div(n, n, nc) { + $1 = $2 / $3; +} + +AUTO_OP div(n, nc, n) { + $1 = $2 / $3; +} + =item Binc(i) @@ -853,6 +913,10 @@ =item Bmod(i, i, i) +=item Bmod(i, i, ic) + +=item Bmod(i, ic, i) + NOTE: This corrected mod algorithm is based on the C code on page 70 of [1]. Assuming correct behavior of C's built-in mod operator (%) with positive arguments, this algorithm implements a mathematically convenient @@ -899,11 +963,67 @@ $1 = r; } +AUTO_OP mod(i, i, ic) { + INTVAL y = $2; + INTVAL z = $3; + INTVAL s = 0; + INTVAL r; + if (z == 0) { +r = y; + } + else { +if (y 0) { s += 2; y = -y; } +if (z 0) { s += 1; z = -z; } + +r = y % z; + +switch (s) { + case 0 + 0:break; + case 0 + 1: r = r - z; break; + case 2 + 0: r = z - r; break; + case 2 + 1: r = -r;break; +} + } + + $1 = r; +} + +AUTO_OP mod(i, ic, i) { + INTVAL y = $2; + INTVAL z = $3; + INTVAL s = 0; + INTVAL r; + + if (z == 0) { +r = y; + } + else { +if (y 0) { s += 2; y = -y; } +if (z 0) { s += 1; z = -z; } + +r = y % z; + +switch (s) { + case 0 + 0:break; + case 0 + 1: r = r - z; break; + case 2 + 0: r = z - r; break; + case 2 + 1: r = -r;break; +} + } + + $1 = r; +} + + =item Bmod(n, n, n) +=item Bmod(n, n, nc) + +=item Bmod(n, nc, n) + NOTE:
Re: missing opcodes
Heheh, I should read all of my mail before I send new ones. I'll commit it shortly. Brian On Tue, 2001-10-16 at 10:36, Dan Sugalski wrote: At 10:04 AM 10/16/2001 -0500, Brian Wheeler wrote: Thoughts? Go for it. This sort of thing's just fine. I know I made a NO NEW OPCODES WITHOUT CLEARANCE statement a while ago, but that really needs clarification. The thing that's a no-no is new 'high-level' opcodes--basically anything that'd require a new entry in parrot_assembly.pod. Opcode variants--tan which takes a constant, say--are fine. We may, at some point, prune out the less-used opcodes so we don't have a huge table of 'em if that turns out to be a problem (like if there's a performance knee at 255/256 or 511/512 base opcodes in the switch) but for the moment I'm not overly worried about the excess of opcodes if it turns out that having special ones is worth it. It may, of course, turn out that: set N0, 12.4 tan N1, N0 ends up being faster than tan N1, 12.4 in some cases (like if we almost never use tan of a constant but do use tan of a register a lot, thus having to fault in the code for the constant tan every time we use it) but that's something that'll have to do some performance testing on to know for sure. Dan --it's like this--- Dan Sugalski even samurai [EMAIL PROTECTED] have teddy bears and even teddy bears get drunk
substr broken?
I'm getting some weird results when using substr. Here's my test program: set S0,Hello world print Arg to Reverse: print S0 print \n set S1, set S2, length I0,S0 dec I0 $loop: substr S2,S0,I0,1 print string so far: print S1 print , print S2 print \n concat S1,S2 dec I0 ge I0,0,$loop set S0,S1 end Here's the output: Arg to Reverse: Hello world string so far:d, d string so far:l, l string so far:r, r string so far:o, o string so far:w, w string so far: , string so far:o, o string so far:l, l string so far:l, l string so far:e, e string so far:H, H Why is S1 set on the first iteration, before the concat? Brian
Re: substr broken?
On Tue, 2001-10-16 at 13:04, Alex Gough wrote: On Tue, 16 Oct 2001, Alex Gough wrote: On 16 Oct 2001, Brian Wheeler wrote: I'm getting some weird results when using substr. Here's my test program: It's probably something wrong with the constant table or the assembly phase, if the script is changed so that S1 is set to -, say, it does more what I expect. Indeed it is. My CVS access isn't working (ask??) so here's a new test to fail: Actually its the set s,sc operation. I've modified it to use string_copy. I also added your test and committed them a few minutes ago. Thanks! This one had been really stumping me! Brian Alex Gough Index: parrot/t/op/string.t === RCS file: /home/perlcvs/parrot/t/op/string.t,v retrieving revision 1.11 diff -u -r1.11 string.t --- parrot/t/op/string.t 2001/10/14 00:43:50 1.11 +++ parrot/t/op/string.t 2001/10/16 18:01:34 @@ -1,6 +1,6 @@ #! perl -w -use Parrot::Test tests = 23; +use Parrot::Test tests = 24; output_is( 'CODE', OUTPUT, set_s_sc ); set S4, JAPH\n @@ -296,6 +296,21 @@ ok OUTPUT +output_is('CODE', OUTPUT, same constant twice bug); + set S0, + set S1, + set S2, foo + concat S1,S1,S2 + print S1 + print S0 + print \n +CODE +foo +OUTPUT + + +# Helper subs + # Set all string registers to values given by $_[0](reg num) sub set_str_regs { my $code = shift;
sample code
I've written a library of sorts which contains useful routines such as: _absi - absolute value of I0 _absn - absolute value of I0 _chomp - chomp a string (S0) with a trailing newline _chr - create a string (S0) with the ascii value of I0 _exit - terminate with a return code of I0 _hex - return in I0 the decimal value of the hex string in S0 _index - return the index (in I0) of S1 in S0 starting with I0 _lc - lowercase S0 _lcfirst - lowercase the first character in S0 _ord - return (in I0) the ascii value of the first character in S0 _reverse - reverse the string in S0 _rindex - do a reverse index of S1 in S0, returing the value in I0 _tr - transliterate S0 with S1 as the source and S2 as the dest _uc - uppercase S0 _ucfirst - uppercase the first char in S0 Here's an example which would uppercase only the last character of a string: set S0,Hello world bsr _lc bsr _reverse bsr _ucfirst bsr _reverse print This should have only the 'd' uppercase: print S0 print \n Do you think this library (called utils.pasm) should be included in the examples directory? Brian # # utils.pasm # # Copyright (C) 2001 Yet Another Society. All rights reserved. # This program is free software. It is subject to the same # license as The Parrot Interpreter. # # $Id$ # # This contains a bunch of routines which can be used in other programs by # adding a line like this: # # include utils.pasm # # Unfortunately, since we don't have real libraries, you get the whole thing # whether you want it or not, but its a start. # # # The calling convention is register based, and starts with I0 for the first # integer argument, S0 for the first string argument, etc. Return value(s) # are placed in register 0 of whatever type is being returned. # _absi - the absolute value of I0 _absi: gt I0,0,$done mul I0,I0,-1 $done: ret # _absn - the absolute value of N0 _absn: gt N0,0,$done mul N0,N0,-1.0 $done: ret # _chomp - remove the last character of S0, if it is a newline _chomp: clones pushi length I0,S0 dec I0 substr S1,S0,I0,1 ne S1,\n,$done substr S0,S0,0,I0 $done: saveS0 popi pops restore S0 ret # _chr - return the character S0 for the argument I0 _chr: clonei pushs set S0, set S1,\x0\x1\x2\x3\x4\x5\x6\x7\x8\x9\xA\xB\xC\xD\xE\xF\x10\x11\x12\x13\x14\x15\x16\x17\x18\x19\x1A\x1B\x1C\x1D\x1E\x1F\x20\x21\x22\x23\x24\x25\x26\x27\x28\x29\x2A\x2B\x2C\x2D\x2E\x2F\x30\x31\x32\x33\x34\x35\x36\x37\x38\x39\x3A\x3B\x3C\x3D\x3E\x3F\x40\x41\x42\x43\x44\x45\x46\x47\x48\x49\x4A\x4B\x4C\x4D\x4E\x4F\x50\x51\x52\x53\x54\x55\x56\x57\x58\x59\x5A\x5B\x5C\x5D\x5E\x5F\x6A\x6B\x6C\x6D\x6E\x6F\x70\x71\x72\x73\x74\x75\x76\x77\x78\x79\x7A\x7B\x7C\x7D\x7E\x7F\x80\x81\x82\x83\x84\x85\x86\x87\x88\x89\x8A\x8B\x8C\x8D\x8E\x8F\x90\x91\x92\x93\x94\x95\x96\x97\x98\x99\x9A\x9B\x9C\x9D\x9E\x9F\xA0\xA1\xA2\xA3\xA4\xA5\xA6\xA7\xA8\xA9\xAA\xAB\xAC\xAD\xAE\xAF\xB0\xB1\xB2\xB3\xB4\xB5\xB6\xB7\xB8\xB9\xBA\xBB\xBC\xBD\xBE\xBF\xC0\xC1\xC2\xC3\xC4\xC5\xC6\xC7\xC8\xC9\xCA\xCB\xCC\xCD\xCE\xCF\xD0\xD1\xD2\xD3\xD4\xD5\xD6\xD7\xD8\xD9\xDA\xDB\xDC\xDD\xDE\xD\xE0\xE1\xE2\xE3\xE4\xE5\xE6\xE7\xE8\xE9\xEA\xEB\xEC\xED\xEE\xEF\xF0\xF1\xF2\xF3\xF4\xF5\xF6\xF7\xF8\xF9\xFA\xFB\xFC\xFD\xFE\xFF substr S0,S1,I0,1 saveS0 pops popi restore S0 ret # _exit - exit the program, indicating the status of I0 _exit: print Program terminated with result code print I0 print \n end # _hex - return the decimal value of the hex argument S0 _hex: clones pushi bsr _reverse bsr _uc length I28,S0 set S2,S0 set I29,0 set I30,0 set I31,1 set S0,0123456789ABCDEF set S1, $loop: substr S1,S2,I30,1 bsr _index mul I0,I0,I31 add I29,I29,I0 inc I30 mul I31,I31,16 lt I30,I28,$loop saveI29 popi pops restore I0 ret # _index - return the index of the substring S1 in the string S0, starting # with position I0. -1 is returned if it is not found _index: clones clonei set S2, length I31,S1 length I30,S0 $loop: substr S2,S0,I0,I31 eq S2,S1,$done inc I0 lt I0,I30,$loop set I0,-1 $done: saveI0 popi pops restore I0 ret # _lc - lowercase the string S0 _lc:clones set S2,abcdefghijklmnopqrstuvwxyz set S1,ABCDEFGHIJKLMNOPQRSTUVWXYZ bsr _tr saveS0 pops restore S0 ret # _lcfirst - lowercase the first character of S0 _lcfirst:
Re: RAM?
On Tue, 2001-10-16 at 15:02, James Mastros wrote: On Tue, 16 Oct 2001, Dan Sugalski wrote: That's one way to do it, sure. You can always look at a string as a bounded byte buffer. One of the core 'string' types is series of 8-bit bytes. We couldn't manage JPEG images too well without that. ;) Hm. How do you convert the bytes into integers so you can do arithmetic with them? (In other words, how do you write unpack('c') and friends.) I wrote a library in parrot which implemented ord() to do it. I was going to do pack/unpack but I haven't had a chance yet :) Seriously, though, what are you trying to accomplish with providing generic access to memory? Perhaps if you had a solid application in mind it'd make working out what would be needed to support it easier. I'm thinking of porting GCC, of course G. However, I'm thinking that pretty much any c-like language is going to want somthing like this. You are sick. :) Maybe you could compile the linux kernel down into parrot. I did think about writing a kernel module for writing drivers in parrot, but considering the response that the last person got for writing a script interface module to the kernel, I put the thought away :) Brian
Re: sample code
On Tue, 2001-10-16 at 15:58, James Mastros wrote: On Tue, 16 Oct 2001, Brian Wheeler wrote: _chr - create a string (S0) with the ascii value of I0 _ord - return (in I0) the ascii value of the first character in S0 There's /got/ to be a better way to write these. I propose making opcodes specificly to do these. This is an inner-loop kind of thing. Yeah, probably, but not with the opcodes we have so far. They were just kind of afterthoughts and to see if its actually possible to do it with the ops we have now :) Brian
clone operators
Because I was bored this evening, I implemented the clone operators. Dan? Brian # compute the factorialrecursively! # lets do it for the numbers 0 to 6 main: set I1,0 $loop: print fact of print I1 printis: set I0,I1 bsr fact print I0 print \n inc I1,1 eq I1,7,$done branch $loop $done: end # I0 is the number to compute fact: clonei lt I0,2,$is_one set I1,I0 dec I0,1 bsr fact mul I0,I0,I1 saveI0 branch $done $is_one: set I0,1 saveI0 $done: popi restore I0 ret ? test.pasm ? test.inc ? test.list ? fact.pasm ? interp_guts.c ? config.h ? basic_opcodes.c ? DCn.diff ? macro.pbc ? macro.list ? local_label.pbc ? test.pbc ? euclid.pbc ? macro.pasm ? test2.pbc ? test3.pbc ? mytest.pasm ? cequ.code ? assemble.pl-new_macros ? mytest.pbc ? op_info.c ? frame_test.pbc ? fact.pbc ? frame_test.pasm ? clone.diff ? include/parrot/op_info.h ? t/test.pbc ? t/test1.c ? t/test1 Index: core.ops === RCS file: /home/perlcvs/parrot/core.ops,v retrieving revision 1.4 diff -u -r1.4 core.ops --- core.ops2001/10/15 21:18:42 1.4 +++ core.ops2001/10/16 00:29:05 @@ -1678,6 +1678,35 @@ Parrot_push_p(interpreter); } + + +=item Bclonei() + +=item Bclonen() + +=item Bclonep() + +=item Bclones() + +Save all the registers of the type indicated in the name of the operation. + +=cut + +AUTO_OP clonei() { + Parrot_clone_i(interpreter); +} + +AUTO_OP clonen() { + Parrot_clone_n(interpreter); +} + +AUTO_OP clones() { + Parrot_clone_s(interpreter); +} + +AUTO_OP clonep() { + Parrot_clone_p(interpreter); +} =back Index: register.c === RCS file: /home/perlcvs/parrot/register.c,v retrieving revision 1.10 diff -u -r1.10 register.c --- register.c 2001/10/02 14:01:30 1.10 +++ register.c 2001/10/16 00:29:06 @@ -38,6 +38,39 @@ } } +/*=for api register Parrot_clone_i + pushes a new integer register frame on the frame stack and + copies the last frame to the current frame +*/ +void +Parrot_clone_i(struct Parrot_Interp *interpreter) { +struct IRegChunk *chunk_base; + +chunk_base = CHUNK_BASE(interpreter-int_reg); +/* Do we have any slots left in the current chunk? */ +if (chunk_base-free) { +interpreter-int_reg = chunk_base-IReg[chunk_base-used++]; +chunk_base-free--; +mem_sys_memcopy(chunk_base-IReg[chunk_base-used-1], +chunk_base-IReg[chunk_base-used-2], +sizeof(struct IReg)); +} +/* Nope, so plan B time. Allocate a new chunk of integer register frames */ +else { +struct IRegChunk *new_chunk; +new_chunk = mem_allocate_aligned(sizeof(struct IRegChunk)); +new_chunk-used = 1; +new_chunk-free = FRAMES_PER_INT_REG_CHUNK - 1; +new_chunk-next = NULL; +new_chunk-prev = chunk_base; +chunk_base-next = new_chunk; +mem_sys_memcopy(new_chunk-IReg[0], +chunk_base-IReg[chunk_base-used-1], +sizeof(struct IReg)); +interpreter-int_reg = new_chunk-IReg[0]; +} +} + /*=for api register Parrot_pop_i pops an integer register frame off of the frame stack */ @@ -107,6 +140,42 @@ } } +/*=for api register Parrot_clone_s + pushes a new string register frame on the frame stack and + copies the last frame to the current frame +*/ +void +Parrot_clone_s(struct Parrot_Interp *interpreter) { +struct SRegChunk *chunk_base; + +chunk_base = CHUNK_BASE(interpreter-string_reg); +/* Do we have any slots left in the current chunk? */ +if (chunk_base-free) { +interpreter-string_reg = chunk_base-SReg[chunk_base-used++]; +chunk_base-free--; +mem_sys_memcopy(chunk_base-SReg[chunk_base-used-1], +chunk_base-SReg[chunk_base-used-2], +sizeof(struct SReg)); +} +/* Nope, so plan B time. Allocate a new chunk of string register frames */ +else { +struct SRegChunk *new_chunk; +new_chunk = mem_allocate_aligned(sizeof(struct SRegChunk)); +new_chunk-used = 1; +new_chunk-free = FRAMES_PER_STR_REG_CHUNK - 1; +new_chunk-next = NULL; +new_chunk-prev = chunk_base; +chunk_base-next = new_chunk; +mem_sys_memcopy(new_chunk-SReg[0], +chunk_base-SReg[chunk_base-used-1], +sizeof(struct SReg)); +interpreter-string_reg = new_chunk-SReg[0]; +/* Gotta NULL them out as some string + functions depend on NULL strings */ +Parrot_clear_s(interpreter); +} +}
string weirdness
With the addition of clone, I started writing some generic routines which might be useful (index,lc,uc,reverse,abs,tr,etc)...and I came across some weirdness: doing: save S0 restore S1 (since there's no set S1,S0) binds the registers together, so a change to one is a change to both...which doesn't happen on int registers. In addition, I've got a test program which I cannot figure out why it fails...and I believe its related to the above problem. Here's the test code (sorry for the size): set S0,Hello World bsr _uc print UC is: print S0 print \n bsr _lc print LC is: print S0 print \n end # index - return the position (I0) of a substring (S1) within a string (S0) _index: clones pushi set I0,0 length I1,S1 length I2,S0 $loop: substr S2,S0,I0,I1 eq S1,S2,$done inc I0,1 lt I0,I2,$loop set I0,-1 branch $done $done: saveI0 popi pops restore I0 ret # tr - convert the string S0 by replacing chars in S1 with those in S2. _tr: clones pushi length I22,S2 length I21,S1 length I20,S0 set I0,0 substr S22,S2,I0,I22 # no set s,s substr S21,S1,I0,I21 substr S20,S0,I0,I20 set S28, # our result string. set I29,0 # zero set I28,1 # one set I27,0 $loop: substr S1,S20,I27,I28 substr S0,S21,I29,I21 bsr _index ne I0,-1,$found concat S28,S1 branch $next $found: substr S1,S22,I0,I28 concat S28,S1 $next: print building string: print S28 print , I27= print I27 print , I20= print I20 print \n inc I27,1 lt I27,I20,$loop saveS28 popi pops restore S0 ret _lc:clones set S1,ABCDEFGHIJKLMNOPQRSTUVWXYZ set S2,abcdefghijklmnopqrstuvwxyz bsr _tr saveS0 pops restore S0 ret _uc:clones set S2,ABCDEFGHIJKLMNOPQRSTUVWXYZ set S1,abcdefghijklmnopqrstuvwxyz bsr _tr saveS0 pops restore S0 ret Here's what it outputs: building string: H, I27= 0, I20= 11 building string: HE, I27= 1, I20= 11 building string: HEL, I27= 2, I20= 11 building string: HELL, I27= 3, I20= 11 building string: HELLO, I27= 4, I20= 11 building string: HELLO , I27= 5, I20= 11 building string: HELLO W, I27= 6, I20= 11 building string: HELLO WO, I27= 7, I20= 11 building string: HELLO WOR, I27= 8, I20= 11 building string: HELLO WORL, I27= 9, I20= 11 building string: HELLO WORLD, I27= 10, I20= 11 UC is: HELLO WORLD building string: dH, I27= 0, I20= 11 building string: dE, I27= 1, I20= 11 building string: dL, I27= 2, I20= 11 building string: dL, I27= 3, I20= 11 building string: dO, I27= 4, I20= 11 building string: d , I27= 5, I20= 11 building string: dW, I27= 6, I20= 11 building string: dO, I27= 7, I20= 11 building string: dR, I27= 8, I20= 11 building string: dL, I27= 9, I20= 11 building string: dD, I27= 10, I20= 11 LC is: dD S28 is somehow not growing the 2nd time a routine using _tr is called...which leads me to believe that some binding of S28 has occurred. Maybe its a dumb typo. I don't know, and I'm stumped. Anyone have any hints? Brian
push* change [was: simple subs...]
Here's a small change to push* which copies the last context automatically. I was thinking about Dan's clone opcode, when I realized that most of the time you're going to want the values, and if you don't, you can always issue a clear. It passes all of the make tests, and it allows recursive programs to be written, such as the factorial below. Dan, what do you think? Brian # compute the factorialrecursively! # lets do it for the numbers 0 to 6 main: set I1,0 $loop: print fact of print I1 printis: set I0,I1 bsr fact print I0 print \n inc I1,1 eq I1,7,$done branch $loop $done: end # I0 is the number to compute fact: pushi lt I0,2,$is_one set I1,I0 dec I0,1 bsr fact mul I0,I0,I1 saveI0 branch $done $is_one: set I0,1 saveI0 $done: popi restore I0 ret Index: register.c === RCS file: /home/perlcvs/parrot/register.c,v retrieving revision 1.10 diff -u -r1.10 register.c --- register.c 2001/10/02 14:01:30 1.10 +++ register.c 2001/10/14 01:28:43 @@ -24,6 +24,9 @@ if (chunk_base-free) { interpreter-int_reg = chunk_base-IReg[chunk_base-used++]; chunk_base-free--; +mem_sys_memcopy(chunk_base-IReg[chunk_base-used-1], +chunk_base-IReg[chunk_base-used-2], +sizeof(struct IReg)); } /* Nope, so plan B time. Allocate a new chunk of integer register frames */ else { @@ -34,6 +37,9 @@ new_chunk-next = NULL; new_chunk-prev = chunk_base; chunk_base-next = new_chunk; +mem_sys_memcopy(new_chunk-IReg[0], +chunk_base-IReg[chunk_base-used-1], +sizeof(struct IReg)); interpreter-int_reg = new_chunk-IReg[0]; } } @@ -90,6 +96,9 @@ if (chunk_base-free) { interpreter-string_reg = chunk_base-SReg[chunk_base-used++]; chunk_base-free--; +mem_sys_memcopy(chunk_base-SReg[chunk_base-used-1], +chunk_base-SReg[chunk_base-used-2], +sizeof(struct SReg)); } /* Nope, so plan B time. Allocate a new chunk of string register frames */ else { @@ -100,6 +109,9 @@ new_chunk-next = NULL; new_chunk-prev = chunk_base; chunk_base-next = new_chunk; +mem_sys_memcopy(new_chunk-SReg[0], +chunk_base-SReg[chunk_base-used-1], +sizeof(struct SReg)); interpreter-string_reg = new_chunk-SReg[0]; /* Gotta NULL them out as some string functions depend on NULL strings */ @@ -159,6 +171,9 @@ if (chunk_base-free) { interpreter-num_reg = chunk_base-NReg[chunk_base-used++]; chunk_base-free--; +mem_sys_memcopy(chunk_base-NReg[chunk_base-used-1], +chunk_base-NReg[chunk_base-used-2], +sizeof(struct NReg)); } /* Nope, so plan B time. Allocate a new chunk of float register frames */ else { @@ -169,6 +184,9 @@ new_chunk-next = NULL; new_chunk-prev = chunk_base; chunk_base-next = new_chunk; +mem_sys_memcopy(new_chunk-NReg[0], +chunk_base-NReg[chunk_base-used-1], +sizeof(struct NReg)); interpreter-num_reg = new_chunk-NReg[0]; } } @@ -225,6 +243,9 @@ if (chunk_base-free) { interpreter-pmc_reg = chunk_base-PReg[chunk_base-used++]; chunk_base-free--; +mem_sys_memcopy(chunk_base-PReg[chunk_base-used-1], +chunk_base-PReg[chunk_base-used-2], +sizeof(struct PReg)); } /* Nope, so plan B time. Allocate a new chunk of float register frames */ else { @@ -235,6 +256,9 @@ new_chunk-next = NULL; new_chunk-prev = chunk_base; chunk_base-next = new_chunk; +mem_sys_memcopy(new_chunk-PReg[0], +chunk_base-PReg[chunk_base-used-1], +sizeof(struct PReg)); interpreter-pmc_reg = new_chunk-PReg[0]; /* Gotta NULL them out or we might GC Wrong things later */ Parrot_clear_p(interpreter);
Re: Fetching the PC?
On Thu, 2001-10-11 at 20:49, Dan Sugalski wrote: At 08:25 PM 10/11/2001 -0500, Brian Wheeler wrote: Since we're passing guilt around, there's an equate of '*' which is the current PC...and I didn't document it. You can do set I1,* and it will set I1 to the current PC. It doesn't allow any math, though. I thought about hooking up eval to various brackets but I never got the time before my job got busy... Absolute or relative PC? Well, its relative to the start of the bytecode...which I suppose would be absolute...unless multiple bytecode chunks are placed in the same memory block, in which case it'd be relative. Now I'm confused. :) Though I like Gregor's way of doing it: we know the addresses (more or less) at compile time, so we might as well not waste ops doing arithmetic that we know in advance... Fair enough, though we don't really know the absolute PC at assembly time, as we're all position independent. Thinking further, having the getpc opcode take an offset would let us do something like: getpc I0, FOO to put the absolute address of FOO into I0, suitable for jumps and jsrs. This just comes out as a specialized add, right? In fact, isn't it this: set I0,* inc I0,FOO -- assuming the assembler knew that this is an address How are multiple bytecode chunks (i.e. libraries) going to be handled? Are they going to be contiguous? Are they going to be segmented somehow so there's a far jump which takes us out of the current block? Brian
Re: Fetching the PC? [RESEND]
Argh, my mailer crashed as I sent this, so I don't know if it went out. On Thu, 2001-10-11 at 21:23, Dan Sugalski wrote: At 09:12 PM 10/11/2001 -0500, Brian Wheeler wrote: On Thu, 2001-10-11 at 20:49, Dan Sugalski wrote: At 08:25 PM 10/11/2001 -0500, Brian Wheeler wrote: Since we're passing guilt around, there's an equate of '*' which is the current PC...and I didn't document it. You can do set I1,* and it will set I1 to the current PC. It doesn't allow any math, though. I thought about hooking up eval to various brackets but I never got the time before my job got busy... Absolute or relative PC? Well, its relative to the start of the bytecode...which I suppose would be absolute...unless multiple bytecode chunks are placed in the same memory block, in which case it'd be relative. Now I'm confused. :) Absolute addresses are, well, absolute addresses. Relative addresses are offsets from the current location. I think the confusion's because the jump opcode's broken. When you say jump 12 It should jump to absolute address 12, not 12 bytes/words/opcodes from the current position. Ok, fair enough...but should it be jumping to: (char *)byte_code+12 or (char *)12 If its the former, it seems there will be some problems jumping to other bytecode blocks, and if its the latter, at the very least someone can do a jump 0 and coredump the process. I could be missing stuff, because I've not had a chance to review the changes over the last few weeks Though I like Gregor's way of doing it: we know the addresses (more or less) at compile time, so we might as well not waste ops doing arithmetic that we know in advance... Fair enough, though we don't really know the absolute PC at assembly time, as we're all position independent. Thinking further, having the getpc opcode take an offset would let us do something like: getpc I0, FOO to put the absolute address of FOO into I0, suitable for jumps and jsrs. This just comes out as a specialized add, right? In fact, isn't it this: set I0,* inc I0,FOO -- assuming the assembler knew that this is an address Yup. Only set doesn't take * as a parameter--it can't, because there's no way to know at assembly time what the real PC will be. Yeah, I thought about that afterwards (the condition where real PC is actually a memory address) How are multiple bytecode chunks (i.e. libraries) going to be handled? They're just going to get mmapped in wherever the system puts 'em. Are they going to be contiguous? Nope. Are they going to be segmented somehow so there's a far jump which takes us out of the current block? Nope. Jumps and jsrs take absolute addresses, so they can go anywhere. Branches are relative so fixing them up to bounce between segments would be tough, but we're not going to do that. :) Fair enough :) Brian Dan --it's like this--- Dan Sugalski even samurai [EMAIL PROTECTED] have teddy bears and even teddy bears get drunk
Re: [BUG] Mandlebrot core
On Wed, 2001-10-03 at 14:43, Leon Brocard wrote: Leon Brocard sent the following bits through the ether: It's all greek to me, Leon The following bytecode: 0075 [01d4]: 0032 0008 000a 0008 gt_nc_ic N8, [nc:10], L2 is wrong. It shouldn't be [nc:10]. 10 is the numeric constant which should have been in the constant table ([nc:10] refers to the 10th constant, there are only constants 0-6 hence the crash), so I point the finger at the assembler (of which I little). But I know more about debugging coredumps now, honest. Its sort of the assembler, but an easy fix is to make the constant 10.0 instead of 10. Brian Leon -- Leon Brocard.http://www.astray.com/ Nanoware...http://www.nanoware.org/ ... I tried to think but nothing happened!
Re: Manifest constants?
On Tue, 2001-10-02 at 08:23, Dan Sugalski wrote: Is 'pi' a string to be looked up in a table at assemble time and converted to an intrinsic constant table index Yes. At some point the assembler needs to have a way to declare named constants, we just haven't gotten there yet. How about the attached patch? It adds these directives: DCI, DCN, DCP, DCS for Define Constant (Integer, Number, PMC, String) It (currently) works by abusing the equates. In the cases of Integer and PMC, it is exactly equivalent. In the case of Number and String, the value of the equate is [sc:#] or [nc:#] which is the index into the look up table. It could easily be expanded to store these constant names somewhere other than the equate hash. Since we're anticipating the release of 0.02, I didn't apply it, and I'd like feedback on it. Brian Index: Parrot/Assembler.pm === RCS file: /home/perlcvs/parrot/Parrot/Assembler.pm,v retrieving revision 1.3 diff -u -r1.3 Assembler.pm --- Parrot/Assembler.pm 2001/10/03 23:58:41 1.3 +++ Parrot/Assembler.pm 2001/10/04 00:29:23 @@ -670,7 +670,8 @@ sub has_asm_directive { return $_[0] =~ /^[_a-zA-Z]\w*\s+macro\s+.+$/i || - $_[0] =~ /^[_a-zA-Z]\w*\s+equ\s+.+$/i; + $_[0] =~ /^[_a-zA-Z]\w*\s+equ\s+.+$/i || +$_[0] =~ /^[_a-zA-Z]\w*\s+dc[inps]\s+.+$/i ; } @@ -678,8 +679,10 @@ =head2 handle_asm_directive -Processes macros and equ directives. equ directives get stored in an equ hash. -Macros store all program lines in an array. +Processes macro definitions, dc* declarations, and equ directives. equ +directives get stored in an equ hash. Macros store all program lines in an +array. dc* declarations create a constant and an equ which refers to the +constant. NOTE: This function modifies @program. @@ -691,6 +694,26 @@ my( $name, $data ) = ($1, $2); $equate{$name} = $data; return 1; + } + elsif( $line =~ /^([_a-zA-Z]\w*)\s+dc([inps])\s+(.+)$/i ) { + my( $name, $type, $data ) = ($1, $2, $3); + if( $type eq i ) { + # TODO: we cheat on integers, since they really don't go in the + # constants table. + $equate{$name} = $data; + } + elsif( $type eq n ) { + $equate{$name} = constantize_number($data); + } + elsif( $type eq p) { + # TODO: what does a PMC constant look like? + $equate{$name} = $data; + } + elsif( $type eq s) { + $data=~s/^(.+)$/$1/; + $equate{$name} = constantize_string($data); + } + return 1; } elsif( $line =~ /^([_a-zA-Z]\w*)\s+macro\s+(.+)$/i ) { # a macro definition
Re: instructions per second benchmark (in parrot ;)
On Thu, 2001-09-20 at 16:46, Dan Sugalski wrote: At 04:54 PM 9/20/2001 -0500, Brian Wheeler wrote: Since all benchmarks are crap anyway, I've written a test which tells the average number of instructions per second. On my athlon 700 I get 3966053 instructions per second and on my PIII 866 I get 5081485 instructions per second. Do those sound like reasonable numbers? Of course, since time_i is one of the opcodes looped, it probably brings the numbers down. That's actually what test.pasm tests. :) I just checked in a new version that prints labels. Yeah, I realized that _as soon as I posted it_. Doh! Heheh, for the longest time I thought test.pasm just did stuff for a while. Anyway, it started out as an implementation of the bogomips.c program and slowly became this. FWIW, my 600MHz Alpha clocks in at around 23M ops/sec. Nyah! ;-P I get 10M ops/sec on the Athlon 700 using test.pasm. time_i is a killer, especially judging by the difference in cpu usage (test.pasm is nearly 100% user, and mine is 75/25 user/system) The 866 machine gets 15M. Maybe when/if I get my 1.26GHz machine at work I'll be able to match you :) Brian Dan --it's like this--- Dan Sugalski even samurai [EMAIL PROTECTED] have teddy bears and even teddy bears get drunk
Changes to assemble.pl: Includes and Macros
Please test this out to make sure I haven't done anything stupid! The syntax for including another file is: include 'filename' or include filename The file will be included as-is at that spot. The listing will reflect that a series of lines was included by printing # Start of filename # End of filename at the beginning and end of the included file. Macros are defined like: println macro thing print thing print \n endm and invoked like println My happy string in the listing, expanded macros appear with in front of the code. You should be able to include up to any depth, but I didn't test it. In addition, macros should be able to use other macros. I did some basic tests, but nothing extensive. The only incompatibility I've introduced is now assemble.pl won't read from stdin...you have to give it a filename. Patches welcome! Enjoy! Brian
Re: Wanted: Subroutine call example
On Sun, 2001-09-16 at 14:26, Gregor N. Purdy wrote: Brian -- Its not going to work, if I understand it correctly. I tried doing it (even set up the symbol '*' to mean the current PC) and do it, but it seems the ops take a relative offset. Take jump_i, for example: Taking this into account, I modivied jump.pasm and fixed the offsets. I tested it this way... $ assemble.pl --output t/jump.pbc --listing t/jump.list t/jump.pasm $ test_prog t/jump.pbc Jump test. Jumping to subroutine... $ perl disassemble.pl t/jump.pbc I also did 'vi t/jump.list'. With these hard-coded relative offsets, I think the program should be producing the desired output, but it doesn't. If it did, I could do some tricks in jakoc and/or assemble.pl to create a pseudo-op for jumping to any label, and another for setting up the return-offset based on the end-label of the destination block and the label we want it to return to. It sure would be nice, though, to have a real 'absjump_i' opcode (really start-of-bytecode- relative rather than current-program-counter relative). This won't be needed once jsr/ret become available. ret is (really) the absjump_i you're looking for since its only purpose is to return to a previous place. Everything else can be setup as relative to the PC. It would also be helpful to have jump_ic and absjump_ic, rather than having to load these things into registers. You can use branch_ic for this. absjump is probably a bad idea since it appears the return value of the opcode function is really a memory address of the host machine. If an absolute value (relative to the beginning of the opcode stream) was used, then it would have to be relocated at load-time and wouldn't be able to remain read-only. Anyway, I'm attaching the revised jump.pasm example. Maybe I still don't have the offsets calculated right, but I stared at jump.list and the disassembly listing for a while and I *think* they are right. I wish I had a tracing mode where I could watch exactly where it was jumping to and what ops it was executing. I still wonder if somehow its jumping out of the bytecode and landing on an op zero (end). Jumping out of the bytecode *should* raise an error, though, no? If I'm not using this right, I sure would like to see an example of the correct use of jump_i... I've not figured out what jump_i is for :) I suspect its for calculated jumps (i.e. the switch statement) rather than for subroutines. Overall, I think you're barking up the wrong tree. jsr/ret are for subroutines. Your subroutine isn't going to work for more than one caller, since the return offset is hardcoded, you might as well write jump.pasm as: MAIN: print Jump test.\n print Jumping to subroutine...\n branch SUB RET:print Returned from subroutine!\n end SUB:print Entered subroutine...\n branch RET because the bytecodes dumped out are virtually identical. We're going to have to wait for Dan (or, you could manually apply the patch I posted earlier which adds a simple jsr/ret implementation) Brian
Re: Wanted: Subroutine call example
On Mon, 2001-09-17 at 11:20, Gregor N. Purdy wrote: I agree that jsr/ret are what I really want, but I'm dying to play with baby subroutines in jako, and I think I could play enough games with a properly understood jump_i and some assembler magic to make them work. I now have jump.pasm working right (the key was to think in terms of word offsets, not byte offsets). :) I think I read somewhere that impatience was a good thing... It would also be helpful to have jump_ic and absjump_ic, rather than having to load these things into registers. You can use branch_ic for this. absjump is probably a bad idea since it appears the return value of the opcode function is really a memory address of the host machine. If an absolute value (relative to the beginning of the opcode stream) was used, then it would have to be relocated at load-time and wouldn't be able to remain read-only. My mental model for this thing is Parrot is a CPU architecture and a lot of what I'm trying to make it do is behave as much as possible in that way. So, if we are talking about a CPU architecture (ISA), lacking the full complement of relative and absolute jumps, register and constant comparisons, etc. is a bummer. I think we're ok on relative ops, but you're right, we're short of absolute. Mucking with things like memory addresses of host machines, etc. is important, but only at the interpreter level, not at the ISA design level (IMHO). I don't see a relocation issue anyway, since the way I look at the interpreter code is: PC === (code - start_code) So if an instruction says absjump 43, then I know that I need to do: code = (start_code + 43) and if an instruction says (rel)jump 43 then I know that I need to do: code += 43 neither of these is a problem in my eyes. Ok, they're not really problems, per se, but the way the interpreter is constructed, the ops *always* return the offset to the next PC. Overall, I think you're barking up the wrong tree. jsr/ret are for subroutines. Your subroutine isn't going to work for more than one caller, since the return offset is hardcoded, you might as well write jump.pasm as: Not quite. First, recall jump.pasm (attached). It does work now that I've got the offsets correct. Now, there is nothing to stop us from moving the setting of I2 to earlier (jump2.pasm, attached). With appropriate labels and address arithmetic (jump3.pasm, attached) it is even almost readable. Now, if we also have good macros and '* == PC' and calculated labels, we can have some real fun (jump4.pasm, attached). Ok, fair enough. I've implemented '*' as the PC in the assembler. I've not done macros yet, but I remember seeing a patch somewhere. Of course, this doesn't allow us to do recursion, wastes registers, etc. But, it would allow me to play with the syntax of the language with a simple implementation until cooler ops are available. :) there are real architectures that couldn't (easily, if at all) do recursion. I think the PDP-8 was one of them. We're going to have to wait for Dan (or, you could manually apply the patch I posted earlier which adds a simple jsr/ret implementation) Probably it would be best to wait. But my Impatience is showing... Like I said...I think I've heard its a good thing :) Brian
Re: Assembler missing byte code segment header?
As near as I can tell, its always been just the bytecode following without a length specifier. I was going to play with it, but since we're still deciding on the file format, I thought I'd leave it alone. Brian On Mon, 2001-09-17 at 16:06, Gregor N. Purdy wrote: All -- I'm not certain about this, but it sure doesn't look right to me. (note I did this after turning off fingerprint saving in the assembler for clarity): $ perl -e 'print set_i I0, 4\nend\n' | assemble.pl | od -x 000 55a1 0131 0002 020 0004 034 * The first four bytes are the magic number * The next four bytes are all zero for no fixup table here * The next four bytes are all zero for no const table here * The next four bytes should be pack('l', 16) for 16 bytes of byte code follow (NOTE: THESE SEEM TO BE MISSING) Now, the disassembler and interpreter don't seem to care, but I thought the current format involved 3 or 4 segments in the length-payload pair format. The fourth (optional) segment according to the docs is a place to store the source code. Regards, -- Gregor _ / perl -e 'srand(-2091643526); print chr rand 90 for (0..4)' \ Gregor N. Purdy [EMAIL PROTECTED] Focus Research, Inc.http://www.focusresearch.com/ 8080 Beckett Center Drive #203 513-860-3570 vox West Chester, OH 45069 513-860-3579 fax \_/
Local labels in assemble.pl
I've commited a change which allows local labels to be used in parrot. The labels are local relative to the last non-local label defined (i.e. local labels are forgotten when a non-local is defined). Here's my test program: main: print test 1\n branch $ok $ng:print ng 1\n branch test2 $ok:print ok 1\n test2: print test 2\n branch $ok $ng:print ng 2\n branch done $ok:print ok 2\n done: end It should make automatic code generation easier for jako. If a listing is produced, local labels are _not_ shown. Brian
RFC: bytecode file format (again)
After reading various posts about the bytcode file format it occurred to me that we need to determine what we need :) Meta-information: * Magic cookie * version * endian/size markers * index of all chunks for fast lookup Things we need to store: * bytecode * external symbols/list of modules required * public symbols * source code * raw data (think: predefined PMCs, DATA sections ,etc) * other stuff :) So, in the spirit of KISS, what about this format: Offset Size Description 0 4 Magic Cookie 'PAR0' 4 8 Endian mark (0x0123456789ABCDEF) 12 4 Version (Major * 0x1 + Minor) 16 4 Size Marker 20 12 Padding/Reserved for future use 32 n Index (n=# of entries * 16) 32+n m Data blocks, stored sequentially Each index entry 16 bytes, and is laid out like:: Offset Size Description 0 4 Type (bytecode, comment, source, fixup, etc) 4 4 ID (to differentiate blocks of the same type) 8 4 Offset (offset from beginning of file of data) 12 4 Length (in bytes) Block IDs are used when we need to differentiate between different blocks of the same type, i.e. constant data or multiple sets of bytecode in the same file as in libraries. Given this overall format, the format of each data block varies depending on its type. Type_id Name Ignore/Empty/Invalid - This block is ignored 0001 Bytecode 0002 String Constants 0003 External Symbols 0004 Public Symbols 0005 Source Code 0006 Debugging Info Thoughts on *this* can of worms? Brian
segfaults in today's cvs
I've tracked it down to string problems. Looks like: init_bytecode(program_code); calls read_constants_table(program_code); calls Parrot_string_constants[i++] = string_make(*program_code /* ouch */, buflen, encoding, flags, type); calls string_compute_strlen(s); which segfaults. Any ideas what's going on? Brian
Re: Calls for a Macro Assembler Guy
On Fri, 2001-09-14 at 10:20, Dan Sugalski wrote: Okay, we've had a number of people in favor of a good macro assembler for Parrot. Given that, do we have anyone who'll volunteer to define, maintain, and extend the thing? (Presuming we jump off of the current assembler, which seems reasonable) There probably isn't a huge amount to do with the thing--maintain macro substitutions, handle local labels, manage sub definitions, and suchlike things. Wouldn't it largely be just filtering the input through cpp? Brian Anyone? Dan --it's like this--- Dan Sugalski even samurai [EMAIL PROTECTED] have teddy bears and even teddy bears get drunk
RFC: Bytecode file format
I've been thinking alot about the bytecode file format lately. Its going to get really gross really fast when we start adding other (optional) sections to the code. So, with that in mind, here's what I propose: * All data sizes are in longwords (4 bytes) because that's just the way things are :) * The file is composed of a header (which is really just a magic cookie) , a series of data chunks, and a directory (of sorts) Offset Length Description 0 1 Magic Cookie (0x013155a1) 1 n Data n+1 m Directory Table m+n+1 1 Offset of beginning of directory table (i.e. n+1) The directory is after the data so offsets can be determined as the data is written. The directory offset is at the very end, so it can be determined before the directory is written, and easily found by loaders. Each Directory Entry consists of 3 longs: Offset Length Description 0 1 ID/Type of chunk 1 1 Size of chunk 2 1 Offset from beginning of file Chunk types we've seen so far are: Bytecode, Fixup, String Table. Just guessing, but we'll probably need things likes symbol tables, source chunks, notes, etc. What do you guys think? Brian
Re: RFC: Bytecode file format
On Fri, 2001-09-14 at 15:44, Buddha Buck wrote: At 03:10 PM 09-14-2001 -0500, Brian Wheeler wrote: I've been thinking alot about the bytecode file format lately. Its going to get really gross really fast when we start adding other (optional) sections to the code. So, with that in mind, here's what I propose: snip What do you guys think? Have you taken a look at the old Amiga IFF format? It consisted mainly of chunks identified by a 32-bit type code and a chunk-length code. While most implementations were for specific multi-media applications (chunks defining sound formats, chunks defining image formats, etc), the standard itself was data-neutral. :) That was one of the references I used. I looked up these formats on wotsit before proposing: * IFF Consistant, but you've got to scan the whole file to find out ifa chunk is even there. * Doom WAD Clean format, mini-directory at the end. * ELF Overkill, I think, but does give some ideas of the types of sections we'll be needing. I believe that Microsoft is using a derivative of that format for some of its files, and I think that TIFF files are another instantiation. It may be worth looking at to avoid re-inventing wheels. Agreed. Brian
Re: RFC: Bytecode file format
On Fri, 2001-09-14 at 15:42, Dan Sugalski wrote: At 03:10 PM 9/14/2001 -0500, Brian Wheeler wrote: I've been thinking alot about the bytecode file format lately. Its going to get really gross really fast when we start adding other (optional) sections to the code. So, with that in mind, here's what I propose: * All data sizes are in longwords (4 bytes) because that's just the way things are :) Nope. At the very least, a bytecode file needs to start with: 8-byte word:endianness (magic value 0x123456789abcdef0) byte: word size byte[7]:empty word: major version word: minor version Where all word values are as big as the word size says they are. The magic value can be something else, but it should byteswap such that if you read it in you can tell whether it was a big-endian write or a little-endian write. Followed by the offsets to the various segments in the file. The directory (basically the offsets) *must* be at the very beginning, right after the header. Also, don't choose a stream format that makes writing the file easy--choose one that makes reading easy and fast. Two or three passes before writing are fine. And...in another message Heh. Doing our bytecode files in IFF format would be really nifty. We could even see about raiding the old Fred Fish library for C source to read and write 'em... :) Fred Fish...that's a name I've not heard in a long time. I still have a floppy or two with fish disk ### stuck on them. Ok, I'll admit I was initially considering IFF but it seemed too scary...but then it could be residual feelings from when I was a youngster and tried to write them with BASIC :) Ok, what if we did IFF with these caveats: * all chunks must be padded to 4 bytes (instead of IFF's 2) * no nesting of FORMs Chunks we'd need are: Name: 'PINF' - Parrot Information Size: 28 bytes + size of directory Optional: No Data: longmagic cookie (or will PINF) be enough? 8-byte word:endianness (magic value 0x123456789abcdef0) byte: word size byte[7]:empty word: major version word: minor version longcount of directory entries --- directory goes here --- -- each entry as follows -- longtype of chunk longoffset Name: 'PBYT' - Parrot Bytecode Size: Varies Optional: Sure. :) Data: bytes of the bytecode Name: 'PSTR' - Parrot String Table Size: Varies Optional: Yes Data: longCount of string entries --- each string as follows --- longbyte length n bytes + pad string data Name: 'PFIX' - Parrot Fixup Table Size: Varies Optional: Yes Data: --- beats me...how are we doing fixups? --- Name: 'PNOT' - Parrot Notes Block Size: Varies Optional: Yes Data: free-form text for 'notes' about the file. How's this? Brian
Re: pasm.pl: a different parrot assembler
On Thu, 2001-09-13 at 02:23, Simon Cozens wrote: On Wed, Sep 12, 2001 at 11:23:27PM -0500, Brian Wheeler wrote: I've been having tons of problems with labels in the current assembler...so I wrote my own. It should provide all of the features that the current assembler has. I'ved tested and ran all of the current t/*.pasm files. Here it is...feedback is always welcome. Cool. Now, if it only handled abbreviated op names, I'd apply it... :) Simon You, sir, are a very picky man. :) Here it is...I tested it on this euclid.pasm: MAIN: set I1, 96 set I2, 64 set I3, 0 set S1, Algorithm E (Euclid's algorithm) print S1 E1: mod I4, I1, I2 E2: eq I4, I3, DONE, E3 E3: set I1, I2 set I2, I4 branch E1 DONE: print I2 end Enjoy! Brian #! /usr/bin/perl -w # # pasm.pl - take a parrot assembly file and spit out a bytecode file # This is based heavily on assemble.pl # Brian Wheeler ([EMAIL PROTECTED]) use strict; my $opt_c; if (@ARGV and $ARGV[0] eq -c) { shift @ARGV; $opt_c = 1; } # define data types my(%pack_type)=('i'='l','n'='d'); my(%real_type)=('I'='i','i'='i', 'N'='i','n'='n', 'S'='i','s'='i', 'D'='i'); # compute sizes my(%sizeof); foreach (keys(%real_type)) { $sizeof{$_}=length(pack($pack_type{$real_type{$_}},0)); } # get opcodes from guts. open GUTS, interp_guts.h; my %opcodes; while (GUTS) { next unless /\tx\[(\d+)\] = ([a-z_]+);/; $opcodes{$2}{CODE} = $1; } close GUTS; # get opcodes and their arg lists open OPCODES, opcode_table or die Can't get opcode table, $!/$^E; while (OPCODES) { next if /^\s*#/; chomp; s/^\s+//; next unless $_; my ($name, $args, @types) = split /\s+/, $_; my @rtypes=@types; @types=map { $_ = $real_type{$_}} @types; $opcodes{$name}{ARGS} = $args; $opcodes{$name}{TYPES} = [@types]; $opcodes{$name}{RTYPES}=[@rtypes]; } close OPCODES; # read source and assemble my $pc=0; my $op_pc=0; my ($bytecode,%label,%fixup,%constants,@constants); my $line=0; while() { $line++; chomp; s/^\s*//; s/\s*$//; next if(/^\#/ || $_ eq ); if(m/^((\S+):)?\s*(.+)?/) { my($label,$code)=($2,$3); if(defined($label) $label ne ) { if(exists($label{$label})) { error('$label' already defined!); } if(exists($fixup{$label})) { # backpatch everything with this PC. while(scalar(@{$fixup{$label}})) { my $op_pc=shift(@{$fixup{$label}}); my $offset=shift(@{$fixup{$label}}); substr($bytecode,$offset,4)=pack('l',($pc-$op_pc)/4); } delete($fixup{$label}); } $label{$label}=$pc; # store it. } next if(!defined($code)); 1 while $code=~s/\([^\\\]*(?:\\.[^\\\]*)*)\/constantize($1)/eg; $code=~s/,/ /g; my($opcode,@args)=split(/\s+/,$code); $opcode=lc($opcode); if (!exists $opcodes{$opcode}) { # try to determine _real_ opcode. my @arg_t=(); foreach (@args) { if(m/^([INPS])\d+$/) { # a register. push @arg_t,lc($1); } elsif(m/^\d+$/) { # a constant of some sort push @arg_t,'(ic|nc|sc)'; } else { # a label push @arg_t,'ic'; } } my $test; my($first,$last)=($arg_t[0],$arg_t[-1]); if($first ne $last) { $test=$opcode\_$first\_$last; } else { $test=$opcode\_$first; } my($found_op)=0; foreach my $op (grep($_=~/^$opcode/,keys(%opcodes))) { if($op=~/$test/) { $opcode=$op; $found_op=1; last; } } error(No opcode $opcode in $_) if(!$found_op); } if (@args != $opcodes{$opcode}{ARGS}) { error(Wrong arg count--got .scalar(@args). needed .$opcodes{$opcode}{ARGS}); } $bytecode .= pack l, $opcodes{$opcode}{CODE}; $op_pc=$pc; $pc+=4; foreach (0..$#args) { my($rtype)=$opcodes{$opcode}{RTYPES}[$_]; my($type)=$opcodes{$opcode}{TYPES}[$_]; if($rtype eq I || $rtype eq N || $rtype eq P || $rtype eq S) { # its a register argument $args[$_]=~s/^[INPS](\d+)$/$1/i; $pc+=$sizeof{$rtype} } elsif($rtype eq D) { # a destination if(!exists($label{$args[$_]})) { # we have not seen it yet...put it on the fixup list push(@{$fixup{$args[$_]}},$op_pc,$pc); $args[$_]=0x; } else { $args[$_]=($label{$args[$_]}-$op_pc)/4; } $pc+=$sizeof{$rtype}; } else { $args[$_]=oct($args[$_]) if($args[$_]=~/^0/); $pc+=$sizeof{$rtype
patch: assemble.pl choosing wrong op sometimes...
I caught it trying to use inc_i_ic instead of inc_i in a test program I was running. this patch fixes it. Brian Index: assemble.pl === RCS file: /home/perlcvs/parrot/assemble.pl,v retrieving revision 1.12 diff -r1.12 assemble.pl 112c112 if($op=~/$test/) { --- if($op=~/^$test$/) {
Re: patch: assemble.pl choosing wrong op sometimes...
On Thu, 2001-09-13 at 09:52, Jarkko Hietaniemi wrote: On Thu, Sep 13, 2001 at 09:54:35AM -0500, Brian Wheeler wrote: I caught it trying to use inc_i_ic instead of inc_i in a test program I was running. this patch fixes it. Brian Index: assemble.pl === RCS file: /home/perlcvs/parrot/assemble.pl,v retrieving revision 1.12 diff -r1.12 assemble.pl 112c112 if($op=~/$test/) { --- if($op=~/^$test$/) { Unless $test is really a pattern here this is $op eq $test... It is a pattern. If you have a constant, it doesn't know what kind, so the pattern it creates is (ic|nc|sc) Brian
patch: print op cleanups, new ops, assembler support of \(whatever)
This patch gives the assembler support of '\a','\n','\r','\t', and '\\' in string constants. In addition, it changes (for all registers) I reg %li is ... to just the value of the register. Printing constants is also supported, but alas, you have to specify the type (print_sc, print_ic, print_nc). Brian ? pasm.pl ? patch ? test2.pbc ? test3.pbc ? euclid0.pbc ? euclid1.pbc ? euclid.pbc ? print_cleanups.diff ? bitops.pbc ? bitops+assembler.patch ? t/bitops.pasm Index: assemble.pl === RCS file: /home/perlcvs/parrot/assemble.pl,v retrieving revision 1.12 diff -r1.12 assemble.pl 112c112 if($op=~/$test/) { --- if($op=~/^$test$/) { 121,122c121 error(Wrong arg count--got .scalar(@args). needed .$opcodes{$opcode}{ARGS}); --- error(Wrong arg count--got .scalar(@args). needed .$opcodes{$opcode}{ARGS}); 131,132c130 if($rtype eq I || $rtype eq N || $rtype eq P || $rtype eq S) { --- if($rtype eq I || $rtype eq N || $rtype eq P || $rtype eq S) { 219a218,220 # handle \ characters in the constant my %escape = ('a'=\a,'n'=\n,'r'=\r,'t'=\t,'\\'='\\',); $s=~s/\\([anrt\\])/$escape{$1}/g; 223a225 Index: basic_opcodes.ops === RCS file: /home/perlcvs/parrot/basic_opcodes.ops,v retrieving revision 1.13 diff -r1.13 basic_opcodes.ops 120c120 printf(I reg %li is %li\n, P1, INT_REG(P1)); --- printf(%li, INT_REG(P1)); 121a122,127 /* PRINT ic */ AUTO_OP print_ic { printf(%li, P1); } 212c218 printf(N reg %li is %f\n, P1, NUM_REG(P1)); --- printf(%f, NUM_REG(P1)); 214a221,225 /* PRINT nc */ AUTO_OP print_nc { printf(%f, P1); } 317c328,334 printf(S reg %li is %.*s\n, P1, (int) string_length(s), (char *) s-bufstart); --- printf(%.*s,(int)string_length(s),(char *) s-bufstart); } /* PRINT sc */ AUTO_OP print_sc { STRING *s = Parrot_string_constants[P1]; printf(%.*s,(int)string_length(s),(char *) s-bufstart); 318a336 Index: opcode_table === RCS file: /home/perlcvs/parrot/opcode_table,v retrieving revision 1.12 diff -r1.12 opcode_table 56a57 print_sc 1 s 86a88 print_ic 1 i 88a91 print_nc 1 n
Yet another patch: assemble.pl - better opcode guessing
This does a better job at guessing the correct opcode: the constant is compared to a regex and determined which kind it is, instead of saying its just some sort of constant. This fixes the guessing problems with my (print_ic print_sc print_nc) patch. Brian Index: assemble.pl === RCS file: /home/perlcvs/parrot/assemble.pl,v retrieving revision 1.12 diff -r1.12 assemble.pl 95,97d94 } elsif(m/^\d+$/) { # a constant of some sort push @arg_t,'(ic|nc|sc)'; 99,100c96,109 # a label push @arg_t,'ic'; --- # a constant of some sort if(m/^\[(\d+)\]$/) { # string push @arg_t,'sc'; } elsif(m/^((-?\d+)|(0b[01]+)|(0x[0-9a-f]+))$/i) { # integer push @arg_t,'ic'; } elsif(m/^[a-z][\w]*$/i) { # label push @arg_t,'ic'; } else { # numeric push @arg_t,'nc'; } 109a119 print STDERR test: $test\n; 112c122 if($op=~/$test/) { --- if($op eq $test) { 121,122c131 error(Wrong arg count--got .scalar(@args). needed .$opcodes{$opcode}{ARGS}); --- error(Wrong arg count--got .scalar(@args). needed .$opcodes{$opcode}{ARGS}); 131,132c140 if($rtype eq I || $rtype eq N || $rtype eq P || $rtype eq S) { --- if($rtype eq I || $rtype eq N || $rtype eq P || $rtype eq S) { 145a154,156 } elsif($rtype eq 's') { $args[$_]=~s/[\[\]]//g; $pc+=$sizeof{$rtype}; 220,222c231,238 return $constants{$s} if exists $constants{$s}; push @constants, $s; return $constants{$s} = $#constants; --- # handle \ characters in the constant my %escape = ('a'=\a,'n'=\n,'r'=\r,'t'=\t,'\\'='\\',); $s=~s/\\([anrt\\])/$escape{$1}/g; if(!exists($constants{$s})) { push(@constants,$s); $constants{$s}=$#constants; } return [.$constants{$s}.]; 223a240
patch: bitops with constants
This patch allows you to do thingies like: and I1,I2,0x 'and', 'or', and 'xor' have been adapted to use this. Also, shl and shr can take an integer register as the amount to shift. Brian ? pasm.pl ? patch ? test2.pbc ? test3.pbc ? euclid0.pbc ? euclid1.pbc ? euclid.pbc ? const_bitops.diff ? bitops.pbc ? t/bitops.pasm Index: basic_opcodes.ops === RCS file: /home/perlcvs/parrot/basic_opcodes.ops,v retrieving revision 1.14 diff -u -r1.14 basic_opcodes.ops --- basic_opcodes.ops 2001/09/13 16:16:38 1.14 +++ basic_opcodes.ops 2001/09/13 19:13:43 @@ -562,6 +562,12 @@ INT_REG(P1) = INT_REG(P2) INT_REG(P3); } +/* AND_i_ic */ +AUTO_OP and_i_ic { + INT_REG(P1) = INT_REG(P2) P3; +} + + /* NOT_i */ AUTO_OP not_i { INT_REG(P1) = ! INT_REG(P2); @@ -572,17 +578,37 @@ INT_REG(P1) = INT_REG(P2) | INT_REG(P3); } +/* OR_i_ic */ +AUTO_OP or_i_ic { + INT_REG(P1) = INT_REG(P2) | P3; +} + /* SHL_i_ic */ AUTO_OP shl_i_ic { INT_REG(P1) = INT_REG(P2) P3; } +/* SHL_i */ +AUTO_OP shl_i { + INT_REG(P1) = INT_REG(P2) INT_REG(P3); +} + /* SHR_i_ic */ AUTO_OP shr_i_ic { INT_REG(P1) = INT_REG(P2) P3; } +/* SHR_i */ +AUTO_OP shr_i { + INT_REG(P1) = INT_REG(P2) INT_REG(P3); +} + /* XOR_i */ AUTO_OP xor_i { INT_REG(P1) = INT_REG(P2) ^ INT_REG(P3); +} + +/* XOR_i_ic */ +AUTO_OP xor_i_ic { + INT_REG(P1) = INT_REG(P2) ^ P3; } Index: opcode_table === RCS file: /home/perlcvs/parrot/opcode_table,v retrieving revision 1.13 diff -u -r1.13 opcode_table --- opcode_table2001/09/13 16:16:38 1.13 +++ opcode_table2001/09/13 19:13:44 @@ -152,9 +152,15 @@ # Bitwise Ops and_i 3 I I I +and_i_ic 3 I I i not_i 2 I I or_i 3 I I I +or_i_ic3 I I i +shl_i 3 I I I shl_i_ic 3I I i +shr_i 3 I I I shr_i_ic 3I I i xor_i 3 I I I +xor_i_ic 3 I I i +
patch: assembly listings from assembler
This patch does a couple of things: * uses Getopt::Long for options. -c is now --checksyntax. I wasn't sure how to keep compatible (patches welcome!) * options include: --help --version --verbose --output=file --listing=file --checksyntax * produces verbose listing of what the assembler saw :) Only one nitpick with it: unknown symbols are given as 0x, unfortunately, this includes symbols which may be defined later in the file (i.e. forward jumps). Brian Index: assemble.pl === RCS file: /home/perlcvs/parrot/assemble.pl,v retrieving revision 1.14 diff -r1.14 assemble.pl 7a8 use Getopt::Long; 9,12c10,33 my $opt_c; if (@ARGV and $ARGV[0] eq -c) { shift @ARGV; $opt_c = 1; --- my %options; GetOptions(\%options,('checksyntax', 'help', 'version', 'verbose', 'output=s', 'listing=s')); if($options{'version'}) { print $0,'Version $Id$ ',\n; exit; } if($options{'help'}) { print $0 - Parrot Assembler Options: --checksyntaxCheck assembler syntax only, no output --help This text --listingDump assembly listing to file --output File to dump bytecode into --verboseShow what's going on --versionShow assembler version ; exit; 14a36,44 if(exists($options{'output'}) $options{'output'} eq ) { print STDERR You must provide a file with --output flag!\n; exit; } if(exists($options{'listing'}) $options{'listing'} eq ) { print STDERR You must provide a file with --listing flag!\n; exit; } 54a85,87 my $listing=PARROT ASSEMBLY LISTING - .scalar(localtime).\n\n; 62a96 my $sline=$_; 65c99,104 next if(/^\#/ || $_ eq ); --- if(/^\#/ || $_ eq ) { if($options{'listing'}) { $listing.=sprintf(%4d %08x %-44s %s\n, $line, $op_pc, '',$sline); } next; } 121a161 log_message(substituting $op for $opcode); 161a202,209 if($options{'listing'}) { # add line to listing. my $odata; foreach (unpack('l*',substr($bytecode,$op_pc))) { $odata.=sprintf(%08x ,$_); } $listing.=sprintf(%4d %08x %-44s %s\n, $line, $op_pc, $odata,$sline); } 163a212 $listing.=\n if($options{'listing'}); 173c222,229 # FIXUP --- # FIXUP (also, dump listing symbols) if($options{'listing'}) { $listing.=DEFINED SYMBOLS:\n; foreach (sort(keys(%label))) { $listing.=sprintf(\t%08x %s\n,$label{$_},$_); } } 184c240,244 exit; --- $listing.=\nUNDEFINED SYMBOLS:\n; foreach (sort(keys(%fixup))) { $listing.=\t$_\n; } exit; # some day, unresolved symbols won't be an error! 195a256,258 if($options{'listing'}) { $listing.=\nSTRING CONSTANTS\n; } 196a260 my $counter=0; 204c268,271 } --- } $listing.=sprintf(\t%04x %08x [[%s]]\n,$counter,length($_),$_) if($options{'listing'}); $counter++; 217,218c284,298 if(!$opt_c) { print $output; --- if(!$options{'checksyntax'}) { if($options{'output'} ne ) { open O,$options{'output'} || die $!; print O $output; close O; } else { print $output; } } if($options{'listing'}) { open L,$options{'listing'} || die $!; print L $listing; close L; 224a305,311 } sub log_message { my($message)=@_; if($options{'verbose'}) { print STDERR INFO ($line): $message\n; }
patches to assembler opcode_table
The assembler patches: * handle blank lines containing a label * handle constants in decimal,octal, or hex. Opcode table patch (and basic_opcodes.ops): * adds and, or, not, xor, shl, and shr. Builds ok, but coredumps in the interpreter. Any hints on what I did wrong welcome! Attached is also a test program for the new ops (which coredumps). Brian ? op.h ? config.h ? patch ? interp_guts.h ? basic_opcodes.c ? test_prog ? test2.pbc ? test3.pbc ? euclid.pbc ? bitops.pbc ? bitops+assembler.patch ? t/bitops.pasm Index: assemble.pl === RCS file: /home/perlcvs/parrot/assemble.pl,v retrieving revision 1.8 diff -u -r1.8 assemble.pl --- assemble.pl 2001/09/12 09:54:46 1.8 +++ assemble.pl 2001/09/12 20:24:07 @@ -62,19 +62,17 @@ push @code, $_; $pc += 1+@args; } - emit_magic(); emit_fixup_section(); emit_constants_section(); - # Now assemble $pc = 0; my $line = 0; -while ($_ = shift @code) { +foreach (@code) { $line++; chomp; s/,/ /g; - +next if(/^\s*$/); my ($opcode, @args) = split /\s+/, $_; if (!exists $opcodes{lc $opcode}) { @@ -93,6 +91,8 @@ } elsif($rtype eq D) { # a destination $args[$_]=fixup($args[$_]); + } else { + $args[$_]=oct($args[$_]) if($args[$_]=~/^0/); } $output .= pack $type, $args[$_]; } Index: basic_opcodes.ops === RCS file: /home/perlcvs/parrot/basic_opcodes.ops,v retrieving revision 1.8 diff -u -r1.8 basic_opcodes.ops --- basic_opcodes.ops 2001/09/12 18:39:12 1.8 +++ basic_opcodes.ops 2001/09/12 20:24:07 @@ -329,3 +329,34 @@ // NOOP AUTO_OP noop { } + +// AND_I +AUTO_OP and_i { + INT_REG(P1)=INT_REG(P2) INT_REG(P3); +} + +// NOT_I +AUTO_OP not_i { + INT_REG(P1)=!INT_REG(P2); +} + +// OR_I +AUTO_OP or_i { + INT_REG(P1)=INT_REG(P2) | INT_REG(P3); +} + +// SHL_I +AUTO_OP shl_i_ic { + INT_REG(P1)=INT_REG(P2) P3; +} + + +// SHR_I +AUTO_OP shr_i_ic { + INT_REG(P1)=INT_REG(P2) P3; +} + +// XOR +AUTO_OP xor_i { + INT_REG(P1)=INT_REG(P2) ^ INT_REG(P3); +} Index: opcode_table === RCS file: /home/perlcvs/parrot/opcode_table,v retrieving revision 1.9 diff -u -r1.9 opcode_table --- opcode_table2001/09/12 18:39:12 1.9 +++ opcode_table2001/09/12 20:24:07 @@ -102,3 +102,10 @@ clear_n0 clear_p0 +# Bitops +and_i 3 I I I +not_i 2 I I +or_i 3 I I I +shl_i_ic 3 I I i +shr_i_ic 3 I I i +xor_i 3 I I I # bitops.pasm : test bitops. # and, not, or, shl, shr, xor # Brian Wheeler ([EMAIL PROTECTED]) MAIN: set_i_icI1,0b set_i_icI2,0b set_i_icI3,0b10101010 set_i_icI4,0b01010101 set_i_icI5,0 set_s_scS1,OK set_s_scS2,NG set_s_scS3,Test Number # test 'and' set_s_scS5,AND Test print_s S5 AND_T1: inc_i I5 print_s S3 print_i I5 and_i I6,I1,I2 print_i I6 eq_i_ic I6,I2,A1_OK,A1_NG A1_OK: print_s S1 branch_ic AND_T2 A1_NG: print_s S2 branch_ic AND_T2 AND_T2: inc_i I5 print_s S3 print_i I5 and_i I6,I1,I3 eq_i_ic I6,I3,A2_OK,A3_NG A2_OK: print_s S1 branch_ic AND_T3 A3_NG: print_s S2 branch_ic AND_T3 AND_T3: inc_i I5 print_s S3 print_i I5 and_i I6,I3,I4 eq_i_ic I6,I2,A3_OK,A3_NG A3_OK: print_s S1 branch_ic OR_T A3_NG: print_s S2 branch_ic OR_T # test or OR_T: set_s_scS5,OR Test print_s S5 set_i_icI5,0 end
Resend: working patch for bitops
This seems to be working: * fixes for label-only lines in assembler * recognition of 0x, 0b, etc in constants * and, not, or, shl, shr, xor Enjoy! Brian Patch follows Index: assemble.pl === RCS file: /home/perlcvs/parrot/assemble.pl,v retrieving revision 1.8 diff -u -r1.8 assemble.pl --- assemble.pl 2001/09/12 09:54:46 1.8 +++ assemble.pl 2001/09/13 00:46:50 @@ -70,11 +70,12 @@ # Now assemble $pc = 0; my $line = 0; -while ($_ = shift @code) { +foreach (@code) { $line++; chomp; +next if(m/^\s*$/); # blank lines s/,/ /g; - + my ($opcode, @args) = split /\s+/, $_; if (!exists $opcodes{lc $opcode}) { @@ -93,6 +94,8 @@ } elsif($rtype eq D) { # a destination $args[$_]=fixup($args[$_]); + } else { + $args[$_]=oct($args[$_]) if($args[$_]=~/^0/); } $output .= pack $type, $args[$_]; } Index: basic_opcodes.ops === RCS file: /home/perlcvs/parrot/basic_opcodes.ops,v retrieving revision 1.8 diff -u -r1.8 basic_opcodes.ops --- basic_opcodes.ops 2001/09/12 18:39:12 1.8 +++ basic_opcodes.ops 2001/09/13 00:46:51 @@ -329,3 +329,33 @@ // NOOP AUTO_OP noop { } + +// AND_i +AUTO_OP and_i { + INT_REG(P1) = INT_REG(P2) INT_REG(P3); +} + +// NOT_i +AUTO_OP not_i { + INT_REG(P1) = ! INT_REG(P2); +} + +// OR_i +AUTO_OP or_i { + INT_REG(P1) = INT_REG(P2) | INT_REG(P3); +} + +// SHL_i_ic +AUTO_OP shl_i_ic { + INT_REG(P1) = INT_REG(P2) P3; +} + +// SHR_i_ic +AUTO_OP shr_i_ic { + INT_REG(P1) = INT_REG(P2) P3; +} + +// XOR_i +AUTO_OP xor_i { + INT_REG(P1) = INT_REG(P2) ^ INT_REG(P3); +} Index: opcode_table === RCS file: /home/perlcvs/parrot/opcode_table,v retrieving revision 1.9 diff -u -r1.9 opcode_table --- opcode_table2001/09/12 18:39:12 1.9 +++ opcode_table2001/09/13 00:46:51 @@ -102,3 +102,9 @@ clear_n0 clear_p0 +and_i 3 I I I +not_i 2 I I +or_i 3 I I I +shl_i_ic 3 I I i +shr_i_ic 3 I I i +xor_i 3 I I I
pasm.pl: a different parrot assembler
I've been having tons of problems with labels in the current assembler...so I wrote my own. It should provide all of the features that the current assembler has. I'ved tested and ran all of the current t/*.pasm files. Here it is...feedback is always welcome. Brian #! /usr/bin/perl -w # # pasm.pl - take a parrot assembly file and spit out a bytecode file # This is based heavily on assemble.pl # Brian Wheeler ([EMAIL PROTECTED]) use strict; my $opt_c; if (@ARGV and $ARGV[0] eq -c) { shift @ARGV; $opt_c = 1; } # define data types my(%pack_type)=('i'='l','n'='d'); my(%real_type)=('I'='i','i'='i', 'N'='i','n'='n', 'S'='i','s'='i', 'D'='i'); # compute sizes my(%sizeof); foreach (keys(%real_type)) { $sizeof{$_}=length(pack($pack_type{$real_type{$_}},0)); } # get opcodes from guts. open GUTS, interp_guts.h; my %opcodes; while (GUTS) { next unless /\tx\[(\d+)\] = ([a-z_]+);/; $opcodes{$2}{CODE} = $1; } close GUTS; # get opcodes and their arg lists open OPCODES, opcode_table or die Can't get opcode table, $!/$^E; while (OPCODES) { next if /^\s*#/; chomp; s/^\s+//; next unless $_; my ($name, $args, @types) = split /\s+/, $_; my @rtypes=@types; @types=map { $_ = $real_type{$_}} @types; $opcodes{$name}{ARGS} = $args; $opcodes{$name}{TYPES} = [@types]; $opcodes{$name}{RTYPES}=[@rtypes]; } close OPCODES; # read source and assemble my $pc=0; my $op_pc=0; my ($bytecode,%label,%fixup,%constants,@constants); my $line=0; while() { $line++; chomp; s/^\s*//; s/\s*$//; next if(/^\#/ || $_ eq ); if(m/^((\S+):)?\s*(.+)?/) { my($label,$code)=($2,$3); if(defined($label) $label ne ) { if(exists($label{$label})) { error('$label' already defined!); } if(exists($fixup{$label})) { # backpatch everything with this PC. while(scalar(@{$fixup{$label}})) { my $op_pc=shift(@{$fixup{$label}}); my $offset=shift(@{$fixup{$label}}); substr($bytecode,$offset,4)=pack('l',($pc-$op_pc)/4); } delete($fixup{$label}); } $label{$label}=$pc; # store it. } next if(!defined($code)); 1 while $code=~s/\([^\\\]*(?:\\.[^\\\]*)*)\/constantize($1)/eg; $code=~s/,/ /g; my($opcode,@args)=split(/\s+/,$code); if (!exists $opcodes{lc $opcode}) { error(No opcode $opcode in $_); } if (@args != $opcodes{$opcode}{ARGS}) { error(Wrong arg count--got .scalar(@args). needed .$opcodes{$opcode}{ARGS}); } $bytecode .= pack l, $opcodes{$opcode}{CODE}; $op_pc=$pc; $pc+=4; foreach (0..$#args) { my($rtype)=$opcodes{$opcode}{RTYPES}[$_]; my($type)=$opcodes{$opcode}{TYPES}[$_]; if($rtype eq I || $rtype eq N || $rtype eq P || $rtype eq S) { # its a register argument $args[$_]=~s/^[INPS](\d+)$/$1/i; $pc+=$sizeof{$rtype} } elsif($rtype eq D) { # a destination if(!exists($label{$args[$_]})) { # we have not seen it yet...put it on the fixup list push(@{$fixup{$args[$_]}},$op_pc,$pc); $args[$_]=0x; } else { $args[$_]=($label{$args[$_]}-$op_pc)/4; } $pc+=$sizeof{$rtype}; } else { $args[$_]=oct($args[$_]) if($args[$_]=~/^0/); $pc+=$sizeof{$rtype}; } $bytecode .= pack $type, $args[$_]; } } } my $output; # build file in memory # MAGIC COOKIE $output=pack($pack_type{i},0x13155a1); # FIXUP if(keys(%fixup)) { print STDERR SQUAWK! These symbols were referenced but not defined:\n; foreach (sort(keys(%fixup))) { print STDERR \t$_ at pc: ; foreach my $pc (@{$fixup{$_}}) { print STDERR sprintf(%08x ,$pc); } print STDERR \n; } exit; } else { # dump empty header $output.=pack($pack_type{i},0); } # CONSTANTS if(@constants) { my($const); # Then spit out how many constants there are, so we can allocate $const .= pack($pack_type{i}, scalar @constants); # Now emit each constant for (@constants) { $const .= pack($pack_type{i},0) x 3; # Flags, encoding, type $const .= pack($pack_type{i},length($_)); # Strlen followed by that many bytes. $const .= $_; my $pad=(length($_) % $sizeof{i}); if($pad) { $const .= \0 x ($sizeof{i}-(length($_) % $sizeof{i})); # Padding; } } $output.=pack($pack_type{i},length($const)); $output.=$const; } else { # no constants, dump empty header
jsr_ic ret support
This diff adds jsr_ic and ret to the interpreter. I don't know if my way of returning is legal, and I know there's probably issues with 64 bit machines, but it works...and that's the important part :) Right now it only has a depth of 32 and no bounds checking, but its enough to get started. Brian Index: basic_opcodes.ops === RCS file: /home/perlcvs/parrot/basic_opcodes.ops,v retrieving revision 1.8 diff -r1.8 basic_opcodes.ops 331a332,345 // JSR MANUAL_OP jsr_ic { interpreter-call_stack[interpreter-call_sp]=cur_opcode+2; interpreter-call_sp++; RETURN(P1); } // RET MANUAL_OP ret { interpreter-call_sp--; // probably not legal :) return interpreter-call_stack[(interpreter-call_sp)]; } Index: interpreter.h === RCS file: /home/perlcvs/parrot/interpreter.h,v retrieving revision 1.3 diff -r1.3 interpreter.h 31a32,34 unsigned int call_sp; unsigned long call_stack[32]; Index: opcode_table === RCS file: /home/perlcvs/parrot/opcode_table,v retrieving revision 1.9 diff -r1.9 opcode_table 75a76,77 jsr_ic1 D ret 0
Re: Math functions? (Particularly transcendental ones)
On Mon, 2001-09-10 at 08:47, Dan Sugalski wrote: At 08:07 PM 9/9/2001 -0400, Uri Guttman wrote: DS == Dan Sugalski [EMAIL PROTECTED] writes: DS Yeah, I can't think of a good reason for a noop. We might have one DS anyway, though, just in case one comes along anyway. in a hardware cpu they were commonly used to fill an instruction slot to keep a pipeline filled, or to follow a branch decision, or to pad a long running op. Yup, I realize that. I wasn't sure that we might not have some sort of in-memory opcode whiteout thing we need to do, in which case it'd be useful and potentially faster than recalculating a bunch of jump addresses. Here's a dumb question: will parrot allow bytecode which is stored in a perl scalar to be executed? DS Yup, in a restricted sandbox too, if you want. That way we'll be DS able to serialize code to bytestreams, spit them across the 'net, DS and execute them on the other end. will the op code table need to be sent over if it is code from a module which defines new op codes? Basically we'll build a small freeze to disk section and send it over the wire instead of freezing to disk. It'll have all the standard stuff--fixup section, constants section, and code. I was thinking about NOP this morning, and I realized that it might very well be necessary. If someone was writing a simple assembler for parrot, it might be useful for padding. Brian Dan --it's like this--- Dan Sugalski even samurai [EMAIL PROTECTED] have teddy bears and even teddy bears get drunk
Re: Math functions? (Particularly transcendental ones)
On Mon, 2001-09-10 at 09:16, Bryan C. Warnock wrote: On Monday 10 September 2001 10:28 am, Brian Wheeler wrote: I was thinking about NOP this morning, and I realized that it might very well be necessary. If someone was writing a simple assembler for parrot, it might be useful for padding. Pad what? How about preserving offsets during an optimization phase: add i3,i1,1 add i3,i3,8 could become add i3,i1,9 nop without having to recompute offsets for later bytecode. In the same way, you could also use it for reserving space for things like debugging code, like adding 10 nops if debugging is turned off, and using those 10 instructions for debugging if it is turned onmaintaining the relative addresses of things. Of course, one could just recompile using the parrot assembler, so this would only be for those tinkering with their own assembler, I suppose. Honestly, I don't care either way, since add i0,i0,0 is the same (basically) as a nop, but takes a little more cpu. One could always #define nop add i0,i0,0 :) Brian
Patch to assembler/disassembler + parrot asm inconsistancies
First off, here's an inconsistancy I found: In test.pasm REDO: eq_i_ic I2, I4, DONE, NEXT appears. Shouldn't this be comparing to a constant, not a register? It became a little obvious when I made a few changes to the assembler/disassembler to give more details about the data (and to allow shortcuts like add I1,I2,I3 to go to add_i I1,I2,I3, etc) There's 3 pieces: opcode_table, disassemble.pl and assemble.pl The opcode_table patch changes the argument encoding to use these terms: # i Integer constant # I Integer register # n Numeric constant # N Numeric register # s String constant? # S String register # D Destination The opcodes definitions were changed accordingly. Disassemble.pl takes the new definitions and prints things out a little prettier (test.pbc): time_i I1 0008 set_i_ic I2 0 0014 set_i_ic I3 1 0020 set_i_ic I4 1000 002c eq_i_ic I2 4 0058 0040 0040 add_i I2 I2 I3 0050 branch_ic 002c 0058 time_i I5 0060 print_i I1 0068 print_i I5 0070 print_i I2 0078 sub_i I2 I5 I1 0088 print_i I2 0090 set_i_ic I1 3 009c mul_i I4 I4 I1 00ac iton_n_i N1 I4 00b8 iton_n_i N2 I2 00c4 print_i I4 00cc print_n N1 00d4 print_i I2 00dc print_n N2 00e4 div_n N1 N1 N2 00f4 print_n N1 It also skips the magic number, and skips (but doesn't handle) the constant data. String registers aren't handled either...yet assemble.pl: this just contains workarounds to the new opcode_table format. Brian Lastly, here's the patch: Index: assemble.pl === RCS file: /home/perlcvs/parrot/assemble.pl,v retrieving revision 1.5 diff -u -r1.5 assemble.pl --- assemble.pl 2001/09/10 17:30:29 1.5 +++ assemble.pl 2001/09/10 22:04:29 @@ -10,6 +10,15 @@ %pack_type = (i = 'l', n = 'd', ); +my %real_type=('i'='i', + 'n'='n', + 'N'='i', + 'I'='i', + 'S'='i', + 's'='i', + 'D'='i'); + + my $sizeof_packi = length(pack($pack_type{i},1024)); @@ -20,6 +29,7 @@ s/^\s+//; next unless $_; my ($code, $name, $args, @types) = split /\s+/, $_; +@types=map { $_ = $real_type{$_}} @types; $opcodes{$name} = {CODE = $code, ARGS = $args, TYPES = [@types] Index: disassemble.pl === RCS file: /home/perlcvs/parrot/disassemble.pl,v retrieving revision 1.1 diff -u -r1.1 disassemble.pl --- disassemble.pl 2001/08/29 12:07:02 1.1 +++ disassemble.pl 2001/09/10 22:04:29 @@ -7,10 +7,20 @@ my %unpack_type; %unpack_type = (i = 'l', + I = 'l', n = 'd', + N = 'l', + D = 'l', + S = 'l', + s = 'l', ); my %unpack_size = (i = 4, n = 8, + I = 4, + N = 4, + D = 4, + S = 4, + s = 4, ); open OPCODES, opcode_table or die Can't get opcode table, $!/$^E; @@ -28,16 +38,34 @@ } } + $/ = \4; +my $magic=; +my $constants=; + $constants=; + +my $offset=0; while () { $code = unpack 'l', $_; $args = $opcodes[$code]{ARGS}; -print $opcodes[$code]{NAME}; +my $op_offset=$offset; +print sprintf(%08x,$offset), ,$opcodes[$code]{NAME}; +$offset+=4; + if ($args) { foreach (1..$args) { local $/ = \$unpack_size{$opcodes[$code]{TYPES}[$_-1]}; $data = ; - print , unpack $unpack_type{$opcodes[$code]{TYPES}[$_-1]}, $data;+ $offset+=$ {$/ }; + if($opcodes[$code]{TYPES}[$_-1] eq N || + $opcodes[$code]{TYPES}[$_-1] eq I) { + print ,$opcodes[$code]{TYPES}[$_-1],unpack $unpack_type{$opcodes[$code]{TYPES}[$_-1]}, $data; + } elsif($opcodes[$code]{TYPES}[$_-1] eq D) { + # handle destination + print ,sprintf(%08x,$op_offset+unpack($unpack_type{$opcodes[$code]{TYPES}[$_-1]},$data)*4); + } else { + print , unpack $unpack_type{$opcodes[$code]{TYPES}[$_-1]}, $data; + } } } print \n; Index: opcode_table === RCS file: /home/perlcvs/parrot/opcode_table,v retrieving revision 1.5 diff -u -r1.5 opcode_table --- opcode_table2001/09/10 15:48:36 1.5 +++ opcode_table2001/09/10 22:04:30 @@ -9,61 +9,70 @@ # not the type of the register or anything. So N3 is still an i, since that # 3 specifying the register should be packed as an integer. +# Revised arg types: +# i Integer constant +# I Integer register +# n Numeric constant +#
Re: Patch to assembler/disassembler + parrot asm
At 05:23 PM 9/10/2001 -0500, Brian Wheeler wrote: First off, here's an inconsistancy I found: In test.pasm REDO: eq_i_ic I2, I4, DONE, NEXT appears. Shouldn't this be comparing to a constant, not a register? Nope, though if I let you in on the actual secret it's help. That should really be eq_i_ic_ic. (Well, actually there should be only one label, and we fall through otherwise. It's a bug in implementation and assembly, not opcode name... :) The intention is the last _x covers the last arg, the next to last covers the next to last arg, and so on. When we run out, we repeat the innermost type. The ultimate intention is that you'd write that as a plain: eq I2, I4, DONE, NEXT or probably eq I2, I4, DONE and either way the assembler would know DONE was a constant and we needed i registers since that was specified, and emit the eq_i_ic opcode. BUT, I'm more confused now :) If eq_i_ic is really treated as /eq(_i)+_ic/ then this code still doesn't work: eq_i_ic I1,I2,NEXT,DONE because that'd be like eq_i_i_ic_ic, right? I assume that opcodes aren't going to have variable arguments at this level, so there should be a one-to-one mapping between function and opcode, right? A thought (though gross): if we restrict mneumonics to not use the underscore, then anything after _ can be the op signature. The opcode_table could use these characters for different data types: integer i integer constantj numeric n numeric constanto address a string s string constant t The file could be reorganized as: set 2 i j set 2 i i set 2 n o set 2 s t The perl scripts which create the interfaces (process_opfunc.pl, etc) could use this information to create 4 opcodes: set_ij set_ii set_no set_st When the assembler comes across 'set I1,I2', It knows the set_ii form is the one to use. The disassembler can dump it as 'set_ii I1,I2' or (I suppose) as 'set I1,I2' Also, doing it this way takes out the special cases for the comparison and jump ops: the fixups are known to be done with things that have type 'a' It became a little obvious when I made a few changes to the assembler/disassembler to give more details about the data (and to allow shortcuts like add I1,I2,I3 to go to add_i I1,I2,I3, etc) The opcode_table patch changes the argument encoding to use these terms: # i Integer constant # I Integer register # n Numeric constant # N Numeric register # s String constant? # S String register # D Destination I was using a trailing c to note a constant since we're using the opcode name as a C function name, and we're not counting on case-sensitivity in symbols. fair enough... Other than that (well, and Simon has a patch in to the repository to yank out the opcode numbers entirely from opcode_table) it looks keen. I'll take a peek and see what all depends on the opcode_table file... Brian Dan --it's like this--- Dan Sugalski even samurai [EMAIL PROTECTED] have teddy bears and even teddy bears get drunk
Re: Patch to assembler/disassembler + parrot asm
another thought... A thought (though gross): if we restrict mneumonics to not use the underscore, then anything after _ can be the op signature. The opcode_table could use these characters for different data types: integer i integer constant j numeric n numeric constant o address a strings string constant t The file could be reorganized as: set 2 i j set 2 i i set 2 n o set 2 s t what if the table had another column (optional) at the end: set 2 i j set_i which gave the name of the C function which implemented it. that way the assembly ops are independant of the C function names and multiple ops could map to a single C routine (if needed) I've got to know...what's the significance of the magic number? :) Brian
Re: Patch to assembler/disassembler + parrot asm
On Mon, 2001-09-10 at 19:54, Dan Sugalski wrote: At 07:45 PM 9/10/2001 -0500, Brian Wheeler wrote: If eq_i_ic is really treated as /eq(_i)+_ic/ then this code still doesn't work: eq_i_ic I1,I2,NEXT,DONE because that'd be like eq_i_i_ic_ic, right? Right. But don't forget, I screwed up the eq op--it ought to have a single destination. :) DOH! That's the trick :) I assume that opcodes aren't going to have variable arguments at this level, so there should be a one-to-one mapping between function and opcode, right? Each opcode number has a single function, yes. The same high-level opcode, for example eq or add, might map to two or more different 'real' opcodes based on the types of the args. There won't be any runtime morphing--it's more The assembler sees the first arg of foo as a numberic register and the second as a constant, so it must be foo_n_nc. A thought (though gross): if we restrict mneumonics to not use the underscore, then anything after _ can be the op signature. Too gross. We don't need to go there. :) Just checking. Also, doing it this way takes out the special cases for the comparison and jump ops: the fixups are known to be done with things that have type 'a' The jump ops will be easy to figure--either they'll take a register, a constant number, or a label. We don't allow labels that could be confused with registers. (No I0: anywhere...) I've had more thoughts about my first patch. The case issue isn't an issue since it only touches the generation tools, not the C code that's generated (at least, not directly). It also provides the additional information needed to let the assembler choose the correct opcode, and the disassembler to dump things nicely :) I've also fixed up the supporting tools. As a test case, I rebuilt test_prog, assembled test.parm, ran it and dissassembled it. Looks for for that one, at least :) Please consider this new patch. Brian Index: assemble.pl === RCS file: /home/perlcvs/parrot/assemble.pl,v retrieving revision 1.6 diff -u -r1.6 assemble.pl --- assemble.pl 2001/09/10 21:26:08 1.6 +++ assemble.pl 2001/09/11 02:02:15 @@ -9,7 +9,16 @@ my %pack_type; %pack_type = (i = 'l', n = 'd', - ); + ); + +my %real_type=('i'='i', + 'n'='n', + 'N'='i', + 'I'='i', + 'S'='i', + 's'='i', + 'D'='i'); + my $sizeof_packi = length(pack($pack_type{i},1024)); open GUTS, interp_guts.h; @@ -26,8 +35,11 @@ s/^\s+//; next unless $_; my ($name, $args, @types) = split /\s+/, $_; +my @rtypes=@types; +@types=map { $_ = $real_type{$_}} @types; $opcodes{$name}{ARGS} = $args; $opcodes{$name}{TYPES} = [@types]; +$opcodes{$name}{RTYPES}=[@rtypes]; } my $pc = 0; @@ -65,23 +77,17 @@ die wrong arg count--got . scalar @args. needed . $opcodes{$opcode}{ARGS}; } -$args[0] = fixup($args[0]) -if $opcode eq branch_ic and $args[0] =~ /[a-zA-Z]/; - -#if ($opcode eq eq_i_ic or $opcode eq lt_i_ic) { -if ($opcode =~ /^(eq|ne|lt|le|gt|ge)_i_ic$/) { -$args[2] = fixup($args[2]) if $args[2] =~ /[a-zA-Z]/; -$args[3] = fixup($args[3]) if $args[3] =~ /[a-zA-Z]/; -} -if ($opcode eq if_i_ic) { -$args[1] = fixup($args[1]) if $args[1] =~ /[a-zA-Z]/; -$args[2] = fixup($args[2]) if $args[2] =~ /[a-zA-Z]/; -} - print pack l, $opcodes{$opcode}{CODE}; foreach (0..$#args) { - $args[$_] =~ s/^[INPS]?(\d+)$/$1/i; - my $type = $pack_type{$opcodes{$opcode}{TYPES}[$_]}; + my($rtype)=$opcodes{$opcode}{RTYPES}[$_]; + my($type)=$opcodes{$opcode}{TYPES}[$_]; + if($rtype eq I || $rtype eq N || $rtype eq P || $rtype eq S) { + # its a register argument + $args[$_]=~s/^[INPS](\d+)$/$1/i; + } elsif($rtype eq D) { + # a destination + $args[$_]=fixup($args[$_]); + } print pack $type, $args[$_]; } $pc += 1+@args; Index: disassemble.pl === RCS file: /home/perlcvs/parrot/disassemble.pl,v retrieving revision 1.3 diff -u -r1.3 disassemble.pl --- disassemble.pl 2001/09/10 21:45:33 1.3 +++ disassemble.pl 2001/09/11 02:02:16 @@ -8,14 +8,25 @@ my(%opcodes, @opcodes); -my %unpack_type; -%unpack_type = (i = 'l', - n = 'd', - ); + +my %unpack_type = (i = 'l', + I = 'l', + n = 'd', + N = 'l', + D = 'l', + S = 'l', + s = 'l', + ); my %unpack_size = (i = 4, n = 8, + I = 4, + N = 4, + D = 4, + S = 4, + s = 4, ); + open GUTS
Re: Patch to assembler/disassembler + parrot asm inconsistancies
On Mon, 2001-09-10 at 20:52, Dan Sugalski wrote: At 07:25 PM 9/10/2001 -0400, Bryan C. Warnock wrote: I think Dan mentioned this, but it looks like the suffixes can be derived from the args being passed in. That would greatly simply the assembler to just the function names: set, eq, add, branch. Were there problems with the scheme, is someone working on it, or did it fall through the cracks? (I'm very much in favor of such a change, and will pick it up if no one else is working on it.) No, I dont' think so, and yes, respectively. (Or, rather, we did the easy literal stuff first and planned on smartening up the assembler later. It's in the TODO even... :) Hint, Nudge, Wink: the last patch I sent that hits on the assembler should make it very easy to add better assembly checking For example, register checking could be done like this: if($rtype eq I || $rtype eq N || $rtype eq P || $rtype eq S) { # its a register argument if($args[$_]=~m/^[INPS](\d+)$/) { my($arg_num)=$1; if($arg_num 32) { # bad register number } else { $args[$_]=$arg_num; } } else { # non-register being used! } } ... Brian
Another Patch...
This patch (which is pretty big) does: * Changes the opcode_table file to provide additional information about the operands. Case shouldn't be a problem since that data never becomes a C symbol [this is pretty much as before] * Padding errors solved: assemble.pl and bytecode.c were padding the constants incorrectly. It should have been 4-(size % 4), not just (size % 4). It is now fixed in both places. * assembler has less special cases, and should be easier to hang error checking on * disassembler dumps constant table and the format is a bit prettier, including register names, etc. Test2.pbc dumps as this: # Constants: 1 entries (32 bytes) # ID FlagsEncoding Type Size Data : 000b Hello World # Code Section : set_i_ic I2, 1 000c: set_i_ic I1, 0 0018: set_s_sc S1, [string ] 0024: eq_i_ic I1, I2, 0060, 0038 0038: length_s_i S1, I1 0044: print_s S1 004c: chopn_s_ic S1, 1 0058: branch_ic0024 0060: end Let me know what you guys think! Brian [Crap, there's some wordwrapping below. Too bad you can plug emacs into evolution :) ] Index: assemble.pl === RCS file: /home/perlcvs/parrot/assemble.pl,v retrieving revision 1.6 diff -u -r1.6 assemble.pl --- assemble.pl 2001/09/10 21:26:08 1.6 +++ assemble.pl 2001/09/11 03:14:32 @@ -9,7 +9,16 @@ my %pack_type; %pack_type = (i = 'l', n = 'd', - ); + ); + +my %real_type=('i'='i', + 'n'='n', + 'N'='i', + 'I'='i', + 'S'='i', + 's'='i', + 'D'='i'); + my $sizeof_packi = length(pack($pack_type{i},1024)); open GUTS, interp_guts.h; @@ -26,8 +35,11 @@ s/^\s+//; next unless $_; my ($name, $args, @types) = split /\s+/, $_; +my @rtypes=@types; +@types=map { $_ = $real_type{$_}} @types; $opcodes{$name}{ARGS} = $args; $opcodes{$name}{TYPES} = [@types]; +$opcodes{$name}{RTYPES}=[@rtypes]; } my $pc = 0; @@ -65,23 +77,17 @@ die wrong arg count--got . scalar @args. needed . $opcodes{$opcode}{ARGS}; } -$args[0] = fixup($args[0]) -if $opcode eq branch_ic and $args[0] =~ /[a-zA-Z]/; - -#if ($opcode eq eq_i_ic or $opcode eq lt_i_ic) { -if ($opcode =~ /^(eq|ne|lt|le|gt|ge)_i_ic$/) { -$args[2] = fixup($args[2]) if $args[2] =~ /[a-zA-Z]/; -$args[3] = fixup($args[3]) if $args[3] =~ /[a-zA-Z]/; -} -if ($opcode eq if_i_ic) { -$args[1] = fixup($args[1]) if $args[1] =~ /[a-zA-Z]/; -$args[2] = fixup($args[2]) if $args[2] =~ /[a-zA-Z]/; -} - print pack l, $opcodes{$opcode}{CODE}; foreach (0..$#args) { - $args[$_] =~ s/^[INPS]?(\d+)$/$1/i; - my $type = $pack_type{$opcodes{$opcode}{TYPES}[$_]}; + my($rtype)=$opcodes{$opcode}{RTYPES}[$_]; + my($type)=$opcodes{$opcode}{TYPES}[$_]; + if($rtype eq I || $rtype eq N || $rtype eq P || $rtype eq S) { + # its a register argument + $args[$_]=~s/^[INPS](\d+)$/$1/i; + } elsif($rtype eq D) { + # a destination + $args[$_]=fixup($args[$_]); + } print pack $type, $args[$_]; } $pc += 1+@args; @@ -112,7 +118,10 @@ for (@constants) { $size += 4*$sizeof_packi; $size += length($_); -$size += length($_) % $sizeof_packi; # Padding + my($pad)=length($_) % $sizeof_packi; + if($pad) { + $size+=$sizeof_packi-$pad; + } } $size += $sizeof_packi if @constants; # That's for the number of constants @@ -127,6 +136,9 @@ print pack($pack_type{i},0) x 3; # Flags, encoding, type print pack($pack_type{i},length($_)); # Strlen followed by that many bytes. print $_; -print \0 x (length($_) % $sizeof_packi); # Padding; + my $pad=(length($_) % $sizeof_packi); + if($pad) { + print \0 x ($sizeof_packi-(length($_) % $sizeof_packi)); # Padding; + } } } Index: bytecode.c === RCS file: /home/perlcvs/parrot/bytecode.c,v retrieving revision 1.4 diff -u -r1.4 bytecode.c --- bytecode.c 2001/09/10 21:47:26 1.4 +++ bytecode.c 2001/09/11 03:14:33 @@ -79,6 +79,7 @@ IV encoding = GRAB_IV(program_code); IV type = GRAB_IV(program_code); IV buflen = GRAB_IV(program_code); + int pad; len -= 4 * sizeof(IV); @@ -87,9 +88,11 @@ len -= buflen; /* Padding */ -if (buflen % sizeof(IV)) { -len -= buflen % sizeof(IV); -(char*)*program_code += buflen % sizeof(IV); + pad=buflen % sizeof(IV); + if(pad) { + pad=sizeof(IV)-pad; + len -= pad; + (char*)*program_code += pad; } num--; if (len 0 ||
parrot question
While waiting for Parrot (dammit, I took the wrong week off), I've been scanning the various documents and samples which have been floating around on the list. Is there a document describing Parrot syntax yet? Or is that a will be released on monday thing as well? Brian Wheeler [EMAIL PROTECTED]