Re: More speed trials
On Saturday 06 October 2001 02:04 am, Gibbs Tanton - tgibbs wrote:
> I think that changing from a function based implementation to a switch
> based implementation will help on many platforms. Someone did a patch on
> that, maybe we could update it and commit it. Having to go through two
> indirections and two array accesses to access a register probably doesn't
> help much either, although it won't be easy to get around that. Apart from
> that, there is not much else to be done. We can reprofile, but the only
> thing being executed are integer additions and comparisons...you can't get
> much more basic than that...and you are right, we are running much too
> slowly.

I don't know.  I think the loop may actually be too tight, which a switch
won't necessarily help.  But I'm all for reducing the overhead of
unnecessary function calls.  That's my project for the weekend.  (Well,
that and the summary.)

As far as the double indirection, I moved one out of the loop and it
actually slowed down.  Just an hour ago, I experimented with having each
function return the offset (vice the address) of the next opcode, in hopes
that that might help with some performance within the loop.  (Since the
compiler can then guess that you may be doing pointer arithmetic around
the address you previously had, vice having to dereference some random
pointer.  Or so the theory goes.  But that also slowed things down.) -:

-- 
Bryan C. Warnock
[EMAIL PROTECTED]
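For readers new to the thread, a minimal C sketch of the two dispatch styles being compared — function-pointer dispatch versus a switch — may help. All names here are hypothetical illustrations, not Parrot's actual opcode table:

```c
#include <assert.h>

/* Hypothetical three-op instruction set. */
enum { OP_ADD, OP_SUB, OP_END };

/* Function-based dispatch: one indirect call per opcode. */
typedef long (*op_fn)(long acc, long operand);
static long op_add(long acc, long operand) { return acc + operand; }
static long op_sub(long acc, long operand) { return acc - operand; }
static op_fn op_table[] = { op_add, op_sub };

static long run_functions(const int *ops, long acc, long operand)
{
    for (; *ops != OP_END; ops++)
        acc = op_table[*ops](acc, operand); /* indirect call every op */
    return acc;
}

/* Switch-based dispatch: the compiler can emit a jump table and keep
   everything in one stack frame, avoiding call/return overhead. */
static long run_switch(const int *ops, long acc, long operand)
{
    for (; *ops != OP_END; ops++) {
        switch (*ops) {
        case OP_ADD: acc += operand; break;
        case OP_SUB: acc -= operand; break;
        }
    }
    return acc;
}

/* A tiny demo program: acc = ((10 + 3) + 3) - 3. */
static const int demo_prog[] = { OP_ADD, OP_ADD, OP_SUB, OP_END };
```

Both loops compute the same result; the difference under discussion is purely dispatch overhead.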
Re: Test::...
On Sat, Oct 06, 2001 at 08:19:16AM -0400, Michael Fischer wrote:
> Why were they there?

Because 'l!' does exactly the right thing on Perl version 5.6.0 and above,
and 'l' is not guaranteed to do exactly the right thing.

-- 
It starts like fascination, it ends up like a trance
You've gotta use your imagination on some of that magazine romance
And these bones--they don't look so good to me
Jokers talk and they all disagree
One day soon, I will laugh right in the face of the poison moon
RE: acceptable development environments/tools...?
[EMAIL PROTECTED]:
# Sorry to disrupt your discussion with some loosely related
# question... Could anyone help me determine which development
# tools/IDEs are to be used when hacking at Parrot?

Whatever strikes your fancy.  You could even use ed if you wanted
to--although I wouldn't recommend it.  :^)  As long as it reads and writes
ASCII and doesn't mangle the files, it's okay to use.

# As you'd have guessed it, I'm relatively new to this project...
# ;-).  However, I do want to help out with the effort
# to code a great 'system' that any other developer could
# benefit from (including myself :).

Welcome to the team then.  :^)

# Say, at home, I'm working on a Windows (ME) system.  The IDEs
# I have at my disposal are the MS Visual Studio (C++), CodeWarrior,
# and some old DOS based C compilers.  I've got CVS all set up on my
# side so retrieving a recent copy of the working files from the
# Parrot cvs root shouldn't be a problem.  I'm also thinking
# of moving to a Unix based system in a short while (since I'm used
# to coding on a Solaris box at work).

I'm also on Win32 (Win2000, to be exact).  I use WinCVS with two
directories: parrot and parrot-cvs.  parrot is my working copy and
parrot-cvs is what's currently on CVS.  I use Visual Studio.NET beta 2
($13 from MS) as my editor, since ActiveState has a nifty Perl code editor
plugin for it, and most of my work (I mostly muck with Configure, but wade
into the C once in a while) is with Perl code.
The actual directory structure looks like this:

+--+ Perl 6
   +--+ parrot
   |  +--- .vcproj and related files
   |  +--+ parrot
   |     +--- working copy of CVS files
   |
   +--+ parrot-cvs
   |  +--+ parrot
   |     +--- pure copy of CVS files
   |
   +--+ babyperl
   |  +--- files related to babyperl
   |
   +--+ smoke
      +--- remote smoke-testing stuff

I use PPT (Perl Power Tools--look around on the CPAN) to get various Unix
utilities (although I haven't gotten their patch program to work well,
diff works okay--I just apply patches via an SSH connection to a BSD box).
Of course, you may choose a different solution than my cobbled-together
collection of free and nonfree software--I've heard Cygwin works well for
this sort of thing.

Whatever tools you use, make sure you have fun working on Parrot.  That
is, after all, what it's all about.

--Brent Dax
[EMAIL PROTECTED]
Configure pumpking for Perl 6

They *will* pay for what they've done.
Re: Perl6 Tinderbox
On Sat 06 Oct 2001 :58, Michael G Schwern [EMAIL PROTECTED] wrote:
> On Fri, Oct 05, 2001 at 05:18:07PM -0700, Zach Lipton wrote:
> > Because the need for a tinderbox testing platform is fairly urgent
> > right now for perl6, I am releasing my (place your favorite adjective
> > in the blank here) tinderbox client for perl6 ahead of the
> > near-rewrite that I am working on to use Devel::Tinderbox::Reporter
> > (which was just written) and Test::Smoke (which wouldn't help perl6
> > all that much anyway).
>
> There's an existing Parrot::Smoke module, I forget where it is off hand.

CPAN/authors/id/M/MB/MBARBON/Parrot-Smoke-0.02.tar.gz I'd expect

-- 
H.Merijn Brand        Amsterdam Perl Mongers (http://www.amsterdam.pm.org/)
using perl-5.6.1, 5.7.1 & 623 on HP-UX 10.20 & 11.00, AIX 4.2, AIX 4.3,
WinNT 4, Win2K pro & WinCE 2.11 often with Tk800.022 &/| DBD-Unify
ftp://ftp.funet.fi/pub/languages/perl/CPAN/authors/id/H/HM/HMBRAND/
vtable.h
I've just committed some files which generate vtable.h; these were actually left over from my experiments of a *long* time ago. [1] It might need quite a few changes, but it's a good start, and I think it's general enough to survive. The next thing I want to do with it is have something akin to process_op_func.pl which takes a macro-ized description of the vtable functions and turns them into real C code. Volunteers welcome, or I'll write it myself. :) This could be a good place, however, for newcomers to Parrot to get involved with something relatively straightforward but pretty crucial. Hint, hint... Simon [1] In short, back in the beginning, Dan and I independently started implementing Parrot; Dan's version was more complete and sensible by the time we got together to discuss it, so his became the codebase that was checked into CVS. Dan had started with the interpreter main loop and ops, and I had started with PMCs. -- Actually Perl *can* be a Bondage Discipline language but it's unique among such languages in that it lets you use safe words. -- Piers Cawley
RE: vtable.h
For add we will end up with:

    void add( PMC* self, PMC* left, PMC* right );

Does this represent

    self = left + right

or some other ordering?
RE: vtable.h
Two other things:

1.) Will each different type of PMC have its own vtable, function
definitions, etc., or will they all share everything, with switches on
type in the function definitions?

2.) Can you give an idea of what you think the macro-ized function should
look like (an example would be great)?

Thanks!
Tanton
Re: vtable.h
On Sat, Oct 06, 2001 at 08:11:30AM -0500, Gibbs Tanton - tgibbs wrote:
>     void add( PMC* self, PMC* left, PMC* right );
> does this represent: self = left + right

Yes.

-- 
UNIX was not designed to stop you from doing stupid things, because that
would also stop you from doing clever things.
    -- Doug Gwyn
RE: vtable.h
> > 2.) Can you give an idea of what you think the macro-ized function
> > should look like (an example would be great)?
>
> No, because then you'll go away and implement it, and I want to
> encourage some fresh blood to do that. :)

Okey Dokey...I promise not to do it :)

> Seriously, before I do that, I need to seriously think about what
> vtable accessors ought to look like;
>
>     (pmc1->vtable[want_vtbl_add])(pmc1, pmc2, pmc3)
>
> is going to scare people away quickly, and, while
>
>     PMC_ADD(pmc1, pmc2, pmc3)
>
> is cute (and allows us to autogenerate Parrot byte ops ;), Macro Hell
> is something we want to avoid.

Well, you currently have vtable as a struct, so you would say

    pmc1->vtable->add( pmc1, pmc2, pmc3 )

which doesn't look that bad.  Really, I would imagine all of this would be
autogenerated by process_opfunc.pl, so it doesn't matter what the longhand
looks like.  We can use PMC_ADD in basic_opcodes.ops just like we use
INT_CONST or whatever, and the macro is stripped out of the perl.

Also, how will adds of different types be handled?  In the above, if pmc2
is an int and pmc3 is a float, we're going to have to know that and do a
switch or something to convert to/create the right type.

Tanton
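For concreteness, here is a rough C sketch of the two accessor spellings being weighed here — the longhand struct access and the macro sugar over it. This uses a toy PMC layout with a single vtable entry, not Parrot's real one:

```c
#include <assert.h>

typedef struct PMC PMC;

/* Toy vtable: a struct of function pointers, one entry per operation. */
typedef struct {
    void (*add)(PMC *self, PMC *left, PMC *right);
} VTABLE;

struct PMC {
    VTABLE *vtable;
    long    ival;
};

/* The macro spelling: thin sugar over the longhand struct access. */
#define PMC_ADD(s, l, r) ((s)->vtable->add((s), (l), (r)))

/* self = left + right, the ordering confirmed earlier in the thread. */
static void int_add(PMC *self, PMC *left, PMC *right)
{
    self->ival = left->ival + right->ival;
}

static VTABLE int_vtable = { int_add };

static long demo_add(long a, long b)
{
    PMC self  = { &int_vtable, 0 };
    PMC left  = { &int_vtable, a };
    PMC right = { &int_vtable, b };
    PMC_ADD(&self, &left, &right);  /* == self.vtable->add(&self, ...) */
    return self.ival;
}
```

The macro and the longhand compile to identical code; the debate is purely about which spelling readers and external embedders will find less scary.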
[patch] give Configure a policy
I've modified Configure.pl to take defaults from a previous build (if
there was one).  This should play nicely with hints, and '--defaults', by
doing the Right Thing.  I've added a '--nopolicy' option to disable this.

Patch below sig.

Alex Gough
-- 
W.W- A little nonsense now and then
is relished by the wisest men.
##
Index: Configure.pl
===================================================================
RCS file: /home/perlcvs/parrot/Configure.pl,v
retrieving revision 1.23
diff -u -r1.23 Configure.pl
--- Configure.pl	2001/10/04 20:19:38	1.23
+++ Configure.pl	2001/10/06 14:07:16
@@ -8,9 +8,11 @@
 use Getopt::Long;
 use ExtUtils::Manifest qw(manicheck);
 
-my($opt_debugging, $opt_defaults, $opt_version, $opt_help) = (0, 0, 0, 0);
+my($opt_debugging, $opt_defaults, $opt_version,
+   $opt_help, $opt_nopolicy) = (0) x 5;
 my(%opt_defines);
 my $result = GetOptions(
+	'nopolicy'   => \$opt_nopolicy,
 	'debugging!' => \$opt_debugging,
 	'defaults!'  => \$opt_defaults,
 	'version'    => \$opt_version,
@@ -29,6 +31,7 @@
 Options:
 	--debugging          Enable debugging
 	--defaults           Accept all default values
+	--nopolicy           Do not take values from previous build
 	--define name=value  Defines value name as value
 	--help               This text
 	--version            Show assembler version
@@ -60,25 +63,39 @@
 
 #Some versions don't seem to have ivtype or nvtype--provide
 #defaults for them.
 #XXX Figure out better defaults
+my %policy;
+unless ($opt_nopolicy) {
+    eval '
+	require Parrot::Config;
+	%policy = %Parrot::Config::PConfig;
+    ';
+    if ($@) {
+	print "No policy available, using defaults\n";
+    }
+    else {
+	print "Using defaults from earlier build\n";
+	$policy{__have_policy} = 1;
+    }
+}
 my(%c)=(
-	iv =>       ($Config{ivtype} || 'long'),
+	iv =>       ($policy{ivtype} || $Config{ivtype} || 'long'),
 	intvalsize => undef,
-	nv =>       ($Config{nvtype} || 'double'),
+	nv =>       ($policy{nvtype} || $Config{nvtype} || 'double'),
 	numvalsize => undef,
-	opcode_t => ($Config{ivtype} || 'long'),
+	opcode_t => ($policy{ivtype} || $Config{ivtype} || 'long'),
 	longsize => undef,
-	cc =>       $Config{cc},
+	cc =>       ($policy{cc} || $Config{cc}),
 	#ADD C COMPILER FLAGS HERE
-	ccflags =>  $Config{ccflags}." -I./include",
-	libs =>     $Config{libs},
+	ccflags =>  ($policy{ccflags} || $Config{ccflags}." -I./include"),
+	libs =>     ($policy{libs} || $Config{libs}),
 	cc_debug => '-g',
 	o =>        '.o',	# object files extension
-	exe =>      $Config{_exe},
+	exe =>      ($policy{exe} || $Config{_exe}),
 
-	ld =>       $Config{ld},
+	ld =>       ($policy{ld} || $Config{ld}),
 	ld_out =>   '-o ',	# ld output file
 	ld_debug => '',	# include debug info in executable
@@ -91,8 +108,9 @@
 @c{keys %opt_defines}=@opt_defines{keys %opt_defines};
 
 # set up default values
+# don't need these if previously compiled, can take from Parrot::Config
 my $hints = "hints/" . lc($^O) . ".pl";
-if (-f $hints) {
+if (!$policy{__have_policy} && -f $hints) {
 	local($/);
 	open HINT, $hints or die "Unable to open hints file '$hints'";
 	my $hint = <HINT>;
Re: More speed trials
On Sat, Oct 06, 2001 at 12:44:59AM -0400, Bryan C. Warnock wrote:
> Ops/sec: 31,716,577.291820

Wowsers.  What are you running that thing on?  For comparison, on this
machine:

    Parrot  ops/sec:  500.00
    Python2 ops/sec:  3289276.607351

(Python 1 is slightly faster - at the moment.)  That's not fast enough;
once PMCs get introduced, that advantage is going to fall away.  What are
we doing wrong? :(  Python uses a switch, however, maybe that's it.

-- 
pudge i've dreamed in Perl many time, last night i dreamed in Make,
and that just sucks.
Re: vtable.h
On Sat, Oct 06, 2001 at 09:01:34AM -0500, Gibbs Tanton - tgibbs wrote:
> which doesn't look that bad.  Really, I would imagine all of this would
> be autogenerated by process_opfunc.pl so it doesn't matter what the
> longhand looks like.

Not really; I expect that external code will also manipulate PMCs.

> Also, how will adds of different types be handled.  In the above if
> pmc2 is an int and pmc3 is a float we're going to have to know that and
> do a switch or something to convert to/create the right type.

There'll actually (and I need to change my vtable code to reflect this) be
several versions of each vtable function, depending on the relative type
of each PMC.  Basically, there'll be two easily optimizable versions (i.e.
types are the same, or types can be easily converted with a cast or simple
function) and a non-optimized version, which would actually be the naive
implementation in many cases.  ("These types are way out of my depth -
call ->get_integer on each one, and add the result.")  I didn't think
that up, by the way; it was Dan's idea. :)

-- 
Oh dear. I've just realised that my fvwm config lasted longer than my
marriage, in that case.
    - Anonymous
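A rough C sketch of that scheme — one fast same-type path plus the naive ->get_integer fallback. The types and function names below are toy illustrations, not Dan's actual design:

```c
#include <assert.h>

typedef struct PMC PMC;

typedef struct {
    long (*get_integer)(PMC *self);
    void (*add)(PMC *self, PMC *left, PMC *right);
} VTABLE;

struct PMC {
    VTABLE *vtable;
    long    ival;
    double  nval;
};

static long int_get_integer(PMC *p) { return p->ival; }
static long num_get_integer(PMC *p) { return (long) p->nval; }

/* Optimizable version: both operands are known to be integers. */
static void add_int_int(PMC *self, PMC *left, PMC *right)
{
    self->ival = left->ival + right->ival;
}

/* Non-optimized version: "call ->get_integer on each one, and add the
   result" -- works for any pair of types that can answer get_integer. */
static void add_naive(PMC *self, PMC *left, PMC *right)
{
    self->ival = left->vtable->get_integer(left)
               + right->vtable->get_integer(right);
}

static VTABLE int_vtable = { int_get_integer, add_int_int };
static VTABLE num_vtable = { num_get_integer, add_naive };

static long demo_mixed_add(long a, double b)
{
    PMC self  = { &int_vtable, 0, 0.0 };
    PMC left  = { &int_vtable, a, 0.0 };
    PMC right = { &num_vtable, 0, b };
    add_naive(&self, &left, &right);  /* type pair unknown: fall back */
    return self.ival;
}
```

The fast path avoids the two extra indirect calls; the fallback trades speed for working across any type combination.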
Re: More speed trials
On Saturday 06 October 2001 10:58 am, Dan Sugalski wrote:
> It's the function pointer indirection, to some extent.  The switch
> dispatch loop should help some.  Also I don't think you should make too
> many performance comparisons until we've got something equivalent to
> compare with.

Unless we're already slower. ;-)  (Which is what I wanted to check.)

-- 
Bryan C. Warnock
[EMAIL PROTECTED]
Re: More speed trials
At 11:36 AM 10/6/2001 -0400, Bryan C. Warnock wrote:
> On Saturday 06 October 2001 10:58 am, Dan Sugalski wrote:
> > It's the function pointer indirection, to some extent.  The switch
> > dispatch loop should help some.  Also I don't think you should make
> > too many performance comparisons until we've got something equivalent
> > to compare with.
>
> Unless we're already slower. ;-)

True, true.  But we're not, which is good.  A 200% speed improvement's
sort of good, depending on what ops we execute.  And it looks like we
execute considerably more iterations of the actual loop than perl 5 does,
which is also good.

> (Which is what I wanted to check.)

I'm glad you did.  Rational benchmarks are good, even if they tell us
something we don't want to hear... :)

					Dan

--------------------------------"it's like this"-------------------------
Dan Sugalski                          even samurai
[EMAIL PROTECTED]                     have teddy bears and even
                                      teddy bears get drunk
Re: More speed trials
At 05:46 PM 10/6/2001 +0200, Paolo Molaro wrote:
> I get about the same number for Parrot on my K6-400, but compiling with
> -O2 gets it up to 11,500,000, so maybe you forgot to use -O2 or it may
> be the laptop in power-saving mode :-)
> The current mono interp can do at least twice that many ops using a
> switch.

Ah, that's the number I wanted.  So mono manages 23M ops/sec, then?

					Dan

--------------------------------"it's like this"-------------------------
Dan Sugalski                          even samurai
[EMAIL PROTECTED]                     have teddy bears and even
                                      teddy bears get drunk
Re: More speed trials
On Oct 06, Gibbs Tanton - tgibbs [EMAIL PROTECTED] took up a keyboard and
banged out
> I think that changing from a function based implementation to a switch
> based implementation will help on many platforms.  Someone did a patch
> on that, maybe we could update it and commit it.

I'll revise it to fit the current state of the code and kick it back out
to you and the list before Monday AM.

Michael
-- 
Michael Fischer                      7.5 million years to run
[EMAIL PROTECTED]                    printf "%d", 0x2a;
                                         -- deep thought
[PATCH] vtable.tbl: REGEX pointer
Index: vtable.tbl
===================================================================
RCS file: /home/perlcvs/parrot/vtable.tbl,v
retrieving revision 1.1
diff -u -r1.1 vtable.tbl
--- vtable.tbl	2001/10/06 12:41:57	1.1
+++ vtable.tbl	2001/10/06 16:56:14
@@ -35,5 +35,5 @@
 void logical_or	PMC* left	PMC* right
 void logical_and	PMC* left	PMC* right
 void logical_not	PMC* left
-void match	PMC* left	REGEX re
+void match	PMC* left	REGEX* re
 void repeat	PMC* left	PMC* right
-- 
Bryan C. Warnock
[EMAIL PROTECTED]
Re: vtable.h
On Sat, 6 Oct 2001, Simon Cozens wrote:
> On Sat, Oct 06, 2001 at 09:01:34AM -0500, Gibbs Tanton - tgibbs wrote:
> > Also, how will adds of different types be handled.  In the above if
> > pmc2 is an int and pmc3 is a float we're going to have to know that
> > and do a switch or something to convert to/create the right type.
>
> There'll actually (and I need to change my vtable code to reflect this)
> be several versions of each vtable function, depending on the relative
> type of each PMC.  Basically, there'll be two easily optimizable
> versions (i.e. types are the same, or types can be easily converted
> with a cast or simple function) and a non-optimized version, which
> would actually be the naive implementation in many cases.  ("These
> types are way out of my depth - call ->get_integer on each one, and
> add the result.")

So would it be something like (ultimately put into a macro):

AUTO_OP add_p_p_p {
  if (!P1) CREATE_PMC(P1);
  if (!P2 || !P3) throw exception; // however this is done in Parrot
  P2->vtable->add[P3->type](interp, P1, P2, P3); // in macro
}

In this way each vtable operation is really an array of handlers for each
possible type of input.  This avoids any comparisons.  Invalid
combinations all share a common function (which throws an invalid
data-intermingling exception).

int_pmc_vtable = {
  ...,
  {
    pmc_vtable_add_int_int,
    pmc_vtable_add_int_float,
    pmc_vtable_add_int_string,
    pmc_vtable_add_int_iconst,
    pmc_vtable_add_int_fconst,
    ...
  },
  ...
};
// maps to RET_DT pmc_vtable_add_int_int(interp_t*, PMC*, PMC*, PMC*)

AUTO_VOP add_int_int {
  UPGRADE(P1, PMC_INT);
  P1->ival = P2->ival + P3->ival;
}

AUTO_VOP add_int_float {
  UPGRADE(P1, PMC_INT);
  P1->ival = P2->ival + P3->fval;
}

AUTO_VOP add_int_iconst {
  UPGRADE(P1, PMC_INT);
  P1->ival = P2->ival + P3;
}

AUTO_VOP add_int_fconst {
  UPGRADE(P1, PMC_INT);
  P1->ival = P2->ival + Parrot_float_constants[P3];
}

AUTO_VOP add_int_string {
  UPGRADE(P3, PMC_INT);
  UPGRADE(P1, PMC_INT);
  P1->ival = P2->ival + P3->ival;
}

Alternatively, if we can't be both a string AND an int in a PMC:

AUTO_VOP add_int_string {
  int p3ival = PMC_STR_TO_INT(P3);
  UPGRADE(P1, PMC_INT);
  P1->ival = P2->ival + p3ival;
}

This assumes that a = b op c will be the same as a = b.op( c ), which I
think is fair.  Thus add_float_int produces a float while add_int_float
produces an int.  The compiler can worry about the order of the
parameters.  I don't think there's much value in writing a separate
a op= b, since you could just do:

P1->vtable->add[P1->type](interp, P1, P1, P2);

with hardly any additional overhead.  The optimized code might have been:

AUTO_VOP inc_int_int {
  P1->ival += P2->ival; // avoids casting P1
}

But now you have LOTS more vtable ops.

My question at this point is if the PMCs are polymorphic like Perl5 or if
there is an explicit type.  Polymorphics can make for very large vtable
sub-arrays (int, int_float, int_float_string, int_string, etc).  If
PMC-types are bit-masked (for easy upgrading) such as:

  O   O     O    O
      ^     ^    ^
      |     |    |
...  INT  FLOAT STR

we could apply a macro that extracts the desired type, such as
GET_PMC_TYPE_INT(Px), which returns int if it is of type int, else float,
else string:

#define GET_PMC_TYPE_INT(type) \
    (type & PMC_INT)    ? PMC_INT    : \
    (type & PMC_FLOAT)  ? PMC_FLOAT  : \
    (type & PMC_STRING) ? PMC_STRING : type

Likewise GET_PMC_TYPE_FLOAT would return first float, then int, then
string.  It's not as fast because we're not avoiding the nested
if-statements, but it's easy enough to read.

P2->vtable->add[ GET_PMC_TYPE_INT(P3->type) ](...)

Ideally, the bit-pattern for the pmc-type is numerically small (for small
sub-arrays):

enum PMC_TYPES {
  PMC_INT,
  PMC_FLOAT,
  PMC_STR,
  PMC_INT_FLOAT,
  PMC_INT_STR,
  PMC_INT_FLOAT_STR,
  PMC_FLOAT_STR,
  ...
};

In this way we simply map everything that has INT in it to the same
handler.  No conditionals at all (but lots and lots of vtable space).
Thankfully this is constant and could be assigned globally such that
there is no initialization overhead.

-Michael
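The per-type handler array idea above can be sketched as a compilable toy — two types, one operation, dispatch by a single array index with no comparisons. All names here are illustrative, not Parrot's:

```c
#include <assert.h>

enum { T_INT, T_FLOAT, N_TYPES };

typedef struct PMC PMC;
typedef void (*add_fn)(PMC *dest, PMC *left, PMC *right);

struct PMC {
    int           type;
    const add_fn *add;   /* per-type handler array, indexed by the right
                            operand's type -- dispatch needs no branches */
    long          ival;
    double        fval;
};

static void add_int_int(PMC *d, PMC *l, PMC *r)
{
    d->ival = l->ival + r->ival;
}

static void add_int_float(PMC *d, PMC *l, PMC *r)
{
    d->ival = l->ival + (long) r->fval;  /* left operand's semantics win */
}

static const add_fn int_add_tbl[N_TYPES] = { add_int_int, add_int_float };

static long demo_dispatch(long a, double b)
{
    PMC d = { T_INT,   int_add_tbl, 0, 0.0 };
    PMC l = { T_INT,   int_add_tbl, a, 0.0 };
    PMC r = { T_FLOAT, 0,           0, b };
    l.add[r.type](&d, &l, &r);  /* one array index, no switch, no ifs */
    return d.ival;
}
```

The cost of the scheme is exactly what the message says: the table grows as (number of types)^2 per operation, trading memory for branch-free dispatch.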
Re: vtable.h
On Sat, 6 Oct 2001, Michael Maraist wrote:
> We could apply a macro that extracts the desired type, such as
> GET_PMC_TYPE_INT(Px) ...
>
> P2->vtable->add[ GET_PMC_TYPE_INT(P3->type) ](...)

Oops!  Stupid me.  I forgot that at the add_p_p_p level we don't know that
it's an INT / FLOAT, etc.  The only way that we could use bit-masked types
is if we wrote a complex if statement:

( P2 & PMC_INT )     ? ( P3 & PMC_INT   ? PMC_INT
                       : P3 & PMC_FLOAT ? PMC_FLOAT
                       : PMC_STR )
: ( P2 & PMC_FLOAT ) ? ( P3 & PMC_FLOAT ? PMC_FLOAT
                       : P3 & PMC_INT   ? PMC_INT
                       : PMC_STR )
: ( P2 & PMC_STR )   ? ( P3 & PMC_STR   ? PMC_STR
                       : ...

This is beyond ugly, not to mention not upgradable if/when we add new
types.  So unless we're using an enum-style type-compaction, doubly
indirected vtables won't be feasible.

-Michael
Re: vtable.h
On Saturday 06 October 2001 01:13 pm, Michael Maraist wrote:
> So would it be something like (ultimately put into a macro):
>
> AUTO_OP add_p_p_p {
>   if (!P1) CREATE_PMC(P1);
>   if (!P2 || !P3) throw exception; // however this is done in Parrot
>   P2->vtable->add[P3->type](interp, P1, P2, P3); // in macro
> }
>
> In this way each vtable operation is really an array of handlers for
> each possible type of input.

Arghh, no.  Surely you don't mean 'each possible' the way that I'm
reading 'each possible'? [1]

> int_pmc_vtable = {
>   ...,
>   {
>     pmc_vtable_add_int_int,
>     pmc_vtable_add_int_float,
>     pmc_vtable_add_int_string,
>     pmc_vtable_add_int_iconst,
>     pmc_vtable_add_int_fconst,
>     ...
>   },
>   ...
> };

And each and every class, object, and package that potentially creates
and/or modifies its vtable.

{snip}

> This assumes that a = b op c will be the same as a = b.op( c ), which I
> think is fair.  Thus add_float_int produces a float while add_int_float
> produces an int.  The compiler can worry about the order of the
> parameters.

add_int_float should also produce a float.  (Barring 'use integer',
string typing, or overloading.)

{snip}

> My question at this point is if the PMCs are polymorphic like Perl5 or
> if there is an explicit type.  Polymorphics can make for very large
> vtable sub-arrays (int, int_float, int_float_string, int_string, etc).

Polymorphic plus, I believe.

[1] And don't call me Shirley.

-- 
Bryan C. Warnock
[EMAIL PROTECTED]
printf format strings
What are our short- and long-term goals for handling printf formats for
configurable types?  This fixes the ones not dependent on the answer
above.  I'm also wrapping some lengthy lines.

And why aren't the coding standards up on dev.perl.org?

-- 
Bryan C. Warnock
[EMAIL PROTECTED]

Index: packfile.c
===================================================================
RCS file: /home/perlcvs/parrot/packfile.c,v
retrieving revision 1.10
diff -u -r1.10 packfile.c
--- packfile.c	2001/10/06 00:57:43	1.10
+++ packfile.c	2001/10/06 17:29:56
@@ -1689,10 +1689,15 @@
     case PFC_STRING:
         printf("[ 'PFC_STRING', {\n");
         printf("    FLAGS    => 0x%04x,\n", self->string->flags);
-        printf("    ENCODING => %ld,\n", (long) self->string->encoding->which);
-        printf("    TYPE     => %ld,\n", (long) self->string->type);
-        printf("    SIZE     => %ld,\n", (long) self->string->bufused);
-        printf("    DATA     => '%s'\n", self->string->bufstart); /* TODO: Not a good idea in general */
+        printf("    ENCODING => %ld,\n",
+               (long) self->string->encoding->which);
+        printf("    TYPE     => %ld,\n",
+               (long) self->string->type);
+        printf("    SIZE     => %ld,\n",
+               (long) self->string->bufused);
+        /* TODO: Not a good idea in general */
+        printf("    DATA     => '%s'\n",
+               (char *) self->string->bufstart);
         printf("} ],\n");
         break;
Index: test_main.c
===================================================================
RCS file: /home/perlcvs/parrot/test_main.c,v
retrieving revision 1.14
diff -u -r1.14 test_main.c
--- test_main.c	2001/10/06 00:57:43	1.14
+++ test_main.c	2001/10/06 17:29:56
@@ -46,13 +46,22 @@
     int i;
     time_t foo;
 
-    printf("String %p has length %i: %.*s\n", s, (int) string_length(s), (int) string_length(s), (char *) s->bufstart);
+    printf("String %p has length %i: %.*s\n", (void *) s,
+           (int) string_length(s), (int) string_length(s),
+           (char *) s->bufstart);
     string_concat(s, t, 0);
-    printf("String %p has length %i: %.*s\n", s, (int) string_length(s), (int) string_length(s), (char *) s->bufstart);
+    printf("String %p has length %i: %.*s\n", (void *) s,
+           (int) string_length(s), (int) string_length(s),
+           (char *) s->bufstart);
     string_chopn(s, 4);
-    printf("String %p has length %i: %.*s\n", s, (int) string_length(s), (int) string_length(s), (char *) s->bufstart);
+    printf("String %p has length %i: %.*s\n", (void *) s,
+           (int) string_length(s), (int) string_length(s),
+           (char *) s->bufstart);
     string_chopn(s, 4);
-    printf("String %p has length %i: %.*s\n", s, (int) string_length(s), (int) string_length(s), (char *) s->bufstart);
+    printf("String %p has length %i: %.*s\n", (void *) s,
+           (int) string_length(s), (int) string_length(s),
+           (char *) s->bufstart);
+
     foo = time(0);
     for (i = 0; i < 1; i++) {
         string_concat(s, t, 0);
[PATCH] non-init var possibility
mask and max_to_alloc are uninitialized if the size requested is less than
1.  (Which it could be, since INTVAL is signed.)  Of course, if it
happens, you should get what you deserve, but this at least horks them
cleanly.  Creation of an UINTVAL (UNTVAL? :-) and subsequent patches will
follow pending feedback.

Is the behavior of malloc(0) consistent?

Index: memory.c
===================================================================
RCS file: /home/perlcvs/parrot/memory.c,v
retrieving revision 1.12
diff -u -r1.12 memory.c
--- memory.c	2001/10/06 00:57:43	1.12
+++ memory.c	2001/10/06 17:39:55
@@ -40,8 +40,8 @@
  */
 void *
 mem_allocate_aligned(INTVAL size) {
-    ptrcast_t max_to_alloc;
-    ptrcast_t mask;
+    ptrcast_t max_to_alloc = 0;
+    ptrcast_t mask = 0;
     ptrcast_t i;
     void *mem = NULL;
-- 
Bryan C. Warnock
[EMAIL PROTECTED]
[PATCH] packfile.c another uninit var potential
Index: packfile.c
===================================================================
RCS file: /home/perlcvs/parrot/packfile.c,v
retrieving revision 1.10
diff -u -r1.10 packfile.c
--- packfile.c	2001/10/06 00:57:43	1.10
+++ packfile.c	2001/10/06 17:53:04
@@ -1507,11 +1507,12 @@
 
     if (!self) {
         /* TODO: OK to gloss over this? */
-        return 0;
+        return (opcode_t) 0;
     }
 
     switch(self->type) {
     case PFC_NONE:
+        packed_size = 0;
         break;
 
     case PFC_INTEGER:
@@ -1533,12 +1534,17 @@
         break;
 
     default:
+        packed_size = 0;
         break;
     }
 
     /* Tack on space for the initial type and size fields */
-
-    return packed_size + 2 * sizeof(opcode_t);
+    if (packed_size) {
+        return packed_size + 2 * sizeof(opcode_t);
+    }
+    else {
+        return 0;
+    }
 }
-- 
Bryan C. Warnock
[EMAIL PROTECTED]
RE: [PATCH] non-init var possibility
No, the behavior of malloc(0) is implementation defined.
RE: [PATCH] vtable.tbl: REGEX pointer
Thanks!  Applied.
RE: vtable.h
I could help you with the process_op_func.pl thing.  Unless you've
already coded it yourself! :-)

Cheers,
Vladimir Bogdanov.

ps. I have to figure out how to get the WinCVS thing to work... can't
seem to be able to access cvs.perl.org.  I've used the following setup:

CVSROOT = :pserver:[EMAIL PROTECTED]:/home/perlcvs
Authentication = passwd file on the remote host.

Any suggestion on how to make it work?
RE: [PATCH] non-init var possibility
In message [EMAIL PROTECTED] Gibbs Tanton - tgibbs [EMAIL PROTECTED] wrote: No, the behavior of malloc(0) is implementation defined. It is, yes, but there are only two legal results according to the ISO C standard: If the size of the space requested is zero, the behavior is implementation-defined: either a null pointer is returned, or the behavior is as if the size were some nonzero value, except that the returned pointer shall not be used to access an object. In other words, it can't crash or do anything else undesirable, and the result will always be something that can't be dereferenced, but can be freed (given that the standard requires free(NULL) to work). Given that, although we can't say the behaviour is, strictly speaking, consistent, it is true that as far as performing normal operations on the pointer go, you are unlikely to notice which behaviour a given platform has chosen. Tom -- Tom Hughes ([EMAIL PROTECTED]) http://www.compton.nu/
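Tom's reading of the standard can be captured in code. A minimal sketch of one common way to sidestep the implementation-defined case entirely; the helper name here is made up for illustration, it is not Parrot's actual allocator:

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical helper (not Parrot's allocator): normalize zero-size
 * requests so callers never depend on which of the two ISO-C-permitted
 * malloc(0) behaviors the platform chose.  On success the result is
 * always a unique pointer that is safe to pass to free(). */
void *mem_alloc_zero_safe(size_t size)
{
    /* Bump 0 to 1 so we get the "nonzero value" behavior on every
     * conforming implementation, instead of maybe-NULL. */
    return malloc(size ? size : 1);
}
```

Either way, as the post notes, free(NULL) is required to be a no-op, so the cleanup path needs no special case.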
[Patch] Lint, take two.
Here's a replacement for my previous patch. This one includes the following: Makefile target for lint (runs lclint with some very permissive settings) Fixes some ignored return values A few minor casts. --Josh -- Josh Wilmes ([EMAIL PROTECTED]) | http://www.hitchhiker.org Index: Configure.pl === RCS file: /home/perlcvs/parrot/Configure.pl,v retrieving revision 1.23 diff -u -r1.23 Configure.pl --- Configure.pl 2001/10/04 20:19:38 1.23 +++ Configure.pl 2001/10/06 21:17:42 @@ -72,7 +72,8 @@ cc = $Config{cc}, #ADD C COMPILER FLAGS HERE - ccflags = $Config{ccflags}. -I./include, +cc_inc = -I./include, + ccflags = $Config{ccflags}, libs = $Config{libs}, cc_debug = '-g', o = '.o', # object files extension Index: Makefile.in === RCS file: /home/perlcvs/parrot/Makefile.in,v retrieving revision 1.18 diff -u -r1.18 Makefile.in --- Makefile.in 2001/10/06 12:41:57 1.18 +++ Makefile.in 2001/10/06 21:17:42 @@ -9,7 +9,7 @@ #DO NOT ADD C COMPILER FLAGS HERE #Add them in Configure.pl--look for the #comment 'ADD C COMPILER FLAGS HERE' -CFLAGS = ${ccflags} ${cc_debug} +CFLAGS = ${ccflags} ${cc_inc} ${cc_debug} C_LIBS = ${libs} @@ -19,6 +19,9 @@ TEST_PROG = test_prog${exe} PDUMP = pdump${exe} +LINT = lclint +LINTFLAGS = +showscan +posixlib -weak +longintegral +matchanyintegral -formattype + .c$(O): $(CC) $(CFLAGS) -o $@ -c $ @@ -86,3 +89,9 @@ update: cvs -q update -dP + +lint: test_prog pdump + $(LINT) ${cc_inc} $(LINTFLAGS) `echo $(O_FILES) | sed 's/\.o/\.c/g'` + $(LINT) ${cc_inc} $(LINTFLAGS) test_main.c + $(LINT) ${cc_inc} $(LINTFLAGS) pdump.c + Index: basic_opcodes.ops === RCS file: /home/perlcvs/parrot/basic_opcodes.ops,v retrieving revision 1.32 diff -u -r1.32 basic_opcodes.ops --- basic_opcodes.ops 2001/10/06 00:57:43 1.32 +++ basic_opcodes.ops 2001/10/06 21:17:43 @@ -140,7 +140,7 @@ /* TIME Ix */ AUTO_OP time_i { - INT_REG(P1) = time(NULL); + INT_REG(P1) = (INTVAL)time(NULL); } /* PRINT Ix */ @@ -316,7 +316,7 @@ /* TIME Nx */ AUTO_OP time_n { - NUM_REG(P1) = time(NULL); + 
NUM_REG(P1) = (FLOATVAL)time(NULL); } /* PRINT Nx */ Index: interpreter.c === RCS file: /home/perlcvs/parrot/interpreter.c,v retrieving revision 1.23 diff -u -r1.23 interpreter.c --- interpreter.c 2001/10/03 16:21:30 1.23 +++ interpreter.c 2001/10/06 21:17:44 @@ -235,6 +235,8 @@ /* The default opcode function table would be a good thing here... */ { +/*@-castfcnptr@*/ + opcode_t *(**foo)(); foo = mem_sys_allocate(2048 * sizeof(void *)); Index: packfile.c === RCS file: /home/perlcvs/parrot/packfile.c,v retrieving revision 1.10 diff -u -r1.10 packfile.c --- packfile.c 2001/10/06 00:57:43 1.10 +++ packfile.c 2001/10/06 21:17:47 @@ -1306,27 +1306,28 @@ #if TRACE_PACKFILE printf(PackFile_Constant_unpack(): Unpacking no-type constant...\n); #endif +return 1; break; case PFC_INTEGER: #if TRACE_PACKFILE printf(PackFile_Constant_unpack(): Unpacking integer constant...\n); #endif -PackFile_Constant_unpack_integer(self, cursor, size); +return(PackFile_Constant_unpack_integer(self, cursor, size)); break; case PFC_NUMBER: #if TRACE_PACKFILE printf(PackFile_Constant_unpack(): Unpacking number constant...\n); #endif -PackFile_Constant_unpack_number(self, cursor, size); +return(PackFile_Constant_unpack_number(self, cursor, size)); break; case PFC_STRING: #if TRACE_PACKFILE printf(PackFile_Constant_unpack(): Unpacking string constant...\n); #endif -PackFile_Constant_unpack_string(self, cursor, size); +return(PackFile_Constant_unpack_string(self, cursor, size)); break; default: @@ -1335,7 +1336,7 @@ break; } -return 1; +/*NOTREACHED*/ } Index: pdump.c === RCS file: /home/perlcvs/parrot/pdump.c,v retrieving revision 1.3 diff -u -r1.3 pdump.c --- pdump.c 2001/09/30 20:25:22 1.3 +++ pdump.c 2001/10/06 21:17:47 @@ -60,7 +60,10 @@ pf = PackFile_new(); -PackFile_unpack(pf, packed, packed_size); +if (!PackFile_unpack(pf, packed, packed_size)) { +printf( Can't unpack.\n ); +return 1; +} PackFile_dump(pf); PackFile_DELETE(pf); Index: register.c === RCS file: /home/perlcvs/parrot/register.c,v 
retrieving revision 1.10 diff -u -r1.10 register.c ---
Re: More speed trials
1) Assuming a core set of unoverrideable opcodes 0-128 (so I don't need to differentiate between core and alternate opcodes.) 2) Maintaining each operation as a block (so that any necessary variables are declared locally to each case.) 3) Incrementing the pc pointer directly. 4) Accessing the necessary registers as currently written (from the interpreter struct.) Benchmarks on test.pasm: Linux 2.4.7, Athlon 1GHz, gcc 2.96 -O2 long/double/long Function table: 31,712,475 ops/sec Switch hybrid: 39,215,686 ops/sec (+24%) Solaris 8, UltraSPARC IIe 502MHz, Forte C 6.02 -fast long/double/long Function table: 13,181,019 ops/sec Switch hybrid: 18,416,206 ops/sec (+40%) This is relatively consistent with my pre-Parrot testing. If the model holds, reserving 256 (vice 128, which we're almost at) will reduce the difference slightly. (Obviously, by clustering most often used codes to the front, you'd probably get better performance since you're not traipsing all about memory any longer. Currently, for instance, comparison and branches are 40-60 code blocks away, while 'end' (which occurs once) is at offset 0. The ops used in this test are mostly up front.) -- Bryan C. Warnock [EMAIL PROTECTED]
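For readers who haven't seen the two dispatch styles side by side, here is a minimal, self-contained sketch of function-table dispatch versus the inlined switch being benchmarked. The opcodes, names, and struct are toys for illustration, not Parrot's real ones:

```c
#include <assert.h>
#include <stddef.h>

/* Toy bytecode: [op, arg, op, arg, ..., OP_END]. */
enum { OP_END = 0, OP_ADD = 1, OP_SUB = 2 };

typedef struct { long acc; } Interp;

/* Style 1: function-table dispatch -- one indirect call per op. */
typedef const long *(*op_func)(const long *pc, Interp *in);

static const long *op_add(const long *pc, Interp *in) { in->acc += pc[1]; return pc + 2; }
static const long *op_sub(const long *pc, Interp *in) { in->acc -= pc[1]; return pc + 2; }

static long run_table(const long *pc, Interp *in)
{
    static const op_func table[] = { NULL, op_add, op_sub };
    while (*pc != OP_END)
        pc = table[*pc](pc, in);    /* indirect call every iteration */
    return in->acc;
}

/* Style 2: switch dispatch -- each op is an inlined block in one loop,
 * so there is no call overhead and locals stay per-case. */
static long run_switch(const long *pc, Interp *in)
{
    for (;;) {
        switch (*pc) {
        case OP_END:
            return in->acc;
        case OP_ADD: { in->acc += pc[1]; pc += 2; break; }
        case OP_SUB: { in->acc -= pc[1]; pc += 2; break; }
        }
    }
}
```

The clustering point in the post is about the second style: a compiler typically emits the switch as a jump table plus nearby code blocks, so hot opcodes placed close together stay warm in the instruction cache.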
[PATCH] Switchable runops core functions
All -- I've had a couple of inspirations since the 0.0.2 release, and this is the one I can do from home, without the op_info stuff from one of my earlier patches. Assume there is a configuration space of runops core behaviors that is based on various settings of the interpreter flags. If there were enough of them, we'd want to be able to do the combinatorics with a script that generates the code, but for this example there are only two such flags: PARROT_TRACE_FLAG- If true, we print tracing info PARROT_BOUNDS_FLAG - If true, we check bounds I can imagine a third one, which I'll get to in more detail later: PARROT_EVENT_FLAG- If true, we check for events Now, we already had test_prog (BTW, when are we going to call this *the* interpreter, instead of just a test program) set up to intercept the -t flag to turn on tracing. I've set up the -b flag to turn on bounds checking (off by default now). But, the most important part of this patch is the implementation of dynamic switching of runops cores. This means that we have much flexibility, but programs that don't use a feature don't have to pay for it in the inner loop. But, those features can be turned on and off at *run time*. Here's how it works: * A new element of the interpreter structure: resume_addr. There's a new check in runops so that if we end, but have a resume_addr set, we go back to the point where we select a core based on the flags, and resume execution with the new core. * New ops 'trace_ic' and 'bounds_ic' (_i variants would be reasonable, too). These ops twiddle the appropriate bits of the interpreter's flag, set up a resume address and return the PC for the next instruction (just like any other op). It is required that you have an 'end' op follow these ops to force the DO_OP loop to terminate and trigger the mechanism described above. I've included a test program, t/trace.pasm, that demonstrates this. 
I've also run t/test.pasm with and without -b and noted a small increase in performance without bounds checking for those that want to live fast and dangerously :) Now, on to PARROT_EVENT_FLAG. With the mechanism implemented here, events could be turned off at the start (when the event queue is empty), and ops that queue events could turn the flag on and cause a resumption so that a core that checks for events would be used. When the event queue is empty, we can flip the flag back off again. Details, of course, to be filled in by folks who want to do the event stuff. Anyway, whether code that does events needs to leave the flag on or can get by with flipping it like this, programs that don't do events don't have to pay *any* inner loop cost for having an interpreter that allows other programs to use them. I think that counts as being The Parrot Way (TM). With enough thought, chances are good that we could come up with a program that generates runops_cores.[hc] from some specification of the interpreter flags and some code fragments with combination hints. Sort of a Poor Man's Aspect-Oriented Programming for Parrot Inner Loops... I'm sure you can imagine your own bits of switchable code for runops cores, but one more that I've thought a little bit about is: PARROT_PROFILE_FLAG for op profiling. I'm sure you can imagine what that would look like. I'm considering committing this patch, but I'll wait for some feedback from others to see if I've missed something important. Regards, -- Gregor ? 
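The core-selection mechanism described above can be sketched in a few lines. Everything below is illustrative (the patch's real cores take an interpreter struct and a PC, and do actual work); the point is just how the two flag bits index the table of four cores:

```c
#include <assert.h>

/* Flag bits as in the patch; values here are assumptions. */
#define PARROT_BOUNDS_FLAG 0x01
#define PARROT_TRACE_FLAG  0x02

typedef int (*runops_core_f)(void);   /* stand-in signature */

/* Four cores covering the combinatorics of (trace, bounds).  Each stub
 * returns its table index so the selection is observable. */
static int core_t0b0(void) { return 0; }  /* no trace, no bounds */
static int core_t0b1(void) { return 1; }  /* no trace, bounds    */
static int core_t1b0(void) { return 2; }  /* trace, no bounds    */
static int core_t1b1(void) { return 3; }  /* trace and bounds    */

static runops_core_f runops_cores[4] = {
    core_t0b0, core_t0b1, core_t1b0, core_t1b1
};

/* Resuming after a trace_ic/bounds_ic op re-runs this selection, which
 * is how features get switched on and off at run time without any
 * per-iteration test in the inner loop. */
static runops_core_f select_core(int flags)
{
    int idx = ((flags & PARROT_BOUNDS_FLAG) ? 1 : 0)
            | ((flags & PARROT_TRACE_FLAG)  ? 2 : 0);
    return runops_cores[idx];
}
```

Programs that never touch tracing or bounds checking run the t0b0 core and pay nothing, which is the whole point of the patch.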
include/parrot/vtable.h Index: basic_opcodes.ops === RCS file: /home/perlcvs/parrot/basic_opcodes.ops,v retrieving revision 1.32 diff -a -u -r1.32 basic_opcodes.ops --- basic_opcodes.ops 2001/10/06 00:57:43 1.32 +++ basic_opcodes.ops 2001/10/06 22:27:13 @@ -699,3 +699,20 @@ AUTO_OP xor_i { INT_REG(P1) = INT_REG(P2) ^ INT_REG(P3); } + +/* BOUNDS_ic */ +AUTO_OP bounds_ic { + if (P1) { interpreter-flags |= PARROT_BOUNDS_FLAG; } + else{ interpreter-flags = ~PARROT_BOUNDS_FLAG; } + RESUME(3); /* After the end op which must follow bounds */ + RETURN(2); +} + +/* TRACE_ic */ +AUTO_OP trace_ic { + if (P1) { interpreter-flags |= PARROT_TRACE_FLAG; } + else{ interpreter-flags = ~PARROT_TRACE_FLAG; } + RESUME(3); /* After the end op which must follow trace */ + RETURN(2); +} + Index: interpreter.c === RCS file: /home/perlcvs/parrot/interpreter.c,v retrieving revision 1.23 diff -a -u -r1.23 interpreter.c --- interpreter.c 2001/10/03 16:21:30 1.23 +++ interpreter.c 2001/10/06 22:27:13 @@ -13,6 +13,13 @@ #include parrot/parrot.h #include parrot/interp_guts.h +runops_core_f runops_cores[4] = { + runops_t0b0_core, + runops_t0b1_core, + runops_t1b0_core, + runops_t1b1_core +}; + char *op_names[2048]; int op_args[2048]; @@ -42,26
Re: More speed trials
On Saturday 06 October 2001 06:38 pm, Bryan C. Warnock wrote: 4) Accessing the necessary registers as current written (from the interpreter struct.) The added benchmarks are the caching of the interpreter's register groups within the runops_*_core. (You can't cache the register set itself, as functions may manipulate the register stack.) Benchmarks on test.pasm: Linux 2.4.7, Athlon 1GHz, gcc 2.96 -O2 long/double/long Function table: 31,712,475 ops/sec Switch hybrid: 39,215,686 ops/sec (+24%) Switch (rcache): 41,152,263 ops/sec (+30%) Solaris 8, UltraSPARC IIe 502MHz, Forte C 6.02 -fast long/double/long Function table: 13,181,019 ops/sec Switch hybrid: 18,416,206 ops/sec (+40%) Switch (rcache): 18,203,883 ops/sec (+38%) One of the more interesting discoveries? Adding a 'default:' case to the switch slowed down the Linux runs by several percent. -- Bryan C. Warnock [EMAIL PROTECTED]
[PROPOSED] Crystalizing loader
All -- My previous post included a patch. This one doesn't because I can't work on this one away from my office. But, I'm going to put the idea out to the list, and perhaps someone will beat me to trying it (but, do please tell me if you are going to so I don't go duplicating your effort when I get back to my office). I have a 32-bit system, so the discussion below will be geared to that environment. I think it shouldn't be too hard to adapt the technique to a 64-bit system, but that's not my area of specialization. After the bytecode is loaded, but before it is executed, put it through a stage of processing that requires about as much information as a disassembler would (which is why my op_info stuff from one of my previous patches is required). This process converts opcodes into pointers to the op functions, and arguments into pointers to the constant values or register entries. This means that we amortize the dereferences over all invocations of the op at each PC, which, when tight loops are involved, should make for noticeable savings. In the case of my 32-bit machine, I could do the conversion in-place and hand the resulting crystalized bytecode over to a runops variant that knows what to expect. BTW, this could be controlled by an interpreter flag (but one that doesn't have a corresponding op): PARROT_CRYSTALIZE_FLAG Although, one wonders if these flags really should be part of the code object rather than part of the interpreter, so that they are local to their compilation unit. Perhaps the true answer will be some combination of interpreter flags and code flags combining to select the runops core that is used. One extra trick needed, though, is a version of process_opfunc.pl that compiles the ops so that they expect their arguments to already have the dereferencing done. This shouldn't be too hard, but we'd need room for another parallel opcode table, or we'd need to switch it in as appropriate.
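A rough sketch of the crystalizing pass under the 32-bit assumption above, i.e. that a bytecode slot is wide enough to hold a pointer. All names are illustrative, and note that round-tripping a function pointer through an integer type is itself a common extension rather than strict ISO C:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef intptr_t cell_t;                 /* one bytecode slot */
typedef cell_t *(*op_func_t)(cell_t *pc);

/* Two toy ops: opcode 0 ends the program, opcode 1 does nothing. */
static cell_t *op_end(cell_t *pc)  { (void)pc; return NULL; }
static cell_t *op_noop(cell_t *pc) { return pc + 1; }

static op_func_t op_table[] = { op_end, op_noop };  /* opcode -> function */
static int op_argc[]        = { 0, 0 };             /* args per op (the
                                                       op_info role)     */

/* One pass over the loaded bytecode: overwrite each opcode number with
 * the address of its op function, skipping argument slots.  The table
 * lookup is thereby amortized over all invocations at each PC. */
static void crystalize(cell_t *code, size_t len)
{
    size_t i = 0;
    while (i < len) {
        cell_t op = code[i];
        code[i] = (cell_t)op_table[op];
        i += 1 + (size_t)op_argc[op];
    }
}

/* Runops variant that expects crystalized code: call straight through
 * the slot, no per-iteration opcode-to-function mapping. */
static void run_crystalized(cell_t *pc)
{
    while (pc)
        pc = ((op_func_t)*pc)(pc);
}
```

A 64-bit port with 32-bit opcode slots couldn't convert in place like this; it would need a widened copy of the code, which is presumably the adaptation the post alludes to.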
If I don't hear from someone that they are going to try this out, I'll take it on next time I'm in my office (possibly as early as tomorrow Morning, EST). Regards, -- Gregor ? include/parrot/vtable.h Index: basic_opcodes.ops === RCS file: /home/perlcvs/parrot/basic_opcodes.ops,v retrieving revision 1.32 diff -a -u -r1.32 basic_opcodes.ops --- basic_opcodes.ops 2001/10/06 00:57:43 1.32 +++ basic_opcodes.ops 2001/10/06 22:27:13 @@ -699,3 +699,20 @@ AUTO_OP xor_i { INT_REG(P1) = INT_REG(P2) ^ INT_REG(P3); } + +/* BOUNDS_ic */ +AUTO_OP bounds_ic { + if (P1) { interpreter-flags |= PARROT_BOUNDS_FLAG; } + else{ interpreter-flags = ~PARROT_BOUNDS_FLAG; } + RESUME(3); /* After the end op which must follow bounds */ + RETURN(2); +} + +/* TRACE_ic */ +AUTO_OP trace_ic { + if (P1) { interpreter-flags |= PARROT_TRACE_FLAG; } + else{ interpreter-flags = ~PARROT_TRACE_FLAG; } + RESUME(3); /* After the end op which must follow trace */ + RETURN(2); +} + Index: interpreter.c === RCS file: /home/perlcvs/parrot/interpreter.c,v retrieving revision 1.23 diff -a -u -r1.23 interpreter.c --- interpreter.c 2001/10/03 16:21:30 1.23 +++ interpreter.c 2001/10/06 22:27:13 @@ -13,6 +13,13 @@ #include parrot/parrot.h #include parrot/interp_guts.h +runops_core_f runops_cores[4] = { + runops_t0b0_core, + runops_t0b1_core, + runops_t1b0_core, + runops_t1b1_core +}; + char *op_names[2048]; int op_args[2048]; @@ -42,26 +49,44 @@ } } -/*=for api interpreter runops +/*=for api interpreter runops_t0b0_core * run parrot operations until the program is complete + * + * No tracing. + * No bounds checking. */ opcode_t * -runops_notrace_core (struct Parrot_Interp *interpreter) { +runops_t0b0_core (struct Parrot_Interp *interpreter, opcode_t * pc) { /* Move these out of the inner loop. 
No need to redeclare 'em each time through */ opcode_t *(* func)(); opcode_t *(**temp)(); + +while (*pc) { DO_OP(pc, temp, func, interpreter); } + +return pc; +} + +/*=for api interpreter runops_t0b1_core + * run parrot operations until the program is complete + * + * No tracing. + * With bounds checking. + */ +opcode_t * +runops_t0b1_core (struct Parrot_Interp *interpreter, opcode_t * pc) { +/* Move these out of the inner loop. No need to redeclare 'em each + time through */ +opcode_t *(* func)(); +opcode_t *(**temp)(); opcode_t * code_start; INTVAL code_size; opcode_t * code_end; -opcode_t * pc; code_start = (opcode_t *)interpreter-code-byte_code; code_size = interpreter-code-byte_code_size; code_end = (opcode_t *)(interpreter-code-byte_code + code_size); -pc = code_start; - while (pc = code_start pc code_end *pc) { DO_OP(pc, temp, func,
Re: More speed trials
On Sunday 07 October 2001 01:16, Bryan C. Warnock wrote: [...] One of the more interesting discoveries? Adding a 'default:' case to the switch slowed down the Linux runs by several percent. In that, umh, case: do you have an explanation or could you provide the code? Buggs
Re: More speed trials
On Saturday 06 October 2001 07:43 pm, Gregor N. Purdy wrote: The Crystalizing Loader proposal I just made would work better if the addresses to the current registers were always the same and pushing regs onto stacks made copies, rather than having the current reg file be the new set of regs. And now that you mention it, that may be how the register stack is handled. Let's take a look... Nope, it's handled like a regular stack. Of course, for hardware registers, that isn't so. You do copy the registers onto a stack, but you still reference the registers. But that's more of a function of hardware (where registers are different from memory) than software (where memory is the same as memory). However, to always refer to the base set of registers, and to push and pop copies of those registers onto a stack... that intrigues me. (Depends on how much we push and pull, I guess. But you did bring up one thing - you don't get a copy of the registers when you push. That makes it nigh impossible to pass values in the registers when you need to save the registers off. Dan?) But before we go jumping the gun, let's see what straight registers do. {dum de dum de dum...} Runs about the same for me. (A shade slower on Linux.) I'm interested to know if there's a way to turn the op funcs into chunks of code that longjmp around (or something equivalent) so we can get rid of function call overhead for simple ops (complex ops could consist primarily of a function call internally). But argument passing? In theory, you'd just be coding by hand what the platform's calling semantics already provide you. (More or less.) In this case, the crystalizing loader puts the address to jump to in place of the opcode, and opcodes jump to the location in the next opcode field when they are done, and the 'end' opcode is replaced by a well-known location that terminates the runops core. Saving the dereference of the opcode type. Yes, I'm reserving judgement on this (whilst I ponder it.) -- Bryan C.
Warnock [EMAIL PROTECTED]
Re: More speed trials
On Saturday 06 October 2001 08:05 pm, Buggs wrote: On Sunday 07 October 2001 01:16, Bryan C. Warnock wrote: [...] One of the more interesting discoveries? Adding a 'default:' case to the switch slowed down the Linux runs by several percent. In that, umh, case: do you have an explanation or could you provide the code? http://members.home.net/bcwarno/Perl6/spool/interpreter.c http://members.home.net/bcwarno/Perl6/spool/switch.cinc An assembler diff between adding the default and not. I'd interpret it, but I haven't figured out exactly how yet. (These diffs are repeated later on in the other runops loop. I snipped them for brevity.) The default case wasn't exercised during runtime, so it has to be related to the Athlon, which I know behaves weirdly. 175c175 ja .L212 --- ja .L11 179c179 movl.L213(%eax), %eax --- movl.L212(%eax), %eax 185c185 .L213: --- .L212: 3126,3130d3125 .L212: leal-40(%ebp), %eax addl$4, (%eax) jmp .L11 .p2align 4,,7 -- Bryan C. Warnock [EMAIL PROTECTED]
Re: More speed trials
Bryan -- ... But you did bring up one thing - you don't get a copy of the registers when you push. That makes it nigh impossible to pass values in the registers when you need to save the registers off. Dan?) This is the part about the current design I have a hard time understanding. That's why I asked for an example of what the subroutine calling convention is supposed to look like (at least preliminarily). I don't remember seeing one. At least for now, though, I've been able to implement Poor Man's (I'm using that phrase a lot lately--I wonder what that means :) Subroutines in Jako through the use of address arithmetic in the assembler and a sprinkle of cleverness. It doesn't do arguments, though. Sooner or later the Jako compiler is going to have to become a real compiler and do real register allocation, etc. But I can't do that until the appropriate bits of Parrot are implemented and my brain is configured to work with them. But before we go jumping the gun, let's see what straight registers do. {dum de dum de dum...} Runs about the same for me. (A shade slower on Linux.) Could you elaborate on this statement please? I'm not sure I follow... I'm interested to know if there's a way to turn the op funcs into chunks of code that longjmp around (or something equivalent) so we can get rid of function call overhead for simple ops (complex ops could consist primarily of a function call internally). But argument passing? In theory, you'd just be coding by hand what the platform's calling semantics already provide you. (More or less.) There's no argument passing, because the args are on the stream. Everything is in the byte code stream. You jump to a fixed up address. The code there knows the PC within the byte code, so it messes with its args (fixed up pointers to regs and constants) and then jumps to the address that's been fixed up in place of the next op's opcode (after updating the PC). No argument passing. Unless I've missed something...
In this case, the crystalizing loader puts the address to jump to in place of the opcode, and opcodes jump to the location in the next opcode field when they are done, and the 'end' opcode is replaced by a well-known location that terminates the runops core. Saving the dereference of the opcode type. Yes, I'm reserving judgement on this (whilst I ponder it.) Yeah, I want to save (really amortize) all those dereferences and also save the function call overhead for all simple ops (as I said before, complex ops that need temporary variables and such would probably be moved to functions and the code at the jump target would call that function with appropriate arg passing and then get back to the same business as the rest of the ops by updating PC and jumping to the next op func body. Regards, -- Gregor
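One concrete way to get that jump-to-the-address-in-the-stream behavior is GCC's labels-as-values extension (computed goto). The toy below is not portable ISO C and is not Parrot code; it just shows ops flowing into each other by jumping through addresses stored in the "bytecode," with no function calls at all:

```c
#include <assert.h>

/* Sum the values in args using a hand-threaded "bytecode" of two add
 * ops followed by an end op.  Each slot holds the address of the next
 * op body; ops finish by jumping through the next slot.  Requires the
 * GCC/Clang labels-as-values extension (&&label, goto *). */
static long run_threaded(const long *args)
{
    long acc = 0;
    const long *a = args;

    /* The "fixed up" instruction stream: op bodies by address. */
    static void *stream[] = { &&do_add, &&do_add, &&do_end };
    void **pc = stream;

    goto *(*pc);               /* enter the first op */

do_add:
    acc += *a++;               /* the op's work */
    pc++;
    goto *(*pc);               /* jump straight to the next op body */

do_end:
    return acc;                /* 'end' is a well-known terminator */
}
```

This is essentially direct threading; the trade-off discussed in the thread (maintainability and debuggability versus saved call overhead) applies in full.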
Re: More speed trials
On Saturday 06 October 2001 09:07 pm, Gregor N. Purdy wrote: But before we go jumping the gun, let's see what straight registers do. {dum de dum de dum...} Runs about the same for me. (A shade slower on Linux.) Could you elaborate on this statement please? I'm not sure I follow... Oh, since I wasn't doing any register stack manipulation, I pointed to the register set itself (to save another level of indirection) to see if that would indeed improve performance on register manipulation. The x86 ran 1/10 second slower, and the SPARC was unchanged. (So there's no real performance gain to point at for working off the bottom instead of the top.) I'm interested to know if there's a way to turn the op funcs into chunks of code that longjmp around (or something equivalent) so we can get rid of function call overhead for simple ops (complex ops could consist primarily of a function call internally). But argument passing? In theory, you'd just be coding by hand what the platform's calling semantics already provide you. (More or less.) There's no argument passing, because the args are on the stream. Everything is in the byte code stream. You jump to a fixed up address. The code there knows the PC within the byte code, so it messes with its args (fixed up pointers to regs and constants) and then jumps to the address that's been fixed up in place of the next op's opcode (after updating the PC). No argument passing. Unless I've missed something... Well, yes. Argument passing. Whether they're on the stack or in the stream. (In this case, the stream *is* the stack, sort of.) I'm just saying that, in essence, all the jumping that you'd be coding, with the arguments in the stream (vice the stack), is more or less simply reinventing the calling semantics of whatever hardware you're on. At some point, though, we will have to trade maintainability and sanity for speed.
;-) In this case, the crystalizing loader puts the address to jump to in place of the opcode, and opcodes jump to the location in the next opcode field when they are done, and the 'end' opcode is replaced by a well-known location that terminates the runops core. Saving the dereference of the opcode type. Yes, I'm reserving judgement on this (whilst I ponder it.) Yeah, I want to save (really amortize) all those dereferences and also save the function call overhead for all simple ops (as I said before, complex ops that need temporary variables and such would probably be moved to functions and the code at the jump target would call that function with appropriate arg passing and then get back to the same business as the rest of the ops by updating PC and jumping to the next op func body. Well, the simple ops switch is all inlined. (No function calls.) But you lose the ability to truly cache those addresses, so you can't call them directly. And attempting to discern between an already converted address and a simple op will lose any ground you've gained. (But some caching of the dereferences is good. On the x86, where registers are scarce, I just squeezed another 1.3 million ops/sec by doing that. But the same trick on the SPARC (which has the registers to cache it automatically) suffered a performance hit with the overhead of storing the dereference.) -- Bryan C. Warnock [EMAIL PROTECTED]
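The dereference-caching trick is easy to show in isolation. The structs below are illustrative stand-ins (Parrot's real interpreter and register file differ); the point is hoisting the two indirections into a local that the compiler can keep in a machine register:

```c
#include <assert.h>

/* Illustrative shapes, not Parrot's actual layout. */
struct regs   { long r[32]; };
struct interp { struct regs *int_reg; };

/* Without caching: every access re-walks interp -> int_reg -> r. */
static long sum_uncached(struct interp *in, int n)
{
    long total = 0;
    for (int i = 0; i < n; i++)
        total += in->int_reg->r[i % 32];
    return total;
}

/* With caching: do the two dereferences once, outside the loop.  Only
 * valid while nothing pushes or pops the register stack, which is the
 * caveat raised earlier in the thread. */
static long sum_cached(struct interp *in, int n)
{
    long *r = in->int_reg->r;   /* cache the dereference */
    long total = 0;
    for (int i = 0; i < n; i++)
        total += r[i % 32];
    return total;
}
```

Whether this wins depends on register pressure, which matches the post's observation: a gain on register-starved x86, a loss on register-rich SPARC where the compiler was already doing it.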
RE: More speed trials
I tried removing the bounds checking and adding multiple DO_OPs inside the while loop. With -O0, the loop unrolling helped, but removing the bounds checking actually slowed it down. With -O3, neither one helped at all.
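For reference, the unrolling experiment looks roughly like this. DO_OP here is a trivial stand-in for Parrot's macro (the real one dispatches an op and takes more arguments); the shape of the change is what matters:

```c
#include <assert.h>

/* Stand-in for Parrot's DO_OP: "execute" one op and advance pc. */
#define DO_OP(pc, acc) ((acc) += *(pc)++)

/* Plain dispatch loop. */
static long run_plain(const long *pc, const long *end)
{
    long acc = 0;
    while (pc < end)
        DO_OP(pc, acc);
    return acc;
}

/* Manually unrolled 4x, with a cleanup loop for the remainder -- the
 * kind of change benchmarked in the post.  At -O3 the compiler often
 * does this itself, which is consistent with seeing no gain there. */
static long run_unrolled(const long *pc, const long *end)
{
    long acc = 0;
    while (end - pc >= 4) {
        DO_OP(pc, acc);
        DO_OP(pc, acc);
        DO_OP(pc, acc);
        DO_OP(pc, acc);
    }
    while (pc < end)
        DO_OP(pc, acc);
    return acc;
}
```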