Re: More speed trials

2001-10-06 Thread Bryan C . Warnock

On Saturday 06 October 2001 02:04 am, Gibbs Tanton - tgibbs wrote:
 I think that changing from a function based implementation to a switch
 based implementation will help on many platforms.  Someone did a patch on
 that, maybe we could update it and commit it.  Having to go through two
 indirections and two array accesses to access a register probably doesn't
 help much either, although it won't be easy to get around that.  Apart
 from that, there is not much else to be done.  We can reprofile, but the
 only thing being executed are integer additions and comparisons...you
 can't get much more basic than that...and you are right, we are running
 much too slowly.

I don't know.  I think the loop may actually be too tight, which a switch 
won't necessarily help.  But I'm all for reducing the overhead of 
unnecessary function calls.  That's my project for the weekend.  (Well, that 
and the summary.)  As far as the double indirection, I moved one out of the 
loop and it actually slowed down.  Just an hour ago, I experimented with 
having each function return the offset (vice the address) of the next 
opcode, in hopes that that might help with some performance within the loop. 
(Since the compiler can then guess that you may be doing pointer arithmetic 
around the address you previously had, vice having to dereference some 
random pointer.  Or so the theory goes. But that also slowed things down.)-: 
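
Roughly, the difference between the two shapes I tried looks like this (sketch 
only, with made-up names, not the actual op functions):

  /* Sketch only: made-up names, not the real op functions. */

  /* Current style: each op returns the absolute address of the next op,
     and the loop does  pc = (*func)(pc, interpreter);  */
  opcode_t *op_foo_addr(opcode_t *pc, struct Parrot_Interp *interpreter) {
      /* ... do the work ... */
      return pc + 4;
  }

  /* The experiment: each op returns only an offset, and the loop does
     pc += (*func)(pc, interpreter);  so the compiler can keep doing
     arithmetic on the pc it already holds. */
  INTVAL op_foo_off(opcode_t *pc, struct Parrot_Interp *interpreter) {
      /* ... do the work ... */
      return 4;
  }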
 


-- 
Bryan C. Warnock
[EMAIL PROTECTED]



Re: Test::...

2001-10-06 Thread Simon Cozens

On Sat, Oct 06, 2001 at 08:19:16AM -0400, Michael Fischer wrote:
 Why were they there?

Because 'l!' does exactly the right thing on Perl version 5.6.0 and
above, and 'l' is not guaranteed to do exactly the right thing.

-- 
It starts like fascination, it ends up like a trance
   You've gotta use your imagination on some of that magazine romance
And these bones--they don't look so good to me
   Jokers talk and they all disagree
One day soon, I will laugh right in the face of the poison moon  



RE: acceptable development environments/tools...?

2001-10-06 Thread Brent Dax

[EMAIL PROTECTED]:
# Sorry to disrupt your discussion with some loosely related
# question... Could anyone help me determine which development
# tools/IDEs are to be used when hacking at Parrot?

Whatever strikes your fancy.  You could even use ed if you wanted
to--although I wouldn't recommend it.  :^)  As long as it reads and
writes ASCII and doesn't mangle the files, it's okay to use.

# As you'd have guessed it, I'm relatively new to this project...
# ;-). However, I do want to help out with the effort
# to code a great 'system' that any other developer could
# benefit from (including myself :).

Welcome to the team then.  :^)

# Say, at home, I'm working on a Windows (ME) system.  The IDEs
# I have at my disposal are the MS Visual Studio (C++), CodeWarrior,
# and some old DOS based C compilers.  I've got CVS all set up on my
# side so retrieving recent copy of the working files from the
# Parrot cvs root shouldn't be of a problem.  I'm also thinking
# of moving to a Unix based system in a short while (since I've used
# to coding on a Solaris box at work).

I'm also on Win32 (Win2000, to be exact).  I use WinCVS with two
directories: parrot and parrot-cvs.  parrot is my working copy and
parrot-cvs is what's currently on CVS.  I use Visual Studio.NET beta 2
($13 from MS) as my editor, since ActiveState has a nifty Perl code
editor plugin for it, and most of my work (I mostly muck with Configure,
but wade into the C once in a while) is with Perl code.  The actual
directory structure looks like this:

+--+ Perl 6
|  +--+ parrot
|  |  +--- .vcproj and related files
|  |  +--+ parrot
|  | +--- working copy of CVS files
|  |
|  +--+ parrot-cvs
|  |  +--+ parrot
|  | +--- pure copy of CVS files
|  |
|  +--+ babyperl
|  |  +--- files related to babyperl
|  |
|  +--+ smoke
|  |  +--- remote smoke-testing stuff

I use PPT (Perl Power Tools--look around on the CPAN) to get various
Unix utilities (although I haven't gotten their patch program to work
well, diff works okay--I just apply patches via an SSH connection to a
BSD box).

Of course, you may choose a different solution than my cobbled-together
collection of free and nonfree software--I've heard Cygwin works well
for this sort of thing.

Whatever tools you use, make sure you have fun working on Parrot.  That
is, after all, what it's all about.

--Brent Dax
[EMAIL PROTECTED]
Configure pumpking for Perl 6

They *will* pay for what they've done.




Re: Perl6 Tinderbox

2001-10-06 Thread H.Merijn Brand

On Sat 06 Oct 2001 :58, Michael G Schwern [EMAIL PROTECTED] wrote:
 On Fri, Oct 05, 2001 at 05:18:07PM -0700, Zach Lipton wrote:
  Because the need for a tinderbox testing platform is fairly urgent right now
  for perl6, I am releasing my (place your favorite adjective in the blank
  here) tinderbox client for perl6 ahead of the near-rewrite that I am working
  on to use Devel::Tinderbox::Reporter (which was just written) and
  Test::Smoke (which wouldn't help perl6 all that much anyway.
 
 There's an existing Parrot::Smoke module, I forget where it is off hand.

CPAN/authors/id/M/MB/MBARBON/Parrot-Smoke-0.02.tar.gz

I'd expect

-- 
H.Merijn Brand    Amsterdam Perl Mongers (http://www.amsterdam.pm.org/)
using perl-5.6.1, 5.7.1 & 623 on HP-UX 10.20 & 11.00, AIX 4.2, AIX 4.3,
 WinNT 4, Win2K pro & WinCE 2.11 often with Tk800.022 /| DBD-Unify
ftp://ftp.funet.fi/pub/languages/perl/CPAN/authors/id/H/HM/HMBRAND/




vtable.h

2001-10-06 Thread Simon Cozens

I've just committed some files which generate vtable.h; these were
actually left over from my experiments of a *long* time ago. [1] It
might need quite a few changes, but it's a good start, and I think
it's general enough to survive. 

The next thing I want to do with it is have something akin to
process_op_func.pl which takes a macro-ized description of the vtable
functions and turns them into real C code. Volunteers welcome, or I'll
write it myself. :) This could be a good place, however, for newcomers
to Parrot to get involved with something relatively straightforward
but pretty crucial. Hint, hint...

Simon

[1] In short, back in the beginning, Dan and I independently started
implementing Parrot; Dan's version was more complete and sensible by
the time we got together to discuss it, so his became the codebase
that was checked into CVS. Dan had started with the interpreter main
loop and ops, and I had started with PMCs.

-- 
Actually Perl *can* be a Bondage  Discipline language but it's unique
among such languages in that it lets you use safe words. 
-- Piers Cawley



RE: vtable.h

2001-10-06 Thread Gibbs Tanton - tgibbs

for add we will end up with

void add( PMC* self, PMC* left, PMC* right );

does this represent:

self = left + right 

or some other ordering?
-Original Message-
From: Simon Cozens
To: [EMAIL PROTECTED]
Sent: 10/6/2001 7:50 AM
Subject: vtable.h

I've just committed some files which generate vtable.h; these were
actually left over from my experiments of a *long* time ago. [1] It
might need quite a few changes, but it's a good start, and I think
it's general enough to survive. 

The next thing I want to do with it is have something akin to
process_op_func.pl which takes a macro-ized description of the vtable
functions and turns them into real C code. Volunteers welcome, or I'll
write it myself. :) This could be a good place, however, for newcomers
to Parrot to get involved with something relatively straightforward
but pretty crucial. Hint, hint...

Simon

[1] In short, back in the beginning, Dan and I independently started
implementing Parrot; Dan's version was more complete and sensible by
the time we got together to discuss it, so his became the codebase
that was checked into CVS. Dan had started with the interpreter main
loop and ops, and I had started with PMCs.

-- 
Actually Perl *can* be a Bondage  Discipline language but it's unique
among such languages in that it lets you use safe words. 
-- Piers Cawley



RE: vtable.h

2001-10-06 Thread Gibbs Tanton - tgibbs

2 other things

1.) Will each different type of PMC have its own vtable, function
definitions, etc., or will they all share everything with switches on type in
the function definitions?

2.) Can you give an idea of what you think the macro-ized function should
look like (an example would be great)?

Thanks!
Tanton

-Original Message-
From: Simon Cozens
To: [EMAIL PROTECTED]
Sent: 10/6/2001 7:50 AM
Subject: vtable.h

I've just committed some files which generate vtable.h; these were
actually left over from my experiments of a *long* time ago. [1] It
might need quite a few changes, but it's a good start, and I think
it's general enough to survive. 

The next thing I want to do with it is have something akin to
process_op_func.pl which takes a macro-ized description of the vtable
functions and turns them into real C code. Volunteers welcome, or I'll
write it myself. :) This could be a good place, however, for newcomers
to Parrot to get involved with something relatively straightforward
but pretty crucial. Hint, hint...

Simon

[1] In short, back in the beginning, Dan and I independently started
implementing Parrot; Dan's version was more complete and sensible by
the time we got together to discuss it, so his became the codebase
that was checked into CVS. Dan had started with the interpreter main
loop and ops, and I had started with PMCs.

-- 
Actually Perl *can* be a Bondage  Discipline language but it's unique
among such languages in that it lets you use safe words. 
-- Piers Cawley



Re: vtable.h

2001-10-06 Thread Simon Cozens

On Sat, Oct 06, 2001 at 08:11:30AM -0500, Gibbs Tanton - tgibbs wrote:
 void add( PMC* self, PMC* left, PMC* right );
 does this represent:
 self = left + right 

Yes.

-- 
UNIX was not designed to stop you from doing stupid things, because that
would also stop you from doing clever things.
-- Doug Gwyn



RE: vtable.h

2001-10-06 Thread Gibbs Tanton - tgibbs

 2.) Can you give an idea of what you think the macro-ized function
should
 look like (an example would be great.)

No, because then you'll go away and implement it, and I want to
encourage
some fresh blood to do that. :) 

Okey Dokey...I promise not to do it :)

Seriously, before I do that, I need to seriously think about what vtable
accessors ought to look like;

(pmc1->vtable[want_vtbl_add])(pmc1, pmc2, pmc3)

is going to scare people away quickly, and, while

   PMC_ADD(pmc1, pmc2, pmc3) 

is cute, (and allows us to autogenerate Parrot byte ops ;) Macro
Hell is something we want to avoid. 

Well, you currently have vtable as a struct so you would say
pmc1->vtable->add( pmc1, pmc2, pmc3 )

which doesn't look that bad.  Really, I would imagine all of this would be
autogenerated by process_opfunc.pl so it doesn't matter what the longhand
looks like.  We can use PMC_ADD in basic_opcodes.ops just like we use
INT_CONST or whatever and the macro is stripped out of the perl.
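
For concreteness, a sketch of one possible expansion (illustration only; the 
real accessor spelling is obviously still up in the air):

  /* Sketch only -- not a committed interface. */
  #define PMC_ADD(self, left, right) \
      ((self)->vtable->add((self), (left), (right)))

  /* so that PMC_ADD(pmc1, pmc2, pmc3) reads as pmc1 = pmc2 + pmc3 */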

Also, how will adds of different types be handled?  In the above, if pmc2 is
an int and pmc3 is a float, we're going to have to know that and do a switch
or something to convert to/create the right type.

Tanton



[patch] give Configure a policy

2001-10-06 Thread Alex Gough

I've modified Configure.pl to take defaults from a previous build (if
there was one).  This should play nicely with hints, and '--defaults'
by doing the Right Thing.  I've added a '--nopolicy' option to disable
this.

Patch below sig.

Alex Gough
-- 
W.W- A little nonsense now and then is relished by the wisest men.

##
Index: Configure.pl
===
RCS file: /home/perlcvs/parrot/Configure.pl,v
retrieving revision 1.23
diff -u -r1.23 Configure.pl
--- Configure.pl2001/10/04 20:19:38 1.23
+++ Configure.pl2001/10/06 14:07:16
@@ -8,9 +8,11 @@
 use Getopt::Long;
 use ExtUtils::Manifest qw(manicheck);
  
-my($opt_debugging, $opt_defaults, $opt_version, $opt_help) = (0, 0, 0, 0);
+my($opt_debugging, $opt_defaults, $opt_version,
+   $opt_help, $opt_nopolicy) = (0) x 5;
 my(%opt_defines);
 my $result = GetOptions( 
+   'nopolicy'   => \$opt_nopolicy,
	'debugging!' => \$opt_debugging,
	'defaults!'  => \$opt_defaults,
	'version'    => \$opt_version,
@@ -29,6 +31,7 @@
 Options:
--debugging  Enable debugging
--defaults   Accept all default values
+   --nopolicy   Do not take values from previous build
--define name=value  Defines value name as value
--help   This text
	--version    Show assembler version
@@ -60,25 +63,39 @@
 #Some versions don't seem to have ivtype or nvtype--provide 
 #defaults for them.
 #XXX Figure out better defaults
+my %policy;
+unless ($opt_nopolicy) {
+eval '
+   require Parrot::Config;
+   %policy = %Parrot::Config::PConfig;
+';
+if ($@) {
+   print "No policy available, using defaults\n";
+}
+else {
+   print "Using defaults from earlier build\n";
+   $policy{__have_policy} = 1;
+}
+}
 my(%c)=(
-   iv =>   ($Config{ivtype}   ||'long'),
+   iv =>  ($policy{ivtype} || $Config{ivtype}   ||'long'),
	intvalsize =>   undef,
 
-   nv =>   ($Config{nvtype}   ||'double'),
+   nv =>  ($policy{nvtype} || $Config{nvtype}   ||'double'),
	numvalsize =>   undef,
 
-   opcode_t => ($Config{ivtype}   ||'long'),
+   opcode_t =>($policy{ivtype} || $Config{ivtype}   ||'long'),
	longsize => undef,
 
-   cc =>   $Config{cc},
+   cc =>  ($policy{cc} || $Config{cc}),
	#ADD C COMPILER FLAGS HERE
-   ccflags =>  $Config{ccflags}." -I./include",
-   libs => $Config{libs},
+   ccflags => ($policy{ccflags}||$Config{ccflags}." -I./include"),
+   libs =>($policy{libs}   ||$Config{libs}),
	cc_debug => '-g',
	o =>'.o',   # object files extension
-   exe =>  $Config{_exe},
+   exe => ($policy{exe} || $Config{_exe}),
 
-   ld =>   $Config{ld},
+   ld =>  ($policy{ld}  || $Config{ld}),
	ld_out =>   '-o ',  # ld output file
	ld_debug => '', # include debug info in executable
 
@@ -91,8 +108,9 @@
 @c{keys %opt_defines}=@opt_defines{keys %opt_defines};
 
 # set up default values
+# don't need these if previously compiled, can take from Parrot::Config
 my $hints = "hints/" . lc($^O) . ".pl";
-if(-f $hints) {
+if(!$policy{__have_policy} && -f $hints) {
	local($/);
	open HINT, "< $hints" or die "Unable to open hints file '$hints'";
	my $hint = <HINT>;




Re: More speed trials

2001-10-06 Thread Simon Cozens

On Sat, Oct 06, 2001 at 12:44:59AM -0400, Bryan C. Warnock wrote:
 Ops/sec:31,716,577.291820

Wowsers. What are you running that thing on?

For comparison, on this machine:
Parrot  Ops/sec: 500.00
Python2 ops/sec: 3289276.607351
(Python 1 is slightly faster - at the moment.)

That's not fast enough; once PMCs get introduced, that advantage
is going to fall away. What are we doing wrong? :( Python uses a
switch, however; maybe that's it.

-- 
<pudge> i've dreamed in Perl many time, last night i dreamed in Make,
and that just sucks.



Re: vtable.h

2001-10-06 Thread Simon Cozens

On Sat, Oct 06, 2001 at 09:01:34AM -0500, Gibbs Tanton - tgibbs wrote:
 which doesn't look that bad.  Really, I would imagine all of this would be
 autogenerated by process_opfunc.pl so it doesn't matter what the longhand
 looks like.

Not really; I expect that external code will also manipulate PMCs.

 Also, how will adds of different types be handled.  In the above if pmc2 is
 an int and pmc3 is a float we're going to have to know that and do a switch
 or something to convert to/create the right type.

There'll actually (and I need to change my vtable code to reflect this) be
several versions of each vtable function, depending on the relative type of
each PMC. Basically, there'll be two easily optimizable versions (i.e. types
are the same, or types can be easily converted with a cast or simple function)
and a non-optimized version, which would actually be the naive implementation
in many cases. (These types are way out of my depth - call ->get_integer on
each one, and add the result.)

I didn't think that up, by the way, it was Dan's idea. :)
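
In sketch form, the naive fallback might look something like this (not the 
committed interface; set_integer here is just a stand-in name):

  /* Sketch of the naive case -- names are illustrative only. */
  static void add_naive(PMC *self, PMC *left, PMC *right) {
      /* No cheap conversion available: ask each PMC for an integer
         through the generic accessor and store the sum. */
      INTVAL l = left->vtable->get_integer(left);
      INTVAL r = right->vtable->get_integer(right);
      self->vtable->set_integer(self, l + r);   /* hypothetical setter */
  }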

-- 
Oh dear. I've just realised that my fvwm config lasted longer than my
marriage, in that case.
- Anonymous



Re: More speed trials

2001-10-06 Thread Bryan C . Warnock

On Saturday 06 October 2001 10:58 am, Dan Sugalski wrote:

 It's the function pointer indirection, to some extent. The switch dispatch
 loop should help some. Also I don't think you should make too many
 performance comparisons until we've got something equivalent to compare
 with.

Unless we're already slower.  ;-)  (Which is what I wanted to check.)

-- 
Bryan C. Warnock
[EMAIL PROTECTED]



Re: More speed trials

2001-10-06 Thread Dan Sugalski

At 11:36 AM 10/6/2001 -0400, Bryan C. Warnock wrote:
On Saturday 06 October 2001 10:58 am, Dan Sugalski wrote:
 
  It's the function pointer indirection, to some extent. The switch dispatch
  loop should help some. Also I don't think you should make too many
  performance comparisons until we've got something equivalent to compare
  with.

Unless we're already slower.  ;-)

True, true. But we're not, which is good. A 200% speed improvement's sort 
of good, depending on what ops we execute. And it looks like we execute 
considerably more iterations of the actual loop than perl 5 does, which is 
also good.

(Which is what I wanted to check.)

I'm glad you did. Rational benchmarks are good, even if they tell us 
something we don't want to hear... :)

Dan

--it's like this---
Dan Sugalski  even samurai
[EMAIL PROTECTED] have teddy bears and even
  teddy bears get drunk




Re: More speed trials

2001-10-06 Thread Dan Sugalski

At 05:46 PM 10/6/2001 +0200, Paolo Molaro wrote:
I get about the same number for Parrot on my K6-400, but
compiling with -O2 gets it up to 11,500,000, so maybe you forgot
to use -O2 or it may be the laptop in power-saving mode :-)
The current mono interp can do at least twice that many ops
using a switch.

Ah, that's the number I wanted. So mono manages 23M ops/sec, then?

Dan

--it's like this---
Dan Sugalski  even samurai
[EMAIL PROTECTED] have teddy bears and even
  teddy bears get drunk




Re: More speed trials

2001-10-06 Thread Michael Fischer



On Oct 06, Gibbs Tanton - tgibbs [EMAIL PROTECTED] took up a
keyboard and banged out

 I think that changing from a function based implementation to a switch
 based
 implementation will help on many platforms.  Someone did a patch on
 that,
 maybe we could update it and commit it.

I'll revise it to fit current state of code and kick it back out to you
and the list before Monday AM.


Michael
-- 
Michael Fischer 7.5 million years to run
[EMAIL PROTECTED]    printf "%d", 0x2a;
-- deep thought 



[PATCH] vtable.tbl: REGEX pointer

2001-10-06 Thread Bryan C . Warnock

Index: vtable.tbl
===
RCS file: /home/perlcvs/parrot/vtable.tbl,v
retrieving revision 1.1
diff -u -r1.1 vtable.tbl
--- vtable.tbl  2001/10/06 12:41:57 1.1
+++ vtable.tbl  2001/10/06 16:56:14
@@ -35,5 +35,5 @@
 void logical_or    PMC* left    PMC* right
 void logical_and   PMC* left    PMC* right
 void logical_not   PMC* left
-void match         PMC* left    REGEX re
+void match         PMC* left    REGEX* re
 void repeat        PMC* left    PMC* right

-- 
Bryan C. Warnock
[EMAIL PROTECTED]



Re: vtable.h

2001-10-06 Thread Michael Maraist

On Sat, 6 Oct 2001, Simon Cozens wrote:

 On Sat, Oct 06, 2001 at 09:01:34AM -0500, Gibbs Tanton - tgibbs wrote:
  Also, how will adds of different types be handled.  In the above if pmc2 is
  an int and pmc3 is a float we're going to have to know that and do a switch
  or something to convert to/create the right type.

 There'll actually (and I need to change my vtable code to reflect this) be
 several versions of each vtable function, depending on the relative type of
 each PMC. Basically, there'll be two easily optimizable versions (i.e. types
 are the same, or types can be easily converted with a cast or simple function)
 and a non-optimized version, which would actually be the naive implementation
 in many cases. (These types are way out of my depth - call -get_integer on
 each one, and add the result.)

So would it be something like (ultimately put into a macro):
AUTO_OP add_p_p_p {
  if (!P1)
CREATE_PMC(P1);
  if (!P2 || !P3)
throw exception; // however this is done in Parrot
  P2->vtable->add[P3->type]->(interp, P1, P2, P3); //in macro
}

In this way each vtable operation is really an array of handlers for
each possible type of input.  This avoids any comparisons.  Invalid
comparisons all share a common function (which throws an invalid data-intermingling
exception).

int_pmc_vtable = { ..., { pmc_vtable_add_int_int,
pmc_vtable_add_int_float, pmc_vtable_add_int_string,
pmc_vtable_add_int_iconst, pmc_vtable_add_int_fconst, ... }, ... };


// maps to RET_DT pmc_vtable_add_int_int(interp_t*, PMC*, PMC*, PMC*)
AUTO_VOP add_int_int
{
  UPGRADE(P1,PMC_INT);
  P1->ival = P2->ival + P3->ival;
}

AUTO_VOP add_int_float
{
  UPGRADE(P1,PMC_INT);
  P1->ival = P2->ival + P3->fval;
}

AUTO_VOP add_int_iconst
{
  UPGRADE(P1,PMC_INT);
  P1->ival = P2->ival + P3;
}

AUTO_VOP add_int_fconst
{
  UPGRADE(P1,PMC_INT);
  P1->ival = P2->ival + Parrot_float_constants[P3];
}

AUTO_VOP add_int_string
{
  UPGRADE(P3,PMC_INT);
  UPGRADE(P1,PMC_INT);
  P1->ival = P2->ival + P3->ival;
}

Alternatively, if we can't be both a string AND an int in PMC:
AUTO_VOP add_int_string
{
  int p3ival = PMC_STR_TO_INT(P3);
  UPGRADE(P1,PMC_INT);
  P1->ival = P2->ival + p3ival;
}



This assumes that a = b op c will be the same as a = b.op( c ), which
I think is fair.  Thus add_float_int produces a float while
add_int_float produces an int.  The compiler can worry about the order
of the parameters.

I don't think there's much value in writing separate a op= b since you
could just do:

P1->vtable->add[P1->type]->(interp, P1, P1, P2);

With hardly any additional overhead.  The optimized code might have
been:

AUTO_VOP inc_int_int
{
  P1->ival += P2->ival; // avoids casting P1
}

But now you have LOTS more vtable ops.

My question at this point is if the PMC's are polymorphic like Perl5
or if there is an explicit type type.  Polymorphics can make for very
large vtable sub-arrays. (int, int_float, int_float_string,
int_string, etc).

If PMC-types are bit-masked (for easy upgrading) such as:
O   O O   O
^   ^ ^   
|   | |   ...
INT FLOAT STR

We could apply a macro that extract the desired type.
Such as GET_PMC_TYPE_INT(Px) which if it is of type int, it returns
int, else float else string.

#define GET_PMC_TYPE_INT(type) (type & PMC_INT)?PMC_INT : (type &
PMC_FLOAT)?PMC_FLOAT : (type & PMC_STRING)?PMC_STRING : type

Likewise GET_PMC_TYPE_FLOAT would return first float then int then string

It's not as fast because we're not avoiding the nested if-statements,
but it's easy enough to read.

P2->vtable->add[ GET_PMC_TYPE_INT(P3->type) ]->(...)

Ideally, the bit-pattern for the pmc-type is numerically small (for
small sub-arrays).

enum PMC_TYPES { PMC_INT, PMC_FLOAT, PMC_STR, PMC_INT_FLOAT,
PMC_INT_STR, PMC_INT_FLOAT_STR,
 PMC_FLOAT, ... };

In this way we simply map everything that has INT in it to the same
handler.  No conditionals at all (but lots and lots of vtable space).
Thankfully this is constant and could be assigned globally such that
there is no initialization overhead.

-Michael




Re: vtable.h

2001-10-06 Thread Michael Maraist

On Sat, 6 Oct 2001, Michael Maraist wrote:

 My question at this point is if the PMC's are polymorphic like Perl5
 or if there is an explicit type type.  Polymorphics can make for very
 large vtable sub-arrays. (int, int_float, int_float_string,
 int_string, etc).

 If PMC-types are bit-masked (for easy upgrading) such as:
 O   O O   O
 ^   ^ ^   
 |   | |   ...
 INT FLOAT STR

 We could apply a macro that extract the desired type.
 Such as GET_PMC_TYPE_INT(Px) which if it is of type int, it returns
 int, else float else string.

 #define GET_PMC_TYPE_INT(type) (type & PMC_INT)?PMC_INT : (type &
 PMC_FLOAT)?PMC_FLOAT : (type & PMC_STRING)?PMC_STRING : type

 Likewise GET_PMC_TYPE_FLOAT would return first float then int then string

 It's not as fast because we're not avoiding the nested if-statements,
 but it's easy enough to read.

 P2->vtable->add[ GET_PMC_TYPE_INT(P3->type) ]->(...)

Ops!  Stupid me.  I forgot that at the add_p_p_p level we don't know
that it's an INT / FLOAT, etc.

The only way that we could use bit-masked types is if we wrote a
complex if statement:
 ( P2 & PMC_INT )? (P3 & PMC_INT ? PMC_INT : (P3 & PMC_FLOAT ?
 PMC_FLOAT : PMC_STR ) :
   (P2 & PMC_FLOAT)? (P3 & PMC_FLOAT ? PMC_FLOAT : ( P3 & PMC_INT ) ?
   PMC_INT : PMC_STR ) :
   (P2 & PMC_STR)? ( P3 & PMC_STR ? PMC_STR : 

This is beyond ugly, not to mention not upgradable if/when we add new types.

So unless we're using an enum-style type-compaction doubly indirected
vtables won't be feasible.


 Ideally, the bit-pattern for the pmc-type is numerically small (for
 small sub-arrays).

 enum PMC_TYPES { PMC_INT, PMC_FLOAT, PMC_STR, PMC_INT_FLOAT,
 PMC_INT_STR, PMC_INT_FLOAT_STR,
  PMC_FLOAT, ... };

 In this way we simply map everything that has INT in it to the same
 handler.  No conditionals at all (but lots and lots of vtable space).
 Thankfully this is constant and could be assigned globally such that
 there is no intialization overhead.

 -Michael


-Michael




Re: vtable.h

2001-10-06 Thread Bryan C . Warnock

On Saturday 06 October 2001 01:13 pm, Michael Maraist wrote:
 So would it be something like (ultimately put into a macro):
 AUTO_OP add_p_p_p {
   if (!P1)
 CREATE_PMC(P1);
   if (!P2 || !P3)
 throw exception; // however this is done in Parrot
  P2->vtable->add[P3->type]->(interp, P1, P2, P3); //in macro
 }

 In this way each vtable operation is really an array of handlers for
 each possible type of input. 

Arghh, no. Surely you don't mean 'each possible' the way that I'm reading 
'each possible'? [1]


 int_pmc_vtable = { ..., { pmc_vtable_add_int_int,
 pmc_vtable_add_int_float, pmc_vtable_add_int_string,
 pmc_vtable_add_int_iconst, pmc_vtable_add_int_fconst, ... }, ... };

And each and every class, object, and package that potentially creates 
and/or modifies its vtable.

{snip}


 This assumes that a = b op c will be the same as a = b.op( c ), which
 I think is fair.  Thus add_float_int produces a float while
 add_int_float produces an int.  The compiler can worry about the order
 of the parameters.

add_int_float should also produce a float.  (Barring 'use integer', string 
typing,  or overloading.)

{snip}

 My question at this point is if the PMC's are polymorphic like Perl5
 or if there is an explicit type type.  Polymorphics can make for very
 large vtable sub-arrays. (int, int_float, int_float_string,
 int_string, etc).

Polymorphic plus, I believe.

[1] And don't call me Shirley.

-- 
Bryan C. Warnock
[EMAIL PROTECTED]



printf format strings

2001-10-06 Thread Bryan C . Warnock

What are our short- and long-term goals for handling printf formats for
configurable types? 

This fixes the ones not dependent on the answer above.  I'm also wrapping 
some lengthy lines.   And why aren't the coding standards up on dev.perl.org?

-- 
Bryan C. Warnock
[EMAIL PROTECTED]

Index: packfile.c
===
RCS file: /home/perlcvs/parrot/packfile.c,v
retrieving revision 1.10
diff -u -r1.10 packfile.c
--- packfile.c	2001/10/06 00:57:43	1.10
+++ packfile.c	2001/10/06 17:29:56
@@ -1689,10 +1689,15 @@
 case PFC_STRING:
 printf("[ 'PFC_STRING', {\n");
 printf("FLAGS= 0x%04x,\n", self->string->flags);
-printf("ENCODING = %ld,\n",  (long) self->string->encoding->which);
-printf("TYPE = %ld,\n",  (long) self->string->type);
-printf("SIZE = %ld,\n",  (long) self->string->bufused);
-printf("DATA = '%s'\n",  self->string->bufstart); /* TODO: Not a good idea in general */
+printf("ENCODING = %ld,\n", 
+(long) self->string->encoding->which);
+printf("TYPE = %ld,\n",  
+(long) self->string->type);
+printf("SIZE = %ld,\n",  
+(long) self->string->bufused);
+/* TODO: Not a good idea in general */
+printf("DATA = '%s'\n",  
+(char *) self->string->bufstart); 
 printf("} ],\n");
 break;
 
Index: test_main.c
===
RCS file: /home/perlcvs/parrot/test_main.c,v
retrieving revision 1.14
diff -u -r1.14 test_main.c
--- test_main.c	2001/10/06 00:57:43	1.14
+++ test_main.c	2001/10/06 17:29:56
@@ -46,13 +46,22 @@
 int i;
 time_t foo;
 
-printf("String %p has length %i: %.*s\n", s, (int) string_length(s), (int) string_length(s), (char *) s->bufstart);
+printf("String %p has length %i: %.*s\n", (void *) s, 
+(int) string_length(s), (int) string_length(s), 
+(char *) s->bufstart);
 string_concat(s, t, 0);
-printf("String %p has length %i: %.*s\n", s, (int) string_length(s), (int) string_length(s), (char *) s->bufstart);
+printf("String %p has length %i: %.*s\n", (void *) s, 
+(int) string_length(s), (int) string_length(s), 
+(char *) s->bufstart);
 string_chopn(s, 4);
-printf("String %p has length %i: %.*s\n", s, (int) string_length(s), (int) string_length(s), (char *) s->bufstart);
+printf("String %p has length %i: %.*s\n", (void *) s, 
+(int) string_length(s), (int) string_length(s), 
+(char *) s->bufstart);
 string_chopn(s, 4);
-printf("String %p has length %i: %.*s\n", s, (int) string_length(s), (int) string_length(s), (char *) s->bufstart);
+printf("String %p has length %i: %.*s\n", (void *) s, 
+(int) string_length(s), (int) string_length(s), 
+(char *) s->bufstart);
+
 foo = time(0);
 for (i = 0; i  1; i++) {
 string_concat(s, t, 0);



[PATCH] non-init var possibility

2001-10-06 Thread Bryan C . Warnock

mask and max_to_alloc are uninitialized if the size requested is less than 1.  
(Which it could be, since INTVAL is signed.)  Of course, if it happens, you 
should get what you deserve, but this at least horks them cleanly.

Creation of an UINTVAL (UNTVAL? :-)  and subsequent patches will follow 
pending feedback.

Is the behavior of malloc(0) consistent?

Index: memory.c
===
RCS file: /home/perlcvs/parrot/memory.c,v
retrieving revision 1.12
diff -u -r1.12 memory.c
--- memory.c2001/10/06 00:57:43 1.12
+++ memory.c2001/10/06 17:39:55
@@ -40,8 +40,8 @@
 */
 void *
 mem_allocate_aligned(INTVAL size) {
-ptrcast_t max_to_alloc;
-ptrcast_t mask;
+ptrcast_t max_to_alloc = 0;
+ptrcast_t mask = 0;
 ptrcast_t i;
 void *mem = NULL;


-- 
Bryan C. Warnock
[EMAIL PROTECTED]



[PATCH] packfile.c another uninit var potential

2001-10-06 Thread Bryan C . Warnock

Index: packfile.c
===
RCS file: /home/perlcvs/parrot/packfile.c,v
retrieving revision 1.10
diff -u -r1.10 packfile.c
--- packfile.c  2001/10/06 00:57:43 1.10
+++ packfile.c  2001/10/06 17:53:04
@@ -1507,11 +1507,12 @@

 if (!self) {
 /* TODO: OK to gloss over this? */
-return 0;
+return (opcode_t) 0;
 }

 switch(self-type) {
 case PFC_NONE:
+packed_size = 0;
 break;

 case PFC_INTEGER:
@@ -1533,12 +1534,17 @@
 break;
 
 default:
+packed_size = 0;
 break;
 }
 
 /* Tack on space for the initial type and size fields */
-
-return packed_size + 2 * sizeof(opcode_t);
+if (packed_size) {
+return packed_size + 2 * sizeof(opcode_t);
+}
+else {
+return 0;
+}
 }
-- 
Bryan C. Warnock
[EMAIL PROTECTED]



RE: [PATCH] non-init var possibility

2001-10-06 Thread Gibbs Tanton - tgibbs

No, the behavior of malloc(0) is implementation defined. 

-Original Message-
From: Bryan C. Warnock
To: [EMAIL PROTECTED]
Sent: 10/6/2001 12:43 PM
Subject: [PATCH] non-init var possibility

mask and max_to_alloc are uninitialized if the size requested is less than
1.  
(Which it could be, since INTVAL is signed.)  Of course, if it happens,
you 
should get what you deserve, but this at least horks them cleanly.

Creation of an UINTVAL (UNTVAL? :-)  and subsequent patches will follow 
pending feedback.

Is the behavior of malloc(0) consistent?

Index: memory.c
===
RCS file: /home/perlcvs/parrot/memory.c,v
retrieving revision 1.12
diff -u -r1.12 memory.c
--- memory.c2001/10/06 00:57:43 1.12
+++ memory.c2001/10/06 17:39:55
@@ -40,8 +40,8 @@
 */
 void *
 mem_allocate_aligned(INTVAL size) {
-ptrcast_t max_to_alloc;
-ptrcast_t mask;
+ptrcast_t max_to_alloc = 0;
+ptrcast_t mask = 0;
 ptrcast_t i;
 void *mem = NULL;


-- 
Bryan C. Warnock
[EMAIL PROTECTED]



RE: [PATCH] vtable.tbl: REGEX pointer

2001-10-06 Thread Gibbs Tanton - tgibbs

 
Thanks! Applied.
-Original Message-
From: Bryan C. Warnock
To: [EMAIL PROTECTED]
Sent: 10/6/2001 11:56 AM
Subject: [PATCH] vtable.tbl: REGEX pointer

Index: vtable.tbl
===
RCS file: /home/perlcvs/parrot/vtable.tbl,v
retrieving revision 1.1
diff -u -r1.1 vtable.tbl
--- vtable.tbl  2001/10/06 12:41:57 1.1
+++ vtable.tbl  2001/10/06 16:56:14
@@ -35,5 +35,5 @@
 void logical_or    PMC* left    PMC* right
 void logical_and   PMC* left    PMC* right
 void logical_not   PMC* left
-void match         PMC* left    REGEX re
+void match         PMC* left    REGEX* re
 void repeat        PMC* left    PMC* right

-- 
Bryan C. Warnock
[EMAIL PROTECTED]



RE: vtable.h

2001-10-06 Thread b_vlad

I could help you with the process_op_func.pl thing.
Unless, you've already coded it yourself! :-).

Cheers,
Vladimir Bogdanov.

ps. have to figure out how to get the WinCVS thing to work...
can't seem to be able to access cvs.perl.org. I've used
the following set up:

CVSROOT = :pserver:[EMAIL PROTECTED]:/home/perlcvs
Authentication = passwd file on the remote host.

any suggestion on how to make it work?

-Original Message-
From: Gibbs Tanton - tgibbs [mailto:[EMAIL PROTECTED]]
Sent: Saturday, October 06, 2001 6:12 AM
To: 'Simon Cozens '; '[EMAIL PROTECTED] '
Subject: RE: vtable.h


for add we will end up with

void add( PMC* self, PMC* left, PMC* right );

does this represent:

self = left + right 

or some other ordering?
-Original Message-
From: Simon Cozens
To: [EMAIL PROTECTED]
Sent: 10/6/2001 7:50 AM
Subject: vtable.h

I've just committed some files which generate vtable.h; these were
actually left over from my experiments of a *long* time ago. [1] It
might need quite a few changes, but it's a good start, and I think
it's general enough to survive. 

The next thing I want to do with it is have something akin to
process_op_func.pl which takes a macro-ized description of the vtable
functions and turns them into real C code. Volunteers welcome, or I'll
write it myself. :) This could be a good place, however, for newcomers
to Parrot to get involved with something relatively straightforward
but pretty crucial. Hint, hint...

Simon

[1] In short, back in the beginning, Dan and I independently started
implementing Parrot; Dan's version was more complete and sensible by
the time we got together to discuss it, so his became the codebase
that was checked into CVS. Dan had started with the interpreter main
loop and ops, and I had started with PMCs.

-- 
Actually Perl *can* be a Bondage  Discipline language but it's unique
among such languages in that it lets you use safe words. 
-- Piers Cawley



RE: [PATCH] non-init var possibility

2001-10-06 Thread Tom Hughes

In message [EMAIL PROTECTED]
  Gibbs Tanton - tgibbs [EMAIL PROTECTED] wrote:

 No, the behavior of malloc(0) is implementation defined.

It is, yes, but there are only two legal results according to
the ISO C standard:

"If the size of the space requested is zero, the behavior is
 implementation-defined: either a null pointer is returned, or
 the behavior is as if the size were some nonzero value, except
 that the returned pointer shall not be used to access an object."

In other words it can't crash or do anything else undesirable, and
the result will always be something that can't be dereferenced, but
can be freed (given that the standard requires free(NULL) to work).

Given that, although we can't say the behaviour is, strictly speaking,
consistent, it is true that, as far as performing normal operations on
the pointer goes, you are unlikely to notice which behaviour a given
platform has chosen.
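
For example, either legal behaviour is harmless as long as the pointer is only
ever freed, never dereferenced (toy example):

  #include <stdio.h>
  #include <stdlib.h>

  int main(void) {
      void *p = malloc(0);   /* may be NULL, may be a unique pointer */
      printf("malloc(0) returned %s\n", p ? "a non-NULL pointer" : "NULL");
      free(p);               /* free(NULL) is required to work, so this is safe */
      return 0;
  }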

Tom

-- 
Tom Hughes ([EMAIL PROTECTED])
http://www.compton.nu/




[Patch] Lint, take two.

2001-10-06 Thread Josh Wilmes


Here's a replacement for my previous patch.   This one includes the 
following:

  Makefile target for lint (runs lclint with some very permissive settings)
  Fixes some ignored return values
  A few minor casts.

--Josh

-- 
Josh Wilmes  ([EMAIL PROTECTED]) | http://www.hitchhiker.org




Index: Configure.pl
===
RCS file: /home/perlcvs/parrot/Configure.pl,v
retrieving revision 1.23
diff -u -r1.23 Configure.pl
--- Configure.pl	2001/10/04 20:19:38	1.23
+++ Configure.pl	2001/10/06 21:17:42
@@ -72,7 +72,8 @@
 
 	cc =>			$Config{cc},
 	#ADD C COMPILER FLAGS HERE
-	ccflags =>		$Config{ccflags}." -I./include",
+cc_inc  =>  "-I./include",
+	ccflags =>		$Config{ccflags},
 	libs =>			$Config{libs},
 	cc_debug =>		'-g',
 	o =>			'.o',		# object files extension
Index: Makefile.in
===
RCS file: /home/perlcvs/parrot/Makefile.in,v
retrieving revision 1.18
diff -u -r1.18 Makefile.in
--- Makefile.in	2001/10/06 12:41:57	1.18
+++ Makefile.in	2001/10/06 21:17:42
@@ -9,7 +9,7 @@
 #DO NOT ADD C COMPILER FLAGS HERE
 #Add them in Configure.pl--look for the
 #comment 'ADD C COMPILER FLAGS HERE'
-CFLAGS = ${ccflags} ${cc_debug}
+CFLAGS = ${ccflags} ${cc_inc} ${cc_debug}
 
 C_LIBS = ${libs}
 
@@ -19,6 +19,9 @@
 TEST_PROG = test_prog${exe}
 PDUMP = pdump${exe}
 
+LINT = lclint
+LINTFLAGS = +showscan +posixlib -weak +longintegral +matchanyintegral -formattype
+
 .c$(O):
	$(CC) $(CFLAGS) -o $@ -c $<
 
@@ -86,3 +89,9 @@
 
 update:
 	cvs -q update -dP
+
+lint: test_prog pdump
+	$(LINT) ${cc_inc} $(LINTFLAGS) `echo $(O_FILES) | sed 's/\.o/\.c/g'`
+	$(LINT) ${cc_inc} $(LINTFLAGS) test_main.c
+	$(LINT) ${cc_inc} $(LINTFLAGS) pdump.c	
+
Index: basic_opcodes.ops
===
RCS file: /home/perlcvs/parrot/basic_opcodes.ops,v
retrieving revision 1.32
diff -u -r1.32 basic_opcodes.ops
--- basic_opcodes.ops	2001/10/06 00:57:43	1.32
+++ basic_opcodes.ops	2001/10/06 21:17:43
@@ -140,7 +140,7 @@
 
 /* TIME Ix */
 AUTO_OP time_i {
-  INT_REG(P1) = time(NULL);
+  INT_REG(P1) = (INTVAL)time(NULL);
 }
 
 /* PRINT Ix */
@@ -316,7 +316,7 @@
 
 /* TIME Nx */
 AUTO_OP time_n {
-  NUM_REG(P1) = time(NULL);
+  NUM_REG(P1) = (FLOATVAL)time(NULL);
 }
 
 /* PRINT Nx */
Index: interpreter.c
===
RCS file: /home/perlcvs/parrot/interpreter.c,v
retrieving revision 1.23
diff -u -r1.23 interpreter.c
--- interpreter.c	2001/10/03 16:21:30	1.23
+++ interpreter.c	2001/10/06 21:17:44
@@ -235,6 +235,8 @@
 
 /* The default opcode function table would be a good thing here... */
 {
+/*@-castfcnptr@*/
+   
 opcode_t *(**foo)();
 foo = mem_sys_allocate(2048 * sizeof(void *));
 
Index: packfile.c
===
RCS file: /home/perlcvs/parrot/packfile.c,v
retrieving revision 1.10
diff -u -r1.10 packfile.c
--- packfile.c	2001/10/06 00:57:43	1.10
+++ packfile.c	2001/10/06 21:17:47
@@ -1306,27 +1306,28 @@
 #if TRACE_PACKFILE
 printf(PackFile_Constant_unpack(): Unpacking no-type constant...\n);
 #endif
+return 1;
 break;
 
 case PFC_INTEGER:
 #if TRACE_PACKFILE
 printf(PackFile_Constant_unpack(): Unpacking integer constant...\n);
 #endif
-PackFile_Constant_unpack_integer(self, cursor, size);
+return(PackFile_Constant_unpack_integer(self, cursor, size));
 break;
 
 case PFC_NUMBER:
 #if TRACE_PACKFILE
 printf(PackFile_Constant_unpack(): Unpacking number constant...\n);
 #endif
-PackFile_Constant_unpack_number(self, cursor, size);
+return(PackFile_Constant_unpack_number(self, cursor, size));
 break;
 
 case PFC_STRING:
 #if TRACE_PACKFILE
 printf(PackFile_Constant_unpack(): Unpacking string constant...\n);
 #endif
-PackFile_Constant_unpack_string(self, cursor, size);
+return(PackFile_Constant_unpack_string(self, cursor, size));
 break;
 
 default:
@@ -1335,7 +1336,7 @@
 break;
 }
 
-return 1;
+/*NOTREACHED*/
 }
 
 
Index: pdump.c
===
RCS file: /home/perlcvs/parrot/pdump.c,v
retrieving revision 1.3
diff -u -r1.3 pdump.c
--- pdump.c	2001/09/30 20:25:22	1.3
+++ pdump.c	2001/10/06 21:17:47
@@ -60,7 +60,10 @@
 
 pf = PackFile_new();
 
-PackFile_unpack(pf, packed, packed_size);
+if (!PackFile_unpack(pf, packed, packed_size)) {
+printf( Can't unpack.\n );
+return 1;
+}
 PackFile_dump(pf);
 PackFile_DELETE(pf);
 
Index: register.c
===
RCS file: /home/perlcvs/parrot/register.c,v
retrieving revision 1.10
diff -u -r1.10 register.c
--- 

Re: More speed trials

2001-10-06 Thread Bryan C . Warnock

1) Assuming a core set of unoverrideable opcodes 0-128 (so I don't need to 
differentiate between core and alternate opcodes.)
2) Maintaining each operation as a block (so that any necessary variables 
are declared locally to each case.)
3) Incrementing the pc pointer directly.
4) Accessing the necessary registers as current written (from the 
interpreter struct.)

Benchmarks on test.pasm:

Linux 2.4.7, Athlon 1GHz, gcc 2.96 -O2 long/double/long 
Function table: 31,712,475 ops/sec
Switch hybrid: 39,215,686 ops/sec (+24%)

Solaris 8, UltraSPARC IIe 502MHz, Forte C 6.02 -fast long/double/long
Function table: 13,181,019 ops/sec
Switch hybrid: 18,416,206 ops/sec (+40%)

This is relatively consistent with my pre-Parrot testing.  If the model 
holds, reserving 256 (vice 128, which we're almost at) will reduce the 
difference slightly.  (Obviously, by clustering the most often used codes to the 
front, you'd probably get better performance, since you're not traipsing all 
about memory any longer.  Currently, for instance, comparison and branches 
are 40-60 code blocks away, while 'end' (which occurs once) is at offset 0. 
The ops used in this test are mostly up front.)
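
For reference, the shape of the hybrid loop is roughly this (sketch only; the
opcode number, the 128 cutoff, and the func_table name are placeholders, not
the actual patch):

  /* Sketch of the hybrid dispatch: core ops inlined in a switch,
     everything else through the existing function table. */
  while (pc >= code_start && pc < code_end && *pc) {
      if (*pc < 128) {                 /* reserved, unoverridable core range */
          switch (*pc) {
          case 42: {                   /* made-up opcode: 3-arg integer add */
              INT_REG(pc[1]) = INT_REG(pc[2]) + INT_REG(pc[3]);
              pc += 4;
              break;
          }
          /* ... one block per core op ... */
          }
      } else {
          pc = (*func_table[*pc])(pc, interpreter);  /* placeholder table name */
      }
  }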

-- 
Bryan C. Warnock
[EMAIL PROTECTED]



[PATCH] Switchable runops core functions

2001-10-06 Thread Gregor N. Purdy

All --

I've had a couple of inspirations since the 0.0.2 release, and this
is the one I can do from home, without the op_info stuff from one
of my earlier patches.

Assume there is a configuration space of runops core behaviors that
is based on various settings of the interpreter flags. If there
were enough of them, we'd want to be able to do the combinatorics
with a script that generates the code, but for this example there
are only two such flags:

PARROT_TRACE_FLAG- If true, we print tracing info
PARROT_BOUNDS_FLAG   - If true, we check bounds

I can imagine a third one, which I'll get to in more detail later:

PARROT_EVENT_FLAG- If true, we check for events

Now, we already had test_prog (BTW, when are we going to call this
*the* interpreter, instead of just a test program?) set up to
intercept the -t flag to turn on tracing.

I've set up the -b flag to turn on bounds checking (off by default
now).

But, the most important part of this patch is the implementation
of dynamic switching of runops cores. This means that we have much
flexibility, but programs that don't use a feature don't have to
pay for it in the inner loop. But, those features can be turned
on and off at *run time*.

Here's how it works:

  * A new element of the interpreter structure: resume_addr.

There's a new check in runops so that if we end, but have
a resume_addr set, we go back to the point where we
select a core based on the flags, and resume execution
with the new core.

  * New ops 'trace_ic' and 'bounds_ic' (_i variants would be
reasonable, too).

These ops twiddle the appropriate bits of the interpreter's
flag, set up a resume address and return the PC for the
next instruction (just like any other op).

It is required that you have an 'end' op follow these ops
to force the DO_OP loop to terminate and trigger the
mechanism described above.

I've included a test program, t/trace.pasm, that demonstrates
this.
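
In C terms, the selection amounts to something like this (sketch; the exact
flag-to-index mapping is my assumption, not lifted verbatim from the patch):

  /* Sketch: pick a core from the flags, rerun if an op set resume_addr. */
  for (;;) {
      runops_core_f core =
          runops_cores[ ((interpreter->flags & PARROT_TRACE_FLAG)  ? 2 : 0)
                      | ((interpreter->flags & PARROT_BOUNDS_FLAG) ? 1 : 0) ];
      pc = core(interpreter, pc);
      if (!interpreter->resume_addr)   /* normal termination */
          break;
      pc = interpreter->resume_addr;   /* an op asked for a core re-select */
      interpreter->resume_addr = NULL;
  }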

I've also run t/test.pasm with and without -b and noted a small
increase in performance without bounds checking for those that
want to live fast and dangerously :)


Now, on to PARROT_EVENT_FLAG. With the mechanism implemented
here, events could be turned off at the start (when the event
queue is empty), and ops that queue events could turn the
flag on and cause a resumption so that a core that checks for
events would be used. When the event queue is empty, we can
flip the flag back off again. Details, of course, to be
filled in by folks who want to do the event stuff.

Anyway, whether code that does events needs to leave the flag
on or can get by with flipping it like this, programs that
don't do events don't have to pay *any* inner loop cost for
having an interpreter that allows other programs to use them.
I think that counts as being The Parrot Way (TM).


With enough thought, chances are good that we could come up
with a program that generates runops_cores.[hc] from some
specification of the interpreter flags and some code fragments
with combination hints. Sort of a Poor Man's Aspect-Oriented
Programming for Parrot Inner Loops...

I'm sure you can imagine your own bits of switchable code for
runops cores, but one more that I've thought a little bit
about is:

PARROT_PROFILE_FLAG

for op profiling. I'm sure you can imagine what that would
look like.


I'm considering committing this patch, but I'll wait for some
feedback from others to see if I've missed something important.


Regards,

-- Gregor




? include/parrot/vtable.h
Index: basic_opcodes.ops
===
RCS file: /home/perlcvs/parrot/basic_opcodes.ops,v
retrieving revision 1.32
diff -a -u -r1.32 basic_opcodes.ops
--- basic_opcodes.ops   2001/10/06 00:57:43 1.32
+++ basic_opcodes.ops   2001/10/06 22:27:13
@@ -699,3 +699,20 @@
 AUTO_OP xor_i {
   INT_REG(P1) = INT_REG(P2) ^ INT_REG(P3);
 }
+
+/* BOUNDS_ic */
+AUTO_OP bounds_ic {
+  if (P1) { interpreter->flags |=  PARROT_BOUNDS_FLAG; }
+  else{ interpreter->flags &= ~PARROT_BOUNDS_FLAG; }
+  RESUME(3); /* After the end op which must follow bounds */
+  RETURN(2);
+}
+
+/* TRACE_ic */
+AUTO_OP trace_ic {
+  if (P1) { interpreter->flags |=  PARROT_TRACE_FLAG; }
+  else{ interpreter->flags &= ~PARROT_TRACE_FLAG; }
+  RESUME(3); /* After the end op which must follow trace */
+  RETURN(2);
+}
+
Index: interpreter.c
===
RCS file: /home/perlcvs/parrot/interpreter.c,v
retrieving revision 1.23
diff -a -u -r1.23 interpreter.c
--- interpreter.c   2001/10/03 16:21:30 1.23
+++ interpreter.c   2001/10/06 22:27:13
@@ -13,6 +13,13 @@
 #include parrot/parrot.h
 #include parrot/interp_guts.h
 
+runops_core_f   runops_cores[4] = {
+  runops_t0b0_core,
+  runops_t0b1_core,
+  runops_t1b0_core,
+  runops_t1b1_core
+};
+
 char *op_names[2048];
 int   op_args[2048];
 
@@ -42,26 

Re: More speed trials

2001-10-06 Thread Bryan C . Warnock

On Saturday 06 October 2001 06:38 pm, Bryan C. Warnock wrote:
 4) Accessing the necessary registers as current written (from the
 interpreter struct.)

The added benchmarks are the caching of the interpreter's register groups
within the runops_*_core.  (You can't cache the register set itself, as 
functions may manipulate the register stack.)


 Benchmarks on test.pasm:

 Linux 2.4.7, Athlon 1GHz, gcc 2.96 -O2 long/double/long
 Function table: 31,712,475 ops/sec
 Switch hybrid: 39,215,686 ops/sec (+24%)
Switch (rcache): 41,152,263 ops/sec (+30%)


 Solaris 8, UltraSPARC IIe 502MHz, Forte C 6.02 -fast long/double/long
 Function table: 13,181,019 ops/sec
 Switch hybrid: 18,416,206 ops/sec (+40%)
Switch (rcache): 18,203,883 ops/sec (+38%)

One of the more interesting discoveries?  Adding a 'default:' case to the 
switch slowed down the Linux runs by several percent.

-- 
Bryan C. Warnock
[EMAIL PROTECTED]



[PROPOSED] Crystalizing loader

2001-10-06 Thread Gregor N. Purdy

All --

My previous post included a patch. This one doesn't because I can't work
on this one away from my office. But, I'm going to put the idea out to
the list, and perhaps someone will beat me to trying it (but, do please
tell me if you are going to so I don't go duplicating your effort when
I get back to my office).

I have a 32-bit system, so the discussion below will be geared to that
environment. I think it shouldn't be too hard to adapt the technique to
a 64-bit system, but that's not my area of specialization.


After the bytecode is loaded, but before it is executed, put it through
a stage of processing that requires about as much information as a
disassembler would (which is why my op_info stuff from one of my previous
patches is required).

This process converts opcodes into pointers to the op functions, and
arguments to pointers to the constant values or register entries. This
means that we amortize the dereferences over all invocations of the
op at each PC, which when tight loops are involved should make for
noticeable savings.
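
In outline, the fixup pass would look something like this (sketch only;
op_func_table and address_of_arg are invented names, and it leans on the
32-bit assumption below):

  /* Sketch of the in-place fixup; assumes sizeof(void *) == sizeof(opcode_t). */
  void crystalize(struct Parrot_Interp *interpreter,
                  opcode_t *code, opcode_t *code_end) {
      opcode_t *pc = code;
      while (pc < code_end) {
          opcode_t op = *pc;
          int i, nargs = op_args[op];            /* from the op_info tables */
          *pc = (opcode_t) op_func_table[op];    /* opcode -> op function ptr */
          for (i = 1; i <= nargs; i++)           /* arg -> register/constant ptr */
              pc[i] = (opcode_t) address_of_arg(interpreter, op, i, pc[i]);
          pc += nargs + 1;
      }
  }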

In the case of my 32-bit machine, I could do the conversion in-place
and hand the resulting crystalized bytecode over to a runops
variant that knows what to expect. BTW, this could be controlled by
an interpreter flag (but one that doesn't have a corresponding op):

PARROT_CRYSTALIZE_FLAG

Although, one wonders if these flags really should be part of the
code object rather than part of the interpreter, so that they are
local to their compilation unit. Perhaps the true answer will be
some combination of interpreter flags and code flags combine to
select the runops core that is used.

One extra trick needed, though, is a version of process_opfunc.pl
that compiles the ops so that they expect their arguments to already
have the dereferencing done. This shouldn't be too hard, but we'd
need room for another parallel opcode table, or we'd need to switch
it in as appropriate.


If I don't hear from someone that they are going to try this out, I'll
take it on next time I'm in my office (possibly as early as tomorrow
morning, EST).


Regards,

-- Gregor



? include/parrot/vtable.h
Index: basic_opcodes.ops
===
RCS file: /home/perlcvs/parrot/basic_opcodes.ops,v
retrieving revision 1.32
diff -a -u -r1.32 basic_opcodes.ops
--- basic_opcodes.ops   2001/10/06 00:57:43 1.32
+++ basic_opcodes.ops   2001/10/06 22:27:13
@@ -699,3 +699,20 @@
 AUTO_OP xor_i {
   INT_REG(P1) = INT_REG(P2) ^ INT_REG(P3);
 }
+
+/* BOUNDS_ic */
+AUTO_OP bounds_ic {
+  if (P1) { interpreter->flags |=  PARROT_BOUNDS_FLAG; }
+  else{ interpreter->flags &= ~PARROT_BOUNDS_FLAG; }
+  RESUME(3); /* After the end op which must follow bounds */
+  RETURN(2);
+}
+
+/* TRACE_ic */
+AUTO_OP trace_ic {
+  if (P1) { interpreter-flags |=  PARROT_TRACE_FLAG; }
+  else{ interpreter-flags = ~PARROT_TRACE_FLAG; }
+  RESUME(3); /* After the end op which must follow trace */
+  RETURN(2);
+}
+
Index: interpreter.c
===
RCS file: /home/perlcvs/parrot/interpreter.c,v
retrieving revision 1.23
diff -a -u -r1.23 interpreter.c
--- interpreter.c   2001/10/03 16:21:30 1.23
+++ interpreter.c   2001/10/06 22:27:13
@@ -13,6 +13,13 @@
 #include parrot/parrot.h
 #include parrot/interp_guts.h
 
+runops_core_f   runops_cores[4] = {
+  runops_t0b0_core,
+  runops_t0b1_core,
+  runops_t1b0_core,
+  runops_t1b1_core
+};
+
 char *op_names[2048];
 int   op_args[2048];
 
@@ -42,26 +49,44 @@
 }
 }
 
-/*=for api interpreter runops
+/*=for api interpreter runops_t0b0_core
  * run parrot operations until the program is complete
+ *
+ * No tracing.
+ * No bounds checking.
  */
 opcode_t *
-runops_notrace_core (struct Parrot_Interp *interpreter) {
+runops_t0b0_core (struct Parrot_Interp *interpreter, opcode_t * pc) {
 /* Move these out of the inner loop. No need to redeclare 'em each
time through */
 opcode_t *(* func)();
 opcode_t *(**temp)();
+
+while (*pc) { DO_OP(pc, temp, func, interpreter); }
+
+return pc;
+}
+
+/*=for api interpreter runops_t0b1_core
+ * run parrot operations until the program is complete
+ *
+ * No tracing.
+ * With bounds checking.
+ */
+opcode_t *
+runops_t0b1_core (struct Parrot_Interp *interpreter, opcode_t * pc) {
+/* Move these out of the inner loop. No need to redeclare 'em each
+   time through */
+opcode_t *(* func)();
+opcode_t *(**temp)();
 opcode_t * code_start;
 INTVAL code_size;
 opcode_t * code_end;
-opcode_t * pc;
 
 code_start = (opcode_t *)interpreter-code-byte_code;
 code_size  = interpreter-code-byte_code_size;
 code_end   = (opcode_t *)(interpreter-code-byte_code + code_size);
 
-pc = code_start;
-
 while (pc >= code_start && pc < code_end && *pc) {
 DO_OP(pc, temp, func, 

Re: More speed trials

2001-10-06 Thread Buggs

On Sunday 07 October 2001 01:16, Bryan C. Warnock wrote:
[...]
 One of the more interesting discoveries?  Adding a 'default:' case to the
 switch slowed down the Linux runs by several percent.

In that, umh, case: do you have an explanation
or could you provide the code?


Buggs



Re: More speed trials

2001-10-06 Thread Bryan C . Warnock

On Saturday 06 October 2001 07:43 pm, Gregor N. Purdy wrote:
 The Crystalizing Loader proposal I just made would work better if the
 addresses to the current registers were always the same and pushing
 regs onto stacks made copies, rather than having the current reg file
 be the new set of regs.

And now that you mention it, that may be how the register stack is handled.
Let's take a look...  Nope, it's handled like a regular stack.
Of course, for hardware registers, that isn't so.  You do copy the registers 
onto a stack, but you still reference the registers.  But that's more of a 
function of hardware (where registers are different from memory) than 
software (where memory is the same as memory).  However, the idea of 
always referring to the base set of registers, and pushing and popping copies of 
those registers onto a stack, intrigues me.  (Depends on how much we 
push and pull, I guess.  But you did bring up one thing - you don't get a 
copy of the registers when you push.  That makes it nigh impossible to pass 
values in the registers when you need to save the registers off.  Dan?)

But before we go jumping the gun, let's see what straight registers do.
{dum de dum de dum...} Runs about the same for me.  (A shade slower on 
Linux.) 

 I'm interested to know if there's a way to turn the op funcs into chunks
 of code that longjmp around (or something equivalent) so we can get rid of
 function call overhead for simple ops (complex ops could consist primarily
 of a function call internally).

But argument passing?  In theory, you'd just be coding by hand what the 
platform's calling semantics already provide you.  (More or less.)


 In this case, the crystalizing loader puts the address to jump to in place
 of the opcode, and opcodes jump to the location in the next opcode field
 when they are done, and the 'end' opcode is replaced by a well-known
 location that terminates the runops core.

Saving the dereference of the opcode type.  Yes, I'm reserving judgement on 
this (whilst I ponder it.)

-- 
Bryan C. Warnock
[EMAIL PROTECTED]



Re: More speed trials

2001-10-06 Thread Bryan C . Warnock

On Saturday 06 October 2001 08:05 pm, Buggs wrote:
 On Sunday 07 October 2001 01:16, Bryan C. Warnock wrote:
 [...]

  One of the more interesting discoveries?  Adding a 'default:' case to
  the switch slowed down the Linux runs by several percent.

 In that, umh, case: do you have an explanation
 or could you provide the code?

http://members.home.net/bcwarno/Perl6/spool/interpreter.c
http://members.home.net/bcwarno/Perl6/spool/switch.cinc

An assembler diff between adding the default and not.  I'd interpret it, but 
I haven't figured out exactly how yet.  (These diffs are repeated later on 
in the other runops loop.  I snipped them for brevity.)  The default case 
wasn't exercised during runtime, so it has to be related to the Athlon, 
which I know behaves weirdly.

175c175
   ja  .L212
---
   ja  .L11
179c179
   movl.L213(%eax), %eax
---
   movl.L212(%eax), %eax
185c185
 .L213:
---
 .L212:
3126,3130d3125
 .L212:
   leal-40(%ebp), %eax
   addl$4, (%eax)
   jmp .L11
   .p2align 4,,7

-- 
Bryan C. Warnock
[EMAIL PROTECTED]



Re: More speed trials

2001-10-06 Thread Gregor N. Purdy

Bryan --

 ...  But you did bring up one thing - you don't get a 
 copy of the registers when you push.  That makes it nigh impossible to pass 
 values in the registers when you need to save the registers off.  Dan?)

This is the part about the current design I have a hard time
understanding. That's why I asked for an example of what the subroutine
calling convention is supposed to look like (at least preliminarily). I
don't remember seeing one.

At least for now, though, I've been able to implement Poor Man's (I'm
using that phrase a lot lately--I wonder what that means :) Subroutines
in Jako through the use of address arithmetic in the assembler and a
sprinkle of cleverness. It doesn't do arguments, though. Sooner or later
the Jako compiler is going to have to become a real compiler and do real
register allocation, etc. But I can't do that until the appropriate bits
of Parrot are implemented and my brain is configured to work with them.

 But before we going jumping the gun, let's see what straight registers do.
 {dum de dum de dum...} Runs about the same for me.  (A shade slower on 
 Linux.)

Could you elaborate on this statement please? I'm not sure I follow...

  I'm interested to know if there's a way to turn the op funcs into chunks
  of code that longjmp around (or something equivalent) so we can get rid of
  function call overhead for simple ops (complex ops could consist primarily
  of a function call internally).
 
 But argument passing?  In theory, you'd just be coding by hand what the 
 platform's calling semantics already provide you.  (More or less.)

There's no argument passing, because the args are on the stream;
everything is in the byte code stream. You jump to a fixed up address.
The code there knows the PC within the byte code, so it messes with
its args (fixed up pointers to regs and constants) and then jumps to
the address that's been fixed up in place of the next op's opcode (after
updating the PC). No argument passing. Unless I've missed something...
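
Something in the spirit of gcc's labels-as-values, say (toy sketch, not a
patch; the loader is assumed to have already replaced opcodes with code
addresses and args with plain values):

  #include <stdio.h>

  int main(void) {
      long regs[4] = {0, 0, 5, 7};
      /* stream layout: [code addr, dest, src1, src2, code addr] */
      void *stream[] = { &&do_add, (void *)1, (void *)2, (void *)3, &&do_end };
      void **pc = stream;

      goto *pc[0];                     /* enter the stream */

  do_add:
      regs[(long)pc[1]] = regs[(long)pc[2]] + regs[(long)pc[3]];
      pc += 4;                         /* past the op and its 3 args */
      goto *pc[0];                     /* straight to the next op, no call */

  do_end:
      printf("r1 = %ld\n", regs[1]);   /* prints 12 */
      return 0;
  }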

  In this case, the crystalizing loader puts the address to jump to in place
  of the opcode, and opcodes jump to the location in the next opcode field
  when they are done, and the 'end' opcode is replaced by a well-known
  location that terminates the runops core.
 
 Saving the dereference of the opcode type.  Yes, I'm reserving judgement on 
 this (whilst I ponder it.)

Yeah, I want to save (really amortize) all those dereferences and also
save the function call overhead for all simple ops (as I said before, 
complex ops that need temporary variables and such would probably be
moved to functions and the code at the jump target would call that
function with appropriate arg passing and then get back to the same
business as the rest of the ops by updating PC and jumping to the next
op func body).


Regards,

-- Gregor




Re: More speed trials

2001-10-06 Thread Bryan C . Warnock

On Saturday 06 October 2001 09:07 pm, Gregor N. Purdy wrote:
  But before we going jumping the gun, let's see what straight registers
  do. {dum de dum de dum...} Runs about the same for me.  (A shade slower
  on Linux.)

 Could you elaborate on this statement please? I'm not sure I follow...

Oh, since I wasn't doing any register stack manipulation, I pointed to the 
register set itself (to save another level of indirection) to see if that 
would indeed improve performance on register manipulation.  The x86 ran 1/10 
second slower, and the SPARC was unchanged.  (So there's no real performance 
gain to point at for working off the bottom instead of the top.)


   I'm interested to know if there's a way to turn the op funcs into
   chunks of code that longjmp around (or something equivalent) so we can
   get rid of function call overhead for simple ops (complex ops could
   consist primarily of a function call internally).
 
  But argument passing?  In theory, you'd just be coding by hand what the
  platform's calling semantics already provide you.  (More or less.)

 There's no argument passing, because the args are on the stream. If
 everything is in the byte code stream. You jump to a fixed up address.
 The code there knows the PC within the byte code, so it messes with
 its args (fixed up pointers to regs and constants) and then jumps to
 the address thats been fixed up in place of the next op's opcode (after
 updating the PC). No argument passing. Unless I've missed something...

Well, yes.  Argument passing.  Whether they're on the stack or in the 
stream.  (In this case, the stream *is* the stack, sort of.)  I'm just 
saying that, in essence, all the jumping that you'd be coding, with the 
arguments in the stream (vice the stack), is more or less simply reinventing 
the calling semantics of whatever hardware you're on.  At some point, 
though, we will have to trade maintainability and sanity for speed.  ;-)


   In this case, the crystalizing loader puts the address to jump to in
   place of the opcode, and opcodes jump to the location in the next
   opcode field when they are done, and the 'end' opcode is replaced by a
   well-known location that terminates the runops core.
 
  Saving the dereference of the opcode type.  Yes, I'm reserving judgement
  on this (whilst I ponder it.)

 Yeah, I want to save (really amortize) all those dereferences and also
 save the function call overhead for all simple ops (as I said before,
 complex ops that need temporary variables and such would probably be
 moved to functions and the code at the jump target would call that
 function with appropriate arg passing and then get back to the same
 business as the rest of the ops by updating PC and jumping to the next
 op func body.

Well, the simple ops switch is all inlined.  (No function calls.)  But you 
lose the ability to truly cache those addresses, so you can't call them 
directly.  And attempting to discern between an already converted address and 
a simple op will lose any ground you've gained.

(But some caching of the dereferences is good.  On the x86, where registers 
are scarce, I just squeezed another 1.3 million ops/sec by doing that.  But 
the same trick on the SPARC (which has the registers to cache it 
automatically) suffered a performance hit with the overhead of storing the 
dereference.)
 
-- 
Bryan C. Warnock
[EMAIL PROTECTED]



RE: More speed trials

2001-10-06 Thread Gibbs Tanton - tgibbs

I tried removing the bounds checking and adding multiple DO_OPs inside the
while loop.  With -O0 the loop unrolling helped, but removing the bounds
checking actually slowed it down.  With -O3, neither one helped at all.