Re: Need JIT help please - JIT broken with optimized build on Windows (VC)

2007-08-20 Thread Paolo Molaro
On 08/16/07 Joshua Isom wrote:
 The optimization done by the parrot jit is invalid for the x86 C calling
 convention: the area of memory containing the passed arguments
 can be used as scratch space by the called function.
[...]
 Let's go with a Microsoft blog about it, 
 http://blogs.msdn.com/oldnewthing/archive/2004/01/08/48616.aspx.  With 
__stdcall, the callee cleans the stack.  With __cdecl (used everywhere else, 
and in part of Windows), the caller cleans the stack.  If the ops are 
converted to __cdecl, it should help fix it (since the function shouldn't 
logically assume anything about how much stack space is available for 
clobbering).  I'm having trouble finding anything about who owns the stack 
space.  My personal view is that the caller owns that stack space, since it 
can also be used to store local variables, i.e. don't move around data 
that has been laid out for calling a function.

It doesn't matter if the call conv is cdecl or stdcall, you always end
up with arguments on the stack:

arg2
arg1
arg0
return ip
[possibly old ebp value]

The location where arg0, arg1 and arg2 are stored on the stack can and
will be overwritten by the called function, it doesn't matter if esp is
restored by the caller or by the callee. So, if someone wants to do the
following calls:
foo (1, 2, 3);
foo (1, 2, 3);

it can't emit (with cdecl):

push 3
push 2
push 1
call foo
call foo
add esp, 12

because the stack words where 1, 2 and 3 were stored may have been
overwritten by the first call to foo, resulting in garbage being passed
to the second call.
The called function knows that it can clobber the stack space where
arguments were passed to it (it is of course a compiler bug if it
changes data above the passed-in arguments).

lupus

-- 
-
[EMAIL PROTECTED] debian/rules
[EMAIL PROTECTED] Monkeys do it better


Re: Need JIT help please - JIT broken with optimized build on Windows (VC)

2007-08-16 Thread Paolo Molaro
On 08/16/07 Ron Blaschke wrote:
  This optimization likely reaches back to the times when the opcode engine
  was designed. It saves one interpreter push statement [1] per JIT call to
  an external function, and I've always thought of it as a very cool (and
  valid) thingy ever since I first realized why the interpreter is the
  second argument in opcode functions ;)
 
 I think it's a really cool idea, too.  I'd like to have a way to disable
 it, though, to measure its effect, and maybe to work around compilers
 like VC (at least until a better solution comes around).

The optimization done by the parrot jit is invalid for the x86 C calling
convention: the area of memory containing the passed arguments
can be used as scratch space by the called function.
If you can make sure it's not harmful that way you could still use it
when calling parrot's own jitted functions which could use a different
calling convention, but it is wrong when interoperating with other code
(gcc can generate the same issues, so it's not just VC).

lupus



Re: Supporting safe managed references

2006-01-24 Thread Paolo Molaro
On 01/24/06 Jonathan Worthington wrote:
 .NET has these managed reference thingies.  They're basically like 

They are called managed pointers.

 pointers, but safe.  What makes them safe is that only certain instructions 
 can create them and the pointer value can't be set directly (we can do that 
 fine - just have a PMC with an underlying struct that hold the pointer, but 
 provide no PMC methods that set that pointer).

The value of a managed pointer can of course change (you can store
managed pointers in local variables...). This is true even when you
restrict your implementation to verifiable code.

Making them work on Parrot is no problem.  Making them work without 
compromising the safety of the VM is harder.  Amongst the things you can 

I think you're approaching this from the wrong side. The IL bytecode is
checked for correctness and verifiability during translation, and this is
completely independent of the VM that executes the translated code.
Assuming you don't introduce bugs in the translation and the VM works
correctly, after translation the code won't compromise the VM.

 get a pointer to are parameters and local variables.  Under .NET that means 
 stack locations; with Parrot that means registers.  So, imagine this.

Well, you'll need a stack in parrot, too, or performance will be
horrible. It's still not clear to me if your effort is supposed to build
a practical system or just a proof of concept, though. If you restrict
yourself to a subset of verifiable ops and don't care about speed, any
approach will work;-)

 1) A two part solution.
 
 a) Make set an MMD operation.  This way, a .NET managed reference PMC knows 
[...]
 b) Add a v-table flag saying returning me is forbidden and checking that 
[...]
 2) Add a new Reference register type.  We have INSP, so we'd also have R 
[...]
 Maybe someone has an idea for a better way.  Anyhow, that's the problem and 
 the ideas so far.  Comments and suggestions very welcome.

I think the only sane solution is to make sure that the integer
registers are at least pointer-sized and use them to store managed
pointers (not sure if this is possible with parrot: some time back there
was a policy of allowing the size of int regs to be configured, so
one build could provide different semantics than another...).

The only caveat is that you'll need to check those registers during GC.

BTW, from a quick look at your interesting blog:
the arrays as supported by the VM are fixed length and can only contain
elements of a single type. One reading of the specification hints that
they are always object types, so a FixedPMCArray would do fine for any
array.

Arrays have an element type and all the elements will be of that type.
The element type can be almost any type, though, so not only objects,
but also bytes, integers etc.
Also, if you care about the details: both arrays and strings must be
allocated in a single contiguous chunk of memory (but see above: with a
subset, non-practical system you can ignore this).

lupus



Re: pmc_type

2004-10-29 Thread Paolo Molaro
On 10/29/04 Leopold Toetsch wrote:
  Ugh, yeah, but what does that buy you?  In dynamic languages pure
  derivational typechecking is very close to useless.
 
  Actually, if I were to write a perl runtime for parrot, mono or
  even the JVM I'd experiment with the same pattern.
 
 For the latter two yes, but as Luke has outlined that doesn't really
 help for languages where methods are changing under the hood.

If a method changes you just replace the pointer in the vtable
to point to the new method implementation. Invalidation is the
same: you just replace it with a stub that raises the
"method not found" error/exception.

  You would assign small interger IDs to the names of the methods
  and build a vtable indexed by the id.
 
 Well, we already got a nice method cache, which makes lookup a
 vtable-like operation, i.e. an array lookup. But that's runtime only
 (and it needs invalidation still). So actually just the first method
 lookup is a hash operation.

And where is it cached and how? Take (sorry, still perl5 syntax:-):

foreach $i (@list) {
$i->method ();
}

With the vtable idea, the low-level operations are (in pseudo-C):
vtable = $i->vtable; // just a memory dereference
code = vtable [method-constant-id]; // another mem deref
run_code (code);

From your description it seems it would look like:
vtable = $i->vtable;
code = vtable->method_lookup (method); // C function call
run_code (code);

Note that $i may be of different type for each loop iteration.
Even a cached lookup is going to be slower than a simple memory 
dereference. Of course this only matters if the lookup is 
actually a bottleneck of your function call speed.

  matter in parrot as long as bytecode values are as big as C ints:-)
 
 That ought to come ;) Cachegrind shows no problem with opcode fetch and
 you know, when it's compiled to JIT bytecode size doesn't matter anyway.
 We just avoid the opcode and operand decoding.

If you use a JIT, decode overhead is already very small:-) AFAIK, alpha
is the only interesting architecture that doesn't do byte access (at least
on older processors) and so it may be a little inefficient there. But I think
you should optimize for the common case. On my machine going through a byte
opcode array is faster than an int one by about 15% (more if level 2 cache
or main memory is needed to hold it). The only issue is when you need to
load int values that don't fit in a byte, but those are not as common as
register numbers, which currently take a whole int in your bytecode but
could fit in a byte.
Anyway, the two approaches may also balance out if the opcodes are
in read-only memory. The issue is that in perl, for example, so much is
supposed to happen at runtime, because the 'use' operator changes the
compiling environment, so you actually need to compile at runtime in many 
cases, not only eval. That means emitting parrot bytecode in memory and
this bytecode is per-process, so it increases memory usage and eventually
swapping activity. As you say, since you jit, this memory is wasted, since
it goes unused soon after it is written.
Another issue is disk-load time: when you have small test apps it doesn't
matter, but when you start having bigger apps it might (even mmapping
has its cost, if you need a larger working set to load bytecodes).

BTW, in the computed goto code, make the array of address labels const:
it helps reduce the read-write working set, at least when parrot is built
as an executable.

lupus



Re: pmc_type

2004-10-28 Thread Paolo Molaro
On 10/27/04 Luke Palmer wrote:
 Stéphane Payrard writes:
  That would allow to implement typechecking in imcc.
  
.sym Scalar a
a = new .PerlInt  # ok.  Perlint is derived from Scalar
 
 Ugh, yeah, but what does that buy you?  In dynamic languages pure
 derivational typechecking is very close to useless.  The reason C++[1]
 has pure derivational semantics is because of implementation.  The
 vtable functions have the same relative address, so you can use a
 derived object interchangeably.  In a language where methods are looked
 up by name, such strictures are more often over-restrictive than
 helpful.

Actually, if I were to write a perl runtime for parrot, mono or 
even the JVM I'd experiment with the same pattern. I guess it could
be applied to a python implementation, too.
You would assign small integer IDs to the names of the methods
and build a vtable indexed by the id. In most cases the method name
is known at compile time, so you know the id and you can get
the method with a simple load from the vtable. This is much faster
than a hash table lookup (I hinted at this in my old RFC for perl6).
Of course the table would be sparse, especially in pathological 
programs, so you could have a limit, like 100 entries or less
with IDs bigger than that using a different lookup (binary search 
on an array, for example). There are a number of optimizations that 
can be done to reduce the vtable size, but I'm not sure this would
matter in parrot as long as bytecode values are as big as C ints:-)
Maybe someone has time to write a script and run it on a bunch of 
perl programs and report how many different method names are usually
created. Of course it also depends on how much the hash lookup costs
relative to the total cost of a subroutine call...

lupus



Re: ICU Outdated - Ideas

2004-08-09 Thread Paolo Molaro
On 08/03/04 Leon Brocard wrote:
 IIRC the mono people wrote their own, but with the ICU data files.
 Apart from license issues, this might be an interesting thing to look
 at.

We use some of the ICU data files for locale info like day/month names,
date/time formats etc. We have a tool that takes the data, merges with
our 'fixes' to the data and spits out C data structures.
We still use ICU for string collation if the libs are available.
For a number of reasons (ICU is huge for what we need from it, its
behaviour doesn't always match our requirements, it is an external
dependency, it uses C++ and others) we'd like to drop the ICU dependency
as soon as possible.
If people are interested, we might be able to cooperate to develop a 
library which does just that (unicode string collation) and which 
could be imported and used both in mono and parrot (with minimal or no
changes).
From the mono point of view, the license would have to be of the MIT/X11
kind (though initially written in C, I'd like to be able to move the 
code to C# later to take advantage of the jit: our C# libs are MIT/X11
licensed). I guess this would be fine for parrot as well, but let me
know if it's not.
The other requirement is to efficiently handle UTF-16 strings, which are
the internal representation in the CLR (I don't care if the library can
also handle UTF-8 or UCS4: we just won't use those bits); i.e. as long as
the code doesn't need to convert the strings to UCS4 before operating on
them, it should be fine.
I haven't investigated all the issues with collation support, so it
might end up that it's not practical to use the same code in mono and
parrot, but I thought I'd ask and see what people think.

lupus



Re: Newbie Question

2004-04-02 Thread Paolo Molaro
On 04/01/04 Goplat wrote:
  I read in the FAQ, vis a vis using the .NET instead of writing your own
  The .NET VM didn't even exist when we started development, or at least we
  didn't know about it when we were working on the design. We do now, though
  it's still not suitable.
[...]
 Those VMs are designed for statically typed languages. That's fine,
 since Java, C#, and lots of other languages are statically typed. Perl
 isn't.  For a variety of reasons, it means that Perl would run more
 slowly there than on an interpreter geared towards dynamic languages.

People may want to take a look at this paper:
http://www.python.org/pycon/dc2004/papers/9/IronPython_PyCon2004.html
It suggests a few techniques to implement dynamic languages on the CLR
(some surprisingly similar to the ones I suggested on these lists or in
private emails to people asking: kudos to Jim for actually writing the
code instead of just talking like me:-).
As it has been suggested, python on the CLR runs significantly faster
for some programs and significantly slower for others: now to find out
what case is more prevalent in common code:-) It seems safe to say,
though, that it can run roughly at the same speed as the current C
running on the CLR gives. That sounds good enough for me: I would pay a
50% speed degradation on the dynamic language side when I can easily
implement the time-critical code in C# and have that chunk run 10+
times faster. This new version of IronPython seems to
implement all/most of the semantics of python so it's also likely that
it could be improved to be even faster.
Note also his conclusions, though:

IronPython can run fast on .NET and presumably on any decent
implementation of the CLR. Python is an extremely dynamic language and
this offers compelling evidence that other dynamic languages should be
able to run well on this platform. However, implementing a dynamic
language for the CLR is not a simple process. The CLR is primarily
designed to support statically typed OO and procedural languages.
Allowing a dynamic language to run well on this platform requires
careful performance tuning and judicious use of the underlying CLR
constructs.

lupus



Re: [PATCH] more oo*.* benchmarks

2004-03-21 Thread Paolo Molaro
On 03/21/04 Jarkko Hietaniemi wrote:
[...]
  oofib   100%144%132%240%212%140%136%
[...]
 That being said, people more conversant than me in Python/Ruby
 (or Parrot) are welcome to carefully compare the scripts to verify that
 the scripts really do implement the same tasks.

oofib.imc seems to use int registers for the arguments and the
calculations, though at least the perl code uses scalars (of course).
So, while that tells us that using typed integer registers makes for
faster code, the equivalent code should be using PerlInt PMCs, I think.

lupus



Re: Classes and metaclasses

2004-03-14 Thread Paolo Molaro
On 03/13/04 Mark Sparshatt wrote:
 One difficulty is when calling class methods some languages require that 
 you provide the class object as the receiver (Ruby and C# do this) while 
 some languages let you use an instance of the class as the receiver 
 (Java does this)

I think you're confusing what it looks like and what it does.
When designing a VM such as parrot it's very important to keep the two
things separate.
What it looks like is the realm of the compiler. The compiler maps
from the language syntax to the execution capabilities provided by the
virtual machine and runtime. Even if from the syntax point of view
it looks like static methods in C# require the class name to invoke
them, this doesn't mean a 'class object' is involved at all. Same with
java: if it's allowed to call a static method with the syntax:
instance_of_class.static_method_of_class ();
it doesn't mean instance_of_class is involved in the call: it isn't.
The compiler will find static_method_of_class and emit a direct call to it,
possibly discarding instance_of_class completely (unless it's an
expression with possible side effects, but in any case, the instance
object is not involved in the call).

What it does is the realm of the virtual machine or runtime.
The virtual machine needs to provide enough power for the compiler
to be able to implement its specific language semantics (either
directly or through helper methods/libraries). Of course, if the VM
provides more than one way to do it, the compiler should try to use the
more efficient one. For example, on the CLR you can call a static method
in at least three ways:
* directly from IL code (same overhead as a direct C function call)
* using reflection (provides lots of flexibility, since it can
be done at runtime, arguments are converted to the correct
types if needed etc. Of course the price to pay is slowness...)
* delegate invocation: some of the benefits of a reflection call
with an overhead just slightly bigger than a direct call

I think parrot should provide two mechanisms: a fast one for languages
that can take advantage of it and a more dynamic one for use by the
other implementations. Of course the main issue becomes: what happens
when two different languages with different semantics need to call each
other's methods? This is the main issue that parrot faces if it is to
become a real VM for dynamic languages, but sadly this problem space has
not been addressed yet, mostly because of the lack of real language
implementations (but the pie contest is rapidly approaching and it's
likely to be a big driving force:-) The PHP compiler is also progressing
nicely from what I hear, so it's likely that by summer some people
will have to actually deal with inter-language calls).

Some of the call issues are present already: I don't remember if it has
been addressed somewhere already, but how does parrot deal with
perl (arguments passed in the @_ array) with the parrot call convention
that passes some args in registers and some not?
For example, consider:
sub my_func {
my $arg1 = shift;
do_something (@_);
}
When it is called with:
my_func (1);
my_func (1, 2, 3, 4, ..., 11, 12, 13);
my_func (@an_array);
Since the call has no prototype some args go in registers and some in
the overflow array. In the first case my_func will need to build the @_
array from the first PMC arg register. In the second case the compiler
needs to create a temporary array and put some args in registers and
some in the overflow temp array. When called, my_func needs to gather
all the args and put them in the @_ array. 
In the third case the compiler needs to discard the passed array, put
some of the elements in registers and some in a newly created temp array
(the overflow array) unless it can play tricks by modifying @an_array
(but this is only safe when @an_array is a temporary list, such as one
returned from sort etc.). Again, at the method prolog, my_func needs to
reconstruct the argument array @_ from args in registers and in the
overflow array. Some of the shuffling could be avoided if @_ becomes
a magic array which keeps some data in registers and some in the
overflow array, but this has many issues itself, since the first
arguments are in registers which could be overwritten at the first call.
So it looks like there is already a lot of complexity and memory
shuffling: has anyone generated the code (maybe by hand) to implement in
parrot something like the above scenario and measured the speed
characteristics vs the equivalent perl5/python code (I don't know,
though, if in python the call semantics are similar to the perl5 ones...)?
Thanks.

lupus



Re: newbie question....

2004-03-14 Thread Paolo Molaro

My weekly perusing on parrot lists...

On 03/12/04 Dan Sugalski wrote:
 For example, if you look you'll see we have 28  binary add ops. 
 .NET, on the other hand, only has one, and most hardware CPUs have a 

Actually, there are three opcodes: add, add.ovf, add.ovf.un (the last
two throw an exception on overflow with signed or unsigned addition:
does parrot have any way to detect overflow?).

 few, two or three. However... for us each of those add ops has a very 
 specific, fixed, and invariant parameter list. The .NET version, on 
 the other hand, is specified to be fully general, and has to take the 
 two parameters off the stack and do whatever the right thing is, 
 regardless of whether they're platform ints, floats, objects, or a 
 mix of these. With most hardware CPUs you'll find that several bits 

Well, not really: add is specified for fp numbers, 32-bit ints, 64-bit
ints and pointer-sized ints. Addition of objects or structs is handled
by the compiler (by calling the op_Addition static method if it exists,
otherwise the operation is not defined for the types). Also, no mixing
is allowed, except between 32-bit ints and pointer-sized ints;
conversions, if needed, must be inserted by the compiler.

 in each parameter are dedicated to identifying the type of the 
 parameter (int constant, register number, indirect offset from a 
 register). In both cases (.NET and hardware) the engine needs to 
 figure out *at runtime* what kind of parameters its been given. 

Well, on hardware the opcodes are really different, even if it may look
like they have a major opcode and a sub-opcode specifying the type.

 the decoded form. .NET does essentially the same thing, decoding the 
 parameter types and getting specific, when it JITs the code. (And 
 runs pretty darned slowly when running without a JIT, though .NET was 
 designed to have a JIT always available)

Yes, so it doesn't matter:-) It's like saying that x86 code runs slow if
you run it in an emulator:-) It's true, but almost nobody cares
(especially since IL code can now be run with a jit on x86, ppc, sparc
and itanium - s390, arm, amd64 are in the works).

 Parrot doesn't have massive parallelism, nor are we counting on 
 having a JIT everywhere or in all circumstances. We could waste a 
 bunch of bits encoding type information in the parameters and figure 
 it all out at runtime, but... why bother? Since we *know* with 
 certainty at compile (or assemble) time what the parameter types are, 
 there's no reason to not take advantage of it. So we do.

Sure, doing things as java does, with different opcodes for different
types is entirely reasonable if you design a VM for interpretation
(though arguably there should be a limit to the combinatorial explosion
of different type arguments). There is only a marginal issue with
generics code that the IL way of doing opcodes allows and the java 
style does not, but it doesn't matter much.

 real penalty to doing it our way. It actually simplifies the JIT some 
 (no need to puzzle out the parameter types), so in that we get a win 
 over other platforms since JIT expenses are paid by the user every 
 run, while our form of decoding's only paid when you compile.

This overhead is negligible (and is completely avoided by using the
ahead of time compilation feature of mono).

 Finally, there's the big does it matter, and to whom? question. As 
 someone actually writing parrot assembly, it looks like parrot only 
 has one add op--when emitting pasm or pir you use the add 
 mnemonic. That it gets qualified and assembles down to one variant or 

Well, as you mention, someone has to do it and parrot needs to do it
anyway for runtime-generated parrot asm (if parrot doesn't do it already
I guess it will need to do it anyway to support features like eval etc.).
Anyway, if you're going to JIT it doesn't matter if you use one opcode
for add or one opcode for each different kind of addition. If you're
going to interpret the bytecode, having specific opcodes makes sense.

 For things like the JVM or .NET, opcodes are also bit-limited (though 
 there's much less of a real reason to do so) since they only allocate 
 a byte for their opcode number. Whether that's a good idea or not 

Don't know about the JVM, but the CLR doesn't have a single byte limit
for opcodes: two byte opcodes are already specified (and if you consider
prefix opcodes you could say there are 3- and 4-byte opcodes already:
unaligned.volatile.cpblk is such an opcode). Also, the design allows for
any number of bytes per opcode, though I don't think that will be ever
needed: the CLR is designed to provide a fast implementation of the
low-level opcodes and to provide fast method calls: combining the two
you can implement rich semantics in a fast way without needing to change
the VM. There are still a few rough areas that could use a speedup
with specialized opcodes, but there are very few of them and 2-3
additional opcodes will fix them.

 Parrot, on the other 

Re: pdd15_objects.pod, attributes, properties, and life

2004-02-27 Thread Paolo Molaro
On 02/27/04 Dan Sugalski wrote:
   What .NET calls an attribute parrot calls a property
   What .NET calls a property parrot calls an attribute
[...]
 Oh, yeah. No matter which way we go we'll be confusing someone, since 
 there are languages that call what we call an attribute an attribute. 
 The .NET reversal's just the big nasty one.

In the CLR properties are defined by:
a name
an optional set method
an optional get method
One of the two methods must exist (enabling readonly/writeonly
properties). So, the only thing CLR properties have in common with
Parrot attributes is that they both have a name:-) From the pod
description it looks like Parrot attributes have per-object values,
though CLR properties can be static or instance properties.
At least half of the name confusion goes away by simply rewriting the
sentence to:
What .NET (and C/C++ etc) calls a field parrot calls an attribute
Though I would personally just use the name 'field' for them and have no
need to explain to programmers what attributes are.
Apparently, there are two vtable methods to get and set an attribute,
still this doesn't mean an attribute is like a CLR property, since the
access methods are per-class in Parrot, while per-property in the CLR.

Small comment on a chunk of the doc:
=item addattribute Px, Sy

Add attribute Sy to class Px. This will add the attribute slot to all
objects of class Px and children of class Px, with a default value of
C<Null>.

This basically means that all the live objects in the heap must be
considered, a possibly expensive isa() op run on each of them, and if they
are derived from Px a lot of memmoves would be going on in the
attributes array. Hope no one calls this method on a busy server:-)

Next issue: parrot properties. The description says:
=item Property

A name and value pair attached to a PMC. Properties may be attached to
the PMC in its role as a container or the PMC in its role as a value.

Properties are global to the PMC. That is, there can only be one
property named FOO attached to a PMC, and it is globally visible to
all inspectors of the PMC's properties. They are I<not> restricted by
class.

Properties are generally assigned at runtime, and a particular
property may or may not exist on a PMC at any particular
time. Properties are not restricted to objects as such, and any PMC
may have a property attached to it.

Basically a parrot property has a name and a value generally assigned
at runtime to a PMC (considered as a value or as a container), whatever
its type.
A CLR attribute is a completely different thing:-) It is an object
defined only at compile time for some specific metadata elements:
assemblies (libraries), types, methods, fields, method
arguments, properties, events.
They have no name, so there can be multiple different objects of the
same type associated with a metadata element (actually, getting the same
attribute object repeatedly will give different object instances...).

So the other half of the naming confusion goes away by simply removing
the misleading statement:
What .NET calls an attribute parrot calls a property

Hope this helps. Thanks.
lupus



Re: Alignment Issues with *ManagedStruct?

2004-02-06 Thread Paolo Molaro
On 02/05/04 Uri Guttman wrote:
 with this struct (from leo with some minor changes):
 
 struct xt {
   char x;
   struct yt {
   char i,k;
   int  j;
   } Y;
   char z;
 } X;
 
 and this trivial script (i pass the filename on the command line):
[...]
 i get this lovely output:
 
 struct yt
 char i : offset 0
 char k : offset 1
 int j : offset 2
 struct xt
 char x : offset 0
 struct yt Y : offset 1
 char z : offset 7
[...]
 BTW, this was on a sparc/solaris box.

The offsets look incorrect. On basically every modern 32 or 64 bit
OS (with sizeof(int) == 4) it should look like:

struct yt (size=8, alignment=4)
char i : offset 0
char k : offset 1
int j : offset 4

struct xt (size=16, alignment=4)
char x : offset 0
struct yt Y : offset 4
char z : offset 12

lupus



Re: How to run typeless languages fast on parrot?

2003-11-07 Thread Paolo Molaro
On 11/06/03 Leopold Toetsch wrote:
  because PMCs are so much slower than native types, I lose most of
  parrot's
  brilliant speed due to the fact that I don't know the types at
  compile-time.
 
 When you get rid of all the temps and the clone[1], the loop with PMCs 
 runs at about 3 times the speed of perl5, so I'd not use the term 
 slow. It would be faster with native types of course, but it's hard to 
 impossible to decide at compile time that native types do the same 
 thing. I don't know PHP, but in perl the + operator could be overloaded 
 and calculate the phase of the moon instead of adding 5.5 ...

Yes, this is a very important point. Current dynamic languages can get a
speedup in several ways:
*) reducing opcode dispatch overhead (the various mechanisms parrot
uses, like cgoto etc. and by jitting the code)
*) better VM implementation (for example the use of the vtable instead
of multiple checks trick that Dan mentions)
*) code specialization

The first two you get for free using parrot or another advanced VM
with a JIT (well, the second item may require to properly design
the language specific PMCs or classes, but that's not so difficult after
so many years with the perl/python/php engines).
The latter is what potentially gives the bigger benefit, but it requires
a lot of work in the compiler frontend (using typed variables makes it
easier, but to preserve the dynamic feel something like psyco is needed).
For example, in the sample code, the compiler could see that the
constants are integers and doubles and that the '+' operator hasn't been
overloaded, so it could do away with using PMCs (but, again, this is a
lot of work).

Parrot already helps with the first two items a lot and it provides most
of the needed features for the last item: specialized registers and a
JIT. What I think is missing is a way to perform function/method calls
very fast (maybe also allowing inlining). This is needed when mixing
different languages that install their own vtable implementations
and also inside a dynamic language that uses vtables to multiplex
(like it would happen with perl's IV, NV etc variables). But this is a
small concern: much more work needs to go into the compiler frontends
for php/python/perl etc to take advantage of it.

 # for ($i = 0; $i < 10_000_000; $i = $i + 5.5) { }
 .sub _main
 .sym var i
 i = new PerlUndef
 i = 0
 $P0 = new PerlUndef
 $P0 = 1000
 lp:
 if i >= $P0 goto ex
 i = i + 5.5
 goto lp
 ex:
 end
 .end

I couldn't resist writing an equivalent program that runs on the Mono
VM:-) The Perl* classes implement just a few methods to support the
sample code, but adding more is mostly an issue of cut&paste.
I believe the code accurately simulates what would happen with perl or
php: a scalar starts out as an integer and becomes a float mid-way.
There are three different loops that simulate how smart a compiler
frontend could be: one puts just $i in a 'PMC' (PerlSV in the C# code),
the second puts the loop upper bound, too, to match with the above imcc
code and the last puts the 5.5 constant in a PerlSV, too.

The run times on my machine are:

mono smart-compiler: 250 ms
mono imcc-equiv: 340 ms
mono little-dumber-compiler: 460 ms
parrot -j -O 4: 580 ms
parrot -j: 600 ms
parrot: 635 ms
perl: 1130 ms
php4: 2150 ms

The parrot build was configured with --optimize. Parrot has a very good
startup time (~10 ms), while the startup time for mono (running the same
program with the loop limit set to 10) is about 100 ms. We obviously need
to improve this, but this also means that the actual time to execute the
loops above is about 100 ms lower for mono and 10 ms lower for parrot 
(times were measured with time (1)).
My *guess* is that mono executes the same code faster because of better
register allocation and faster function calls, I expect parrot to
improve on both counts as it matures, at least as long as the vtable
functions are implemented in C: anyone with performance numbers for
vtable functions implemented in parrot asm?
Hope this helps in the discussion (I included the C# code below so that
people can check if I missed something of the semantics).

lupus



using System;

abstract class PerlBase {
    public abstract PerlBase Add (double v);
    public abstract PerlBase Add (int v);
    public abstract PerlBase Add (PerlBase v);
    public abstract bool LessThan (int v);
    public abstract bool LessThan (PerlBase v);
    public abstract PerlIV IsIV ();
    public abstract PerlNV IsNV ();
}

sealed class PerlIV: PerlBase {
    internal int value;

    public PerlIV (int v) {
        value = v;
    }

    public override PerlBase Add (double v) {
        PerlNV res = new PerlNV (v + 

Re: How to run typeless languages fast on parrot?

2003-11-07 Thread Paolo Molaro
On 11/07/03 Leopold Toetsch wrote:
 Very interesting. While we don't compete with typed languages, it's nice 

Oh, but mono _will_ also compete with dynamically typed languages! :-)

 to see, that parrot execution speed is in the region of mono - for such 
 small tight loops.

Well, that was mono executing the same code a dynamic language like php,
python or perl would generate for running on the CLR.
I only implemented it in C# because I don't have the time to write a
perl/php/python compiler:-) If the loop was coded properly in C# it
would have been much faster (as it would have been much faster on parrot
as well if it used only integer and float registers).

 Register allocation AFAIK. PMCs are not in registers yet (only I- and 
 N-Regs). I dunno on which architecture you ran the test, but FYI 

Sorry, forgot to mention I have a 1.1 Pentium III (we currently have the
jit running only on x86, itanium and ppc).

 JIT/i386 calls vtable functions directly for simple opcodes like add. If 
 we have finally PMCs in registers too, we could save more cycles...

Yes, I noted that the unoptimized parrot is only slightly slower than
the optimized build: this means that most of the code is jitted, good.

 A little thing is missing in your code: if num+num gives an equivalent 
 int, it should get morphed back to a PerlIV.

Oh, is that a requirement of the Parrot spec or PHP? AFAIK perl doesn't
behave that way (though in some configuration it tries to do operations
with integer arithmetic, the type is kept as an NV). The quick code I
put together isn't designed for such frequent type flip-flops, so I
fixed it with a suboptimal solution, but it's still very fast:
400 ms (it was 340 before, for reference parrot takes 580 ms).
For those interested, the Add method in PerlNV became:

public override PerlBase Add (double v) {
    value += v;
    int c = (int)value;
    if (value == c) {
        if (ivcache == null) {
            ivcache = new PerlIV (c);
            ivcache.nvcache = this;
        } else {
            ivcache.value = c;
        }
        return ivcache;
    }
    return this;
}

A better solution would have been to store the values directly in PerlSV
and use the value field in it just as a fast dispatch for the different
vtables in PerlIV, PerlNV etc.

Thanks!

lupus



Re: Registers vs. Stack

2003-08-22 Thread Paolo Molaro
On 08/21/03 Tom Locke wrote:
 Note that I have *absolutely* no opinion on this (I lack the knowledge).
 It's just that with Perl, Python, Ruby, the JVM and the CLR all stack based,
 Parrot seems out on a limb. That's fine by me -- innovation is not about
 following the crowd, but I feel it does warrant stronger justification.

A well-designed register-based interpreter can be faster than a
stack-based interpreter (because of the reduced opcode dispatch overhead).
Doing a simple JIT for it may be also easier, if you ignore some
advanced optimizations.
That seems to be the main reason for parrot to go for it.
Perl/Python/Ruby don't have (opcode dispatch) speed as their main aim
(they use coarse instructions) so they use a stack-based design because
it's much simpler.
The JVM and the CLR use a stack-based instruction representation, but
they are intended for JIT compilation in the end and in that case a
register-based representation doesn't buy you anything (and complicates
things such as the calling convention).
That said, it will be interesting to see how Parrot will turn out to be
once it's reasonably feature complete: it may end up providing
interesting research results (whether good or bad, we don't know yet:-).

 p.s. (and at the risk of being controversial :) Why did Miguel de Icaza say
 Parrot was based on religion? Was it related to this issue? Why is he
 wrong?

See his side of the story at:
http://primates.ximian.com/~miguel/activity-log.php (22 July 2003).
There are also a few short comments on some blogs. But the main point
is: at the end of the day numbers are what count, though anyone is free
to assign more weight to different numbers such as:

* execution speed (in mini, macro and bogus benchmarks:-)
* number of languages that can be reasonably run on the VM
* number of languages that _cannot_ reasonably run on the VM:-)
* memory usage overhead
* runtime safety features
* number of platforms supported
* number of developers working on the VM
* number of users of (programmers for) the VM

So, you can't really say someone is wrong, until you measure at least
some of the above quantities and set your priorities for which ones you
prefer. Alas, measuring is hard and someone's priorities may not match
the priorities of whoever implements/designs the virtual machines:-)

I guess some first reasonably good numbers will come out of the Python
on Parrot pie-fest: can't wait a year for it, though:-)

lupus



Re: This week's summary

2003-07-16 Thread Paolo Molaro
On 07/16/03 Dan Sugalski wrote:
pit/pratfalls. At one point, Tupshin suggested emulating a 'more
traditional stack-oriented processor' and I don't think he was 
joking...
 
 Indeed, I wasn't, but I wish somebody would at least have the
 decency to tell me how insane this is. ;-)
 
 Oh, sorry.
 
 You're insane. :)
 
 Traditional processors aren't stack-oriented, not even ones that are
 more register-starved than the x86 family. (I'm thinking of the 6502
 with its 1.75 registers here)

The wording "stack-oriented processor" is a little misleading, since it
usually means the processor has a stack-oriented instruction set,
instead of a register one. The original context, instead, implies it
refers to GCC's assumptions about the existence of the runtime stack
(as the contiguous area of memory where call frames are stored).
There is already a (limited) gcc backend that targets the CLR that
sets up an area of memory for that use, but, as you can guess, it's not
very nice.

 The base architecture's fixed, and I'm not inclined to change it at
 this point. GCC could handle it if someone wanted to work it out--it
 can already deal with multiple register classes, since most machines
 these days have at least two (general purpose and float) and the ones
 with vector processors arguably have three types. It'd probably have
 to do less work for parrot than for other systems, as we have more
 registers, and register starvation's one of the more annoying things
 a compiler has to deal with.

I think a gcc port would require parrot to provide at least a stack 
memory area and a register (sp) that points to it. There may be other
issues with the parrot instruction set, but since you have already
hundreds (or thousands?) of opcodes, I guess it wouldn't be an issue to
add a few more if needed:-)

lupus



Re: Objects, finally (try 1)

2003-01-12 Thread Paolo Molaro
On 01/11/03 Nicholas Clark wrote:
  This allows us to declare 8bit characters and strings of those and all the 
  stuff we're used to with C like unions ... (C# has 16bit chars, and strings
 are UTF8 encoded, IIRC) ...
 
 That doesn't sound right. But if it is right, then it sounds very wrong.
 
 (Translation: Are you sure about your terms, because what you describe sounds
 wonky. Hence if they are using UTF8 but with 16 bit chars, that feels like a
 silly design decision to me. Perl 5 performance is not enjoying a variable
 length encoding, but using an 8 bit encoding in 8 bit chars at least makes
 it small in memory.)

The CLR runtimes use 16 bit chars and UTF16-encoded strings (at least as
far as it's visible to the 'user' programs).

lupus




Re: Objects, finally (try 1)

2003-01-12 Thread Paolo Molaro
On 01/12/03 Gopal V wrote:
 If memory serves me right, Paolo Molaro wrote:
  The CLR runtimes use 16 bit chars and UTF16-encoded strings (at least as
  far as it's visible to the 'user' programs).
 
 1023.2.3 #Strings heap
 11 The stream of bytes pointed to by a #Strings header is the physical 
representation of the logical string heap.
 13 but parts that are reachable from a table shall contain a valid null 
terminated UTF8 string. When the #String

The #Strings heap doesn't contain strings for programs that run in the
CLR (unlike the #US (user string) heap that contains the strings in
UTF-16 encoding). What matters, though, is the encoding of the String
class at runtime and that is defined to be UTF-16, it has absolutely no
importance what encoding it has on disk (even though that encoding is
still UTF-16).

lupus




Re: Quick note on JIT bits

2002-11-16 Thread Paolo Molaro
On 11/16/02 Gopal V wrote:
  the above was a partial cut&paste with s/mono/parrot/ :-)
 
 But don't act like you invented it  Kaffe had one before you thought
 about mono JIT ... 

The idea probably predates kaffe, too. Anyway, I didn't say I had the
idea, but the implementation. Quoting what you dropped from my mail, I
said we had the complete examples and:

 We have also an implementation of the symbol file writer in
 mono/jit/debug* that may be helpful to look at.

If you think you have novel ideas in the JIT (or in the interpreter) space, 
you're probably just deluding yourself.

 http://www.kaffe.org/doc/kaffe/FAQ.xdebugging
 
 I just saved some typing by cut-pasting what I had in my box 
 And don't bother to thank me for introducing the debug jit so that you 
 could show up with Mono has this and more banner

I pointed at the mono implementation simply because it is more complete
and flexible than the kaffe one. kaffe only does the stabs format
(and it outputs only the info for the code). Mono has the code that
allows you to access locals and arguments in a stack frame and inspect
the fields of the runtime-created types. Mono can also output symbol
files in both stabs and dwarf format: the dwarf format doesn't have the
limitation of stabs regarding line number info, for example, and it
makes it possible to output debug info also for optimized code.
BTW: the mono implementation is not mine, I just started it and then
Martin Baulig filled the holes and implemented the dwarf stuff.

If you can find a better implementation than mono's, I'd like to know
about it, too:-) Until then, it's just a waste of people's time to
provide limited information or to point to a limited implementation of
the idea ;-)

lupus




Re: Quick note on JIT bits

2002-11-15 Thread Paolo Molaro
On 11/15/02 Gopal V wrote:
 It is possible ... JIT generated code looks just like loaded code to
 gcc ... Typically gdb should only need access to a symfile to correctly
 allow debugging ... So an .o  file of the JIT'd code should be correctly 
 generated with all the trimmings.
 
 $ gdb parrot
 (gdb) run -debug=dwarf2 --break __main__ Foo.pbc
 (gdb) call Parrot_debug_make_syms()
 (gdb) add-symbol-file Foo.o
 Reading symbols from Foo.o
 (gdb) frame
 #0 __main__ at Foo.py:5 (HINT: where's the python compiler :-)
 
 The trick here is to use `gas' or the gnu assembler to generate the 
 debugging info from the assembly dump you provide ... For example a
 function would be dumped as...

You can find the complete examples of how the jit debugging features
work in the mono tarball (mono/doc directory): the above was a
partial cut&paste with s/mono/parrot/ :-)

We have also an implementation of the symbol file writer in mono/jit/debug*
that may be helpful to look at.

lupus




Re: Minor patch to make the GC count total bytes copied [APPLIED]

2002-04-13 Thread Paolo Molaro

On 04/12/02 Dan Sugalski wrote:
 FWIW, the numbers were:
 
 No JIT:  Parrot  866 gen/sec  Mono  11 gen/sec
    JIT:  Parrot 1068 gen/sec  Mono 114 gen/sec

Interesting data: was this taken a while ago?
I get different ratios on my machine (PIII 1.1):
Parrot JIT: 850 (though the output is all garbled)
Mono JIT: 113

Still quite a bit slower, though. Our String.cs code is still from the
age when we didn't know C# and didn't know how the CLR worked, basically
the same kind of ugly and slow code as the life.cs sample you posted:-)
Anyway, if really the speed ratio went from 9.3 to 7.5 just waiting a
few weeks, I think we are on the good track, even if the code was not
optimized to run your life benchmark:-) (ok, it may just be that parrot
got slower, who knows...)

 In this case, it is apples to apples under the hood--concatenation in 
 Parrot currently generates new strings as well. The only place (at 
 the moment) we don't do immutable strings is for chomp.

Interesting. Currently our string methods are all done in C# code,
but the spec basically requires us to do most of it in C, so I guess
the ratio will change quite a bit when we do that even without an
optimizing JIT (plus, it will save us two memory allocations per 
string object).

 On the other hand, there's a very strong Who cares? argument. 
 (Targeted directly at the CLR design, not at Mono) If the design of 
 the CLR requires certain operations to be really slow, well... 
 they're going to be slow, and code that uses it will be stuck with 
 lousy speeds. It *is* apples to apples comparison of speeds, since 
 we're both doing concatenation. That you're potentially saddled with 
 a slow design isn't my fault. :)

See above: if strings are immutable, concatenation is not the same thing
as when they are buffers, so, in real world code, you don't use a string
if you need a buffer in C# code. Apart from the current known System.String
slowness, nobody would write C# code like in life.cs (well, specially
not for use in benchmarks:-), so I don't consider it a design issue.
Of course, it's a problem if perl programmers start translating the perl
code to C# changing every scalar to a System.String ;-)
But, as we learn to avoid some code patterns in perl code because they
slow down, the same thing I hope will happen to C# programmers,
specially if they see the same kind of improvements in speed as
applying to life.cs the suggestions in my first mail:

$ mono life2.exe 
500 generations in 67 milliseconds, 7434 gen/sec.
RESULT: 0

I guess parrot can see about the same speed difference if you use an
array instead of the strings. _That_ is a design issue with perl, for
example, where if you just want an array of ints, you have to pay for
the whole scalar thing for each element, you can't escape it.
I don't consider it a design issue if the _developer_ writes slow code
(though the language should make it easy to write the fast code).

lupus




Re: Minor patch to make the GC count total bytes copied [APPLIED]

2002-04-12 Thread Paolo Molaro

On 04/11/02 Dan Sugalski wrote:
 I'm not sure which is worse--the amount of data we're copying around, 
 or the fact that we eat Mono's lunch while we do so.

:-)
Could you post the code for the sample? Is it based on the snippet Simon
posted a while ago where it used the pattern:

string buffer = "";
...
for (...) {
buffer += "*";
}

A string in the CLR is completely different from a scalar in the perl
world. A string is immutable, so when you use the += operator on it,
it really creates a new object each time, so, if the test uses that
pattern it's comparing apples and oranges. At the very least, you should
use a StringBuilder (or, if you want to really compare the language/VM
implementation differences, you'd use a simple char[] array in the C#
case).

lupus




Re: JIT compilation

2001-11-18 Thread Paolo Molaro

On 11/17/01 Dan Sugalski wrote:
 BTW: we just got our compiler running on linux and compiling a simple
 program, hopefully by next week it can be used for some more real code
 generation.
 
 Yahoo! Congrats. Are we still slower than you are? :)

I've been in features-and-correctness mode for a couple of months, so I guess
the current mono interpreter is slower than parrot: we have a design to
make it twice as fast as it is now, but it would be a waste of time
since with the JIT we are at least 30-40 times faster anyway :-)

lupus




Re: JIT compilation

2001-11-16 Thread Paolo Molaro

On 11/08/01 Benoit Cerrina wrote:
 I heard that, I was thinking that it would be great to run ruby on mono but
 ruby is very dynamic (like perl but since its so much easier to use and
 program
 it is also easier to redefine the methods and done more often)

There is an effort to compile ruby to the CLR, I don't know more,
because I can't read japanese :-) And there is someone working on python
support in the mono compiler, too.
BTW: we just got our compiler running on linux and compiling a simple
program, hopefully by next week it can be used for some more real code
generation.

lupus




Re: JIT compilation

2001-11-16 Thread Paolo Molaro

On 11/16/01 Simon Cozens wrote:
 On Fri, Nov 16, 2001 at 03:32:06PM +0100, Paolo Molaro wrote:
  And there is someone working on python
  support in the mono compiler, too.
 
 I know, I've just seen. Wouldn't it be really wonderful, Paolo, if
 someone wrote some Perl bindings for it as well? :)

It would be wonderful :-)
It would be even better if the ActiveState people could share
either the code for the work they already did or their wisdom so that
whoever undertakes the task knows what development paths to avoid.
I know they found it quite hard to implement perl in CLR, I guess both
in speed and featurewise: their input would be very appreciated and I
guess they are listening on this list :-)
If there is already more info in the package they provide on the web
site, I'll go read that provided I can download it in some useful format
(hint, hint).

Anyway, implementing perl is hard in any language as also parrot
will show soon, when the real guts of perl will need to be implemented.
I wonder if the main problem they had was mapping a perl scalar to a
System.String? I'd also like to know what system they used for dynamic
invocation of methods...

lupus / back to System.Reflection.Emit ...




Re: Revamping the build system

2001-10-23 Thread Paolo Molaro

On 10/23/01 Simon Cozens wrote:
 On Thu, Oct 11, 2001 at 03:24:31PM -0400, Dan Sugalski wrote:
  1) Build minimal perl 6 with default parameters using platform build tool
 
 But platform build tool is going to be 'make' - the alternative is
 that we maintain and ship every flavour of batch or shell script we can
 think of. I don't think much of that.
 
 And if we have to use make, then we're back with the very problems of portably
 calling compilers and so on that this supposed new build system was meant to
 avoid.

I'm going to bite and say the words (and get the flames).

autoconf automake libtool

Yes, they are not perfect, but they work for most of the other projects
and I see no real reason they wouldn't work well enough for parrot.
If anyone wants to develop a different build system, please think
a little and ask yourself if the new system is going to be really better
than auto*/libtool (and done on time).

lupus




Re: Revamping the build system

2001-10-23 Thread Paolo Molaro

On 10/23/01 Simon Cozens wrote:
 On Tue, Oct 23, 2001 at 12:16:04PM +0200, Paolo Molaro wrote:
  autoconf automake libtool
 
 MVS, MacOS, cross-compilation.

cross-compilation is not an issue at all with auto* (I'd say it makes it
almost easy to support it).

MacOS: I guess any type of Makefile based build system would not work on
MacOS. They can use a separate build setup with project files for the
metrowerks compiler or whatever the compiler is going to be there.

MVS: people using it are already self-inflicting so much pain that they 
wouldn't notice a little more :-), so they can use a separate build
system if auto* doesn't work on it. Anyway, I'm not sure auto* doesn't
work on mvs.

lupus




Re: PMCs and how the opcode functions will work

2001-10-10 Thread Paolo Molaro

On 10/09/01 Dan Sugalski wrote:
 For sanity's sake, I don't suppose you'd consider
 
 typedef void* (*vtable_func_t)();
 
 to make it
 
 vtable_func_t vtable_funcs[VTABLE_SIZE];
 
 I'd be thrilled. Abstract types are A Good Thing. In fact, I'll go make it 
 so right now. :)

... and to go a step further in sanity and maintainability, I'd suggest
using a structure with properly typed function pointers instead of an
array:

typedef void (*parrot_pmc_add) (PMC *dest, PMC *a, PMC *b);
typedef void (*parrot_pmc_dispose) (PMC *cookie);
...

typedef struct {
    parrot_pmc_add add;
    parrot_pmc_dispose dispose;
    ...
} ParrotVtable;

So the actual code to invoke the method is type checked by the compiler
(and easier to read):

pmc->vtable->add (pmc, pmc_a, pmc_b);

instead of the casts you'd need with the array (or the macro hell to
hide it).

lupus




Re: [PATCH] Big patch to have DO_OP as optional switch() statment

2001-10-09 Thread Paolo Molaro

On 10/07/01 Bryan C. Warnock wrote:
 while (*pc) {
 switch (*pc) {
 }
 }

With the early mono interpreter I observed a 10% slowdown when I
checked that the instruction pointer was within the method's code:
that is not a check you need on every opcode dispatch, but only with
branches, so it makes sense to have a simple while (1) loop.

About the goto label feature discussed elsewhere, if the dispatch
loop is emitted at compile time, there is no compatibility problem
with non-gcc compilers, since we know what compiler we are going to use.
I got speedups in the 10-20% range with dispatch-intensive benchmarks in
mono. It can also be coded in a way similar to the switch code
with a couple of defines, if needed, so that the same code compiles
on both gcc and strict ANSI compilers.

 I don't see (on simple inspection) a default case, which implies that all 
 functions would be in the switch.  There's two problems with that.  First, 
 you can't then swap out (or add) opcode functions, which compilation units 
 need to do.  They're all fixed, unless you handle opcode differentiation 
 within each case.  (And there some other problems associated with that, too, 
 but they're secondary.)  Second, the switch will undoubtedly grow too large 
 to be efficient.  A full switch with as few as 512 branches already shows 
 signs of performance degradation. [1]  Thirdly, any function calls that you 
 then *do* have to make, come at the expense of the switching branch, on top 
 of normal function call overhead.
 
 I've found [2] that the fastest solution (on the platforms I've tested) are 
 within the family:
 
 while (*pc) {
 if (*pc < CORE_OPCODE_NUMBER) {
 pc = func_table[*pc]();
 } else {
 switch (*pc) {
 }
 }
 
 That keeps the switch branching small.  Do this:
 
 while (*pc) {
 switch (*pc) {
 case : ...
 default: pc = func_table[*pc]();
 }
 }
 
 seems simpler, but introduces some potential page (or at least i-cache(?)) 
 thrashing, as you've got to do a significant jump just in order to jump 
 again.  The opcode comparison, followed by a small jump, behaves much nicer.

... but adds a comparison even for opcodes that don't need it.
As with the check for the program counter, it's a check that
not all the opcodes need and as such should be left out of the
fast path. This means that the first 200 and something opcodes
are the most common ones _and_ the ones that need to be fast:
there is no point in avoiding a jump for calls to exit(),
read() etc (assuming those need opcodes at all).

The problem here is to make sure we really need the opcode swap
functionality, it's really something that is going to kill
dispatch performance.
If a module wants to change the meaning of, eg the + operator,
it can simply request the compiler to insert a call to a
subroutine, instead of changing the meaning assigned to the
VM opcode. The compiler is free to inline the sub, of course,
just don't cripple the normal case with unnecessary overhead
and let the special case pay the price of flexibility.
Of course, if the special case is not so special, a _new_
opcode can be introduced, but there is really no reason to
change the meaning of an opcode on the fly, IMHO.
Comment, or flame, away.

lupus




Re: [PATCH] Big patch to have DO_OP as optional switch() statment

2001-10-09 Thread Paolo Molaro

On 10/09/01 Benjamin Stuhl wrote:
 Unfortunately, compiler tricks only work at compile time.
 They're great for static languages like C++ or C#, but Perl
 supports doing
 
 %CORE::GLOBAL::{'print'} = \&myprint;
 
 at _runtime_. This is much to late to be going back and
 patching up any occurences of print_p in the opstream, so
 we need a level of indirection on every overridable opcode.
 (Note that the _overloadable_ ones like the math routines
 don't need that level of indirection - they get it by
 vectoring through the PMC vtables).

And the same is supposed to happen when you override print,
since print is just a method in the Stream 'class' or
whatever it's going to be called.
If there is a 'print' opcode the implementation should do
something like:

case PRINT_OP:
interp->current_stdout->vtable->print (...);

and that will do the right thing anyway.

lupus




Re: [PATCH] Big patch to have DO_OP as optional switch() statment

2001-10-09 Thread Paolo Molaro

On 10/09/01 Bryan C. Warnock wrote:
  About the goto label feature discussed elsewhere, if the dispatch
  loop is emitted at compile time, there is no compatibility problem
  with non-gcc compilers, since we know what compiler we are going to use.
  I got speedups in the 10-20% range with dispatch-intensive benchmarks in
  mono. It can also be coded in a way similar to the switch code
  with a couple of defines, if needed, so that the same code compiles
  on both gcc and strict ANSI compilers.
 
 I also tested (previously; I need to hit it again) replacing the loop, 
 switch, and breaks with a lot of gotos and labels.
 
 LOOP:
 /* function look up, if need be */
 switch (*pc) {
 case (1) : { /* yada yada yada */; goto LOOP }
 ...
 }
 
 It improved the speed of non-optimized code, because you didn't jump to the 
 end of the switch simply to jump back to the loop conditional.  But I didn't 
 see any additional improvements with optimized code, because the optimizers 
 take care of that for you.  (Well, really, they put another copy of the 
 while condition at the bottom.)  

Yes, that was basically the same mistake I did at first when testing
the goto label stuff, little or no improvement. But the 'right' way to do
it is to use the goto label construct not only instead of switch(), but
also instead of 'break;'. Oh, and the address array needs to be
declared const and static, too.

const static void *goto_map[] = {&&OP1_LABEL, &&OP2_LABEL, ...};

...
while (1) {
    goto *goto_map [*ip];   // switch (*ip)
OP1_LABEL:
    do_stuff ();
    update_ip ();
    goto *goto_map [*ip];   // break;
OP2_LABEL:
    do_other_stuff ();
    update_ip ();
    goto *goto_map [*ip];   // break;
    ...
}

  The problem here is to make sure we really need the opcode swap
  functionality, it's really something that is going to kill
  dispatch performance.
  If a module wants to change the meaning of, eg the + operator,
  it can simply request the compiler to insert a call to a
  subroutine, instead of changing the meaning assigned to the
  VM opcode. The compiler is free to inline the sub, of course,
  just don't cripple the normal case with unnecessary overhead
  and let the special case pay the price of flexibility.
  Of course, if the special case is not so special, a _new_
  opcode can be introduced, but there is really no reason to
  change the meaning of an opcode on the fly, IMHO.
  Comment, or flame, away.
 
 But how are you going to introduce the new opcode?  Recompile Perl? 

Nope, this is done at the design and early beta stage: we decide which
opcodes make sense and which don't, and we may add specialized opcodes if
it makes sense to do so. Once the opcode set is defined, it can't be
changed until the next release.

 Unacceptable.  We understand that from a classic language perspective, we're 
 slow and lumbering.  We're Perl.  We need that flexibility.  We're trying to 
 make that flexibility as fast as possible.

Completely agree. My point is that I don't see a real reason to be
flexible at the opcode level (changing the meaning of the opcodes)
when you can have the same outcome with a cleaner design (set a
different method in a class' vtable).
Many things were done as opcodes in perl5 because calling subroutines
is slow there, IIRC. Many of those features can simply be methods
in some class in perl6: it's cleaner, it's flexible, and it doesn't
require changing the meaning of the opcodes.

 I've got three different opcode loops so far for Parrot.  (Linux(x86)/gcc, 
 Solaris(SPARC)/Forte, and Solaris(SPARC)/gcc).  I've tried most every 
 combination I can think of (am still working off the last couple, as a 
 matter of fact).  (Particularly ever since I received the inexplicable 
 slowdown adding a default case.)  Take nothing for granted, and try it all.  
 I've posted some of my more ridiculous failures, and have posted what I have 
 found to be my best numbers.  Anyone is free to come up with a better, 
 faster solution that meets the requirements.

Thanks for your efforts, hard numbers are always better than talk! :-)
I was just trying to offer the experience I gathered working on similar
issues, hoping it can be useful. And yes, I was suggesting changing
the requirements a bit (not of the language, but of the current VM design)
rather than proposing an implementation.

lupus

-- 
-
[EMAIL PROTECTED] debian/rules
[EMAIL PROTECTED] Monkeys do it better



Re: Strings db

2001-09-24 Thread Paolo Molaro

On 09/24/01 Michael Maraist wrote:
  GNU does offer the gettext tools library for just such a purpose. I don't
  know how it will translate to the various platforms however, and it likely
  is a major overkill for what we are trying to do.
  http://www.gnu.org/manual/gettext/html_mono/gettext.html#SEC2 - Purpose
  It might make sense to implement something like this now, rather than create
  our own system and find out it is insufficient down the road.
  Just a thought,
  Grant M.
 
 But wouldn't that make parrot GPL'd?

IIRC, the latest release is LGPL and GLib 2.0 will probably contain a
(again LGPL) gettext implementation to be used on systems that don't
provide their own.

lupus




Re: An overview of the Parrot interpreter

2001-09-10 Thread Paolo Molaro

On 09/07/01 Dan Sugalski wrote:
 The only optimizations that interpreter had, were computed goto and
 allocating the eval stack with alloca() instead of malloc().
 
 Doesn't this really get in the way of threading? You'll chew up great gobs 
 of system stack, and that's a scarce resource when running with threads.

The number of entries in the eval stack is bounded; I've never seen it
grow past 13.

 I think they worked also on outputting IL bytecode...
 
 Yep, but that version was slower. Dick Hardt snagged me at TPC this year (I 
 think I spent most of my non-speaking time being pinned to a table or bench 
 by folks with Things To Say... :) and made a .NET pitch. They were going 
 with the translation to .net but switched to embedding perl. It was a lot 
 faster. (Brad Kuhn had a paper on translating perl to java bytecode and he 
 had lots of trouble, but for different reasons)

My point is that, using their experience, a few changes could be designed to
better support dynamic languages in the CLR, in the same way that current
research is focused on extending it for functional languages, generics,
etc.

lupus




Re: An overview of the Parrot interpreter

2001-09-06 Thread Paolo Molaro

On 09/06/01 Dan Sugalski wrote:
 The original mono interpreter (that didn't implement all the semantics
 required by IL code that slow down interpretation) ran about 4 times
 faster than perl/python on benchmarks dominated by branches, function 
 calls,
 integer ops or fp ops.
 
 Right, but mono's not an interpreter, unless I'm misunderstanding. It's a 
 version of .NET, so it compiles its code before executing. And the IL it 
 compiles is darned close to x86 assembly, so the conversion's close to 
 trivial.

Nope, if we had written a runtime, library, compiler and JIT engine in two 
months we'd be all on vacation now ;-)
The figures are actually for a stack-based interpreter that executes IL opcodes,
no assembly whatsoever. And, no, IL is not close to x86 assembly:-)
I don't expect a new perl to run that fast, but there is a lot of room for
improvement.

 Java is way faster than perl currently in many tasks:
 
 Only when JITed. In which case you're comparing apples to oranges. A better 
 comparison is against Java without JIT. (Yes, I know, Java *has* a JIT, but 
 for reasonable numbers at a technical level (and yes, I also realize that 
 generally speaking most folks don't care about that--they just want to know 
 which runs faster) you need to compare like things)

It's not so much that java *has* a JIT, but that it *can* have one. My point is:
take it into consideration when designing parrot. There's no need to
code it right from the start (that would be wrong), but allow for it in the design.

 it will be difficult
 to beat it starting from a dynamic langauge like perl, we'll all pay
 the price to have a useful language like perl.
 
 Unfortunately (and you made reference to this in an older mail I haven't 
 answered yet) dynamic languages don't lend themselves to on-the-fly 
 compilation quite the way that static languages do. Heck, they don't tend 
 to lend themselves to compilation (certainly not optimization, and don't 
 get me started) period as much as static languages. That's OK, it just 
 means our technical challenges are similar to but not the same as for 
 Java/C/C++/C#/Whatever.

Yep, but for many things there is an overlap. As for the dynamic language
issue, I'd like the ActiveState people that worked on perl - .net
integration to share their knowledge on the issues involved.

 The speed of the above loop depends a lot on the actual implementation
 (the need to do a function call in the current parrot code whould blow
 away any advantage gained skipping stack updates, for example).
 
 A stack interpreter would still have the function calls. If we were 
 compiling to machine code, we'd skip the function calls for both.

Nope, take the hint: inline the code in a big switch and voila', no
function call ;-)

 numbers. (This is sounding familiar--at TPC Miguel tried to convince me 
 that .Net was the best back-end architecture to generate bytecode for) 

I know the ActiveState people did work on this area. Too bad their stuff
is not accessible on Linux (some msi file format stuff).
I don't know if .net is the best back-end arch; it's certainly going
to be a common runtime to target, since it's going to be fast and
support GC, reflection etc. With time and input from the dynamic language
people it may become a compelling platform to run perl/python on.

lupus




Re: An overview of the Parrot interpreter

2001-09-06 Thread Paolo Molaro

On 09/06/01 Dan Sugalski wrote:
 Okay, I just did a test run, converting my sample program from interpreted 
 to compiled. (Hand-conversion, unfortunately, to C that went through GCC)
 
 Went from 2.72M ops/sec to the equivalent of 22.5M ops/sec. And with -O3 on 
 it went to 120M ops/sec. The smaller number is more appropriate, since a 
 JIT/TIL version of the code won't do the sort of aggressive optimization 
 that GCC can do.
 
 I'm not sure if I like those numbers (because they show we can speed things 
 up with a translation to native code) or dislike them (because they show 
 how much time the interpreter's burning). Still, they are numbers.

A 10x slowdown on that kind of code is normal for an interpreter
(where 10x can range from 5x to 20x, depending on the semantics).
But I think this is not a big issue: speed optimizations need
to be possible, there's no need to implement them right now.

lupus




Re: An overview of the Parrot interpreter

2001-09-06 Thread Paolo Molaro

On 09/06/01 Dan Sugalski wrote:
 Then I'm impressed. I expect you've done some things that I haven't yet. 

The only optimizations that interpreter had, were computed goto and
allocating the eval stack with alloca() instead of malloc().
Of course, now it's slower, because I implemented the full semantics required
by IL code (the biggest slowdown came from having to consider
arguments and local vars of any arbitrary size; making the alu opcodes
work for any data type slowed it down by only 10%): but the parrot
interpreter doesn't need to deal with the kind of stuff that slows
down interpretation big time. Still, it's about 2x faster than perl5
on the same benchmarks, though I haven't tried to optimize the new code yet.

 Also, while I have numbers for Parrot, I do *not* have comparable numbers 
 for Perl 5, since there isn't any equivalence there. By next week we'll 
 have a basic interpreter you can build so we can see how it stacks up 
 against Mono.

See above, I expect it to be faster, at least handling the low-level stuff
since I hope you're not going to add int8, uint8 etc, type handling.

 Yep, but for many things there is an overlap. As for the dynamic language
 issue, I'd like the ActiveState people that worked on perl - .net
 integration to share their knowledge on the issues involved.
 
 That one was easy. They embedded a perl interpreter into the .NET execution 
 engine as foreign code and passed any perl to be executed straight to the 
 perl interpreter.

I think they worked also on outputting IL bytecode...

 I know the ActivState people did work on this area. Too bad their stuff
 is not accessible on Linux (some msi file format stuff).
 I don't know if .net is the best back-end arch, it's certanly going
 to be a common runtime to target since it's going to be fast and
 support GC, reflection etc. With time and the input from the dynamic 
 language
 people may become a compelling platform to run perl/python on.
 
 Doubt it. That'd require Microsoft's involvement, and I don't see that 
 happening. Heck, most of what makes dynamic languages really useful is 

The ECMA people are not (only) m$; there are people on the committee
interested in both other implementations and input on the specs.

 completely counter to what .NET (and java, for that matter) wants to do. 
 Including runtime compilation from source and runtime redefinition of 
 functions. (Not to mention things like per-object runtime changeable 
 multimethod dispatch, which just makes my head hurt)

Naah, with the reflection support you can create types and methods on the fly,
the rest can probably be done with a couple of ad-hoc opcodes.

lupus




Re: An overview of the Parrot interpreter

2001-09-05 Thread Paolo Molaro

On 09/04/01 Dan Sugalski wrote:
 Regardless, it's the way we're going to go for now. If it turns out to be a 
 performance dog then we'll go with a stack-based system. Initial 
 indications look pretty good, though.

Care to share some numbers/code for that?

 You're right that optimization research on stack based machines
 is more limited than that on register machines, but you'll also note
 that there are basically no portable interpreters/runtimes based
 on register machines:-)
 
 Yes, but is this because:
 
 A) Register machines are really bad when done in software
 B) People *think* register machines are really bad when done in software
 C) People think stack machines are just cleaner
 D) Most people writing interpreters are working with a glorified CS 
 freshman project with bells and whistles hung on it.
 
 From looking at the interpreters we have handy (and I've looked at a 
 bunch), my bet is D.

Well, if it may help, I'm not a CS student :-)
I think a register machine doesn't give any benefit over a stack machine
and I'm looking for evidence that could prove me wrong.

  More on this point later in the mail.
 There's a reason for that: register virtual machines are more complex
 and more difficult to optimize.
 
 You'd be surprised there. The core of a register VM and a stack VM are 
 equivalent in complexity. A register VM makes reordering ops easier, tends 
 to reduce the absolute number of memory accesses (there's a *lot* of stack 
 thrash that goes on with a stack machine), and doesn't seem to have much 
 negative impact on things.

When done in software, the main difference is that stack-based machines have a linear
stack and simple logic, while with register-based machines you need to care for
register windows. As for memory accesses, stack-based machines always
access the top items of the stack (which fit in 1-2 cache lines), while

add_i N1, N15, N31

may touch a lot more memory (especially if the int registers are int64).
Also, since the indices for the add_i op can be any integer, the compiler
can't optimize much, while in a stack-based machine they are hardcoded
(0 and 1, for example).
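To make the contrast concrete, here is a tiny C sketch of the two addressing styles (illustrative only, not Parrot code): the stack add works on fixed offsets from the top, while the register add gets its three operand indices from the bytecode.

```c
/* Two operand-addressing styles for an interpreted add. */

#define NREGS 32

/* Stack VM: add always touches the two topmost slots.  The offsets
   are compile-time constants, so they stay in 1-2 cache lines and
   the C compiler can fold them. */
static int stack_add(int *sp)           /* sp points just past the top */
{
    sp[-2] = sp[-2] + sp[-1];           /* hardcoded offsets, as above */
    return sp[-2];
}

/* Register VM: the indices come from the bytecode, so the compiler
   cannot constant-fold them, and e.g. add_i N1, N15, N31 may touch
   slots far apart in the register file. */
static int reg_add(int *regs, int dst, int a, int b)
{
    regs[dst] = regs[a] + regs[b];
    return regs[dst];
}
```

Neither function is faster in isolation; the argument in the text is about cache footprint and what the C compiler can do with constant versus bytecode-supplied indices.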

 What, you mean the x86? The whole world doesn't suck. Alpha, Sparc, MIPS, 
 PA-RISC, and the PPC all have a reasonable number of registers, and by all 
 accounts the IA64 does as well. Besides, you can think of a register 

I would expect that from a CS guy! ;-) My point is: optimize for the common
usage patterns/environments and that, as much as we may not like it, is
32 bit x86. Parrot may run super-fast on alpha, but if people have x86 machines
and servers that won't be any good for them. The engine needs to run well
on all the archs, but don't optimize for the special cases.

 machine as a stack machine where you can look back in the stack directly 
 when need be, and you don't need to mess with the stack pointer nearly as 
 often.

In a sw stack machine you can look back at any stack position just fine,
you just don't have to care about register windows:-)

  Literature is mostly concerned
 about getting code for real register machines out of trees and DAGs and
 the optimizations are mostly of two types:
 1) general optimizations that are independed on the actual cpu
 2) optimizations specific to the cpu
 
 [1] can be done in parrot even if the underlying virtual machine is
 register or stack based, it doesn't matter.
 
 Actually it does. Going to a register machine's generally more 
 straightforward than going to a stack based one. Yes, there are register 

What is easier than a post-order walk of a tree? A related point
is that if parrot is meant to run code from different high-level
languages, it also needs to be simple for those languages to
generate parrot bytecode.

 usage issues, but they're less of an issue than with a pure stack machine, 
 because you have less stack snooping that needs doing, and reordering 
 operations tends to be simpler. You also tend to fetch data from variables 
 into work space less often, since you essentially have more than one temp 
 slot handy.

I'm not proposing a pure stack machine; besides you can have local variables
in a function and access to them is as fast as access to the registers in a 
register machine.

 Why on earth are you assuming we're going to toss the optree, DAGs, or even 
 the source? That's all going to be kept in the bytecode files.
 
 Also, even if that stuff is tossed, parrot bytecode makes a reasonable MIR. 
 And some recent research indicates that even without all the intermediate 
 stuff saved you can buy a win. (There are currently executable translators 
 that go from one CPUs machine language to another directly. The resulting 
 executables run as fast or faster in most cases. I've even heard reports of 
 one of the translators taking 386 executables and translating it out and 
 back into 386 executables that ran faster. I have no first-hand proof of 
 that, though)

That's all well and good, but most of the research 

Re: An overview of the Parrot interpreter

2001-09-05 Thread Paolo Molaro

On 09/04/01 Uri Guttman wrote:
 does it really matter about comprehension? this is not going to be used
 by the unwashed masses. a stack machine is easier to describe (hence all
 the freshman CS projects :), but as dan has said, there isn't much
 mental difference if you have done any serious assembler coding. is the
 pdp-8 (or the teradyne 18 bit extended rip off) with 1 register (the
 accumulator) a register or stack based machine?

It's easier to generate code for a stack machine and parrot is supposed to
be a target for several languages: make it hard and only perl6 will target parrot.
That said, I haven't seen any evidence a register based machine is going to
be (significantly?) faster than a stack based one.
I'm genuinely interested in finding data about that.

 but it doesn't matter what the underlying hardware machine is. that is
 the realm of the c compiler. there is no need to think about the map of
 parrot to any real hardware. they will all have their benefits and

I brought the topic up because it was given as a benefit of the parrot
register architecture. An issue is that, if parrot is going to run
also low-level opcodes, it better know about the underlying arch
(well, it would be faster than the current perl anyway, but
that's for another reason).

 that more than one temp slot is a big win IMO. with stack based you
 typically have to push/pop all the time to get anything done. here we
 have 32 PMC registers and you can grab a bunch and save them and then
 use them directly. makes coding the internal functions much cleaner. if
 you have ever programmed on a register cpu vs. a stack one, you will
 understand. having clean internal code is a major win for register
 based. we all know how critical it is to have easy to grok internals. :)

The only difference in the execution engine is that you need to update
the stack pointer. The problem is when you need to generate code
for the virtual machine.

   DS No. Absolutely not. The primary tenet is Keep things fast. In my
   DS experience simple things have no speed benefit, and often have a
   DS speed deficit over more complex things. The only time it's a
   DS problem is when the people doing the actual work on the system
   DS can't keep the relevant bits in their heads. Then you lose, but
   DS not because of complexity per se, but rather because of programmer
   DS inefficiency. We aren't at that point, and there's no reason we
   DS need to be. (Especially because register machines tend to be
   DS simpler to work with than stack machines, and the bits that are
   DS too complex for mere mortals get dealt with by code rather than
   DS people anyway)
 
 amen. i fully agree, register machines are much simpler to code with. i
 don't know why paolo thinks stack based is harder to code to.

stack machine -> post-order walk of the tree

reg machine -> instruction selection -> register allocation -> ...

 on some machines like the alpha, getting single bytes is slower than
 fetching 32 bits. 

Too bad less than 1% of the people who will use parrot have
an alpha or a Cray. Optimize for the common case. Besides, I think
in the later alphas they added some opcodes to deal with the problem.

lupus




Re: An overview of the Parrot interpreter

2001-09-04 Thread Paolo Molaro

On 09/02/01 Simon Cozens wrote:
 =head1 The Software CPU
 
 Like all interpreter systems of its kind, the Parrot interpreter is
 a virtual machine; this is another way of saying that it is a software
 CPU. However, unlike other VMs, the Parrot interpreter is designed to
 more closely mirror hardware CPUs.
 
 For instance, the Parrot VM will have a register architecture, rather
 than a stack architecture. It will also have extremely low-level
 operations, more similar to Java's than the medium-level ops of Perl and
 Python and the like.
 
 The reasoning for this decision is primarily that by resembling the
 underlying hardware to some extent, it's possible to compile down Parrot
 bytecode to efficient native machine language. It also allows us to make
 use of the literature available on optimizing compilation for hardware
 CPUs, rather than the relatively slight volume of information on
 optimizing for macro-op based stack machines.

I'm not convinced the register machine is the way to go.
You're right that optimization research on stack based machines
is more limited than that on register machines, but you'll also note
that there are basically no portable interpreters/runtimes based
on register machines:-) More on this point later in the mail.
There's a reason for that: register virtual machines are more complex
and more difficult to optimize.

You say that, since a virtual register machine is closer to the actual hw
that will run the program, it's easier to produce the corresponding
machine code and execute that instead. The problem is that that's true
only on architectures where the virtual machine closely matches the
cpu, and I don't see that happening with the current register-starved
main archs.

The point about the literature is flawed, IMHO. Literature is mostly concerned
with getting code for real register machines out of trees and DAGs, and
the optimizations are mostly of two types:
1) general optimizations that are independent of the actual cpu
2) optimizations specific to the cpu

[1] can be done in parrot even if the underlying virtual machine is
register or stack based, it doesn't matter.
[2] will optimize for the virtual machine and not for the underlying arch,
so you get optimized bytecode for a virtual arch. At this point, though, when
you need to actually execute the code, you won't be able to optimize further for
the actual cpu, because most of the useful info (op trees and DAGs) is gone,
and there is far less info in the literature about emulating CPUs than about
generating machine code from op trees.

If you have bytecode for a stack machine, instead, you can easily reconstruct
the tree, apply the optimizations and generate the efficient machine code
required to make an interpreter for low-level ops useable.

The flow for a register machine will look basically like this:

perl code
op tree
tree optimization
instr selection
reg allocation
byte code
sw cpu emulation on hw cpu
or
translation of machine code to real machine code
exec real machine code

Note that on the last steps there is little (shared) research.

The flow for a stack machine will look instead like this:

perl code
op tree
tree optimization
byte code
interpret byte code
or
instr selection
reg allocation
exec machine code

All the steps are well documented and understood in the research 
(and especially open source) community.

Another point I'd like to make is: keep things simple.
A stack machine is easy to run and write code for. A virtual
register machine is much more complicated: no matter how many
registers you have, you'll need register windows (i.e., you'll
use the registers as a stack).
A simple design is not necessarily slow: complex stuff adds
dcache and icache pressure, it's harder to debug and optimize
(but we know that from the perl5 experience, don't we?).

 Operations will be represented by several bytes of Parrot machine code;
 the first C<IV> will specify the operation number, and the remaining
 arguments will be operator-specific. Operations will usually be targeted
 at a specific data type and register type; so, for instance, the
 C<dec_i_c> takes two C<IV>s as arguments, and decrements contents of the
 integer register designated by the first C<IV> by the value in the
 second C<IV>. Naturally, operations which act on C<NV> registers will
 use C<NV>s for constants; however, since the first argument is almost
 always a register B<number> rather than actual data, even operations on
 string and PMC registers will take an C<IV> as the first argument. 

Please, reconsider also the size of the bytecode: a minimum of 8 bytes
for a simple operation will result in heavy cache thrashing:

Consider:

sub add ($a, $b) {
return $a+$b;
}

Assuming the args will be already in registers, this becomes something
like:

add_i R1, A1, A2 (16 bytes)
ret (4 bytes)


Re: PDD X: Perl API conventions

2001-03-09 Thread Paolo Molaro

On 03/03/01 Damien Neil wrote:
  All the function names shall begin with the C<perl_> prefix. The only exception
  is function names that may be used only in the perl core: these names shall
  begin with the C<_perl_> prefix. This will make it possible to export only
  the perl_* functions from the libperl library on the platforms that support that.
 
 ISO/ANSI C reserves identifiers beginning with a _.  I recommend using
 "perl_" and "perl__" if you want to distinguish internal-only functions
 from public ones.

My understanding is that symbols with a double underscore, or an underscore
followed by an uppercase letter, are reserved, while _something
symbols are ok if they are not exported, and that is actually
what we are doing. I prefer _perl because it is readily apparent
that the interface is private. Besides, ANSI C allows a compiler to
consider only the first 6 chars of external identifiers: we don't want to go
that way.

lupus




PDD X: Perl API conventions

2001-03-03 Thread Paolo Molaro

On 03/02/01 Dan Sugalski wrote:
 [gazing into crystal ball . . . ] I predict some header somewhere is going
 to already #define "INT".  Perhaps PERL_INT and PERL_NUM ?
 
 Good point. We should probably prefix all perl 6's data types and functions 
 like we do in perl 5. Perl_ for functionish things, PL_ for dataish things.

For a while I've wanted to spark some discussion in this area, and this
comment made me want to write a PDD on API conventions. The result
follows. Comments welcome.

=head1 TITLE

Perl API conventions

=head1 VERSION

1

=head2 CURRENT

 Maintainer: Paolo Molaro [EMAIL PROTECTED]
 Class: Internals
 PDD Number: Unassigned
 Version: 1
 Status: Developing
 Last Modified: 3 March 2001
 PDD Format: 1
 Language: English

=head2 HISTORY

=over 4

=item Version 1

First version

=back

=head1 ABSTRACT

This PDD describes perl API conventions on function, type and macro names
as well as conventions for the order of the parameters and other general
guidelines to provide a clean and consistent API.

=head1 DESCRIPTION

The perl5 core uses a number of different prefixes (or none at all) for
function, macro and type names. This leads to all sorts of problems,
from namespace pollution (think embedding perl) to poor consistency
(think learning the perl internals).

This PDD proposes a set of conventions perl6 implementors must follow
when contributing to the core functions, macros and types that are
exported in a header file for use in the core or in extensions.

A C implementation is assumed in the examples: implementations in OO
languages mostly will drop the prefix in the function names.

The core as well as the extensions will use the same API (no more C<embed.c>).

=head1 IMPLEMENTATION

=head2 Interpreter contexts

The API should remain the same when perl is compiled to support threads or not
(or other similar features that require an interpreter context): this means that
an interpreter context is always passed as an argument to the functions that may
require it. This can be avoided in some cases if the interpreter context is stored
in the objects (this scenario is assumed in the examples below where a PerlSV
knows it has been created in a specific interpreter: this probably only matters
for references, a plain PerlSV should not need a context).

=head2 Type names

Type names shall begin with the C<Perl> prefix (note there is no underscore).
Examples could be:

PerlIV integer;
PerlSV *scalar;
PerlAV *array;

=head2 Function names

All the function names shall begin with the C<perl_> prefix. The only exception
is function names that may be used only in the perl core: these names shall
begin with the C<_perl_> prefix. This will make it possible to export only
the perl_* functions from the libperl library on the platforms that support that.

When a function applies to a well defined internal type, the function name will
be composed of three parts:

=over 4

=item * the C<perl_> prefix

=item * the type tag

=item * the action

=back

The type tag is usually derived by lowercasing the type name and removing the C<Perl>
prefix. Examples could be:

PerlSV *perl_interp_sv_new (PerlInterp *interpreter);
void    perl_sv_set_nv (PerlSV *scalar, PerlNV value);
PerlSV *perl_av_fetch  (PerlAV *array, PerlIV index, PerlIV lval);

When a function applies to an object such as a PerlSV, the object shall be the first
argument of the function.

=head2 Macro names and enumerations

Macro names and enumeration values shall follow the conventions for function names
with the only difference of using uppercase instead of lowercase.

=head2 Global variable names

Global variables? What global variables?

=head2 Interfacing with extensions

Note: this will need to be expanded.

Extensions will install C functions that have the following signature:

int extension_func (PerlContext *context, PerlAV *args, PerlAV *result);

PerlContext will contain the necessary info about the context the function is run
in, such as a PerlInterp*, the info to support want() and so on.
The C<args> array will contain the arguments passed to the function.
Any result will be pushed onto the C<result> array. This means that there is no
need to learn a different API for pushing values on a stack as opposed to pushing
values into an array.
The meaning of the integer return value will depend on other aspects of the
internals (exception handling for example).
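A hypothetical extension following the proposed signature might look like this. The stub types and the helper names (perl_av_fetch_iv, perl_av_push_iv) are invented for the sketch; the PDD does not define them.

```c
/* Sketch of an extension function under the proposed convention.
   Toy stand-ins for types the real core would provide. */
typedef long PerlIV;
typedef struct { PerlIV items[8]; int count; } PerlAV;   /* toy AV */
typedef struct PerlContext PerlContext;                  /* opaque */

/* Hypothetical helpers -- not part of the PDD. */
static PerlIV perl_av_fetch_iv(PerlAV *av, int i)   { return av->items[i]; }
static void   perl_av_push_iv(PerlAV *av, PerlIV v) { av->items[av->count++] = v; }

/* A two-argument add: reads from `args`, pushes onto `result`. */
static int extension_add(PerlContext *context, PerlAV *args, PerlAV *result)
{
    PerlIV a, b;
    (void)context;                     /* context unused in this sketch */
    a = perl_av_fetch_iv(args, 0);
    b = perl_av_fetch_iv(args, 1);
    perl_av_push_iv(result, a + b);
    return 0;                          /* meaning left open by the PDD */
}
```

Note how the same array API serves both argument access and result passing, which is the point made above.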

=head1 REFERENCES

perlguts.pod

The conventions used in other successful free software projects like The Gimp,
GLib and Gtk+.




Re: Another approach to vtables

2001-02-10 Thread Paolo Molaro

On 02/07/01 Edwin Steiner wrote:
 [snip]
 
 I thought about it once more. Maybe I was confused by the *constant* NATIVE.
 Are you suggesting a kind of multiple dispatch (first operand selects
 the vtable, second operand selects the slot in the vtable)?
 
 So
 $dest = $first + $second
 becomes
 first->vtable->add[second->vtable->type(second)](dest,first,second,key);
 ?
 
 or maybe
 first->vtable->add[second->vtable->slot_select](dest,first,second,key);
 which saves a call by directly reading an integer from the vtable of second.
 
 (BTW, this is also how overloading with respect to the second argument
 could be handled (should it be decided on the language level to do that):
 There could be a slot like add[ARCANE_MAGIC] selected by
 second->vtable->slot_select
 which does all kinds of complicated checks and branches without any cost
 for the vfunctions in the other slots.)
 
 Such a multiple dispatch seems to me like the only solution which avoids
 the following (eg. in Python):
 'first + second' becomes
   1. call virtual function 'add' on first
   2. inside first-add do lots of checks about type of second

Something like what's done in python looks sensible to me.
If a vtable add function is also indexed by type, you get quadratic
growth of the vtables with the addition of other types, and we want to
make adding types easy in perl 6. Also, it doesn't work if I introduce
my bigint type (the internal int vtable knows nothing about it):

$int = 1;
$bigint = new bigint ("9" x 999);
$res = $bigint + $int; # works, bigint knows about internal int
$res = $int + $bigint; # doesn't work, since the bigint is the second arg

The proposed solution (used in elastiC, for example) is to have the add
method return a value indicating whether it has performed the addition:
if it's false, we try again using the add method of the second argument,
which may know better.
Inside the method, you check the types and perform the work only on the
ones you know about.
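The fallback protocol described above can be sketched in Python, which
uses essentially this scheme for its own operators (__add__ falling back
to __radd__). The class names Int and BigInt here are hypothetical
illustrations, not actual Parrot types:

```python
class Int:
    """A plain integer type whose add method knows only about Int."""
    def __init__(self, v):
        self.v = v

    def add(self, other):
        if isinstance(other, Int):
            return Int(self.v + other.v)
        return None  # signal: "I can't handle this operand type"


class BigInt:
    """A bignum type that knows about both itself and plain Int."""
    def __init__(self, v):
        self.v = int(v)

    def add(self, other):
        if isinstance(other, (Int, BigInt)):
            return BigInt(self.v + other.v)
        return None


def dispatch_add(a, b):
    # Try the first operand's add method; if it declines,
    # fall back to the second operand's method.
    res = a.add(b)
    if res is None:
        res = b.add(a)  # addition is commutative here
    if res is None:
        raise TypeError("no add method handles these operand types")
    return res


small = Int(1)
big = BigInt("9" * 5)          # 99999
print(dispatch_add(big, small).v)   # BigInt knows Int: 100000
print(dispatch_add(small, big).v)   # Int declines, BigInt rescues: 100000
```

The key point is that `$int + $bigint` works even though the int vtable
was written before bigint existed: the failure return gives the newer
type a chance to handle the operation.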

lupus




Re: RFC 326 (v1) Symbols, symbols everywhere

2000-10-02 Thread Paolo Molaro

On 09/27/00 Ken Fox wrote:
 Dave Storrs wrote:
  It isn't terribly clear to me either
 
 Well, he does give a couple references that would clear it up.
 X11 Atoms are well documented.
 
  saying is that you can qs() a method name, get a "thingie" out, store the
  thingie in a scalar, and then that scalar becomes a direct portal to the
  method...somewhat like a coderef, but without a required deref.
 
 Actually it's more trivial than that. When you "intern" a symbol, you're
 adding a string-to-integer mapping to the compiler's symbol table. Whenever
 the compiler sees the string, it replaces it with the corresponding
 integer. (The type stays "symbol" though; I'm sort of mixing implementation
 and semantics.) Think of it like a compile-time hash for barewords.

Not only that: every time the compiler sees another symbol with the
same string representation, it reuses the already created symbol, so
no additional memory is used.
A non-trivial program will probably use several packages (or binary
modules that pull in several packages, e.g. Gtk). Consider the DESTROY
method: currently the string is malloc()ed in the symbol table of
every package, which costs 8 bytes for the string plus the malloc
overhead (at least 4 bytes, probably 8 on 32-bit systems). This
doesn't count the other memory that could be saved by using hash
tables optimized for symbols (i.e. integers instead of strings).
Repeat that for all the duplicated method names in a class hierarchy
and you easily save several KB of memory.
As a bonus you get faster performance, since an integer compare is
faster than a strcmp.

As for the possible uses in the language, I should have used a
better example. Consider an XML/SGML file with all those ugly tags
and attributes we love (well, no!). An XML parser loads the file and
stores the tag and attribute names as strings: since many tags appear
over and over in an XML file, this leads to a huge memory consumption
problem. If the parser could use symbols instead, the memory for a
tag name would be allocated only once (which is also faster, because
malloc() is called less often). Walking the tree in your Perl program,
you could then use integer comparison instead of string comparison.
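The interning idea itself is tiny: a table mapping each distinct string
to a small integer, handed out once per string. A minimal sketch in
Python (the function name `intern_symbol` and the dict-based table are
illustrative assumptions, not a proposed Perl API):

```python
# Symbol table: maps each distinct string to a small integer id.
_symbols = {}

def intern_symbol(name):
    """Return the unique integer id for this string.

    The first call for a given string allocates a new id; every
    later call for an equal string returns the same id, so the
    string's storage exists only once.
    """
    return _symbols.setdefault(name, len(_symbols))

# Every occurrence of the same tag name shares one id, so
# comparing two symbols is an integer compare, not a strcmp.
a = intern_symbol("htmltag")
b = intern_symbol("htmltag")
c = intern_symbol("buffy")
assert a == b
assert a != c
```

Real implementations hang the interned string off the id (as X11 Atoms
do) so the name can be recovered, but the comparison fast path is
exactly this: one integer equality test per lookup.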

use Benchmark;

$num1 = 10;
$num2 = 20;
$string1 = 'htmltag';
$string2 = 'htmltag';
$string3 = 'buffy';

timethese(1000, {
'number'  => '$num1 == $num2',
'stringe' => '$string1 eq $string2', # worst case
'string'  => '$string1 eq $string3', # best case: length differs
});

Gives:
Benchmark: timing 1000 iterations of number, string, stringe...
    number:  4 wallclock secs ( 3.87 usr +  0.00 sys =  3.87 CPU)
    string:  6 wallclock secs ( 4.28 usr +  0.01 sys =  4.29 CPU)
   stringe:  7 wallclock secs ( 5.97 usr +  0.00 sys =  5.97 CPU)

In the internals, using C, the performance gains are far larger than
the roughly 30% average seen here.

So there are advantages both for internal use and as a language
feature, and the implementation is easy. If no one shows a significant
drawback, it's a deal :-)

The only real problem I see is choosing the single character for
using symbols in the language. I suggested ^ or :, but * may work
as well if typeglobs go away.

Thanks,
    lupus

-- 
Paolo Molaro, Open Source Developer, Linuxcare, Inc.
+39.049.8043411 tel, +39.049.8043412 fax
[EMAIL PROTECTED], http://www.linuxcare.com/
Linuxcare. Support for the revolution.