More thoughts on DOD

2003-01-06 Thread Leopold Toetsch
The attached test program shows some additional effects of PMC size and
timing. A PMC is 32 bytes, an SPMC is 16 bytes, matching the current and
minimal PMC sizes for i386 (or a typical 32-bit system).

1) For linear access, half-sized PMCs give double the mark speed. This is
in good agreement with stress.pasm.

2) As we have only one free_list per pool, memory usage gets more and
more random when, e.g., parts of arrays are rewritten with new values
that come from the free_list. The worst-case behaviour will be totally
random access to our pools' data.

For PMCs, the worst-case random access takes about double the time of
a linear access. SPMCs take almost 3 times as long, but are still faster
than PMCs. But the advantage of smaller PMCs is almost gone.

3) Conclusion
To keep PMCs more tightly together when reusing them, these numbers
seem to indicate that we might need a free_list per pool->arena,
which would need a pool->arena pointer in the PMC (a sketch follows
point 5 below).

4) With a pool->arena pointer in the PMC, we could also try to use a
separate flags field, which could be sized at 1 byte per PMC.

5) On my system pools of different sizes are in totally different
ranges of memory (sbrk vs. mmap). This makes the quick bit mask test
in trace_mem_block totally unusable.
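
As a sketch for points 3) and 4): this is roughly what a header with a
pool->arena back-pointer, a free_list per arena and a separate one-byte
flag area could look like. All names and the layout here are hypothetical,
not current Parrot structures.

/* hypothetical layout -- not current Parrot code */
typedef struct pmc_hdr {
    struct arena   *arena;      /* back-pointer to the owning pool->arena */
    struct pmc_hdr *next_free;  /* links this header on its arena's free_list */
    /* ... payload ... */
} PMC_HDR;

typedef struct arena {
    struct pmc_hdr *free_list;  /* a free_list per pool->arena, not per pool */
    unsigned char  *flags;      /* separate flags, 1 byte per PMC in this arena */
    struct pmc_hdr *objects;    /* the headers themselves, allocated together */
} ARENA;

/* returning a header to its own arena keeps reused PMCs close together */
static void free_to_arena(PMC_HDR *p)
{
    p->next_free = p->arena->free_list;
    p->arena->free_list = p;
}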

Comments, further experiments with 4), and so on are welcome,

leo
/* test program for pmc flag access */
/* run program with
 *  cc -o tpmc -Wall tpmc.c -O3 && ./tpmc
 *
 * the timing macro needs adjustment for !i386
 */

#include <stdio.h>
#include <stdlib.h>
#include <malloc.h>    /* memalign */

#ifndef NO_USE_RDTSC
#define rdtsclh(low, hi) \
    __asm__ __volatile__ ("rdtsc" : "=a" (low), "=d" (hi) )

void rdtsc(long *t)
{
    long sec, usec;
    rdtsclh(usec, sec);
    *t = (long)(1000.0 * ((double)sec + ((double)usec / 100.0)));
}
#else
#include <sys/time.h>
double
floatval_time(void)
{
    struct timeval t;
    gettimeofday(&t, NULL);
    return (double)t.tv_sec + ((double)t.tv_usec / 100.0);
}
void rdtsc(long *t)
{
    double s = floatval_time();
    *t = (long)(s * 1000.0);
}
#endif

#define N 16
#define SIZE 65536
struct pool {
    char *mem;
    char *flags;
} pool[N];

typedef struct pmc {
    int flags;
    int fill[7];
} PMC;

typedef struct spmc {
    int flags;
    int fill[3];
} SPMC;


int main(int argc, char *argv[])
{
    int i, j, k, l, n;
    long a, b, e;

    rdtsc(&a);
    for (j = 0; j < N; j++) {
        l = (int) ((double)N * rand()/(RAND_MAX+1.0));
        for (i = 0; i < SIZE; i++) {
            k = (int) ((double)SIZE * rand()/(RAND_MAX+1.0));
        }
    }
    rdtsc(&b);
    printf("%d empty   ticks %10ld\n", i, e = b - a);

    for (i = 0; i < N; i++) {
        pool[i].mem = calloc(SIZE, sizeof(PMC));
        pool[i].flags = calloc(SIZE, sizeof(char));
    }

    for (n = 0; n < 3; n++) {
        rdtsc(&a);
        for (j = 0; j < N; j++) {
            l = (int) ((double)N * rand()/(RAND_MAX+1.0));
            for (i = 0; i < SIZE; i++) {
                k = (int) ((double)SIZE * rand()/(RAND_MAX+1.0));
                ((PMC*)pool[j].mem)[i].flags |= 1;
            }
        }
        rdtsc(&b);
        printf("%d linear  PMC ticks %10ld\n", i, b - a - e);

        rdtsc(&a);
        for (j = 0; j < N; j++) {
            l = (int) ((double)N * rand()/(RAND_MAX+1.0));
            for (i = 0; i < SIZE; i++) {
                k = (int) ((double)SIZE * rand()/(RAND_MAX+1.0));
                ((PMC*)pool[l].mem)[k].flags |= 1;
            }
        }
        rdtsc(&b);
        printf("%d random  PMC ticks %10ld\n", i, b - a - e);
    }

    for (i = 0; i < N; i++) {
        free(pool[i].mem);
        free(pool[i].flags);
    }
    for (i = 0; i < N; i++) {
        pool[i].mem = calloc(SIZE, sizeof(SPMC));
        pool[i].flags = calloc(SIZE, sizeof(char));
    }
    for (n = 0; n < 3; n++) {
        rdtsc(&a);
        for (j = 0; j < N; j++) {
            l = (int) ((double)N * rand()/(RAND_MAX+1.0));
            for (i = 0; i < SIZE; i++) {
                k = (int) ((double)SIZE * rand()/(RAND_MAX+1.0));
                ((SPMC*)pool[j].mem)[i].flags |= 1;
            }
        }
        rdtsc(&b);
        printf("%d linear SPMC ticks %10ld\n", i, b - a - e);

        rdtsc(&a);
        for (j = 0; j < N; j++) {
            l = (int) ((double)N * rand()/(RAND_MAX+1.0));
            for (i = 0; i < SIZE; i++) {
                k = (int) ((double)SIZE * rand()/(RAND_MAX+1.0));
                ((SPMC*)pool[l].mem)[k].flags |= 1;
            }
        }
        rdtsc(&b);
        printf("%d random SPMC ticks %10ld\n", i, b - a - e);
    }

    /* now aligned pool with sep flags */

    for (i = 0; i < N; i++) {
        free(pool[i].mem);
        free(pool[i].flags);
    }
    for (i = 0; i < N; i++) {
        pool[i].mem = memalign(SIZE*sizeof(PMC), SIZE*sizeof(PMC));
        pool[i].flags = calloc(SIZE, sizeof(char));
        /* printf("pool %d %p\n", i, pool[i].mem); */
    }

    rdtsc(&a);
    for (j = 0; j < N; j++) {
        l = (int) ((double)N * rand()/(RAND_MAX+1.0));
        for (i = 0; i <

perl6 testing on mingw32

2003-01-06 Thread James Michael DuPont
First of all: to run this, it needs a -I.

Second of all, the test has errors.
The "Perl6grammar.pm did not return a true value at (eval 58) line 3."
error can be ignored, because via some magic it is regenerated.

I get the following error:
>>WHOA!  Somehow you got a different number of results than tests ran!
>>This should never happen!  Please contact the author immediately!
>>END failed--call queue aborted.

test results :
---
mdupont@PI
/cygdrive/c/development/testing/parrot/parrot/languages/perl6
$ perl -I. ./perl6 --test
Perl6grammar.pm did not return a true value at (eval 58) line 3.

Test details:
t/builtins/array.t2/3
t/builtins/string.t...4/4
t/compiler/1.t..14/14
t/compiler/2.t5/5
t/compiler/3.t7/7
t/compiler/4.t3/3
t/compiler/5.t5/5
t/compiler/6.t6/6
t/compiler/7.t1/1
t/compiler/8.t6/6
t/compiler/9.t4/4
t/compiler/a.t3/3
t/compiler/b.t6/6
t/compiler/c.t6/6
t/compiler/qsort.t1/1
t/rx/basic.t..6/6
t/rx/call.t...2/2
t/rx/special.t2/2

Test summary:
t/builtins/array.tesok, 2/3 skipped: various reasons
t/builtins/string.tesok
t/compiler/1.tesok
t/compiler/2.tesok
t/compiler/3.tesok
t/compiler/4.tesok
t/compiler/5.tesok
t/compiler/6.tesok
t/compiler/7.tesok
t/compiler/8.tesok
t/compiler/9.tesok
t/compiler/a.tesok
t/compiler/b.tesok
t/compiler/c.tesok
t/compiler/qsort.tesok
t/rx/basic.tes..ok
t/rx/call.tes...ok
t/rx/special.tesok
All tests successful, 2 subtests skipped.
Files=18, Tests=84,  3 wallclock secs ( 0.79 cusr +  1.03 csys =  1.82
CPU)
WHOA!  Somehow you got a different number of results than tests ran!
This should never happen!  Please contact the author immediately!
END failed--call queue aborted.


=
James Michael DuPont
http://introspector.sourceforge.net/




Re: Object semantics

2003-01-06 Thread Dan Sugalski
At 11:00 AM +0530 1/6/03, Gopal V wrote:

If memory serves me right, Dan Sugalski wrote:

 >>  Why would we want to avoid this? It looks exactly like what ought to
 >>  happen.


If you can provide that in-VM, it would be a lot faster ... (hmm, that's
one argument that should convince you ;)


Struct copying? Not a problem. Cloning this sort of thing is the same 
as cloning off strings. Deeper copies, where we trace the tree, are a 
bit trickier, but there are vtable entries to do that, so classes can 
implement things as fast as possible if people need them.
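
As a small illustration of the difference (plain C, not Parrot's actual
clone vtable code): a shallow clone of a value type is just a struct copy,
while a deep clone has to trace and duplicate the referenced data.

#include <stdlib.h>
#include <string.h>

typedef struct point { double x, y; } Point;
typedef struct path  { size_t n; Point *pts; } Path;

/* shallow clone: a plain struct copy, as cheap as copying off a string header */
Point clone_point(const Point *p)
{
    return *p;
}

/* deep clone: the referenced data is traced and duplicated too */
Path clone_path(const Path *p)
{
    Path c;
    c.n   = p->n;
    c.pts = malloc(p->n * sizeof(Point));
    memcpy(c.pts, p->pts, p->n * sizeof(Point));
    return c;
}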

But like I said, I need lots of sticky notes for all the opcodes for
parrot ... (I'm still in "can't remember all opcodes" mode)


 Just because C# does it doesn't mean that he likes it. :)


To end all further debate --

*) C# has something like this,

*) I can't see what Dan has in his head for parrot,


IIRC, it's grey and squishy. :)


*) I don't want feature creep into parrot


If you can find a feature that we don't have, I'd be surprised. :) 
There's not much left out of perl, so there's not much left out of 
parrot.

*) Our C# -> JVM compiler already has the workarounds for this like:
   invokestatic	"MyStruct" "copyIn__" "(LMyStruct;)LMyStruct;"


I think we can make things a bit easier than that. We should build a 
good base struct PMC type to build on to make this faster, though.

*) I really don't like valuetypes that much :).


And I don't much like reftypes, so we're even. :)


*) Rhys will be doing most of the design; this is his headache actually :-)

So does that cover all bases?


I think so. I'm about ready to get objects formally(ish) defined, but 
given that I'm in a coffee shop and unconnected now, it might hit the 
list at the same time this does. (Or before, depending on how the 
mail queues go...)
--
Dan

--"it's like this"---
Dan Sugalski  even samurai
[EMAIL PROTECTED] have teddy bears and even
  teddy bears get drunk


Re: More thoughts on DOD

2003-01-06 Thread Dan Sugalski
At 1:30 PM +0100 1/6/03, Leopold Toetsch wrote:

The attached test program shows some additional effects of PMC size and
timing. A PMC is 32 bytes, an SPMC is 16 bytes, matching the current and
minimal PMC sizes for i386 (or a typical 32-bit system).

1) For linear access, half-sized PMCs give double the mark speed. This is
in good agreement with stress.pasm.

Not at all surprising, as it doubles the cache density. OTOH, this 
also argues for moving the mark stuff out of the PMC struct entirely 
if we can manage it. A region of the PMC pool that's entirely mark 
area would up the cache density by a factor of three or four.

2) As we have only one free_list per pool, memory usage gets more and
more random when, e.g., parts of arrays are rewritten with new values
that come from the free_list. The worst-case behaviour will be totally
random access to our pools' data.


This is a very good point. It might be worth having an "allocate X 
PMCs into buffer Y" call that could potentially let us do more 
optimization here, since if we're allocating a full pool's worth of 
PMCs it may be more prudent to just allocate a full new pool and 
stuff them all into the single array.
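
A minimal sketch of what such a call could look like, with a made-up name
and signature (nothing like this exists in Parrot today): for a large
request, all headers come out of one contiguous block instead of the
shared free_list, so the PMCs of a single array stay adjacent.

#include <stdlib.h>

typedef struct pmc { int flags; int fill[7]; } PMC;  /* 32-byte header, as in the test program */

/* hypothetical "allocate X PMCs into buffer Y" call */
int pmc_allocate_many(PMC **dest, size_t n)
{
    PMC *block = calloc(n, sizeof(PMC));   /* one contiguous run of headers */
    size_t i;

    if (!block)
        return -1;
    for (i = 0; i < n; i++)
        dest[i] = &block[i];
    return 0;
}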

For PMCs, the worst-case random access takes about double the time of
a linear access. SPMCs take almost 3 times as long, but are still faster
than PMCs. But the advantage of smaller PMCs is almost gone.

3) Conclusion
To keep PMCs more tightly together when reusing them, these numbers
seem to indicate that we might need a free_list per pool->arena,
which would need a pool->arena pointer in the PMC.


We could avoid this with properly aligned PMC pools, if we don't mind 
playing pointer masking games. Whether this would be faster is an 
open question, though.
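
A sketch of the masking trick, using the same memalign() idea as the
attached test program: if a pool is aligned to its own power-of-two size,
the owning pool's base address can be recovered from any PMC pointer with
a single AND, so no per-PMC back-pointer is needed.

#include <stdio.h>
#include <stdint.h>
#include <malloc.h>

#define POOL_BYTES (65536 * 32)   /* SIZE * sizeof(PMC) from the test program */

int main(void)
{
    char *pool = memalign(POOL_BYTES, POOL_BYTES);   /* aligned to its own size */
    char *some_pmc, *base;

    if (!pool)
        return 1;
    some_pmc = pool + 1234 * 32;                     /* any PMC inside the pool */
    base = (char *)((uintptr_t)some_pmc & ~(uintptr_t)(POOL_BYTES - 1));
    printf("recovered pool base: %s\n", base == pool ? "ok" : "FAILED");
    free(pool);
    return 0;
}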

4) With a pool->arena pointer in the PMC, we could also try to use a
separate flags field, which could be sized at 1 byte per PMC.


I don't think a single byte's enough for PMC flags, though it could 
be for internal use. (Heck, two bits may be enough, in which case it 
may be worth having a separate mark bitstring for live/dead marks)
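
A sketch of such a separate mark bitstring (one live bit per header; the
constants and names are illustrative only):

#include <string.h>

#define POOL_SIZE 65536                         /* headers per pool, as in the test program */

static unsigned char mark_bits[POOL_SIZE / 8];  /* 8 KB of mark area for 2 MB of PMCs */

static void mark_live(unsigned i)  { mark_bits[i / 8] |= (unsigned char)(1u << (i % 8)); }
static int  is_live(unsigned i)    { return (mark_bits[i / 8] >> (i % 8)) & 1; }
static void clear_marks(void)      { memset(mark_bits, 0, sizeof mark_bits); }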

5) On my system pools of different sizes are in totally different
ranges of memory (sbrk vs. mmap). This makes the quick bit mask test
in trace_mem_block totally unusable.


Yeah, but I'm not sure that is a problem as such--we're going to have 
to deal with multiple memory blocks anyway, so...
--
Dan

--"it's like this"---
Dan Sugalski  even samurai
[EMAIL PROTECTED] have teddy bears and even
  teddy bears get drunk


Question about the disassembler

2003-01-06 Thread James Michael DuPont
Dear fellow hackers,

the "C" disassembler is crashing :

mdupont@introspector:~/development/parrot/parrot$ ./disassemble
t/pmc/sub_1.pbc 
disassemble: debug.c:1055: PDB_disassemble_op: Assertion `size + 20 <
space' failed.
Aborted


And the Perl disassembler works!

I have been looking into it, and it seems that parrot contains very
little meta-data compared to IL/DotGNU.

The table at the top of the program contains many strings, pmc_*, and
sometimes other slots. But I wonder if you will try to represent OO
classes like IL does with the IL .class attribute?

best regards,

mike


=
James Michael DuPont
http://introspector.sourceforge.net/




Re: More thoughts on DOD

2003-01-06 Thread Leopold Toetsch
Dan Sugalski wrote:


At 1:30 PM +0100 1/6/03, Leopold Toetsch wrote:



1) For linear access, half-sized PMCs give double the mark speed. This is
in good agreement with stress.pasm.



... A region of the PMC pool that's entirely mark area would 
up the cache density by a factor of three or four.


s/three or four/32/ for the current PMC. next_for_gc is not really used
often enough, IMHO, that it would have to go in the same place.

Marking current PObj headers needs these flags:
live, on_free_list, is_PMC, is_PMC_ptr, is_Buffer_ptr, active_destroy
(and possibly is_SPMC, i.e. scalar or small PMC). These would fit fine in
one byte. Only when we have a more complicated structure (with data,
properties, whatever) do we need next_for_gc.
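
For illustration, the flags listed above packed into one byte (the names
and bit assignments here are made up, not the real PObj flag values):

enum {
    HDR_live           = 1 << 0,
    HDR_on_free_list   = 1 << 1,
    HDR_is_PMC         = 1 << 2,
    HDR_is_PMC_ptr     = 1 << 3,
    HDR_is_Buffer_ptr  = 1 << 4,
    HDR_active_destroy = 1 << 5,
    HDR_is_SPMC        = 1 << 6    /* possibly: scalar or small PMC */
};
/* seven flags, so one unsigned char per header is enough for the mark phase */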

... The worst-case behaviour will be
totally random access to our pools' data.



This is a very good point. It might be worth having an "allocate X PMCs 
into buffer Y" call that could potentially let us do more optimization 
here, since if we're allocating a full pool's worth of PMCs it may be 
more prudent to just allocate a full new pool and stuff them all into 
the single array.


Yes. But will a HLL be able to emit such a call? This would need thorough
loop analysis or some help from the HL user to declare, e.g., the array
size in advance, which is rather unperlish. But OTOH the docs could state:
if you know you'll need a really big array, say so to perl6 and you'll get
more speed.

... we might need a free_list per pool->arena,
which would need a pool->arena pointer in the PMC.



We could avoid this with properly aligned PMC pools, if we don't mind 
playing pointer masking games. Whether this would be faster is an open 
question, though.


Aligned PMC pools don't help in the above case of long-running programs:
when newly created PMCs come from different pool->arena areas, they will
be scattered all over the allocated regions.

The idea above is to have a free_list per header_pool->arena, which
would keep items tighter together than our free_list per header pool does.

Or: At some point it would be cheaper to forget freed PMCs and allocate 
fresh new pools, cache-friendly and tightly put together.


4) With a pool->arena pointer in the PMC, we could also try to use a
separate flags field, which could be sized at 1 byte per PMC.



I don't think a single byte's enough for PMC flags, though it could be 
for internal use. (Heck, two bits may be enough, in which case it may be 
worth having a separate mark bitstring for live/dead marks)


See above, and pobject_lives() in dod.c. We need 1 byte per header for
marking, not for all PObj flags; those will still exist.
The bit arithmetic will be faster for big programs ("big" will be
defined later ;-)


5) On my system pools of different sizes are in totally different
ranges of memory (sbrk vs. mmap). This makes the quick bit mask test
in trace_mem_block totally unusable.



Yeah, but I'm not sure that is a problem as such--we're going to have to 
deal with multiple memory blocks anyway, so...


Or skip this altogether ;-)


leo