More thoughts on DOD
The attached test program shows some additional effects of PMC size and
access pattern on timing. A PMC is 32 bytes, an SPMC is 16 bytes, matching
the current and minimal PMC sizes for i386 (or a typical 32-bit system).
1) For linear access, half-sized PMCs give double the mark speed. This
agrees well with the stress.pasm results.
2) As we have only one free_list per pool, memory usage will get more
and more random over time, e.g. when parts of arrays are rewritten with
new values that come from the free_list. The worst-case behaviour is
totally random access to our pools' data.
For PMCs, worst-case random access takes about double the time of
linear access. SPMCs take almost 3 times as long, but are still faster
than PMCs. The advantage of smaller PMCs is almost gone, though.
3) Conclusion
To keep PMCs more tightly together when reusing them, these numbers
seem to indicate that we might need a free_list per pool->arena, which
would need a pool->arena pointer in the PMC.
4) With a pool->arena pointer in the PMC, we could also try to use a
separate flags field, sized at 1 byte per PMC (a rough sketch follows
below).
5) On my system, pools of different sizes end up in totally different
ranges of memory (sbrk vs. mmap). This makes the quick bit-mask test
in trace_mem_block totally unusable.
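
[Editor's sketch] As a rough illustration of 3) and 4): the following shows
a per-arena free_list, an arena back-pointer in the header, and a separate
1-byte flag per header. All names (Arena, PObj, arena_get_header, ...) are
made up for illustration only; this is not the current Parrot layout.

#include <stddef.h>

typedef struct Arena {
    struct Arena  *next;          /* next arena in this pool               */
    struct PObj   *free_list;     /* free headers from THIS arena          */
    unsigned char *flags;         /* one mark byte per header              */
    struct PObj   *start;         /* first header in this arena            */
    size_t         total;         /* number of headers per arena           */
} Arena;

typedef struct PObj {
    struct Arena *arena;          /* back-pointer, enables per-arena reuse */
    struct PObj  *next_free;      /* link while on the arena's free_list   */
    /* ... payload ... */
} PObj;

/* Allocation prefers an arena that still has local free headers, so
 * reused PMCs stay close together; if no arena has any, the caller
 * allocates a fresh, cache-friendly arena instead of scattering headers. */
static PObj *
arena_get_header(Arena *arena_list)
{
    Arena *a;
    for (a = arena_list; a; a = a->next) {
        if (a->free_list) {
            PObj *p      = a->free_list;
            a->free_list = p->next_free;
            a->flags[p - a->start] = 0;   /* reset the header's mark byte */
            return p;
        }
    }
    return NULL;
}

The DOD mark phase would then only have to touch a->flags[], which is the
cache-density win that 4) is after.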
Comments, further experiments with 4) and so on welcome,
leo
/* test program for pmc flag access */
/* run program with
* cc -o tpmc -Wall tpmc.c -O3 && ./tpmc
*
* the timing macro needs adjustment for !i386
*/
#include <stdio.h>     /* printf */
#include <stdlib.h>    /* calloc, free, rand */
#include <malloc.h>    /* memalign */
#ifndef NO_USE_RDTSC
#define rdtsclh(low, hi) \
    __asm__ __volatile__ ("rdtsc" : "=a" (low), "=d" (hi) )

void rdtsc(long *t)
{
    long sec, usec;
    rdtsclh(usec, sec);
    *t = (long)(1000.0 * ((double)sec + ((double)usec / 100.0)));
}
#else
#include <sys/time.h>  /* gettimeofday */
double
floatval_time(void)
{
    struct timeval t;
    gettimeofday(&t, NULL);
    return (double)t.tv_sec + ((double)t.tv_usec / 1000000.0);
}

void rdtsc(long *t)
{
    double s = floatval_time();
    *t = (long)(s * 1000.0);
}
#endif
#define N 16
#define SIZE 65536
struct pool {
    char *mem;
    char *flags;
} pool[N];

/* 8 x 4-byte ints = 32 bytes: the current i386 PMC size */
typedef struct pmc {
    int flags;
    int fill[7];
} PMC;

/* 4 x 4-byte ints = 16 bytes: the minimal ("small") PMC size */
typedef struct spmc {
    int flags;
    int fill[3];
} SPMC;
int main(int argc, char *argv[])
{
    int i, j, k, l, n;
    long a, b, e;

    /* measure the cost of the rand() calls alone; it is subtracted below */
    rdtsc(&a);
    for (j = 0; j < N; j++) {
        l = (int) ((double)N * rand()/(RAND_MAX+1.0));
        for (i = 0; i < SIZE; i++) {
            k = (int) ((double)SIZE * rand()/(RAND_MAX+1.0));
        }
    }
    rdtsc(&b);
    printf("%d empty ticks %10ld\n", i, e = b - a);

    for (i = 0; i < N; i++) {
        pool[i].mem = calloc(SIZE, sizeof(PMC));
        pool[i].flags = calloc(SIZE, sizeof(char));
    }
    for (n = 0; n < 3; n++) {
        /* linear access: touch the flags of every PMC in order */
        rdtsc(&a);
        for (j = 0; j < N; j++) {
            l = (int) ((double)N * rand()/(RAND_MAX+1.0));
            for (i = 0; i < SIZE; i++) {
                k = (int) ((double)SIZE * rand()/(RAND_MAX+1.0));
                ((PMC*)pool[j].mem)[i].flags |= 1;
            }
        }
        rdtsc(&b);
        printf("%d linear PMC ticks %10ld\n", i, b - a - e);

        /* random access: touch the flags of random PMCs in random pools */
        rdtsc(&a);
        for (j = 0; j < N; j++) {
            l = (int) ((double)N * rand()/(RAND_MAX+1.0));
            for (i = 0; i < SIZE; i++) {
                k = (int) ((double)SIZE * rand()/(RAND_MAX+1.0));
                ((PMC*)pool[l].mem)[k].flags |= 1;
            }
        }
        rdtsc(&b);
        printf("%d random PMC ticks %10ld\n", i, b - a - e);
    }

    for (i = 0; i < N; i++) {
        free(pool[i].mem);
        free(pool[i].flags);
    }
    /* same measurements with the half-sized SPMC */
    for (i = 0; i < N; i++) {
        pool[i].mem = calloc(SIZE, sizeof(SPMC));
        pool[i].flags = calloc(SIZE, sizeof(char));
    }
    for (n = 0; n < 3; n++) {
        rdtsc(&a);
        for (j = 0; j < N; j++) {
            l = (int) ((double)N * rand()/(RAND_MAX+1.0));
            for (i = 0; i < SIZE; i++) {
                k = (int) ((double)SIZE * rand()/(RAND_MAX+1.0));
                ((SPMC*)pool[j].mem)[i].flags |= 1;
            }
        }
        rdtsc(&b);
        printf("%d linear SPMC ticks %10ld\n", i, b - a - e);

        rdtsc(&a);
        for (j = 0; j < N; j++) {
            l = (int) ((double)N * rand()/(RAND_MAX+1.0));
            for (i = 0; i < SIZE; i++) {
                k = (int) ((double)SIZE * rand()/(RAND_MAX+1.0));
                ((SPMC*)pool[l].mem)[k].flags |= 1;
            }
        }
        rdtsc(&b);
        printf("%d random SPMC ticks %10ld\n", i, b - a - e);
    }

    /* now aligned pools with separate flags */
    for (i = 0; i < N; i++) {
        free(pool[i].mem);
        free(pool[i].flags);
    }
    for (i = 0; i < N; i++) {
        pool[i].mem = memalign(SIZE*sizeof(PMC), SIZE*sizeof(PMC));
        pool[i].flags = calloc(SIZE, sizeof(char));
        /* printf("pool %d %p\n", i, pool[i].mem); */
    }
    rdtsc(&a);
    for (j = 0; j < N; j++) {
        l = (int) ((double)N * rand()/(RAND_MAX+1.0));
        for (i = 0; i <
perl6 testing on mingw32
First of all: to run this, it needs a -I. Second of all, the test run has
errors. The message
  Perl6grammar.pm did not return a true value at (eval 58) line 3.
can be ignored, because via some magic it is regenerated. I get the
following error:
>>WHOA! Somehow you got a different number of results than tests ran!
>>This should never happen! Please contact the author immediately!
>>END failed--call queue aborted.
Test results:
---
mdupont@PI /cygdrive/c/development/testing/parrot/parrot/languages/perl6
$ perl -I. ./perl6 --test
Perl6grammar.pm did not return a true value at (eval 58) line 3.
Test details:
t/builtins/array.t     2/3
t/builtins/string.t    4/4
t/compiler/1.t         14/14
t/compiler/2.t         5/5
t/compiler/3.t         7/7
t/compiler/4.t         3/3
t/compiler/5.t         5/5
t/compiler/6.t         6/6
t/compiler/7.t         1/1
t/compiler/8.t         6/6
t/compiler/9.t         4/4
t/compiler/a.t         3/3
t/compiler/b.t         6/6
t/compiler/c.t         6/6
t/compiler/qsort.t     1/1
t/rx/basic.t           6/6
t/rx/call.t            2/2
t/rx/special.t         2/2
Test summary:
t/builtins/array.t....ok, 2/3 skipped: various reasons
t/builtins/string.t...ok
t/compiler/1.t........ok
t/compiler/2.t........ok
t/compiler/3.t........ok
t/compiler/4.t........ok
t/compiler/5.t........ok
t/compiler/6.t........ok
t/compiler/7.t........ok
t/compiler/8.t........ok
t/compiler/9.t........ok
t/compiler/a.t........ok
t/compiler/b.t........ok
t/compiler/c.t........ok
t/compiler/qsort.t....ok
t/rx/basic.t..........ok
t/rx/call.t...........ok
t/rx/special.t........ok
All tests successful, 2 subtests skipped.
Files=18, Tests=84, 3 wallclock secs ( 0.79 cusr + 1.03 csys = 1.82 CPU)
WHOA! Somehow you got a different number of results than tests ran!
This should never happen! Please contact the author immediately!
END failed--call queue aborted.
=
James Michael DuPont
http://introspector.sourceforge.net/
Re: Object semantics
At 11:00 AM +0530 1/6/03, Gopal V wrote:
>If memory serves me right, Dan Sugalski wrote:
>> Why would we want to avoid this? It looks exactly like what ought to
>> happen.
>If you can provide that in-vm, it would be a lot faster ...(hmm, that's
>one argument that should convince you ;)

Struct copying? Not a problem. Cloning this sort of thing is the same as
cloning off strings. Deeper copies, where we trace the tree, are a bit
trickier, but there are vtable entries to do that, so classes can
implement things as fast as possible if people need them.

>But like I said, I need lots of sticky notes for all the opcodes for
>parrot ...(I'm still in "can't remember all opcodes" mode)
>Just because C# does it doesn't mean that he likes it.

:)

>To end all further debate --
>*) C# has something like this,
>*) I can't see what Dan has in head for parrot,

IIRC, it's grey and squishy. :)

>*) I don't want feature creep into parrot

If you can find a feature that we don't have, I'd be surprised. :)
There's not much left out of perl, so there's not much left out of
parrot.

>*) Our C# -> JVM compiler already has the workarounds for this like:
>   invokestatic "MyStruct" "copyIn__" "(LMyStruct;)LMyStruct;"

I think we can make things a bit easier than that. We should build a
good base struct PMC type to build on to make this faster, though.

>*) I really don't like valuetypes that much :).

And I don't much like reftypes, so we're even. :)

>*) Rhys will be doing most of the design, this is his headache
>actually :-)
>So does that cover all bases ?

I think so. I'm about ready to get objects formally(ish) defined, but
given that I'm in a coffee shop and unconnected now, it might hit the
list at the same time this does. (Or before, depending on how the mail
queues go...)
--
Dan

--"it's like this"---
Dan Sugalski                          even samurai
[EMAIL PROTECTED]                     have teddy bears and even
                                      teddy bears get drunk
Re: More thoughts on DOD
At 1:30 PM +0100 1/6/03, Leopold Toetsch wrote:
>Attached test program shows some additional effects of PMC size and
>timing. A PMC is 32 byte, a SPMC is 16 byte, matching current and
>minimal PMC sizes for i386 (or a typical 32 bit system).
>
>1) for linear access half sized PMCs give double mark speed. This is
>in good relation to stress.pasm

Not at all surprising, as it doubles the cache density. OTOH, this also
argues for moving the mark stuff out of the PMC struct entirely if we
can manage it. A region of the PMC pool that's entirely mark area would
up the cache density by a factor of three or four.

>2) As we have only one free_list per pool, memory usage will get more
>and more random, when e.g. parts of arrays are rewritten with new
>values, which come from the free_list. The worst case behaviour will be
>total random access to our pools data.

This is a very good point. It might be worth having an "allocate X PMCs
into buffer Y" call that could potentially let us do more optimization
here, since if we're allocating a full pool's worth of PMCs it may be
more prudent to just allocate a full new pool and stuff them all into
the single array.

>For PMCs, the worst case random access takes about double the time of
>a linear access. SPMCs take almost 3 times, but are still faster then
>PMCs. But the advantage of smaller PMCs is almost gone.
>
>3) Conclusion
>To keep PMCs more tightly together when reusing them, these numbers
>seem to indicate, that we might need a free_list per pool->arena,
>which would need a pool->arena pointer in the PMC.

We could avoid this with properly aligned PMC pools, if we don't mind
playing pointer masking games. Whether this would be faster is an open
question, though.

>4) With a pool->arena pointer in the PMC, we could also try to use a
>separate flags field, which could be 1 byte per PMC sized.

I don't think a single byte's enough for PMC flags, though it could be
for internal use. (Heck, two bits may be enough, in which case it may be
worth having a separate mark bitstring for live/dead marks)

>5) On my system pools of different sizes are in totally different
>ranges of memory (sbrk vs. mmap). This makes the quick bit mask test
>in trace_mem_block totally unusable.

Yeah, but I'm not sure that is a problem as such--we're going to have to
deal with multiple memory blocks anyway, so...
--
Dan

--"it's like this"---
Dan Sugalski                          even samurai
[EMAIL PROTECTED]                     have teddy bears and even
                                      teddy bears get drunk
Question about the disassembler
Dear fellow hackers,

the "C" disassembler is crashing:

mdupont@introspector:~/development/parrot/parrot$ ./disassemble t/pmc/sub_1.pbc
disassemble: debug.c:1055: PDB_disassemble_op: Assertion `size + 20 < space' failed.
Aborted

And the perl disassembler works!

I have been looking into it, and it seems that parrot contains very
little meta-data compared to IL/DOTGNU. The table at the top of the
program contains many strings, pmc_* and sometimes other slots. But I
wonder if you will try to represent OO classes like IL does with the IL
.class attribute?

best regards,
mike
=
James Michael DuPont
http://introspector.sourceforge.net/
Re: More thoughts on DOD
Dan Sugalski wrote:
> At 1:30 PM +0100 1/6/03, Leopold Toetsch wrote:
>> 1) for linear access half sized PMCs give double mark speed. This is
>> in good relation to stress.pasm
> ... A region of the PMC pool that's entirely mark area would
> up the cache density by a factor of three or four.
s/three or four/32/ for the current PMC. next_for_gc is not really used
often enough, IMHO, that it needs to live in the same place.
Marking current PObj headers needs these flags:
live, on_free_list, is_PMC, is_PMC_ptr, is_Buffer_ptr, active_destroy
(and possibly is_SPMC, i.e. scalar or small PMC). This would fit fine in
one byte. Only when we have a more complicated structure (with data,
properties, whatever) do we need next_for_gc.
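
[Editor's sketch] To illustrate the point, here is a minimal sketch of how
those mark-related flags could pack into a single byte per header. The
names are illustrative only, not the actual PObj flag constants:

/* Hypothetical 1-byte mark flags, one per header, kept in a separate
 * array rather than inside the PMC itself. */
enum {
    MARK_live           = 1 << 0,
    MARK_on_free_list   = 1 << 1,
    MARK_is_PMC         = 1 << 2,
    MARK_is_PMC_ptr     = 1 << 3,
    MARK_is_Buffer_ptr  = 1 << 4,
    MARK_active_destroy = 1 << 5,
    MARK_is_SPMC        = 1 << 6    /* scalar / small PMC */
};

typedef unsigned char mark_flags_t;  /* all seven bits fit in one byte */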
>> ... The worst case behaviour will be
>> total random access to our pools data.
> This is a very good point. It might be worth having an "allocate X PMCs
> into buffer Y" call that could potentially let us do more optimization
> here, since if we're allocating a full pool's worth of PMCs it may be
> more prudent to just allocate a full new pool and stuff them all into
> the single array.
Yes. But will a HL be able to emit such a call? This would need thorough
loop analysis or some help from the HL user, who would have to declare
e.g. the array size in advance, which is rather unperlish. But OTOH the
docs could state: if you know you'll need a really big array, say so to
perl6 and you'll get more speed.
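
[Editor's sketch] For concreteness, a rough sketch of what an "allocate X
PMCs into buffer Y" call might look like. Everything here is hypothetical
(pmc_hdr, pool_allocate_n are made-up names, not existing Parrot API); the
point is simply that n headers requested at once can be carved out of one
contiguous block, so a big array gets cache-friendly, adjacent PMCs:

#include <stdlib.h>

typedef struct pmc_hdr { int flags; void *data; } pmc_hdr;

static int
pool_allocate_n(pmc_hdr **dest, size_t n)
{
    pmc_hdr *block = calloc(n, sizeof (pmc_hdr));  /* one contiguous arena */
    size_t i;

    if (!block)
        return -1;
    for (i = 0; i < n; i++)
        dest[i] = &block[i];       /* headers stay adjacent in memory */
    return 0;
}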
>> ... we might need a free_list per pool->arena,
>> which would need a pool->arena pointer in the PMC.
> We could avoid this with properly aligned PMC pools, if we don't mind
> playing pointer masking games. Whether this would be faster is an open
> question, though.
Aligned PMC pools don't help for the above case of long-running
programs: when newly created PMCs come from different pool->arena areas,
they will still be scattered all over the allocated regions.
The idea above is to have a free_list per header_pool->arena, which
would keep items tighter together than our current free_list per header
pool does.
Or: at some point it would be cheaper to forget freed PMCs and allocate
fresh new pools, cache-friendly and tightly packed together.
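
[Editor's sketch] For reference, the "pointer masking games" Dan mentions
could look roughly like this, assuming each arena is allocated at an
address aligned to its own size (as memalign does in the test program
above). ARENA_SIZE and struct arena are illustrative only:

#include <stdint.h>

#define ARENA_SIZE  (64 * 1024)   /* must be a power of two */

struct arena;   /* hypothetical per-arena bookkeeping at the block start */

/* If every arena starts on an ARENA_SIZE boundary, the arena a header
 * lives in can be recovered by masking the header's address, so no
 * back-pointer is needed in the PMC itself. */
static struct arena *
arena_of(void *pmc)
{
    return (struct arena *)((uintptr_t)pmc & ~(uintptr_t)(ARENA_SIZE - 1));
}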
>> 4) With a pool->arena pointer in the PMC, we could also try to use a
>> spearate flags field, which could be 1 byte per PMC sized.
> I don't think a single byte's enough for PMC flags, though it could be
> for internal use. (Heck, two bits may be enough, in which case it may be
> worth having a separate mark bitstring for live/dead marks)
See above, and pobject_lives() in dod.c. We need 1 byte per header for
marking, not for all PObj flags; those will still exist.
The bit arithmetic will be faster for big programs ("big" will be
defined later ;-)
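
[Editor's sketch] And the bitstring variant Dan mentions, one live bit per
header, could look like this (hypothetical helpers, not dod.c code); with
SIZE = 65536 headers per pool, the whole mark area is only 8 KB, which is
what makes the marking pass so cache friendly:

#include <limits.h>
#include <string.h>

#define LIVE_SET(bits, i)  ((bits)[(i) / CHAR_BIT] |= (1u << ((i) % CHAR_BIT)))
#define LIVE_TEST(bits, i) ((bits)[(i) / CHAR_BIT] &  (1u << ((i) % CHAR_BIT)))

static unsigned char live_bits[65536 / CHAR_BIT];   /* one pool's worth */

static void reset_marks(void) { memset(live_bits, 0, sizeof live_bits); }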
>> 5) On my system pools of different sizes are in totally different
>> ranges of memory (sbrk vs. mmap). This makes the quick bit mask test
>> in trace_mem_block totally unusable.
> Yeah, but I'm not sure that is a problem as such--we're going to have to
> deal with multiple memory blocks anyway, so...
Or skip this altogether ;-)
leo
