In perl.perl6.internals, you wrote:
> --- Leopold Toetsch <[EMAIL PROTECTED]> wrote:
>>   * SLOW (same slow with register or odd aligned)
>>   * 0x818118a <jit_func+194>:    sub    0x8164cac,%ebx
>>   * 0x8181190 <jit_func+200>:    jne    0x818118a <jit_func+194>

> The slow one has the loop crossing over a 16 byte boundary. Try moving it
> over a bit.

Yep, though it actually looks like an 8-byte boundary.
The following program shows it:

#!/usr/bin/perl -w
use strict;

for (my $i = 0; $i < 100; $i++) {
        printf "%3d\t", $i;
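        open(P, ">m.pasm") or die "can't write m.pasm: $!";
        # emit $i + 1 noops in front of the timed loop to shift its address
        for (0..$i) {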
        print(P <<'ENOP');
        noop
ENOP
        }
        print(P <<'EOF');
        set    I3, 1
        set    I4, 100000000
        set    I5, I4
        time   N1
REDO:   sub    I4, I4, I3
        if     I4, REDO
        time   N5
        sub    N2, N5, N1
        set    N1, I5
        mul    N1, 2
        div    N1, N2
        set    N2, 1000000.0
        div    N1, N2
        print  N1
        print  " M op/s\n"
        end
EOF
        close(P);
        system("perl assemble.pl m.pasm | parrot -j -");
}

And here is the output:

  0     790.826400 M op/s
  1     523.305494 M op/s
  2     788.544190 M op/s
  3     783.447189 M op/s
  4     783.975462 M op/s
  5     788.208178 M op/s
  6     782.466484 M op/s
  7     788.059343 M op/s
  8     788.836349 M op/s
  9     522.986581 M op/s
 10     788.895326 M op/s
 11     784.021624 M op/s
 12     789.773978 M op/s
 13     788.065635 M op/s
 14     783.558056 M op/s
 15     789.010709 M op/s
 16     782.463565 M op/s
 17     523.049517 M op/s
 18     781.350657 M op/s
 19     784.184698 M op/s
 20     789.683646 M op/s
 21     781.362666 M op/s
 22     783.994146 M op/s
 23     789.100887 M op/s
 24     783.990848 M op/s
 25     370.620840 M op/s
 26     786.862561 M op/s
 27     784.092342 M op/s
 28     789.106826 M op/s
 29     784.027852 M op/s
 30     780.688935 M op/s
 31     787.913154 M op/s
 32     783.576354 M op/s
 33     526.877272 M op/s
 34     780.493905 M op/s
 35     790.339116 M op/s
 36     789.166586 M op/s
 37     782.154592 M op/s
 38     786.902789 M op/s
 39     783.834446 M op/s
 40     784.003305 M op/s
 41     522.135984 M op/s
 42     780.618829 M op/s
 43     790.167145 M op/s
 44     783.284786 M op/s
 45     790.363689 M op/s
 46     781.002931 M op/s
 47     783.720572 M op/s
 48     789.774350 M op/s
 49     523.933363 M op/s
 50     786.970706 M op/s
 51     780.966576 M op/s
 52     789.234894 M op/s
 53     784.317040 M op/s
 54     780.993842 M op/s
 55     789.914164 M op/s
 56     783.705196 M op/s
 57     291.958023 M op/s
 58     783.653215 M op/s
 59     788.739927 M op/s
 60     784.599837 M op/s
 61     783.917218 M op/s
 62     790.051795 M op/s
 63     782.589121 M op/s
 64     784.846120 M op/s
 65     523.988181 M op/s
 66     788.746231 M op/s
 67     781.811980 M op/s
 68     786.188159 M op/s
 69     790.023521 M op/s
 70     783.149502 M op/s
 71     786.531300 M op/s
 72     781.711076 M op/s
 73     527.106372 M op/s
 74     783.735948 M op/s
 75     788.491194 M op/s
 76     782.442035 M op/s
 77     780.387170 M op/s
 78     789.259770 M op/s
 79     779.781801 M op/s
 80     788.186701 M op/s
 81     523.328673 M op/s
 82     790.407627 M op/s
 83     782.751235 M op/s
 84     788.410417 M op/s
 85     782.625627 M op/s
 86     782.056516 M op/s
 87     787.631292 M op/s
 88     782.218409 M op/s
 89     425.664145 M op/s
 90     778.734333 M op/s
 91     787.851363 M op/s
 92     784.661485 M op/s
 93     788.292247 M op/s
 94     783.754621 M op/s
 95     789.181805 M op/s
 96     788.326694 M op/s
 97     523.357568 M op/s
 98     782.105369 M op/s
 99     781.796679 M op/s
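The pattern in the table can also be checked mechanically. The following is a minimal sketch, not part of the original benchmark: the helper names slow_runs and gaps and the 0.8 cutoff are assumptions chosen for illustration. It flags every run whose rate falls well below the median and prints the spacing between the flagged runs. The sample is the first 18 rows of the output above.

```perl
#!/usr/bin/perl -w
use strict;

# Return the run numbers whose rate is below $fraction of the median rate.
# Input lines look like "  9     522.986581 M op/s".
sub slow_runs {
    my ($fraction, @lines) = @_;
    my %rate;
    for my $line (@lines) {
        $rate{$1} = $2 if $line =~ /^\s*(\d+)\s+([\d.]+)\s+M op\/s/;
    }
    return () unless %rate;
    my @sorted = sort { $a <=> $b } values %rate;
    my $median = $sorted[ int(@sorted / 2) ];
    return grep { $rate{$_} < $fraction * $median }
           sort { $a <=> $b } keys %rate;
}

# Differences between consecutive run numbers.
sub gaps {
    my @n = @_;
    return map { $n[$_] - $n[$_ - 1] } 1 .. $#n;
}

# Rows 0..17 copied from the output above.
my @sample = (
    "  0     790.826400 M op/s", "  1     523.305494 M op/s",
    "  2     788.544190 M op/s", "  3     783.447189 M op/s",
    "  4     783.975462 M op/s", "  5     788.208178 M op/s",
    "  6     782.466484 M op/s", "  7     788.059343 M op/s",
    "  8     788.836349 M op/s", "  9     522.986581 M op/s",
    " 10     788.895326 M op/s", " 11     784.021624 M op/s",
    " 12     789.773978 M op/s", " 13     788.065635 M op/s",
    " 14     783.558056 M op/s", " 15     789.010709 M op/s",
    " 16     782.463565 M op/s", " 17     523.049517 M op/s",
);

my @slow = slow_runs(0.8, @sample);
print "slow runs: @slow\n";              # slow runs: 1 9 17
print "gaps:      ", join(" ", gaps(@slow)), "\n";   # gaps:      8 8
```

In the full table the slow rows are 1, 9, 17, ... 97, always 8 apart, and the deepest dips (rows 25, 57 and 89) are 32 apart.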

This of course assumes that the program is loaded at the same address on
every run, which, in my experience with gdb, is usually true.

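So moving the critical part of a program by just one byte can cause a
huge slowdown: here the rate drops from about 785 to about 523 M op/s,
and at some offsets to as low as 292 M op/s.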
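For the slow loop in the disassembly quoted at the top, a small address check shows why that listing alone cannot separate the 8-byte from the 16-byte explanation. This is only a sketch: crosses_boundary is a hypothetical helper, and the 2-byte length of the jne is an assumption (the sub is 6 bytes, since it runs from jit_func+194 to jit_func+200).

```perl
#!/usr/bin/perl -w
use strict;

# Return 1 if the byte range [$start, $start + $len) spans a multiple of
# $align, else 0. $align must be a power of two.
sub crosses_boundary {
    my ($start, $len, $align) = @_;
    my $mask = ~($align - 1);
    return (($start & $mask) != (($start + $len - 1) & $mask)) ? 1 : 0;
}

my $loop_start = 0x818118a;   # address of the sub in the quoted listing
my $loop_len   = 6 + 2;       # 6-byte sub plus an assumed 2-byte jne

for my $align (8, 16, 32) {
    printf "%2d-byte boundary: %s\n", $align,
        crosses_boundary($loop_start, $loop_len, $align)
            ? "crossed" : "not crossed";
}
#  8-byte boundary: crossed
# 16-byte boundary: crossed
# 32-byte boundary: not crossed
```

The jne starts at 0x8181190, which is a multiple of both 8 and 16, so the loop spans that address whatever the length of the jne. Only the one-byte sweep above, with its period of 8, separates the two cases.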

(This is an Athlon 800, i386/linux)

leo
