In perl.perl6.internals, you wrote:
> --- Leopold Toetsch <[EMAIL PROTECTED]> wrote:
>> * SLOW (same slow with register or odd aligned)
>> * 0x818118a <jit_func+194>: sub 0x8164cac,%ebx
>> * 0x8181190 <jit_func+200>: jne 0x818118a <jit_func+194>
> The slow one has the loop crossing over a 16 byte boundary. Try moving it
> over a bit.
Yep, actually it looks like an 8-byte boundary.
The following program demonstrates it:
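#!/usr/bin/perl -w
use strict;

# Prepend $i + 1 noops so the timed loop starts at a different offset each run.
for (my $i = 0; $i < 100; $i++) {
    printf "%3d\t", $i;
    open(my $fh, '>', 'm.pasm') or die "Cannot write m.pasm: $!";
    for (0..$i) {
        print {$fh} <<'ENOP';
noop
ENOP
    }
    # Timed loop: 2 ops per iteration, result printed in M op/s.
    print {$fh} <<'EOF';
set I3, 1
set I4, 100000000
set I5, I4
time N1
REDO: sub I4, I4, I3
if I4, REDO
time N5
sub N2, N5, N1
set N1, I5
mul N1, 2
div N1, N2
set N2, 1000000.0
div N1, N2
print N1
print " M op/s\n"
end
EOF
    close($fh) or die "Cannot close m.pasm: $!";
    system("perl assemble.pl m.pasm | parrot -j -");
}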
And here is the output:
0 790.826400 M op/s
1 523.305494 M op/s
2 788.544190 M op/s
3 783.447189 M op/s
4 783.975462 M op/s
5 788.208178 M op/s
6 782.466484 M op/s
7 788.059343 M op/s
8 788.836349 M op/s
9 522.986581 M op/s
10 788.895326 M op/s
11 784.021624 M op/s
12 789.773978 M op/s
13 788.065635 M op/s
14 783.558056 M op/s
15 789.010709 M op/s
16 782.463565 M op/s
17 523.049517 M op/s
18 781.350657 M op/s
19 784.184698 M op/s
20 789.683646 M op/s
21 781.362666 M op/s
22 783.994146 M op/s
23 789.100887 M op/s
24 783.990848 M op/s
25 370.620840 M op/s
26 786.862561 M op/s
27 784.092342 M op/s
28 789.106826 M op/s
29 784.027852 M op/s
30 780.688935 M op/s
31 787.913154 M op/s
32 783.576354 M op/s
33 526.877272 M op/s
34 780.493905 M op/s
35 790.339116 M op/s
36 789.166586 M op/s
37 782.154592 M op/s
38 786.902789 M op/s
39 783.834446 M op/s
40 784.003305 M op/s
41 522.135984 M op/s
42 780.618829 M op/s
43 790.167145 M op/s
44 783.284786 M op/s
45 790.363689 M op/s
46 781.002931 M op/s
47 783.720572 M op/s
48 789.774350 M op/s
49 523.933363 M op/s
50 786.970706 M op/s
51 780.966576 M op/s
52 789.234894 M op/s
53 784.317040 M op/s
54 780.993842 M op/s
55 789.914164 M op/s
56 783.705196 M op/s
57 291.958023 M op/s
58 783.653215 M op/s
59 788.739927 M op/s
60 784.599837 M op/s
61 783.917218 M op/s
62 790.051795 M op/s
63 782.589121 M op/s
64 784.846120 M op/s
65 523.988181 M op/s
66 788.746231 M op/s
67 781.811980 M op/s
68 786.188159 M op/s
69 790.023521 M op/s
70 783.149502 M op/s
71 786.531300 M op/s
72 781.711076 M op/s
73 527.106372 M op/s
74 783.735948 M op/s
75 788.491194 M op/s
76 782.442035 M op/s
77 780.387170 M op/s
78 789.259770 M op/s
79 779.781801 M op/s
80 788.186701 M op/s
81 523.328673 M op/s
82 790.407627 M op/s
83 782.751235 M op/s
84 788.410417 M op/s
85 782.625627 M op/s
86 782.056516 M op/s
87 787.631292 M op/s
88 782.218409 M op/s
89 425.664145 M op/s
90 778.734333 M op/s
91 787.851363 M op/s
92 784.661485 M op/s
93 788.292247 M op/s
94 783.754621 M op/s
95 789.181805 M op/s
96 788.326694 M op/s
97 523.357568 M op/s
98 782.105369 M op/s
99 781.796679 M op/s
This of course assumes that the program was loaded at the same address
on every run, which is, from my experience with gdb, usually true.
The slow runs recur every 8 noops (1, 9, 17, ...), so moving the
critical part of a program by just one byte can cut throughput by a
third or more.
(This is an Athlon 800, i386/linux)
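The period of the slow runs can be checked mechanically rather than by eye. The following is only a minimal sketch, not part of the benchmark itself: the helper names (parse_rates, slow_runs, gaps) and the 0.75-of-median cutoff are assumptions made for illustration, and the sample is the first 20 lines of the output above.

```perl
#!/usr/bin/perl -w
use strict;

# Parse lines like "  9 522.986581 M op/s" into { index => rate }.
sub parse_rates {
    my %rates;
    for my $line (@_) {
        $rates{$1} = $2 if $line =~ /^\s*(\d+)\s+([\d.]+)\s+M op\/s/;
    }
    return \%rates;
}

# Return the indices whose rate is below $frac times the median rate.
# The default cutoff of 0.75 is an arbitrary choice for this sketch.
sub slow_runs {
    my ($rates, $frac) = @_;
    $frac = 0.75 unless defined $frac;
    my @sorted = sort { $a <=> $b } values %$rates;
    return () unless @sorted;
    my $median = $sorted[ int(@sorted / 2) ];
    return grep { $rates->{$_} < $frac * $median }
           sort { $a <=> $b } keys %$rates;
}

# Differences between consecutive slow indices.
sub gaps {
    my @idx = @_;
    return map { $idx[$_] - $idx[$_ - 1] } 1 .. $#idx;
}

# First 20 lines of the benchmark output quoted above.
my @sample = split /\n/, <<'END_SAMPLE';
0 790.826400 M op/s
1 523.305494 M op/s
2 788.544190 M op/s
3 783.447189 M op/s
4 783.975462 M op/s
5 788.208178 M op/s
6 782.466484 M op/s
7 788.059343 M op/s
8 788.836349 M op/s
9 522.986581 M op/s
10 788.895326 M op/s
11 784.021624 M op/s
12 789.773978 M op/s
13 788.065635 M op/s
14 783.558056 M op/s
15 789.010709 M op/s
16 782.463565 M op/s
17 523.049517 M op/s
18 781.350657 M op/s
19 784.184698 M op/s
END_SAMPLE

my @slow = slow_runs(parse_rates(@sample));
print "slow runs: @slow\n";             # slow runs: 1 9 17
print "gaps:      @{[ gaps(@slow) ]}\n"; # gaps:      8 8
```

Feeding it all 100 lines instead of the sample flags 13 runs (1, 9, ..., 97), each 8 apart. The cutoff only separates slow from normal; it does not distinguish the extra dips at 25, 57 and 89.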
leo