So then let's take it slow, starting with the lowest of low-hanging fruit and showing numbers along the way.
The patch below improves the x86_64 codegen to convert:

    movl $0, %e{ax,cx,dx,sp,si,di}

to:

    xorl %e{ax,cx,dx,sp,si,di}, %e{ax,cx,dx,sp,si,di}

Here are the .text sizes of a 3-way bootstrapped tcc, before this
patch (old) and after this patch (new):

bin        old     new     diff  %reduction
---        ---     ---     ----  ----------
tcc        328786  325878  2908  0.88
libtcc.a    20214   20013   201  0.99
bcheck.o    23254   23209    45  0.19
bt-exe.o     4732    4693    39  0.82
bt-log.o      648     648     0  0
libtcc1.a   12678   12498   180  1.42

So it's not much, but it's also not nothing (except for the bt-log.o
case, where it literally is nothing).

And timing results for the "compile tcc.c 10 times" test:

old       new
---       ---
1010 ms   1008 ms

...which is just a fancy way of saying there appears to be no
statistical difference in speed after applying this patch. All of
these measurements were run on my machine, an OpenBSD/amd64 box with
a very old and slow i3 CPU.

The same conversion could easily be applied to movq $0,
%r{ax,cx,dx,sp,si,di}, which I'll send in a follow-up email. The
%r8-%r15 and equivalent %r8d-%r15d conversions are slightly
different. The i386 codegen should be able to take a mechanical
translation of this diff as well.

~Brian

diff --git a/x86_64-gen.c b/x86_64-gen.c
index 81ec5d9..775e132 100644
--- a/x86_64-gen.c
+++ b/x86_64-gen.c
@@ -486,8 +486,13 @@ void load(int r, SValue *sv)
             orex(1,r,0, 0xb8 + REG_VALUE(r)); /* mov $xx, r */
             gen_le64(sv->c.i);
         } else {
-            orex(0,r,0, 0xb8 + REG_VALUE(r)); /* mov $xx, r */
-            gen_le32(fc);
+            if (fc == 0 && r < 8) {
+                o(0x31); /* xor r, r */
+                o(0xc0 + REG_VALUE(r) * 9);
+            } else {
+                orex(0,r,0, 0xb8 + REG_VALUE(r)); /* mov $xx, r */
+                gen_le32(fc);
+            }
         }
     } else if (v == VT_LOCAL) {
         orex(1,0,r,0x8d); /* lea xxx(%ebp), r */
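A quick decoding note, since the magic numbers in that hunk are terse:
movl $imm32, %e.. is the one-byte opcode 0xb8+reg followed by a
four-byte immediate (five bytes total), while xorl %e.., %e.. is the
opcode 0x31 followed by a single ModRM byte (two bytes total), so each
converted zero load saves three bytes. The ModRM byte packs
mod<<6 | reg<<3 | rm; with mod = 3 (register-direct) and reg == rm == r
that works out to 0xc0 + r*8 + r, which is where the
0xc0 + REG_VALUE(r) * 9 above comes from. The r < 8 guard is there
because %r8d-%r15d additionally need a REX prefix, hence the "slightly
different" cases mentioned earlier. Below is a minimal standalone
sketch of that arithmetic, separate from the patch (the helper name is
made up for illustration):

    #include <stdio.h>

    /* ModRM for "xorl rN, rN": mod = 3 (register-direct) and
     * reg == rm == n, so the byte is 0xc0 | n << 3 | n,
     * which equals 0xc0 + n * 9 for n in 0..7. */
    static unsigned char xor_self_modrm(int n)
    {
        return 0xc0 | (n << 3) | n;
    }

    int main(void)
    {
        /* register numbers: eax=0 ecx=1 edx=2 ebx=3 esp=4 ebp=5 esi=6 edi=7 */
        for (int n = 0; n < 8; n++)
            printf("reg %d: xorl -> 31 %02x    movl $0 -> %02x 00 00 00 00\n",
                   n, xor_self_modrm(n), 0xb8 + n);
        return 0;
    }

For n = 0 that prints "31 c0" versus "b8 00 00 00 00", i.e. exactly
the xorl %eax, %eax and movl $0, %eax encodings. At three bytes per
conversion, the 2908-byte drop in the tcc binary above corresponds to
roughly a thousand zero loads, give or take alignment padding.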