On 2017-03-25 12:22:15 -0400, Tom Lane wrote:
> More random musing ... have you considered making the jump-target fields
> in expressions be relative rather than absolute indexes? That is,
> EEO_JUMP would look like
>
>     op += (stepno); \
>     EEO_DISPATCH(); \
>
> instead of
>
>     op = &state->steps[stepno]; \
>     EEO_DISPATCH(); \
>
> I have not carried out a full patch to make this work, but just making
> that one change and examining the generated assembly code looks promising.
> Instead of this
>
>     movslq  40(%r14), %r8
>     salq    $6, %r8
>     addq    24(%rbx), %r8
>     movq    %r8, %r14
>     jmp     *(%r8)
>
> we get this
>
>     movslq  40(%r14), %rax
>     salq    $6, %rax
>     addq    %rax, %r14
>     jmp     *(%r14)
That seems like a good idea. I've not done this in the committed version (and I don't think we necessarily need to do this before the release), but for the future it seems like a good plan. It makes sense that it's faster: there's no need to reference state->steps.

> which certainly looks like it ought to be faster. Also, the real reason
> I got interested in this at all is that with relative jumps, groups of
> steps would be position-independent within the steps array, which would
> enable some compile-time tricks that seem impractical with the current
> definition.

Indeed.

> BTW, now that I've spent a bit of time looking at the generated assembly
> code, I'm kind of disinclined to believe any arguments about how we have
> better control over branch prediction with the jump-threading
> implementation.

I measured the performance difference between using it and not using it, and it came out a pretty clear plus, on gcc 6.3, a gcc master snapshot, and clang-3.9. It's not just that more jumps are duplicated; it's also that the switch() always adds a bounds check.

> At least with current gcc (6.3.1 on Fedora 25) at -O2,
> what I see is multiple places jumping to the same indirect jump
> instruction :-(. It's not a total disaster: as best I can tell, all the
> uses of EEO_JUMP remain distinct. But gcc has chosen to implement about
> 40 of the 71 uses of EEO_NEXT by jumping to the same couple of
> instructions that increment the "op" register and then do an indirect
> jump :-(.

Yea, I see some of that too, "usually" when there's more than just the jump in common. I think there are some gcc parameters that influence this (min-crossjump-insns (5), max-goto-duplication-insns (8)). It might be worthwhile to experiment with setting them locally via a pragma or such. I think Ants wanted to experiment with that, too.

Then there's also https://gcc.gnu.org/bugzilla/show_bug.cgi?id=71785, which causes some forms of computed goto (not ours, I think) to be deoptimized in gcc.
Greetings,

Andres Freund

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers