On Sun, 13 Dec 2020 at 9:28 PM, Andrey Borodin <x4...@yandex-team.ru> wrote:
> +1
> This will make all INSERTs and UPDATES for tsvector's GiSTs.

Oh, I didn't realize that this code is used in GiST index insertion and creation too. Will check there.

> Also I really like idea of taking advantage of hardware capabilities like
> __builtin_* etc wherever possible.

Yes. Also, __builtin_popcount() uses SIMD instructions (on arm64: "cnt v0.8b, v0.8b"), so there's all the more reason to use it. Beyond that, I had thought that if the compiler could auto-vectorize the byte-by-byte XOR operation and the popcount() call, we would benefit further, but I didn't see any more improvement. I had hoped for a benefit because that would have allowed us to process the data in 128-bit or 256-bit chunks, since the vector registers are at least that long. Maybe gcc is not smart enough to translate __builtin_popcount() into 128/256-bit vectorized instructions. But for the XOR operator, it does generate 128-bit vectorized instructions (on arm64: "eor v2.16b, v2.16b, v18.16b").

> Meanwhile there are at least 4 incarnation of hemdistsign() functions that
> are quite similar. I'd propose to refactor them somehow...

Yes, I hope we get the benefit there also. Before that, I thought I should post the first use-case to get some early comments. Thanks for your encouraging comments :)
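
For reference, here is a minimal sketch of the two loop shapes discussed above: a byte-by-byte XOR plus popcount, and a variant that works on 64-bit chunks. This is not the patch itself. The function names hamming_bytewise() and hamming_wordwise() are made up for illustration, and the only assumption about the toolchain is that gcc or clang provides __builtin_popcount() and __builtin_popcountll().

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/*
 * Hypothetical helper (not a PostgreSQL function): Hamming distance between
 * two equal-length signatures, XORing and counting one byte at a time.
 */
static int
hamming_bytewise(const unsigned char *a, const unsigned char *b, size_t len)
{
    int     dist = 0;

    for (size_t i = 0; i < len; i++)
        dist += __builtin_popcount((unsigned int) (a[i] ^ b[i]));

    return dist;
}

/*
 * Hypothetical helper: same result, but XOR and popcount run on 64-bit
 * chunks, with a byte-wise loop for any remaining tail bytes.
 */
static int
hamming_wordwise(const unsigned char *a, const unsigned char *b, size_t len)
{
    int     dist = 0;
    size_t  i = 0;

    for (; i + sizeof(uint64_t) <= len; i += sizeof(uint64_t))
    {
        uint64_t    x;
        uint64_t    y;

        /* memcpy avoids unaligned access and strict-aliasing problems */
        memcpy(&x, a + i, sizeof(x));
        memcpy(&y, b + i, sizeof(y));
        dist += __builtin_popcountll(x ^ y);
    }

    /* leftover bytes when len is not a multiple of 8 */
    for (; i < len; i++)
        dist += __builtin_popcount((unsigned int) (a[i] ^ b[i]));

    return dist;
}
```

Both functions should return the same count for any input. Whether the chunked form is actually faster depends on the compiler and the target, so it would need to be measured.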