Hi.

On Wed, Jan 15, 2025 at 07:57, John Naylor <johncnaylo...@gmail.com> wrote:
> On Wed, Jan 15, 2025 at 2:14 PM Tom Lane <t...@sss.pgh.pa.us> wrote:
> >
> > Couple of thoughts:
> >
> > 1. I was actually hoping for a comment on the constant's definition,
> > perhaps along the lines of
> >
> > /*
> >  * The hex expansion of each possible byte value (two chars per value).
> >  */
>
> Works for me. With that, did you mean we then wouldn't need a comment
> in the code?
>
> > 2. Since "src" is defined as "const char *", I'm pretty sure that
> > pickier compilers will complain that
> >
> > +       unsigned char usrc = *((unsigned char *) src);
> >
> > results in casting away const. Recommend
> >
> > +       unsigned char usrc = *((const unsigned char *) src);
>
> Thanks for the reminder!
>
> > 3. I really wonder if
> >
> > +       memcpy(dst, &hextbl[2 * usrc], 2);
> >
> > is faster than copying the two bytes manually, along the lines of
> >
> > +       *dst++ = hextbl[2 * usrc];
> > +       *dst++ = hextbl[2 * usrc + 1];
> >
> > Compilers that inline memcpy() may arrive at the same machine code,
> > but why rely on the compiler to make that optimization? If the
> > compiler fails to do so, an out-of-line memcpy() call will surely
> > be a loser.
>
> See measurements at the end. As for compilers, gcc 3.4.6 and clang
> 3.0.0 can inline the memcpy. The manual copy above only gets combined
> to a single word starting with gcc 12 and clang 15, and latest MSVC
> still can't do it (4A in the godbolt link below). Are there any
> buildfarm animals around that may not inline memcpy for word-sized
> input?
>
> > A variant could be
> >
> > +       const char *hexptr = &hextbl[2 * usrc];
> > +       *dst++ = hexptr[0];
> > +       *dst++ = hexptr[1];
> >
> > but this supposes that the compiler fails to see the common
> > subexpression in the other formulation, which I believe
> > most modern compilers will see.
>
> This combines to a single word starting with clang 5, but does not
> work on gcc 14.2 or gcc trunk (4B below).
> I have gcc 14.2 handy, and
> on my machine bytewise load/stores are somewhere in the middle:
>
> master       1158.969 ms
> v3            776.791 ms
> variant 4A    775.777 ms
> variant 4B    969.945 ms
>
> https://godbolt.org/z/ajToordKq

Your godbolt example contains an important difference, which changes the generated assembly:

-static const char hextbl[] = "000102030405060708090a0b0c0d0e0f101112131415161718191a1b1c1d1e1f202122232425262728292a2b2c2d2e2f303132333435363738393a3b3c3d3e3f404142434445464748494a4b4c4d4e4f505152535455565758595a5b5c5d5e5f606162636465666768696a6b6c6d6e6f707172737475767778797a7b7c7d7e7f808182838485868788898a8b8c8d8e8f909192939495969798999a9b9c9d9e9fa0a1a2a3a4a5a6a7a8a9aaabacadaeafb0b1b2b3b4b5b6b7b8b9babbbcbdbebfc0c1c2c3c4c5c6c7c8c9cacbcccdcecfd0d1d2d3d4d5d6d7d8d9dadbdcdddedfe0e1e2e3e4e5e6e7e8e9eaebecedeeeff0f1f2f3f4f5f6f7f8f9fafbfcfdfeff";
+static const char hextbl[512] = "000102030405060708090a0b0c0d0e0f101112131415161718191a1b1c1d1e1f202122232425262728292a2b2c2d2e2f303132333435363738393a3b3c3d3e3f404142434445464748494a4b4c4d4e4f505152535455565758595a5b5c5d5e5f606162636465666768696a6b6c6d6e6f707172737475767778797a7b7c7d7e7f808182838485868788898a8b8c8d8e8f909192939495969798999a9b9c9d9e9fa0a1a2a3a4a5a6a7a8a9aaabacadaeafb0b1b2b3b4b5b6b7b8b9babbbcbdbebfc0c1c2c3c4c5c6c7c8c9cacbcccdcecfd0d1d2d3d4d5d6d7d8d9dadbdcdddedfe0e1e2e3e4e5e6e7e8e9eaebecedeeeff0f1f2f3f4f5f6f7f8f9fafbfcfdfeff";

best regards,
Ranier Vilela
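
For anyone who wants to try the variants discussed in this thread locally, here is a minimal sketch, not the actual patch. It places the memcpy form (v3) and the two bytewise forms (4A, 4B) side by side and checks that they produce the same output. The function names, the runtime-built table, and the small tbl_implicit/tbl_exact arrays are assumptions made for illustration only. The two small arrays show the one thing that is certain about the diff above: in C, the "[]" form reserves one extra byte for the terminating '\0', while the exact-size form does not. How that affects code generation is for the godbolt output to show; this sketch does not measure speed.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Small stand-ins for the two declarations in the diff (hypothetical names). */
static const char tbl_implicit[] = "00ff";  /* sizeof == 5: includes the '\0' */
static const char tbl_exact[4] = "00ff";    /* sizeof == 4: no '\0' (valid C, not C++) */

/* The hex expansion of each possible byte value (two chars per value). */
static char hextbl[512];    /* built at runtime only to keep the sketch short */

static void
init_hextbl(void)
{
    static const char digits[] = "0123456789abcdef";

    for (int i = 0; i < 256; i++)
    {
        hextbl[2 * i] = digits[i >> 4];
        hextbl[2 * i + 1] = digits[i & 0x0f];
    }
}

/* v3 style: copy both hex digits with a two-byte memcpy(). */
static size_t
hex_encode_memcpy(const char *src, size_t len, char *dst)
{
    for (size_t i = 0; i < len; i++)
    {
        unsigned char usrc = *((const unsigned char *) &src[i]);

        memcpy(dst, &hextbl[2 * usrc], 2);
        dst += 2;
    }
    return len * 2;
}

/* Variant 4A: two bytewise stores, indexing the table twice. */
static size_t
hex_encode_4a(const char *src, size_t len, char *dst)
{
    for (size_t i = 0; i < len; i++)
    {
        unsigned char usrc = *((const unsigned char *) &src[i]);

        *dst++ = hextbl[2 * usrc];
        *dst++ = hextbl[2 * usrc + 1];
    }
    return len * 2;
}

/* Variant 4B: two bytewise stores through a temporary pointer. */
static size_t
hex_encode_4b(const char *src, size_t len, char *dst)
{
    for (size_t i = 0; i < len; i++)
    {
        unsigned char usrc = *((const unsigned char *) &src[i]);
        const char *hexptr = &hextbl[2 * usrc];

        *dst++ = hexptr[0];
        *dst++ = hexptr[1];
    }
    return len * 2;
}

/* Returns 1 if all three variants agree on every possible byte value. */
static int
hex_variants_agree(void)
{
    char        in[256];
    char        a[512], b[512], c[512];

    init_hextbl();
    for (int i = 0; i < 256; i++)
        in[i] = (char) i;

    hex_encode_memcpy(in, sizeof in, a);
    hex_encode_4a(in, sizeof in, b);
    hex_encode_4b(in, sizeof in, c);

    assert(memcmp(a, "000102", 6) == 0);    /* sanity check on the table */
    return memcmp(a, b, sizeof a) == 0 && memcmp(a, c, sizeof a) == 0;
}
```

Calling hex_variants_agree() once is enough to confirm that the three forms are interchangeable in behavior, so any difference between them is purely in the generated code, which is what the measurements above are about.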