I wrote:

> I don't think the new structuring will pose any challenges for rebasing
> 0002, either. This might need some experimentation, though:
>
> + * Subroutine of pg_utf8_verifystr() to check on char. Returns the length of the
> + * character at *s in bytes, or 0 on invalid input or premature end of input.
> + *
> + * XXX: could this be combined with pg_utf8_verifychar above?
> + */
> +static inline int
> +pg_utf8_verify_one(const unsigned char *s, int len)
>
> It seems like it would be easy to have pg_utf8_verify_one in my proposed
> pg_utf8.h header and replace the body of pg_utf8_verifychar with it.

0001: I went ahead and tried this for v15, and also attempted some clean-up:

- Rename pg_utf8_verify_one to pg_utf8_verifychar_internal.
- Have pg_utf8_verifychar_internal return -1 for invalid input to match
other functions in the file. We could also do this for check_ascii, but
it's not quite the same thing, because the string could still have valid
bytes in it, just not enough to advance the pointer by the stride length.
- Remove hard-coded numbers (not wedded to this).
- Use a call to pg_utf8_verifychar in the slow path.
- Reduce pg_utf8_verifychar to a thin wrapper around
pg_utf8_verifychar_internal.

The last two changes aren't strictly necessary, but they prevent bloating
the binary in the slow path and aid readability. For 0002, this required
putting pg_utf8_verifychar* in src/port. (While writing this I noticed I
neglected to explain that with a comment, though.)
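To make the shape of this concrete, here is a rough, self-contained sketch
of the structure described above. It is not the code from the v15 patch:
every name ending in _sketch, the STRIDE_LENGTH value, and the
IS_CONTINUATION macro are made up for illustration, and the byte-at-a-time
ASCII check stands in for whatever the patch actually does. It only shows
an internal per-character verifier that returns -1 on invalid input, a thin
wrapper around it, and a string verifier whose slow path calls the wrapper.

```c
#include <stddef.h>

/* Hypothetical stride for the ASCII fast path; the patch may differ. */
#define STRIDE_LENGTH 8
#define IS_CONTINUATION(b) (((b) & 0xC0) == 0x80)

/* Returns the byte length of the character at *s, or -1 if invalid. */
static inline int
verifychar_internal_sketch(const unsigned char *s, int len)
{
    unsigned char b1;

    if (len < 1)
        return -1;
    b1 = s[0];

    if (b1 < 0x80)              /* ASCII; this sketch rejects NUL bytes */
        return (b1 == 0) ? -1 : 1;
    if (b1 >= 0xC2 && b1 <= 0xDF)       /* two-byte sequence */
    {
        if (len < 2 || !IS_CONTINUATION(s[1]))
            return -1;
        return 2;
    }
    if (b1 >= 0xE0 && b1 <= 0xEF)       /* three-byte sequence */
    {
        if (len < 3 || !IS_CONTINUATION(s[1]) || !IS_CONTINUATION(s[2]))
            return -1;
        if (b1 == 0xE0 && s[1] < 0xA0)  /* overlong encoding */
            return -1;
        if (b1 == 0xED && s[1] > 0x9F)  /* UTF-16 surrogate range */
            return -1;
        return 3;
    }
    if (b1 >= 0xF0 && b1 <= 0xF4)       /* four-byte sequence */
    {
        if (len < 4 || !IS_CONTINUATION(s[1]) ||
            !IS_CONTINUATION(s[2]) || !IS_CONTINUATION(s[3]))
            return -1;
        if (b1 == 0xF0 && s[1] < 0x90)  /* overlong encoding */
            return -1;
        if (b1 == 0xF4 && s[1] > 0x8F)  /* above U+10FFFF */
            return -1;
        return 4;
    }
    return -1;                  /* stray continuation or invalid lead byte */
}

/* Thin wrapper, so callers share one out-of-line copy of the logic. */
int
verifychar_sketch(const unsigned char *s, int len)
{
    return verifychar_internal_sketch(s, len);
}

/*
 * Returns STRIDE_LENGTH if the next STRIDE_LENGTH bytes are all non-zero
 * ASCII, else 0. A result of 0 does not mean the input is invalid, only
 * that the pointer cannot advance by a whole stride. The caller must
 * guarantee that STRIDE_LENGTH bytes are readable.
 */
static inline int
check_ascii_sketch(const unsigned char *s)
{
    for (int i = 0; i < STRIDE_LENGTH; i++)
    {
        if (s[i] == 0 || (s[i] & 0x80))
            return 0;
    }
    return STRIDE_LENGTH;
}

/* Returns the number of leading bytes of s that are valid. */
int
verifystr_sketch(const unsigned char *s, int len)
{
    const unsigned char *start = s;

    while (len > 0)
    {
        int l;

        if (len >= STRIDE_LENGTH && (l = check_ascii_sketch(s)) > 0)
        {
            s += l;             /* fast path: whole stride was ASCII */
            len -= l;
            continue;
        }
        l = verifychar_sketch(s, len);  /* slow path: one character */
        if (l == -1)
            break;
        s += l;
        len -= l;
    }
    return (int) (s - start);
}
```

Calling the out-of-line wrapper from the slow path, rather than inlining
the internal function there again, is what keeps the per-character logic
from being duplicated in the binary.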

Feedback welcome on any of the above.

Since by now it hardly resembles the simdjson (or Fuchsia for that matter)
fallback that it took inspiration from, I've removed that mention from the
commit message.

0002: Just a rebase to work with the above. One possible review point: We
don't really need to have separate control over whether to use special
instructions for CRC and UTF-8. It should probably be just one configure
knob, but having them separate is perhaps easier to review.

--
John Naylor
EDB: http://www.enterprisedb.com

Attachment: v15-0001-Rewrite-pg_utf8_verifystr-for-speed.patch
Description: Binary data

Attachment: v15-0002-Use-SSE-instructions-for-pg_utf8_verifystr-where.patch
Description: Binary data
