Dennis Bjorklund <[EMAIL PROTECTED]> writes:
> ... This also means that the start byte can never start with 7 or 8
> ones, that is illegal and should be tested for and rejected. So the
> longest utf-8 sequence is 6 bytes (and the longest character needs 4
> bytes (or 31 bits)).
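To make the quoted rule concrete, here is a minimal sketch in C of the start-byte check it describes, assuming the original 31-bit UTF-8 scheme with sequences of up to 6 bytes. The names utf8_seq_len and utf8_payload_bits are hypothetical, invented for this illustration; this is not the code in our tree.

```c
/*
 * Hypothetical helper, for illustration only: return the sequence length
 * implied by a UTF-8 start byte under the original 6-byte scheme, or -1
 * if the byte cannot start a sequence.
 */
int
utf8_seq_len(unsigned char c)
{
    if ((c & 0x80) == 0x00) return 1;   /* 0xxxxxxx */
    if ((c & 0xE0) == 0xC0) return 2;   /* 110xxxxx */
    if ((c & 0xF0) == 0xE0) return 3;   /* 1110xxxx */
    if ((c & 0xF8) == 0xF0) return 4;   /* 11110xxx */
    if ((c & 0xFC) == 0xF8) return 5;   /* 111110xx */
    if ((c & 0xFE) == 0xFC) return 6;   /* 1111110x */

    /*
     * What is left is either a continuation byte (10xxxxxx) or a byte
     * starting with 7 or 8 ones (0xFE, 0xFF); neither may start a sequence.
     */
    return -1;
}

/*
 * Hypothetical helper: number of payload bits carried by a sequence of
 * the given length (1..6).  The start byte of a multibyte sequence holds
 * 7 - len bits and each continuation byte holds 6.
 */
int
utf8_payload_bits(int len)
{
    if (len == 1)
        return 7;
    return (7 - len) + 6 * (len - 1);
}
```

A 6-byte sequence carries 1 + 5 * 6 = 31 payload bits, which is where the "31 bits" figure in the quoted text comes from, and why such a character fits in a 4-byte internal representation.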
Tatsuo would know more about this than me, but it looks from here like
our coding was originally designed to support only 16-bit-wide internal
characters (ie, 16-bit pg_wchar datatype width).  I believe that the
regex library limitation here is gone, and that as far as that library
is concerned we could assume a 32-bit internal character width.  The
question at hand is whether we can support 32-bit characters or not ---
and if not, what's the next bug to fix?

			regards, tom lane