Re: [HACKERS] Optimizing COPY

Heikki Linnakangas Wed, 12 Nov 2008 08:22:37 -0800

Chuck McDevitt wrote:

What if the block of text is split in the middle of a multibyte character?
I don't think it is safe to assume raw blocks always end on a character 
boundary.

Yeah, it's not. I realized that myself after submitting. The generic approach is to loop with pg_mblen() to find the maximum safe length. For UTF-8, and probably many other multi-byte encodings as well, we can detect whether a byte is the first byte of a multi-byte character just by looking at the few high bits of the byte.
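
To illustrate the high-bits check for UTF-8, here is a minimal sketch in C. It is not code from the patch: the function names (utf8_is_continuation, utf8_seq_len, utf8_safe_prefix_len) are hypothetical, and it assumes the buffer holds otherwise valid UTF-8. It steps back from the end of the block to the last lead byte and drops that character if its sequence would run past the end of the block.

```c
#include <stddef.h>

/*
 * Hypothetical helpers for illustration only; these are not PostgreSQL
 * functions. They assume the input is otherwise valid UTF-8.
 */

/* Continuation bytes have the bit pattern 10xxxxxx. */
static int
utf8_is_continuation(unsigned char b)
{
    return (b & 0xC0) == 0x80;
}

/* Length of the whole sequence, derived from the lead byte's high bits. */
static size_t
utf8_seq_len(unsigned char lead)
{
    if ((lead & 0x80) == 0x00)
        return 1;               /* 0xxxxxxx: ASCII */
    if ((lead & 0xE0) == 0xC0)
        return 2;               /* 110xxxxx */
    if ((lead & 0xF0) == 0xE0)
        return 3;               /* 1110xxxx */
    if ((lead & 0xF8) == 0xF0)
        return 4;               /* 11110xxx */
    return 1;                   /* invalid lead byte; leave it to later validation */
}

/*
 * Return the largest n <= len such that buf[0..n) does not end in the
 * middle of a multi-byte character.
 */
size_t
utf8_safe_prefix_len(const unsigned char *buf, size_t len)
{
    size_t      start = len;
    size_t      back = 0;

    /* A UTF-8 character is at most 4 bytes, so look back at most 4. */
    while (start > 0 && back < 4)
    {
        start--;
        back++;
        if (!utf8_is_continuation(buf[start]))
        {
            /* Found the last lead byte: is its sequence complete? */
            if (start + utf8_seq_len(buf[start]) <= len)
                return len;
            return start;       /* cut before the incomplete character */
        }
    }

    /* Empty input, or only continuation bytes seen (invalid input). */
    return len;
}
```

A caller would process the first utf8_safe_prefix_len(buf, len) bytes of the raw block and carry the remaining bytes (at most three) over to the front of the next block. For encodings where a lead byte cannot be recognized from its high bits alone, the generic pg_mblen() loop from the start of the block would still be needed.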


--
  Heikki Linnakangas
  EnterpriseDB   http://www.enterprisedb.com

