> Now, in the multibyte case, again in textlen(), I see:
>
>    /* optimization for single byte encoding */
>    if (pg_database_encoding_max_length() <= 1)
>        PG_RETURN_INT32(VARSIZE(t) - VARHDRSZ);
>
>    PG_RETURN_INT32(
>        pg_mbstrlen_with_len(VARDATA(t), VARSIZE(t) - VARHDRSZ));
>
> Three questions here.
> 1) In the case of encoding max length == 1, can we treat it the same as
> the non-multibyte case (I presume they are exactly the same)?

Yes.

> 2) Can encoding max length ever be < 1? Doesn't make sense to me.

No. It seems to be just defensive coding.

> 3) In the case of encoding max length > 1, if I understand correctly,
> each encoded character can be one *or more* bytes, up to and including
> encoding max length bytes.

Right.

> So the *only* way presently to get the length
> of the original character string is to loop through the entire string
> checking the length of each individual character (that's what
> pg_mbstrlen_with_len() does it seems)?

Yes.

> Finally, if 3) is true, then there is no way to avoid the retrieval and
> decompression of the datum just to find out its length. For large
> datums, detoasting plus the looping through each character would add a
> huge amount of overhead just to get at the length of the original
> string. I don't know if we need to be able to get *just* the length
> often enough to really care, but if we do, I had an idea for some future
> release (I wouldn't propose doing this for 7.3):
>
> - add a new EXTENDED state to va_external for MULTIBYTE
> - any string with max encoding length > 1 would be EXTENDED even if it
>   is not EXTERNAL and not COMPRESSED.
> - to each of the structs in the union, add va_strlen
> - populate va_strlen on INSERT and maintain it on UPDATE.
>
> Now a new function similar to toast_raw_datum_size(), maybe
> toast_raw_datum_strlen() could be used to get the original string
> length, whether MB or not, without needing to retrieve and decompress
> the entire datum.
>
> I understand we would either: have to steal another bit from the VARHDR
> which would reduce the effective size of a varlena from 1GB down to .5GB;
> or we would need to add a byte or two to the VARHDR which is extra
> per-datum overhead. I'm not sure we would want to do either. But I
> wanted to toss out the idea while it was fresh on my mind.

Interesting idea. I was also thinking about adding some extra
information to text data types, such as character set, collation,
etc., for 7.4 or later.
--
Tatsuo Ishii
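
For illustration, the per-character loop discussed in question 3 can be sketched roughly as below. This is only a minimal standalone sketch that assumes UTF-8 as the multibyte encoding; it is not the actual pg_mbstrlen_with_len() source, and the names sketch_utf8_charlen() and sketch_mbstrlen_with_len() are hypothetical. It shows why the character count cannot be derived from the byte length alone: every lead byte has to be inspected.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Hypothetical helper: byte length (1..4) of the UTF-8 character whose
 * lead byte is c. Invalid lead bytes are counted as one-byte characters
 * in this sketch.
 */
int
sketch_utf8_charlen(unsigned char c)
{
    if (c < 0x80)
        return 1;
    if ((c & 0xE0) == 0xC0)
        return 2;
    if ((c & 0xF0) == 0xE0)
        return 3;
    if ((c & 0xF8) == 0xF0)
        return 4;
    return 1;
}

/*
 * Hypothetical counterpart of the loop described in the thread: count
 * the characters in the first len bytes of s. The string need not be
 * NUL-terminated, so only len bounds the walk.
 */
int
sketch_mbstrlen_with_len(const char *s, int len)
{
    int nchars = 0;

    assert(s != NULL && len >= 0);

    while (len > 0)
    {
        int l = sketch_utf8_charlen((unsigned char) *s);

        if (l > len)            /* truncated trailing character */
            l = len;
        s += l;
        len -= l;
        nchars++;
    }
    return nchars;
}
```

The cost is linear in the byte length of the string, which is the overhead the quoted proposal hopes to avoid by storing the character count alongside the datum.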