Tatsuo Ishii wrote:
>> Now a new function similar to toast_raw_datum_size(), maybe
>> toast_raw_datum_strlen(), could be used to get the original string
>> length, whether MB or not, without needing to retrieve and decompress
>> the entire datum.
>>
>> I understand we would either have to steal another bit from the VARHDR,
>> which would reduce the effective size of a varlena from 1GB down to
>> 0.5GB, or we would need to add a byte or two to the VARHDR, which is
>> extra per-datum overhead. I'm not sure we would want to do either, but
>> I wanted to toss out the idea while it was fresh on my mind.
>
> Interesting idea. I was also thinking about adding some extra
> information to text data types, such as character set, collation,
> etc., for 7.4 or later.
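As a back-of-the-envelope check on the 1GB-to-0.5GB figure above: with a 4-byte VARHDR, the bits of the header word not used for flags carry the byte length, so each flag bit stolen halves the maximum datum size. The helper below is purely illustrative, not PostgreSQL source:

```c
#include <stdint.h>

/* Hypothetical illustration: if the varlena header word devotes
 * `length_bits` bits to the byte length, the largest representable
 * datum is 2^length_bits bytes.  With 30 length bits that is 1GB;
 * stealing one more bit for a "string length stored" flag would
 * leave 29 bits, i.e. 0.5GB. */
static uint64_t max_datum_size(unsigned length_bits)
{
    return (uint64_t) 1 << length_bits;
}
```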
I ran some tests to confirm the theory above regarding overhead:

    create table strtest(f1 text);

    do 100 times
        insert into strtest values('12345....');  -- 100000 characters
    loop

    do 1000 times
        select length(f1) from strtest;
    loop

Results:

SQL_ASCII database, new code:
=============================
    PG_RETURN_INT32(toast_raw_datum_size(PG_GETARG_DATUM(0)) - VARHDRSZ);
==> 2 seconds

SQL_ASCII database, old code:
=============================
    text *t = PG_GETARG_TEXT_P(0);
    PG_RETURN_INT32(VARSIZE(t) - VARHDRSZ);
==> 66 seconds

EUC_JP database, new & old code:
================================
    text *t = PG_GETARG_TEXT_P(0);
    PG_RETURN_INT32(pg_mbstrlen_with_len(VARDATA(t), VARSIZE(t) - VARHDRSZ));
==> 469 seconds

So it appears that, while detoasting is moderately expensive (it adds 64
seconds to the test), the call to pg_mbstrlen_with_len() is very expensive
(it adds 403 seconds to the test).

Joe
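The 403-second gap makes sense when you consider what a multibyte strlen has to do: unlike the single-byte case, where the character count equals the byte count and can be read straight from the header, it must walk the buffer one character at a time. A minimal standalone sketch of that walk, assuming a simplified EUC_JP-style rule where any lead byte >= 0x80 starts a two-byte character (the real pg_mbstrlen_with_len() dispatches through per-encoding tables):

```c
#include <stddef.h>

/* Simplified EUC_JP-style character width: a lead byte >= 0x80 begins
 * a two-byte character, anything else is a single ASCII byte.  This is
 * an assumption for illustration, not PostgreSQL's encoding table. */
static int euc_jp_mblen(const unsigned char *s)
{
    return (*s >= 0x80) ? 2 : 1;
}

/* Analogous in spirit to pg_mbstrlen_with_len(): an O(n) walk over the
 * bytes with a per-character width lookup.  This per-character work on
 * every call is what dominates the 469-second EUC_JP result, where the
 * single-byte case can just subtract VARHDRSZ from a stored size. */
static size_t mb_strlen_with_len(const char *s, size_t len)
{
    size_t chars = 0;
    while (len > 0)
    {
        int l = euc_jp_mblen((const unsigned char *) s);
        s += l;
        len -= l;
        chars++;
    }
    return chars;
}
```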