Martijn van Oosterhout wrote:
> On Fri, Sep 15, 2006 at 10:01:19AM +0100, Heikki Linnakangas wrote:
>> Actually, you can determine the length of a UTF-8 encoded character by
>> looking at the most significant bits of the first byte. So we could
>> store a UTF-8 encoded CHAR(1) field without any additional length header.
>
> Except in postgres the length of a datum is currently only determined
> from the type, or from a standard varlena header. Going down the road
> of having to call type specific length functions for the values in
> columns 1 to n-1 just to read column n seems like a really bad idea.
>
> We want to make access to later columns *faster* not slower, which
> means keeping to the simplest (code-wise) scheme possible.

We really have two goals. We want to reduce on-disk storage size to save I/O, and we want to keep processing simple to save CPU. Some ideas help one goal but hurt the other, so we have to strike a balance between the two.

My gut feeling is that calling type-specific length functions wouldn't be that bad compared to what we have now or the newly proposed varlena scheme, but until someone actually tries it and shows some numbers, this is just hand-waving.
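
To make the trade-off concrete, here is a minimal sketch in C of what a type-specific length function for a header-less UTF-8 CHAR(1) could look like, and of the walk over the earlier columns that it implies. This is not PostgreSQL code: the names utf8_char_len and char1_column_offset are made up for illustration, and the row is assumed to consist only of header-less CHAR(1) fields with no nulls or alignment padding.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Length in bytes of a UTF-8 encoded character, derived only from the
 * most significant bits of its first byte. Returns 0 if the byte cannot
 * start a character (for example a continuation byte, 10xxxxxx).
 */
static size_t utf8_char_len(unsigned char first)
{
    if (first < 0x80)
        return 1;               /* 0xxxxxxx: ASCII */
    if ((first & 0xE0) == 0xC0)
        return 2;               /* 110xxxxx */
    if ((first & 0xF0) == 0xE0)
        return 3;               /* 1110xxxx */
    if ((first & 0xF8) == 0xF0)
        return 4;               /* 11110xxx */
    return 0;                   /* not a valid lead byte */
}

/*
 * Hypothetical helper: byte offset of column n (0-based) in a row made
 * up solely of header-less UTF-8 CHAR(1) fields. The length function
 * has to be called once for each of the n preceding columns, which is
 * the per-column cost under discussion. Returns -1 on malformed or
 * truncated input.
 */
static long char1_column_offset(const unsigned char *row, size_t row_len, int n)
{
    size_t off = 0;

    assert(row != NULL);

    for (int i = 0; i < n; i++)
    {
        size_t len;

        if (off >= row_len)
            return -1;          /* ran out of data before column n */
        len = utf8_char_len(row[off]);
        if (len == 0 || off + len > row_len)
            return -1;          /* bad lead byte or truncated character */
        off += len;
    }
    return (long) off;
}
```

The per-column work here is a few comparisons on one byte, so the open question is how that compares in practice with decoding a varlena header for each preceding column. Only a benchmark can answer that.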

Heikki Linnakangas

