Re: [HACKERS] Fixed length data types issue

2006-09-22 Thread Bruno Wolff III
On Mon, Sep 11, 2006 at 19:05:12 -0400, Gregory Stark <[EMAIL PROTECTED]> wrote: > > I'm not sure how gmp and the others represent their data but my first guess is > that there's no particular reason the base of the mantissa and exponent have > to be the same as the base the exponent is interpre

Re: [HACKERS] Fixed length data types issue

2006-09-18 Thread Bruno Wolff III
On Fri, Sep 08, 2006 at 15:08:18 -0400, Andrew Dunstan <[EMAIL PROTECTED]> wrote: > > From time to time the idea of a logical vs physical mapping for columns > has been mentioned. Among other benefits, that might allow us to do some > rearrangement of physical ordering to reduce space wasted o

Re: [HACKERS] Fixed length data types issue

2006-09-15 Thread Gregory Stark
Martijn van Oosterhout writes: > I don't think making a special typlen value just for a type that can > store a single UTF-8 character is smart. I just can't see enough use to > make it worth it. Well there are lots of data types that can probably tell how long they are based on internal state.

Re: [HACKERS] Fixed length data types issue

2006-09-15 Thread Martijn van Oosterhout
On Fri, Sep 15, 2006 at 01:38:54PM +0200, Mario Weilguni wrote: > What about the "char" type? Isn't it designed for that? Or will this type > disappear in future releases? "char" is used in the system catalogs, I don't think it's going to go any time soon. There it's used as a (surprise) single

Re: [HACKERS] Fixed length data types issue

2006-09-15 Thread Mario Weilguni
Martijn van Oosterhout wrote: > I don't think making a special typlen value just for a type that can > store a single UTF-8 character is smart. I just can't see en

Re: [HACKERS] Fixed length data types issue

2006-09-15 Thread Heikki Linnakangas
Martijn van Oosterhout wrote: I don't think making a special typlen value just for a type that can store a single UTF-8 character is smart. I just can't see enough use to make it worth it. Assuming that we can set encoding per-column one day, I agree. If you have a CHAR(1) field, you're goi

Re: [HACKERS] Fixed length data types issue

2006-09-15 Thread Martijn van Oosterhout
On Fri, Sep 15, 2006 at 11:43:52AM +0100, Heikki Linnakangas wrote: > My gut feeling is that it wouldn't be that bad compared to what we have > now or the new proposed varlena scheme, but before someone actually > tries it and shows some numbers, this is just hand-waving. Well, that depends on w

Re: [HACKERS] Fixed length data types issue

2006-09-15 Thread Heikki Linnakangas
Martijn van Oosterhout wrote: On Fri, Sep 15, 2006 at 10:01:19AM +0100, Heikki Linnakangas wrote: Actually, you can determine the length of a UTF-8 encoded character by looking at the most significant bits of the first byte. So we could store a UTF-8 encoded CHAR(1) field without any additional

Re: [HACKERS] Fixed length data types issue

2006-09-15 Thread Martijn van Oosterhout
On Fri, Sep 15, 2006 at 10:01:19AM +0100, Heikki Linnakangas wrote: > Gregory Stark wrote: > >It's limited but I wouldn't say it's very limiting. In the cases where it > >doesn't apply there's no way out anyways. A UTF8 field will need a length > >header in some form. > > Actually, you can determi

Re: [HACKERS] Fixed length data types issue

2006-09-15 Thread Heikki Linnakangas
Gregory Stark wrote: It's limited but I wouldn't say it's very limiting. In the cases where it doesn't apply there's no way out anyways. A UTF8 field will need a length header in some form. Actually, you can determine the length of a UTF-8 encoded character by looking at the most significant b
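
The point about reading the length of a UTF-8 character from the most significant bits of its first byte can be sketched in C as follows. This is only an illustration: `utf8_char_len` is a hypothetical name, not a PostgreSQL function, and the sketch does not validate continuation bytes or reject overlong encodings.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Length in bytes of a UTF-8 encoded character, judged only from the
 * high-order bits of its first byte. Returns 0 when the byte cannot
 * start a character (a continuation byte 10xxxxxx, or 0xF8 and above).
 * Hypothetical helper for illustration only.
 */
size_t
utf8_char_len(unsigned char first)
{
    if ((first & 0x80) == 0x00)
        return 1;               /* 0xxxxxxx: plain ASCII */
    if ((first & 0xE0) == 0xC0)
        return 2;               /* 110xxxxx */
    if ((first & 0xF0) == 0xE0)
        return 3;               /* 1110xxxx */
    if ((first & 0xF8) == 0xF0)
        return 4;               /* 11110xxx */
    return 0;                   /* not a valid lead byte */
}
```

Under this scheme a CHAR(1) value would carry its own length in its first byte, which is the property the message relies on.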

Re: [HACKERS] Fixed length data types issue

2006-09-14 Thread mark
On Thu, Sep 14, 2006 at 10:21:30PM +0100, Gregory Stark wrote: > >> One very nifty trick would be to fix "char" to act as CHAR(), and map > >> CHAR(1) automatically to "char". > > Sorry, probably a stupid idea considering multi-byte encodings. I > > suppose it could be an optimization for single-b

Re: [HACKERS] Fixed length data types issue

2006-09-14 Thread Gregory Stark
Bruce Momjian <[EMAIL PROTECTED]> writes: >> One very nifty trick would be to fix "char" to act as CHAR(), and map >> CHAR(1) automatically to "char". > > Sorry, probably a stupid idea considering multi-byte encodings. I > suppose it could be an optimization for single-byte encodings, but that >

Re: [HACKERS] Fixed length data types issue

2006-09-14 Thread Bruce Momjian
Bruce Momjian wrote: > Gregory Stark wrote: > > > > Alvaro Herrera <[EMAIL PROTECTED]> writes: > > > > > Gregory Stark wrote: > > >> > > >> Well "char" doesn't have quite the same semantics as CHAR(1). If that's > > >> the > > >> consensus though then I can work on either fixing "char" semantic

Re: [HACKERS] Fixed length data types issue

2006-09-14 Thread Mark Dilger
My apologies if you are seeing this twice. I posted it last night, but it still does not appear to have made it to the group. Mark Dilger wrote: Tom Lane wrote: Mark Dilger <[EMAIL PROTECTED]> writes: Tom Lane wrote: Please provide a stack trace --- AFAIK there shouldn't be any reason why

Re: [HACKERS] Fixed length data types issue

2006-09-14 Thread Bruce Momjian
Gregory Stark wrote: > > Alvaro Herrera <[EMAIL PROTECTED]> writes: > > > Gregory Stark wrote: > >> > >> Well "char" doesn't have quite the same semantics as CHAR(1). If that's the > >> consensus though then I can work on either fixing "char" semantics to match > >> CHAR(1) or adding a separate

Re: [HACKERS] Fixed length data types issue

2006-09-14 Thread Markus Schaber
Hi, Jim, Jim Nasby wrote: > I'd love to have the ability to control toasting thresholds manually. > This could result in a lot of speed improvements in cases where a > varlena field isn't frequently accessed and will be fairly large, yet > not large enough to normally trigger toasting. An address

Re: [HACKERS] Fixed length data types issue

2006-09-13 Thread Mark Dilger
Tom Lane wrote: Mark Dilger <[EMAIL PROTECTED]> writes: Tom Lane wrote: Please provide a stack trace --- AFAIK there shouldn't be any reason why a pass-by-ref 3-byte type wouldn't work. (gdb) bt #0 0xb7e01d45 in memcpy () from /lib/libc.so.6 #1 0x08077ece in heap_fill_tuple (tupleDesc=0x83

Re: [HACKERS] Fixed length data types issue

2006-09-13 Thread Tom Lane
Mark Dilger <[EMAIL PROTECTED]> writes: > Tom Lane wrote: >> Please provide a stack trace --- AFAIK there shouldn't be any reason why >> a pass-by-ref 3-byte type wouldn't work. > (gdb) bt > #0 0xb7e01d45 in memcpy () from /lib/libc.so.6 > #1 0x08077ece in heap_fill_tuple (tupleDesc=0x83c2ef7,

Re: [HACKERS] Fixed length data types issue

2006-09-13 Thread Mark Dilger
Tom Lane wrote: Mark Dilger <[EMAIL PROTECTED]> writes: int1 works perfectly, as far as I can tell. int3 works great in memory, but can't be stored to a table. The problem seems to be that store_att_byval allows data of size 1 byte but not size 3 bytes, forcing me to pass int3 by reference.
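
For readers unfamiliar with the idea of a 3-byte integer that upcasts to a regular integer, here is a minimal sketch of packing and unpacking such a value. It is not Mark Dilger's implementation; the names `int3_pack` and `int3_unpack` and the little-endian byte order are assumptions made for the example.

```c
#include <assert.h>
#include <stdint.h>

/* Store a value in the range of a signed 24-bit integer into 3 bytes.
 * Hypothetical helper; least significant byte first (an assumption). */
void
int3_pack(int32_t v, unsigned char out[3])
{
    uint32_t u = (uint32_t) v;

    assert(v >= -8388608 && v <= 8388607);
    out[0] = (unsigned char) (u & 0xFF);
    out[1] = (unsigned char) ((u >> 8) & 0xFF);
    out[2] = (unsigned char) ((u >> 16) & 0xFF);
}

/* Rebuild the 32-bit value, sign-extending from bit 23. */
int32_t
int3_unpack(const unsigned char in[3])
{
    uint32_t u = (uint32_t) in[0]
               | ((uint32_t) in[1] << 8)
               | ((uint32_t) in[2] << 16);

    /* u is at most 0xFFFFFF, so both branches stay within int32_t range */
    return (u & 0x800000u) ? (int32_t) u - 0x1000000 : (int32_t) u;
}
```

The unpack step is the "upcast to the canonical built-in type" mentioned elsewhere in the thread: once widened, the value can use the existing integer operators.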

Re: [HACKERS] Fixed length data types issue

2006-09-13 Thread Arturo Perez
In article <[EMAIL PROTECTED]>, [EMAIL PROTECTED] (Jim Nasby) wrote: > I'd love to have the ability to control toasting thresholds > manually. ... Being able to force a field to be > toasted before it normally would could drastically improve tuple > density without requiring the developer t

Re: [HACKERS] Fixed length data types issue

2006-09-13 Thread Tom Lane
Mark Dilger <[EMAIL PROTECTED]> writes: > int1 works perfectly, as far as I can tell. int3 works great in memory, > but can't be stored to a table. The problem seems to be that > store_att_byval allows data of size 1 byte but not size 3 bytes, forcing > me to pass int3 by reference. But when

Re: [HACKERS] Fixed length data types issue

2006-09-13 Thread Mark Dilger
Mark Dilger wrote: Tom Lane wrote: Mark Dilger <[EMAIL PROTECTED]> writes: ... The argument made upthread that a quadratic number of conversion operators is necessitated doesn't seem right to me, given that each type could upcast to the canonical built in type. (int1 => smallint, int3 => int

Re: [HACKERS] Fixed length data types issue

2006-09-13 Thread Jim Nasby
On Sep 11, 2006, at 1:57 PM, Gregory Stark wrote: Tom Lane <[EMAIL PROTECTED]> writes: I think it's more important to pick bit patterns that reduce the number of cases heap_deform_tuple has to think about while decoding the length of a field --- every "if" in that inner loop is expensive.

Re: [HACKERS] Fixed length data types issue

2006-09-12 Thread Gregory Stark
Alvaro Herrera <[EMAIL PROTECTED]> writes: > Gregory Stark wrote: >> >> Well "char" doesn't have quite the same semantics as CHAR(1). If that's the >> consensus though then I can work on either fixing "char" semantics to match >> CHAR(1) or adding a separate type instead. > > What semantics?

Re: [HACKERS] Fixed length data types issue

2006-09-12 Thread Simon Riggs
On Mon, 2006-09-11 at 14:25 -0400, Tom Lane wrote: > Simon Riggs <[EMAIL PROTECTED]> writes: > > Is this an 8.2 thing? > > You are joking, no? Confirming, using an open question, and a smile. -- Simon Riggs EnterpriseDB http://www.enterprisedb.com --

Re: [HACKERS] Fixed length data types issue

2006-09-11 Thread mark
On Mon, Sep 11, 2006 at 07:05:12PM -0400, Gregory Stark wrote: > Tom Lane <[EMAIL PROTECTED]> writes: > > Gregory Stark <[EMAIL PROTECTED]> writes: > > > At first I meant that as a reductio ad absurdum argument, but, uh, > > > come to think of it why *do* we have our own arbitrary precision > > > l

Re: [HACKERS] Fixed length data types issue

2006-09-11 Thread Gregory Stark
Tom Lane <[EMAIL PROTECTED]> writes: > Gregory Stark <[EMAIL PROTECTED]> writes: > > At first I meant that as a reductio ad absurdum argument, but, uh, > > come to think of it why *do* we have our own arbitrary precision > > library? Is there any particular reason we can't use one of the > > exis

Re: [HACKERS] Fixed length data types issue

2006-09-11 Thread Tom Lane
Gregory Stark <[EMAIL PROTECTED]> writes: > At first I meant that as a reductio ad absurdum argument, but, uh, > come to think of it why *do* we have our own arbitrary precision > library? Is there any particular reason we can't use one of the > existing binary implementations? Going over to binar

Re: [HACKERS] Fixed length data types issue

2006-09-11 Thread Gregory Stark
Tom Lane <[EMAIL PROTECTED]> writes: > That's utterly irrelevant. The point is that there are standard > applications today in which people need that much precision; therefore, > the argument that "10^508 is far more than anyone could want" is on > exceedingly shaky ground. My point is those app

Re: [HACKERS] Fixed length data types issue

2006-09-11 Thread Tom Lane
Gregory Stark <[EMAIL PROTECTED]> writes: > Tom Lane <[EMAIL PROTECTED]> writes: >> No, that got rejected as being too much of a restriction of the dynamic >> range, eg John's comment here: >> http://archives.postgresql.org/pgsql-general/2005-12/msg00246.php > That logic seems questionable. John m

Re: [HACKERS] Fixed length data types issue

2006-09-11 Thread Gregory Stark
Tom Lane <[EMAIL PROTECTED]> writes: > No, that got rejected as being too much of a restriction of the dynamic > range, eg John's comment here: > http://archives.postgresql.org/pgsql-general/2005-12/msg00246.php That logic seems questionable. John makes two points: a) crypto applications are wit

Re: [HACKERS] Fixed length data types issue

2006-09-11 Thread mark
On Mon, Sep 11, 2006 at 01:15:43PM -0400, Tom Lane wrote: > Gregory Stark <[EMAIL PROTECTED]> writes: > > In any case it seems a bit backwards to me. Wouldn't it be better to > > preserve bits in the case of short length words where they're precious > > rather than long ones? If we make 0xxx th

Re: [HACKERS] Fixed length data types issue

2006-09-11 Thread Tom Lane
Simon Riggs <[EMAIL PROTECTED]> writes: > Is this an 8.2 thing? You are joking, no? > If not, is Numeric508 applied? No, that got rejected as being too much of a restriction of the dynamic range, eg John's comment here: http://archives.postgresql.org/pgsql-general/2005-12/msg00246.php I think a

Re: [HACKERS] Fixed length data types issue

2006-09-11 Thread Gregory Stark
Tom Lane <[EMAIL PROTECTED]> writes: > Gregory Stark <[EMAIL PROTECTED]> writes: >> In any case it seems a bit backwards to me. Wouldn't it be better to >> preserve bits in the case of short length words where they're precious >> rather than long ones? If we make 0xxx the 1-byte case it means

Re: [HACKERS] Fixed length data types issue

2006-09-11 Thread Tom Lane
Gregory Stark <[EMAIL PROTECTED]> writes: > In any case it seems a bit backwards to me. Wouldn't it be better to > preserve bits in the case of short length words where they're precious > rather than long ones? If we make 0xxx the 1-byte case it means ... Well, I don't find that real persuasiv

Re: [HACKERS] Fixed length data types issue

2006-09-11 Thread Alvaro Herrera
Gregory Stark wrote: > > Alvaro Herrera <[EMAIL PROTECTED]> writes: > > > > Well it's irrelevant if we add a special data type to handle CHAR(1). > > > > In that case you should probably be using "char" ... > > Well "char" doesn't have quite the same semantics as CHAR(1). If that's the > consen

Re: [HACKERS] Fixed length data types issue

2006-09-11 Thread Tom Lane
Gregory Stark <[EMAIL PROTECTED]> writes: > Tom Lane <[EMAIL PROTECTED]> writes: >> I'm imagining that it would give you the same old uncompressed in-memory >> representation as it does now, ie, 4-byte length word and uncompressed >> data. > Sure, but how would you know? Sometimes you would get a

Re: [HACKERS] Fixed length data types issue

2006-09-11 Thread Simon Riggs
On Sun, 2006-09-10 at 21:16 -0400, Tom Lane wrote: > After further thought I have an alternate proposal (snip) > * If high order bit of datum's first byte is 0, then it's an > uncompressed datum in what's essentially the same as our current > in-memory format except that the 4-byte length word m

Re: [HACKERS] Fixed length data types issue

2006-09-11 Thread Gregory Stark
Alvaro Herrera <[EMAIL PROTECTED]> writes: > > Well it's irrelevant if we add a special data type to handle CHAR(1). > > In that case you should probably be using "char" ... Well "char" doesn't have quite the same semantics as CHAR(1). If that's the consensus though then I can work on either fi

Re: [HACKERS] Fixed length data types issue

2006-09-11 Thread Martijn van Oosterhout
On Mon, Sep 11, 2006 at 03:13:36PM +0100, Gregory Stark wrote: > Tom Lane <[EMAIL PROTECTED]> writes: > > >> Also Heikki points out here that it would be nice to allow for the case > >> for a > >> 0-byte header. > > > > I don't think there's enough code space for that; at least not compared > > t

Re: [HACKERS] Fixed length data types issue

2006-09-11 Thread Alvaro Herrera
Gregory Stark wrote: > Tom Lane <[EMAIL PROTECTED]> writes: > > >> Also Heikki points out here that it would be nice to allow for the case > >> for a > >> 0-byte header. > > > > I don't think there's enough code space for that; at least not compared > > to its use case. > > Well it's irrelevant

Re: [HACKERS] Fixed length data types issue

2006-09-11 Thread Gregory Stark
Tom Lane <[EMAIL PROTECTED]> writes: >> Also Heikki points out here that it would be nice to allow for the case for a >> 0-byte header. > > I don't think there's enough code space for that; at least not compared > to its use case. Well it's irrelevant if we add a special data type to handle CHAR(

Re: [HACKERS] Fixed length data types issue

2006-09-11 Thread Gregory Stark
Tom Lane <[EMAIL PROTECTED]> writes: > Mark Dilger <[EMAIL PROTECTED]> writes: > > ... The argument made upthread that a > > quadratic number of conversion operators is necessitated doesn't seem > > right to me, given that each type could upcast to the canonical built in > > type. (int1 => sma

Re: [HACKERS] Fixed length data types issue

2006-09-11 Thread Markus Schaber
Hi, Tom, Tom Lane wrote: > The only way we could pack stuff without alignment is to go over to the > idea that memory and disk representations are different --- where in > this case the "conversion" might just be a memcpy to a known-aligned > location. The performance costs of that seem pretty d
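
The "memcpy to a known-aligned location" idea can be illustrated with a small helper pair. This is a sketch only; `read_u32_unaligned` and `write_u32_unaligned` are hypothetical names and not part of the PostgreSQL source.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/*
 * Fetch a 32-bit value from an address with no alignment guarantee.
 * Copying into a local variable gives the compiler an aligned object
 * to work with, so this is safe on strict-alignment CPUs as well.
 */
uint32_t
read_u32_unaligned(const unsigned char *p)
{
    uint32_t v;

    assert(p != NULL);
    memcpy(&v, p, sizeof v);
    return v;
}

/* Store a 32-bit value at an address with no alignment guarantee. */
void
write_u32_unaligned(unsigned char *p, uint32_t v)
{
    assert(p != NULL);
    memcpy(p, &v, sizeof v);
}
```

Casting `p` to `uint32_t *` and dereferencing it would be the tempting shortcut, but that is exactly the nonaligned fetch that fails on the stricter chips mentioned in the thread.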

Re: [HACKERS] Fixed length data types issue

2006-09-11 Thread Gregory Stark
Tom Lane <[EMAIL PROTECTED]> writes: > Gregory Stark <[EMAIL PROTECTED]> writes: >> I'm a bit confused by this and how it would be handled in your sketch. I >> assumed we needed a bit pattern dedicated to 4-byte length headers because >> even though it would never occur on disk it would be necessa

Re: [HACKERS] Fixed length data types issue

2006-09-10 Thread Jeremy Drake
On Sun, 10 Sep 2006, Kevin Brown wrote: > Tom Lane wrote: > > (does anyone know the cost of ntohl() on modern > > Intel CPUs?) > > I have a system with an Athlon 64 3200+ (2.0 GHz) running in 64-bit > mode, another one with the same processor running in 32-bit mode, and a > third running a Pentium 4

Re: [HACKERS] Fixed length data types issue

2006-09-10 Thread Kevin Brown
Tom Lane wrote: > (does anyone know the cost of ntohl() on modern > Intel CPUs?) I wrote a simple test program to determine this: #include int main (int argc, char *argv[]) { unsigned long i; uint32_t a; a = 0; for (i = 0 ; i < 40L ; ++i) { #ifdef CALL_N
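
The program above is cut off in this listing, so the following is not a reconstruction of it. It is a separate, minimal sketch of the same kind of measurement, assuming a POSIX system that provides `ntohl()` in `<arpa/inet.h>`; the function names and the use of `clock()` are choices made for this example.

```c
#include <arpa/inet.h>   /* ntohl(); POSIX, an assumption of this sketch */
#include <assert.h>
#include <stdint.h>
#include <time.h>

/* Baseline loop: accumulate the counter without any byte swapping. */
uint32_t
loop_plain(unsigned long n)
{
    uint32_t a = 0;
    unsigned long i;

    for (i = 0; i < n; ++i)
        a += (uint32_t) i;
    return a;
}

/* Same loop, but each value goes through ntohl() first. */
uint32_t
loop_ntohl(unsigned long n)
{
    uint32_t a = 0;
    unsigned long i;

    for (i = 0; i < n; ++i)
        a += ntohl((uint32_t) i);
    return a;
}

/* CPU seconds spent in fn(n); the result is kept in *sink so the call
 * has an observable effect. */
double
seconds_for(uint32_t (*fn)(unsigned long), unsigned long n, uint32_t *sink)
{
    clock_t start, end;

    assert(fn != NULL && sink != NULL);
    start = clock();
    *sink = fn(n);
    end = clock();
    return (double) (end - start) / CLOCKS_PER_SEC;
}
```

Comparing `seconds_for(loop_ntohl, n, &sink)` with `seconds_for(loop_plain, n, &sink)` for a large `n` gives a rough per-call cost. An optimizing compiler may simplify either loop, so the numbers are indicative at best.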

Re: [HACKERS] Fixed length data types issue

2006-09-10 Thread Bruce Momjian
Tom Lane wrote: > After further thought I have an alternate proposal that does that, > but it's got its own disadvantage: it requires storing uncompressed > 4-byte length words in big-endian byte order everywhere. This might > be a showstopper (does anyone know the cost of ntohl() on modern > Inte

Re: [HACKERS] Fixed length data types issue

2006-09-10 Thread Tom Lane
Gregory Stark <[EMAIL PROTECTED]> writes: > I'm a bit confused by this and how it would be handled in your sketch. I > assumed we needed a bit pattern dedicated to 4-byte length headers because > even though it would never occur on disk it would be necessary for the > uncompressed and/or detoast

Re: [HACKERS] Fixed length data types issue

2006-09-10 Thread Bruce Momjian
Gregory Stark wrote: > Tom Lane <[EMAIL PROTECTED]> writes: > > > Bruce Momjian <[EMAIL PROTECTED]> writes: > > > Tom Lane wrote: > > >> Either way, I think it would be interesting to consider > > >> > > >> (a) length word either one or two bytes, not four. You can't need more > > >> than 2 byte

Re: [HACKERS] Fixed length data types issue

2006-09-10 Thread Gregory Stark
Tom Lane <[EMAIL PROTECTED]> writes: > Bruce Momjian <[EMAIL PROTECTED]> writes: > > Tom Lane wrote: > >> Either way, I think it would be interesting to consider > >> > >> (a) length word either one or two bytes, not four. You can't need more > >> than 2 bytes for a datum that fits in a disk pag

Re: [HACKERS] Fixed length data types issue

2006-09-10 Thread Tom Lane
Bruce Momjian <[EMAIL PROTECTED]> writes: > Tom Lane wrote: >> Either way, I think it would be interesting to consider >> >> (a) length word either one or two bytes, not four. You can't need more >> than 2 bytes for a datum that fits in a disk page ... > That is an interesting observation, thoug
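
The arithmetic behind "you can't need more than 2 bytes for a datum that fits in a disk page" is easy to check. The helper below is a hypothetical illustration; the 8192-byte figure used in the note after it is PostgreSQL's default block size.

```c
#include <assert.h>

/*
 * Smallest number of whole bytes able to represent any length from
 * 0 up to max_len. Hypothetical helper for illustration only.
 */
unsigned
length_bytes_needed(unsigned long max_len)
{
    unsigned bytes = 1;

    while (max_len > 0xFF)
    {
        max_len >>= 8;
        bytes++;
    }
    return bytes;
}
```

For the default 8192-byte page, `length_bytes_needed(8192)` is 2, and any datum of 255 bytes or fewer needs only 1. A third byte is needed only from 65536 upward, which is larger than any page.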

Re: [HACKERS] Fixed length data types issue

2006-09-10 Thread Bruce Momjian
Tom Lane wrote: > Bruce Momjian <[EMAIL PROTECTED]> writes: > > * Consider ways of storing rows more compactly on disk > > o Support a smaller header for short variable-length fields? > > With respect to the business of having different on-disk and in-memory > representations, we h

Re: [HACKERS] Fixed length data types issue

2006-09-10 Thread Tom Lane
Bruce Momjian <[EMAIL PROTECTED]> writes: > * Consider ways of storing rows more compactly on disk > o Support a smaller header for short variable-length fields? With respect to the business of having different on-disk and in-memory representations, we have that already today:

Re: [HACKERS] Fixed length data types issue

2006-09-10 Thread Mark Dilger
Tom Lane wrote: Mark Dilger <[EMAIL PROTECTED]> writes: ... The argument made upthread that a quadratic number of conversion operators is necessitated doesn't seem right to me, given that each type could upcast to the canonical built in type. (int1 => smallint, int3 => integer, ascii1 => text

Re: [HACKERS] Fixed length data types issue

2006-09-10 Thread Bruce Momjian
Added to TODO: * Consider ways of storing rows more compactly on disk o Store disk pages with no alignment/padding? o Reorder physical storage order to reduce padding? o Support a smaller header for short variable-length fie
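
To make the "reorder physical storage order to reduce padding" item concrete, here is a small sketch that computes how many bytes a row's fixed-width columns occupy when each column is aligned to its own requirement. The `ColSpec` struct and `row_size` function are hypothetical and much simpler than the real tuple layout code: no tuple header, no null bitmap, no varlena columns.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical description of one fixed-width column. */
typedef struct
{
    size_t size;    /* bytes of data */
    size_t align;   /* required alignment in bytes, must be nonzero */
} ColSpec;

/*
 * Bytes used when the columns are laid out in the given order, with
 * padding inserted before each column to satisfy its alignment.
 */
size_t
row_size(const ColSpec *cols, size_t ncols)
{
    size_t off = 0;
    size_t i;

    assert(cols != NULL || ncols == 0);
    for (i = 0; i < ncols; i++)
    {
        size_t a = cols[i].align;

        assert(a > 0);
        off = (off + a - 1) / a * a;    /* round up to alignment */
        off += cols[i].size;
    }
    return off;
}
```

With columns of (size, align) = (1,1), (4,4), (1,1), (4,4) in that order the result is 16 bytes; reordered as (4,4), (4,4), (1,1), (1,1) it is 10. The six bytes saved are pure padding.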

Re: [HACKERS] Fixed length data types issue

2006-09-10 Thread Tom Lane
Mark Dilger <[EMAIL PROTECTED]> writes: > ... The argument made upthread that a > quadratic number of conversion operators is necessitated doesn't seem > right to me, given that each type could upcast to the canonical built in > type. (int1 => smallint, int3 => integer, ascii1 => text, ascii2 =

Re: [HACKERS] Fixed length data types issue

2006-09-10 Thread Mark Dilger
Martijn van Oosterhout wrote: On Sun, Sep 10, 2006 at 11:55:35AM -0700, Mark Dilger wrote: Well, it is unless you are willing to give up support of non-Intel CPUs; most other popular chips are strict about alignment, and will fail an attempt to do a nonaligned fetch. Intel CPUs are detectable at

Re: [HACKERS] Fixed length data types issue

2006-09-10 Thread Martijn van Oosterhout
On Sun, Sep 10, 2006 at 11:55:35AM -0700, Mark Dilger wrote: > >Well, it is unless you are willing to give up support of non-Intel CPUs; > >most other popular chips are strict about alignment, and will fail an > >attempt to do a nonaligned fetch. > > Intel CPUs are detectable at compile time, righ

Re: [HACKERS] Fixed length data types issue

2006-09-10 Thread Mark Dilger
Tom Lane wrote: Bruce Momjian <[EMAIL PROTECTED]> writes: No one has mentioned that we pad values on disk to match the CPU alignment. This is done for efficiency, but is not strictly required. Well, it is unless you are willing to give up support of non-Intel CPUs; most other popular chips ar

Re: [HACKERS] Fixed length data types issue

2006-09-09 Thread Gregory Stark
Gregory Stark <[EMAIL PROTECTED]> writes: > Tom Lane <[EMAIL PROTECTED]> writes: > > > The performance costs of that seem pretty daunting, however, especially when > > you reflect that simply stepping over a varlena field would require > > memcpy'ing its length word to someplace. > > I think if

Re: [HACKERS] Fixed length data types issue

2006-09-09 Thread Gregory Stark
Tom Lane <[EMAIL PROTECTED]> writes: > The performance costs of that seem pretty daunting, however, especially when > you reflect that simply stepping over a varlena field would require > memcpy'ing its length word to someplace. I think if you give up on disk and in-memory representations being t

Re: [HACKERS] Fixed length data types issue

2006-09-08 Thread mark
On Fri, Sep 08, 2006 at 04:49:22PM -0400, Andrew Dunstan wrote: > [EMAIL PROTECTED] wrote: > >Only ASCII values store more space efficiently in UTF-8. All values > >over 127 store more space efficiently using UTF-16. > This second statement is demonstrably not true. Only values above 0x07ff > req

Re: [HACKERS] Fixed length data types issue

2006-09-08 Thread Martijn van Oosterhout
On Fri, Sep 08, 2006 at 04:49:49PM -0400, Alvaro Herrera wrote: > Actually he muttered something about iterators, and not needing to > convert anything. Yes, many of the useful functions accept strings in two forms, either UTF-16 or CharacterIterators. The iterator pretty much only has to know how

Re: [HACKERS] Fixed length data types issue

2006-09-08 Thread Andrew Dunstan
[EMAIL PROTECTED] wrote: Only ASCII values store more space efficiently in UTF-8. All values over 127 store more space efficiently using UTF-16. This second statement is demonstrably not true. Only values above 0x07ff require more than 2 bytes in UTF-8. All chars up to that point are sto
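
The size comparison in this exchange can be checked directly from the encoding rules. The two helpers below are hypothetical names for this sketch; they return the encoded size of a single Unicode code point and assume the input is a valid code point (at most 0x10FFFF).

```c
#include <assert.h>
#include <stdint.h>

/* Bytes needed to encode code point cp in UTF-8. */
unsigned
utf8_bytes(uint32_t cp)
{
    assert(cp <= 0x10FFFF);
    if (cp <= 0x7F)
        return 1;
    if (cp <= 0x7FF)
        return 2;
    if (cp <= 0xFFFF)
        return 3;
    return 4;
}

/* Bytes needed to encode code point cp in UTF-16
 * (one 16-bit unit, or a surrogate pair above the BMP). */
unsigned
utf16_bytes(uint32_t cp)
{
    assert(cp <= 0x10FFFF);
    return (cp <= 0xFFFF) ? 2 : 4;
}
```

So UTF-8 is smaller for code points up to 0x7F, the two tie from 0x80 through 0x7FF, UTF-16 is smaller from 0x800 through 0xFFFF, and they tie again at 4 bytes above that.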

Re: [HACKERS] Fixed length data types issue

2006-09-08 Thread Alvaro Herrera
[EMAIL PROTECTED] wrote: > On Fri, Sep 08, 2006 at 04:42:09PM -0400, Alvaro Herrera wrote: > > But Martijn already clarified that ICU does not actually force you to > > switch everything to UTF-16, so this is not an issue anyway. > > If my memory is correct, it does this by converting it to UTF-1

Re: [HACKERS] Fixed length data types issue

2006-09-08 Thread mark
On Fri, Sep 08, 2006 at 04:42:09PM -0400, Alvaro Herrera wrote: > [EMAIL PROTECTED] wrote: > > The authors of the library in question? Java? Anybody whose primary > > alphabet isn't LATIN1 based? :-) > Well, for Latin-9 alphabets, Latin-9 is still more space-efficient than > UTF-8. That covers a l

Re: [HACKERS] Fixed length data types issue

2006-09-08 Thread Alvaro Herrera
[EMAIL PROTECTED] wrote: > On Fri, Sep 08, 2006 at 02:39:03PM -0400, Alvaro Herrera wrote: > > [EMAIL PROTECTED] wrote: > > > I think I've been involved in a discussion like this in the past. Was > > > it mentioned in this list before? Yes the UTF-8 vs UTF-16 encoding > > > means that UTF-8 applica

Re: [HACKERS] Fixed length data types issue

2006-09-08 Thread mark
On Fri, Sep 08, 2006 at 02:39:03PM -0400, Alvaro Herrera wrote: > [EMAIL PROTECTED] wrote: > > I think I've been involved in a discussion like this in the past. Was > > it mentioned in this list before? Yes the UTF-8 vs UTF-16 encoding > > means that UTF-8 applications are at a disadvantage when us

Re: [HACKERS] Fixed length data types issue

2006-09-08 Thread Bruce Momjian
Tom Lane wrote: > Bruce Momjian <[EMAIL PROTECTED]> writes: > > No one has mentioned that we pad values on disk to match the CPU > > alignment. This is done for efficiency, but is not strictly required. > > Well, it is unless you are willing to give up support of non-Intel CPUs; > most other popu

Re: [HACKERS] Fixed length data types issue

2006-09-08 Thread Bruce Momjian
Martijn van Oosterhout wrote: > On Fri, Sep 08, 2006 at 09:28:21AM -0400, [EMAIL PROTECTED] wrote: > > > But that won't help in the example you posted upthread, because char(N) > > > is not fixed-length. > > > > It can be fixed-length, or at least, have an upper bo

Re: [HACKERS] Fixed length data types issue

2006-09-08 Thread Andrew Dunstan
Bruce Momjian wrote: No one has mentioned that we pad values on disk to match the CPU alignment. This is done for efficiency, but is not strictly required. From time to time the idea of a logical vs physical mapping for columns has been mentioned. Among other benefits, that might allow u

Re: [HACKERS] Fixed length data types issue

2006-09-08 Thread Tom Lane
Bruce Momjian <[EMAIL PROTECTED]> writes: > No one has mentioned that we pad values on disk to match the CPU > alignment. This is done for efficiency, but is not strictly required. Well, it is unless you are willing to give up support of non-Intel CPUs; most other popular chips are strict about a

Re: [HACKERS] Fixed length data types issue

2006-09-08 Thread Bruce Momjian
Gregory Stark wrote: > Bruce Momjian <[EMAIL PROTECTED]> writes: > > > Gregory Stark wrote: > > > But I think this is a dead-end route. What you're looking at is the > > > number "1" > > > repeated for *every* record in the table. And what you're proposing amounts > > > to > > > noticing that the

Re: [HACKERS] Fixed length data types issue

2006-09-08 Thread Alvaro Herrera
[EMAIL PROTECTED] wrote: > I think I've been involved in a discussion like this in the past. Was > it mentioned in this list before? Yes the UTF-8 vs UTF-16 encoding > means that UTF-8 applications are at a disadvantage when using the > library. UTF-16 is considered more efficient to work with for

Re: [HACKERS] Fixed length data types issue

2006-09-08 Thread Martijn van Oosterhout
On Fri, Sep 08, 2006 at 12:57:29PM -0400, Tom Lane wrote: > Ah, you're right, I did misunderstand that. However, it's still > apparently the case that ICU works mostly with UTF16 and handles other > encodings only via conversion to UTF16. That's a pretty serious > mismatch with our needs --- we'l

Re: [HACKERS] Fixed length data types issue

2006-09-08 Thread mark
On Fri, Sep 08, 2006 at 12:57:29PM -0400, Tom Lane wrote: > Martijn van Oosterhout writes: > >> AFAICT, most of the useful operations work on UChar, which is uint16: > >> http://icu.sourceforge.net/apiref/icu4c/umachine_8h.html#6bb9fad572d65b305324ef288165e2ac > > Oh, you're confusing UCS-2

Re: [HACKERS] Fixed length data types issue

2006-09-08 Thread Tom Lane
Martijn van Oosterhout writes: >> AFAICT, most of the useful operations work on UChar, which is uint16: >> http://icu.sourceforge.net/apiref/icu4c/umachine_8h.html#6bb9fad572d65b305324ef288165e2ac > Oh, you're confusing UCS-2 with UTF-16, Ah, you're right, I did misunderstand that. However,

Re: [HACKERS] Fixed length data types issue

2006-09-08 Thread Martijn van Oosterhout
On Fri, Sep 08, 2006 at 12:19:19PM -0400, Tom Lane wrote: > Martijn van Oosterhout writes: > > On Fri, Sep 08, 2006 at 10:35:58AM -0400, Tom Lane wrote: > >> what's more, the docs suggest that it doesn't support anything wider > >> than UTF16. > > > Well, that's not true, which part of the docs w

Re: [HACKERS] Fixed length data types issue

2006-09-08 Thread Tom Lane
Martijn van Oosterhout writes: > On Fri, Sep 08, 2006 at 10:35:58AM -0400, Tom Lane wrote: >> what's more, the docs suggest that it doesn't support anything wider >> than UTF16. > Well, that's not true, which part of the docs were you looking at? AFAICT, most of the useful operations work on UCh

Re: [HACKERS] Fixed length data types issue

2006-09-08 Thread Martijn van Oosterhout
On Fri, Sep 08, 2006 at 10:35:58AM -0400, Tom Lane wrote: > The reason this is a relevant consideration: we are talking about > changes that would remove existing functionality for people who don't > have that library. Huh? If you don't select ICU at compile time you get no difference from what we

Re: [HACKERS] Fixed length data types issue

2006-09-08 Thread Gregory Stark
Martijn van Oosterhout writes: > I'm still missing the argument of why you can't just make a 16-byte > type. Around half the datatypes in postgresql are fixed-length and have > no header. I'm completely confused about why people are hung up about > bytea(16) not being fixed length when it's triv

Re: [HACKERS] Fixed length data types issue

2006-09-08 Thread Tom Lane
Martijn van Oosterhout writes: > On Thu, Sep 07, 2006 at 04:57:04PM -0400, Gregory Stark wrote: >> Uhm, an ICU source tree is over 40 *megabytes*. > I don't understand this argument. No-one asked what size the LDAP > libraries were when we added support for them. No-one cares that > libssl/libcry

Re: [HACKERS] Fixed length data types issue

2006-09-08 Thread Martijn van Oosterhout
On Fri, Sep 08, 2006 at 09:28:21AM -0400, [EMAIL PROTECTED] wrote: > > But that won't help in the example you posted upthread, because char(N) > > is not fixed-length. > > It can be fixed-length, or at least, have an upper bound. If marked > up to contain only ascii characters, it doesn't, at lea

Re: [HACKERS] Fixed length data types issue

2006-09-08 Thread mark
On Fri, Sep 08, 2006 at 08:50:57AM +0200, Peter Eisentraut wrote: > Gregory Stark wrote: > > But it's largely true for OLTP applications too. The more compact the > > data the more tuples fit on a page and the greater the chance you > > have the page you need in cache. > But a linear amount of more

Re: [HACKERS] Fixed length data types issue

2006-09-08 Thread mark
On Fri, Sep 08, 2006 at 08:57:12AM +0200, Peter Eisentraut wrote: > Gregory Stark wrote: > > I think we have to find a way to remove the varlena length header > > entirely for fixed length data types since it's going to be the same > > for every single record in the table. > But that won't help in

Re: [HACKERS] Fixed length data types issue

2006-09-08 Thread Martijn van Oosterhout
On Fri, Sep 08, 2006 at 02:14:58PM +0200, Peter Eisentraut wrote: > So "mathematically", you are right, the collation is a property of the > operation, not of the operands. But semantically, the operands do > carry the information of what collation order they would like to be > compared under,

Re: [HACKERS] Fixed length data types issue

2006-09-08 Thread Heikki Linnakangas
Peter Eisentraut wrote: > The real problem is that the established method dividing up the locale categories ignores both the technological and the linguistic reality. In reality, all properties like lc_collate, lc_ctype, and lc_numeric are dependent on the property "language of the text". I don'

Re: [HACKERS] Fixed length data types issue

2006-09-08 Thread Peter Eisentraut
Heikki Linnakangas wrote: > have a default set per-database, per-table or per-column, but it's > not a property of the actual value of a field. I think that the > phrase "collation of a string" doesn't make sense. The real problem is that the established method dividing up the locale categories i

Re: [HACKERS] Fixed length data types issue

2006-09-08 Thread Martijn van Oosterhout
On Thu, Sep 07, 2006 at 04:57:04PM -0400, Gregory Stark wrote: > Uhm, an ICU source tree is over 40 *megabytes*. That's almost as much as the > rest of Postgres itself and that doesn't even include documentation. Even if > you exclude the data and regression tests you're still talking about dependi

Re: [HACKERS] Fixed length data types issue

2006-09-08 Thread Martijn van Oosterhout
On Fri, Sep 08, 2006 at 11:58:59AM +0100, Heikki Linnakangas wrote: > Martijn van Oosterhout wrote: > >I think that if SQL COLLATE gets in we'll get this almost for free. > >Collation and charset are both properties of strings. Once you've got a > >mechanism to know the collation of a string, you j

Re: [HACKERS] Fixed length data types issue

2006-09-08 Thread Heikki Linnakangas
Martijn van Oosterhout wrote: > I think that if SQL COLLATE gets in we'll get this almost for free. Collation and charset are both properties of strings. Once you've got a mechanism to know the collation of a string, you just attach the charset to the same place. The only difference is that changin

Re: [HACKERS] Fixed length data types issue

2006-09-08 Thread Martijn van Oosterhout
On Fri, Sep 08, 2006 at 05:54:01AM -0400, Andrew Dunstan wrote: > >The encoding is set per-database. Even if you need UTF-8 to encode > >user-supplied strings, there can still be many small ASCII fields in > >the database. Country code, currency code etc. > > ISTM we should revisit this when we

Re: [HACKERS] Fixed length data types issue

2006-09-08 Thread Andrew Dunstan
Heikki Linnakangas wrote: > Gregory Stark wrote: > > But why would you use UTF8 to encode fixed length ascii strings? > The encoding is set per-database. Even if you need UTF-8 to encode user-supplied strings, there can still be many small ASCII fields in the database. Country code, currency code

Re: [HACKERS] Fixed length data types issue

2006-09-08 Thread Heikki Linnakangas
Gregory Stark wrote: > But why would you use UTF8 to encode fixed length ascii strings? The encoding is set per-database. Even if you need UTF-8 to encode user-supplied strings, there can still be many small ASCII fields in the database. Country code, currency code etc. -- Heikki Linnakangas

Re: [HACKERS] Fixed length data types issue

2006-09-08 Thread Peter Eisentraut
Gregory Stark wrote: > > But that won't help in the example you posted upthread, because > > char(N) is not fixed-length. > > Sure it is because any sane database--certainly any sane database > using char(N)--is in C locale anyways. This matter is completely independent of the choice of locale and

Re: [HACKERS] Fixed length data types issue

2006-09-08 Thread Gregory Stark
Gregory Stark <[EMAIL PROTECTED]> writes: > Peter Eisentraut <[EMAIL PROTECTED]> writes: > > > Gregory Stark wrote: > > > > But that won't help in the example you posted upthread, because > > > > char(N) is not fixed-length. > > > > > > Sure it is because any sane database--certainly any sane da

Re: [HACKERS] Fixed length data types issue

2006-09-08 Thread Andrew - Supernews
On 2006-09-08, Gregory Stark <[EMAIL PROTECTED]> wrote: >> But that won't help in the example you posted upthread, because char(N) >> is not fixed-length. > > Sure it is because any sane database--certainly any sane database using > char(N)--is in C locale anyways. You're confusing locale and cha

Re: [HACKERS] Fixed length data types issue

2006-09-08 Thread Gregory Stark
Peter Eisentraut <[EMAIL PROTECTED]> writes: > Gregory Stark wrote: > > > But that won't help in the example you posted upthread, because > > > char(N) is not fixed-length. > > > > Sure it is because any sane database--certainly any sane database > > using char(N)--is in C locale anyways. > > Thi

Re: [HACKERS] Fixed length data types issue

2006-09-08 Thread Gregory Stark
Peter Eisentraut <[EMAIL PROTECTED]> writes: > Gregory Stark wrote: > > I think we have to find a way to remove the varlena length header > > entirely for fixed length data types since it's going to be the same > > for every single record in the table. > > But that won't help in the example you p
