Re: [HACKERS] Reducing data type space usage

2006-09-18 Thread Hannu Krosing
On one fine day, Fri, 2006-09-15 at 19:34, Tom Lane wrote: Bruce Momjian [EMAIL PROTECTED] writes: Oh, OK, I had high byte meaning no header, but clear is better, so 0001 is 0x01, and is . But I see now that bytea does store nulls, so yea, we would be better using

Re: [HACKERS] Reducing data type space usage

2006-09-18 Thread Tom Lane
Hannu Krosing [EMAIL PROTECTED] writes: would adding this - first byte 0xxx field length 1 byte, exactly that value first byte 10xx 0xx data bytes follow first byte 110x -- x data bytes to follow first byte 111x -- x

Re: [HACKERS] Reducing data type space usage

2006-09-17 Thread Martijn van Oosterhout
On Sat, Sep 16, 2006 at 08:56:11PM +0100, Gregory Stark wrote: [Re inet and cidr] Why are these varlena? Just for ipv6 addresses? Is the network mask length not stored if it's not present? This gives us a strange corner case in that ipv4 addresses will *always* fit in the smallfoo data type

Re: [HACKERS] Reducing data type space usage

2006-09-17 Thread Gregory Stark
Martijn van Oosterhout kleptog@svana.org writes: On Sat, Sep 16, 2006 at 08:56:11PM +0100, Gregory Stark wrote: [Re inet and cidr] Why are these varlena? Just for ipv6 addresses? Is the network mask length not stored if it's not present? This gives us a strange corner case in that ipv4

Re: [HACKERS] Reducing data type space usage

2006-09-16 Thread Tom Lane
Gregory Stark [EMAIL PROTECTED] writes: The user would have to decide that he'll never need a value over 127 bytes long ever in order to get the benefit. Weren't you the one that's been going on at great length about how wastefully we store CHAR(1) ? Sure, this has a somewhat restricted use

Re: [HACKERS] Reducing data type space usage

2006-09-16 Thread Gregory Stark
Tom Lane [EMAIL PROTECTED] writes: Gregory Stark [EMAIL PROTECTED] writes: The user would have to decide that he'll never need a value over 127 bytes long ever in order to get the benefit. Weren't you the one that's been going on at great length about how wastefully we store CHAR(1) ?

Re: [HACKERS] Reducing data type space usage

2006-09-16 Thread Tom Lane
Gregory Stark [EMAIL PROTECTED] writes: Tom Lane [EMAIL PROTECTED] writes: Weren't you the one that's been going on at great length about how wastefully we store CHAR(1) ? Sure, this has a somewhat restricted use case, but it's about as efficient as we could possibly get within that use

Re: [HACKERS] Reducing data type space usage

2006-09-16 Thread Andrew Dunstan
Tom Lane wrote: To review: Bruce is proposing a var-length type structure with the properties: first byte 0xxxxxxx: field length 1 byte, exactly that value; first byte 1xxxxxxx: xxxxxxx data bytes follow. This can support *any* stored value from zero to 127 bytes long.
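
[A rough illustration of the scheme Tom reviews above, not code from the thread. It assumes the two header patterns are a full 8 bits wide (high bit clear: the byte is itself the one-byte value; high bit set: the low 7 bits give the number of data bytes that follow), which matches the stated 0 to 127 byte range. The names encode_short and decode_short are hypothetical, and Python is used only for brevity.]

```python
from typing import Tuple

# Hypothetical sketch of the one-byte-header scheme summarized above.
# Assumption: header patterns are 8 bits wide (0xxxxxxx / 1xxxxxxx).

MAX_SHORT_LEN = 127  # largest length that fits in the 7 low bits


def encode_short(value: bytes) -> bytes:
    """Encode value with at most one byte of header overhead."""
    if len(value) > MAX_SHORT_LEN:
        raise ValueError("value too long for the short format")
    # A single byte below 0x80 is stored as itself, with no header.
    if len(value) == 1 and value[0] < 0x80:
        return value
    # Otherwise: high bit set, low 7 bits carry the data length.
    return bytes([0x80 | len(value)]) + value


def decode_short(buf: bytes, offset: int = 0) -> Tuple[bytes, int]:
    """Return (value, next_offset) for the field starting at offset."""
    first = buf[offset]
    if first < 0x80:
        # High bit clear: the byte is the whole one-byte value.
        return bytes([first]), offset + 1
    length = first & 0x7F
    start = offset + 1
    end = start + length
    if end > len(buf):
        raise ValueError("truncated field")
    return buf[start:end], end
```

[Under these assumptions a zero-length string costs one byte (0x80), a single 7-bit character costs one byte, and anything else up to 127 bytes costs its length plus one. A one-byte value with the high bit set, which bytea could contain, needs the header form.]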

Re: [HACKERS] Reducing data type space usage

2006-09-16 Thread Tom Lane
Andrew Dunstan [EMAIL PROTECTED] writes: I like this scheme a lot - maximum bang for buck. Is there any chance we can do it transparently, without exposing new types? It is in effect an implementation detail ISTM, and ideally the user would not need to have any knowledge of it. Well,

Re: [HACKERS] Reducing data type space usage

2006-09-16 Thread Andrew Dunstan
Tom Lane wrote: Andrew Dunstan [EMAIL PROTECTED] writes: I like this scheme a lot - maximum bang for buck. Is there any chance we can do it transparently, without exposing new types? It is in effect an implementation detail ISTM, and ideally the user would not need to have any

Re: [HACKERS] Reducing data type space usage

2006-09-16 Thread Hannu Krosing
On one fine day, Fri, 2006-09-15 at 19:18, Tom Lane wrote: Bruce Momjian [EMAIL PROTECTED] writes: Tom Lane wrote: No, it'll be a 1-byte header with length indicating that no bytes follow, Well, in my idea, 1001 would be 0x01. I was going to use the remaining 7 bits for the

Re: [HACKERS] Reducing data type space usage

2006-09-16 Thread Bruce Momjian
Hannu Krosing wrote: On one fine day, Fri, 2006-09-15 at 19:18, Tom Lane wrote: Bruce Momjian [EMAIL PROTECTED] writes: Tom Lane wrote: No, it'll be a 1-byte header with length indicating that no bytes follow, Well, in my idea, 1001 would be 0x01. I was going to use the

Re: [HACKERS] Reducing data type space usage

2006-09-16 Thread Bruce Momjian
Tom Lane wrote: Gregory Stark [EMAIL PROTECTED] writes: The user would have to decide that he'll never need a value over 127 bytes long ever in order to get the benefit. Weren't you the one that's been going on at great length about how wastefully we store CHAR(1) ? Sure, this has a

Re: [HACKERS] Reducing data type space usage

2006-09-16 Thread Gregory Stark
Bruce Momjian [EMAIL PROTECTED] writes: Tom Lane wrote: Gregory Stark [EMAIL PROTECTED] writes: The user would have to decide that he'll never need a value over 127 bytes long ever in order to get the benefit. Weren't you the one that's been going on at great length about how wastefully

Re: [HACKERS] Reducing data type space usage

2006-09-16 Thread Mark Dilger
Mark Dilger wrote: Wouldn't a 4-byte numeric be a float4 and an 8-byte numeric be a float8? I'm not sure I see the difference. Never mind. I don't normally think about numeric as anything other than an arbitrarily large floating-point type. But it does differ in that you can specify the

Re: [HACKERS] Reducing data type space usage

2006-09-16 Thread mark
On Sat, Sep 16, 2006 at 02:13:49PM -0700, Mark Dilger wrote: Mark Dilger wrote: Wouldn't a 4-byte numeric be a float4 and an 8-byte numeric be a float8? I'm not sure I see the difference. Never mind. I don't normally think about numeric as anything other than an arbitrarily large floating

Re: [HACKERS] Reducing data type space usage

2006-09-16 Thread Bruce Momjian
Gregory Stark wrote: Bruce Momjian [EMAIL PROTECTED] writes: Tom Lane wrote: Gregory Stark [EMAIL PROTECTED] writes: The user would have to decide that he'll never need a value over 127 bytes long ever in order to get the benefit. Weren't you the one that's been going on at

Re: [HACKERS] Reducing data type space usage

2006-09-16 Thread Heikki Linnakangas
Tom Lane wrote: Gregory Stark [EMAIL PROTECTED] writes: The user would have to decide that he'll never need a value over 127 bytes long ever in order to get the benefit. Weren't you the one that's been going on at great length about how wastefully we store CHAR(1) ? Sure, this has a somewhat

Re: [HACKERS] Reducing data type space usage

2006-09-16 Thread Gregory Stark
Bruce Momjian [EMAIL PROTECTED] writes: Gregory Stark wrote: Bruce Momjian [EMAIL PROTECTED] writes: Sure, this helps with CHAR(1) but there were plen OK. Ooops, sorry, I guess I sent that before I was finished editing it. I'm glad you could divine what I meant because I'm not entirely

Re: [HACKERS] Reducing data type space usage

2006-09-16 Thread Tom Lane
Hannu Krosing [EMAIL PROTECTED] writes: why not go all the way, and do utf-7 encoded header if hi bit is set ? or just always have an utf-8 encoded header. That definition is (a) very expensive to scan, and (b) useless for anything except utf-8 encoded text. Whatever mechanism we select should

[HACKERS] Reducing data type space usage

2006-09-15 Thread Gregory Stark
Following up on the recent discussion on list about wasted space in data representations I want to summarise what we found and make some proposals: As I see it there are two cases: Case 1) Data types that are variable length but often quite small. This includes things like NUMERIC which in

Re: [HACKERS] Reducing data type space usage

2006-09-15 Thread Martijn van Oosterhout
On Fri, Sep 15, 2006 at 06:50:37PM +0100, Gregory Stark wrote: With a CHAR(1) and CASH style numeric substitute we won't have 25-100% performance lost on the things that would fit in 1-4 bytes. And with the variable sized varlena header we'll limit to 25% at worst and 1-2% usually the

Re: [HACKERS] Reducing data type space usage

2006-09-15 Thread Mark Dilger
Gregory Stark wrote: snip Case 2) Solving this is quite difficult without introducing major performance problems or security holes. The one approach we have that's practical right now is introducing special data types such as the oft-mentioned char data type. char doesn't have quite

Re: [HACKERS] Reducing data type space usage

2006-09-15 Thread Bruce Momjian
Gregory Stark wrote: Case 2) Data types that are different sizes depending on the typmod but are always the same size that can be determined statically for a given typmod. In the case of an ASCII-encoded database CHAR(n) fits this category and in any case we'll eventually have

Re: [HACKERS] Reducing data type space usage

2006-09-15 Thread Tom Lane
Bruce Momjian [EMAIL PROTECTED] writes: FYI, we also need to figure out how to store a zero-length string. That will probably be high-bit, and then all zero bits. We don't store a zero-byte in strings, so that should be unique for . No, it'll be a 1-byte header with length indicating that no

Re: [HACKERS] Reducing data type space usage

2006-09-15 Thread Bruce Momjian
Tom Lane wrote: Bruce Momjian [EMAIL PROTECTED] writes: FYI, we also need to figure out how to store a zero-length string. That will probably be high-bit, and then all zero bits. We don't store a zero-byte in strings, so that should be unique for . No, it'll be a 1-byte header with

Re: [HACKERS] Reducing data type space usage

2006-09-15 Thread Tom Lane
Bruce Momjian [EMAIL PROTECTED] writes: Tom Lane wrote: No, it'll be a 1-byte header with length indicating that no bytes follow, Well, in my idea, 1001 would be 0x01. I was going to use the remaining 7 bits for the 7-bit ascii value. Huh? I thought you said 0001 would be 0x01,

Re: [HACKERS] Reducing data type space usage

2006-09-15 Thread Bruce Momjian
Tom Lane wrote: Bruce Momjian [EMAIL PROTECTED] writes: Tom Lane wrote: No, it'll be a 1-byte header with length indicating that no bytes follow, Well, in my idea, 1001 would be 0x01. I was going to use the remaining 7 bits for the 7-bit ascii value. Huh? I thought you said

Re: [HACKERS] Reducing data type space usage

2006-09-15 Thread Gregory Stark
Bruce Momjian [EMAIL PROTECTED] writes: Oh, OK, I had high byte meaning no header Just how annoying would it be if I pointed out I suggested precisely this a few days ago? Tom said he didn't think there was enough code space and my own experimentation was slowly leading me to agree, sadly. It

Re: [HACKERS] Reducing data type space usage

2006-09-15 Thread Tom Lane
Bruce Momjian [EMAIL PROTECTED] writes: Oh, OK, I had high byte meaning no header, but clear is better, so 0001 is 0x01, and is . But I see now that bytea does store nulls, so yea, we would be better using 1001, and it is the same size as . I'm liking this idea more

Re: [HACKERS] Reducing data type space usage

2006-09-15 Thread Tom Lane
Gregory Stark [EMAIL PROTECTED] writes: Tom said he didn't think there was enough code space and my own experimentation was slowly leading me to agree, sadly. There isn't if you want the type to also handle long strings. But what if we restrict it to short strings? See my message just now.

Re: [HACKERS] Reducing data type space usage

2006-09-15 Thread Bruce Momjian
Gregory Stark wrote: Bruce Momjian [EMAIL PROTECTED] writes: Oh, OK, I had high byte meaning no header Just how annoying would it be if I pointed out I suggested precisely this a few days ago? Tom said he didn't think there was enough code space and my own experimentation was slowly

Re: [HACKERS] Reducing data type space usage

2006-09-15 Thread Gregory Stark
Tom Lane [EMAIL PROTECTED] writes: Gregory Stark [EMAIL PROTECTED] writes: Tom said he didn't think there was enough code space and my own experimentation was slowly leading me to agree, sadly. There isn't if you want the type to also handle long strings. But what if we restrict it to short

Re: [HACKERS] Reducing data type space usage

2006-09-15 Thread Bort, Paul
Gregory Stark writes: Tom Lane [EMAIL PROTECTED] writes: There isn't if you want the type to also handle long strings. But what if we restrict it to short strings? See my message just now. Then it seems like it imposes a pretty hefty burden on the user. But there are a lot of