Re: [PATCHES] [HACKERS] Reducing the overhead of NUMERIC data
"Tom Lane" <[EMAIL PROTECTED]> wrote
> If we were willing to invent the "varlena2" datum format then we could
> save four bytes per numeric, plus reduce numeric's alignment requirement
> from int to short which would probably save another byte per value on
> average. I'm not sure that that's worth doing if numeric and inet are
> the only beneficiaries, but it might be.

I would support "varlena2" for the sake of user applications. Another benefit is for char types. Many applications come from DB2, Oracle or SQL Server:

                 Max Char Length
  DB2            32672
  SQL Server      8000
  Oracle          4000

All of the above need only varlena2. To support bigger char types, we could follow the tradition of "long varchar", etc.

Regards,
Qingqing

---------------------------(end of broadcast)---------------------------
TIP 5: don't forget to increase your free space map settings
Re: [PATCHES] [HACKERS] Reducing the overhead of NUMERIC data
Andrew Dunstan <[EMAIL PROTECTED]> writes:
> Could someone please quantify how much bang we might get for what seems
> like quite a lot of bucks?
> I appreciate the need for speed, but the saving here strikes me as
> marginal at best, unless my instincts are all wrong (quite possible)

Two bytes per numeric value is not a lot, agreed.

If we were willing to invent the "varlena2" datum format then we could save four bytes per numeric, plus reduce numeric's alignment requirement from int to short, which would probably save another byte per value on average. I'm not sure that that's worth doing if numeric and inet are the only beneficiaries, but it might be.

From a speed perspective, the only thing favoring UNKNOWNNUMERIC is the possibility of saving some conversion operations when the grammar's initial guess about the datatype is wrong. That's probably pretty marginal, though. I was thinking of it more as a vehicle for helping us clean up the type-assignment behavior some more. The example I have in my notes is that "float4col = 1.8" is certain to fail, since 1.8 will be taken as float8, not float4, and then you have precision mismatch problems. You can make it work by quoting: "float4col = '1.8'", but that's surely pretty ugly. If the constant were initially UNKNOWNNUMERIC then it would end up as float4 without that hack.

regards, tom lane
Re: [PATCHES] [HACKERS] Reducing the overhead of NUMERIC data
[patches removed]

Tom Lane wrote:
> Simon Riggs <[EMAIL PROTECTED]> writes:
>> It seems straightforward enough to put in an additional test, similar to
>> the ones already there, so that if it's too big for a decimal we make it
>> a float straight away - only a float can be that big in that case. After
>> that I can't really see what the problem is?
>
> Wrong answer. You'll be introducing weird corner cases into the type
> resolution behavior.
>
> An approach that would actually have some credibility would be to not
> resolve constants to NUMERIC right away, but to invent an UNKNOWNNUMERIC
> pseudotype with coercion behavior comparable to the existing UNKNOWN
> type for string literals. This has been proposed before but hasn't
> really been needed so far. Of course, this converts the project from a
> minor localized hack on NUMERIC into a major piece of fiddling with the
> type resolution rules, with the potential for unforeseen side-effects on
> the behavior of other data types. It might be worth doing anyway --- I
> don't recall at the moment what problems UNKNOWNNUMERIC was intended to
> solve, but if they're still open issues then it's something we ought to
> get around to sometime.

Could someone please quantify how much bang we might get for what seems like quite a lot of bucks?

I appreciate the need for speed, but the saving here strikes me as marginal at best, unless my instincts are all wrong (quite possible).

cheers

andrew
Re: [PATCHES] [HACKERS] Reducing the overhead of NUMERIC data
On Wed, Nov 02, 2005 at 06:12:37PM -0500, Tom Lane wrote:
> Simon Riggs <[EMAIL PROTECTED]> writes:
> > It seems straightforward enough to put in an additional test, similar to
> > the ones already there, so that if it's too big for a decimal we make it
> > a float straight away - only a float can be that big in that case. After
> > that I can't really see what the problem is?
>
> Wrong answer. You'll be introducing weird corner cases into the type
> resolution behavior.
>
> An approach that would actually have some credibility would be to not
> resolve constants to NUMERIC right away, but to invent an UNKNOWNNUMERIC
> pseudotype with coercion behavior comparable to the existing UNKNOWN
> type for string literals. This has been proposed before but hasn't
> really been needed so far. Of course, this converts the project from a
> minor localized hack on NUMERIC into a major piece of fiddling with the
> type resolution rules, with the potential for unforeseen side-effects on
> the behavior of other data types. It might be worth doing anyway --- I
> don't recall at the moment what problems UNKNOWNNUMERIC was intended to
> solve, but if they're still open issues then it's something we ought to
> get around to sometime.

Thought I'd look to see if I could find anything about UNKNOWNNUMERIC, but no such luck (ISTM we really need a better way to find discussion on old ideas...)

But while looking I did find this TODO, which might be relevant to the current discussion:

# Change NUMERIC to enforce the maximum precision, and increase it

Unfortunately I can't find any reference to that in the archives...

--
Jim C. Nasby, Sr. Engineering Consultant      [EMAIL PROTECTED]
Pervasive Software      http://pervasive.com      work: 512-231-6117
vcard: http://jim.nasby.net/pervasive.vcf       cell: 512-569-9461
Re: [PATCHES] [HACKERS] Reducing the overhead of NUMERIC data
Simon Riggs <[EMAIL PROTECTED]> writes:
> It seems straightforward enough to put in an additional test, similar to
> the ones already there, so that if it's too big for a decimal we make it
> a float straight away - only a float can be that big in that case. After
> that I can't really see what the problem is?

Wrong answer. You'll be introducing weird corner cases into the type resolution behavior.

An approach that would actually have some credibility would be to not resolve constants to NUMERIC right away, but to invent an UNKNOWNNUMERIC pseudotype with coercion behavior comparable to the existing UNKNOWN type for string literals. This has been proposed before but hasn't really been needed so far. Of course, this converts the project from a minor localized hack on NUMERIC into a major piece of fiddling with the type resolution rules, with the potential for unforeseen side-effects on the behavior of other data types. It might be worth doing anyway --- I don't recall at the moment what problems UNKNOWNNUMERIC was intended to solve, but if they're still open issues then it's something we ought to get around to sometime.

regards, tom lane
Re: [PATCHES] [HACKERS] Reducing the overhead of NUMERIC data
On Wed, 2005-11-02 at 15:09 -0500, Tom Lane wrote:
> [ thinks for a moment... ] Actually, neither proposal is going to get
> off the ground, because the parser's handling of numeric constants is
> predicated on the assumption that type NUMERIC can handle any valid
> value of FLOAT8, and so we can get away with converting to NUMERIC on
> sight and then coercing to float later if parse analysis finds out the
> constant should be float. If the dynamic range of NUMERIC is less than
> 10^308 then this fails. So we have to find another bit somewhere, or
> the idea is dead in the water.

We convert a Value node to a Const in backend/parser/parse_node.c:make_const

It seems straightforward enough to put in an additional test, similar to the ones already there, so that if it's too big for a decimal we make it a float straight away - only a float can be that big in that case. After that I can't really see what the problem is?

There is only a single call where numeric_float8() is called anywhere in the code, which is during selectivity calculations. In that case we actually call numeric_float8_no_overflow(). If it's a FLOAT8OID, then we can simply avoid the conversion, since it already would be one.

Can you explain further? Thanks,

Best Regards, Simon Riggs
Re: [PATCHES] [HACKERS] Reducing the overhead of NUMERIC data
On Wed, 2005-11-02 at 15:09 -0500, Tom Lane wrote:
> Simon Riggs <[EMAIL PROTECTED]> writes:
> > I wasn't trying to claim the bit assignment made sense. My point was
> > that the work to mangle the two fields together to make it make sense
> > looked like it would take more CPU (since the standard representation of
> > signed integers is different for +ve and -ve values). It is the extra
> > CPU I'm worried about, not the wasted bits on the weight.
>
> I think that's purely hypothetical. The existing representation, as
> well as the one you propose, both require shift-and-mask operations
> to pull the fields out of the packed format. The format I'm suggesting
> would require some different shift-and-mask operations. As a first
> approximation it'd be a wash, and any actual differences would be
> CPU-specific enough that we shouldn't put a whole lot of weight on the
> point. (C compilers tend to be fairly bright about optimizing field
> extraction operations.)

OK

> Moreover, you've forgotten the basic gating factor here, which is
> whether such a proposal will get accepted at all. Reducing the
> available range from 1000 digits to 255 might pass without too much
> objection, but dropping it down another factor of 4 to 63 starts to
> bring it uncomfortably close to mattering to people.
>
> [ thinks for a moment... ] Actually, neither proposal is going to get
> off the ground, because the parser's handling of numeric constants is
> predicated on the assumption that type NUMERIC can handle any valid
> value of FLOAT8, and so we can get away with converting to NUMERIC on
> sight and then coercing to float later if parse analysis finds out the
> constant should be float. If the dynamic range of NUMERIC is less than
> 10^308 then this fails. So we have to find another bit somewhere, or
> the idea is dead in the water.

Well, that certainly is obscure, I'll give you that. 308 huh?

The middle ground between 64 and 308 is somewhere around 255, yes? :-)

I'll get on it. Including Catch-308.

Best Regards, Simon Riggs
Re: [PATCHES] [HACKERS] Reducing the overhead of NUMERIC data
Simon Riggs <[EMAIL PROTECTED]> writes:
> I wasn't trying to claim the bit assignment made sense. My point was
> that the work to mangle the two fields together to make it make sense
> looked like it would take more CPU (since the standard representation of
> signed integers is different for +ve and -ve values). It is the extra
> CPU I'm worried about, not the wasted bits on the weight.

I think that's purely hypothetical. The existing representation, as well as the one you propose, both require shift-and-mask operations to pull the fields out of the packed format. The format I'm suggesting would require some different shift-and-mask operations. As a first approximation it'd be a wash, and any actual differences would be CPU-specific enough that we shouldn't put a whole lot of weight on the point. (C compilers tend to be fairly bright about optimizing field extraction operations.)

Moreover, you've forgotten the basic gating factor here, which is whether such a proposal will get accepted at all. Reducing the available range from 1000 digits to 255 might pass without too much objection, but dropping it down another factor of 4 to 63 starts to bring it uncomfortably close to mattering to people.

[ thinks for a moment... ] Actually, neither proposal is going to get off the ground, because the parser's handling of numeric constants is predicated on the assumption that type NUMERIC can handle any valid value of FLOAT8, and so we can get away with converting to NUMERIC on sight and then coercing to float later if parse analysis finds out the constant should be float. If the dynamic range of NUMERIC is less than 10^308 then this fails. So we have to find another bit somewhere, or the idea is dead in the water.

regards, tom lane
Re: [PATCHES] [HACKERS] Reducing the overhead of NUMERIC data
On Wed, 2005-11-02 at 13:46 -0500, Tom Lane wrote:
> Simon Riggs <[EMAIL PROTECTED]> writes:
> > On Tue, 2005-11-01 at 17:55 -0500, Tom Lane wrote:
> >> I don't think it'd be worth having 2 types. Remember that the weight is
> >> measured in base-10k digits. Suppose for instance
> >>	sign	1 bit
> >>	weight	7 bits (-64 .. +63)
> >>	dscale	8 bits (0..255)
>
> > I've coded a short patch to do this, which is the result of two
> > alternate patches and some thinking, but maybe not enough yet.
>
> What your patch does is

Thanks for checking this so quickly.

>	sign	2 bits

OK, that's just a mistake in my second patch. That's easily corrected. Please ignore that for now.

>	weight	8 bits (-128..127)
>	dscale	6 bits (0..63)
>
> which is simply pretty lame: weight effectively has a factor of 8 more
> dynamic range than dscale in this representation. What's the point of
> being able to represent 1 * 10000^-128 (ie, 10^-512) if the dscale
> only lets you show 63 fractional digits? You've got to allocate the
> bits in a saner fashion. Yes, that takes a little more work.

I wasn't trying to claim the bit assignment made sense. My point was that the work to mangle the two fields together to make it make sense looked like it would take more CPU (since the standard representation of signed integers is different for +ve and -ve values). It is the extra CPU I'm worried about, not the wasted bits on the weight. Spending CPU cycles on *all* numerics just so we can have numbers with > +/-64 decimal places doesn't seem a good trade. Hence I stuck the numeric sign back on the dscale, and so dscale and weight seem out of balance.

So, AFAICS, the options are:

0. (current cvstip) Numeric range up to 1000, with an additional 2 bytes per column value
1. Numeric range up to 128, but with overhead to extract the last bit
2. Numeric range up to 64

I'm suggesting we choose (2); other views are welcome. (I'll code it whichever way we decide.)

> Also, since the internal (unpacked) calculation representation has a
> much wider dynamic range than this, it'd probably be appropriate to add
> some range checks to the code that forms a packed value from unpacked.

Well, there already is one that does that, otherwise I would have added one as you suggest. (The unpacked code has int values, whereas the previous packed format used u/int16 values.)

Best Regards, Simon Riggs
Re: [PATCHES] [HACKERS] Reducing the overhead of NUMERIC data
Simon Riggs <[EMAIL PROTECTED]> writes:
> On Tue, 2005-11-01 at 17:55 -0500, Tom Lane wrote:
>> I don't think it'd be worth having 2 types. Remember that the weight is
>> measured in base-10k digits. Suppose for instance
>>	sign	1 bit
>>	weight	7 bits (-64 .. +63)
>>	dscale	8 bits (0..255)

> I've coded a short patch to do this, which is the result of two
> alternate patches and some thinking, but maybe not enough yet.

What your patch does is

	sign	2 bits
	weight	8 bits (-128..127)
	dscale	6 bits (0..63)

which is simply pretty lame: weight effectively has a factor of 8 more dynamic range than dscale in this representation. What's the point of being able to represent 1 * 10000^-128 (ie, 10^-512) if the dscale only lets you show 63 fractional digits? You've got to allocate the bits in a saner fashion. Yes, that takes a little more work.

Also, since the internal (unpacked) calculation representation has a much wider dynamic range than this, it'd probably be appropriate to add some range checks to the code that forms a packed value from unpacked.

regards, tom lane
Re: [PATCHES] [HACKERS] Reducing the overhead of NUMERIC data
On Tue, 2005-11-01 at 17:55 -0500, Tom Lane wrote:
> "Jim C. Nasby" <[EMAIL PROTECTED]> writes:
> > FWIW, most databases I've used limit NUMERIC to 38 digits, presumably to
> > fit length info into 1 or 2 bytes. So there's something to be said for a
> > small numeric type that has less overhead and a large numeric (what we
> > have today).
>
> I don't think it'd be worth having 2 types. Remember that the weight is
> measured in base-10k digits. Suppose for instance
>	sign	1 bit
>	weight	7 bits (-64 .. +63)
>	dscale	8 bits (0..255)
> This gives us a dynamic range of 1e-256 to 1e255 as well as the ability
> to represent up to 255 displayable fraction digits. Does anyone know
> any real database applications where that's not enough?
>
> (I'm neglecting NaN here in supposing we need only 1 bit for sign,
> but we could have a special encoding for NaN. Perhaps disallow the
> weight = -64 case and use that to signal NaN.)

I've coded a short patch to do this, which is the result of two alternate patches and some thinking, but maybe not enough yet. The patch given here differs on two counts from the above:

This sets...
	#define NUMERIC_MAX_PRECISION 64
since
	#define NUMERIC_MAX_RESULT_SCALE (NUMERIC_MAX_PRECISION * 2)

We don't seem to be able to use all of the bits actually available to us in the format. Perhaps we need to decouple these now? Previously, we had room for 14 bits, which gave a maximum of 16384. We were using NUMERIC_MAX of 1000, so doubling it didn't give problems.

The above on-disk format showed sign & weight together, whereas the current code has sign and dscale together. Trying to put sign and weight together is somewhat difficult, since weight is itself a signed value. I coded it up that way around, which is reasonably straightforward but harder than the patch enclosed here. But AFAICS - which isn't that far normally, I grant you - doing things that way around would require some twos-complement work to get things correct when weight is negative.

That worries me. IMHO we should accept the step down in maximum numeric precision (down to "only" 64 digits) rather than put extra processing into every manipulation of a NUMERIC datatype. With luck, I've misunderstood and we can have both performance and precision. If not, I commend 64 digits to you as sufficient for every imaginable purpose - saving 2 bytes off every numeric column. (And still 28 decimal places more accurate than Oracle.)

Best Regards, Simon Riggs

Index: src/include/utils/numeric.h
===================================================================
RCS file: /projects/cvsroot/pgsql/src/include/utils/numeric.h,v
retrieving revision 1.20
diff -c -c -r1.20 numeric.h
*** src/include/utils/numeric.h	1 Jan 2005 05:43:09 -	1.20
--- src/include/utils/numeric.h	2 Nov 2005 18:06:03 -
***************
*** 15,24 ****
  #define _PG_NUMERIC_H_
  
  /*
!  * Hardcoded precision limit - arbitrary, but must be small enough that
!  * dscale values will fit in 14 bits.
   */
! #define NUMERIC_MAX_PRECISION		1000
  
  /*
   * Internal limits on the scales chosen for calculation results
--- 15,24 ----
  #define _PG_NUMERIC_H_
  
  /*
!  * Hardcoded precision limit - must be small enough that
!  * dscale values will fit into the number of bits available in NumericData
   */
! #define NUMERIC_MAX_PRECISION		64
  
  /*
   * Internal limits on the scales chosen for calculation results
***************
*** 39,49 ****
  /*
   * Sign values and macros to deal with packing/unpacking n_sign_dscale
   */
! #define NUMERIC_SIGN_MASK	0xC000
! #define NUMERIC_POS			0x0000
! #define NUMERIC_NEG			0x4000
! #define NUMERIC_NAN			0xC000
! #define NUMERIC_DSCALE_MASK	0x3FFF
  #define NUMERIC_SIGN(n)		((n)->n_sign_dscale & NUMERIC_SIGN_MASK)
  #define NUMERIC_DSCALE(n)	((n)->n_sign_dscale & NUMERIC_DSCALE_MASK)
  #define NUMERIC_IS_NAN(n)	(NUMERIC_SIGN(n) != NUMERIC_POS && \
--- 39,49 ----
  /*
   * Sign values and macros to deal with packing/unpacking n_sign_dscale
   */
! #define NUMERIC_SIGN_MASK	0xC0
! #define NUMERIC_POS			0x00
! #define NUMERIC_NEG			0x40
! #define NUMERIC_NAN			0xC0
! #define NUMERIC_DSCALE_MASK	0x3F
  #define NUMERIC_SIGN(n)		((n)->n_sign_dscale & NUMERIC_SIGN_MASK)
  #define NUMERIC_DSCALE(n)	((n)->n_sign_dscale & NUMERIC_DSCALE_MASK)
  #define NUMERIC_IS_NAN(n)	(NUMERIC_SIGN(n) != NUMERIC_POS && \
***************
*** 61,74 ****
  typedef struct NumericData
  {
  	int32		varlen;			/* Variable size (std varlena header) */
! 	int16		n_weight;		/* Weight of 1st digit */
! 	uint16		n_sign_dscale;	/* Sign + display scale */
  	char		n_data[1];		/* Digits (really array of NumericDigit) */
  } NumericData;
  
  typedef NumericData *Numeric;
  
! #define NUMERIC_HDRSZ	(sizeof(int32) + sizeof(int16) + sizeof(uint16))
  
  /*
--- 61,74 ----
  typedef struct NumericData
  {
  	int32		varlen;			/* Variable size (std varlena header) */
! 	int8		n_weight;		/* Weight of 1st dig