Re: [HACKERS] more numeric stuff

2010-08-13 Thread Bruce Momjian
Tom Lane wrote:
  3. 64-bit arithmetic.  Right now, mul_var() and div_var() use int for
  arithmetic, but haven't we given up on supporting platforms without
  long long?  I'm not sure I'm motivated enough to write the patch
  myself, but it seems like 64-bit arithmetic would give us a lot more
  room to postpone carries.
 
 I don't think this would win unless we went to 32-bit NumericDigit,
 which is a problem from the on-disk-compatibility standpoint, not to
 mention making the alignment issues even worse.  Postponing carries is
 good, but we have enough headroom for that already --- I really doubt
 that making the array elements wider would save anything noticeable
 unless you increase NBASE.

Should we be collecting pg_upgrade-breaking changes on the TODO list so
we can implement them in one future release?

-- 
  Bruce Momjian  br...@momjian.us  http://momjian.us
  EnterpriseDB http://enterprisedb.com

  + It's impossible for everything to be true. +

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] more numeric stuff

2010-08-13 Thread Robert Haas
On Fri, Aug 13, 2010 at 1:10 PM, Bruce Momjian br...@momjian.us wrote:
 Tom Lane wrote:
  3. 64-bit arithmetic.  Right now, mul_var() and div_var() use int for
  arithmetic, but haven't we given up on supporting platforms without
  long long?  I'm not sure I'm motivated enough to write the patch
  myself, but it seems like 64-bit arithmetic would give us a lot more
  room to postpone carries.

 I don't think this would win unless we went to 32-bit NumericDigit,
 which is a problem from the on-disk-compatibility standpoint, not to
 mention making the alignment issues even worse.  Postponing carries is
 good, but we have enough headroom for that already --- I really doubt
 that making the array elements wider would save anything noticeable
 unless you increase NBASE.

 Should we be collecting pg_upgrade-breaking changes on the TODO list so
 we can implement them in one future release?

Possibly, but I don't think we want to do this one even if we WERE
willing to break pg_upgrade.  Increasing NBASE would be a complete
disaster in terms of Numeric on-disk footprint, which - even with the
changes I just implemented - is already uncomfortably high.
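The footprint concern can be made concrete by counting digit-array bytes. This is a rough sketch under stated assumptions: it counts only the digit array (no headers), it ignores numeric.c's stripping of all-zero leading and trailing NBASE digits, and the NBASE = 10^8 layout with 32-bit digits is purely hypothetical. The DigitFormat type and helper names are invented for illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Digits are grouped dec_digits decimal digits per NumericDigit,
 * with the groups aligned on the decimal point. */
typedef struct
{
    int    dec_digits;  /* decimal digits per NumericDigit */
    size_t digit_size;  /* bytes per NumericDigit */
} DigitFormat;

static const DigitFormat NBASE_1E4 = {4, 2};  /* current: NBASE 10000, int16 */
static const DigitFormat NBASE_1E8 = {8, 4};  /* hypothetical: NBASE 10^8, int32 */

static int ceil_div(int a, int b)
{
    return (a + b - 1) / b;
}

/* Bytes in the digit array for a value with int_digits decimal digits
 * before the point and frac_digits after it (sketch: no zero stripping). */
static size_t digit_array_bytes(DigitFormat f, int int_digits, int frac_digits)
{
    int ndigits = ceil_div(int_digits, f.dec_digits) +
                  ceil_div(frac_digits, f.dec_digits);

    return (size_t) ndigits * f.digit_size;
}
```

For 1.5 the wider layout doubles the digit array (4 bytes to 8); for 12345.67 it grows from 6 bytes to 8.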

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise Postgres Company


[HACKERS] more numeric stuff

2010-08-04 Thread Robert Haas
I have a couple ideas for further work on the numeric code that I want
to get feedback on.

1. Cramming it down some more.  I propose that we introduce a third
format with a one-byte header: 1 bit for sign, 3 bits for dynamic
scale, and 4 bits for weight (the first of which is a sign bit).  This
might seem crazy, but a weight between -8 and +7 is still enough to
represent a number with up to 32 digits before the decimal point (8
NBASE digits of 4 decimal digits each) and up to 7 decimal places,
which covers a lot of ground.  And if you've got a
billion rows on disk with several numeric values in each row, saving a
byte per value starts to be significant.  We don't need any special
marker to indicate that the 1-byte format is in use, because we can
deduce it from the length of the varlena (after excluding the header):
even = 2b or 4b header, odd = 1b header.  There can't be any
odd-length numerics already on disk, so there shouldn't be any
compatibility break for pg_upgrade to worry about.
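A minimal, hypothetical sketch of how such a header could be packed and decoded. The bit positions, the SHORT1_* macro names and the helper functions are invented for illustration (the proposal only fixes the field widths); the weight is assumed to be a 4-bit two's-complement field, and DEC_DIGITS = 4 assumes NBASE = 10000. The last helper shows the length-parity test used to tell the formats apart.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical bit layout for the proposed 1-byte numeric header:
 *   bit 7     sign (1 = negative)
 *   bits 4-6  display scale, 0..7
 *   bits 0-3  weight, 4-bit two's complement, -8..+7 */
#define SHORT1_SIGN_MASK    0x80
#define SHORT1_DSCALE_SHIFT 4
#define SHORT1_DSCALE_MASK  0x70
#define SHORT1_WEIGHT_MASK  0x0F
#define SHORT1_WEIGHT_MAX   7
#define DEC_DIGITS          4   /* decimal digits per NumericDigit (NBASE 10000) */

static uint8_t pack_header(int negative, int dscale, int weight)
{
    assert(dscale >= 0 && dscale <= 7);
    assert(weight >= -8 && weight <= SHORT1_WEIGHT_MAX);
    return (uint8_t) ((negative ? SHORT1_SIGN_MASK : 0) |
                      (dscale << SHORT1_DSCALE_SHIFT) |
                      (weight & SHORT1_WEIGHT_MASK));
}

static int header_negative(uint8_t h)
{
    return (h & SHORT1_SIGN_MASK) != 0;
}

static int header_dscale(uint8_t h)
{
    return (h & SHORT1_DSCALE_MASK) >> SHORT1_DSCALE_SHIFT;
}

static int header_weight(uint8_t h)
{
    int w = h & SHORT1_WEIGHT_MASK;

    return (w & 0x08) ? w - 16 : w;   /* sign-extend the 4-bit field */
}

/* Weights 0..7 give 8 NBASE digits of DEC_DIGITS decimal digits each. */
static int max_integer_digits(void)
{
    return (SHORT1_WEIGHT_MAX + 1) * DEC_DIGITS;
}

/* payload_len = varlena length minus the varlena header.  A 2- or 4-byte
 * numeric header plus 2-byte digits always gives an even payload, so an
 * odd payload can only mean the 1-byte format. */
static int uses_one_byte_header(size_t payload_len)
{
    return payload_len % 2 == 1;
}
```

For example, pack_header(1, 2, -3) yields 0xAD, which decodes back to a negative value with dscale 2 and weight -3.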

2. Don't untoast/don't copy.  Right now, given a numeric stored as a
short varlena (the normal case if it's coming from on disk), we
untoast it before doing anything, and then we copy the digits into a
separate palloc'd digit buffer.  Copying the data twice is clearly a
waste.  It seems that very few of the var-manipulation functions in
numeric.c actually scribble on their input (exceptions I've found so
far: round_var, trunc_var, strip_var).  So, when translating a Numeric
into a NumericVar (set_var_from_num), we could potentially skip
allocating the digit buffer if the digit string in the Numeric is
already allocated, and teach the few functions that need to scribble
on their input to force the buffer to be allocated if it hasn't been
yet.  I'm not too sure whether this is worth the trouble; a quick test this
morning suggested that such a patch would not be too difficult to
write, but on the other hand the performance gain was pretty small.
Another, not necessarily mutually exclusive option would be to try to
operate directly on the packed format.  That looks like it would
require some fairly major surgery; I'm not sure what we would do with
the many copies of this code:

Numeric num = PG_GETARG_NUMERIC(0);
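A minimal copy-on-write sketch of the "don't copy" idea. It assumes a simplified, hypothetical LazyVar in place of the real NumericVar, and uses malloc/free where the backend would use palloc/pfree: the variable borrows the packed digit array, and only a routine that writes to it pays for a private copy.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

typedef int16_t NumericDigit;

/* Hypothetical, simplified stand-in for numeric.c's NumericVar. */
typedef struct
{
    int                 ndigits;
    const NumericDigit *digits; /* borrowed from the packed value until a write */
    NumericDigit       *buf;    /* private copy; NULL while still borrowing */
} LazyVar;

/* Borrow the packed digit array: no allocation, no memcpy. */
static void lazy_var_init(LazyVar *v, const NumericDigit *packed, int ndigits)
{
    v->ndigits = ndigits;
    v->digits = packed;
    v->buf = NULL;
}

/* The few routines that scribble on their input would call this first. */
static NumericDigit *lazy_var_make_writable(LazyVar *v)
{
    if (v->buf == NULL)
    {
        size_t n = (size_t) (v->ndigits > 0 ? v->ndigits : 1);

        v->buf = malloc(sizeof(NumericDigit) * n);
        if (v->buf == NULL)
            abort();
        if (v->ndigits > 0)
            memcpy(v->buf, v->digits, sizeof(NumericDigit) * (size_t) v->ndigits);
        v->digits = v->buf;
    }
    return v->buf;
}

/* Example scribbler (hypothetical): overwrite one digit in place. */
static void lazy_var_set_digit(LazyVar *v, int i, NumericDigit value)
{
    assert(i >= 0 && i < v->ndigits);
    lazy_var_make_writable(v)[i] = value;
}

static void lazy_var_free(LazyVar *v)
{
    free(v->buf);
    v->buf = NULL;
}
```

Borrowing like this is only safe when the packed digits happen to be int16-aligned, which is the obstacle raised in the replies.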

3. 64-bit arithmetic.  Right now, mul_var() and div_var() use int for
arithmetic, but haven't we given up on supporting platforms without
long long?  I'm not sure I'm motivated enough to write the patch
myself, but it seems like 64-bit arithmetic would give us a lot more
room to postpone carries.
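To illustrate what "room to postpone carries" means, here is a hypothetical schoolbook multiply over base-NBASE digits that sums columns in int64 and propagates carries once at the end. It is not numeric.c's mul_var(), which roughly speaking keeps int accumulators and propagates carries whenever a tracked upper bound on the column values threatens to overflow. The helper names and the MAX_RESULT_DIGITS limit are assumptions of this sketch.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define NBASE             10000
#define MAX_RESULT_DIGITS 64   /* sketch limit on na + nb */

/* How many worst-case digit products, (nbase-1)^2, an accumulator can
 * absorb before a carry pass is required. */
static int64_t products_before_carry(int64_t nbase, int64_t acc_max)
{
    return acc_max / ((nbase - 1) * (nbase - 1));
}

/* Multiply digit arrays stored most significant digit first, as numeric.c
 * does.  result needs room for na + nb digits.  Column sums never carry
 * inside the loop; one carry pass runs at the end.  Valid while
 * min(na, nb) <= products_before_carry(NBASE, INT64_MAX). */
static void mul_digits64(const int16_t *a, int na, const int16_t *b, int nb,
                         int16_t *result)
{
    int64_t acc[MAX_RESULT_DIGITS];
    int64_t carry = 0;
    int     n = na + nb;

    assert(n <= MAX_RESULT_DIGITS);
    memset(acc, 0, sizeof(acc));

    /* a[i] * b[j] lands in result position i + j + 1 */
    for (int i = 0; i < na; i++)
        for (int j = 0; j < nb; j++)
            acc[i + j + 1] += (int64_t) a[i] * b[j];

    /* single carry pass, least significant position first */
    for (int k = n - 1; k >= 0; k--)
    {
        int64_t v = acc[k] + carry;

        carry = v / NBASE;
        result[k] = (int16_t) (v % NBASE);
    }
    assert(carry == 0);
}
```

products_before_carry(10000, INT32_MAX) is 21, so an int accumulator already absorbs about 21 worst-case products between carry passes; with NBASE = 10^8 even an int64 accumulator absorbs only 922, which is the headroom trade-off discussed in the reply.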

OK, time to duck.  Thoughts?

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise Postgres Company


Re: [HACKERS] more numeric stuff

2010-08-04 Thread Tom Lane
Robert Haas robertmh...@gmail.com writes:
 I have a couple ideas for further work on the numeric code that I want
 to get feedback on.

 1. Cramming it down some more.  I propose that we introduce a third
 format with a one-byte header: 1 bit for sign, 3 bits for dynamic
 scale, and 4 bits for weight (the first of which is a sign bit).  This
 might seem crazy,

Yes, it does.  In the first place it isn't going to work conveniently
because NumericDigit requires int16 alignment.  In the second, shaving
just one byte doesn't seem like enough win to be worth the trouble.
I don't believe your billion rows argument because you aren't
factoring in the result of row-level alignment padding --- most of the
time you're not going to win anything.

 We don't need any special
 marker to indicate that the 1-byte format is in use, because we can
 deduce it from the length of the varlena (after excluding the header):
 even = 2b or 4b header, odd = 1b header.  There can't be any
 odd-length numerics already on disk, so there shouldn't be any
 compatibility break for pg_upgrade to worry about.

Really?  Not sure this is true, because numerics can be toast-compressed.
It hardly ever happens, but to do this that's not good enough.

 2. Don't untoast/don't copy.

This would be good, but I'm not sure how to do it.  The main problem
again is NumericDigit alignment.  Only about half the time is the digit
array going to be aligned the way you need, so that puts a real crimp
in the possible win.  (In fact, if we assume the previous field is more
than byte aligned and the toast header is one byte, then the digit array
is *never* properly aligned on disk :-()

One possibility is to have an additional toasting rule that forces
odd-byte-alignment of a field's one-byte header.  But it's a bit hard to
argue that numeric deserves the additional overhead that that would put
into all the core tuple forming/deforming logic.
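The alignment arithmetic can be checked with a few lines of C. The helpers are hypothetical; they assume the field starts at an even offset (the previous field is "more than byte aligned"), a 1-byte toast header, and the current 2-byte numeric header. pad_for_odd_header() models the suggested extra toasting rule.

```c
#include <assert.h>
#include <stddef.h>

/* Byte offset of the NumericDigit array for a numeric whose varlena starts
 * at field_start, given the toast (varlena) and numeric header sizes. */
static size_t digit_array_offset(size_t field_start, size_t varlena_hdr,
                                 size_t numeric_hdr)
{
    return field_start + varlena_hdr + numeric_hdr;
}

/* NumericDigit is int16, so the digit array needs an even offset. */
static int digits_int16_aligned(size_t offset)
{
    return offset % 2 == 0;
}

/* Hypothetical extra toasting rule: insert one padding byte when needed so
 * that a 1-byte varlena header always starts at an odd offset. */
static size_t pad_for_odd_header(size_t off)
{
    return (off % 2 == 0) ? 1 : 0;
}
```

The byte returned by pad_for_odd_header() is the extra alignment padding that the follow-up message objects to.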

 3. 64-bit arithmetic.  Right now, mul_var() and div_var() use int for
 arithmetic, but haven't we given up on supporting platforms without
 long long?  I'm not sure I'm motivated enough to write the patch
 myself, but it seems like 64-bit arithmetic would give us a lot more
 room to postpone carries.

I don't think this would win unless we went to 32-bit NumericDigit,
which is a problem from the on-disk-compatibility standpoint, not to
mention making the alignment issues even worse.  Postponing carries is
good, but we have enough headroom for that already --- I really doubt
that making the array elements wider would save anything noticeable
unless you increase NBASE.

regards, tom lane


Re: [HACKERS] more numeric stuff

2010-08-04 Thread Robert Haas
On Wed, Aug 4, 2010 at 4:07 PM, Tom Lane t...@sss.pgh.pa.us wrote:
 Robert Haas robertmh...@gmail.com writes:
 I have a couple ideas for further work on the numeric code that I want
 to get feedback on.

 1. Cramming it down some more.  I propose that we introduce a third
 format with a one-byte header: 1 bit for sign, 3 bits for dynamic
 scale, and 4 bits for weight (the first of which is a sign bit).  This
 might seem crazy,

 Yes, it does.  In the first place it isn't going to work conveniently
 because NumericDigit requires int16 alignment.

It is definitely not convenient.  I'm not disputing that.

 In the second, shaving
 just one byte doesn't seem like enough win to be worth the trouble.
 I don't believe your billion rows argument because you aren't
 factoring in the result of row-level alignment padding --- most of the
 time you're not going to win anything.

Row-level alignment padding is a problem, and on very short rows, or
rows where numeric is the only varlena, you may see no benefit.  But
if there are multiple text or numeric columns packed up next to each
other, things are more promising.
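Both sides of the padding argument can be reproduced with a toy layout calculation. Assumptions: only the data area is modelled (no tuple header or null bitmap), MAXALIGN is taken to be 8, a short-header numeric is stored 1-byte aligned, and the numeric 1.5 occupies 1 (varlena header) + 2 (numeric header) + 4 (two digits) = 7 bytes, or 6 bytes with the proposed 1-byte header. align_up() is similar in spirit to PostgreSQL's TYPEALIGN macro but is written out here; the Column type is hypothetical.

```c
#include <assert.h>
#include <stddef.h>

#define TOY_MAXALIGN 8   /* assumed; typical for 64-bit builds */

/* Round off up to a multiple of align, which must be a power of two. */
static size_t align_up(size_t off, size_t align)
{
    return (off + align - 1) & ~(align - 1);
}

typedef struct
{
    size_t size;    /* bytes the stored value occupies */
    size_t align;   /* 1 for short-header varlenas, 4 for int4, ... */
} Column;

/* Toy model of a row's data area: align each column, then pad the total
 * to TOY_MAXALIGN. */
static size_t row_data_size(const Column *cols, int ncols)
{
    size_t off = 0;

    for (int i = 0; i < ncols; i++)
        off = align_up(off, cols[i].align) + cols[i].size;
    return align_up(off, TOY_MAXALIGN);
}
```

With one numeric followed by an int4, both header sizes round to 16 bytes, so the saved byte disappears; with two adjacent numerics before the int4, the smaller header brings the row from 24 bytes to 16.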

 We don't need any special
 marker to indicate that the 1-byte format is in use, because we can
 deduce it from the length of the varlena (after excluding the header):
 even = 2b or 4b header, odd = 1b header.  There can't be any
 odd-length numerics already on disk, so there shouldn't be any
 compatibility break for pg_upgrade to worry about.

 Really?  Not sure this is true, because numerics can be toast-compressed.
 It hardly ever happens, but to do this that's not good enough.

I was thinking of it like PG_GETARG_TEXT_PP() and similar - we would
detoast compressed and external datums, but leave packed ones as-is.
At that point you should have an accurate length count, and can decide
what to do.
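A standalone sketch of that decision: classify the varlena header, refuse (return -1) for compressed or external datums that must be detoasted first, and only then apply the odd/even test. The bit patterns mirror the little-endian layouts described in postgres.h (xxxxxx00, xxxxxx10, 00000001, xxxxxxx1); big-endian machines put the flag bits at the other end, and real code would use macros such as VARSIZE_ANY_EXHDR() together with pg_detoast_datum_packed() rather than decoding bytes by hand. The enum and function names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef enum
{
    VL_PLAIN_4B,     /* xxxxxx00: 4-byte header, uncompressed */
    VL_COMPRESSED,   /* xxxxxx10: 4-byte header, compressed */
    VL_EXTERNAL,     /* 00000001: 1-byte header, TOAST pointer */
    VL_SHORT_1B      /* xxxxxxx1: 1-byte header, uncompressed */
} VarlenaKind;

static VarlenaKind classify(const unsigned char *p)
{
    if (p[0] == 0x01)
        return VL_EXTERNAL;
    if (p[0] & 0x01)
        return VL_SHORT_1B;
    return (p[0] & 0x03) == 0x02 ? VL_COMPRESSED : VL_PLAIN_4B;
}

/* Payload bytes after the varlena header, or -1 if the datum must be
 * detoasted before its length means anything. */
static long payload_length(const unsigned char *p)
{
    switch (classify(p))
    {
        case VL_SHORT_1B:
            return (long) ((p[0] >> 1) & 0x7F) - 1;
        case VL_PLAIN_4B:
        {
            /* assemble the little-endian length word byte by byte */
            uint32_t word = (uint32_t) p[0] | ((uint32_t) p[1] << 8) |
                            ((uint32_t) p[2] << 16) | ((uint32_t) p[3] << 24);

            return (long) ((word >> 2) & 0x3FFFFFFF) - 4;
        }
        default:
            return -1;   /* compressed or external */
    }
}

/* Under the proposal: odd payload means a 1-byte numeric header. */
static int has_one_byte_numeric_header(const unsigned char *p)
{
    long len = payload_length(p);

    assert(len >= 0);   /* caller must detoast compressed/external first */
    return len % 2 == 1;
}
```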

 2. Don't untoast/don't copy.

 This would be good, but I'm not sure how to do it.  The main problem
 again is NumericDigit alignment.  Only about half the time is the digit
 array going to be aligned the way you need, so that puts a real crimp
 in the possible win.  (In fact, if we assume the previous field is more
 than byte aligned and the toast header is one byte, then the digit array
 is *never* properly aligned on disk :-()

This is another reason why I think a 1-byte numeric header would be
good to have.

 One possibility is to have an additional toasting rule that forces
 odd-byte-alignment of a field's one-byte header.  But it's a bit hard to
 argue that numeric deserves the additional overhead that that would put
 into all the core tuple forming/deforming logic.

Yeah, plus we'd be adding more alignment padding for an extremely
tenuous performance gain.  The benchmarks I did this morning seem to
indicate that the extra palloc/pfree/memcpy overhead is only barely
more than zero, so it only makes sense if we can get it without
suffering other penalties.

 3. 64-bit arithmetic.  Right now, mul_var() and div_var() use int for
 arithmetic, but haven't we given up on supporting platforms without
 long long?  I'm not sure I'm motivated enough to write the patch
 myself, but it seems like 64-bit arithmetic would give us a lot more
 room to postpone carries.

 I don't think this would win unless we went to 32-bit NumericDigit,
 which is a problem from the on-disk-compatibility standpoint,

This would increase the average size of a Numeric value considerably,
so it would be a very BAD thing IMO.

 not to
 mention making the alignment issues even worse.  Postponing carries is
 good, but we have enough headroom for that already --- I really doubt
 that making the array elements wider would save anything noticeable
 unless you increase NBASE.

I dunno, it was just a thought, based on some quick benchmarking that
indicated some possible hotspots in that area.  But I didn't test it
carefully enough to be sure.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise Postgres Company


Re: [HACKERS] more numeric stuff

2010-08-04 Thread Tom Lane
Robert Haas robertmh...@gmail.com writes:
 On Wed, Aug 4, 2010 at 4:07 PM, Tom Lane t...@sss.pgh.pa.us wrote:
 This would be good, but I'm not sure how to do it.  The main problem
 again is NumericDigit alignment.  Only about half the time is the digit
 array going to be aligned the way you need, so that puts a real crimp
 in the possible win.  (In fact, if we assume the previous field is more
 than byte aligned and the toast header is one byte, then the digit array
 is *never* properly aligned on disk :-(

 This is another reason why I think a 1-byte numeric header would be
 good to have.

Hmm.  That's a good point --- 1-byte toast header plus 1-byte numeric
header would leave you correctly aligned, anytime the previous field
didn't end on an odd byte boundary.  So maybe the combination of both
things would have enough synergy to be worth the trouble.  Still,
it seems like it'd be quite messy to deal with 1-byte header followed
by NumericDigits without any padding ... there'd be no way to declare
that as a C struct, for sure.  Have you got a plan for what this would
actually look like in code?
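The point that the layout cannot be declared as a C struct is easy to confirm: a naive struct with a uint8_t header followed by a NumericDigit flexible array member gets a padding byte, so the digits start at offset 2 rather than 1. The struct and helper names are hypothetical, and the offset of 2 assumes a mainstream ABI where int16_t is 2-byte aligned.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef int16_t NumericDigit;

/* Naive attempt to declare "1-byte numeric header immediately followed
 * by NumericDigits". */
typedef struct
{
    uint8_t      header;
    NumericDigit digits[];   /* C99 flexible array member */
} NaiveOneByteNumeric;

/* Where the proposal wants the digits versus where the compiler puts them. */
static size_t proposed_digit_offset(void)
{
    return 1;
}

static size_t struct_digit_offset(void)
{
    return offsetof(NaiveOneByteNumeric, digits);
}
```

GCC and Clang can force offset 1 with `__attribute__((packed))`, but that is a compiler extension rather than standard C, and the digits are then genuinely misaligned.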

Also, maybe this idea should supersede the one with two-byte numeric
header.  I'm not sure it's worth having three variants, and we are
not at all committed to the two-byte version yet.

 I don't think this would win unless we went to 32-bit NumericDigit,
 which is a problem from the on-disk-compatibility standpoint,

 This would increase the average size of a Numeric value considerably,
 so it would be a very BAD thing IMO.

Oh, I certainly wasn't advocating for doing that ;-)

regards, tom lane


Re: [HACKERS] more numeric stuff

2010-08-04 Thread Robert Haas
On Wed, Aug 4, 2010 at 7:27 PM, Tom Lane t...@sss.pgh.pa.us wrote:
 Robert Haas robertmh...@gmail.com writes:
 On Wed, Aug 4, 2010 at 4:07 PM, Tom Lane t...@sss.pgh.pa.us wrote:
 This would be good, but I'm not sure how to do it.  The main problem
 again is NumericDigit alignment.  Only about half the time is the digit
 array going to be aligned the way you need, so that puts a real crimp
 in the possible win.  (In fact, if we assume the previous field is more
 than byte aligned and the toast header is one byte, then the digit array
 is *never* properly aligned on disk :-(

 This is another reason why I think a 1-byte numeric header would be
 good to have.

 Hmm.  That's a good point --- 1-byte toast header plus 1-byte numeric
 header would leave you correctly aligned, anytime the previous field
 didn't end on an odd byte boundary.  So maybe the combination of both
 things would have enough synergy to be worth the trouble.  Still,
 it seems like it'd be quite messy to deal with 1-byte header followed
 by NumericDigits without any padding ... there'd be no way to declare
 that as a C struct, for sure.  Have you got a plan for what this would
 actually look like in code?

No.  I was hoping you'd have a brilliant idea.  Generally, I think
we'd need to treat a Numeric as essentially a void * and probably
lose the special cases that try to operate directly on the packed
format. That would allow us to confine the knowledge of the multiple
header formats to the pack/unpack functions (set_var_from_num and
make_result).
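A sketch of what confining the header knowledge to the pack/unpack pair could look like, treating the stored value as raw bytes. pack_short1() and unpack_short1() are hypothetical stand-ins for make_result() and set_var_from_num(); memcpy() is used so that no int16 is ever read through a misaligned pointer, and the header byte is passed through opaquely.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

typedef int16_t NumericDigit;

/* Pack side: 1-byte header, then digits at byte offset 1 with no padding.
 * Returns the payload length written. */
static size_t pack_short1(void *dst, uint8_t header,
                          const NumericDigit *digits, int ndigits)
{
    unsigned char *p = dst;

    p[0] = header;
    memcpy(p + 1, digits, sizeof(NumericDigit) * (size_t) ndigits);
    return 1 + sizeof(NumericDigit) * (size_t) ndigits;
}

/* Unpack side: copy the possibly misaligned digits into an aligned buffer.
 * Returns the number of digits. */
static int unpack_short1(const void *src, size_t payload_len, uint8_t *header,
                         NumericDigit *out, int max_digits)
{
    const unsigned char *p = src;
    int ndigits;

    assert(payload_len >= 1);
    ndigits = (int) ((payload_len - 1) / sizeof(NumericDigit));
    assert(ndigits <= max_digits);
    *header = p[0];
    memcpy(out, p + 1, sizeof(NumericDigit) * (size_t) ndigits);
    return ndigits;
}
```

The memcpy into an aligned buffer is exactly the copy that idea 2 hoped to avoid, so this layout trades that optimization for a smaller on-disk format.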

 Also, maybe this idea should supersede the one with two-byte numeric
 header.  I'm not sure it's worth having three variants, and we are
 not at all committed to the two-byte version yet.

It's a thought, but let's not get ahead of ourselves.  The code for
the two-byte header is done, tested, reviewed, and committed,
whereas the code for the one-byte header is vaporware and full of
difficulties.  Furthermore, let's not kid ourselves: a broad range of
useful values can be represented using a one-byte header, but to need
a four-byte header instead of a two-byte header you need to be doing
something fairly ridiculous.  Even if the one-byte header thing gets
implemented, I don't think it makes sense to give back 2 bytes on all
the fine things that can be represented with a two-byte header for
some tenuous code complexity benefit.

 I don't think this would win unless we went to 32-bit NumericDigit,
 which is a problem from the on-disk-compatibility standpoint,

 This would increase the average size of a Numeric value considerably,
 so it would be a very BAD thing IMO.

 Oh, I certainly wasn't advocating for doing that ;-)

Oh, good.  :-)

Making this smaller took too much work for me to even consider doing
*anything* that might make it bigger.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise Postgres Company
