Re: [PATCHES] WIP: rewrite numeric division

2007-09-13 Thread Bruce Momjian

This has been saved for the 8.4 release:

http://momjian.postgresql.org/cgi-bin/pgpatches_hold

---

Tom Lane wrote:
 Michael Paesold [EMAIL PROTECTED] writes:
  Please, let's revisit this, and not postpone it without further 
  discussion. I never knew about the correctness issues in div_var(), but 
  since I know about it, I feel I am just waiting until that problem will 
  hit me or anyone else.
 
 Yeah.  I was basically waiting to see if anyone could come up with a
 faster solution.  Since no one seems to have an idea how to do it
 better, I'm inclined to apply the patch for 8.3.
 
   regards, tom lane

-- 
  Bruce Momjian  [EMAIL PROTECTED]  http://momjian.us
  EnterpriseDB   http://www.enterprisedb.com

  + If your life is a hard drive, Christ can be your backup. +

---(end of broadcast)---
TIP 4: Have you searched our list archives?

   http://archives.postgresql.org


Re: [PATCHES] WIP: rewrite numeric division

2007-07-17 Thread Michael Paesold
Please, let's revisit this, and not postpone it without further 
discussion. I never knew about the correctness issues in div_var(), but 
since I know about it, I feel I am just waiting until that problem will 
hit me or anyone else.
So can you, Tom, please describe in what situations the old code is 
really unsafe?


We usually *round* all values to at maximum 4 decimal places -- are we 
on the save side? Does this only affect high precision division, or any 
divisions?


Best Regards
Michael Paesold

Bruce Momjian wrote:

Because this has not been applied, this has been saved for the 8.4 release:

http://momjian.postgresql.org/cgi-bin/pgpatches_hold

---

Tom Lane wrote:

I wrote:

I just blew the dust off my old copy of Knuth vol 2, and see that his
algorithm for multi-precision division generates output digits that are
correct to start with (or at least he never needs to revisit a digit
after moving on to the next).  ISTM we should go over to an approach
like that.

The attached proposed patch rewrites div_var() using Knuth's algorithm,
meaning that division should always produce an exact truncated output
when asked to truncate at X number of places.  This passes regression
tests and fixes both of the cases previously exhibited:
http://archives.postgresql.org/pgsql-bugs/2007-06/msg00068.php
http://archives.postgresql.org/pgsql-general/2005-05/msg01109.php

The bad news is that it's significantly slower than the CVS-HEAD code;
it appears that for long inputs, div_var is something like 4X slower
than before, depending on platform.  The numeric_big regression test
takes about twice as long as before on one of my machines, and 50%
longer on another.  This is because the innermost loop now involves
integer division, which it didn't before.  (According to oprofile,
just about all the time goes into the loop that subtracts qhat * divisor
from the working dividend, which is what you'd expect.)

Now it's unlikely that real-world applications are going to be as
dependent on the speed of div_var for long inputs as numeric_big is.
And getting the right answer has to take priority over speed anyway.
Still this is a bit annoying.  Anyone see a way to speed it up, or
have another approach?

regards, tom lane



Content-Description: numeric-div.patch.gz

[ Type application/octet-stream treated as attachment, skipping... ]



---(end of broadcast)---
TIP 9: In versions below 8.0, the planner will ignore your desire to
  choose an index scan if your joining column's datatypes do not
  match


Re: [PATCHES] WIP: rewrite numeric division

2007-07-17 Thread Tom Lane
Michael Paesold [EMAIL PROTECTED] writes:
 Please, let's revisit this, and not postpone it without further 
 discussion. I never knew about the correctness issues in div_var(), but 
 since I know about it, I feel I am just waiting until that problem will 
 hit me or anyone else.

Yeah.  I was basically waiting to see if anyone could come up with a
faster solution.  Since no one seems to have an idea how to do it
better, I'm inclined to apply the patch for 8.3.

regards, tom lane

---(end of broadcast)---
TIP 4: Have you searched our list archives?

   http://archives.postgresql.org


Re: [PATCHES] WIP: rewrite numeric division

2007-07-17 Thread Gregory Stark
Tom Lane [EMAIL PROTECTED] writes:

 Michael Paesold [EMAIL PROTECTED] writes:
 Please, let's revisit this, and not postpone it without further 
 discussion. I never knew about the correctness issues in div_var(), but 
 since I know about it, I feel I am just waiting until that problem will 
 hit me or anyone else.

 Yeah.  I was basically waiting to see if anyone could come up with a
 faster solution.  Since no one seems to have an idea how to do it
 better, I'm inclined to apply the patch for 8.3.

My only reservation is that I don't have the numeric methods background to
tell if it's really necessary. My fix does resolve the only actual documented
inaccuracy in the existing method.

-- 
  Gregory Stark
  EnterpriseDB  http://www.enterprisedb.com


---(end of broadcast)---
TIP 4: Have you searched our list archives?

   http://archives.postgresql.org


Re: [PATCHES] WIP: rewrite numeric division

2007-07-17 Thread Tom Lane
Gregory Stark [EMAIL PROTECTED] writes:
 Tom Lane [EMAIL PROTECTED] writes:
 Yeah.  I was basically waiting to see if anyone could come up with a
 faster solution.  Since no one seems to have an idea how to do it
 better, I'm inclined to apply the patch for 8.3.

 My only reservation is that I don't have the numeric methods background to
 tell if it's really necessary. My fix does resolve the only actual documented
 inaccuracy in the existing method.

Well, this doesn't take a lot of numerical methods background: the
fundamental problem is that the existing code generates an *approximate*
answer, whereas people who are doing div and mod on large integers tend
to expect an *exact* answer.  Approximate doesn't cut it --- there will
always be cases where an off-by-one-in-the-last-internal-place error can
carry far enough to the left to be visible to the user.

regards, tom lane

---(end of broadcast)---
TIP 7: You can help support the PostgreSQL project by donating at

http://www.postgresql.org/about/donate


Re: [PATCHES] WIP: rewrite numeric division

2007-07-17 Thread Gregory Stark
Tom Lane [EMAIL PROTECTED] writes:

 Gregory Stark [EMAIL PROTECTED] writes:
 Tom Lane [EMAIL PROTECTED] writes:
 Yeah.  I was basically waiting to see if anyone could come up with a
 faster solution.  Since no one seems to have an idea how to do it
 better, I'm inclined to apply the patch for 8.3.

 My only reservation is that I don't have the numeric methods background to
 tell if it's really necessary. My fix does resolve the only actual documented
 inaccuracy in the existing method.

 Well, this doesn't take a lot of numerical methods background: the
 fundamental problem is that the existing code generates an *approximate*
 answer, whereas people who are doing div and mod on large integers tend
 to expect an *exact* answer.  Approximate doesn't cut it --- there will
 always be cases where an off-by-one-in-the-last-internal-place error can
 carry far enough to the left to be visible to the user.

So does your code behave differently for this or does it round off to the
arbitrary division precision before hitting trunc?

postgres=# select trunc(::numeric / 10::numeric);
trunc 
--
 1000
(1 row)

-- 
  Gregory Stark
  EnterpriseDB  http://www.enterprisedb.com


---(end of broadcast)---
TIP 6: explain analyze is your friend


Re: [PATCHES] WIP: rewrite numeric division

2007-07-17 Thread Tom Lane
Gregory Stark [EMAIL PROTECTED] writes:
 Tom Lane [EMAIL PROTECTED] writes:
 Well, this doesn't take a lot of numerical methods background: the
 fundamental problem is that the existing code generates an *approximate*
 answer, whereas people who are doing div and mod on large integers tend
 to expect an *exact* answer.  Approximate doesn't cut it --- there will
 always be cases where an off-by-one-in-the-last-internal-place error can
 carry far enough to the left to be visible to the user.

 So does your code behave differently for this or does it round off to the
 arbitrary division precision before hitting trunc?

 postgres=# select trunc(::numeric / 10::numeric);
 trunc 
 --
  1000
 (1 row)

No, my proposed patch doesn't change that.  It might be that we should
provide an integer division operator for NUMERIC, so that you can get
at the exact result of trunc(x/y).

regards, tom lane

---(end of broadcast)---
TIP 4: Have you searched our list archives?

   http://archives.postgresql.org


Re: [PATCHES] WIP: rewrite numeric division

2007-07-17 Thread Gregory Stark
Tom Lane [EMAIL PROTECTED] writes:

 No, my proposed patch doesn't change that.  It might be that we should
 provide an integer division operator for NUMERIC, so that you can get
 at the exact result of trunc(x/y).

I was also thinking that if the denominator had only factors of 2 and 5 we
could calculate the precision to be precisely enough to maintain the original
precision. Ie, /1000 should just give you n.nnn not n.nnn and more
importantly it should never round.

-- 
  Gregory Stark
  EnterpriseDB  http://www.enterprisedb.com


---(end of broadcast)---
TIP 9: In versions below 8.0, the planner will ignore your desire to
   choose an index scan if your joining column's datatypes do not
   match


Re: [PATCHES] WIP: rewrite numeric division

2007-07-16 Thread Bruce Momjian

Because this has not been applied, this has been saved for the 8.4 release:

http://momjian.postgresql.org/cgi-bin/pgpatches_hold

---

Tom Lane wrote:
 I wrote:
  I just blew the dust off my old copy of Knuth vol 2, and see that his
  algorithm for multi-precision division generates output digits that are
  correct to start with (or at least he never needs to revisit a digit
  after moving on to the next).  ISTM we should go over to an approach
  like that.
 
 The attached proposed patch rewrites div_var() using Knuth's algorithm,
 meaning that division should always produce an exact truncated output
 when asked to truncate at X number of places.  This passes regression
 tests and fixes both of the cases previously exhibited:
 http://archives.postgresql.org/pgsql-bugs/2007-06/msg00068.php
 http://archives.postgresql.org/pgsql-general/2005-05/msg01109.php
 
 The bad news is that it's significantly slower than the CVS-HEAD code;
 it appears that for long inputs, div_var is something like 4X slower
 than before, depending on platform.  The numeric_big regression test
 takes about twice as long as before on one of my machines, and 50%
 longer on another.  This is because the innermost loop now involves
 integer division, which it didn't before.  (According to oprofile,
 just about all the time goes into the loop that subtracts qhat * divisor
 from the working dividend, which is what you'd expect.)
 
 Now it's unlikely that real-world applications are going to be as
 dependent on the speed of div_var for long inputs as numeric_big is.
 And getting the right answer has to take priority over speed anyway.
 Still this is a bit annoying.  Anyone see a way to speed it up, or
 have another approach?
 
   regards, tom lane
 

Content-Description: numeric-div.patch.gz

[ Type application/octet-stream treated as attachment, skipping... ]

 
 ---(end of broadcast)---
 TIP 3: Have you checked our extensive FAQ?
 
http://www.postgresql.org/docs/faq

-- 
  Bruce Momjian  [EMAIL PROTECTED]  http://momjian.us
  EnterpriseDB   http://www.enterprisedb.com

  + If your life is a hard drive, Christ can be your backup. +

---(end of broadcast)---
TIP 2: Don't 'kill -9' the postmaster


Re: [PATCHES] WIP: rewrite numeric division

2007-06-19 Thread Tom Lane
Gregory Stark [EMAIL PROTECTED] writes:
 Tom Lane [EMAIL PROTECTED] writes:
 Yeah, my proposed patch is schoolbook division.  I don't think I'd trust
 Newton's method to give anything but a close approximation, which is
 what we have in HEAD already.

 Well unless we start storing rational numbers it'll always be a limited to a
 finite number of decimals. The key is whether we can guarantee a lower bound
 on the number of accurate decimal places. As long as we can then we can keep
 going until the decimal places we're going to return are all accurate.

This is nonsense.  Consider an approximate division that gives

...123.99...

You can keep generating 9's forever, but you'll never know whether the
accurate result of trunc() should be 123 or 124.  Unless you use a
method that gives you trustworthy digits to start with.

 It looks like multiplication can also generate incorrect results.

It can, but only if told to produce fewer digits than would be in the
exact result, so I'm not worried about that.

regards, tom lane

---(end of broadcast)---
TIP 3: Have you checked our extensive FAQ?

   http://www.postgresql.org/docs/faq


Re: [PATCHES] WIP: rewrite numeric division

2007-06-19 Thread Michael Paesold

Tom Lane wrote:

I wrote:

...

Now it's unlikely that real-world applications are going to be as
dependent on the speed of div_var for long inputs as numeric_big is.
And getting the right answer has to take priority over speed anyway.
Still this is a bit annoying.  Anyone see a way to speed it up, or
have another approach?

regards, tom lane


+1 for the change from me.

We use PostgreSQL for financial accounting stuff, including plpgsql 
triggers and functions etc. And we use numeric for all that. I always 
thought that numeric division was exact! :-) I never saw problem, 
perhaps because we round to very few digits, but well.


Please apply this patch. I can accept the performance drop, as long as 
there won't be bad surprises with correctness.


Best Regards
Michael Paesold


---(end of broadcast)---
TIP 1: if posting/reading through Usenet, please send an appropriate
  subscribe-nomail command to [EMAIL PROTECTED] so that your
  message can get through to the mailing list cleanly


Re: [PATCHES] WIP: rewrite numeric division

2007-06-18 Thread Tom Lane
Gregory Stark [EMAIL PROTECTED] writes:
 Where did we get the CVS-HEAD algorithm from anyways?

IIRC it's from Smith's FM library cited in numeric.c, which is also
where most of numeric.c's higher-order-function algorithms came from.
It's good code but oriented to scientific computing, which means it
thinks that a close approximation to the right answer is good enough.
In hindsight, there are too many people out there who expect div and mod
to be exact for us to go with that approach.

 wikipedia lists a whole
 bunch of multiplication algorithms -- none of which sound like this -- but
 only two division algorithms, school-book which is O(n^2) and Newton's
 Method which has complexity equal to the multiplication method used.

Yeah, my proposed patch is schoolbook division.  I don't think I'd trust
Newton's method to give anything but a close approximation, which is
what we have in HEAD already.

regards, tom lane

---(end of broadcast)---
TIP 5: don't forget to increase your free space map settings


[PATCHES] WIP: rewrite numeric division

2007-06-17 Thread Tom Lane
I wrote:
 I just blew the dust off my old copy of Knuth vol 2, and see that his
 algorithm for multi-precision division generates output digits that are
 correct to start with (or at least he never needs to revisit a digit
 after moving on to the next).  ISTM we should go over to an approach
 like that.

The attached proposed patch rewrites div_var() using Knuth's algorithm,
meaning that division should always produce an exact truncated output
when asked to truncate at X number of places.  This passes regression
tests and fixes both of the cases previously exhibited:
http://archives.postgresql.org/pgsql-bugs/2007-06/msg00068.php
http://archives.postgresql.org/pgsql-general/2005-05/msg01109.php

The bad news is that it's significantly slower than the CVS-HEAD code;
it appears that for long inputs, div_var is something like 4X slower
than before, depending on platform.  The numeric_big regression test
takes about twice as long as before on one of my machines, and 50%
longer on another.  This is because the innermost loop now involves
integer division, which it didn't before.  (According to oprofile,
just about all the time goes into the loop that subtracts qhat * divisor
from the working dividend, which is what you'd expect.)

Now it's unlikely that real-world applications are going to be as
dependent on the speed of div_var for long inputs as numeric_big is.
And getting the right answer has to take priority over speed anyway.
Still this is a bit annoying.  Anyone see a way to speed it up, or
have another approach?

regards, tom lane



binZ6bTIGGXbC.bin
Description: numeric-div.patch.gz

---(end of broadcast)---
TIP 3: Have you checked our extensive FAQ?

   http://www.postgresql.org/docs/faq