Re: Greatest Common Divisor

2020-01-25 Thread Vik Fearing
On 25/01/2020 15:18, Dean Rasheed wrote:
> 
> Committed with some adjustments, mostly cosmetic but a couple more 
> substantive:

Thanks!

> The code to guard against a floating point exception with inputs of
> (INT_MIN, -1) wasn't quite right because it actually just moved the
> problem so that it would fall over with inputs of (INT_MIN, +1).

Good catch.

> The convention in numeric.c is that the xxx_var() functions take
> *pointers* to their NumericVar arguments rather than copies, and they
> do not modify their inputs, as indicated by the use of "const". You
> might just have gotten away with what you were doing, but I think it
> was bad style and potentially unsafe -- for example, someone calling
> gcd_var() with a NumericVar that came from some other computation and
> having a non-null buf would risk having the buf freed in the copy,
> leaving the original NumericVar with a buf pointing to freed memory.

Thank you for taking the time to look closely at this.  This was my
first time dealing with "numeric" so I was bound to make some mistakes.
-- 
Vik Fearing




Re: Greatest Common Divisor

2020-01-25 Thread Dean Rasheed
On Mon, 20 Jan 2020 at 08:04, Vik Fearing  wrote:
>
> On 20/01/2020 08:44, Dean Rasheed wrote:
> >>
> > I see this has been marked RFC. I'll take it,
>

Committed with some adjustments, mostly cosmetic but a couple more substantive:

The code to guard against a floating point exception with inputs of
(INT_MIN, -1) wasn't quite right because it actually just moved the
problem so that it would fall over with inputs of (INT_MIN, +1).

The convention in numeric.c is that the xxx_var() functions take
*pointers* to their NumericVar arguments rather than copies, and they
do not modify their inputs, as indicated by the use of "const". You
might just have gotten away with what you were doing, but I think it
was bad style and potentially unsafe -- for example, someone calling
gcd_var() with a NumericVar that came from some other computation and
having a non-null buf would risk having the buf freed in the copy,
leaving the original NumericVar with a buf pointing to freed memory.

Regards,
Dean




Re: Greatest Common Divisor

2020-01-20 Thread Dean Rasheed
On Mon, 20 Jan 2020 at 19:04, Alvaro Herrera  wrote:
>
> On 2020-Jan-20, Dean Rasheed wrote:
>
> > +   
> > +    greatest common divisor  the largest positive number that
> > +divides both inputs with no remainder; returns 
> > 0 if
> > +both inputs are zero
> > +   
>
> Warning, severe TOC/bikeshedding ahead.
>
> I don't know why, but this dash-semicolon sequence reads strange to me
> and looks out of place.  I would use parens for the first phrase and
> keep the semicolon, that is "greatest common divisor (the largest ...);
> returns 0 if ..."
>
> That seems more natural to me, and we're already using parens in other
> description s.
>

Hmm, OK. I suppose that's more logical because then the bit in parens
is the standard definition of gcd/lcm, and the part after the
semicolon is the implementation choice for the special case not
covered by the standard definition.

Regards,
Dean




Re: Greatest Common Divisor

2020-01-20 Thread Dean Rasheed
On Mon, 20 Jan 2020 at 18:52, Vik Fearing  wrote:
>
> On 20/01/2020 11:28, Dean Rasheed wrote:
> >
> > +   
> > +least common multiple  the smallest strictly positive number
> > +that is an integer multiple of both inputs; returns
> > 0
> > +if either input is zero
> > +   
>
> In that case should lcm be "...that is an integral multiple..." since
> the numeric version will return numeric?
>

So "integral multiple" instead of "integer multiple"? I think I'm more
used to the latter, but I'm happy with either.

Regards,
Dean




Re: Greatest Common Divisor

2020-01-20 Thread Alvaro Herrera
On 2020-Jan-20, Dean Rasheed wrote:

> +   
> +    greatest common divisor  the largest positive number that
> +divides both inputs with no remainder; returns 0 
> if
> +both inputs are zero
> +   

Warning, severe TOC/bikeshedding ahead.

I don't know why, but this dash-semicolon sequence reads strange to me
and looks out of place.  I would use parens for the first phrase and
keep the semicolon, that is "greatest common divisor (the largest ...);
returns 0 if ..."

That seems more natural to me, and we're already using parens in other
description s.

-- 
Álvaro Herrerahttps://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: Greatest Common Divisor

2020-01-20 Thread Vik Fearing
On 20/01/2020 11:28, Dean Rasheed wrote:
> Looking at the docs, I think it's worth going a little further than
> just saying what the acronyms stand for -- especially since the
> behaviour for zero inputs is an implementation choice (albeit the most
> common one). I propose the following:
>
> +   
> +    greatest common divisor  the largest positive number that
> +divides both inputs with no remainder; returns 0 
> if
> +both inputs are zero
> +   
>
> and:
>
> +   
> +least common multiple  the smallest strictly positive number
> +that is an integer multiple of both inputs; returns
> 0
> +if either input is zero
> +   
>
> (I have tried to be precise in my use of terms like "number" and
> "integer", to cover the different cases)


In that case should lcm be "...that is an integral multiple..." since
the numeric version will return numeric?


Other than that, I'm happy with this change.

-- 

Vik Fearing





Re: Greatest Common Divisor

2020-01-20 Thread Dean Rasheed
Looking at the docs, I think it's worth going a little further than
just saying what the acronyms stand for -- especially since the
behaviour for zero inputs is an implementation choice (albeit the most
common one). I propose the following:

+   
+greatest common divisor  the largest positive number that
+divides both inputs with no remainder; returns 0 if
+both inputs are zero
+   

and:

+   
+least common multiple  the smallest strictly positive number
+that is an integer multiple of both inputs; returns
0
+if either input is zero
+   

(I have tried to be precise in my use of terms like "number" and
"integer", to cover the different cases)

Regards,
Dean




Re: Greatest Common Divisor

2020-01-20 Thread Vik Fearing
On 20/01/2020 08:44, Dean Rasheed wrote:
> On Tue, 7 Jan 2020 at 12:31, Tom Lane  wrote:
>> Dean Rasheed  writes:
>>> Do we actually need the smallint versions of these functions?
>> Doubt it.  It'd be fairly hard even to call those, since e.g. "42"
>> is an int not a smallint.
>>
> I see this has been marked RFC. I'll take it, 


Thanks!


> and barring objections,
> I'll start by ripping out the smallint code.


No strong objection.

-- 

Vik Fearing





Re: Greatest Common Divisor

2020-01-19 Thread Dean Rasheed
On Tue, 7 Jan 2020 at 12:31, Tom Lane  wrote:
>
> Dean Rasheed  writes:
> > Do we actually need the smallint versions of these functions?
>
> Doubt it.  It'd be fairly hard even to call those, since e.g. "42"
> is an int not a smallint.
>

I see this has been marked RFC. I'll take it, and barring objections,
I'll start by ripping out the smallint code.

Regards,
Dean




Re: Greatest Common Divisor

2020-01-07 Thread Tom Lane
Dean Rasheed  writes:
> Do we actually need the smallint versions of these functions?

Doubt it.  It'd be fairly hard even to call those, since e.g. "42"
is an int not a smallint.

regards, tom lane




Re: Greatest Common Divisor

2020-01-07 Thread Dean Rasheed
Do we actually need the smallint versions of these functions?

I would have thought that automatic casting would take care of any
cases that need smallints, and I can't believe that there's any
performance benefit to be had that's worth maintaining the extra code.

Regards,
Dean




Re: Greatest Common Divisor

2020-01-06 Thread Fabien COELHO


Hello Merlin,


For low-level arithmetic code like this one, with tests and loops
containing very few hardware instructions, I think that helping compiler
optimizations is a good idea.


Do you have any performance data to back that up?


Yep.

A generic data is the woes about speculative execution related CVE (aka 
Spectre) fixes and their impact on performance, which is in percents, 
possibly tens of them, when the thing is more or less desactivated to 
mitigate the security issue:


  
https://www.nextplatform.com/2018/03/16/how-spectre-and-meltdown-mitigation-hits-xeon-performance/

Some data about the __builtin_expect compiler hints:

  http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2016/p0479r0.html

Basically, they are talking about percents, up to tens in some cases, 
which is consistent with the previous example.


As I said, helping the compiler is a good idea, and pg has been doing that 
with the likely/unlikely macros for some time, there are over an hundred 
of them, including in headers which get expanded ("logging.h", "float.h", 
"simplehash.h", …), which is a good thing.


I do not see any good reason to stop doing that, especially in low-level 
arithmetic functions.


Now, I do not have specific data about the performance impact on a gcd 
implementation. Mileage may vary depending on hardware, compiler, options 
and other overheads.


--
Fabien.

Re: Greatest Common Divisor

2020-01-06 Thread Merlin Moncure
On Mon, Jan 6, 2020 at 6:52 AM Fabien COELHO  wrote:
>
>
> Hello Robert,
>
> >>   if (arg1 == PG_INT32_MIN)
> >>   if (arg2 == 0 || arg2 == PG_INT32_MIN)
> >>
> >> And possibly a "likely" on the while.
> >
> > I don't think decoration the code with likely() and unlikely() all
> > over the place is a very good idea.
>
> > Odds are good that we'll end up with a bunch that are actually
> > non-optimal, and nobody will ever figure it out because it's hard to
> > figure out.
>
> My 0.02€: I'd tend to disagree.
>
> Modern pipelined processors can take advantage of speculative execution on
> branches, so if you know which branch is the more likely it can help.
>
> Obviously if you get it wrong it does not, but for the above cases it
> seems to me that they are rather straightforward.
>
> It also provides some "this case is expected to be exceptional" semantics
> to people reading the code.
>
> > I have a hard time believing that we're going to be much
> > worse off if we just write the code normally.
>
> I think that your point applies to more general programming in postgres,
> but this is not the context here.
>
> For low-level arithmetic code like this one, with tests and loops
> containing very few hardware instructions, I think that helping compiler
> optimizations is a good idea.

Do you have any performance data to back that up?

merlin




Re: Greatest Common Divisor

2020-01-06 Thread Fabien COELHO


Hello Robert,


  if (arg1 == PG_INT32_MIN)
  if (arg2 == 0 || arg2 == PG_INT32_MIN)

And possibly a "likely" on the while.


I don't think decoration the code with likely() and unlikely() all
over the place is a very good idea.


Odds are good that we'll end up with a bunch that are actually 
non-optimal, and nobody will ever figure it out because it's hard to 
figure out.


My 0.02€: I'd tend to disagree.

Modern pipelined processors can take advantage of speculative execution on 
branches, so if you know which branch is the more likely it can help.


Obviously if you get it wrong it does not, but for the above cases it 
seems to me that they are rather straightforward.


It also provides some "this case is expected to be exceptional" semantics 
to people reading the code.


I have a hard time believing that we're going to be much 
worse off if we just write the code normally.


I think that your point applies to more general programming in postgres, 
but this is not the context here.


For low-level arithmetic code like this one, with tests and loops 
containing very few hardware instructions, I think that helping compiler 
optimizations is a good idea.


Maybe in the "while" the compiler would assume that it is going to loop 
anyway by default, so it may be less useful there.


--
Fabien.

Re: Greatest Common Divisor

2020-01-06 Thread Robert Haas
On Sat, Jan 4, 2020 at 4:21 PM Fabien COELHO  wrote:
> In GCD implementations, for instance:
>
>   if (arg1 == PG_INT32_MIN)
>   if (arg2 == 0 || arg2 == PG_INT32_MIN)
>
> And possibly a "likely" on the while.

I don't think decoration the code with likely() and unlikely() all
over the place is a very good idea. Odds are good that we'll end up
with a bunch that are actually non-optimal, and nobody will ever
figure it out because it's hard to figure out. I have a hard time
believing that we're going to be much worse off if we just write the
code normally.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: Greatest Common Divisor

2020-01-04 Thread Vik Fearing
On 04/01/2020 20:08, Dean Rasheed wrote:
> On Sat, 4 Jan 2020 at 17:55, Vik Fearing  wrote:
>> On 04/01/2020 10:37, Dean Rasheed wrote:
>>> BTW, there is actually no need to restrict the inputs to integral
>>> values because GCD is something that has a perfectly natural extension
>>> to floating point inputs (see for example [1]). Moreover, since we
>>> already have a mod(numeric, numeric) that works for arbitrary inputs,
>>> Euclid's algorithm just works.
>>> [...]
>>> If it were more work to support non-integer inputs, I'd say that it's
>>> not worth the effort, but since it's actually less work to just allow
>>> it, then why not?
>>
>> Okay, I allow that now, but I've still left it for lcm.  I can't find
>> anything anywhere that defines lcm for floating point (I do find it for
>> fractions) and the result of abs(a*b)/gcd(a,b) certainly doesn't match
>> "lowest" in the examples I tried.
>>
> Here's another article on the subject:
> https://www.math-only-math.com/hcf-and-lcm-of-decimals.html


Yeah, my eyes weren't aligning the decimal points properly.


Attached version frees up lcm to work on non-integrals.  Thanks for your
input!

-- 

Vik Fearing

diff --git a/doc/src/sgml/func.sgml b/doc/src/sgml/func.sgml
index 57a1539506..e2b7d6d240 100644
--- a/doc/src/sgml/func.sgml
+++ b/doc/src/sgml/func.sgml
@@ -870,6 +870,32 @@
-43
       
 
+  
+   
+
+ gcd
+
+gcd(a, b)
+   
+   (same as argument types)
+   greatest common divisor
+   gcd(1071, 462)
+   21
+  
+
+  
+   
+
+ lcm
+
+lcm(a, b)
+   
+   (same as argument types)
+   least common multiple
+   lcm(1071, 462)
+   23562
+  
+
   

 
diff --git a/src/backend/utils/adt/int.c b/src/backend/utils/adt/int.c
index 583ce71e66..53673a8dea 100644
--- a/src/backend/utils/adt/int.c
+++ b/src/backend/utils/adt/int.c
@@ -1196,6 +1196,226 @@ int2abs(PG_FUNCTION_ARGS)
 	PG_RETURN_INT16(result);
 }
 
+/*
+ * Greatest Common Divisor
+ *
+ * Special cases:
+ *   - gcd(x, 0) = gcd(0, x) = abs(x)
+ *   		because 0 is divisible by anything
+ *   - gcd(0, 0) = 0
+ *   		complies with the previous definition and is a common convention
+ *
+ * The following cases involving INT_MIN have two possible results.  They could
+ * return INT_MIN because INT_MIN is a valid divisor of INT_MIN, or they could
+ * throw an exception because the result is negative.
+ * The consensus is to throw an exception.
+ *
+ *   - gcd(INT_MIN, 0)
+ *   - gcd(INT_MIN, INT_MIN)
+ *
+ * Any other value with INT_MIN will be a positive value representable within
+ * the data type.
+ */
+static int32
+int4gcd_internal(int32 arg1, int32 arg2)
+{
+	int32	swap;
+	int32	a1, a2;
+
+	/*
+	 * Put the greater absolute value in arg1.
+	 *
+	 * This would happen automatically in the loop below, but avoids an
+	 * expensive modulo simulation on some architectures.
+	 *
+	 * We do this in negative space in order to handle INT_MIN.
+	 */
+	a1 = (arg1 < 0) ? arg1 : -arg1;
+	a2 = (arg2 < 0) ? arg2 : -arg2;
+	if (a1 > a2)
+	{
+		swap = arg1;
+		arg1 = arg2;
+		arg2 = swap;
+	}
+
+	/* Special care needs to be taken with INT_MIN.  See comments above. */
+	if (arg1 == PG_INT32_MIN)
+	{
+		if (arg2 == 0 || arg2 == PG_INT32_MIN)
+			ereport(ERROR,
+	(errcode(ERRCODE_NUMERIC_VALUE_OUT_OF_RANGE),
+	 errmsg("integer out of range")));
+		/*
+		 * Making arg2 positive avoids the INT_MIN % -1 issue described in
+		 * int4mod.
+		 */
+		arg2 = -arg2;
+	}
+
+	/* Find GCD using the basic Euclidean algorithm */
+	while (arg2 != 0)
+	{
+		swap = arg2;
+		arg2 = arg1 % arg2;
+		arg1 = swap;
+	}
+
+	/*
+	 * Make sure the result is positive. (We know we don't have INT_MIN
+	 * anymore).
+	 */
+	if (arg1 < 0)
+		arg1 = -arg1;
+
+	return arg1;
+}
+
+static int16
+int2gcd_internal(int16 arg1, int16 arg2)
+{
+	/* See int4gcd_internal for commented version. */
+	int16	swap;
+	int16	a1, a2;
+
+	a1 = (arg1 < 0) ? arg1 : -arg1;
+	a2 = (arg2 < 0) ? arg2 : -arg2;
+	if (a1 > a2)
+	{
+		swap = arg1;
+		arg1 = arg2;
+		arg2 = swap;
+	}
+
+	if (arg1 == PG_INT16_MIN)
+	{
+		if (arg2 == 0 || arg2 == PG_INT16_MIN)
+			ereport(ERROR,
+	(errcode(ERRCODE_NUMERIC_VALUE_OUT_OF_RANGE),
+	 errmsg("smallint out of range")));
+		arg2 = -arg2;
+	}
+
+	while (arg2 != 0)
+	{
+		swap = arg2;
+		arg2 = arg1 % arg2;
+		arg1 = swap;
+	}
+
+	if (arg1 < 0)
+		arg1 = -arg1;
+
+	return arg1;
+}
+
+Datum
+int4gcd(PG_FUNCTION_ARGS)
+{
+	int32	arg1 = PG_GETARG_INT32(0);
+	int32	arg2 = PG_GETARG_INT32(1);
+	int32	result;
+
+	result = int4gcd_internal(arg1, arg2);
+
+	PG_RETURN_INT32(result);
+}
+
+Datum
+int2gcd(PG_FUNCTION_ARGS)
+{
+	int16	arg1 = PG_GETARG_INT16(0);
+	int16	arg2 = PG_GETARG_INT16(1);
+	int16	result;
+
+	res

Re: Greatest Common Divisor

2020-01-04 Thread Fabien COELHO


Hello Vik,


Add unlikely() where appropriate.


Any particular place in mind where I didn't already put it?


In GCD implementations, for instance:

 if (arg1 == PG_INT32_MIN)
 if (arg2 == 0 || arg2 == PG_INT32_MIN)

And possibly a "likely" on the while.

In LCM implementations, for instance:

 if (arg1 == 0 || arg2 == 0)
 if (arg1 == arg2)

The later is partially redundant with preceeding case BTW, which could be 
managed inside this one, reducing the number of tests? Something like:


 if (arg1 == arg2)
   if (arg1 == MIN_INT)
 error
   else
 return abs(arg1)

I'm not sure why you want to deal with a1 == a2 case separately, could it 
not just work with the main code?


If you want to deal with it separately, then why not doing arg1 == -arg2 
as well?



Please stop suggesting it.


Fine, fine!

Tom also suggested to align implementations as much as possible, and I do 
agree with him.


Also, I'd suggest to add a comment to explain that the precise C99 modulo 
semantics is required to make the algorithm work, and that it may not work 
with C89 semantics for instance.



Note: in the numeric code you abs the value, ISTM consistent to do it
as well in the other implementations.


As noted in the comments, numeric does not have the INT_MIN problem.


Sure, but there are special cases at the beginning all the same: NAN, 
INTEGRAL…


Would it make sense that NAN is returned on NUMERIC when the 
computation cannot be performed, eg on non integer values?


I don't think so, no.


Ok. Why? I do not have an opinion, but ISTM that there is a choice and it 
should be explained. Could be consistency with other cases, whatever.



Why the positive constraint on LCM(NUMERIC, NUMERIC)? Why not absoluting?


I didn't see a definition for negative inputs, but now I see one so I've
lifted the restriction.


Good.

About tests: again, I'd check the LCM overflow on smaller values.

I'm not convinced by the handling of fractional numerics in gcd, eg:

 +SELECT gcd(330.3::numeric, 462::numeric);
 + gcd
 +-
 + 0.3
 +(1 row)

ISTM that the average user, including myself, would expect an integer 
result from gcd.


If this is kept, the documentation should be clear about what it does and 
what it means, because the least to say is that it is surprising.


Somehow I could have expected the arguments to be casted to int, so that 
it would lead to 66.


Python does a type error, which I find even better. I'd vote for erroring.

If this fractional gcd makes some sense and is desirable, then ISTM that 
lcm(a,b) = a / gcd(a, b) * b should make as much sense and should be 
allowed as well, for consistency.


--
Fabien.

Re: Greatest Common Divisor

2020-01-04 Thread Dean Rasheed
On Sat, 4 Jan 2020 at 17:55, Vik Fearing  wrote:
> On 04/01/2020 10:37, Dean Rasheed wrote:
> >
> > BTW, there is actually no need to restrict the inputs to integral
> > values because GCD is something that has a perfectly natural extension
> > to floating point inputs (see for example [1]). Moreover, since we
> > already have a mod(numeric, numeric) that works for arbitrary inputs,
> > Euclid's algorithm just works.
> > [...]
> > If it were more work to support non-integer inputs, I'd say that it's
> > not worth the effort, but since it's actually less work to just allow
> > it, then why not?
>
>
> Okay, I allow that now, but I've still left it for lcm.  I can't find
> anything anywhere that defines lcm for floating point (I do find it for
> fractions) and the result of abs(a*b)/gcd(a,b) certainly doesn't match
> "lowest" in the examples I tried.
>

Here's another article on the subject:
https://www.math-only-math.com/hcf-and-lcm-of-decimals.html

It works because gcd(a*10^n, b*10^n) = gcd(a, b)*10^n, and therefore
lcm(a*10^n, b*10^n) = lcm(a, b)*10^n, so the results will just have
their decimal points shifted. For example:

gcd(54, 24) = 6
lcm(54, 24) = 216 = 4*54 = 9*24

gcd(5.4, 2.4) = 0.6
lcm(5.4, 2.4) = 21.6 = 4*5.4 = 9*2.4

that is the lowest common integer multiple of the two decimal inputs.

Regards,
Dean




Re: Greatest Common Divisor

2020-01-04 Thread Vik Fearing
On 04/01/2020 10:34, Fabien COELHO wrote:
> Code comments: gcd(n, 0) = abs(n), not n?


OK, changed.


> Add unlikely() where appropriate.


Any particular place in mind where I didn't already put it?


> I'd deal with int_min and 0 at the beginning and then proceed with
> absoluting the values, rather than the dance around a1/arg1, a2/arg2,
> and other arg2 = -arg2, and arg1 = -arg1 anyway in various places,
> which does not make the code that easy to understand.
>
> Pseudo code could be:
>
>    if ((a1 == min && (a2 == min || a2 == 0)) ||
>    (a2 == min && a1 == 0))
>  error;
>    a1 = abs(a1), a2 = abs(a2);
>    euclids;
>    return;


This would cause one of my tests to fail.  Please stop suggesting it.


> Note: in the numeric code you abs the value, ISTM consistent to do it
> as well in the other implementations.


As noted in the comments, numeric does not have the INT_MIN problem.


> Would it make sense that NAN is returned on NUMERIC when the
> computation cannot be performed, eg on non integer values?


I don't think so, no.


> Why the positive constraint on LCM(NUMERIC, NUMERIC)? Why not absoluting? 


I didn't see a definition for negative inputs, but now I see one so I've
lifted the restriction.


On 04/01/2020 10:37, Dean Rasheed wrote:
>
> BTW, there is actually no need to restrict the inputs to integral
> values because GCD is something that has a perfectly natural extension
> to floating point inputs (see for example [1]). Moreover, since we
> already have a mod(numeric, numeric) that works for arbitrary inputs,
> Euclid's algorithm just works.
> [...]
> If it were more work to support non-integer inputs, I'd say that it's
> not worth the effort, but since it's actually less work to just allow
> it, then why not?


Okay, I allow that now, but I've still left it for lcm.  I can't find
anything anywhere that defines lcm for floating point (I do find it for
fractions) and the result of abs(a*b)/gcd(a,b) certainly doesn't match
"lowest" in the examples I tried.


On 04/01/2020 18:01, Tom Lane wrote:
> Dean Rasheed  writes:
>> OTOH, for numeric inputs, this could easily end up needing many
>> thousands of divisions and it's not hard to construct inputs that take
>> minutes to compute, although this is admittedly with ridiculously
>> large inputs (~10^13), and AFAICS, the performance is OK with
>> "normal" sized inputs. Should we put a limit on the size of the
>> inputs?
> No, but a CHECK_FOR_INTERRUPTS in the loop would be well-advised,
> if there's not one already inside the called functions.


Good idea.  Added.

-- 

Vik Fearing

diff --git a/doc/src/sgml/func.sgml b/doc/src/sgml/func.sgml
index 57a1539506..e2b7d6d240 100644
--- a/doc/src/sgml/func.sgml
+++ b/doc/src/sgml/func.sgml
@@ -870,6 +870,32 @@
-43
   
 
+  
+   
+
+ gcd
+
+gcd(a, b)
+   
+   (same as argument types)
+   greatest common divisor
+   gcd(1071, 462)
+   21
+  
+
+  
+   
+
+ lcm
+
+lcm(a, b)
+   
+   (same as argument types)
+   least common multiple
+   lcm(1071, 462)
+   23562
+  
+
   

 
diff --git a/src/backend/utils/adt/int.c b/src/backend/utils/adt/int.c
index 583ce71e66..5244b891dc 100644
--- a/src/backend/utils/adt/int.c
+++ b/src/backend/utils/adt/int.c
@@ -1196,6 +1196,226 @@ int2abs(PG_FUNCTION_ARGS)
 	PG_RETURN_INT16(result);
 }
 
+/*
+ * Greatest Common Divisor
+ *
+ * Special cases:
+ *   - gcd(x, 0) = gcd(0, x) = abs(x)
+ *   		because 0 is divisible by anything
+ *   - gcd(0, 0) = 0
+ *   		complies with the previous definition and is a common convention
+ *
+ * The following cases involving INT_MIN have two possible results.  They could
+ * return INT_MIN because INT_MIN is a valid divisor of INT_MIN, or they could
+ * throw an exception because the result is negative.
+ * The consensus is to throw an exception.
+ *
+ *   - gcd(INT_MIN, 0)
+ *   - gcd(INT_MIN, INT_MIN)
+ *
+ * Any other value with INT_MIN will be a positive value representable within
+ * the data type.
+ */
+static int32
+int4gcd_internal(int32 arg1, int32 arg2)
+{
+	int32	swap;
+	int32	a1, a2;
+
+	/*
+	 * Put the greater absolute value in arg1.
+	 *
+	 * This would happen automatically in the loop below, but avoids an
+	 * expensive modulo simulation on some architectures.
+	 *
+	 * We do this in negative space in order to handle INT_MIN.
+	 */
+	a1 = (arg1 < 0) ? arg1 : -arg1;
+	a2 = (arg2 < 0) ? arg2 : -arg2;
+	if (a1 > a2)
+	{
+		swap = arg1;
+		arg1 = arg2;
+		arg2 = swap;
+	}
+
+	/* Special care needs to be taken with INT_MIN.  See comments above. */
+	if (arg1 == PG_INT32_MIN)
+	{
+		if (arg2 == 0 || arg2 == PG_INT32_MIN)
+			ereport(ERROR,
+	(errcode(ERRCODE_NUM

Re: Greatest Common Divisor

2020-01-04 Thread Tom Lane
Dean Rasheed  writes:
> OTOH, for numeric inputs, this could easily end up needing many
> thousands of divisions and it's not hard to construct inputs that take
> minutes to compute, although this is admittedly with ridiculously
> large inputs (~10^13), and AFAICS, the performance is OK with
> "normal" sized inputs. Should we put a limit on the size of the
> inputs?

No, but a CHECK_FOR_INTERRUPTS in the loop would be well-advised,
if there's not one already inside the called functions.

> There are apparently more efficient algorithms, but I think that
> should definitely be kept out of scope for this patch.

+1

regards, tom lane




Re: Greatest Common Divisor

2020-01-04 Thread Dean Rasheed
On Sat, 4 Jan 2020 at 09:37, Dean Rasheed  wrote:
>
> Well Vik has now provided a numeric implementation and it doesn't
> appear to be too much code.
>

BTW, I did a bit of research into the efficiency of Euclid's
algorithm. It's actually quite interesting:

It turns out that the worst case is when the inputs are successive
values from the Fibonacci sequence. In that case, since
Fib(n)/Fib(n-1) = 1 remainder Fib(n-2), the algorithm will walk
backwards through the whole sequence before terminating, and the
result will always be 1.

For bigint inputs, the worst case is gcd(7540113804746346429,
4660046610375530309) which requires something like 90 divisions.
Testing that, it's still sub-millisecond though, so I don't think
there's any problem there.

OTOH, for numeric inputs, this could easily end up needing many
thousands of divisions and it's not hard to construct inputs that take
minutes to compute, although this is admittedly with ridiculously
large inputs (~10^13), and AFAICS, the performance is OK with
"normal" sized inputs. Should we put a limit on the size of the
inputs? I'm not sure exactly how that would work, but I think it would
have to take into account the relative weights of the inputs rather
than just the maximum weight. At the very least, I think we need a
check for interrupts here (c.f. the numeric factorial function).
Perhaps such a check is sufficient. It's not like there aren't lots of
other ways to tie up the server.

There are apparently more efficient algorithms, but I think that
should definitely be kept out of scope for this patch.

Regards,
Dean




Re: Greatest Common Divisor

2020-01-04 Thread Dean Rasheed
On Thu, 2 Jan 2020 at 15:13, Tom Lane  wrote:
>
> Stephen Frost  writes:
> > * Dean Rasheed (dean.a.rash...@gmail.com) wrote:
> >> I'm not objecting to adding it, I'm just curious. In fact, I think
> >> that if we do add this, then we should probably add lcm() at the same
> >> time, since handling its overflow cases is sufficiently non-trivial to
> >> justify not requiring users to have to implement it themselves.
>
> > I tend to agree with this.
>
> Does this impact the decision about whether we need a variant for
> numeric?  I was leaning against that, primarily because (a)
> it'd introduce a set of questions about what to do with non-integral
> inputs, and (b) it'd make the patch quite a lot larger, I imagine.
> But a variant of lcm() that returns numeric would have much more
> resistance to overflow.
>

Well Vik has now provided a numeric implementation and it doesn't
appear to be too much code.

BTW, there is actually no need to restrict the inputs to integral
values because GCD is something that has a perfectly natural extension
to floating point inputs (see for example [1]). Moreover, since we
already have a mod(numeric, numeric) that works for arbitrary inputs,
Euclid's algorithm just works. For example:

SELECT gcd(285, 7845);
 gcd
-
  15

SELECT gcd(28.5, 7.845);
  gcd
---
 0.015

Essentially, this is because gcd(a*10^n, b*10^n) = gcd(a, b) * 10^n,
so you can think of it as pre-multiplying by a power of 10 large
enough to make both inputs integers, and then dividing the result by
that power of 10.

If it were more work to support non-integer inputs, I'd say that it's
not worth the effort, but since it's actually less work to just allow
it, then why not?


> Maybe we could just define "lcm(bigint, bigint) returns numeric"
> and figure that that covers all cases, but it feels slightly
> weird.  You couldn't do lcm(lcm(a,b),c) without casting.
> I guess that particular use-case could be addressed with
> "lcm(variadic bigint[]) returns numeric", but that's getting
> really odd.
>

Having thought about that, I don't like defining these functions to
return a different type than their inputs. I think most people tend to
be working with a given type, and are used to having to move to a
wider type if necessary. We don't, for example, define "mul(bigint,
bigint) returns numeric".

Also I don't think it really buys you all that much -- the problem
with lcm(lcm(a,b),c) where bigint inputs produce a numeric output
isn't just that you need to add casting; the lcm(a,b) result may not
fit in a bigint, so the cast might fail. So really, this is just
postponing the problem a bit, without really fixing it. As for
"lcm(variadic bigint[]) returns numeric", to implement that you'd need
to use numeric computations internally, so I suspect it's
implementation would be at least as complex as lcm(numeric, numeric).

FWIW, looking for precedents elsewhere, I note that the C++ standard
library defines these functions to return the same type as the inputs.
To me, that seems more natural.

Regards,
Dean

[1] https://www.geeksforgeeks.org/program-find-gcd-floating-point-numbers/




Re: Greatest Common Divisor

2020-01-04 Thread Fabien COELHO



Bonjour Vik,


Here is an updated patch fixing that.


As I said, welcome to arithmetic:-)

Patch v5 applies cleanly.

Doc: I'd consider using an example the result of which is 42 instead of 
21, for obvious geek motivations. Possibly gcd(2142, 462) should be ok.


I got it wrong with my previous comment on gcd(int_min, 0). I'm okay with 
erroring on int_min.


Code comments: gcd(n, 0) = abs(n), not n?

About the code.

Add unlikely() where appropriate.

I'd deal with int_min and 0 at the beginning and then proceed with 
absoluting the values, rather than the dance around a1/arg1, a2/arg2, and 
other arg2 = -arg2, and arg1 = -arg1 anyway in various places, which does 
not make the code that easy to understand.


Pseudo code could be:

   if ((a1 == min && (a2 == min || a2 == 0)) ||
   (a2 == min && a1 == 0))
 error;
   a1 = abs(a1), a2 = abs(a2);
   euclids;
   return;

Note: in the numeric code you abs the value, ISTM consistent to do it as 
well in the other implementations.


Would it make sense that NAN is returned on NUMERIC when the computation 
cannot be performed, eg on non integer values?


Why the positive constraint on LCM(NUMERIC, NUMERIC)? Why not absoluting?

Tests: you can make LCM fail on much smaller values for int2/4/8, you do 
not need to start around max_int.


--
Fabien.




Re: Greatest Common Divisor

2020-01-04 Thread Vik Fearing
On 04/01/2020 09:35, Fabien COELHO wrote:
>>> I'm unsure about gcd(INT_MIN, 0) should error. Possibly 0 would be
>>> nicer?
>>
>>
>> What justification for that do you have?
>
> ISTM that the current implementation has:
>
>   \forall int4 n, n \neq MIN_INT4, \gcd(n, 0) = 0 ?
>
> In which case applying the same rule for min int seems ok. 


No, gcd(n, 0) = n.

-- 

Vik Fearing





Re: Greatest Common Divisor

2020-01-04 Thread Fabien COELHO



Hello Tom,


Which architecture has single cycle division? I think it's way above
that, based on profiles I've seen. And Agner seems to back me up:
https://www.agner.org/optimize/instruction_tables.pdf
That lists a 32/64 idiv with a latency of ~26/~42-95 cycles, even on a
moder uarch like skylake-x.


Huh.  I figured Intel would have thrown sufficient transistors at that
problem by now.


It is not just a problem of number of transistors, division is 
intrisically iterative (with various kind of iterations used in division 
algorithms), involving some level of guessing and other arithmetics, so 
the latency can only be bad, and the possibility of implementing that in 1 
cycle at 3 GHz looks pretty remote.


--
Fabien.




Re: Greatest Common Divisor

2020-01-04 Thread Fabien COELHO


Bonjour Vik,


The point of swapping is to a void possibly expensive modulo, but this
should be done on absolute values, otherwise it may not achieve its
purpose as stated by the comment?


Ah, true.  How widespread are these architectures that need this special
treatment?  Is it really worth handling?


Dunno. AFAICR it was with sparc architectures 25 years ago.

Also I do not like much relying on the subtleties of C99 % wrt negative 
numbers to have the algorithm work, I'd be much at ease to deal with sign 
and special values at the beginning of the function and proceed with 
positive numbers afterwards.



I'm unsure about gcd(INT_MIN, 0) should error. Possibly 0 would be nicer?



What justification for that do you have?


ISTM that the current implementation has:

  \forall int4 n, n \neq MIN_INT4, \gcd(n, 0) = 0 ?

In which case applying the same rule for min int seems ok.

--
Fabien Coelho - CRI, MINES ParisTech

Re: Greatest Common Divisor

2020-01-03 Thread Vik Fearing
On 03/01/2020 20:14, Fabien COELHO wrote:
>
> Bonsoir Vik,
>
>  +int4gcd_internal(int32 arg1, int32 arg2)
>  +{
>  +   int32   swap;
>  +
>  +   /*
>  +    * Put the greater value in arg1.
>  +    * This would happen automatically in the loop below, but
> avoids  an
>  +    * expensive modulo simulation on some architectures.
>  +    */
>  +   if (arg1 < arg2)
>  +   {
>  +   swap = arg1;
>  +   arg1 = arg2;
>  +   arg2 = swap;
>  +   }
>
>
> The point of swapping is to a void possibly expensive modulo, but this
> should be done on absolute values, otherwise it may not achieve its
> purpose as stated by the comment?


Here is an updated patch fixing that.

-- 

Vik Fearing

diff --git a/doc/src/sgml/func.sgml b/doc/src/sgml/func.sgml
index 57a1539506..e2b7d6d240 100644
--- a/doc/src/sgml/func.sgml
+++ b/doc/src/sgml/func.sgml
@@ -870,6 +870,32 @@
-43
   
 
+  
+   
+
+ gcd
+    
+    gcd(a, b)
+   
+   (same as argument types)
+   greatest common divisor
+   gcd(1071, 462)
+   21
+  
+
+  
+   
+
+ lcm
+
+lcm(a, b)
+   
+   (same as argument types)
+   least common multiple
+   lcm(1071, 462)
+   23562
+  
+
   

 
diff --git a/src/backend/utils/adt/int.c b/src/backend/utils/adt/int.c
index 583ce71e66..611207e642 100644
--- a/src/backend/utils/adt/int.c
+++ b/src/backend/utils/adt/int.c
@@ -1196,6 +1196,212 @@ int2abs(PG_FUNCTION_ARGS)
 	PG_RETURN_INT16(result);
 }
 
+/*
+ * Greatest Common Divisor
+ *
+ * Special cases:
+ *   - gcd(x, 0) = x; because 0 is divisible by anything
+ *   - gcd(0, 0) = 0; complies with the previous definition and is a
+ *common convention
+ *
+ * The following cases involving INT_MIN have two possible results.  They could
+ * return INT_MIN because INT_MIN is a valid divisor of INT_MIN, or they could
+ * throw an exception because the result is negative.
+ * The consensus is to throw an exception.
+ *
+ *   - gcd(INT_MIN, 0)
+ *   - gcd(INT_MIN, INT_MIN)
+ *
+ * Any other value with INT_MIN will be a positive value representable within
+ * the data type.
+ */
+static int32
+int4gcd_internal(int32 arg1, int32 arg2)
+{
+	int32	swap;
+	int32	a1, a2;
+
+	/*
+	 * Put the greater absolute value in arg1.
+	 *
+	 * This would happen automatically in the loop below, but avoids an
+	 * expensive modulo simulation on some architectures.
+	 *
+	 * We do this in negative space in order to handle INT_MIN.
+	 */
+	a1 = (arg1 < 0) ? arg1 : -arg1;
+	a2 = (arg2 < 0) ? arg2 : -arg2;
+	if (a1 > a2)
+	{
+		swap = arg1;
+		arg1 = arg2;
+		arg2 = swap;
+	}
+
+	/* Special care needs to be taken with INT_MIN.  See comments above. */
+	if (arg1 == PG_INT32_MIN)
+	{
+		if (arg2 == 0 || arg2 == PG_INT32_MIN)
+			ereport(ERROR,
+	(errcode(ERRCODE_NUMERIC_VALUE_OUT_OF_RANGE),
+	 errmsg("integer out of range")));
+		/*
+		 * Making arg2 positive avoids the INT_MIN % -1 issue described in
+		 * int4mod.
+		 */
+		arg2 = -arg2;
+	}
+
+	/* Find GCD using the basic Euclidean algorithm */
+	while (arg2 != 0)
+	{
+		swap = arg2;
+		arg2 = arg1 % arg2;
+		arg1 = swap;
+	}
+
+	/*
+	 * Make sure the result is positive. (We know we don't have INT_MIN
+	 * anymore).
+	 */
+	if (arg1 < 0)
+		arg1 = -arg1;
+
+	return arg1;
+}
+
+static int16
+int2gcd_internal(int16 arg1, int16 arg2)
+{
+	/* See int4gcd_internal for commented version. */
+	int16	swap;
+	int16	a1, a2;
+
+	a1 = (arg1 < 0) ? arg1 : -arg1;
+	a2 = (arg2 < 0) ? arg2 : -arg2;
+	if (a1 > a2)
+	{
+		swap = arg1;
+		arg1 = arg2;
+		arg2 = swap;
+	}
+
+	if (arg1 == PG_INT16_MIN)
+	{
+		if (arg2 == 0 || arg2 == PG_INT16_MIN)
+			ereport(ERROR,
+	(errcode(ERRCODE_NUMERIC_VALUE_OUT_OF_RANGE),
+	 errmsg("smallint out of range")));
+		arg2 = -arg2;
+	}
+
+	while (arg2 != 0)
+	{
+		swap = arg2;
+		arg2 = arg1 % arg2;
+		arg1 = swap;
+	}
+
+	if (arg1 < 0)
+		arg1 = -arg1;
+
+	return arg1;
+}
+
+Datum
+int4gcd(PG_FUNCTION_ARGS)
+{
+	int32	arg1 = PG_GETARG_INT32(0);
+	int32	arg2 = PG_GETARG_INT32(1);
+	int32	result;
+
+	result = int4gcd_internal(arg1, arg2);
+
+	PG_RETURN_INT32(result);
+}
+
+Datum
+int2gcd(PG_FUNCTION_ARGS)
+{
+	int16	arg1 = PG_GETARG_INT16(0);
+	int16	arg2 = PG_GETARG_INT16(1);
+	int16	result;
+
+	result = int2gcd_internal(arg1, arg2);
+
+	PG_RETURN_INT16(result);
+}
+
+/*
+ * Least Common Multiple
+ *
+ * Both arguments must be positive.  If either argument is 0 we can just return
+ * 0 without further calculation.  This has the nice side effect of avoiding a
+ * division by zero for lcm(0, 0) since we use gcd in the formula.  Also, if
+ * both arguments are the same, we can just return it without further
+ * calculation.
+ */
+
+Datum
+int4lcm(PG_FUNCTION_ARGS)
+{
+	int32	

Re: Greatest Common Divisor

2020-01-03 Thread Vik Fearing
On 04/01/2020 01:26, Vik Fearing wrote:
> On 04/01/2020 01:21, Tom Lane wrote:
>> Vik Fearing  writes:
>>> On 03/01/2020 20:14, Fabien COELHO wrote:
 I'm unsure about gcd(INT_MIN, 0) should error. Possibly 0 would be nicer?
>>> What justification for that do you have?
>> Zero is the "correct" answer for that, isn't it, independently of overflow
>> considerations?  
>
> I would say not.  The correct answer is INT_MIN but we've decided a
> negative result is not desirable.


Wolfram Alpha agrees.

https://www.wolframalpha.com/input/?i=gcd%28-9223372036854775808%2C0%29

-- 

Vik Fearing





Re: Greatest Common Divisor

2020-01-03 Thread Tom Lane
Vik Fearing  writes:
> On 04/01/2020 01:21, Tom Lane wrote:
>> Zero is the "correct" answer for that, isn't it, independently of overflow
>> considerations?  

> I would say not.

Oh, right, I was misremembering the identity gcd(a,0) as being 0 not a.
Never mind that then.

> The correct answer is INT_MIN but we've decided a
> negative result is not desirable.

Agreed.  On the other hand, we could stave off overflow the same
way we discussed for lcm: make it return int8.  We're still stuck
with the special case for INT64_MIN in gcd64 of course, so maybe
that's just inconsistent rather than being worthwhile.

[ thinks for a bit... ]  In practice, I imagine few people use gcd on
negative values, so doing weird things with the datatype choices is
probably not better than throwing an error for this case.

regards, tom lane




Re: Greatest Common Divisor

2020-01-03 Thread Vik Fearing
On 04/01/2020 01:21, Tom Lane wrote:
> Vik Fearing  writes:
>> On 03/01/2020 20:14, Fabien COELHO wrote:
>>> I'm unsure about gcd(INT_MIN, 0) should error. Possibly 0 would be nicer?
>> What justification for that do you have?
> Zero is the "correct" answer for that, isn't it, independently of overflow
> considerations?  


I would say not.  The correct answer is INT_MIN but we've decided a
negative result is not desirable.


> We should strive to give the correct answer if it's known
> and representable, rather than have arbitrary failure conditions.


On that we fully agree.


> (IOW, we should throw errors only when the *result* is out of range
> or undefined, not just because the input is an edge case.)


That's what I do with the rest of it.  INT_MIN is only an error if the
result of the calculation is also INT_MIN.

-- 

Vik Fearing





Re: Greatest Common Divisor

2020-01-03 Thread Tom Lane
Vik Fearing  writes:
> On 03/01/2020 20:14, Fabien COELHO wrote:
>> I'm unsure about gcd(INT_MIN, 0) should error. Possibly 0 would be nicer?

> What justification for that do you have?

Zero is the "correct" answer for that, isn't it, independently of overflow
considerations?  We should strive to give the correct answer if it's known
and representable, rather than have arbitrary failure conditions.

(IOW, we should throw errors only when the *result* is out of range
or undefined, not just because the input is an edge case.)

regards, tom lane




Re: Greatest Common Divisor

2020-01-03 Thread Vik Fearing
On 04/01/2020 00:49, Tom Lane wrote:
> Vik Fearing  writes:
>> On 03/01/2020 20:14, Fabien COELHO wrote:
>>> The point of swapping is to a void possibly expensive modulo, but this
>>> should be done on absolute values, otherwise it may not achieve its
>>> purpose as stated by the comment?
>> Ah, true. How widespread are these architectures that need this special
>> treatment? Is it really worth handling?
> On some older RISC architectures, integer division is really slow, like
> slower than floating-point.  I'm not sure if that's true on any platform
> people still care about though.  In recent years, CPU architects have been
> able to throw all the transistors they needed at such problems.  On a
> machine with single-cycle divide, it's likely that the extra
> compare-and-branch is a net loss.


OK.


> Might be worth checking it on ARM in particular, as being a RISC
> architecture that's still popular.


I don't know how I would check this.


> Also, if we end up having a "numeric" implementation, it absolutely is
> worth it for that, because there is nothing cheap about numeric_div.


The patch includes a numeric version, and I take care to short-circuit
everything I can.


> I'd be sort of inclined to have the swap in the other implementations
> just to keep the algorithms as much alike as possible.


They can't quite be the same behavior because numeric doesn't have the
unrepresentable -INT_MIN problem, and integers don't have NaN.

-- 

Vik Fearing





Re: Greatest Common Divisor

2020-01-03 Thread Tom Lane
Andres Freund  writes:
> On 2020-01-03 18:49:18 -0500, Tom Lane wrote:
>> On a machine with single-cycle divide, it's likely that the extra
>> compare-and-branch is a net loss.

> Which architecture has single cycle division? I think it's way above
> that, based on profiles I've seen. And Agner seems to back me up:
> https://www.agner.org/optimize/instruction_tables.pdf
> That lists a 32/64 idiv with a latency of ~26/~42-95 cycles, even on a
> moder uarch like skylake-x.

Huh.  I figured Intel would have thrown sufficient transistors at that
problem by now.  But per that result, it's worth having the swap step
even on CISC, never mind RISC.

regards, tom lane




Re: Greatest Common Divisor

2020-01-03 Thread Andres Freund
Hi,

On 2020-01-03 18:49:18 -0500, Tom Lane wrote:
> On some older RISC architectures, integer division is really slow, like
> slower than floating-point.  I'm not sure if that's true on any platform
> people still care about though.  In recent years, CPU architects have been
> able to throw all the transistors they needed at such problems.  On a
> machine with single-cycle divide, it's likely that the extra
> compare-and-branch is a net loss.

Which architecture has single cycle division? I think it's way above
that, based on profiles I've seen. And Agner seems to back me up:
https://www.agner.org/optimize/instruction_tables.pdf

That lists a 32/64 idiv with a latency of ~26/~42-95 cycles, even on a
moder uarch like skylake-x.

Greetings,

Andres Freund




Re: Greatest Common Divisor

2020-01-03 Thread Tom Lane
Vik Fearing  writes:
> On 03/01/2020 20:14, Fabien COELHO wrote:
>> The point of swapping is to a void possibly expensive modulo, but this
>> should be done on absolute values, otherwise it may not achieve its
>> purpose as stated by the comment?

> Ah, true. How widespread are these architectures that need this special
> treatment? Is it really worth handling?

On some older RISC architectures, integer division is really slow, like
slower than floating-point.  I'm not sure if that's true on any platform
people still care about though.  In recent years, CPU architects have been
able to throw all the transistors they needed at such problems.  On a
machine with single-cycle divide, it's likely that the extra
compare-and-branch is a net loss.

Might be worth checking it on ARM in particular, as being a RISC
architecture that's still popular.

Also, if we end up having a "numeric" implementation, it absolutely is
worth it for that, because there is nothing cheap about numeric_div.
I'd be sort of inclined to have the swap in the other implementations
just to keep the algorithms as much alike as possible.

regards, tom lane




Re: Greatest Common Divisor

2020-01-03 Thread Andres Freund
On 2020-01-03 18:00:01 -0500, Tom Lane wrote:
> Alvaro Herrera  writes:
> > On 2020-Jan-03, Robert Haas wrote:
> >> Then every time we add a function, or anything else, we can bikeshed
> >> about whether it should go in pg_catalog or pg_extra!
> 
> > Yeah, I was just thinking about that :-)  I was thinking that all
> > standard-mandated functions, as well as system functions, should be in
> > pg_catalog; and otherwise stuff should not get in the user's way.
> 
> I think that ship sailed a long time ago, frankly.
> 
> Why is it that this particular proposal is such a problem that we
> need to redesign how we add features?  There are currently 2977
> rows in a default installation's pg_proc, with 2447 unique values
> of proname.  Certainly at least a couple of thousand of them are not
> standard-mandated; despite which there are only 357 named 'pg_something'.
> gcd and/or lcm are not going to move the needle noticeably.
> 
> I'd also submit that just pushing a bunch of built-in stuff into a
> schema that's behind the users' schema instead of in front doesn't
> mean that all is magically better.  There are still going to be the
> same issues that make CVE-2018-1058 such a problem, but now we get
> to have them in both directions not just one:
> 
> * a system-supplied function in "pg_extra" could still capture a call
> away from a user-supplied one in an earlier schema, if it is a better
> match to the actual argument types;
> 
> * malicious users now have a much better chance to capture other
> people's calls to "pg_extra" functions, since they can just drop an
> exact match into public.
> 
> (BTW, I'm pretty sure we've had this conversation before.  I
> definitely recall a proposal to try to move functions not meant
> for user consumption at all, such as index support functions,
> into a whole other schema that wouldn't be in the path period.
> It went nowhere, partly because those functions don't seem to
> be big problems in practice.)

+1 to all of this.




Re: Greatest Common Divisor

2020-01-03 Thread Tom Lane
Alvaro Herrera  writes:
> On 2020-Jan-03, Robert Haas wrote:
>> Then every time we add a function, or anything else, we can bikeshed
>> about whether it should go in pg_catalog or pg_extra!

> Yeah, I was just thinking about that :-)  I was thinking that all
> standard-mandated functions, as well as system functions, should be in
> pg_catalog; and otherwise stuff should not get in the user's way.

I think that ship sailed a long time ago, frankly.

Why is it that this particular proposal is such a problem that we
need to redesign how we add features?  There are currently 2977
rows in a default installation's pg_proc, with 2447 unique values
of proname.  Certainly at least a couple of thousand of them are not
standard-mandated; despite which there are only 357 named 'pg_something'.
gcd and/or lcm are not going to move the needle noticeably.

I'd also submit that just pushing a bunch of built-in stuff into a
schema that's behind the users' schema instead of in front doesn't
mean that all is magically better.  There are still going to be the
same issues that make CVE-2018-1058 such a problem, but now we get
to have them in both directions not just one:

* a system-supplied function in "pg_extra" could still capture a call
away from a user-supplied one in an earlier schema, if it is a better
match to the actual argument types;

* malicious users now have a much better chance to capture other
people's calls to "pg_extra" functions, since they can just drop an
exact match into public.

(BTW, I'm pretty sure we've had this conversation before.  I
definitely recall a proposal to try to move functions not meant
for user consumption at all, such as index support functions,
into a whole other schema that wouldn't be in the path period.
It went nowhere, partly because those functions don't seem to
be big problems in practice.)

regards, tom lane




Re: Greatest Common Divisor

2020-01-03 Thread Vik Fearing
On 03/01/2020 20:14, Fabien COELHO wrote:
>
> Bonsoir Vik,
>
>  +int4gcd_internal(int32 arg1, int32 arg2)
>  +{
>  +   int32   swap;
>  +
>  +   /*
>  +    * Put the greater value in arg1.
>  +    * This would happen automatically in the loop below, but
> avoids  an
>  +    * expensive modulo simulation on some architectures.
>  +    */
>  +   if (arg1 < arg2)
>  +   {
>  +   swap = arg1;
>  +   arg1 = arg2;
>  +   arg2 = swap;
>  +   }
>
>
> The point of swapping is to a void possibly expensive modulo, but this
> should be done on absolute values, otherwise it may not achieve its
> purpose as stated by the comment?


Ah, true.  How widespread are these architectures that need this special
treatment?  Is it really worth handling?


> I'm unsure about gcd(INT_MIN, 0) should error. Possibly 0 would be nicer?


What justification for that do you have?

-- 

Vik Fearing





Re: Greatest Common Divisor

2020-01-03 Thread Alvaro Herrera
On 2020-Jan-03, Robert Haas wrote:

> On Fri, Jan 3, 2020 at 4:11 PM Alvaro Herrera  
> wrote:
> > Maybe a very simple solution is indeed to have a separate pg_math or
> > pg_extra or whatever, which by default is *last* in the search_path.
> > That would make a user's gcd() be chosen preferently, if one exists.
> 
> Then every time we add a function, or anything else, we can bikeshed
> about whether it should go in pg_catalog or pg_extra!

Yeah, I was just thinking about that :-)  I was thinking that all
standard-mandated functions, as well as system functions, should be in
pg_catalog; and otherwise stuff should not get in the user's way.

> FWIW, EnterpriseDB has something like this for Advanced Server, and it
> actually adds a fair amount of complexity, much of it around
> OverrideSearchPath. It's not unmanageable, but it's not trivial,
> either.

Oh, hmm.  okay.

-- 
Álvaro Herrerahttps://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: Greatest Common Divisor

2020-01-03 Thread Robert Haas
On Fri, Jan 3, 2020 at 4:11 PM Alvaro Herrera  wrote:
> Maybe a very simple solution is indeed to have a separate pg_math or
> pg_extra or whatever, which by default is *last* in the search_path.
> That would make a user's gcd() be chosen preferently, if one exists.

Then every time we add a function, or anything else, we can bikeshed
about whether it should go in pg_catalog or pg_extra!

FWIW, EnterpriseDB has something like this for Advanced Server, and it
actually adds a fair amount of complexity, much of it around
OverrideSearchPath. It's not unmanageable, but it's not trivial,
either.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: Greatest Common Divisor

2020-01-03 Thread Robert Haas
On Fri, Jan 3, 2020 at 3:51 PM Merlin Moncure  wrote:
> Is that right? Default search_path is for pg_catalog to resolve before
> public.  Lightly testing with a hand rolled pg_advisory_lock
> implementation that raise a notice, my default database seemed to
> prefer the build in function.  Maybe I'm not following you.

Nope, I'm just wrong. Sorry.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: Greatest Common Divisor

2020-01-03 Thread Chapman Flack
On 1/3/20 4:10 PM, Alvaro Herrera wrote:

> Maybe a very simple solution is indeed to have a separate pg_math or
> pg_extra or whatever, which by default is *last* in the search_path.
> That would make a user's gcd() be chosen preferently, if one exists.

I'm liking the direction this is going.

Regards,
-Chap




Re: Greatest Common Divisor

2020-01-03 Thread Alvaro Herrera
On 2020-Jan-03, Merlin Moncure wrote:

> On Fri, Jan 3, 2020 at 1:32 PM Robert Haas  wrote:

> > True, but because of the way search_path is typically set, they'd
> > probably continue to get their own version anyway, so I'm not sure
> > what the problem is.
> 
> Is that right? Default search_path is for pg_catalog to resolve before
> public.  Lightly testing with a hand rolled pg_advisory_lock
> implementation that raise a notice, my default database seemed to
> prefer the build in function.  Maybe I'm not following you.

Maybe a very simple solution is indeed to have a separate pg_math or
pg_extra or whatever, which by default is *last* in the search_path.
That would make a user's gcd() be chosen preferently, if one exists.

-- 
Álvaro Herrerahttps://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: Greatest Common Divisor

2020-01-03 Thread Chapman Flack
On 1/3/20 3:09 PM, Peter Eisentraut wrote:
> Geometry is generally in scope, though, for Postgres specifically and
> for databases in general.
> 
> Abstract algebra is not in scope, so far, and we still haven't been told
> the use case for this.

It's funny, I think I've used gcd and lcm in real life way more often
than sinh and cosh, maybe even as often as sin and cos. For example,
how many times around will I have to go with this engine crankshaft
to be able to confirm the painted links on the timing chain really
do line up with the sprocket marks? (Need to count the sprocket
teeth and the chain links.)

Or, if I'm cycling through two different-length tuple stores, how
many times before the same tuples coincide again? That isn't a question
I've yet had an occasion to face, but I don't have to squint real hard
to imagine it arising in a database in some situation or other. This
is just me riffing, as of course I'm not the person who had such a
pressing use case as to justify sitting down to write the patch.

Another funny thing: this message sent me googling just to indulge
my own "is gcd more abstract algebra or number theory?" quibble*, and
I ended up discovering there are more algorithms for it than the
Euclidean one I remember.

There's a binary one using only ands, subtractions, and shifts,
asymptotically the same as Euclid but perhaps somewhat faster:
https://en.wikipedia.org/wiki/Binary_GCD_algorithm

It looks fairly simple to code up, if not quite as short as Euclid.

There's at least one specific to representations like numeric:
https://en.wikipedia.org/wiki/Lehmer%27s_GCD_algorithm

... considerably more effort to implement though.

It might be possible, if there are crypto libraries we're already
linking to for other reasons like SSL, there could be good
big-number gcd implementations already in there.

Regards,
-Chap


* maybe I've decided to call it number theory if the things being gcd'd
are integers, abstract algebra if they belong to other commutative rings.




Re: Greatest Common Divisor

2020-01-03 Thread Merlin Moncure
On Fri, Jan 3, 2020 at 1:32 PM Robert Haas  wrote:
>
> On Fri, Jan 3, 2020 at 2:27 PM Chapman Flack  wrote:
> > On 1/3/20 2:11 PM, Robert Haas wrote:
> > > and moving things to another schema does not help with that. It does
> > > potentially help with the namespace pollution issue, but how much of
> > > an issue is that anyway? Unless you've set up an unusual search_path
> > > configuration, your own schemas probably precede pg_catalog in your
> > > search path, besides which it seems unlikely that many people have a
> > > gcd() function that does anything other than take the greatest common
> > > divisor.
> >
> > As seen in this thread though, there can be edge cases of "take the
> > greatest common divisor" that might not be identically treated in a
> > thoroughly-reviewed addition to core as in someone's hastily-rolled
> > local version.
>
> True, but because of the way search_path is typically set, they'd
> probably continue to get their own version anyway, so I'm not sure
> what the problem is.

Is that right? Default search_path is for pg_catalog to resolve before
public.  Lightly testing with a hand rolled pg_advisory_lock
implementation that raise a notice, my default database seemed to
prefer the build in function.  Maybe I'm not following you.

> There are counter-arguments to that, though. Maintaining a lot of
> extensions with only one or two functions in them is a nuisance.
> Having things installed by default is convenient for wanting to use
> them. Maintaining contrib code so that it works whether or not the SQL
> definitions have been updated via ALTER EXTENSION .. UPDATE takes some
> work and thought, and sometimes we screw it up.

If the external contract changes (which seems likely for gcd) than I
would much rather have the core team worry about this than force your
users to worry about it, which is what putting the function in core
would require them to do (if version < x call it this way, > y then
that way etc).  This is exactly why we shouldn't be putting non
standard items in core (maybe excepting some pg_ prefixed
administration functions).

Now, it's quite unfair to $OP to saddle his proposal and patch with
the broader considerations of core/extension packaging, so if some
kind of rational framework can be applied to the NEXT submission, or a
least a discussion about this can start, those are all good options.
But we need to start from somewhere, and moving forward with, "If it's
not sql standard or prefixed with pg_, it ought not to be in
pg_catalog" might be a good way to open the discussion.

merlin




Re: Greatest Common Divisor

2020-01-03 Thread Peter Eisentraut

On 2020-01-03 16:22, Tom Lane wrote:

Peter Eisentraut  writes:

On 2020-01-02 15:50, Dean Rasheed wrote:

Out of curiosity, what was the original use-case for this?



Yeah, I'm wondering, is this useful for any typical analytics or
business application?  Otherwise, abstract algebra functionality seems a
bit out of scope.


Nobody complained when we added sinh, cosh, tanh, asinh, acosh, atanh
last year, so I'm feeling skeptical of claims that gcd should be out
of scope.


Geometry is generally in scope, though, for Postgres specifically and 
for databases in general.


Abstract algebra is not in scope, so far, and we still haven't been told 
the use case for this.


--
Peter Eisentraut  http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: Greatest Common Divisor

2020-01-03 Thread Robert Haas
On Fri, Jan 3, 2020 at 2:27 PM Chapman Flack  wrote:
> On 1/3/20 2:11 PM, Robert Haas wrote:
> > and moving things to another schema does not help with that. It does
> > potentially help with the namespace pollution issue, but how much of
> > an issue is that anyway? Unless you've set up an unusual search_path
> > configuration, your own schemas probably precede pg_catalog in your
> > search path, besides which it seems unlikely that many people have a
> > gcd() function that does anything other than take the greatest common
> > divisor.
>
> As seen in this thread though, there can be edge cases of "take the
> greatest common divisor" that might not be identically treated in a
> thoroughly-reviewed addition to core as in someone's hastily-rolled
> local version.

True, but because of the way search_path is typically set, they'd
probably continue to get their own version anyway, so I'm not sure
what the problem is.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: Greatest Common Divisor

2020-01-03 Thread Chapman Flack
On 1/3/20 2:11 PM, Robert Haas wrote:
> and moving things to another schema does not help with that. It does
> potentially help with the namespace pollution issue, but how much of
> an issue is that anyway? Unless you've set up an unusual search_path
> configuration, your own schemas probably precede pg_catalog in your
> search path, besides which it seems unlikely that many people have a
> gcd() function that does anything other than take the greatest common
> divisor.

As seen in this thread though, there can be edge cases of "take the
greatest common divisor" that might not be identically treated in a
thoroughly-reviewed addition to core as in someone's hastily-rolled
local version.

Regards,
-Chap




Re: Greatest Common Divisor

2020-01-03 Thread Fabien COELHO



Bonsoir Vik,

 +int4gcd_internal(int32 arg1, int32 arg2)
 +{
 +   int32   swap;
 +
 +   /*
 +* Put the greater value in arg1.
 +* This would happen automatically in the loop below, but avoids  an
 +* expensive modulo simulation on some architectures.
 +*/
 +   if (arg1 < arg2)
 +   {
 +   swap = arg1;
 +   arg1 = arg2;
 +   arg2 = swap;
 +   }


The point of swapping is to a void possibly expensive modulo, but this 
should be done on absolute values, otherwise it may not achieve its 
purpose as stated by the comment?



gcd() is now strictly positive, so INT_MIN is no longer a valid result.


Ok.

I'm unsure about gcd(INT_MIN, 0) should error. Possibly 0 would be 
nicer?


--
Fabien.




Re: Greatest Common Divisor

2020-01-03 Thread Robert Haas
On Fri, Jan 3, 2020 at 1:57 PM Chapman Flack  wrote:
> Is there a middle ground staring us in the face, where certain things
> could be added in core, but in a new schema like pg_math (pg_ !), so
> if you want them you put them on your search path or qualify them
> explicitly, and if you don't, you don't?

I guess, but it seems like a patch whose mandate is to add one or two
functions should not be burdened with inventing an entirely new way to
do extensibility. Also, I'm not entirely sure that really addresses
all the concerns. Part of my concern about continually adding new
functions to core comes from the fact that it bloats the core code,
and moving things to another schema does not help with that. It does
potentially help with the namespace pollution issue, but how much of
an issue is that anyway? Unless you've set up an unusual search_path
configuration, your own schemas probably precede pg_catalog in your
search path, besides which it seems unlikely that many people have a
gcd() function that does anything other than take the greatest common
divisor.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: Greatest Common Divisor

2020-01-03 Thread Chapman Flack
On 1/3/20 1:46 PM, Robert Haas wrote:
> On Fri, Jan 3, 2020 at 1:10 PM Merlin Moncure  wrote:
>> Just stop doing it.  It's very little extra work to package an item
>> into an extension and this protects your hapless users who might have
>> implemented a function called gcd() that does something different.
>> ...
> There are counter-arguments to that, though. Maintaining a lot of
> extensions with only one or two functions in them is a nuisance.
> Having things installed by default is convenient for wanting to use
> them. Maintaining contrib code so that it works whether or not the SQL
> definitions have been updated via ALTER EXTENSION .. UPDATE takes some
> work and thought, and sometimes we screw it up.

Is there a middle ground staring us in the face, where certain things
could be added in core, but in a new schema like pg_math (pg_ !), so
if you want them you put them on your search path or qualify them
explicitly, and if you don't, you don't?

Regards,
-Chap




Re: Greatest Common Divisor

2020-01-03 Thread Robert Haas
On Fri, Jan 3, 2020 at 1:10 PM Merlin Moncure  wrote:
> Just stop doing it.  It's very little extra work to package an item
> into an extension and this protects your hapless users who might have
> implemented a function called gcd() that does something different.
> Ideally, the public namespace should contain (by default) only sql
> standard functions with all non-standard material in an appropriate
> extension.  Already released material is obviously problematic and
> needs more thought but we ought to at least stop making the problem
> worse if possible.

There are counter-arguments to that, though. Maintaining a lot of
extensions with only one or two functions in them is a nuisance.
Having things installed by default is convenient for wanting to use
them. Maintaining contrib code so that it works whether or not the SQL
definitions have been updated via ALTER EXTENSION .. UPDATE takes some
work and thought, and sometimes we screw it up.

I don't find any position on this topic to be without merit.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: Greatest Common Divisor

2020-01-03 Thread Vik Fearing
On 02/01/2020 16:12, Tom Lane wrote:
> Stephen Frost  writes:
>> * Dean Rasheed (dean.a.rash...@gmail.com) wrote:
>>> I'm not objecting to adding it, I'm just curious. In fact, I think
>>> that if we do add this, then we should probably add lcm() at the same
>>> time, since handling its overflow cases is sufficiently non-trivial to
>>> justify not requiring users to have to implement it themselves.
>> I tend to agree with this.
> Does this impact the decision about whether we need a variant for
> numeric?  I was leaning against that, primarily because (a)
> it'd introduce a set of questions about what to do with non-integral
> inputs, and (b) it'd make the patch quite a lot larger, I imagine.
> But a variant of lcm() that returns numeric would have much more
> resistance to overflow.
>
> Maybe we could just define "lcm(bigint, bigint) returns numeric"
> and figure that that covers all cases, but it feels slightly
> weird.  You couldn't do lcm(lcm(a,b),c) without casting.
> I guess that particular use-case could be addressed with
> "lcm(variadic bigint[]) returns numeric", but that's getting
> really odd.


Okay.  Here is a version that should handle everyone's comments.


gcd() is now strictly positive, so INT_MIN is no longer a valid result.


I added an lcm() function. It returns the same type as its arguments so
overflow is possible.  I made this choice because int2mul returns int2
(and same for its friends).  One can just cast to a bigger integer if
needed.


Because of that, I added a version of gcd() and lcm() for numeric.  This
was my first time working with numeric so reviewers should pay extra
attention there, please.


Patch attached.

-- 

Vik Fearing

diff --git a/doc/src/sgml/func.sgml b/doc/src/sgml/func.sgml
index 57a1539506..e2b7d6d240 100644
--- a/doc/src/sgml/func.sgml
+++ b/doc/src/sgml/func.sgml
@@ -870,6 +870,32 @@
-43
   
 
+  
+   
+    
+ gcd
+
+gcd(a, b)
+   
+   (same as argument types)
+   greatest common divisor
+   gcd(1071, 462)
+   21
+  
+
+  
+   
+
+ lcm
+
+lcm(a, b)
+   
+   (same as argument types)
+   least common multiple
+   lcm(1071, 462)
+   23562
+  
+
   

 
diff --git a/src/backend/utils/adt/int.c b/src/backend/utils/adt/int.c
index 583ce71e66..a70009ae09 100644
--- a/src/backend/utils/adt/int.c
+++ b/src/backend/utils/adt/int.c
@@ -1196,6 +1196,199 @@ int2abs(PG_FUNCTION_ARGS)
 	PG_RETURN_INT16(result);
 }
 
+/*
+ * Greatest Common Divisor
+ *
+ * Special cases:
+ *   - gcd(x, 0) = x; because 0 is divisible by anything
+ *   - gcd(0, 0) = 0; complies with the previous definition and is a
+ *common convention
+ *
+ * The following cases involving INT_MIN have two possible results.  They could
+ * return INT_MIN because INT_MIN is a valid divisor of INT_MIN, or they could
+ * throw an exception because the result is negative.
+ * The consensus is to throw an exception.
+ *
+ *   - gcd(INT_MIN, 0)
+ *   - gcd(INT_MIN, INT_MIN)
+ *
+ * Any other value with INT_MIN will be a positive value representable within
+ * the data type.
+ */
+static int32
+int4gcd_internal(int32 arg1, int32 arg2)
+{
+	int32	swap;
+
+	/*
+	 * Put the greater value in arg1.
+	 * This would happen automatically in the loop below, but avoids an
+	 * expensive modulo simulation on some architectures.
+	 */
+	if (arg1 < arg2)
+	{
+		swap = arg1;
+		arg1 = arg2;
+		arg2 = swap;
+	}
+
+	/* Special care needs to be taken with INT_MIN.  See comments above. */
+	if (arg2 == PG_INT32_MIN)
+	{
+		if (arg1 == 0 || arg1 == PG_INT32_MIN)
+			ereport(ERROR,
+	(errcode(ERRCODE_NUMERIC_VALUE_OUT_OF_RANGE),
+	 errmsg("integer out of range")));
+		arg1 = -arg1;
+	}
+
+	/* Find GCD using the basic Euclidean algorithm */
+	while (arg2 != 0)
+	{
+		swap = arg2;
+		arg2 = arg1 % arg2;
+		arg1 = swap;
+	}
+
+	/*
+	 * Make sure the result is positive. (We know we don't have INT_MIN
+	 * anymore).
+	 */
+	if (arg1 < 0)
+		arg1 = -arg1;
+
+	return arg1;
+}
+
+static int16
+int2gcd_internal(int16 arg1, int16 arg2)
+{
+	/* See int4gcd_internal for commented version. */
+	int16	swap;
+
+	if (arg1 < arg2)
+	{
+		swap = arg1;
+		arg1 = arg2;
+		arg2 = swap;
+	}
+
+	if (arg2 == PG_INT16_MIN)
+	{
+		if (arg1 == 0 || arg1 == PG_INT16_MIN)
+			ereport(ERROR,
+	(errcode(ERRCODE_NUMERIC_VALUE_OUT_OF_RANGE),
+	 errmsg("smallint out of range")));
+		arg1 = -arg1;
+	}
+
+	while (arg2 != 0)
+	{
+		swap = arg2;
+		arg2 = arg1 % arg2;
+		arg1 = swap;
+	}
+
+	if (arg1 < 0)
+		arg1 = -arg1;
+
+	return arg1;
+}
+
+Datum
+int4gcd(PG_FUNCTION_ARGS)
+{
+	int32	arg1 = PG_GETARG_INT32(0);
+	int32	arg2 = PG_GETARG_INT32(1);
+	int32	result;
+
+	result = int4gcd_internal(arg1, arg2);
+
+	PG_RETURN_INT32(resul

Re: Greatest Common Divisor

2020-01-03 Thread Merlin Moncure
On Fri, Jan 3, 2020 at 10:24 AM Robert Haas  wrote:
>
> On Fri, Jan 3, 2020 at 10:23 AM Tom Lane  wrote:
> > Now, those functions were just exposing libc functionality, so there
> > wasn't a lot of code to write.  There might be a good argument that
> > gcd isn't useful enough to justify the amount of code we'd have to
> > add (especially if we allow it to scope-creep into needing to deal
> > with "numeric" calculations).  But I'm not on board with just
> > dismissing it as uninteresting.
>
> Yeah. There's always the question with things like this as to whether
> we ought to push certain things into contrib modules that are not
> installed by default to avoid bloating the set of things built into
> the core server. But it's hard to know where to draw the line.

Just stop doing it.  It's very little extra work to package an item
into an extension and this protects your hapless users who might have
implemented a function called gcd() that does something different.
Ideally, the public namespace should contain (by default) only sql
standard functions with all non-standard material in an appropriate
extension.  Already released material is obviously problematic and
needs more thought but we ought to at least stop making the problem
worse if possible.

merlin




Re: Greatest Common Divisor

2020-01-03 Thread Alvaro Herrera
On 2020-Jan-03, Robert Haas wrote:

> On Fri, Jan 3, 2020 at 10:23 AM Tom Lane  wrote:
> > Now, those functions were just exposing libc functionality, so there
> > wasn't a lot of code to write.  There might be a good argument that
> > gcd isn't useful enough to justify the amount of code we'd have to
> > add (especially if we allow it to scope-creep into needing to deal
> > with "numeric" calculations).  But I'm not on board with just
> > dismissing it as uninteresting.
> 
> Yeah. There's always the question with things like this as to whether
> we ought to push certain things into contrib modules that are not
> installed by default to avoid bloating the set of things built into
> the core server. But it's hard to know where to draw the line. There's
> no objective answer to the question of whether gcd() or sinh() is more
> useful to have in core;

The SQL standard's feature T622 requires trigonometric functions, while
it doesn't list gcd() or anything of the sort, so there's that.

-- 
Álvaro Herrerahttps://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: Greatest Common Divisor

2020-01-03 Thread Robert Haas
On Fri, Jan 3, 2020 at 10:23 AM Tom Lane  wrote:
> Now, those functions were just exposing libc functionality, so there
> wasn't a lot of code to write.  There might be a good argument that
> gcd isn't useful enough to justify the amount of code we'd have to
> add (especially if we allow it to scope-creep into needing to deal
> with "numeric" calculations).  But I'm not on board with just
> dismissing it as uninteresting.

Yeah. There's always the question with things like this as to whether
we ought to push certain things into contrib modules that are not
installed by default to avoid bloating the set of things built into
the core server. But it's hard to know where to draw the line. There's
no objective answer to the question of whether gcd() or sinh() is more
useful to have in core; each is more useful to people who need that
one but not the other, and trying to guess whether more or fewer
people need gcd() than sinh() seems like a fool's errand. Perhaps in
retrospect we would be better off having a 'math' extension where a
lot of this stuff lives, and people who want that extension can
install it and others need not bother. But, to try to create that now
and move things there would break upgrades for an exceedingly marginal
benefit. I don't really like the namespace pollution that comes with
accepting feature requests like this, but it's hard to argue that it's
a serious show-stopper or that the cure is any less bad than the
disease. And I'm sure that I'd be much more likely to use gcd() or
lcm() in a query than tanh()...

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: Greatest Common Divisor

2020-01-03 Thread Tom Lane
Peter Eisentraut  writes:
> On 2020-01-02 15:50, Dean Rasheed wrote:
>> Out of curiosity, what was the original use-case for this?

> Yeah, I'm wondering, is this useful for any typical analytics or 
> business application?  Otherwise, abstract algebra functionality seems a 
> bit out of scope.

Nobody complained when we added sinh, cosh, tanh, asinh, acosh, atanh
last year, so I'm feeling skeptical of claims that gcd should be out
of scope.

Now, those functions were just exposing libc functionality, so there
wasn't a lot of code to write.  There might be a good argument that
gcd isn't useful enough to justify the amount of code we'd have to
add (especially if we allow it to scope-creep into needing to deal
with "numeric" calculations).  But I'm not on board with just
dismissing it as uninteresting.

regards, tom lane




Re: Greatest Common Divisor

2020-01-03 Thread Peter Eisentraut

On 2020-01-02 15:50, Dean Rasheed wrote:

Out of curiosity, what was the original use-case for this?


Yeah, I'm wondering, is this useful for any typical analytics or 
business application?  Otherwise, abstract algebra functionality seems a 
bit out of scope.


--
Peter Eisentraut  http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: Greatest Common Divisor

2020-01-03 Thread Fabien COELHO




Normally gcd() returns a positive integer, and gcd(a,0) = gcd(a,a) =
abs(a). But since abs(INT_MIN) cannot be represented as a 32-bit
integer, both those cases should throw an integer-out-of-range error.


I'm also in favor of that option, rather than sending a negative result as 
a result.


About lcm(a, b): a / gcd(a, b) * b, at least if a & b are positive. If 
not, some thoughts are needed:-)


Returning a NUMERIC as suggested by Tom would solve the overflow problem 
by sending it back to the user who has to cast. This looks ok to me.


Maybe we could provide "int4 lcm(int2, int2)", "int8 lcm(int4, int4)", as 
ISTM that there cannot be overflows on those (eg for the later: lcm <= 
a*b, a & b are 31 non-signed bits, 62 bits are needed, 63 are available).


--
Fabien.




Re: Greatest Common Divisor

2020-01-02 Thread Merlin Moncure
On Sat, Dec 28, 2019 at 12:15 PM Fabien COELHO  wrote:
>
>
> Bonsoir Vik,
>
> > I recently came across the need for a gcd function (actually I needed
> > lcm) and was surprised that we didn't have one.
>
> Why not.

Proliferation of code in the public namespace; it can displace code
that is written by others during the upgrade.

merlin




Re: Greatest Common Divisor

2020-01-02 Thread Tom Lane
Stephen Frost  writes:
> * Dean Rasheed (dean.a.rash...@gmail.com) wrote:
>> I'm not objecting to adding it, I'm just curious. In fact, I think
>> that if we do add this, then we should probably add lcm() at the same
>> time, since handling its overflow cases is sufficiently non-trivial to
>> justify not requiring users to have to implement it themselves.

> I tend to agree with this.

Does this impact the decision about whether we need a variant for
numeric?  I was leaning against that, primarily because (a)
it'd introduce a set of questions about what to do with non-integral
inputs, and (b) it'd make the patch quite a lot larger, I imagine.
But a variant of lcm() that returns numeric would have much more
resistance to overflow.

Maybe we could just define "lcm(bigint, bigint) returns numeric"
and figure that that covers all cases, but it feels slightly
weird.  You couldn't do lcm(lcm(a,b),c) without casting.
I guess that particular use-case could be addressed with
"lcm(variadic bigint[]) returns numeric", but that's getting
really odd.

regards, tom lane




Re: Greatest Common Divisor

2020-01-02 Thread Stephen Frost
Greetings,

* Dean Rasheed (dean.a.rash...@gmail.com) wrote:
> I'm not objecting to adding it, I'm just curious. In fact, I think
> that if we do add this, then we should probably add lcm() at the same
> time, since handling its overflow cases is sufficiently non-trivial to
> justify not requiring users to have to implement it themselves.

I tend to agree with this.

Thanks,

Stephen


signature.asc
Description: PGP signature


Re: Greatest Common Divisor

2020-01-02 Thread Dean Rasheed
Out of curiosity, what was the original use-case for this?

I'm not objecting to adding it, I'm just curious. In fact, I think
that if we do add this, then we should probably add lcm() at the same
time, since handling its overflow cases is sufficiently non-trivial to
justify not requiring users to have to implement it themselves.

I don't like the INT_MIN handling though:

select gcd(-2147483648,0);
 gcd
-
 -2147483648
(1 row)

select gcd(-2147483648,-2147483648);
 gcd
-
 -2147483648
(1 row)

Normally gcd() returns a positive integer, and gcd(a,0) = gcd(a,a) =
abs(a). But since abs(INT_MIN) cannot be represented as a 32-bit
integer, both those cases should throw an integer-out-of-range error.

In addition, the following case should produce 1, but for me it
produces an error. This is actually going to be platform-dependent as
it is currently implemented (see the comments in int4div and int4mod):

select gcd(-2147483648,-1);
ERROR:  floating-point exception
DETAIL:  An invalid floating-point operation was signaled. This
probably means an out-of-range result or an invalid operation, such as
division by zero.

so there needs to be some special-case INT_MIN handling at the start
to deal with these cases.

AFAIK, what gcd(0,0) should be is not well defined, but most common
implementations seem to return 0 (I checked Matlab, python and Java's
standard libraries). This seems like a reasonable extrapolation of the
rule gcd(a,0) = gcd(0,a) = gcd(a,a) = abs(a), so I don't have a
problem with doing the same here, but I think that it should be
documented (e.g., see [1]), if for no other reason than users might
expect it to be safe to divide by the result.

Regards,
Dean

[1] https://www.mathworks.com/help/matlab/ref/gcd.html




Re: Greatest Common Divisor

2019-12-29 Thread Fabien COELHO



Hello,


Because I do not trust C modulo as I had a lot of problems with it?:-)


If I recall correctly (and I'm traveling and away from those notes),
the exact semantics of C's % with negative operands was left
implementation-defined until, was it, C99 ?


Indeed, my woes with C % started before that date:-)

By Googling the C99 spec, I found: "When integers are divided, the result 
of the / operator is the algebraic quotient with any fractional part 
discarded (aka truncation toward zero). If the quotient a/b is 
representable, the expression (a/b)*b + a%b shall equal a."


Let a = 2 and b = -3, then a/b == 0 (-0.666 truncated toward zero), then

   (a/b)*b + a%b == a

=> 0 * -3 + (2 % -3) == 2

=> 2 % -3 == 2

Then with a = -2, b = 3, then a/b == 0 (same as above), and the same 
reasoning leads to


   -2 % 3 == -2

Which is indeed what was produced with C, but not with Python.

The good news is that the absolute value of the modulo is the module in 
the usual sense, which is what is needed for the Euclidian descent and 
allows fixing the sign afterwards, as Vik was doing.



So it might be ok to rely on the specified C99 behavior (whichever
behavior that is, he wrote, notelessly) for PG 12 and later, where
C99 is expected.


Yep, probably with a comment.

--
Fabien.




Re: Greatest Common Divisor

2019-12-29 Thread Chapman Flack
On 12/29/19 02:30, Fabien COELHO wrote:

>>> C modulo operator (%) is a pain because it is not positive remainder
>>> (2 % -3 == -1 vs 2 % 3 == 2, AFAICR).
>>
>> This does not seem to be the case...
> ...
> Because I do not trust C modulo as I had a lot of problems with it? :-)

If I recall correctly (and I'm traveling and away from those notes),
the exact semantics of C's % with negative operands was left
implementation-defined until, was it, C99 ?

So it might be ok to rely on the specified C99 behavior (whichever
behavior that is, he wrote, notelessly) for PG 12 and later, where
C99 is expected.

Regards,
-Chap




Re: Greatest Common Divisor

2019-12-29 Thread Vik Fearing
On 29/12/2019 08:30, Fabien COELHO wrote:
>
>>> I'm wondering what it should do on N, 0 and 0, 0. Raise an error?
>>> Return 0? Return 1? return N? There should be some logic and comments
>>> explaining it.
>>
>> Well, gcd(N, 0) is N, and gcd(0, 0) is 0, so I don't see an issue here?
>
> I think that there should be a comment.


Done.


>>> It does not seem that fixing the sign afterwards is the right thing to
>>> do. I'd rather turn both arguments positive before doing the descent.
>>
>> Why isn't it the right thing to do?
>
> Because I do not trust C modulo as I had a lot of problems with it? :-)
>
> If it works, but it should deserve a clear comment explaining why.


Surely such a comment should be on the mod functions and not in this patch.


>
>>> Which raises an issue with INT_MIN by the way, which has no positive:-(
>>
>> That's an argument against abs-ing the input values.  It's only an issue
>> with gcd(INT_MIN, INT_MIN) which currently returns INT_MIN.
>
> That should be an error instead, because -INT_MIN cannot be represented?


Why should it error?  Is INT_MIN not a valid divisor of INT_MIN?  I
added a comment instead.


>>> Also, the usual approach is to exchange args so that the largest is
>>> first, because there may be a software emulation if the hardware does
>>> not implement modulo. At least it was the case with some sparc
>>> processors 25 years ago:-)
>>
>> The args will exchange themselves.
>
> Yep, but after a possibly expensive software-emulated modulo operation?
>

I'll just trust you on this.  Swap added.


Thanks!

-- 

Vik

diff --git a/doc/src/sgml/func.sgml b/doc/src/sgml/func.sgml
index 57a1539506..b2c663cd92 100644
--- a/doc/src/sgml/func.sgml
+++ b/doc/src/sgml/func.sgml
@@ -870,6 +870,19 @@
-43
   
 
+  
+   
+
+ gcd
+
+gcd(a, b)
+   
+   (same as argument types)
+   greatest common divisor
+   gcd(1071, 462)
+   21
+  
+
   

 
diff --git a/src/backend/utils/adt/int.c b/src/backend/utils/adt/int.c
index 04825fc77d..61b50db71c 100644
--- a/src/backend/utils/adt/int.c
+++ b/src/backend/utils/adt/int.c
@@ -1196,6 +1196,76 @@ int2abs(PG_FUNCTION_ARGS)
 	PG_RETURN_INT16(result);
 }
 
+/* int[24]gcd()
+ * Greatest Common Divisor
+ *
+ * By definition, gcd(0, 0) = 0 and gcd(x, 0) = x.
+ */
+Datum
+int4gcd(PG_FUNCTION_ARGS)
+{
+	int32		arg1 = PG_GETARG_INT32(0);
+	int32		arg2 = PG_GETARG_INT32(1);
+	int32		swap;
+
+	/*
+	 * Put the greater value in arg1.
+	 * This would happen automatically in the loop below, but avoids an
+	 * expensive modulo simulation on some architectures.
+	 */
+	if (arg1 < arg2)
+	{
+		swap = arg1;
+		arg1 = arg2;
+		arg2 = swap;
+	}
+
+	/* Find GCD using the basic Euclidean algorithm */
+	while (arg2 != 0)
+	{
+		swap = arg2;
+		arg2 = arg1 % arg2;
+		arg1 = swap;
+	}
+
+	/*
+	 * Make sure the result is positive. (INT_MIN will still be negative
+	 * because there is no positive representation of it).
+	 */
+	if (arg1 < 0)
+		arg1 = -arg1;
+
+	PG_RETURN_INT32(arg1);
+}
+
+Datum
+int2gcd(PG_FUNCTION_ARGS)
+{
+	/* See int4gcd for commented version. */
+	int16		arg1 = PG_GETARG_INT16(0);
+	int16		arg2 = PG_GETARG_INT16(1);
+	int16		swap;
+
+	if (arg1 < arg2)
+	{
+		swap = arg1;
+		arg1 = arg2;
+		arg2 = swap;
+	}
+
+	while (arg2 != 0)
+	{
+		swap = arg2;
+		arg2 = arg1 % arg2;
+		arg1 = swap;
+	}
+
+	if (arg1 < 0)
+		arg1 = -arg1;
+
+	PG_RETURN_INT16(arg1);
+}
+
 Datum
 int2larger(PG_FUNCTION_ARGS)
 {
diff --git a/src/backend/utils/adt/int8.c b/src/backend/utils/adt/int8.c
index a40ae40dff..59be1534a4 100644
--- a/src/backend/utils/adt/int8.c
+++ b/src/backend/utils/adt/int8.c
@@ -667,6 +667,38 @@ int8mod(PG_FUNCTION_ARGS)
 	PG_RETURN_INT64(arg1 % arg2);
 }
 
+/* int8gcd()
+ * Greatest Common Divisor
+ *
+ * See int4gcd for commented version.
+ */
+Datum
+int8gcd(PG_FUNCTION_ARGS)
+{
+	int64		arg1 = PG_GETARG_INT64(0);
+	int64		arg2 = PG_GETARG_INT64(1);
+	int64		swap;
+
+	if (arg1 < arg2)
+	{
+		swap = arg1;
+		arg1 = arg2;
+		arg2 = swap;
+	}
+
+	while (arg2 != 0)
+	{
+		swap = arg2;
+		arg2 = arg1 % arg2;
+		arg1 = swap;
+	}
+
+	if (arg1 < 0)
+		arg1 = -arg1;
+
+	PG_RETURN_INT64(arg1);
+}
+
 
 Datum
 int8inc(PG_FUNCTION_ARGS)
diff --git a/src/include/catalog/pg_proc.dat b/src/include/catalog/pg_proc.dat
index ac8f64b219..b3057f2cdd 100644
--- a/src/include/catalog/pg_proc.dat
+++ b/src/include/catalog/pg_proc.dat
@@ -10729,4 +10729,15 @@
   proname => 'pg_partition_root', prorettype => 'regclass',
   proargtypes => 'regclass', prosrc => 'pg_partition_root' },
 
+# greatest common divisor
+{ oid => '8463', descr => 'greatest common divisor',
+  proname => 'gcd', prorettype => 'int2', proargtypes => 'int2 int2',
+  prosrc => 'int2

Re: Greatest Common Divisor

2019-12-28 Thread Fabien COELHO


Bonjour Vik,


Should there be a NUMERIC version as well? I'd say maybe yes.


I thought about that, too, but also decided against it for this patch.


Hmmm. ISTM that int functions are available for numeric?


I'm wondering what it should do on N, 0 and 0, 0. Raise an error?
Return 0? Return 1? return N? There should be some logic and comments
explaining it.


Well, gcd(N, 0) is N, and gcd(0, 0) is 0, so I don't see an issue here?


I think that there should be a comment.


I'd test with INT_MIN and INT_MAX.


Okay, I'll add tests for those, instead of the pretty much random values
I have now.


C modulo operator (%) is a pain because it is not positive remainder
(2 % -3 == -1 vs 2 % 3 == 2, AFAICR).


This does not seem to be the case...


Indeed, I tested quickly with python, but it has yet another behavior as 
shown above, what a laugh!


So with C: 2 % -3 == 2, -2 % 3 == -2

Note that AFAICS there is no integer i so that 3 * i - (-2) == -2.


It does not seem that fixing the sign afterwards is the right thing to
do. I'd rather turn both arguments positive before doing the descent.


Why isn't it the right thing to do?


Because I do not trust C modulo as I had a lot of problems with it? :-)

If it works, but it should deserve a clear comment explaining why.


Which raises an issue with INT_MIN by the way, which has no positive:-(


That's an argument against abs-ing the input values.  It's only an issue
with gcd(INT_MIN, INT_MIN) which currently returns INT_MIN.


That should be an error instead, because -INT_MIN cannot be represented?


Any other value with INT_MIN will be 1 or something lower than INT_MAX.


Looks ok.


Also, the usual approach is to exchange args so that the largest is
first, because there may be a software emulation if the hardware does
not implement modulo. At least it was the case with some sparc
processors 25 years ago:-)


The args will exchange themselves.


Yep, but after a possibly expensive software-emulated modulo operation?

--
Fabien.

Re: Greatest Common Divisor

2019-12-28 Thread Vik Fearing
On 28/12/2019 19:15, Fabien COELHO wrote:
>
>> So here one is, using the basic Euclidean algorithm.  I made one for
>> smallint, integer, and bigint.
>
> Should pg provide the LCM as well? Hmmm, probably not, too likely to
> overflow.


I decided against it for that reason.


> Should there be a NUMERIC version as well? I'd say maybe yes.


I thought about that, too, but also decided against it for this patch.


> I'm wondering what it should do on N, 0 and 0, 0. Raise an error?
> Return 0? Return 1? return N? There should be some logic and comments
> explaining it.


Well, gcd(N, 0) is N, and gcd(0, 0) is 0, so I don't see an issue here?


> I'd test with INT_MIN and INT_MAX.


Okay, I'll add tests for those, instead of the pretty much random values
I have now.


> Given that there are no overflows risk with the Euclidian descent, would
> it make sense that the int2 version call the int4 implementation?


Meh.


>
> C modulo operator (%) is a pain because it is not positive remainder
> (2 % -3 == -1 vs 2 % 3 == 2, AFAICR). 


This does not seem to be the case...


> It does not seem that fixing the sign afterwards is the right thing to
> do. I'd rather turn both arguments positive before doing the descent.


Why isn't it the right thing to do?


> Which raises an issue with INT_MIN by the way, which has no positive:-(


That's an argument against abs-ing the input values.  It's only an issue
with gcd(INT_MIN, INT_MIN) which currently returns INT_MIN.  Any other
value with INT_MIN will be 1 or something lower than INT_MAX.


> Also, the usual approach is to exchange args so that the largest is
> first, because there may be a software emulation if the hardware does
> not implement modulo. At least it was the case with some sparc
> processors 25 years ago:-)


The args will exchange themselves.


Thanks for the review!  Attached is a new patch that changes the
regression tests based on your comments (and another comment that I got
on irc to test gcd(b, a)).

diff --git a/doc/src/sgml/func.sgml b/doc/src/sgml/func.sgml
index 57a1539506..b2c663cd92 100644
--- a/doc/src/sgml/func.sgml
+++ b/doc/src/sgml/func.sgml
@@ -870,6 +870,19 @@
-43
   
 
+  
+   
+
+ gcd
+    
+    gcd(a, b)
+   
+   (same as argument types)
+   greatest common divisor
+   gcd(1071, 462)
+   21
+  
+
   

 
diff --git a/src/backend/utils/adt/int.c b/src/backend/utils/adt/int.c
index 04825fc77d..bc74d367bb 100644
--- a/src/backend/utils/adt/int.c
+++ b/src/backend/utils/adt/int.c
@@ -1196,6 +1196,47 @@ int2abs(PG_FUNCTION_ARGS)
 	PG_RETURN_INT16(result);
 }
 
+/* int[24]gcd()
+ * Greatest Common Divisor
+ */
+Datum
+int4gcd(PG_FUNCTION_ARGS)
+{
+	int32		arg1 = PG_GETARG_INT32(0);
+	int32		arg2 = PG_GETARG_INT32(1);
+
+	while (arg2 != 0)
+	{
+		int32	tmp = arg2;
+		arg2 = arg1 % arg2;
+		arg1 = tmp;
+	}
+
+	if (arg1 < 0)
+		arg1 = -arg1;
+
+	PG_RETURN_INT32(arg1);
+}
+
+Datum
+int2gcd(PG_FUNCTION_ARGS)
+{
+	int16		arg1 = PG_GETARG_INT16(0);
+	int16		arg2 = PG_GETARG_INT16(1);
+
+	while (arg2 != 0)
+	{
+		int16	tmp = arg2;
+		arg2 = arg1 % arg2;
+		arg1 = tmp;
+	}
+
+	if (arg1 < 0)
+		arg1 = -arg1;
+
+	PG_RETURN_INT16(arg1);
+}
+
 Datum
 int2larger(PG_FUNCTION_ARGS)
 {
diff --git a/src/backend/utils/adt/int8.c b/src/backend/utils/adt/int8.c
index a40ae40dff..97cbd1443b 100644
--- a/src/backend/utils/adt/int8.c
+++ b/src/backend/utils/adt/int8.c
@@ -667,6 +667,28 @@ int8mod(PG_FUNCTION_ARGS)
 	PG_RETURN_INT64(arg1 % arg2);
 }
 
+/* int8gcd()
+ * Greatest Common Divisor
+ */
+Datum
+int8gcd(PG_FUNCTION_ARGS)
+{
+	int64		arg1 = PG_GETARG_INT64(0);
+	int64		arg2 = PG_GETARG_INT64(1);
+
+	while (arg2 != 0)
+	{
+		int64	tmp = arg2;
+		arg2 = arg1 % arg2;
+		arg1 = tmp;
+	}
+
+	if (arg1 < 0)
+		arg1 = -arg1;
+
+	PG_RETURN_INT64(arg1);
+}
+
 
 Datum
 int8inc(PG_FUNCTION_ARGS)
diff --git a/src/include/catalog/pg_proc.dat b/src/include/catalog/pg_proc.dat
index ac8f64b219..b3057f2cdd 100644
--- a/src/include/catalog/pg_proc.dat
+++ b/src/include/catalog/pg_proc.dat
@@ -10729,4 +10729,15 @@
   proname => 'pg_partition_root', prorettype => 'regclass',
   proargtypes => 'regclass', prosrc => 'pg_partition_root' },
 
+# greatest common divisor
+{ oid => '8463', descr => 'greatest common divisor',
+  proname => 'gcd', prorettype => 'int2', proargtypes => 'int2 int2',
+  prosrc => 'int2gcd' },
+{ oid => '8464', descr => 'greatest common divisor',
+  proname => 'gcd', prorettype => 'int4', proargtypes => 'int4 int4',
+  prosrc => 'int4gcd' },
+{ oid => '8465', descr => 'greatest common divisor',
+  proname => 'gcd', prorettype => 'int8', proargtypes => 'int8 int8',
+  prosrc => 'int8gcd' },
+
 ]
diff --git a/src/test/regress/expected/int2.out b/src/test/regress/expected/int2.out
index 8c255b9e4d..cca47af7fb 100644
--- a/

Re: Greatest Common Divisor

2019-12-28 Thread Fabien COELHO


Bonsoir Vik,


I recently came across the need for a gcd function (actually I needed
lcm) and was surprised that we didn't have one.


Why not.


So here one is, using the basic Euclidean algorithm.  I made one for
smallint, integer, and bigint.


Should pg provide the LCM as well? Hmmm, probably not, too likely to 
overflow.


Should there be a NUMERIC version as well? I'd say maybe yes.

I'm wondering what it should do on N, 0 and 0, 0. Raise an error? Return 
0? Return 1? return N? There should be some logic and comments explaining 
it.


I'd test with INT_MIN and INT_MAX.

Given that there are no overflows risk with the Euclidian descent, would
it make sense that the int2 version call the int4 implementation?

C modulo operator (%) is a pain because it is not positive remainder (2 % 
-3 == -1 vs 2 % 3 == 2, AFAICR). It does not seem that fixing the sign 
afterwards is the right thing to do. I'd rather turn both arguments 
positive before doing the descent.


Which raises an issue with INT_MIN by the way, which has no positive:-(

Also, the usual approach is to exchange args so that the largest is first, 
because there may be a software emulation if the hardware does not 
implement modulo. At least it was the case with some sparc processors 25 
years ago:-)


See for instance (the int min value is probably not well handled):

  https://svn.cri.ensmp.fr/svn/linear/trunk/src/arithmetique/pgcd.c

Basically, welcome to arithmetic:-)

--
Fabien.

Greatest Common Divisor

2019-12-28 Thread Vik Fearing
I recently came across the need for a gcd function (actually I needed
lcm) and was surprised that we didn't have one.


So here one is, using the basic Euclidean algorithm.  I made one for
smallint, integer, and bigint.

-- 

Vik Fearing  +33 6 46 75 15 36
http://2ndQuadrant.fr PostgreSQL : Expertise, Formation et Support

diff --git a/doc/src/sgml/func.sgml b/doc/src/sgml/func.sgml
index 57a1539506..b2c663cd92 100644
--- a/doc/src/sgml/func.sgml
+++ b/doc/src/sgml/func.sgml
@@ -870,6 +870,19 @@
-43
   
 
+  
+   
+
+ gcd
+
+gcd(a, b)
+   
+   (same as argument types)
+   greatest common divisor
+   gcd(1071, 462)
+   21
+  
+
   

 
diff --git a/src/backend/utils/adt/int.c b/src/backend/utils/adt/int.c
index 04825fc77d..bc74d367bb 100644
--- a/src/backend/utils/adt/int.c
+++ b/src/backend/utils/adt/int.c
@@ -1196,6 +1196,47 @@ int2abs(PG_FUNCTION_ARGS)
 	PG_RETURN_INT16(result);
 }
 
+/* int[24]gcd()
+ * Greatest Common Divisor
+ */
+Datum
+int4gcd(PG_FUNCTION_ARGS)
+{
+	int32		arg1 = PG_GETARG_INT32(0);
+	int32		arg2 = PG_GETARG_INT32(1);
+
+	while (arg2 != 0)
+	{
+		int32	tmp = arg2;
+		arg2 = arg1 % arg2;
+		arg1 = tmp;
+	}
+
+	if (arg1 < 0)
+		arg1 = -arg1;
+
+	PG_RETURN_INT32(arg1);
+}
+
+Datum
+int2gcd(PG_FUNCTION_ARGS)
+{
+	int16		arg1 = PG_GETARG_INT16(0);
+	int16		arg2 = PG_GETARG_INT16(1);
+
+	while (arg2 != 0)
+	{
+		int16	tmp = arg2;
+		arg2 = arg1 % arg2;
+		arg1 = tmp;
+	}
+
+	if (arg1 < 0)
+		arg1 = -arg1;
+
+	PG_RETURN_INT16(arg1);
+}
+
 Datum
 int2larger(PG_FUNCTION_ARGS)
 {
diff --git a/src/backend/utils/adt/int8.c b/src/backend/utils/adt/int8.c
index a40ae40dff..97cbd1443b 100644
--- a/src/backend/utils/adt/int8.c
+++ b/src/backend/utils/adt/int8.c
@@ -667,6 +667,28 @@ int8mod(PG_FUNCTION_ARGS)
 	PG_RETURN_INT64(arg1 % arg2);
 }
 
+/* int8gcd()
+ * Greatest Common Divisor
+ */
+Datum
+int8gcd(PG_FUNCTION_ARGS)
+{
+	int64		arg1 = PG_GETARG_INT64(0);
+	int64		arg2 = PG_GETARG_INT64(1);
+
+	while (arg2 != 0)
+	{
+		int64	tmp = arg2;
+		arg2 = arg1 % arg2;
+		arg1 = tmp;
+	}
+
+	if (arg1 < 0)
+		arg1 = -arg1;
+
+	PG_RETURN_INT64(arg1);
+}
+
 
 Datum
 int8inc(PG_FUNCTION_ARGS)
diff --git a/src/include/catalog/pg_proc.dat b/src/include/catalog/pg_proc.dat
index ac8f64b219..b3057f2cdd 100644
--- a/src/include/catalog/pg_proc.dat
+++ b/src/include/catalog/pg_proc.dat
@@ -10729,4 +10729,15 @@
   proname => 'pg_partition_root', prorettype => 'regclass',
   proargtypes => 'regclass', prosrc => 'pg_partition_root' },
 
+# greatest common divisor
+{ oid => '8463', descr => 'greatest common divisor',
+  proname => 'gcd', prorettype => 'int2', proargtypes => 'int2 int2',
+  prosrc => 'int2gcd' },
+{ oid => '8464', descr => 'greatest common divisor',
+  proname => 'gcd', prorettype => 'int4', proargtypes => 'int4 int4',
+  prosrc => 'int4gcd' },
+{ oid => '8465', descr => 'greatest common divisor',
+  proname => 'gcd', prorettype => 'int8', proargtypes => 'int8 int8',
+  prosrc => 'int8gcd' },
+
 ]
diff --git a/src/test/regress/expected/int2.out b/src/test/regress/expected/int2.out
index 8c255b9e4d..66906a7b03 100644
--- a/src/test/regress/expected/int2.out
+++ b/src/test/regress/expected/int2.out
@@ -306,3 +306,11 @@ FROM (VALUES (-2.5::numeric),
   2.5 |  3
 (7 rows)
 
+-- gcd
+SELECT gcd(a, b), gcd(a, -b), gcd(-a, b), gcd(-a, -b)
+FROM (VALUES (24948, 4914)) AS v(a, b);
+ gcd | gcd | gcd | gcd 
+-+-+-+-
+ 378 | 378 | 378 | 378
+(1 row)
+
diff --git a/src/test/regress/expected/int4.out b/src/test/regress/expected/int4.out
index bda7a8daef..a261964454 100644
--- a/src/test/regress/expected/int4.out
+++ b/src/test/regress/expected/int4.out
@@ -403,3 +403,11 @@ FROM (VALUES (-2.5::numeric),
   2.5 |  3
 (7 rows)
 
+-- gcd
+SELECT gcd(a, b), gcd(a, -b), gcd(-a, b), gcd(-a, -b)
+FROM (VALUES (6186, 6410818)) AS v(a, b);
+ gcd  | gcd  | gcd  | gcd  
+--+--+--+--
+ 1466 | 1466 | 1466 | 1466
+(1 row)
+
diff --git a/src/test/regress/expected/int8.out b/src/test/regress/expected/int8.out
index 8447a28c3d..1a0836998a 100644
--- a/src/test/regress/expected/int8.out
+++ b/src/test/regress/expected/int8.out
@@ -388,6 +388,16 @@ SELECT '' AS five, q1 * 2 AS "twice int4" FROM INT8_TBL;
   | 9135780246913578
 (5 rows)
 
+SELECT q1, q2, gcd(q1, q2) FROM INT8_TBL;
+q1|q2 |   gcd
+--+---+--
+  123 |   456 |3
+  123 |  4567890123456789 |3
+ 4567890123456789 |   123 |3
+ 4567890123456789 |  4567890123456789 | 4567890123456789
+ 4567890123456789 | -4567890123456789 | 4567890123456789
+(5 rows)
+
 -- int8 op int4
 SELECT q1 + 42::int4 AS "8plus4"