from:"mark"

On Wed, Jul 19, 2017 at 10:08 PM, Alvaro Herrera 
wrote:

> So let's step back a bit,
> get a patch that works for the case where the types match on both sides
> of the FK, then we review that patch; if all is well, we can discuss the
> other problem as a stretch goal.

Agreed. This should be a future improvment.

I think the next step should be testing the performnce before/after the
modifiactions.

Re: [HACKERS] Something for the TODO list: deprecating abstime and friends

2017-07-19 Thread Mark Dilger

> On Jul 19, 2017, at 9:53 AM, Robert Haas <robertmh...@gmail.com> wrote:
> 
> On Mon, Jul 17, 2017 at 6:12 PM, Tom Lane <t...@sss.pgh.pa.us> wrote:
>> So, thinking about how that would actually work ... the thing to do in
>> order to preserve existing on-disk values is to alternate between signed
>> and unsigned interpretations of abstimes.  That is, right now, abstime
>> is signed with origin 1970.  The conversion I'm arguing we should make
>> real soon now is to unsigned with origin 1970.  If the project lives
>> so long, in circa 70 years we'd switch it to signed with origin 2106.
>> Yadda yadda.
> 
> Doesn't this plan amount to breaking pg_upgrade compatibility and
> hoping that nobody notice?  Existing values will be silently
> reinterpreted according to the new rules.  If we had two actually
> separate types and let people convert columns from the old type to the
> new type with just a validation scan, that would be engineering at the
> level of quality that I've come to associate with this project.  

This is what I have done in my fork.  I repurposed the type as "secondstamp"
since I think of it as a timestamp down to second precision (truncating
fractional seconds.)  I changed the Oids assigned to the catalog entries
associated with the type, and considered myself somewhat immune to
the project dropping the abstime type, which the documentation warned
would happen eventually.

> If
> this type is so marginal that we don't want to do that kind of work,
> then I think we should just rip it out; that doesn't preclude someone
> maintaining it in their own fork, or even adding it back as a new type
> in a contrib module with a #define for the base year.  Silently
> changing the interpretation of the same data in the core code, though,
> seems both far too clever and not clever enough.

I would be happy to contribute the "secondstamp" type as part of a patch
that removes the abstime type.  I can add a catalog table that holds the
epoch, and add documentation for the type stating that every time the
epoch changes, their data will be automatically reinterpreted, and as such
they should only use the datatype if that is ok. 

Under this plan, users with abstime columns who upgrade to postgres 11
will not have an easy time, because the type will be removed.  But that is
the same and no worse than what they would get if we just remove the
abstime type in postgres 11 without any replacement. 

I'm not implementing any of this yet, as I expect further objections.

Thoughts?

mark

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Something for the TODO list: deprecating abstime and friends

2017-07-19 Thread Mark Dilger

> On Jul 19, 2017, at 10:23 AM, Robert Haas <robertmh...@gmail.com> wrote:
> 
> On Wed, Jul 19, 2017 at 1:12 PM, Tom Lane <t...@sss.pgh.pa.us> wrote:
>>> Doesn't this plan amount to breaking pg_upgrade compatibility and
>>> hoping that nobody notice?
>> 
>> Well, what we'd need to do is document that the type is only meant to be
>> used to store dates within say +/- 30 years from current time.  As long
>> as people adhere to that use-case, the proposal would work conveniently
>> long into the future ...
> 
> Typically, when you try to store an out-of-range value in PostgreSQL,
> you get an ERROR, and that's one of the selling points of PostgreSQL.
> PostgreSQL users regularly beat up other projects for, say, allowing
> -00-00 to be considered a valid date, or any similar perceived
> laxity in enforcing data consistency.  I don't like the idea that we
> can just deviate from that principle whenever adhering to it is too
> much work.

I don't see the relevance of this statement.  I am not planning to allow
abstime data that is outside the range of the new epoch.  Attempts to
cast strings like '1962-07-07 01:02:03' to abstime would draw an
exception with a suitably informative message.

Now, the objection to having on-disk data automagically change meaning
is concerning, and I was particularly unsatisfied with the idea that
NOSTART_ABSTIME and NOEND_ABSTIME would suddenly be
reinterpreted as seconds in the year 2068, which is why I made mention
of it upthread.  I was less concerned with dates prior to 1970 being
reinterpreted, though it is hard to justify why that bothers me less.

>> I'd definitely be on board with just dropping the type altogether despite
>> Mark's concern.
> 
> Then I vote for that option.

I was somewhat surprised when Tom was onboard with the idea of keeping
abstime around (for my benefit).  My original post was in response to his
statement that, right offhand, he couldn't think of any use for abstime that
wasn't handled better by timestamptz (paraphrase), and so I said that
improving storage efficiency was such a use.  I maintain that position.  The
abstime type is a good and useful type, and we will lose that use when we
discard it.  Those who feel otherwise might like to also argue for dropping
float4 because float8 does all the same stuff better.

mark

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] GSoC 2017: Foreign Key Arrays

On Wed, Jul 19, 2017 at 7:28 PM, Robert Haas  wrote:

> Why do we have to solve that limitation?


Since the regress test labled element_foreing_key fails now that I made the
RI queries utilise @(anyarray, anyelement), that means it's not functioning
as it is meant to be.

Re: [HACKERS] GSoC 2017: Foreign Key Arrays

*To summarise,* the options we have to solve the limitation of the
@>(anyarray , anyelement) where it produces the following error: operator
does not exist: integer[] @> smallint

*Option 1: *Multiple Operators
Have separate operators for every combination of datatypes instead of a
single polymorphic definition (i.e int4[] @>> int8, int4[] @>> int4, int4[]
@>> int2, int4[] @>> numeric.)

Drawback: High maintenance.


*Option 2: *Explicit casting
Where we compare the datatype of the 2 operands and cast with the
appropriate datatype

Drawback: figuring out the appropriate cast may require considerable
computation


*Option 3:* Unsafe Polymorphic datatypes
This a little out there. But since @>(anyarray, anyelement) have to resolve
to the same datatype. How about defining new datatypes without this
constraint? Where we handle the datatypes ourselves? It would ve something
like @>(unsafeAnyarray, unsafeAnyelement).

Drawback: a lot of defensive programming has to be implemented to guard
against any exception.


*Another thing*
Until this is settled, another thing I have to go through is performance
testing. To provide evidence that all we did actually enhances the
performance of the RI checks. How can I go about this?

Best Regards,
Mark Rofail

Re: [HACKERS] GSoC 2017: Foreign Key Arrays

On Tue, Jul 18, 2017 at 11:14 PM, Alvaro Herrera <alvhe...@2ndquadrant.com>
wrote:
>
> Why did we add an operator and not a support
> procedure?

I thought the support procedures were constant within an opclass. They
implement the mandotary function required of an opclass. I don't see why we
would need to implement new ones since they already deal with the lefthand
operand which is the refrencing coloumn and is always an array so anyarray
would suffice.

Also the support procedure don't interact with the left and right operands
simultanously. And we want to target the combinations of  int4[] @>> int8,
int4[] @>> int4, int4[] @>> int2, int4[] @>> numeric.

So I think implementing operators is the way to go.

Best Regards,
Mark Rofail.

Re: [HACKERS] Something for the TODO list: deprecating abstime and friends

2017-07-18 Thread Mark Dilger


> On Jul 18, 2017, at 9:13 PM, Mark Dilger <hornschnor...@gmail.com> wrote:
> 
>> 
>> On Jul 17, 2017, at 2:34 PM, Mark Dilger <hornschnor...@gmail.com> wrote:
>> 
>>> 
>>> On Jul 17, 2017, at 2:14 PM, Tom Lane <t...@sss.pgh.pa.us> wrote:
>>> 
>>> Mark Dilger <hornschnor...@gmail.com> writes:
>>>>> On Jul 17, 2017, at 12:54 PM, Mark Dilger <hornschnor...@gmail.com> wrote:
>>>> On Jul 15, 2017, at 3:00 PM, Tom Lane <t...@sss.pgh.pa.us> wrote:
>>>>>> The types abstime, reltime, and tinterval need to go away, or be
>>>>>> reimplemented, sometime well before 2038 when they will overflow.
>>> 
>>>>> These types provide a 4-byte datatype for storing real-world second
>>>>> precision timestamps, as occur in many log files.  Forcing people to
>>>>> switch to timestamp or timestamptz will incur a 4 byte per row
>>>>> penalty.  In my own builds, I have changed the epoch on these so
>>>>> they won't wrap until sometime after 2100 C.E.  I see little point in
>>>>> switching to an 8-byte millisecond precision datatype when a perfectly
>>>>> good 4-byte second precision datatype already serves the purpose.
>>> 
>>> Well, if you or somebody is willing to do the legwork, I'd be on board
>>> with a plan that says that every 68 years we redefine the origin of
>>> abstime.  I imagine it could be done so that currently-stored abstime
>>> values retain their present meaning as long as they're not too old.
>>> For example the initial change would toss abstimes before 1970 overboard,
>>> repurposing that range of values as being 2038-2106, but values between
>>> 1970 and 2038 still mean the same as they do today.  If anybody still
>>> cares in circa 2085, we toss 1970-2038 overboard and move the origin
>>> again, lather rinse repeat.
>>> 
>>> But we're already past the point where it would be time to make the
>>> first such switch, if we're gonna do it.  So I'd like to see somebody
>>> step up to the plate sooner not later.
>> 
>> Assuming other members of the community would not object to such
>> a plan, I'd be willing to step up to that plate.  I'll wait a respectable 
>> time,
>> maybe until tomorrow, to allow others to speak up.
> 
> There was not much conversation about this, so I went ahead with what
> I think makes a logical first step.  The attached patch removes the tinterval
> datatype from the sources.
> 
> I intend to remove reltime next, and then make the changes to abstime in
> a third patch.

As predicted, this second patch (which should be applied *after* the prior
tinterval_abatement patch) removes the reltime datatype from the sources.

mark



reltime_abatement.patch.1
Description: Binary data

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Something for the TODO list: deprecating abstime and friends

2017-07-18 Thread Mark Dilger


> On Jul 17, 2017, at 2:34 PM, Mark Dilger <hornschnor...@gmail.com> wrote:
> 
>> 
>> On Jul 17, 2017, at 2:14 PM, Tom Lane <t...@sss.pgh.pa.us> wrote:
>> 
>> Mark Dilger <hornschnor...@gmail.com> writes:
>>>> On Jul 17, 2017, at 12:54 PM, Mark Dilger <hornschnor...@gmail.com> wrote:
>>> On Jul 15, 2017, at 3:00 PM, Tom Lane <t...@sss.pgh.pa.us> wrote:
>>>>> The types abstime, reltime, and tinterval need to go away, or be
>>>>> reimplemented, sometime well before 2038 when they will overflow.
>> 
>>>> These types provide a 4-byte datatype for storing real-world second
>>>> precision timestamps, as occur in many log files.  Forcing people to
>>>> switch to timestamp or timestamptz will incur a 4 byte per row
>>>> penalty.  In my own builds, I have changed the epoch on these so
>>>> they won't wrap until sometime after 2100 C.E.  I see little point in
>>>> switching to an 8-byte millisecond precision datatype when a perfectly
>>>> good 4-byte second precision datatype already serves the purpose.
>> 
>> Well, if you or somebody is willing to do the legwork, I'd be on board
>> with a plan that says that every 68 years we redefine the origin of
>> abstime.  I imagine it could be done so that currently-stored abstime
>> values retain their present meaning as long as they're not too old.
>> For example the initial change would toss abstimes before 1970 overboard,
>> repurposing that range of values as being 2038-2106, but values between
>> 1970 and 2038 still mean the same as they do today.  If anybody still
>> cares in circa 2085, we toss 1970-2038 overboard and move the origin
>> again, lather rinse repeat.
>> 
>> But we're already past the point where it would be time to make the
>> first such switch, if we're gonna do it.  So I'd like to see somebody
>> step up to the plate sooner not later.
> 
> Assuming other members of the community would not object to such
> a plan, I'd be willing to step up to that plate.  I'll wait a respectable 
> time,
> maybe until tomorrow, to allow others to speak up.

There was not much conversation about this, so I went ahead with what
I think makes a logical first step.  The attached patch removes the tinterval
datatype from the sources.

I intend to remove reltime next, and then make the changes to abstime in
a third patch.

mark



tinterval_abatement.patch.1
Description: Binary data



-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] More flexible LDAP auth search filters?

2017-07-18 Thread Mark Cave-Ayland

On 17/07/17 18:08, Magnus Hagander wrote:

> On Mon, Jul 17, 2017 at 6:47 PM, Mark Cave-Ayland
> <mark.cave-ayl...@ilande.co.uk <mailto:mark.cave-ayl...@ilande.co.uk>>
> wrote: 
> Great to hear from you! It has definitely been a while...
> 
> Indeed. You should spend more time on these lists :P

Well I do get the emails, unfortunately it's trying to find the time to
read them all ;)

> > How would that actually work, though? Given the same user in ldap could
> > now potentially match multiple different users in PostgreSQL. Would you
> > then create two accounts for the user, one with the uid as name and one
> > with email as name? Wouldn't that actually cause more issues than it 
> solves?
> 
> Normally what happens for search+bind is that you execute the custom
> filter with substitutions in order to find the entry for your bind DN,
> but you also request the value of ldapsearchattribute (or equivalent) at
> the same time. Say for example you had an entry like this:
> 
> dn: uid=mha,dc=users,dc=hagander,dc=net
> uid: mha
> mail: mag...@hagander.net <mailto:mag...@hagander.net>
> 
> Using the filter "(|(mail=%u)(uid=%u))" would locate the same bind DN
> "uid=mha,dc=users,dc=hagander,dc=net" with either one of your uid or
> email address.
> 
> If the bind is successful then the current user identity should be set
> to the value of the ldapsearchattribute retrieved from the bind DN
> entry, which with the default of "uid" would be "mha". Hence you end up
> with the same user role "mha" regardless of whether a uid or email
> address was entered.
> 
> 
> Right. This is the part that doesn't work for PostgreSQL. Because we
> have already specified the username -- it goes in the startup packet in
> order to pick the correct row from pg_hba.conf.

I don't think that's necessarily going to be an issue here because if
you're specifying a custom LDAP filter then there's a very good chance
that you're delegating access control to information held in the
directory anyway, e.g.

  (&(memberOf=cn=pgusers,dc=groups,dc=hagander,dc=net)(uid=%u))

  ((&(uid=%u)(|(uid=mha)(uid=mark)(uid=thomas)))

In the mail example above when you're using more than one attribute, I
think it's fair enough to simply say in the documentation you must set
user to "all" in pg_hba.conf since it is impossible to know which
attribute is being used to identify the user role until after
authentication.

I should add that personally I don't recommend such setups where the
user can log in using more than one identifier, but there are clients
out there who absolutely will insist on it (think internal vs. external
users) so if the LDAP support is being updated then it's worth exploring
to see if these cases can be supported.

> I guess in theory we could treat it like Kerberos or another one of the
> systems where we get the username from an external entity. But then
> you'd still have to specify the mapping again in pg_ident.conf, and it
> would be a pretty strong break of backwards compatibility.

(goes and glances at the code)

Is there no reason why you couldn't just overwrite port->user_name based
upon ldapsearchattribute at the end of CheckLDAPAuth?

And if this were only enabled when a custom filter were specified then
it shouldn't cause any backwards compatibility issues being a new feature?

> In terms of matching multiple users, all LDAP authentication routines
> I've seen will fail if more than one DN matches the search filter, so
> this nicely handles the cases where either the custom filter is
> incorrect or more than one entry is accidentally matched in the
> directory.
> 
> So do we, in the current implementation. But it's a lot less likely to
> happen in the current implementation, since we do a single equals lookup.

Great, that's absolutely fine :)


ATB,

Mark.


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] GSoC 2017: Foreign Key Arrays

2017-07-18 Thread Mark Rofail

On Tue, 18 Jul 2017 at 7:43 pm, Alexander Korotkov 
wrote:

>  separate operators for int4[] @>> int8, int4[] @>> int4, int4[] @>> int2,
> int4[] @>> numeric.
>

My only comment on the separate operators is its high maintenance.  Any new
datatype introduced a corresponding operator should be created.

Re: [HACKERS] GSoC 2017: Foreign Key Arrays

2017-07-18 Thread Mark Rofail

On Tue, 18 Jul 2017 at 7:43 pm, Alexander Korotkov <aekorot...@gmail.com>
wrote:

> On T upue, Jul 18, 2017 at 2:24 AM, Mark Rofail <markm.rof...@gmail.com>
> wrote:
>
>> On Wed, Jul 12, 2017 at 12:53 AM, Alvaro Herrera <
>> alvhe...@2ndquadrant.com> wrote:
>>>
>>> We have one opclass for each type combination -- int4 to int2, int4 to
>>> int4, int4 to int8, etc.  You just need to add the new strategy to all
>>> the opclasses.
>>
>>
>>  I tried this approach by manually declaring the operator multiple of
>> times in pg_amop.h (src/include/catalog/pg_amop.h)
>>
>> so instead of the polymorphic declaration
>> DATA(insert ( 2745   2277 2283 5 s 6108 2742 0 )); /* anyarray @>>
>> anyelem */
>>
>> multiple declarations were used, for example for int4[] :
>> DATA(insert ( 2745   1007 20 5 s 6108 2742 0 )); /* int4[] @>> int8 */
>> DATA(insert ( 2745   1007 23 5 s 6108 2742 0 )); /* int4[] @>> int4 */
>> DATA(insert ( 2745   1007 21 5 s 6108 2742 0 )); /* int4[] @>> int2 */
>> DATA(insert ( 2745   1007 1700 5 s 6108 2742 0 ));/* int4[] @>> numeric
>> */
>>
>> However, make check produced:
>> could not create unique index "pg_amop_opr_fam_index"
>> Key (amopopr, amoppurpose, amopfamily)=(6108, s, 2745) is duplicated.
>>
>> Am I implementing this the wrong way or do we need to look for another
>> approach?
>>
>
> The problem is that you need to have not only opclass entries for the
> operators, but also operators themselves.  I.e. separate operators for
> int4[] @>> int8, int4[] @>> int4, int4[] @>> int2, int4[] @>> numeric.  You
> tried to add multiple pg_amop rows for single operator and consequently get
> unique index violation.
>
> Alvaro, do you think we need to define all these operators?  I'm not
> sure.  If even we need it, I think
> --
> Alexander Korotkov
> Postgres Professional: http://www.postgrespro.com
> The Russian Postgres Company
>

Re: [HACKERS] Something for the TODO list: deprecating abstime and friends


> On Jul 17, 2017, at 3:12 PM, Tom Lane <t...@sss.pgh.pa.us> wrote:
> 
> Mark Dilger <hornschnor...@gmail.com> writes:
>>> On Jul 17, 2017, at 2:14 PM, Tom Lane <t...@sss.pgh.pa.us> wrote:
>>> Well, if you or somebody is willing to do the legwork, I'd be on board
>>> with a plan that says that every 68 years we redefine the origin of
>>> abstime.  I imagine it could be done so that currently-stored abstime
>>> values retain their present meaning as long as they're not too old.
>>> For example the initial change would toss abstimes before 1970 overboard,
>>> repurposing that range of values as being 2038-2106, but values between
>>> 1970 and 2038 still mean the same as they do today.  If anybody still
>>> cares in circa 2085, we toss 1970-2038 overboard and move the origin
>>> again, lather rinse repeat.
> 
>> Assuming other members of the community would not object to such
>> a plan, I'd be willing to step up to that plate.
> 
> So, thinking about how that would actually work ... the thing to do in
> order to preserve existing on-disk values is to alternate between signed
> and unsigned interpretations of abstimes.  That is, right now, abstime
> is signed with origin 1970.  The conversion I'm arguing we should make
> real soon now is to unsigned with origin 1970.  If the project lives
> so long, in circa 70 years we'd switch it to signed with origin 2106.
> Yadda yadda.
> 
> Now, this should mostly work conveniently, except that we have
> +/-infinity (NOEND_ABSTIME/NOSTART_ABSTIME) to deal with.  Those
> are nicely positioned at the ends of the signed range so that
> abstime_cmp_internal() doesn't need to treat them specially.
> In an unsigned world they'd be smack in the middle of the range.
> We could certainly teach abstime_cmp_internal to special-case them
> and deliver logically correct results, but there's a bigger problem:
> those will correspond to two seconds in January 2038 that will need
> to be representable when the time comes.  Maybe we can make that
> work by defining the values past 2038 as being two seconds less than
> you might otherwise expect, but I bet it's gonna be messy.  It might
> be saner to just desupport +/-infinity for abstime.
> 
> The same goes for INVALID_ABSTIME, which doesn't have any clear
> use-case that I know of, and is certainly not documented.

I use the abstime type in some catalog tables, and if I allowed those
fields to have something equivalent to NULL, which I do not, I would need
INVALID_ABSTIME, since NULL is not allowed in the constant header of a
catalog table.

I wonder if old versions of postgres had catalog tables with abstime used
in this way?  Other than that, I can't think of a reason to use INVALID_ABSTIME.

mark




-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] GSoC 2017: Foreign Key Arrays

2017-07-17 Thread Mark Rofail

On Wed, Jul 12, 2017 at 12:53 AM, Alvaro Herrera 
 wrote:
>
> We have one opclass for each type combination -- int4 to int2, int4 to
> int4, int4 to int8, etc.  You just need to add the new strategy to all
> the opclasses.


 I tried this approach by manually declaring the operator multiple of times
in pg_amop.h (src/include/catalog/pg_amop.h)

so instead of the polymorphic declaration
DATA(insert ( 2745   2277 2283 5 s 6108 2742 0 )); /* anyarray @>> anyelem
*/

multiple declarations were used, for example for int4[] :
DATA(insert ( 2745   1007 20 5 s 6108 2742 0 )); /* int4[] @>> int8 */
DATA(insert ( 2745   1007 23 5 s 6108 2742 0 )); /* int4[] @>> int4 */
DATA(insert ( 2745   1007 21 5 s 6108 2742 0 )); /* int4[] @>> int2 */
DATA(insert ( 2745   1007 1700 5 s 6108 2742 0 ));/* int4[] @>> numeric */

However, make check produced:
could not create unique index "pg_amop_opr_fam_index"
Key (amopopr, amoppurpose, amopfamily)=(6108, s, 2745) is duplicated.

Am I implementing this the wrong way or do we need to look for another
approach?
diff --git a/src/backend/access/gin/ginarrayproc.c b/src/backend/access/gin/ginarrayproc.c
index a5238c3af5..9d6447923d 100644
--- a/src/backend/access/gin/ginarrayproc.c
+++ b/src/backend/access/gin/ginarrayproc.c
@@ -24,6 +24,7 @@
 #define GinContainsStrategy		2
 #define GinContainedStrategy	3
 #define GinEqualStrategy		4
+#define GinContainsElemStrategy	5
 
 
 /*
@@ -43,7 +44,7 @@ ginarrayextract(PG_FUNCTION_ARGS)
 	bool	   *nulls;
 	int			nelems;
 
-	get_typlenbyvalalign(ARR_ELEMTYPE(array),
+	get_typlenbyvalalign(ARR_ELEMTYPE(array),	
 		 , , );
 
 	deconstruct_array(array,
@@ -110,6 +111,11 @@ ginqueryarrayextract(PG_FUNCTION_ARGS)
 		case GinOverlapStrategy:
 			*searchMode = GIN_SEARCH_MODE_DEFAULT;
 			break;
+		case GinContainsElemStrategy:
+			/* only items that match the queried element 
+are considered candidate  */
+			*searchMode = GIN_SEARCH_MODE_DEFAULT;
+			break;
 		case GinContainsStrategy:
 			if (nelems > 0)
 *searchMode = GIN_SEARCH_MODE_DEFAULT;
@@ -171,6 +177,7 @@ ginarrayconsistent(PG_FUNCTION_ARGS)
 }
 			}
 			break;
+		case GinContainsElemStrategy:
 		case GinContainsStrategy:
 			/* result is not lossy */
 			*recheck = false;
@@ -258,7 +265,8 @@ ginarraytriconsistent(PG_FUNCTION_ARGS)
 }
 			}
 			break;
-		case GinContainsStrategy:
+			case GinContainsElemStrategy:
+			case GinContainsStrategy:
 			/* must have all elements in check[] true, and no nulls */
 			res = GIN_TRUE;
 			for (i = 0; i < nkeys; i++)
diff --git a/src/backend/utils/adt/arrayfuncs.c b/src/backend/utils/adt/arrayfuncs.c
index 34dadd6e19..8c9eb0c676 100644
--- a/src/backend/utils/adt/arrayfuncs.c
+++ b/src/backend/utils/adt/arrayfuncs.c
@@ -4232,6 +4232,117 @@ arraycontained(PG_FUNCTION_ARGS)
 	PG_RETURN_BOOL(result);
 }
 
+/*
+ * array_contains_elem : checks an array for a spefific element
+ */
+static bool
+array_contains_elem(AnyArrayType *array, Datum elem, Oid element_type,
+bool element_isnull, Oid collation,	void **fn_extra)
+{
+	Oid 		arr_type = AARR_ELEMTYPE(array);
+	TypeCacheEntry *typentry;
+	int 		nelems;
+	int			typlen;
+	bool		typbyval;
+	char		typalign;
+	int			i;
+	array_iter 	it1;
+	FunctionCallInfoData locfcinfo;
+
+	if (arr_type != element_type)
+		ereport(ERROR,
+(errcode(ERRCODE_DATATYPE_MISMATCH),
+ errmsg("cannot compare different element types")));
+	
+	if (element_isnull)
+		return false;
+		
+	/*
+	 * We arrange to look up the equality function only once per series of
+	 * calls, assuming the element type doesn't change underneath us.  The
+	 * typcache is used so that we have no memory leakage when being used as
+	 * an index support function.
+	 */
+	typentry = (TypeCacheEntry *)*fn_extra;
+	if (typentry == NULL ||
+		typentry->type_id != arr_type)
+	{
+		typentry = lookup_type_cache(arr_type,
+	 TYPECACHE_EQ_OPR_FINFO);
+		if (!OidIsValid(typentry->eq_opr_finfo.fn_oid))
+			ereport(ERROR,
+	(errcode(ERRCODE_UNDEFINED_FUNCTION),
+	 errmsg("could not identify an equality operator for type %s",
+			format_type_be(arr_type;
+		*fn_extra = (void *)typentry;
+	}
+	typlen = typentry->typlen;
+	typbyval = typentry->typbyval;
+	typalign = typentry->typalign;
+
+	/*
+	 * Apply the comparison operator to each pair of array elements.
+	 */
+	InitFunctionCallInfoData(locfcinfo, >eq_opr_finfo, 2,
+			 collation, NULL, NULL);
+
+	/* Loop over source data */
+	nelems = ArrayGetNItems(AARR_NDIM(array), AARR_DIMS(array));
+	array_iter_setup(, array);
+
+	for (i = 0; i < nelems; i++)
+	{
+		Datum elt1;
+		bool isnull;
+		bool oprresult;
+
+		/* Get element, checking for NULL */
+		elt1 = array_iter_next(, , i, typlen, typbyval, typalign);
+
+		/*
+		 * We assume that the comparison operator is strict, so a NULL can't
+		 * match anything.  XXX this diverges from the "NULL=NULL" behavior of
+		 * array_eq, should we act like that?
+		 */
+		if (isnull)
+			continue;
+
+		/*
+			*

Re: [HACKERS] Something for the TODO list: deprecating abstime and friends


> On Jul 17, 2017, at 3:56 PM, Tom Lane <t...@sss.pgh.pa.us> wrote:
> 
> Mark Dilger <hornschnor...@gmail.com> writes:
>> On Jul 17, 2017, at 3:12 PM, Tom Lane <t...@sss.pgh.pa.us> wrote:
>>> Now, this should mostly work conveniently, except that we have
>>> +/-infinity (NOEND_ABSTIME/NOSTART_ABSTIME) to deal with ... It might
>>> be saner to just desupport +/-infinity for abstime.
> 
>> I don't use those values, so it is no matter to me if we desupport them.  It
>> seems a bit pointless, though, because we still have to handle legacy
>> values that we encounter.  I assume some folks will have those values in
>> their tables when they upgrade.
> 
> Well, some folks will also have pre-1970 dates in their tables, no?
> We're just blowing those off.  They'll print out as some post-2038
> date or other, and too bad.
> 
> Basically, the direction this is going in is that abstime will become
> an officially supported type, but its range of supported values is "not
> too many decades either way from now".  If you are using it to store
> very old dates then You're Doing It Wrong, and eventually you'll get
> bitten.  Given that contract, I don't see a place for +/-infinity.

Works for me.

mark



-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Something for the TODO list: deprecating abstime and friends


> On Jul 17, 2017, at 3:12 PM, Tom Lane <t...@sss.pgh.pa.us> wrote:
> 
> Mark Dilger <hornschnor...@gmail.com> writes:
>>> On Jul 17, 2017, at 2:14 PM, Tom Lane <t...@sss.pgh.pa.us> wrote:
>>> Well, if you or somebody is willing to do the legwork, I'd be on board
>>> with a plan that says that every 68 years we redefine the origin of
>>> abstime.  I imagine it could be done so that currently-stored abstime
>>> values retain their present meaning as long as they're not too old.
>>> For example the initial change would toss abstimes before 1970 overboard,
>>> repurposing that range of values as being 2038-2106, but values between
>>> 1970 and 2038 still mean the same as they do today.  If anybody still
>>> cares in circa 2085, we toss 1970-2038 overboard and move the origin
>>> again, lather rinse repeat.
> 
>> Assuming other members of the community would not object to such
>> a plan, I'd be willing to step up to that plate.
> 
> So, thinking about how that would actually work ... the thing to do in
> order to preserve existing on-disk values is to alternate between signed
> and unsigned interpretations of abstimes.  That is, right now, abstime
> is signed with origin 1970.  The conversion I'm arguing we should make
> real soon now is to unsigned with origin 1970.  If the project lives
> so long, in circa 70 years we'd switch it to signed with origin 2106.
> Yadda yadda.

Right, I already had the same idea.  That's not how I am doing it in my
fork currently, but that's what you would want to do to support any values
already stored somewhere.

> Now, this should mostly work conveniently, except that we have
> +/-infinity (NOEND_ABSTIME/NOSTART_ABSTIME) to deal with.  Those
> are nicely positioned at the ends of the signed range so that
> abstime_cmp_internal() doesn't need to treat them specially.
> In an unsigned world they'd be smack in the middle of the range.
> We could certainly teach abstime_cmp_internal to special-case them
> and deliver logically correct results, but there's a bigger problem:
> those will correspond to two seconds in January 2038 that will need
> to be representable when the time comes.  Maybe we can make that
> work by defining the values past 2038 as being two seconds less than
> you might otherwise expect, but I bet it's gonna be messy.  It might
> be saner to just desupport +/-infinity for abstime.

I don't use those values, so it is no matter to me if we desupport them.  It
seems a bit pointless, though, because we still have to handle legacy
values that we encounter.  I assume some folks will have those values in
their tables when they upgrade.

> The same goes for INVALID_ABSTIME, which doesn't have any clear
> use-case that I know of, and is certainly not documented.

Likewise, I don't know how to desupport this while still supporting legacy
tables.

mark



-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Something for the TODO list: deprecating abstime and friends


> On Jul 17, 2017, at 2:14 PM, Tom Lane <t...@sss.pgh.pa.us> wrote:
> 
> Mark Dilger <hornschnor...@gmail.com> writes:
>>> On Jul 17, 2017, at 12:54 PM, Mark Dilger <hornschnor...@gmail.com> wrote:
>> On Jul 15, 2017, at 3:00 PM, Tom Lane <t...@sss.pgh.pa.us> wrote:
>>>> The types abstime, reltime, and tinterval need to go away, or be
>>>> reimplemented, sometime well before 2038 when they will overflow.
> 
>>> These types provide a 4-byte datatype for storing real-world second
>>> precision timestamps, as occur in many log files.  Forcing people to
>>> switch to timestamp or timestamptz will incur a 4 byte per row
>>> penalty.  In my own builds, I have changed the epoch on these so
>>> they won't wrap until sometime after 2100 C.E.  I see little point in
>>> switching to an 8-byte millisecond precision datatype when a perfectly
>>> good 4-byte second precision datatype already serves the purpose.
> 
> Well, if you or somebody is willing to do the legwork, I'd be on board
> with a plan that says that every 68 years we redefine the origin of
> abstime.  I imagine it could be done so that currently-stored abstime
> values retain their present meaning as long as they're not too old.
> For example the initial change would toss abstimes before 1970 overboard,
> repurposing that range of values as being 2038-2106, but values between
> 1970 and 2038 still mean the same as they do today.  If anybody still
> cares in circa 2085, we toss 1970-2038 overboard and move the origin
> again, lather rinse repeat.
> 
> But we're already past the point where it would be time to make the
> first such switch, if we're gonna do it.  So I'd like to see somebody
> step up to the plate sooner not later.

Assuming other members of the community would not object to such
a plan, I'd be willing to step up to that plate.  I'll wait a respectable time,
maybe until tomorrow, to allow others to speak up.

> I'd be inclined to toss reltime and tinterval overboard in any case.

Yeah, I don't use those either, preferring to cast abstime to timestamp
(or timestamptz) prior to doing any math on them.

> 
>> Oh, and if you could be so kind, please remove them in a commit that
>> does nothing else.  It's much easier to skip the commit that way.
> 
> We doubtless would do that, but you're fooling yourself if you imagine
> that you can maintain such a fork for long while still tracking the
> community version otherwise.  Once those hand-assigned OIDs are free
> they'll soon be absorbed by future feature patches, and then you'll
> have enormous merge problems.

Oh, not so much.  Knowing that they were going to be deprecated, I
already moved the Oids for these to something higher than 1.  In
my own builds, the Oid generator does not assign Oids lower than 65536,
which leaves room for me to assign in the 1-65535 range without
merge headaches.  It would, however, be simpler if the types did not
go away, as that would cause minor merge headaches elsewhere, such
as in the regression tests.

mark





-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Something for the TODO list: deprecating abstime and friends


> On Jul 17, 2017, at 12:54 PM, Mark Dilger <hornschnor...@gmail.com> wrote:
> 
> 
>> On Jul 15, 2017, at 3:00 PM, Tom Lane <t...@sss.pgh.pa.us> wrote:
>> 
>> The types abstime, reltime, and tinterval need to go away, or be
>> reimplemented, sometime well before 2038 when they will overflow.
>> It's not too soon to start having a plan for that, especially seeing
>> that it seems to take a decade or more for us to actually get rid
>> of anything we've deprecated.
>> 
>> Right offhand, I don't think there is any functionality in these
>> types that isn't handled as well or better by timestamptz, interval,
>> and tstzrange respectively.  
> 
> These types provide a 4-byte datatype for storing real-world second
> precision timestamps, as occur in many log files.  Forcing people to
> switch to timestamp or timestamptz will incur a 4 byte per row
> penalty.  In my own builds, I have changed the epoch on these so
> they won't wrap until sometime after 2100 C.E.  I see little point in
> switching to an 8-byte millisecond precision datatype when a perfectly
> good 4-byte second precision datatype already serves the purpose.
> 
> That said, I am fully aware that these are deprecated and expect you
> will remove them, at which time I'll have to keep them in my tree
> and politely refuse to merge in your change which removes them.

Oh, and if you could be so kind, please remove them in a commit that
does nothing else.  It's much easier to skip the commit that way.

Thanks!

mark



-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Something for the TODO list: deprecating abstime and friends

> On Jul 15, 2017, at 3:00 PM, Tom Lane <t...@sss.pgh.pa.us> wrote:
> 
> The types abstime, reltime, and tinterval need to go away, or be
> reimplemented, sometime well before 2038 when they will overflow.
> It's not too soon to start having a plan for that, especially seeing
> that it seems to take a decade or more for us to actually get rid
> of anything we've deprecated.
> 
> Right offhand, I don't think there is any functionality in these
> types that isn't handled as well or better by timestamptz, interval,
> and tstzrange respectively.  

These types provide a 4-byte datatype for storing real-world second
precision timestamps, as occur in many log files.  Forcing people to
switch to timestamp or timestamptz will incur a 4 byte per row
penalty.  In my own builds, I have changed the epoch on these so
they won't wrap until sometime after 2100 C.E.  I see little point in
switching to an 8-byte millisecond precision datatype when a perfectly
good 4-byte second precision datatype already serves the purpose.

That said, I am fully aware that these are deprecated and expect you
will remove them, at which time I'll have to keep them in my tree
and politely refuse to merge in your change which removes them.

mark

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] More flexible LDAP auth search filters?

2017-07-17 Thread Mark Cave-Ayland

On 17/07/17 13:09, Magnus Hagander wrote:

Hi Magnus,

Great to hear from you! It has definitely been a while...

> Generally you find that you will be given the option to set the
> attribute for the default search filter of the form
> "(attribute=username)" which defaults to uid for UNIX-type systems and
> sAMAccountName for AD. However there is always the ability to specify a
> custom filter where the user is substituted via e.g. %u to cover all the
> other use-cases.
> 
> Right, but that's something we do already today. It just defaults to
> uid, but it's easy to change. 

Yes, I think that bit is fine as long as the default can be overridden.
There's always a choice as to whether the defaults favour a POSIX-based
LDAP or AD environment but that happens with all installations so it's
not a big deal.

> As an example, I don't know if anyone would actually do this with
> PostgreSQL but I've been asked on multiple occasions to configure
> software so that users should be allowed to log in with either their
> email address or username which is easily done with a custom LDAP filter
> like "(|(mail=%u)(uid=%u))".
> 
> 
> How would that actually work, though? Given the same user in ldap could
> now potentially match multiple different users in PostgreSQL. Would you
> then create two accounts for the user, one with the uid as name and one
> with email as name? Wouldn't that actually cause more issues than it solves?

Normally what happens for search+bind is that you execute the custom
filter with substitutions in order to find the entry for your bind DN,
but you also request the value of ldapsearchattribute (or equivalent) at
the same time. Say for example you had an entry like this:

dn: uid=mha,dc=users,dc=hagander,dc=net
uid: mha
mail: mag...@hagander.net

Using the filter "(|(mail=%u)(uid=%u))" would locate the same bind DN
"uid=mha,dc=users,dc=hagander,dc=net" with either one of your uid or
email address.

If the bind is successful then the current user identity should be set
to the value of the ldapsearchattribute retrieved from the bind DN
entry, which with the default of "uid" would be "mha". Hence you end up
with the same user role "mha" regardless of whether a uid or email
address was entered.

In terms of matching multiple users, all LDAP authentication routines
I've seen will fail if more than one DN matches the search filter, so
this nicely handles the cases where either the custom filter is
incorrect or more than one entry is accidentally matched in the directory.

ATB,

Mark.

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] More flexible LDAP auth search filters?

2017-07-16 Thread Mark Cave-Ayland

On 17/07/17 00:14, Stephen Frost wrote:

>> If it helps, we normally recommend that clients use ldaps for both AD
>> and UNIX environments, although this can be trickier from an
>> administrative perspective in AD environments because it can require
>> changes to the Windows firewall and certificate installation.
> 
> LDAPS is better than straight LDAP, of course, but it still doesn't
> address the issue that the password is sent to the server, which both
> SCRAM and Kerberos do and is why AD environments use Kerberos for
> authentication, and why everything in an AD environment also should use
> Kerberos.
> 
> Using Kerberos should also avoid the need to hack the Windows firewall
> or deal with certificate installation.  In an AD environment, it's
> actually pretty straight-forward to add a PG server too.  Further, in my
> experience at least, there's been other changes recommended by Microsoft
> that prevent using LDAP for auth because it's insecure.

Oh sure - I'm not questioning that Kerberos is a far better choice in
pure AD environments, it's just that I spend the majority of my time in
mixed-mode environments where Windows is very much in the minority.

In my experience LDAP is often implemented badly; for example the
majority of software still uses simple binds (i.e. plain logins) rather
than using SASL binds which support a whole range of better
authentication methods (e.g. GSSAPI, and even DIGEST-MD5 has been
mandatory for v3 and is implemented on AD).

And yes, while better authentication mechanisms do exist, I find that
all too often most software packages claim LDAP support rather than
Kerberos, and even then it is often limited to LDAP simple binds without
ldaps support.

ATB,

Mark.

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] More flexible LDAP auth search filters?

2017-07-16 Thread Mark Cave-Ayland

On 16/07/17 23:26, Thomas Munro wrote:

> Thank you very much for this feedback and example, which I used in the
> documentation in the patch.  I see similar examples in the
> documentation for other things on the web.
> 
> I'll leave it up to Magnus and Stephen to duke it out over whether we
> want to encourage LDAP usage, extend documentation to warn about
> cleartext passwords with certain LDAP implementations or
> configurations, etc etc.  I'll add this patch to the commitfest and
> get some popcorn.

If it helps, we normally recommend that clients use ldaps for both AD
and UNIX environments, although this can be trickier from an
administrative perspective in AD environments because it can require
changes to the Windows firewall and certificate installation.

Whilst OpenLDAP will support ldap+starttls you can end up with some
clients with starttls either disabled or misconfigured sending plaintext
passwords over the wire regardless, so it's generally easiest to
firewall ldap port 389 at the edge of the trusted VLAN so that only
ldaps port 636 connections make it out onto the untrusted network
hosting the local AD/OpenLDAP server.

ATB,

Mark.

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] More flexible LDAP auth search filters?

2017-07-16 Thread Mark Cave-Ayland

On 16/07/17 00:08, Thomas Munro wrote:

> On Fri, Jul 14, 2017 at 11:04 PM, Magnus Hagander <mag...@hagander.net> wrote:
>> On Thu, Jul 13, 2017 at 9:31 AM, Thomas Munro
>> <thomas.mu...@enterprisedb.com> wrote:
>>> A post on planet.postgresql.org today reminded me that a colleague had
>>> asked me to post this POC patch here for discussion.  It allows custom
>>> filters with ldapsearchprefix and ldapsearchsuffix.  Another approach
>>> might be to take a filter pattern with "%USERNAME%" or whatever in it.
>>> There's an existing precedent for the prefix and suffix approach, but
>>> on the other hand a pattern approach would allow filters where the
>>> username is inserted more than once.
>>
>>
>> Do we even need prefix/suffix? If we just make it "ldapsearchpattern", then
>> you could have something like:
>>
>> ldapsearchattribute="uid"
>> ldapsearchfilter="|(memberof=cn=Paris DBA Team)(memberof=cn=Tokyo DBA Team)"
>>
>> We could then always to substitution of the kind:
>> (&(attr=)())
>>
>> which would in this case give:
>> (&(uid=mha)(|(memberof=cn=Paris DBA Team)(memberof=cn=Tokyo DBA Team)))
>>
>>
>> Basically we'd always AND together the username lookup with the additional
>> filter.
> 
> Ok, so we have 3 ideas put forward:
> 
> 1.  Wrap username with ldapsearchprefix ldapsearchsuffix to build
> filter (as implemented by POC patch)
> 2.  Optionally AND ldapsearchfilter with the existing
> ldapsearchattribute-based filter (Magnus's proposal)
> 3.  Pattern-based ldapsearchfilter so that %USER% is replaced with
> username (my other suggestion)
> 
> The main argument for approach 1 is that it follows the style of the
> bind-only mode.
> 
> With idea 2, I wonder if there are some more general kinds of things
> that people might want to do that that wouldn't be possible because it
> has to include (attribute=user)... perhaps something involving a
> substring or other transformation functions (but I'm no LDAP expert,
> that may not make sense).
> 
> With idea 3 it would allow "(|(foo=%USER%)(bar=%USER%))", though I
> don't know if any such multiple-mention filters would ever make sense
> in a sane LDAP configuration.
> 
> Any other views from LDAP-users?

I've spent quite a bit of time integrating various bits of
non-PostgreSQL software to LDAP and in my experience option 3 tends to
be the standard.

Generally you find that you will be given the option to set the
attribute for the default search filter of the form
"(attribute=username)" which defaults to uid for UNIX-type systems and
sAMAccountName for AD. However there is always the ability to specify a
custom filter where the user is substituted via e.g. %u to cover all the
other use-cases.

As an example, I don't know if anyone would actually do this with
PostgreSQL but I've been asked on multiple occasions to configure
software so that users should be allowed to log in with either their
email address or username which is easily done with a custom LDAP filter
like "(|(mail=%u)(uid=%u))".


ATB,

Mark.


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] New partitioning - some feedback

2017-07-16 Thread Mark Kirkwood


On 16/07/17 05:24, David Fetter wrote:


On Fri, Jul 14, 2017 at 09:49:25PM -0500, Robert Haas wrote:

On Mon, Jul 10, 2017 at 5:46 PM, David Fetter <da...@fetter.org> wrote:

With utmost respect, it's less messy than adding '!' to the already
way too random and mysterious syntax of psql's \ commands.  What
should '\det!' mean?  What about '\dT!'?

Since \det lists foreign tables, \det! would list foreign tables even
if they are partitions.  Plain \det would show only the ones that are
not partitions.

\dT! wouldn't be meaningful, since \dT lists data types and data types
can't be partitions.  If you're trying to conjure up a rule that every
\d command must accept the same set of modifiers, a quick
look at the output of \? and a little experimentation will quickly
show you that neither S nor + apply to all command types, so I see no
reason why that would need to be true for a new modifier either.

TBH, I think we should just leave this well enough alone.  We're
post-beta2 now, there's no clear consensus on what to do here, and
there will be very little opportunity for users to give us feedback if
we stick a change into an August beta3 before a September final
release.

I think a new modifier would be too rushed at this stage, but there's
no reason to throw out the progress on \d vs \dt.




+1

And similarly, there seemed to be a reasonably clear push to label the 
'partitions' as such.


regards

Mark


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] GSoC 2017: Foreign Key Arrays

2017-07-14 Thread Mark Rofail

On Wed, Jul 12, 2017 at 2:30 PM, Mark Rofail <markm.rof...@gmail.com> wrote:

> On Wed, Jul 12, 2017 at 12:53 AM, Alvaro Herrera <alvhe...@2ndquadrant.com
> > wrote:
>>
>> We have one opclass for each type combination -- int4 to int2, int4 to
>> int4, int4 to int8, etc.  You just need to add the new strategy to all
>> the opclasses.
>>
>
> Can you clarify this solution ? I think another solution would be external
> casting
>
>>
>> If external casting is to be used. If for example the two types in
question are smallint and integer. Would a function get_common_type(Oid
leftopr, Oid rightopr) be useful ?, that given the two types return the
"common" type between the two in this case integer.

Best Regards,
 Mark Rofail

Re: [HACKERS] GSoC 2017: Foreign Key Arrays

2017-07-12 Thread Mark Rofail

On Wed, Jul 12, 2017 at 12:53 AM, Alvaro Herrera <alvhe...@2ndquadrant.com>
wrote:
>
> We have one opclass for each type combination -- int4 to int2, int4 to
> int4, int4 to int8, etc.  You just need to add the new strategy to all
> the opclasses.
>

Can you clarify this solution ? I think another solution would be external
casting

BTW now that we've gone through this a little further, it's starting to
> look like a mistake to me to use the same @> operator for (anyarray,
> anyelement) than we use for (anyarray, anyarray).


I agree. Changed to @>>

Best Regards,
Mark Rofail

Re: [HACKERS] GSoC 2017: Foreign Key Arrays

2017-07-11 Thread Mark Rofail

here are the modifications to ri_triggers.c

On Wed, Jul 12, 2017 at 12:26 AM, Mark Rofail <markm.rof...@gmail.com>
wrote:
>
> *What I did *
>
>- now the RI checks utilise the @>(anyarray, anyelement)
>
> Best Regards,
> Mark Rofail
>
diff --git a/src/backend/utils/adt/ri_triggers.c b/src/backend/utils/adt/ri_triggers.c
index 3a25ba52f3..2d2b8e6a4f 100644
--- a/src/backend/utils/adt/ri_triggers.c
+++ b/src/backend/utils/adt/ri_triggers.c
@@ -2650,7 +2650,7 @@ quoteRelationName(char *buffer, Relation rel)
  * ri_GenerateQual --- generate a WHERE clause equating two variables
  *
  * The idea is to append " sep leftop op rightop" to buf, or if fkreftype is
- * FKCONSTR_REF_EACH_ELEMENT, append " sep leftop op ANY(rightop)" to buf.
+ * FKCONSTR_REF_EACH_ELEMENT, append " sep leftop <@ rightop" to buf.
  *
  * The complexity comes from needing to be sure that the parser will select
  * the desired operator.  We always name the operator using
@@ -2694,21 +2694,34 @@ ri_GenerateQual(StringInfo buf,
  	else
  		oprright = operform->oprright;
  
-	appendStringInfo(buf, " %s %s", sep, leftop);
-	if (leftoptype != operform->oprleft)
-		ri_add_cast_to(buf, operform->oprleft);
- 
- 	appendStringInfo(buf, " OPERATOR(%s.%s) ",
+ 	if (fkreftype == FKCONSTR_REF_EACH_ELEMENT){
+		appendStringInfo(buf, " %s %s", sep, rightop);
+
+		if (rightoptype != oprright)
+ 			ri_add_cast_to(buf, oprright);
+
+		appendStringInfo(buf, " @> ");
+
+		appendStringInfoString(buf, leftop);
+
+		if (leftoptype != operform->oprleft)
+			ri_add_cast_to(buf, operform->oprleft);
+	 }	
+	else{
+		appendStringInfo(buf, " %s %s", sep, leftop);
+
+		if (leftoptype != operform->oprleft)
+			ri_add_cast_to(buf, operform->oprleft);
+
+		appendStringInfo(buf, " OPERATOR(%s.%s) ",
  	 quote_identifier(nspname), oprname);
- 
- 	if (fkreftype == FKCONSTR_REF_EACH_ELEMENT)
- 		appendStringInfoString(buf, "ANY (");
- 	appendStringInfoString(buf, rightop);
- 	if (rightoptype != oprright)
- 		ri_add_cast_to(buf, oprright);
- 	if (fkreftype == FKCONSTR_REF_EACH_ELEMENT)
- 		appendStringInfoChar(buf, ')');
 
+		appendStringInfoString(buf, rightop);
+
+		if (rightoptype != oprright)
+ 			ri_add_cast_to(buf, oprright);
+	}
+	
 	ReleaseSysCache(opertup);
 }
 

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] GSoC 2017: Foreign Key Arrays

2017-07-11 Thread Mark Rofail

On Sun, Jul 9, 2017 at 7:42 PM, Alexander Korotkov <aekorot...@gmail.com>
wrote:

> We may document that GIN index is required to accelerate RI queries for
> array FKs.  And users are encouraged to manually define them.
> It's also possible to define new option when index on referencing
> column(s) would be created automatically.  But I think this option should
> work the same way for regular FKs and array FKs.
>

I just thought because GIN index is suited for composite elements, it would
be appropriate for array FKs.

So we should leave it to the user ? I think tht would be fine too.

*What I did *

   - now the RI checks utilise the @>(anyarray, anyelement)
  - however there's a small problem:
  operator does not exist: integer[] @> smallint
  I assume that external casting would be required here. But how can I
  downcast smallint to integer or interger to numeric automatically ?

*What I plan to do*

   - work on the above mentioned buy/limitation
   - otherwise, I think this concludes limitation #5

fatal performance issues.  If you issue any UPDATE or DELETE against the PK
> table, you get a query like this for checking to see if the RI constraint
> would be violated:

SELECT 1 FROM ONLY fktable x WHERE $1 = ANY (fkcol) FOR SHARE OF x;.

or is there anything remaining ?

Best Regards,
Mark Rofail

Re: [HACKERS] GSoC 2017: Foreign Key Arrays

2017-07-09 Thread Mark Rofail

On Sun, Jul 9, 2017 at 2:38 AM, Alexander Korotkov 
wrote:

> Could you, please, specify idea of what you're implementing in more
> detail?
>

Ultimatley we would like an indexed scan instead of a sequential scan, so I
thought we needed to index the FK array columns first.

Re: [HACKERS] GSoC 2017: Foreign Key Arrays

2017-07-08 Thread Mark Rofail

* What I am working on*

   - since we want to create an index on the referencing column, I am
   working on firing a 'CREATE INDEX' query programatically right after the
   'CREATE TABLE' query
   - The problem I ran into is how to specify my Strategy (
  GinContainsElemStrategy) within the CREATE INDEX query. For
example: CREATE
  INDEX ON fktable USING gin (fkcolumn array_ops)
  Where does the strategy number fit?
  - The patch is attached here, is the approach I took to creating an
  index programmatically, correct?

Best Regard,
Mark Rofail
diff --git a/src/backend/commands/tablecmds.c b/src/backend/commands/tablecmds.c
index dc18fd1eae..085b63aa98 100644
--- a/src/backend/commands/tablecmds.c
+++ b/src/backend/commands/tablecmds.c
@@ -7139,6 +7139,31 @@ ATAddForeignKeyConstraint(AlteredTableInfo *tab, Relation rel,
 			ereport(ERROR,
 	(errcode(ERRCODE_INVALID_FOREIGN_KEY),
 	 errmsg("array foreign keys support only NO ACTION and RESTRICT actions")));
+
+		IndexStmt *stmt = makeNode(IndexStmt);
+		stmt->unique = false; /* is index unique? Nope, should allow duplicates*/
+		stmt->concurrent = false; /* should this be a concurrent index build? we want 
+	to lock out writes on the table until it's done. */
+		stmt->idxname = NULL; 		/* let the idxname be generated */ 
+		stmt->relation = /* relation name */;
+		stmt->accessMethod = "gin";	/* name of access method: GIN */
+		stmt->indexParams = /* column name + */"array_ops";
+		stmt->options = NULL;
+		stmt->tableSpace = NULL; 	/* NULL for default */
+		stmt->whereClause = NULL;
+		stmt->excludeOpNames = NIL;
+		stmt->idxcomment = NULL;
+		stmt->indexOid = InvalidOid;
+		stmt->oldNode = InvalidOid; /* relfilenode of existing storage, if any: None*/
+		stmt->primary = false; 		/* is index a primary key? Nope */
+		stmt->isconstraint = false; /* is it for a pkey/unique constraint? Nope */
+		stmt->deferrable = false;
+		stmt->initdeferred = false;
+		stmt->transformed = false;
+		stmt->if_not_exists = false; /* just do nothing if index already exists? Nope 
+	(this shouldn't happen)*/
+
+		ATExecAddIndex(tab, rel, stmt, true, lockmode);
 	}
 
  	/*
diff --git a/src/backend/utils/adt/ri_triggers.c b/src/backend/utils/adt/ri_triggers.c
index 3a25ba52f3..0045f64c9e 100644
--- a/src/backend/utils/adt/ri_triggers.c
+++ b/src/backend/utils/adt/ri_triggers.c
@@ -2650,7 +2650,7 @@ quoteRelationName(char *buffer, Relation rel)
  * ri_GenerateQual --- generate a WHERE clause equating two variables
  *
  * The idea is to append " sep leftop op rightop" to buf, or if fkreftype is
- * FKCONSTR_REF_EACH_ELEMENT, append " sep leftop op ANY(rightop)" to buf.
+ * FKCONSTR_REF_EACH_ELEMENT, append " sep leftop <@ rightop" to buf.
  *
  * The complexity comes from needing to be sure that the parser will select
  * the desired operator.  We always name the operator using
@@ -2697,17 +2697,10 @@ ri_GenerateQual(StringInfo buf,
 	appendStringInfo(buf, " %s %s", sep, leftop);
 	if (leftoptype != operform->oprleft)
 		ri_add_cast_to(buf, operform->oprleft);
- 
- 	appendStringInfo(buf, " OPERATOR(%s.%s) ",
- 	 quote_identifier(nspname), oprname);
- 
- 	if (fkreftype == FKCONSTR_REF_EACH_ELEMENT)
- 		appendStringInfoString(buf, "ANY (");
+ 	appendStringInfo(buf, " @> "); 
  	appendStringInfoString(buf, rightop);
  	if (rightoptype != oprright)
  		ri_add_cast_to(buf, oprright);
- 	if (fkreftype == FKCONSTR_REF_EACH_ELEMENT)
- 		appendStringInfoChar(buf, ')');
 
 	ReleaseSysCache(opertup);
 }

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] New partitioning - some feedback

2017-07-07 Thread Mark Kirkwood


On 07/07/17 19:54, Michael Banck wrote:


On Fri, Jul 07, 2017 at 07:40:55PM +1200, Mark Kirkwood wrote:

On 07/07/17 13:29, Amit Langote wrote:

Someone complained about this awhile back [1].  And then it came up again
[2], where Noah appeared to take a stance that partitions should be
visible in views / output of commands that list "tables".

Although I too tend to prefer not filling up the \d output space by
listing partitions (pg_class.relispartition = true relations), there
wasn't perhaps enough push for creating a patch for that.  If some
committer is willing to consider such a patch, I can make one.

Yeah, me too (clearly). However if the consensus is that all these partition
tables *must* be shown in \d output, then I'd be happy if they were
identified as such rather than just 'table' (e.g 'partition table').

+1.

Or maybe just 'partition' is enough if 'partition table' would widen the
column output unnecessarily.




Yeah, that is probably better (and 'partition table' is potentially 
confusing as Robert pointed out).


Cheers

Mark



--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Revisiting NAMEDATALEN

2017-07-07 Thread Mark Dilger

> On Jul 7, 2017, at 2:53 AM, Emrul <em...@emrul.com> wrote:
> 
> Tom, thank you for that pointer.  I get now that it is not free and therefore
> why its not something that should be changed by default.
> 
> I guess the problem is 'build your own copy' (i.e. compiling from source) is
> something that sends most DB teams running into the hills.

To make matters worse, if you change NAMEDATALEN, compile, and run
'make check', some of the tests will fail.  The tests are very sensitive to the
exact output of the sql they execute, and changing NAMEDATALEN, or
indeed any one of many other options, causes some of the test output to
change. Even configure's options, such as --with-blocksize, cannot be
changed from the default value without potentially breaking the regression
tests.

mark

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] New partitioning - some feedback

2017-07-07 Thread Mark Kirkwood


On 07/07/17 13:29, Amit Langote wrote:



Someone complained about this awhile back [1].  And then it came up again
[2], where Noah appeared to take a stance that partitions should be
visible in views / output of commands that list "tables".

Although I too tend to prefer not filling up the \d output space by
listing partitions (pg_class.relispartition = true relations), there
wasn't perhaps enough push for creating a patch for that.  If some
committer is willing to consider such a patch, I can make one.




Yeah, me too (clearly). However if the consensus is that all these 
partition tables *must* be shown in \d output, then I'd be happy if they 
were identified as such rather than just 'table' (e.g 'partition table').


regards

Mark


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

[HACKERS] New partitioning - some feedback

2017-07-06 Thread Mark Kirkwood

I've been trying out the new partitioning in version 10. Firstly, I must 
say this is excellent - so much nicer than the old inheritance based method!


My only niggle is the display of partitioned tables via \d etc. e.g:

part=# \d
List of relations
 Schema | Name | Type  |  Owner
+--+---+--
 public | date_fact| table | postgres
 public | date_fact_201705 | table | postgres
 public | date_fact_201706 | table | postgres
 public | date_fact_20170601   | table | postgres
 public | date_fact_2017060100 | table | postgres
 public | date_fact_201707 | table | postgres
 public | date_fact_rest   | table | postgres
(7 rows)

Now it can be inferred from the names that date_fact is a partitioned 
table and the various date_fact_ are its partitions - but \d is not 
providing any hints of this. The more detailed individual describe is fine:


part=# \d date_fact
  Table "public.date_fact"
 Column |   Type   | Collation | Nullable | Default
+--+---+--+-
 id | integer  |   | not null |
 dte| timestamp with time zone |   | not null |
 val| integer  |   | not null |
Partition key: RANGE (dte)
Number of partitions: 6 (Use \d+ to list them.)

I'd prefer *not* to see a table and its partitions all intermixed in the 
same display (especially with nothing indicating which are partitions) - 
as this will make for unwieldy long lists when tables have many 
partitions. Also it would be good if the 'main' partitioned table and 
its 'partitions' showed up as a different type in some way.


I note the they do in pg_class:

part=# SELECT relname,relkind,relispartition FROM pg_class WHERE relname 
LIKE 'date_fact%';

   relname| relkind | relispartition
--+-+
 date_fact| p   | f
 date_fact_201705 | r   | t
 date_fact_201706 | r   | t
 date_fact_20170601   | r   | t
 date_fact_2017060100 | r   | t
 date_fact_201707 | r   | t
 date_fact_rest   | r   | t
(7 rows)

...so it looks to be possible to hide the partitions from the main 
display and/or mark them as such. Now I realize that making this comment 
now that beta is out is a bit annoying - apologies, but I think seeing a 
huge list of 'tables' is going to make \d frustrating for folk doing 
partitioning.


regards

Mark



--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] GSoC 2017: Foreign Key Arrays

2017-07-05 Thread Mark Rofail

To make the queries fired by the RI triggers GIN indexed. We need to ‒ as
Tom Lane has previously suggested[1] ‒ to replace the query

SELECT 1 FROM ONLY fktable x WHERE $1 = ANY (fkcol) FOR SHARE OF x;

with

SELECT 1 FROM ONLY fktable x WHERE ARRAY[$1] <@ fkcol FOR SHARE OF x;

but since we have @<(anyarray, anyelement) it can be improved to

SELECT 1 FROM ONLY fktable x WHERE $1 @> fkcol FOR SHARE OF x;

and the piece of code responsible for all of this is ri_GenerateQual in
ri_triggers.c.

How to accomplish that is the next step. I don't know if we should hardcode
the "@>" symbol or if we just index the fk table then ri_GenerateQual would
be able to find the operator on it's own.

*What I plan to do:*

   - study how to index the fk table upon its creation. I suspect this can
   be done in tablecmds.c

*Questions:*

   - how can you programmatically in C index a table?

[1] https://www.postgresql.org/message-id/28389.1351094795%40sss.pgh.pa.us

Best Regards,
Mark Rofail
diff --git a/src/backend/utils/adt/ri_triggers.c b/src/backend/utils/adt/ri_triggers.c
index 3a25ba52f3..0045f64c9e 100644
--- a/src/backend/utils/adt/ri_triggers.c
+++ b/src/backend/utils/adt/ri_triggers.c
@@ -2650,7 +2650,7 @@ quoteRelationName(char *buffer, Relation rel)
  * ri_GenerateQual --- generate a WHERE clause equating two variables
  *
  * The idea is to append " sep leftop op rightop" to buf, or if fkreftype is
- * FKCONSTR_REF_EACH_ELEMENT, append " sep leftop op ANY(rightop)" to buf.
+ * FKCONSTR_REF_EACH_ELEMENT, append " sep leftop <@ rightop" to buf.
  *
  * The complexity comes from needing to be sure that the parser will select
  * the desired operator.  We always name the operator using
@@ -2697,17 +2697,10 @@ ri_GenerateQual(StringInfo buf,
 	appendStringInfo(buf, " %s %s", sep, leftop);
 	if (leftoptype != operform->oprleft)
 		ri_add_cast_to(buf, operform->oprleft);
- 
- 	appendStringInfo(buf, " OPERATOR(%s.%s) ",
- 	 quote_identifier(nspname), oprname);
- 
- 	if (fkreftype == FKCONSTR_REF_EACH_ELEMENT)
- 		appendStringInfoString(buf, "ANY (");
+ 	appendStringInfo(buf, " @> "); 
  	appendStringInfoString(buf, rightop);
  	if (rightoptype != oprright)
  		ri_add_cast_to(buf, oprright);
- 	if (fkreftype == FKCONSTR_REF_EACH_ELEMENT)
- 		appendStringInfoChar(buf, ')');
 
 	ReleaseSysCache(opertup);
 }

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Request more documentation for incompatibility of parallelism and plpgsql exec_run_select

2017-07-05 Thread Mark Dilger


> On Jul 5, 2017, at 5:30 AM, Amit Kapila <amit.kapil...@gmail.com> wrote:
> 
> On Wed, Jul 5, 2017 at 9:44 AM, Mark Dilger <hornschnor...@gmail.com> wrote:
>> 
>>> On Jul 3, 2017, at 10:25 PM, Amit Kapila <amit.kapil...@gmail.com> wrote:
>>> 
>>> On Mon, Jul 3, 2017 at 8:57 PM, Simon Riggs <si...@2ndquadrant.com> wrote:
>>>> On 30 June 2017 at 05:14, Amit Kapila <amit.kapil...@gmail.com> wrote:
>>>> 
>>>>> This is explained in section 15.2 [1], refer below para:
>>>>> "The query might be suspended during execution. In any situation in
>>>>> which the system thinks that partial or incremental execution might
>>>>> occur, no parallel plan is generated. For example, a cursor created
>>>>> using DECLARE CURSOR will never use a parallel plan. Similarly, a
>>>>> PL/pgSQL loop of the form FOR x IN query LOOP .. END LOOP will never
>>>>> use a parallel plan, because the parallel query system is unable to
>>>>> verify that the code in the loop is safe to execute while parallel
>>>>> query is active."
>>>> 
>>>> Can you explain "unable to verify that the code in the loop is safe to
>>>> execute while parallel query is active". Surely we aren't pushing code
>>>> in the loop into the actual query, so the safety of command in the FOR
>>>> loop has nothing to do with the parallel safety of the query.
>>>> 
>>>> Please give an example of something that would be unsafe? Is that
>>>> documented anywhere, README etc?
>>>> 
>>>> FOR x IN query LOOP .. END LOOP
>>>> seems like a case that would be just fine, since we're going to loop
>>>> thru every row or break early.
>>>> 
>>> 
>>> It is not fine because we don't support partial execution support.  In
>>> above case, if the loop breaks, we can't break parallel query
>>> execution.  Now, I don't think it will impossible to support the same,
>>> but as of now, parallel subsystem doesn't have such a support.
>> 
>> I can understand this, but wonder if I could use something like
>> 
>> FOR I TOTALLY PROMISE TO USE ALL ROWS rec IN EXECUTE sql LOOP
>> ...
>> END LOOP;
>> 
>> if I hacked the grammar up a bit.  Would the problem go away, or would
>> I still have problems when exceptions beyond my control get thrown inside
>> the loop?
>> 
> 
> I don't think it is just a matter of hacking grammar, internally we
> are using cursor fetch to fetch the rows and there we are passing some
> fixed number of rows to fetch which again is a killer to invoke the
> parallel query.

Is the risk that a RETURN will be executed within the loop block the only
risk that is causing parallel query to be disabled in these plpgsql loops?
If so, it seems that a plpgsql language syntax which created a loop that
disabled RETURN from within the loop body would solve the problem.  I
do not propose any such language extension.  It was just a thought
experiment.

Is there a conditional somewhere in the source that says, in effect, "if this
is a plpgsql loop, then disable parallel query"?  If so, and if I removed that
check such that parallel query would run in plpgsql loops, would I only
get breakage when I returned out of a loop?  Or would there be other
situations where that would break?

Thanks,

mark



-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Request more documentation for incompatibility of parallelism and plpgsql exec_run_select

2017-07-04 Thread Mark Dilger


> On Jul 3, 2017, at 10:25 PM, Amit Kapila <amit.kapil...@gmail.com> wrote:
> 
> On Mon, Jul 3, 2017 at 8:57 PM, Simon Riggs <si...@2ndquadrant.com> wrote:
>> On 30 June 2017 at 05:14, Amit Kapila <amit.kapil...@gmail.com> wrote:
>> 
>>> This is explained in section 15.2 [1], refer below para:
>>> "The query might be suspended during execution. In any situation in
>>> which the system thinks that partial or incremental execution might
>>> occur, no parallel plan is generated. For example, a cursor created
>>> using DECLARE CURSOR will never use a parallel plan. Similarly, a
>>> PL/pgSQL loop of the form FOR x IN query LOOP .. END LOOP will never
>>> use a parallel plan, because the parallel query system is unable to
>>> verify that the code in the loop is safe to execute while parallel
>>> query is active."
>> 
>> Can you explain "unable to verify that the code in the loop is safe to
>> execute while parallel query is active". Surely we aren't pushing code
>> in the loop into the actual query, so the safety of command in the FOR
>> loop has nothing to do with the parallel safety of the query.
>> 
>> Please give an example of something that would be unsafe? Is that
>> documented anywhere, README etc?
>> 
>> FOR x IN query LOOP .. END LOOP
>> seems like a case that would be just fine, since we're going to loop
>> thru every row or break early.
>> 
> 
> It is not fine because we don't support partial execution support.  In
> above case, if the loop breaks, we can't break parallel query
> execution.  Now, I don't think it will impossible to support the same,
> but as of now, parallel subsystem doesn't have such a support.

I can understand this, but wonder if I could use something like

FOR I TOTALLY PROMISE TO USE ALL ROWS rec IN EXECUTE sql LOOP
...
END LOOP;

if I hacked the grammar up a bit.  Would the problem go away, or would
I still have problems when exceptions beyond my control get thrown inside
the loop?  And if exception handling is a problem in the loop, are exceptions
somehow not a problem in other parallel queries?

Obviously I mean that syntax above in jest, but the question is in ernest.

Thanks,

mark



-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Request more documentation for incompatibility of parallelism and plpgsql exec_run_select

2017-06-30 Thread Mark Dilger

> On Jun 29, 2017, at 8:55 PM, Mark Dilger <hornschnor...@gmail.com> wrote:
> 
> 
> Changing myfunc to create a temporary table, to execute the sql to populate
> that temporary table, and to then loop through the temporary table's rows
> fixes the problem.  For the real-world example where I hit this, that single
> change decreases the runtime from 13.5 seconds to 2.5 seconds.

Actually, this is wrong.  On further review, by the time I had changed the
function definition, the data in the test server I was querying had likely 
changed
enough for the performance to change.  That duped me into thinking I had
found a work-around.  I can no longer reproduce any performance improvement
in this manner.

mark

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Request more documentation for incompatibility of parallelism and plpgsql exec_run_select

2017-06-29 Thread Mark Dilger


> On Jun 29, 2017, at 9:14 PM, Amit Kapila <amit.kapil...@gmail.com> wrote:
> 
> On Fri, Jun 30, 2017 at 9:25 AM, Mark Dilger <hornschnor...@gmail.com> wrote:
>> Hackers,
>> 
>> In src/pl/plpgsql/src/pl_exec.c: exec_run_select intentionally does not
>> allow a parallel plan if a portal will be returned.  This has the practical
>> consequence that a common coding practice (at least for me) of doing
>> something like:
>> 
>> create function myfunc(arg1 text, arg2 text) returns setof myfunctype as $$
>> declare
>>sql text;
>>result  myfunctype;
>> begin
>>-- unsafe interpolation, but this is just a code example
>>sql := 'select foo from bar where a = ' || arg1 || ' and b = ' || 
>> arg2;
>>for result in execute sql loop
>>return next result;
>>end loop;
>>return;
>> end;
>> $$ language plpgsql volatile;
>> 
>> can't run the generated 'sql' in parallel.  I think this is understandable, 
>> but
>> the documentation of this limitation in the sgml docs is thin.  Perhaps
>> someone who understands this limitation better than I do can document it?
>> 
> 
> This is explained in section 15.2 [1], refer below para:
> "The query might be suspended during execution. In any situation in
> which the system thinks that partial or incremental execution might
> occur, no parallel plan is generated. For example, a cursor created
> using DECLARE CURSOR will never use a parallel plan. Similarly, a
> PL/pgSQL loop of the form FOR x IN query LOOP .. END LOOP will never
> use a parallel plan, because the parallel query system is unable to
> verify that the code in the loop is safe to execute while parallel
> query is active."
> 
> Check if that clarifies your doubts, otherwise, we might need to write
> something more so that it can be easier for users to understand.

That's what I was looking for.  I think I had trouble finding it since I was
using plpgsql (notice no slash) and "parallel worker" and so forth to try
to find it.  "PL/pgSQL" and "parallel query" are not terms I use, and did
not notice them buried a few sentences down in this paragraph.

Thanks for the citation.

mark

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

[HACKERS] Request more documentation for incompatibility of parallelism and plpgsql exec_run_select

2017-06-29 Thread Mark Dilger

Hackers,

In src/pl/plpgsql/src/pl_exec.c: exec_run_select intentionally does not
allow a parallel plan if a portal will be returned.  This has the practical
consequence that a common coding practice (at least for me) of doing
something like:

create function myfunc(arg1 text, arg2 text) returns setof myfunctype as $$
declare
sql text;
result  myfunctype; 
begin
-- unsafe interpolation, but this is just a code example
sql := 'select foo from bar where a = ' || arg1 || ' and b = ' || arg2;
for result in execute sql loop
return next result;
end loop;
return;
end;
$$ language plpgsql volatile;

can't run the generated 'sql' in parallel.  I think this is understandable, but
the documentation of this limitation in the sgml docs is thin.  Perhaps
someone who understands this limitation better than I do can document it?
Changing myfunc to create a temporary table, to execute the sql to populate
that temporary table, and to then loop through the temporary table's rows
fixes the problem.  For the real-world example where I hit this, that single
change decreases the runtime from 13.5 seconds to 2.5 seconds.  This
makes sense to me, because doing it that way means pl_exec knows that
it won't be returning a portal for the executed sql, so the parallel plan is
still allowed.

Sorry to be a nag.  I'm only trying to help the next person who might
otherwise trip over this limitation.  Perhaps this belongs in section 15.3.4
or 42.5.4.

mark

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] GSoC 2017: Foreign Key Arrays

2017-06-25 Thread Mark Rofail

*What I did:*



   - read into the old patch but couldn't apply it since it's quite old. It
   needs to be rebased and that's what I am working on.  It's a lot of work.
  - incomplete patch can be found attached here

*Bugs*

   - problem with the @>(anyarray, anyelement) opertator: if for example,
   you apply the operator as follows  '{AA646'}' @> 'AA646' it
   maps to @>(anyarray, anyarray) since 'AA646' is interpreted as
   char[] instead of Text

*Suggestion:*

   - since I needed to check if the Datum was null and its type, I had to
   do it in the arraycontainselem and pass it as a parameter to the underlying
   function array_contains_elem. I'm proposing to introduce a new struct like
   ArrayType, but ElementType along all with brand new MACROs to make dealing
   with anyelement easier in any polymorphic context.


Best Regards,
Mark Rofail

On Tue, Jun 20, 2017 at 12:19 AM, Alvaro Herrera <alvhe...@2ndquadrant.com>
wrote:

> Mark Rofail wrote:
> > Okay, so major breakthrough.
> >
> > *Updates:*
> >
> >- The operator @>(anyarray, anyelement) is now functional
> >   - The segmentation fault was due to applying PG_FREE_IF_COPY on a
> >   datum when it should only be applied on TOASTed inputs
> >   - The only problem now is if for example you apply the operator as
> >   follows  '{AA646'}' @> 'AA646' it maps to
> @>(anyarray,
> >   anyarray) since 'AA646' is interpreted as char[] instead
> of Text
> >- Added some regression tests (src/test/regress/sql/arrays.sql) and
> >their results(src/test/regress/expected/arrays.out)
> >- wokred on the new GIN strategy, I don't think it would vary much
> from
> >GinContainsStrategy.
>
> OK, that's great.
>
> > *What I plan to do:*
> >
> >- I need to start working on the Referential Integrity code but I
> don't
> >where to start
>
> You need to study the old patch posted by Marco Nenciarini.
>
> --
> Álvaro Herrerahttps://www.2ndQuadrant.com/
> PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
>
diff --git a/doc/src/sgml/catalogs.sgml b/doc/src/sgml/catalogs.sgml
index ea655a10a8..712f631e88 100644
--- a/doc/src/sgml/catalogs.sgml
+++ b/doc/src/sgml/catalogs.sgml
@@ -2288,6 +2288,14 @@ SCRAM-SHA-256$iteration count:salt<
  
 
  
+  confiselement
+  bool
+  
+  If a foreign key, is it an array ELEMENT
+  foreign key?
+ 
+
+ 
   coninhcount
   int4
   
@@ -2324,6 +2332,18 @@ SCRAM-SHA-256$iteration count:salt<
  
 
  
+  confelement
+  bool[]
+  
+  
+ 	If a foreign key, list of booleans expressing which columns
+ 	are array ELEMENT columns; see
+ 	
+ 	for details
+  
+ 
+
+ 
   conpfeqop
   oid[]
   pg_operator.oid
diff --git a/doc/src/sgml/ddl.sgml b/doc/src/sgml/ddl.sgml
index b05a9c2150..c1c847bc7e 100644
--- a/doc/src/sgml/ddl.sgml
+++ b/doc/src/sgml/ddl.sgml
@@ -881,7 +881,112 @@ CREATE TABLE order_items (
 .

   
-
+  
+   
+Array ELEMENT Foreign Keys
+ 
+
+ ELEMENT foreign key
+
+ 
+
+ constraint
+ Array ELEMENT foreign key
+
+ 
+
+ constraint
+ ELEMENT foreign key
+
+ 
+
+ referential integrity
+
+ 
+
+ Another option you have with foreign keys is to use a
+ referencing column which is an array of elements with
+ the same type (or a compatible one) as the referenced
+ column in the related table. This feature is called
+ array element foreign key and is implemented
+ in PostgreSQL with ELEMENT foreign key constraints,
+ as described in the following example:
+ 
+
+CREATE TABLE drivers (
+driver_id integer PRIMARY KEY,
+first_name text,
+last_name text,
+...
+);
+
+CREATE TABLE races (
+race_id integer PRIMARY KEY,
+title text,
+race_day DATE,
+...
+final_positions integer[] ELEMENT REFERENCES drivers
+);
+
+ 
+ The above example uses an array (final_positions)
+ to store the results of a race: for each of its elements
+ a referential integrity check is enforced on the
+ drivers table.
+ Note that ELEMENT REFERENCES is an extension
+ of PostgreSQL and it is not included in the SQL standard.
+
+ 
+
+ Even though the most common use case for array ELEMENT
+ foreign keys is on a single column key, you can define an array
+ ELEMENT foreign key constraint on a group
+ of columns. As the following example shows, it must be written in table
+ constraint form:
+ 
+
+CREATE TABLE available_moves (
+kind text,
+move text,
+description text,
+PRIMARY KEY (

Re: [HACKERS] GSoC 2017: Foreign Key Arrays

2017-06-19 Thread Mark Rofail

Okay, so major breakthrough.

*Updates:*

   - The operator @>(anyarray, anyelement) is now functional
  - The segmentation fault was due to applying PG_FREE_IF_COPY on a
  datum when it should only be applied on TOASTed inputs
  - The only problem now is if for example you apply the operator as
  follows  '{AA646'}' @> 'AA646' it maps to @>(anyarray,
  anyarray) since 'AA646' is interpreted as char[] instead of Text
   - Added some regression tests (src/test/regress/sql/arrays.sql) and
   their results(src/test/regress/expected/arrays.out)
   - wokred on the new GIN strategy, I don't think it would vary much from
   GinContainsStrategy.

*What I plan to do:*

   - I need to start working on the Referential Integrity code but I don't
   where to start

Best Regards,
Mark Rofail
diff --git a/src/backend/access/gin/ginarrayproc.c b/src/backend/access/gin/ginarrayproc.c
index cc7435e030..a1b3f53ed9 100644
--- a/src/backend/access/gin/ginarrayproc.c
+++ b/src/backend/access/gin/ginarrayproc.c
@@ -24,6 +24,7 @@
 #define GinContainsStrategy		2
 #define GinContainedStrategy	3
 #define GinEqualStrategy		4
+#define GinContainsElemStrategy	5
 
 
 /*
@@ -110,6 +111,11 @@ ginqueryarrayextract(PG_FUNCTION_ARGS)
 		case GinOverlapStrategy:
 			*searchMode = GIN_SEARCH_MODE_DEFAULT;
 			break;
+		case GinContainsElemStrategy:
+			/* only items that match the queried element 
+are considered candidate  */
+			*searchMode = GIN_SEARCH_MODE_DEFAULT;
+			break;
 		case GinContainsStrategy:
 			if (nelems > 0)
 *searchMode = GIN_SEARCH_MODE_DEFAULT;
@@ -171,6 +177,7 @@ ginarrayconsistent(PG_FUNCTION_ARGS)
 }
 			}
 			break;
+		case GinContainsElemStrategy:
 		case GinContainsStrategy:
 			/* result is not lossy */
 			*recheck = false;
@@ -258,7 +265,8 @@ ginarraytriconsistent(PG_FUNCTION_ARGS)
 }
 			}
 			break;
-		case GinContainsStrategy:
+			case GinContainsElemStrategy:
+			case GinContainsStrategy:
 			/* must have all elements in check[] true, and no nulls */
 			res = GIN_TRUE;
 			for (i = 0; i < nkeys; i++)
diff --git a/src/backend/utils/adt/arrayfuncs.c b/src/backend/utils/adt/arrayfuncs.c
index d9c8aa569c..c563aa564e 100644
--- a/src/backend/utils/adt/arrayfuncs.c
+++ b/src/backend/utils/adt/arrayfuncs.c
@@ -4232,6 +4232,117 @@ arraycontained(PG_FUNCTION_ARGS)
 	PG_RETURN_BOOL(result);
 }
 
+/*
+ * array_contains_elem : checks an array for a spefific element
+ */
+static bool
+array_contains_elem(AnyArrayType *array, Datum elem, Oid element_type,
+bool element_isnull, Oid collation,	void **fn_extra)
+{
+	Oid 		arr_type = AARR_ELEMTYPE(array);
+	TypeCacheEntry *typentry;
+	int 		nelems;
+	int			typlen;
+	bool		typbyval;
+	char		typalign;
+	int			i;
+	array_iter 	it1;
+	FunctionCallInfoData locfcinfo;
+
+	if (arr_type != element_type)
+		ereport(ERROR,
+(errcode(ERRCODE_DATATYPE_MISMATCH),
+ errmsg("cannot compare different element types")));
+	
+	if (element_isnull)
+		return false;
+		
+	/*
+	 * We arrange to look up the equality function only once per series of
+	 * calls, assuming the element type doesn't change underneath us.  The
+	 * typcache is used so that we have no memory leakage when being used as
+	 * an index support function.
+	 */
+	typentry = (TypeCacheEntry *)*fn_extra;
+	if (typentry == NULL ||
+		typentry->type_id != arr_type)
+	{
+		typentry = lookup_type_cache(arr_type,
+	 TYPECACHE_EQ_OPR_FINFO);
+		if (!OidIsValid(typentry->eq_opr_finfo.fn_oid))
+			ereport(ERROR,
+	(errcode(ERRCODE_UNDEFINED_FUNCTION),
+	 errmsg("could not identify an equality operator for type %s",
+			format_type_be(arr_type;
+		*fn_extra = (void *)typentry;
+	}
+	typlen = typentry->typlen;
+	typbyval = typentry->typbyval;
+	typalign = typentry->typalign;
+
+	/*
+	 * Apply the comparison operator to each pair of array elements.
+	 */
+	InitFunctionCallInfoData(locfcinfo, >eq_opr_finfo, 2,
+			 collation, NULL, NULL);
+
+	/* Loop over source data */
+	nelems = ArrayGetNItems(AARR_NDIM(array), AARR_DIMS(array));
+	array_iter_setup(, array);
+
+	for (i = 0; i < nelems; i++)
+	{
+		Datum elt1;
+		bool isnull;
+		bool oprresult;
+
+		/* Get element, checking for NULL */
+		elt1 = array_iter_next(, , i, typlen, typbyval, typalign);
+
+		/*
+		 * We assume that the comparison operator is strict, so a NULL can't
+		 * match anything.  XXX this diverges from the "NULL=NULL" behavior of
+		 * array_eq, should we act like that?
+		 */
+		if (isnull)
+			continue;
+
+		/*
+			* Apply the operator to the element pair
+			*/
+		locfcinfo.arg[0] = elt1;
+		locfcinfo.arg[1] = elem;
+		locfcinfo.argnull[0] = false;
+		locfcinfo.argnull[1] = false;
+		locfcinfo.isnull = false;
+		oprresult = DatumGetBool(FunctionCallInvoke());
+		if (oprresult)
+			return true;
+	}
+
+	return false;
+}
+
+Datum
+arraycontainselem(PG_FUNCTION_ARGS)
+{
+	AnyArrayType *array = PG_GETARG_A

Re: [HACKERS] GSoC 2017: Foreign Key Arrays

2017-06-17 Thread Mark Rofail

*Updates till now:*

   - added a record to pg_proc (src/include/catalog/pg_proc.h)
   - modified opr_sanity regression check expected results
   - implemented a  low-level function called `array_contains_elem` as an
   equivalent to `array_contain_compare` but accepts anyelement instead of
   anyarray as the right operand. This is more efficient than constructing an
   array and then immediately deconstructing it.

*Questions:*

   - I'd like to check that anyelem and anyarray have the same element
   type. but anyelem is obtained from PG_FUNCTION_ARGS as a Datum. How can
   I make such a check?

Best Regards,
Mark Rofail
diff --git a/src/backend/access/gin/ginarrayproc.c b/src/backend/access/gin/ginarrayproc.c
index cc7435e030..214aac8fba 100644
--- a/src/backend/access/gin/ginarrayproc.c
+++ b/src/backend/access/gin/ginarrayproc.c
@@ -24,6 +24,7 @@
 #define GinContainsStrategy		2
 #define GinContainedStrategy	3
 #define GinEqualStrategy		4
+#define GinContainsElemStrategy		5
 
 
 /*
@@ -43,7 +44,7 @@ ginarrayextract(PG_FUNCTION_ARGS)
 	bool	   *nulls;
 	int			nelems;
 
-	get_typlenbyvalalign(ARR_ELEMTYPE(array),
+	get_typlenbyvalalign(ARR_ELEMTYPE(array),	
 		 , , );
 
 	deconstruct_array(array,
@@ -110,7 +111,8 @@ ginqueryarrayextract(PG_FUNCTION_ARGS)
 		case GinOverlapStrategy:
 			*searchMode = GIN_SEARCH_MODE_DEFAULT;
 			break;
-		case GinContainsStrategy:
+			case GinContainsElemStrategy:
+			case GinContainsStrategy:
 			if (nelems > 0)
 *searchMode = GIN_SEARCH_MODE_DEFAULT;
 			else	/* everything contains the empty set */
@@ -171,6 +173,7 @@ ginarrayconsistent(PG_FUNCTION_ARGS)
 }
 			}
 			break;
+		case GinContainsElemStrategy:
 		case GinContainsStrategy:
 			/* result is not lossy */
 			*recheck = false;
@@ -258,7 +261,8 @@ ginarraytriconsistent(PG_FUNCTION_ARGS)
 }
 			}
 			break;
-		case GinContainsStrategy:
+			case GinContainsElemStrategy:
+			case GinContainsStrategy:
 			/* must have all elements in check[] true, and no nulls */
 			res = GIN_TRUE;
 			for (i = 0; i < nkeys; i++)
diff --git a/src/backend/utils/adt/arrayfuncs.c b/src/backend/utils/adt/arrayfuncs.c
index d9c8aa569c..8009ab5acb 100644
--- a/src/backend/utils/adt/arrayfuncs.c
+++ b/src/backend/utils/adt/arrayfuncs.c
@@ -4232,6 +4232,107 @@ arraycontained(PG_FUNCTION_ARGS)
 	PG_RETURN_BOOL(result);
 }
 
+/*
+ * array_contains_elem : checks an array for a spefific element
+ */
+static bool
+array_contains_elem(AnyArrayType *array, Datum elem, Oid collation,
+	void **fn_extra)
+{
+	Oid 		element_type = AARR_ELEMTYPE(array);
+	TypeCacheEntry *typentry;
+	int 		nelems;
+	int			typlen;
+	bool		typbyval;
+	char		typalign;
+	int			i;
+	array_iter 	it1;
+	FunctionCallInfoData locfcinfo;
+
+	/*
+	 * We arrange to look up the equality function only once per series of
+	 * calls, assuming the element type doesn't change underneath us.  The
+	 * typcache is used so that we have no memory leakage when being used as
+	 * an index support function.
+	 */
+	typentry = (TypeCacheEntry *)*fn_extra;
+	if (typentry == NULL ||
+		typentry->type_id != element_type)
+	{
+		typentry = lookup_type_cache(element_type,
+	 TYPECACHE_EQ_OPR_FINFO);
+		if (!OidIsValid(typentry->eq_opr_finfo.fn_oid))
+			ereport(ERROR,
+	(errcode(ERRCODE_UNDEFINED_FUNCTION),
+	 errmsg("could not identify an equality operator for type %s",
+			format_type_be(element_type;
+		*fn_extra = (void *)typentry;
+	}
+	typlen = typentry->typlen;
+	typbyval = typentry->typbyval;
+	typalign = typentry->typalign;
+
+	/*
+	 * Apply the comparison operator to each pair of array elements.
+	 */
+	InitFunctionCallInfoData(locfcinfo, >eq_opr_finfo, 2,
+			 collation, NULL, NULL);
+
+	/* Loop over source data */
+	nelems = ArrayGetNItems(AARR_NDIM(array), AARR_DIMS(array));
+	array_iter_setup(, array);
+
+	for (i = 0; i < nelems; i++)
+	{
+		Datum elt1;
+		bool isnull;
+		bool oprresult;
+
+		/* Get element, checking for NULL */
+		elt1 = array_iter_next(, , i, typlen, typbyval, typalign);
+
+		/*
+		 * We assume that the comparison operator is strict, so a NULL can't
+		 * match anything.  XXX this diverges from the "NULL=NULL" behavior of
+		 * array_eq, should we act like that?
+		 */
+		if (isnull)
+			continue;
+
+		/*
+			* Apply the operator to the element pair
+			*/
+		locfcinfo.arg[0] = elt1;
+		locfcinfo.arg[1] = elem;
+		locfcinfo.argnull[0] = false;
+		locfcinfo.argnull[1] = false;
+		locfcinfo.isnull = false;
+		oprresult = DatumGetBool(FunctionCallInvoke());
+		if (oprresult)
+			return true;
+	}
+
+	return false;
+}
+
+Datum
+arraycontainselem(PG_FUNCTION_ARGS)
+{
+	AnyArrayType *array = PG_GETARG_ANY_ARRAY(0);
+	Datum elem = PG_GETARG_DATUM(1);
+	Oid collation = PG_GET_COLLATION();
+	bool result;
+
+	result = array_contains_elem(array, elem, collation,
+ >flinfo->fn_extra);
+
+	/* Avoid leaking memory when handed toasted input. */
+	AARR_FREE_IF_COPY(array, 0

Re: [HACKERS] logical replication: \dRp+ and "for all tables"

2017-06-14 Thread Mark Kirkwood


On 15/06/17 11:10, Tom Lane wrote:


Jeff Janes <jeff.ja...@gmail.com> writes:

On Sat, Jun 10, 2017 at 7:42 AM, Tom Lane <t...@sss.pgh.pa.us> wrote:

In the second place, this really fails to respond to what I'd call
the main usability problem with \dRp+, which is that the all-tables
property is likely to lead to an unreadably bulky list of affected tables.
What I'd say the patch ought to do is *replace* \dRp+'s list of affected
tables with a notation like "(all tables)" when puballtables is true.

I'd considered that, but I find the pager does a fine job of dealing with
the bulkiness of the list.

Have you tried it with a few tens of thousands of tables?  Even if your
pager makes it work comfortably, others might find it less satisfactory.


I thought it might be a good idea to not only
point out that it is all tables, but also remind people of exactly what
tables those are currently (in case it had slipped their mind that all
tables will include table from other schemas not in their search_path, for
example)

I'm not really buying that.  If they don't know what "all tables" means,
a voluminous list isn't likely to help much.

I was hoping we'd get some more votes in this thread, but it seems like
we've only got three, and by my count two of them are for just printing
"all tables".




I'd certainly prefer to see 'all tables' - in addition to being more 
compact, it also reflects more correctly how the publication was defined.


regards

Mark



--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Make ANALYZE more selective about what is a "most common value"?

2017-06-10 Thread Mark Kirkwood

On 06/06/17 10:12, Gavin Flower wrote:

On 06/06/17 09:41, Mark Kirkwood wrote:

On 05/06/17 09:30, Tom Lane wrote:

I've been thinking about the behavior discussed in
https://www.postgresql.org/message-id/flat/20170522132017.29944.48391%40wrigleys.postgresql.org

and it seems to me that there are a couple of things we ought to do
about

it.

First, I think we need a larger hard floor on the number of occurrences
of a value that're required to make ANALYZE decide it is a "most common
value". The existing coding is willing to believe that anything that
appears at least twice in the sample is a potential MCV, but that
design
originated when we were envisioning stats samples of just a few
thousand
rows --- specifically, default_statistics_target was originally just
10,
leading to a 3000-row sample size. So accepting two-appearance
values as

MCVs would lead to a minimum MCV frequency estimate of 1/1500. Now it
could be a tenth or a hundredth of that.

As a round number, I'm thinking that a good floor would be a frequency
estimate of 1/1000. With today's typical sample size of 3 rows,
a value would have to appear at least 30 times in the sample to be
believed to be an MCV. That seems like it gives us a reasonable margin
of error against the kind of sampling noise seen in the above-cited
thread.

than there are sample rows, the "average frequency" estimate will
correspond to less than one occurrence in the sample, so that this
rule is
totally useless to filter anything that we would otherwise consider
as an
MCV. I wonder if we shouldn't make it be "at least double the
estimated

average frequency".

Or possibly calculate the sample standard deviation and make use of
that to help decide on a more flexible cutoff than twice the avg
frequency?

Are there any research papers that might help us here (I'm drowning
in a sea of barely relevant search results for most phrases I've
tried so far)? I recall there were some that Tom referenced when this
stuff was originally written.

On the other hand I do have access to some mathematicians
specializing in statistics - so can get their thoughts on this issue
if you feel it would be worthwhile.

Cheers

Mark

The standard deviation (sd) is proportional to the square root of the
number in the sample in a Normal Distribution.

In a Normal Distribution, about 2/3 the values will be within plus or
minus one sd of the mean.

There seems to be an implicit assumption that the distribution of
values follows the Normal Distribution - has this been verified? I
suspect that real data will have a skewed distribution of values, and
may even be multi modal (multiple peaks) The Normal Distribution has
one central peak with 2 tails of the same shape & size.

So in a sample of 100 the sd is proportional to 10%,
for 10,000 the sd is proportional to 1%.

So essentially, the larger the sample size the more reliable is
knowledge of the most common values (ignoring pathologically extreme
distributions!) - the measure of reliability depends on the distribution.

How about selecting the cut off as the mean plus one sd, or something
of that nature? Note that the cut off point may result in no mcv's
being selected - especially for small samples.

If practicable, it would be good to sample real datasets. Suggest
looking at datasets were the current mechanism looks reasonable, and
ones were the estimates are too far off. Also, if possible, try any
new selection method on the datasets and see what the difference is.

The above is based on what I remember from my university statistics
papers, I took it up to 4th year level many moons ago.

With respect to the assumption of Normal distribution - it is easy to
come up with an example that shows it need not be: consider our our
Pgbench 'accounts' table. The distribution of 'bid' values is not Normal
(it is Uniform).

Now I realized I made people think about Normal distributions by
mentioning computing the standard deviation of the sample (and I
probably should have said 'standard deviation of the *sample
frequencies*') - but this was only a suggestion for working out how to
(maybe) be less arbitrary about how we decide what values to put in the
MCV list (currently 25% different from the mean frequency).

regards

Mark

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] GSoC 2017: Foreign Key Arrays

2017-06-10 Thread Mark Rofail

• After finding the arraycontains function, I implemented
arraycontainselem that corresponds to the operator @<(anyarray,
anyelem)
   ◦ Please read the attached patch file to view my progress.

•  In addition to src/backend/utils/adt/arrayfuncs.c where I
implemented arraycontainselem.

   ◦ I also edited pg_amop (src/include/catalog/pg_amop.h) since
it stores information about operators associated with access method
operator families.

+DATA(insert ( 2745   2277 2283 2 s 2753 2742 0 ));
{
2745: Oid amopfamily; (denotes gin array_ops)
277: Oid amoplefttype; (denotes anyaray)
2283: Oid amoprighttype; (denotes anyelem)
5: int16 amopstrategy; /* operator strategy number */ (denotes the new
startegy that is yet to be created)
's': char amoppurpose; (denotes 's' for search)
2753: Oid amopopr; (denotes the new operator Oid)
2742: Oid amopmethod;(denotes gin)
0: Oid amopsortfamily; (0 since search operator)
}

   ◦ And pg_operator (src/include/catalog/pg_operator.h) since it
stores information about operators.
+DATA(insert OID = 2753 (  "@>"   PGNSP PGUID b f f 2277 2283 16 0  0
arraycontainselem 0 0 ));
{
 "@>": NameData oprname; /* name of operator */
Oid oprnamespace; /* OID of namespace containing this oper */
Oid oprowner; /* operator owner */
'b': char oprkind; /* 'l', 'r', or 'b' */ (denotes infix)
'f': bool oprcanmerge; /* can be used in merge join? */
'f': bool oprcanhash; /* can be used in hash join? */
277: Oid oprleft; (denotes anyaray)
2283: Oid oprright; (denotes anyelem)
16: Oid oprresult;  (denotes boolean)
0: Oid oprcom; /* OID of commutator oper, or 0 if none */ (needs to be
revisited)
0: Oid oprnegate; /* OID of negator oper, or 0 if none */ (needs to be
revisited)
arraycontainselem: regproc oprcode; /* OID of underlying function */
0: regproc oprrest; /* OID of restriction estimator, or 0 */
0: regproc oprjoin; /* OID of join estimator, or 0 */
}
diff --git a/src/backend/access/gin/ginarrayproc.c b/src/backend/access/gin/ginarrayproc.c
index cc7435e030..14fedc8066 100644
--- a/src/backend/access/gin/ginarrayproc.c
+++ b/src/backend/access/gin/ginarrayproc.c
@@ -24,6 +24,7 @@
 #define GinContainsStrategy		2
 #define GinContainedStrategy	3
 #define GinEqualStrategy		4
+#define GinContainsElemStrategy		5
 
 
 /*
@@ -110,7 +111,8 @@ ginqueryarrayextract(PG_FUNCTION_ARGS)
 		case GinOverlapStrategy:
 			*searchMode = GIN_SEARCH_MODE_DEFAULT;
 			break;
+		case GinContainsElemStrategy:
 		case GinContainsStrategy:
 			if (nelems > 0)
 *searchMode = GIN_SEARCH_MODE_DEFAULT;
 			else	/* everything contains the empty set */
@@ -171,6 +173,7 @@ ginarrayconsistent(PG_FUNCTION_ARGS)
 }
 			}
 			break;
+		case GinContainsElemStrategy:
 		case GinContainsStrategy:
 			/* result is not lossy */
 			*recheck = false;
@@ -258,7 +261,8 @@ ginarraytriconsistent(PG_FUNCTION_ARGS)
 }
 			}
 			break;
-		case GinContainsStrategy:
+		case GinContainsElemStrategy:
 		case GinContainsStrategy:
 			/* must have all elements in check[] true, and no nulls */
 			res = GIN_TRUE;
 			for (i = 0; i < nkeys; i++)
diff --git a/src/backend/utils/adt/arrayfuncs.c b/src/backend/utils/adt/arrayfuncs.c
index d9c8aa569c..e1ff6d33b5 100644
--- a/src/backend/utils/adt/arrayfuncs.c
+++ b/src/backend/utils/adt/arrayfuncs.c
@@ -4215,6 +4215,40 @@ arraycontains(PG_FUNCTION_ARGS)
 }
 
 Datum
+arraycontainselem(PG_FUNCTION_ARGS)
+{
+		Datum *elem = PG_GETARG_DATUM(0);
+		AnyArrayType *array1;
+		AnyArrayType *array2 = PG_GETARG_ANY_ARRAY(1);
+		Oid collation = PG_GET_COLLATION();
+		bool result;
+
+		int16 typlen;
+		bool typbyval;
+		char typalign;
+		int nelems;
+
+		/* we have one element */
+		nelems= 1;
+
+		/* get required info about the element type */
+		get_typlenbyvalalign(ARR_ELEMTYPE(array),
+			 , , );
+
+		/* now build the array */
+		array1 =  construct_array(, nelems,collation, , , );
+
+		result = array_contain_compare(array2, array1, collation, true,
+			>flinfo->fn_extra);
+
+		/* Avoid leaking memory when handed toasted input. */
+		PG_FREE_IF_COPY(elem,0);
+		AARR_FREE_IF_COPY(array, 1);
+
+		PG_RETURN_BOOL(result);
+}
+
+Datum
 arraycontained(PG_FUNCTION_ARGS)
 {
 	AnyArrayType *array1 = PG_GETARG_ANY_ARRAY(0);
diff --git a/src/include/catalog/pg_amop.h b/src/include/catalog/pg_amop.h
index da0228de6b..2da9002577 100644
--- a/src/include/catalog/pg_amop.h
+++ b/src/include/catalog/pg_amop.h
@@ -687,6 +687,8 @@ DATA(insert (	2595   718 600 15 o 3291 783 1970 ));
  */
 DATA(insert (	2745   2277 2277 1 s 2750 2742 0 ));
 DATA(insert (	2745   2277 2277 2 s 2751 2742 0 ));
+//TODO link the operator's pg_operator OID
+DATA(insert ( 2745   2277 2283 5 s 2753 2742 0 ));
 DATA(insert (	2745   2277 2277 3 s 2752 2742 0 ));
 DATA(insert (	2745   2277 2277 4 s 1070 2742 0 ));
 
diff --git a/src/include/catalog/pg_operator.h b/src/include/catalog/pg_operator.h
index ccbb17efec..626a0b1c49 100644
--- a/src/include/catalog/pg_operator.h
+++ b/src/include/catalog/pg_operator.h
@@ -1567,6 +1567,9 @@

Re: [HACKERS] Make ANALYZE more selective about what is a "most common value"?

2017-06-05 Thread Mark Kirkwood

On 05/06/17 09:30, Tom Lane wrote:

I've been thinking about the behavior discussed in
https://www.postgresql.org/message-id/flat/20170522132017.29944.48391%40wrigleys.postgresql.org
and it seems to me that there are a couple of things we ought to do about
it.

First, I think we need a larger hard floor on the number of occurrences
of a value that're required to make ANALYZE decide it is a "most common
value". The existing coding is willing to believe that anything that
appears at least twice in the sample is a potential MCV, but that design
originated when we were envisioning stats samples of just a few thousand
rows --- specifically, default_statistics_target was originally just 10,
leading to a 3000-row sample size. So accepting two-appearance values as
MCVs would lead to a minimum MCV frequency estimate of 1/1500. Now it
could be a tenth or a hundredth of that.

Second, the code also has a rule that potential MCVs need to have an
estimated frequency at least 25% larger than what it thinks the "average"
value's frequency is. A rule of that general form seems like a good idea,
but I now think the 25% threshold is far too small to do anything useful.
In particular, in any case like this where there are more distinct values
than there are sample rows, the "average frequency" estimate will
correspond to less than one occurrence in the sample, so that this rule is
totally useless to filter anything that we would otherwise consider as an
MCV. I wonder if we shouldn't make it be "at least double the estimated
average frequency".

Or possibly calculate the sample standard deviation and make use of that
to help decide on a more flexible cutoff than twice the avg frequency?

Are there any research papers that might help us here (I'm drowning in a
sea of barely relevant search results for most phrases I've tried so
far)? I recall there were some that Tom referenced when this stuff was
originally written.

On the other hand I do have access to some mathematicians specializing
in statistics - so can get their thoughts on this issue if you feel it
would be worthwhile.

Cheers

Mark

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] logical replication - still unstable after all these months

2017-06-04 Thread Mark Kirkwood




On 05/06/17 13:08, Mark Kirkwood wrote:

On 05/06/17 00:04, Erik Rijkers wrote:


On 2017-05-31 16:20, Erik Rijkers wrote:

On 2017-05-31 11:16, Petr Jelinek wrote:
[...]
Thanks to Mark's offer I was able to study the issue as it happened 
and

found the cause of this.

[0001-Improve-handover-logic-between-sync-and-apply-worker.patch]


This looks good:

-- out_20170531_1141.txt
100 -- pgbench -c 90 -j 8 -T 60 -P 12 -n   --  scale 25
100 -- All is well.

So this is 100x a 1-minute test with 100x success. (This on the most
fastidious machine (slow disks, meagre specs) that used to give 15%
failures)


[Improve-handover-logic-between-sync-and-apply-worker-v2.patch]

No errors after (several days of) running variants of this. (2500x 1 
minute runs; 12x 1-hour runs)


Same here, no errors with the v2 patch applied (approx 2 days - all 1 
minute runs)




Further, reapplying the v1 patch (with a bit of editing as I wanted to 
apply it to my current master), gets a failure with missing rows in the 
history table quite quickly. I'll put back the v2 patch and resume runs 
with that, but I'm cautiously optimistic that the v2 patch solves the issue.


regards

Mark



--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] logical replication - still unstable after all these months

2017-06-04 Thread Mark Kirkwood


On 05/06/17 00:04, Erik Rijkers wrote:


On 2017-05-31 16:20, Erik Rijkers wrote:

On 2017-05-31 11:16, Petr Jelinek wrote:
[...]

Thanks to Mark's offer I was able to study the issue as it happened and
found the cause of this.

[0001-Improve-handover-logic-between-sync-and-apply-worker.patch]


This looks good:

-- out_20170531_1141.txt
100 -- pgbench -c 90 -j 8 -T 60 -P 12 -n   --  scale 25
100 -- All is well.

So this is 100x a 1-minute test with 100x success. (This on the most
fastidious machine (slow disks, meagre specs) that used to give 15%
failures)


[Improve-handover-logic-between-sync-and-apply-worker-v2.patch]

No errors after (several days of) running variants of this. (2500x 1 
minute runs; 12x 1-hour runs)


Same here, no errors with the v2 patch applied (approx 2 days - all 1 
minute runs)


regards

Mark


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] PostgreSQL 10 changes in exclusion constraints - did something change? CASE WHEN behavior oddity

> On Jun 4, 2017, at 2:19 PM, Andres Freund <and...@anarazel.de> wrote:
> 
> On 2017-06-04 14:16:14 -0700, Mark Dilger wrote:
>> Sorry, I was not clear.  What I meant to get at was that if you remove from 
>> the
>> executor all support for SRFs inside case statements, you might foreclose 
>> the option
>> of extending the syntax at some later date to allow aggregates over
>> SRFs.
> 
> Seems very unlikely that we'd ever want to do that.  The right way to do
> this is to simply move the SRF into the from list.  Having the executor
> support arbitrary sources of tuples would just complicate and slow down
> already complicated and slow code...
> 
> 
>> I'm
>> not saying that this works currently, but in principle if you allowed that 
>> SUM() that
>> I put up there, you'd get back exactly one row from it, same as you get from 
>> the
>> ELSE clause.  That would seem to solve the problem without going so far as
>> completely disallowing the SRF altogether.
> 
> But what would the benefit be?

In my example, the aggregate function is taking a column from the table as
an argument, so the output of the aggregate function needs to be computed per 
row,
not just once.  And if the function is expensive, or has side-effects, you might
only want it to execute for those rows where the CASE statement is true, rather
than for all of them.  You may get that same behavior using lateral or some 
such,
I'm uncertain, but in a complicated CASE statement, it be more straightforward
to write something like:

SELECT
CASE
WHEN t.x = 'foo' THEN expensive_aggfunc1(srf1(t.y,t.z))
WHEN t.x = 'bar' THEN expensive_aggfunc2(srf2(t.y,t.z))
WHEN t.x = 'baz' THEN expensive_aggfunc3(srf3(t.y,t.z))

WHEN t.x = 'zzz' THEN expensive_aggfuncN(srfN(t.y,t.z))
ELSE 5
END
FROM mytable t;

Than to try to write it any other way.

I'm not advocating anything here, even though it may sound that way to you.
I'm just thinking this thing through, given that you may be committing a removal
of functionality that we want back at some later time.

Out of curiosity, how would you rewrite what I have above such that the
aggregate function is not inside the case statement, and the expensive_aggfuncs
are only called for those (t.y,t.z) that are actually appropriate?

Mark Dilger

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] PostgreSQL 10 changes in exclusion constraints - did something change? CASE WHEN behavior oddity


> On Jun 4, 2017, at 12:35 PM, Andres Freund <and...@anarazel.de> wrote:
> 
> Hi Mark,
> 
> On 2017-06-04 11:55:03 -0700, Mark Dilger wrote:
>>> Yea, I'm not a big fan of the either the pre v10 or the v10 behaviour of
>>> SRFs inside coalesce/case.  Neither is really resonable imo - I'm not
>>> sure a reasonable behaviour even exists.  IIRC I'd argued in the
>>> original SRF thread that we should just throw an error, and I think we'd
>>> concluded that we'd not do so for now.
>> 
>> I am trying to get my head around the type of query you and Tom
>> are discussing.  When you say you are unsure a reasonable behavior
>> even exists, are you saying such queries have no intuitive meaning?
> 
> I'm not saying that there aren't some cases where it's intuitive, but
> there's definitely lots that don't have intuitive meaning.  Especially
> when there are multiple SRFs in the same targetlist.
> 
> Do I understand correctly that you're essentially advocating for the <
> v10 behaviour?

No, I'm not advocating either way.  I merely wanted to know which queries
you and Tom were talking about.

>  It'd be nontrivial to implement that, without loosing
> the significant benefits (Significantly slower, higher code complexity,
> weird behaviour around multiple SRFs) gained by removing the previous
> implementation.   I'd really like to see some examples of when all of
> this is useful - I've yet to see a realistic one that's not just as
> easily written differently
> 
> 
>> Can you give an example of such a query which has no intuitive
>> meaning?  Perhaps I am not thinking about the right kind of queries.
>> I have been thinking about examples like:
>> 
>> SELECT x, CASE WHEN y THEN generate_series(1,z) ELSE 5 END
>>  FROM table_with_columns_x_and_y_and_z;
>> 
>> Which to me gives 'z' output rows per table row where y is true, and
>> one output row per table row where y is false.
> 
> Try any query that has one SRF outside of the CASE, and one inside.  In
> the old behaviour that'll make the total number of rows returned nearly
> undeterministic because of the least-common-multiple behaviour.
> 
>> That could be changed with an aggregate function such as:
>> SELECT x, CASE WHEN y THEN SUM(generate_series(1,z)) ELSE 5 END
>>  FROM table_with_columns_x_and_y;
> 
> That query doesn't work.  First off, aggregates don't take set arguments
> (neither in old nor new releases), secondly aggregates are evaluated
> independently of CASE/COALESCE statements, thirdly you're missing group
> bys.  Those all are independent of the v10 changes.

Sorry, I was not clear.  What I meant to get at was that if you remove from the
executor all support for SRFs inside case statements, you might foreclose the 
option
of extending the syntax at some later date to allow aggregates over SRFs.  I'm
not saying that this works currently, but in principle if you allowed that 
SUM() that
I put up there, you'd get back exactly one row from it, same as you get from the
ELSE clause.  That would seem to solve the problem without going so far as
completely disallowing the SRF altogether.  The reasons for not putting a GROUP 
BY
clause in the example are (a) I don't know where it would go, syntactically, 
and (b)
it would defeat the purpose, because once you are grouping, you again have the
possibility of getting more than one row.

I'm not advocating implementing this; I'm just speculating about possible future
features in which SRFs inside a case statement might be allowed.

> 
>> Thanks, and my apologies if I am merely lacking sufficient imagination to
>> think of a proper example.
> 
> Might be worthwhile to reread the thread about the SRF reimplementation.
> 
> https://www.postgresql.org/message-id/20160822214023.aaxz5l4igypow...@alap3.anarazel.de

This helps, thanks!

Mark Dilger



-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] PostgreSQL 10 changes in exclusion constraints - did something change? CASE WHEN behavior oddity


> On Jun 4, 2017, at 11:55 AM, Mark Dilger <hornschnor...@gmail.com> wrote:
> 
> SELECT x, CASE WHEN y THEN SUM(generate_series(1,z)) ELSE 5 END
>   FROM table_with_columns_x_and_y;

Sorry, this table is supposed to be the same as the previous one,

table_with_columns_x_and_y_and_z



-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] PostgreSQL 10 changes in exclusion constraints - did something change? CASE WHEN behavior oddity


> On Jun 2, 2017, at 8:11 PM, Andres Freund <and...@anarazel.de> wrote:
> 
> Hi,
> 
> 
> On 2017-06-02 22:53:00 -0400, Tom Lane wrote:
>> I think you've got enough on your plate.  I can take care of whatever
>> we decide to do here.
> 
> Thanks.
> 
> 
>> Andres Freund <and...@anarazel.de> writes:
>>>> Another possibility is to say that we've broken this situation
>>>> irretrievably and we should start throwing errors for SRFs in
>>>> places where they'd be conditionally evaluated.  That's not real
>>>> nice perhaps, but it's better than the way things are right now.
>> 
>>> I'd be ok with that too, but I don't really see a strong need so far.
>> 
>> The argument for this way is basically that it's better to break
>> apps visibly than silently.
> 
> Right, I got that.
> 
> 
>> The behavior for SRF-inside-CASE is
>> not going to be the same as before even if we implement the fix
>> I suggest above, and it's arguable that this new behavior is not
>> at all intuitive.
> 
> Yea, I'm not a big fan of the either the pre v10 or the v10 behaviour of
> SRFs inside coalesce/case.  Neither is really resonable imo - I'm not
> sure a reasonable behaviour even exists.  IIRC I'd argued in the
> original SRF thread that we should just throw an error, and I think we'd
> concluded that we'd not do so for now.

I am trying to get my head around the type of query you and Tom
are discussing.  When you say you are unsure a reasonable behavior
even exists, are you saying such queries have no intuitive meaning?

Can you give an example of such a query which has no intuitive
meaning?  Perhaps I am not thinking about the right kind of queries.
I have been thinking about examples like:

SELECT x, CASE WHEN y THEN generate_series(1,z) ELSE 5 END
FROM table_with_columns_x_and_y_and_z;

Which to me gives 'z' output rows per table row where y is true, and
one output row per table row where y is false.  That could be changed
with an aggregate function such as:

SELECT x, CASE WHEN y THEN SUM(generate_series(1,z)) ELSE 5 END
FROM table_with_columns_x_and_y;

Which to me gives one output row per table row regardless of whether y
is true.


Thanks, and my apologies if I am merely lacking sufficient imagination to
think of a proper example.

Mark Dilger


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] logical replication - still unstable after all these months

2017-06-02 Thread Mark Kirkwood


On 03/06/17 11:10, Petr Jelinek wrote:


On 02/06/17 22:29, Petr Jelinek wrote:

On 02/06/17 08:55, Mark Kirkwood wrote:

On 02/06/17 17:11, Erik Rijkers wrote:


On 2017-06-02 00:46, Mark Kirkwood wrote:

On 31/05/17 21:16, Petr Jelinek wrote:

I'm seeing a new failure with the patch applied - this time the
history table has missing rows. Petr, I'll put back your access :-)

Is this error during 1-minute runs?

I'm asking because I've moved back to longer (1-hour) runs (no errors
so far), and I'd like to keep track of what the most 'vulnerable'
parameters are.


Yeah, still using your test config (with my minor modifications).

When I got the error the 1st time, I did a complete make clean and
rebuildbut it is still possible I've 'done it wrong' - so
independent confirmation would be good!

Well, I've seen this issue as well while I was developing the fix, but
the patch I proposed fixed it for me as well as the original issue.


While I was testing something for different thread I noticed that I
manage transactions incorrectly in this patch. Fixed here, I didn't test
it much yet (it takes a while as you know :) ). Not sure if it's related
to the issue you've seen though.


Ok - I've applied this version, and running tests again. I needed to do 
a git pull to apply the patch, so getting some other changes too!


Cheers

Mark



--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] pg_class.relpartbound definition overly brittle

2017-06-02 Thread Mark Dilger


> On Jun 2, 2017, at 9:59 AM, Tom Lane <t...@sss.pgh.pa.us> wrote:
> 
> Robert Haas <robertmh...@gmail.com> writes:
>> Also, you're attacking a straw man. Accidentally storing a meaningless
>> parse location in the catalog isn't a feature, and we shouldn't
>> pretend otherwise.
> 
> It's not meaningless, and it is a feature, useful for debugging.
> Otherwise you'd have a hard time telling one occurrence of e.g. "+" from
> another when you're trying to make sense of a stored tree.  We worked out
> all this behavior ages ago for other expression node trees that are stored
> in the catalogs (default expressions, index expressions, check
> constraints, etc etc) and I don't see a reason for partition expressions
> to be different.

Ok, that makes more sense now.  Thanks for clarifying the history of how
and why the location field from parsing is getting stored.

> I agree that this means you can't just strcmp a couple of stored node tree
> strings to decide if they're equal, but you couldn't anyway --- a moment's
> perusal of equalfuncs.c will show you other special cases, each with their
> own rationale.  Maybe you'd like to complain that every one of those
> rationales is wrongheaded but I do not think you will get far.
> 
> I think the best advice for Mark is to look at pg_get_expr() output and
> see if that matches.

Yes, I'm already doing that based on the discussion so far.

Mark Dilger


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] logical replication - still unstable after all these months

2017-06-02 Thread Mark Kirkwood


On 02/06/17 17:11, Erik Rijkers wrote:


On 2017-06-02 00:46, Mark Kirkwood wrote:

On 31/05/17 21:16, Petr Jelinek wrote:

I'm seeing a new failure with the patch applied - this time the
history table has missing rows. Petr, I'll put back your access :-)


Is this error during 1-minute runs?

I'm asking because I've moved back to longer (1-hour) runs (no errors 
so far), and I'd like to keep track of what the most 'vulnerable' 
parameters are.




Yeah, still using your test config (with my minor modifications).

When I got the error the 1st time, I did a complete make clean and 
rebuildbut it is still possible I've 'done it wrong' - so 
independent confirmation would be good!


Cheers

Mark


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] logical replication - still unstable after all these months

2017-06-01 Thread Mark Kirkwood


On 31/05/17 21:16, Petr Jelinek wrote:


On 29/05/17 23:06, Mark Kirkwood wrote:

On 29/05/17 23:14, Petr Jelinek wrote:


On 29/05/17 03:33, Jeff Janes wrote:


What would you want to look at?  Would saving the WAL from the master be
helpful?


Useful info is, logs from provider (mainly the logical decoding logs
that mention LSNs), logs from subscriber (the lines about when sync
workers finished), contents of the pg_subscription_rel (with srrelid
casted to regclass so we know which table is which), and pg_waldump
output around the LSNs found in the logs and in the pg_subscription_rel
(+ few lines before and some after to get context). It's enough to only
care about LSNs for the table(s) that are out of sync.


I have a run that aborted with failure (accounts table md5 mismatch).
Petr - would you like to have access to the machine ? If so send me you
public key and I'll set it up.

Thanks to Mark's offer I was able to study the issue as it happened and
found the cause of this.

The busy loop in apply stops at the point when worker shmem state
indicates that table synchronization was finished, but that might not be
visible in the next transaction if it takes long to flush the final
commit to disk so we might ignore couple of transactions for given table
in the main apply because we think it's still being synchronized. This
also explains why I could not reproduce it on my testing machine (fast
ssd disk array where flushes were always fast) and why it happens
relatively rarely because it's one specific commit during the whole
synchronization process that needs to be slow.

So as solution I changed the busy loop in the apply to wait for
in-catalog status rather than in-memory status to make sure things are
really there and flushed.

While working on this I realized that the handover itself is bit more
complex than necessary (especially for debugging and for other people
understanding it) so I did some small changes as part of this patch to
make the sequences of states table goes during synchronization process
to always be the same. This might cause unnecessary update per one table
synchronization process in some cases but that seems like small enough
price to pay for clearer logic. And it also fixes another potential bug
that I identified where we might write wrong state to catalog if main
apply crashed while sync worker was waiting for status update.

I've been running tests on this overnight on another machine where I was
able to reproduce  the original issue within few runs (once I found what
causes it) and so far looks good.





I'm seeing a new failure with the patch applied - this time the history 
table has missing rows. Petr, I'll put back your access :-)


regards

Mark



--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] pg_class.relpartbound definition overly brittle

2017-06-01 Thread Mark Dilger

> On Jun 1, 2017, at 6:31 AM, Tom Lane <t...@sss.pgh.pa.us> wrote:
> 
> Mark Dilger <hornschnor...@gmail.com> writes:
>> When you guys commit changes that impact partitioning, I notice, and change
>> my code to match.  But in this case, it seemed to me the change that got
>> committed was not thought through, and it might benefit the community for
>> me to point it out, rather than quietly make my code behave the same as
>> what got committed.
> 
> Let me explain the project standards in words of one syllable: user code
> should not examine the contents of node trees.  That's what pg_get_expr
> is for.  There is not, never has been, and never will be any guarantee
> that we won't whack those structures around in completely arbitrary ways,
> as long as we do a catversion bump along with it.

I have already dealt with this peculiarity and don't care much at this point, 
but
I think your characterization of my position is inaccurate:

I'm really not talking about examining the contents of node trees.  I'm only
talking about comparing two of them for equality, and getting false as an 
answer,
when they are actually equal.  Asking "Is A == B" is not "examining the 
contents",
it is asking the project supplied comparator function to examine the contents,
which is on par with asking the project supplied pg_get_expr function to do so.
There is currently only one production in gram.y in the public sources that
creates partitions, so you don't notice the relpartbound being dependent on
which grammar production defined the table.  I notice, and think it is an odd
thing to encode in the the relpartbound field.  Bumping the catversion is 
irrelevant
if you have two or more different productions in gram.y that give rise to these
field values.  Apparently, you don't care.  Fine by me.  It just seemed to me an
oversight when I first encountered it, rather than something anybody would
intentionally commit to the sources.

Mark Dilger

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] pg_class.relpartbound definition overly brittle

> On May 31, 2017, at 6:32 PM, Amit Langote <langote_amit...@lab.ntt.co.jp> 
> wrote:
> 
> On 2017/06/01 4:50, Robert Haas wrote:
>> On Wed, May 31, 2017 at 3:40 PM, Mark Dilger <hornschnor...@gmail.com> wrote:
>>> recent changes have introduced the :location field to the partboundspec
>>> in pg_catalog.pg_class.  This means that if two identical tables with
>>> identical partitioning scheme are created, but one is done before a change
>>> to gram.y, and the other after a change to gram.y, the relpartbound fields
>>> for those two tables could show up as different.
>>> 
>>> Can we perhaps remove the :location field here?  If not, could somebody
>>> please defend why this belongs in the catalog entries for the table?  Sorry
>>> if I am missing something obvious...
> 
> I now think it's kind of annoying too, although not exactly unprecedented
> as others have pointed out.  As you well might know, the location field in
> the parse node is to position the error cursor at the correct point in the
> erring command text. 

I knew about the location field already, but not that it was already occurring
elsewhere in the catalogs.  I just never paid attention to any of the columns
that are doing so.  Alvaro's criticisms of my complaint were quite informative.
(Thanks, Alvaro.)  I think standardizing the location field to -1 at some point
in all such places would be a good idea, though I am not sufficiently qualified
to say if that should be in 10 or 11, nor whether doing so might cause
backwards compatibility issues, nor whether those would be too much cost
to justify the changes.

> By the way, didn't you first have to come across that?  The commit
> (80f583ffe93) that introduced location field to partboundspec also bumped
> up the catalog version, so you would have to reinitialize the database
> directory anyway.

I don't have my work in production yet.  Each time I merge changes into
my repository, I run "make check-world", and that of course re-initializes
the data directories for the test databases.

Mark Dilger

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] pg_class.relpartbound definition overly brittle

> On May 31, 2017, at 4:00 PM, Alvaro Herrera <alvhe...@2ndquadrant.com> wrote:
> 
> Mark Dilger wrote:
> 
>>>>
>>>>relpartbound
>>>>  | 
>>>>
>>>> relpartbound   
>>>> | ?column? | ?column?
>>>> ++--+--
>>>> {PARTITIONBOUNDSPEC :strategy l :listdatums ({CONST :consttype 23000 
>>>> :consttypmod -1 :constcollid 0 :constlen 2 :constbyval true :constisnull 
>>>> false :location -1 :constvalue 2 [ 0 0 0 0 0 0 0 0 ]}) :lowerdatums <> 
>>>> :upperdatums <> :location 82} | {PARTITIONBOUNDSPEC :strategy l 
>>>> :listdatums ({CONST :consttype 23000 :consttypmod -1 :constcollid 0 
>>>> :constlen 2 :constbyval true :constisnull false :location -1 :constvalue 2 
>>>> [ 0 0 0 0 0 0 0 0 ]}) :lowerdatums <> :upperdatums <> :location 73} | f
>>>> | f  
>>>> (1 row)
> 
>> I concede that mitigates the problem somewhat, though I still think a user 
>> may look
>> at pg_class, see there is a column that appears to show the partition 
>> boundaries,
>> and then decide to check whether two tables have the same partition 
>> boundaries
>> by comparing those fields, without passing them first through pg_get_expr(), 
>> a
>> function they may never have heard of.
> 
> How do you expect that the used would actually interpret the above
> weird-looking value?  There's no way to figure out what operations are
> being done in that ugly blob of text.

I don't know how the average user would use it.  I can only tell you what
I am doing, which may be on the fringe of what users do typically.

I have modified the system to add a number of catalog tables not normally
present in postgres.  A few of those catalog tables are partitioned.  Since
they have to be set up at bootstrap time, I can't use the CREATE TABLE
syntax in some src/backend/catalog/*.sql file to create them, I have to
create them with header files, genbki.pl and friends.  There is no support
for this, so I created my own.  That puts me at risk of getting out of sync
with the public sources implementation of partitioning.  As such, I have
regression tests that create at run time partitioned tables that have all the
same columns and boundaries as my catalog tables, and I compare the
pg_class entries against each other to make sure there are no inconsistencies.
When you guys commit changes that impact partitioning, I notice, and change
my code to match.  But in this case, it seemed to me the change that got
committed was not thought through, and it might benefit the community for
me to point it out, rather than quietly make my code behave the same as
what got committed.

I doubt too many people will follow in my footsteps here, but other people
may hit this :location peculiarity for other reasons.

Once again, this is just to give you context about why I said anything in the
first place.

> I suspect it would be better to reduce the location field to -1 as
> proposed, though, since the location has no actual meaning once the node
> is stored standalone rather than as part of a larger command.  However
> it's clear that we're not consistent about doing this elsewhere.

I have no opinion about whether this should be done for 10.0, given the
release schedule.  Changing it for 10.0 or 11.0 seems reasonable to me.

Mark Dilger

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] pg_class.relpartbound definition overly brittle


> On May 31, 2017, at 3:42 PM, Andres Freund <and...@anarazel.de> wrote:
> 
> On 2017-05-31 15:38:58 -0700, Mark Dilger wrote:
>>> Normal users aren't going to make sense of node trees in the first
>>> place.  You should use pg_get_expr for it:
>>> postgres[3008][1]=# SELECT pg_get_expr(relpartbound, oid) FROM pg_class 
>>> WHERE relpartbound IS NOT NULL;
>>> ┌──┐
>>> │ pg_get_expr  │
>>> ├──┤
>>> │ FOR VALUES IN (1, 2) │
>>> └──┘
>>> (1 row)
>> 
>> I concede that mitigates the problem somewhat, though I still think a user 
>> may look
>> at pg_class, see there is a column that appears to show the partition 
>> boundaries,
>> and then decide to check whether two tables have the same partition 
>> boundaries
>> by comparing those fields, without passing them first through pg_get_expr(), 
>> a
>> function they may never have heard of.
>> 
>> To me, it seems odd to immortalize a SQL parsing field in the catalog 
>> definition of
>> the relation, but perhaps that's just my peculiar sensibilities.  If the 
>> community is more
>> on your side, I'm not going to argue it.
> 
> Given that various other node trees stored in the catalog also have
> location fields, about which I recall no complaints, I don't think this
> is a significant issue.  Nor something that I think makes sense to solve
> in isolation, without tackling all stored expressions.

Ok.  Just to clarify, I didn't come up with this problem as part of some general
code review.  I merged the recent changes into my tree, and my regression
tests broke.  That's fine; that's what they are there to do.  Break, and in so 
doing,
alert me to the fact that the project has changed things that will require me to
make modifications.  It just seemed strange to me that I should be changing 
stuff
to try to get the :location field to be equal in my code to the :location field 
in the
public sources, since it seems to serve no purpose.

Mark Dilger

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] pg_class.relpartbound definition overly brittle


> On May 31, 2017, at 3:17 PM, Andres Freund <and...@anarazel.de> wrote:
> 
> On 2017-05-31 15:06:06 -0700, Mark Dilger wrote:
>> That's cold comfort, given that most users will be looking at the pg_class
>> table and not writing C code that compares Node objects.  I wrote a bit of
>> regression test logic that checks, and sure enough the relpartbound field
>> shows up as unequal:
>>  
>>   relpartbound   
>>  
>> SELECT a.relpartbound, b.relpartbound, a.relpartbound = b.relpartbound, 
>> a.relpartbound::text = b.relpartbound::text
>>FROM pg_class a, pg_class b
>>WHERE a.relname = 'acct_partitioned_1'
>>  AND b.relname = 'acct_partitioned_2';
>>  
>>   relpartbound   
>>   |  
>>   
>> relpartbound   | 
>> ?column? | ?column?
>> ++--+--
>> {PARTITIONBOUNDSPEC :strategy l :listdatums ({CONST :consttype 23000 
>> :consttypmod -1 :constcollid 0 :constlen 2 :constbyval true :constisnull 
>> false :location -1 :constvalue 2 [ 0 0 0 0 0 0 0 0 ]}) :lowerdatums <> 
>> :upperdatums <> :location 82} | {PARTITIONBOUNDSPEC :strategy l :listdatums 
>> ({CONST :consttype 23000 :consttypmod -1 :constcollid 0 :constlen 2 
>> :constbyval true :constisnull false :location -1 :constvalue 2 [ 0 0 0 0 0 0 
>> 0 0 ]}) :lowerdatums <> :upperdatums <> :location 73} | f| f  
>> (1 row)
> 
> Normal users aren't going to make sense of node trees in the first
> place.  You should use pg_get_expr for it:
> postgres[3008][1]=# SELECT pg_get_expr(relpartbound, oid) FROM pg_class WHERE 
> relpartbound IS NOT NULL;
> ┌──┐
> │ pg_get_expr  │
> ├──┤
> │ FOR VALUES IN (1, 2) │
> └──┘
> (1 row)

I concede that mitigates the problem somewhat, though I still think a user may 
look
at pg_class, see there is a column that appears to show the partition 
boundaries,
and then decide to check whether two tables have the same partition boundaries
by comparing those fields, without passing them first through pg_get_expr(), a
function they may never have heard of.

To me, it seems odd to immortalize a SQL parsing field in the catalog 
definition of
the relation, but perhaps that's just my peculiar sensibilities.  If the 
community is more
on your side, I'm not going to argue it.

Mark Dilger



-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] pg_class.relpartbound definition overly brittle


> On May 31, 2017, at 2:49 PM, Alvaro Herrera <alvhe...@2ndquadrant.com> wrote:
> 
> Mark Dilger wrote:
>> Hackers,
>> 
>> recent changes have introduced the :location field to the partboundspec
>> in pg_catalog.pg_class.  This means that if two identical tables with
>> identical partitioning scheme are created, but one is done before a change
>> to gram.y, and the other after a change to gram.y, the relpartbound fields
>> for those two tables could show up as different.
> 
> Actually, if you look at equalfuncs.c, you'll note that locations are
> not really compared:
> 
> /* Compare a parse location field (this is a no-op, per note above) */
> #define COMPARE_LOCATION_FIELD(fldname) \
>   ((void) 0)
> 
> where the referenced note is:
> 
> * NOTE: it is intentional that parse location fields (in nodes that have
> * one) are not compared.  This is because we want, for example, a variable
> * "x" to be considered equal() to another reference to "x" in the query.

That's cold comfort, given that most users will be looking at the pg_class
table and not writing C code that compares Node objects.  I wrote a bit of
regression test logic that checks, and sure enough the relpartbound field
shows up as unequal:

relpartbound

SELECT a.relpartbound, b.relpartbound, a.relpartbound = b.relpartbound, 
a.relpartbound::text = b.relpartbound::text
FROM pg_class a, pg_class b
WHERE a.relname = 'acct_partitioned_1'
  AND b.relname = 'acct_partitioned_2';

relpartbound

|   
 relpartbound   

 | ?column? | ?column?
++--+--
 {PARTITIONBOUNDSPEC :strategy l :listdatums ({CONST :consttype 23000 
:consttypmod -1 :constcollid 0 :constlen 2 :constbyval true :constisnull false 
:location -1 :constvalue 2 [ 0 0 0 0 0 0 0 0 ]}) :lowerdatums <> :upperdatums 
<> :location 82} | {PARTITIONBOUNDSPEC :strategy l :listdatums ({CONST 
:consttype 23000 :consttypmod -1 :constcollid 0 :constlen 2 :constbyval true 
:constisnull false :location -1 :constvalue 2 [ 0 0 0 0 0 0 0 0 ]}) 
:lowerdatums <> :upperdatums <> :location 73} | f| f  
(1 row)





-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

[HACKERS] pg_class.relpartbound definition overly brittle

Hackers,

recent changes have introduced the :location field to the partboundspec
in pg_catalog.pg_class.  This means that if two identical tables with
identical partitioning scheme are created, but one is done before a change
to gram.y, and the other after a change to gram.y, the relpartbound fields
for those two tables could show up as different.

Can we perhaps remove the :location field here?  If not, could somebody
please defend why this belongs in the catalog entries for the table?  Sorry
if I am missing something obvious...

Thanks,

Mark Dilger


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Use of non-restart-safe storage by temp_tablespaces


> On May 31, 2017, at 7:58 AM, Bruce Momjian <br...@momjian.us> wrote:
> 
> On Wed, May 31, 2017 at 07:53:22AM -0700, Mark Dilger wrote:
>>> Just to clarify, the TEMPORARY clause would allow the tablespace to
>>> start up empty, while normal tablespaces can't do that, right?  One big
>>> problem is that we don't have any way to prevent non-temporary
>>> tablespaces from being created on transient storage.  I wonder if we
>>> should document this restriction, but it seems awkward to do.
>> 
>> It depends what you mean by allowing the tablespace to start up empty.
>> It must not be empty once users can run queries, since the catalogs will 
>> still
>> have entries for the tablespace and its dependent objects.  So, what must
>> happen is that during startup the tablespace and its temporary tables and
>> indexes get recreated if they are missing.
> 
> Uh, I thought only the sessions that created the temporary objects could
> see them, and since they are not in WAL and autovacuum can't see them,
> their non-existence in a temporary tablespace would not be a problem.

You are correct.  I was thinking about an extension to allow unlogged
tablespaces on temporary filesystems, but got the words "unlogged" and
"temporary" mixed up in my thinking and in what I wrote.  I should have
written that unlogged tablespaces would only host unlogged tables and
unlogged indexes, such that users are not surprised to find their data
missing.

On reflection, I think both features are worthwhile, and not at all exclusive
of each other, though unlogged tablespaces is probably considerably more
work to implement.

Mark Dilger

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Use of non-restart-safe storage by temp_tablespaces


> On May 31, 2017, at 6:04 AM, Bruce Momjian <br...@momjian.us> wrote:
> 
> On Wed, May 31, 2017 at 07:53:53AM -0400, Robert Haas wrote:
>> On Tue, May 30, 2017 at 6:50 PM, Mark Dilger <hornschnor...@gmail.com> wrote:
>>>> On May 29, 2017, at 11:53 AM, Bruce Momjian <br...@momjian.us> wrote:
>>>> Right now we don't document that temp_tablespaces can use
>>>> non-restart-safe storage, e.g. /tmp, ramdisks.  Would this be safe?
>>>> Should we document this?
>>> 
>>> The only safe way to do temporary tablespaces that I have found is to extend
>>> the grammar to allow CREATE TEMPORARY TABLESPACE, and then refuse
>>> to allow the creation of any non-temporary table (or index on same) in that
>>> tablespace.  Otherwise, it is too easy to be surprised to discover that your
>>> table contents have gone missing.
>> 
>> I think this would be a sensible approach.
> 
> Just to clarify, the TEMPORARY clause would allow the tablespace to
> start up empty, while normal tablespaces can't do that, right?  One big
> problem is that we don't have any way to prevent non-temporary
> tablespaces from being created on transient storage.  I wonder if we
> should document this restriction, but it seems awkward to do.

It depends what you mean by allowing the tablespace to start up empty.
It must not be empty once users can run queries, since the catalogs will still
have entries for the tablespace and its dependent objects.  So, what must
happen is that during startup the tablespace and its temporary tables and
indexes get recreated if they are missing.

Mark Dilger

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Use of non-restart-safe storage by temp_tablespaces

2017-05-30 Thread Mark Dilger

> On May 29, 2017, at 11:53 AM, Bruce Momjian <br...@momjian.us> wrote:
> 
> Right now we don't document that temp_tablespaces can use
> non-restart-safe storage, e.g. /tmp, ramdisks.  Would this be safe? 
> Should we document this?

The only safe way to do temporary tablespaces that I have found is to extend
the grammar to allow CREATE TEMPORARY TABLESPACE, and then refuse
to allow the creation of any non-tempoary table (or index on same) in that
tablespace.  Otherwise, it is too easy to be surprised to discover that your
table contents have gone missing.

If the project wanted to extend the grammar and behavior in this direction,
maybe temp_tablespaces would work that way?  I'm not so familiar with what
the temp_tablespaces GUC is for -- only ever implemented what I described
above.

Mark Dilger

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

[HACKERS] syscache entries out of order

2017-05-30 Thread Mark Dilger

Peter,

Just FYI, as of 665d1fad99e7b11678b0d5fa24d2898424243cd6, syscache.h
entries are not in alphabetical order, which violates the coding standard
specified in the comment for these entries.  In particular, it is the 
PUBLICATION
and SUBSCRIPTION entries that are mis-ordered.

Sorry for my pedantry.  Patch attached.

Mark Dilger



syscache.patch.0
Description: Binary data

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] GSoC 2017: Foreign Key Arrays

2017-05-29 Thread Mark Rofail

>
> rhaas=# select oid, * from pg_opfamily where opfmethod = 2742;
>  oid  | opfmethod |opfname | opfnamespace | opfowner
> --+---++--+--
>  2745 |  2742 | array_ops  |   11 |   10
>  3659 |  2742 | tsvector_ops   |   11 |   10
>  4036 |  2742 | jsonb_ops  |   11 |   10
>  4037 |  2742 | jsonb_path_ops |   11 |   10
> (4 rows)

I am particulary intrested in array_ops but I have failed in locating the
code behind it. Where is it reflected in the source code

Best Regards,
Mark Rofail

Re: [HACKERS] logical replication - still unstable after all these months

2017-05-29 Thread Mark Kirkwood


On 29/05/17 23:14, Petr Jelinek wrote:


On 29/05/17 03:33, Jeff Janes wrote:


What would you want to look at?  Would saving the WAL from the master be
helpful?


Useful info is, logs from provider (mainly the logical decoding logs
that mention LSNs), logs from subscriber (the lines about when sync
workers finished), contents of the pg_subscription_rel (with srrelid
casted to regclass so we know which table is which), and pg_waldump
output around the LSNs found in the logs and in the pg_subscription_rel
(+ few lines before and some after to get context). It's enough to only
care about LSNs for the table(s) that are out of sync.



I have a run that aborted with failure (accounts table md5 mismatch). 
Petr - would you like to have access to the machine ? If so send me you 
public key and I'll set it up.


Cheers

Mark


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] logical replication - still unstable after all these months




On 29/05/17 13:33, Jeff Janes wrote:
On Sun, May 28, 2017 at 3:17 PM, Mark Kirkwood 
<mark.kirkw...@catalyst.net.nz <mailto:mark.kirkw...@catalyst.net.nz>> 
wrote:


On 28/05/17 19:01, Mark Kirkwood wrote:


So running in cloud land now...so for no errors - will update.




The framework ran 600 tests last night, and I see 3 'NOK' results,
i.e 3 failed test runs (all scale 25 and 8 pgbench clients). Given
the way the test decides on failure (gets tired of waiting for the
table md5's to match) - it begs the question 'What if it had
waited a bit longer'? However from what I can see in all cases:

- the rowcounts were the same in master and replica
- the md5 of pgbench_accounts was different


All four tables should be wrong if there is still a transaction it is 
waiting for, as all the changes happen in a single transaction.




Yeah, my thought exactly.

I also got a failure, after 87 iterations of a similar test case.  It 
waited for hours, as mine requires manual intervention to stop 
waiting.  On the subscriber, one account still had a zero balance, 
while the history table on the subscriber agreed with both history and 
accounts on the publisher and the account should not have been zero, 
so definitely a transaction atomicity got busted.




Interesting, great that we are seeing the same thing.

I altered the script to also save the tellers and branches tables and 
repeated the runs, but so far it hasn't failed again in over 800 
iterations using the altered script.



...so does seem possible that there is some bug being tickled
here. Unfortunately the test framework blasts away the failed
tables and subscription and continues on...I'm going to amend it
to stop on failure so I can have a closer look at what happened.


What would you want to look at?  Would saving the WAL from the master 
be helpful?





Good question - I initially wanted to see if anything changed if I 
waited longer (which you have already figured out) and what was actually 
wrong with the accounts record (which you have gotten to as well)! Nice. 
After that, I'd thought of doing another transaction on an accounts 
record that lives in the same page as the 'gammy' one on the master - 
generally poke about and see what is happening :-)


regards

Mark


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] logical replication - still unstable after all these months


On 29/05/17 16:26, Erik Rijkers wrote:


On 2017-05-29 00:17, Mark Kirkwood wrote:

On 28/05/17 19:01, Mark Kirkwood wrote:


So running in cloud land now...so for no errors - will update.


The framework ran 600 tests last night, and I see 3 'NOK' results, i.e
3 failed test runs (all scale 25 and 8 pgbench clients). Given the way


Could you also give the params for the successful runs?


Scale 25, clients 90 and 64, scale 5 clients 90, 64 and 8 (i.e the 
defaults in the driver script).


Can you say anything about hardware?  (My experience is that older, 
slower, 'worse' hardware makes for more fails.)


It's running in a openstack cloud (so is a libvirt guest): 4 cpus, 4 GB 
ram and 2 disks: one for each Postgres instance, both formatted xfs. Hmm 
so maybe I should run a VM on my workstation and crank the IOPS limit 
way down...in the meantime I'll just let it run :-)



Many thanks, by the way.  I'm glad that it turns out I'm probably not 
doing something uniquely stupid (although I'm not glad that there 
seems to be a bug, and an elusive one at that)





Yeah looks like something subtle :-( Hopefully now its out in the open 
we'll all figure it together!


regards

Mark


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] logical replication - still unstable after all these months


On 28/05/17 19:01, Mark Kirkwood wrote:



So running in cloud land now...so for no errors - will update.





The framework ran 600 tests last night, and I see 3 'NOK' results, i.e 3 
failed test runs (all scale 25 and 8 pgbench clients). Given the way the 
test decides on failure (gets tired of waiting for the table md5's to 
match) - it begs the question 'What if it had waited a bit longer'? 
However from what I can see in all cases:


- the rowcounts were the same in master and replica
- the md5 of pgbench_accounts was different

...so does seem possible that there is some bug being tickled here. 
Unfortunately the test framework blasts away the failed tables and 
subscription and continues on...I'm going to amend it to stop on failure 
so I can have a closer look at what happened.


regards

Mark


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] logical replication - still unstable after all these months