Re: [HACKERS] [POC] hash partitioning
On Fri, May 12, 2017 at 6:08 PM, amul sul wrote:
> Hi,
>
> Please find the following updated patches attached:

I have done some testing with the new patch; most of the cases worked as
expected, except the one below.  I expect the planner to select only "Seq
Scan on t1", whereas it's scanning both partitions:

create table t (a int, b varchar) partition by hash(a);
create table t1 partition of t for values with (modulus 8, remainder 0);
create table t2 partition of t for values with (modulus 8, remainder 1);

postgres=# explain select * from t where a=8;
                         QUERY PLAN
------------------------------------------------------------
 Append  (cost=0.00..51.75 rows=12 width=36)
   ->  Seq Scan on t1  (cost=0.00..25.88 rows=6 width=36)
         Filter: (a = 8)
   ->  Seq Scan on t2  (cost=0.00..25.88 rows=6 width=36)
         Filter: (a = 8)
(5 rows)

Some cosmetic comments.
---
+ RangeVar *rv = makeRangeVarFromNameList(castNode(List, nameEl->arg));
+

Useless hunk.

 /*
- * Build a CREATE SEQUENCE command to create the sequence object, and
- * add it to the list of things to be done before this CREATE/ALTER
- * TABLE.
+ * Build a CREATE SEQUENCE command to create the sequence object, and add
+ * it to the list of things to be done before this CREATE/ALTER TABLE.
  */

It seems that in src/backend/parser/parse_utilcmd.c you have changed the
existing code with pgindent.  I think it's not a good idea to mix pgindent
changes with your patch.

--
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Hash Functions
On Sat, May 13, 2017 at 1:08 AM, Robert Haas wrote:
> On Fri, May 12, 2017 at 2:45 PM, Tom Lane wrote:
>
> Maybe a shorter argument for hash partitioning is that not one but two
> different people proposed patches for it within months of the initial
> partitioning patch going in.  When multiple people are thinking about
> implementing the same feature almost immediately after the
> prerequisite patches land, that's a good clue that it's a desirable
> feature.  So I think we should try to solve the problems, rather than
> giving up.

Can we think of defining separate portable hash functions which can be
used for the purpose of hash partitioning?

--
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] multi-column range partition constraint
On Sat, May 13, 2017 at 11:01 AM, Robert Haas wrote:
> On Fri, May 12, 2017 at 12:46 PM, Robert Haas wrote:
>> On Fri, May 12, 2017 at 3:26 AM, Amit Langote wrote:
>>> Fixed.
>>
>> This seems to be the same patch version as last time, so none of the
>> things you mention as fixed are, in fact, fixed.

Really sorry about the mix-up.

> We are kind of running out of time here before beta1, but Amit thinks
> he can get the correct patch posted within about 24 hours, so I'm
> going to wait for that to happen rather than trying to revise his
> previous version myself.  Hence, next update tomorrow.  Argh.

Attached is the correct version.

Thanks,
Amit

0001-Emit-correct-range-partition-constraint-expression.patch
Description: Binary data

0002-Add-pg_get_partition_constraintdef.patch
Description: Binary data

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Hash Functions
On 2017-05-12 21:56:30 -0400, Robert Haas wrote:
> Cheap isn't free, though.  It's got a double-digit percentage overhead
> rather than a large-multiple-of-the-runtime overhead as triggers do,
> but people still won't want to pay it unnecessarily, I think.

That should be partially addressable with reasonable amounts of
engineering though.  Computing the target partition in a
hash-partitioned table can be implemented very efficiently, and adding
infrastructure for multiple bulk insert targets in COPY should be quite
doable as well.  It's also work that's generally useful, not just for
backups.

The bigger issue to me here wrt pg_dump is that partitions can be
restored in parallel, but that'd probably not work as well if they were
dumped separately.  Unless we'd do the re-routing on load, rather than
when dumping - which'd also increase cache locality, by most of the time
(same architecture/encoding/etc) having one backend insert into the same
partition.

Greetings,

Andres Freund

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Addition of pg_dump --no-publications
On Fri, May 12, 2017 at 10:19 PM, Peter Eisentraut wrote:
> On 5/11/17 21:59, Michael Paquier wrote:
>> On Thu, May 11, 2017 at 3:19 PM, Michael Paquier wrote:
>>> I imagine that pg_dump -s would be the basic operation that users
>>> would do first before creating a subscription on a secondary node,
>>> but what I find surprising is that publications are dumped by
>>> default.  I don't find it confusing that those are actually included
>>> by default, to be consistent with the way subscriptions are handled;
>>> what I find confusing is that there are no options to not dump them,
>>> and no options to bypass their restore.
>>>
>>> So, any opinions about having pg_dump/pg_restore --no-publications?
>>
>> And that's really a boring patch, giving the attached.
>
> committed

Thanks.
--
Michael

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] multi-column range partition constraint
On Fri, May 12, 2017 at 12:46 PM, Robert Haas wrote:
> On Fri, May 12, 2017 at 3:26 AM, Amit Langote wrote:
>> Fixed.
>
> This seems to be the same patch version as last time, so none of the
> things you mention as fixed are, in fact, fixed.

We are kind of running out of time here before beta1, but Amit thinks
he can get the correct patch posted within about 24 hours, so I'm going
to wait for that to happen rather than trying to revise his previous
version myself.  Hence, next update tomorrow.  Argh.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Hash Functions
On Fri, May 12, 2017 at 7:36 PM, David Fetter wrote:
> On Fri, May 12, 2017 at 06:38:55PM -0400, Peter Eisentraut wrote:
>> On 5/12/17 18:13, Alvaro Herrera wrote:
>>> I think for logical replication the tuple should appear as being in
>>> the parent table, not the partition.  No?
>>
>> Logical replication replicates base table to base table.  How those
>> tables are tied together into a partitioned table or an inheritance
>> tree is up to the system catalogs on each side.
>
> This seems like a totally reasonable approach to pg_dump, especially
> in light of the fact that logical replication already (and quite
> reasonably) does it this way.  Hard work has been done to make
> tuple-routing cheap, and this is one of the payoffs.

Cheap isn't free, though.  It's got a double-digit percentage overhead
rather than a large-multiple-of-the-runtime overhead as triggers do,
but people still won't want to pay it unnecessarily, I think.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] [PATCH v2] Progress command to monitor progression of long running SQL queries
On Wed, May 10, 2017 at 10:05 PM, Michael Paquier wrote:
> Regarding the patch, this is way too close to the progress facility
> already in place.  So why don't you extend it for the executor?

I don't think that's a good idea.  The existing progress facility
advertises a fixed number of counters with a command-type-specific
interpretation, but for *query* progress reporting, we really need a
richer model.  A query plan can contain an unbounded number of executor
nodes, and Remi's idea is, quite reasonably, to report something about
each one.

From a design point of view, I think a patch like this has to clear a
number of hurdles.  It needs to:

1. Decide what data to expose.  The sample output looks reasonable in
this case, although the amount of code churn looks really high.

2. Find a way to advertise the data that it exposes.  This patch looks
like it allocates a half-MB of shared memory per backend and gives up
if that's not enough.

3. Find a way to synchronize the access to the data that it exposes.
This patch ignores that problem completely, so multiple progress
report commands running at the same time will behave unpredictably.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] WITH clause in CREATE STATISTICS
Alvaro Herrera writes:
> Tom Lane wrote:
>> Hm.  That would behave less than desirably if getObjectClass() could
>> return a value that wasn't a member of the enum, but it's hard to
>> credit that happening.  I guess I'd vote for removing the default:
>> case from all of the places that have "switch (getObjectClass(object))",
>> as forgetting to add an entry seems like a much larger hazard.

> Ignoring the one in alter.c's AlterObjectNamespace_oid, which only
> handles a small subset of the enum values, that gives the following
> warnings:

> /pgsql/source/master/src/backend/catalog/dependency.c: In function 'doDeletion':
> /pgsql/source/master/src/backend/catalog/dependency.c:1106:2: warning: enumeration value 'OCLASS_ROLE' not handled in switch [-Wswitch]
>   switch (getObjectClass(object))
>   ^
> /pgsql/source/master/src/backend/catalog/dependency.c:1106:2: warning: enumeration value 'OCLASS_DATABASE' not handled in switch [-Wswitch]
> /pgsql/source/master/src/backend/catalog/dependency.c:1106:2: warning: enumeration value 'OCLASS_TBLSPACE' not handled in switch [-Wswitch]
> /pgsql/source/master/src/backend/catalog/dependency.c:1106:2: warning: enumeration value 'OCLASS_SUBSCRIPTION' not handled in switch [-Wswitch]

Hm.  I think it's intentional that shared objects are not handled there,
but I wonder whether the lack of SUBSCRIPTION represents a bug.  Anyway,
I think it would be fine to add those to the switch as explicit error
cases.

> /pgsql/source/master/src/backend/commands/tablecmds.c: In function 'ATExecAlterColumnType':
> /pgsql/source/master/src/backend/commands/tablecmds.c:9112:3: warning: enumeration value 'OCLASS_AM' not handled in switch [-Wswitch]
>    switch (getObjectClass(&foundObject))
>    ^
> /pgsql/source/master/src/backend/commands/tablecmds.c:9112:3: warning: enumeration value 'OCLASS_STATISTIC_EXT' not handled in switch [-Wswitch]
> /pgsql/source/master/src/backend/commands/tablecmds.c:9112:3: warning: enumeration value 'OCLASS_EVENT_TRIGGER' not handled in switch [-Wswitch]
> /pgsql/source/master/src/backend/commands/tablecmds.c:9112:3: warning: enumeration value 'OCLASS_PUBLICATION' not handled in switch [-Wswitch]
> /pgsql/source/master/src/backend/commands/tablecmds.c:9112:3: warning: enumeration value 'OCLASS_PUBLICATION_REL' not handled in switch [-Wswitch]
> /pgsql/source/master/src/backend/commands/tablecmds.c:9112:3: warning: enumeration value 'OCLASS_SUBSCRIPTION' not handled in switch [-Wswitch]
> /pgsql/source/master/src/backend/commands/tablecmds.c:9112:3: warning: enumeration value 'OCLASS_TRANSFORM' not handled in switch [-Wswitch]

All of those ought to go into the category that prints "unexpected
object depending on column", I think; certainly "unrecognized object
class" is not a better report, and the fact that we've evidently not
touched this switch in the last few additions of object types provides
fodder for the idea that the default: is unhelpful.  (Not to mention
that not having OCLASS_STATISTIC_EXT here is an outright bug as of
today...)  So losing the default: case here seems good to me.

Alternatively, we could drop the explicit listing of oclasses we're not
expecting to see, and let the default: case be "unexpected object
depending on column".  Treating the default separately is only of value
if we'd expect getObjectDescription to fail, but there's no good reason
for this code to be passing judgment on whether that might happen.

What do we want this to do about statistics objects?  We could either
throw a feature-not-implemented error or try to update the stats object
for the new column type.

			regards, tom lane

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Hash Functions
On Fri, May 12, 2017 at 06:38:55PM -0400, Peter Eisentraut wrote:
> On 5/12/17 18:13, Alvaro Herrera wrote:
>> I think for logical replication the tuple should appear as being in
>> the parent table, not the partition.  No?
>
> Logical replication replicates base table to base table.  How those
> tables are tied together into a partitioned table or an inheritance
> tree is up to the system catalogs on each side.

This seems like a totally reasonable approach to pg_dump, especially in
light of the fact that logical replication already (and quite
reasonably) does it this way.  Hard work has been done to make
tuple-routing cheap, and this is one of the payoffs.

Best,
David.
--
David Fetter  http://fetter.org/
Phone: +1 415 235 3778  AIM: dfetter666  Yahoo!: dfetter
Skype: davidfetter  XMPP: david(dot)fetter(at)gmail(dot)com

Remember to vote!
Consider donating to Postgres: http://www.postgresql.org/about/donate

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Hash Functions
On 5/12/17 18:13, Alvaro Herrera wrote:
> I think for logical replication the tuple should appear as being in the
> parent table, not the partition.  No?

Logical replication replicates base table to base table.  How those
tables are tied together into a partitioned table or an inheritance tree
is up to the system catalogs on each side.

--
Peter Eisentraut  http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Hash Functions
Peter Eisentraut wrote:
> On 5/12/17 14:23, Robert Haas wrote:
>> One alternative would be to change the way that we dump and restore
>> the data.  Instead of dumping the data with the individual partitions,
>> dump it all out for the parent and let tuple routing sort it out at
>> restore time.
>
> I think this could be a pg_dump option.  One way it dumps out the
> partitions, and another way it recomputes the partitions.  I think that
> could be well within pg_dump's mandate.

I was thinking the same, but enable that option automatically for hash
partitioning.  Each partition is still dumped separately, but instead of
referencing the specific partition in the TABLE DATA object, reference
the parent table.  This way, the restore can still be parallelized, but
tuple routing is executed for each tuple.

> (cough ... logical replication ... cough)

I think for logical replication the tuple should appear as being in the
parent table, not the partition.  No?

--
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] WITH clause in CREATE STATISTICS
Tom Lane wrote:
> Alvaro Herrera writes:
>> Tom Lane wrote:
>>> Oh, I see your patch also fixes missing code in getObjectDescription().
>>> We need that.  Is there a decent way to get the compiler to warn about
>>> that oversight?

>> We could remove the default clause.  That results in the expected
>> warning, and no others (i.e. the switch was already complete):

>> /pgsql/source/master/src/backend/catalog/objectaddress.c:2657:2: warning: enumeration value 'OCLASS_STATISTIC_EXT' not handled in switch [-Wswitch]

> Hm.  That would behave less than desirably if getObjectClass() could
> return a value that wasn't a member of the enum, but it's hard to
> credit that happening.  I guess I'd vote for removing the default:
> case from all of the places that have "switch (getObjectClass(object))",
> as forgetting to add an entry seems like a much larger hazard.

Ignoring the one in alter.c's AlterObjectNamespace_oid, which only
handles a small subset of the enum values, that gives the following
warnings:

/pgsql/source/master/src/backend/catalog/dependency.c: In function 'doDeletion':
/pgsql/source/master/src/backend/catalog/dependency.c:1106:2: warning: enumeration value 'OCLASS_ROLE' not handled in switch [-Wswitch]
  switch (getObjectClass(object))
  ^
/pgsql/source/master/src/backend/catalog/dependency.c:1106:2: warning: enumeration value 'OCLASS_DATABASE' not handled in switch [-Wswitch]
/pgsql/source/master/src/backend/catalog/dependency.c:1106:2: warning: enumeration value 'OCLASS_TBLSPACE' not handled in switch [-Wswitch]
/pgsql/source/master/src/backend/catalog/dependency.c:1106:2: warning: enumeration value 'OCLASS_SUBSCRIPTION' not handled in switch [-Wswitch]

/pgsql/source/master/src/backend/commands/tablecmds.c: In function 'ATExecAlterColumnType':
/pgsql/source/master/src/backend/commands/tablecmds.c:9112:3: warning: enumeration value 'OCLASS_AM' not handled in switch [-Wswitch]
   switch (getObjectClass(&foundObject))
   ^
/pgsql/source/master/src/backend/commands/tablecmds.c:9112:3: warning: enumeration value 'OCLASS_STATISTIC_EXT' not handled in switch [-Wswitch]
/pgsql/source/master/src/backend/commands/tablecmds.c:9112:3: warning: enumeration value 'OCLASS_EVENT_TRIGGER' not handled in switch [-Wswitch]
/pgsql/source/master/src/backend/commands/tablecmds.c:9112:3: warning: enumeration value 'OCLASS_PUBLICATION' not handled in switch [-Wswitch]
/pgsql/source/master/src/backend/commands/tablecmds.c:9112:3: warning: enumeration value 'OCLASS_PUBLICATION_REL' not handled in switch [-Wswitch]
/pgsql/source/master/src/backend/commands/tablecmds.c:9112:3: warning: enumeration value 'OCLASS_SUBSCRIPTION' not handled in switch [-Wswitch]
/pgsql/source/master/src/backend/commands/tablecmds.c:9112:3: warning: enumeration value 'OCLASS_TRANSFORM' not handled in switch [-Wswitch]

--
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] WITH clause in CREATE STATISTICS
Alvaro Herrera writes:
> Tom Lane wrote:
>> Oh, I see your patch also fixes missing code in getObjectDescription().
>> We need that.  Is there a decent way to get the compiler to warn about
>> that oversight?

> We could remove the default clause.  That results in the expected
> warning, and no others (i.e. the switch was already complete):

> /pgsql/source/master/src/backend/catalog/objectaddress.c:2657:2: warning: enumeration value 'OCLASS_STATISTIC_EXT' not handled in switch [-Wswitch]

Hm.  That would behave less than desirably if getObjectClass() could
return a value that wasn't a member of the enum, but it's hard to
credit that happening.  I guess I'd vote for removing the default:
case from all of the places that have "switch (getObjectClass(object))",
as forgetting to add an entry seems like a much larger hazard.

			regards, tom lane

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] WITH clause in CREATE STATISTICS
Tom Lane wrote:
> Alvaro Herrera writes:
>> Tom Lane wrote:
>>> Oh, sorry --- I already pushed something about this.

>> That's fine, thanks.

> Oh, I see your patch also fixes missing code in getObjectDescription().
> We need that.  Is there a decent way to get the compiler to warn about
> that oversight?

We could remove the default clause.  That results in the expected
warning, and no others (i.e. the switch was already complete):

/pgsql/source/master/src/backend/catalog/objectaddress.c:2657:2: warning: enumeration value 'OCLASS_STATISTIC_EXT' not handled in switch [-Wswitch]

--
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] WITH clause in CREATE STATISTICS
Alvaro Herrera writes:
> Tom Lane wrote:
>> Oh, sorry --- I already pushed something about this.

> That's fine, thanks.

Oh, I see your patch also fixes missing code in getObjectDescription().
We need that.  Is there a decent way to get the compiler to warn about
that oversight?

			regards, tom lane

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] WITH clause in CREATE STATISTICS
Tom Lane wrote:
> Alvaro Herrera writes:
>> Tom Lane wrote:
>>> Although I've not done anything about it here, I'm not happy about the
>>> handling of dependencies for stats objects.

>> Here are two patches regarding handling of dependencies.

> Oh, sorry --- I already pushed something about this.

That's fine, thanks.  I'm glad that this thread got you interested in
looking over the extended statistics stuff.  Thanks for the input.

--
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] WITH clause in CREATE STATISTICS
Alvaro Herrera writes:
> Tom Lane wrote:
>> Although I've not done anything about it here, I'm not happy about the
>> handling of dependencies for stats objects.

> Here are two patches regarding handling of dependencies.

Oh, sorry --- I already pushed something about this.

> The first one implements your first suggestion: add a NORMAL dependency
> on each column, and do away with RemoveStatisticsExt.  This works well
> and should be uncontroversial.

No, I wanted an AUTO dependency there, for exactly the reason you
mention:

> If we only do this, then DROP TABLE needs to say CASCADE if there's a
> statistics object in the table.

I don't think we really want that, do we?  A stats object can't be of
any value if the underlying table is gone.  Perhaps that calculus would
change for cross-table stats, but I don't see why.

> This seems pointless to me, so the second patch throws in an
> additional dependency on the table as a whole, AUTO this time, so that
> if the table is dropped, the statistics goes away without requiring
> cascade.  There is no point in forcing CASCADE for this case, ISTM.
> This works well too, but I admit there might be resistance to doing
> it.  OTOH this is how CREATE INDEX works.

Uh, no it isn't.  Indexes have auto dependencies on the individual
columns they index, and *not* on the table as a whole.

			regards, tom lane

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
[HACKERS] Patch to fix documentation about AFTER triggers
Here is a patch to amend the docs here:
https://www.postgresql.org/docs/devel/static/plpgsql-trigger.html

In the example for an AFTER trigger, you see this code:

    --
    -- Create a row in emp_audit to reflect the operation performed on emp,
    -- make use of the special variable TG_OP to work out the operation.
    --
    IF (TG_OP = 'DELETE') THEN
        INSERT INTO emp_audit SELECT 'D', now(), user, OLD.*;
        RETURN OLD;
    ELSIF (TG_OP = 'UPDATE') THEN
        INSERT INTO emp_audit SELECT 'U', now(), user, NEW.*;
        RETURN NEW;
    ELSIF (TG_OP = 'INSERT') THEN
        INSERT INTO emp_audit SELECT 'I', now(), user, NEW.*;
        RETURN NEW;
    END IF;
    RETURN NULL; -- result is ignored since this is an AFTER trigger

What are all those RETURNs doing in there?  The comment on the final
RETURN is correct, so returning NEW or OLD above seems confusing, and
likely a copy/paste error.  This patch just removes those three lines
from the example code.

Thanks!
Paul

diff --git a/doc/src/sgml/plpgsql.sgml b/doc/src/sgml/plpgsql.sgml
index a6088e9..dc29e7c 100644
--- a/doc/src/sgml/plpgsql.sgml
+++ b/doc/src/sgml/plpgsql.sgml
@@ -3999,13 +3999,10 @@ CREATE OR REPLACE FUNCTION process_emp_audit() RETURNS TRIGGER AS $emp_audit$
         --
         IF (TG_OP = 'DELETE') THEN
             INSERT INTO emp_audit SELECT 'D', now(), user, OLD.*;
-            RETURN OLD;
         ELSIF (TG_OP = 'UPDATE') THEN
             INSERT INTO emp_audit SELECT 'U', now(), user, NEW.*;
-            RETURN NEW;
         ELSIF (TG_OP = 'INSERT') THEN
             INSERT INTO emp_audit SELECT 'I', now(), user, NEW.*;
-            RETURN NEW;
         END IF;
         RETURN NULL; -- result is ignored since this is an AFTER trigger
     END;

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
[HACKERS] About forced dependencies on catversion.h
backend/access/transam/Makefile contains

# ensure that version checks in xlog.c get recompiled when catversion.h changes
xlog.o: xlog.c $(top_srcdir)/src/include/catalog/catversion.h

which, at the time it was added, was sufficient to ensure you could do
"make check" after bumping catversion and things would work, whether or
not you bother with --enable-depend.

It ain't sufficient anymore though.  First off, pg_rewind.c also has
compiled-in knowledge of CATALOG_VERSION_NO.  So if you try "make
check-world" after naively bumping catversion, the pg_rewind TAP tests
will blow up most spectacularly, because the backend will have been
rebuilt to use the new catversion while pg_rewind won't have.

I started this post with the intention of proposing that we add a
similar forced dependency for pg_rewind.o.  However, grepping revealed a
much bigger hazard, which is that CATALOG_VERSION_NO also factors into
TABLESPACE_VERSION_DIRECTORY, and there are dependencies on that in
about eight different files.  The scary thing there is that a naive test
run will appear to work as long as you didn't rebuild any of those
files; while if you caused a rebuild of just some of them, it's gonna be
a mess.

So what I now think is that we'd be better off removing the forced
dependency shown above.  It's just encouraging people to take unsafe
shortcuts.

			regards, tom lane

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] WITH clause in CREATE STATISTICS
Tom Lane wrote:
> Although I've not done anything about it here, I'm not happy about the
> handling of dependencies for stats objects.  I do not think that cloning
> RemoveStatistics as RemoveStatisticsExt is sane at all.  The former is
> meant to deal with cleaning up pg_statistic rows that we know will be
> there, so there's no real need to expend pg_depend overhead to track them.
> For objects that are only loosely connected, the right thing is to use
> the dependency system; in particular, this design has no real chance of
> working well with cross-table stats.  Also, it's really silly to have
> *both* this hard-wired mechanism and a pg_depend entry; the latter is
> surely redundant if you have the former.  IMO we should revert
> RemoveStatisticsExt and instead handle things by making stats objects
> auto-dependent on the individual column(s) they reference (not the whole
> table).
>
> I'm also of the opinion that having an AUTO dependency, rather than
> a NORMAL dependency, on the stats object's schema is the wrong semantics.
> There isn't any other case where you can drop a non-empty schema without
> saying CASCADE, and I'm mystified why this case should act that way.

Here are two patches regarding handling of dependencies.  The first one
implements your first suggestion: add a NORMAL dependency on each
column, and do away with RemoveStatisticsExt.  This works well and
should be uncontroversial.

If we only do this, then DROP TABLE needs to say CASCADE if there's a
statistics object in the table.  This seems pointless to me, so the
second patch throws in an additional dependency on the table as a whole,
AUTO this time, so that if the table is dropped, the statistics goes
away without requiring CASCADE.  There is no point in forcing CASCADE
for this case, ISTM.  This works well too, but I admit there might be
resistance to doing it.  OTOH this is how CREATE INDEX works.

(I considered what would happen with a stats object covering multiple
tables.  I think it's reasonable to drop the stats too in that case,
under the rationale that the object is no longer useful.  Not really
sure about this.)

--
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

>From 06fad759888d5060f9b4cf4d4f4fdec777350027 Mon Sep 17 00:00:00 2001
From: Alvaro Herrera
Date: Fri, 12 May 2017 17:16:26 -0300
Subject: [PATCH 1/2] fix dependencies problem

---
 src/backend/catalog/heap.c              | 74 -
 src/backend/catalog/objectaddress.c     | 20 +
 src/backend/commands/statscmds.c        | 18
 src/test/regress/expected/stats_ext.out | 13 --
 src/test/regress/sql/stats_ext.sql      |  9 ++--
 5 files changed, 45 insertions(+), 89 deletions(-)

diff --git a/src/backend/catalog/heap.c b/src/backend/catalog/heap.c
index ab3d83f29b..ba63939022 100644
--- a/src/backend/catalog/heap.c
+++ b/src/backend/catalog/heap.c
@@ -1615,10 +1615,7 @@ RemoveAttributeById(Oid relid, AttrNumber attnum)
 	heap_close(attr_rel, RowExclusiveLock);

 	if (attnum > 0)
-	{
 		RemoveStatistics(relid, attnum);
-		RemoveStatisticsExt(relid, attnum);
-	}

 	relation_close(rel, NoLock);
 }
@@ -1873,7 +1870,6 @@ heap_drop_with_catalog(Oid relid)
 	 * delete statistics
 	 */
 	RemoveStatistics(relid, 0);
-	RemoveStatisticsExt(relid, 0);

 	/*
 	 * delete attribute tuples
@@ -2784,76 +2780,6 @@ RemoveStatistics(Oid relid, AttrNumber attnum)
 	heap_close(pgstatistic, RowExclusiveLock);
 }

-
-/*
- * RemoveStatisticsExt --- remove entries in pg_statistic_ext for a relation
- *
- * If attnum is zero, remove all entries for rel; else remove only the
- * one(s) involving that column.
- */
-void
-RemoveStatisticsExt(Oid relid, AttrNumber attnum)
-{
-	Relation	pgstatisticext;
-	SysScanDesc scan;
-	ScanKeyData key;
-	HeapTuple	tuple;
-
-	/*
-	 * Scan pg_statistic_ext to delete relevant tuples
-	 */
-	pgstatisticext = heap_open(StatisticExtRelationId, RowExclusiveLock);
-
-	ScanKeyInit(&key,
-				Anum_pg_statistic_ext_stxrelid,
-				BTEqualStrategyNumber, F_OIDEQ,
-				ObjectIdGetDatum(relid));
-
-	scan = systable_beginscan(pgstatisticext,
-							  StatisticExtRelidIndexId,
-							  true, NULL, 1, &key);
-
-	while (HeapTupleIsValid(tuple = systable_getnext(scan)))
-	{
-		bool		delete = false;
-
-		if (attnum == 0)
-			delete = true;
-		else if (attnum != 0)
-		{
-			Form_pg_statistic_ext staForm;
-			int			i;
-
-			/*
-
Re: [HACKERS] CTE inlining
On Fri, May 12, 2017 at 3:39 PM, Bruce Momjian wrote:
> On Tue, May 9, 2017 at 05:14:19PM -0400, Tom Lane wrote:
>> Ilya Shkuratov writes:
>>> Ok, it seems that most people in the discussion agree that removing
>>> the optimization fence is the right thing to do.
>>> Nonetheless I am still hoping to discuss the algorithm and its
>>> implementation.
>>
>> Yeah, so far we've mainly discussed whether to do that and how to
>> control it, not what the actual results would be.
>
> To summarize, it seems we have two options if we want to add fence
> control to CTEs:
>
> 1. add INLINE to disable the CTE fence
> 2. add MATERIALIZE to enable the CTE fence
>
> or some other keywords.  I think most people prefer #2 because:
>
> * most users writing queries prefer #2

Yeah, I think there was rough consensus on this point.  I think it's
fair to assume that most (or at least a majority of) queries will
benefit from inlining; people would want to opt out of, rather than opt
in to, generally good optimization strategies.  This will bite some
people today, but it is not really a backwards-compatibility issue,
since performance is generally not fairly described as a compatibility
criterion.  If this feature lands, we ought to warn people in the
release notes though.

merlin

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] CTE inlining
On Tue, May 9, 2017 at 05:14:19PM -0400, Tom Lane wrote: > Ilya Shkuratov writes: > > Ok, it seems that most people in discussion are agree that removing > > optimization > > fence is a right thing to do. > > Nonetheless I still hoping to discuss the algorithm and its implementation. > > Yeah, so far we've mainly discussed whether to do that and how to control > it, not what the actual results would be. To summarize, it seems we have two options if we want to add fence control to CTEs: 1. add INLINE to disable the CTE fence 2. add MATERIALIZE to enable the CTE fence or some other keywords. I think most people prefer #2 because: * most users writing queries prefer #2 * most users assume full optimization and it seems natural to turn _off_ an optimization via a keyword * while some queries can be inlined, all queries can be materialized, so doing #1 means INLINE would be only a preference, which could be confusing Anyway, I am very glad we are considering addressing this in PG 11. -- Bruce Momjian http://momjian.us EnterpriseDB http://enterprisedb.com + As you are, so once was I. As I am, so you will be. + + Ancient Roman grave inscription + -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
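For concreteness, the two options above might be spelled like this. This is proposed syntax only — neither keyword exists at this point, and `big_table`/`key` are invented names for illustration:

```sql
-- Option 1: CTEs remain a fence by default; INLINE disables it
WITH w AS INLINE (SELECT * FROM big_table)
SELECT * FROM w WHERE key = 123;

-- Option 2: CTEs are inlined by default; MATERIALIZE forces the fence
WITH w AS MATERIALIZE (SELECT * FROM big_table)
SELECT * FROM w WHERE key = 123;
```

Under option 2, the unadorned `WITH w AS (...)` form would let the planner push the `key = 123` qual into the CTE, which is the behavior most users appear to expect.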
Re: [HACKERS] WITH clause in CREATE STATISTICS
I wrote: > Tomas Vondra writes: >> On 5/12/17 4:46 AM, Tom Lane wrote: >>> Although I've not done anything about it here, I'm not happy about the >>> handling of dependencies for stats objects. >> Yeah, it's a bit frankensteinian ... > I'm prepared to create a fix for that, but it'd be easier to commit the > current patch first, to avoid merge conflicts. Done. I noted that we weren't making an owner dependency either :-(. Also, we might want to think about letting stats objects be extension members sometime. As things stand, if an extension script does CREATE STATISTICS, the stats object will just be "loose" rather than bound into the extension. We still have docs and nomenclature issues to work on. Anyone want to take point on those? regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Hash Functions
On 5/12/17 14:23, Robert Haas wrote: > One alternative would be to change the way that we dump and restore > the data. Instead of dumping the data with the individual partitions, > dump it all out for the parent and let tuple routing sort it out at > restore time. I think this could be a pg_dump option. One way it dumps out the partitions, and another way it recomputes the partitions. I think that could be well within pg_dump's mandate. (cough ... logical replication ... cough) > Of course, this isn't very satisfying because now > dump-and-restore hasn't really preserved the state of the database; That depends on whether you consider the partitions to be a user-visible detail or an internal one. The current approach is sort of in the middle, so it makes sense to allow the user to interpret it either way depending on need. -- Peter Eisentraut http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
[HACKERS] Tab-completing DROP STATISTICS
Tom Lane wrote:
> I did not review the rest of the regression test changes, nor the
> tab-complete changes.  I can however report that DROP STATISTICS
> doesn't successfully complete any stats object names.

The attached patch fixes this problem.

-- 
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

diff --git a/doc/src/sgml/func.sgml b/doc/src/sgml/func.sgml
index 5f47c59f8a..d55656769b 100644
--- a/doc/src/sgml/func.sgml
+++ b/doc/src/sgml/func.sgml
@@ -16681,6 +16681,12 @@ SELECT relname FROM pg_class WHERE pg_table_is_visible(oid);
        is operator family visible in search path
+       pg_statistic_ext_is_visible(stat_oid)
+
+       boolean
+       is statistics object visible in search path
+
+
        pg_table_is_visible(table_oid)
        boolean
@@ -16739,6 +16745,9 @@ SELECT relname FROM pg_class WHERE pg_table_is_visible(oid);
     pg_opfamily_is_visible
+    pg_statistic_ext_is_visible
+
+
     pg_table_is_visible
diff --git a/src/backend/catalog/namespace.c b/src/backend/catalog/namespace.c
index e521bd908a..5b339222af 100644
--- a/src/backend/catalog/namespace.c
+++ b/src/backend/catalog/namespace.c
@@ -34,6 +34,7 @@
 #include "catalog/pg_operator.h"
 #include "catalog/pg_opfamily.h"
 #include "catalog/pg_proc.h"
+#include "catalog/pg_statistic_ext.h"
 #include "catalog/pg_ts_config.h"
 #include "catalog/pg_ts_dict.h"
 #include "catalog/pg_ts_parser.h"
@@ -2142,6 +2143,72 @@ get_statistics_oid(List *names, bool missing_ok)
 }
 
 /*
+ * StatisticsObjIsVisible
+ *		Determine whether a statistics object (identified by OID) is visible in
+ *		the current search path.  Visible means "would be found by searching
+ *		for the unqualified statistics object name".
+ */
+bool
+StatisticsObjIsVisible(Oid relid)
+{
+	HeapTuple	stxtup;
+	Form_pg_statistic_ext stxform;
+	Oid			stxnamespace;
+	bool		visible;
+
+	stxtup = SearchSysCache1(STATEXTOID, ObjectIdGetDatum(relid));
+	if (!HeapTupleIsValid(stxtup))
+		elog(ERROR, "cache lookup failed for statistics object %u", relid);
+	stxform = (Form_pg_statistic_ext) GETSTRUCT(stxtup);
+
+	recomputeNamespacePath();
+
+	/*
+	 * Quick check: if it ain't in the path at all, it ain't visible. Items in
+	 * the system namespace are surely in the path and so we needn't even do
+	 * list_member_oid() for them.
+	 */
+	stxnamespace = stxform->stxnamespace;
+	if (stxnamespace != PG_CATALOG_NAMESPACE &&
+		!list_member_oid(activeSearchPath, stxnamespace))
+		visible = false;
+	else
+	{
+		/*
+		 * If it is in the path, it might still not be visible; it could be
+		 * hidden by another statistics object of the same name earlier in the
+		 * path.  So we must do a slow check for conflicting objects.
+		 */
+		char	   *stxname = NameStr(stxform->stxname);
+		ListCell   *l;
+
+		visible = false;
+		foreach(l, activeSearchPath)
+		{
+			Oid			namespaceId = lfirst_oid(l);
+
+			if (namespaceId == stxnamespace)
+			{
+				/* Found it first in path */
+				visible = true;
+				break;
+			}
+			if (SearchSysCacheExists2(STATEXTNAMENSP,
+									  PointerGetDatum(stxname),
+									  ObjectIdGetDatum(namespaceId)))
+			{
+				/* Found something else first in path */
+				break;
+			}
+		}
+	}
+
+	ReleaseSysCache(stxtup);
+
+	return visible;
+}
+
+/*
  * get_ts_parser_oid - find a TS parser by possibly qualified name
  *
  * If not found, returns InvalidOid if missing_ok, else throws error
@@ -4236,6 +4303,17 @@ pg_conversion_is_visible(PG_FUNCTION_ARGS)
 }
 
 Datum
+pg_statistic_ext_is_visible(PG_FUNCTION_ARGS)
+{
+	Oid			oid = PG_GETARG_OID(0);
+
+	if (!SearchSysCacheExists1(STATEXTOID, ObjectIdGetDatum(oid)))
+		PG_RETURN_NULL();
+
+	PG_RETURN_BOOL(StatisticsObjIsVisible(oid));
+}
+
+Datum
 pg_ts_parser_is_visible(PG_FUNCTION_ARGS)
 {
 	Oid			oid = PG_GETARG_OID(0);
diff --git a/src/backend/utils/cache/lsyscache.c b/src/backend/utils/cache/lsyscache.c
index 236d876b1c..720e540276 100644
--- a/src/backend/utils/cache/lsyscache.c
+++ b/src/backend/utils/cache/lsyscache.c
@
Re: [HACKERS] Hash Functions
On Fri, May 12, 2017 at 2:45 PM, Tom Lane wrote: > Yeah, that isn't really appetizing at all. If we were doing physical > partitioning below the user-visible level, we could make it fly. > But the existing design makes the partition boundaries user-visible > which means we have to insist that the partitioning rule is immutable > (in the strongest sense of being invariant across all installations > you could possibly wish to transport data between). I think you're right. > Now, we already have some issues of that sort with range partitioning; > if say you try to transport data to another system with a different > sort ordering, you have probably got problems. But that issue already > existed with the old manual partitioning approach, ie if you have a > CHECK constraint like "(col >= 'foo' && col < 'gob')" then a transfer > to such a system could fail already. So I'm okay with that. But it > seems like hash partitioning is going to introduce new, exciting, and > hard-to-document-or-avoid portability gotchas. Do we really need > to go there? That is a good question. I think it basically amounts to this question: is hash partitioning useful, and if so, for what? Range and list partitioning inherently provide management benefits. For example, if you range-partition your data by year, then when you want to get rid of the oldest year's worth of data, you can drop the entire partition at once, which is likely to be much faster than a DELETE statement. But hash partitioning provides no such benefits because, by design, the distribution of the data among the partitions is essentially random. Dropping one partition will not usually be a useful thing to do because the rows in that partition are logically unconnected.
So, if you have a natural way of partitioning a table by range or list, that's probably going to be superior to hash partitioning. But sometimes there isn't an obviously useful way of breaking up the data, and you still want to partition it in order to reduce the size of the individual tables. That, in turn, allows maintenance operations to be performed on each partition separately. For example, suppose your table is big enough that running CLUSTER or VACUUM FULL on it takes eight hours, but you can't really afford an eight-hour maintenance window when the table gets bloated. However, you can afford (on Sunday nights when activity is lowest) a window of, say, one hour. Well, if you hash-partition the table, you can CLUSTER or VACUUM FULL one partition a week for N weeks until you get to them all. Similarly, if you need to create an index, you can build it on one partition at a time. You can even add the index to one partition to see how well it works -- for example, does it turn too many HOT updates into non-HOT updates, causing bloat? -- and try it out before you go do it across the board. And an unpartitioned table can only accommodate one VACUUM process at a time, but a partitioned table can have one per partition. Partitioning a table also means that you don't need a single disk volume large enough to hold the whole thing; instead, you can put each partition in a separate tablespace or (in some exciting future world where PostgreSQL looks more like a distributed system) perhaps even on another server. For a table that is a few tens of gigabytes, none of this amounts to a hill of beans, but for a table that is a few tens of terabytes, the time it takes to perform administrative operations can become a really big problem. A good example is, say, a table full of orders. Imagine a high-velocity business like Amazon or UPS that has a constant stream of new orders, or a mobile app that has a constant stream of new user profiles.
If that data grows fast enough that the table needs to be partitioned, how should it be done? It's often desirable to create partitions of about equal size and about equal hotness, and range-partitioning on the creation date or order ID number means continually creating new partitions, and having all of the hot data in the same partition. In my experience, it is *definitely* the case that users with very large tables are experiencing significant pain right now. The freeze map changes were a good step towards ameliorating some of that pain, but I think hash partitioning is another step in that direction, and I think there will be other applications as well. Even for users who don't have data that large, there are also interesting query-performance implications of hash partitioning. Joins between very large tables tend to get implemented as merge joins, which often means sorting all the data, which is O(n lg n) and not easy to parallelize. But if those very large tables happened to be compatibly partitioned on the join key, you could do a partitionwise join per the patch Ashutosh proposed, which would be cheaper and easier to do in parallel. Maybe a shorter argument for hash partitioning is that not one but two different people proposed patches for it within
Re: [HACKERS] Hash Functions
On Fri, May 12, 2017 at 02:23:14PM -0400, Robert Haas wrote: > > What about integers? I think we're already assuming two's-complement > arithmetic, which I think means that the only problem with making the > hash values portable for integers is big-endian vs. little-endian. > That sounds solvable-ish. > xxhash produces identical hashes on big-endian and little-endian machines. https://github.com/Cyan4973/xxHash Regards, Ken -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Hash Functions
Robert Haas writes: > On Fri, May 12, 2017 at 1:34 PM, Tom Lane wrote: >> I'd vote that it's not, which means that this whole approach to hash >> partitioning is unworkable. I agree with Andres that demanding hash >> functions produce architecture-independent values will not fly. > If we can't produce architecture-independent hash values, then what's > the other option? Forget hash partitioning. There's no law saying that that's a good idea and we have to have it. With a different set of constraints, maybe we could do it, but I think the existing design decisions have basically locked it out --- and I doubt that hash partitioning is so valuable that we should jettison other desirable properties to get it. > One alternative would be to change the way that we dump and restore > the data. Instead of dumping the data with the individual partitions, > dump it all out for the parent and let tuple routing sort it out at > restore time. Of course, this isn't very satisfying because now > dump-and-restore hasn't really preserved the state of the database; > indeed, you could make it into a hard failure by creating a foreign > key referencing a partition hash partition. Yeah, that isn't really appetizing at all. If we were doing physical partitioning below the user-visible level, we could make it fly. But the existing design makes the partition boundaries user-visible which means we have to insist that the partitioning rule is immutable (in the strongest sense of being invariant across all installations you could possibly wish to transport data between). Now, we already have some issues of that sort with range partitioning; if say you try to transport data to another system with a different sort ordering, you have probably got problems. But that issue already existed with the old manual partitioning approach, ie if you have a CHECK constraint like "(col >= 'foo' && col < 'gob')" then a transfer to such a system could fail already. So I'm okay with that. 
But it seems like hash partitioning is going to introduce new, exciting, and hard-to-document-or-avoid portability gotchas. Do we really need to go there? regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Hash Functions
On Fri, May 12, 2017 at 1:34 PM, Tom Lane wrote: > I'd vote that it's not, which means that this whole approach to hash > partitioning is unworkable. I agree with Andres that demanding hash > functions produce architecture-independent values will not fly. If we can't produce architecture-independent hash values, then what's the other option? One alternative would be to change the way that we dump and restore the data. Instead of dumping the data with the individual partitions, dump it all out for the parent and let tuple routing sort it out at restore time. Of course, this isn't very satisfying because now dump-and-restore hasn't really preserved the state of the database; indeed, you could make it into a hard failure by creating a foreign key referencing a hash partition. After dump-and-restore, the row ends up in some other partition and the foreign key can't be recreated because the relationship no longer holds. This isn't limited to foreign keys, either; similar problems could be created with CHECK constraints or other per-table properties that can vary between one child and another. I basically think it's pretty futile to suppose that we can get away with having a dump and restore move rows around between partitions without having that blow up in some cases. > Maintaining such a property for float8 (and the types that depend on it) > might be possible if you believe that nobody ever uses anything but IEEE > floats, but we've never allowed that as a hard assumption before. I don't know how standard that is. Is there any hardware that anyone's likely to be using that doesn't? TBH, I don't really care if support for obscure, nearly-dead platforms like VAX or whatever doesn't quite work with hash-partitioned tables. In practice, PostgreSQL only sorta works on that kind of platform anyway; there are far bigger problems than this. On the other hand, if there are servers being shipped in 2017 that don't use IEEE floats, that's another problem.
What about integers? I think we're already assuming two's-complement arithmetic, which I think means that the only problem with making the hash values portable for integers is big-endian vs. little-endian. That sounds solvable-ish. > Even architecture dependence isn't the whole scope of the problem. > Consider for example dumping a LATIN1-encoded database and trying > to reload it into a UTF8-encoded database. People will certainly > expect that to be possible, and do you want to guarantee that the > hash of a text value is encoding-independent? No, I think that's expecting too much. I'd be just fine telling people that if you hash-partition on a text column, it may not load into a database with another encoding. If you care about that, don't use hash-partitioning, or don't change the encoding, or dump out the partitions one by one and reload all the rows into the parent table for re-routing, solving whatever problems come up along the way. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] renaming "transaction log"
On 5/12/17 12:12, Dagfinn Ilmari Mannsåker wrote: > Peter Eisentraut writes: > >> Committed, adjusting for some of the suggestions. > > Looks like one occurrence was still left in: > > $ ag --no-group --ignore po --ignore release-\* --ignore wal.sgml > '\btransaction\s+log\b' > doc/src/sgml/func.sgml:18631:end of the transaction log at any instant, > while the write location is the end of > > The four other occurrences in the same paragraph were fixed, so I assume > this was just an oversight. Fixed, thanks. -- Peter Eisentraut http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] renaming "transaction log"
On 5/11/17 12:00, Tom Lane wrote: > Peter Eisentraut writes: >> Most documentation and error messages still uses the term "transaction >> log" to refer to the write-ahead log. Here is a patch to rename that, >> which I think should be done, to match the xlog -> wal renaming in APIs. > > This will need to be rebased over d10c626de, which made a few of the > same/similar changes but did not try to hit stuff that wasn't adjacent > to what it was changing anyway. I'm +1 for the idea though. I think > we should also try to standardize on the terminology "WAL location" > for LSNs, not variants such as "WAL position" or "log position". done -- Peter Eisentraut http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Cached plans and statement generalization
On Fri, May 12, 2017 at 08:35:26PM +0300, Konstantin Knizhnik wrote: > > > On 12.05.2017 18:23, Bruce Momjian wrote: > >On Fri, May 12, 2017 at 10:50:41AM +0300, Konstantin Knizhnik wrote: > >>Definitely changing session context (search_path, date/time format, ...) may > >>cause incorrect behavior of cached statements. > >I wonder if we should clear the cache whenever any SET command is > >issued. > > Well, with autoprepare cache disabled on each set variable, alter system and > any slow utility statement > only one regression test is not passed. And only because of different error > message context: Great. We have to think of how we would keep this reliable when future changes happen. Simple is often better because it limits the odds of breakage. > >>Actually you may get the same problem with explicitly prepared statements > >>(certainly, in the last case, you better understand what going on and it is > >>your choice whether to use or not to use prepared statement). > >> > >>The fact of failure of 7 regression tests means that autoprepare can really > >>change behavior of existed program. This is why my suggestion is to switch > >>off this feature by default. > >I would like to see us target something that can be enabled by default. > >Even if it only improves performance by 5%, it would be better overall > >than a feature that improves performance by 90% but is only used by 1% > >of our users. > > I have to agree with you here. Good. I know there are database systems that will try to get the most performance possible no matter how complex it is for users to configure or to maintain, but that isn't us. -- Bruce Momjian http://momjian.us EnterpriseDB http://enterprisedb.com + As you are, so once was I. As I am, so you will be. + + Ancient Roman grave inscription + -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Cached plans and statement generalization
On May 12, 2017 12:50:41 AM PDT, Konstantin Knizhnik wrote: >Definitely changing session context (search_path, date/time format, >...) >may cause incorrect behavior of cached statements. >Actually you may get the same problem with explicitly prepared >statements (certainly, in the last case, you better understand what >going on and it is your choice whether to use or not to use prepared >statement). > >The fact of failure of 7 regression tests means that autoprepare can >really change behavior of existed program. This is why my suggestion is > >to switch off this feature by default. I can't see us agreeing with this feature unless it actually reliably works, even if disabled by default. Andres -- Sent from my Android device with K-9 Mail. Please excuse my brevity. -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Hash Functions
On 05/12/2017 10:17 AM, Robert Haas wrote: > On Fri, May 12, 2017 at 1:12 PM, Andres Freund wrote: >> Given that a lot of data types have a architecture dependent >> representation, it seems somewhat unrealistic and expensive to have >> a hard rule to keep them architecture agnostic. And if that's not >> guaranteed, then I'm doubtful it makes sense as a soft rule >> either. > > That's a good point, but the flip side is that, if we don't have > such a rule, a pg_dump of a hash-partitioned table on one > architecture might fail to restore on another architecture. Today, I > believe that, while the actual database cluster is > architecture-dependent, a pg_dump is architecture-independent. Is it > OK to lose that property? Not from where I sit. Joe -- Crunchy Data - http://crunchydata.com PostgreSQL Support for Secure Enterprises Consulting, Training, & Open Source Development
Re: [HACKERS] Cached plans and statement generalization
On 12.05.2017 18:23, Bruce Momjian wrote: On Fri, May 12, 2017 at 10:50:41AM +0300, Konstantin Knizhnik wrote: Definitely changing session context (search_path, date/time format, ...) may cause incorrect behavior of cached statements. I wonder if we should clear the cache whenever any SET command is issued.

Well, with the autoprepare cache invalidated on each SET variable, ALTER SYSTEM, and any slow utility statement, only one regression test fails, and only because of a different error message context:

*** /home/knizhnik/postgresql.master/src/test/regress/expected/date.out	2017-04-11 18:07:56.497461208 +0300
--- /home/knizhnik/postgresql.master/src/test/regress/results/date.out	2017-05-12 20:21:19.767566302 +0300
***************
*** 1443,1452 ****
  --
  SELECT EXTRACT(MICROSEC FROM DATE 'infinity');  -- ERROR:  timestamp units "microsec" not recognized
  ERROR:  timestamp units "microsec" not recognized
- CONTEXT:  SQL function "date_part" statement 1
  SELECT EXTRACT(UNDEFINED FROM DATE 'infinity');  -- ERROR:  timestamp units "undefined" not supported
  ERROR:  timestamp units "undefined" not supported
- CONTEXT:  SQL function "date_part" statement 1
  -- test constructors
  select make_date(2013, 7, 15);
   make_date
--- 1443,1450 ----

======

Actually you may get the same problem with explicitly prepared statements (certainly, in the last case, you better understand what is going on and it is your choice whether or not to use a prepared statement). The fact that 7 regression tests fail means that autoprepare can really change the behavior of an existing program. This is why my suggestion is to switch off this feature by default.

I would like to see us target something that can be enabled by default. Even if it only improves performance by 5%, it would be better overall than a feature that improves performance by 90% but is only used by 1% of our users.

I have to agree with you here.
-- Konstantin Knizhnik Postgres Professional: http://www.postgrespro.com The Russian Postgres Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Hash Functions
Robert Haas writes: > On Fri, May 12, 2017 at 1:12 PM, Andres Freund wrote: >> Given that a lot of data types have a architecture dependent representation, >> it seems somewhat unrealistic and expensive to have a hard rule to keep them >> architecture agnostic. And if that's not guaranteed, then I'm doubtful it >> makes sense as a soft rule either. > That's a good point, but the flip side is that, if we don't have such > a rule, a pg_dump of a hash-partitioned table on one architecture > might fail to restore on another architecture. Today, I believe that, > while the actual database cluster is architecture-dependent, a pg_dump > is architecture-independent. Is it OK to lose that property? I'd vote that it's not, which means that this whole approach to hash partitioning is unworkable. I agree with Andres that demanding hash functions produce architecture-independent values will not fly. Maintaining such a property for float8 (and the types that depend on it) might be possible if you believe that nobody ever uses anything but IEEE floats, but we've never allowed that as a hard assumption before. Even architecture dependence isn't the whole scope of the problem. Consider for example dumping a LATIN1-encoded database and trying to reload it into a UTF8-encoded database. People will certainly expect that to be possible, and do you want to guarantee that the hash of a text value is encoding-independent? regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Getting error at the time of dropping subscription and few more issues
On Fri, May 12, 2017 at 9:35 PM, tushar wrote: > Hi, > > There are few more issues , found in logical replication > > (1)ERROR: tuple concurrently updated > > Publication Server - (X machine) > \\create table \ create publication \ insert rows > create table t(n int); > create publication pub for table t; > insert into t values (generate_series(1,100)); > > Subscription Server-(Y machine) > \\create table t / create subscription > create table t(n int); > create subscription sub connection 'dbname=postgres port=5000 user=centos > password=a' publication pub; > > \\drop subscription and re-create (repeat this 2-3 times) > postgres=# drop subscription sub; > NOTICE: dropped replication slot "sub" on publisher > DROP SUBSCRIPTION > postgres=# create subscription sub connection 'dbname=postgres port=5000 > user=centos password=a' publication pub; > NOTICE: synchronized table states > NOTICE: created replication slot "sub" on publisher > CREATE SUBSCRIPTION > postgres=# select count(*) from t; > count > - > 100 > (1 row) > > postgres=# drop subscription sub; > ERROR: tuple concurrently updated I wonder whether the subscriber had already behaved oddly while dropping and re-creating the subscription repeatedly. An issue related to running DROP SUBSCRIPTION repeatedly is reported on another thread and is under discussion[1]. This issue might be related to that.
> postgres=# drop subscription s2s1; > ERROR: could not drop the replication slot "s2s1" on publisher > DETAIL: The error was: ERROR: replication slot "s2s1" does not exist As the documentation mentions[2], we can drop the subscription after disabling it and setting the slot name to NONE with ALTER SUBSCRIPTION. > (3)Alter publication SET command doesn't give you NOTICE message about > tables which got removed. > > postgres=# create publication pub for table t,t1,t2 ; > CREATE PUBLICATION > > postgres=# select * from pg_publication_tables ; > pubname | schemaname | tablename > -++--- > pub | public | t > pub | public | t1 > pub | public | t2 > (3 rows) > > postgres=# alter publication pub set table t; > ALTER PUBLICATION > > postgres=# select * from pg_publication_tables ; > pubname | schemaname | tablename > -++--- > pub | public | t > (1 row) > > in subscription - (we are getting NOTICE message, about tables which got > removed) > > postgres=# alter subscription sub set publication pub refresh; > NOTICE: removed subscription for table public.t1 > NOTICE: removed subscription for table public.t2 > ALTER SUBSCRIPTION > > I think - in publication too ,we should provide NOTICE messages. > I guess one of the reasons why we emit such a NOTICE message is that the subscriber cannot control which tables the upstream server replicates. So if a table gets disassociated on the publisher, the subscriber should report that to the user. On the other hand, since the publication can control it and the changes are obvious, I'm not sure we really need to do that. BTW I think it's better for the above NOTICE message to include the subscription name.
[1] https://www.postgresql.org/message-id/CAD21AoBYpyqTSw%2B%3DES%2BxXtRGMPKh%3DpKiqjNxZKnNUae0pSt9bg%40mail.gmail.com [2] https://www.postgresql.org/docs/devel/static/logical-replication-subscription.html#logical-replication-subscription-slot Regards, -- Masahiko Sawada NIPPON TELEGRAPH AND TELEPHONE CORPORATION NTT Open Source Software Center -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
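For reference, the workaround Sawada-san points at in [2] looks something like the following (the commands are per the referenced documentation; the subscription name comes from tushar's example):

```sql
-- Stop the apply worker, disassociate the missing remote slot,
-- then drop the subscription without contacting the publisher.
ALTER SUBSCRIPTION s2s1 DISABLE;
ALTER SUBSCRIPTION s2s1 SET (slot_name = NONE);
DROP SUBSCRIPTION s2s1;
```

With slot_name set to NONE, DROP SUBSCRIPTION no longer tries to drop a replication slot on the publisher, so it succeeds even though "s2s1" never existed there.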
Re: [HACKERS] Hash Functions
On Fri, May 12, 2017 at 1:12 PM, Andres Freund wrote:
> Given that a lot of data types have an architecture-dependent representation,
> it seems somewhat unrealistic and expensive to have a hard rule to keep them
> architecture agnostic. And if that's not guaranteed, then I'm doubtful it
> makes sense as a soft rule either.

That's a good point, but the flip side is that, if we don't have such a rule, a pg_dump of a hash-partitioned table on one architecture might fail to restore on another architecture. Today, I believe that, while the actual database cluster is architecture-dependent, a pg_dump is architecture-independent. Is it OK to lose that property?

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Hash Functions
On May 12, 2017 10:05:56 AM PDT, Robert Haas wrote:
>On Fri, May 12, 2017 at 12:08 AM, Jeff Davis wrote:
>> 1. The hash functions as they exist today aren't portable -- they can
>> return different results on different machines. That means using these
>> functions for hash partitioning would yield different contents for the
>> same partition on different architectures (and that's bad, considering
>> they are logical partitions and not some internal detail).
>
>Hmm, yeah, that is bad.

Given that a lot of data types have an architecture-dependent representation, it seems somewhat unrealistic and expensive to have a hard rule to keep them architecture agnostic. And if that's not guaranteed, then I'm doubtful it makes sense as a soft rule either.

Andres
--
Sent from my Android device with K-9 Mail. Please excuse my brevity.

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
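For readers following along, the non-portability Jeff describes can be demonstrated outside PostgreSQL. This is an illustrative Python sketch (the function names are invented; this is not hash_any() itself): hashing a value's native in-memory bytes ties the result to the machine's endianness, while hashing a fixed serialization keeps it stable across architectures.

```python
import hashlib
import struct

def hash_native(value: int) -> str:
    """Hash the platform-native byte representation (endianness-dependent)."""
    return hashlib.sha256(struct.pack("@i", value)).hexdigest()

def hash_portable(value: int) -> str:
    """Hash a fixed little-endian serialization (same on every machine)."""
    return hashlib.sha256(struct.pack("<i", value)).hexdigest()

# The two byte orders really do differ for asymmetric values, so only
# hash_portable() yields stable partition assignments across architectures.
big = struct.pack(">i", 258)     # b'\x00\x00\x01\x02'
little = struct.pack("<i", 258)  # b'\x02\x01\x00\x00'
assert big != little
```

On a little-endian machine hash_native() and hash_portable() happen to agree; on a big-endian machine they diverge, which is exactly the "different contents for the same partition" problem.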
Re: [HACKERS] eval_const_expresisions and ScalarArrayOpExpr
On Fri, May 12, 2017 at 12:55 PM, Tom Lane wrote: > Robert Haas writes: >> On Fri, May 12, 2017 at 12:04 PM, Tom Lane wrote: >>> Actually, I now remember that I wrote that while at Salesforce (because >>> they'd run into the problem that constant ArrayRef expressions weren't >>> simplified). So that means it's been languishing in a git branch for >>> *two* years. I'm kind of inclined to push it before it gets forgotten >>> again ;-) > >> You will probably not be surprised to hear that I think this is a >> feature and thus subject to the feature freeze currently in effect. > > [ shrug ] Sure. I'll do what I should have done then, which is stick > it into the next CF list. Thanks. > If you intend to also object to my proposed get_attstatsslot refactoring, > please do that promptly. That one I think is bug-proofing. I wasn't planning to express an opinion on that one. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] [POC] hash partitioning
On Fri, May 12, 2017 at 6:08 PM, amul sul wrote:
> Hi,
>
> Please find the following updated patches attached:
>
> 0001-Cleanup.patch : Does some cleanup and code refactoring required
> for the hash partition patch. Otherwise, there will be an unnecessary diff in
> the 0002 patch

Thanks for splitting the patch.

+    if (isnull[0])
+        cur_index = partdesc->boundinfo->null_index;

This code assumes that null_index will be set to -1 when has_null is false. Per RelationBuildPartitionDesc() this is true. But probably we should write this code as

    if (isnull[0])
    {
        if (partdesc->boundinfo->has_null)
            cur_index = partdesc->boundinfo->null_index;
    }

That way we are certain that when has_null is false, cur_index = -1, similar to the original code.

The additional argument to ComputePartitionAttrs() isn't used anywhere in this patch, so maybe it had better be part of 0002. If we do this, the only change that will remain in the patch is the refactoring of RelationBuildPartitionDesc(), so we may consider merging it into 0002, unless we find that some more refactoring is needed. But for now, having it as a separate patch helps.

Here are some more comments on 0002.

+ * In the case of hash partitioning, datums is a 2-D array, stores modulus and
+ * remainder values at datums[x][0] and datums[x][1] respectively for each
+ * partition in the ascending order.

This comment about datums should appear in a paragraph of its own and may be rephrased as in the attached patch. Maybe we could also add a line about ndatums for hash-partitioned tables, as in the attached patch.
+ * (see the above enum); NULL for has and list

Typo: s/has/hash

+    if (key->strategy == PARTITION_STRATEGY_HASH)
+    {
+        ndatums = nparts;
+        hbounds = (PartitionHashBound **) palloc(nparts *
+                                                 sizeof(PartitionHashBound *));
+        i = 0;
+        foreach (cell, boundspecs)
+        {
+            PartitionBoundSpec *spec = lfirst(cell);
+ [ clipped ]
+            hbounds[i]->index = i;
+            i++;
+        }

For list- and range-partitioned tables we order the bounds so that two partitioned tables have them in the same order irrespective of the order in which they are specified by the user and hence stored in the catalogs. The partitions then get indexes according to the order in which their bounds appear in the ordered arrays of bounds. Thus any two partitioned tables with the same partition specification always have the same PartitionBoundInfoData. This helps partition-wise join to match the partition bounds of two given tables. The above code assigns the indexes to the partitions as they appear in the catalogs. This means that two partitioned tables with the same partition specification but a different order of partition bound specifications will have different PartitionBoundInfoData representations.

If we order the bounds here too, partition_bounds_equal() would probably reduce to just matching the indexes and the last element of the datums array, i.e. the greatest modulus datum. If the ordered datums arrays of two partitioned tables do not match exactly, the mismatch can be because of missing datums or different datums. If it's a missing datum, it will change the greatest modulus or leave the corresponding entry in the indexes array as -1. If an entry differs, it will cause mismatching entries in the indexes arrays.

+ * is not a factor of 15.
+ *
+ *
+ * Get greatest bound in array boundinfo->datums which is

There is an extra blank line here.
+    if (offset < 0)
+    {
+        nmod = DatumGetInt32(datums[0][0]);
+        valid_bound = (nmod % spec->modulus) == 0;
+    }
+    else
+    {
+        pmod = DatumGetInt32(datums[offset][0]);
+        valid_bound = (spec->modulus % pmod) == 0;
+
+        if (valid_bound && (offset + 1) < ndatums)
+        {
+            nmod = DatumGetInt32(datums[offset + 1][0]);
+            valid_bound = (nmod % spec->modulus) == 0;
+        }
+    }

Maybe name the variables prev_mod(ulus) and next_mod(ulus) for better readability.

+ * for p_p1: satisfies_hash_partition(2, 1, hash_fn(a), hash_fn(b))
+ * for p_p2: satisfies_hash_partition(4, 2, hash_fn(a), hash_fn(b))
+ * for p_p3: satisfies_hash_partition(8, 0, hash_fn(a), hash_fn(b))
+ * for p_p4: satisfies_hash_partition(8, 4, hash_fn(a), hash_fn(b))

The description here may be read as if we are calling the same hash function for both a and b, but that's not true. So you may want to clarify that in hash_fn(a), hash_fn means the hash function specified for key a.

+    if (key->partattrs[i] != 0)
+    {
+        keyCol = (Node *) makeVar(1,
+                                  key->partattrs[i],
+
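The validity check in the hunk above can be restated compactly. A hypothetical Python sketch (is_valid_modulus is an invented name, not from the patch): with the existing moduli kept sorted, a new modulus is acceptable only if it is a multiple of the greatest existing modulus not larger than it, and a factor of the smallest existing modulus larger than it.

```python
import bisect

def is_valid_modulus(existing: list, new_mod: int) -> bool:
    """existing: sorted moduli of the current hash partitions."""
    if not existing:
        return True
    pos = bisect.bisect_right(existing, new_mod)
    # must be a multiple of the previous (next smaller or equal) modulus
    if pos > 0 and new_mod % existing[pos - 1] != 0:
        return False
    # must be a factor of the next larger modulus
    if pos < len(existing) and existing[pos] % new_mod != 0:
        return False
    return True

assert is_valid_modulus([2, 4, 8], 16)      # 16 % 8 == 0
assert not is_valid_modulus([2, 4, 8], 6)   # 6 % 4 != 0
assert is_valid_modulus([8], 4)             # 8 % 4 == 0
```

This mirrors the two branches of the hunk: when the new modulus is smaller than all existing ones (offset < 0) only the "factor of the next" test applies, and when it is the largest only the "multiple of the previous" test applies.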
Re: [HACKERS] [POC] hash partitioning
On Thu, May 11, 2017 at 10:38 PM, Amit Langote wrote: > So, adding keycol IS NOT NULL (like we currently do for expressions) in > the implicit partition constraint would be more future-proof than > generating an actual catalogued NOT NULL constraint on the keycol? I now > tend to think it would be better. Directly inserting into a range > partition with a NULL value for a column currently generates a "null value > in column \"%s\" violates not-null constraint" instead of perhaps more > relevant "new row for relation \"%s\" violates partition constraint". > That said, we *do* document the fact that a NOT NULL constraint is added > on range key columns, but we might as well document instead that we don't > currently support routing tuples with NULL values in the partition key > through a range-partitioned table and so NULL values cause error. > > Can we still decide to do that instead? I suggest you start a new thread on that topic. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Hash Functions
On Fri, May 12, 2017 at 12:08 AM, Jeff Davis wrote:
> 1. The hash functions as they exist today aren't portable -- they can
> return different results on different machines. That means using these
> functions for hash partitioning would yield different contents for the
> same partition on different architectures (and that's bad, considering
> they are logical partitions and not some internal detail).

Hmm, yeah, that is bad.

> We could revert 26043592 (copied Tom because that was his patch) to
> make hash_any() go back to being portable -- do we know what that
> speedup actually was? Maybe the benefit is smaller on newer
> processors? Another option is to try to do some combination of
> byteswapping and word-at-a-time, which might be better than
> byte-at-a-time if the byteswapping is done with a native instruction.

With regard to portability, I find that in 2009, according to Tom, we had "already agreed" that it was dispensable: http://postgr.es/m/23832.1234214...@sss.pgh.pa.us

I was not able to find where that was agreed.

On performance, I found this: https://www.postgresql.org/message-id/20081104202655.gp18...@it.is.rice.edu

It says at the end: "The average time to reindex the table using our current hash_any() without the separate mix()/final() was 1696ms and 1482ms with the separate mix()/final() stages giving almost 13% better performance for this stupid metric."

> 5. For salts[2], I don't think it's too hard to support them in an
> optional way. We just allow the function to be a two-argument function
> with a default. Places that care about specifying the salt may do so
> if the function has pronargs==2, otherwise it just gets the default
> value. If we have salts, I don't think having 64-bit hashes is very
> important. If we run out of bits, we can just salt the hash function
> differently and get more hash bits. This is not urgent and I believe
> we should just implement salts when and if some algorithm needs them.
The potential problem with that is that the extra argument might slow down the hash functions enough to matter. Argument unpacking is not free, and the hashing logic itself will get more complex. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
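Jeff's salt idea, sketched for illustration (hypothetical names; this is not how PostgreSQL's hash functions work today): mixing a salt into the hashed input yields effectively independent hash streams, which is how re-salting can produce "more hash bits" without widening the hash itself.

```python
import hashlib

def salted_hash(key: bytes, salt: int = 0) -> int:
    """Illustrative two-argument hash with a default salt, as the proposal
    allows. Different salts yield effectively independent 32-bit hash
    values for the same key."""
    digest = hashlib.sha256(salt.to_bytes(4, "little") + key).digest()
    return int.from_bytes(digest[:4], "little")

h0 = salted_hash(b"some key")          # callers unaware of salts use the default
h1 = salted_hash(b"some key", salt=1)  # re-salting gives an independent stream
assert h0 != h1
assert salted_hash(b"some key") == h0  # deterministic for a given salt
```

Robert's concern above applies directly to a shape like this: the extra argument has to be unpacked and mixed in on every call, which is measurable in a hot path even when the default is used.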
Re: [HACKERS] [POC] hash partitioning
On Wed, May 10, 2017 at 11:39 PM, Robert Haas wrote:
> On Wed, May 3, 2017 at 9:09 AM, amul sul wrote:
>> Fixed in the attached version.
>
> +[ PARTITION BY { HASH | RANGE | LIST } ( { <replaceable class="parameter">column_name</replaceable> | ( <replaceable class="parameter">expression</replaceable> ) } [ COLLATE
>
> In the department of severe nitpicking, I would have expected this to
> either use alphabetical order (HASH | LIST | RANGE) or to add the new
> method at the end on the theory that we probably did the important
> ones first (RANGE | LIST | HASH).

Importance is subjective, so maybe we should arrange them in alphabetical order, to keep the list in some order and be consistent everywhere in the code and documentation.

> More broadly, I wonder why we're
> cramming this into the datums arrays instead of just adding another
> field to PartitionBoundInfoData that is only used by hash
> partitioning.

It would be good if we store the datums corresponding to partition bounds in the same place, so that we don't have to handle hash partitions specially in all the places where we handle partition bound datums. We already do that for list and range partitions, so maybe we want to continue doing so for hash as well.

In my comments on Amul's latest patch, I have described a possibility that partition_bounds_equal() need not compare all entries in the datums array. It can just compare the greatest modulus and the indexes array from the given two partition bounds to check whether they are equal. If that works well, we will probably address your complaint about DatumIsEqual() in a different manner.

--
Best Wishes,
Ashutosh Bapat
EnterpriseDB Corporation
The Postgres Database Company

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
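The canonical-ordering point from the review is easy to sketch (illustrative Python, not the actual C structures): if the (modulus, remainder) pairs are sorted before building the bound representation, two tables with the same partitions declared in different orders compare equal.

```python
def canonical_bounds(specs):
    """specs: iterable of (modulus, remainder) pairs as declared by the user.
    Returns them in a canonical sorted order, so two partitioned tables with
    the same partition specification always get the same representation
    regardless of declaration (i.e. catalog) order."""
    return sorted(specs)

a = canonical_bounds([(8, 0), (2, 1), (8, 4)])
b = canonical_bounds([(2, 1), (8, 4), (8, 0)])
assert a == b == [(2, 1), (8, 0), (8, 4)]
```

With such an ordering in place, comparing two bound representations can indeed reduce to comparing the index assignments plus the greatest modulus, as suggested above.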
Re: [HACKERS] eval_const_expresisions and ScalarArrayOpExpr
Robert Haas writes: > On Fri, May 12, 2017 at 12:04 PM, Tom Lane wrote: >> Actually, I now remember that I wrote that while at Salesforce (because >> they'd run into the problem that constant ArrayRef expressions weren't >> simplified). So that means it's been languishing in a git branch for >> *two* years. I'm kind of inclined to push it before it gets forgotten >> again ;-) > You will probably not be surprised to hear that I think this is a > feature and thus subject to the feature freeze currently in effect. [ shrug ] Sure. I'll do what I should have done then, which is stick it into the next CF list. If you intend to also object to my proposed get_attstatsslot refactoring, please do that promptly. That one I think is bug-proofing. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] [BUGS] Crash observed during the start of the Postgres process
On Fri, May 12, 2017 at 12:38 PM, Tom Lane wrote: > I can't draw any other conclusion but that you've hacked something > to make FATAL act like LOG. Which is a fatal mistake. I see what you did there. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] eval_const_expresisions and ScalarArrayOpExpr
On Fri, May 12, 2017 at 12:04 PM, Tom Lane wrote: > Heikki Linnakangas writes: >> On 05/11/2017 06:21 PM, Tom Lane wrote: >>> Yeah, I think it's a lack-of-round-tuits situation. Your patch reminds >>> me of a more ambitious attempt I made awhile back to reduce the amount >>> of duplicative boilerplate in eval_const_expressions. I think I wrote >>> it during last year's beta period, stashed it because I didn't feel like >>> arguing for applying it right then, and then forgot about it. > >> Hmph, now we're almost in beta period again. :-(. > > Actually, I now remember that I wrote that while at Salesforce (because > they'd run into the problem that constant ArrayRef expressions weren't > simplified). So that means it's been languishing in a git branch for > *two* years. I'm kind of inclined to push it before it gets forgotten > again ;-) You will probably not be surprised to hear that I think this is a feature and thus subject to the feature freeze currently in effect. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] multi-column range partition constraint
On Fri, May 12, 2017 at 3:26 AM, Amit Langote wrote: > Fixed. This seems to be the same patch version as last time, so none of the things you mention as fixed are, in fact, fixed. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] snapbuild woes
On 12/05/17 10:57, Petr Jelinek wrote:
> On 12/05/17 03:31, Andres Freund wrote:
>> On 2017-05-11 14:54:26 -0700, Andres Freund wrote:
>>> On 2017-05-11 14:51:55 -0700, wrote:
>>>> On 2017-05-08 00:10:12 -0700, Andres Freund wrote:
>>>>> I plan to commit the next pending patch after the back branch releases
>>>>> are cut - it's an invasive fix and the issue doesn't cause corruption,
>>>>> "just" slow slot creation. So it seems better to wait for a few days,
>>>>> rather than hurry it into the release.
>>>>
>>>> Now that that's done, here's an updated version of that patch. Note the
>>>> new logic to trigger xl_running_xacts to be logged at the right spot.
>>>> Works well in my testing. I plan to commit this fairly soon, unless
>>>> somebody wants a bit more time to look into it.
>>>>
>>>> Unless somebody protests, I'd like to slightly revise how the on-disk
>>>> snapshots are stored on master - given the issues this bug/commit showed
>>>> with the current method - but I can see one could argue that that should
>>>> rather be done next release.
>>>
>>> As usual I forgot my attachment.
>>
>> This patch also seems to offer a way to do your 0005 in a, possibly, more
>> efficient manner. We don't ever need to assume a transaction is
>> timetravelling anymore. Could you check whether the attached, hasty,
>> patch resolves the performance problems you measured? Also, do you have
>> a "reference" workload for that?
>
> Hmm, well it helps, but actually now that we don't track individual
> running transactions anymore it got much less effective (my version of
> 0005 does as well).
> The example workload I test with is:
> session 1: open transaction, do a write, keep it open
> session 2: pgbench -M simple -N -c 10 -P 1 -T 5
> session 3: run CREATE_REPLICATION_SLOT LOGICAL in walsender
> session 2: pgbench -M simple -N -c 10 -P 1 -T 20
> session 1: commit
>
> And wait for session 3 to finish slot creation; it takes about 20 mins on
> my laptop without patches, a minute and a half with your patches for 0004
> and 0005 (or with your 0004 and my 0005), and about 2s with my original
> 0004 and 0005.
>
> What makes it slow is the constant qsorting, IIRC.

Btw, I still think that we should pursue the approach you proposed. I think we can deal with the qsorting in some other way (ordered insert, perhaps?) later. What you propose fixes the correctness, makes performance less awful in the above workload, and also makes the snapbuild bit easier to read.

--
Petr Jelinek
http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
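The "ordered insert" idea mentioned above, as a minimal sketch (illustrative Python, not the snapbuild code): inserting each new xid at its sorted position keeps the array ordered at all times, instead of re-sorting the whole array after every addition, which is where the constant qsorting cost comes from.

```python
import bisect

def add_sorted(xids: list, xid: int) -> None:
    """Ordered insert: O(log n) binary search plus an O(n) shift per insert,
    versus O(n log n) for re-sorting the whole array on every addition."""
    bisect.insort(xids, xid)

xids = []
for xid in [42, 7, 99, 7, 13]:
    add_sorted(xids, xid)
assert xids == [7, 7, 13, 42, 99]
```

Over k insertions this is O(k * n) worst case versus O(k * n log n) for sort-after-every-insert, and in practice it avoids repeatedly touching already-ordered data.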
Re: [HACKERS] [BUGS] Crash observed during the start of the Postgres process
"K S, Sandhya (Nokia - IN/Bangalore)" writes:
> I have filtered the logs based on PID (19825) to see if this helps to
> debug the issue further.

Is this really a stock Postgres build?

The proximate cause of the PANIC is that the startup process is seeing other processes active even though it hasn't reachedConsistency. This is bad on any number of levels, quite aside from that particular PANIC, because those other processes are presumably seeing non-consistent database state.

Looking elsewhere in the log, we see that indeed there seem to be several backend processes happily executing commands. For instance, here's the trace of one of them starting up:

[19810-58f473ff.4d62-187] 2017-04-17 07:51:28.783 GMT < > DEBUG: 0: forked new backend, pid=19850 socket=10
[19810-58f473ff.4d62-188] 2017-04-17 07:51:28.783 GMT < > LOCATION: BackendStartup, postmaster.c:3884
[19850-58f47400.4d8a-1] 2017-04-17 07:51:28.783 GMT < > LOG: 57P03: the database system is starting up
[19850-58f47400.4d8a-2] 2017-04-17 07:51:28.783 GMT < > LOCATION: ProcessStartupPacket, postmaster.c:2143
[19850-58f47400.4d8a-3] 2017-04-17 07:51:28.784 GMT < authentication> DEBUG: 0: postgres child[19850]: starting with (

Now, that LOG message proves that this backend has observed that the database is not ready to allow connections. So why did it only emit the message as LOG and keep going? The code for this in 9.3 looks like

    /*
     * If we're going to reject the connection due to database state, say so
     * now instead of wasting cycles on an authentication exchange. (This also
     * allows a pg_ping utility to be written.)
     */
    switch (port->canAcceptConnections)
    {
        case CAC_STARTUP:
            ereport(FATAL,
                    (errcode(ERRCODE_CANNOT_CONNECT_NOW),
                     errmsg("the database system is starting up")));
            break;
        ...

I can't draw any other conclusion but that you've hacked something to make FATAL act like LOG. Which is a fatal mistake.
Errors that are marked FATAL are generally ones where allowing the process to keep going is not an acceptable outcome. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] [PROPOSAL] Use SnapshotAny in get_actual_variable_range
> OK, I agree. Patch is attached.

I added the patch to the CF.

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] renaming "transaction log"
Peter Eisentraut writes: > Committed, adjusting for some of the suggestions. Looks like one occurrence was still left in: $ ag --no-group --ignore po --ignore release-\* --ignore wal.sgml '\btransaction\s+log\b' doc/src/sgml/func.sgml:18631:end of the transaction log at any instant, while the write location is the end of The four other occurrences in the same paragraph were fixed, so I assume this was just an oversight. - ilmari -- "I use RMS as a guide in the same way that a boat captain would use a lighthouse. It's good to know where it is, but you generally don't want to find yourself in the same spot." - Tollef Fog Heen -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] eval_const_expresisions and ScalarArrayOpExpr
Heikki Linnakangas writes: > On 05/11/2017 06:21 PM, Tom Lane wrote: >> Yeah, I think it's a lack-of-round-tuits situation. Your patch reminds >> me of a more ambitious attempt I made awhile back to reduce the amount >> of duplicative boilerplate in eval_const_expressions. I think I wrote >> it during last year's beta period, stashed it because I didn't feel like >> arguing for applying it right then, and then forgot about it. > Hmph, now we're almost in beta period again. :-(. Actually, I now remember that I wrote that while at Salesforce (because they'd run into the problem that constant ArrayRef expressions weren't simplified). So that means it's been languishing in a git branch for *two* years. I'm kind of inclined to push it before it gets forgotten again ;-) Looking at the patch, it still seems solid, but I remember that one thing I was concerned about was whether the more generalized code was any slower. Not sure about a good test case to measure that though --- const-simplification isn't a large chunk of most workloads. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] renaming "transaction log"
On 5/3/17 09:10, David Steele wrote: > The documentation changes look good to me, but the error message changes > are a random mix of "WAL" and "write-ahead log", even when referring to > the same thing. The reason for this is that some of the error messages refer to a --waldir option or something like it, so it seems appropriate to also refer to "WAL" in the error messages. -- Peter Eisentraut http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] renaming "transaction log"
Committed, adjusting for some of the suggestions. -- Peter Eisentraut http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] WITH clause in CREATE STATISTICS
Tom Lane wrote:
> Alvaro Herrera writes:
> > Tom Lane wrote:
> >> I'm prepared to create a fix for that, but it'd be easier to commit the
> >> current patch first, to avoid merge conflicts.
>
> > It seems we're mostly in agreement regarding the parts I was touching.
> > Do you want to push your version of the patch?
>
> It's still almost all your work, so I figured you should push it.

OK, I'll do that shortly then.

--
Álvaro Herrera    https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Cached plans and statement generalization
On Fri, May 12, 2017 at 10:50:41AM +0300, Konstantin Knizhnik wrote:
> Definitely changing session context (search_path, date/time format, ...) may
> cause incorrect behavior of cached statements.

I wonder if we should clear the cache whenever any SET command is issued.

> Actually you may get the same problem with explicitly prepared statements
> (certainly, in the last case, you better understand what is going on and it is
> your choice whether to use or not to use prepared statements).
>
> The fact of failure of 7 regression tests means that autoprepare can really
> change the behavior of an existing program. This is why my suggestion is to
> switch off this feature by default.

I would like to see us target something that can be enabled by default. Even if it only improves performance by 5%, it would be better overall than a feature that improves performance by 90% but is only used by 1% of our users.

> But in 99.9% of real cases (my estimation plucked out of thin air:) there will
> be no such problems with autoprepare. And it can significantly improve
> performance of OLTP applications
> which are not able to use prepared statements (because of working through
> pgbouncer or other reasons).

Right, but we can't ship something that is 99.9% accurate when the inaccuracy is indeterminate. The bottom line is that Postgres has a very high bar, and I realize you are just prototyping at this point, but the final product is going to have to address all the intricacies of the issue for it to be added.

> Can autoprepare slow down the system?
> Yes, it can. It can happen if an application performs a large number of unique
> queries and the autoprepare cache size is not limited.
> In this case a large (and infinitely growing) number of stored plans can
> consume a lot of memory and, what is even worse, slow down cache lookups.
> This is why, by default, I limit the number of cached statements
> (the autoprepare_limit parameter) to 100.

Yes, good idea.
> I am almost sure that there will be some other issues with autoprepare which > I have not encountered yet (because I mostly tested it on pgbench and > Postgres regression tests). > But I am also sure that benefit of doubling system performance is good > motivation to continue work in this direction. Right, you are still testing to see where the edges are. > My main concern is whether to continue to improve current approach with > local (per-backend) cache of prepared statements. > Or create shared cache (as in Oracle). It is much more difficult to > implement shared cache (the same problem with session context, different > catalog snapshots, cache invalidation,...) > but it also provides more opportunities for queries optimization and tuning. I would continue in the per-backend cache direction at this point because we don't even have that solved yet. The global cache is going to be even more complex. Ultimately I think we are going to want global and local caches because the plans of the local cache are much more likely to be accurate. -- Bruce Momjian http://momjian.us EnterpriseDB http://enterprisedb.com + As you are, so once was I. As I am, so you will be. + + Ancient Roman grave inscription + -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
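The autoprepare_limit behavior Konstantin describes amounts to a bounded per-backend plan cache. A minimal illustrative sketch (hypothetical names, not the patch's actual data structures): evicting the least recently used entry once the limit is reached keeps both memory use and lookup cost bounded for workloads with many unique queries.

```python
from collections import OrderedDict

class PlanCache:
    """Bounded LRU plan cache, analogous in spirit to autoprepare_limit.
    Once the limit is exceeded, the least recently used plan is evicted."""

    def __init__(self, limit: int = 100):
        self.limit = limit
        self.plans = OrderedDict()  # query text -> cached plan

    def lookup(self, query: str):
        plan = self.plans.get(query)
        if plan is not None:
            self.plans.move_to_end(query)   # mark as recently used
        return plan

    def store(self, query: str, plan) -> None:
        self.plans[query] = plan
        self.plans.move_to_end(query)
        if len(self.plans) > self.limit:
            self.plans.popitem(last=False)  # evict least recently used

cache = PlanCache(limit=2)
cache.store("SELECT 1", "plan1")
cache.store("SELECT 2", "plan2")
cache.lookup("SELECT 1")                 # touch, so "SELECT 2" is evicted next
cache.store("SELECT 3", "plan3")
assert cache.lookup("SELECT 2") is None
assert cache.lookup("SELECT 1") == "plan1"
```

The shared-cache variant discussed above would need all of this plus invalidation and per-session-context keying, which is part of why the per-backend direction is the simpler first step.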
Re: [HACKERS] WITH clause in CREATE STATISTICS
Alvaro Herrera writes: > Tom Lane wrote: >> I'm prepared to create a fix for that, but it'd be easier to commit the >> current patch first, to avoid merge conflicts. > It seems we're mostly in agreement regarding the parts I was touching. > Do you want to push your version of the patch? It's still almost all your work, so I figured you should push it. >> Hm. OK, but then that example is pretty misleading, because it leads >> the reader to suppose that the planner can tell the difference between >> the selectivities of the two queries. Maybe what's lacking is an >> explanation of how you'd use this statistics type. > I think we should remove the complex example from that page, and instead > refer the reader to chapters 14 and 69. Seems reasonable. If we did add enough material there to be a standalone description, it would be duplicative probably. However, that does put the onus on the other chapters to offer a complete definition, rather than leaving details to the man page as we do in many other cases. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] WITH clause in CREATE STATISTICS
Tom Lane wrote:
> Tomas Vondra writes:
> >> Although I've not done anything about it here, I'm not happy about the
> >> handling of dependencies for stats objects.
>
> > Yeah, it's a bit frankensteinian ...
>
> I'm prepared to create a fix for that, but it'd be easier to commit the
> current patch first, to avoid merge conflicts.

It seems we're mostly in agreement regarding the parts I was touching. Do you want to push your version of the patch?

> >> Lastly, I tried the example given in the CREATE STATISTICS reference page,
> >> and it doesn't seem to work.
>
> > I assume you're talking about the functional dependencies and in that
> > case that's expected behavior, because f. dependencies do assume the
> > conditions are "consistent" with the functional dependencies.
>
> Hm. OK, but then that example is pretty misleading, because it leads
> the reader to suppose that the planner can tell the difference between
> the selectivities of the two queries. Maybe what's lacking is an
> explanation of how you'd use this statistics type.

I think we should remove the complex example from that page, and instead refer the reader to chapters 14 and 69. The CREATE STATISTICS page can just give some overview of what the syntax looks like, and all explanations of complex topics such as "what does it mean for a query to be consistent with the functional dependencies" can be given at length elsewhere.

--
Álvaro Herrera    https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] [BUGS] Concurrent ALTER SEQUENCE RESTART Regression
On 5/11/17 23:59, Tom Lane wrote:
> Right, but the existing code is *designed* to hold the lock till end of
> top-level transaction, regardless of what happens in any subtransaction.
> My understanding of your complaint is that you do not think that's OK
> for any lock stronger than AccessShareLock.

What he is saying (I believe) is: because of that custom logic to hold the lock until the end of the transaction across subtransactions, you will keep holding the lock even if a subtransaction aborts, which is nonstandard behavior. Previously, nobody cared, because nobody else was locking sequences. But now we're proposing to make ALTER SEQUENCE lock the sequence, so this behavior would be visible:

S1: BEGIN;
S1: SAVEPOINT s1;
S1: SELECT nextval('seq1');
S1: ROLLBACK TO SAVEPOINT s1;
S2: ALTER SEQUENCE seq1 MAXVALUE 100;  -- would now wait for S1 commit

My amendment to that is that "previously nobody cared" is not quite correct, because the same already happens currently with DROP SEQUENCE.

--
Peter Eisentraut
http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Is there any way to access heap_open() from _PG_init ??
Sairam Gaddam writes: > During startup (_PG_init), I need to access some meta info of > table/relation (like PK Column Position, FK Column Positions, Index Column > Positions etc...) and load it into memory. Why not fetch that info at first use, instead? If you insist on doing it at _PG_init, you'll never be able to make the extension work as a shared_preload_libraries item, where _PG_init would be run in the postmaster. (I think local_preload_libraries would be problematic too; not sure that you're inside a transaction there.) You won't be able to retry after an error, or more generally to cope with post-load-time changes in the data you want to cache. It's probably possible to make it work as long as the library gets loaded during a transaction, ie in response to some SQL command. Without seeing your code we can't guess why it's crashing though. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] [BUGS] Concurrent ALTER SEQUENCE RESTART Regression
On 5/11/17 17:28, Andres Freund wrote: > Isn't that pretty much the point? The whole open_share_lock() > optimization looks like it really only can make a difference with > subtransactions? Yeah, that confused me too. That keep-the-lock-for-the-whole-transaction logic was introduced in a2597ef17958e75e7ba26507dc407249cc9e7134 around 7.3, way before subtransactions (8.0). The point there was apparently just to avoid hitting the lock manager on subsequent calls. That code has since been moved around and refactored many times. When subtransactions were introduced, the current logic of keeping the lock across subtransactions was added in order to keep the original intent. -- Peter Eisentraut http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] If subscription to foreign table valid ?
> > > > I sent my version of patch in parallel. I think we don't need to do the > relation open like you did, all the info is in syscache. > That's right. > Regards, Neha
Re: [HACKERS] Addition of pg_dump --no-publications
David Fetter writes: > While it's consistent with surrounding code, I find the use of ints to > express what is in essence a boolean condition puzzling. Any > insights? IIRC, it's forced by the getopt_long API, particularly the way that the long-options struct has to be declared. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Time based lag tracking for logical replication
On Sat, May 13, 2017 at 12:04 AM, Petr Jelinek wrote: > > After this commit 024711bb544645c8b1061e9f02b261e2e336981d I get > > following error while executing CREATE SUBSCRIPTION: > > > > CREATE SUBSCRIPTION sub1 CONNECTION 'dbname=postgres host=localhost > > user=neha port=5432' PUBLICATION mypub; > > NOTICE: synchronized table states > > ERROR: could not create replication slot "sub1": ERROR: could not > > load library "/home/neha/postgres/PGCurrentInstall/lib/pgoutput.so": > > /home/neha/postgres/PGCurrentInstall/lib/pgoutput.so: undefined > > symbol: OutputPluginUpdateProgress > > > > Hmm, that sounds like partial rebuild/install (old postgres binary with > new pgoutput one). > That's right. Thanks. Neha
Re: [HACKERS] [BUGS] Concurrent ALTER SEQUENCE RESTART Regression
On 5/11/17 16:34, Andres Freund wrote: >>> This'd probably need to be removed, as we'd otherwise would get very >>> weird semantics around aborted subxacts. >> Can you explain in more detail what you mean by this? > Well, right now we don't do proper lock-tracking for sequences, always > assigning them to the toplevel transaction. But that doesn't seem > proper when nextval() would conflict with ALTER SEQUENCE et al, because > then locks would continue to be held by aborted savepoints. I see what you mean here. We already have this issue with DROP SEQUENCE. While it would be nice to normalize this, I think it's quite esoteric. I doubt users have any specific expectations how sequences behave in aborted subtransactions. -- Peter Eisentraut http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] If subscription to foreign table valid ?
On 12/05/17 15:55, Neha Khatri wrote: > On 5/11/17, Petr Jelinek wrote: >> Hi, >> >> On 11/05/17 14:25, tushar wrote: >>> Hi, >>> >>> I observed that -we cannot publish "foreign table" in Publication >>> >>> but same thing is not true for Subscription >>> >>> postgres=# create foreign table t (n int) server db1_server options >>> (table_name 't'); >>> CREATE FOREIGN TABLE >>> postgres=# alter subscription sub refresh publication ; >>> NOTICE: added subscription for table public.t >>> ALTER SUBSCRIPTION >>> >>> Is this an expected behavior ? if we cannot publish then how can we >>> add subscription for it. > > The above foreign table subscription succeeds only if the publisher > has published a same-named normal table, i.e. > create table t (n int); > CREATE PUBLICATION mypub FOR TABLE t; > > I think in the current implementation of LogicalRep. it is the user's > responsibility to match the table definition at publisher and > subscriber. The subscriber cannot determine by itself what the publisher has > defined. This use case does not align with this assumption. > > >> However, the foreign tables indeed can't be subscribed. > > I suspect that a user would want to subscribe a foreign table in the real world. > >> I think it does make sense to add check for this into CREATE/ALTER >> SUBSCRIBER though so that user is informed immediately about the mistake >> rather than by errors in the logs later. > > Yes, the system should prohibit such an operation though. > I tried to write a patch to achieve this. It disallows subscription > of relations other than RELKIND_RELATION. > Attached is the patch. Comments? > I sent my version of the patch in parallel. I think we don't need to do the relation open like you did, all the info is in syscache. -- Petr Jelinek http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] logical replication syntax (was DROP SUBSCRIPTION, query cancellations and slot handling)
On 12/05/17 15:02, Peter Eisentraut wrote: > On 5/9/17 11:43, Petr Jelinek wrote: >> Here is rebased version of the other patch (the interface rework). I >> also fixed the tab completion there. > > Committed. > > One small thing I changed, to make it look more like CREATE/ALTER TABLE, > is that the CREATE commands use WITH (...) but the ALTER commands use > SET (...). > Makes sense, thanks! -- Petr Jelinek http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Addition of pg_dump --no-publications
On Fri, May 12, 2017 at 10:59:27AM +0900, Michael Paquier wrote: > On Thu, May 11, 2017 at 3:19 PM, Michael Paquier > wrote: > > I imagine that pg_dump -s would be the basic operation that users > > would do first before creating a subscription on a secondary node, but > > what I find surprising is that publications are dumped by default. I > > don't find confusing that those are actually included by default to be > > consistent with the way subscriptions are handled, what I find > > confusing is that there are no options to not dump them, and no > > options to bypass their restore. > > > > So, any opinions about having pg_dump/pg_restore --no-publications? > > And that's really a boring patch, giving the attached. While it's consistent with surrounding code, I find the use of ints to express what is in essence a boolean condition puzzling. Any insights? Best, David. -- David Fetter http://fetter.org/ Phone: +1 415 235 3778 AIM: dfetter666 Yahoo!: dfetter Skype: davidfetter XMPP: david(dot)fetter(at)gmail(dot)com Remember to vote! Consider donating to Postgres: http://www.postgresql.org/about/donate -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] PROVE_FLAGS
Michael Paquier writes: > On Fri, May 12, 2017 at 9:05 PM, Andrew Dunstan > wrote: >> Does anyone object to me backpatching this? It seems to me kinda crazy >> to have --verbose hardcoded on the back branches and not on the dev branch. > +1. A maximum of consistency with the test code when possible is nice to have. No objection here either. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] WITH clause in CREATE STATISTICS
Tomas Vondra writes: > On 5/12/17 4:46 AM, Tom Lane wrote: >> If people agree that that reads better, we should make an effort to >> propagate that terminology elsewhere in the docs, and into error messages, >> objectaddress.c output, etc. > I'm fine with the 'statistics object' wording. I've been struggling with > this bit while working on the patch, and I agree it sounds better and > getting it consistent is definitely worthwhile. OK. We can work on that as a followon patch. >> Although I've not done anything about it here, I'm not happy about the >> handling of dependencies for stats objects. > Yeah, it's a bit frankensteinian ... I'm prepared to create a fix for that, but it'd be easier to commit the current patch first, to avoid merge conflicts. >> Lastly, I tried the example given in the CREATE STATISTICS reference page, >> and it doesn't seem to work. > I assume you're talking about the functional dependencies and in that > case that's expected behavior, because f. dependencies do assume the > conditions are "consistent" with the functional dependencies. Hm. OK, but then that example is pretty misleading, because it leads the reader to suppose that the planner can tell the difference between the selectivities of the two queries. Maybe what's lacking is an explanation of how you'd use this statistics type. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Adding support for Default partition in partitioning
Hello, On Fri, May 12, 2017 at 5:30 PM, Rahila Syed wrote: > Hello, > > >(1) With the new patch, we allow new partitions when there is overlapping > data with default partition. The entries in default are ignored when > running queries satisfying the new partition. > This was introduced in latest version. We are not allowing adding a > partition when entries with same key value exist in default partition. So > this scenario should not > come in picture. Please find attached an updated patch which corrects this. > Thank you for the updated patch. However, now I cannot create a partition after default. CREATE TABLE list1 ( a int, b int ) PARTITION BY LIST (a); CREATE TABLE list1_1 (LIKE list1); ALTER TABLE list1 ATTACH PARTITION list1_1 FOR VALUES IN (1); CREATE TABLE list1_def PARTITION OF list1 DEFAULT; CREATE TABLE list1_5 PARTITION OF list1 FOR VALUES IN (3); server closed the connection unexpectedly This probably means the server terminated abnormally before or while processing the request. The connection to the server was lost. Attempting reset: Failed. !> > > > >(2) I get the following warning: > > >partition.c: In function ‘check_new_partition_bound’: > >partition.c:882:15: warning: ‘boundinfo’ may be used uninitialized in > this function [-Wmaybe-uninitialized] > > && boundinfo->has_default) >^ > >preproc.y:3250.2-8: warning: type clash on default action: != <> > I failed to notice this warning. I will look into it. > > >This is incredibly ugly. I don't know exactly what should be done > >about it, but I think PARTITION_DEFAULT is a bad idea and has got to > >go. Maybe add a separate isDefault flag to PartitionBoundSpec > Will look at other ways to do it. > > >Doesn't it strike you as a bit strange that get_qual_for_default() > >doesn't return a qual? Functions should generally have names that > >describe what they do. > Will fix this. > > >There's an existing function that you can use to concatenate two lists > >instead of open-coding it. > Will check this. 
> > >you should really add the documentation and > >regression tests which you mentioned as a TODO. And run the code > >through pgindent > I will also update the next version with documentation and regression tests > and run pgindent > > Thank you, > Rahila Syed > > On Fri, May 12, 2017 at 4:33 PM, Beena Emerson > wrote: > >> >> >> On Thu, May 11, 2017 at 7:37 PM, Rahila Syed >> wrote: >> >>> Hello, >>> >>> Please find attached an updated patch with review comments and bugs >>> reported till date implemented. >>> >> >> Hello Rahila, >> >> Tested on "efa2c18 Doc fix: scale(numeric) returns integer, not numeric." >> >> (1) With the new patch, we allow new partitions when there is overlapping >> data with default partition. The entries in default are ignored when >> running queries satisfying the new partition. >> >> DROP TABLE list1; >> CREATE TABLE list1 ( >> a int, >> b int >> ) PARTITION BY LIST (a); >> CREATE TABLE list1_1 (LIKE list1); >> ALTER TABLE list1 ATTACH PARTITION list1_1 FOR VALUES IN (1); >> CREATE TABLE list1_def PARTITION OF list1 DEFAULT; >> INSERT INTO list1 SELECT generate_series(1,2),1; >> -- Partition overlapping with DEF >> CREATE TABLE list1_2 PARTITION OF list1 FOR VALUES IN (2); >> INSERT INTO list1 SELECT generate_series(2,3),2; >> >> postgres=# SELECT * FROM list1 ORDER BY a,b; >> a | b >> ---+--- >> 1 | 1 >> 2 | 1 >> 2 | 2 >> 3 | 2 >> (4 rows) >> >> postgres=# SELECT * FROM list1 WHERE a=2; >> a | b >> ---+--- >> 2 | 2 >> (1 row) >> >> This ignores the a=2 entries in the DEFAULT. >> >> postgres=# SELECT * FROM list1_def; >> a | b >> ---+--- >> 2 | 1 >> 3 | 2 >> (2 rows) >> >> >> (2) I get the following warning: >> >> partition.c: In function ‘check_new_partition_bound’: >> partition.c:882:15: warning: ‘boundinfo’ may be used uninitialized in >> this function [-Wmaybe-uninitialized] >>&& boundinfo->has_default) >>^ >> preproc.y:3250.2-8: warning: type clash on default action: != <> >> >> >>> >1. 
>>> >In following block, we can just do with def_index, and we do not need >>> found_def >>> >flag. We can check if def_index is -1 or not to decide if default >>> partition is >>> >present. >>> found_def is used to set boundinfo->has_default which is used at couple >>> of other places to check if default partition exists. The implementation >>> is similar >>> to has_null. >>> >>> >3. >>> >In following function isDefaultPartitionBound, first statement "return >>> false" >>> >is not needed. >>> It is needed to return false if the node is not DefElem. >>> >>> Todo: >>> Add regression tests >>> Documentation >>> >>> Thank you, >>> Rahila Syed >>> >>> >>> > -- Beena Emerson EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Re: [HACKERS] Time based lag tracking for logical replication
On 12/05/17 15:09, Neha Khatri wrote: > On Fri, May 12, 2017 at 8:19 PM, Simon Riggs wrote: >> >> On 11 May 2017 at 18:29, Simon Riggs wrote: >>> On 11 May 2017 at 18:13, Andres Freund wrote: >>> > New patch, v3. > > Applying in 90 minutes, barring objections. Could you please wait till tomorrow? I've bigger pending fixes for related code pending/being tested that I plan to push today. I'd also like to take a look before... >>> >>> Sure. >> >> The changes I've added are very minor, so I'm not expecting debate. >> The main part of the patch is the same as Petr posted 19days ago. >> >> I'm travelling now, so after waiting till tomorrow as you requested I >> have committed the patch. >> > > Prior to this commit CREATE SUBSCRIPTION used to work smoothly. > > After this commit 024711bb544645c8b1061e9f02b261e2e336981d I get > following error while executing CREATE SUBSCRIPTION: > > CREATE SUBSCRIPTION sub1 CONNECTION 'dbname=postgres host=localhost > user=neha port=5432' PUBLICATION mypub; > NOTICE: synchronized table states > ERROR: could not create replication slot "sub1": ERROR: could not > load library "/home/neha/postgres/PGCurrentInstall/lib/pgoutput.so": > /home/neha/postgres/PGCurrentInstall/lib/pgoutput.so: undefined > symbol: OutputPluginUpdateProgress > Hmm, that sounds like partial rebuild/install (old postgres binary with new pgoutput one). -- Petr Jelinek http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] If subscription to foreign table valid ?
On 11/05/17 15:43, Petr Jelinek wrote: > Hi, > > On 11/05/17 14:25, tushar wrote: >> Hi, >> >> I observed that -we cannot publish "foreign table" in Publication >> >> postgres=# create foreign table t (n int) server db1_server options >> (table_name 't1'); >> CREATE FOREIGN TABLE >> >> postgres=# create publication pub for table t; >> ERROR: "t" is not a table >> DETAIL: Only tables can be added to publications. >> postgres=# >> >> but same thing is not true for Subscription >> >> postgres=# create foreign table t (n int) server db1_server options >> (table_name 't'); >> CREATE FOREIGN TABLE >> postgres=# alter subscription sub refresh publication ; >> NOTICE: added subscription for table public.t >> ALTER SUBSCRIPTION >> >> Is this an expected behavior ? if we cannot publish then how can we >> add subscription for it. >> > > Thanks for report. What you can publish and what you can subscribe is > not necessarily same (we can write to relations which we can't capture > from wal, for example unlogged table can't be published but can be > subscribed). > > However, the foreign tables indeed can't be subscribed. I originally > planned to have foreign tables allowed on subscriber but it turned out > to be more complex to implement than I had anticipated so I ripped the > code for that from the original patch. > > We do check for this, but only during replication which we have to do > because the fact that relation 't' was foreign table during ALTER > SUBSCRIPTION does not mean that it won't be something else half an hour later. > > I think it does make sense to add check for this into CREATE/ALTER > SUBSCRIBER though so that user is informed immediately about the mistake > rather than by errors in the logs later. > > I'll look into writing patch for this. I don't think it's beta blocker > though. > So I moved the relkind check to single function and call it from all the necessary places. See the attached. I now wonder if we should do some other checks as well (columns etc).
-- Petr Jelinek http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services Check-relkind-of-tables-in-CREATE-ALTER-SUBSCRIPTION.patch Description: binary/octet-stream -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] If subscription to foreign table valid ?
[Correction below] On 5/12/17, Neha Khatri wrote: > On 5/11/17, Petr Jelinek wrote > >> However, the foreign tables indeed can't be subscribed. Yes, I suspect that a user would _not_ want to subscribe a foreign table in the real world. -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] If subscription to foreign table valid ?
On 5/11/17, Petr Jelinek wrote: > Hi, > > On 11/05/17 14:25, tushar wrote: >> Hi, >> >> I observed that -we cannot publish "foreign table" in Publication >> >> but same thing is not true for Subscription >> >> postgres=# create foreign table t (n int) server db1_server options >> (table_name 't'); >> CREATE FOREIGN TABLE >> postgres=# alter subscription sub refresh publication ; >> NOTICE: added subscription for table public.t >> ALTER SUBSCRIPTION >> >> Is this an expected behavior ? if we cannot publish then how can we >> add subscription for it. The above foreign table subscription succeeds only if the publisher has published a same-named normal table, i.e. create table t (n int); CREATE PUBLICATION mypub FOR TABLE t; I think in the current implementation of LogicalRep. it is the user's responsibility to match the table definition at publisher and subscriber. The subscriber cannot determine by itself what the publisher has defined. This use case does not align with this assumption. > However, the foreign tables indeed can't be subscribed. I suspect that a user would want to subscribe a foreign table in the real world. > I think it does make sense to add check for this into CREATE/ALTER > SUBSCRIBER though so that user is informed immediately about the mistake > rather than by errors in the logs later. Yes, the system should prohibit such an operation though. I tried to write a patch to achieve this. It disallows subscription of relations other than RELKIND_RELATION. Attached is the patch. Comments?
Regards, Neha

diff --git a/src/backend/commands/subscriptioncmds.c b/src/backend/commands/subscriptioncmds.c
index b76cdc5..f6013da 100644
--- a/src/backend/commands/subscriptioncmds.c
+++ b/src/backend/commands/subscriptioncmds.c
@@ -440,9 +440,20 @@ CreateSubscription(CreateSubscriptionStmt *stmt, bool isTopLevel)
 	{
 		RangeVar   *rv = (RangeVar *) lfirst(lc);
 		Oid			relid;
+		Relation	relation;
 
 		relid = RangeVarGetRelid(rv, AccessShareLock, false);
+		relation = RelationIdGetRelation(relid);
+
+		if (RelationGetForm(relation)->relkind != RELKIND_RELATION)
+			ereport(ERROR,
+					(errcode(ERRCODE_INVALID_PARAMETER_VALUE),
+					 errmsg("\"%s\" is not a normal table",
+							RelationGetRelationName(relation)),
+					 errdetail("Only normal tables can be subscribed.")));
+		RelationClose(relation);
+
 		SetSubscriptionRelState(subid, relid, table_state,
 								InvalidXLogRecPtr);
 	}
@@ -553,8 +564,21 @@ AlterSubscription_refresh(Subscription *sub, bool copy_data)
 	{
 		RangeVar   *rv = (RangeVar *) lfirst(lc);
 		Oid			relid;
+		Relation	relation;
 
 		relid = RangeVarGetRelid(rv, AccessShareLock, false);
+
+		relation = RelationIdGetRelation(relid);
+
+		if (RelationGetForm(relation)->relkind != RELKIND_RELATION)
+			ereport(NOTICE,
+					(errcode(ERRCODE_INVALID_PARAMETER_VALUE),
+					 errmsg("\"%s\" is not a normal table",
+							RelationGetRelationName(relation)),
+					 errdetail("Only normal tables can be subscribed.")));
+
+		RelationClose(relation);
+
 		pubrel_local_oids[off++] = relid;
 
 		if (!bsearch(&relid, subrel_local_oids,

-- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] [bug fix] PG10: libpq doesn't connect to alternative hosts when some errors occur
Michael Paquier writes: > On Fri, May 12, 2017 at 1:28 PM, Tsunakawa, Takayuki > wrote: >> Likewise, when the first host has already reached max_connections, libpq >> doesn't attempt the connection aginst later hosts. > It seems to me that the feature is behaving as wanted. Or in short > attempt to connect to the next host only if a connection cannot be > established. If there is a failure once the exchange with the server > has begun, just consider it as a hard failure. This is an important > property for authentication and SSL connection failures actually. I would not really expect that reconnection would retry after arbitrary failure cases. Should it retry for "wrong database name", for instance? It's not hard to imagine that leading to very confusing behavior. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Addition of pg_dump --no-publications
On 5/11/17 21:59, Michael Paquier wrote: > On Thu, May 11, 2017 at 3:19 PM, Michael Paquier > wrote: >> I imagine that pg_dump -s would be the basic operation that users >> would do first before creating a subscription on a secondary node, but >> what I find surprising is that publications are dumped by default. I >> don't find confusing that those are actually included by default to be >> consistent with the way subscriptions are handled, what I find >> confusing is that there are no options to not dump them, and no >> options to bypass their restore. >> >> So, any opinions about having pg_dump/pg_restore --no-publications? > > And that's really a boring patch, giving the attached. committed -- Peter Eisentraut http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Time based lag tracking for logical replication
On Fri, May 12, 2017 at 8:19 PM, Simon Riggs wrote: > > On 11 May 2017 at 18:29, Simon Riggs wrote: > > On 11 May 2017 at 18:13, Andres Freund wrote: > > > >>>New patch, v3. > >>> > >>>Applying in 90 minutes, barring objections. > >> > >> Could you please wait till tomorrow? I've bigger pending fixes for > >> related code pending/being tested that I plan to push today. I'd also > >> like to take a look before... > > > > Sure. > > The changes I've added are very minor, so I'm not expecting debate. > The main part of the patch is the same as Petr posted 19days ago. > > I'm travelling now, so after waiting till tomorrow as you requested I > have committed the patch. > Prior to this commit CREATE SUBSCRIPTION used to work smoothly. After this commit 024711bb544645c8b1061e9f02b261e2e336981d I get following error while executing CREATE SUBSCRIPTION: CREATE SUBSCRIPTION sub1 CONNECTION 'dbname=postgres host=localhost user=neha port=5432' PUBLICATION mypub; NOTICE: synchronized table states ERROR: could not create replication slot "sub1": ERROR: could not load library "/home/neha/postgres/PGCurrentInstall/lib/pgoutput.so": /home/neha/postgres/PGCurrentInstall/lib/pgoutput.so: undefined symbol: OutputPluginUpdateProgress Regards Neha -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] logical replication syntax (was DROP SUBSCRIPTION, query cancellations and slot handling)
On 5/9/17 11:43, Petr Jelinek wrote: > Here is rebased version of the other patch (the interface rework). I > also fixed the tab completion there. Committed. One small thing I changed, to make it look more like CREATE/ALTER TABLE, is that the CREATE commands use WITH (...) but the ALTER commands use SET (...). -- Peter Eisentraut http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] PROVE_FLAGS
On Fri, May 12, 2017 at 9:05 PM, Andrew Dunstan wrote: > Does anyone object to me backpatching this? It seems to me kinda crazy > to have --verbose hardcoded on the back branches and not on the dev branch. +1. A maximum of consistency with the test code when possible is nice to have. -- Michael -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] [POC] hash partitioning
On Thu, May 11, 2017 at 9:32 PM, Dilip Kumar wrote: > On Wed, May 3, 2017 at 6:39 PM, amul sul wrote: >> On Thu, Apr 27, 2017 at 1:42 AM, Robert Haas wrote: >> >>>I spent some time today looking at these patches. It seems like there >>>is some more work still needed here to produce something committable >>>regardless of which way we go, but I am inclined to think that Amul's >>>patch is a better basis for work going forward than Nagata-san's >>>patch. Here are some general comments on the two patches: >> >> Thanks for your time. >> >> [...] >> >>> - Neither patch contains any documentation updates, which is bad. >> >> Fixed in the attached version. > > I have done an initial review of the patch and I have some comments. I > will continue the review > and testing and report the results soon. > > - > Patch needs to be rebased > > > > if (key->strategy == PARTITION_STRATEGY_RANGE) > { > /* Disallow nulls in the range partition key of the tuple */ > for (i = 0; i < key->partnatts; i++) > if (isnull[i]) > ereport(ERROR, > (errcode(ERRCODE_NULL_VALUE_NOT_ALLOWED), > errmsg("range partition key of row contains null"))); > } > > We need to add PARTITION_STRATEGY_HASH as well, we don't support NULL > for hash also, right? > We do. > > RangeDatumContent **content;/* what's contained in each range bound datum? > * (see the above enum); NULL for list > * partitioned tables */ > > This will be NULL for hash as well, we need to change the comments. > - Fixed in previously posted patch(v3). > > bool has_null; /* Is there a null-accepting partition? false > * for range partitioned tables */ > int null_index; /* Index of the null-accepting partition; -1 > > Comments need to be changed for these two members as well > Fixed in previously posted patch(v3).
> > +/* One bound of a hash partition */ > +typedef struct PartitionHashBound > +{ > + int modulus; > + int remainder; > + int index; > +} PartitionHashBound; > > It will be good to add some comments to explain the structure members > I think we don't really need that, the variable names are enough to explain their purpose. Regards, Amul -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
[HACKERS] Getting error at the time of dropping subscription and few more issues
Hi,

There are a few more issues found in logical replication:

(1) ERROR: tuple concurrently updated

Publication Server - (X machine) \\create table \ create publication \ insert rows

create table t(n int);
create publication pub for table t;
insert into t values (generate_series(1,100));

Subscription Server - (Y machine) \\create table t \ create subscription

create table t(n int);
create subscription sub connection 'dbname=postgres port=5000 user=centos password=a' publication pub;

\\drop subscription and re-create (repeat this 2-3 times)

postgres=# drop subscription sub;
NOTICE: dropped replication slot "sub" on publisher
DROP SUBSCRIPTION
postgres=# create subscription sub connection 'dbname=postgres port=5000 user=centos password=a' publication pub;
NOTICE: synchronized table states
NOTICE: created replication slot "sub" on publisher
CREATE SUBSCRIPTION
postgres=# select count(*) from t;
 count
-------
   100
(1 row)

postgres=# drop subscription sub;
ERROR: tuple concurrently updated

(2) Not able to drop the subscription even when 'nocreate slot' is specified

postgres=# create subscription s2s1 connection 'dbname=postgres port=5000 user=t password=a' publication t with (nocreate slot,enabled,copydata,SYNCHRONOUS_COMMIT='on');
NOTICE: synchronized table states
CREATE SUBSCRIPTION

-- not able to drop the subscription; I have checked on the publication server - no such slot was created, but it is still looking for the slot.

postgres=# drop subscription s2s1;
ERROR: could not drop the replication slot "s2s1" on publisher
DETAIL: The error was: ERROR: replication slot "s2s1" does not exist

(3) ALTER PUBLICATION SET command doesn't give you a NOTICE message about tables which got removed.
postgres=# create publication pub for table t,t1,t2;
CREATE PUBLICATION
postgres=# select * from pg_publication_tables;
 pubname | schemaname | tablename
---------+------------+-----------
 pub     | public     | t
 pub     | public     | t1
 pub     | public     | t2
(3 rows)

postgres=# alter publication pub set table t;
ALTER PUBLICATION
postgres=# select * from pg_publication_tables;
 pubname | schemaname | tablename
---------+------------+-----------
 pub     | public     | t
(1 row)

On the subscription side we do get a NOTICE message about tables which got removed:

postgres=# alter subscription sub set publication pub refresh;
NOTICE:  removed subscription for table public.t1
NOTICE:  removed subscription for table public.t2
ALTER SUBSCRIPTION

I think the publication side should provide NOTICE messages too.

--
regards, tushar
EnterpriseDB  https://www.enterprisedb.com/
The Enterprise PostgreSQL Company
https://sites.google.com/a/enterprisedb.com/old-new-touplestores/
Re: [HACKERS] PROVE_FLAGS
On 05/04/2017 12:50 AM, Stephen Frost wrote:
> * Tom Lane (t...@sss.pgh.pa.us) wrote:
>> Andrew Dunstan writes:
>>> Can someone please explain to me why we have this in Makefile.global.in?
>>> (from commit e9c81b60)
>>> PROVE_FLAGS =
>> Before that commit it was like
>>
>> PROVE_FLAGS = --verbose
> right.
>
>> which had some value.  I agree that now we'd be better off to not
>> set it at all, especially since the convention now appears to be that
>> automatically-supplied prove options should be set in PG_PROVE_FLAGS.
> Good point.
>
>> I'd suggest that the comment just above be replaced by something like
>>
>> # User-supplied prove flags can be provided in PROVE_FLAGS.
> Works for me.

Does anyone object to me backpatching this? It seems to me kinda crazy to have --verbose hardcoded on the back branches and not on the dev branch.

cheers

andrew

--
Andrew Dunstan            https://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Re: [HACKERS] Adding support for Default partition in partitioning
Hello,

> (1) With the new patch, we allow new partitions when there is overlapping
> data with default partition. The entries in default are ignored when
> running queries satisfying the new partition.

This was introduced in the latest version. We do not allow adding a partition when entries with the same key value exist in the default partition, so this scenario should not arise. Please find attached an updated patch which corrects this.

> (2) I get the following warning:
> partition.c: In function ‘check_new_partition_bound’:
> partition.c:882:15: warning: ‘boundinfo’ may be used uninitialized in this function [-Wmaybe-uninitialized]
>     && boundinfo->has_default)
>        ^
> preproc.y:3250.2-8: warning: type clash on default action: != <>

I failed to notice this warning. I will look into it.

> This is incredibly ugly. I don't know exactly what should be done
> about it, but I think PARTITION_DEFAULT is a bad idea and has got to
> go. Maybe add a separate isDefault flag to PartitionBoundSpec

Will look at other ways to do it.

> Doesn't it strike you as a bit strange that get_qual_for_default()
> doesn't return a qual? Functions should generally have names that
> describe what they do.

Will fix this.

> There's an existing function that you can use to concatenate two lists
> instead of open-coding it.

Will check this.

> you should really add the documentation and
> regression tests which you mentioned as a TODO. And run the code
> through pgindent

I will update the next version with documentation and regression tests, and run pgindent.

Thank you,
Rahila Syed

On Fri, May 12, 2017 at 4:33 PM, Beena Emerson wrote:
>
> On Thu, May 11, 2017 at 7:37 PM, Rahila Syed wrote:
>
>> Hello,
>>
>> Please find attached an updated patch with review comments and bugs
>> reported till date implemented.
>
> Hello Rahila,
>
> Tested on "efa2c18 Doc fix: scale(numeric) returns integer, not numeric."
> > (1) With the new patch, we allow new partitions when there is overlapping > data with default partition. The entries in default are ignored when > running queries satisfying the new partition. > > DROP TABLE list1; > CREATE TABLE list1 ( > a int, > b int > ) PARTITION BY LIST (a); > CREATE TABLE list1_1 (LIKE list1); > ALTER TABLE list1 ATTACH PARTITION list1_1 FOR VALUES IN (1); > CREATE TABLE list1_def PARTITION OF list1 DEFAULT; > INSERT INTO list1 SELECT generate_series(1,2),1; > -- Partition overlapping with DEF > CREATE TABLE list1_2 PARTITION OF list1 FOR VALUES IN (2); > INSERT INTO list1 SELECT generate_series(2,3),2; > > postgres=# SELECT * FROM list1 ORDER BY a,b; > a | b > ---+--- > 1 | 1 > 2 | 1 > 2 | 2 > 3 | 2 > (4 rows) > > postgres=# SELECT * FROM list1 WHERE a=2; > a | b > ---+--- > 2 | 2 > (1 row) > > This ignores the a=2 entries in the DEFAULT. > > postgres=# SELECT * FROM list1_def; > a | b > ---+--- > 2 | 1 > 3 | 2 > (2 rows) > > > (2) I get the following warning: > > partition.c: In function ‘check_new_partition_bound’: > partition.c:882:15: warning: ‘boundinfo’ may be used uninitialized in this > function [-Wmaybe-uninitialized] >&& boundinfo->has_default) >^ > preproc.y:3250.2-8: warning: type clash on default action: != <> > > >> >1. >> >In following block, we can just do with def_index, and we do not need >> found_def >> >flag. We can check if def_index is -1 or not to decide if default >> partition is >> >present. >> found_def is used to set boundinfo->has_default which is used at couple >> of other places to check if default partition exists. The implementation >> is similar >> to has_null. >> >> >3. >> >In following function isDefaultPartitionBound, first statement "return >> false" >> >is not needed. >> It is needed to return false if the node is not DefElem. 
>> Todo:
>> Add regression tests
>> Documentation
>>
>> Thank you,
>> Rahila Syed

Attachment: default_partition_v11.patch
Re: [HACKERS] WITH clause in CREATE STATISTICS
On 5/12/17 4:46 AM, Tom Lane wrote: Alvaro Herrera writes: Hmm, yeah, makes sense. Here's a patch for this approach. ... Also, while reading the docs changes, I got rather ill from the inconsistent and not very grammatical handling of "statistics" as a noun, particularly the inability to decide from one sentence to the next if that is singular or plural. In the attached, I fixed this in the ref/*_statistics.sgml files by always saying "statistics object" instead. If people agree that that reads better, we should make an effort to propagate that terminology elsewhere in the docs, and into error messages, objectaddress.c output, etc. I'm fine with the 'statistics object' wording. I've been struggling with this bit while working on the patch, and I agree it sounds better and getting it consistent is definitely worthwhile. > Although I've not done anything about it here, I'm not happy about the handling of dependencies for stats objects. I do not think that cloning RemoveStatistics as RemoveStatisticsExt is sane at all. The former is meant to deal with cleaning up pg_statistic rows that we know will be there, so there's no real need to expend pg_depend overhead to track them. For objects that are only loosely connected, the right thing is to use the dependency system; in particular, this design has no real chance of working well with cross-table stats. Also, it's really silly to have *both* this hard-wired mechanism and a pg_depend entry; the latter is surely redundant if you have the former. IMO we should revert RemoveStatisticsExt and instead handle things by making stats objects auto-dependent on the individual column(s) they reference (not the whole table). I'm also of the opinion that having an AUTO dependency, rather than a NORMAL dependency, on the stats object's schema is the wrong semantics. There isn't any other case where you can drop a non-empty schema without saying CASCADE, and I'm mystified why this case should act that way. 
Yeah, it's a bit frankensteinian ... > Lastly, I tried the example given in the CREATE STATISTICS reference page, and it doesn't seem to work. Without the stats object, the two queries are both estimated at one matching row, whereas they really produce 100 and 0 rows respectively. With the stats object, they seem to both get estimated at ~100 rows, which is a considerable improvement for one case but a considerable disimprovement for the other. If this is not broken, I'd like to know why not. What's the point of an expensive extended- stats mechanism if it can't get this right? I assume you're talking about the functional dependencies and in that case that's expected behavior, because f. dependencies do assume the conditions are "consistent" with the functional dependencies. This is an inherent limitation of functional dependencies, and perhaps a price for the simplicity of this statistics type. If your application executes queries that are likely not consistent with this, don't use functional dependencies. The simplicity is why dependencies were implemented first, mostly to introduce all the infrastructure. The other statistics types (MCV, histograms) don't have this limitation, but those did not make it into pg10. regards -- Tomas Vondra http://www.2ndQuadrant.com PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
[HACKERS] PoC: full merge join on comparison clause
Hi hackers, As you know, at this time Postgres cannot perform a full join on a comparison clause. For example, if we have two tables with numeric columns and run a query like 'select * from t1 full join t2 on t1.a > t2.a', we get an error: "FULL JOIN is only supported with merge-joinable or hash-joinable join conditions". Such queries are legitimate SQL and sometimes arise when porting applications from different DBMS, so it would be good to support them in Postgres. They can be rewritten as union of right and left joins, but it requires manual work and runs somewhat slower (see the benchmark at the end of the letter). This proof-of-concept patch explores the possibility of performing such queries as merge joins. Consider the following example where outer and inner relations are in ascending order, and we are asked to return outer tuples that are greater than inner. outer > inner outer tuple - 6 4 - marked tuple 7 5 8 6 - inner tuple 8 7 The main difference from normal merge join is that we do not need to advance the marked tuple. This behavior can be implemented with some simple changes to the function that compares inner and outer tuples. However, for the join clause 'outer < inner' we would have to advance the marked tuple, which would require adding a new state to the merge join executor node. We do not do this. Instead, at the path creation stage, we make sure that the particular combination of sorting order and join clause allows us to perform merge join the simple way. The optimizer requires some other changes to support these joins. Currently, it uses the same clauses to generate equivalence classes and to perform merge joins. This patch has to separate these two uses. Clauses that correspond to a btree equality operator are used to construct equivalence classes; the operator families for these clauses are recorded in the 'equivopfamilies' field of RestrictInfo struct. 
Clauses that correspond to btree equality or comparison are used to perform merge joins, and have their operator families recorded in the 'mergeopfamilies'. The optimizer also has to check whether the particular join clause list can be used for merge join, and ensure that it is compatible with inner/outer path ordering. These checks are performed by 'can_sort_for_mergejoin()' and 'outer_sort_suitable_for_mergejoin()'. There is an important unsolved problem in this patch. When generating index paths for base relations, the optimizer tries to use only one scan direction to limit the number of paths. This direction might not be suitable for a given join clause, and the input path will have to be sorted. We could generate paths for both directions, but this was specifically removed for optimization (SHA 834ddc62 by Tom Lane, 10/27/2007 09:45 AM). For inner joins one would expect the merge join to be slower than the nested loop, because it has more complex code paths, and indeed this can be seen on simple benchmarks (see the end of the letter). Costs should be revised further to reflect this difference. I would be glad to hear your opinion on this approach. 
Some benchmarks: = Full join vs union of left and right joins test1=# explain analyze select * from t4 right join t1 on t4.a < t1.a union all select * from t4 left join t1 on t4.a < t1.a where t1.a is null; QUERY PLAN - Append (cost=809.69..70703.19 rows=334 width=8) (actual time=8.336..1195.534 rows=5007546 loops=1) -> Merge Left Join (cost=809.69..34230.49 rows=333 width=8) (actual time=8.335..920.442 rows=5007537 loops=1) Merge Cond: (t1.a > t4.a) -> Index Only Scan using idx_t1_a on t1 (cost=0.28..35.27 rows=1000 width=4) (actual time=0.027..0.395 rows=1001 loops=1) Heap Fetches: 97 -> Sort (cost=809.39..834.39 rows=1 width=4) (actual time=8.300..356.821 rows=5007538 loops=1) Sort Key: t4.a Sort Method: quicksort Memory: 931kB -> Seq Scan on t4 (cost=0.00..145.00 rows=1 width=4) (actual time=0.019..2.533 rows=1 loops=1) -> Nested Loop Anti Join (cost=0.28..3072.71 rows=6667 width=8) (actual time=4.685..35.421 rows=9 loops=1) -> Seq Scan on t4 t4_1 (cost=0.00..145.00 rows=1 width=4) (actual time=0.010..0.656 rows=1 loops=1) -> Index Only Scan using idx_t1_a on t1 t1_1 (cost=0.28..6.10 rows=333 width=4) (actual time=0.003..0.003 rows=1 loops=1) Index Cond: (a > t4_1.a) Heap Fetches: 971 Planning time: 1.414 ms Execution time: 1324.985 ms (16 rows) test1=# explain analyze select * from t
Re: [HACKERS] Adding support for Default partition in partitioning
On Thu, May 11, 2017 at 7:37 PM, Rahila Syed wrote: > Hello, > > Please find attached an updated patch with review comments and bugs > reported till date implemented. > Hello Rahila, Tested on "efa2c18 Doc fix: scale(numeric) returns integer, not numeric." (1) With the new patch, we allow new partitions when there is overlapping data with default partition. The entries in default are ignored when running queries satisfying the new partition. DROP TABLE list1; CREATE TABLE list1 ( a int, b int ) PARTITION BY LIST (a); CREATE TABLE list1_1 (LIKE list1); ALTER TABLE list1 ATTACH PARTITION list1_1 FOR VALUES IN (1); CREATE TABLE list1_def PARTITION OF list1 DEFAULT; INSERT INTO list1 SELECT generate_series(1,2),1; -- Partition overlapping with DEF CREATE TABLE list1_2 PARTITION OF list1 FOR VALUES IN (2); INSERT INTO list1 SELECT generate_series(2,3),2; postgres=# SELECT * FROM list1 ORDER BY a,b; a | b ---+--- 1 | 1 2 | 1 2 | 2 3 | 2 (4 rows) postgres=# SELECT * FROM list1 WHERE a=2; a | b ---+--- 2 | 2 (1 row) This ignores the a=2 entries in the DEFAULT. postgres=# SELECT * FROM list1_def; a | b ---+--- 2 | 1 3 | 2 (2 rows) (2) I get the following warning: partition.c: In function ‘check_new_partition_bound’: partition.c:882:15: warning: ‘boundinfo’ may be used uninitialized in this function [-Wmaybe-uninitialized] && boundinfo->has_default) ^ preproc.y:3250.2-8: warning: type clash on default action: != <> > >1. > >In following block, we can just do with def_index, and we do not need > found_def > >flag. We can check if def_index is -1 or not to decide if default > partition is > >present. > found_def is used to set boundinfo->has_default which is used at couple > of other places to check if default partition exists. The implementation > is similar > to has_null. > > >3. > >In following function isDefaultPartitionBound, first statement "return > false" > >is not needed. > It is needed to return false if the node is not DefElem. 
> Todo:
> Add regression tests
> Documentation
>
> Thank you,
> Rahila Syed
Re: [HACKERS] UPDATE of partition key
On 12 May 2017 at 14:56, Amit Kapila wrote: > I think it might be better to summarize all the options discussed > including what the patch has and see what most people consider as > sensible. Yes, makes sense. Here are the options that were discussed so far for ROW triggers : Option 1 : (the patch follows this option) -- BR Update trigger for source partition. BR,AR Delete trigger for source partition. BR,AR Insert trigger for destination partition. No AR Update trigger. Rationale : BR Update trigger should be fired because that trigger can even modify the rows, and that can even result in partition key update even though the UPDATE statement is not updating the partition key. Also, fire the delete/insert triggers on respective partitions since the rows are about to be deleted/inserted. AR update trigger should not be fired because that required an actual update to have happened. Option 2 -- BR Update trigger for source partition. AR Update trigger on destination partition. No insert/delete triggers. Rationale : Since it's an UPDATE statement, only update triggers should be fired. The update ends up moving the row into another partition, so AR Update trigger should be fired on this partition rather than the original partition. Option 3 BR, AR delete triggers on source partition BR, AR insert triggers on destination partition. Rationale : Since the update is converted to delete+insert, just skip the update triggers completely. Option 4 BR-AR update triggers for source partition BR-AR insert triggers for destination partition Rationale : Since it is an update statement, both BR and AR UPDATE trigger should be fired on original partition. Since update is converted to delete+insert, the corresponding triggers should be fired, but since we already are firing UPDATE trigger on original partition, skip delete triggers, otherwise both UPDATE and DELETE triggers would get fired on the same partition. 
For statement triggers, I think it should be based on the documentation recently checked in for partitions in general. +A statement that targets a parent table in a inheritance or partitioning +hierarchy does not cause the statement-level triggers of affected child +tables to be fired; only the parent table's statement-level triggers are +fired. However, row-level triggers of any affected child tables will be +fired. Based on that, for row movement as well, the trigger should be fired only for the table referred in the UPDATE statement, and not for any child tables, or for any partitions to which the rows were moved. The doc in this row-movement patch also matches with this behaviour. -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] [POC] hash partitioning
On Wed, May 10, 2017 at 6:04 PM, Ashutosh Bapat wrote: > On Wed, May 3, 2017 at 6:39 PM, amul sul wrote: >> On Thu, Apr 27, 2017 at 1:42 AM, Robert Haas wrote: >> >>> >>> This is not yet a detailed review - I may be missing things, and >>> review and commentary from others is welcome. If there is no major >>> disagreement with the idea of moving forward using Amul's patch as a >>> base, then I will do a more detailed review of that patch (or, >>> hopefully, an updated version that addresses the above comments). >> > > I agree that Amul's approach makes dump/restore feasible whereas > Nagata-san's approach makes that difficult. That is a major plus point > about Amul's patch. Also, it makes it possible to implement > Nagata-san's syntax, which is more user-friendly in future. > > Here are some review comments after my initial reading of Amul's patch: > > Hash partitioning will partition the data based on the hash value of the > partition key. Does that require collation? Should we throw an error/warning > if > collation is specified in PARTITION BY clause? > > +int *indexes;/* Partition indexes; in case of hash > + * partitioned table array length will be > + * value of largest modulus, and for others > + * one entry per member of the datums array > + * (plus one if range partitioned table) */ > This may be rewritten as "Partition indexes: For hash partitioned table the > number of indexes will be same as the largest modulus. For list partitioned > table the number of indexes will be same as the number of datums. For range > partitioned table the number of indexes will be number of datums plus one.". > You may be able to reword it to a shorter version, but essentially we will > have > separate description for each strategy. > Okay, will fix this. > I guess, we need to change the comments for the other members too. For example > "datums" does not contain tuples with key->partnatts attributes for hash > partitions. 
It contains a tuple with two attributes, modulus and remainder. We > may not want to track null_index separately since rows with NULL partition key > will fit in the partition corresponding to the hash value of NULL. OR may be > we > want to set null_index to partition which contains NULL values, if there is a > partition created for corresponding remainder, modulus pair and set has_null > accordingly. Accordingly we will need to update the comments. > > cal_hash_value() may be renamed as calc_has_value() or compute_hash_value()? > Okay, will rename to compute_hash_value(). > Should we change the if .. else if .. construct in > RelationBuildPartitionDesc() > to a switch case? There's very less chance that we will support a fourth > partitioning strategy, so if .. else if .. may be fine. > > +intmod = hbounds[i]->modulus, > +place = hbounds[i]->remainder; > Although there are places in the code where we separate variable declaration > with same type by comma, most of the code declares each variable with the data > type on separate line. Should variable "place" be renamed as "remainder" since > that's what it is ultimately? > Okay, will rename "place" to "remainder". > RelationBuildPartitionDesc() fills up mapping array but never uses it. In this Agreed, mapping array is not that much useful but not useless, it required at the end of RelationBuildPartitionDesc() while assigning OIDs to result->oids, see for-loop just before releasing mapping memory. > code the index into mapping array itself is the mapping so it doesn't need to > be maintained separately like list partiioning case. Similary next_index usage > looks unnecessary, although that probably improves readability, so may be > fine. > Anyway, will remove uses of "next_index". 
> + * for p_p1: satisfies_hash_partition(2, 1, pkey, value) > + * for p_p2: satisfies_hash_partition(4, 2, pkey, value) > + * for p_p3: satisfies_hash_partition(8, 0, pkey, value) > + * for p_p4: satisfies_hash_partition(8, 4, pkey, value) > What the function builds is satisfies_hash_partition(2, 1, pkey). I don't see > code to add value as an argument to the function. Is that correct? > Sorry for confusion, "pkey" & "value" are the column of table in the give example. Renamed those column name to "a" & "b". > +intmodulus = DatumGetInt32(datum); > May be you want to rename this variable to greatest_modulus like in the other > places. > Okay, will fix this. > +Assert(spec->modulus > 0 && spec->remainder >= 0); > I liked this assertion. Do you want to add spec->modulus > spec->reminder also > here? > Okay, will add this too. > +char *strategy;/* partitioning strategy > + ('hash', 'list' or 'range') */ > > We need the second line to
Re: [HACKERS] Time based lag tracking for logical replication
On 11 May 2017 at 18:29, Simon Riggs wrote:
> On 11 May 2017 at 18:13, Andres Freund wrote:
>
>>> New patch, v3.
>>>
>>> Applying in 90 minutes, barring objections.
>>
>> Could you please wait till tomorrow? I've bigger pending fixes for related
>> code pending/being tested that I plan to push today. I'd also like to take
>> a look before...
>
> Sure. The changes I've added are very minor, so I'm not expecting debate. The main part of the patch is the same as Petr posted 19 days ago.

I'm travelling now, so after waiting till tomorrow as you requested I have committed the patch.

Cheers

--
Simon Riggs                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Re: [HACKERS] If subscription to foreign table valid ?
On 05/11/2017 07:13 PM, Petr Jelinek wrote:
> I think it does make sense to add a check for this into CREATE/ALTER
> SUBSCRIPTION though, so that the user is informed immediately about the
> mistake rather than by errors in the logs later.

+1. There are a few similar cases where the user does not get an error at the prompt, for instance:

-- when the publication does not exist
postgres=# create subscription sub connection 'dbname=postgres port=5000 user=centos password=a' publication nowhere;
NOTICE:  synchronized table states
NOTICE:  created replication slot "sub" on publisher
CREATE SUBSCRIPTION

-- no validation of the publication name in ALTER
postgres=# alter subscription sub set publication _ refresh;
ALTER SUBSCRIPTION

-- slot name given in ALTER
postgres=# alter subscription sub with (slot name='nowhere');
ALTER SUBSCRIPTION

-- and because of that, we are not able to drop it later
postgres=# drop subscription sub;
ERROR:  could not drop the replication slot "nowhere" on publisher
DETAIL:  The error was:  ERROR:  replication slot "nowhere" does not exist
postgres=#

--
regards, tushar
EnterpriseDB  https://www.enterprisedb.com/
The Enterprise PostgreSQL Company
Re: [HACKERS] UPDATE of partition key
On Fri, May 12, 2017 at 10:49 AM, Amit Khandekar wrote: > On 12 May 2017 at 08:30, Amit Kapila wrote: >> On Thu, May 11, 2017 at 5:41 PM, Amit Khandekar >> wrote: > >> If we try to compare it with the non-partitioned update, >> there also it is internally a delete and insert operation, but we >> don't fire triggers for those. > > For a non-partitioned table, the delete+insert is internal, whereas > for partitioned table, it is completely visible to the user. > If the user has executed an update on root table, then it is transparent. I think we can consider it user visible only in case if there is some user visible syntax like "Update ... Move Row If Constraint Not Satisfied" >> (b) It seems inconsistent to consider behavior for row and statement triggers differently >>> >>> I am not sure whether we should compare row and statement triggers. >>> Statement triggers are anyway fired only per-statement, depending upon >>> whether it is update or insert or delete. It has nothing to do with >>> how the rows are modified. >>> >> >> Okay. The broader point I was trying to convey was that the way this >> patch defines the behavior of triggers doesn't sound good to me. It >> appears to me that in this thread multiple people have raised points >> around trigger behavior and you should try to consider those. > > I understand that there is no single solution which will provide > completely intuitive trigger behaviour. Skipping BR delete trigger > should be fine. But then for consistency, we should skip BR insert > trigger as well, the theory being that the delete+insert are not fired > by the user so we should not fire them. But I feel both should be > fired to avoid any consequences unexpected to the user who has > installed those triggers. > > The only specific concern of yours is that of firing *both* update as > well as insert triggers on the same table, right ? My explanation for > this was : we have done this before for UPSERT, and we had documented > the same. 
We can do it here also. > >> Apart from the options, Robert has suggested, another option could be that >> we allow firing BR-AR update triggers for original partition and BR-AR >> insert triggers for the new partition. In this case, one can argue >> that we have not actually updated the row in the original partition, >> so there is no need to fire AR update triggers, > > Yes that's what I think. If there is no update happened, then AR > update trigger should not be executed. AR triggers are only for > scenarios where it is guaranteed that the DML operation has happened > when the trigger is being executed. > >> but I feel that is what we do for non-partitioned table update and it should >> be okay here >> as well. > > I don't think so. For e.g. if a BR trigger returns NULL, the update > does not happen, and then the AR trigger does not fire as well. Do you > see any other scenarios for non-partitioned tables, where AR triggers > do fire when the update does not happen ? > No, but here also it can be considered as an update for original partition. > > Overall, I am also open to skipping both insert+delete BR trigger, > I think it might be better to summarize all the options discussed including what the patch has and see what most people consider as sensible. -- With Regards, Amit Kapila. EnterpriseDB: http://www.enterprisedb.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] snapbuild woes
On 12/05/17 03:31, Andres Freund wrote: > On 2017-05-11 14:54:26 -0700, Andres Freund wrote: >> On 2017-05-11 14:51:55 -0700, wrote: >>> On 2017-05-08 00:10:12 -0700, Andres Freund wrote: I plan to commit the next pending patch after the back branch releases are cut - it's an invasive fix and the issue doesn't cause corruption "just" slow slot creation. So it seems better to wait for a few days, rather than hurry it into the release. >>> >>> Now that that's done, here's an updated version of that patch. Note the >>> new logic to trigger xl_running_xact's to be logged at the right spot. >>> Works well in my testing. >>> >>> I plan to commit this fairly soon, unless somebody wants a bit more time >>> to look into it. >>> >>> Unless somebody protests, I'd like to slightly revise how the on-disk >>> snapshots are stored on master - given the issues this bug/commit showed >>> with the current method - but I can see one could argue that that should >>> rather be done next release. >> >> As usual I forgot my attachement. > > This patch also seems to offer a way to do your 0005 in, possibly, more > efficient manner. We don't ever need to assume a transaction is > timetravelling anymore. Could you check whether the attached, hasty, > patch resolves the performance problems you measured? Also, do you have > a "reference" workload for that? > Hmm, well it helps but actually now that we don't track individual running transactions anymore it got much less effective (my version of 0005 does as well). 
The example workload I test with is: session 1: open transaction, do a write, keep it open session 2: pgbench -M simple -N -c 10 -P 1 -T 5 session 3: run CREATE_REPLICATION_SLOT LOGICAL in walsender session 2: pgbench -M simple -N -c 10 -P 1 -T 20 session 1: commit And wait for session 3 to finish slot creation, takes about 20 mins on my laptop without patches, minute and half with your patches for 0004 and 0005 (or with your 0004 and my 0005) and about 2s with my original 0004 and 0005. What makes it slow is the constant qsorting IIRC. -- Petr Jelinek http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
[HACKERS] Re: [doc fix] PG10: wrong description on connect_timeout when multiple hosts are specified
From: pgsql-hackers-ow...@postgresql.org [mailto:pgsql-hackers-ow...@postgresql.org] On Behalf Of Tsunakawa, Takayuki
> I found a wrong sentence here in the doc. I'm sorry, this is what I asked
> you to add...
>
> https://www.postgresql.org/docs/devel/static/libpq-connect.html#libpq-paramkeywords
>
> connect_timeout
> Maximum wait for connection, in seconds (write as a decimal integer
> string). Zero or not specified means wait indefinitely. It is not
> recommended to use a timeout of less than 2 seconds. This timeout applies
> separately to each connection attempt. For example, if you specify two
> hosts and both of them are unreachable, and connect_timeout is 5, the
> total time spent waiting for a connection might be up to 10 seconds.
>
> The program behavior is that libpq times out after connect_timeout
> seconds regardless of how many hosts are specified. I confirmed it like
> this:
>
> $ export PGOPTIONS="-c post_auth_delay=30"
> $ psql -d "dbname=postgres connect_timeout=5" -h localhost,localhost -p 5432,5433
> (psql errors out after 5 seconds)
>
> Could you fix the doc with something like this?
>
> "This timeout applies across all the connection attempts. For example, if
> you specify two hosts and both of them are unreachable, and
> connect_timeout is 5, the total time spent waiting for a connection is up
> to 5 seconds."
>
> Should we also change the minimum "2 seconds" part to be longer,
> according to the number of hosts?

Instead, I think we should fix the program to match the documented
behavior. Otherwise, if the first database machine is down, libpq might
wait for about 2 hours (depending on the OS's TCP keepalive setting),
during which it times out after connect_timeout and does not attempt to
connect to the other hosts. I'll add this item to the PostgreSQL 10 Open
Items.
Regards
Takayuki Tsunakawa
Re: [HACKERS] [bug fix] PG10: libpq doesn't connect to alternative hosts when some errors occur
From: Michael Paquier [mailto:michael.paqu...@gmail.com]
> It seems to me that the feature is behaving as wanted. Or, in short,
> attempt to connect to the next host only if a connection cannot be
> established. If there is a failure once the exchange with the server has
> begun, just consider it a hard failure. This is an important property for
> authentication and SSL connection failures, actually.

But PgJDBC behaves as expected -- it attempts another connection to the
other hosts (and succeeds). I believe that's what users would naturally
expect. The current libpq implementation handles only the socket-level
connect failure.

Regards
Takayuki Tsunakawa
Re: [HACKERS] CTE inlining
> Just to see what other RDBMSs are doing with CTEs, look at slide 31 here:
> https://www.percona.com/live/17/sites/default/files/slides/Recursive%20Query%20Throwdown.pdf

That is taken from Markus Winand's post:
https://twitter.com/MarkusWinand/status/852862475699707904

"Seems like MySQL is getting the best WITH support of all tested DBs
(RECURSIVE not tested!)"
Re: [HACKERS] Get stuck when dropping a subscription during synchronizing table
On Fri, May 12, 2017 at 11:24 AM, Masahiko Sawada wrote:
> On Thu, May 11, 2017 at 6:16 PM, Petr Jelinek wrote:
>> On 11/05/17 10:10, Masahiko Sawada wrote:
>>> On Thu, May 11, 2017 at 4:06 PM, Michael Paquier wrote:
>>>> On Wed, May 10, 2017 at 11:57 AM, Masahiko Sawada wrote:
>>>>> Barring any objections, I'll add these two issues to open item.
>>>>
>>>> It seems to me that those open items have not been added yet to the
>>>> list. If I am following correctly, they could be defined as follows:
>>>> - Dropping subscription may get stuck if done during tablesync.
>>>> -- Analyze deadlock issues with DROP SUBSCRIPTION and apply worker
>>>> process.
>>
>> I think the solution to this is to reintroduce the LWLock that was
>> removed and replaced with the exclusive lock on catalog [1]. I am afraid
>> that the correct way of handling this is to take both the LWLock and the
>> catalog lock (first the LWLock, under which we kill the workers, and
>> then the catalog lock, so that something that prevents the launcher from
>> restarting them is held till the end of the transaction).
>
> I agree to reintroduce the LWLock and to stop the logical rep worker
> first and then modify the catalog. That way we can reduce the catalog
> lock level (maybe to RowExclusiveLock) so that the apply worker can see
> it. Also, I think we need to do more in order to avoid keeping the
> LWLock until end of transaction, because holding an LWLock until end of
> transaction is not a good idea and could be a cause of deadlock. So, for
> example, we can commit the transaction in DropSubscription after having
> cleaned the pg_subscription record and all its dependencies, and then
> start a new transaction for the remaining work. Of course, we also need
> to disallow DROP SUBSCRIPTION being executed in a user transaction
> though.

Attached are two draft patches to solve these issues.

The attached 0001 patch reintroduces LogicalRepLauncherLock and makes
DROP SUBSCRIPTION keep holding it until commit.
To prevent the deadlock possibility, I disallowed DROP SUBSCRIPTION being
called in a transaction block, but there might be a more sensible solution
for this. Please give me feedback.

>>>> -- Avoid orphaned tablesync worker if apply worker exits before
>>>> changing its status.
>>
>> The behavior question I have about this is whether sync workers should
>> die when the apply worker dies (i.e., they are tied to the apply worker)
>> or whether they should be tied to the subscription.
>>
>> I guess taking down all the sync workers when the apply worker has
>> exited is easier to solve. Of course, it means that if the apply worker
>> restarts in the middle of table synchronization, the table
>> synchronization will have to start from scratch. That being said, in
>> normal operation the apply worker should only exit/restart if the
>> subscription has changed or has been dropped/disabled, and I think sync
>> workers want to exit/restart in that situation as well.
>
> I agree that sync workers are tied to the apply worker.
>
>> So, for example, having a shmem detach hook for the apply worker (or
>> reusing the existing one) that searches for all the other workers for
>> the same subscription and shuts them down as well sounds like a solution
>> to this.
>
> Seems like a reasonable solution.

Regards,

--
Masahiko Sawada
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center

0001-Fix-a-deadlock-bug-between-DROP-SUBSCRIPTION-and-app.patch
Description: Binary data

0002-Wait-for-table-sync-worker-to-finish-when-apply-work.patch
Description: Binary data
Re: [HACKERS] [POC] hash partitioning
On 2017/05/12 14:24, Ashutosh Bapat wrote:
> On Fri, May 12, 2017 at 8:08 AM, Amit Langote wrote:
>> On 2017/05/12 11:20, Robert Haas wrote:
>>> Yeah, but I have a feeling that marking the columns NOT NULL is going
>>> to make it really hard to support that in the future when we get the
>>> syntax hammered out. If it had only affected the partition
>>> constraints, that'd be different.
>>
>> So, adding keycol IS NOT NULL (like we currently do for expressions) in
>> the implicit partition constraint would be more future-proof than
>> generating an actual catalogued NOT NULL constraint on the keycol? I now
>> tend to think it would be better. Directly inserting into a range
>> partition with a NULL value for a column currently generates a "null
>> value in column \"%s\" violates not-null constraint" instead of the
>> perhaps more relevant "new row for relation \"%s\" violates partition
>> constraint". That said, we *do* document the fact that a NOT NULL
>> constraint is added on range key columns, but we might as well document
>> instead that we don't currently support routing tuples with NULL values
>> in the partition key through a range-partitioned table, and so NULL
>> values cause an error.
>
> In get_partition_for_tuple() we have
>
>     if (key->strategy == PARTITION_STRATEGY_RANGE)
>     {
>         /* Disallow nulls in the range partition key of the tuple */
>         for (i = 0; i < key->partnatts; i++)
>             if (isnull[i])
>                 ereport(ERROR,
>                         (errcode(ERRCODE_NULL_VALUE_NOT_ALLOWED),
>                          errmsg("range partition key of row contains null")));
>     }
>
> Instead of throwing an error here, we should probably return -1 and
> let the error be "no partition of relation \"%s\" found for row",
> which is the real error: not having a partition which can accept NULL.
> If in future we decide to support NULL values in partition keys, we
> just need to remove the above code from get_partition_for_tuple() and
> everything will work as is.
> I am assuming that we don't add any implicit/explicit NOT NULL
> constraint right now.

We *do* actually, for real columns:

create table p (a int) partition by range (a);
\d p
                 Table "public.p"
 Column |  Type   | Collation | Nullable | Default
--------+---------+-----------+----------+---------
 a      | integer |           | not null |
Partition key: RANGE (a)

For expression keys, we emit IS NOT NULL as part of the implicit partition
constraint. The above check for NULL is really for the expressions,
because if any simple columns of the key contain NULL, they will fail the
NOT NULL constraint itself (with that error message).

As I said in my previous message, I'm thinking that emitting IS NOT NULL
as part of the implicit partition constraint might be better than adding
it as a NOT NULL constraint, that is, for the simple column keys; we
already do that for the expression keys, for which we cannot add the NOT
NULL constraint anyway. The way things are currently, the error messages
generated when a row with NULL in the range partition key is inserted
*directly* into the partition look a bit inconsistent, depending on
whether the target key is a simple column or an expression:

create table p (a int, b int) partition by range (a, abs(b));
create table p1 partition of p for values from (1, 1) to (1, 10);

insert into p1 values (NULL, NULL);
ERROR:  null value in column "a" violates not-null constraint
DETAIL:  Failing row contains (null, null).

insert into p1 values (1, NULL);
ERROR:  new row for relation "p1" violates partition constraint
DETAIL:  Failing row contains (1, null).

It would be nice if both said "violates partition constraint".

BTW, note that this is independent of your suggestion to emit the
"partition not found" message instead of the "no NULLs allowed in the
range partition key" message, which seems fine to me to implement.

Thanks,
Amit