Re: [HACKERS] expanding inheritance in partition bound order

2017-08-06 Thread Ashutosh Bapat
On Fri, Aug 4, 2017 at 10:55 PM, Robert Haas  wrote:
> On Fri, Aug 4, 2017 at 3:38 AM, Amit Langote
>  wrote:
>> The current way to expand inherited tables, including partitioned tables,
>> is to use either find_all_inheritors() or find_inheritance_children()
>> depending on the context.  They return child table OIDs in the (ascending)
>> order of those OIDs, which means the callers that need to lock the child
>> tables can do so without worrying about the possibility of deadlock in
>> some concurrent execution of that piece of code.  That's good.
>>
>> For partitioned tables, there is a possibility of returning child table
>> (partition) OIDs in the partition bound order, which in addition to
>> preventing the possibility of deadlocks during concurrent locking, seems
>> potentially useful for other caller-specific optimizations.  For example,
>> tuple-routing code can utilize that fact to implement binary-search based
>> partition-searching algorithm.  For one more example, refer to the "UPDATE
>> partition key" thread where it's becoming clear that it would be nice if
>> the planner had put the partitions in bound order in the ModifyTable that
>> it creates for UPDATE of partitioned tables [1].
>
> I guess I don't really understand why we want to change the locking
> order.  That is bound to make expanding the inheritance hierarchy more
> expensive. If we use this approach in all cases, it seems to me we're
> bound to reintroduce the problem we fixed in commit
> c1e0e7e1d790bf18c913e6a452dea811e858b554 and maybe add a few more in
> the same vein.

I initially didn't understand this, but I think I understand it now.
Establishing the order of children by partition bounds requires
building the relcache entry up front. That's what is expensive and
would reintroduce the problems fixed by the commit you quoted.
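
For concreteness, a rough sketch of the difference using PG 10-era backend
calls (illustrative only, not code from the patch):

/* OID order needs only a pg_inherits scan; no relcache entry is built */
List       *children = find_inheritance_children(parentOid, lockmode);

/*
 * Bound order requires opening the parent so that its PartitionDesc,
 * whose oids[] array is bound-ordered, becomes available.
 */
Relation      parent = heap_open(parentOid, lockmode);
PartitionDesc pdesc  = RelationGetPartitionDesc(parent);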

> But I don't see that there's any necessary relation
> between the order of locking and the order of expansion inside the
> relcache entry/plan/whatever else -- so I propose that we keep the
> existing locking order and only change the other stuff.
>
> While reading related code this morning, I noticed that
> RelationBuildPartitionDesc and RelationGetPartitionDispatchInfo have
> *already* changed the locking order for certain operations, because
> the PartitionDesc's OID array is bound-ordered, not OID-ordered.  That
> means that when RelationGetPartitionDispatchInfo uses the
> PartitionDesc's OID array to figure out what to lock, it's potentially
> going to encounter partitions in a different order than would have
> been the case if it had used find_all_inheritors directly.  I'm
> tempted to think that RelationGetPartitionDispatchInfo shouldn't
> really be doing locking at all.  The best way to have the locking
> always happen in the same order is to have only one piece of code that
> determines that order - and I vote for find_all_inheritors.  Aside
> from the fact that it's the way we've always done it (and still do it
> in most other places), that code includes infinite-loop defenses which
> the new code you've introduced lacks.
>

+1.

> Concretely, my proposal is:
>
> 1. Before calling RelationGetPartitionDispatchInfo, the calling code
> should use find_all_inheritors to lock all the relevant relations (or
> the planner could use find_all_inheritors to get a list of relation
> OIDs, store it in the plan in order, and then at execution time we
> visit them in that order and lock them).
>
> 2. RelationGetPartitionDispatchInfo assumes the relations are already locked.
>
> 3. While we're optimizing, in the first loop inside of
> RelationGetPartitionDispatchInfo, don't call heap_open().  Instead,
> use get_rel_relkind() to see whether we've got a partitioned table; if
> so, open it.  If not, there's no need.
>
> 4. For safety, add a function bool RelationLockHeldByMe(Oid) and add
> to this loop a check if (!RelationLockHeldByMe(lfirst_oid(lc1))
> elog(ERROR, ...).  Might be interesting to stuff that check into the
> relation_open(..., NoLock) path, too.
>
> One objection to this line of attack is that there might be a good
> case for locking only the partitioned inheritors first and then going
> back and locking the leaf nodes in a second pass, or even only when
> required for a particular row.  However, that doesn't require putting
> everything in bound order - it only requires moving the partitioned
> children to the beginning of the list.  And I think rather than having
> new logic for that, we should teach find_inheritance_children() to do
> that directly.  I have a feeling Ashutosh is going to cringe at this
> suggestion, but my idea is to do this by denormalizing: add a column
> to pg_inherits indicating whether the child is of
> RELKIND_PARTITIONED_TABLE.  Then, when find_inheritance_children scans
> pg_inherits, it can pull that flag out for free along with the
> relation OID, and qsort() first by the flag and then by the OID.  It
> can 

Re: [HACKERS] expanding inheritance in partition bound order

2017-08-06 Thread Amit Langote
On 2017/08/04 20:28, Ashutosh Bapat wrote:
> On Fri, Aug 4, 2017 at 1:08 PM, Amit Langote
>  wrote:
>> The current way to expand inherited tables, including partitioned tables,
>> is to use either find_all_inheritors() or find_inheritance_children()
>> depending on the context.  They return child table OIDs in the (ascending)
>> order of those OIDs, which means the callers that need to lock the child
>> tables can do so without worrying about the possibility of deadlock in
>> some concurrent execution of that piece of code.  That's good.
>>
>> For partitioned tables, there is a possibility of returning child table
>> (partition) OIDs in the partition bound order, which in addition to
>> preventing the possibility of deadlocks during concurrent locking, seems
>> potentially useful for other caller-specific optimizations.  For example,
>> tuple-routing code can utilize that fact to implement binary-search based
>> partition-searching algorithm.  For one more example, refer to the "UPDATE
>> partition key" thread where it's becoming clear that it would be nice if
>> the planner had put the partitions in bound order in the ModifyTable that
>> it creates for UPDATE of partitioned tables [1].
> 
> Thanks a lot for working on this. Partition-wise join can benefit from
> this as well. See comment about build_simple_rel's matching algorithm
> in [1]. It will become O(n) instead of O(n^2).

Nice.  It seems that we have a good demand for $subject. :)

>> So attached are two WIP patches:
>>
>> 0001 implements two interface functions:
>>
>>   List *get_all_partition_oids(Oid, LOCKMODE)
>>   List *get_partition_oids(Oid, LOCKMODE)
>>
>> that resemble find_all_inheritors() and find_inheritance_children(),
>> respectively, but expect that users call them only for partitioned tables.
>>  Needless to mention, OIDs are returned in a canonical order determined by
>> that of the partition bounds and the way the partition tree structure is
>> traversed (top-down, breadth-first, left-to-right).  The patch replaces all the
>> calls of the old interface functions with the respective new ones for
>> partitioned table parents.  That means expand_inherited_rtentry (among
>> others) now calls get_all_partition_oids() if the RTE is for a partitioned
>> table and find_all_inheritors() otherwise.
>>
>> In its implementation, get_all_partition_oids() calls
>> RelationGetPartitionDispatchInfo(), which is useful to generate the result
>> list in the desired partition bound order.  But the current interface and
>> implementation of RelationGetPartitionDispatchInfo() needs some rework,
>> because it's too closely coupled with the executor's tuple routing code.
> 
> Maybe we should implement get_all_partition_oids() by calling
> get_partition_oids() on each new entry that gets added, similar to
> find_all_inheritors(). That might avoid changes to
> RelationGetPartitionDispatchInfo() and also the dependency on that structure.
> 
> Also, almost every place that called find_all_inheritors() or
> find_inheritance_children() is changed to an if () else block calling
> either those functions or the new function as required. Maybe we should
> create macros/functions to do that routing to keep the code readable.
> Maybe find_all_inheritors() and find_inheritance_children() themselves
> could become the routing functions, with their original code moving
> into new functions get_all_inheritors() and get_inheritance_children().
> We may choose other names for the functions. The idea is to create
> routing functions/macros instead of sprinkling the code with if ()
> else conditions.

Given Robert's comments [1], it seems that we might have to abandon the
idea of a separate interface for the partitioned and non-partitioned
inheritance cases.  I'm thinking about the issues and alternatives he
mentioned in his email.

Thanks,
Amit

[1]
https://www.postgresql.org/message-id/CA%2BTgmobwbh12OJerqAGyPEjb_%2B2y7T0nqRKTcjed6L4NTET6Fg%40mail.gmail.com





Re: [HACKERS] expanding inheritance in partition bound order

2017-08-06 Thread Amit Khandekar
On 4 August 2017 at 22:55, Robert Haas  wrote:
>
> 1. Before calling RelationGetPartitionDispatchInfo, the calling code
> should use find_all_inheritors to lock all the relevant relations (or
> the planner could use find_all_inheritors to get a list of relation
> OIDs, store it in the plan in order, and then at execution time we
> visit them in that order and lock them).
>
> 2. RelationGetPartitionDispatchInfo assumes the relations are already locked.

I agree. Overall, I think we should keep
RelationGetPartitionDispatchInfo() only for preparing the dispatch info
in the planner, and generate the locked OIDs (using
find_all_inheritors() or get_partitioned_oids() or whatever) *without*
using RelationGetPartitionDispatchInfo(), since that function builds the
PartitionDispatch structure, which we don't want in every expansion.

>
> 3. While we're optimizing, in the first loop inside of
> RelationGetPartitionDispatchInfo, don't call heap_open().  Instead,
> use get_rel_relkind() to see whether we've got a partitioned table; if
> so, open it.  If not, there's no need.

Yes, this way we need to open only the partitioned tables.
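
For reference, a minimal sketch of the shape this could take, assuming PG
10-era helpers such as get_rel_relkind() and heap_open(); this is not the
actual RelationGetPartitionDispatchInfo() code:

static void
collect_partitioned_children(List *inhOIDs)
{
    ListCell   *lc;

    foreach(lc, inhOIDs)
    {
        Oid         inhrelid = lfirst_oid(lc);
        Relation    partrel;

        /* syscache lookup only; no relcache entry is built for leaves */
        if (get_rel_relkind(inhrelid) != RELKIND_PARTITIONED_TABLE)
            continue;

        /* assumed already locked by the caller (point 2), hence NoLock */
        partrel = heap_open(inhrelid, NoLock);

        /* ... gather this table's PartitionDesc/PartitionKey details ... */

        heap_close(partrel, NoLock);
    }
}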

> P.S. While I haven't reviewed 0002 in detail, I think the concept of
> minimizing what needs to be built in RelationGetPartitionDispatchInfo
> is a very good idea.

True. I also think RelationGetPartitionDispatchInfo() should be called
while preparing the ModifyTable plan; the PartitionDispatch data
structure it returns should be stored in that plan, and the
execution-time fields in PartitionDispatch would then be populated in
ExecInitModifyTable().


-- 
Thanks,
-Amit Khandekar
EnterpriseDB Corporation
The Postgres Database Company




Re: [HACKERS] [TRAP: FailedAssertion] causing server to crash

2017-08-06 Thread Thomas Munro
On Thu, Aug 3, 2017 at 3:03 AM, Robert Haas  wrote:
> On Fri, Jul 21, 2017 at 1:31 AM, Thomas Munro
>  wrote:
>> Thanks Neha.  It'd be best to post the backtrace and, if possible,
>> print oldestXact and ShmemVariableCache->oldestXid from the stack
>> frame for TruncateCLOG.
>>
>> The failing assertion in TruncateCLOG() has a comment that says
>> "vac_truncate_clog already advanced oldestXid", but vac_truncate_clog
>> calls SetTransactionIdLimit() to write ShmemVariableCache->oldestXid
>> *after* it calls TruncateCLOG().  What am I missing here?
>
> This problem was introduced by commit
> ea42cc18c35381f639d45628d792e790ff39e271, so this should be added to
> the PostgreSQL 10 open items list. That commit intended to introduce a
> distinction between (1) the oldest XID that can be safely examined and
> (2) the oldest XID that can't yet be safely reused.  These are the
> same except when we're in the middle of truncating CLOG: (1) advances
> before the truncation, and (2) advances afterwards. That's why
> AdvanceOldestClogXid() happens before truncation proper and
> SetTransactionIdLimit() happens afterwards, and changing the order
> would, I think, be quite wrong.

Added to open items.

> AFAICS, that assertion is simply a holdover from an earlier version of
> the patch that escaped review.  There's just no reason to suppose that
> it's true.

Craig, are you planning to post a patch to remove the assertion, or
make it hold?

>> What actually prevents ShmemVariableCache->oldestXid from going
>> backwards anyway?  Suppose there are two or more autovacuum processes
>> that reach vac_truncate_clog() concurrently.  They do a scan of
>> pg_database whose tuples they access without locking through a
>> pointer-to-volatile because they expect concurrent in-place writers,
>> come up with a value for frozenXID, and then arrive at
>> SetTransactionIdLimit() in whatever order and clobber
>> ShmemVariableCache->oldestXid.  What am I missing here?
>
> Hmm, there could be a bug there, but I don't think it's *this* bug.

I'm not sure that it's wrong per se, as long as no one asserts that the
number can't go backwards...

-- 
Thomas Munro
http://www.enterprisedb.com




Re: [HACKERS] Small code improvement for btree

2017-08-06 Thread Masahiko Sawada
On Sat, Aug 5, 2017 at 3:29 AM, Peter Geoghegan  wrote:
> On Fri, Aug 4, 2017 at 11:12 AM, Alvaro Herrera
>  wrote:
>> Interesting.  We learned elsewhere that it's better to integrate the
>> "!= 0" test as part of the macro definition; so a
>> better formulation of this patch would be to change the
>> P_INCOMPLETE_SPLIT macro and omit the comparison in the Assert.  (See
>> commit 594e61a1de03 for an example).

Thank you for the information. The macros other than P_INCOMPLETE_SPLIT
in nbtree.h, such as P_ISLEAF and P_ISROOT, also don't return booleans.
Should we deal with them as well?
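
For context, the suggested style would look roughly like this (a sketch based
on the flag bits in src/include/access/nbtree.h, not a committed patch):

/*
 * Today these macros expand to a masked integer, e.g.
 *     #define P_ISLEAF(opaque)  ((opaque)->btpo_flags & BTP_LEAF)
 * Folding the comparison into the macro makes them genuinely boolean, so
 * call sites can write Assert(P_INCOMPLETE_SPLIT(opaque)) directly.
 */
#define P_ISLEAF(opaque)            (((opaque)->btpo_flags & BTP_LEAF) != 0)
#define P_ISROOT(opaque)            (((opaque)->btpo_flags & BTP_ROOT) != 0)
#define P_INCOMPLETE_SPLIT(opaque)  (((opaque)->btpo_flags & BTP_INCOMPLETE_SPLIT) != 0)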

>>
>>
>>> -   LockBuffer(hbuffer, BUFFER_LOCK_SHARE);
>>> +   LockBuffer(hbuffer, BT_READ);
>
> +1.
>
> One Linus Torvalds rant that I actually agreed with was a rant against
> the use of bool as a type in C code. It's fine, as long as you never
> forget that it's actually just another integer.
>
>> I think BT_READ and BT_WRITE are useless, and I'd rather get rid of
>> them ...
>
> Fair enough, but we should either use them consistently or not at all.
> I'm not especially concerned about which, as long as it's one of those
> two.
>

I definitely agree.

Regards,

--
Masahiko Sawada
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center




Re: [HACKERS] Subscription code improvements

2017-08-06 Thread Masahiko Sawada
On Sun, Aug 6, 2017 at 7:44 AM, Noah Misch  wrote:
> On Wed, Aug 02, 2017 at 04:09:43PM -0400, Peter Eisentraut wrote:
>> On 8/1/17 00:17, Noah Misch wrote:
>> > The above-described topic is currently a PostgreSQL 10 open item.  Peter,
>> > since you committed the patch believed to have created it, you own this
>> > open item.  If some other commit is more relevant or if this does not
>> > belong as a v10 open item, please let us know.  Otherwise, please observe
>> > the policy on open item ownership[1] and send a status update within three
>> > calendar days of this message.  Include a date for your subsequent status
>> > update.  Testers may discover new open items at any time, and I want to
>> > plan to get them all fixed well in advance of shipping v10.  Consequently,
>> > I will appreciate your efforts toward speedy resolution.  Thanks.
>>
>> I'm looking into this now and will report by Friday.
>
> This PostgreSQL 10 open item is past due for your status update.  Kindly send
> a status update within 24 hours, and include a date for your subsequent status
> update.  Refer to the policy on open item ownership:
> https://www.postgresql.org/message-id/20170404140717.GA2675809%40tornado.leadboat.com

I think this open item has been closed by commit
7e174fa793a2df89fe03d002a5087ef67abcdde8?

Regards,

--
Masahiko Sawada
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center




Re: [HACKERS] Subscription code improvements

2017-08-06 Thread Masahiko Sawada
On Sat, Aug 5, 2017 at 10:29 AM, Peter Eisentraut wrote:
> On 8/4/17 12:02, Masahiko Sawada wrote:
>> To make ALTER SUBSCRIPTION REFRESH transactional, I prefer Petr's
>> proposal, because it makes ALTER SUBSCRIPTION and DROP SUBSCRIPTION
>> stop the table sync workers that are in the middle of copying data.
>> I'm not sure killing table sync workers in DDL commands would be
>> acceptable, but since it can free up unneeded logical replication
>> worker slots and replication slots, I'd prefer this idea.
>
> OK, I have committed the 0004 patch.

Thank you!

>
>> However, even with this patch there is still an issue that NOTICE
>> messages like "removed subscription for table public.t1" can appear
>> even if we roll back the ALTER SUBSCRIPTION REFRESH command, as I
>> mentioned on an earlier thread. Since I think this behaviour will
>> confuse users who check the server logs, I'd like to deal with it,
>> though I don't have a good idea how.
>
> Maybe we can just remove those messages?
>
> We don't get messages when we create a subscription about which tables
> are part of it.  So why do we need such messages when we refresh a
> subscription?

I think those messages are useful when we add/remove tables to/from the
publication and then refresh the subscription, so we might want to
change them to DEBUG rather than eliminate them.

Regards,

--
Masahiko Sawada
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center




Re: [HACKERS] Tuple-routing for certain partitioned tables not working as expected

2017-08-06 Thread Amit Langote
Fujita-san,

On 2017/08/07 12:45, Etsuro Fujita wrote:
> Hi,
> 
> I noticed that tuple-routing for partitioned tables that contain
> non-writable foreign partitions doesn't work as expected.  Here is an
> example:
> 
> postgres=# create extension file_fdw;
> postgres=# create server file_server foreign data wrapper file_fdw;
> postgres=# create user mapping for CURRENT_USER server file_server;
> postgres=# create table p (a int) partition by list (a);
> postgres=# create foreign table t1 partition of p for values in (1) server
> file_server options (format 'csv', filename '/path/to/file', delimiter ',');
> postgres=# create table t2 partition of p for values in (2);
> postgres=# insert into p values (1);
> ERROR:  cannot insert into foreign table "t1"
> 
> Looks good, but:
> 
> postgres=# insert into p values (2);
> ERROR:  cannot insert into foreign table "t1"
> 
> The insert should work but doesn't.  (It also seems odd to me that the
> error message points to t1, not t2.)  The reason for that is
> CheckValidResultRel in ExecSetupPartitionTupleRouting aborts any insert
> into a partitioned table in the case where the partitioned table contains
> at least one foreign partition into which the FDW can't insert.  I don't
> think that that is intentional behavior, so I'd like to propose to fix
> that by skipping CheckValidResultRel for foreign partitions because we can
> abort an insert into a foreign partition after ExecFindPartition in
> ExecInsert, by checking to see if resultRelInfo->ri_FdwRoutine is not
> NULL.  Attached is a proposed patch for that.  Since COPY FROM has the
> same issue, I added regression tests for COPY FROM as well as INSERT to
> file_fdw.

Thanks for reporting this.  All the issues you mention here seem valid to me.

So, once we add support in general for foreign partition tuple-routing,
we'd still get the same error ("cannot route to foreign partitions") in
the file_fdw's case, but only because
resultRelInfo->ri_FdwRoutine->ExecForeignInsert will be NULL for file_fdw.

The patch looks good too.  Although, looking at the following hunk:

+   Assert(partrel->rd_rel->relkind == RELKIND_RELATION ||
+  partrel->rd_rel->relkind == RELKIND_FOREIGN_TABLE);
+
/*
 * Verify result relation is a valid target for the current 
operation.
 */
!   if (partrel->rd_rel->relkind == RELKIND_RELATION)
!   CheckValidResultRel(partrel, CMD_INSERT);

makes me now wonder if we need the CheckValidResultRel check at all.  The
only check currently in place for RELKIND_RELATION is
CheckCmdReplicaIdentity(), which is a noop (returns true) for inserts anyway.
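
For readers following along, the proposal boils down to something like this
sketch (PG 10-era function names; an illustration of the approach, not the
patch itself):

/* In ExecSetupPartitionTupleRouting: only plain tables are checked here */
if (partrel->rd_rel->relkind == RELKIND_RELATION)
    CheckValidResultRel(partrel, CMD_INSERT);

/*
 * Later, in ExecInsert(), after ExecFindPartition() has chosen the target
 * partition, routing into a foreign table is rejected; file_fdw would land
 * here because it has no usable insert path.
 */
if (resultRelInfo->ri_FdwRoutine != NULL)
    ereport(ERROR,
            (errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
             errmsg("cannot route inserted tuples to a foreign table")));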

Thanks,
Amit





[HACKERS] Tuple-routing for certain partitioned tables not working as expected

2017-08-06 Thread Etsuro Fujita

Hi,

I noticed that tuple-routing for partitioned tables that contain 
non-writable foreign partitions doesn't work as expected.  Here is an 
example:


postgres=# create extension file_fdw;
postgres=# create server file_server foreign data wrapper file_fdw;
postgres=# create user mapping for CURRENT_USER server file_server;
postgres=# create table p (a int) partition by list (a);
postgres=# create foreign table t1 partition of p for values in (1) 
server file_server options (format 'csv', filename '/path/to/file', 
delimiter ',');

postgres=# create table t2 partition of p for values in (2);
postgres=# insert into p values (1);
ERROR:  cannot insert into foreign table "t1"

Looks good, but:

postgres=# insert into p values (2);
ERROR:  cannot insert into foreign table "t1"

The insert should work but doesn't.  (It also seems odd to me that the 
error message points to t1, not t2.)  The reason for that is 
CheckValidResultRel in ExecSetupPartitionTupleRouting aborts any insert 
into a partitioned table in the case where the partitioned table 
contains at least one foreign partition into which the FDW can't insert. 
 I don't think that that is intentional behavior, so I'd like to 
propose to fix that by skipping CheckValidResultRel for foreign 
partitions because we can abort an insert into a foreign partition after 
ExecFindPartition in ExecInsert, by checking to see if 
resultRelInfo->ri_FdwRoutine is not NULL.  Attached is a proposed patch 
for that.  Since COPY FROM has the same issue, I added regression tests 
for COPY FROM as well as INSERT to file_fdw.


Best regards,
Etsuro Fujita
*** /dev/null
--- b/contrib/file_fdw/data/list1.csv
***
*** 0 
--- 1,2 
+ 1,foo
+ 1,bar
*** /dev/null
--- b/contrib/file_fdw/data/list2.bad
***
*** 0 
--- 1,2 
+ 2,baz
+ 1,qux
*** /dev/null
--- b/contrib/file_fdw/data/list2.csv
***
*** 0 
--- 1,2 
+ 2,baz
+ 2,qux
*** a/contrib/file_fdw/input/file_fdw.source
--- b/contrib/file_fdw/input/file_fdw.source
***
*** 162,167  SELECT tableoid::regclass, * FROM agg FOR UPDATE;
--- 162,188 
  ALTER FOREIGN TABLE agg_csv NO INHERIT agg;
  DROP TABLE agg;
  
+ -- declarative partitioning tests
+ SET ROLE regress_file_fdw_superuser;
+ CREATE TABLE pt (a int, b text) partition by list (a);
+ CREATE FOREIGN TABLE p1 partition of pt for values in (1) SERVER file_server
+ OPTIONS (format 'csv', filename '@abs_srcdir@/data/list1.csv', delimiter ',');
+ CREATE TABLE p2 partition of pt for values in (2);
+ SELECT tableoid::regclass, * FROM pt;
+ SELECT tableoid::regclass, * FROM p1;
+ SELECT tableoid::regclass, * FROM p2;
+ COPY pt FROM '@abs_srcdir@/data/list2.bad' with (format 'csv', delimiter ','); -- ERROR
+ COPY pt FROM '@abs_srcdir@/data/list2.csv' with (format 'csv', delimiter ',');
+ SELECT tableoid::regclass, * FROM pt;
+ SELECT tableoid::regclass, * FROM p1;
+ SELECT tableoid::regclass, * FROM p2;
+ INSERT INTO pt VALUES (1, 'xyzzy'); -- ERROR
+ INSERT INTO pt VALUES (2, 'xyzzy');
+ SELECT tableoid::regclass, * FROM pt;
+ SELECT tableoid::regclass, * FROM p1;
+ SELECT tableoid::regclass, * FROM p2;
+ DROP TABLE pt;
+ 
  -- privilege tests
  SET ROLE regress_file_fdw_superuser;
  SELECT * FROM agg_text ORDER BY a;
*** a/contrib/file_fdw/output/file_fdw.source
--- b/contrib/file_fdw/output/file_fdw.source
***
*** 289,294  SELECT tableoid::regclass, * FROM agg FOR UPDATE;
--- 289,375 
  
  ALTER FOREIGN TABLE agg_csv NO INHERIT agg;
  DROP TABLE agg;
+ -- declarative partitioning tests
+ SET ROLE regress_file_fdw_superuser;
+ CREATE TABLE pt (a int, b text) partition by list (a);
+ CREATE FOREIGN TABLE p1 partition of pt for values in (1) SERVER file_server
+ OPTIONS (format 'csv', filename '@abs_srcdir@/data/list1.csv', delimiter ',');
+ CREATE TABLE p2 partition of pt for values in (2);
+ SELECT tableoid::regclass, * FROM pt;
+  tableoid | a |  b  
+ --+---+-
+  p1   | 1 | foo
+  p1   | 1 | bar
+ (2 rows)
+ 
+ SELECT tableoid::regclass, * FROM p1;
+  tableoid | a |  b  
+ --+---+-
+  p1   | 1 | foo
+  p1   | 1 | bar
+ (2 rows)
+ 
+ SELECT tableoid::regclass, * FROM p2;
+  tableoid | a | b 
+ --+---+---
+ (0 rows)
+ 
+ COPY pt FROM '@abs_srcdir@/data/list2.bad' with (format 'csv', delimiter ','); -- ERROR
+ ERROR:  cannot route inserted tuples to a foreign table
+ CONTEXT:  COPY pt, line 2: "1,qux"
+ COPY pt FROM '@abs_srcdir@/data/list2.csv' with (format 'csv', delimiter ',');
+ SELECT tableoid::regclass, * FROM pt;
+  tableoid | a |  b  
+ --+---+-
+  p1   | 1 | foo
+  p1   | 1 | bar
+  p2   | 2 | baz
+  p2   | 2 | qux
+ (4 rows)
+ 
+ SELECT tableoid::regclass, * FROM p1;
+  tableoid | a |  b  
+ --+---+-
+  p1   | 1 | foo
+  p1   | 1 | bar
+ (2 rows)
+ 
+ SELECT tableoid::regclass, * FROM p2;
+  tableoid | a |  b  
+ --+---+-
+  p2   | 2 | baz
+  p2   | 2 | qux
+ (2 

Re: [HACKERS] Effect of dropping a partitioned table's column over time

2017-08-06 Thread Craig Ringer
On 7 August 2017 at 11:25, Thomas Munro wrote:

> On Mon, Aug 7, 2017 at 2:20 PM, Tom Lane  wrote:
> > Thomas Munro  writes:
> >> Since partitioned tables have no storage themselves, is there
> >> any technical reason we couldn't remove a partitioned table's dropped
> >> pg_attribute so that its TupleDesc matches partitions created later?
> >
> > You'd break views referring to the partitioned table, or at least to
> > any columns after the dropped one.
>
> I will put a huge sign up next to my desk: "What about the rules?"
>
> > There's been talk of separating column identity (think OID) from column
> > logical and physical positions.  If we did that, and had Vars using the
> > column identity number while tupdescs were sorted according to physical
> > position, then what you're thinking of could be made to work.  But a
> > couple of people have attacked that problem and been unable to finish
> > it :-(
>
> Hmm, yeah I see.  I have seen that[1] and I hope it comes back.  It
> seems like it might be a step on the path towards incremental
> materialized views (at least in one proposal) which is why I asked
> about it on this list recently[2].
>

Can we instead create the new partitions with the same dropped columns?

Ensure that every partition, parent and child, has the same column-set?

-- 
 Craig Ringer   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


Re: [HACKERS] Effect of dropping a partitioned table's column over time

2017-08-06 Thread Thomas Munro
On Mon, Aug 7, 2017 at 2:35 PM, Amit Langote wrote:
> On 2017/08/07 10:58, Thomas Munro wrote:
>> Of course there are other usage patterns where you might prefer it
>> this way, because you'll mostly be inserting into partitions created
>> before the change.  In general, would it be better for the partitioned
>> table's TupleDesc to match partitions created before or after a
>> change?  Since partitioned tables have no storage themselves, is there
>> any technical reason we couldn't remove a partitioned table's dropped
>> pg_attribute so that its TupleDesc matches partitions created later?
>
> That means the parent's TupleDesc will begin mismatching that of all of
> the existing partitions and they will suddenly need a map where they
> didn't before.

True, that doesn't sound great, but eventually you'd stop doing it in
common usage patterns.

> I guess you considered it, but optimizing for the common case of range
> partitioning where most of the inserts go to the newest partition will
> hurt the other partitioning methods, like hash, where that won't
> necessarily be true.

Right, you wouldn't want to do it there.  I guess with hash
partitioning you wouldn't typically be rotating partitions (dropping
old ones and creating new ones), so the TupleDescs stay in lock-step.

>> Is there some way that tupconvert.c could make this type of difference
>> moot?
>
> Do you mean the difference arising due to dropped columns in either the
> partitioned table or the table attached as a partition?

Yeah.  I can't think of any way.  I thought for a moment about a fast
column removal path involving sliding memory rather than a full
deform/reform, but that's uninspiring.  I am not actually proposing any
change here since I have no evidence that there's any real practical
problem.  I just wanted to share the realisation I had during an
off-list discussion about tuple deforming, when considering whether we
actually expect to hit the tuple conversion case often.  Depending on
the history of your schema the answer may be never, sometimes, or
always, and once you reach the "always" state you'll never get out of it.

-- 
Thomas Munro
http://www.enterprisedb.com




[HACKERS] FYI: branch for v11 devel is planned for next week

2017-08-06 Thread Tom Lane
The release team discussed this a couple weeks ago, but I don't
think anybody mentioned it on -hackers: v10 seems to be in good
enough shape that it's okay to make the REL_10_STABLE branch soon,
and open HEAD for v11 development.

Last year we branched on Aug 15, and we should be able to keep
to the same schedule.  Barring objections or bad bugs showing up
soon, I'll make the branch early next week.

regards, tom lane




Re: [HACKERS] Effect of dropping a partitioned table's column over time

2017-08-06 Thread Thomas Munro
On Mon, Aug 7, 2017 at 2:20 PM, Tom Lane  wrote:
> Thomas Munro  writes:
>> Since partitioned tables have no storage themselves, is there
>> any technical reason we couldn't remove a partitioned table's dropped
>> pg_attribute so that its TupleDesc matches partitions created later?
>
> You'd break views referring to the partitioned table, or at least to
> any columns after the dropped one.

I will put a huge sign up next to my desk: "What about the rules?"

> There's been talk of separating column identity (think OID) from column
> logical and physical positions.  If we did that, and had Vars using the
> column identity number while tupdescs were sorted according to physical
> position, then what you're thinking of could be made to work.  But a
> couple of people have attacked that problem and been unable to finish
> it :-(

Hmm, yeah I see.  I have seen that[1] and I hope it comes back.  It
seems like it might be a step on the path towards incremental
materialized views (at least in one proposal) which is why I asked
about it on this list recently[2].

[1] 
https://www.postgresql.org/message-id/flat/20141209174146.gp1...@alvh.no-ip.org
[2] 
https://www.postgresql.org/message-id/flat/CAEepm%3D3ZHh%3Dp0nEEnVbs1Dig_UShPzHUcMNAqvDQUgYgcDo-pA%40mail.gmail.com

-- 
Thomas Munro
http://www.enterprisedb.com




Re: [HACKERS] scan on inheritance parent with no children in current session

2017-08-06 Thread Amit Langote
On 2017/08/04 18:11, Ashutosh Bapat wrote:
> After that commit in session 1, we get an Append plan
> postgres=# explain verbose select * from parent;
> QUERY PLAN
> ---
>  Append  (cost=0.00..0.00 rows=1 width=4)
>->  Seq Scan on public.parent  (cost=0.00..0.00 rows=1 width=4)
>  Output: parent.a
> (3 rows)
> 
> I don't think this is an intentional change. Here's a patch to fix it.
> The comment in the patch uses the term "real child" in the context of
> the comments about temporary children from other sessions and the
> comment at the end of the function where rte->inh is reset. Maybe we
> should move the second comment before setting has_child in the patch
> and use "real child" in the comment at the end to avoid repetition.
> But I first want to check whether we want this fix or whether we can
> live with the Append plan.

Good catch.  I agree that getting an Append node after that commit is
unintentional and we should fix it so that we don't get an Append.  So,
+1 to your patch.  I looked at the patch and the code fix seems to do
what we want.

Thanks,
Amit





Re: [HACKERS] Effect of dropping a partitioned table's column over time

2017-08-06 Thread Amit Langote
Hi Thomas,

On 2017/08/07 10:58, Thomas Munro wrote:
> Hi hackers,
> 
> If you drop a column from a partitioned table then it has a TupleDesc
> that matches existing partitions, but new partitions created after
> that have non-same TupleDescs (according to convert_tuples_by_name)
> because they don't have the dropped column.  That means that inserts
> to partitions created later need to go via the deform->remap->form
> code path in tupconvert.c.  If you're using a time-based partitioning
> scheme where you add a new partition for each month and mostly insert
> into the current month, as is very common, then after dropping a
> column you'll eventually finish up sending ALL your inserts through
> tupconvert.c for the rest of time.

That's a good observation.

> For example, having hacked my tree to print out a message to tell me
> if it had to convert a tuple:
> 
> postgres=# create table parent (a int, b int) partition by list (b);
> CREATE TABLE
> postgres=# create table child1 partition of parent for values in (1);
> CREATE TABLE
> postgres=# alter table parent drop column a;
> ALTER TABLE
> postgres=# create table child2 partition of parent for values in (2);
> CREATE TABLE
> postgres=# insert into parent values (1);
> NOTICE:  no map
> INSERT 0 1
> postgres=# insert into parent values (2);
> NOTICE:  map!
> INSERT 0 1
> 
> Of course there are other usage patterns where you might prefer it
> this way, because you'll mostly be inserting into partitions created
> before the change.  In general, would it be better for the partitioned
> table's TupleDesc to match partitions created before or after a
> change?  Since partitioned tables have no storage themselves, is there
> any technical reason we couldn't remove a partitioned table's dropped
> pg_attribute so that its TupleDesc matches partitions created later?

That means the parent's TupleDesc will begin mismatching that of all of
the existing partitions and they will suddenly need a map where they
didn't before.

I guess you considered it, but optimizing for the common case of range
partitioning where most of the inserts go to the newest partition will
hurt the other partitioning methods, like hash, where that won't
necessarily be true.

> Is there some way that tupconvert.c could make this type of difference
> moot?

Do you mean the difference arising due to dropped columns in either the
partitioned table or the table attached as a partition?

Thanks,
Amit





Re: [HACKERS] Effect of dropping a partitioned table's column over time

2017-08-06 Thread Tom Lane
Thomas Munro  writes:
> Since partitioned tables have no storage themselves, is there
> any technical reason we couldn't remove a partitioned table's dropped
> pg_attribute so that its TupleDesc matches partitions created later?

You'd break views referring to the partitioned table, or at least to
any columns after the dropped one.

There's been talk of separating column identity (think OID) from column
logical and physical positions.  If we did that, and had Vars using the
column identity number while tupdescs were sorted according to physical
position, then what you're thinking of could be made to work.  But a
couple of people have attacked that problem and been unable to finish
it :-(

regards, tom lane




Re: [HACKERS] A bug in mapping attributes in ATExecAttachPartition()

2017-08-06 Thread Amit Langote
On 2017/08/05 11:05, Robert Haas wrote:
> On Thu, Aug 3, 2017 at 8:45 PM, Amit Langote
>  wrote:
>>> 0003 needs a rebase.
>>
>> Rebased patch attached.
> 
> Committed.

Thank you.

> I think 0004 is a new feature, so I'm leaving that for v11.

Sure.

By the way, the bulk of 0004 is refactoring which, it seems, Jeevan's
default partition patch set also includes as one of its patches [1].  It
got a decent amount of review from Ashutosh.  I broke it out into a
separate patch, so that the patch adding the new feature is its own tiny
patch.

I also spotted a couple of comments referring to attachRel that we just
recently renamed.

So, attached are:

0001: s/attachRel/attachrel/g
0002: Refactoring to introduce a PartConstraintImpliedByRelConstraint
0003: Add the feature to skip the scan of individual leaf partitions

Totally fine if you postpone 0002 and 0003 to when the tree opens up for
PG 11.

Thanks,
Amit

[1]
https://www.postgresql.org/message-id/CAOgcT0OARciE2X%2BU0rjSKp9VuC279dYcCGkc3nCWKhHQ1_m2rw%40mail.gmail.com
From 26e7205fd7c35c0b497c1a7c31152393d3551b23 Mon Sep 17 00:00:00 2001
From: amit 
Date: Mon, 7 Aug 2017 10:45:39 +0900
Subject: [PATCH 1/3] Typo: attachRel is now attachrel

---
 src/backend/commands/tablecmds.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/src/backend/commands/tablecmds.c b/src/backend/commands/tablecmds.c
index 1b8d4b3d17..d27c43bdc7 100644
--- a/src/backend/commands/tablecmds.c
+++ b/src/backend/commands/tablecmds.c
@@ -13491,7 +13491,7 @@ ATExecAttachPartition(List **wqueue, Relation rel, PartitionCmd *cmd)
     * Prevent circularity by seeing if rel is a partition of attachrel. (In
     * particular, this disallows making a rel a partition of itself.)
     *
-    * We do that by checking if rel is a member of the list of attachRel's
+    * We do that by checking if rel is a member of the list of attachrel's
     * partitions provided the latter is partitioned at all.  We want to avoid
     * having to construct this list again, so we request the strongest lock
     * on all partitions.  We need the strongest lock, because we may decide
@@ -13746,7 +13746,7 @@ ATExecAttachPartition(List **wqueue, Relation rel, PartitionCmd *cmd)
    {
        /*
         * Adjust the constraint that we constructed above for
-        * attachRel so that it matches this partition's attribute
+        * attachrel so that it matches this partition's attribute
         * numbers.
         */
        my_partconstr = map_partition_varattnos(my_partconstr, 1,
-- 
2.11.0

From fc6de6ab2800d509ec2af35c96722672e47e99cc Mon Sep 17 00:00:00 2001
From: amit 
Date: Thu, 15 Jun 2017 19:22:31 +0900
Subject: [PATCH 2/3] Some refactoring of code in ATExecAttachPartition()

Take the code to check using table's constraints if the partition
constraint validation can be skipped and put it into a separate
function PartConstraintImpliedByRelConstraint().
---
 src/backend/commands/tablecmds.c | 187 +--
 1 file changed, 101 insertions(+), 86 deletions(-)

diff --git a/src/backend/commands/tablecmds.c b/src/backend/commands/tablecmds.c
index d27c43bdc7..818335ffe9 100644
--- a/src/backend/commands/tablecmds.c
+++ b/src/backend/commands/tablecmds.c
@@ -473,6 +473,8 @@ static void CreateInheritance(Relation child_rel, Relation parent_rel);
 static void RemoveInheritance(Relation child_rel, Relation parent_rel);
 static ObjectAddress ATExecAttachPartition(List **wqueue, Relation rel,
                       PartitionCmd *cmd);
+static bool PartConstraintImpliedByRelConstraint(Relation partrel,
+                      List *partConstraint);
 static ObjectAddress ATExecDetachPartition(Relation rel, RangeVar *name);
 
 
@@ -13422,15 +13424,12 @@ ATExecAttachPartition(List **wqueue, Relation rel, PartitionCmd *cmd)
     Relation    attachrel,
                 catalog;
     List       *attachrel_children;
-    TupleConstr *attachrel_constr;
-    List       *partConstraint,
-               *existConstraint;
+    List       *partConstraint;
     SysScanDesc scan;
     ScanKeyData skey;
     AttrNumber  attno;
     int         natts;
     TupleDesc   tupleDesc;
-    bool        skip_validate = false;
     ObjectAddress address;
     const char *trigger_name;
     bool        found_whole_row;
@@ -13625,88 +13624,10 @@ ATExecAttachPartition(List **wqueue, Relation rel, PartitionCmd *cmd)
         elog(ERROR, "unexpected whole-row reference found in partition key");
 
/*
-* Check if we can do away with 

[HACKERS] Effect of dropping a partitioned table's column over time

2017-08-06 Thread Thomas Munro
Hi hackers,

If you drop a column from a partitioned table then it has a TupleDesc
that matches existing partitions, but new partitions created after
that have non-same TupleDescs (according to convert_tuples_by_name)
because they don't have the dropped column.  That means that inserts
to partitions created later need to go via the deform->remap->form
code path in tupconvert.c.  If you're using a time-based partitioning
scheme where you add a new partition for each month and mostly insert
into the current month, as is very common, then after dropping a
column you'll eventually finish up sending ALL your inserts through
tupconvert.c for the rest of time.

For example, having hacked my tree to print out a message to tell me
if it had to convert a tuple:

postgres=# create table parent (a int, b int) partition by list (b);
CREATE TABLE
postgres=# create table child1 partition of parent for values in (1);
CREATE TABLE
postgres=# alter table parent drop column a;
ALTER TABLE
postgres=# create table child2 partition of parent for values in (2);
CREATE TABLE
postgres=# insert into parent values (1);
NOTICE:  no map
INSERT 0 1
postgres=# insert into parent values (2);
NOTICE:  map!
INSERT 0 1

Of course there are other usage patterns where you might prefer it
this way, because you'll mostly be inserting into partitions created
before the change.  In general, would it be better for the partitioned
table's TupleDesc to match partitions created before or after a
change?  Since partitioned tables have no storage themselves, is there
any technical reason we couldn't remove a partitioned table's dropped
pg_attribute so that its TupleDesc matches partitions created later?
Is there some way that tupconvert.c could make this type of difference
moot?
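
For anyone unfamiliar with the mechanism, this is roughly where the cost
shows up (a sketch against the PG 10-era tupconvert.c interface; illustrative
only):

TupleConversionMap *map;

map = convert_tuples_by_name(RelationGetDescr(parent_rel),
                             RelationGetDescr(partition_rel),
                             gettext_noop("could not convert row type"));

if (map != NULL)
{
    /* descriptors differ (e.g. a dropped column): deform -> remap -> form */
    tuple = do_convert_tuple(tuple, map);
}
/* map == NULL means the descriptors match and the tuple is used as-is */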

-- 
Thomas Munro
http://www.enterprisedb.com




[HACKERS] ICU collation variant keywords and pg_collation entries (Was: [BUGS] Crash report for some ICU-52 (debian8) COLLATE and work_mem values)

2017-08-06 Thread Peter Geoghegan
On Sun, Aug 6, 2017 at 1:06 PM, Peter Geoghegan  wrote:
> On Sat, Aug 5, 2017 at 8:26 PM, Tom Lane  wrote:
>> I'm quite disturbed though that the set of installed collations on these
>> two test cases seem to be entirely different both from each other and from
>> what you reported.  The base collations look generally similar, but the
>> "keyword variant" versions are not comparable at all.  Considering that
>> the entire reason we are interested in ICU in the first place is its
>> alleged cross-version collation behavior stability, this gives me the
>> exact opposite of a warm fuzzy feeling.  We need to understand why it's
>> like that and what we can do to reduce the variation, or else we're just
>> buying our users enormous future pain.  At least with the libc collations,
>> you can expect that if you have en_US.utf8 available today you will
>> probably still have en_US.utf8 available tomorrow.  I am not seeing any
>> reason to believe that the same holds for ICU collations.
>
> +1. That seems like something that is important to get right up-front.

I've looked into this. I'll give an example of what keyword variants
there are for Greek, and then discuss what I think each is. These are
the keyword variants that appear on my machine with master + ICU
support (ICU 55):

postgres=# \dOS+ el-*
                                       List of collations
   Schema   │          Name          │     Collate      │      Ctype       │ Provider │ Description
────────────┼────────────────────────┼──────────────────┼──────────────────┼──────────┼─────────────
 pg_catalog │ el-u-co-emoji-x-icu    │ el-u-co-emoji    │ el-u-co-emoji    │ icu      │ Greek
 pg_catalog │ el-u-co-eor-x-icu      │ el-u-co-eor      │ el-u-co-eor      │ icu      │ Greek
 pg_catalog │ el-u-co-search-x-icu   │ el-u-co-search   │ el-u-co-search   │ icu      │ Greek
 pg_catalog │ el-u-co-standard-x-icu │ el-u-co-standard │ el-u-co-standard │ icu      │ Greek
 pg_catalog │ el-x-icu               │ el               │ el               │ icu      │ Greek
(5 rows)

Greek has only one region, standard Greek. A few other
language-regions have variations like multiple regions (e.g. Austrian
German), or a phonebook variant, which you don't see here. Almost all
have -emoji, -search, and -standard, which you do see here.

We pass "commonlyUsed = true" to ucol_getKeywordValuesForLocale()
within pg_import_system_collations(), and so it "will return only
commonly used values with the given locale in preferred order". But
should we go even further? If the charter of
pg_import_system_collations() is to import every possible valid
collation for pg_collation, then it's already failing at that by
limiting itself to "common variants". I agree with the decision to do
that, though, and I think we probably need to go a bit further.
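
For reference, the enumeration in question can be driven with ICU's C API
roughly as follows; this is a sketch with abbreviated error handling
(assuming <unicode/ucol.h>, <unicode/uenum.h> and <stdio.h>), not the
pg_import_system_collations() code verbatim:

UErrorCode      status = U_ZERO_ERROR;
UEnumeration   *keywords;
const char     *variant;

/* commonlyUsed = TRUE restricts the result to the "common" variants */
keywords = ucol_getKeywordValuesForLocale("collation", "el", TRUE, &status);

while ((variant = uenum_next(keywords, NULL, &status)) != NULL)
{
    /* for Greek this yields e.g. "standard", "search", "emoji", "eor" */
    printf("el-u-co-%s\n", variant);
}
uenum_close(keywords);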

Possible issues with current ICU pg_collation entries after initdb:

* I don't think we should have user-visible "search" collations at all.

Apparently "search" collations are useful because "primary- and
secondary-level distinctions for searching may not be the same as
those for sorting; in ICU, many languages provide a special "search"
collator with the appropriate level settings for search" [1]. I don't
think that we should expose "search" keyword variants at all, because
clearly they're an implementation detail that Postgres may one day
have special knowledge of [2], to correctly mix searching and sorting
semantics. For the time being, those should simply not be added within
pg_import_system_collations(). Someone could still create the entries
themselves, which seems harmless. Let's avoid establishing the
expectation that they'll be in pg_collation.

* Redundant ICU spellings for the same collation seem to appear.

I find it questionable that there is both an "el-x-icu" and an
"el-u-co-standard-x-icu". That looks like an artifact of how
pg_import_system_collations() was written, as opposed to a bonafide
behavioral difference. I cannot find an example of a
"$COUNTRY_CODE-x-icu" collation without a corresponding
"$COUNTRY_CODE-*-u-standard-x-icu" (The situation is similar for
regional variants, like Austrian German). What, if anything, is the
difference between each such pair of collations? Can we find a way to
provide only one canonical entry if those are simply different ICU
spellings?

* Many emoji variant collations.

I have to wonder if there is much value in creating so many
pg_collation entries that are mere variants to do pictographic emoji
sorting. Call me a killjoy, but I think that users that want that
behavior can create the collations themselves. We could still document
it. I wouldn't mind it if there weren't so many emoji collations.

* Many EOR variant collations.

EOR as a collation variant is an ICU hack to get around the fact that
EOR doesn't fit with their taxonomy for locales. My understanding is
that there is supposed to be one EOR collation, used across Europe,
per the ISO standard. I think ICU structures it 

Re: [HACKERS] snapbuild woes

2017-08-06 Thread Andres Freund
On 2017-06-09 09:25:34 +0200, Antonin Houska wrote:
> Andres Freund  wrote:
> 
> > Looking at 0001:
> > - GetOldestSafeDecodingTransactionId() only guarantees to return an xid
> >   safe for decoding (note how procArray->replication_slot_catalog_xmin
> >   is checked), not one for the initial snapshot - so afaics this whole
> >   exercise doesn't guarantee much so far.
> 
> I happen to use CreateInitDecodingContext() in an extension, so I had to think
> what the new argumen exactly means (as for the incompatibility between PG
> 9.6.2 and 9.6.3, I suppose preprocessor directives can handle it).
> 
> One thing I'm failing to understand is: if TRUE is passed for
> need_full_snapshot, then slot->effective_xmin receives the result of
> 
> GetOldestSafeDecodingTransactionId(need_full_snapshot)
> 
> but this does include "catalog xmin".
> 
> If the value returned is determined by an existing slot which has valid
> effective_catalog_xmin and invalid effective_xmin (i.e. that slot only
> protects catalog tables from VACUUM but not the regular ones), then the new
> slot will think it (i.e. the new slot) protects even non-catalog tables, but
> that's no true.
> 
> Shouldn't the xmin_horizon be computed by this call instead?
> 
> GetOldestSafeDecodingTransactionId(!need_full_snapshot);
> 
> (If so, I think "considerCatalog" argument name would be clearer than
> "catalogOnly".)

Good catch. Pushing a fix. AFAICT it's luckily inconsequential at the
moment because of the way we wait for concurrent snapshots when creating
a slot. But it obviously nevertheless needs to be fixed.

Greetings,

Andres Freund




Re: [HACKERS] PostgreSQL not setting OpenSSL session id context?

2017-08-06 Thread Andreas Karlsson

On 08/04/2017 08:48 PM, Shay Rojansky wrote:

On 2017-08-04 07:22:42 +0300, Shay Rojansky wrote:
> I'm still not convinced of the risk/problem of simply setting the session
> id context as I explained above (rather than disabling the optimization),
> but of course either solution resolves my problem.

How would that do anything? Each backend has its own local
memory. I.e. any cache state that openssl would maintain wouldn't be
useful. If you want to take advantage of features around this you really
need to cache tickets in shared memory...

Guys, there's no data being cached at the backend - RFC5077 is about 
packaging information into a client-side opaque session ticket that 
allows skipping a roundtrip on the next connection. As I said, simply 
setting the session id context (*not* the session id or anything else) 
makes this feature work, even though a completely new backend process is 
launched.
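
To spell out what that means at the OpenSSL level, setting the session id
context is a single call; this sketch only illustrates the call, and whether
and where PostgreSQL should issue it (and how ticket keys would be handled)
is exactly what is being debated:

#include <openssl/ssl.h>

static void
set_session_id_context(SSL_CTX *ctx)
{
    /* any stable, application-chosen identifier works here */
    static const unsigned char sid_ctx[] = "PostgreSQL";

    if (SSL_CTX_set_session_id_context(ctx, sid_ctx, sizeof(sid_ctx)) != 1)
    {
        /* report the failure through the server's usual error path */
    }
}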


Yes, session tickets are encrypted data which is stored by the client. 
But if we are going to support them I think we should do it properly 
with new GUCs for the key file and disabling the feature. Using a key 
file is less necessary for PostgreSQL than for a web server since it is 
less common to do round robin load balancing between different 
PostgreSQL instances.


Andreas




Re: [HACKERS] [WIP] Zipfian distribution in pgbench

2017-08-06 Thread Alik Khilazhev
Hello Fabien,


> On 5 Aug 2017, at 12:15, Fabien COELHO  wrote:
> 
> 
> Hello Alik,
> 
> I've done some math investigations, which consisted in spending one hour with 
> Christian, a statistician colleague of mine. He took an old book out of a 
> shelf, opened it to page 550 (roughly in the middle), and explained to me how 
> to build a real zipfian distribution random generator.
> 
> The iterative method is for parameter a>1 and works for unbounded values. It 
> is simple to add a bound. In practice the iterative method is quite 
> effective, i.e. number of iterations is typically small, at least if the 
> bound is large and if parameter a is not too close to 1.

> I've attached a python3 script which implements the algorithm. It looks like 
> magic. Beware that a C implementation should take care of float and int 
> overflows.
> 
Thank you for the script. I will rewrite it to C and add to the patch soon. 
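
In case it helps review, here is a self-contained sketch of what the C
version of the iterative (rejection) method could look like for parameter
s > 1 and an upper bound n; drand48() stands in for pgbench's own PRNG, so
this is not the final patch:

#include <math.h>
#include <stdint.h>
#include <stdlib.h>

static int64_t
zipfian_iterative(int64_t n, double s)
{
    double  b = pow(2.0, s - 1.0);
    double  x, t, u, v;

    do
    {
        u = drand48();
        v = drand48();
        x = floor(pow(u, -1.0 / (s - 1.0)));    /* candidate value, >= 1 */
        t = pow(1.0 + 1.0 / x, s - 1.0);
        /* reject out-of-bound or overweighted candidates and retry */
    } while (x > (double) n ||
             v * x * (t - 1.0) / (b - 1.0) > t / b);

    return (int64_t) x;
}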

—
Thanks and Regards,
Alik Khilazhev
Postgres Professional:
http://www.postgrespro.com
The Russian Postgres Company






[HACKERS] Draft for 2017-08-10 Release

2017-08-06 Thread Jonathan Katz
Hi,

I have put together a draft of the press release notes for the upcoming 
20170810 release.  It is available here:


https://git.postgresql.org/gitweb/?p=press.git;a=blob_plain;f=update_releases/current/20170810securityrelease.md
 


Please let me know by Tuesday, Aug 8 if you have any corrections, believe
something is missing from the highlight list, or think something should be
removed from it.

(And of course typos, etc.)

Thanks!

Jonathan

Re: [HACKERS] Draft release notes up for review

2017-08-06 Thread Jonathan Katz

> On Aug 5, 2017, at 5:37 PM, Tom Lane  wrote:
> 
> Jonathan Katz  writes:
>> I see this one
>>  > Fix potential data corruption when freezing a tuple whose XMAX is a 
>> multixact with exactly one still-interesting member
>> But I’m unsure how prevalent it is and if it should be highlighted.
> 
> I'm not sure about that either.  I do not think anyone did the legwork
> to determine the exact consequences of that bug, or the probability of
> someone hitting it in the field.  But I think the latter must be really
> low, because we haven't heard any field reports that seem to match up.

OK, thanks for the clarification.  I will follow up once I have the draft ready 
for technical review.

Thanks,

Jonathan





Re: [HACKERS] pg_stop_backup(wait_for_archive := true) on standby server

2017-08-06 Thread Paul A Jungwirth
On Sat, Aug 5, 2017 at 6:22 PM, Robert Haas  wrote:
> On Sat, Aug 5, 2017 at 4:28 PM, Paul A Jungwirth
>  wrote:
>> I don't have an opinion on the urgency of back-porting a fix, but if
>> pg_stop_backup(boolean) allows for inconsistent backups, it does sound
>> like a problem on 9.6 too.
>
> It doesn't.  The talk about inconsistent backups is, I think, not a
> very precise way of speaking.

Thank you for the reassurance and the detailed explanation!

Paul

