Re: [HACKERS] Causal reads take II

2017-07-30 Thread Thomas Munro
On Sun, Jul 30, 2017 at 7:07 AM, Dmitry Dolgov <9erthali...@gmail.com> wrote:
> I looked through the code of `synchronous-replay-v1.patch` a bit and ran a
> few
> tests. I didn't manage to break anything, except one mysterious error that
> I've
> got only once on one of my replicas, but I couldn't reproduce it yet.
> Interesting thing is that this error did not affect another replica or
> primary.
> Just in case here is the log for this error (maybe you can see something
> obvious, that I've not noticed):
>
> ```
> LOG:  could not remove directory "pg_tblspc/47733/PG_10_201707211/47732":
> Directory not empty
> CONTEXT:  WAL redo at 0/125F4D90 for Tablespace/DROP: 47733
> LOG:  could not remove directory "pg_tblspc/47733/PG_10_201707211":
> Directory not empty
> CONTEXT:  WAL redo at 0/125F4D90 for Tablespace/DROP: 47733
> LOG:  could not remove directory "pg_tblspc/47733/PG_10_201707211/47732":
> Directory not empty
> CONTEXT:  WAL redo at 0/125F4D90 for Tablespace/DROP: 47733
> LOG:  could not remove directory "pg_tblspc/47733/PG_10_201707211":
> Directory not empty
> CONTEXT:  WAL redo at 0/125F4D90 for Tablespace/DROP: 47733
> LOG:  directories for tablespace 47733 could not be removed
> HINT:  You can remove the directories manually if necessary.
> CONTEXT:  WAL redo at 0/125F4D90 for Tablespace/DROP: 47733
> FATAL:  could not create directory "pg_tblspc/47734/PG_10_201707211/47732":
> File exists
> CONTEXT:  WAL redo at 0/125F5768 for Storage/CREATE:
> pg_tblspc/47734/PG_10_201707211/47732/47736
> LOG:  startup process (PID 8034) exited with exit code 1
> LOG:  terminating any other active server processes
> LOG:  database system is shut down
> ```

Hmm.  The first error ("could not remove directory") could perhaps be
explained by temporary files from concurrent backends, by leaked files
from earlier crashes, or by copying a pgdata directory over the top of
an existing one as a way to set it up, leaving behind some files from
an earlier test.  The second error ("could not create directory") is a
bit stranger though... I think this must come from
TablespaceCreateDbspace(): it must have stat()'d the path and got
ENOENT, decided to create the directory, acquired
TablespaceCreateLock, stat()'d the path again and found it still
absent, then ran mkdir() on the parents (where EEXIST is expected) and
finally on the directory to be created, where it surprisingly got
EEXIST.  That means
that someone must have concurrently created the directory.  Perhaps in
your testing you accidentally copied a pgdata directory over the top
of it while it was running?  In any case I'm struggling to see how
anything in this patch would affect anything at the REDO level.

> And speaking about the code, so far I have just a few notes (some of them
> merely questions):
>
> * In general the idea behind this patch sounds interesting for me, but it
>   relies heavily on time synchronization. As mentioned in the documentation:
>   "Current hardware clocks, NTP implementations and public time servers are
>   unlikely to allow the system clocks to differ more than tens or hundreds
> of
>   milliseconds, and systems synchronized with dedicated local time servers
> may
>   be considerably more accurate." But as far as I remember from my own
>   experience sometimes it maybe not that trivial on something like AWS
> because
>   of virtualization. Maybe it's an unreasonable fear, but is it possible to
>   address this problem somehow?

Oops, I had managed to lose an important hunk that deals with
detecting excessive clock drift (ie badly configured servers) while
rebasing a couple of versions back.  Here is a version to put it back.

With that change, if you disable NTP and manually set your standby's
clock to be more than 1.25s (assuming synchronous_replay_lease_time is
set to the default of 5s) behind the primary, synchronous_replay
should become unavailable and you should see this error in the
standby's log:

ereport(LOG,
        (errmsg("the primary server's clock time is too far ahead for synchronous_replay"),
         errhint("Check your servers' NTP configuration or equivalent.")));

One way to test this without messing with your NTP setting or
involving two different computers is to modify this code temporarily
in WalSndKeepalive:

now = GetCurrentTimestamp() + 1250100;

This is a best-effort measure intended to detect a system not running
ntpd at all, or talking to an insane time server.  Fundamentally, this
proposal is based on the assumption that you can get your system
clocks into sync within a tolerance for which we feel confident
estimating an upper bound.

It does appear that some Amazon OS images come with NTP disabled;
that's a problem if you want to use this feature.  But if you're
running a virtual server without an ntpd, you'll pretty soon drift
seconds to minutes off UTC time and get "unavailable for synchronous
replay" errors from this patch (and possibly the LOG message above,
depending on the direction of drift).


Re: [HACKERS] Adding support for Default partition in partitioning

2017-07-30 Thread Ashutosh Bapat
On Sat, Jul 29, 2017 at 2:55 AM, Robert Haas  wrote:
> On Fri, Jul 28, 2017 at 9:30 AM, Ashutosh Bapat
>  wrote:
>> 0004 patch
>> The patch adds another column partdefid to catalog pg_partitioned_table. The
>> column gives OID of the default partition for a given partitioned table. This
>> means that the default partition's OID is stored at two places 1. in the
>> default partition table's pg_class entry and in pg_partitioned_table. There 
>> is
>> no way to detect when these two go out of sync. Keeping those two in sync is
>> also a maintenance burden. Given that default partition's OID is required 
>> only
>> while adding/dropping a partition, which is a less frequent operation, it 
>> won't
>> hurt to join a few catalogs (pg_inherits and pg_class in this case) to find 
>> out
>> the default partition's OID. That will be occasional performance hit
>> worth the otherwise maintenance burden.
>
> Performance isn't the only consideration here.  We also need to think
> about locking and concurrency.  I think that most operations that
> involve locking the parent will also involve locking the default
> partition.  However, we can't safely build a relcache entry for a
> relation before we've got some kind of lock on it.  We can't assume
> that there is no concurrent DDL going on before we take some lock.  We
> can't assume invalidation messages are processed before we have taken
> some lock.  If we read multiple catalog tuples, they may be from
> different points in time.  If we can figure out everything we need to
> know from one or two syscache lookups, it may be easier to verify that
> the code is bug-free vs. having to do something more complicated.
>

The code takes a lock on the parent relation.  While that lock is
held, nobody else can change the partitions of that relation, and
hence nobody can change the default partition.
heap_drop_with_catalog() has code to lock the parent.  Looking up the
pg_inherits catalog for its partitions and then identifying the
partition whose bound specification marks it as the default, all while
holding the lock on the parent, should be safe.  Any change to
partition bounds or partitions would require a lock on the parent.  To
prevent any buggy code from changing the default partition without
sufficient locks, we should lock the default partition after it's
found and check the default partition bound specification again.  Will
that work?

> Now that having been said, I'm not taking the position that Jeevan's
> patch (based on Amit Langote's idea) has definitely got the right
> idea, just that you should think twice before shooting down the
> approach.
>

If we can avoid the problems specified by Amit Langote, I am fine
with the approach of reading the default partition OID from the
relcache as well.  But I am not able to devise a solution to those
problems.

-- 
Best Wishes,
Ashutosh Bapat
EnterpriseDB Corporation
The Postgres Database Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] On Complex Source Code Reading Strategy

2017-07-30 Thread Zeray Kalayu
On Fri, Jul 28, 2017 at 6:23 AM, Craig Ringer  wrote:
> On 28 July 2017 at 07:45, Tom Lane  wrote:
>>
>> Peter Geoghegan  writes:
>> > 2. Start somewhere. I have no idea where that should be, but it has to
>> > be some particular place that seems interesting to you.
>>
>> Don't forget to start with the available documentation, ie
>> https://www.postgresql.org/docs/devel/static/internals.html
>> You should certainly read
>> https://www.postgresql.org/docs/devel/static/overview.html
>> and depending on what your interests are, there are probably other
>> chapters of Part VII that are worth your time.
>>
>> Also keep an eye out for README files in the part of the source
>> tree you're browsing in.
>
>
> In fact, even though you won't initially understand much from some of them,
> reading most of
>
> find src/ -name README\*
>
> can be quite useful. It's nearly time for me to do that again myself; each
> time I absorb more.
>
> There are very useful comments at the start of some of the source files too.
> Unfortunately in some cases the really important explanation will be on some
> function that you won't know to look for, not the comment at the top of the
> file, so there's an element of discovery there.
>
> I'd start with the docs as Tom suggested, then
>
> * https://www.postgresql.org/developer/
> * https://momjian.us/main/presentations/internals.html
> * https://wiki.postgresql.org/wiki/Development_information
> * https://wiki.postgresql.org/wiki/So,_you_want_to_be_a_developer%3F
> * https://wiki.postgresql.org/wiki/Developer_FAQ
>
> (some of which need to be added to the "developer information" wiki page I
> think)
>

Thanks, dear hackers. This is an enormous help for me.

I think recommending specific techniques and tools that make life
easier with PG hacking (like cscope, find src/ -name README\*, and
others you might know) would be a great help for beginners.

Regards,
Zeray




Re: [HACKERS] pl/perl extension fails on Windows

2017-07-30 Thread Ashutosh Sharma
Hi Christoph,

On Mon, Jul 31, 2017 at 2:44 AM, Christoph Berg  wrote:
> Re: Tom Lane 2017-07-28 <3254.1501276...@sss.pgh.pa.us>
>> Christoph Berg  writes:
>> > The plperl segfault on Debian's kfreebsd port I reported back in 2013
>> > is also still present:
>> > https://www.postgresql.org/message-id/20130515064201.GC704%40msgid.df7cb.de
>> > https://buildd.debian.org/status/fetch.php?pkg=postgresql-10=kfreebsd-amd64=10~beta2-1=1499947011=0
>>
>> So it'd be interesting to know if it's any better with HEAD ...
>
> Unfortunately not:
>
> == creating database "pl_regression"  ==
> CREATE DATABASE
> ALTER DATABASE
> == installing plperl  ==
> server closed the connection unexpectedly
> This probably means the server terminated abnormally
> before or while processing the request.
> connection to server was lost
>
> The only interesting line in log/postmaster.log is a log_line_prefix-less
> Util.c: loadable library and perl binaries are mismatched (got handshake key 
> 0xd500080, needed 0xd600080)
> ... which is unchanged from the beta2 output.


I am not able to reproduce this issue on my Windows or Linux box. Did
you delete the Util.c and SPI.c files before starting the build? The
point is, these files are generated at build time, and if they already
exist then I think they are not regenerated. I would suggest deleting
both .c files, rebuilding your source, and then giving it another
try.

Here are the results I got on Windows and Linux respectively on HEAD:

On Windows:

C:\Users\ashu\git-clone-postgresql\postgresql\src\tools\msvc>vcregress.bat
PLCHECK

Checking plperl
(using postmaster on localhost, default port)
== dropping database "pl_regression"  ==
NOTICE:  database "pl_regression" does not exist, skipping
DROP DATABASE
== creating database "pl_regression"  ==
CREATE DATABASE
ALTER DATABASE
== installing plperl  ==
CREATE EXTENSION
== installing plperlu ==
CREATE EXTENSION
== running regression test queries==
test plperl   ... ok
test plperl_lc... ok
test plperl_trigger   ... ok
test plperl_shared... ok
test plperl_elog  ... ok
test plperl_util  ... ok
test plperl_init  ... ok
test plperlu  ... ok
test plperl_array ... ok
test plperl_plperlu   ... ok

==
 All 10 tests passed.
==

On Linux:

LD_LIBRARY_PATH="/home/ashu/git-clone-postgresql/postgresql/tmp_install/home/ashu/git-clone-postgresql/postgresql/TMP/postgres/lib"
../../../src/test/regress/pg_regress --temp-instance=./tmp_check
--inputdir=. --bindir= --dbname=pl_regression
--load-extension=plperl  --load-extension=plperlu plperl plperl_lc
plperl_trigger plperl_shared plperl_elog plperl_util plperl_init
plperlu plperl_array
== creating temporary instance==
== initializing database system   ==
== starting postmaster==
running on port 50848 with PID 18140
== creating database "pl_regression"  ==
CREATE DATABASE
ALTER DATABASE
== installing plperl  ==
CREATE EXTENSION
== installing plperlu ==
CREATE EXTENSION
== running regression test queries==
test plperl   ... ok
test plperl_lc... ok
test plperl_trigger   ... ok
test plperl_shared... ok
test plperl_elog  ... ok
test plperl_util  ... ok
test plperl_init  ... ok
test plperlu  ... ok
test plperl_array ... ok
== shutting down postmaster   ==
== removing temporary instance==

--
With Regards,
Ashutosh Sharma
EnterpriseDB:http://www.enterprisedb.com




Re: [HACKERS] Partition-wise join for join between (declaratively) partitioned tables

2017-07-30 Thread Robert Haas
On Fri, Jul 14, 2017 at 3:02 AM, Ashutosh Bapat
 wrote:
> Here's revised patch set with only 0004 revised. That patch deals with
> creating multi-level inheritance hierarchy from multi-level partition
> hierarchy. The original logic of recursively calling
> inheritance_planner()'s guts over the inheritance hierarchy required
> that for every such recursion we flatten many lists created by that
> code. Recursion also meant that root->append_rel_list is traversed as
> many times as the number of partitioned partitions in the hierarchy.
> Instead the revised version keep the iterative shape of
> inheritance_planner() intact, thus naturally creating flat lists,
> iterates over root->append_rel_list only once and is still easy to
> read and maintain.

0001-0003 look basically OK to me, modulo some cosmetic stuff.  Regarding 0004:

+if (brel->reloptkind != RELOPT_BASEREL &&
+brte->relkind != RELKIND_PARTITIONED_TABLE)

I spent a lot of time staring at this code before I figured out what
was going on here.  We're iterating over simple_rel_array, so the
reloptkind must be RELOPT_OTHER_MEMBER_REL if it isn't RELOPT_BASEREL.
But does that guarantee that rtekind is RTE_RELATION such that
brte->relkind will be initialized to a value?  I'm not 100% sure.  I
think it would be clearer to write this test like this:

Assert(IS_SIMPLE_REL(brel));
if (brel->reloptkind == RELOPT_OTHER_MEMBER_REL &&
    (brte->rtekind != RTE_RELATION ||
     brte->relkind != RELKIND_PARTITIONED_TABLE))
    continue;

Note that the way you wrote the comment, it says if it *is* an other
member rel, not if it's *not* a baserel; it's good if those kinds of
little details match between the code and the comments.

It is not clear to me what the motivation is for the API changes in
expanded_inherited_rtentry.  They don't appear to be necessary.  If
they are necessary, you need to do a more thorough job updating the
comments.  This one, in particular:

  *  If so, add entries for all the child tables to the query's
  *  rangetable, and build AppendRelInfo nodes for all the child tables
  *  and add them to root->append_rel_list.  If not, clear the entry's

And the comments could maybe say something like "We return the list of
appinfos rather than directly appending it to append_rel_list because
$REASON."

- * is a partitioned table.
+ * RTE simply duplicates the parent *partitioned* table.
  */
-if (childrte->relkind != RELKIND_PARTITIONED_TABLE)
+if (childrte->relkind != RELKIND_PARTITIONED_TABLE || childrte->inh)

This is another case where it's hard to understand the test from the comments.

+ * In case of multi-level inheritance hierarchy, for every child we require
+ * PlannerInfo of its immediate parent. Hence we save those in a an array

Say why.  Also, need to fix "a an".

I'm a little bit surprised that this patch doesn't make any changes to
allpaths.c or relnode.c.  It looks to me like we'll generate paths for
the new RTEs that are being added.  Are we going to end up with
multiple levels of Append nodes, then?  Does the way
consider_parallel is propagated up and down in set_append_rel_size()
and set_append_rel_pathlist() really work with multiple levels?  Maybe
this is all fine; I haven't tried to verify it in depth.

Overall I think this is a reasonable direction to go but I'm worried
that there may be bugs lurking -- other code that needs adjusting that
hasn't been found, really.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] Missing comment for max_logical_replication_workers in postgresql.conf.sample

2017-07-30 Thread Tatsuo Ishii
> Attached patch looks good except the excessive tab stops:
> + 
> # (change requires restart)
> 
> I will commit/push this with removing the excessive tab stops if
> there's no objection.

Done. Each fix was pushed as a separate commit because the versions
that need back-patching differ.

Best regards,
--
Tatsuo Ishii
SRA OSS, Inc. Japan
English: http://www.sraoss.co.jp/index_en.php
Japanese:http://www.sraoss.co.jp




[HACKERS] 10 beta docs: different replication solutions

2017-07-30 Thread Steve Singer


We don't seem to describe logical replication on

https://www.postgresql.org/docs/10/static/different-replication-solutions.html

The attached patch adds a section.

Steve


diff --git a/doc/src/sgml/high-availability.sgml b/doc/src/sgml/high-availability.sgml
index 138bdf2..1329d1f 100644
--- a/doc/src/sgml/high-availability.sgml
+++ b/doc/src/sgml/high-availability.sgml
@@ -158,6 +158,26 @@ protocol to make nodes agree on a serializable transactional order.

   
 
+
+  
+   Logical Replication
+   
+
+
+  Logical replication allows a database server to send a stream of
+  data modifications to another server.
+  PostgreSQL logical replication constructs
+  a stream of logical data modifications from the WAL.
+
+
+ Logical replication allows the data changes from individual tables
+ to be replicated. Logical replication doesn't require a particular server
+ to be designated as a master or a slave but allows data to flow in multiple
+ directions. For more information on logical replication, see .
+
+   
+  
+ 
   
Trigger-Based Master-Standby Replication

@@ -290,6 +310,7 @@ protocol to make nodes agree on a serializable transactional order.
  Shared Disk Failover
  File System Replication
  Write-Ahead Log Shipping
+ Logical Replication
  Trigger-Based Master-Standby Replication
  Statement-Based Replication Middleware
  Asynchronous Multimaster Replication
@@ -304,6 +325,7 @@ protocol to make nodes agree on a serializable transactional order.
  NAS
  DRBD
  Streaming Repl.
+ Logical Repl.
  Slony
  pgpool-II
  Bucardo
@@ -315,6 +337,7 @@ protocol to make nodes agree on a serializable transactional order.
  shared disk
  disk blocks
  WAL
+ Logical decoding
  table rows
  SQL
  table rows
@@ -330,6 +353,7 @@ protocol to make nodes agree on a serializable transactional order.
  
  
  
+ 
 
 
 
@@ -337,6 +361,7 @@ protocol to make nodes agree on a serializable transactional order.
  
  
  
+ 
  
  
  
@@ -349,6 +374,7 @@ protocol to make nodes agree on a serializable transactional order.
  
  
  
+ 
  
  
  
@@ -360,6 +386,7 @@ protocol to make nodes agree on a serializable transactional order.
  
  with sync off
  
+ 
  
  
  
@@ -371,6 +398,7 @@ protocol to make nodes agree on a serializable transactional order.
  
  with sync on
  
+ 
  
  
  
@@ -385,6 +413,7 @@ protocol to make nodes agree on a serializable transactional order.
  
  
  
+ 
 
 
 
@@ -393,6 +422,7 @@ protocol to make nodes agree on a serializable transactional order.
  
  
  
+ 
  
  
  
@@ -403,6 +433,7 @@ protocol to make nodes agree on a serializable transactional order.
  
  
  
+ 
  
  
  



Re: [HACKERS] [BUGS] BUG #14759: insert into foreign data partitions fail

2017-07-30 Thread Amit Langote
Thank you for weighing in and reviewing the patch.

On 2017/07/28 20:55, Etsuro Fujita wrote:
> On 2017/07/26 15:29, Amit Langote wrote:
>> On 2017/07/25 9:43, David G. Johnston wrote:
>>> On Mon, Jul 24, 2017 at 5:19 PM, Amit Langote wrote:
>>> I'm curious what the other limitations are...
> 
> I think COPY has the same limitation as INSERT.

Yes.  I updated the patch to mention that as well.

>> When I first wrote that documentation line (I am assuming you're asking
>> about "although these have some limitations that normal tables do not"), I
>> was thinking about the fact that the core system does not enforce
>> (locally) any constraints defined on foreign tables.  Since we allow
>> inserting data into partitions directly, it is imperative that we enforce
>> the "partition constraint" along with the traditional constraints such as
>> NOT NULL and CHECK constraints, which we can do for local table partitions
>> but not for foreign table ones.
>>
>> Anyway, attached patch documents all these limitations about foreign table
>> partitions more prominently.
> 
> Typo: s/the they is/they are/

Fixed in the attached.

Thanks,
Amit
From 2b9c28808b725ed4551a2876a187531439f13928 Mon Sep 17 00:00:00 2001
From: amit 
Date: Mon, 3 Apr 2017 16:45:15 +0900
Subject: [PATCH] Clarify that partition constraint is not enforced on foreign
 tables

---
 doc/src/sgml/ddl.sgml  | 16 +---
 doc/src/sgml/ref/create_foreign_table.sgml | 17 +++--
 2 files changed, 24 insertions(+), 9 deletions(-)

diff --git a/doc/src/sgml/ddl.sgml b/doc/src/sgml/ddl.sgml
index b05a9c2150..a0ab648928 100644
--- a/doc/src/sgml/ddl.sgml
+++ b/doc/src/sgml/ddl.sgml
@@ -2988,9 +2988,11 @@ VALUES ('Albany', NULL, NULL, 'NY');

 Partitions can also be foreign tables
 (see ),
-although these have some limitations that normal tables do not.  For
-example, data inserted into the partitioned table is not routed to
-foreign table partitions.
+although they have some limitations that normal tables do not.  For
+example, routing the data inserted (or copied) into the partitioned table
+to foreign table partitions is not supported, nor are the partition
+constraints enforced when the data is directly inserted (or copied) into
+them.

 

@@ -3297,6 +3299,14 @@ ALTER TABLE measurement ATTACH PARTITION 
measurement_y2008m02
not the partitioned table.
   
  
+
+ 
+  
+   Routing tuples to partitions that are foreign tables is not supported.
+   So, if a tuple inserted (or copied) into the table routes to one of
+   the foreign partitions, an error will occur.
+  
+ 
 
 
 
diff --git a/doc/src/sgml/ref/create_foreign_table.sgml 
b/doc/src/sgml/ref/create_foreign_table.sgml
index 065c982082..43a6cbcfab 100644
--- a/doc/src/sgml/ref/create_foreign_table.sgml
+++ b/doc/src/sgml/ref/create_foreign_table.sgml
@@ -79,7 +79,9 @@ CHECK ( expression ) [ NO INHERIT ]
   
If PARTITION OF clause is specified then the table is
created as a partition of parent_table with specified
-   bounds.
+   bounds.  Note that routing tuples to partitions that are foreign tables
+   is not supported.  So, if a tuple inserted (or copied) into the table
+   routes to one of the foreign partitions, an error will occur.
   
 
   
@@ -279,16 +281,19 @@ CHECK ( expression ) [ NO INHERIT ]
   Notes
 

-Constraints on foreign tables (such as CHECK
-or NOT NULL clauses) are not enforced by the
-core PostgreSQL system, and most foreign data wrappers
-do not attempt to enforce them either; that is, the constraint is
+Constraints (both the user-defined constraints such as CHECK
+or NOT NULL clauses and the partition constraint) are not
+enforced by the core PostgreSQL system, and most foreign
+data wrappers do not attempt to enforce them either; that is, they are
 simply assumed to hold true.  There would be little point in such
 enforcement since it would only apply to rows inserted or updated via
 the foreign table, and not to rows modified by other means, such as
 directly on the remote server.  Instead, a constraint attached to a
 foreign table should represent a constraint that is being enforced by
-the remote server.
+the remote server.  That becomes especially important if the table is
+being used in a partition hierarchy, where it is recommended to add
+a constraint matching the partition constraint expression on
+the remote table.

 

-- 
2.11.0




Re: [HACKERS] Refreshing subscription relation state inside a transaction block

2017-07-30 Thread Masahiko Sawada
On Fri, Jul 28, 2017 at 1:13 PM, Masahiko Sawada  wrote:
> On Thu, Jul 27, 2017 at 9:31 AM, Masahiko Sawada  
> wrote:
>> On Wed, Jul 26, 2017 at 10:29 PM, Robert Haas  wrote:
>>> On Mon, Jun 19, 2017 at 4:30 AM, Masahiko Sawada  
>>> wrote:
> I think that either of the options you suggested now would be better.
> We'll need that for stopping the tablesync which is in progress during
> DropSubscription as well as those will currently still keep running. I
> guess we could simply just register syscache callback, the only problem
> with that is we'd need to AcceptInvalidationMessages regularly while we
> do the COPY which is not exactly free so maybe killing at the end of
> transaction would be better (both for refresh and drop)?

 Attached patch makes table sync worker check its relation subscription
 state at the end of COPY and exits if it has disappeared, instead of
 killing table sync worker in DDL. There is still a problem that a
 table sync worker for a large table can hold a slot for a long time
 even after its state is deleted. But it would be for PG11 item.
>>>
>>> Do we still need to do something about this?  Should it be an open item?
>>>
>>
>> Thank you for looking at this.
>>
>> Yeah, I think it should be added to the open item list. The patch is
>> updated by Petr and discussed on another thread[1] that also addresses
>> other issues of subscription codes. 0004 patch of that thread is an
>> updated patch of the patch attached on this thread.
>>
>
> Does anyone have any opinions? Barring any objections, I'll add this
> to the open item list.
>

Added it.

Regards,

--
Masahiko Sawada
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center




[HACKERS] Re: [BUGS] BUG #14758: Segfault with logical replication on a function index

2017-07-30 Thread Masahiko Sawada
Moved to -hackers.

On Sat, Jul 29, 2017 at 4:35 AM, Scott Milliken  wrote:
> Thank you Masahiko! I've tested and confirmed that this patch fixes the
> problem.
>

Thank you for the testing. This issue should be added to the open
items list, since it causes a server crash. I'll add it.

> On Fri, Jul 28, 2017 at 3:07 AM, Masahiko Sawada 
> wrote:
>>
>> On Mon, Jul 24, 2017 at 4:22 PM,   wrote:
>> > The following bug has been logged on the website:
>> >
>> > Bug reference:  14758
>> > Logged by:  Scott Milliken
>> > Email address:  sc...@deltaex.com
>> > PostgreSQL version: 10beta2
>> > Operating system:   Linux 4.10.0-27-generic #30~16.04.2-Ubuntu S
>> > Description:
>> >
>> > I'm testing logical replication on 10beta2, and found a segfault that I
>> > can
>> > reliably reproduce with an index on a not-actually immutable function.
>> >
>> > Here's the function in question:
>> >
>> > ```
>> > CREATE OR REPLACE FUNCTION public.immutable_random(integer)
>> >  RETURNS double precision
>> >  LANGUAGE sql
>> >  IMMUTABLE
>> > AS $function$SELECT random();
>> > $function$;
>> > ```
>> >
>> > It's not actually immutable since it's calling random (a hack to get an
>> > efficient random sort on a table).
>> >
>> > (Aside: I'd understand if it errored on creation of the index, but would
>> > really prefer to keep using this instead of tablesample because it's
>> > fast,
>> > deterministic, and doesn't have sampling biases like the SYSTEM
>> > sampling.)
>> >
>> >
>> > Here's full reproduction instructions:
>> >
>> >
>> > Primary:
>> > ```
>> > mkdir -p /tmp/test-seg0
>> > PGPORT=5301 initdb -D /tmp/test-seg0
>> > echo "wal_level = logical" >> /tmp/test-seg0/postgresql.conf
>> > PGPORT=5301 pg_ctl -D /tmp/test-seg0 start
>> > for (( ; ; )); do if pg_isready -d postgres -p 5301; then break; fi;
>> > sleep
>> > 1; done
>> > psql -p 5301 postgres -c "CREATE USER test WITH PASSWORD 'test'
>> > SUPERUSER
>> > CREATEDB CREATEROLE LOGIN REPLICATION BYPASSRLS;"
>> > createdb -p 5301 -E utf8 test
>> >
>> > psql -p 5301 -U test test -c "CREATE TABLE testtbl (id int, name text);"
>> > psql -p 5301 -U test test -c "ALTER TABLE testtbl ADD CONSTRAINT
>> > testtbl_pkey PRIMARY KEY (id);"
>> > psql -p 5301 -U test test -c "CREATE PUBLICATION testpub FOR TABLE
>> > testtbl;"
>> > psql -p 5301 -U test test -c "INSERT INTO testtbl (id, name) VALUES (1,
>> > 'a');"
>> > ```
>> >
>> > Secondary:
>> > ```
>> > mkdir -p /tmp/test-seg1
>> > PGPORT=5302 initdb -D /tmp/test-seg1
>> > PGPORT=5302 pg_ctl -D /tmp/test-seg1 start
>> > for (( ; ; )); do if pg_isready -d postgres -p 5302; then break; fi;
>> > sleep
>> > 1; done
>> > psql -p 5302 postgres -c "CREATE USER test WITH PASSWORD 'test'
>> > SUPERUSER
>> > CREATEDB CREATEROLE LOGIN REPLICATION BYPASSRLS;"
>> > createdb -p 5302 -E utf8 test
>> >
>> > psql -p 5302 -U test test -c "CREATE TABLE testtbl (id int, name text);"
>> > psql -p 5302 -U test test -c "ALTER TABLE testtbl ADD CONSTRAINT
>> > testtbl_pkey PRIMARY KEY (id);"
>> > psql -p 5302 -U test test -c 'CREATE FUNCTION
>> > public.immutable_random(integer) RETURNS double precision LANGUAGE sql
>> > IMMUTABLE AS $function$ SELECT random(); $function$'
>> > psql -p 5302 -U test test -c "CREATE INDEX ix_testtbl_random ON testtbl
>> > USING btree (immutable_random(id));"
>> > psql -p 5302 -U test test -c "CREATE SUBSCRIPTION test0_testpub
>> > CONNECTION
>> > 'port=5301 user=test dbname=test' PUBLICATION testpub;"
>> > ```
>> >
>> > The secondary crashes with a segfault:
>> >
>> > ```
>> > 2017-07-23 23:55:37.961 PDT [4823] LOG:  logical replication table synchronization worker for subscription "test0_testpub", table "testtbl" has started
>> > 2017-07-23 23:55:38.244 PDT [4758] LOG:  worker process: logical replication worker for subscription 16396 sync 16386 (PID 4823) was terminated by signal 11: Segmentation fault
>> > 2017-07-23 23:55:38.244 PDT [4758] LOG:  terminating any other active server processes
>> > 2017-07-23 23:55:38.245 PDT [4763] WARNING:  terminating connection because of crash of another server process
>> > 2017-07-23 23:55:38.245 PDT [4763] DETAIL:  The postmaster has commanded this server process to roll back the current transaction and exit, because another server process exited abnormally and possibly corrupted shared memory.
>> > 2017-07-23 23:55:38.245 PDT [4763] HINT:  In a moment you should be able to reconnect to the database and repeat your command.
>> > 2017-07-23 23:55:38.247 PDT [4758] LOG:  all server processes terminated; reinitializing
>> > 2017-07-23 23:55:38.256 PDT [4826] LOG:  database system was interrupted; last known up at 2017-07-23 23:55:36 PDT
>> > 2017-07-23 23:55:38.809 PDT [4826] LOG:  database system was not properly shut down; automatic recovery in progress
>> > 2017-07-23 23:55:38.812 PDT [4826] LOG:  

Re: [HACKERS] segfault in HEAD when too many nested functions call

2017-07-30 Thread Julien Rouhaud
On 31/07/2017 01:47, Andres Freund wrote:
> Julien, could you quickly verify that everything's good for you now too?
> 

I just checked on current HEAD
(cc9f08b6b813e30789100b6b34110d8be1090ba0) and everything's good for me.

Thanks!

-- 
Julien Rouhaud
http://dalibo.com - http://dalibo.org


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] segfault in HEAD when too many nested functions call

2017-07-30 Thread Andres Freund
Hi,

On 2017-07-29 16:14:08 -0400, Tom Lane wrote:
> Andres Freund  writes:
> > [ 0002-Move-ExecProcNode-from-dispatch-to-function-pointer-.patch ]
> 
> Here's a reviewed version of this patch.

Thanks!  I pushed both now.


> I added dummy ExecProcNodeMtd functions to the various node types that
> lacked them because they expect to be called through MultiExecProcNode
> instead.  In the old coding, trying to call ExecProcNode on one of those
> node types would have led to a useful error message; as you had it,
> it'd have dumped core, which is not an improvement.

Ok, makes sense.


> Also, I removed the ExecReScan stanza from ExecProcNodeFirst; that
> should surely be redundant, because we should only get to that function
> through ExecProcNode().  If somehow it's not redundant, please add a
> comment explaining why not.

Makes sense too.


> Some other minor cosmetic changes, mostly comment wordsmithing.

Thanks!


Julien, could you quickly verify that everything's good for you now too?

Regards,

Andres




Re: [HACKERS] PostgreSQL not setting OpenSSL session id context?

2017-07-30 Thread Shay Rojansky
Just to continue the above, I can confirm that adding a simple call
to SSL_CTX_set_session_id_context() to be_tls_init() with some arbitrary
const value fixes the error for me. Attached is a patch (ideally a test
should be done for this, but that's beyond what I can invest at the moment,
let me know if it's absolutely necessary).

On Mon, Jul 31, 2017 at 1:15 AM, Shay Rojansky  wrote:

> Hi Tom.
>
> Again, I know little about this, but from what I understand PostgreSQL
> wouldn't actually need to do/implement anything here - the session ticket
> might be used only to abbreviate the SSL handshake (this would explain why
> it's on by default without any application support). In other words, simply
> setting the session context id may make the problem go away and at the same
> time unlock the abbreviated SSL handshake optimization. I could be wrong
> about this though.
>
> Whether the above is correct or not, SSL resumption - which removes a
> network roundtrip from the connection process - may be a worthy
> optimization even for long-lived connections such as PostgreSQL, although
> obviously much less valuable than, say, short-lived HTTP connections.
>
> But regardless, it seems that as you say: if you *don't* want to support
> resumption, you're required to explicitly disable it with
> SSL_OP_NO_TICKET.
>
> Just to give some context, Npgsql has its own, internal TLS implementation
> which does not implement session tickets at the client side - this is the
> workaround currently used. However, it would be much better if the standard
> .NET SSL implementation could be used instead (i.e. I'm hoping a backport
> would be possible here).
>
> On Sun, Jul 30, 2017 at 10:59 PM, Tom Lane  wrote:
>
>> I wrote:
>> > I think what you need to do is tell SslStream not to expect that PG
>> > servers will do session resumption.  (I'm a bit astonished that that
>> > would be its default assumption in the first place, but whatever.)
>>
>> Actually, after a bit of further googling, it seems that the brain
>> damage here may be on the server side.  It seems that OpenSSL will
>> send a session ticket if requested, even though the surrounding
>> application has given it no means to identify the session (!?).
>> Apparently we need to pass SSL_OP_NO_TICKET to SSL_CTX_set_options
>> to prevent that from happening.
>>
>> regards, tom lane
>>
>
>


openssl-set-session-id-context.patch
Description: Binary data



Re: [HACKERS] PostgreSQL not setting OpenSSL session id context?

2017-07-30 Thread Shay Rojansky
Hi Tom.

Again, I know little about this, but from what I understand PostgreSQL
wouldn't actually need to do/implement anything here - the session ticket
might be used only to abbreviate the SSL handshake (this would explain why
it's on by default without any application support). In other words, simply
setting the session context id may make the problem go away and at the same
time unlock the abbreviated SSL handshake optimization. I could be wrong
about this though.

Whether the above is correct or not, SSL resumption - which removes a
network roundtrip from the connection process - may be a worthy
optimization even for long-lived connections such as PostgreSQL, although
obviously much less valuable than, say, short-lived HTTP connections.

But regardless, it seems that as you say: if you *don't* want to support
resumption, you're required to explicitly disable it with SSL_OP_NO_TICKET.

Just to give some context, Npgsql has its own, internal TLS implementation
which does not implement session tickets at the client side - this is the
workaround currently used. However, it would be much better if the standard
.NET SSL implementation could be used instead (i.e. I'm hoping a backport
would be possible here).

On Sun, Jul 30, 2017 at 10:59 PM, Tom Lane  wrote:

> I wrote:
> > I think what you need to do is tell SslStream not to expect that PG
> > servers will do session resumption.  (I'm a bit astonished that that
> > would be its default assumption in the first place, but whatever.)
>
> Actually, after a bit of further googling, it seems that the brain
> damage here may be on the server side.  It seems that OpenSSL will
> send a session ticket if requested, even though the surrounding
> application has given it no means to identify the session (!?).
> Apparently we need to pass SSL_OP_NO_TICKET to SSL_CTX_set_options
> to prevent that from happening.
>
> regards, tom lane
>


Re: [HACKERS] pl/perl extension fails on Windows

2017-07-30 Thread Tom Lane
Christoph Berg  writes:
> Re: Tom Lane 2017-07-28 <3254.1501276...@sss.pgh.pa.us>
>> Christoph Berg  writes:
>>> The plperl segfault on Debian's kfreebsd port I reported back in 2013
>>> is also still present:

>> So it'd be interesting to know if it's any better with HEAD ...

> Unfortunately not:
> The only interesting line in log/postmaster.log is a log_line_prefix-less
> Util.c: loadable library and perl binaries are mismatched (got handshake key 
> 0xd500080, needed 0xd600080)
> ... which is unchanged from the beta2 output.

Well, that's quite interesting, because it implies that this is indeed
the same type of problem.  I wonder why the patch didn't fix it?

regards, tom lane




Re: [HACKERS] pl/perl extension fails on Windows

2017-07-30 Thread Christoph Berg
Re: Tom Lane 2017-07-28 <3254.1501276...@sss.pgh.pa.us>
> Christoph Berg  writes:
> > The plperl segfault on Debian's kfreebsd port I reported back in 2013
> > is also still present:
> > https://www.postgresql.org/message-id/20130515064201.GC704%40msgid.df7cb.de
> > https://buildd.debian.org/status/fetch.php?pkg=postgresql-10=kfreebsd-amd64=10~beta2-1=1499947011=0
> 
> So it'd be interesting to know if it's any better with HEAD ...

Unfortunately not:

== creating database "pl_regression"  ==
CREATE DATABASE
ALTER DATABASE
== installing plperl  ==
server closed the connection unexpectedly
This probably means the server terminated abnormally
before or while processing the request.
connection to server was lost

The only interesting line in log/postmaster.log is a log_line_prefix-less
Util.c: loadable library and perl binaries are mismatched (got handshake key 
0xd500080, needed 0xd600080)
... which is unchanged from the beta2 output.

Christoph




Re: [HACKERS] PL_stashcache, or, what's our minimum Perl version?

2017-07-30 Thread Tom Lane
"Tels"  writes:
> On Sun, July 30, 2017 12:22 pm, Tom Lane wrote:
>> Yeah, I looked into that.  The closest candidate I can find is that
>> perl 5.10.1 contains Test::More 0.92.  However, it's not real clear
>> to me exactly which files I'd need to pull out of 5.10.1 and inject into
>> an older tarball --- the layout seems a lot different from a standalone
>> package.

> So basically the two files:
> http://search.cpan.org/src/EXODIST/Test-Simple-1.302086/lib/Test/More.pm
> http://search.cpan.org/src/EXODIST/Test-Simple-1.302086/lib/Test/Builder/Module.pm
> might do the trick.

Thanks for the hint.  I transplanted these files out of a 5.10.1
tarball into 5.8.3, then built as usual:
lib/Test/Simple.pm
lib/Test/More.pm
lib/Test/Builder.pm
lib/Test/Builder/Module.pm
The result seems to work, although it fails a few of 5.8.3's tests,
probably because I didn't copy over the relevant test scripts.
It's good enough to run PG's tests though.

regards, tom lane




Re: [HACKERS] PostgreSQL not setting OpenSSL session id context?

2017-07-30 Thread Tom Lane
I wrote:
> I think what you need to do is tell SslStream not to expect that PG
> servers will do session resumption.  (I'm a bit astonished that that
> would be its default assumption in the first place, but whatever.)

Actually, after a bit of further googling, it seems that the brain
damage here may be on the server side.  It seems that OpenSSL will
send a session ticket if requested, even though the surrounding
application has given it no means to identify the session (!?).
Apparently we need to pass SSL_OP_NO_TICKET to SSL_CTX_set_options
to prevent that from happening.
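For anyone experimenting from a scripting language rather than against libssl directly: the same OpenSSL flag is exposed by, e.g., Python's ssl module as OP_NO_TICKET. A minimal client-side sketch (illustrative only, not part of the proposed server fix) showing the option being set:

```python
import ssl

# Build a client TLS context and set the no-session-ticket option; this is
# the same underlying OpenSSL flag as SSL_OP_NO_TICKET discussed above.
ctx = ssl.SSLContext(ssl.PROTOCOL_TLS_CLIENT)
ctx.options |= ssl.OP_NO_TICKET

# Confirm the flag is now set on the context.
print(bool(ctx.options & ssl.OP_NO_TICKET))  # True
```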

regards, tom lane




Re: [HACKERS] PostgreSQL not setting OpenSSL session id context?

2017-07-30 Thread Tom Lane
Shay Rojansky  writes:
> When trying to connect with Npgsql to PostgreSQL with client authentication
> (PG has ssl_ca_file set), the first connection works just fine. The second
connection, however, fails and the PostgreSQL logs contain the message
"session id context uninitialized". This occurs when using .NET's default
> SSL implementation, SslStream, which supports session resumption - the
> session connection's ClientHello message contains a session ticket from the
> first session, triggering the issue.

AFAIK Postgres doesn't support session resumption.  If I am correctly
understanding what that is supposed to provide, it would require saving
all of a backend's internal state on the off chance that somebody would
request resuming the session later.  I do not think we are going there.
The idea makes sense for servers with relatively lightweight per-session
state, but that ain't us.

I think what you need to do is tell SslStream not to expect that PG
servers will do session resumption.  (I'm a bit astonished that that
would be its default assumption in the first place, but whatever.)

regards, tom lane




[HACKERS] PostgreSQL not setting OpenSSL session id context?

2017-07-30 Thread Shay Rojansky
Dear hackers, a long-standing issue reported by users of the Npgsql .NET
driver for PostgreSQL may have its roots on the PostgreSQL side. I'm far
from being an SSL/OpenSSL expert so please be patient if the terms/analysis
are incorrect.

When trying to connect with Npgsql to PostgreSQL with client authentication
(PG has ssl_ca_file set), the first connection works just fine. The second
connection, however, fails and the PostgreSQL logs contain the message
"session id context uninitialized". This occurs when using .NET's default
SSL implementation, SslStream, which supports session resumption - the
session connection's ClientHello message contains a session ticket from the
first session, triggering the issue.

From some research, it seems that for session resumption/reuse to work, the
SSL/TLS server must call SSL_CTX_set_session_id_context and/or
SSL_set_session_id_context with some arbitrary binary data, to distinguish
between contexts/applications. A grep in the PostgreSQL source for
"set_session_id_context" doesn't yield anything.

Can someone with more knowledge confirm whether an issue exists on the
PostgreSQL side? If so, it seems completely trivial to fix this.

Thanks,

Shay


Re: [HACKERS] PL_stashcache, or, what's our minimum Perl version?

2017-07-30 Thread Tels
Moin,

On Sun, July 30, 2017 12:22 pm, Tom Lane wrote:
> "Tels"  writes:
>> On Sun, July 30, 2017 1:21 am, Tom Lane wrote:
 So the question is, does anyone care?  I wouldn't except that our
 documentation appears to claim that we work with Perl "5.8 or later".
>
>> Not sure how often people use old Perl versions out in the field. I'd
>> venture this happens with "ancient" stuff, e.g. where people just can't
>> or don't want to upgrade.
>> Otherwise, an up-to-date OS is just necessary for security anyway, and
>> that would contain a Perl from this decade, wouldn't it?
>
> Well, that's not really the point, IMO.  The reason I'm interested in
> this is the same reason I run some buildfarm critters on ancient
> platforms: if we do something that breaks backwards compatibility
> with old software, we should know it and make a deliberate decision
> that it's okay.  (And update the relevant compatibility claims in our
> docs.)  Moving the compatibility goalposts without knowing it isn't
> good, especially if it happens in supposedly-stable release branches.

Sure, I was just pointing out that knowingly moving the goalposts forward,
as you put it, can be OK versus trying to support ancient software at all
costs. Reads to me as if we are in agreement here.

>>> I am unable to confirm our claim that we work with Test::More 0.82,
>>> because CPAN has only versions from a year or three back.  (Anyone
>>> know of a more, er, comprehensive archive than CPAN?)  It's also
>>> interesting to speculate about how old a version of IPC::Run is new
>>> enough, but I see no easy way to get much data on that either.
>
>> Test::More has been bundled with Perl since 5.6.2 (you can use
>> "corelist"
>> to check for these things), so if all fails, it might be possible to
>> extract a version from a Perl distribution [4].
>
> Yeah, I looked into that.  The closest candidate I can find is that
> perl 5.10.1 contains Test::More 0.92.  However, it's not real clear
> to me exactly which files I'd need to pull out of 5.10.1 and inject into
> an older tarball --- the layout seems a lot different from a standalone
> package.

Module distributions contain a MANIFEST like this:

http://search.cpan.org/~exodist/Test-Simple/MANIFEST

And while reconstructing a module for an old version can be quite a
hassle, it looks like Test::More is just plain Perl and only uses
Test::Builder::Module.

So basically the two files:

http://search.cpan.org/src/EXODIST/Test-Simple-1.302086/lib/Test/More.pm
http://search.cpan.org/src/EXODIST/Test-Simple-1.302086/lib/Test/Builder/Module.pm

might do the trick. Unless PG also uses Test::Simple or other modules,
which I haven't checked; in that case you would just need to add these, too.

And basically, you put these files into a subdirectory, and use:

  use lib "mydir";

to tell Perl to load modules from there first.

Here is a sample archive with a modern Test::More and an old Test::More
from 5.10.1. There are two scripts to see how they are loaded and used.

  http://bloodgate.com/tmp/test-more-test.tar.gz

One thing to watch out for is that this would use the old Test::More, but
any module it loads (or the script uses) comes from the system Perl. So the
test might not be 100% accurate, e.g. if a newer IO::Scalar is used with
the old Test::More.

You could also compile and install an old Perl version into a local
subdirectory and test with that. That takes a bit more time, though.

Hope this helps,

Tels




[HACKERS] [GSOC][weekly report 8] Eliminate O(N^2) scaling from rw-conflict tracking in serializable transactions

2017-07-30 Thread Mengxing Liu
In the last week, I focused on tuning the performance of the skip list and
fixed several bugs.
1. As only out-conflicts are checked in RWConflictExists, I removed all
modifications concerned with in-conflicts.
2. If the conflict list is too short, I insert an item just as into an
ordered linked list, that is, by comparing the existing items one by one to
find the right position. Using the skip list is slow when the list is short.
3. I stopped using the abstract skip list interface; I was afraid it would
bring a little overhead, but results showed it had no influence. I will roll
it back if necessary.
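Point 2 can be sketched as follows (in Python, purely illustrative; the patch itself is C, and the short-list threshold is whatever the patch chooses):

```python
def insert_conflict(conflicts, item):
    """Insert into a sorted list by linear scan, the way an ordered linked
    list is maintained; a skip list only pays off once the list grows."""
    i = 0
    # Walk until we find the first element that is not smaller than item.
    while i < len(conflicts) and conflicts[i] < item:
        i += 1
    conflicts.insert(i, item)
    return conflicts

print(insert_conflict([2, 5, 9], 7))  # [2, 5, 7, 9]
```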


Sadly, the performance is not better than the original version, but it is
the highest among all my previous attempts.
It behaves like the hash table: tracking conflicts is faster, but inserting
conflicts is slower.
In a quick test, the CPU cycles consumed by the two functions
RWConflictExists/SetRWConflict are as below.
| | RWConflictExists | SetRWConflict |
| linked list | 1.39% | 0.14% |
| skip list | 1.15% | 0.35% |


Following Alvaro's suggestions, I'll give a detailed performance report
tomorrow.


--

Sincerely


Mengxing Liu








skip-list-for-conflict-tracking.patch
Description: Binary data



Re: [HACKERS] [GSOC] Eliminate O(N^2) scaling from rw-conflict tracking in serializable transactions

2017-07-30 Thread Mengxing Liu
Thanks for your reply. 


Actually, the result without "rdtsc" is reasonable. I used perf to analyze
the performance and found that even though the function tracking conflicts
(RWConflictExists) was faster, the function inserting conflicts
(SetRWConflict) was much slower. Following your suggestion, I found there
were many more wait events on predicate_lock_manager. That means the slower
SetRWConflict resulted in more lock conflicts.
So I spent some effort making it faster over the last few days.

Why is the code with "rdtsc" much faster? I thought that might have been
caused by some mistake. When I ran the code on another machine, the
phenomenon didn't happen anymore.
-Original Messages-
From: "Robert Haas" 
Sent Time: 2017-07-29 02:46:47 (Saturday)
To: "Mengxing Liu" 
Cc: "Alvaro Herrera" , kgrittn , 
"pgsql-hackers@postgresql.org" 
Subject: Re: [HACKERS] [GSOC] Eliminate O(N^2) scaling from rw-conflict 
tracking in serializable transactions


On Wed, Jul 26, 2017 at 11:41 AM, Mengxing Liu  
wrote:

Hi, all. There was a very strange phenomenon I couldn't explain. So I was 
wondering if you can help me.


I was trying to replace the linked list with a skip list in serializable 
transaction object for faster conflict tracking. But the performance is bad.
So I used the instruction "rdtsc" to compare the speed of my skip list and the 
original linked list. The skip list was about 1.5x faster.


The interesting thing is that if I added the instruction "rdtsc" at the end of 
the function "RWConflictExists", 
the performance of the whole system was increased by at most 3 times! 
Here is the result. 


| benchmarks | without rdtsc  | with rdtsc |
| simple read/write | 4.91 | 14.16 |
| ssibench | 9.72 | 10.24 |
| tpcb | 26.45 | 26.38 |


( The simple read/write benchmark has the most number of conflicts. )


The patch is attached. All the difference of the two columns is with/without a 
simple line of code:
__asm__ __volatile__ ("rdtsc"); 
But I don't know why this instruction will influence the performance so much!


Lock contention is really expensive, so a slight delay that is just long enough 
to prevent the contention from happening can sometimes improve performance.  
This example is surprisingly dramatic, though.  Of course, we can't commit it 
this way -- it will break on non-x86.


I would suggest that you gather information on what wait events are occurring 
in the "without rdtsc" case.  Like this:



$ script

$ psql

psql=> select wait_event from pg_stat_activity;

psql=> \watch 0.5

...run test in another window...

^C

\q

^D

...use awk or perl or something to count up the wait events and see where the 
contention is happening...


--

Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

--

Sincerely


Mengxing Liu








Re: [HACKERS] PL_stashcache, or, what's our minimum Perl version?

2017-07-30 Thread Tom Lane
Noah Misch  writes:
> On Sun, Jul 30, 2017 at 12:05:10PM -0400, Tom Lane wrote:
>> Well, OK, but I'd still like to tweak configure so that it records
>> an absolute path for prove rather than just setting PROVE=prove.
>> That way you'd at least be able to tell from the configure log
>> whether you are possibly at risk.

> That's an improvement.

The reason it does that seems to be that we use AC_CHECK_PROGS
rather than AC_PATH_PROGS for locating "prove".  I can see no
particular consistency to the decisions made in configure.in
about which to use:

AC_CHECK_PROGS(GCOV, gcov)
AC_CHECK_PROGS(LCOV, lcov)
AC_CHECK_PROGS(GENHTML, genhtml)
AC_CHECK_PROGS(DTRACE, dtrace)
AC_CHECK_PROGS(XML2_CONFIG, xml2-config)
AC_CHECK_PROGS(DBTOEPUB, dbtoepub)
AC_CHECK_PROGS(XMLLINT, xmllint)
AC_CHECK_PROGS(XSLTPROC, xsltproc)
AC_CHECK_PROGS(OSX, [osx sgml2xml sx])
AC_CHECK_PROGS(FOP, fop)
AC_CHECK_PROGS(PROVE, prove)

versus

AC_PATH_PROG(TAR, tar)
PGAC_PATH_BISON
PGAC_PATH_FLEX
PGAC_PATH_PERL
PGAC_PATH_PYTHON
AC_PATH_PROG(ZIC, zic)
PGAC_PATH_TCLCONFIGSH([$with_tclconfig])

I'm tempted to propose that we switch *all* of these uses of
AC_CHECK_PROGS to AC_PATH_PROGS.

regards, tom lane




Re: [HACKERS] PL_stashcache, or, what's our minimum Perl version?

2017-07-30 Thread Noah Misch
On Sun, Jul 30, 2017 at 12:05:10PM -0400, Tom Lane wrote:
> Noah Misch  writes:
> > On Sun, Jul 30, 2017 at 01:21:28AM -0400, Tom Lane wrote:
> >> I think it'd be a good idea to insist that "prove" be in
> >> the same directory we found "perl" in.
> 
> > Nah; on my machines, I use /usr/bin/perl and ~/sw/cpan/bin/prove.  The 
> > latter
> > is built against the former, so there's no particular hazard.
> 
> Well, OK, but I'd still like to tweak configure so that it records
> an absolute path for prove rather than just setting PROVE=prove.
> That way you'd at least be able to tell from the configure log
> whether you are possibly at risk.

That's an improvement.




Re: [HACKERS] PL_stashcache, or, what's our minimum Perl version?

2017-07-30 Thread Tom Lane
"Tels"  writes:
> On Sun, July 30, 2017 1:21 am, Tom Lane wrote:
>>> So the question is, does anyone care?  I wouldn't except that our
>>> documentation appears to claim that we work with Perl "5.8 or later".

> Not sure how often people use old Perl versions out in the field. I'd
> venture this happens with "ancient" stuff, e.g. where people just can't
> or don't want to upgrade.
> Otherwise, an up-to-date OS is just necessary for security anyway, and
> that would contain a Perl from this decade, wouldn't it?

Well, that's not really the point, IMO.  The reason I'm interested in
this is the same reason I run some buildfarm critters on ancient
platforms: if we do something that breaks backwards compatibility
with old software, we should know it and make a deliberate decision
that it's okay.  (And update the relevant compatibility claims in our
docs.)  Moving the compatibility goalposts without knowing it isn't
good, especially if it happens in supposedly-stable release branches.

>> I am unable to confirm our claim that we work with Test::More 0.82,
>> because CPAN has only versions from a year or three back.  (Anyone
>> know of a more, er, comprehensive archive than CPAN?)  It's also
>> interesting to speculate about how old a version of IPC::Run is new
>> enough, but I see no easy way to get much data on that either.

> Test::More has been bundled with Perl since 5.6.2 (you can use "corelist"
> to check for these things), so if all fails, it might be possible to
> extract a version from a Perl distribution [4].

Yeah, I looked into that.  The closest candidate I can find is that
perl 5.10.1 contains Test::More 0.92.  However, it's not real clear
to me exactly which files I'd need to pull out of 5.10.1 and inject into
an older tarball --- the layout seems a lot different from a standalone
package.

regards, tom lane




Re: [HACKERS] PL_stashcache, or, what's our minimum Perl version?

2017-07-30 Thread Tom Lane
Noah Misch  writes:
> On Sun, Jul 30, 2017 at 01:21:28AM -0400, Tom Lane wrote:
>> I think it'd be a good idea to insist that "prove" be in
>> the same directory we found "perl" in.

> Nah; on my machines, I use /usr/bin/perl and ~/sw/cpan/bin/prove.  The latter
> is built against the former, so there's no particular hazard.

Well, OK, but I'd still like to tweak configure so that it records
an absolute path for prove rather than just setting PROVE=prove.
That way you'd at least be able to tell from the configure log
whether you are possibly at risk.

regards, tom lane




Re: [HACKERS] PL_stashcache, or, what's our minimum Perl version?

2017-07-30 Thread Noah Misch
On Sun, Jul 30, 2017 at 01:21:28AM -0400, Tom Lane wrote:
> I wrote:
> > So the question is, does anyone care?  I wouldn't except that our
> > documentation appears to claim that we work with Perl "5.8 or later".
> > And the lack of field complaints suggests strongly that nobody else
> > cares.  So I'm inclined to think we just need to be more specific
> > about the minimum Perl version --- but what exactly?
> 
> I've done some more research on this.  It seems to me there are four
> distinct components to any claim about whether we work with a particular
> Perl version:
> 
> 1. Can it run the build-related Perl scripts needed to build PG from
> a bare git checkout, on non-Windows platforms?
> 2. Can it run our MSVC build scripts?
> 3. Does pl/perl compile (and pass its regression tests) against it?
> 4. Can it run our TAP tests?

> 5.8.3 does appear to succeed at points 1,3,4.  Now, to get through
> the TAP tests you need to install IPC::Run, and you also need to
> update Test::More because the version shipped with 5.8.3 is too old.
> But at least the failures that you get from lacking these are pretty
> clear.

> Anyway, pending some news about compatibility of the MSVC scripts,
> I think we ought to adjust our docs to state that 5.8.3 is the
> minimum supported Perl version.

Works for me.  I wouldn't wait for testing of the MSVC scripts.  Strawberry
Perl started with 5.8.8.  ActivePerl is far older, but it distributes old
versions to subscription holders only.  Besides, the main advantage of
old-version support is letting folks use a packaged/preinstalled binary, and
that doesn't apply on Windows.

> Also, I got seriously confused at one point during these tests because,
> while our configure script carefully sets PERL to an absolute path name,
> it's content to set PROVE to "prove", paying no attention to whether
> that version of "prove" matches "perl".  Is it really okay to run the
> TAP tests with a different perl version than we selected for other
> purposes?

Typically yes, though one can construct exceptions.

> I think it'd be a good idea to insist that "prove" be in
> the same directory we found "perl" in.

Nah; on my machines, I use /usr/bin/perl and ~/sw/cpan/bin/prove.  The latter
is built against the former, so there's no particular hazard.

nm




Re: [HACKERS] PL_stashcache, or, what's our minimum Perl version?

2017-07-30 Thread Tels
Moin Tom,

On Sun, July 30, 2017 1:21 am, Tom Lane wrote:
> I wrote:
>> So the question is, does anyone care?  I wouldn't except that our
>> documentation appears to claim that we work with Perl "5.8 or later".
>> And the lack of field complaints suggests strongly that nobody else
>> cares.  So I'm inclined to think we just need to be more specific
>> about the minimum Perl version --- but what exactly?

My 0.02 cent on Perl versions:

Not sure how often people use old Perl versions out in the field. I'd
venture this happens with "ancient" stuff, e.g. where people just can't or
don't want to upgrade.

Otherwise, an up-to-date OS is just necessary for security anyway, and
that would contain a Perl from this decade, wouldn't it?

If someone is still running Perl 5.8.3, they also got glorious Unicode
support circa Unicode 4.0 or thereabouts...

> I've done some more research on this.  It seems to me there are four
> distinct components to any claim about whether we work with a particular
> Perl version:
>
> 1. Can it run the build-related Perl scripts needed to build PG from
> a bare git checkout, on non-Windows platforms?
> 2. Can it run our MSVC build scripts?
> 3. Does pl/perl compile (and pass its regression tests) against it?
> 4. Can it run our TAP tests?
>
> I have no info to offer about point #2, but I did some tests with
> ancient 5.8.x Perl versions on my buildfarm hosts prairiedog and
> gaur.  (I would have liked to use something faster, but these Perl
> versions failed to build at all on more recent platforms, and
> I thought it rather pointless to try to hack them enough to build.)
>
> I find that perl 5.8.0 and later seem to succeed at point #1,
> but you need at least 5.8.1 to compile plperl.  Also, on prairiedog
> 5.8.1 fails plperl's regression tests because of some seemingly
> utf8-related bug.  That may be an ancient-macOS problem more than
> anything else, so I didn't poke at it too hard.
>
> The really surprising thing I found out is that you can't run the
> TAP tests on anything older than 5.8.3, because that's when they
> added "prove".  I'm bemused by the idea that such a fundamental
> facility would get added in a third-digit minor release, but there
> you have it.  (Now, in fairness, this amounted to importing a feature
> that already existed on CPAN, but still...)
>
> 5.8.3 does appear to succeed at points 1,3,4.  Now, to get through
> the TAP tests you need to install IPC::Run, and you also need to
> update Test::More because the version shipped with 5.8.3 is too old.
> But at least the failures that you get from lacking these are pretty
> clear.
>
> I am unable to confirm our claim that we work with Test::More 0.82,
> because CPAN has only versions from a year or three back.  (Anyone
> know of a more, er, comprehensive archive than CPAN?)  It's also
> interesting to speculate about how old a version of IPC::Run is new
> enough, but I see no easy way to get much data on that either.
>
> Anyway, pending some news about compatibility of the MSVC scripts,
> I think we ought to adjust our docs to state that 5.8.3 is the
> minimum supported Perl version.
>
> Also, I got seriously confused at one point during these tests because,
> while our configure script carefully sets PERL to an absolute path name,
> it's content to set PROVE to "prove", paying no attention to whether
> that version of "prove" matches "perl".  Is it really okay to run the
> TAP tests with a different perl version than we selected for other
> purposes?  I think it'd be a good idea to insist that "prove" be in
> the same directory we found "perl" in.

Thank you for the analysis. I agree about "prove".
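The check could look something like this (a shell sketch with made-up
variable names, not configure's actual code):

```shell
# Sketch: warn if "prove" does not live in the same directory as the
# selected "perl".  Variable names are invented for illustration.
PERL_PATH=$(command -v perl)
PROVE_PATH=$(command -v prove || true)
if [ -n "$PROVE_PATH" ] &&
   [ "$(dirname "$PERL_PATH")" = "$(dirname "$PROVE_PATH")" ]; then
    echo "prove matches perl"
else
    echo "warning: prove missing or not from perl's directory" >&2
fi
```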

As for Test::More:

Test::More has been bundled with Perl since 5.6.2 (you can use "corelist"
to check such things), so if all else fails, it might be possible to
extract a version from a Perl distribution [4].

CPAN authors are encouraged to clean out old versions due to the sheer
size of the archive (not all got the memo, though... *cough*), and most
mirrors only carry the current files.

The original author is Michael G. Schwern [0]. But it seems he cleaned
house :)

You might have luck with a mirror [1] that hasn't cleaned out old files, or with
archive.org.

Test::More itself is a bit confusing, as it is part of Test::Simple [2],
which in turn is part of Test2, which is now on github [3]. It's
Test-Modules all the way down.

I'm not sure you'd find old Test::More versions ready-to-use in this.

My apologies if you knew that already.

However, I do happen to have a large archive of Perl releases and CPAN
modules. It was first mirrored in mid-2015, so unfortunately I can't help
with anything that was deleted before then.

But if you need a specific module version, just ping me and I can see if
it's in there.

Hope this helps,

Tels

[0]: http://ftp.nluug.nl/languages/perl/CPAN/authors/id/M/MS/MSCHWERN/
[1]: http://mirrors.cpan.org/
[2]: http://search.cpan.org/~exodist/Test-Simple-1.302086/
[3]: 

Re: [HACKERS] Page Scan Mode in Hash Index

2017-07-30 Thread Ashutosh Sharma
Hi,

On Wed, May 10, 2017 at 2:28 PM, Ashutosh Sharma  wrote:
> While doing the code coverage testing of v7 patch shared with - [1], I
> found that there are few lines of code in _hash_next() that are
> redundant and needs to be removed. I basically came to know this while
> testing the scenario where a hash index scan starts when a split is in
> progress. I have removed those lines and attached is the newer version
> of patch.
>

Please find the new version of the patches for page-at-a-time scan in
hash index, rebased on top of the latest commit in the master branch.
Also, I have run the pgindent script with pg_bsd_indent version 2.0 on
all the modified files. Thanks.

> [1] - 
> https://www.postgresql.org/message-id/CAE9k0Pmn92Le0iOTU5b6087eRXzUnkq1PKcihxtokHJ9boXCBg%40mail.gmail.com
>
> On Mon, May 8, 2017 at 6:55 PM, Ashutosh Sharma  wrote:
>> Hi,
>>
>> On Tue, Apr 4, 2017 at 3:56 PM, Amit Kapila  wrote:
>>> On Sun, Apr 2, 2017 at 4:14 AM, Ashutosh Sharma  
>>> wrote:

 Please note that these patches needs to be applied on top of  [1].

>>>
>>> Few more review comments:
>>>
>>> 1.
>>> - page = BufferGetPage(so->hashso_curbuf);
>>> + blkno = so->currPos.currPage;
>>> + if (so->hashso_bucket_buf == so->currPos.buf)
>>> + {
>>> + buf = so->currPos.buf;
>>> + LockBuffer(buf, BUFFER_LOCK_SHARE);
>>> + page = BufferGetPage(buf);
>>> + }
>>>
>>> Here, you are assuming that only bucket page can be pinned, but I
>>> think that assumption is not right.  When _hash_kill_items() is called
>>> before moving to next page, there could be a pin on the overflow page.
>>> You need some logic to check if the buffer is pinned, then just lock
>>> it.  I think once you do that, it might not be convenient to release
>>> the pin at the end of this function.
>>
>> Yes, there are a few cases where we might have a pin on overflow pages too
>> and we need to handle such cases in _hash_kill_items. I have taken
>> care of this in the attached v7 patch. Thanks.
>>
>>>
>>> Add some comments on top of _hash_kill_items to explain the new
>>> changes or say some thing like "See _bt_killitems for details"
>>
>> Added some more comments on the new changes.
>>
>>>
>>> 2.
>>> + /*
>>> + * We save the LSN of the page as we read it, so that we know whether it
>>> + * safe to apply LP_DEAD hints to the page later.  This allows us to drop
>>> + * the pin for MVCC scans, which allows vacuum to avoid blocking.
>>> + */
>>> + so->currPos.lsn = PageGetLSN(page);
>>> +
>>>
>>> The second part of the above comment doesn't make sense because vacuum
>>> will still be blocked, since we hold the pin on the primary bucket page.
>>
>> That's right. It doesn't make sense because we won't allow vacuum to
>> start. I have removed it.
>>
>>>
>>> 3.
>>> + {
>>> + /*
>>> + * No more matching tuples were found. return FALSE
>>> + * indicating the same. Also, remember the prev and
>>> + * next block number so that if fetching tuples using
>>> + * cursor we remember the page from where to start the
>>> + * scan.
>>> + */
>>> + so->currPos.prevPage = (opaque)->hasho_prevblkno;
>>> + so->currPos.nextPage = (opaque)->hasho_nextblkno;
>>>
>>> You can't read opaque without having lock and by this time it has
>>> already been released.
>>
>> I have corrected it. Please refer to the attached v7 patch.
>>
>> Also, I think if you want to save position for
>>> cursor movement, then you need to save the position of last bucket
>>> when scan completes the overflow chain, however as you have written it
>>> will always be an invalid block number. I think there is a similar problem
>>> with prevblock number.
>>
>> Did you mean the last bucket or the last page? If it is the last page, I am
>> already storing it.
>>
>>>
>>> 4.
>>> +_hash_load_qualified_items(IndexScanDesc scan, Page page, OffsetNumber 
>>> offnum,
>>> +   OffsetNumber maxoff, ScanDirection dir)
>>> +{
>>> + HashScanOpaque so = (HashScanOpaque) scan->opaque;
>>> + IndexTuple  itup;
>>> + int itemIndex;
>>> +
>>> + if (ScanDirectionIsForward(dir))
>>> + {
>>> + /* load items[] in ascending order */
>>> + itemIndex = 0;
>>> +
>>> + /* new page, relocate starting position by binary search */
>>> + offnum = _hash_binsearch(page, so->hashso_sk_hash);
>>>
>>> What is the need to find offset number when this function already has
>>> that as an input parameter?
>>
>> It's not required. I have removed it.
>>
>>>
>>> 5.
>>> +_hash_load_qualified_items(IndexScanDesc scan, Page page, OffsetNumber 
>>> offnum,
>>> +   OffsetNumber maxoff, ScanDirection dir)
>>>
>>> I think maxoff is not required to be passed a parameter to this
>>> function as you need it only for forward scan. You can compute it
>>> locally.
>>
>> It is required for both forward and backward scan. However, I am not
>> passing it to _hash_load_qualified_items() now.
>>
>>>
>>> 6.
>>> +_hash_load_qualified_items(IndexScanDesc scan, Page page, OffsetNumber 
>>> offnum,
>>> +   OffsetNumber