[HACKERS] NOT NULL constraints on range partition key columns
Starting a new thread to discuss the proposal I put forward in [1] to stop
creating an explicit NOT NULL constraint on range partition keys that are
simple columns.  I said the following:

On 2017/05/12 11:20, Robert Haas wrote:
> On Thu, May 11, 2017 at 10:15 PM, Amit Langote wrote:
>> On 2017/05/12 10:42, Robert Haas wrote:
>>> Actually, I think that not supporting nulls for range partitioning may
>>> have been a fairly bad decision.
>>
>> I think the relevant discussion concluded [1] that way, because we
>> couldn't decide which interface to provide for specifying where NULLs are
>> placed or because we decided to think about it later.
>
> Yeah, but I have a feeling that marking the columns NOT NULL is going
> to make it really hard to support that in the future when we get the
> syntax hammered out.  If it had only affected the partition
> constraints that'd be different.

So, adding keycol IS NOT NULL (like we currently do for expressions) to the
implicit partition constraint would be more future-proof than generating an
actual catalogued NOT NULL constraint on the keycol?  I now tend to think it
would be better.  Directly inserting into a range partition with a NULL value
for a column currently generates a "null value in column \"%s\" violates
not-null constraint" error, instead of the perhaps more relevant "new row for
relation \"%s\" violates partition constraint".

That said, we *do* document the fact that a NOT NULL constraint is added on
range key columns, but we might as well document instead that we don't
currently support routing tuples with NULL values in the partition key
through a range-partitioned table, and so NULL values cause an error.  Can we
still decide to do that instead?

Thanks,
Amit

[1] https://www.postgresql.org/message-id/f52fb5c3-b6ad-b6ff-4bb6-145fdb805fa6%40lab.ntt.co.jp

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Event triggers + table partitioning cause server crash in current master
On 2017/05/14 12:03, Mark Dilger wrote:
> Hackers,
>
> I discovered a reproducible crash using event triggers in the current
> development version, 29c7d5e483acaa74a0d06dd6c70b320bb315.
> I was getting a crash before this version, and cloned a fresh copy of
> the sources to be sure I was up to date, so I don't think the bug can be
> attributed to Andres' commit.  (The prior version I was testing against
> was heavily modified by me, so I recreated the bug using the latest
> standard, unmodified sources.)
>
> I create both before and after event triggers early in the regression test
> schedule, which then fire here and there during the following tests, leading
> fairly reproducibly to the server crashing somewhere during the test suite.
> These crashes do not happen for me without the event triggers being added
> to the tests.  Many tests show as 'FAILED' simply because the logging
> that happens in the event triggers creates unexpected output for the test.
> Those "failures" are expected.  The server crashes are not.
>
> The server logs suggest the crashes might be related to partitioned tables.
>
> Please find attached the patch that includes my changes to the sources
> for recreating this bug.  The logs and regression.diffs are a bit large;
> let me know if you need them.
>
> I built using the command
>
> ./configure --enable-cassert --enable-tap-tests && make -j4 && make check

Thanks for the report and for providing steps to reproduce.  It seems that
this is indeed a bug related to creating range-partitioned tables.
DefineRelation() calls AlterTableInternal() to add NOT NULL constraints on
the range partition key columns, but the code fails to first initialize the
event trigger context information.  Attached patch should fix that.
Thanks to the above test case, I also discovered that in the case of creating
a partition, manipulations performed by MergeAttributes() on the input schema
list may cause it to become invalid; that is, the List metadata (length) will
no longer match reality, because while the ListCells are deleted from the
input list, the List pointer passed to list_delete_cell does not point to the
same list.  This caused a crash when the CreateStmt in question was
subsequently passed to copyObject, which tried to access CreateStmt.tableElts
that had become invalid as just described.  The attached patch also takes
care of that.

Thanks,
Amit

>From 43f29ac13abda3c6ad71143d462c2dcd2c42f521 Mon Sep 17 00:00:00 2001
From: amit
Date: Mon, 15 May 2017 14:00:54 +0900
Subject: [PATCH] Fix to make partitioned tables work with event triggers

There were a couple of bugs here:

1. When creating range partitioned tables, AlterTableInternal is called,
   which invokes the event trigger code.  We must perform
   EventTriggerAlterTableStart() before calling AlterTableInternal so that
   the event trigger context information is initialized at all.

2. When creating a partition, CreateStmt.tableElts is likely to become
   invalid upon return from MergeAttributes().  We must replace the invalid
   value with the newly constructed correct list value computed by
   MergeAttributes().  For example, in the caller ProcessUtilitySlow(),
   EventTriggerCollectSimpleCommand() is called right after returning from
   DefineRelation(), and it expects the content of CreateStmt to be valid
   to perform copyObject on it.

First bug was reported by Mark Dilger.
Report: https://postgr.es/m/4A9B8269-9771-4FB7-BFA3-84249800917D%40gmail.com
---
 src/backend/commands/tablecmds.c            | 17 ++
 src/test/regress/expected/event_trigger.out | 32 ++
 src/test/regress/sql/event_trigger.sql      | 35 +
 3 files changed, 84 insertions(+)

diff --git a/src/backend/commands/tablecmds.c b/src/backend/commands/tablecmds.c
index b94db89b25..c3f489308a 100644
--- a/src/backend/commands/tablecmds.c
+++ b/src/backend/commands/tablecmds.c
@@ -620,6 +620,19 @@ DefineRelation(CreateStmt *stmt, char relkind, Oid ownerId,
 					   stmt->relation->relpersistence,
 					   stmt->partbound != NULL,
 					   &inheritOids, &old_constraints, &parentOidCount);

+	/*
+	 * The original value of stmt->tableElts may have been obsoleted by
+	 * MergeAttributes(), especially those for partitions, wherein the
+	 * cells of stmt->tableElts are deleted.  Note that in a partition's
+	 * case, stmt->tableElts contains dummy ColumnDef's created to capture
+	 * the partition's own column constraints which are merged into the actual
+	 * column definitions derived from the parent, before deleting the
+	 * aforementioned dummy ones.  We don't need to refer to stmt->tableElts
+	 * hereafter in this function, although the caller expects it to contain
+	 * valid elements upon return.  So replace the original list with the
+	 * one that's known to be valid.
+	 */
+	stmt->tableElts = schema;

 	/*
 	 * Create a tuple descriptor from the relation schema.  Note that this
@@ -853,7
Re: [HACKERS] Bug in ExecModifyTable function and trigger issues for foreign tables
On Mon, May 15, 2017 at 2:04 PM, Dilip Kumar wrote:
> On Sun, May 14, 2017 at 9:54 PM, Dilip Kumar wrote:
>> After your fix, now tupleid is invalid which is expected, but seems
>> like we need to do something more.  As per the comments, it seems like it
>> is expected to get the oldtuple from planSlot.  But I don't see any
>> code for handling that part.
>
> Maybe we should do something like attached patch.

Shouldn't this also try to improve the regression test coverage for FDW
triggers with inheritance trees?  Those tests are in postgres_fdw.  You may
find other issues on the way...
--
Michael
Re: [HACKERS] Duplicate usage of tablespace location?
On Thu, May 11, 2017 at 1:09 PM, Kyotaro HORIGUCHI wrote:
> If we can accept multiple server versions sharing a tablespace
> directory, pg_basebackup also can allow that situation.  The
> attached patch does that.  Similarly to the server code, it
> correctly fails if the same version subdirectory exists.

+#define verify_dir_is_empty_or_create(dirname, created, found) \
+	verify_and_create_dir(dirname, created, found, false)

This solution looks like a quick-and-dirty fix.  I tend to prefer a solution
close to what Pierre is proposing on the other thread, by localizing things
in ReceiveAndUnpackTarFile().  This makes the check more localized, and
there is no need to complicate the low-level APIs of pg_basebackup.c.

By the way, it may be better to keep discussions on the first thread created:
https://www.postgresql.org/message-id/05c62730-8670-4da6-b783-52e66fb42...@pinaraf.info
A patch has been submitted to the next CF there as well.
--
Michael
Re: [HACKERS] Bug with pg_basebackup and 'shared' tablespace
On Tue, May 2, 2017 at 2:28 AM, Pierre Ducroquet wrote:
> I didn't have much time to spend on that issue until today, and I found a
> way to solve it that seems acceptable to me.
>
> The biggest drawback will be that if the backup is interrupted, previous
> tablespaces already copied will stay on disk, but since that issue could
> already happen, I don't think it's too much of a drawback.

Yeah...  Perhaps cleanup_directories_atexit() could be made smarter by
registering the list of folders to clean up when they are created.  But that
would be for another patch.

> The patch simply delays the empty folder checking until we start reading the
> tablespace tarball.  The first entry of the tarball being the PG_MAJOR_CATVER
> folder, which cannot be found otherwise, there is no real alternative to
> that.

I think I like this idea.  The server allows tablespaces to be created in
parallel for different versions in the same path.  It seems that
pg_basebackup should allow the same for consistency.  The current behavior
has been present since pg_basebackup was introduced, by the way.  Perhaps
Magnus has some insight to offer on this.

> I will submit this patch in the current commit fest.

I have not spotted any flaws in the refactored logic.

 	bool		basetablespace;
+	bool		firstfile = 1;
 	char	   *copybuf = NULL;

Booleans are assigned with true or false.
--
Michael
Re: [HACKERS] Bug in ExecModifyTable function and trigger issues for foreign tables
On Sun, May 14, 2017 at 9:54 PM, Dilip Kumar wrote:
> After your fix, now tupleid is invalid which is expected, but seems
> like we need to do something more.  As per the comments, it seems like it
> is expected to get the oldtuple from planSlot.  But I don't see any
> code for handling that part.

Maybe we should do something like the attached patch.

--
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com

fix_fdw_trigger.patch
Description: Binary data
[HACKERS] Typos in pg_basebackup.c
Hi all,

Found $subject while working on the area.  A patch is attached.

Thanks,
--
Michael

basebackup-typo.patch
Description: Binary data
Re: [HACKERS] bumping HASH_VERSION to 3
On Wed, May 10, 2017 at 11:25:22AM +0530, Amit Kapila wrote:
> On Tue, Apr 11, 2017 at 11:53 PM, Bruce Momjian wrote:
> > On Fri, Mar 31, 2017 at 02:48:05PM -0400, Tom Lane wrote:
> >> +1, as long as we're clear on what will happen when pg_upgrade'ing
> >> an installation containing hash indexes.  I think a minimum requirement is
> >> that it succeed and be able to start up, and allow the user to manually
> >> REINDEX such indexes afterwards.  Bonus points for:
> >>
> >> 1. teaching pg_upgrade to create a script containing the required REINDEX
> >> commands.  (I think it's produced scripts for similar requirements in the
> >> past.)
> >>
> >> 2. marking the index invalid so that the system would silently ignore it
> >> until it's been reindexed.  I think there might be adequate infrastructure
> >> for that already thanks to REINDEX CONCURRENTLY, and it'd just be a matter
> >> of getting pg_upgrade to hack the indexes' catalog state.  (If not, it's
> >> probably not worth the trouble.)
> >
> > We already have code to do all of that, but it was removed from
> > pg_upgrade in 9.5.  You can still see it in 9.4:
> >
> >     contrib/pg_upgrade/version_old_8_3.c::old_8_3_invalidate_hash_gin_indexes()
>
> Thanks for the pointer.
>
> > I would be happy to restore that code and make it work for PG 10.
>
> Attached patch implements the two points suggested by Tom.  I will add
> this to the PG 10 open issues list so that we don't forget about this
> work; let me know if you think otherwise.

[Action required within three days.  This is a generic notification.]

The above-described topic is currently a PostgreSQL 10 open item.  Robert,
since you committed the patch believed to have created it, you own this open
item.  If some other commit is more relevant or if this does not belong as a
v10 open item, please let us know.  Otherwise, please observe the policy on
open item ownership[1] and send a status update within three calendar days
of this message.
Include a date for your subsequent status update.  Testers may discover new
open items at any time, and I want to plan to get them all fixed well in
advance of shipping v10.  Consequently, I will appreciate your efforts
toward speedy resolution.  Thanks.

[1] https://www.postgresql.org/message-id/20170404140717.GA2675809%40tornado.leadboat.com
Re: [HACKERS] Removal of plaintext password type references
On Thu, May 11, 2017 at 10:08:30AM +0900, Michael Paquier wrote:
> On Wed, May 10, 2017 at 10:22 PM, Michael Paquier wrote:
> > On Wed, May 10, 2017 at 10:01 PM, Tom Lane wrote:
> >> Heikki Linnakangas writes:
> >>> Also note that changing the signature of check_password_hook_type would
> >>> break any external modules that use the hook.  Removing
> >>> PASSWORD_TYPE_PLAINTEXT will do that too, because any password hook
> >>> function would use that constant (see e.g. contrib/passwordcheck).  If we
> >>> were to change the signature, I'd actually like to simplify it by
> >>> removing the password_type parameter altogether.  The hook function can
> >>> call get_password_type() on the password itself to get the same
> >>> information.  But it's not worth changing the API and breaking external
> >>> modules for that.
> >
> > Ahah.  I just had the same thought before reading this message.
>
> And attached is a patch to do that.  I am all for this one to get a
> simpler interface in place.

[Action required within three days.  This is a generic notification.]

The above-described topic is currently a PostgreSQL 10 open item.  Heikki,
since you committed the patch believed to have created it, you own this open
item.  If some other commit is more relevant or if this does not belong as a
v10 open item, please let us know.  Otherwise, please observe the policy on
open item ownership[1] and send a status update within three calendar days
of this message.  Include a date for your subsequent status update.  Testers
may discover new open items at any time, and I want to plan to get them all
fixed well in advance of shipping v10.  Consequently, I will appreciate your
efforts toward speedy resolution.  Thanks.

[1] https://www.postgresql.org/message-id/20170404140717.GA2675809%40tornado.leadboat.com
Re: [HACKERS] If subscription to foreign table valid ?
On Fri, May 12, 2017 at 11:58 PM, Petr Jelinek wrote:
> On 11/05/17 15:43, Petr Jelinek wrote:
>> Hi,
>>
>> We do check for this, but only during replication, which we have to do
>> because the fact that relation 't' was a foreign table during ALTER
>> SUBSCRIPTION does not mean that it won't be something else half an hour
>> later.
>>
>> I think it does make sense to add a check for this to CREATE/ALTER
>> SUBSCRIPTION though, so that the user is informed immediately about the
>> mistake rather than by errors in the logs later.
>>
>> I'll look into writing a patch for this.  I don't think it's a beta
>> blocker though.
>
> So I moved the relkind check to a single function and call it from all the
> necessary places.  See the attached.

With this patch the error will be like this:

    logical replication target relation public.t is not a table

But it is possible that the referred table is a Foreign Table or a
Partitioned Table (so the referred object is indeed some kind of table).
Would it make sense to specify in the message that the table is not a normal
table, or something along that line?

Regards,
Neha
Re: [HACKERS] [BUGS] Concurrent ALTER SEQUENCE RESTART Regression
On Mon, May 08, 2017 at 12:28:38PM -0400, Peter Eisentraut wrote:
> On 5/8/17 02:58, Noah Misch wrote:
> > IMMEDIATE ATTENTION REQUIRED.  This PostgreSQL 10 open item is long past due
> > for your status update.  Please reacquaint yourself with the policy on open
> > item ownership[1] and then reply immediately.  If I do not hear from you by
> > 2017-05-09 07:00 UTC, I will transfer this item to release management team
> > ownership without further notice.
>
> I got side-tracked by the minor releases preparation.  I will work on
> this now and report on Wednesday.

IMMEDIATE ATTENTION REQUIRED.  This PostgreSQL 10 open item is long past due
for your status update.  Please reacquaint yourself with the policy on open
item ownership[1] and then reply immediately.  If I do not hear from you by
2017-05-16 04:00 UTC, I will transfer this item to release management team
ownership without further notice.

[1] https://www.postgresql.org/message-id/20170404140717.GA2675809%40tornado.leadboat.com
[HACKERS] Re: [doc fix] PG10: wrong description on connect_timeout when multiple hosts are specified
On Fri, May 12, 2017 at 08:54:13AM +0000, Tsunakawa, Takayuki wrote:
> From: pgsql-hackers-ow...@postgresql.org
> > [mailto:pgsql-hackers-ow...@postgresql.org] On Behalf Of Tsunakawa, Takayuki
> > I found a wrong sentence here in the doc.  I'm sorry, this is what I asked
> > you to add...
> >
> > https://www.postgresql.org/docs/devel/static/libpq-connect.html#libpq-paramkeywords
> >
> > connect_timeout
> > Maximum wait for connection, in seconds (write as a decimal integer
> > string).  Zero or not specified means wait indefinitely.  It is not
> > recommended to use a timeout of less than 2 seconds.  This timeout applies
> > separately to each connection attempt.  For example, if you specify two
> > hosts and both of them are unreachable, and connect_timeout is 5, the
> > total time spent waiting for a connection might be up to 10 seconds.
> >
> > The program behavior is that libpq times out after connect_timeout seconds
> > regardless of how many hosts are specified.  I confirmed it like this:
> >
> > $ export PGOPTIONS="-c post_auth_delay=30"
> > $ psql -d "dbname=postgres connect_timeout=5" -h localhost,localhost -p 5432,5433
> > (psql errors out after 5 seconds)
> >
> > Could you fix the doc with something like this?
> >
> > "This timeout applies across all the connection attempts.  For example, if
> > you specify two hosts and both of them are unreachable, and connect_timeout
> > is 5, the total time spent waiting for a connection is up to 5 seconds."
> >
> > Should we also change the minimum "2 seconds" part to be longer, according
> > to the number of hosts?
>
> Instead, I think we should fix the program to match the documented behavior.
> Otherwise, if the first database machine is down, libpq might wait for about
> 2 hours (depending on the OS's TCP keepalive setting), during which it times
> out after connect_timeout and does not attempt to connect to other hosts.
>
> I'll add this item to the PostgreSQL 10 Open Items.
[Action required within three days.  This is a generic notification.]

The above-described topic is currently a PostgreSQL 10 open item.  Robert,
since you committed the patch believed to have created it, you own this open
item.  If some other commit is more relevant or if this does not belong as a
v10 open item, please let us know.  Otherwise, please observe the policy on
open item ownership[1] and send a status update within three calendar days
of this message.  Include a date for your subsequent status update.  Testers
may discover new open items at any time, and I want to plan to get them all
fixed well in advance of shipping v10.  Consequently, I will appreciate your
efforts toward speedy resolution.  Thanks.

[1] https://www.postgresql.org/message-id/20170404140717.GA2675809%40tornado.leadboat.com
[HACKERS] Fix a typo in reorderbuffer.c
Hi,

Attached is a patch for $subject.  I think "that's" is the correct word.

Regards,

--
Masahiko Sawada
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center

fix_typo_in_reorderbuffer_c.patch
Description: Binary data
Re: [HACKERS] snapbuild woes
On 2017-05-14 14:54:37 +0200, Petr Jelinek wrote:
> On 13/05/17 22:23, Andres Freund wrote:
> >> And wait for session 3 to finish slot creation; takes about 20 mins on
> >> my laptop without patches, a minute and a half with your patches for 0004
> >> and 0005 (or with your 0004 and my 0005), and about 2s with my original
> >> 0004 and 0005.
> >
> > Is that with assertions enabled or not?  With assertions, all the time
> > post patches seems to be spent in some Asserts in reorderbuffer.c;
> > without, it takes less than a second for me here.
>
> Right you are, I always forget to switch that off before benchmarks.

Phew ;)

> > I'm applying these now.
>
> Cool.  Just for posterity, this also fixes issue number 3, as the
> switch to consistent state is done purely based on xl_running_xacts and
> not decoded commits/aborts.

Cool.  Although I'm still not convinced, as noted somewhere in this thread,
that it actually did much to start with ;)

Greetings,

Andres Freund
Re: [HACKERS] If subscription to foreign table valid ?
On Thu, May 11, 2017 at 05:55:02PM +0530, tushar wrote:
> I observed that we cannot publish a "foreign table" in a publication:
>
> postgres=# create foreign table t (n int) server db1_server options
> (table_name 't1');
> CREATE FOREIGN TABLE
>
> postgres=# create publication pub for table t;
> ERROR:  "t" is not a table
> DETAIL:  Only tables can be added to publications.
> postgres=#
>
> but the same thing is not true for a subscription:
>
> postgres=# create foreign table t (n int) server db1_server options
> (table_name 't');
> CREATE FOREIGN TABLE
> postgres=# alter subscription sub refresh publication ;
> NOTICE:  added subscription for table public.t
> ALTER SUBSCRIPTION
>
> Is this the expected behavior?  If we cannot publish it, then how can we
> add a subscription for it?

[Action required within three days.  This is a generic notification.]

The above-described topic is currently a PostgreSQL 10 open item.  Peter,
since you committed the patch believed to have created it, you own this open
item.  If some other commit is more relevant or if this does not belong as a
v10 open item, please let us know.  Otherwise, please observe the policy on
open item ownership[1] and send a status update within three calendar days
of this message.  Include a date for your subsequent status update.  Testers
may discover new open items at any time, and I want to plan to get them all
fixed well in advance of shipping v10.  Consequently, I will appreciate your
efforts toward speedy resolution.  Thanks.

[1] https://www.postgresql.org/message-id/20170404140717.GA2675809%40tornado.leadboat.com
Re: [HACKERS] Get stuck when dropping a subscription during synchronizing table
On Mon, May 08, 2017 at 06:27:30PM +0900, Masahiko Sawada wrote:
> I encountered a situation where DROP SUBSCRIPTION got stuck when the
> initial table sync is in progress.  In my environment, I created
> several tables with some data on the publisher.  I created a subscription
> on the subscriber and dropped the subscription immediately after that.
> It doesn't always happen, but I often encountered it in my environment.
>
> The ps -x command shows the following:
>
> 96796 ?  Ss  0:00 postgres: masahiko postgres [local] DROP SUBSCRIPTION
> 96801 ?  Ts  0:00 postgres: bgworker: logical replication worker for subscription 40993   waiting
> 96805 ?  Ss  0:07 postgres: bgworker: logical replication worker for subscription 40993 sync 16418
> 96806 ?  Ss  0:01 postgres: wal sender process masahiko [local] idle
> 96807 ?  Ss  0:00 postgres: bgworker: logical replication worker for subscription 40993 sync 16421
> 96808 ?  Ss  0:00 postgres: wal sender process masahiko [local] idle
>
> The DROP SUBSCRIPTION process (pid 96796) is waiting for the apply
> worker process (pid 96801) to stop while holding a lock on
> pg_subscription_rel.  On the other hand, the apply worker is waiting to
> acquire a tuple lock on pg_subscription_rel needed for heap_update.
> Also, the table sync workers (pid 96805 and 96807) are waiting for the
> apply worker process to change their status.
>
> Also, even when DROP SUBSCRIPTION completes successfully, the table sync
> worker can be orphaned, because I guess the apply worker can exit
> before changing the status of the table sync worker.
>
> I'm using 1f30295.

[Action required within three days.  This is a generic notification.]

The above-described topic is currently a PostgreSQL 10 open item.  Peter,
since you committed the patch believed to have created it, you own this open
item.  If some other commit is more relevant or if this does not belong as a
v10 open item, please let us know.
Otherwise, please observe the policy on open item ownership[1] and send a
status update within three calendar days of this message.  Include a date
for your subsequent status update.  Testers may discover new open items at
any time, and I want to plan to get them all fixed well in advance of
shipping v10.  Consequently, I will appreciate your efforts toward speedy
resolution.  Thanks.

[1] https://www.postgresql.org/message-id/20170404140717.GA2675809%40tornado.leadboat.com
Re: [HACKERS] Race conditions with WAL sender PID lookups
At Fri, 12 May 2017 11:44:19 +0900, Masahiko Sawada wrote:
> On Thu, May 11, 2017 at 10:48 AM, Michael Paquier wrote:
> > Hi all,
> >
> > I had my eyes on the WAL sender code this morning, and I have noticed
> > that walsender.c is not completely consistent with the PID lookups it
> > does in walsender.c.  In two code paths, the PID value is checked
> > without holding the WAL sender spin lock (WalSndRqstFileReload and
> > pg_stat_get_wal_senders), which looks like a very bad idea contrary to
> > what the new WalSndWaitStopping() does and what InitWalSenderSlot() is
> > doing for ages.
> >
> > Any thoughts about the patch attached to make things more consistent?
> > It seems to me that having some safeguards would be nice for
> > robustness.
>
> +1.  I think this is a sensible change.

The unlocked reads presumably intend to avoid excessive locking while
looking up stats values.  But we don't have that many vacant WalSnd slots in
a reasonable setup, so it seems reasonable to read the pid value within the
locked section, since that adds practically no additional cost.

pg_stat_get_wal_receiver seems to need the same amendment, since that code
parallels the WAL sender's.  Or, if we were too sensitive to such locks, we
could use double-checked locking, but I don't think we are here.

In short, +1 too, and walreceiver needs the same amendment.

regards,

--
Kyotaro Horiguchi
NTT Open Source Software Center
Re: [HACKERS] Hash Functions
On Sun, May 14, 2017 at 01:06:03PM -0700, Andres Freund wrote:
> On 2017-05-14 15:59:09 -0400, Greg Stark wrote:
> > Personally while I would like to avoid code that actively crashes or
> > fails basic tests on Vax
>
> I personally vote for simply refusing to run/compile on non-IEEE
> platforms, including VAX.  The benefit of even trying to get that right,
> not to speak of actually testing it, seems just not there.

Do we even know that floats are precise enough to determine the partition?
For example, if you have 6.1, is it possible for that to be 5.999... on
some systems?  Are IEEE systems all the same for these values?  I would say
we should disallow any approximate data type for partitioning completely.

--
  Bruce Momjian  http://momjian.us
  EnterpriseDB   http://enterprisedb.com

+ As you are, so once was I.  As I am, so you will be. +
+                      Ancient Roman grave inscription +
Re: [HACKERS] PG 10 release notes
On Thu, May 11, 2017 at 11:50:03PM -0400, Tom Lane wrote:
> Michael Paquier writes:
> > Bruce, the release notes do not mention yet that support for cleartext
> > passwords is removed.  This has been done recently by Heikki in
> > eb61136d.  This has as a consequence that CREATE ROLE PASSWORD
> > UNENCRYPTED returns an error, and that password_encryption loses its
> > value 'off' compared to previous releases.
>
> The release notes only say that they're current through 4-22.  The
> usual process is to update them in batches every so often.  It'd be
> great if Bruce has time to do another round before Monday, but the
> current situation is not really broken.

Done.  Applied patch attached.

--
  Bruce Momjian  http://momjian.us
  EnterpriseDB   http://enterprisedb.com

+ As you are, so once was I.  As I am, so you will be. +
+                      Ancient Roman grave inscription +

diff --git a/doc/src/sgml/release-10.sgml b/doc/src/sgml/release-10.sgml
new file mode 100644
index 6497641..4f953dc
*** a/doc/src/sgml/release-10.sgml
--- b/doc/src/sgml/release-10.sgml
*** 128,133 ****
--- 128,155 ----

+ Rename WAL-related functions and views to use lsn
+ instead of location (David Rowley)
+
+ Remove the ability to store unencrypted passwords on the server
+ (Heikki Linnakangas)
+
+ The server-side variable password_encryption
+ no longer supports off or plain.
+ The UNENCRYPTED option is no longer supported for
+ CREATE/ALTER USER ... PASSWORD.  Similarly, the
+ --unencrypted option has been removed from createuser.
+ The default for password_encryption is still
+ md5, and users migrating passwords from older systems
+ will have them stored encrypted by default in this release.
+
+ Add function PQencryptPasswordConn()
+ to allow creation of more types of encrypted passwords on the
+ client side (Michael Paquier, Heikki Linnakangas)
+
+ Previously only MD5 passwords could be created using PQencryptPassword().
+ This new function can also create SCRAM-SHA-256 passwords.

*** 2847,2852 ****
--- 2912,2919 ----

  Push aggregates to foreign data wrapper servers, where possible

*** 2858,2864 ****
  from the foreign data wrapper server, and offloads aggregate computation
  from the requesting server.  The postgres_fdw is able to
! perform this optimization.
--- 2925,2932 ----
  from the foreign data wrapper server, and offloads aggregate computation
  from the requesting server.  The postgres_fdw is able to
! perform this optimization.  There are also improvements in
! pushing down joins involving extensions.
Re: [HACKERS] multi-column range partition constraint
On 2017/05/14 1:07, Robert Haas wrote: > On Sat, May 13, 2017 at 12:42 AM, Amit Langote > wrote: >> Attached is the correct version. > > Thank you! I committed 0001 with a couple of cosmetic tweaks and with > the change I previously suggested around partexprs_item. You argued > that wouldn't work because the contents of partexprs_item was > modified, but that's not so: partexprs_item in > get_range_key_properties is a pointer to the partexprs_item in the > caller. When it modifies *partexprs_item, it's not changing the > contents of the ListCell itself, just the caller's ListCell *. I see. > I also ran pgindent over the patch. Oops, had forgotten about pgindent. > Also committed 0002. In that case, I removed CHECK (...) from the > output; the caller can always add that if it's desired, but since a > partitioning constraint is NOT a CHECK constraint I don't actually > think it's useful in most cases. I also tweaked the regression tests > slightly. Thanks for reviewing and committing. Agree about not including CHECK (). Regards, Amit -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] [bug fix] PG10: libpq doesn't connect to alternative hosts when some errors occur
From: Michael Paquier [mailto:michael.paqu...@gmail.com] > On Fri, May 12, 2017 at 10:44 PM, Tom Lane wrote: > > I would not really expect that reconnection would retry after > > arbitrary failure cases. Should it retry for "wrong database name", for > instance? > > It's not hard to imagine that leading to very confusing behavior. > > I guess not as well. That would be tricky for the user to have a different > behavior depending on the error returned by the server, which is why the > current code is doing things right IMO. Now, the feature has been designed > similarly to JDBC with its parametrization, so it could be surprising for > users to get a different failure handling compared to that. Not saying that > JDBC is doing it wrong, but libpq does nothing wrong either. I didn't intend to make the user have a different behavior depending on the error returned by the server. I meant attempting a connection to alternative hosts when the server returned an error. I thought the new libpq feature tries to connect to other hosts when a connection attempt fails, where the "connection" is the *database connection* (user's perspective), not the *socket connection* (PG developer's perspective). I think PgJDBC meets the user's desire better -- "Please connect to some host for better HA if a database server is unavailable for some reason." By the way, could you elaborate on what problem could occur if my solution is applied? (it doesn't seem easy for me to imagine...) FYI, as below, the case Tom picked up didn't raise an issue: [libpq] $ psql -h localhost,localhost -p 5450,5451 -d aaa psql: FATAL: database "aaa" does not exist $ [JDBC] $ java org.hsqldb.cmdline.SqlTool postgres SqlTool v. 3481. 
2017-05-15T10:23:55.991+0900 SEVERE Connection error: org.postgresql.util.PSQLException: FATAL: database "aaa" does not exist Location: File: postinit.c, Routine: InitPostgres, Line: 846 Server SQLState: 3D000 at org.postgresql.core.v3.QueryExecutorImpl.receiveErrorResponse(QueryExecutorImpl.java:2412) at org.postgresql.core.v3.QueryExecutorImpl.readStartupMessages(QueryExecutorImpl.java:2538) at org.postgresql.core.v3.QueryExecutorImpl.(QueryExecutorImpl.java:122) at org.postgresql.core.v3.ConnectionFactoryImpl.openConnectionImpl(ConnectionFactoryImpl.java:227) at org.postgresql.core.ConnectionFactory.openConnection(ConnectionFactory.java:49) at org.postgresql.jdbc.PgConnection.(PgConnection.java:194) at org.postgresql.Driver.makeConnection(Driver.java:431) at org.postgresql.Driver.connect(Driver.java:247) at java.sql.DriverManager.getConnection(DriverManager.java:664) at java.sql.DriverManager.getConnection(DriverManager.java:247) at org.hsqldb.lib.RCData.getConnection(Unknown Source) at org.hsqldb.cmdline.SqlTool.objectMain(Unknown Source) at org.hsqldb.cmdline.SqlTool.main(Unknown Source) Failed to get a connection to 'jdbc:postgresql://localhost:5450,localhost:5451/aaa' as user "tunakawa". Cause: FATAL: database "aaa" does not exist Location: File: postinit.c, Routine: InitPostgres, Line: 846 Server SQLState: 3D000 $ Regards Takayuki Tsunakawa -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Hash Functions
Hi, On 2017-05-14 21:22:58 -0400, Robert Haas wrote: > but wanting a CHECK constraint that applies to only one partition > seems pretty reasonable (e.g. CHECK that records for older years are > all in the 'inactive' state, or whatever). On a hash-partitioned table? > Now that's not to say that we shouldn't have a > reload-through-the-top-parent switch; actually, I think that's a good > idea. I just don't believe that it can ever be a complete substitute > for portable hash functions. I think almost everybody here agrees > that it isn't necessary to have hash functions that are 100% portable, > e.g. to VAX. But it would be nice IMHO if you could use an integer > column as the partitioning key and have that be portable between Intel > and POWER. I'm not saying it can't work for any datatype, I just think it'd be a very bad idea to make it work for any non-trivial ones. The likelihood of regularly breaking things, or preventing us from improving things, seems high. I'm not sure that a design where this works most of the time, for some datatypes, is a good one. Greetings, Andres Freund -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] [bug fix] PG10: libpq doesn't connect to alternative hosts when some errors occur
On Sun, May 14, 2017 at 9:19 PM, Michael Paquier wrote: > On Fri, May 12, 2017 at 10:44 PM, Tom Lane wrote: >> Michael Paquier writes: >>> On Fri, May 12, 2017 at 1:28 PM, Tsunakawa, Takayuki >>> wrote: Likewise, when the first host has already reached max_connections, libpq doesn't attempt the connection against later hosts. >> >>> It seems to me that the feature is behaving as wanted. Or in short >>> attempt to connect to the next host only if a connection cannot be >>> established. If there is a failure once the exchange with the server >>> has begun, just consider it as a hard failure. This is an important >>> property for authentication and SSL connection failures actually. >> >> I would not really expect that reconnection would retry after arbitrary >> failure cases. Should it retry for "wrong database name", for instance? >> It's not hard to imagine that leading to very confusing behavior. > > I guess not as well. That would be tricky for the user to have a > different behavior depending on the error returned by the server, > which is why the current code is doing things right IMO. Now, the > feature has been designed similarly to JDBC with its parametrization, > so it could be surprising for users to get a different failure > handling compared to that. Not saying that JDBC is doing it wrong, but > libpq does nothing wrong either. I concur. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Hash Functions
On Sun, May 14, 2017 at 6:29 PM, Andres Freund wrote: > On 2017-05-14 18:25:08 -0400, Tom Lane wrote: >> It may well be that we can get away with saying "we're not going >> to make it simple to move hash-partitioned tables with float >> partition keys between architectures with different float >> representations". But there's a whole lot of daylight between that >> and denying any support for float representations other than the >> currently-most-popular one. > > Note that I, IIRC in the mail you responded to, also argued that I don't > think it'd be a good idea to rely on hashfunctions being portable. The > amount of lock-in that'd create, especially for more complex datatypes, > seems wholly inadvisable. I still think that dumping tables in a way > they're reloaded via the top-partition (probably one copy statement for > each child partition), and prohibiting incoming fkeys to partitions, is > a better approach to all this. You'd have to prohibit a heck of a lot more than that in order for this to work 100% reliably. You'd have to prohibit CHECK constraints, triggers, rules, RLS policies, and UNIQUE indexes, at the least. You might be able to convince me that some of those things are useless when applied to partitions, but wanting a CHECK constraint that applies to only one partition seems pretty reasonable (e.g. CHECK that records for older years are all in the 'inactive' state, or whatever). I think getting this to work 100% reliably in all cases would require an impractically large hammer. Now that's not to say that we shouldn't have a reload-through-the-top-parent switch; actually, I think that's a good idea. I just don't believe that it can ever be a complete substitute for portable hash functions. I think almost everybody here agrees that it isn't necessary to have hash functions that are 100% portable, e.g. to VAX. But it would be nice IMHO if you could use an integer column as the partitioning key and have that be portable between Intel and POWER. 
-- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] [bug fix] PG10: libpq doesn't connect to alternative hosts when some errors occur
On Fri, May 12, 2017 at 10:44 PM, Tom Lane wrote: > Michael Paquier writes: >> On Fri, May 12, 2017 at 1:28 PM, Tsunakawa, Takayuki >> wrote: >>> Likewise, when the first host has already reached max_connections, libpq >>> doesn't attempt the connection against later hosts. > >> It seems to me that the feature is behaving as wanted. Or in short >> attempt to connect to the next host only if a connection cannot be >> established. If there is a failure once the exchange with the server >> has begun, just consider it as a hard failure. This is an important >> property for authentication and SSL connection failures actually. >> >> I would not really expect that reconnection would retry after arbitrary >> failure cases. Should it retry for "wrong database name", for instance? >> It's not hard to imagine that leading to very confusing behavior. I guess not as well. That would be tricky for the user to have a different behavior depending on the error returned by the server, which is why the current code is doing things right IMO. Now, the feature has been designed similarly to JDBC with its parametrization, so it could be surprising for users to get a different failure handling compared to that. Not saying that JDBC is doing it wrong, but libpq does nothing wrong either. -- Michael -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
[HACKERS] Re: [doc fix] PG10: wroing description on connect_timeout when multiple hosts are specified
From: pgsql-hackers-ow...@postgresql.org > [mailto:pgsql-hackers-ow...@postgresql.org] On Behalf Of Tsunakawa, > Takayuki > Instead, I think we should fix the program to match the documented behavior. > Otherwise, if the first database machine is down, libpq might wait for about > 2 hours (depending on the OS's TCP keepalive setting), during which it times > out after connect_timeout and does not attempt to connect to other hosts. > > I'll add this item in the PostgreSQL 10 Open Items. Please use the attached patch to fix the problem. I confirmed the success as follows: $ add "post_auth_delay = 10" in postgresql.conf on host1 $ start database servers on host1 and host2 $ psql -h host1,host2 -p 5432,5433 -d "dbname=postgres connect_timeout=3" (psql connected to host2 after 3 seconds, which I checked with \conninfo) Regards Takayuki Tsunakawa diff --git a/src/interfaces/libpq/fe-connect.c b/src/interfaces/libpq/fe-connect.c index eb5aaf7098..d02e5201fa 100644 --- a/src/interfaces/libpq/fe-connect.c +++ b/src/interfaces/libpq/fe-connect.c @@ -1720,6 +1720,8 @@ connectDBComplete(PGconn *conn) { PostgresPollingStatusType flag = PGRES_POLLING_WRITING; time_t finish_time = ((time_t) -1); + int ret = 0; + int timeout = 0; if (conn == NULL || conn->status == CONNECTION_BAD) return 0; @@ -1729,8 +1731,7 @@ connectDBComplete(PGconn *conn) */ if (conn->connect_timeout != NULL) { - int timeout = atoi(conn->connect_timeout); - + timeout = atoi(conn->connect_timeout); if (timeout > 0) { /* @@ -1761,7 +1762,8 @@ connectDBComplete(PGconn *conn) return 1; /* success! 
*/ case PGRES_POLLING_READING: - if (pqWaitTimed(1, 0, conn, finish_time)) + ret = pqWaitTimed(1, 0, conn, finish_time); + if (ret == -1) { conn->status = CONNECTION_BAD; return 0; @@ -1769,7 +1771,8 @@ connectDBComplete(PGconn *conn) break; case PGRES_POLLING_WRITING: - if (pqWaitTimed(0, 1, conn, finish_time)) + ret = pqWaitTimed(0, 1, conn, finish_time); + if (ret == -1) { conn->status = CONNECTION_BAD; return 0; @@ -1782,6 +1785,22 @@ connectDBComplete(PGconn *conn) return 0; } + if (ret == 1) /* connect_timeout elapsed */ + { + /* If there are no more hosts, return (the error message is already set) */ + if (++conn->whichhost >= conn->nconnhost) + { + conn->whichhost = 0; + conn->status = CONNECTION_BAD; + return 0; + } + /* Attempt connection to the next host, starting the connect_timeout timer */ + pqDropConnection(conn, true); + conn->addr_cur = conn->connhost[conn->whichhost].addrlist; + conn->status = CONNECTION_NEEDED; + finish_time = time(NULL) + timeout; + } + /* * Now try to advance the state machine. */ diff --git a/src/interfaces/libpq/fe-misc.c b/src/interfaces/libpq/fe-misc.c index 756c6d7779..1d6ea93a0a 100644 --- a/src/interfaces/libpq/fe-misc.c +++ b/src/interfaces/libpq/fe-misc.c @@ -991,11 +991,9 @@ pqWait(int forRead, int forWrite, PGconn *conn) /* * pqWaitTimed: wait, but not past finish_time. * - * If finish_time is exceeded then we return failure (EOF). This is like - * the response for a kernel exception because we don't want the caller - * to try to read/write in that case. - * * finish_time = ((time_t) -1) disables the wait limit. + * + * Returns -1 on failure, 0 if the socket is readable/writable, 1 if it timed out. 
*/ int pqWaitTimed(int forRead, int forWrite, PGconn *conn, time_t finish_time) @@ -1005,13 +1003,13 @@ pqWaitTimed(int forRead, int forWrite, PGconn *conn, time_t finish_time) result = pqSocketCheck(conn, forRead, forWrite, finish_time); if (result < 0) - return EOF; /* errorMessage is already set */ + return -1; /* errorMessage is already set */ if (result == 0) { printfPQExpBuffer(
Re: [HACKERS] Hash Functions
On Sun, May 14, 2017 at 3:30 PM, Tom Lane wrote: > I agree that the Far Eastern systems that can't easily be replaced > by Unicode are that way mostly because they're a mess. But I'm > still of the opinion that locking ourselves into Unicode is a choice > we might regret, far down the road. It's not a choice that has any obvious upside, so I have no reason to disagree. My point was only that Robert's contention that "You could argue that text-under-LATIN1 and text-under-UTF8 aren't really the same data type at all" seems wrong to me, because PostgreSQL seems to want to treat encoding as a property of the machine. This is evidenced by the fact that we expect applications to change client encoding "transparently". That is, client encoding may be changed without in any way affecting humans that speak a natural language that is provided for by the application's client encoding. That's a great ideal to have, and one that is very close to completely workable. -- Peter Geoghegan VMware vCenter Server https://www.vmware.com/ -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Hash Functions
On Mon, May 15, 2017 at 10:08 AM, Thomas Munro wrote: > [2] > https://www.ibm.com/support/knowledgecenter/en/SSLTBW_2.1.0/com.ibm.zos.v2r1.cbcux01/flotcop.htm#flotcop Though looking more closely I see that the default is IEEE in 64 bit builds, which seems like a good way to kill the older format off. If/when someone gets PostgreSQL ported to z/OS it probably won't be because they want to run it on an ancient 32 bit mainframe. -- Thomas Munro http://www.enterprisedb.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] PG10 pgindent run
Andres Freund writes: > On 2017-05-14 17:22:16 -0400, Stephen Frost wrote: >> Releasing alpha/beta is not the same as branching, which I didn't expect >> us to do for a while yet.. > Well, tagging then. Imo it still should be done before we tag > beta1/alpha1. Too late, IMO. Yeah, I'd have preferred to pgindent before beta1 too, just in case pgindent breaks something ... but it's been a long time since that happened, so realistically, not having done it yet is not going to meaningfully degrade the relevance of beta testing. At this point, for Bruce to do it before the wrap, he'd pretty much have to do it tonight, which we already agreed among the release team is not enough notice. Also, I liked the approach Robert took last year where he did a dry run and then manually fixed places that he saw would get messed up by pgindent. That's a lot more pleasant than undoing such damage afterwards. I don't know if Robert plans to do that this year, but if not, I'm planning to get to it tomorrow or Tuesday. I can't do it right now because I'm still on higher-priority tasks. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Hash Functions
Peter Geoghegan writes: > The express goal of the Unicode consortium is to replace all existing > encodings with Unicode. My personal opinion is that a Unicode > monoculture would be a good thing, provided reasonable differences can > be accommodated. Can't help remembering Randall Munroe's take on such things: https://xkcd.com/927/ I agree that the Far Eastern systems that can't easily be replaced by Unicode are that way mostly because they're a mess. But I'm still of the opinion that locking ourselves into Unicode is a choice we might regret, far down the road. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Hash Functions
On 2017-05-14 18:25:08 -0400, Tom Lane wrote: > It may well be that we can get away with saying "we're not going > to make it simple to move hash-partitioned tables with float > partition keys between architectures with different float > representations". But there's a whole lot of daylight between that > and denying any support for float representations other than the > currently-most-popular one. Note that I, IIRC in the mail you responded to, also argued that I don't think it'd be a good idea to rely on hashfunctions being portable. The amount of lock-in that'd create, especially for more complex datatypes, seems wholly inadvisable. I still think that dumping tables in a way they're reloaded via the top-partition (probably one copy statement for each child partition), and prohibiting incoming fkeys to partitions, is a better approach to all this. Greetings, Andres Freund -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Hash Functions
Andres Freund writes: > On 2017-05-14 15:59:09 -0400, Greg Stark wrote: >> Personally while I would like to avoid code that actively crashes or >> fails basic tests on Vax > I personally vote for simply refusing to run/compile on non-IEEE > platforms, including VAX. The point of wanting that is not because anybody thinks that VAX per se is an interesting platform anymore. The point is to avoid locking ourselves into narrow assumptions that we may need to break someday. Saying "nobody cares about floats other than IEEE floats" is morally indistinguishable from saying "nobody cares about running on any platform except Windows", which was a depressingly common opinion back in the nineties or so. It may well be that we can get away with saying "we're not going to make it simple to move hash-partitioned tables with float partition keys between architectures with different float representations". But there's a whole lot of daylight between that and denying any support for float representations other than the currently-most-popular one. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Hash Functions
On Mon, May 15, 2017 at 7:59 AM, Greg Stark wrote: > On 13 May 2017 at 10:29, Robert Haas wrote: >> - Floats. There may be different representations in use on different >> hardware, which could be a problem. Tom didn't answer my question >> about whether any even-vaguely-modern hardware is still using non-IEEE >> floats, which I suspect means that the answer is "no". > > Fwiw the answer to that is certainly no. The only caveat is that some > platforms have not entirely complete implementations of IEEE missing > corner cases such as denormalized values but I don't think that would > be something that would be changed with a different hash function > though. Well... along with the Intel/IEEE-754 and VAX formats, there is a third floating point format that is/was in widespread use: IBM hex float[1]. It's base 16 rather than base 2, and might be the default on some IBM operating systems[2] for the C float and double types (but fortunately not xlc for AIX or Linux, and I have no clue about i/OS). This is probably irrelevant because it looks like people aren't running PostgreSQL on z/OS right now[3], but unlike VAXen these systems are alive and well so I just wanted to respond to the implication on this thread that the whole world is a VAX or an IEEE system :-) People really use this... I happen to know a shop that has petabytes of IBM hex float data. (IBM hardware also supports base 10 floats, but they show up as different types in C so not relevant.) [1] https://en.wikipedia.org/wiki/IBM_Floating_Point_Architecture [2] https://www.ibm.com/support/knowledgecenter/en/SSLTBW_2.1.0/com.ibm.zos.v2r1.cbcux01/flotcop.htm#flotcop [3] https://www.postgresql.org/message-id/flat/BLU437-SMTP4B3FF36035D8A3C3816D49C160%40phx.gbl -- Thomas Munro http://www.enterprisedb.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] PG10 pgindent run
* Andres Freund (and...@anarazel.de) wrote: > On 2017-05-14 17:22:16 -0400, Stephen Frost wrote: > > * Andres Freund (and...@anarazel.de) wrote: > > > On 2017-05-14 09:53:17 -0400, Bruce Momjian wrote: > > > > I would like to run pgindent on the head source tree this Wednesday > > > > afternoon, UTC. Is that OK for everyone? > > > > > > Shouldn't we do that before we branch? And if Thursday still is the > > > intended alpha/beta release date, Wednesday would be too late, no? > > > > Releasing alpha/beta is not the same as branching, which I didn't expect > > us to do for a while yet.. > > Well, tagging then. Imo it still should be done before we tag > beta1/alpha1. For my 2c, at least, I'm not sure I see why that matters all that much..? I'm not against doing the pgindent run today, to be clear, but I don't follow what the harm is with waiting until Wednesday. Thanks! Stephen signature.asc Description: Digital signature
Re: [HACKERS] PG10 pgindent run
On 2017-05-14 17:22:16 -0400, Stephen Frost wrote: > * Andres Freund (and...@anarazel.de) wrote: > > On 2017-05-14 09:53:17 -0400, Bruce Momjian wrote: > > > I would like to run pgindent on the head source tree this Wednesday > > > afternoon, UTC. Is that OK for everyone? > > > > Shouldn't we do that before we branch? And if Thursday still is the > > intended alpha/beta release date, Wednesday would be too late, no? > > Releasing alpha/beta is not the same as branching, which I didn't expect > us to do for a while yet.. Well, tagging then. Imo it still should be done before we tag beta1/alpha1. Greetings, Andres Freund -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] PG10 pgindent run
* Andres Freund (and...@anarazel.de) wrote: > On 2017-05-14 09:53:17 -0400, Bruce Momjian wrote: > > I would like to run pgindent on the head source tree this Wednesday > > afternoon, UTC. Is that OK for everyone? > > Shouldn't we do that before we branch? And if Thursday still is the > intended alpha/beta release date, Wednesday would be too late, no? Releasing alpha/beta is not the same as branching, which I didn't expect us to do for a while yet.. Thanks! Stephen signature.asc Description: Digital signature
Re: [HACKERS] postgres 9.6.2 update breakage
Jan Nieuwenhuizen writes: > Roel Janssen writes: > >> So, it would be something like: >> postgres pg_upgrade \ >> ... > > It's great to have a recipe `that works', so thanks! > > However, whether or not we automate this, I cannot help to wonder if > we should support downgrading -- at least to the previous version > in this case? > > If I'm not mistaken, everything else in GuixSD will run if I select a > previous system generation in Grub...except for this? > > Is involving postgres developers an option, I'm sure a least one of > the postgresql hackers[cc] are already looking at Guix[SD]? > > Greetings, > janneke There's a big difference in upgrading and downgrading between guix revisions and doing so in highly stateful databases, unfortunately. I can't speak for postgres specifically, but here's my experience with migrations as the tech lead of MediaGoblin: - upgrades should be taken with extreme caution, and you should back up first. - downgrades should be taken with ten times the amount of caution of upgrades, a vat of coffee to work through the problems, and a barrel of whiskey for when it doesn't. I say that as someone who's mostly given up coffee and doesn't drink alcohol. State changes are bad enough when unidirectional. Django, for instance, provides an API that does both upgrades and downgrades. Almost everybody spends a bunch of time carefully crafting their upgrades, and just leaves their downgrades as the stubs that come with it. These are stubs that drop columns entirely, possibly columns that data was moved to in the migration. Reverse course, and suddenly you don't have a lot of data you used to. What we really want to do is provide the option to snapshot things *before* you do an upgrade, IMO... -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Hash Functions
On Sat, May 13, 2017 at 9:11 PM, Robert Haas wrote: > The latter is > generally false already. Maybe LATIN1 -> UTF8 is no-fail, but what > about UTF8 -> LATIN1 or SJIS -> anything? Based on previous mailing > list discussions, I'm under the impression that it is sometimes > debatable how a character in one encoding should be converted to some > other encoding, either because it's not clear whether there is a > mapping at all or it's unclear what mapping should be used. The express goal of the Unicode consortium is to replace all existing encodings with Unicode. My personal opinion is that a Unicode monoculture would be a good thing, provided reasonable differences can be accommodated. So, it might be that there is ambiguity about how one codepoint can be converted to another in another encoding, but that's because encodings like SJIS and BIG5 are needlessly ambiguous. It has nothing to do with cultural preferences leaving the question undecidable (at least by a panel of natural language experts), and everything to do with these regional character encoding systems being objectively bad. They richly deserve to die, and are in fact dying. Encoding actually *is* a property of the machine, even though regional encodings obfuscate things. There is a reason why MacOS and Java use UTF-16 rather than UTF-8, and there is a reason why the defacto standard on the web is UTF-8, and those reasons are completely technical. AFAICT, whatever non-technical reasons remain are actually technical debt in disguise. Where this leaves hash partitioning, I cannot say. -- Peter Geoghegan VMware vCenter Server https://www.vmware.com/ -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Hash Functions
On 2017-05-14 15:59:09 -0400, Greg Stark wrote: > Personally while I would like to avoid code that actively crashes or > fails basic tests on Vax I personally vote for simply refusing to run/compile on non-IEEE platforms, including VAX. The benefit of even trying to get that right, not to speak of actually testing, seems just not there. > even I don't think we need to worry about > replication or federated queries in a heterogeneous environment where > some servers are Vaxen and some are modern hardware. That seems a bit > far-fetched. Imo there's little point in trying to delineate some subset that we want to support on platforms that are 20 years out of date. Either we do, or don't. And since there's no point aside of some intellectual curiosity... On the other hand, I don't believe my opinion that requiring hashing to be portable is unrealistic, is meaningfully affected by disallowing non-IEEE floats. Greetings, Andres Freund -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] issue: record or row variable cannot be part of multiple-item INTO list
2017-05-14 5:04 GMT+02:00 Pavel Stehule : > > > 2017-05-13 22:20 GMT+02:00 Tom Lane : > >> Pavel Stehule writes: >> > I am working on migrating a large Oracle application to Postgres. When I >> > started migrating procedures with OUT parameters I found the following limit >> >> > "record or row variable cannot be part of multiple-item INTO list" >> >> IIRC, the reason for disallowing that is that it's totally unclear what >> the semantics ought to be. Is that variable a single target (demanding >> a compatible composite-valued column from the source query), or does it >> eat one source column per field within the record/row? The former is 100% >> inconsistent with what happens if the record/row is the only INTO target; >> while the latter would be very bug-prone, and it's especially unclear what >> ought to happen if it's an as-yet-undefined record variable. > > > I don't think so. The semantics should be the same as now. > > Now, the output (s1,s2,s3) can be assigned to > > 1. scalar variables - implemented with an aux row variable (s1,s2,s3) -> > r(ts1,ts2,ts3) > 2. record - (s1, s2, s3) -> rec(s1,s2,s3) > 3. row - (s1,s2,s3) -> r(s1,s2,s3) > > If we allow composite values there, then the situation is the same > > 1. (s1, c2, s3, c4) -> r(ts1, tc2, ts3, tc4) > 2. (s1, c2, s3, c4) -> rec(s1, c2, s3, c4) > 3. (s1, c2, s3, c4) -> row(s1, c2, s3, c4) > > So there is no inconsistency if we use the rule > > 1. if there is one target, use it > 2. if there are more targets, create an aux row variable > > The same technique is used for function output - build_row_from_vars - and > there is no problem. > > If you try to assign a composite to a scalar or a scalar to a composite, then the > assignment should fail. But when the statement is correct, these invalid > assignments should not be there. > > >> >> Yeah, we could invent some semantics or other, but I think it would >> mostly be a foot-gun for unwary programmers. 
>> We do allow you to write out the columns individually for such cases:
>>
>> SELECT ... INTO v1, rowvar.c1, rowvar.c2, rowvar.c3, v2 ...
>
> It doesn't help performance or readability (and maintainability) for
> the following cases.
>
> There is often the pattern
>
> PROCEDURE p(..., OUT r widetab%ROWTYPE, OUT errordesc COMPOSITE)
>
> Now there is a workaround
>
> SELECT * FROM p() INTO auxrec;
> r := auxrec.widetab;
> errordesc := auxrec.errordesc;
>
> But it creates N (number of OUT variables) assignment commands over
> records.
>
> If this workaround is safe, then an implementation based on an aux row
> variable should be safe too, because this workaround is a manual
> implementation of it.
>
>> and I think it's better to encourage people to stick to that.
>
> I don't think using tens of OUT variables is nice, but the current
> behaviour is too restrictive. Moreover, I didn't find a case where the
> current implementation should not work (allowing records needs some work).

here is a patch. all regress tests passed

Regards

Pavel

>> regards, tom lane

plpgsql-into-multitarget.sql
Description: application/sql
Re: [HACKERS] Hash Functions
On 13 May 2017 at 10:29, Robert Haas wrote:
> - Floats.  There may be different representations in use on different
> hardware, which could be a problem.  Tom didn't answer my question
> about whether any even-vaguely-modern hardware is still using non-IEEE
> floats, which I suspect means that the answer is "no".

Fwiw, the answer to that is certainly no. The only caveat is that some
platforms have not entirely complete implementations of IEEE, missing
corner cases such as denormalized values, but I don't think that is
something that would change with a different hash function.

Personally, while I would like to avoid code that actively crashes or
fails basic tests on a Vax, even I don't think we need to worry about
replication or federated queries in a heterogeneous environment where
some servers are Vaxen and some are modern hardware. That seems a bit
far-fetched.

-- 
greg
Re: [HACKERS] PG10 pgindent run
On 2017-05-14 09:53:17 -0400, Bruce Momjian wrote:
> I would like to run pgindent on the head source tree this Wednesday
> afternoon, UTC.  Is that OK for everyone?

Shouldn't we do that before we branch? And if Thursday still is the
intended alpha/beta release date, Wednesday would be too late, no?

Greetings,

Andres Freund
[HACKERS] Receive buffer size for the statistics socket
So I put in the patch I'd proposed to reduce sleep delays in the stats
regression test, and I see that frogmouth has now failed that test twice,
with symptoms suggesting that it's dropping the last stats report --- but
not all of the stats reports --- from the test's first session. I
considered reverting the test patch, but on closer inspection that seems
like it would be shooting the messenger, because this is indicating a real
and reproducible loss of stats data.

I put in some debug elog's to see how much data is getting shoved at the
stats collector during this test, and it seems that it can be as much as
about 12K, between what the first session can send at exit and what the
second session will send immediately after startup. (I think this value
should be pretty platform-independent, but be it noted that I'm measuring
on a 64-bit Linux system while frogmouth is 32-bit Windows.)

Now, what's significant about that is that frogmouth is a pretty old
Windows version, and what I read on the net is that Windows versions
before 2012 have only 8KB socket receive buffer size by default. So this
behavior is plausibly explained by the theory that the stats collector's
receive buffer is overflowing, causing loss of stats messages. This could
well explain the stats-test failures we've seen in the past too, which if
memory serves were mostly on Windows.

Also, it's clear that a session could easily shove much more than 8KB at a
time out to the stats collector, because what we're doing in the stats
test does not involve touching any very large number of tables. So I think
this is not just a test failure but is telling us about a plausible
mechanism for real-world statistics drops.

I observe a default receive buffer size around 124K on my Linux box, which
seems like it'd be far less prone to overflow than 8K. I propose that it'd
be a good idea to try to set the stats socket's receive buffer size to be
a minimum of, say, 100K on all platforms.
Code for this would be analogous to what we already have in pqcomm.c
(circa line 760) for forcing up the send buffer size, but SO_RCVBUF not
SO_SNDBUF.

A further idea is that maybe backends should be tweaked to avoid blasting
large amounts of data at the stats collector in one go. That would require
more thought to design, though.

Thoughts?

regards, tom lane
[HACKERS] Small improvement to compactify_tuples
Good day, everyone.

I've been playing a bit with unlogged tables - just random updates on a
simple key-value table. I've noticed the amount of CPU spent in
compactify_tuples (called by PageRepareFragmentation). Most of the time
was spent in the qsort of the itemidbase items.

The itemidbase array is bounded by the number of tuples in a page, and
the itemIdSortData structure is simple, so a specialized version could be
a better choice.

The attached patch adds a combination of one pass of prefix sort with
insertion sort for larger arrays, and shell sort for smaller arrays.
Insertion sort and shell sort are implemented as macros and could be
reused.

I've tested with the following table:

create unlogged table test3 (
    id integer PRIMARY KEY with (fillfactor=85),
    val text
) WITH (fillfactor=85);
insert into test3 select i, '!'||i from generate_series(1, 1000) as i;

With pgbench script:

\set id1 RANDOM(1, :scale)
\set id2 RANDOM(1, :scale)
select * from test3 where id = :id1;
update test3 set val = '!'|| :id2 where id = :id1;

And command:

pgbench -M prepared -c 3 -s 1000 -T 1000 -P 3 -n -f test3.sql testdb

Using 1GB shared_buffers and synchronous_commit=off.
On my notebook the improvement is:

before patch:

progress: 63.0 s, 15880.1 tps, lat 0.189 ms stddev 0.127
progress: 66.0 s, 15975.8 tps, lat 0.188 ms stddev 0.122
progress: 69.0 s, 15904.1 tps, lat 0.189 ms stddev 0.152
progress: 72.0 s, 15000.9 tps, lat 0.200 ms stddev 0.213
progress: 75.0 s, 15101.7 tps, lat 0.199 ms stddev 0.192
progress: 78.0 s, 15854.2 tps, lat 0.189 ms stddev 0.158
progress: 81.0 s, 15803.3 tps, lat 0.190 ms stddev 0.158
progress: 84.0 s, 15242.9 tps, lat 0.197 ms stddev 0.203
progress: 87.0 s, 15184.1 tps, lat 0.198 ms stddev 0.215

after patch:

progress: 63.0 s, 17108.5 tps, lat 0.175 ms stddev 0.140
progress: 66.0 s, 17271.9 tps, lat 0.174 ms stddev 0.155
progress: 69.0 s, 17243.5 tps, lat 0.174 ms stddev 0.143
progress: 72.0 s, 16675.3 tps, lat 0.180 ms stddev 0.206
progress: 75.0 s, 17187.4 tps, lat 0.175 ms stddev 0.157
progress: 78.0 s, 17293.0 tps, lat 0.173 ms stddev 0.159
progress: 81.0 s, 16289.8 tps, lat 0.184 ms stddev 0.180
progress: 84.0 s, 16131.2 tps, lat 0.186 ms stddev 0.170
progress: 87.0 s, 16741.1 tps, lat 0.179 ms stddev 0.165

I understand that it is quite a degenerate test case. But probably this
improvement still makes sense.

With regards,

-- 
Sokolov Yura aka funny.falcon
Postgres Professional: https://postgrespro.ru
The Russian Postgres Company

From f8ed235dbb60a79b4f59dc8a4af014b2ca698772 Mon Sep 17 00:00:00 2001
From: Sokolov Yura aka funny_falcon
Date: Sun, 14 May 2017 20:57:00 +0300
Subject: [PATCH] storage/page/bufpage.c: improve compactify_tuples

Items passed to compactify_tuples are almost sorted. So a call to the
general qsort adds unnecessary overhead from function calls (swapfunc,
med, itemoffcompare).

This patch adds one pass of prefix sort + insertion sort for large
arrays of items, and shell sort for smaller arrays. Insertion and shell
sort are implemented using macros.

This approach allows saving 3% of CPU in a degenerate case (highly
intensive HOT random updates on an unlogged table with
synchronous_commit=off).
---
 src/backend/storage/page/bufpage.c | 42 +-
 src/include/c.h                    | 59 ++
 2 files changed, 98 insertions(+), 3 deletions(-)

diff --git a/src/backend/storage/page/bufpage.c b/src/backend/storage/page/bufpage.c
index fdf045a..b7c6392 100644
--- a/src/backend/storage/page/bufpage.c
+++ b/src/backend/storage/page/bufpage.c
@@ -433,6 +433,36 @@ itemoffcompare(const void *itemidp1, const void *itemidp2)
 		((itemIdSort) itemidp1)->itemoff;
 }
 
+#define NSPLIT 256
+#define PREFDIV (BLCKSZ / NSPLIT)
+/* one pass of prefix sort for high 8 bits of itemoff.*/
+static void
+prefix_presort(itemIdSort itemidbase, int nitems)
+{
+	itemIdSortData copy[MaxIndexTuplesPerPage];
+	int		count[NSPLIT + 1] = { 0 };
+	int		i, pos, highbits;
+
+	Assert(nitems <= MaxIndexTuplesPerPage);
+	for (i = 0; i < nitems; i++)
+	{
+		highbits = itemidbase[i].itemoff / PREFDIV;
+		count[highbits]++;
+	}
+	/* sort in decreasing order */
+	for (i = NSPLIT - 1; i != 0; i--)
+		count[i - 1] += count[i];
+	Assert(count[0] == nitems);
+	for (i = 0; i < nitems; i++)
+	{
+		highbits = itemidbase[i].itemoff / PREFDIV;
+		pos = count[highbits + 1]++;
+		copy[pos] = itemidbase[i];
+	}
+	Assert(count[1] == nitems);
+	memcpy(itemidbase, copy, sizeof(itemIdSortData) * nitems);
+}
+
 /*
  * After removing or marking some line pointers unused, move the tuples to
  * remove the gaps caused by the removed items.
@@ -444,9 +474,15 @@ compactify_tuples(itemIdSort itemidbase, int nitems, Page page)
 	Offset		upper;
 	int			i;
 
-	/* sort itemIdSortData array into decreasing itemoff order */
-	qsort((char *) itemidbase, nitems, sizeof(itemIdSortData),
-		  itemoffcompare);
+	if (ni
Re: [HACKERS] Bug in ExecModifyTable function and trigger issues for foreign tables
On Sun, May 14, 2017 at 5:35 PM, Ildus Kurbangaliev wrote:
> TRAP: FailedAssertion("!(((const void*)(fdw_trigtuple) != ((void *)0))
> ^ ((bool) (((const void*)(tupleid) != ((void *)0)) &&
> ((tupleid)->ip_posid != 0", File: "trigger.c", Line: 2428)
>
> I'm not sure how it should be fixed, because as I see `oldtuple` will
> be set only for AFTER ROW triggers by `wholerow` junk attribute.

There is a comment on the ExecUpdate function:

 * When updating a foreign table,
 * tupleid is invalid; the FDW has to figure out which row to
 * update using data from the planSlot.  oldtuple is passed to
 * foreign table triggers; it is NULL when the foreign table has
 * no relevant triggers.

After your fix, tupleid is now invalid, which is expected, but it seems
we need to do something more. As per the comments, oldtuple is expected
to come from the planSlot. But I don't see any code handling that part.

-- 
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com
Re: [HACKERS] postgres 9.6.2 update breakage
Roel Janssen writes:
> So, it would be something like:
> postgres pg_upgrade \
> ...

It's great to have a recipe `that works', so thanks! However, whether or
not we automate this, I cannot help but wonder if we should support
downgrading -- at least to the previous version in this case? If I'm not
mistaken, everything else in GuixSD will run if I select a previous
system generation in Grub... except for this?

Is involving postgres developers an option? I'm sure at least one of the
postgresql hackers [cc] is already looking at Guix[SD].

Greetings,
janneke

-- 
Jan Nieuwenhuizen | GNU LilyPond http://lilypond.org
Freelance IT http://JoyofSource.com | Avatar® http://AvatarAcademy.nl
[HACKERS] PG10 pgindent run
I would like to run pgindent on the head source tree this Wednesday
afternoon, UTC.  Is that OK for everyone?

-- 
Bruce Momjian  http://momjian.us
EnterpriseDB   http://enterprisedb.com

+ As you are, so once was I.  As I am, so you will be. +
+                 Ancient Roman grave inscription      +
Re: [HACKERS] snapbuild woes
On 13/05/17 22:23, Andres Freund wrote:
> On 2017-05-12 10:57:55 +0200, Petr Jelinek wrote:
>> Hmm, well it helps but actually now that we don't track individual
>> running transactions anymore it got much less effective (my version of
>> 0005 does as well).
>>
>> The example workload I test with is:
>> session 1: open transaction, do a write, keep it open
>> session 2: pgbench -M simple -N -c 10 -P 1 -T 5
>> session 3: run CREATE_REPLICATION_SLOT LOGICAL in walsender
>> session 2: pgbench -M simple -N -c 10 -P 1 -T 20
>> session 1: commit
>>
>> And wait for session 3 to finish slot creation; it takes about 20 mins
>> on my laptop without patches, a minute and a half with your patches for
>> 0004 and 0005 (or with your 0004 and my 0005), and about 2s with my
>> original 0004 and 0005.
>
> Is that with assertions enabled or not? With assertions, all the time
> post patches seems to be spent in some Asserts in reorderbuffer.c;
> without, it takes less than a second for me here.

Right you are, I always forget to switch that off before benchmarks.

> I'm applying these now.

Cool. Just for posterity, this also fixes issue number 3, as the switch
to consistent state is done purely based on xl_running_xacts and not on
decoded commits/aborts. So all done here, thanks!

-- 
Petr Jelinek  http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
[HACKERS] Bug in ExecModifyTable function and trigger issues for foreign tables
Hello hackers,

I was experimenting with fdw tables recently, and discovered two bugs in
the postgres core code (tested on stable 9.6 and master).

Steps to reproduce:
1) create parent table
2) create child local table
3) create child foreign table
4) create a `before row update` trigger on the foreign table
5) run an update query on the parent table

I attached an sql file with these steps. At the end postgres will show an
error like:

ERROR: could not open file "base/12410/33037": No such file or directory

33037 is the relid of the foreign table. The bug is related to the fact
that postgres will try to use the latest scanned tupleid from the local
table to get an old tuple for the trigger of the foreign table. It should
be fixed with a patch like:

--- a/src/backend/executor/nodeModifyTable.c
+++ b/src/backend/executor/nodeModifyTable.c
@@ -1324,7 +1324,6 @@ ExecModifyTable(ModifyTableState *node)
 	JunkFilter *junkfilter;
 	TupleTableSlot *slot;
 	TupleTableSlot *planSlot;
-	ItemPointer tupleid = NULL;
 	ItemPointerData tuple_ctid;
 	HeapTupleData oldtupdata;
 	HeapTuple	oldtuple;
@@ -1381,6 +1380,8 @@ ExecModifyTable(ModifyTableState *node)
 	 */
 	for (;;)
 	{
+		ItemPointer tupleid = NULL;
+

After this patch the second bug will appear:

TRAP: FailedAssertion("!(((const void*)(fdw_trigtuple) != ((void *)0))
^ ((bool) (((const void*)(tupleid) != ((void *)0)) &&
((tupleid)->ip_posid != 0", File: "trigger.c", Line: 2428)

I'm not sure how it should be fixed, because as I see it, `oldtuple` will
be set only for AFTER ROW triggers by the `wholerow` junk attribute.

Regards,
Ildus Kurbangaliev

test.sql
Description: application/sql
Re: [HACKERS] [POC] hash partitioning
On Sat, May 13, 2017 at 12:11 PM, Dilip Kumar wrote:
> On Fri, May 12, 2017 at 6:08 PM, amul sul wrote:
>> Hi,
>>
>> Please find the following updated patches attached:
>
> I have done some testing with the new patch; most of the cases worked
> as per the expectation, except below.
>
> I expect the planner to select only "Seq Scan on t1" whereas it's
> scanning both the partitions?
>
> create table t (a int, b varchar) partition by hash(a);
> create table t1 partition of t for values with (modulus 8, remainder 0);
> create table t2 partition of t for values with (modulus 8, remainder 1);
>
> postgres=# explain select * from t where a=8;
>                          QUERY PLAN
> ----------------------------------------------------------
>  Append  (cost=0.00..51.75 rows=12 width=36)
>    ->  Seq Scan on t1  (cost=0.00..25.88 rows=6 width=36)
>          Filter: (a = 8)
>    ->  Seq Scan on t2  (cost=0.00..25.88 rows=6 width=36)
>          Filter: (a = 8)
> (5 rows)

You are correct. As of now, constraint exclusion doesn't work on a
partition constraint that involves a function call [1], and the hash
partition constraint does have a satisfies_hash_partition() function
call.

> Some cosmetic comments.
> ---
> + RangeVar *rv = makeRangeVarFromNameList(castNode(List, nameEl->arg));
> +
>
> Useless hunk.
>
> /*
> - * Build a CREATE SEQUENCE command to create the sequence object, and
> - * add it to the list of things to be done before this CREATE/ALTER
> - * TABLE.
> + * Build a CREATE SEQUENCE command to create the sequence object, and add
> + * it to the list of things to be done before this CREATE/ALTER TABLE.
> */
>
> Seems like, in src/backend/parser/parse_utilcmd.c, you have changed
> the existing code with pgindent. I think it's not a good idea to mix
> pgindent changes with your patch.

Oops, my silly mistake, sorry about that. Fixed in the attached version.
Regards,
Amul

[1] https://www.postgresql.org/message-id/CA%2BTgmoaE9NZ_RiqZQLp2aJXPO4E78QxkQYL-FR2zCDop96Ahdg%40mail.gmail.com

0001-Cleanup_v2.patch
Description: Binary data

0002-hash-partitioning_another_design-v4.patch
Description: Binary data
Re: [HACKERS] [POC] hash partitioning
On Fri, May 12, 2017 at 10:39 PM, Ashutosh Bapat wrote:
> On Fri, May 12, 2017 at 6:08 PM, amul sul wrote:
>> Hi,
>>
>> Please find the following updated patches attached:
>>
>> 0001-Cleanup.patch : Does some cleanup and code refactoring required
>> for hash partition patch. Otherwise, there will be unnecessary diff in
>> 0002 patch
>
> Thanks for splitting the patch.
>
> +    if (isnull[0])
> +        cur_index = partdesc->boundinfo->null_index;
> This code assumes that null_index will be set to -1 when has_null is
> false. Per RelationBuildPartitionDesc() this is true. But probably we
> should write this code as
>
> if (isnull[0])
> {
>     if (partdesc->boundinfo->has_null)
>         cur_index = partdesc->boundinfo->null_index;
> }
>
> That way we are certain that when has_null is false, cur_index = -1
> similar to the original code.

Okay, will add this. I still don't understand the point of having the
has_null variable: if no null-accepting partition exists then null_index
is always set to -1 in RelationBuildPartitionDesc. Anyway, let's not
change the original code.

> The additional argument to ComputePartitionAttrs() isn't used anywhere
> in this patch, so maybe this is better made part of 0002. If we do this
> the only change that will remain in the patch is the refactoring of
> RelationBuildPartitionDesc(), so we may consider merging it into 0002,
> unless we find that some more refactoring is needed. But for now, having
> it as a separate patch helps.

Okay.

> Here's some more comments on 0002
>
> + * In the case of hash partitioning, datums is a 2-D array, stores
> + * modulus and remainder values at datums[x][0] and datums[x][1]
> + * respectively for each partition in the ascending order.
>
> This comment about datums should appear in a paragraph of itself and may
> be rephrased as in the attached patch. Maybe we could also add a line
> about ndatums for hash partitioned tables as in the attached patch.

Thanks, looks good to me; will include this.

[...]
> +    if (key->strategy == PARTITION_STRATEGY_HASH)
> +    {
> +        ndatums = nparts;
> +        hbounds = (PartitionHashBound **) palloc(nparts *
> +                                sizeof(PartitionHashBound *));
> +        i = 0;
> +        foreach (cell, boundspecs)
> +        {
> +            PartitionBoundSpec *spec = lfirst(cell);
> +
> [ clipped ]
> +            hbounds[i]->index = i;
> +            i++;
> +        }
>
> For list and range partitioned tables we order the bounds so that two
> partitioned tables have them in the same order irrespective of the order
> in which they are specified by the user and hence stored in the
> catalogs. The partitions then get indexes according to the order in
> which their bounds appear in the ordered arrays of bounds. Thus any two
> partitioned tables with the same partition specification always have the
> same PartitionBoundInfoData. This helps partition-wise join to match
> partition bounds of two given tables. The above code assigns the indexes
> to the partitions as they appear in the catalogs. This means that two
> partitioned tables with the same partition specification but a different
> order of partition bound specifications will have different
> PartitionBoundInfoData representations.
>
> If we do that, probably partition_bounds_equal() would reduce to just
> matching the indexes and the last element of the datums array, i.e. the
> greatest modulus datum. If the ordered datums arrays of two partitioned
> tables do not match exactly, the mismatch can be because of missing
> datums or different datums. If it's a missing datum it will change the
> greatest modulus or have a corresponding entry in the indexes array as
> -1. If an entry differs it will cause mismatching indexes in the index
> arrays.

Makes sense, will fix this.

[...]
> +    if (offset < 0)
> +    {
> +        nmod = DatumGetInt32(datums[0][0]);
> +        valid_bound = (nmod % spec->modulus) == 0;
> +    }
> +    else
> +    {
> +        pmod = DatumGetInt32(datums[offset][0]);
> +        valid_bound = (spec->modulus % pmod) == 0;
> +
> +        if (valid_bound && (offset + 1) < ndatums)
> +        {
> +            nmod = DatumGetInt32(datums[offset + 1][0]);
> +            valid_bound = (nmod % spec->modulus) == 0;
> +        }
> +    }
>
> Maybe name the variables as prev_mod(ulus) and next_mod(ulus) for better
> readability.

Okay, will rename to prev_modulus and next_modulus resp.

> + * for p_p1: satisfies_hash_partition(2, 1, hash_fn(a), hash_fn(b))
> + * for p_p2: satisfies_hash_partition(4, 2, hash_fn(a), hash_fn(b))
> + * for p_p3: satisfies_hash_partition(8, 0, hash_fn(a), hash_fn(b))
> + * for p_p4: satisfies_hash_partition(8, 4, hash_fn(a), hash_fn(b))
>
> The description here may be read as if w