[HACKERS] NOT NULL constraints on range partition key columns
Starting a new thread to discuss the proposal I put forward in [1] to stop
creating an explicit NOT NULL constraint on range partition keys that are
simple columns.  I said the following:

On 2017/05/12 11:20, Robert Haas wrote:
> On Thu, May 11, 2017 at 10:15 PM, Amit Langote wrote:
>> On 2017/05/12 10:42, Robert Haas wrote:
>>> Actually, I think that not supporting nulls for range partitioning may
>>> have been a fairly bad decision.
>>
>> I think the relevant discussion concluded [1] that way, because we
>> couldn't decide which interface to provide for specifying where NULLs are
>> placed or because we decided to think about it later.
>
> Yeah, but I have a feeling that marking the columns NOT NULL is going
> to make it really hard to support that in the future when we get the
> syntax hammered out.  If it had only affected the partition
> constraints that'd be different.

So, adding keycol IS NOT NULL (like we currently do for expressions) to the
implicit partition constraint would be more future-proof than generating an
actual catalogued NOT NULL constraint on the keycol?  I now tend to think it
would be better.  Directly inserting into a range partition with a NULL value
for a column currently generates a "null value in column \"%s\" violates
not-null constraint" error, instead of the perhaps more relevant "new row for
relation \"%s\" violates partition constraint".

That said, we *do* document the fact that a NOT NULL constraint is added on
range key columns, but we might as well document instead that we don't
currently support routing tuples with NULL values in the partition key
through a range-partitioned table, and so NULL values cause an error.  Can we
still decide to do that instead?

Thanks,
Amit

[1] https://www.postgresql.org/message-id/f52fb5c3-b6ad-b6ff-4bb6-145fdb805fa6%40lab.ntt.co.jp

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Event triggers + table partitioning cause server crash in current master
On 2017/05/14 12:03, Mark Dilger wrote:
> Hackers,
>
> I discovered a reproducible crash using event triggers in the current
> development version, 29c7d5e483acaa74a0d06dd6c70b320bb315.
> I was getting a crash before this version, and cloned a fresh copy of
> the sources to be sure I was up to date, so I don't think the bug can be
> attributed to Andres' commit.  (The prior version I was testing against
> was heavily modified by me, so I recreated the bug using the latest
> standard, unmodified sources.)
>
> I create both before and after event triggers early in the regression test
> schedule, which then fire here and there during the following tests, leading
> fairly reproducibly to the server crashing somewhere during the test suite.
> These crashes do not happen for me without the event triggers being added
> to the tests.  Many tests show as 'FAILED' simply because the logging
> that happens in the event triggers creates unexpected output for the test.
> Those "failures" are expected.  The server crashes are not.
>
> The server logs suggest the crashes might be related to partitioned tables.
>
> Please find attached the patch that includes my changes to the sources
> for recreating this bug.  The logs and regression.diffs are a bit large;
> let me know if you need them.
>
> I built using the command
>
> ./configure --enable-cassert --enable-tap-tests && make -j4 && make check

Thanks for the report and for providing steps to reproduce.  It seems that
this is indeed a bug related to creating range-partitioned tables.
DefineRelation() calls AlterTableInternal() to add NOT NULL constraints on
the range partition key columns, but the code fails to first initialize the
event trigger context information.  Attached patch should fix that.
Thanks to the above test case, I also discovered that in the case of creating
a partition, manipulations performed by MergeAttributes() on the input schema
list may cause it to become invalid; that is, the List metadata (length) will
no longer match reality, because while the ListCells are deleted from the
input list, the List pointer passed to list_delete_cell does not point to the
same list.  This caused a crash when the CreateStmt in question was
subsequently passed to copyObject, which tried to access CreateStmt.tableElts
that had become invalid as just described.  The attached patch also takes
care of that.

Thanks,
Amit

>From 43f29ac13abda3c6ad71143d462c2dcd2c42f521 Mon Sep 17 00:00:00 2001
From: amit
Date: Mon, 15 May 2017 14:00:54 +0900
Subject: [PATCH] Fix to make partitioned tables work with event triggers

There were a couple of bugs here:

1. When creating range partitioned tables, AlterTableInternal is called,
   which invokes the event trigger code.  We must perform
   EventTriggerAlterTableStart() before calling AlterTableInternal so that
   the event trigger context information is initialized at all.

2. When creating a partition, CreateStmt.tableElts is likely to become
   invalid upon return from MergeAttributes().  We must replace the invalid
   value with the newly constructed correct list value computed by
   MergeAttributes().  For example, in the caller ProcessUtilitySlow(),
   EventTriggerCollectSimpleCommand() is called right after returning from
   DefineRelation(), and it expects the content of CreateStmt to be valid
   to perform copyObject on it.

First bug was reported by Mark Dilger.
Report: https://postgr.es/m/4A9B8269-9771-4FB7-BFA3-84249800917D%40gmail.com
---
 src/backend/commands/tablecmds.c            | 17 ++
 src/test/regress/expected/event_trigger.out | 32 ++
 src/test/regress/sql/event_trigger.sql      | 35 +
 3 files changed, 84 insertions(+)

diff --git a/src/backend/commands/tablecmds.c b/src/backend/commands/tablecmds.c
index b94db89b25..c3f489308a 100644
--- a/src/backend/commands/tablecmds.c
+++ b/src/backend/commands/tablecmds.c
@@ -620,6 +620,19 @@ DefineRelation(CreateStmt *stmt, char relkind, Oid ownerId,
 					   stmt->relation->relpersistence,
 					   stmt->partbound != NULL,
 					   &inheritOids, &old_constraints, &parentOidCount);

+	/*
+	 * The original value of stmt->tableElts may have been obsoleted by
+	 * MergeAttributes(), especially those for partitions, wherein the
+	 * cells of stmt->tableElts are deleted.  Note that in a partition's
+	 * case, stmt->tableElts contains dummy ColumnDef's created to capture
+	 * the partition's own column constraints which are merged into the actual
+	 * column definitions derived from the parent, before deleting the
+	 * aforementioned dummy ones.  We don't need to refer to stmt->tableElts
+	 * hereafter in this function, although the caller expects it to contain
+	 * valid elements upon return.  So replace the original list with the
+	 * one that's known to be valid.
+	 */
+	stmt->tableElts = schema;

 	/*
 	 * Create a tuple descriptor from the relation schema.  Note that this
@@ -853,7
Re: [HACKERS] Bug in ExecModifyTable function and trigger issues for foreign tables
On Mon, May 15, 2017 at 2:04 PM, Dilip Kumar wrote:
> On Sun, May 14, 2017 at 9:54 PM, Dilip Kumar wrote:
>> After your fix, now tupleid is invalid which is expected, but seems
>> like we need to do something more.  As per the comments, it seems like it
>> is expected to get the oldtuple from planSlot.  But I don't see any
>> code for handling that part.
>
> Maybe we should do something like attached patch.

Shouldn't this also try to improve the regression test coverage for FDW
triggers with inheritance trees?  Those tests are in postgres_fdw.  You may
find other issues on the way...
--
Michael
Re: [HACKERS] Duplicate usage of tablespace location?
On Thu, May 11, 2017 at 1:09 PM, Kyotaro HORIGUCHI wrote:
> If we can accept multiple server versions sharing a tablespace
> directory, pg_basebackup also can allow that situation.  The
> attached patch does that.  Similarly to the server code, it
> correctly fails if the same version subdirectory exists.

+#define verify_dir_is_empty_or_create(dirname, created, found) \
+	verify_and_create_dir(dirname, created, found, false)

This solution looks like a quick-and-dirty fix.  I tend to prefer a solution
close to what Pierre is proposing on the other thread, by localizing things
in ReceiveAndUnpackTarFile().  This makes the check more localized, and
there is no need to complicate the low-level APIs of pg_basebackup.c.

By the way, it may be better to keep discussions on the first thread created:
https://www.postgresql.org/message-id/05c62730-8670-4da6-b783-52e66fb42...@pinaraf.info
A patch has been submitted to the next CF there as well.
--
Michael
Re: [HACKERS] Bug with pg_basebackup and 'shared' tablespace
On Tue, May 2, 2017 at 2:28 AM, Pierre Ducroquet wrote:
> I didn't have much time to spend on that issue until today, and I found a
> way to solve it that seems acceptable to me.
>
> The biggest drawback will be that if the backup is interrupted, previous
> tablespaces already copied will stay on disk, but since that issue could
> already happen, I don't think it's too much of a drawback.

Yeah...  Perhaps cleanup_directories_atexit() could be made smarter by
registering the list of folders to clean up when they are created.  But that
would be for another patch.

> The patch simply delays the empty folder checking until we start reading the
> tablespace tarball.  The first entry of the tarball being the PG_MAJOR_CATVER
> folder, which cannot be found otherwise, there is no real alternative to
> that.

I think I like this idea.  The server allows tablespaces to be created in
parallel for different versions in the same path.  It seems that
pg_basebackup should allow the same for consistency.  The current behavior
has been present since pg_basebackup was introduced, by the way.  Perhaps
Magnus has some insight to offer on this.

> I will submit this patch in the current commit fest.

I have not spotted any flaws in the refactored logic.

 	bool		basetablespace;
+	bool		firstfile = 1;
 	char	   *copybuf = NULL;

Booleans are assigned with true or false.
--
Michael
Re: [HACKERS] Bug in ExecModifyTable function and trigger issues for foreign tables
On Sun, May 14, 2017 at 9:54 PM, Dilip Kumar wrote:
> After your fix, now tupleid is invalid which is expected, but seems
> like we need to do something more.  As per the comments, it seems like it
> is expected to get the oldtuple from planSlot.  But I don't see any
> code for handling that part.

Maybe we should do something like the attached patch.

--
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com

fix_fdw_trigger.patch
Description: Binary data
[HACKERS] Typos in pg_basebackup.c
Hi all,

Found $subject while working on the area.  A patch is attached.

Thanks,
--
Michael

basebackup-typo.patch
Description: Binary data
Re: [HACKERS] bumping HASH_VERSION to 3
On Wed, May 10, 2017 at 11:25:22AM +0530, Amit Kapila wrote:
> On Tue, Apr 11, 2017 at 11:53 PM, Bruce Momjian wrote:
> > On Fri, Mar 31, 2017 at 02:48:05PM -0400, Tom Lane wrote:
> >> +1, as long as we're clear on what will happen when pg_upgrade'ing
> >> an installation containing hash indexes.  I think a minimum requirement is
> >> that it succeed and be able to start up, and allow the user to manually
> >> REINDEX such indexes afterwards.  Bonus points for:
> >>
> >> 1. teaching pg_upgrade to create a script containing the required REINDEX
> >> commands.  (I think it's produced scripts for similar requirements in the
> >> past.)
> >>
> >> 2. marking the index invalid so that the system would silently ignore it
> >> until it's been reindexed.  I think there might be adequate infrastructure
> >> for that already thanks to REINDEX CONCURRENTLY, and it'd just be a matter
> >> of getting pg_upgrade to hack the indexes' catalog state.  (If not, it's
> >> probably not worth the trouble.)
> >
> > We already have code to do all of that, but it was removed from
> > pg_upgrade in 9.5.  You can still see it in 9.4:
> >
> >     contrib/pg_upgrade/version_old_8_3.c::old_8_3_invalidate_hash_gin_indexes()
>
> Thanks for the pointer.
>
> > I would be happy to restore that code and make it work for PG 10.
>
> Attached patch implements the two points suggested by Tom.  I will add
> this to the PG 10 open issues list so that we don't forget about this
> work; let me know if you think otherwise.

[Action required within three days.  This is a generic notification.]

The above-described topic is currently a PostgreSQL 10 open item.  Robert,
since you committed the patch believed to have created it, you own this open
item.  If some other commit is more relevant or if this does not belong as a
v10 open item, please let us know.  Otherwise, please observe the policy on
open item ownership[1] and send a status update within three calendar days
of this message.
Include a date for your subsequent status update.  Testers may discover new
open items at any time, and I want to plan to get them all fixed well in
advance of shipping v10.  Consequently, I will appreciate your efforts
toward speedy resolution.  Thanks.

[1] https://www.postgresql.org/message-id/20170404140717.GA2675809%40tornado.leadboat.com
Re: [HACKERS] Removal of plaintext password type references
On Thu, May 11, 2017 at 10:08:30AM +0900, Michael Paquier wrote:
> On Wed, May 10, 2017 at 10:22 PM, Michael Paquier wrote:
> > On Wed, May 10, 2017 at 10:01 PM, Tom Lane wrote:
> >> Heikki Linnakangas writes:
> >>> Also note that changing the signature of check_password_hook_type would
> >>> break any external modules that use the hook.  Removing
> >>> PASSWORD_TYPE_PLAINTEXT will do that too, because any password hook
> >>> function would use that constant (see e.g. contrib/passwordcheck).  If we
> >>> were to change the signature, I'd actually like to simplify it by
> >>> removing the password_type parameter altogether.  The hook function can
> >>> call get_password_type() on the password itself to get the same
> >>> information.  But it's not worth changing the API and breaking external
> >>> modules for that.
> >
> > Ahah.  I just had the same thought before reading this message.
>
> And attached is a patch to do that.  I am all for this one to get a
> simpler interface in place.

[Action required within three days.  This is a generic notification.]

The above-described topic is currently a PostgreSQL 10 open item.  Heikki,
since you committed the patch believed to have created it, you own this open
item.  If some other commit is more relevant or if this does not belong as a
v10 open item, please let us know.  Otherwise, please observe the policy on
open item ownership[1] and send a status update within three calendar days
of this message.  Include a date for your subsequent status update.  Testers
may discover new open items at any time, and I want to plan to get them all
fixed well in advance of shipping v10.  Consequently, I will appreciate your
efforts toward speedy resolution.  Thanks.

[1] https://www.postgresql.org/message-id/20170404140717.GA2675809%40tornado.leadboat.com
Re: [HACKERS] If subscription to foreign table valid ?
On Fri, May 12, 2017 at 11:58 PM, Petr Jelinek wrote:
> On 11/05/17 15:43, Petr Jelinek wrote:
>> Hi,
>>
>> We do check for this, but only during replication, which we have to do
>> because the fact that relation 't' was a foreign table during ALTER
>> SUBSCRIPTION does not mean that it won't be something else half an hour
>> later.
>>
>> I think it does make sense to add a check for this to CREATE/ALTER
>> SUBSCRIPTION though, so that the user is informed immediately about the
>> mistake rather than by errors in the logs later.
>>
>> I'll look into writing a patch for this.  I don't think it's a beta
>> blocker though.
>
> So I moved the relkind check to a single function and call it from all the
> necessary places.  See the attached.

With this patch the error will be like this:

    logical replication target relation public.t is not a table

But it is possible that the referred table is a Foreign Table or a
Partitioned Table (so the referred object is indeed some kind of table).
Would it make sense to specify in the message that the table is not a normal
table, or something along that line?

Regards,
Neha
Re: [HACKERS] [BUGS] Concurrent ALTER SEQUENCE RESTART Regression
On Mon, May 08, 2017 at 12:28:38PM -0400, Peter Eisentraut wrote:
> On 5/8/17 02:58, Noah Misch wrote:
> > IMMEDIATE ATTENTION REQUIRED.  This PostgreSQL 10 open item is long past due
> > for your status update.  Please reacquaint yourself with the policy on open
> > item ownership[1] and then reply immediately.  If I do not hear from you by
> > 2017-05-09 07:00 UTC, I will transfer this item to release management team
> > ownership without further notice.
>
> I got side-tracked by the minor releases preparation.  I will work on
> this now and report on Wednesday.

IMMEDIATE ATTENTION REQUIRED.  This PostgreSQL 10 open item is long past due
for your status update.  Please reacquaint yourself with the policy on open
item ownership[1] and then reply immediately.  If I do not hear from you by
2017-05-16 04:00 UTC, I will transfer this item to release management team
ownership without further notice.

[1] https://www.postgresql.org/message-id/20170404140717.GA2675809%40tornado.leadboat.com
[HACKERS] Re: [doc fix] PG10: wrong description on connect_timeout when multiple hosts are specified
On Fri, May 12, 2017 at 08:54:13AM +0000, Tsunakawa, Takayuki wrote:
> From: pgsql-hackers-ow...@postgresql.org
> > [mailto:pgsql-hackers-ow...@postgresql.org] On Behalf Of Tsunakawa, Takayuki
> > I found a wrong sentence here in the doc.  I'm sorry, this is what I asked
> > you to add...
> >
> > https://www.postgresql.org/docs/devel/static/libpq-connect.html#libpq-paramkeywords
> >
> > connect_timeout
> > Maximum wait for connection, in seconds (write as a decimal integer
> > string).  Zero or not specified means wait indefinitely.  It is not
> > recommended to use a timeout of less than 2 seconds.  This timeout applies
> > separately to each connection attempt.  For example, if you specify two
> > hosts and both of them are unreachable, and connect_timeout is 5, the
> > total time spent waiting for a connection might be up to 10 seconds.
> >
> > The program behavior is that libpq times out after connect_timeout seconds
> > regardless of how many hosts are specified.  I confirmed it like this:
> >
> > $ export PGOPTIONS="-c post_auth_delay=30"
> > $ psql -d "dbname=postgres connect_timeout=5" -h localhost,localhost -p 5432,5433
> > (psql errors out after 5 seconds)
> >
> > Could you fix the doc with something like this?
> >
> > "This timeout applies across all the connection attempts.  For example, if
> > you specify two hosts and both of them are unreachable, and connect_timeout
> > is 5, the total time spent waiting for a connection is up to 5 seconds."
> >
> > Should we also change the minimum "2 seconds" part to be longer, according
> > to the number of hosts?
>
> Instead, I think we should fix the program to match the documented behavior.
> Otherwise, if the first database machine is down, libpq might wait for about
> 2 hours (depending on the OS's TCP keepalive setting), during which it times
> out after connect_timeout and does not attempt to connect to other hosts.
>
> I'll add this item to the PostgreSQL 10 Open Items.
[Action required within three days.  This is a generic notification.]

The above-described topic is currently a PostgreSQL 10 open item.  Robert,
since you committed the patch believed to have created it, you own this open
item.  If some other commit is more relevant or if this does not belong as a
v10 open item, please let us know.  Otherwise, please observe the policy on
open item ownership[1] and send a status update within three calendar days
of this message.  Include a date for your subsequent status update.  Testers
may discover new open items at any time, and I want to plan to get them all
fixed well in advance of shipping v10.  Consequently, I will appreciate your
efforts toward speedy resolution.  Thanks.

[1] https://www.postgresql.org/message-id/20170404140717.GA2675809%40tornado.leadboat.com
[HACKERS] Fix a typo in reorderbuffer.c
Hi,

Attached is a patch for $subject.  I think "that's" is the correct word.

Regards,

--
Masahiko Sawada
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center

fix_typo_in_reorderbuffer_c.patch
Description: Binary data
Re: [HACKERS] snapbuild woes
On 2017-05-14 14:54:37 +0200, Petr Jelinek wrote:
> On 13/05/17 22:23, Andres Freund wrote:
> >> And wait for session 3 to finish slot creation; takes about 20 mins on
> >> my laptop without patches, a minute and a half with your patches for 0004
> >> and 0005 (or with your 0004 and my 0005), and about 2s with my original
> >> 0004 and 0005.
> >
> > Is that with assertions enabled or not?  With assertions, all the time
> > post patches seems to be spent in some Asserts in reorderbuffer.c;
> > without, it takes less than a second for me here.
>
> Right you are, I always forget to switch that off before benchmarks.

Phew ;)

> > I'm applying these now.
>
> Cool.  Just for posterity, this also fixes issue number 3, as the
> switch to consistent state is done purely based on xl_running_xacts and
> not decoded commits/aborts.

Cool.  Although I'm still not convinced, as noted somewhere in this thread,
that it actually did much to start with ;)

Greetings,

Andres Freund
Re: [HACKERS] If subscription to foreign table valid ?
On Thu, May 11, 2017 at 05:55:02PM +0530, tushar wrote:
> I observed that we cannot publish a "foreign table" in a publication:
>
> postgres=# create foreign table t (n int) server db1_server options
> (table_name 't1');
> CREATE FOREIGN TABLE
>
> postgres=# create publication pub for table t;
> ERROR:  "t" is not a table
> DETAIL:  Only tables can be added to publications.
> postgres=#
>
> but the same thing is not true for a subscription:
>
> postgres=# create foreign table t (n int) server db1_server options
> (table_name 't');
> CREATE FOREIGN TABLE
> postgres=# alter subscription sub refresh publication ;
> NOTICE:  added subscription for table public.t
> ALTER SUBSCRIPTION
>
> Is this the expected behavior?  If we cannot publish it, then how can we
> add a subscription for it?

[Action required within three days.  This is a generic notification.]

The above-described topic is currently a PostgreSQL 10 open item.  Peter,
since you committed the patch believed to have created it, you own this open
item.  If some other commit is more relevant or if this does not belong as a
v10 open item, please let us know.  Otherwise, please observe the policy on
open item ownership[1] and send a status update within three calendar days
of this message.  Include a date for your subsequent status update.  Testers
may discover new open items at any time, and I want to plan to get them all
fixed well in advance of shipping v10.  Consequently, I will appreciate your
efforts toward speedy resolution.  Thanks.

[1] https://www.postgresql.org/message-id/20170404140717.GA2675809%40tornado.leadboat.com
Re: [HACKERS] Get stuck when dropping a subscription during synchronizing table
On Mon, May 08, 2017 at 06:27:30PM +0900, Masahiko Sawada wrote:
> I encountered a situation where DROP SUBSCRIPTION got stuck when the
> initial table sync is in progress.  In my environment, I created
> several tables with some data on the publisher.  I created a subscription
> on the subscriber and dropped the subscription immediately after that.
> It doesn't always happen, but I often encountered it in my environment.
>
> The ps -x command shows the following:
>
> 96796 ?  Ss  0:00 postgres: masahiko postgres [local] DROP SUBSCRIPTION
> 96801 ?  Ts  0:00 postgres: bgworker: logical replication worker for subscription 40993   waiting
> 96805 ?  Ss  0:07 postgres: bgworker: logical replication worker for subscription 40993 sync 16418
> 96806 ?  Ss  0:01 postgres: wal sender process masahiko [local] idle
> 96807 ?  Ss  0:00 postgres: bgworker: logical replication worker for subscription 40993 sync 16421
> 96808 ?  Ss  0:00 postgres: wal sender process masahiko [local] idle
>
> The DROP SUBSCRIPTION process (pid 96796) is waiting for the apply
> worker process (pid 96801) to stop while holding a lock on
> pg_subscription_rel.  On the other hand, the apply worker is waiting to
> acquire a tuple lock on pg_subscription_rel needed for heap_update.
> Also, the table sync workers (pid 96805 and 96807) are waiting for the
> apply worker process to change their status.
>
> Also, even when DROP SUBSCRIPTION completes successfully, the table sync
> worker can be orphaned, because I guess the apply worker can exit
> before changing the status of the table sync worker.
>
> I'm using 1f30295.

[Action required within three days.  This is a generic notification.]

The above-described topic is currently a PostgreSQL 10 open item.  Peter,
since you committed the patch believed to have created it, you own this open
item.  If some other commit is more relevant or if this does not belong as a
v10 open item, please let us know.
Otherwise, please observe the policy on open item ownership[1] and send a
status update within three calendar days of this message.  Include a date
for your subsequent status update.  Testers may discover new open items at
any time, and I want to plan to get them all fixed well in advance of
shipping v10.  Consequently, I will appreciate your efforts toward speedy
resolution.  Thanks.

[1] https://www.postgresql.org/message-id/20170404140717.GA2675809%40tornado.leadboat.com
Re: [HACKERS] Race conditions with WAL sender PID lookups
At Fri, 12 May 2017 11:44:19 +0900, Masahiko Sawada wrote:
> On Thu, May 11, 2017 at 10:48 AM, Michael Paquier wrote:
> > Hi all,
> >
> > I had my eyes on the WAL sender code this morning, and I have noticed
> > that walsender.c is not completely consistent with the PID lookups it
> > does in walsender.c.  In two code paths, the PID value is checked
> > without holding the WAL sender spin lock (WalSndRqstFileReload and
> > pg_stat_get_wal_senders), which looks like a very bad idea contrary to
> > what the new WalSndWaitStopping() does and what InitWalSenderSlot() is
> > doing for ages.
> >
> > Any thoughts about the patch attached to make things more consistent?
> > It seems to me that having some safeguards would be nice for
> > robustness.
>
> +1.  I think this is a sensible change.

The unlocked reads presumably intend to avoid excessive locking while
looking up stats values.  But we don't have that many vacant WalSnd slots in
a reasonable setup, so it seems reasonable to read the pid value within the
locked section, since that adds practically no additional cost.

pg_stat_get_wal_receiver seems to need the same amendment, since that code
parallels the WAL sender's.  Or, if we were too sensitive to such locks, we
could use double-checked locking, but I don't think we are here.

In short, +1 too, and walreceiver needs the same amendment.

regards,

--
Kyotaro Horiguchi
NTT Open Source Software Center
Re: [HACKERS] Hash Functions
On Sun, May 14, 2017 at 01:06:03PM -0700, Andres Freund wrote:
> On 2017-05-14 15:59:09 -0400, Greg Stark wrote:
> > Personally while I would like to avoid code that actively crashes or
> > fails basic tests on Vax
>
> I personally vote for simply refusing to run/compile on non-IEEE
> platforms, including VAX.  The benefit of even trying to get that right,
> not to speak of actually testing it, seems just not there.

Do we even know that floats are precise enough to determine the partition?
For example, if you have 6.1, is it possible for that to be 5.999... on
some systems?  Are IEEE systems all the same for these values?  I would say
we should disallow any approximate data type for partitioning completely.

--
  Bruce Momjian  http://momjian.us
  EnterpriseDB   http://enterprisedb.com

+ As you are, so once was I.  As I am, so you will be. +
+                      Ancient Roman grave inscription +
Re: [HACKERS] PG 10 release notes
On Thu, May 11, 2017 at 11:50:03PM -0400, Tom Lane wrote:
> Michael Paquier writes:
> > Bruce, the release notes do not mention yet that support for cleartext
> > passwords is removed.  This has been done recently by Heikki in
> > eb61136d.  This has as a consequence that CREATE ROLE PASSWORD
> > UNENCRYPTED returns an error, and that password_encryption loses its
> > value 'off' compared to previous releases.
>
> The release notes only say that they're current through 4-22.  The
> usual process is to update them in batches every so often.  It'd be
> great if Bruce has time to do another round before Monday, but the
> current situation is not really broken.

Done.  Applied patch attached.

--
  Bruce Momjian  http://momjian.us
  EnterpriseDB   http://enterprisedb.com

+ As you are, so once was I.  As I am, so you will be. +
+                      Ancient Roman grave inscription +

diff --git a/doc/src/sgml/release-10.sgml b/doc/src/sgml/release-10.sgml
new file mode 100644
index 6497641..4f953dc
*** a/doc/src/sgml/release-10.sgml
--- b/doc/src/sgml/release-10.sgml
*** 128,133 ****
--- 128,155 ----

+ Rename WAL-related functions and views to use lsn
+ instead of location (David Rowley)
+
+ Remove the ability to store unencrypted passwords on the server
+ (Heikki Linnakangas)
+
+ The server-side variable password_encryption
+ no longer supports off or plain.
+ The UNENCRYPTED option is no longer supported for
+ CREATE/ALTER USER ... PASSWORD.  Similarly, the
+ --unencrypted option has been removed from createuser.
+ The default for password_encryption is still
+ md5, and users migrating passwords from older systems
+ will have them stored encrypted by default in this release.
+
+ Add function PQencryptPasswordConn()
+ to allow creation of more types of encrypted passwords on the
+ client side (Michael Paquier, Heikki Linnakangas)
+
+ Previously only MD5 passwords could be created using PQencryptPassword().
+ This new function can also create SCRAM-SHA-256 passwords.

*** 2847,2852 ****
--- 2912,2919 ----

  Push aggregates to foreign data wrapper servers, where possible

*** 2858,2864 ****
  from the foreign data wrapper server, and offloads aggregate computation
  from the requesting server.  The postgres_fdw is able to
! perform this optimization.
--- 2925,2932 ----
  from the foreign data wrapper server, and offloads aggregate computation
  from the requesting server.  The postgres_fdw is able to
! perform this optimization.  There are also improvements in
! pushing down joins involving extensions.
Re: [HACKERS] multi-column range partition constraint
On 2017/05/14 1:07, Robert Haas wrote: > On Sat, May 13, 2017 at 12:42 AM, Amit Langote > wrote: >> Attached is the correct version. > > Thank you! I committed 0001 with a couple of cosmetic tweaks and with > the change I previously suggested around partexprs_item. You argued > that wouldn't work because the contents of partexprs_item was > modified, but that's not so: partexprs_item in > get_range_key_properties is a pointer to the partexprs_item in the > caller. When it modifies *partexprs_item, it's not changing the > contents of the ListCell itself, just the caller's ListCell *. I see. > I also ran pgindent over the patch. Oops, had forgotten about pgindent. > Also committed 0002. In that case, I removed CHECK (...) from the > output; the caller can always add that if it's desired, but since a > partitioning constraint is NOT a CHECK constraint I don't actually > think it's useful in most cases. I also tweaked the regression tests > slightly. Thanks for reviewing and committing. Agree about not including CHECK (). Regards, Amit -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] [bug fix] PG10: libpq doesn't connect to alternative hosts when some errors occur
From: Michael Paquier [mailto:michael.paqu...@gmail.com] > On Fri, May 12, 2017 at 10:44 PM, Tom Lane wrote: > > I would not really expect that reconnection would retry after > > arbitrary failure cases. Should it retry for "wrong database name", for > instance? > > It's not hard to imagine that leading to very confusing behavior. > > I guess not as well. That would be tricky for the user to have a different > behavior depending on the error returned by the server, which is why the > current code is doing things right IMO. Now, the feature has been designed > similarly to JDBC with its parametrization, so it could be surprising for > users to get a different failure handling compared to that. Not saying that > JDBC is doing it wrong, but libpq does nothing wrong either. I didn't intend to make the user have a different behavior depending on the error returned by the server. I meant attempting a connection to alternative hosts when the server returned an error. I thought the new libpq feature tries to connect to other hosts when a connection attempt fails, where the "connection" is the *database connection* (user's perspective), not the *socket connection* (PG developer's perspective). I think PgJDBC meets the user's desire better -- "Please connect to some host for better HA if a database server is unavailable for some reason." By the way, could you elaborate on what problem could occur if my solution is applied? (it doesn't seem easy for me to imagine...) FYI, as below, the case Tom picked up didn't raise an issue: [libpq] $ psql -h localhost,localhost -p 5450,5451 -d aaa psql: FATAL: database "aaa" does not exist $ [JDBC] $ java org.hsqldb.cmdline.SqlTool postgres SqlTool v. 3481. 
2017-05-15T10:23:55.991+0900 SEVERE Connection error: org.postgresql.util.PSQLException: FATAL: database "aaa" does not exist Location: File: postinit.c, Routine: InitPostgres, Line: 846 Server SQLState: 3D000 at org.postgresql.core.v3.QueryExecutorImpl.receiveErrorResponse(QueryExecutorImpl.java:2412) at org.postgresql.core.v3.QueryExecutorImpl.readStartupMessages(QueryExecutorImpl.java:2538) at org.postgresql.core.v3.QueryExecutorImpl.(QueryExecutorImpl.java:122) at org.postgresql.core.v3.ConnectionFactoryImpl.openConnectionImpl(ConnectionFactoryImpl.java:227) at org.postgresql.core.ConnectionFactory.openConnection(ConnectionFactory.java:49) at org.postgresql.jdbc.PgConnection.(PgConnection.java:194) at org.postgresql.Driver.makeConnection(Driver.java:431) at org.postgresql.Driver.connect(Driver.java:247) at java.sql.DriverManager.getConnection(DriverManager.java:664) at java.sql.DriverManager.getConnection(DriverManager.java:247) at org.hsqldb.lib.RCData.getConnection(Unknown Source) at org.hsqldb.cmdline.SqlTool.objectMain(Unknown Source) at org.hsqldb.cmdline.SqlTool.main(Unknown Source) Failed to get a connection to 'jdbc:postgresql://localhost:5450,localhost:5451/aaa' as user "tunakawa". Cause: FATAL: database "aaa" does not exist Location: File: postinit.c, Routine: InitPostgres, Line: 846 Server SQLState: 3D000 $ Regards Takayuki Tsunakawa -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Hash Functions
Hi, On 2017-05-14 21:22:58 -0400, Robert Haas wrote: > but wanting a CHECK constraint that applies to only one partition > seems pretty reasonable (e.g. CHECK that records for older years are > all in the 'inactive' state, or whatever). On a hash-partitioned table? > Now that's not to say that we shouldn't have a > reload-through-the-top-parent switch; actually, I think that's a good > idea. I just don't believe that it can ever be a complete substitute > for portable hash functions. I think almost everybody here agrees > that it isn't necessary to have hash functions that are 100% portable, > e.g. to VAX. But it would be nice IMHO if you could use an integer > column as the partitioning key and have that be portable between Intel > and POWER. I'm not saying it can't work for any datatype, I just think it'd be a very bad idea to make it work for any non-trivial ones. The likelihood of regularly breaking things, or preventing us from improving things, seems high. I'm not sure that a design where this works most of the time, for some datatypes, is a good one. Greetings, Andres Freund -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] [bug fix] PG10: libpq doesn't connect to alternative hosts when some errors occur
On Sun, May 14, 2017 at 9:19 PM, Michael Paquier wrote: > On Fri, May 12, 2017 at 10:44 PM, Tom Lane wrote: >> Michael Paquier writes: >>> On Fri, May 12, 2017 at 1:28 PM, Tsunakawa, Takayuki >>> wrote: Likewise, when the first host has already reached max_connections, libpq doesn't attempt the connection against later hosts. >> >>> It seems to me that the feature is behaving as wanted. Or in short >>> attempt to connect to the next host only if a connection cannot be >>> established. If there is a failure once the exchange with the server >>> has begun, just consider it as a hard failure. This is an important >>> property for authentication and SSL connection failures actually. >> >> I would not really expect that reconnection would retry after arbitrary >> failure cases. Should it retry for "wrong database name", for instance? >> It's not hard to imagine that leading to very confusing behavior. > > I guess not as well. That would be tricky for the user to have a > different behavior depending on the error returned by the server, > which is why the current code is doing things right IMO. Now, the > feature has been designed similarly to JDBC with its parametrization, > so it could be surprising for users to get a different failure > handling compared to that. Not saying that JDBC is doing it wrong, but > libpq does nothing wrong either. I concur. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Hash Functions
On Sun, May 14, 2017 at 6:29 PM, Andres Freund wrote: > On 2017-05-14 18:25:08 -0400, Tom Lane wrote: >> It may well be that we can get away with saying "we're not going >> to make it simple to move hash-partitioned tables with float >> partition keys between architectures with different float >> representations". But there's a whole lot of daylight between that >> and denying any support for float representations other than the >> currently-most-popular one. > > Note that I, IIRC in the mail you responded to, also argued that I don't > think it'd be a good idea to rely on hashfunctions being portable. The > amount of lock-in that'd create, especially for more complex datatypes, > seems wholly inadvisable. I still think that dumping tables in a way > they're reloaded via the top-partition (probably one copy statement for > each child partition), and prohibiting incoming fkeys to partitions, is > a better approach to all this. You'd have to prohibit a heck of a lot more than that in order for this to work 100% reliably. You'd have to prohibit CHECK constraints, triggers, rules, RLS policies, and UNIQUE indexes, at the least. You might be able to convince me that some of those things are useless when applied to partitions, but wanting a CHECK constraint that applies to only one partition seems pretty reasonable (e.g. CHECK that records for older years are all in the 'inactive' state, or whatever). I think getting this to work 100% reliably in all cases would require an impractically large hammer. Now that's not to say that we shouldn't have a reload-through-the-top-parent switch; actually, I think that's a good idea. I just don't believe that it can ever be a complete substitute for portable hash functions. I think almost everybody here agrees that it isn't necessary to have hash functions that are 100% portable, e.g. to VAX. But it would be nice IMHO if you could use an integer column as the partitioning key and have that be portable between Intel and POWER. 
-- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] [bug fix] PG10: libpq doesn't connect to alternative hosts when some errors occur
On Fri, May 12, 2017 at 10:44 PM, Tom Lane wrote: > Michael Paquier writes: >> On Fri, May 12, 2017 at 1:28 PM, Tsunakawa, Takayuki >> wrote: >>> Likewise, when the first host has already reached max_connections, libpq >>> doesn't attempt the connection against later hosts. > >> It seems to me that the feature is behaving as wanted. Or in short >> attempt to connect to the next host only if a connection cannot be >> established. If there is a failure once the exchange with the server >> has begun, just consider it as a hard failure. This is an important >> property for authentication and SSL connection failures actually. >> >> I would not really expect that reconnection would retry after arbitrary >> failure cases. Should it retry for "wrong database name", for instance? >> It's not hard to imagine that leading to very confusing behavior. I guess not as well. That would be tricky for the user to have a different behavior depending on the error returned by the server, which is why the current code is doing things right IMO. Now, the feature has been designed similarly to JDBC with its parametrization, so it could be surprising for users to get a different failure handling compared to that. Not saying that JDBC is doing it wrong, but libpq does nothing wrong either. -- Michael -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
[HACKERS] Re: [doc fix] PG10: wroing description on connect_timeout when multiple hosts are specified
From: pgsql-hackers-ow...@postgresql.org > [mailto:pgsql-hackers-ow...@postgresql.org] On Behalf Of Tsunakawa, > Takayuki > Instead, I think we should fix the program to match the documented behavior. > Otherwise, if the first database machine is down, libpq might wait for about > 2 hours (depending on the OS's TCP keepalive setting), during which it times > out after connect_timeout and does not attempt to connect to other hosts. > > I'll add this item in the PostgreSQL 10 Open Items. Please use the attached patch to fix the problem. I confirmed the success as follows: $ add "post_auth_delay = 10" in postgresql.conf on host1 $ start database servers on host1 and host2 $ psql -h host1,host2 -p 5432,5433 -d "dbname=postgres connect_timeout=3" (psql connected to host2 after 3 seconds, which I checked with \conninfo) Regards Takayuki Tsunakawa diff --git a/src/interfaces/libpq/fe-connect.c b/src/interfaces/libpq/fe-connect.c index eb5aaf7098..d02e5201fa 100644 --- a/src/interfaces/libpq/fe-connect.c +++ b/src/interfaces/libpq/fe-connect.c @@ -1720,6 +1720,8 @@ connectDBComplete(PGconn *conn) { PostgresPollingStatusType flag = PGRES_POLLING_WRITING; time_t finish_time = ((time_t) -1); + int ret = 0; + int timeout = 0; if (conn == NULL || conn->status == CONNECTION_BAD) return 0; @@ -1729,8 +1731,7 @@ connectDBComplete(PGconn *conn) */ if (conn->connect_timeout != NULL) { - int timeout = atoi(conn->connect_timeout); - + timeout = atoi(conn->connect_timeout); if (timeout > 0) { /* @@ -1761,7 +1762,8 @@ connectDBComplete(PGconn *conn) return 1; /* success! 
*/ case PGRES_POLLING_READING: - if (pqWaitTimed(1, 0, conn, finish_time)) + ret = pqWaitTimed(1, 0, conn, finish_time); + if (ret == -1) { conn->status = CONNECTION_BAD; return 0; @@ -1769,7 +1771,8 @@ connectDBComplete(PGconn *conn) break; case PGRES_POLLING_WRITING: - if (pqWaitTimed(0, 1, conn, finish_time)) + ret = pqWaitTimed(0, 1, conn, finish_time); + if (ret == -1) { conn->status = CONNECTION_BAD; return 0; @@ -1782,6 +1785,22 @@ connectDBComplete(PGconn *conn) return 0; } + if (ret == 1) /* connect_timeout elapsed */ + { + /* If there are no more hosts, return (the error message is already set) */ + if (++conn->whichhost >= conn->nconnhost) + { + conn->whichhost = 0; + conn->status = CONNECTION_BAD; + return 0; + } + /* Attempt connection to the next host, starting the connect_timeout timer */ + pqDropConnection(conn, true); + conn->addr_cur = conn->connhost[conn->whichhost].addrlist; + conn->status = CONNECTION_NEEDED; + finish_time = time(NULL) + timeout; + } + /* * Now try to advance the state machine. */ diff --git a/src/interfaces/libpq/fe-misc.c b/src/interfaces/libpq/fe-misc.c index 756c6d7779..1d6ea93a0a 100644 --- a/src/interfaces/libpq/fe-misc.c +++ b/src/interfaces/libpq/fe-misc.c @@ -991,11 +991,9 @@ pqWait(int forRead, int forWrite, PGconn *conn) /* * pqWaitTimed: wait, but not past finish_time. * - * If finish_time is exceeded then we return failure (EOF). This is like - * the response for a kernel exception because we don't want the caller - * to try to read/write in that case. - * * finish_time = ((time_t) -1) disables the wait limit. + * + * Returns -1 on failure, 0 if the socket is readable/writable, 1 if it timed out. 
*/ int pqWaitTimed(int forRead, int forWrite, PGconn *conn, time_t finish_time) @@ -1005,13 +1003,13 @@ pqWaitTimed(int forRead, int forWrite, PGconn *conn, time_t finish_time) result = pqSocketCheck(conn, forRead, forWrite, finish_time); if (result < 0) - return EOF; /* errorMessage is already set */ + return -1; /* errorMessage is already set */ if (result == 0) { printfPQExpBuffer(
Re: [HACKERS] Hash Functions
On Sun, May 14, 2017 at 3:30 PM, Tom Lane wrote: > I agree that the Far Eastern systems that can't easily be replaced > by Unicode are that way mostly because they're a mess. But I'm > still of the opinion that locking ourselves into Unicode is a choice > we might regret, far down the road. It's not a choice that has any obvious upside, so I have no reason to disagree. My point was only that Robert's contention that "You could argue that text-under-LATIN1 and text-under-UTF8 aren't really the same data type at all" seems wrong to me, because PostgreSQL seems to want to treat encoding as a property of the machine. This is evidenced by the fact that we expect applications to change client encoding "transparently". That is, client encoding may be changed without in any way affecting humans that speak a natural language that is provided for by the application's client encoding. That's a great ideal to have, and one that is very close to completely workable. -- Peter Geoghegan VMware vCenter Server https://www.vmware.com/ -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Hash Functions
On Mon, May 15, 2017 at 10:08 AM, Thomas Munro wrote: > [2] > https://www.ibm.com/support/knowledgecenter/en/SSLTBW_2.1.0/com.ibm.zos.v2r1.cbcux01/flotcop.htm#flotcop Though looking more closely I see that the default is IEEE in 64 bit builds, which seems like a good way to kill the older format off. If/when someone gets PostgreSQL ported to z/OS it probably won't be because they want to run it on an ancient 32 bit mainframe. -- Thomas Munro http://www.enterprisedb.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] PG10 pgindent run
Andres Freund writes: > On 2017-05-14 17:22:16 -0400, Stephen Frost wrote: >> Releasing alpha/beta is not the same as branching, which I didn't expect >> us to do for a while yet.. > Well, tagging then. Imo it still should be done before we tag > beta1/alpha1. Too late, IMO. Yeah, I'd have preferred to pgindent before beta1 too, just in case pgindent breaks something ... but it's been a long time since that happened, so realistically, not having done it yet is not going to meaningfully degrade the relevance of beta testing. At this point, for Bruce to do it before the wrap, he'd pretty much have to do it tonight, which we already agreed among the release team is not enough notice. Also, I liked the approach Robert took last year where he did a dry run and then manually fixed places that he saw would get messed up by pgindent. That's a lot more pleasant than undoing such damage afterwards. I don't know if Robert plans to do that this year, but if not, I'm planning to get to it tomorrow or Tuesday. I can't do it right now because I'm still on higher-priority tasks. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Hash Functions
Peter Geoghegan writes: > The express goal of the Unicode consortium is to replace all existing > encodings with Unicode. My personal opinion is that a Unicode > monoculture would be a good thing, provided reasonable differences can > be accommodated. Can't help remembering Randall Munroe's take on such things: https://xkcd.com/927/ I agree that the Far Eastern systems that can't easily be replaced by Unicode are that way mostly because they're a mess. But I'm still of the opinion that locking ourselves into Unicode is a choice we might regret, far down the road. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Hash Functions
On 2017-05-14 18:25:08 -0400, Tom Lane wrote: > It may well be that we can get away with saying "we're not going > to make it simple to move hash-partitioned tables with float > partition keys between architectures with different float > representations". But there's a whole lot of daylight between that > and denying any support for float representations other than the > currently-most-popular one. Note that I, IIRC in the mail you responded to, also argued that I don't think it'd be a good idea to rely on hashfunctions being portable. The amount of lock-in that'd create, especially for more complex datatypes, seems wholly inadvisable. I still think that dumping tables in a way they're reloaded via the top-partition (probably one copy statement for each child partition), and prohibiting incoming fkeys to partitions, is a better approach to all this. Greetings, Andres Freund -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Hash Functions
Andres Freund writes: > On 2017-05-14 15:59:09 -0400, Greg Stark wrote: >> Personally while I would like to avoid code that actively crashes or >> fails basic tests on Vax > I personally vote for simply refusing to run/compile on non-IEEE > platforms, including VAX. The point of wanting that is not because anybody thinks that VAX per se is an interesting platform anymore. The point is to avoid locking ourselves into narrow assumptions that we may need to break someday. Saying "nobody cares about floats other than IEEE floats" is morally indistinguishable from saying "nobody cares about running on any platform except Windows", which was a depressingly common opinion back in the nineties or so. It may well be that we can get away with saying "we're not going to make it simple to move hash-partitioned tables with float partition keys between architectures with different float representations". But there's a whole lot of daylight between that and denying any support for float representations other than the currently-most-popular one. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Hash Functions
On Mon, May 15, 2017 at 7:59 AM, Greg Stark wrote: > On 13 May 2017 at 10:29, Robert Haas wrote: >> - Floats. There may be different representations in use on different >> hardware, which could be a problem. Tom didn't answer my question >> about whether any even-vaguely-modern hardware is still using non-IEEE >> floats, which I suspect means that the answer is "no". > > Fwiw the answer to that is certainly no. The only caveat is that some > platforms have not entirely complete implementations of IEEE missing > corner cases such as denormalized values but I don't think that would > be something that would be changed with a different hash function > though. Well... along with the Intel/IEEE-754 and VAX formats, there is a third floating point format that is/was in widespread use: IBM hex float[1]. It's base 16 rather than base 2, and might be the default on some IBM operating systems[2] for the C float and double types (but fortunately not xlc for AIX or Linux, and I have no clue about i/OS). This is probably irrelevant because it looks like people aren't running PostgreSQL on z/OS right now[3], but unlike VAXen these systems are alive and well so I just wanted to respond to the implication on this thread that the whole world is a VAX or an IEEE system :-) People really use this... I happen to know a shop that has petabytes of IBM hex float data. (IBM hardware also supports base 10 floats, but they show up as different types in C so not relevant.) [1] https://en.wikipedia.org/wiki/IBM_Floating_Point_Architecture [2] https://www.ibm.com/support/knowledgecenter/en/SSLTBW_2.1.0/com.ibm.zos.v2r1.cbcux01/flotcop.htm#flotcop [3] https://www.postgresql.org/message-id/flat/BLU437-SMTP4B3FF36035D8A3C3816D49C160%40phx.gbl -- Thomas Munro http://www.enterprisedb.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] PG10 pgindent run
* Andres Freund (and...@anarazel.de) wrote: > On 2017-05-14 17:22:16 -0400, Stephen Frost wrote: > > * Andres Freund (and...@anarazel.de) wrote: > > > On 2017-05-14 09:53:17 -0400, Bruce Momjian wrote: > > > > I would like to run pgindent on the head source tree this Wednesday > > > > afternoon, UTC. Is that OK for everyone? > > > > > > Shouldn't we do that before we branch? And if Thursday still is the > > > intended alpha/beta release date, Wednesday would be too late, no? > > > > Releasing alpha/beta is not the same as branching, which I didn't expect > > us to do for a while yet.. > > Well, tagging then. Imo it still should be done before we tag > beta1/alpha1. For my 2c, at least, I'm not sure I see why that matters all that much..? I'm not against doing the pgindent run today, to be clear, but I don't follow what the harm is with waiting until Wednesday. Thanks! Stephen signature.asc Description: Digital signature
Re: [HACKERS] PG10 pgindent run
On 2017-05-14 17:22:16 -0400, Stephen Frost wrote: > * Andres Freund (and...@anarazel.de) wrote: > > On 2017-05-14 09:53:17 -0400, Bruce Momjian wrote: > > > I would like to run pgindent on the head source tree this Wednesday > > > afternoon, UTC. Is that OK for everyone? > > > > Shouldn't we do that before we branch? And if Thursday still is the > > intended alpha/beta release date, Wednesday would be too late, no? > > Releasing alpha/beta is not the same as branching, which I didn't expect > us to do for a while yet.. Well, tagging then. Imo it still should be done before we tag beta1/alpha1. Greetings, Andres Freund -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] PG10 pgindent run
* Andres Freund (and...@anarazel.de) wrote: > On 2017-05-14 09:53:17 -0400, Bruce Momjian wrote: > > I would like to run pgindent on the head source tree this Wednesday > > afternoon, UTC. Is that OK for everyone? > > Shouldn't we do that before we branch? And if Thursday still is the > intended alpha/beta release date, Wednesday would be too late, no? Releasing alpha/beta is not the same as branching, which I didn't expect us to do for a while yet.. Thanks! Stephen signature.asc Description: Digital signature
Re: [HACKERS] postgres 9.6.2 update breakage
Jan Nieuwenhuizen writes: > Roel Janssen writes: > >> So, it would be something like: >> postgres pg_upgrade \ >> ... > > It's great to have a recipe `that works', so thanks! > > However, whether or not we automate this, I cannot help to wonder if > we should support downgrading -- at least to the previous version > in this case? > > If I'm not mistaken, everything else in GuixSD will run if I select a > previous system generation in Grub...except for this? > > Is involving postgres developers an option, I'm sure a least one of > the postgresql hackers[cc] are already looking at Guix[SD]? > > Greetings, > janneke There's a big difference in upgrading and downgrading between guix revisions and doing so in highly stateful databases, unfortunately. I can't speak for postgres specifically, but here's my experience with migrations as the tech lead of MediaGoblin: - upgrades should be taken with extreme caution, and you should back up first. - downgrades should be taken with ten times the amount of caution of upgrades, a vat of coffee to work through the problems, and a barrel of whiskey for when it doesn't. I say that as someone who's mostly given up coffee and doesn't drink alcohol. State changes are bad enough when unidirectional. Django, for instance, provides an API that does both upgrades and downgrades. Almost everybody spends a bunch of time carefully crafting their upgrades, and just leaves their downgrades as the stubs that come with it. These are stubs that drop columns entirely, possibly columns that data was moved to in the migration. Reverse course, and suddenly you don't have a lot of data you used to. What we really want to do is provide the option to snapshot things *before* you do an upgrade, IMO... -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Hash Functions
On Sat, May 13, 2017 at 9:11 PM, Robert Haas wrote: > The latter is > generally false already. Maybe LATIN1 -> UTF8 is no-fail, but what > about UTF8 -> LATIN1 or SJIS -> anything? Based on previous mailing > list discussions, I'm under the impression that it is sometimes > debatable how a character in one encoding should be converted to some > other encoding, either because it's not clear whether there is a > mapping at all or it's unclear what mapping should be used. The express goal of the Unicode consortium is to replace all existing encodings with Unicode. My personal opinion is that a Unicode monoculture would be a good thing, provided reasonable differences can be accommodated. So, it might be that there is ambiguity about how one codepoint can be converted to another in another encoding, but that's because encodings like SJIS and BIG5 are needlessly ambiguous. It has nothing to do with cultural preferences leaving the question undecidable (at least by a panel of natural language experts), and everything to do with these regional character encoding systems being objectively bad. They richly deserve to die, and are in fact dying. Encoding actually *is* a property of the machine, even though regional encodings obfuscate things. There is a reason why MacOS and Java use UTF-16 rather than UTF-8, and there is a reason why the defacto standard on the web is UTF-8, and those reasons are completely technical. AFAICT, whatever non-technical reasons remain are actually technical debt in disguise. Where this leaves hash partitioning, I cannot say. -- Peter Geoghegan VMware vCenter Server https://www.vmware.com/ -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Hash Functions
On 2017-05-14 15:59:09 -0400, Greg Stark wrote: > Personally while I would like to avoid code that actively crashes or > fails basic tests on Vax I personally vote for simply refusing to run/compile on non-IEEE platforms, including VAX. The benefit of even trying to get that right, not to speak of actually testing, seems just not there. > even I don't think we need to worry about > replication or federated queries in a heterogeneous environment where > some servers are Vaxen and some are modern hardware. That seems a bit > far-fetched. Imo there's little point in trying to delineate some subset that we want to support on platforms that are 20 years out of date. Either we do, or don't. And since there's no point aside of some intellectual curiosity... On the other hand, I don't believe my opinion that requiring hashing to be portable is unrealistic, is meaningfully affected by disallowing non-IEEE floats. Greetings, Andres Freund -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] issue: record or row variable cannot be part of multiple-item INTO list
2017-05-14 5:04 GMT+02:00 Pavel Stehule : > > > 2017-05-13 22:20 GMT+02:00 Tom Lane : > >> Pavel Stehule writes: >> > I am working on migrating a large Oracle application to Postgres. When I >> > started migrating procedures with OUT parameters I found the following limit >> >> > "record or row variable cannot be part of multiple-item INTO list" >> >> IIRC, the reason for disallowing that is that it's totally unclear what >> the semantics ought to be. Is that variable a single target (demanding >> a compatible composite-valued column from the source query), or does it >> eat one source column per field within the record/row? The former is 100% >> inconsistent with what happens if the record/row is the only INTO target; >> while the latter would be very bug-prone, and it's especially unclear what >> ought to happen if it's an as-yet-undefined record variable. > > > I don't think so. The semantics should be the same as now. > > Now, the output (s1,s2,s3) can be assigned to > > 1. scalar variables - implemented with an aux row variable (s1,s2,s3) -> > r(ts1,ts2,ts3) > 2. record - (s1, s2, s3) -> rec(s1,s2,s3) > 3. row - (s1,s2,s3) -> r(s1,s2,s3) > > If we allow composite values there, then the situation is the same > > 1. (s1, c2, s3, c4) -> r(ts1, tc2, ts3, tc4) > 2. (s1, c2, s3, c4) -> rec(s1, c2, s3, c4) > 3. (s1, c2, s3, c4) -> row(s1, c2, s3, c4) > > So there is no inconsistency if we use the rule > > 1. if there is one target, use it > 2. if there are more targets, create an aux row variable > > The same technique is used for function output - build_row_from_vars - and > there is no problem. > > If you try to assign a composite to a scalar or a scalar to a composite, then the > assignment should fail. But when the statement is correct, these invalid > assignments should not be there. > > >> >> Yeah, we could invent some semantics or other, but I think it would >> mostly be a foot-gun for unwary programmers. 
>> We do allow you to write out the columns individually for such cases:
>>
>> SELECT ... INTO v1, rowvar.c1, rowvar.c2, rowvar.c3, v2 ...
>
> It doesn't help performance or readability (and maintainability) for
> the following cases.
>
> There is often the pattern
>
> PROCEDURE p(..., OUT r widetab%ROWTYPE, OUT errordesc COMPOSITE)
>
> Now there is a workaround
>
> SELECT * FROM p() INTO auxrec;
> r := auxrec.widetab;
> errordesc := auxrec.errordesc;
>
> But it creates N (number of OUT variables) assignment commands over
> records.
>
> If this workaround is safe, then an implementation based on an aux row
> variable should be safe too, because this workaround is a manual
> implementation of it.
>
>> and I think it's better to encourage people to stick to that.
>
> I don't think using tens of OUT variables is nice, but the current
> behaviour is too restrictive. Moreover, I didn't find a case where the
> current implementation should not work (allowing records needs some work).

here is a patch. all regress tests passed

Regards

Pavel

>> regards, tom lane

plpgsql-into-multitarget.sql
Description: application/sql
Re: [HACKERS] Hash Functions
On 13 May 2017 at 10:29, Robert Haas wrote:
> - Floats.  There may be different representations in use on different
> hardware, which could be a problem.  Tom didn't answer my question
> about whether any even-vaguely-modern hardware is still using non-IEEE
> floats, which I suspect means that the answer is "no".

Fwiw, the answer to that is certainly no. The only caveat is that some
platforms have not entirely complete implementations of IEEE, missing
corner cases such as denormalized values, but I don't think that is
something that would change with a different hash function.

Personally, while I would like to avoid code that actively crashes or
fails basic tests on a Vax, even I don't think we need to worry about
replication or federated queries in a heterogeneous environment where
some servers are Vaxen and some are modern hardware. That seems a bit
far-fetched.

-- 
greg
Re: [HACKERS] PG10 pgindent run
On 2017-05-14 09:53:17 -0400, Bruce Momjian wrote:
> I would like to run pgindent on the head source tree this Wednesday
> afternoon, UTC.  Is that OK for everyone?

Shouldn't we do that before we branch? And if Thursday still is the
intended alpha/beta release date, Wednesday would be too late, no?

Greetings,

Andres Freund
[HACKERS] Receive buffer size for the statistics socket
So I put in the patch I'd proposed to reduce sleep delays in the stats
regression test, and I see that frogmouth has now failed that test twice,
with symptoms suggesting that it's dropping the last stats report --- but
not all of the stats reports --- from the test's first session. I
considered reverting the test patch, but on closer inspection that seems
like it would be shooting the messenger, because this is indicating a real
and reproducible loss of stats data.

I put in some debug elog's to see how much data is getting shoved at the
stats collector during this test, and it seems that it can be as much as
about 12K, between what the first session can send at exit and what the
second session will send immediately after startup. (I think this value
should be pretty platform-independent, but be it noted that I'm measuring
on a 64-bit Linux system while frogmouth is 32-bit Windows.)

Now, what's significant about that is that frogmouth is a pretty old
Windows version, and what I read on the net is that Windows versions
before 2012 have only 8KB socket receive buffer size by default. So this
behavior is plausibly explained by the theory that the stats collector's
receive buffer is overflowing, causing loss of stats messages. This could
well explain the stats-test failures we've seen in the past too, which if
memory serves were mostly on Windows.

Also, it's clear that a session could easily shove much more than 8KB at a
time out to the stats collector, because what we're doing in the stats
test does not involve touching any very large number of tables. So I think
this is not just a test failure but is telling us about a plausible
mechanism for real-world statistics drops.

I observe a default receive buffer size around 124K on my Linux box, which
seems like it'd be far less prone to overflow than 8K. I propose that it'd
be a good idea to try to set the stats socket's receive buffer size to be
a minimum of, say, 100K on all platforms.
Code for this would be analogous to what we already have in pqcomm.c
(circa line 760) for forcing up the send buffer size, but SO_RCVBUF not
SO_SNDBUF.

A further idea is that maybe backends should be tweaked to avoid blasting
large amounts of data at the stats collector in one go. That would require
more thought to design, though.

Thoughts?

regards, tom lane
[HACKERS] Small improvement to compactify_tuples
Good day, everyone.

I've been playing a bit with unlogged tables - just random updates on a
simple key-value table. I've noticed the amount of CPU spent in
compactify_tuples (called by PageRepareFragmentation). Most of the time
was spent in the qsort of the itemidbase items.

The itemidbase array is bounded by the number of tuples in a page, and
the itemIdSortData structure is simple, so a specialized version could be
a better choice.

The attached patch adds a combination of one pass of prefix sort with
insertion sort for larger arrays, and shell sort for smaller arrays.
Insertion sort and shell sort are implemented as macros and could be
reused.

I've tested with the following table:

create unlogged table test3 (
    id integer PRIMARY KEY with (fillfactor=85),
    val text
) WITH (fillfactor=85);
insert into test3 select i, '!'||i from generate_series(1, 1000) as i;

With pgbench script:

\set id1 RANDOM(1, :scale)
\set id2 RANDOM(1, :scale)
select * from test3 where id = :id1;
update test3 set val = '!'|| :id2 where id = :id1;

And command:

pgbench -M prepared -c 3 -s 1000 -T 1000 -P 3 -n -f test3.sql testdb

Using 1GB shared_buffers and synchronous_commit=off.
On my notebook the improvement is:

before patch:

progress: 63.0 s, 15880.1 tps, lat 0.189 ms stddev 0.127
progress: 66.0 s, 15975.8 tps, lat 0.188 ms stddev 0.122
progress: 69.0 s, 15904.1 tps, lat 0.189 ms stddev 0.152
progress: 72.0 s, 15000.9 tps, lat 0.200 ms stddev 0.213
progress: 75.0 s, 15101.7 tps, lat 0.199 ms stddev 0.192
progress: 78.0 s, 15854.2 tps, lat 0.189 ms stddev 0.158
progress: 81.0 s, 15803.3 tps, lat 0.190 ms stddev 0.158
progress: 84.0 s, 15242.9 tps, lat 0.197 ms stddev 0.203
progress: 87.0 s, 15184.1 tps, lat 0.198 ms stddev 0.215

after patch:

progress: 63.0 s, 17108.5 tps, lat 0.175 ms stddev 0.140
progress: 66.0 s, 17271.9 tps, lat 0.174 ms stddev 0.155
progress: 69.0 s, 17243.5 tps, lat 0.174 ms stddev 0.143
progress: 72.0 s, 16675.3 tps, lat 0.180 ms stddev 0.206
progress: 75.0 s, 17187.4 tps, lat 0.175 ms stddev 0.157
progress: 78.0 s, 17293.0 tps, lat 0.173 ms stddev 0.159
progress: 81.0 s, 16289.8 tps, lat 0.184 ms stddev 0.180
progress: 84.0 s, 16131.2 tps, lat 0.186 ms stddev 0.170
progress: 87.0 s, 16741.1 tps, lat 0.179 ms stddev 0.165

I understand that it is quite a degenerate test case. But probably this
improvement still makes sense.

With regards,

-- 
Sokolov Yura aka funny.falcon
Postgres Professional: https://postgrespro.ru
The Russian Postgres Company

From f8ed235dbb60a79b4f59dc8a4af014b2ca698772 Mon Sep 17 00:00:00 2001
From: Sokolov Yura aka funny_falcon
Date: Sun, 14 May 2017 20:57:00 +0300
Subject: [PATCH] storage/page/bufpage.c: improve compactify_tuples

Items passed to compactify_tuples are almost sorted. So a call to the
general qsort adds unnecessary overhead from function calls (swapfunc,
med, itemoffcompare).

This patch adds one pass of prefix sort + insertion sort for large
arrays of items, and shell sort for smaller arrays. Insertion and shell
sort are implemented using macros.

This approach allows saving 3% of CPU in a degenerate case (highly
intensive HOT random updates on an unlogged table with
synchronous_commit=off).
---
 src/backend/storage/page/bufpage.c | 42 +-
 src/include/c.h                    | 59 ++
 2 files changed, 98 insertions(+), 3 deletions(-)

diff --git a/src/backend/storage/page/bufpage.c b/src/backend/storage/page/bufpage.c
index fdf045a..b7c6392 100644
--- a/src/backend/storage/page/bufpage.c
+++ b/src/backend/storage/page/bufpage.c
@@ -433,6 +433,36 @@ itemoffcompare(const void *itemidp1, const void *itemidp2)
 		((itemIdSort) itemidp1)->itemoff;
 }
 
+#define NSPLIT 256
+#define PREFDIV (BLCKSZ / NSPLIT)
+/* one pass of prefix sort for high 8 bits of itemoff.*/
+static void
+prefix_presort(itemIdSort itemidbase, int nitems)
+{
+	itemIdSortData copy[MaxIndexTuplesPerPage];
+	int		count[NSPLIT + 1] = { 0 };
+	int		i, pos, highbits;
+
+	Assert(nitems <= MaxIndexTuplesPerPage);
+	for (i = 0; i < nitems; i++)
+	{
+		highbits = itemidbase[i].itemoff / PREFDIV;
+		count[highbits]++;
+	}
+	/* sort in decreasing order */
+	for (i = NSPLIT - 1; i != 0; i--)
+		count[i - 1] += count[i];
+	Assert(count[0] == nitems);
+	for (i = 0; i < nitems; i++)
+	{
+		highbits = itemidbase[i].itemoff / PREFDIV;
+		pos = count[highbits + 1]++;
+		copy[pos] = itemidbase[i];
+	}
+	Assert(count[1] == nitems);
+	memcpy(itemidbase, copy, sizeof(itemIdSortData) * nitems);
+}
+
 /*
  * After removing or marking some line pointers unused, move the tuples to
  * remove the gaps caused by the removed items.
@@ -444,9 +474,15 @@ compactify_tuples(itemIdSort itemidbase, int nitems, Page page)
 	Offset		upper;
 	int			i;
 
-	/* sort itemIdSortData array into decreasing itemoff order */
-	qsort((char *) itemidbase, nitems, sizeof(itemIdSortData),
-		  itemoffcompare);
+	if (ni
Re: [HACKERS] Bug in ExecModifyTable function and trigger issues for foreign tables
On Sun, May 14, 2017 at 5:35 PM, Ildus Kurbangaliev wrote:
> TRAP: FailedAssertion("!(((const void*)(fdw_trigtuple) != ((void *)0))
> ^ ((bool) (((const void*)(tupleid) != ((void *)0)) &&
> ((tupleid)->ip_posid != 0", File: "trigger.c", Line: 2428)
>
> I'm not sure how it should be fixed, because as I see `oldtuple` will
> be set only for AFTER ROW triggers by `wholerow` junk attribute.

There is a comment on the ExecUpdate function:

 * When updating a foreign table,
 * tupleid is invalid; the FDW has to figure out which row to
 * update using data from the planSlot.  oldtuple is passed to
 * foreign table triggers; it is NULL when the foreign table has
 * no relevant triggers.

After your fix, tupleid is now invalid, which is expected, but it seems
we need to do something more. As per the comments, oldtuple is expected
to come from the planSlot. But I don't see any code handling that part.

-- 
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com
Re: [HACKERS] postgres 9.6.2 update breakage
Roel Janssen writes:
> So, it would be something like:
> postgres pg_upgrade \
> ...

It's great to have a recipe `that works', so thanks! However, whether or
not we automate this, I cannot help but wonder if we should support
downgrading -- at least to the previous version in this case? If I'm not
mistaken, everything else in GuixSD will run if I select a previous
system generation in Grub... except for this?

Is involving postgres developers an option? I'm sure at least one of the
postgresql hackers [cc] is already looking at Guix[SD].

Greetings,
janneke

-- 
Jan Nieuwenhuizen | GNU LilyPond http://lilypond.org
Freelance IT http://JoyofSource.com | Avatar® http://AvatarAcademy.nl
[HACKERS] PG10 pgindent run
I would like to run pgindent on the head source tree this Wednesday
afternoon, UTC.  Is that OK for everyone?

-- 
Bruce Momjian  http://momjian.us
EnterpriseDB   http://enterprisedb.com

+ As you are, so once was I.  As I am, so you will be. +
+                 Ancient Roman grave inscription      +
Re: [HACKERS] snapbuild woes
On 13/05/17 22:23, Andres Freund wrote:
> On 2017-05-12 10:57:55 +0200, Petr Jelinek wrote:
>> Hmm, well it helps but actually now that we don't track individual
>> running transactions anymore it got much less effective (my version of
>> 0005 does as well).
>>
>> The example workload I test with is:
>> session 1: open transaction, do a write, keep it open
>> session 2: pgbench -M simple -N -c 10 -P 1 -T 5
>> session 3: run CREATE_REPLICATION_SLOT LOGICAL in walsender
>> session 2: pgbench -M simple -N -c 10 -P 1 -T 20
>> session 1: commit
>>
>> And wait for session 3 to finish slot creation; it takes about 20 mins
>> on my laptop without patches, a minute and a half with your patches for
>> 0004 and 0005 (or with your 0004 and my 0005), and about 2s with my
>> original 0004 and 0005.
>
> Is that with assertions enabled or not? With assertions, all the time
> post patches seems to be spent in some Asserts in reorderbuffer.c;
> without, it takes less than a second for me here.

Right you are, I always forget to switch that off before benchmarks.

> I'm applying these now.

Cool. Just for posterity, this also fixes issue number 3, as the switch
to consistent state is done purely based on xl_running_xacts and not on
decoded commits/aborts. So all done here, thanks!

-- 
Petr Jelinek  http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
[HACKERS] Bug in ExecModifyTable function and trigger issues for foreign tables
Hello hackers,

I was experimenting with fdw tables recently, and discovered two bugs in
the postgres core code (tested on stable 9.6 and master).

Steps to reproduce:
1) create parent table
2) create child local table
3) create child foreign table
4) create a `before row update` trigger on the foreign table
5) run an update query on the parent table

I attached an sql file with these steps. At the end postgres will show an
error like:

ERROR: could not open file "base/12410/33037": No such file or directory

33037 is the relid of the foreign table. The bug is related to the fact
that postgres will try to use the latest scanned tupleid from the local
table to get an old tuple for the trigger of the foreign table. It should
be fixed with a patch like:

--- a/src/backend/executor/nodeModifyTable.c
+++ b/src/backend/executor/nodeModifyTable.c
@@ -1324,7 +1324,6 @@ ExecModifyTable(ModifyTableState *node)
 	JunkFilter *junkfilter;
 	TupleTableSlot *slot;
 	TupleTableSlot *planSlot;
-	ItemPointer tupleid = NULL;
 	ItemPointerData tuple_ctid;
 	HeapTupleData oldtupdata;
 	HeapTuple	oldtuple;
@@ -1381,6 +1380,8 @@ ExecModifyTable(ModifyTableState *node)
 	 */
 	for (;;)
 	{
+		ItemPointer tupleid = NULL;
+

After this patch the second bug will appear:

TRAP: FailedAssertion("!(((const void*)(fdw_trigtuple) != ((void *)0))
^ ((bool) (((const void*)(tupleid) != ((void *)0)) &&
((tupleid)->ip_posid != 0", File: "trigger.c", Line: 2428)

I'm not sure how it should be fixed, because as I see it, `oldtuple` will
be set only for AFTER ROW triggers by the `wholerow` junk attribute.

Regards,
Ildus Kurbangaliev

test.sql
Description: application/sql
Re: [HACKERS] [POC] hash partitioning
On Sat, May 13, 2017 at 12:11 PM, Dilip Kumar wrote:
> On Fri, May 12, 2017 at 6:08 PM, amul sul wrote:
>> Hi,
>>
>> Please find the following updated patches attached:
>
> I have done some testing with the new patch; most of the cases worked
> as per the expectation, except below.
>
> I expect the planner to select only "Seq Scan on t1" whereas it's
> scanning both the partitions?
>
> create table t (a int, b varchar) partition by hash(a);
> create table t1 partition of t for values with (modulus 8, remainder 0);
> create table t2 partition of t for values with (modulus 8, remainder 1);
>
> postgres=# explain select * from t where a=8;
>                          QUERY PLAN
> ----------------------------------------------------------
>  Append  (cost=0.00..51.75 rows=12 width=36)
>    ->  Seq Scan on t1  (cost=0.00..25.88 rows=6 width=36)
>          Filter: (a = 8)
>    ->  Seq Scan on t2  (cost=0.00..25.88 rows=6 width=36)
>          Filter: (a = 8)
> (5 rows)

You are correct. As of now, constraint exclusion doesn't work on a
partition constraint that involves a function call [1], and the hash
partition constraint does have a satisfies_hash_partition() function
call.

> Some cosmetic comments.
> ---
> + RangeVar *rv = makeRangeVarFromNameList(castNode(List, nameEl->arg));
> +
>
> Useless hunk.
>
> /*
> - * Build a CREATE SEQUENCE command to create the sequence object, and
> - * add it to the list of things to be done before this CREATE/ALTER
> - * TABLE.
> + * Build a CREATE SEQUENCE command to create the sequence object, and add
> + * it to the list of things to be done before this CREATE/ALTER TABLE.
> */
>
> Seems like, in src/backend/parser/parse_utilcmd.c, you have changed
> the existing code with pgindent. I think it's not a good idea to mix
> pgindent changes with your patch.

Oops, my silly mistake, sorry about that. Fixed in the attached version.
Regards,
Amul

[1] https://www.postgresql.org/message-id/CA%2BTgmoaE9NZ_RiqZQLp2aJXPO4E78QxkQYL-FR2zCDop96Ahdg%40mail.gmail.com

0001-Cleanup_v2.patch
Description: Binary data

0002-hash-partitioning_another_design-v4.patch
Description: Binary data
Re: [HACKERS] [POC] hash partitioning
On Fri, May 12, 2017 at 10:39 PM, Ashutosh Bapat wrote:
> On Fri, May 12, 2017 at 6:08 PM, amul sul wrote:
>> Hi,
>>
>> Please find the following updated patches attached:
>>
>> 0001-Cleanup.patch : Does some cleanup and code refactoring required
>> for hash partition patch. Otherwise, there will be unnecessary diff in
>> 0002 patch
>
> Thanks for splitting the patch.
>
> +    if (isnull[0])
> +        cur_index = partdesc->boundinfo->null_index;
> This code assumes that null_index will be set to -1 when has_null is
> false. Per RelationBuildPartitionDesc() this is true. But probably we
> should write this code as
>
> if (isnull[0])
> {
>     if (partdesc->boundinfo->has_null)
>         cur_index = partdesc->boundinfo->null_index;
> }
>
> That way we are certain that when has_null is false, cur_index = -1
> similar to the original code.

Okay, will add this. I still don't understand the point of having the
has_null variable: if no null-accepting partition exists then null_index
is always set to -1 in RelationBuildPartitionDesc. Anyway, let's not
change the original code.

> The additional argument to ComputePartitionAttrs() isn't used anywhere
> in this patch, so maybe this is better made part of 0002. If we do this
> the only change that will remain in the patch is the refactoring of
> RelationBuildPartitionDesc(), so we may consider merging it into 0002,
> unless we find that some more refactoring is needed. But for now, having
> it as a separate patch helps.

Okay.

> Here's some more comments on 0002
>
> + * In the case of hash partitioning, datums is a 2-D array, stores
> + * modulus and remainder values at datums[x][0] and datums[x][1]
> + * respectively for each partition in the ascending order.
>
> This comment about datums should appear in a paragraph of itself and may
> be rephrased as in the attached patch. Maybe we could also add a line
> about ndatums for hash partitioned tables as in the attached patch.

Thanks, looks good to me; will include this.

[...]
> +    if (key->strategy == PARTITION_STRATEGY_HASH)
> +    {
> +        ndatums = nparts;
> +        hbounds = (PartitionHashBound **) palloc(nparts *
> +                                sizeof(PartitionHashBound *));
> +        i = 0;
> +        foreach (cell, boundspecs)
> +        {
> +            PartitionBoundSpec *spec = lfirst(cell);
> +
> [ clipped ]
> +            hbounds[i]->index = i;
> +            i++;
> +        }
>
> For list and range partitioned tables we order the bounds so that two
> partitioned tables have them in the same order irrespective of the order
> in which they are specified by the user and hence stored in the
> catalogs. The partitions then get indexes according to the order in
> which their bounds appear in the ordered arrays of bounds. Thus any two
> partitioned tables with the same partition specification always have the
> same PartitionBoundInfoData. This helps partition-wise join to match
> partition bounds of two given tables. The above code assigns the indexes
> to the partitions as they appear in the catalogs. This means that two
> partitioned tables with the same partition specification but a different
> order of partition bound specifications will have different
> PartitionBoundInfoData representations.
>
> If we do that, probably partition_bounds_equal() would reduce to just
> matching the indexes and the last element of the datums array, i.e. the
> greatest modulus datum. If the ordered datums arrays of two partitioned
> tables do not match exactly, the mismatch can be because of missing
> datums or different datums. If it's a missing datum it will change the
> greatest modulus or have a corresponding entry in the indexes array as
> -1. If an entry differs it will cause mismatching indexes in the index
> arrays.

Makes sense, will fix this.

[...]
> +    if (offset < 0)
> +    {
> +        nmod = DatumGetInt32(datums[0][0]);
> +        valid_bound = (nmod % spec->modulus) == 0;
> +    }
> +    else
> +    {
> +        pmod = DatumGetInt32(datums[offset][0]);
> +        valid_bound = (spec->modulus % pmod) == 0;
> +
> +        if (valid_bound && (offset + 1) < ndatums)
> +        {
> +            nmod = DatumGetInt32(datums[offset + 1][0]);
> +            valid_bound = (nmod % spec->modulus) == 0;
> +        }
> +    }
>
> Maybe name the variables as prev_mod(ulus) and next_mod(ulus) for better
> readability.

Okay, will rename to prev_modulus and next_modulus resp.

> + * for p_p1: satisfies_hash_partition(2, 1, hash_fn(a), hash_fn(b))
> + * for p_p2: satisfies_hash_partition(4, 2, hash_fn(a), hash_fn(b))
> + * for p_p3: satisfies_hash_partition(8, 0, hash_fn(a), hash_fn(b))
> + * for p_p4: satisfies_hash_partition(8, 4, hash_fn(a), hash_fn(b))
>
> The description here may be read as if w