Re: [HACKERS] PATCH: pageinspect / add page_checksum and bt_page_items(bytea)
Hi, I had a look into this patch and would like to share some review comments that require the author's attention.

1) The comment for page_checksum() needs to be corrected. It seems to have been copied from page_header() and not edited further.

/*
 * page_header
 *
 * Allows inspection of page header fields of a raw page
 */
PG_FUNCTION_INFO_V1(page_checksum);

Datum
page_checksum(PG_FUNCTION_ARGS)

2) It seems like you have chosen the wrong datatype for page_checksum. I am getting a negative checksum value when running the query below. I think the return type for the SQL function page_checksum should be 'integer' instead of 'smallint'.

postgres=# SELECT page_checksum(get_raw_page('pg_class', 0), 100);
 page_checksum
---------------
        -19532
(1 row)

The same problem also exists in page_header:

postgres=# SELECT page_header(get_raw_page('pg_class', 0));
                 page_header
---------------------------------------------
 (0/2891538,-28949,1,220,7216,8192,8192,4,0)
(1 row)

3) Any reason for not marking bt_page_items or page_checksum as PARALLEL SAFE?

4) Error messages in the new bt_page_items are not consistent with the old bt_page_items. For example, if I pass a meta page block number as input, the error messages thrown by the new and old bt_page_items differ:

postgres=# SELECT * FROM bt_page_items('btree_index', 0) LIMIT 1;
ERROR:  block 0 is a meta page

postgres=# SELECT * FROM bt_page_items(get_raw_page('btree_index', 0)) LIMIT 1;
ERROR:  block is a meta page

postgres=# SELECT * FROM bt_page_items(get_raw_page('btree_index', 1024)) LIMIT 1;
ERROR:  block number 1024 is out of range for relation "btree_index"

postgres=# SELECT * FROM bt_page_items('btree_index', 1024) LIMIT 1;
ERROR:  block number out of range

5) Code duplication between bt_page_items() and bt_page_items_bytea() needs to be handled.
--
With Regards,
Ashutosh Sharma
EnterpriseDB: http://www.enterprisedb.com

On Tue, Mar 7, 2017 at 9:39 PM, Peter Eisentraut wrote:
> On 3/6/17 16:33, Tomas Vondra wrote:
>>> I think it would be better not to maintain so much duplicate code
>>> between bt_page_items(text, int) and bt_page_items(bytea). How about
>>> just redefining bt_page_items(text, int) as an SQL-language function
>>> calling bt_page_items(get_raw_page($1, $2))?
>>
>> Maybe. We can also probably share the code at the C level, so I'll look
>> into that.
>
> I think SQL would be easier, but either way some reduction in
> duplication would be good.
>
>>> For page_checksum(), the checksums are internally unsigned, so maybe
>>> return type int on the SQL level, so that there is no confusion about
>>> the sign?
>>
>> That ship already sailed, I'm afraid. We already have checksum in the
>> page_header() output, and it's defined as smallint there. So using int
>> here would be inconsistent.
>
> OK, no worries then.
>
> --
> Peter Eisentraut    http://www.2ndQuadrant.com/
> PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
>
> --
> Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
> To make changes to your subscription:
> http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Page Scan Mode in Hash Index
Hi,

> I've assigned to review this patch.
> At first, I'd like to notice that I like the idea and general design.
> Secondly, the patch set doesn't apply cleanly to master. Please, rebase it.

Thanks for showing your interest in this patch. I would like to inform you that this patch depends on the patches for 'Write Ahead Logging in hash index' [1] and 'Microvacuum support in hash index' [2]. Hence, until the above two patches become stable, I may have to keep rebasing this patch. However, I will try to share the updated patch with you asap.

> On Tue, Feb 14, 2017 at 8:27 AM, Ashutosh Sharma wrote:
>>
>> 1) 0001-Rewrite-hash-index-scans-to-work-a-page-at-a-time.patch: this
>> patch rewrites the hash index scan module to work in page-at-a-time
>> mode. It basically introduces two new functions -- _hash_readpage() and
>> _hash_saveitem(). The former is used to load all the qualifying tuples
>> from a target bucket or overflow page into an items array. The latter
>> one is used by _hash_readpage to save all the qualifying tuples found
>> in a page into an items array. Apart from that, this patch basically
>> cleans up _hash_first(), _hash_next() and hashgettuple().
>
> I see that the forward and backward scan cases of the function _hash_readpage
> contain a lot of code duplication.
> Could you please refactor this function to have less code duplication?

Sure, I will try to avoid the code duplication as much as possible.

> Also, I wonder if you have a special idea behind inserting data in test.sql
> by 1002 separate SQL statements.
> INSERT INTO con_hash_index_table (keycol) SELECT a FROM GENERATE_SERIES(1, 1000) a;
>
> You can achieve the same result by execution of a single SQL statement.
> INSERT INTO con_hash_index_table (keycol) SELECT (a - 1) % 1000 + 1 FROM GENERATE_SERIES(1, 1002000) a;
> Unless you have some special idea of doing this in 1002 separate transactions.

There is no reason for having so many INSERT statements in the test.sql file.
I think it would be better to replace it with a single SQL statement. Thanks.

[1] - https://www.postgresql.org/message-id/CAA4eK1KibVzgVETVay0%2BsiVEgzaXnP5R21BdWiK9kg9wx2E40Q%40mail.gmail.com
[2] - https://www.postgresql.org/message-id/CAE9k0PkRSyzx8dOnokEpUi2A-RFZK72WN0h9DEMv_ut9q6bPRw%40mail.gmail.com

--
With Regards,
Ashutosh Sharma
EnterpriseDB: http://www.enterprisedb.com
Re: [HACKERS] Bug in get_partition_for_tuple
Hi Jeevan,

On 2017/03/13 14:31, Jeevan Ladhe wrote:
> Hi Amit,
>
> I was able to reproduce the crash, and with the attached patch the crash
> goes away. Also, "make check-world" passes clean.
>
> Patch looks good to me.

Thanks for the review.

> However, in the following comment in your test:
>
> -- check routing error through a list partitioned table when they key is null
>
> I think you want to say:
>
> -- check routing error through a list partitioned table when the key is null

You're right, fixed that in the attached updated patch.

Thanks,
Amit

From dbdb0f8f5205a3abd4691c00febf196b853df43f Mon Sep 17 00:00:00 2001
From: amit
Date: Fri, 10 Mar 2017 11:53:41 +0900
Subject: [PATCH] Fix a bug in get_partition_for_tuple

Handle the NULL partition key more carefully in the case of list
partitioning.
---
 src/backend/catalog/partition.c      | 10 +++---
 src/test/regress/expected/insert.out |  7 +++
 src/test/regress/sql/insert.sql      |  6 ++
 3 files changed, 20 insertions(+), 3 deletions(-)

diff --git a/src/backend/catalog/partition.c b/src/backend/catalog/partition.c
index e01ef864f0..a5ba7448eb 100644
--- a/src/backend/catalog/partition.c
+++ b/src/backend/catalog/partition.c
@@ -1726,10 +1726,14 @@ get_partition_for_tuple(PartitionDispatch *pd,
 				 errmsg("range partition key of row contains null")));
 	}
 
-	if (partdesc->boundinfo->has_null && isnull[0])
-		/* Tuple maps to the null-accepting list partition */
+	/*
+	 * A null partition key is only acceptable if null-accepting list
+	 * partition exists.
+	 */
+	cur_index = -1;
+	if (isnull[0] && partdesc->boundinfo->has_null)
 		cur_index = partdesc->boundinfo->null_index;
-	else
+	else if (!isnull[0])
 	{
 		/* Else bsearch in partdesc->boundinfo */
 		bool		equal = false;
diff --git a/src/test/regress/expected/insert.out b/src/test/regress/expected/insert.out
index 116854e142..7fafa98212 100644
--- a/src/test/regress/expected/insert.out
+++ b/src/test/regress/expected/insert.out
@@ -365,6 +365,13 @@ DETAIL:  Failing row contains (1, 2).
 insert into mlparted1 (a, b) values (2, 3);
 ERROR:  new row for relation "mlparted11" violates partition constraint
 DETAIL:  Failing row contains (3, 2).
+-- check routing error through a list partitioned table when the key is null
+create table lparted_nonullpart (a int, b char) partition by list (b);
+create table lparted_nonullpart_a partition of lparted_nonullpart for values in ('a');
+insert into lparted_nonullpart values (1);
+ERROR:  no partition of relation "lparted_nonullpart" found for row
+DETAIL:  Partition key of the failing row contains (b) = (null).
+drop table lparted_nonullpart;
 -- check that RETURNING works correctly with tuple-routing
 alter table mlparted drop constraint check_b;
 create table mlparted12 partition of mlparted1 for values from (5) to (10);
diff --git a/src/test/regress/sql/insert.sql b/src/test/regress/sql/insert.sql
index c56c3c22f8..f9c00705a2 100644
--- a/src/test/regress/sql/insert.sql
+++ b/src/test/regress/sql/insert.sql
@@ -226,6 +226,12 @@ insert into mlparted values (1, 2); -- selected by tuple-routing
 insert into mlparted1 (a, b) values (2, 3);
 
+-- check routing error through a list partitioned table when the key is null
+create table lparted_nonullpart (a int, b char) partition by list (b);
+create table lparted_nonullpart_a partition of lparted_nonullpart for values in ('a');
+insert into lparted_nonullpart values (1);
+drop table lparted_nonullpart;
+
 -- check that RETURNING works correctly with tuple-routing
 alter table mlparted drop constraint check_b;
 create table mlparted12 partition of mlparted1 for values from (5) to (10);
-- 
2.11.0
Re: [HACKERS] Bug in get_partition_for_tuple
Hi Amit,

I was able to reproduce the crash, and with the attached patch the crash goes away. Also, "make check-world" passes clean.

Patch looks good to me.

However, in the following comment in your test:

-- check routing error through a list partitioned table when they key is null

I think you want to say:

-- check routing error through a list partitioned table when the key is null

Thanks,
Jeevan Ladhe

On Fri, Mar 10, 2017 at 8:26 AM, Amit Langote wrote:
> Just observed a crash due to a thinko in the logic that handles NULL
> partition key. Absence of a null-accepting partition in this case should
> have caused an error; instead the current code proceeds with the comparison,
> resulting in a crash.
>
> create table p (a int, b char) partition by list (b);
> create table p1 partition of p for values in ('a');
> insert into p values (1); -- crashes
>
> Attached patch fixes that and adds a test.
>
> Thanks,
> Amit
Re: [HACKERS] wait events for disk I/O
On Thu, Mar 9, 2017 at 10:54 AM, Rushabh Lathia wrote:
> Thanks Rajkumar for performing tests on this patch.
>
> Yes, I also noticed similar results in my testing. Additionally, sometimes I
> also noticed the ReadSLRUPage event on my system.
>
> I also ran the reindex database command and I noticed the below IO events:
>
> SyncImmedRelation,
> WriteDataBlock,
> WriteBuffile,
> WriteXLog,
> ReadDataBlock

I have tried to generate some more events by running pgbench on a master/slave configuration, and was able to see three more events:

WriteInitXLogFile
SyncInitXLogFile
ReadXLog
Re: [HACKERS] [NOVICE] Why is there a doubtful copyObject call in add_vars_to_targetlist
Sure, understood.

Regards,
Neha
Re: [HACKERS] [NOVICE] Why is there a doubtful copyObject call in add_vars_to_targetlist
Neha Khatri writes:
> Then, should it be alright to remove the doubt itself?

It's a perfectly legitimate comment describing a potential optimization. There are lots of other similar comments that might or might not ever get addressed.

			regards, tom lane
Re: [HACKERS] [NOVICE] Why is there a doubtful copyObject call in add_vars_to_targetlist
On Mon, Mar 13, 2017 at 3:52 PM, Tom Lane wrote:
> David Rowley writes:
>> On 13 March 2017 at 14:22, Neha Khatri wrote:
>>> This copyObject still exists in the current code. So I was wondering if the
>>> comment question still holds good and why the question is there in the first
>>> place. To make a new Var object, copyObject seems to be the right choice,
>>> then why the doubt?
>
>> The doubt is in the fact if copyObject() is required at all. The other
>> option being to simply reference the same object without having made a copy.
>
> Right. Note that the code that 5efe3121 replaced effectively made a new
> Var object using makeVar. The new code makes a new Var object using
> copyObject, so there's no actual behavioral change in that fragment, just
> fewer lines of code. But it's fair to wonder whether it wouldn't be safe
> just to link to the existing Var object. This is tied up in the planner's
> general willingness to scribble on its input data structures, so that
> linking to a pre-existing object is vulnerable to some unrelated bit of
> code deciding to scribble on that object. Ideally that wouldn't happen
> ... but cleaning it up looks like a mighty tedious bit of janitorial work,
> with uncertain payoff. So it hasn't happened in the last twenty years
> and I'm not prepared to bet that it ever will.

Then, should it be alright to remove the doubt itself?

Regards,
Neha
Re: [HACKERS] asynchronous execution
> I think it will, because Append itself has been made async-capable by one
> of the patches and UNION ALL uses Append. But as mentioned above, only
> the postgres_fdw foreign tables will be able to utilize this for now.

Ok, I'll re-run my test from a few weeks back and see if anything has changed.
Re: [HACKERS] [NOVICE] Why is there a doubtful copyObject call in add_vars_to_targetlist
David Rowley writes:
> On 13 March 2017 at 14:22, Neha Khatri wrote:
>> This copyObject still exists in the current code. So I was wondering if the
>> comment question still holds good and why the question is there in the first
>> place. To make a new Var object, copyObject seems to be the right choice,
>> then why the doubt?

> The doubt is in the fact if copyObject() is required at all. The other
> option being to simply reference the same object without having made a copy.

Right. Note that the code that 5efe3121 replaced effectively made a new Var object using makeVar. The new code makes a new Var object using copyObject, so there's no actual behavioral change in that fragment, just fewer lines of code. But it's fair to wonder whether it wouldn't be safe just to link to the existing Var object. This is tied up in the planner's general willingness to scribble on its input data structures, so that linking to a pre-existing object is vulnerable to some unrelated bit of code deciding to scribble on that object. Ideally that wouldn't happen ... but cleaning it up looks like a mighty tedious bit of janitorial work, with uncertain payoff. So it hasn't happened in the last twenty years and I'm not prepared to bet that it ever will.

			regards, tom lane
Re: [HACKERS] Need a builtin way to run all tests faster manner
Andres Freund writes:
> On 2017-03-11 22:14:07 -0500, Tom Lane wrote:
>> This looks generally sane to me, although I'm not very happy about folding
>> the "$(MKDIR_P) output_iso" call into pg_isolation_regress_check --- that
>> seems weird and unlike the way it's done for the regular regression test
>> case.

> Yea, not super happy about that either - alternatively we could fold it
> into pg_regress.

Yeah, teaching pg_regress to auto-create the --temp-instance directory seems perfectly sane from here.

			regards, tom lane
Re: [HACKERS] Allow interrupts on waiting standby
On Wed, Mar 8, 2017 at 5:45 AM, Peter Eisentraut wrote:
> On 1/30/17 20:34, Michael Paquier wrote:
>> Both things are fixed in the new version attached. I have added as
>> well this patch to the next commit fest:
>> https://commitfest.postgresql.org/13/977/
>
> I'm not clear now what this patch is intending to accomplish, seeing
> that the original issue has already been fixed.

This makes ResolveRecoveryConflictWithVirtualXIDs() more responsive when the process is signaled, as the wait in pg_usleep can go up to 1s if things remain stuck for a long time.
-- 
Michael
Re: [HACKERS] REINDEX CONCURRENTLY 2.0
On 03/13/2017 03:11 AM, Andreas Karlsson wrote:
> I also fixed the code to properly support triggers.

And by "support triggers" I actually meant fixing the support for moving the foreign keys to the new index.

Andreas
Re: [HACKERS] REINDEX CONCURRENTLY 2.0
On 03/02/2017 03:10 AM, Michael Paquier wrote:
> On Wed, Mar 1, 2017 at 2:21 AM, Andreas Karlsson wrote:
>> +/*
>> + * Copy contraint flags for old index. This is safe because the old index
>> + * guaranteed uniquness.
>> + */
>> +newIndexForm->indisprimary = oldIndexForm->indisprimary;
>> +oldIndexForm->indisprimary = false;
>> +newIndexForm->indisexclusion = oldIndexForm->indisexclusion;
>> +oldIndexForm->indisexclusion = false;
>> [...]
>> +deleteDependencyRecordsForClass(RelationRelationId, newIndexOid,
>> +                                RelationRelationId, DEPENDENCY_AUTO);
>> +deleteDependencyRecordsForClass(RelationRelationId, oldIndexOid,
>> +                                ConstraintRelationId, DEPENDENCY_INTERNAL);
>> +
>> +// TODO: pg_depend for old index?

Spotted one of my TODO comments there, so I have attached a patch where I have cleaned up that function. I also fixed the code to properly support triggers.

> There is a lot of mumbo-jumbo in the patch to create the exact same index
> definition as the original one being reindexed, and that's a huge
> maintenance burden for the future. You can blame me for that in the
> current patch. I am wondering if it would not just be better to generate
> a CREATE INDEX query string and then use the SPI to create the index, and
> also do the following extensions at SQL level:
> - Add a sort of WITH NO DATA clause where the index is created, so the
> index is created empty, and is marked invalid and not ready.
> - Extend pg_get_indexdef_string() with an optional parameter to enforce
> the index name to something else, most likely it should be extended with
> the WITH NO DATA/INVALID clause, which should just be a storage parameter
> by the way.
> By doing something like that what the REINDEX CONCURRENTLY code path
> should just be careful about is that it chooses an index name that avoids
> any conflicts.

Hm, I am not sure how much that would help, since a lot of the mumbo-jumbo is by necessity in the step where we move the constraints over from the old index to the new.
Andreas

diff --git a/doc/src/sgml/mvcc.sgml b/doc/src/sgml/mvcc.sgml
index 306def4a15..ca1aeca65f 100644
--- a/doc/src/sgml/mvcc.sgml
+++ b/doc/src/sgml/mvcc.sgml
@@ -923,7 +923,8 @@ ERROR:  could not serialize access due to read/write dependencies among transact
         Acquired by VACUUM (without FULL),
-        ANALYZE, CREATE INDEX CONCURRENTLY, and
+        ANALYZE, CREATE INDEX CONCURRENTLY,
+        REINDEX CONCURRENTLY,
         ALTER TABLE VALIDATE and other
         ALTER TABLE variants (for full details see ).
diff --git a/doc/src/sgml/ref/reindex.sgml b/doc/src/sgml/ref/reindex.sgml
index 3908ade37b..3449c0af73 100644
--- a/doc/src/sgml/ref/reindex.sgml
+++ b/doc/src/sgml/ref/reindex.sgml
@@ -21,7 +21,7 @@ PostgreSQL documentation
-REINDEX [ ( VERBOSE ) ] { INDEX | TABLE | SCHEMA | DATABASE | SYSTEM } name
+REINDEX [ ( VERBOSE ) ] { INDEX | TABLE | SCHEMA | DATABASE | SYSTEM } [ CONCURRENTLY ] name
@@ -68,9 +68,12 @@ REINDEX [ ( VERBOSE ) ] { INDEX | TABLE | SCHEMA | DATABASE | SYSTEM }
    CONCURRENTLY option failed, leaving
    an invalid index. Such indexes are useless but it can be
    convenient to use REINDEX to rebuild them. Note that
-   REINDEX will not perform a concurrent build. To build the
-   index without interfering with production you should drop the index and
-   reissue the CREATE INDEX CONCURRENTLY command.
+   REINDEX will perform a concurrent build if
+   CONCURRENTLY is specified. To build the index without interfering
+   with production you should drop the index and reissue either the
+   CREATE INDEX CONCURRENTLY or REINDEX CONCURRENTLY
+   command. Indexes of toast relations can be rebuilt with REINDEX
+   CONCURRENTLY.
@@ -152,6 +155,21 @@ REINDEX [ ( VERBOSE ) ] { INDEX | TABLE | SCHEMA | DATABASE | SYSTEM }
+    CONCURRENTLY
+
+      When this option is used, PostgreSQL will rebuild the
+      index without taking any locks that prevent concurrent inserts,
+      updates, or deletes on the table; whereas a standard reindex build
+      locks out writes (but not reads) on the table until it's done.
+      There are several caveats to be aware of when using this option;
+      see .
+
+    VERBOSE
@@ -231,6 +249,172 @@ REINDEX [ ( VERBOSE ) ] { INDEX | TABLE | SCHEMA | DATABASE | SYSTEM }
+  Rebuilding Indexes Concurrently
+
+   index
+   rebuilding concurrently
+
+   Rebuilding an index can interfere with regular operation of a database.
+   Normally PostgreSQL locks the table whose index is rebuilt
+   against writes and performs the entire index build with a single scan of the
+   table. Other transactions can still read the table, but if they try to
+   insert, update, or delete rows in the table they will block until the
+   index rebuild is
Re: [HACKERS] make async slave to wait for lsn to be replayed
On Fri, Mar 10, 2017 at 1:49 AM, Ivan Kartyshov wrote:
> Here I attached rebased patch waitlsn_10dev_v3 (core feature)
> I will leave the choice of implementation (core/contrib) to the discretion
> of the community.
>
> Will be glad to hear your suggestion about syntax, code and patch.

Hi Ivan,

Here is some feedback based on a first read-through of the v4 patch. I haven't tested it yet.

First, I'll restate my view of the syntax-vs-function question: I think an fmgr function is the wrong place to do this, because it doesn't work for our 2 higher isolation levels, as mentioned. Most people probably use READ COMMITTED most of the time, so the extension version you've proposed is certainly useful for many people and I like it, but I will vote against inclusion in core of any feature that doesn't consider all three of our isolation levels, especially if there is no way to extend support to other levels later. I don't want PostgreSQL to keep adding features that eventually force everyone to use READ COMMITTED because they want to use feature X, Y or Z.

Maybe someone can think of a clever way for an extension to insert a wait for a user-supplied LSN *before* acquiring a snapshot so it can work for the higher levels, or maybe the hooks should go into core PostgreSQL so that the extension can exist as an external project not requiring a patched PostgreSQL installation, or maybe this should be done with new core syntax that extends transaction commands. Do other people have views on this?

+ * Portions Copyright (c) 2012-2017, PostgresPro Global Development Group

This should say PostgreSQL.
+wl_lsn_updated_hook(void)
+{
+	uint		i;
+	/*
+	 * After update lastReplayedEndRecPtr set Latches in SHMEM array
+	 */
+	if (counter_waitlsn % count_waitlsn == 0
+		|| TimestampDifferenceExceeds(time_waitlsn, GetCurrentTimestamp(), interval_waitlsn))
+	{

Doesn't this mean that if you are waiting for LSN 1234, and the primary sends that LSN and then doesn't send anything for another hour, a standby waiting in pg_waitlsn is quite likely to skip that update (either because of count_waitlsn or interval_waitlsn), and then finish up waiting for a very long time?

I'm not sure if this is a good idea, but it's an idea: you could keep your update-skipping logic, but make sure you always catch the important case where recovery hits the end of WAL, by invoking your callback from WaitForWALToBecomeAvailable. It could have a boolean parameter that means 'don't skip this one!'. In other words, it's OK to skip updates, but not if you know there is no more WAL available to apply (since you have no idea how long it will be for more to arrive).

Calling GetCurrentTimestamp() at high frequency (after every WAL record is replayed) may be a bad idea. It's a slow system call on some operating systems. Can we use an existing timestamp for free, like recoveryLastXTime? Remembering that the motivation for using this feature is to wait for *whole transactions* to be replayed and become visible, there is no sensible reason to wait for random WAL positions inside a transaction, so if you used that then you would effectively skip all non-COMMIT records and also skip some COMMIT records that are coming down the pipe too fast.

+static void
+wl_own_latch(void)
+{
+	SpinLockAcquire(&state->l_arr[MyBackendId].slock);
+	OwnLatch(&state->l_arr[MyBackendId].latch);
+	is_latch_owned = true;
+
+	if (state->backend_maxid < MyBackendId)
+		state->backend_maxid = MyBackendId;
+
+	state->l_arr[MyBackendId].pid = MyProcPid;
+	SpinLockRelease(&state->l_arr[MyBackendId].slock);
+}

What is the point of using extra latches for this?
Why not just use the standard latch? Syncrep and many other things do that. I'm not actually sure if there is ever a reason to create more latches in regular backends. SIGUSR1 will be delivered and set the main latch anyway.

There are cases of specialised latches in the system, like the wal receiver latch, and I'm reliably informed that that solves problems like delivering a wakeup message without having to know which backend is currently the wal receiver (a problem we might consider solving today with a condition variable?). I don't think anything like that applies here.

+	for (i = 0; i <= state->backend_maxid; i++)
+	{
+		SpinLockAcquire(&state->l_arr[i].slock);
+		if (state->l_arr[i].pid != 0)
+			SetLatch(&state->l_arr[i].latch);
+		SpinLockRelease(&state->l_arr[i].slock);
+	}

Once we get through the update-skipping logic above, we hit this loop, which acquires spinlocks for every backend one after another and sets the latches of every backend, no matter whether they are waiting for the LSN that has been applied. Assuming we go with this scan-the-whole-array approach, why not include the LSN waited for in the array slots, so that we can avoid disturbing processes that are waiting for a later LSN? Could you talk a bit about
Re: [HACKERS] PassDownLimitBound for ForeignScan/CustomScan [take-2]
> Hello,
>
> On Wed, March 1, 2017 7:21 pm, Kouhei Kaigai wrote:
>>> I've looked at the patch, and as I'm not that familiar with the
>>> pg-sourcecode, customs and so on, this isn't a review, more like food
>>> for thought and all should be taken with a grain of salt. :)
>>>
>>> So here are a few questions and remarks:
>>>
>>>> + double limit_tuples = -1.0;
>>>
>>> Surely the limit cannot be fractional, and must be an integer. So
>>> wouldn't it be better the same type as say:
>>>
>>>> + if (root->limit_tuples >= 0.0 &&
>>>
>>> Then you could also compare with ">= 0", not ">= 0.0".
>>
>> The above variable represents the "estimated" number of rows at the
>> planning stage, not execution time.
>> You may be able to see that the Path structure has a "rows" field declared
>> as double type. It makes sense to consider stupid paths during planning,
>> even if they are eventually rejected. For example, if a cross join of
>> two large tables appears during planning, a 64-bit integer will
>> overflow easily.
>
> Hm, ok. Not related to your patch, just curious: is there a mechanism in
> place that automatically rejects plans where the limit would overflow in the
> double to uint64 conversion? Or is this more of a "there would hopefully be
> a plan with a better limit so we do not use the bad one"?
>
> Would it be possible to force a plan where such overflow would occur?

We have no such mechanism, and little necessity. The estimated number of rows at plan time is stored in the plan_rows field of the Plan structure, as FP64. Once the plan tree gets constructed, the estimated number of rows does not affect the execution. (Some plans might use it for estimation of resource consumption at execution time.)

On the other hand, the actual number of rows that was processed is saved in the instrument field of the PlanState structure. It is counted up from zero by one. So, people will want to replace the hardware prior to uint64 overflow.
If 1B rows are processed per second, uint64 overflow happens in the 26th century(!).

>>> And this comment might be better "were we already called?"
>>>
>>>> + bool	rs_started;	/* are we already called? */
>>
>> Other variables in ResultState use the present form, like:
>>
>> +	bool	rs_started;		/* are we already called? */
>> 	bool	rs_done;		/* are we done? */
>> 	bool	rs_checkqual;	/* do we need to check the qual? */
>> } ResultState;
>
> Yes, I noted that, but still "are" and "called" and "already" don't read
> well together for me:
>
> are - present form
> called - past form like "were we called?", or "are we called bob?" an
> ongoing process
> already - it has started
>
> So "are we already called" reads like someone is waiting for being called.
>
> Maybe to mirror the comment on "rs_done":
>
> /* have we started yet? */
>
> Also, maybe it's easier for the comment to describe what is happening in
> the code because of the flag, not just to the flag itself:
>
> /* To do things once when we are called */
>
> Anyway, it is a minor point and don't let me distract you from your work,
> I do like the feature and the patch :)

Fixed to "have we started yet?"

--
PG-Strom Project / NEC OSS Promotion Center
KaiGai Kohei

passdown-limit-fdw.v6.patch
Description: passdown-limit-fdw.v6.patch
Re: [HACKERS] asynchronous execution
On 2017/03/11 8:19, Corey Huinker wrote:
> On Thu, Feb 23, 2017 at 6:59 AM, Kyotaro HORIGUCHI wrote:
>> 9e43e87
>
> Patch fails on current master, but correctly applies to 9e43e87. Thanks
> for including the commit id.
>
> Regression tests pass.
>
> As with my last attempt at reviewing this patch, I'm confused about what
> kind of queries can take advantage of this patch. Is it only cases where a
> local table has multiple inherited foreign table children?

IIUC, Horiguchi-san's patch adds asynchronous capability for ForeignScans driven by postgres_fdw (after building some relevant infrastructure first). The same might be added to other Scan nodes (and probably other nodes as well) eventually, so that more queries will benefit from asynchronous execution. It may just be that ForeignScans benefit more from asynchronous execution than other Scan types.

> Will it work
> with queries where two foreign tables are referenced and combined with a
> UNION ALL?

I think it will, because Append itself has been made async-capable by one of the patches and UNION ALL uses Append. But as mentioned above, only the postgres_fdw foreign tables will be able to utilize this for now.

Thanks,
Amit
Re: [HACKERS] scram and \password
On Mon, Mar 13, 2017 at 12:48 AM, Joe Conway wrote:
> On 03/11/2017 11:07 PM, Michael Paquier wrote:
>> Having a RLS on pg_authid to allow a user to look at its own password
>> type is an idea.
>
> Given that that is not likely at this stage of the dev cycle, what about
> a special purpose SQL function that returns the password type for the
> current user?

OK, so what about doing the following then:
1) Create pg_role_password_type(oid), taking a role OID as input and returning the type.
2) Extend PQencryptPassword with a method argument, and document it. Plaintext is forbidden.
3) Extend \password in psql with a similar -method=scram|md5 argument, and forbid the use of the "plain" format.

After thinking about it, not extending PQencryptPassword() would lead to future breakage; we should not fall into the trap of having password_encryption set to scram on the server and in pg_hba.conf while PQencryptPassword() keeps enforcing md5.

> Or is it a security issue of some sort to allow a user to
> know their own password type?

Don't think so. Any user has access to the value of password_encryption, so one can guess what the format of a hashed password will be.
-- 
Michael
Re: [HACKERS] Logical replication existing data copy
On 11 March 2017 at 00:33, Petr Jelinek wrote:
> On 09/03/17 18:48, Peter Eisentraut wrote:
>> On 3/6/17 05:27, Petr Jelinek wrote:
>>> And lastly I changed the automagic around exporting, not exporting and
>>> using the snapshot produced by CREATE_REPLICATION_SLOT into an explicit
>>> parameter for the CREATE_REPLICATION_SLOT command (and added a simple
>>> framework for adding more of those if needed in the future).
>>
>> It might be useful to make this into a separate patch, for clarity.
>
> Okay here it is with this part split out. The first patch also helps with
> Craig's logical decoding on standby so it definitely makes sense to be
> split.

Greatly appreciated. Committing this in chunks makes sense anyway.

I've attached a slightly updated version that makes pg_recvlogical skip slot export. The second patch is unchanged, use the copy from the immediately prior mail.

-- Craig Ringer http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services

From 02af255a84736ac2783705055d6e998e476359af Mon Sep 17 00:00:00 2001
From: Petr Jelinek
Date: Thu, 9 Mar 2017 14:20:28 +0100
Subject: [PATCH 1/4] Add option to control snapshot export to CREATE_REPLICATION_SLOT

We used to export snapshots unconditionally in CREATE_REPLICATION_SLOT in the replication protocol, but several upcoming patches want more control over what happens with the slot. This means that when we allow creation of replication slots on standbys, which cannot export snapshots because they cannot allocate new XIDs, we don't have to silently omit the snapshot creation. It also allows clients like pg_recvlogical, which neither need nor can use the exported snapshot, to suppress its creation. Since snapshot exporting can fail, this improves reliability.
--- doc/src/sgml/logicaldecoding.sgml | 13 +++-- doc/src/sgml/protocol.sgml | 16 +- src/backend/commands/subscriptioncmds.c| 6 ++- .../libpqwalreceiver/libpqwalreceiver.c| 15 -- src/backend/replication/repl_gram.y| 43 src/backend/replication/repl_scanner.l | 2 + src/backend/replication/walsender.c| 58 -- src/bin/pg_basebackup/streamutil.c | 5 ++ src/include/nodes/replnodes.h | 2 +- src/include/replication/walreceiver.h | 6 +-- 10 files changed, 140 insertions(+), 26 deletions(-) diff --git a/doc/src/sgml/logicaldecoding.sgml b/doc/src/sgml/logicaldecoding.sgml index 03c2c69..2b7d6e9 100644 --- a/doc/src/sgml/logicaldecoding.sgml +++ b/doc/src/sgml/logicaldecoding.sgml @@ -268,11 +268,12 @@ $ pg_recvlogical -d postgres --slot test --drop-slot - + Exported Snapshots - When a new replication slot is created using the streaming replication interface, - a snapshot is exported + When a new replication + slot is created using the streaming replication interface, a snapshot + is exported (see ), which will show exactly the state of the database after which all changes will be included in the change stream. This can be used to create a new replica by @@ -282,6 +283,12 @@ $ pg_recvlogical -d postgres --slot test --drop-slot database's state at that point in time, which afterwards can be updated using the slot's contents without losing any changes. + + Creation of a snapshot is not always possible - in particular, it will + fail when connected to a hot standby. Applications that do not require + snapshot export may suppress it with the NOEXPORT_SNAPSHOT + option. 
+ diff --git a/doc/src/sgml/protocol.sgml b/doc/src/sgml/protocol.sgml index 3d6e8ee..95603d3 100644 --- a/doc/src/sgml/protocol.sgml +++ b/doc/src/sgml/protocol.sgml @@ -1487,7 +1487,7 @@ The commands accepted in walsender mode are: - CREATE_REPLICATION_SLOT slot_name [ TEMPORARY ] { PHYSICAL [ RESERVE_WAL ] | LOGICAL output_plugin } + CREATE_REPLICATION_SLOT slot_name [ TEMPORARY ] { PHYSICAL [ RESERVE_WAL ] | LOGICAL output_plugin [ EXPORT_SNAPSHOT | NOEXPORT_SNAPSHOT ] } CREATE_REPLICATION_SLOT @@ -1538,6 +1538,20 @@ The commands accepted in walsender mode are: + + EXPORT_SNAPSHOT + NOEXPORT_SNAPSHOT + + + Decides what to do with the snapshot created during logical slot + initialization. EXPORT_SNAPSHOT, which is the + default, will export the snapshot for use in other sessions. This + option can't be used inside a transaction. The + NOEXPORT_SNAPSHOT will just use the snapshot for logical + decoding as normal but won't do anything else with it. + + + diff --git a/src/backend/commands/subscriptioncmds.c b/src/backend/commands/subscriptioncmds.c
Re: [HACKERS] Need a builtin way to run all tests faster manner
On 2017-03-11 22:14:07 -0500, Tom Lane wrote:
> Andres Freund writes:
>> On 2017-03-11 11:48:31 -0800, Andres Freund wrote:
>>> I think that'd be a good plan. We probably should also keep --outputdir
>>> separate (which test_decoding/Makefile does, but
>>> pg_isolation_regress_check doesn't)?
>
>> Here's a patch doing that (based on yours). I'd be kind of inclined to
>> set --outputdir for !isolation tests too; possibly even move tmp_check
>> below output_iso/ output_regress/ or such - but that seems like it
>> potentially could cause some disagreement...
>
> This looks generally sane to me, although I'm not very happy about folding
> the "$(MKDIR_P) output_iso" call into pg_isolation_regress_check --- that
> seems weird and unlike the way it's done for the regular regression test
> case.

Yea, not super happy about that either - alternatively we could fold it into pg_regress. But given the way that prove_check works, I thought it'd not be too ugly comparatively.

- Andres
Re: [HACKERS] scram and \password
On Mon, Mar 13, 2017 at 9:15 AM, Robert Haas wrote:
> On Fri, Mar 10, 2017 at 5:43 PM, Michael Paquier wrote:
>> On Sat, Mar 11, 2017 at 2:53 AM, Jeff Janes wrote:
>>> Should the \password tool in psql inspect password_encryption and act on it
>>> being 'scram'?
>>
>> Not sure if it is wise to change the default for this release.
>
> Seems like an odd way to phrase it. Aren't we talking about making a
> feature that worked in previous releases continue to work?

Considering how fresh scram is, it seems clear to me that we do not want to just switch the default value of password_encryption and the default behaviors of PQencryptPassword() and \password over to scram; we need something else. Actually, if we change nothing for default deployments of Postgres using md5, PQencryptPassword() and \password would still work properly.
--
Michael
Re: [HACKERS] [PATCH] Transaction traceability - txid_status(bigint)
On 11 March 2017 at 14:32, Craig Ringer wrote:
> I'll extract this part of the patch so it can be looked at separately,
> it'll be clearer that way.

Apparently I thought that last time too, since I already posted it split up. Ahem. Working on too many different things at once.

Last-posted patches apply fine, no need for an update.

-- Craig Ringer http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services
Re: [HACKERS] Logical decoding on standby
On 13 March 2017 at 10:56, Craig Ringer wrote:
> On 7 March 2017 at 21:08, Simon Riggs wrote:
>
>> Patch 4 committed. Few others need rebase.
>
> Since this patch series and initial data copy for logical replication
> both add a facility for suppressing initial snapshot export on a
> logical slot, I've dropped patch 0003 (make snapshot export optional on
> logical slot creation) in favour of Petr's similar patch.
>
> I will duplicate it in this patch series for ease of application. (The
> version here is slightly extended over Petr's so I'll re-post the
> modified version on the logical replication initial data copy thread
> too).
>
> The main thing I want to direct attention to for Simon, as committer,
> is the xlog'ing of VACUUM's xid threshold before we advance it and
> start removing tuples. This is necessary for the standby to know
> whether a given replication slot is safe to use, and to fail with
> conflict with recovery if it is not, or if it becomes unsafe due to
> master vacuum activity. Note that we can _not_ use the various vacuum
> records for this because we don't know which are catalogs and which
> aren't; we'd have to add a separate "is catalog" field to each vacuum
> xlog record, which is undesirable. The downstream can't look up whether
> it's a catalog or not because it doesn't have relcache/syscache access
> during decoding.
>
> This change might look a bit similar to the vac_truncate_clog change
> in the txid_status patch, but it isn't really related. The txid_status
> change is about knowing when we can safely look up xids in clog and
> preventing a race with clog truncation. This change is about knowing
> when we can know all catalog tuples for a given xid will still be in
> the heap, not vacuumed away. Both are about making sure standbys know
> more about the state of the system in a low-cost way, though.
>
> WaitForMasterCatalogXminReservation(...) in logical.c is also worth
> looking more closely at.
I should also note that because the TAP tests currently take a long time, I recommend skipping the tests for this patch by default and running them only when actually touching logical decoding. I'm looking at ways to make them faster, but they're inevitably going to take a while until we can get hot standby feedback replies in place, including cascading support. Which I have as WIP, but won't make this release. Changing the test import to use Test::More skip_all => "disabled by default, too slow"; will be sufficient. -- Craig Ringer http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services
Re: [HACKERS] Parallel tuplesort (for parallel B-Tree index creation)
On Sun, Mar 12, 2017 at 3:05 PM, Peter Geoghegan wrote:
> There is still an open item here, though: The leader-as-worker
> Tuplesortstate, a special case, can still leak files.

I phrased this badly. What I mean is that there can be instances where temp files are left on disk following a failure such as palloc() OOM; no backend ends up doing an unlink() iff a leader-as-worker Tuplesortstate was used and we get unlucky. I did not mean a leak of virtual or real file descriptors, which would see Postgres print a refcount leak warning from resowner.c. Naturally, these "leaked" files will eventually be deleted by the next restart of the server at the latest, within RemovePgTempFiles(). Note also that a duplicate unlink() (with annoying LOG message) is impossible under any circumstances with V9, regardless of whether or not a leader-as-worker Tuplesortstate is involved.

Anyway, I was sure that I needed to completely nail this down in order to be consistent with existing guarantees, but another look at OpenTemporaryFile() makes me doubt that. ResourceOwnerEnlargeFiles() is called, which itself uses palloc(), which can of course fail. There are remarks over that function within resowner.c about OOM:

/*
 * Make sure there is room for at least one more entry in a ResourceOwner's
 * files reference array.
 *
 * This is separate from actually inserting an entry because if we run out
 * of memory, it's critical to do so *before* acquiring the resource.
 */
void
ResourceOwnerEnlargeFiles(ResourceOwner owner)
{
...
}

But this happens after OpenTemporaryFileInTablespace() has already returned. Taking care to allocate memory up-front here is motivated by keeping the vFD cache entry and current resource owner in perfect agreement about the FD_XACT_TEMPORARY-ness of a file, and that's it. It's *not* true that there is a broader sense in which OpenTemporaryFile() is atomic, which for some reason I previously believed to be the case.
So, I haven't failed to prevent an outcome that wasn't already possible. It doesn't seem like it would be that hard to fix this, and then have the parallel tuplesort patch live up to that new higher standard. But it's possible that Tom or maybe someone else would consider that a bad idea, for roughly the same reason that we don't call RemovePgTempFiles() for *crash* induced restarts, as mentioned by Thomas upthread:

 * NOTE: we could, but don't, call this during a post-backend-crash restart
 * cycle. The argument for not doing it is that someone might want to examine
 * the temp files for debugging purposes. This does however mean that
 * OpenTemporaryFile had better allow for collision with an existing temp
 * file name.
 */
void
RemovePgTempFiles(void)
{
...
}

Note that I did put some thought into making sure OpenTemporaryFile() does the right thing with collisions with existing temp files. So, maybe the right thing is to do nothing at all. I don't have strong feelings either way on this question.

-- Peter Geoghegan
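The collision tolerance that the RemovePgTempFiles() comment demands is easy to illustrate outside PostgreSQL. A minimal sketch in plain POSIX C (invented names; this is only the spirit of the comment, not fd.c's actual implementation):

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <unistd.h>

/* Create a "temp" file under a fixed name, tolerating a leftover from a
 * previous crash: if the name already exists (EEXIST under O_EXCL),
 * remove the stale file and retry once. */
static int
open_temp_tolerating_collision(const char *path)
{
    int fd = open(path, O_RDWR | O_CREAT | O_EXCL, 0600);

    if (fd < 0 && errno == EEXIST)
    {
        unlink(path);
        fd = open(path, O_RDWR | O_CREAT | O_EXCL, 0600);
    }
    return fd;
}

/* Demo: plant a colliding file first, then verify the open still
 * succeeds; cleans up after itself and returns 0 on success. */
static int
demo_collision(const char *path)
{
    int fd;
    int stale = open(path, O_RDWR | O_CREAT, 0600);  /* simulate leftover */

    if (stale < 0)
        return -1;
    close(stale);

    fd = open_temp_tolerating_collision(path);
    if (fd < 0)
        return -2;
    close(fd);
    unlink(path);
    return 0;
}
```

Without the EEXIST retry, the second open would fail outright whenever a crash left an identically named file behind.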
Re: [HACKERS] [NOVICE] Why is there a doubtful copyObject call in add_vars_to_targetlist
(Redirecting to Hackers, since Novice is not the correct place for this question)

On 13 March 2017 at 14:22, Neha Khatri wrote:
> Hi,
>
> I was debugging to see when the function _copyVar gets invoked, and the
> first hit for that was in add_vars_to_targetlist. There I happened to
> see the following comment:
>
> /* XXX is copyObject necessary here? */
>
> Further digging showed that this copyObject got added in commit
> 5efe3121:
>
> + /* XXX is copyObject necessary here? */
> + rel->targetlist = lappend(rel->targetlist,
> + create_tl_element((Var *) copyObject(var),
> + length(rel->targetlist) + 1));
>
> This copyObject still exists in the current code. So I was wondering if the
> comment's question still holds and why the question was there in the first
> place. To make a new Var object, copyObject seems to be the right choice,
> so why the doubt?

The doubt is whether copyObject() is required at all. The other option would be to simply reference the same object without having made a copy. The danger of not making a copy is that any changes made would naturally affect all things which reference the object.

It would seem the comment and the copyObject() are still there because nobody is satisfied that it's unnecessary enough to go and remove it; that, weighed against the fact that removing it likely wouldn't buy much performance-wise, is likely the reason it's still there. Probably if someone came up with a realistic enough case to prove that it was worth removing, then someone might take some time to go and check if it was safe to remove. There's a good chance that it'll not happen until then, given that nobody's bothered in almost 18 years.

-- David Rowley http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services
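The trade-off described above — sharing a node versus copying it — is ordinary pointer aliasing. A minimal C sketch of the hazard, using an invented DemoVar type rather than PostgreSQL's real Var node or copyObject():

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Stand-in for a planner Var node; NOT PostgreSQL's real struct. */
typedef struct DemoVar
{
    int varno;
    int varattno;
} DemoVar;

/* Share: both "targetlists" end up referencing the same node. */
static DemoVar *
share_var(DemoVar *v)
{
    return v;
}

/* Deep copy, analogous in spirit to copyObject(): later mutation of one
 * copy cannot affect the other. */
static DemoVar *
copy_var(const DemoVar *v)
{
    DemoVar *c = malloc(sizeof(DemoVar));

    memcpy(c, v, sizeof(DemoVar));
    return c;
}

/* Returns 1 if mutating 'b' is visible through 'a', i.e. they alias. */
static int
mutation_leaks(DemoVar *a, DemoVar *b)
{
    b->varattno = 99;
    return a->varattno == 99;
}

/* With sharing, the mutation leaks through. */
static int
shared_reference_leaks(void)
{
    DemoVar v = {1, 2};

    return mutation_leaks(&v, share_var(&v));
}

/* With a deep copy, the original is untouched. */
static int
copied_reference_leaks(void)
{
    DemoVar v = {1, 2};
    DemoVar *c = copy_var(&v);
    int leaks = mutation_leaks(&v, c);

    free(c);
    return leaks;
}
```

This is exactly the "any changes made would naturally affect all things which reference the object" danger: dropping the copy is only safe if nothing ever mutates the shared node afterwards.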
Re: [HACKERS] PATCH: Batch/pipelining support for libpq
On 13 March 2017 at 08:54, Vaishnavi Prabakaran wrote:
> Before going with this fix, I would like you to consider the option of
> asking batch processing users (via documentation) to set single-row mode
> before calling PQgetResult().
> Either way we need to fix the documentation part, letting users know how
> they can activate single-row mode while using batch processing.
> Let me know your thoughts.

Thanks for looking into this, it's clear that I didn't cover the combo of single-row mode and batch mode adequately.

-- Craig Ringer http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services
Re: [HACKERS] BRIN cost estimate
On 2 March 2017 at 04:40, Alvaro Herrera wrote:
> Alvaro Herrera wrote:
>
>> I have added this patch to the commitfest, which I've been intending to
>> get in for a long time. I'll be submitting an updated patch, if needed.
>
> Here is Emre's patch rebased to current sources.

Looking over this now, and I have noticed a couple of things:

1.

+ Assert(nnumbers == 1);

I think it's a bad idea to Assert() this. The stat tuple can come from a plugin which could do anything. Seems like if we need to be certain of that then it should be an elog(ERROR), maybe mentioning that we expected a 1 element array, but got elements.

2.

+ Assert(varCorrelation >= 0 && varCorrelation <= 1);

The same goes for that. I don't think we want to Assert() that either. elog(ERROR) seems better, or perhaps we should fall back on the old estimation method in this case? The Asserted range also seems to contradict the range mentioned in pg_statistic.h:

/*
 * A "correlation" slot describes the correlation between the physical order
 * of table tuples and the ordering of data values of this column, as seen
 * by the "<" operator identified by staop. (As with the histogram, more
 * than one entry could theoretically appear.) stavalues is not used and
 * should be NULL. stanumbers contains a single entry, the correlation
 * coefficient between the sequence of data values and the sequence of
 * their actual tuple positions. The coefficient ranges from +1 to -1.
 */
#define STATISTIC_KIND_CORRELATION 3

3.

+ numBlocks = 1 + baserel->pages / BrinGetPagesPerRange(indexRel);

Should numBlocks be named numRanges? After all, we call the option "pages_per_range".

4.

+ * correlated table is copied 4 times, the correlation would be 0.25,
+ * although the index would be almost as good as the version on the

What's meant by "table is copied 4 times"? As best as I can work out, it means:

INSERT INTO t SELECT * FROM t;
INSERT INTO t SELECT * FROM t;

but I'm uncertain, and it's not very clear to me what it means.

5.

+ blockSelectivity = (blockProportion + 1 - *indexCorrelation) / 2;

I missed the comment that explains why we divide by two here.

6. Might want to consider not applying this estimate when the statistics don't contain any STATISTIC_KIND_CORRELATION array. In this case the estimation is still applied the same way, only the *indexCorrelation is 0. Consider:

CREATE TABLE r2 (values INT NOT NULL);
INSERT INTO r2 VALUES(1);
ANALYZE r2;
\x on
SELECT * FROM pg_statistic WHERE starelid = (SELECT oid FROM pg_class WHERE relname = 'r2');

Notice the lack of any stakind being set to 3.

7.

+ blockProportion = (double) BrinGetPagesPerRange(indexRel) /
+ baserel->pages;

Perhaps blockProportion should also be clamped to the number of pages in the relation, e.g.

blockProportion = Max(blockProportion, baserel->relpages);

Even a 1 page relation is estimated to have 128 pages with the default pages_per_range, by the method used in the patch. During my test the selectivity was set to 64.5, then clamped to 1 by CLAMP_PROBABILITY(). This does not seem very nice.

The good news is, I'm seeing much better plans coming out in cases where the index is unlikely to help. So +1 for fixing this issue.

Will Emre be around to make the required changes to the patch? I see it's been a while since it was originally posted.

-- David Rowley http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services
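For reference, the reviewed estimate — the division by two in the quoted blockSelectivity line, plus the pages-per-range clamping the review asks about — can be written out as a standalone sketch. This is not the patch's actual code; the function and argument names are invented:

```c
#include <assert.h>
#include <math.h>

/* Sketch of the discussed BRIN selectivity estimate: the proportion of
 * the table covered by one BRIN range, blended with (1 - correlation),
 * then clamped to a valid probability. Clamping pages_per_range down to
 * rel_pages addresses the 1-page-table problem noted in the review,
 * where a 1 page relation is otherwise treated as 128 pages. */
static double
brin_block_selectivity(double pages_per_range, double rel_pages,
                       double correlation)
{
    double block_proportion;
    double selectivity;

    if (pages_per_range > rel_pages)
        pages_per_range = rel_pages;

    block_proportion = pages_per_range / rel_pages;
    selectivity = (block_proportion + 1.0 - correlation) / 2.0;

    /* the equivalent of CLAMP_PROBABILITY() */
    if (selectivity < 0.0)
        selectivity = 0.0;
    if (selectivity > 1.0)
        selectivity = 1.0;
    return selectivity;
}
```

With the clamp, a perfectly correlated 1-page table yields 0.5 rather than the 64.5-clamped-to-1 behavior described above; a large, perfectly correlated table approaches the proportion of a single range.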
Re: [HACKERS] PATCH: Batch/pipelining support for libpq
On Sat, Mar 11, 2017 at 12:52 AM, Daniel Verite wrote:
> Hi,
>
> I notice that PQsetSingleRowMode() doesn't work when getting batch results.
>
> The function is documented as:
> " int PQsetSingleRowMode(PGconn *conn);
>
> This function can only be called immediately after PQsendQuery or one
> of its sibling functions, before any other operation on the connection
> such as PQconsumeInput or PQgetResult"
>
> But PQbatchQueueProcess() unconditionally clears conn->singleRowMode,
> so whatever it was when sending the query gets lost, and besides
> other queries might have been submitted in the meantime.
>
> Also if trying to set that mode when fetching like this
>
> while (PQbatchQueueProcess(conn)) {
>    r = PQsetSingleRowMode(conn);
>    if (r!=1) {
>      fprintf(stderr, "PQsetSingleRowMode() failed");
>    }
>    ..
>
> it might work the first time, but on the next iterations, conn->asyncStatus
> might be PGASYNC_READY, which is a failure condition for
> PQsetSingleRowMode(), so that won't do.

Thanks for investigating the problem. Could you kindly explain what "next iteration" you mean here? Because I don't see any problem in the following sequence of calls - PQbatchQueueProcess(), PQsetSingleRowMode(), PQgetResult(). Am I missing anything? Please note that it is a must to call PQgetResult immediately after PQbatchQueueProcess(), as documented.

> ISTM that the simplest fix would be that when in batch mode,
> PQsetSingleRowMode() should register that the last submitted query
> does request that mode, and when later PQbatchQueueProcess dequeues
> the batch entry for the corresponding query, this flag would be popped off
> and set as the current mode.
>
> Please find attached the suggested fix, against the v5 of the patch.

Before going with this fix, I would like you to consider the option of asking batch processing users (via documentation) to set single-row mode before calling PQgetResult().
Either way we need to fix the documentation part, letting users know how they can activate single-row mode while using batch processing. Let me know your thoughts. Best Regards, Vaishnavi, Fujitsu Australia.
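The fix sketched in the quoted mail — carrying a per-query single-row flag in the batch queue instead of one connection-wide flag — can be modeled in a few lines. This is a toy with invented names, not libpq code; it only shows the bookkeeping:

```c
#include <assert.h>
#include <string.h>

#define MAX_BATCH 8

/* Toy model: each queued query remembers whether single-row mode was
 * requested for it at send time, so later sends cannot clobber it. */
typedef struct DemoBatchConn
{
    int single_row_flags[MAX_BATCH];
    int head;                   /* next entry to dequeue */
    int tail;                   /* next free slot */
    int current_single_row;     /* mode in effect for results being read */
} DemoBatchConn;

static void
batch_init(DemoBatchConn *conn)
{
    memset(conn, 0, sizeof(*conn));
}

/* Analogue of sending a query and then optionally requesting single-row
 * mode: the request is recorded against the just-queued query. */
static void
batch_send(DemoBatchConn *conn, int single_row)
{
    conn->single_row_flags[conn->tail % MAX_BATCH] = single_row;
    conn->tail++;
}

/* Analogue of dequeuing the next query's results: the flag recorded at
 * send time is popped off and becomes the current mode. */
static int
batch_next(DemoBatchConn *conn)
{
    if (conn->head == conn->tail)
        return 0;
    conn->current_single_row = conn->single_row_flags[conn->head % MAX_BATCH];
    conn->head++;
    return 1;
}

/* Queue two queries, the first in single-row mode, then encode the modes
 * observed at fetch time as decimal digits (1 + flag per query). */
static int
demo_scenario(void)
{
    DemoBatchConn conn;
    int result = 0;

    batch_init(&conn);
    batch_send(&conn, 1);
    batch_send(&conn, 0);
    while (batch_next(&conn))
        result = result * 10 + (1 + conn.current_single_row);
    return result;
}
```

The point of the design is that the mode travels with the queue entry, so it survives both later submissions and the PGASYNC state changes that defeat calling the setter at fetch time.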
Re: [HACKERS] scram and \password
On Fri, Mar 10, 2017 at 5:43 PM, Michael Paquier wrote:
> On Sat, Mar 11, 2017 at 2:53 AM, Jeff Janes wrote:
>> Should the \password tool in psql inspect password_encryption and act on it
>> being 'scram'?
>
> Not sure if it is wise to change the default for this release.

Seems like an odd way to phrase it. Aren't we talking about making a feature that worked in previous releases continue to work?

-- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Re: [HACKERS] Adding support for Default partition in partitioning
On Fri, Mar 10, 2017 at 2:17 PM, Peter Eisentraut wrote:
> On 3/2/17 21:40, Robert Haas wrote:
>> On the point mentioned above, I
>> don't think adding a partition should move tuples, necessarily; seems
>> like it would be good enough - maybe better - for it to fail if there
>> are any that would need to be moved.
>
> ISTM that the use cases of various combinations of adding a default
> partition, adding another partition after it, removing the default
> partition, clearing out the default partition in order to add more
> nondefault partitions, and so on, need to be more clearly spelled out
> for each partitioning type. We also need to consider that pg_dump and
> pg_upgrade need to be able to reproduce all those states. Seems to be a
> bit of work still ...

This patch is only targeting list partitioning. It is not entirely clear that the concept makes sense for range partitioning; you can already define a partition from the end of the last partition up to infinity, or from minus-infinity up to the starting point of the first partition. The only thing a default range partition would let you do is have a single partition cover all of the ranges that would otherwise be unassigned, which might not even be something we want.

I don't know how complete the patch is, but the specification seems clear enough. If you have partitions for 1, 3, and 5, you get partition constraints of (a = 1), (a = 3), and (a = 5). If you add a default partition, you get a constraint of (a != 1 and a != 3 and a != 5). If you then add a partition for 7, the default partition's constraint becomes (a != 1 and a != 3 and a != 5 and a != 7). The default partition must be revalidated at that point for conflicting rows, which we can either try to move to the new partition, or just error out if there are any, depending on what we decide we want to do.
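The constraint arithmetic described above can be sketched in a few lines of plain C (invented names; the real patch would build predicate node trees, not integer arrays):

```c
#include <assert.h>
#include <stddef.h>

/* Does 'value' match one of the explicit list-partition bounds? */
static int
in_explicit_partition(int value, const int *bounds, size_t nbounds)
{
    for (size_t i = 0; i < nbounds; i++)
    {
        if (bounds[i] == value)
            return 1;
    }
    return 0;
}

/* The default partition's implied constraint: the value must differ from
 * every explicit bound, e.g. (a != 1 AND a != 3 AND a != 5). Adding a
 * new bound tightens this constraint, which is why rows already in the
 * default partition must be revalidated (moved, or rejected) when a new
 * explicit partition is attached. */
static int
default_partition_accepts(int value, const int *bounds, size_t nbounds)
{
    return !in_explicit_partition(value, bounds, nbounds);
}
```

With bounds {1, 3, 5}, the value 2 routes to the default partition; once 7 gains its own partition, a 7 previously held by the default partition now violates the tightened constraint, which is exactly the revalidation case discussed above.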
I don't think any of that requires any special handling for either pg_dump or pg_upgrade; it all just falls out of getting the partitioning constraints correct and consistently enforcing them, just as for any other partition. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Re: [HACKERS] [BUGS] Bug in Physical Replication Slots (at least 9.5)?
On Tue, Jan 17, 2017 at 9:36 PM, Kyotaro HORIGUCHI < horiguchi.kyot...@lab.ntt.co.jp> wrote:
> Hello. I added pgsql-hackers.
>
> This occurs also on git master and back to 9.4.
>
> At Fri, 13 Jan 2017 08:47:06 -0600, Jonathon Nelson wrote:
> > On Mon, Nov 28, 2016 at 1:39 PM, Jonathon Nelson wrote:
> > > First, the postgresql configuration differs only minimally from the stock
> > > config:
> > >
> > > Assume wal_keep_segments = 0.
> > > Assume the use of physical replication slots.
> > > Assume one master, one standby.
> > >
> > > Lastly, we have observed the behavior "in the wild" at least twice and in
> > > the lab a dozen or so times.
> > >
> > > EXAMPLE #1 (logs are from the replica):
> > >
> > > user=,db=,app=,client= DEBUG: creating and filling new WAL file
> > > user=,db=,app=,client= DEBUG: done creating and filling new WAL file
> > > user=,db=,app=,client= DEBUG: sending write 6/8B00 flush 6/8A00 apply 5/748425A0
> > > user=,db=,app=,client= DEBUG: sending write 6/8B00 flush 6/8B00 apply 5/74843020
> > >
> > > user=,db=,app=,client= DEBUG: postmaster received signal 2
> > > user=,db=,app=,client= LOG: received fast shutdown request
> > > user=,db=,app=,client= LOG: aborting any active transactions
> > >
> > > And, upon restart:
> > >
> > > user=,db=,app=,client= LOG: restartpoint starting: xlog
> > > user=,db=,app=,client= DEBUG: updated min recovery point to 6/67002390 on timeline 1
> > > user=,db=,app=,client= DEBUG: performing replication slot checkpoint
> > > user=,db=,app=,client= DEBUG: updated min recovery point to 6/671768C0 on timeline 1
> > > user=,db=,app=,client= CONTEXT: writing block 589 of relation base/13294/16501
> > > user=,db=,app=,client= LOG: invalid magic number in log segment 00010006008B, offset 0
> > > user=,db=,app=,client= DEBUG: switched WAL source from archive to stream after failure
> > > user=,db=,app=,client= LOG: started streaming WAL from primary at
> > > 6/8A00 on timeline 1
> > > user=,db=,app=,client= FATAL: could not receive data from WAL stream:
> > > ERROR: requested WAL segment 00010006008A has already been
> > > removed
>
> I managed to reproduce this. A little tweak as the first patch
> lets the standby suicide as soon as walreceiver sees a
> contrecord at the beginning of a segment.
>
> - M(aster): createdb as a master with wal_keep_segments = 0
>   (default), min_log_messages = debug2
> - M: Create a physical repslot.
> - S(tandby): Setup a standby database.
> - S: Edit recovery.conf to use the replication slot above then
>   start it.
> - S: touch /tmp/hoge
> - M: Run pgbench ...
> - S: After a while, the standby stops.
>   > LOG: STOP THE SERVER
>
> - M: Stop pgbench.
> - M: Do 'checkpoint;' twice.
> - S: rm /tmp/hoge
> - S: Fails to catch up with the following error.
>   > FATAL: could not receive data from WAL stream: ERROR: requested WAL segment 0001002B has already been removed

I have been testing / reviewing the latest patch "0001-Fix-a-bug-of-physical-replication-slot.patch" and I think I might need some more clarification on this.

Before applying the patch, I tried reproducing the above error:

- I had master->standby in streaming replication
- Took the backup of master - with a low max_wal_size and wal_keep_segments = 0
- Configured standby with recovery.conf
- Created replication slot on master
- Configured the replication slot on standby and started the standby
- I got the below error

>> 2017-03-10 11:58:15.704 AEDT [478] LOG: invalid record length at 0/F2000140: wanted 24, got 0
>> 2017-03-10 11:58:15.706 AEDT [481] LOG: started streaming WAL from primary at 0/F200 on timeline 1
>> 2017-03-10 11:58:15.706 AEDT [481] FATAL: could not receive data from WAL stream: ERROR: requested WAL segment 000100F2 has already been removed

and I noticed that the file "000100F2" was removed from the master.
This can be easily reproduced, and it occurs irrespective of configuring replication slots. As long as the file "000100F2" is available on the master, the standby continues to stream WALs without any issues.

Some more details - contents of the file "000100F2" on standby before pg_stop_backup():

rmgr: Standby len (rec/tot): 24/50, tx: 0, lsn: 0/F228, prev 0/F198, desc: RUNNING_XACTS nextXid 638 latestCompletedXid 637 oldestRunningXid 638
rmgr: Standby len (rec/tot): 24/50, tx: 0, lsn: 0/F260, prev 0/F228, desc: RUNNING_XACTS nextXid 638 latestCompletedXid 637 oldestRunningXid 638
rmgr: XLOG len (rec/tot): 80/106, tx: 0, lsn: 0/F298, prev 0/F260, desc: CHECKPOINT_ONLINE
Re: [HACKERS] Parallel tuplesort (for parallel B-Tree index creation)
On Thu, Feb 16, 2017 at 8:45 AM, Peter Geoghegan wrote:
>> I do not think there should be any reason why we can't get the
>> resource accounting exactly correct here. If a single backend manages
>> to remove every temporary file that it creates exactly once (and
>> that's currently true, modulo system crashes), a group of cooperating
>> backends ought to be able to manage to remove every temporary file
>> that any of them create exactly once (again, modulo system crashes).
>
> I believe that we are fully in agreement here. In particular, I think
> it's bad that there is an API that says "caller shouldn't throw an
> elog error between these two points", and that will be fixed before
> too long. I just think that it's worth acknowledging a certain nuance.

I attach my V9 of the patch. I came up with some stuff for the design of resource management that I think meets every design goal that we have for shared/unified BufFiles:

* Avoids both resource leaks, and spurious double-freeing of resources (e.g., a second unlink() for a file from a different process) when there are errors. The latter problem was possible before, a known issue with V8 of the patch. I believe that this revision avoids these problems in a way that is *absolutely bulletproof* in the face of arbitrary failures (e.g., palloc() failure) in any process at any time. Although, be warned that there is a remaining open item concerning resource management in the leader-as-worker case, which I go into below.

There are now what you might call "critical sections" in one function. That is, there are points where we cannot throw an error (without a BEGIN_CRIT_SECTION()!), but those are entirely confined to unification code within the leader, where we can be completely sure that no error can be raised. The leader can even fail before some but not all of a particular worker's segments are in its local resource manager, and we still do the right thing.
I've been testing this by adding code that randomly throws errors at points interspersed throughout worker and leader unification hand-off points. I then leave this stress-test build to run for a few hours, while monitoring for leaked files and spurious fd.c reports of double-unlink() and similar issues. Test builds change LOG to PANIC within several places in fd.c, while MAX_PHYSICAL_FILESIZE was reduced from 1GiB to BLCKSZ. All of these guarantees are made without any special care from caller to buffile.c. The only V9 change to tuplesort.c or logtape.c in this general area is that they have to pass a dynamic shared memory segment to buffile.c, so that it can register a new callback. That's it. This may be of particular interest to Thomas. All complexity is confined to buffile.c. * No expansion in the use of shared memory to manage resources. BufFile refcount is still per-worker. The role of local resource managers is unchanged. * Additional complexity over and above ordinary BufFile resource management is confined to the leader process and its on_dsm_detach() callback. Only the leader registers a callback. Of course, refcount management within BufFileClose() can still take place in workers, but that isn't something that we rely on (that's only for non-error paths). In general, worker processes mostly have resource managers managing their temp file segments as a thing that has nothing to do with BufFiles (BufFiles are still not owned by resowner.c/fd.c -- they're blissfully unaware of all of this stuff). * In general, unified BufFiles can still be treated in exactly the same way as conventional BufFiles, and things just work, without any special cases being exercised internally. There is still an open item here, though: The leader-as-worker Tuplesortstate, a special case, can still leak files. So, stress-testing will only show the patch to be completely robust against resource leaks when nbtsort.c is modified to enable FORCE_SINGLE_WORKER testing. 
Despite the name FORCE_SINGLE_WORKER, you can also modify that file to force there to be arbitrary-many workers requested (just change "requested = 1" to something else). The leader-as-worker problem is avoided because we don't have the leader participating as a worker this way, which would otherwise present issues for resowner.c that I haven't got around to fixing just yet. It isn't hard to imagine why this is -- one backend with two FDs for certain fd.c temp segments is just going to cause problems for resowner.c without additional special care. Didn't seem worth blocking on that. I want to prove that my general approach is workable. That problem is confined to one backend's resource manager when it is the leader participating as a worker. It is not a refcount problem. The simplest solution here would be to ban the leader-as-worker case by contract. Alternatively, we could pass fd.c segments from the leader-as-worker Tuplesortstate's BufFile to the leader Tuplesortstate's BufFile without opening or closing anything. This way, there will be no
Re: [HACKERS] possible encoding issues with libxml2 functions
2017-03-12 21:57 GMT+01:00 Noah Misch:
> On Sun, Mar 12, 2017 at 08:36:58PM +0100, Pavel Stehule wrote:
> > 2017-03-12 0:56 GMT+01:00 Noah Misch :
> > > On Mon, Feb 20, 2017 at 07:48:18PM +0100, Pavel Stehule wrote:
> > > > There are possible two fixes
> > > >
> > > > a) clean decl on input - the encoding info can be removed from decl part
> > > >
> > > > b) use xml_out_internal everywhere before transformation to
> > > > xmlChar. pg_xmlCharStrndup can be good candidate.
> > >
> > > I'd prefer (a) if the xml type were a new feature, because no good can come of
> > > storing an encoding in each xml field when we know the actual encoding is the
> > > database encoding. However, if you implemented (a), we'd still see untreated
> > > values brought over via pg_upgrade. Therefore, I would try (b) first. I
> > > suspect the intent of xml_parse() was to implement (b); it will be interesting
> > > to see your test case that malfunctions.
> >
> > I looked there again and I found so this issue is related to xpath function only
> >
> > Functions based on xml_parse are working without problems. xpath_internal
> > uses own direct xmlCtxtReadMemory without correct encoding sanitation.
> >
> > so fix is pretty simple
>
> Please add a test case.

It needs an application - currently there is no way to import an XML document via the recv API :( I wrote a pgimportdoc utility, but it is not part of core.

> > --- a/src/backend/utils/adt/xml.c
> > +++ b/src/backend/utils/adt/xml.c
> > @@ -3874,9 +3874,11 @@ xpath_internal(text *xpath_expr_text, xmltype *data,
> >  ArrayType *namespaces,
> >  ns_count = 0;
> >  }
> >
> > - datastr = VARDATA(data);
> > - len = VARSIZE(data) - VARHDRSZ;
> > + datastr = xml_out_internal(data, 0);
>
> Why not use xml_parse() instead of calling xmlCtxtReadMemory() directly?
> The answer is probably in the archives, because someone understood the problem
> enough to document "Some XML-related functions may not work at all on
> non-ASCII data when the server encoding is not UTF-8. This is known to be an
> issue for xpath() in particular."

Probably there are two possible issues:

1. what I touched - the recv function converts to the database encoding, but the document's encoding declaration is not updated.

2. there is no way to convert from the document encoding to the database encoding.

> > + len = strlen(datastr);
> > +
> >  xpath_len = VARSIZE(xpath_expr_text) - VARHDRSZ;
> > +
>
> The two lines of functional change don't create a cause for more newlines, so
> don't add these two newlines.

ok
Re: [HACKERS] delta relations in AFTER triggers
On Fri, Mar 10, 2017 at 11:48 AM, Kevin Grittnerwrote: > On Tue, Mar 7, 2017 at 6:28 PM, Kevin Grittner wrote: > >> New patch attached. > > And bit-rotted less than 24 hours later by fcec6caa. > > New patch attached just to fix bit-rot. > > That conflicting patch might be a candidate to merge into the new > Ephemeral Named Relation provided by my patch, for more flexibility > and extensibility... Thanks. I found a new way to break it: run the trigger function so that the plan is cached by plpgsql, then ALTER TABLE incompatibly, then run the trigger function again. See attached. On Wed, Mar 8, 2017 at 1:28 PM, Kevin Grittner wrote: >>> I was looking for omissions that would cause some kind of statements >>> to miss out on ENRs arbitrarily. It seemed to me that >>> parse_analyze_varparams should take a QueryEnvironment, mirroring >>> parse_analyze, so that PrepareQuery could pass it in. Otherwise, >>> PREPARE wouldn't see ENRs. Is there any reason why SPI_prepare should >>> see them but PREPARE not? >> >> Any thoughts about that? > > Do you see any way to test that code, or would it be dead code there > "just in case" we later decided to do something that needed it? I'm > not a big fan of the latter. I've had to spend too much time > maintaining and/or ripping out code that fits that description. I guess you could test it by reaching PREPARE and EXECUTE via dynamic SQL inside a plpgsql function (ie EXECUTE 'EXECUTE ...'). Really I was just trying to be thorough and examine every path into the parser and analyser to make sure they all supported the new QueryEnvironment argument. When I found that the PREPARE path didn't, my first thought was that there may be PLs that wouldn't be able to take advantage of plan reuse any other way, but I see that all the built-in PLs expose SPI_prepare, so that isn't a problem for them. You're probably right that it's not actually very useful. We've recorded this obscure omission in the archives. 
> Miscellanea:
>
> Do you suppose we should have all PLs that are part of the base
> distro covered?

I vote for doing that in Postgres 11. My pl/python patch[1] may be a useful starting point, but I haven't submitted it in this CF and nobody has shown up with pl/tcl or pl/perl versions.

> What is necessary to indicate an additional SQL feature covered?

I assume you're talking about information_schema.sql_features, and I see you've created a new thread to talk about that. I'm not sure about that, but a couple of thoughts occurred to me when looking for references to transition tables in an old draft standard I have. These are both cases where properties of the subject table should probably also affect access to the derived transition tables:

* What privilege implications are there for transition tables? I'm wondering about column and row level privileges; for example, if you can't see a column in the subject table, I'm guessing you shouldn't be allowed to see it in the transition table either, but I'm not sure.

* In future we could consider teaching it about functional dependencies as required by the spec; if you can SELECT id, name FROM the subject table GROUP BY id, I believe you should be able to do the same from a transition table, but currently you can't.
[1] https://www.postgresql.org/message-id/CAEepm=3wvmpmz3bkftk2kcnd9kr7hxpz2skj8sfzx_vsute...@mail.gmail.com

--
Thomas Munro
http://www.enterprisedb.com

DROP TABLE IF EXISTS hoge;

CREATE TABLE hoge
(
  id int primary key,
  name text
);

CREATE OR REPLACE FUNCTION hoge_upd_func() RETURNS TRIGGER LANGUAGE plpgsql AS
$$
BEGIN
  RAISE WARNING 'old table = %, new table = %',
    (SELECT string_agg(id || '=' || name, ',') FROM d),
    (SELECT string_agg(id || '=' || name, ',') FROM i);
  RETURN NULL;
END;
$$;

CREATE TRIGGER hoge_upd_trigger
  AFTER UPDATE ON hoge REFERENCING OLD TABLE AS d NEW TABLE AS i
  FOR EACH STATEMENT EXECUTE PROCEDURE hoge_upd_func();

insert into hoge values (1, '1'), (2, '2'), (3, '3');
update hoge set name = name || name;

-- now change 'name' to an integer to see what happens...
alter table hoge alter column name type int using name::integer;
update hoge set name = (name::text || name::text)::integer;
-- at this point we get an error message:
-- ERROR:  attribute 2 has wrong type
-- DETAIL:  Table has type integer, but query expects text.
-- That error ^ can be cleared by recreating the function hoge_upd_func.

-- now drop column 'name'
alter table hoge drop column name;
update hoge set id = id;
-- segfault!

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] possible encoding issues with libxml2 functions
On Sun, Mar 12, 2017 at 08:36:58PM +0100, Pavel Stehule wrote: > 2017-03-12 0:56 GMT+01:00 Noah Misch: > > On Mon, Feb 20, 2017 at 07:48:18PM +0100, Pavel Stehule wrote: > > > There are possible two fixes > > > > > > a) clean decl on input - the encoding info can be removed from decl part > > > > > > b) use xml_out_internal everywhere before transformation to > > > xmlChar. pg_xmlCharStrndup can be good candidate. > > > > I'd prefer (a) if the xml type were a new feature, because no good can > > come of > > storing an encoding in each xml field when we know the actual encoding is > > the > > database encoding. However, if you implemented (a), we'd still see > > untreated > > values brought over via pg_upgrade. Therefore, I would try (b) first. I > > suspect the intent of xml_parse() was to implement (b); it will be > > interesting > > to see your test case that malfunctions. > > > > I looked there again and I found so this issue is related to xpath function > only > > Functions based on xml_parse are working without problems. xpath_internal > uses own direct xmlCtxtReadMemory without correct encoding sanitation. > > so fix is pretty simple Please add a test case. > --- a/src/backend/utils/adt/xml.c > +++ b/src/backend/utils/adt/xml.c > @@ -3874,9 +3874,11 @@ xpath_internal(text *xpath_expr_text, xmltype *data, > ArrayType *namespaces, > ns_count = 0; > } > > - datastr = VARDATA(data); > - len = VARSIZE(data) - VARHDRSZ; > + datastr = xml_out_internal(data, 0); Why not use xml_parse() instead of calling xmlCtxtReadMemory() directly? The answer is probably in the archives, because someone understood the problem enough to document "Some XML-related functions may not work at all on non-ASCII data when the server encoding is not UTF-8. This is known to be an issue for xpath() in particular." 
> +	len = strlen(datastr);
> +
> 	xpath_len = VARSIZE(xpath_expr_text) - VARHDRSZ;
> +

The two lines of functional change don't create a cause for more newlines, so don't add these two newlines.
Re: [HACKERS][REVIEW] macaddr 64 bit (EUI-64) datatype support
Greetings, * Stephen Frost (sfr...@snowman.net) wrote: > * Stephen Frost (sfr...@snowman.net) wrote: > > * Haribabu Kommi (kommi.harib...@gmail.com) wrote: > > > On Wed, Feb 1, 2017 at 6:27 AM, Vitaly Burovoy> > > wrote: > > > > The new status of this patch is: Ready for Committer > > > > > > Thanks for the review. > > > > I've started taking a look at this with an eye towards committing it > > soon. > > I've spent a good bit of time going over this, possibly even more than > it was worth, but hopefully we'll see people making use of this data > type with PG10 and as more IPv6 deployment happens. And, naturally, re-reading the email as it hit the list made me realize that the documentation/error-message incorrectly said "3rd and 4th" bytes were being set to FF and FE, when it's actually the 4th and 5th byte. The code was correct, just the documentation and the error message had the wrong numbers. The commit message is also correct. As a reminder to myself, this will also need a catversion bump when it gets committed, of course. Updated patch attached. Thanks! Stephen From 5b470010cacdc32bec521b1d57ad9f60c7bd638a Mon Sep 17 00:00:00 2001 From: Stephen Frost Date: Thu, 9 Mar 2017 17:47:28 -0500 Subject: [PATCH] Add support for EUI-64 MAC addresses as macaddr8 This adds in support for EUI-64 MAC addresses by adding a new data type called 'macaddr8' (using our usual convention of indicating the number of bytes stored). This was largely a copy-and-paste from the macaddr data type, with appropriate adjustments for having 8 bytes instead of 6 and adding support for converting a provided EUI-48 (6 byte format) to the EUI-64 format. Conversion from EUI-48 to EUI-64 inserts FFFE as the 4th and 5th bytes but does not perform the IPv6 modified EUI-64 action of flipping the 7th bit, but we add a function to perform that specific action for the user as it may be commonly done by users who wish to calculate their IPv6 address based on their network prefix and 48-bit MAC address. 
Author: Haribabu Kommi, with a good bit of rework of macaddr8_in by me. Reviewed by: Vitaly Burovoy, Kuntal Ghosh Discussion: https://postgr.es/m/CAJrrPGcUi8ZH+KkK+=tctnq+efkecehtmu_yo1mvx8hsk_g...@mail.gmail.com --- contrib/btree_gin/Makefile | 4 +- contrib/btree_gin/btree_gin--1.0--1.1.sql | 35 ++ contrib/btree_gin/btree_gin.c | 10 + contrib/btree_gin/btree_gin.control | 2 +- contrib/btree_gin/expected/macaddr8.out | 51 +++ contrib/btree_gin/sql/macaddr8.sql | 22 ++ contrib/btree_gist/Makefile | 11 +- contrib/btree_gist/btree_gist--1.3--1.4.sql | 64 contrib/btree_gist/btree_gist.control | 2 +- contrib/btree_gist/btree_gist.h | 1 + contrib/btree_gist/btree_macaddr8.c | 200 ++ contrib/btree_gist/expected/macaddr8.out| 89 + contrib/btree_gist/sql/macaddr8.sql | 37 ++ doc/src/sgml/brin.sgml | 11 + doc/src/sgml/btree-gin.sgml | 3 +- doc/src/sgml/btree-gist.sgml| 4 +- doc/src/sgml/datatype.sgml | 83 + doc/src/sgml/func.sgml | 56 +++ src/backend/utils/adt/Makefile | 2 +- src/backend/utils/adt/mac.c | 13 +- src/backend/utils/adt/mac8.c| 560 src/backend/utils/adt/network.c | 10 + src/backend/utils/adt/selfuncs.c| 1 + src/include/catalog/pg_amop.h | 18 + src/include/catalog/pg_amproc.h | 7 + src/include/catalog/pg_cast.h | 6 + src/include/catalog/pg_opclass.h| 3 + src/include/catalog/pg_operator.h | 23 +- src/include/catalog/pg_opfamily.h | 3 + src/include/catalog/pg_proc.h | 37 +- src/include/catalog/pg_type.h | 4 + src/include/utils/inet.h| 22 ++ src/test/regress/expected/macaddr8.out | 355 ++ src/test/regress/expected/opr_sanity.out| 6 + src/test/regress/parallel_schedule | 2 +- src/test/regress/serial_schedule| 1 + src/test/regress/sql/macaddr8.sql | 89 + 37 files changed, 1827 insertions(+), 20 deletions(-) create mode 100644 contrib/btree_gin/btree_gin--1.0--1.1.sql create mode 100644 contrib/btree_gin/expected/macaddr8.out create mode 100644 contrib/btree_gin/sql/macaddr8.sql create mode 100644 contrib/btree_gist/btree_gist--1.3--1.4.sql create mode 100644 
contrib/btree_gist/btree_macaddr8.c create mode 100644 contrib/btree_gist/expected/macaddr8.out create mode 100644 contrib/btree_gist/sql/macaddr8.sql create mode 100644 src/backend/utils/adt/mac8.c create mode 100644 src/test/regress/expected/macaddr8.out create mode 100644 src/test/regress/sql/macaddr8.sql diff --git
[HACKERS] bugfix: xpath encoding issue
Hi

When I tested the XMLTABLE function I found a bug in the XPATH function - xpath_internal. There, xmltype is not correctly encoded to xmlChar due to possibly invalid encoding info in the header. This is possible when the XML was loaded with the recv function and does not have UTF8 encoding. The functions based on the xml_parse function work well.

The fix is simple:

diff --git a/src/backend/utils/adt/xml.c b/src/backend/utils/adt/xml.c
index f81cf489d2..89aae48cb3 100644
--- a/src/backend/utils/adt/xml.c
+++ b/src/backend/utils/adt/xml.c
@@ -3874,9 +3874,11 @@ xpath_internal(text *xpath_expr_text, xmltype *data,
 				ArrayType *namespaces,
 		ns_count = 0;
 	}

-	datastr = VARDATA(data);
-	len = VARSIZE(data) - VARHDRSZ;
+	datastr = xml_out_internal(data, 0);
+	len = strlen(datastr);
+
 	xpath_len = VARSIZE(xpath_expr_text) - VARHDRSZ;
+
 	if (xpath_len == 0)
 		ereport(ERROR,
 				(errcode(ERRCODE_DATA_EXCEPTION),

Regards
Pavel
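The failure mode being fixed here — the bytes have already been recoded to the database encoding, but the document's declaration still claims the original encoding, so any parser that honors the declaration mis-decodes the content — is easy to reproduce with a stdlib XML parser (illustrative Python; PostgreSQL itself goes through libxml2):

```python
import xml.etree.ElementTree as ET

# 'café' encoded as UTF-8: what the server stores after recoding the
# incoming document to a UTF-8 database encoding.
utf8_body = "caf\u00e9".encode("utf-8")            # b'caf\xc3\xa9'

# The declaration still claims the ORIGINAL client encoding, so the
# parser dutifully decodes the UTF-8 bytes as Latin-1 -> mojibake.
stale = b'<?xml version="1.0" encoding="ISO-8859-1"?><a>' + utf8_body + b'</a>'
mojibake = ET.fromstring(stale).text               # 'caf\xc3\xa9' ("cafÃ©")

# Dropping/normalizing the stale declaration (roughly what routing the
# value through xml_out_internal achieves) lets the parser fall back to
# the XML default of UTF-8 and decode correctly.
clean = b'<a>' + utf8_body + b'</a>'
correct = ET.fromstring(clean).text                # 'café'
```

This is why fixing the declaration (or re-serializing before handing the buffer to xmlCtxtReadMemory) matters, even though the byte content itself is already in the right encoding.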
Re: [HACKERS][REVIEW] macaddr 64 bit (EUI-64) datatype support
Greetings,

* Stephen Frost (sfr...@snowman.net) wrote:
> * Haribabu Kommi (kommi.harib...@gmail.com) wrote:
> > On Wed, Feb 1, 2017 at 6:27 AM, Vitaly Burovoy wrote:
> > > The new status of this patch is: Ready for Committer
> >
> > Thanks for the review.
>
> I've started taking a look at this with an eye towards committing it
> soon.

I've spent a good bit of time going over this, possibly even more than it was worth, but hopefully we'll see people making use of this data type with PG10 and as more IPv6 deployment happens.

Of particular note, I rewrote macaddr8_in to not use sscanf(). sscanf(), at least on my system, would accept negative values even for '%2x', leading to slightly odd errors with certain inputs, including with our existing macaddr type:

=# select '00-203040506'::macaddr;
ERROR: invalid octet value in "macaddr" value: "00-203040506"
LINE 1: select '00-203040506'::macaddr;

With my rewrite, the macaddr8 type will throw a clearer (in my view, at least) error:

=# select '00-203040506'::macaddr8;
ERROR: invalid input syntax for type macaddr8: "00-203040506"
LINE 1: select '00-203040506'::macaddr8;

One other point is that the previously disallowed format with just two colons ('0800:2b01:0203') is now allowed. Given that both the two dot format ('0800.2b01.0203') and the two dash format ('0800-2b01-0203') were accepted, this seemed alright to me. Is there really a good reason to disallow the two colon format?

I didn't change how macaddr works as it doesn't appear to produce any outright incorrect behavior as-is (just slightly odd error messages) and some users might be expecting the current errors. I don't hold that position very strongly, however, and I have little doubt that the new macaddr8_in() is faster than using sscanf(), so that might be reason to consider rewriting macaddr_in in a similar fashion (or having a generic set of functions to handle both).
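The sort of stricter hand-rolled validation described above can be sketched as follows. This is a hypothetical Python illustration, not the actual C macaddr8_in; the function name `parse_macaddr` is invented, and the real code of course handles both 6- and 8-byte forms and more separator combinations.

```python
def parse_macaddr(text):
    """Return the raw bytes of a MAC address, or raise ValueError.

    Unlike sscanf('%2x'), this never accepts a sign or stray whitespace
    inside an octet.  Accepted shapes (per the formats discussed in the
    thread): six 2-digit groups with ':' or '-', or three 4-digit groups
    with '.', '-' or ':'."""
    hexdigits = set("0123456789abcdefABCDEF")
    for sep, ngroups, width in ((":", 6, 2), ("-", 6, 2),
                                (".", 3, 4), ("-", 3, 4), (":", 3, 4)):
        parts = text.split(sep)
        if len(parts) == ngroups and all(
                len(p) == width and set(p) <= hexdigits for p in parts):
            return bytes.fromhex("".join(parts))
    raise ValueError('invalid input syntax for mac address: "%s"' % text)
```

Note that the two-colon form ('0800:2b01:0203') is accepted here, matching the relaxation described above, while an input like '00-203040506' is rejected outright instead of being partially consumed the way sscanf() would.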
I considered using the functions we already use for bytea, but they don't quite match up to the expectations for MAC addresses (in particular, we shouldn't accept random whitespace in the middle of a MAC address). Perhaps we could modify those functions to be parameterized in a way to support how a MAC address should look, but it's not all that much code to be reason enough to put a lot of effort towards that, in my view at least. This also reduces the risk that bugs get introduced which break existing behavior too. I also thought about what we expect the usage of macaddr8 to be and realized that we should really have a function to help go from EUI-48 to the IPv6 Modified EUI-64 format, since users will almost certainly want to do exactly that. As such, I added a macaddr8_set7bit() function which will perform the EUI-64 -> Modified EUI-64 change (which is just setting the 7th bit) and added associated documentation and reasoning for why that function exists. In any case, it would be great to get some additional review of this, in particular of my modifications to macaddr8_in() and if anyone has any thoughts regarding the added macaddr8_set7bit() function. I'm going to take a break from it for a couple days to see if there's any additional comments and then go over it again myself. Barring issues, I'll commit the attached later this week. Thanks! Stephen From 239affc0e5b09964029f075c5370556c56de005d Mon Sep 17 00:00:00 2001 From: Stephen Frost Date: Thu, 9 Mar 2017 17:47:28 -0500 Subject: [PATCH] Add support for EUI-64 MAC addresses as macaddr8 This adds in support for EUI-64 MAC addresses by adding a new data type called 'macaddr8' (using our usual convention of indicating the number of bytes stored). This was largely a copy-and-paste from the macaddr data type, with appropriate adjustments for having 8 bytes instead of 6 and adding support for converting a provided EUI-48 (6 byte format) to the EUI-64 format. 
Conversion from EUI-48 to EUI-64 adds FFFE as the 4th and 5th bytes but does not perform the IPv6 modified EUI-64 action of flipping the 7th bit, but we add a function to perform that specific action for the user as it may be commonly done by users who wish to calculate their IPv6 address based on their network prefix and 48-bit MAC address. Author: Haribabu Kommi, with a good bit of rework of macaddr8_in by me. Reviewed by: Vitaly Burovoy, Kuntal Ghosh Discussion: https://postgr.es/m/CAJrrPGcUi8ZH+KkK+=tctnq+efkecehtmu_yo1mvx8hsk_g...@mail.gmail.com --- contrib/btree_gin/Makefile | 4 +- contrib/btree_gin/btree_gin--1.0--1.1.sql | 35 ++ contrib/btree_gin/btree_gin.c | 10 + contrib/btree_gin/btree_gin.control | 2 +- contrib/btree_gin/expected/macaddr8.out | 51 +++ contrib/btree_gin/sql/macaddr8.sql | 22 ++ contrib/btree_gist/Makefile | 11 +- contrib/btree_gist/btree_gist--1.3--1.4.sql | 64 contrib/btree_gist/btree_gist.control | 2 +-
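The two transformations described in the commit message — splicing FF FE in as the 4th and 5th bytes of an EUI-48 address, and separately setting the 7th (universal/local) bit to get the IPv6 "modified EUI-64" form — are simple byte operations. Here is an illustrative Python sketch; the function names deliberately mirror, but are not, the actual SQL-level functions:

```python
def macaddr_to_macaddr8(mac6: bytes) -> bytes:
    """EUI-48 -> EUI-64: splice FF FE in as the 4th and 5th bytes."""
    if len(mac6) != 6:
        raise ValueError("expected 6 bytes")
    return mac6[:3] + b"\xff\xfe" + mac6[3:]

def macaddr8_set7bit(mac8: bytes) -> bytes:
    """Set the 7th bit (the universal/local bit) of the first byte,
    giving the IPv6 'modified EUI-64' interface identifier form."""
    if len(mac8) != 8:
        raise ValueError("expected 8 bytes")
    return bytes([mac8[0] | 0x02]) + mac8[1:]

eui48 = bytes.fromhex("08002b010203")      # 08:00:2b:01:02:03
eui64 = macaddr_to_macaddr8(eui48)         # 08:00:2b:ff:fe:01:02:03
modified = macaddr8_set7bit(eui64)         # 0a:00:2b:ff:fe:01:02:03
```

Keeping the bit-flip out of the plain conversion and exposing it as a separate function, as the patch does, means users who are not building IPv6 interface identifiers get an untouched EUI-64 value.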
Re: [HACKERS] possible encoding issues with libxml2 functions
2017-03-12 0:56 GMT+01:00 Noah Misch: > On Mon, Feb 20, 2017 at 07:48:18PM +0100, Pavel Stehule wrote: > > Today I played with xml_recv function and with xml processing functions. > > > > xml_recv function ensures correct encoding from document encoding to > server > > encoding. But the decl section holds original encoding info - that should > > be obsolete after encoding. Sometimes we solve this issue by removing > decl > > section - see the xml_out function. > > > > Sometimes we don't do it - lot of functions uses direct conversion from > > xmltype to xmlChar. > > > There are possible two fixes > > > > a) clean decl on input - the encoding info can be removed from decl part > > > > b) use xml_out_internal everywhere before transformation to > > xmlChar. pg_xmlCharStrndup can be good candidate. > > I'd prefer (a) if the xml type were a new feature, because no good can > come of > storing an encoding in each xml field when we know the actual encoding is > the > database encoding. However, if you implemented (a), we'd still see > untreated > values brought over via pg_upgrade. Therefore, I would try (b) first. I > suspect the intent of xml_parse() was to implement (b); it will be > interesting > to see your test case that malfunctions. > I looked there again and I found so this issue is related to xpath function only Functions based on xml_parse are working without problems. xpath_internal uses own direct xmlCtxtReadMemory without correct encoding sanitation. 
so fix is pretty simple

diff --git a/src/backend/utils/adt/xml.c b/src/backend/utils/adt/xml.c
index f81cf489d2..89aae48cb3 100644
--- a/src/backend/utils/adt/xml.c
+++ b/src/backend/utils/adt/xml.c
@@ -3874,9 +3874,11 @@ xpath_internal(text *xpath_expr_text, xmltype *data,
 				ArrayType *namespaces,
 		ns_count = 0;
 	}

-	datastr = VARDATA(data);
-	len = VARSIZE(data) - VARHDRSZ;
+	datastr = xml_out_internal(data, 0);
+	len = strlen(datastr);
+
 	xpath_len = VARSIZE(xpath_expr_text) - VARHDRSZ;
+
 	if (xpath_len == 0)
 		ereport(ERROR,
 				(errcode(ERRCODE_DATA_EXCEPTION),

Regards
Pavel
Re: [HACKERS] \if, \elseif, \else, \endif (was Re: PSQL commands: \quit_if, \quit_unless)
"David G. Johnston"writes: > There are only four commands and a finite number of usage permutations. > Enumerating and figuring out the proper behavior for each should be done. > Thus - If the expressions are bad they are considered false but the block > is created > If the flow-control command is bad the system will tell the user why and > how to get back to a valid state - the entire machine state goes INVALID > until a corrective command is encountered. > For instance: > \if > \else > \elif > warning: elif block cannot occur directly within an \else block. either > start a new \if, \endif the current scope, or type \else to continue > entering commands into the existing else block. no expression evaluation > has occurred. > \echo 'c' > warning: command ignored in broken \if block scope - see prior correction > options This is looking a whole lot like the overcomplicated error reporting that we already considered and rejected. I think it's sufficient to print something like "\elif is not allowed to follow \else; command ignored" and not change state. We're not really helping anybody by going into an "invalid machine state" AFAICS, and having such a thing complicates the mental model more than I'd like. A different way of looking at this problem, which will seem like overkill right now but would absolutely not be once you consider looping, is that what should happen when we see \if is that we do nothing but absorb text until we see the matching \endif. At that point we could bitch and throw everything away if, say, there's \elif after \else, or anything else you want to regard as a "compile time error". Otherwise we start execution, and from there on it probably has to behave as we've been discussing. But this'd be pretty unfriendly from an interactive standpoint, and I'm not really convinced that it makes for significantly better error reporting. 
regards, tom lane
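The branch-state tracking this thread keeps circling around can be illustrated with a tiny model. This is a hypothetical Python sketch, not psql's implementation: expressions are just literal `t`/`f` tokens, there is no error handling for stray `\elif`/`\endif`, and (unlike a careful implementation) it still looks at `\elif` expressions inside dead branches.

```python
ACTIVE, PENDING, DONE = "active", "pending", "done"

def run(lines):
    """Toy \if/\elif/\else/\endif interpreter.  Per nesting level:
    PENDING = no branch in this \if block has been taken yet;
    DONE    = one branch already executed, skip the rest."""
    out, stack = [], []
    for line in lines:
        tok = line.split()
        cmd = tok[0] if tok else ""
        if cmd == r"\if":
            stack.append(ACTIVE if tok[1] == "t" else PENDING)
        elif cmd == r"\elif":
            if stack[-1] == ACTIVE:
                stack[-1] = DONE
            elif stack[-1] == PENDING and tok[1] == "t":
                stack[-1] = ACTIVE
        elif cmd == r"\else":
            stack[-1] = ACTIVE if stack[-1] == PENDING else DONE
        elif cmd == r"\endif":
            stack.pop()
        elif all(s == ACTIVE for s in stack):
            out.append(line)   # "execute" only in fully-live branches
    return out
```

Even this toy version shows why a no-op-plus-message response to a misplaced command is attractive: the stack is the entire mental model, and an extra INVALID mode would have to be threaded through every transition above.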
Re: [HACKERS] Other formats in pset like markdown, rst, mediawiki
2017-03-10 9:43 GMT+01:00 Jan Michálek:
>
> 2017-03-09 20:10 GMT+01:00 Peter Eisentraut com>:
>
>> This is looking pretty neat. I played around with it a bit. There are
>> a couple of edge cases that you need to address, I think.
>
> Thanks, original code is very synoptical and well prepared for adding
> new formats.
>
>> - Does not support \x
>
> I know, i dnot`t know, if \x make sense in this case. I will look, how it
> is done in other formats like html. I think, that it should work in sense,
> that table generated to rst should give similar output after processing
> like output of html format.

I prepared something like this (I have no prepared diff, I need to do some other changes). There are a few things I need to do. The first problem is the bold column names; I should do it in the same fashion as "RECORD", but I need to do some research about the column lengths. A bigger problem is the tab indent; the rst processor doesn't work with this in this case.

jelen=# execute q \g | xclip

+---------+-------------------------------------------------------------+
| **RECORD 1**                                                          |
+---------+-------------------------------------------------------------+
| column1 | Elephant, kangaroo,                                         |
|         | squirrel, gorilla                                           |
+---------+-------------------------------------------------------------+
| column2 | 121                                                         |
+---------+-------------------------------------------------------------+
| column3 | 1.0035971223021583                                          |
+---------+-------------------------------------------------------------+
| column4 | 0.                                                          |
+---------+-------------------------------------------------------------+
| column5 | Hello Hello Hello Hello Hello Hello Hello Hello Hello Hello |
+---------+-------------------------------------------------------------+
| **RECORD 2**                                                          |
+---------+-------------------------------------------------------------+
| column1 | goat, rhinoceros,                                           |
|         | monkey, ape                                                 |
+---------+-------------------------------------------------------------+
| column2 | 11121                                                       |
+---------+-------------------------------------------------------------+
| column3 | 1.0007824726134585                                          |
+---------+-------------------------------------------------------------+
| column4 | 5.                                                          |
+---------+-------------------------------------------------------------+
| column5 | xx xx xx xx xx xx xx xx xx xx                               |
+---------+-------------------------------------------------------------+
| **RECORD 3**                                                          |
+---------+-------------------------------------------------------------+
| column1 | donkey, cow, horse, tit,                                    |
|         | eagle, whale,                                               |
|         | aligator,                                                   |
|         | pelican,                                                    |
|         | grasshoper                                                  |
|         | pig                                                         |
|         | bat                                                         |
+---------+-------------------------------------------------------------+
| column2 | 14351                                                       |
+---------+-------------------------------------------------------------+
| column3 | 50.3877551020408163                                         |
+---------+-------------------------------------------------------------+
| column4 | 345.11                                                      |
+---------+-------------------------------------------------------------+
| column5 | yy yy yy yy yy yy yy yy yy yy                               |
+---------+-------------------------------------------------------------+

>> - When \pset format is rst, then \pset linestyle also shows up as
>> "rst". That is wrong. Same for markdown.
Re: [HACKERS] \if, \elseif, \else, \endif (was Re: PSQL commands: \quit_if, \quit_unless)
On Sun, Mar 12, 2017 at 10:24 AM, Tom Lane wrote:
>
> One point here is that we need to distinguish problems in the expression,
> which could arise from changing variable values, from some other types of
> mistakes like \elif with no preceding \if. When you see something like
> that you pretty much have to treat it as a no-op; but I don't think that's
> a problem for scripting usage.

One of my discarded write-ups from last night made a point that we don't really distinguish between run-time and compile-time errors - possibly because we haven't had to until now.

If we detect what would be considered a compile-time error (\elif after \else for instance) we could treat anything that isn't a conditional meta-command as a no-op with a warning (and exit in stop-script mode).

There are only four commands and a finite number of usage permutations. Enumerating and figuring out the proper behavior for each should be done.

Thus -
If the expressions are bad they are considered false but the block is created
If the flow-control command is bad the system will tell the user why and how to get back to a valid state - the entire machine state goes INVALID until a corrective command is encountered.

For instance:

\if
\else
\elif
warning: elif block cannot occur directly within an \else block. either
start a new \if, \endif the current scope, or type \else to continue
entering commands into the existing else block. no expression evaluation
has occurred.
\echo 'c'
warning: command ignored in broken \if block scope - see prior correction
options

Yes, that's wordy, but if that was shown the user would be able to recognize their situation and be able to get back to their desired state. Figuring out what the valid correction commands are for each invalid state, and where the user is left, is tedious but mechanical. So we'd need an INVALID state as well as the existing IGNORE state.
\endif would always work - but take you up one nesting level.

The user shouldn't need to memorize the invalid state rules. While we could document them and point the reader there, having them inline seems preferable.

> We could imagine resolving this tension by treating failed \if expressions
> differently in interactive and noninteractive cases. But I fear that cure
> would be worse than the disease.

I don't think this becomes necessary - we should distinguish the error types the same in both modes.
Re: [HACKERS] \if, \elseif, \else, \endif (was Re: PSQL commands: \quit_if, \quit_unless)
> > (1) document that \if-related commands MUST be on their own
> > line (i.e. like cpp #if directives?).

I have no opinion on whether \if-related commands must be on their own line, though I coded as if that were the case.

I want to point out that the goal down the road is to allow rudimentary expressions beyond just 'will this string cast to boolean true'. For example, in the earlier thread "Undefined psql variables", I proposed a slash command that would test if a named psql var were defined, and if not then assign it a value. Tom suggested leveraging if-then infrastructure like this:

\if not defined(x)
\set x y
\fi

Which would be great. I ask that whatever we decide in terms of how much more input we read to digest the expression allow for constructs like the one above.
Re: [HACKERS] \if, \elseif, \else, \endif (was Re: PSQL commands: \quit_if, \quit_unless)
Corey Huinker writes:
> Reading this, I started to wonder "so how did I get that impression?" and I
> found this from Feb 9:
> IMO, an erroneous backslash command should have no effect, period.
> "It failed but we'll treat it as if it were valid" is a rathole
> I don't want to descend into. It's particularly bad in interactive
> mode, because the most natural thing to do is correct your spelling
> and issue the command again --- but if psql already decided to do
> something on the strength of the mistaken command, that doesn't work,
> and you'll have to do something or other to unwind the unwanted
> control state before you can get back to what you meant to do.

Yeah, it's not the greatest thing for interactive usage, but as we already discussed, this feature needs to be optimized for scripting not interaction --- and even a bit of thought shows that the current behavior is disastrous for scripting. If your only suggestion for getting sane behavior in a script is "set ON_ERROR_STOP", you've failed to provide useful error handling.

One point here is that we need to distinguish problems in the expression, which could arise from changing variable values, from some other types of mistakes like \elif with no preceding \if. When you see something like that you pretty much have to treat it as a no-op; but I don't think that's a problem for scripting usage.

We could imagine resolving this tension by treating failed \if expressions differently in interactive and noninteractive cases. But I fear that cure would be worse than the disease.

regards, tom lane
Re: [HACKERS] \if, \elseif, \else, \endif (was Re: PSQL commands: \quit_if, \quit_unless)
On Sun, Mar 12, 2017 at 1:52 AM, David G. Johnston < david.g.johns...@gmail.com> wrote: > > Oddly, Corey was using you as support for this position...though without an actual quote: > > """ Reading this, I started to wonder "so how did I get that impression?" and I found this from Feb 9: IMO, an erroneous backslash command should have no effect, period. "It failed but we'll treat it as if it were valid" is a rathole I don't want to descend into. It's particularly bad in interactive mode, because the most natural thing to do is correct your spelling and issue the command again --- but if psql already decided to do something on the strength of the mistaken command, that doesn't work, and you'll have to do something or other to unwind the unwanted control state before you can get back to what you meant to do.
Re: [HACKERS] scram and \password
On 03/11/2017 11:07 PM, Michael Paquier wrote: > Having a RLS on pg_authid to allow a user to look at its own password > type is an idea. Given that that is not likely at this stage of the dev cycle, what about a special purpose SQL function that returns the password type for the current user? Or is it a security issue of some sort to allow a user to know their own password type? Joe -- Crunchy Data - http://crunchydata.com PostgreSQL Support for Secure Enterprises Consulting, Training, & Open Source Development
Re: [HACKERS] Other formats in pset like markdown, rst, mediawiki
I have found a problem with tab indentation in rst; it looks like lines should be left-aligned in some cases. 2017-03-10 9:43 GMT+01:00 Jan Michálek: > > > 2017-03-09 20:10 GMT+01:00 Peter Eisentraut com>: > >> This is looking pretty neat. I played around with it a bit. There are >> a couple of edge cases that you need to address, I think. >> > > Thanks, the original code is very clear and well prepared for adding > new formats. > > >> >> - Does not support \x >> > > I know; I don't know whether \x makes sense in this case. I will look at how it > is done in other formats like html. I think it should work in the sense > that a table generated as rst should give similar output after processing > to the output of the html format. > > >> >> - When \pset format is rst, then \pset linestyle also shows up as >> "rst". That is wrong. Same for markdown. >> > > I will look into this. > > >> >> - Broken output in tuples_only (\t) mode. (rst and markdown) >> > > Similar to \x, I'm not certain what it should return. I will look at what > the html format returns. Or I can use it in markdown for the nice vs. expanded > format. > > >> >> - rst: Do something about \pset title; the way it currently shows up >> appears to be invalid; could use ".. table:: title" directive >> > > OK, it shouldn't be a problem to alter this. > > >> >> - markdown: Extra blank line between table and footer. >> > > That is because markdown needs an empty line after the table if the row count > is presented. > > >> >> - markdown: We should document or comment somewhere exactly which of the >> various markdown table formats this is supposed to produce. (Pandoc >> pipe_tables?) >> > > I used a format similar to the aligned format with the ascii linestyle, > because it allows me to use existing features. I should look over more > table styles in markdown. > > >> >> - markdown: Table title needs to be after the table, like >> >> Table: title >> >> I will change this. > > >> - markdown: Needs to escape | characters in cell contents.
(Not >> needed for rst.) More escaping might be needed. >> > > This can be a problem because of the aligning; I will look into this. It is the same problem as the newline replacement for markdown. > > Have a nice day > > Jan > > >> >> -- >> Peter Eisentraut http://www.2ndQuadrant.com/ >> PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services >> > > > > -- > Jelen > Starší čeledín datovýho chlíva > -- Jelen Starší čeledín datovýho chlíva
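To make the escaping point concrete, here is a toy sketch (Python, purely for illustration; psql's actual table rendering is C code, and any resemblance to its structure is my assumption) of producing a Pandoc-style pipe_table with literal `|` escaped in cell contents. Note that column widths must be computed from the escaped text, or the alignment breaks, which is the interaction Jan is worried about:

```python
def escape_cell(text):
    # In a pipe table a bare '|' would start a new column; escape it.
    return text.replace("|", "\\|")

def pipe_table(headers, rows):
    # Escape first, then size the columns from the escaped strings.
    cells = [[escape_cell(h) for h in headers]] + \
            [[escape_cell(c) for c in row] for row in rows]
    widths = [max(len(r[i]) for r in cells) for i in range(len(headers))]

    def line(row):
        return "| " + " | ".join(c.ljust(w) for c, w in zip(row, widths)) + " |"

    sep = "|" + "|".join("-" * (w + 2) for w in widths) + "|"
    return "\n".join([line(cells[0]), sep] + [line(r) for r in cells[1:]])

print(pipe_table(["a", "b"], [["x|y", "z"]]))
```

Running this prints a table whose `x|y` cell comes out as `x\|y`, four characters wide, so the divider and data rows stay aligned.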
Re: [HACKERS] Parallel Append implementation
Moin, On Sat, March 11, 2017 11:29 pm, Robert Haas wrote: > On Fri, Mar 10, 2017 at 6:01 AM, Tels wrote: >> Just a question for me to understand the implementation details vs. the >> strategy: >> >> Have you considered how the scheduling decision might impact performance >> due to "inter-plan parallelism vs. in-plan parallelism"? >> >> So what would be the scheduling strategy? And should there be a fixed >> one >> or user-influenceable? And what could be good ones? >> >> A simple example: >> >> E.g. if we have 5 subplans, and each can have at most 5 workers and we >> have 5 workers overall. >> >> So, do we: >> >> Assign 5 workers to plan 1. Let it finish. >> Then assign 5 workers to plan 2. Let it finish. >> and so on >> >> or: >> >> Assign 1 worker to each plan until no workers are left? > > Currently, we do the first of those, but I'm pretty sure the second is > way better. For example, suppose each subplan has a startup cost. If > you have all the workers pile on each plan in turn, every worker pays > the startup cost for every subplan. If you spread them out, then > subplans can get finished without being visited by all workers, and > then the other workers never pay those costs. Moreover, you reduce > contention for spinlocks, condition variables, etc. It's not > impossible to imagine a scenario where having all workers pile on one > subplan at a time works out better: for example, suppose you have a > table with lots of partitions all of which are on the same disk, and > it's actually one physical spinning disk, not an SSD or a disk array > or anything, and the query is completely I/O-bound. Well, it could > be, in that scenario, that spreading out the workers is going to turn > sequential I/O into random I/O and that might be terrible. In most > cases, though, I think you're going to be better off. If the > partitions are on different spindles or if there's some slack I/O > capacity for prefetching, you're going to come out ahead, maybe way > ahead.
If you come out behind, then you're evidently totally I/O > bound and have no capacity for I/O parallelism; in that scenario, you > should probably just turn parallel query off altogether, because > you're not going to benefit from it. I agree with the proposition that both strategies can work well, or not, depending on the system setup, the tables, and the data layout. I'd be a bit more worried about turning it into the "random-I/O case", but that's still just a feeling and guesswork. So which one will be better seems speculative, hence the question about benchmarking different strategies. So, I'd like to see the scheduler factored out into a single place, maybe a function that gets called with the number of currently running workers, the maximum number of workers to be expected, the new worker, and the list of plans still to do, and that then schedules that single worker to one of these plans by strategy X. That would make it easier to swap out X for Y and see how it fares, wouldn't it? However, I don't think the patch needs to select the optimal strategy right from the beginning (if that even exists; maybe it's a mixed strategy), as even "not so optimal" parallelism will be better than doing all things sequentially. Best regards, Tels
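Robert's startup-cost argument can be made concrete with a toy simulation (Python, an illustration only, not the planner's actual code): each subplan has a fixed amount of work, workers process one unit per tick, and a worker pays one startup cost the first time it touches a given subplan. We count total startup costs paid under the two strategies:

```python
def simulate(plan_work, nworkers, spread):
    """Count (worker, plan) startup costs under two scheduling strategies."""
    work = list(plan_work)
    paid = set()                       # (worker, plan) pairs that paid startup
    assignment = [None] * nworkers
    while any(w > 0 for w in work):
        for wk in range(nworkers):
            # (Re)assign workers that are idle or whose plan has finished.
            if assignment[wk] is None or work[assignment[wk]] == 0:
                todo = [i for i, w in enumerate(work) if w > 0]
                if not todo:
                    break
                if spread:
                    # strategy 2: pick the plan with the fewest workers on it
                    assignment[wk] = min(todo, key=assignment.count)
                else:
                    # strategy 1: everyone piles on the first unfinished plan
                    assignment[wk] = todo[0]
                paid.add((wk, assignment[wk]))
            p = assignment[wk]
            if p is not None and work[p] > 0:
                work[p] -= 1           # one unit of work per tick
    return len(paid)

# 5 subplans, 5 units of work each, 5 workers:
print(simulate([5] * 5, 5, spread=False))  # pile on: 25 (every worker visits every plan)
print(simulate([5] * 5, 5, spread=True))   # spread out: 5 (one visit per plan)
```

The model ignores I/O locality entirely, which is exactly the caveat Robert and Tels raise: on a single spinning disk the "spread" strategy's random I/O could swamp the startup savings.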
Re: [HACKERS] issue about the streaming replication
Thanks for your information. Now I know about the pg_rewind tool. But why does PG not recover from the divergence automatically? There exist at least two options, analogous to what "git merge" does: a) the old master finds the common base with the new master in the WAL history and rewinds itself to it, before applying the new WAL segments from the new master. b) maybe with logical replication, the two masters could merge their data; if a conflict exists, use the new version. That would mean data committed on the old master could survive the crash, which may be important for some businesses. pg_rewind helps to do option a. But what about option b? 2017-03-12 19:20 GMT+08:00 Michael Paquier: > On Sun, Mar 12, 2017 at 5:24 PM, Jinhua Luo wrote: >> I think this divergence scenario is common, because it's likely the >> master would crash due to some hardware issue (e.g. power off), which >> would cause some committed transactions to not yet be synced to the slave, >> while the slave would be promoted to new master and accept new >> transactions. Then how to recover the old master? Moreover, how to >> recover the data on the old master which is missing on the new master? > > pg_rewind (https://www.postgresql.org/docs/9.6/static/app-pgrewind.html) > has been designed with exactly this scenario in mind, aka recycling a > past master as a slave to a promoted node. Have you looked at it? What > you are trying to do is very likely going to create corruptions on > your systems, so I am not surprised that you see inconsistency > failures; what you are seeing is pretty much the tip of the iceberg. > -- > Michael
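Conceptually, option (a) amounts to finding the last point the two WAL histories agree and discarding everything the old master wrote after it. A toy illustration (Python; this is not pg_rewind's real algorithm, which works from the timeline history and copies changed data blocks, but the divergence-point idea is the same):

```python
def divergence_point(old_wal, new_wal):
    """Length of the longest common prefix of two WAL histories."""
    n = 0
    for a, b in zip(old_wal, new_wal):
        if a != b:
            break
        n += 1
    return n

old = ["r1", "r2", "r3", "r4"]   # r3 and r4 never reached the standby
new = ["r1", "r2", "r5"]         # standby promoted and wrote r5

d = divergence_point(old, new)
rewound = old[:d] + new[d:]      # drop r3/r4, replay r5 from the new master
print(d, rewound)                # 2 ['r1', 'r2', 'r5']
```

This also shows why the recovery cannot be fully automatic: the records r3 and r4 were committed on the old master, and dropping them is a data-losing decision that the administrator, not the server, has to make.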
Re: [HACKERS] issue about the streaming replication
On Sun, Mar 12, 2017 at 5:24 PM, Jinhua Luo wrote: > I think this divergence scenario is common, because it's likely the > master would crash due to some hardware issue (e.g. power off), which > would cause some committed transactions to not yet be synced to the slave, > while the slave would be promoted to new master and accept new > transactions. Then how to recover the old master? Moreover, how to > recover the data on the old master which is missing on the new master? pg_rewind (https://www.postgresql.org/docs/9.6/static/app-pgrewind.html) has been designed with exactly this scenario in mind, aka recycling a past master as a slave to a promoted node. Have you looked at it? What you are trying to do is very likely going to create corruptions on your systems, so I am not surprised that you see inconsistency failures; what you are seeing is pretty much the tip of the iceberg. -- Michael
[HACKERS] issue about the streaming replication
Hi, I made a test to see how postgresql handles the replication divergence problem: a) set up two pg clusters A and B b) run A as master, B as slave, using streaming replication c) insert some data into table foobar on A, and shut down the network between A and B in the meantime, which ends up with some data missing on B d) A crashes e) promote B as new master, insert some data into table foobar on B f) now the data on A and B diverge When I restart A as the new slave, it reports the below error in the log: record with incorrect prev-link And worse, when I shut down B and promote A as master again, it fails to start up: LOG: database system was shut down in recovery FATAL: invalid memory alloc request size 2281725952 What is this error and why does it happen? I think this divergence scenario is common, because it's likely the master would crash due to some hardware issue (e.g. power off), which would cause some committed transactions to not yet be synced to the slave, while the slave would be promoted to new master and accept new transactions. Then how to recover the old master? Moreover, how to recover the data on the old master which is missing on the new master? Regards, Jinhua Luo
Re: [HACKERS] \if, \elseif, \else, \endif (was Re: PSQL commands: \quit_if, \quit_unless)
Hello Tom, * Daniel Verite previously pointed out the desirability of disabling variable expansion while skipping script. That doesn't seem to be here, ISTM that it is still there, but for \elif conditions which are currently always checked. fabien=# \if false fabien@# \echo `echo BAD` command ignored, use \endif or Ctrl-C to exit current branch. fabien@# \else fabien=# \echo `echo OK` OK fabien=# \endif IIRC, I objected to putting knowledge of ConditionalStack into the shared psqlscan.l lexer, and I still think that would be a bad idea; but we need some way to get the lexer to shut that off. Probably the best way is to add a passthrough "void *" argument that would let the get_variable callback function mechanize the rule about not expanding in a false branch. Hmmm. I see this as a convoluted way of providing the stack knowledge without actually giving the stack... it seems that would work, so why not. * Whether or not you think it's important not to expand skipped variables, I think that it's critical that skipped backtick expressions not be executed. \if something \elif `expr :var1 + :var2 = :var3` \endif I think it's essential that expr not be called if the first if-condition succeeded. This was the behavior at some point, but it was changed because we understood it to be required that boolean errors be detected and the resulting command simply ignored. I'm really fine with having that back. * The documentation says that an \if or \elif expression extends to the end of the line, but actually the code is just eating one OT_NORMAL argument. That means it's OK to do this: [...] More generally, I do not think that the approach of having exec_command simply fall out immediately when in a false branch is going to work, because it ignores the fact that different backslash commands have different argument parsing rules. Some will eat the rest of the line and some won't.
I'm afraid that it might be necessary to remove that code block and add a test to every single backslash command that decides whether to actually perform its action after it's consumed its arguments. That would be tedious :-(. Indeed. IMO the very versatile lexing conventions of backslash commands, or rather their actual lack of any consistency, make it hard to get something very sane out of this, especially with the "do not evaluate in false branch" requirement. As a simple way out, I suggest to: (1) document that \if-related commands MUST be on their own line (i.e. like cpp #if directives?). (2) check that it is indeed the case when an \if-related command is detected. (3) call it a feature if someone does not follow the rule and gets a strange behavior as a result, as below: regression=# \if 0 regression@# \echo foo \endif command ignored, use \endif or Ctrl-C to exit current branch. (notice we're not out of the conditional) * I'm not on board with having a bad expression result in failing the \if or \elif altogether. This was understood as a requirement in previous versions, which did not fail. I do agree that it seems better to keep the structure on errors, at least for script usage. It was stated several times upthread that that should be processed as though the result were "false", and I agree with that. I'm fine with that, if everyone could agree before Corey spends more time on this... [...] We might as well replace the recommendation to use ON_ERROR_STOP with a forced abort() for an invalid expression value, because trying to continue a script with this behavior is entirely useless. Hmmm. Maybe your remark is rhetorical. That could work for scripting use, but in interactive mode aborting coldly on syntax errors is not too nice for the user. -- Fabien.
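The evaluation rules being debated can be made concrete with a toy model of the ConditionalStack idea (hypothetical Python for illustration; psql's real implementation is C, and these method names are my invention): a command runs only when every enclosing branch is active, and in a dead branch an \elif condition, backticks included, is never evaluated:

```python
TRUE, FALSE, DONE = "true", "false", "done"   # branch states

class ConditionalStack:
    def __init__(self):
        self.stack = []

    def active(self):
        # Commands execute only if every enclosing branch is active.
        return all(s == TRUE for s in self.stack)

    def if_(self, cond):
        if not self.active():
            # Inside a dead branch: don't evaluate cond, swallow the block.
            self.stack.append(DONE)
        else:
            self.stack.append(TRUE if cond() else FALSE)

    def elif_(self, cond):
        if self.stack[-1] == FALSE and all(s == TRUE for s in self.stack[:-1]):
            self.stack[-1] = TRUE if cond() else FALSE
        else:
            # A branch already ran, or an outer branch is dead: the
            # condition is never evaluated (no backtick side effects).
            self.stack[-1] = DONE

    def else_(self):
        if self.stack[-1] == FALSE and all(s == TRUE for s in self.stack[:-1]):
            self.stack[-1] = TRUE
        else:
            self.stack[-1] = DONE

    def endif(self):
        self.stack.pop()

executed = []
cs = ConditionalStack()
cs.if_(lambda: True)
if cs.active():
    executed.append("branch 1")
cs.elif_(lambda: 1 / 0)   # never evaluated: the first branch already ran
if cs.active():
    executed.append("branch 2")
cs.endif()
print(executed)           # ['branch 1']
```

The `1 / 0` lambda stands in for a skipped backtick expression such as `` `expr :var1 + :var2 = :var3` ``; in this model, as in the behavior argued for above, it is simply never called once an earlier branch has run.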