Re: Identity columns should own only one sequence

2019-05-02 Thread Laurenz Albe
On Thu, 2019-05-02 at 22:43 +0200, Peter Eisentraut wrote:
> On 2019-04-29 18:28, Laurenz Albe wrote:
> > I still think that there is merit to Michael's idea of removing
> > sequence "ownership" (which is just a dependency) when the DEFAULT
> > on the column is dropped, but this approach is possibly cleaner.
> 
> I think the proper way to address this would be to create some kind of
> dependency between the sequence and the default.

That is certainly true.  But that's hard to retrofit into existing databases,
so it would probably be a modification that is not backpatchable.

Yours,
Laurenz Albe





Re: Unhappy about API changes in the no-fsm-for-small-rels patch

2019-05-02 Thread John Naylor
On Thu, May 2, 2019 at 4:57 PM Amit Kapila  wrote:
>
> On Thu, May 2, 2019 at 12:39 PM John Naylor wrote:
> >
> Okay, I have updated the patch to incorporate your changes and call
> relcache invalidation at the required places. I have updated comments in a
> few places as well.  In summary, this patch (a) moves the local map to the
> relation cache, (b) performs the cache invalidation whenever we create the
> FSM (either via a backend or vacuum), whenever some space in a page is
> freed by vacuum (provided the FSM doesn't exist), or whenever the local map
> is cleared, and (c) additionally clears the local map when we find a block
> from the FSM, when we have already tried all the blocks present in the
> cache, or when we are allowed to create the FSM.
>
> If we agree on this, then we can update the README accordingly.
>
> Can you please test/review?

There isn't enough time. But since I already wrote some debugging
calls earlier (attached), I gave it a brief spin, and found that this
patch isn't as careful as HEAD about making sure we don't try the same
block twice in a row. If you insert enough tuples into an empty table
such that we need to extend, you get something like this:

DEBUG:  Not enough space on block 0
DEBUG:  Now trying block 0
DEBUG:  Not enough space on block 0
DEBUG:  Updating local map for block 0
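
(For reference, the invariant being described, i.e. that a block handed out
once is not offered again until the map is rebuilt, looks roughly like the
sketch below.  This is purely illustrative, with made-up names; it is not
code from HEAD or from the patch.)

#include <stdint.h>
#include <stdbool.h>

#define LOCAL_MAP_BLOCKS 8
#define INVALID_BLOCK    UINT32_MAX

/* Hypothetical local map of candidate blocks for a small relation. */
typedef struct LocalMap
{
    uint32_t nblocks;                      /* blocks covered by the map */
    bool     available[LOCAL_MAP_BLOCKS];  /* true = not yet tried */
} LocalMap;

/*
 * Return the next candidate block and mark it as tried, so that a failed
 * insertion attempt can never be handed the same block twice in a row.
 */
static uint32_t
local_map_next_block(LocalMap *map)
{
    for (uint32_t blk = 0; blk < map->nblocks; blk++)
    {
        if (map->available[blk])
        {
            map->available[blk] = false;   /* don't offer this block again */
            return blk;
        }
    }
    return INVALID_BLOCK;   /* all tried: caller may extend or create the FSM */
}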

At this point, I'm sorry to say, but I'm in favor of reverting. There
just wasn't enough time to redesign and debug a feature like this
during feature freeze.

-- 
John Naylor                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


debug-free-space.patch
Description: Binary data


Re: POC: Cleaning up orphaned files using undo logs

2019-05-02 Thread Dilip Kumar
On Thu, May 2, 2019 at 7:00 PM Robert Haas  wrote:
>
> On Thu, May 2, 2019 at 5:32 AM Dilip Kumar  wrote:
> > Yeah, at least in this patch it looks silly.  Actually, I added that
> > index to make the qsort stable when execute_undo_action sorts them in
> > block order.  But as part of this patch we don't have that processing,
> > so either we can remove this structure completely as you suggested (and
> > the undo processing patch will have to add it back), or we can just add
> > a comment explaining why we added this index field.
>
> Well, the qsort comparator could compute the index as ptr - array_base
> just like any other code, couldn't it?
>
I might be completely missing something, but (ptr - array_base) is only
valid the first time you get the array: qsort will swap the elements
around, and after that you will never be able to tell which element
was originally at the lower index and which at the higher one.  Basically,
our goal is to preserve the order of the undo records for the same block,
but their order can change due to swaps while they are being compared
against undo record pointers for another block, and once the order has
changed we can never recover their initial positions.
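
To illustrate the point in isolation: the tie-breaking index has to be
captured before qsort() runs, because afterwards an element's address says
nothing about where it started.  A small standalone sketch (with hypothetical
names, not the undo patch's actual structures):

#include <stdio.h>
#include <stdlib.h>
#include <stdint.h>

/* Hypothetical undo record descriptor, for illustration only. */
typedef struct UndoRecInfo
{
    uint32_t block;   /* block the undo record applies to */
    uint32_t index;   /* original array position, captured before sorting */
} UndoRecInfo;

/*
 * Order by block, breaking ties with the pre-recorded index so records for
 * the same block keep their original (insertion) order.  Computing
 * "ra - array_base" here instead would be wrong: qsort() may already have
 * swapped the elements, so an element's address no longer tells us where
 * it started out.
 */
static int
undo_rec_cmp(const void *a, const void *b)
{
    const UndoRecInfo *ra = (const UndoRecInfo *) a;
    const UndoRecInfo *rb = (const UndoRecInfo *) b;

    if (ra->block != rb->block)
        return (ra->block < rb->block) ? -1 : 1;
    return (ra->index < rb->index) ? -1 : (ra->index > rb->index);
}

int
main(void)
{
    UndoRecInfo recs[] = {{7, 0}, {3, 0}, {7, 0}, {3, 0}};
    size_t      n = sizeof(recs) / sizeof(recs[0]);

    /* Capture each record's original position before qsort() moves anything. */
    for (size_t i = 0; i < n; i++)
        recs[i].index = (uint32_t) i;

    qsort(recs, n, sizeof(UndoRecInfo), undo_rec_cmp);

    for (size_t i = 0; i < n; i++)
        printf("block %u (was at position %u)\n", recs[i].block, recs[i].index);
    return 0;
}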

-- 
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com




Re: REINDEX INDEX results in a crash for an index of pg_class since 9.6

2019-05-02 Thread Tom Lane
Andres Freund  writes:
> On 2019-05-02 16:54:11 -0400, Tom Lane wrote:
>> I just finished a successful run of the core regression tests with CCA.
>> Given the calendar, I think that's about as much CCA testing as I should
>> do personally.  I'll make a cleanup pass on this patch and try to get it
>> pushed within a few hours, if there are not objections.

> Sounds good to me.

Pushed --- hopefully, we have enough time before Sunday that we can get
reasonably complete buildfarm testing.

I did manually verify that all branches get through "reindex table
pg_class" and "reindex index pg_class_oid_index" under
CLOBBER_CACHE_ALWAYS, as well as a normal-mode check-world.  But CCA
world runs seem like a good idea.

As far as a permanent test scheme goes, I noticed while testing that
src/bin/scripts/t/090_reindexdb.pl and
src/bin/scripts/t/091_reindexdb_all.pl seem to be giving us a good
deal of coverage on this already, although of course they never caught the
problem with non-HOT updates, nor any of the deadlock issues.  Still,
it seems like maybe a core regression test that's been lobotomized enough
to be perfectly parallel-safe might not give us more coverage than can
be had there.

regards, tom lane




Re: REINDEX INDEX results in a crash for an index of pg_class since 9.6

2019-05-02 Thread Tom Lane
Andres Freund  writes:
> On 2019-05-02 16:54:11 -0400, Tom Lane wrote:
>> How do you feel about the other patch to rejigger the order of operations
>> in CommandCounterIncrement?  I think that's a bug, but it's probably
>> noncritical for most people.  What I'm leaning towards for that one is
>> waiting till after the minor releases, then pushing it to all branches.

> I've not yet had the mental cycles to look more deeply into it. I
> thought your explanation of why the current way is wrong made sense, but I
> wanted to look a bit more into how it came to be the way it is now.

Well, I wrote that code, and I can say pretty confidently that this
failure mode just didn't occur to me at the time.

> I agree
> that pushing after the minors would make sense; it's too subtle to go
> for it now.

It is subtle, and given that it's been there this long, I don't feel
urgency to fix it Right Now.  I think we're already taking plenty of
risk back-patching the REINDEX patch :-(

regards, tom lane




Re: REINDEX INDEX results in a crash for an index of pg_class since 9.6

2019-05-02 Thread Andres Freund
Hi,

On 2019-05-02 16:54:11 -0400, Tom Lane wrote:
> I just finished a successful run of the core regression tests with CCA.
> Given the calendar, I think that's about as much CCA testing as I should
> do personally.  I'll make a cleanup pass on this patch and try to get it
> pushed within a few hours, if there are not objections.

Sounds good to me.


> How do you feel about the other patch to rejigger the order of operations
> in CommandCounterIncrement?  I think that's a bug, but it's probably
> noncritical for most people.  What I'm leaning towards for that one is
> waiting till after the minor releases, then pushing it to all branches.

I've not yet had the mental cycles to look more deeply into it. I
thought your explanation of why the current way is wrong made sense, but I
wanted to look a bit more into how it came to be the way it is now. I agree
that pushing after the minors would make sense; it's too subtle to go
for it now.

Greetings,

Andres Freund




Re: REINDEX INDEX results in a crash for an index of pg_class since 9.6

2019-05-02 Thread Tom Lane
Andres Freund  writes:
> On 2019-05-02 12:59:55 -0400, Tom Lane wrote:
>> One interesting thing that turns up in check-world is that if wal_level
>> is minimal, we have to manually force an XID to be assigned, else
>> reindexing pg_class fails with "cannot commit a transaction that deleted
>> files but has no xid" :-(.  Perhaps there's some other cleaner place to
>> do that?

> Hm. We could replace that RecordTransactionCommit() with an xid
> assignment or such. But that seems at least as fragile. Or we could
> expand the logic we have for LogStandbyInvalidations() a few lines below
> the elog to also be able to handle files.  IIRC that was introduced to
> handle somewhat related issues about being able to run VACUUM
> (containing invalidations) without an xid.

Well, that's something we can maybe improve later.  I'm content to leave
the patch as it is for now.

>> if (RelationIsMapped(relation))
>> +{
>> +/* Since we're not updating pg_class, these had better not 
>> change */
>> +Assert(classform->relfrozenxid == freezeXid);
>> +Assert(classform->relminmxid == minmulti);
>> +Assert(classform->relpersistence == persistence);

> Hm. Could we additionally assert that we're dealing with an index?

Will do.

>> +/* These changes are safe even for a mapped relation */

> You'd probably have noticed that, but this one probably has to go.

Ah, right.  As I said, I'd not paid much attention to the comments yet.

I just finished a successful run of the core regression tests with CCA.
Given the calendar, I think that's about as much CCA testing as I should
do personally.  I'll make a cleanup pass on this patch and try to get it
pushed within a few hours, if there are not objections.

How do you feel about the other patch to rejigger the order of operations
in CommandCounterIncrement?  I think that's a bug, but it's probably
noncritical for most people.  What I'm leaning towards for that one is
waiting till after the minor releases, then pushing it to all branches.

regards, tom lane




Re: Heap lock levels for REINDEX INDEX CONCURRENTLY not quite right?

2019-05-02 Thread Peter Eisentraut
On 2019-05-02 16:33, Andres Freund wrote:
> And RangeVarCallbackForReindexIndex() pretty clearly sets it *heapOid:
> 
>* Lock level here should match reindex_index() heap lock. If 
> the OID
>* isn't valid, it means the index was concurrently dropped, 
> which is
>* not a problem for us; just return normally.
>*/
>   *heapOid = IndexGetRelation(relId, true);

It sets it but uses it only internally.  There is no code path that
passes in a non-zero heapOid, and there is no code path that does
anything with the heapOid passed back out.

-- 
Peter Eisentraut  http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: Heap lock levels for REINDEX INDEX CONCURRENTLY not quite right?

2019-05-02 Thread Andres Freund
Hi,

On 2019-05-02 22:39:15 +0200, Peter Eisentraut wrote:
> On 2019-05-02 16:33, Andres Freund wrote:
> > And RangeVarCallbackForReindexIndex() pretty clearly sets it *heapOid:
> > 
> >  * Lock level here should match reindex_index() heap lock. If 
> > the OID
> >  * isn't valid, it means the index was concurrently dropped, 
> > which is
> >  * not a problem for us; just return normally.
> >  */
> > *heapOid = IndexGetRelation(relId, true);
> 
> It sets it but uses it only internally.  There is no code path that
> passes in a non-zero heapOid, and there is no code path that does
> anything with the heapOid passed back out.

RangeVarGetRelidExtended() can call the callback multiple times if
there are any concurrent schema changes. That's why it unlocks the
previously locked heap OID.
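
To sketch the pattern being described (simplified and hedged: this is not a
verbatim copy of RangeVarCallbackForReindexIndex(), just an illustration of
how such a callback has to cope with being invoked more than once):

#include "postgres.h"
#include "catalog/index.h"
#include "nodes/primnodes.h"
#include "storage/lmgr.h"

/*
 * Callback for RangeVarGetRelidExtended().  It can run repeatedly if the
 * name is re-resolved after a concurrent schema change, so any lock taken
 * for a previous candidate has to be released before locking the new one.
 */
static void
sketch_reindex_index_callback(const RangeVar *relation, Oid relId,
                              Oid oldRelId, void *arg)
{
    Oid *heapOid = (Oid *) arg;    /* caller initializes this to InvalidOid */

    /*
     * If we locked a heap for a previous candidate index and the name now
     * resolves to something else, release the now-useless lock.
     */
    if (relId != oldRelId && OidIsValid(*heapOid))
    {
        UnlockRelationOid(*heapOid, ShareLock);
        *heapOid = InvalidOid;
    }

    /* For a new candidate, look up and lock the table owning the index. */
    if (OidIsValid(relId) && relId != oldRelId)
    {
        *heapOid = IndexGetRelation(relId, true);
        if (OidIsValid(*heapOid))
            LockRelationOid(*heapOid, ShareLock);
    }
}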

Greetings,

Andres Freund




Re: Identity columns should own only one sequence

2019-05-02 Thread Peter Eisentraut
On 2019-04-29 18:28, Laurenz Albe wrote:
> I still think that there is merit to Michael's idea of removing
> sequence "ownership" (which is just a dependency) when the DEFAULT
> on the column is dropped, but this approach is possibly cleaner.

I think the proper way to address this would be to create some kind of
dependency between the sequence and the default.

-- 
Peter Eisentraut  http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: New vacuum option to do only freezing

2019-05-02 Thread Alvaro Herrera
Pushed.  I added one comment to explain xmin_frozen as well, which
otherwise seemed a bit mysterious.  I did not backpatch, so 9.6-11
remain a bit different; backpatching should be pretty innocuous, but
I'm not sure it's a good idea at this point.

-- 
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: ERROR: failed to add item to the index page

2019-05-02 Thread Andreas Joseph Krogh
On Thursday, 2 May 2019 at 21:38:02, Peter Geoghegan wrote:
 > Pushed, though final version does the test a little differently. It
 > adds the required heap TID space to itupsz, rather than subtracting it
 > from pgspc. This is actually representative of the underlying logic,
 > and avoids unsigned underflow. Thanks! 
 --
 Andreas Joseph Krogh

Re: ERROR: failed to add item to the index page

2019-05-02 Thread Peter Geoghegan
On Tue, Apr 30, 2019 at 6:28 PM Peter Geoghegan  wrote:
> Attached is a much more polished version of the same patch. I tried to
> make clear how the "page full" test (the test that has been fixed to
> take heap TID space for high key into account) is related to other
> close-by code, such as the tuple space limit budget within
> _bt_check_third_page(), and the code that sets up an actual call to
> _bt_truncate().

Pushed, though final version does the test a little differently. It
adds the required heap TID space to itupsz, rather than subtracting it
from pgspc. This is actually representative of the underlying logic,
and avoids unsigned underflow.
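
As a standalone illustration of the hazard being avoided (made-up numbers,
nothing from the actual nbtree code):

#include <stdio.h>
#include <stdbool.h>
#include <stddef.h>

int
main(void)
{
    size_t pgspc = 4;       /* free space left on the page */
    size_t itupsz = 16;     /* incoming index tuple size */
    size_t tidspc = 8;      /* extra room reserved for a heap TID */

    /*
     * Subtracting from the space budget is the risky shape: when
     * tidspc > pgspc, the unsigned subtraction wraps around to a huge
     * value and the "does it fit?" test wrongly succeeds.
     */
    bool fits_bad = itupsz <= pgspc - tidspc;

    /* Adding to the requirement instead cannot wrap for sane page sizes. */
    bool fits_good = itupsz + tidspc <= pgspc;

    printf("buggy test says fits=%d, fixed test says fits=%d\n",
           fits_bad, fits_good);
    return 0;
}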

-- 
Peter Geoghegan




Re: How to estimate the shared memory size required for parallel scan?

2019-05-02 Thread Thomas Munro
On Thu, May 2, 2019 at 8:06 PM Thomas Munro  wrote:
> I was pinged off-list by someone who is working on a parallel-aware
> FDW, who asked if I still had the test code I mentioned up-thread.
> While digging that out, I couldn't resist hacking it a bit more until
> it gave the right answers, only sooner:

Here's a version that fixes an assertion failure when the regression
test tries to UPDATE a file_fdw table.  That'll teach me to turn off
assertions for benchmarking, and then change something, and then post
a patch...  Added to commitfest.

-- 
Thomas Munro
https://enterprisedb.com


0001-Make-file_fdw-parallel-aware-v2.patch
Description: Binary data


Re: pg_upgrade --clone error checking

2019-05-02 Thread Jeff Janes
On Thu, May 2, 2019 at 12:28 PM Alvaro Herrera wrote:

> On 2019-May-02, Jeff Janes wrote:
>
> >
> > When something is doomed to fail, we should report the failure as early as
> > feasibly detectable.
>
> I agree -- this check should be done before checking the database
> contents.  Maybe even before "Checking cluster versions".
>

It looks like it was designed for early checking; it just wasn't placed
early enough.  So changing it is pretty easy, as check_file_clone does not
need to be invented, and there is no additional code duplication over what
was already there.

This patch moves the checking to near the beginning.

It carries the --link mode checking along with it.  That should be done as
well, and doing it as a separate patch would make both patches uglier.

Cheers,

Jeff


pg_upgrade_filesystem.patch
Description: Binary data


Re: REINDEX INDEX results in a crash for an index of pg_class since 9.6

2019-05-02 Thread Andres Freund
Hi,

On 2019-05-02 12:59:55 -0400, Tom Lane wrote:
> I wrote:
> > Andres Freund  writes:
> >> On 2019-05-02 11:41:28 -0400, Tom Lane wrote:
> >>> But don't we need to worry about resetting relfrozenxid?
> 
> >> Indexes don't have that though? We couldn't do it for pg_class itself,
> >> but that's not a problem here.
> 
> > Hmm.  Again, that seems like the sort of assumption that could bite
> > us later.  But maybe we could add some assertions that the new values
> > match the old?  I'll experiment.

index_create() has
Assert(relfrozenxid == InvalidTransactionId);
Assert(relminmxid == InvalidMultiXactId);

I think we should just add the same to reindex_index() (modulo accessing
relcache rather than local vars, of course)?


> Huh, this actually seems to work.  The attached is a quick hack for
> testing.  It gets through check-world straight up and with
> xxx_FORCE_RELEASE, and I've verified reindexing pg_class works with
> CLOBBER_CACHE_ALWAYS, but it'll be a few hours before I have a full CCA
> run.

Great.


> One interesting thing that turns up in check-world is that if wal_level
> is minimal, we have to manually force an XID to be assigned, else
> reindexing pg_class fails with "cannot commit a transaction that deleted
> files but has no xid" :-(.  Perhaps there's some other cleaner place to
> do that?

Hm. We could replace that RecordTransactionCommit() with an xid
assignment or such. But that seems at least as fragile. Or we could
expand the logic we have for LogStandbyInvalidations() a few lines below
the elog to also be able to handle files.  IIRC that was introduced to
handle somewhat related issues about being able to run VACUUM
(containing invalidations) without an xid.


> +  * If we're dealing with a mapped index, pg_class.relfilenode doesn't
> +  * change; instead we have to send the update to the relation mapper.
> +  *
> +  * For mapped indexes, we don't actually change the pg_class entry at 
> all;
> +  * this is essential when reindexing pg_class itself.  That leaves us 
> with
> +  * possibly-inaccurate values of relpages etc, but those will be fixed 
> up
> +  * later.
>*/
>   if (RelationIsMapped(relation))
> + {
> + /* Since we're not updating pg_class, these had better not 
> change */
> + Assert(classform->relfrozenxid == freezeXid);
> + Assert(classform->relminmxid == minmulti);
> + Assert(classform->relpersistence == persistence);

Hm. Could we additionally assert that we're dealing with an index? The
above checks will trigger for tables right now, but I'm not sure that'll
always be the case.


> + /*
> +  * In some code paths it's possible that the tuple update we'd
> +  * otherwise do here is the only thing that would assign an XID 
> for
> +  * the current transaction.  However, we must have an XID to 
> delete
> +  * files, so make sure one is assigned.
> +  */
> + (void) GetCurrentTransactionId();

Not pretty, but seems tolerable.


> - /* These changes are safe even for a mapped relation */
> - if (relation->rd_rel->relkind != RELKIND_SEQUENCE)
> - {
> - classform->relpages = 0;/* it's empty until further 
> notice */
> - classform->reltuples = 0;
> - classform->relallvisible = 0;
> - }
> - classform->relfrozenxid = freezeXid;
> - classform->relminmxid = minmulti;
> - classform->relpersistence = persistence;
> + /* These changes are safe even for a mapped relation */

You'd probably have noticed that, but this one probably has to go.

Greetings,

Andres Freund




Re: REINDEX INDEX results in a crash for an index of pg_class since 9.6

2019-05-02 Thread Tom Lane
I wrote:
> Andres Freund  writes:
>> On 2019-05-02 11:41:28 -0400, Tom Lane wrote:
>>> But don't we need to worry about resetting relfrozenxid?

>> Indexes don't have that though? We couldn't do it for pg_class itself,
>> but that's not a problem here.

> Hmm.  Again, that seems like the sort of assumption that could bite
> us later.  But maybe we could add some assertions that the new values
> match the old?  I'll experiment.

Huh, this actually seems to work.  The attached is a quick hack for
testing.  It gets through check-world straight up and with
xxx_FORCE_RELEASE, and I've verified reindexing pg_class works with
CLOBBER_CACHE_ALWAYS, but it'll be a few hours before I have a full CCA
run.

One interesting thing that turns up in check-world is that if wal_level
is minimal, we have to manually force an XID to be assigned, else
reindexing pg_class fails with "cannot commit a transaction that deleted
files but has no xid" :-(.  Perhaps there's some other cleaner place to
do that?

If we go down this path, we should remove RelationSetIndexList altogether
(in HEAD), but I've not done so here.  The comments likely need more
work too.

I have to go out and do some errands for the next few hours, so I can't
push this forward any more right now.

regards, tom lane

diff --git a/src/backend/catalog/index.c b/src/backend/catalog/index.c
index ce02410..508a04e 100644
--- a/src/backend/catalog/index.c
+++ b/src/backend/catalog/index.c
@@ -3330,20 +3330,15 @@ reindex_index(Oid indexId, bool skip_constraint_checks, char persistence,
 		indexInfo->ii_ExclusionStrats = NULL;
 	}
 
-	/*
-	 * Build a new physical relation for the index. Need to do that before
-	 * "officially" starting the reindexing with SetReindexProcessing -
-	 * otherwise the necessary pg_class changes cannot be made with
-	 * encountering assertions.
-	 */
-	RelationSetNewRelfilenode(iRel, persistence);
-
 	/* ensure SetReindexProcessing state isn't leaked */
 	PG_TRY();
 	{
 		/* Suppress use of the target index while rebuilding it */
 		SetReindexProcessing(heapId, indexId);
 
+		/* Build a new physical relation for the index */
+		RelationSetNewRelfilenode(iRel, persistence);
+
 		/* Initialize the index and rebuild */
 		/* Note: we do not need to re-establish pkey setting */
 		index_build(heapRelation, iRel, indexInfo, true, true);
@@ -3491,7 +3486,6 @@ reindex_relation(Oid relid, int flags, int options)
 	Relation	rel;
 	Oid			toast_relid;
 	List	   *indexIds;
-	bool		is_pg_class;
 	bool		result;
 	int			i;
 
@@ -3527,32 +3521,8 @@ reindex_relation(Oid relid, int flags, int options)
 	 */
 	indexIds = RelationGetIndexList(rel);
 
-	/*
-	 * reindex_index will attempt to update the pg_class rows for the relation
-	 * and index.  If we are processing pg_class itself, we want to make sure
-	 * that the updates do not try to insert index entries into indexes we
-	 * have not processed yet.  (When we are trying to recover from corrupted
-	 * indexes, that could easily cause a crash.) We can accomplish this
-	 * because CatalogTupleInsert/CatalogTupleUpdate will use the relcache's
-	 * index list to know which indexes to update. We just force the index
-	 * list to be only the stuff we've processed.
-	 *
-	 * It is okay to not insert entries into the indexes we have not processed
-	 * yet because all of this is transaction-safe.  If we fail partway
-	 * through, the updated rows are dead and it doesn't matter whether they
-	 * have index entries.  Also, a new pg_class index will be created with a
-	 * correct entry for its own pg_class row because we do
-	 * RelationSetNewRelfilenode() before we do index_build().
-	 */
-	is_pg_class = (RelationGetRelid(rel) == RelationRelationId);
-
-	/* Ensure rd_indexattr is valid; see comments for RelationSetIndexList */
-	if (is_pg_class)
-		(void) RelationGetIndexAttrBitmap(rel, INDEX_ATTR_BITMAP_ALL);
-
 	PG_TRY();
 	{
-		List	   *doneIndexes;
 		ListCell   *indexId;
 		char		persistence;
 
@@ -3580,15 +3550,11 @@ reindex_relation(Oid relid, int flags, int options)
 			persistence = rel->rd_rel->relpersistence;
 
 		/* Reindex all the indexes. */
-		doneIndexes = NIL;
 		i = 1;
 		foreach(indexId, indexIds)
 		{
 			Oid			indexOid = lfirst_oid(indexId);
 
-			if (is_pg_class)
-RelationSetIndexList(rel, doneIndexes);
-
 			reindex_index(indexOid, !(flags & REINDEX_REL_CHECK_CONSTRAINTS),
 		  persistence, options);
 
@@ -3597,9 +3563,6 @@ reindex_relation(Oid relid, int flags, int options)
 			/* Index should no longer be in the pending list */
 			Assert(!ReindexIsProcessingIndex(indexOid));
 
-			if (is_pg_class)
-doneIndexes = lappend_oid(doneIndexes, indexOid);
-
 			/* Set index rebuild count */
 			pgstat_progress_update_param(PROGRESS_CLUSTER_INDEX_REBUILD_COUNT,
 		 i);
@@ -3615,9 +3578,6 @@ reindex_relation(Oid relid, int flags, int options)
 	PG_END_TRY();
 	ResetReindexPending();
 
-	if (is_pg_class)
-		RelationSetIndexList(rel, indexIds);
-
 	/*
 	 * Close rel

Re: pgbench - add option to show actual builtin script code

2019-05-02 Thread Fabien COELHO




The patch is good now.

The new status of this patch is: Ready for Committer


Ok, thanks.

--
Fabien.




Re: pg_upgrade --clone error checking

2019-05-02 Thread Alvaro Herrera
On 2019-May-02, Jeff Janes wrote:

> I think the error message wording is OK, but I think it should be thrown
> earlier, before the "Creating dump of database schemas" (which can take a
> long time), and preferably before either database is even started.  So
> ideally it would be something like:
> 
> 
> Performing Consistency Checks
> -
> Checking cluster versions
> Checking file cloning support
> File cloning not supported on this platform
> Failure, exiting
> 
> 
> When something is doomed to fail, we should report the failure as early as
> feasibly detectable.

I agree -- this check should be done before checking the database
contents.  Maybe even before "Checking cluster versions".

-- 
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: pg_upgrade --clone error checking

2019-05-02 Thread Jeff Janes
On Thu, May 2, 2019 at 11:57 AM Peter Eisentraut <peter.eisentr...@2ndquadrant.com> wrote:

> On 2019-05-01 22:10, Jeff Janes wrote:
> > With the new pg_upgrade --clone, if we are going to end up throwing the
> > error "file cloning not supported on this platform" (which seems to
> > depend only on ifdefs) I think we should throw it first thing, before
> > any other checks are done and certainly before pg_dump gets run.
>
> Could you explain in more detail what command you are running, what
> messages you are getting, and what you would like to see instead?
>

I'm running:

pg_upgrade --clone -b /home/jjanes/pgsql/REL9_6_12/bin/ -B
/home/jjanes/pgsql/origin_jit/bin/ -d /home/jjanes/pgsql/data_96/ -D
/home/jjanes/pgsql/data_clone/

And I get:

Performing Consistency Checks
-
Checking cluster versions   ok
Checking database user is the install user  ok
Checking database connection settings   ok
Checking for prepared transactions  ok
Checking for reg* data types in user tables ok
Checking for contrib/isn with bigint-passing mismatch   ok
Checking for tables WITH OIDS   ok
Checking for invalid "unknown" user columns ok
Creating dump of global objects ok
Creating dump of database schemas
ok
Checking for presence of required libraries ok

file cloning not supported on this platform
Failure, exiting

I think the error message wording is OK, but I think it should be thrown
earlier, before the "Creating dump of database schemas" (which can take a
long time), and preferably before either database is even started.  So
ideally it would be something like:


Performing Consistency Checks
-
Checking cluster versions
Checking file cloning support
File cloning not supported on this platform
Failure, exiting


When something is doomed to fail, we should report the failure as early as
feasibly detectable.

Cheers,

Jeff


Re: Attempt to consolidate reading of XLOG page

2019-05-02 Thread Antonin Houska
Antonin Houska  wrote:

> Alvaro Herrera  wrote:
> 
> > Not sure about the walsize; maybe it can be a member in XLogReadPos, and
> > given to XLogReadInitPos()?  (Maybe rename XLogReadPos as
> > XLogReadContext or something like that, indicating it's not just the
> > read position.)
> 
> As pointed out by others, XLogReadPos is not necessary. So if XLogRead()
> receives XLogReaderState instead, it can get the segment size from there.

Eventually I found out that it's good to have a separate structure for the
read position, because walsender calls the XLogRead() function directly, not
via the XLOG reader. Currently the structure is named XLogSegment (maybe
someone can propose a better name) and it's a member of XLogReaderState. No
field of the new structure is duplicated now.

The next version of the patch is attached.

-- 
Antonin Houska
Web: https://www.cybertec-postgresql.com

This change is not necessary for the XLogRead() refactoring itself, but I
noticed the problem while working on it. Not sure it's worth a separate CF
entry.

diff --git a/src/backend/access/transam/xlogreader.c b/src/backend/access/transam/xlogreader.c
index 9196aa3aae..8cb551a837 100644
--- a/src/backend/access/transam/xlogreader.c
+++ b/src/backend/access/transam/xlogreader.c
@@ -35,6 +35,7 @@ static bool ValidXLogRecordHeader(XLogReaderState *state, XLogRecPtr RecPtr,
 	  XLogRecPtr PrevRecPtr, XLogRecord *record, bool randAccess);
 static bool ValidXLogRecord(XLogReaderState *state, XLogRecord *record,
 XLogRecPtr recptr);
+static void XLogReaderInvalReadState(XLogReaderState *state);
 static int ReadPageInternal(XLogReaderState *state, XLogRecPtr pageptr,
  int reqLen);
 static void report_invalid_record(XLogReaderState *state, const char *fmt,...) pg_attribute_printf(2, 3);
@@ -620,7 +621,7 @@ err:
 /*
  * Invalidate the xlogreader's read state to force a re-read.
  */
-void
+static void
 XLogReaderInvalReadState(XLogReaderState *state)
 {
 	state->readSegNo = 0;
diff --git a/src/include/access/xlogreader.h b/src/include/access/xlogreader.h
index f3bae0bf49..2d3c067135 100644
--- a/src/include/access/xlogreader.h
+++ b/src/include/access/xlogreader.h
@@ -212,9 +212,6 @@ extern struct XLogRecord *XLogReadRecord(XLogReaderState *state,
 extern bool XLogReaderValidatePageHeader(XLogReaderState *state,
 			 XLogRecPtr recptr, char *phdr);
 
-/* Invalidate read state */
-extern void XLogReaderInvalReadState(XLogReaderState *state);
-
 #ifdef FRONTEND
 extern XLogRecPtr XLogFindNextRecord(XLogReaderState *state, XLogRecPtr RecPtr);
 #endif			/* FRONTEND */
The timeline information is available to the caller via XLogReaderState. Now
that XLogRead() is going to be (sometimes) responsible for determining the TLI,
it would have to have the (TimeLineID *) argument added too, just to be
consistent with the current coding style. Since XLogRead() also updates other
position-specific fields of XLogReaderState, it seems simpler if we remove the
output argument from XLogPageReadCB and always report the TLI via
XLogReaderState.

diff --git a/src/backend/access/transam/xlog.c b/src/backend/access/transam/xlog.c
index 19b32e21df..5723aa54a7 100644
--- a/src/backend/access/transam/xlog.c
+++ b/src/backend/access/transam/xlog.c
@@ -886,8 +886,7 @@ static int XLogFileRead(XLogSegNo segno, int emode, TimeLineID tli,
 			 int source, bool notfoundOk);
 static int	XLogFileReadAnyTLI(XLogSegNo segno, int emode, int source);
 static int XLogPageRead(XLogReaderState *xlogreader, XLogRecPtr targetPagePtr,
-			 int reqLen, XLogRecPtr targetRecPtr, char *readBuf,
-			 TimeLineID *readTLI);
+			 int reqLen, XLogRecPtr targetRecPtr, char *readBuf);
 static bool WaitForWALToBecomeAvailable(XLogRecPtr RecPtr, bool randAccess,
 			bool fetching_ckpt, XLogRecPtr tliRecPtr);
 static int	emode_for_corrupt_record(int emode, XLogRecPtr RecPtr);
@@ -11509,7 +11508,7 @@ CancelBackup(void)
  */
 static int
 XLogPageRead(XLogReaderState *xlogreader, XLogRecPtr targetPagePtr, int reqLen,
-			 XLogRecPtr targetRecPtr, char *readBuf, TimeLineID *readTLI)
+			 XLogRecPtr targetRecPtr, char *readBuf)
 {
 	XLogPageReadPrivate *private =
 	(XLogPageReadPrivate *) xlogreader->private_data;
@@ -11626,7 +11625,7 @@ retry:
 	Assert(targetPageOff == readOff);
 	Assert(reqLen <= readLen);
 
-	*readTLI = curFileTLI;
+	xlogreader->readPageTLI = curFileTLI;
 
 	/*
 	 * Check the page header immediately, so that we can retry immediately if
diff --git a/src/backend/access/transam/xlogreader.c b/src/backend/access/transam/xlogreader.c
index 8cb551a837..244fc7d634 100644
--- a/src/backend/access/transam/xlogreader.c
+++ b/src/backend/access/transam/xlogreader.c
@@ -558,7 +558,7 @@ ReadPageInternal(XLogReaderState *state, XLogRecPtr pageptr, int reqLen)
 
 		readLen = state->read_page(state, targetSegmentPtr, XLOG_BLCKSZ,
    state->currRecPtr,
-   state->readBuf, &state->readPageTLI);
+   state->readBuf);
 		if (readLen < 0)
 			goto err;
 
@@ -576,7 +576,7

Re: New vacuum option to do only freezing

2019-05-02 Thread Alvaro Herrera
On 2019-May-01, Andres Freund wrote:

> Alvaro, could we perhaps clean this up a bit? This is pretty confusing
> looking.  I think this probably could just be changed to
> 
> boolxmin_frozen = false;
> 
> xid = HeapTupleHeaderGetXmin(tuple);
> 
> if (xid == FrozenTransactionId)
> xmin_frozen = true;
> else if (TransactionIdIsNormal(xid))
> 
> or somesuch.  There's no need to check for
> HeapTupleHeaderXminFrozen(tuple) etc, because HeapTupleHeaderGetXmin()
> already does so - and if it didn't, the issue Tom points out would be
> problematic.

Ah, yeah, that's simpler.  I would like to introduce a couple of very
minor changes to the proposed style, per the attached.

* don't initialize xmin_frozen at all; rather, only set its value to the
correct one when we have determined what it is.  Doing premature
initialization is what led to some of those old bugs, so I prefer not to
do it.

* Handle the BootstrapXid and InvalidXid cases explicitly, by setting
xmin_frozen to true when xmin is not normal.  After all, those XID
values do not need any freezing.

-- 
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
From 1cca0581655614703b020facbbf8d64b442f95be Mon Sep 17 00:00:00 2001
From: Alvaro Herrera 
Date: Thu, 2 May 2019 11:50:18 -0400
Subject: [PATCH] heap_prepare_freeze_tuple: Simplify coding

---
 src/backend/access/heap/heapam.c | 10 +-
 1 file changed, 5 insertions(+), 5 deletions(-)

diff --git a/src/backend/access/heap/heapam.c b/src/backend/access/heap/heapam.c
index a05b6a07ad0..d6ceb8fd4a4 100644
--- a/src/backend/access/heap/heapam.c
+++ b/src/backend/access/heap/heapam.c
@@ -6108,9 +6108,9 @@ heap_prepare_freeze_tuple(HeapTupleHeader tuple,
 
 	/* Process xmin */
 	xid = HeapTupleHeaderGetXmin(tuple);
-	xmin_frozen = ((xid == FrozenTransactionId) ||
-   HeapTupleHeaderXminFrozen(tuple));
-	if (TransactionIdIsNormal(xid))
+	if (!TransactionIdIsNormal(xid))
+		xmin_frozen = true;
+	else
 	{
 		if (TransactionIdPrecedes(xid, relfrozenxid))
 			ereport(ERROR,
@@ -6118,7 +6118,8 @@ heap_prepare_freeze_tuple(HeapTupleHeader tuple,
 	 errmsg_internal("found xmin %u from before relfrozenxid %u",
 	 xid, relfrozenxid)));
 
-		if (TransactionIdPrecedes(xid, cutoff_xid))
+		xmin_frozen = TransactionIdPrecedes(xid, cutoff_xid);
+		if (xmin_frozen)
 		{
 			if (!TransactionIdDidCommit(xid))
 ereport(ERROR,
@@ -6128,7 +6129,6 @@ heap_prepare_freeze_tuple(HeapTupleHeader tuple,
 
 			frz->t_infomask |= HEAP_XMIN_FROZEN;
 			changed = true;
-			xmin_frozen = true;
 		}
 	}
 
-- 
2.17.1



Re: New vacuum option to do only freezing

2019-05-02 Thread Robert Haas
On Thu, May 2, 2019 at 10:28 AM Andres Freund  wrote:
> It'd be good if somebody could make a pass over the safety mechanisms in
> heap_prepare_freeze_tuple(). I added them at some point, after a data
> corrupting bug related to freezing, but they really were more intended
> as a secondary layer of defense, rather than the primary one.  My
> understanding is still that we assume that we never should reach
> heap_prepare_freeze_tuple() for something that is below the horizon,
> even after this change, right?

I think so.  This code is hit if the tuple wasn't dead yet at the time
that heap_page_prune() decided not to prune it, but it is dead by the
time we retest the tuple status.  But the same value of OldestXmin was
used for both checks, so the only way that can really happen is if the
XID in the tuple header was running before and is no longer running.
However, if the XID was running at the time that heap_page_prune()
ran, then OldestXmin certainly can't be newer than that XID.  And
therefore the value we're intending to set for relfrozenxid has surely
got to be older, so we could hardly prune using OldestXmin as the
threshold and then set relfrozenxid to a newer XID.

Actually, I now believe that my original concern here was exactly
backwards.  Prior to the logic change in this commit, with index
vacuum disabled, a tuple that becomes dead between the
heap_page_prune() check and the lazy_scan_heap() check would have
followed the tupgone = true path.  That would cause it to be treated
as if the second vacuum pass were going to remove it - i.e. not
frozen.  But with index cleanup disabled, there will never be a second
vacuum pass.  So a tuple that was actually being kept was not sent to
heap_prepare_freeze_tuple(), with basically no justification.
Imagine for example that the tuple was not old in terms of its XID
age, but its MXID age was somehow ancient.  Then we'd fail to freeze
it on the theory that it was going to be removed, but yet not remove
it.  Oops.  The revised logic - which is as per Tom's suggestion -
does the right thing, which is treat the tuple as one we've chosen to
keep.

While looking at this code, I think I may have spotted another bug, or
at least a near-bug. lazy_record_dead_tuple() thinks it's OK to just
forget about dead tuples if there's not enough memory, which I think
is OK in the normal case where the dead tuple has been truncated to a
dead line pointer.  But in the tupgone = true case it's NOT ok,
because in that case we're leaving behind not just a dead line pointer
but an actual tuple which we have declined to freeze on the assumption
that it will be removed later.  But if lazy_record_dead_tuple()
forgets about it, then it won't be removed later.  That could lead to
tuples remaining that precede the freeze horizons.  The reason why
this may be only a near-bug is that we seem to try pretty hard to make
sure that we'll never call that function in the first place without
enough space being available.  Still, it seems to me that it would be
safer to change the code to look like:

if (vacrelstats->num_dead_tuples >= vacrelstats->max_dead_tuples)
elog(ERROR, "oh crap");

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: REINDEX INDEX results in a crash for an index of pg_class since 9.6

2019-05-02 Thread Tom Lane
Andres Freund  writes:
> On 2019-05-02 11:41:28 -0400, Tom Lane wrote:
>> But don't we need to worry about resetting relfrozenxid?

> Indexes don't have that though? We couldn't do it for pg_class itself,
> but that's not a problem here.

Hmm.  Again, that seems like the sort of assumption that could bite
us later.  But maybe we could add some assertions that the new values
match the old?  I'll experiment.

regards, tom lane




Re: pg_upgrade --clone error checking

2019-05-02 Thread Peter Eisentraut
On 2019-05-01 22:10, Jeff Janes wrote:
> With the new pg_upgrade --clone, if we are going to end up throwing the
> error "file cloning not supported on this platform" (which seems to
> depend only on ifdefs) I think we should throw it first thing, before
> any other checks are done and certainly before pg_dump gets run.

Could you explain in more detail what command you are running, what
messages you are getting, and what you would like to see instead?

-- 
Peter Eisentraut  http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: REINDEX INDEX results in a crash for an index of pg_class since 9.6

2019-05-02 Thread Andres Freund
Hi,

On 2019-05-02 11:41:28 -0400, Tom Lane wrote:
> Andres Freund  writes:
> > ISTM that if we go down this path, we should split (not now, but either
> > still in v12, or *early* in v13) the sets of indexes that are intended
> > a) not to be used for catalog queries and b) to be skippable for index
> > insertions. It seems pretty likely that somebody will otherwise soon
> > introduce a heap_update() somewhere into the index build process, and
> > it'll work just fine in testing due to HOT.
> 
> Given the assertions you added in CatalogIndexInsert, I'm not sure
> why that's a big hazard?

Afaict the new RelationSetIndexList() trickery would prevent that
assertion from being reached, because RelationGetIndexList() will not
see the current index, and therefore CatalogIndexInsert() won't know to
assert it either.  It kinda works today for reindex_relation(), because
we'll "un-hide" the already rebuilt indexes - i.e. we'd not notice the
bug on pg_class' first index, but for later ones it'd trigger.  I guess
you could argue that we'll just have to rely on REINDEX TABLE pg_class
regression tests to make sure REINDEX INDEX pg_class_* ain't broken :/.


> > I kinda wonder if there's not a third approach hiding somewhere here. We
> > could just stop updating pg_class in RelationSetNewRelfilenode() in
> > pg_class, when it's an index on pg_class.
> 
> Hmm ... are all those indexes mapped?  I guess so.

They are:

postgres[13357][1]=# SELECT oid::regclass, relfilenode FROM pg_class WHERE oid 
IN (SELECT indexrelid FROM pg_index WHERE indrelid = 'pg_class'::regclass);
┌───┬─┐
│oid│ relfilenode │
├───┼─┤
│ pg_class_oid_index│   0 │
│ pg_class_relname_nsp_index│   0 │
│ pg_class_tblspc_relfilenode_index │   0 │
└───┴─┘
(3 rows)

I guess that doesn't strictly have to be the case for at least some of
them, but it seems unlikely that we'd want to change that.


> But don't we need to worry about resetting relfrozenxid?

Indexes don't have that though? We couldn't do it for pg_class itself,
but that's not a problem here.

Greetings,

Andres Freund




Re: Why is infinite_recurse test suddenly failing?

2019-05-02 Thread Tom Lane
Andres Freund  writes:
> Hm, I just noticed:
>    'HEAD' => [
>        'force_parallel_mode = regress'
>    ]

Oooh, I didn't see that.

> on all those animals. So it's not necessarily the case that HEAD and
> backbranch runs are behaving all that identical.  Note that isn't a
> recent config change, so it's not an explanation as to why they started
> to fail only recently.

No, but it does point at another area of the code in which a relevant
change could've occurred.

While we're looking at this --- Mark, if you could install gdb
on your buildfarm hosts, that would be really handy.  I think that's
the only extra thing the buildfarm script needs to extract stack
traces from core dumps.  We'd likely already know where the problem
is if we had a stack trace ...

regards, tom lane




Re: REINDEX INDEX results in a crash for an index of pg_class since 9.6

2019-05-02 Thread Tom Lane
Andres Freund  writes:
> ISTM that if we go down this path, we should split (not now, but either
> still in v12, or *early* in v13) the sets of indexes that are intended
> a) not to be used for catalog queries and b) to be skippable for index
> insertions. It seems pretty likely that somebody will otherwise soon
> introduce a heap_update() somewhere into the index build process, and
> it'll work just fine in testing due to HOT.

Given the assertions you added in CatalogIndexInsert, I'm not sure
why that's a big hazard?

> I kinda wonder if there's not a third approach hiding somewhere here. We
> could just stop updating pg_class in RelationSetNewRelfilenode() in
> pg_class, when it's an index on pg_class.

Hmm ... are all those indexes mapped?  I guess so.  But don't we need
to worry about resetting relfrozenxid?

regards, tom lane




Re: Why is infinite_recurse test suddenly failing?

2019-05-02 Thread Andres Freund
Hi,

On 2019-05-02 11:02:03 -0400, Tom Lane wrote:
> In the past week, four different buildfarm members have shown
> non-reproducing segfaults in the "select infinite_recurse()"
> test case, rather than the expected detection of stack overrun
> before we get to the point of a segfault.

I was just staring at bonito's failure in confusion.


> https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=bonito&dt=2019-05-01%2023%3A05%3A36
> https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=takin&dt=2019-05-01%2008%3A16%3A48
> https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=buri&dt=2019-04-27%2023%3A54%3A46
> https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=demoiselle&dt=2019-04-27%2014%3A55%3A52
> 
> They're all on HEAD, and they all look like
> 
> 2019-05-01 23:11:00.145 UTC [13933:65] LOG:  server process (PID 17161) was 
> terminated by signal 11: Segmentation fault
> 2019-05-01 23:11:00.145 UTC [13933:66] DETAIL:  Failed process was running: 
> select infinite_recurse();
> 
> I scraped the buildfarm database and verified that there are no similar
> failures for at least three months back; nor, offhand, can I remember ever
> having seen this test fail in many years.  So it seems we broke something
> recently.  No idea what though.

I can't recall any recent changes to relevant area of code.


> (Another possibility, seeing that these are all members of Mark's PPC64
> flotilla, is that there's some common misconfiguration --- but it's hard
> to credit that such a problem would only affect HEAD not the back
> branches.)

Hm, I just noticed:
   'HEAD' => [
       'force_parallel_mode = regress'
   ]

on all those animals. So it's not necessarily the case that HEAD and
backbranch runs are behaving all that identical.  Note that isn't a
recent config change, so it's not an explanation as to why they started
to fail only recently.

Greetings,

Andres Freund




Re: New vacuum option to do only freezing

2019-05-02 Thread Andres Freund
On 2019-05-02 11:09:10 -0400, Robert Haas wrote:
> If I understand correctly, your first proposal amounts to redefining
> tups_vacuumed to count #2 rather than #1, and your second proposal
> amounts to making tups_vacuumed count #1 + #2 rather than #1.  I
> suggest a third option: have two counters.

+1




Re: REINDEX INDEX results in a crash for an index of pg_class since 9.6

2019-05-02 Thread Andres Freund
Hi,

On 2019-05-02 10:49:00 -0400, Tom Lane wrote:
> Andres Freund  writes:
> > On 2019-05-01 22:01:53 -0400, Tom Lane wrote:
> >> I think that argument is pretty pointless considering that "REINDEX TABLE
> >> pg_class" does it this way, and that code is nearly old enough to
> >> vote.
>
> > IMO the reindex_relation() case isn't comparable.
>
> IMV it's the exact same case: we need to perform a pg_class update while
> one or more of pg_class's indexes shouldn't be touched.  I am kind of
> wondering why it didn't seem to be necessary to cover this for REINDEX
> INDEX back in 2003, but it clearly is necessary now.
>
> > That's not pretty either :(
>
> So, I don't like your patch, you don't like mine.  Anybody else
> want to weigh in?

Well, I think I can live with your fix. I think it's likely to hide
future bugs, but this is an active bug. And, as you say, we don't have a
lot of time.


ISTM that if we go down this path, we should split (not now, but either
still in v12, or *early* in v13) the sets of indexes that are intended
a) not to be used for catalog queries and b) to be skippable for index
insertions. It seems pretty likely that somebody will otherwise soon
introduce a heap_update() somewhere into the index build process, and
it'll work just fine in testing due to HOT.


We already have somewhat separate and half complimentary mechanisms
here:
1) When REINDEX_REL_SUPPRESS_INDEX_USE is set (only cluster.c), we mark
   indexes on tables as unused by SetReindexPending(). That prevents them
   from being used for catalog queries. But it disallows new inserts
   into them.

2) When reindex_index() starts processing an index, it marks it as being
   processed. Indexes on this list are not allowed to be inserted into
   (enforced by assertions).  Note that this currently removes the
   specific index from the list set by 1).

   It also marks the heap as being reindexed, which then triggers (as
   the sole effect, afaict) some special-case logic in
   index_update_stats() that avoids the syscache and opts for a direct
   manual catalog scan.  I'm a bit confused as to why that's necessary.

3) Just for pg_class, reindex_relation() hard-sets the list of
   indexes that are already rebuilt. This allows index insertions into
   the indexes that are later going to be rebuilt - which is
   necessary because we currently update pg_class in
   RelationSetNewRelfilenode().

Seems odd to resort to RelationSetIndexList(), when we could just mildly
extend the SetReindexPending() logic instead.

I kinda wonder if there's not a third approach hiding somewhere here. We
could just stop updating pg_class in RelationSetNewRelfilenode() when
it's an index on pg_class. The pg_class changes done for mapped indexes
aren't really crucial, and are going to be overwritten later by
index_update_stats().  That'd have the big advantage that, afaict, we'd
not need the logic for allowing catalog modifications during the reindex
path at all.


> We do not have the luxury of time to argue about this.  If we commit
> something today, we *might* get a useful set of CLOBBER_CACHE_ALWAYS
> results for all branches by Sunday.

Yea. I think I'll also just trigger a manual CCA run of check-world for
all branches (thank god for old workstations). And CCR for at least a
few crucial bits.


> Those regression tests will have to come out of the back branches on
> Sunday, because we are not shipping minor releases with unstable
> regression tests, and I've heard no proposal for avoiding the
> occasional-deadlock problem.

Yea, I've just proposed the same in a separate thread.

Greetings,

Andres Freund




Re: New vacuum option to do only freezing

2019-05-02 Thread Robert Haas
On Tue, Apr 16, 2019 at 4:00 PM Tom Lane  wrote:
> I wrote:
> > I'm thinking that we really need to upgrade vacuum's reporting totals
> > so that it accounts in some more-honest way for pre-existing dead
> > line pointers.  The patch as it stands has made the reporting even more
> > confusing, rather than less so.
>
> Here's a couple of ideas about that:
>
> 1. Ignore heap_page_prune's activity altogether, on the grounds that
> it's just random chance that any cleanup done there was done during
> VACUUM and not some preceding DML operation.  Make tups_vacuumed
> be the count of dead line pointers removed.  The advantage of this
> way is that tups_vacuumed would become independent of previous
> non-VACUUM pruning activity, as well as preceding index-cleanup-disabled
> VACUUMs.  But maybe it's hiding too much information.
>
> 2. Have heap_page_prune count, and add to tups_vacuumed, only HOT
> tuples that it deleted entirely.  The action of replacing a DEAD
> root tuple with a dead line pointer doesn't count for anything.
> Then also add the count of dead line pointers removed to tups_vacuumed.
>
> Approach #2 is closer to the way we've defined tups_vacuumed up to
> now, but I think it'd be more realistic in cases where previous
> pruning or index-cleanup-disabled vacuums have left lots of dead
> line pointers.
>
> I'm not especially wedded to either of these --- any other ideas?

I think it's almost impossible to have clear reporting here with only
a single counter.  There are two clearly-distinct cleanup operations
going on here: (1) removing tuples from pages, and (2) making dead
line pointers unused so that they can be reused for new tuples.  They
happen in equal quantity when there are no HOT updates: pruning makes
dead tuples into dead line pointers, and then index vacuuming allows
the dead line pointers to be set unused.  But if there are HOT
updates, intermediate tuples in each HOT chain are removed from the
page but the line pointers are directly set to unused, so VACUUM could
remove a lot of tuples but not need to make very many dead line
pointers unused.  On the other hand, the opposite could also happen:
maybe lots of single-page HOT-pruning has happened prior to VACUUM, so
VACUUM has lots of dead line pointers to make unused but removes very
few tuples because that's already been done.

For the most part, tups_vacuumed seems to be intending to count #1
rather than #2. While the comments for heap_page_prune and
heap_prune_chain are not as clear as might be desirable, it appears to
me that those functions are counting tuples removed from a page,
ignoring everything that might happen to line pointers -- so using the
return value of this function is consistent with wanting to count #1.
However, there's one place that seems slightly unclear about this,
namely this comment:

/*
 * DEAD item pointers are to be vacuumed normally; but we don't
 * count them in tups_vacuumed, else we'd be double-counting (at
 * least in the common case where heap_page_prune() just freed up
 * a non-HOT tuple).
 */

If we're counting tuples removed from pages, then it's not merely that
we would be double-counting, but that we would be counting completely
the wrong thing.  However, as far as I can see, that's just an issue
with the comments; the code is in all cases counting tuple removals,
not dead line pointers marked unused.

If I understand correctly, your first proposal amounts to redefining
tups_vacuumed to count #2 rather than #1, and your second proposal
amounts to making tups_vacuumed count #1 + #2 rather than #1.  I
suggest a third option: have two counters.  tups_vacuumed can continue
to count #1, with just a comment adjustment. And then we can add a
second counter which is incremented every time lazy_vacuum_page does
ItemIdSetUnused, which will count #2.
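
A rough sketch of that bookkeeping, with field and function names invented
purely for illustration (this is not from any patch):

#include <stdint.h>

/* Two counters, tracked separately instead of one overloaded total. */
typedef struct VacCounters
{
    int64_t tuples_removed;  /* #1: tuples removed from heap pages by pruning */
    int64_t lp_dead_unused;  /* #2: dead line pointers later marked unused */
} VacCounters;

/* Bumped with the number of tuples pruning removed from a page. */
static inline void
count_tuples_removed(VacCounters *c, int64_t n)
{
    c->tuples_removed += n;
}

/* Bumped once per dead line pointer set unused during index cleanup. */
static inline void
count_lp_set_unused(VacCounters *c)
{
    c->lp_dead_unused++;
}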

Thoughts?

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: REINDEX INDEX results in a crash for an index of pg_class since 9.6

2019-05-02 Thread Tom Lane
Andres Freund  writes:
> On 2019-05-01 00:43:34 -0400, Tom Lane wrote:
>> ...  What it's really currently doing at the
>> moment of the deadlock is cleaning out its temporary schema after the
>> client disconnected.

> I'm inclined to remove the tests from the backbranches, once we've
> committed a fix for the actual REINDEX issue, and most of the farm has
> been through a cycle or three. I don't think we'll figure out how to
> make them robust in time for next week's release.

Yeah, as I just said in my other message, I see no other alternative for
next week's releases.  We can leave the test in place in HEAD a bit
longer, but I don't really want it there for the beta either, unless we
can think of some better plan.

> I don't think we can really rely on the post-disconnect phase completing
> in a particularly deterministic time.

Exactly :-(

>> Another fairly interesting thing is that this log includes the telltale
>> 2019-05-01 05:24:48.887 CEST [97694:7] pg_regress/reindex_catalog CONTEXT:  
>> while checking uniqueness of tuple (12,71) in relation "pg_class"
>> Why did I have to dig to find that information in HEAD?  Have we lost
>> some useful context reporting?  (Note this run is in the v10 branch.)

FWIW, as best I can reconstruct the sequence of events, I might just
not've looked.  I got an error and just assumed it was the same as what
we'd seen in the buildfarm; but now we realize that there were multiple
ways to get deadlocks, and only some of them would have shown this.
For the moment I'm willing to assume this isn't a real issue.

regards, tom lane




Why is infinite_recurse test suddenly failing?

2019-05-02 Thread Tom Lane
In the past week, four different buildfarm members have shown
non-reproducing segfaults in the "select infinite_recurse()"
test case, rather than the expected detection of stack overrun
before we get to the point of a segfault.

https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=bonito&dt=2019-05-01%2023%3A05%3A36
https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=takin&dt=2019-05-01%2008%3A16%3A48
https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=buri&dt=2019-04-27%2023%3A54%3A46
https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=demoiselle&dt=2019-04-27%2014%3A55%3A52

They're all on HEAD, and they all look like

2019-05-01 23:11:00.145 UTC [13933:65] LOG:  server process (PID 17161) was 
terminated by signal 11: Segmentation fault
2019-05-01 23:11:00.145 UTC [13933:66] DETAIL:  Failed process was running: 
select infinite_recurse();

I scraped the buildfarm database and verified that there are no similar
failures for at least three months back; nor, offhand, can I remember ever
having seen this test fail in many years.  So it seems we broke something
recently.  No idea what though.

(Another possibility, seeing that these are all members of Mark's PPC64
flotilla, is that there's some common misconfiguration --- but it's hard
to credit that such a problem would only affect HEAD not the back
branches.)

Thoughts?

regards, tom lane




Re: pgbench - add option to show actual builtin script code

2019-05-02 Thread Ibrar Ahmed
The patch is good now.

The new status of this patch is: Ready for Committer


Re: REINDEX INDEX results in a crash for an index of pg_class since 9.6

2019-05-02 Thread Andres Freund
Hi,

On 2019-05-01 00:43:34 -0400, Tom Lane wrote:
> https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=locust&dt=2019-05-01%2003%3A12%3A13
> 
> The relevant bit of log is
> 
> 2019-05-01 05:24:47.527 CEST [97690:429] pg_regress/create_view LOG:  
> statement: DROP SCHEMA temp_view_test CASCADE;
> 2019-05-01 05:24:47.605 CEST [97690:430] pg_regress/create_view LOG:  
> statement: DROP SCHEMA testviewschm2 CASCADE;
> 2019-05-01 05:24:47.858 CEST [97694:1] [unknown] LOG:  connection received: 
> host=[local]
> 2019-05-01 05:24:47.863 CEST [97694:2] [unknown] LOG:  connection authorized: 
> user=pgbuildfarm database=regression
> 2019-05-01 05:24:47.878 CEST [97694:3] pg_regress/reindex_catalog LOG:  
> statement: REINDEX TABLE pg_class;
> 2019-05-01 05:24:48.887 CEST [97694:4] pg_regress/reindex_catalog ERROR:  
> deadlock detected
> 2019-05-01 05:24:48.887 CEST [97694:5] pg_regress/reindex_catalog DETAIL:  
> Process 97694 waits for ShareLock on transaction 2559; blocked by process 
> 97690.
>   Process 97690 waits for RowExclusiveLock on relation 1259 of database 
> 16387; blocked by process 97694.
>   Process 97694: REINDEX TABLE pg_class;
>   Process 97690: DROP SCHEMA testviewschm2 CASCADE;
> 2019-05-01 05:24:48.887 CEST [97694:6] pg_regress/reindex_catalog HINT:  See 
> server log for query details.
> 2019-05-01 05:24:48.887 CEST [97694:7] pg_regress/reindex_catalog CONTEXT:  
> while checking uniqueness of tuple (12,71) in relation "pg_class"
> 2019-05-01 05:24:48.887 CEST [97694:8] pg_regress/reindex_catalog STATEMENT:  
> REINDEX TABLE pg_class;
> 2019-05-01 05:24:48.904 CEST [97690:431] pg_regress/create_view LOG:  
> disconnection: session time: 0:00:03.748 user=pgbuildfarm database=regression 
> host=[local]
> 
> which is mighty confusing at first glance, but I think the explanation is
> that what the postmaster is reporting is process 97690's *latest* query,
> not what it's currently doing.  What it's really currently doing at the
> moment of the deadlock is cleaning out its temporary schema after the
> client disconnected.  So this says you were careless about where to insert
> the reindex_catalog test in the test schedule: it can't be after anything
> that creates any temp objects.  That seems like kind of a problem :-(.
> We could put it second, after the tablespace test, but that would mean
> that we're reindexing after very little churn has happened in the
> catalogs, which doesn't seem like much of a stress test.

I'm inclined to remove the tests from the backbranches, once we've
committed a fix for the actual REINDEX issue, and most of the farm has
been through a cycle or three. I don't think we'll figure out how to
make them robust in time for next week's release.

I don't think we can really rely on the post-disconnect phase completing
in a particularly deterministic time. I was wondering for a second
whether we could just trigger the cleanup of temp tables in the test
group before the reindex_catalog test with an explicit DISCARD, but
that seems mighty fragile too.


Obviously not something trivially changeable, and never even remotely
backpatchable, but once more I'm questioning the wisdom of all the
early-release logic we have for catalog tables...


> Another fairly interesting thing is that this log includes the telltale
> 
> 2019-05-01 05:24:48.887 CEST [97694:7] pg_regress/reindex_catalog CONTEXT:  
> while checking uniqueness of tuple (12,71) in relation "pg_class"
> 
> Why did I have to dig to find that information in HEAD?  Have we lost
> some useful context reporting?  (Note this run is in the v10 branch.)

Hm. There's still code for it. And I found another run on HEAD still
showing it
https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=sidewinder&dt=2019-05-01%2010%3A45%3A00

+ERROR:  deadlock detected
+DETAIL:  Process 13455 waits for ShareLock on transaction 2986; blocked by 
process 16881.
+Process 16881 waits for RowExclusiveLock on relation 1259 of database 16384; 
blocked by process 13455.
+HINT:  See server log for query details.
+CONTEXT:  while checking uniqueness of tuple (39,35) in relation "pg_class"

What made you think it's not present on HEAD?

Greetings,

Andres Freund




Re: REINDEX INDEX results in a crash for an index of pg_class since 9.6

2019-05-02 Thread Tom Lane
Andres Freund  writes:
> On 2019-05-01 22:01:53 -0400, Tom Lane wrote:
>> I think that argument is pretty pointless considering that "REINDEX TABLE
>> pg_class" does it this way, and that code is nearly old enough to
>> vote.

> IMO the reindex_relation() case isn't comparable.

IMV it's the exact same case: we need to perform a pg_class update while
one or more of pg_class's indexes shouldn't be touched.  I am kind of
wondering why it didn't seem to be necessary to cover this for REINDEX
INDEX back in 2003, but it clearly is necessary now.

> That's not pretty either :(

So, I don't like your patch, you don't like mine.  Anybody else
want to weigh in?

We do not have the luxury of time to argue about this.  If we commit
something today, we *might* get a useful set of CLOBBER_CACHE_ALWAYS
results for all branches by Sunday.  Those regression tests will have to
come out of the back branches on Sunday, because we are not shipping minor
releases with unstable regression tests, and I've heard no proposal for
avoiding the occasional-deadlock problem.

regards, tom lane




Re: Heap lock levels for REINDEX INDEX CONCURRENTLY not quite right?

2019-05-02 Thread Andres Freund
On 2019-05-02 10:44:44 +0200, Peter Eisentraut wrote:
> On 2019-04-30 17:17, Andres Freund wrote:
> > indOid = RangeVarGetRelidExtended(indexRelation,
> >   
> > concurrent ? ShareUpdateExclusiveLock : AccessExclusiveLock,
> >   0,
> >   
> > RangeVarCallbackForReindexIndex,
> >   (void 
> > *) &heapOid);
> > 
> > doesn't pass concurrent-ness to RangeVarCallbackForReindexIndex(). Which
> > then goes on to lock the table
> > 
> > static void
> > RangeVarCallbackForReindexIndex(const RangeVar *relation,
> > Oid relId, Oid 
> > oldRelId, void *arg)
> > 
> > if (OidIsValid(*heapOid))
> > LockRelationOid(*heapOid, ShareLock);
> > 
> > without knowing that it should use ShareUpdateExclusive. Which
> > e.g. ReindexTable knows:
> > 
> > /* The lock level used here should match reindex_relation(). */
> > heapOid = RangeVarGetRelidExtended(relation,
> >
> > concurrent ? ShareUpdateExclusiveLock : ShareLock,
> >0,
> >
> > RangeVarCallbackOwnsTable, NULL);
> > 
> > so there's a lock upgrade hazard.
> 
> Confirmed.
> 
> What seems weird to me is that the existing callback argument heapOid
> isn't used at all.  It seems to have been like that since the original
> commit of the callback infrastructure.  Therefore also, this code

Hm? But that's a different callback from the one used from
reindex_index()?  reindex_relation() uses the
RangeVarCallbackOwnsTable() callback and passes in NULL as the argument,
whereas reindex_index() passes in RangeVarCallbackForReindexIndex() and
passes in &heapOid?

And RangeVarCallbackForReindexIndex() pretty clearly sets *heapOid:

 * Lock level here should match reindex_index() heap lock. If 
the OID
 * isn't valid, it means the index as concurrently dropped, 
which is
 * not a problem for us; just return normally.
 */
*heapOid = IndexGetRelation(relId, true);


> Patch to remove the unused code attached; but needs some checking for
> this dubious conditional block.
> 
> Thoughts?

I might be missing something here and it really is unused. But if so, the
fix would be to make it used, because it's actually important. Otherwise
ReindexIndex() becomes racy or has even more deadlock hazards.
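
For concreteness, a rough sketch of what that could look like -- just an
illustration, not a patch; the struct name is invented and the relkind and
ownership checks from the existing callback are omitted:

/* hypothetical state passed through RangeVarGetRelidExtended's callback arg */
typedef struct
{
	LOCKMODE	heap_lockmode;	/* ShareLock or ShareUpdateExclusiveLock */
	Oid			heapOid;		/* table we have locked, or InvalidOid */
} ReindexIndexCallbackState;

static void
RangeVarCallbackForReindexIndex(const RangeVar *relation,
								Oid relId, Oid oldRelId, void *arg)
{
	ReindexIndexCallbackState *state = (ReindexIndexCallbackState *) arg;

	/* drop the previously taken heap lock if the name now points elsewhere */
	if (relId != oldRelId && OidIsValid(oldRelId) && OidIsValid(state->heapOid))
	{
		UnlockRelationOid(state->heapOid, state->heap_lockmode);
		state->heapOid = InvalidOid;
	}

	if (!OidIsValid(relId))
		return;

	/* lock level here must match what reindex_index() will take later */
	state->heapOid = IndexGetRelation(relId, true);
	if (OidIsValid(state->heapOid))
		LockRelationOid(state->heapOid, state->heap_lockmode);
}

ReindexIndex() would then set state.heap_lockmode = concurrent ?
ShareUpdateExclusiveLock : ShareLock before calling
RangeVarGetRelidExtended(), and keep using state.heapOid afterwards.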

Greetings,

Andres Freund




Re: New vacuum option to do only freezing

2019-05-02 Thread Andres Freund
Hi,

On 2019-05-02 10:20:25 -0400, Robert Haas wrote:
> On Tue, Apr 16, 2019 at 12:44 PM Tom Lane  wrote:
> > So after thinking about this a bit more ...
> > 2. Act as if the tuple were still live, just as would've happened if the
> > state didn't change till just after we looked instead of just before.
> >
> > Option #2 is a lot simpler and safer, and can be implemented as I
> > suggested earlier, assuming we're all good with the assumption that
> > heap_prepare_freeze_tuple isn't going to do anything bad.
> 
> After studying this more carefully, I agree.  You and Andres and
> Alvaro are all correct, and I'm plain wrong.  Thanks for explaining.
> I have committed a patch that changes the logic as per your
> suggestion, and also removes nleft_dead_tuples and nleft_dead_itemids.

It'd be good if somebody could make a pass over the safety mechanisms in
heap_prepare_freeze_tuple(). I added them at some point, after a data
corrupting bug related to freezing, but they really were more intended
as a secondary layer of defense, rather than the primary one.  My
understanding is still that we assume that we never should reach
heap_prepare_freeze_tuple() for something that is below the horizon,
even after this change, right?

Greetings,

Andres Freund




Re: pgbench - add option to show actual builtin script code

2019-05-02 Thread Fabien COELHO


Hello,

Patch looks good to me, and works fine on my machine. One minor
observation: the option 'list' is mostly used to list elements, as in
"pgbench -b list", which shows the available builtin scripts. Therefore we
should use a more relevant word, like --show-script.


Thanks for the review.

Here is a version with "--show-script". I also thought about "--listing", 
maybe.



The new status of this patch is: Waiting on Author


--
Fabien.

diff --git a/doc/src/sgml/ref/pgbench.sgml b/doc/src/sgml/ref/pgbench.sgml
index ee2501be55..c48593cfb9 100644
--- a/doc/src/sgml/ref/pgbench.sgml
+++ b/doc/src/sgml/ref/pgbench.sgml
@@ -355,7 +355,6 @@ pgbench  options  d
   
  
 
-
  
   -c clients
   --client=clients
@@ -617,6 +616,16 @@ pgbench  options  d
   
  
 
+ 
+  --show-script scriptname
+  
+   
+Show the actual code of builtin script scriptname
+on stderr, and exit immediately.
+   
+  
+ 
+
  
   -t transactions
   --transactions=transactions
diff --git a/src/bin/pgbench/pgbench.c b/src/bin/pgbench/pgbench.c
index a03ab281a5..55e6f6464a 100644
--- a/src/bin/pgbench/pgbench.c
+++ b/src/bin/pgbench/pgbench.c
@@ -651,6 +651,7 @@ usage(void)
 		   "  --progress-timestamp use Unix epoch timestamps for progress\n"
 		   "  --random-seed=SEED   set random seed (\"time\", \"rand\", integer)\n"
 		   "  --sampling-rate=NUM  fraction of transactions to log (e.g., 0.01 for 1%%)\n"
+		   "  --show-script=NAME   show builtin script code, then exit\n"
 		   "\nCommon options:\n"
 		   "  -d, --debug  print debugging output\n"
 		   "  -h, --host=HOSTNAME  database server host or socket directory\n"
@@ -4648,7 +4649,7 @@ listAvailableScripts(void)
 
 	fprintf(stderr, "Available builtin scripts:\n");
 	for (i = 0; i < lengthof(builtin_script); i++)
-		fprintf(stderr, "\t%s\n", builtin_script[i].name);
+		fprintf(stderr, "  %13s: %s\n", builtin_script[i].name, builtin_script[i].desc);
 	fprintf(stderr, "\n");
 }
 
@@ -5088,6 +5089,7 @@ main(int argc, char **argv)
 		{"log-prefix", required_argument, NULL, 7},
 		{"foreign-keys", no_argument, NULL, 8},
 		{"random-seed", required_argument, NULL, 9},
+		{"show-script", required_argument, NULL, 10},
 		{NULL, 0, NULL, 0}
 	};
 
@@ -5440,6 +5442,13 @@ main(int argc, char **argv)
 	exit(1);
 }
 break;
+			case 10:			/* list */
+{
+	const BuiltinScript   *s = findBuiltin(optarg);
+	fprintf(stderr, "-- %s: %s\n%s\n", s->name, s->desc, s->script);
+	exit(0);
+}
+break;
 			default:
 fprintf(stderr, _("Try \"%s --help\" for more information.\n"), progname);
 exit(1);
diff --git a/src/bin/pgbench/t/002_pgbench_no_server.pl b/src/bin/pgbench/t/002_pgbench_no_server.pl
index 69a6d03bb3..f7fa18418b 100644
--- a/src/bin/pgbench/t/002_pgbench_no_server.pl
+++ b/src/bin/pgbench/t/002_pgbench_no_server.pl
@@ -218,6 +218,15 @@ pgbench(
 	],
 	'pgbench builtin list');
 
+# builtin listing
+pgbench(
+	'--show-script se',
+	0,
+	[qr{^$}],
+	[ qr{select-only: }, qr{SELECT abalance FROM pgbench_accounts WHERE},
+	  qr{(?!UPDATE)}, qr{(?!INSERT)} ],
+	'pgbench builtin listing');
+
 my @script_tests = (
 
 	# name, err, { file => contents }


Re: New vacuum option to do only freezing

2019-05-02 Thread Robert Haas
On Tue, Apr 16, 2019 at 12:44 PM Tom Lane  wrote:
> So after thinking about this a bit more ...
>
> ISTM that what we have here is a race condition (ie, tuple changed state
> since heap_page_prune), and that ideally we want the code to resolve it
> as if no race had happened.  That is, either of these behaviors would
> be acceptable:
>
> 1. Delete the tuple, just as heap_page_prune would've done if it had seen
> it DEAD.  (Possibly we could implement that by jumping back and doing
> heap_page_prune again, but that seems pretty messy and bug-prone.
> In any case, if we're not doing index cleanup then this must reduce to
> "replace tuple by a dead line pointer", not remove it altogether.)
>
> 2. Act as if the tuple were still live, just as would've happened if the
> state didn't change till just after we looked instead of just before.
>
> Option #2 is a lot simpler and safer, and can be implemented as I
> suggested earlier, assuming we're all good with the assumption that
> heap_prepare_freeze_tuple isn't going to do anything bad.

After studying this more carefully, I agree.  You and Andres and
Alvaro are all correct, and I'm plain wrong.  Thanks for explaining.
I have committed a patch that changes the logic as per your
suggestion, and also removes nleft_dead_tuples and nleft_dead_itemids.

I'll reply separately to your other point about tups_vacuumed reporting.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: Fixing order of resowner cleanup in 12, for Windows

2019-05-02 Thread Tom Lane
Thomas Munro  writes:
> A while back I posted a patch[1] to change the order of resowner
> cleanup so that DSM handles are released last.  That's useful for the
> error cleanup path on Windows, when a SharedFileSet is cleaned up (a
> mechanism that's used by parallel CREATE INDEX and parallel hash join,
> for spilling files to disk under a temporary directory, with automatic
> cleanup).

I guess what I'm wondering is if there are any potential negative
consequences, ie code that won't work if we change the order like this.
I'm finding it hard to visualize what that would be, but then again
this failure mode wasn't obvious either.

> I suppose we probably should make the change to 12 though: then owners
> of extensions that use DSM detach hooks (if there are any such extensions)
> will have a bit of time to get used to the new order during the beta
> period.  I'll need to find someone to test this with a fault injection
> scenario on Windows before committing it, but wanted to sound out the
> list for any objections to this late change?

Since we haven't started beta yet, I don't see a reason not to change
it.  Worst case is that it causes problems and we revert it.

I concur with not back-patching, in any case.

regards, tom lane




Inconsistent error message wording for REINDEX CONCURRENTLY

2019-05-02 Thread Tom Lane
In view of the REINDEX-on-pg_class kerfuffle that we're currently
sorting through, I was very glad to see that the concurrent reindex
code doesn't even try:

regression=# reindex index concurrently pg_class_oid_index;
psql: ERROR:  concurrent reindex is not supported for catalog relations
regression=# reindex table concurrently pg_class; 
psql: ERROR:  concurrent index creation on system catalog tables is not 
supported

It'd be nice though if those error messages gave the impression of having
been written on the same planet.

(It might be worth comparing wording of other errors-in-common between
what are evidently two completely different code paths...)

regards, tom lane




Re: walsender vs. XLogBackgroundFlush during shutdown

2019-05-02 Thread Simon Riggs
On Wed, 1 May 2019 at 01:28, Tomas Vondra 
wrote:


> Now, this situation is apparently expected, because WalSndWaitForWal()
> does this:
>
> /*
>  * If we're shutting down, trigger pending WAL to be written out,
>  * otherwise we'd possibly end up waiting for WAL that never gets
>  * written, because walwriter has shut down already.
>  */
> if (got_STOPPING)
> XLogBackgroundFlush();
>
> but unfortunately that does not actually do anything, because right at
> the very beginning XLogBackgroundFlush() does this:
>
> /* back off to last completed page boundary */
> WriteRqst.Write -= WriteRqst.Write % XLOG_BLCKSZ;
>
> That is, it intentionally ignores the incomplete page, which means the
> walsender can't read the record and reach GetFlushRecPtr().
>
> XLogBackgroundFlush() does this since (at least) 2007, so how come we
> never had issues with this before?
>

Yeh, not quite what I originally wrote for that.

I think the confusion is that XLogBackgroundFlush() doesn't do quite what
it says.

XLogWrite() kinda does with its "flexible" parameter. So I suggest we do
the same on XLogBackgroundFlush() so callers can indicate whether they want
it to be flexible or not.
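
A rough sketch of that idea (schematic only -- the bodies elided here are
the existing code):

void
XLogBackgroundFlush(bool flexible)
{
	XLogwrtRqst WriteRqst;

	/* ... existing code that computes WriteRqst ... */

	/* back off to last completed page boundary only if the caller allows it */
	if (flexible)
		WriteRqst.Write -= WriteRqst.Write % XLOG_BLCKSZ;

	/* ... existing write/flush logic, unchanged ... */
}

The shutdown path in WalSndWaitForWal() would then call
XLogBackgroundFlush(false) so the incomplete page gets written out.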

-- 
Simon Riggshttp://www.2ndQuadrant.com/

PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


Re: pgbench - add option to show actual builtin script code

2019-05-02 Thread Ibrar Ahmed
The following review has been posted through the commitfest application:
make installcheck-world:  tested, passed
Implements feature:   tested, passed
Spec compliant:   tested, passed
Documentation:tested, passed

Patch looks good to me, and works fine on my machine. One minor observation:
the option 'list' is mostly used to list elements, as in "pgbench -b list",
which shows the available builtin scripts. Therefore we should use a more
relevant word, like --show-script.

The new status of this patch is: Waiting on Author


Re: POC: Cleaning up orphaned files using undo logs

2019-05-02 Thread Robert Haas
On Thu, May 2, 2019 at 5:32 AM Dilip Kumar  wrote:
> Yeah, at least in this patch it looks silly.  Actually, I added that
> index to make the qsort stable when execute_undo_action sorts them in
> block order.  But, as part of this patch we don't have that processing
> so either we can remove this structure completely as you suggested but
> undo processing patch has to add that structure or we can just add
> comment that why we added this index field.

Well, the qsort comparator could compute the index as ptr - array_base
just like any other code, couldn't it?

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: Bad canonicalization for dateranges with 'infinity' bounds

2019-05-02 Thread Laurenz Albe
I wrote:
> I propose the attached patch which fixes the problem.

I forgot to attach the patch.  Here it is.

Yours,
Laurenz Albe
From 6bbad0acf3baae3a08d1f911b7017642c8a8afe9 Mon Sep 17 00:00:00 2001
From: Laurenz Albe 
Date: Thu, 2 May 2019 14:32:27 +0200
Subject: [PATCH] Don't canonicalize dateranges with 'infinity' bounds

Since adding one to infinity doesn't change the value, canonicalization
of dateranges with an explicit '-infinity' as exclusive lower bound
would make it an inclusive bound, while dateranges with 'infinity' as
inclusive upper bound would turn it into an exclusive bound.
---
 src/backend/utils/adt/rangetypes.c   |  4 ++--
 src/test/regress/expected/rangetypes.out | 24 
 src/test/regress/sql/rangetypes.sql  |  4 
 3 files changed, 30 insertions(+), 2 deletions(-)

diff --git a/src/backend/utils/adt/rangetypes.c b/src/backend/utils/adt/rangetypes.c
index 72c450c70e..a98f17bb66 100644
--- a/src/backend/utils/adt/rangetypes.c
+++ b/src/backend/utils/adt/rangetypes.c
@@ -1431,13 +1431,13 @@ daterange_canonical(PG_FUNCTION_ARGS)
 	if (empty)
 		PG_RETURN_RANGE_P(r);
 
-	if (!lower.infinite && !lower.inclusive)
+	if (!(lower.infinite || DATE_NOT_FINITE(lower.val)) && !lower.inclusive)
 	{
 		lower.val = DirectFunctionCall2(date_pli, lower.val, Int32GetDatum(1));
 		lower.inclusive = true;
 	}
 
-	if (!upper.infinite && upper.inclusive)
+	if (!(upper.infinite || DATE_NOT_FINITE(upper.val)) && upper.inclusive)
 	{
 		upper.val = DirectFunctionCall2(date_pli, upper.val, Int32GetDatum(1));
 		upper.inclusive = false;
diff --git a/src/test/regress/expected/rangetypes.out b/src/test/regress/expected/rangetypes.out
index accf1e0d9e..60d875e898 100644
--- a/src/test/regress/expected/rangetypes.out
+++ b/src/test/regress/expected/rangetypes.out
@@ -652,6 +652,30 @@ select daterange('2000-01-10'::date, '2000-01-11'::date, '(]');
  [01-11-2000,01-12-2000)
 (1 row)
 
+select daterange('-infinity'::date, '2000-01-01'::date, '()');
+   daterange
+
+ (-infinity,01-01-2000)
+(1 row)
+
+select daterange('-infinity'::date, '2000-01-01'::date, '[)');
+   daterange
+
+ [-infinity,01-01-2000)
+(1 row)
+
+select daterange('2000-01-01'::date, 'infinity'::date, '[)');
+   daterange   
+---
+ [01-01-2000,infinity)
+(1 row)
+
+select daterange('2000-01-01'::date, 'infinity'::date, '[]');
+   daterange   
+---
+ [01-01-2000,infinity]
+(1 row)
+
 -- test GiST index that's been built incrementally
 create table test_range_gist(ir int4range);
 create index test_range_gist_idx on test_range_gist using gist (ir);
diff --git a/src/test/regress/sql/rangetypes.sql b/src/test/regress/sql/rangetypes.sql
index 55638a85ee..9fdb1953df 100644
--- a/src/test/regress/sql/rangetypes.sql
+++ b/src/test/regress/sql/rangetypes.sql
@@ -165,6 +165,10 @@ select daterange('2000-01-10'::date, '2000-01-20'::date, '(]');
 select daterange('2000-01-10'::date, '2000-01-20'::date, '()');
 select daterange('2000-01-10'::date, '2000-01-11'::date, '()');
 select daterange('2000-01-10'::date, '2000-01-11'::date, '(]');
+select daterange('-infinity'::date, '2000-01-01'::date, '()');
+select daterange('-infinity'::date, '2000-01-01'::date, '[)');
+select daterange('2000-01-01'::date, 'infinity'::date, '[)');
+select daterange('2000-01-01'::date, 'infinity'::date, '[]');
 
 -- test GiST index that's been built incrementally
 create table test_range_gist(ir int4range);
-- 
2.20.1



Bad canonicalization for dateranges with 'infinity' bounds

2019-05-02 Thread Laurenz Albe
https://www.postgresql.org/docs/current/rangetypes.html#RANGETYPES-INFINITE has:

   Also, some element types have a notion of “infinity”, but that is just
   another value so far as the range type mechanisms are concerned.
   For example, in timestamp ranges, [today,] means the same thing as [today,).
   But [today,infinity] means something different from [today,infinity) —
   the latter excludes the special timestamp value infinity.

This does not work as expected for ranges with discrete base types,
notably daterange:

test=> SELECT '[2000-01-01,infinity]'::daterange;
   daterange   
---
 [2000-01-01,infinity)
(1 row)

test=> SELECT '(-infinity,2000-01-01)'::daterange;
   daterange

 [-infinity,2000-01-01)
(1 row)

This is because "daterange_canonical" makes no difference for 'infinity',
and adding one to infinity does not change the value.

I propose the attached patch which fixes the problem.

Yours,
Laurenz Albe





Re: walsender vs. XLogBackgroundFlush during shutdown

2019-05-02 Thread Tomas Vondra

On Wed, May 01, 2019 at 07:12:52PM +0200, Alexander Kukushkin wrote:

Hi,

On Wed, 1 May 2019 at 17:02, Tomas Vondra  wrote:


OK, so that seems like a separate issue, somewhat unrelated to the issue I
reported. And I'm not sure it's a walsender issue - it seems it might be a
psycopg2 issue in not reporting the flush properly, no?


Agree, it is a different issue, but I am unsure what to blame,
postgres or psycopg2.
Right now in the psycopg2 we confirm more or less every XLogData
message, but at the same time LSN on the primary is moving forward and
we get updates with keepalive messages.
I perfectly understand the need to periodically send updates of flush
= walEnd (which comes from keepalive). It might happen that there is
no transaction activity but WAL is still generated and as a result
replication slot will prevent WAL from being cleaned up.
From the client side perspective, it confirmed everything that it
should, but from the postgres side, this is not enough to shut down
cleanly. Maybe it is possible to change the check (sentPtr ==
replicatedPtr) to something like (lastMsgSentPtr <= replicatedPtr), or
would it be unsafe?



I don't know.

In general I think it's a bit strange that we're waiting for walsender
processes to catch up even in fast shutdown mode, instead of just aborting
them like other backends. But I assume there are reasons for that. OTOH it
makes us vulnerable to issues like this, when a (presumably) misbehaving
downstream prevents a shutdown.


>ptr=0x55cb958dca08, len=94) at be-secure.c:318
>#2  0x55cb93aa7b87 in secure_write (port=0x55cb958d71e0,
>ptr=0x55cb958dca08, len=94) at be-secure.c:265
>#3  0x55cb93ab6bf9 in internal_flush () at pqcomm.c:1433
>#4  0x55cb93ab6b89 in socket_flush () at pqcomm.c:1409
>#5  0x55cb93dac30b in send_message_to_frontend
>(edata=0x55cb942b4380 ) at elog.c:3317
>#6  0x55cb93da8973 in EmitErrorReport () at elog.c:1481
>#7  0x55cb93da5abf in errfinish (dummy=0) at elog.c:481
>#8  0x55cb93da852d in elog_finish (elevel=13, fmt=0x55cb93f32de3
>"sending replication keepalive") at elog.c:1376
>#9  0x55cb93bcae71 in WalSndKeepalive (requestReply=true) at
>walsender.c:3358

Is it stuck in the send() call forever, or did you happen to grab
this backtrace?


No, it didn't get stuck there. During the shutdown postgres starts sending
a few thousand keepalive messages per second and receives back just as many
feedback messages, so the chances of interrupting somewhere in
the send are quite high.


Uh, that seems a bit broken, perhaps?

regards

--
Tomas Vondra  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services





Fixing order of resowner cleanup in 12, for Windows

2019-05-02 Thread Thomas Munro
Hi all,

A while back I posted a patch[1] to change the order of resowner
cleanup so that DSM handles are released last.  That's useful for the
error cleanup path on Windows, when a SharedFileSet is cleaned up (a
mechanism that's used by parallel CREATE INDEX and parallel hash join,
for spilling files to disk under a temporary directory, with automatic
cleanup).  Previously we believed that it was OK to unlink things that
other processes might have currently open as long as you use the
FILE_SHARE_DELETE flag, but that turned out not to be the full story:
you can unlink files that someone has open, but you can't unlink the
directory that contains them!  Hence the desire to reverse the
clean-up order.

It didn't seem worth the risk of back-patching the change, because the
only consequence is a confusing message that appears somewhere near
the real error:

LOG: could not rmdir directory
"base/pgsql_tmp/pgsql_tmp5088.0.sharedfileset": Directory not empty

I suppose we probably should make the change to 12 though: then owners
of extensions that use DSM detach hooks (if there are any such extensions)
will have a bit of time to get used to the new order during the beta
period.  I'll need to find someone to test this with a fault injection
scenario on Windows before committing it, but wanted to sound out the
list for any objections to this late change?

[1] 
https://www.postgresql.org/message-id/CAEepm%3D2ikUtjmiJ18bTnwaeUBoiYN%3DwMDSdhU1jy%3D8WzNhET-Q%40mail.gmail.com

-- 
Thomas Munro
https://enterprisedb.com




Re: speeding up planning with partitions

2019-05-02 Thread Amit Langote
On Wed, May 1, 2019 at 4:08 AM Tom Lane  wrote:
> OK, I tweaked that a bit and pushed both versions.

Thank you.

Regards,
Amit




Re: POC: Cleaning up orphaned files using undo logs

2019-05-02 Thread Dilip Kumar
On Tue, Apr 30, 2019 at 8:44 PM Robert Haas  wrote:

> UndoRecInfo looks a bit silly, I think.  Isn't index just the index of
> this entry in the array?  You can always figure that out by ptr -
> array_base. Instead of having UndoRecPtr urp in this structure, how
> about adding that to UnpackedUndoRecord?  When inserting, caller
> leaves it unset and InsertPreparedUndo sets it; when retrieving,
> UndoFetchRecord or UndoRecordBulkFetch sets it.  With those two
> changes, I think you can get rid of UndoRecInfo entirely and just
> return an array of UnpackedUndoRecords.  This might also eliminate the
> need for an 'urp' member in PreparedUndoSpace.
>
Yeah, at least in this patch it looks silly.  Actually, I added that
index to make the qsort stable when execute_undo_action sorts them in
block order.  But, as this patch doesn't have that processing, we can
either remove this structure completely as you suggested (though the undo
processing patch will have to add it back), or we can just add a comment
explaining why we added this index field.
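
For illustration (field names here are only placeholders, not the actual
patch), the comparator that needs that index would look roughly like:

/*
 * Sketch only: order undo records by block number, using the saved array
 * index as a tie-breaker so records for the same block keep their original
 * order even though qsort() itself is not stable.
 */
static int
undo_rec_cmp_by_block(const void *a, const void *b)
{
	const UndoRecInfo *left = (const UndoRecInfo *) a;
	const UndoRecInfo *right = (const UndoRecInfo *) b;

	if (left->blkno != right->blkno)
		return (left->blkno < right->blkno) ? -1 : 1;

	/* same block: fall back to the original position in the array */
	if (left->index != right->index)
		return (left->index < right->index) ? -1 : 1;

	return 0;
}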

I am ok with other comments and will work on them.

-- 
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com




Re: Volatile function weirdness

2019-05-02 Thread Julien Rouhaud
On Thu, May 2, 2019 at 11:05 AM Vik Fearing  wrote:
>
> Why do these two queries produce different results?

See https://www.postgresql.org/message-id/30382.1537932...@sss.pgh.pa.us




Volatile function weirdness

2019-05-02 Thread Vik Fearing
Why do these two queries produce different results?

vik=# select random(), random(), random() from generate_series(1, 5);
  random   |  random   |  random
---+---+---
  0.47517032455653 | 0.631991865579039 | 0.985628996044397
 0.341754949185997 | 0.304212234914303 | 0.545252074021846
 0.684523592237383 | 0.595671262592077 | 0.560677206143737
 0.352716268971562 | 0.131561728194356 | 0.399888414423913
 0.877433629240841 | 0.543397729285061 | 0.133583522867411
(5 rows)

vik=# select random(), random(), random() from generate_series(1, 5)
order by random();
  random   |  random   |  random
---+---+---
 0.108651491813362 | 0.108651491813362 | 0.108651491813362
 0.178489942103624 | 0.178489942103624 | 0.178489942103624
 0.343531942460686 | 0.343531942460686 | 0.343531942460686
 0.471797252073884 | 0.471797252073884 | 0.471797252073884
 0.652373222634196 | 0.652373222634196 | 0.652373222634196
(5 rows)

Obviously I'm not talking about the actual values, but the fact that
when the volatile function is put in the ORDER BY clause, it seems to
get called just once per row rather than each time like the first query.

Is this as designed?  It's certainly unexpected, and my initial reaction
is that it's undesirable.
-- 
Vik Fearing  +33 6 46 75 15 36
http://2ndQuadrant.fr PostgreSQL : Expertise, Formation et Support




Re: Unhappy about API changes in the no-fsm-for-small-rels patch

2019-05-02 Thread Amit Kapila
On Thu, May 2, 2019 at 12:39 PM John Naylor  wrote:
>
> On Thu, May 2, 2019 at 2:31 PM Amit Kapila  wrote:
> > @@ -776,7 +776,10 @@ fsm_extend(Relation rel, BlockNumber fsm_nblocks)
> >   if ((rel->rd_smgr->smgr_fsm_nblocks == 0 ||
> >   rel->rd_smgr->smgr_fsm_nblocks == InvalidBlockNumber) &&
> >   !smgrexists(rel->rd_smgr, FSM_FORKNUM))
> > + {
> >   smgrcreate(rel->rd_smgr, FSM_FORKNUM, false);
> > + fsm_clear_local_map(rel);
> > + }
> >
> > I think this won't be correct because when we call fsm_extend via
> > vacuum the local map won't be already existing, so it won't issue an
> > invalidation call.  Isn't it better to directly call
> > CacheInvalidateRelcache here to notify other backends that their local
> > maps are invalid now?
>
> Yes, you're quite correct.
>

Okay,  I have updated the patch to incorporate your changes and call
relcache invalidation at required places. I have updated comments at a
few places as well.  To summarize, this patch (a) moves the local map to
the relation cache; (b) performs the cache invalidation whenever we create
the FSM (either via a backend or vacuum), when some space in a page is
freed by vacuum (provided the FSM doesn't exist), or whenever the local map
is cleared; and (c) additionally clears the local map when we find a block
from the FSM, when we have already tried all the blocks present in the
cache, or when we are allowed to create the FSM.
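
The fsm_extend() part of (b) has roughly the following shape (sketch only,
to show the idea; the attached patch is the authoritative version):

	if ((rel->rd_smgr->smgr_fsm_nblocks == 0 ||
		 rel->rd_smgr->smgr_fsm_nblocks == InvalidBlockNumber) &&
		!smgrexists(rel->rd_smgr, FSM_FORKNUM))
	{
		smgrcreate(rel->rd_smgr, FSM_FORKNUM, false);

		/*
		 * Other backends may be caching "no FSM exists" in their local maps;
		 * send a relcache invalidation so they notice the new fork.
		 */
		CacheInvalidateRelcache(rel);
	}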

If we agree on this, then we can update the README accordingly.

Can you please test/review?

-- 
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com


store_local_map_relcache_v3.patch
Description: Binary data


Re: Heap lock levels for REINDEX INDEX CONCURRENTLY not quite right?

2019-05-02 Thread Peter Eisentraut
On 2019-04-30 17:17, Andres Freund wrote:
>   indOid = RangeVarGetRelidExtended(indexRelation,
> 
> concurrent ? ShareUpdateExclusiveLock : AccessExclusiveLock,
> 0,
> 
> RangeVarCallbackForReindexIndex,
> (void 
> *) &heapOid);
> 
> doesn't pass concurrent-ness to RangeVarCallbackForReindexIndex(). Which
> then goes on to lock the table
> 
> static void
> RangeVarCallbackForReindexIndex(const RangeVar *relation,
>   Oid relId, Oid 
> oldRelId, void *arg)
> 
>   if (OidIsValid(*heapOid))
>   LockRelationOid(*heapOid, ShareLock);
> 
> without knowing that it should use ShareUpdateExclusive. Which
> e.g. ReindexTable knows:
> 
>   /* The lock level used here should match reindex_relation(). */
>   heapOid = RangeVarGetRelidExtended(relation,
>  
> concurrent ? ShareUpdateExclusiveLock : ShareLock,
>  0,
>  
> RangeVarCallbackOwnsTable, NULL);
> 
> so there's a lock upgrade hazard.

Confirmed.

What seems weird to me is that the existing callback argument heapOid
isn't used at all.  It seems to have been like that since the original
commit of the callback infrastructure.  Therefore also, this code

if (relId != oldRelId && OidIsValid(oldRelId))
{
/* lock level here should match reindex_index() heap lock */
UnlockRelationOid(*heapOid, ShareLock);

in RangeVarCallbackForReindexIndex() can't ever do anything useful.

Patch to remove the unused code attached; but needs some checking for
this dubious conditional block.

Thoughts?


-- 
Peter Eisentraut  http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
From 89db1445787b3fb676ecfa07964682e72a58e7bb Mon Sep 17 00:00:00 2001
From: Peter Eisentraut 
Date: Thu, 2 May 2019 10:38:06 +0200
Subject: [PATCH] Remove unused argument of RangeVarCallbackForReindexIndex()

---
 src/backend/commands/indexcmds.c | 22 --
 1 file changed, 4 insertions(+), 18 deletions(-)

diff --git a/src/backend/commands/indexcmds.c b/src/backend/commands/indexcmds.c
index fabba3bacd..2800255d9d 100644
--- a/src/backend/commands/indexcmds.c
+++ b/src/backend/commands/indexcmds.c
@@ -2304,7 +2304,6 @@ void
 ReindexIndex(RangeVar *indexRelation, int options, bool concurrent)
 {
Oid indOid;
-   Oid heapOid = InvalidOid;
Relationirel;
charpersistence;
 
@@ -2317,7 +2316,7 @@ ReindexIndex(RangeVar *indexRelation, int options, bool 
concurrent)
  
concurrent ? ShareUpdateExclusiveLock : AccessExclusiveLock,
  0,
  
RangeVarCallbackForReindexIndex,
- (void 
*) &heapOid);
+ NULL);
 
/*
 * Obtain the current persistence of the existing index.  We already 
hold
@@ -2350,19 +2349,6 @@ RangeVarCallbackForReindexIndex(const RangeVar *relation,
Oid relId, Oid 
oldRelId, void *arg)
 {
charrelkind;
-   Oid*heapOid = (Oid *) arg;
-
-   /*
-* If we previously locked some other index's heap, and the name we're
-* looking up no longer refers to that relation, release the now-useless
-* lock.
-*/
-   if (relId != oldRelId && OidIsValid(oldRelId))
-   {
-   /* lock level here should match reindex_index() heap lock */
-   UnlockRelationOid(*heapOid, ShareLock);
-   *heapOid = InvalidOid;
-   }
 
/* If the relation does not exist, there's nothing more to do. */
if (!OidIsValid(relId))
@@ -2394,9 +2380,9 @@ RangeVarCallbackForReindexIndex(const RangeVar *relation,
 * isn't valid, it means the index as concurrently dropped, 
which is
 * not a problem for us; just return normally.
 */
-   *heapOid = IndexGetRelation(relId, true);
-   if (OidIsValid(*heapOid))
-   LockRelationOid(*heapOid, ShareLock);
+   Oid heapOid = IndexGetRelation(relId, true);
+

Re: How to estimate the shared memory size required for parallel scan?

2019-05-02 Thread Thomas Munro
On Mon, Aug 20, 2018 at 11:09 AM Thomas Munro
 wrote:
> 2.  Teach your GetForeignPath function to do something like this:
> [blah blah blah]

I was pinged off-list by someone who is working on a parallel-aware
FDW, who asked if I still had the test code I mentioned up-thread.
While digging that out, I couldn't resist hacking it a bit more until
it gave the right answers, only sooner:

$ seq -f '%20.0f' 1 1000 > numbers.csv

create extension file_fdw;
create server files foreign data wrapper file_fdw;
create foreign table numbers (n int) server files
  options (filename '/path/to/numbers.csv', format 'csv');
explain select count(*) from numbers;

select count(*) from numbers;

Non-parallel: 2.6s
1 worker: 1.4s
2 workers: 0.9s
3 workers: 0.7s

Finally, I can do parallel hash joins between CSV files!

select count(*) from numbers n1 join numbers n2 using (n);

Non-parallel: 11.4s
1 worker: 6.6s
2 workers: 4.8s
3 workers: 4.1s

There are probably some weird fence-post or accounting bugs hiding in
this patch -- it has to count bytes carefully, and deal with some edge
cases around lines that span chunks.  It's only a rough draft, but
might eventually serve as a useful example of a parallel-aware FDW.
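
For anyone wondering about the thread's original question, the sizing side
reduces to the standard parallel-aware FDW callbacks; a minimal sketch
follows (the shared state shown is invented -- the attached patch's is more
involved):

/* invented shared state; the real patch tracks file chunks more carefully */
typedef struct FileFdwParallelState
{
	pg_atomic_uint64 next_offset;	/* next file offset to hand out */
} FileFdwParallelState;

static Size
fileEstimateDSMForeignScan(ForeignScanState *node, ParallelContext *pcxt)
{
	/* tell the executor how much DSM space this scan wants */
	return sizeof(FileFdwParallelState);
}

static void
fileInitializeDSMForeignScan(ForeignScanState *node, ParallelContext *pcxt,
							 void *coordinate)
{
	/* leader sets up the chunk allocator in the DSM space it was given */
	FileFdwParallelState *pstate = (FileFdwParallelState *) coordinate;

	pg_atomic_init_u64(&pstate->next_offset, 0);
}

static void
fileInitializeWorkerForeignScan(ForeignScanState *node, shm_toc *toc,
								void *coordinate)
{
	/*
	 * Workers attach to the leader's state via 'coordinate' and would stash
	 * it wherever the scan routines can reach it (e.g. in fdw_state).
	 */
	FileFdwParallelState *pstate = (FileFdwParallelState *) coordinate;

	(void) pstate;
}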

-- 
Thomas Munro
https://enterprisedb.com


0001-Make-file_fdw-parallel-aware.patch
Description: Binary data


Re: Failure in contrib test _int on loach

2019-05-02 Thread Heikki Linnakangas

(resending, previous attempt didn't make it to pgsql-hackers)

On 29/04/2019 16:16, Anastasia Lubennikova wrote:

In previous emails, I have sent two patches with test and bugfix (see
attached).
After Heikki shared his concerns, I've rechecked the algorithm and
haven't found any potential error.
So, if other hackers agree with my reasoning, the suggested fix is
sufficient and can be committed.


I still believe there is a problem with grandparent splits with this. 
I'll try to construct a test case later this week, unless you manage to 
create one before that.


- Heikki




Re: Unhappy about API changes in the no-fsm-for-small-rels patch

2019-05-02 Thread John Naylor
On Thu, May 2, 2019 at 2:31 PM Amit Kapila  wrote:
> @@ -776,7 +776,10 @@ fsm_extend(Relation rel, BlockNumber fsm_nblocks)
>   if ((rel->rd_smgr->smgr_fsm_nblocks == 0 ||
>   rel->rd_smgr->smgr_fsm_nblocks == InvalidBlockNumber) &&
>   !smgrexists(rel->rd_smgr, FSM_FORKNUM))
> + {
>   smgrcreate(rel->rd_smgr, FSM_FORKNUM, false);
> + fsm_clear_local_map(rel);
> + }
>
> I think this won't be correct because when we call fsm_extend via
> vacuum the local map won't be already existing, so it won't issue an
> invalidation call.  Isn't it better to directly call
> CacheInvalidateRelcache here to notify other backends that their local
> maps are invalid now?

Yes, you're quite correct.

-- 
John Naylorhttps://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services