Re: pg_upgrade and wraparound

2021-05-08 Thread Michael Paquier
On Mon, May 03, 2021 at 11:10:44AM -0400, Jan Wieck wrote:
> Not yet, but I will enter it so that we can get it into 15 for sure.

I may be missing something but this is not listed:
https://commitfest.postgresql.org/33/

Could you add it to the CF app please?  There are so many patches and
discussions that this would easily get lost if you don't register it.
And from what I can see, having a discussion on this matter first looks
appropriate to me.
--
Michael




Re: pg_upgrade and wraparound

2021-05-03 Thread Jan Wieck

On 4/30/21 3:32 PM, Bruce Momjian wrote:
> On Sat, Mar 13, 2021 at 08:43:54AM -0500, Jan Wieck wrote:
>> On 3/12/21 8:30 PM, Michael Paquier wrote:
>>> Hi Jan,
>>>
>>> On Fri, Mar 12, 2021 at 06:13:33PM -0500, Jan Wieck wrote:
>>>> One of the things in my way is that when using pg_resetwal to put the
>>>> NextXID way into the future (to push the old cluster close to wraparound
>>>> for example), the postmaster won't start because it doesn't have the
>>>> pg_xact files for that around. Should pg_resetwal create the files in
>>>> the gap between the old NextXID and the new one?
>>>
>>> I think that you should add this patch to the next commit fest to
>>> track it properly:
>>> https://commitfest.postgresql.org/33/
>>> --
>>> Michael
>>
>> Actually this is the wrong patch (this one is for PG-12, not for HEAD).
>> Will update later today.
>>
>> But yes, putting it into the next commitfest after initial discussion is
>> the plan.
>
> Uh, were either of these things done?

Not yet, but I will enter it so that we can get it into 15 for sure.


Regards, Jan

--
Jan Wieck
Postgres User since 1994




Re: pg_upgrade and wraparound

2021-04-30 Thread Bruce Momjian
On Sat, Mar 13, 2021 at 08:43:54AM -0500, Jan Wieck wrote:
> On 3/12/21 8:30 PM, Michael Paquier wrote:
> > Hi Jan,
> > 
> > On Fri, Mar 12, 2021 at 06:13:33PM -0500, Jan Wieck wrote:
> > > One of the things in my way is that when using pg_resetwal to put the
> > > NextXID way into the future (to push the old cluster close to wraparound
> > > for example), the postmaster won't start because it doesn't have the pg_xact
> > > files for that around. Should pg_resetwal create the files in the gap
> > > between the old NextXID and the new one?
> > 
> > I think that you should add this patch to the next commit fest to
> > track it properly:
> > https://commitfest.postgresql.org/33/
> > --
> > Michael
> > 
> 
> Actually this is the wrong patch (this one is for PG-12, not for HEAD). Will
> update later today.
> 
> But yes, putting it into the next commitfest after initial discussion is the
> plan.

Uh, were either of these things done?

-- 
  Bruce Momjian  https://momjian.us
  EDB  https://enterprisedb.com

  If only the physical world exists, free will is an illusion.





Re: pg_upgrade and wraparound

2021-03-13 Thread Jan Wieck

On 3/12/21 8:30 PM, Michael Paquier wrote:
> Hi Jan,
>
> On Fri, Mar 12, 2021 at 06:13:33PM -0500, Jan Wieck wrote:
>> One of the things in my way is that when using pg_resetwal to put the
>> NextXID way into the future (to push the old cluster close to wraparound
>> for example), the postmaster won't start because it doesn't have the
>> pg_xact files for that around. Should pg_resetwal create the files in
>> the gap between the old NextXID and the new one?
>
> I think that you should add this patch to the next commit fest to
> track it properly:
> https://commitfest.postgresql.org/33/
> --
> Michael

Actually this is the wrong patch (this one is for PG-12, not for HEAD).
Will update later today.

But yes, putting it into the next commitfest after initial discussion is
the plan.



Regards, Jan

--
Jan Wieck
Principal Database Engineer
Amazon Web Services




Re: pg_upgrade and wraparound

2021-03-12 Thread Michael Paquier
Hi Jan,

On Fri, Mar 12, 2021 at 06:13:33PM -0500, Jan Wieck wrote:
> One of the things in my way is that when using pg_resetwal to put the
> NextXID way into the future (to push the old cluster close to wraparound for
> example), the postmaster won't start because it doesn't have the pg_xact
> files for that around. Should pg_resetwal create the files in the gap
> between the old NextXID and the new one?

I think that you should add this patch to the next commit fest to
track it properly:
https://commitfest.postgresql.org/33/
--
Michael




Re: pg_upgrade and wraparound

2021-03-12 Thread Jan Wieck

Resurrecting an old thread.

We (AWS) have seen this wraparound during pg_upgrade more often recently 
with customers who have millions of large objects in their databases.


On 6/11/18 1:14 PM, Tom Lane wrote:
> Andres Freund  writes:
>> I suspect the issue is that pg_resetwal does:
>> 	if (set_xid != 0)
>> 	{
>> 		ControlFile.checkPointCopy.nextXid = set_xid;
>>
>> 		/*
>> 		 * For the moment, just set oldestXid to a value that will force
>> 		 * immediate autovacuum-for-wraparound.  It's not clear whether adding
>> 		 * user control of this is useful, so let's just do something that's
>> 		 * reasonably safe.  The magic constant here corresponds to the
>> 		 * maximum allowed value of autovacuum_freeze_max_age.
>> 		 */
>> 		ControlFile.checkPointCopy.oldestXid = set_xid - 2000000000;
>> 		if (ControlFile.checkPointCopy.oldestXid < FirstNormalTransactionId)
>> 			ControlFile.checkPointCopy.oldestXid += FirstNormalTransactionId;
>> 		ControlFile.checkPointCopy.oldestXidDB = InvalidOid;
>> 	}
>>
>> but we have code paths that don't check for oldestXidDB being
>> InvalidOid.  Not great.
>
> Hm, I think I'd define the problem as "pg_resetwal is violating the
> expectation that oldestXidDB be valid".
>
> However, this just explains the basically-cosmetic issue that the
> complaint message mentions OID 0.  It doesn't really get us to the
> answer to why Alexander is seeing a failure.  It might be useful
> to see pg_controldata output for the old cluster, as well as
> "select datname, datfrozenxid from pg_database" output from the
> old cluster.


Unfortunately I don't have pg_controldata output from the old clusters 
either. I would like to be able to artificially create an "old" cluster 
that fails during pg_upgrade in that way.


One of the things in my way is that when using pg_resetwal to put the 
NextXID way into the future (to push the old cluster close to wraparound 
for example), the postmaster won't start because it doesn't have the 
pg_xact files for that around. Should pg_resetwal create the files in 
the gap between the old NextXID and the new one?
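
For experimenting, the gap can also be filled by hand. Below is a rough,
self-contained sketch of such gap-filling (illustration only, not
pg_resetwal code; it assumes the default BLCKSZ of 8192, which gives
32768 transaction status entries per clog page and 32 pages, i.e.
1048576 XIDs, per segment, plus the %04X segment naming used by the
SLRU machinery; zero-filled pages read back as "transaction in
progress", which is enough to let the postmaster start for testing):

#include <stdio.h>
#include <stdint.h>
#include <string.h>

#define XIDS_PER_SEGMENT	(32768 * 32)	/* assumes BLCKSZ = 8192 */

static void
fill_xact_gap(const char *pg_xact_dir,
			  uint32_t old_next_xid, uint32_t new_next_xid)
{
	char		zeroes[8192];

	memset(zeroes, 0, sizeof(zeroes));

	for (uint32_t seg = old_next_xid / XIDS_PER_SEGMENT;
		 seg <= new_next_xid / XIDS_PER_SEGMENT;
		 seg++)
	{
		char		path[1024];
		FILE	   *f;

		snprintf(path, sizeof(path), "%s/%04X", pg_xact_dir, seg);
		f = fopen(path, "ab");			/* create the segment if missing */
		if (f == NULL)
			continue;
		fseek(f, 0, SEEK_END);
		if (ftell(f) == 0)				/* only fill brand-new segments */
			for (int page = 0; page < 32; page++)
				fwrite(zeroes, 1, sizeof(zeroes), f);
		fclose(f);
	}
}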


One thing I do have is a patch that provides a workaround for the 
problem as well as a substantial speed improvement for the case at hand. 
This patch adds some options to pg_upgrade, pg_dump and pg_restore.


Option added to pg_dump:

--blob-in-parallel

This option requires --schema-only. It causes pg_dump to emit the BLOB 
metadata with SECTION_DATA instead of SECTION_PRE_DATA. This moves the 
statements creating the large object metadata (lo_create(OID) and 
ALTER LARGE OBJECT) into the parallel phase of pg_restore, which 
means the metadata is created in parallel. In my tests a database 
containing only large objects is upgraded in roughly 1/#cores of the time.


Option added to pg_restore:

--blob-batch-size=N

With this option pg_restore tries to put N BLOB TOC entries into one 
transaction. This is per parallel worker, and a batch is committed early 
whenever the object type changes, so only BLOB TOC entries are ever 
batched at all. With a sufficient 'max_locks_per_transaction', a 
--blob-batch-size=1000 nicely reduces the number of XIDs consumed for 
upgrading 10M large objects from 20M to 10K.
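
The batching amounts to something like the following self-contained
sketch (plain libpq and illustrative types, not the patch code, which
lives inside pg_restore's TOC processing):

#include <libpq-fe.h>
#include <stdbool.h>
#include <string.h>

typedef struct
{
	const char *desc;			/* TOC entry type, e.g. "BLOB" */
	const char *sql;			/* statement to run for this entry */
} TocEntrySketch;

static void
restore_entries_batched(PGconn *conn, TocEntrySketch *entries,
						int n, int batch_size)
{
	int			in_batch = 0;

	for (int i = 0; i < n; i++)
	{
		bool		is_blob = (strcmp(entries[i].desc, "BLOB") == 0);

		/* an entry of any other type ends the current batch */
		if (!is_blob && in_batch > 0)
		{
			PQclear(PQexec(conn, "COMMIT"));
			in_batch = 0;
		}
		if (is_blob && in_batch == 0)
			PQclear(PQexec(conn, "BEGIN"));

		PQclear(PQexec(conn, entries[i].sql));

		/* commit after every batch_size consecutive BLOB entries */
		if (is_blob && ++in_batch >= batch_size)
		{
			PQclear(PQexec(conn, "COMMIT"));
			in_batch = 0;
		}
	}
	if (in_batch > 0)
		PQclear(PQexec(conn, "COMMIT"));
}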


Options added to pg_upgrade:

--blob-in-parallel   forwarded to pg_dump
--blob-batch-size=N  forwarded to pg_restore
--restore-jobs=N forwarded as --jobs=N to pg_restore
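
Combined, a run over a blob-heavy cluster would be invoked along these
lines (paths and values illustrative):

  pg_upgrade -b /old/bin -B /new/bin -d /old/data -D /new/data \
      --blob-in-parallel --blob-batch-size=1000 --restore-jobs=8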


Patch is attached.


Regards, Jan


--
Jan Wieck
Principal Database Engineer
Amazon Web Services
diff --git a/src/bin/pg_dump/parallel.c b/src/bin/pg_dump/parallel.c
index c6059fc..fdcb5e7 100644
--- a/src/bin/pg_dump/parallel.c
+++ b/src/bin/pg_dump/parallel.c
@@ -865,6 +865,11 @@ RunWorker(ArchiveHandle *AH, ParallelSlot *slot)
 	WaitForCommands(AH, pipefd);
 
 	/*
+	 * Close any still-open BLOB batch transaction.
+	 */
+	CommitBlobTransaction((Archive *)AH);
+
+	/*
 	 * Disconnect from database and clean up.
 	 */
 	set_cancel_slot_archive(slot, NULL);
diff --git a/src/bin/pg_dump/pg_backup.h b/src/bin/pg_dump/pg_backup.h
index 97941fa..2bedd02 100644
--- a/src/bin/pg_dump/pg_backup.h
+++ b/src/bin/pg_dump/pg_backup.h
@@ -201,6 +201,8 @@ typedef struct Archive
 
 	int			numWorkers;		/* number of parallel processes */
 	char	   *sync_snapshot_id;	/* sync snapshot id for parallel operation */
+	int			blobBatchSize;	/* # of blobs to restore per transaction */
+	bool		blobInParallel;	/* place "BLOB" TEs in SECTION_DATA */
 
 	/* info needed for string escaping */
 	int			encoding;		/* libpq code for client_encoding */
@@ -268,6 +270,7 @@ extern void WriteData(Archive *AH, const void *data, size_t dLen);
 extern int	StartBlob(Archive *AH, Oid oid);
 extern int	EndBlob(Archive *AH, Oid oid);
 
+extern void CommitBlobTransaction(Archive *AH);

Re: pg_upgrade and wraparound

2018-06-27 Thread Daniel Verite
Alexander Shutyaev wrote:

> Is there any hope the issue with pg_upgrade can be resolved? If not,
> could you give me some hints as to how I can decrease the time needed
> for pg_dumpall | psql?

Not sure about the larger problem, but for the part about having more
than 1 million large objects that get committed individually, setting
fsync=off for the reload phase is likely to help a lot.

Just don't forget to turn it on again when it's done. See
https://blog.2ndquadrant.com/postgresql-fsync-off-warning-in-config-file/
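
For the reload phase that means something like the following in
postgresql.conf (synchronous_commit and full_page_writes are often
relaxed together with fsync for bulk loads; revert all of them when
the reload is done):

  fsync = off
  synchronous_commit = off
  full_page_writes = off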


Best regards,
-- 
Daniel Vérité
PostgreSQL-powered mailer: http://www.manitou-mail.org
Twitter: @DanielVerite



Re: pg_upgrade and wraparound

2018-06-26 Thread Arjen Nienhuis
On Tue, Jun 26, 2018 at 8:38 AM Alexander Shutyaev  wrote:
>
> Hello again,
>
> I've performed another test - I've migrated to the new cluster using
> dump/restore: pg_dumpall | psql. It went well, although it took 6 days
> while pg_upgrade usually took a night.
>
> Is there any hope the issue with pg_upgrade can be resolved? If not, could 
> you give me some hints as to how I can decrease the time needed for pg_dumpall |
> psql?

If you use pg_dump with the custom format you can use pg_restore
--jobs=n to do a parallel restore.

I think you can do pg_dumpall --globals-only to restore users, and
then do a pg_dump per database.
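
For example (illustrative; -Fc selects the custom format, and
pg_restore's --jobs works with custom- or directory-format archives):

  pg_dumpall --globals-only > globals.sql
  pg_dump -Fc -d bof -f bof.dump
  psql -d postgres -f globals.sql
  pg_restore --create --jobs=8 -d postgres bof.dump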

Greetings, Arjen



Re: pg_upgrade and wraparound

2018-06-26 Thread Alexander Shutyaev
Hello again,

I've performed another test - I've migrated to the new cluster using
dump/restore: pg_dumpall | psql. It went well, although it took 6 days while
pg_upgrade usually took a night.

Is there any hope the issue with pg_upgrade can be resolved? If not, could
you give me some hints as to how I can decrease the time needed for pg_dumpall
| psql?

Thanks in advance!

2018-06-13 0:11 GMT+03:00 Alexander Shutyaev :

> Back again,
>
> >> Alexander, could you hack things up so autovacuum logging is enabled
> >> (log_autovacuum_min_duration=0), and see whether it's triggered?
>
> I've changed this config setting in both the 9.6 and 10.4 postgresql.conf,
> then I've run pg_upgrade once more.
>
> However, I'm not sure how I can see whether autovacuum was triggered or
> not. I've tried grepping the logs for lines containing both 'vacuum' and
> 'auto' (case-insensitive) - there were none. If you can be more specific, I
> can look for anything else.
>
> I've googled how one can see that autovacuum is working, and found
> this query, which I ran on the 10.4 cluster:
>
> select count(*) from pg_stat_all_tables where last_autovacuum is not null;
>  count
> -------
>      0
> (1 row)
>
> So it seems autovacuum is indeed not working, just as you proposed.
>
> If I correctly summarized all your responses, the problem is that:
>
> 1) pg_restore (as part of pg_upgrade) inserts each large object in a
> different transaction
>
> That seems true to me given the log output - each time an object is
> inserted the wraparound warning decrements by 1
>
> 2) the autovacuum doesn't work while the database is restored
>
> That also seems true (see above)
>
> 3) the number of large objects is so big that as they are restored the
> transaction wraparound occurs
>
> Here's the number of large objects taken from the 9.6 cluster (spaces
> added manually for clarity):
>
> select count(*) from pg_largeobject_metadata;
>    count
> -----------
>  133 635 871
> (1 row)
>
> If I've googled correctly, the transaction number is a 32-bit integer, so
> its limit is 2 147 483 647, which is a lot more. I guess I'm missing
> something.
>
> This is just my attempt to summarize our progress so far.
>
> I'm further open to your suggestions.
>
> 2018-06-12 14:32 GMT+03:00 Daniel Verite :
>
>> Andres Freund wrote:
>>
>> > I'm not entirely clear why pg_restore appears to use a separate
>> > transaction for each large object, surely exacerbating the problem.
>>
>> To make sure that per-object locks don't fill up the shared
>> lock table?
>> There might be hundreds of thousands of large objects.
>> If it had to restore N objects per transaction, would it know
>> how to compute N that is large enough to be effective
>> and small enough not to exhaust the shared table?
>>
>> Best regards,
>> --
>> Daniel Vérité
>> PostgreSQL-powered mailer: http://www.manitou-mail.org
>> Twitter: @DanielVerite
>>
>
>


Re: pg_upgrade and wraparound

2018-06-12 Thread Alexander Shutyaev
 Back again,

>> Alexander, could you hack things up so autovacuum logging is enabled
>> (log_autovacuum_min_duration=0), and see whether it's triggered?

I've changed this config setting in both the 9.6 and 10.4 postgresql.conf,
then I've run pg_upgrade once more.

However, I'm not sure how I can see whether autovacuum was triggered or not.
I've tried grepping the logs for lines containing both 'vacuum' and 'auto'
(case-insensitive) - there were none. If you can be more specific, I can
look for anything else.

I've googled how one can see that autovacuum is working, and found
this query, which I ran on the 10.4 cluster:

select count(*) from pg_stat_all_tables where last_autovacuum is not null;
 count
-------
     0
(1 row)

So it seems autovacuum is indeed not working, just as you proposed.

If I correctly summarized all your responses, the problem is that:

1) pg_restore (as part of pg_upgrade) inserts each large object in a
different transaction

That seems true to me given the log output - each time an object is
inserted the wraparound warning decrements by 1

2) the autovacuum doesn't work while the database is restored

That also seems true (see above)

3) the number of large objects is so big that as they are restored the
transaction wraparound occurs

Here's the number of large objects taken from the 9.6 cluster (spaces added
manually for clarity):

select count(*) from pg_largeobject_metadata;
   count
-----------
 133 635 871
(1 row)

If I've googled correctly, the transaction number is a 32-bit integer, so
its limit is 2 147 483 647, which is a lot more. I guess I'm missing
something.

This is just my attempt to summarize our progress so far.

I'm further open to your suggestions.

2018-06-12 14:32 GMT+03:00 Daniel Verite :

> Andres Freund wrote:
>
> > I'm not entirely clear why pg_restore appears to use a separate
> > transaction for each large object, surely exacerbating the problem.
>
> To make sure that per-object locks don't fill up the shared
> lock table?
> There might be hundreds of thousands of large objects.
> If it had to restore N objects per transaction, would it know
> how to compute N that is large enough to be effective
> and small enough not to exhaust the shared table?
>
> Best regards,
> --
> Daniel Vérité
> PostgreSQL-powered mailer: http://www.manitou-mail.org
> Twitter: @DanielVerite
>


Re: pg_upgrade and wraparound

2018-06-12 Thread Daniel Verite
Andres Freund wrote:

> I'm not entirely clear why pg_restore appears to use a separate
> transaction for each large object, surely exacerbating the problem.

To make sure that per-object locks don't fill up the shared
lock table?
There might be hundreds of thousands of large objects.
If it had to restore N objects per transaction, would it know
how to compute N that is large enough to be effective
and small enough not to exhaust the shared table?
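
(For reference, the shared lock table holds roughly
max_locks_per_transaction * (max_connections + max_prepared_transactions)
locks in total, so any fixed N would only be safe relative to settings
the restore cannot assume.)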

Best regards,
-- 
Daniel Vérité
PostgreSQL-powered mailer: http://www.manitou-mail.org
Twitter: @DanielVerite



Re: pg_upgrade and wraparound

2018-06-11 Thread Alexander Shutyaev
>>  From the query below I am going to say the above query was done on the
bof database. Is that correct?

Yes, it is.

>>  Can you run the table_name query in template0 in the 9.6 cluster?

At first I couldn't. There was an error:

psql: FATAL:  database "template0" is not currently accepting connections

I've googled for it and found out that template0 is some special system
database that is protected from connections, but that can be changed (see
https://wiki.postgresql.org/wiki/Adventures_in_PostgreSQL,_Episode_1)
I've changed that and connected to this database and ran your query. The
result is attached in 96-query1-template0.txt.

2018-06-11 22:10 GMT+03:00 Adrian Klaver :

> On 06/11/2018 11:32 AM, Alexander Shutyaev wrote:
>
>> I'm back with more details.
>>
>> First, I've deleted the smaller sslentry database, since I don't need it,
>> just so that it doesn't somehow spoil the picture. Now there is only 1 user
>> database - bof (OID=16400). After that I've run the pg_upgrade on a clean
>> 10.4 cluster and it failed in the same way.
>>
>> Now, the answers to your queries.
>>
>>  >> 2) The upgrade stops because of transaction ID wraparound, which is
>> strange as that is not showing up in the 9.6 cluster I presume. You might
>> want to run the queries found below on the 9.6 and 10 clusters to help figure this
>> out:
>>
>>  >> SELECT c.oid::regclass as table_name,
>>  >>greatest(age(c.relfrozenxid),age(t.relfrozenxid)) as age
>>  >> FROM pg_class c
>>  >> LEFT JOIN pg_class t ON c.reltoastrelid = t.oid
>>  >> WHERE c.relkind IN ('r', 'm');
>>
>> 9.6 result - see attached 96-query1.txt
>> 10.4 result - see attached 104-query1.txt
>>
>
> I should have been clearer in my previous post, the above query is per
> database. From the query below I am going to say the above query was done
> on the bof database. Is that correct?
>
> Given the below from 96-query2.txt:
>
> template0 | 110588398
>
> Can you run the table_name query in template0 in the 9.6 cluster?
>
>
>
>>  >> SELECT datname, age(datfrozenxid) FROM pg_database;
>>
>> 9.6 result - see attached 96-query2.txt
>> 10.4 result - see attached 104-query2.txt
>>
>>  >>  It might be useful
>>  >> to see pg_controldata output for the old cluster, as well as
>>  >> "select datname, datfrozenxid from pg_database" output from the
>>  >> old cluster.
>>
>> for the query - see above, for pg_controldata:
>>
>> 9.6 - see attached 96-pg_controldata.txt
>> 10.4 - see attached 104-pg_controldata.txt
>>
>>  >> Alexander, could you hack things up so autovacuum logging is enabled
>>  >> (log_autovacuum_min_duration=0), and see whether it's triggered?
>>
>> I'll be happy to, but that will require running pg_upgrade once more, and
>> that takes more than half a day, during which time the clusters are not
>> available to me. Given the data I'm attaching it may happen that the
>> colleagues will want to see something else from my clusters or maybe change
>> some settings before running the pg_upgrade again. Therefore, I'll wait 12
>> hours after this message in case there are any more requests, and then
>> I'll run the pg_upgrade again.
>>
>> Thank you all for trying to solve this matter, this is much appreciated!
>> :)
>>
2018-06-11 20:29 GMT+03:00 Andres Freund :

> On 2018-06-11 13:14:12 -0400, Tom Lane wrote:
> > Andres Freund  writes:
> > > I suspect the issue is that pg_resetwal does:
> > > 	if (set_xid != 0)
> > > 	{
> > > 		ControlFile.checkPointCopy.nextXid = set_xid;
> > >
> > > 		/*
> > > 		 * For the moment, just set oldestXid to a value that will force
> > > 		 * immediate autovacuum-for-wraparound.  It's not clear whether adding
> > > 		 * user control of this is useful, so let's just do something that's
> > > 		 * reasonably safe.  The magic constant here corresponds to the
> > > 		 * maximum allowed value of autovacuum_freeze_max_age.
> > > 		 */
> > > 		ControlFile.checkPointCopy.oldestXid = set_xid - 2000000000;
> > > 		if (ControlFile.checkPointCopy.oldestXid < FirstNormalTransactionId)
> > > 			ControlFile.checkPointCopy.oldestXid += FirstNormalTransactionId;
> > > 		ControlFile.checkPointCopy.oldestXidDB = InvalidOid;
> > > 	}
> > >
> > > but we have code paths that don't check for oldestXidDB being
> > > InvalidOid.  Not great.
> >
> > Hm, I think I'd define the problem as "pg_resetwal is violating the
> > expectation that oldestXidDB be valid".
>
> Well, what could it do otherwise?  ForceTransactionIdLimitUpdate()
> currently does a syscache check for database existence. That'll just
> return a lookup failure for InvalidOid, so we're reasonably good on that
> front.
>
> Using a hardcoded 2000000000 seems worse, will have funny results if
> running with a smaller autovacuum_freeze_max_age...

Re: pg_upgrade and wraparound

2018-06-11 Thread Adrian Klaver

On 06/11/2018 11:32 AM, Alexander Shutyaev wrote:

I'm back with more details.

First, I've deleted the smaller sslentry database, since I don't need 
it, just so that it doesn't somehow spoil the picture. Now there is only 
1 user database - bof (OID=16400). After that I've run the pg_upgrade on 
a clean 10.4 cluster and it failed in the same way.


Now, the answers to your queries.

 >> 2) The upgrade stops because of transaction ID wraparound, which is 
strange as that is not showing up in the 9.6 cluster I presume. You 
might want to run the queries found below on the 9.6 and 10 clusters to help 
figure this out:


 >> SELECT c.oid::regclass as table_name,
 >>        greatest(age(c.relfrozenxid),age(t.relfrozenxid)) as age
 >> FROM pg_class c
 >> LEFT JOIN pg_class t ON c.reltoastrelid = t.oid
 >> WHERE c.relkind IN ('r', 'm');

9.6 result - see attached 96-query1.txt
10.4 result - see attached 104-query1.txt


I should have been clearer in my previous post, the above query is per 
database. From the query below I am going to say the above query was 
done on the bof database. Is that correct?


Given the below from 96-query2.txt:

template0 | 110588398

Can you run the table_name query in template0 in the 9.6 cluster?




 >> SELECT datname, age(datfrozenxid) FROM pg_database;

9.6 result - see attached 96-query2.txt
10.4 result - see attached 104-query2.txt

 >>  It might be useful
 >> to see pg_controldata output for the old cluster, as well as
 >> "select datname, datfrozenxid from pg_database" output from the
 >> old cluster.

for the query - see above, for pg_controldata:

9.6 - see attached 96-pg_controldata.txt
10.4 - see attached 104-pg_controldata.txt

 >> Alexander, could you hack things up so autovacuum logging is enabled
 >> (log_autovacuum_min_duration=0), and see whether it's triggered?

I'll be happy to, but that will require running pg_upgrade once more, and 
that takes more than half a day, during which time the clusters are not 
available to me. Given the data I'm attaching it may happen that the 
colleagues will want to see something else from my clusters or maybe 
change some settings before running the pg_upgrade again. Therefore, 
I'll wait 12 hours after this message in case there are any more 
requests, and then I'll run the pg_upgrade again.


Thank you all for trying to solve this matter, this is much appreciated! :)

2018-06-11 20:29 GMT+03:00 Andres Freund :

On 2018-06-11 13:14:12 -0400, Tom Lane wrote:
> Andres Freund  writes:
> > I suspect the issue is that pg_resetwal does:
> > 	if (set_xid != 0)
> > 	{
> > 		ControlFile.checkPointCopy.nextXid = set_xid;
> >
> > 		/*
> > 		 * For the moment, just set oldestXid to a value that will force
> > 		 * immediate autovacuum-for-wraparound.  It's not clear whether adding
> > 		 * user control of this is useful, so let's just do something that's
> > 		 * reasonably safe.  The magic constant here corresponds to the
> > 		 * maximum allowed value of autovacuum_freeze_max_age.
> > 		 */
> > 		ControlFile.checkPointCopy.oldestXid = set_xid - 2000000000;
> > 		if (ControlFile.checkPointCopy.oldestXid < FirstNormalTransactionId)
> > 			ControlFile.checkPointCopy.oldestXid += FirstNormalTransactionId;
> > 		ControlFile.checkPointCopy.oldestXidDB = InvalidOid;
> > 	}
> >
> > but we have code paths that don't check for oldestXidDB being
> > InvalidOid.  Not great.
>
> Hm, I think I'd define the problem as "pg_resetwal is violating the
> expectation that oldestXidDB be valid".

Well, what could it do otherwise?  ForceTransactionIdLimitUpdate()
currently does a syscache check for database existence. That'll just
return a lookup failure for InvalidOid, so we're reasonably good on that
front.

Using a hardcoded 2000000000 seems worse, will have funny results if
running with a smaller autovacuum_freeze_max_age...

> However, this just explains the basically-cosmetic issue that the
> complaint message mentions OID 0.  It doesn't really get us to the
> answer to why Alexander is seeing a failure.  It might be useful
> to see pg_controldata output for the old cluster, as well as
> "select datname, datfrozenxid from pg_database" output from the
> old cluster.

pg_upgrade starts the server with autovacuum disabled, I suspect
restoring all the large objects ends up using a lot of transaction
ids. GetNewTransactionId() should start autovacuum, but I'd guess that's
where things are going wrong for some reason.

Alexander, could you hack things up so autovacuum logging is enabled
(log_autovacuum_min_duration=0), and see whether it's triggered?

I'm not entirely clear why pg_restore appears to use a separate
transaction for each large object, surely exacerbating the problem.

Re: pg_upgrade and wraparound

2018-06-11 Thread Alexander Shutyaev
I'm back with more details.

First, I've deleted the smaller sslentry database, since I don't need it,
just so that it doesn't somehow spoil the picture. Now there is only 1 user
database - bof (OID=16400). After that I've run the pg_upgrade on a clean
10.4 cluster and it failed in the same way.

Now, the answers to your queries.

>> 2) The upgrade stops because of transaction ID wraparound, which is
strange as that is not showing up in the 9.6 cluster I presume. You might
want to run the queries found below on the 9.6 and 10 clusters to help figure this
out:

>> SELECT c.oid::regclass as table_name,
>>greatest(age(c.relfrozenxid),age(t.relfrozenxid)) as age
>> FROM pg_class c
>> LEFT JOIN pg_class t ON c.reltoastrelid = t.oid
>> WHERE c.relkind IN ('r', 'm');

9.6 result - see attached 96-query1.txt
10.4 result - see attached 104-query1.txt

>> SELECT datname, age(datfrozenxid) FROM pg_database;

9.6 result - see attached 96-query2.txt
10.4 result - see attached 104-query2.txt

>>  It might be useful
>> to see pg_controldata output for the old cluster, as well as
>> "select datname, datfrozenxid from pg_database" output from the
>> old cluster.

for the query - see above, for pg_controldata:

9.6 - see attached 96-pg_controldata.txt
10.4 - see attached 104-pg_controldata.txt

>> Alexander, could you hack things up so autovacuum logging is enabled
>> (log_autovacuum_min_duration=0), and see whether it's triggered?

I'll be happy to, but that will require running pg_upgrade once more, and
that takes more than half a day, during which time the clusters are not
available to me. Given the data I'm attaching it may happen that the
colleagues will want to see something else from my clusters or maybe change
some settings before running the pg_upgrade again. Therefore, I'll wait 12
hours after this message in case there are any more requests, and then
I'll run the pg_upgrade again.

Thank you all for trying to solve this matter, this is much appreciated! :)

2018-06-11 20:29 GMT+03:00 Andres Freund :

> On 2018-06-11 13:14:12 -0400, Tom Lane wrote:
> > Andres Freund  writes:
> > > I suspect the issue is that pg_resetwal does:
> > > 	if (set_xid != 0)
> > > 	{
> > > 		ControlFile.checkPointCopy.nextXid = set_xid;
> >
> > > 		/*
> > > 		 * For the moment, just set oldestXid to a value that will force
> > > 		 * immediate autovacuum-for-wraparound.  It's not clear whether adding
> > > 		 * user control of this is useful, so let's just do something that's
> > > 		 * reasonably safe.  The magic constant here corresponds to the
> > > 		 * maximum allowed value of autovacuum_freeze_max_age.
> > > 		 */
> > > 		ControlFile.checkPointCopy.oldestXid = set_xid - 2000000000;
> > > 		if (ControlFile.checkPointCopy.oldestXid < FirstNormalTransactionId)
> > > 			ControlFile.checkPointCopy.oldestXid += FirstNormalTransactionId;
> > > 		ControlFile.checkPointCopy.oldestXidDB = InvalidOid;
> > > 	}
> >
> > > but we have code paths that don't check for oldestXidDB being
> > > InvalidOid.  Not great.
> >
> > Hm, I think I'd define the problem as "pg_resetwal is violating the
> > expectation that oldestXidDB be valid".
>
> Well, what could it do otherwise?  ForceTransactionIdLimitUpdate()
> currently does a syscache check for database existence. That'll just
> return a lookup failure for InvalidOid, so we're reasonably good on that
> front.
>
> Using a hardcoded 2000000000 seems worse, will have funny results if
> running with a smaller autovacuum_freeze_max_age...
>
>
> > However, this just explains the basically-cosmetic issue that the
> > complaint message mentions OID 0.  It doesn't really get us to the
> > answer to why Alexander is seeing a failure.  It might be useful
> > to see pg_controldata output for the old cluster, as well as
> > "select datname, datfrozenxid from pg_database" output from the
> > old cluster.
>
> pg_upgrade starts the server with autovacuum disabled, I suspect
> restoring all the large objects ends up using a lot of transaction
> ids. GetNewTransactionId() should start autovacuum, but I'd guess that's
> where things are going wrong for some reason.
>
> Alexander, could you hack things up so autovacuum logging is enabled
> (log_autovacuum_min_duration=0), and see whether it's triggered?
>
> I'm not entirely clear why pg_restore appears to use a separate
> transaction for each large object, surely exascerbating the problem.
>
> Greetings,
>
> Andres Freund
>
pg_control version number:960
Catalog version number:   201608131
Database system identifier:   6341258630649216079
Database cluster state:   in production
pg_control last modified: Mon 11 Jun 2018 09:10:28 PM MSK
Latest checkpoint location:   811B/24CA0B40
Prior checkpoint location:811B/24CA0AD0
Latest checkpoint's REDO location:  

Re: pg_upgrade and wraparound

2018-06-11 Thread Andres Freund
On 2018-06-11 13:14:12 -0400, Tom Lane wrote:
> Andres Freund  writes:
> > I suspect the issue is that pg_resetwal does:
> > 	if (set_xid != 0)
> > 	{
> > 		ControlFile.checkPointCopy.nextXid = set_xid;
> 
> > 		/*
> > 		 * For the moment, just set oldestXid to a value that will force
> > 		 * immediate autovacuum-for-wraparound.  It's not clear whether adding
> > 		 * user control of this is useful, so let's just do something that's
> > 		 * reasonably safe.  The magic constant here corresponds to the
> > 		 * maximum allowed value of autovacuum_freeze_max_age.
> > 		 */
> > 		ControlFile.checkPointCopy.oldestXid = set_xid - 2000000000;
> > 		if (ControlFile.checkPointCopy.oldestXid < FirstNormalTransactionId)
> > 			ControlFile.checkPointCopy.oldestXid += FirstNormalTransactionId;
> > 		ControlFile.checkPointCopy.oldestXidDB = InvalidOid;
> > 	}
> 
> > but we have code paths that don't check for oldestXidDB being
> > InvalidOid.  Not great.
> 
> Hm, I think I'd define the problem as "pg_resetwal is violating the
> expectation that oldestXidDB be valid".

Well, what could it do otherwise?  ForceTransactionIdLimitUpdate()
currently does a syscache check for database existence. That'll just
return a lookup failure for InvalidOid, so we're reasonably good on that
front.

Using a hardcoded 2000000000 seems worse, will have funny results if
running with a smaller autovacuum_freeze_max_age...


> However, this just explains the basically-cosmetic issue that the
> complaint message mentions OID 0.  It doesn't really get us to the
> answer to why Alexander is seeing a failure.  It might be useful
> to see pg_controldata output for the old cluster, as well as
> "select datname, datfrozenxid from pg_database" output from the
> old cluster.

pg_upgrade starts the server with autovacuum disabled, I suspect
restoring all the large objects ends up using a lot of transaction
ids. GetNewTransactionId() should start autovacuum, but I'd guess that's
where things are going wrong for some reason.

Alexander, could you hack things up so autovacuum logging is enabled
(log_autovacuum_min_duration=0), and see whether it's triggered?

I'm not entirely clear why pg_restore appears to use a separate
transaction for each large object, surely exacerbating the problem.

Greetings,

Andres Freund



Re: pg_upgrade and wraparound

2018-06-11 Thread Tom Lane
Andres Freund  writes:
> I suspect the issue is that pg_resetwal does:
> 	if (set_xid != 0)
> 	{
> 		ControlFile.checkPointCopy.nextXid = set_xid;

> 		/*
> 		 * For the moment, just set oldestXid to a value that will force
> 		 * immediate autovacuum-for-wraparound.  It's not clear whether adding
> 		 * user control of this is useful, so let's just do something that's
> 		 * reasonably safe.  The magic constant here corresponds to the
> 		 * maximum allowed value of autovacuum_freeze_max_age.
> 		 */
> 		ControlFile.checkPointCopy.oldestXid = set_xid - 2000000000;
> 		if (ControlFile.checkPointCopy.oldestXid < FirstNormalTransactionId)
> 			ControlFile.checkPointCopy.oldestXid += FirstNormalTransactionId;
> 		ControlFile.checkPointCopy.oldestXidDB = InvalidOid;
> 	}

> but we have code paths that don't check for oldestXidDB being
> InvalidOid.  Not great.

Hm, I think I'd define the problem as "pg_resetwal is violating the
expectation that oldestXidDB be valid".

However, this just explains the basically-cosmetic issue that the
complaint message mentions OID 0.  It doesn't really get us to the
answer to why Alexander is seeing a failure.  It might be useful
to see pg_controldata output for the old cluster, as well as
"select datname, datfrozenxid from pg_database" output from the
old cluster.

regards, tom lane



Re: pg_upgrade and wraparound

2018-06-11 Thread Andres Freund
On 2018-06-09 15:52:26 -0400, Tom Lane wrote:
> Adrian Klaver  writes:
> > On 06/09/2018 03:46 AM, Alexander Shutyaev wrote:
> >> The upgrade operation failed after several hours with the following error:
> >> database is not accepting commands to avoid wraparound data loss in 
> >> database with OID 0
> 
> > Do you know which database has an OID of 0?
> 
> Well, none do, so the correct question is what is passing an invalid
> database OID to the code that's complaining.  This sure looks like a
> bug, though I'm not sure we have enough info to locate it.

It sure looks like ShmemVariableCache->oldestXidDB isn't initialized.
As far as I remember we just initialize that from a checkpoint at
startup.   I suspect the issue is that pg_resetwal does:
	if (set_xid != 0)
	{
		ControlFile.checkPointCopy.nextXid = set_xid;

		/*
		 * For the moment, just set oldestXid to a value that will force
		 * immediate autovacuum-for-wraparound.  It's not clear whether adding
		 * user control of this is useful, so let's just do something that's
		 * reasonably safe.  The magic constant here corresponds to the
		 * maximum allowed value of autovacuum_freeze_max_age.
		 */
		ControlFile.checkPointCopy.oldestXid = set_xid - 2000000000;
		if (ControlFile.checkPointCopy.oldestXid < FirstNormalTransactionId)
			ControlFile.checkPointCopy.oldestXid += FirstNormalTransactionId;
		ControlFile.checkPointCopy.oldestXidDB = InvalidOid;
	}

but we have code paths that don't check for oldestXidDB being
InvalidOid.  Not great.

Greetings,

Andres Freund



Re: pg_upgrade and wraparound

2018-06-11 Thread Andres Freund
Hi,

On 2018-06-09 13:46:16 +0300, Alexander Shutyaev wrote:
> Hello!
> 
> I've been trying to upgrade a postgresql cluster from 9.6 to 10. I've
> executed the pg_upgrade with the following options:
> 
>  /usr/lib/postgresql/10/bin/pg_upgrade -b /usr/lib/postgresql/9.6/bin/ -B
> /usr/lib/postgresql/10/bin/ -d /var/lib/postgresql/9.6/main -D
> /var/lib/postgresql/10/main -o ' -c
> config_file=/etc/postgresql/9.6/main/postgresql.conf' -O ' -c
> config_file=/etc/postgresql/10/main/postgresql.conf'
> 
> The upgrade operation failed after several hours with the following error:
> 
> database is not accepting commands to avoid wraparound data loss in
> database with OID 0
> 
> Earlier in the log there are a lot of messages like
> 
> pg_restore: executing BLOB 1740736966
> pg_restore: WARNING:  database with OID 0 must be vacuumed within 1000279
> transactions
> HINT:  To avoid a database shutdown, execute a database-wide VACUUM in that
> database.
> You might also need to commit or roll back old prepared transactions.

Since this happens in the version you're migrating to, I suggest
replacing the warning with a PANIC and then using the debugger to look
at the corefile generated (when using an appropriate ulimit).  It's not
immediately obvious why there'd not be correct knowledge about the
oldest database around. If you do so, please include 'p *ShmemVariableCache'
output.
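
(Roughly, with illustrative paths - the WARNING in question is raised
from GetNewTransactionId():

  ulimit -c unlimited        # before starting the postmaster
  ... rebuild with that ereport(WARNING, ...) turned into a PANIC ...
  gdb /usr/lib/postgresql/10/bin/postgres /path/to/core
  (gdb) p *ShmemVariableCache

That would capture the state at the moment of the complaint.)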

Greetings,

Andres Freund



Re: pg_upgrade and wraparound

2018-06-11 Thread Adrian Klaver

On 06/10/2018 11:46 PM, Alexander Shutyaev wrote:
 >> Is this the regular Postgres log or the pg_upgrade log which should 
be something like pg_upgrade_server.log?


This is the pg_upgrade_dump_16400.log.

How did you get into the 10 cluster to report on the database OID's and names?

After the pg_upgrade failed I was able to start both clusters, so I 
connected to the new 10.4 cluster and ran the query.


I am at a loss for an explanation. My thoughts:

1) The database with an OID of 0 is a mystery, though it does not seem 
to be stopping the upgrade by itself.


2) The upgrade stops because of transaction ID wraparound, which is 
strange as that is not showing up in the 9.6 cluster I presume. You 
might want to run the queries found below on the 9.6 and 10 clusters to help 
figure this out:


https://www.postgresql.org/docs/10/static/routine-vacuuming.html#VACUUM-FOR-WRAPAROUND

SELECT c.oid::regclass as table_name,
   greatest(age(c.relfrozenxid),age(t.relfrozenxid)) as age
FROM pg_class c
LEFT JOIN pg_class t ON c.reltoastrelid = t.oid
WHERE c.relkind IN ('r', 'm');

SELECT datname, age(datfrozenxid) FROM pg_database;




Which database has the large objects?


bof (OID=16400). It is also effectively the only database that matters 
here. The other one - sslentry - only contains a couple of tables and a 
dozen records.



Did you check this view to confirm?


Yes, I did:

select * from pg_prepared_xacts;
  transaction | gid | prepared | owner | database
-------------+-----+----------+-------+----------
(0 rows)




--
Adrian Klaver
adrian.kla...@aklaver.com



Re: pg_upgrade and wraparound

2018-06-11 Thread Alexander Shutyaev
>>  Is this the regular Postgres log or the pg_upgrade log which should be
something like pg_upgrade_server.log?

This is the pg_upgrade_dump_16400.log.

>>  How did you get into the 10 cluster to report on the database OID's and
names?

After the pg_upgrade failed I was able to start both clusters, so I
connected to the new 10.4 cluster and ran the query.

>>  Which database has the large objects?

bof (OID=16400). It is also effectively the only database that matters
here. The other one - sslentry - only contains a couple of tables and a
dozen records.

>>  Did you check this view to confirm?

Yes, I did:

select * from pg_prepared_xacts;
 transaction | gid | prepared | owner | database
-------------+-----+----------+-------+----------
(0 rows)


2018-06-11 3:15 GMT+03:00 Adrian Klaver :

> On 06/10/2018 02:45 PM, Alexander Shutyaev wrote:
>
> Comments inline.
>
> The error log is like this. Here's its tail:
>>
>
> Is this the regular Postgres log or the pg_upgrade log which should be
> something like pg_upgrade_server.log?
>
>
> pg_restore: [archiver (db)] could not execute query: ERROR:  database is
>> not accepting commands to avoid wraparound data loss in database with OID 0
>> HINT:  Stop the postmaster and vacuum that database in single-user mode.
>>
>
> How did you get into the 10 cluster to report on the database OID's and
> names?
>
> You might also need to commit or roll back old prepared transactions.
>>  Command was: ALTER LARGE OBJECT 1740737402 OWNER TO bof_user;
>>
>> Before that there are a lot of similar messages - the only things
>> changing are the "executing BLOB nnn" number and "must be vacuumed within
>> nnn transactions" number.
>>
>>
> Which database has the large objects?
>
> As for the prepared transactions - no, I don't have them, our application
>> doesn't use this functionality.
>>
>
> Did you check this view to confirm?:
>
> https://www.postgresql.org/docs/10/static/view-pg-prepared-xacts.html
>
> Just trying to eliminate possibilities.
>
>
>
> --
> Adrian Klaver
> adrian.kla...@aklaver.com
>


Re: pg_upgrade and wraparound

2018-06-10 Thread Adrian Klaver

On 06/10/2018 02:45 PM, Alexander Shutyaev wrote:

Comments inline.


The error log is like this. Here's its tail:


Is this the regular Postgres log or the pg_upgrade log which should be 
something like pg_upgrade_server.log?



pg_restore: [archiver (db)] could not execute query: ERROR:  database is 
not accepting commands to avoid wraparound data loss in database with OID 0

HINT:  Stop the postmaster and vacuum that database in single-user mode.


How did you get into the 10 cluster to report on the database OID's and 
names?



You might also need to commit or roll back old prepared transactions.
     Command was: ALTER LARGE OBJECT 1740737402 OWNER TO bof_user;

Before that there are a lot of similar messages - the only things 
changing are the "executing BLOB nnn" number and "must be vacuumed 
within nnn transactions" number.




Which database has the large objects?

As for the prepared transactions - no, I don't have them, our 
application doesn't use this functionality.


Did you check this view to confirm?:

https://www.postgresql.org/docs/10/static/view-pg-prepared-xacts.html

Just trying to eliminate possibilities.



--
Adrian Klaver
adrian.kla...@aklaver.com



Re: pg_upgrade and wraparound

2018-06-10 Thread Alexander Shutyaev
The error log is like this. Here's its tail:

pg_restore: executing BLOB 1740737401
pg_restore: WARNING:  database with OID 0 must be vacuumed within 103
transactions
HINT:  To avoid a database shutdown, execute a database-wide VACUUM in that
database.
You might also need to commit or roll back old prepared transactions.
pg_restore: WARNING:  database with OID 0 must be vacuumed within 102
transactions
HINT:  To avoid a database shutdown, execute a database-wide VACUUM in that
database.
You might also need to commit or roll back old prepared transactions.
pg_restore: executing BLOB 1740737402
pg_restore: WARNING:  database with OID 0 must be vacuumed within 101
transactions
HINT:  To avoid a database shutdown, execute a database-wide VACUUM in that
database.
You might also need to commit or roll back old prepared transactions.
pg_restore: [archiver (db)] Error while PROCESSING TOC:
pg_restore: [archiver (db)] Error from TOC entry 9759463; 2613 1740737402
BLOB 1740737402 bof_user
pg_restore: [archiver (db)] could not execute query: ERROR:  database is
not accepting commands to avoid wraparound data loss in database with OID 0
HINT:  Stop the postmaster and vacuum that database in single-user mode.
You might also need to commit or roll back old prepared transactions.
Command was: ALTER LARGE OBJECT 1740737402 OWNER TO bof_user;

Before that there are a lot of similar messages - the only things changing
are the "executing BLOB nnn" number and "must be vacuumed within nnn
transactions" number.

As for the prepared transactions - no, I don't have them, our application
doesn't use this functionality.

2018-06-11 0:34 GMT+03:00 Adrian Klaver :

> On 06/10/2018 02:09 PM, Alexander Shutyaev wrote:
>
>> Some more notes on databases.
>>
>> Although the pg_upgrade failed, I've decided to check the databases in
>> the new cluster (10.4). There is no database with oid 0 either. Also to be
>> noted that some system databases changed the oids while others retained
>> them.
>>
>
> If I am following the source code for pg_upgrade correctly that is
> expected. Pretty sure because the order of object creation is different.
>
>
>> And one of my databases - sslentry - had a very big oid (doesn't that
>> seem strange?) and its oid has changed.
>>
>
> OID's are added at time of object creation, so I would say the sslentry
> database was created some time after the other databases in the 9.6
> cluster. Actually it's probably more accurate to say after 1016305714 -
> 16400 (the bof db) objects that have OID's were created.
>
> When the upgrade failed pg_upgrade should have pointed you at an error log.
>
> Did it and is there anything useful there?
>
> From your OP post:
>
> "You might also need to commit or roll back old prepared transactions."
>
> Do you have any of those in the 9.6 cluster?
>
> See:
>
> https://www.postgresql.org/docs/10/static/view-pg-prepared-xacts.html
>
>
>
>> select oid, datname from pg_database;
>>oid  |  datname
>> ---+---
>>   13011 | template0
>>   16400 | bof
>>   13012 | postgres
>>   16401 | sslentry
>>   1 | template1
>> (5 rows)
>>
>>
>>
>
>
> --
> Adrian Klaver
> adrian.kla...@aklaver.com
>


Re: pg_upgrade and wraparound

2018-06-10 Thread Adrian Klaver

On 06/10/2018 02:09 PM, Alexander Shutyaev wrote:

Some more notes on databases.

Although the pg_upgrade failed, I've decided to check the databases in 
the new cluster (10.4). There is no database with oid 0 either. Also to 
be noted that some system databases changed the oids while others 
retained them.


If I am following the source code for pg_upgrade correctly that is 
expected. Pretty sure because the order of object creation is different.




And one of my databases - sslentry - had a very big oid (doesn't that 
seem strange?) and its oid has changed.


OID's are added at time of object creation, so I would say the sslentry 
database was created some time after the other databases in the 9.6 
cluster. Actually it's probably more accurate to say after 1016305714 - 
16400 (the bof db) objects that have OID's were created.


When the upgrade failed pg_upgrade should have pointed you at an error log.

Did it and is there anything useful there?

From your OP post:

"You might also need to commit or roll back old prepared transactions."

Do you have any of those in the 9.6 cluster?

See:

https://www.postgresql.org/docs/10/static/view-pg-prepared-xacts.html



select oid, datname from pg_database;
  oid  |  datname
-------+-----------
 13011 | template0
 16400 | bof
 13012 | postgres
 16401 | sslentry
     1 | template1
(5 rows)






--
Adrian Klaver
adrian.kla...@aklaver.com



Re: pg_upgrade and wraparound

2018-06-10 Thread Alexander Shutyaev
Some more notes on databases.

Although the pg_upgrade failed, I've decided to check the databases in the
new cluster (10.4). There is no database with oid 0 either. Also to be
noted that some system databases changed the oids while others retained
them.

And one of my databases - sslentry - had a very big oid (doesn't that
seem strange?) and its oid has changed.

select oid, datname from pg_database;
  oid  |  datname
-------+-----------
 13011 | template0
 16400 | bof
 13012 | postgres
 16401 | sslentry
     1 | template1
(5 rows)


2018-06-11 0:03 GMT+03:00 Alexander Shutyaev :

> No database with oid 0 in 9.6 cluster:
>
> postgres=# select oid, datname from pg_database;
>     oid     |  datname
> ------------+-----------
>           1 | template1
>       12438 | template0
>       16400 | bof
>       12439 | postgres
>  1016305714 | sslentry
> (5 rows)
>
> >> 1) OS and version you are using?
>
> Ubuntu 16.04.4
>
> >> 2) Where are you getting your Postgres from?:
> >>  a) Package
> >>  Repo?
> >>  b) Source
> >>  File source?
>
> I've installed it from postgresql repo -  http://apt.postgresql.org/
> pub/repos/apt/
>
> >> 3) Were there any issues with the 9.6 cluster before you tried to upgrade
> from it?
>
> No, I don't remember anything specific about it.
>
> 2018-06-10 23:55 GMT+03:00 Adrian Klaver :
>
>> On 06/10/2018 01:46 PM, Alexander Shutyaev wrote:
>>
>>> I can provide you with more info. Please tell me what you need. I really
>>> hope
>>> someone here can help me somehow solve or work around this, because I
>>> really
>>> need to migrate to v10 for its features.
>>>
>>
>> Well as Tom Lane said a database will not have an OID of 0, so we need to
>> know where that is coming from. On the off chance it occurred in your 9.6
>> cluster can you see if:
>>
>> select oid, datname from pg_database;
>>
>> in the 9.6 cluster shows an OID of 0.
>>
>> Also:
>>
>> 1) OS and version you are using?
>>
>> 2) Where are you getting your Postgres from?:
>> a) Package
>> Repo?
>> b) Source
>> File source?
>>
>> 3) Were there any issues with the 9.6 cluster before you tried to upgrade
>> from it?
>>
>>
>>
>>> Thanks in advance,
>>> Alexander
>>>
>>>
>>>
>>>
>>>
>>>
>>
>> --
>> Adrian Klaver
>> adrian.kla...@aklaver.com
>>
>
>


Re: pg_upgrade and wraparound

2018-06-10 Thread Alexander Shutyaev
No database with oid 0 in 9.6 cluster:

postgres=# select oid, datname from pg_database;
    oid     |  datname
------------+-----------
          1 | template1
      12438 | template0
      16400 | bof
      12439 | postgres
 1016305714 | sslentry
(5 rows)

>> 1) OS and version you are using?

Ubuntu 16.04.4

>> 2) Where are you getting your Postgres from?:
>>  a) Package
>>  Repo?
>>  b) Source
>>  File source?

I've installed it from postgresql repo -
http://apt.postgresql.org/pub/repos/apt/

>> 3) Were there any issues with the 9.6 cluster before you tried to upgrade
from it?

No, I don't remember anything specific about it.

2018-06-10 23:55 GMT+03:00 Adrian Klaver :

> On 06/10/2018 01:46 PM, Alexander Shutyaev wrote:
>
>> I can provide you with more info. Please tell me what you need. I really
>> hope
>> someone here can help me somehow solve or work around this, because I
>> really
>> need to migrate to v10 for its features.
>>
>
> Well as Tom Lane said a database will not have an OID of 0, so we need to
> know where that is coming from. On the off chance it occurred in your 9.6
> cluster can you see if:
>
> select oid, datname from pg_database;
>
> in the 9.6 cluster shows an OID of 0.
>
> Also:
>
> 1) OS and version you are using?
>
> 2) Where are you getting your Postgres from?:
> a) Package
> Repo?
> b) Source
> File source?
>
> 3) Were there any issues with the 9.6 cluster before you tried to upgrade
> from it?
>
>
>
>> Thanks in advance,
>> Alexander
>>
>>
>>
>>
>>
>>
>
> --
> Adrian Klaver
> adrian.kla...@aklaver.com
>


Re: pg_upgrade and wraparound

2018-06-10 Thread Alexander Shutyaev
 I can provide you with more info. Please tell me what you need. I really
hope someone here can help me somehow solve or work around this, because I
really need to migrate to v10 for its features.

Thanks in advance,
Alexander

2018-06-09 22:52 GMT+03:00 Tom Lane :

> Adrian Klaver  writes:
> > On 06/09/2018 03:46 AM, Alexander Shutyaev wrote:
> >> The upgrade operation failed after several hours with the following
> error:
> >> database is not accepting commands to avoid wraparound data loss in
> >> database with OID 0
>
> > Do you know which database has an OID of 0?
>
> Well, none do, so the correct question is what is passing an invalid
> database OID to the code that's complaining.  This sure looks like a
> bug, though I'm not sure we have enough info to locate it.
>
> regards, tom lane
>


Re: pg_upgrade and wraparound

2018-06-10 Thread Alexander Shutyaev
I can provide you with more info. Please tell me what you need. I really hope
someone here can help me somehow solve or work around this, because I really
need to migrate to v10 for its features.

Thanks in advance,
Alexander






Re: pg_upgrade and wraparound

2018-06-09 Thread Tom Lane
Adrian Klaver  writes:
> On 06/09/2018 03:46 AM, Alexander Shutyaev wrote:
>> The upgrade operation failed after several hours with the following error:
>> database is not accepting commands to avoid wraparound data loss in 
>> database with OID 0

> Do you know which database has an OID of 0?

Well, none do, so the correct question is what is passing an invalid
database OID to the code that's complaining.  This sure looks like a
bug, though I'm not sure we have enough info to locate it.

regards, tom lane



Re: pg_upgrade and wraparound

2018-06-09 Thread Adrian Klaver

On 06/09/2018 03:46 AM, Alexander Shutyaev wrote:

Hello!

I've been trying to upgrade a postgresql cluster from 9.6 to 10. I've 
executed the pg_upgrade with the following options:


  /usr/lib/postgresql/10/bin/pg_upgrade -b /usr/lib/postgresql/9.6/bin/ 
-B /usr/lib/postgresql/10/bin/ -d /var/lib/postgresql/9.6/main -D 
/var/lib/postgresql/10/main -o ' -c 
config_file=/etc/postgresql/9.6/main/postgresql.conf' -O ' -c 
config_file=/etc/postgresql/10/main/postgresql.conf'


The upgrade operation failed after several hours with the following error:

database is not accepting commands to avoid wraparound data loss in 
database with OID 0


Do you know which database has an OID of 0?



Earlier in the log there are a lot of messages like

pg_restore: executing BLOB 1740736966
pg_restore: WARNING:  database with OID 0 must be vacuumed within 
1000279 transactions
HINT:  To avoid a database shutdown, execute a database-wide VACUUM in 
that database.

You might also need to commit or roll back old prepared transactions.
pg_restore: WARNING:  database with OID 0 must be vacuumed within 
1000278 transactions
HINT:  To avoid a database shutdown, execute a database-wide VACUUM in 
that database.

You might also need to commit or roll back old prepared transactions.

I've tried to do VACUUM FULL on my 9.6 cluster on all databases and then 
retried the pg_upgrade - it failed in the same way.


Also to be noted, earlier this cluster was successfully upgraded with 
pg_upgrade using similar parameters from older versions (at least 2 
times, something like 9.1 -> 9.3, 9.3 -> 9.6). The database is around 
700 GB and has very many large objects in it.


What could be the reason of this and how can I perform my upgrade?

Thanks in advance,
Alexander



--
Adrian Klaver
adrian.kla...@aklaver.com



pg_upgrade and wraparound

2018-06-09 Thread Alexander Shutyaev
Hello!

I've been trying to upgrade a postgresql cluster from 9.6 to 10. I've
executed the pg_upgrade with the following options:

 /usr/lib/postgresql/10/bin/pg_upgrade -b /usr/lib/postgresql/9.6/bin/ -B
/usr/lib/postgresql/10/bin/ -d /var/lib/postgresql/9.6/main -D
/var/lib/postgresql/10/main -o ' -c
config_file=/etc/postgresql/9.6/main/postgresql.conf' -O ' -c
config_file=/etc/postgresql/10/main/postgresql.conf'

The upgrade operation failed after several hours with the following error:

database is not accepting commands to avoid wraparound data loss in
database with OID 0

Earlier in the log there are a lot of messages like

pg_restore: executing BLOB 1740736966
pg_restore: WARNING:  database with OID 0 must be vacuumed within 1000279
transactions
HINT:  To avoid a database shutdown, execute a database-wide VACUUM in that
database.
You might also need to commit or roll back old prepared transactions.
pg_restore: WARNING:  database with OID 0 must be vacuumed within 1000278
transactions
HINT:  To avoid a database shutdown, execute a database-wide VACUUM in that
database.
You might also need to commit or roll back old prepared transactions.

I've tried to do VACUUM FULL on my 9.6 cluster on all databases and then
retried the pg_upgrade - it failed in the same way.

Also to be noted, earlier this cluster was successfully upgraded with
pg_upgrade using similar parameters from older versions (at least 2 times,
something like 9.1 -> 9.3, 9.3 -> 9.6). The database is around 700 GB and
has very many large objects in it.

What could be the reason of this and how can I perform my upgrade?

Thanks in advance,
Alexander