Re: Huge performance penalty with parallel queries in Windows x64 v. Linux x64

2021-05-03 Thread Thomas Munro
On Tue, May 4, 2021 at 4:05 AM Hans Buschmann  wrote:
> The main difference is the time shown for the Gather Merge step (65 ms vs. 7 
> ms)

No Windows here, but could it be super slow at launching workers?  How
does a trivial parallel query compare, something like?

SET force_parallel_mode = on;
EXPLAIN ANALYZE SELECT 42;




Huge performance penalty with parallel queries in Windows x64 v. Linux x64

2021-05-03 Thread Hans Buschmann

When developing a solution for a new customer request I created a new query 
over the production data.

Despite the relatively low row counts of the involved tables (all < 100k) I 
noticed quite a long execution time of about 85 ms to 100 ms.

The explain anaylze plan showed a parallel execution plan with 2 parallels.

The data structure and index structure was not quite optimal for this kind of 
query (which does not matter in this case).

The comparison of the explain analyze plans on win-x64 and Linux x64 showed 
about 3 times longer execution on windows.

For comparison I reinstalled the production data on two test databases on 
different virtual machines on the same hardware (the very same machine with 
Hyper-V virtualization).

The steps were only (on a mostly complete idle machine):
1. create test database
2. pg_restore of the production data from same dump file
3. analyze on the database
4. run the query multiple times (about 5 times) and took the fastest explain 
analyze.

On fedora 34 64 bit, PG 13.2 unmodified self compiled the query took about 33 
ms.
On Windows Server 2019 64 bit, PG 13.2 from EDB download packages the query 
took about 85 ms.

 version
--
 PostgreSQL 13.2 on x86_64-pc-linux-gnu, compiled by gcc (GCC) 11.1.1 20210428 
(Red Hat 11.1.1-1), 64-bit

  version

 PostgreSQL 13.2, compiled by Visual C++ build 1914, 64-bit

The corresponding explain plans are available at explain.depesz.com

-- fedora
https://explain.depesz.com/s/Mq3P

-- windows
https://explain.depesz.com/s/VLtZ

The main difference is the time shown for the Gather Merge step (65 ms vs. 7 ms)

The explain plans and the non-standard GUCs are included in the attachments, 
the configuration for the databases seems quite comparable.

Unfortunately I cannot disclose the query and the table data.

My experience with parallel queries is not very wide, but this massive 
execution time difference of the exact same query on the exact same data on the 
exact same hardware with the same, unmodified last stable Postgres version is 
very astonishing.

BTW I generally observed slower execution under Windows, so production has 
moved now to Linux.

There seem no relevant GUC differences concerning query execution, so the 
performance penalty of 300% to 900% (one step only) is not easily explainable.

The databases remain on the system to repeat the queries on request in the 
queue of further investigation.


Thanks for looking.

Hans Buschmann


 QUERY PLAN

 Append  (cost=8284.93..8437.72 rows=2022 width=198) (actual 
time=31.506..32.637 rows=34 loops=1)
   CTE qsum
 ->  Sort  (cost=8280.38..8284.93 rows=1822 width=180) (actual 
time=31.502..32.577 rows=22 loops=1)
   Sort Key: or_followup.of_season, orders.or_clis_sub_code
   Sort Method: quicksort  Memory: 28kB
   ->  Hash Left Join  (cost=7864.72..8181.70 rows=1822 width=180) 
(actual time=31.274..32.566 rows=22 loops=1)
 Hash Cond: (or_followup.of_season = seasons.id_sea)
 ->  Hash Left Join  (cost=7862.95..8165.94 rows=1822 
width=106) (actual time=31.252..32.523 rows=22 loops=1)
   Hash Cond: (orders.or_clis_sub_code = 
clients_sub.clis_sub_code)
   ->  Finalize GroupAggregate  (cost=7818.93..8098.89 
rows=1822 width=94) (actual time=30.987..32.251 rows=22 loops=1)
 Group Key: or_followup.of_season, 
orders.or_clis_sub_code
 ->  Gather Merge  (cost=7818.93..8028.30 rows=1518 
width=94) (actual time=30.978..32.207 rows=30 loops=1)
   Workers Planned: 2
   Workers Launched: 2
   ->  Partial GroupAggregate  
(cost=6818.91..6853.06 rows=759 width=94) (actual time=24.688..25.201 rows=10 
loops=3)
 Group Key: or_followup.of_season, 
orders.or_clis_sub_code
 ->  Sort  (cost=6818.91..6820.80 
rows=759 width=20) (actual time=24.601..24.652 rows=1334 loops=3)
   Sort Key: or_followup.of_season, 
orders.or_clis_sub_code
   Sort Method: quicksort  Memory: 
53kB
   Worker 0:  Sort Method: 
quicksort  Memory: 179kB
   Worker 1:  Sort Method: 
quicksort 

Re: pg_upgrade and wraparound

2021-05-03 Thread Jan Wieck

On 4/30/21 3:32 PM, Bruce Momjian wrote:

On Sat, Mar 13, 2021 at 08:43:54AM -0500, Jan Wieck wrote:

On 3/12/21 8:30 PM, Michael Paquier wrote:
> Hi Jan,
> 
> On Fri, Mar 12, 2021 at 06:13:33PM -0500, Jan Wieck wrote:

> > One of the things in my way is that when using pg_resetwal to put the
> > NextXID way into the future (to push the old cluster close to wraparound for
> > example), the postmaster won't start because it doesn't have the pg_xact
> > files for that around. Should pg_resetwal create the files in the gap
> > between the old NextXID and the new one?
> 
> I think that you should add this patch to the next commit fest to

> track it properly:
> https://commitfest.postgresql.org/33/
> --
> Michael
> 


Actually this is the wrong patch (this one is for PG-12, not for HEAD). Will
update later today.

But yes, putting it into the next commitfest after initial discussion is the
plan.


Uh, were either of these things done?



Not yet, but I will enter it so that we can get it into 15 for sure.


Regards, Jan

--
Jan Wieck
Postgres User since 1994




Re: "invalid contrecord" error on replica

2021-05-03 Thread Adrien Nayrat
Oh, I forgot to tell I was able to recover the secondary by replacing the 
000100AA00A0 from the archives into pg_wal. Then the secondary were 
able to finish recovery, start streaming replication and fetch subsequent wals.


I wondered why there was a CHECKPOINT_SHUTDOWN record. I dig a little more:

First, the filesystem on primary were full and I got :

PANIC: could not write to file "pg_wal/xlogtemp.305": No space left on device

The instance crashed and restarted in recovery mode.
At the end of the recovery I got:
checkpoint starting: end-of-recovery immediate
checkpoint complete: ...

Then a FATAL message:
FATAL: could not write to file "pg_wal/xlogtemp.9405": No space left on device

Followed by:
aborting startup due to process failure

Maybe it is this checkpoint which were not replicated? The primary had enough 
space for this record. But I don't understand how the secondary received records 
beginning by AA/A1...


I googled about this and I found other similar issues:
https://www.postgresql.org/message-id/flat/15938-8591df7e95064538%40postgresql.org


https://www.postgresql.org/message-id/CAMp7vw97871F21X7FHHdmU2FXGME4HTgMYxkAubMdCU2xevmxQ%40mail.gmail.com
https://www.postgresql.org/message-id/flat/E73F4CFB-E322-461E-B1EC-82FAA808FEE6%40lifetrenz.com

https://www.postgresql.org/message-id/15398-b4896eebf0bed218%40postgresql.org
https://www.postgresql.org/message-id/flat/15412-f9a89b026e6774d1%40postgresql.org

--
Adrien NAYRAT
https://blog.anayrat.info




Re: Access a newer Version of PGDB (v13) with an older libpq (v10 x86)

2021-05-03 Thread WR

Am 03.05.2021 um 10:41 schrieb Laurenz Albe:

On Sat, 2021-05-01 at 12:59 +0200, Wolfgang Rißler wrote:

Am 30.04.2021 um 16:16 schrieb Tom Lane:

I would recommend trying to use a reasonably late-vintage libpq; we do
fix bugs in it on a regular basis.
The common stumbling block for cross-version situations is that the
client makes assumptions about system catalog contents that are not
valid in some other server release.  libpq proper doesn't really touch
the catalogs, so it's mostly impervious to that problem; but you'll need
to test your applications.


Of course we'll do. One thing is, that we load and write bytea's. And as
I read, there have been some changes. All other Operations are less
problematic.


Version 9.0 changed the default "bytea" output format to hexadecimal.
If you want the old text output format, set "bytea_output" to "escape".

If you output "bytea"s in binary form, nothing has changed.

This is a useful hint. I did this already in my PG10 <-> libpq10(x86) 
testinstallation.


Thank you.



--
May the source be with you




Re: "invalid contrecord" error on replica

2021-05-03 Thread Adrien Nayrat

On 03/05/2021 10:43, Laurenz Albe wrote:

On Sun, 2021-05-02 at 22:43 +0200, Adrien Nayrat wrote:

LOG:  started streaming WAL from primary at AA/A100 on timeline 1
FATAL:  could not receive data from WAL stream : ERROR:  requested starting
point AA/A100 is ahead of the WAL flush position of this server AA/A0FFFBE8

You are trying to stream from a server whose WAL position is *behind* your
own.  That cannot work.

Yours,
Laurenz Albe


Yes, it was during primary's recovery as it crashed, it restarted from previous 
checkpoint.


--
Adrien NAYRAT





Re: "invalid contrecord" error on replica

2021-05-03 Thread Laurenz Albe
On Sun, 2021-05-02 at 22:43 +0200, Adrien Nayrat wrote:
> LOG:  started streaming WAL from primary at AA/A100 on timeline 1
> FATAL:  could not receive data from WAL stream : ERROR:  requested starting 
> point AA/A100 is ahead of the WAL flush position of this server 
> AA/A0FFFBE8

You are trying to stream from a server whose WAL position is *behind* your
own.  That cannot work.

Yours,
Laurenz Albe
-- 
Cybertec | https://www.cybertec-postgresql.com





Re: Access a newer Version of PGDB (v13) with an older libpq (v10 x86)

2021-05-03 Thread Laurenz Albe
On Sat, 2021-05-01 at 12:59 +0200, Wolfgang Rißler wrote:
> Am 30.04.2021 um 16:16 schrieb Tom Lane:
> > I would recommend trying to use a reasonably late-vintage libpq; we do
> > fix bugs in it on a regular basis.
> > The common stumbling block for cross-version situations is that the
> > client makes assumptions about system catalog contents that are not
> > valid in some other server release.  libpq proper doesn't really touch
> > the catalogs, so it's mostly impervious to that problem; but you'll need
> > to test your applications.
> 
> Of course we'll do. One thing is, that we load and write bytea's. And as 
> I read, there have been some changes. All other Operations are less 
> problematic.

Version 9.0 changed the default "bytea" output format to hexadecimal.
If you want the old text output format, set "bytea_output" to "escape".

If you output "bytea"s in binary form, nothing has changed.

Yours,
Laurenz Albe
-- 
Cybertec | https://www.cybertec-postgresql.com





Re: Access a newer Version of PGDB (v13) with an older libpq (v10 x86)

2021-05-03 Thread Wolfgang Rißler

Am 01.05.2021 um 18:26 schrieb Adrian Klaver:

On 5/1/21 3:59 AM, Wolfgang Rißler wrote:

This is my problem, I completely dont know, how to start compiling my 
own actual 32bit libpq on windows (and I would like to do it with VS 
2019).
For libpqxx there have been some hints how to do so in the past, and 
now there is a complete project, which deals with compiling it on 
windows with VS and CMake. But I didnt find such hints for libpq or 
the whole postgresDB.


Have you looked at below?:

https://www.postgresql.org/docs/current/install-windows.html



Stupid me, I've over overlooked this. So I will collect all the tools 
together and give it a try.


Thank you.


--

- may the source be with you -