Hello I have this situation now, that I can't wrap my head around and I'm hoping someone can shed some light onto what is going on, before I spend hours trying to reproduce this by taking the code apart into a test case.
A few facts: - I've got a piece of code split into several classes. - It initializes a dynamic number of threads. For the sake of this topic, I'll stick to just 2 threads to illustrate. - Each thread creates its own database connection (psycopg2.connect()) when it's started - Each thread, for each task it does, prints out its own name and the PID of the database session. Running the code, I see this: Threadcount 2 Starting thread Thread-1 Starting thread Thread-2 Thread-1 CONNECT Thread-2 CONNECT Thread-2 Run plan <<'Create basic order'>> Thread-2 Processing query block <'Query block 1'> Thread-2 File 'select_random_product.sql' Thread-1 Run plan <<'Create basic order'>> Thread-1 Processing query block <'Query block 1'> Thread-1 File 'select_random_product.sql' Thread-2 PID 31909 Thread-2 PID 31908 Thread-2 File 'select_random_customer.sql' Thread-2 PID 31908 Thread-2 File 'create_simple_order.sql' Thread-1 File 'select_random_customer.sql' Thread-2 PID 31908 Thread-2 File 'modify_simple_order.sql' Thread-2 PID 31908 Thread-1 File 'create_simple_order.sql' Thread-2 PID 31908 Thread-2 PID 31908 Thread-1 File 'modify_simple_order.sql' Thread-2 PID 31908 All threads exited. In the database log, with full logging enabled, I see this: 2022-10-17 10:59:03 UTC:10.0.1.17(36350):[unknown]@[unknown]:[31908]:LOG: connection received: host=10.0.1.17 port=36350 2022-10-17 10:59:03 UTC:10.0.1.17(36356):[unknown]@[unknown]:[31909]:LOG: connection received: host=10.0.1.17 port=36356 2022-10-17 10:59:03 UTC:10.0.1.17(36356):postgres@postgres:[31909]:LOG: connection authorized: user=postgres database=postgres SSL enabled (protocol=TLSv1.2, cipher=ECDHE-RSA-AES256-GCM-SHA384, bits=256, compression=off) 2022-10-17 10:59:03 UTC:10.0.1.17(36350):postgres@postgres:[31908]:LOG: connection authorized: user=postgres database=postgres SSL enabled (protocol=TLSv1.2, cipher=ECDHE-RSA-AES256-GCM-SHA384, bits=256, compression=off) 2022-10-17 10:59:03 UTC:10.0.1.17(36356):postgres@postgres:[31909]:LOG: execute <unnamed>: BEGIN 2022-10-17 10:59:03 UTC:10.0.1.17(36350):postgres@postgres:[31908]:LOG: execute <unnamed>: BEGIN 2022-10-17 10:59:03 UTC:10.0.1.17(36350):postgres@postgres:[31908]:LOG: duration: 0.065 ms 2022-10-17 10:59:03 UTC:10.0.1.17(36356):postgres@postgres:[31909]:LOG: statement: select pg_backend_pid(); 2022-10-17 10:59:03 UTC:10.0.1.17(36356):postgres@postgres:[31909]:LOG: duration: 0.246 ms 2022-10-17 10:59:03 UTC:10.0.1.17(36350):postgres@postgres:[31908]:LOG: statement: select pg_backend_pid(); 2022-10-17 10:59:03 UTC:10.0.1.17(36350):postgres@postgres:[31908]:LOG: duration: 0.237 ms 2022-10-17 10:59:03 UTC:10.0.1.17(36356):postgres@postgres:[31909]:LOG: disconnection: session time: 0:00:00.021 user=postgres database=postgres host=10.0.1.17 port=36356 2022-10-17 10:59:03 UTC:10.0.1.17(36350):postgres@postgres:[31908]:LOG: statement: .... many more entries from 31908 Perhaps you can already see the problem. One thread randomly disconnects. We can see this in several places. 1) In the script's output, we can see that Thread-2 first reports PID 31909 and then switches to 31908. 31909 goes silent. and 2) The database log shows the same thing: - 2 sessions connect - 2 sessions get established - shortly after, one disconnects, and the other keeps connected and executes the entire workload of both threads. In order. If I start with 3 threads, I see basically the same thing. Two will disconnect right after connecting, and the remaining one will then run all 3 tasks. Any ideas why this is happening? Thank you!