On Mon, May 5, 2025 at 2:36 PM Achilleas Mantzios <a.mantz...@cloud.gatewaynet.com> wrote:
> On 5/5/25 13:27, Mladen Marinović wrote:
>
>> On Mon, May 5, 2025 at 12:07 PM Achilleas Mantzios <a.mantz...@cloud.gatewaynet.com> wrote:
>>
>> On 5/5/25 11:00, Mladen Marinović wrote:
>>
>> On Mon, May 5, 2025 at 11:24 AM Achilleas Mantzios <a.mantz...@cloud.gatewaynet.com> wrote:
>>
>>> On 5/5/25 09:52, Mladen Marinović wrote:
>>>
>>> Hi,
>>>
>>> We recently migrated our production instances from PG11 to PG17. While doing so we upgraded our pgBouncer instances from 1.12 to 1.24. As everything worked on the test servers we pushed this to production a few weeks ago. We did not notice any problems until a few days ago (but the problems were there from the start). The main manifestation of the problem is a service that runs a fixed query to get a backlog of unprocessed data (limited to 1000 rows). When testing the query using pgAdmin connected directly to the database we get a result in cca. 20 seconds. The same query runs for 2 hours when using pgBouncer to connect to the same database.
>>>
>>> That's a huge jump, I hope you guys did extensive testing of your app. In which language is your app written? If Java, then define prepareThreshold=0 in your JDBC and set max_prepared_statements = 0 in pgbouncer.
>>
>> Mainly Python, but the problem was noticed in a Java service. The prepare threshold was already set to 0. We changed max_prepared_statements to 0 from the default (200) but no change was noticed.
>>
>>> How about search paths? Any difference on those between the two runs? Do you set search_path in pgbouncer? What is "cca." btw?
>>>
>>> The more interesting part is that when we issue an EXPLAIN of the same query we get different plans. We did this a few seconds apart, so there should be no difference in collected statistics. We ruled out prepared statements, as we suspected the generic plan might be the problem, but it is not. Is there any pgBouncer or PG17 parameter that might be the cause of this?
>>>
>>> Does this spawn any connections (such as dblink)? Are there limits per user/db pool_size in pgbouncer?
>>
>> No additional connections nor dblink. Just plain SQL (CTE, SELECT, INSERT, UPDATE, DELETE, ...). There are limits, but they are not hit. The query just uses a different plan and runs slower because of that.
>>
>>> PgBouncer, in contrast to its old friend PgPool-II, is completely passive; it just passes SQL through to the server as fast as it can. But I am sure you know that. Good luck, keep us posted!
>>
>> Yes, that is what puzzles me.
>>
>> What is the pgbouncer's timeout on the server connections?
>>
>> How about "idle in transaction"? Do you get any of those? What's the isolation level?
>>
>> How about the user? Is this the same user doing the pgAdmin queries vs via the app?
>>
>> Can you identify the user under which the problem is manifested and:
>>
>> ALTER user "unlucky_user" SET log_statement = 'all';
>>
>> ALTER user "unlucky_user" SET log_min_duration_statement = 0; -- to help you debug the prepared statements, just in case, and other stuff not printed by log_statement = 'all'.
>
> None of those parameters should affect the fact that issuing the EXPLAIN of the SELECT query (the statement is not prepared) from psql directly gives a different result than issuing it over the pgBouncer connection. The result is repeatable.
>
> We have rolled back pgBouncer to 1.12 and it seems the problem persists.
> This is one of the weirdest things I have ever seen with PostgreSQL.

> OK, this is something, at least one more extreme thought ruled out. How about search_path? Is this the SAME user that is issuing the statements in pgAdmin vs pgbouncer?

The user is the same, the search path is the same:

show search_path ;
   search_path
-----------------
 "$user", public
(1 row)

There is only one schema (public), so there is no possibility that one connection uses different tables. If I remember correctly, the planner uses the PostgreSQL parameters (from the conf file, visible in pg_settings) and table/row statistics to choose a plan. It is unclear to me how those numbers can be different when using pgbouncer (I have tried selecting the configuration using both connections and it is the same).

> Is there a connect_query inside pgbouncer's conf?

The connect_query parameter is not used.

> You have to show all the configuration involved, and also full logging on the backend for said user.

I can provide redacted EXPLAINs if it would help. Full logging is not feasible.

>> Regards,
>> Mladen Marinović
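[Editor's note] For reference, the prepared-statement settings suggested earlier in the thread (prepareThreshold=0 on the JDBC side, max_prepared_statements = 0 in pgbouncer) would look roughly as follows. This is only a sketch; the host, port, database and pool names are placeholders, not values taken from the thread.

    ; pgbouncer.ini -- hypothetical excerpt, names are placeholders
    [databases]
    mydb = host=127.0.0.1 port=5432 dbname=mydb

    [pgbouncer]
    ; 0 disables pgbouncer's tracking of protocol-level prepared statements
    max_prepared_statements = 0

On the application side, the JDBC connection URL would carry the driver parameter (placeholder host and database):

    jdbc:postgresql://pgbouncer-host:6432/mydb?prepareThreshold=0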
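[Editor's note] On the point that the planner only looks at configuration and statistics: one way to verify that both paths really see the same configuration is to run the same check once over the direct connection and once through pgbouncer and diff the output. A minimal sketch of such a check (not something posted in the thread):

    -- list every setting that does not come from the built-in default,
    -- including per-database / per-role overrides picked up at connect time
    SELECT name, setting, source
    FROM pg_settings
    WHERE source <> 'default'
    ORDER BY name;

On PG12 and later, EXPLAIN (SETTINGS) on the problem query would also print any non-default planner-related settings that influenced the plan, which makes a per-connection difference visible directly in the plan output.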