On 5/5/25 14:26, Mladen Marinović wrote:
Hi,

Mystery not solved...but identified. The pool is in transaction mode and some connections use set enable_mergejoin=off, but they do not set it back to on. Upon getting the connection from the pool the parameter is still set to off causing the planner to not use this kind of join which results in different plans when using this tainted pgbouncer connection instead of the clean one from pg17.

The problem is that server_reset_query is not used when the pool is in transaction mode. Now, we have to see how to fix this problem.

But you've got this : https://www.pgbouncer.org/config.html

"


     server_reset_query_always

Whether |server_reset_query| should be run in all pooling modes. When this setting is off (default), the |server_reset_query| will be run only in pools that are in sessions-pooling mode. Connections in transaction-pooling mode should not have any need for a reset query.

This setting is for working around broken setups that run applications that use session features over a transaction-pooled PgBouncer. It changes non-deterministic breakage to deterministic breakage: Clients always lose their state after each transaction.

"



Regards,
Mladen Marinović

On Mon, May 5, 2025 at 3:10 PM Efrain J. Berdecia <ejberde...@yahoo.com> wrote:

    Is the query using parameter markers? Is the source executing the
    query forcing a "bad" data type casting?

    Yahoo Mail: Search, Organize, Conquer
    
<https://mail.onelink.me/107872968?pid=nativeplacement&c=US_Acquisition_YMktg_315_SearchOrgConquer_EmailSignature&af_sub1=Acquisition&af_sub2=US_YMktg&af_sub3=&af_sub4=100002039&af_sub5=C01_Email_Static_&af_ios_store_cpp=0c38e4b0-a27e-40f9-a211-f4e2de32ab91&af_android_url=https://play.google.com/store/apps/details?id=com.yahoo.mobile.client.android.mail&listing=search_organize_conquer>

        On Mon, May 5, 2025 at 8:52 AM, Mladen Marinović
        <ma...@kset.org> wrote:


        On Mon, May 5, 2025 at 2:38 PM SERHAD ERDEM
        <serh...@hotmail.com> wrote:

            Hi  , you had better try  vacuum  analyze for the whole db
            ,   pgbouncer  connection layer can not causeslow queries.


        I did that already. But the slow query is the consequence of
        the different plan, not the statistics.

            
------------------------------------------------------------------------
            *From:* Mladen Marinović <ma...@kset.org>
            *Sent:* Monday, May 5, 2025 12:27 PM
            *To:* Achilleas Mantzios <a.mantz...@cloud.gatewaynet.com>
            *Cc:* pgsql-general@lists.postgresql.org
            <pgsql-general@lists.postgresql.org>
            *Subject:* Re: Different execution plans in PG17 and
            pgBouncer...


            On Mon, May 5, 2025 at 12:07 PM Achilleas Mantzios
            <a.mantz...@cloud.gatewaynet.com> wrote:


                On 5/5/25 11:00, Mladen Marinović wrote:



                    On Mon, May 5, 2025 at 11:24 AM Achilleas Mantzios
                    <a.mantz...@cloud.gatewaynet.com> wrote:


                        On 5/5/25 09:52, Mladen Marinović wrote:

                            Hi,

                            We recently migrated our production
                            instances from PG11 to PG17. While doing
                            so we upgraded our pgBouncer instances
                            from 1.12 to 1.24. As everything worked on
                            the test servers we pushed this to
                            production a few weeks ago. We did not
                            notice any problems until a few days ago
                            (but the problems were here from the
                            start). The main manifestation of the
                            problems is a service that runs a fixed
                            query to get a backlog of unprocessed data
                            (limited to a 1000 rows). When testing the
                            query using pgAdmin connected directly to
                            the database we get a result in cca. 20
                            seconds. The same query runs for 2 hours
                            when using pgBouncer to connect to the
                            same database.


                        That's a huge jump, I hope you guys did
                        extensive testing of your app. In which
                        language is your app written? If java, then
                        define prepareThreshold=0 in your jdbc and set
                        max_prepared_statements = 0 in pgbouncer.

                    Mainly python, but the problem was noticed in a
                    java service.
                    Prepare treshold was already set to 0. We changed
                    the max_prepared_statements to 0 from the default
                    (200) but no change was noticed.

                        How about search paths ? any difference on
                        those between the two runs ? Do you set
                        search_path in pgbouncer ? what is "cca." btw ?


                            The more interesting part is that when we
                            issue an explain of the same query we get
                            different plans. We did this a few seconds
                            apart so there should be no difference in
                            collected statistics. We ruled out
                            prepared statements, as we suspected the
                            generic plan might be the problem, but it
                            is not. Is there any pgBouncer or PG17
                            parameter that might be the cause of this?


                        Does this spawn any connections (such as
                        dblink) ? are there limits per user/db
                        pool_size in pgbouncer ?

                    No additional connection nor dbling. Just plain
                    SQL (CTE, SELECT, INSERT, UPDATE, DELETE,...)
                    There are limits, but they are not hit. The query
                    just uses a different plan and runs slower because
                    of that.

                        Pgbouncer, in contrast to its old friend
                        PgPool-II is completely passive, just passes
                        through SQL to the server as fast as possible
                        as it can. But I am sure you know that. Good
                        luck, keep us posted!

                    Yes, that is what puzzles me.

                What is the pgbouncer's timeout in the server
                connections ?

                How about "idle in transaction" ? do you get any of
                those? What's the isolation level ?

                How about the user ? is this the same user doing
                pgadmin queries VS via the app ?

                Can you identify the user under which the problem is
                manifested and :

                ALTER user "unlucky_user" SET log_statement = 'all';

                ALTER user "unlucky_user" SET
                log_min_duration_statement = 0; -- to help you debug
                the prepared statements .. just in case , and other
                stuff not printed by log_statement = all.

            None of those parameters should affect the fact that when
            issuing the explain select query (the statement is not
            prepared) from psql directly gives a different result than
            issuing it over the pgbouncer connection. The result is
            repeatable.

            We have rolled back pgbouncer to 1.12. and it seems the
            problem persists. This is one of the weirdest things I
            have ever seen with PostgreSQL.

                    Regards,
                    Mladen Marinović

Reply via email to