Performance problems with Postgres JDBC 42.4.2
Dear all, I'm running a query from Java on a postgres database: Java version: 17 JDBC version: 42.4.2 Postgres version: 13.1 In parallel I'm testing the same queries from pgAdmin 4 version 6.13 The tables I'm using contains more than 10million rows each and I have two questions here: 1. I need to extract the path of a file without the file itself. For this I use two alternatives as I found that sentence "A" is much faster than the "B" one: "A" sentence: SELECT DISTINCT ( LEFT(opf.file_path, length(opf.file_path) - position('/' in reverse(opf.file_path))) ) AS path FROM product AS op JOIN product_file AS opf ON opf.product_id = op.id WHERE op.proprietary_end_date <= CURRENT_DATE AND op.id LIKE 'urn:esa:psa:%' "B" sentence: SELECT DISTINCT ( regexp_replace(opf.file_path, '(.*)\/(.*)$', '\1') ) AS path FROM product AS op JOIN product_file AS opf ON opf.product_id = op.id WHERE op.proprietary_end_date <= CURRENT_DATE AND op.id LIKE 'urn:esa:psa:%' 2. Running sentence "A" on the pgAdmin client takes 4-5 minutes to finish but running it from a Java program it never ends. This is still the case when I limit the output to the first 100 rows so I assume this is not a problem with the amount of data being transferred but the way postgres resolve the query. To make it work in Java I had to define a postgres function that I call from the Java code instead of running the query directly. I had a similar problem in the past with a query that performed very poorly from a Java client while it was fine from pgAdmin or a python script. In that case it was a matter of column types not compatible with the JDBC (citext) deriving in an implicit cast that prevented the postgres engine from using a given index or to cast all the values of that column before using it, not sure now. But I don't think this is not the case here. Could anyone help me again? Many thanks in advance Jose
Re: [EXTERNAL] Performance down with JDBC 42
Very good point from Danny: generic and custom plans. One thing that is almost certainly not at play here, and is mentioned: there are some specific cases where the planner does not optimise for the query in total to be executed as fast/cheap as possible, but for the first few rows. One reason for that to happen is if a query is used as a cursor. (Warning: shameless promotion) I did a writeup on JDBC clientside/serverside prepared statements and custom and generic plans: https://dev.to/yugabyte/postgres-query-execution-jdbc-prepared-statements-51e2 The next obvious question then is if something material did change with JDBC for your old and new JDBC versions, I do believe the prepareThreshold did not change. Frits Hoogland > On 5 Nov 2023, at 20:47, David Rowley wrote: > > On Mon, 6 Nov 2023 at 08:37, Abraham, Danny wrote: >> >> Both plans refer to the same DB. > > JDBC is making use of PREPARE statements, whereas psql, unless you're > using PREPARE is not. > >> #1 – Fast – using psql or old JDBC driver > > The absence of any $1 type parameters here shows that's a custom plan > that's planned specifically using the parameter values given. > >> Slow – when using JDBC 42 > > Because this query has $1, $2, etc, that's a generic plan. When > looking up statistics histogram bounds and MCV slots cannot be > checked. Only ndistinct is used. If you have a skewed dataset, then > this might not be very good. > > You might find things run better if you adjust postgresql.conf and set > plan_cache_mode = force_custom_plan then select pg_reload_conf(); > > Please also check the documentation so that you understand the full > implications for that. > > David > >