Performance problems with Postgres JDBC 42.4.2

2023-11-06 Thread Jose Osinde
Dear all,

I'm running a query  from Java on a postgres database:

Java version: 17
JDBC version: 42.4.2
Postgres version: 13.1

In parallel I'm testing the same queries from pgAdmin 4 version 6.13

The tables I'm using contains more than 10million rows each and I have two
questions here:

1. I need to extract the path of a file without the file itself. For this I
use two alternatives as I found that sentence "A" is much faster than the
"B" one:

"A" sentence:

SELECT DISTINCT ( LEFT(opf.file_path, length(opf.file_path) - position('/'
in reverse(opf.file_path))) ) AS path
   FROM product AS op JOIN product_file AS opf ON
opf.product_id = op.id
   WHERE op.proprietary_end_date <= CURRENT_DATE
AND op.id LIKE 'urn:esa:psa:%'

"B" sentence:

SELECT DISTINCT ( regexp_replace(opf.file_path, '(.*)\/(.*)$', '\1') ) AS
path
   FROM product AS op JOIN product_file AS opf ON
opf.product_id = op.id
   WHERE op.proprietary_end_date <= CURRENT_DATE
AND op.id LIKE 'urn:esa:psa:%'

2. Running sentence "A" on the pgAdmin client takes 4-5 minutes to finish
but running it from a Java program it never ends. This is still the case
when I limit the output to the first 100 rows so I assume this is not a
problem with the amount of data being transferred but the way postgres
resolve the query. To make it work in Java I had to define a postgres
function that I call from the Java code instead of running the query
directly.

I had a similar problem in the past with a query that performed very poorly
from a Java client while it was fine from pgAdmin or a python script. In
that case it was a matter of column types not compatible with the JDBC (citext)
deriving in an implicit cast that prevented the postgres engine from using
a given index or to cast all the values of that column before using it, not
sure now. But I don't think this is not the case here.

Could anyone help me again?

Many thanks in advance
Jose


Re: [EXTERNAL] Performance down with JDBC 42

2023-11-06 Thread Frits Hoogland
Very good point from Danny: generic and custom plans.

One thing that is almost certainly not at play here, and is mentioned: there 
are some specific cases where the planner does not optimise for the query in 
total to be executed as fast/cheap as possible, but for the first few rows. One 
reason for that to happen is if a query is used as a cursor.

(Warning: shameless promotion) I did a writeup on JDBC clientside/serverside 
prepared statements and custom and generic plans: 
https://dev.to/yugabyte/postgres-query-execution-jdbc-prepared-statements-51e2
The next obvious question then is if something material did change with JDBC 
for your old and new JDBC versions, I do believe the prepareThreshold did not 
change.


Frits Hoogland




> On 5 Nov 2023, at 20:47, David Rowley  wrote:
> 
> On Mon, 6 Nov 2023 at 08:37, Abraham, Danny  wrote:
>> 
>> Both plans refer to the same DB.
> 
> JDBC is making use of PREPARE statements, whereas psql, unless you're
> using PREPARE is not.
> 
>> #1 – Fast – using psql or old JDBC driver
> 
> The absence of any $1 type parameters here shows that's a custom plan
> that's planned specifically using the parameter values given.
> 
>> Slow – when using JDBC 42
> 
> Because this query has $1, $2, etc, that's a generic plan. When
> looking up statistics histogram bounds and MCV slots cannot be
> checked. Only ndistinct is used. If you have a skewed dataset, then
> this might not be very good.
> 
> You might find things run better if you adjust postgresql.conf and set
> plan_cache_mode = force_custom_plan then select pg_reload_conf();
> 
> Please also check the documentation so that you understand the full
> implications for that.
> 
> David
> 
>