Performance problems with Postgres JDBC 42.4.2

2023-11-06 Thread Jose Osinde
Dear all,

I'm running a query  from Java on a postgres database:

Java version: 17
JDBC version: 42.4.2
Postgres version: 13.1

In parallel I'm testing the same queries from pgAdmin 4 version 6.13

The tables I'm using contains more than 10million rows each and I have two
questions here:

1. I need to extract the path of a file without the file itself. For this I
use two alternatives as I found that sentence "A" is much faster than the
"B" one:

"A" sentence:

SELECT DISTINCT ( LEFT(opf.file_path, length(opf.file_path) - position('/'
in reverse(opf.file_path))) ) AS path
   FROM product AS op JOIN product_file AS opf ON
opf.product_id = op.id
   WHERE op.proprietary_end_date <= CURRENT_DATE
AND op.id LIKE 'urn:esa:psa:%'

"B" sentence:

SELECT DISTINCT ( regexp_replace(opf.file_path, '(.*)\/(.*)$', '\1') ) AS
path
   FROM product AS op JOIN product_file AS opf ON
opf.product_id = op.id
   WHERE op.proprietary_end_date <= CURRENT_DATE
AND op.id LIKE 'urn:esa:psa:%'

2. Running sentence "A" on the pgAdmin client takes 4-5 minutes to finish
but running it from a Java program it never ends. This is still the case
when I limit the output to the first 100 rows so I assume this is not a
problem with the amount of data being transferred but the way postgres
resolve the query. To make it work in Java I had to define a postgres
function that I call from the Java code instead of running the query
directly.

I had a similar problem in the past with a query that performed very poorly
from a Java client while it was fine from pgAdmin or a python script. In
that case it was a matter of column types not compatible with the JDBC (citext)
deriving in an implicit cast that prevented the postgres engine from using
a given index or to cast all the values of that column before using it, not
sure now. But I don't think this is not the case here.

Could anyone help me again?

Many thanks in advance
Jose


Re: Select on partitioned table is very slow

2022-08-25 Thread Jose Osinde
You are right but the correct query is the one in the attached file. What
we want to do here is to force psql to send the "wrong data types" to
postgres and, as a result of this, get a bad plan.

Cheers,
Jose Osinde

On Thu, Aug 25, 2022 at 3:48 PM Ranier Vilela  wrote:

> Em qui., 25 de ago. de 2022 às 10:43, Jose Osinde 
> escreveu:
>
>>
>> Dear Depesz, Laurenz,
>>
>> Thanks very much for the fast responses. They are actually correct and
>> saved me a lot of time. I couldn't test the cast from the Java test but
>> this is something I can deal with later on (most probably updating the
>> column types to text in the database side instead). But what I could do was
>> reproduce the same problem in the psql console using the cast in the other
>> way. This sentence:
>>
>> explain analyze select logical_identifier, version_id, lastproduct
>> FROM test_product_ui_partition.product_ui pui
>> WHERE
>> pui.mission_id='urn:esa:psa:context:investigation:mission.em16'::citext
>> AND
>> pui.logical_identifier='urn:esa:psa:em16_tgo_frd:data_raw:frd_raw_sc_n_20220729t00-20220729t235959'::citext;
>>
> The query in explain.txt attached, it seems not the same.
>
> explain analyze select logical_identifier, version_id, lastproduct
>FROM test_product_ui_partition.product_ui pui
>   WHERE 
> pui.mission_id='urn:esa:psa:context:investigation:mission.em16'::text
>   AND 
> pui.logical_identifier='urn:esa:psa:em16_tgo_frd:data_raw:frd_raw_sc_n_20220729t00-20220729t235959'::text;
>
> ::text?
>
> regards,
>
> Ranier Vilela
>
>


Re: Select on partitioned table is very slow

2022-08-25 Thread Jose Osinde
Dear Depesz, Laurenz,

Thanks very much for the fast responses. They are actually correct and
saved me a lot of time. I couldn't test the cast from the Java test but
this is something I can deal with later on (most probably updating the
column types to text in the database side instead). But what I could do was
reproduce the same problem in the psql console using the cast in the other
way. This sentence:

explain analyze select logical_identifier, version_id, lastproduct
FROM test_product_ui_partition.product_ui pui
WHERE
pui.mission_id='urn:esa:psa:context:investigation:mission.em16'::citext
AND
pui.logical_identifier='urn:esa:psa:em16_tgo_frd:data_raw:frd_raw_sc_n_20220729t00-20220729t235959'::citext;

Creates an output equivalent to that returned from the JAVA application and
reproduces the exact same problems: Scans all the partitions instead of
select the right one and uses sec scans for all the cases.
Attached the result.

Again, many thanks for your help,
Jose Osinde
explain analyze select logical_identifier, version_id, lastproduct 
   FROM test_product_ui_partition.product_ui pui 
  WHERE 
pui.mission_id='urn:esa:psa:context:investigation:mission.em16'::text 
  AND 
pui.logical_identifier='urn:esa:psa:em16_tgo_frd:data_raw:frd_raw_sc_n_20220729t00-20220729t235959'::text;
  


 Gather  (cost=1000.00..5617399.10 rows=19 width=82) (actual 
time=10485.082..10497.268 rows=1 loops=1)
   Workers Planned: 2
   Workers Launched: 2
   ->  Parallel Append  (cost=0.00..5616397.20 rows=15 width=82) (actual 
time=9221.542..10479.134 rows=0 loops=3)
 ->  Parallel Seq Scan on product_ui_em16 pui_10  
(cost=0.00..2603849.81 rows=3 width=112) (actual time=3883.905..5141.463 rows=0 
loops=3)
   Filter: (((mission_id)::text = 
'urn:esa:psa:context:investigation:mission.em16'::text) AND 
((logical_identifier)::text = 
'urn:esa:psa:em16_tgo_frd:data_raw:frd_raw_sc_n_20220729t00-20220729t235959'::text))
   Rows Removed by Filter: 5337106
 ->  Parallel Seq Scan on product_ui_rosetta pui_6  
(cost=0.00..2382752.79 rows=1 width=57) (actual time=6213.704..6213.704 rows=0 
loops=2)
   Filter: (((mission_id)::text = 
'urn:esa:psa:context:investigation:mission.em16'::text) AND 
((logical_identifier)::text = 
'urn:esa:psa:em16_tgo_frd:data_raw:frd_raw_sc_n_20220729t00-20220729t235959'::text))
   Rows Removed by Filter: 5878988
 ->  Parallel Seq Scan on product_ui_mex pui_7  (cost=0.00..354434.31 
rows=1 width=56) (actual time=1417.182..1417.186 rows=0 loops=1)
   Filter: (((mission_id)::text = 
'urn:esa:psa:context:investigation:mission.em16'::text) AND 
((logical_identifier)::text = 
'urn:esa:psa:em16_tgo_frd:data_raw:frd_raw_sc_n_20220729t00-20220729t235959'::text))
   Rows Removed by Filter: 1085822
 ->  Parallel Seq Scan on product_ui_smart1 pui_8  
(cost=0.00..148741.56 rows=1 width=63) (actual time=1125.713..1125.713 rows=0 
loops=1)
   Filter: (((mission_id)::text = 
'urn:esa:psa:context:investigation:mission.em16'::text) AND 
((logical_identifier)::text = 
'urn:esa:psa:em16_tgo_frd:data_raw:frd_raw_sc_n_20220729t00-20220729t235959'::text))
   Rows Removed by Filter: 1187167
 ->  Parallel Seq Scan on product_ui_vex pui_13  (cost=0.00..112914.84 
rows=1 width=56) (actual time=919.386..919.386 rows=0 loops=1)
   Filter: (((mission_id)::text = 
'urn:esa:psa:context:investigation:mission.em16'::text) AND 
((logical_identifier)::text = 
'urn:esa:psa:em16_tgo_frd:data_raw:frd_raw_sc_n_20220729t00-20220729t235959'::text))
   Rows Removed by Filter: 1133382
 ->  Parallel Seq Scan on product_ui_bc pui_9  (cost=0.00..8890.52 
rows=1 width=83) (actual time=78.910..78.910 rows=0 loops=1)
   Filter: (((mission_id)::text = 
'urn:esa:psa:context:investigation:mission.em16'::text) AND 
((logical_identifier)::text = 
'urn:esa:psa:em16_tgo_frd:data_raw:frd_raw_sc_n_20220729t00-20220729t235959'::text))
   Rows Removed by Filter: 67763
 ->  Parallel Seq Scan on product_ui_ch1 pui_2  (cost=0.00..2224.51 
rows=1 width=59) (actual time=20.162..20.162 rows=0 loops=1)
   Filter: (((mission_id)::text = 
'urn:esa:psa:context:investigation:mission.em16'::text) AND 
((logical_identifier)::text = 
'urn:esa:psa:em16_tgo_frd:data_raw:frd_raw_sc_n_20220729t00-20220729t235959'::text))
   Rows Removed by Filter: 20345
 ->  Parallel Seq Scan on product_ui_huygens pui_1  (cost=0.00..2090.82 
rows=1 width=70) (actual time=19.724..19.727 rows=0 loops=1)
   Filter: (((mission_id)::text = 
'urn:esa:psa:context:investigation:mission.em16'::text) AND 
((logical_identifier)::text = 
'urn:esa:psa:em16_tgo_frd:data_raw:frd_raw_sc_n_20220729t00-20220729t235959'::text))
   Rows Removed by Filter: 19700
   

Select on partitioned table is very slow

2022-08-25 Thread Jose Osinde
actual time=1280.037..1280.037
rows=0 loops=1)
  Filter: (((mission_id)::text =
'urn:esa:psa:context:investigation:mission.em16'::text) AND
((logical_identifier)::text =
'urn:esa:psa:em16_tgo_frd:data_raw:frd_raw_sc_n_20220729t00-20220729t235959'::text))
  Rows Removed by Filter: 1085822
->  Parallel Seq Scan on product_ui_smart1 pui_8
(cost=0.00..148741.56 rows=1 width=63) (actual time=1045.532..1045.533
rows=0 loops=1)
  Filter: (((mission_id)::text =
'urn:esa:psa:context:investigation:mission.em16'::text) AND
((logical_identifier)::text =
'urn:esa:psa:em16_tgo_frd:data_raw:frd_raw_sc_n_20220729t00-20220729t235959'::text))
  Rows Removed by Filter: 1187167
->  Parallel Seq Scan on product_ui_vex pui_13
(cost=0.00..112914.84 rows=1 width=56) (actual time=968.542..968.542
rows=0 loops=1)
  Filter: (((mission_id)::text =
'urn:esa:psa:context:investigation:mission.em16'::text) AND
((logical_identifier)::text =
'urn:esa:psa:em16_tgo_frd:data_raw:frd_raw_sc_n_20220729t00-20220729t235959'::text))
  Rows Removed by Filter: 1133382
->  Parallel Seq Scan on product_ui_bc pui_9
(cost=0.00..8890.52 rows=1 width=83) (actual time=88.747..88.747
rows=0 loops=1)
  Filter: (((mission_id)::text =
'urn:esa:psa:context:investigation:mission.em16'::text) AND
((logical_identifier)::text =
'urn:esa:psa:em16_tgo_frd:data_raw:frd_raw_sc_n_20220729t00-20220729t235959'::text))
  Rows Removed by Filter: 67763
->  Parallel Seq Scan on product_ui_ch1 pui_2
(cost=0.00..2224.51 rows=1 width=59) (actual time=23.304..23.304
rows=0 loops=1)
  Filter: (((mission_id)::text =
'urn:esa:psa:context:investigation:mission.em16'::text) AND
((logical_identifier)::text =
'urn:esa:psa:em16_tgo_frd:data_raw:frd_raw_sc_n_20220729t00-20220729t235959'::text))
  Rows Removed by Filter: 20345
->  Parallel Seq Scan on product_ui_huygens pui_1
(cost=0.00..2090.82 rows=1 width=70) (actual time=21.786..21.786
rows=0 loops=1)
  Filter: (((mission_id)::text =
'urn:esa:psa:context:investigation:mission.em16'::text) AND
((logical_identifier)::text =
'urn:esa:psa:em16_tgo_frd:data_raw:frd_raw_sc_n_20220729t00-20220729t235959'::text))
  Rows Removed by Filter: 19700
->  Parallel Seq Scan on product_ui_ground_based pui_3
(cost=0.00..260.74 rows=1 width=75) (actual time=2.615..2.616 rows=0
loops=1)
  Filter: (((mission_id)::text =
'urn:esa:psa:context:investigation:mission.em16'::text) AND
((logical_identifier)::text =
'urn:esa:psa:em16_tgo_frd:data_raw:frd_raw_sc_n_20220729t00-20220729t235959'::text))
  Rows Removed by Filter: 2237
->  Parallel Seq Scan on product_ui_giotto pui_4
(cost=0.00..209.79 rows=1 width=47) (actual time=2.188..2.189 rows=0
loops=1)
  Filter: (((mission_id)::text =
'urn:esa:psa:context:investigation:mission.em16'::text) AND
((logical_identifier)::text =
'urn:esa:psa:em16_tgo_frd:data_raw:frd_raw_sc_n_20220729t00-20220729t235959'::text))
  Rows Removed by Filter: 2129
->  Parallel Seq Scan on product_ui_juice pui_12
(cost=0.00..13.88 rows=1 width=67) (actual time=0.132..0.132 rows=0
loops=1)
  Filter: (((mission_id)::text =
'urn:esa:psa:context:investigation:mission.em16'::text) AND
((logical_identifier)::text =
'urn:esa:psa:em16_tgo_frd:data_raw:frd_raw_sc_n_20220729t00-20220729t235959'::text))
  Rows Removed by Filter: 100
->  Parallel Seq Scan on product_ui_emrsp pui_11
(cost=0.00..10.26 rows=1 width=65) (actual time=0.000..0.000 rows=0
loops=1)
  Filter: (((mission_id)::text =
'urn:esa:psa:context:investigation:mission.em16'::text) AND
((logical_identifier)::text =
'urn:esa:psa:em16_tgo_frd:data_raw:frd_raw_sc_n_20220729t00-20220729t235959'::text))
->  Parallel Seq Scan on product_ui_hubble pui_5
(cost=0.00..3.29 rows=1 width=62) (actual time=0.047..0.047 rows=0
loops=1)
  Filter: (((mission_id)::text =
'urn:esa:psa:context:investigation:mission.em16'::text) AND
((logical_identifier)::text =
'urn:esa:psa:em16_tgo_frd:data_raw:frd_raw_sc_n_20220729t00-20220729t235959'::text))
  Rows Removed by Filter: 33
Planning Time: 16.874 ms
Execution Time: .496 ms
Elapsed: 10122 ms


Source code of the test is attached in the mail. It provides some other
test cases as:

   - Query other non partitioned table, with correct performance
   - Query the correct partitioned table according to the partition key
   column (mission_id). In this case only that table is scanned, but again,
   using PARALLEL SEQ SCAN and not the INDEX SCAN

Could you please provide any hint on the possible reasons of this behavior
and the performance degradation that is affecting only the JAVA client.

Best regards,

Jose Osinde


JdbcTest.java
Description: Binary data