Re: Select on partitioned table is very slow

2022-08-25 Thread Jose Osinde
You are right but the correct query is the one in the attached file. What
we want to do here is to force psql to send the "wrong data types" to
postgres and, as a result of this, get a bad plan.

Cheers,
Jose Osinde

On Thu, Aug 25, 2022 at 3:48 PM Ranier Vilela  wrote:

> Em qui., 25 de ago. de 2022 às 10:43, Jose Osinde 
> escreveu:
>
>>
>> Dear Depesz, Laurenz,
>>
>> Thanks very much for the fast responses. They are actually correct and
>> saved me a lot of time. I couldn't test the cast from the Java test but
>> this is something I can deal with later on (most probably updating the
>> column types to text in the database side instead). But what I could do was
>> reproduce the same problem in the psql console using the cast in the other
>> way. This sentence:
>>
>> explain analyze select logical_identifier, version_id, lastproduct
>> FROM test_product_ui_partition.product_ui pui
>> WHERE
>> pui.mission_id='urn:esa:psa:context:investigation:mission.em16'::citext
>> AND
>> pui.logical_identifier='urn:esa:psa:em16_tgo_frd:data_raw:frd_raw_sc_n_20220729t00-20220729t235959'::citext;
>>
> The query in explain.txt attached, it seems not the same.
>
> explain analyze select logical_identifier, version_id, lastproduct
>FROM test_product_ui_partition.product_ui pui
>   WHERE 
> pui.mission_id='urn:esa:psa:context:investigation:mission.em16'::text
>   AND 
> pui.logical_identifier='urn:esa:psa:em16_tgo_frd:data_raw:frd_raw_sc_n_20220729t00-20220729t235959'::text;
>
> ::text?
>
> regards,
>
> Ranier Vilela
>
>


Re: Select on partitioned table is very slow

2022-08-25 Thread Ranier Vilela
Em qui., 25 de ago. de 2022 às 10:43, Jose Osinde 
escreveu:

>
> Dear Depesz, Laurenz,
>
> Thanks very much for the fast responses. They are actually correct and
> saved me a lot of time. I couldn't test the cast from the Java test but
> this is something I can deal with later on (most probably updating the
> column types to text in the database side instead). But what I could do was
> reproduce the same problem in the psql console using the cast in the other
> way. This sentence:
>
> explain analyze select logical_identifier, version_id, lastproduct
> FROM test_product_ui_partition.product_ui pui
> WHERE
> pui.mission_id='urn:esa:psa:context:investigation:mission.em16'::citext
> AND
> pui.logical_identifier='urn:esa:psa:em16_tgo_frd:data_raw:frd_raw_sc_n_20220729t00-20220729t235959'::citext;
>
The query in explain.txt attached, it seems not the same.

explain analyze select logical_identifier, version_id, lastproduct
   FROM test_product_ui_partition.product_ui pui
  WHERE 
pui.mission_id='urn:esa:psa:context:investigation:mission.em16'::text
  AND 
pui.logical_identifier='urn:esa:psa:em16_tgo_frd:data_raw:frd_raw_sc_n_20220729t00-20220729t235959'::text;

::text?

regards,

Ranier Vilela


Re: Select on partitioned table is very slow

2022-08-25 Thread Jose Osinde
Dear Depesz, Laurenz,

Thanks very much for the fast responses. They are actually correct and
saved me a lot of time. I couldn't test the cast from the Java test but
this is something I can deal with later on (most probably updating the
column types to text in the database side instead). But what I could do was
reproduce the same problem in the psql console using the cast in the other
way. This sentence:

explain analyze select logical_identifier, version_id, lastproduct
FROM test_product_ui_partition.product_ui pui
WHERE
pui.mission_id='urn:esa:psa:context:investigation:mission.em16'::citext
AND
pui.logical_identifier='urn:esa:psa:em16_tgo_frd:data_raw:frd_raw_sc_n_20220729t00-20220729t235959'::citext;

Creates an output equivalent to that returned from the JAVA application and
reproduces the exact same problems: Scans all the partitions instead of
select the right one and uses sec scans for all the cases.
Attached the result.

Again, many thanks for your help,
Jose Osinde
explain analyze select logical_identifier, version_id, lastproduct 
   FROM test_product_ui_partition.product_ui pui 
  WHERE 
pui.mission_id='urn:esa:psa:context:investigation:mission.em16'::text 
  AND 
pui.logical_identifier='urn:esa:psa:em16_tgo_frd:data_raw:frd_raw_sc_n_20220729t00-20220729t235959'::text;
  


 Gather  (cost=1000.00..5617399.10 rows=19 width=82) (actual 
time=10485.082..10497.268 rows=1 loops=1)
   Workers Planned: 2
   Workers Launched: 2
   ->  Parallel Append  (cost=0.00..5616397.20 rows=15 width=82) (actual 
time=9221.542..10479.134 rows=0 loops=3)
 ->  Parallel Seq Scan on product_ui_em16 pui_10  
(cost=0.00..2603849.81 rows=3 width=112) (actual time=3883.905..5141.463 rows=0 
loops=3)
   Filter: (((mission_id)::text = 
'urn:esa:psa:context:investigation:mission.em16'::text) AND 
((logical_identifier)::text = 
'urn:esa:psa:em16_tgo_frd:data_raw:frd_raw_sc_n_20220729t00-20220729t235959'::text))
   Rows Removed by Filter: 5337106
 ->  Parallel Seq Scan on product_ui_rosetta pui_6  
(cost=0.00..2382752.79 rows=1 width=57) (actual time=6213.704..6213.704 rows=0 
loops=2)
   Filter: (((mission_id)::text = 
'urn:esa:psa:context:investigation:mission.em16'::text) AND 
((logical_identifier)::text = 
'urn:esa:psa:em16_tgo_frd:data_raw:frd_raw_sc_n_20220729t00-20220729t235959'::text))
   Rows Removed by Filter: 5878988
 ->  Parallel Seq Scan on product_ui_mex pui_7  (cost=0.00..354434.31 
rows=1 width=56) (actual time=1417.182..1417.186 rows=0 loops=1)
   Filter: (((mission_id)::text = 
'urn:esa:psa:context:investigation:mission.em16'::text) AND 
((logical_identifier)::text = 
'urn:esa:psa:em16_tgo_frd:data_raw:frd_raw_sc_n_20220729t00-20220729t235959'::text))
   Rows Removed by Filter: 1085822
 ->  Parallel Seq Scan on product_ui_smart1 pui_8  
(cost=0.00..148741.56 rows=1 width=63) (actual time=1125.713..1125.713 rows=0 
loops=1)
   Filter: (((mission_id)::text = 
'urn:esa:psa:context:investigation:mission.em16'::text) AND 
((logical_identifier)::text = 
'urn:esa:psa:em16_tgo_frd:data_raw:frd_raw_sc_n_20220729t00-20220729t235959'::text))
   Rows Removed by Filter: 1187167
 ->  Parallel Seq Scan on product_ui_vex pui_13  (cost=0.00..112914.84 
rows=1 width=56) (actual time=919.386..919.386 rows=0 loops=1)
   Filter: (((mission_id)::text = 
'urn:esa:psa:context:investigation:mission.em16'::text) AND 
((logical_identifier)::text = 
'urn:esa:psa:em16_tgo_frd:data_raw:frd_raw_sc_n_20220729t00-20220729t235959'::text))
   Rows Removed by Filter: 1133382
 ->  Parallel Seq Scan on product_ui_bc pui_9  (cost=0.00..8890.52 
rows=1 width=83) (actual time=78.910..78.910 rows=0 loops=1)
   Filter: (((mission_id)::text = 
'urn:esa:psa:context:investigation:mission.em16'::text) AND 
((logical_identifier)::text = 
'urn:esa:psa:em16_tgo_frd:data_raw:frd_raw_sc_n_20220729t00-20220729t235959'::text))
   Rows Removed by Filter: 67763
 ->  Parallel Seq Scan on product_ui_ch1 pui_2  (cost=0.00..2224.51 
rows=1 width=59) (actual time=20.162..20.162 rows=0 loops=1)
   Filter: (((mission_id)::text = 
'urn:esa:psa:context:investigation:mission.em16'::text) AND 
((logical_identifier)::text = 
'urn:esa:psa:em16_tgo_frd:data_raw:frd_raw_sc_n_20220729t00-20220729t235959'::text))
   Rows Removed by Filter: 20345
 ->  Parallel Seq Scan on product_ui_huygens pui_1  (cost=0.00..2090.82 
rows=1 width=70) (actual time=19.724..19.727 rows=0 loops=1)
   Filter: (((mission_id)::text = 
'urn:esa:psa:context:investigation:mission.em16'::text) AND 
((logical_identifier)::text = 
'urn:esa:psa:em16_tgo_frd:data_raw:frd_raw_sc_n_20220729t00-20220729t235959'::text))
   Rows Removed by Filter: 19700
 ->  Parallel Seq Scan on 

Re: Select on partitioned table is very slow

2022-08-25 Thread Laurenz Albe
On Thu, 2022-08-25 at 11:10 +0200, hubert depesz lubaczewski wrote:
> Hi,
> 
> On Thu, Aug 25, 2022 at 10:49:51AM +0200, Jose Osinde wrote:
> > select logical_identifier, version_id, lastproduct
> >    from test_product_ui_partition.product_ui pui
> >     where pui.mission_id='urn:esa:psa:context:investigation:mission.em16'
> >   and 
> > pui.logical_identifier='urn:esa:psa:em16_tgo_frd:data_raw:frd_raw_sc_n_20220729t00-20220729t235959'
> 
> > EXPLAIN ANALYZE FROM PGADMIN
> > 
> > Index Scan using product_ui_em16_logical_identifier_idx on
> > product_ui_em16 pui  (cost=0.69..19.75 rows=7 width=112) (actual
> > time=0.133..0.134 rows=1 loops=1)
> > [...]  Index Cond: (logical_identifier =
> > 'urn:esa:psa:em16_tgo_frd:data_raw:frd_raw_sc_n_20220729t00-20220729t235959'::citext)"
> > [...]  Filter: (mission_id =
> > 'urn:esa:psa:context:investigation:mission.em16'::citext)"
> > Planning Time: 0.237 ms
> > Execution Time: 0.149 ms
> 
> I really wish you didn't butcher explains like this, but we can work
> with it.
> 
> Please note that the condition for filter is:
> 
> mission_id = 'urn:esa:psa:context:investigation:mission.em16'::citext
> 
> Specifically, column mission_id (which is partition key) is compared
> with some value that is in citext type - same as column.
> This means that pg can take this value, compare with partitioning
> schema, and pick one partition.
> 
> Now look at the explain from java:
> 
> >   Filter: (((mission_id)::text =
> > 'urn:esa:psa:context:investigation:mission.em16'::text) AND
> 
> The rest is irrelevant.
> 
> The important part is that java sent query that doesn't compare value of
> column mission_id with some value, but rather compares *cast* of the
> column.
> 
> Since it's not column value, then partitioning can't check what's going
> on (cast can just as well make it totally different value), and it also
> can't really use index on mission_id.
> 
> Why it happens - no idea, sorry, I don't grok java.
> 
> But you should be able to test/work on fix with simple, non-partitioned
> table, just make there citext column, and try searching for value in it,
> and check explain from the search. If it will cast column - it's no
> good.
> 
> Sorry I can't tell you what to fix, but perhaps this will be enough for
> you to find solution.

Quite so.

You are probably using a prepared statement in JDBC.

You probably have to use explicit type casts, like:

select logical_identifier, version_id, lastproduct 
   from test_product_ui_partition.product_ui pui 
where pui.mission_id = ? :: citext
  and pui.logical_identifier = ? :: citext

Yours,
Laurenz Albe
-- 
Cybertec | https://www.cybertec-postgresql.com




Re: Select on partitioned table is very slow

2022-08-25 Thread hubert depesz lubaczewski
Hi,

On Thu, Aug 25, 2022 at 10:49:51AM +0200, Jose Osinde wrote:
> select logical_identifier, version_id, lastproduct
>from test_product_ui_partition.product_ui pui
> where pui.mission_id='urn:esa:psa:context:investigation:mission.em16'
>   and 
> pui.logical_identifier='urn:esa:psa:em16_tgo_frd:data_raw:frd_raw_sc_n_20220729t00-20220729t235959'

> EXPLAIN ANALYZE FROM PGADMIN
> 
> Index Scan using product_ui_em16_logical_identifier_idx on
> product_ui_em16 pui  (cost=0.69..19.75 rows=7 width=112) (actual
> time=0.133..0.134 rows=1 loops=1)
> [...]  Index Cond: (logical_identifier =
> 'urn:esa:psa:em16_tgo_frd:data_raw:frd_raw_sc_n_20220729t00-20220729t235959'::citext)"
> [...]  Filter: (mission_id =
> 'urn:esa:psa:context:investigation:mission.em16'::citext)"
> Planning Time: 0.237 ms
> Execution Time: 0.149 ms

I really wish you didn't butcher explains like this, but we can work
with it.

Please note that the condition for filter is:

mission_id = 'urn:esa:psa:context:investigation:mission.em16'::citext

Specifically, column mission_id (which is partition key) is compared
with some value that is in citext type - same as column.
This means that pg can take this value, compare with partitioning
schema, and pick one partition.

Now look at the explain from java:

>   Filter: (((mission_id)::text =
> 'urn:esa:psa:context:investigation:mission.em16'::text) AND

The rest is irrelevant.

The important part is that java sent query that doesn't compare value of
column mission_id with some value, but rather compares *cast* of the
column.

Since it's not column value, then partitioning can't check what's going
on (cast can just as well make it totally different value), and it also
can't really use index on mission_id.

Why it happens - no idea, sorry, I don't grok java.

But you should be able to test/work on fix with simple, non-partitioned
table, just make there citext column, and try searching for value in it,
and check explain from the search. If it will cast column - it's no
good.

Sorry I can't tell you what to fix, but perhaps this will be enough for
you to find solution.

Best regards,

depesz





Select on partitioned table is very slow

2022-08-25 Thread Jose Osinde
Dear all,

After detecting some performance issues accessing a partitioned table in a
postgres database we created a simple Java test to analyse the possible
causes of this problem. The test is very simple. It just creates a database
connection and executes a query on a partitioned table including the
partition key column in the where clause. What we expect is that the
EXPLAIN ANALYZE returns a single INDEX SCAN on the child partitioned table,
as reported when this *same* query is executed from the pgadmin or psql
clients. What we actually get is a PARALLEL SEQ SCAN on all the tables
belonging to the partition. Here some more information

Database:

   - PostgreSQL, version: 13.7

JAVA

   - openjdk version "1.8.0_292"
   - JDBC: PostgreSQL JDBC Driver, version: 42.4.2


TABLE DEFINITION

CREATE TABLE IF NOT EXISTS test_product_ui_partition.product_ui
(
id bigint NOT NULL DEFAULT
nextval('test_product_ui_partition.product_ui_seq'::regclass),
bundle_description citext COLLATE pg_catalog."default",
bundle_distribution_path character varying COLLATE pg_catalog."default",
mission_id citext COLLATE pg_catalog."default" NOT NULL,
...
has_facets boolean,
CONSTRAINT product_ui_pkey PRIMARY KEY (id, mission_id)
) PARTITION BY LIST (mission_id);

-- Relevant indexes

CREATE INDEX IF NOT EXISTS product_ui_mission_id_idx
ON test_product_ui_partition.product_ui USING btree
(mission_id COLLATE pg_catalog."default" ASC NULLS LAST);

CREATE INDEX IF NOT EXISTS product_ui_logical_identifier_idx
ON test_product_ui_partition.product_ui USING btree
(logical_identifier COLLATE pg_catalog."default" ASC NULLS LAST)


TABLE SIZES

  Schema   |Name |   Type|
Owner   | Persistence | Access method |  Size   |
---+-+---+--+-+---+-
 test_product_ui_partition | product_ui  | partitioned table |
postgres | permanent   |   | 0 bytes |
---+-+---+--+-+---+---+-
 test_product_ui_partition | product_ui_em16 | table |
postgres | permanent   | heap  | 19 GB   |
---+---+---+--+-+---+---+---
 test_product_ui_partition | product_ui_bc   | table |
postgres | permanent   | heap  | 66 MB   |


TEST QUERY

select logical_identifier, version_id, lastproduct
   from test_product_ui_partition.product_ui pui
where pui.mission_id='urn:esa:psa:context:investigation:mission.em16'
  and 
pui.logical_identifier='urn:esa:psa:em16_tgo_frd:data_raw:frd_raw_sc_n_20220729t00-20220729t235959'

*This query returns one single row*


EXPLAIN ANALYZE FROM PGADMIN

Index Scan using product_ui_em16_logical_identifier_idx on
product_ui_em16 pui  (cost=0.69..19.75 rows=7 width=112) (actual
time=0.133..0.134 rows=1 loops=1)
[...]  Index Cond: (logical_identifier =
'urn:esa:psa:em16_tgo_frd:data_raw:frd_raw_sc_n_20220729t00-20220729t235959'::citext)"
[...]  Filter: (mission_id =
'urn:esa:psa:context:investigation:mission.em16'::citext)"
Planning Time: 0.237 ms
Execution Time: 0.149 ms

EXPLAIN ANALYZE FROM JAVA TEST

class org.postgresql.jdbc.PgConnection
PostgreSQL
version: 13.7
PostgreSQL JDBC Driver
version: 42.4.2
Query: explain analyze select logical_identifier, version_id,
lastproduct from test_product_ui_partition.product_ui pui where
pui.mission_id='urn:esa:psa:context:investigation:mission.em16' and
pui.logical_identifier='urn:esa:psa:em16_tgo_frd:data_raw:frd_raw_sc_n_20220729t00-20220729t235959'
Gather  (cost=1000.00..5617399.10 rows=19 width=82) (actual
time=9987.415...325 rows=1 loops=1)
  Workers Planned: 2
  Workers Launched: 2
  ->  Parallel Append  (cost=0.00..5616397.20 rows=15 width=82)
(actual time=8240.465..9981.736 rows=0 loops=3)
->  Parallel Seq Scan on product_ui_em16 pui_10
(cost=0.00..2603849.81 rows=3 width=112) (actual
time=3048.850..4790.105 rows=0 loops=3)
  Filter: (((mission_id)::text =
'urn:esa:psa:context:investigation:mission.em16'::text) AND
((logical_identifier)::text =
'urn:esa:psa:em16_tgo_frd:data_raw:frd_raw_sc_n_20220729t00-20220729t235959'::text))
  Rows Removed by Filter: 5337106
->  Parallel Seq Scan on product_ui_rosetta pui_6
(cost=0.00..2382752.79 rows=1 width=57) (actual
time=6070.946..6070.946 rows=0 loops=2)
  Filter: (((mission_id)::text =
'urn:esa:psa:context:investigation:mission.em16'::text) AND
((logical_identifier)::text =
'urn:esa:psa:em16_tgo_frd:data_raw:frd_raw_sc_n_20220729t00-20220729t235959'::text))
  Rows Removed by Filter: 5878988
->  Parallel Seq Scan on product_ui_mex pui_7
(cost=0.00..354434.31 rows=1 width=56) (actual time=1280.037..1280.037
rows=0 loops=1)
  Filter: