Re: Query Planner not taking advantage of HASH PARTITION

2022-04-21 Thread Justin Pryzby
On Wed, Apr 20, 2022 at 07:11:37PM -0700, Benjamin Tingle wrote:
> @ the first point about write locks
> I think I had/have a misconception about how inserts work in postgres. It's
> my understanding that postgres will never draft a parallel insert plan for
> any query (except maybe CREATE TABLE AS?)

It's correct that DML (INSERT/UPDATE/DELETE) currently is not run in parallel.
https://www.postgresql.org/docs/current/when-can-parallel-query-be-used.html

> because the process needs to acquire an exclusive access write lock to the
> table it is inserting on.

But this is incorrect: DML acquires a relation lock, but not a strong one.
Multiple processes can insert into a table at once (to be technical, because
the ROW EXCLUSIVE lock level does not conflict with itself).
https://www.postgresql.org/docs/current/explicit-locking.html
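
For example, you can watch the lock that an open INSERT takes from a second
session (a quick sketch; the table name is only illustrative):

  -- session 1: leave a transaction open after inserting
  begin;
  insert into some_table values (1);

  -- session 2: the open INSERT holds only RowExclusiveLock, which does not
  -- conflict with another session's RowExclusiveLock, so concurrent INSERTs
  -- into the same table are not blocked
  select l.mode, l.granted
  from pg_locks l join pg_class c on c.oid = l.relation
  where c.relname = 'some_table';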

In fact, that's a design requirement.  It's understood that many people would
be unhappy if only one client were able to run UPDATEs at once, and that only a
toy system would acquire a strong lock for DML.

-- 
Justin




Re: Query Planner not taking advantage of HASH PARTITION

2022-04-21 Thread Benjamin Tingle
I'm going to forward my response from an individual thread with Jeff here in
case anyone else can help me out.

I wasn't sure if forwarding the message to the mailing list would be
considered a continuation of the same subject, so I'm just going to paste it
here. I'm a bit of an email noob :P

-

Jeff,

First off, thanks for the thoughtful response.

@ the first point about write locks
I think I had/have a misconception about how inserts work in postgres. It's
my understanding that postgres will never draft a parallel insert plan for
any query (except maybe CREATE TABLE AS?) because the process needs to
acquire an exclusive access write lock to the table it is inserting on. My
thinking was that since the partitions are treated as separate tables, they
could theoretically be inserted into in parallel.

@ the second point about indexes on textfield
I realized my error on this after I sent the email: indexes do not speed up
large joins, only small ones.

@ the third point about hash joins
So this is interesting to me. Your description of how hash joins work
sounds like the behaviour I would want, yet performing huge joins is where
my larger databases have been getting stuck. Looking back at my code, I
think I see why. My database doesn't have just one table; it has three
principal tables which relate to one another (full disclosure: these
databases/tables are distributed between multiple machines and can get
quite enormous, with some tables individually exceeding 200 GB):

tab1 (textfield1 text, tf1_id bigint) unique on textfield1
tab2 (textfield2 text, tf2_id bigint) unique on textfield2
tab3 (tf1_id_fk bigint, tf2_id_fk bigint) unique on tf1_id_fk, tf2_id_fk
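
(For concreteness, the shorthand above corresponds roughly to DDL along the
following lines; this is a sketch, and the exact constraint syntax is
illustrative:)

  create table tab1 (
      textfield1 text,
      tf1_id     bigint,
      unique (textfield1)
  );

  create table tab2 (
      textfield2 text,
      tf2_id     bigint,
      unique (textfield2)
  );

  create table tab3 (
      tf1_id_fk bigint,
      tf2_id_fk bigint,
      unique (tf1_id_fk, tf2_id_fk)
  );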

So as I'm uploading new data (in the form of (textfield1, textfield2)
entries) I need to record the ID of each matching record on the join(s) or
null if there was no match (thus a new record). The way I have been
accomplishing this so far has been like so:

1. create temporary table upload (textfield1 text, textfield2 text,
      tf1_id bigint, tf2_id bigint);
2. copy upload (textfield1, textfield2) from :'source';
3. update upload set tf1_id = tab1.tf1_id from tab1
      where upload.textfield1 = tab1.textfield1;
4. create temporary table new_textfield1 (textfield1 text, tf1_id bigint);
5. insert into new_textfield1
      select distinct on (textfield1) textfield1,
             nextval('tf1_id_sequence') as tf1_id
        from upload where tf1_id is null;
6. update upload u set tf1_id = ntf1.tf1_id from new_textfield1 ntf1
      where u.tf1_id is null and u.textfield1 = ntf1.textfield1;
-- etc. continue the same process for tab2 and tab3
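
(A hedged aside: steps 3-6 could plausibly be collapsed into a single pass
over the join, roughly as below; the table, column, and sequence names come
from the steps above, and the rest is an untested sketch:)

  update upload u
     set tf1_id = ids.tf1_id
    from (
          -- one row per distinct textfield1; coalesce only calls nextval()
          -- when the left join found no existing id in tab1
          select d.textfield1,
                 coalesce(t.tf1_id, nextval('tf1_id_sequence')) as tf1_id
            from (select distinct textfield1 from upload) d
            left join tab1 t using (textfield1)
         ) ids
   where u.textfield1 = ids.textfield1;

(The new (textfield1, tf1_id) pairs for tab1 could then be read back out of
upload afterwards.)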

Now, since I wrote that code I've learned about aggregation & window
functions, so I can generate the new ids during the big table join rather
than after the fact; but the big join "update" statement is still where the
process gets stuck for huge tables. I notice that the query planner
generates a different strategy when just selecting data vs. "insert select"
or "update".

For example, when I write a query to join entries from one huge table to
another using a plain select statement, I get a nice parallel hash join
plan like the one you mention.

> explain select * from hugetable1 ht1 join hugetable2 ht2
>   on ht1.some_field = ht2.some_field;


Results in:

>  Gather
>    Workers Planned: 2
>    ->  Parallel Hash Left Join
>          Hash Cond: ((ht1.some_field)::text = (ht2.some_field)::text)
>          ->  Parallel Append
>                ->  Parallel Seq Scan on hugetable2 ht2
>

However, when I change this to an update or insert select, I get a very
different plan.

> explain insert into dummy1
>   select * from hugetable1 ht1 join hugetable2 ht2
>     on ht1.some_field = ht2.some_field;
> OR
> explain update hugetable1 ht1 set id = ht2.id from hugetable2 ht2
>   where ht1.some_field = ht2.some_field;


Results in:

> Update on hugetable1  ||OR||  Insert on dummy1
>   ->  Merge Join
>         Merge Cond: ((ht1.some_field)::text = (ht2.some_field)::text)
>         ->  Sort
>               Sort Key: ht1.some_field
>               ->  Seq Scan on hugetable1 ht1
>         ->  Materialize
>               ->  Sort
>                     Sort Key: ht2.some_field
>                     ->  Append
>                           ->  Seq Scan on hugetable2 ht2

Maybe this query should perform similarly to the hash join, but the fact
remains that I've had databases stuck for weeks on plans like this. The
partitioning strategy I've adopted has been an attempt to force data
locality during the join operation, and so far has been working reasonably
well. If you have some insight into why these large update/insert
operations go so slowly, it would be much appreciated.
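
(A purely diagnostic sketch: one way to check whether the planner would pick
a hash join for the DML form at all is to temporarily discourage the merge
join while running EXPLAIN; enable_mergejoin is a standard planner setting,
and the query is the same update as above:)

  begin;
  set local enable_mergejoin = off;
  explain update hugetable1 ht1 set id = ht2.id from hugetable2 ht2
    where ht1.some_field = ht2.some_field;
  -- explain alone doesn't execute the update; rollback just discards the setting
  rollback;

Even if it does switch to a hash join, the plan still won't be parallel, per
the earlier point that DML is not run in parallel.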

-Ben

On Sun, Apr 17, 2022 at 11:50 AM Alvaro Herrera wrote:

> On 2022-Apr-14, Benjamin Tingle wrote:
>
> > It doesn't help if I partition temp_data by textfield beforehand either
> > (using the same scheme as the target table). It still opts to