The pg_hint_plan extension might be able to force a plan.
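Something like this, maybe (an untested sketch, assuming pg_hint_plan is installed on the server and using the q/r aliases from Gunther's reduced query below; the hint spelling is from memory, so double-check against the pg_hint_plan docs):

```shell
# Untested sketch -- needs a live server with pg_hint_plan installed.
# Leading((q r)) asks for q as the outer and r as the inner relation;
# since the hash is built on the inner side, that should hash tmp_r.
# HashJoin(q r) pins the join method.
psql -d foo <<'SQL'
LOAD 'pg_hint_plan';
/*+ Leading((q r)) HashJoin(q r) */
EXPLAIN SELECT *
  FROM tmp_q q
  RIGHT OUTER JOIN tmp_r r
    USING (documentinternalid, actinternalid);
SQL
```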

Also, I wonder whether perf probes plus perf record/script could be used to 
log every memory-allocation call along with its unwound call stack, and then 
analyze that file afterward. At least this can be done for a single process, 
and only while the problematic SQL is running.
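Roughly along these lines (an untested sketch: it needs root, perf, and PostgreSQL debug symbols; the binary path is an assumption for a PG 11 install, and $PID is the backend running the problem query, from SELECT pg_backend_pid()):

```shell
# Untested sketch: log every allocation in one backend, with call stacks,
# only while the problematic SQL runs.  AllocSetAlloc is PostgreSQL's
# workhorse allocator; the binary path below is an assumption.
perf probe -x /usr/pgsql-11/bin/postgres --add AllocSetAlloc
perf record -e probe_postgres:AllocSetAlloc -g -p "$PID" -- sleep 60
perf script > allocs.txt       # one unwound call stack per sampled call
perf probe --del probe_postgres:AllocSetAlloc
```

Aggregating allocs.txt by stack should then point at whichever plan node is doing the bulk of the allocating.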

-Jeremy

Sent from my TI-83

> On Apr 19, 2019, at 20:34, Gunther <r...@gusw.net> wrote:
> 
>> On 4/19/2019 17:01, Justin Pryzby wrote:
>> Were you able to reproduce the issue in some minimized way ?  Like after
>> joining fewer tables or changing to join with fewer join conditions ?
>> 
>> On Thu, Apr 18, 2019 at 05:21:28PM +0200, Tomas Vondra wrote:
>>> It would be possible to do at least one of these two things:
> Thanks, and sorry for my pushiness. Yes, I have pinpointed the HashJoin, and 
> I have created the two tables involved.
> 
> As for the data distribution of the join keys: they are all essentially 
> UUIDs, and essentially random.
> 
> I am sharing this data with you. However, only someone who can actually 
> control the planner can use it to reproduce the problem. I have tried but 
> not succeeded. Still, I am confident this material reproduces the problem.
> 
> Here is the part of the plan that generates this massive number of calls:
>         ->  Hash Right Join  (cost=4255031.53..5530808.71 rows=34619 
> width=1197)
>               Hash Cond: (((q.documentinternalid)::text = 
> (documentinformationsubject.documentinternalid)::text) AND 
> ((r.targetinternalid)::text = 
> (documentinformationsubject.actinternalid)::text))
>               ->  Hash Right Join  (cost=1341541.37..2612134.36 rows=13 
> width=341)
>                     Hash Cond: 
> (((documentinformationsubject_2.documentinternalid)::text = 
> (q.documentinternalid)::text) AND 
> ((documentinformationsubject_2.actinternalid)::text = 
> (q.actinternalid)::text))
>                   ... let's call this tmp_q ...
>               ->  Hash  (cost=2908913.87..2908913.87 rows=34619 width=930)
>                     ->  Gather Merge  (cost=2892141.40..2908913.87 rows=34619 
> width=930)
>                         ... let's call this tmp_r ...                
> This can be logically reduced to the following query
> 
> SELECT *
>   FROM tmp_q q
>   RIGHT OUTER JOIN tmp_r r
>     USING(documentInternalId, actInternalId);
> with the following two tables:
> 
> CREATE TABLE tmp_q (
>  documentinternalid        character varying(255),
>  operationqualifiercode    character varying(512),
>  operationqualifiername    character varying(512),
>  actinternalid             character varying(255),
>  approvalinternalid        character varying(255),
>  approvalnumber            character varying(555),
>  approvalnumbersystem      character varying(555),
>  approvalstatecode         character varying(512),
>  approvalstatecodesystem   character varying(512),
>  approvaleffectivetimelow  character varying(512),
>  approvaleffectivetimehigh character varying(512),
>  approvalstatuscode        character varying(32),
>  licensecode               character varying(512),
>  agencyid                  character varying(555),
>  agencyname                text
> );
> 
> CREATE TABLE tmp_r (
>  documentinternalid      character varying(255),
>  is_current              character(1),
>  documentid              character varying(555),
>  documenttypecode        character varying(512),
>  subjectroleinternalid   character varying(255),
>  subjectentityinternalid character varying(255),
>  subjectentityid         character varying(555),
>  subjectentityidroot     character varying(555),
>  subjectentityname       character varying,
>  subjectentitytel        text,
>  subjectentityemail      text,
>  otherentityinternalid   character varying(255),
>  confidentialitycode     character varying(512),
>  actinternalid           character varying(255),
>  operationcode           character varying(512),
>  operationname           text,
>  productitemcode         character varying(512),
>  productinternalid       character varying(255)
> );
> you can download the data here (URLs just a tiny bit obfuscated):
> 
> The small table http:// gusw dot net/tmp_q.gz
> 
> The big table is in the form of 9 parts of 20 MB each, http:// gusw dot 
> net/tmp_r.gz.00, .01, .02, ..., .09, maybe you need only the first part.
> 
> Download as many as you have patience to grab, and then import the data like 
> this:
> 
> \copy tmp_q from program 'zcat tmp_q.gz'
> \copy tmp_r from program 'cat tmp_r.gz.* |zcat'
> The only problem is that I can't test whether this actually triggers the 
> memory problem, because I can't force the plan to use the right join; it 
> always reverts to the left join, hashing tmp_q:
> 
>         ->  Hash Left Join  (cost=10.25..5601401.19 rows=5505039 width=12118)
>               Hash Cond: (((r.documentinternalid)::text = 
> (q.documentinternalid)::text) AND ((r.actinternalid)::text = 
> (q.actinternalid)::text))
>               ->  Seq Scan on tmp_r r  (cost=0.00..5560089.39 rows=5505039 
> width=6844)
>               ->  Hash  (cost=10.10..10.10 rows=10 width=6306)
>                     ->  Seq Scan on tmp_q q  (cost=0.00..10.10 rows=10 
> width=6306)
> which is of course much better; but when tmp_q and tmp_r are the results of 
> complex subplans that the planner can't estimate, it gets the sides wrong, 
> and the issue is triggered because we end up hashing the big tmp_r instead 
> of tmp_q.
> 
> It would be so nice if there were a way to force a specific plan for testing 
> purposes.  I tried giving false data in pg_class reltuples and relpages:
> 
> foo=# analyze tmp_q;
> ANALYZE
> foo=# analyze tmp_r;
> ANALYZE
> foo=# select relname, relpages, reltuples from pg_class where relname in 
> ('tmp_q', 'tmp_r');
>  relname | relpages |  reltuples
> ---------+----------+-------------
>  tmp_r   |  5505039 | 1.13467e+08
>  tmp_q   |        7 |         236
> (2 rows)
> 
> foo=# update pg_class set (relpages, reltuples) = (5505039, 1.13467e+08) 
> where relname = 'tmp_q';
> UPDATE 1
> foo=# update pg_class set (relpages, reltuples) = (7, 236) where relname = 
> 'tmp_r';
> UPDATE 1
> but that didn't help. Somehow the planner outsmarts every such trick, so I 
> can't get it to follow my right outer join plan where the big table is 
> hashed.  I am sure y'all know some way to force it.
> 
> regards,
> -Gunther
