The pg_hint_plan extension might be able to force a plan. Also, perf probes
plus perf record/script might be useful for creating a log of all the
memory-allocation calls along with their unwound call stacks, and then
analyzing that file. At least that can be done for a single process, and
only while the problematic SQL is running.
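Something along these lines, as an untested sketch (the libc path and
<backend_pid> are placeholders that vary per system; probing the backend's
own allocator instead, e.g. AllocSetAlloc if the postgres binary carries
symbols, would be more precise):

  # add a uprobe on libc's malloc (library path varies by distro)
  perf probe -x /lib/x86_64-linux-gnu/libc.so.6 --add malloc

  # record just the one backend, with DWARF stack unwinding,
  # while the problematic SQL is running
  perf record -e probe_libc:malloc --call-graph dwarf -p <backend_pid>

  # dump the calls with their unwound stacks to a text file for analysis
  perf script > malloc-stacks.txt

  # remove the probe when done
  perf probe --del probe_libc:malloc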
-Jeremy

Sent from my TI-83

> On Apr 19, 2019, at 20:34, Gunther <r...@gusw.net> wrote:
>
>> On 4/19/2019 17:01, Justin Pryzby wrote:
>> Were you able to reproduce the issue in some minimized way ?  Like after
>> joining fewer tables or changing to join with fewer join conditions ?
>>
>> On Thu, Apr 18, 2019 at 05:21:28PM +0200, Tomas Vondra wrote:
>>> It would be possible to do at least one of these two things:
>
> Thanks, and sorry for my pushiness. Yes, I have pinpointed the HashJoin,
> and I have created the two tables involved.
>
> As for the data distribution of the join keys: they are all essentially
> UUIDs, and essentially random.
>
> I am sharing this data with you. However, only someone who can actually
> control the planner can use it to reproduce the problem. I have tried but
> not succeeded; still, I am sure this material reproduces the problem.
>
> Here is the part of the plan that generates this massive number of calls
> to
>
>   ->  Hash Right Join  (cost=4255031.53..5530808.71 rows=34619 width=1197)
>         Hash Cond: (((q.documentinternalid)::text = (documentinformationsubject.documentinternalid)::text) AND ((r.targetinternalid)::text = (documentinformationsubject.actinternalid)::text))
>         ->  Hash Right Join  (cost=1341541.37..2612134.36 rows=13 width=341)
>               Hash Cond: (((documentinformationsubject_2.documentinternalid)::text = (q.documentinternalid)::text) AND ((documentinformationsubject_2.actinternalid)::text = (q.actinternalid)::text))
>               ... let's call this tmp_q ...
>         ->  Hash  (cost=2908913.87..2908913.87 rows=34619 width=930)
>               ->  Gather Merge  (cost=2892141.40..2908913.87 rows=34619 width=930)
>                     ... let's call this tmp_r ...
>
> This can be logically reduced to the following query:
>
>   SELECT *
>     FROM tmp_q q
>     RIGHT OUTER JOIN tmp_r r
>       USING(documentInternalId, actInternalId);
>
> with the following two tables:
>
>   CREATE TABLE tmp_q (
>       documentinternalid        character varying(255),
>       operationqualifiercode    character varying(512),
>       operationqualifiername    character varying(512),
>       actinternalid             character varying(255),
>       approvalinternalid        character varying(255),
>       approvalnumber            character varying(555),
>       approvalnumbersystem      character varying(555),
>       approvalstatecode         character varying(512),
>       approvalstatecodesystem   character varying(512),
>       approvaleffectivetimelow  character varying(512),
>       approvaleffectivetimehigh character varying(512),
>       approvalstatuscode        character varying(32),
>       licensecode               character varying(512),
>       agencyid                  character varying(555),
>       agencyname                text
>   );
>
>   CREATE TABLE tmp_r (
>       documentinternalid        character varying(255),
>       is_current                character(1),
>       documentid                character varying(555),
>       documenttypecode          character varying(512),
>       subjectroleinternalid     character varying(255),
>       subjectentityinternalid   character varying(255),
>       subjectentityid           character varying(555),
>       subjectentityidroot       character varying(555),
>       subjectentityname         character varying,
>       subjectentitytel          text,
>       subjectentityemail        text,
>       otherentityinternalid     character varying(255),
>       confidentialitycode       character varying(512),
>       actinternalid             character varying(255),
>       operationcode             character varying(512),
>       operationname             text,
>       productitemcode           character varying(512),
>       productinternalid         character varying(255)..
>   );
>
> You can download the data here (URLs just a tiny bit obfuscated):
>
>   The small table: http:// gusw dot net/tmp_q.gz
>
>   The big table is in the form of 9 parts of 20 MB each:
>   http:// gusw dot net/tmp_r.gz.00, .01, .02, ..., .09; maybe you need
>   only the first part.
>
> Download as many as you have patience to grab, and then import the data
> like this:
>
>   \copy tmp_q from program 'zcat tmp_q.gz'
>   \copy tmp_r from program 'cat tmp_r.gz.* |zcat'
>
> The only problem is that I can't test whether this actually triggers the
> memory problem, because I can't force the plan to use the right join; it
> always reverts to the left join, hashing the tmp_q:
>
>   ->  Hash Left Join  (cost=10.25..5601401.19 rows=5505039 width=12118)
>         Hash Cond: (((r.documentinternalid)::text = (q.documentinternalid)::text) AND ((r.actinternalid)::text = (q.actinternalid)::text))
>         ->  Seq Scan on tmp_r r  (cost=0.00..5560089.39 rows=5505039 width=6844)
>         ->  Hash  (cost=10.10..10.10 rows=10 width=6306)
>               ->  Seq Scan on tmp_q q  (cost=0.00..10.10 rows=10 width=6306)
>
> which is of course much better. But when tmp_q and tmp_r are the results
> of complex stuff that the planner can't estimate, it gets it wrong, and
> then the issue gets triggered because we are hashing the big tmp_r, not
> tmp_q.
>
> It would be so nice if there was a way to force a specific plan for
> testing purposes. I tried giving false data in pg_class reltuples and
> relpages:
>
>   foo=# analyze tmp_q;
>   ANALYZE
>   foo=# analyze tmp_r;
>   ANALYZE
>   foo=# select relname, relpages, reltuples from pg_class
>         where relname in ('tmp_q', 'tmp_r');
>    relname | relpages |  reltuples
>   ---------+----------+-------------
>    tmp_r   |  5505039 | 1.13467e+08
>    tmp_q   |        7 |         236
>   (2 rows)
>
>   foo=# update pg_class set (relpages, reltuples) = (5505039, 1.13467e+08)
>         where relname = 'tmp_q';
>   UPDATE 1
>   foo=# update pg_class set (relpages, reltuples) = (7, 236)
>         where relname = 'tmp_r';
>   UPDATE 1
>
> but that didn't help. Somehow the planner outsmarts every such trick, so
> I can't get it to follow my right outer join plan where the big table is
> hashed. I am sure y'all know some way to force it.
>
> regards,
> -Gunther
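Following up on the pg_hint_plan idea above: if that extension is
installed, a hint along these lines might pin the shape you're after. An
untested sketch; as I understand pg_hint_plan, Leading((q r)) fixes q as
the outer and r as the inner (hashed) side, and HashJoin forces the join
method:

  /*+ Leading((q r)) HashJoin(q r) */
  SELECT *
    FROM tmp_q q
    RIGHT OUTER JOIN tmp_r r
      USING (documentinternalid, actinternalid);

The built-in enable_* settings can only disable whole node types, not pick
which input gets hashed, so a hint (or pg_hint_plan's Rows()
estimate-correction hint) seems like the closest fit for reproducing this.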