Re: PostgreSQL 12.3 slow index scan chosen

2020-06-19 Thread Tom Lane
[ please keep the mailing list cc'd ] Kenneth Marshall writes: > Here are the stats for articles.id: > 4,7,9,11,13,14,16,17,18,19,20,21,22,23, > 24,25,26,32,33,34,36,40,41,42,43,44,49,50,51,52,53,54,55,56,57,58,59,60,61,62,63,99,100,101,102,106,107,108,109,113,1 > > 14,115,116,117,118,119,120,

Re: PostgreSQL 12.3 slow index scan chosen

2020-06-19 Thread Kenneth Marshall
On Fri, Jun 19, 2020 at 05:25:33PM -0500, Kenneth Marshall wrote: > On Fri, Jun 19, 2020 at 06:10:34PM -0400, Tom Lane wrote: > > > max(objectcustomfieldvalues.objectid) = 28108423 and here is the > > > histogram for that column: > > > > ... 3304313,3693956,27667772} > > > > Hmm, does seem like

Re: PostgreSQL 12.3 slow index scan chosen

2020-06-19 Thread Tom Lane
Kenneth Marshall writes: > On Fri, Jun 19, 2020 at 06:10:34PM -0400, Tom Lane wrote: >> Hmm, does seem like you have some outlier keys. Are any of the keys in >> the column you're trying to join to larger than 27667772? > The only values above 27667772? for objectid are: Sorry, I meant the othe

Re: PostgreSQL 12.3 slow index scan chosen

2020-06-19 Thread Kenneth Marshall
On Fri, Jun 19, 2020 at 06:10:34PM -0400, Tom Lane wrote: > > max(objectcustomfieldvalues.objectid) = 28108423 and here is the > > histogram for that column: > > ... 3304313,3693956,27667772} > > Hmm, does seem like you have some outlier keys. Are any of the keys in > the column you're trying t

Re: PostgreSQL 12.3 slow index scan chosen

2020-06-19 Thread Tom Lane
Kenneth Marshall writes: > On Fri, Jun 19, 2020 at 04:59:15PM -0400, Tom Lane wrote: >> What's the actual maximum value of objectcustomfieldvalues.objectid, >> and how does that compare to the endpoint value in the pg_stats >> histogram for that column? If you've got one outlier in the table, >>

Re: PostgreSQL 12.3 slow index scan chosen

2020-06-19 Thread Kenneth Marshall
On Fri, Jun 19, 2020 at 04:59:15PM -0400, Tom Lane wrote: > > > Thank you for the information and suggestion. I tried bumping the > > statistics for the > > objectcustomfieldvalues.objectid column to 2k, 5k and 10k followed by an > > analyze and > > the query plan stayed the same. I also analyze

Re: PostgreSQL 12.3 slow index scan chosen

2020-06-19 Thread Tom Lane
Kenneth Marshall writes: > On Fri, Jun 19, 2020 at 04:11:10PM -0400, Tom Lane wrote: >> The reason for such an estimation error usually is that the maximum >> join key values recorded in pg_stats are off: the join side that is >> going to be exhausted is the one with the smaller max join key. >> "

Re: PostgreSQL 12.3 slow index scan chosen

2020-06-19 Thread Kenneth Marshall
On Fri, Jun 19, 2020 at 04:11:10PM -0400, Tom Lane wrote: > > It looks like the planner is being too optimistic about how quickly the > mergejoin will end: > > > -> Merge Join (cost=0.71..892.64 rows=1 width=137) (actual > > time=21165.453..21165.453 rows=0 loops=1) > >

Re: Unclamped row estimates whith OR-ed subplans

2020-06-19 Thread Benjamin Coutu
> While you're waiting, you might think about recasting the query to > avoid the OR. Perhaps you could do a UNION of two scans of the > transactions table? Thanks for the hint, I am well aware of the workaround for OR via UNION. I am not trying to improve this query per se as it is the small roo

Re: PostgreSQL 12.3 slow index scan chosen

2020-06-19 Thread Tom Lane
Kenneth Marshall writes: > I was looking at a slow query in our CMDB that using postgresql-12.3 as its > backend. I since I am using the pg_trgm extension elsewhere I decided to give > it a try. To my surprise, the query plan did not change. But when I disabled > the index scan I got the much, muc

PostgreSQL 12.3 slow index scan chosen

2020-06-19 Thread Kenneth Marshall
Hi PostgreSQL users, I was looking at a slow query in our CMDB that using postgresql-12.3 as its backend. I since I am using the pg_trgm extension elsewhere I decided to give it a try. To my surprise, the query plan did not change. But when I disabled the index scan I got the much, much faster sca

Re: Unclamped row estimates whith OR-ed subplans

2020-06-19 Thread Michael Lewis
> > While you're waiting, you might think about recasting the query to > avoid the OR. Perhaps you could do a UNION of two scans of the > transactions table? > Minor note- use UNION ALL to avoid the dedupe work if you already know those will be distinct sets, or having duplicates is fine.

Re: Unclamped row estimates whith OR-ed subplans

2020-06-19 Thread Tom Lane
"Benjamin Coutu" writes: > I don't want to waste your time but maybe there is room for improvement as > both "account" and "contract" are highly distinct and the individual subplan > estimates are quite accurate: Yeah, as I said, the estimates you're getting for the OR'd subplans are pretty stu

Re: Unclamped row estimates whith OR-ed subplans

2020-06-19 Thread Benjamin Coutu
> No. The subplan estimates are for the number of rows produced by one > execution of the subplan, ie the numbers of "accounts" or "contracts" > rows that match those inner WHERE conditions. This has very little > a-priori relationship to the number of "transactions" rows that will > satisfy the

Re: Unclamped row estimates whith OR-ed subplans

2020-06-19 Thread David G. Johnston
On Friday, June 19, 2020, Laurenz Albe wrote: > > > I am absolutely aware that those are hashed sub plans below a seqscan > and that Postgres therefore has to scan all tuples of the table. But the > problem is that upper nodes (which are excluded from > > this example for simplicity) think they w

Re: Unclamped row estimates whith OR-ed subplans

2020-06-19 Thread Tom Lane
"Benjamin Coutu" writes: > please consider the following SQL query: > SELECT * FROM "transactions" WHERE > "account" IN (SELECT "ID" FROM "accounts" WHERE "name" ~~* '%test%') OR > "contract" IN (SELECT "ID" FROM "contracts" WHERE "name" ~~* '%test%') > This yields the following pla

Re: Unclamped row estimates whith OR-ed subplans

2020-06-19 Thread Laurenz Albe
On Fri, 2020-06-19 at 17:12 +0200, Benjamin Coutu wrote: > please consider the following SQL query: > > SELECT * FROM "transactions" WHERE > "account" IN (SELECT "ID" FROM "accounts" WHERE "name" ~~* '%test%') > OR > "contract" IN (SELECT "ID" FROM "contracts" WHERE "name" ~~* '%

Unclamped row estimates whith OR-ed subplans

2020-06-19 Thread Benjamin Coutu
Hello, please consider the following SQL query: SELECT * FROM "transactions" WHERE "account" IN (SELECT "ID" FROM "accounts" WHERE "name" ~~* '%test%') OR "contract" IN (SELECT "ID" FROM "contracts" WHERE "name" ~~* '%test%') This yields the following plan on Postgres 11: Seq S