The optimizer was a bit too clever. It used the same plan for the LEFT JOIN. But that put me on the right track. I tried a LATERAL join. But the optimizer saw through that too and used the same plan. So I tried a materialized CTE and that finally forced it to use a different plan. That made it run in ~70ms -- about 18x faster. Thanks!
explain analyze with r as materialized ( select * from matching_rules where id >= 0 and id < 60 ) select r.id, i.id from r join items i on i.name ~ r.name_matches ; QUERY PLAN ───────────────────────────────────────────────────────────────────────────────────────────────────────────────────── Nested Loop (cost=2.78..714.20 rows=230 width=8) (actual time=0.071..69.545 rows=702 loops=1) Join Filter: (i.name ~ r.name_matches) Rows Removed by Join Filter: 45298 CTE r -> Seq Scan on matching_rules (cost=0.00..2.78 rows=46 width=26) (actual time=0.007..0.047 rows=46 loops=1) Filter: ((id >= 0) AND (id < 60)) Rows Removed by Filter: 6 -> CTE Scan on r (cost=0.00..0.92 rows=46 width=36) (actual time=0.008..0.090 rows=46 loops=1) -> Materialize (cost=0.00..23.00 rows=1000 width=27) (actual time=0.000..0.081 rows=1000 loops=46) -> Seq Scan on items i (cost=0.00..18.00 rows=1000 width=27) (actual time=0.003..0.092 rows=1000 loops=1) Planning Time: 0.206 ms Execution Time: 69.633 ms On Wed, Aug 25, 2021 at 4:05 PM Justin Pryzby <pry...@telsasoft.com> wrote: > On Wed, Aug 25, 2021 at 11:47:43AM -0500, Jack Christensen wrote: > > I have items that need to be categorized by user defined matching rules. > > Trusted users can create rules that include regular expressions. I've > > reduced the problem to this example. > > > I use the following query to find matches: > > > > select r.id, i.id > > from items i > > join matching_rules r on i.name ~ r.name_matches; > > > > When there are few rules the query runs quickly. But as the number of > rules > > increases the runtime often increases at a greater than linear rate. > > Maybe it's because the REs are cached by RE_compile_and_cache(), but if you > loop over the REs in the inner loop, then the caching is ineffecive. > > Maybe you can force it to join with REs on the outer loop by writing it as: > | rules LEFT JOIN items WHERE rules.id IS NOT NULL, > ..to improve performance, or at least test that theory. > > -- > Justin >