Tom, is there a different construct than the Similar To that would work? I know for certain that the first few characters could be different due to the nature of geohashes. So I may not be able to optimize the prefix aspect in some cases.
Best regards, Ty On Jun 20, 2017 10:19 AM, "Tom Lane" <t...@sss.pgh.pa.us> wrote: > Tieson Molly <tvm...@gmail.com> writes: > > I have an index I created on the prefix of a column: > > > create index location_geo_idx ON locations( substring(geohash, 0, 5)); > > > I currently use a similar to query, but I wanted to know if there is a > > faster way to query multiple value using this index than this? > > > select l.geohash from locations l where l.geohash similar to > '(dr7g|dr7e)%'; > > Well, you've got a couple of problems there. The most basic one is that > that index doesn't match that query at all. You need to arrange things > so that the lefthand side of the SIMILAR TO operator is exactly the > indexed value, not something that's related to it. (Yes, in principle > that index could be used to answer this query, but it would require a > great deal more intimate knowledge than the planner has about the > semantics of both substring() and SIMILAR TO.) IOW, you need to write > > select l.geohash from locations l > where substring(l.geohash, 0, 5) similar to '(dr7g|dr7e)%'; > > The other possible solution would be to just index the geohash strings > verbatim; unless they are quite long, that's what I'd recommend, usually. > > Secondly, if you're using a non-C locale, you're likely not getting an > indexscan plan anyway; check it with EXPLAIN. To get an indexed prefix > search out of a pattern match, the index has to use C sorting rules, > which you can force with a COLLATE or text_pattern_ops option if the > database's prevailing locale isn't C. > > Thirdly, if you experiment with EXPLAIN a little bit, you'll soon realize > that the planner is not great at extracting common prefix strings out of > OR'd pattern branches: > > regression=# create table loc (f1 text unique); > CREATE TABLE > regression=# explain select * from loc where f1 similar to '(dr7g|dr7e)%'; > QUERY PLAN > ------------------------------------------------------------------------- > Bitmap Heap Scan on loc (cost=4.22..14.37 rows=1 width=32) > Filter: (f1 ~ '^(?:(?:dr7g|dr7e).*)$'::text) > -> Bitmap Index Scan on loc_f1_key (cost=0.00..4.22 rows=7 width=0) > Index Cond: ((f1 >= 'd'::text) AND (f1 < 'e'::text)) > (4 rows) > > The useful part of this for speed purposes is the "Index Cond", and > you can see that it's only enforcing that the first character be "d". > I don't remember that code very well at the moment, but I'm a bit > surprised that it's even figured out that the "d" is common to both > branches. You can get a lot more traction if you factor the common > prefix manually: > > regression=# explain select * from loc where f1 similar to 'dr7(g|e)%'; > QUERY PLAN > ------------------------------------------------------------------------- > Bitmap Heap Scan on loc (cost=4.22..14.37 rows=1 width=32) > Filter: (f1 ~ '^(?:dr7(?:g|e).*)$'::text) > -> Bitmap Index Scan on loc_f1_key (cost=0.00..4.22 rows=7 width=0) > Index Cond: ((f1 >= 'dr7'::text) AND (f1 < 'dr8'::text)) > (4 rows) > > or maybe even > > regression=# explain select * from loc where f1 similar to 'dr7g%' or f1 > similar to 'dr7e%'; > QUERY PLAN > ------------------------------------------------------------ > ------------------- > Bitmap Heap Scan on loc (cost=8.45..19.04 rows=2 width=32) > Recheck Cond: ((f1 ~ '^(?:dr7g.*)$'::text) OR (f1 ~ > '^(?:dr7e.*)$'::text)) > Filter: ((f1 ~ '^(?:dr7g.*)$'::text) OR (f1 ~ '^(?:dr7e.*)$'::text)) > -> BitmapOr (cost=8.45..8.45 rows=14 width=0) > -> Bitmap Index Scan on loc_f1_key (cost=0.00..4.22 rows=7 > width=0) > Index Cond: ((f1 >= 'dr7g'::text) AND (f1 < 'dr7h'::text)) > -> Bitmap Index Scan on loc_f1_key (cost=0.00..4.22 rows=7 > width=0) > Index Cond: ((f1 >= 'dr7e'::text) AND (f1 < 'dr7f'::text)) > (8 rows) > > Whether this is worth the trouble depends a lot on your data distribution, > but any of them are probably better than the seqscan you're no doubt > getting right now. > > regards, tom lane >