On Wed, Nov 13, 2019 at 5:47 AM Morris de Oryx <morrisdeo...@gmail.com> wrote:
> Disclaimer: Out over my skis again.
>
> From what you say here, and over on SO, it sounds like you've got two
> problems:
>
> * Matching on *huge* numbers of records because of common tags.
>
> * A dynamic collection of tags, as they're customer driven/configured.
>
> An "ideal" solution might look like a bit-index for each tag+tuple, but
> Postgres does not have such a structure. The closest I've seen are Bloom
> filter based indexes. That's likely not going to work here, as you don't
> know the collection of tags at any one time. If, however, you create your
> own frequency count estimates for tags, you may well find that there are a
> small number of common tags and a large number of rare tags. That would be
> good to find out. If you do have some super-common (non-selective) tags,
> then perhaps a Bloom index based on that collection could be effective. Or
> expression indexes on the very common tags. In your SaaS setup, you might
> need counts/indexes tied to some kind of customer/tenancy distinction ID,
> understood. But, for simplicity, I'm just describing a single set of
> frequency counts, etc.
>
> Here's a recent article on Bloom filter based indexes in Postgres that
> looks decent:
> https://www.percona.com/blog/2019/06/14/bloom-indexes-in-postgresql/

One other question might be whether you are always querying for a specific tag or small set of tags, or if your queries are for relatively random tags. I.e., if you are always looking for the same 2 or 3 tags, then maybe you could use a functional index, or trigger-populate a new column on insert/update that indicates whether those tags are present.

It is possible that you want a Graph model for this data instead of a Relational model. I.e., if you are finding a bunch of users with common features, you may find traversing a graph (such as Neo4j, or, if you _have_ to stay with a PG backend, something like Cayley.io) to be much more efficient and flexible.
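For the archives, here is a rough sketch of the frequency-count and expression-index ideas. The table and column names (an "assignment" table with a text[] "tags" column, and a 'hot_tag' value) are hypothetical, since the original poster's schema wasn't shown; adjust to taste:

```sql
-- Assumed schema (hypothetical):
--   CREATE TABLE assignment (id bigint PRIMARY KEY, tags text[]);

-- 1. Rough per-tag frequency counts, to see whether a handful of
--    common tags dominate while the rest are rare:
SELECT tag, count(*) AS freq
FROM assignment, unnest(tags) AS tag
GROUP BY tag
ORDER BY freq DESC
LIMIT 50;

-- 2. A partial index targeting one known hot tag. Queries that
--    include the same "tags @> ARRAY['hot_tag']" predicate can use it:
CREATE INDEX assignment_hot_tag_idx
    ON assignment (id)
    WHERE tags @> ARRAY['hot_tag'];
```

The partial-index route only pays off if the tag list you care about is stable, which loops back to the question above about whether the queried tags are fixed or random.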
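And a sketch of the trigger-populated column variant, again against the same hypothetical "assignment" table and 'hot_tag' value. This denormalizes presence of one fixed tag into a plain boolean you can index and filter on cheaply:

```sql
-- Hypothetical: maintain a boolean flag for one fixed, known tag.
ALTER TABLE assignment ADD COLUMN has_hot_tag boolean;

CREATE FUNCTION set_has_hot_tag() RETURNS trigger AS $$
BEGIN
    -- Recompute the flag whenever the row is written.
    NEW.has_hot_tag := NEW.tags @> ARRAY['hot_tag'];
    RETURN NEW;
END;
$$ LANGUAGE plpgsql;

CREATE TRIGGER assignment_hot_tag_trg
    BEFORE INSERT OR UPDATE ON assignment
    FOR EACH ROW EXECUTE FUNCTION set_has_hot_tag();
    -- (On Postgres 10 and earlier, use EXECUTE PROCEDURE.)
```

Existing rows would need a one-time backfill (UPDATE assignment SET tags = tags;), and each additional "special" tag means another column and trigger, so this only scales to a few fixed tags.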