On Wed, Nov 13, 2019 at 5:47 AM Morris de Oryx <morrisdeo...@gmail.com> wrote:
> Disclaimer: Out over my skis again.
>
> From what you say here, and over on SO, it sounds like you've got two
> problems:
>
> * Matching on *huge* numbers of records because of common tags.
>
> * A dynamic collection of tags, as they're customer driven/configured.
>
> An "ideal" solution might look like a bit-index for each tag+tuple, but
> Postgres does not have such a structure. The closest I've seen are Bloom
> filter based indexes. That's likely not going to work here, as you don't
> know the collection of tags at any one time. If, however, you create your
> own frequency count estimates for tags, you may well find that there are a
> small number of common tags and a large number of rare tags. That would be
> good to find out. If you do have some super-common (non-selective) tags,
> then perhaps a Bloom index based on that collection could be effective. Or
> expression indexes on the very common tags. In your SaaS setup, you might
> need counts/indexes tied to some kind of customer/tenancy distinction ID,
> understood. But, for simplicity, I'm just describing a single set of
> frequency counts, etc.
>
> Here's a recent article on Bloom filter based indexes in Postgres that
> looks decent:
> https://www.percona.com/blog/2019/06/14/bloom-indexes-in-postgresql/

One other question might be whether you are always querying for a specific tag or small set of tags, or if your queries are for relatively random tags. I.e., if you are always looking for the same 2 or 3 tags, then maybe you could use a functional index, or trigger-populate a new column on insert/update that indicates whether those tags are present.

It is possible that you want a Graph model for this data instead of a Relational model. I.e., if you are finding a bunch of users with common features, you may find traversing a graph (such as Neo4j, or, if you _have_ to stay with a PG backend, something like Cayley.io) to be much more efficient and flexible.
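For the archives, here is a rough sketch of the frequency-count and expression-index ideas. The table and column names (an "assignment" table with a text[] "tags" column, and a 'hot_tag' value) are hypothetical, since the original poster's schema wasn't shown; adjust to taste:

```sql
-- Assumed schema (hypothetical):
--   CREATE TABLE assignment (id bigint PRIMARY KEY, tags text[]);

-- 1. Rough per-tag frequency counts, to see whether a handful of
--    common tags dominate while the rest are rare:
SELECT tag, count(*) AS freq
FROM assignment, unnest(tags) AS tag
GROUP BY tag
ORDER BY freq DESC
LIMIT 50;

-- 2. A partial index targeting one known hot tag. Queries that
--    include the same "tags @> ARRAY['hot_tag']" predicate can use it:
CREATE INDEX assignment_hot_tag_idx
    ON assignment (id)
    WHERE tags @> ARRAY['hot_tag'];
```

The partial-index route only pays off if the tag list you care about is stable, which loops back to the question above about whether the queried tags are fixed or random.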
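And a sketch of the trigger-populated column variant, again against the same hypothetical "assignment" table and 'hot_tag' value. This denormalizes presence of one fixed tag into a plain boolean you can index and filter on cheaply:

```sql
-- Hypothetical: maintain a boolean flag for one fixed, known tag.
ALTER TABLE assignment ADD COLUMN has_hot_tag boolean;

CREATE FUNCTION set_has_hot_tag() RETURNS trigger AS $$
BEGIN
    -- Recompute the flag whenever the row is written.
    NEW.has_hot_tag := NEW.tags @> ARRAY['hot_tag'];
    RETURN NEW;
END;
$$ LANGUAGE plpgsql;

CREATE TRIGGER assignment_hot_tag_trg
    BEFORE INSERT OR UPDATE ON assignment
    FOR EACH ROW EXECUTE FUNCTION set_has_hot_tag();
    -- (On Postgres 10 and earlier, use EXECUTE PROCEDURE.)
```

Existing rows would need a one-time backfill (UPDATE assignment SET tags = tags;), and each additional "special" tag means another column and trigger, so this only scales to a few fixed tags.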