On Mon, Nov 9, 2020 at 1:39 PM Tomas Vondra <tomas.von...@enterprisedb.com> wrote:
>
> While investigating the failures, I've tried increasing the values a
> lot, without observing any measurable increase in runtime. IIRC I've
> even used (10 * target_partlen) or something like that. That tells me
> it's not a very sensitive part of the code, so I'd suggest simply using
> something that we know is large enough to be safe.
Okay, then it's not worth being clever.

> For example, the largest bloom filter we can have is 32kB, i.e. 262kb,
> at which point the largest gap is less than 95 (per the gap table). And
> we may use up to BLOOM_MAX_NUM_PARTITIONS, so let's just use
> BLOOM_MAX_NUM_PARTITIONS * 100

Sure.

> FWIW I wonder if we should do something about bloom filters that we know
> can get larger than page size. In the example I used, we know that
> nbits=575104 is larger than page, so as the filter gets more full (and
> thus more random and less compressible) it won't possibly fit. Maybe we
> should reject that right away, instead of "delaying it" until later, on
> the basis that it's easier to fix at CREATE INDEX time (compared to when
> inserts/updates start failing at a random time).

Yeah, I'd be inclined to reject that right away.

> The problem with this is of course that if the index is multi-column,
> this may not be strict enough (i.e. each filter would fit independently,
> but the whole index row is too large). But it's probably better to do at
> least something, and maybe improve that later with some whole-row check.

A whole-row check would be nice, but I don't know how hard that would be. As a Devil's advocate proposal, how awful would it be to not allow multicolumn brin-bloom indexes?

--
John Naylor
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company