Hi, I would like share some thoughts about 'heap all consistent' part and one of the open questions:
The idea behind 'heap all consistent' is to use heap data to validate the index. BRIN doesn't store heap tuples, so there is no straightforward way to check if every tuple was indexed or not. We have range data, so we need to do something with every heap tuple and corresponding range. Something that will tell us if the range data covers the heap tuple or not. What options I see here: 1) We can use the addValue() function. It returns FALSE if range data was not changed (in other words range data already covers heap tuple data that we pass to the function). It's very easy to do, we can use heap tuples directly. But the problem I see here is that addValue() can return TRUE even if heap tuple data have been already covered by the range, but range data itself changed for some reason (some new algorithm were applied for instance). So I think we can have some false positives that we can do nothing about. 2) We can use the consistent() function. It requires ScanKey and returns true if the range data satisfies ScanKey's condition. So we need to convert every heap tuple into ScanKey somehow. This approach is implemented now in the patch, so I tried to describe all details about heap tuple to ScanKey conversion in the comment: /* * Prepare ScanKey for index attribute. * * ConsistentFn requires ScanKey, so we need to generate ScanKey for every * attribute somehow. We want ScanKey that would result in TRUE for every heap * tuple within the range when we use its indexed value as sk_argument. * To generate such a ScanKey we need to define the right operand type and the strategy number. * Right operand type is a type of data that index is built on, so it's 'opcintype'. * There is no strategy number that we can always use, * because every opclass defines its own set of operators it supports and strategy number * for the same operator can differ from opclass to opclass. * So to get strategy number we look up an operator that gives us desired behavior * and which both operand types are 'opcintype' and then retrieve the strategy number for it. * Most of the time we can use '='. We let user define operator name in case opclass doesn't * support '=' operator. Also, if such operator doesn't exist, we can't proceed with the check. * * Generated once, and will be reused for all heap tuples. * Argument field will be filled for every heap tuple before * consistent function invocation, so leave it NULL for a while. * */ With this approach function brin_check() has optional parameter 'consistent_operator_names' that it seems to me could be very confusing for users. In general I think this is the most complex approach both in terms of use and implementation. 3) The approach that seems to me the most clear and straightforward: to add new optional function to BRIN opclass API. The function that would say if passed value is covered with the current range data. it's exactly what we want to know, so we can use heap data directly here. Something like that: bool withinRange(BrinDesc *bdesc, BrinValues *column, Datum val, bool isnull) It could be an optional function that will be implemented for all core BRIN opclasses. So if somebody wants to use it for some custom opclass they will need to implement it too, but it's not required. I understand that adding something to the index opclass API requires very strong arguments. So the argument here is that it will let to do brin check very robust (without possible false positives as in the first approach) and easy to use (no additional parameters in the check function). Also, the withinRange() function could be written in such a way that it would be more efficient for our task than addValue() or consistent(). I'd be glad to hear your thoughts! Best regards, Arseniy Mukhin