Hi, > I'm confused, what exactly is the benefit of this? What extension > performs a direct table scan bypassing the executor, searching for NULLs > or not-NULLs?
Basically any extension that accesses the tables without SPI in order to avoid parsing and planning overhead for relatively simple cases. One can specify *several* ScanKeys for a single scan which will be an equivalent of WHERE condition(a) AND b IS NOT NULL /* AND ... */; > If heapam can check for NULL/not-NULL more efficiently than the code > that calls it [...] This is done not for efficiency but rather for convenience. Additionally, practice shows that for an extension author it's very easy to miss a comment in skey.h: """ * SK_SEARCHARRAY, SK_SEARCHNULL and SK_SEARCHNOTNULL are supported only * for index scans, not heap scans; """ ... which results in many hours of debugging. The current interface is misleading and counterintuitive. I did my best in order to add as few new assembly instructions as possible, and only one extra if/else branching. I don't expect any measurable performance difference since the bottleneck for SeqScans is unlikely to be CPU in the affected piece of code but rather disk/locks/network/etc. On top of that the scenario when somebody is really worried about the performance AND is using seqscans (not index scans) AND this particular seqscan is a bottleneck (not JOINs, etc) seems rare, to me at least. > For tableam extensions, which may or may not support checking for NULLs, > we need to add an 'amsearchnulls' field to the table AM API. This will result in an unnecessary complication of the code and expensive extra checks that for the default heapam will always return true. I would argue that what we actually want is to force any TAM to support checking for NULLs. At least until somebody working on a real TAM will complain about this limitation. > But then let's also modify the caller in nodeSeqScan.c to actually make use > of it. That could actually be a good point. If memory serves I noticed that WHERE ... IS NULL queries don't even hit HeapKeyTest() and I was curious where the check for NULLs is actually made. As I understand, SeqNext() in nodeSeqscan.c simply iterates over all the tuples it can find and pushes them to the parent node. We could get a slightly better performance for certain queries if SeqNext() did the check internally. Unfortunately I'm not very experienced with plan nodes in order to go down this rabbit hole straight away. I suggest we make one change at a time and keep the patchset small as it was previously requested by many people on several occasions (the 64-bit XIDs story, etc). I will be happy to propose a follow-up patch accompanied by the benchmarks if and when we reach the consensus on this patch. -- Best regards, Aleksander Alekseev