Re: [HACKERS] index-only count(*) for indexes supporting bitmap scans

Alexander Kuzmenkov Wed, 12 Apr 2017 07:02:51 -0700

On 12.04.2017 15:04, Tom Lane wrote:

Andrew Gierth <[email protected]> writes:

"Alexander" == Alexander Kuzmenkov <[email protected]> writes:
  Alexander> Structurally, the patch consists of two major parts: a
  Alexander> specialized executor node
Why?
It strikes me that the significant fact here is not that we're doing
count(*), but that we don't need any columns from the bitmap heap scan
result.  Rather than creating a whole new node, can't the existing
bitmap heapscan be taught to skip fetching the actual table page in
cases where it's all-visible, not lossy, and no columns are needed?

+1 ... while I hadn't actually looked at the code, it seemed to me that
anything like the optimization-as-described would be impossibly klugy
from the planner's standpoint.  Your formulation sounds lots nicer.


Detecting that no columns are needed in the executor might be a bit tricky
because of the planner's habit of inserting a "physical tlist" to avoid a
projection step.  (See also nearby discussion about custom scan planning.)
But we could fix that.  I think one rule that would make sense is to
just disable the physical-tlist substitution if the relation's targetlist
is empty --- it wouldn't be buying much in such a case anyhow.  Then the
runtime tlist for the scan node would also be empty, and away you go.

When making an early prototype of this optimization, I did what you aredescribing with the bitmap heap scan executor node. In an internalreview, it was said that the bitmap heap scan node is alreadycomplicated enough and shouldn't have more logic added to it, so Irewrote it a separate node. To me, your approach looks good too, so ifthe community is generally in favor of this, I could rewrite theexecutor as such.

With planner, the changes are more complex. Two things must be donethere. First, when the tlist is empty, we must use a different costfunction for bitmap heap scan, because the heap access pattern isdifferent. Second, choose_bitmap_and() must use a different algorithmfor choosing the right combination of paths. A standard algorithmchooses the combination based on cost. For count(*) purposes, thedecisive factor is that the path has to check all the restrictions, orelse we'll need heap access to recheck some of them, which defeats thepurpose of having this optimization. The planner code that builds andcosts the index path is fairly complex, and integrating this additionalbehavior into it didn't look good to me. Instead, I created aspecialized path node and isolated the logic that handles it. Theresulting implementation adds several functions, but it is mostlyself-contained and has a single entry point in grouping_planner(). Ithandles the specific case of bitmap count plans and doesn't complicatethe existing code any further.

The planner part is to some extent independent of whether we use aseparate executor node or not. If we choose not to, the same count(*)optimization code I proposed could create plain bitmap heap scan paths.


--
Alexander Kuzmenkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company



--
Sent via pgsql-hackers mailing list ([email protected])
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] index-only count(*) for indexes supporting bitmap scans

Reply via email to