Re: [HACKERS] Resjunk sort columns, Heikki's index-only quals patch, and bug #5000

Robert Haas Sun, 20 Sep 2009 18:25:38 -0700

On Mon, Sep 14, 2009 at 5:41 AM, Heikki Linnakangas
<[email protected]> wrote:
> Heikki Linnakangas wrote:
>> Tom Lane wrote:
>>> It strikes me that in the cases where it wouldn't be necessary to
>>> compute junk sort-key columns, it would be because we were scanning an
>>> index that includes those values.  So if the plan were set up to pull
>>> those values from the index and return them, then we'd not have to add
>>> this extra complexity to grouping_planner --- the argument that it's not
>>> worth it to get rid of the junk columns comes back into play.  Moreover,
>>> such an ability would also mean that if the user *does* ask for the
>>> sort column value as output (ie it's not resjunk), we can still satisfy
>>> the query from the index without recomputing the expensive function.
>>>
>>> So this is where we come to the connection to Heikki's index-only-quals
>>> patch.  As submitted, that code is only able to use an index column in
>>> a scan qual, it's not able to return it as part of the scan result.
>>> This example makes it clear that that definition is missing a large
>>> part of the potential benefit of an index value extraction capability.
>>>
>>> To be able to do anything along that line would require some more work
>>> in the executor and a *lot* more work in the planner, and I'm honestly
>>> not sure what the planner part of it would look like.
>>
>> I think we should separate the Heap Fetch operation from the IndexScan.
>
> I've been hacking on that approach. It's quite unfinished, but before I
> spend any more time on it, I'd like to get some feedback on the overall
> design.
>
> The attached patch can create plans where quals are checked and joins
> are performed using values from indexes only, and the heap tuples are
> fetched only for matching rows. Passes regression tests, but code is
> quite ugly at points. Cost estimation is bogus. The patch builds on the
> indexam-api-changes patch I posted earlier, which is also attached. I
> haven't yet done the changes to that patch that were discussed.
>
> I haven't done any performance testing. The overhead of an extra
> executor node for each index scan could slow down simple queries, we
> might need to compensate that somehow, maybe reintroduce a fastpath
> combined IndexScan+HeapFetch node.  I'm also afraid the extra work I've
> pushed to the stage where Paths are constructed could slow down planning
> quite a bit if you have a lot of indexes.
>
>
> Path nodes now carry a targetlist. That's because when you have a path like:
>
>  HeapFetch
>   -> Join
>     ...
>
> You won't have all the columns of the join rel available at the join
> node yet, because they will be fetched in the HeapFetch node above. The
> targetlist in Path nodes reflect that, and the targetlist of the final
> Plan nodes are created from the targetlists in the Path nodes instead of
> the ones in RelOptInfos.
>
> Per earlier discussion, I changed the way index tuple fetching works in
> B-tree, so that it can now be relied on. Matching index tuples are
> copied to backend-local memory when the scan steps on a page.
>
> Var nodes that refer to index columns (indexquals and the new index-only
> filters) now have a new field, varindexno, set. While we could've
> continued with the old representation, now that we have more expressions
> that refer to index vars instead of heap vars, this makes debugging easier.


I've taken a quick look through the rest of this patch and there's
obviously some hackery that needs to be cleaned up before this is
committable, but you knew that already.  I don't see a lot to object
to at a design level, but OTOH I'm not terribly familiar with the
executor, which is where most of the non-planner changes are to be
found, so I can't really offer an opinion with any certainty.

One other random thought: I notice that you copied a (completely
inscrutable, to me) comment beginning with "Check if we are evaluating
PlanQual..." from BitmapHeapNext to HeapFetchNext, so maybe this
hasn't escaped you either - but couldn't the heap fetches for a bitmap
index scan be postponed until higher up in the join tree just as
you're proposing to do for a regular index scan?  It even seems
conceivable that it could be right to do a regular index scan (on,
say, the inner side of a nested loop), and then accumulate all the
results and do a single bitmap heap scan on the result.  This might be
too marginal a case to worry about... especially for version one...
just throwing it out there.

I wonder if HeapFetchNext should be called just HeapNext for parity
with BitmapHeapNext, and the Heap Fetch node called a Heap Scan for
parity with Bitmap Heap Scan.

Since you previously stated that you were going to put this patch
aside to work on HS and SR[1], I'm going to move this to Returned with
Feedback for now.  Hope that's OK, and that the feedback is sufficient
and useful.

...Robert

[1] http://archives.postgresql.org/message-id/[email protected]

-- 
Sent via pgsql-hackers mailing list ([email protected])
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Resjunk sort columns, Heikki's index-only quals patch, and bug #5000

Reply via email to