Re: [HACKERS] Refactoring of heapam code.

Anastasia Lubennikova Tue, 06 Sep 2016 08:11:15 -0700

06.09.2016 07:44, Pavan Deolasee:

On Mon, Aug 8, 2016 at 3:13 PM, Anastasia Lubennikova<[email protected] <mailto:[email protected]>>wrote:
    Thank you for the review, I'll fix these problems in final version.

    Posting the first message I intended to raise the discussion. Patches
    were attached mainly to illustrate the problem and to be more
    concrete.
I started looking at the first patch posted above, but it seems you'llrewrite these patches after discussion on the heapam refactoring ideasyou posted here https://wiki.postgresql.org/wiki/HeapamRefactoring<https://wiki.postgresql.org/wiki/HeapamRefactoring> concludes.


Thank you for the review.

Some comments anyways on the first patch:

1. It does not apply cleanly with git-apply - many white space errors


I'll fix all merge issues and attach it to the following message.

2. I don't understand the differencebetween PageGetItemHeapHeaderOnly() and PageGetItemHeap(). They seemto do exactly the same thing and can be used interchangeably.


The only difference between these two macros is that
PageGetItemHeapHeaderOnly() doesn't touch any key fields of a tuple,
it only checks header fields (see HeapTupleHeaderData). I divided it into
two separate functions, while I was working on page compression and
I wanted to avoid unnecessary decompression calls. These names are
just a kind of 'markers' to make the code reading and improving easier.

3. While I like the idea of using separate interfaces to getheap/index tuple from a page, may be they can internally use a commoninterface instead of duplicating what PageGetItem() does already.


I don't sure I get it right. Do you suggest to leave PageGetItem and write
PageGetItemHeap() and PageGetItemIndex() as wrappers on it?

It sounds reasonable while we have similar layout for both heap andindex pages.

In any case, it'll be easy to separate them when necessary.

4. Should we rename PageGetItemHeap() to PageGetHeapTuple() orsomething like that?


I don't feel like doing that, because HeapTuple is a different structure.
What we do get from page is a HeapTupleHeaderData structure
followed by user's data.

5. If we do this, we should probably have corresponding wrapperfunctions/macros for remaining callers of PageGetItem()

There are some calls of PageGetItem() left in BRIN and SP-gist indexes,
I can write simple wrappers for them.

I left them just because they were out of interest for my compressionprototype.

I also looked at the refactoring design doc. Looks like a fineapproach to me, but the API will need much more elaborate discussions.I am not sure if the interfaces as presented are enough to supporteverything that even heapam can do today.


What features of heapam do you think could be unsupportable in this API?
Maybe I've just missed them.

My point here is that heapam is too complicated for many simpler use-cases
read-only tables and append-only tables do not need many MVCC related tricks

like vacuum and complicated locking, tables with fixed len attributescan use

more optimal page layout. And so on.

I suggest refactoring, that will allow us to develop new heap-likeaccess methods.For the first version, they must have methods to "heapify" tuple i.eturn internalrepresentation of the data to regular HeapTuple, for example put somestubs intoMVCC fields. Of course this approach has its disadvantages, such asperformance issues.It definitely won't be enough to write column storage or to implementother great

data structures. But it allows us not to depend of the Executor's code.

And much more thoughts will be required to ensure we don't miss outthings that new primary access method may need.

Do you have any particular ideas of new access methods?

A few points regarding how the proposed API maps to heapam:

- How do we support parallel scans on heap?


As far as I understand, parallel heap scan requires one more API function
heap_beginscan_parallel(). It uses the same API function - heap_getnext(),

but in addition to ordinary seq scan it selects page to read in asynchronized manner.

- Does the primary AM need to support locking of rows?


That's a good question. I'd say it should be an option.

- Would there be separate API to interpret the PAM tuple itself?(something that htup_details.h does today). It seems natural thatevery PAM will have its own interpretation of tuple structure and wewould like to hide that inside PAM implementation.


As I already wrote, for the first prototype, I'd use function to "heapify"
tuple and don't care about executor issues at all. More detailed discussion
of this question you can find in another thread [1].

- There are many upper modules that need access to system attributes(xmin, xmax for starters). How do you plan to handle that? You thinkwe can provide enough abstraction so that the callers don't need toknow the tuple structures of individual PAM?


To be honest, I didn't thought about it.
Do you mean external modules or upper levels of abstraction in Postgres?

I think that backward compatibility support will be pretty complicated.Could you provide any examples?

I don't know what to do with the CF entry itself. I could change thestatus to "waiting on author" but then the design draft may not getenough attention. So I'm leaving it in the current state for others tolook at.

I'll try to update patches as soon as possible. Little code cleanup willbe useful

regardless of final refactoring design.


[1] http://postgresql.nabble.com/Pluggable-storage-td5916322.html

--
Anastasia Lubennikova
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company

Re: [HACKERS] Refactoring of heapam code.

Reply via email to