06.09.2016 07:44, Pavan Deolasee:
On Mon, Aug 8, 2016 at 3:13 PM, Anastasia Lubennikova
Thank you for the review, I'll fix these problems in final version.
Posting the first message I intended to raise the discussion. Patches
were attached mainly to illustrate the problem and to be more
I started looking at the first patch posted above, but it seems you'll
rewrite these patches after discussion on the heapam refactoring ideas
you posted here https://wiki.postgresql.org/wiki/HeapamRefactoring
Thank you for the review.
Some comments anyways on the first patch:
1. It does not apply cleanly with git-apply - many white space errors
I'll fix all merge issues and attach it to the following message.
2. I don't understand the difference
between PageGetItemHeapHeaderOnly() and PageGetItemHeap(). They seem
to do exactly the same thing and can be used interchangeably.
The only difference between these two macros is that
PageGetItemHeapHeaderOnly() doesn't touch any key fields of a tuple,
it only checks header fields (see HeapTupleHeaderData). I divided it into
two separate functions, while I was working on page compression and
I wanted to avoid unnecessary decompression calls. These names are
just a kind of 'markers' to make the code reading and improving easier.
3. While I like the idea of using separate interfaces to get
heap/index tuple from a page, may be they can internally use a common
interface instead of duplicating what PageGetItem() does already.
I don't sure I get it right. Do you suggest to leave PageGetItem and write
PageGetItemHeap() and PageGetItemIndex() as wrappers on it?
It sounds reasonable while we have similar layout for both heap and
In any case, it'll be easy to separate them when necessary.
4. Should we rename PageGetItemHeap() to PageGetHeapTuple() or
something like that?
I don't feel like doing that, because HeapTuple is a different structure.
What we do get from page is a HeapTupleHeaderData structure
followed by user's data.
5. If we do this, we should probably have corresponding wrapper
functions/macros for remaining callers of PageGetItem()
There are some calls of PageGetItem() left in BRIN and SP-gist indexes,
I can write simple wrappers for them.
I left them just because they were out of interest for my compression
I also looked at the refactoring design doc. Looks like a fine
approach to me, but the API will need much more elaborate discussions.
I am not sure if the interfaces as presented are enough to support
everything that even heapam can do today.
What features of heapam do you think could be unsupportable in this API?
Maybe I've just missed them.
My point here is that heapam is too complicated for many simpler use-cases
read-only tables and append-only tables do not need many MVCC related tricks
like vacuum and complicated locking, tables with fixed len attributes
more optimal page layout. And so on.
I suggest refactoring, that will allow us to develop new heap-like
For the first version, they must have methods to "heapify" tuple i.e
representation of the data to regular HeapTuple, for example put some
MVCC fields. Of course this approach has its disadvantages, such as
It definitely won't be enough to write column storage or to implement
data structures. But it allows us not to depend of the Executor's code.
And much more thoughts will be required to ensure we don't miss out
things that new primary access method may need.
Do you have any particular ideas of new access methods?
A few points regarding how the proposed API maps to heapam:
- How do we support parallel scans on heap?
As far as I understand, parallel heap scan requires one more API function
heap_beginscan_parallel(). It uses the same API function - heap_getnext(),
but in addition to ordinary seq scan it selects page to read in a
- Does the primary AM need to support locking of rows?
That's a good question. I'd say it should be an option.
- Would there be separate API to interpret the PAM tuple itself?
(something that htup_details.h does today). It seems natural that
every PAM will have its own interpretation of tuple structure and we
would like to hide that inside PAM implementation.
As I already wrote, for the first prototype, I'd use function to "heapify"
tuple and don't care about executor issues at all. More detailed discussion
of this question you can find in another thread .
- There are many upper modules that need access to system attributes
(xmin, xmax for starters). How do you plan to handle that? You think
we can provide enough abstraction so that the callers don't need to
know the tuple structures of individual PAM?
To be honest, I didn't thought about it.
Do you mean external modules or upper levels of abstraction in Postgres?
I think that backward compatibility support will be pretty complicated.
Could you provide any examples?
I don't know what to do with the CF entry itself. I could change the
status to "waiting on author" but then the design draft may not get
enough attention. So I'm leaving it in the current state for others to
I'll try to update patches as soon as possible. Little code cleanup will
regardless of final refactoring design.
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company