Re: [HACKERS] Refactoring of heapam code.

2016-09-26 Thread Pavan Deolasee
On Tue, Sep 6, 2016 at 8:39 PM, Anastasia Lubennikova <
a.lubennik...@postgrespro.ru> wrote:

> 06.09.2016 07:44, Pavan Deolasee:
>
>
> I don't know what to do with the CF entry itself. I could change the
> status to "waiting on author" but then the design draft may not get enough
> attention. So I'm leaving it in the current state for others to look at.
>
>
> I'll try to update patches as soon as possible. Little code cleanup will
> be useful
> regardless of final refactoring design.
>
>
I'm marking the patch as "Returned with Feedback". I assume you'll submit
fresh set of patches for the next CF anyways.

Thanks,
Pavan

-- 
 Pavan Deolasee   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


Re: [HACKERS] Refactoring of heapam code.

2016-09-13 Thread Michael Paquier
On Mon, Sep 12, 2016 at 7:12 PM, Pavan Deolasee
 wrote:
> I was thinking about locking, bulk reading (like page-mode API) etc. While
> you've added an API for vacuuming, we would probably also need an API to
> collect dead tuples, pruning etc (not sure if that can be combined with
> vacuum). Of course, these are probably very specific to current
> implementation of heap/MVCC and not all storages will need that.

Likely not, but it is likely that people would like to be able to get
that if some custom method needs it. I think that what would be a good
first step here is to brainstorm a potential set of custom AMs for
tables, get the smallest, meaningful, one from the set of ideas
proposed, and define a basic set of relation/storage/table AM routines
that we can come up to support it. And then build up a prototype using
it. We have been talking a lot about refactoring and reordering stuff,
which is something that would be good in the long term to make things
more generic with heap, but getting an idea of "we-want-that" may
prove to help in designing a minimum set if features that we'd like to
have here.

This would likely need anyway to extend CREATE TABLE to be able to
pass a custom AM for a given relation and store in pg_catalog, but
that's a detail in the whole picture...
-- 
Michael


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Refactoring of heapam code.

2016-09-12 Thread Pavan Deolasee
On Tue, Sep 6, 2016 at 8:39 PM, Anastasia Lubennikova <
a.lubennik...@postgrespro.ru> wrote:

> 06.09.2016 07:44, Pavan Deolasee:
>
> 2. I don't understand the difference between PageGetItemHeapHeaderOnly()
> and PageGetItemHeap(). They seem to do exactly the same thing and can be
> used interchangeably.
>
>
> The only difference between these two macros is that
> PageGetItemHeapHeaderOnly() doesn't touch any key fields of a tuple,
> it only checks header fields (see HeapTupleHeaderData). I divided it into
> two separate functions, while I was working on page compression and
> I wanted to avoid unnecessary decompression calls. These names are
> just a kind of 'markers' to make the code reading and improving easier.
>
>
Ok. I still don't get it, but that's probably because I haven't seen a real
use case of that. Right now, both macros look exactly the same.


> 3. While I like the idea of using separate interfaces to get heap/index
> tuple from a page, may be they can internally use a common interface
> instead of duplicating what PageGetItem() does already.
>
>
> I don't sure I get it right. Do you suggest to leave PageGetItem and write
> PageGetItemHeap() and PageGetItemIndex() as wrappers on it?
> It sounds reasonable while we have similar layout for both heap and index
> pages.
> In any case, it'll be easy to separate them when necessary.
>
>
Yes, that's what I was thinking.


> 4. Should we rename PageGetItemHeap() to PageGetHeapTuple() or something
> like that?
>
>
> I don't feel like doing that, because HeapTuple is a different structure.
> What we do get from page is a HeapTupleHeaderData structure
> followed by user's data.
>

Ok, makes sense.


>
>
> I also looked at the refactoring design doc. Looks like a fine approach to
> me, but the API will need much more elaborate discussions. I am not sure if
> the interfaces as presented are enough to support everything that even
> heapam can do today.
>
>
> What features of heapam do you think could be unsupportable in this API?
> Maybe I've just missed them.
>

I was thinking about locking, bulk reading (like page-mode API) etc. While
you've added an API for vacuuming, we would probably also need an API to
collect dead tuples, pruning etc (not sure if that can be combined with
vacuum). Of course, these are probably very specific to current
implementation of heap/MVCC and not all storages will need that.


>
> I suggest refactoring, that will allow us to develop new heap-like access
> methods.
> For the first version, they must have methods to "heapify" tuple i.e turn
> internal
> representation of the data to regular HeapTuple, for example put some
> stubs into
> MVCC fields. Of course this approach has its disadvantages, such as
> performance issues.
> It definitely won't be enough to write column storage or to implement
> other great
> data structures. But it allows us not to depend of the Executor's code.
>
>
Ok, understood.


>
> - There are many upper modules that need access to system attributes
> (xmin, xmax for starters). How do you plan to handle that? You think we can
> provide enough abstraction so that the callers don't need to know the tuple
> structures of individual PAM?
>
>
> To be honest, I didn't thought about it.
> Do you mean external modules or upper levels of abstraction in Postgres?
>

I meant upper levels of abstraction like the executor. For example, while
inserting a new tuple, the executor (the index AM's insert routine to be
precise) may need to wait for another transaction to finish. Today, it can
easily get that information by looking at the xmax of the conflicting
tuple. How would we handle that with abstraction since not every PAM will
have a notion of xmax?

Thanks,
Pavan

 Pavan Deolasee   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


Re: [HACKERS] Refactoring of heapam code.

2016-09-06 Thread Anastasia Lubennikova

06.09.2016 07:44, Pavan Deolasee:




On Mon, Aug 8, 2016 at 3:13 PM, Anastasia Lubennikova 
> 
wrote:




Thank you for the review, I'll fix these problems in final version.

Posting the first message I intended to raise the discussion. Patches
were attached mainly to illustrate the problem and to be more
concrete.


I started looking at the first patch posted above, but it seems you'll 
rewrite these patches after discussion on the heapam refactoring ideas 
you posted here https://wiki.postgresql.org/wiki/HeapamRefactoring 
 concludes.




Thank you for the review.


Some comments anyways on the first patch:

1. It does not apply cleanly with git-apply - many white space errors


I'll fix all merge issues and attach it to the following message.

2. I don't understand the difference 
between PageGetItemHeapHeaderOnly() and PageGetItemHeap(). They seem 
to do exactly the same thing and can be used interchangeably.


The only difference between these two macros is that
PageGetItemHeapHeaderOnly() doesn't touch any key fields of a tuple,
it only checks header fields (see HeapTupleHeaderData). I divided it into
two separate functions, while I was working on page compression and
I wanted to avoid unnecessary decompression calls. These names are
just a kind of 'markers' to make the code reading and improving easier.

3. While I like the idea of using separate interfaces to get 
heap/index tuple from a page, may be they can internally use a common 
interface instead of duplicating what PageGetItem() does already.


I don't sure I get it right. Do you suggest to leave PageGetItem and write
PageGetItemHeap() and PageGetItemIndex() as wrappers on it?
It sounds reasonable while we have similar layout for both heap and 
index pages.

In any case, it'll be easy to separate them when necessary.

4. Should we rename PageGetItemHeap() to PageGetHeapTuple() or 
something like that?


I don't feel like doing that, because HeapTuple is a different structure.
What we do get from page is a HeapTupleHeaderData structure
followed by user's data.

5. If we do this, we should probably have corresponding wrapper 
functions/macros for remaining callers of PageGetItem()



There are some calls of PageGetItem() left in BRIN and SP-gist indexes,
I can write simple wrappers for them.
I left them just because they were out of interest for my compression 
prototype.



I also looked at the refactoring design doc. Looks like a fine 
approach to me, but the API will need much more elaborate discussions. 
I am not sure if the interfaces as presented are enough to support 
everything that even heapam can do today.


What features of heapam do you think could be unsupportable in this API?
Maybe I've just missed them.

My point here is that heapam is too complicated for many simpler use-cases
read-only tables and append-only tables do not need many MVCC related tricks
like vacuum and complicated locking, tables with fixed len attributes 
can use

more optimal page layout. And so on.

I suggest refactoring, that will allow us to develop new heap-like 
access methods.
For the first version, they must have methods to "heapify" tuple i.e 
turn internal
representation of the data to regular HeapTuple, for example put some 
stubs into
MVCC fields. Of course this approach has its disadvantages, such as 
performance issues.
It definitely won't be enough to write column storage or to implement 
other great

data structures. But it allows us not to depend of the Executor's code.

And much more thoughts will be required to ensure we don't miss out 
things that new primary access method may need.



Do you have any particular ideas of new access methods?



A few points regarding how the proposed API maps to heapam:

- How do we support parallel scans on heap?


As far as I understand, parallel heap scan requires one more API function
heap_beginscan_parallel(). It uses the same API function - heap_getnext(),
but in addition to ordinary seq scan it selects page to read in a 
synchronized manner.



- Does the primary AM need to support locking of rows?


That's a good question. I'd say it should be an option.

- Would there be separate API to interpret the PAM tuple itself? 
(something that htup_details.h does today). It seems natural that 
every PAM will have its own interpretation of tuple structure and we 
would like to hide that inside PAM implementation.


As I already wrote, for the first prototype, I'd use function to "heapify"
tuple and don't care about executor issues at all. More detailed discussion
of this question you can find in another thread [1].

- There are many upper modules that need access to system attributes 
(xmin, xmax for starters). How do you plan to handle that? You think 
we can provide enough abstraction so that the callers don't need to 
know the tuple structures of individual PAM?


To be 

Re: [HACKERS] Refactoring of heapam code.

2016-09-05 Thread Pavan Deolasee
On Mon, Aug 8, 2016 at 3:13 PM, Anastasia Lubennikova <
a.lubennik...@postgrespro.ru> wrote:

>
>>
> Thank you for the review, I'll fix these problems in final version.
>
> Posting the first message I intended to raise the discussion. Patches
> were attached mainly to illustrate the problem and to be more concrete.
>

I started looking at the first patch posted above, but it seems you'll
rewrite these patches after discussion on the heapam refactoring ideas you
posted here https://wiki.postgresql.org/wiki/HeapamRefactoring concludes.

Some comments anyways on the first patch:

1. It does not apply cleanly with git-apply - many white space errors
2. I don't understand the difference between PageGetItemHeapHeaderOnly()
and PageGetItemHeap(). They seem to do exactly the same thing and can be
used interchangeably.
3. While I like the idea of using separate interfaces to get heap/index
tuple from a page, may be they can internally use a common interface
instead of duplicating what PageGetItem() does already.
4. Should we rename PageGetItemHeap() to PageGetHeapTuple() or something
like that?
5. If we do this, we should probably have corresponding wrapper
functions/macros for remaining callers of PageGetItem()

I also looked at the refactoring design doc. Looks like a fine approach to
me, but the API will need much more elaborate discussions. I am not sure if
the interfaces as presented are enough to support everything that even
heapam can do today. And much more thoughts will be required to ensure we
don't miss out things that new primary access method may need.

A few points regarding how the proposed API maps to heapam:

- How do we support parallel scans on heap?
- Does the primary AM need to support locking of rows?
- Would there be separate API to interpret the PAM tuple itself? (something
that htup_details.h does today). It seems natural that every PAM will have
its own interpretation of tuple structure and we would like to hide that
inside PAM implementation.
- There are many upper modules that need access to system attributes (xmin,
xmax for starters). How do you plan to handle that? You think we can
provide enough abstraction so that the callers don't need to know the tuple
structures of individual PAM?

I don't know what to do with the CF entry itself. I could change the status
to "waiting on author" but then the design draft may not get enough
attention. So I'm leaving it in the current state for others to look at.

Thanks,
Pavan

-- 
 Pavan Deolasee   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


Re: [HACKERS] Refactoring of heapam code.

2016-08-15 Thread Anastasia Lubennikova

08.08.2016 12:43, Anastasia Lubennikova:

08.08.2016 03:51, Michael Paquier:
On Sat, Aug 6, 2016 at 2:56 AM, Alvaro Herrera 
 wrote:

Anastasia Lubennikova wrote:
So there is a couple of patches. They do not cover all mentioned 
problems,

but I'd like to get a feedback before continuing.

I agree that we could improve things in this general area, but I do not
endorse any particular changes you propose in these patches; I haven't
reviewed your patches.
Well, I am interested in the topic. And just had a look at the 
patches proposed.


+ /*
+ * Split PageGetItem into set of different macros
+ * in order to make code more readable.
+ */
+#define PageGetItemHeap(page, itemId) \
+( \
+   AssertMacro(PageIsValid(page)), \
+   AssertMacro(ItemIdHasStorage(itemId)), \
+   (HeapTupleHeader)(((char *)(page)) + ItemIdGetOffset(itemId)) \
+)
Placing into bufpage.h macros that are specific to heap and indexes is
not that much a good idea. I'd suggest having the heap ones in
heapam.h, and the index ones in a more generic place, as
src/access/common as already mentioned by Alvaro.

+   onpage_tup = PageGetItemHeapHeaderOnly(page, newitemid);
Just PageGetItemHeapHeader would be fine.

@@ -2324,7 +2087,6 @@ FreeBulkInsertState(BulkInsertState bistate)
 pfree(bistate);
  }

-
  /*
   * heap_insert - insert tuple into a heap
Those patches have some noise.

Patch 0002 is doing two things: reorganizing the order of the routines
in heapam.c and move some routines from heapam.c to hio.c. Those two
could be separate patches/

If the final result is to be able to extend custom AMs for tables, we
should have a structure like src/access/common/index.c,
src/access/common/table.c and src/access/common/relation.c for
example, and have headers dedicated to them. But yes, there is
definitely a lot of room for refactoring as we are aiming, at least I
guess so, at having more various code paths for access methods.


Thank you for the review, I'll fix these problems in final version.

Posting the first message I intended to raise the discussion. Patches
were attached mainly to illustrate the problem and to be more concrete.

Thank you for attention and feedback.
Since there are no objections to the idea in general, I'll write an 
exhaustive

README about current state of the code and then propose API design
to discuss details.

Stay tuned.



Here is the promised design draft.
https://wiki.postgresql.org/wiki/HeapamRefactoring

Looking forward to your comments.

--
Anastasia Lubennikova
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company



--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Refactoring of heapam code.

2016-08-08 Thread Alvaro Herrera
Anastasia Lubennikova wrote:

> By the way, I thought about looking at the mentioned patch and
> probably reviewing it, but didn't find any of your patches on the
> current commitfest. Could you point out the thread?

Sorry, I haven't posted anything yet.

> >Agreed.  But changing its name while keeping it in heapam.c does not
> >really improve things enough.  I'd rather have it moved elsewhere that's
> >not specific to "heaps" (somewhere in access/common perhaps).  However,
> >renaming the functions would break third-party code for no good reason.
> >I propose that we only rename things if we also break it for other
> >reasons (say because the API changes in a fundamental way).
> 
> Yes, I agree that it should be moved to another file.
> Just to be more accurate: it's not in heapam.c now, it is in
> "src/backend/catalog/heap.c" which requires much more changes
> than I did.

Argh.  Clearly, the code organization in this area is not good at all.

-- 
Álvaro Herrerahttp://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Refactoring of heapam code.

2016-08-08 Thread Anastasia Lubennikova

08.08.2016 03:51, Michael Paquier:

On Sat, Aug 6, 2016 at 2:56 AM, Alvaro Herrera  wrote:

Anastasia Lubennikova wrote:

So there is a couple of patches. They do not cover all mentioned problems,
but I'd like to get a feedback before continuing.

I agree that we could improve things in this general area, but I do not
endorse any particular changes you propose in these patches; I haven't
reviewed your patches.

Well, I am interested in the topic. And just had a look at the patches proposed.

+ /*
+ * Split PageGetItem into set of different macros
+ * in order to make code more readable.
+ */
+#define PageGetItemHeap(page, itemId) \
+( \
+   AssertMacro(PageIsValid(page)), \
+   AssertMacro(ItemIdHasStorage(itemId)), \
+   (HeapTupleHeader)(((char *)(page)) + ItemIdGetOffset(itemId)) \
+)
Placing into bufpage.h macros that are specific to heap and indexes is
not that much a good idea. I'd suggest having the heap ones in
heapam.h, and the index ones in a more generic place, as
src/access/common as already mentioned by Alvaro.

+   onpage_tup = PageGetItemHeapHeaderOnly(page, newitemid);
Just PageGetItemHeapHeader would be fine.

@@ -2324,7 +2087,6 @@ FreeBulkInsertState(BulkInsertState bistate)
 pfree(bistate);
  }

-
  /*
   * heap_insert - insert tuple into a heap
Those patches have some noise.

Patch 0002 is doing two things: reorganizing the order of the routines
in heapam.c and move some routines from heapam.c to hio.c. Those two
could be separate patches/

If the final result is to be able to extend custom AMs for tables, we
should have a structure like src/access/common/index.c,
src/access/common/table.c and src/access/common/relation.c for
example, and have headers dedicated to them. But yes, there is
definitely a lot of room for refactoring as we are aiming, at least I
guess so, at having more various code paths for access methods.


Thank you for the review, I'll fix these problems in final version.

Posting the first message I intended to raise the discussion. Patches
were attached mainly to illustrate the problem and to be more concrete.

Thank you for attention and feedback.
Since there are no objections to the idea in general, I'll write an 
exhaustive

README about current state of the code and then propose API design
to discuss details.

Stay tuned.

--
Anastasia Lubennikova
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company



--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Refactoring of heapam code.

2016-08-08 Thread Anastasia Lubennikova

05.08.2016 20:56, Alvaro Herrera:

Anastasia Lubennikova wrote:

Working on page compression and some other issues related to
access methods, I found out that the code related to heap
looks too complicated. Much more complicated, than it should be.
Since I anyway got into this area, I want to suggest a set of improvements.

Hm.  I'm hacking on heapam.c pretty heavily ATM.  Your first patch
causes no problem I think, or if it does it's probably pretty minor, so
that's okay.  I'm unsure about the second one -- I think it's okay too,
because it mostly seems to be about moving stuff from heapam.c to hio.c
and shuffling some code around that I don't think I'm modifying.


I'm sure these patches will not cause any problems. The second one is
mostly about moving unrelated stuff from heapam.c to hio.c and renaming
couple of functions in heap.c.
Anyway, the are not a final solution just proof of concept.

By the way, I thought about looking at the mentioned patch and
probably reviewing it, but didn't find any of your patches on the
current commitfest. Could you point out the thread?


Also I think that it's quite strange to have a function heap_create(), that
is called
from index_create(). It has absolutely nothing to do with heap access
method.

Agreed.  But changing its name while keeping it in heapam.c does not
really improve things enough.  I'd rather have it moved elsewhere that's
not specific to "heaps" (somewhere in access/common perhaps).  However,
renaming the functions would break third-party code for no good reason.
I propose that we only rename things if we also break it for other
reasons (say because the API changes in a fundamental way).


Yes, I agree that it should be moved to another file.
Just to be more accurate: it's not in heapam.c now, it is in
"src/backend/catalog/heap.c" which requires much more changes
than I did.

Concerning to the third-party code. It's a good observation and we
definitely should keep it in mind. Nevertheless, I doubt that there's a 
lot of

external calls to these functions. And I hope this thread will end up with
fundamental changes introducing clear API for all mentioned problems.




One more thing that requires refactoring is "RELKIND_RELATION" name.
We have a type of relation called "relation"...

I don't see us fixing this bit, really.  Historical baggage and all
that.  For example, a lot of SQL queries use "WHERE relkind = 'r'".  If
we change the name, we should probably also change the relkind constant,
and that would break the queries.  If we change the name and do not
change the constant, then we have to have a comment "we call them
RELKIND_TABLE but the char is r because it was called RELATION
previously", which isn't any better.


Good point.
I'd rather change it to RELKIND_BASIC_RELATION or something like that,
which is more specific and still goes well with 'r' constant. But minor 
changes

like that can wait until we clarify the API in general.

--
Anastasia Lubennikova
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company



--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Refactoring of heapam code.

2016-08-07 Thread Michael Paquier
On Sat, Aug 6, 2016 at 2:56 AM, Alvaro Herrera  wrote:
> Anastasia Lubennikova wrote:
>> So there is a couple of patches. They do not cover all mentioned problems,
>> but I'd like to get a feedback before continuing.
>
> I agree that we could improve things in this general area, but I do not
> endorse any particular changes you propose in these patches; I haven't
> reviewed your patches.

Well, I am interested in the topic. And just had a look at the patches proposed.

+ /*
+ * Split PageGetItem into set of different macros
+ * in order to make code more readable.
+ */
+#define PageGetItemHeap(page, itemId) \
+( \
+   AssertMacro(PageIsValid(page)), \
+   AssertMacro(ItemIdHasStorage(itemId)), \
+   (HeapTupleHeader)(((char *)(page)) + ItemIdGetOffset(itemId)) \
+)
Placing into bufpage.h macros that are specific to heap and indexes is
not that much a good idea. I'd suggest having the heap ones in
heapam.h, and the index ones in a more generic place, as
src/access/common as already mentioned by Alvaro.

+   onpage_tup = PageGetItemHeapHeaderOnly(page, newitemid);
Just PageGetItemHeapHeader would be fine.

@@ -2324,7 +2087,6 @@ FreeBulkInsertState(BulkInsertState bistate)
pfree(bistate);
 }

-
 /*
  * heap_insert - insert tuple into a heap
Those patches have some noise.

Patch 0002 is doing two things: reorganizing the order of the routines
in heapam.c and move some routines from heapam.c to hio.c. Those two
could be separate patches/

If the final result is to be able to extend custom AMs for tables, we
should have a structure like src/access/common/index.c,
src/access/common/table.c and src/access/common/relation.c for
example, and have headers dedicated to them. But yes, there is
definitely a lot of room for refactoring as we are aiming, at least I
guess so, at having more various code paths for access methods.
-- 
Michael


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Refactoring of heapam code.

2016-08-05 Thread Alvaro Herrera
Anastasia Lubennikova wrote:
> Working on page compression and some other issues related to
> access methods, I found out that the code related to heap
> looks too complicated. Much more complicated, than it should be.
> Since I anyway got into this area, I want to suggest a set of improvements.

Hm.  I'm hacking on heapam.c pretty heavily ATM.  Your first patch
causes no problem I think, or if it does it's probably pretty minor, so
that's okay.  I'm unsure about the second one -- I think it's okay too,
because it mostly seems to be about moving stuff from heapam.c to hio.c
and shuffling some code around that I don't think I'm modifying.

> Also I think that it's quite strange to have a function heap_create(), that
> is called
> from index_create(). It has absolutely nothing to do with heap access
> method.

Agreed.  But changing its name while keeping it in heapam.c does not
really improve things enough.  I'd rather have it moved elsewhere that's
not specific to "heaps" (somewhere in access/common perhaps).  However,
renaming the functions would break third-party code for no good reason.
I propose that we only rename things if we also break it for other
reasons (say because the API changes in a fundamental way).

> One more thing that requires refactoring is "RELKIND_RELATION" name.
> We have a type of relation called "relation"...

I don't see us fixing this bit, really.  Historical baggage and all
that.  For example, a lot of SQL queries use "WHERE relkind = 'r'".  If
we change the name, we should probably also change the relkind constant,
and that would break the queries.  If we change the name and do not
change the constant, then we have to have a comment "we call them
RELKIND_TABLE but the char is r because it was called RELATION
previously", which isn't any better.

> So there is a couple of patches. They do not cover all mentioned problems,
> but I'd like to get a feedback before continuing.

I agree that we could improve things in this general area, but I do not
endorse any particular changes you propose in these patches; I haven't
reviewed your patches.

-- 
Álvaro Herrerahttp://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Refactoring of heapam code.

2016-08-05 Thread Anastasia Lubennikova

05.08.2016 16:30, Kevin Grittner:

On Fri, Aug 5, 2016 at 2:54 AM, Anastasia Lubennikova
 wrote:


They can be logically separated into three categories:
"primary storage" - r, S, t, v. They store data and visibility information.
The only implementation is heapam.c
"secondary index" - i. They store some data and pointers to primary storage.
Various existing AMs and managed by AM API.
"virtual relations" - c, f, m. They have no physical storage, only entries
in caches and catalogs.

Materialized views (relkind == "m") have physical storage, and may have indexes.


Good point. I сonfused letters for views (virtual relations) and
materialized views (primary storage).
Should be:

"primary storage" - r, S, t, m. They store data and visibility information.
The only implementation is heapam.c
"secondary index" - i. They store some data and pointers to primary storage.
Various existing AMs and managed by AM API.
"virtual relations" - c, f, v. They have no physical storage, only entries
in caches and catalogs.


--
Anastasia Lubennikova
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company



--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Refactoring of heapam code.

2016-08-05 Thread Kevin Grittner
On Fri, Aug 5, 2016 at 2:54 AM, Anastasia Lubennikova
 wrote:

> They can be logically separated into three categories:
> "primary storage" - r, S, t, v. They store data and visibility information.
> The only implementation is heapam.c
> "secondary index" - i. They store some data and pointers to primary storage.
> Various existing AMs and managed by AM API.
> "virtual relations" - c, f, m. They have no physical storage, only entries
> in caches and catalogs.

Materialized views (relkind == "m") have physical storage, and may have indexes.

--
Kevin Grittner
EDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Refactoring of heapam code.

2016-08-05 Thread Thom Brown
On 5 August 2016 at 08:54, Anastasia Lubennikova
 wrote:
> Working on page compression and some other issues related to
> access methods, I found out that the code related to heap
> looks too complicated. Much more complicated, than it should be.
> Since I anyway got into this area, I want to suggest a set of improvements.
>
> There is a number of problems I see in the existing code:
> Problem 1. Heap != Relation.
>
> File heapam.c has a note:
>  *  This file contains the heap_ routines which implement
>  *  the POSTGRES heap access method used for all POSTGRES
>  *  relations.
> This statement is wrong, since we also have index relations,
> that are implemented using other access methods.
>
> Also I think that it's quite strange to have a function heap_create(), that
> is called
> from index_create(). It has absolutely nothing to do with heap access
> method.
>
> And so on, and so on.
>
> One more thing that requires refactoring is "RELKIND_RELATION" name.
> We have a type of relation called "relation"...
>
> Problem 2.
> Some functions are shared between heap and indexes access methods.
> Maybe someday it used to be handy, but now, I think, it's time to separate
> them, because IndexTuples and HeapTuples have really little in common.
>
> Problem 3. The word "heap" is used in many different contexts.
> Heap - a storage where we have tuples and visibility information.
> Generally speaking, it can be implemented over any data structure,
> not just a plain table as it is done now.
>
> Heap - an access method, that provides a set of routines for plain table
> storage, such as insert, delete, update, gettuple, vacuum and so on.
> We use this access method not only for user's tables.
>
> There are many types of relations (pg_class.h):
> #define  RELKIND_RELATION'r'/* ordinary table */
> #define  RELKIND_INDEX   'i'/* secondary index */
> #define  RELKIND_SEQUENCE'S'/* sequence object */
> #define  RELKIND_TOASTVALUE  't'/* for out-of-line
> values */
> #define  RELKIND_VIEW'v'/* view */
> #define  RELKIND_COMPOSITE_TYPE  'c'/* composite type */
> #define  RELKIND_FOREIGN_TABLE   'f'/* foreign table */
> #define  RELKIND_MATVIEW 'm'/* materialized view */
>
> They can be logically separated into three categories:
> "primary storage" - r, S, t, v. They store data and visibility information.
> The only implementation is heapam.c
> "secondary index" - i. They store some data and pointers to primary storage.
> Various existing AMs and managed by AM API.
> "virtual relations" - c, f, m. They have no physical storage, only entries
> in caches and catalogs.
>
> Now, for some reasons, we sometimes use name "heap" for both
> "primary storage" and "virual relations". See src/backend/catalog/heap.c.
> But some functions work only with "primary storage". See pgstat_relation().
> And we constantly have to check relkind everywhere.
> I'd like it would be clear from the code interface and naming.
>
> So there is a couple of patches. They do not cover all mentioned problems,
> but I'd like to get a feedback before continuing.
>
> Patch 1:
> There is a macro "PageGetItem" in bufpage.h that is used to retrieve an item
> from the given page. It is used for both heap and index tuples.
> It doesn't seems a problem, until you are going to find anything in this
> code.
>
> The first patch "PageGetItem_refactoring.patch" is intended to fix it.
> It is just renaming:
> (IndexTuple) PageGetItem ---> PageGetItemIndex
> (HeapTupleHeader) PageGetItem ---> PageGetItemHeap
> Other types of tuples (BrinTuple, SpGistInnerTuple, SpGistLeafTuple,
> SpGistDeadTuple)
> still use PageGetItem.
>
> I also changed functions that do not access tuple's data, but only need
> HeapTupleHeader fields. They use a macro PageGetItemHeapHeaderOnly.
>
> I do not insist on these particular names, by the way.
>
> Patch 2.
> heapam.c, hio.c and src/backend/catalog/heap.c have a lot of confusing names
> and outdated comments. The patch is intended to improve it.
> Fun fact: I found several "check it later" comments unchanged since
>  "Postgres95 1.01 Distribution - Virgin Sources" commit.
>
> I wonder if we can wind better name relation_drop_with_catalog() since,
> again, it's
> not about all kinds of relations. We use another function to drop index
> relations.
>
> I hope these changes will be useful for the future development.
> Suggested patches are mostly about renaming and rearrangement of functions
> between files, so, if nobody has conceptual objections, all I need from
> reviewers
> is an attentive look to find typos, grammar mistakes and overlooked areas.

Could you add this to the commitfest?

Thom


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:

[HACKERS] Refactoring of heapam code.

2016-08-05 Thread Anastasia Lubennikova

Working on page compression and some other issues related to
access methods, I found out that the code related to heap
looks too complicated. Much more complicated, than it should be.
Since I anyway got into this area, I want to suggest a set of improvements.

There is a number of problems I see in the existing code:
Problem 1. Heap != Relation.

File heapam.c has a note:
 *  This file contains the heap_ routines which implement
 *  the POSTGRES heap access method used for all POSTGRES
 *  relations.
This statement is wrong, since we also have index relations,
that are implemented using other access methods.

Also I think that it's quite strange to have a function heap_create(), 
that is called
from index_create(). It has absolutely nothing to do with heap access 
method.


And so on, and so on.

One more thing that requires refactoring is "RELKIND_RELATION" name.
We have a type of relation called "relation"...

Problem 2.
Some functions are shared between heap and indexes access methods.
Maybe someday it used to be handy, but now, I think, it's time to separate
them, because IndexTuples and HeapTuples have really little in common.

Problem 3. The word "heap" is used in many different contexts.
Heap - a storage where we have tuples and visibility information.
Generally speaking, it can be implemented over any data structure,
not just a plain table as it is done now.

Heap - an access method, that provides a set of routines for plain table
storage, such as insert, delete, update, gettuple, vacuum and so on.
We use this access method not only for user's tables.

There are many types of relations (pg_class.h):
#define  RELKIND_RELATION'r'/* ordinary table */
#define  RELKIND_INDEX   'i'/* secondary index */
#define  RELKIND_SEQUENCE'S'/* sequence object */
#define  RELKIND_TOASTVALUE  't'/* for out-of-line 
values */

#define  RELKIND_VIEW'v'/* view */
#define  RELKIND_COMPOSITE_TYPE  'c'/* composite type */
#define  RELKIND_FOREIGN_TABLE   'f'/* foreign table */
#define  RELKIND_MATVIEW 'm'/* materialized view */

They can be logically separated into three categories:
"primary storage" - r, S, t, v. They store data and visibility information.
The only implementation is heapam.c
"secondary index" - i. They store some data and pointers to primary storage.
Various existing AMs and managed by AM API.
"virtual relations" - c, f, m. They have no physical storage, only entries
in caches and catalogs.

Now, for some reasons, we sometimes use name "heap" for both
"primary storage" and "virual relations". See src/backend/catalog/heap.c.
But some functions work only with "primary storage". See pgstat_relation().
And we constantly have to check relkind everywhere.
I'd like it would be clear from the code interface and naming.

So there is a couple of patches. They do not cover all mentioned problems,
but I'd like to get a feedback before continuing.

Patch 1:
There is a macro "PageGetItem" in bufpage.h that is used to retrieve an item
from the given page. It is used for both heap and index tuples.
It doesn't seems a problem, until you are going to find anything in this 
code.


The first patch "PageGetItem_refactoring.patch" is intended to fix it.
It is just renaming:
(IndexTuple) PageGetItem ---> PageGetItemIndex
(HeapTupleHeader) PageGetItem ---> PageGetItemHeap
Other types of tuples (BrinTuple, SpGistInnerTuple, SpGistLeafTuple, 
SpGistDeadTuple)

still use PageGetItem.

I also changed functions that do not access tuple's data, but only need
HeapTupleHeader fields. They use a macro PageGetItemHeapHeaderOnly.

I do not insist on these particular names, by the way.

Patch 2.
heapam.c, hio.c and src/backend/catalog/heap.c have a lot of confusing names
and outdated comments. The patch is intended to improve it.
Fun fact: I found several "check it later" comments unchanged since
 "Postgres95 1.01 Distribution - Virgin Sources" commit.

I wonder if we can wind better name relation_drop_with_catalog() since, 
again, it's
not about all kinds of relations. We use another function to drop index 
relations.


I hope these changes will be useful for the future development.
Suggested patches are mostly about renaming and rearrangement of functions
between files, so, if nobody has conceptual objections, all I need from 
reviewers

is an attentive look to find typos, grammar mistakes and overlooked areas.

--
Anastasia Lubennikova
Postgres Professional:http://www.postgrespro.com
The Russian Postgres Company

diff --git a/contrib/pageinspect/btreefuncs.c b/contrib/pageinspect/btreefuncs.c
index 4983bbb..257b609 100644
--- a/contrib/pageinspect/btreefuncs.c
+++ b/contrib/pageinspect/btreefuncs.c
@@ -130,7 +130,7 @@ GetBTPageStatistics(BlockNumber blkno, Buffer buffer, BTPageStat *stat)
 
 		ItemId		id = PageGetItemId(page, off);
 
-		itup =