On 29.01.2020 21:16, Robert Haas wrote:
On Wed, Jan 29, 2020 at 10:30 AM Konstantin Knizhnik
<k.knizh...@postgrespro.ru> wrote:

I think that the idea of calling ambuild() on the fly is not going to
work, because, again, I don't think that calling that from random
places in the code is safe.

It is not a random place in the code.
Actually it is just one place: _bt_getbuf.
Why would it be unsafe if it affects only the backend's private data?


What I expect we're going to need to do
here is model this on the approach used for unlogged tables. For an
unlogged table, each table and index has an init fork which contains
the correct initial contents for that relation - which is nothing at
all for a heap table, and a couple of boilerplate pages for an index.
In the case of an unlogged table, the init forks get copied over the
main forks after crash recovery, and then we have a brand-new, empty
relation with brand-new empty indexes which everyone can use. In the
case of global temporary tables, I think that we should do the same
kind of copying, but at the time when the session first tries to
access the table. There is some fuzziness in my mind about what
exactly constitutes accessing the table - it probably shouldn't be
when the relcache entry is built, because that seems too early, but
I'm not sure what is exactly right. In any event, it's easier to find
a context where copying some files on disk (that are certain not to be
changing) is safe than it is to find a context where index builds are
safe.

I do not think that the approach used for unlogged tables is good for GTT.
Unlogged tables have to be reinitialized only after a server restart, while a GTT should be initialized by each backend on demand.
It seems to me that the init fork is used for unlogged tables because the recovery process does not have enough context to reinitialize the table and its indexes; it is much safer and simpler for the recovery process just to copy files. But the GTT case is different: the heap and indexes can easily be initialized by the backend using existing functions.

The approach of just calling btbuild is much simpler than what you propose with creating extra forks and copying data from them. You say that it is not safe, but you have not explained why it is unsafe. Yes, I agree that it is my responsibility to prove that it is safe, and as I already wrote, I cannot provide such a proof now. I would be glad if you or anybody else could help show that this approach is safe, or demonstrate concrete problems with it.

Copying data from a fork does not give GTT indexes the same behavior as regular indexes, and from my point of view compatibility with regular tables is the most important point in the GTT design. If for some reason that is not possible, then we should think about other solutions, but right now I do not know of such problems. We have two working prototypes of GTT. Certainly that does not mean the current implementations are free of problems, but I would really like to receive more constructive criticism than "this approach is wrong because it is unsafe".

[...] planner, and the executor that remember information about indexes
have to be properly updated. It is done using the invalidation mechanism.
The same mechanism is used in the case of DDL operations on GTT, because
we change the system catalog.
I mean, that's not really a valid argument. Invalidations can only
take effect at certain points in the code, and the whole argument here
is about which places in the code are safe for which operations, so
the fact that some things (like accepting invalidations) are safe at
some points in the code (like the places where we accept them) does
not prove that other things (like calling ambuild) are safe at other
points in the code (like wherever you are proposing to call it). In
particular, if you've got a relation open, there's currently no way
for another index to show up while you've still got that relation
open.
The same is true for GTT: right now, building a GTT index also locks the relation. That may not be strictly necessary, because the relation's data is local and cannot be changed by another backend.
But I have not added any special handling for GTT here, mostly because I want to follow the same path as with regular indexes and prevent the possible problems which, as you mention, can happen if we somehow change the locking policy.


That means that the planner and executor (which keep the
relevant relations open) don't ever have to worry about updating their
data structures, because it can never be necessary. It also means that
any code anywhere in the system that keeps a lock on a relation can
count on the list of indexes for that relation staying the same until
it releases the lock. In fact, it can hold on to pointers to data
allocated by the relcache and count on those pointers being stable for
as long as it holds the lock, and RelationClearRelation contains
specific code that aims to make sure that certain objects don't get
deallocated and reallocated at a different address precisely for that
reason. That code, however, only works as long as nothing actually
changes. The upshot is that it's entirely possible for changing
catalog entries in one backend with an inadequate lock level -- or at
an unexpected point in the code -- to cause a core dump either in that
backend or in some other backend. This stuff is really subtle, and
super-easy to screw up.

I am speaking a bit generally here, because I haven't really studied
*exactly* what might go wrong in the relcache, or elsewhere, as a
result of creating an index on the fly. However, I'm very sure that a
general appeal to invalidation messages is not sufficient to make
something like what you want to do safe. Invalidation messages are a
complex, ancient, under-documented, fragile system for solving a very
specific problem that is not the one you are hoping they'll solve
here. They could justifiably be called magic, but it's not the sort of
magic where the fairy godmother waves her wand and solves all of your
problems; it's more like the kind where you go explore the forbidden
forest and are never seen or heard from again.

Actually the index is not created on the fly.
The index is created in the usual way, by executing the "create index" command.
So all components of Postgres (planner, executor, ...) treat GTT indexes in the same way as regular indexes.
The locking and invalidation policies are exactly the same for them.
The only difference is that the content of a GTT index is constructed on demand from private backend data. Is that safe or not? We are just reading data from local buffers/files and writing it back here.
Maybe I have missed something, but I do not see anything unsafe here.
There are issues with updating statistics, but they can be solved.

--

Konstantin Knizhnik
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company
