On 05.02.2020 00:38, Robert Haas wrote:

My guess it that the right time to do this work is just after we
acquire locks, at the end of parse analysis. I think trying to do it
during execution is too late, since the planner looks at indexes, and
trying to do it in the planner instead of before we start planning
seems more likely to cause bugs and has no real advantages. It's just
better to do complicated things (like creating indexes) separately
rather than in the middle of some other complicated thing (like
planning). I could tie my shoelaces the first time they get tangled up
with my break pedal but it's better to do it before I get in the car.
I have implemented this approach in my new patch

https://www.postgresql.org/message-id/3e88b59f-73e8-685e-4983-9026f94c57c5%40postgrespro.ru

I have added check whether index is initialized or not to plancat.c where optimizer checks if index is valid. Now it should work for all kinds of indexes (B-Tree, hash, user defined access methods...).

And I'm still inclined to do it by flat-copying files rather than
calling ambuild. It will be slightly faster, but also importantly, it
will guarantee that (1) every backend gets exactly the same initial
state and (2) it has fewer ways to fail because it doesn't involve
calling any user-defined code. Those seem like fairly compelling
advantages, and I don't see what the disadvantages are. I think
calling ambuild() at the point in time proposed in the preceding
paragraph would be fairly safe and would probably work OK most of the
time, but I can't think of any reason it would be better.

There is very important reason (from my point of view): allow other sessions to use created index and
so provide compatible behavior with regular tables (and with Oracle).
So we should be able to populate index with existed GTT data.
And ambuild will do it.


Incidentally, what I'd be inclined to do is - if the session is
running a query that does only read-only operations, let it continue
to point to the "master" copy of the GTT and its indexes, which is
stored in the relfilenodes indicated for those relations in pg_class.
If it's going to acquire a lock heavier than AccessShareLock, then
give it is own copies of the table and indexes, stored in a temporary
relfilenode (tXXX_YYY) and redirect all future access to that GTT by
this backend to there. Maybe there's some reason this won't work, but
it seems nice to avoid saying that we've "attached" to the GTT if all
we did is read the empty table.

Sorry, I do not understand the benefits of such optimization. It seems to be very rare situation when session will try to access temp table which was not previously filled with data. But even if it happen, keeping "master" copy will not safe much: we in any case have shared metadata and no data. Yes, with current approach, first access to GTT will cause creation of empty indexes. But It is just initialization of 1-3 pages. I do not think that delaying index initialization can be really useful.

In any case, calling ambuild is the simplest and most universal approach, providing desired and compatible behavior. I really do not understand why we should try yo invent some alternative solution.




Reply via email to