Re: [HACKERS] Prototype: In-place upgrade v02

Zdenek Kotala Mon, 08 Sep 2008 01:09:44 -0700

Heikki Linnakangas wrote:

Zdenek Kotala wrote:
Heikki Linnakangas wrote:
The patch seems to be missing the new htup.c file.
I'm sorry. I have attached a new version which is synchronized with current
head. I would also like to add some comments.
1) The patch also contains changes which were discussed during the July commit fest:
- PageGetTempPage modification suggested by Tom
- another hash.h backward compatible cleanup
It might be a good idea to split that into a separate patch. The sheer size of this patch is quite daunting, even though the bulk of it is straightforward search&replace.


Yes, I will do it.

2) I added a tuplimits.h header file which contains tuple limits for the different access methods. It is not finished yet, but the idea is to keep all limits in one file and to make it easy to add limits for different page layout versions, for example by replacing static computation with dynamic computation based on the relation (maxtuplesize could be stored in pg_class for each relation).
I also need this header because I ran into a cycle in the header dependencies.
3) I already sent Page API performance results in http://archives.postgresql.org/pgsql-hackers/2008-08/msg00398.php
I replaced the call sequence PageGetItemId, PageGetItem with the PageGetIndexTuple and PageGetHeapTuple functions. This is the main difference in this patch. PageGetHeapTuple fills t_ver in HeapTuple to identify the correct tuple header version.
It would be good to mention that the PageAPI (and tuple API) implementation is only a prototype without any performance optimization.
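As an illustration of the PageGetHeapTuple idea described above, here is a minimal sketch. All types and names (ToyPage, ToyItemId, ToyHeapTuple, toy_page_get_heap_tuple) are hypothetical, simplified stand-ins and not the structures or functions from the patch or from PostgreSQL. It only shows how a single call can fetch the item and stamp the tuple with the layout version of its page.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical, simplified stand-ins; not the real PostgreSQL structures. */
typedef struct ToyItemId
{
    uint16_t off;               /* offset of the tuple within data[] */
    uint16_t len;               /* length of the tuple in bytes */
} ToyItemId;

typedef struct ToyPage
{
    uint16_t      layout_version;   /* page layout version from the header */
    uint16_t      nitems;           /* number of line pointers in use */
    ToyItemId     linp[8];          /* line pointer array */
    unsigned char data[256];        /* tuple storage */
} ToyPage;

typedef struct ToyHeapTuple
{
    uint16_t t_ver;             /* version of the tuple header layout */
    uint16_t t_len;             /* length of the stored tuple */
    void    *t_data;            /* points at the tuple inside the page */
} ToyHeapTuple;

/*
 * One call replaces the "look up the item id, then fetch the item" sequence
 * and records which layout version the tuple came from.
 * offnum is 1-based, like PostgreSQL offset numbers.
 */
static ToyHeapTuple
toy_page_get_heap_tuple(ToyPage *page, uint16_t offnum)
{
    ToyHeapTuple tup;
    ToyItemId   *iid;

    assert(offnum >= 1 && offnum <= page->nitems);
    iid = &page->linp[offnum - 1];

    tup.t_ver = page->layout_version;
    tup.t_len = iid->len;
    tup.t_data = page->data + iid->off;
    return tup;
}
```

Callers then never touch the line pointer directly, which is what would let the page layout vary behind the function.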
You mentioned 5% performance degradation in that thread. What test case was that? What would be a worst-case scenario, and how bad is it?


Paul van den Bogaart tested a long-running OLTP workload on it. He used the iGen test.

5% is a pretty hefty price, especially when it's paid by not only upgraded installations, but also freshly initialized clusters. I think you'll need to pursue those performance optimizations.

5% is the worst-case scenario. The current version is not optimized. It is written for easy debugging and (D)tracing. The page header structures are very similar, so we can easily remove the switches for most attributes and replace the functions with macros or inline functions.
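To make the optimization above concrete, here is a hedged sketch of an accessor that needs no per-version switch because the field sits at the same offset in both layouts. The two header structs are simplified, hypothetical stand-ins (ToyHeaderV3, ToyHeaderV4), not the actual page header definitions, and toy_page_get_lower is an invented name.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Simplified, hypothetical page headers for two layout versions. */
typedef struct ToyHeaderV3
{
    uint64_t lsn;
    uint32_t tli;
    uint16_t lower;
    uint16_t upper;
} ToyHeaderV3;

typedef struct ToyHeaderV4
{
    uint64_t lsn;
    uint16_t tli;
    uint16_t flags;
    uint16_t lower;
    uint16_t upper;
    uint32_t prune_xid;
} ToyHeaderV4;

/* Compile-time check: the shared field has the same offset in both layouts. */
_Static_assert(offsetof(ToyHeaderV3, lower) == offsetof(ToyHeaderV4, lower),
               "lower must sit at the same offset in both layouts");

/*
 * Because the offset is identical, one inline accessor serves both versions
 * without any switch on the layout version.
 */
static inline uint16_t
toy_page_get_lower(const unsigned char *page)
{
    uint16_t value;

    memcpy(&value, page + offsetof(ToyHeaderV4, lower), sizeof(value));
    return value;
}
```

Only fields that really differ between versions (prune_xid in this toy layout) would still need version-specific handling.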

4) This patch contains more topics for decision. The first is, in general, whether this approach is acceptable.
I don't like the invasiveness of this approach. It's pretty invasive already, and ISTM you'll need similar switch-case handling of all data types that have changed the internal representation as well.

I agree in general. But, for example, the new page API is not so invasive, and in my opinion it should be implemented (with or without multiversion support) because it cleans up the code. HeapTuple processing is easy too, but unfortunately it requires a lot of modifications in many places. I was surprised how many pieces of code access HeapTupleHeader directly and do not use the HeapTuple data structure. I think we should reach a conclusion on the recommended usage of HeapTupleHeader and HeapTuple. Most of the changes in the code are like replacing HeapTupleHeaderGetXmax(tuple->t_data) with HeapTupleGetXmax(tuple) and so on. I think this should be cleaned up anyway.
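The kind of wrapper meant by HeapTupleGetXmax(tuple) could look roughly like the sketch below. The header structs, the ToyTuple type and toy_heap_tuple_get_xmax are hypothetical and heavily simplified; they are assumptions for illustration, not the patch's definitions. The point is that the caller passes the whole tuple, so the accessor can pick the header layout from t_ver.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical tuple headers in which xmax lives at different offsets. */
typedef struct ToyTupleHeaderV3
{
    uint32_t xmin;
    uint32_t cmin;
    uint32_t xmax;
} ToyTupleHeaderV3;

typedef struct ToyTupleHeaderV4
{
    uint32_t xmin;
    uint32_t xmax;
    uint32_t cid;
} ToyTupleHeaderV4;

/* Tuple descriptor carrying the header version next to the data pointer. */
typedef struct ToyTuple
{
    uint16_t    t_ver;          /* which header layout t_data points at */
    const void *t_data;
} ToyTuple;

/* Version-aware accessor: callers no longer dereference t_data themselves. */
static uint32_t
toy_heap_tuple_get_xmax(const ToyTuple *tuple)
{
    switch (tuple->t_ver)
    {
        case 3:
            return ((const ToyTupleHeaderV3 *) tuple->t_data)->xmax;
        case 4:
            return ((const ToyTupleHeaderV4 *) tuple->t_data)->xmax;
        default:
            assert(!"unknown tuple header version");
            return 0;
    }
}
```

Code that calls the header-level accessor on tuple->t_data directly has no way to learn the version, which is why those call sites need to change.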

You mentioned data types, but that is not a problem. You can easily extend the data type attribute with version information and call the correct in/out functions, or use a different Oid for the new data type version. There are several easy solutions for data types. And for conversion you can use the ALTER TABLE command. The main idea is to keep data in all formats in a relation. This approach should also be used for the integer/float datetime problem.

We've talked about this before, so you'll remember that I favor the approach of converting the page format, a page at a time, when the pages are read in. I grant you that there are non-trivial issues with that as well, like if the converted data takes more space and doesn't fit in the page anymore.


I like conversion on read too, because it is easy, but there are more problems.

The page that does not fit is one of them. Other problems are with indexes. For example, a hash index stores a bitmap in a page, and this is not marked anywhere. Only the hash AM knows which page contains this kind of data. It is probably impossible to convert such a page while it is being read. :(

I wonder if we could go with some sort of a hybrid approach? Convert the whole page when it's read in, but if it doesn't fit, fall back to tricks like loosening the alignment requirements on platforms that can handle non-aligned data, or support a special truncated page header, without pd_tli and pd_prune_xid fields. Just a thought, not sure how feasible those particular tricks are, but something along those lines..
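The decision logic of such a hybrid scheme might be sketched as follows. This is only an illustration under stated assumptions: the enum, the function name toy_plan_conversion, and the idea of passing the header sizes as parameters are all invented here, and only the truncated-header fallback is modelled.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical outcomes of planning the conversion of one page. */
typedef enum ToyConvertPlan
{
    TOY_CONVERT_FULL,               /* fits with the normal new header */
    TOY_CONVERT_TRUNCATED_HEADER,   /* fits only with the shorter header */
    TOY_CONVERT_NO_FIT              /* needs some other strategy */
} ToyConvertPlan;

/*
 * Decide how a page can be converted when it is read in.
 * converted_payload is the space the converted line pointers and tuples
 * need; full_header and truncated_header are the two header sizes.
 */
static ToyConvertPlan
toy_plan_conversion(size_t block_size, size_t converted_payload,
                    size_t full_header, size_t truncated_header)
{
    assert(truncated_header <= full_header && full_header <= block_size);

    if (converted_payload <= block_size - full_header)
        return TOY_CONVERT_FULL;
    if (converted_payload <= block_size - truncated_header)
        return TOY_CONVERT_TRUNCATED_HEADER;
    return TOY_CONVERT_NO_FIT;
}
```

The last case is the hard one: some tuples would have to move to another page, which conversion on read cannot do by itself.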


OK, I have a backup idea :-). Stay tuned :-)

All in all, though. I find it a bit hard to see the big picture. For upgrade-in-place, what are all the pieces that we need? To keep this concrete, let's focus on PG 8.2 -> PG 8.3 (or are you focusing on PG 8.3 -> 8.4? That's fine with me as well, but let's pick one) and forget about hypothetical changes that might occur in a future version. I can see:
1. Handling page layout changes (pd_prune_xid, pd_flags)
2. Handling tuple header changes (infomask2, HOT bits, combocid)

2.5 + composite data type

3. Handling changes in data type representation (packed varlens)

3.5 Data types generally (cidr/inet)

4. Toast chunk size

4.5 general MaxTupleSize for each different AM

5. Catalogs

6. AM methods
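For item 4.5 in the list above, a per-layout tuple limit could be computed at run time instead of being a compile-time constant. The sketch below is an assumption-laden illustration: the function names, TOY_ITEMID_SIZE and the header sizes are placeholders chosen for the example, not values taken from the patch.

```c
#include <assert.h>
#include <stddef.h>

#define TOY_ITEMID_SIZE 4   /* placeholder size of one line pointer */

/* Placeholder header size for each page layout version. */
static size_t
toy_page_header_size(int layout_version)
{
    switch (layout_version)
    {
        case 3:
            return 20;
        case 4:
            return 24;
        default:
            return 0;       /* unknown layout */
    }
}

/* Largest tuple that fits alone on a page of the given layout version. */
static size_t
toy_max_tuple_size(size_t block_size, int layout_version)
{
    size_t hdr = toy_page_header_size(layout_version);

    assert(hdr != 0 && hdr + TOY_ITEMID_SIZE < block_size);
    return block_size - hdr - TOY_ITEMID_SIZE;
}
```

With these placeholder numbers, an 8192-byte block allows 8168 bytes under layout 3 and 8164 bytes under layout 4, so a tuple that was legal before the upgrade can be too large afterwards.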

After putting all those together, how large a patch are we talking about, and what's the performance penalty then? How much of all that needs to be in core, and how much can live in a pgfoundry project or an extra binary in src/bin or contrib? I realize that none of us have a crystal ball, and one has to start somewhere, but I feel uneasy committing to an approach until we have a full plan.

Unfortunately, I'm still in the analysis phase. The presented patch is a prototype of one possible approach. I hit a lot of problems and I still don't have answers to all of them. I'm going to update the wiki page to share all this information.

At this moment, I think that I can implement offline heap conversion (8.2->8.4), with all indexes being reindexed. That is what we can have for 8.4. Online conversion has a lot of problems which we are not able to answer at this moment.


                Zdenek



--
Zdenek Kotala              Sun Microsystems
Prague, Czech Republic     http://sun.com/postgresql


