[HACKERS] Otvet: [HACKERS] Otvet: [HACKERS] Otvet: WAL and indexes (Re: [HACKERS] WAL status todo)
> Just a confirmation: do you plan the overwrite storage manager also in 7.2?

Yes, if I get enough time.

Vadim
RE: [HACKERS] Re: Otvet: Otvet: WAL and indexes (Re: [HACKERS] WAL status todo)
> I'm still nervous about how we're going to test the WAL code adequately
> for the lesser-used index types. Any ideas out there?

First, it seems we'll have to follow what you've proposed for their
redo/undo: log each *fact* of changing a page, so we know whether an
update operation was done entirely or not (and rebuild the index if it
was not), plus log information about where to find the tuple pointing
to the heap (for undo). This is much easier to do than logging suitable
for full recovery.

Vadim
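The "log the fact, not the details" scheme Vadim describes above can be sketched roughly as follows. This is an illustrative sketch only: the structure and function names are invented, and real PostgreSQL WAL records look nothing like this.

```c
#include <stdbool.h>
#include <stdint.h>

/* One record per index-page change: it records only the *fact* that an
 * operation was in progress, plus where to find the heap tuple (for undo).
 * The record is marked complete only once the whole update finished. */
typedef struct IndexOpRecord {
    uint32_t index_oid;    /* which index was being changed */
    uint32_t heap_blkno;   /* block of the tuple pointing to heap (for undo) */
    uint16_t heap_offnum;  /* offset of that tuple within the block */
    bool     op_complete;  /* set once the page update finished entirely */
} IndexOpRecord;

/* Redo: we logged no page details, so we cannot replay the change.  If
 * the operation never completed, the only safe answer is a full rebuild. */
static bool index_needs_rebuild(const IndexOpRecord *rec)
{
    return !rec->op_complete;
}
```

The trade-off is exactly the one discussed in the thread: recovery is cheap to log but may require an index rebuild instead of a targeted replay.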
Re: [HACKERS] Otvet: WAL and indexes (Re: [HACKERS] WAL status todo)
* Mikheev, Vadim [EMAIL PROTECTED] [001016 09:33] wrote:
> > I don't understand why WAL needs to log internal operations of any of
> > the index types. Seems to me that you could treat indexes as black
> > boxes that are updated as side effects of WAL log items for heap
> > tuples: when adding a heap tuple as a result of a WAL item, you just
> > call the usual index insert routines, and when deleting a heap tuple
> > as a result
>
> On recovery the backend *can't* use any usual routines: system catalogs
> are not available.
>
> > of undoing a WAL item, you mark the tuple invalid but don't physically
> > remove it till VACUUM (thus no need to worry about its index entries).
>
> One of the purposes of WAL is immediate removal of tuples inserted by
> aborted transactions. I want to make VACUUM *optional* in the future -
> space must be available for reuse without VACUUM. And this is a first,
> very small, step in this direction.

Why would vacuum become optional? Would WAL offer an option to not
reclaim free space? We're hoping that vacuum becomes unneeded when
postgresql is run with some flag indicating that we're uninterested in
time travel.

How much longer do you estimate until you can make it work that way?

thanks,
-Alfred
[HACKERS] Re: Otvet: WAL and indexes (Re: [HACKERS] WAL status todo)
"Mikheev, Vadim" [EMAIL PROTECTED] writes:
> > I don't understand why WAL needs to log internal operations of any of
> > the index types. Seems to me that you could treat indexes as black
> > boxes that are updated as side effects of WAL log items for heap
> > tuples: when adding a heap tuple as a result of a WAL item, you just
> > call the usual index insert routines, and when deleting a heap tuple
> > as a result
>
> On recovery the backend *can't* use any usual routines: system catalogs
> are not available.

OK, good point, but that just means you can't use the catalogs to
discover what indexes exist for a given table. You could still create
log entries that look like "insert indextuple X into index Y" without
any further detail.

> > the index is corrupt and rebuild it from scratch, using Hiroshi's
> > index-rebuild code.
>
> How fast is rebuilding of an index for a table with 10^7 records?

It's not fast, of course. But the point is that you should seldom have
to do it.

> I agree to consider rtree/hash/gist as experimental index access
> methods BUT we have to have at least *one* reliable index AM with
> short down time / fast recovery.

With all due respect, I wonder just how "reliable" btree WAL undo/redo
will prove to be ... let alone the other index types. I worry that this
approach is putting too much emphasis on making it fast, and not enough
on making it right.

regards, tom lane
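Tom's "black box" proposal, as stated, amounts to a generic log record carrying only a physical index identifier and opaque key bytes. A rough sketch, with all names invented for illustration (later in the thread Vadim points out why this alone is not enough at recovery time):

```c
#include <stddef.h>
#include <stdint.h>

/* A self-describing "insert indextuple X into index Y" record, keyed by
 * the physical file id so no catalog lookup is needed during recovery. */
typedef struct BlackBoxIndexRec {
    uint32_t index_fileid;  /* "index Y": physical file, not a catalog name */
    uint32_t heap_blkno;    /* heap tuple the new index entry points at */
    uint16_t heap_offnum;
    uint16_t keylen;        /* length of the opaque key bytes ("indextuple X") */
    /* keylen bytes of key data follow the fixed header in the log stream */
} BlackBoxIndexRec;

/* Total on-disk size of one record: fixed header plus trailing key bytes. */
static size_t blackbox_rec_size(const BlackBoxIndexRec *rec)
{
    return sizeof(BlackBoxIndexRec) + rec->keylen;
}
```

The catch is that inserting the opaque key still requires knowing the index's comparison functions, which is exactly the catalog information recovery cannot read.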
Re: [HACKERS] Re: Otvet: WAL and indexes (Re: [HACKERS] WAL status todo)
* Tom Lane [EMAIL PROTECTED] [001016 09:47] wrote:
> "Mikheev, Vadim" [EMAIL PROTECTED] writes:
> > > I don't understand why WAL needs to log internal operations of any
> > > of the index types. Seems to me that you could treat indexes as
> > > black boxes that are updated as side effects of WAL log items for
> > > heap tuples: when adding a heap tuple as a result of a WAL item,
> > > you just call the usual index insert routines, and when deleting a
> > > heap tuple as a result
> >
> > On recovery the backend *can't* use any usual routines: system
> > catalogs are not available.
>
> OK, good point, but that just means you can't use the catalogs to
> discover what indexes exist for a given table. You could still create
> log entries that look like "insert indextuple X into index Y" without
> any further detail.

One thing you guys may wish to consider is selectively fsyncing the
system catalogs and marking them dirty when opened for write:

postgres: I need to write to a critical table...
  opens table, marks it dirty
  completes the operation, marks it clean and fsyncs

-or-

postgres: I need to write to a critical table...
  opens table, marks it dirty
  crash, burn, smoke (whatever)

Now you may still have the system tables broken, however the chances of
that may be significantly reduced, depending on how often writes must be
done to them.

It's a hack, but depending on the amount of writes done to critical
tables it may reduce the window for these inconvenient situations
significantly.

--
-Alfred Perlstein - [[EMAIL PROTECTED]|[EMAIL PROTECTED]]
"I have the heart of a child; I keep it in a jar on my desk."
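Alfred's dirty-marker hack can be sketched in a few lines. Everything here is hypothetical (the structure, the function names, and the in-memory flag standing in for a marker that would really have to be persisted and fsynced before the catalog write begins):

```c
#include <stdbool.h>

/* A system-catalog file with a "write in progress" marker. */
typedef struct CatalogFile {
    bool dirty;   /* marker flushed to disk before the data write starts */
} CatalogFile;

static void catalog_begin_write(CatalogFile *cat)
{
    cat->dirty = true;   /* 1. persist the marker (fsync in real life) */
    /* 2. ... perform and fsync the actual catalog write here ... */
}

static void catalog_end_write(CatalogFile *cat)
{
    cat->dirty = false;  /* 3. clear the marker only after the fsync */
}

/* After a crash: a still-set marker means the write never completed, so
 * the catalog contents are suspect and need checking or rebuilding. */
static bool catalog_suspect_after_crash(const CatalogFile *cat)
{
    return cat->dirty;
}
```

The point of the hack is the narrow window: the marker is set only while a write is actually in flight, so most crashes leave the catalogs provably untouched.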
[HACKERS] Re: Otvet: Otvet: WAL and indexes (Re: [HACKERS] WAL status todo)
"Mikheev, Vadim" [EMAIL PROTECTED] writes:
> And how could I use such records on recovery, being unable to know
> which data columns represent keys, and what functions should be used
> for ordering?

Um, that's not built into the index either, is it? OK, you win ...

I'm still nervous about how we're going to test the WAL code adequately
for the lesser-used index types. Any ideas out there?

regards, tom lane
[HACKERS] Otvet: [HACKERS] Otvet: WAL and indexes (Re: [HACKERS] WAL status todo)
> > One of the purposes of WAL is immediate removal of tuples inserted by
> > aborted transactions. I want to make VACUUM *optional* in the future -
> > space must be available for reuse without VACUUM. And this is a
> > first, very small, step in this direction.
>
> Why would vacuum become optional? Would WAL offer an option to not
> reclaim free space? We're hoping that vacuum becomes unneeded

Reclaiming free space is an issue for the storage manager, as I have
said here many times. WAL is just a Write-Ahead Log (first write to the
log, then to the data files, to have the ability to recover using log
data), and as far as space is concerned it can only help by deleting
tuples inserted by aborted transactions.

> when postgresql is run with some flag indicating that we're
> uninterested in time travel.

Time travel went away ~3 years ago, yet vacuum was needed all these
years and will still be needed to reclaim space in 7.1.

> How much longer do you estimate until you can make it work that way?

Hopefully in 7.2.

Vadim
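The write-ahead rule Vadim spells out ("first write to log, then to data files") can be demonstrated with a toy in-memory store. This is a deliberately simplified sketch with invented structures, not anything resembling PostgreSQL's actual xlog code:

```c
#define NPAGES 4
#define LOGCAP 16

typedef struct { int page; int value; } LogRec;

typedef struct {
    int    data[NPAGES];   /* stands in for the data files */
    LogRec log[LOGCAP];    /* stands in for the write-ahead log */
    int    nlog;
} Store;

/* The write-ahead rule: append to the log BEFORE touching the data page.
 * If the data write is lost in a crash, the log still describes it. */
static void store_update(Store *s, int page, int value)
{
    s->log[s->nlog++] = (LogRec){ page, value };  /* 1. log first */
    s->data[page] = value;                        /* 2. then data */
}

/* Recovery: replay the log over the (possibly stale) data pages. */
static void store_recover(Store *s)
{
    for (int i = 0; i < s->nlog; i++)
        s->data[s->log[i].page] = s->log[i].value;
}
```

Note what the sketch does and does not do: it recovers committed changes, but nothing in it reclaims space, which is Vadim's point about free-space reclamation belonging to the storage manager rather than to WAL.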
Re: [HACKERS] Re: Otvet: Otvet: WAL and indexes (Re: [HACKERS] WAL status todo)
"Mikheev, Vadim" [EMAIL PROTECTED] writes:
> > And how could I use such records on recovery, being unable to know
> > which data columns represent keys, and what functions should be used
> > for ordering?
>
> Um, that's not built into the index either, is it? OK, you win ...
>
> I'm still nervous about how we're going to test the WAL code adequately
> for the lesser-used index types. Any ideas out there?

Wait for bug reports? :-)

--
Bruce Momjian | http://candle.pha.pa.us
[EMAIL PROTECTED] | (610) 853-3000
+ If your life is a hard drive, | 830 Blythe Avenue
+ Christ can be your backup. | Drexel Hill, Pennsylvania 19026
Re: [HACKERS] WAL status todo
On Sat, 14 Oct 2000, Vadim Mikheev wrote:
> Well, hopefully WAL will be ready for alpha testing in a few days.
> Unfortunately at the moment I have to step aside from the main stream
> to implement new file naming, the biggest todo for integrating WAL
> into the system. I would really appreciate any help with the following
> issues (testing can start regardless of their status, but they must be
> resolved anyway):

I have downloaded the source via CVSup. Where can I find the WAL and
the TOAST code?

Thanks!!

--
"And I'm happy, because you make me feel good, about me." - Melvin Udall
-
Martín Marqués                email: [EMAIL PROTECTED]
Santa Fe - Argentina          http://math.unl.edu.ar/~martin/
Administrador de sistemas en math.unl.edu.ar
-
[HACKERS] Otvet: [HACKERS] WAL status todo
> > Well, hopefully WAL will be ready for alpha testing in a few days.
> > Unfortunately at the moment I have to step aside from the main
> > stream to implement new file naming, the biggest todo for
> > integrating WAL into the system. I would really appreciate any help
> > with the following issues (testing can start regardless of their
> > status, but they must be resolved anyway):
>
> I have downloaded the source via CVSup. Where can I find the WAL and
> the TOAST code?

The HEAP/BTREE-related WAL code is in src/backend/access/{heap|nbtree}/,
#ifdef-ed with XLOG.

Vadim
[HACKERS] WAL status todo
Well, hopefully WAL will be ready for alpha testing in a few days.
Unfortunately at the moment I have to step aside from the main stream
to implement new file naming, the biggest todo for integrating WAL into
the system.

I would really appreciate any help with the following issues (testing
can start regardless of their status, but they must be resolved anyway):

1. BTREE: sometimes WAL can't guarantee the right order of items on
   leaf pages after recovery - a new flag, BTP_REORDER, was introduced
   to mark such pages. Btree should be changed to handle this case in
   normal processing mode.

2. HEAP: like 1., this issue is the result of an attempt to go without
   compensation records (i.e. without logging undo operations): it's
   possible that sometimes during redo there will be no space for new
   records, because in recovery we don't undo changes for aborted
   transactions immediately - a function like BTREE's _bt_cleanup_page_
   is required for HEAP, as well as a general inspection of all places
   where HEAP's redo ops try to insert records. (Initially I thought
   that in recovery we would undo changes immediately after reading an
   abort record from the log - this wouldn't work for BTREE: splits
   must be redone before undo.)

3. There are no redo/undo routines for HASH, RTREE and GIST yet. It
   would be *really really great* if someone could implement them,
   using BTREE's redo/undo code as a prototype.

These are the most complex parts of this todo. Probably, something else
will follow later.

Regards,
Vadim
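Item 1 above can be illustrated with a toy sketch: a recovered leaf page carries the BTP_REORDER flag, and normal btree code must restore item order before relying on binary search. The flag name is from the mail; the page layout and function names are invented, and plain ints stand in for index tuples.

```c
#include <stdlib.h>
#include <stdint.h>

#define BTP_REORDER 0x0008   /* recovery may have left items out of order */

typedef struct {
    uint16_t flags;
    int      nitems;
    int      keys[32];       /* stand-in for index tuples */
} BtLeafPage;

static int cmp_int(const void *a, const void *b)
{
    return *(const int *)a - *(const int *)b;
}

/* Called in normal processing before binary-searching a leaf page:
 * if recovery flagged the page, sort its items and clear the flag. */
static void bt_fix_page_order(BtLeafPage *page)
{
    if (page->flags & BTP_REORDER) {
        qsort(page->keys, page->nitems, sizeof(int), cmp_int);
        page->flags &= ~BTP_REORDER;
    }
}
```

This is the shape of the change item 1 asks for: the ordering fix-up is deferred from recovery into the first normal-mode access to the page.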