Re: [HACKERS] Enabling Checksums
On 18 December 2012 02:21, Jeff Davis pg...@j-davis.com wrote: On Mon, 2012-12-17 at 19:14 +, Simon Riggs wrote: We'll need a way of expressing some form of corruption tolerance. zero_damaged_pages is just insane,

The main problem I see with zero_damaged_pages is that it could potentially write out the zero page, thereby really losing your data if it wasn't already lost. (Of course, we document that you should have a backup first, but it's still dangerous.) I assume that this is the same problem you are talking about.

I think we should discuss whether we accept my premise: checksums will actually detect more errors than we see now, and people will want to do something about that. Returning to backup is one way of handling it, but on a busy production system with pressure on, there is an incentive to implement a workaround, not a fix. It's not an easy call to say "we've got 3 corrupt blocks, so I'm going to take the whole system offline while I restore from backup". And if you do restore from backup, and the backup also contains the 3 corrupt blocks, what then?

Clearly part of the response could involve pg_dump on the damaged structure, at some point.

I suppose we could have a new ReadBufferMaybe function that would only be used by a sequential scan; and then just skip over the page if it's corrupt, depending on a GUC. That would at least allow sequential scans to (partially) work, which might be good enough for some data recovery situations. If a catalog index is corrupted, that could just be rebuilt. Haven't thought about the details, though.

Not sure if you're being facetious here or not. Mild reworking of the logic for heap page access could cope with a NULL buffer response and subsequent looping, which would allow us to run pg_dump against a damaged table so data can be saved, keeping the file intact for further analysis. I'm suggesting we work a little harder than "your block is corrupt" and give some thought to what the user will do next. Indexes are a good case, because we can/should report the block error, mark the index as invalid and then hint that it should be rebuilt.

-- Simon Riggs http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services

-- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
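For context, a minimal sketch of the workaround being called insane here (the table name is hypothetical; zero_damaged_pages is a real, superuser-only GUC, and enabling it means any page that fails header validation is zeroed in memory as it is read, and possibly written back later, so its contents are unrecoverable):

    SET zero_damaged_pages = on;          -- destructive: damaged pages are zeroed as they are read
    SELECT count(*) FROM damaged_table;   -- hypothetical table; forces every page to be read
    RESET zero_damaged_pages;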
Re: [HACKERS] [ADMIN] Problems with enums after pg_upgrade
On 12/18/2012 02:41 AM, Bruce Momjian wrote: On Mon, Dec 17, 2012 at 12:14:29PM +0100, Bernhard Schrader wrote: Hello together, last Thursday I upgraded one of our 9.0.6 PostgreSQL servers to 9.2.2 with pg_upgrade. So far everything seemed to work, but we now discover problems with the enum types. If we run one specific query it breaks every time with such an error message: ERROR: invalid internal value for enum: 520251 If this number is supposed to represent the enumtypid, it does not exist anymore in pg_enum. How could I solve this problem? Should we regenerate all enums? Or what could we do? Hopefully anyone has a clue; Google doesn't seem to be the resource for this problem.

We seriously tested the enum code so I am pretty confused why this is failing. If you do pg_dump --binary-upgrade --schema-only, do you see a number like this being defined just before the enum is added?

Hi Bruce, if I dump this db and grep through the dump, I can't find the number. As far as we can see, the enum that is affected now has the enumtypid 16728. Is there a table which keeps the possible typecasts from enum to text / text to enum etc.? If so, maybe the mapping in there is corrupt since the upgrade.

regards

-- Bernhard Schrader System Administration InnoGames GmbH Harburger Schloßstraße 28 (Channel 4) - 21079 Hamburg - Germany Tel +49 40 7889335-53 Fax +49 40 7889335-22 Managing Directors: Hendrik Klindworth, Eike Klindworth, Michael Zillmer VAT-ID: DE264068907 Amtsgericht Hamburg, HRB 108973 http://www.innogames.com – bernhard.schra...@innogames.de

-- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] WIP: index support for regexp search
On Tue, Dec 18, 2012 at 11:45 AM, Erik Rijkers e...@xs4all.nl wrote: On Tue, December 18, 2012 08:04, Alexander Korotkov wrote: I ran the same test again: HEAD versus trgm_regex v6, 7 and 9. In v9 there is some gain but also some regression. It remains a difficult problem... If I get some time in the holidays I'll try to diversify the test program; it is now too simple.

Note that regexes which contain {,n} are likely not what you expect.

test=# select 'xq' ~ 'x[aeiou]{,2}q';
 ?column?
----------
 f
(1 row)

test=# select 'xa{,2}q' ~ 'x[aeiou]{,2}q';
 ?column?
----------
 t
(1 row)

You should use {0,n} to express from 0 to n occurrences.

-- With best regards, Alexander Korotkov.
Re: [HACKERS] WIP: index support for regexp search
On Tue, December 18, 2012 09:45, Alexander Korotkov wrote: You should use {0,n} to express from 0 to n occurrences. Thanks, but I know that of course. It's a testing program; and in the end robustness with unexpected or even wrong input is as important as performance. (To put it bluntly, I am also trying to get your patch to fall over ;-)) Thanks, Erik Rijkers -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Enabling Checksums
On 12/18/12 3:17 AM, Simon Riggs wrote: Clearly part of the response could involve pg_dump on the damaged structure, at some point. This is the main thing I wanted to try out more, once I have a decent corruption generation tool. If you've corrupted a single record but can still pg_dump the remainder, that seems the best we can do to help people recover from that. Providing some documentation on how to figure out what rows are in that block, presumably by using the contrib inspection tools, would be helpful too. Indexes are a good case, because we can/should report the block error, mark the index as invalid and then hint that it should be rebuilt. Marking a whole index invalid because there's one bad entry has enough downsides that I'm not sure how much we'd want to automate that. Not having that index available could easily result in an effectively down system due to low performance. The choices are uglier if it's backing a unique constraint. In general, what I hope people will be able to do is switch over to their standby server, and then investigate further. I think it's unlikely that people willing to pay for block checksums will only have one server. Having some way to nail down if the same block is bad on a given standby seems like a useful interface we should offer, and it shouldn't take too much work. Ideally you won't find the same corruption there. I'd like a way to check the entirety of a standby for checksum issues, ideally run right after it becomes current. It seems the most likely way to see corruption on one of those is to replicate a corrupt block. There is no good way to make the poor soul who has no standby server happy here. You're just choosing between bad alternatives. The first block error is often just that--the first one, to be joined by others soon afterward. My experience at how drives fail says the second error is a lot more likely after you've seen one. -- Greg Smith 2ndQuadrant USg...@2ndquadrant.com Baltimore, MD PostgreSQL Training, Services, and 24x7 Support www.2ndQuadrant.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
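As a sketch of the kind of documentation being asked for above (the table name and block number are made up, and this assumes the contrib pageinspect module), heap_page_items() lists the line pointers and tuple headers of a single block, which is usually enough to work out which rows live in the suspect page:

    CREATE EXTENSION pageinspect;

    SELECT lp, lp_off, lp_len, t_xmin, t_xmax, t_ctid
    FROM heap_page_items(get_raw_page('some_table', 123));

If the block itself is unreadable this will fail as well, but even then the block number tells you that the affected rows are the ones with ctid (123,*).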
Re: [HACKERS] WIP: index support for regexp search
On Tue, Dec 18, 2012 at 12:51 PM, Erik Rijkers e...@xs4all.nl wrote: On Tue, December 18, 2012 09:45, Alexander Korotkov wrote: You should use {0,n} to express from 0 to n occurrences. Thanks, but I know that of course. It's a testing program; and in the end robustness with unexpected or even wrong input is as important as performance. (To put it bluntly, I am also trying to get your patch to fall over ;-))

I found that most of the regressions in the 0.9 version are in the {,n} cases. The new version of the patch uses more trigrams than the previous versions. For example, for the regex 'x[aeiou]{,2}q', the 0.7 version uses the trigrams '__2', '_2_' and '__q', while the 0.9 version uses the trigrams 'xa_', 'xe_', 'xi_', 'xo_', 'xu_', '__2', '_2_' and '__q'. But actually the trigrams '__2' and '_2_' never occur; it is enough to have one of them, and all the others just cause a slowdown. At the same time, we can't reasonably decide which trigrams to use without knowing their frequencies. For example, if the trigrams 'xa_', 'xe_', 'xi_', 'xo_' and 'xu_' were altogether more rare than '__2', the newer version of the patch would be faster.

-- With best regards, Alexander Korotkov.
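Not part of the patch under discussion, but as a quick way to see the plain-string side of the problem: the stock pg_trgm extension's show_trgm() shows which trigrams are extracted from a given string, i.e. what an index lookup ultimately has to match against (words are padded with roughly two leading and one trailing blank):

    CREATE EXTENSION IF NOT EXISTS pg_trgm;
    SELECT show_trgm('xaeq');   -- yields trigrams along the lines of "  x", " xa", "xae", "aeq", "eq "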
Re: [HACKERS] Error restoring from a base backup taken from standby
On 18.12.2012 00:35, Simon Riggs wrote: On 17 December 2012 17:39, Heikki Linnakangas hlinnakan...@vmware.com wrote:

(This is different from the other issue related to timeline switches I just posted about. There's no timeline switch involved in this one.) If you do pg_basebackup -x against a standby server, in some circumstances the backup fails to restore with an error like this:

C 2012-12-17 19:09:44.042 EET 7832 LOG: database system was not properly shut down; automatic recovery in progress
C 2012-12-17 19:09:44.091 EET 7832 LOG: record with zero length at 0/1764F48
C 2012-12-17 19:09:44.091 EET 7832 LOG: redo is not required
C 2012-12-17 19:09:44.091 EET 7832 FATAL: WAL ends before end of online backup
C 2012-12-17 19:09:44.091 EET 7832 HINT: All WAL generated while online backup was taken must be available at recovery.
C 2012-12-17 19:09:44.092 EET 7831 LOG: startup process (PID 7832) exited with exit code 1
C 2012-12-17 19:09:44.092 EET 7831 LOG: aborting startup due to startup process failure

I spotted this bug while reading the code, and it took me quite a while to actually construct a test case to reproduce the bug, so let me begin by discussing the code where the bug is. You get the above error, "WAL ends before end of online backup", when you reach the end of WAL before reaching the backupEndPoint stored in the control file, which originally comes from the backup_label file. backupEndPoint is only used in a base backup taken from a standby; in a base backup taken from the master, the end-of-backup WAL record is used instead to mark the end of backup. In the xlog redo loop, after replaying each record, we check if we've just reached backupEndPoint, and clear it from the control file if we have. Now the problem is, if there are no WAL records after the checkpoint redo point, we never even enter the redo loop, so backupEndPoint is not cleared even though it's reached immediately after reading the initial checkpoint record.

To deal with the similar situation wrt. reaching consistency for hot standby purposes, we call CheckRecoveryConsistency() before the redo loop. The straightforward fix is to copy-paste the check for backupEndPoint to just before the redo loop, next to the CheckRecoveryConsistency() call. Even better, I think we should move the backupEndPoint check into CheckRecoveryConsistency(). It's already responsible for keeping track of whether minRecoveryPoint has been reached, so it seems like a good idea to do this check there as well. Attached is a patch for that (for 9.2), as well as a script I used to reproduce the bug. The script is a bit messy, and requires tweaking the paths at the top. Anyone spot a problem with this?

Yep. The problem is one of design, not merely a coding error. The design assumes that the master is up, connected and doing work. Which is perfectly reasonable, since this is the reason we are doing a backup from the standby. But obviously not always true, so what we're saying is that the selection of backupEndPoint is incorrect in the first place. Putting code in to cope with that poor choice at recovery seems wrong - we should record the correct location.

Hmm, pg_controldata says:

pg_control version number:            922
Catalog version number:               201204301
Database system identifier:           5823207860608399803
Database cluster state:               in crash recovery
pg_control last modified:             ti 18. joulukuuta 2012 11.00.58
Latest checkpoint location:           0/1764EE8
Prior checkpoint location:            0/1764EE8
Latest checkpoint's REDO location:    0/1764EE8
Latest checkpoint's TimeLineID:       1
...
Minimum recovery ending location:     0/1764F48
Backup start location:                0/1764EE8
Backup end location:                  0/1764F48
...

That seems correct to me. The backup is considered valid after reaching 0/1764F48, which is where the checkpoint record ends. minRecoveryPoint is set to the same. What do you think it should be set to?

In passing, I see another problem as well. We assume that the backup is still valid if we promote the standby to a master during the backup, but that isn't documented and so isn't fully analysed to see that it is true.

No, that is taken care of in do_pg_stop_backup:

    /*
     * Parse the BACKUP FROM line. If we are taking an online backup from the
     * standby, we confirm that the standby has not been promoted during the
     * backup.
     */
    ptr = strstr(remaining, "BACKUP FROM:");
    if (!ptr || sscanf(ptr, "BACKUP FROM: %19s\n", backupfrom) != 1)
        ereport(ERROR,
                (errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
                 errmsg("invalid data in file \"%s\"", BACKUP_LABEL_FILE)));
    if (strcmp(backupfrom, "standby") == 0 && !backup_started_in_recovery)
        ereport(ERROR,
Re: [HACKERS] Error restoring from a base backup taken from standby
On 18 December 2012 09:18, Heikki Linnakangas hlinnakan...@vmware.com wrote: That seems correct to me. The backup is considered valid after reaching 0/1764F48, which is where the checkpoint record ends. minRecoveryPoint is set to the same. What do you think it should be set to? I already said? -- Simon Riggs http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Parser Cruft in gram.y
Robert Haas robertmh...@gmail.com writes: And on the other hand, if you could get a clean split between the two grammars, then regardless of exactly what the split was, it might seem a win. But it seemed to me when I looked at this that you'd have to duplicate a lot of stuff and the small parser still wouldn't end up being very small, which I found hard to get excited about.

I think the goal is not so much getting a much smaller parser as having a separate parser whose bloat you don't care about, so that you can improve DDL without fearing performance regressions in the main parser. If we come out with no regression and no gain on the main query parser, I say it is still worth the separation effort. And anyway we only add things to the main parser (queries) when the standard saith we have to.

Tom Lane t...@sss.pgh.pa.us writes: I'm not sure what other tool might be better though. I looked through http://en.wikipedia.org/wiki/Comparison_of_parser_generators#Deterministic_context-free_languages but it seems our options are a bit limited if we want something that produces C. It's not clear to me that any of the likely options are as mature as bison, let alone likely to substantially outperform it. (For instance, Hyacc sounded pretty promising until I got to the part about "it doesn't yet support %union or %type".) Still, I didn't spend much time on this --- maybe somebody else would like to do a bit more research.

I spent a very little time on it too, with a different search angle, and found an experimental thing that might be worth looking at, or maybe not.

http://en.wikipedia.org/wiki/Parsing_expression_grammar
http://piumarta.com/software/peg/

Regards, -- Dimitri Fontaine http://2ndQuadrant.fr PostgreSQL : Expertise, Formation et Support

-- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Error restoring from a base backup taken from standby
On 18.12.2012 11:30, Simon Riggs wrote: On 18 December 2012 09:18, Heikki Linnakangas hlinnakan...@vmware.com wrote: That seems correct to me. The backup is considered valid after reaching 0/1764F48, which is where the checkpoint record ends. minRecoveryPoint is set to the same. What do you think it should be set to? I already said?

In this situation, there are no more WAL records in the standby the backup was taken from, after the checkpoint record. Using receivedUpto would yield the same value, and you'd still have the same problem.

- Heikki

-- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] pg_basebackup from cascading standby after timeline switch
On 18 December 2012 00:53, Tom Lane t...@sss.pgh.pa.us wrote: Simon Riggs si...@2ndquadrant.com writes: On 17 December 2012 14:16, Heikki Linnakangas hlinnakan...@vmware.com wrote: I also wonder if pg_basebackup should include *all* timeline history files in the backup, not just the latest one strictly required to restore. Why? the timeline history file includes the previous timelines already. The original intention was that the WAL archive might contain multiple timeline files corresponding to various experimental recovery attempts; furthermore, such files might be hand-annotated (that's why there's a comment provision). So they would be at least as valuable from an archival standpoint as the WAL files proper. I think we ought to just copy all of them, period. Anything less is penny-wise and pound-foolish. What I'm saying is that the new history file is created from the old one, so the latest file includes all the history as a direct copy. The only thing new is one line of information. Copying all files grows at O(N^2) with redundancy and will eventually become a space problem and a performance issue for smaller systems. There should be some limit to keep this sane, for example, the last 32 history files, or the last 1000 lines of history. Some sane limit. -- Simon Riggs http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Set visibility map bit after HOT prune
On Sunday, December 16, 2012 11:14 PM Tom Lane wrote: Pavan Deolasee pavan.deola...@gmail.com writes: On Sun, Dec 16, 2012 at 3:10 PM, Simon Riggs si...@2ndquadrant.com wrote: As explained above, I disagree that it looks like a good idea, and you've shown no evidence it would be or is true.

Lets separate out these two issues. What you are suggesting as a follow up to Tom's idea, I've no objection to that, and it might be a worthwhile optimisation to try out. But this patch itself does not attempt to deal with that; it is a separate work item and will require invasive changes and tests.

Another thing that would need to be considered, if we do want to restrict when pruning happens, is whether it is worth introducing some other path altogether for setting the all-visible bit. Or perhaps we should modify autovacuum's rules so that it will fire on relations for which there might be lots of unmarked all-visible pages.

Can something similar also be used for putting deleted index pages into the FSM? The reason is that currently deleted index pages are recorded in the FSM only in the next vacuum cycle. So after a bulk index update (with always-increasing values), even if autovacuum has run once, it still does not put the deleted index pages into the FSM. Now assume there are no operations which can lead to another autovacuum on the same table; the next cycle of bulk updates will then allocate new index pages. I had observed in one of my tests that if a bulk index update happens such that the new value is always increasing, then index bloat happens. As per my initial analysis, it seems one of the reasons is what I described above. If required, I can create a self-contained test which shows that bulk index updates can lead to index bloat.

With Regards, Amit Kapila.

-- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
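A self-contained sketch of the scenario described above (the table name and row counts are made up; the point is that every bulk update moves all keys past the right end of the index, and, per the observation above, pages emptied in the previous cycle are not yet reusable):

    CREATE TABLE bloat_test (id bigint PRIMARY KEY, payload text);
    INSERT INTO bloat_test SELECT g, 'x' FROM generate_series(1, 100000) g;

    SELECT pg_size_pretty(pg_relation_size('bloat_test_pkey'));  -- baseline index size

    UPDATE bloat_test SET id = id + 100000;  -- every new key sorts after all existing keys
    VACUUM bloat_test;                       -- emptied index pages are not yet back in the FSM
    UPDATE bloat_test SET id = id + 100000;  -- so this cycle allocates fresh pages again

    SELECT pg_size_pretty(pg_relation_size('bloat_test_pkey'));  -- noticeably larger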
[HACKERS] review: pgbench - aggregation of info written into log
Hello, this patch was proposed in http://archives.postgresql.org/pgsql-hackers/2012-08/msg00817.php and there were no objections.

* there are no issues applying the patch
* no warnings
* the code is clean and respects our source code policy
* tested without errors

My last objection was addressed - the last interval isn't logged, but that is better, because it can be incomplete, so it is better to drop it.

This patch is ready for commit.

Regards Pavel

-- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] XLByte* usage
Andres Freund and...@2ndquadrant.com writes: In 2) unfortunately one has to make a decision about which way to simplify negated XLByte(LT|LE) expressions. I tried to make that choice very carefully and went over every change several times after that, so I hope there aren't any bad changes, but more eyeballs are needed.

+ if (lsn < PageGetLSN(page))
+ if (lsn <= PageGetLSN(page))
+ if (lsn >= PageGetLSN(page))
+ if (max_lsn < this_lsn)

My only comment here would be that I would like it much better if the code always used the same operators and, as we read from left to right, that we pick < and <=.

Regards, -- Dimitri Fontaine http://2ndQuadrant.fr PostgreSQL : Expertise, Formation et Support

-- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] XLByte* usage
On 2012-12-18 13:14:10 +0100, Dimitri Fontaine wrote: Andres Freund and...@2ndquadrant.com writes: In 2) unfortunately one has to make a decision about which way to simplify negated XLByte(LT|LE) expressions. I tried to make that choice very carefully and went over every change several times after that, so I hope there aren't any bad changes, but more eyeballs are needed.

+ if (lsn < PageGetLSN(page))
+ if (lsn <= PageGetLSN(page))
+ if (lsn >= PageGetLSN(page))
+ if (max_lsn < this_lsn)

My only comment here would be that I would like it much better if the code always used the same operators and, as we read from left to right, that we pick < and <=.

I did that in most places (check the xlog.c changes), but in this case it didn't seem to be appropriate because that would have meant partially reversing the order of operands, which would have looked confusing as well, given this check is a common pattern across nearly every xlog replay function. Imo it's a good idea to try to choose < or <= as the operator, but it's also important to keep the order of operands consistent inside a single function/similar functions.

Greetings, Andres Freund

-- Andres Freund http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services

-- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
[HACKERS] Review : Add hooks for pre- and post-processor executables for COPY and \copy
Basic stuff:
- Rebase of the patch is required.
- Compiles cleanly without any errors/warnings.
- Regression tests pass.

What it does:
- This patch is useful when the COPY command's input/output is stored in a compressed format, or when some command/script produces the input or consumes the output, without generating intermediate temporary files. This feature can be used on the server side via the COPY statement by an administrator, or via psql's internal \copy command by any user.

Code review comments:
1. Modify the comment in the function header of parse_slash_copy (it needs to be updated for the new syntax).
2. Comments for the functions OpenPipeStream and ClosePipeStream are missing.
3. Script errors are not directly visible to the user; if there are problems in the script there is no way to clean up. Shouldn't this be mentioned in the user manual?

Test case issues:
1. Broken pipe is not handled in the case of the psql \copy command. The issues are as follows (verified on SuSE Linux 10.2):
   1) psql exits when a \copy ... TO command is issued and the command/script is not found. When popen is called in write mode it creates a valid file descriptor, and when psql tries to write to the file a "Broken pipe" error occurs, which is not handled.
      psql# \copy pgbench_accounts TO PROGRAM '../compress.sh pgbench_accounts4.txt'
   2) When a \copy command is in progress and the program/command is killed or crashes due to any problem, psql exits.

Scripts used in the test cases:
1. compress.sh
   echo 'cat $1' > compress.sh
   echo 'bzip2 -z $1' >> compress.sh
   chmod +x compress.sh
2. decompress.sh
   echo 'bzip2 -d -c -k $*' > decompress.sh
   chmod +x decompress.sh

The test cases executed are attached with this mail.

With Regards, Amit Kapila.

--1. check success case admin can use this COPY statement feature.
select count(*) from pgbench_accounts;
copy pgbench_accounts TO PROGRAM '/home/kiran/installation/bin/compress.sh /home/kiran/installation/bin/pgbench_accounts1.txt';
delete from pgbench_accounts;
copy pgbench_accounts FROM PROGRAM '/home/kiran/installation/bin/decompress.sh /home/kiran/installation/bin/pgbench_accounts1.txt.bz2';
select count(*) from pgbench_accounts;

--2. check commands are running with relative path to database directory.
copy pgbench_accounts TO PROGRAM '../compress.sh ../pgbench_accounts4.txt';
delete from pgbench_accounts;
copy pgbench_accounts FROM PROGRAM '../decompress.sh ../pgbench_accounts4.txt.bz2';
select count(*) from pgbench_accounts;

--3. invalid case; where command not found
select count(*) from pgbench_accounts;
copy pgbench_accounts TO PROGRAM 'compress.sh pgbench_accounts1.txt';
copy pgbench_accounts FROM PROGRAM 'decompress.sh pgbench_accounts1.txt.bz2';

--4. invalid case; where file / script having problem.
select count(*) from pgbench_accounts;
copy pgbench_accounts TO PROGRAM '../compress.sh abc/pgbench_accounts1.txt';
copy pgbench_accounts FROM PROGRAM '../decompress.sh abc/pgbench_accounts1.txt.bz2';

--repeat all above cases from psql command.
--5. check success case admin can use this \COPY psql feature.
select count(*) from pgbench_accounts;
\copy pgbench_accounts TO PROGRAM '/home/kiran/installation/bin/compress.sh /home/kiran/installation/bin/pgbench_accounts1.txt';
delete from pgbench_accounts;
\copy pgbench_accounts FROM PROGRAM '/home/kiran/installation/bin/decompress.sh /home/kiran/installation/bin/pgbench_accounts1.txt.bz2';
select count(*) from pgbench_accounts;

--6. check commands are running with relative path to psql current working directory.
\copy pgbench_accounts TO PROGRAM '../compress.sh ../pgbench_accounts4.txt';
delete from pgbench_accounts;
\copy pgbench_accounts FROM PROGRAM '../decompress.sh ../pgbench_accounts4.txt.bz2';
select count(*) from pgbench_accounts;

--7. invalid case; where command not found
select count(*) from pgbench_accounts;
\copy pgbench_accounts TO PROGRAM 'compress.sh pgbench_accounts1.txt';
\copy pgbench_accounts FROM PROGRAM 'decompress.sh pgbench_accounts1.txt.bz2';

--8. invalid case; where file / script having problem.
select count(*) from pgbench_accounts;
\copy pgbench_accounts TO PROGRAM '../compress.sh abc/pgbench_accounts1.txt';
\copy pgbench_accounts FROM PROGRAM '../decompress.sh abc/pgbench_accounts1.txt.bz2';

--9. Normal user can't execute COPY statement
create user abc;
\c abc
select count(*) from pgbench_accounts;
copy pgbench_accounts TO PROGRAM '/home/kiran/installation/bin/compress.sh /home/kiran/installation/bin/pgbench_accounts1.txt';
delete from pgbench_accounts;
copy pgbench_accounts FROM PROGRAM '/home/kiran/installation/bin/decompress.sh /home/kiran/installation/bin/pgbench_accounts1.txt.bz2';
select count(*) from pgbench_accounts;

--10. Normal user can execute \COPY psql
Re: [HACKERS] configure.in and setproctitle/optreset problem
Re: Tom Lane 2012-12-18 26465.1355798...@sss.pgh.pa.us I think we should assume that the libedit developers are utterly clueless about not trampling on application namespace, and just cut that library out of *all* our link checks except for the symbols we specifically expect to get from libedit/libreadline. Hence, I propose the attached slightly more extensive patch. I've verified that this produces a working build with Debian's libedit. I confirm that your patch fixes the problem on master and REL9_2_STABLE (manually applied, one hunk failed). Thanks! Christoph -- Tel.: +49 (0)21 61 / 46 43-187 credativ GmbH, HRB Mönchengladbach 12080 Hohenzollernstr. 133, 41061 Mönchengladbach Geschäftsführung: Dr. Michael Meskes, Jörg Folz -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Enabling Checksums
Greg Smith wrote: In general, what I hope people will be able to do is switch over to their standby server, and then investigate further. I think it's unlikely that people willing to pay for block checksums will only have one server. Having some way to nail down if the same block is bad on a given standby seems like a useful interface we should offer, and it shouldn't take too much work. Ideally you won't find the same corruption there. I'd like a way to check the entirety of a standby for checksum issues, ideally run right after it becomes current. It seems the most likely way to see corruption on one of those is to replicate a corrupt block. There is no good way to make the poor soul who has no standby server happy here. You're just choosing between bad alternatives. The first block error is often just that--the first one, to be joined by others soon afterward. My experience at how drives fail says the second error is a lot more likely after you've seen one. +1 on all of that. -Kevin -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] configure.in and setproctitle/optreset problem
Peter Eisentraut pete...@gmx.net writes: On Mon, 2012-12-17 at 18:02 +0100, Christoph Berg wrote: I have no clue why no one else has seen this bug before, but the reason for the error seems to be that configure is invoking the setproctitle test including -ledit. libedit.so is linked to libbsd.so, which in turn contains setproctitle(), so HAVE_SETPROCTITLE is set even this is Linux, not BSD. I reported this here: http://archives.postgresql.org/pgsql-bugs/2012-07/msg00127.php I thought the issue sounded familiar ... The correct fix, IMO/IIRC, is to add LDFLAGS=-Wl,--as-needed before running most of the configure checks, instead of after. Meh. It's not clear to me at all that that fixes the issue here, or at least that it does so in any way that's reliable. The proposal to add --as-needed during configure was made to fix a different problem, namely making the wrong decision about whether libintl needs to be pulled in explicitly. We don't seem to have done anything about that --- personally I was afraid that it could break as many cases as it fixed. Reviewing that thread, I still think that swapping the order of the probes for libintl and libgssapi is the safest fix for the previous issue. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Enabling Checksums
There is no good way to make the poor soul who has no standby server happy here. You're just choosing between bad alternatives. The first block error is often just that--the first one, to be joined by others soon afterward. My experience at how drives fail says the second error is a lot more likely after you've seen one.

For what it's worth, Oracle allows you to recover a specific block from backups, including replaying the archive logs for that one block.

-- greg

-- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] [ADMIN] Problems with enums after pg_upgrade
On 12/18/2012 02:41 AM, Bruce Momjian wrote: On Mon, Dec 17, 2012 at 12:14:29PM +0100, Bernhard Schrader wrote: Hello together, last Thursday I upgraded one of our 9.0.6 PostgreSQL servers to 9.2.2 with pg_upgrade. So far everything seemed to work, but we now discover problems with the enum types. If we run one specific query it breaks every time with such an error message: ERROR: invalid internal value for enum: 520251 If this number is supposed to represent the enumtypid, it does not exist anymore in pg_enum. How could I solve this problem? Should we regenerate all enums? Or what could we do? Hopefully anyone has a clue; Google doesn't seem to be the resource for this problem.

We seriously tested the enum code so I am pretty confused why this is failing. If you do pg_dump --binary-upgrade --schema-only, do you see a number like this being defined just before the enum is added?

Hi Bruce, if I dump this db and grep through the dump, I can't find the number. As far as we can see, the enum that is affected now has the enumtypid 16728. Is there a table which keeps the possible typecasts from enum to text / text to enum etc.? If so, maybe the mapping in there is corrupt since the upgrade. regards

###

Hi again, maybe more information is needed to pin this down. I'm not quite sure what would be useful, so I'll just give you the things I stumbled upon.

1.) We have some staging servers, where I first used pg_upgrade to make sure everything is running and nothing breaks on our beta/live servers. And it worked there: without any problem I can use the enums which break on the beta servers.

2.) As mentioned, on the beta servers the usage of the enum fails with the error message: ERROR: invalid internal value for enum: 520251

3.) If I search for the enumtypid or oid in pg_enum, it is obviously not there.
select * from pg_enum where enumtypid=520251;
(No rows)
select * from pg_enum where oid=520251;
(No rows)

4.) If I search for the enum labels which are used by the query, I get enumtypid 16728, which also belongs to the expected pg_type.

5.) pg_enum for that enumtypid looks like this:
select oid,* from pg_enum where enumtypid=16728;
  oid   | enumtypid | enumsortorder | enumlabel
--------+-----------+---------------+-----------
  16729 |     16728 |             1 | att
  16730 |     16728 |             2 | def
  16731 |     16728 |             3 | all
 646725 |     16728 |             4 | adm_att
 646726 |     16728 |             5 | adm_def

6.) The enum labels adm_att and adm_def are also defined under another enumtypid, but I think this shouldn't affect anything; I just wanted to mention it.

7.) During pg_upgrade I used the --link method.

Well, if you need any other info please ask. I just can't imagine why this stuff worked on the staging servers but not on beta, as they are identical from a database point of view.

-- Bernhard Schrader System Administration InnoGames GmbH Harburger Schloßstraße 28 (Channel 4) - 21079 Hamburg - Germany Tel +49 40 7889335-53 Fax +49 40 7889335-22 Managing Directors: Hendrik Klindworth, Eike Klindworth, Michael Zillmer VAT-ID: DE264068907 Amtsgericht Hamburg, HRB 108973 http://www.innogames.com -- bernhard.schra...@innogames.de
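A hedged diagnostic sketch (the type, table and column names below are hypothetical): the error means some enum datum carries an OID that pg_enum no longer knows, so it can help to compare what the column's type currently defines against what the stored values decode to:

    -- the labels the type currently knows about
    SELECT oid, enumtypid, enumsortorder, enumlabel
    FROM pg_enum
    WHERE enumtypid = 'public.myenum'::regtype;

    -- force every stored value through the enum output function; if a row still
    -- carries a dangling OID such as 520251, this reproduces the error and the
    -- ctid narrows down where it is stored
    SELECT ctid, status::text FROM some_table;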
Re: [HACKERS] [ADMIN] Problems with enums after pg_upgrade
On 12/18/2012 03:45 AM, Bernhard Schrader wrote: On 12/18/2012 02:41 AM, Bruce Momjian wrote: On Mon, Dec 17, 2012 at 12:14:29PM +0100, Bernhard Schrader wrote: Hello together, last Thursday I upgraded one of our 9.0.6 PostgreSQL servers to 9.2.2 with pg_upgrade. So far everything seemed to work, but we now discover problems with the enum types. If we run one specific query it breaks every time with such an error message: ERROR: invalid internal value for enum: 520251 If this number is supposed to represent the enumtypid, it does not exist anymore in pg_enum. How could I solve this problem? Should we regenerate all enums? Or what could we do? Hopefully anyone has a clue; Google doesn't seem to be the resource for this problem.

We seriously tested the enum code so I am pretty confused why this is failing. If you do pg_dump --binary-upgrade --schema-only, do you see a number like this being defined just before the enum is added?

Hi Bruce, if I dump this db and grep through the dump, I can't find the number. As far as we can see, the enum that is affected now has the enumtypid 16728. Is there a table which keeps the possible typecasts from enum to text / text to enum etc.? If so, maybe the mapping in there is corrupt since the upgrade.

The translations from oid to label are in pg_enum, but it looks like somehow you have lost that mapping. I'm not sure what you've done, but AFAICT pg_upgrade is doing the right thing. I just did this (from 9.0 to 9.2) and the pg_upgrade_dump_all.sql that is used to create the new catalog has these lines:

-- For binary upgrade, must preserve pg_type oid
SELECT binary_upgrade.set_next_pg_type_oid('40804'::pg_catalog.oid);

-- For binary upgrade, must preserve pg_type array oid
SELECT binary_upgrade.set_next_array_pg_type_oid('40803'::pg_catalog.oid);

CREATE TYPE myenum AS ENUM (
);

-- For binary upgrade, must preserve pg_enum oids
SELECT binary_upgrade.set_next_pg_enum_oid('40805'::pg_catalog.oid);
ALTER TYPE public.myenum ADD VALUE 'foo';

SELECT binary_upgrade.set_next_pg_enum_oid('40806'::pg_catalog.oid);
ALTER TYPE public.myenum ADD VALUE 'bar';

SELECT binary_upgrade.set_next_pg_enum_oid('40807'::pg_catalog.oid);
ALTER TYPE public.myenum ADD VALUE 'baz';

and this worked exactly as expected, with a table using this type showing the expected values. Can you produce a test case demonstrating the error? When you run pg_upgrade, use the -r flag to keep all the intermediate files so we can see what's going on. It's no good dumping the new db looking for these values if they have been lost. You would need to have a physical copy of the old db and dump that in binary upgrade mode looking for the Oid. If you don't have a physical copy of the old db or the intermediate dump file pg_upgrade used, then recovery is going to be pretty difficult. It's not necessarily impossible, but it might involve you getting some outside help.

cheers

andrew

-- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] [ADMIN] Problems with enums after pg_upgrade
On Tue, Dec 18, 2012 at 10:52:46AM -0500, Andrew Dunstan wrote: The translations from oid to label are in pg_enum, but it looks like somehow you have lost that mapping. I'm not sure what you've done but AFAICT pg_upgrade is doing the right thing. I just did this (from 9.0 to 9.2) and the pg_upgrade_dump_all.sql that is used to create the new catalog has these lines: -- For binary upgrade, must preserve pg_type oid SELECT binary_upgrade.set_next_pg_type_oid('40804'::pg_catalog.oid); -- For binary upgrade, must preserve pg_type array oid SELECT binary_upgrade.set_next_array_pg_type_oid('40803'::pg_catalog.oid); CREATE TYPE myenum AS ENUM ( ); -- For binary upgrade, must preserve pg_enum oids SELECT binary_upgrade.set_next_pg_enum_oid('40805'::pg_catalog.oid); ALTER TYPE public.myenum ADD VALUE 'foo'; SELECT binary_upgrade.set_next_pg_enum_oid('40806'::pg_catalog.oid); ALTER TYPE public.myenum ADD VALUE 'bar'; SELECT binary_upgrade.set_next_pg_enum_oid('40807'::pg_catalog.oid); ALTER TYPE public.myenum ADD VALUE 'baz'; and this worked exactly as expected, with a table using this type showing the expected values. Can you produce a test case demonstrating the error? When you run pg_upgrade, use the -r flag to keep all the intermediate files so we can see what's going on. It's no good dumping the new db looking for these values if they have been lost. You would need to have a physical copy of the old db and dump that in binary upgrade mode looking for the Oid. If you don't have a physical copy of the old db or the intermediate dump file pg_upgrade used then recovery is going to be pretty difficult. It's not necessarily impossible, but it might involve you getting some outside help. Yes, this matches what I thought too. You see the binary_upgrade.set_next_pg_enum_oid() calls in pg_dump --binary-upgrade --schema-only and those set the oid of the newly created enum. I agree you would need to run this on the _old_ cluster for us to figure out how it failed. -- Bruce Momjian br...@momjian.ushttp://momjian.us EnterpriseDB http://enterprisedb.com + It's impossible for everything to be true. + -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] [GENERAL] trouble with pg_upgrade 9.0 - 9.1
On Tue, Dec 18, 2012 at 09:28:00AM +0400, Groshev Andrey wrote: 18.12.2012, 05:22, Bruce Momjian br...@momjian.us: This is the first pg_upgrade mismatch report we have gotten about 9.1. I have asked the reporter for details. Is what is the full 9.1 version number? --- # rpm -qa |grep postgres postgresql90-devel-9.0.11-1PGDG.rhel6.x86_64 postgresql91-9.1.7-1PGDG.rhel6.x86_64 postgresql90-9.0.11-1PGDG.rhel6.x86_64 postgresql90-server-9.0.11-1PGDG.rhel6.x86_64 postgresql91-libs-9.1.7-1PGDG.rhel6.x86_64 postgresql91-server-9.1.7-1PGDG.rhel6.x86_64 postgresql91-devel-9.1.7-1PGDG.rhel6.x86_64 postgresql90-libs-9.0.11-1PGDG.rhel6.x86_64 postgresql90-contrib-9.0.11-1PGDG.rhel6.x86_64 postgresql91-contrib-9.1.7-1PGDG.rhel6.x86_64 Full version ? It is not full postgresql91-9.1.7-1PGDG.rhel6.x86_64 or I do not understand something? I installed latest postgresql from the repository http://yum.pgrpms.org Oops, I see that now, sorry. I wanted to make sure you were on the most recent 9.1 version, and you are. -- Bruce Momjian br...@momjian.ushttp://momjian.us EnterpriseDB http://enterprisedb.com + It's impossible for everything to be true. + -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] [v9.3] writable foreign tables
Hello. I've tried to implement this API for our Multicorn FDW, and I have a few questions about the API.

First, I don't understand what the requirements are on the rowid pseudo-column values. In particular, this sentence from a previous mail makes it ambiguous to me: At the beginning, I constructed the rowid pseudo-column using INTERNALOID, but it caused trouble when fetched tuples are materialized prior to table modification. So, the rowid argument was re-defined as const char *.

Does that mean that one should only store a cstring in the rowid pseudo-column? Or can one store an arbitrary pointer? Currently, I'm building a text value using cstring_to_text_with_len. Could there be a problem with that?

Secondly, how does one use a RETURNING clause? I've tried to look at the postgres_fdw code, but it does not seem to handle that. For what it's worth, knowing that postgres_fdw is still in WIP status, here are a couple of results from my experimentation with it:

- An insert into a foreign table mapped to a table with a BEFORE trigger, using a RETURNING clause, will return the values as they were before being modified by the trigger.
- Same thing, but if the trigger prevents insertion (returning NULL), the would-have-been inserted row is still returned, although the insert statement reports zero rows.
- Inserting into a table with default values does not work as intended, since missing values are replaced by a null value in the remote statement.

What can be done to make the behaviour more consistent? I'm very excited about this feature, thank you for making this possible.

Regards, -- Ronan Dunklau

2012/12/14 Albe Laurenz laurenz.a...@wien.gv.at

Kohei KaiGai wrote: I came up with one more query that causes a problem: [...] This causes a deadlock, but one that is not detected; the query just keeps hanging. The UPDATE in the CTE has the rows locked, so the SELECT ... FOR UPDATE issued via the FDW connection will hang indefinitely. I wonder if that's just a pathological corner case that we shouldn't worry about. Loopback connections for FDWs themselves might not be so rare, for example as a substitute for autonomous subtransactions. I guess it is not easily possible to detect such a situation or to do something reasonable about it.

It is an unavoidable problem due to the nature of distributed database systems, not only the loopback scenario. In my personal opinion, I'd like to wait for someone to implement a distributed lock/transaction manager on top of the background worker framework recently committed, to intermediate lock requests. However, it will take a massive amount of effort on the existing lock/transaction management layer, not only enhancement of the FDW APIs. It is obviously out of scope for this patch. So, I'd like to suggest that authors of FDWs that support writable features mention the possible deadlock scenario in their documentation. At least, the above writable CTE example is a situation where two different sessions concurrently update the test relation, so one will be blocked.

Fair enough. I tried to overhaul the documentation, see the attached patch. There was one thing that I was not certain of: You say that for writable foreign tables, BeginForeignModify and EndForeignModify *must* be implemented. I thought that these were optional, and if you can do your work with just, say, ExecForeignDelete, you could do that.

Yes, that's right. What I wrote was incorrect. If the FDW driver does not require any state during modification of foreign tables, then indeed these are not mandatory handlers.
I have updated the documentation, that was all I changed in the attached patches. OK. I split the patch into two portion, part-1 is the APIs relevant patch, part-2 is relevant to postgres_fdw patch. Great. I'll mark the patch as ready for committer. Yours, Laurenz Albe -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
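To make the RETURNING inconsistency above concrete, a hedged sketch (the foreign table, columns and trigger are all hypothetical): if the remote table has a BEFORE INSERT trigger that rewrites name, plus a DEFAULT on created, then through the foreign table

    INSERT INTO my_foreign_table (id, name) VALUES (1, 'bob') RETURNING *;

was observed to return name = 'bob' (the pre-trigger value), and the omitted created column is sent as an explicit NULL in the remote statement instead of letting the remote default apply.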
Re: [HACKERS] [GENERAL] trouble with pg_upgrade 9.0 - 9.1
On Mon, Dec 17, 2012 at 09:21:59PM -0500, Bruce Momjian wrote: Mismatch of relation names: database "database", old rel public.lob.ВерсияВнешнегоДокумента$Документ_pkey, new rel public.plob.ВерсияВнешнегоДокумента$Документ Failure, exiting

I am now confused over the error message above. This is the code that is generating the error:

    /*
     * TOAST table names initially match the heap pg_class oid.
     * In pre-8.4, TOAST table names change during CLUSTER; in pre-9.0,
     * TOAST table names change during ALTER TABLE ALTER COLUMN SET TYPE.
     * In >= 9.0, TOAST relation names always use heap table oids, hence
     * we cannot check relation names when upgrading from pre-9.0.
     * Clusters upgraded to 9.0 will get matching TOAST names.
     */
    if (strcmp(old_rel->nspname, new_rel->nspname) != 0 ||
        ((GET_MAJOR_VERSION(old_cluster.major_version) >= 900 ||
          strcmp(old_rel->nspname, "pg_toast") != 0) &&
         strcmp(old_rel->relname, new_rel->relname) != 0))
        pg_log(PG_FATAL, "Mismatch of relation names: database \"%s\", "
               "old rel %s.%s, new rel %s.%s\n",
               old_db->db_name, old_rel->nspname, old_rel->relname,
               new_rel->nspname, new_rel->relname);

Looking at the Russian, I see 'old rel' public.lob.* and 'new rel' public.plob.*. I assume the database is called 'database', and the schema is called 'public', but what are 'lob' and 'plob'? If those are tables or indexes, what is after the period? Do you have periods embedded in the table/index names? That is certainly possible, but not common, e.g.:

    test=# create table "test.x" (y int);
    CREATE TABLE

Is the schema called public.lob? I expected to see schema.objname.

-- Bruce Momjian br...@momjian.us http://momjian.us EnterpriseDB http://enterprisedb.com + It's impossible for everything to be true. +

-- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
[HACKERS] system administration functions with hardcoded superuser checks
There are some system administration functions that have hardcoded superuser checks, specifically: pg_reload_conf pg_rotate_logfile pg_read_file pg_read_file_all pg_read_binary_file pg_read_binary_file_all pg_stat_file pg_ls_dir Some of these are useful in monitoring or maintenance tools, and the hardcoded superuser checks require that these tools run with maximum privileges. Couldn't we just install these functions without default privileges and allow users to grant privileges as necessary? -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
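What that would look like in practice, as a sketch (the role name is made up, and under today's hardcoded checks such a GRANT has no effect, which is exactly the point of the proposal):

    REVOKE EXECUTE ON FUNCTION pg_ls_dir(text) FROM PUBLIC;
    GRANT EXECUTE ON FUNCTION pg_ls_dir(text) TO monitoring_role;
    GRANT EXECUTE ON FUNCTION pg_stat_file(text) TO monitoring_role;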
Re: [HACKERS] system administration functions with hardcoded superuser checks
On 18 December 2012 17:09, Peter Eisentraut pete...@gmx.net wrote: There are some system administration functions that have hardcoded superuser checks, specifically: pg_reload_conf pg_rotate_logfile pg_read_file pg_read_file_all pg_read_binary_file pg_read_binary_file_all pg_stat_file pg_ls_dir Some of these are useful in monitoring or maintenance tools, and the hardcoded superuser checks require that these tools run with maximum privileges. Couldn't we just install these functions without default privileges and allow users to grant privileges as necessary? +1 But I would include recovery functions also. -- Simon Riggs http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] [GENERAL] trouble with pg_upgrade 9.0 - 9.1
Later in the pg_dump log, I found the definition of the new rel:

--
-- Name: plob.ВерсияВнешнегоДокумента$Документ; Type: CONSTRAINT; Schema: public; Owner: postgres; Tablespace:
--

ALTER TABLE ONLY "lob.ВерсияВнешнегоДокумента$Документ"
    ADD CONSTRAINT "plob.ВерсияВнешнегоДокумента$Документ" PRIMARY KEY ("@Файл", "Страница");

18.12.2012, 19:38, Bruce Momjian br...@momjian.us:

On Mon, Dec 17, 2012 at 09:21:59PM -0500, Bruce Momjian wrote: Mismatch of relation names: database "database", old rel public.lob.ВерсияВнешнегоДокумента$Документ_pkey, new rel public.plob.ВерсияВнешнегоДокумента$Документ Failure, exiting

I am now confused over the error message above. This is the code that is generating the error:

    /*
     * TOAST table names initially match the heap pg_class oid.
     * In pre-8.4, TOAST table names change during CLUSTER; in pre-9.0,
     * TOAST table names change during ALTER TABLE ALTER COLUMN SET TYPE.
     * In >= 9.0, TOAST relation names always use heap table oids, hence
     * we cannot check relation names when upgrading from pre-9.0.
     * Clusters upgraded to 9.0 will get matching TOAST names.
     */
    if (strcmp(old_rel->nspname, new_rel->nspname) != 0 ||
        ((GET_MAJOR_VERSION(old_cluster.major_version) >= 900 ||
          strcmp(old_rel->nspname, "pg_toast") != 0) &&
         strcmp(old_rel->relname, new_rel->relname) != 0))
        pg_log(PG_FATAL, "Mismatch of relation names: database \"%s\", "
               "old rel %s.%s, new rel %s.%s\n",
               old_db->db_name, old_rel->nspname, old_rel->relname,
               new_rel->nspname, new_rel->relname);

Looking at the Russian, I see 'old rel' public.lob.* and 'new rel' public.plob.*. I assume the database is called 'database', and the schema is called 'public', but what are 'lob' and 'plob'? If those are tables or indexes, what is after the period? Do you have periods embedded in the table/index names? That is certainly possible, but not common, e.g.:

    test=# create table "test.x" (y int);
    CREATE TABLE

Is the schema called public.lob? I expected to see schema.objname.

-- Bruce Momjian br...@momjian.us http://momjian.us EnterpriseDB http://enterprisedb.com + It's impossible for everything to be true. +

-- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] system administration functions with hardcoded superuser checks
2012/12/18 Peter Eisentraut pete...@gmx.net:

There are some system administration functions that have hardcoded superuser checks, specifically: pg_reload_conf pg_rotate_logfile

Some of these are useful in monitoring or maintenance tools, and the hardcoded superuser checks require that these tools run with maximum privileges. Couldn't we just install these functions without default privileges and allow users to grant privileges as necessary?

Isn't that too strong a gun for some people? I am afraid that someone could reduce the necessary rights and open a door into the system.

pg_read_file pg_read_file_all pg_read_binary_file pg_read_binary_file_all pg_stat_file pg_ls_dir

These are relatively dangerous and I am not for opening these functions up. A power user can simply write an extension, but then he knows what he is doing.

Regards Pavel

-- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] pg_basebackup from cascading standby after timeline switch
On Tue, Dec 18, 2012 at 8:09 PM, Simon Riggs si...@2ndquadrant.com wrote: On 18 December 2012 00:53, Tom Lane t...@sss.pgh.pa.us wrote: Simon Riggs si...@2ndquadrant.com writes: On 17 December 2012 14:16, Heikki Linnakangas hlinnakan...@vmware.com wrote: I also wonder if pg_basebackup should include *all* timeline history files in the backup, not just the latest one strictly required to restore. Why? the timeline history file includes the previous timelines already. The original intention was that the WAL archive might contain multiple timeline files corresponding to various experimental recovery attempts; furthermore, such files might be hand-annotated (that's why there's a comment provision). So they would be at least as valuable from an archival standpoint as the WAL files proper. I think we ought to just copy all of them, period. Anything less is penny-wise and pound-foolish. What I'm saying is that the new history file is created from the old one, so the latest file includes all the history as a direct copy. The only thing new is one line of information. The timeline history file includes only ancestor timelines history. So the latest one might not include all the history. Regards, -- Fujii Masao -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] [ADMIN] Problems with enums after pg_upgrade
On 12/18/2012 05:22 PM, Bruce Momjian wrote: On Tue, Dec 18, 2012 at 10:52:46AM -0500, Andrew Dunstan wrote: The translations from oid to label are in pg_enum, but it looks like somehow you have lost that mapping. I'm not sure what you've done but AFAICT pg_upgrade is doing the right thing. I just did this (from 9.0 to 9.2) and the pg_upgrade_dump_all.sql that is used to create the new catalog has these lines: -- For binary upgrade, must preserve pg_type oid SELECT binary_upgrade.set_next_pg_type_oid('40804'::pg_catalog.oid); -- For binary upgrade, must preserve pg_type array oid SELECT binary_upgrade.set_next_array_pg_type_oid('40803'::pg_catalog.oid); CREATE TYPE myenum AS ENUM ( ); -- For binary upgrade, must preserve pg_enum oids SELECT binary_upgrade.set_next_pg_enum_oid('40805'::pg_catalog.oid); ALTER TYPE public.myenum ADD VALUE 'foo'; SELECT binary_upgrade.set_next_pg_enum_oid('40806'::pg_catalog.oid); ALTER TYPE public.myenum ADD VALUE 'bar'; SELECT binary_upgrade.set_next_pg_enum_oid('40807'::pg_catalog.oid); ALTER TYPE public.myenum ADD VALUE 'baz'; and this worked exactly as expected, with a table using this type showing the expected values. Can you produce a test case demonstrating the error? When you run pg_upgrade, use the -r flag to keep all the intermediate files so we can see what's going on. It's no good dumping the new db looking for these values if they have been lost. You would need to have a physical copy of the old db and dump that in binary upgrade mode looking for the Oid. If you don't have a physical copy of the old db or the intermediate dump file pg_upgrade used then recovery is going to be pretty difficult. It's not necessarily impossible, but it might involve you getting some outside help. Yes, this matches what I thought too. You see the binary_upgrade.set_next_pg_enum_oid() calls in pg_dump --binary-upgrade --schema-only and those set the oid of the newly created enum. I agree you would need to run this on the _old_ cluster for us to figure out how it failed. Hey, i just made a testrun, i restored a dump to a testmachine with 9.0 running, made a pg_dump --binary-upgrade --schema-only of that, made my upgrade to 9.2, after that i checked the schema dump and the values of the enumtypid in the 9.2 database and they were identically. Thats how it is expected to be. Nevertheless this didn't worked with the beta server. but i have no dump to prove this. Beside the fact that i want to fix my db's, i would also like to help to improve the upgrade process, but i have no clue right now how i could do this. i think i will try some other dbs to check if there maybe an error occurs. Beside of that, we tested a little bit more with the failing query: The statement which is causing the error is a big UPDATE-statement with FROM. After some testing we figured out that the subselect in the FROM-clause is working fine. And if we simplify the UPDATE-statement it's also working. We're able to show the data and we're able to do simple updates on the table. But the two things combined are not working. We checked the data from the subselect - it's correct. In the FROM-clause we're using a window-function to calculate a ranking. Do you know, if there is any mapping for window-functions which has to deal with enums? 
regards -- Bernhard Schrader System Administration InnoGames GmbH Harburger Schloßstraße 28 (Channel 4) - 21079 Hamburg - Germany Tel +49 40 7889335-53 Fax +49 40 7889335-22 Managing Directors: Hendrik Klindworth, Eike Klindworth, Michael Zillmer VAT-ID: DE264068907 Amtsgericht Hamburg, HRB 108973 http://www.innogames.com – bernhard.schra...@innogames.de -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Error restoring from a base backup taken from standby
On Tue, Dec 18, 2012 at 2:39 AM, Heikki Linnakangas hlinnakan...@vmware.com wrote: (This is different from the other issue related to timeline switches I just posted about. There's no timeline switch involved in this one.) If you do pg_basebackup -x against a standby server, in some circumstances the backup fails to restore with error like this: C 2012-12-17 19:09:44.042 EET 7832 LOG: database system was not properly shut down; automatic recovery in progress C 2012-12-17 19:09:44.091 EET 7832 LOG: record with zero length at 0/1764F48 C 2012-12-17 19:09:44.091 EET 7832 LOG: redo is not required C 2012-12-17 19:09:44.091 EET 7832 FATAL: WAL ends before end of online backup C 2012-12-17 19:09:44.091 EET 7832 HINT: All WAL generated while online backup was taken must be available at recovery. C 2012-12-17 19:09:44.092 EET 7831 LOG: startup process (PID 7832) exited with exit code 1 C 2012-12-17 19:09:44.092 EET 7831 LOG: aborting startup due to startup process failure I spotted this bug while reading the code, and it took me quite a while to actually construct a test case to reproduce the bug, so let me begin by discussing the code where the bug is. You get the above error, WAL ends before end of online backup, when you reach the end of WAL before reaching the backupEndPoint stored in the control file, which originally comes from the backup_label file. backupEndPoint is only used in a base backup taken from a standby, in a base backup taken from the master, the end-of-backup WAL record is used instead to mark the end of backup. In the xlog redo loop, after replaying each record, we check if we've just reached backupEndPoint, and clear it from the control file if we have. Now the problem is, if there are no WAL records after the checkpoint redo point, we never even enter the redo loop, so backupEndPoint is not cleared even though it's reached immediately after reading the initial checkpoint record. Good catch! To deal with the similar situation wrt. reaching consistency for hot standby purposes, we call CheckRecoveryConsistency() before the redo loop. The straightforward fix is to copy-paste the check for backupEndPoint to just before the redo loop, next to the CheckRecoveryConsistency() call. Even better, I think we should move the backupEndPoint check into CheckRecoveryConsistency(). It's already responsible for keeping track of whether minRecoveryPoint has been reached, so it seems like a good idea to do this check there as well. Attached is a patch for that (for 9.2), as well as a script I used to reproduce the bug. The patch looks good to me. Regards, -- Fujii Masao -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] pg_basebackup from cascading standby after timeline switch
Fujii Masao masao.fu...@gmail.com writes: On Tue, Dec 18, 2012 at 8:09 PM, Simon Riggs si...@2ndquadrant.com wrote: What I'm saying is that the new history file is created from the old one, so the latest file includes all the history as a direct copy. The only thing new is one line of information. The timeline history file includes only ancestor timelines history. So the latest one might not include all the history. Indeed. And even if there are a thousand of them, so what? It'd still be less space than a single WAL segment file. Better to keep the data than wish we had it later. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] [ADMIN] Problems with enums after pg_upgrade
Bernhard Schrader bernhard.schra...@innogames.de writes: Beside of that, we tested a little bit more with the failing query: The statement which is causing the error is a big UPDATE-statement with FROM. After some testing we figured out that the subselect in the FROM-clause is working fine. And if we simplify the UPDATE-statement it's also working. We're able to show the data and we're able to do simple updates on the table. But the two things combined are not working. Does the table being updated have any indexes on enum columns? I'm suspicious that the bogus OID is in an index page somewhere, and not in the table at all. If that is the answer, then reindexing said index would get rid of the problem (as well as all evidence that would help us find out how it happened ...) regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
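For anyone wanting to check this, a rough catalog query (a sketch only; the schema filter is an assumption) that lists indexes containing an enum-typed column:

SELECT n.nspname, c.relname AS index_name, a.attname, t.typname
FROM pg_index i
JOIN pg_class c ON c.oid = i.indexrelid
JOIN pg_namespace n ON n.oid = c.relnamespace
JOIN pg_attribute a ON a.attrelid = i.indexrelid
JOIN pg_type t ON t.oid = a.atttypid
WHERE t.typtype = 'e'
  AND n.nspname NOT IN ('pg_catalog', 'information_schema');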
Re: [HACKERS] [ADMIN] Problems with enums after pg_upgrade
On 12/18/2012 01:24 PM, Tom Lane wrote: Bernhard Schrader bernhard.schra...@innogames.de writes: Beside of that, we tested a little bit more with the failing query: The statement which is causing the error is a big UPDATE-statement with FROM. After some testing we figured out that the subselect in the FROM-clause is working fine. And if we simplify the UPDATE-statement it's also working. We're able to show the data and we're able to do simple updates on the table. But the two things combined are not working. Does the table being updated have any indexes on enum columns? I'm suspicious that the bogus OID is in an index page somewhere, and not in the table at all. If that is the answer, then reindexing said index would get rid of the problem (as well as all evidence that would help us find out how it happened ...) Unless they can make a physical backup of the datadir first. cheers andrew -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] [ADMIN] Problems with enums after pg_upgrade
On 2012-12-18 13:24:12 -0500, Tom Lane wrote: Bernhard Schrader bernhard.schra...@innogames.de writes: Beside of that, we tested a little bit more with the failing query: The statement which is causing the error is a big UPDATE-statement with FROM. After some testing we figured out that the subselect in the FROM-clause is working fine. And if we simplify the UPDATE-statement it's also working. We're able to show the data and we're able to do simple updates on the table. But the two things combined are not working. Does the table being updated have any indexes on enum columns? I'm suspicious that the bogus OID is in an index page somewhere, and not in the table at all. I already wondered whether it could be a problem caused by pg_upgrade not ignoring invalid indexes until recently, but I don't really see how it could cause an invalid oid to end up in the index. Greetings, Andres Freund -- Andres Freund http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] [ADMIN] Problems with enums after pg_upgrade
Andres Freund and...@2ndquadrant.com writes: On 2012-12-18 13:24:12 -0500, Tom Lane wrote: Does the table being updated have any indexes on enum columns? I'm suspicious that the bogus OID is in an index page somewhere, and not in the table at all. I already wondered whether it could be a problem caused by pg_upgrade not ignoring invalid indexes until recently, but I don't really see how it could cause an invalid oid to end up in the index. It seems like this might indicate a flaw in our scheme for preventing uncommitted enum values from getting into tables/indexes. Hard to see what though. Bernhard, if you do identify a particular index as being the source of the failure, that would at least tell us for sure which enum type is at fault. I don't suppose you would have any info about the history of that enum type in your database? The fact that the OID is odd implies that it belonged to a value that was added by ALTER TYPE ADD VALUE, but what we'd want is some context around any past uses of that command, especially if they failed or were rolled back. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
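A hedged sketch of how that history could be gathered from the catalog; 'public.myenum' stands in for the real type name and must be replaced:

SELECT oid, enumsortorder, enumlabel
FROM pg_enum
WHERE enumtypid = 'public.myenum'::regtype
ORDER BY enumsortorder;
-- On 9.1 and later, odd OIDs normally mean the value was added afterwards with
-- ALTER TYPE ... ADD VALUE; gaps or noncontiguous OIDs suggest the type has
-- been modified since it was created.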
Re: [HACKERS] [ADMIN] Problems with enums after pg_upgrade
On 12/18/2012 02:34 PM, Tom Lane wrote: Andres Freund and...@2ndquadrant.com writes: On 2012-12-18 13:24:12 -0500, Tom Lane wrote: Does the table being updated have any indexes on enum columns? I'm suspicious that the bogus OID is in an index page somewhere, and not in the table at all. I already wondered whether it could be a problem caused by pg_upgrade not ignoring invalid indexes until recently, but I don't really see how it could cause an invalid oid to end up in the index. It seems like this might indicate a flaw in our scheme for preventing uncommitted enum values from getting into tables/indexes. Hard to see what though. Bernhard, if you do identify a particular index as being the source of the failure, that would at least tell us for sure which enum type is at fault. I don't suppose you would have any info about the history of that enum type in your database? The fact that the OID is odd implies that it belonged to a value that was added by ALTER TYPE ADD VALUE, but what we'd want is some context around any past uses of that command, especially if they failed or were rolled back. He's upgrading from 9.0, which didn't have enum extension at all, and where odd enums didn't mean anything special. cheers andrew -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] [ADMIN] Problems with enums after pg_upgrade
Andrew Dunstan and...@dunslane.net writes: He's upgrading from 9.0, which didn't have enum extension at all, and where odd enums didn't mean anything special. Really? The noncontiguous pg_enum OIDs shown in http://archives.postgresql.org/pgsql-hackers/2012-12/msg01089.php suggest strongly that *something's* been done to that type since it was created. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] [ADMIN] Problems with enums after pg_upgrade
On 12/18/2012 02:58 PM, Tom Lane wrote: Andrew Dunstan and...@dunslane.net writes: He's upgrading from 9.0, which didn't have enum extension at all, and where odd enums didn't mean anything special. Really? The noncontiguous pg_enum OIDs shown in http://archives.postgresql.org/pgsql-hackers/2012-12/msg01089.php suggest strongly that *something's* been done to that type since it was created. That's what he said. People have been known to hack pg_enum on their own, especially before we added enum extension. Of course, if they do that they get to keep both pieces. cheers andrew -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] logical decoding - GetOldestXmin
On Fri, Dec 14, 2012 at 7:19 PM, Andres Freund and...@2ndquadrant.com wrote: On 2012-12-14 14:01:30 -0500, Robert Haas wrote: On Fri, Dec 14, 2012 at 6:46 AM, Andres Freund and...@2ndquadrant.com wrote: Just moving that tidbit inside the lock seems to be the pragmatic choice. GetOldestXmin is called * once per checkpoint * one per index build * once in analyze * twice per vacuum * once for HS feedback messages Nothing of that occurs frequently enough that 5 instructions will make a difference. I would be happy to go an alternative path, but right now I don't see any nice one. A already_locked parameter to GetOldestXmin seems to be a cure worse than the disease. I'm not sure that would be so bad, but I guess I question the need to do it this way at all. Most of the time, if you need to advertise your global xmin, you use GetSnapshotData(), not GetOldestXmin(), and I guess I'm not seeing why that wouldn't also work here. Am I dumb? I wondered upthread whether that would be better: On 2012-12-13 21:03:44 +0100, Andres Freund wrote: Another alternative to this would be to get a snapshot with GetSnapshotData(), copy the xmin to the logical slot, then call ProcArrayEndTransaction(). But that doesn't really seem to be nicer to me. Not sure why I considered it ugly anymore, but it actually has a noticeable disadvantage. GetOldestXmin is nicer is than GetSnapshotData as the latter set a fairly new xid as xmin whereas GetOldestXmin returns the actual current xmin horizon. Thats preferrable because it allows us to start up more quickly. snapbuild.c can only start building a snapshot once it has seen a xl_running_xact with oldestRunningXid = own_xmin. Otherwise we cannot be sure that no relevant catalog tuples have been removed. I'm a bit confused. Are you talking about the difference between RecentGlobalXmin and RecentXmin? I think GetSnapshotData() updates both. Anyway, if there's no nicer way, I think it's probably OK to add a parameter to GetOldestXmin(). It seems like kind of a hack, though. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] [ADMIN] Problems with enums after pg_upgrade
Andrew Dunstan and...@dunslane.net writes: People have been known to hack pg_enum on their own, especially before we added enum extension. Of course, if they do that they get to keep both pieces. Yeah ... this would be very readily explainable if there had been a manual deletion from pg_enum somewhere along the line. Even if there were at that time no instances of the OID left in tables, there could be some in upper btree pages. They'd have caused no trouble in 9.0 but would (if odd) cause trouble in 9.2. Of course, this theory doesn't explain why the problem was seen on some copies and not others cloned from the same database --- unless maybe there had been an index page split on the master in between the clonings, and that moved the troublesome OID into a position where it was more likely to get compared-to. That's not a hugely convincing explanation though. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
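Putting Tom's and Andrew's advice together as a concrete, though hypothetical, sequence (the object names are placeholders):

-- 1. Preserve the evidence: stop the server and take a filesystem-level copy
--    of the data directory (or at least of the suspect index's files).
-- 2. Then rebuild the suspect index, or every index on the affected table:
REINDEX INDEX some_table_enumcol_idx;
REINDEX TABLE some_table;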
Re: [HACKERS] Enabling Checksums
On Tue, 2012-12-18 at 08:17 +, Simon Riggs wrote: I think we should discuss whether we accept my premise? Checksums will actually detect more errors than we see now, and people will want to do something about that. Returning to backup is one way of handling it, but on a busy production system with pressure on, there is incentive to implement a workaround, not a fix. It's not an easy call to say we've got 3 corrupt blocks, so I'm going to take the whole system offline while I restore from backup. Up until now, my assumption has generally been that, upon finding the corruption, the primary course of action is taking that server down (hopefully you have a good replica), and do some kind of restore or sync a new replica. It sounds like you are exploring other possibilities. I suppose we could have a new ReadBufferMaybe function that would only be used by a sequential scan; and then just skip over the page if it's corrupt, depending on a GUC. That would at least allow sequential scans to (partially) work, which might be good enough for some data recovery situations. If a catalog index is corrupted, that could just be rebuilt. Haven't thought about the details, though. Not sure if you're being facetious here or not. No. It was an incomplete thought (as I said), but sincere. Mild reworking of the logic for heap page access could cope with a NULL buffer response and subsequent looping, which would allow us to run pg_dump against a damaged table to allow data to be saved, keeping file intact for further analysis. Right. I'm suggesting we work a little harder than your block is corrupt and give some thought to what the user will do next. Indexes are a good case, because we can/should report the block error, mark the index as invalid and then hint that it should be rebuilt. Agreed; this applies to any derived data. I don't think it will be very practical to keep a server running in this state forever, but it might give enough time to reach a suitable maintenance window. Regards, Jeff Davis -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
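Simon's idea of marking a damaged index invalid would dovetail with the existing pg_index.indisvalid flag, which is already how a failed CREATE INDEX CONCURRENTLY leaves an index behind. A hedged query a DBA could use to spot indexes flagged that way:

SELECT n.nspname, c.relname AS invalid_index
FROM pg_index i
JOIN pg_class c ON c.oid = i.indexrelid
JOIN pg_namespace n ON n.oid = c.relnamespace
WHERE NOT i.indisvalid;
-- each hit can then be rebuilt with REINDEX, or dropped and recreated with
-- CREATE INDEX CONCURRENTLY on a live system.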
Re: [HACKERS] Enabling Checksums
On Tue, 2012-12-18 at 04:06 -0500, Greg Smith wrote: Having some way to nail down if the same block is bad on a given standby seems like a useful interface we should offer, and it shouldn't take too much work. Ideally you won't find the same corruption there. I'd like a way to check the entirety of a standby for checksum issues, ideally run right after it becomes current. It seems the most likely way to see corruption on one of those is to replicate a corrupt block. Part of the design is that pg_basebackup would verify checksums during replication, so we should not replicate corrupt blocks (of course, that's not implemented yet, so it's still a concern for now). And we can also have ways to do background/offline checksum verification with a separate utility. Regards, Jeff Davis -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] [GENERAL] trouble with pg_upgrade 9.0 - 9.1
On Tue, Dec 18, 2012 at 09:34:53PM +0400, Groshev Andrey wrote: later in the log pg_dump, I found the definition of new rel -- -- Name: plob.ВерсияВнешнегоДокумента$Документ; Type: CONSTRAINT; Schema: public; Owner: postgres; Tablespace: -- ALTER TABLE ONLY lob.ВерсияВнешнегоДокумента$Документ ADD CONSTRAINT plob.ВерсияВнешнегоДокумента$Документ PRIMARY KEY (@Файл, Страница); Can you post the full definition of the table on this public email list? Also, why did the error think this was in the public schema? Any idea? --- 18.12.2012, 19:38, Bruce Momjian br...@momjian.us: On Mon, Dec 17, 2012 at 09:21:59PM -0500, Bruce Momjian wrote: Mismatch of relation names: database database, old rel public.lob.ВерсияВнешнегоДокумента$Документ_pkey, new rel public.plob.ВерсияВнешнегоДокумента$Документ Failure, exiting I am now confused over the error message above. This is the code that is generating the error: /* * TOAST table names initially match the heap pg_class oid. * In pre-8.4, TOAST table names change during CLUSTER; in pre-9.0, * TOAST table names change during ALTER TABLE ALTER COLUMN SET TYPE. * In = 9.0, TOAST relation names always use heap table oids, hence * we cannot check relation names when upgrading from pre-9.0. * Clusters upgraded to 9.0 will get matching TOAST names. */ if (strcmp(old_rel-nspname, new_rel-nspname) != 0 || ((GET_MAJOR_VERSION(old_cluster.major_version) = 900 || strcmp(old_rel-nspname, pg_toast) != 0) strcmp(old_rel-relname, new_rel-relname) != 0)) pg_log(PG_FATAL, Mismatch of relation names: database \%s\, old rel %s.%s, new rel %s.%s\n, old_db-db_name, old_rel-nspname, old_rel-relname, new_rel-nspname, new_rel-relname); Looking at the Russian, I see 'old rel' public.lob.* and 'new rel' public.plob.*. I assume the database is called 'database', and the schema is called 'public', but what is 'lob' and 'plob'? If those are tables or indexes, what is after the period? Do you have periods embedded in the table/index names? That is certainly possible, but not common, e.g.: test= create table test.x (y int); CREATE TABLE Is the schema called public.lob? I expected to see schema.objname. -- Bruce Momjian br...@momjian.us http://momjian.us EnterpriseDB http://enterprisedb.com + It's impossible for everything to be true. + -- Bruce Momjian br...@momjian.ushttp://momjian.us EnterpriseDB http://enterprisedb.com + It's impossible for everything to be true. + -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
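To make the question concrete: a period in a name can either separate schema from relation or be part of a single quoted identifier, and pg_upgrade's name matching has to tell the two apart. A small illustrative example (all names invented):

CREATE SCHEMA test;
CREATE TABLE test.x (y int);       -- table "x" in schema "test"
CREATE TABLE "test.x" (y int);     -- a table literally named "test.x" in the current schema
ALTER TABLE "test.x"
    ADD CONSTRAINT "ptest.x" PRIMARY KEY (y);   -- a constraint name containing a period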
Re: [HACKERS] [ADMIN] Problems with enums after pg_upgrade
On 12/18/2012 09:38 PM, Tom Lane wrote: Andrew Dunstan and...@dunslane.net writes: People have been known to hack pg_enum on their own, especially before we added enum extension. Of course, if they do that they get to keep both pieces. Yeah ... this would be very readily explainable if there had been a manual deletion from pg_enum somewhere along the line. Even if there were at that time no instances of the OID left in tables, there could be some in upper btree pages. They'd have caused no trouble in 9.0 but would (if odd) cause trouble in 9.2. Of course, this theory doesn't explain why the problem was seen on some copies and not others cloned from the same database --- unless maybe there had been an index page split on the master in between the clonings, and that moved the troublesome OID into a position where it was more likely to get compared-to. That's not a hugely convincing explanation though. regards, tom lane Guys, thank you all. :) Reindexing helped - I ran REINDEX on two tables, and everything is now working as expected. I will provide all the information tomorrow that could help to understand this in detail, but it's getting late in Germany now :), and I've got a headache from all this stuff ^^ Thanks so much!!! -- Bernhard Schrader System Administration InnoGames GmbH Harburger Schloßstraße 28 (Channel 4) - 21079 Hamburg - Germany Tel +49 40 7889335-53 Fax +49 40 7889335-22 Managing Directors: Hendrik Klindworth, Eike Klindworth, Michael Zillmer VAT-ID: DE264068907 Amtsgericht Hamburg, HRB 108973 http://www.innogames.com – bernhard.schra...@innogames.de -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] [v9.3] writable foreign tables
Hi, 2012/12/18 Ronan Dunklau rdunk...@gmail.com: Hello. I've tried to implement this API for our Multicorn FDW, and I have a few questions about the API. First, I don't understand what the requirements on the rowid pseudo-column values are. In particular, this sentence from a previous mail makes it ambiguous to me: At the beginning, I constructed the rowid pseudo-column using INTERNALOID, but it caused trouble when fetched tuples are materialized prior to table modification. So, the rowid argument was re-defined as const char *. Does that mean that one should only store a cstring in the rowid pseudo-column? Or can one store an arbitrary pointer? Currently, I'm building a text value using cstring_to_text_with_len. Could there be a problem with that? All we require of the rowid pseudo-column is that it be able to identify a particular row on the remote side. In the case of postgres_fdw, the ctid of the relevant table is sufficient for that purpose. I don't recommend storing an arbitrary pointer in the rowid field, because the scanned tuple may be materialized between scanning and modifying the foreign table; the pointer would therefore have to be treated under the assumption that it is a cstring. Secondly, how does one use a returning clause? I've tried to look at the postgres_fdw code, but it does not seem to handle that. For what it's worth, knowing that postgres_fdw is still in WIP status, here are a couple of results from my experimentation with it: - Insert into a foreign table mapped to a table with a before trigger, using a returning clause, will return the values as they were before being modified by the trigger. - Same thing, but if the trigger prevents insertion (returning NULL), the would-have-been inserted row is still returned, although the insert statement reports zero rows. Hmm. You want to see the real final contents of the rows being inserted, don't you? Yep, the proposed interface has no capability to modify the supplied tuple in ExecForeignInsert or the other methods. Probably the interface needs to be adjusted to allow FDW drivers to return a modified HeapTuple or TupleTableSlot for the RETURNING calculation. (As triggers do, it could return the given one as-is if nothing changed.) Let me investigate the code around this topic. - Inserting into a table with default values does not work as intended, since missing values are replaced by a null value in the remote statement. It might be a bug in my proof-of-concept portion of postgres_fdw. The prepared INSERT statement should list only the columns actually being used, not all of them unconditionally. Thanks, What can be done to make the behaviour more consistent? I'm very excited about this feature, thank you for making this possible. Regards, -- Ronan Dunklau 2012/12/14 Albe Laurenz laurenz.a...@wien.gv.at Kohei KaiGai wrote: I came up with one more query that causes a problem: [...] This causes a deadlock, but one that is not detected; the query just keeps hanging. The UPDATE in the CTE has the rows locked, so the SELECT ... FOR UPDATE issued via the FDW connection will hang indefinitely. I wonder if that's just a pathological corner case that we shouldn't worry about. Loopback connections for FDWs themselves might not be so rare, for example as a substitute for autonomous subtransactions. I guess it is not easily possible to detect such a situation or to do something reasonable about it. It is an unavoidable problem, due to the nature of distributed database systems, not only the loopback scenario.
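For readers who have not seen the earlier mail, the undetected deadlock Laurenz describes has roughly the following shape. This is only a sketch: test, ft_test and the column names are made up, and ft_test is assumed to be a writable foreign table that loops back to the local table test.

WITH updated AS (
    UPDATE test SET val = val + 1 RETURNING id
)
UPDATE ft_test SET val = 0
FROM updated
WHERE ft_test.id = updated.id;
-- The CTE's UPDATE holds row locks on test; the outer UPDATE makes the FDW
-- issue SELECT ... FOR UPDATE on the same rows over the loopback connection,
-- which then waits forever, invisible to the deadlock detector.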
In my personal opinion, I'd like to wait until someone implements a distributed lock/transaction manager on top of the recently committed background worker framework to mediate lock requests. However, that would take a massive amount of effort in the existing lock/transaction management layer, not just enhancements to the FDW APIs. It is obviously out of scope for this patch. So I'd like to suggest that authors of FDWs that support writable features mention the possible deadlock scenario in their documentation. At least, the writable CTE example above is a situation in which two different sessions concurrently update the test relation, so one of them will be blocked. Fair enough. I tried to overhaul the documentation, see the attached patch. There was one thing that I was not certain of: You say that for writable foreign tables, BeginForeignModify and EndForeignModify *must* be implemented. I thought that these were optional, and if you can do your work with just, say, ExecForeignDelete, you could do that. Yes, that's right. What I wrote was incorrect. If the FDW driver does not require any state during modification of foreign tables, these handlers are indeed not mandatory. I have updated the documentation, that was all I changed in the attached patches. OK. I split the patch into two portions; part-1 is the API-relevant patch, part-2
Re: [HACKERS] Parser Cruft in gram.y
On Tue, Dec 18, 2012 at 4:33 AM, Dimitri Fontaine dimi...@2ndquadrant.fr wrote: Robert Haas robertmh...@gmail.com writes: And on the other hand, if you could get a clean split between the two grammars, then regardless of exactly what the split was, it might seem a win. But it seemed to me when I looked at this that you'd have to duplicate a lot of stuff and the small parser still wouldn't end up being very small, which I found hard to get excited about. I think the goal is not so much about getting a much smaller parser, but more about have a separate parser that you don't care about the bloat of, so that you can improve DDL without fearing about main parser performance regressions. Well that would be nice, but the problem is that I see no way to implement it. If, with a unified parser, the parser is 14% of our source code, then splitting it in two will probably crank that number up well over 20%, because there will be duplication between the two. That seems double-plus un-good. I can't help but suspect that the way we handle keywords today is monumentally inefficient. The unreserved_keyword products, et al, just seem somehow badly wrong-headed. We take the trouble to distinguish all of those cases so that we an turn around and not distinguish them. I feel like there ought to be some way to use lexer states to handle this - if we're in a context where an unreserved keyword will be treated as an IDENT, then have the lexer return IDENT when it sees an unreserved keyword. I might be wrong, but it seems like that would eliminate a whole lot of parser state transitions. However, even if I'm right, I have no idea how to implement it. It just seems very wasteful that we have so many parser states that have no purpose other than (effectively) to convert an unreserved_keyword into an IDENT when the lexer could do the same thing much more cheaply given a bit more context. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Parser Cruft in gram.y
On 12/18/12 5:10 PM, Robert Haas wrote: I can't help but suspect that the way we handle keywords today is monumentally inefficient. The unreserved_keyword products, et al, just seem somehow badly wrong-headed. We take the trouble to distinguish all of those cases so that we an turn around and not distinguish them. I feel like there ought to be some way to use lexer states to handle this - if we're in a context where an unreserved keyword will be treated as an IDENT, then have the lexer return IDENT when it sees an unreserved keyword. The problem would be the lookahead. You need to know the next token before you can decide what context the current one is in. -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] logical decoding - GetOldestXmin
Hi, Robert Haas robertmh...@gmail.com schrieb: On Fri, Dec 14, 2012 at 7:19 PM, Andres Freund and...@2ndquadrant.com wrote: On 2012-12-14 14:01:30 -0500, Robert Haas wrote: On Fri, Dec 14, 2012 at 6:46 AM, Andres Freund and...@2ndquadrant.com wrote: Just moving that tidbit inside the lock seems to be the pragmatic choice. GetOldestXmin is called * once per checkpoint * one per index build * once in analyze * twice per vacuum * once for HS feedback messages Nothing of that occurs frequently enough that 5 instructions will make a difference. I would be happy to go an alternative path, but right now I don't see any nice one. A already_locked parameter to GetOldestXmin seems to be a cure worse than the disease. I'm not sure that would be so bad, but I guess I question the need to do it this way at all. Most of the time, if you need to advertise your global xmin, you use GetSnapshotData(), not GetOldestXmin(), and I guess I'm not seeing why that wouldn't also work here. Am I dumb? I wondered upthread whether that would be better: On 2012-12-13 21:03:44 +0100, Andres Freund wrote: Another alternative to this would be to get a snapshot with GetSnapshotData(), copy the xmin to the logical slot, then call ProcArrayEndTransaction(). But that doesn't really seem to be nicer to me. Not sure why I considered it ugly anymore, but it actually has a noticeable disadvantage. GetOldestXmin is nicer is than GetSnapshotData as the latter set a fairly new xid as xmin whereas GetOldestXmin returns the actual current xmin horizon. Thats preferrable because it allows us to start up more quickly. snapbuild.c can only start building a snapshot once it has seen a xl_running_xact with oldestRunningXid = own_xmin. Otherwise we cannot be sure that no relevant catalog tuples have been removed. I'm a bit confused. Are you talking about the difference between RecentGlobalXmin and RecentXmin? I think GetSnapshotData() updates both. The problem is that at the time GetSnapshotData returns the xmin horizon might have gone upwards and tuples required for decoding might get removed by other backends. That needs to be prevented while holding the procarray lock exclusively. Does it make more sense now? Andres --- Please excuse the brevity and formatting - I am writing this on my mobile phone. -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Parser Cruft in gram.y
On Tue, Dec 18, 2012 at 5:24 PM, Peter Eisentraut pete...@gmx.net wrote: On 12/18/12 5:10 PM, Robert Haas wrote: I can't help but suspect that the way we handle keywords today is monumentally inefficient. The unreserved_keyword products, et al, just seem somehow badly wrong-headed. We take the trouble to distinguish all of those cases so that we an turn around and not distinguish them. I feel like there ought to be some way to use lexer states to handle this - if we're in a context where an unreserved keyword will be treated as an IDENT, then have the lexer return IDENT when it sees an unreserved keyword. The problem would be the lookahead. You need to know the next token before you can decide what context the current one is in. Yeah, that's why I don't know how to make it work. It feels like this is partly artifact of the tool, though. I mean, suppose we haven't read anything yet. Then, the next token can't be an IDENT, so if we see an unreserved keyword, we know we're not going to convert it to an IDENT. OTOH, if we've seen CREATE TABLE, the next token cannot be an unreserved keyword that is intended as a keyword; it has to be something that will reduce to ColId. I guess the problem situation is where we can shift the keyword and then use the following token to decide whether to reduce it to ColId/type_function_name/ColLabel or use some other rule instead; the CREATE INDEX CONCURRENTLY productions might be such a case. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] system administration functions with hardcoded superuser checks
On Tue, Dec 18, 2012 at 12:09:10PM -0500, Peter Eisentraut wrote: There are some system administration functions that have hardcoded superuser checks, specifically: pg_reload_conf pg_rotate_logfile pg_read_file pg_read_file_all pg_read_binary_file pg_read_binary_file_all pg_stat_file pg_ls_dir Some of these are useful in monitoring or maintenance tools, and the hardcoded superuser checks require that these tools run with maximum privileges. Couldn't we just install these functions without default privileges and allow users to grant privileges as necessary? +1. You can already use a SECURITY DEFINER wrapper, so I don't think this opens any particular floodgate. GRANT is a nicer interface. However, I would not advertise this as a replacement for wrapper functions until pg_dump can preserve ACL changes to pg_catalog objects. -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] logical decoding - GetOldestXmin
On Tue, Dec 18, 2012 at 5:25 PM, anara...@anarazel.de and...@anarazel.de wrote: The problem is that at the time GetSnapshotData returns the xmin horizon might have gone upwards and tuples required for decoding might get removed by other backends. That needs to be prevented while holding the procarray lock exclusively. Well, for the ordinary use of GetSnapshotData(), that doesn't matter, because GetSnapshotData() also updates proc->xmin. If you're trying to store a different value in that field then of course it matters. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] system administration functions with hardcoded superuser checks
On Tue, Dec 18, 2012 at 7:41 PM, Noah Misch n...@leadboat.com wrote: On Tue, Dec 18, 2012 at 12:09:10PM -0500, Peter Eisentraut wrote: There are some system administration functions that have hardcoded superuser checks, specifically: pg_reload_conf pg_rotate_logfile pg_read_file pg_read_file_all pg_read_binary_file pg_read_binary_file_all pg_stat_file pg_ls_dir Some of these are useful in monitoring or maintenance tools, and the hardcoded superuser checks require that these tools run with maximum privileges. Couldn't we just install these functions without default privileges and allow users to grant privileges as necessary? +1. You can already use a SECURITY DEFINER wrapper, so I don't think this opens any particular floodgate. GRANT is a nicer interface. However, I would not advertise this as a replacement for wrapper functions until pg_dump can preserve ACL changes to pg_catalog objects. Yeah. That is a bit of a foot-gun to this approach, although I too agree on the general theory. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] system administration functions with hardcoded superuser checks
On 18.12.2012 18:38, Pavel Stehule wrote: 2012/12/18 Peter Eisentraut pete...@gmx.net: There are some system administration functions that have hardcoded superuser checks, specifically: pg_reload_conf pg_rotate_logfile Some of these are useful in monitoring or maintenance tools, and the hardcoded superuser checks require that these tools run with maximum privileges. Couldn't we just install these functions without default privileges and allow users to grant privileges as necessary? isn't it too strong gun for some people ??? I believe so some one can decrease necessary rights and it opens doors to system. No one was speaking about making them executable by a wider group of users by default (i.e. decreasing necessary rights). Today, when you need to provide the EXECUTE privilege on those functions, you have three options (a) make him a superuser - obviously not a good choice (b) create a SECURITY DEFINER wrapper **for each function separately** (c) deny to do that Being able to do a plain GRANT on the function is merely a simpler way to do (b). It has advantages (less objects/functions to care about) and disadvantages (e.g. you can't do additional parameter values checks). pg_read_file pg_read_file_all pg_read_binary_file pg_read_binary_file_all pg_stat_file pg_ls_dir is relative dangerous and I am not for opening these functions. power user can simply to write extension, but he knows what he does/ I see only dangers that are already present. Tomas -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
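To spell out option (b) versus the proposal, here is a hedged example; the monitoring schema and the monitor role are assumptions of this sketch, not anything that exists by default:

-- (b) today: one SECURITY DEFINER wrapper per function, owned by a superuser
CREATE FUNCTION monitoring.reload_conf() RETURNS boolean
    LANGUAGE sql SECURITY DEFINER
    AS 'SELECT pg_catalog.pg_reload_conf()';
REVOKE ALL ON FUNCTION monitoring.reload_conf() FROM PUBLIC;
GRANT EXECUTE ON FUNCTION monitoring.reload_conf() TO monitor;

-- what the proposal would allow once the hardcoded superuser check is gone:
GRANT EXECUTE ON FUNCTION pg_catalog.pg_reload_conf() TO monitor;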
Re: [HACKERS] logical decoding - GetOldestXmin
On 2012-12-18 19:56:18 -0500, Robert Haas wrote: On Tue, Dec 18, 2012 at 5:25 PM, anara...@anarazel.de and...@anarazel.de wrote: The problem is that at the time GetSnapshotData returns the xmin horizon might have gone upwards and tuples required for decoding might get removed by other backends. That needs to be prevented while holding the procarray lock exclusively. Well, for the ordinary use of GetSnapshotData(), that doesn't matter, because GetSnapshotData() also updates proc->xmin. If you're trying to store a different value in that field then of course it matters. Absolutely right. I don't want to say there's anything wrong with it right now. The problem for me is that it sets proc->xmin to the newest value it can, while I want/need the oldest valid value... I will go with adding an already_locked parameter to GetOldestXmin then. Thanks for the input, Andres -- Andres Freund http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] [PERFORM] Slow query: bitmap scan troubles
[moved to hackers] On Wednesday, December 5, 2012, Tom Lane wrote: Jeff Janes jeff.ja...@gmail.com javascript:; writes: I now see where the cost is coming from. In commit 21a39de5809 (first appearing in 9.2) the fudge factor cost estimate for large indexes was increased by about 10 fold, which really hits this index hard. This was fixed in commit bf01e34b556 Tweak genericcostestimate's fudge factor for index size, by changing it to use the log of the index size. But that commit probably won't be shipped until 9.3. Hm. To tell you the truth, in October I'd completely forgotten about the January patch, and was thinking that the 1/1 cost had a lot of history behind it. But if we never shipped it before 9.2 then of course that idea is false. Perhaps we should backpatch the log curve into 9.2 --- that would reduce the amount of differential between what 9.2 does and what previous branches do for large indexes. I think we should backpatch it for 9.2.3. I've seen another email which is probably due to the same issue (nested loop vs hash join). And some monitoring of a database I am responsible for suggests it might be heading in that direction as well as the size grows. But I am wondering if it should be present at all in 9.3. When it was introduced, the argument seemed to be that smaller indexes might be easier to keep in cache. And surely that is so. But if a larger index that covers the same type of queries exists when a smaller one also exists, we can assume the larger one also exists for a reason. While it may be easier to keep a smaller index in cache, it is not easier to keep both a larger and a smaller one in cache as the same time. So it seems to me that this reasoning is a wash. (Countering this argument is that a partial index is more esoteric, and so if one exists it is more likely to have been well-thought out) The argument for increasing the penalty by a factor of 10 was that the smaller one could be swamped by noise such as page-boundary-roundoff behavior. I don't really know what that means, but it seems to me that if it is so easily swamped by noise, then it probably isn't so important in the first place which one it chooses. Whereas, I think that even the log based penalty has the risk of being too much on large indexes. (For one thing, it implicitly assumes the fan-out ratio at each level of btree is e, when it will usually be much larger than e.) One thing which depends on the index size which, as far as I can tell, is not currently being counted is the cost of comparing the tuples all the way down the index. This would be proportional to log2(indextuples) * cpu_index_tuple_cost, or maybe log2(indextuples) * (cpu_index_tuple_cost+cpu_operator_cost), or something like that. This cost would depend on the number index tuples, not baserel tuples, and so would penalize large indexes. It would be much smaller than the current log(pages/1) penalty, but it would be more principle-based rather than heuristic-based. The log(pages/1) change is more suitable for back-patching because it is more conservative, being asymptotic with the previous behavior at the low end. But I don't think that the case for that previous behavior was ever all that strong. If we really want a heuristic to give a bonus to partial indexes, maybe we should explicitly give them a bonus, rather than penalizing ordinary indexes (which penalty is then used in comparing them to hash joins and such, not just partial indexes). maybe something like bonus = 0.05 * (reltuples-indextuples)/reltuples Cheers, Jeff
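As a rough sanity check on the magnitude of the descent-cost term proposed above, the same arithmetic can be done in SQL for a hypothetical index of ten million tuples (the tuple count is invented; the GUC defaults are the stock ones):

SELECT ceil(log(2, 10000000))
       * (current_setting('cpu_index_tuple_cost')::float8
          + current_setting('cpu_operator_cost')::float8) AS descent_cost;
-- with default settings this is 24 * (0.005 + 0.0025) = 0.18, i.e. a modest,
-- slowly growing penalty for larger indexes.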
Re: [HACKERS] PATCH: optimized DROP of multiple tables within a transaction
On 2012-12-17 00:31:00 +0100, Tomas Vondra wrote: I've updated the patch to include the optimization described in the previous post, i.e. if the number of relations is below a certain threshold, use a simple for loop, for large numbers of relations use bsearch calls. This is done by a new constant BSEARCH_LIMIT, which is set to 10 in the patch. Then I've modified the 'drop-test' script to take yet another argument - number of indexes for each table. So for example this ./drop-test.py 1 100 3 'dbname=test' means create 1 tables, 3 indexes for each of them, and then drop them in batches of 100 tables. Then I've run the test with 0, 1, 2, ... 11 indexes for each table for (a) unpatched HEAD (b) patch v3.1 (without the optimization) (c) patch v3.3 (with BSEARCH_LIMIT=10) and I got these results: Nice work! 1) dropping one-by-one -- This is the really interesting part - the issue with the v3.1 is that for a single table, it's ~2x slower than unpatched PostgreSQL. 0 1 2 3 4 5 6 789 10 11 -- unpatched 16 28 40 52 63 75 87 99 110 121 135 147 v3.1 33 43 46 56 58 60 63 72 75 76 79 80 v3.3 16 20 23 25 29 33 36 40 44 47 79 82 The values are durations in seconds, rounded to integer values. I've run the test repeatedly and there's very small variance in the numbers. The v3.3 improves that and it's actually even faster than unpatched PostgreSQL. How can this happen? I believe it's because the code is rewritten from for each relation (r) in the drop list DropRelFileNodeAllBuffers (r) for each shared buffer check and invalidate to copy the relations from drop list to array (a) DropRelFileNodeAllBuffers(a) for each shared buffer for each relation (r) in the array check and invalidate At least that's the only explanation I was able to come up with. That seems like a rather sensible explanation. The earlier algorithm iterated over shared buffers multiple times which is the expensive thing here, the new version doesn't so its sensible that its faster as long as the comparison method for checking whether a buffer belongs to relation is suitably fast. Yet another interesting observation is that even the v3.1 is about as fast as the unpatched code once there are 3 or more indexes (or TOAST tables). So my opinion is that the optimizated patch works as expected, and that even without the optimization the performance would be acceptable for most real-world cases. I think except of the temp buffer issue mentioned below its ready. -DropRelFileNodeAllBuffers(RelFileNodeBackend rnode) +DropRelFileNodeAllBuffers(RelFileNodeBackend * rnodes, int nnodes) { - int i; + int i, j; + + /* sort the list of rnodes */ + pg_qsort(rnodes, nnodes, sizeof(RelFileNodeBackend), rnode_comparator); /* If it's a local relation, it's localbuf.c's problem. */ - if (RelFileNodeBackendIsTemp(rnode)) + for (i = 0; i nnodes; i++) { - if (rnode.backend == MyBackendId) - DropRelFileNodeAllLocalBuffers(rnode.node); - return; + if (RelFileNodeBackendIsTemp(rnodes[i])) + { + if (rnodes[i].backend == MyBackendId) + DropRelFileNodeAllLocalBuffers(rnodes[i].node); + } } While you deal with local buffers here you don't anymore in the big loop over shared buffers. That wasn't needed earlier since we just returned after noticing we have local relation, but thats not the case anymore. for (i = 0; i NBuffers; i++) { + RelFileNodeBackend *rnode = NULL; volatile BufferDesc *bufHdr = BufferDescriptors[i]; - + /* * As in DropRelFileNodeBuffers, an unlocked precheck should be safe * and saves some cycles. 
*/ - if (!RelFileNodeEquals(bufHdr->tag.rnode, rnode.node)) + + /* + * For low number of relations to drop just use a simple walk through, + * to save the bsearch overhead. The BSEARCH_LIMIT is rather a guess + * than an exactly determined value, as it depends on many factors (CPU + * and RAM speeds, amount of shared buffers etc.). + */ + if (nnodes <= BSEARCH_LIMIT) I think that's a sensible plan. It makes sense that for a small number of relations a sequential scan of the rnodes array is faster than a bsearch, and 10 sounds like a good value, although I would guess the optimal value is slightly higher on most machines. But if it works fine without regressions that's pretty good... + +/* + * Used to sort relfilenode array
Re: [HACKERS] Patch for checking file parameters to psql before password prompt
On Sun, Dec 2, 2012 at 4:37 AM, Alastair Turner b...@ctrlf5.co.za wrote: Patch for the changes discussed in http://archives.postgresql.org/pgsql-hackers/2010-10/msg00919.php attached (eventually ...) In summary: If the input file (-f) doesn't exist or the ouput or log files (-o and -l) can't be created psql exits before prompting for a password. I assume you meant -L instead of -l here for specifying psql's log file. I don't think that the inability to write to psql's log file should be treated as a fatal error, especially since it is not treated as such by the current code: $ psql test -L /tmp/not_allowed psql: could not open log file /tmp/not_allowed: Permission denied [... proceeds to psql prompt from here ...] and the user (or script) may still usefully perform his work. Whereas with your patch: $ psql test -L /tmp/not_allowed psql: could not open log file /tmp/not_allowed: Permission denied $ And IMO the same concern applies to the query results file, -o. Although +1 for the part about having psql exit early if the input filename does not exist, since psql already bails out in this case, and there is nothing else to be done in such case. Josh -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Makefiles don't seem to remember to rebuild everything anymore
On Mon, Dec 17, 2012 at 1:34 PM, Tom Lane t...@sss.pgh.pa.us wrote: I wrote: Now perhaps this is not make's fault so much as a lack of adequate dependency specifications. It may be that we can still use .SECONDARY if we add the $(OBJS) lists as explicit targets of make all in backend directories, but I'm not sure how invasive that would be. I experimented a bit with this: diff --git a/src/backend/common.mk b/src/backend/common.mk index 2e56151..822b1e9 100644 *** a/src/backend/common.mk --- b/src/backend/common.mk *** SUBDIROBJS = $(SUBDIRS:%=%/$(subsysfilen *** 20,26 # top-level backend directory obviously has its own all target ifneq ($(subdir), src/backend) ! all: $(subsysfilename) endif SUBSYS.o: $(SUBDIROBJS) $(OBJS) --- 20,26 # top-level backend directory obviously has its own all target ifneq ($(subdir), src/backend) ! all: $(subsysfilename) $(OBJS) endif SUBSYS.o: $(SUBDIROBJS) $(OBJS) which seems to fix the main issue, but it's still a bit wonky as far as making objfiles.txt goes: $ cd pgsql/src/backend/parser/ $ rm analyze.o rm: remove regular file `analyze.o'? y $ make gcc -O2 -Wall -Wmissing-prototypes -Wpointer-arith -Wdeclaration-after-statement -Wendif-labels -Wmissing-format-attribute -Wformat-security -fno-strict-aliasing -fwrapv -fexcess-precision=standard -g -I. -I. -I../../../src/include -D_GNU_SOURCE -c -o analyze.o analyze.c $ make touch objfiles.txt $ make make: Nothing to be done for `all'. This is definitely not per make's contract, either. I think maybe the Don't rebuild the list if only the OBJS have changed hack in common.mk is a brick or two shy of a load, but I don't know how to fix that. I feel like it's been this way for a while - at least I feel like I've noticed this before. I think there is some inevitable kludginess around having one makefile per subdirectory that leads to these kinds of issues. Maybe we should get rid of all the makefiles under src/backend except for the top-level one and just do everything there. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] logical decoding - GetOldestXmin
On Tue, Dec 18, 2012 at 7:59 PM, Andres Freund and...@2ndquadrant.com wrote: On 2012-12-18 19:56:18 -0500, Robert Haas wrote: On Tue, Dec 18, 2012 at 5:25 PM, anara...@anarazel.de and...@anarazel.de wrote: The problem is that at the time GetSnapshotData returns the xmin horizon might have gone upwards and tuples required for decoding might get removed by other backends. That needs to be prevented while holding the procarray lock exclusively. Well, for the ordinary use of GetSnapshotData(), that doesn't matter, because GetSnapshotData() also updates proc-xmin. If you're trying to store a different value in that field then of course it matters. Absolutely right. I don't want to say there's anything wrong with it right now. The problem for me is that it sets proc-xmin to the newest value it can while I want/need the oldest valid value... I will go with adding a already_locked parameter to GetOldestXmin then. Or instead of bool already_locked, maybe bool advertise_xmin? Seems like that might be more friendly to the abstraction boundaries. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Support for REINDEX CONCURRENTLY
On 2012-12-17 11:44:00 +0900, Michael Paquier wrote: Thanks for all your comments. The new version (v5) of this patch fixes the error you found when reindexing indexes being referenced in foreign keys. The fix is done with switchIndexConstraintOnForeignKey:pg_constraint.c, in charge of scanning pg_constraint for foreign keys that refer to the parent relation (confrelid) of the index being swapped, and then switching conindid to the new index if the old index was referenced. This API also takes care of switching the dependency between the foreign key and the old index by calling changeDependencyFor. I also added a regression test for this purpose. Ok. Are there no other dependencies towards indexes? I don't know of any right now, but I have the feeling there were some other cases. On Tue, Dec 11, 2012 at 12:28 AM, Andres Freund and...@2ndquadrant.com wrote: Some review comments: * Some of the added !is_reindex in index_create don't seem safe to me. This is added to control the concurrent index relation for toast indexes. If we do not add an additional flag for that it will not be possible to reindex a toast index concurrently. I think some of them were added for cases that didn't seem to be related to that. I'll recheck in the current version. * Why do we now support reindexing exclusion constraints? CREATE INDEX CONCURRENTLY is not supported for exclusion constraints, but I played around with exclusion constraints with my patch and did not see any particular problems in supporting them, as for example index_build performs a second scan of the heap when running, so it looks solid enough for that. Is it because the structure of the REINDEX CONCURRENTLY patch is different? Honestly I think not, so is there something I am not aware of? I think I asked because you had added an !is_reindex to one of the checks. If I recall correctly the reason why concurrent index builds couldn't support exclusion constraints - namely that we cannot use them to check for new row versions when the index is in the ready-but-!valid state - that shouldn't be a problem when we have a valid version of an old index around, because that one enforces everything. It would maybe need an appropriate if (!isvalid) in the exclusion constraint code, but that should be it. * REINDEX DATABASE .. CONCURRENTLY doesn't work, a variant that does the concurrent reindexing for user tables and non-concurrent for system tables would be very useful. E.g. for the upgrade from 9.1.5-9.1.6... OK. I thought that this was out of scope for the time being. I haven't done anything about that yet. Supporting that will not be complicated as ReindexRelationsConcurrently (new API) is more flexible now; the only thing needed is to gather the list of relations that need to be reindexed. Imo that so greatly reduces the usability of this patch that you should treat it as in scope ;). Especially as you say, it really shouldn't be that much work with all the groundwork built. * would be nice (but that's probably a step #2 thing) to do the individual steps of concurrent reindex over multiple relations to avoid too much overall waiting for other transactions. I think I did that by now using one transaction per index for each operation except the drop phase... Without yet having read the new version, I think that's not what I meant. There currently is a wait for concurrent transactions to end after most of the phases for every relation, right? If you have a busy database with somewhat long-running transactions that's going to slow everything down with waiting quite a bit.
I wondered whether it would make sense to do PHASE1 for all indexes in all relations, then wait once, then PHASE2... That obviously has some space and index maintenance overhead issues, but it's probably sensible anyway in many cases. * PHASE 6 should acquire exclusive locks on the indexes. The necessary lock is taken when calling index_drop through performMultipleDeletions. Do you think it is not enough and that I should add an exclusive lock inside index_concurrent_drop? It seems to be safer to acquire it earlier; otherwise the likelihood of deadlocks seems to be slightly higher as you're increasing the lock severity. And it shouldn't cause any disadvantages, so ... Starts to look really nice now! Isn't the following block content that's mostly available somewhere else already?
+ <refsect2 id="SQL-REINDEX-CONCURRENTLY">
+  <title id="SQL-REINDEX-CONCURRENTLY-title">Rebuilding Indexes Concurrently</title>
+
+  <indexterm zone="SQL-REINDEX-CONCURRENTLY">
+   <primary>index</primary>
+   <secondary>rebuilding concurrently</secondary>
+  </indexterm>
+
+  <para>
+   Rebuilding an index can interfere with regular operation of a database.
+   Normally <productname>PostgreSQL</> locks the table whose index is rebuilt
+   against writes and performs the entire index build with a single scan of the
+   table. Other transactions can still read the table, but if
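[For readers following the pg_constraint handling discussed in the review above, here is an illustrative query - a sketch only, not part of the patch, and "my_table" is a hypothetical referenced table - showing the catalog rows that switchIndexConstraintOnForeignKey has to repoint: a foreign key records the unique index it depends on in pg_constraint.conindid, which is why swapping in a new index must also update that column.]

    -- Foreign keys whose backing index belongs to the referenced table,
    -- i.e. the rows that must be switched to the new index during the swap.
    SELECT c.conname,
           c.conrelid::regclass  AS referencing_table,
           c.confrelid::regclass AS referenced_table,
           c.conindid::regclass  AS index_backing_the_fk
    FROM pg_constraint c
    WHERE c.contype = 'f'
      AND c.confrelid = 'my_table'::regclass;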
Re: [HACKERS] Switching timeline over streaming replication
Heikki, I ran into an unexpected issue while testing. I just wanted to fire up a chain of 5 replicas to see if I could connect them in a loop. However, I ran into a weird issue when starting up r3: it refused to come out of the "database is starting up" mode until I did a write on the master. Then it came up fine. master--r1--r2--r3--r4 I tried doing the full replication sequence (basebackup, startup, test) with it twice and got the exact same results each time. This is very strange because I did not encounter the same issue with r2 or r4. Nor have I seen this before in my tests. I'm also seeing Thom's spurious error message now. Each of r2, r3 and r4 has the following message once in its logs: LOG: database system was interrupted while in recovery at log time 2012-12-19 02:49:34 GMT HINT: If this has occurred more than once some data might be corrupted and you might need to choose an earlier recovery target. This message doesn't seem to signify anything. -- Josh Berkus PostgreSQL Experts Inc. http://pgexperts.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
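[As an aside for anyone reproducing this: while a standby is stuck like that it refuses connections, but its upstream still shows whether WAL is flowing to it. A hedged sketch of what could be checked on r2 while r3 appears stuck, using the 9.2/9.3 column names:]

    -- Run on the upstream (r2 here): is the stuck standby connected,
    -- and are its write/replay positions advancing?
    SELECT application_name, client_addr, state,
           sent_location, write_location, flush_location, replay_location
    FROM pg_stat_replication;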
[HACKERS] Cascading replication: should we detect/prevent cycles?
Folks, So as a test I tried to connect a group of 9.3 streaming replicas in a circle (4 replicas). This was very easy to do:

1. create r1 as replica of master
2. create r2 as replica of r1
3. create r3 as replica of r2
4. create r4 as replica of r3
5. start traffic on master
6. shut down r1
7. point r1 to r4 in recovery.conf
8. start r1

Replicas are now successfully connected in this pattern:

r1 --- r2
^       |
|       |
|       v
r4 --- r3

pg_stat_replication displays the correct information on each replica. So, my question is:

1. should we detect for replication cycles? *Can* we?
2. should we warn the user, or refuse to start up?

-- Josh Berkus PostgreSQL Experts Inc. http://pgexperts.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
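[As an aside on why detection is awkward today: each node's pg_stat_replication only lists its downstream standbys, so tracing the ring means hopping from node to node by hand. A hedged sketch of the per-node check:]

    -- Run on each node in turn; client_addr lets you walk the ring,
    -- but no single node can see the whole topology.
    SELECT client_addr, application_name, state, sync_state
    FROM pg_stat_replication;

    -- And on the same node, confirm whether it is itself a standby:
    SELECT pg_is_in_recovery();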
[HACKERS] Feature Request: pg_replication_master()
Hackers, Currently we can see each master's current replicas using pg_stat_replication. However, there is no way from a replica, that I know of, to figure out who its master is other than to look at recovery.conf. We should probably have a function, like pg_replication_master(), which gives the host address of the current master. This would help DBAs for large replication clusters a lot. Obviously, this would only work in streaming. -- Josh Berkus PostgreSQL Experts Inc. http://pgexperts.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
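[For illustration, the closest workaround today - a hedged sketch, and superuser-only because pg_read_file has a hardcoded superuser check - is to read recovery.conf from SQL on the replica; the call in the trailing comment is the hypothetical proposed interface, not anything that exists.]

    -- On the replica: dump recovery.conf, which contains primary_conninfo.
    SELECT pg_read_file('recovery.conf');

    -- What the proposed function might look like (hypothetical, not implemented):
    -- SELECT pg_replication_master();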
Re: [HACKERS] Makefiles don't seem to remember to rebuild everything anymore
Robert Haas robertmh...@gmail.com writes: On Mon, Dec 17, 2012 at 1:34 PM, Tom Lane t...@sss.pgh.pa.us wrote: This is definitely not per make's contract, either. I think maybe the "Don't rebuild the list if only the OBJS have changed" hack in common.mk is a brick or two shy of a load, but I don't know how to fix that. I feel like it's been this way for a while - at least I feel like I've noticed this before. I think there is some inevitable kludginess around having one makefile per subdirectory that leads to these kinds of issues. Maybe we should get rid of all the makefiles under src/backend except for the top-level one and just do everything there. I mentioned this paper last year, but maybe it's time to start taking it seriously: http://aegis.sourceforge.net/auug97.pdf regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] too much pgbench init output
On Mon, Dec 17, 2012 at 5:37 AM, Tomas Vondra t...@fuzzy.cz wrote: Hi, attached is a new version of the patch that (a) converts the 'log_step_seconds' variable to a constant (and does not allow changing it using a command-line option etc.) (b) keeps the current logging as a default (c) adds a -q switch that enables the new logging with a 5-second interval I'm still not convinced there should be yet another knob for tuning the log interval - opinions? It seems that you have generated a patch over your earlier version and due to that it is not cleanly applying on fresh sources. Please generate the patch against fresh sources. However, I have absolutely no issues with the design. Also, code review is already done and it looks good to me. I think to move forward on this we need someone from the core team. So I am planning to change its status to ready-for-committer. Before that please provide an updated patch for final code review. Thanks Tomas On 11.12.2012 10:23, Jeevan Chalke wrote: On Sun, Dec 9, 2012 at 8:11 AM, Tomas Vondra t...@fuzzy.cz wrote: On 20.11.2012 08:22, Jeevan Chalke wrote: Hi, On Tue, Nov 20, 2012 at 12:08 AM, Tomas Vondra t...@fuzzy.cz wrote: On 19.11.2012 11:59, Jeevan Chalke wrote: Hi, I went through the discussion for this patch and here is my review: The main aim of this patch is to reduce the number of log lines. It was also suggested to use an option to provide the interval, but a few of us did not quite agree with that. So the final discussion ended at keeping a 5-second interval between each log line. However, I see there are two types of users here: 1. those who like these log lines, so that they can troubleshoot some slowness, and 2. those who do not like these log lines. Who likes these lines / needs them for something useful? No idea. I fall in the second category. But from the discussion, I believe some people may need detailed (or a lot more) output. I've read the thread again and my impression is that no one really needs or likes those lines, but (1) it's rather pointless to print a message every 100k rows, as it usually fills the console with garbage (2) it's handy to have regular updates of the progress I don't think there are people (in the thread) that require keeping the current amount of log messages. But there might be users who actually use the current logs to do something (although I can't imagine what). If we want to do this in a backwards compatible way, we should probably use a new option (e.g. -q) to enable the new (less verbose) logging. Do we want to allow both types of logging, or shall we keep only the new one? If both, which one should be the default one? Both options are fine with me, but the default should be the current behaviour. So keeping these in mind, I would rather go for an option which will control this. People falling in category one can set this option very low, whereas users falling in the second category can keep it high. So what option(s) would you expect? Something that tunes the interval length or something else? Interval length. Well, I can surely imagine something like --interval N. +1 A switch that'd choose between the old and new behavior might be a good idea, but I'd strongly vote against automagic heuristics. It makes the behavior very difficult to predict and I really don't want to force the users to wonder whether the long delay is due to general slowness of the machine or some clever logic that causes long delays between log messages.
That's why I chose a very simple approach with a constant time interval. It does what I was aiming for (fewer logs) and it's easy to predict. Sure, we could choose a different interval (or make it an option). I would prefer an option for choosing an interval, say from 1 second to 10 seconds. Uh, why not allow an arbitrary integer? Why say 1 to 10 seconds? Hmm.. actually, I have no issues with any number there. I just put 1..10 as we hard-coded it to 5. No particular reason as such. BTW, what if we put one log message every 10% (or 5%) with the time taken (time taken for the last 10% (or 5%) and cumulative) over 5 seconds? This will have only 10 (or 20) lines per pgbench
Re: [HACKERS] Feature Request: pg_replication_master()
On Dec 19, 2012 4:43 AM, Josh Berkus j...@agliodbs.com wrote: Hackers, Currently we can see each master's current replicas using pg_stat_replication. However, there is no way from a replica, that I know of, to figure out who its master is other than to look at recovery.conf. We should probably have a function, like pg_replication_master(), which gives the host address of the current master. This would help DBAs for large replication clusters a lot. Obviously, this would only work in streaming. This sounds like my previous suggestion of returning the primary conninfo value, but with just ip. That one came with a pretty bad patch, and was later postponed until we folded recovery.conf into the main configuration file parsing. I'm not really sure what happened to that project? (the configuration file one) /Magnus
Re: [HACKERS] system administration functions with hardcoded superuser checks
On Wed, Dec 19, 2012 at 1:58 AM, Tomas Vondra t...@fuzzy.cz wrote: On 18.12.2012 18:38, Pavel Stehule wrote: 2012/12/18 Peter Eisentraut pete...@gmx.net: There are some system administration functions that have hardcoded superuser checks, specifically: pg_reload_conf pg_rotate_logfile Some of these are useful in monitoring or maintenance tools, and the hardcoded superuser checks require that these tools run with maximum privileges. Couldn't we just install these functions without default privileges and allow users to grant privileges as necessary? Isn't that too strong a gun for some people??? I believe someone could decrease the necessary rights and it would open doors into the system. No one was speaking about making them executable by a wider group of users by default (i.e. decreasing necessary rights). Today, when you need to provide the EXECUTE privilege on those functions, you have three options Given how limited these functions are in scope, I don't see a problem here. pg_read_file pg_read_file_all pg_read_binary_file pg_read_binary_file_all pg_stat_file pg_ls_dir are relatively dangerous and I am not for opening these functions. A power user can simply write an extension, but then he knows what he is doing. I see only dangers that are already present. Granting executability on pg_read_xyz is pretty darn close to granting superuser, without explicitly asking for it. Well, you get a read-only superuser. If we want to make that step as easy as just GRANT, we really need to write some *very* strong warnings in the documentation so that people realize this. I doubt most people will realize it unless we do that (and those who don't read the docs, which is probably a majority, never will). If you use SECURITY DEFINER, you can limit the functions to *the specific files that you want to grant read on*. Which makes it possible to actually make it secure. E.g. you *don't* have to give full read to your entire database. If you're comparing it to a blanket SECURITY DEFINER with no checks, then yes, it's a simpler way to fire the cannon into your own foot. But it also gives you a way that makes it more likely that you don't *realize* that you're about to fire a cannon into your foot. -- Magnus Hagander Me: http://www.hagander.net/ Work: http://www.redpill-linpro.com/ -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
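[For illustration of the SECURITY DEFINER route described above, a minimal sketch; the function name, the file it exposes, and the "monitor" role are assumptions, not anything proposed in the thread.]

    -- Created by a superuser; the body runs with the owner's rights, so the
    -- hardcoded superuser check inside pg_read_file passes, but callers can
    -- only read this one file.
    CREATE OR REPLACE FUNCTION read_postgresql_conf() RETURNS text
    LANGUAGE sql SECURITY DEFINER SET search_path = pg_catalog
    AS $$ SELECT pg_read_file('postgresql.conf') $$;

    REVOKE ALL ON FUNCTION read_postgresql_conf() FROM PUBLIC;
    GRANT EXECUTE ON FUNCTION read_postgresql_conf() TO monitor;  -- "monitor" is a hypothetical role

[Whether that is easier to get right than a blanket GRANT on pg_read_file itself is exactly the trade-off being discussed.]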
Re: [HACKERS] proposal - assign result of query to psql variable
On Tue, Dec 18, 2012 at 2:52 AM, Pavel Stehule pavel.steh...@gmail.com wrote: Attached updated patch Seems fine. I'll mark this as ready for committer. Nice work! -- Shigeru HANADA -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Cascading replication: should we detect/prevent cycles?
On 19 December 2012 03:03, Josh Berkus j...@agliodbs.com wrote: So, my question is: 1. should we detect for replication cycles? *Can* we? 2. should we warn the user, or refuse to start up? Why not just monitor the config you just created? Anybody that actually tests their config would spot this. -- Simon Riggs http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers