Re: [HACKERS] Proposal for Allow postgresql.conf values to be changed via SQL
On 11/9/12 11:59 PM, Amit kapila wrote: Please let me know if there are any objections or problems in above method of implementation, else I can go ahead to prepare the patch for the coming CF.

It may be the case that the locking scheme Robert described is the best approach here. It seems kind of heavy to me though. I suspect that some more thinking about it might come up with something better. Regardless, exactly how to prevent two concurrent processes from writing the same file feels like the last step in the small roadmap for what this feature needs. If you wanted to work on it more, I'd suggest breaking it into chunks in this order:

1) Change to add scanning of a .conf directory in the default configuration using include-dir. This is a quick fix. I predict most of the headaches around it will end up being for packagers rather than the core code to deal with. You could submit this as a small thing to be evaluated on its own. How it's done is going to be controversial. Might as well get that debate focused on a sample implementation as soon as possible.

2) Get familiar with navigating the GUC data and figuring out what, exactly, needs to be written out. This should include something that reflects how things appear after a RESET, ignoring per-user or per-session changes when figuring out what goes there. It seems inevitable that some amount of validating against the source information--what pg_settings labels source, sourcefile, and sourceline--will be needed. An example is the suggestion Magnus made for confirming that the include-dir is still active before writing something there.

3) Add the function to write a new file out. Work out some test cases for that to confirm the logic and error checking in the previous step all work. I'd submit what you've got for (2) and (3) for review at this point, before complicating things further with the locking parts.

4) Make the file write atomic and able to work when multiple users try it at once.
You have to reach here successfully before the comparatively trivial questions around file locking come into play. I wouldn't even bother aiming for that part in a first patch. It's obviously a solvable problem in a number of ways. You need a rock-solid way to figure out what to write there before that solution is useful, though. -- Greg Smith 2ndQuadrant USg...@2ndquadrant.com Baltimore, MD PostgreSQL Training, Services, and 24x7 Support www.2ndQuadrant.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
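To make step (4) concrete, here is a minimal sketch of the usual write-temp-then-rename approach to an atomic file replacement on POSIX systems. The function name and error handling are illustrative only, not from any patch in this thread; rename() within one filesystem is atomic, so concurrent readers see either the old file or the new one, never a torn mix.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

/* Sketch: atomically replace a config file by writing a temp file,
 * flushing it to disk, then rename()ing it over the target.  The
 * temp-file naming and error handling are illustrative only. */
static int
write_config_atomically(const char *path, const char *contents)
{
    char  tmp[1024];
    FILE *fp;

    snprintf(tmp, sizeof(tmp), "%s.tmp", path);
    fp = fopen(tmp, "w");
    if (fp == NULL)
        return -1;
    if (fputs(contents, fp) == EOF ||
        fflush(fp) != 0 ||
        fsync(fileno(fp)) != 0)
    {
        fclose(fp);
        unlink(tmp);            /* clean up the partial temp file */
        return -1;
    }
    fclose(fp);
    return rename(tmp, path);   /* atomic swap into place */
}
```

This only solves the torn-write half of the problem; serializing two concurrent writers (the locking question above) still needs a separate mechanism such as a lock file or an LWLock.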
Re: [HACKERS] Enabling Checksums
On 11/12/12 12:55 AM, Jesper Krogh wrote: I'd just like some rough guard against hardware/OS related data corruption, and that is more likely to hit data blocks constantly flying in and out of the system.

I get that. I think that some of the design ideas floating around since this feature was first proposed have been innovating in the hope of finding a clever halfway point here. Ideally we'd be able to get online checksum conversion up and running easily, reliably, and without adding a lot of code. I have given up on that now though. The approach of doing a heavy per-table conversion with more state information than we'd like seems unavoidable, if you want to do it right and allow people to (slowly but surely) reach a trustworthy state. I think we should stop searching for a clever way around and just slog through doing it. I've resigned myself to that now, and recently set aside a good block of time to beat my head against that particular wall over the next couple of months.

But I totally agree that the scheme described, with integrating it into an autovacuum process, would be very close to ideal, even on a database such as the one I'm running.

I am sadly all too familiar with how challenging it is to keep a 2TB PostgreSQL database running reliably. One of my recent catch phrases for talks is "if you have a big Postgres database, you also have a vacuum problem". I think it's unreasonable to consider online conversion solutions that don't recognize that, and allow coordinating the work with the challenges of vacuuming larger systems too. -- Greg Smith 2ndQuadrant USg...@2ndquadrant.com Baltimore, MD PostgreSQL Training, Services, and 24x7 Support www.2ndQuadrant.com
Re: [HACKERS] Enabling Checksums
On 12/11/12 05:55, Greg Smith wrote: The only guarantee I see that we can give for online upgrades is that after a VACUUM CHECKSUM sweep is done, and every page is known to both have a valid checksum on it and have its checksum bits set, *then* any page that doesn't have both set bits and a matching checksum is garbage. Until reaching that point, any old data is suspect. The idea of operating in a "we'll convert on write but never convert old pages" mode can't come up with any useful guarantees about data integrity that I can see. As you say, you don't ever gain the ability to tell pages that were checksummed but have since been corrupted from ones that were corrupt all along in that path.

You're right about that, but I'd just like some rough guard against hardware/OS related data corruption, and that is more likely to hit data blocks constantly flying in and out of the system. I'm currently running a 2TB+ database, and the capability to just see some kind of corruption earlier rather than later is a major benefit by itself. Currently corruption can go undetected if it just happens to hit data-only parts of the database. But I totally agree that the scheme described, with integrating it into an autovacuum process, would be very close to ideal, even on a database such as the one I'm running. -- Jesper
Re: [HACKERS] WIP checksums patch
On 11/9/12 6:14 PM, Jeff Davis wrote: On Mon, 2012-11-05 at 12:19 -0500, Robert Haas wrote: Yeah. I definitely think that we could shed an enormous amount of complexity by deciding that this is, for now, an option that can only be selected at initdb time. That would remove approximately 85% of everything I've ever disliked about this patch - without, I think, precluding the possibility of improving things later.

That's certainly true, but it introduces one large problem: upgrading would not work, which (in the past few releases) we've treated as a major showstopper for many features.

If you have pages that were written with one datetime storage format, and you create a cluster using the other one, that will fail. If checksums are done as an initdb-time option, I see "checksummed" as being a block change on that level, and the precedent for not supporting it defensible. pg_upgrade will need to check for a mismatch--new cluster has checksums turned on, old one doesn't--and abort if that happens. That can be lumped in with the other pg_controldata tests easily enough.

What I think this argues for, though, is being precise about naming what goes into pg_controldata. Let's say the initial commit target is an initdb-time switch, but later finer-grained ones are expected to be available. I think the output and source code labels on the initdb controldata part should refer to this as something like "checksums available" then. The word "enabled" could be misleading when there's finer-grained enabling coming later. We don't want people to run pg_controldata, see "checksums: enabled/on", and later discover they're not fully operational in that cluster yet. Saying "checksums: available" suggests there's somewhere else that should be checked, to tell if they're available on a given database/table or not.
The sort of text I'm thinking of for the manual and/or warning is: "You can't use pg_upgrade to migrate from a cluster where checksums are not available to one where they are. This limitation may be removed by a future version." -- Greg Smith 2ndQuadrant USg...@2ndquadrant.com Baltimore, MD PostgreSQL Training, Services, and 24x7 Support www.2ndQuadrant.com
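The mismatch check described above can be sketched as a simple comparison of one flag from the old and new clusters' control data. The struct and function names here are hypothetical stand-ins, not the real pg_upgrade API; the point is only that "checksums available" becomes one more field compared alongside the existing pg_controldata checks.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical sketch of the pg_upgrade-side test: compare the
 * "checksums available" flag read from old and new pg_controldata and
 * refuse the migration on a mismatch, as with the other control-data
 * compatibility checks.  Names are illustrative only. */
typedef struct ControlInfo
{
    bool checksums_available;   /* "available", per the naming argument above */
} ControlInfo;

static bool
checksum_settings_compatible(const ControlInfo *old_ctrl,
                             const ControlInfo *new_ctrl)
{
    /* Migration is only allowed when both clusters agree. */
    return old_ctrl->checksums_available == new_ctrl->checksums_available;
}
```

A caller would abort with the quoted error text when this returns false.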
Re: [HACKERS] Enabling Checksums
On 11/11/12 2:56 PM, Jeff Davis wrote: We could have a separate utility, pg_checksums, that can alter the state and/or do an offline verification. And initdb would take an option that would start everything out fully protected with checksums.

Adding an initdb option to start out with everything checksummed seems an uncontroversial good first thing to have available. It seems like a proper 9.3 target to aim at even if per-table upgrading gets bogged down in details. I have an argument below that the area between initdb and per-table upgrades is fundamentally uncertain and therefore not worth chasing after, based on reasons you already started to outline. There's not much useful middle ground there.

Won't a pg_checksums program just grow until it looks like a limited version of vacuum though? It's going to iterate over most of the table; it needs the same cost controls as autovacuum (and to respect the load of concurrent autovacuum work) to keep I/O under control; and those cost control values might change if there's a SIGHUP to reload parameters. It looks so much like vacuum that I think there needs to be a really compelling reason to split it into something new. Why can't this be yet another autovacuum worker that does its thing?

> In order to get to the fully-protected state, you still need to
> somehow make sure that all of the old data is checksummed.
>
> And the "fully protected" state is important in my opinion, because
> otherwise we aren't protected against corrupt page headers that say
> they have no checksum (even when it really should have a checksum).

I think it's useful to step back for a minute and consider the larger uncertainty an existing relation has, which amplifies just how ugly this situation is. The best guarantee I think online checksumming can offer is to tell the user "after transaction id X, all new data in relation R is known to be checksummed".
Unless you do this at initdb time, any conversion case is going to have the possibility that a page is corrupted before you get to it--whether you're adding the checksum as part of a "let's add them while we're writing anyway" page update or the conversion tool is hitting it. That's why I don't think anyone will find online conversion really useful until they've done a full sweep updating the old pages. And if you accept that, a flexible checksum upgrade utility, one that co-exists with autovacuum activity costs, becomes a must. One of the really common cases I was expecting here is that conversions are done by kicking off a slow background VACUUM CHECKSUM job that might run in pieces. I was thinking of an approach like this:

-Initialize a last_checked_block value for each table
-Loop:
--Grab the next block after the last checked one
--When on the last block of the relation, grab an exclusive lock to protect against race conditions with extension
--If it's marked as checksummed and the checksum matches, skip it
---Otherwise, add a checksum and write it out
--When that succeeds, update last_checked_block
--If that was the last block, save some state saying the whole table is checksummed

With that logic, there is at least a forward-moving pointer that removes the uncertainty around whether pages have been updated or not. It will keep going usefully if interrupted too. One obvious way this can fail is if:

1) A late page in the relation is updated and a checksummed page written
2) The page is corrupted such that the "is this checksummed?" bits are not consistent anymore, along with other damage to it
3) The conversion process gets to this page eventually
4) The corruption of (2) isn't detected

But I think that this possibility--that a page might get quietly corrupted after being checked once, but still in the middle of checking a relation--is both impossible to remove and a red herring.
How do we know that this page of the relation wasn't corrupted on disk before we even started? We don't, and we can't. The only guarantee I see that we can give for online upgrades is that after a VACUUM CHECKSUM sweep is done, and every page is known to both have a valid checksum on it and have its checksum bits set, *then* any page that doesn't have both set bits and a matching checksum is garbage. Until reaching that point, any old data is suspect. The idea of operating in a "we'll convert on write but never convert old pages" mode can't come up with any useful guarantees about data integrity that I can see. As you say, you don't ever gain the ability to tell pages that were checksummed but have since been corrupted from ones that were corrupt all along in that path. -- Greg Smith 2ndQuadrant USg...@2ndquadrant.com Baltimore, MD PostgreSQL Training, Services, and 24x7 Support www.2ndQuadrant.com
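The forward-moving sweep described in the step list above can be modeled in a few lines. This is a toy: the page layout, the checksum function, and the locking (omitted entirely here) are stand-ins for the real storage-layer machinery, but it shows the resumability property — last_checked_block always points one past the last page known good, so an interrupted run picks up where it left off.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Toy model of the sweep: each "page" carries a has-checksum flag and a
 * checksum over its contents.  The sweep verifies or stamps each page in
 * order and advances *last_checked_block after every page, so a restart
 * resumes from there.  Real code would also take the extension lock on
 * the final block, as the steps above describe. */
typedef struct ToyPage
{
    bool     has_checksum;
    uint16_t checksum;
    uint16_t data;              /* stands in for the page contents */
} ToyPage;

static uint16_t
toy_checksum(const ToyPage *p)
{
    return (uint16_t) (p->data * 31u + 7u);     /* placeholder algorithm */
}

/* Returns the number of pages now known good (== npages on success). */
static size_t
sweep_relation(ToyPage *pages, size_t npages, size_t *last_checked_block)
{
    size_t i;

    for (i = *last_checked_block; i < npages; i++)
    {
        if (!(pages[i].has_checksum &&
              pages[i].checksum == toy_checksum(&pages[i])))
        {
            /* Otherwise, add a checksum and "write it out". */
            pages[i].checksum = toy_checksum(&pages[i]);
            pages[i].has_checksum = true;
        }
        *last_checked_block = i + 1;            /* persisted progress marker */
    }
    return *last_checked_block;
}
```

Note what this cannot do, per the argument above: a page corrupted after being stamped, but before the sweep finishes, gets silently "repaired" with a fresh checksum. Only after the whole sweep completes does a mismatch become a trustworthy corruption signal.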
Re: [HACKERS] Doc patch "only relevant" -> "relevant only"
On Tue, 2012-10-16 at 22:24 -0500, Karl O. Pinc wrote: > In a number of places the docs read "only relevant", > this patch reverses this to read "relevant only". committed
Re: [HACKERS] Unresolved error 0xC0000409 on Windows Server
On Sun, Nov 11, 2012 at 8:27 PM, Tom Lane wrote:
> Noah Misch writes:
> > So, I can reproduce the lower threshold, but the exception type does not agree
> > with the one Matthew observed.
>
> I finally got around to looking at the link you provided about error
> 0xC0000409, and realized that I'd been completely confusing it with
> stack overflow --- but actually, it's a report that something scribbled
> past the end of a finite-size local-variable array. So I now think that
> Matthew's stumbled across two completely independent bugs, and we've
> fixed only one of them. The 0xC0000409 error is something else, and
> possibly a lot worse since it could conceivably be a security issue.
>
> It still seems likely that the actual location of the bug is either
> in PostGIS or in the GIST index code, but without the ability to
> reproduce the failure it's awfully hard to find it. Matthew, could
> you try a bit harder to find a self-contained test case that produces
> that error?
>
> regards, tom lane

Sure, it might take me a while to find time, but I'll keep it on my list. Matt
Re: [HACKERS] Unresolved error 0xC0000409 on Windows Server
Noah Misch writes: > So, I can reproduce the lower threshold, but the exception type does not agree > with the one Matthew observed. I finally got around to looking at the link you provided about error 0xC0000409, and realized that I'd been completely confusing it with stack overflow --- but actually, it's a report that something scribbled past the end of a finite-size local-variable array. So I now think that Matthew's stumbled across two completely independent bugs, and we've fixed only one of them. The 0xC0000409 error is something else, and possibly a lot worse since it could conceivably be a security issue. It still seems likely that the actual location of the bug is either in PostGIS or in the GIST index code, but without the ability to reproduce the failure it's awfully hard to find it. Matthew, could you try a bit harder to find a self-contained test case that produces that error? regards, tom lane
Re: [HACKERS] Inadequate thought about buffer locking during hot standby replay
On 2012-11-10 16:24:22 -0500, Tom Lane wrote:
> Simon Riggs writes:
> >> One thing that seems a bit annoying is the use of zero-based backup
> >> block indexes in RestoreBackupBlock, when most (not all) of the callers
> >> are referencing macros with one-based names (XLR_BKP_BLOCK_1 etc).
> >> That's a bug waiting to happen. We could address it by changing
> >> RestoreBackupBlock to accept a one-based argument, but what I'm thinking
> >> of doing instead is getting rid of the one-based convention entirely;
> >> that is, remove XLR_BKP_BLOCK_1 through XLR_BKP_BLOCK_4 and instead have
> >> all code use the XLR_SET_BKP_BLOCK() macro, which is zero-based. One
> >> advantage of doing that is that it'll force inspection of all code
> >> related to this.
>
> > I wouldn't do that in a back branch, but I can see why its a good idea.
>
> If any of this stuff were getting used by external modules, changing it
> would be problematic ... but fortunately, it isn't, because we lack
> support for plug-in rmgrs. So I'm not worried about back-patching the
> change, and would prefer to keep the 9.x branches in sync.

XLR_BKP_BLOCK_* might be used by things like pg_lesslog, and it's surely used by xlogdump. Not sure if either is worth that much attention, but it seems worth noticing that such a change will break stuff. Greetings, Andres Freund -- Andres Freund http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services
Re: [HACKERS] Enabling Checksums
On 11/11/2012 05:52 PM, Jeff Davis wrote: On Sun, 2012-11-11 at 21:20 +0100, Pavel Stehule wrote: I don't think GUCs are good for this purpose, but I don't like single-purpose statements either. What do you think about enhancing the ALTER DATABASE statement, something like ALTER DATABASE name ENABLE CHECKSUMS and ALTER DATABASE name DISABLE CHECKSUMS?

Per-database does sound easier than per-table. I'd have to think about how that would affect shared catalogs though. For now, I'm leaning toward an offline utility to turn checksums on or off, called pg_checksums. It could do so lazily (just flip a switch to "enabling" in pg_control), or it could do so eagerly and turn it into a fully-protected instance. For the first patch, it might just be an initdb-time option for simplicity.

+1 I haven't followed this too closely, but I did wonder several days ago why this wasn't being made an initdb-time decision. cheers andrew
Re: [HACKERS] review: pgbench - aggregation of info written into log
Hi, attached is v4 of the patch. There are not many changes, mostly some simple tidying up, except for the handling of Windows. After a bit more investigation, it seems to me that we can't really get the same behavior as on other systems - basically the timestamp is unavailable, so we can't log the interval start. So it seems to me the best we can do is to disable this option on Windows (and this is done in the patch). So when you try to use "--aggregate-interval" on Windows it will complain it's an unknown option.

Now that I think of it, there might be a better solution that would not mimic the Linux/Unix behaviour exactly - we do have the elapsed time since the start of the benchmark, so maybe we should use this instead of the timestamp? I mean, on systems with a reasonable gettimeofday we'd get

1345828501 5601 1542744 483552416 61 2573
1345828503 7884 1979812 565806736 60 1479
1345828505 7208 1979422 567277552 59 1391
1345828507 7685 1980268 569784714 60 1398
1345828509 7073 1979779 573489941 236 1411

but on Windows we'd get

0 5601 1542744 483552416 61 2573
2 7884 1979812 565806736 60 1479
4 7208 1979422 567277552 59 1391
6 7685 1980268 569784714 60 1398
8 7073 1979779 573489941 236 1411

i.e. the first column is "seconds since start of the test". Hmm, and maybe we could call gettimeofday at the beginning, to get the timestamp of the test start (AFAIK the issue on Windows is the resolution, but it should be good enough for that). Then we could add it to the elapsed time and we'd get the same output as on Linux. And maybe this could be done in regular pgbench execution too? But I really need help from someone who knows Windows and can test this.
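The workaround floated above — one coarse epoch timestamp at benchmark start, plus the high-resolution elapsed time per interval — reduces to a one-line computation. This is a sketch with illustrative names, not pgbench's actual variables:

```c
#include <assert.h>

/* Sketch: derive each aggregation interval's wall-clock stamp from a
 * single start-of-test epoch timestamp (coarse gettimeofday is fine for
 * this) plus the high-resolution elapsed seconds since the test began.
 * Names are illustrative, not pgbench's. */
static long
interval_timestamp(long start_epoch_sec, double elapsed_sec)
{
    /* Truncation matches logging whole-second interval starts. */
    return start_epoch_sec + (long) elapsed_sec;
}
```

With start_epoch_sec = 1345828501 and elapsed values 0, 2, 4, ... this reproduces the Linux-style first column shown in the sample output above.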
Comments regarding Pavel's reviews are below: On 2.10.2012 20:08, Pavel Stehule wrote:
> Hello
>
> I did review of pgbench patch
> http://archives.postgresql.org/pgsql-hackers/2012-09/msg00737.php
>
> * this patch is cleanly patched
>
> * I had a problem with doc
>
> make[1]: Entering directory `/home/pavel/src/postgresql/doc/src/sgml'
> openjade -wall -wno-unused-param -wno-empty -wfully-tagged -D . -D .
> -c /usr/share/sgml/docbook/dsssl-stylesheets/catalog -d stylesheet.dsl
> -t sgml -i output-html -V html-index postgres.sgml
> openjade:pgbench.sgml:767:8:E: document type does not allow element
> "TITLE" here; missing one of "ABSTRACT", "AUTHORBLURB", "MSGSET",
> "CALLOUTLIST", "ITEMIZEDLIST", "ORDEREDLIST", "SEGMENTEDLIST",
> "VARIABLELIST", "CAUTION", "IMPORTANT", "NOTE", "TIP", "WARNING",
> "FORMALPARA", "BLOCKQUOTE", "EQUATION", "EXAMPLE", "FIGURE", "TABLE",
> "PROCEDURE", "SIDEBAR", "QANDASET", "REFSECT3" start-tag
> make[1]: *** [HTML.index] Error 1
> make[1]: *** Deleting file `HTML.index'
> make[1]: Leaving directory `/home/pavel/src/postgresql/doc/src/sgml

Hmmm, I've never got the docs to build at all; all I get is a heap of SGML/DocBook related issues. Can you investigate a bit more where the issue is? I don't remember messing with the docs in a way that might cause this ... mostly copy'n'paste edits.

> * pgbench is compiled without warnings
>
> * there are a few issues in source code
>
> + /* should we aggregate the results or not? */
> + if (use_log_agg)
> + {
> + /* are we still in the same interval? if yes, accumulate the
> + * values (print them otherwise) */
> + if (agg->start_time + use_log_agg >= INSTR_TIME_GET_DOUBLE(now))
> + {

Errr, so what are the issues here?
> * a real time range for aggregation is dynamic (pgbench is not a
> realtime application), so I am not sure if the following constraint has
> sense
>
> +if ((duration > 0) && (use_log_agg > 0) && (duration % use_log_agg != 0)) {
> + fprintf(stderr, "duration (%d) must be a multiple of aggregation
> interval (%d)\n", duration, use_log_agg);
> + exit(1);
> + }

I'm not sure what might be the issue here either. If the test duration is set (using the "-T" option), then this kicks in and says that the duration should be a multiple of the aggregation interval. Seems like a sensible assumption to me. Or do you think this check should be removed?

> * it doesn't flush last aggregated data (it is mentioned by Tomas:
> "Also, I've realized the last interval may not be logged at all - I'll
> take a look into this in the next version of the patch."). And it can
> be significant for higher values of "use_log_agg"

Yes, and I'm still not sure how to fix this in practice. But I do have this on my TODO.

> * a name of variable "use_log_agg" is not good - maybe "log_agg_interval"?

I've changed it to agg_interval. regards Tomas

diff --git a/contrib/pgbench/pgbench.c b/contrib/pgbench/pgbench.c index 090c210..2093edc 100644 --- a/contrib/pgbench/pgbench.c +++ b/contrib/pgbench/pgbench.
Re: [HACKERS] Inadequate thought about buffer locking during hot standby replay
[ this time *with* the patch ] Attached is a complete draft patch against HEAD for this issue. I found a depressingly large number of pre-existing bugs: Practically all WAL record types that touch multiple pages have some bug of this type. In addition to btree_xlog_split, I found that heap_xlog_update, ginRedoDeletePage, spgRedoAddLeaf, spgRedoMoveLeafs, spgRedoAddNode, spgRedoSplitTuple, and spgRedoPickSplit fail to hold locks as required to make their updates safe for concurrent queries. (I'm not totally sure about ginRedoDeletePage, but the original action definitely locks the pages simultaneously, and it's not clear that it's safe not to.) Most of these are okay in cases without any full-page images, but could fail if the wrong subset of the pages-to-be-touched were processed by RestoreBkpBlocks. Some had bugs even without that :-( Worse than that, GIST WAL replay seems to be a total disaster from this standpoint, and I'm not convinced that it's even fixable without incompatible WAL changes. There are several cases where index insertion operations generate more than one WAL record while holding exclusive lock on buffer(s). If the lock continuity is actually necessary to prevent concurrent queries from seeing inconsistent index states, then we have a big problem, because WAL replay isn't designed to hold any buffer lock for more than one WAL action. This needs to be looked at closer by somebody who's more up on the GIST code than I am. The attached patch doesn't try very hard to make the GIST functions safe, it just transposes them to the new API. I also found that spgRedoAddNode could fail to update the parent downlink at all, if the parent tuple is in the same page as either the old or new split tuple --- it would get fooled by the LSN having been advanced already. This is an index corruption bug not just a transient-failure problem. 
Also, ginHeapTupleFastInsert's "merge lists" case fails to mark the old tail page as a candidate for a full-page image; in the worst case this could result in torn-page corruption. I already pointed out the inconsistency in heap_xlog_freeze about whether a cleanup lock is needed. As is, this patch uses a cleanup lock, but I suspect that a regular lock is sufficient --- comments? Also, gistRedoPageDeleteRecord seems to be dead code? I don't see anything that emits XLOG_GIST_PAGE_DELETE. regards, tom lane [Attachment: restore-backup-blocks-individually-2.patch.gz]
Re: [HACKERS] Enabling Checksums
On Sun, 2012-11-11 at 21:20 +0100, Pavel Stehule wrote:
> I don't think GUCs are good for this purpose, but I don't like
> single-purpose statements either.
>
> What do you think about enhancing the ALTER DATABASE statement,
> something like
>
> ALTER DATABASE name ENABLE CHECKSUMS and ALTER DATABASE name DISABLE CHECKSUMS

Per-database does sound easier than per-table. I'd have to think about how that would affect shared catalogs though. For now, I'm leaning toward an offline utility to turn checksums on or off, called pg_checksums. It could do so lazily (just flip a switch to "enabling" in pg_control), or it could do so eagerly and turn it into a fully-protected instance. For the first patch, it might just be an initdb-time option for simplicity. Regards, Jeff Davis
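The lazy/eager distinction Jeff describes amounts to a small cluster-wide state machine. The enum and transition function below are purely illustrative — not a real PostgreSQL or pg_checksums API — but they make explicit which moves a utility like this would allow:

```c
#include <assert.h>
#include <stdbool.h>

/* Illustrative sketch of the three checksum states discussed in this
 * thread and the transitions a pg_checksums-style utility might permit.
 * Names are hypothetical, not PostgreSQL's. */
typedef enum ChecksumState
{
    CHECKSUMS_OFF,          /* no pages carry checksums */
    CHECKSUMS_ENABLING,     /* new writes checksummed; old pages pending */
    CHECKSUMS_PROTECTED     /* every page verified and stamped */
} ChecksumState;

static bool
valid_transition(ChecksumState from, ChecksumState to)
{
    switch (from)
    {
        case CHECKSUMS_OFF:
            /* lazy switch-on: just flip the flag in pg_control */
            return to == CHECKSUMS_ENABLING;
        case CHECKSUMS_ENABLING:
            /* eager sweep finished, or the admin gives up */
            return to == CHECKSUMS_PROTECTED || to == CHECKSUMS_OFF;
        case CHECKSUMS_PROTECTED:
            return to == CHECKSUMS_OFF;
    }
    return false;
}
```

The one-way OFF → ENABLING → PROTECTED path is what makes the "fully protected" status meaningful: you cannot jump straight to PROTECTED without the intermediate sweep.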
Re: [HACKERS] Enabling Checksums
Hello

>> > Does it make sense to store this information in pg_control? That doesn't
>> > require adding any new file, and it has the benefit that it's already
>> > checksummed. It's available during recovery and can be made available
>> > pretty easily in the places where we write data.
>> >
>> > And the next question is what commands to add to change state. Ideas:
>> >
>> > CHECKSUMS ENABLE; -- set state to "Enabling"
>> > CHECKSUMS DISABLE; -- set state to "Off"
>>
>> Don't like this, please make it a GUC.
>
> I'll see if you have ideas about how to resolve the problems with a GUC
> that I mentioned above. But if not, then what about using a utility,
> perhaps called pg_checksums? That way we wouldn't need new syntax.

I don't think GUCs are good for this purpose, but I don't like single-purpose statements either.

What do you think about enhancing the ALTER DATABASE statement, something like

ALTER DATABASE name ENABLE CHECKSUMS and ALTER DATABASE name DISABLE CHECKSUMS

Regards Pavel
Re: [HACKERS] Unresolved error 0xC0000409 on Windows Server
On Sun, Nov 11, 2012 at 2:43 PM, Noah Misch wrote:
> On Sun, Nov 11, 2012 at 10:10:31AM -0500, Matthew Gerber wrote:
> > > > Matthew Gerber writes:
> > > > >> Here is the command that was executing when the 0xC0000409 exception was raised:
> > > > >> INSERT INTO places (bounding_box,country,full_name,id,name,type,url) VALUES
> > > > >> (st_transform_null(ST_GeometryFromText('POLYGON((-97.034085
> > > > >> 32.771786,-97.034085 32.953966,-96.888789 32.953966,-96.888789
> > > > >> 32.771786,-97.034085 32.771786))',4326),26918),'United States','Irving,
> > > > >> TX','dce44ec49eb788f5','Irving','city','http://api.twitter.com/1/geo/id/dce44ec49eb788f5.json'),
>
> > A slight update on this: previously, my insert code involved a long
> > "SELECT ... UNION ALL ... SELECT ... UNION ALL ..." command. If this
> > command was too long, I would get a stack depth exception thrown back to my
> > client application. I reduced the length of the command to eliminate the
> > client-side exceptions, but on some occasions I would still get the
> > 0xC0000409 crash on the server side. I have eliminated the long "SELECT
> > ... UNION ALL ..." command, and I now get no errors on either side. So it
> > seems like long commands like this were actually causing the server-side
> > crashes. The proper behavior would seem to be throwing the exception back
> > to the client application instead of crashing the server.
>
> Above, you quoted an INSERT ... VALUES of two rows. Have you observed an
> exception-0xC0000409 crash with an INSERT ... VALUES query, or only with an
> "INSERT ... SELECT ... thousands of UNION" query?

Every time the server crashed with 0xC0000409, the log reported that it was running the simple INSERT command (two rows) that I started this thread with. However, this didn't make any sense to me given the simplicity of the INSERT command and the fact that the error indicated a stack overflow. So I removed the long "SELECT ... UNION ALL ..." command, since it seemed more relevant to the error, and the process has been running continuously for a few days now. To answer your question directly: I was seeing the server crash when using both the simple INSERT and the long "SELECT ... UNION ..." (these commands are issued independently at different points in the program). Now my program is only using the simple INSERT, and the crashes are gone. Hope this helps... Matt
Re: [HACKERS] Enabling Checksums
On Fri, 2012-11-09 at 09:57 -0800, Josh Berkus wrote: > Huh? Why would a GUC not make sense? How else would you make sure that > checksums were on when you started the system? If we stored the information in pg_control, you could check with pg_controldata. We could have a separate utility, pg_checksums, that can alter the state and/or do an offline verification. And initdb would take an option that would start everything out fully protected with checksums. The problem with a GUC is that checksums aren't really something you can change by just changing the variable and restarting, unless you are only using checksums opportunistically (only write checksums when a page is dirtied and only verify a checksum if the header indicates that it's present). There are also usability issues. If someone has a fully-protected instance, and turns the GUC off, and starts the server, they'll lose the "fully-protected" status on the first write, and have to re-read all the data to get back to fully protected. That just doesn't seem right to me. > Well, large databases would tend to be stuck permanently in "Enabling", > because the user would never vacuum old cold partitions in order to > checksum them. So we need to be prepared for this to be the end state > for a lot of databases. That may be true, but if that's the case, it's more like a 3-bit checksum than a 16-bit checksum, because of the page-header corruption problem. I don't know of any way to give those users more than that, which won't be good enough for the set-at-initdb time users. > In fact, we'd need three settings for the checksum GUC: > > OFF -- don't checksum anything, equal to state (1) above > > WRITES -- checksum pages which are being written anyway, but ignore > tables which aren't touched. Permanent "Enabling" state. > > ALL -- checksum everything you can. particularly, autovacuum would > checksum any table which was not already checksummed at the next vacuum > of that table. Goal is to get to state 3 above. 
That's slightly more eager, but it's basically the same as the WRITES state. In order to get to the fully-protected state, you still need to somehow make sure that all of the old data is checksummed. And the "fully protected" state is important in my opinion, because otherwise we aren't protected against corrupt page headers that say they have no checksum (even when it really should have a checksum). > > Does it make sense to store this information in pg_control? That doesn't > > require adding any new file, and it has the benefit that it's already > > checksummed. It's available during recovery and can be made available > > pretty easily in the places where we write data. > > > > And the next question is what commands to add to change state. Ideas: > > > >CHECKSUMS ENABLE; -- set state to "Enabling" > >CHECKSUMS DISABLE; -- set state to "Off" > > Don't like this, please make it a GUC. I'll see if you have ideas about how to resolve the problems with a GUC that I mentioned above. But if not, then what about using a utility, perhaps called pg_checksums? That way we wouldn't need new syntax. > As there's no such thing as system-wide vacuum, we're going to have to > track whether a table is "fully checksummed" in the system catalogs. It seems like this is going down the road of per-table checksums. I'm not opposed to that, but that has a low chance of making 9.3. Let's try to do something simpler now that leaves open the possibility of more flexibility later. I'm inclined to agree with Robert that the first patch should probably be an initdb-time option. Then, we can allow a lazy mode (like your WRITES state) and an eager offline check with a pg_checksums utility. Then we can work towards per-table checksums, control via VACUUM, protecting the SLRU, treating zero pages as invalid, protecting temp files (which can be a GUC), replication integration, etc. > Hmmm, better to have a 2nd GUC: > > checksum_fail_action = WARNING | ERROR > > ... 
since some people want the write or read to fail, and others just > want to see it in the logs. Checksums don't introduce new failure modes on writes, only on reads. And for reads, I think we have a problem doing anything less than an ERROR. If we allow the read to succeed, we either risk a crash (or silently corrupting other buffers in shared memory), or we have to put a zero page in its place. But we already have the zero_damaged_pages option, which I think is better because reading corrupt data is only useful for data recovery efforts. > So, thinking about it, state (3) is never the state of an entire > installation; it's always the state of individual tables. That contradicts the idea of using a GUC then. It would make more sense to have extra syntax or extra VACUUM modes to accomplish that per-table. Unfortunately, I'm worried that the per-table approach will not be completed by 9.3. Do you see something about my proposal that makes it harder to get where we want to go in the future?
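The corrupt-header hazard described above can be illustrated with a toy model (a hypothetical sketch, not the real PostgreSQL page layout): when verification is purely opportunistic, a corruption that happens to clear the "has checksum" flag disables checking entirely, which is why the cluster-wide "fully protected" state matters.

```python
import zlib

FLAG_HAS_CHECKSUM = 0x01

def make_page(payload: bytes, with_checksum: bool) -> bytearray:
    """Toy page: [flags][2-byte checksum][payload] -- illustrative layout only."""
    page = bytearray([FLAG_HAS_CHECKSUM if with_checksum else 0])
    if with_checksum:
        page += (zlib.crc32(payload) & 0xFFFF).to_bytes(2, "big")
    else:
        page += b"\x00\x00"
    page += payload
    return page

def verify_opportunistic(page: bytearray) -> bool:
    """Verify only when the header claims a checksum is present."""
    if not page[0] & FLAG_HAS_CHECKSUM:
        return True          # nothing claimed, nothing checked
    stored = int.from_bytes(page[1:3], "big")
    return stored == (zlib.crc32(bytes(page[3:])) & 0xFFFF)

page = make_page(b"hello", with_checksum=True)
assert verify_opportunistic(page)

# Corruption of the payload is caught...
page[4] ^= 0xFF
assert not verify_opportunistic(page)

# ...but corruption that clears the header flag slips through:
page[0] = 0
assert verify_opportunistic(page)   # silently treated as "valid"
```

With a cluster-wide "fully protected" state, the reader knows every page must carry a checksum, so a cleared flag is itself detectable corruption.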
Re: [HACKERS] Unresolved error 0xC0000409 on Windows Server
On Sun, Nov 11, 2012 at 10:10:31AM -0500, Matthew Gerber wrote: > > > Matthew Gerber writes: > > > >> Here is the command that was executing when the 0xC0000409 exception > > was > > > >> raised: > > > >> INSERT INTO places (bounding_box,country,full_name,id,name,type,url) > > > >> VALUES > > > >> (st_transform_null(ST_GeometryFromText('POLYGON((-97.034085 > > > >> 32.771786,-97.034085 32.953966,-96.888789 32.953966,-96.888789 > > > >> 32.771786,-97.034085 32.771786))',4326),26918),'United States','Irving, > > > >> TX','dce44ec49eb788f5','Irving','city',' > > > >> http://api.twitter.com/1/geo/id/dce44ec49eb788f5.json'), > A slight update on this: previously, my insert code involved a long > "SELECT ... UNION ALL ... SELECT ... UNION ALL ..." command. If this > command was too long, I would get a stack depth exception thrown back to my > client application. I reduced the length of the command to eliminate the > client-side exceptions, but on some occasions I would still get the > 0xC0000409 crash on the server side. I have eliminated the long "SELECT ... > UNION ALL ... " command, and I now get no errors on either side. So it > seems like long commands like this were actually causing the server-side > crashes. The proper behavior would seem to be throwing the exception back > to the client application instead of crashing the server. Above, you quoted an INSERT ... VALUES of two rows. Have you observed an exception-0xC0000409 crash with an INSERT ... VALUES query, or only with an "INSERT ... SELECT ... thousands of UNION" query? -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] PATCH: optimized DROP of multiple tables within a transaction
Hi, I've prepared a slightly updated patch, based on the previous review. See it attached. On 18.10.2012 04:28, 花田 茂 wrote: > Hi Tomas, > > On 2012/10/17, at 20:45, Tomas Vondra wrote: >> >> On 17.10.2012 12:34, Shigeru HANADA wrote: >>> Performance test >>> >>> I tested 1000 tables case (each is copy of pgbench_branches with >>> 10 rows) on 1GB shared_buffers server. Please note that I >>> tested on MacBook air, i.e. storage is not HDD but SSD. Here is >>> the test procedure: >>>
>>> 1) loop 1000 times
>>>   1-1) create copy table of pgbench_accounts as accounts$i
>>>   1-2) load 10 rows
>>>   1-3) add primary key
>>>   1-4) select all rows to cache pages in shared buffer
>>> 2) BEGIN
>>> 3) loop 1000 times
>>>   3-1) DROP TABLE accounts$i
>>> 4) COMMIT
>> I don't think the 'load rows' and 'select all rows' is really >> necessary. And AFAIK sequential scans use small circular buffer >> not to pollute shared buffers, so I'd guess the pages are not cached >> in shared buffers anyway. Have you verified that, e.g. by >> pg_buffercache? > Oops, you're right. I omitted 1-3 and 1-4 in actual measurement, but > IMO loading data is necessary to fill the shared buffer up, because > # of buffers which are deleted during COMMIT is major factor of this > patch. And, yes, I verified that all shared buffers are used after > all loading have been finished. I don't think it's an important factor, as the original code does this:

for (i = 0; i < NBuffers; i++)
{
    volatile BufferDesc *bufHdr = &BufferDescriptors[i];
    ...
}

in the DropRelFileNodeAllBuffers. That loops through all shared buffers no matter what, so IMHO the performance in this case depends on the total size of the shared buffers. But maybe I'm missing something important. Our system creates a lot of "working tables" (even 100.000) and we need to perform garbage collection (dropping obsolete tables) regularly. 
This often took ~ 1 hour, because we're using big AWS instances with lots of RAM (which tends to be slower than RAM on bare hw). After applying this patch and dropping tables in groups of 100, the gc runs in less than 4 minutes (i.e. a 15x speed-up). >>> >>> Hm, my environment seems very different from yours. Could you >>> show the setting of shared_buffers in your environment? I'd like >>> to make my test environment as similar as possible to yours. >> >> We're using m2.4xlarge instances (70G of RAM) with 10GB shared >> buffers. > > Thank you, it's more huge than I expected. I'm not sure whether my > boss allows me to use such rich environment... :( I've done a simple benchmark on my laptop with 2GB shared buffers, it's attached in the drop-test.py (it's a bit messy, but it works). It does four things: 1) creates N tables 2) drops them one by one 3) creates N tables 4) drops them in batches (batch = one transaction) To run the script, simply specify number of tables you want to work with (say 1), size of the batch (e.g. 100) and connection string (e.g. 'host=localhost dbname=mydb'). With those parameters, I got these numbers on the laptop:

creating 1 tables
all tables created in 3.298694 seconds
dropping 1 tables one by one ...
all tables dropped in 32.692478 seconds
creating 1 tables
all tables created in 3.458178 seconds
dropping 1 tables in batches by 100...
all tables dropped in 3.28268 seconds

So it's 33 seconds vs. 3.3 seconds, i.e. 10x speedup. On AWS we usually get larger differences, as we use larger shared buffers and the memory is significantly slower there. Regarding the other comments: > * As Robert commented, this patch adds DropRelFileNodeAllBuffersList > by copying code from DropRelFileNodeAllBuffers. Please refactor it > to avoid code duplication. Yes, I've merged the two functions, i.e. I've removed the original DropRelFileNodeAllBuffers and used the name for the new one (taking array instead of single relnode). 
Then I've modified the existing calls to use DropRelFileNodeAllBuffers(&relnode, 1) instead of the original DropRelFileNodeAllBuffers(relnode). Maybe this should be done for smgrdounlink/smgrdounlinkall too. > * In smgrDoPendingDeletes, you free srels explicitly. Can't we leave > them to memory context stuff? Even it is required, at least pfree > must be called in the case nrels == 0 too. Hmmm, right. Not sure which choice is better, so for now I've moved the pfree out of the 'if' block, so it's executed for 'nrels==0' too. > * In smgrDoPendingDeletes, the buffer srels is expanded in every > iteration. This seems a bit inefficient. How about doubling the > capacity when used it up? This requires additional variable, but > reduces repalloc call considerably. Done, although I haven't seen any significant speed improvement. > * Just an idea, but if we can remove entries for local relations from > rnodes array before buffer loop in DropRelFileNodeAllBuffersList, > following
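The speed-up reported above has a simple model: the unpatched code makes one full pass over shared buffers per dropped relation, while the batched variant makes one pass per batch. A toy simulation (illustrative only; it counts buffer-header inspections rather than measuring real time, and the constants are assumptions):

```python
NBUFFERS = 262_144          # e.g. 2GB of shared_buffers at 8kB pages

def drops_one_by_one(ntables: int) -> int:
    """One full scan of the buffer pool per dropped relation."""
    return ntables * NBUFFERS

def drops_batched(ntables: int, batch: int) -> int:
    """One full scan per batch; each inspection now checks membership
    in the batch's set of relnodes instead of comparing one relnode."""
    full_batches, rest = divmod(ntables, batch)
    return (full_batches + (1 if rest else 0)) * NBUFFERS

single = drops_one_by_one(1_000)
batched = drops_batched(1_000, 100)
print(f"inspections: {single:,} vs {batched:,} ({single // batched}x fewer)")
```

With 1000 tables and a batch of 100 this predicts a 100x reduction in buffer-header inspections; the observed 10-15x wall-clock speed-up is smaller because the per-inspection membership test gets a bit more expensive and other costs (catalog work, unlink) are unaffected.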
Re: [HACKERS] Unresolved error 0xC0000409 on Windows Server
On Sun, Nov 11, 2012 at 12:22:24PM -0500, Tom Lane wrote: > perl -e 'print "SELECT 1 a, 2 b, 3 c\n"; print "UNION ALL SELECT 1 a, 2 b, 3 > c\n" foreach (1..8200);' | psql > > On the machine I tried this on, it works up to about 8200 and then fails > in the way I'd expect: > > ERROR: stack depth limit exceeded > HINT: Increase the configuration parameter "max_stack_depth" (currently > 2048kB), after ensuring the platform's stack depth limit is adequate. > > But then when I cranked it up to 8, kaboom: > > connection to server was lost I tried this test case on Windows Server 2008 (x64). It hit max_stack_depth at 9000 UNIONs and crashed at 1. When I run it under a debugger, the debugger reports exception 0xC00000FD (STATUS_STACK_OVERFLOW). Run normally, the server log reports exception 0xC0000005 (STATUS_ACCESS_VIOLATION). > Inspection of the core dump shows transformSetOperationTree is the > problem --- it's recursing but lacks a check_stack_depth test. > So that's easy to fix, but I wonder why the critical depth limit seems > to be so much less on your machine. I get the expected error up to > about 65000 UNION ALLs --- why is yours crashing at a tenth of that? So, I can reproduce the lower threshold, but the exception type does not agree with the one Matthew observed.
Re: [HACKERS] Enabling Checksums
On Sat, 2012-11-10 at 14:46 +0100, Florian Pflug wrote: > > The bit indicating that a checksum is present may be lost due to > > corruption. > > Though that concern mostly goes away if instead of a separate bit we use a > special checksum value, say 0xDEAD, to indicate that the page isn't > checksummed, no? Right. But then we have an upgrade impact to set the checksum to 0xDEAD on all existing pages, which seems to eliminate most of the possible reason for it. Also, we'd need to tweak the algorithm to make sure that it never landed on that magic value. So if we think we might want this in the future, we should reserve that magic value now. But I can't think of many reasons for it, unless we expect people to be turning checksums on and off repeatedly. Regards, Jeff Davis
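The algorithm tweak Jeff mentions, making sure a real checksum never lands on the reserved "not checksummed" value, can be sketched in a couple of lines (the 0xDEAD value is from the thread; the remapping rule is purely illustrative):

```python
NO_CHECKSUM = 0xDEAD   # reserved sentinel meaning "page carries no checksum"

def finalize_checksum(raw: int) -> int:
    """Fold a raw 16-bit checksum away from the reserved sentinel.

    Illustrative only: any fixed rule works, as long as writer and
    reader agree and the sentinel can never be produced for real data.
    """
    raw &= 0xFFFF
    return (raw + 1) & 0xFFFF if raw == NO_CHECKSUM else raw

assert finalize_checksum(0xDEAD) == 0xDEAE
assert finalize_checksum(0x1234) == 0x1234
# the sentinel is unreachable for every possible raw value:
assert all(finalize_checksum(v) != NO_CHECKSUM for v in range(0x10000))
```

The cost is that two raw values (0xDEAD and 0xDEAE here) collide onto one stored checksum, a negligible loss of detection strength compared to reserving a header bit that corruption could clear.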
Re: [HACKERS] Unresolved error 0xC0000409 on Windows Server
Matthew Gerber writes: > Interesting. I really have no idea why mine seemed to fail so much sooner. > I recalled my 5k-7k figure from memory, so I might be off on that, but > probably not by an order of magnitude. In any case, it sounds like you know > how to fix the problem. Should I file this as a bug report or will you take > care of it from here? I will fix the crash I'm seeing; I'm just not 100% convinced it's the same crash you're seeing. It'd be useful if some other folk would poke at similar examples on Windows to see if there's something unexpected happening with stack depth checks there. regards, tom lane
Re: [HACKERS] Unresolved error 0xC0000409 on Windows Server
On Sun, Nov 11, 2012 at 12:22 PM, Tom Lane wrote: > Matthew Gerber writes: > > On Sun, Nov 11, 2012 at 11:19 AM, Tom Lane wrote: > >> How long is "long"? > > > I was seeing queries with around 5000-7000 "UNION ALL" statements. > > Hm. I experimented with test queries created like so: > > perl -e 'print "SELECT 1 a, 2 b, 3 c\n"; print "UNION ALL SELECT 1 a, 2 b, > 3 c\n" foreach (1..8200);' | psql > > On the machine I tried this on, it works up to about 8200 and then fails > in the way I'd expect: > > ERROR: stack depth limit exceeded > HINT: Increase the configuration parameter "max_stack_depth" (currently > 2048kB), after ensuring the platform's stack depth limit is adequate. > > But then when I cranked it up to 8, kaboom: > > connection to server was lost > > Inspection of the core dump shows transformSetOperationTree is the > problem --- it's recursing but lacks a check_stack_depth test. > So that's easy to fix, but I wonder why the critical depth limit seems > to be so much less on your machine. I get the expected error up to > about 65000 UNION ALLs --- why is yours crashing at a tenth of that? > Tom, Interesting. I really have no idea why mine seemed to fail so much sooner. I recalled my 5k-7k figure from memory, so I might be off on that, but probably not by an order of magnitude. In any case, it sounds like you know how to fix the problem. Should I file this as a bug report or will you take care of it from here? Best, Matt
Re: [HACKERS] too much pgbench init output
On 23.10.2012 18:21, Robert Haas wrote: > On Tue, Oct 23, 2012 at 12:02 PM, Alvaro Herrera > wrote: >> Tomas Vondra wrote: >> >>> I've been thinking about this a bit more, and do propose to use an >>> option that determines "logging step" i.e. number of items (either >>> directly or as a percentage) between log lines. >>> >>> The attached patch defines a new option "--logging-step" that accepts >>> either integers or percents. For example if you want to print a line >>> each 1000 lines, you can do this >>> >>> $ pgbench -i -s 1000 --logging-step 1000 testdb >> >> I find it hard to get excited about having to specify a command line >> argument to tweak this. Would it work to have it emit messages >> depending on elapsed time and log scale of tuples emitted? So for >> example emit the first message after 5 seconds or 100k tuples, then back >> off until (say) 15 seconds have lapsed and 1M tuples, etc? The idea is >> to make it verbose enough to keep a human satisfied with what he sees, >> but not flood the terminal with pointless updates. (I think printing >> the ETA might be nice as well, not sure). > > I like this idea. One of the times when the more verbose output is > really useful is when you expect it to run fast but then it turns out > that for some reason it runs really slow. If you make the output too > terse, then you end up not really knowing what's going on. Having it > give an update at least every 5 seconds would be a nice way to give > the user a heads-up if things aren't going as planned, without > cluttering the normal case. I've prepared a patch along these lines. The attached version uses only elapsed time, so it now prints a message every 5 seconds no matter what, along with an estimate of the remaining time. I've removed the config option, although it might still be useful for specifying the interval?
I'm not entirely sure how the 'log scale of tuples' should work - for example when the 15 seconds limit is reached, should it be reset back to the previous step (5 seconds) to give more detailed info, or should it be kept at 15 seconds? Tomas

diff --git a/contrib/pgbench/pgbench.c b/contrib/pgbench/pgbench.c
index 090c210..3cbb564 100644
--- a/contrib/pgbench/pgbench.c
+++ b/contrib/pgbench/pgbench.c
@@ -135,6 +135,11 @@ int unlogged_tables = 0;
 double sample_rate = 0.0;
 
 /*
+ * logging steps (seconds between log messages)
+ */
+int log_step_seconds = 5;
+
+/*
  * tablespace selection
  */
 char *tablespace = NULL;
@@ -1362,6 +1367,11 @@ init(bool is_no_vacuum)
 	char		sql[256];
 	int			i;
 
+	/* used to track elapsed time and estimate of the remaining time */
+	instr_time	start, diff;
+	double		elapsed_sec, remaining_sec;
+	int			log_interval = 1;
+
 	if ((con = doConnect()) == NULL)
 		exit(1);
@@ -1430,6 +1440,8 @@ init(bool is_no_vacuum)
 	}
 	PQclear(res);
 
+	INSTR_TIME_SET_CURRENT(start);
+
 	for (i = 0; i < naccounts * scale; i++)
 	{
 		int			j = i + 1;
@@ -1441,10 +1453,28 @@ init(bool is_no_vacuum)
 			exit(1);
 		}
 
-		if (j % 100000 == 0)
-			fprintf(stderr, "%d of %d tuples (%d%%) done.\n",
-					j, naccounts * scale,
-					(int) (((int64) j * 100) / (naccounts * scale)));
+		/* let's not call the timing for each row, but only each 100 rows */
+		if (j % 100 == 0 || j == scale * naccounts)
+		{
+			INSTR_TIME_SET_CURRENT(diff);
+			INSTR_TIME_SUBTRACT(diff, start);
+
+			elapsed_sec = INSTR_TIME_GET_DOUBLE(diff);
+			remaining_sec = (scale * naccounts - j) * elapsed_sec / j;
+
+			/* have we reached the next interval? */
+			if (elapsed_sec >= log_interval * log_step_seconds)
+			{
+				fprintf(stderr, "%d of %d tuples (%d%%) done (elapsed %.2f s, remaining %.2f s).\n",
+						j, naccounts * scale,
+						(int) (((int64) j * 100) / (naccounts * scale)),
+						elapsed_sec, remaining_sec);
+
+				/* skip to the next interval */
+				log_interval = (int) ceil(elapsed_sec / log_step_seconds);
+			}
+		}
+	}
 
 	if (PQputline(con, "\\.\n"))
 	{
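The interval logic of the patch can be mirrored in a few lines with a fake clock (a sketch of the same idea, not the patch itself; it uses floor+1 rather than ceil() for the interval update so that an elapsed time landing exactly on a step boundary doesn't produce a duplicate line):

```python
LOG_STEP_SECONDS = 5

def progress_lines(total_rows: int, rows_per_second: float, check_every: int = 100):
    """Yield (row, elapsed, remaining) roughly once per LOG_STEP_SECONDS,
    mimicking the patch: the clock is sampled only every `check_every` rows,
    and log_interval skips ahead if a whole step was overshot."""
    log_interval = 1
    for j in range(1, total_rows + 1):
        if j % check_every == 0 or j == total_rows:
            elapsed = j / rows_per_second          # fake clock
            remaining = (total_rows - j) * elapsed / j
            if elapsed >= log_interval * LOG_STEP_SECONDS:
                yield j, elapsed, remaining
                log_interval = int(elapsed // LOG_STEP_SECONDS) + 1

lines = list(progress_lines(total_rows=100_000, rows_per_second=1_000))
# a 100-second simulated load produces one line per 5-second step
assert len(lines) == 20
assert lines[0] == (5000, 5.0, 95.0)
```

The same structure answers Tomas's open question: making the step adaptive would just mean computing LOG_STEP_SECONDS from elapsed time instead of keeping it constant.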
Re: [HACKERS] Unresolved error 0xC0000409 on Windows Server
Matthew Gerber writes: > On Sun, Nov 11, 2012 at 11:19 AM, Tom Lane wrote: >> How long is "long"? > I was seeing queries with around 5000-7000 "UNION ALL" statements. Hm. I experimented with test queries created like so: perl -e 'print "SELECT 1 a, 2 b, 3 c\n"; print "UNION ALL SELECT 1 a, 2 b, 3 c\n" foreach (1..8200);' | psql On the machine I tried this on, it works up to about 8200 and then fails in the way I'd expect: ERROR: stack depth limit exceeded HINT: Increase the configuration parameter "max_stack_depth" (currently 2048kB), after ensuring the platform's stack depth limit is adequate. But then when I cranked it up to 8, kaboom: connection to server was lost Inspection of the core dump shows transformSetOperationTree is the problem --- it's recursing but lacks a check_stack_depth test. So that's easy to fix, but I wonder why the critical depth limit seems to be so much less on your machine. I get the expected error up to about 65000 UNION ALLs --- why is yours crashing at a tenth of that? regards, tom lane
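The missing guard Tom identifies can be sketched in miniature (a hypothetical simplified model, not the actual PostgreSQL code: the real check_stack_depth() compares the current stack pointer against a limit derived from max_stack_depth, rather than counting frames):

```python
MAX_DEPTH = 400   # stands in for the limit derived from max_stack_depth

def transform_set_op(remaining_unions: int, depth: int = 0) -> str:
    """Recurse once per UNION ALL arm, roughly as the parser does, but
    refuse cleanly at the limit instead of overflowing the real stack."""
    if depth > MAX_DEPTH:
        return "ERROR: stack depth limit exceeded"
    if remaining_unions == 0:
        return "ok"              # reached the leaf SELECT
    return transform_set_op(remaining_unions - 1, depth + 1)

assert transform_set_op(300) == "ok"
# a pathologically long UNION ALL chain is rejected with an ERROR,
# not a server crash:
assert transform_set_op(100_000).startswith("ERROR")
```

The point of the guard is precisely the difference Matthew observed: without it the recursion runs until the OS kills the process (exception 0xC0000409/0xC00000FD on Windows); with it the backend raises a catchable ERROR that is reported to the client.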
Re: [HACKERS] Unresolved error 0xC0000409 on Windows Server
On Sun, Nov 11, 2012 at 11:19 AM, Tom Lane wrote: > Matthew Gerber writes: > > On Sun, Nov 11, 2012 at 12:23 AM, Noah Misch wrote: > >> It signifies scribbling past the end of the frame's local variables: > >> http://msdn.microsoft.com/en-us/library/8dbf701c.aspx > > > A slight update on this: previously, my insert code involved a long > > "SELECT ... UNION ALL ... SELECT ... UNION ALL ..." command. > > How long is "long"? > I was seeing queries with around 5000-7000 "UNION ALL" statements. Matt
Re: [HACKERS] Unresolved error 0xC0000409 on Windows Server
Matthew Gerber writes: > On Sun, Nov 11, 2012 at 12:23 AM, Noah Misch wrote: >> It signifies scribbling past the end of the frame's local variables: >> http://msdn.microsoft.com/en-us/library/8dbf701c.aspx > A slight update on this: previously, my insert code involved a long > "SELECT ... UNION ALL ... SELECT ... UNION ALL ..." command. How long is "long"? regards, tom lane
Re: [HACKERS] Unresolved error 0xC0000409 on Windows Server
On Sun, Nov 11, 2012 at 12:23 AM, Noah Misch wrote: > On Sun, Nov 04, 2012 at 02:30:38PM -0500, Tom Lane wrote: > > Matthew Gerber writes: > > >> Here is the command that was executing when the 0xC0000409 exception was > > >> raised: > > >> INSERT INTO places (bounding_box,country,full_name,id,name,type,url) > > >> VALUES > > >> (st_transform_null(ST_GeometryFromText('POLYGON((-97.034085 > > >> 32.771786,-97.034085 32.953966,-96.888789 32.953966,-96.888789 > > >> 32.771786,-97.034085 32.771786))',4326),26918),'United States','Irving, > > >> TX','dce44ec49eb788f5','Irving','city',' > > >> http://api.twitter.com/1/geo/id/dce44ec49eb788f5.json'), > > > > Assuming that 0xC0000409 does actually represent a stack-overrun > > condition, > It signifies scribbling past the end of the frame's local variables: > http://msdn.microsoft.com/en-us/library/8dbf701c.aspx > A slight update on this: previously, my insert code involved a long "SELECT ... UNION ALL ... SELECT ... UNION ALL ..." command. If this command was too long, I would get a stack depth exception thrown back to my client application. I reduced the length of the command to eliminate the client-side exceptions, but on some occasions I would still get the 0xC0000409 crash on the server side. I have eliminated the long "SELECT ... UNION ALL ... " command, and I now get no errors on either side. So it seems like long commands like this were actually causing the server-side crashes. The proper behavior would seem to be throwing the exception back to the client application instead of crashing the server. Hope this helps... Matt
Re: [HACKERS] Failing SSL connection due to weird interaction with openssl
On 06.11.2012 21:40, Robert Haas wrote: > On Tue, Oct 23, 2012 at 4:09 AM, Lars Kanis wrote: >> While investigating a ruby-pg issue [1], we noticed that a libpq SSL >> connection can fail, if the running application uses OpenSSL for other work, >> too. Root cause is the thread local error queue of OpenSSL, that is used to >> transmit textual error messages to the application after a failed crypto >> operation. In case that the application leaves errors on the queue, the >> communication to the PostgreSQL server can fail with a message left from the >> previous failed OpenSSL operation, in particular when using non-blocking >> operations on the socket. This issue with OpenSSL is quite old now - see >> [3]. >> >> For [1] it turned out that the issue is subdivided into these three parts: >> 1. the ruby-openssl binding does not clear the thread local error queue of >> OpenSSL after a certificate verify >> 2. OpenSSL makes use of a shared error queue for different crypto contexts. >> 3. libpq does not ensure a cleared error queue when doing SSL_* calls >> >> To 1: Remaining messages on the error queue can generally lead to failing >> operations, later on. I'd talk to the ruby-openssl developers, to discuss >> how we can avoid any remaining messages on the queue. >> >> To 2: SSL_get_error() inspects the shared error queue under some conditions. >> It's maybe poor API design, but it's documented behaviour [2]. So we >> certainly have to get along with it. >> >> To 3: To make libpq independent of a previous error state, the error queue >> might be cleared with a call to ERR_clear_error() prior to >> SSL_connect/read/write as in the attached trivial patch. This would make >> libpq robust against other uses of OpenSSL within the application. >> >> What do you think about clearing the OpenSSL error queue in libpq in that >> way? > Shouldn't it be the job of whatever code is consuming the error to > clear the error queue afterwards? > Yes, of course. 
I already filed a bug for ruby-openssl some weeks ago [1]. But IMHO libpq should be changed too, for the following reasons:

1. The behavior of libpq isn't consistent: blocking calls are already agnostic to remaining errors in the OpenSSL queue, but non-blocking calls are not. This is an OpenSSL quirk that is exposed through the libpq API.

2. libpq throws wrong errors. The error libpq should report is "remaining errors in the OpenSSL error queue; libpq requires a clear error queue in order to work correctly", but instead it throws arbitrary foreign errors that may or may not relate to libpq's own communication. The documentation for SSL_get_error(3) is pretty unambiguous about the need to clear the error queue first.

3. The sensitivity of libpq to the error queue can lead to bugs that are hard to track down, like this one [2]: a libpq error leads the developer to look for a bug related to the database connection, although the issue is in a very different part of the code.

Regards, Lars

[1] http://bugs.ruby-lang.org/issues/7215 [2] https://bitbucket.org/ged/ruby-pg/issue/142/async_exec-over-ssl-connection-can-fail-on
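The failure mode is easy to model: OpenSSL's error queue is per-thread and shared by every SSL context in that thread, so a stale entry left by unrelated code gets misattributed to the next non-blocking operation that consults the queue. A toy Python model of the interaction (the real fix is a one-line ERR_clear_error() in C before SSL_connect/read/write):

```python
# Thread-local error queue shared by all "crypto" users in the thread,
# like OpenSSL's: the consumer pops the oldest entry.
error_queue: list[str] = []

def cert_verify_leaves_garbage():
    """Unrelated library code fails and forgets to drain the queue."""
    error_queue.append("certificate verify failed (from some other context)")

def ssl_read(data_ready: bool) -> str:
    """Mimics SSL_get_error() semantics: on a non-fatal wait condition,
    a non-empty queue turns 'try again later' into a bogus hard error."""
    if data_ready:
        return "ok"
    if error_queue:                       # stale entry misattributed
        return "error: " + error_queue.pop(0)
    return "want_read"                    # caller should simply retry

def libpq_ssl_read_fixed(data_ready: bool) -> str:
    error_queue.clear()                   # the proposed ERR_clear_error()
    return ssl_read(data_ready)

cert_verify_leaves_garbage()
assert ssl_read(False).startswith("error:")        # spurious failure
cert_verify_leaves_garbage()
assert libpq_ssl_read_fixed(False) == "want_read"  # robust with the patch
```

This also shows why blocking calls appear unaffected: they rarely hit the "want_read/want_write" path where SSL_get_error() falls back to inspecting the queue.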
Re: [HACKERS] [PATCH] Patch to compute Max LSN of Data Pages
On Sun, Nov 11, 2012 at 02:18:11AM -0300, Alvaro Herrera wrote: > Amit kapila wrote: > > On Saturday, November 10, 2012 10:19 PM Noah Misch wrote: > > > This patch is now marked Returned with Feedback in the CF, but I see no > > > on-list feedback. Did some review happen? > > > > No review happened for this patch. > > It has returned due to slight confusion thinking that this is same as: > > Patch for option in pg_resetxlog for restore from WAL files > > (https://commitfest.postgresql.org/action/patch_view?id=897 ) for which > > Heikki has given some comments. > > Oops, sorry, my mistake. Please reopen it as needing review. Done. > I will > move all the open patches to CF3, unless someone beats me to it. I > probably won't be able to get anything done tomorrow, so if somebody has > a boring Sunday I would appreciate the help. Likewise.