Re: [HACKERS] Assert's vs elog ERROR vs elog FATAL
Daniel Wood writes:
> Is the main difference between:
>     if (s->blockState != TBLOCK_SUBINPROGRESS)
>         elog(*FATAL*, ...
> vs
>     Assert(s->blockState == TBLOCK_SUBINPROGRESS);
> the fact that in both cases:
> a) the situation is unexpected, as in no user code can create this;
> b) however, if you want the check to always be done in production
> because of paranoia, or because a failure after this would be harder to
> figure out, or because you want to log more info, like the exact value
> of blockState, then you need to use the elog(FATAL, ...) way of doing it?

It's a bit of a judgment call really.  The test-and-elog approach is the thing to use if you want the check to be there in production builds; if you think it's sufficient to check it in debug builds, then an Assert is appropriate.  And, as you say, the elog does offer an opportunity to log some additional info beyond "this should not have happened".

Another point is that an Assert forces a database-wide restart, whereas elog(FATAL) only forces the current session to quit --- in this respect, elog(PANIC) is a closer approximation to Assert.

I wouldn't necessarily claim that there's been a great deal of uniformity in past decisions about which way to code can't-happen checks ;-)

> Given the example:
>     elog(ERROR, "StartTransactionCommand: unexpected state %s", ...
> vs
>     elog(FATAL, "CommitTransactionCommand: unexpected state %s", ...
> why is one considered fatal but in the other case handle-able?

I didn't look at the code, but it may be that the Commit case is past the point where it's safe to abort the transaction (although if so, it really oughta be a PANIC).  Or it might just be randomness; although I think we did once go through and look at all the FATAL calls to see if that was an appropriate labeling.

regards, tom lane

-- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
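[Editor's note] The trade-off Tom describes can be sketched outside the backend like so. This is a simplified stand-in only: plain fprintf/assert instead of the real elog and PostgreSQL's own Assert macro, and an illustrative state enum rather than the real xact.c one.

```c
#include <assert.h>
#include <stdio.h>

/* Illustrative stand-ins only: not the real xact.c definitions. */
typedef enum
{
	TBLOCK_DEFAULT,
	TBLOCK_INPROGRESS,
	TBLOCK_SUBINPROGRESS
} TBlockState;

/*
 * Idiom 1: explicit test-and-report.  The check survives in production
 * builds and can report extra detail, such as the offending value.
 */
static int
check_state_elog_style(TBlockState blockState)
{
	if (blockState != TBLOCK_SUBINPROGRESS)
	{
		/* in the backend this would be elog(FATAL, ...) */
		fprintf(stderr, "unexpected block state %d\n", (int) blockState);
		return -1;
	}
	return 0;
}

/*
 * Idiom 2: Assert.  Compiled out entirely when NDEBUG is defined, so it
 * costs nothing in production builds but also catches nothing there.
 */
static int
check_state_assert_style(TBlockState blockState)
{
	assert(blockState == TBLOCK_SUBINPROGRESS);
	return 0;
}
```

(PostgreSQL's real Assert is its own macro controlled by --enable-cassert rather than `<assert.h>`/NDEBUG, but the compile-time on/off behavior is analogous.)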
[HACKERS] Assert's vs elog ERROR vs elog FATAL
These two questions are about the correct coding practice in PostgreSQL vs the specifics of xact.c.

Is the main difference between:

    if (s->blockState != TBLOCK_SUBINPROGRESS)
        elog(*FATAL*, ...
vs
    Assert(s->blockState == TBLOCK_SUBINPROGRESS);

the fact that in both cases:
a) the situation is unexpected, as in no user code can create this;
b) however, if you want the check to always be done in production because of paranoia, or because a failure after this would be harder to figure out, or because you want to log more info, like the exact value of blockState, then you need to use the elog(FATAL, ...) way of doing it?

Given the example:

    elog(ERROR, "StartTransactionCommand: unexpected state %s", ...
vs
    elog(FATAL, "CommitTransactionCommand: unexpected state %s", ...

why is one considered fatal but in the other case handle-able?  I presume the answer is something like: It is subjective and there is no real rule.  OR: While no user code would ever likely try to handle an elog(ERROR, ...) case, in theory the internal process state is still intact such that we could safely continue even if there is code that uses the ERROR situation incorrectly (should be FATAL).
[HACKERS] exactly what is COPY BOTH mode supposed to do in case of an error?
It seems the backend and libpq don't agree.

The backend makes no special provision to wait for a CopyDone message if an error occurs during copy-both.  It simply sends an ErrorResponse and that's it.

libpq, on the other hand, treats either CopyDone or ErrorResponse as a cue to transition to PGASYNC_COPY_IN (see pqGetCopyData3).  And that's a problem, because in PGASYNC_COPY_IN mode, libpq is unwilling to process the ErrorResponse.  getCopyDataMessage doesn't do anything with it, and if the user calls PQgetResult(), pqParseInput3() won't process it either, because it's only willing to do that when the status is PGASYNC_BUSY.

So the bottom line is that after the server throws an error, the server gets back to thinking that the connection is idle, while the client ends up thinking that we're still in copy-in mode.  The client can try to recover by doing PQputCopyEnd(), which will get libpq back to an idle state, but if the extended-query protocol is in use, the Sync it sends will cause the server to send an extra ReadyForQuery that libpq isn't expecting.  Hilarity ensues.

It's perhaps not coincidental that "48.2.5 COPY operations" is silent about what is supposed to happen when an error occurs in copy-both mode, though it does talk about both copy-in and copy-out.  On copy-in, it says:

    In the event of a backend-detected error during copy-in mode (including
    receipt of a CopyFail message), the backend will issue an ErrorResponse
    message.  If the COPY command was issued via an extended-query message,
    the backend will now discard frontend messages until a Sync message is
    received, then it will issue ReadyForQuery and return to normal
    processing.

And regarding copy-out, it says:

    In the event of a backend-detected error during copy-out mode, the
    backend will issue an ErrorResponse message and revert to normal
    processing.  The frontend should treat receipt of ErrorResponse as
    terminating the copy-out mode.
There's no corresponding statement about error-handling in the copy-both case.  However, since the apparent intent is that an error message from the server trumps anything the client may have had in mind, it seems reasonable to decide that an ErrorResponse is intended to fully terminate copy-both mode (not just switch to copy-in) and initiate a skip-until-sync.  That's what the server actually does, but libpq has other ideas.

One way to see the practical effect of this is to set up a streaming replication slave but modify the backup label to reference some future WAL location not yet written (I just changed the "0" before the slash to a "1").  This will cause the server to throw the following error:

    ereport(ERROR,
            (errmsg("requested starting point %X/%X is ahead of the WAL flush position of this server %X/%X",
                    (uint32) (cmd->startpoint >> 32), (uint32) (cmd->startpoint),
                    (uint32) (FlushPtr >> 32), (uint32) FlushPtr)));

Doing this will cause the PQgetCopyData in libpqrcv_receive to return -1, so you might expect that the slave would get the following error:

    ereport(ERROR,
            (errmsg("could not receive data from WAL stream: %s",
                    PQerrorMessage(streamConn))));

But you don't, because PQgetResult() returns PGRES_COPY_IN.  So the slave thinks that the master made a *normal* termination of streaming replication, due to a timeline change, and it prints this:

    LOG:  replication terminated by primary server
    DETAIL:  End of WAL reached on timeline 1 at 0/0

It then calls PQputCopyEnd to end a copy that the server thinks is no longer in progress and then invokes PQgetResult again.  And now it gets the error message:

    FATAL:  error reading result of streaming command: ERROR:  requested starting point 1/500 is ahead of the WAL flush position of this server 0/6000348

This doesn't in practice matter very much, because either way, we're going to slam shut the server connection at this point.
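[Editor's note] The mismatch described above can be summarized as two tiny state machines.  This is a hypothetical sketch, not libpq code; the state names merely echo the PGASYNC_* states under discussion.

```c
/* Hypothetical sketch of the two state machines described above. */
typedef enum
{
	COPY_BOTH,				/* bidirectional copy in progress */
	COPY_IN,				/* client->server copy only */
	IDLE_SKIP_TO_SYNC		/* error raised; discard until Sync */
} ConnState;

typedef enum
{
	MSG_COPY_DONE,
	MSG_ERROR_RESPONSE
} MsgType;

/*
 * What the backend actually does: an ErrorResponse fully terminates
 * copy-both and begins discarding messages until Sync arrives.
 */
static ConnState
server_after(ConnState s, MsgType m)
{
	if (s == COPY_BOTH && m == MSG_ERROR_RESPONSE)
		return IDLE_SKIP_TO_SYNC;
	if (s == COPY_BOTH && m == MSG_COPY_DONE)
		return COPY_IN;			/* the peer's direction is closed */
	return s;
}

/*
 * What libpq does instead: it treats ErrorResponse like CopyDone and
 * drops into copy-in mode, where the error is never processed.
 */
static ConnState
libpq_after(ConnState s, MsgType m)
{
	if (s == COPY_BOTH &&
		(m == MSG_COPY_DONE || m == MSG_ERROR_RESPONSE))
		return COPY_IN;
	return s;
}
```

The divergence on MSG_ERROR_RESPONSE is exactly the bug: the two sides end up in incompatible states after a server-side error.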
But it's clearly messed up - the error is actually NOT in response to the CopyDone we sent, but rather in response to the START_STREAMING command that preceded it.  libpq, however, refuses to receive the error message until after the unnecessary CopyDone has been sent.

I believe that the only other place where this coding pattern arises is in receivelog.c.  That code has actually adopted the opposite convention from the backend: it thinks it needs to send CopyDone whether or not an error has occurred.  This turns out not to matter particularly, because the server is just going to try to close the connection anyway.  But if it did anything else after ending the copy, I suspect the wheels would come off.

I'm attaching a patch which adopts the position that the backend is right and libpq
Re: [HACKERS] Re: [COMMITTERS] pgsql: Fix collation assignment for aggregates with ORDER BY.
On Fri, Apr 26, 2013 at 05:52:03PM -0700, Jeff Davis wrote:
> On Fri, 2013-04-26 at 16:46 -0700, David Fetter wrote:
> > On Fri, Apr 26, 2013 at 07:49:47PM +, Tom Lane wrote:
> > > Given this risk and the lack of field complaints about the issue, it
> > > doesn't seem prudent to back-patch.
> ...
> > This needs back-patching to 9.1, where the bug was introduced.
>
> It was explained in the commit message why it was not backported.

Oops.  Sorry about that, Tom.

Cheers,
David.
--
David Fetter http://fetter.org/
Phone: +1 415 235 3778  AIM: dfetter666  Yahoo!: dfetter  Skype: davidfetter
XMPP: david.fet...@gmail.com
iCal: webcal://www.tripit.com/feed/ical/people/david74/tripit.ics

Remember to vote!
Consider donating to Postgres: http://www.postgresql.org/about/donate
Re: [HACKERS] Re: [COMMITTERS] pgsql: Fix collation assignment for aggregates with ORDER BY.
On Fri, 2013-04-26 at 16:46 -0700, David Fetter wrote:
> On Fri, Apr 26, 2013 at 07:49:47PM +, Tom Lane wrote:
> > Given this risk and the lack of field complaints about the issue, it
> > doesn't seem prudent to back-patch.
...
> This needs back-patching to 9.1, where the bug was introduced.

It was explained in the commit message why it was not backported.

Regards,
	Jeff Davis
Re: [HACKERS] Substituting Checksum Algorithm (was: Enabling Checksums)
On Sat, 2013-04-27 at 00:20 +0200, Andres Freund wrote:
> CFLAGS_VECTORIZATION? EXTRA sounds too generic to me.

I went with CFLAGS_VECTOR to be a little shorter while still keeping some meaning.

> I think it would be better to have a PGAC_PROG_CC_VAR_OPT or so which
> assigns the flag to some passed variable name. Then we can reuse it for
> different vars and I have the feeling those will come. And having a
> CFLAGS_VECTOR_OPT would just be stupid ;)

Good suggestion; done.  Thank you for the review.  New renamed patch attached for the build options only (the other patch for the FNV checksum algorithm is unchanged).

Regards,
	Jeff Davis

*** a/config/c-compiler.m4
--- b/config/c-compiler.m4
***************
*** 242,247 ****
--- 242,272 ----
  undefine([Ac_cachevar])dnl
+ # PGAC_PROG_CC_VAR_OPT
+ # --------------------
+ # Given a variable name and a string, check if the compiler supports
+ # the string as a command-line option. If it does, add the string to
+ # the given variable.
+ AC_DEFUN([PGAC_PROG_CC_VAR_OPT],
+ [define([Ac_cachevar], [AS_TR_SH([pgac_cv_prog_cc_cflags_$2])])dnl
+ AC_CACHE_CHECK([whether $CC supports $2], [Ac_cachevar],
+ [pgac_save_CFLAGS=$CFLAGS
+ CFLAGS="$pgac_save_CFLAGS $2"
+ ac_save_c_werror_flag=$ac_c_werror_flag
+ ac_c_werror_flag=yes
+ _AC_COMPILE_IFELSE([AC_LANG_PROGRAM()],
+ [Ac_cachevar=yes],
+ [Ac_cachevar=no])
+ ac_c_werror_flag=$ac_save_c_werror_flag
+ CFLAGS="$pgac_save_CFLAGS"])
+ if test x"$Ac_cachevar" = x"yes"; then
+   $1="${$1} $2"
+ fi
+ undefine([Ac_cachevar])dnl
+ ])# PGAC_PROG_CC_CFLAGS_OPT
+
+
+ # PGAC_PROG_CC_LDFLAGS_OPT
  #
  # Given a string, check if the compiler supports the string as a

*** a/configure
--- b/configure
***************
*** 731,736 ****
--- 731,737 ----
  autodepend
  TAS
  GCC
  CPP
+ CFLAGS_VECTOR
  SUN_STUDIO_CC
  OBJEXT
  EXEEXT

***************
*** 3944,3949 ****
--- 3945,3955 ----
  else
  fi
  fi
+ # set CFLAGS_VECTOR from the environment, if available
+ if test "$ac_env_CFLAGS_VECTOR_set" = set; then
+   CFLAGS_VECTOR=$ac_env_CFLAGS_VECTOR_value
+ fi
+
  # Some versions of GCC support some additional useful warning flags.
  # Check whether they are supported, and add them to CFLAGS if so.
  # ICC pretends to be GCC but it's lying; it doesn't support these flags,

***************
*** 4376,4381 ****
--- 4382,4508 ----
  if test x"$pgac_cv_prog_cc_cflags__fexcess_precision_standard" = x"yes"; then
    CFLAGS="$CFLAGS -fexcess-precision=standard"
  fi
+ # Optimization flags for specific files that benefit from vectorization
+ { $as_echo "$as_me:$LINENO: checking whether $CC supports -funroll-loops" >&5
+ $as_echo_n "checking whether $CC supports -funroll-loops... " >&6; }
+ if test "${pgac_cv_prog_cc_cflags__funroll_loops+set}" = set; then
+   $as_echo_n "(cached) " >&6
+ else
+   pgac_save_CFLAGS=$CFLAGS
+   CFLAGS="$pgac_save_CFLAGS -funroll-loops"
+   ac_save_c_werror_flag=$ac_c_werror_flag
+   ac_c_werror_flag=yes
+   cat >conftest.$ac_ext <<_ACEOF
+ /* confdefs.h. */
+ _ACEOF
+ cat confdefs.h >>conftest.$ac_ext
+ cat >>conftest.$ac_ext <<_ACEOF
+ /* end confdefs.h. */
+
+ int
+ main ()
+ {
+
+   ;
+   return 0;
+ }
+ _ACEOF
+ rm -f conftest.$ac_objext
+ if { (ac_try="$ac_compile"
+ case "(($ac_try" in
+   *\"* | *\`* | *\\*) ac_try_echo=\$ac_try;;
+   *) ac_try_echo=$ac_try;;
+ esac
+ eval ac_try_echo="\"\$as_me:$LINENO: $ac_try_echo\""
+ $as_echo "$ac_try_echo") >&5
+ (eval "$ac_compile") 2>conftest.er1
+ ac_status=$?
+ grep -v '^ *+' conftest.er1 >conftest.err
+ rm -f conftest.er1
+ cat conftest.err >&5
+ $as_echo "$as_me:$LINENO: \$? = $ac_status" >&5
+ (exit $ac_status); } && {
+   test -z "$ac_c_werror_flag" ||
+   test ! -s conftest.err
+ } && test -s conftest.$ac_objext; then
+   pgac_cv_prog_cc_cflags__funroll_loops=yes
+ else
+   $as_echo "$as_me: failed program was:" >&5
+   sed 's/^/| /' conftest.$ac_ext >&5
+
+   pgac_cv_prog_cc_cflags__funroll_loops=no
+ fi
+
+ rm -f core conftest.err conftest.$ac_objext conftest.$ac_ext
+ ac_c_werror_flag=$ac_save_c_werror_flag
+ CFLAGS="$pgac_save_CFLAGS"
+ fi
+ { $as_echo "$as_me:$LINENO: result: $pgac_cv_prog_cc_cflags__funroll_loops" >&5
+ $as_echo "$pgac_cv_prog_cc_cflags__funroll_loops" >&6; }
+ if test x"$pgac_cv_prog_cc_cflags__funroll_loops" = x"yes"; then
+   CFLAGS_VECTOR="${CFLAGS_VECTOR} -funroll-loops"
+ fi
+
+ { $as_echo "$as_me:$LINENO: checking whether $CC supports -ftree-vectorize" >&5
+ $as_echo_n "checking whether $CC supports -ftree-vectorize... " >&6; }
+ if test "${pgac_cv_prog_cc_cflags__ftree_vectorize+set}" = set; then
+   $as_echo_n "(cached) " >&6
+ else
+   pgac_save_CFLAGS=$CFLAGS
+   CFLAGS="$pgac_save_CFLAGS -ftree-vectorize"
+   ac_save_c_werror_flag=$ac_c_werror_flag
+   ac_c_werror_flag=yes
+   cat >conftest.$ac_ext <<_ACEOF
+ /* confdefs.h. */
+ _ACEOF
+ cat confdefs.h >>conftest.$ac_ext
+ cat >>conftest.$ac_ext <<_ACEOF
+ /* end confdefs.h. */
+
[HACKERS] Re: [COMMITTERS] pgsql: Fix collation assignment for aggregates with ORDER BY.
On Fri, Apr 26, 2013 at 07:49:47PM +, Tom Lane wrote:
> Fix collation assignment for aggregates with ORDER BY.
>
> ORDER BY expressions were being treated the same as regular aggregate
> arguments for purposes of collation determination, but really they should
> not affect the aggregate's collation at all; only collations of the
> aggregate's regular arguments should affect it.
>
> In many cases this mistake would lead to incorrectly throwing a "collation
> conflict" error; but in some cases the corrected code will silently assign
> a different collation to the aggregate than before, for example
>     agg(foo ORDER BY bar COLLATE "x")
> which will now use foo's collation rather than "x" for the aggregate.
> Given this risk and the lack of field complaints about the issue, it
> doesn't seem prudent to back-patch.
>
> In passing, rearrange code in assign_collations_walker so that we don't
> need multiple copies of the standard logic for computing collation of a
> node with children.  (Previously, CaseExpr duplicated the standard logic,
> and we would have needed a third copy for Aggref without this change.)
>
> Andrew Gierth and David Fetter

This needs back-patching to 9.1, where the bug was introduced.

Cheers,
David.
--
David Fetter http://fetter.org/
Phone: +1 415 235 3778  AIM: dfetter666  Yahoo!: dfetter  Skype: davidfetter
XMPP: david.fet...@gmail.com
iCal: webcal://www.tripit.com/feed/ical/people/david74/tripit.ics

Remember to vote!
Consider donating to Postgres: http://www.postgresql.org/about/donate
Re: [HACKERS] Substituting Checksum Algorithm (was: Enabling Checksums)
On 2013-04-26 12:57:09 -0700, Jeff Davis wrote:
> I updated the patch and split it into two parts (attached).

> The second patch adds the configure-time check for the right
> compilation flags, and uses them when compiling checksum.c. I
> called the new variable CFLAGS_EXTRA, for lack of a better idea,
> so feel free to come up with a new name. It doesn't check for, or
> use, -msse4.1, but that can be specified by the user by
> configuring with CFLAGS_EXTRA="-msse4.1".

CFLAGS_VECTORIZATION?  EXTRA sounds too generic to me.

> --- a/config/c-compiler.m4
> +++ b/config/c-compiler.m4
> @@ -242,6 +242,30 @@ undefine([Ac_cachevar])dnl
>
> +# PGAC_PROG_CC_CFLAGS_EXTRA_OPT
> +# -----------------------------
> +# Given a string, check if the compiler supports the string as a
> +# command-line option. If it does, add the string to CFLAGS_EXTRA.
> +AC_DEFUN([PGAC_PROG_CC_CFLAGS_EXTRA_OPT],
> +[define([Ac_cachevar], [AS_TR_SH([pgac_cv_prog_cc_cflags_extra_$1])])dnl
> +AC_CACHE_CHECK([whether $CC supports $1], [Ac_cachevar],
> +[pgac_save_CFLAGS_EXTRA=$CFLAGS_EXTRA
> +CFLAGS_EXTRA="$pgac_save_CFLAGS_EXTRA $1"
> +ac_save_c_werror_flag=$ac_c_werror_flag
> +ac_c_werror_flag=yes
> +_AC_COMPILE_IFELSE([AC_LANG_PROGRAM()],
> + [Ac_cachevar=yes],
> + [Ac_cachevar=no])
> +ac_c_werror_flag=$ac_save_c_werror_flag
> +CFLAGS_EXTRA="$pgac_save_CFLAGS_EXTRA"])
> +if test x"$Ac_cachevar" = x"yes"; then
> + CFLAGS_EXTRA="$CFLAGS_EXTRA $1"
> +fi
> +undefine([Ac_cachevar])dnl
> +])# PGAC_PROG_CC_CFLAGS_EXTRA_OPT

I think it would be better to have a PGAC_PROG_CC_VAR_OPT or so which assigns the flag to some passed variable name.  Then we can reuse it for different vars and I have the feeling those will come.  And having a CFLAGS_VECTOR_OPT would just be stupid ;)

Greetings,

Andres Freund

--
Andres Freund  http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
[HACKERS] pg_ctl non-idempotent behavior change
After 87306184580c9c49717, if the postmaster dies without cleaning up (e.g. power outage), running "pg_ctl start" just gives this message and then exits:

    pg_ctl: another server might be running

Under the old behavior, it would try to start the server anyway, and succeed, then go through recovery and give you back a functional system.

From reading the archive, I can't really tell if this change in behavior was intentional.  Anyway it seems like a bad thing to me.  Now the user has a system that will not start up, and is given no clue that they need to remove "postmaster.pid" and try again.

The behavior here under the new "-I" flag seems no better in this situation.  It claims the server is running, when it only "might" be running (and in fact is not running).

Cheers,
Jeff
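[Editor's note] For illustration, the kind of staleness test that would let a tool distinguish "might be running" from "is actually running" could look roughly like this.  This is a simplified sketch: the real postmaster.pid contains several lines beyond the PID, and pg_ctl must also guard against PID reuse and pidfiles written by unrelated programs.

```c
#include <errno.h>
#include <signal.h>
#include <stdio.h>
#include <sys/types.h>

/*
 * Return 1 if a process with the PID recorded in the pidfile appears to
 * be alive, 0 if the pidfile is missing, unreadable, or stale.
 */
static int
pid_from_file_is_alive(const char *path)
{
	FILE	   *fp = fopen(path, "r");
	long		pid;

	if (fp == NULL)
		return 0;				/* no pidfile: nothing running */
	if (fscanf(fp, "%ld", &pid) != 1 || pid <= 0)
	{
		fclose(fp);
		return 0;				/* garbage pidfile: treat as stale */
	}
	fclose(fp);

	/* Signal 0 does error checking only: ESRCH means no such process. */
	if (kill((pid_t) pid, 0) == 0 || errno == EPERM)
		return 1;
	return 0;
}
```

A check along these lines is what lets a startup tool say "stale pidfile, starting anyway" instead of the unconditional "another server might be running".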
Re: [HACKERS] Substituting Checksum Algorithm (was: Enabling Checksums)
On Fri, 2013-04-26 at 16:40 -0400, Greg Smith wrote:
> I think I need to do two baselines: master without checksums, and
> master with extra optimizations but still without checksums. It may be
> the case that using better compile time optimizations gives a general
> speedup that's worth considering regardless. The optimizations seem to
> have a very significant impact on the checksum feature, but I'd like to
> quantify how they change the code a little bit before even getting into
> that.

The patch only affects optimization flags used when compiling checksum.c, so it should have no effect on other areas of the code.  If you want to compile the whole source with those flags, then just do:

    CFLAGS="-msse4.1 -funroll-loops -ftree-vectorize" ./configure

Changing the optimization flags for existing code will have a larger impact and should be considered separately from checksums.

Regards,
	Jeff Davis
Re: [HACKERS] Substituting Checksum Algorithm (was: Enabling Checksums)
On 4/26/13 3:57 PM, Jeff Davis wrote:
> The second patch adds the configure-time check for the right
> compilation flags, and uses them when compiling checksum.c. I called
> the new variable CFLAGS_EXTRA, for lack of a better idea, so feel free
> to come up with a new name. It doesn't check for, or use, -msse4.1,
> but that can be specified by the user by configuring with
> CFLAGS_EXTRA="-msse4.1".

Thank you, that is the last piece I was looking at but couldn't nail down on my own.  With that I should be able to duplicate both the slicing by 8 CRC speedup Ants sent over (which also expected some optimization changes) and trying something FNV based this weekend.

I think I need to do two baselines: master without checksums, and master with extra optimizations but still without checksums.  It may be the case that using better compile time optimizations gives a general speedup that's worth considering regardless.  The optimizations seem to have a very significant impact on the checksum feature, but I'd like to quantify how they change the code a little bit before even getting into that.

--
Greg Smith   2ndQuadrant US    g...@2ndquadrant.com   Baltimore, MD
PostgreSQL Training, Services, and 24x7 Support  www.2ndQuadrant.com
Re: [HACKERS] libpq COPY handling
On 27/04/13 02:48, Tom Lane wrote:
> Robert Haas writes:
>> On Thu, Apr 25, 2013 at 1:56 PM, Tom Lane wrote:
>>> However, the documentation in libpq.sgml is a bit bogus too, because
>>> it counsels trying the PQputCopyEnd() call again, which will not work
>>> (since we already changed the asyncStatus).  We could make that say
>>> "a zero result is informational, you might want to try PQflush()
>>> later".
>>
>> The trouble with this, though, is that any existing callers that were
>> coded to the old spec would now be broken.  Changing the meaning of a
>> 0 return code seems like a bad idea.  However, not ever returning 0
>> isn't great either: someone could be forgiven for writing code that
>> calls PQputCopyData/End() until they get a 0 result, then waits for
>> the socket to become write-OK before continuing.
>
> Anybody who tried that would have already discovered that it doesn't
> work, so that argument seems pretty hollow.  What I'm suggesting is
> that we fix the documentation to match what the code actually does, ie
> 1 and -1 are the only return codes, but in nonblock mode it may not
> actually have flushed all the data.  I do not see how that can break
> any code that works now.
>
> An alternative possibility is to document the zero return case as
> "can't happen now, but defined for possible future expansion", which I
> rather suspect was the thought process last time this was looked at.
> The trouble with that is that, if we ever did try to actually return
> zero, we'd more than likely break some code that should have been
> checking for the case and wasn't.
>
> Anyway, in view of the lack of complaints from the field, I think
> changing the behavior of this code would be much more likely to cause
> problems than fix them.
>
> regards, tom lane

The original usage of return codes was as an offset for the Operating System to jump to in a list of addresses to execute, after program completion.  Zero offset was to the first address for normal completion, then 4 for the next address... Addresses were stored in 4 bytes.  Hence an historical tendency to define return codes in multiples of 4.  This dates back to the Mainframe days, pre UNIX, even!

Personally I would prefer zero to indicate an error, on the basis that it is the default value for memory, but historical usage makes that impractical!  So we are stuck with zero being interpreted as the return code indicating 'Normal' completion - bash, and other Linux shells, are designed on this assumption.  I first came across this return code convention when I was programming Mainframes in the early 1970's.

Cheers,
Gavin
Re: [HACKERS] Substituting Checksum Algorithm (was: Enabling Checksums)
On Fri, Apr 26, 2013 at 7:09 AM, Simon Riggs wrote:
> I'm expecting to spend some time on this over the weekend, once I've
> re-read the thread and patches to see if there is something to commit.
>
> That's my last time window, so this looks like the last chance to make
> changes before beta.

I updated the patch and split it into two parts (attached).

The first patch is the checksum algorithm itself.  I have done some documentation updates and moved it into the C file (rather than the README), but further explanation of the "shift right 3" modification will need to be by Ants or Florian.

The second patch adds the configure-time check for the right compilation flags, and uses them when compiling checksum.c.  I called the new variable CFLAGS_EXTRA, for lack of a better idea, so feel free to come up with a new name.  It doesn't check for, or use, -msse4.1, but that can be specified by the user by configuring with CFLAGS_EXTRA="-msse4.1".

I don't know of any more required changes, aside from documentation improvements.

Regards,
	Jeff Davis

fnv-jeff-20130426.patch
Description: Binary data

fnv-jeff-20130426-cflags-extra.patch
Description: Binary data
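[Editor's note] For context, the textbook 32-bit FNV-1a algorithm that the patch's checksum derives from looks like this.  This is the standard algorithm, not the modified vectorizable variant (with the "shift right 3" tweak) that the patch actually implements.

```c
#include <stddef.h>
#include <stdint.h>

#define FNV_OFFSET_BASIS 2166136261u
#define FNV_PRIME        16777619u

/*
 * Textbook 32-bit FNV-1a: xor each byte in, then multiply by the FNV
 * prime.  The patch under discussion uses a modified, parallelizable
 * variant, so treat this only as the starting point it was derived from.
 */
static uint32_t
fnv1a_32(const unsigned char *buf, size_t len)
{
	uint32_t	hash = FNV_OFFSET_BASIS;
	size_t		i;

	for (i = 0; i < len; i++)
	{
		hash ^= buf[i];
		hash *= FNV_PRIME;
	}
	return hash;
}
```

The strictly sequential xor-then-multiply dependency chain above is exactly why the patch modifies the recurrence: the plain form cannot be vectorized, whereas the modified variant processes multiple lanes in parallel (which is also why the vectorization compiler flags matter so much for its performance).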
Re: [HACKERS] Bug Fix: COLLATE with multiple ORDER BYs in aggregates
David Fetter writes:
> While testing the upcoming FILTER clause for aggregates, Erik Rijkers
> uncovered a long-standing bug in $subject, namely that this case
> wasn't handled.  Please find attached a patch by Andrew Gierth and
> myself which fixes this issue and adds a regression test to ensure it
> remains fixed.

Applied with minor editorialization.

regards, tom lane
Re: [HACKERS] pg_dump with postgis extension dumps rules separately
Joe Conway writes:
> Committed back to 9.1

Thanks,
--
Dimitri Fontaine http://2ndQuadrant.fr
PostgreSQL : Expertise, Formation et Support
Re: [HACKERS] pg_dump with postgis extension dumps rules separately
On 04/08/2013 08:34 AM, Dimitri Fontaine wrote:
> Joe Conway writes:
>> OK, maybe I'll try to take a look in the meantime.
>
> That would be awesome :)
>
>> Did you have any comment on the other pg_dump patch (reviewed by
>> Vibhor)?
>
> This whole extension table filtering and dumping is more in Tom's
> realm, so I guess that if you want to have another pair of eyes on
> your patch before commit'ing it yourself, you will need him to have
> a look.
>
> That said, I didn't spot anything obvious that I want to report
> myself, so I would label it "ready for commiter".

Committed back to 9.1

Joe

--
Joe Conway
credativ LLC: http://www.credativ.us
Linux, PostgreSQL, and general Open Source
Training, Service, Consulting, & 24x7 Support
Re: [HACKERS] Recovery target 'immediate'
On 26.04.2013 19:50, Magnus Hagander wrote:
> On Fri, Apr 26, 2013 at 6:43 PM, Simon Riggs wrote:
>> On 26 April 2013 17:25, Heikki Linnakangas wrote:
>>> Actually, from a usability point of view I think it would be nice to
>>> have just one setting, "recovery_target".  It's already somewhat
>>> confusing to have recovery_target_xid, recovery_target_time, and
>>> recovery_target_name, which are mutually exclusive, and
>>> recovery_target_inclusive which is just a modifier for the others.
>>> Maybe something like:
>>>
>>> recovery_target = 'xid 1234'
>>> recovery_target = 'xid 1234 exclusive'
>>> recovery_target = '2013-04-22 12:33'
>>> recovery_target = '2013-04-22 12:33 exclusive'
>>> recovery_target = 'consistent'
>>> recovery_target = 'name: daily backup'
>>
>> So now you want to change the whole existing API so it fits with your
>> one new requirement?

No, I think the above would be a usability improvement whether or not we add the new feature.

> I like that newer API suggestion better than what we have now - though
> it can perhaps be improved even more. But I definitely don't think
> it's worth breaking backwards compatibility for it. There are lots of
> tools and scripts and whatnot out there that use the current API. I
> think we need a bigger improvement than just a cleaner syntax to break
> those.

It would be possible to do it in a backwards-compatible way, keeping the old API as is.  But yeah, might not be worth the effort.

- Heikki
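[Editor's note] To make the proposed syntax concrete, a toy parser for the single-setting form might look like this.  Hypothetical: this grammar was never adopted as written, so the kinds and spellings below are purely illustrative.

```c
#include <string.h>

/* Illustrative only: models the proposed recovery_target grammar. */
typedef enum
{
	TARGET_INVALID,
	TARGET_XID,
	TARGET_TIME,
	TARGET_NAME,
	TARGET_CONSISTENT
} TargetKind;

typedef struct
{
	TargetKind	kind;
	const char *value;			/* points into the original string */
	int			inclusive;		/* default true; "exclusive" clears it */
} RecoveryTarget;

static RecoveryTarget
parse_recovery_target(const char *s)
{
	RecoveryTarget t = {TARGET_INVALID, NULL, 1};
	size_t		len = strlen(s);

	if (strncmp(s, "xid ", 4) == 0)
	{
		t.kind = TARGET_XID;
		t.value = s + 4;
	}
	else if (strncmp(s, "name: ", 6) == 0)
	{
		t.kind = TARGET_NAME;
		t.value = s + 6;
	}
	else if (strcmp(s, "consistent") == 0)
		t.kind = TARGET_CONSISTENT;
	else
	{
		t.kind = TARGET_TIME;	/* anything else: assume a timestamp */
		t.value = s;
	}

	/* a trailing " exclusive" flips the inclusive modifier */
	/* (a real parser would also strip the modifier from value) */
	if (len >= 10 && strcmp(s + len - 10, " exclusive") == 0)
		t.inclusive = 0;

	return t;
}
```

The point of the sketch is that the three mutually exclusive GUCs plus the recovery_target_inclusive modifier collapse into one string with a kind prefix and an optional modifier suffix.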
Re: [HACKERS] Recovery target 'immediate'
On Fri, Apr 26, 2013 at 6:43 PM, Simon Riggs wrote:
> On 26 April 2013 17:25, Heikki Linnakangas wrote:
>> On 26.04.2013 19:05, Simon Riggs wrote:
>>> On 26 April 2013 16:38, Robert Haas wrote:
>>>> On Fri, Apr 26, 2013 at 11:35 AM, Simon Riggs wrote:
>>>>> Given that I was describing how we might implement Heikki's
>>>>> suggestion, I find this comment confusing.
>>>>>
>>>>> Please explain.
>>>>
>>>> Heikki's suggestion is simply to have a mode that stops as soon as
>>>> consistency is reached.  The server already knows (from the backup
>>>> label) what the consistency point is, so there's no need to add a
>>>> restore point or anything else to the WAL stream to implement what
>>>> he's talking about.
>>>
>>> Using restore points just puts into use the facility that is already
>>> best practice to use, put there for just this kind of situation.
>>> I guess you could do recovery_target_name = '$consistent'
>>>
>>> Doing it the other way means you need to add a new kind of recovery
>>> target to the API just for this.
>>> recovery_target_immediate = on
>>
>> Sounds good to me.
>>
>> Actually, from a usability point of view I think it would be nice to
>> have just one setting, "recovery_target".  It's already somewhat
>> confusing to have recovery_target_xid, recovery_target_time, and
>> recovery_target_name, which are mutually exclusive, and
>> recovery_target_inclusive which is just a modifier for the others.
>> Maybe something like:
>>
>> recovery_target = 'xid 1234'
>> recovery_target = 'xid 1234 exclusive'
>> recovery_target = '2013-04-22 12:33'
>> recovery_target = '2013-04-22 12:33 exclusive'
>> recovery_target = 'consistent'
>> recovery_target = 'name: daily backup'
>
> So now you want to change the whole existing API so it fits with your
> one new requirement?

I like that newer API suggestion better than what we have now - though it can perhaps be improved even more.  But I definitely don't think it's worth breaking backwards compatibility for it.  There are lots of tools and scripts and whatnot out there that use the current API.  I think we need a bigger improvement than just a cleaner syntax to break those.

--
Magnus Hagander
Me: http://www.hagander.net/
Work: http://www.redpill-linpro.com/
Re: [HACKERS] Recovery target 'immediate'
On 26 April 2013 17:25, Heikki Linnakangas wrote:
> On 26.04.2013 19:05, Simon Riggs wrote:
>> On 26 April 2013 16:38, Robert Haas wrote:
>>> On Fri, Apr 26, 2013 at 11:35 AM, Simon Riggs wrote:
>>>> Given that I was describing how we might implement Heikki's
>>>> suggestion, I find this comment confusing.
>>>>
>>>> Please explain.
>>>
>>> Heikki's suggestion is simply to have a mode that stops as soon as
>>> consistency is reached. The server already knows (from the backup
>>> label) what the consistency point is, so there's no need to add a
>>> restore point or anything else to the WAL stream to implement what
>>> he's talking about.
>>
>> Using restore points just puts into use the facility that is already
>> best practice to use, put there for just this kind of situation.
>> I guess you could do recovery_target_name = '$consistent'
>>
>> Doing it the other way means you need to add a new kind of recovery
>> target to the API just for this.
>> recovery_target_immediate = on
>
> Sounds good to me.
>
> Actually, from a usability point of view I think it would be nice to have
> just one setting, "recovery_target". It's already somewhat confusing to
> have recovery_target_xid, recovery_target_time, and recovery_target_name,
> which are mutually exclusive, and recovery_target_inclusive which is just
> a modifier for the others. Maybe something like:
>
> recovery_target = 'xid 1234'
> recovery_target = 'xid 1234 exclusive'
> recovery_target = '2013-04-22 12:33'
> recovery_target = '2013-04-22 12:33 exclusive'
> recovery_target = 'consistent'
> recovery_target = 'name: daily backup'

So now you want to change the whole existing API so it fits with your
one new requirement?

Sounds like flamebait to me, but -1, just in case you're serious.
--
 Simon Riggs                   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services
Re: [HACKERS] Recovery target 'immediate'
On Fri, Apr 26, 2013 at 12:25 PM, Heikki Linnakangas wrote:
>> Doing it the other way means you need to add a new kind of recovery
>> target to the API just for this.
>> recovery_target_immediate = on
>
> Sounds good to me.

Yeah, I don't have a problem with that, at all.

> Actually, from a usability point of view I think it would be nice to have
> just one setting, "recovery_target". It's already somewhat confusing to
> have recovery_target_xid, recovery_target_time, and recovery_target_name,
> which are mutually exclusive, and recovery_target_inclusive which is just
> a modifier for the others. Maybe something like:
>
> recovery_target = 'xid 1234'
> recovery_target = 'xid 1234 exclusive'
> recovery_target = '2013-04-22 12:33'
> recovery_target = '2013-04-22 12:33 exclusive'
> recovery_target = 'consistent'
> recovery_target = 'name: daily backup'

I agree that the current API is confusing in exactly the way you
describe. Whether this is an improvement, I'm not sure.

--
 Robert Haas
 EnterpriseDB: http://www.enterprisedb.com
 The Enterprise PostgreSQL Company
Re: [HACKERS] Functional dependencies and GROUP BY - for subqueries
Ashutosh Bapat writes:
> On Fri, Apr 26, 2013 at 7:54 PM, Tom Lane wrote:
>> A larger point is that the patch as proposed doesn't fix the stated
>> problem, because it only descends into written-out subqueries. It
>> would only succeed at looking into views if we applied it after
>> rewriting, rather than in the parser. That's really not going to work.
>> It would be a complete disaster if the dependencies of a query that
>> references a view depend on the view's contents.

> Can you please elaborate, why would it be a disaster?

Consider that we've done

	create table t1 (id int primary key, ... other stuff ...);
	create view v1 as select * from t1;
	create view v2 as select * from v1 group by id;

Currently, v2 would be rejected but you would like to make it legal.
Now consider

	alter table t1 drop primary key;

This ALTER would have to be rejected, or else (with CASCADE) lead to
dropping v2 but not v1. That's pretty ugly action-at-a-distance if you
ask me. But worse, consider

	create or replace view v1 as select * from t2;

where t2 exposes the same columns as t1 but lacks a primary-key
constraint on id. This likewise would need to invalidate v2. We lack
any dependency mechanism that could enforce that, and it seems seriously
ugly that such a view redefinition could fail at all. (Note for
instance that there's no place to put a CASCADE/RESTRICT option in
CREATE OR REPLACE VIEW.)

So quite aside from the implementation difficulties of looking into
views for such constraints, I don't think the behavior would be pleasant
if we did do it. Views are not supposed to expose properties of the
underlying tables.

			regards, tom lane
Re: [HACKERS] pg_controldata gobbledygook
On Fri, Apr 26, 2013 at 2:08 AM, Andres Freund wrote:
> On 2013-04-25 23:07:02 -0400, Peter Eisentraut wrote:
>> I'm not sure who is supposed to be able to read this sort of stuff:
>>
>> Latest checkpoint's NextXID:          0/7575
>> Latest checkpoint's NextOID:          49152
>> Latest checkpoint's NextMultiXactId:  7
>> Latest checkpoint's NextMultiOffset:  13
>> Latest checkpoint's oldestXID:        1265
>> Latest checkpoint's oldestXID's DB:   1
>> Latest checkpoint's oldestActiveXID:  0
>> Latest checkpoint's oldestMultiXid:   1
>> Latest checkpoint's oldestMulti's DB: 1
>>
>> Note that these symbols don't even correspond to the actual symbols used
>> in the source code in some cases.
>>
>> The comments in the pg_control.h header file use much more pleasant
>> terms, which when put to use would lead to output similar to this:
>>
>> Latest checkpoint's next free transaction ID:                  0/7575
>> Latest checkpoint's next free OID:                             49152
>> Latest checkpoint's next free MultiXactId:                     7
>> Latest checkpoint's next free MultiXact offset:                13
>> Latest checkpoint's cluster-wide minimum datfrozenxid:         1265
>> Latest checkpoint's database with cluster-wide minimum datfrozenxid: 1
>> Latest checkpoint's oldest transaction ID still running:       0
>> Latest checkpoint's cluster-wide minimum datminmxid:           1
>> Latest checkpoint's database with cluster-wide minimum datminmxid: 1
>>
>> One could even rearrange the layout a little bit like this:
>>
>> Control data as of latest checkpoint:
>>   next free transaction ID: 0/7575
>>   next free OID:            49152
>
> I have to admit I don't see the point. None of those values is particularly
> interesting to anybody without implementation level knowledge and those
> will likely deal with them just fine. And I find the version with the
> shorter names far quicker to read.

I agree. For the ones I didn't know the meaning of, I still don't know
the meaning of them based on the long form, either.

While a tutorial on what these things mean might be useful, embedding
the tutorial into the output of pg_controldata probably isn't the right
place.

Cheers,

Jeff
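The short-form output quoted above is a simple "label: value" format; for what it's worth, a minimal sketch of parsing it, which is roughly what external scripts that scrape this output have to do (the helper and the sample text are purely illustrative):

```python
# Sketch: parsing pg_controldata-style "label: value" output.
# The sample mirrors the short-form output quoted above.
sample = """\
Latest checkpoint's NextXID:          0/7575
Latest checkpoint's NextOID:          49152
Latest checkpoint's NextMultiXactId:  7
"""

def parse_controldata(text):
    """Split each 'label: value' line at the first colon into a dict."""
    fields = {}
    for line in text.splitlines():
        label, _, value = line.partition(":")
        fields[label.strip()] = value.strip()
    return fields

fields = parse_controldata(sample)
print(fields["Latest checkpoint's NextXID"])  # -> 0/7575
```

Any tool written against the labels this way would, of course, break if the labels were renamed.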
Re: [HACKERS] Recovery target 'immediate'
On 26.04.2013 19:05, Simon Riggs wrote:
> On 26 April 2013 16:38, Robert Haas wrote:
>> On Fri, Apr 26, 2013 at 11:35 AM, Simon Riggs wrote:
>>> Given that I was describing how we might implement Heikki's
>>> suggestion, I find this comment confusing.
>>>
>>> Please explain.
>>
>> Heikki's suggestion is simply to have a mode that stops as soon as
>> consistency is reached. The server already knows (from the backup
>> label) what the consistency point is, so there's no need to add a
>> restore point or anything else to the WAL stream to implement what
>> he's talking about.
>
> Using restore points just puts into use the facility that is already
> best practice to use, put there for just this kind of situation.
> I guess you could do recovery_target_name = '$consistent'
>
> Doing it the other way means you need to add a new kind of recovery
> target to the API just for this.
> recovery_target_immediate = on

Sounds good to me.

Actually, from a usability point of view I think it would be nice to have
just one setting, "recovery_target". It's already somewhat confusing to
have recovery_target_xid, recovery_target_time, and recovery_target_name,
which are mutually exclusive, and recovery_target_inclusive which is just
a modifier for the others. Maybe something like:

recovery_target = 'xid 1234'
recovery_target = 'xid 1234 exclusive'
recovery_target = '2013-04-22 12:33'
recovery_target = '2013-04-22 12:33 exclusive'
recovery_target = 'consistent'
recovery_target = 'name: daily backup'

- Heikki
Re: [HACKERS] Recovery target 'immediate'
On 26 April 2013 16:38, Robert Haas wrote:
> On Fri, Apr 26, 2013 at 11:35 AM, Simon Riggs wrote:
>> Given that I was describing how we might implement Heikki's
>> suggestion, I find this comment confusing.
>>
>> Please explain.
>
> Heikki's suggestion is simply to have a mode that stops as soon as
> consistency is reached. The server already knows (from the backup
> label) what the consistency point is, so there's no need to add a
> restore point or anything else to the WAL stream to implement what
> he's talking about.

Using restore points just puts into use the facility that is already
best practice to use, put there for just this kind of situation.
I guess you could do recovery_target_name = '$consistent'

Doing it the other way means you need to add a new kind of recovery
target to the API just for this.
recovery_target_immediate = on

--
 Simon Riggs                   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services
Re: [HACKERS] [ADMIN] Simultaneous index creates on different schemas cause deadlock?
On Thu, Apr 25, 2013 at 12:17 PM, Tom Lane wrote:
> The cause is that each one will wait for all older snapshots to be
> gone --- and it does that before dropping its own snapshot, so that the
> other ones will see it as something to be waited out too.

This makes sense. Thank you for explaining.

> Since we know that C.I.C. executes in its own transaction, and there
> can't be more than one on the same table due to locking, it seems to me
> that it'd be safe to drop our own snapshot before waiting for other
> xacts to end. That is, we could just rearrange the last few steps in
> DefineIndex(), taking care to save snapshot->xmin before we destroy the
> snapshot so that we still have that value to pass to
> GetCurrentVirtualXIDs().

Seems reasonable to me. Looks like a fix landed in master yesterday:

http://git.postgresql.org/gitweb/?p=postgresql.git;a=commit;h=c3d09b3bd23f5f65b5eb8124a3c7592dad85a50c

Many thanks to Tom and all the pgsql-hackers for all the work you do!

Paul
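The deadlock Tom describes, and the fix of dropping our own snapshot before waiting, can be reduced to a toy model (purely illustrative; this models only the waiting rule, not the actual DefineIndex() code):

```python
def cic_can_all_finish(num_sessions, drop_own_snapshot_first):
    """Toy model of concurrent CREATE INDEX CONCURRENTLY sessions.

    Each session must wait until no *other* session still holds an older
    snapshot.  Before the fix, a session kept its own snapshot while
    waiting, so two sessions each saw the other as 'something to be
    waited out too' -- a deadlock.  The fix drops our own snapshot first.
    """
    holds_snapshot = [not drop_own_snapshot_first] * num_sessions
    # session i can finish iff every other session's snapshot is gone
    return all(
        not any(holds_snapshot[j] for j in range(num_sessions) if j != i)
        for i in range(num_sessions)
    )

print(cic_can_all_finish(2, drop_own_snapshot_first=False))  # False: deadlock
print(cic_can_all_finish(2, drop_own_snapshot_first=True))   # True: all finish
```

A single C.I.C. never deadlocks in either mode, which matches the observation that the problem only appears with simultaneous index builds.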
Re: [HACKERS] danger of stats_temp_directory = /dev/shm
On Thu, Apr 25, 2013 at 12:09 AM, Tom Lane wrote:
> Alvaro Herrera writes:
>> Jeff Janes escribió:
>>> With the stats file split patch 187492b6c2e8cafc5 introduced in 9.3dev,
>>> now after a crash the postmaster will try to delete all files in the
>>> directory stats_temp_directory. When that is just a subdirectory of
>>> PGDATA, this is fine. But it seems rather hostile when it is set to a
>>> shared directory, like the popular /dev/shm.
>>>
>>> Does this need to be fixed, or at least documented?
>
>> I think we need it fixed so that it only deletes the files matching a
>> well-known pattern.
>
> I think we need it fixed to reject any stats_temp_directory that is not
> postgres-owned with restrictive permissions. The problem here is not
> with what it deletes, it's with the insanely insecure configuration.

Only deleting files matching the relevant pattern might not be a bad
idea either, though.

--
 Robert Haas
 EnterpriseDB: http://www.enterprisedb.com
 The Enterprise PostgreSQL Company
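Alvaro's suggestion, deleting only files that match a well-known pattern, could look roughly like this. Note the db_<oid>.stat / global.stat filenames below are my assumption about the post-split naming; the exact patterns live in pgstat.c:

```python
import os
import re
import tempfile

# Sketch: instead of clearing the whole stats_temp_directory, delete only
# files matching the stats filename patterns.  The patterns here
# (global.stat, db_<oid>.stat) are an assumption -- check pgstat.c for
# the names actually used by 9.3.
STATS_FILE_RE = re.compile(r"^(global|db_\d+)\.stat$")

def reset_stats_dir(path):
    """Remove only recognizable stats files; leave everything else alone."""
    removed = []
    for name in os.listdir(path):
        if STATS_FILE_RE.match(name):
            os.unlink(os.path.join(path, name))
            removed.append(name)
    return sorted(removed)

# demo: a shared directory holding both stats files and an unrelated file
d = tempfile.mkdtemp()
for name in ["global.stat", "db_12345.stat", "someone_elses_file"]:
    open(os.path.join(d, name), "w").close()
print(reset_stats_dir(d))   # ['db_12345.stat', 'global.stat']
print(os.listdir(d))        # ['someone_elses_file'] survives
```

This would keep a misconfigured /dev/shm setting from taking other applications' files with it, though it does not address Tom's ownership/permissions concern.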
Re: [HACKERS] Recovery target 'immediate'
On Fri, Apr 26, 2013 at 11:35 AM, Simon Riggs wrote:
> Given that I was describing how we might implement Heikki's
> suggestion, I find this comment confusing.
>
> Please explain.

Heikki's suggestion is simply to have a mode that stops as soon as
consistency is reached. The server already knows (from the backup
label) what the consistency point is, so there's no need to add a
restore point or anything else to the WAL stream to implement what
he's talking about.

--
 Robert Haas
 EnterpriseDB: http://www.enterprisedb.com
 The Enterprise PostgreSQL Company
Re: [HACKERS] Recovery target 'immediate'
On 26 April 2013 15:38, Robert Haas wrote:
> On Fri, Apr 26, 2013 at 10:05 AM, Simon Riggs wrote:
>> Restore points are definitely the way to go here, this is what they
>> were created for. Stopping at a labelled location has a defined
>> meaning for the user, which is much better than just "stop anywhere
>> convenient", which I found so frightening.
>>
>> It should be straightforward to create a restore point with the same
>> name as used in pg_start_backup('text');
>>
>> pg_basebackup backups would need to use a unique key, which is harder
>> to achieve. If we write a WAL record at backup start that would make
>> the starting LSN unique, so we could then use that for the restore
>> point name for that backup.
>>
>> If people want anything else they can request an additional restore
>> point at the end of the backup.
>
> I personally find this to be considerably more error-prone than
> Heikki's suggestion.

Given that I was describing how we might implement Heikki's
suggestion, I find this comment confusing.

Please explain.

--
 Simon Riggs                   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services
Re: [HACKERS] Functional dependencies and GROUP BY - for subqueries
On Fri, Apr 26, 2013 at 7:54 PM, Tom Lane wrote:
> Ashutosh Bapat writes:
>> The reason being, it doesn't look into the subqueries (in FROM clause) to
>> infer that p.product_id is essentially product.product_id which is a
>> primary key.
>
> Right.
>
>> Attached find a crude patch to infer the same by traversing subqueries.
>
> I think this is probably a bad idea. We could spend an infinite amount
> of code this way, with ever-increasing runtime cost and ever-decreasing
> usefulness, and where would we stop? I'm also a bit concerned about
> allowing illegal queries due to bugs of omission in the
> ever-more-complex checking code, which could be quite hard to find, and
> when we did find them we'd be faced with a backwards compatibility break
> if we fix them.
>
> A larger point is that the patch as proposed doesn't fix the stated
> problem, because it only descends into written-out subqueries.

That's correct. I tested it only with the written-out subqueries, to see
if the idea works. But it started with the case involving views.

> It would only succeed at looking into views if we applied it after
> rewriting, rather than in the parser. That's really not going to work.
> It would be a complete disaster if the dependencies of a query that
> references a view depend on the view's contents.

Can you please elaborate, why would it be a disaster?

Will we extend this functionality for written-out subqueries and/or
views, or neither? I am not touchy about the approach I have taken; I am
interested in the feature extension, whichever way it gets implemented.

--
Best Wishes,
Ashutosh Bapat
EntepriseDB Corporation
The Postgres Database Company
Re: [HACKERS] libpq COPY handling
On Fri, Apr 26, 2013 at 10:48 AM, Tom Lane wrote:
> Robert Haas writes:
>> On Thu, Apr 25, 2013 at 1:56 PM, Tom Lane wrote:
>>> However, the documentation in libpq.sgml is a bit bogus too, because it
>>> counsels trying the PQputCopyEnd() call again, which will not work
>>> (since we already changed the asyncStatus). We could make that say "a
>>> zero result is informational, you might want to try PQflush() later".
>>> The trouble with this, though, is that any existing callers that were
>>> coded to the old spec would now be broken.
>
>> Changing the meaning of a 0 return code seems like a bad idea.
>> However, not ever returning 0 isn't great either: someone could be
>> forgiven for writing code that calls PQputCopyData/End() until they
>> get a 0 result, then waits for the socket to become write-OK before
>> continuing.
>
> Anybody who tried that would have already discovered that it doesn't
> work, so that argument seems pretty hollow.
>
> What I'm suggesting is that we fix the documentation to match what
> the code actually does, ie 1 and -1 are the only return codes, but
> in nonblock mode it may not actually have flushed all the data.
> I do not see how that can break any code that works now.
>
> An alternative possibility is to document the zero return case as
> "can't happen now, but defined for possible future expansion", which
> I rather suspect was the thought process last time this was looked at.
> The trouble with that is that, if we ever did try to actually return
> zero, we'd more than likely break some code that should have been
> checking for the case and wasn't.
>
> Anyway, in view of the lack of complaints from the field, I think
> changing the behavior of this code would be much more likely to cause
> problems than fix them.

Sure, I don't think we can change the behavior now; I just think it's a
shame that the documented behavior was never implemented. But at this
point we don't have much choice but to live with it.

By the by, might it be time to start thinking about deprecating protocol
version 2, at least for COPY? It seems that supporting it requires
nontrivial contortions of both the front-end and back-end code, some of
which seem possibly relevant to performance. Obviously we're not going
to do anything to 9.3 at this point, but maybe for the next release...

--
 Robert Haas
 EnterpriseDB: http://www.enterprisedb.com
 The Enterprise PostgreSQL Company
Re: [HACKERS] Failing start-up archive recovery at Standby mode in PG9.2.4
Let me explain this problem in more detail.

The problem occurs because a restartpoint creates an illegal WAL file
during archive recovery. But I cannot work out why the illegal WAL file
is created in CreateRestartPoint(). My attached patch is really plain...

In the problem case, the first check in XLogFileReadAnyTLI() does not
return an fd, because the proper WAL file does not exist in the archive
directory:

> if (sources & XLOG_FROM_ARCHIVE)
> {
>     fd = XLogFileRead(log, seg, emode, tli, XLOG_FROM_ARCHIVE, true);
>     if (fd != -1)
>     {
>         elog(DEBUG1, "got WAL segment from archive");
>         return fd;
>     }
> }

Next, it searches for the WAL file in pg_xlog. The illegal WAL file is
there, and the illegal WAL file's fd is returned:

> if (sources & XLOG_FROM_PG_XLOG)
> {
>     fd = XLogFileRead(log, seg, emode, tli, XLOG_FROM_PG_XLOG, true);
>     if (fd != -1)
>         return fd;
> }

The returned fd becomes the readFile value. Of course readFile is now
non-negative, so we break out of the loop:

> readFile = XLogFileReadAnyTLI(readId, readSeg, DEBUG2, sources);
> switched_segment = true;
> if (readFile >= 0)
>     break;

Next, the point in XLogPageRead() where the problem shows up: the
illegal WAL file is read, and validation fails:

> if (lseek(readFile, (off_t) readOff, SEEK_SET) < 0)
> {
>     ereport(emode_for_corrupt_record(emode, *RecPtr),
>             (errcode_for_file_access(),
>              errmsg("could not seek in log file %u, segment %u to offset %u: %m",
>                     readId, readSeg, readOff)));
>     goto next_record_is_invalid;
> }
> if (read(readFile, readBuf, XLOG_BLCKSZ) != XLOG_BLCKSZ)
> {
>     ereport(emode_for_corrupt_record(emode, *RecPtr),
>             (errcode_for_file_access(),
>              errmsg("could not read from log file %u, segment %u, offset %u: %m",
>                     readId, readSeg, readOff)));
>     goto next_record_is_invalid;
> }
> if (!ValidXLOGHeader((XLogPageHeader) readBuf, emode, false))
>     goto next_record_is_invalid;

I think Horiguchi's discovery point is after this point. We must fix
CreateRestartPoint() so that it does not create an illegal WAL file.

Best regards,
--
Mitsumasa KONDO
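The lookup order walked through above, archive first with pg_xlog as the fallback, can be sketched like this (a toy model, not the actual XLogFileReadAnyTLI() code):

```python
import os
import tempfile

def xlog_file_read_any_source(seg_name, archive_dir, pg_xlog_dir):
    """Toy sketch of the source order in XLogFileReadAnyTLI(): try the
    archive first; only if the segment is absent there, fall back to
    pg_xlog.  A stale/illegal file left behind in pg_xlog is therefore
    picked up exactly when the archive lacks the proper segment."""
    for source_dir in (archive_dir, pg_xlog_dir):
        path = os.path.join(source_dir, seg_name)
        if os.path.exists(path):
            return path
    return None

# demo: the segment exists only as a (possibly illegal) file in pg_xlog
archive = tempfile.mkdtemp()
pg_xlog = tempfile.mkdtemp()
open(os.path.join(pg_xlog, "000000010000000000000003"), "w").close()
print(xlog_file_read_any_source("000000010000000000000003", archive, pg_xlog))
```

This is why the bad file only causes trouble when the archive does not contain the proper segment: the archive copy would otherwise shadow it.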
Re: [HACKERS] Recovery target 'immediate'
On Apr 26, 2013 4:38 PM, "Robert Haas" wrote:
> On Fri, Apr 26, 2013 at 10:05 AM, Simon Riggs wrote:
>> Restore points are definitely the way to go here, this is what they
>> were created for. Stopping at a labelled location has a defined
>> meaning for the user, which is much better than just "stop anywhere
>> convenient", which I found so frightening.
>>
>> It should be straightforward to create a restore point with the same
>> name as used in pg_start_backup('text');
>>
>> pg_basebackup backups would need to use a unique key, which is harder
>> to achieve. If we write a WAL record at backup start that would make
>> the starting LSN unique, so we could then use that for the restore
>> point name for that backup.
>>
>> If people want anything else they can request an additional restore
>> point at the end of the backup.
>
> I personally find this to be considerably more error-prone than
> Heikki's suggestion. On the occasions when I have had the dubious
> pleasure of trying to do PITR recovery, it's quite easy to supply a
> recovery target that never actually gets matched - and then you
> accidentally recover all the way to the end of WAL. This is not fun.
> Having a bulletproof way to say "recover until you reach consistency
> and then stop" is a much nicer API. I don't think "stop as soon as
> possible" is at all the same thing as "stop anywhere convenient".

Thinking some more about it, this could also be useful together with
pausing at the recovery target to get a quick look at the state of
things before recovering further. I assume that would work as well,
since it would be "a recovery target like the others".

/Magnus
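For illustration, combining the consistency target with pause-at-target might look something like this in recovery.conf (hypothetical sketch only; neither the setting name nor its interaction with pausing was settled at this point in the thread):

```
# recovery.conf -- illustrative sketch, not committed syntax
restore_command = 'cp /mnt/server/archive/%f %p'

# stop as soon as consistency is reached, instead of replaying to end of WAL
recovery_target = 'immediate'

# pause there so the state can be inspected before recovering further
pause_at_recovery_target = on
```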
Re: [HACKERS] libpq COPY handling
Robert Haas writes:
> On Thu, Apr 25, 2013 at 1:56 PM, Tom Lane wrote:
>> However, the documentation in libpq.sgml is a bit bogus too, because it
>> counsels trying the PQputCopyEnd() call again, which will not work
>> (since we already changed the asyncStatus). We could make that say "a
>> zero result is informational, you might want to try PQflush() later".
>> The trouble with this, though, is that any existing callers that were
>> coded to the old spec would now be broken.

> Changing the meaning of a 0 return code seems like a bad idea.
> However, not ever returning 0 isn't great either: someone could be
> forgiven for writing code that calls PQputCopyData/End() until they
> get a 0 result, then waits for the socket to become write-OK before
> continuing.

Anybody who tried that would have already discovered that it doesn't
work, so that argument seems pretty hollow.

What I'm suggesting is that we fix the documentation to match what
the code actually does, ie 1 and -1 are the only return codes, but
in nonblock mode it may not actually have flushed all the data.
I do not see how that can break any code that works now.

An alternative possibility is to document the zero return case as
"can't happen now, but defined for possible future expansion", which
I rather suspect was the thought process last time this was looked at.
The trouble with that is that, if we ever did try to actually return
zero, we'd more than likely break some code that should have been
checking for the case and wasn't.

Anyway, in view of the lack of complaints from the field, I think
changing the behavior of this code would be much more likely to cause
problems than fix them.

			regards, tom lane
Re: [HACKERS] Recovery target 'immediate'
On Fri, Apr 26, 2013 at 10:05 AM, Simon Riggs wrote:
> Restore points are definitely the way to go here, this is what they
> were created for. Stopping at a labelled location has a defined
> meaning for the user, which is much better than just "stop anywhere
> convenient", which I found so frightening.
>
> It should be straightforward to create a restore point with the same
> name as used in pg_start_backup('text');
>
> pg_basebackup backups would need to use a unique key, which is harder
> to achieve. If we write a WAL record at backup start that would make
> the starting LSN unique, so we could then use that for the restore
> point name for that backup.
>
> If people want anything else they can request an additional restore
> point at the end of the backup.

I personally find this to be considerably more error-prone than
Heikki's suggestion. On the occasions when I have had the dubious
pleasure of trying to do PITR recovery, it's quite easy to supply a
recovery target that never actually gets matched - and then you
accidentally recover all the way to the end of WAL. This is not fun.
Having a bulletproof way to say "recover until you reach consistency
and then stop" is a much nicer API. I don't think "stop as soon as
possible" is at all the same thing as "stop anywhere convenient".

--
 Robert Haas
 EnterpriseDB: http://www.enterprisedb.com
 The Enterprise PostgreSQL Company
Re: [HACKERS] libpq COPY handling
On Thu, Apr 25, 2013 at 1:56 PM, Tom Lane wrote:
> However, the documentation in libpq.sgml is a bit bogus too, because it
> counsels trying the PQputCopyEnd() call again, which will not work
> (since we already changed the asyncStatus). We could make that say "a
> zero result is informational, you might want to try PQflush() later".
> The trouble with this, though, is that any existing callers that were
> coded to the old spec would now be broken.
>
> It might be better to consider that the code is correct and fix the
> documentation. I notice that the other places in fe-exec.c that call
> pqFlush() generally say "In nonblock mode, don't complain if we're unable
> to send it all", which is pretty much the spirit of what this is doing
> though it lacks that comment.

Changing the meaning of a 0 return code seems like a bad idea. However,
not ever returning 0 isn't great either: someone could be forgiven for
writing code that calls PQputCopyData/End() until they get a 0 result,
then waits for the socket to become write-OK before continuing. They
might be surprised to find that this results in libpq's internal buffer
expanding until malloc fails.

I'm gathering that what I should probably do is something like this.
Send some data using PQputCopyData() or an end-of-copy using
PQputCopyEnd(). Then, call PQflush(). If the latter returns 1, then
wait for the socket to become write-OK and try again. Once it returns
0, send the next hunk of copy-data.

This is a bit of an exasperating interface because when the socket
buffer fills up, PQputCopyData() will do a write() but we won't know
what happened; we'll have to call PQflush() and try a second write of
the same data without an intervening wait to find out that the buffer
is full. If this worked as currently documented, that wouldn't be
necessary. But it seems tolerable.

>> Also, I noticed that there are a few places in fe-protocol3.c that
>> seem not to know about COPY-BOTH mode. I'm not sure that any of these
>> are actually bugs right now given the current very limited use of
>> COPY-BOTH mode, but I'm wondering whether it wouldn't be better to
>> minimize the chance of future surprises. Patch attached.
>
> +1 for these changes, anyway.

OK, committed, thanks.

--
 Robert Haas
 EnterpriseDB: http://www.enterprisedb.com
 The Enterprise PostgreSQL Company
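The put-then-flush loop Robert describes can be modeled with a toy connection object. This only mimics the 1/0/-1 return conventions discussed in the thread (an internal buffer plus a socket that accepts a bounded number of bytes per write); it is not libpq itself:

```python
class ToyConn:
    """Toy stand-in for a nonblocking libpq connection: an internal
    output buffer plus a 'socket' that accepts at most `chunk` bytes
    per flush attempt."""
    def __init__(self, chunk):
        self.buf = b""
        self.chunk = chunk
        self.sent = b""

    def put_copy_data(self, data):
        self.buf += data          # queues the data; returns 1 like PQputCopyData
        return 1

    def flush(self):
        self.sent += self.buf[:self.chunk]
        self.buf = self.buf[self.chunk:]
        return 1 if self.buf else 0   # like PQflush: 1 = still pending, 0 = done

def send_row(conn, row, wait_for_writable=lambda: None):
    """The pattern described above: put the data, then flush until it
    returns 0, waiting for write-readiness in between."""
    if conn.put_copy_data(row) != 1:
        raise RuntimeError("put_copy_data failed")
    while conn.flush() == 1:
        wait_for_writable()       # real code: select()/poll() on PQsocket()

conn = ToyConn(chunk=4)
send_row(conn, b"1\tfoo\n")
print(conn.sent)   # b'1\tfoo\n' -- fully flushed before the next row
```

In real code the wait would be a select()/poll() on PQsocket(), and the same drain loop would run after PQputCopyEnd() before calling PQgetResult().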
Re: [HACKERS] Functional dependencies and GROUP BY - for subqueries
Ashutosh Bapat writes:
> The reason being, it doesn't look into the subqueries (in FROM clause) to
> infer that p.product_id is essentially product.product_id which is a
> primary key.

Right.

> Attached find a crude patch to infer the same by traversing subqueries.

I think this is probably a bad idea. We could spend an infinite amount
of code this way, with ever-increasing runtime cost and ever-decreasing
usefulness, and where would we stop? I'm also a bit concerned about
allowing illegal queries due to bugs of omission in the
ever-more-complex checking code, which could be quite hard to find, and
when we did find them we'd be faced with a backwards compatibility break
if we fix them.

A larger point is that the patch as proposed doesn't fix the stated
problem, because it only descends into written-out subqueries. It
would only succeed at looking into views if we applied it after
rewriting, rather than in the parser. That's really not going to work.
It would be a complete disaster if the dependencies of a query that
references a view depend on the view's contents.

			regards, tom lane
Re: [HACKERS] Substituting Checksum Algorithm (was: Enabling Checksums)
On 26 April 2013 14:40, Tom Lane wrote:
> Andres Freund writes:
>>> On 2013-04-26 13:11:00 +0200, Florian Pflug wrote:
>>>> The unresolved code issue that I know of is moving the compiler
>>>> flags behind a configure check. I would greatly appreciate it if you
>>>> could take a look at that. My config-fu is weak and it would take me
>>>> some time to figure out how to do that.
>>>
>>> Do we necessarily have to do that before beta? If not, let's
>>> concentrate on getting the basic path in, and let's add the
>>> gcc-specific compiler options later.
>>
>> If we want them we should do it before beta, otherwise we won't notice
>> problems that the flags cause (misoptimizations, problems on compiler
>> versions, ...). So either now or in 9.4.
>
> Yeah, beta1 is the point in the cycle where we have the most leverage
> for discovering portability problems. We should not be leaving anything
> involving configure tests as "to fix later".

I'm expecting to spend some time on this over the weekend, once I've
re-read the thread and patches to see if there is something to commit.
That's my last time window, so this looks like the last chance to make
changes before beta.

--
 Simon Riggs                   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services
Re: [HACKERS] Recovery target 'immediate'
On 26 April 2013 14:48, Tom Lane wrote:
> Magnus Hagander writes:
>> That said, maybe the easier choice for a *system* (such as v-thingy)
>> would be to simply do the full backup using pg_basebackup -x (or
>> similar), therefore not needing the log archive at all when restoring.
>> Yes, it makes the base backup slightly larger, but also much
>> simpler... As a bonus, your base backup would still work if you hosed
>> your log archive.
>
> It doesn't appear to me that that resolves Heikki's complaint: if you
> recover from such a backup, the state that you get is still rather vague,
> no? The system will replay to the end of whichever WAL file it last
> copied.
>
> I think it'd be a great idea to ensure that pg_stop_backup creates a
> well defined restore stop point that corresponds to some instant during
> the execution of pg_stop_backup. Obviously, if other sessions are
> changing the database state meanwhile, it's impossible to pin it down
> more precisely than that; but I think this would satisfy the principle
> of least astonishment, and it's not clear that what we have now does.

Restore points are definitely the way to go here, this is what they
were created for. Stopping at a labelled location has a defined
meaning for the user, which is much better than just "stop anywhere
convenient", which I found so frightening.

It should be straightforward to create a restore point with the same
name as used in pg_start_backup('text');

pg_basebackup backups would need to use a unique key, which is harder
to achieve. If we write a WAL record at backup start that would make
the starting LSN unique, so we could then use that for the restore
point name for that backup.

If people want anything else they can request an additional restore
point at the end of the backup.

--
 Simon Riggs                   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services
Re: [HACKERS] Functional dependencies and GROUP BY - for subqueries
Hi All, If the GROUP BY clause contains a primary key, the target list may have columns which are neither aggregated nor part of the GROUP BY clause. The relevant commit is e49ae8d3bc588294d07ce1a1272b31718cfca5ef and the relevant mail thread has the subject "Functional dependencies and GROUP BY". As a result, for the following set of commands, the last SELECT statement does not throw an error.

CREATE TEMP TABLE products (product_id int, name text, price numeric);
CREATE TEMP TABLE sales (product_id int, units int);
ALTER TABLE products ADD PRIMARY KEY (product_id);
SELECT product_id, p.name, (sum(s.units) * p.price) AS sales
  FROM products p LEFT JOIN sales s USING (product_id)
  GROUP BY product_id;

But, if I rewrite the query using views as follows

create view sel_product as SELECT p.product_id, p.name, p.price FROM products p;
create view sel_sales as SELECT s.units, s.product_id FROM ONLY sales s;
SELECT p.product_id, p.name, (sum(s.units) * p.price)
  FROM sel_product p LEFT JOIN sel_sales s USING (product_id)
  GROUP BY p.product_id;

the last SELECT statement gives the error

ERROR: column "p.name" must appear in the GROUP BY clause or be used in an aggregate function
LINE 1: SELECT p.product_id, p.name, (sum(s.units) * p.price) FROM s...

The reason is that the code doesn't look into the subqueries (in the FROM clause) to infer that p.product_id is essentially products.product_id, which is a primary key. Attached find a crude patch that infers the same by traversing subqueries. As I said, the patch is crude and needs a better shape. If the community is willing to accept the extension, I can work on it further. -- Best Wishes, Ashutosh Bapat EnterpriseDB Corporation The Postgres Database Company gb_subquery_pk.patch Description: Binary data -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Recovery target 'immediate'
Magnus Hagander writes: > That said, maybe the easier choice for a *system* (such as v-thingy) > would be to simply do the full backup using pg_basebackup -x (or > similar), therefore not needing the log archive at all when restoring. > Yes, it makes the base backup slightly larger, but also much > simpler... As a bonus, your base backup would still work if you hosed > your log archive. It doesn't appear to me that that resolves Heikki's complaint: if you recover from such a backup, the state that you get is still rather vague, no? The system will replay to the end of whichever WAL file it last copied. I think it'd be a great idea to ensure that pg_stop_backup creates a well-defined restore stop point that corresponds to some instant during the execution of pg_stop_backup. Obviously, if other sessions are changing the database state meanwhile, it's impossible to pin it down more precisely than that; but I think this would satisfy the principle of least astonishment, and it's not clear that what we have now does. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Substituting Checksum Algorithm (was: Enabling Checksums)
Andres Freund writes: >> On 2013-04-26 13:11:00 +0200, Florian Pflug wrote: >>> The unresolved code issue that I know of is moving the compiler flags >>> behind a configure check. I would greatly appreciate it if you could take a >>> look at that. My config-fu is weak and it would take me some time to figure >>> out how to do that. >> Do we necessarily have to do that before beta? If not, let's concentrate on >> getting the basic path in, and let's add the gcc-specific compiler options >> later. > If we want them we should do it before beta, otherwise we won't notice > problems that the flags cause (misoptimizations, problems on compiler > versions, ...). So either now or in 9.4. Yeah, beta1 is the point in the cycle where we have the most leverage for discovering portability problems. We should not be leaving anything involving configure tests as "to fix later". regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Recovery target 'immediate'
On 26.04.2013 14:54, Magnus Hagander wrote:
> On Fri, Apr 26, 2013 at 1:47 PM, Simon Riggs wrote:
>> On 26 April 2013 11:29, Heikki Linnakangas wrote:
>>> But there is also an operation to simply "restore a backup".
>>
>> Just because a tool supports an imprecise definition of a restore, doesn't mean Postgres should encourage and support that. "Restore a backup" is more suited to filesystems where most files don't change much. And it's also a common user complaint: "I restored my backup but now I've lost my changes. Can you help?". That's not something that's been heard around here because we don't encourage foot-guns.
>
> I think it makes perfect sense to have this, since we do guarantee it to still be consistent even if things *are* changing around. The lack of an easy way to do this is probably the most common reason I've seen for people using pg_dump instead of physical backups in the past. pg_basebackup fixed it for the backup side of things, with the -x option. This appears to be a suggestion to do that kind of restore even if you have log-archive-style backups. That said, maybe the easier choice for a *system* (such as v-thingy) would be to simply do the full backup using pg_basebackup -x (or similar), therefore not needing the log archive at all when restoring.

Even if you have all the required WAL files included in the backup, you'll still want to use a restore_command that can restore timeline history files from the archive (I found this out the hard way). Otherwise Postgres won't see the existing timeline history files, and can choose a timeline ID that's already in use. That will cause confusion after recovery when files generated on the new timeline start to be archived; they will clash with files from the "other" timeline with the same TLI. You can work around that with a restore_command that returns false for regular WAL files, but restores timeline history files normally. But that's inconvenient again; it's not trivial to formulate such a restore_command.
Also, pg_basebackup is a lot less efficient than working straight with the filesystem. It's a very convenient stand-alone backup tool, but if you're writing a backup handling system, you'll want to use something more efficient. (Data Director uses disk snapshots, as it happens) - Heikki -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
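The history-files-only restore_command Heikki describes above can be sketched as follows. This is purely illustrative (in Python for clarity; a real restore_command would usually be a shell one-liner, and the archive path and script name here are assumptions): it restores timeline history files normally but reports failure for regular WAL segments, so recovery ends with the WAL bundled in the backup.

```python
# history_only_restore.py -- illustrative restore_command helper (a sketch,
# not a blessed tool): restore timeline history files from the archive, but
# claim regular WAL segments are missing so recovery stops at the WAL that
# was included in the backup itself.
#
# Hypothetical usage in recovery.conf:
#   restore_command = 'history_only_restore.py /mnt/server/archive %f %p'
import os
import shutil
import sys


def restore(archive_dir, filename, dest):
    """Return 0 (success) after copying a timeline history file into place;
    return 1 (not found) for everything else, including regular WAL segments."""
    if filename.endswith(".history"):
        shutil.copyfile(os.path.join(archive_dir, filename), dest)
        return 0
    return 1  # regular WAL segment: pretend it is not in the archive


if __name__ == "__main__" and len(sys.argv) == 4:
    sys.exit(restore(sys.argv[1], sys.argv[2], sys.argv[3]))
```

A nonzero exit status is how restore_command tells recovery "this file does not exist", which is exactly the behavior wanted for the regular segments here.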
Re: [HACKERS] pg_controldata gobbledygook
On Fri, Apr 26, 2013 at 5:08 AM, Andres Freund wrote: > I have to admit I don't see the point. None of those values is particularly > interesting to anybody without implementation level knowledge and those > will likely deal with them just fine. And I find the version with the > shorter names far quicker to read. > The clarity win here doesn't seem to be worth the price of potentially > breaking some tools. +1. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Recovery target 'immediate'
On 26 April 2013 12:54, Magnus Hagander wrote: > That said, maybe the easier choice for a *system* (such as v-thingy) > would be to simply do the full backup using pg_basebackup -x (or > similar), therefore not needing the log archive at all when restoring. > Yes, it makes the base backup slightly larger, but also much > simpler... As a bonus, your base backup would still work if you hosed > your log archive. Good point. My comments also apply there. I think we should put a clear health warning on that to explain what you get and don't get. -- Simon Riggs http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Recovery target 'immediate'
On Fri, Apr 26, 2013 at 1:47 PM, Simon Riggs wrote: > On 26 April 2013 11:29, Heikki Linnakangas wrote: > >> But there is also an operation to simply "restore a backup". > > Just because a tool supports an imprecise definition of a restore, > doesn't mean Postgres should encourage and support that. > > "Restore a backup" is more suited to filesystems where most files > don't change much. And it's also a common user complaint: "I restored > my backup but now I've lost my changes. Can you help?". That's not > something that's been heard around here because we don't encourage > foot-guns. I think it makes perfect sense to have this, since we do guarantee it to still be consistent even if things *are* changing around. The lack of an easy way to do this is probably the most common reason I've seen for people using pg_dump instead of physical backups in the past. pg_basebackup fixed it for the backup side of things, with the -x option. This appears to be a suggestion to do that kind of restore even if you have log-archive-style backups. That said, maybe the easier choice for a *system* (such as v-thingy) would be to simply do the full backup using pg_basebackup -x (or similar), therefore not needing the log archive at all when restoring. Yes, it makes the base backup slightly larger, but also much simpler... As a bonus, your base backup would still work if you hosed your log archive. -- Magnus Hagander Me: http://www.hagander.net/ Work: http://www.redpill-linpro.com/ -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Recovery target 'immediate'
On 26 April 2013 11:29, Heikki Linnakangas wrote: > But there is also an operation to simply "restore a backup". Just because a tool supports an imprecise definition of a restore, doesn't mean Postgres should encourage and support that. "Restore a backup" is more suited to filesystems where most files don't change much. And it's also a common user complaint: "I restored my backup but now I've lost my changes. Can you help?". That's not something that's been heard around here because we don't encourage foot-guns. > One solution is to create restore point after the backup ends. Then you have > a clearly defined point in time you can restore to. But it would be > convenient to not have to do that. Or another way to think of this is that > it would be convenient if there was an implicit restore point at the end of > each backup. If we were going to solve that problem, that would be the way to do it. But then we could also solve other similar problems. Like queries that run for a long time. We could just have them end after a specific time rather than run to completion and give a correct answer. We could skip joins that look difficult as well. After all, "Run Query" wasn't a very precise definition of what the user wanted, so what's wrong with taking a more relaxed attitude to query execution? They will appreciate the performance gain, after all. Precision and doing the safe thing are what people trust us to do. I recognise this as a common request from users, I just don't think we should add an option to Postgres to support this when imprecise recovery is already supported by external means for those that take the conscious decision to do things that way. -- Simon Riggs http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] pg_controldata gobbledygook
--On 25. April 2013 23:19:14 -0400 Tom Lane wrote: I think I've heard of scripts grepping the output of pg_controldata for this that or the other. Any rewording of the labels would break that. While I'm not opposed to improving the labels, I would vote against your second, abbreviated scheme because it would make things ambiguous for simple grep-based scripts. I had exactly this kind of discussion just a few days ago with a customer, who wants to use the output in their scripts and was a little worried about the compatibility between major versions. I don't think we explicitly guarantee any output format compatibility for corresponding symbols between major versions, but given that pg_controldata seems to have a broad use case here, maybe we should document somewhere whether we discourage or encourage people to rely on it? -- Thanks Bernd -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Substituting Checksum Algorithm (was: Enabling Checksums)
On 2013-04-26 13:11:00 +0200, Florian Pflug wrote: > > The unresolved code issue that I know of is moving the compiler flags > > behind a configure check. I would greatly appreciate it if you could take a > > look at that. My config-fu is weak and it would take me some time to figure > > out how to do that. > > Do we necessarily have to do that before beta? If not, let's concentrate on > getting the basic path in, and let's add the gcc-specific compiler options > later. If we want them we should do it before beta, otherwise we won't notice problems that the flags cause (misoptimizations, problems on compiler versions, ...). So either now or in 9.4. Greetings, Andres Freund -- Andres Freund http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Substituting Checksum Algorithm (was: Enabling Checksums)
On Apr26, 2013, at 10:28 , Ants Aasma wrote: > On Apr 25, 2013 10:38 PM, "Jeff Davis" wrote: > > > > On Tue, 2013-04-23 at 11:44 +0300, Ants Aasma wrote: > > > I will try to reword. > > > > Did you have a chance to clarify this, as well as some of the other > > documentation issues Simon mentioned here? > > > > http://www.postgresql.org/message-id/CA+U5nMKVEu8UDXQe > > +Nk=d7nqm4ypfszaef0esak4j31lyqc...@mail.gmail.com > > Not yet. I am busy at the moment due to events in private life. I might be > able to find some time over the weekend to go over the documentation but no > guarantees. I can try to write up the reasoning behind the choice of FNV1+SHIFT3 as a checksum function, but I'm quite busy too so I'm not 100% certain I'll get to it. If that's OK with you Ants, that is. > > I'm not sure if others are waiting on me for a new patch or not. I can > > give the documentation issues a try, but I was hesitant to do so because > > you've done the research. > > > > The problems that I can correct are fairly trivial. > > The unresolved code issue that I know of is moving the compiler flags behind > a configure check. I would greatly appreciate it if you could take a look at > that. My config-fu is weak and it would take me some time to figure out how > to do that. Do we necessarily have to do that before beta? If not, let's concentrate on getting the basic patch in, and let's add the gcc-specific compiler options later. best regards, Florian Pflug -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
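For readers unfamiliar with the FNV family of hashes, here is a minimal sketch of plain 32-bit FNV-1a. To be clear, this is NOT the FNV1+SHIFT3 variant under discussion in this thread (whose mixing step is modified for SIMD friendliness); it only illustrates the general xor-then-multiply structure such checksums are built from.

```python
# Background sketch only: textbook 32-bit FNV-1a, not the FNV1+SHIFT3
# variant discussed in this thread. Shown purely to illustrate the
# xor-and-multiply structure the proposed page checksum is derived from.
FNV_OFFSET_BASIS = 2166136261  # standard 32-bit FNV offset basis
FNV_PRIME = 16777619           # standard 32-bit FNV prime


def fnv1a_32(data):
    """Compute the 32-bit FNV-1a hash of a byte string."""
    h = FNV_OFFSET_BASIS
    for byte in bytearray(data):
        h ^= byte                          # fold in one input byte
        h = (h * FNV_PRIME) & 0xFFFFFFFF   # multiply by the FNV prime, mod 2^32
    return h
```

The SHIFT3 modification discussed in the thread replaces the plain multiply with a cheaper, vectorizable mixing step; see the thread's documentation write-ups for the exact construction.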
Re: [HACKERS] Recovery target 'immediate'
On 26.04.2013 12:16, Simon Riggs wrote: On 18 April 2013 19:11, Heikki Linnakangas wrote: I just found out that if you use continuous archiving and online backups, it's surprisingly difficult to restore a backup, without replaying any more WAL than necessary. I didn't add it myself because I don't see the need, if we think more carefully. Why would you want your recovery end time to be governed solely by the time that the *backup* ended? How can that have any bearing on what you want at recovery time? If you have access to more WAL data, why would you not apply them as well - unless you have some specific reason not to - i.e. an incorrect xid or known problem time? If you're storing only a few of the WAL files with the backup then it will end naturally without assistance when the last file runs out. What is the difference between stopping at an exact point in WAL half way through a file and ending at the end of the file? If the end point is arbitrary, why the need to specify it so closely? I can't see a time when I have access to more WAL files *and* I want to stop early at some imprecise point. But you could write a restore_command script that stopped after a specific file forcing recovery to end. Well, I ran into this with VMware's Data Director, which manages backups among other things. In a typical setup, you have a WAL archive, and every now and then (daily, typically) a full backup is taken. Full backups are retained for some time, like a few weeks or months. The user can also manually request a full backup to be taken at any time. There is an option to perform PITR. The system figures out the latest full backup that precedes the chosen point-in-time, sets recovery_target_time, and starts up Postgres. But there is also an operation to simply "restore a backup". The idea of that is to, well, restore to the chosen backup, and nothing more. 
In most cases, it probably wouldn't hurt if a one or two extra WAL files are replayed beyond the backup end time, but you certainly don't want to replay all the history. Yes, you could set recovery_target_time to the point where the backup ended, but that's complicated. You'd have to read the end-of-backup timestamp from the backup history file. And because timestamps are always a bit fuzzy, I think you'd have to add at least a few seconds to that to be sure. To illustrate why it would be bad to replay more WAL than necessary, imagine that the user is about to perform some dangerous action he might want to undo later. For example, he's about to purge old data that isn't needed anymore, so with "DELETE FROM data WHERE year <= '2010'". The first thing he does is to take a backup with label "before-purging-2010". Immediately after the backup has finished, he performs the deletion. Now, the application stops working because it actually still needs the data, so he restores from the backup. If recovery decides to replay a few more WAL files after the end-of-backup, that could include the deletion, and that's no good. One solution is to create restore point after the backup ends. Then you have a clearly defined point in time you can restore to. But it would be convenient to not have to do that. Or another way to think of this is that it would be convenient if there was an implicit restore point at the end of each backup. - Heikki -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
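The explicit-restore-point workaround described above can be spelled out concretely. Right after pg_stop_backup() returns, the backup script would run SELECT pg_create_restore_point('before-purging-2010'); and a later restore would then target that name (a sketch; the archive path and label are illustrative):

```
# recovery.conf -- restore exactly to the named point, and no further
restore_command = 'cp /mnt/server/archive/%f %p'
recovery_target_name = 'before-purging-2010'
```

This gives a well-defined stop point, at the cost of the extra manual step that Heikki's proposal would make implicit.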
Re: [HACKERS] Recovery target 'immediate'
On 18 April 2013 19:11, Heikki Linnakangas wrote: > I just found out that if you use continuous archiving and online backups, > it's surprisingly difficult to restore a backup, without replaying any more > WAL than necessary. I didn't add it myself because I don't see the need, if we think more carefully. Why would you want your recovery end time to be governed solely by the time that the *backup* ended? How can that have any bearing on what you want at recovery time? If you have access to more WAL data, why would you not apply them as well - unless you have some specific reason not to - i.e. an incorrect xid or known problem time? If you're storing only a few of the WAL files with the backup then it will end naturally without assistance when the last file runs out. What is the difference between stopping at an exact point in WAL half way through a file and ending at the end of the file? If the end point is arbitrary, why the need to specify it so closely? I can't see a time when I have access to more WAL files *and* I want to stop early at some imprecise point. But you could write a restore_command script that stopped after a specific file forcing recovery to end. I don't think we should add a feature that encourages the belief that it makes sense (because its approved by the developers) to stop recovery at an arbitrary point, deliberately discarding user data. That just encourages sysadmins to not communicate with business/management about the exact details of a recovery. So -1, given it doesn't seem to make sense anyway, but if it did there are already 2 ways of stopping at an arbitrary point. -- Simon Riggs http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] pg_controldata gobbledygook
On 2013-04-25 23:07:02 -0400, Peter Eisentraut wrote:
> I'm not sure who is supposed to be able to read this sort of stuff:
>
> Latest checkpoint's NextXID:          0/7575
> Latest checkpoint's NextOID:          49152
> Latest checkpoint's NextMultiXactId:  7
> Latest checkpoint's NextMultiOffset:  13
> Latest checkpoint's oldestXID:        1265
> Latest checkpoint's oldestXID's DB:   1
> Latest checkpoint's oldestActiveXID:  0
> Latest checkpoint's oldestMultiXid:   1
> Latest checkpoint's oldestMulti's DB: 1
>
> Note that these symbols don't even correspond to the actual symbols used
> in the source code in some cases.
>
> The comments in the pg_control.h header file use much more pleasant
> terms, which when put to use would lead to output similar to this:
>
> Latest checkpoint's next free transaction ID:                        0/7575
> Latest checkpoint's next free OID:                                   49152
> Latest checkpoint's next free MultiXactId:                           7
> Latest checkpoint's next free MultiXact offset:                      13
> Latest checkpoint's cluster-wide minimum datfrozenxid:               1265
> Latest checkpoint's database with cluster-wide minimum datfrozenxid: 1
> Latest checkpoint's oldest transaction ID still running:             0
> Latest checkpoint's cluster-wide minimum datminmxid:                 1
> Latest checkpoint's database with cluster-wide minimum datminmxid:   1
>
> One could even rearrange the layout a little bit like this:
>
> Control data as of latest checkpoint:
>   next free transaction ID:  0/7575
>   next free OID:             49152

I have to admit I don't see the point. None of those values is particularly interesting to anybody without implementation-level knowledge, and those who have it will likely deal with them just fine. And I find the version with the shorter names far quicker to read. The clarity win here doesn't seem to be worth the price of potentially breaking some tools.
Greetings, Andres Freund -- Andres Freund http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
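The breakage risk being weighed here is easy to see: scripts extract these values by matching on the exact labels. A hypothetical parser for the output quoted above (the function name is mine, not from any tool) shows how tight that coupling is:

```python
import re


def controldata_field(output, label):
    """Pull one labelled value out of pg_controldata-style output.

    This matches on the literal label text, so any relabeling (e.g.
    "NextXID" -> "next free transaction ID") silently breaks extraction,
    which is the compatibility concern raised in this thread.
    """
    m = re.search(r"^%s:\s*(\S.*)$" % re.escape(label), output, re.MULTILINE)
    return m.group(1).rstrip() if m else None
```

In practice the same thing is done with grep/sed pipelines, with the same fragility.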
Re: [HACKERS] pg_controldata gobbledygook
On 26/04/13 18:53, Daniel Farina wrote: On Thu, Apr 25, 2013 at 9:34 PM, Tom Lane wrote: Alvaro Herrera writes: Tom Lane wrote: I think I've heard of scripts grepping the output of pg_controldata for this that or the other. Any rewording of the labels would break that. While I'm not opposed to improving the labels, I would vote against your second, abbreviated scheme because it would make things ambiguous for simple grep-based scripts. We could provide two alternative outputs, one for human consumption with the proposed format and something else that uses, say, shell assignment syntax. (I did propose this years ago and I might have an unfinished patch still lingering about somewhere.) And a script would use that how? "pg_controldata --machine-friendly" would fail outright on older versions. I think it's okay to ask script writers to write pg_controldata | grep -e 'old label|new label' but not okay to ask them to deal with anything as complicated as trying a switch to see if it works or not. From what I'm reading, it seems like the main benefit of the changes is to make things easier for humans to skim over. Automated programs that care about precise meanings of each field are awkwardly but otherwise well-served by the precise output as rendered right now. What about doing something similar but different from the --machine-readable proposal, such as adding an option for the *human*-readable variant that is guaranteed to mercilessly change as human-readers/-hackers sees fit on whim? It's a bit of a kludge that this is not the default, but would prevent having to serve two quite different masters with the same output. Although I'm not seriously proposing explicitly "-h" (as seen in some GNU programs in rendering byte sizes and the like...yet could be confused for 'help'), something like that may serve as prior art. I think the current way should remain the default, as Daniel suggests - but a '--human-readable' (or suitable abbreviation) flag could be added. 
Such as in the command to list directory details, using the 'ls' command in Linux... (Below, Y = 1024^8 bytes = 2^80 bytes.)

man ls
[...]
  -h, --human-readable
      with -l, print sizes in human readable format (e.g., 1K 234M 2G)
[...]
  SIZE may be (or may be an integer optionally followed by) one of following: KB 1000, K 1024, MB 1000*1000, M 1024*1024, and so on for G, T, P, E, Z, Y.
[...]

Cheers, Gavin
Re: [HACKERS] Failing start-up archive recovery at Standby mode in PG9.2.4
Hi, I discovered the cause of the problem. I think that Horiguchi's discovery is a different problem. The problem is in CreateRestartPoint. In recovery mode, PG should not write WAL records, because PG does not know the latest WAL file location. But in this problem case, PG (the standby) writes a WAL file at the restart point during archive recovery. In recovery mode, I think that a restart point should write only MinRecoveryPoint.

Here is the standby's pg_xlog directory when the problem occurred.

[mitsu-ko@localhost postgresql-9.2.4-c]$ ls Standby/pg_xlog/
00020003 00020007 0002000B 0003.history
00020004 00020008 0002000C 0003000E
00020005 00020009 0002000D 0003000F
00020006 0002000A 0002000E archive_status

This problem case is here.

[Standby] 2013-04-26 04:26:44 EDT DEBUG: 0: attempting to remove WAL segments older than log file 00030002
[Standby] 2013-04-26 04:26:44 EDT LOCATION: RemoveOldXlogFiles, xlog.c:3568
[Standby] 2013-04-26 04:26:44 EDT DEBUG: 0: recycled transaction log file "00010002"
[Standby] 2013-04-26 04:26:44 EDT LOCATION: RemoveOldXlogFiles, xlog.c:3607
[Standby] 2013-04-26 04:26:44 EDT DEBUG: 0: recycled transaction log file "00020002"
[Standby] 2013-04-26 04:26:44 EDT LOCATION: RemoveOldXlogFiles, xlog.c:3607
[Standby] 2013-04-26 04:26:44 EDT LOG: 0: restartpoint complete: wrote 9 buffers (0.2%); 0 transaction log file(s) added, 0 removed, 2 recycled; write=0.601 s, sync=1.178 s, total=1.781 s; sync files=3, longest=1.176 s, average=0.392 s
[Standby] 2013-04-26 04:26:44 EDT LOCATION: LogCheckpointEnd, xlog.c:7893
[Standby] 2013-04-26 04:26:44 EDT LOG: 0: recovery restart point at 0/90FE448
[Standby] 2013-04-26 04:26:44 EDT DETAIL: last completed transaction was at log time 2013-04-26 04:25:53.203725-04
[Standby] 2013-04-26 04:26:44 EDT LOCATION: CreateRestartPoint, xlog.c:8601
[Standby] 2013-04-26 04:26:44 EDT LOG: 0: restartpoint starting: xlog
[Standby] 2013-04-26 04:26:44 EDT LOCATION: LogCheckpointStart, xlog.c:7821
cp: cannot stat `../arc/0003000F': No such file or directory
[Standby] 2013-04-26 04:26:44 EDT DEBUG: 0: could not restore file "0003000F" from archive: return code 256
[Standby] 2013-04-26 04:26:44 EDT LOCATION: RestoreArchivedFile, xlog.c:3323
[Standby] 2013-04-26 04:26:44 EDT LOG: 0: unexpected pageaddr 0/200 in log file 0, segment 15, offset 0
[Standby] 2013-04-26 04:26:44 EDT LOCATION: ValidXLOGHeader, xlog.c:4395
cp: cannot stat `../arc/0003000F': No such file or directory
[Standby] 2013-04-26 04:26:44 EDT DEBUG: 0: could not restore file "0003000F" from archive: return code 256

During recovery, PG normally searches the archive for the next WAL file. If the proper WAL file is not there (the archive command failed), it next searches the pg_xlog directory. Normally, no WAL file with the proper name exists in pg_xlog. But in this case, a WAL file with the proper name (though with mistaken, invalid contents) does exist in pg_xlog, so recovery fails and the standby is broken. So I fixed CreateRestartPoint at the branch point that decides whether to update only MinRecoveryPoint. It seems to fix this problem, but the attached patch is plain.

Best Regards, -- NTT Open Source Software Center Mitsumasa KONDO

diff --git a/src/backend/access/transam/xlog.c b/src/backend/access/transam/xlog.c
index 5452ae1..ae4bcd8 100644
--- a/src/backend/access/transam/xlog.c
+++ b/src/backend/access/transam/xlog.c
@@ -7371,7 +7371,8 @@ CreateRestartPoint(int flags)
 	 * restartpoint. It's assumed that flushing the buffers will do that as a
 	 * side-effect.
 	 */
-	if (XLogRecPtrIsInvalid(lastCheckPointRecPtr) ||
+	if ((ControlFile->state == DB_IN_ARCHIVE_RECOVERY && RecoveryInProgress()) ||
+	    XLogRecPtrIsInvalid(lastCheckPointRecPtr) ||
 		lastCheckPoint.redo <= ControlFile->checkPointCopy.redo)
 	{
 		ereport(DEBUG2,

-- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Substituting Checksum Algorithm (was: Enabling Checksums)
On Apr 25, 2013 10:38 PM, "Jeff Davis" wrote: > > On Tue, 2013-04-23 at 11:44 +0300, Ants Aasma wrote: > > I will try to reword. > > Did you have a chance to clarify this, as well as some of the other > documentation issues Simon mentioned here? > > http://www.postgresql.org/message-id/CA+U5nMKVEu8UDXQe > +Nk=d7nqm4ypfszaef0esak4j31lyqc...@mail.gmail.com Not yet. I am busy at the moment due to events in private life. I might be able to find some time over the weekend to go over the documentation but no guarantees. > I'm not sure if others are waiting on me for a new patch or not. I can > give the documentation issues a try, but I was hesitant to do so because > you've done the research. > > The problems that I can correct are fairly trivial. The unresolved code issue that I know of is moving the compiler flags behind a configure check. I would greatly appreciate it if you could take a look at that. My config-fu is weak and it would take me some time to figure out how to do that. Regards, Ants Aasma
Re: [HACKERS] Failing start-up archive recovery at Standby mode in PG9.2.4
On 26.04.2013 07:47, Amit Langote wrote:
> How would code after applying this patch behave if a recycled segment gets renamed using the newest timeline (say 3) while we are still recovering from a lower timeline (say 2)? In that case, since XLogFileReadAnyTLI returns that recycled segment as the next segment to recover from, we get the error. And since XLogFileReadAnyTLI iterates over expectedTLIs (whose head seems to be recoveryTargetTLI at all times, is that right?), it will return that wrong (recycled) segment in the first iteration itself.

As long as the right segment is present in the archive, that's OK. Even if a recycled segment with a higher TLI is in pg_xlog, with the patch we'll still read the segment with the lower TLI from the archive. But there is a corner case where a recycled segment with a higher TLI masks a segment with a lower TLI in pg_xlog. For example, if you try to recover by copying all the required WAL files directly into pg_xlog, without using restore_command, you could run into problems. So yeah, I think you're right and we need to rethink the recycling.

The first question is, do we have to recycle WAL segments during recovery at all? It's pointless when we're restoring from archive with restore_command; the recycled files will just get replaced with files from the archive. It does help when walreceiver is active, but I wonder how significant it is in practice.

I guess the safest, smallest change is to use a lower TLI when installing the recycled files. So, instead of using the current recovery target timeline, use the ID of the timeline we're currently recovering. That way the recycled segments will never have a higher TLI than other files that recovery will try to replay. See attached patch.
- Heikki

diff --git a/src/backend/access/transam/xlog.c b/src/backend/access/transam/xlog.c
index 30d877b..cfd8a34 100644
--- a/src/backend/access/transam/xlog.c
+++ b/src/backend/access/transam/xlog.c
@@ -8459,10 +8459,15 @@ CreateRestartPoint(int flags)
 	PrevLogSeg(_logId, _logSeg);
 
 	/*
-	 * Update ThisTimeLineID to the recovery target timeline, so that
-	 * we install any recycled segments on the correct timeline.
+	 * Update ThisTimeLineID to the timeline we're currently recovering,
+	 * so that we install any recycled segments on the correct timeline.
+	 * (This might be higher than the TLI of the restartpoint we just
+	 * made, if a timeline switch was replayed while we were performing
+	 * the restartpoint.)
 	 */
-	ThisTimeLineID = GetRecoveryTargetTLI();
+	SpinLockAcquire(&xlogctl->info_lck);
+	ThisTimeLineID = XLogCtl->lastCheckPoint.ThisTimeLineID;
+	SpinLockRelease(&xlogctl->info_lck);
 
 	RemoveOldXlogFiles(_logId, _logSeg, endptr);

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Failing start-up archive recovery at Standby mode in PG9.2.4
What would happen if a recycled segment gets assigned a newer timeline than the one we are currently recovering from? In the reported erroneous behavior, that is exactly what happens, causing XLogFileReadAnyTLI() to return such a bogus segment, which causes the error. Since expectedTLIs contains a newer timeline at its head, if such a wrongly recycled segment (that is, one bearing a newer timeline than curFileTLI) exists in pg_xlog, XLogFileReadAnyTLI() would return the fd of that segment, causing the error in question. On the next retry, the same thing would happen and prevent standby startup from proceeding any further.

So, should we also look at the segment recycling code to see how it names the recycled segments (that is, which timelineID it uses)? For example, in CreateRestartPoint(), where segments are recycled?

--
View this message in context: http://postgresql.1045698.n5.nabble.com/Failing-start-up-archive-recovery-at-Standby-mode-in-PG9-2-4-tp5753110p5753358.html
Sent from the PostgreSQL - hackers mailing list archive at Nabble.com.
Re: [HACKERS] event trigger API documentation?
Peter Eisentraut writes:
> It seems pretty straightforward and useful, so I'm not sure where your
> hesitation is coming from.

If you're talking about my hesitation to consider what we have now worthy of marketing material, it comes from the fact that I don't have a use case where I don't need specific information about the SQL objects involved in the command, or the Normalized Command String.

> Based on this, I could add some documentation in the coming weeks.

I intend to be working on this $soon too.

> I don't think we have time to add support for this to the in-tree PLs.
> But I was thinking we should at least check all PLs out there to make
> sure they don't crash if they are presented with an event trigger
> function, because if you code a PL like this:
>
>     if (CALLED_AS_TRIGGER(fcinfo)) {
>         // it's a trigger
>     }
>     else {
>         // it's a normal call
>     }
>
> there might be some trouble.

As it happens, I had forgotten about that part. Yes, the API did change in a way that's not visible at compile time, and IMV that's a bug we need to fix. The simple way to go about it is to add a test branch for event triggers that reports an ERROR with ERRCODE_FEATURE_NOT_SUPPORTED; the other way to fix it is to actually include support for event triggers. As you did the work for PL/sh, you know how much (or rather, how little) is involved. If you want to consider a patch against one of those PLs adding event trigger support (in order to have more data to decide), let me know and I'll prepare it.

Regards,
--
Dimitri Fontaine
06 63 07 10 78
http://2ndQuadrant.fr     PostgreSQL : Expertise, Formation et Support
Re: [HACKERS] Failing start-up archive recovery at Standby mode in PG9.2.4
On 26.04.2013 07:02, Kyotaro HORIGUCHI wrote:
> I am a bit uncertain whether it is necessary to move curFileTLI
> everywhere it is randomly read. At a short glance, the random access
> also occurs for reading checkpoint-related records.

I didn't understand that.

> Also, I don't have a clear distinction between lastSegmentTLI and
> curFileTLI after the patch is applied.

In short, lastSegmentTLI is the TLI in the first page of the last opened segment, while curFileTLI is the TLI in the filename of the last opened segment. Usually they're the same, but in a segment that contains a timeline switch, they differ.

For example, if you perform PITR to a point in the middle of segment 00010062, you end up with two segments:

00010062
00020062

The first halves of those files are identical, but the latter contains a timeline-changing checkpoint record in the middle, and from that point on the contents are different. When we open file 00020062, lastSegmentTLI is 1, because the first half of that segment contains WAL from timeline 1, but curFileTLI is 2, because that's the TLI in the filename.

- Heikki