date:20130426

Re: [HACKERS] Assert's vs elog ERROR vs elog FATAL

2013-04-26 Thread Tom Lane

Daniel Wood  writes:
> Is the main difference between:

>  if (s->blockState != TBLOCK_SUBINPROGESS)
>  elog(*FATAL*, ...
> vs
>  Assert(s->blockState == TBLOCK_SUBINPROGRESS);

> the fact that in both cases:
>  a) the situation is unexpected, as in no user code can create this;
>  b) however, if you want the check to always be done in production 
> because of paranoia, or because a failure after this would be harder to 
> figure out, or because you want to log more info, like the exact value 
> of blockState,   then you need to use the elog(FATAL, ...) way of doing it?

It's a bit of a judgment call really.  The test-and-elog approach is the
thing to use if you want the check to be there in production builds; if
you think it's sufficient to check it in debug builds, then an Assert
is appropriate.  And, as you say, the elog does offer an opportunity to
log some additional info beyond "this should not have happened".

Another point is that an Assert forces a database-wide restart, whereas
elog(FATAL) only forces the current session to quit --- in this respect,
elog(PANIC) is a closer approximation to Assert.

I wouldn't necessarily claim that there's been a great deal of
uniformity in past decisions about which way to code can't-happen checks
;-)

> Given the example:
>  elog(ERROR, "StartTransactionCommand: unexpected state %s", ...
> vs
>  elog(FATAL, "CommitTransactionCommand: unexpected state %s", ...
> why is one considered fatal but in the other case handle-able?

I didn't look at the code but it may be that the Commit case is past the
point where it's safe to abort the transaction (although if so, it
really oughta be a PANIC).  Or it might just be randomness; although
I think we did once go through and look at all the FATAL calls to see if
that was an appropriate labeling.

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

[HACKERS] Assert's vs elog ERROR vs elog FATAL

2013-04-26 Thread Daniel Wood

These two questions are about the correct coding practice in Postgresql 
vs the specifics of xact.c


Is the main difference between:

if (s->blockState != TBLOCK_SUBINPROGESS)
elog(*FATAL*, ...
vs
Assert(s->blockState == TBLOCK_SUBINPROGRESS);

the fact that in both cases:
a) the situation is unexpected, as in no user code can create this;
b) however, if you want the check to always be done in production 
because of paranoia, or because a failure after this would be harder to 
figure out, or because you want to log more info, like the exact value 
of blockState,   then you need to use the elog(FATAL, ...) way of doing it?



Given the example:
elog(ERROR, "StartTransactionCommand: unexpected state %s", ...
vs
elog(FATAL, "CommitTransactionCommand: unexpected state %s", ...
why is one considered fatal but in the other case handle-able?
I presume the answer is something like:  It is subjective and there is 
no real rule.
OR: While no user code would ever likely try to handle an elog(ERROR, 
...) case, in theory the internal process state is still intact such 
that we could safely continue even if there is code that uses the ERROR 
situation incorrectly(should be FATAL).

[HACKERS] exactly what is COPY BOTH mode supposed to do in case of an error?

2013-04-26 Thread Robert Haas

It seems the backend and libpq don't agree.  The backend makes no
special provision to wait for a CopyDone message if an error occurs
during copy-both.  It simply sends an ErrorResponse and that's it.
libpq, on the other hand, treats either CopyDone or ErrorResponse as a
cue to transition to PGASYNC_COPY_IN (see pqGetCopyData3).

And that's a problem, because in PGASYNC_COPY_IN mode, libpq is
unwilling to process the ErrorResponse. getCopyDataMessage doesn't do
anything with it, and if the user calls PQgetResult(), pqParseInput3()
won't process it either, because it's only willing to do that when the
status is PGASYNC_BUSY.  So the bottom line is that after the server
throws an error, the server gets back to thinking that the connection
is idle, while the client ends up thinking that we're still in copy-in
mode.  The client can try to recover by doing PQputCopyEnd(), which
will get libpq back to an idle state, but if the extended-query
protocol is in use, the Sync it sends will cause the server to send an
extra ReadyForQuery that libpq isn't expecting.  Hilarity ensues.

It's perhaps not coincidental that "48.2.5 COPY operations" is silent
about what is supposed to happen when an error occurs in copy-both
mode, though it does talk about both copy-in and copy-out.  On
copy-in, it says:

In the event of a backend-detected error during copy-in mode
(including receipt of a CopyFail message), the backend will issue an
ErrorResponse message. If the COPY command was issued via an
extended-query message, the backend will now discard frontend messages
until a Sync message is received, then it will issue ReadyForQuery and
return to normal processing.

And regarding copy-out, it says:

In the event of a backend-detected error during copy-out mode, the
backend will issue an ErrorResponse message and revert to normal
processing. The frontend should treat receipt of ErrorResponse as
terminating the copy-out mode.

There's no corresponding statement about error-handling in the
copy-both case.  However, since the apparent intent is that an error
message from the server trumps anything the client may have had in
mind, it seems reasonable to decide that an ErrorResponse is intended
to fully terminate copy-both mode (not just switch to copy-in) and
initiate a skip-until-sync.  That's what the server actually does, but
libpq has other ideas.

One way to see the practical effect of this is to set up a streaming
replication slave but modify the backup label to reference some future
WAL location not yet written (I just changed the "0" before the slash
to a "1").  This will cause the server to throw the following error:

ereport(ERROR,
(errmsg("requested starting
point %X/%X is ahead of the WAL flush position of this server %X/%X",
(uint32)
(cmd->startpoint >> 32),
(uint32)
(cmd->startpoint),
(uint32)
(FlushPtr >> 32),
(uint32) (FlushPtr;

Doing this will cause the PQgetCopyData in libpqrcv_receive to return
-1, so you might expected that slave would get the following error:

ereport(ERROR,
(errmsg("could not receive
data from WAL stream: %s",

PQerrorMessage(streamConn;

But you don't, because PQgetResult() returns PGRES_COPY_IN.  So the
slave thinks that the master made a *normal* termination of streaming
replication, due to a timeline change, and it prints this:

LOG:  replication terminated by primary server
DETAIL:  End of WAL reached on timeline 1 at 0/0

It then calls PQputCopyEnd to end a copy that the server thinks is no
longer in progress and then invokes PQgetResult again.  And now it
gets the error message:

FATAL:  error reading result of streaming command: ERROR:  requested
starting point 1/500 is ahead of the WAL flush position of this
server 0/6000348

This doesn't in practice matter very much, because either way, we're
going to slam shut the server connection at this point.  But it's
clearly messed up - the error is actually NOT in response to the
CopyDone we sent, but rather in response to the START_STREAMING
command that preceded it. libpq, however, refuses to receive the error
message until after the unnecessary CopyDone has been sent.

I believe that the only other place where this coding pattern arises
is in receivelog.c.  That code has actually adopted the opposite
convention from the backend: it thinks it needs to send CopyDone
whether or not an error has occurred.  This turns out not to matter
particularly, because the server is just going to try to close the
connection anyway.  But if it did anything else after ending the copy,
I suspect the wheels would come off.

I'm attaching a patch which adopts the position that the backend is
right and libpq

Re: [HACKERS] Re: [COMMITTERS] pgsql: Fix collation assignment for aggregates with ORDER BY.

2013-04-26 Thread David Fetter

On Fri, Apr 26, 2013 at 05:52:03PM -0700, Jeff Davis wrote:
> On Fri, 2013-04-26 at 16:46 -0700, David Fetter wrote:
> > On Fri, Apr 26, 2013 at 07:49:47PM +, Tom Lane wrote:
> > > Given this risk and the lack of field complaints about the issue, it
> > > doesn't seem prudent to back-patch.
> 
> ...
> 
> > This needs back-patching to 9.1, where the bug was introduced.
> 
> It was explained in the commit message why it was not backported.

Oops.  Sorry about that, Tom.

Cheers,
David.
-- 
David Fetter  http://fetter.org/
Phone: +1 415 235 3778  AIM: dfetter666  Yahoo!: dfetter
Skype: davidfetter  XMPP: david.fet...@gmail.com
iCal: webcal://www.tripit.com/feed/ical/people/david74/tripit.ics

Remember to vote!
Consider donating to Postgres: http://www.postgresql.org/about/donate


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Re: [COMMITTERS] pgsql: Fix collation assignment for aggregates with ORDER BY.

2013-04-26 Thread Jeff Davis

On Fri, 2013-04-26 at 16:46 -0700, David Fetter wrote:
> On Fri, Apr 26, 2013 at 07:49:47PM +, Tom Lane wrote:
> > Given this risk and the lack of field complaints about the issue, it
> > doesn't seem prudent to back-patch.

...

> This needs back-patching to 9.1, where the bug was introduced.

It was explained in the commit message why it was not backported.

Regards,
Jeff Davis




-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Substituting Checksum Algorithm (was: Enabling Checksums)

2013-04-26 Thread Jeff Davis

On Sat, 2013-04-27 at 00:20 +0200, Andres Freund wrote:
> CFLAGS_VECTORIZATION? EXTRA sounds to generic to me.

I went with CFLAGS_VECTOR to be a little shorter while still keeping
some meaning.

> I think it would be better to have a PGAC_PROG_CC_VAR_OPT or so which
> assigns the flag to some passed variable name. Then we can reuse it for
> different vars and I have the feeling those will come. And having a
> CFLAGS_VECTOR_OPT would just be stupid ;)

Good suggestion; done.

Thank you for the review. New renamed patch attached for the build
options only (the other patch for the FNV checksum algorithm is
unchanged).

Regards,
Jeff Davis
*** a/config/c-compiler.m4
--- b/config/c-compiler.m4
***
*** 242,247  undefine([Ac_cachevar])dnl
--- 242,272 
  
  
  
+ # PGAC_PROG_CC_VAR_OPT
+ # ---
+ # Given a variable name and a string, check if the compiler supports
+ # the string as a command-line option. If it does, add the string to
+ # the given variable.
+ AC_DEFUN([PGAC_PROG_CC_VAR_OPT],
+ [define([Ac_cachevar], [AS_TR_SH([pgac_cv_prog_cc_cflags_$2])])dnl
+ AC_CACHE_CHECK([whether $CC supports $2], [Ac_cachevar],
+ [pgac_save_CFLAGS=$CFLAGS
+ CFLAGS="$pgac_save_CFLAGS $2"
+ ac_save_c_werror_flag=$ac_c_werror_flag
+ ac_c_werror_flag=yes
+ _AC_COMPILE_IFELSE([AC_LANG_PROGRAM()],
+[Ac_cachevar=yes],
+[Ac_cachevar=no])
+ ac_c_werror_flag=$ac_save_c_werror_flag
+ CFLAGS="$pgac_save_CFLAGS"])
+ if test x"$Ac_cachevar" = x"yes"; then
+   $1="${$1} $2"
+ fi
+ undefine([Ac_cachevar])dnl
+ ])# PGAC_PROG_CC_CFLAGS_OPT
+ 
+ 
+ 
  # PGAC_PROG_CC_LDFLAGS_OPT
  # 
  # Given a string, check if the compiler supports the string as a
*** a/configure
--- b/configure
***
*** 731,736  autodepend
--- 731,737 
  TAS
  GCC
  CPP
+ CFLAGS_VECTOR
  SUN_STUDIO_CC
  OBJEXT
  EXEEXT
***
*** 3944,3949  else
--- 3945,3955 
fi
  fi
  
+ # set CFLAGS_VECTOR from the environment, if available
+ if test "$ac_env_CFLAGS_VECTOR_set" = set; then
+   CFLAGS_VECTOR=$ac_env_CFLAGS_VECTOR_value
+ fi
+ 
  # Some versions of GCC support some additional useful warning flags.
  # Check whether they are supported, and add them to CFLAGS if so.
  # ICC pretends to be GCC but it's lying; it doesn't support these flags,
***
*** 4376,4381  if test x"$pgac_cv_prog_cc_cflags__fexcess_precision_standard" = x"yes"; then
--- 4382,4508 
CFLAGS="$CFLAGS -fexcess-precision=standard"
  fi
  
+   # Optimization flags for specific files that benefit from vectorization
+   { $as_echo "$as_me:$LINENO: checking whether $CC supports -funroll-loops" >&5
+ $as_echo_n "checking whether $CC supports -funroll-loops... " >&6; }
+ if test "${pgac_cv_prog_cc_cflags__funroll_loops+set}" = set; then
+   $as_echo_n "(cached) " >&6
+ else
+   pgac_save_CFLAGS=$CFLAGS
+ CFLAGS="$pgac_save_CFLAGS -funroll-loops"
+ ac_save_c_werror_flag=$ac_c_werror_flag
+ ac_c_werror_flag=yes
+ cat >conftest.$ac_ext <<_ACEOF
+ /* confdefs.h.  */
+ _ACEOF
+ cat confdefs.h >>conftest.$ac_ext
+ cat >>conftest.$ac_ext <<_ACEOF
+ /* end confdefs.h.  */
+ 
+ int
+ main ()
+ {
+ 
+   ;
+   return 0;
+ }
+ _ACEOF
+ rm -f conftest.$ac_objext
+ if { (ac_try="$ac_compile"
+ case "(($ac_try" in
+   *\"* | *\`* | *\\*) ac_try_echo=\$ac_try;;
+   *) ac_try_echo=$ac_try;;
+ esac
+ eval ac_try_echo="\"\$as_me:$LINENO: $ac_try_echo\""
+ $as_echo "$ac_try_echo") >&5
+   (eval "$ac_compile") 2>conftest.er1
+   ac_status=$?
+   grep -v '^ *+' conftest.er1 >conftest.err
+   rm -f conftest.er1
+   cat conftest.err >&5
+   $as_echo "$as_me:$LINENO: \$? = $ac_status" >&5
+   (exit $ac_status); } && {
+ 	 test -z "$ac_c_werror_flag" ||
+ 	 test ! -s conftest.err
+} && test -s conftest.$ac_objext; then
+   pgac_cv_prog_cc_cflags__funroll_loops=yes
+ else
+   $as_echo "$as_me: failed program was:" >&5
+ sed 's/^/| /' conftest.$ac_ext >&5
+ 
+ 	pgac_cv_prog_cc_cflags__funroll_loops=no
+ fi
+ 
+ rm -f core conftest.err conftest.$ac_objext conftest.$ac_ext
+ ac_c_werror_flag=$ac_save_c_werror_flag
+ CFLAGS="$pgac_save_CFLAGS"
+ fi
+ { $as_echo "$as_me:$LINENO: result: $pgac_cv_prog_cc_cflags__funroll_loops" >&5
+ $as_echo "$pgac_cv_prog_cc_cflags__funroll_loops" >&6; }
+ if test x"$pgac_cv_prog_cc_cflags__funroll_loops" = x"yes"; then
+   CFLAGS_VECTOR="${CFLAGS_VECTOR} -funroll-loops"
+ fi
+ 
+   { $as_echo "$as_me:$LINENO: checking whether $CC supports -ftree-vectorize" >&5
+ $as_echo_n "checking whether $CC supports -ftree-vectorize... " >&6; }
+ if test "${pgac_cv_prog_cc_cflags__ftree_vectorize+set}" = set; then
+   $as_echo_n "(cached) " >&6
+ else
+   pgac_save_CFLAGS=$CFLAGS
+ CFLAGS="$pgac_save_CFLAGS -ftree-vectorize"
+ ac_save_c_werror_flag=$ac_c_werror_flag
+ ac_c_werror_flag=yes
+ cat >conftest.$ac_ext <<_ACEOF
+ /* confdefs.h.  */
+ _ACEOF
+ cat confdefs.h >>conftest.$ac_ext
+ cat >>conftest.$ac_ext <<_ACEOF
+ /* end confdefs.h.  */
+ 
+

[HACKERS] Re: [COMMITTERS] pgsql: Fix collation assignment for aggregates with ORDER BY.

2013-04-26 Thread David Fetter

On Fri, Apr 26, 2013 at 07:49:47PM +, Tom Lane wrote:
> Fix collation assignment for aggregates with ORDER BY.
> 
> ORDER BY expressions were being treated the same as regular aggregate
> arguments for purposes of collation determination, but really they should
> not affect the aggregate's collation at all; only collations of the
> aggregate's regular arguments should affect it.
> 
> In many cases this mistake would lead to incorrectly throwing a "collation
> conflict" error; but in some cases the corrected code will silently assign
> a different collation to the aggregate than before, for example
>   agg(foo ORDER BY bar COLLATE "x")
> which will now use foo's collation rather than "x" for the aggregate.
> Given this risk and the lack of field complaints about the issue, it
> doesn't seem prudent to back-patch.
> 
> In passing, rearrange code in assign_collations_walker so that we don't
> need multiple copies of the standard logic for computing collation of a
> node with children.  (Previously, CaseExpr duplicated the standard logic,
> and we would have needed a third copy for Aggref without this change.)
> 
> Andrew Gierth and David Fetter

This needs back-patching to 9.1, where the bug was introduced.

Cheers,
David.
-- 
David Fetter  http://fetter.org/
Phone: +1 415 235 3778  AIM: dfetter666  Yahoo!: dfetter
Skype: davidfetter  XMPP: david.fet...@gmail.com
iCal: webcal://www.tripit.com/feed/ical/people/david74/tripit.ics

Remember to vote!
Consider donating to Postgres: http://www.postgresql.org/about/donate


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Substituting Checksum Algorithm (was: Enabling Checksums)

2013-04-26 Thread Andres Freund

On 2013-04-26 12:57:09 -0700, Jeff Davis wrote:
> I updated the patch and split it into two parts (attached).

> The second patch adds the configure-time check for the right
> compilation flags, and uses them when compiling checksum.c. I
> called the new variable CFLAGS_EXTRA, for lack of a better idea,
> so feel free to come up with a new name. It doesn't check for, or
> use, -msse4.1, but that can be specified by the user by
> configuring with CFLAGS_EXTRA="-msse4.1".

CFLAGS_VECTORIZATION? EXTRA sounds to generic to me.

> --- a/config/c-compiler.m4
> +++ b/config/c-compiler.m4
> @@ -242,6 +242,30 @@ undefine([Ac_cachevar])dnl
>  
>  
>  
> +# PGAC_PROG_CC_CFLAGS_EXTRA_OPT
> +# ---
> +# Given a string, check if the compiler supports the string as a
> +# command-line option. If it does, add the string to CFLAGS_EXTRA.
> +AC_DEFUN([PGAC_PROG_CC_CFLAGS_EXTRA_OPT],
> +[define([Ac_cachevar], [AS_TR_SH([pgac_cv_prog_cc_cflags_extra_$1])])dnl
> +AC_CACHE_CHECK([whether $CC supports $1], [Ac_cachevar],
> +[pgac_save_CFLAGS_EXTRA=$CFLAGS_EXTRA
> +CFLAGS_EXTRA="$pgac_save_CFLAGS_EXTRA $1"
> +ac_save_c_werror_flag=$ac_c_werror_flag
> +ac_c_werror_flag=yes
> +_AC_COMPILE_IFELSE([AC_LANG_PROGRAM()],
> +   [Ac_cachevar=yes],
> +   [Ac_cachevar=no])
> +ac_c_werror_flag=$ac_save_c_werror_flag
> +CFLAGS_EXTRA="$pgac_save_CFLAGS_EXTRA"])
> +if test x"$Ac_cachevar" = x"yes"; then
> +  CFLAGS_EXTRA="$CFLAGS_EXTRA $1"
> +fi
> +undefine([Ac_cachevar])dnl
> +])# PGAC_PROG_CC_CFLAGS_EXTRA_OPT

I think it would be better to have a PGAC_PROG_CC_VAR_OPT or so which
assigns the flag to some passed variable name. Then we can reuse it for
different vars and I have the feeling those will come. And having a
CFLAGS_VECTOR_OPT would just be stupid ;)

Greetings,

Andres Freund

-- 
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

[HACKERS] pg_ctl non-idempotent behavior change

2013-04-26 Thread Jeff Janes

After 87306184580c9c49717, if the postmaster dies without cleaning up (i.e.
power outage), running "pg_ctl start" just gives this message and then
exits:

pg_ctl: another server might be running

Under the old behavior, it would try to start the server anyway, and
succeed, then go through recovery and give you back a functional system.

>From reading the archive, I can't really tell if this change in behavior
was intentional.

Anyway it seems like a bad thing to me.  Now the user has a system that
will not start up, and is given no clue that they need to remove
"postmaster.pid" and try again.

The behavior here under the new "-I" flag seems no better in this
situation.  It claims the server is running, when it only "might" be
running (and in fact is not running).

Cheers,

Jeff

Re: [HACKERS] Substituting Checksum Algorithm (was: Enabling Checksums)

2013-04-26 Thread Jeff Davis

On Fri, 2013-04-26 at 16:40 -0400, Greg Smith wrote:
> I think I need to do two baselines:  master without checksums, and 
> master with extra optimizations but still without checksums.  It may be 
> the case that using better compile time optimizations gives a general 
> speedup that's worth considering regardless.  The optimizations seem to 
> have a very significant impact on the checksum feature, but I'd like to 
> quantify how they change the code a little bit before even getting into 
> that.

The patch only affects optimization flags used when compiling
checksum.c, so it should have no effect on other areas of the code.

If you want to compile the whole source with those flags, then just do:

  CFLAGS="-msse4.1 -funroll-loops -ftree-vectorize" ./configure

Changing the optimization flags for existing code will have a larger
impact and should be considered separately from checksums.

Regards,
Jeff Davis

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Substituting Checksum Algorithm (was: Enabling Checksums)

2013-04-26 Thread Greg Smith


On 4/26/13 3:57 PM, Jeff Davis wrote:

The second patch adds the configure-time check for the right
compilation flags, and uses them when compiling checksum.c. I
called the new variable CFLAGS_EXTRA, for lack of a better idea,
so feel free to come up with a new name. It doesn't check for, or
use, -msse4.1, but that can be specified by the user by
configuring with CFLAGS_EXTRA="-msse4.1".


Thank you, that is the last piece I was looking at but couldn't nail 
down on my own.  With that I should be able to duplicate both the 
slicing by 8 CRC speedup Ants sent over (which also expected some 
optimization changes) and trying something FNV based this weekend.


I think I need to do two baselines:  master without checksums, and 
master with extra optimizations but still without checksums.  It may be 
the case that using better compile time optimizations gives a general 
speedup that's worth considering regardless.  The optimizations seem to 
have a very significant impact on the checksum feature, but I'd like to 
quantify how they change the code a little bit before even getting into 
that.


--
Greg Smith   2ndQuadrant USg...@2ndquadrant.com   Baltimore, MD
PostgreSQL Training, Services, and 24x7 Support www.2ndQuadrant.com


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] libpq COPY handling

2013-04-26 Thread Gavin Flower


On 27/04/13 02:48, Tom Lane wrote:

Robert Haas  writes:

On Thu, Apr 25, 2013 at 1:56 PM, Tom Lane  wrote:

However, the documentation in libpq.sgml is a bit bogus too, because it
counsels trying the PQputCopyEnd() call again, which will not work
(since we already changed the asyncStatus).  We could make that say "a
zero result is informational, you might want to try PQflush() later".
The trouble with this, though, is that any existing callers that were
coded to the old spec would now be broken.

Changing the meaning of a 0 return code seems like a bad idea.
However, not ever returning 0 isn't great either: someone could be
forgiven for writing code that calls PQputCopyData/End() until they
get a 0 result, then waits for the socket to become write-OK before
continuing.

Anybody who tried that would have already discovered that it doesn't
work, so that argument seems pretty hollow.

What I'm suggesting is that we fix the documentation to match what
the code actually does, ie 1 and -1 are the only return codes, but
in nonblock mode it may not actually have flushed all the data.
I do not see how that can break any code that works now.

An alternative possibility is to document the zero return case as
"can't happen now, but defined for possible future expansion", which
I rather suspect was the thought process last time this was looked at.
The trouble with that is that, if we ever did try to actually return
zero, we'd more than likely break some code that should have been
checking for the case and wasn't.

Anyway, in view of the lack of complaints from the field, I think
changing the behavior of this code would be much more likely to cause
problems than fix them.

regards, tom lane


The original usage of returns codes was as an offset for the Operating 
System to jump to in a list of addresses to execute, after program 
completion.  Zero offset was to the first address for normal completion, 
then 4 for the next address... Addresses were stored in 4 bytes. Hence 
an historical tendency to define return codes in multiple of 4.  This 
dates back to the Mainframe days, pre UNIX, even!


Personally I would prefer zero to indicate an error, on the basis that 
is the default value for memory, but historical usage makes that 
impractical!  So we are stuck with zero being interpreted as the return 
code indicating 'Normal' completion - bash, and other Linux shells, are 
designed on this assumption.


I first came across this return code convention when I was programming 
Mainframes in the early 1970's.



Cheers,
Gavin

Re: [HACKERS] Substituting Checksum Algorithm (was: Enabling Checksums)

2013-04-26 Thread Jeff Davis

On Fri, Apr 26, 2013 at 7:09 AM, Simon Riggs  wrote:
> I'm expecting to spend some time on this over the weekend, once I've
> re-read the thread and patches to see if there is something to commit.
>
> That's my last time window, so this looks like the last chance to make
> changes before beta.

I updated the patch and split it into two parts (attached).

The first patch is the checksum algorithm itself. I have done
some documentation updates and moved it into the C file (rather
than the README), but further explanation of the "shift right 3"
modification will need to be by Ants or Florian.

The second patch adds the configure-time check for the right
compilation flags, and uses them when compiling checksum.c. I
called the new variable CFLAGS_EXTRA, for lack of a better idea,
so feel free to come up with a new name. It doesn't check for, or
use, -msse4.1, but that can be specified by the user by
configuring with CFLAGS_EXTRA="-msse4.1".

I don't know of any more required changes, aside from
documentation improvements.

Regards,
 Jeff Davis

fnv-jeff-20130426.patch
Description: Binary data

fnv-jeff-20130426-cflags-extra.patch
Description: Binary data

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Bug Fix: COLLATE with multiple ORDER BYs in aggregates

2013-04-26 Thread Tom Lane

David Fetter  writes:
> While testing the upcoming FILTER clause for aggregates, Erik Rijkers
> uncovered a long-standing bug in $subject, namely that this case
> wasn't handled.  Please find attached a patch by Andrew Gierth and
> myself which fixes this issue and adds a regression test to ensure it
> remains fixed.

Applied with minor editorialization.

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] pg_dump with postgis extension dumps rules separately

2013-04-26 Thread Dimitri Fontaine

Joe Conway  writes:
> Committed back to 9.1

Thanks,
-- 
Dimitri Fontaine
http://2ndQuadrant.fr PostgreSQL : Expertise, Formation et Support


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] pg_dump with postgis extension dumps rules separately

2013-04-26 Thread Joe Conway

-BEGIN PGP SIGNED MESSAGE-
Hash: SHA1

On 04/08/2013 08:34 AM, Dimitri Fontaine wrote:
> Joe Conway  writes:
>> OK, maybe I'll try to take a look in the meantime.
> 
> That would be awesome :)
> 
>> Did you have any comment on the other pg_dump patch (reviewed by
>> Vibhor)?
> 
> This whole extension table filtering and dumping is more in Tom's
> realm, so I guess that if you want to have another pair of eyes on
> your patch before commit'ing it yourself, you will need him to have
> a look.
> 
> That said, I didn't spot anything obvious that I want to report
> myself, so I would label it "ready for commiter".

Committed back to 9.1

Joe

- -- 
Joe Conway
credativ LLC: http://www.credativ.us
Linux, PostgreSQL, and general Open Source
Training, Service, Consulting, & 24x7 Support
-BEGIN PGP SIGNATURE-
Version: GnuPG v1.4.11 (GNU/Linux)
Comment: Using GnuPG with Thunderbird - http://www.enigmail.net/

iQIcBAEBAgAGBQJRetZfAAoJEDfy90M199hl8tAP/1CpNniKJIQeZgye3RBY1l6I
TrO+ZEfXoh5EWRkx6uxB1hUm0mMdKIgAWVHIxbFXSaFkB1NW8oX8l1kme9fsVIV2
9xWEviccg/9+iJ/+8xdTCAn8VD4FjcjkXSayjPowekMia28IwHzEA+dww9LWHzvA
TzFC5RAApdWlm9LYAp2O+9U81mbUkXkIokwuMfX9a2jlH7AxCInyo9HvgilWRlkQ
/XcFBuSLqFHOles3f9QFM7ra5uq4Da3niYUkto7GJcOeT/jyFcApgi0mLGx6akpe
807S2F2dkNOCiJeyGe/ZhYV0n6YP2dQMYwa7Lhb1eQswT7VTLEVmw22LuVUWQLOU
WZhHlX5YkiUqkDCXSaAMPCXM/dFdmQanVgmhv2IehL7ZnBbPM4/AL1+GftV6OHpS
k6eogOhVEWOVkfNZ70K144IUocxcPPwPUyo25ov8jJpMqcRuxv43o4/nAITln5Fe
CF2cY+rGTsxJJEzO2m5rXkrA9WuisPLJeS97EZE1XbbDA6CwX++O8yBvbt61NEr0
JvtnkfxnPmMvFIDH7NyVxNgUoT/jh7IbeqjEVA2l1jkWawuRAQg10Woycj0S1+7w
J0nO2T8PJrLsue+Un+NcwZH38iL12a1a1IHTTD8IW4VCF+JCPzf4ka2nHWY/bjX8
D49hPm+YZ97OeifObvfP
=2Jbv
-END PGP SIGNATURE-


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Recovery target 'immediate'

2013-04-26 Thread Heikki Linnakangas


On 26.04.2013 19:50, Magnus Hagander wrote:

On Fri, Apr 26, 2013 at 6:43 PM, Simon Riggs  wrote:

On 26 April 2013 17:25, Heikki Linnakangas  wrote:

Actually, from a usability point of view I think would be nice to have just
one setting, "recovery_target". It's already somewhat confusing to have
recovery_target_xid, recovery_target_time, and recovery_target_name, which
are mutually exclusive, and recovery_target_inclusive which is just a
modifier for the others. Maybe something like:

recovery_target = 'xid 1234'
recovery_target = 'xid 1234 exclusive'
recovery_target = '2013-04-22 12:33'
recovery_target = '2013-04-22 12:33 exclusive'
recovery_target = 'consistent'
recovery_target = 'name: daily backup'


So now you want to change the whole existing API so it fits with your
one new requirement?


No, I think the above would be a usability improvement whether or not we 
add the new feature.



I like that newer API suggestion better than what we have now - though
it can perhaps be improved even more. But I definitely don't think
it's worth breaking backwards compatibility for it. There are lots of
tools and scripts and whatnot out there that use the current API. I
think we need a bigger improvement than just a cleaner syntax to break
those.


It would be possible to do it in a backwards-compatible way, keeping the 
old API as is.  But yeah, might not be worth the effort.


- Heikki


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Recovery target 'immediate'

2013-04-26 Thread Magnus Hagander

On Fri, Apr 26, 2013 at 6:43 PM, Simon Riggs  wrote:
> On 26 April 2013 17:25, Heikki Linnakangas  wrote:
>> On 26.04.2013 19:05, Simon Riggs wrote:
>>>
>>> On 26 April 2013 16:38, Robert Haas  wrote:

 On Fri, Apr 26, 2013 at 11:35 AM, Simon Riggs
 wrote:
>
> Given that I was describing how we might implement Heikki's
> suggestion, I find this comment confusing.
>
> Please explain.


 Heikki's suggestion is simply to have a mode that stops as soon as
 consistency is reached.  The server already knows (from the backup
 label) what the consistency point is, so there's no need to add a
 restore point or anything else to the WAL stream to implement what
 he's talking about.
>>>
>>>
>>> Using restore points just puts into use the facility that is already
>>> best practice to use, put there for just this kind of situation.
>>> I guess you could do recovery_target_name = '$consistent'
>>>
>>> Doing it the other way means you need to add a new kind of recovery
>>> target to the API just for this.
>>> recovery_target_immediate = on
>>
>>
>> Sounds good to me.
>>
>> Actually, from a usability point of view I think would be nice to have just
>> one setting, "recovery_target". It's already somewhat confusing to have
>> recovery_target_xid, recovery_target_time, and recovery_target_name, which
>> are mutually exclusive, and recovery_target_inclusive which is just a
>> modifier for the others. Maybe something like:
>>
>> recovery_target = 'xid 1234'
>> recovery_target = 'xid 1234 exclusive'
>> recovery_target = '2013-04-22 12:33'
>> recovery_target = '2013-04-22 12:33 exclusive'
>> recovery_target = 'consistent'
>> recovery_target = 'name: daily backup'
>
> So now you want to change the whole existing API so it fits with your
> one new requirement?

I like that newer API suggestion better than what we have now - though
it can perhaps be improved even more. But I definitely don't think
it's worth breaking backwards compatibility for it. There are lots of
tools and scripts and whatnot out there that use the current API. I
think we need a bigger improvement than just a cleaner syntax to break
those.


--
 Magnus Hagander
 Me: http://www.hagander.net/
 Work: http://www.redpill-linpro.com/


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Recovery target 'immediate'

2013-04-26 Thread Simon Riggs

On 26 April 2013 17:25, Heikki Linnakangas  wrote:
> On 26.04.2013 19:05, Simon Riggs wrote:
>>
>> On 26 April 2013 16:38, Robert Haas  wrote:
>>>
>>> On Fri, Apr 26, 2013 at 11:35 AM, Simon Riggs
>>> wrote:

 Given that I was describing how we might implement Heikki's
 suggestion, I find this comment confusing.

 Please explain.
>>>
>>>
>>> Heikki's suggestion is simply to have a mode that stops as soon as
>>> consistency is reached.  The server already knows (from the backup
>>> label) what the consistency point is, so there's no need to add a
>>> restore point or anything else to the WAL stream to implement what
>>> he's talking about.
>>
>>
>> Using restore points just puts into use the facility that is already
>> best practice to use, put there for just this kind of situation.
>> I guess you could do recovery_target_name = '$consistent'
>>
>> Doing it the other way means you need to add a new kind of recovery
>> target to the API just for this.
>> recovery_target_immediate = on
>
>
> Sounds good to me.
>
> Actually, from a usability point of view I think would be nice to have just
> one setting, "recovery_target". It's already somewhat confusing to have
> recovery_target_xid, recovery_target_time, and recovery_target_name, which
> are mutually exclusive, and recovery_target_inclusive which is just a
> modifier for the others. Maybe something like:
>
> recovery_target = 'xid 1234'
> recovery_target = 'xid 1234 exclusive'
> recovery_target = '2013-04-22 12:33'
> recovery_target = '2013-04-22 12:33 exclusive'
> recovery_target = 'consistent'
> recovery_target = 'name: daily backup'

So now you want to change the whole existing API so it fits with your
one new requirement?

Sounds like flamebait to me, but -1, just in case you're serious.

--
 Simon Riggs   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Recovery target 'immediate'

2013-04-26 Thread Robert Haas

On Fri, Apr 26, 2013 at 12:25 PM, Heikki Linnakangas
 wrote:
>> Doing it the other way means you need to add a new kind of recovery
>> target to the API just for this.
>> recovery_target_immediate = on
>
> Sounds good to me.

Yeah, I don't have a problem with that, at all.

> Actually, from a usability point of view I think would be nice to have just
> one setting, "recovery_target". It's already somewhat confusing to have
> recovery_target_xid, recovery_target_time, and recovery_target_name, which
> are mutually exclusive, and recovery_target_inclusive which is just a
> modifier for the others. Maybe something like:
>
> recovery_target = 'xid 1234'
> recovery_target = 'xid 1234 exclusive'
> recovery_target = '2013-04-22 12:33'
> recovery_target = '2013-04-22 12:33 exclusive'
> recovery_target = 'consistent'
> recovery_target = 'name: daily backup'

I agree that the current API is confusing in exactly the way you
describe.  Whether this is an improvement, I'm not sure.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Functional dependencies and GROUP BY - for subqueries

2013-04-26 Thread Tom Lane

Ashutosh Bapat  writes:
> On Fri, Apr 26, 2013 at 7:54 PM, Tom Lane  wrote:
>> A larger point is that the patch as proposed doesn't fix the stated
>> problem, because it only descends into written-out subqueries.  It
>> would only succeed at looking into views if we applied it after
>> rewriting, rather than in the parser.  That's really not going to work.
>> It would be a complete disaster if the dependencies of a query that
>> references a view depend on the view's contents.

> Can you please elaborate, why would it be a disaster?

Consider that we've done

create table t1 (id int primary key, ... other stuff ...);
create view v1 as select * from t1;
create view v2 as select * from v1 group by id;

Currently, v2 would be rejected but you would like to make it legal.
Now consider

alter table t1 drop primary key;

This ALTER would have to be rejected, or else (with CASCADE) lead to
dropping v2 but not v1.  That's pretty ugly action-at-a-distance
if you ask me.  But worse, consider

create or replace view v1 as select * from t2;

where t2 exposes the same columns as t1 but lacks a primary-key
constraint on id.  This likewise would need to invalidate v2.  We lack
any dependency mechanism that could enforce that, and it seems seriously
ugly that such a view redefinition could fail at all.  (Note for
instance that there's no place to put a CASCADE/RESTRICT option in
CREATE OR REPLACE VIEW.)

So quite aside from the implementation difficulties of looking into
views for such constraints, I don't think the behavior would be pleasant
if we did do it.  Views are not supposed to expose properties of the
underlying tables.

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] pg_controldata gobbledygook

2013-04-26 Thread Jeff Janes

On Fri, Apr 26, 2013 at 2:08 AM, Andres Freund wrote:

> On 2013-04-25 23:07:02 -0400, Peter Eisentraut wrote:
> > I'm not sure who is supposed to be able to read this sort of stuff:
> >
> > Latest checkpoint's NextXID:  0/7575
> > Latest checkpoint's NextOID:  49152
> > Latest checkpoint's NextMultiXactId:  7
> > Latest checkpoint's NextMultiOffset:  13
> > Latest checkpoint's oldestXID:1265
> > Latest checkpoint's oldestXID's DB:   1
> > Latest checkpoint's oldestActiveXID:  0
> > Latest checkpoint's oldestMultiXid:   1
> > Latest checkpoint's oldestMulti's DB: 1
> >
> > Note that these symbols don't even correspond to the actual symbols used
> > in the source code in some cases.
> >
> > The comments in the pg_control.h header file use much more pleasant
> > terms, which when put to use would lead to output similar to this:
> >
> > Latest checkpoint's next free transaction ID: 0/7575
> > Latest checkpoint's next free OID:49152
> > Latest checkpoint's next free MultiXactId:7
> > Latest checkpoint's next free MultiXact offset:   13
> > Latest checkpoint's cluster-wide minimum datfrozenxid:1265
> > Latest checkpoint's database with cluster-wide minimum datfrozenxid:  1
> > Latest checkpoint's oldest transaction ID still running:  0
> > Latest checkpoint's cluster-wide minimum datminmxid:  1
> > Latest checkpoint's database with cluster-wide minimum datminmxid:  1
> >
> > One could even rearrange the layout a little bit like this:
> >
> > Control data as of latest checkpoint:
> > next free transaction ID: 0/7575
> > next free OID:49152
>
> I have to admit I don't see the point. None of those values is particularly
> interesting to anybody without implementation level knowledge and those
> will likely deal with them just fine. And I find the version with the
> shorter names far quicker to read.
>

I agree.  For the ones I didn't know the meaning of, I still don't know the
meaning of them based on the long form, either.  While a tutorial on what
these things mean might be useful, embedding the tutorial into the output
of pg_controldata probably isn't the right place.

Cheers,

Jeff

Re: [HACKERS] Recovery target 'immediate'

2013-04-26 Thread Heikki Linnakangas


On 26.04.2013 19:05, Simon Riggs wrote:

On 26 April 2013 16:38, Robert Haas  wrote:

On Fri, Apr 26, 2013 at 11:35 AM, Simon Riggs  wrote:

Given that I was describing how we might implement Heikki's
suggestion, I find this comment confusing.

Please explain.


Heikki's suggestion is simply to have a mode that stops as soon as
consistency is reached.  The server already knows (from the backup
label) what the consistency point is, so there's no need to add a
restore point or anything else to the WAL stream to implement what
he's talking about.


Using restore points just puts into use the facility that is already
best practice to use, put there for just this kind of situation.
I guess you could do recovery_target_name = '$consistent'

Doing it the other way means you need to add a new kind of recovery
target to the API just for this.
recovery_target_immediate = on


Sounds good to me.

Actually, from a usability point of view I think would be nice to have 
just one setting, "recovery_target". It's already somewhat confusing to 
have recovery_target_xid, recovery_target_time, and 
recovery_target_name, which are mutually exclusive, and 
recovery_target_inclusive which is just a modifier for the others. Maybe 
something like:


recovery_target = 'xid 1234'
recovery_target = 'xid 1234 exclusive'
recovery_target = '2013-04-22 12:33'
recovery_target = '2013-04-22 12:33 exclusive'
recovery_target = 'consistent'
recovery_target = 'name: daily backup'

- Heikki


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Recovery target 'immediate'

2013-04-26 Thread Simon Riggs

On 26 April 2013 16:38, Robert Haas  wrote:
> On Fri, Apr 26, 2013 at 11:35 AM, Simon Riggs  wrote:
>> Given that I was describing how we might implement Heikki's
>> suggestion, I find this comment confusing.
>>
>> Please explain.
>
> Heikki's suggestion is simply to have a mode that stops as soon as
> consistency is reached.  The server already knows (from the backup
> label) what the consistency point is, so there's no need to add a
> restore point or anything else to the WAL stream to implement what
> he's talking about.

Using restore points just puts into use the facility that is already
best practice to use, put there for just this kind of situation.
I guess you could do recovery_target_name = '$consistent'

Doing it the other way means you need to add a new kind of recovery
target to the API just for this.
recovery_target_immediate = on

--
 Simon Riggs   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] [ADMIN] Simultaneous index creates on different schemas cause deadlock?

2013-04-26 Thread Paul Hinze

On Thu, Apr 25, 2013 at 12:17 PM, Tom Lane  wrote:
> The cause is that each one will wait for all older snapshots to be
> gone --- and it does that before dropping its own snapshot, so that the
> other ones will see it as something to be waited out too.

This makes sense. Thank you for explaining.

> Since we know that C.I.C. executes in its own transaction, and there
> can't be more than one on the same table due to locking, it seems to me
> that it'd be safe to drop our own snapshot before waiting for other
> xacts to end.  That is, we could just rearrange the last few steps in
> DefineIndex(), taking care to save snapshot->xmin before we destroy the
> snapshot so that we still have that value to pass to
> GetCurrentVirtualXIDs().

Seems reasonable to me. Looks like a fix landed in master yesterday:

http://git.postgresql.org/gitweb/?p=postgresql.git;a=commit;h=c3d09b3bd23f5f65b5eb8124a3c7592dad85a50c

Many thanks to Tom and all the pgsql-hackers for all the work you do!

Paul


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] danger of stats_temp_directory = /dev/shm

2013-04-26 Thread Robert Haas

On Thu, Apr 25, 2013 at 12:09 AM, Tom Lane  wrote:
> Alvaro Herrera  writes:
>> Jeff Janes escribió:
>>> With the stats file split patch 187492b6c2e8cafc5 introduced in 9.3dev, now
>>> after a crash the postmaster will try to delete all files in the directory
>>> stats_temp_directory.  When that is just a subdirectory of PGDATA, this is
>>> fine. But it seems rather hostile when it is set to a shared directory,
>>> like the popular /dev/shm.
>
>>> Does this need to be fixed, or at least documented?
>
>> I think we need it fixed so that it only deletes the files matching a
>> well-known pattern.
>
> I think we need it fixed to reject any stats_temp_directory that is not
> postgres-owned with restrictive permissions.  The problem here is not
> with what it deletes, it's with the insanely insecure configuration.

Only deleting files matching the relevant pattern might not be a bad
idea either, though.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Recovery target 'immediate'

2013-04-26 Thread Robert Haas

On Fri, Apr 26, 2013 at 11:35 AM, Simon Riggs  wrote:
> Given that I was describing how we might implement Heikki's
> suggestion, I find this comment confusing.
>
> Please explain.

Heikki's suggestion is simply to have a mode that stops as soon as
consistency is reached.  The server already knows (from the backup
label) what the consistency point is, so there's no need to add a
restore point or anything else to the WAL stream to implement what
he's talking about.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Recovery target 'immediate'

2013-04-26 Thread Simon Riggs

On 26 April 2013 15:38, Robert Haas  wrote:
> On Fri, Apr 26, 2013 at 10:05 AM, Simon Riggs  wrote:
>> Restore points are definitely the way to go here, this is what they
>> were created for. Stopping at a labelled location has a defined
>> meaning for the user, which is much better than just "stop anywhere
>> convenient", which I found so frightening.
>>
>> It should be straightforward to create a restore point with the same
>> name as used in pg_start_backup('text');
>>
>> pg_basebackup backups would need to use a unique key, which is harder
>> to achieve. If we write a WAL record at backup start that would make
>> the starting LSN unique, so we could then use that for the restore
>> point name for that backup.
>>
>> If people want anything else they can request an additional restore
>> point at the end of the backup.
>
> I personally find this to be considerably more error-prone than
> Heikki's suggestion.

Given that I was describing how we might implement Heikki's
suggestion, I find this comment confusing.

Please explain.

--
 Simon Riggs   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Functional dependencies and GROUP BY - for subqueries

2013-04-26 Thread Ashutosh Bapat

On Fri, Apr 26, 2013 at 7:54 PM, Tom Lane  wrote:

> Ashutosh Bapat  writes:
> > The reason being, it doesn't look into the subqueries (in FROM clause) to
> > infer that p.product_id is essentially product.product_id which is a
> > primary key.
>
> Right.
>
> > Attached find a crude patch to infer the same by traversing subqueries.
>
> I think this is probably a bad idea.  We could spend an infinite amount
> of code this way, with ever-increasing runtime cost and ever-decreasing
> usefulness, and where would we stop?  I'm also a bit concerned about
> allowing illegal queries due to bugs of omission in the
> ever-more-complex checking code, which could be quite hard to find, and
> when we did find them we'd be faced with a backwards compatibility break
> if we fix them.
>
> A larger point is that the patch as proposed doesn't fix the stated
> problem, because it only descends into written-out subqueries.


That's correct. I tested it only with the written-out subqueries, to see if
the idea works. But it started with the case involving views.


> It
> would only succeed at looking into views if we applied it after
> rewriting, rather than in the parser.  That's really not going to work.
> It would be a complete disaster if the dependencies of a query that
> references a view depend on the view's contents.
>
>
Can you please elaborate, why would it be a disaster?

Will we extend this functionality for written-out subqueries queries and/or
views or none?

I am not touchy about the approach, I have taken. I am interested in the
feature extension, whichever way it gets implemented.

regards, tom lane
>



-- 
Best Wishes,
Ashutosh Bapat
EntepriseDB Corporation
The Postgres Database Company

Re: [HACKERS] libpq COPY handling

2013-04-26 Thread Robert Haas

On Fri, Apr 26, 2013 at 10:48 AM, Tom Lane  wrote:
> Robert Haas  writes:
>> On Thu, Apr 25, 2013 at 1:56 PM, Tom Lane  wrote:
>>> However, the documentation in libpq.sgml is a bit bogus too, because it
>>> counsels trying the PQputCopyEnd() call again, which will not work
>>> (since we already changed the asyncStatus).  We could make that say "a
>>> zero result is informational, you might want to try PQflush() later".
>>> The trouble with this, though, is that any existing callers that were
>>> coded to the old spec would now be broken.
>
>> Changing the meaning of a 0 return code seems like a bad idea.
>> However, not ever returning 0 isn't great either: someone could be
>> forgiven for writing code that calls PQputCopyData/End() until they
>> get a 0 result, then waits for the socket to become write-OK before
>> continuing.
>
> Anybody who tried that would have already discovered that it doesn't
> work, so that argument seems pretty hollow.
>
> What I'm suggesting is that we fix the documentation to match what
> the code actually does, ie 1 and -1 are the only return codes, but
> in nonblock mode it may not actually have flushed all the data.
> I do not see how that can break any code that works now.
>
> An alternative possibility is to document the zero return case as
> "can't happen now, but defined for possible future expansion", which
> I rather suspect was the thought process last time this was looked at.
> The trouble with that is that, if we ever did try to actually return
> zero, we'd more than likely break some code that should have been
> checking for the case and wasn't.
>
> Anyway, in view of the lack of complaints from the field, I think
> changing the behavior of this code would be much more likely to cause
> problems than fix them.

Sure, I don't think we can change the behavior now; I just think it's
a shame that the documented behavior was never implemented.  But at
this point we don't have much choice but to live with it.

By the by, might it be time to start thinking about deprecating
protocol version 2, at least for COPY?  It seems that supporting it
requires nontrivial contortions of both the front-end and back-end
code, some of which seem possibly relevant to performance.  Obviously
we're not going to do anything to 9.3 at this point, but maybe for the
next release...

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Failing start-up archive recovery at Standby mode in PG9.2.4

2013-04-26 Thread Mitsumasa KONDO

I explain more detail about this problem.

This problem was occurred by RestartPoint create illegal WAL file in during
archive recovery. But I cannot recognize why illegal WAL file was created
in CreateRestartPoint(). My attached patch is really plain…

In problem case at XLogFileReadAnyTLI(),  first check WAL file does not get
fd. Because it does not exists property WAL File in archive directory.

XLogFileReadAnyTLI()
> if (sources & XLOG_FROM_ARCHIVE)
> {
>   fd = XLogFileRead(log, seg, emode, tli, XLOG_FROM_ARCHIVE, true);
>if (fd != -1)
>{
>   elog(DEBUG1, "got WAL segment from archive");
>   return fd;
>}
> }

Next search WAL file in pg_xlog. There are illegal WAL File in pg_xlog. And
return illegal WAL File’s fd.

XLogFileReadAnyTLI()
>  if (sources & XLOG_FROM_PG_XLOG)
>  {
> fd = XLogFileRead(log, seg, emode, tli, XLOG_FROM_PG_XLOG, true);
> if (fd != -1)
>return fd;
>  }

Returned fd is be readFile value. Of cource readFile value is over 0. So
out of for-loop.

XLogPageRead
>  readFile = XLogFileReadAnyTLI(readId, readSeg, DEBUG2,
>  sources);
>   switched_segment = true;
>   if (readFile >= 0)
>  break;

Next, problem function point. Illegal WAL file was read, and error.

XLogPageRead
>   if (lseek(readFile, (off_t) readOff, SEEK_SET) < 0)
>  {
>  ereport(emode_for_corrupt_record(emode, *RecPtr),
>(errcode_for_file_access(),
>   errmsg("could not seek in log file %u, segment %u to offset %u: %m",
>readId, readSeg, readOff)));
>  goto next_record_is_invalid;
>   }
>   if (read(readFile, readBuf, XLOG_BLCKSZ) != XLOG_BLCKSZ)
>   {
>  ereport(emode_for_corrupt_record(emode, *RecPtr),
>(errcode_for_file_access(),
>   errmsg("could not read from log file %u, segment %u, offset %u: %m",
>readId, readSeg, readOff)));
>  goto next_record_is_invalid;
>   }
>   if (!ValidXLOGHeader((XLogPageHeader) readBuf, emode, false))
>  goto next_record_is_invalid;


I think that horiguchi's discovery point is after this point.
We must fix that CreateRestartPoint() does not create illegal WAL File.

Best regards,

--
Mitsumasa KONDO

Re: [HACKERS] Recovery target 'immediate'

2013-04-26 Thread Magnus Hagander

On Apr 26, 2013 4:38 PM, "Robert Haas"  wrote:
>
> On Fri, Apr 26, 2013 at 10:05 AM, Simon Riggs 
wrote:
> > Restore points are definitely the way to go here, this is what they
> > were created for. Stopping at a labelled location has a defined
> > meaning for the user, which is much better than just "stop anywhere
> > convenient", which I found so frightening.
> >
> > It should be straightforward to create a restore point with the same
> > name as used in pg_start_backup('text');
> >
> > pg_basebackup backups would need to use a unique key, which is harder
> > to achieve. If we write a WAL record at backup start that would make
> > the starting LSN unique, so we could then use that for the restore
> > point name for that backup.
> >
> > If people want anything else they can request an additional restore
> > point at the end of the backup.
>
> I personally find this to be considerably more error-prone than
> Heikki's suggestion.  On the occasions when I have had the dubious
> pleasure of trying to do PITR recovery, it's quite easy to supply a
> recovery target that never actually gets matched - and then you
> accidentally recover all the way to the end of WAL.  This is not fun.
> Having a bulletproof way to say "recover until you reach consistency
> and then stop" is a much nicer API.  I don't think "stop as soon as
> possible" is at all the same thing as "stop anywhere convenient".
>

Thinking some more about it, this could also be useful together with
pausing at the recovery target to get a quick look at the state of things
before recovering further. I assume that would work as well, since it would
be "a recovery target like the others"..

/Magnus

Re: [HACKERS] libpq COPY handling

2013-04-26 Thread Tom Lane

Robert Haas  writes:
> On Thu, Apr 25, 2013 at 1:56 PM, Tom Lane  wrote:
>> However, the documentation in libpq.sgml is a bit bogus too, because it
>> counsels trying the PQputCopyEnd() call again, which will not work
>> (since we already changed the asyncStatus).  We could make that say "a
>> zero result is informational, you might want to try PQflush() later".
>> The trouble with this, though, is that any existing callers that were
>> coded to the old spec would now be broken.

> Changing the meaning of a 0 return code seems like a bad idea.
> However, not ever returning 0 isn't great either: someone could be
> forgiven for writing code that calls PQputCopyData/End() until they
> get a 0 result, then waits for the socket to become write-OK before
> continuing.

Anybody who tried that would have already discovered that it doesn't
work, so that argument seems pretty hollow.

What I'm suggesting is that we fix the documentation to match what
the code actually does, ie 1 and -1 are the only return codes, but
in nonblock mode it may not actually have flushed all the data.
I do not see how that can break any code that works now.

An alternative possibility is to document the zero return case as
"can't happen now, but defined for possible future expansion", which
I rather suspect was the thought process last time this was looked at.
The trouble with that is that, if we ever did try to actually return
zero, we'd more than likely break some code that should have been
checking for the case and wasn't.

Anyway, in view of the lack of complaints from the field, I think
changing the behavior of this code would be much more likely to cause
problems than fix them.

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Recovery target 'immediate'

2013-04-26 Thread Robert Haas

On Fri, Apr 26, 2013 at 10:05 AM, Simon Riggs  wrote:
> Restore points are definitely the way to go here, this is what they
> were created for. Stopping at a labelled location has a defined
> meaning for the user, which is much better than just "stop anywhere
> convenient", which I found so frightening.
>
> It should be straightforward to create a restore point with the same
> name as used in pg_start_backup('text');
>
> pg_basebackup backups would need to use a unique key, which is harder
> to achieve. If we write a WAL record at backup start that would make
> the starting LSN unique, so we could then use that for the restore
> point name for that backup.
>
> If people want anything else they can request an additional restore
> point at the end of the backup.

I personally find this to be considerably more error-prone than
Heikki's suggestion.  On the occasions when I have had the dubious
pleasure of trying to do PITR recovery, it's quite easy to supply a
recovery target that never actually gets matched - and then you
accidentally recover all the way to the end of WAL.  This is not fun.
Having a bulletproof way to say "recover until you reach consistency
and then stop" is a much nicer API.  I don't think "stop as soon as
possible" is at all the same thing as "stop anywhere convenient".

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] libpq COPY handling

2013-04-26 Thread Robert Haas

On Thu, Apr 25, 2013 at 1:56 PM, Tom Lane  wrote:
> However, the documentation in libpq.sgml is a bit bogus too, because it
> counsels trying the PQputCopyEnd() call again, which will not work
> (since we already changed the asyncStatus).  We could make that say "a
> zero result is informational, you might want to try PQflush() later".
> The trouble with this, though, is that any existing callers that were
> coded to the old spec would now be broken.
>
> It might be better to consider that the code is correct and fix the
> documentation.  I notice that the other places in fe-exec.c that call
> pqFlush() generally say "In nonblock mode, don't complain if we're unable
> to send it all", which is pretty much the spirit of what this is doing
> though it lacks that comment.

Changing the meaning of a 0 return code seems like a bad idea.
However, not ever returning 0 isn't great either: someone could be
forgiven for writing code that calls PQputCopyData/End() until they
get a 0 result, then waits for the socket to become write-OK before
continuing.  They might be surprised to find that this results in
libpq's internal buffer expanding until malloc fails.

I'm gathering that what I should probably do is something like this.
Send some data using PQputCopyData() or an end-of-copy using
PQputCopyEnd().  Then, call PQflush().  If the latter returns 1, then
wait for the socket to become write-OK and try again.  Once it returns
0, send the next hunk of copy-data.  This is a bit of an exasperating
interface because when the socket buffer fills up, PQputCopyData()
will do a write() but we won't know what happened; we'll have to call
PQflush() and try a second write of the same data without an
intervening wait to find out that the buffer is full.  If this worked
as currently documented, that wouldn't be necessary.  But it seems
tolerable.

>> Also, I noticed that there are a few places in fe-protocol3.c that
>> seem not to know about COPY-BOTH mode.  I'm not sure that any of these
>> are actually bugs right now given the current very limited use of
>> COPY-BOTH mode, but I'm wondering whether it wouldn't be better to
>> minimize the chance of future surprises.  Patch attached.
>
> +1 for these changes, anyway.

OK, committed, thanks.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Functional dependencies and GROUP BY - for subqueries

2013-04-26 Thread Tom Lane

Ashutosh Bapat  writes:
> The reason being, it doesn't look into the subqueries (in FROM clause) to
> infer that p.product_id is essentially product.product_id which is a
> primary key.

Right.

> Attached find a crude patch to infer the same by traversing subqueries.

I think this is probably a bad idea.  We could spend an infinite amount
of code this way, with ever-increasing runtime cost and ever-decreasing
usefulness, and where would we stop?  I'm also a bit concerned about
allowing illegal queries due to bugs of omission in the
ever-more-complex checking code, which could be quite hard to find, and
when we did find them we'd be faced with a backwards compatibility break
if we fix them.

A larger point is that the patch as proposed doesn't fix the stated
problem, because it only descends into written-out subqueries.  It
would only succeed at looking into views if we applied it after
rewriting, rather than in the parser.  That's really not going to work.
It would be a complete disaster if the dependencies of a query that
references a view depend on the view's contents.

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Substituting Checksum Algorithm (was: Enabling Checksums)

2013-04-26 Thread Simon Riggs

On 26 April 2013 14:40, Tom Lane  wrote:
> Andres Freund  writes:
>>> On 2013-04-26 13:11:00 +0200, Florian Pflug wrote:
 The unresolved code issue that I know of is moving the compiler flags 
 behind a configure check. I would greatly appreciate it if you could take 
 a look at that. My config-fu is weak and it would take me some time to 
 figure out how to do that.
>
>>> Do we necessarily have to do that before beta? If not, let's concentrate on 
>>> getting the basic path in, and let's add the gcc-specific compiler options 
>>> later.
>
>> If we want them we should do it before beta, otherwise we won't notice
>> problems that the flags cause (misoptimizations, problems on compiler
>> versions, ...). So either now or in 9.4.
>
> Yeah, beta1 is the point in the cycle where we have the most leverage
> for discovering portability problems.  We should not be leaving anything
> involving configure tests as "to fix later".

I'm expecting to spend some time on this over the weekend, once I've
re-read the thread and patches to see if there is something to commit.

That's my last time window, so this looks like the last chance to make
changes before beta.

--
 Simon Riggs   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Recovery target 'immediate'

2013-04-26 Thread Simon Riggs

On 26 April 2013 14:48, Tom Lane  wrote:
> Magnus Hagander  writes:
>> That said, maybe the easier choice for a *system* (such as v-thingy)
>> would be to simply to the full backup using pg_basebackup -x (or
>> similar), therefor not needing the log archive at all when restoring.
>> Yes, it makes the base backup slightly larger, but also much
>> simpler... As a bonus, your base backup would still work if you hosed
>> your log archive.
>
> It doesn't appear to me that that resolves Heikki's complaint: if you
> recover from such a backup, the state that you get is still rather vague
> no?  The system will replay to the end of whichever WAL file it last
> copied.
>
> I think it'd be a great idea to ensure that pg_stop_backup creates a
> well defined restore stop point that corresponds to some instant during
> the execution of pg_stop_backup.  Obviously, if other sessions are
> changing the database state meanwhile, it's impossible to pin it down
> more precisely than that; but I think this would satisfy the principle
> of least astonishment, and it's not clear that what we have now does.

Restore points are definitely the way to go here, this is what they
were created for. Stopping at a labelled location has a defined
meaning for the user, which is much better than just "stop anywhere
convenient", which I found so frightening.

It should be straightforward to create a restore point with the same
name as used in pg_start_backup('text');

pg_basebackup backups would need to use a unique key, which is harder
to achieve. If we write a WAL record at backup start that would make
the starting LSN unique, so we could then use that for the restore
point name for that backup.

If people want anything else they can request an additional restore
point at the end of the backup.

--
 Simon Riggs   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Functional dependencies and GROUP BY - for subqueries

2013-04-26 Thread Ashutosh Bapat

Hi All,
If group by clause has primary key, the targetlist may have columns which
are not part of the aggregate and not part of group by clause. The relevant
commit is e49ae8d3bc588294d07ce1a1272b31718cfca5ef and relevant mail thread
has subject Functional dependencies and GROUP BY.

As a result, for following set of commands, the last SELECT statement does
not throw error.
CREATE TEMP TABLE products (product_id int, name text, price numeric);
CREATE TEMP TABLE sales (product_id int, units int);
ALTER TABLE products ADD PRIMARY KEY (product_id);
SELECT product_id, p.name, (sum(s.units) * p.price) AS sales FROM products
p LEFT JOIN sales s USING (product_id) GROUP BY product_id;

But, if I rewrite the query using views as follows
create view sel_product as SELECT p.product_id, p.name, p.price FROM
products p;
create view sel_sales as SELECT s.units, s.product_id FROM ONLY sales s;
SELECT p.product_id, p.name, (sum(s.units) * p.price) FROM sel_product p
LEFT JOIN sel_sales s using(product_id) GROUP BY p.product_id;

The last SELECT statement gives error
ERROR:  column "p.name" must appear in the GROUP BY clause or be used in an
aggregate function
LINE 1: SELECT p.product_id, p.name, (sum(s.units) * p.price) FROM s...

The reason being, it doesn't look into the subqueries (in FROM clause) to
infer that p.product_id is essentially product.product_id which is a
primary key.

Attached find a crude patch to infer the same by traversing subqueries.

As I said the patch is crude and needs a better shape. If community is
willing to accept the extension, I can work on it further.

-- 
Best Wishes,
Ashutosh Bapat
EntepriseDB Corporation
The Postgres Database Company


gb_subquery_pk.patch
Description: Binary data

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Recovery target 'immediate'

2013-04-26 Thread Tom Lane

Magnus Hagander  writes:
> That said, maybe the easier choice for a *system* (such as v-thingy)
> would be to simply to the full backup using pg_basebackup -x (or
> similar), therefor not needing the log archive at all when restoring.
> Yes, it makes the base backup slightly larger, but also much
> simpler... As a bonus, your base backup would still work if you hosed
> your log archive.

It doesn't appear to me that that resolves Heikki's complaint: if you
recover from such a backup, the state that you get is still rather vague
no?  The system will replay to the end of whichever WAL file it last
copied.

I think it'd be a great idea to ensure that pg_stop_backup creates a
well defined restore stop point that corresponds to some instant during
the execution of pg_stop_backup.  Obviously, if other sessions are
changing the database state meanwhile, it's impossible to pin it down
more precisely than that; but I think this would satisfy the principle
of least astonishment, and it's not clear that what we have now does.

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Substituting Checksum Algorithm (was: Enabling Checksums)

2013-04-26 Thread Tom Lane

Andres Freund  writes:
>> On 2013-04-26 13:11:00 +0200, Florian Pflug wrote:
>>> The unresolved code issue that I know of is moving the compiler flags 
>>> behind a configure check. I would greatly appreciate it if you could take a 
>>> look at that. My config-fu is weak and it would take me some time to figure 
>>> out how to do that.

>> Do we necessarily have to do that before beta? If not, let's concentrate on 
>> getting the basic path in, and let's add the gcc-specific compiler options 
>> later.

> If we want them we should do it before beta, otherwise we won't notice
> problems that the flags cause (misoptimizations, problems on compiler
> versions, ...). So either now or in 9.4.

Yeah, beta1 is the point in the cycle where we have the most leverage
for discovering portability problems.  We should not be leaving anything
involving configure tests as "to fix later".

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Recovery target 'immediate'

2013-04-26 Thread Heikki Linnakangas


On 26.04.2013 14:54, Magnus Hagander wrote:

On Fri, Apr 26, 2013 at 1:47 PM, Simon Riggs  wrote:

On 26 April 2013 11:29, Heikki Linnakangas  wrote:


But there is also an operation to simply "restore a backup".


Just because a tool supports an imprecise definition of a restore,
doesn't mean Postgres should encourage and support that.

"Restore a backup" is more suited to filesystems where most files
don't change much. And its also a common user complaint: "I restored
my back but now I've lost my changes. Can you help?". That's not
something that's been heard around here because we don't encourage
foot-guns.


I think it makes perfect sense to have this. Since we do guarantee it
to still be consistent even if things *are* changing around. The lack
of an easy way to do this is probably the most common reason I've seen
for people using pg_dump instead of physical backups in the past.
pg_basebackup fixed it for the backup side of things, with the -x
option. This appears to be a suggestion to do that kind of restore
even if you have a log archive style backups.

That said, maybe the easier choice for a *system* (such as v-thingy)
would be to simply to the full backup using pg_basebackup -x (or
similar), therefor not needing the log archive at all when restoring.


Even if you have all the required WAL files included in the backup, 
you'll still want to use a restore_command that can restore timeline 
history files from the archive (I found this out the hard way). 
Otherwise Postgres won't see the existing timeline history files, and 
can choose a timeline ID that's already in use. That will cause 
confusion after recovery when files generated on the new timeline start 
to be archived; they will clash with files from the "other" timeline 
with the same TLI. You can work around that by with a restore_command 
that returns false for regular WAL files, but restores timeline history 
files normally. But that's inconvenient again; it's not trivial to 
formulate such a restore_command.


Also, pg_basebackup is a lot less efficient than working straight with 
the filesystem. It's a very convenient stand-alone backup tool, but if 
you're writing a backup handling system, you'll want to use something 
more efficient. (Data Director uses disk snapshots, as it happens)


- Heikki


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] pg_controldata gobbledygook

2013-04-26 Thread Robert Haas

On Fri, Apr 26, 2013 at 5:08 AM, Andres Freund  wrote:
> I have to admit I don't see the point. None of those values is particularly
> interesting to anybody without implementation level knowledge and those
> will likely deal with them just fine. And I find the version with the
> shorter names far quicker to read.
> The clarity win here doesn't seem to be worth the price of potentially
> breaking some tools.

+1.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Recovery target 'immediate'

2013-04-26 Thread Simon Riggs

On 26 April 2013 12:54, Magnus Hagander  wrote:

> That said, maybe the easier choice for a *system* (such as v-thingy)
> would be to simply to the full backup using pg_basebackup -x (or
> similar), therefor not needing the log archive at all when restoring.
> Yes, it makes the base backup slightly larger, but also much
> simpler... As a bonus, your base backup would still work if you hosed
> your log archive.

Good point. My comments also apply there.

I think we should put a clear health warning on that to explain what
you get and don't get.

--
 Simon Riggs   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Recovery target 'immediate'

2013-04-26 Thread Magnus Hagander

On Fri, Apr 26, 2013 at 1:47 PM, Simon Riggs  wrote:
> On 26 April 2013 11:29, Heikki Linnakangas  wrote:
>
>> But there is also an operation to simply "restore a backup".
>
> Just because a tool supports an imprecise definition of a restore,
> doesn't mean Postgres should encourage and support that.
>
> "Restore a backup" is more suited to filesystems where most files
> don't change much. And its also a common user complaint: "I restored
> my back but now I've lost my changes. Can you help?". That's not
> something that's been heard around here because we don't encourage
> foot-guns.

I think it makes perfect sense to have this. Since we do guarantee it
to still be consistent even if things *are* changing around. The lack
of an easy way to do this is probably the most common reason I've seen
for people using pg_dump instead of physical backups in the past.
pg_basebackup fixed it for the backup side of things, with the -x
option. This appears to be a suggestion to do that kind of restore
even if you have a log archive style backups.

That said, maybe the easier choice for a *system* (such as v-thingy)
would be to simply to the full backup using pg_basebackup -x (or
similar), therefor not needing the log archive at all when restoring.
Yes, it makes the base backup slightly larger, but also much
simpler... As a bonus, your base backup would still work if you hosed
your log archive.

-- 
 Magnus Hagander
 Me: http://www.hagander.net/
 Work: http://www.redpill-linpro.com/

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Recovery target 'immediate'

2013-04-26 Thread Simon Riggs

On 26 April 2013 11:29, Heikki Linnakangas  wrote:

> But there is also an operation to simply "restore a backup".

Just because a tool supports an imprecise definition of a restore,
doesn't mean Postgres should encourage and support that.

"Restore a backup" is more suited to filesystems where most files
don't change much. And its also a common user complaint: "I restored
my back but now I've lost my changes. Can you help?". That's not
something that's been heard around here because we don't encourage
foot-guns.

> One solution is to create restore point after the backup ends. Then you have
> a clearly defined point in time you can restore to. But it would be
> convenient to not have to do that. Or another way to think of this is that
> it would be convenient if there was an implicit restore point at the end of
> each backup.

If we were going to solve that problem, that would be the way to do it.

But then we could also solve other similar problems. Like queries that
run for a long time. We could just have them end after a specific time
rather than run to completion and give a correct answer. We could skip
joins that look difficult as well. After all "Run Query" wasn't a very
precise definition of what the user wanted, so what's wrong with a
taking a more relaxed attutude to query execution? They will
appreciate the performance gain, after all.

Precision and doing the safe thing are what people trust us to do.

I recognise this as a common request from users, I just don't think we
should add an option to Postgres to support this when imprecise
recovery is already supported by external means for those that take
the conscious decision to do things that way.

--
 Simon Riggs   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] pg_controldata gobbledygook

2013-04-26 Thread Bernd Helmle




--On 25. April 2013 23:19:14 -0400 Tom Lane  wrote:


I think I've heard of scripts grepping the output of pg_controldata for
this that or the other.  Any rewording of the labels would break that.
While I'm not opposed to improving the labels, I would vote against your
second, abbreviated scheme because it would make things ambiguous for
simple grep-based scripts.


I had exactly this kind of discussion just a few days ago with a customer, 
who wants to use the output in their scripts and was a little worried about 
the compatibility between major versions.


I don't think we do guarantuee any output format compatibility between 
corresponding symbols in major versions explicitly, but given that 
pg_controldata seems to have a broad use case here, we should maybe 
document it somewhere wether to discourage or encourage people to rely on 
it?


--
Thanks

Bernd


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Substituting Checksum Algorithm (was: Enabling Checksums)

2013-04-26 Thread Andres Freund

On 2013-04-26 13:11:00 +0200, Florian Pflug wrote:
> > The unresolved code issue that I know of is moving the compiler flags 
> > behind a configure check. I would greatly appreciate it if you could take a 
> > look at that. My config-fu is weak and it would take me some time to figure 
> > out how to do that.
> 
> Do we necessarily have to do that before beta? If not, let's concentrate on 
> getting the basic path in, and let's add the gcc-specific compiler options 
> later.

If we want them we should do it before beta, otherwise we won't notice
problems that the flags cause (misoptimizations, problems on compiler
versions, ...). So either now or in 9.4.

Greetings,

Andres Freund

-- 
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Substituting Checksum Algorithm (was: Enabling Checksums)

2013-04-26 Thread Florian Pflug

On Apr26, 2013, at 10:28 , Ants Aasma  wrote:
> On Apr 25, 2013 10:38 PM, "Jeff Davis"  wrote:
> >
> > On Tue, 2013-04-23 at 11:44 +0300, Ants Aasma wrote:
> > > I will try to reword.
> >
> > Did you have a chance to clarify this, as well as some of the other
> > documentation issues Simon mentioned here?
> >
> > http://www.postgresql.org/message-id/CA+U5nMKVEu8UDXQe
> > +Nk=d7nqm4ypfszaef0esak4j31lyqc...@mail.gmail.com
> 
> Not yet. I am busy at the moment due to events in private life. I might be 
> able to find some time over the weekend to go over the documentation but no 
> guarantees.

I can try to write up the reasoning behind the choice of FNV1+SHIFT3 as a 
checksum function, but I'm quite busy too so I'm not 100% certain I'll get to 
it. If that's OK with you Ants, that is.

> > I'm not sure if others are waiting on me for a new patch or not. I can
> > give the documentation issues a try, but I was hesitant to do so because
> > you've done the research.
> >
> > The problems that I can correct are fairly trivial.
> 
> The unresolved code issue that I know of is moving the compiler flags behind 
> a configure check. I would greatly appreciate it if you could take a look at 
> that. My config-fu is weak and it would take me some time to figure out how 
> to do that.

Do we necessarily have to do that before beta? If not, let's concentrate on 
getting the basic path in, and let's add the gcc-specific compiler options 
later.

best regards,
Florian Pflug



-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Recovery target 'immediate'

2013-04-26 Thread Heikki Linnakangas


On 26.04.2013 12:16, Simon Riggs wrote:

On 18 April 2013 19:11, Heikki Linnakangas  wrote:


I just found out that if you use continuous archiving and online backups,
it's surprisingly difficult to restore a backup, without replaying any more
WAL than necessary.


I didn't add it myself because I don't see the need, if we think more carefully.

Why would you want your recovery end time to be governed solely by the
time that the *backup* ended? How can that have any bearing on what
you want at recovery time? If you have access to more WAL data, why
would you not apply them as well - unless you have some specific
reason not to - i.e. an incorrect xid or known problem time?

If you're storing only a few of the WAL files with the backup then it
will end naturally without assistance when the last file runs out.
What is the difference between stopping at an exact point in WAL half
way through a file and ending at the end of the file? If the end point
is arbitrary, why the need to specify it so closely?

I can't see a time when I have access to more WAL files *and* I want
to stop early at some imprecise point. But you could write a
restore_command script that stopped after a specific file forcing
recovery to end.


Well, I ran into this with VMware's Data Director, which manages backups 
among other things. In a typical setup, you have a WAL archive, and 
every now and then (daily, typically) a full backup is taken. Full 
backups are retained for some time, like a few weeks or months. The user 
can also manually request a full backup to be taken at any time.


There is an option to perform PITR. The system figures out the latest 
full backup that precedes the chosen point-in-time, sets 
recovery_target_time, and starts up Postgres. But there is also an 
operation to simply "restore a backup". The idea of that is to, well, 
restore to the chosen backup, and nothing more. In most cases, it 
probably wouldn't hurt if a one or two extra WAL files are replayed 
beyond the backup end time, but you certainly don't want to replay all 
the history. Yes, you could set recovery_target_time to the point where 
the backup ended, but that's complicated. You'd have to read the 
end-of-backup timestamp from the backup history file. And because 
timestamps are always a bit fuzzy, I think you'd have to add at least a 
few seconds to that to be sure.


To illustrate why it would be bad to replay more WAL than necessary, 
imagine that the user is about to perform some dangerous action he might 
want to undo later. For example, he's about to purge old data that isn't 
needed anymore, so with "DELETE FROM data WHERE year <= '2010'". The 
first thing he does is to take a backup with label 
"before-purging-2010". Immediately after the backup has finished, he 
performs the deletion. Now, the application stops working because it 
actually still needs the data, so he restores from the backup. If 
recovery decides to replay a few more WAL files after the end-of-backup, 
that could include the deletion, and that's no good.


One solution is to create restore point after the backup ends. Then you 
have a clearly defined point in time you can restore to. But it would be 
convenient to not have to do that. Or another way to think of this is 
that it would be convenient if there was an implicit restore point at 
the end of each backup.


- Heikki


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Recovery target 'immediate'

2013-04-26 Thread Simon Riggs

On 18 April 2013 19:11, Heikki Linnakangas  wrote:

> I just found out that if you use continuous archiving and online backups,
> it's surprisingly difficult to restore a backup, without replaying any more
> WAL than necessary.

I didn't add it myself because I don't see the need, if we think more carefully.

Why would you want your recovery end time to be governed solely by the
time that the *backup* ended? How can that have any bearing on what
you want at recovery time? If you have access to more WAL data, why
would you not apply them as well - unless you have some specific
reason not to - i.e. an incorrect xid or known problem time?

If you're storing only a few of the WAL files with the backup then it
will end naturally without assistance when the last file runs out.
What is the difference between stopping at an exact point in WAL half
way through a file and ending at the end of the file? If the end point
is arbitrary, why the need to specify it so closely?

I can't see a time when I have access to more WAL files *and* I want
to stop early at some imprecise point. But you could write a
restore_command script that stopped after a specific file forcing
recovery to end.

I don't think we should add a feature that encourages the belief that
it makes sense (because its approved by the developers) to stop
recovery at an arbitrary point, deliberately discarding user data.
That just encourages sysadmins to not communicate with
business/management about the exact details of a recovery.

So -1, given it doesn't seem to make sense anyway, but if it did there
are already 2 ways of stopping at an arbitrary point.

--
 Simon Riggs   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] pg_controldata gobbledygook

2013-04-26 Thread Andres Freund

On 2013-04-25 23:07:02 -0400, Peter Eisentraut wrote:
> I'm not sure who is supposed to be able to read this sort of stuff:
> 
> Latest checkpoint's NextXID:  0/7575
> Latest checkpoint's NextOID:  49152
> Latest checkpoint's NextMultiXactId:  7
> Latest checkpoint's NextMultiOffset:  13
> Latest checkpoint's oldestXID:1265
> Latest checkpoint's oldestXID's DB:   1
> Latest checkpoint's oldestActiveXID:  0
> Latest checkpoint's oldestMultiXid:   1
> Latest checkpoint's oldestMulti's DB: 1
> 
> Note that these symbols don't even correspond to the actual symbols used
> in the source code in some cases.
> 
> The comments in the pg_control.h header file use much more pleasant
> terms, which when put to use would lead to output similar to this:
> 
> Latest checkpoint's next free transaction ID: 0/7575
> Latest checkpoint's next free OID:49152
> Latest checkpoint's next free MultiXactId:7
> Latest checkpoint's next free MultiXact offset:   13
> Latest checkpoint's cluster-wide minimum datfrozenxid:1265
> Latest checkpoint's database with cluster-wide minimum datfrozenxid:  1
> Latest checkpoint's oldest transaction ID still running:  0
> Latest checkpoint's cluster-wide minimum datminmxid:  1
> Latest checkpoint's database with cluster-wide minimum datminmxid:  1
> 
> One could even rearrange the layout a little bit like this:
> 
> Control data as of latest checkpoint:
> next free transaction ID: 0/7575
> next free OID:49152

I have to admit I don't see the point. None of those values is particularly
interesting to anybody without implementation level knowledge and those
will likely deal with them just fine. And I find the version with the
shorter names far quicker to read.
The clarity win here doesn't seem to be worth the price of potentially
breaking some tools.

Greetings,

Andres Freund

-- 
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] pg_controldata gobbledygook

2013-04-26 Thread Gavin Flower


On 26/04/13 18:53, Daniel Farina wrote:

On Thu, Apr 25, 2013 at 9:34 PM, Tom Lane  wrote:

Alvaro Herrera  writes:

Tom Lane wrote:

I think I've heard of scripts grepping the output of pg_controldata for
this that or the other.  Any rewording of the labels would break that.
While I'm not opposed to improving the labels, I would vote against your
second, abbreviated scheme because it would make things ambiguous for
simple grep-based scripts.

We could provide two alternative outputs, one for human consumption with
the proposed format and something else that uses, say, shell assignment
syntax.  (I did propose this years ago and I might have an unfinished
patch still lingering about somewhere.)

And a script would use that how?  "pg_controldata --machine-friendly"
would fail outright on older versions.  I think it's okay to ask script
writers to write
 pg_controldata | grep -e 'old label|new label'
but not okay to ask them to deal with anything as complicated as trying
a switch to see if it works or not.

 From what I'm reading, it seems like the main benefit of the changes
is to make things easier for humans to skim over.  Automated programs
that care about precise meanings of each field are awkwardly but
otherwise well-served by the precise output as rendered right now.

What about doing something similar but different from the
--machine-readable proposal, such as adding an option for the
*human*-readable variant that is guaranteed to mercilessly change as
human-readers/-hackers sees fit on whim?  It's a bit of a kludge that
this is not the default, but would prevent having to serve two quite
different masters with the same output.

Although I'm not seriously proposing explicitly "-h" (as seen in some
GNU programs in rendering byte sizes and the like...yet could be
confused for 'help'), something like that may serve as prior art.


 I think the current way should remain the default, as Daniel suggests 
- but a '--human-readable' (or suitable abbreviation) flag could be added.


Such as in the command to list directory details, using the 'ls' command 
in Linux...



(Below, *Y* = 1024 * 1024 * 1024 * 1024 * 1024 * 1024 * 1024 * 1024 
bytes  = 2^80 bytes.)


*man ls**
**[...]**
**  -h, --human-readable**
**  with -l, print sizes in human readable format (e.g., 1K 
234M 2G)**

**[...]**
**   SIZE  may  be (or may be an integer optionally followed by) one 
of fol-**
**   lowing: KB 1000, K 1024, MB 1000*1000, M 1024*1024, and so on 
for G, T,**

**   P, E, Z, Y.**
**[...]*


Cheers,
Gavin

Re: [HACKERS] Failing start-up archive recovery at Standby mode in PG9.2.4

2013-04-26 Thread KONDO Mitsumasa


Hi,

I discavered the problem cause. I think taht horiguchi's discovery is another 
problem...
Problem has CreateRestartPoint. In recovery mode, PG should not WAL record.
Because PG does not know latest WAL file location.
But in this problem case, PG(standby) write WAL file at RestartPoint in archive 
recovery.
In recovery mode, I think that RestartPoint can write only MinRecoveryPoint.

Here is Standby's pg_xlog directory in problem caused.

[mitsu-ko@localhost postgresql-9.2.4-c]$ ls Standby/pg_xlog/
00020003  00020007  0002000B  
0003.history
00020004  00020008  0002000C  
0003000E
00020005  00020009  0002000D  
0003000F
00020006  0002000A  0002000E  
archive_status


This problem case is here.

[Standby] 2013-04-26 04:26:44 EDT DEBUG:  0: attempting to remove WAL 
segments older than log file 00030002
[Standby] 2013-04-26 04:26:44 EDT LOCATION:  RemoveOldXlogFiles, xlog.c:3568
[Standby] 2013-04-26 04:26:44 EDT DEBUG:  0: recycled transaction log file 
"00010002"
[Standby] 2013-04-26 04:26:44 EDT LOCATION:  RemoveOldXlogFiles, xlog.c:3607
[Standby] 2013-04-26 04:26:44 EDT DEBUG:  0: recycled transaction log file 
"00020002"
[Standby] 2013-04-26 04:26:44 EDT LOCATION:  RemoveOldXlogFiles, xlog.c:3607
[Standby] 2013-04-26 04:26:44 EDT LOG:  0: restartpoint complete: wrote 9 
buffers (0.2%); 0 transaction log file(s) added, 0 removed, 2 recycled; 
write=0.601 s, sync=1.178 s, total=1.781 s; sync files=3, longest=1.176 s, 
average=0.392 s
[Standby] 2013-04-26 04:26:44 EDT LOCATION:  LogCheckpointEnd, xlog.c:7893
[Standby] 2013-04-26 04:26:44 EDT LOG:  0: recovery restart point at 
0/90FE448
[Standby] 2013-04-26 04:26:44 EDT DETAIL:  last completed transaction was at 
log time 2013-04-26 04:25:53.203725-04
[Standby] 2013-04-26 04:26:44 EDT LOCATION:  CreateRestartPoint, xlog.c:8601
[Standby] 2013-04-26 04:26:44 EDT LOG:  0: restartpoint starting: xlog
[Standby] 2013-04-26 04:26:44 EDT LOCATION:  LogCheckpointStart, xlog.c:7821
cp: cannot stat `../arc/0003000F': そのようなファイルやディレクトリはありません
[Standby] 2013-04-26 04:26:44 EDT DEBUG:  0: could not restore file 
"0003000F" from archive: return code 256
[Standby] 2013-04-26 04:26:44 EDT LOCATION:  RestoreArchivedFile, xlog.c:3323
[Standby] 2013-04-26 04:26:44 EDT LOG:  0: unexpected pageaddr 0/200 in 
log file 0, segment 15, offset 0
[Standby] 2013-04-26 04:26:44 EDT LOCATION:  ValidXLOGHeader, xlog.c:4395
cp: cannot stat `../arc/0003000F': そのようなファイルやディレクトリはありません
[Standby] 2013-04-26 04:26:44 EDT DEBUG:  0: could not restore file 
"0003000F" from archive: return code 256


In recovery, pg normary search WAL file at archive recovery.
If propery WAL file is nothing(archive command is failed), next search pg_xlog 
directory.
Normary, propety WAL file is nothing in pg_xlog.
But this case has propety name's WAL file(But it's mistaken and illegal) in 
pg_xlog.
So recovery is failed and broken Standby.

So I fix CreateRestartPoint at branching point of executing MinRecoveryPoint.
It seems to fix this problem, but attached patch is plain.


Best Regard,
--
NTT Open Source Software Center
Mitsumasa KONDO
diff --git a/src/backend/access/transam/xlog.c b/src/backend/access/transam/xlog.c
index 5452ae1..ae4bcd8 100644
--- a/src/backend/access/transam/xlog.c
+++ b/src/backend/access/transam/xlog.c
@@ -7371,7 +7371,8 @@ CreateRestartPoint(int flags)
 * restartpoint. It's assumed that flushing the buffers will do that as a
 * side-effect.
 */
-   if (XLogRecPtrIsInvalid(lastCheckPointRecPtr) ||
+   if (( ControlFile->state == DB_IN_ARCHIVE_RECOVERY && RecoveryInProgress()) ||
+   XLogRecPtrIsInvalid(lastCheckPointRecPtr) ||
lastCheckPoint.redo <= ControlFile->checkPointCopy.redo)
{
ereport(DEBUG2,

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Substituting Checksum Algorithm (was: Enabling Checksums)

2013-04-26 Thread Ants Aasma

On Apr 25, 2013 10:38 PM, "Jeff Davis"  wrote:
>
> On Tue, 2013-04-23 at 11:44 +0300, Ants Aasma wrote:
> > I will try to reword.
>
> Did you have a chance to clarify this, as well as some of the other
> documentation issues Simon mentioned here?
>
> http://www.postgresql.org/message-id/CA+U5nMKVEu8UDXQe
> +Nk=d7nqm4ypfszaef0esak4j31lyqc...@mail.gmail.com

Not yet. I am busy at the moment due to events in private life. I might be
able to find some time over the weekend to go over the documentation but no
guarantees.

> I'm not sure if others are waiting on me for a new patch or not. I can
> give the documentation issues a try, but I was hesitant to do so because
> you've done the research.
>
> The problems that I can correct are fairly trivial.

The unresolved code issue that I know of is moving the compiler flags
behind a configure check. I would greatly appreciate it if you could take a
look at that. My config-fu is weak and it would take me some time to figure
out how to do that.

Regards,
Ants Aasma

Re: [HACKERS] Failing start-up archive recovery at Standby mode in PG9.2.4

2013-04-26 Thread Heikki Linnakangas


On 26.04.2013 07:47, Amit Langote wrote:

  How would code after applying this patch behave if a recycled segment gets
renamed using the newest timeline (say 3) while we are still recovering from
a lower timeline (say 2)? In that case, since XLogFileReadAnyTLI returns
that recycled segment as the next segment to recover from, we get the error.
And since XLogFileReadAnyTLI iterates over expectedTLIs (whose head seems to
be recoveryTargetTLI at all times, is that right?), it will return that
wrong (recycled segment) in the first iteration itself.


As long as the right segment is present in the archive, that's OK. Even 
if a recycled segment with higher TLI is in pg_xlog, with the patch 
we'll still read the segment with lower TLI from the archive. But there 
is a corner-case where a recycled segment with a higher TLI masks a 
segment with lower TLI in pg_xlog. For example, if you try to recover by 
copying all the required WAL files directly into pg_xlog, without using 
restore_command, you could run into problems.


So yeah, I think you're right and we need to rethink the recycling. The 
first question is, do we have to recycle WAL segments during recovery at 
all? It's pointless when we're restoring from archive with 
restore_command; the recycled files will just get replaced with files 
from the archive. It does help when walreceiver is active, but I wonder 
how significant it is in practice.


I guess the safest, smallest change is to use a lower TLI when 
installing the recycled files. So, instead of using the current recovery 
target timeline, use the ID of the timeline we're currently recovering. 
That way the reycycled segments will never have a higher TLI than other 
files that recovery will try to replay. See attached patch.


- Heikki
diff --git a/src/backend/access/transam/xlog.c b/src/backend/access/transam/xlog.c
index 30d877b..cfd8a34 100644
--- a/src/backend/access/transam/xlog.c
+++ b/src/backend/access/transam/xlog.c
@@ -8459,10 +8459,15 @@ CreateRestartPoint(int flags)
 		PrevLogSeg(_logId, _logSeg);
 
 		/*
-		 * Update ThisTimeLineID to the recovery target timeline, so that
-		 * we install any recycled segments on the correct timeline.
+		 * Update ThisTimeLineID to the timeline we're currently recovering,
+		 * so that we install any recycled segments on the correct timeline.
+		 * (This might be higher than the TLI of the restartpoint we just
+		 * made, if a timeline switch was replayed while we were performing
+		 * the restartpoint.)
 		 */
-		ThisTimeLineID = GetRecoveryTargetTLI();
+		SpinLockAcquire(&xlogctl->info_lck);
+		ThisTimeLineID = XLogCtl->lastCheckPoint.ThisTimeLineID;
+		SpinLockRelease(&xlogctl->info_lck);
 
 		RemoveOldXlogFiles(_logId, _logSeg, endptr);
 

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Failing start-up archive recovery at Standby mode in PG9.2.4

2013-04-26 Thread Amit Langote

What would happen if a recycled segment gets assigned a newer timeline than
the one we are currently recovering from? In the reported erroneous
behavior, that is what happens causing XLogFileReadAnyTLI() to return such
bogus segment causing the error. Since, expectedTLIs contains a newer
timeline at its head, and if such a wrongly recycled (that is one bearing a
newer timeline than curFileTLI) segment exists in pg_xlog,
XLogFileReadAnyTLI() would return fd of that segment itself causing the
error in question. In next retry, the same thing would happen and prevent
standby startup from proceeding any further.

So, should we also look at the segment recycling code to see how it names
the recycled segments (what timelineID does it use) ? For example, in
CreateRestartPoint() where segments are recycled? 



--
View this message in context: 
http://postgresql.1045698.n5.nabble.com/Failing-start-up-archive-recovery-at-Standby-mode-in-PG9-2-4-tp5753110p5753358.html
Sent from the PostgreSQL - hackers mailing list archive at Nabble.com.


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] event trigger API documentation?

2013-04-26 Thread Dimitri Fontaine

Peter Eisentraut  writes:
> It seems pretty straightforward and useful, so I'm not sure where your
> hesitation is coming from.

If you're talking about my hesitation to consider what we have now as
marketing material worthy, it comes from the fact that I don't have a
use case where I don't need specific information about the SQL objects
involved in the command, or the Normalized Command String.

> Based in this, I could add some documentation in the coming weeks.

I intend to be working on this $soon too.

> I don't think we have time to add support for this to the in-tree PLs.
> But I was thinking we should at least check all PLs out there that they
> don't crash if they are presented with an event trigger function,
> because if you code a PL like this:
>
> if (CALLED_AS_TRIGGER(fcinfo) {
> // it's a trigger
> }
> else {
> // it's a normal call
> }
>
> there might be some trouble.

As it happens I forgot about that part. Yes the API did change in a way
that's not visible at compile time, and IMV that's a bug we need to be
fixing.

The simple way to have at it seems to involve adding a test branch for
event triggers and reporting an ERROR ERRCODE_FEATURE_NOT_SUPPORTED, the
other way to fix that is to actually inclue the support for event
trigger.

As you did the work for PLsh you know how much (little) is involved. If
you want to consider a patch against one of those for adding even
trigger support (in order to have more data to decice), let me know,
I'll prepare that.

Regards,
-- 
Dimitri Fontaine06 63 07 10 78
http://2ndQuadrant.fr PostgreSQL : Expertise, Formation et Support

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Failing start-up archive recovery at Standby mode in PG9.2.4

2013-04-26 Thread Heikki Linnakangas


On 26.04.2013 07:02, Kyotaro HORIGUCHI wrote:

I am uncertain a bit weather it is  necessary to move curFileTLI to
anywhere randomly read . On a short glance, the random access occurs also
for reading checkpoint-related records.


I didn't understand that.


Also I don't have clear distinction between lastSegmentTLI and curFileTLI
after the patch applied.


In short, lastSegmentTLI is the TLI in first page of the last opened 
segment, while curFileTLI is the TLI in the filename of the last opened 
segment. Usually they're the same, but in a segment that contains a 
timeline switch, they differ. For example, if you perform PITR to a 
point in the middle of segment 00010062, you end up with 
two segments:


00010062
00020062

The first half of those files are identical, but the latter contains a 
time-line changing checkpoint record in the middle, and from that point 
on the contents are different. When we open file 
00020062, lastSegmentTLI is 1, because the first half of 
that segment contains WAL from timeline 1, but curFileTLI is 2, because 
that's the TLI in the filename.


- Heikki


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

59 matches

Mail list logo