Re: [HACKERS] 9.1.2 ?
On Wed, Nov 9, 2011 at 6:22 PM, Florian Pflug wrote:
> On Nov 9, 2011, at 23:53 , Daniel Farina wrote:
> > I think a novice user would be scared half to death: I know I was the
> > first time. That's not a great impression for the project to leave
> > for what is not, at its root, a vast defect, and the fact it's
> > occurring for people when they use rsync rather than my very sensitive
> > backup routines is indication that it's not very corner-ey.
>
> Just to emphasize the non-cornerish-ness of this problem, it should be
> mentioned that the HS issue was observed even with backups taken with
> pg_basebackup, if memory serves correctly.

Yes, I personally can reliably reproduce both the clog+subtrans problems using pg_basebackup, and can confirm that the "oldestActiveXid_fixed.v2.patch" does resolve both issues.
Re: [HACKERS] Hot Backup with rsync fails at pg_clog if under load
okay, sorry I'm a little confused then. Should I be able to apply both the v2 patch as well as the v3 patch? or is it expected that I'd have to manually do the merge?

On Wed, Nov 2, 2011 at 1:34 AM, Simon Riggs wrote:
> On Wed, Nov 2, 2011 at 2:40 AM, Chris Redekop wrote:
>
> > looks like the v3 patch re-introduces the pg_subtrans issue...
>
> No, I just separated the patches to be clearer about the individual
> changes.
>
> --
> Simon Riggs http://www.2ndQuadrant.com/
> PostgreSQL Development, 24x7 Support, Training & Services
Re: [HACKERS] Hot Standby startup with overflowed snapshots
oops...reply-to-all

-- Forwarded message --
From: Chris Redekop
Date: Wed, Nov 2, 2011 at 8:41 AM
Subject: Re: [HACKERS] Hot Standby startup with overflowed snapshots
To: Simon Riggs

Sure, I've got quite a few logs lying around - I've attached 3 of 'em...let me know if there are any specific things you'd like me to do or look for next time it happens.

On Wed, Nov 2, 2011 at 2:59 AM, Simon Riggs wrote:
> On Fri, Oct 28, 2011 at 3:42 AM, Chris Redekop wrote:
>
> > On a side note I am sporadically seeing another error on hotstandby startup.
> > I'm not terribly concerned about it as it is pretty rare and it will work
> > on a retry so it's not a big deal. The error is "FATAL: out-of-order XID
> > insertion in KnownAssignedXids". If you think it might be a bug and are
> > interested in hunting it down let me know and I'll help any way I can...but
> > if you're not too worried about it then neither am I :)
>
> I'd be interested to see further details of this if you see it again,
> or have access to previous logs.
>
> --
> Simon Riggs http://www.2ndQuadrant.com/
> PostgreSQL Development, 24x7 Support, Training & Services

postgresql-2011-10-27_202007.log Description: Binary data
postgresql-2011-10-31_152925.log Description: Binary data
postgresql-2011-11-01_094501.log Description: Binary data

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Hot Backup with rsync fails at pg_clog if under load
looks like the v3 patch re-introduces the pg_subtrans issue...

On Tue, Nov 1, 2011 at 9:33 AM, Simon Riggs wrote:
> On Thu, Oct 27, 2011 at 4:25 PM, Simon Riggs wrote:
>
> > StartupMultiXact() didn't need changing, I thought, but I will review further.
>
> Good suggestion.
>
> On review, StartupMultiXact() could also suffer similar error to the
> clog failure. This was caused *because* MultiXact is not maintained by
> recovery, which I had thought meant it was protected from such
> failure.
>
> Revised patch attached.
>
> --
> Simon Riggs http://www.2ndQuadrant.com/
> PostgreSQL Development, 24x7 Support, Training & Services
Re: [HACKERS] Hot Standby startup with overflowed snapshots
Sorry..."designed" was a poor choice of words, I meant "not unexpected". Doing the checkpoint right after pg_stop_backup() looks like it will work perfectly for me, so thanks for all your help!

On a side note I am sporadically seeing another error on hotstandby startup. I'm not terribly concerned about it as it is pretty rare and it will work on a retry so it's not a big deal. The error is "FATAL: out-of-order XID insertion in KnownAssignedXids". If you think it might be a bug and are interested in hunting it down let me know and I'll help any way I can...but if you're not too worried about it then neither am I :)

On Thu, Oct 27, 2011 at 4:55 PM, Simon Riggs wrote:
> On Thu, Oct 27, 2011 at 10:09 PM, Chris Redekop wrote:
>
> > hrmz, still basically the same behaviour. I think it might be a *little*
> > better with this patch. Before when under load it would start up quickly
> > maybe 2 or 3 times out of 10 attempts...with this patch it might be up to 4
> > or 5 times out of 10...ish...or maybe it was just fluke *shrug*. I'm still
> > only seeing your log statement a single time (I'm running at debug2). I
> > have discovered something though - when the standby is in this state if I
> > force a checkpoint on the primary then the standby comes right up. Is there
> > anything I can check or try for you to help figure this out?...or is it
> > actually as designed that it could take 10-ish minutes to start up even
> > after all clients have disconnected from the primary?
>
> Thanks for testing. The improvements cover specific cases, so it's not
> subject to chance; it's not a performance patch.
>
> It's not "designed" to act the way you describe, but it does.
>
> The reason this occurs is that you have a transaction-heavy workload
> with occasional periods of complete quiet and a base backup time that
> is much less than checkpoint_timeout. If your base backup was slower
> the checkpoint would have hit naturally before recovery had reached a
> consistent state. Which seems fairly atypical. I guess you're doing
> this on a test system.
>
> It seems cheap to add in a call to LogStandbySnapshot() after each
> call to pg_stop_backup().
>
> Does anyone think this case is worth adding code for? Seems like one
> more thing to break.
>
> --
> Simon Riggs http://www.2ndQuadrant.com/
> PostgreSQL Development, 24x7 Support, Training & Services
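The workaround Chris settled on (forcing a checkpoint on the primary immediately after pg_stop_backup()) could be scripted roughly as follows. This is a hedged sketch, not the actual script from this thread: the host, connection options, and paths are placeholders, and `psql`/`rsync` are routed through variables only so the command sequence is easy to see.

```shell
# Sketch of a base-backup driver with the extra CHECKPOINT discussed above.
# The checkpoint after pg_stop_backup() logs a fresh running-xacts snapshot,
# so a hot standby restored from this backup can reach a consistent state
# without waiting out the primary's checkpoint_timeout.
# All host names and paths below are illustrative placeholders.
PSQL=${PSQL:-"psql -h primary -U postgres"}
SRC=${SRC:-"primary:/var/lib/pgsql/data"}
DST=${DST:-"/var/lib/pgsql/standby"}

take_base_backup() {
    $PSQL -c "SELECT pg_start_backup('base', true);" &&   # fast (immediate-checkpoint) start
    rsync -a --delete --exclude=pg_xlog "$SRC/" "$DST/"
    rc=$?
    $PSQL -c "SELECT pg_stop_backup();"
    $PSQL -c "CHECKPOINT;"    # the workaround: checkpoint right after stop
    return $rc
}
```

This mirrors what Simon suggested doing inside the server with LogStandbySnapshot() after pg_stop_backup(), just done by hand from the backup script.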
Re: [HACKERS] Hot Standby startup with overflowed snapshots
hrmz, still basically the same behaviour. I think it might be a *little* better with this patch. Before when under load it would start up quickly maybe 2 or 3 times out of 10 attempts...with this patch it might be up to 4 or 5 times out of 10...ish...or maybe it was just fluke *shrug*. I'm still only seeing your log statement a single time (I'm running at debug2). I have discovered something though - when the standby is in this state if I force a checkpoint on the primary then the standby comes right up. Is there anything I can check or try for you to help figure this out?...or is it actually as designed that it could take 10-ish minutes to start up even after all clients have disconnected from the primary?

On Thu, Oct 27, 2011 at 11:27 AM, Simon Riggs wrote:
> On Thu, Oct 27, 2011 at 5:26 PM, Chris Redekop wrote:
>
> > Thanks for the patch Simon, but unfortunately it does not resolve the issue
> > I am seeing. The standby still refuses to finish starting up until long
> > after all clients have disconnected from the primary (>10 minutes). I do
> > see your new log statement on startup, but only once - it does not repeat.
> > Is there any way for me to see what the oldest xid on the standby is via
> > controldata or something like that? The standby does stream to keep up with
> > the primary while the primary has load, and then it becomes idle when the
> > primary becomes idle (when I kill all the connections)...so it appears to
> > be current...but it just doesn't finish starting up.
> > I'm not sure if it's relevant, but after it has sat idle for a couple
> > minutes I start seeing these statements in the log (with the same offset
> > every time):
> > DEBUG: skipping restartpoint, already performed at 9/9520
>
> OK, so it looks like there are 2 opportunities to improve, not just one.
>
> Try this.
>
> --
> Simon Riggs http://www.2ndQuadrant.com/
> PostgreSQL Development, 24x7 Support, Training & Services
Re: [HACKERS] Hot Standby startup with overflowed snapshots
Thanks for the patch Simon, but unfortunately it does not resolve the issue I am seeing. The standby still refuses to finish starting up until long after all clients have disconnected from the primary (>10 minutes). I do see your new log statement on startup, but only once - it does not repeat. Is there any way for me to see what the oldest xid on the standby is via controldata or something like that? The standby does stream to keep up with the primary while the primary has load, and then it becomes idle when the primary becomes idle (when I kill all the connections)...so it appears to be current...but it just doesn't finish starting up.

I'm not sure if it's relevant, but after it has sat idle for a couple minutes I start seeing these statements in the log (with the same offset every time):

DEBUG: skipping restartpoint, already performed at 9/9520

On Thu, Oct 27, 2011 at 7:26 AM, Simon Riggs wrote:
> Chris Redekop's recent report of slow startup for Hot Standby has made
> me revisit the code there.
>
> Although there isn't a bug, there is a missed opportunity for starting
> up faster which could be the source of Chris' annoyance.
>
> The following patch allows a faster startup in some circumstances.
>
> The patch also alters the log levels for messages and gives a single
> simple message for this situation. The log will now say
>
> LOG: recovery snapshot waiting for non-overflowed snapshot or until
> oldest active xid on standby is at least %u (now %u)
> ...multiple times until snapshot non-overflowed or xid reached...
>
> whereas before the first LOG message shown was
>
> LOG: consistent state delayed because recovery snapshot incomplete
>
> and only later, at DEBUG2, do you see
>
> LOG: recovery snapshot waiting for %u oldest active xid on standby is %u
> ...multiple times until xid reached...
>
> Comments please.
>
> --
> Simon Riggs http://www.2ndQuadrant.com/
> PostgreSQL Development, 24x7 Support, Training & Services
Re: [HACKERS] Hot Backup with rsync fails at pg_clog if under load
FYI I have given this patch a good test and can now no longer reproduce either the subtrans or the clog error. Thanks guys!

On Wed, Oct 26, 2011 at 11:09 AM, Simon Riggs wrote:
> On Wed, Oct 26, 2011 at 5:16 PM, Simon Riggs wrote:
> > On Wed, Oct 26, 2011 at 5:08 PM, Simon Riggs wrote:
> >
> >> Brewing a patch now.
> >
> > Latest thinking... confirmations or other error reports please.
> >
> > This fixes both the subtrans and clog bugs in one patch.
>
> I'll be looking to commit that tomorrow afternoon as two separate
> patches with appropriate credits.
>
> --
> Simon Riggs http://www.2ndQuadrant.com/
> PostgreSQL Development, 24x7 Support, Training & Services
Re: [HACKERS] Hot Backup with rsync fails at pg_clog if under load
> And I think they also reported that if they didn't run hot standby,
> but just normal recovery into a new master, it didn't have the problem
> either, i.e. without hotstandby, recovery ran, properly extended the
> clog, and then ran as a new master fine.

Yes, this is correct...attempting to start as hotstandby will produce the pg_clog error repeatedly, and then without changing anything else, just turning hot standby off, it will start up successfully.

> This fits the OP's observation of the
> problem vanishing when pg_start_backup() does an immediate checkpoint.

Note that this is *not* the behaviour I'm seeing...it's possible it happens more frequently without the immediate checkpoint, but I am seeing it happen even with the immediate checkpoint.

> This is a different problem and has already been reported by one of
> your colleagues in a separate thread, and answered in detail by me
> there. There is no bug related to this error message.

Excellent...I will continue this discussion in that thread.
Re: [HACKERS] Hot Backup with rsync fails at pg_clog if under load
> That isn't a Hot Standby problem, a recovery problem, nor is it certain
> it's a PostgreSQL problem.

Do you have any theories on this that I could help investigate? It happens even when using pg_basebackup, and it persists until another sync is performed, so the files must be in some state that it can't recover from...without understanding the internals, just viewing from an outside perspective, I don't really see how this could not be a PostgreSQL problem.
Re: [HACKERS] Hot Backup with rsync fails at pg_clog if under load
> Chris, can you rearrange the backup so you copy the pg_control file as
> the first act after the pg_start_backup?

I tried this and it doesn't seem to make any difference. I also tried the patch and I can no longer reproduce the subtrans error, however instead it now starts up but never gets to the point where it'll accept connections. It starts up, but if I try to do anything I always get "FATAL: the database system is starting up"...even if the load is removed from the primary, the standby still never finishes "starting up". Attached below is a log of one of these startup attempts. In my testing with the patch applied approx 3 in 10 attempts start up successfully, 7 in 10 attempts go into the "db is starting up" state...the pg_clog error is still there, but seems much harder to reproduce now...I've seen it only once since applying the patch (out of probably 50 or 60 under-load startup attempts). It does seem to be "moody" like that tho...it will be very difficult to reproduce for a while, and then it will happen damn-near every time for a while...weirdness.

On a bit of a side note, I've been thinking of changing my scripts so that they perform an initial rsync prior to doing the startbackup-rsync-stopbackup just so that the second rsync will be faster...so that the backup is in progress for a shorter period of time, as while it is running it will stop other standbys from starting up...this shouldn't cause any issues eh?
2011-10-25 13:43:24.035 MDT [15072]: [1-1] LOG: database system was interrupted; last known up at 2011-10-25 13:43:11 MDT
2011-10-25 13:43:24.035 MDT [15072]: [2-1] LOG: creating missing WAL directory "pg_xlog/archive_status"
2011-10-25 13:43:24.037 MDT [15072]: [3-1] LOG: entering standby mode
DEBUG: received replication command: IDENTIFY_SYSTEM
DEBUG: received replication command: START_REPLICATION 2/CF00
2011-10-25 13:43:24.041 MDT [15073]: [1-1] LOG: streaming replication successfully connected to primary
2011-10-25 13:43:24.177 MDT [15092]: [1-1] FATAL: the database system is starting up
2011-10-25 13:43:24.781 MDT [15072]: [4-1] DEBUG: checkpoint record is at 2/CF81A478
2011-10-25 13:43:24.781 MDT [15072]: [5-1] DEBUG: redo record is at 2/CF20; shutdown FALSE
2011-10-25 13:43:24.781 MDT [15072]: [6-1] DEBUG: next transaction ID: 0/4634700; next OID: 1188228
2011-10-25 13:43:24.781 MDT [15072]: [7-1] DEBUG: next MultiXactId: 839; next MultiXactOffset: 1686
2011-10-25 13:43:24.781 MDT [15072]: [8-1] DEBUG: oldest unfrozen transaction ID: 1669, in database 1
2011-10-25 13:43:24.781 MDT [15072]: [9-1] DEBUG: transaction ID wrap limit is 2147485316, limited by database with OID 1
2011-10-25 13:43:24.783 MDT [15072]: [10-1] DEBUG: resetting unlogged relations: cleanup 1 init 0
2011-10-25 13:43:24.791 MDT [15072]: [11-1] DEBUG: initializing for hot standby
2011-10-25 13:43:24.791 MDT [15072]: [12-1] LOG: consistent recovery state reached at 2/CF81A4D0
2011-10-25 13:43:24.791 MDT [15072]: [13-1] LOG: redo starts at 2/CF20
2011-10-25 13:43:25.019 MDT [15072]: [14-1] LOG: consistent state delayed because recovery snapshot incomplete
2011-10-25 13:43:25.019 MDT [15072]: [15-1] CONTEXT: xlog redo running xacts: nextXid 4634700 latestCompletedXid 4634698 oldestRunningXid 4634336; 130 xacts: 4634336 4634337 4634338 4634339 4634340 4634341 4634342 4634343 4634344 4634345 4634346 4634347 4634348 4634349 4634350 4634351 4634352 4634353 4634354 4634355 4634356 4634357 4634358
4634359 4634360 4634361 4634362 4634363 4634364 4634365 4634366 4634367 4634368 4634369 4634370 4634371 4634515 4634516 4634517 4634518 4634519 4634520 4634521 4634522 4634523 4634524 4634525 4634526 4634527 4634528 4634529 4634530 4634531 4634532 4634533 4634534 4634535 4634536 4634537 4634538 4634539 4634540 4634541 4634542 4634543 4634385 4634386 4634387 4634388 4634389 4634390 4634391 4634392 4634393 4634394 4634395 4634396 4634397 4634398 4634399 4634400 4634401 4634402 4634403 4634404 4634405 4634406 4634407 4634408 4634409 4634410 4634411 4634412 4634413 4634414 4634415 4634416 4634417 4634418 4634419 4634420 4634579 4634580 4634581 4634582 4634583 4634584 4634585 4634586 4634587 4634588 4634589 4634590 4634591 4634592 4634593 4634594 4634595 4634596 4634597 4634598 4634599 4634600 4634601 4634602 4634603 4634604 4634605 4634606 4634607; subxid ovf
2011-10-25 13:43:25.240 MDT [15130]: [1-1] FATAL: the database system is starting up
DEBUG: standby "sync_rep_test" has now caught up with primary
2011-10-25 13:43:26.304 MDT [15167]: [1-1] FATAL: the database system is starting up
2011-10-25 13:43:27.366 MDT [15204]: [1-1] FATAL: the database system is starting up
2011-10-25 13:43:28.426 MDT [15241]: [1-1] FATAL: the database system is starting up
2011-10-25 13:43:29.461 MDT [15275]: [1-1] FATAL: the database system is starting up
and so on...

On Tue, Oct 25, 2011 at 6:51 AM, Simon Riggs wrote:
> On Tue, Oct 25, 2011 at 12:39 PM, Florian Pflug wrote:
> > What I do
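The two-pass rsync idea floated earlier in this message (an initial sync before pg_start_backup() so the in-backup sync only has to ship a delta, keeping the backup window short) might look something like this. A sketch only, with assumed hosts, paths, and connection options, not the poster's actual scripts:

```shell
# Illustrative two-pass sync: pass 1 runs outside the backup window and
# moves the bulk of the data; pass 2 runs between pg_start_backup() and
# pg_stop_backup() and only transfers what changed since pass 1, so other
# standbys are blocked from starting for a much shorter time.
PSQL=${PSQL:-"psql -h primary -U postgres"}
SRC=${SRC:-"primary:/var/lib/pgsql/data"}
DST=${DST:-"/var/lib/pgsql/standby"}

two_pass_clone() {
    rsync -a --delete --exclude=pg_xlog "$SRC/" "$DST/"    # pass 1: bulk copy, no backup mode
    $PSQL -c "SELECT pg_start_backup('clone', true);"
    rsync -a --delete --exclude=pg_xlog "$SRC/" "$DST/"    # pass 2: short delta inside the window
    $PSQL -c "SELECT pg_stop_backup();"
}
```

Only the files copied during pass 2 are covered by the backup-mode guarantees, which is fine here because pass 2 re-syncs everything that changed.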
Re: [HACKERS] Hot Backup with rsync fails at pg_clog if under load
Well, on the other hand maybe there is something wrong with the data. Here's the test/steps I just did:

1. I do the pg_basebackup when the master is under load; hot slave now will not start up but warm slave will.
2. I start a warm slave and let it catch up to current.
3. On the slave I change 'hot_standby=on' and do a 'service postgresql restart'.
4. The postgres fails to restart with the same error.
5. I turn hot_standby back off and postgres starts back up fine as a warm slave.
6. I then turn off the load; the slave is all caught up, master and slave are both sitting idle.
7. I, again, change 'hot_standby=on' and do a service restart.
8. Again it fails, with the same error, even though there is no longer any load.
9. I repeat this warmstart/hotstart cycle a couple more times until, to my surprise, instead of failing, it successfully starts up as a hot standby (this is after maybe 5 minutes or so of sitting idle).

So...given that it continued to fail even after the load had been turned off, that makes me believe that the data which was copied over was invalid in some way. And when a checkpoint/logrotation/somethingelse occurred when not under load it cleared itself up...I'm shooting in the dark here. Anyone have any suggestions/ideas/things to try?

On Mon, Oct 17, 2011 at 2:13 PM, Chris Redekop wrote:
> I can confirm that both the pg_clog and pg_subtrans errors do occur when
> using pg_basebackup instead of rsync. The data itself seems to be fine
> because using the exact same data I can start up a warm standby no problem,
> it is just the hot standby that will not start up.
>
> On Sat, Oct 15, 2011 at 7:33 PM, Chris Redekop wrote:
>> > > Linas, could you capture the output of pg_controldata *and* increase the
>> > > log level to DEBUG1 on the standby? We should then see nextXid value of
>> > > the checkpoint the recovery is starting from.
>> >
>> > I'll try to do that whenever I'm in that territory again... Incidentally,
>> > recently there was a lot of unrelated-to-this-post work to polish things up
>> > for a talk being given at PGWest 2011 Today :)
>> >
>> > > I also checked what rsync does when a file vanishes after rsync computed the
>> > > file list, but before it is sent. rsync 3.0.7 on OSX, at least, complains
>> > > loudly, and doesn't sync the file. It BTW also exits non-zero, with a special
>> > > exit code for precisely that failure case.
>> >
>> > To be precise, my script has logic to accept the exit code 24, just as
>> > stated in PG manual:
>> >
>> > Docs> For example, some versions of rsync return a separate exit code for
>> > Docs> "vanished source files", and you can write a driver script to accept
>> > Docs> this exit code as a non-error case.
>>
>> I also am running into this issue and can reproduce it very reliably. For
>> me, however, it happens even when doing the "fast backup" like so:
>> pg_start_backup('whatever', true)...my traffic is more write-heavy than
>> linas's tho, so that might have something to do with it. Yesterday it
>> reliably errored out on pg_clog every time, but today it is
>> failing sporadically on pg_subtrans (which seems to be past where the
>> pg_clog error was)...the only thing that has changed is that I've changed
>> the log level to debug1...I wouldn't think that could be related though.
>> I've linked the requested pg_controldata and debug1 logs for both errors.
>> Both links contain the output from pg_start_backup, rsync, pg_stop_backup,
>> pg_controldata, and then the postgres debug1 log produced from a subsequent
>> startup attempt.
>>
>> pg_clog: http://pastebin.com/mTfdcjwH
>> pg_subtrans: http://pastebin.com/qAXEHAQt
>>
>> Any workarounds would be very appreciated...would copying clog+subtrans
>> before or after the rest of the data directory (or something like that) make
>> any difference?
>>
>> Thanks!
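The warm-to-hot toggle in steps 3-8 above amounts to flipping hot_standby in postgresql.conf and restarting. A minimal sketch of that toggle, with an assumed config path and service name (on this vintage of PostgreSQL the setting only takes effect on a full restart):

```shell
# Sketch of the warm/hot toggle used in the reproduction steps above.
# PGCONF and the service name are assumptions, not values from this thread.
PGCONF=${PGCONF:-/var/lib/pgsql/data/postgresql.conf}

set_hot_standby() {    # usage: set_hot_standby on|off
    # Rewrite the hot_standby line, whether or not it is currently commented out.
    sed -i "s/^[# ]*hot_standby[ ]*=.*/hot_standby = ${1}/" "$PGCONF"
}

# set_hot_standby on  && service postgresql restart   # steps 3 and 7: try hot standby
# set_hot_standby off && service postgresql restart   # step 5: fall back to warm standby
```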
Re: [HACKERS] Hot Backup with rsync fails at pg_clog if under load
I can confirm that both the pg_clog and pg_subtrans errors do occur when using pg_basebackup instead of rsync. The data itself seems to be fine because using the exact same data I can start up a warm standby no problem, it is just the hot standby that will not start up.

On Sat, Oct 15, 2011 at 7:33 PM, Chris Redekop wrote:
> > > Linas, could you capture the output of pg_controldata *and* increase the
> > > log level to DEBUG1 on the standby? We should then see nextXid value of
> > > the checkpoint the recovery is starting from.
> >
> > I'll try to do that whenever I'm in that territory again... Incidentally,
> > recently there was a lot of unrelated-to-this-post work to polish things up
> > for a talk being given at PGWest 2011 Today :)
> >
> > > I also checked what rsync does when a file vanishes after rsync computed the
> > > file list, but before it is sent. rsync 3.0.7 on OSX, at least, complains
> > > loudly, and doesn't sync the file. It BTW also exits non-zero, with a special
> > > exit code for precisely that failure case.
> >
> > To be precise, my script has logic to accept the exit code 24, just as
> > stated in PG manual:
> >
> > Docs> For example, some versions of rsync return a separate exit code for
> > Docs> "vanished source files", and you can write a driver script to accept
> > Docs> this exit code as a non-error case.
>
> I also am running into this issue and can reproduce it very reliably. For
> me, however, it happens even when doing the "fast backup" like so:
> pg_start_backup('whatever', true)...my traffic is more write-heavy than
> linas's tho, so that might have something to do with it. Yesterday it
> reliably errored out on pg_clog every time, but today it is
> failing sporadically on pg_subtrans (which seems to be past where the
> pg_clog error was)...the only thing that has changed is that I've changed
> the log level to debug1...I wouldn't think that could be related though.
> I've linked the requested pg_controldata and debug1 logs for both errors.
> Both links contain the output from pg_start_backup, rsync, pg_stop_backup,
> pg_controldata, and then the postgres debug1 log produced from a subsequent
> startup attempt.
>
> pg_clog: http://pastebin.com/mTfdcjwH
> pg_subtrans: http://pastebin.com/qAXEHAQt
>
> Any workarounds would be very appreciated...would copying clog+subtrans
> before or after the rest of the data directory (or something like that) make
> any difference?
>
> Thanks!
Re: [HACKERS] Hot Backup with rsync fails at pg_clog if under load
> > Linas, could you capture the output of pg_controldata *and* increase the
> > log level to DEBUG1 on the standby? We should then see nextXid value of
> > the checkpoint the recovery is starting from.
>
> I'll try to do that whenever I'm in that territory again... Incidentally,
> recently there was a lot of unrelated-to-this-post work to polish things up
> for a talk being given at PGWest 2011 Today :)
>
> > I also checked what rsync does when a file vanishes after rsync computed the
> > file list, but before it is sent. rsync 3.0.7 on OSX, at least, complains
> > loudly, and doesn't sync the file. It BTW also exits non-zero, with a special
> > exit code for precisely that failure case.
>
> To be precise, my script has logic to accept the exit code 24, just as
> stated in PG manual:
>
> Docs> For example, some versions of rsync return a separate exit code for
> Docs> "vanished source files", and you can write a driver script to accept
> Docs> this exit code as a non-error case.

I also am running into this issue and can reproduce it very reliably. For me, however, it happens even when doing the "fast backup" like so: pg_start_backup('whatever', true)...my traffic is more write-heavy than linas's tho, so that might have something to do with it. Yesterday it reliably errored out on pg_clog every time, but today it is failing sporadically on pg_subtrans (which seems to be past where the pg_clog error was)...the only thing that has changed is that I've changed the log level to debug1...I wouldn't think that could be related though. I've linked the requested pg_controldata and debug1 logs for both errors. Both links contain the output from pg_start_backup, rsync, pg_stop_backup, pg_controldata, and then the postgres debug1 log produced from a subsequent startup attempt.

pg_clog: http://pastebin.com/mTfdcjwH
pg_subtrans: http://pastebin.com/qAXEHAQt

Any workarounds would be very appreciated...would copying clog+subtrans before or after the rest of the data directory (or something like that) make any difference?

Thanks!
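The docs passage quoted above suggests writing a driver script that accepts rsync's exit code 24 ("some files vanished before they could be transferred") as a non-error, since files under a live data directory can legitimately disappear mid-copy. A minimal sketch of such a wrapper:

```shell
# Wrapper around rsync that treats exit code 24 ("vanished source files")
# as a non-error, per the PostgreSQL docs quoted above; any other non-zero
# exit code is still propagated so a genuinely failed copy aborts the backup.
rsync_backup() {
    rsync -a "$@"
    rc=$?
    if [ "$rc" -eq 24 ]; then
        echo "rsync: vanished source files; treating as non-fatal" >&2
        return 0
    fi
    return "$rc"
}
```

Used in place of a bare rsync inside the pg_start_backup()/pg_stop_backup() window, this keeps the backup script from failing spuriously while still catching real transfer errors.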
Re: [HACKERS] pg_last_xact_insert_timestamp
Thanks for all the feedback guys. Just to throw another monkey wrench in here - I've been playing with Simon's proposed solution of returning 0 when the WAL positions match, and I've come to the realization that even if using pg_last_xact_insert_timestamp, although it would help, we still wouldn't be able to get a 100% accurate "how far behind?" counter...not that this is a big deal, but I know my ops team is going to bitch to me about it :)...take this situation: there's a lull of 30 seconds where there are no transactions committed on the server...the slave is totally caught up, WAL positions match, I'm reporting 0, everything is happy. Then a transaction is committed on the master...before the slave gets it my query hits it and sees that we're 30 seconds behind (when in reality we're <1sec behind). Because of this effect my graph is a little spiky...I mean it's not a huge deal or anything - I can put some sanity checking in my number reporting ("if 1 second ago you were 0 seconds behind, you can't be more than 1 second behind now" sorta thing). But if we wanted to go for the super-ideal solution, there would need to be a way to get the timestamp of pg_stat_replication.replay_location+1 (the first transaction that the slave does not have).

On Thu, Sep 8, 2011 at 7:03 AM, Robert Haas wrote:
> On Thu, Sep 8, 2011 at 6:14 AM, Fujii Masao wrote:
> > OTOH, new function enables users to monitor the delay as a timestamp.
> > For users, a timestamp is obviously easier to handle than LSN, and the delay
> > as a timestamp is more intuitive. So, I think that it's worth adding
> > something like pg_last_xact_insert_timestamp into core for improvement
> > of user-friendliness.
>
> It seems very nice from a usability point of view, but I have to agree
> with Simon's concern about performance. Actually, as of today,
> WALInsertLock is such a gigantic bottleneck that I suspect the
> overhead of this additional bookkeeping would be completely
> unnoticeable. But I'm still reluctant to add more centralized
> spinlocks that everyone has to fight over, having recently put a lot
> of effort into getting rid of some of the ones we've traditionally
> had.
>
> --
> Robert Haas
> EnterpriseDB: http://www.enterprisedb.com
> The Enterprise PostgreSQL Company
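The "sanity checking" idea above (the reported lag can't grow faster than the wall-clock time between samples) is easy to express as a clamp on the monitoring side. This is purely reporting-script logic, nothing inside PostgreSQL:

```shell
# Clamp the raw lag reading so it can rise by at most the polling interval
# per sample.  This smooths the spike that appears when the first commit
# after a 30-second quiet period makes the slave look 30 seconds behind
# even though it is really <1s behind.
clamp_lag() {    # usage: clamp_lag RAW_LAG_SECS PREV_REPORTED_SECS POLL_INTERVAL_SECS
    raw=$1; prev=$2; interval=$3
    max=$((prev + interval))
    if [ "$raw" -gt "$max" ]; then
        echo "$max"     # cap the jump at prev + interval
    else
        echo "$raw"     # lag can drop arbitrarily fast
    fi
}
```

For the scenario described above, with a 1-second poll interval: after the 30-second lull, `clamp_lag 30 0 1` reports 1 rather than 30, so the graph shows a gentle ramp instead of a spike.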