Re: [HACKERS] 9.2.3 crashes during archive recovery

2013-03-07 Thread KONDO Mitsumasa

(2013/03/06 16:50), Heikki Linnakangas wrote:

Hi,

Horiguchi's patch does not seem to record minRecoveryPoint in ReadRecord();
the attached patch records minRecoveryPoint.
[crash recovery -> record minRecoveryPoint in control file -> archive
recovery]
I think that this is the original intention of Heikki's patch.


Yeah. That fix isn't right, though; XLogPageRead() is supposed to return true on success, 
and false on error, and the patch makes it return 'true' on error, if archive recovery 
was requested but we're still in crash recovery. The real issue here is that I missed the 
two return NULL;s in ReadRecord(), so the code that I put in the 
next_record_is_invalid codepath isn't run if XLogPageRead() doesn't find the file at all. 
Attached patch is the proper fix for this.


Thanks for creating the patch! I tested your patch on 9.2_STABLE, but it does not
work with the promote command...
When XLogPageRead() returns false, it means the end of the standby loop, the crash
recovery loop, and the archive recovery loop.
Your patch is not good for promoting a standby to master: the server never leaves
the standby loop.

So I made a new patch based on Heikki's and Horiguchi's patches.
I attach a test script modified from Horiguchi's script. This script does not
depend on the shell environment; only PGPATH needs to be adjusted.
Please run this test script.



I also found a bug in the latest 9.2_stable. It does not get the latest
timeline and recovery history file in archive recovery when the master
and standby timelines differ.


Works for me.. Can you create a test script for that? Remember to set 
recovery_target_timeline='latest'.

I set recovery_target_timeline=latest. hmm...

Here is my recovery.conf.

[mitsu-ko@localhost postgresql]$ cat Standby/recovery.conf
standby_mode = 'yes'
recovery_target_timeline='latest'
primary_conninfo='host=localhost port=65432'
restore_command='cp ../arc/%f %p'

And my system's log message is here.

waiting for server to start[Standby] LOG:  database system was shut down in 
recovery at 2013-03-07 02:56:05 EST
[Standby] LOG:  restored log file 0002.history from archive
cp: cannot stat `../arc/0003.history': No such file or directory
[Standby] FATAL:  requested timeline 2 is not a child of database system 
timeline 1
[Standby] LOG:  startup process (PID 20941) exited with exit code 1
[Standby] LOG:  aborting startup due to startup process failure

It can be reproduced with my test script, too.
The last master start command in my test script may look unusual,
but it is typical of a PostgreSQL-with-Pacemaker setup.


Best regards,
--
Mitsumasa KONDO
NTT OSS Center
diff --git a/src/backend/access/transam/xlog.c b/src/backend/access/transam/xlog.c
index 92adc4e..2486683 100644
--- a/src/backend/access/transam/xlog.c
+++ b/src/backend/access/transam/xlog.c
@@ -4010,7 +4010,15 @@ ReadRecord(XLogRecPtr *RecPtr, int emode, bool fetching_ckpt)
 retry:
 	/* Read the page containing the record */
 	if (!XLogPageRead(RecPtr, emode, fetching_ckpt, randAccess))
+	{
+		/*
+		 * If archive recovery was requested when crash recovery failed, go to
+		 * the label next_record_is_invalid to switch to archive recovery.
+		 */
+		if (!InArchiveRecovery && ArchiveRecoveryRequested)
+			goto next_record_is_invalid;
 		return NULL;
+	}
 
 	pageHeaderSize = XLogPageHeaderSize((XLogPageHeader) readBuf);
 	targetRecOff = RecPtr->xrecoff % XLOG_BLCKSZ;
@@ -4168,7 +4176,15 @@ retry:
 			}
 			/* Wait for the next page to become available */
 			if (!XLogPageRead(pagelsn, emode, false, false))
+			{
+				/*
+				 * If archive recovery was requested when crash recovery failed,
+				 * go to the label next_record_is_invalid to switch to archive
+				 * recovery.
+				 */
+				if (!InArchiveRecovery && ArchiveRecoveryRequested)
+					goto next_record_is_invalid;
 				return NULL;
+			}
 
 			/* Check that the continuation record looks valid */
 			if (!(((XLogPageHeader) readBuf)->xlp_info & XLP_FIRST_IS_CONTRECORD))
diff --git a/src/backend/access/transam/xlog.c b/src/backend/access/transam/xlog.c
index 92adc4e..591e8c0 100644
--- a/src/backend/access/transam/xlog.c
+++ b/src/backend/access/transam/xlog.c
@@ -4446,7 +4462,7 @@ readTimeLineHistory(TimeLineID targetTLI)
 	if (targetTLI == 1)
 		return list_make1_int((int) targetTLI);
 
-	if (InArchiveRecovery)
+	if (ArchiveRecoveryRequested)
 	{
 		TLHistoryFileName(histfname, targetTLI);
 		fromArchive =
#! /bin/sh
echo  initial settings 
PGPATH=`pwd`/bin
PATH=$PGPATH:$PATH
PGDATA0=Master
PGDATA1=Standby
PGARC=arc
mkdir $PGARC
PGPORT0=65432
PGPORT1=65433
unset PGPORT
unset PGDATA
echo "Postgresql is `which postgres`"
killall -9 postgres
rm -rf $PGDATA0 $PGDATA1 $PGARC/*
initdb $PGDATA0
cat >> $PGDATA0/postgresql.conf <<EOF
port=$PGPORT0
wal_level = hot_standby
checkpoint_segments = 300
checkpoint_timeout = 1h
archive_mode = on
archive_command = 'cp %p ../$PGARC/%f'
max_wal_senders = 3
hot_standby = on
log_min_messages = debug1
log_line_prefix = '[Master] '
EOF
cat  

Re: [HACKERS] 9.2.3 crashes during archive recovery

2013-03-07 Thread Heikki Linnakangas

On 07.03.2013 10:05, KONDO Mitsumasa wrote:

(2013/03/06 16:50), Heikki Linnakangas wrote:

Yeah. That fix isn't right, though; XLogPageRead() is supposed to
return true on success, and false on error, and the patch makes it
return 'true' on error, if archive recovery was requested but we're
still in crash recovery. The real issue here is that I missed the two
return NULL;s in ReadRecord(), so the code that I put in the
next_record_is_invalid codepath isn't run if XLogPageRead() doesn't
find the file at all. Attached patch is the proper fix for this.


Thanks for creating the patch! I tested your patch on 9.2_STABLE, but it does
not work with the promote command...
When XLogPageRead() returns false, it means the end of the standby loop,
the crash recovery loop, and the archive recovery loop.
Your patch is not good for promoting a standby to master: the server never
leaves the standby loop.

So I made a new patch based on Heikki's and Horiguchi's patches.


Ah, I see. I committed a slightly modified version of this.


I also found a bug in the latest 9.2_stable. It does not get the latest
timeline and recovery history file in archive recovery when the master
and standby timelines differ.


Works for me.. Can you create a test script for that? Remember to set
recovery_target_timeline='latest'.

...
It can be reproduced in my test script, too.


I see the problem now, with that script. So what happens is that the 
startup process first scans the timeline history files to choose the 
recovery target timeline. For that scan, I temporarily set 
InArchiveRecovery=true, in readRecoveryCommandFile. However, after 
readRecoveryCommandFile returns, we then try to read the timeline 
history file corresponding to the chosen recovery target timeline, but 
InArchiveRecovery is no longer set, so we don't fetch the file from 
archive, and return a dummy history, with just the target timeline in 
it. That doesn't contain the older timeline, so you get an error at 
recovery.


Fixed per your patch to check for ArchiveRecoveryRequested instead of 
InArchiveRecovery, when reading timeline history files. This also makes 
it unnecessary to temporarily set InArchiveRecovery=true, so removed that.
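The failure mode described above can be illustrated with a toy model (all names here are invented for illustration; the real logic lives in readTimeLineHistory()). Whether the full timeline ancestry is visible depends entirely on which flag gates the archive fetch:

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Toy stand-in for readTimeLineHistory(): pretend the archived history
 * file for timeline 2 says it branched off timeline 1.  The parameter
 * "may_read_archive" plays the role of the flag being tested
 * (InArchiveRecovery before the fix, ArchiveRecoveryRequested after).
 * Returns the number of TLIs written to "out", newest first.
 */
static int
read_timeline_history(int targetTLI, bool may_read_archive, int *out)
{
	if (may_read_archive && targetTLI == 2)
	{
		/* full ancestry restored from the archive: 2 <- 1 */
		out[0] = 2;
		out[1] = 1;
		return 2;
	}
	/* dummy history containing only the target timeline itself */
	out[0] = targetTLI;
	return 1;
}
```

Before the fix, the gating flag was effectively false by the time the history file was read, so recovery saw only the dummy one-entry history and failed with "requested timeline 2 is not a child of database system timeline 1"; ArchiveRecoveryRequested stays true whenever recovery.conf exists.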


Committed both fixes. Please confirm that this fixed the problem in your 
test environment. Many thanks for the testing and the patches!


- Heikki


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] 9.2.3 crashes during archive recovery

2013-03-07 Thread KONDO Mitsumasa

(2013/03/07 19:41), Heikki Linnakangas wrote:

On 07.03.2013 10:05, KONDO Mitsumasa wrote:

(2013/03/06 16:50), Heikki Linnakangas wrote:

Yeah. That fix isn't right, though; XLogPageRead() is supposed to
return true on success, and false on error, and the patch makes it
return 'true' on error, if archive recovery was requested but we're
still in crash recovery. The real issue here is that I missed the two
return NULL;s in ReadRecord(), so the code that I put in the
next_record_is_invalid codepath isn't run if XLogPageRead() doesn't
find the file at all. Attached patch is the proper fix for this.


Thanks for creating the patch! I tested your patch on 9.2_STABLE, but it does
not work with the promote command...
When XLogPageRead() returns false, it means the end of the standby loop,
the crash recovery loop, and the archive recovery loop.
Your patch is not good for promoting a standby to master: the server never
leaves the standby loop.

So I made a new patch based on Heikki's and Horiguchi's patches.


Ah, I see. I committed a slightly modified version of this.

I find your modification quite legible. Thanks for modifying and
committing the patch!
 

I also found a bug in the latest 9.2_stable. It does not get the latest
timeline and recovery history file in archive recovery when the master
and standby timelines differ.


Works for me.. Can you create a test script for that? Remember to set
recovery_target_timeline='latest'.

...
It can be reproduced in my test script, too.


I see the problem now, with that script. So what happens is that the startup process 
first scans the timeline history files to choose the recovery target timeline. For that 
scan, I temporarily set InArchiveRecovery=true, in readRecoveryCommandFile. However, 
after readRecoveryCommandFile returns, we then try to read the timeline history file 
corresponding to the chosen recovery target timeline, but InArchiveRecovery is no longer 
set, so we don't fetch the file from archive, and return a dummy history, 
with just the target timeline in it. That doesn't contain the older timeline, so you get 
an error at recovery.
Fixed per your patch to check for ArchiveRecoveryRequested instead of 
InArchiveRecovery, when reading timeline history files. This also makes it 
unnecessary to temporarily set InArchiveRecovery=true, so removed that.
Committed both fixes. Please confirm that this fixed the problem in your test 
environment. Many thanks for the testing and the patches!

I understand this problem now. Thank you for the additional fix and the
detailed explanation! I tested the latest REL9_2_STABLE on my system and
confirmed that it runs without problems. If I find another problem, I will
report it and send you a patch and test script!


Best regards,
--
Mitsumasa KONDO
NTT OSS Center




Re: [HACKERS] 9.2.3 crashes during archive recovery

2013-03-07 Thread Kyotaro HORIGUCHI
Everything seems to have been settled above my head while I was sleeping...

Sorry for the clumsy test script, and thank you for refining it, Mitsumasa.

And thank you for fixing the bug and the detailed explanation, Heikki.

I confirmed that the problem is fixed also for me at origin/REL9_2_STABLE.

 I understand this problem now. Thank you for the additional fix and
 detailed explanation! I tested the latest REL9_2_STABLE on my system and
 confirmed that it runs without problems. If I find another problem, I
 will report it and send you a patch and test script!

regards,

-- 
堀口恭太郎

日本電信電話株式会社 NTTオープンソフトウェアセンタ
Phone: 03-5860-5115 / Fax: 03-5463-5490




Re: [HACKERS] 9.2.3 crashes during archive recovery

2013-03-06 Thread Heikki Linnakangas

On 05.03.2013 14:09, KONDO Mitsumasa wrote:

Hi,

Horiguchi's patch does not seem to record minRecoveryPoint in ReadRecord();
the attached patch records minRecoveryPoint.
[crash recovery -> record minRecoveryPoint in control file -> archive
recovery]
I think that this is the original intention of Heikki's patch.


Yeah. That fix isn't right, though; XLogPageRead() is supposed to return 
true on success, and false on error, and the patch makes it return 
'true' on error, if archive recovery was requested but we're still in 
crash recovery. The real issue here is that I missed the two return 
NULL;s in ReadRecord(), so the code that I put in the 
next_record_is_invalid codepath isn't run if XLogPageRead() doesn't find 
the file at all. Attached patch is the proper fix for this.



I also found a bug in the latest 9.2_stable. It does not get the latest
timeline and recovery history file in archive recovery when the master
and standby timelines differ.


Works for me.. Can you create a test script for that? Remember to set 
recovery_target_timeline='latest'.


- Heikki
diff --git a/src/backend/access/transam/xlog.c b/src/backend/access/transam/xlog.c
index 92adc4e..74a54f6 100644
--- a/src/backend/access/transam/xlog.c
+++ b/src/backend/access/transam/xlog.c
@@ -4010,7 +4010,7 @@ ReadRecord(XLogRecPtr *RecPtr, int emode, bool fetching_ckpt)
 retry:
 	/* Read the page containing the record */
 	if (!XLogPageRead(RecPtr, emode, fetching_ckpt, randAccess))
-		return NULL;
+		goto next_record_is_invalid;
 
 	pageHeaderSize = XLogPageHeaderSize((XLogPageHeader) readBuf);
 	targetRecOff = RecPtr->xrecoff % XLOG_BLCKSZ;
@@ -4168,7 +4168,7 @@ retry:
 			}
 			/* Wait for the next page to become available */
 			if (!XLogPageRead(pagelsn, emode, false, false))
-return NULL;
+goto next_record_is_invalid;
 
 			/* Check that the continuation record looks valid */
 			if (!(((XLogPageHeader) readBuf)->xlp_info & XLP_FIRST_IS_CONTRECORD))



Re: [HACKERS] 9.2.3 crashes during archive recovery

2013-03-05 Thread Kyotaro HORIGUCHI
Hello, I could reproduce the behavior and I think I understand the cause.

The head of origin/REL9_2_STABLE shows the behavior I mentioned in
the last message when using the attached shell script. 9.3dev
runs as expected.

In XLogPageRead, when RecPtr goes beyond the last page, the
current xlog file is released and a new page is requested.

The variables were as below at the point.

  StandbyRequested == true
  StandbyMode == false
  ArchiveRecoveryRequested == true
  InArchiveRecovery == false

In this case, XLogPageRead immediately returns NULL before trying
to get WAL via streaming or from the archive. So ReadRecord
returns NULL, which unexpectedly exits the 'main redo apply loop' and
increments the timeline ID as if the server had been promoted.

This seems to be fixed by letting it try all requested
sources. The attached patch does this, and the test script runs as
expected.

 We found that PostgreSQL with this patch unexpectedly becomes
 primary when starting up as standby. We'll do further
 investigation of the behavior.
 
   Anyway, I've committed this to master and 9.2 now.
  
  This seems to fix the issue. We'll examine this further.

regards,

-- 
Kyotaro Horiguchi
NTT Open Source Software Center
#! /bin/sh
version=abf5c5
version=924b
killall -9 postgres
source pgsetpath $version 0
rm -rf $PGDATA/* $PGARC/*
PGDATA0=$PGDATA
PGPORT0=$PGPORT
initdb -D $PGDATA0
cat >> $PGDATA0/postgresql.conf <<EOF
wal_level = hot_standby
checkpoint_segments = 300
checkpoint_timeout = 1h
archive_mode = on
archive_command = 'cp %p $PGARC/%f'
max_wal_senders = 3
hot_standby = on
EOF
cat >> $PGDATA0/pg_hba.conf <<EOF
local   replication     horiguti        trust
EOF
echo ## Startup master
pg_ctl -D $PGDATA0 -w start
source pgsetpath $version 1 -p 5433
PGDATA1=$PGDATA
PGPORT1=$PGPORT
rm -rf $PGDATA/* $PGARC/*
echo ## basebackup
pg_basebackup -h /tmp -p $PGPORT0 -F p -X s -D $PGDATA1
chmod 700 $PGDATA1
cat >> $PGDATA1/recovery.conf <<EOF
standby_mode = yes
primary_conninfo='host=/tmp port=5432'
restore_command='a=$PGARC; if [ -f \$a/%f ]; then cp \$a/%f %p; else exit 1; fi'
#restore_command='a=$PGARC; if [ ! -d \$a ]; then echo Archive directory \$a is not found.; exit 1; elif [ -f \$a/%f ]; then cp \$a/%f %p; else exit 1; fi'
EOF

echo ## Startup standby
pg_ctl -D $PGDATA1 start
echo ## Sleep for 5 seconds
sleep 5

echo ## Shutdown standby
pg_ctl -D $PGDATA1 -w stop -m f

echo ## Shutdown master in immediate mode
pg_ctl -D $PGDATA0 -w stop -m i

cat >> $PGDATA0/recovery.conf <<EOF
standby_mode = yes
primary_conninfo='host=/tmp port=5433'
restore_command='a=$PGARC; if [ -f \$a/%f ]; then cp \$a/%f %p; else exit 1; fi'
EOF

echo ## Starting master as a standby
if [ "$1" == "w" ]; then touch /tmp/xlogwait; fi
PGPORT=5432 pg_ctl -D $PGDATA0  start
#psql postgres -c select pg_is_in_recovery();
diff --git a/src/backend/access/transam/xlog.c b/src/backend/access/transam/xlog.c
index 92adc4e..00b5bc5 100644
--- a/src/backend/access/transam/xlog.c
+++ b/src/backend/access/transam/xlog.c
@@ -10604,7 +10604,19 @@ retry:
 			  sources);
 switched_segment = true;
 if (readFile < 0)
+{
+	if (!InArchiveRecovery  ArchiveRecoveryRequested)
+	{
+		InArchiveRecovery = true;
+		goto retry;
+	}
+	else if (!StandbyMode  StandbyModeRequested)
+	{
+		StandbyMode = true;
+		goto retry;
+	}
 	return false;
+}
 			}
 		}
 	}



Re: [HACKERS] 9.2.3 crashes during archive recovery

2013-03-05 Thread Kyotaro HORIGUCHI
Sorry, I sent wrong script.

 The head of origin/REL9_2_STABLE shows the behavior I mentioned in
 the last message when using the shell script attached. 9.3dev
 runs as expected.

regards,

-- 
Kyotaro Horiguchi
NTT Open Source Software Center
#! /bin/sh
pgpath=$HOME/bin/pgsql_924b
echo $PATH | grep $pgpath | wc -l
if [ `echo $PATH | grep $pgpath | wc -l` == 0 ]; then
PATH=$pgpath/bin:$PATH
fi

PGDATA0=$HOME/data/pgdata_0
PGDATA1=$HOME/data/pgdata_1
PGARC0=$HOME/data/pgarc_0
PGARC1=$HOME/data/pgarc_1
PGPORT0=5432
PGPORT1=5433
unset PGPORT
unset PGDATA
echo "Postgresql is `which postgres`"
killall -9 postgres
rm -rf $PGDATA0/* $PGARC0/*
initdb -D $PGDATA0
cat >> $PGDATA0/postgresql.conf <<EOF
port=5432
wal_level = hot_standby
checkpoint_segments = 300
checkpoint_timeout = 1h
archive_mode = on
archive_command = 'cp %p $PGARC0/%f'
max_wal_senders = 3
hot_standby = on
#log_min_messages = debug5
EOF
cat >> $PGDATA0/pg_hba.conf <<EOF
local   replication     horiguti        trust
EOF
echo ## Startup master
pg_ctl -D $PGDATA0 -w -o "-p $PGPORT0" start

rm -rf $PGDATA1/* $PGARC1/*
echo ## basebackup
pg_basebackup -h /tmp -p $PGPORT0 -F p -X s -D $PGDATA1
chmod 700 $PGDATA1
cat >> $PGDATA1/recovery.conf <<EOF
standby_mode = yes
primary_conninfo='host=/tmp port=5432'
restore_command='a=$PGARC1; if [ -f \$a/%f ]; then cp \$a/%f %p; else exit 1; fi'
EOF

echo ## Startup standby
pg_ctl -D $PGDATA1 -o "-p $PGPORT1" start
echo ## Sleep for 5 seconds
sleep 5

echo ## Shutdown standby
pg_ctl -D $PGDATA1 -w stop -m f

echo ## Shutdown master in immediate mode
pg_ctl -D $PGDATA0 -w stop -m i

cat >> $PGDATA0/recovery.conf <<EOF
standby_mode = yes
primary_conninfo='host=/tmp port=5433'
restore_command='a=$PGARC0; if [ -f \$a/%f ]; then cp \$a/%f %p; else exit 1; fi'
EOF

echo ## Starting master as a standby
pg_ctl -D $PGDATA0  start



Re: [HACKERS] 9.2.3 crashes during archive recovery

2013-03-05 Thread KONDO Mitsumasa

Hi,

Horiguchi's patch does not seem to record minRecoveryPoint in ReadRecord();
the attached patch records minRecoveryPoint.
[crash recovery -> record minRecoveryPoint in control file -> archive recovery]
I think that this is the original intention of Heikki's patch.

I also found a bug in the latest 9.2_stable. It does not get the latest
timeline and recovery history file in archive recovery when the master and
standby timelines differ.

Best regards,

(2013/03/05 18:22), Kyotaro HORIGUCHI wrote:

Hello, I could reproduce the behavior and I think I understand the cause.

The head of origin/REL9_2_STABLE shows the behavior I mentioned in
the last message when using the shell script attached. 9.3dev
runs as expected.

In XLogPageRead, when RecPtr goes beyond the last page, the
current xlog file is released and a new page is requested.

The variables were as below at the point.

   StandbyRequested == true
   StandbyMode == false
   ArchiveRecoveryRequested == true
   InArchiveRecovery == false

In this case, XLogPageRead immediately returns NULL before trying
to get WAL via streaming or from the archive. So ReadRecord
returns NULL, which unexpectedly exits the 'main redo apply loop' and
increments the timeline ID as if the server had been promoted.

This seems to be fixed by letting it try all requested
sources. The attached patch does this, and the test script runs as
expected.


We found that PostgreSQL with this patch unexpectedly becomes
primary when starting up as standby. We'll do further
investigation of the behavior.


Anyway, I've committed this to master and 9.2 now.


This seems to fix the issue. We'll examine this further.


regards,





--
Mitsumasa KONDO
NTT OSS Center
--- a/src/backend/access/transam/xlog.c	2013-03-04 15:13:49.0 -0500
+++ b/src/backend/access/transam/xlog.c	2013-03-05 06:43:49.435093827 -0500
@@ -4446,7 +4446,7 @@
 	if (targetTLI == 1)
 		return list_make1_int((int) targetTLI);
 
-	if (InArchiveRecovery)
+	if (ArchiveRecoveryRequested)
 	{
 		TLHistoryFileName(histfname, targetTLI);
 		fromArchive =
@@ -10603,8 +10603,11 @@
 readFile = XLogFileReadAnyTLI(readId, readSeg, emode,
 			  sources);
 switched_segment = true;
-if (readFile < 0)
+if (readFile < 0){
+	if (StandbyModeRequested)
+		return true;
 	return false;
+}
 			}
 		}
 	}



Re: [HACKERS] 9.2.3 crashes during archive recovery

2013-03-05 Thread Kyotaro HORIGUCHI
Hmm..

 Horiguchi's patch does not seem to record minRecoveryPoint in
 ReadRecord();
 the attached patch records minRecoveryPoint.
 [crash recovery -> record minRecoveryPoint in control file -> archive
 recovery]
 I think that this is the original intention of Heikki's patch.

It could be. Before that, my patch did not wake up hot standby
because I did not take care of minRecoveryPoint in it :-p

On the other hand, your patch fixes that point, but ReadRecord
then runs on the stale page data. However, the wrong record on the
stale page may be identified as broken, which would be an
undesirable behavior.

If we want to present the final solution ourselves, we need to
make another patch that settles all of these. Let me give it further
thought...

regards,

-- 
Kyotaro Horiguchi
NTT Open Source Software Center




Re: [HACKERS] 9.2.3 crashes during archive recovery

2013-03-05 Thread Kyotaro HORIGUCHI
Hi, I suppose the attached patch is close to the solution.

 I think that this is the original intention of Heikki's patch.

I noticed that archive recovery will be turned on in
next_record_is_invalid thanks to your patch.

 On the other hand, your patch fixes that point but ReadRecord
 runs on the false page data. However the wrong record on the
 false page can be identified as broken, It should be an
 undesiable behavior.

All we need to do to update minRecoveryPoint when XLogPageRead
fails in ReadRecord is to simply jump to next_record_is_invalid
if archive recovery was requested but we are still doing crash
recovery.

Your modification in readTimeLineHistory and my modifications in
XLogPageRead do not seem necessary for the objective of this thread,
so I removed them.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center
diff --git a/src/backend/access/transam/xlog.c b/src/backend/access/transam/xlog.c
index 92adc4e..28d6f2e 100644
--- a/src/backend/access/transam/xlog.c
+++ b/src/backend/access/transam/xlog.c
@@ -4010,7 +4010,15 @@ ReadRecord(XLogRecPtr *RecPtr, int emode, bool fetching_ckpt)
 retry:
 	/* Read the page containing the record */
 	if (!XLogPageRead(RecPtr, emode, fetching_ckpt, randAccess))
+	{
+		/*
+		 * If archive recovery was requested when crash recovery failed, go to
+		 * the label next_record_is_invalid to switch to archive recovery.
+		 */
+		if (!InArchiveRecovery && ArchiveRecoveryRequested)
+			goto next_record_is_invalid;
 		return NULL;
+	}
 
 	pageHeaderSize = XLogPageHeaderSize((XLogPageHeader) readBuf);
 	targetRecOff = RecPtr->xrecoff % XLOG_BLCKSZ;



Re: [HACKERS] 9.2.3 crashes during archive recovery

2013-03-04 Thread Kyotaro HORIGUCHI
This is an interim report for this patch.

We found that PostgreSQL with this patch unexpectedly becomes
primary when starting up as standby. We'll do further
investigation of the behavior.

  Anyway, I've committed this to master and 9.2 now.
 
 This seems to fix the issue. We'll examine this further.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center




Re: [HACKERS] 9.2.3 crashes during archive recovery

2013-02-26 Thread Josh Berkus
Folks,

Is there any way this particular issue could cause data corruption
without causing a crash?  I don't see a way for it to do so, but I
wanted to verify.


-- 
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com




Re: [HACKERS] 9.2.3 crashes during archive recovery

2013-02-25 Thread Kyotaro HORIGUCHI
Hello, 

 Anyway, I've committed this to master and 9.2 now.

This seems to fix the issue. We'll examine this further.

Thank you.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center




Re: [HACKERS] 9.2.3 crashes during archive recovery

2013-02-25 Thread Kyotaro HORIGUCHI
At Fri, 22 Feb 2013 11:42:39 +0200, Heikki Linnakangas 
hlinnakan...@vmware.com wrote in 51273d8f.7060...@vmware.com
 On 15.02.2013 10:33, Kyotaro HORIGUCHI wrote:
  In an HA DB cluster consisting of Pacemaker and PostgreSQL, PostgreSQL
  is stopped by 'pg_ctl stop -m i' regardless of the situation.
 
 That seems like a bad idea. If nothing else, crash recovery can take a
 long time. I don't know much about Pacemaker, but wouldn't it make
 more sense to at least try fast shutdown first, falling back to
 immediate shutdown after a timeout?

I completely agree with you. But I hear that they want to
shut down as fast as possible on failure, and they also don't want
more states in the RA... On the other hand, the result in case of this
failure is catastrophic, so it should be fixed regardless of
what Pacemaker does.

regards,
-- 
Kyotaro Horiguchi
NTT Open Source Software Center




Re: [HACKERS] 9.2.3 crashes during archive recovery

2013-02-25 Thread Kyotaro HORIGUCHI
Although this has become moot, I want to explain how this
works.

  I tried to postpone smgrtruncate TO the next checkpoint.
 
 Umm, why? I don't understand this patch at all.

This inhibits truncating files after (quite vaguely in the patch :-)
the previous checkpoint, by hiding the deleted tuples from vacuum
and thus keeping the VM pages from being truncated. They are
truncated only if no tuples have been deleted on the corresponding
pages since the last checkpoint. As a result, any nonexistent pages
in visibility maps won't be touched after the consistency point,
which is after the last checkpoint, during archive recovery.

# Umm.. sorry for confused explanation...

regards,

-- 
Kyotaro Horiguchi
NTT Open Source Software Center




Re: [HACKERS] 9.2.3 crashes during archive recovery

2013-02-22 Thread Heikki Linnakangas

On 22.02.2013 02:13, Michael Paquier wrote:

On Thu, Feb 21, 2013 at 11:09 PM, Heikki Linnakangas
hlinnakan...@vmware.com  wrote:


On 15.02.2013 15:49, Heikki Linnakangas wrote:


Attached is a patch for git master. The basic idea is to split
InArchiveRecovery into two variables, InArchiveRecovery and
ArchiveRecoveryRequested. ArchiveRecoveryRequested is set when
recovery.conf exists. But if we don't know how far we need to recover,
we first perform crash recovery with InArchiveRecovery=false. When we
reach the end of WAL in pg_xlog, InArchiveRecovery is set, and we
continue with normal archive recovery.



New version of this attached, with a few bugs fixed.

I'm thinking that this should be back-patched to 9.2, but not to earlier
branches. Before 9.2, we don't PANIC at a reference to a non-existent page
until end of recovery, even if we've already reached consistency. The same
basic issue still exists in earlier versions, though: if you have
hot_standby=on, the system will open for read-only queries too early,
before the database is consistent. But this patch is invasive enough that
I'm wary of back-patching it further, when the worst that can happen is
that there's a small window right after startup when you can see an
inconsistent database in hot standby mode. Maybe after we get some more
testing of this in 9.2 and master. Opinions on that?


People have not yet complained about this problem with versions prior to
9.1. Is it worth backpatching in this case?


Possibly not..

Anyway, I've committed this to master and 9.2 now.

- Heikki




Re: [HACKERS] 9.2.3 crashes during archive recovery

2013-02-22 Thread Heikki Linnakangas

On 14.02.2013 19:18, Fujii Masao wrote:

Yes. And the resource agent for streaming replication in Pacemaker (it's the
OSS clusterware) is the user of that archive recovery scenario, too. When it
starts up the server, it always creates the recovery.conf and starts the server
as the standby. It cannot start the master directly, IOW the server is always
promoted to the master from the standby. So when it starts up the server
after the server crashes, obviously it executes the same recovery scenario
(i.e., forced archive recovery instead of crash recovery) as Kyotaro described.

The reason why that resource agent cannot start up the master directly is
that it manages three server states, called Master, Slave and Down. It can
move the server state from Down to Slave, and in the reverse direction.
It can also move the state from Slave to Master, and in the reverse direction.
But there is no way to move the state between Down and Master directly.
This kind of state transition model is an isolated case in
clusterware, I think.
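The three-state model described above can be written down as a small transition table (a sketch with invented names, not actual Pacemaker RA code):

```c
#include <assert.h>
#include <stdbool.h>

typedef enum { RA_DOWN, RA_SLAVE, RA_MASTER } RaState;

/*
 * The only transitions the resource agent supports, per the description
 * above: Down <-> Slave and Slave <-> Master.  Down <-> Master is
 * deliberately absent, which is why the server is always started as a
 * standby first and then promoted.
 */
static bool
transition_allowed(RaState from, RaState to)
{
	return (from == RA_DOWN && to == RA_SLAVE) ||
		   (from == RA_SLAVE && to == RA_DOWN) ||
		   (from == RA_SLAVE && to == RA_MASTER) ||
		   (from == RA_MASTER && to == RA_SLAVE);
}
```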


I don't have much sympathy for that to be honest. Seems like something 
that should be fixed in Pacemaker or the scripts used to glue it with 
PostgreSQL. However, this patch should make that work, so I guess 
everyone is happy.


- Heikki




Re: [HACKERS] 9.2.3 crashes during archive recovery

2013-02-22 Thread Heikki Linnakangas

On 15.02.2013 10:33, Kyotaro HORIGUCHI wrote:

Sorry, I omitted to show how we found this issue.

In an HA DB cluster consisting of Pacemaker and PostgreSQL, PostgreSQL
is stopped by 'pg_ctl stop -m i' regardless of the situation.


That seems like a bad idea. If nothing else, crash recovery can take a 
long time. I don't know much about Pacemaker, but wouldn't it make more 
sense to at least try fast shutdown first, falling back to immediate 
shutdown after a timeout?
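A minimal sketch of the policy suggested above, using pg_ctl's `-t` timeout option (the wrapper function and the 60-second timeout are illustrative assumptions, not Pacemaker's actual agent code):

```shell
# Try a fast shutdown first, bounded by a timeout; escalate to an
# immediate shutdown only if the fast shutdown did not finish in time.
# PGCTL is overridable so the policy can be exercised without a server.
PGCTL=${PGCTL:-pg_ctl}

stop_with_fallback() {
    datadir=$1
    # -t bounds how long pg_ctl waits for the fast shutdown to complete
    "$PGCTL" stop -D "$datadir" -m fast -t 60 ||
        "$PGCTL" stop -D "$datadir" -m immediate
}
```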


- Heikki




Re: [HACKERS] 9.2.3 crashes during archive recovery

2013-02-21 Thread Heikki Linnakangas

On 15.02.2013 15:49, Heikki Linnakangas wrote:

Attached is a patch for git master. The basic idea is to split
InArchiveRecovery into two variables, InArchiveRecovery and
ArchiveRecoveryRequested. ArchiveRecoveryRequested is set when
recovery.conf exists. But if we don't know how far we need to recover,
we first perform crash recovery with InArchiveRecovery=false. When we
reach the end of WAL in pg_xlog, InArchiveRecovery is set, and we
continue with normal archive recovery.


New version of this attached, with a few bugs fixed.

I'm thinking that this should be back-patched to 9.2, but not to earlier 
branches. Before 9.2, we don't PANIC at a reference to a non-existent 
page until end of recovery, even if we've already reached consistency. 
The same basic issue still exists in earlier versions, though: if you 
have hot_standby=on, the system will open for read-only queries too 
early, before the database is consistent. But this patch is invasive 
enough that I'm wary of back-patching it further, when the worst that 
can happen is that there's a small window right after startup when you 
can see an inconsistent database in hot standby mode. Maybe after we get 
some more testing of this in 9.2 and master. Opinions on that?


- Heikki
diff --git a/src/backend/access/transam/xlog.c b/src/backend/access/transam/xlog.c
index abc525a..29d1f96 100644
--- a/src/backend/access/transam/xlog.c
+++ b/src/backend/access/transam/xlog.c
@@ -189,7 +189,18 @@ static bool LocalHotStandbyActive = false;
  */
 static int	LocalXLogInsertAllowed = -1;
 
-/* Are we recovering using offline XLOG archives? (only valid in the startup process) */
+/*
+ * When ArchiveRecoveryRequested is set, archive recovery was requested,
+ * ie. recovery.conf file was present. When InArchiveRecovery is set, we are
+ * currently recovering using offline XLOG archives. These variables are only
+ * valid in the startup process.
+ *
+ * When ArchiveRecoveryRequested is true, but InArchiveRecovery is false, we're
+ * currently performing crash recovery using only XLOG files in pg_xlog, but
+ * will switch to using offline XLOG archives as soon as we reach the end of
+ * WAL in pg_xlog.
+*/
+static bool ArchiveRecoveryRequested = false;
 bool InArchiveRecovery = false;
 
 /* Was the last xlog file restored from archive, or local? */
@@ -207,10 +218,13 @@ static TimestampTz recoveryTargetTime;
 static char *recoveryTargetName;
 
 /* options taken from recovery.conf for XLOG streaming */
-bool StandbyMode = false;
+static bool StandbyModeRequested = false;
 static char *PrimaryConnInfo = NULL;
 static char *TriggerFile = NULL;
 
+/* are we currently in standby mode? */
+bool StandbyMode = false;
+
 /* whether request for fast promotion has been made yet */
 static bool fast_promote = false;
 
@@ -3217,10 +3231,10 @@ ReadRecord(XLogReaderState *xlogreader, XLogRecPtr RecPtr, int emode,
 	private->emode = emode;
 	private->randAccess = (RecPtr != InvalidXLogRecPtr);
 
-	/* This is the first try to read this page. */
+	/* This is the first attempt to read this page. */
 	lastSourceFailed = false;
 
-	do
+	for (;;)
 	{
 		char   *errormsg;
 
@@ -3229,8 +3243,6 @@ ReadRecord(XLogReaderState *xlogreader, XLogRecPtr RecPtr, int emode,
 		EndRecPtr = xlogreader->EndRecPtr;
 		if (record == NULL)
 		{
-			lastSourceFailed = true;
-
 			if (readFile >= 0)
 			{
 close(readFile);
@@ -3247,22 +3259,16 @@ ReadRecord(XLogReaderState *xlogreader, XLogRecPtr RecPtr, int emode,
 ereport(emode_for_corrupt_record(emode,
  RecPtr ? RecPtr : EndRecPtr),
 		(errmsg_internal("%s", errormsg) /* already translated */));
-
-			/* Give up, or retry if we're in standby mode. */
-			continue;
 		}
-
 		/*
 		 * Check page TLI is one of the expected values.
 		 */
-		if (!tliInHistory(xlogreader->latestPageTLI, expectedTLEs))
+		else if (!tliInHistory(xlogreader->latestPageTLI, expectedTLEs))
 		{
 			char		fname[MAXFNAMELEN];
 			XLogSegNo segno;
 			int32 offset;
 
-			lastSourceFailed = true;
-
 			XLByteToSeg(xlogreader->latestPagePtr, segno);
 			offset = xlogreader->latestPagePtr % XLogSegSize;
 			XLogFileName(fname, xlogreader->readPageTLI, segno);
@@ -3273,11 +3279,73 @@ ReadRecord(XLogReaderState *xlogreader, XLogRecPtr RecPtr, int emode,
 			fname,
 			offset)));
 			record = NULL;
-			continue;
 		}
-	} while (StandbyMode && record == NULL && !CheckForStandbyTrigger());
 
-	return record;
+		if (record)
+		{
+			/* Great, got a record */
+			return record;
+		}
+		else
+		{
+			/* No valid record available from this source */
+			lastSourceFailed = true;
+
+			/*
+			 * If archive recovery was requested, but we were still doing crash
+			 * recovery, switch to archive recovery and retry using the offline
+			 * archive. We have now replayed all the valid WAL in pg_xlog, so
+			 * we are presumably now consistent.
+			 *
+			 * We require that there's at least some valid WAL present in
+			 * pg_xlog, however (!fetch_ckpt). We could recover using the 

Re: [HACKERS] 9.2.3 crashes during archive recovery

2013-02-21 Thread Michael Paquier
On Thu, Feb 21, 2013 at 11:09 PM, Heikki Linnakangas 
hlinnakan...@vmware.com wrote:

 On 15.02.2013 15:49, Heikki Linnakangas wrote:

 Attached is a patch for git master. The basic idea is to split
 InArchiveRecovery into two variables, InArchiveRecovery and
 ArchiveRecoveryRequested. ArchiveRecoveryRequested is set when
 recovery.conf exists. But if we don't know how far we need to recover,
 we first perform crash recovery with InArchiveRecovery=false. When we
 reach the end of WAL in pg_xlog, InArchiveRecovery is set, and we
 continue with normal archive recovery.


 New version of this attached, with a few bugs fixed.

 I'm thinking that this should be back-patched to 9.2, but not to earlier
 branches. Before 9.2, we don't PANIC at a reference to a non-existent page
 until end of recovery, even if we've already reached consistency. The same
 basic issue still exists in earlier versions, though: if you have
 hot_standby=on, the system will open for read-only queries too early,
 before the database is consistent. But this patch is invasive enough that
 I'm wary of back-patching it further, when the worst that can happen is
 that there's a small window right after startup when you can see an
 inconsistent database in hot standby mode. Maybe after we get some more
 testing of this in 9.2 and master. Opinions on that?

People have not yet complained about this problem with versions prior to
9.1. Is it worth backpatching in this case?
-- 
Michael


Re: [HACKERS] 9.2.3 crashes during archive recovery

2013-02-20 Thread Kyotaro HORIGUCHI
Sorry, let me correct a bit.

 I tried to postpone smgrtruncate after the next checkpoint. This

I tried to postpone smgrtruncate TO the next checkpoint.

 is similar to what hot standby feedback does to vacuum. It seems
 to be working fine, but I worry that it might also bloat the
 table. I haven't found a way to postpone only the targeted
 smgrtruncate.
 
 The patch below is an immediate verification patch for this
 solution.
 
 - CreateCheckPoint records the oldest xmin at that point. Let's
   call it 'checkpoint xmin'.
 
 - vacuum skips the modification by the transactions at the same
   time or after the checkpoint xmin.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center




Re: [HACKERS] 9.2.3 crashes during archive recovery

2013-02-20 Thread Heikki Linnakangas

On 20.02.2013 10:01, Kyotaro HORIGUCHI wrote:

Sorry, let me correct a bit.


I tried to postpone smgrtruncate after the next checkpoint. This


I tried to postpone smgrtruncate TO the next checkpoint.


Umm, why? I don't understand this patch at all.

- Heikki




Re: [HACKERS] 9.2.3 crashes during archive recovery

2013-02-19 Thread Ants Aasma
On Mon, Feb 18, 2013 at 8:27 PM, Heikki Linnakangas
hlinnakan...@vmware.com wrote:
 backupStartPoint is set, which signals recovery to wait for an end-of-backup
 record, until the system is considered consistent. If the backup is taken
 from a hot standby, backupEndPoint is set, instead of inserting an
 end-of-backup record.

Thank you for explaining this, I can see how it works now. I'll see if
I can document this better so others won't have to go through as much
effort.

Ants Aasma
-- 
Cybertec Schönig & Schönig GmbH
Gröhrmühlgasse 26
A-2700 Wiener Neustadt
Web: http://www.postgresql-support.de




Re: [HACKERS] 9.2.3 crashes during archive recovery

2013-02-19 Thread Kyotaro HORIGUCHI
Hello, I looked this from another point of view.

I consider the current discussion to be based on how to predict
the last consistency point. But there is another aspect of this
issue.

I tried to postpone smgrtruncate after the next checkpoint. This
is similar to what hot standby feedback does to vacuum. It seems
to be working fine, but I worry that it might also bloat the
table. I haven't found a way to postpone only the targeted
smgrtruncate.

The patch below is an immediate verification patch for this
solution.

- CreateCheckPoint records the oldest xmin at that point. Let's
  call it 'checkpoint xmin'.

- vacuum skips the modification by the transactions at the same
  time or after the checkpoint xmin.

What do you think of this?

-- 
Kyotaro Horiguchi
NTT Open Source Software Center


=
diff --git a/src/backend/access/transam/xlog.c 
b/src/backend/access/transam/xlog.c
index 479c14d..a274393 100644
--- a/src/backend/access/transam/xlog.c
+++ b/src/backend/access/transam/xlog.c
@@ -493,6 +493,7 @@ typedef struct XLogCtlData
XLogRecPtr  lastFpwDisableRecPtr;
 
slock_t info_lck;   /* locks shared variables shown 
above */
+   TransactionId chkptxactid;
 } XLogCtlData;
 
 static XLogCtlData *XLogCtl = NULL;
@@ -676,6 +677,11 @@ static bool read_backup_label(XLogRecPtr *checkPointLoc,
 static void rm_redo_error_callback(void *arg);
 static int get_sync_bit(int method);
 
+TransactionId
+XLogGetChkptTrId()
+{
+   return XLogCtl->chkptxactid;
+}
 
 /*
  * Insert an XLOG record having the specified RMID and info bytes,
@@ -3875,7 +3881,7 @@ XLOGShmemInit(void)
	SpinLockInit(&XLogCtl->info_lck);
	SpinLockInit(&XLogCtl->ulsn_lck);
	InitSharedLatch(&XLogCtl->recoveryWakeupLatch);
-
+   XLogCtl->chkptxactid = InvalidTransactionId;
/*
 * If we are not in bootstrap mode, pg_control should already exist. 
Read
 * and validate it immediately (see comments in ReadControlFile() for 
the
@@ -6700,6 +6706,8 @@ CreateCheckPoint(int flags)
else
checkPoint.oldestActiveXid = InvalidTransactionId;
 
+   XLogCtl->chkptxactid = GetOldestXmin(true, true);
+
/*
 * We must hold WALInsertLock while examining insert state to determine
 * the checkpoint REDO pointer.
diff --git a/src/backend/commands/vacuum.c b/src/backend/commands/vacuum.c
index 4800b43..79205a0 100644
--- a/src/backend/commands/vacuum.c
+++ b/src/backend/commands/vacuum.c
@@ -374,6 +374,7 @@ get_rel_oids(Oid relid, const RangeVar *vacrel)
 /*
  * vacuum_set_xid_limits() -- compute oldest-Xmin and freeze cutoff points
  */
+extern TransactionId XLogGetChkptTrId(void);
 void
 vacuum_set_xid_limits(int freeze_min_age,
  int freeze_table_age,
@@ -397,6 +398,11 @@ vacuum_set_xid_limits(int freeze_min_age,
 * always an independent transaction.
 */
*oldestXmin = GetOldestXmin(sharedRel, true);
+   {
+   TransactionId chkpttrid = XLogGetChkptTrId();
+   if (chkpttrid < *oldestXmin)
+   *oldestXmin = chkpttrid;
+   }
 
Assert(TransactionIdIsNormal(*oldestXmin));
==




Re: [HACKERS] 9.2.3 crashes during archive recovery

2013-02-18 Thread Heikki Linnakangas

On 16.02.2013 10:40, Ants Aasma wrote:

On Fri, Feb 15, 2013 at 3:49 PM, Heikki Linnakangas
hlinnakan...@vmware.com  wrote:

While this solution would help solve my issue, it assumes that the
correct amount of WAL files are actually there. Currently the docs for
setting up a standby refer to 24.3.4. Recovering Using a Continuous
Archive Backup, and that step recommends emptying the contents of
pg_xlog. If this is chosen as the solution the docs should be adjusted
to recommend using pg_basebackup -x for setting up the standby.


When the backup is taken using pg_start_backup or pg_basebackup,
minRecoveryPoint is set correctly anyway, and it's OK to clear out pg_xlog.


How is minRecoveryPoint supposed to get set?


I was slightly imprecise. minRecoveryPoint is not set at backup, but 
backupStartPoint is. minRecoveryPoint is set later, once the 
end-of-backup record is seen. A valid backupStartPoint has the same 
effect as minRecoveryPoint: the system is considered inconsistent until 
the end-of-backup record is seen.



I just tried taking a
pg_basebackup on master running pgbench. The resulting data dir
controlfile had this:

Min recovery ending location: 0/0

The end location was not in the backup_label either.

Looking at basebackup.c the process seems to be:
1. pg_start_backup
2. copy out backup_label
3. copy out rest of the datadir
4. copy out control file
5. pg_stop_backup
6. copy out WAL
7. send backup stop location

And pg_basebackup.c only uses the stop location to decide how much WAL to fetch.

I don't see anything here that could correctly communicate min
recovery point. Maybe I'm missing something.


backupStartPoint is set, which signals recovery to wait for an 
end-of-backup record, until the system is considered consistent. If the 
backup is taken from a hot standby, backupEndPoint is set, instead of 
inserting an end-of-backup record.
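In code form, the rule described above might look roughly like this (a simplified sketch modeled loosely on the consistency check in xlog.c; the struct and function names are illustrations, not the actual backend API):

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

typedef uint64_t XLogRecPtr;
#define InvalidXLogRecPtr ((XLogRecPtr) 0)

/* Simplified recovery state: a nonzero backupStartPoint means an
 * end-of-backup record is still awaited; minRecoveryPoint is how far
 * replay must progress before the database can be consistent. */
typedef struct RecoveryState
{
	XLogRecPtr	backupStartPoint;
	XLogRecPtr	minRecoveryPoint;
} RecoveryState;

/* Consistent only once any pending end-of-backup record has been replayed
 * (clearing backupStartPoint) and replay has passed minRecoveryPoint. */
static bool
reached_consistency(const RecoveryState *st, XLogRecPtr replayed)
{
	if (st->backupStartPoint != InvalidXLogRecPtr)
		return false;
	return replayed >= st->minRecoveryPoint;
}
```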


- Heikki




Re: [HACKERS] 9.2.3 crashes during archive recovery

2013-02-16 Thread Ants Aasma
On Fri, Feb 15, 2013 at 3:49 PM, Heikki Linnakangas
hlinnakan...@vmware.com wrote:
 While this solution would help solve my issue, it assumes that the
 correct amount of WAL files are actually there. Currently the docs for
 setting up a standby refer to 24.3.4. Recovering Using a Continuous
 Archive Backup, and that step recommends emptying the contents of
 pg_xlog. If this is chosen as the solution the docs should be adjusted
 to recommend using pg_basebackup -x for setting up the standby.

 When the backup is taken using pg_start_backup or pg_basebackup,
 minRecoveryPoint is set correctly anyway, and it's OK to clear out pg_xlog.

How is minRecoveryPoint supposed to get set? I just tried taking a
pg_basebackup on master running pgbench. The resulting data dir
controlfile had this:

Min recovery ending location: 0/0

The end location was not in the backup_label either.

Looking at basebackup.c the process seems to be:
1. pg_start_backup
2. copy out backup_label
3. copy out rest of the datadir
4. copy out control file
5. pg_stop_backup
6. copy out WAL
7. send backup stop location

And pg_basebackup.c only uses the stop location to decide how much WAL to fetch.

I don't see anything here that could correctly communicate min
recovery point. Maybe I'm missing something.

 Yeah, it probably could use some editing, as the underlying code has evolved
 a lot since it was written. The suggestion to clear out pg_xlog seems like
 an unnecessary complication. It's safe to do so, if you restore with an
 archive, but unnecessary.

 The File System Level Backup chapter
 (http://www.postgresql.org/docs/devel/static/backup-file.html) probably
 should mention pg_basebackup -x, too.

 Docs patches are welcome..

I will give it a shot.

Ants Aasma
-- 
Cybertec Schönig & Schönig GmbH
Gröhrmühlgasse 26
A-2700 Wiener Neustadt
Web: http://www.postgresql-support.de




Re: [HACKERS] 9.2.3 crashes during archive recovery

2013-02-15 Thread Kyotaro HORIGUCHI
Sorry, I neglected to explain how we found this issue.

In an HA DB cluster consisting of Pacemaker and PostgreSQL, PostgreSQL
is stopped by 'pg_ctl stop -m i' regardless of the situation.

On the other hand, the PostgreSQL RA (Resource Agent) is obliged to
start the master node via the hot standby state because of the
restrictions on state transitions in Pacemaker.

So simply stopping and then starting the master node can fall
into this situation.

 Hmm, I just realized a little problem with that approach. If you take
 a base backup using an atomic filesystem backup from a running server,
 and start archive recovery from that, that's essentially the same
 thing as Kyotaro's test case.

Regards,

-- 
Kyotaro Horiguchi
NTT Open Source Software Center




Re: [HACKERS] 9.2.3 crashes during archive recovery

2013-02-15 Thread Ants Aasma
On Wed, Feb 13, 2013 at 10:52 PM, Simon Riggs si...@2ndquadrant.com wrote:
 The problem is that we startup Hot Standby before we hit the min
 recovery point because that isn't recorded. For me, the thing to do is
 to make the min recovery point == end of WAL when state is
 DB_IN_PRODUCTION. That way we don't need to do any new writes and we
 don't need to risk people seeing inconsistent results if they do this.

While this solution would help solve my issue, it assumes that the
correct amount of WAL files are actually there. Currently the docs for
setting up a standby refer to 24.3.4. Recovering Using a Continuous
Archive Backup, and that step recommends emptying the contents of
pg_xlog. If this is chosen as the solution the docs should be adjusted
to recommend using pg_basebackup -x for setting up the standby. As a
related point, pointing standby setup to that section has confused at
least one of my clients. That chapter is rather scarily complicated
compared to what's usually necessary.

Ants Aasma
-- 
Cybertec Schönig & Schönig GmbH
Gröhrmühlgasse 26
A-2700 Wiener Neustadt
Web: http://www.postgresql-support.de




Re: [HACKERS] 9.2.3 crashes during archive recovery

2013-02-15 Thread Heikki Linnakangas

On 15.02.2013 13:05, Ants Aasma wrote:

On Wed, Feb 13, 2013 at 10:52 PM, Simon Riggs si...@2ndquadrant.com wrote:

The problem is that we startup Hot Standby before we hit the min
recovery point because that isn't recorded. For me, the thing to do is
to make the min recovery point == end of WAL when state is
DB_IN_PRODUCTION. That way we don't need to do any new writes and we
don't need to risk people seeing inconsistent results if they do this.


While this solution would help solve my issue, it assumes that the
correct amount of WAL files are actually there. Currently the docs for
setting up a standby refer to 24.3.4. Recovering Using a Continuous
Archive Backup, and that step recommends emptying the contents of
pg_xlog. If this is chosen as the solution the docs should be adjusted
to recommend using pg_basebackup -x for setting up the standby.


When the backup is taken using pg_start_backup or pg_basebackup, 
minRecoveryPoint is set correctly anyway, and it's OK to clear out 
pg_xlog. It's only if you take the backup using an atomic filesystem 
snapshot, or just kill -9 the server and take a backup while it's not 
running, that we have a problem. In those scenarios, you should not 
clear pg_xlog.
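The `pg_basebackup -x` recommendation discussed above amounts to something like the following sketch (the host name, target directory, and replication user are illustrative assumptions; the command wrapper is only there so the invocation can be checked without a server):

```shell
# Set up a standby from a base backup that already bundles the WAL
# needed to reach consistency (-x), so there is no need to empty or
# re-seed pg_xlog afterwards. BASEBACKUP is overridable for testing.
BASEBACKUP=${BASEBACKUP:-pg_basebackup}

take_standby_backup() {
    # $1 = target data directory, $2 = master host
    "$BASEBACKUP" -x -D "$1" -h "$2" -U replication
}
```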


Attached is a patch for git master. The basic idea is to split 
InArchiveRecovery into two variables, InArchiveRecovery and 
ArchiveRecoveryRequested. ArchiveRecoveryRequested is set when 
recovery.conf exists. But if we don't know how far we need to recover, 
we first perform crash recovery with InArchiveRecovery=false. When we 
reach the end of WAL in pg_xlog, InArchiveRecovery is set, and we 
continue with normal archive recovery.
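A minimal sketch of how the two flags interact (the variable names follow the patch, but the helper function and its retry contract are simplified illustrations, not the backend code):

```c
#include <assert.h>
#include <stdbool.h>

/* recovery.conf was present, so archive recovery was requested */
static bool ArchiveRecoveryRequested = false;
/* we are actually restoring from the offline archive right now */
static bool InArchiveRecovery = false;

/* Called when no more valid WAL can be read from the current source.
 * If archive recovery was requested but we were still doing plain crash
 * recovery from pg_xlog, switch to the archive and tell the caller to
 * retry; otherwise there is nothing left to fall back to. */
static bool
switch_to_archive_if_requested(void)
{
	if (ArchiveRecoveryRequested && !InArchiveRecovery)
	{
		/* end of pg_xlog reached: presumably consistent from here on */
		InArchiveRecovery = true;
		return true;
	}
	return false;
}
```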



As a
related point, pointing standby setup to that section has confused at
least one of my clients. That chapter is rather scarily complicated
compared to what's usually necessary.


Yeah, it probably could use some editing, as the underlying code has 
evolved a lot since it was written. The suggestion to clear out pg_xlog 
seems like an unnecessary complication. It's safe to do so, if you 
restore with an archive, but unnecessary.


The File System Level Backup chapter 
(http://www.postgresql.org/docs/devel/static/backup-file.html) probably 
should mention pg_basebackup -x, too.


Docs patches are welcome..

- Heikki
diff --git a/src/backend/access/transam/xlog.c b/src/backend/access/transam/xlog.c
index 479c14d..b4e7830 100644
--- a/src/backend/access/transam/xlog.c
+++ b/src/backend/access/transam/xlog.c
@@ -189,7 +189,18 @@ static bool LocalHotStandbyActive = false;
  */
 static int	LocalXLogInsertAllowed = -1;
 
-/* Are we recovering using offline XLOG archives? (only valid in the startup process) */
+/*
+ * When ArchiveRecoveryRequested is set, archive recovery was
+ * requested (recovery.conf file was present). When InArchiveRecovery is set,
+ * we are currently recovering using offline XLOG archives. (these variables
+ * are only valid in the startup process).
+ *
+ * When ArchiveRecoveryRequested is true, but InArchiveRecovery is false, we're
+ * currently performing crash recovery using only XLOG files in pg_xlog, but
+ * will switch to using offline XLOG archives as soon as we reach the end of
+ * WAL in pg_xlog.
+*/
+static bool ArchiveRecoveryRequested = false;
 bool InArchiveRecovery = false;
 
 /* Was the last xlog file restored from archive, or local? */
@@ -207,10 +218,12 @@ static TimestampTz recoveryTargetTime;
 static char *recoveryTargetName;
 
 /* options taken from recovery.conf for XLOG streaming */
-bool StandbyMode = false;
+static bool StandbyModeRequested = false;
 static char *PrimaryConnInfo = NULL;
 static char *TriggerFile = NULL;
 
+bool StandbyMode = false;
+
 /* whether request for fast promotion has been made yet */
 static bool fast_promote = false;
 
@@ -3217,10 +3230,10 @@ ReadRecord(XLogReaderState *xlogreader, XLogRecPtr RecPtr, int emode,
 	private->emode = emode;
 	private->randAccess = (RecPtr != InvalidXLogRecPtr);
 
-	/* This is the first try to read this page. */
+	/* This is the first attempt to read this page. */
 	lastSourceFailed = false;
 
-	do
+	for (;;)
 	{
 		char   *errormsg;
 
@@ -3229,8 +3242,6 @@ ReadRecord(XLogReaderState *xlogreader, XLogRecPtr RecPtr, int emode,
 		EndRecPtr = xlogreader->EndRecPtr;
 		if (record == NULL)
 		{
-			lastSourceFailed = true;
-
 			if (readFile >= 0)
 			{
 close(readFile);
@@ -3247,22 +3258,16 @@ ReadRecord(XLogReaderState *xlogreader, XLogRecPtr RecPtr, int emode,
 ereport(emode_for_corrupt_record(emode,
  RecPtr ? RecPtr : EndRecPtr),
 		(errmsg_internal("%s", errormsg) /* already translated */));
-
-			/* Give up, or retry if we're in standby mode. */
-			continue;
 		}
-
 		/*
 		 * Check page TLI is one of the expected values.
 		 */
-		if (!tliInHistory(xlogreader->latestPageTLI, expectedTLEs))
+		else if (!tliInHistory(xlogreader->latestPageTLI, expectedTLEs))
 		{
 			char		fname[MAXFNAMELEN];
 			XLogSegNo segno;
 			int32 offset;
 
-			lastSourceFailed = 

Re: [HACKERS] 9.2.3 crashes during archive recovery

2013-02-14 Thread Ants Aasma
On Feb 13, 2013 10:29 PM, Heikki Linnakangas hlinnakan...@vmware.com
wrote:
 Hmm, I just realized a little problem with that approach. If you take a
base backup using an atomic filesystem backup from a running server, and
start archive recovery from that, that's essentially the same thing as
Kyotaro's test case.

Coincidentally I was wondering about the same thing when I was reviewing
our slave provisioning procedures. There didn't seem to be any
communication channel from pg_stop_backup for the slave to know when it was
safe to allow connections.

Maybe there should be some mechanism akin to backup label to communicate
the minimum recovery point? When the min recovery point file exists the
value inside it is used, when the recovery point is reached the file is
removed. pg_basebackup would just append the file to the archive. Custom
backup procedures could also use it to communicate the necessary WAL
location.
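As a sketch of what such a mechanism might look like, the reader below parses a hypothetical backup-label-style file carrying the minimum recovery point. The file name, the one-line format, and the function are invented for illustration; nothing like this exists in PostgreSQL:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

/* Parse a hypothetical min-recovery-point file holding an LSN in the
 * usual hi/lo hex notation, e.g. a single line:
 *   MIN RECOVERY POINT: 0/3000060
 * Returns false if the file is absent (imposing no extra requirement,
 * as with a missing backup label) or malformed. */
static bool
read_min_recovery_file(const char *path, unsigned int *hi, unsigned int *lo)
{
	FILE	   *fp = fopen(path, "r");
	bool		ok;

	if (fp == NULL)
		return false;
	ok = (fscanf(fp, "MIN RECOVERY POINT: %X/%X", hi, lo) == 2);
	fclose(fp);
	return ok;
}
```

A backup tool would append such a file to the archive; recovery would keep the system inconsistent until replay passes the recorded point, then remove the file.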

--
Ants Aasma


Re: [HACKERS] 9.2.3 crashes during archive recovery

2013-02-14 Thread Fujii Masao
On Thu, Feb 14, 2013 at 5:15 AM, Heikki Linnakangas
hlinnakan...@vmware.com wrote:
 On 13.02.2013 17:02, Tom Lane wrote:

 Heikki Linnakangas hlinnakan...@vmware.com  writes:

 At least in back-branches, I'd call this a pilot error. You can't turn a
 master into a standby just by creating a recovery.conf file. At least
 not if the master was not shut down cleanly first.
 ...
 I'm not sure that's worth the trouble, though. Perhaps it would be
 better to just throw an error if the control file state is
 DB_IN_PRODUCTION and a recovery.conf file exists.


 +1 for that approach, at least until it's clear there's a market for
 doing this sort of thing.  I think the error check could be
 back-patched, too.


 Hmm, I just realized a little problem with that approach. If you take a base
 backup using an atomic filesystem backup from a running server, and start
 archive recovery from that, that's essentially the same thing as Kyotaro's
 test case.

Yes. And the resource agent for streaming replication in Pacemaker (it's the
OSS clusterware) is the user of that archive recovery scenario, too. When it
starts up the server, it always creates the recovery.conf and starts the server
as the standby. It cannot start the master directly, IOW the server is always
promoted to the master from the standby. So when it starts up the server
after the server crashes, obviously it executes the same recovery scenario
(i.e., force archive recovery instead of crash one) as Kyotaro described.

The reason why that resource agent cannot start up the master directly is
that it manages three server states, called Master, Slave and Down. It can
move the server state from Down to Slave, and the reverse direction.
Also it can move the state from Slave to Master, and the reverse direction.
But there is no way to move the state between Down and Master directly.
This kind of state transition model is an isolated case in
clusterware, I think.

Regards,

-- 
Fujii Masao




Re: [HACKERS] 9.2.3 crashes during archive recovery

2013-02-14 Thread Fujii Masao
On Thu, Feb 14, 2013 at 5:52 AM, Simon Riggs si...@2ndquadrant.com wrote:
 On 13 February 2013 09:04, Heikki Linnakangas hlinnakan...@vmware.com wrote:

 Without step 3, the server would perform crash recovery, and it would work.
 But because of the recovery.conf file, the server goes into archive
 recovery, and because minRecoveryPoint is not set, it assumes that the
 system is consistent from the start.

 Aside from the immediate issue with truncation, the system really isn't
 consistent until the WAL has been replayed far enough, so it shouldn't open
 for hot standby queries. There might be other, later, changes already
 flushed to data files. The system has no way of knowing how far it needs to
 replay the WAL to become consistent.

 At least in back-branches, I'd call this a pilot error. You can't turn a
 master into a standby just by creating a recovery.conf file. At least not if
 the master was not shut down cleanly first.

 If there's a use case for doing that, maybe we can do something better in
 HEAD. If the control file says that the system was running
 (DB_IN_PRODUCTION), but there is a recovery.conf file, we could do crash
 recovery first, until we reach the end of WAL, and go into archive recovery
 mode after that. We'd recover all the WAL files in pg_xlog as far as we can,
 same as in crash recovery, and only start restoring files from the archive
 once we reach the end of WAL in pg_xlog. At that point, we'd also consider
 the system as consistent, and start up for hot standby.

 I'm not sure that's worth the trouble, though. Perhaps it would be better to
 just throw an error if the control file state is DB_IN_PRODUCTION and a
 recovery.conf file exists. The admin can always start the server normally
 first, shut it down cleanly, and then create the recovery.conf file.

 Now I've read the whole thing...

 The problem is that we startup Hot Standby before we hit the min
 recovery point because that isn't recorded. For me, the thing to do is
 to make the min recovery point == end of WAL when state is
 DB_IN_PRODUCTION. That way we don't need to do any new writes and we
 don't need to risk people seeing inconsistent results if they do this.

+1

And in the standby case, the min recovery point can be set to the end
of the WAL files located on the standby. IOW, we can regard the database as
consistent once we have replayed all the local WAL files and try to connect
to the master.

Regards,

-- 
Fujii Masao




Re: [HACKERS] 9.2.3 crashes during archive recovery

2013-02-13 Thread Heikki Linnakangas

On 13.02.2013 09:46, Kyotaro HORIGUCHI wrote:

In this case, the FINAL consistency point is at the
XLOG_SMGR_TRUNCATE record, but the current implementation does not record
the consistency point (checkpoint, commit, or smgr_truncate)
itself, so we cannot predict the final consistency point at the
start of recovery.


Hmm, what you did was basically:

1. Run server normally.
2. Kill it with pg_ctl stop -m immediate.
3. Create a recovery.conf file, turning the server into a hot standby.

Without step 3, the server would perform crash recovery, and it would 
work. But because of the recovery.conf file, the server goes into 
archive recovery, and because minRecoveryPoint is not set, it assumes 
that the system is consistent from the start.


Aside from the immediate issue with truncation, the system really isn't 
consistent until the WAL has been replayed far enough, so it shouldn't 
open for hot standby queries. There might be other, later, changes 
already flushed to data files. The system has no way of knowing how far 
it needs to replay the WAL to become consistent.


At least in back-branches, I'd call this a pilot error. You can't turn a 
master into a standby just by creating a recovery.conf file. At least 
not if the master was not shut down cleanly first.


If there's a use case for doing that, maybe we can do something better 
in HEAD. If the control file says that the system was running 
(DB_IN_PRODUCTION), but there is a recovery.conf file, we could do crash 
recovery first, until we reach the end of WAL, and go into archive 
recovery mode after that. We'd recover all the WAL files in pg_xlog as 
far as we can, same as in crash recovery, and only start restoring files 
from the archive once we reach the end of WAL in pg_xlog. At that point, 
we'd also consider the system as consistent, and start up for hot standby.


I'm not sure that's worth the trouble, though. Perhaps it would be 
better to just throw an error if the control file state is 
DB_IN_PRODUCTION and a recovery.conf file exists. The admin can always 
start the server normally first, shut it down cleanly, and then create 
the recovery.conf file.



On the other hand, updating control file on every commits or
smgr_truncate's should slow the transactions..


To be precise, we'd need to update the control file on every 
XLogFlush(), like we do during archive recovery. That would indeed be 
unacceptable from a performance point of view. Updating the control file 
that often would also be bad for robustness.


- Heikki


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] 9.2.3 crashes during archive recovery

2013-02-13 Thread Tom Lane
Heikki Linnakangas hlinnakan...@vmware.com writes:
 At least in back-branches, I'd call this a pilot error. You can't turn a 
 master into a standby just by creating a recovery.conf file. At least 
 not if the master was not shut down cleanly first.
 ...
 I'm not sure that's worth the trouble, though. Perhaps it would be 
 better to just throw an error if the control file state is 
 DB_IN_PRODUCTION and a recovery.conf file exists.

+1 for that approach, at least until it's clear there's a market for
doing this sort of thing.  I think the error check could be
back-patched, too.

regards, tom lane




Re: [HACKERS] 9.2.3 crashes during archive recovery

2013-02-13 Thread Simon Riggs
On 13 February 2013 09:04, Heikki Linnakangas hlinnakan...@vmware.com wrote:

 To be precise, we'd need to update the control file on every XLogFlush(),
 like we do during archive recovery. That would indeed be unacceptable from a
 performance point of view. Updating the control file that often would also
 be bad for robustness.

If those arguments make sense, then why don't they apply to recovery as well?

It sounds like we need to look at something better for use during
archive recovery.

-- 
 Simon Riggs   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services




Re: [HACKERS] 9.2.3 crashes during archive recovery

2013-02-13 Thread Tom Lane
Simon Riggs si...@2ndquadrant.com writes:
 On 13 February 2013 09:04, Heikki Linnakangas hlinnakan...@vmware.com wrote:
 To be precise, we'd need to update the control file on every XLogFlush(),
 like we do during archive recovery. That would indeed be unacceptable from a
 performance point of view. Updating the control file that often would also
 be bad for robustness.

 If those arguments make sense, then why don't they apply to recovery as well?

In plain old crash recovery, don't the checks on whether to apply WAL
records based on LSN take care of this?

regards, tom lane




Re: [HACKERS] 9.2.3 crashes during archive recovery

2013-02-13 Thread Heikki Linnakangas

On 13.02.2013 20:25, Simon Riggs wrote:

On 13 February 2013 09:04, Heikki Linnakangas hlinnakan...@vmware.com wrote:


To be precise, we'd need to update the control file on every XLogFlush(),
like we do during archive recovery. That would indeed be unacceptable from a
performance point of view. Updating the control file that often would also
be bad for robustness.


If those arguments make sense, then why don't they apply to recovery as well?


To some degree, they do. The big difference is that during normal 
operation, every commit is XLogFlushed(). During recovery, XLogFlush() 
happens much less frequently - certainly not after replaying each commit 
record.



It sounds like we need to look at something better for use during
archive recovery.


Well, no-one's complained about the performance. From a robustness point 
of view, it might be good to keep the minRecoveryPoint value in a 
separate file, for example, to avoid rewriting the control file that 
often. Then again, why fix it when it's not broken.


- Heikki




Re: [HACKERS] 9.2.3 crashes during archive recovery

2013-02-13 Thread Tom Lane
Heikki Linnakangas hlinnakan...@vmware.com writes:
 Well, no-one's complained about the performance. From a robustness point 
 of view, it might be good to keep the minRecoveryPoint value in a 
 separate file, for example, to avoid rewriting the control file that 
 often. Then again, why fix it when it's not broken.

It would only be broken if someone interrupted a crash recovery
mid-flight and tried to establish a recovery stop point before the end
of WAL, no?  Why don't we just forbid that case?  This would either be
the same as, or a small extension of, the pg_control state vs existence
of recovery.conf error check that was just discussed.

regards, tom lane




Re: [HACKERS] 9.2.3 crashes during archive recovery

2013-02-13 Thread Heikki Linnakangas

On 13.02.2013 21:21, Tom Lane wrote:

Heikki Linnakangas hlinnakan...@vmware.com writes:

Well, no-one's complained about the performance. From a robustness point
of view, it might be good to keep the minRecoveryPoint value in a
separate file, for example, to avoid rewriting the control file that
often. Then again, why fix it when it's not broken.


It would only be broken if someone interrupted a crash recovery
mid-flight and tried to establish a recovery stop point before the end
of WAL, no?  Why don't we just forbid that case? This would either be
the same as, or a small extension of, the pg_control state vs existence
of recovery.conf error check that was just discussed.


The problem is when you interrupt archive recovery (kill -9), and 
restart. After restart, the system needs to know how far the WAL was 
replayed before the crash, because it must not open for hot standby 
queries, or allow the database to be started up in master-mode, until 
it's replayed the WAL up to that same point again.


- Heikki




Re: [HACKERS] 9.2.3 crashes during archive recovery

2013-02-13 Thread Heikki Linnakangas

On 13.02.2013 21:03, Tom Lane wrote:

Simon Riggs si...@2ndquadrant.com writes:

On 13 February 2013 09:04, Heikki Linnakangas hlinnakan...@vmware.com wrote:

To be precise, we'd need to update the control file on every XLogFlush(),
like we do during archive recovery. That would indeed be unacceptable from a
performance point of view. Updating the control file that often would also
be bad for robustness.



If those arguments make sense, then why don't they apply to recovery as well?


In plain old crash recovery, don't the checks on whether to apply WAL
records based on LSN take care of this?


The problem we're trying to solve is determining how much WAL needs to 
be replayed until the database is consistent again. In crash recovery, 
the answer is all of it. That's why the CRC in the WAL is essential; 
it's required to determine where the WAL ends. But if we had some other 
mechanism, like if we updated minRecoveryPoint after every XLogFlush() 
like Simon suggested, we wouldn't necessarily need the CRC to detect end 
of WAL (not that I'd suggest removing it anyway), and we could throw an 
error if there is a corrupt bit somewhere in the WAL before the true end 
of WAL.


In archive recovery, we can't just say "replay all the WAL", because the 
whole idea of PITR is to not recover all the WAL. So we use 
minRecoveryPoint to keep track of how far the WAL needs to be replayed 
at a minimum, for the database to be consistent.


- Heikki




Re: [HACKERS] 9.2.3 crashes during archive recovery

2013-02-13 Thread Tom Lane
Heikki Linnakangas hlinnakan...@vmware.com writes:
 On 13.02.2013 21:21, Tom Lane wrote:
 It would only be broken if someone interrupted a crash recovery
 mid-flight and tried to establish a recovery stop point before the end
 of WAL, no?  Why don't we just forbid that case? This would either be
 the same as, or a small extension of, the pg_control state vs existence
 of recovery.conf error check that was just discussed.

 The problem is when you interrupt archive recovery (kill -9), and 
 restart. After restart, the system needs to know how far the WAL was 
 replayed before the crash, because it must not open for hot standby 
 queries, or allow the database to be started up in master-mode, until 
 it's replayed the WAL up to that same point again.

Well, archive recovery is a different scenario --- Simon was questioning
whether we need a minRecoveryPoint mechanism in crash recovery, or at
least that's what I thought he asked.

regards, tom lane




Re: [HACKERS] 9.2.3 crashes during archive recovery

2013-02-13 Thread Heikki Linnakangas

On 13.02.2013 21:30, Tom Lane wrote:

Heikki Linnakangas hlinnakan...@vmware.com writes:

On 13.02.2013 21:21, Tom Lane wrote:

It would only be broken if someone interrupted a crash recovery
mid-flight and tried to establish a recovery stop point before the end
of WAL, no?  Why don't we just forbid that case? This would either be
the same as, or a small extension of, the pg_control state vs existence
of recovery.conf error check that was just discussed.



The problem is when you interrupt archive recovery (kill -9), and
restart. After restart, the system needs to know how far the WAL was
replayed before the crash, because it must not open for hot standby
queries, or allow the database to be started up in master-mode, until
it's replayed the WAL up to that same point again.


Well, archive recovery is a different scenario --- Simon was questioning
whether we need a minRecoveryPoint mechanism in crash recovery, or at
least that's what I thought he asked.


Ah, ok. The short answer to that is no, because in crash recovery, we 
just replay the WAL all the way to the end. I thought he was questioning 
updating the control file at every XLogFlush() during archive recovery. 
The answer to that is that it's not so bad, because XLogFlush() is 
called so infrequently during recovery.


- Heikki




Re: [HACKERS] 9.2.3 crashes during archive recovery

2013-02-13 Thread Tom Lane
Heikki Linnakangas hlinnakan...@vmware.com writes:
 The problem we're trying to solve is determining how much WAL needs to 
 be replayed until the database is consistent again. In crash recovery, 
 the answer is all of it. That's why the CRC in the WAL is essential; 
 it's required to determine where the WAL ends. But if we had some other 
 mechanism, like if we updated minRecoveryPoint after every XLogFlush() 
 like Simon suggested, we wouldn't necessarily need the CRC to detect end 
 of WAL (not that I'd suggest removing it anyway), and we could throw an 
 error if there is a corrupt bit somewhere in the WAL before the true end 
 of WAL.

Meh.  I think that would be disastrous from both performance and
reliability standpoints.  (Performance because the whole point of WAL is
to commit with only one disk write in one place, and reliability because
of greatly increasing the number of writes to the utterly-critical
pg_control file.)

regards, tom lane




Re: [HACKERS] 9.2.3 crashes during archive recovery

2013-02-13 Thread Tom Lane
Heikki Linnakangas hlinnakan...@vmware.com writes:
 On 13.02.2013 21:30, Tom Lane wrote:
 Well, archive recovery is a different scenario --- Simon was questioning
 whether we need a minRecoveryPoint mechanism in crash recovery, or at
 least that's what I thought he asked.

 Ah, ok. The short answer to that is no, because in crash recovery, we 
 just replay the WAL all the way to the end. I thought he was questioning 
 updating the control file at every XLogFlush() during archive recovery. 
 The answer to that is that it's not so bad, because XLogFlush() is 
 called so infrequently during recovery.

Right, and it's not so evil from a reliability standpoint either, partly
because of that and partly because, by definition, this isn't your only
copy of the data.

regards, tom lane




Re: [HACKERS] 9.2.3 crashes during archive recovery

2013-02-13 Thread Heikki Linnakangas

On 13.02.2013 17:02, Tom Lane wrote:

Heikki Linnakangas hlinnakan...@vmware.com writes:

At least in back-branches, I'd call this a pilot error. You can't turn a
master into a standby just by creating a recovery.conf file. At least
not if the master was not shut down cleanly first.
...
I'm not sure that's worth the trouble, though. Perhaps it would be
better to just throw an error if the control file state is
DB_IN_PRODUCTION and a recovery.conf file exists.


+1 for that approach, at least until it's clear there's a market for
doing this sort of thing.  I think the error check could be
back-patched, too.


Hmm, I just realized a little problem with that approach. If you take a 
base backup using an atomic filesystem backup from a running server, and 
start archive recovery from that, that's essentially the same thing as 
Kyotaro's test case.


- Heikki




Re: [HACKERS] 9.2.3 crashes during archive recovery

2013-02-13 Thread Simon Riggs
On 13 February 2013 09:04, Heikki Linnakangas hlinnakan...@vmware.com wrote:

 Without step 3, the server would perform crash recovery, and it would work.
 But because of the recovery.conf file, the server goes into archive
 recovery, and because minRecoveryPoint is not set, it assumes that the
 system is consistent from the start.

 Aside from the immediate issue with truncation, the system really isn't
 consistent until the WAL has been replayed far enough, so it shouldn't open
 for hot standby queries. There might be other, later, changes already
 flushed to data files. The system has no way of knowing how far it needs to
 replay the WAL to become consistent.

 At least in back-branches, I'd call this a pilot error. You can't turn a
 master into a standby just by creating a recovery.conf file. At least not if
 the master was not shut down cleanly first.

 If there's a use case for doing that, maybe we can do something better in
 HEAD. If the control file says that the system was running
 (DB_IN_PRODUCTION), but there is a recovery.conf file, we could do crash
 recovery first, until we reach the end of WAL, and go into archive recovery
 mode after that. We'd recover all the WAL files in pg_xlog as far as we can,
 same as in crash recovery, and only start restoring files from the archive
 once we reach the end of WAL in pg_xlog. At that point, we'd also consider
 the system as consistent, and start up for hot standby.

 I'm not sure that's worth the trouble, though. Perhaps it would be better to
 just throw an error if the control file state is DB_IN_PRODUCTION and a
 recovery.conf file exists. The admin can always start the server normally
 first, shut it down cleanly, and then create the recovery.conf file.

Now I've read the whole thing...

The problem is that we startup Hot Standby before we hit the min
recovery point because that isn't recorded. For me, the thing to do is
to make the min recovery point == end of WAL when state is
DB_IN_PRODUCTION. That way we don't need to do any new writes and we
don't need to risk people seeing inconsistent results if they do this.

But I think that still gives you a timeline problem when putting a
master back into a standby.

-- 
 Simon Riggs   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services

