Re: [HACKERS] Failing start-up archive recovery at Standby mode in PG9.2.4

2013-05-07 Thread Heikki Linnakangas
On 26.04.2013 11:51, KONDO Mitsumasa wrote: Hi, I discovered the cause of the problem. I think that Horiguchi's discovery is a different problem... The problem is in CreateRestartPoint. In recovery mode, PG should not write WAL records, because PG does not know the latest WAL file location. But in this problem case,

Re: [HACKERS] Failing start-up archive recovery at Standby mode in PG9.2.4

2013-05-07 Thread KONDO Mitsumasa
(2013/05/07 22:40), Heikki Linnakangas wrote: On 26.04.2013 11:51, KONDO Mitsumasa wrote: So I fixed CreateRestartPoint at the branching point of executing MinRecoveryPoint. It seems to fix this problem, but the attached patch is plain. I didn't understand this. I committed a fix for the issue where

Re: [HACKERS] Failing start-up archive recovery at Standby mode in PG9.2.4

2013-04-26 Thread Heikki Linnakangas
On 26.04.2013 07:02, Kyotaro HORIGUCHI wrote: I am a bit uncertain whether it is necessary to move curFileTLI to anywhere randomly read. At a short glance, the random access also occurs when reading checkpoint-related records. I didn't understand that. Also I don't have clear distinction

Re: [HACKERS] Failing start-up archive recovery at Standby mode in PG9.2.4

2013-04-26 Thread Amit Langote
What would happen if a recycled segment gets assigned a newer timeline than the one we are currently recovering from? In the reported erroneous behavior, that is what happens causing XLogFileReadAnyTLI() to return such bogus segment causing the error. Since, expectedTLIs contains a newer timeline
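
The lookup behaviour described in this message can be modelled roughly as below. This is a simplified Python sketch, not the PostgreSQL source: `find_segment_any_tli`, `segment_name`, the simplified file-name layout, and the rule for skipping older timelines are all assumptions made for illustration. It tries the expected timelines from newest to oldest and returns the first segment file that exists, which shows how a recycled file named after a newer timeline could be chosen ahead of the valid one.

```python
def segment_name(tli, segno):
    # Hypothetical simplified naming: timeline ID followed by a segment number.
    return f"{tli:08X}{segno:016X}"


def find_segment_any_tli(existing_files, expected_tlis, segno, cur_file_tli=0):
    """Return (tli, name) for the first existing segment, newest timeline first."""
    for tli in sorted(expected_tlis, reverse=True):
        if tli < cur_file_tli:
            continue  # assumed rule: never fall back below the current file's timeline
        name = segment_name(tli, segno)
        if name in existing_files:
            return tli, name
    return None


# Illustrative state: a valid segment on timeline 2 and a recycled file
# that was renamed with timeline 3 but still holds old contents.
files = {segment_name(2, 0x14), segment_name(3, 0x14)}
picked = find_segment_any_tli(files, [1, 2, 3], 0x14, cur_file_tli=2)
print(picked)
```

In this model the timeline 3 file wins simply because it exists and is checked first; nothing in the search looks at the file's contents.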

Re: [HACKERS] Failing start-up archive recovery at Standby mode in PG9.2.4

2013-04-26 Thread Heikki Linnakangas
On 26.04.2013 07:47, Amit Langote wrote: How would code after applying this patch behave if a recycled segment gets renamed using the newest timeline (say 3) while we are still recovering from a lower timeline (say 2)? In that case, since XLogFileReadAnyTLI returns that recycled segment as the

Re: [HACKERS] Failing start-up archive recovery at Standby mode in PG9.2.4

2013-04-26 Thread KONDO Mitsumasa
Hi, I discovered the cause of the problem. I think that Horiguchi's discovery is a different problem... The problem is in CreateRestartPoint. In recovery mode, PG should not write WAL records, because PG does not know the latest WAL file location. But in this problem case, PG (standby) writes a WAL file at RestartPoint in

Re: [HACKERS] Failing start-up archive recovery at Standby mode in PG9.2.4

2013-04-26 Thread Mitsumasa KONDO
I will explain this problem in more detail. The problem was caused by RestartPoint creating an illegal WAL file during archive recovery. But I cannot see why the illegal WAL file was created in CreateRestartPoint(). My attached patch is really plain… In the problem case, at XLogFileReadAnyTLI(),

Re: [HACKERS] Failing start-up archive recovery at Standby mode in PG9.2.4

2013-04-25 Thread Amit Langote
I also had a similar observation when I could reproduce this. I tried to find why restartpoint causes the recycled segment to be named after timeline 3, but have not been able to determine that. When I looked at the source, I found that, the function XLogFileReadAnyTLI which returns a segment

Re: [HACKERS] Failing start-up archive recovery at Standby mode in PG9.2.4

2013-04-25 Thread Kyotaro HORIGUCHI
Hmm. I think that I caught the tail of the problem.. Script with small change reproduced the situation for me. The latest standby uses 3 as its TLI after the history file 0..3.history which could get from the archive. So the WAL files recycled on this standby will have the TLI=3. Nevertheless
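
To make the point about the TLI of recycled files concrete, here is a small sketch of how a timeline ID is carried in a WAL segment file name. It assumes the conventional 24-hex-digit layout (8 digits each for timeline, log ID and segment); the helper names and the numbers used are hypothetical and are not taken from the thread.

```python
def wal_file_name(tli, log_id, seg):
    # Conventional layout: 8 hex digits each for timeline, log ID and segment.
    return f"{tli:08X}{log_id:08X}{seg:08X}"


def timeline_of(file_name):
    # The timeline ID is the first 8 hex digits of the name.
    return int(file_name[:8], 16)


def recycle(old_name, current_tli, new_log_id, new_seg):
    # Recycling only renames the file for future use; in this model the old
    # contents (and therefore the old page headers) are left untouched.
    return wal_file_name(current_tli, new_log_id, new_seg)


old = wal_file_name(2, 0, 0x0A)        # illustrative old segment on timeline 2
new = recycle(old, current_tli=3, new_log_id=0, new_seg=0x13)
print(old, "->", new, "timeline", timeline_of(new))
```

The renamed file reports timeline 3 by name alone, even though it was originally written on timeline 2.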

Re: [HACKERS] Failing start-up archive recovery at Standby mode in PG9.2.4

2013-04-25 Thread Heikki Linnakangas
On 25.04.2013 17:55, Kyotaro HORIGUCHI wrote: Hmm. I think that I caught the tail of the problem.. Script with small change reproduced the situation for me. Can you share the modified script, please? The latest standby uses 3 as its TLI after the history file 0..3.history which could get

Re: [HACKERS] Failing start-up archive recovery at Standby mode in PG9.2.4

2013-04-25 Thread Kyotaro HORIGUCHI
I forgot it. In conclusion, the standby should name the recycled WAL segment using the same TLI for the LSN on the master. Or should never recycle WAL files. Or the standby should make the request with correct TLI at first consulting the timeline history. Or the standby should make retry with

Re: [HACKERS] Failing start-up archive recovery at Standby mode in PG9.2.4

2013-04-25 Thread Kyotaro HORIGUCHI
Can you share the modified script, please? Please find the attached files: test.sh : test script. most significant change is the load. I used simple insert instead of pgbench. It might need some more adjustment for other environment as my usual.

Re: [HACKERS] Failing start-up archive recovery at Standby mode in PG9.2.4

2013-04-25 Thread Heikki Linnakangas
On 25.04.2013 18:56, Kyotaro HORIGUCHI wrote: Can you share the modified script, please? Please find the attached files: test.sh : test script. most significant change is the load. I used simple insert instead of pgbench. It might need some more adjustment

Re: [HACKERS] Failing start-up archive recovery at Standby mode in PG9.2.4

2013-04-25 Thread Kyotaro HORIGUCHI
Thank you for the patch. The test script finishes successfully with it, and it looks reasonable at a short glance. On Fri, Apr 26, 2013 at 4:34 AM, Heikki Linnakangas hlinnakan...@vmware.com wrote: One idea to fix this is to not set curFileTLI, until the page header on the just-opened file has
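
The quoted idea is cut off in this listing, so the exact condition is not known. The sketch below assumes it means "adopt the new timeline only once the first page header of the opened file has passed validation". `ReaderState`, `open_segment` and the boolean `header_valid` are hypothetical names for illustration, not part of the actual patch.

```python
from dataclasses import dataclass


@dataclass
class ReaderState:
    cur_file_tli: int = 0  # timeline of the last segment accepted for reading


def open_segment(state, candidate_tli, header_valid):
    """Adopt candidate_tli only if the opened file's page header was validated.

    header_valid stands in for whatever header check the real code performs
    (assumption); a stale recycled file would fail it.
    """
    if not header_valid:
        return False                     # leave cur_file_tli untouched
    state.cur_file_tli = candidate_tli   # commit only after validation
    return True


state = ReaderState(cur_file_tli=2)
open_segment(state, candidate_tli=3, header_valid=False)  # recycled, stale file
print(state.cur_file_tli)
```

Under this assumption, a bogus file named after timeline 3 no longer advances the reader's notion of the current timeline, so a later retry can still fall back to the timeline 2 segment.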

Re: [HACKERS] Failing start-up archive recovery at Standby mode in PG9.2.4

2013-04-25 Thread Amit Langote
How would code after applying this patch behave if a recycled segment gets renamed using the newest timeline (say 3) while we are still recovering from a lower timeline (say 2)? In that case, since XLogFileReadAnyTLI returns that recycled segment as the next segment to recover from, we get the

Re: [HACKERS] Failing start-up archive recovery at Standby mode in PG9.2.4

2013-04-24 Thread Andres Freund
Hello, On 2013-04-24 17:43:39 +0900, KONDO Mitsumasa wrote: Hi, I found a problem with failing start-up archive recovery in Standby mode in PG9.2.4 streaming replication. I tested the same scenario in PG9.2.3, but it did not occur... cp: cannot stat `../arc/00030013':

Re: [HACKERS] Failing start-up archive recovery at Standby mode in PG9.2.4

2013-04-24 Thread Kyotaro HORIGUCHI
Hello, cp: cannot stat `../arc/00030013': そのようなファイルやディレクトリはありません [Standby] 2013-04-22 01:27:25 EDTLOG: 0: restored log file 00020013 from archive I can't read the error message here, but this looks suspicious. The error message is No such file or

Re: [HACKERS] Failing start-up archive recovery at Standby mode in PG9.2.4

2013-04-24 Thread Kyotaro HORIGUCHI
Sorry, caller XLogFileOpen successfully ets and returns fd for the filename The caller is XLogFileRead in this case. # and 'ets' is gets, of course. regards, -- Kyotaro Horiguchi

Re: [HACKERS] Failing start-up archive recovery at Standby mode in PG9.2.4

2013-04-24 Thread Andres Freund
On 2013-04-24 19:16:12 +0900, Kyotaro HORIGUCHI wrote: Hello, cp: cannot stat `../arc/00030013': そのようなファイルやディレクトリはありません [Standby] 2013-04-22 01:27:25 EDTLOG: 0: restored log file 00020013 from archive I can't read the error message here, but

Re: [HACKERS] Failing start-up archive recovery at Standby mode in PG9.2.4

2013-04-24 Thread Kyotaro HORIGUCHI
Oops, But that's not what is happening here, afaics the restore log file ... message is only printed if the return code is 0. You're right. 'cp nonexistent somewhere' exits with the status code 1 (or 256?). The quoted log lines simply show that segment for tli=3 did not exist and that for tli=2
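
The "1 (or 256?)" question comes down to the difference between a process's exit code and the raw wait status that a `system()`-style call returns on POSIX systems. The following sketch illustrates this on a POSIX system, using a shell `exit 1` as a stand-in for a failing `cp`; `run_and_decode` is a hypothetical helper.

```python
import os


def run_and_decode(command):
    """Run command through the shell; return (raw wait status, exit code)."""
    raw = os.system(command)  # on POSIX this is a wait status, not the exit code
    code = os.WEXITSTATUS(raw) if os.WIFEXITED(raw) else None
    return raw, code


raw, code = run_and_decode("exit 1")  # stand-in for a command that fails with 1
print(raw, code)
```

On Linux the raw status carries the exit code in its high byte, so an exit code of 1 shows up as a raw value of 256 until it is decoded.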

Re: [HACKERS] Failing start-up archive recovery at Standby mode in PG9.2.4

2013-04-24 Thread Kyotaro HORIGUCHI
I had a brief look at it and came up with a hypothesis.. umm, or a scenario. == Just after a restartpoint, too-old xlog files are recycled, but their page headers still have the old content, specifically xlogid and xrecoff. Plus, if the master's LSN is at the head of a new segment file, the file for the segment
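
A minimal sketch of the stale-header part of this hypothesis follows. It assumes a much simplified page header holding only a timeline and the page address (`xlogid`, `xrecoff`); the `PageHeader` class, `header_matches` function and the numeric values are hypothetical and only illustrate the comparison.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PageHeader:
    tli: int      # timeline recorded in the page header
    xlogid: int   # high part of the page address
    xrecoff: int  # low part of the page address


def header_matches(header, expected_xlogid, expected_xrecoff):
    # A recycled file keeps its old header, so its recorded address will not
    # match the position the reader expects for the segment's new name.
    return (header.xlogid, header.xrecoff) == (expected_xlogid, expected_xrecoff)


stale = PageHeader(tli=2, xlogid=0, xrecoff=0x0A000000)  # left over from old use
fresh = PageHeader(tli=3, xlogid=0, xrecoff=0x13000000)  # actually written here

print(header_matches(stale, 0, 0x13000000))
print(header_matches(fresh, 0, 0x13000000))
```

A check like this distinguishes a recycled file from a genuinely written one, regardless of what the file is named.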

Re: [HACKERS] Failing start-up archive recovery at Standby mode in PG9.2.4

2013-04-24 Thread KONDO Mitsumasa
Hi, I found a problem with failing start-up archive recovery in Standby mode in PG9.2.4 streaming replication. I tested the same scenario in PG9.2.3, but it did not occur... cp: cannot stat `../arc/00030013': そのようなファイルやディレクトリはありません (No such file or directory) [Standby] 2013-04-22 01:27:25 EDTLOG: 0: