Re: [HACKERS] Streaming replication, retrying from archive

2010-01-21 Thread Mark Kirkwood
Dimitri Fontaine wrote:
> Heikki Linnakangas writes:
>> Yeah, a lot of that logic and states is completely unnecessary until we
>> have a synchronous mode. Even then, it seems complex.
> I hope we'll find something less complex, what I proposed is heavily inspired from londiste (Skytools) table

Re: [HACKERS] Streaming replication, retrying from archive

2010-01-21 Thread Dimitri Fontaine
Heikki Linnakangas writes:
> Yeah, a lot of that logic and states is completely unnecessary until we
> have a synchronous mode. Even then, it seems complex.
I hope we'll find something less complex, what I proposed is heavily inspired from londiste (Skytools) table addition to a replication set (

Re: [HACKERS] Streaming replication, retrying from archive

2010-01-20 Thread Simon Riggs
On Wed, 2010-01-20 at 21:26 +0200, Heikki Linnakangas wrote:
> So there's just two states:
>
> 1. Recovering from archive
> 2. Streaming
>
> We start from 1, and switch state at error.
>
> This gives nice behavior from a user point of view. Standby tries to
> make progress using either the arch
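
The two-state scheme quoted above (start in state 1, switch state at error) can be sketched roughly as follows. This is only an illustration of the transition rule as stated in the message, not PostgreSQL code: the names StandbyState, next_state and run are hypothetical, and an "error" is modelled as a plain boolean rather than a real restore_command or walreceiver failure.

```python
from enum import Enum


class StandbyState(Enum):
    # The two states named in the quoted message.
    RECOVERING_FROM_ARCHIVE = 1
    STREAMING = 2


def next_state(state, step_failed):
    """Stay in the current state on success, flip to the other one on error."""
    if not step_failed:
        return state
    if state is StandbyState.RECOVERING_FROM_ARCHIVE:
        return StandbyState.STREAMING
    return StandbyState.RECOVERING_FROM_ARCHIVE


def run(outcomes, start=StandbyState.RECOVERING_FROM_ARCHIVE):
    """Replay a sequence of step outcomes (True means the step hit an error).

    Returns the list of states visited, beginning with the start state.
    """
    state = start
    trace = [state]
    for failed in outcomes:
        state = next_state(state, failed)
        trace.append(state)
    return trace


if __name__ == "__main__":
    # Archive step succeeds, then fails; streaming succeeds, then fails.
    for s in run([False, True, False, True]):
        print(s.name)
```

Under these assumptions the standby alternates between the two sources each time the current one fails, which is the "switch state at error" behavior the message describes.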

Re: [HACKERS] Streaming replication, retrying from archive

2010-01-20 Thread Heikki Linnakangas
Dimitri Fontaine wrote:
> Heikki Linnakangas writes:
>> 1. Initial archive recovery. Standby fetches WAL files from archive
>> using restore_command. When a file is not found in archive, we start
>> walreceiver and switch to state 2
>>
>> 2. Retrying to restore from archive. When the connection to

Re: [HACKERS] Streaming replication, retrying from archive

2010-01-16 Thread Dimitri Fontaine
Thanks for stating it this way, it really helps figuring out what is it we're talking about!
Heikki Linnakangas writes:
> The states with my suggested ReadRecord/FetchRecord refactoring, the
> code I have in the replication-xlogrefactor branch in my git repo,
> are:
They look like you're trying

Re: [HACKERS] Streaming replication, retrying from archive

2010-01-15 Thread Simon Riggs
On Fri, 2010-01-15 at 20:11 +0200, Heikki Linnakangas wrote:
> The states we have at the moment in standby are:
>
> 1. Archive recovery. Standby fetches WAL files from archive using
> restore_command. When a file is not found in archive, we switch to state 2
>
> 2. Streaming replication. Standby

Re: [HACKERS] Streaming replication, retrying from archive

2010-01-15 Thread Heikki Linnakangas
Dimitri Fontaine wrote:
> But how we handle failures when transitioning from one state to the
> other should be a lot easier to discuss and decide as soon as we have
> the possible states and the transitions we want to allow and support. I
> think.
>
> My guess is that those states and transitions

Re: [HACKERS] Streaming replication, retrying from archive

2010-01-14 Thread Fujii Masao
On Fri, Jan 15, 2010 at 7:19 AM, Heikki Linnakangas wrote:
> Let's introduce a new boolean variable in shared memory that the
> walreceiver can set to tell startup process if it's connected or
> streaming, or disconnected. When startup process sees that walreceiver
> is connected, it waits for rec

Re: [HACKERS] Streaming replication, retrying from archive

2010-01-14 Thread Heikki Linnakangas
Fujii Masao wrote:
> On Fri, Jan 15, 2010 at 12:23 AM, Heikki Linnakangas
> wrote:
>> If we don't fix that within the server, we will need to document that
>> caveat and every installation will need to work around that one way or
>> another. Maybe with some monitoring software and an automatic res

Re: [HACKERS] Streaming replication, retrying from archive

2010-01-14 Thread Tom Lane
Robert Haas writes:
> I'm thinking that HS+SR are going to be a bit like the Windows port -
> they're going to require a few releases before they really work as
> well as we'd like them too.
I've assumed that from the get-go ;-). It's one of the reasons that we ought to label this release 9.0 if

Re: [HACKERS] Streaming replication, retrying from archive

2010-01-14 Thread Robert Haas
On Thu, Jan 14, 2010 at 10:23 AM, Heikki Linnakangas wrote:
> I wasn't really asking if it's possible to fix, I meant "Let's think
> about *how* to fix that".
Well... maybe if it doesn't require too MUCH thought. I'm thinking that HS+SR are going to be a bit like the Windows port - they're goin

Re: [HACKERS] Streaming replication, retrying from archive

2010-01-14 Thread Dimitri Fontaine
Fujii Masao writes:
> On Fri, Jan 15, 2010 at 1:06 AM, Dimitri Fontaine
> wrote:
>> 0. base: slave asks the master for a base-backup, at the end of this it
>> reaches the base-lsn
>
> What if the WAL file including the archive recovery starting location has
> been removed from the primary's

Re: [HACKERS] Streaming replication, retrying from archive

2010-01-14 Thread Fujii Masao
On Fri, Jan 15, 2010 at 1:06 AM, Dimitri Fontaine wrote:
> Did I mention my viewpoint on that already?
> http://archives.postgresql.org/pgsql-hackers/2009-07/msg00943.php
> 0. base: slave asks the master for a base-backup, at the end of this it
> reaches the base-lsn
What if the WAL file i

Re: [HACKERS] Streaming replication, retrying from archive

2010-01-14 Thread Fujii Masao
On Fri, Jan 15, 2010 at 12:23 AM, Heikki Linnakangas wrote:
> If we don't fix that within the server, we will need to document that
> caveat and every installation will need to work around that one way or
> another. Maybe with some monitoring software and an automatic restart. Ugh.
>
> I wasn't re

Re: [HACKERS] Streaming replication, retrying from archive

2010-01-14 Thread Dimitri Fontaine
Heikki Linnakangas writes:
> If we don't fix that within the server, we will need to document that
> caveat and every installation will need to work around that one way or
> another. Maybe with some monitoring software and an automatic restart. Ugh.
>
> I wasn't really asking if it's possible to f

Re: [HACKERS] Streaming replication, retrying from archive

2010-01-14 Thread Heikki Linnakangas
Magnus Hagander wrote:
> On Thu, Jan 14, 2010 at 15:36, Robert Haas wrote:
>> On Thu, Jan 14, 2010 at 9:15 AM, Heikki Linnakangas
>> wrote:
>>> Imagine this scenario:
>>>
>>> 1. Master is up and running, standby is connected and streaming happily
>>> 2. Network goes down, connection is broken.
>>

Re: [HACKERS] Streaming replication, retrying from archive

2010-01-14 Thread Magnus Hagander
On Thu, Jan 14, 2010 at 15:36, Robert Haas wrote:
> On Thu, Jan 14, 2010 at 9:15 AM, Heikki Linnakangas
> wrote:
>> Imagine this scenario:
>>
>> 1. Master is up and running, standby is connected and streaming happily
>> 2. Network goes down, connection is broken.
>> 3. Standby falls behind a lot.

Re: [HACKERS] Streaming replication, retrying from archive

2010-01-14 Thread Robert Haas
On Thu, Jan 14, 2010 at 9:15 AM, Heikki Linnakangas wrote:
> Imagine this scenario:
>
> 1. Master is up and running, standby is connected and streaming happily
> 2. Network goes down, connection is broken.
> 3. Standby falls behind a lot. Old WAL files that the standby needs are
> archived, and de

[HACKERS] Streaming replication, retrying from archive

2010-01-14 Thread Heikki Linnakangas
Imagine this scenario:
1. Master is up and running, standby is connected and streaming happily
2. Network goes down, connection is broken.
3. Standby falls behind a lot. Old WAL files that the standby needs are
archived, and deleted from master.
4. Network is restored. Standby reconnects
5. Standb