Re: [HACKERS] xlog.c: WALInsertLock vs. WALWriteLock

2010-10-26 Thread fazool mein
 Might I suggest adopting the same technique walsender does, ie just read
 the data back from disk?  There's a reason why we gave up trying to have
 walsender read directly from the buffers.


That is exactly what I do not want to do, i.e. read from disk, as long as
the piece of WAL is available in the buffers. Can you please describe why
walsender reading directly from the buffers was given up? To avoid a lot of
locking?
The locking issue might not be a problem with synchronous replication. In
synchronous replication, the primary will wait anyway for the standby to
send a confirmation before it can do more WAL inserts. Hence, reading from
buffers might be better in this case.

So, as I understand from the emails, we need to lock both WALWriteLock and
WALInsertLock in exclusive mode for reading from buffers. Agreed?

Thanks.


Re: [HACKERS] xlog.c: WALInsertLock vs. WALWriteLock

2010-10-26 Thread fazool mein
On Tue, Oct 26, 2010 at 11:23 AM, Robert Haas robertmh...@gmail.com wrote:

 On Tue, Oct 26, 2010 at 2:13 PM, Heikki Linnakangas
 heikki.linnakan...@enterprisedb.com wrote:
 
  Can you please describe why
  walsender reading directly from the buffers was given up? To avoid a lot
  of locking?
 
  To avoid locking yes, and complexity in general.

 And the fact that it might allow the standby to get ahead of the
 master, leading to silent database corruption.


I agree that the standby might get ahead, but this doesn't necessarily lead
to database corruption. Here, the interesting case is what happens when the
primary fails, which can lead to *either* of the following two cases:
1) The standby, due to some triggering mechanism, becomes the new primary.
In this case, even if the standby was ahead, it's fine.
2) The primary comes back as primary. In this case, the standby will connect
again to the primary. At this point, *if* somehow we are able to detect that
the standby is ahead, then we should abort the standby and create a standby
from scratch.

I agree with Heikki that going through all this trouble only makes sense if
there is a huge performance boost.


[HACKERS] xlog.c: WALInsertLock vs. WALWriteLock

2010-10-22 Thread fazool mein
Hello guys,

I'm writing a function that will read data from the buffer in xlog (i.e.
from XLogCtl->pages and XLogCtl->xlblocks). I want to make sure that I am
doing it correctly.
For reading from the buffer, do I need to hold WALInsertLock or
WALWriteLock? Also, could you explain the usage of 'LW_SHARED'? Can we
use it for read-only access?

Thanks a lot.


Re: [HACKERS] Timeline in the light of Synchronous replication

2010-10-18 Thread fazool mein
I believe we should come up with a universal solution that will solve
potential future problems as well (for example, if in sync replication, we
decide to send writes to standbys in parallel to writing on local disk).

The ideal thing would be to have an id that is incremented on every failure,
and is stored in the WAL. Whenever a standby connects to the primary, it
should send the point p in WAL where streaming should start, plus the id. If
the id is the same at the primary at point p, things are good. Else, we
should tell the standby to either create a new copy from scratch, or delete
some WALs.
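
A minimal sketch of the check described above, under stated assumptions. The
names FailureEpoch and check_standby are hypothetical and do not exist in
PostgreSQL, LSNs are simplified to plain 64-bit integers, and "the id at
point p" is read here as the id under which the WAL ending at p was written.
The primary keeps a list of (id, starting LSN) pairs and compares the
standby's id against the one in effect just before p:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical record: one entry per failure/promotion on the primary. */
typedef struct {
    uint32_t id;        /* counter incremented on every failure */
    uint64_t start_lsn; /* first WAL position written under this id */
} FailureEpoch;

typedef enum { STREAM_OK, STREAM_RESYNC } StreamDecision;

/*
 * history must be sorted by ascending start_lsn.
 * p is the WAL position where the standby wants streaming to start,
 * standby_id is the id the standby believes its WAL up to p belongs to.
 */
StreamDecision check_standby(const FailureEpoch *history, size_t n,
                             uint64_t p, uint32_t standby_id)
{
    int found = 0;
    uint32_t primary_id = 0;

    assert(history != NULL || n == 0);

    /* Find the id that was in effect for the WAL just before p. */
    for (size_t i = 0; i < n && history[i].start_lsn < p; i++) {
        primary_id = history[i].id;
        found = 1;
    }

    /* No WAL before p is known to the primary: cannot vouch for it. */
    if (!found)
        return STREAM_RESYNC;

    return primary_id == standby_id ? STREAM_OK : STREAM_RESYNC;
}
```

With a history of {1 from LSN 0, 2 from LSN 900}, a standby asking for p=900
with id 1 can stream, while one asking for p=1000 with id 1 has WAL the
primary never produced under that id. It must either be rebuilt or have the
WAL past 900 deleted.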

@David
 One way to get them in sync without starting from scratch is to use
 rsync from A to B.  This works in the asynchronous case, too. :)

The problem mainly is detecting when one can rsync/stream and when not.

Regards



On Mon, Oct 18, 2010 at 1:57 AM, Dimitri Fontaine dimi...@2ndquadrant.fr wrote:

 Fujii Masao masao.fu...@gmail.com writes:
  But, even though we will have done that, it should be noted that WAL in
  A might be ahead of that in B. For example, A might crash right after
  writing WAL to the disk and before sending it to B. So when we restart
  the old master A as the standby after failover, we would need to delete
  some WAL files (in A) which are inconsistent with the WAL sequence in B.

 The idea to send from master to slave the current last applied LSN has
 been talked about already. It would allow sending the WAL content in
 parallel with its local fsync() on the master; the standby would refrain
 from applying any WAL segment until it knows the master is past that.

 Now, given such a behavior, that would mean that when A joins again as a
 standby, it would have to ask B for the current last applied LSN too,
 and would notice the timeline change. Maybe by adding a facility to
 request the last LSN of the previous timeline, and with the behavior
 above applied there (skipping now-known-future-WALs in recovery), that
 would work automatically?

 There's still the problem of WALs that have been applied before
 recovery; I don't know that we can do anything here. But maybe we could
 also tweak the CHECKPOINT mechanism not to advance the restart point
 until we know the standbys have already replayed everything up to the
 restart point?

 --
 Dimitri Fontaine
 http://2ndQuadrant.fr PostgreSQL : Expertise, Formation et Support
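
The gating rule in the quoted message (the standby receives WAL early but
applies it only up to what the master reports as safely past) could be
sketched roughly as follows. This is an illustration only: the function
names are made up, and LSNs are simplified to plain 64-bit integers.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Highest WAL position the standby may replay: it can never apply more
 * than it has received, nor more than the master has confirmed.
 */
uint64_t replay_limit(uint64_t received_upto, uint64_t master_confirmed_upto)
{
    uint64_t limit = received_upto < master_confirmed_upto
                         ? received_upto
                         : master_confirmed_upto;

    /* The limit must not exceed either bound. */
    assert(limit <= received_upto && limit <= master_confirmed_upto);
    return limit;
}

/* Returns nonzero if a WAL record ending at record_end may be applied now. */
int can_apply(uint64_t record_end, uint64_t received_upto,
              uint64_t master_confirmed_upto)
{
    return record_end <= replay_limit(received_upto, master_confirmed_upto);
}
```

A record ending at 350 stays unapplied while the master has only confirmed
300, even if the standby has already received WAL up to 500.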



[HACKERS] Timeline in the light of Synchronous replication

2010-10-16 Thread fazool mein
Hello guys,

The concept of time line makes sense to me in the case of asynchronous
replication. But in case of synchronous replication, I am not so sure.

When a standby connects to the primary, it checks if both have the same time
line. If not, it doesn't start.

Now, consider the following scenario. The primary (call it A) fails, the
standby (call it B), via a trigger file, comes out of recovery mode
(incrementing its time line id to, say, 2), and morphs into a primary. Now,
let's say we start the old primary A as a standby, to connect to the new
primary B (which previously was a standby). As the code is at the moment,
the old primary A will not be allowed to connect to the new primary B
because A's timelineid (1) does not match that of the new primary B (2).
Hence, we need to create a backup again, and set up the standby from scratch.

In the above scenario, if the system was using asynchronous replication,
time lines would have saved us from doing something wrong. But, if we are
using synchronous replication, we know that both A and B would have been in
sync before the failure. In this case, being forced to create a new standby
from scratch because of time lines might not be very desirable if the
database is huge.

Your comments on the above will be appreciated.

Regards


Re: [HACKERS] Heartbeat between Primary and Standby replicas

2010-09-26 Thread fazool mein
Hello again,

I checked the code for the keepalive feature. It seems that the socket
options are only set on the primary's socket connection. The tcp connection
created on the secondary for walreceiver does not use the keepalive
parameters from the configuration.

Am I correct? Is this intended, or is it a bug?

Thanks.



On Fri, Sep 17, 2010 at 7:05 PM, fazool mein fazoolm...@gmail.com wrote:

 Apologies. I'm new to Postgres and I didn't see that feature. It satisfies
 what I want to do.

 Thanks.


 On Thu, Sep 16, 2010 at 7:34 PM, Fujii Masao masao.fu...@gmail.com wrote:

 On Fri, Sep 17, 2010 at 6:49 AM, fazool mein fazoolm...@gmail.com wrote:
  I am designing a heartbeat system between replicas to know when a replica
  goes down so that necessary measures can be taken. As I see, there are two
  ways of doing it:
  
  1) Creating a separate heartbeat process on replicas.
  2) Creating a heartbeat message, and sending it over the connection that is
  already established between walsender and walreceiver.
  
  With 2, sending heartbeat from walsender to walreceiver seems trivial.
  Sending a heartbeat from walreceiver to walsender seems tricky. Going
  through the code, it seems that the walreceiver is always in the
  PGASYNC_COPY_OUT mode (except in the beginning when handshaking is done).
 
  Can you recommend the right way of doing this?

 The existing keepalive feature doesn't help?

 Regards,

 --
 Fujii Masao
 NIPPON TELEGRAPH AND TELEPHONE CORPORATION
 NTT Open Source Software Center





Re: [HACKERS] Heartbeat between Primary and Standby replicas

2010-09-26 Thread fazool mein
Ah, great. I missed looking there.
Thanks.

On Sun, Sep 26, 2010 at 4:19 PM, Fujii Masao masao.fu...@gmail.com wrote:

 On Mon, Sep 27, 2010 at 7:46 AM, fazool mein fazoolm...@gmail.com wrote:
  I checked the code for the keepalive feature. It seems that the socket
  options are only set on the primary's socket connection. The tcp connection
  created on the secondary for walreceiver does not use the keepalive
  parameters from the configuration.

 You can use libpq keepalive parameters for walreceiver.

 keepalives_idle
 keepalives_interval
 keepalives_count
 http://developer.postgresql.org/pgdocs/postgres/libpq-connect.html

 Those can be set in primary_conninfo in recovery.conf.

 Regards,

 --
 Fujii Masao
 NIPPON TELEGRAPH AND TELEPHONE CORPORATION
 NTT Open Source Software Center
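
As a rough illustration of the suggestion above, a recovery.conf on the
standby might carry the libpq keepalive parameters inside the
primary_conninfo connection string. The host, port, user, and timing values
below are placeholders for illustration, not recommendations:

```
# recovery.conf on the standby (PostgreSQL 9.0); all values are placeholders
standby_mode = 'on'
primary_conninfo = 'host=primary.example.com port=5432 user=replicator keepalives_idle=60 keepalives_interval=10 keepalives_count=5'
```

With settings of this kind, the walreceiver's TCP connection would start
probing after 60 seconds of idleness, probe every 10 seconds, and be
considered dead after 5 unanswered probes, on platforms where these socket
options are supported.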



Re: [HACKERS] Shutting down server from a backend process, e.g. walrceiver

2010-09-22 Thread fazool mein
Thanks for the tips.

In our case, SIGINT makes more sense. I'll use that.

Regards


On Tue, Sep 21, 2010 at 7:50 PM, Fujii Masao masao.fu...@gmail.com wrote:

 On Wed, Sep 22, 2010 at 2:50 AM, fazool mein fazoolm...@gmail.com wrote:
  Yes, I'll be modifying the code. In the walreceiver, I used the following
  to send a shutdown to the postmaster:
 
  kill(getppid(), SIGTERM);

 You can use the global variable PostmasterPid instead of getppid.
 There are three types of shutdown. SIGTERM triggers smart shutdown.
 Is smart shutdown suitable for your case? If not, you might need to
 send SIGINT or SIGQUIT instead.

 Regards,

 --
 Fujii Masao
 NIPPON TELEGRAPH AND TELEPHONE CORPORATION
 NTT Open Source Software Center
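
The three shutdown types mentioned above map to signals as follows: SIGTERM
for smart, SIGINT for fast, and SIGQUIT for immediate shutdown. A small
standalone sketch of choosing and sending the signal is shown below. The
enum and function names are hypothetical and are not part of the PostgreSQL
source; inside a backend the pid argument would be PostmasterPid.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <signal.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical names for the three postmaster shutdown modes. */
typedef enum { SHUTDOWN_SMART, SHUTDOWN_FAST, SHUTDOWN_IMMEDIATE } ShutdownMode;

/* Map a shutdown mode to the signal the postmaster expects for it. */
int shutdown_signal(ShutdownMode mode)
{
    switch (mode) {
    case SHUTDOWN_SMART:     return SIGTERM; /* wait for sessions to finish */
    case SHUTDOWN_FAST:      return SIGINT;  /* abort transactions, then stop */
    case SHUTDOWN_IMMEDIATE: return SIGQUIT; /* stop without clean shutdown */
    }
    return SIGTERM; /* most conservative fallback */
}

/* Send the shutdown request; returns 0 on success, -1 on error (see errno). */
int request_postmaster_shutdown(pid_t postmaster_pid, ShutdownMode mode)
{
    /* kill() with pid 0 or a negative pid targets process groups; refuse. */
    assert(postmaster_pid > 0);
    return kill(postmaster_pid, shutdown_signal(mode));
}
```

For the case discussed in this thread, the call would look like
request_postmaster_shutdown(PostmasterPid, SHUTDOWN_FAST), which sends
SIGINT.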



Re: [HACKERS] Shutting down server from a backend process, e.g. walrceiver

2010-09-21 Thread fazool mein
On Mon, Sep 20, 2010 at 9:44 PM, Fujii Masao masao.fu...@gmail.com wrote:

 On Tue, Sep 21, 2010 at 9:48 AM, fazool mein fazoolm...@gmail.com wrote:
  Hi,
 
  I want to shut down the server under certain conditions that can be checked
  inside a backend process. For instance, while running symmetric replication,
  if the primary dies, I want the walreceiver to detect that and shut down
  the standby. The reason for shutdown is that I want to execute some other
  stuff before I start the standby as a primary. Creating a trigger file
  doesn't help as it converts the standby into primary at run time.
  
  Using proc_exit() inside walreceiver only terminates the walreceiver
  process, which postgres starts again. The other way I see is using
  ereport(PANIC, ...). Is there some other way to shut down the main server
  from within a backend process?

 Are you going to change the source code? If yes, you might be able to
 do that by making walreceiver send the shutdown signal to postmaster.


Yes, I'll be modifying the code. In the walreceiver, I used the following to
send a shutdown to the postmaster:

kill(getppid(), SIGTERM);


 If no, I think that a straightforward approach is to use a clusterware
 like pacemaker. That is, you need to make a clusterware periodically
 check the master and cause the standby to end when detecting the crash
 of the master.


This was another option, but I have to modify the code for this particular
case.

Thanks for your help.

Regards,


Re: [HACKERS] Shutting down server from a backend process, e.g. walrceiver

2010-09-21 Thread fazool mein
On Tue, Sep 21, 2010 at 8:32 AM, David Fetter da...@fetter.org wrote:

 On Mon, Sep 20, 2010 at 05:48:40PM -0700, fazool mein wrote:
  Hi,
 
  I want to shut down the server under certain conditions that can be
  checked inside a backend process.  For instance, while running
  symmetric

 Synchronous?


I meant streaming :), but the question is in general for any process forked
by the postmaster.



  replication, if the primary dies, I want the walreceiver to
  detect that and shut down the standby.  The reason for shutdown is
  that I want to execute some other stuff before I start the standby
  as a primary.  Creating a trigger file doesn't help as it converts
  the standby into primary at run time.
  
  Using proc_exit() inside walreceiver only terminates the walreceiver
  process, which postgres starts again.  The other way I see is using
  ereport(PANIC, ...).  Is there some other way to shut down the main
  server from within a backend process?

 Perhaps I've misunderstood, but since there's already Something
 Else(TM) which takes actions, why not send a message to it so it can
 take appropriate action on the node, starting with shutting it down?


(wondering)

Thanks.


[HACKERS] Shutting down server from a backend process, e.g. walrceiver

2010-09-20 Thread fazool mein
Hi,

I want to shut down the server under certain conditions that can be checked
inside a backend process. For instance, while running symmetric replication,
if the primary dies, I want the walreceiver to detect that and shut down
the standby. The reason for shutdown is that I want to execute some other
stuff before I start the standby as a primary. Creating a trigger file
doesn't help as it converts the standby into primary at run time.

Using proc_exit() inside walreceiver only terminates the walreceiver
process, which postgres starts again. The other way I see is using
ereport(PANIC, ...). Is there some other way to shut down the main server
from within a backend process?

Thanks.


Re: [HACKERS] Heartbeat between Primary and Standby replicas

2010-09-17 Thread fazool mein
Apologies. I'm new to Postgres and I didn't see that feature. It satisfies
what I want to do.

Thanks.

On Thu, Sep 16, 2010 at 7:34 PM, Fujii Masao masao.fu...@gmail.com wrote:

 On Fri, Sep 17, 2010 at 6:49 AM, fazool mein fazoolm...@gmail.com wrote:
  I am designing a heartbeat system between replicas to know when a replica
  goes down so that necessary measures can be taken. As I see, there are two
  ways of doing it:
  
  1) Creating a separate heartbeat process on replicas.
  2) Creating a heartbeat message, and sending it over the connection that is
  already established between walsender and walreceiver.
 
  With 2, sending heartbeat from walsender to walreceiver seems trivial.
  Sending a heartbeat from walreceiver to walsender seems tricky. Going
  through the code, it seems that the walreceiver is always in the
  PGASYNC_COPY_OUT mode (except in the beginning when handshaking is done).
 
  Can you recommend the right way of doing this?

 The existing keepalive feature doesn't help?

 Regards,

 --
 Fujii Masao
 NIPPON TELEGRAPH AND TELEPHONE CORPORATION
 NTT Open Source Software Center



[HACKERS] Heartbeat between Primary and Standby replicas

2010-09-16 Thread fazool mein
Hello everyone,

I am designing a heartbeat system between replicas to know when a replica
goes down so that necessary measures can be taken. As I see, there are two
ways of doing it:

1) Creating a separate heartbeat process on replicas.
2) Creating a heartbeat message, and sending it over the connection that is
already established between walsender and walreceiver.

With 2, sending heartbeat from walsender to walreceiver seems trivial.
Sending a heartbeat from walreceiver to walsender seems tricky. Going
through the code, it seems that the walreceiver is always in the
PGASYNC_COPY_OUT mode (except in the beginning when handshaking is done).

Can you recommend the right way of doing this?

Thank you.

Regards

---
Postgres version = 9.0 beta-4


[HACKERS] Synchronous replication - patch status inquiry

2010-08-31 Thread fazool mein
Hello everyone,

I'm interested in benchmarking synchronous replication, to see how
performance degrades compared to asynchronous streaming replication.

I browsed through the archive of emails, but things still seem unclear. Do
we have a final agreed upon patch that I can use? Any links for that?

Thanks.

OS = Linux Suse, sles 11, 64-bit
Postgres version = 9.0 beta-4


Re: [HACKERS] Synchronous replication - patch status inquiry

2010-08-31 Thread fazool mein
Thanks!

I'll wait for the merging then; there is no point in benchmarking otherwise.

Regards

On Tue, Aug 31, 2010 at 6:06 PM, Fujii Masao masao.fu...@gmail.com wrote:

 On Wed, Sep 1, 2010 at 9:34 AM, Robert Haas robertmh...@gmail.com wrote:
  There are patches, and the latest from Fujii Masao is probably worth
  looking at :)
 
  I am pretty sure, however, that the performance will be terrible at
  this point.  Heikki is working on fixing that, but it ain't done yet.

 Yep. The latest WIP code is available in my git repository, but it's
 not worth benchmarking yet. I'll need to merge Heikki's effort and
 the synchronous replication patch.

git://git.postgresql.org/git/users/fujii/postgres.git
branch: synchrep

 Regards,

 --
 Fujii Masao
 NIPPON TELEGRAPH AND TELEPHONE CORPORATION
 NTT Open Source Software Center