
Re: [HACKERS] Streaming replication and non-blocking I/O

2010-01-16 Thread Dimitri Fontaine

Heikki Linnakangas heikki.linnakan...@enterprisedb.com writes:
 The module doesn't need to touch backend internals much at all, no
 tinkering with shared memory for example, so I would feel much better
 about moving that out of src/backend. Not sure where, though; it's not
 an executable, so src/bin is hardly the right place, but I wouldn't want
 to put it in contrib either, because it should still be built and
 installed by default. So I'm inclined to still leave it in
 src/backend/replication/

It should be possible to be in contrib and installed by default, even
with the current tool set, by tweaking initdb to install the contrib
module into template1. But I guess that would then be a packaging /
dependency issue.

Of course, the extension system would ideally run "create extension foo;"
for all foo in contrib at initdb time; then a user would only have to run
"install extension foo;" and be done with it.

Regards,
-- 
dim

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Streaming replication and non-blocking I/O

2010-01-16 Thread Euler Taveira de Oliveira

Dimitri Fontaine wrote:
 It should be possible to be in contrib and installed by default, even
 
And it could be uninstalled, too. Let's not do that for core functionality.


-- 
  Euler Taveira de Oliveira
  http://www.timbira.com/


Re: [HACKERS] Streaming replication and non-blocking I/O

2010-01-15 Thread Heikki Linnakangas

Fujii Masao wrote:
 On Wed, Jan 13, 2010 at 3:37 AM, Magnus Hagander mag...@hagander.net wrote:
 This change which moves walreceiver process into a dynamically loaded
 module caused the following compile error on my MinGW environment.
 That sounds strange - it should pick those up from the -lpostgres. Any
 chance you have an old postgres binary around from a non-syncrep build
 or something?
 
 No, there is no old postgres binary.
 
 Do you have an environment to try to build it under msvc?
 
 No, unfortunately.
 
 in my
 experience, that gives you easier-to-understand error messages in a
 lot of cases like this - it removes the mingw black magic.
 
 OK. I'll try to build it under msvc.
 
 But since there seems to be a long way to go before doing that,
 I would appreciate if someone could give me some advice.

It looks like dawn_bat is experiencing the same problem. I don't think
we want to sprinkle all those variables with PGDLLIMPORT, and it didn't
fix the problem for you earlier anyway. Is there some other way to fix this?

Do people still use MinGW for any real work? Could we just drop
walreceiver support from MinGW builds?

Or maybe we should consider splitting walreceiver into two parts after
all. Only the bare minimum that needs to access libpq would go into the
shared object, and the rest would be linked with the backend as usual.

-- 
  Heikki Linnakangas
  EnterpriseDB   http://www.enterprisedb.com


Re: [HACKERS] Streaming replication and non-blocking I/O

2010-01-15 Thread Magnus Hagander

2010/1/15 Heikki Linnakangas heikki.linnakan...@enterprisedb.com:
 Fujii Masao wrote:
 On Wed, Jan 13, 2010 at 3:37 AM, Magnus Hagander mag...@hagander.net wrote:
 This change which moves walreceiver process into a dynamically loaded
 module caused the following compile error on my MinGW environment.
 That sounds strange - it should pick those up from the -lpostgres. Any
 chance you have an old postgres binary around from a non-syncrep build
 or something?

 No, there is no old postgres binary.

 Do you have an environment to try to build it under msvc?

 No, unfortunately.

 in my
 experience, that gives you easier-to-understand error messages in a
 lot of cases like this - it removes the mingw black magic.

 OK. I'll try to build it under msvc.

 But since there seems to be a long way to go before doing that,
 I would appreciate if someone could give me some advice.

 It looks like dawn_bat is experiencing the same problem. I don't think
 we want to sprinkle all those variables with PGDLLIMPORT, and it didn't
 fix the problem for you earlier anyway. Is there some other way to fix this?

 Do people still use MinGW for any real work? Could we just drop
 walreceiver support from MinGW builds?

We don't know if this works on MSVC, because MSVC doesn't actually try
to build the walreceiver. I'm going to look at that tomorrow.

If we get the same issues there, we have a problem in our code. If not, we
need to figure out what's up with mingw.


 Or maybe we should consider splitting walreceiver into two parts after
 all. Only the bare minimum that needs to access libpq would go into the
 shared object, and the rest would be linked with the backend as usual.

That would certainly be one option.

-- 
 Magnus Hagander
 Me: http://www.hagander.net/
 Work: http://www.redpill-linpro.com/


Re: [HACKERS] Streaming replication and non-blocking I/O

2010-01-15 Thread Andrew Dunstan




Heikki Linnakangas wrote:

 Do people still use MinGW for any real work? Could we just drop
 walreceiver support from MinGW builds?

 Or maybe we should consider splitting walreceiver into two parts after
 all. Only the bare minimum that needs to access libpq would go into the
 shared object, and the rest would be linked with the backend as usual.

  


I use MinGW when doing Windows work (e.g. the threading piece in 
parallel pg_restore).  And I think it is generally desirable to be able 
to build on Windows using an open source tool chain. I'd want a damn 
good reason to abandon its use. And I don't like the idea of not 
supporting walreceiver on it either. Please find another solution if 
possible.


cheers

andrew




Re: [HACKERS] Streaming replication and non-blocking I/O

2010-01-15 Thread Magnus Hagander

2010/1/15 Andrew Dunstan and...@dunslane.net:


 Heikki Linnakangas wrote:

 Do people still use MinGW for any real work? Could we just drop
 walreceiver support from MinGW builds?

 Or maybe we should consider splitting walreceiver into two parts after
 all. Only the bare minimum that needs to access libpq would go into the
 shared object, and the rest would be linked with the backend as usual.



 I use MinGW when doing Windows work (e.g. the threading piece in parallel 
 pg_restore).  And I think it is generally desirable to be able to build on 
 Windows using an open source tool chain. I'd want a damn good reason to 
 abandon its use. And I don't like the idea of not supporting walreceiver on 
 it either. Please find another solution if possible.


Yeah. FWIW, I don't use mingw to do any windows development, but
definitely +1 on working hard to keep support for it if at all
possible.


-- 
 Magnus Hagander
 Me: http://www.hagander.net/
 Work: http://www.redpill-linpro.com/


Re: [HACKERS] Streaming replication and non-blocking I/O

2010-01-15 Thread Heikki Linnakangas

Magnus Hagander wrote:
 2010/1/15 Andrew Dunstan and...@dunslane.net:

 Heikki Linnakangas wrote:
 Do people still use MinGW for any real work? Could we just drop
 walreceiver support from MinGW builds?

 Or maybe we should consider splitting walreceiver into two parts after
 all. Only the bare minimum that needs to access libpq would go into the
 shared object, and the rest would be linked with the backend as usual.

 I use MinGW when doing Windows work (e.g. the threading piece in parallel 
 pg_restore).  And I think it is generally desirable to be able to build on 
 Windows using an open source tool chain. I'd want a damn good reason to 
 abandon its use. And I don't like the idea of not supporting walreceiver on 
 it either. Please find another solution if possible.
 
 Yeah. FWIW, I don't use mingw to do any windows development, but
 definitely +1 on working hard to keep support for it if at all
 possible.

Ok. I'll look at splitting walreceiver code between the shared module
and backend binary slightly differently. At first glance, it doesn't
seem that hard after all, and will make the code more modular anyway.

-- 
  Heikki Linnakangas
  EnterpriseDB   http://www.enterprisedb.com


Re: [HACKERS] Streaming replication and non-blocking I/O

2010-01-15 Thread Tom Lane

Heikki Linnakangas heikki.linnakan...@enterprisedb.com writes:
 Magnus Hagander wrote:
 Yeah. FWIW, I don't use mingw to do any windows development, but
 definitely +1 on working hard to keep support for it if at all
 possible.

 Ok. I'll look at splitting walreceiver code between the shared module
 and backend binary slightly differently. At first glance, it doesn't
 seem that hard after all, and will make the code more modular anyway.

This is probably going in the wrong direction.  There is no good reason
why that module should be failing to link, and I don't think it's going
to be more modular if you're forced to avoid any global variable
references at all in some arbitrary portion of the code.

I think it's a tools/build process problem and should be attacked that
way.

regards, tom lane


Re: [HACKERS] Streaming replication and non-blocking I/O

2010-01-15 Thread Aidan Van Dyk

* Heikki Linnakangas heikki.linnakan...@enterprisedb.com [100115 15:20]:

 Ok. I'll look at splitting walreceiver code between the shared module
 and backend binary slightly differently. At first glance, it doesn't
 seem that hard after all, and will make the code more modular anyway.

Maybe an insane question, but why can't postmaster just exec
walreceiver?  I mean, because of Windows, we already have that code
around, and then walreceiver could link directly to libpq and we would
not have to worry at all about linking all of the postmaster's backends
to libpq...

But I do understand that's a radical change...

a.
-- 
Aidan Van Dyk Create like a god,
ai...@highrise.ca   command like a king,
http://www.highrise.ca/   work like a slave.



Re: [HACKERS] Streaming replication and non-blocking I/O

2010-01-15 Thread Tom Lane

I wrote:
 I think it's a tools/build process problem and should be attacked that
 way.

Specifically, I think you missed out $(BE_DLLLIBS) in SHLIB_LINK.
We'll find out at the next mingw build...

regards, tom lane


Re: [HACKERS] Streaming replication and non-blocking I/O

2010-01-15 Thread Tom Lane

Aidan Van Dyk ai...@highrise.ca writes:
 Maybe an insane question, but why can't postmaster just exec
 walreceiver?

It'd greatly complicate access to shared memory.

regards, tom lane


Re: [HACKERS] Streaming replication and non-blocking I/O

2010-01-15 Thread Heikki Linnakangas

Tom Lane wrote:
 I wrote:
 I think it's a tools/build process problem and should be attacked that
 way.
 
 Specifically, I think you missed out $(BE_DLLLIBS) in SHLIB_LINK.
 We'll find out at the next mingw build...

Thanks. But what is BE_DLLLIBS? I can't find any description of it.

I suspect the MinGW build will fail because of the missing PGDLLIMPORTs.
Before we sprinkle all the global variables it touches with that, let me
explain what I meant by dividing walreceiver code differently between
dynamically loaded module and backend code. Right now I have to go to
sleep, though, but I'll try to get back to it during the weekend.

-- 
  Heikki Linnakangas
  EnterpriseDB   http://www.enterprisedb.com


Re: [HACKERS] Streaming replication and non-blocking I/O

2010-01-15 Thread Tom Lane

Heikki Linnakangas heikki.linnakan...@enterprisedb.com writes:
 Tom Lane wrote:
 Specifically, I think you missed out $(BE_DLLLIBS) in SHLIB_LINK.
 We'll find out at the next mingw build...

 Thanks. But what is BE_DLLLIBS? I can't find any description of it.

It was the wrong theory anyway --- it already is included (in
Makefile.shlib).  But what it does is provide -lpostgres on platforms
where that is needed, such as mingw.

 I suspect the MinGW build will fail because of the missing PGDLLIMPORTs.

Yeah.  On closer investigation the problem seems to be -DBUILDING_DLL,
which flips the meaning of PGDLLIMPORT.  contrib/dblink, which surely
works and has the same linkage requirements as walreceiver, does *not*
use that.  I've committed a patch to change that, we'll soon see if it
works...
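
For readers unfamiliar with the mechanism under discussion, the way a
BUILDING_DLL-style define flips an import/export macro can be illustrated
with a minimal sketch. Everything named DEMO_ or demo_ below is
hypothetical and is not PostgreSQL's actual header; on non-Windows
platforms the macro is assumed to expand to nothing.

```c
#include <string.h>

/*
 * Hypothetical stand-in for a PGDLLIMPORT-style macro. The code that owns
 * a global variable is compiled with DEMO_BUILDING_DLL defined and exports
 * the symbol; everything else (for example a loadable module) must be
 * compiled without it so that the same declaration imports the symbol.
 */
#if defined(_WIN32) && defined(DEMO_BUILDING_DLL)
#define DEMO_DLLIMPORT __declspec(dllexport)
#define DEMO_LINKAGE "dllexport"
#elif defined(_WIN32)
#define DEMO_DLLIMPORT __declspec(dllimport)
#define DEMO_LINKAGE "dllimport"
#else
#define DEMO_DLLIMPORT
#define DEMO_LINKAGE "none"
#endif

/* How a shared header would declare a global that modules may reference. */
extern DEMO_DLLIMPORT int demo_interrupt_pending;

/* Reports which expansion was selected at compile time. */
const char *
demo_linkage(void)
{
	return DEMO_LINKAGE;
}
```

Compiling a module with the "building" define set would make it try to
export, rather than import, every variable declared this way, which is
consistent with the link failures described in this thread.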

 Before we sprinkle all the global variables it touches with that, let me
 explain what I meant by dividing walreceiver code differently between
 dynamically loaded module and backend code. Right now I have to go to
 sleep, though, but I'll try to get back to it during the weekend.

Yeah, nothing to be done till we get another buildfarm cycle anyway.

regards, tom lane


Re: [HACKERS] Streaming replication and non-blocking I/O

2010-01-15 Thread Andrew Dunstan




Tom Lane wrote:

 Before we sprinkle all the global variables it touches with that, let me
 explain what I meant by dividing walreceiver code differently between
 dynamically loaded module and backend code. Right now I have to go to
 sleep, though, but I'll try to get back to it during the weekend.

 Yeah, nothing to be done till we get another buildfarm cycle anyway.


  


I ran an extra cycle. Still a bit of work to do: 
http://www.pgbuildfarm.org/cgi-bin/show_log.pl?nm=dawn_bat&dt=2010-01-15%2023:04:54


cheers

andrew


Re: [HACKERS] Streaming replication and non-blocking I/O

2010-01-15 Thread Tom Lane

Andrew Dunstan and...@dunslane.net writes:
 I ran an extra cycle. Still a bit of work to do: 
 http://www.pgbuildfarm.org/cgi-bin/show_log.pl?nm=dawn_bat&dt=2010-01-15%2023:04:54

Well, at least now we're down to the variables that haven't got
PGDLLIMPORT, rather than wondering what's wrong with the build ...

regards, tom lane


Re: [HACKERS] Streaming replication and non-blocking I/O

2010-01-15 Thread Heikki Linnakangas

Tom Lane wrote:
 Heikki Linnakangas heikki.linnakan...@enterprisedb.com writes:
 Before we sprinkle all the global variables it touches with that, let me
 explain what I meant by dividing walreceiver code differently between
 dynamically loaded module and backend code. Right now I have to go to
 sleep, though, but I'll try to get back to it during the weekend.
 
 Yeah, nothing to be done till we get another buildfarm cycle anyway.

Ok, looks like you did that anyway, let's see if it fixed it. Thanks.

So what I'm playing with is to pull walreceiver back into the backend
executable. To avoid the link dependency, walreceiver doesn't access
libpq directly, but loads a module dynamically which implements this
interface:

bool walrcv_connect(char *conninfo, XLogRecPtr startpoint)

Establishes a connection to the primary and starts streaming from
'startpoint'. Returns true on success.

bool walrcv_receive(int timeout, XLogRecPtr *recptr, char **buffer, int *len)

Retrieves any WAL record available through the connection, blocking for
a maximum of 'timeout' ms.

void walrcv_disconnect(void)

Disconnects.


This is the kind of API Greg Stark requested earlier
(http://archives.postgresql.org/message-id/407d949e0912220336u595a05e0x20bd91b9fbc08...@mail.gmail.com),
though I'm not planning to make it pluggable for 3rd party
implementations yet.
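
The division of labor described here, where the backend owns a set of
function pointers and a dynamically loaded module fills them in when it is
loaded, can be sketched in a few lines. This is only an illustration under
stated assumptions: all demo_ and fake_ names are hypothetical, DemoRecPtr
is a simplified stand-in for XLogRecPtr, and the "module" is an in-process
fake that serves one canned record instead of talking to libpq.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Simplified stand-in for the backend's XLogRecPtr (an assumption). */
typedef struct DemoRecPtr
{
	unsigned int xlogid;
	unsigned int xrecoff;
} DemoRecPtr;

/* Hook types mirroring the three interface functions described above. */
typedef bool (*demo_connect_fn) (const char *conninfo, DemoRecPtr startpoint);
typedef bool (*demo_receive_fn) (int timeout, DemoRecPtr *recptr,
								 char **buffer, int *len);
typedef void (*demo_disconnect_fn) (void);

/* Owned by the "backend" side; NULL until a module installs itself. */
demo_connect_fn demo_walrcv_connect = NULL;
demo_receive_fn demo_walrcv_receive = NULL;
demo_disconnect_fn demo_walrcv_disconnect = NULL;

/* "Module" side: a fake transport that always serves the same record. */
static bool fake_connected = false;
static DemoRecPtr fake_pos;
static char fake_record[] = "WAL";

static bool
fake_connect(const char *conninfo, DemoRecPtr startpoint)
{
	if (conninfo == NULL || conninfo[0] == '\0')
		return false;
	fake_pos = startpoint;
	fake_connected = true;
	return true;
}

static bool
fake_receive(int timeout, DemoRecPtr *recptr, char **buffer, int *len)
{
	(void) timeout;				/* the fake never blocks */
	if (!fake_connected)
		return false;
	*recptr = fake_pos;
	*buffer = fake_record;
	*len = (int) strlen(fake_record);
	return true;
}

static void
fake_disconnect(void)
{
	fake_connected = false;
}

/* Plays the role of the module load callback: install the hooks. */
void
demo_module_init(void)
{
	demo_walrcv_connect = fake_connect;
	demo_walrcv_receive = fake_receive;
	demo_walrcv_disconnect = fake_disconnect;
}
```

With this shape, the caller only ever goes through the three pointers, so
it needs no link-time dependency on whatever library the module uses.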

The module doesn't need to touch backend internals much at all, no
tinkering with shared memory for example, so I would feel much better
about moving that out of src/backend. Not sure where, though; it's not
an executable, so src/bin is hardly the right place, but I wouldn't want
to put it in contrib either, because it should still be built and
installed by default. So I'm inclined to still leave it in
src/backend/replication/

I've pushed that 'replication-dynmodule' branch in my git repo. The diff
is hard to read, because it mostly just moves code around, but I've
attached libpqwalreceiver.c here, which is the dynamic module part. You
can also browse the tree via the web interface
(http://git.postgresql.org/gitweb?p=users/heikki/postgres.git;a=tree;h=refs/heads/replication-dynmodule;hb=replication-dynmodule)

I like this division of labor much more than making the whole
walreceiver process a dynamically loaded module, so barring objections I
will review and test this more, and commit next week.

-- 
  Heikki Linnakangas
  EnterpriseDB   http://www.enterprisedb.com
/*-------------------------------------------------------------------------
 *
 * libpqwalreceiver.c
 *
 * The WAL receiver process (walreceiver) is new as of Postgres 8.5. It
 * is the process in the standby server that takes charge of receiving
 * XLOG records from a primary server during streaming replication.
 *
 * When the startup process determines that it's time to start streaming,
 * it instructs postmaster to start walreceiver. Walreceiver first
 * connects to the primary server (it will be served by a walsender process
 * in the primary server), and then keeps receiving XLOG records and
 * writing them to the disk as long as the connection is alive. As XLOG
 * records are received and flushed to disk, it updates the
 * WalRcv->receivedUpTo variable in shared memory, to inform the startup
 * process of how far it can proceed with XLOG replay.
 *
 * Normal termination is by SIGTERM, which instructs the walreceiver to
 * exit(0). Emergency termination is by SIGQUIT; like any postmaster child
 * process, the walreceiver will simply abort and exit on SIGQUIT. A close
 * of the connection and a FATAL error are treated not as a crash but as
 * normal operation.
 *
 * Walreceiver is a postmaster child process like others, but it's compiled
 * as a dynamic module to avoid linking libpq with the main server binary.
 *
 * Portions Copyright (c) 2010-2010, PostgreSQL Global Development Group
 *
 *
 * IDENTIFICATION
 *	  $PostgreSQL$
 *
 *-------------------------------------------------------------------------
 */
#include "postgres.h"

#include <unistd.h>

#include "libpq-fe.h"
#include "access/xlog.h"
#include "miscadmin.h"
#include "replication/walreceiver.h"
#include "utils/builtins.h"

#ifdef HAVE_POLL_H
#include <poll.h>
#endif
#ifdef HAVE_SYS_POLL_H
#include <sys/poll.h>
#endif
#ifdef HAVE_SYS_SELECT_H
#include <sys/select.h>
#endif

PG_MODULE_MAGIC;

void		_PG_init(void);

/* streamConn is a PGconn object of a connection to walsender from walreceiver */
static PGconn *streamConn = NULL;
static bool justconnected = false;

/* Buffer for currently read records */
static char *recvBuf = NULL;

/* Prototypes for interface functions */
static bool libpqrcv_connect(char *conninfo, XLogRecPtr startpoint);
static bool libpqrcv_receive(int timeout, XLogRecPtr *recptr, char **buffer,
			  int *len);
static void libpqrcv_disconnect(void);

/* Prototypes for private functions */
static bool libpq_select(int timeout_ms);

/*
 * Module load callback
 */
void
_PG_init(void)
{
	walrcv_connect = libpqrcv_connect;
	walrcv_receive =

Re: [HACKERS] Streaming replication and non-blocking I/O

2010-01-15 Thread Heikki Linnakangas

Heikki Linnakangas wrote:
 I've pushed that 'replication-dynmodule' branch in my git repo. The diff
 is hard to read, because it mostly just moves code around, but I've
 attached libpqwalreceiver.c here, which is the dynamic module part. You
 can also browse the tree via the web interface
 (http://git.postgresql.org/gitweb?p=users/heikki/postgres.git;a=tree;h=refs/heads/replication-dynmodule;hb=replication-dynmodule)

I just noticed that the comment at the top of libpqwalreceiver.c is a
leftover and no longer very relevant to the contents of the file: all the
signal handling and interaction with the startup process is in
src/backend/replication/walreceiver.c now. That obviously needs to be
fixed before committing.

-- 
  Heikki Linnakangas
  EnterpriseDB   http://www.enterprisedb.com


Re: [HACKERS] Streaming replication and non-blocking I/O

2010-01-14 Thread Fujii Masao

On Wed, Jan 13, 2010 at 7:27 PM, Heikki Linnakangas
heikki.linnakan...@enterprisedb.com wrote:
 the frontend always puts the
 connection to non-blocking mode, while the backend uses blocking mode.

Really? By default (i.e., without explicitly calling
PQsetnonblocking()), the connection is set to blocking mode even
in the frontend. Am I missing something?

 At least with SSL, I think it's possible for pq_wait() to return false
 positives, if the SSL layer decides to renegotiate the connection
 causing data to flow in the other direction in the underlying TCP
 connection. A false positive would cause walsender to block
 indefinitely on the pq_getbyte() call.

Sorry. I could not understand that issue scenario. Could you explain
it in more detail?

Regards,

-- 
Fujii Masao
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center


Re: [HACKERS] Streaming replication and non-blocking I/O

2010-01-14 Thread Heikki Linnakangas

Fujii Masao wrote:
 On Wed, Jan 13, 2010 at 7:27 PM, Heikki Linnakangas
 heikki.linnakan...@enterprisedb.com wrote:
 the frontend always puts the
 connection to non-blocking mode, while the backend uses blocking mode.
 
 Really? By default (i.e., without explicitly calling
 PQsetnonblocking()), the connection is set to blocking mode even
 in the frontend. Am I missing something?

That's right. The underlying socket is always put to non-blocking mode
in libpq. PQsetnonblocking() only affects whether libpq commands wait
and retry if the output buffer is full.

 At least with SSL, I think it's possible for pq_wait() to return false
 positives, if the SSL layer decides to renegotiate the connection
 causing data to flow in the other direction in the underlying TCP
 connection. A false positive would cause walsender to block
 indefinitely on the pq_getbyte() call.
 
 Sorry. I could not understand that issue scenario. Could you explain
 it in more detail?

1. Walsender calls pq_wait() which calls select(), waiting for timeout,
or data to become available for reading in the underlying socket.

2. Client issues an SSL renegotiation by sending a message to the server

3. Server receives the message, and select() returns indicating that
data has arrived

4. Walsender calls HandleEndOfRep() which calls pq_getbyte().
pq_readbyte() calls SSL_read(), which receives the renegotiation message
and handles it. No application data has arrived, however, so SSL_read()
blocks for some to arrive. It never does.

I don't understand enough of SSL to know if renegotiation can actually
happen like that, but the man page of SSL_read() suggests so. But a
similar thing can happen if an SSL record is broken into two TCP
packets. select() returns immediately as the first packet arrives, but
SSL_read() will block until the 2nd packet arrives.
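
The wait in step 1 can be sketched as a select() call with a timeout. The
function name below is hypothetical, and this is not the pq_wait() code
from the patch. The point of the sketch is what the return value does and
does not say: "readable" only means that some bytes arrived on the socket,
not that a complete SSL record carrying application data is available.

```c
#include <stddef.h>
#include <sys/select.h>
#include <sys/time.h>
#include <unistd.h>

/*
 * Wait up to timeout_ms for fd to become readable at the socket level.
 * Returns 1 if readable, 0 on timeout, -1 on error (including EINTR).
 */
int
demo_wait_readable(int fd, int timeout_ms)
{
	fd_set		readfds;
	struct timeval tv;
	int			rc;

	FD_ZERO(&readfds);
	FD_SET(fd, &readfds);
	tv.tv_sec = timeout_ms / 1000;
	tv.tv_usec = (timeout_ms % 1000) * 1000;

	rc = select(fd + 1, &readfds, NULL, NULL, &tv);
	if (rc < 0)
		return -1;
	return (rc > 0) ? 1 : 0;
}
```

A caller that follows a return value of 1 with a blocking read through an
SSL layer can therefore still block, which is the scenario in steps 2 to 4.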

-- 
  Heikki Linnakangas
  EnterpriseDB   http://www.enterprisedb.com


Re: [HACKERS] Streaming replication and non-blocking I/O

2010-01-14 Thread Magnus Hagander

2010/1/14 Heikki Linnakangas heikki.linnakan...@enterprisedb.com:
 Fujii Masao wrote:
 On Wed, Jan 13, 2010 at 7:27 PM, Heikki Linnakangas
 heikki.linnakan...@enterprisedb.com wrote:
 the frontend always puts the
 connection to non-blocking mode, while the backend uses blocking mode.

 Really? By default (i.e., without explicitly calling
 PQsetnonblocking()), the connection is set to blocking mode even
 in the frontend. Am I missing something?

 That's right. The underlying socket is always put to non-blocking mode
 in libpq. PQsetnonblocking() only affects whether libpq commands wait
 and retry if the output buffer is full.

 At least with SSL, I think it's possible for pq_wait() to return false
 positives, if the SSL layer decides to renegotiate the connection
 causing data to flow in the other direction in the underlying TCP
 connection. A false positive would cause walsender to block
 indefinitely on the pq_getbyte() call.

 Sorry. I could not understand that issue scenario. Could you explain
 it in more detail?

 1. Walsender calls pq_wait() which calls select(), waiting for timeout,
 or data to become available for reading in the underlying socket.

 2. Client issues an SSL renegotiation by sending a message to the server

 3. Server receives the message, and select() returns indicating that
 data has arrived

 4. Walsender calls HandleEndOfRep() which calls pq_getbyte().
 pq_readbyte() calls SSL_read(), which receives the renegotiation message
 and handles it. No application data has arrived, however, so SSL_read()
 blocks for some to arrive. It never does.

 I don't understand enough of SSL to know if renegotiation can actually
 happen like that, but the man page of SSL_read() suggests so. But a
 similar thing can happen if an SSL record is broken into two TCP
 packets. select() returns immediately as the first packet arrives, but
 SSL_read() will block until the 2nd packet arrives.

I *think* renegotiation happens based on amount of content, not amount
of time. But it could still happen in corner cases, I think. If the
renegotiation happens right after a complete packet has been sent
(which would be the logical place), but not fast enough that the SSL
library gets it in one read() from the socket, you could end up in
that situation. (if the SSL library gets the renegotiation request as
part of the first read(), it would probably do the renegotiation
before returning from that call to SSL_read(), in which case the
socket would be in the correct state before you call select)

-- 
 Magnus Hagander
 Me: http://www.hagander.net/
 Work: http://www.redpill-linpro.com/


Re: [HACKERS] Streaming replication and non-blocking I/O

2010-01-14 Thread Fujii Masao

On Thu, Jan 14, 2010 at 9:14 PM, Heikki Linnakangas
heikki.linnakan...@enterprisedb.com wrote:
 After reading up on SSL_read() and SSL_pending(), it seems that there is
 unfortunately no reliable way of checking if there is incoming data that
 can be read using SSL_read() without blocking, short of putting the
 socket to non-blocking mode. It also seems that we can't rely on poll()
 returning POLLHUP if the remote end has disconnected; it's not doing
 that at least on my laptop.

 So, the only solution I can see is to put the socket to non-blocking
 mode. But to keep the change localized, let's switch to non-blocking
 mode only temporarily, just when polling to see if there's data to read
 (or EOF), and switch back immediately afterwards.

Agreed. I also read some pages about that issue, but I could not find
any better approach than temporarily switching the socket's blocking
mode.

 I've added a pq_getbyte_if_available() function to pqcomm.c to do that.
 The API to the upper levels is quite nice, the function returns a byte
 if one is available without blocking. Only minimal changes are required
 elsewhere.

Great! Thanks a lot!
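
The approach of switching to non-blocking mode only for the duration of
one read attempt can be sketched for a plain socket as follows. This is a
hedged illustration, not the pq_getbyte_if_available() code in pqcomm.c,
which is not shown in this thread: the function name is hypothetical, the
sketch uses fcntl() and recv() directly, and it ignores SSL entirely.

```c
#include <errno.h>
#include <fcntl.h>
#include <sys/types.h>
#include <sys/socket.h>
#include <unistd.h>

/*
 * Try to read one byte from a plain (non-SSL) socket without blocking.
 * Returns 1 and stores the byte in *c if one was available, 0 if nothing
 * can be read right now, and -1 on EOF or error. The socket's original
 * file status flags are restored before returning.
 */
int
demo_getbyte_if_available(int fd, unsigned char *c)
{
	int			flags;
	int			saved_errno;
	ssize_t		n;

	flags = fcntl(fd, F_GETFL);
	if (flags < 0)
		return -1;

	/* Switch to non-blocking mode just for this one read attempt. */
	if (fcntl(fd, F_SETFL, flags | O_NONBLOCK) < 0)
		return -1;

	n = recv(fd, c, 1, 0);
	saved_errno = errno;

	/* Switch back immediately afterwards. */
	if (fcntl(fd, F_SETFL, flags) < 0)
		return -1;

	if (n == 1)
		return 1;
	if (n < 0 && (saved_errno == EAGAIN || saved_errno == EWOULDBLOCK ||
				  saved_errno == EINTR))
		return 0;

	/* n == 0 means the peer closed the connection; otherwise an error. */
	return -1;
}
```

Because the flags are restored on every path that reaches the read, code
elsewhere that expects a blocking socket is unaffected by the call.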

Regards,

-- 
Fujii Masao
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center


Re: [HACKERS] Streaming replication and non-blocking I/O

2010-01-13 Thread Fujii Masao

Thanks for your advice!

On Wed, Jan 13, 2010 at 3:37 AM, Magnus Hagander mag...@hagander.net wrote:
 This change which moves walreceiver process into a dynamically loaded
 module caused the following compile error on my MinGW environment.

 That sounds strange - it should pick those up from the -lpostgres. Any
 chance you have an old postgres binary around from a non-syncrep build
 or something?

No, there is no old postgres binary.

 Do you have an environment to try to build it under msvc?

No, unfortunately.

 in my
 experience, that gives you easier-to-understand error messages in a
 lot of cases like this - it removes the mingw black magic.

OK. I'll try to build it under msvc.

But since there seems to be a long way to go before doing that,
I would appreciate if someone could give me some advice.

Regards,

-- 
Fujii Masao
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center


Re: [HACKERS] Streaming replication and non-blocking I/O

2010-01-13 Thread Heikki Linnakangas

Fujii Masao wrote:
 Done. Currently there is no new libpq function for replication. The
 walreceiver uses only existing functions like PQconnectdb, PQexec,
 PQgetCopyData, etc.
 
   git://git.postgresql.org/git/users/fujii/postgres.git
   branch: replication

Thanks!

I'm afraid we haven't quite nailed the select/poll issue yet. You copied
pq_wait() from the libpq pqSocketCheck(), but there's one big difference
between the backend and the frontend: the frontend always puts the
connection to non-blocking mode, while the backend uses blocking mode.
At least with SSL, I think it's possible for pq_wait() to return false
positives, if the SSL layer decides to renegotiate the connection
causing data to flow in the other direction in the underlying TCP
connection. A false positive would cause walsender to block
indefinitely on the pq_getbyte() call.

I don't even want to think about the changes required to put the backend
socket to non-blocking mode, I don't know that code well enough. Maybe
we could temporarily put it to non-blocking mode, read to see if there's
any data available, and put it back to blocking mode. But even then I
think we'd need to modify at least secure_read() to work correctly with
SSL in non-blocking mode.

Another idea is to use poll() to check for POLLHUP, on those platforms
that have poll(). AFAICS there is no equivalent for that in select(), so
for platforms that don't have poll() we would have to simply ignore the
issue or write some other platform-specific work-around (Windows
WSAEventSelect() seems to have a FD_CLOSE event for that). That would be
a quite localized change.
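
The poll()-based check could look roughly like the following Python sketch.
The helper name peer_hung_up is hypothetical. The exact conditions under
which POLLHUP is raised for TCP sockets differ between platforms, so this
would need testing on each one before relying on it.

```python
import select
import socket


def peer_hung_up(sock: socket.socket) -> bool:
    """Report whether poll() flags a hangup or error on the socket.

    Hypothetical helper. No events are requested: POLLHUP, POLLERR and
    POLLNVAL are reported by poll() regardless of the event mask.
    """
    poller = select.poll()
    poller.register(sock.fileno(), 0)
    hangup_bits = select.POLLHUP | select.POLLERR | select.POLLNVAL
    # A timeout of 0 makes this a non-blocking check.
    return any(revents & hangup_bits for _fd, revents in poller.poll(0))
```

select.poll is not available on Windows, which matches the point above that
a platform-specific work-around would be needed there.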

-- 
  Heikki Linnakangas
  EnterpriseDB   http://www.enterprisedb.com

Re: [HACKERS] Streaming replication and non-blocking I/O

2010-01-12 Thread Fujii Masao

On Tue, Dec 22, 2009 at 8:49 PM, Heikki Linnakangas
heikki.linnakan...@enterprisedb.com wrote:
 Umm.., I still cannot find the place where the walreceiver module is
 loaded by using load_external_function() in your 'replication' branch.
 Also the compilation of that branch fails. Is the 'pushed' branch the
 latest? Sorry if I'm missing something.

 Ah, I see. The changes were not included in the merge commit after all,
 but I had simply forgotten to git add them. Sorry about that, should be
 there now.

This change, which moves the walreceiver process into a dynamically loaded
module, caused the following compile error in my MinGW environment.

---
gcc -O2 -Wall -Wmissing-prototypes -Wpointer-arith
-Wdeclaration-after-statement -Wendif-labels -fno-strict-aliasing
-fwrapv -g  -I. -I../../../../src/interfaces/libpq
-I../../../../src/include -I./src/include/port/win32 -DEXEC_BACKEND
-I../../../../src/include/port/win32 -DBUILDING_DLL  -c -o
walreceiverproc.o walreceiverproc.c
dlltool --export-all  --output-def libwalreceiverprocdll.def walreceiverproc.o
dllwrap  -o walreceiverproc.dll --dllname walreceiverproc.dll  --def
libwalreceiverprocdll.def walreceiverproc.o -L../../../../src/backend
-lpostgres -L../../../../src/interfaces/libpq -L../../../../src/port
-lpq
Info: resolving _pg_signal_mask by linking to __imp__pg_signal_mask
(auto-import)
Info: resolving _pg_signal_queue by linking to __imp__pg_signal_queue
(auto-import)
Info: resolving _InterruptPending by linking to
__imp__InterruptPending (auto-import)
Info: resolving _assert_enabled by linking to __imp__assert_enabled
(auto-import)
Info: resolving _WalRcv by linking to __imp__WalRcv (auto-import)
Info: resolving _proc_exit_inprogress by linking to
__imp__proc_exit_inprogress (auto-import)
Info: resolving _BlockSig by linking to __imp__BlockSig (auto-import)
Info: resolving _sync_method by linking to __imp__sync_method (auto-import)
Info: resolving _MyProcPid by linking to __imp__MyProcPid (auto-import)
Info: resolving _CurrentResourceOwner by linking to
__imp__CurrentResourceOwner (auto-import)
Info: resolving _TopMemoryContext by linking to
__imp__TopMemoryContext (auto-import)
Info: resolving _CurrentMemoryContext by linking to
__imp__CurrentMemoryContext (auto-import)
Info: resolving _PG_exception_stack by linking to
__imp__PG_exception_stack (auto-import)
Info: resolving _UnBlockSig by linking to __imp__UnBlockSig (auto-import)
Info: resolving _ThisTimeLineID by linking to __imp__ThisTimeLineID
(auto-import)
Info: resolving _error_context_stack by linking to
__imp__error_context_stack (auto-import)
Info: resolving _InterruptHoldoffCount by linking to
__imp__InterruptHoldoffCount (auto-import)
c:\MinGW\bin\..\lib\gcc\mingw32\3.4.2\..\..\..\..\mingw32\bin\ld.exe:
warning: auto-importing has been activated without
--enable-auto-import specified on the command line.
This should work unless it involves constant data structures
referencing symbols from auto-imported DLLs.
fu01.o:(.idata$2+0xc): undefined reference to `libpostgres_a_iname'
fu03.o:(.idata$2+0xc): undefined reference to `libpostgres_a_iname'
fu05.o:(.idata$2+0xc): undefined reference to `libpostgres_a_iname'
fu06.o:(.idata$2+0xc): undefined reference to `libpostgres_a_iname'
fu08.o:(.idata$2+0xc): undefined reference to `libpostgres_a_iname'
fu09.o:(.idata$2+0xc): more undefined references to
`libpostgres_a_iname' follow
nmth00.o:(.idata$4+0x0): undefined reference to `_nm__pg_signal_mask'
nmth02.o:(.idata$4+0x0): undefined reference to `_nm__pg_signal_queue'
nmth04.o:(.idata$4+0x0): undefined reference to `_nm__InterruptPending'
nmth07.o:(.idata$4+0x0): undefined reference to `_nm__assert_enabled'
nmth12.o:(.idata$4+0x0): undefined reference to `_nm__WalRcv'
nmth18.o:(.idata$4+0x0): undefined reference to `_nm__proc_exit_inprogress'
nmth20.o:(.idata$4+0x0): undefined reference to `_nm__BlockSig'
nmth23.o:(.idata$4+0x0): undefined reference to `_nm__sync_method'
nmth26.o:(.idata$4+0x0): undefined reference to `_nm__MyProcPid'
nmth28.o:(.idata$4+0x0): undefined reference to `_nm__CurrentResourceOwner'
nmth30.o:(.idata$4+0x0): undefined reference to `_nm__TopMemoryContext'
nmth32.o:(.idata$4+0x0): undefined reference to `_nm__CurrentMemoryContext'
nmth35.o:(.idata$4+0x0): undefined reference to `_nm__PG_exception_stack'
nmth37.o:(.idata$4+0x0): undefined reference to `_nm__UnBlockSig'
nmth39.o:(.idata$4+0x0): undefined reference to `_nm__ThisTimeLineID'
nmth41.o:(.idata$4+0x0): undefined reference to `_nm__error_context_stack'
nmth43.o:(.idata$4+0x0): undefined reference to `_nm__InterruptHoldoffCount'
collect2: ld returned 1 exit status
c:\MinGW\bin\dllwrap.exe: c:\MinGW\bin\gcc exited with status 1
make[2]: *** [walreceiverproc.dll] Error 1
make[2]: Leaving directory
`/c/postgres/mmm/src/backend/postmaster/walreceiverproc'
make[1]: *** [all] Error 2
make[1]: Leaving directory

Re: [HACKERS] Streaming replication and non-blocking I/O

2010-01-12 Thread Magnus Hagander

On Tue, Jan 12, 2010 at 17:58, Fujii Masao masao.fu...@gmail.com wrote:
 On Tue, Dec 22, 2009 at 8:49 PM, Heikki Linnakangas
 heikki.linnakan...@enterprisedb.com wrote:
 Umm.., I still cannot find the place where the walreceiver module is
 loaded by using load_external_function() in your 'replication' branch.
 Also the compilation of that branch fails. Is the 'pushed' branch the
 latest? Sorry if I'm missing something.

 Ah, I see. The changes were not included in the merge commit after all,
 but I had simply forgotten to git add them. Sorry about that, should be
 there now.

 This change, which moves the walreceiver process into a dynamically loaded
 module, caused the following compile error in my MinGW environment.

That sounds strange - it should pick those up from the -lpostgres. Any
chance you have an old postgres binary around from a non-syncrep build
or something?


 ---

 Though I marked the variables shown in the above message as PGDLLIMPORT,
 the make still fails in the same way. I struggled with this issue for
 some time, but could not fix it yet :(

 Frankly I'm not familiar with that area. So it would be nice if someone
 could analyze this issue.

Do you have an environment to try to build it under msvc? In my
experience, that gives you easier-to-understand error messages in a
lot of cases like this - it removes the mingw black magic.

-- 
 Magnus Hagander
 Me: http://www.hagander.net/
 Work: http://www.redpill-linpro.com/

Re: [HACKERS] Streaming replication and non-blocking I/O

2010-01-05 Thread Fujii Masao

On Tue, Jan 5, 2010 at 12:22 AM, Heikki Linnakangas
heikki.linnakan...@enterprisedb.com wrote:
 I've merged the replication branch with PostgreSQL CVS HEAD now,
 including the patch for end-of-backup WAL records I committed earlier
 today. See 'replication' branch in my git repository.

 There's also a couple of other small changes: I believe the SSL stuff
 isn't really necessary, so I removed it. I also moved the
 START_REPLICATION phase from the walreceiver main loop to WalRcvConnect,
 as it's simpler that way.

I also fixed a couple of small bugs:

* The ErrorResponse message from the primary server was being ignored
* The segment boundary was handled incorrectly
* A valid replication starting location was wrongly treated as invalid

 git://git.postgresql.org/git/users/fujii/postgres.git
 branch: replication

Regards,

-- 
Fujii Masao
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center

Re: [HACKERS] Streaming replication and non-blocking I/O

2010-01-04 Thread Heikki Linnakangas

I've merged the replication branch with PostgreSQL CVS HEAD now,
including the patch for end-of-backup WAL records I committed earlier
today. See 'replication' branch in my git repository.

There's also a couple of other small changes: I believe the SSL stuff
isn't really necessary, so I removed it. I also moved the
START_REPLICATION phase from the walreceiver main loop to WalRcvConnect,
as it's simpler that way.

I will continue reviewing.

-- 
  Heikki Linnakangas
  EnterpriseDB   http://www.enterprisedb.com

Re: [HACKERS] Streaming replication and non-blocking I/O

2009-12-22 Thread Greg Stark

On Tue, Dec 22, 2009 at 6:30 AM, Heikki Linnakangas
heikki.linnakan...@enterprisedb.com wrote:
 I think we can just use load_external_function() to load the library and
 call WalReceiverMain from AuxiliaryProcessMain(). Ie. hard-code the
 library name. Walreceiver is quite tightly coupled with the rest of the
 backend anyway, so I don't think we need to come up with a pluggable API
 at the moment.

Please? I am really interested in replacing walsender and walreceiver
with something which uses a communication bus like spread instead of a
single point to point connection.

ISTM if we start with something tightly coupled it'll be hard to
decouple later. Whereas if we start with a limited interface we'll
learn just how much information is really required by the modules and
will have fewer surprises later when we find surprising
interdependencies.


-- 
greg

Re: [HACKERS] Streaming replication and non-blocking I/O

2009-12-22 Thread Heikki Linnakangas

Fujii Masao wrote:
 On Tue, Dec 22, 2009 at 3:30 PM, Heikki Linnakangas
 heikki.linnakan...@enterprisedb.com wrote:
 I think we can just use load_external_function() to load the library and
 call WalReceiverMain from AuxiliaryProcessMain(). Ie. hard-code the
 library name. Walreceiver is quite tightly coupled with the rest of the
 backend anyway, so I don't think we need to come up with a pluggable API
 at the moment.

 That's the way I did it yesterday, see 'replication' branch in my git
 repository, but it looks like I fumbled the commit so that some of the
 changes were committed as part of the merge commit with origin/master
 (=CVS HEAD). Sorry about that.
 
 Umm.., I still cannot find the place where the walreceiver module is
 loaded by using load_external_function() in your 'replication' branch.
 Also the compilation of that branch fails. Is the 'pushed' branch the
 latest? Sorry if I'm missing something.

Ah, I see. The changes were not included in the merge commit after all,
but I had simply forgotten to git add them. Sorry about that, should be
there now.

-- 
  Heikki Linnakangas
  EnterpriseDB   http://www.enterprisedb.com

Re: [HACKERS] Streaming replication and non-blocking I/O

2009-12-22 Thread Heikki Linnakangas

Greg Stark wrote:
 On Tue, Dec 22, 2009 at 6:30 AM, Heikki Linnakangas
 heikki.linnakan...@enterprisedb.com wrote:
 I think we can just use load_external_function() to load the library and
 call WalReceiverMain from AuxiliaryProcessMain(). Ie. hard-code the
 library name. Walreceiver is quite tightly coupled with the rest of the
 backend anyway, so I don't think we need to come up with a pluggable API
 at the moment.
 
 Please? I am really interested in replacing walsender and walreceiver
 with something which uses a communication bus like spread instead of a
 single point to point connection.

I think you'd still need to be able to request older WAL segments to
resync after a lost connection, restore from base backup etc., which
don't really fit into a publish/subscribe style communication bus. I'm
sure it could all be solved though. It would be a pretty cool feature,
for scaling to a large number of slaves.

 ISTM if we start with something tightly coupled it'll be hard to
 decouple later. Whereas if we start with a limited interface we'll
 learn just how much information is really required by the modules and
 will have fewer surprises later when we find surprising
 interdependencies.

I'm all ears if you have a concrete proposal.

I'm not too worried about it being hard to decouple later. The interface
is actually quite limited already, as the communication between
processes is done via shared memory. It probably wouldn't be hard to
turn it into an API, but I don't think there's a hurry to do that until
someone actually steps up to write an alternative walreceiver/walsender,

-- 
  Heikki Linnakangas
  EnterpriseDB   http://www.enterprisedb.com

Re: [HACKERS] Streaming replication and non-blocking I/O

2009-12-22 Thread Fujii Masao

On Tue, Dec 22, 2009 at 8:49 PM, Heikki Linnakangas
heikki.linnakan...@enterprisedb.com wrote:
 Ah, I see. The changes were not included in the merge commit after all,
 but I had simply forgotten to git add them. Sorry about that, should be
 there now.

Thanks for doing git push again!

But the compilation still fails.
Attached patch addresses this problem.

Regards,

-- 
Fujii Masao
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center
*** a/src/backend/Makefile
--- b/src/backend/Makefile
***************
*** 34,41 **** endif
  
  OBJS = $(SUBDIROBJS) $(LOCALOBJS) $(top_builddir)/src/port/libpgport_srv.a
  
! # We put libpgport into OBJS, so remove it from LIBS; also add libldap and libpq
! LIBS := $(filter-out -lpgport, $(LIBS)) $(LDAP_LIBS_BE) $(libpq)
  
  # The backend doesn't need everything that's in LIBS, however
  LIBS := $(filter-out -lz -lreadline -ledit -ltermcap -lncurses -lcurses, $(LIBS))
--- 34,41 ----
  
  OBJS = $(SUBDIROBJS) $(LOCALOBJS) $(top_builddir)/src/port/libpgport_srv.a
  
! # We put libpgport into OBJS, so remove it from LIBS; also add libldap
! LIBS := $(filter-out -lpgport, $(LIBS)) $(LDAP_LIBS_BE)
  
  # The backend doesn't need everything that's in LIBS, however
  LIBS := $(filter-out -lz -lreadline -ledit -ltermcap -lncurses -lcurses, $(LIBS))
*** a/src/backend/postmaster/walreceiverproc/Makefile
--- b/src/backend/postmaster/walreceiverproc/Makefile
***************
*** 18,24 **** OBJS = walreceiverproc.o
  SHLIB_LINK = $(libpq)
  NAME = walreceiverproc
  
! all: all-shared-lib
  
  include $(top_srcdir)/src/Makefile.shlib
  
--- 18,24 ----
  SHLIB_LINK = $(libpq)
  NAME = walreceiverproc
  
! all: submake-libpq all-shared-lib
  
  include $(top_srcdir)/src/Makefile.shlib
  

Re: [HACKERS] Streaming replication and non-blocking I/O

2009-12-21 Thread Fujii Masao

On Fri, Dec 18, 2009 at 11:42 AM, Fujii Masao masao.fu...@gmail.com wrote:
 Okay. Design clarification again:

 0. Begin by connecting to the master using PQconnectdb() with a new conninfo
 option specifying the request for replication. The startup packet with the
 request is sent to the master, then the backend switches to walsender
 mode. The walsender goes into the main loop and waits for a request from
 the walreceiver.
snip
 4. Start replication

 Slave -> Master: Query message, with query string START REPLICATION:
 , where  is the RecPtr of the starting point.

 Master -> Slave: CopyOutResponse followed by a continuous stream of
 CopyData messages with WAL contents.

Done. Currently there is no new libpq function for replication. The
walreceiver uses only existing functions like PQconnectdb, PQexec,
PQgetCopyData, etc.
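
For readers unfamiliar with what PQgetCopyData does on the receiving side,
here is a minimal Python sketch of de-framing a stream of CopyData messages.
It assumes the version 3 wire format (one type byte, then a 4-byte big-endian
length that counts itself but not the type byte). The function names are
hypothetical, and this is not how libpq itself is implemented.

```python
import struct


def split_messages(buffer: bytes):
    """Split a byte buffer into (type, payload) pairs plus unparsed leftover."""
    messages = []
    offset = 0
    while len(buffer) - offset >= 5:        # need type byte + length word
        msg_type = buffer[offset:offset + 1]
        (length,) = struct.unpack_from("!I", buffer, offset + 1)
        if length < 4:
            raise ValueError("length field must include its own 4 bytes")
        end = offset + 1 + length
        if end > len(buffer):
            break                           # incomplete message, wait for more
        messages.append((msg_type, buffer[offset + 5:end]))
        offset = end
    return messages, buffer[offset:]


def copy_data(payload: bytes) -> bytes:
    """Frame a payload as a CopyData ('d') message, for demonstration only."""
    return b"d" + struct.pack("!I", len(payload) + 4) + payload
```

The leftover bytes returned by split_messages() would be kept and prepended
to the next chunk read from the socket.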

  git://git.postgresql.org/git/users/fujii/postgres.git
  branch: replication

Regards,

-- 
Fujii Masao
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center

Re: [HACKERS] Streaming replication and non-blocking I/O

2009-12-21 Thread Heikki Linnakangas

Fujii Masao wrote:
 On Fri, Dec 18, 2009 at 11:42 AM, Fujii Masao masao.fu...@gmail.com wrote:
 Okay. Design clarification again:

 0. Begin by connecting to the master using PQconnectdb() with a new conninfo
 option specifying the request for replication. The startup packet with the
 request is sent to the master, then the backend switches to walsender
 mode. The walsender goes into the main loop and waits for a request from
 the walreceiver.
 snip
 4. Start replication

 Slave -> Master: Query message, with query string START REPLICATION:
 , where  is the RecPtr of the starting point.

 Master -> Slave: CopyOutResponse followed by a continuous stream of
 CopyData messages with WAL contents.
 
 Done. Currently there is no new libpq function for replication. The
 walreceiver uses only existing functions like PQconnectdb, PQexec,
 PQgetCopyData, etc.

Ok thanks, sounds good, I'll take a look.

-- 
  Heikki Linnakangas
  EnterpriseDB   http://www.enterprisedb.com

Re: [HACKERS] Streaming replication and non-blocking I/O

2009-12-21 Thread Heikki Linnakangas

Fujii Masao wrote:
 On Tue, Dec 15, 2009 at 4:11 AM, Tom Lane t...@sss.pgh.pa.us wrote:
 Hm.  Perhaps it should be a loadable plugin and not hard-linked into the
 backend?  Compare dblink.
 
 You mean that such a plugin is listed in shared_preload_libraries,
 a new process is forked, and the shared memory related to walreceiver
 is created by using shmem_startup_hook? Since this approach would
 solve the problem discussed previously, ISTM this makes sense.
 http://archives.postgresql.org/pgsql-hackers/2009-11/msg00031.php
 
 Some additional code might be required to control the termination
 of walreceiver.

I'm not sure which problem in that thread you're referring to, but I can
see two options:

1. Use dlopen()/dlsym() in walreceiver to use libpq. A bit awkward,
though we could write a bunch of macros to hide that and make the libpq
calls look normal.

2. Move walreceiver altogether into a loadable module, which is linked
as usual to libpq. Like e.g. contrib/dblink.

Thoughts? Both seem reasonable to me. I tested the 2nd option (see
'replication' branch in my git repository), splitting walreceiver.c into
two: the functions that run in the walreceiver process, and the
functions that are called from other processes to control walreceiver.
That's a quite nice separation, though of course we could do that with
the 1st approach as well.
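
To make the first option more concrete, here is a small Python sketch of
resolving a function by name at run time, which is what dlopen()/dlsym()
would do for the libpq entry points. It looks up getpid in the running
process rather than a libpq function, purely so that it runs without a
PostgreSQL installation. It works on POSIX systems only.

```python
import ctypes
import os

# Passing None asks the dynamic loader for the symbols already loaded into
# this process, similar to dlopen(NULL, ...) in C (POSIX only).
process = ctypes.CDLL(None)

# Attribute lookup resolves the symbol by name at run time, like dlsym().
getpid = process.getpid
getpid.restype = ctypes.c_int

print(getpid() == os.getpid())
```

Every libpq call would need a lookup like this, which is the awkwardness
that wrapper macros were suggested to hide.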

PS. I just merged with CVS HEAD. Streaming replication is pretty awesome
with Hot Standby!

-- 
  Heikki Linnakangas
  EnterpriseDB   http://www.enterprisedb.com

Re: [HACKERS] Streaming replication and non-blocking I/O

2009-12-21 Thread Tom Lane

Heikki Linnakangas heikki.linnakan...@enterprisedb.com writes:
 Fujii Masao wrote:
 I'm not sure which problem in that thread you're referring to, but I can
 see two options:

 1. Use dlopen()/dlsym() in walreceiver to use libpq. A bit awkward,
 though we could write a bunch of macros to hide that and make the libpq
 calls look normal.

 2. Move walreceiver altogether into a loadable module, which is linked
 as usual to libpq. Like e.g. contrib/dblink.

 Thoughts? Both seem reasonable to me.

From a packager's standpoint the second is much saner.  If you want to
use dlopen() then you will have to know the exact name of the .so file
(e.g. libpq.so.5.3) and possibly its location too.  Or you will have to
persuade packagers that they should ship bare libpq.so symlinks, which
is contrary to packaging standards on most Linux distros.
(walreceiver.so wouldn't be subject to those standards, but libpq is
because it's a regular library that can also be hard-linked by
applications.)

regards, tom lane

Re: [HACKERS] Streaming replication and non-blocking I/O

2009-12-21 Thread Fujii Masao

On Tue, Dec 22, 2009 at 2:31 AM, Heikki Linnakangas
heikki.linnakan...@enterprisedb.com wrote:
 2. Move walreceiver altogether into a loadable module, which is linked
 as usual to libpq. Like e.g. contrib/dblink.

 Thoughts? Both seem reasonable to me. I tested the 2nd option (see
 'replication' branch in my git repository), splitting walreceiver.c into
 two: the functions that run in the walreceiver process, and the
 functions that are called from other processes to control walreceiver.
 That's a quite nice separation, though of course we could do that with
 the 1st approach as well.

Though I may not fully understand what a loadable module means, I wonder
how the walreceiver module is loaded. AFAIK, we need to manually install
the dblink functions by executing dblink.sql before using them. Likewise,
if we choose the 2nd option, must we manually install the walreceiver
module before starting replication?

Or do we install it automatically by executing system_views.sql, as is done
for pg_start_backup? I'd like to reduce the number of installation operations
as much as possible. Is my concern beside the point?

 PS. I just merged with CVS HEAD. Streaming replication is pretty awesome
 with Hot Standby!

Thanks!

Regards,

-- 
Fujii Masao
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center

Re: [HACKERS] Streaming replication and non-blocking I/O

2009-12-21 Thread Tom Lane

Fujii Masao masao.fu...@gmail.com writes:
 Though I may not fully understand what a loadable module means, I wonder
 how the walreceiver module is loaded.

Put it in shared_preload_libraries, perhaps.

regards, tom lane

Re: [HACKERS] Streaming replication and non-blocking I/O

2009-12-21 Thread Heikki Linnakangas

Fujii Masao wrote:
 On Tue, Dec 22, 2009 at 2:31 AM, Heikki Linnakangas
 heikki.linnakan...@enterprisedb.com wrote:
 2. Move walreceiver altogether into a loadable module, which is linked
 as usual to libpq. Like e.g. contrib/dblink.

 Thoughts? Both seem reasonable to me. I tested the 2nd option (see
 'replication' branch in my git repository), splitting walreceiver.c into
 two: the functions that run in the walreceiver process, and the
 functions that are called from other processes to control walreceiver.
 That's a quite nice separation, though of course we could do that with
 the 1st approach as well.
 
 Though I may not fully understand what a loadable module means, I wonder
 how the walreceiver module is loaded. AFAIK, we need to manually install
 the dblink functions by executing dblink.sql before using them. Likewise,
 if we choose the 2nd option, must we manually install the walreceiver
 module before starting replication?

I think we can just use load_external_function() to load the library and
call WalReceiverMain from AuxiliaryProcessMain(). Ie. hard-code the
library name. Walreceiver is quite tightly coupled with the rest of the
backend anyway, so I don't think we need to come up with a pluggable API
at the moment.

That's the way I did it yesterday, see 'replication' branch in my git
repository, but it looks like I fumbled the commit so that some of the
changes were committed as part of the merge commit with origin/master
(=CVS HEAD). Sorry about that.

shared_preload_libraries seems like a bad place because the library
doesn't need to be loaded in all backends. Just the walreceiver process.

-- 
  Heikki Linnakangas
  EnterpriseDB   http://www.enterprisedb.com

Re: [HACKERS] Streaming replication and non-blocking I/O

2009-12-21 Thread Fujii Masao

On Tue, Dec 22, 2009 at 3:30 PM, Heikki Linnakangas
heikki.linnakan...@enterprisedb.com wrote:
 I think we can just use load_external_function() to load the library and
 call WalReceiverMain from AuxiliaryProcessMain(). Ie. hard-code the
 library name. Walreceiver is quite tightly coupled with the rest of the
 backend anyway, so I don't think we need to come up with a pluggable API
 at the moment.

 That's the way I did it yesterday, see 'replication' branch in my git
 repository, but it looks like I fumbled the commit so that some of the
 changes were committed as part of the merge commit with origin/master
 (=CVS HEAD). Sorry about that.

Umm.., I still cannot find the place where the walreceiver module is
loaded by using load_external_function() in your 'replication' branch.
Also the compilation of that branch fails. Is the 'pushed' branch the
latest? Sorry if I'm missing something.

Regards,

-- 
Fujii Masao
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center

Re: [HACKERS] Streaming replication and non-blocking I/O

2009-12-17 Thread Fujii Masao

On Wed, Dec 16, 2009 at 6:53 PM, Heikki Linnakangas
heikki.linnakan...@enterprisedb.com wrote:
 Great! The logical next step is to move the handling of TimelineID and
 system identifier out of libpq as well.

All right.

 0. Begin by connecting to the master just like a normal backend does. We
 don't necessarily need the new ProtocolVersion code either, though it's
 probably still a good idea to reject connections to older server versions.

And I think that such a backend should switch to walsender mode when the
startup packet arrives. Otherwise, we would have to authenticate the backend
twice in different contexts, i.e., as a normal backend and as a walsender, so
settings for each context would be required in pg_hba.conf. This seems odd to
me. Thoughts?

 1. Get the system identifier of the master.

 Slave -> Master: Query message, with a query string like
 GET_SYSTEM_IDENTIFIER

 Master -> Slave: RowDescription, DataRow, CommandComplete, and
 ReadyForQuery messages. The system identifier is returned in the DataRow
 message.

 This is identical to what happens when a query is executed against a
 normal backend using the simple query protocol, so walsender can use
 PQexec() for this.

s/walsender/walreceiver ?

A signal cannot cancel PQexec() while it is waiting for a message from the
server. We might need to change the SIGTERM handler of walreceiver so that it
calls proc_exit() immediately if the signal arrives during PQexec().
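
The handler logic I have in mind can be sketched in Python as follows. The
class and attribute names are hypothetical, and exit_now merely stands in for
proc_exit(); the real handler would of course be C code in walreceiver.

```python
import signal


class ShutdownState:
    """Tracks whether the process is inside an uninterruptible blocking call."""

    def __init__(self, exit_now):
        self.exit_now = exit_now          # callable standing in for proc_exit()
        self.in_blocking_call = False     # set to True around the blocking call
        self.shutdown_requested = False   # otherwise polled by the main loop

    def handle_sigterm(self, signum, frame):
        if self.in_blocking_call:
            self.exit_now(1)              # the call cannot be cancelled: exit now
        else:
            self.shutdown_requested = True  # let the main loop exit cleanly


def install(state: ShutdownState) -> None:
    """Register the handler for SIGTERM in the current process."""
    signal.signal(signal.SIGTERM, state.handle_sigterm)
```

The main loop would set in_blocking_call just before the blocking call and
clear it afterwards, then check shutdown_requested between iterations.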

 2. Another query exchange like above, for timeline ID. (or these two
 steps can be joined into one query, to eliminate one round-trip).

 3. Request a backup history file, if needed:

 Slave -> Master: Query message, with a query string like
 GET_BACKUP_HISTORY_FILE XXX where XXX is XLogRecPtr or file name.

 Master -> Slave: RowDescription, DataRow, CommandComplete, and
 ReadyForQuery messages as usual. The file contents are returned in the
 DataRow message.

 4. Start replication

 Slave -> Master: Query message, with query string START REPLICATION:
 , where  is the RecPtr of the starting point.

 Master -> Slave: CopyOutResponse followed by a continuous stream of
 CopyData messages with WAL contents.

Seems OK.

 This minimizes the changes to the protocol and libpq, with a clear way
 of extending by adding new commands. Similar to what you did a long time
 ago, connecting as an actual backend at first and then switching to
 walsender mode after running a few queries, but this would all be
 handled in a separate loop in walsender instead of running as a
 full-blown backend.

Agreed. Only walsender should be allowed to handle the query strings that
you proposed, so that we avoid touching the parser.
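
For reference, this is roughly what such a command looks like on the wire
when sent as a simple-query message. The Python sketch below assumes the
version 3 Query message layout ('Q', a 4-byte length that counts itself, then
the NUL-terminated string); the function name is hypothetical, and
GET_SYSTEM_IDENTIFIER is just the command proposed above.

```python
import struct


def build_query_message(query: str) -> bytes:
    """Encode a simple-query protocol 'Q' message for the given string."""
    body = query.encode("utf-8") + b"\x00"   # NUL-terminated query string
    # The length field counts itself (4 bytes) plus the body, not the type byte.
    return b"Q" + struct.pack("!I", 4 + len(body)) + body


message = build_query_message("GET_SYSTEM_IDENTIFIER")
```

Since walsender only has to compare the string against a few fixed commands,
no parser involvement is needed on the master side.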

Regards,

-- 
Fujii Masao
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center

Re: [HACKERS] Streaming replication and non-blocking I/O

2009-12-17 Thread Heikki Linnakangas

Fujii Masao wrote:
 On Wed, Dec 16, 2009 at 6:53 PM, Heikki Linnakangas
 heikki.linnakan...@enterprisedb.com wrote:
 0. Begin by connecting to the master just like a normal backend does. We
 don't necessarily need the new ProtocolVersion code either, though it's
 probably still a good idea to reject connections to older server versions.
 
 And I think that such a backend should switch to walsender mode when the
 startup packet arrives. Otherwise, we would have to authenticate the backend
 twice in different contexts, i.e., as a normal backend and as a walsender, so
 settings for each context would be required in pg_hba.conf. This seems odd to
 me. Thoughts?

True.

 This is identical to what happens when a query is executed against a
 normal backend using the simple query protocol, so walsender can use
 PQexec() for this.
 
 s/walsender/walreceiver ?

Right.

-- 
  Heikki Linnakangas
  EnterpriseDB   http://www.enterprisedb.com

Re: [HACKERS] Streaming replication and non-blocking I/O

2009-12-17 Thread Fujii Masao

On Thu, Dec 17, 2009 at 9:02 PM, Heikki Linnakangas
heikki.linnakan...@enterprisedb.com wrote:
 And I think that such a backend should switch to walsender mode when the
 startup packet arrives. Otherwise, we would have to authenticate the backend
 twice in different contexts, i.e., as a normal backend and as a walsender, so
 settings for each context would be required in pg_hba.conf. This seems odd to
 me. Thoughts?

 True.

Currently this switch depends on whether XLOG_STREAMING_CODE is sent from the
standby, which in turn depends on whether PQstartXLogStreaming() is called.
But, as the next step, we should get rid of such libpq changes as well.

I'm thinking of making the standby send the walsender-switch-code the same way
as application_name: walreceiver always specifies an option like
replication=on in the conninfo string and calls PQconnectdb(), which sends the
code as part of the startup packet. And I think no environment variable should
be defined for that option, to avoid user misconfiguration.
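
To show where the option would end up, here is a Python sketch of a version
3 startup packet carrying it as an ordinary name/value pair. The function
name and the user name "standby" are made up for illustration, and the option
name replication with value on is only the proposal above, not a settled
interface.

```python
import struct

PROTOCOL_VERSION_3_0 = 3 << 16   # major version 3, minor version 0


def build_startup_packet(params: dict) -> bytes:
    """Encode a startup packet: length, protocol version, name/value pairs."""
    body = struct.pack("!I", PROTOCOL_VERSION_3_0)
    for name, value in params.items():
        body += name.encode() + b"\x00" + value.encode() + b"\x00"
    body += b"\x00"                           # terminator after the last pair
    # The length word counts itself plus everything that follows.
    return struct.pack("!I", 4 + len(body)) + body


packet = build_startup_packet({"user": "standby", "replication": "on"})
```

The master can then decide to enter walsender mode while processing the
startup packet, before any query is sent.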

Thoughts? Better ideas?

Regards,

-- 
Fujii Masao
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center

Re: [HACKERS] Streaming replication and non-blocking I/O

2009-12-17 Thread Heikki Linnakangas

Fujii Masao wrote:
 I'm thinking of making the standby send the walsender-switch-code the same way
 as application_name: walreceiver always specifies an option like
 replication=on in the conninfo string and calls PQconnectdb(), which sends the
 code as part of the startup packet. And I think no environment variable should
 be defined for that option, to avoid user misconfiguration.

Sounds good.

-- 
  Heikki Linnakangas
  EnterpriseDB   http://www.enterprisedb.com

Re: [HACKERS] Streaming replication and non-blocking I/O

2009-12-17 Thread Fujii Masao

On Thu, Dec 17, 2009 at 10:25 PM, Heikki Linnakangas
heikki.linnakan...@enterprisedb.com wrote:
 Fujii Masao wrote:
 I'm thinking of making the standby send the walsender-switch-code the same way
 as application_name; walreceiver always specifies the option like
 replication=on
 in conninfo string and calls PQconnectdb(), which sends the code as a part of
 startup packet. And, the environment variable for that should not be defined to
 avoid user's mis-configuration, I think.

 Sounds good.

Okay. Design clarification again:

0. Begin by connecting to the master using PQconnectdb() with a new conninfo
option specifying the replication request. The startup packet with the
request is sent to the master, then the backend switches to walsender
mode. The walsender goes into its main loop and waits for requests from
the walreceiver.

1. Get the system identifier of the master.

Slave -> Master: Query message, with a query string like
GET_SYSTEM_IDENTIFIER

Master -> Slave: RowDescription, DataRow, CommandComplete, and
ReadyForQuery messages. The system identifier is returned in the DataRow
message.

2. Another query exchange like above, for timeline ID.

Slave -> Master: Query message, with a query string like
GET_TIMELINE

Master -> Slave: RowDescription, DataRow, CommandComplete, and
ReadyForQuery messages. The timeline ID is returned in the DataRow
message.

3. Request a backup history file, if needed:

Slave -> Master: Query message, with a query string like
GET_BACKUP_HISTORY_FILE XXX where XXX is XLogRecPtr.

Master -> Slave: RowDescription, DataRow, CommandComplete, and
ReadyForQuery messages as usual. The file contents are returned in the
DataRow message.

In 1, 2, 3, the walreceiver uses PQexec() to send Query message and receive
the results.

4. Start replication

Slave -> Master: Query message, with query string START REPLICATION: XXX,
where XXX is the RecPtr of the starting point.

Master -> Slave: CopyOutResponse followed by a continuous stream of
CopyData messages with WAL contents.
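
As a rough illustration of the query strings in steps 1 to 4 (not code from
the patch), the walreceiver side could build them along these lines before
handing them to PQexec(). The enum, the function name and the textual form of
the RecPtr argument are all assumptions made up for this sketch; only the
command keywords come from the outline above.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Handshake steps from the outline above (enum names are hypothetical). */
typedef enum
{
    STEP_GET_SYSTEM_IDENTIFIER,
    STEP_GET_TIMELINE,
    STEP_GET_BACKUP_HISTORY_FILE,
    STEP_START_REPLICATION
} HandshakeStep;

/*
 * Format the query string for one step into out.  recptr is only used by
 * the last two steps and is passed through as text (its format is an
 * assumption).  Returns the string length, or -1 if an argument is missing
 * or the buffer is too small.
 */
int
build_handshake_query(HandshakeStep step, const char *recptr,
                      char *out, size_t outlen)
{
    int         n;

    assert(out != NULL && outlen > 0);

    switch (step)
    {
        case STEP_GET_SYSTEM_IDENTIFIER:
            n = snprintf(out, outlen, "GET_SYSTEM_IDENTIFIER");
            break;
        case STEP_GET_TIMELINE:
            n = snprintf(out, outlen, "GET_TIMELINE");
            break;
        case STEP_GET_BACKUP_HISTORY_FILE:
            if (recptr == NULL)
                return -1;
            n = snprintf(out, outlen, "GET_BACKUP_HISTORY_FILE %s", recptr);
            break;
        case STEP_START_REPLICATION:
            if (recptr == NULL)
                return -1;
            n = snprintf(out, outlen, "START REPLICATION: %s", recptr);
            break;
        default:
            return -1;
    }

    /* snprintf reports the untruncated length; reject truncated output. */
    if (n < 0 || (size_t) n >= outlen)
        return -1;
    return n;
}
```

The first three strings would go through PQexec() as ordinary simple queries;
after the last one the connection is expected to switch to COPY OUT, so the
caller would read with PQgetCopyData() from then on.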

Regards,

-- 
Fujii Masao
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center

Re: [HACKERS] Streaming replication and non-blocking I/O

2009-12-16 Thread Heikki Linnakangas

Fujii Masao wrote:
 On Tue, Dec 15, 2009 at 3:47 AM, Heikki Linnakangas
 heikki.linnakan...@enterprisedb.com wrote:
 Tom Lane wrote:
 The very, very large practical problem with this is that if you decide
 to change the behavior at any time, the only way to be sure that the WAL
 receiver is using the right libpq version is to perform a soname major
 version bump.  The transformations done by libpq will essentially become
 part of its ABI, and not a very visible part at that.
 Not having to change the libpq API would certainly be a big advantage.
 
 Done; I replaced PQgetXLogData and PQputXLogRecPtr with PQgetCopyData and
 PQputCopyData.

Great! The logical next step is to move the handling of TimelineID and
system identifier out of libpq as well.


I'm thinking of refactoring the protocol along these lines:

0. Begin by connecting to the master just like a normal backend does. We
don't necessarily need the new ProtocolVersion code either, though it's
probably still a good idea to reject connections to older server versions.

1. Get the system identifier of the master.

Slave -> Master: Query message, with a query string like
GET_SYSTEM_IDENTIFIER

Master -> Slave: RowDescription, DataRow, CommandComplete, and
ReadyForQuery messages. The system identifier is returned in the DataRow
message.

This is identical to what happens when a query is executed against a
normal backend using the simple query protocol, so walsender can use
PQexec() for this.

2. Another query exchange like above, for timeline ID. (or these two
steps can be joined into one query, to eliminate one round-trip).

3. Request a backup history file, if needed:

Slave -> Master: Query message, with a query string like
GET_BACKUP_HISTORY_FILE XXX where XXX is XLogRecPtr or file name.

Master -> Slave: RowDescription, DataRow, CommandComplete, and
ReadyForQuery messages as usual. The file contents are returned in the
DataRow message.


4. Start replication

Slave -> Master: Query message, with query string START REPLICATION: XXX,
where XXX is the RecPtr of the starting point.

Master -> Slave: CopyOutResponse followed by a continuous stream of
CopyData messages with WAL contents.


This minimizes the changes to the protocol and libpq, with a clear way
of extending by adding new commands. Similar to what you did a long time
ago, connecting as an actual backend at first and then switching to
walsender mode after running a few queries, but this would all be
handled in a separate loop in walsender instead of running as a
full-blown backend. We'll still need small changes to libpq to allow
sending messages back to the server in COPY_IN mode (maybe add a new
COPY_IN_OUT mode for that).

Thoughts?

-- 
  Heikki Linnakangas
  EnterpriseDB   http://www.enterprisedb.com

Re: [HACKERS] Streaming replication and non-blocking I/O

2009-12-16 Thread Greg Stark

I'm interested in abstracting out features of replication from libpq too. It
would be nice if we could implement different communication bus modules.

For example if you have dozens of replicas you may want to use something
like spread to distribute the records using multicast.

Sorry for top posting -- I haven't yet figured out how not to in this
client.

Re: [HACKERS] Streaming replication and non-blocking I/O

2009-12-15 Thread Fujii Masao

On Tue, Dec 15, 2009 at 3:47 AM, Heikki Linnakangas
heikki.linnakan...@enterprisedb.com wrote:
 Tom Lane wrote:
 The very, very large practical problem with this is that if you decide
 to change the behavior at any time, the only way to be sure that the WAL
 receiver is using the right libpq version is to perform a soname major
 version bump.  The transformations done by libpq will essentially become
 part of its ABI, and not a very visible part at that.

 Not having to change the libpq API would certainly be a big advantage.

Done; I replaced PQgetXLogData and PQputXLogRecPtr with PQgetCopyData and
PQputCopyData.

 git://git.postgresql.org/git/users/fujii/postgres.git
 branch: replication

Regards,

-- 
Fujii Masao
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center

Re: [HACKERS] Streaming replication and non-blocking I/O

2009-12-14 Thread Fujii Masao

On Sat, Dec 12, 2009 at 5:09 PM, Heikki Linnakangas
heikki.linnakan...@enterprisedb.com wrote:
 Could we change the API of PQgetXLogData to be more like PQgetCopyData?
 I'm thinking of removing the timeout argument, and instead looping with
 select/poll and PQconsumeInput in the caller. That probably means
 introducing a new state analogous to PGASYNC_COPY_IN. I haven't thought
 this fully through yet, but it seems like it would be good to have a
 consistent API.

On a related issue, so far I haven't considered how to output notice
messages at all :( In the current SR, they are always written to
stderr by the defaultNoticeProcessor using fprintf, whether
log_destination is specified or not. This is bizarre, and needs to
be fixed.

I'm going to install a new function that calls ereport as the notice
processor, using PQsetNoticeProcessor. The problem is that only the
complete message text, like NOTICE: xxx, is passed to the notice processor;
the error level itself is not passed.

So I wonder which error level should be used to output the notice message.
There are some approaches to address this:

1. Always use a specific level without regard to the actual one
2. Reverse-engineer the level from the complete message
3. Change some libpq functions so as to pass the error level to the notice
   processor

But nothing really stands out. Do you have another good idea?

Regards,

-- 
Fujii Masao
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center

Re: [HACKERS] Streaming replication and non-blocking I/O

2009-12-14 Thread Tom Lane

Fujii Masao masao.fu...@gmail.com writes:
 On Mon, Dec 14, 2009 at 11:38 AM, Tom Lane t...@sss.pgh.pa.us wrote:
 Do we need a new PQgetXLogData function at all?  Seems like you could
 shove the data through the COPY protocol and not have to touch libpq
 at all, rather than duplicating a nontrivial amount of code there.

 Yeah, I also think that all data (the WAL data itself, its LSN and
 the flag bits) which the PQgetXLogData handles could be shoved
 through the COPY protocol. But, outside libpq, it's somewhat messy
 to extract the LSN and the flag bits from the data buffer which
 PQgetCopyData returns, by using ntohs(). So I provided the new
 libpq function only for replication. That is, I didn't want to expose
 the low layer of network which libpq should handle.

I find that a completely unconvincing division of labor.  Who is to say
that the LSN is the only part of the data that needs special treatment?

The very, very large practical problem with this is that if you decide
to change the behavior at any time, the only way to be sure that the WAL
receiver is using the right libpq version is to perform a soname major
version bump.  The transformations done by libpq will essentially become
part of its ABI, and not a very visible part at that.

I am going to insist that no such logic be placed in libpq.  From a
packager's standpoint that's insanity.

regards, tom lane

Re: [HACKERS] Streaming replication and non-blocking I/O

2009-12-14 Thread Tom Lane

Fujii Masao masao.fu...@gmail.com writes:
 I'm going to set the new function calling ereport as the current notice
 processor by using PQsetNoticeProcessor. But the problem is that only the
 completed message like NOTICE: xxx is passed to such notice processor,
 i.e., the error level itself is not passed.

Use PQsetNoticeReceiver.  The other one is just there for backwards
compatibility.

regards, tom lane

Re: [HACKERS] Streaming replication and non-blocking I/O

2009-12-14 Thread Heikki Linnakangas

Tom Lane wrote:
 The very, very large practical problem with this is that if you decide
 to change the behavior at any time, the only way to be sure that the WAL
 receiver is using the right libpq version is to perform a soname major
 version bump.  The transformations done by libpq will essentially become
 part of its ABI, and not a very visible part at that.

Not having to change the libpq API would certainly be a big advantage.

It's going to be a bit more complicated in walsender/walreceiver to work
with the libpq COPY API. We're going to need a WAL sending/receiving
protocol on top of it, defined in terms of rows and columns passed
through the COPY protocol.

One problem is that the standby is supposed to send back acknowledgments
to the master, telling it how far it has received/replayed the WAL. Is
there any way to send information back to the server, while a COPY OUT
is in progress? That's not absolutely necessary with asynchronous
replication, but will be with synchronous.

One idea is to stop/start the COPY between every batch of WAL records
sent, giving the client (= walreceiver) a chance to send messages back.
But that will lead to extra round trips.

BTW, something that's been bothering me a bit with this patch is that we
now have to link the backend with libpq. I don't see an immediate
problem with that, but I'm not a packager. Does anyone see a problem
with that?

-- 
  Heikki Linnakangas
  EnterpriseDB   http://www.enterprisedb.com

Re: [HACKERS] Streaming replication and non-blocking I/O

2009-12-14 Thread Tom Lane

Heikki Linnakangas heikki.linnakan...@enterprisedb.com writes:
 It's going to be a bit more complicated in walsender/walreceiver to work
 with the libpq COPY API. We're going to need a WAL sending/receiving
 protocol on top of it, defined in terms of rows and columns passed
 through the COPY protocol.

AFAIR, libpq knows essentially nothing of the data being passed through
COPY --- it just treats that as a byte stream.  I think you can define
any data format you want, it doesn't need to look exactly like a COPY
of a table would.  In fact it's probably a lot better if it DOESN'T
look like COPY data once it gets past libpq, so that you can check
that it is WAL and not COPY data.
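
To make the "define any data format you want" point concrete, here is a
minimal sketch of decoding one such payload on the receiving side, i.e. the
buffer PQgetCopyData() would hand back. The layout is purely hypothetical
(a one-byte 'w' tag, then the two 32-bit halves of the starting position in
network byte order, then raw WAL bytes); nothing in this thread fixes the
actual format, and the flag bits mentioned earlier are left out.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical framing: tag byte, xlogid, xrecoff, then WAL data. */
#define WAL_MSG_TYPE    'w'
#define WAL_MSG_HDR_LEN 9

typedef struct
{
    uint32_t    xlogid;         /* high half of the start position */
    uint32_t    xrecoff;        /* low half of the start position */
    const unsigned char *data;  /* points into the caller's buffer */
    size_t      datalen;
} WalMessage;

/* Read a 32-bit big-endian integer without relying on host byte order. */
static uint32_t
read_be32(const unsigned char *p)
{
    return ((uint32_t) p[0] << 24) | ((uint32_t) p[1] << 16) |
           ((uint32_t) p[2] << 8) | (uint32_t) p[3];
}

/*
 * Decode one payload.  Returns 0 on success, -1 if the buffer is too
 * short or does not carry the WAL tag (e.g. it is ordinary COPY data).
 */
int
parse_wal_message(const unsigned char *buf, size_t len, WalMessage *msg)
{
    assert(buf != NULL && msg != NULL);

    if (len < WAL_MSG_HDR_LEN || buf[0] != WAL_MSG_TYPE)
        return -1;

    msg->xlogid = read_be32(buf + 1);
    msg->xrecoff = read_be32(buf + 5);
    msg->data = buf + WAL_MSG_HDR_LEN;
    msg->datalen = len - WAL_MSG_HDR_LEN;
    return 0;
}
```

The tag byte is what lets the receiver reject anything that is not WAL, and
keeping the decoding in walreceiver means a format change does not touch
libpq at all.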

 One problem is that the standby is supposed to send back acknowledgments
 to the master, telling it how far it has received/replayed the WAL. Is
 there any way to send information back to the server, while a COPY OUT
 is in progress? That's not absolutely necessary with asynchronous
 replication, but will be with synchronous.

Well, a real COPY would of course not stop to look for incoming
messages, but I don't think that's inherent in the protocol.  You
would likely need some libpq adjustments so it didn't throw error
when you tried that, but it would be a small and one-time adjustment.

 BTW, something that's been bothering me a bit with this patch is that we
 now have to link the backend with libpq. I don't see an immediate
 problem with that, but I'm not a packager. Does anyone see a problem
 with that?

Yeah, I have a problem with that.  What's the backend doing with libpq?
It's not receiving this data, it's sending it.

regards, tom lane

Re: [HACKERS] Streaming replication and non-blocking I/O

2009-12-14 Thread Tom Lane

Heikki Linnakangas heikki.linnakan...@enterprisedb.com writes:
 Tom Lane wrote:
 Yeah, I have a problem with that.  What's the backend doing with libpq?
 It's not receiving this data, it's sending it.

 walreceiver is a postmaster subprocess too.

Hm.  Perhaps it should be a loadable plugin and not hard-linked into the
backend?  Compare dblink.

The main concern I have with hard-linking libpq is that it has a lot of
symbol conflicts with the backend --- and at least the ones from
src/port/ aren't easily removed.  I foresee problems that will be very
difficult to fix on platforms where we can't filter the set of link
symbols exposed by libpq.  Linking a thread-enabled libpq into the
backend could also create problems on some platforms --- it would likely
cause a thread-capable libc to get linked, which is not what we want.

regards, tom lane

Re: [HACKERS] Streaming replication and non-blocking I/O

2009-12-14 Thread Heikki Linnakangas

Tom Lane wrote:
 Heikki Linnakangas heikki.linnakan...@enterprisedb.com writes:
 BTW, something that's been bothering me a bit with this patch is that we
 now have to link the backend with libpq. I don't see an immediate
 problem with that, but I'm not a packager. Does anyone see a problem
 with that?
 
 Yeah, I have a problem with that.  What's the backend doing with libpq?
 It's not receiving this data, it's sending it.

walreceiver is a postmaster subprocess too.

-- 
  Heikki Linnakangas
  EnterpriseDB   http://www.enterprisedb.com

Re: [HACKERS] Streaming replication and non-blocking I/O

2009-12-14 Thread Fujii Masao

On Tue, Dec 15, 2009 at 4:11 AM, Tom Lane t...@sss.pgh.pa.us wrote:
 Hm.  Perhaps it should be a loadable plugin and not hard-linked into the
 backend?  Compare dblink.

You mean that such a plugin is listed in shared_preload_libraries,
a new process is forked, and the shared memory for walreceiver
is created using shmem_startup_hook? Since this approach would
solve the problem discussed previously, ISTM this makes sense.
http://archives.postgresql.org/pgsql-hackers/2009-11/msg00031.php

Some additional code might be required to control the termination
of walreceiver.

Regards,

-- 
Fujii Masao
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center

Re: [HACKERS] Streaming replication and non-blocking I/O

2009-12-13 Thread Fujii Masao

On Sun, Dec 13, 2009 at 5:42 AM, Heikki Linnakangas
heikki.linnakan...@enterprisedb.com wrote:
 Walreceiver wants to wait for data to arrive from the master or a
 signal. PQgetXLogData(), which is the libpq function to read a piece of
 WAL, takes a timeout argument to support that. Walreceiver calls
 PQgetXLogData() in an endless loop, checking for a received sighup or
 death of postmaster at every iteration.

 In the synchronous replication mode, I presume it's also going to listen
 for a signal from the startup process, so that it can send an
 acknowledgment to the master as soon as a COMMIT record has been
 replayed that a backend on the master is waiting for.

Right.

 To implement the timeout in PQgetXLogData(), pqWaitTimed() was changed
 to take a timeout instead of finishing_time argument. Which is a mistake
 because it breaks PQconnectdb, and as I said I don't think
 PQgetXLogData() should have a timeout argument to begin with. Instead,
 it should have a boolean 'async' argument to return immediately if
 there's no data, and walreceiver main loop should call poll()/select()
 to wait. Ie. just like PQgetCopyData() works.

Seems good. I'll revise the code.

Regards,

-- 
Fujii Masao
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center

Re: [HACKERS] Streaming replication and non-blocking I/O

2009-12-13 Thread Tom Lane

Fujii Masao masao.fu...@gmail.com writes:
 On Sun, Dec 13, 2009 at 5:42 AM, Heikki Linnakangas
 heikki.linnakan...@enterprisedb.com wrote:
 To implement the timeout in PQgetXLogData(), pqWaitTimed() was changed
 to take a timeout instead of finishing_time argument. Which is a mistake
 because it breaks PQconnectdb, and as I said I don't think
 PQgetXLogData() should have a timeout argument to begin with. Instead,
 it should have a boolean 'async' argument to return immediately if
 there's no data, and walreceiver main loop should call poll()/select()
 to wait. Ie. just like PQgetCopyData() works.

 Seems good. I'll revise the code.

Do we need a new PQgetXLogData function at all?  Seems like you could
shove the data through the COPY protocol and not have to touch libpq
at all, rather than duplicating a nontrivial amount of code there.

regards, tom lane

Re: [HACKERS] Streaming replication and non-blocking I/O

2009-12-13 Thread Fujii Masao

On Mon, Dec 14, 2009 at 11:38 AM, Tom Lane t...@sss.pgh.pa.us wrote:
 Do we need a new PQgetXLogData function at all?  Seems like you could
 shove the data through the COPY protocol and not have to touch libpq
 at all, rather than duplicating a nontrivial amount of code there.

Yeah, I also think that all data (the WAL data itself, its LSN and
the flag bits) which the PQgetXLogData handles could be shoved
through the COPY protocol. But, outside libpq, it's somewhat messy
to extract the LSN and the flag bits from the data buffer which
PQgetCopyData returns, by using ntohs(). So I provided the new
libpq function only for replication. That is, I didn't want to expose
the low layer of network which libpq should handle.

I think that the friendly function would be useful to implement
the standby program (e.g., a stand-alone walreceiver tool) outside
the core.

Regards,

-- 
Fujii Masao
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center

Re: [HACKERS] Streaming replication and non-blocking I/O

2009-12-12 Thread Heikki Linnakangas

Fujii Masao wrote:
 On Thu, Dec 10, 2009 at 12:00 AM, Tom Lane t...@sss.pgh.pa.us wrote:
 The OS buffer is expected to be able to store a large number of
 XLogRecPtr messages, because its size is small. So it's also OK
 to just drop it.
 It certainly seems to be something we could improve later, when and
 if evidence emerges that it's a real-world problem.  For now,
 simple is beautiful.
 
 I just dropped the backend libpq changes related to non-blocking I/O.
 
   git://git.postgresql.org/git/users/fujii/postgres.git
   branch: replication

Thanks, much simpler now.

Changing the finish_time argument of pqWaitTimed into timeout_ms changes
the behavior of the connect_timeout option to PQconnectdb. It should wait for
max connect_timeout seconds in total, but now it is waiting for
connect_timeout seconds at each step in the connection process: opening
a socket, authenticating etc.
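
The difference is easy to see in a small sketch. With an absolute deadline,
each step only gets whatever is left of the overall budget; with a relative
timeout_ms passed unchanged to every step, the budget is effectively
restarted each time. The helper below is hypothetical (it is not the libpq
code), and treating a finish_time of -1 as "no limit" is an assumption made
for this example.

```c
#include <assert.h>
#include <limits.h>
#include <time.h>

/*
 * Convert an absolute deadline into the timeout for the next wait.
 * Returns milliseconds left (0 if the deadline has passed), or -1 for
 * "wait forever" when finish_time is (time_t) -1.
 */
int
remaining_timeout_ms(time_t finish_time, time_t now)
{
    double      left_ms;

    assert(now != (time_t) -1);     /* time() failed in the caller */

    if (finish_time == (time_t) -1)
        return -1;

    left_ms = difftime(finish_time, now) * 1000.0;
    if (left_ms <= 0.0)
        return 0;
    if (left_ms > (double) INT_MAX)
        return INT_MAX;
    return (int) left_ms;
}
```

With connect_timeout = 10, a caller would compute finish_time = time(NULL) + 10
once, and then call remaining_timeout_ms(finish_time, time(NULL)) before each
wait, so a slow socket open leaves less time for authentication rather than a
fresh 10 seconds.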

Could we change the API of PQgetXLogData to be more like PQgetCopyData?
I'm thinking of removing the timeout argument, and instead looping with
select/poll and PQconsumeInput in the caller. That probably means
introducing a new state analogous to PGASYNC_COPY_IN. I haven't thought
this fully through yet, but it seems like it would be good to have a
consistent API.

-- 
  Heikki Linnakangas
  EnterpriseDB   http://www.enterprisedb.com

Re: [HACKERS] Streaming replication and non-blocking I/O

2009-12-12 Thread Tom Lane

Heikki Linnakangas heikki.linnakan...@enterprisedb.com writes:
 Changing the finish_time argument of pqWaitTimed into timeout_ms changes
 the behavior of the connect_timeout option to PQconnectdb. It should wait for
 max connect_timeout seconds in total, but now it is waiting for
 connect_timeout seconds at each step in the connection process: opening
 a socket, authenticating etc.

Refresh my memory as to why this patch is touching any of that code at
all?

regards, tom lane

Re: [HACKERS] Streaming replication and non-blocking I/O

2009-12-12 Thread Heikki Linnakangas

Tom Lane wrote:
 Heikki Linnakangas heikki.linnakan...@enterprisedb.com writes:
 Changing the finish_time argument of pqWaitTimed into timeout_ms changes
 the behavior of the connect_timeout option to PQconnectdb. It should wait for
 max connect_timeout seconds in total, but now it is waiting for
 connect_timeout seconds at each step in the connection process: opening
 a socket, authenticating etc.
 
 Refresh my memory as to why this patch is touching any of that code at
 all?

Walreceiver wants to wait for data to arrive from the master or a
signal. PQgetXLogData(), which is the libpq function to read a piece of
WAL, takes a timeout argument to support that. Walreceiver calls
PQgetXLogData() in an endless loop, checking for a received sighup or
death of postmaster at every iteration.

In the synchronous replication mode, I presume it's also going to listen
for a signal from the startup process, so that it can send an
acknowledgment to the master as soon as a COMMIT record has been
replayed that a backend on the master is waiting for.

To implement the timeout in PQgetXLogData(), pqWaitTimed() was changed
to take a timeout instead of finishing_time argument. Which is a mistake
because it breaks PQconnectdb, and as I said I don't think
PQgetXLogData() should have a timeout argument to begin with. Instead,
it should have a boolean 'async' argument to return immediately if
there's no data, and walreceiver main loop should call poll()/select()
to wait. Ie. just like PQgetCopyData() works.
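
The waiting half of that loop might look roughly like the sketch below. It
works on a plain file descriptor so that it stands alone; in walreceiver the
descriptor would presumably come from PQsocket(), and the read would be
PQconsumeInput() followed by a non-blocking PQgetCopyData() rather than
read(). The function names are invented for this example.

```c
#include <assert.h>
#include <errno.h>
#include <poll.h>
#include <unistd.h>

/*
 * Wait until fd is readable or timeout_ms elapses.  Returns 1 if there is
 * something to read, 0 on timeout or if a signal interrupted the wait (so
 * the caller can check its signal flags), -1 on error.
 */
int
wait_for_data(int fd, int timeout_ms)
{
    struct pollfd pfd;
    int         rc;

    assert(fd >= 0);

    pfd.fd = fd;
    pfd.events = POLLIN;
    pfd.revents = 0;

    rc = poll(&pfd, 1, timeout_ms);
    if (rc < 0)
        return (errno == EINTR) ? 0 : -1;
    if (rc == 0)
        return 0;
    return (pfd.revents & (POLLIN | POLLHUP | POLLERR)) ? 1 : 0;
}

/*
 * Read up to len bytes if data arrives within timeout_ms.  Returns the
 * number of bytes read, 0 if nothing arrived, -1 on error or EOF.
 */
ssize_t
read_when_ready(int fd, void *buf, size_t len, int timeout_ms)
{
    int         ready = wait_for_data(fd, timeout_ms);
    ssize_t     n;

    if (ready <= 0)
        return ready;
    n = read(fd, buf, len);
    return (n > 0) ? n : -1;
}
```

The main loop would call this with a short timeout and, whenever it returns 0,
check for a pending SIGHUP or postmaster death before waiting again, which
keeps the timeout logic out of libpq entirely.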

-- 
  Heikki Linnakangas
  EnterpriseDB   http://www.enterprisedb.com

Re: [HACKERS] Streaming replication and non-blocking I/O

2009-12-09 Thread Tom Lane

Fujii Masao masao.fu...@gmail.com writes:
 On Wed, Dec 9, 2009 at 3:58 PM, Heikki Linnakangas
 heikki.linnakan...@enterprisedb.com wrote:
 But if everyone is happy with just relying on the OS buffer to not fill
 up, let's just drop it.

 The OS buffer is expected to be able to store a large number of
 XLogRecPtr messages, because its size is small. So it's also OK
 to just drop it.

It certainly seems to be something we could improve later, when and
if evidence emerges that it's a real-world problem.  For now,
simple is beautiful.

regards, tom lane

Re: [HACKERS] Streaming replication and non-blocking I/O

2009-12-09 Thread Fujii Masao

On Thu, Dec 10, 2009 at 12:00 AM, Tom Lane t...@sss.pgh.pa.us wrote:
 The OS buffer is expected to be able to store a large number of
 XLogRecPtr messages, because its size is small. So it's also OK
 to just drop it.

 It certainly seems to be something we could improve later, when and
 if evidence emerges that it's a real-world problem.  For now,
 simple is beautiful.

I just dropped the backend libpq changes related to non-blocking I/O.

  git://git.postgresql.org/git/users/fujii/postgres.git
  branch: replication

Regards,

-- 
Fujii Masao
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center

[HACKERS] Streaming replication and non-blocking I/O

2009-12-08 Thread Heikki Linnakangas

I find the backend libpq changes related to non-blocking I/O quite
complex. Can we find a simpler solution?

The problem we're trying to solve is that while the walsender backend
sends a lot of WAL records to the client, the client can send a lot of
messages to the backend. If the volume of messages from client to server
exceeds both the input buffer in the server and the output buffer in the
client, the client will block until the server has read some data. But
if the client is blocked, it will not process incoming data from the
server, and eventually the server will block too. And we have a
deadlock. This:
http://florin.bjdean.id.au/docs/omnimark/omni55/docs/html/concept/717.htm
is a pretty good description of the problem.

The first question is: do we really need to be prepared for that? The
XLogRecPtr acknowledgment messages the client sends are very small, and
if the client is mindful about not sending them too often - perhaps max
1 ack per 1 received XLOG message - the receive buffer in the backend
should never fill up in practice.

If that's deemed not good enough, we could modify just internal_flush()
so that it uses secure_poll to wait for the possibility to either read
or write, instead of blocking for just write. Whenever there's incoming
data, read them into PqRecvBuffer for later processing, which keeps the
OS input buffer from filling up. If PqRecvBuffer fills up, it can be
extended, or we can start dropping old XLogRecPtr messages from it.

In any case, we'll need something like pq_wait to check if a message can
be read without blocking, but that's just a small additional function as
opposed to a whole new API for assembling and sending messages without
blocking.

-- 
  Heikki Linnakangas
  EnterpriseDB   http://www.enterprisedb.com

Re: [HACKERS] Streaming replication and non-blocking I/O

2009-12-08 Thread Fujii Masao

On Tue, Dec 8, 2009 at 11:23 PM, Heikki Linnakangas
heikki.linnakan...@enterprisedb.com wrote:
 The first question is: do we really need to be prepared for that? The
 XLogRecPtr acknowledgment messages the client sends are very small, and
 if the client is mindful about not sending them too often - perhaps max
 1 ack per 1 received XLOG message - the receive buffer in the backend
 should never fill up in practice.

It's OK to drop that feature.

 If that's deemed not good enough, we could modify just internal_flush()
 so that it uses secure_poll to wait for the possibility to either read
 or write, instead of blocking for just write. Whenever there's incoming
 data, read them into PqRecvBuffer for later processing, which keeps the
 OS input buffer from filling up. If PqRecvBuffer fills up, it can be
 extended, or we can start dropping old XLogRecPtr messages from it.

Extending PqRecvBuffer seems better because there are several types of
XLogRecPtr message (i.e., we cannot just drop old messages without parsing
all the messages in the buffer).

Regards,

-- 
Fujii Masao
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center

Re: [HACKERS] Streaming replication and non-blocking I/O

2009-12-08 Thread Heikki Linnakangas

Fujii Masao wrote:
 On Tue, Dec 8, 2009 at 11:23 PM, Heikki Linnakangas
 heikki.linnakan...@enterprisedb.com wrote:
 If that's deemed not good enough, we could modify just internal_flush()
 so that it uses secure_poll to wait for the possibility to either read
 or write, instead of blocking for just write. Whenever there's incoming
 data, read them into PqRecvBuffer for later processing, which keeps the
 OS input buffer from filling up. If PqRecvBuffer fills up, it can be
 extended, or we can start dropping old XLogRecPtr messages from it.
 
 Extending PqRecvBuffer seems better because XLogRecPtr message
 has some types (i.e., we cannot just drop old message without parsing
 all messages in the buffer).

True. Another idea I had was to introduce a callback that backend libpq
can call when the buffer fills. Walsender would set the callback to
ProcessStreamMsgs().
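
The callback idea could look roughly like the sketch below. Everything in
it is hypothetical: InBuf, inbuf_append and the tiny fixed buffer exist
only for illustration, and process_stream_msgs is a stand-in that merely
counts and discards bytes, since the real signature of ProcessStreamMsgs()
is not given here.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

typedef struct InBuf InBuf;

/* Called when the buffer has no room; expected to consume buffered data. */
typedef void (*buffer_full_cb)(InBuf *buf, void *arg);

struct InBuf {
    char           data[16];   /* deliberately tiny so it fills quickly */
    size_t         len;
    buffer_full_cb on_full;    /* set by the owner, e.g. walsender */
    void          *cb_arg;
};

/* Remove the first n bytes from the buffer. */
static void inbuf_consume(InBuf *b, size_t n)
{
    assert(n <= b->len);
    memmove(b->data, b->data + n, b->len - n);
    b->len -= n;
}

/*
 * Append n bytes. If there is no room, invoke the callback once so the
 * owner can process and free space. Returns 0 on success, -1 if the data
 * still does not fit.
 */
static int inbuf_append(InBuf *b, const char *src, size_t n)
{
    if (n > sizeof b->data)
        return -1;
    if (b->len + n > sizeof b->data && b->on_full != NULL)
        b->on_full(b, b->cb_arg);
    if (b->len + n > sizeof b->data)
        return -1;
    memcpy(b->data + b->len, src, n);
    b->len += n;
    return 0;
}

/* Stand-in for ProcessStreamMsgs(): count and discard everything buffered. */
static void process_stream_msgs(InBuf *b, void *arg)
{
    size_t *processed = arg;

    *processed += b->len;
    inbuf_consume(b, b->len);
}
```

The buffer code stays ignorant of message formats; only the registered
callback needs to know how to parse what has accumulated.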

But if everyone is happy with just relying on the OS buffer to not fill
up, let's just drop it.

-- 
  Heikki Linnakangas
  EnterpriseDB   http://www.enterprisedb.com

Re: [HACKERS] Streaming replication and non-blocking I/O

2009-12-08 Thread Fujii Masao

On Wed, Dec 9, 2009 at 3:58 PM, Heikki Linnakangas
heikki.linnakan...@enterprisedb.com wrote:
 True. Another idea I had was to introduce a callback that backend libpq
 can call when the buffer fills. Walsender would set the callback to
 ProcessStreamMsgs().

 But if everyone is happy with just relying on the OS buffer to not fill
 up, let's just drop it.

The OS buffer is expected to be able to store a large number of
XLogRecPtr messages, because each message is small. So it's also OK
to just drop it.

Regards,

-- 
Fujii Masao
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center
