Re: [HACKERS] Streaming replication on win32, still broken

2010-02-21 Thread Fujii Masao
On Fri, Feb 19, 2010 at 7:54 PM, Heikki Linnakangas
heikki.linnakan...@enterprisedb.com wrote:
 Heikki Linnakangas wrote:
 Magnus Hagander wrote:
 Well, it's going to make the process that reads the WAL cause actual
 physical I/O... That'll take a chunk out of your total available I/O,
 which is likely to push you to the limit of your I/O capacity much
 quicker.

 Right, doesn't seem sensible, though it would be nice to see a benchmark
 on that.

 Here's a patch to disable O_DIRECT when archiving or streaming is
 enabled. This is pretty hard to test, so any extra eyeballs would be nice..

 Committed. Can you check that this fixed the PANIC you saw?

Thanks! Yeah, SR works fine in my MinGW environment.

Regards,

-- 
Fujii Masao
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Streaming replication on win32, still broken

2010-02-19 Thread Heikki Linnakangas
Heikki Linnakangas wrote:
 Magnus Hagander wrote:
 Well, it's going to make the process that reads the WAL cause actual
 physical I/O... That'll take a chunk out of your total available I/O,
 which is likely to push you to the limit of your I/O capacity much
 quicker.
 
 Right, doesn't seem sensible, though it would be nice to see a benchmark
 on that.
 
 Here's a patch to disable O_DIRECT when archiving or streaming is
 enabled. This is pretty hard to test, so any extra eyeballs would be nice..

Committed. Can you check that this fixed the PANIC you saw?

-- 
  Heikki Linnakangas
  EnterpriseDB   http://www.enterprisedb.com

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Streaming replication on win32, still broken

2010-02-18 Thread Magnus Hagander
2010/2/18 Heikki Linnakangas heikki.linnakan...@enterprisedb.com:
 Fujii Masao wrote:
 On Thu, Feb 18, 2010 at 5:28 AM, Heikki Linnakangas
 heikki.linnakan...@enterprisedb.com wrote:
 If I'm reading the patch correctly, when wal_sync_method is 'open_sync',
 walreceiver nevertheless opens the WAL file without the O_DIRECT flag.
 When it later flushes it in XLogWalRcvFlush() by issue_xlog_fsync(),
 issue_xlog_fsync() will do nothing because it assumes the write() synced
 it already. So the data written isn't being forced to disk at all.

 When 'open_sync' is chosen, the WAL file is opened with O_SYNC or O_FSYNC
 flag. So I think that write() flushes the data to disk even if O_DIRECT
 flag is not given. Am I missing something?

 Ah, ok, you're right.

Yes, I believe the difference is that with O_DIRECT it bypasses the
cache completely. Without it, we still sync it out, but it also goes
into the cache.

O_DIRECT helps us when we're not going to read the file again, because
we don't waste cache on it. If we are, which is the case here, it
should be really bad for performance, since we actually have to do a
physical read.

Incidentally, that should also apply to general WAL when archive_mdoe
is on. Do we optimize for that?


-- 
 Magnus Hagander
 Me: http://www.hagander.net/
 Work: http://www.redpill-linpro.com/

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Streaming replication on win32, still broken

2010-02-18 Thread Heikki Linnakangas
Magnus Hagander wrote:
 O_DIRECT helps us when we're not going to read the file again, because
 we don't waste cache on it. If we are, which is the case here, it
 should be really bad for performance, since we actually have to do a
 physical read.
 
 Incidentally, that should also apply to general WAL when archive_mdoe
 is on. Do we optimize for that?

Hmm, no we don't. We do take that into account so that we refrain from
issuing posix_fadvice(DONTNEED) if archive_mode is on, but we don't
disable O_DIRECT. Maybe we should..

-- 
  Heikki Linnakangas
  EnterpriseDB   http://www.enterprisedb.com

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Streaming replication on win32, still broken

2010-02-18 Thread Fujii Masao
On Thu, Feb 18, 2010 at 7:04 PM, Heikki Linnakangas
heikki.linnakan...@enterprisedb.com wrote:
 Magnus Hagander wrote:
 O_DIRECT helps us when we're not going to read the file again, because
 we don't waste cache on it. If we are, which is the case here, it
 should be really bad for performance, since we actually have to do a
 physical read.

 Incidentally, that should also apply to general WAL when archive_mdoe
 is on. Do we optimize for that?

 Hmm, no we don't. We do take that into account so that we refrain from
 issuing posix_fadvice(DONTNEED) if archive_mode is on, but we don't
 disable O_DIRECT. Maybe we should..

Since the performance of WAL write is more important than that of WAL
archiving in general, that optimization might offer little benefit.

Regards,

-- 
Fujii Masao
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Streaming replication on win32, still broken

2010-02-18 Thread Magnus Hagander
2010/2/18 Fujii Masao masao.fu...@gmail.com:
 On Thu, Feb 18, 2010 at 7:04 PM, Heikki Linnakangas
 heikki.linnakan...@enterprisedb.com wrote:
 Magnus Hagander wrote:
 O_DIRECT helps us when we're not going to read the file again, because
 we don't waste cache on it. If we are, which is the case here, it
 should be really bad for performance, since we actually have to do a
 physical read.

 Incidentally, that should also apply to general WAL when archive_mdoe
 is on. Do we optimize for that?

 Hmm, no we don't. We do take that into account so that we refrain from
 issuing posix_fadvice(DONTNEED) if archive_mode is on, but we don't
 disable O_DIRECT. Maybe we should..

 Since the performance of WAL write is more important than that of WAL
 archiving in general, that optimization might offer little benefit.

Well, it's going to make the process that reads the WAL cause actual
physical I/O... That'll take a chunk out of your total available I/O,
which is likely to push you to the limit of your I/O capacity much
quicker.

-- 
 Magnus Hagander
 Me: http://www.hagander.net/
 Work: http://www.redpill-linpro.com/

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Streaming replication on win32, still broken

2010-02-18 Thread Heikki Linnakangas
Magnus Hagander wrote:
 2010/2/18 Fujii Masao masao.fu...@gmail.com:
 On Thu, Feb 18, 2010 at 7:04 PM, Heikki Linnakangas
 heikki.linnakan...@enterprisedb.com wrote:
 Magnus Hagander wrote:
 O_DIRECT helps us when we're not going to read the file again, because
 we don't waste cache on it. If we are, which is the case here, it
 should be really bad for performance, since we actually have to do a
 physical read.

 Incidentally, that should also apply to general WAL when archive_mdoe
 is on. Do we optimize for that?
 Hmm, no we don't. We do take that into account so that we refrain from
 issuing posix_fadvice(DONTNEED) if archive_mode is on, but we don't
 disable O_DIRECT. Maybe we should..
 Since the performance of WAL write is more important than that of WAL
 archiving in general, that optimization might offer little benefit.
 
 Well, it's going to make the process that reads the WAL cause actual
 physical I/O... That'll take a chunk out of your total available I/O,
 which is likely to push you to the limit of your I/O capacity much
 quicker.

Right, doesn't seem sensible, though it would be nice to see a benchmark
on that.

Here's a patch to disable O_DIRECT when archiving or streaming is
enabled. This is pretty hard to test, so any extra eyeballs would be nice..

-- 
  Heikki Linnakangas
  EnterpriseDB   http://www.enterprisedb.com
? GNUmakefile
? a.patch
? b
? backend
? config.log
? config.status
? config.status.lineno
? configure.lineno
? gin-splay-1.patch
? gin-splay-2.patch
? gin-splay-3.patch
? include
? libpq
? log_unlogged_op_0115.patch
? md-1.c
? md-1.patch
? replication
? temp-file-resowner-2.patch
? contrib/pgbench/.deps
? contrib/pgbench/fsynctest
? contrib/pgbench/fsynctest.c
? contrib/pgbench/fsynctestfile
? contrib/pgbench/pgbench
? contrib/spi/.deps
? doc/src/sgml/HTML.index
? doc/src/sgml/bookindex.sgml
? doc/src/sgml/features-supported.sgml
? doc/src/sgml/features-unsupported.sgml
? doc/src/sgml/version.sgml
? src/Makefile.global
? src/backend/aaa.patch
? src/backend/postgres
? src/backend/access/common/.deps
? src/backend/access/gin/.deps
? src/backend/access/gist/.deps
? src/backend/access/hash/.deps
? src/backend/access/heap/.deps
? src/backend/access/index/.deps
? src/backend/access/nbtree/.deps
? src/backend/access/transam/.deps
? src/backend/bootstrap/.deps
? src/backend/catalog/.deps
? src/backend/commands/.deps
? src/backend/executor/.deps
? src/backend/foreign/.deps
? src/backend/foreign/dummy/.deps
? src/backend/foreign/postgresql/.deps
? src/backend/lib/.deps
? src/backend/libpq/.deps
? src/backend/main/.deps
? src/backend/nodes/.deps
? src/backend/optimizer/geqo/.deps
? src/backend/optimizer/path/.deps
? src/backend/optimizer/plan/.deps
? src/backend/optimizer/prep/.deps
? src/backend/optimizer/util/.deps
? src/backend/parser/.deps
? src/backend/po/af.mo
? src/backend/po/cs.mo
? src/backend/po/hr.mo
? src/backend/po/hu.mo
? src/backend/po/it.mo
? src/backend/po/ko.mo
? src/backend/po/nb.mo
? src/backend/po/nl.mo
? src/backend/po/pl.mo
? src/backend/po/ro.mo
? src/backend/po/ru.mo
? src/backend/po/sk.mo
? src/backend/po/sl.mo
? src/backend/po/sv.mo
? src/backend/po/zh_CN.mo
? src/backend/po/zh_TW.mo
? src/backend/port/.deps
? src/backend/postmaster/.deps
? src/backend/regex/.deps
? src/backend/replication/.deps
? src/backend/replication/libpqwalreceiver/.deps
? src/backend/rewrite/.deps
? src/backend/snowball/.deps
? src/backend/snowball/snowball_create.sql
? src/backend/storage/buffer/.deps
? src/backend/storage/file/.deps
? src/backend/storage/freespace/.deps
? src/backend/storage/ipc/.deps
? src/backend/storage/large_object/.deps
? src/backend/storage/lmgr/.deps
? src/backend/storage/page/.deps
? src/backend/storage/smgr/.deps
? src/backend/tcop/.deps
? src/backend/tsearch/.deps
? src/backend/utils/.deps
? src/backend/utils/probes.h
? src/backend/utils/adt/.deps
? src/backend/utils/cache/.deps
? src/backend/utils/error/.deps
? src/backend/utils/fmgr/.deps
? src/backend/utils/hash/.deps
? src/backend/utils/init/.deps
? src/backend/utils/mb/.deps
? src/backend/utils/mb/Unicode/BIG5.TXT
? src/backend/utils/mb/Unicode/CP950.TXT
? src/backend/utils/mb/conversion_procs/conversion_create.sql
? src/backend/utils/mb/conversion_procs/ascii_and_mic/.deps
? src/backend/utils/mb/conversion_procs/cyrillic_and_mic/.deps
? src/backend/utils/mb/conversion_procs/euc2004_sjis2004/.deps
? src/backend/utils/mb/conversion_procs/euc_cn_and_mic/.deps
? src/backend/utils/mb/conversion_procs/euc_jis_2004_and_shift_jis_2004/.deps
? src/backend/utils/mb/conversion_procs/euc_jp_and_sjis/.deps
? src/backend/utils/mb/conversion_procs/euc_kr_and_mic/.deps
? src/backend/utils/mb/conversion_procs/euc_tw_and_big5/.deps
? src/backend/utils/mb/conversion_procs/latin2_and_win1250/.deps
? src/backend/utils/mb/conversion_procs/latin_and_mic/.deps
? src/backend/utils/mb/conversion_procs/utf8_and_ascii/.deps
? src/backend/utils/mb/conversion_procs/utf8_and_big5/.deps
? 

Re: [HACKERS] Streaming replication on win32, still broken

2010-02-17 Thread Fujii Masao
On Wed, Feb 17, 2010 at 4:07 PM, Fujii Masao masao.fu...@gmail.com wrote:
 On Wed, Feb 17, 2010 at 3:03 PM, Magnus Hagander mag...@hagander.net wrote:
 In that case, O_DIRECT would be counterproductive, no? It maps to
 FILE_FLAG_NOI_BUFFERING, which makes sure it doesn't go into the
 cache. So the read in the startup proc is actually guaranteed to
 reuqire a physical read - of something we just wrote, so it'll almost
 certainly end up waiting for a rotation, no?

 Seems like getting rid of O_DIRECT here is the right thing to do,
 regardless of this.

 Agreed. I'll remove O_DIRECT from walreceiver.

Here is the patch to do that.

Regards,

-- 
Fujii Masao
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center
*** a/src/backend/access/transam/xlog.c
--- b/src/backend/access/transam/xlog.c
***
*** 1627,1633  XLogWrite(XLogwrtRqst WriteRqst, bool flexible, bool xlog_switch)
  			/* create/use new log file */
  			use_existent = true;
  			openLogFile = XLogFileInit(openLogId, openLogSeg,
! 	   use_existent, true);
  			openLogOff = 0;
  		}
  
--- 1627,1633 
  			/* create/use new log file */
  			use_existent = true;
  			openLogFile = XLogFileInit(openLogId, openLogSeg,
! 	   use_existent, true, true);
  			openLogOff = 0;
  		}
  
***
*** 2184,2189  XLogNeedsFlush(XLogRecPtr record)
--- 2184,2195 
   * place.  This should be TRUE except during bootstrap log creation.  The
   * caller must *not* hold the lock at call.
   *
+  * allow_direct_io: if TRUE, allow a WAL write to bypass the kernel cache
+  * by using PG_O_DIRECT for opening a file. Otherwise, PG_O_DIRECT is
+  * forcibly removed from the sync flag of open(). This should be FALSE
+  * only when walreceiver process writes WAL data because it's read
+  * immediately by the startup process.
+  *
   * Returns FD of opened file.
   *
   * Note: errors here are ERROR not PANIC because we might or might not be
***
*** 2193,2199  XLogNeedsFlush(XLogRecPtr record)
   */
  int
  XLogFileInit(uint32 log, uint32 seg,
! 			 bool *use_existent, bool use_lock)
  {
  	char		path[MAXPGPATH];
  	char		tmppath[MAXPGPATH];
--- 2199,2205 
   */
  int
  XLogFileInit(uint32 log, uint32 seg,
! 			 bool *use_existent, bool use_lock, bool allow_direct_io)
  {
  	char		path[MAXPGPATH];
  	char		tmppath[MAXPGPATH];
***
*** 2203,2208  XLogFileInit(uint32 log, uint32 seg,
--- 2209,2217 
  	int			max_advance;
  	int			fd;
  	int			nbytes;
+ 	int			sync_bit;
+ 
+ 	sync_bit = get_sync_bit(sync_method)  (allow_direct_io ? 0 : ~PG_O_DIRECT);
  
  	XLogFilePath(path, ThisTimeLineID, log, seg);
  
***
*** 2211,2217  XLogFileInit(uint32 log, uint32 seg,
  	 */
  	if (*use_existent)
  	{
! 		fd = BasicOpenFile(path, O_RDWR | PG_BINARY | get_sync_bit(sync_method),
  		   S_IRUSR | S_IWUSR);
  		if (fd  0)
  		{
--- 2220,2226 
  	 */
  	if (*use_existent)
  	{
! 		fd = BasicOpenFile(path, O_RDWR | PG_BINARY | sync_bit,
  		   S_IRUSR | S_IWUSR);
  		if (fd  0)
  		{
***
*** 2237,2243  XLogFileInit(uint32 log, uint32 seg,
  
  	unlink(tmppath);
  
! 	/* do not use get_sync_bit() here --- want to fsync only at end of fill */
  	fd = BasicOpenFile(tmppath, O_RDWR | O_CREAT | O_EXCL | PG_BINARY,
  	   S_IRUSR | S_IWUSR);
  	if (fd  0)
--- 2246,2252 
  
  	unlink(tmppath);
  
! 	/* do not use sync_bit here --- want to fsync only at end of fill */
  	fd = BasicOpenFile(tmppath, O_RDWR | O_CREAT | O_EXCL | PG_BINARY,
  	   S_IRUSR | S_IWUSR);
  	if (fd  0)
***
*** 2317,2323  XLogFileInit(uint32 log, uint32 seg,
  	*use_existent = false;
  
  	/* Now open original target segment (might not be file I just made) */
! 	fd = BasicOpenFile(path, O_RDWR | PG_BINARY | get_sync_bit(sync_method),
  	   S_IRUSR | S_IWUSR);
  	if (fd  0)
  		ereport(ERROR,
--- 2326,2332 
  	*use_existent = false;
  
  	/* Now open original target segment (might not be file I just made) */
! 	fd = BasicOpenFile(path, O_RDWR | PG_BINARY | sync_bit,
  	   S_IRUSR | S_IWUSR);
  	if (fd  0)
  		ereport(ERROR,
***
*** 3121,3127  PreallocXlogFiles(XLogRecPtr endptr)
  	{
  		NextLogSeg(_logId, _logSeg);
  		use_existent = true;
! 		lf = XLogFileInit(_logId, _logSeg, use_existent, true);
  		close(lf);
  		if (!use_existent)
  			CheckpointStats.ckpt_segs_added++;
--- 3130,3136 
  	{
  		NextLogSeg(_logId, _logSeg);
  		use_existent = true;
! 		lf = XLogFileInit(_logId, _logSeg, use_existent, true, true);
  		close(lf);
  		if (!use_existent)
  			CheckpointStats.ckpt_segs_added++;
***
*** 4794,4800  BootStrapXLOG(void)
  
  	/* Create first XLOG segment file */
  	use_existent = false;
! 	openLogFile = XLogFileInit(0, 0, use_existent, false);
  
  	/* Write the first page with the initial record */
  	errno = 0;
--- 4803,4809 
  
  	/* Create first XLOG segment file */
  	use_existent = false;

Re: [HACKERS] Streaming replication on win32, still broken

2010-02-17 Thread Fujii Masao
On Wed, Feb 17, 2010 at 6:00 PM, Fujii Masao masao.fu...@gmail.com wrote:
 On Wed, Feb 17, 2010 at 4:07 PM, Fujii Masao masao.fu...@gmail.com wrote:
 On Wed, Feb 17, 2010 at 3:03 PM, Magnus Hagander mag...@hagander.net wrote:
 In that case, O_DIRECT would be counterproductive, no? It maps to
 FILE_FLAG_NOI_BUFFERING, which makes sure it doesn't go into the
 cache. So the read in the startup proc is actually guaranteed to
 reuqire a physical read - of something we just wrote, so it'll almost
 certainly end up waiting for a rotation, no?

 Seems like getting rid of O_DIRECT here is the right thing to do,
 regardless of this.

 Agreed. I'll remove O_DIRECT from walreceiver.

 Here is the patch to do that.

Ooops! I found the bug in the patch. Here is the updated version.

Regards,

-- 
Fujii Masao
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center
*** a/src/backend/access/transam/xlog.c
--- b/src/backend/access/transam/xlog.c
***
*** 1627,1633  XLogWrite(XLogwrtRqst WriteRqst, bool flexible, bool xlog_switch)
  			/* create/use new log file */
  			use_existent = true;
  			openLogFile = XLogFileInit(openLogId, openLogSeg,
! 	   use_existent, true);
  			openLogOff = 0;
  		}
  
--- 1627,1633 
  			/* create/use new log file */
  			use_existent = true;
  			openLogFile = XLogFileInit(openLogId, openLogSeg,
! 	   use_existent, true, true);
  			openLogOff = 0;
  		}
  
***
*** 2184,2189  XLogNeedsFlush(XLogRecPtr record)
--- 2184,2195 
   * place.  This should be TRUE except during bootstrap log creation.  The
   * caller must *not* hold the lock at call.
   *
+  * allow_direct_io: if TRUE, allow a WAL write to bypass the kernel cache
+  * by using PG_O_DIRECT for opening a file. Otherwise, PG_O_DIRECT is
+  * forcibly removed from the sync flag of open(). This should be FALSE
+  * only when walreceiver process writes WAL data because it's read
+  * immediately by the startup process.
+  *
   * Returns FD of opened file.
   *
   * Note: errors here are ERROR not PANIC because we might or might not be
***
*** 2193,2199  XLogNeedsFlush(XLogRecPtr record)
   */
  int
  XLogFileInit(uint32 log, uint32 seg,
! 			 bool *use_existent, bool use_lock)
  {
  	char		path[MAXPGPATH];
  	char		tmppath[MAXPGPATH];
--- 2199,2205 
   */
  int
  XLogFileInit(uint32 log, uint32 seg,
! 			 bool *use_existent, bool use_lock, bool allow_direct_io)
  {
  	char		path[MAXPGPATH];
  	char		tmppath[MAXPGPATH];
***
*** 2203,2208  XLogFileInit(uint32 log, uint32 seg,
--- 2209,2219 
  	int			max_advance;
  	int			fd;
  	int			nbytes;
+ 	int			sync_bit;
+ 
+ 	sync_bit = get_sync_bit(sync_method);
+ 	if (!allow_direct_io)
+ 		sync_bit = ~PG_O_DIRECT;
  
  	XLogFilePath(path, ThisTimeLineID, log, seg);
  
***
*** 2211,2217  XLogFileInit(uint32 log, uint32 seg,
  	 */
  	if (*use_existent)
  	{
! 		fd = BasicOpenFile(path, O_RDWR | PG_BINARY | get_sync_bit(sync_method),
  		   S_IRUSR | S_IWUSR);
  		if (fd  0)
  		{
--- ,2228 
  	 */
  	if (*use_existent)
  	{
! 		fd = BasicOpenFile(path, O_RDWR | PG_BINARY | sync_bit,
  		   S_IRUSR | S_IWUSR);
  		if (fd  0)
  		{
***
*** 2237,2243  XLogFileInit(uint32 log, uint32 seg,
  
  	unlink(tmppath);
  
! 	/* do not use get_sync_bit() here --- want to fsync only at end of fill */
  	fd = BasicOpenFile(tmppath, O_RDWR | O_CREAT | O_EXCL | PG_BINARY,
  	   S_IRUSR | S_IWUSR);
  	if (fd  0)
--- 2248,2254 
  
  	unlink(tmppath);
  
! 	/* do not use sync_bit here --- want to fsync only at end of fill */
  	fd = BasicOpenFile(tmppath, O_RDWR | O_CREAT | O_EXCL | PG_BINARY,
  	   S_IRUSR | S_IWUSR);
  	if (fd  0)
***
*** 2317,2323  XLogFileInit(uint32 log, uint32 seg,
  	*use_existent = false;
  
  	/* Now open original target segment (might not be file I just made) */
! 	fd = BasicOpenFile(path, O_RDWR | PG_BINARY | get_sync_bit(sync_method),
  	   S_IRUSR | S_IWUSR);
  	if (fd  0)
  		ereport(ERROR,
--- 2328,2334 
  	*use_existent = false;
  
  	/* Now open original target segment (might not be file I just made) */
! 	fd = BasicOpenFile(path, O_RDWR | PG_BINARY | sync_bit,
  	   S_IRUSR | S_IWUSR);
  	if (fd  0)
  		ereport(ERROR,
***
*** 3121,3127  PreallocXlogFiles(XLogRecPtr endptr)
  	{
  		NextLogSeg(_logId, _logSeg);
  		use_existent = true;
! 		lf = XLogFileInit(_logId, _logSeg, use_existent, true);
  		close(lf);
  		if (!use_existent)
  			CheckpointStats.ckpt_segs_added++;
--- 3132,3138 
  	{
  		NextLogSeg(_logId, _logSeg);
  		use_existent = true;
! 		lf = XLogFileInit(_logId, _logSeg, use_existent, true, true);
  		close(lf);
  		if (!use_existent)
  			CheckpointStats.ckpt_segs_added++;
***
*** 4794,4800  BootStrapXLOG(void)
  
  	/* Create first XLOG segment file */
  	use_existent = false;
! 	openLogFile = XLogFileInit(0, 0, use_existent, false);

Re: [HACKERS] Streaming replication on win32, still broken

2010-02-17 Thread Heikki Linnakangas
Fujii Masao wrote:
 On Wed, Feb 17, 2010 at 6:00 PM, Fujii Masao masao.fu...@gmail.com wrote:
 On Wed, Feb 17, 2010 at 4:07 PM, Fujii Masao masao.fu...@gmail.com wrote:
 On Wed, Feb 17, 2010 at 3:03 PM, Magnus Hagander mag...@hagander.net 
 wrote:
 In that case, O_DIRECT would be counterproductive, no? It maps to
 FILE_FLAG_NOI_BUFFERING, which makes sure it doesn't go into the
 cache. So the read in the startup proc is actually guaranteed to
 reuqire a physical read - of something we just wrote, so it'll almost
 certainly end up waiting for a rotation, no?

 Seems like getting rid of O_DIRECT here is the right thing to do,
 regardless of this.
 Agreed. I'll remove O_DIRECT from walreceiver.
 Here is the patch to do that.
 
 Ooops! I found the bug in the patch. Here is the updated version.

If I'm reading the patch correctly, when wal_sync_method is 'open_sync',
walreceiver nevertheless opens the WAL file without the O_DIRECT flag.
When it later flushes it in XLogWalRcvFlush() by issue_xlog_fsync(),
issue_xlog_fsync() will do nothing because it assumes the write() synced
it already. So the data written isn't being forced to disk at all.

How about just forcing sync_method to 'fsync' in walreceiver?

-- 
  Heikki Linnakangas
  EnterpriseDB   http://www.enterprisedb.com

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Streaming replication on win32, still broken

2010-02-17 Thread Fujii Masao
On Thu, Feb 18, 2010 at 5:28 AM, Heikki Linnakangas
heikki.linnakan...@enterprisedb.com wrote:
 If I'm reading the patch correctly, when wal_sync_method is 'open_sync',
 walreceiver nevertheless opens the WAL file without the O_DIRECT flag.
 When it later flushes it in XLogWalRcvFlush() by issue_xlog_fsync(),
 issue_xlog_fsync() will do nothing because it assumes the write() synced
 it already. So the data written isn't being forced to disk at all.

When 'open_sync' is chosen, the WAL file is opened with O_SYNC or O_FSYNC
flag. So I think that write() flushes the data to disk even if O_DIRECT
flag is not given. Am I missing something?

 How about just forcing sync_method to 'fsync' in walreceiver?

In win32, O_DSYNC seems to be preferred to 'fsync' so far. So I'm not sure
if reshuffling of priority is harmless.
http://archives.postgresql.org/pgsql-hackers-win32/2005-03/msg00148.php

Regards,

-- 
Fujii Masao
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Streaming replication on win32, still broken

2010-02-17 Thread Heikki Linnakangas
Fujii Masao wrote:
 On Thu, Feb 18, 2010 at 5:28 AM, Heikki Linnakangas
 heikki.linnakan...@enterprisedb.com wrote:
 If I'm reading the patch correctly, when wal_sync_method is 'open_sync',
 walreceiver nevertheless opens the WAL file without the O_DIRECT flag.
 When it later flushes it in XLogWalRcvFlush() by issue_xlog_fsync(),
 issue_xlog_fsync() will do nothing because it assumes the write() synced
 it already. So the data written isn't being forced to disk at all.
 
 When 'open_sync' is chosen, the WAL file is opened with O_SYNC or O_FSYNC
 flag. So I think that write() flushes the data to disk even if O_DIRECT
 flag is not given. Am I missing something?

Ah, ok, you're right.

-- 
  Heikki Linnakangas
  EnterpriseDB   http://www.enterprisedb.com

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Streaming replication on win32, still broken

2010-02-16 Thread Fujii Masao
On Tue, Feb 16, 2010 at 12:37 AM, Magnus Hagander mag...@hagander.net wrote:
 With the libpq fixes, I get further (more on that fix later, btw), but
 now I get stuck in this. When I do something on the master that
 generates WAL, such as insert a record, and then try to query this on
 the slave, the walreceiver process crashes with:

 PANIC:  XX000: could not write to log file 0, segment 9 at offset 0, length 
 160:
  Invalid argument
 LOCATION:  XLogWalRcvWrite, .\src\backend\replication\walreceiver.c:487

 I'll keep digging at the details, but if somebody has a good idea here.. ;)

Yeah, this problem was reproduced in my (very slow :-( ) MinGW environment, too.
Though I've not idenfied the cause yet, I guess that it derives from wrong use
of the type of local variables in XLogWalRcvWrite(). I'll continue investigation
of it.

Regards,

-- 
Fujii Masao
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Streaming replication on win32, still broken

2010-02-16 Thread Magnus Hagander
2010/2/16 Fujii Masao masao.fu...@gmail.com:
 On Tue, Feb 16, 2010 at 12:37 AM, Magnus Hagander mag...@hagander.net wrote:
 With the libpq fixes, I get further (more on that fix later, btw), but
 now I get stuck in this. When I do something on the master that
 generates WAL, such as insert a record, and then try to query this on
 the slave, the walreceiver process crashes with:

 PANIC:  XX000: could not write to log file 0, segment 9 at offset 0, length 
 160:
  Invalid argument
 LOCATION:  XLogWalRcvWrite, .\src\backend\replication\walreceiver.c:487

 I'll keep digging at the details, but if somebody has a good idea here.. ;)

 Yeah, this problem was reproduced in my (very slow :-( ) MinGW environment, 
 too.
 Though I've not idenfied the cause yet, I guess that it derives from wrong use
 of the type of local variables in XLogWalRcvWrite(). I'll continue 
 investigation
 of it.

Thanks!

I will be somewhat spottily available over the next two days due to
on-site work with clients.

Let me know if you would be helped by some details of how to get a
(somewhat faster) EC2 image up and running with MSVC to test on :-)

-- 
 Magnus Hagander
 Me: http://www.hagander.net/
 Work: http://www.redpill-linpro.com/

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Streaming replication on win32, still broken

2010-02-16 Thread Fujii Masao
On Tue, Feb 16, 2010 at 7:20 PM, Magnus Hagander mag...@hagander.net wrote:
 2010/2/16 Fujii Masao masao.fu...@gmail.com:
 On Tue, Feb 16, 2010 at 12:37 AM, Magnus Hagander mag...@hagander.net 
 wrote:
 With the libpq fixes, I get further (more on that fix later, btw), but
 now I get stuck in this. When I do something on the master that
 generates WAL, such as insert a record, and then try to query this on
 the slave, the walreceiver process crashes with:

 PANIC:  XX000: could not write to log file 0, segment 9 at offset 0, length 
 160:
  Invalid argument
 LOCATION:  XLogWalRcvWrite, .\src\backend\replication\walreceiver.c:487

 I'll keep digging at the details, but if somebody has a good idea here.. ;)

 Yeah, this problem was reproduced in my (very slow :-( ) MinGW environment, 
 too.
 Though I've not idenfied the cause yet, I guess that it derives from wrong 
 use
 of the type of local variables in XLogWalRcvWrite(). I'll continue 
 investigation
 of it.

 Thanks!

 I will be somewhat spottily available over the next two days due to
 on-site work with clients.

 Let me know if you would be helped by some details of how to get a
 (somewhat faster) EC2 image up and running with MSVC to test on :-)

Thanks! I can probably use the EC2 image by reading your great blog post.
http://blog.hagander.net/archives/151-Testing-PostgreSQL-patches-on-Windows-using-Amazon-EC2.html

But it might take some time to make my sysadmin open the port for
rdesktop for some reasons...

Regards,

-- 
Fujii Masao
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Streaming Replication on win32

2010-02-16 Thread Magnus Hagander
2010/2/16 Fujii Masao masao.fu...@gmail.com:
 On Tue, Feb 16, 2010 at 1:33 AM, Magnus Hagander mag...@hagander.net wrote:
 2010/2/15 Tom Lane t...@sss.pgh.pa.us:
 Magnus Hagander mag...@hagander.net writes:
 I changed your patch to this, because I find it a lot simpler. The
 change is in the checking in pgwin32_recv - there is no need to ever
 call waitforsinglesocket, we can just exit out early.

 Thanks a lot, Magnus!

 Do you see any issue with that?

 This definitely looks cleaner, but is there a reason not to use bool
 instead of int here?

 No.

 Can include/port/win32.h refer to bool type?

Nope, you're correct, it can't.

Committed without that.

-- 
 Magnus Hagander
 Me: http://www.hagander.net/
 Work: http://www.redpill-linpro.com/

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Streaming replication on win32, still broken

2010-02-16 Thread Magnus Hagander
2010/2/16 Fujii Masao masao.fu...@gmail.com:
 On Tue, Feb 16, 2010 at 7:20 PM, Magnus Hagander mag...@hagander.net wrote:
 2010/2/16 Fujii Masao masao.fu...@gmail.com:
 On Tue, Feb 16, 2010 at 12:37 AM, Magnus Hagander mag...@hagander.net 
 wrote:
 With the libpq fixes, I get further (more on that fix later, btw), but
 now I get stuck in this. When I do something on the master that
 generates WAL, such as insert a record, and then try to query this on
 the slave, the walreceiver process crashes with:

 PANIC:  XX000: could not write to log file 0, segment 9 at offset 0, 
 length 160:
  Invalid argument
 LOCATION:  XLogWalRcvWrite, .\src\backend\replication\walreceiver.c:487

 I'll keep digging at the details, but if somebody has a good idea here.. ;)

 Yeah, this problem was reproduced in my (very slow :-( ) MinGW environment, 
 too.
 Though I've not idenfied the cause yet, I guess that it derives from wrong 
 use
 of the type of local variables in XLogWalRcvWrite(). I'll continue 
 investigation
 of it.

 Thanks!

 I will be somewhat spottily available over the next two days due to
 on-site work with clients.

 Let me know if you would be helped by some details of how to get a
 (somewhat faster) EC2 image up and running with MSVC to test on :-)

 Thanks! I can probably use the EC2 image by reading your great blog post.
 http://blog.hagander.net/archives/151-Testing-PostgreSQL-patches-on-Windows-using-Amazon-EC2.html

Actually, that one deosn't work anymore, because I managed to break
the image :-)

If you send me your amazon id, I can get you premissions on my private
image. I plan to clean it up and make it public, just haven't gotten
around to it yet...

-- 
 Magnus Hagander
 Me: http://www.hagander.net/
 Work: http://www.redpill-linpro.com/

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Streaming replication on win32, still broken

2010-02-16 Thread Fujii Masao
On Wed, Feb 17, 2010 at 6:28 AM, Magnus Hagander mag...@hagander.net wrote:
 If you send me your amazon id, I can get you premissions on my private
 image. I plan to clean it up and make it public, just haven't gotten
 around to it yet...

Thanks for your concern! I'll send the ID when I complete the preparation.

And, fortunately?, when I set wal_sync_method to open_sync, the problem was
reproduced in the linux, too. The cause is that the data that is written by
walreceiver is not aligned, even if O_DIRECT is used. On win32, O_DIRECT is
used by default. So the problem always happened on win32.

I propose two solution ideas:

1. O_DIRECT is somewhat harmful in the standby since the data written by
   walreceiver is read by the startup process immediately. So, how about
   not making only walreceiver use O_DIRECT?

2. Straightforwardly observe the alignment rule. Since the received WAL
   data might start at the middle of WAL block, walreceiver needs to keep
   the last half-written WAL block for alignment. OTOH since the received
   data might end at the middle of WAL block, walreceiver needs zero-padding.
   As a result, walreceiver writes the set of the last WAL block, received
   data and zero-padding.

Which is better? Or do you have another better idea?

Regards,

-- 
Fujii Masao
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Streaming replication on win32, still broken

2010-02-16 Thread Magnus Hagander
On Wed, Feb 17, 2010 at 06:55, Fujii Masao masao.fu...@gmail.com wrote:
 On Wed, Feb 17, 2010 at 6:28 AM, Magnus Hagander mag...@hagander.net wrote:
 If you send me your amazon id, I can get you premissions on my private
 image. I plan to clean it up and make it public, just haven't gotten
 around to it yet...

 Thanks for your concern! I'll send the ID when I complete the preparation.

ok.


 And, fortunately?, when I set wal_sync_method to open_sync, the problem was
 reproduced in the linux, too. The cause is that the data that is written by

Ah, that's good. It always helps if it's a cross-platform issue -
particularly in that it's not one of the funky win32 specific things
we did :)


 walreceiver is not aligned, even if O_DIRECT is used. On win32, O_DIRECT is
 used by default. So the problem always happened on win32.

Ahh. I see.


 I propose two solution ideas:

 1. O_DIRECT is somewhat harmful in the standby since the data written by
   walreceiver is read by the startup process immediately. So, how about
   not making only walreceiver use O_DIRECT?

In that case, O_DIRECT would be counterproductive, no? It maps to
FILE_FLAG_NOI_BUFFERING, which makes sure it doesn't go into the
cache. So the read in the startup proc is actually guaranteed to
reuqire a physical read - of something we just wrote, so it'll almost
certainly end up waiting for a rotation, no?

Seems like getting rid of O_DIRECT here is the right thing to do,
regardless of this.


 2. Straightforwardly observe the alignment rule. Since the received WAL
   data might start at the middle of WAL block, walreceiver needs to keep
   the last half-written WAL block for alignment. OTOH since the received
   data might end at the middle of WAL block, walreceiver needs zero-padding.
   As a result, walreceiver writes the set of the last WAL block, received
   data and zero-padding.

May there be other reasons to d this as well?


-- 
 Magnus Hagander
 Me: http://www.hagander.net/
 Work: http://www.redpill-linpro.com/

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Streaming replication on win32, still broken

2010-02-16 Thread Tom Lane
Magnus Hagander mag...@hagander.net writes:
 On Wed, Feb 17, 2010 at 06:55, Fujii Masao masao.fu...@gmail.com wrote:
 2. Straightforwardly observe the alignment rule. Since the received WAL
   data might start at the middle of WAL block, walreceiver needs to keep
   the last half-written WAL block for alignment. OTOH since the received
   data might end at the middle of WAL block, walreceiver needs zero-padding.
   As a result, walreceiver writes the set of the last WAL block, received
   data and zero-padding.

 May there be other reasons to d this as well?

Writing misaligned data is certain to be expensive even when it works...

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Streaming replication on win32, still broken

2010-02-16 Thread Fujii Masao
On Wed, Feb 17, 2010 at 3:03 PM, Magnus Hagander mag...@hagander.net wrote:
 In that case, O_DIRECT would be counterproductive, no? It maps to
 FILE_FLAG_NOI_BUFFERING, which makes sure it doesn't go into the
 cache. So the read in the startup proc is actually guaranteed to
 reuqire a physical read - of something we just wrote, so it'll almost
 certainly end up waiting for a rotation, no?

 Seems like getting rid of O_DIRECT here is the right thing to do,
 regardless of this.

Agreed. I'll remove O_DIRECT from walreceiver.

Regards,

-- 
Fujii Masao
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Streaming replication on win32, still broken

2010-02-16 Thread Fujii Masao
On Wed, Feb 17, 2010 at 3:27 PM, Tom Lane t...@sss.pgh.pa.us wrote:
 Magnus Hagander mag...@hagander.net writes:
 On Wed, Feb 17, 2010 at 06:55, Fujii Masao masao.fu...@gmail.com wrote:
 2. Straightforwardly observe the alignment rule. Since the received WAL
   data might start at the middle of WAL block, walreceiver needs to keep
   the last half-written WAL block for alignment. OTOH since the received
   data might end at the middle of WAL block, walreceiver needs zero-padding.
   As a result, walreceiver writes the set of the last WAL block, received
   data and zero-padding.

 May there be other reasons to d this as well?

 Writing misaligned data is certain to be expensive even when it works...

Yeah, right. After I remove O_DIRECT, I'll change walreceiver so as to
do an alignment correctly, and then I'll test the performance.

Regards,

-- 
Fujii Masao
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


[HACKERS] Streaming replication on win32, still broken

2010-02-15 Thread Magnus Hagander
With the libpq fixes, I get further (more on that fix later, btw), but
now I get stuck in this. When I do something on the master that
generates WAL, such as insert a record, and then try to query this on
the slave, the walreceiver process crashes with:

PANIC:  XX000: could not write to log file 0, segment 9 at offset 0, length 160:
 Invalid argument
LOCATION:  XLogWalRcvWrite, .\src\backend\replication\walreceiver.c:487

I'll keep digging at the details, but if somebody has a good idea here.. ;)

-- 
 Magnus Hagander
 Me: http://www.hagander.net/
 Work: http://www.redpill-linpro.com/

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Streaming Replication on win32

2010-02-15 Thread Magnus Hagander
2010/2/15 Fujii Masao masao.fu...@gmail.com:
 On Sun, Feb 14, 2010 at 11:52 PM, Magnus Hagander mag...@hagander.net wrote:
 Remember that the win32 code *always* puts the socket in non-blocking
 mode. So we can't just teach the layer about it. We need some way to
 pass the information down that this is actually something we want to
 be non-blocking, and it can't be the normal flag on the socket. I
 don't really have an idea of where else we'd put it though :( It's in
 the port structure, but not beyond it.

 Right.

 BTW, pq_getbyte_if_available() always changes the socket to non-blocking
 and blocking mode before and after calling secure_read(), respectively.
 This code seems wrong on win32. Because, as you said, the socket is always
 in non-blocking mode on win32. We should change pq_getbyte_if_available()
 so as not to change the socket mode only in win32?

Yes.

 What we could do, is have an ugly global flag specifically for the
 use-case we have here. Assuming we do create a plataform specific
 pq_getbyte_if_available(), the code-path that would have trouble now
 would be when we call pq_getbyte_if_available(), and it in turns asks
 the socket if there is data, there is, but we end up calling back into
 the SSL code to fetch the data, and it gets an incomplete packet.
 Correct? So the path is basically:

 pq_getbyte_if_available() - secure_read() - SSL_read() -
 my_sock_read() - pgwin32_recv()

 Given that we know we are working on a single socket here, we could
 use a global flag to tell pgwin32_recv() to become nonblocking. We
 could set this flag directly in the win32-specific version of
 pq_getbyte_if_available(), and make sure it's cleared as soon as we
 exit.

 It will obviously fail if we do anything on a *different* socket
 during this time, so it has to be set for a very short time. But that
 seems doable. And we don't call any socket stuff from signal handlers
 so that shouldn't cause issues.

 Agreed. Here is the patch which does that (including the above-mentioned
 change). I haven't tested it yet because I failed in creating the build
 environment for the MSVC :( I'll try to create that again, and test it.
 Though I'm not sure how long it takes.

I changed your patch to this, because I find it a lot simpler. The
change is in the checking in pgwin32_recv - there is no need to ever
call waitforsinglesocket, we can just exit out early.

Do you see any issue with that?

-- 
 Magnus Hagander
 Me: http://www.hagander.net/
 Work: http://www.redpill-linpro.com/


pq_getbyte_if_available_on_win32_magnus.patch
Description: Binary data

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Streaming Replication on win32

2010-02-15 Thread Tom Lane
Magnus Hagander mag...@hagander.net writes:
 I changed your patch to this, because I find it a lot simpler. The
 change is in the checking in pgwin32_recv - there is no need to ever
 call waitforsinglesocket, we can just exit out early.

 Do you see any issue with that?

This definitely looks cleaner, but is there a reason not to use bool
instead of int here?

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Streaming Replication on win32

2010-02-15 Thread Magnus Hagander
2010/2/15 Tom Lane t...@sss.pgh.pa.us:
 Magnus Hagander mag...@hagander.net writes:
 I changed your patch to this, because I find it a lot simpler. The
 change is in the checking in pgwin32_recv - there is no need to ever
 call waitforsinglesocket, we can just exit out early.

 Do you see any issue with that?

 This definitely looks cleaner, but is there a reason not to use bool
 instead of int here?

No.


-- 
 Magnus Hagander
 Me: http://www.hagander.net/
 Work: http://www.redpill-linpro.com/

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Streaming Replication on win32

2010-02-15 Thread Fujii Masao
On Tue, Feb 16, 2010 at 1:33 AM, Magnus Hagander mag...@hagander.net wrote:
 2010/2/15 Tom Lane t...@sss.pgh.pa.us:
 Magnus Hagander mag...@hagander.net writes:
 I changed your patch to this, because I find it a lot simpler. The
 change is in the checking in pgwin32_recv - there is no need to ever
 call waitforsinglesocket, we can just exit out early.

Thanks a lot, Magnus!

 Do you see any issue with that?

 This definitely looks cleaner, but is there a reason not to use bool
 instead of int here?

 No.

Can include/port/win32.h refer to bool type?

Regards,

-- 
Fujii Masao
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Streaming Replication on win32

2010-02-14 Thread Magnus Hagander
2010/2/8 Fujii Masao masao.fu...@gmail.com:
 On Mon, Jan 18, 2010 at 11:46 PM, Magnus Hagander mag...@hagander.net wrote:
 From what I can tell, this indicates that pq_getbyte_if_available() is
 not working - because it's supposed to never block, right?

 Right, it's not supposed to block.

 This could be because the win32 socket emulation layer simply wasn't
 designed to deal with non-blocking sockets. Specifically, it actually
 *always* sets the socket to non-blocking mode, and then uses that to
 properly emulate how sockets work under unix.

 I presume the win32 emulation layer can be taught about non-blocking
 sockets? Or maybe pq_getbyte_if_available() can be implemented using
 some other simpler method on Windows.

 It could be taught that, but it would probably be a lot easier to put
 platform specific code in pq_getbyte_if_available().

 Umm.. in this case, for SSL on win32 case, we also need to create
 new function like my_sock_read_if_available() that receives data
 from non-blocking socket, and reassign it to the SSL BIO function.
 Right? If so, it seems easier for me to tell the win32 layer about
 non-blocking.

Sorry about the delay in responding to this.

Remember that the win32 code *always* puts the socket in non-blocking
mode. So we can't just teach the layer about it. We need some way to
pass the information down that this is actually something we want to
be non-blocking, and it can't be the normal flag on the socket. I
don't really have an idea of where else we'd put it though :( It's in
the port structure, but not beyond it.

What we could do, is have an ugly global flag specifically for the
use-case we have here. Assuming we do create a plataform specific
pq_getbyte_if_available(), the code-path that would have trouble now
would be when we call pq_getbyte_if_available(), and it in turns asks
the socket if there is data, there is, but we end up calling back into
the SSL code to fetch the data, and it gets an incomplete packet.
Correct? So the path is basically:

pq_getbyte_if_available() - secure_read() - SSL_read() -
my_sock_read() - pgwin32_recv()

Given that we know we are working on a single socket here, we could
use a global flag to tell pgwin32_recv() to become nonblocking. We
could set this flag directly in the win32-specific version of
pq_getbyte_if_available(), and make sure it's cleared as soon as we
exit.

It will obviously fail if we do anything on a *different* socket
during this time, so it has to be set for a very short time. But that
seems doable. And we don't call any socket stuff from signal handlers
so that shouldn't cause issues.


-- 
 Magnus Hagander
 Me: http://www.hagander.net/
 Work: http://www.redpill-linpro.com/

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Streaming Replication on win32

2010-02-14 Thread Fujii Masao
On Sun, Feb 14, 2010 at 11:52 PM, Magnus Hagander mag...@hagander.net wrote:
 Sorry about the delay in responding to this.

Thanks for the response.

 Remember that the win32 code *always* puts the socket in non-blocking
 mode. So we can't just teach the layer about it. We need some way to
 pass the information down that this is actually something we want to
 be non-blocking, and it can't be the normal flag on the socket. I
 don't really have an idea of where else we'd put it though :( It's in
 the port structure, but not beyond it.

Right.

BTW, pq_getbyte_if_available() always changes the socket to non-blocking
and blocking mode before and after calling secure_read(), respectively.
This code seems wrong on win32. Because, as you said, the socket is always
in non-blocking mode on win32. We should change pq_getbyte_if_available()
so as not to change the socket mode only in win32?

 What we could do, is have an ugly global flag specifically for the
 use-case we have here. Assuming we do create a plataform specific
 pq_getbyte_if_available(), the code-path that would have trouble now
 would be when we call pq_getbyte_if_available(), and it in turns asks
 the socket if there is data, there is, but we end up calling back into
 the SSL code to fetch the data, and it gets an incomplete packet.
 Correct? So the path is basically:

 pq_getbyte_if_available() - secure_read() - SSL_read() -
 my_sock_read() - pgwin32_recv()

 Given that we know we are working on a single socket here, we could
 use a global flag to tell pgwin32_recv() to become nonblocking. We
 could set this flag directly in the win32-specific version of
 pq_getbyte_if_available(), and make sure it's cleared as soon as we
 exit.

 It will obviously fail if we do anything on a *different* socket
 during this time, so it has to be set for a very short time. But that
 seems doable. And we don't call any socket stuff from signal handlers
 so that shouldn't cause issues.

Agreed. Here is the patch which does that (including the above-mentioned
change). I haven't tested it yet because I failed in creating the build
environment for the MSVC :( I'll try to create that again, and test it.
Though I'm not sure how long it takes.

Regards,

-- 
Fujii Masao
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center
*** a/src/backend/libpq/pqcomm.c
--- b/src/backend/libpq/pqcomm.c
***
*** 837,845  pq_getbyte_if_available(unsigned char *c)
--- 837,849 
  	}
  
  	/* Temporarily put the socket into non-blocking mode */
+ #ifdef WIN32
+ 	pgwin32_noblock = 1;
+ #else
  	if (!pg_set_noblock(MyProcPort-sock))
  		ereport(ERROR,
  (errmsg(couldn't put socket to non-blocking mode: %m)));
+ #endif
  	MyProcPort-noblock = true;
  	PG_TRY();
  	{
***
*** 851,866  pq_getbyte_if_available(unsigned char *c)
--- 855,878 
  		 * The rest of the backend code assumes the socket is in blocking
  		 * mode, so treat failure as FATAL.
  		 */
+ #ifdef WIN32
+ 		pgwin32_noblock = 0;
+ #else
  		if (!pg_set_block(MyProcPort-sock))
  			ereport(FATAL,
  	(errmsg(couldn't put socket to blocking mode: %m)));
+ #endif
  		MyProcPort-noblock = false;
  		PG_RE_THROW();
  	}
  	PG_END_TRY();
+ #ifdef WIN32
+ 	pgwin32_noblock = 0;
+ #else
  	if (!pg_set_block(MyProcPort-sock))
  		ereport(FATAL,
  (errmsg(couldn't put socket to blocking mode: %m)));
+ #endif
  	MyProcPort-noblock = false;
  
  	return r;
*** a/src/backend/port/win32/socket.c
--- b/src/backend/port/win32/socket.c
***
*** 13,18 
--- 13,26 
  
  #include postgres.h
  
+ /*
+  * This indicates whether pgwin32_recv() is blocked until the receive
+  * operation has been finished. A value of zero blocks it even if
+  * the socket is set to non-blocking mode. A non-zero value makes it
+  * return immediately whether data is available or not.
+  */
+ int	pgwin32_noblock = 0;
+ 
  #undef socket
  #undef accept
  #undef connect
***
*** 315,321  pgwin32_recv(SOCKET s, char *buf, int len, int f)
  	for (n = 0; n  5; n++)
  	{
  		if (pgwin32_waitforsinglesocket(s, FD_READ | FD_CLOSE | FD_ACCEPT,
! 		INFINITE) == 0)
  			return -1;			/* errno already set */
  
  		r = WSARecv(s, wbuf, 1, b, flags, NULL, NULL);
--- 323,330 
  	for (n = 0; n  5; n++)
  	{
  		if (pgwin32_waitforsinglesocket(s, FD_READ | FD_CLOSE | FD_ACCEPT,
! 		(pgwin32_noblock == 0) ? INFINITE : 0)
! 			== 0)
  			return -1;			/* errno already set */
  
  		r = WSARecv(s, wbuf, 1, b, flags, NULL, NULL);
*** a/src/include/port/win32.h
--- b/src/include/port/win32.h
***
*** 283,288  int			pgwin32_send(SOCKET s, char *buf, int len, int flags);
--- 283,290 
  const char *pgwin32_socket_strerror(int err);
  int			pgwin32_waitforsinglesocket(SOCKET s, int what, int timeout);
  
+ extern int	pgwin32_noblock;
+ 
  /* in backend/port/win32/security.c */
  extern int	pgwin32_is_admin(void);
  extern int	pgwin32_is_service(void);

-- 
Sent 

Re: [HACKERS] Streaming Replication on win32

2010-02-08 Thread Fujii Masao
On Mon, Jan 18, 2010 at 11:46 PM, Magnus Hagander mag...@hagander.net wrote:
 From what I can tell, this indicates that pq_getbyte_if_available() is
 not working - because it's supposed to never block, right?

 Right, it's not supposed to block.

 This could be because the win32 socket emulation layer simply wasn't
 designed to deal with non-blocking sockets. Specifically, it actually
 *always* sets the socket to non-blocking mode, and then uses that to
 properly emulate how sockets work under unix.

 I presume the win32 emulation layer can be taught about non-blocking
 sockets? Or maybe pq_getbyte_if_available() can be implemented using
 some other simpler method on Windows.

 It could be taught that, but it would probably be a lot easier to put
 platform specific code in pq_getbyte_if_available().

Umm.. in this case, for SSL on win32 case, we also need to create
new function like my_sock_read_if_available() that receives data
from non-blocking socket, and reassign it to the SSL BIO function.
Right? If so, it seems easier for me to tell the win32 layer about
non-blocking.

Regards,

-- 
Fujii Masao
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Streaming Replication on win32

2010-01-24 Thread Joe Conway
On 01/21/2010 11:19 PM, Heikki Linnakangas wrote:
 Joe Conway wrote:
 OK, so now I see why we want this fixed for dblink and walreceiver, but
 doesn't this approach leave every other WIN32 libpq client out in the
 cold? Is there nothing that can be done for the general case, or is it a
 SMOP?
 
 The problem only applies to libpq calls from the backend. Client apps
 are not affected, only backend modules. If there's any other modules out
 there that use libpq, then yes, those have a problem too.

Sorry for being thick -- I'm still missing something. I don't understand
why any user program using libpq/PQexec running on Windows does not have
the same issue. Or to put it another way, why does this only apply to
libpq calls from backend modules?

 A generic fix would be nice. Putting the wrapper functions in a new
 header file that's included in all backend modules that want to use
 libpq would work. Maybe the new header file could even be #included from
 libpq-fe.h, when compiling backend code (#ifndef FRONTEND), so you
 wouldn't need to modify dblink, walreceiver, or any 3rd party modules
 that use libpq at all.

I'll go ahead and do this if there is consensus for it.

Joe



signature.asc
Description: OpenPGP digital signature


Re: [HACKERS] Streaming Replication on win32

2010-01-24 Thread Magnus Hagander
2010/1/24 Joe Conway m...@joeconway.com:
 On 01/21/2010 11:19 PM, Heikki Linnakangas wrote:
 Joe Conway wrote:
 OK, so now I see why we want this fixed for dblink and walreceiver, but
 doesn't this approach leave every other WIN32 libpq client out in the
 cold? Is there nothing that can be done for the general case, or is it a
 SMOP?

 The problem only applies to libpq calls from the backend. Client apps
 are not affected, only backend modules. If there's any other modules out
 there that use libpq, then yes, those have a problem too.

 Sorry for being thick -- I'm still missing something. I don't understand
 why any user program using libpq/PQexec running on Windows does not have
 the same issue. Or to put it another way, why does this only apply to
 libpq calls from backend modules?

Because Windows programs in general don't rely on working Unix signal
semantics - since Win32 doesn't *have* them. We faked them for
PostgreSQL so we didn't have to rewrite large parts of how the backend
deals with those things. I don't know any other software that does -
and especially not client side software.

So yeah, you could say they are affected insofar that their calls will
be blocked as well, but if they are Windows apps they are probably
designed to deal with it. It's the common way for such calls to behave
on the platform.


-- 
 Magnus Hagander
 Me: http://www.hagander.net/
 Work: http://www.redpill-linpro.com/

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Streaming Replication on win32

2010-01-24 Thread Tom Lane
Magnus Hagander mag...@hagander.net writes:
 2010/1/24 Joe Conway m...@joeconway.com:
 Sorry for being thick -- I'm still missing something. I don't understand
 why any user program using libpq/PQexec running on Windows does not have
 the same issue. Or to put it another way, why does this only apply to
 libpq calls from backend modules?

 Because Windows programs in general don't rely on working Unix signal
 semantics - since Win32 doesn't *have* them.

The real question is why is it so critical for our emulated signals to
be able to interrupt these operations.

If you're trying to tell me that Hot Standby is too fragile to work
unless it can interrupt them, then HS has got a serious problem, and
it is not libpq's fault.  There is an enormous amount of code in the
backend that can run for long periods of time without noticing signals,
and not all of that is fixable.  Consider for example a plperl,
plpython, or pltcl function that goes off and does a long computation.

So I'm thinking that proposing to kluge up libpq in this area isn't
solving the real problem anyway.

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Streaming Replication on win32

2010-01-24 Thread Heikki Linnakangas
Tom Lane wrote:
 Magnus Hagander mag...@hagander.net writes:
 2010/1/24 Joe Conway m...@joeconway.com:
 Sorry for being thick -- I'm still missing something. I don't understand
 why any user program using libpq/PQexec running on Windows does not have
 the same issue. Or to put it another way, why does this only apply to
 libpq calls from backend modules?
 
 Because Windows programs in general don't rely on working Unix signal
 semantics - since Win32 doesn't *have* them.
 
 The real question is why is it so critical for our emulated signals to
 be able to interrupt these operations.

The process won't react to a shutdown request otherwise, while it's
waiting for the response to PQexec(). It's not such a big deal for
walreceiver, actually, because it already uses select() (with emulation)
in the main loop, but it's theoretically possible for the connection to
be silently lost during the initial handshake, after sending the Query
message, before receiving a response.

dblink/contrib has the same issue, it might wait for a response forever.

Hmm, maybe we should just TCP_KEEPALIVE (if available) on the
connection. We should really be doing that in walreceiver anyway,
walreceiver won't otherwise notice if the connection is silently lost,
and won't know to reconnect.

 If you're trying to tell me that Hot Standby is too fragile to work
 unless it can interrupt them, then HS has got a serious problem, and
 it is not libpq's fault.  There is an enormous amount of code in the
 backend that can run for long periods of time without noticing signals,
 and not all of that is fixable.  Consider for example a plperl,
 plpython, or pltcl function that goes off and does a long computation.

No, it's not related to Hot Standby.

-- 
  Heikki Linnakangas
  EnterpriseDB   http://www.enterprisedb.com

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Streaming Replication on win32

2010-01-22 Thread Dimitri Fontaine
Heikki Linnakangas heikki.linnakan...@enterprisedb.com writes:

 Joe Conway wrote:
 OK, so now I see why we want this fixed for dblink and walreceiver, but
 doesn't this approach leave every other WIN32 libpq client out in the
 cold? Is there nothing that can be done for the general case, or is it a
 SMOP?

 The problem only applies to libpq calls from the backend. Client apps
 are not affected, only backend modules. If there's any other modules out
 there that use libpq, then yes, those have a problem too.

plproxy comes to mind.

-- 
dim

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Streaming Replication on win32

2010-01-22 Thread Marko Kreen
On 1/22/10, Dimitri Fontaine dfonta...@hi-media.com wrote:
 Heikki Linnakangas heikki.linnakan...@enterprisedb.com writes:

   Joe Conway wrote:
   OK, so now I see why we want this fixed for dblink and walreceiver, but
   doesn't this approach leave every other WIN32 libpq client out in the
   cold? Is there nothing that can be done for the general case, or is it a
   SMOP?
  
   The problem only applies to libpq calls from the backend. Client apps
   are not affected, only backend modules. If there's any other modules out
   there that use libpq, then yes, those have a problem too.


 plproxy comes to mind.

Thats interesting.  PL/Proxy deos not use PQexec, it uses async
execution and waits on sockets with plain select() called
from code compiled with backend headers.

So it seems to be already using pgwin32_select().  Or not?

-- 
marko

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Streaming Replication on win32

2010-01-22 Thread Heikki Linnakangas
Marko Kreen wrote:
 On 1/22/10, Dimitri Fontaine dfonta...@hi-media.com wrote:
 Heikki Linnakangas heikki.linnakan...@enterprisedb.com writes:
   The problem only applies to libpq calls from the backend. Client apps
   are not affected, only backend modules. If there's any other modules out
   there that use libpq, then yes, those have a problem too.


 plproxy comes to mind.
 
 Thats interesting.  PL/Proxy deos not use PQexec, it uses async
 execution and waits on sockets with plain select() called
 from code compiled with backend headers.
 
 So it seems to be already using pgwin32_select().  Or not?

Yes. I just grepped plproxy source code and there's indeed no blocking
libpq calls there.

-- 
  Heikki Linnakangas
  EnterpriseDB   http://www.enterprisedb.com

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Streaming Replication on win32

2010-01-21 Thread Heikki Linnakangas
Heikki Linnakangas wrote:
 Magnus Hagander wrote:
 2010/1/17 Heikki Linnakangas heikki.linnakan...@enterprisedb.com:
 We could replace the blocking PQexec() calls with PQsendQuery(), and use
  the emulated version of select() to wait.
 Hmm. That would at least theoretically work, but aren't there still
 places we may end up blocking further down? Or are those ok?
 
 There's also PQconnect that needs similar treatment (using
 PQconnectStart/Poll()), but that's it.

So here's a patch implementing that for contrib/dblink. Walreceiver
needs the same treatment.

The implementation should be shared between the two, but I'm not sure
how. We can't just put the wrapper functions to a module in
src/backend/port/, because the wrapper functions depend on libpq. Maybe
put them in a new header file as static functions, and include that in
contrib/dblink/dblink.c and src/backend/replication/libpqwalreceiver.c.

-- 
  Heikki Linnakangas
  EnterpriseDB   http://www.enterprisedb.com
diff --git a/contrib/dblink/dblink.c b/contrib/dblink/dblink.c
index 2c1d7a2..fa11709 100644
--- a/contrib/dblink/dblink.c
+++ b/contrib/dblink/dblink.c
@@ -34,6 +34,14 @@
 
 #include limits.h
 
+#ifdef WIN23
+/* These are needed by the interruptible libpq function replacements */
+#include time.h
+#include unistd.h
+#include sys/time.h
+#include sys/types.h
+#endif
+
 #include libpq-fe.h
 #include fmgr.h
 #include funcapi.h
@@ -101,6 +109,193 @@ static void dblink_res_error(const char *conname, PGresult *res, const char *dbl
 static char *get_connect_string(const char *servername);
 static char *escape_param_str(const char *from);
 
+#ifdef WIN23
+/*
+ * Replacement functions for blocking libpq functions, for Windows.
+ *
+ * On Windows, the vanilla select() function doesn't react to our emulated
+ * signals. PQexec() and PQconnectdb() use select(), so they're
+ * uninterruptible. These replacement functions use the corresponding
+ * asynchronous libpq functions and backend version of select() to implement
+ * the same functionality, but in a way that's interrupted by signals.
+ *
+ * These work on other platforms as well, but presumably it's more efficient
+ * to let libpq block.
+ */
+
+static PGresult *
+dblink_PQexec(PGconn *conn, const char *command)
+{
+	int			sock;
+	PGresult   *result,
+			   *lastResult;
+
+	/* Send query. This can block too, but we ignore that for now. */
+	if (PQsendQuery(conn, command) == 0)
+		return NULL;
+
+	/* Wait for response */
+	sock = PQsocket(conn);
+
+	while(PQisBusy(conn))
+	{
+		fd_set		input_mask;
+
+		FD_ZERO(input_mask);
+
+		FD_SET		(sock, input_mask);
+
+		/*
+		 * Note that we don't check the return code. We assume that
+		 * PQconsumeInput() will get the same error, and set the result
+		 * as failed.
+		 */
+		select(sock + 1, input_mask, NULL, NULL, NULL);
+		PQconsumeInput(conn);
+	}
+
+	/*
+	 * Emulate PQexec()'s behavior of returning the *last* result, if
+	 * there's many. dblink doesn't normally issue statements that return
+	 * multiple results, but the user-supplied SQL statement passed to
+	 * dblink() might. You'll only get the last result back, so it's not a
+	 * very sensible thing to do, but we must still handle it gracefully.
+	 *
+	 * We don't try to concatenate error messages like PQexec() does.
+	 * Doesn't seem worth the effort.
+	 */
+	lastResult = NULL;
+	while((result = PQgetResult(conn)) != NULL)
+	{
+		if (lastResult != NULL)
+		{
+			if (PQresultStatus(lastResult) != PGRES_COMMAND_OK 
+PQresultStatus(lastResult) != PGRES_TUPLES_OK)
+			{
+PQclear(result);
+result = lastResult;
+			}
+			else
+PQclear(lastResult);
+		}
+		lastResult = result;
+	}
+
+	return lastResult;
+}
+
+static PGconn *
+dblink_PQconnectdb(const char *conninfo)
+{
+	PGconn *conn;
+	PostgresPollingStatusType status;
+	PQconninfoOption *options;
+	int timeout_secs = 0;
+	time_t end_time;
+	int sock;
+
+	conn = PQconnectStart(conninfo);
+	if (conn == NULL)
+		return NULL;
+
+	if (PQstatus(conn) == CONNECTION_BAD)
+		return conn;
+
+	/* Extract timeout from the connection string */
+	options = PQconninfoParse(conninfo, NULL);
+	if (options)
+	{
+		PQconninfoOption *option;
+		for (option = options; option-keyword != NULL; option++)
+		{
+			if (strcmp(option-keyword, connect_timeout) == 0)
+			{
+if (option-val != NULL  option-val[0] != '\0')
+{
+	timeout_secs = atoi(option-val);
+	break;
+}
+			}
+		}
+		PQconninfoFree(options);
+	}
+	if (timeout_secs  0)
+		end_time = time(NULL) + timeout_secs;
+
+	sock = PQsocket(conn);
+
+	/* Wait for connection to be established */
+	for (;;)
+	{
+		fd_set	input_mask;
+		fd_set	output_mask;
+		time_t	now;
+		struct timeval timeout;
+		struct timeval *timeout_ptr;
+
+		FD_ZERO(input_mask);
+		FD_ZERO(output_mask);
+
+		status = PQconnectPoll(conn);
+		switch(status)
+		{
+			case PGRES_POLLING_OK:
+			case PGRES_POLLING_FAILED:
+return conn;
+
+			case PGRES_POLLING_READING:
+FD_SET(sock, input_mask);
+break;
+
+			case 

Re: [HACKERS] Streaming Replication on win32

2010-01-21 Thread Joe Conway
On 01/21/2010 04:46 AM, Heikki Linnakangas wrote:
 Heikki Linnakangas wrote:
 Magnus Hagander wrote:
 2010/1/17 Heikki Linnakangas heikki.linnakan...@enterprisedb.com:
 We could replace the blocking PQexec() calls with PQsendQuery(), and use
  the emulated version of select() to wait.
 Hmm. That would at least theoretically work, but aren't there still
 places we may end up blocking further down? Or are those ok?

 There's also PQconnect that needs similar treatment (using
 PQconnectStart/Poll()), but that's it.
 
 So here's a patch implementing that for contrib/dblink. Walreceiver
 needs the same treatment.
 
 The implementation should be shared between the two, but I'm not sure
 how. We can't just put the wrapper functions to a module in
 src/backend/port/, because the wrapper functions depend on libpq. Maybe
 put them in a new header file as static functions, and include that in
 contrib/dblink/dblink.c and src/backend/replication/libpqwalreceiver.c.

+#ifdef WIN23
^
I assume you meant WIN32 here ;-)

+#define PQexec(conn, command) dblink_PQexec(conn, command)
+#define PQconnectdb(conninfo) dblink_PQconnectdb(conninfo)

I have not been really following this thread, but why can't we put the
#ifdef WIN32 and special definition of these functions into libpq. I
don't understand why we need special treatment for dblink.

Joe





signature.asc
Description: OpenPGP digital signature


Re: [HACKERS] Streaming Replication on win32

2010-01-21 Thread Heikki Linnakangas
Joe Conway wrote:
 +#ifdef WIN23
 ^
 I assume you meant WIN32 here ;-)

Yeah. I admit I haven't tested this on Windows, I just commented out
those #ifdef's and tested on Linux. Will need to verify that this
actually solves the problem on Windows before committing.

 +#define PQexec(conn, command) dblink_PQexec(conn, command)
 +#define PQconnectdb(conninfo) dblink_PQconnectdb(conninfo)
 
 I have not been really following this thread, but why can't we put the
 #ifdef WIN32 and special definition of these functions into libpq. I
 don't understand why we need special treatment for dblink.

The problem is that select() function on Windows isn't interrupted by
signals. That's because Unix-style signals don't exist on Windows, but
we've emulated them in the server with pipes. The select() function
doesn't know about that hack, so in the backend, we've replaced it with
pgwin32_select() that does, using a #define.

libpq doesn't use that #define and pgwin32_select(), because that's a
backend function. It won't work in regular client applications.

If we just moved those dblink_PQexec/PQconnectdb() functions to libpq,
they wouldn't use pgwin32_select() and would thus be useless.

-- 
  Heikki Linnakangas
  EnterpriseDB   http://www.enterprisedb.com

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Streaming Replication on win32

2010-01-21 Thread Joe Conway
On 01/21/2010 10:33 PM, Heikki Linnakangas wrote:
 Joe Conway wrote:
 I have not been really following this thread, but why can't we put the
 #ifdef WIN32 and special definition of these functions into libpq. I
 don't understand why we need special treatment for dblink.
 
 The problem is that select() function on Windows isn't interrupted by
 signals. That's because Unix-style signals don't exist on Windows, but
 we've emulated them in the server with pipes. The select() function
 doesn't know about that hack, so in the backend, we've replaced it with
 pgwin32_select() that does, using a #define.

Ah, thanks for the synopsis.

 libpq doesn't use that #define and pgwin32_select(), because that's a
 backend function. It won't work in regular client applications.
 
 If we just moved those dblink_PQexec/PQconnectdb() functions to libpq,
 they wouldn't use pgwin32_select() and would thus be useless.

OK, so now I see why we want this fixed for dblink and walreceiver, but
doesn't this approach leave every other WIN32 libpq client out in the
cold? Is there nothing that can be done for the general case, or is it a
SMOP?

Joe




signature.asc
Description: OpenPGP digital signature


Re: [HACKERS] Streaming Replication on win32

2010-01-21 Thread Heikki Linnakangas
Joe Conway wrote:
 OK, so now I see why we want this fixed for dblink and walreceiver, but
 doesn't this approach leave every other WIN32 libpq client out in the
 cold? Is there nothing that can be done for the general case, or is it a
 SMOP?

The problem only applies to libpq calls from the backend. Client apps
are not affected, only backend modules. If there's any other modules out
there that use libpq, then yes, those have a problem too.

A generic fix would be nice. Putting the wrapper functions in a new
header file that's included in all backend modules that want to use
libpq would work. Maybe the new header file could even be #included from
libpq-fe.h, when compiling backend code (#ifndef FRONTEND), so you
wouldn't need to modify dblink, walreceiver, or any 3rd party modules
that use libpq at all.

-- 
  Heikki Linnakangas
  EnterpriseDB   http://www.enterprisedb.com

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Streaming Replication on win32

2010-01-18 Thread Fujii Masao
On Mon, Jan 18, 2010 at 5:22 AM, Heikki Linnakangas
heikki.linnakan...@enterprisedb.com wrote:
 This could be because the win32 socket emulation layer simply wasn't
 designed to deal with non-blocking sockets. Specifically, it actually
 *always* sets the socket to non-blocking mode, and then uses that to
 properly emulate how sockets work under unix.

 I presume the win32 emulation layer can be taught about non-blocking
 sockets? Or maybe pq_getbyte_if_available() can be implemented using
 some other simpler method on Windows.

How about checking the socket by using select/poll before calling
pq_getbyte_if_available()? This would prevent pgwin32_recv() from
being blocked because a message is guaranteed to have already arrived.
When the renegotiation happens, SSL_read (instead of pqwin32_recv())
is called with non-blocking socket, so it's not blocked.

Regards,

-- 
Fujii Masao
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Streaming Replication on win32

2010-01-18 Thread Magnus Hagander
On Mon, Jan 18, 2010 at 10:30, Fujii Masao masao.fu...@gmail.com wrote:
 On Mon, Jan 18, 2010 at 5:22 AM, Heikki Linnakangas
 heikki.linnakan...@enterprisedb.com wrote:
 This could be because the win32 socket emulation layer simply wasn't
 designed to deal with non-blocking sockets. Specifically, it actually
 *always* sets the socket to non-blocking mode, and then uses that to
 properly emulate how sockets work under unix.

 I presume the win32 emulation layer can be taught about non-blocking
 sockets? Or maybe pq_getbyte_if_available() can be implemented using
 some other simpler method on Windows.

 How about checking the socket by using select/poll before calling
 pq_getbyte_if_available()? This would prevent pgwin32_recv() from
 being blocked because a message is guaranteed to have already arrived.
 When the renegotiation happens, SSL_read (instead of pqwin32_recv())
 is called with non-blocking socket, so it's not blocked.

SSL_read calls into pqwin32_recv(), so you have the same problem. (see
my_sock_read() and my_sock_write() in be-secure.c)

-- 
 Magnus Hagander
 Me: http://www.hagander.net/
 Work: http://www.redpill-linpro.com/

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Streaming Replication on win32

2010-01-18 Thread Fujii Masao
On Mon, Jan 18, 2010 at 6:40 PM, Magnus Hagander mag...@hagander.net wrote:
 SSL_read calls into pqwin32_recv(), so you have the same problem. (see
 my_sock_read() and my_sock_write() in be-secure.c)

Oh, I confirmed that. Thanks!

Can we prevent SSL_read from being blocked in the renegotiation case
if we use poll/select in my_sock_read() even if the socket is blocking
mode? If Yes, ISTM that we can work around the problem by using the
special BIO function which checks the socket, as BIO.

Regards,

-- 
Fujii Masao
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Streaming Replication on win32

2010-01-18 Thread Magnus Hagander
2010/1/18 Tom Lane t...@sss.pgh.pa.us:
 Magnus Hagander mag...@hagander.net writes:
 Which shows one potentially big problem - since we're calling select()
 from inside libpq, it's not calling our signal emulation layer
 compatible select(). This means that at this point, walreceiver is
 not interruptible.

 Ugh.

Indeed.


 Which also shows itself if I shut down the system -
 the walreceiver stays around, and won't terminate properly. Do we need
 to invent a way for libpq to call back into backend code to do this
 select? We certainly can't have libipq use our version directly -
 since that would break all non-postmaster/postgres processes.

 I think that on some platforms, it's possible for the call to select()
 from a shlib such as libpq to be captured by a select() provided by the
 executable it's loaded into.  Dunno about the linking rules on Windows,
 but is there any chance of a workaround that way?

AFAIK, no. We can somehow control the link order when we link with the
import library, but that would require us to do it at link time,
meaning libpq would be linked to postgres.exe -- FAIL.

Another option is to load the select() function on runtime, by having
libpq examine the list of loaded modules for postgres.exe.. But that
seems a lot uglier than providing some kind of callback for it.

-- 
 Magnus Hagander
 Me: http://www.hagander.net/
 Work: http://www.redpill-linpro.com/

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Streaming Replication on win32

2010-01-18 Thread Magnus Hagander
2010/1/17 Heikki Linnakangas heikki.linnakan...@enterprisedb.com:
 Magnus Hagander wrote:
 Which shows one potentially big problem - since we're calling select()
 from inside libpq, it's not calling our signal emulation layer
 compatible select(). This means that at this point, walreceiver is
 not interruptible. Which also shows itself if I shut down the system -
 the walreceiver stays around, and won't terminate properly. Do we need
 to invent a way for libpq to call back into backend code to do this
 select? We certainly can't have libipq use our version directly -
 since that would break all non-postmaster/postgres processes.

 Hmm, contrib/dblink probably has the same issue then.

Yes.


 We could replace the blocking PQexec() calls with PQsendQuery(), and use
  the emulated version of select() to wait.

Hmm. That would at least theoretically work, but aren't there still
places we may end up blocking further down? Or are those ok?


 From what I can tell, this indicates that pq_getbyte_if_available() is
 not working - because it's supposed to never block, right?

 Right, it's not supposed to block.

 This could be because the win32 socket emulation layer simply wasn't
 designed to deal with non-blocking sockets. Specifically, it actually
 *always* sets the socket to non-blocking mode, and then uses that to
 properly emulate how sockets work under unix.

 I presume the win32 emulation layer can be taught about non-blocking
 sockets? Or maybe pq_getbyte_if_available() can be implemented using
 some other simpler method on Windows.

It could be taught that, but it would probably be a lot easier to put
platform specific code in pq_getbyte_if_available(). That's a bit
breaking an abstraction layer though, but that's maybe Ok in this
case...

-- 
 Magnus Hagander
 Me: http://www.hagander.net/
 Work: http://www.redpill-linpro.com/

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Streaming Replication on win32

2010-01-18 Thread Heikki Linnakangas
Magnus Hagander wrote:
 2010/1/17 Heikki Linnakangas heikki.linnakan...@enterprisedb.com:
 We could replace the blocking PQexec() calls with PQsendQuery(), and use
  the emulated version of select() to wait.
 
 Hmm. That would at least theoretically work, but aren't there still
 places we may end up blocking further down? Or are those ok?

There's also PQconnect that needs similar treatment (using
PQconnectStart/Poll()), but that's it.

-- 
  Heikki Linnakangas
  EnterpriseDB   http://www.enterprisedb.com

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


[HACKERS] Streaming Replication on win32

2010-01-17 Thread Magnus Hagander
I'm trying to figure out why streaming replication doesn't work on
win32. Here is what I have so far: It starts up fine, and outputs:
LOG:  starting archive recovery
LOG:  standby_mode = 'on'
LOG:  primary_conninfo = 'host=localhost port=5432'
LOG:  starting streaming recovery at 0/200

After this, *nothing* happens, and it never reaches a consistent state
or anything.

Looking at stacktraces, I notice two things:
walreceiver process is in:
ntdll!ZwWaitForSingleObject+0xa
mswsock+0x4f65
WS2_32!select+0x105
LIBPQ!pqSocketPoll(int sock = 4936, int forRead = 1, int forWrite = 0,
int64 end_time = -1)+0x2bb
LIBPQ!pqSocketCheck(struct pg_conn * conn = 0x`00830160, int
forRead = 1, int forWrite = 0, int64 end_time = -1)+0xa1
LIBPQ!pqWaitTimed(int forRead = 1, int forWrite = 0, struct pg_conn *
conn = 0x`00830160, int64 finish_time = -1)+0x2e
LIBPQ!pqWait(int forRead = 1, int forWrite = 0, struct pg_conn * conn
= 0x`00830160)+0x2a
LIBPQ!PQgetResult(struct pg_conn * conn = 0x`00830160)+0x82
LIBPQ!PQexecFinish(struct pg_conn * conn = 0x`00830160)+0x1c
LIBPQ!PQexec(struct pg_conn * conn = 0x`00830160, char * query
= 0x`0042f600 START_REPLICATION 0/200)+0x44
walreceiver!WalRcvConnect(void)+0x457
walreceiver!WalReceiverMain(struct FunctionCallInfoData * fcinfo =
0x`)+0x20e
postgres!AuxiliaryProcessMain(int argc = 2, char ** argv =
0x`0081f080)+0x600
postgres!SubPostmasterMain(int argc = 4, char ** argv =
0x`0081f070)+0x2d7
postgres!main(int argc = 4, char ** argv = 0x`0081f070)+0x1e4
postgres!__tmainCRTStartup(void)+0x192
postgres!mainCRTStartup(void)+0xe
kernel32!BaseProcessStart+0x2c


Which shows one potentially big problem - since we're calling select()
from inside libpq, it's not calling our signal emulation layer
compatible select(). This means that at this point, walreceiver is
not interruptible. Which also shows itself if I shut down the system -
the walreceiver stays around, and won't terminate properly. Do we need
to invent a way for libpq to call back into backend code to do this
select? We certainly can't have libipq use our version directly -
since that would break all non-postmaster/postgres processes.


The second thing I note is that the walsender is in:
ntdll!ZwWaitForMultipleObjects+0xa
kernel32!ReleaseSemaphore+0x6b
postgres!pgwin32_waitforsinglesocket(unsigned int64 s = 0x13fc, int
what = 41, int timeout = -1)+0x275
postgres!pgwin32_recv(unsigned int64 s = 0x13fc, char * buf =
0x`0042f990 ???, int len = 1, int f = 0)+0xf5
postgres!secure_read(struct Port * port = 0x`0042fcf0, void *
ptr = 0x`0042f990, unsigned int64 len = 1)+0x32
postgres!pq_getbyte_if_available(unsigned char * c =
0x`0042f990 ???)+0x106
postgres!CheckClosedConnection(void)+0x10
postgres!WalSndLoop(void)+0xdf
postgres!WalSenderMain(void)+0xb9
postgres!PostgresMain(int argc = 2, char ** argv =
0x`0084d520, char * username = 0x`0082e218
Administrator)+0x3b5
postgres!BackendRun(struct Port * port = 0x`0042fcf0)+0x235
postgres!SubPostmasterMain(int argc = 3, char ** argv =
0x`0081f080)+0x278
postgres!main(int argc = 3, char ** argv = 0x`0081f080)+0x1e4
postgres!__tmainCRTStartup(void)+0x192
postgres!mainCRTStartup(void)+0xe
kernel32!BaseProcessStart+0x2c

From what I can tell, this indicates that pq_getbyte_if_available() is
not working - because it's supposed to never block, right?

This could be because the win32 socket emulation layer simply wasn't
designed to deal with non-blocking sockets. Specifically, it actually
*always* sets the socket to non-blocking mode, and then uses that to
properly emulate how sockets work under unix.

Oh, and the walsender process says:
\Sessions\1\BaseNamedObjects\pgident(2196): postgres: wal sender
process Administrator 127.0.0.1(1398) startup
the walreceiver says:
\Sessions\1\BaseNamedObjects\pgident(2264): postgres: wal receiver process
and the startup process says:
\Sessions\1\BaseNamedObjects\pgident(2764): postgres: startup process
 waiting for 00010002


-- 
 Magnus Hagander
 Me: http://www.hagander.net/
 Work: http://www.redpill-linpro.com/

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Streaming Replication on win32

2010-01-17 Thread Heikki Linnakangas
Magnus Hagander wrote:
 Which shows one potentially big problem - since we're calling select()
 from inside libpq, it's not calling our signal emulation layer
 compatible select(). This means that at this point, walreceiver is
 not interruptible. Which also shows itself if I shut down the system -
 the walreceiver stays around, and won't terminate properly. Do we need
 to invent a way for libpq to call back into backend code to do this
 select? We certainly can't have libipq use our version directly -
 since that would break all non-postmaster/postgres processes.

Hmm, contrib/dblink probably has the same issue then.

We could replace the blocking PQexec() calls with PQsendQuery(), and use
 the emulated version of select() to wait.

 From what I can tell, this indicates that pq_getbyte_if_available() is
 not working - because it's supposed to never block, right?

Right, it's not supposed to block.

 This could be because the win32 socket emulation layer simply wasn't
 designed to deal with non-blocking sockets. Specifically, it actually
 *always* sets the socket to non-blocking mode, and then uses that to
 properly emulate how sockets work under unix.

I presume the win32 emulation layer can be taught about non-blocking
sockets? Or maybe pq_getbyte_if_available() can be implemented using
some other simpler method on Windows.

-- 
  Heikki Linnakangas
  EnterpriseDB   http://www.enterprisedb.com

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Streaming Replication on win32

2010-01-17 Thread Tom Lane
Magnus Hagander mag...@hagander.net writes:
 Which shows one potentially big problem - since we're calling select()
 from inside libpq, it's not calling our signal emulation layer
 compatible select(). This means that at this point, walreceiver is
 not interruptible.

Ugh.

 Which also shows itself if I shut down the system -
 the walreceiver stays around, and won't terminate properly. Do we need
 to invent a way for libpq to call back into backend code to do this
 select? We certainly can't have libipq use our version directly -
 since that would break all non-postmaster/postgres processes.

I think that on some platforms, it's possible for the call to select()
from a shlib such as libpq to be captured by a select() provided by the
executable it's loaded into.  Dunno about the linking rules on Windows,
but is there any chance of a workaround that way?

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers