Re: [HACKERS] Streaming replication on win32, still broken
On Fri, Feb 19, 2010 at 7:54 PM, Heikki Linnakangas heikki.linnakan...@enterprisedb.com wrote: Heikki Linnakangas wrote: Magnus Hagander wrote: Well, it's going to make the process that reads the WAL cause actual physical I/O... That'll take a chunk out of your total available I/O, which is likely to push you to the limit of your I/O capacity much quicker. Right, doesn't seem sensible, though it would be nice to see a benchmark on that. Here's a patch to disable O_DIRECT when archiving or streaming is enabled. This is pretty hard to test, so any extra eyeballs would be nice.. Committed. Can you check that this fixed the PANIC you saw? Thanks! Yeah, SR works fine in my MinGW environment. Regards, -- Fujii Masao NIPPON TELEGRAPH AND TELEPHONE CORPORATION NTT Open Source Software Center -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Streaming replication on win32, still broken
Heikki Linnakangas wrote: Magnus Hagander wrote: Well, it's going to make the process that reads the WAL cause actual physical I/O... That'll take a chunk out of your total available I/O, which is likely to push you to the limit of your I/O capacity much quicker. Right, doesn't seem sensible, though it would be nice to see a benchmark on that. Here's a patch to disable O_DIRECT when archiving or streaming is enabled. This is pretty hard to test, so any extra eyeballs would be nice.. Committed. Can you check that this fixed the PANIC you saw? -- Heikki Linnakangas EnterpriseDB http://www.enterprisedb.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Streaming replication on win32, still broken
2010/2/18 Heikki Linnakangas heikki.linnakan...@enterprisedb.com: Fujii Masao wrote: On Thu, Feb 18, 2010 at 5:28 AM, Heikki Linnakangas heikki.linnakan...@enterprisedb.com wrote: If I'm reading the patch correctly, when wal_sync_method is 'open_sync', walreceiver nevertheless opens the WAL file without the O_DIRECT flag. When it later flushes it in XLogWalRcvFlush() by issue_xlog_fsync(), issue_xlog_fsync() will do nothing because it assumes the write() synced it already. So the data written isn't being forced to disk at all. When 'open_sync' is chosen, the WAL file is opened with O_SYNC or O_FSYNC flag. So I think that write() flushes the data to disk even if O_DIRECT flag is not given. Am I missing something? Ah, ok, you're right. Yes, I believe the difference is that with O_DIRECT it bypasses the cache completely. Without it, we still sync it out, but it also goes into the cache. O_DIRECT helps us when we're not going to read the file again, because we don't waste cache on it. If we are, which is the case here, it should be really bad for performance, since we actually have to do a physical read. Incidentally, that should also apply to general WAL when archive_mdoe is on. Do we optimize for that? -- Magnus Hagander Me: http://www.hagander.net/ Work: http://www.redpill-linpro.com/ -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Streaming replication on win32, still broken
Magnus Hagander wrote: O_DIRECT helps us when we're not going to read the file again, because we don't waste cache on it. If we are, which is the case here, it should be really bad for performance, since we actually have to do a physical read. Incidentally, that should also apply to general WAL when archive_mdoe is on. Do we optimize for that? Hmm, no we don't. We do take that into account so that we refrain from issuing posix_fadvice(DONTNEED) if archive_mode is on, but we don't disable O_DIRECT. Maybe we should.. -- Heikki Linnakangas EnterpriseDB http://www.enterprisedb.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Streaming replication on win32, still broken
On Thu, Feb 18, 2010 at 7:04 PM, Heikki Linnakangas heikki.linnakan...@enterprisedb.com wrote: Magnus Hagander wrote: O_DIRECT helps us when we're not going to read the file again, because we don't waste cache on it. If we are, which is the case here, it should be really bad for performance, since we actually have to do a physical read. Incidentally, that should also apply to general WAL when archive_mdoe is on. Do we optimize for that? Hmm, no we don't. We do take that into account so that we refrain from issuing posix_fadvice(DONTNEED) if archive_mode is on, but we don't disable O_DIRECT. Maybe we should.. Since the performance of WAL write is more important than that of WAL archiving in general, that optimization might offer little benefit. Regards, -- Fujii Masao NIPPON TELEGRAPH AND TELEPHONE CORPORATION NTT Open Source Software Center -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Streaming replication on win32, still broken
2010/2/18 Fujii Masao masao.fu...@gmail.com: On Thu, Feb 18, 2010 at 7:04 PM, Heikki Linnakangas heikki.linnakan...@enterprisedb.com wrote: Magnus Hagander wrote: O_DIRECT helps us when we're not going to read the file again, because we don't waste cache on it. If we are, which is the case here, it should be really bad for performance, since we actually have to do a physical read. Incidentally, that should also apply to general WAL when archive_mdoe is on. Do we optimize for that? Hmm, no we don't. We do take that into account so that we refrain from issuing posix_fadvice(DONTNEED) if archive_mode is on, but we don't disable O_DIRECT. Maybe we should.. Since the performance of WAL write is more important than that of WAL archiving in general, that optimization might offer little benefit. Well, it's going to make the process that reads the WAL cause actual physical I/O... That'll take a chunk out of your total available I/O, which is likely to push you to the limit of your I/O capacity much quicker. -- Magnus Hagander Me: http://www.hagander.net/ Work: http://www.redpill-linpro.com/ -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Streaming replication on win32, still broken
Magnus Hagander wrote: 2010/2/18 Fujii Masao masao.fu...@gmail.com: On Thu, Feb 18, 2010 at 7:04 PM, Heikki Linnakangas heikki.linnakan...@enterprisedb.com wrote: Magnus Hagander wrote: O_DIRECT helps us when we're not going to read the file again, because we don't waste cache on it. If we are, which is the case here, it should be really bad for performance, since we actually have to do a physical read. Incidentally, that should also apply to general WAL when archive_mdoe is on. Do we optimize for that? Hmm, no we don't. We do take that into account so that we refrain from issuing posix_fadvice(DONTNEED) if archive_mode is on, but we don't disable O_DIRECT. Maybe we should.. Since the performance of WAL write is more important than that of WAL archiving in general, that optimization might offer little benefit. Well, it's going to make the process that reads the WAL cause actual physical I/O... That'll take a chunk out of your total available I/O, which is likely to push you to the limit of your I/O capacity much quicker. Right, doesn't seem sensible, though it would be nice to see a benchmark on that. Here's a patch to disable O_DIRECT when archiving or streaming is enabled. This is pretty hard to test, so any extra eyeballs would be nice.. -- Heikki Linnakangas EnterpriseDB http://www.enterprisedb.com ? GNUmakefile ? a.patch ? b ? backend ? config.log ? config.status ? config.status.lineno ? configure.lineno ? gin-splay-1.patch ? gin-splay-2.patch ? gin-splay-3.patch ? include ? libpq ? log_unlogged_op_0115.patch ? md-1.c ? md-1.patch ? replication ? temp-file-resowner-2.patch ? contrib/pgbench/.deps ? contrib/pgbench/fsynctest ? contrib/pgbench/fsynctest.c ? contrib/pgbench/fsynctestfile ? contrib/pgbench/pgbench ? contrib/spi/.deps ? doc/src/sgml/HTML.index ? doc/src/sgml/bookindex.sgml ? doc/src/sgml/features-supported.sgml ? doc/src/sgml/features-unsupported.sgml ? doc/src/sgml/version.sgml ? src/Makefile.global ? src/backend/aaa.patch ? src/backend/postgres ? src/backend/access/common/.deps ? src/backend/access/gin/.deps ? src/backend/access/gist/.deps ? src/backend/access/hash/.deps ? src/backend/access/heap/.deps ? src/backend/access/index/.deps ? src/backend/access/nbtree/.deps ? src/backend/access/transam/.deps ? src/backend/bootstrap/.deps ? src/backend/catalog/.deps ? src/backend/commands/.deps ? src/backend/executor/.deps ? src/backend/foreign/.deps ? src/backend/foreign/dummy/.deps ? src/backend/foreign/postgresql/.deps ? src/backend/lib/.deps ? src/backend/libpq/.deps ? src/backend/main/.deps ? src/backend/nodes/.deps ? src/backend/optimizer/geqo/.deps ? src/backend/optimizer/path/.deps ? src/backend/optimizer/plan/.deps ? src/backend/optimizer/prep/.deps ? src/backend/optimizer/util/.deps ? src/backend/parser/.deps ? src/backend/po/af.mo ? src/backend/po/cs.mo ? src/backend/po/hr.mo ? src/backend/po/hu.mo ? src/backend/po/it.mo ? src/backend/po/ko.mo ? src/backend/po/nb.mo ? src/backend/po/nl.mo ? src/backend/po/pl.mo ? src/backend/po/ro.mo ? src/backend/po/ru.mo ? src/backend/po/sk.mo ? src/backend/po/sl.mo ? src/backend/po/sv.mo ? src/backend/po/zh_CN.mo ? src/backend/po/zh_TW.mo ? src/backend/port/.deps ? src/backend/postmaster/.deps ? src/backend/regex/.deps ? src/backend/replication/.deps ? src/backend/replication/libpqwalreceiver/.deps ? src/backend/rewrite/.deps ? src/backend/snowball/.deps ? src/backend/snowball/snowball_create.sql ? src/backend/storage/buffer/.deps ? src/backend/storage/file/.deps ? src/backend/storage/freespace/.deps ? src/backend/storage/ipc/.deps ? src/backend/storage/large_object/.deps ? src/backend/storage/lmgr/.deps ? src/backend/storage/page/.deps ? src/backend/storage/smgr/.deps ? src/backend/tcop/.deps ? src/backend/tsearch/.deps ? src/backend/utils/.deps ? src/backend/utils/probes.h ? src/backend/utils/adt/.deps ? src/backend/utils/cache/.deps ? src/backend/utils/error/.deps ? src/backend/utils/fmgr/.deps ? src/backend/utils/hash/.deps ? src/backend/utils/init/.deps ? src/backend/utils/mb/.deps ? src/backend/utils/mb/Unicode/BIG5.TXT ? src/backend/utils/mb/Unicode/CP950.TXT ? src/backend/utils/mb/conversion_procs/conversion_create.sql ? src/backend/utils/mb/conversion_procs/ascii_and_mic/.deps ? src/backend/utils/mb/conversion_procs/cyrillic_and_mic/.deps ? src/backend/utils/mb/conversion_procs/euc2004_sjis2004/.deps ? src/backend/utils/mb/conversion_procs/euc_cn_and_mic/.deps ? src/backend/utils/mb/conversion_procs/euc_jis_2004_and_shift_jis_2004/.deps ? src/backend/utils/mb/conversion_procs/euc_jp_and_sjis/.deps ? src/backend/utils/mb/conversion_procs/euc_kr_and_mic/.deps ? src/backend/utils/mb/conversion_procs/euc_tw_and_big5/.deps ? src/backend/utils/mb/conversion_procs/latin2_and_win1250/.deps ? src/backend/utils/mb/conversion_procs/latin_and_mic/.deps ? src/backend/utils/mb/conversion_procs/utf8_and_ascii/.deps ? src/backend/utils/mb/conversion_procs/utf8_and_big5/.deps ?
Re: [HACKERS] Streaming replication on win32, still broken
On Wed, Feb 17, 2010 at 4:07 PM, Fujii Masao masao.fu...@gmail.com wrote: On Wed, Feb 17, 2010 at 3:03 PM, Magnus Hagander mag...@hagander.net wrote: In that case, O_DIRECT would be counterproductive, no? It maps to FILE_FLAG_NOI_BUFFERING, which makes sure it doesn't go into the cache. So the read in the startup proc is actually guaranteed to reuqire a physical read - of something we just wrote, so it'll almost certainly end up waiting for a rotation, no? Seems like getting rid of O_DIRECT here is the right thing to do, regardless of this. Agreed. I'll remove O_DIRECT from walreceiver. Here is the patch to do that. Regards, -- Fujii Masao NIPPON TELEGRAPH AND TELEPHONE CORPORATION NTT Open Source Software Center *** a/src/backend/access/transam/xlog.c --- b/src/backend/access/transam/xlog.c *** *** 1627,1633 XLogWrite(XLogwrtRqst WriteRqst, bool flexible, bool xlog_switch) /* create/use new log file */ use_existent = true; openLogFile = XLogFileInit(openLogId, openLogSeg, ! use_existent, true); openLogOff = 0; } --- 1627,1633 /* create/use new log file */ use_existent = true; openLogFile = XLogFileInit(openLogId, openLogSeg, ! use_existent, true, true); openLogOff = 0; } *** *** 2184,2189 XLogNeedsFlush(XLogRecPtr record) --- 2184,2195 * place. This should be TRUE except during bootstrap log creation. The * caller must *not* hold the lock at call. * + * allow_direct_io: if TRUE, allow a WAL write to bypass the kernel cache + * by using PG_O_DIRECT for opening a file. Otherwise, PG_O_DIRECT is + * forcibly removed from the sync flag of open(). This should be FALSE + * only when walreceiver process writes WAL data because it's read + * immediately by the startup process. + * * Returns FD of opened file. * * Note: errors here are ERROR not PANIC because we might or might not be *** *** 2193,2199 XLogNeedsFlush(XLogRecPtr record) */ int XLogFileInit(uint32 log, uint32 seg, ! bool *use_existent, bool use_lock) { char path[MAXPGPATH]; char tmppath[MAXPGPATH]; --- 2199,2205 */ int XLogFileInit(uint32 log, uint32 seg, ! bool *use_existent, bool use_lock, bool allow_direct_io) { char path[MAXPGPATH]; char tmppath[MAXPGPATH]; *** *** 2203,2208 XLogFileInit(uint32 log, uint32 seg, --- 2209,2217 int max_advance; int fd; int nbytes; + int sync_bit; + + sync_bit = get_sync_bit(sync_method) (allow_direct_io ? 0 : ~PG_O_DIRECT); XLogFilePath(path, ThisTimeLineID, log, seg); *** *** 2211,2217 XLogFileInit(uint32 log, uint32 seg, */ if (*use_existent) { ! fd = BasicOpenFile(path, O_RDWR | PG_BINARY | get_sync_bit(sync_method), S_IRUSR | S_IWUSR); if (fd 0) { --- 2220,2226 */ if (*use_existent) { ! fd = BasicOpenFile(path, O_RDWR | PG_BINARY | sync_bit, S_IRUSR | S_IWUSR); if (fd 0) { *** *** 2237,2243 XLogFileInit(uint32 log, uint32 seg, unlink(tmppath); ! /* do not use get_sync_bit() here --- want to fsync only at end of fill */ fd = BasicOpenFile(tmppath, O_RDWR | O_CREAT | O_EXCL | PG_BINARY, S_IRUSR | S_IWUSR); if (fd 0) --- 2246,2252 unlink(tmppath); ! /* do not use sync_bit here --- want to fsync only at end of fill */ fd = BasicOpenFile(tmppath, O_RDWR | O_CREAT | O_EXCL | PG_BINARY, S_IRUSR | S_IWUSR); if (fd 0) *** *** 2317,2323 XLogFileInit(uint32 log, uint32 seg, *use_existent = false; /* Now open original target segment (might not be file I just made) */ ! fd = BasicOpenFile(path, O_RDWR | PG_BINARY | get_sync_bit(sync_method), S_IRUSR | S_IWUSR); if (fd 0) ereport(ERROR, --- 2326,2332 *use_existent = false; /* Now open original target segment (might not be file I just made) */ ! fd = BasicOpenFile(path, O_RDWR | PG_BINARY | sync_bit, S_IRUSR | S_IWUSR); if (fd 0) ereport(ERROR, *** *** 3121,3127 PreallocXlogFiles(XLogRecPtr endptr) { NextLogSeg(_logId, _logSeg); use_existent = true; ! lf = XLogFileInit(_logId, _logSeg, use_existent, true); close(lf); if (!use_existent) CheckpointStats.ckpt_segs_added++; --- 3130,3136 { NextLogSeg(_logId, _logSeg); use_existent = true; ! lf = XLogFileInit(_logId, _logSeg, use_existent, true, true); close(lf); if (!use_existent) CheckpointStats.ckpt_segs_added++; *** *** 4794,4800 BootStrapXLOG(void) /* Create first XLOG segment file */ use_existent = false; ! openLogFile = XLogFileInit(0, 0, use_existent, false); /* Write the first page with the initial record */ errno = 0; --- 4803,4809 /* Create first XLOG segment file */ use_existent = false;
Re: [HACKERS] Streaming replication on win32, still broken
On Wed, Feb 17, 2010 at 6:00 PM, Fujii Masao masao.fu...@gmail.com wrote: On Wed, Feb 17, 2010 at 4:07 PM, Fujii Masao masao.fu...@gmail.com wrote: On Wed, Feb 17, 2010 at 3:03 PM, Magnus Hagander mag...@hagander.net wrote: In that case, O_DIRECT would be counterproductive, no? It maps to FILE_FLAG_NOI_BUFFERING, which makes sure it doesn't go into the cache. So the read in the startup proc is actually guaranteed to reuqire a physical read - of something we just wrote, so it'll almost certainly end up waiting for a rotation, no? Seems like getting rid of O_DIRECT here is the right thing to do, regardless of this. Agreed. I'll remove O_DIRECT from walreceiver. Here is the patch to do that. Ooops! I found the bug in the patch. Here is the updated version. Regards, -- Fujii Masao NIPPON TELEGRAPH AND TELEPHONE CORPORATION NTT Open Source Software Center *** a/src/backend/access/transam/xlog.c --- b/src/backend/access/transam/xlog.c *** *** 1627,1633 XLogWrite(XLogwrtRqst WriteRqst, bool flexible, bool xlog_switch) /* create/use new log file */ use_existent = true; openLogFile = XLogFileInit(openLogId, openLogSeg, ! use_existent, true); openLogOff = 0; } --- 1627,1633 /* create/use new log file */ use_existent = true; openLogFile = XLogFileInit(openLogId, openLogSeg, ! use_existent, true, true); openLogOff = 0; } *** *** 2184,2189 XLogNeedsFlush(XLogRecPtr record) --- 2184,2195 * place. This should be TRUE except during bootstrap log creation. The * caller must *not* hold the lock at call. * + * allow_direct_io: if TRUE, allow a WAL write to bypass the kernel cache + * by using PG_O_DIRECT for opening a file. Otherwise, PG_O_DIRECT is + * forcibly removed from the sync flag of open(). This should be FALSE + * only when walreceiver process writes WAL data because it's read + * immediately by the startup process. + * * Returns FD of opened file. * * Note: errors here are ERROR not PANIC because we might or might not be *** *** 2193,2199 XLogNeedsFlush(XLogRecPtr record) */ int XLogFileInit(uint32 log, uint32 seg, ! bool *use_existent, bool use_lock) { char path[MAXPGPATH]; char tmppath[MAXPGPATH]; --- 2199,2205 */ int XLogFileInit(uint32 log, uint32 seg, ! bool *use_existent, bool use_lock, bool allow_direct_io) { char path[MAXPGPATH]; char tmppath[MAXPGPATH]; *** *** 2203,2208 XLogFileInit(uint32 log, uint32 seg, --- 2209,2219 int max_advance; int fd; int nbytes; + int sync_bit; + + sync_bit = get_sync_bit(sync_method); + if (!allow_direct_io) + sync_bit = ~PG_O_DIRECT; XLogFilePath(path, ThisTimeLineID, log, seg); *** *** 2211,2217 XLogFileInit(uint32 log, uint32 seg, */ if (*use_existent) { ! fd = BasicOpenFile(path, O_RDWR | PG_BINARY | get_sync_bit(sync_method), S_IRUSR | S_IWUSR); if (fd 0) { --- ,2228 */ if (*use_existent) { ! fd = BasicOpenFile(path, O_RDWR | PG_BINARY | sync_bit, S_IRUSR | S_IWUSR); if (fd 0) { *** *** 2237,2243 XLogFileInit(uint32 log, uint32 seg, unlink(tmppath); ! /* do not use get_sync_bit() here --- want to fsync only at end of fill */ fd = BasicOpenFile(tmppath, O_RDWR | O_CREAT | O_EXCL | PG_BINARY, S_IRUSR | S_IWUSR); if (fd 0) --- 2248,2254 unlink(tmppath); ! /* do not use sync_bit here --- want to fsync only at end of fill */ fd = BasicOpenFile(tmppath, O_RDWR | O_CREAT | O_EXCL | PG_BINARY, S_IRUSR | S_IWUSR); if (fd 0) *** *** 2317,2323 XLogFileInit(uint32 log, uint32 seg, *use_existent = false; /* Now open original target segment (might not be file I just made) */ ! fd = BasicOpenFile(path, O_RDWR | PG_BINARY | get_sync_bit(sync_method), S_IRUSR | S_IWUSR); if (fd 0) ereport(ERROR, --- 2328,2334 *use_existent = false; /* Now open original target segment (might not be file I just made) */ ! fd = BasicOpenFile(path, O_RDWR | PG_BINARY | sync_bit, S_IRUSR | S_IWUSR); if (fd 0) ereport(ERROR, *** *** 3121,3127 PreallocXlogFiles(XLogRecPtr endptr) { NextLogSeg(_logId, _logSeg); use_existent = true; ! lf = XLogFileInit(_logId, _logSeg, use_existent, true); close(lf); if (!use_existent) CheckpointStats.ckpt_segs_added++; --- 3132,3138 { NextLogSeg(_logId, _logSeg); use_existent = true; ! lf = XLogFileInit(_logId, _logSeg, use_existent, true, true); close(lf); if (!use_existent) CheckpointStats.ckpt_segs_added++; *** *** 4794,4800 BootStrapXLOG(void) /* Create first XLOG segment file */ use_existent = false; ! openLogFile = XLogFileInit(0, 0, use_existent, false);
Re: [HACKERS] Streaming replication on win32, still broken
Fujii Masao wrote: On Wed, Feb 17, 2010 at 6:00 PM, Fujii Masao masao.fu...@gmail.com wrote: On Wed, Feb 17, 2010 at 4:07 PM, Fujii Masao masao.fu...@gmail.com wrote: On Wed, Feb 17, 2010 at 3:03 PM, Magnus Hagander mag...@hagander.net wrote: In that case, O_DIRECT would be counterproductive, no? It maps to FILE_FLAG_NOI_BUFFERING, which makes sure it doesn't go into the cache. So the read in the startup proc is actually guaranteed to reuqire a physical read - of something we just wrote, so it'll almost certainly end up waiting for a rotation, no? Seems like getting rid of O_DIRECT here is the right thing to do, regardless of this. Agreed. I'll remove O_DIRECT from walreceiver. Here is the patch to do that. Ooops! I found the bug in the patch. Here is the updated version. If I'm reading the patch correctly, when wal_sync_method is 'open_sync', walreceiver nevertheless opens the WAL file without the O_DIRECT flag. When it later flushes it in XLogWalRcvFlush() by issue_xlog_fsync(), issue_xlog_fsync() will do nothing because it assumes the write() synced it already. So the data written isn't being forced to disk at all. How about just forcing sync_method to 'fsync' in walreceiver? -- Heikki Linnakangas EnterpriseDB http://www.enterprisedb.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Streaming replication on win32, still broken
On Thu, Feb 18, 2010 at 5:28 AM, Heikki Linnakangas heikki.linnakan...@enterprisedb.com wrote: If I'm reading the patch correctly, when wal_sync_method is 'open_sync', walreceiver nevertheless opens the WAL file without the O_DIRECT flag. When it later flushes it in XLogWalRcvFlush() by issue_xlog_fsync(), issue_xlog_fsync() will do nothing because it assumes the write() synced it already. So the data written isn't being forced to disk at all. When 'open_sync' is chosen, the WAL file is opened with O_SYNC or O_FSYNC flag. So I think that write() flushes the data to disk even if O_DIRECT flag is not given. Am I missing something? How about just forcing sync_method to 'fsync' in walreceiver? In win32, O_DSYNC seems to be preferred to 'fsync' so far. So I'm not sure if reshuffling of priority is harmless. http://archives.postgresql.org/pgsql-hackers-win32/2005-03/msg00148.php Regards, -- Fujii Masao NIPPON TELEGRAPH AND TELEPHONE CORPORATION NTT Open Source Software Center -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Streaming replication on win32, still broken
Fujii Masao wrote: On Thu, Feb 18, 2010 at 5:28 AM, Heikki Linnakangas heikki.linnakan...@enterprisedb.com wrote: If I'm reading the patch correctly, when wal_sync_method is 'open_sync', walreceiver nevertheless opens the WAL file without the O_DIRECT flag. When it later flushes it in XLogWalRcvFlush() by issue_xlog_fsync(), issue_xlog_fsync() will do nothing because it assumes the write() synced it already. So the data written isn't being forced to disk at all. When 'open_sync' is chosen, the WAL file is opened with O_SYNC or O_FSYNC flag. So I think that write() flushes the data to disk even if O_DIRECT flag is not given. Am I missing something? Ah, ok, you're right. -- Heikki Linnakangas EnterpriseDB http://www.enterprisedb.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Streaming replication on win32, still broken
On Tue, Feb 16, 2010 at 12:37 AM, Magnus Hagander mag...@hagander.net wrote: With the libpq fixes, I get further (more on that fix later, btw), but now I get stuck in this. When I do something on the master that generates WAL, such as insert a record, and then try to query this on the slave, the walreceiver process crashes with: PANIC: XX000: could not write to log file 0, segment 9 at offset 0, length 160: Invalid argument LOCATION: XLogWalRcvWrite, .\src\backend\replication\walreceiver.c:487 I'll keep digging at the details, but if somebody has a good idea here.. ;) Yeah, this problem was reproduced in my (very slow :-( ) MinGW environment, too. Though I've not idenfied the cause yet, I guess that it derives from wrong use of the type of local variables in XLogWalRcvWrite(). I'll continue investigation of it. Regards, -- Fujii Masao NIPPON TELEGRAPH AND TELEPHONE CORPORATION NTT Open Source Software Center -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Streaming replication on win32, still broken
2010/2/16 Fujii Masao masao.fu...@gmail.com: On Tue, Feb 16, 2010 at 12:37 AM, Magnus Hagander mag...@hagander.net wrote: With the libpq fixes, I get further (more on that fix later, btw), but now I get stuck in this. When I do something on the master that generates WAL, such as insert a record, and then try to query this on the slave, the walreceiver process crashes with: PANIC: XX000: could not write to log file 0, segment 9 at offset 0, length 160: Invalid argument LOCATION: XLogWalRcvWrite, .\src\backend\replication\walreceiver.c:487 I'll keep digging at the details, but if somebody has a good idea here.. ;) Yeah, this problem was reproduced in my (very slow :-( ) MinGW environment, too. Though I've not idenfied the cause yet, I guess that it derives from wrong use of the type of local variables in XLogWalRcvWrite(). I'll continue investigation of it. Thanks! I will be somewhat spottily available over the next two days due to on-site work with clients. Let me know if you would be helped by some details of how to get a (somewhat faster) EC2 image up and running with MSVC to test on :-) -- Magnus Hagander Me: http://www.hagander.net/ Work: http://www.redpill-linpro.com/ -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Streaming replication on win32, still broken
On Tue, Feb 16, 2010 at 7:20 PM, Magnus Hagander mag...@hagander.net wrote: 2010/2/16 Fujii Masao masao.fu...@gmail.com: On Tue, Feb 16, 2010 at 12:37 AM, Magnus Hagander mag...@hagander.net wrote: With the libpq fixes, I get further (more on that fix later, btw), but now I get stuck in this. When I do something on the master that generates WAL, such as insert a record, and then try to query this on the slave, the walreceiver process crashes with: PANIC: XX000: could not write to log file 0, segment 9 at offset 0, length 160: Invalid argument LOCATION: XLogWalRcvWrite, .\src\backend\replication\walreceiver.c:487 I'll keep digging at the details, but if somebody has a good idea here.. ;) Yeah, this problem was reproduced in my (very slow :-( ) MinGW environment, too. Though I've not idenfied the cause yet, I guess that it derives from wrong use of the type of local variables in XLogWalRcvWrite(). I'll continue investigation of it. Thanks! I will be somewhat spottily available over the next two days due to on-site work with clients. Let me know if you would be helped by some details of how to get a (somewhat faster) EC2 image up and running with MSVC to test on :-) Thanks! I can probably use the EC2 image by reading your great blog post. http://blog.hagander.net/archives/151-Testing-PostgreSQL-patches-on-Windows-using-Amazon-EC2.html But it might take some time to make my sysadmin open the port for rdesktop for some reasons... Regards, -- Fujii Masao NIPPON TELEGRAPH AND TELEPHONE CORPORATION NTT Open Source Software Center -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Streaming Replication on win32
2010/2/16 Fujii Masao masao.fu...@gmail.com: On Tue, Feb 16, 2010 at 1:33 AM, Magnus Hagander mag...@hagander.net wrote: 2010/2/15 Tom Lane t...@sss.pgh.pa.us: Magnus Hagander mag...@hagander.net writes: I changed your patch to this, because I find it a lot simpler. The change is in the checking in pgwin32_recv - there is no need to ever call waitforsinglesocket, we can just exit out early. Thanks a lot, Magnus! Do you see any issue with that? This definitely looks cleaner, but is there a reason not to use bool instead of int here? No. Can include/port/win32.h refer to bool type? Nope, you're correct, it can't. Committed without that. -- Magnus Hagander Me: http://www.hagander.net/ Work: http://www.redpill-linpro.com/ -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Streaming replication on win32, still broken
2010/2/16 Fujii Masao masao.fu...@gmail.com: On Tue, Feb 16, 2010 at 7:20 PM, Magnus Hagander mag...@hagander.net wrote: 2010/2/16 Fujii Masao masao.fu...@gmail.com: On Tue, Feb 16, 2010 at 12:37 AM, Magnus Hagander mag...@hagander.net wrote: With the libpq fixes, I get further (more on that fix later, btw), but now I get stuck in this. When I do something on the master that generates WAL, such as insert a record, and then try to query this on the slave, the walreceiver process crashes with: PANIC: XX000: could not write to log file 0, segment 9 at offset 0, length 160: Invalid argument LOCATION: XLogWalRcvWrite, .\src\backend\replication\walreceiver.c:487 I'll keep digging at the details, but if somebody has a good idea here.. ;) Yeah, this problem was reproduced in my (very slow :-( ) MinGW environment, too. Though I've not idenfied the cause yet, I guess that it derives from wrong use of the type of local variables in XLogWalRcvWrite(). I'll continue investigation of it. Thanks! I will be somewhat spottily available over the next two days due to on-site work with clients. Let me know if you would be helped by some details of how to get a (somewhat faster) EC2 image up and running with MSVC to test on :-) Thanks! I can probably use the EC2 image by reading your great blog post. http://blog.hagander.net/archives/151-Testing-PostgreSQL-patches-on-Windows-using-Amazon-EC2.html Actually, that one deosn't work anymore, because I managed to break the image :-) If you send me your amazon id, I can get you premissions on my private image. I plan to clean it up and make it public, just haven't gotten around to it yet... -- Magnus Hagander Me: http://www.hagander.net/ Work: http://www.redpill-linpro.com/ -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Streaming replication on win32, still broken
On Wed, Feb 17, 2010 at 6:28 AM, Magnus Hagander mag...@hagander.net wrote: If you send me your amazon id, I can get you premissions on my private image. I plan to clean it up and make it public, just haven't gotten around to it yet... Thanks for your concern! I'll send the ID when I complete the preparation. And, fortunately?, when I set wal_sync_method to open_sync, the problem was reproduced in the linux, too. The cause is that the data that is written by walreceiver is not aligned, even if O_DIRECT is used. On win32, O_DIRECT is used by default. So the problem always happened on win32. I propose two solution ideas: 1. O_DIRECT is somewhat harmful in the standby since the data written by walreceiver is read by the startup process immediately. So, how about not making only walreceiver use O_DIRECT? 2. Straightforwardly observe the alignment rule. Since the received WAL data might start at the middle of WAL block, walreceiver needs to keep the last half-written WAL block for alignment. OTOH since the received data might end at the middle of WAL block, walreceiver needs zero-padding. As a result, walreceiver writes the set of the last WAL block, received data and zero-padding. Which is better? Or do you have another better idea? Regards, -- Fujii Masao NIPPON TELEGRAPH AND TELEPHONE CORPORATION NTT Open Source Software Center -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Streaming replication on win32, still broken
On Wed, Feb 17, 2010 at 06:55, Fujii Masao masao.fu...@gmail.com wrote: On Wed, Feb 17, 2010 at 6:28 AM, Magnus Hagander mag...@hagander.net wrote: If you send me your amazon id, I can get you premissions on my private image. I plan to clean it up and make it public, just haven't gotten around to it yet... Thanks for your concern! I'll send the ID when I complete the preparation. ok. And, fortunately?, when I set wal_sync_method to open_sync, the problem was reproduced in the linux, too. The cause is that the data that is written by Ah, that's good. It always helps if it's a cross-platform issue - particularly in that it's not one of the funky win32 specific things we did :) walreceiver is not aligned, even if O_DIRECT is used. On win32, O_DIRECT is used by default. So the problem always happened on win32. Ahh. I see. I propose two solution ideas: 1. O_DIRECT is somewhat harmful in the standby since the data written by walreceiver is read by the startup process immediately. So, how about not making only walreceiver use O_DIRECT? In that case, O_DIRECT would be counterproductive, no? It maps to FILE_FLAG_NOI_BUFFERING, which makes sure it doesn't go into the cache. So the read in the startup proc is actually guaranteed to reuqire a physical read - of something we just wrote, so it'll almost certainly end up waiting for a rotation, no? Seems like getting rid of O_DIRECT here is the right thing to do, regardless of this. 2. Straightforwardly observe the alignment rule. Since the received WAL data might start at the middle of WAL block, walreceiver needs to keep the last half-written WAL block for alignment. OTOH since the received data might end at the middle of WAL block, walreceiver needs zero-padding. As a result, walreceiver writes the set of the last WAL block, received data and zero-padding. May there be other reasons to d this as well? -- Magnus Hagander Me: http://www.hagander.net/ Work: http://www.redpill-linpro.com/ -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Streaming replication on win32, still broken
Magnus Hagander mag...@hagander.net writes: On Wed, Feb 17, 2010 at 06:55, Fujii Masao masao.fu...@gmail.com wrote: 2. Straightforwardly observe the alignment rule. Since the received WAL data might start at the middle of WAL block, walreceiver needs to keep the last half-written WAL block for alignment. OTOH since the received data might end at the middle of WAL block, walreceiver needs zero-padding. As a result, walreceiver writes the set of the last WAL block, received data and zero-padding. May there be other reasons to d this as well? Writing misaligned data is certain to be expensive even when it works... regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Streaming replication on win32, still broken
On Wed, Feb 17, 2010 at 3:03 PM, Magnus Hagander mag...@hagander.net wrote: In that case, O_DIRECT would be counterproductive, no? It maps to FILE_FLAG_NOI_BUFFERING, which makes sure it doesn't go into the cache. So the read in the startup proc is actually guaranteed to reuqire a physical read - of something we just wrote, so it'll almost certainly end up waiting for a rotation, no? Seems like getting rid of O_DIRECT here is the right thing to do, regardless of this. Agreed. I'll remove O_DIRECT from walreceiver. Regards, -- Fujii Masao NIPPON TELEGRAPH AND TELEPHONE CORPORATION NTT Open Source Software Center -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Streaming replication on win32, still broken
On Wed, Feb 17, 2010 at 3:27 PM, Tom Lane t...@sss.pgh.pa.us wrote: Magnus Hagander mag...@hagander.net writes: On Wed, Feb 17, 2010 at 06:55, Fujii Masao masao.fu...@gmail.com wrote: 2. Straightforwardly observe the alignment rule. Since the received WAL data might start at the middle of WAL block, walreceiver needs to keep the last half-written WAL block for alignment. OTOH since the received data might end at the middle of WAL block, walreceiver needs zero-padding. As a result, walreceiver writes the set of the last WAL block, received data and zero-padding. May there be other reasons to d this as well? Writing misaligned data is certain to be expensive even when it works... Yeah, right. After I remove O_DIRECT, I'll change walreceiver so as to do an alignment correctly, and then I'll test the performance. Regards, -- Fujii Masao NIPPON TELEGRAPH AND TELEPHONE CORPORATION NTT Open Source Software Center -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
[HACKERS] Streaming replication on win32, still broken
With the libpq fixes, I get further (more on that fix later, btw), but now I get stuck in this. When I do something on the master that generates WAL, such as insert a record, and then try to query this on the slave, the walreceiver process crashes with: PANIC: XX000: could not write to log file 0, segment 9 at offset 0, length 160: Invalid argument LOCATION: XLogWalRcvWrite, .\src\backend\replication\walreceiver.c:487 I'll keep digging at the details, but if somebody has a good idea here.. ;) -- Magnus Hagander Me: http://www.hagander.net/ Work: http://www.redpill-linpro.com/ -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Streaming Replication on win32
2010/2/15 Fujii Masao masao.fu...@gmail.com: On Sun, Feb 14, 2010 at 11:52 PM, Magnus Hagander mag...@hagander.net wrote: Remember that the win32 code *always* puts the socket in non-blocking mode. So we can't just teach the layer about it. We need some way to pass the information down that this is actually something we want to be non-blocking, and it can't be the normal flag on the socket. I don't really have an idea of where else we'd put it though :( It's in the port structure, but not beyond it. Right. BTW, pq_getbyte_if_available() always changes the socket to non-blocking and blocking mode before and after calling secure_read(), respectively. This code seems wrong on win32. Because, as you said, the socket is always in non-blocking mode on win32. We should change pq_getbyte_if_available() so as not to change the socket mode only in win32? Yes. What we could do, is have an ugly global flag specifically for the use-case we have here. Assuming we do create a plataform specific pq_getbyte_if_available(), the code-path that would have trouble now would be when we call pq_getbyte_if_available(), and it in turns asks the socket if there is data, there is, but we end up calling back into the SSL code to fetch the data, and it gets an incomplete packet. Correct? So the path is basically: pq_getbyte_if_available() - secure_read() - SSL_read() - my_sock_read() - pgwin32_recv() Given that we know we are working on a single socket here, we could use a global flag to tell pgwin32_recv() to become nonblocking. We could set this flag directly in the win32-specific version of pq_getbyte_if_available(), and make sure it's cleared as soon as we exit. It will obviously fail if we do anything on a *different* socket during this time, so it has to be set for a very short time. But that seems doable. And we don't call any socket stuff from signal handlers so that shouldn't cause issues. Agreed. Here is the patch which does that (including the above-mentioned change). I haven't tested it yet because I failed in creating the build environment for the MSVC :( I'll try to create that again, and test it. Though I'm not sure how long it takes. I changed your patch to this, because I find it a lot simpler. The change is in the checking in pgwin32_recv - there is no need to ever call waitforsinglesocket, we can just exit out early. Do you see any issue with that? -- Magnus Hagander Me: http://www.hagander.net/ Work: http://www.redpill-linpro.com/ pq_getbyte_if_available_on_win32_magnus.patch Description: Binary data -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Streaming Replication on win32
Magnus Hagander mag...@hagander.net writes: I changed your patch to this, because I find it a lot simpler. The change is in the checking in pgwin32_recv - there is no need to ever call waitforsinglesocket, we can just exit out early. Do you see any issue with that? This definitely looks cleaner, but is there a reason not to use bool instead of int here? regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Streaming Replication on win32
2010/2/15 Tom Lane t...@sss.pgh.pa.us: Magnus Hagander mag...@hagander.net writes: I changed your patch to this, because I find it a lot simpler. The change is in the checking in pgwin32_recv - there is no need to ever call waitforsinglesocket, we can just exit out early. Do you see any issue with that? This definitely looks cleaner, but is there a reason not to use bool instead of int here? No. -- Magnus Hagander Me: http://www.hagander.net/ Work: http://www.redpill-linpro.com/ -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Streaming Replication on win32
On Tue, Feb 16, 2010 at 1:33 AM, Magnus Hagander mag...@hagander.net wrote: 2010/2/15 Tom Lane t...@sss.pgh.pa.us: Magnus Hagander mag...@hagander.net writes: I changed your patch to this, because I find it a lot simpler. The change is in the checking in pgwin32_recv - there is no need to ever call waitforsinglesocket, we can just exit out early. Thanks a lot, Magnus! Do you see any issue with that? This definitely looks cleaner, but is there a reason not to use bool instead of int here? No. Can include/port/win32.h refer to bool type? Regards, -- Fujii Masao NIPPON TELEGRAPH AND TELEPHONE CORPORATION NTT Open Source Software Center -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Streaming Replication on win32
2010/2/8 Fujii Masao masao.fu...@gmail.com: On Mon, Jan 18, 2010 at 11:46 PM, Magnus Hagander mag...@hagander.net wrote: From what I can tell, this indicates that pq_getbyte_if_available() is not working - because it's supposed to never block, right? Right, it's not supposed to block. This could be because the win32 socket emulation layer simply wasn't designed to deal with non-blocking sockets. Specifically, it actually *always* sets the socket to non-blocking mode, and then uses that to properly emulate how sockets work under unix. I presume the win32 emulation layer can be taught about non-blocking sockets? Or maybe pq_getbyte_if_available() can be implemented using some other simpler method on Windows. It could be taught that, but it would probably be a lot easier to put platform specific code in pq_getbyte_if_available(). Umm.. in this case, for SSL on win32 case, we also need to create new function like my_sock_read_if_available() that receives data from non-blocking socket, and reassign it to the SSL BIO function. Right? If so, it seems easier for me to tell the win32 layer about non-blocking. Sorry about the delay in responding to this. Remember that the win32 code *always* puts the socket in non-blocking mode. So we can't just teach the layer about it. We need some way to pass the information down that this is actually something we want to be non-blocking, and it can't be the normal flag on the socket. I don't really have an idea of where else we'd put it though :( It's in the port structure, but not beyond it. What we could do, is have an ugly global flag specifically for the use-case we have here. Assuming we do create a plataform specific pq_getbyte_if_available(), the code-path that would have trouble now would be when we call pq_getbyte_if_available(), and it in turns asks the socket if there is data, there is, but we end up calling back into the SSL code to fetch the data, and it gets an incomplete packet. Correct? So the path is basically: pq_getbyte_if_available() - secure_read() - SSL_read() - my_sock_read() - pgwin32_recv() Given that we know we are working on a single socket here, we could use a global flag to tell pgwin32_recv() to become nonblocking. We could set this flag directly in the win32-specific version of pq_getbyte_if_available(), and make sure it's cleared as soon as we exit. It will obviously fail if we do anything on a *different* socket during this time, so it has to be set for a very short time. But that seems doable. And we don't call any socket stuff from signal handlers so that shouldn't cause issues. -- Magnus Hagander Me: http://www.hagander.net/ Work: http://www.redpill-linpro.com/ -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Streaming Replication on win32
On Sun, Feb 14, 2010 at 11:52 PM, Magnus Hagander mag...@hagander.net wrote: Sorry about the delay in responding to this. Thanks for the response. Remember that the win32 code *always* puts the socket in non-blocking mode. So we can't just teach the layer about it. We need some way to pass the information down that this is actually something we want to be non-blocking, and it can't be the normal flag on the socket. I don't really have an idea of where else we'd put it though :( It's in the port structure, but not beyond it. Right. BTW, pq_getbyte_if_available() always changes the socket to non-blocking and blocking mode before and after calling secure_read(), respectively. This code seems wrong on win32. Because, as you said, the socket is always in non-blocking mode on win32. We should change pq_getbyte_if_available() so as not to change the socket mode only in win32? What we could do, is have an ugly global flag specifically for the use-case we have here. Assuming we do create a plataform specific pq_getbyte_if_available(), the code-path that would have trouble now would be when we call pq_getbyte_if_available(), and it in turns asks the socket if there is data, there is, but we end up calling back into the SSL code to fetch the data, and it gets an incomplete packet. Correct? So the path is basically: pq_getbyte_if_available() - secure_read() - SSL_read() - my_sock_read() - pgwin32_recv() Given that we know we are working on a single socket here, we could use a global flag to tell pgwin32_recv() to become nonblocking. We could set this flag directly in the win32-specific version of pq_getbyte_if_available(), and make sure it's cleared as soon as we exit. It will obviously fail if we do anything on a *different* socket during this time, so it has to be set for a very short time. But that seems doable. And we don't call any socket stuff from signal handlers so that shouldn't cause issues. Agreed. Here is the patch which does that (including the above-mentioned change). I haven't tested it yet because I failed in creating the build environment for the MSVC :( I'll try to create that again, and test it. Though I'm not sure how long it takes. Regards, -- Fujii Masao NIPPON TELEGRAPH AND TELEPHONE CORPORATION NTT Open Source Software Center *** a/src/backend/libpq/pqcomm.c --- b/src/backend/libpq/pqcomm.c *** *** 837,845 pq_getbyte_if_available(unsigned char *c) --- 837,849 } /* Temporarily put the socket into non-blocking mode */ + #ifdef WIN32 + pgwin32_noblock = 1; + #else if (!pg_set_noblock(MyProcPort-sock)) ereport(ERROR, (errmsg(couldn't put socket to non-blocking mode: %m))); + #endif MyProcPort-noblock = true; PG_TRY(); { *** *** 851,866 pq_getbyte_if_available(unsigned char *c) --- 855,878 * The rest of the backend code assumes the socket is in blocking * mode, so treat failure as FATAL. */ + #ifdef WIN32 + pgwin32_noblock = 0; + #else if (!pg_set_block(MyProcPort-sock)) ereport(FATAL, (errmsg(couldn't put socket to blocking mode: %m))); + #endif MyProcPort-noblock = false; PG_RE_THROW(); } PG_END_TRY(); + #ifdef WIN32 + pgwin32_noblock = 0; + #else if (!pg_set_block(MyProcPort-sock)) ereport(FATAL, (errmsg(couldn't put socket to blocking mode: %m))); + #endif MyProcPort-noblock = false; return r; *** a/src/backend/port/win32/socket.c --- b/src/backend/port/win32/socket.c *** *** 13,18 --- 13,26 #include postgres.h + /* + * This indicates whether pgwin32_recv() is blocked until the receive + * operation has been finished. A value of zero blocks it even if + * the socket is set to non-blocking mode. A non-zero value makes it + * return immediately whether data is available or not. + */ + int pgwin32_noblock = 0; + #undef socket #undef accept #undef connect *** *** 315,321 pgwin32_recv(SOCKET s, char *buf, int len, int f) for (n = 0; n 5; n++) { if (pgwin32_waitforsinglesocket(s, FD_READ | FD_CLOSE | FD_ACCEPT, ! INFINITE) == 0) return -1; /* errno already set */ r = WSARecv(s, wbuf, 1, b, flags, NULL, NULL); --- 323,330 for (n = 0; n 5; n++) { if (pgwin32_waitforsinglesocket(s, FD_READ | FD_CLOSE | FD_ACCEPT, ! (pgwin32_noblock == 0) ? INFINITE : 0) ! == 0) return -1; /* errno already set */ r = WSARecv(s, wbuf, 1, b, flags, NULL, NULL); *** a/src/include/port/win32.h --- b/src/include/port/win32.h *** *** 283,288 int pgwin32_send(SOCKET s, char *buf, int len, int flags); --- 283,290 const char *pgwin32_socket_strerror(int err); int pgwin32_waitforsinglesocket(SOCKET s, int what, int timeout); + extern int pgwin32_noblock; + /* in backend/port/win32/security.c */ extern int pgwin32_is_admin(void); extern int pgwin32_is_service(void); -- Sent
Re: [HACKERS] Streaming Replication on win32
On Mon, Jan 18, 2010 at 11:46 PM, Magnus Hagander mag...@hagander.net wrote: From what I can tell, this indicates that pq_getbyte_if_available() is not working - because it's supposed to never block, right? Right, it's not supposed to block. This could be because the win32 socket emulation layer simply wasn't designed to deal with non-blocking sockets. Specifically, it actually *always* sets the socket to non-blocking mode, and then uses that to properly emulate how sockets work under unix. I presume the win32 emulation layer can be taught about non-blocking sockets? Or maybe pq_getbyte_if_available() can be implemented using some other simpler method on Windows. It could be taught that, but it would probably be a lot easier to put platform specific code in pq_getbyte_if_available(). Umm.. in this case, for SSL on win32 case, we also need to create new function like my_sock_read_if_available() that receives data from non-blocking socket, and reassign it to the SSL BIO function. Right? If so, it seems easier for me to tell the win32 layer about non-blocking. Regards, -- Fujii Masao NIPPON TELEGRAPH AND TELEPHONE CORPORATION NTT Open Source Software Center -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Streaming Replication on win32
On 01/21/2010 11:19 PM, Heikki Linnakangas wrote: Joe Conway wrote: OK, so now I see why we want this fixed for dblink and walreceiver, but doesn't this approach leave every other WIN32 libpq client out in the cold? Is there nothing that can be done for the general case, or is it a SMOP? The problem only applies to libpq calls from the backend. Client apps are not affected, only backend modules. If there's any other modules out there that use libpq, then yes, those have a problem too. Sorry for being thick -- I'm still missing something. I don't understand why any user program using libpq/PQexec running on Windows does not have the same issue. Or to put it another way, why does this only apply to libpq calls from backend modules? A generic fix would be nice. Putting the wrapper functions in a new header file that's included in all backend modules that want to use libpq would work. Maybe the new header file could even be #included from libpq-fe.h, when compiling backend code (#ifndef FRONTEND), so you wouldn't need to modify dblink, walreceiver, or any 3rd party modules that use libpq at all. I'll go ahead and do this if there is consensus for it. Joe signature.asc Description: OpenPGP digital signature
Re: [HACKERS] Streaming Replication on win32
2010/1/24 Joe Conway m...@joeconway.com: On 01/21/2010 11:19 PM, Heikki Linnakangas wrote: Joe Conway wrote: OK, so now I see why we want this fixed for dblink and walreceiver, but doesn't this approach leave every other WIN32 libpq client out in the cold? Is there nothing that can be done for the general case, or is it a SMOP? The problem only applies to libpq calls from the backend. Client apps are not affected, only backend modules. If there's any other modules out there that use libpq, then yes, those have a problem too. Sorry for being thick -- I'm still missing something. I don't understand why any user program using libpq/PQexec running on Windows does not have the same issue. Or to put it another way, why does this only apply to libpq calls from backend modules? Because Windows programs in general don't rely on working Unix signal semantics - since Win32 doesn't *have* them. We faked them for PostgreSQL so we didn't have to rewrite large parts of how the backend deals with those things. I don't know any other software that does - and especially not client side software. So yeah, you could say they are affected insofar that their calls will be blocked as well, but if they are Windows apps they are probably designed to deal with it. It's the common way for such calls to behave on the platform. -- Magnus Hagander Me: http://www.hagander.net/ Work: http://www.redpill-linpro.com/ -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Streaming Replication on win32
Magnus Hagander mag...@hagander.net writes: 2010/1/24 Joe Conway m...@joeconway.com: Sorry for being thick -- I'm still missing something. I don't understand why any user program using libpq/PQexec running on Windows does not have the same issue. Or to put it another way, why does this only apply to libpq calls from backend modules? Because Windows programs in general don't rely on working Unix signal semantics - since Win32 doesn't *have* them. The real question is why is it so critical for our emulated signals to be able to interrupt these operations. If you're trying to tell me that Hot Standby is too fragile to work unless it can interrupt them, then HS has got a serious problem, and it is not libpq's fault. There is an enormous amount of code in the backend that can run for long periods of time without noticing signals, and not all of that is fixable. Consider for example a plperl, plpython, or pltcl function that goes off and does a long computation. So I'm thinking that proposing to kluge up libpq in this area isn't solving the real problem anyway. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Streaming Replication on win32
Tom Lane wrote: Magnus Hagander mag...@hagander.net writes: 2010/1/24 Joe Conway m...@joeconway.com: Sorry for being thick -- I'm still missing something. I don't understand why any user program using libpq/PQexec running on Windows does not have the same issue. Or to put it another way, why does this only apply to libpq calls from backend modules? Because Windows programs in general don't rely on working Unix signal semantics - since Win32 doesn't *have* them. The real question is why is it so critical for our emulated signals to be able to interrupt these operations. The process won't react to a shutdown request otherwise, while it's waiting for the response to PQexec(). It's not such a big deal for walreceiver, actually, because it already uses select() (with emulation) in the main loop, but it's theoretically possible for the connection to be silently lost during the initial handshake, after sending the Query message, before receiving a response. dblink/contrib has the same issue, it might wait for a response forever. Hmm, maybe we should just TCP_KEEPALIVE (if available) on the connection. We should really be doing that in walreceiver anyway, walreceiver won't otherwise notice if the connection is silently lost, and won't know to reconnect. If you're trying to tell me that Hot Standby is too fragile to work unless it can interrupt them, then HS has got a serious problem, and it is not libpq's fault. There is an enormous amount of code in the backend that can run for long periods of time without noticing signals, and not all of that is fixable. Consider for example a plperl, plpython, or pltcl function that goes off and does a long computation. No, it's not related to Hot Standby. -- Heikki Linnakangas EnterpriseDB http://www.enterprisedb.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Streaming Replication on win32
Heikki Linnakangas heikki.linnakan...@enterprisedb.com writes: Joe Conway wrote: OK, so now I see why we want this fixed for dblink and walreceiver, but doesn't this approach leave every other WIN32 libpq client out in the cold? Is there nothing that can be done for the general case, or is it a SMOP? The problem only applies to libpq calls from the backend. Client apps are not affected, only backend modules. If there's any other modules out there that use libpq, then yes, those have a problem too. plproxy comes to mind. -- dim -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Streaming Replication on win32
On 1/22/10, Dimitri Fontaine dfonta...@hi-media.com wrote: Heikki Linnakangas heikki.linnakan...@enterprisedb.com writes: Joe Conway wrote: OK, so now I see why we want this fixed for dblink and walreceiver, but doesn't this approach leave every other WIN32 libpq client out in the cold? Is there nothing that can be done for the general case, or is it a SMOP? The problem only applies to libpq calls from the backend. Client apps are not affected, only backend modules. If there's any other modules out there that use libpq, then yes, those have a problem too. plproxy comes to mind. Thats interesting. PL/Proxy deos not use PQexec, it uses async execution and waits on sockets with plain select() called from code compiled with backend headers. So it seems to be already using pgwin32_select(). Or not? -- marko -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Streaming Replication on win32
Marko Kreen wrote: On 1/22/10, Dimitri Fontaine dfonta...@hi-media.com wrote: Heikki Linnakangas heikki.linnakan...@enterprisedb.com writes: The problem only applies to libpq calls from the backend. Client apps are not affected, only backend modules. If there's any other modules out there that use libpq, then yes, those have a problem too. plproxy comes to mind. Thats interesting. PL/Proxy deos not use PQexec, it uses async execution and waits on sockets with plain select() called from code compiled with backend headers. So it seems to be already using pgwin32_select(). Or not? Yes. I just grepped plproxy source code and there's indeed no blocking libpq calls there. -- Heikki Linnakangas EnterpriseDB http://www.enterprisedb.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Streaming Replication on win32
Heikki Linnakangas wrote: Magnus Hagander wrote: 2010/1/17 Heikki Linnakangas heikki.linnakan...@enterprisedb.com: We could replace the blocking PQexec() calls with PQsendQuery(), and use the emulated version of select() to wait. Hmm. That would at least theoretically work, but aren't there still places we may end up blocking further down? Or are those ok? There's also PQconnect that needs similar treatment (using PQconnectStart/Poll()), but that's it. So here's a patch implementing that for contrib/dblink. Walreceiver needs the same treatment. The implementation should be shared between the two, but I'm not sure how. We can't just put the wrapper functions to a module in src/backend/port/, because the wrapper functions depend on libpq. Maybe put them in a new header file as static functions, and include that in contrib/dblink/dblink.c and src/backend/replication/libpqwalreceiver.c. -- Heikki Linnakangas EnterpriseDB http://www.enterprisedb.com diff --git a/contrib/dblink/dblink.c b/contrib/dblink/dblink.c index 2c1d7a2..fa11709 100644 --- a/contrib/dblink/dblink.c +++ b/contrib/dblink/dblink.c @@ -34,6 +34,14 @@ #include limits.h +#ifdef WIN23 +/* These are needed by the interruptible libpq function replacements */ +#include time.h +#include unistd.h +#include sys/time.h +#include sys/types.h +#endif + #include libpq-fe.h #include fmgr.h #include funcapi.h @@ -101,6 +109,193 @@ static void dblink_res_error(const char *conname, PGresult *res, const char *dbl static char *get_connect_string(const char *servername); static char *escape_param_str(const char *from); +#ifdef WIN23 +/* + * Replacement functions for blocking libpq functions, for Windows. + * + * On Windows, the vanilla select() function doesn't react to our emulated + * signals. PQexec() and PQconnectdb() use select(), so they're + * uninterruptible. These replacement functions use the corresponding + * asynchronous libpq functions and backend version of select() to implement + * the same functionality, but in a way that's interrupted by signals. + * + * These work on other platforms as well, but presumably it's more efficient + * to let libpq block. + */ + +static PGresult * +dblink_PQexec(PGconn *conn, const char *command) +{ + int sock; + PGresult *result, + *lastResult; + + /* Send query. This can block too, but we ignore that for now. */ + if (PQsendQuery(conn, command) == 0) + return NULL; + + /* Wait for response */ + sock = PQsocket(conn); + + while(PQisBusy(conn)) + { + fd_set input_mask; + + FD_ZERO(input_mask); + + FD_SET (sock, input_mask); + + /* + * Note that we don't check the return code. We assume that + * PQconsumeInput() will get the same error, and set the result + * as failed. + */ + select(sock + 1, input_mask, NULL, NULL, NULL); + PQconsumeInput(conn); + } + + /* + * Emulate PQexec()'s behavior of returning the *last* result, if + * there's many. dblink doesn't normally issue statements that return + * multiple results, but the user-supplied SQL statement passed to + * dblink() might. You'll only get the last result back, so it's not a + * very sensible thing to do, but we must still handle it gracefully. + * + * We don't try to concatenate error messages like PQexec() does. + * Doesn't seem worth the effort. + */ + lastResult = NULL; + while((result = PQgetResult(conn)) != NULL) + { + if (lastResult != NULL) + { + if (PQresultStatus(lastResult) != PGRES_COMMAND_OK +PQresultStatus(lastResult) != PGRES_TUPLES_OK) + { +PQclear(result); +result = lastResult; + } + else +PQclear(lastResult); + } + lastResult = result; + } + + return lastResult; +} + +static PGconn * +dblink_PQconnectdb(const char *conninfo) +{ + PGconn *conn; + PostgresPollingStatusType status; + PQconninfoOption *options; + int timeout_secs = 0; + time_t end_time; + int sock; + + conn = PQconnectStart(conninfo); + if (conn == NULL) + return NULL; + + if (PQstatus(conn) == CONNECTION_BAD) + return conn; + + /* Extract timeout from the connection string */ + options = PQconninfoParse(conninfo, NULL); + if (options) + { + PQconninfoOption *option; + for (option = options; option-keyword != NULL; option++) + { + if (strcmp(option-keyword, connect_timeout) == 0) + { +if (option-val != NULL option-val[0] != '\0') +{ + timeout_secs = atoi(option-val); + break; +} + } + } + PQconninfoFree(options); + } + if (timeout_secs 0) + end_time = time(NULL) + timeout_secs; + + sock = PQsocket(conn); + + /* Wait for connection to be established */ + for (;;) + { + fd_set input_mask; + fd_set output_mask; + time_t now; + struct timeval timeout; + struct timeval *timeout_ptr; + + FD_ZERO(input_mask); + FD_ZERO(output_mask); + + status = PQconnectPoll(conn); + switch(status) + { + case PGRES_POLLING_OK: + case PGRES_POLLING_FAILED: +return conn; + + case PGRES_POLLING_READING: +FD_SET(sock, input_mask); +break; + + case
Re: [HACKERS] Streaming Replication on win32
On 01/21/2010 04:46 AM, Heikki Linnakangas wrote: Heikki Linnakangas wrote: Magnus Hagander wrote: 2010/1/17 Heikki Linnakangas heikki.linnakan...@enterprisedb.com: We could replace the blocking PQexec() calls with PQsendQuery(), and use the emulated version of select() to wait. Hmm. That would at least theoretically work, but aren't there still places we may end up blocking further down? Or are those ok? There's also PQconnect that needs similar treatment (using PQconnectStart/Poll()), but that's it. So here's a patch implementing that for contrib/dblink. Walreceiver needs the same treatment. The implementation should be shared between the two, but I'm not sure how. We can't just put the wrapper functions to a module in src/backend/port/, because the wrapper functions depend on libpq. Maybe put them in a new header file as static functions, and include that in contrib/dblink/dblink.c and src/backend/replication/libpqwalreceiver.c. +#ifdef WIN23 ^ I assume you meant WIN32 here ;-) +#define PQexec(conn, command) dblink_PQexec(conn, command) +#define PQconnectdb(conninfo) dblink_PQconnectdb(conninfo) I have not been really following this thread, but why can't we put the #ifdef WIN32 and special definition of these functions into libpq. I don't understand why we need special treatment for dblink. Joe signature.asc Description: OpenPGP digital signature
Re: [HACKERS] Streaming Replication on win32
Joe Conway wrote: +#ifdef WIN23 ^ I assume you meant WIN32 here ;-) Yeah. I admit I haven't tested this on Windows, I just commented out those #ifdef's and tested on Linux. Will need to verify that this actually solves the problem on Windows before committing. +#define PQexec(conn, command) dblink_PQexec(conn, command) +#define PQconnectdb(conninfo) dblink_PQconnectdb(conninfo) I have not been really following this thread, but why can't we put the #ifdef WIN32 and special definition of these functions into libpq. I don't understand why we need special treatment for dblink. The problem is that select() function on Windows isn't interrupted by signals. That's because Unix-style signals don't exist on Windows, but we've emulated them in the server with pipes. The select() function doesn't know about that hack, so in the backend, we've replaced it with pgwin32_select() that does, using a #define. libpq doesn't use that #define and pgwin32_select(), because that's a backend function. It won't work in regular client applications. If we just moved those dblink_PQexec/PQconnectdb() functions to libpq, they wouldn't use pgwin32_select() and would thus be useless. -- Heikki Linnakangas EnterpriseDB http://www.enterprisedb.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Streaming Replication on win32
On 01/21/2010 10:33 PM, Heikki Linnakangas wrote: Joe Conway wrote: I have not been really following this thread, but why can't we put the #ifdef WIN32 and special definition of these functions into libpq. I don't understand why we need special treatment for dblink. The problem is that select() function on Windows isn't interrupted by signals. That's because Unix-style signals don't exist on Windows, but we've emulated them in the server with pipes. The select() function doesn't know about that hack, so in the backend, we've replaced it with pgwin32_select() that does, using a #define. Ah, thanks for the synopsis. libpq doesn't use that #define and pgwin32_select(), because that's a backend function. It won't work in regular client applications. If we just moved those dblink_PQexec/PQconnectdb() functions to libpq, they wouldn't use pgwin32_select() and would thus be useless. OK, so now I see why we want this fixed for dblink and walreceiver, but doesn't this approach leave every other WIN32 libpq client out in the cold? Is there nothing that can be done for the general case, or is it a SMOP? Joe signature.asc Description: OpenPGP digital signature
Re: [HACKERS] Streaming Replication on win32
Joe Conway wrote: OK, so now I see why we want this fixed for dblink and walreceiver, but doesn't this approach leave every other WIN32 libpq client out in the cold? Is there nothing that can be done for the general case, or is it a SMOP? The problem only applies to libpq calls from the backend. Client apps are not affected, only backend modules. If there's any other modules out there that use libpq, then yes, those have a problem too. A generic fix would be nice. Putting the wrapper functions in a new header file that's included in all backend modules that want to use libpq would work. Maybe the new header file could even be #included from libpq-fe.h, when compiling backend code (#ifndef FRONTEND), so you wouldn't need to modify dblink, walreceiver, or any 3rd party modules that use libpq at all. -- Heikki Linnakangas EnterpriseDB http://www.enterprisedb.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Streaming Replication on win32
On Mon, Jan 18, 2010 at 5:22 AM, Heikki Linnakangas heikki.linnakan...@enterprisedb.com wrote: This could be because the win32 socket emulation layer simply wasn't designed to deal with non-blocking sockets. Specifically, it actually *always* sets the socket to non-blocking mode, and then uses that to properly emulate how sockets work under unix. I presume the win32 emulation layer can be taught about non-blocking sockets? Or maybe pq_getbyte_if_available() can be implemented using some other simpler method on Windows. How about checking the socket by using select/poll before calling pq_getbyte_if_available()? This would prevent pgwin32_recv() from being blocked because a message is guaranteed to have already arrived. When the renegotiation happens, SSL_read (instead of pqwin32_recv()) is called with non-blocking socket, so it's not blocked. Regards, -- Fujii Masao NIPPON TELEGRAPH AND TELEPHONE CORPORATION NTT Open Source Software Center -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Streaming Replication on win32
On Mon, Jan 18, 2010 at 10:30, Fujii Masao masao.fu...@gmail.com wrote: On Mon, Jan 18, 2010 at 5:22 AM, Heikki Linnakangas heikki.linnakan...@enterprisedb.com wrote: This could be because the win32 socket emulation layer simply wasn't designed to deal with non-blocking sockets. Specifically, it actually *always* sets the socket to non-blocking mode, and then uses that to properly emulate how sockets work under unix. I presume the win32 emulation layer can be taught about non-blocking sockets? Or maybe pq_getbyte_if_available() can be implemented using some other simpler method on Windows. How about checking the socket by using select/poll before calling pq_getbyte_if_available()? This would prevent pgwin32_recv() from being blocked because a message is guaranteed to have already arrived. When the renegotiation happens, SSL_read (instead of pqwin32_recv()) is called with non-blocking socket, so it's not blocked. SSL_read calls into pqwin32_recv(), so you have the same problem. (see my_sock_read() and my_sock_write() in be-secure.c) -- Magnus Hagander Me: http://www.hagander.net/ Work: http://www.redpill-linpro.com/ -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Streaming Replication on win32
On Mon, Jan 18, 2010 at 6:40 PM, Magnus Hagander mag...@hagander.net wrote: SSL_read calls into pqwin32_recv(), so you have the same problem. (see my_sock_read() and my_sock_write() in be-secure.c) Oh, I confirmed that. Thanks! Can we prevent SSL_read from being blocked in the renegotiation case if we use poll/select in my_sock_read() even if the socket is blocking mode? If Yes, ISTM that we can work around the problem by using the special BIO function which checks the socket, as BIO. Regards, -- Fujii Masao NIPPON TELEGRAPH AND TELEPHONE CORPORATION NTT Open Source Software Center -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Streaming Replication on win32
2010/1/18 Tom Lane t...@sss.pgh.pa.us: Magnus Hagander mag...@hagander.net writes: Which shows one potentially big problem - since we're calling select() from inside libpq, it's not calling our signal emulation layer compatible select(). This means that at this point, walreceiver is not interruptible. Ugh. Indeed. Which also shows itself if I shut down the system - the walreceiver stays around, and won't terminate properly. Do we need to invent a way for libpq to call back into backend code to do this select? We certainly can't have libipq use our version directly - since that would break all non-postmaster/postgres processes. I think that on some platforms, it's possible for the call to select() from a shlib such as libpq to be captured by a select() provided by the executable it's loaded into. Dunno about the linking rules on Windows, but is there any chance of a workaround that way? AFAIK, no. We can somehow control the link order when we link with the import library, but that would require us to do it at link time, meaning libpq would be linked to postgres.exe -- FAIL. Another option is to load the select() function on runtime, by having libpq examine the list of loaded modules for postgres.exe.. But that seems a lot uglier than providing some kind of callback for it. -- Magnus Hagander Me: http://www.hagander.net/ Work: http://www.redpill-linpro.com/ -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Streaming Replication on win32
2010/1/17 Heikki Linnakangas heikki.linnakan...@enterprisedb.com: Magnus Hagander wrote: Which shows one potentially big problem - since we're calling select() from inside libpq, it's not calling our signal emulation layer compatible select(). This means that at this point, walreceiver is not interruptible. Which also shows itself if I shut down the system - the walreceiver stays around, and won't terminate properly. Do we need to invent a way for libpq to call back into backend code to do this select? We certainly can't have libipq use our version directly - since that would break all non-postmaster/postgres processes. Hmm, contrib/dblink probably has the same issue then. Yes. We could replace the blocking PQexec() calls with PQsendQuery(), and use the emulated version of select() to wait. Hmm. That would at least theoretically work, but aren't there still places we may end up blocking further down? Or are those ok? From what I can tell, this indicates that pq_getbyte_if_available() is not working - because it's supposed to never block, right? Right, it's not supposed to block. This could be because the win32 socket emulation layer simply wasn't designed to deal with non-blocking sockets. Specifically, it actually *always* sets the socket to non-blocking mode, and then uses that to properly emulate how sockets work under unix. I presume the win32 emulation layer can be taught about non-blocking sockets? Or maybe pq_getbyte_if_available() can be implemented using some other simpler method on Windows. It could be taught that, but it would probably be a lot easier to put platform specific code in pq_getbyte_if_available(). That's a bit breaking an abstraction layer though, but that's maybe Ok in this case... -- Magnus Hagander Me: http://www.hagander.net/ Work: http://www.redpill-linpro.com/ -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Streaming Replication on win32
Magnus Hagander wrote: 2010/1/17 Heikki Linnakangas heikki.linnakan...@enterprisedb.com: We could replace the blocking PQexec() calls with PQsendQuery(), and use the emulated version of select() to wait. Hmm. That would at least theoretically work, but aren't there still places we may end up blocking further down? Or are those ok? There's also PQconnect that needs similar treatment (using PQconnectStart/Poll()), but that's it. -- Heikki Linnakangas EnterpriseDB http://www.enterprisedb.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
[HACKERS] Streaming Replication on win32
I'm trying to figure out why streaming replication doesn't work on win32. Here is what I have so far: It starts up fine, and outputs: LOG: starting archive recovery LOG: standby_mode = 'on' LOG: primary_conninfo = 'host=localhost port=5432' LOG: starting streaming recovery at 0/200 After this, *nothing* happens, and it never reaches a consistent state or anything. Looking at stacktraces, I notice two things: walreceiver process is in: ntdll!ZwWaitForSingleObject+0xa mswsock+0x4f65 WS2_32!select+0x105 LIBPQ!pqSocketPoll(int sock = 4936, int forRead = 1, int forWrite = 0, int64 end_time = -1)+0x2bb LIBPQ!pqSocketCheck(struct pg_conn * conn = 0x`00830160, int forRead = 1, int forWrite = 0, int64 end_time = -1)+0xa1 LIBPQ!pqWaitTimed(int forRead = 1, int forWrite = 0, struct pg_conn * conn = 0x`00830160, int64 finish_time = -1)+0x2e LIBPQ!pqWait(int forRead = 1, int forWrite = 0, struct pg_conn * conn = 0x`00830160)+0x2a LIBPQ!PQgetResult(struct pg_conn * conn = 0x`00830160)+0x82 LIBPQ!PQexecFinish(struct pg_conn * conn = 0x`00830160)+0x1c LIBPQ!PQexec(struct pg_conn * conn = 0x`00830160, char * query = 0x`0042f600 START_REPLICATION 0/200)+0x44 walreceiver!WalRcvConnect(void)+0x457 walreceiver!WalReceiverMain(struct FunctionCallInfoData * fcinfo = 0x`)+0x20e postgres!AuxiliaryProcessMain(int argc = 2, char ** argv = 0x`0081f080)+0x600 postgres!SubPostmasterMain(int argc = 4, char ** argv = 0x`0081f070)+0x2d7 postgres!main(int argc = 4, char ** argv = 0x`0081f070)+0x1e4 postgres!__tmainCRTStartup(void)+0x192 postgres!mainCRTStartup(void)+0xe kernel32!BaseProcessStart+0x2c Which shows one potentially big problem - since we're calling select() from inside libpq, it's not calling our signal emulation layer compatible select(). This means that at this point, walreceiver is not interruptible. Which also shows itself if I shut down the system - the walreceiver stays around, and won't terminate properly. Do we need to invent a way for libpq to call back into backend code to do this select? We certainly can't have libipq use our version directly - since that would break all non-postmaster/postgres processes. The second thing I note is that the walsender is in: ntdll!ZwWaitForMultipleObjects+0xa kernel32!ReleaseSemaphore+0x6b postgres!pgwin32_waitforsinglesocket(unsigned int64 s = 0x13fc, int what = 41, int timeout = -1)+0x275 postgres!pgwin32_recv(unsigned int64 s = 0x13fc, char * buf = 0x`0042f990 ???, int len = 1, int f = 0)+0xf5 postgres!secure_read(struct Port * port = 0x`0042fcf0, void * ptr = 0x`0042f990, unsigned int64 len = 1)+0x32 postgres!pq_getbyte_if_available(unsigned char * c = 0x`0042f990 ???)+0x106 postgres!CheckClosedConnection(void)+0x10 postgres!WalSndLoop(void)+0xdf postgres!WalSenderMain(void)+0xb9 postgres!PostgresMain(int argc = 2, char ** argv = 0x`0084d520, char * username = 0x`0082e218 Administrator)+0x3b5 postgres!BackendRun(struct Port * port = 0x`0042fcf0)+0x235 postgres!SubPostmasterMain(int argc = 3, char ** argv = 0x`0081f080)+0x278 postgres!main(int argc = 3, char ** argv = 0x`0081f080)+0x1e4 postgres!__tmainCRTStartup(void)+0x192 postgres!mainCRTStartup(void)+0xe kernel32!BaseProcessStart+0x2c From what I can tell, this indicates that pq_getbyte_if_available() is not working - because it's supposed to never block, right? This could be because the win32 socket emulation layer simply wasn't designed to deal with non-blocking sockets. Specifically, it actually *always* sets the socket to non-blocking mode, and then uses that to properly emulate how sockets work under unix. Oh, and the walsender process says: \Sessions\1\BaseNamedObjects\pgident(2196): postgres: wal sender process Administrator 127.0.0.1(1398) startup the walreceiver says: \Sessions\1\BaseNamedObjects\pgident(2264): postgres: wal receiver process and the startup process says: \Sessions\1\BaseNamedObjects\pgident(2764): postgres: startup process waiting for 00010002 -- Magnus Hagander Me: http://www.hagander.net/ Work: http://www.redpill-linpro.com/ -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Streaming Replication on win32
Magnus Hagander wrote: Which shows one potentially big problem - since we're calling select() from inside libpq, it's not calling our signal emulation layer compatible select(). This means that at this point, walreceiver is not interruptible. Which also shows itself if I shut down the system - the walreceiver stays around, and won't terminate properly. Do we need to invent a way for libpq to call back into backend code to do this select? We certainly can't have libipq use our version directly - since that would break all non-postmaster/postgres processes. Hmm, contrib/dblink probably has the same issue then. We could replace the blocking PQexec() calls with PQsendQuery(), and use the emulated version of select() to wait. From what I can tell, this indicates that pq_getbyte_if_available() is not working - because it's supposed to never block, right? Right, it's not supposed to block. This could be because the win32 socket emulation layer simply wasn't designed to deal with non-blocking sockets. Specifically, it actually *always* sets the socket to non-blocking mode, and then uses that to properly emulate how sockets work under unix. I presume the win32 emulation layer can be taught about non-blocking sockets? Or maybe pq_getbyte_if_available() can be implemented using some other simpler method on Windows. -- Heikki Linnakangas EnterpriseDB http://www.enterprisedb.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Streaming Replication on win32
Magnus Hagander mag...@hagander.net writes: Which shows one potentially big problem - since we're calling select() from inside libpq, it's not calling our signal emulation layer compatible select(). This means that at this point, walreceiver is not interruptible. Ugh. Which also shows itself if I shut down the system - the walreceiver stays around, and won't terminate properly. Do we need to invent a way for libpq to call back into backend code to do this select? We certainly can't have libipq use our version directly - since that would break all non-postmaster/postgres processes. I think that on some platforms, it's possible for the call to select() from a shlib such as libpq to be captured by a select() provided by the executable it's loaded into. Dunno about the linking rules on Windows, but is there any chance of a workaround that way? regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers