[HACKERS] Timeout for asynchronous replication Re: Timeout and wait-forever in sync rep
On Mon, Dec 6, 2010 at 3:42 PM, Fujii Masao wrote:
> On Fri, Oct 15, 2010 at 9:41 PM, Fujii Masao wrote:
>> The timeout doesn't oppose to 'wait-forever'. Even if you choose 'wait
>> -forever' (i.e., you set allow_standalone_master to false), the master
>> should detect the standby crash as soon as possible by using the
>> timeout. For example, imagine that max_wal_senders is set to one and
>> the master cannot detect the standby crash because of absence of the
>> timeout. In this case, even if you start new standby, it will not be
>> able to connect to the master since there is no free walsender slot.
>> As the result, the master actually waits forever.
>
> This occurred to me that the timeout would be required even for
> asynchronous streaming replication. So, how about implementing the
> replication timeout feature before synchronous replication itself?

Here is the patch. This is one of the features required for synchronous
replication, so I added it into the current CF as a part of synchronous
replication.

Regards,

--
Fujii Masao
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center

replication_timeout_v1.patch
Description: Binary data

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
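The attached patch itself is binary here, but the mechanism is easy to picture: the walsender keeps track of when it last heard from the standby and drops the connection once a configurable limit is exceeded, freeing the walsender slot. The fragment below is only an illustrative sketch of that check, written against the backend environment; the names replication_timeout (a GUC in milliseconds) and last_reply_timestamp are invented for the example and are not taken from the patch.

    #include "postgres.h"
    #include "storage/ipc.h"
    #include "utils/timestamp.h"

    extern int          replication_timeout;    /* hypothetical GUC, in ms */
    extern TimestampTz  last_reply_timestamp;   /* hypothetical bookkeeping */

    /* Sketch only: assumed to be called from the walsender main loop. */
    static void
    CheckReplicationTimeout(void)
    {
        TimestampTz now = GetCurrentTimestamp();

        if (replication_timeout > 0 &&
            TimestampDifferenceExceeds(last_reply_timestamp, now,
                                       replication_timeout))
        {
            /* Drop the connection so the walsender slot becomes free again. */
            ereport(COMMERROR,
                    (errmsg("terminating walsender process due to replication timeout")));
            proc_exit(0);
        }
    }

Releasing the slot promptly is what prevents the max_wal_senders exhaustion described above, whether replication is synchronous or not.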
Re: [HACKERS] Timeout and wait-forever in sync rep
On 06.12.2010 07:42, Fujii Masao wrote: On Fri, Oct 15, 2010 at 9:41 PM, Fujii Masao wrote: The timeout doesn't oppose to 'wait-forever'. Even if you choose 'wait -forever' (i.e., you set allow_standalone_master to false), the master should detect the standby crash as soon as possible by using the timeout. For example, imagine that max_wal_senders is set to one and the master cannot detect the standby crash because of absence of the timeout. In this case, even if you start new standby, it will not be able to connect to the master since there is no free walsender slot. As the result, the master actually waits forever. This occurred to me that the timeout would be required even for asynchronous streaming replication. So, how about implementing the replication timeout feature before synchronous replication itself? Sounds good to me. The more pieces we can nibble off the main patch the better. -- Heikki Linnakangas EnterpriseDB http://www.enterprisedb.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] WIP patch for parallel pg_dump
On 06.12.2010 02:55, Robert Haas wrote: On Sun, Dec 5, 2010 at 1:28 PM, Tom Lane wrote: I'm wondering if we should reconsider the pass-it-through-the-client approach, because if we could make that work it would be more general and it wouldn't need any special privileges. The trick seems to be to apply sufficient sanity testing to the snapshot proposed to be installed in the subsidiary transaction. I think the requirements would basically be (1) xmin<= any listed XIDs< xmax (2) xmin not so old as to cause GlobalXmin to decrease (3) xmax not beyond current XID counter (4) XID list includes all still-running XIDs in the given range Thoughts? I think this is too ugly to live. I really think it's a very bad idea for database clients to need to explicitly know anywhere near this many details about how the server represents snapshots. It's not impossible we might want to change this in the future, and even if we don't, it seems to me to be exposing a whole lot of unnecessary internal grottiness. The client doesn't need to know anything about the snapshot blob that the server gives it. It just needs to pass it back to the server through the other connection. To the client, it's just an opaque chunk of bytes. -- Heikki Linnakangas EnterpriseDB http://www.enterprisedb.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
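To make the client-side flow concrete: under the pass-it-through-the-client scheme, the coordinating program reads the blob on one connection and feeds it back unchanged on another, never parsing it. The sketch below illustrates that with libpq; the two server-side functions are hypothetical placeholders for whatever export/install interface the server would expose.

    #include <libpq-fe.h>

    /* Sketch only: pg_export_snapshot_blob()/pg_install_snapshot_blob() are
     * hypothetical names, not an existing server API. */
    static void
    share_snapshot(PGconn *master, PGconn *worker)
    {
        PGresult   *res;
        const char *blob;

        PQclear(PQexec(master, "BEGIN ISOLATION LEVEL SERIALIZABLE"));
        res = PQexec(master, "SELECT pg_export_snapshot_blob()");
        blob = PQgetvalue(res, 0, 0);   /* opaque to the client: just bytes */

        PQclear(PQexec(worker, "BEGIN ISOLATION LEVEL SERIALIZABLE"));
        {
            const char *params[1] = { blob };

            PQclear(PQexecParams(worker,
                                 "SELECT pg_install_snapshot_blob($1)",
                                 1, NULL, params, NULL, NULL, 0));
        }
        PQclear(res);
        /* both connections now see the same snapshot */
    }

All the sanity checking Tom lists then lives on the server side, where the blob is validated before being installed; the client needs no knowledge of xmin, xmax or XID lists.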
Re: [HACKERS] Timeout and wait-forever in sync rep
On Fri, Oct 15, 2010 at 9:41 PM, Fujii Masao wrote: > The timeout doesn't oppose to 'wait-forever'. Even if you choose 'wait > -forever' (i.e., you set allow_standalone_master to false), the master > should detect the standby crash as soon as possible by using the > timeout. For example, imagine that max_wal_senders is set to one and > the master cannot detect the standby crash because of absence of the > timeout. In this case, even if you start new standby, it will not be > able to connect to the master since there is no free walsender slot. > As the result, the master actually waits forever. This occurred to me that the timeout would be required even for asynchronous streaming replication. So, how about implementing the replication timeout feature before synchronous replication itself? Regards, -- Fujii Masao NIPPON TELEGRAPH AND TELEPHONE CORPORATION NTT Open Source Software Center -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Suggesting a libpq addition
Hey Andrew,

No, thanks. :-) And I don't think that libpq should follow it (libpqtypes).

2010/12/6 Andrew Chernow
>> A varargs version of PQexecParams() would be handy, though. Imagine being
>> able to do:
>>
>> PQexecVParams("SELECT * FROM mytable WHERE foo = $1 AND bar = $2", foovar,
>> barvar);
>>
>> instead of constructing an array for the variables.
>>
> http://libpqtypes.esilo.com/man3/PQexecf.html
>
> --
> Andrew Chernow
> eSilo, LLC
> every bit counts
> http://www.esilo.com/
>
> --
> Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
> To make changes to your subscription:
> http://www.postgresql.org/mailpref/pgsql-hackers

--
// Dmitriy.
Re: [HACKERS] WIP patch for parallel pg_dump
Thank you Joachim; Yes, and the current patch requires the original (publisher) transaction is alive to prevent RecentXmin updated. I hope this restriction is acceptable if publishing/subscribing is provided via functions, not statements. Cheers; -- Koichi Suzuki 2010/12/6 Joachim Wieland : > On Sun, Dec 5, 2010 at 9:27 PM, Robert Haas wrote: >> On Sun, Dec 5, 2010 at 9:04 PM, Andrew Dunstan wrote: >>> Why not just say give me the snapshot currently held by process ? >>> >>> And please, not temp files if possible. >> >> As far as I'm aware, the full snapshot doesn't normally exist in >> shared memory, hence the need for publication of some sort. We could >> dedicate a shared memory region for publication but then you have to >> decide how many slots to allocate, and any number you pick will be too >> many for some people and not enough for others, not to mention that >> shared memory is a fairly precious resource. > > So here is a patch that I have been playing with in the past, I have > done it a while back and thanks go to Koichi Suzuki for his helpful > comments. I have not published it earlier because I haven't worked on > it recently and from the discussion that I brought up in march I got > the feeling that people are fine with having a first version of > parallel dump without synchronized snapshots. > > I am not really sure that what the patch does is sufficient nor if it > does it in the right way but I hope that it can serve as a basis to > collect ideas (and doubt). > > My idea is pretty much similar to Robert's about publishing snapshots > and subscribing to them, the patch even uses these words. > > Basically the idea is that a transaction in isolation level > serializable can publish a snapshot and as long as this transaction is > alive, its snapshot can be adopted by other transactions. Requiring > the publishing transaction to be serializable guarantees that the copy > of the snapshot in shared memory is always current. When the > transaction ends, the copy of the snapshot is also invalidated and > cannot be adopted anymore. So instead of doing explicit checks, the > patch aims at always having a reference transaction around that > guarantees validity of the snapshot information in shared memory. > > The patch currently creates a new area in shared memory to store > snapshot information but we can certainly discuss this... I had a GUC > in mind that can control the number of available "slots", similar to > max_prepared_transactions. Snapshot information can become quite > large, especially with a high number of max_connections. > > Known limitations: the patch is lacking awareness of prepared > transactions completely and doesn't check if both backends belong to > the same user. > > > Joachim > -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Re: Proposed Windows-specific change: Enable crash dumps (like core files)
On 6/12/2010 12:57 PM, Craig Ringer wrote: On 22/11/2010 7:37 PM, Magnus Hagander wrote: Finally getting to looking at this one - sorry about the very long delay. Ditto, I'm afraid. Oh, I forgot to mention in the patch email: I'm not sure I've taken the right approach in terms of how I've hooked the crash dump code into the build system. I'm fairly happy with when it's actually inited, and with the win32 portfile (ports/win32/crashdump.c) - what I'm not so sure about is how I've handled the conditional compilation by platform. Currently, src/backend/ports/win32/crashdump.c is compiled on win32. It exports a generic win32-garbage-free function: void installCrashDumpHandler(); I was just going to create a no-op version of the same function in a (new) file src/backend/ports/crashdump.c , but I was having problems figuring out how to sensibly express "compile this file for everything except these ports". In addition, doing it this way means there's still a pointless no-op function call in each backend start and some useless symbols - maybe not a big worry, but not ideal. What I ended up doing was putting a conditionally compiled switch in port.h that produces an inline no-op version of installCrashDumpHandler for unsupported platforms, and an extern for supported platforms (currently win32). That'll optimize out completely on unsupported platforms, which is nice if unimportant. I'm just not sure if port.h is really the right place and if it's only used by the backend. I considered a new header included by postgres.h (since it's really backend-only stuff) but as it seems Pg generally prefers bigger headers over more little headers, I popped it in port.h . Comments? -- Craig Ringer -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] libpq changes for synchronous replication
On Mon, Dec 6, 2010 at 3:07 AM, Greg Smith wrote: > The one time this year top-posting seems appropriate...this patch seems > stalled waiting for some sort of response to the concerns Alvaro raised > here. Sorry for the delay. I didn't have the time. > I gave this a look. It seems good, but I'm not sure about this bit: Thanks for the review! > I guess this was OK when this was conceived as CopyXlog, but since it's > now a generic mechanism, this seems a bit unwise. Should this be > reconsidered so that it's possible to change the format or number of > columns? I changed CopyBothResponse message so that it includes the format and number of columns of copy data. Please see the attached patch. > (The paragraph added to the docs is also a bit too specific about this > being used exclusively in streaming replication, ISTM) Yes. But it seems difficult to generalize the docs more because currently only SR uses Copy-both. So I had to write that, for example, the condition to get into the state is only "START REPLICATION" command. > While modifying the code, it occurred to me that we might have to add new > ExecStatusType like PGRES_COPY_BOTH and use that for CopyBoth mode, > for the sake of consistency. But since it's just alias of PGRES_COPY_BOTH > for now, i.e., there is no specific behavior for that ExecStatusType, I > don't > think that it's worth adding that yet. > > > I'm not so sure about this. If we think that it's worth adding a new > possible state, we should do so now; we will not be able to change this > behavior later. OK. I added that new state. Regards, -- Fujii Masao NIPPON TELEGRAPH AND TELEPHONE CORPORATION NTT Open Source Software Center libpqrcv_send_v4.patch Description: Binary data -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
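For libpq users, the visible effect of the new status is small: after sending START_REPLICATION on a replication connection, the client checks for the copy-both state and can then mix PQgetCopyData() with PQputCopyData() on the same connection. A rough sketch of the receiving side follows; error handling is trimmed and the start position is purely illustrative.

    #include <stdio.h>
    #include <libpq-fe.h>

    /* Sketch of a client entering copy-both mode; assumes "conn" is a
     * replication connection. */
    static void
    stream_wal(PGconn *conn)
    {
        PGresult *res = PQexec(conn, "START_REPLICATION 0/0");
        char     *buf;
        int       len;

        if (PQresultStatus(res) != PGRES_COPY_BOTH)
        {
            fprintf(stderr, "unexpected result: %s", PQerrorMessage(conn));
            PQclear(res);
            return;
        }
        PQclear(res);

        while ((len = PQgetCopyData(conn, &buf, 0)) > 0)
        {
            /* process len bytes of WAL data in buf; a real client would also
             * send status replies back with PQputCopyData() */
            PQfreemem(buf);
        }
    }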
Re: [HACKERS] Re: Proposed Windows-specific change: Enable crash dumps (like core files)
On 22/11/2010 7:37 PM, Magnus Hagander wrote:
> Finally getting to looking at this one - sorry about the very long delay.

Ditto, I'm afraid.

> I agree with Heikki's earlier comment that it's better to have this
> included in the backend - but that's obviously not going to happen for
> already-released versions. I'd therefor advocate a contrib module for
> existing versions, and then in core for 9.1+.

I think that was the conclusion the following discussion came to, as well.
It's easy to distribute a binary .dll for older versions and if the EDB folks
found this facility really useful they could even bundle it in the installer.
There's no point backpatching it given how few win32 users will build from
source.

> We should then have an option to turn it off (on by default). But we
> don't need to pay the overhead on every backend startup - we could just
> map the value into the parameter shared memory block, and require a full
> postmaster restart in order to change it.

As discussed later, implemented by testing for a 'crashdumps' directory. If
the directory exists, a dump is written. If the directory is missing, a log
message is emitted to explain why no dump was written; as this'll only happen
when a backend dies, log noise isn't an issue and it'll help save sysadmin
head-scratching given the "magic" behaviour of turning on/off based on
directory existence.

> Do we want to backpatch it into contrib/? Adding a new module there seems
> kind of wrong - probably better to keep the source separate and just
> publish the DLL files for people who do debugging?

+1 to just publishing the DLLs.

> Looking at the code:
>
> * Why do we need to look at differnt versions of dbghelp.dll? Can't we
> always use the one with Windows? And in that case, can't we just use
> compile-time linking, do we need to bother with DLL loading at all?

Newer versions than those shipped with an OS release contain updated and
improved functionality as well as bug fixes. We *can* use the version shipped
with the OS, as even in XP dbghelp.dll contains the required functionality,
but it'd be better to bundle it. That's Microsoft's recommendation when you
use dbghelp.dll.

> * Per your comment about using elog() - a good idea in that case is to use
> write_stderr(). That will send the output to stderr if there is one, and
> otherwise the eventlog (when running as service).

Hm, ok. The attached patch doesn't do that yet. Given the following comments,
do you think the change is still necessary?

> * And yes, in the crash handler, we should *definitely* not use elog().

Currently I'm actually using it, but only after the dump has been written or
we've decided not to write a dump. At this point, if elog() fails we don't
really care, we'd just be returning control to the OS's fatal exception
cleanup anyway. It's IMO desirable for output to go to the same log as
everything else, so admins can find out what's going on and can associate the
crash dump files with events that happened around a certain time.

> * On Unix, the core file is dropped in the database directory, we don't
> have a separate directory for crashdumps. If we want to be consistent, we
> should do that here too. I do think that storing them in a directory like
> "crashdumps" is better, but I just wanted to raise the comment.

Given the recent discussion about using the dir as an on/off switch, that'll
be necessary.

> * However, when storing it in crashdumps, I think the code would need to
> create that directory if it does not exist, doesn't it?

Not now that we're using it as an on/off switch, but of course it would've
made sense otherwise.

> * Right now, we overwrite old crashdumps. It is well known that PIDs are
> recycled pretty quickly on Windows - should we perhaps dump as
> postgres--.mdmp when there is a filename conflict?

Good idea. Rather than try to test for existing files I've just taken the
simple route (always good in code run during process crash) and appended the
system ticks to the dump name. System ticks are a 32-bit quantity that wraps
about every 40 days, so they should be easily unique enough in combination
with the pid. The logs will report the crashdump filenames unless the backend
is too confused to be able to elog(), and all the crash dump files have file
system timestamps so there isn't much point trying to format a crash
date/time into their names.

I could use wall clock seconds (or ms) instead, but favour system ticks
because they can be read with a simple int-returning call that doesn't
require any more stack allocation. On Windows, getting wall clock time in
seconds seems to involve a call to GetSystemTimeAsFileTime
(http://msdn.microsoft.com/en-us/library/ms724397(v=VS.85).aspx) to populate
a struct containing a fake-64-bit int as two 32-bit ints. It's not clear that
it's free of timezone calculations and generally looks like something better
avoided in a crash handler if it's not necessary. As GetSystemTicks seems
quite sufficient, I thought
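For readers unfamiliar with the Windows side of this, the mechanism under discussion boils down to installing a top-level exception filter that writes a minidump via dbghelp.dll. The stripped-down sketch below (not the actual patch) shows the general shape, including the "crashdumps directory as on/off switch" and the pid-plus-ticks file naming discussed above; error reporting and the explicit dbghelp.dll loading from the real patch are omitted.

    #include <windows.h>
    #include <dbghelp.h>
    #include <stdio.h>

    /* Sketch only: write crashdumps/postgres-pid<pid>-<ticks>.mdmp when the
     * backend hits an unhandled exception and the directory exists. */
    static LONG WINAPI
    crashDumpHandler(EXCEPTION_POINTERS *exceptionInfo)
    {
        char        path[MAX_PATH];
        HANDLE      dumpFile;
        MINIDUMP_EXCEPTION_INFORMATION info;
        DWORD       attrs = GetFileAttributesA("crashdumps");

        /* Directory missing => feature is switched off. */
        if (attrs == INVALID_FILE_ATTRIBUTES || !(attrs & FILE_ATTRIBUTE_DIRECTORY))
            return EXCEPTION_CONTINUE_SEARCH;

        snprintf(path, sizeof(path), "crashdumps/postgres-pid%lu-%lu.mdmp",
                 (unsigned long) GetCurrentProcessId(),
                 (unsigned long) GetTickCount());

        dumpFile = CreateFileA(path, GENERIC_WRITE, 0, NULL, CREATE_ALWAYS,
                               FILE_ATTRIBUTE_NORMAL, NULL);
        if (dumpFile != INVALID_HANDLE_VALUE)
        {
            info.ThreadId = GetCurrentThreadId();
            info.ExceptionPointers = exceptionInfo;
            info.ClientPointers = FALSE;
            MiniDumpWriteDump(GetCurrentProcess(), GetCurrentProcessId(),
                              dumpFile, MiniDumpNormal, &info, NULL, NULL);
            CloseHandle(dumpFile);
        }
        return EXCEPTION_CONTINUE_SEARCH;   /* let normal crash handling proceed */
    }

    void
    installCrashDumpHandler(void)
    {
        SetUnhandledExceptionFilter(crashDumpHandler);
    }

Linking directly against dbghelp.lib keeps the sketch short; as discussed above, the patch loads dbghelp.dll dynamically so a newer redistributed copy can be preferred over the one shipped with the OS.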
Re: [HACKERS] Review: Extensions Patch
On Dec 4, 2010, at 6:14 AM, Dimitri Fontaine wrote:

> Hi,
>
> Thanks for the review, that's quite one! :)
>
> I'm not sure to follow you all along, it seems like the reading is "try
> it first then understand and comment again", so sometimes I'm not sure
> if you're saying that docs are missing the point or that the behaviour
> ain't right.

Overall I think the docs could use a lot more fleshing out. Some of the stuff
in the wiki would help a lot. At some point, though, I'll work over the docs
myself and either send a patch to you or to the list (if it has been
committed to core).

> "David E. Wheeler" writes:
>> `make installcheck` fails (regression.diffs attached).
>
> You forgot the attachment. I'd add mine if I didn't get "All 124 tests
> passed." result for the command :)

Sorry, attached now.

> (Note that I've been having problems too, but can't reproduce them
> easily, and they might have been related to forgotten maintainer-clean
> steps).

Yeah, I did that a bunch of times. This was the best run.

> Well, the exposed functionality aren't much (2 SQL commands and some
> support functions) and you need to have a contrib at hand to exercise
> them. So I'm not sure about how to add in regression tests for contrib
> infrastructure work and have them depend only on -core. I'll take ideas.

You write a very simple contrib module exclusively for testing. It doesn't
have to actually do anything other than create a couple of objects. A domain
perhaps.

> The other thing is that the very important feature here is dump and
> restore of a database containing extensions. You're not reporting about
> it, though. And I'm not sure about how to check for pg_dump content from
> pg_regress oriented tests.

Yeah, I haven't tried the dump and restore stuff. Are there no dump/restore
tests in core? If there are, I expect all you'd have to do is add an
extension or two to be dumped and restored.

>> Yes. Much of the execution is superuser-only, so we need to make sure
>> that such is actually the case (unit tests would help with that).
>
> How do you mean? Trying to subvert the features by being able to run
> them as a non-superuser? How do you install tests around that?

If the tests run as a super user (I think they usually do), then all you have
to do is create a new unprivileged user, switch to that user via SET ROLE,
and call the function in question.

>> The pg_extension catalog docs are a little thin, but probably
>> sufficient. They appear to be duplicated, though, with the entries for
>> catalog-pg-extension and view-pg-extensions looking identical. One is
>> apparently a list of installed extensions, and the other an SRF that
>> lists installed and available extensions. But I find the duplication a
>> bit confusing. (Might be easer if I wasn't reading SGML though.)
>
> Well yes the pg_extension catalog is filled at CREATE EXTENSION time so
> that we have an OID to register dependencies on. Then I added a SRF to
> also list available extensions so that it's easy to integrate the
> feature into pgadmin and the like.

Right.

> Now I'm open to suggestions as far as the names are involved, or maybe
> we should keep the SRF but remove the view. psql could easily enough use
> the function directly I think.

I think the catalog should be pg_created_extensions (or
pg_installed_extensions) and the view should be pg_available_extensions. Is
there no way to have a catalog table visible to *all* databases that contains
the list of available extensions?

>> I don't understand this paragraph at all:
>> This paragraph isn't very clear, either:
>> Examples might help.
>
> Well, I think native English skills are required here. And I'm not too
> sure about the example.

I recognize that some documentation pages have no (or very few) examples. But
oftentimes nothing makes the utility of a feature more apparent than an
example. I tend to use them liberally in my documentation. As I said, I'll
probably contribute a refactoring of the docs at some point.

>> Control file keys:
>>
>> * comment -- Should mention that it's used for CREATE COMMENT ON EXTENSION, no?
>> * version -- If it's free-form, how will you do dependency-checking? Or will you?
>> * script
>> * encoding -- Seems unnecessary; one can explicitly set it in the .sql file, no?
>> * custom_variable_classes
>
> Doc says:
> http://pgsql.tapoueh.org/extensions/doc/html/extend-extension.html
>
> This control file allows for registering the extension and should
> provide values for the keys described below.
>
> Then there's the detail about fields. I think the context is enough to
> understand, but I'd be happy to integrate some revisions of the current
> content.

Yes, I think if you just mentioned that the comment is used for an implicit
CREATE COMMENT that it would be useful. That's true, yes? The others are okay
as-is.

> For the version, it has been agreed before (Developer Meeting) that v1
> would not care about inter-extension
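The SET ROLE suggestion above translates into a very small regression-test sketch. This is only an illustration, not part of the patch; the role and extension names are made up, and the CREATE EXTENSION call is simply expected to fail with a permission error when run as the unprivileged role:

    -- run by the regression suite's superuser
    CREATE ROLE regress_unpriv;
    SET ROLE regress_unpriv;
    CREATE EXTENSION dummy_test_ext;   -- should fail: must be superuser
    RESET ROLE;
    DROP ROLE regress_unpriv;

A matching "dummy" contrib module used only by the tests could likewise be tiny: a control file plus a script that creates a single domain or function, just enough for CREATE EXTENSION, pg_dump and DROP EXTENSION to have something to act on.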
Re: [HACKERS] WIP patch for parallel pg_dump
On Sun, Dec 5, 2010 at 9:27 PM, Robert Haas wrote:
> On Sun, Dec 5, 2010 at 9:04 PM, Andrew Dunstan wrote:
>> Why not just say give me the snapshot currently held by process ?
>>
>> And please, not temp files if possible.
>
> As far as I'm aware, the full snapshot doesn't normally exist in
> shared memory, hence the need for publication of some sort. We could
> dedicate a shared memory region for publication but then you have to
> decide how many slots to allocate, and any number you pick will be too
> many for some people and not enough for others, not to mention that
> shared memory is a fairly precious resource.

So here is a patch that I have been playing with in the past, I have done it
a while back and thanks go to Koichi Suzuki for his helpful comments. I have
not published it earlier because I haven't worked on it recently and from the
discussion that I brought up in march I got the feeling that people are fine
with having a first version of parallel dump without synchronized snapshots.

I am not really sure that what the patch does is sufficient nor if it does it
in the right way but I hope that it can serve as a basis to collect ideas
(and doubt).

My idea is pretty much similar to Robert's about publishing snapshots and
subscribing to them, the patch even uses these words.

Basically the idea is that a transaction in isolation level serializable can
publish a snapshot and as long as this transaction is alive, its snapshot can
be adopted by other transactions. Requiring the publishing transaction to be
serializable guarantees that the copy of the snapshot in shared memory is
always current. When the transaction ends, the copy of the snapshot is also
invalidated and cannot be adopted anymore. So instead of doing explicit
checks, the patch aims at always having a reference transaction around that
guarantees validity of the snapshot information in shared memory.

The patch currently creates a new area in shared memory to store snapshot
information but we can certainly discuss this... I had a GUC in mind that can
control the number of available "slots", similar to
max_prepared_transactions. Snapshot information can become quite large,
especially with a high number of max_connections.

Known limitations: the patch is lacking awareness of prepared transactions
completely and doesn't check if both backends belong to the same user.


Joachim


diff --git a/src/backend/storage/ipc/ipci.c b/src/backend/storage/ipc/ipci.c
index 95beba8..c24150f 100644
*** a/src/backend/storage/ipc/ipci.c
--- b/src/backend/storage/ipc/ipci.c
*** CreateSharedMemoryAndSemaphores(bool mak
*** 124,129
--- 124,130
  size = add_size(size, BTreeShmemSize());
  size = add_size(size, SyncScanShmemSize());
  size = add_size(size, AsyncShmemSize());
+ size = add_size(size, SyncSnapshotShmemSize());
  #ifdef EXEC_BACKEND
  size = add_size(size, ShmemBackendArraySize());
  #endif
*** CreateSharedMemoryAndSemaphores(bool mak
*** 228,233
--- 229,235
  BTreeShmemInit();
  SyncScanShmemInit();
  AsyncShmemInit();
+ SyncSnapshotInit();
  #ifdef EXEC_BACKEND
diff --git a/src/backend/storage/ipc/procarray.c b/src/backend/storage/ipc/procarray.c
index 6e7a6db..00522fb 100644
*** a/src/backend/storage/ipc/procarray.c
--- b/src/backend/storage/ipc/procarray.c
*** typedef struct ProcArrayStruct
*** 91,96
--- 91,111
  static ProcArrayStruct *procArray;
+
+ /* this should be a GUC later... */
+ #define MAX_SYNC_SNAPSHOT_SETS 4
+ typedef struct
+ {
+     SnapshotData ssd;
+     char         name[NAMEDATALEN];
+     BackendId    backendId;
+     Oid          databaseId;
+ } NamedSnapshotData;
+
+ typedef NamedSnapshotData* NamedSnapshot;
+
+ static NamedSnapshot syncSnapshots;
+
  /*
   * Bookkeeping for tracking emulated transactions in recovery
   */
*** static int KnownAssignedXidsGetAndSetXmi
*** 159,164
--- 174,182
  static TransactionId KnownAssignedXidsGetOldestXmin(void);
  static void KnownAssignedXidsDisplay(int trace_level);
+ static bool DeleteSyncSnapshot(const char *name);
+ static bool snapshotPublished = false; /* true if we have published at least one snapshot */
+
  /*
   * Report shared-memory space needed by CreateSharedProcArray.
   */
*** ProcArrayRemove(PGPROC *proc, Transactio
*** 350,355
--- 368,379
  void
  ProcArrayEndTransaction(PGPROC *proc, TransactionId latestXid)
  {
+     if (snapshotPublished)
+     {
+         DeleteSyncSnapshot(NULL);
+         snapshotPublished = false;
+     }
+
      if (TransactionIdIsValid(latestXid))
      {
          /*
*** KnownAssignedXidsDisplay(int trace_level
*** 3104,3106
--- 3132,3374
      pfree(buf.data);
  }
+
+
+ /*
+  * Report space needed for our shared memory area.
+  *
+  * Memory is structured as follows:
+  *
+  * NamedSnapshotData[0]
+  * NamedSnapshotData[1]
+  * NamedSnapshotData[2]
+  * Xids for NamedSnapshotData[0]
+  * Sub-Xids for NamedSnapshotData[
Re: [HACKERS] Suggesting a libpq addition
A varargs version of PQexecParams() would be handy, though. Imagine being able to do: PQexecVParams("SELECT * FROM mytable WHERE foo = $1 AND bar = $2", foovar, barvar); instead of constructing an array for the variables. http://libpqtypes.esilo.com/man3/PQexecf.html -- Andrew Chernow eSilo, LLC every bit counts http://www.esilo.com/ -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Suggesting a libpq addition
On 12/5/2010 4:22 AM, Marc Balmer wrote: I am suggesting adding a function to libpq: PGresult *PQvexec(PGconn *conn, const char *fmt, ...); It behaves similar to PQexec, but it allows for printf style varargs and does connection re-establishing if the connection fails (it can be discussed if this already to much magic, maybe remove this part). It has been carefully designed to handle memory the right way. We use this since a long time. What do you think? I think it is a wonderful idea. Check out libpqtypes. It has a PQexecf, PQexecvf, PQsendf and PQsendvf. But that is just the beginning http://libpqtypes.esilo.com http://pgfoundry.org/projects/libpqtypes/ -- Andrew Chernow eSilo, LLC every bit counts http://www.esilo.com/ -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
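To make the shape of such an addition concrete, here is a minimal sketch of a variadic convenience wrapper built on the existing PQexecParams(); it borrows the PQexecVParams name suggested upthread. It treats every argument as a text parameter and, purely as an assumption for this sketch, ends the list with a NULL sentinel (a printf-style PQvexec as Marc proposes, or libpqtypes' PQexecf, would instead derive the parameters from a format string).

    #include <stdarg.h>
    #include <libpq-fe.h>

    /* Sketch only: every variadic argument is a nul-terminated string used as
     * a text parameter; the argument list ends with a NULL sentinel. */
    PGresult *
    PQexecVParams(PGconn *conn, const char *query, ...)
    {
        const char *values[16];     /* arbitrary limit for the sketch */
        int         nparams = 0;
        const char *arg;
        va_list     ap;

        va_start(ap, query);
        while ((arg = va_arg(ap, const char *)) != NULL && nparams < 16)
            values[nparams++] = arg;
        va_end(ap);

        return PQexecParams(conn, query, nparams,
                            NULL,       /* let the server infer parameter types */
                            values, NULL, NULL,
                            0);         /* text-format results */
    }

    /* usage:
     *   PQexecVParams(conn, "SELECT * FROM mytable WHERE foo = $1 AND bar = $2",
     *                 foovar, barvar, NULL);
     */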
Re: [HACKERS] profiling connection overhead
On Sun, Dec 5, 2010 at 9:35 PM, Rob Wultsch wrote: > On Sun, Dec 5, 2010 at 6:59 PM, Robert Haas wrote: >> On Sun, Dec 5, 2010 at 2:45 PM, Rob Wultsch wrote: >>> I think you have read a bit more into what I have said than is >>> correct. MySQL can deal with thousands of users and separate schemas >>> on commodity hardware. There are many design decisions (some >>> questionable) that have made MySQL much better in a shared hosting >>> environment than pg and I don't know where the grants system falls >>> into that. >> >> Objection: Vague. > > I retract the remark, your honor. Clarifying it would be fine, too... :-) > At some point Hackers should look at pg vs MySQL multi tenantry but it > is way tangential today. My understanding is that our schemas work like MySQL databases; and our databases are an even higher level of isolation. No? -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] profiling connection overhead
On Sun, Dec 5, 2010 at 6:59 PM, Robert Haas wrote: > On Sun, Dec 5, 2010 at 2:45 PM, Rob Wultsch wrote: >> I think you have read a bit more into what I have said than is >> correct. MySQL can deal with thousands of users and separate schemas >> on commodity hardware. There are many design decisions (some >> questionable) that have made MySQL much better in a shared hosting >> environment than pg and I don't know where the grants system falls >> into that. > > Objection: Vague. > I retract the remark, your honor. At some point Hackers should look at pg vs MySQL multi tenantry but it is way tangential today. -- Rob Wultsch wult...@gmail.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] WIP patch for parallel pg_dump
On Sun, Dec 5, 2010 at 9:04 PM, Andrew Dunstan wrote: > Why not just say give me the snapshot currently held by process ? > > And please, not temp files if possible. As far as I'm aware, the full snapshot doesn't normally exist in shared memory, hence the need for publication of some sort. We could dedicate a shared memory region for publication but then you have to decide how many slots to allocate, and any number you pick will be too many for some people and not enough for others, not to mention that shared memory is a fairly precious resource. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] pg_execute_from_file review
On Sun, Dec 5, 2010 at 6:01 PM, Tom Lane wrote: > Itagaki Takahiro writes: >> On Fri, Dec 3, 2010 at 18:02, Dimitri Fontaine >> wrote: >>> My understanding is that the variadic form shadows the other one in a >>> way that it's now impossible to call it from SQL level. That's the >>> reason why I did the (text, text, text, VARIADIC text) version before, >>> but is it true? > >> The VARIADIC version doesn't hide the 3-args version. I tested the >> behavior by printf-debug. The planner seems to think the non VARIADIC >> version is the best-matched one when 3 arguments are passed. > > Why is there a variadic replace() in this patch at all? It seems just > about entirely unrelated to the stated purpose of the patch, as well > as being of dubious usefulness. When would it be superior to > > replace(replace(orig, from1, to1), from2, to2), ... > > The implementation doesn't appear to offer any material speed > improvement over nested calls of that sort, and I'm finding it hard to > visualize when it would be more useful than nested calls. The > documentation is entirely inadequate as well. An iterated replacement has different semantics from a simultaneous replace - replacing N placeholders with values simultaneously means you don't need to worry about the case where one of the replacement strings contains something that looks like a placeholder. I actually think a simultaneous replacement feature would be quite handy but I make no comment on whether it belongs as part of this patch. The discussion on this patch has been rather wide-ranging, and it's not clear to me that there's really consensus on what this patch needs to - or should - do. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
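The semantic difference is easy to see with a concrete case: when one replacement value happens to contain text that looks like another placeholder, iterated calls rewrite it again, while a one-pass simultaneous replacement would not.

    -- iterated (nested) replacement: the second pass also rewrites the '$2'
    -- that was introduced by the first pass
    SELECT replace(replace('$1 $2', '$1', 'uses $2'), '$2', 'twice');
    -- result: 'uses twice twice'

    -- a hypothetical simultaneous form, e.g.
    --   replace('$1 $2', '$1', 'uses $2', '$2', 'twice')
    -- would substitute both placeholders against the original string in one
    -- pass and return 'uses $2 twice'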
Re: [HACKERS] profiling connection overhead
On Sat, Dec 4, 2010 at 8:04 PM, Jeff Janes wrote: > But who would be doing the passing? For the postmaster to be doing > that would probably go against the minimalist design. It would have > to keep track of which backend is available, and which db and user it > is primed for. Perhaps a feature could be added to the backend to > allow it to get passed a FD from pgbouncer or pgpool-II and then hand > control back to the pooler upon "close" of the connection, as they > already have the infrastructure to keep pools around while the > postmaster does not. Are pgbouncer and pgpool close enough to "core" > to make such intimate collaboration with the backend OK? I am not sure. I'm afraid that might be adding complexity without really solving anything, but maybe I'm a pessimist. One possible way to do make an improvement in this area would be to move the responsibility for accepting connections out of the postmaster. Instead, you'd have a group of children that would all call accept() on the socket, and the OS would arbitrarily pick one to receive each new incoming connection. The postmaster would just be responsible for making sure that there were enough children hanging around. You could in fact make this change without doing anything else, in which case it wouldn't save any work but would possibly reduce connection latency a bit since more of the work could be done before the connection actually arrived. From there, you could go two ways. One option would be to have backends that would otherwise terminate normally instead do the equivalent of DISCARD ALL and then go back around and try to accept() another incoming connection. If they get a guy who wants the database to which they previously connected, profit. If not, laboriously flush every cache in sight and rebind to the new database. Another option would be to have backends that would otherwise terminate normally instead do the equivalent of DISCARD ALL and then mark themselves as able to accept a new connection to the same database to which they are already connected (but not any other database). Following authentication, a backend that accepted a new incoming connection looks through the pool of such backends and, if it finds one, hands off the connection using file-descriptor passing and then loops back around to accept() again. Otherwise it handles the connection itself. This wouldn't offer much of an advantage over the first option for a cluster that basically has just one database, or for a cluster that has 1000 actively used databases. But it would be much better for a system with three databases. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
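The "group of children all calling accept() on the socket" idea is the classic pre-forking server pattern; the kernel wakes one idle child per incoming connection. A deliberately minimal, non-PostgreSQL sketch of just the process structure (no authentication, no DISCARD ALL-style reset, fixed pool size and port) looks like this:

    #include <arpa/inet.h>
    #include <netinet/in.h>
    #include <stdio.h>
    #include <sys/socket.h>
    #include <unistd.h>

    static void
    child_main(int listen_fd)
    {
        for (;;)
        {
            int conn_fd = accept(listen_fd, NULL, NULL);

            if (conn_fd < 0)
                continue;
            /* authenticate, serve the session, reset state, then loop back
             * and wait for the next connection */
            close(conn_fd);
        }
    }

    int
    main(void)
    {
        int                 listen_fd = socket(AF_INET, SOCK_STREAM, 0);
        struct sockaddr_in  addr = {0};

        addr.sin_family = AF_INET;
        addr.sin_port = htons(5433);            /* arbitrary port for the sketch */
        addr.sin_addr.s_addr = htonl(INADDR_ANY);
        if (bind(listen_fd, (struct sockaddr *) &addr, sizeof(addr)) < 0 ||
            listen(listen_fd, 64) < 0)
        {
            perror("bind/listen");
            return 1;
        }

        for (int i = 0; i < 4; i++)             /* fixed pool of children */
            if (fork() == 0)
            {
                child_main(listen_fd);
                _exit(0);
            }
        for (;;)
            pause();                            /* parent only supervises */
    }

The interesting PostgreSQL-specific parts are exactly the two options described above: what a child does after finishing a session, and whether it may hand an accepted descriptor to a pooled backend already bound to the right database.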
Re: [HACKERS] WIP patch for parallel pg_dump
On 12/05/2010 08:55 PM, Robert Haas wrote: On Sun, Dec 5, 2010 at 1:28 PM, Tom Lane wrote: I'm wondering if we should reconsider the pass-it-through-the-client approach, because if we could make that work it would be more general and it wouldn't need any special privileges. The trick seems to be to apply sufficient sanity testing to the snapshot proposed to be installed in the subsidiary transaction. I think the requirements would basically be (1) xmin<= any listed XIDs< xmax (2) xmin not so old as to cause GlobalXmin to decrease (3) xmax not beyond current XID counter (4) XID list includes all still-running XIDs in the given range Thoughts? I think this is too ugly to live. I really think it's a very bad idea for database clients to need to explicitly know anywhere near this many details about how the server represents snapshots. It's not impossible we might want to change this in the future, and even if we don't, it seems to me to be exposing a whole lot of unnecessary internal grottiness. How about just pg_publish_snapshot(), returning a token that is only valid until the end of the transaction in which it was called, and pg_subscribe_snapshot(token)? The implementation can be that the publisher writes its snapshot to a temp file and returns the name of the temp file, setting an at-commit hook to remove the temp file. The subscriber reads the temp file and sets the contents as its transaction snapshot. If security is a concern, one could also save the publisher's role OID to the file and require the subscriber's to match. Why not just say give me the snapshot currently held by process ? And please, not temp files if possible. cheers andrew -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] profiling connection overhead
On Sun, Dec 5, 2010 at 2:45 PM, Rob Wultsch wrote: > I think you have read a bit more into what I have said than is > correct. MySQL can deal with thousands of users and separate schemas > on commodity hardware. There are many design decisions (some > questionable) that have made MySQL much better in a shared hosting > environment than pg and I don't know where the grants system falls > into that. Objection: Vague. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] profiling connection overhead
On Sun, Dec 5, 2010 at 3:17 PM, Rob Wultsch wrote: > On Sun, Dec 5, 2010 at 12:45 PM, Rob Wultsch wrote: >> One thing I would suggest that the PG community keeps in mind while >> talking about built in connection process caching, is that it is very >> nice feature for memory leaks caused by a connection to not exist for >> and continue growing forever. > > s/not exist for/not exist/ > > I have had issues with very slow leaks in MySQL building up over > months. It really sucks to have to go to management to ask for > downtime because of a slow memory leak. Apache has a very simple and effective solution to this problem - they have a configuration option controlling the number of connections a child process handles before it dies and a new one is spawned. I've found that setting this to 1000 works excellently. Process startup overhead decreases by three orders of magnitude, and only egregiously bad leaks add up to enough to matter. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] WIP patch for parallel pg_dump
On Sun, Dec 5, 2010 at 1:28 PM, Tom Lane wrote: > I'm wondering if we should reconsider the pass-it-through-the-client > approach, because if we could make that work it would be more general and > it wouldn't need any special privileges. The trick seems to be to apply > sufficient sanity testing to the snapshot proposed to be installed in > the subsidiary transaction. I think the requirements would basically be > (1) xmin <= any listed XIDs < xmax > (2) xmin not so old as to cause GlobalXmin to decrease > (3) xmax not beyond current XID counter > (4) XID list includes all still-running XIDs in the given range > > Thoughts? I think this is too ugly to live. I really think it's a very bad idea for database clients to need to explicitly know anywhere near this many details about how the server represents snapshots. It's not impossible we might want to change this in the future, and even if we don't, it seems to me to be exposing a whole lot of unnecessary internal grottiness. How about just pg_publish_snapshot(), returning a token that is only valid until the end of the transaction in which it was called, and pg_subscribe_snapshot(token)? The implementation can be that the publisher writes its snapshot to a temp file and returns the name of the temp file, setting an at-commit hook to remove the temp file. The subscriber reads the temp file and sets the contents as its transaction snapshot. If security is a concern, one could also save the publisher's role OID to the file and require the subscriber's to match. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
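Usage of the proposed pair would look roughly like the following; both function names come from the proposal above and do not exist in the server today, and the token is deliberately opaque to the client:

    -- session 1 (e.g. the pg_dump master), kept open for the duration:
    BEGIN TRANSACTION ISOLATION LEVEL SERIALIZABLE;
    SELECT pg_publish_snapshot();          -- returns an opaque token

    -- session 2 (a worker), using the token returned by session 1:
    BEGIN TRANSACTION ISOLATION LEVEL SERIALIZABLE;
    SELECT pg_subscribe_snapshot('<token from session 1>');
    -- both transactions now work against the same snapshot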
Re: [HACKERS] FK's to refer to rows in inheritance child
On Sun, Dec 5, 2010 at 12:41 PM, Andrew Dunstan wrote: > Well, ISTM that amounts to not having "official topic branches" :-) I agree > that this is supposed to be one of git's strengths (or more exactly a > strength of distributed SCM's generally). I don't really see any great > value in sanctifying a particular topic branch with some official status. I think the value in an official topic branch would be to allow formal incremental commit of large patches. In other words, we could decide that a commit to the official topic branch must meet the same standards of quality normally applied to a commit to the master branch, and must go through the same process. It would be understood that the topic branch would eventually be merged (with or without squash) back into the master branch, but that objections were to be raised as pieces were committed to the topic branch, not at merge time. The merge itself would require consensus as to timing, but we'd agree to take a dim view of "I haven't reviewed anything that's been going on here for the last six months but now hate all of it". I think that this sort of approach could be useful either for really big projects, like perhaps SQL/MED; or possibly for certain categories of changes (things we want to include the next time we do a libpq version bump). Although, the trouble with the latter is that we do compatibility breaks so rarely that the effort involved in maintaining the branch might well exceed the value we'd get out of having it. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] pg_execute_from_file review
On Mon, Dec 6, 2010 at 08:01, Tom Lane wrote:
> Why is there a variadic replace() in this patch at all? It seems just
> about entirely unrelated to the stated purpose of the patch, as well
> as being of dubious usefulness.

As I wrote in the previous mail, the most important part of the patch for
CREATE EXTENSION is pg_read_binary_file(). We can rewrite not only
replace(VARIADIC) but also other functions in the patch with existing
functions. However, the author wanted simple-case user APIs, and I also
agreed to export each step of the complex pg_execute_sql_file(). But I have
no objections to hide some of the subroutines if there are any problems.

| $sql := replace(
|          convert_from(
|            pg_read_binary_file($path, 0),
|            $encoding),
|          '@extschema@', $schema));
| EXECUTE $sql;

--
Itagaki Takahiro

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
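Spelled out as something runnable (assuming the patch's pg_read_binary_file() with the two-argument form shown above is installed), the same pipeline could be exercised like this; the path, encoding and schema values are made up for the example:

    DO $$
    DECLARE
        sql text;
    BEGIN
        sql := replace(
                 convert_from(pg_read_binary_file('extension/foo.sql', 0),
                              'UTF8'),
                 '@extschema@', 'myschema');
        EXECUTE sql;
    END
    $$;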
Re: [HACKERS] Spread checkpoint sync
Rob Wultsch wrote:
> Forgive me, but is all of this a step on the slippery slope to direct io?
> And is this a bad thing

I don't really think so. There's an important difference in my head between
direct I/O, where the kernel is told "write this immediately!", and what I'm
trying to achieve. I want to give the kernel an opportunity to write blocks
out in an efficient way, so that it can take advantage of elevator sorting,
write combining, and similar tricks. But, eventually, those writes have to
make it out to disk. Linux claims to have concepts like a "deadline" for I/O
to happen, but they turn out to not be so effective once the system gets
backed up with enough writes. Since fsync time is the only effective
deadline, I'm progressing from the standpoint that adjusting when it happens
relative to the write will help, while still allowing the kernel an
opportunity to get the writes out on its own schedule.

What ends up happening if you push toward fully sync I/O is the design you
see in some other databases, where you need multiple writer processes. Then
requests for new pages can continue to allocate as needed, while keeping any
one write from blocking things. That's one sort of a way to simulate
asynchronous I/O, and you can substitute true async I/O instead in many of
those implementations. We didn't have much luck with portability on async I/O
when that was last experimented with, and having multiple background writer
processes seems like overkill; that whole direction worries me.

--
Greg Smith   2ndQuadrant US   g...@2ndquadrant.com   Baltimore, MD
PostgreSQL Training, Services and Support   www.2ndQuadrant.us

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] pg_execute_from_file review
Itagaki Takahiro writes: > On Fri, Dec 3, 2010 at 18:02, Dimitri Fontaine wrote: >> My understanding is that the variadic form shadows the other one in a >> way that it's now impossible to call it from SQL level. That's the >> reason why I did the (text, text, text, VARIADIC text) version before, >> but is it true? > The VARIADIC version doesn't hide the 3-args version. I tested the > behavior by printf-debug. The planner seems to think the non VARIADIC > version is the best-matched one when 3 arguments are passed. Why is there a variadic replace() in this patch at all? It seems just about entirely unrelated to the stated purpose of the patch, as well as being of dubious usefulness. When would it be superior to replace(replace(orig, from1, to1), from2, to2), ... The implementation doesn't appear to offer any material speed improvement over nested calls of that sort, and I'm finding it hard to visualize when it would be more useful than nested calls. The documentation is entirely inadequate as well. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
[HACKERS] CommitFest 2010-11: Status report at 2/3 of scheduled time
After an unfortunate two weeks where I was lost in Nyquil-land and not paying
as much attention as I should have, I just finished a pass checking over all
26 open CommitFest items. Just under 50% of the patches that were open at the
start of the CF are still hanging around, with 6 in "Ready for Committer" and
the rest split evenly between waiting for reviewers and waiting for the
author. A few larger patches or patch sets deserve some commentary.

The multi-part Extensions patch continues to chug along. A committer needs to
pick up "pg_execute_from_file()" next to keep progress on that series going,
as the other two remaining patches depend on it (and may need to be rebased
after it goes in). Help is also needed on coding review of the main
"Extensions" patch. The as yet untouched "ALTER EXTENSION ... SET SCHEMA ..."
is the last patch in the set to apply, and while it needs a reviewer
eventually I don't think that's the blocking point on the general extension
work yet.

As for the also large SQL/MED set, I'm not sure whether it's possible to get
review work started on its two dependent components--"file_fdw" and
"postgresql_fdw"--until the main patch is finished off. Given the size of
code involved, it wouldn't surprise me if everyone is still chewing on that
series into the next CF. Glad to see it all submitted for this one though.

Most of the remaining patches that are in "Needs Review" have already gone
through one round of review. There are two notable exceptions. While some
libpq work has been getting finished for Sync Rep, neither of the two main
patches for that have received any love for almost two months now. This is
the spot where I start to get worried about seeing this finish for 9.1. I'm
now trying to find time to work on those; anybody else need this feature who
can help?

--
Greg Smith   2ndQuadrant US   g...@2ndquadrant.com   Baltimore, MD
PostgreSQL Training, Services and Support   www.2ndQuadrant.us

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Spread checkpoint sync
On Sun, Dec 5, 2010 at 2:53 PM, Greg Smith wrote:
> Heikki Linnakangas wrote:
>> If you fsync() a file with one dirty page in it, it's going to return very
>> quickly, but a 1GB file will take a while. That could be problematic if you
>> have a thousand small files and a couple of big ones, as you would want to
>> reserve more time for the big ones. I'm not sure what to do about it, maybe
>> it's not a problem in practice.
>
> It's a problem in practice allright, with the bulk-loading situation being
> the main one you'll hit it. If somebody is running a giant COPY to populate
> a table at the time the checkpoint starts, there's probably a 1GB file of
> dirty data that's unsynced around there somewhere. I think doing anything
> about that situation requires an additional leap in thinking about buffer
> cache evicition and fsync absorption though. Ultimately I think we'll end
> up doing sync calls for relations that have gone "cold" for a while all the
> time as part of BGW activity, not just at checkpoint time, to try and avoid
> this whole area better. That's a lot more than I'm trying to do in my first
> pass of improvements though.
>
> In the interest of cutting the number of messy items left in the official
> CommitFest, I'm going to mark my patch here "Returned with Feedback" and
> continue working in the general direction I was already going. Concept
> shared, underlying patches continue to advance, good discussion around it;
> those were my goals for this CF and I think we're there.
>
> I have a good idea how to autotune the sync spread that's hardcoded in the
> current patch. I'll work on finishing that up and organizing some more
> extensive performance tests. Right now I'm more concerned about finishing
> the tests around the wal_sync_method issues, which are related to this and
> need to get sorted out a bit more urgently.
>
> --
> Greg Smith   2ndQuadrant US   g...@2ndquadrant.com   Baltimore, MD
> PostgreSQL Training, Services and Support   www.2ndQuadrant.us

Forgive me, but is all of this a step on the slippery slope to direct io? And
is this a bad thing?

--
Rob Wultsch
wult...@gmail.com

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] [PATCH] Revert default wal_sync_method to fdatasync on Linux 2.6.33+
Josh Berkus wrote: I modified test_fsync in two ways to run this; first, to make it support O_DIRECT, and second to make it run in the *current* directory. Patch please? I agree with the latter change; what test_fsync does is surprising. I suggested a while ago that we refactor test_fsync to use a common set of source code as the database itself for detecting things related to wal_sync_method, perhaps just extract that whole set of DEFINE macro logic to somewhere else. That happened at a bad time in the development cycle (right before a freeze) and nobody ever got back to the idea afterwards. If this code is getting touched, and it's clear it is in some direction, I'd like to see things change so it's not possible for the two to diverge again afterwards. -- Greg Smith 2ndQuadrant USg...@2ndquadrant.com Baltimore, MD PostgreSQL Training, Services and Supportwww.2ndQuadrant.us -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Spread checkpoint sync
Heikki Linnakangas wrote: If you fsync() a file with one dirty page in it, it's going to return very quickly, but a 1GB file will take a while. That could be problematic if you have a thousand small files and a couple of big ones, as you would want to reserve more time for the big ones. I'm not sure what to do about it, maybe it's not a problem in practice. It's a problem in practice allright, with the bulk-loading situation being the main one you'll hit it. If somebody is running a giant COPY to populate a table at the time the checkpoint starts, there's probably a 1GB file of dirty data that's unsynced around there somewhere. I think doing anything about that situation requires an additional leap in thinking about buffer cache evicition and fsync absorption though. Ultimately I think we'll end up doing sync calls for relations that have gone "cold" for a while all the time as part of BGW activity, not just at checkpoint time, to try and avoid this whole area better. That's a lot more than I'm trying to do in my first pass of improvements though. In the interest of cutting the number of messy items left in the official CommitFest, I'm going to mark my patch here "Returned with Feedback" and continue working in the general direction I was already going. Concept shared, underlying patches continue to advance, good discussion around it; those were my goals for this CF and I think we're there. I have a good idea how to autotune the sync spread that's hardcoded in the current patch. I'll work on finishing that up and organizing some more extensive performance tests. Right now I'm more concerned about finishing the tests around the wal_sync_method issues, which are related to this and need to get sorted out a bit more urgently. -- Greg Smith 2ndQuadrant USg...@2ndquadrant.com Baltimore, MD PostgreSQL Training, Services and Supportwww.2ndQuadrant.us -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
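The "sync spread" being tuned here amounts to a small change in the checkpoint's sync phase: instead of issuing the queued fsync() calls back to back, pause between them so the kernel can keep draining dirty data on its own schedule. The following is a simplified sketch of that loop, not the patch; the function and parameter names are invented, and the pause would be the autotuned value mentioned above:

    #include "postgres.h"
    #include "miscadmin.h"
    #include <unistd.h>

    /* Sketch only: space out the checkpoint's pending fsync() calls. */
    static void
    SpreadCheckpointSyncs(int npending, int pending_fds[], int pause_ms)
    {
        for (int i = 0; i < npending; i++)
        {
            (void) fsync(pending_fds[i]);
            if (i + 1 < npending)
                pg_usleep((long) pause_ms * 1000L);   /* let the kernel catch up */
        }
    }

The bulk-loading case described above is what makes a fixed pause insufficient: a single fsync() of a 1GB file can eat the whole budget, which is why the spacing needs to adapt to how much remains to be synced.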
Re: [HACKERS] Instrument checkpoint sync calls
Jeff Janes wrote:
> I've attached a tiny patch to apply over yours, to deal with this and with the case where no files are synced.

Thanks for that. That obvious error eluded me because in most of the patch update testing I was doing (on ext3), the longest sync was always about the same length as the total sync time.

Attached patch (in correct diff form this time!) collects up all changes. That includes elimination of a potential race condition if someone changes log_checkpoints while a long sync phase is executing. I don't know whether that can happen, and it obviously won't give accurate stats going back to the beginning of the checkpoint in that case, but it tries to defend against producing complete garbage if that value changes out from under it.

This is the first version of this patch I feel fairly good about; no open concerns left on my side. Jeff, since you're now the de-facto credited reviewer of this one by virtue of suggesting bug fixes, could you take this update out for a spin too?

> Combining this instrumentation patch with the backend sync one already committed, the net result under log_min_messages=debug1 is somewhat undesirable in that I can now see the individual sync times for the syncs done by the checkpoint writer, but I do not get times for the syncs done by the backend (I only get informed of their existence).

On a properly working system, backend syncs shouldn't be happening. So if you see them, I think the question shouldn't be "how long are they taking?", it's "how do I get rid of them?" From that perspective, knowing of their existence is sufficient to suggest the necessary tuning changes, such as dropping bgwriter_delay.

When you get into a situation where they are showing up, a whole lot of them can happen in a very brief period; enough that I'd actually be concerned about the added timing overhead, something I normally don't care about very much. That's the main reason I didn't bother instrumenting those too--the situations where they happen are ones already running low on resources.

Big writes for things that can only be written out at checkpoint time, on the other hand, are unavoidable in the current design. Providing detail on them is going to be relevant unless there's a major refactoring of how those happen.

-- Greg Smith   2ndQuadrant US    g...@2ndquadrant.com   Baltimore, MD
PostgreSQL Training, Services and Support  www.2ndQuadrant.us

diff --git a/src/backend/access/transam/xlog.c b/src/backend/access/transam/xlog.c
index ae9ed8f..6adc139 100644
*** a/src/backend/access/transam/xlog.c
--- b/src/backend/access/transam/xlog.c
*************** LogCheckpointEnd(bool restartpoint)
*** 6955,6964 ****
  {
  	long		write_secs,
  				sync_secs,
! 				total_secs;
  	int			write_usecs,
  				sync_usecs,
! 				total_usecs;
  
  	CheckpointStats.ckpt_end_t = GetCurrentTimestamp();
--- 6955,6969 ----
  {
  	long		write_secs,
  				sync_secs,
! 				total_secs,
! 				longest_secs,
! 				average_secs;
  	int			write_usecs,
  				sync_usecs,
! 				total_usecs,
! 				longest_usecs,
! 				average_usecs;
! 	double		average_sync_time;
  
  	CheckpointStats.ckpt_end_t = GetCurrentTimestamp();
*************** LogCheckpointEnd(bool restartpoint)
*** 6974,6991 ****
  				CheckpointStats.ckpt_sync_end_t,
  				&sync_secs, &sync_usecs);
  
  	if (restartpoint)
  		elog(LOG, "restartpoint complete: wrote %d buffers (%.1f%%); "
! 			 "write=%ld.%03d s, sync=%ld.%03d s, total=%ld.%03d s",
  			 CheckpointStats.ckpt_bufs_written,
  			 (double) CheckpointStats.ckpt_bufs_written * 100 / NBuffers,
  			 write_secs, write_usecs / 1000,
  			 sync_secs, sync_usecs / 1000,
! 			 total_secs, total_usecs / 1000);
  	else
  		elog(LOG, "checkpoint complete: wrote %d buffers (%.1f%%); "
  			 "%d transaction log file(s) added, %d removed, %d recycled; "
! 			 "write=%ld.%03d s, sync=%ld.%03d s, total=%ld.%03d s",
  			 CheckpointStats.ckpt_bufs_written,
  			 (double) CheckpointStats.ckpt_bufs_written * 100 / NBuffers,
  			 CheckpointStats.ckpt_segs_added,
--- 6979,7017 ----
  				CheckpointStats.ckpt_sync_end_t,
  				&sync_secs, &sync_usecs);
  
+ 	/*
+ 	 * Timing values returned from CheckpointStats are in milliseconds.
+ 	 * Convert to the second plus microsecond form that TimestampDifference
+ 	 * returns for homogeneous printing.
+ 	 */
+ 	longest_secs = (long) (CheckpointStats.ckpt_longest_sync / 1000);
+ 	longest_usecs = 1000 * (CheckpointStats.ckpt_longest_sync -
+ 							longest_secs * 1000);
+ 
+ 	average_sync_time = 0;
+ 	if (CheckpointStats.ckpt_sync_rels > 0)
+ 		average_sync_time = CheckpointStats.ckpt_agg_sync_time /
+ 							CheckpointStats.ckpt_sync_rels;
+ 	average_secs = (long) (average_sync_time / 1000);
+ 	average_usecs = 1000 * (average_sync_time - average_secs * 1000);
+ 
  	if (restartpoint)
  		elog(LOG, "restartpoint complete: wrote %d buffers (%.1f%%); "
! 			 "write=%ld.%03d s, sync=%ld.%03d s, total=%ld.%03d s; "
!
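The tail of that elog format string is cut off in the archive, so the exact wording of the new fields is an assumption, but given the longest/average computations above, a checkpoint completion line would come out something like this (all values invented for illustration):

LOG:  checkpoint complete: wrote 3412 buffers (20.8%); 0 transaction log file(s) added, 0 removed, 2 recycled; write=0.289 s, sync=3.408 s, total=3.787 s; sync files=24, longest=1.201 s, average=0.142 s

The interesting diagnostic is the gap between "longest" and "average": a large spread is the thousand-small-files-plus-one-giant-one scenario from the fsync discussion above.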
Re: [HACKERS] V3: Idle in transaction cancellation
"Kevin Grittner" wrote: > Andres Freund wrote: >> On Thursday 02 December 2010 00:48:53 Kevin Grittner wrote: >> >>> Is there any provision for one backend to cause a *different* >>> backend which is idle in a transaction to terminate cleanly when >>> it attempts to process its next statement? >> >> You might want to check out SendProcSignal() et al. > > Yeah, that was the missing link for me. Thanks! OK, I have confirmed that the patch set Andres posted compiles and passes regression tests (both `make check` and `make installcheck-world`), and that the API provided works for at least one purpose -- allowing SSI conflict detection to cancel an a transaction other than the one on which the failure is detected, to allow a guarantee that there will be no rollbacks without first having a successful commit of a transaction which wrote data. I created a "skip" branch in my git repo to hold the work for this test. A single commit there which includes all three of Andres's patches plus the code needed to implement the SSI feature is at: http://git.postgresql.org/gitweb?p=users/kgrittn/postgres.git;a= commitdiff;h=0e1e86830b49dab51ff80619a580527260f4e186 I now wish I'd committed Andres's patches separately first. If anyone likes, I could try to put something together to separate things out like that. I did notice one picky thing while working on this -- the left brace for some if statements wasn't placed consistently with surrounding code. Rather than: if (x) { a(); b(); } the patch tends to do this: if (x){ a(); b(); } I agree with Andres that the API to use it is a little bit spread out and non-obvious, but I don't see a nicer way to do it. Maybe someone else can think of something. I didn't test the patch in relation to the original purpose (for which only the first two of the three patch files should be needed) -- error handling with hot standby. I'm hoping that someone familiar with the issues there can test that part of it. -Kevin -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] profiling connection overhead
On Sun, Dec 5, 2010 at 12:45 PM, Rob Wultsch wrote:
> One thing I would suggest that the PG community keeps in mind while
> talking about built in connection process caching, is that it is a very
> nice feature for memory leaks caused by a connection to not exist for
> and continue growing forever.

s/not exist for/not exist/

I have had issues with very slow leaks in MySQL building up over months. It really sucks to have to go to management to ask for downtime because of a slow memory leak.

-- Rob Wultsch wult...@gmail.com

-- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] profiling connection overhead
On Sun, Dec 5, 2010 at 11:59 AM, Josh Berkus wrote:
>
>> * no coordination of restarts/configuration changes between the cluster
>> and the pooler
>> * you have two pieces of config files to configure your pooling settings
>> (having all that available say in a catalog in pg would be awesome)
>> * you lose all of the advanced authentication features of pg (because
>> all connections need to go through the pooler) and also ip-based stuff
>> * no SSL support (in the case of pgbouncer)
>> * complexity in resetting backend state (we added some support for that
>> through explicit SQL level commands in recent releases but it still is a
>> hazard)
>
> More:
>
> * pooler logs to separate file, for which there are (currently) no analysis tools
> * pooling is incompatible with the use of ROLES for data security
>
> The last is a major issue, and not one I think we can easily resolve. MySQL has a pooling-friendly user system, because when you connect to MySQL you basically always connect as the superuser and on connection it switches you to your chosen login role. This, per Rob Wultsch, is one of the things at the heart of allowing MySQL to support 100,000 low frequency users per cheap hosting system.
>
> As you might imagine, this behavior is also the source of a lot of MySQL's security bugs. I don't see how we could imitate it without getting the bugs as well.

I think you have read a bit more into what I have said than is correct. MySQL can deal with thousands of users and separate schemas on commodity hardware. There are many design decisions (some questionable) that have made MySQL much better in a shared hosting environment than pg, and I don't know where the grants system falls into that.

MySQL does not have that many security problems because of how grants are stored. Most MySQL security issues are DOS sort of stuff based on an authenticated user being able to cause a crash. The decoupled backend storage and a less than awesome parser share most of the blame for these issues.

One thing I would suggest that the PG community keeps in mind while talking about built in connection process caching, is that it is a very nice feature for memory leaks caused by a connection to not exist for and continue growing forever.

NOTE: 100k is not a number that I would put much stock in. I don't recall ever mentioning that number and it is not a number that would be truthful for me to throw out.

-- Rob Wultsch wult...@gmail.com

-- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Patch to add a primary key using an existing index
On fre, 2010-12-03 at 15:27 -0500, Robert Haas wrote: > On Fri, Dec 3, 2010 at 2:56 PM, r t wrote: > > What exactly was the objection to the following --> > > ALTER TABLE table_name ADD PRIMARY KEY (column_list) USING index_name; > > Is the objection that you might have been trying to specify a constraint > > named "using" ? I'm willing to make that option more difficult. :-) > > I think it's that someone might expect the word after USING to be the > name of an index AM. That could be avoided by writing USING INDEX -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] profiling connection overhead
> * no coordination of restarts/configuration changes between the cluster
> and the pooler
> * you have two pieces of config files to configure your pooling settings
> (having all that available say in a catalog in pg would be awesome)
> * you lose all of the advanced authentication features of pg (because
> all connections need to go through the pooler) and also ip-based stuff
> * no SSL support (in the case of pgbouncer)
> * complexity in resetting backend state (we added some support for that
> through explicit SQL level commands in recent releases but it still is a
> hazard)

More:

* pooler logs to separate file, for which there are (currently) no analysis tools
* pooling is incompatible with the use of ROLES for data security

The last is a major issue, and not one I think we can easily resolve. MySQL has a pooling-friendly user system, because when you connect to MySQL you basically always connect as the superuser and on connection it switches you to your chosen login role. This, per Rob Wultsch, is one of the things at the heart of allowing MySQL to support 100,000 low frequency users per cheap hosting system.

As you might imagine, this behavior is also the source of a lot of MySQL's security bugs. I don't see how we could imitate it without getting the bugs as well.

--
-- Josh Berkus PostgreSQL Experts Inc. http://www.pgexperts.com

-- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] wCTE behaviour
Marko Tiikkaja wrote:
> This is almost exactly the patch from 2010-02 without CommandCounterIncrement()s. It's still a bit rough around the edges and needs some more comments, but I'm posting it here anyway. This patch passes all regression tests, but feel free to try to break it, there are probably ways to do that. This one also has the "always run DMLs to completion, and exactly once" behaviour.

So this patch was marked "Ready for Committer", but a) no committer has picked it up yet and b) Marko has made changes here that nobody else has tested out yet that I've seen on the list. Accordingly, that classification may have been optimistic. It seems to me that another testing run-through from someone like David might be appropriate to build some confidence that this latest patch should be a commit candidate. If there is a committer intending to work on this as-is, they haven't identified themselves.

-- Greg Smith   2ndQuadrant US    g...@2ndquadrant.com   Baltimore, MD
PostgreSQL Training, Services and Support  www.2ndQuadrant.us

-- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] WIP patch for parallel pg_dump
Greg Smith writes:
> In addition, Joachim submitted a synchronized snapshot patch that looks
> to me like it slipped through the cracks without being fully explored.
> ...
> The way I read that thread, there were two objections:
> 1) This mechanism isn't general enough for all use-cases outside of
> pg_dump, which doesn't make it wrong when the question is how to get
> parallel pg_dump running
> 2) Running as superuser is excessive. Running as the database owner was
> suggested as likely to be good enough for pg_dump purposes.

IIRC, in old discussions of this problem we first considered allowing clients to pull down an explicit representation of their snapshot (which actually is an existing feature now, txid_current_snapshot()) and then upload that again to become the active snapshot in another connection. That was rejected on the grounds that you could cause all kinds of mischief by uploading a bad snapshot; so we decided to think about providing a server-side-only means to clone another backend's current snapshot. Which is essentially what Joachim's above-mentioned patch provides. However, as was discussed in that thread, that approach is far from being ideal either.

I'm wondering if we should reconsider the pass-it-through-the-client approach, because if we could make that work it would be more general and it wouldn't need any special privileges. The trick seems to be to apply sufficient sanity testing to the snapshot proposed to be installed in the subsidiary transaction. I think the requirements would basically be

(1) xmin <= any listed XIDs < xmax
(2) xmin not so old as to cause GlobalXmin to decrease
(3) xmax not beyond current XID counter
(4) XID list includes all still-running XIDs in the given range

One tricky part would be ensuring GlobalXmin doesn't decrease when the snap is installed, but I think that could be made to work if we take ProcArrayLock exclusively and insist on observing some other running transaction with xmin <= proposed xmin. For the pg_dump case this would certainly hold since xmin would be the parent pg_dump's xmin.

Given the checks stated above, it would be possible for someone to install a snapshot that corresponds to no actual state of the database, eg it shows some T1 as running and T2 as committed when actually T1 committed before T2. I don't see any simple way for the installation function to detect that, but I'm not sure whether it matters. The user might see inconsistent data, but do we care? Perhaps as a safety measure we should only allow snapshot installation in read-only transactions, so that even if the xact does observe inconsistent data it can't possibly corrupt the database state thereby. This'd be no skin off pg_dump's nose, obviously. Or compromise on "only superusers can do it in non-read-only transactions".

Thoughts?

regards, tom lane

-- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
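To make the proposed sanity tests concrete, here is a standalone sketch of checks (1) through (4). All struct and helper names are invented for illustration; the real backend types differ, and XID wraparound comparisons (TransactionIdPrecedes and friends) are deliberately ignored for simplicity:

#include <stdbool.h>
#include <stdint.h>

typedef uint32_t TransactionId;	/* simplified; no wraparound handling */

/* A client-supplied snapshot, reduced to the parts the checks need. */
typedef struct ProposedSnapshot
{
	TransactionId xmin;			/* all XIDs below this are settled */
	TransactionId xmax;			/* first XID not yet assigned */
	int			nxip;			/* number of listed in-progress XIDs */
	const TransactionId *xip;	/* the in-progress XIDs */
} ProposedSnapshot;

static bool
xid_is_listed(const ProposedSnapshot *snap, TransactionId xid)
{
	int			i;

	for (i = 0; i < snap->nxip; i++)
		if (snap->xip[i] == xid)
			return true;
	return false;
}

/*
 * "global_xmin" and "next_xid" stand in for GlobalXmin and the current
 * XID counter; "xid_is_running" stands in for a ProcArray lookup done
 * while holding ProcArrayLock.
 */
static bool
snapshot_is_sane(const ProposedSnapshot *snap,
				 TransactionId global_xmin,
				 TransactionId next_xid,
				 bool (*xid_is_running) (TransactionId))
{
	int			i;
	TransactionId xid;

	/* (2) xmin must not be so old that GlobalXmin would decrease */
	if (snap->xmin < global_xmin)
		return false;

	/* (3) xmax must not be beyond the current XID counter */
	if (snap->xmax > next_xid)
		return false;

	/* (1) every listed XID must satisfy xmin <= xid < xmax */
	for (i = 0; i < snap->nxip; i++)
		if (snap->xip[i] < snap->xmin || snap->xip[i] >= snap->xmax)
			return false;

	/* (4) every still-running XID in [xmin, xmax) must be listed */
	for (xid = snap->xmin; xid < snap->xmax; xid++)
		if (xid_is_running(xid) && !xid_is_listed(snap, xid))
			return false;

	return true;
}

Note that these checks catch impossible snapshots but not, as the mail points out, snapshots that are internally consistent yet correspond to no state the database ever passed through.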
[HACKERS] Re: new patch of MERGE (merge_204) & a question about duplicated ctid
Boxuan Zhai wrote:
> I have updated the MERGE patch for two main problems.

The patch inside the .tar.gz file you attached isn't right; that extracts to a tiny file of junk characters.

-- Greg Smith   2ndQuadrant US    g...@2ndquadrant.com   Baltimore, MD
PostgreSQL Training, Services and Support  www.2ndQuadrant.us

-- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] serializable read only deferrable
Tom Lane wrote: > "Kevin Grittner" writes: >> I'm reviving the discussion on the subject topic because I just >> had an epiphany which makes it seem simple to implement. The >> concept of this is that if you start a SERIALIZABLE READ ONLY >> transaction in an SSI environment when certain conditions are >> true, it doesn't need to acquire predicate locks or test for >> rw-conflicts. > > I assume this would have to be a "hard" definition of READ ONLY, > not the rather squishy definition we use now? How would we manage > the compatibility implications?

If there are issues with whether READ ONLY covers the right ground, it's likely to affect more than this particular issue -- I've taken advantage of the READ ONLY property of transactions to allow quite a few novel optimizations to SSI beyond what is presented in Cahill's doctoral work.

I'm excluding temporary tables from SSI on the grounds that they are only read and written by a single transaction and therefore can't be a source of rw-dependencies, and I'm excluding system tables on the grounds that they don't follow normal snapshot isolation rules. Hint bit rewrites are not an issue for SSI. Are there any other squishy aspects I might need to consider?

-Kevin

-- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] libpq changes for synchronous replication
The one time this year top-posting seems appropriate... this patch seems stalled waiting for some sort of response to the concerns Alvaro raised here.

Alvaro Herrera wrote:

Excerpts from Fujii Masao's message of jue nov 25 10:47:12 -0300 2010:

The attached patch s/CopyXLog/CopyBoth/g and adds the description about CopyBoth into the COPY section.

I gave this a look. It seems good, but I'm not sure about this bit:

+			case 'W':			/* Start Copy Both */
+				/*
+				 * We don't need to use getCopyStart here since CopyBothResponse
+				 * specifies neither the copy format nor the number of columns in
+				 * the Copy data. They should be always zero.
+				 */
+				conn->result = PQmakeEmptyPGresult(conn, PGRES_COPY_OUT);
+				if (!conn->result)
+					return;
+				conn->asyncStatus = PGASYNC_COPY_BOTH;
+				conn->copy_already_done = 0;
+				break;

I guess this was OK when this was conceived as CopyXlog, but since it's now a generic mechanism, this seems a bit unwise. Should this be reconsidered so that it's possible to change the format or number of columns?

(The paragraph added to the docs is also a bit too specific about this being used exclusively in streaming replication, ISTM)

While modifying the code, it occurred to me that we might have to add a new ExecStatusType like PGRES_COPY_BOTH and use that for CopyBoth mode, for the sake of consistency. But since it would be just an alias of PGRES_COPY_OUT for now, i.e., there is no specific behavior for that ExecStatusType, I don't think that it's worth adding that yet.

I'm not so sure about this. If we think that it's worth adding a new possible state, we should do so now; we will not be able to change this behavior later.
Re: [HACKERS] WIP patch for parallel pg_dump
Joachim Wieland wrote:
> Regarding snapshot cloning and dump consistency, I brought this up already several months ago and asked if the feature is considered useful even without snapshot cloning.

In addition, Joachim submitted a synchronized snapshot patch that looks to me like it slipped through the cracks without being fully explored. Since it's split in the official archives, the easiest way to read the thread is at http://www.mail-archive.com/pgsql-hackers@postgresql.org/msg143866.html

Or you can use these two:
http://archives.postgresql.org/pgsql-hackers/2010-01/msg00916.php
http://archives.postgresql.org/pgsql-hackers/2010-02/msg00363.php

That never made it into a CommitFest proper that I can see; it just picked up review, mainly from Markus. The way I read that thread, there were two objections:

1) This mechanism isn't general enough for all use-cases outside of pg_dump, which doesn't make it wrong when the question is how to get parallel pg_dump running
2) Running as superuser is excessive. Running as the database owner was suggested as likely to be good enough for pg_dump purposes.

Ultimately I think that stalled because without a client that needed it the code wasn't so interesting yet. But now there is one; should that get revived again? It seems like all of the pieces needed to build what's really desired here are available; it's just the always non-trivial task of integrating them together the right way that's needed.

-- Greg Smith   2ndQuadrant US    g...@2ndquadrant.com   Baltimore, MD
PostgreSQL Training, Services and Support  www.2ndQuadrant.us

-- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
[HACKERS] allow COPY routines to read arbitrary numbers of fields
Attached is a patch that allows CopyReadAttributesText() and CopyReadAttributesCSV() to read arbitrary numbers of attributes. Underflowing attributes are recorded as null, and space is made for overflowing attributes on a line.

This patch doesn't result in any user-visible behavior change. The current calling code will fail if the number of attributes read is not what is expected, as happens now. But it will allow the API to be used (when exposed) by a foreign data wrapper that can accept arbitrary numbers of attributes. My aim here is to get to something like:

CREATE FOREIGN TABLE my_csv (
    t text[]
) SERVER file_server
OPTIONS (format 'csv', filename '/path/to/my/data.csv', textarray 'true',
         header 'true', delimiter ';', quote '@', escape '"', null '');

SELECT t[3] as f1, t[1] as f2, t[] as probably_null FROM my_csv;

It would probably be nice to apply this before we start exposing the COPY API to FDW routines, as discussed earlier today.

cheers

andrew

*** a/src/backend/commands/copy.c
--- b/src/backend/commands/copy.c
*************** typedef struct CopyStateData
*** 141,146 ****
--- 141,151 ----
  	 */
  	StringInfoData attribute_buf;
  
+ 	/* field raw data pointers found by COPY FROM */
+ 
+ 	int			max_fields;
+ 	char	  **raw_fields;
+ 
  	/*
  	 * Similarly, line_buf holds the whole input line being processed. The
  	 * input cycle is first to read the whole line into line_buf, convert it
*************** static void CopyOneRowTo(CopyState cstate, Oid tupleOid,
*** 250,259 ****
  static void CopyFrom(CopyState cstate);
  static bool CopyReadLine(CopyState cstate);
  static bool CopyReadLineText(CopyState cstate);
! static int	CopyReadAttributesText(CopyState cstate, int maxfields,
! 					   char **fieldvals);
! static int	CopyReadAttributesCSV(CopyState cstate, int maxfields,
! 					   char **fieldvals);
  static Datum CopyReadBinaryAttribute(CopyState cstate,
  						int column_no, FmgrInfo *flinfo,
  						Oid typioparam, int32 typmod,
--- 255,262 ----
  static void CopyFrom(CopyState cstate);
  static bool CopyReadLine(CopyState cstate);
  static bool CopyReadLineText(CopyState cstate);
! static int	CopyReadAttributesText(CopyState cstate, int maxfields);
! static int	CopyReadAttributesCSV(CopyState cstate, int maxfields);
  static Datum CopyReadBinaryAttribute(CopyState cstate,
  						int column_no, FmgrInfo *flinfo,
  						Oid typioparam, int32 typmod,
*************** CopyFrom(CopyState cstate)
*** 1679,1685 ****
  	Oid			in_func_oid;
  	Datum	   *values;
  	bool	   *nulls;
! 	int			nfields;
  	char	  **field_strings;
  	bool		done = false;
  	bool		isnull;
--- 1682,1688 ----
  	Oid			in_func_oid;
  	Datum	   *values;
  	bool	   *nulls;
! 	int			nfields = 0;
  	char	  **field_strings;
  	bool		done = false;
  	bool		isnull;
*************** CopyFrom(CopyState cstate)
*** 1920,1927 ****
  	nulls = (bool *) palloc(num_phys_attrs * sizeof(bool));
  
  	/* create workspace for CopyReadAttributes results */
! 	nfields = file_has_oids ? (attr_count + 1) : attr_count;
! 	field_strings = (char **) palloc(nfields * sizeof(char *));
  
  	/* Initialize state variables */
  	cstate->fe_eof = false;
--- 1923,1934 ----
  	nulls = (bool *) palloc(num_phys_attrs * sizeof(bool));
  
  	/* create workspace for CopyReadAttributes results */
! 	if (! cstate->binary)
! 	{
! 		nfields = file_has_oids ? (attr_count + 1) : attr_count;
! 		cstate->max_fields = nfields;
! 		cstate->raw_fields = (char **) palloc(nfields * sizeof(char *));
! 	}
  
  	/* Initialize state variables */
  	cstate->fe_eof = false;
*************** CopyFrom(CopyState cstate)
*** 1985,1994 ****
  			/* Parse the line into de-escaped field values */
  			if (cstate->csv_mode)
! 				fldct = CopyReadAttributesCSV(cstate, nfields, field_strings);
  			else
! 				fldct = CopyReadAttributesText(cstate, nfields, field_strings);
  			fieldno = 0;
  
  			/* Read the OID field if present */
  			if (file_has_oids)
--- 1992,2009 ----
  			/* Parse the line into de-escaped field values */
  			if (cstate->csv_mode)
! 				fldct = CopyReadAttributesCSV(cstate, nfields);
  			else
! 				fldct = CopyReadAttributesText(cstate, nfields);
! 
! 			/* check for overflowing fields */
! 			if (nfields > 0 && fldct > nfields)
! 				ereport(ERROR,
! 						(errcode(ERRCODE_BAD_COPY_FILE_FORMAT),
! 						 errmsg("extra data after last expected column")));
! 
  			fieldno = 0;
+ 			field_strings = cstate->raw_fields;
  
  			/* Read the OID field if present */
  			if (file_has_oids)
*************** CopyFrom(CopyState cstate)
*** 2218,2224 ****
  	pfree(values);
  	pfree(nulls);
! 	pfree(field_strings);
  
  	pfree(in_functions);
  	pfree(typioparams);
--- 2233,2240 ----
  	pfree(values);
  	pfree(nulls);
! 	if (! cstate->binary)
! 		pfree(cstate->raw_fields);
  
  	pfree(in_functions);
  	pfree(typioparams);
*************** GetDecimalFromHex(char hex)
*** 2717,2737 ****
   * performing de-escaping as needed
Re: [HACKERS] FK's to refer to rows in inheritance child
On 12/05/2010 12:10 PM, Tom Lane wrote:
> Andrew Dunstan writes:
>> On 12/04/2010 07:12 PM, Robert Haas wrote:
>>> I wouldn't necessarily be opposed to official topic branches at some point in the future, but I think it's premature to speculate about whether it'd be useful here.
>> I'd need a lot of convincing if it imposed an extra burden on people like Tom. The only way I could see it working is if some committer took ownership of the topic branch and guaranteed to keep it pretty much in sync with the master branch.
> Well, allegedly this is one of the reasons we moved to git. Anybody can do that in their own repository, just as easily as a core committer could. AFAICS it's not necessary for the core repo to contain the branch, up until the point where it's ready to merge into master.

Well, ISTM that amounts to not having "official topic branches" :-)

I agree that this is supposed to be one of git's strengths (or more exactly a strength of distributed SCMs generally). I don't really see any great value in sanctifying a particular topic branch with some official status. What I would like to see is people publishing the locations of development repos so that they can be pulled from or merged, especially for any large patch.

cheers

andrew

-- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] serializable read only deferrable
"Kevin Grittner" writes: > I'm reviving the discussion on the subject topic because I just had > an epiphany which makes it seem simple to implement. The concept of > this is that if you start a SERIALIZABLE READ ONLY transaction in an > SSI environment when certain conditions are true, it doesn't need to > acquire predicate locks or test for rw-conflicts. I assume this would have to be a "hard" definition of READ ONLY, not the rather squishy definition we use now? How would we manage the compatibility implications? regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] FK's to refer to rows in inheritance child
Andrew Dunstan writes: > On 12/04/2010 07:12 PM, Robert Haas wrote: >> I wouldn't necessarily be opposed to official topic branches at some point >> in the future, but I think it's premature to speculate about whether it'd be >> useful here. > I'd need a lot of convincing if it imposed an extra burden on people > like Tom. The only way I could see it working is if some committer took > ownership of the topic branch and guaranteed to keep it pretty much in > sync with the master branch.

Well, allegedly this is one of the reasons we moved to git. Anybody can do that in their own repository, just as easily as a core committer could. AFAICS it's not necessary for the core repo to contain the branch, up until the point where it's ready to merge into master.

>> What is needed right now is design work, not code. > Indeed. In this case I don't think we even have agreement on the > features let alone how they might work.

Yeah. But it's fair to look ahead to how development might proceed.

regards, tom lane

-- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Using different semaphore for locking
flyusa2010 fly writes:
> I found that postgres uses different semaphore system calls on different operating systems.
> For example, I found that on Linux, System V semaphores (semop etc.) are used to implement locking, while on Darwin, POSIX semaphores (sem_wait, sem_post etc.) are used.
> Linux and Darwin support both System V and POSIX semaphores, so I'm wondering why postgres implements locking using different types of semaphore system calls. Is there any performance or semantic issue on different OSes?

I think your question is answered in src/template/darwin:

# Select appropriate semaphore support. Darwin 6.0 (Mac OS X 10.2) and up
# support System V semaphores; before that we have to use POSIX semaphores,
# which are less good for our purposes because they eat a file descriptor
# per backend per max_connection slot.
case $host_os in
  darwin[015].*) USE_NAMED_POSIX_SEMAPHORES=1 ;;
  *) USE_SYSV_SEMAPHORES=1 ;;
esac

regards, tom lane

-- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
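For anyone unfamiliar with the two flavors, here is a minimal standalone illustration of a named POSIX semaphore (nothing PostgreSQL-specific; the semaphore name is arbitrary). Per the comment quoted above, each such handle costs the process a file descriptor for its lifetime, which is why System V semaphores are preferred where the OS supports them:

#include <fcntl.h>
#include <semaphore.h>
#include <stdio.h>

int
main(void)
{
	/* create-or-open a named semaphore with initial count 1 */
	sem_t	   *s = sem_open("/pg_demo_sem", O_CREAT, 0600, 1);

	if (s == SEM_FAILED)
	{
		perror("sem_open");
		return 1;
	}

	sem_wait(s);				/* P: the "lock" operation */
	sem_post(s);				/* V: the "unlock" operation */

	sem_close(s);
	sem_unlink("/pg_demo_sem");
	return 0;
}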
[HACKERS] serializable read only deferrable
I'm reviving the discussion on the subject topic because I just had an epiphany which makes it seem simple to implement. The concept of this is that if you start a SERIALIZABLE READ ONLY transaction in an SSI environment when certain conditions are true, it doesn't need to acquire predicate locks or test for rw-conflicts. This would be particularly useful for pg_dump or large reports, as it would allow them to read data which was guaranteed to be consistent with later states of the database without risking serialization failure or contributing to the failure of other transactions. They should also run a bit faster without the overhead of locking and checking.

Having completed the switch from a pair of rw-conflict pointers per serializable transaction to a list of rw-conflicts, I'm working through the more aggressive transaction clean-up strategies thereby allowed in preparation for the graceful degradation code. Along the way, I noticed how easy it is to allow a READ ONLY transaction to opt out of predicate locking and conflict detection when it starts with no concurrent non READ ONLY transactions active, or even to remove READ ONLY transactions from those activities when such a state is reached during the execution of READ ONLY transactions; while properly recognizing the *additional* conditions under which this would be valid is rather painful. (Those additional conditions being that no concurrent non-read-only transaction may overlap a committed non-read-only transaction which wrote data and committed before the read-only transaction acquired its snapshot.)

The simple way to implement SERIALIZABLE READ ONLY DEFERRABLE under SSI would be to have each non-read-only serializable transaction acquire a heavyweight lock which can coexist with other locks at the same level (SHARE looks good) on some common object and hold that for the duration of the transaction, while a SERIALIZABLE READ ONLY DEFERRABLE transaction would need to acquire a conflicting lock (EXCLUSIVE looks good) before it could acquire a snapshot, and release the lock immediately after acquiring the snapshot.

For these purposes, it appears that advisory locks could work, as long as the lock release does not wait for the end of the transaction (which it doesn't, if I'm reading the docs right) and as long as I can pick a lock ID which won't conflict with other uses. That latter part is the only iffy aspect of the whole thing that I can see. Of course, I could add a third lock method, but that seems like overkill to be able to get one single lock.

Since I'm already allowing a transaction to opt out of predicate locking and conflict detection if there are no non-read-only transactions active when it acquires its snapshot, the work needed within the SSI code is pretty trivial; it's all in adding the DEFERRABLE word as a non-standard extension to SET TRANSACTION et al, and finding a heavyweight lock to use.

Thoughts?

-Kevin

-- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
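For what it's worth, the locking dance can be sketched from the client side with today's advisory lock functions. This is an illustration of the idea only: key 42 is an arbitrary stand-in for the yet-to-be-chosen lock ID, the connection string is elided, and the real implementation would of course live in the server, not in libpq:

#include <stdio.h>
#include <stdlib.h>
#include <libpq-fe.h>

static void
run(PGconn *conn, const char *sql)
{
	PGresult   *res = PQexec(conn, sql);

	if (PQresultStatus(res) != PGRES_COMMAND_OK &&
		PQresultStatus(res) != PGRES_TUPLES_OK)
	{
		fprintf(stderr, "%s failed: %s", sql, PQerrorMessage(conn));
		PQclear(res);
		PQfinish(conn);
		exit(1);
	}
	PQclear(res);
}

int
main(void)
{
	PGconn	   *conn = PQconnectdb("");	/* connection parameters elided */

	if (PQstatus(conn) != CONNECTION_OK)
	{
		fprintf(stderr, "%s", PQerrorMessage(conn));
		return 1;
	}

	/*
	 * Writers would bracket their transactions with
	 * pg_advisory_lock_shared(42) / pg_advisory_unlock_shared(42).  The
	 * DEFERRABLE reader takes the conflicting exclusive lock, so it
	 * blocks until no such writer is active, grabs its snapshot, and
	 * releases the lock immediately.  Session-level advisory locks do
	 * not wait for end of transaction, which is the property needed.
	 */
	run(conn, "SELECT pg_advisory_lock(42)");	/* waits out writers */
	run(conn, "BEGIN TRANSACTION ISOLATION LEVEL SERIALIZABLE READ ONLY");
	run(conn, "SELECT 1");		/* first statement acquires the snapshot */
	run(conn, "SELECT pg_advisory_unlock(42)");

	/* ... run the long report or dump here ... */

	run(conn, "COMMIT");
	PQfinish(conn);
	return 0;
}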
Re: [HACKERS] FK's to refer to rows in inheritance child
On 12/04/2010 07:12 PM, Robert Haas wrote:
> I wouldn't necessarily be opposed to official topic branches at some point in the future, but I think it's premature to speculate about whether it'd be useful here.

I'd need a lot of convincing if it imposed an extra burden on people like Tom. The only way I could see it working is if some committer took ownership of the topic branch and guaranteed to keep it pretty much in sync with the master branch.

> What is needed right now is design work, not code.

Indeed. In this case I don't think we even have agreement on the features, let alone how they might work.

cheers

andrew

-- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] what are clog and xlog?
flyusa2010 fly wrote: > clog is for commit log (very unclear for me ...) It is an important part of tracking tuple visibility. You might want to start with this page: http://wiki.postgresql.org/wiki/Hint_Bits -Kevin -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] new patch of MERGE (merge_204) & a question about duplicated ctid
On Sat, Dec 04, 2010 at 09:27:52PM +0800, Boxuan Zhai wrote: > Dear Greg, > > I have updated the MERGE patch for two main problems. Please attach the actual patch :) Cheers, David. -- David Fetter http://fetter.org/ Phone: +1 415 235 3778 AIM: dfetter666 Yahoo!: dfetter Skype: davidfetter XMPP: david.fet...@gmail.com iCal: webcal://www.tripit.com/feed/ical/people/david74/tripit.ics Remember to vote! Consider donating to Postgres: http://www.postgresql.org/about/donate -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
[HACKERS] Using different semaphore for locking
I found that postgres uses different semaphore system calls on different operating systems. For example, I found that on Linux, System V semaphores (semop etc.) are used to implement locking, while on Darwin, POSIX semaphores (sem_wait, sem_post etc.) are used. Linux and Darwin support both System V and POSIX semaphores, so I'm wondering why postgres implements locking using different types of semaphore system calls. Is there any performance or semantic issue on different OSes? Thanks!
[HACKERS] what are clog and xlog?
Hi, folks, I'm a newbie to postgres. I'm confused about xlog and clog. To my initial understanding, xlog is the periodic checkpoint log for data, while clog is the commit log (very unclear to me ...). Thanks!
Re: [HACKERS] disk caching for writing log
Thanks for your reply. Yes, I mean the disk may lie to the OS.

On Fri, Dec 3, 2010 at 12:14 PM, Stefan Kaltenbrunner wrote:
> On 12/03/2010 06:43 PM, Heikki Linnakangas wrote:
>> On 03.12.2010 13:49, flyusa2010 fly wrote:
>>> When writing log, dbms should synchronously flush log to disk. I'm wondering, if it is possible that the logs are in disk cache, while the control is returned to dbms again, so dbms thinks logs are persistent on disk. In this case, if the disk fails, then there's incorrectness for dbms log writing, because the log is not persistent, but dbms considers it is persistent!
>>
>> I have no idea what you mean. The method we use to flush the WAL to disk should not be fallible to such failures; we wait for fsync() or fdatasync() to return before we assume the logs are safely on disk. If you can elaborate what you mean by "control is returned to dbms", maybe someone can explain why in more detail.
>
> I think he is referring to the plain old "the disk/os is lying about whether the data really made it to stable storage" issue (especially with the huge local caches on modern disks) - if you have such a disk and/or an OS with broken barrier support you are doomed.
>
> Stefan
Re: [HACKERS] SQL/MED - file_fdw
On 12/04/2010 11:11 PM, Itagaki Takahiro wrote: On Sun, Dec 5, 2010 at 07:24, Andrew Dunstan wrote: Looking at file_parser.c, it seems to be largely taken from copy.c. Wouldn't it be better to call those functions, or refactor them so they are callable if necessary? We could export private functions and structs in copy.c, though details of the implementation should be kept in copy.c. How about splitting the file_fdw patch into two pieces? One exports the copy functions from the core, and another implements file_fdw using the infrastructure. Yes please. cheers andrew -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Suggesting a libpq addition
Hey hackers,

A varargs exec is useful only when the programmer calls it directly. It is useless when libpq is used to create a more flexible high-level library (e.g., for C++). PQexecParams (PQexecPrepared) are good for that.

What about auto-reconnect? There is PQreset already, and PG_CONNECTION_OK (_BAD) to manage the reconnection strategy. If the connection became BAD, how many times would the suggested function try to reconnect? IMO, auto-reconnection is magic...

2010/12/5 Magnus Hagander
> On Sun, Dec 5, 2010 at 11:57, Heikki Linnakangas
> wrote:
>> On 05.12.2010 12:10, Magnus Hagander wrote:
>>>
>>> On Sun, Dec 5, 2010 at 10:22, Marc Balmer wrote:
>>>>
>>>> I am suggesting adding a function to libpq:
>>>>
>>>> PGresult *PQvexec(PGconn *conn, const char *fmt, ...);
>>>>
>>>> It behaves similar to PQexec, but it allows for printf style varargs and
>>>
>>> How is that not a horrible idea, compared to using PQexecParams()? You
>>> have to remember to do all your escaping and things manually, whereas
>>> PQexecParams() does it automatically.
>>
>> A varargs version of PQexecParams() would be handy, though. Imagine being
>> able to do:
>>
>> PQexecVParams("SELECT * FROM mytable WHERE foo = $1 AND bar = $2",
>> foovar, barvar);
>>
>> instead of constructing an array for the variables.
>
> I agree, that sounds a lot more useful. And it definitely needs to be
> split off from the auto-reconnection stuff.
>
> --
> Magnus Hagander
> Me: http://www.hagander.net/
> Work: http://www.redpill-linpro.com/
>
> -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
> To make changes to your subscription:
> http://www.postgresql.org/mailpref/pgsql-hackers

-- // Dmitriy.
Re: [HACKERS] Suggesting a libpq addition
On Sun, Dec 5, 2010 at 11:57, Heikki Linnakangas wrote:
> On 05.12.2010 12:10, Magnus Hagander wrote:
>>
>> On Sun, Dec 5, 2010 at 10:22, Marc Balmer wrote:
>>>
>>> I am suggesting adding a function to libpq:
>>>
>>> PGresult *PQvexec(PGconn *conn, const char *fmt, ...);
>>>
>>> It behaves similar to PQexec, but it allows for printf style varargs and
>>
>> How is that not a horrible idea, compared to using PQexecParams()? You
>> have to remember to do all your escaping and things manually, whereas
>> PQexecParams() does it automatically.
>
> A varargs version of PQexecParams() would be handy, though. Imagine being
> able to do:
>
> PQexecVParams("SELECT * FROM mytable WHERE foo = $1 AND bar = $2", foovar,
> barvar);
>
> instead of constructing an array for the variables.

I agree, that sounds a lot more useful. And it definitely needs to be split off from the auto-reconnection stuff.

-- Magnus Hagander Me: http://www.hagander.net/ Work: http://www.redpill-linpro.com/

-- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Suggesting a libpq addition
Am 05.12.2010 um 11:57 schrieb Heikki Linnakangas :
> On 05.12.2010 12:10, Magnus Hagander wrote:
>> On Sun, Dec 5, 2010 at 10:22, Marc Balmer wrote:
>>> I am suggesting adding a function to libpq:
>>>
>>> PGresult *PQvexec(PGconn *conn, const char *fmt, ...);
>>>
>>> It behaves similar to PQexec, but it allows for printf style varargs and
>>
>> How is that not a horrible idea, compared to using PQexecParams()? You
>> have to remember to do all your escaping and things manually, whereas
>> PQexecParams() does it automatically.
>
> A varargs version of PQexecParams() would be handy, though. Imagine being
> able to do:
>
> PQexecVParams("SELECT * FROM mytable WHERE foo = $1 AND bar = $2", foovar,
> barvar);
>
> instead of constructing an array for the variables.

Yes, indeed. While the suggested implementation relies on the caller to do all escaping (a bad idea...), the ease of use of a printf-like function with the security of PQexecParams would be nice.

I'd say forget about my first suggestion, I will spend a bit more time on a better approach. (And at the same time remove the connection reset code.)

> -- Heikki Linnakangas
> EnterpriseDB http://www.enterprisedb.com
>
> -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
> To make changes to your subscription:
> http://www.postgresql.org/mailpref/pgsql-hackers

-- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Suggesting a libpq addition
On 05.12.2010 12:10, Magnus Hagander wrote: On Sun, Dec 5, 2010 at 10:22, Marc Balmer wrote: I am suggesting adding a function to libpq: PGresult *PQvexec(PGconn *conn, const char *fmt, ...); It behaves similar to PQexec, but it allows for printf style varargs and How is that not a horrible idea, compared to using PQexecParams()? You have to remember to do all your escaping and things manually, whereas PQexecParams() does it automatically. A varargs version of PQexecParams() would be handy, though. Imagine being able to do: PQexecVParams("SELECT * FROM mytable WHERE foo = $1 AND bar = $2", foovar, barvar); instead of constructing an array for the variables. -- Heikki Linnakangas EnterpriseDB http://www.enterprisedb.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
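Such a wrapper could be prototyped outside libpq along these lines. This is a sketch only: it assumes every parameter is passed as a text string and that the caller terminates the argument list with NULL (no function of this name exists in libpq today):

#include <stdarg.h>
#include <stdlib.h>
#include <libpq-fe.h>

/*
 * Hypothetical varargs front end for PQexecParams().  Escaping is
 * unnecessary because the values travel out-of-line as parameters,
 * which is the whole point of building on PQexecParams().
 */
PGresult *
PQexecVParams(PGconn *conn, const char *query, ...)
{
	va_list		ap,
				ap2;
	int			nparams = 0;
	int			i;
	const char **values;
	PGresult   *res;

	/* first pass: count the parameters */
	va_start(ap, query);
	va_copy(ap2, ap);
	while (va_arg(ap2, const char *) != NULL)
		nparams++;
	va_end(ap2);

	/* second pass: collect them */
	values = malloc((nparams > 0 ? nparams : 1) * sizeof(const char *));
	if (values == NULL)
	{
		va_end(ap);
		return NULL;
	}
	for (i = 0; i < nparams; i++)
		values[i] = va_arg(ap, const char *);
	va_end(ap);

	/* all values sent as text; the server infers the parameter types */
	res = PQexecParams(conn, query, nparams, NULL, values, NULL, NULL, 0);
	free(values);
	return res;
}

Usage would look like PQexecVParams(conn, "SELECT * FROM mytable WHERE foo = $1 AND bar = $2", foovar, barvar, NULL); the trailing NULL is the price of varargs in C, since there is no portable way to count the arguments otherwise.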
Re: [HACKERS] Suggesting a libpq addition
On Sun, Dec 5, 2010 at 10:22, Marc Balmer wrote:
> I am suggesting adding a function to libpq:
>
> PGresult *PQvexec(PGconn *conn, const char *fmt, ...);
>
> It behaves similar to PQexec, but it allows for printf style varargs and

How is that not a horrible idea, compared to using PQexecParams()? You have to remember to do all your escaping and things manually, whereas PQexecParams() does it automatically.

> does connection re-establishing if the connection fails (it can be
> discussed if this is already too much magic, maybe remove this part). It
> has been carefully designed to handle memory the right way. We use this
> since a long time.

It certainly doesn't belong in a function like that - and in fact, I think reconnection has to be handled at a different layer anyway. What if the connection was in the middle of a transaction? Silently roll it back without letting the app know, since it switched to a new one?

-- Magnus Hagander Me: http://www.hagander.net/ Work: http://www.redpill-linpro.com/

-- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Suggesting a libpq addition
2010/12/5 Marc Balmer :
> I am suggesting adding a function to libpq:
>
> PGresult *PQvexec(PGconn *conn, const char *fmt, ...);
>
> It behaves similar to PQexec, but it allows for printf style varargs and
> does connection re-establishing if the connection fails (it can be
> discussed if this is already too much magic, maybe remove this part). It
> has been carefully designed to handle memory the right way. We use this
> since a long time.
>
> What do you think?

It depends on the implementation, but it can be a great security hole - see SQL injection.

Regards

Pavel Stehule

-- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
[HACKERS] Suggesting a libpq addition
I am suggesting adding a function to libpq:

PGresult *PQvexec(PGconn *conn, const char *fmt, ...);

It behaves similar to PQexec, but it allows for printf style varargs and does connection re-establishing if the connection fails (it can be discussed if this is already too much magic, maybe remove this part). It has been carefully designed to handle memory the right way. We use this since a long time.

What do you think?

/*
 * Copyright (c) 2008, 2009, 2010 Marc Balmer
 * All rights reserved.
 *
 * Redistribution and use in source and binary forms, with or without
 * modification, are permitted provided that the following conditions
 * are met:
 *
 * 1. Redistributions of source code must retain the above copyright
 *    notice, this list of conditions and the following disclaimer.
 * 2. Author's name may not be used to endorse or promote products derived
 *    from this software without specific prior written permission.
 *
 * THIS SOFTWARE IS PROVIDED BY THE AUTHOR ``AS IS'' AND ANY EXPRESS OR
 * IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED
 * WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE
 * DISCLAIMED. IN NO EVENT SHALL THE AUTHOR BE LIABLE FOR ANY DIRECT,
 * INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES
 * (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR
 * SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION)
 * HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT,
 * STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING
 * IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE
 * POSSIBILITY OF SUCH DAMAGE.
 */

/* PostgreSQL utility functions */

#include <stdio.h>
#include <stdarg.h>
#include <stdlib.h>
#include <unistd.h>

#include <libpq-fe.h>

PGresult *
PQvexec(PGconn *conn, const char *fmt, ...)
{
	va_list ap;
	PGresult *res;
	int len, ntries;
	char *sql;

	/* try to re-establish a lost connection before doing anything */
	for (ntries = 0; PQstatus(conn) == CONNECTION_BAD && ntries < 10;
	    ntries++) {
		PQreset(conn);
		if (PQstatus(conn) == CONNECTION_BAD)
			sleep(1);
	}
	if (PQstatus(conn) == CONNECTION_BAD) {
		res = PQmakeEmptyPGresult(conn, PGRES_FATAL_ERROR);
		goto finish;
	}

	va_start(ap, fmt);
	len = vasprintf(&sql, fmt, ap);
	va_end(ap);

	if (len == -1) {
		res = PQmakeEmptyPGresult(conn, PGRES_FATAL_ERROR);
		goto finish;
	}

	res = PQexec(conn, sql);

	/* if the connection went away mid-query, reconnect and retry */
	for (ntries = 0; PQstatus(conn) == CONNECTION_BAD && ntries < 10;
	    ntries++) {
		PQreset(conn);
		if (PQstatus(conn) == CONNECTION_BAD)
			sleep(1);
		else {
			PQclear(res);
			res = PQexec(conn, sql);
		}
	}
	free(sql);
finish:
	return res;
}

-- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] profiling connection overhead
On 12/01/2010 05:32 AM, Jeff Janes wrote:

On 11/28/10, Robert Haas wrote:

In a close race, I don't think we should get bogged down in micro-optimization here, both because micro-optimizations may not gain much and because what works well on one platform may not do much at all on another. The more general issue here is what to do about our high backend startup costs. Beyond trying to recycle backends for new connections, as I've previously proposed and with all the problems it entails,

Is there a particular discussion of that matter you could point me to?

the only thing that looks promising here is to try to somehow cut down on the cost of populating the catcache and relcache, not that I have a very clear idea how to do that. This has to be a soluble problem because other people have solved it.

Oracle's backend start-up time seems to be way higher than PG's. Their main solution is something that is fundamentally a built-in connection pooler with some bells and whistles. I'm not sure which "other people" you had in mind--Oracle is generally the one that pops to my mind.

To some degree we're a victim of our own flexible and extensible architecture here, but I find it pretty unsatisfying to just say, OK, well, we're slow.

What about "well OK, we have PGbouncer"? Are there fixable shortcomings that it has which could make the issue less of an issue?

well I would very much like to see an integrated pooler in postgresql - pgbouncer is a very nice piece of software (and might even be a base for an integrated bouncer), but being not closely tied to the backend you lose a lot. One of the more obvious examples is that now that we have no flatfile copy of pg_authid you have to use cruel hacks like:

http://www.depesz.com/index.php/2010/12/04/auto-refreshing-password-file-for-pgbouncer/

to get "automatic" management of roles. There are some other drawbacks as well:

* no coordination of restarts/configuration changes between the cluster and the pooler
* you have two pieces of config files to configure your pooling settings (having all that available say in a catalog in pg would be awesome)
* you lose all of the advanced authentication features of pg (because all connections need to go through the pooler) and also ip-based stuff
* no SSL support (in the case of pgbouncer)
* complexity in resetting backend state (we added some support for that through explicit SQL level commands in recent releases but it still is a hazard)

Stefan

-- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers