Re: [HACKERS] Sync Rep v17
On Fri, 2011-02-18 at 20:45 -0500, Robert Haas wrote:
> On Fri, Feb 18, 2011 at 7:06 PM, Simon Riggs si...@2ndquadrant.com wrote:
> > The patch is a very light touch on a few areas of code, plus a chunk
> > of specific code, all on the master side. Pretty straightforward,
> > really. I'm sure problems will be found; it's not long since I
> > completed this. Thanks to Daniel Farina for your help with patch
> > assembly.
>
> This looks like it's in far better shape than the previous version.
> Thanks to you and Dan for working on it.
>
> As you have this coded, enabling synchronous_replication forces all
> transactions to commit synchronously. This means that, when
> synchronous_replication=on, you get synchronous_commit=on behavior
> even if you have synchronous_commit=off. I'd prefer to see us go the
> other way, and say that you can only get synchronous replication when
> you're also getting synchronous commit.

First, we should be clear to explain that you are referring to the fact
that the request

    synchronous_commit = off
    synchronous_replication = on

makes no sense in the way the replication system is currently designed,
even though it is a wish-list item to make it work in 9.2+.

Since that combination makes no sense, we need to decide how we will
react when it is requested. I think fail-safe is the best way. We
should make the default the safe way and allow performance options in
non-default paths. Hence the above combination of options is taken in
the patch to be the same as

    synchronous_commit = on
    synchronous_replication = on

If you want performance, you can still get it with

    synchronous_commit = off
    synchronous_replication = off

which does have meaning.

So the way the patch is coded makes the most sense for a safety
feature. I think it does need to be documented also.

Must fly now...
--
Simon Riggs                   http://www.2ndQuadrant.com/books/
PostgreSQL Development, 24x7 Support, Training and Services

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Sync Rep v17
On Fri, 2011-02-18 at 21:39 -0500, Bruce Momjian wrote:
> Simon Riggs wrote:
> > Most parameters are set on the primary. Set on the primary:
> >
> >   synchronous_standby_names = 'node1, node2, node3'
> >
> > which means that whichever of those standbys connects first will
> > become the main synchronous standby. Servers arriving later will be
> > potential standbys ("standby standbys" doesn't sound right...).
> > synchronous_standby_names can change at reload.
> >
> > Currently, the standby_name is the application_name parameter set in
> > the primary_conninfo.
> >
> > When we set this for a client, or in postgresql.conf on the primary:
> >
> >   synchronous_replication = on
> >
> > then we will wait at commit until the synchronous standby has
> > reached the WAL location of our commit point. If the current
> > synchronous standby dies then one of the other standbys will take
> > over. (I think it would be a great idea to make the list a priority
> > order, but I haven't had time to code that.)
> >
> > If none of the standbys are available, then we don't wait at all if
> > allow_standalone_primary is set. allow_standalone_primary can change
> > at reload.
>
> Are we going to allow a command to be run when these things happen,
> like archive_command does, or is that something for 9.2?

Not really sure which events you're speaking of, sorry. As I write, I
don't see a place or a reason to run a command, but that might change
with a longer explanation of what you're thinking. I wouldn't rule out
adding something like that in this release, but I'm not around for the
next week to add it.

--
Simon Riggs                   http://www.2ndQuadrant.com/books/
PostgreSQL Development, 24x7 Support, Training and Services
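[Pulled together, the setup Simon describes looks roughly like this. A
sketch based only on this message; allow_standalone_primary and the
exact parameter spellings are as given in this thread and may differ in
the final committed patch.]

```
# postgresql.conf on the primary
synchronous_standby_names = 'node1, node2, node3'  # first to connect becomes
                                                   # the synchronous standby
synchronous_replication = on        # commit waits for the sync standby's
                                    # WAL position
allow_standalone_primary = on       # don't wait if no standby is available

# Each standby identifies itself via application_name in primary_conninfo:
# primary_conninfo = 'host=primary port=5432 application_name=node1'
```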
Re: [HACKERS] Sync Rep v17
On Fri, 2011-02-18 at 20:45 -0500, Robert Haas wrote:
> On the other hand, I see no particular harm in leaving the option in
> there either, though I definitely think the default should be changed
> to -1.

The default setting should be to *not* freeze up if you lose the
standby. That behaviour unexpectedly leads to an effective server-down
situation, rather than 2 minutes of slow running. This is supposed to
be High Availability software, so it will be bad for us if we allow
that. We've discussed this many times already. The people that wish to
wait forever have their wait-forever option, but let's not force it on
everyone.

--
Simon Riggs                   http://www.2ndQuadrant.com/books/
PostgreSQL Development, 24x7 Support, Training and Services
Re: [HACKERS] Sync Rep v17
Simon Riggs wrote:
> On Fri, 2011-02-18 at 21:39 -0500, Bruce Momjian wrote:
> > Are we going to allow a command to be run when these things happen,
> > like archive_command does, or is that something for 9.2?
>
> Not really sure which events you're speaking of, sorry. As I write, I
> don't see a place or a reason to run a command, but that might change
> with a longer explanation of what you're thinking. I wouldn't rule
> out adding something like that in this release, but I'm not around
> for the next week to add it.

Sorry. I was thinking of allowing a command to alert an administrator
when we switch standby machines, or if we can't do synchronous standby
any longer. I assume we put a message in the logs, and the admin could
have a script that monitors that, but I thought a pager-like command
would be helpful. Though I don't think we do that in any other case, so
maybe it is overkill. These are all optional ideas.

Also, I like that you are using application_name here.
--
Bruce Momjian  br...@momjian.us  http://momjian.us
EnterpriseDB   http://enterprisedb.com
+ It's impossible for everything to be true. +
Re: [HACKERS] pl/python invalidate functions with composite arguments
On Wed, 2011-02-09 at 10:09 +0100, Jan Urbański wrote:
> On 27/01/11 22:42, Jan Urbański wrote:
> > On 23/12/10 14:50, Jan Urbański wrote:
> > > Here's a patch implementing proper invalidation of functions that
> > > have composite-type arguments after the type changes, as
> > > mentioned in
> > > http://archives.postgresql.org/pgsql-hackers/2010-12/msg01991.php.
> > > It's an incremental patch on top of the plpython-refactor patch
> > > sent earlier.
> >
> > Updated to master.
>
> Again.

Committed.
Re: [HACKERS] pg_basebackup and wal streaming
On Fri, Feb 18, 2011 at 20:33, Bruce Momjian br...@momjian.us wrote:
> Magnus Hagander wrote:
> > Better late than never (or?), here's the final cleanup of
> > pg_streamrecv for moving into the main distribution, per discussion
> > back in late Dec or early Jan. It also includes the
> > stream-logs-in-parallel-to-backup part that was not completed for
> > pg_basebackup.
> >
> > Other than that, the only changes to pg_basebackup are the moving
> > of a couple of functions into streamutil.c to make them usable from
> > both, and the progress format output fix Fujii-san mentioned
> > recently.
> >
> > Should be complete except for Win32 support (needs a thread/fork
> > thing for the background streaming feature; shouldn't be too hard,
> > and I guess that falls on me anyway..) and the reference
> > documentation. And with no feedback to my question here
> > (http://archives.postgresql.org/pgsql-hackers/2011-02/msg00805.php),
> > I went with the duplicate-the-macros-that-are-needed path, to avoid
> > loading postgres.h.
> >
> > Yes, I realize that this is far too late in the CF process really,
> > but I wanted to post it anyway... If it's too late to be
> > acceptable, it should be possible to maintain this outside the main
> > repository until 9.2, since it only changes frontend binaries. So
> > I'm not actually going to put it on the CF page unless someone else
> > says that's a good idea, to at least share the blame from Robert ;)
>
> Well, if you are going to stand behind it, the CF is not a
> requirement and you can apply it.

I am, but I'm posting it here because I'd appreciate some review before
it goes in, to protect our users from my bugs :-)

--
Magnus Hagander
Me: http://www.hagander.net/
Work: http://www.redpill-linpro.com/
Re: [HACKERS] Debian readline/libedit breakage
On Fri, Feb 18, 2011 at 02:35:42PM -0500, Bruce Momjian wrote:
> > /* Get the OpenSSL structure associated with a connection. Returns
> >  * NULL for unencrypted connections or if any other TLS library is
> >  * in use. */
> > extern void *PQgetssl(PGconn *conn);
> >
> > We are under no compulsion to emulate OpenSSL if we switch to
> > another library. The design intent is that we'd provide a separate
> > function (PQgetnss?) and callers that know how to use that library
> > would call that function. If they don't, it's not our problem.
>
> Who uses this? ODBC?

There are a few users, that I can find anyway. psql is one. It uses
this to get information about the connection; pgadmin does it also, for
similar reasons I guess. Adding a separate function for each SSL
library seems odd. It would mean that psql would need the headers for
every possible SSL library, because it (in theory) doesn't know which
library might be used at runtime.

ODBC uses it as well. It really uses it for communication. As far as
Google Code Search can tell, it's the only one that does. But if the
intention is to do it by adding new functions, we can, and let the ODBC
guys sort it out (ERROR: Using incompatible SSL connection).

Have a nice day,
--
Martijn van Oosterhout   klep...@svana.org   http://svana.org/kleptog/
Patriotism is when love of your own people comes first; nationalism,
when hate for people other than your own comes first.
                                      - Charles de Gaulle
Re: [HACKERS] Debian readline/libedit breakage
On Sat, Feb 19, 2011 at 07:42:20PM +0100, Martijn van Oosterhout wrote:
> ODBC uses it as well. It really uses it for communication. As far as
> Google Code Search can tell, it's the only one that does. But if the
> intention is to do it by adding new functions, we can, and let the
> ODBC guys sort it out (ERROR: Using incompatible SSL connection).

Actually, it would just break. There is currently no way to distinguish
between no-SSL and not-OpenSSL, so ODBC would use PQgetssl() to
determine the SSL status, get NULL, assume there is no SSL active, and
subsequently break when trying to communicate via the sockets.

Frankly, I think that this solution doesn't meet the usual high
standards of this project...

Have a nice day,
--
Martijn van Oosterhout   klep...@svana.org   http://svana.org/kleptog/
Patriotism is when love of your own people comes first; nationalism,
when hate for people other than your own comes first.
                                      - Charles de Gaulle
Re: [HACKERS] Debian readline/libedit breakage
On 02/19/2011 01:42 PM, Martijn van Oosterhout wrote:
> On Fri, Feb 18, 2011 at 02:35:42PM -0500, Bruce Momjian wrote:
> > > /* Get the OpenSSL structure associated with a connection.
> > >  * Returns NULL for unencrypted connections or if any other TLS
> > >  * library is in use. */
> > > extern void *PQgetssl(PGconn *conn);
> > >
> > > We are under no compulsion to emulate OpenSSL if we switch to
> > > another library. The design intent is that we'd provide a
> > > separate function (PQgetnss?) and callers that know how to use
> > > that library would call that function. If they don't, it's not
> > > our problem.
> >
> > Who uses this? ODBC?
>
> There are a few users, that I can find anyway. psql is one. It uses
> this to get information about the connection; pgadmin does it also,
> for similar reasons I guess. Adding a separate function for each SSL
> library seems odd. It would mean that psql would need the headers for
> every possible SSL library, because it (in theory) doesn't know which
> library might be used at runtime.

I don't see why. If you plug in a libpq that was compiled against, say,
NSS under a psql that's expecting OpenSSL, you'll get a null back
instead of a pointer to an SSL object, but then that would be a silly
thing to do.

> ODBC uses it as well. It really uses it for communication. As far as
> Google Code Search can tell, it's the only one that does. But if the
> intention is to do it by adding new functions, we can, and let the
> ODBC guys sort it out (ERROR: Using incompatible SSL connection).

Yeah. It might make sense to have a library function that tells the
client what SSL library is being used.

cheers

andrew
Re: [HACKERS] Debian readline/libedit breakage
On Fri, Feb 18, 2011 at 10:42:20AM -0500, Andrew Dunstan wrote:
> Could we provide an abstraction layer over whatever SSL library is in
> use with things like read/write/poll? Maybe that's what you had in
> mind for the passthrough mode.

The suggested interface was as follows. It basically exposes the
read/write interface that libpq itself uses. Whether it's enough for
all uses I don't know, but it was extensible. From the patch:

+ /* Get data about current TLS connection */
+ extern PGresult *PQgettlsinfo(PGconn *conn);
+
  /* Tell libpq whether it needs to initialize OpenSSL */
  extern void PQinitSSL(int do_init);
+
+ /* Tell libpq we're taking over the connection. After this, no normal
+  * queries may be sent anymore. When finished you may close the
+  * connection. */
+ typedef PostgresPollingStatusType (*pq_read_func)(PGconn *conn,
+                                                   void *buf, int *len);
+ typedef PostgresPollingStatusType (*pq_write_func)(PGconn *conn,
+                                                    const void *buf, int *len);
+ typedef int (*pq_pending_func)(PGconn *conn);
+
+ typedef struct {
+     int len;            /* Length of this structure, so users may
+                          * determine if the info they require is
+                          * there. For backward compatibility, new
+                          * members can only be added to the end. */
+     pq_read_func read;
+     pq_write_func write;
+     pq_pending_func pending;
+
+     /* char *ssllibname;  Need not yet demonstrated. */
+     /* void *sslptr; */
+ } PQpassthrough;
+
+ /* The pointer returned in state must be freed with PQfreemem() */
+ extern int PQsetPassthrough(PGconn *conn, PQpassthrough **state);

--
Martijn van Oosterhout   klep...@svana.org   http://svana.org/kleptog/
Patriotism is when love of your own people comes first; nationalism,
when hate for people other than your own comes first.
                                      - Charles de Gaulle
Re: [HACKERS] Debian readline/libedit breakage
On Sat, 2011-02-19 at 13:55 -0500, Andrew Dunstan wrote:
> If you plug in a libpq that was compiled against, say, NSS under a
> psql that's expecting OpenSSL, you'll get a null back instead of a
> pointer to an SSL object, but then that would be a silly thing to do.

Not so silly if you consider partial upgrades.
Re: [HACKERS] Sync Rep v17
On 2/18/11 4:06 PM, Simon Riggs wrote:
> v17 implements what I believe to be the final set of features for
> sync rep. This one I'm actually fairly happy with. It can be enjoyed
> best at DEBUG3.

Yes, but what wine do I serve with it? ;-)

--
Josh Berkus
PostgreSQL Experts Inc.
http://www.pgexperts.com
Re: [HACKERS] Snapshot synchronization, again...
On Fri, 2011-02-18 at 16:57 -0300, Alvaro Herrera wrote:
> 2. Is MD5 the most appropriate digest for this? If you need a
> cryptographically secure hash, do we need something stronger? If not,
> why not just use hash_any?

MD5 is probably more appropriate than hash_any, because the latter is
optimized for speed and collision avoidance and doesn't have a
guaranteed external format. The only consideration against MD5 might be
that it would make us look quite lame. We should probably provide
built-in SHA1 and SHA2 functions for this and other reasons.
[HACKERS] Fix for fuzzystrmatch
Hackers,

I found two issues in the fuzzystrmatch contrib module:

1) Incorrect s_data shift in the Levenshtein calculation with a
threshold with multibyte characters: the i index was used instead of
start_column.

2) Missing dependency of fuzzystrmatch.o on levenshtein.c.

A patch is attached.

--
With best regards,
Alexander Korotkov.

*** a/contrib/fuzzystrmatch/Makefile
--- b/contrib/fuzzystrmatch/Makefile
***************
*** 16,18 ****
  top_builddir = ../..
  include $(top_builddir)/src/Makefile.global
  include $(top_srcdir)/contrib/contrib-global.mk
  endif
--- 16,21 ----
  top_builddir = ../..
  include $(top_builddir)/src/Makefile.global
  include $(top_srcdir)/contrib/contrib-global.mk
  endif
+
+ fuzzystrmatch.o: fuzzystrmatch.c levenshtein.c
+
*** a/contrib/fuzzystrmatch/levenshtein.c
--- b/contrib/fuzzystrmatch/levenshtein.c
***************
*** 377,383 **** levenshtein_internal(text *s, text *t,
  		prev[start_column] = max_d + 1;
  		curr[start_column] = max_d + 1;
  		if (start_column != 0)
! 			s_data += s_char_len != NULL ? s_char_len[i - 1] : 1;
  		start_column++;
  	}
--- 377,383 ----
  		prev[start_column] = max_d + 1;
  		curr[start_column] = max_d + 1;
  		if (start_column != 0)
! 			s_data += s_char_len != NULL ? s_char_len[start_column - 1] : 1;
  		start_column++;
  	}
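[For checking results against the C implementation, here is a plain
character-based Levenshtein distance in Python. This is a reference
sketch, not the patched threshold-banded algorithm; by operating on
decoded characters it sidesteps the per-character byte-length
bookkeeping that the fix above corrects.]

```python
# Reference Levenshtein distance over characters (not bytes). The C code in
# contrib/fuzzystrmatch must advance its byte pointer (s_data) by the byte
# length of each pruned *character* when a threshold cuts off leading
# columns; indexing that length array with the wrong variable is exactly
# the bug fixed in the patch above.
def levenshtein(s: str, t: str) -> int:
    prev = list(range(len(t) + 1))
    for i, sc in enumerate(s, start=1):
        curr = [i]
        for j, tc in enumerate(t, start=1):
            cost = 0 if sc == tc else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution
        prev = curr
    return prev[-1]

assert levenshtein("kitten", "sitting") == 3
# A multibyte character counts as a single edit, not one edit per byte:
assert levenshtein("naïve", "naive") == 1
```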
Re: [HACKERS] Proposal: collect frequency statistics for arrays
Thanks for the feedback on my proposal. OK, I'll write it as a separate
function. After that I'm going to look at whether there is a way to
combine them without a kludge. If I don't find such a way, then I'll
propose a patch with a separate function.

--
With best regards,
Alexander Korotkov.
Re: [HACKERS] SSI bug?
Heikki Linnakangas wrote:
> On 14.02.2011 20:10, Kevin Grittner wrote:
> > Promotion of the lock granularity on the prior tuple is where we
> > have problems. If the two tuple versions are in separate pages then
> > the second UPDATE could miss the conflict. My first thought was to
> > fix that by requiring promotion of a predicate lock on a tuple to
> > jump straight to the relation level if nextVersionOfRow is set for
> > the lock target and it points to a tuple in a different page. But
> > that doesn't cover a situation where we have a heap tuple predicate
> > lock which gets promoted to page granularity before the tuple is
> > updated. To handle that we would need to say that an UPDATE to a
> > tuple on a page which is predicate locked by the transaction would
> > need to be promoted to relation granularity if the new version of
> > the tuple wasn't on the same page as the old version.
>
> Yeah, promoting the original lock on the UPDATE was my first thought
> too. Another idea is to duplicate the original predicate lock on the
> first update, so that the original reader holds a lock on both row
> versions. I think that would ultimately be simpler, as we wouldn't
> need the next-prior chains anymore.
>
> For example, suppose that transaction X is holding a predicate lock
> on tuple A. Transaction Y updates tuple A, creating a new tuple B.
> Transaction Y sees that X holds a lock on tuple A (or the page
> containing A), so it acquires a new predicate lock on tuple B on
> behalf of X.
>
> If the updater aborts, the lock on the new tuple needs to be cleaned
> up, so that it doesn't get confused with a later tuple that's stored
> in the same physical location. We could store the xmin of the tuple
> in the predicate lock to check for that. Whenever you check for a
> conflict, if the xmin of the lock doesn't match the xmin on the
> tuple, you know that the lock belonged to an old dead tuple stored in
> the same location, and it can simply be removed as the tuple doesn't
> exist anymore.
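[An illustrative model of the scheme Heikki describes above, written in
Python purely for exposition; the real implementation is in PostgreSQL's
C predicate-lock manager, and all names here are made up. On UPDATE, any
predicate locks on the old tuple are duplicated onto the new version,
and each lock carries the tuple's xmin so that stale locks left behind
by an aborted updater can be recognized when a slot is reused.]

```python
# Toy model of predicate-lock duplication on UPDATE, with xmin tagging.
class PredicateLockTable:
    def __init__(self):
        # (page, slot) -> {holder_xid: lock_xmin}
        self.locks = {}

    def acquire(self, tid, xid, tuple_xmin):
        self.locks.setdefault(tid, {})[xid] = tuple_xmin

    def on_update(self, old_tid, new_tid, new_xmin):
        # Duplicate every reader's lock onto the new row version, so the
        # reader still conflicts with later writers of that version.
        for xid in self.locks.get(old_tid, {}):
            self.acquire(new_tid, xid, new_xmin)

    def conflicting_holders(self, tid, tuple_xmin):
        holders = self.locks.get(tid, {})
        live, stale = [], []
        for xid, lock_xmin in holders.items():
            (live if lock_xmin == tuple_xmin else stale).append(xid)
        for xid in stale:  # lock belonged to a dead tuple reusing this slot
            del holders[xid]
        return live

table = PredicateLockTable()
table.acquire(("page1", 1), xid=100, tuple_xmin=50)       # X read tuple A
table.on_update(("page1", 1), ("page2", 7), new_xmin=60)  # Y updated A -> B
assert table.conflicting_holders(("page2", 7), tuple_xmin=60) == [100]
# Slot reused by an unrelated tuple with a different xmin: stale lock dropped.
assert table.conflicting_holders(("page2", 7), tuple_xmin=99) == []
```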
> > That said, the above is about eliminating false negatives from some
> > corner cases which escaped notice until now. I don't think the
> > changes described above will do anything to prevent the problems
> > reported by YAMAMOTO Takashi.
>
> Agreed, it's a separate issue. Although if we change the way we
> handle the read-update-update problem, the other issue might go away
> too.
>
> > Unless I'm missing something, it sounds like tuple IDs are being
> > changed or reused while predicate locks are held on the tuples.
> > That's probably not going to be overwhelmingly hard to fix if we
> > can identify how that can happen. I tried to cover HOT issues, but
> > it seems likely I missed something.
>
> Storing the xmin of the original tuple would probably help with that
> too. But it would be nice to understand and be able to reproduce the
> issue first.

I haven't been able to produce a test case yet, but I'm getting clear
enough about the issue that I think I can see my way to a good fix.
Even if I have a fix first, I'll continue to try to create a test to
show the pre-fix failure (and post-fix success), to include in the
regression suite. This is the sort of thing which is hard enough to hit
that a regression could slip in because of some change and make it out
in a release unnoticed if we aren't specifically testing for it.

It's pretty clear that the issue is this: we are cleaning up the
predicate locks for a transaction when the transaction which read the
data becomes irrelevant; but a heap tuple predicate locked by a
transaction may be updated or deleted, and become eligible for cleanup,
before the transaction becomes irrelevant. This can lead to cycles in
the target links if a tuple ID is reused before the transaction is
cleaned up.

Since we may want to detect rw-conflicts with the reading transaction
based on later versions of a row, after the tuple it read has been
updated and the old tuple removed and its ID reused, Heikki's
suggestion of predicate lock duplication in these cases goes beyond
being a simple way to avoid the particular symptoms reported; it is
actually necessary for correct behavior.

One adjustment I'm looking at is that I think the tuple xmin can go in
the lock target structure rather than the lock structure, because if
the tuple ID has already been reused, it's pretty clear that there are
no transactions which can now update the old tuple which had that ID,
so any predicate locks attached to a target with the old xmin can be
released and the target reused with the new xmin.

This does, however, raise the question of what happens if the tuple
lock has been promoted to a page lock and a tuple on that page is
non-HOT updated. (Clearly, it's not an issue if the locks on a heap
relation for a transaction have been promoted all the way to relation
granularity, but it's equally clear we don't want to jump to that
granularity when we can avoid it.) It seems to me that, in addition to
what Heikki suggested, we need to create a predicate lock on the new
tuple or its (new) page
Re: [HACKERS] Sync Rep v17
On Sat, Feb 19, 2011 at 15:23, Bruce Momjian br...@momjian.us wrote:
> Sorry. I was thinking of allowing a command to alert an administrator
> when we switch standby machines, or if we can't do synchronous
> standby any longer. I assume we put a message in the logs, and the
> admin could have a script that monitors that, but I thought a
> pager-like command would be helpful

For asynchronous notifications we already have LISTEN/NOTIFY. Perhaps
that could be used?

Regards,
Marti
Re: [HACKERS] Snapshot synchronization, again...
On Sat, Feb 19, 2011 at 9:17 PM, Peter Eisentraut pete...@gmx.net wrote:
> The only consideration against MD5 might be that it would make us
> look quite lame. We should probably provide built-in SHA1 and SHA2
> functions for this and other reasons.

In this particular case, however, the checksum is never exposed to the
user but stays within shared memory. So nobody would notice that we are
still using lame old md5sum :-)

Joachim
Re: [HACKERS] Sync Rep v17
Marti Raudsepp wrote:
> On Sat, Feb 19, 2011 at 15:23, Bruce Momjian br...@momjian.us wrote:
> > Sorry. I was thinking of allowing a command to alert an
> > administrator when we switch standby machines, or if we can't do
> > synchronous standby any longer. I assume we put a message in the
> > logs, and the admin could have a script that monitors that, but I
> > thought a pager-like command would be helpful
>
> For asynchronous notifications we already have LISTEN/NOTIFY. Perhaps
> that could be used?

Well, those are going to require work to notify someone externally,
like sending an email alert.

--
Bruce Momjian  br...@momjian.us  http://momjian.us
EnterpriseDB   http://enterprisedb.com
+ It's impossible for everything to be true. +
Re: [HACKERS] Sync Rep v17
On Sat, Feb 19, 2011 at 6:05 PM, Bruce Momjian br...@momjian.us wrote:
> Marti Raudsepp wrote:
> > For asynchronous notifications we already have LISTEN/NOTIFY.
> > Perhaps that could be used?
>
> Well, those are going to require work to notify someone externally,
> like sending an email alert.

Although that seems like work for a monitoring tool.

--
Jaime Casanova         www.2ndQuadrant.com
Professional PostgreSQL: Soporte y capacitación de PostgreSQL
Re: [HACKERS] UTF16 surrogate pairs in UTF8 encoding
Marko Kreen wrote:
> On 9/8/10, Tom Lane t...@sss.pgh.pa.us wrote:
> > Marko Kreen mark...@gmail.com writes:
> > > Although it does seem unnecessary.
> >
> > The reason I asked for this to be spelled out is that ordinarily, a
> > backslash escape \nnn is a very low-level thing that will insert
> > exactly what you say. To me it's quite unexpected that the system
> > would editorialize on that to the extent of replacing two UTF-16
> > surrogate characters by a single code point. That's necessary for
> > correctness because our underlying storage is UTF-8, but it's not
> > obvious that it will happen. (As a counterexample, if our
> > underlying storage were UTF-16, then very different things would
> > need to happen for the exact same SQL input.) I think a lot of
> > people will have this same question when reading this para, which
> > is why I asked for an explanation there.
>
> OK, but I still don't like the "when"s. How about:
>
> -6-digit form technically makes this unnecessary.  (When surrogate
> -pairs are used when the server encoding is literalUTF8/, they
> -are first combined into a single code point that is then encoded
> -in UTF-8.)
> +6-digit form technically makes this unnecessary.  (Surrogate
> +pairs are not stored directly, but combined into a single
> +code point that is then encoded in UTF-8.)

Applied, thanks.

--
Bruce Momjian  br...@momjian.us  http://momjian.us
EnterpriseDB   http://enterprisedb.com
+ It's impossible for everything to be true. +
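[The combination step the documentation paragraph above describes can
be sketched as follows. This is the standard UTF-16 surrogate-pair
arithmetic, not PostgreSQL's actual parser code.]

```python
# Combine a UTF-16 surrogate pair (as written with two \uXXXX escapes) into
# the single code point that would actually be stored, then encode it in
# UTF-8 as the server does.
def combine_surrogates(hi: int, lo: int) -> int:
    assert 0xD800 <= hi <= 0xDBFF, "first unit must be a high surrogate"
    assert 0xDC00 <= lo <= 0xDFFF, "second unit must be a low surrogate"
    return 0x10000 + ((hi - 0xD800) << 10) + (lo - 0xDC00)

# U+1F600 written as the pair \uD83D \uDE00:
cp = combine_surrogates(0xD83D, 0xDE00)
assert cp == 0x1F600
assert chr(cp).encode("utf-8") == b"\xf0\x9f\x98\x80"
```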
[HACKERS] FDW API: don't like the EXPLAIN mechanism
I've been poking at the FDW stuff and file_fdw, and I find myself
dissatisfied with the way that the EXPLAIN support is designed, namely
that we have to compute a string at plan time to be displayed by
EXPLAIN. There are a couple of problems with that:

1. The explainInfo string is useless overhead during a normal query.

2. There is no way to vary the display depending on EXPLAIN options
such as VERBOSE and COSTS OFF. It seems fairly obvious to me that there
might be scope for showing more info in VERBOSE mode. But even more to
the point, the current regression test's example output:

    EXPLAIN (VERBOSE, COSTS FALSE) SELECT * FROM agg_csv;
    Foreign Scan on public.agg_csv
      Output: a, b
      Foreign Plan: file=@abs_srcdir@/data/agg.csv, size=46

is extremely bogus. COSTS OFF ought to suppress highly volatile details
like the file size. To stick with this design, we'd have to have a
convention that explainInfo never shows any more data than would be
appropriate in COSTS OFF mode. And then there's

3. There is no way to report actual inside-the-FDW execution stats in
EXPLAIN ANALYZE mode.

So this seems very far short of satisfactory. I think we should drop
FdwPlan.explainInfo and instead define an additional callback function
that is called by EXPLAIN to produce the extra data for EXPLAIN to
display. This function could have access to the EXPLAIN options as
well as (in ANALYZE mode) the final state of the execution node, so it
could tailor its output appropriately.

BTW, another thing that strikes me as poorly done in the file_fdw code
is that it gathers up all the options of the foreign table, server, and
wrapper at plan time, stores those in the plan, and uses that
information at runtime. What happens if the options change underneath a
prepared plan?

			regards, tom lane
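[A schematic contrast between the two designs discussed above, in
Python purely for illustration; the real FDW API is C, and the
callback's eventual name and signature were not settled in this
message. All names below are made up.]

```python
# Precomputed explainInfo string vs. an EXPLAIN-time callback.
class FileFdwScan:
    def __init__(self, filename, filesize):
        self.filename = filename
        self.filesize = filesize
        self.rows_read = None  # filled in during execution

    # Design being criticized: one fixed string, computed at plan time and
    # carried in every plan, whether or not EXPLAIN ever runs.
    def plan_time_explain_info(self):
        return f"file={self.filename}, size={self.filesize}"

    # Proposed design: a callback invoked only by EXPLAIN, with access to
    # the options and (for ANALYZE) the executor node's final state.
    def explain_callback(self, verbose=False, costs=True, analyze=False):
        parts = [f"file={self.filename}"]
        if costs:  # COSTS OFF suppresses volatile details like file size
            parts.append(f"size={self.filesize}")
        if analyze and self.rows_read is not None:
            parts.append(f"rows_read={self.rows_read}")
        return ", ".join(parts)

scan = FileFdwScan("data/agg.csv", 46)
assert scan.explain_callback(costs=False) == "file=data/agg.csv"
scan.rows_read = 4
assert scan.explain_callback(analyze=True) == "file=data/agg.csv, size=46, rows_read=4"
```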
Re: [HACKERS] Update PostgreSQL shared memory usage table for 9.0?
Bruce Momjian wrote:
> Can someone update the PostgreSQL shared memory usage table for 9.0?
>
>   http://www.postgresql.org/docs/9.0/static/kernel-resources.html#SYSVIPC
>
> Right now it says "Approximate shared memory bytes required (as of
> 8.3)".

This documentation still says "as of 8.3". If the values are unchanged,
can I just remove the version mention?

--
Bruce Momjian  br...@momjian.us  http://momjian.us
EnterpriseDB   http://enterprisedb.com
+ It's impossible for everything to be true. +
Re: [HACKERS] Snapshot synchronization, again...
Peter Eisentraut pete...@gmx.net writes:
> On Fri, 2011-02-18 at 16:57 -0300, Alvaro Herrera wrote:
> > 2. Is MD5 the most appropriate digest for this? If you need a
> > cryptographically secure hash, do we need something stronger? If
> > not, why not just use hash_any?
>
> MD5 is probably more appropriate than hash_any, because the latter is
> optimized for speed and collision avoidance and doesn't have a
> guaranteed external format. The only consideration against MD5 might
> be that it would make us look quite lame.

Only to people who don't understand whether crypto strength is actually
important in a given use-case.

However ... IIRC, hash_any gives different results on big-endian and
little-endian machines. I'm not sure if a predictable cross-platform
result is important for this use? If you're hashing data containing
native integers, this is a problem anyway.

			regards, tom lane
Re: [HACKERS] Snapshot synchronization, again...
Excerpts from Tom Lane's message of sáb feb 19 21:26:42 -0300 2011: However ... IIRC, hash_any gives different results on bigendian and littleendian machines. I'm not sure if a predictable cross-platform result is important for this use? If you're hashing data containing native integers, this is a problem anyway. The current feature only lets you synchronize snapshots within a cluster -- you can't ship them to another machine. I don't think dependency on endianness is a problem here. (But no, the data doesn't contain native integers -- it's the snapshot's text representation.) -- Álvaro Herrera alvhe...@commandprompt.com The PostgreSQL Company - Command Prompt, Inc. PostgreSQL Replication, Consulting, Custom Development, 24x7 support
Re: [HACKERS] work_mem / maintenance_work_mem maximums
Stephen Frost wrote: Greetings, After watching a database import go abysmally slow on a pretty beefy box with tons of RAM, I got annoyed and went to hunt down why in the world PG wasn't using but a bit of memory. Turns out to be a well known and long-standing issue: http://www.mail-archive.com/pgsql-hackers@postgresql.org/msg101139.html Now, we could start by fixing guc.c to correctly have the max value for these be MaxAllocSize/1024, for starters, then at least our users would know when they set a higher value it's not going to be used. That, in my mind, is a pretty clear bug fix. Of course, that doesn't help us poor data-warehousing bastards with 64G+ machines. Sooo.. I don't know much about what the limit is or why it's there, but based on the comments, I'm wondering if we could just move the limit to a more 'sane' place than the-function-we-use-to-allocate. If we need a hard limit due to TOAST, let's put it there, but I'm hopeful we could work out a way to get rid of this limit in repalloc and that we can let sorts and the like (uh, index creation) use what memory the user has decided it should be able to. Is this a TODO? Can we easily fix the tuplesort.c code? -- Bruce Momjian br...@momjian.us http://momjian.us EnterpriseDB http://enterprisedb.com + It's impossible for everything to be true. +
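For context on the numbers involved: MaxAllocSize in PostgreSQL's memutils.h is, to my understanding, 0x3fffffff bytes (just under 1 GB), and work_mem/maintenance_work_mem are expressed in kilobytes, so the honest cap Stephen suggests would be MaxAllocSize/1024. A quick sketch of the arithmetic (the constant is an assumption about the then-current source):

```python
# MaxAllocSize from memutils.h, assumed to be 0x3fffffff (just under 1 GB)
MAX_ALLOC_SIZE = 0x3FFFFFFF          # 1073741823 bytes

# guc.c denominates work_mem in kilobytes, so the largest value that a
# single palloc'd allocation could actually honor would be:
max_work_mem_kb = MAX_ALLOC_SIZE // 1024
print(max_work_mem_kb)               # 1048575 kB, i.e. ~1 GB
```

Anything set above that is silently ineffective for a single allocation such as tuplesort's memtuples array, which is the behavior being complained about.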
Re: [HACKERS] work_mem / maintenance_work_mem maximums
Is this a TODO? Can we easily fix the tuplesort.c code? Easily, no. But that's not a reason for it to not be a TODO. I, too, would like to be able to make use of 32GB of work_mem effectively. -- Josh Berkus PostgreSQL Experts Inc. http://www.pgexperts.com
Re: [HACKERS] Sync Rep v17
On Sat, Feb 19, 2011 at 3:28 AM, Simon Riggs si...@2ndquadrant.com wrote: First, we should be clear to explain that you are referring to the fact that the request synchronous_commit = off synchronous_replication = on makes no sense in the way the replication system is currently designed, even though it is a wish-list item to make it work in 9.2+ What exactly do you mean by make it work? We can either (1) wait for the local commit and the remote commit (synchronous_commit=on, synchronous_replication=on), (2) wait for the local commit only (synchronous_commit=on, synchronous_replication=off), or (3) wait for neither (synchronous_commit=off, synchronous_replication=off). There's no fourth possible behavior, AFAICS. The question is whether synchronous_commit=off, synchronous_replication=on should behave like (1) or (3). You have it as #1; I'm arguing it should be #3. I realize it's an arguable point; I'm just arguing for what makes most sense to me. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
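For readers following along, the three meaningful combinations Robert enumerates would look like this in postgresql.conf (parameter names as used by the patch under discussion; the synchronous_replication GUC did not survive in this form into the final 9.1 release):

```
# (1) wait for both the local commit and the remote (standby) commit
synchronous_commit = on
synchronous_replication = on

# (2) wait for the local commit only
synchronous_commit = on
synchronous_replication = off

# (3) wait for neither
synchronous_commit = off
synchronous_replication = off

# The disputed case: synchronous_commit = off with
# synchronous_replication = on -- the patch treats it as (1),
# Robert argues it should act as (3).
```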
Re: [HACKERS] FDW API: don't like the EXPLAIN mechanism
I wrote: ... I think we should drop FdwPlan.explainInfo and instead define an additional callback function that is called by EXPLAIN to produce the extra data for EXPLAIN to display. This function could have access to the EXPLAIN options as well as (in ANALYZE mode) the final state of the execution node, so it could tailor its output appropriately. My initial idea was to define the new callback function with signature

char *ExplainForeignScan(ForeignScanState *node, ExplainState *es);

and stick to pretty much the functionality implemented in the submitted patch, ie, you could expect to get output looking like this:

Foreign Scan on public.agg_csv
  FDW Info: file=@abs_srcdir@/data/agg.csv, size=46

However, it occurs to me that as long as we're passing the function the ExplainState, it has what it needs to add arbitrary EXPLAIN result fields. Although it could do this the hard way, we could make it a lot easier by exporting the ExplainProperty* functions. Then it'd be possible to produce output like

Foreign Scan on public.agg_csv
  Foreign File: @abs_srcdir@/data/agg.csv
  Foreign File Size: 46

which seems a whole lot nicer than the current design. In this scheme the callback function would just be declared to return void. Anybody have an objection to doing it like that? regards, tom lane
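To make the shape of the proposal concrete, here is a toy Python model (not PostgreSQL source; the names only loosely mirror the C API, and the node/state layouts are invented) of what a void-returning callback buys: it can consult the EXPLAIN options and emit separate labeled fields, instead of returning one precomputed string that cannot react to COSTS OFF:

```python
from dataclasses import dataclass, field

@dataclass
class ExplainState:              # stand-in for the real ExplainState
    verbose: bool = False
    costs: bool = True
    lines: list = field(default_factory=list)

def explain_property(label, value, es):
    """Models an exported ExplainProperty* helper: one labeled field."""
    es.lines.append(f"{label}: {value}")

def file_explain_foreign_scan(node, es):
    """Hypothetical file_fdw callback: tailors output to EXPLAIN options."""
    explain_property("Foreign File", node["filename"], es)
    if es.costs:                 # volatile detail suppressed under COSTS OFF
        explain_property("Foreign File Size", node["size"], es)

node = {"filename": "data/agg.csv", "size": 46}
es = ExplainState(costs=False)   # i.e. EXPLAIN (COSTS OFF)
file_explain_foreign_scan(node, es)
print(es.lines)                  # → ['Foreign File: data/agg.csv']
```

With costs enabled, the same callback would also emit the file-size field; the precomputed-string design has no way to make that distinction.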
Re: [HACKERS] Sync Rep v17
On Sat, Feb 19, 2011 at 3:35 AM, Simon Riggs si...@2ndquadrant.com wrote: On Fri, 2011-02-18 at 20:45 -0500, Robert Haas wrote: On the other hand, I see no particular harm in leaving the option in there either, though I definitely think the default should be changed to -1. The default setting should be to *not* freeze up if you lose the standby. That behaviour unexpectedly leads to an effective server down situation, rather than 2 minutes of slow running. My understanding is that the parameter will wait on every commit, not just once. There's no mechanism to do anything else. But I did some testing this evening and actually it appears to not work at all. I hit walreceiver with a SIGSTOP and the commit never completes, even after the two minute timeout. Also, when I restarted walreceiver after a long time, I got a server crash.

DEBUG: write 0/3027BC8 flush 0/3014690 apply 0/3014690
DEBUG: released 0 procs up to 0/3014690
DEBUG: write 0/3027BC8 flush 0/3027BC8 apply 0/3014690
DEBUG: released 2 procs up to 0/3027BC8
WARNING: could not locate ourselves on wait queue
server closed the connection unexpectedly
        This probably means the server terminated abnormally before or while processing the request.
The connection to the server was lost. Attempting reset:
DEBUG: shmem_exit(-1): 0 callbacks to make
DEBUG: proc_exit(-1): 0 callbacks to make
FATAL: could not receive data from WAL stream: server closed the connection unexpectedly
        This probably means the server terminated abnormally before or while processing the request.
Failed. !
LOG: record with zero length at 0/3027BC8
DEBUG: CommitTransaction
DEBUG: name: unnamed; blockState: STARTED; state: INPROGR, xid/subid/cid: 0/1/0, nestlvl: 1, children:
DEBUG: received replication command: IDENTIFY_SYSTEM
DEBUG: received replication command: START_REPLICATION 0/300
LOG: streaming replication successfully connected to primary
DEBUG: standby standby is a potential synchronous standby
DEBUG: write 0/0 flush 0/0 apply 0/3027BC8
DEBUG: released 0 procs up to 0/0
DEBUG: standby standby has now caught up with primary
DEBUG: write 0/3027C18 flush 0/0 apply 0/3027BC8
DEBUG: standby standby is now the synchronous replication standby
DEBUG: released 0 procs up to 0/0
DEBUG: write 0/3027C18 flush 0/3027C18 apply 0/3027BC8
DEBUG: released 0 procs up to 0/3027C18
DEBUG: write 0/3027C18 flush 0/3027C18 apply 0/3027C18
DEBUG: released 0 procs up to 0/3027C18
(lots more copies of those last two messages)

I believe the problem is that the definition of IsOnSyncRepQueue is bogus, so that the loop in SyncRepWaitOnQueue always takes the first branch. It was a little confusing to me setting this up that setting only synchronous_replication did nothing; I had to also set synchronous_standby_names. We might need a cross-check there. I believe the docs for synchronous_replication also need some updating; this part appears to be out of date:

+between primary and standby. The commit wait will last until the
+first reply from any standby. Multiple standby servers allow
+increased availability and possibly increase performance as well.

The words "on the primary" in the next sentence may not be necessary any more either, as I believe this parameter now has no effect anywhere else. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Re: [HACKERS] FDW API: don't like the EXPLAIN mechanism
On Sat, Feb 19, 2011 at 11:07 PM, Tom Lane t...@sss.pgh.pa.us wrote: Anybody have an objection to doing it like that? I don't. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Re: [HACKERS] Documentation, window functions
Robert Haas wrote: On Thu, Sep 23, 2010 at 11:34 PM, Dennis Björklund d...@zigo.dhs.org wrote: On Wed, Sep 22, 2010 at 6:03 AM, Dennis Björklund d...@zigo.dhs.org wrote: But I confess that I'm sort of murky on how ORDER BY affects the window frame, or how to rephrase this more sensibly. The rows included in the calculation of the window function are by default RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW, where CURRENT ROW includes all the rows that are equal to the row you are at according to the ordering. So if you say order by name, then all the rows up to your name and all rows with the same name are included, not later rows. If you don't have any ordering, then all rows are equal and all rows are included in the computation. That's why your example behaved like it did. At least that's my understanding of how these things work; I've not used window functions very much myself. This is fairly difficult stuff and it probably doesn't belong in a tutorial, but the current wording suggests that you can add any ordering and it won't affect the result. That is also bad since it might teach people the wrong thing. Hmm... it is true that average will produce the same results on any ordering of the same set of input values, though. Perhaps the word partition encompasses that, though then again maybe not. I'd be happy to fix this if I understand what to fix it to. I clarified the window function ORDER BY wording to avoid mentioning avg(). Applied patch attached. -- Bruce Momjian br...@momjian.us http://momjian.us EnterpriseDB http://enterprisedb.com + It's impossible for everything to be true. +

diff --git a/doc/src/sgml/advanced.sgml b/doc/src/sgml/advanced.sgml
index 11859b4..218988e 100644
*** a/doc/src/sgml/advanced.sgml
--- b/doc/src/sgml/advanced.sgml
*** SELECT depname, empno, salary, avg(salar
*** 383,392 ****
  </para>
  <para>
! Although <function>avg</> will produce the same result no matter
! what order it processes the partition's rows in, this is not true of all
! window functions. When needed, you can control that order using
! <literal>ORDER BY</> within <literal>OVER</>. Here is an example:
<programlisting>
SELECT depname, empno, salary, rank() OVER (PARTITION BY depname ORDER BY salary DESC) FROM empsalary;
--- 383,392 ----
  </para>
  <para>
! You can also control the order in which rows are processed by
! window functions using <literal>ORDER BY</> within <literal>OVER</>.
! (The window <literal>ORDER BY</> does not even have to match the
! order in which the rows are output.) Here is an example:
<programlisting>
SELECT depname, empno, salary, rank() OVER (PARTITION BY depname ORDER BY salary DESC) FROM empsalary;
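Dennis's description of the default frame can be modeled in a few lines of plain Python: with an ordering, each row's frame runs from the partition start through the current row *and its peers*, so rows with equal sort keys share a result. The empsalary-like data below is made up for illustration:

```python
# Models the default SQL window frame:
# RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW,
# where "current row" includes all peers equal under the ORDER BY key.
salaries = [("develop", 4200), ("develop", 4500),
            ("develop", 4500), ("develop", 6000)]

def running_avg_with_peers(rows, key):
    rows = sorted(rows, key=key)
    out = []
    for row in rows:
        # frame = every row whose sort key is <= this row's key,
        # i.e. all preceding rows plus all peers of the current row
        frame = [r for r in rows if key(r) <= key(row)]
        out.append(sum(r[1] for r in frame) / len(frame))
    return out

print(running_avg_with_peers(salaries, key=lambda r: r[1]))
# → [4200.0, 4400.0, 4400.0, 4800.0]; the two 4500 peers share a frame
```

With no ORDER BY, every row is a peer of every other, so the frame is the whole partition and avg() is constant — exactly why the tutorial example behaved as it did.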
Re: [HACKERS] review: FDW API
Heikki Linnakangas heikki.linnakan...@enterprisedb.com writes: On 11.02.2011 22:50, Heikki Linnakangas wrote: I spent some more time reviewing this, and working on the PostgreSQL FDW in tandem. Here's an updated API patch, with a bunch of cosmetic changes, and a bug fix for FOR SHARE/UPDATE. Another version, rebased against master branch and with a bunch of small cosmetic fixes. I've applied this after a moderate amount of editorialization. The question of avoiding extra relkind lookups is still open. We talked about adding relkind to RangeTblEntry, but I wonder whether adding a new RTEKind would be a better idea. Haven't researched it yet. I have a hacked-up version of contrib/file_fdw that I've been using to test it with. That needs some more cleanup before committing, but I think it should not take too long. The one thing that is kind of annoying is that the regression tests need generated files (to insert absolute paths) and it seems like the PGXS infrastructure doesn't know how to clean up the generated files afterwards. Anybody have any thoughts about fixing that? regards, tom lane
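One possible answer, sketched here as an assumption rather than a tested fix: PGXS makefiles can list generated files in the EXTRA_CLEAN variable so that `make clean` removes them. The exact file names below are hypothetical placeholders for the path-substituted regression inputs:

```
# Hypothetical contrib/file_fdw Makefile fragment: ask the build
# infrastructure to delete generated regression files on `make clean`.
EXTRA_CLEAN = sql/file_fdw.sql expected/file_fdw.out
```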
Re: [HACKERS] Sync Rep v17
On Sat, Feb 19, 2011 at 10:52 PM, Robert Haas robertmh...@gmail.com wrote: On Sat, Feb 19, 2011 at 3:28 AM, Simon Riggs si...@2ndquadrant.com wrote: First, we should be clear to explain that you are referring to the fact that the request synchronous_commit = off synchronous_replication = on makes no sense in the way the replication system is currently designed, even though it is a wish-list item to make it work in 9.2+ What exactly do you mean by make it work? We can either (1) wait for the local commit and the remote commit (synchronous_commit=on, synchronous_replication=on), (2) wait for the local commit only (synchronous_commit=on, synchronous_replication=off), or (3) wait for neither (synchronous_commit=off, synchronous_replication=off). There's no fourth possible behavior, AFAICS. The question is whether synchronous_commit=off, synchronous_replication=on should behave like (1) or (3). You have it as #1; I'm arguing it should be #3. I realize it's an arguable point; I'm just arguing for what makes most sense to me. IMHO, we should stick to the safest option. Considering that synchronous_replication = on means that we *want* durability, and that synchronous_commit = off means we don't *care* about durability, the real question here is: in the presence of synchronous_replication = on (which means we want durability), are we allowed to assume we can lose data? One way to manage that is simply to disallow that combination with an error; maybe that is a bit strict, but then we don't have to make assumptions. The other option is to stay safe, which means keeping durability: if you want to risk some data, then you should disable synchronous_replication as well as synchronous_commit... maybe sending a message to the log when you detect the conflicting situation.
-- Jaime Casanova www.2ndQuadrant.com Professional PostgreSQL: Soporte y capacitación de PostgreSQL