[HACKERS] Re: Vacuum of newly activated 8.3.12 standby receives warnings page xxx is uninitialized --- fixing
Well, it is none of the things I considered. The problem seems to be due to the use of --delete in the base backup rsync (see diff attached). In fact I can now reproduce the uninitialized pages using the bare-bones method:

primary:

$ grep archive_command postgresql.conf
archive_command = 'rsync %p standby:/var/lib/postgresql/archive'
$ pgbench -c 4 -t 20 bench
(wait for approx 1 transactions)

standby:

$ psql -h primary -c "SELECT pg_start_backup('backup');"
$ rsync --exclude pg_xlog/\* --exclude postmaster.pid --delete --exclude=backup_label \
      primary:/var/lib/postgresql/8.3/main/* \
      /var/lib/postgresql/8.3/main
$ psql -h primary -c "SELECT pg_stop_backup();"
$ grep restore_command recovery.conf
restore_command = '/usr/lib/postgresql/8.3/bin/pg_standby -t /tmp/trigger.5432 /var/lib/postgresql/archive %f %p %r'
$ /etc/init.d/postgresql-8.3 start
(wait for approx 14 transactions)
$ touch /tmp/trigger.5432

Removing the offending --delete --exclude=backup_label options from the base backup step makes everything work properly again. I'd be interested to know whether the other folks seeing these warnings were using unusual rsync options, either during backup or for archiving.

regards

Mark

On 30/12/10 13:32, Mark Kirkwood wrote:
> I'm frankly puzzled about what Pitrtools is doing that is different - I only noticed it using rsync compression (-z) and doing rsync backups by pulling from the standby rather than pushing from the primary (I'm in the process of trying these variations out in the bare-bones case). Just as I'm writing this I see Pitrtools rsync's pg_xlog - I wonder if there are timing issues which mean that recovery might use some (corrupted) logs from there before the (clean) archived ones arrive (will check).
*** cmd_standby.orig	Tue Dec 28 21:10:31 2010
--- cmd_standby	Thu Dec 30 05:20:04 2010
***************
*** 175,181 ****
  if debug == "on":
      ssh_flags = "-vvv -o ConnectTimeout=%s -o StrictHostKeyChecking=no" % (str(ssh_timeout))
!     rsync_flags = "-avzl --delete --stats --exclude=backup_label"
      pg_standby_flags = "-s5 -w0 -d -c"
      if pgversion == '8.2':
          pg_standby_args = "%%f %%p -k%s" % (float(numarchives))
--- 175,181 ----
  if debug == "on":
      ssh_flags = "-vvv -o ConnectTimeout=%s -o StrictHostKeyChecking=no" % (str(ssh_timeout))
!     rsync_flags = "-a"
      pg_standby_flags = "-s5 -w0 -d -c"
      if pgversion == '8.2':
          pg_standby_args = "%%f %%p -k%s" % (float(numarchives))
***************
*** 184,190 ****
  else:
      ssh_flags = "-o ConnectTimeout=%s -o StrictHostKeyChecking=no" % (str(ssh_timeout))
!     rsync_flags = "-azl --delete --exclude=backup_label"
      pg_standby_flags = "-s5 -w0 -c"
      if pgversion == '8.2':
          pg_standby_args = "%%f %%p -k%s" % (float(numarchives))
--- 184,190 ----
  else:
      ssh_flags = "-o ConnectTimeout=%s -o StrictHostKeyChecking=no" % (str(ssh_timeout))
!     rsync_flags = "-azl"
      pg_standby_flags = "-s5 -w0 -c"
      if pgversion == '8.2':
          pg_standby_args = "%%f %%p -k%s" % (float(numarchives))

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] pg_streamrecv for 9.1?
On Wed, Dec 29, 2010 at 22:30, Dimitri Fontaine <dimi...@2ndquadrant.fr> wrote:
> Magnus Hagander <mag...@hagander.net> writes:
>> Would people be interested in putting pg_streamrecv (http://github.com/mhagander/pg_streamrecv) in bin/ or contrib/ for 9.1? I think it would make sense to do so.
>
> +1 for having that in core, only available for the roles WITH REPLICATION I suppose?

Yes. Well, anybody who wants can run it, but they need those permissions on the server to make it work. pg_streamrecv is entirely a client app.

>>> I think that the base backup feature is more important than simple streaming of chunks of the WAL (SR already does this). Talking about the base backup over libpq, it is something we should implement to fulfill the desire of people who want an easy replication setup.
>>
>> Yes, definitely. But that also needs server side support.
>
> Yeah, but it's already in core for 9.1, we have pg_read_binary_file() there. We could propose a contrib module for previous versions implementing the function in C, that should be pretty easy to code.

Oh. I didn't actually think about that one. So yeah, we could use that - making it easy to code. However, I wonder how much less efficient it would be than being able to stream the base backup. It's going to be a *lot* more roundtrips across the network, and we're also going to open/close the files a lot more.

Also, I haven't tested it, but a quick look at the code makes me wonder how it will actually work with tablespaces - it seems to only allow files under PGDATA? That could of course be changed..

> The only reason I didn't do that for pg_basebackup is that I wanted a self-contained python script, so that offering a public git repo is all I needed as far as distributing the tool goes.

Right, there's an advantage with that when it comes to being able to work on old versions.

Yeah, the WIP patch Heikki posted is similar, except it uses tar format and is implemented natively in the backend, with no need for pl/pythonu to be installed.
> As of HEAD the dependency on pl/whatever is easily removed. The included C tool would need to have a parallel option from the get-go if at all possible, but if you look at the pg_basebackup prototype, it would be good to drop the wrong pg_xlog support in there and rely on a proper archiving setup on the master. Do you want to work on internal archive and restore commands over libpq in the same effort too?
>
> I think this tool should be either a one-time client or a daemon with support for:

Definitely a one-time client. If you want it to be a daemon, you write a small wrapper that makes it one :)

> - running a base backup when receiving a signal
> - continuous WAL streaming from a master

Yes.

> - accepting standby connections and streaming to them

I see that as a separate tool, I think. But still a useful one, sure.

> - one-time libpq streaming of a WAL file, either way

Hmm. That might be interesting, yes.

> Maybe we don't need to daemonize the tool from the get-go, but if you're going parallel for the base-backup case you're almost there, aren't you? Also having internal commands for archive and restore commands that rely on this daemon running would be great too.

I don't want anything *relying* on this tool. I want to keep the current way where you can choose whatever you prefer - I just want us to ship a good default tool.

> I'd offer more help if it wasn't for finishing the extension patches, :-)

Yeah, focus on that, please - don't want to get it stalled.

--
Magnus Hagander
Me: http://www.hagander.net/
Work: http://www.redpill-linpro.com/
Re: [HACKERS] pg_streamrecv for 9.1?
On Wed, Dec 29, 2010 at 20:19, Gurjeet Singh <singh.gurj...@gmail.com> wrote:
> On Wed, Dec 29, 2010 at 1:42 PM, Robert Haas <robertmh...@gmail.com> wrote:
>> On Dec 29, 2010, at 1:01 PM, Tom Lane <t...@sss.pgh.pa.us> wrote:
>>> Is it really stable enough for bin/? My impression of the state of affairs is that there is nothing whatsoever about replication that is really stable yet.
>>
>> Well, that's not stopping us from shipping a core feature called replication. I'll defer to others on how mature pg_streamrecv is, but if it's no worse than replication in general I think putting it in bin/ is the right thing to do.
>
> As the README says, it is not self-contained (through no fault of its own) and one should typically set archive_command to guarantee zero WAL loss.

Yes. Though you can combine it fine with wal_keep_segments if you think that's safe - but archive_command is push and this tool is pull, so if your backup server goes down for a while, pg_streamrecv will get a gap and fail. Whereas if you configure an archive_command, it will queue up the log on the master if it stops working, up to the point of shutting it down because of out-of-disk. Which you *want*, if you want to be really sure about the backups.

> "TODO: Document some ways of setting up an archive_command that works well together with pg_streamrecv."
>
> I think implementing just that TODO might make it a candidate.

Well, yes, that's obviously a requirement.

> I have neither used it nor read the code, but if it works as advertised then it is definitely a +1 from me; no preference between bin/ and contrib/, since the community will have to maintain it either way.

It's not that much code, but some more eyes on it would always be good!

--
Magnus Hagander
Me: http://www.hagander.net/
Work: http://www.redpill-linpro.com/
Re: [HACKERS] pg_streamrecv for 9.1?
On Wed, Dec 29, 2010 at 19:42, Robert Haas <robertmh...@gmail.com> wrote:
> On Dec 29, 2010, at 1:01 PM, Tom Lane <t...@sss.pgh.pa.us> wrote:
>> Is it really stable enough for bin/? My impression of the state of affairs is that there is nothing whatsoever about replication that is really stable yet.
>
> Well, that's not stopping us from shipping a core feature called replication. I'll defer to others on how mature pg_streamrecv is, but if it's no worse than replication in general I think putting it in bin/ is the right thing to do.

It has had fewer eyes on it, which puts it worse off than replication in general. OTOH, it's a lot simpler code, which puts it better off. Either way, as long as it gets those eyes before release if we put it in, it shouldn't be worse off than replication in general.

--
Magnus Hagander
Me: http://www.hagander.net/
Work: http://www.redpill-linpro.com/
Re: [HACKERS] Re: new patch of MERGE (merge_204) a question about duplicated ctid
On 2010-12-30 9:02 AM +0200, Greg Smith wrote:
> Marko Tiikkaja wrote:
>> I have no idea why it worked in the past, but the patch was never designed to work for UPSERT. This has been discussed in the past and some people thought that that's not a huge deal.
>
> It takes an excessively large lock when doing UPSERT, which means its performance under a heavy concurrent load can't be good. The idea is that if the syntax and general implementation issues can get sorted out, fixing the locking can be a separate performance improvement to be implemented later. Using MERGE for UPSERT is the #1 use case for this feature by a gigantic margin. If that doesn't do what's expected, the whole implementation doesn't provide the community anything really worth talking about. That's why I keep hammering on this particular area in all my testing.

I'm confused. Are you saying that the patch is supposed to lock the table against concurrent INSERT/UPDATE/DELETE/MERGE? Because I don't see that in the patch, and the symptoms you're seeing are a clear indication that it's not happening. I also seem to recall that people thought locking the table would be excessive.

Regards,
Marko Tiikkaja
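[Editorial aside: the locking dispute above comes down to the classic check-then-insert upsert race. A deterministic toy sketch of it, entirely hypothetical code rather than anything from the patch, with the interleaving forced by running both "transactions'" check phase before either apply phase:]

```python
# Toy illustration (hypothetical, not the patch's code) of why naive
# MERGE-as-UPSERT is unsafe without a big lock or retry logic: two
# transactions both decide NOT MATCHED before either one inserts.
class UniqueViolation(Exception):
    pass

class Table:
    def __init__(self):
        self.rows = {}  # key -> value, with a "unique index" on key

    def insert(self, key, value):
        if key in self.rows:
            raise UniqueViolation(key)
        self.rows[key] = value

def merge_check(table, key):
    """Phase 1 of the MERGE: decide MATCHED vs NOT MATCHED."""
    return key in table.rows

def merge_apply(table, key, value, matched):
    """Phase 2: act on the (possibly stale) decision."""
    if matched:
        table.rows[key] = value      # WHEN MATCHED THEN UPDATE
    else:
        table.insert(key, value)     # WHEN NOT MATCHED THEN INSERT

t = Table()
# Both transactions run their check while the key is absent...
m1 = merge_check(t, 42)
m2 = merge_check(t, 42)
# ...then apply: the first insert wins, the second hits the unique index.
merge_apply(t, 42, "tx1", m1)
try:
    merge_apply(t, 42, "tx2", m2)
    outcome = "no error"
except UniqueViolation:
    outcome = "unique violation"
```

Taking a table-level lock for the whole MERGE closes the window between the two phases, which is exactly the "excessively large lock" being discussed.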
Re: [HACKERS] Streaming replication as a separate permissions
On Wed, Dec 29, 2010 at 20:12, Alvaro Herrera <alvhe...@commandprompt.com> wrote:
> Excerpts from Magnus Hagander's message of mié dic 29 11:40:34 -0300 2010:
>> On Wed, Dec 29, 2010 at 15:05, Gurjeet Singh <singh.gurj...@gmail.com> wrote:
>>> Any specific reason NOREPLICATION_P and REPLICATION_P use the _P suffix?
>>
>> Um, I just copied it off a similar entry elsewhere. I saw no comment about what _P actually means, and I can't say I know. I know very little about the bison files :-)
>
> Some lexer keywords have a _P suffix because otherwise they'd collide with some symbol in Windows header files or something like that. It's old stuff, but I think you, Magnus, were around at that time.

Heh. That doesn't mean I *remember* it :-)

But yes, I see in commit 12c942383296bd626131241c012c2ab81b081738 the comment "convert some keywords.c symbols to KEYWORD_P to prevent conflict". Based on that, I should probably change it back, right? I just tried a patch for it and it compiles and checks just fine with the _P parts removed.

--
Magnus Hagander
Me: http://www.hagander.net/
Work: http://www.redpill-linpro.com/
Re: [HACKERS] pg_streamrecv for 9.1?
On Thu, Dec 30, 2010 at 6:41 AM, Magnus Hagander <mag...@hagander.net> wrote:
>> As the README says, it is not self-contained (through no fault of its own) and one should typically set archive_command to guarantee zero WAL loss.
>
> Yes. Though you can combine it fine with wal_keep_segments if you think that's safe - but archive_command is push and this tool is pull, so if your backup server goes down for a while, pg_streamrecv will get a gap and fail. Whereas if you configure an archive_command, it will queue up the log on the master if it stops working, up to the point of shutting it down because of out-of-disk. Which you *want*, if you want to be really sure about the backups.

I was thinking I'd like to use pg_streamrecv to make my archive, and the archive script on the master would just verify that the archive has that complete segment. This gets you an archive synced as it's made (as long as pg_streamrecv is running), and my verify-archive command would make sure that if for some reason the backup archive went down, the WAL segments would be blocked on the master until it's up again and current.

a.

--
Aidan Van Dyk        ai...@highrise.ca
http://www.highrise.ca/
Create like a god, command like a king, work like a slave.
[HACKERS] Snapshot synchronization, again...
The snapshot synchronization discussion from the parallel pg_dump patch somehow ended without a clear way to go forward. Let me sum up what has been brought up and propose a short- and long-term solution.

Summary: Passing snapshot sync information can be done either:

a) by returning complete snapshot information from the backend to the client so that the client can pass it along to a different backend, or

b) by returning only a unique identifier to the client and storing the actual snapshot data somewhere on the server side.

Advantage of a): No memory is used in the backend and no memory needs to get cleaned up. It is also theoretically possible that we could forward that data to a hot standby server and do e.g. a dump partially on the master server and partially on the hot standby server, or among several hot standby servers.

Disadvantage of a): The snapshot must be validated to make sure that its information is still current, and it might be difficult to cover all cases of this validation. A client can access not only exactly a published snapshot, but just about any snapshot that fits and passes the validation checks (this is more a disadvantage than an advantage because it allows seeing a database state that never existed in reality).

Advantage of b): No validation necessary; as soon as the transaction that publishes the snapshot loses that snapshot, it will also revoke the snapshot information (either by removing a temp file or deleting it from shared memory).

Disadvantage of b): It doesn't allow a snapshot to be installed on a different server. It requires a serializable open transaction to hold the snapshot.

What I am proposing now is the following: We return snapshot information as a chunk of data to the client. At the same time, however, we set a checksum in shared memory to protect against modification of the snapshot.
A publishing backend can revoke its snapshot by deleting the checksum, and a backend that is asked to install a snapshot can verify that the snapshot is correct and current by calculating the checksum and comparing it with the one in shared memory. This only costs us a few bytes (checksum size * max_connections) in shared memory, and apart from resetting the checksum it does not have cleanup and verification issues.

Note that we are also free to change the internal format of the chunk of data we return whenever we like, so we are free to enhance this feature in the future, transparently to the client.

Thoughts?

Joachim
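[Editorial aside: the publish/revoke/install handshake proposed above can be sketched as follows. All names are hypothetical, SHA-256 stands in for whatever checksum would actually be used, and a per-backend dict stands in for the shared-memory slots:]

```python
import hashlib

# Stand-in for the per-backend checksum slots in shared memory
# (max_connections entries; every name here is hypothetical).
shared_checksums = {}  # backend_id -> checksum bytes

def publish_snapshot(backend_id, snapshot_bytes):
    """Publisher: hand the raw snapshot chunk to the client, record its checksum."""
    shared_checksums[backend_id] = hashlib.sha256(snapshot_bytes).digest()
    return snapshot_bytes  # this opaque chunk travels through the client

def revoke_snapshot(backend_id):
    """Publisher: drop the checksum once the snapshot is no longer held."""
    shared_checksums.pop(backend_id, None)

def install_snapshot(backend_id, snapshot_bytes):
    """Installer: accept the chunk only if its checksum is still registered."""
    expected = shared_checksums.get(backend_id)
    if expected is None:
        raise ValueError("snapshot was revoked or never published")
    if hashlib.sha256(snapshot_bytes).digest() != expected:
        raise ValueError("snapshot data was modified by the client")
    return True
```

The point of the scheme is visible in the two failure paths: a tampered chunk fails the comparison, and a revoked snapshot fails the lookup, so no separate validation of the snapshot contents is needed.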
Re: [HACKERS] pg_streamrecv for 9.1?
On Thu, Dec 30, 2010 at 13:30, Aidan Van Dyk <ai...@highrise.ca> wrote:
> On Thu, Dec 30, 2010 at 6:41 AM, Magnus Hagander <mag...@hagander.net> wrote:
>>> As the README says, it is not self-contained (through no fault of its own) and one should typically set archive_command to guarantee zero WAL loss.
>>
>> Yes. Though you can combine it fine with wal_keep_segments if you think that's safe - but archive_command is push and this tool is pull, so if your backup server goes down for a while, pg_streamrecv will get a gap and fail. Whereas if you configure an archive_command, it will queue up the log on the master if it stops working, up to the point of shutting it down because of out-of-disk. Which you *want*, if you want to be really sure about the backups.
>
> I was thinking I'd like to use pg_streamrecv to make my archive, and the archive script on the master would just verify that the archive has that complete segment. This gets you an archive synced as it's made (as long as pg_streamrecv is running), and my verify-archive command would make sure that if for some reason the backup archive went down, the WAL segments would be blocked on the master until it's up again and current.

That's exactly the method I was envisioning, and in fact the one I am using in a couple of cases - just haven't documented it properly :)

Since pg_streamrecv only moves a segment into the correct archive location when it's completed, the archive_command only needs to check if the file *exists* - if it does, it's transferred; if not, it returns an error to make sure the WAL segments don't get cleaned out.

--
Magnus Hagander
Me: http://www.hagander.net/
Work: http://www.redpill-linpro.com/
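[Editorial aside: the existence-check archive_command described above can be sketched as a tiny script. The path, script name, and wiring are hypothetical, not from the thread:]

```shell
#!/bin/sh
# Sketch of an archive_command helper for the scheme above (all paths
# hypothetical). pg_streamrecv, running on the backup server, moves a
# segment into $ARCHIVE only once it is complete, so the master's
# archive_command merely has to verify the file is already there: if it
# is, report success so the master may recycle the segment; if not, fail
# so the segment stays queued in pg_xlog until the archive catches up.
#
# Suggested wiring in postgresql.conf (hypothetical path):
#   archive_command = '/usr/local/bin/check_archived.sh %f'

ARCHIVE="${ARCHIVE:-/var/lib/postgresql/archive}"

check_archived() {
    if [ -f "$ARCHIVE/$1" ]; then
        return 0
    fi
    echo "segment $1 not yet in archive" >&2
    return 1
}
```

Note that the script never copies anything itself; all it adds is the back-pressure Magnus describes, by returning non-zero until the pull-side archiver has done its work.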
Re: [HACKERS] Re: Vacuum of newly activated 8.3.12 standby receives warnings page xxx is uninitialized --- fixing
On Thu, Dec 30, 2010 at 3:55 AM, Mark Kirkwood <mark.kirkw...@catalyst.net.nz> wrote:
> Well, it is none of the things I considered. The problem seems to be due to the use of --delete in the base backup rsync (see diff attached). In fact I can now reproduce the uninitialized pages using the bare-bones method:

Any time a relation is extended, we end up with a page of all zeros at the end until the updated page is written out, which often doesn't happen until the next checkpoint. So it doesn't seem too mysterious that you could end up with all-zeroes pages on the standby initially, but WAL replay ought to fix that. I suppose the reason it isn't is because you've excluded the backup label, so recovery will begin from the wrong place. Unless I'm missing something, that seems like a really bad idea.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
[HACKERS] Old git repo
Hi!

Are we ready to drop the old git mirror? The one that's still around (as postgresql-old.git) from before we migrated the main repository to git, and thus has the old hashes around.

--
Magnus Hagander
Me: http://www.hagander.net/
Work: http://www.redpill-linpro.com/
Re: [HACKERS] Streaming replication as a separate permissions
Excerpts from Magnus Hagander's message of jue dic 30 08:57:09 -0300 2010:
> On Wed, Dec 29, 2010 at 20:12, Alvaro Herrera <alvhe...@commandprompt.com> wrote:
>> Some lexer keywords have a _P suffix because otherwise they'd collide with some symbol in Windows header files or something like that. It's old stuff, but I think you, Magnus, were around at that time.
>
> Heh. That doesn't mean I *remember* it :-)

:-)

> But yes, I see in commit 12c942383296bd626131241c012c2ab81b081738 the comment "convert some keywords.c symbols to KEYWORD_P to prevent conflict".

Wow, what a mess of a patch ... nowadays this would be like 10 commits (or so I hope) ... hey, did Bruce sabotage the qnx4 port surreptitiously?

> Based on that, I should probably change it back, right? I just tried a patch for it and it compiles and checks just fine with the _P parts removed.

Hmm, I wouldn't bother, really. It's not that important anyway, IMHO.

--
Álvaro Herrera <alvhe...@commandprompt.com>
The PostgreSQL Company - Command Prompt, Inc.
PostgreSQL Replication, Consulting, Custom Development, 24x7 support
Re: [HACKERS] Function for dealing with xlog data
On Tue, Dec 28, 2010 at 16:30, Tom Lane <t...@sss.pgh.pa.us> wrote:
> Alvaro Herrera <alvhe...@commandprompt.com> writes:
>> Excerpts from Magnus Hagander's message of mar dic 28 10:46:31 -0300 2010:
>>> Well, yeah, that was obvious ;) The question is, how much do we prefer the more elegant method? ;)
>>
>> If we go the new type route, do we need it to have an implicit cast to text, for backwards compatibility?
>
> I'd argue not. Probably all existing uses are just selecting the function value. What comes back to the client will just be the text form anyway.

That's certainly the only thing I've seen.

> I'm of the opinion that a new type isn't worth the work, myself, but it would mostly be up to whoever was doing the work.

Fair enough - at least enough people have said it won't be rejected because it's done as a function rather than a datatype - so that seems like the easiest way to proceed.

--
Magnus Hagander
Me: http://www.hagander.net/
Work: http://www.redpill-linpro.com/
Re: [HACKERS] Old git repo
On Thu, Dec 30, 2010 at 8:31 AM, Magnus Hagander <mag...@hagander.net> wrote:
> Are we ready to drop the old git mirror? The one that's still around (as postgresql-old.git) from before we migrated the main repository to git, and thus has the old hashes around.

I see no reason to drop that ever, or at least not any time soon. What is it costing us?

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
Re: [HACKERS] Old git repo
On Thu, Dec 30, 2010 at 15:28, Robert Haas <robertmh...@gmail.com> wrote:
> On Thu, Dec 30, 2010 at 8:31 AM, Magnus Hagander <mag...@hagander.net> wrote:
>> Are we ready to drop the old git mirror? The one that's still around (as postgresql-old.git) from before we migrated the main repository to git, and thus has the old hashes around.
>
> I see no reason to drop that ever, or at least not any time soon. What is it costing us?

Some disk space, so almost nothing. And the potential that people grab it by mistake - it adds a bit of confusion.

Looking at it from the other side, what's the use-case for keeping it? If you want to diff against it or something like that, you can just do that against your local clone (that you already had - if you didn't, you shouldn't be using it at all)...

--
Magnus Hagander
Me: http://www.hagander.net/
Work: http://www.redpill-linpro.com/
Re: [HACKERS] Re: new patch of MERGE (merge_204) a question about duplicated ctid
On 12/30/2010 02:02 AM, Greg Smith wrote:
> Marko Tiikkaja wrote:
>> I have no idea why it worked in the past, but the patch was never designed to work for UPSERT. This has been discussed in the past and some people thought that that's not a huge deal.
>
> It takes an excessively large lock when doing UPSERT, which means its performance under a heavy concurrent load can't be good. The idea is that if the syntax and general implementation issues can get sorted out, fixing the locking can be a separate performance improvement to be implemented later. Using MERGE for UPSERT is the #1 use case for this feature by a gigantic margin. If that doesn't do what's expected, the whole implementation doesn't provide the community anything really worth talking about. That's why I keep hammering on this particular area in all my testing.
>
> One of the reflexive "I can't switch to PostgreSQL easily" stopping points for MySQL users is "I can't convert my ON DUPLICATE KEY UPDATE code". Every other use for MERGE is a helpful side-effect of adding the implementation in my mind, but not the primary driver of why this is important. My hints in this direction before didn't get adopted, so I'm saying it outright now: this patch must have an UPSERT implementation in its regression tests. And the first thing I'm going to do every time a new rev comes in is try and break it with the pgbench test I attached. If Boxuan can start doing that as part of his own testing, I think development here might start moving forward faster. I don't care so much about the rate at which concurrent UPSERT-style MERGE happens, so long as it doesn't crash. But that's where this has been stuck at for a while now.

I strongly agree. It *is* a huge deal.

cheers

andrew
Re: [HACKERS] Snapshot synchronization, again...
Excerpts from Joachim Wieland's message of jue dic 30 09:31:47 -0300 2010:
> Advantage of b): No validation necessary; as soon as the transaction that publishes the snapshot loses that snapshot, it will also revoke the snapshot information (either by removing a temp file or deleting it from shared memory).
>
> Disadvantage of b): It doesn't allow a snapshot to be installed on a different server. It requires a serializable open transaction to hold the snapshot.

Why does it require a serializable transaction? You could simply register the snapshot in any transaction. (Of course, the net effect would be pretty similar to a serializable transaction.)

> We return snapshot information as a chunk of data to the client. At the same time, however, we set a checksum in shared memory to protect against modification of the snapshot. A publishing backend can revoke its snapshot by deleting the checksum, and a backend that is asked to install a snapshot can verify that the snapshot is correct and current by calculating the checksum and comparing it with the one in shared memory. This only costs us a few bytes (checksum size * max_connections) in shared memory, and apart from resetting the checksum it does not have cleanup and verification issues.

So one registered snapshot per transaction? Sounds like a reasonable limitation (I doubt there's a use case for more than that, anyway).

--
Álvaro Herrera <alvhe...@commandprompt.com>
The PostgreSQL Company - Command Prompt, Inc.
PostgreSQL Replication, Consulting, Custom Development, 24x7 support
Re: [HACKERS] estimating # of distinct values
On Dec 27, 2010, at 23:49, Kevin Grittner wrote:
> Robert Haas <robertmh...@gmail.com> wrote:
>> With respect to (b), I think I'd need to see a much more detailed design for how you intend to make this work. Off the top of my head there seem to be some pretty serious feasibility problems.
>
> I had one random thought on that -- it seemed like a large concern was that there would need to be at least an occasional scan of the entire table to rebuild the distinct value information.

I believe we could actually avoid that.

First, the paper "An Optimal Algorithm for the Distinct Elements Problem" actually contains an algorithm which *does* handle deletes - it's called the L_0 estimate there.

Second, as Tomas pointed out, the stream-based estimator is essentially a simplified version of a bloom filter. It starts out with a field of N zero bits, and sets K of them to 1 for each value v in the stream. Which bits are set to 1 depends on some hash function(s) H_i(v). It's then easy to compute how many 1-bits you'd expect to find in the bit field after seeing D distinct values, and by reversing that you can estimate D from the number of 1-bits in the bit field.

To avoid having to rescan large tables, instead of storing one such bit field, we'd store one per B pages of data. We'd then only need to scan a range of B pages around every updated or deleted tuple, and could afterwards compute a new global estimate of D by combining the individual bit fields with bitwise or.

Since the need to regularly VACUUM tables hit by updates or deletes won't go away any time soon, we could piggy-back the bit field rebuilding onto VACUUM to avoid a second scan.

A good value for B would probably be around 1000 * (size of bit field) / (page size). If the bit field needs ~100kB, that'd make B ~= 12000 pages ~= 100MB.

best regards,
Florian Pflug
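[Editorial aside: the estimator Florian describes can be sketched for the K=1 case, where it reduces to what the literature calls linear counting: with N bits and D distinct values, the expected number of zero bits is N*exp(-D/N), which is inverted to estimate D. The field size, hash choice, and all names here are illustrative assumptions, not values from the thread:]

```python
import hashlib
import math

# Toy sketch of the bit-field distinct estimator for K=1 (linear
# counting). N and the SHA-1 hash are illustrative choices.
N = 1 << 16  # number of bits in the field

def bit_for(value):
    """Map a value to one of the N bit positions via a hash H(v)."""
    h = hashlib.sha1(str(value).encode()).digest()
    return int.from_bytes(h[:8], "big") % N

def add(bitfield, value):
    """Record one stream value: set its bit to 1 (idempotent for duplicates)."""
    bitfield[bit_for(value)] = 1

def estimate_distinct(bitfield):
    """Invert E[zero bits] = N * exp(-D/N) to estimate D."""
    zeros = bitfield.count(0)
    if zeros == 0:
        return float("inf")  # field saturated; N was chosen too small
    return -N * math.log(zeros / N)

def merge(a, b):
    """Combine per-page-range fields with bitwise or for a global estimate."""
    return [x | y for x, y in zip(a, b)]
```

With one such field per B pages, `merge` is what turns the per-range summaries back into a whole-table estimate after a partial rescan, since or-ing the fields corresponds to taking the union of the underlying value sets.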
Re: [HACKERS] Snapshot synchronization, again...
On Dec 30, 2010, at 13:31, Joachim Wieland wrote:
> We return snapshot information as a chunk of data to the client. At the same time, however, we set a checksum in shared memory to protect against modification of the snapshot. A publishing backend can revoke its snapshot by deleting the checksum, and a backend that is asked to install a snapshot can verify that the snapshot is correct and current by calculating the checksum and comparing it with the one in shared memory.

We'd still have to stream these checksums to the standbys, though, or would they be exempt from the checksum checks?

I still wonder whether these checks are worth the complexity. I believe we'd only allow snapshot modifications for read-only queries anyway, so what point is there in preventing clients from setting broken snapshots?

best regards,
Florian Pflug
Re: [HACKERS] Old git repo
On Thu, Dec 30, 2010 at 9:30 AM, Magnus Hagander <mag...@hagander.net> wrote:
> On Thu, Dec 30, 2010 at 15:28, Robert Haas <robertmh...@gmail.com> wrote:
>> On Thu, Dec 30, 2010 at 8:31 AM, Magnus Hagander <mag...@hagander.net> wrote:
>>> Are we ready to drop the old git mirror? The one that's still around (as postgresql-old.git) from before we migrated the main repository to git, and thus has the old hashes around.
>>
>> I see no reason to drop that ever, or at least not any time soon. What is it costing us?
>
> Some disk space, so almost nothing. And the potential that people grab it by mistake - it adds a bit of confusion.

Well, if it's clearly labeled "old" I don't think it should confuse anyone much. You could even tack one more commit on there adding a README file with a big ol' warning.

> Looking at it from the other side, what's the use-case for keeping it? If you want to diff against it or something like that, you can just do that against your local clone (that you already had - if you didn't, you shouldn't be using it at all)...

I realize it's not as official as the CVS repository was, but I still think we ought to hold onto it for a year or two. Maybe no one will ever look at it again, but I'm not prepared to bet on that.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
Re: [HACKERS] Streaming replication as a separate permissions
On tor, 2010-12-23 at 17:29 -0500, Tom Lane wrote: Josh Berkus j...@agliodbs.com writes: On 12/23/10 2:21 PM, Tom Lane wrote: Well, that's one laudable goal here, but secure by default is another one that ought to be taken into consideration. I don't see how *not* granting the superuser replication permissions makes things more secure. The superuser can grant replication permissions to itself, so why is suspending them by default beneficial? I'm not following your logic here. Well, the reverse of that is just as true: if we ship it without replication permissions on the postgres user, people can change that if they'd rather not create a separate role for replication. But I think we should encourage people to NOT do it that way. Setting it up that way by default hardly encourages use of a more secure arrangement. I think this argument is a bit inconsistent in the extreme. You might as well argue that a superuser shouldn't have any permissions by default, to discourage users from using it. They can always grant permissions back to it. I don't see why this particular one is so different. If we go down this road, we'll end up with a mess of permissions that a superuser has and doesn't have. -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Streaming replication as a separate permissions
On ons, 2010-12-29 at 11:09 +0100, Magnus Hagander wrote: I've applied this version (with some minor typo-fixes). This page is now somewhat invalidated: http://developer.postgresql.org/pgdocs/postgres/role-attributes.html First, it doesn't mention the replication privilege, and second it continues to claim that superuser status bypasses all permission checks. -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Streaming replication as a separate permissions
On Thu, Dec 30, 2010 at 9:54 AM, Peter Eisentraut pete...@gmx.net wrote: On tor, 2010-12-23 at 17:29 -0500, Tom Lane wrote: Josh Berkus j...@agliodbs.com writes: On 12/23/10 2:21 PM, Tom Lane wrote: Well, that's one laudable goal here, but secure by default is another one that ought to be taken into consideration. I don't see how *not* granting the superuser replication permissions makes things more secure. The superuser can grant replication permissions to itself, so why is suspending them by default beneficial? I'm not following your logic here. Well, the reverse of that is just as true: if we ship it without replication permissions on the postgres user, people can change that if they'd rather not create a separate role for replication. But I think we should encourage people to NOT do it that way. Setting it up that way by default hardly encourages use of a more secure arrangement. I think this argument is a bit inconsistent in the extreme. You might as well argue that a superuser shouldn't have any permissions by default, to discourage users from using it. They can always grant permissions back to it. I don't see why this particular one is so different. If we go down this road, we'll end up with a mess of permissions that a superuser has and doesn't have. +1. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] SLRU API tweak
Excerpts from Kevin Grittner's message of mié dic 29 20:46:55 -0300 2010: Attached is a small patch to avoid putting an opaque structure into the slru.h file and using it in an external function call where external callers must always specify NULL. Thanks, committed. -- Álvaro Herrera alvhe...@commandprompt.com The PostgreSQL Company - Command Prompt, Inc. PostgreSQL Replication, Consulting, Custom Development, 24x7 support -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
RIGHT/FULL OUTER hash joins (was Re: [HACKERS] small table left outer join big table)
I had an epiphany about this topic, or actually two of them. 1. Whether or not you think there's a significant performance reason to support hash right joins, there's a functionality reason. The infrastructure for right join could just as easily do full joins. And AFAICS, a hash full join would only require one hashable join clause --- the other FULL JOIN ON conditions could be anything at all. This is unlike the situation for merge join, where all the JOIN ON conditions have to be mergeable or it doesn't work right. So we could greatly reduce the scope of the dreaded FULL JOIN is only supported with merge-joinable join conditions error. (Well, okay, it's not *that* dreaded, but people complain about it occasionally.) 2. The obvious way to implement this would involve adding an extra bool field to struct HashJoinTupleData. The difficulty with that, and the reason I'd been resistant to the whole idea, is that it'd eat up a full word per hashtable entry because of alignment considerations. (On 64-bit machines it'd be free because of alignment considerations, but that's cold comfort when 32-bit machines are the ones pressed for address space.) But we only need one bit, so what about commandeering an infomask bit in the tuple itself? For the initial implementation I'd be inclined to take one of the free bits in t_infomask2. We could actually get away with overlaying the flag bit with one of the tuple visibility bits, since it will only be used in tuples that are in the in-memory hash table, which don't need visibility info anymore. But that seems like a kluge that could wait until we really need the flag space. Comments? regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
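The effect of the proposed per-tuple "matched" flag can be sketched in Python. This is an illustration of the algorithm, not the actual C implementation: probe as usual, set the flag on every inner tuple that finds a partner, then scan the hash table once more and emit the never-matched inner tuples NULL-extended.

```python
# One extra boolean per hash-table entry is all RIGHT/FULL hash joins need.

def hash_full_join(outer_rows, inner_rows, key):
    # Build phase: bucket inner rows, pairing each with a "matched" flag.
    table = {}
    for row in inner_rows:
        table.setdefault(key(row), []).append([row, False])

    result = []
    # Probe phase: ordinary hash join, plus setting the flag on matches.
    for orow in outer_rows:
        matched = False
        for entry in table.get(key(orow), []):
            result.append((orow, entry[0]))
            entry[1] = True            # the one extra bit per entry
            matched = True
        if not matched:
            result.append((orow, None))  # NULL-extend: LEFT side of FULL

    # Final scan: emit inner tuples that never matched (RIGHT side of FULL).
    for bucket in table.values():
        for row, was_matched in bucket:
            if not was_matched:
                result.append((None, row))
    return result
```

Note that only the bucketing key has to be hashable; any additional FULL JOIN ON conditions could be evaluated inside the probe loop before setting the flag, which is the point of item 1 above.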
Re: [HACKERS] and it's not a bunny rabbit, either
On Wed, Dec 29, 2010 at 5:14 PM, David Fetter da...@fetter.org wrote: On Wed, Dec 29, 2010 at 04:53:47PM -0500, Robert Haas wrote: On Wed, Dec 29, 2010 at 4:09 AM, Heikki Linnakangas heikki.linnakan...@enterprisedb.com wrote: On 29.12.2010 06:54, Robert Haas wrote: With the patch: rhaas=# cluster v; ERROR: views do not support CLUSTER do not support sounds like a missing feature, rather than a nonsensical command. How about something like CLUSTER cannot be used on views In the latest version of this patch, I created four translatable strings per object type: blats do not support %s (where %s is an SQL command) blats do not support constraints blats do not support rules blats do not support triggers It's reasonable enough to write CLUSTER cannot be used on views, but does constraints cannot be used on views seems more awkward to me. Or do we think that's OK? That particular one looks good insofar as it describes reality. With predicate locks, etc., it may become untrue later, though :) After further thought, I think it makes sense to change this around a bit and create a family of functions that can be invoked like this: void check_relation_for_FEATURE_support(Relation rel); ...where FEATURE is constraint, trigger, rule, index, etc. The function will be defined to throw an error if the relation isn't of a type that can support the named feature. The error message will be of the form: constraints can only be used on tables triggers can be used only on tables and views etc. This avoids the need to define a separate error message for each unsupported relkind, and I think it's just as informative as, e.g., constraints cannot be used on whatever object type you tried to invoke it on. We can adopt the same language for commands, e.g.: CLUSTER can only be used on tables. Comments? 
-- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Re: RIGHT/FULL OUTER hash joins (was Re: [HACKERS] small table left outer join big table)
On Thu, Dec 30, 2010 at 10:45 AM, Tom Lane t...@sss.pgh.pa.us wrote: I had an epiphany about this topic, or actually two of them. 1. Whether or not you think there's a significant performance reason to support hash right joins, there's a functionality reason. The infrastructure for right join could just as easily do full joins. And AFAICS, a hash full join would only require one hashable join clause --- the other FULL JOIN ON conditions could be anything at all. This is unlike the situation for merge join, where all the JOIN ON conditions have to be mergeable or it doesn't work right. So we could greatly reduce the scope of the dreaded FULL JOIN is only supported with merge-joinable join conditions error. (Well, okay, it's not *that* dreaded, but people complain about it occasionally.) Yeah, that would be neat. It might be a lot faster in some cases, too. 2. The obvious way to implement this would involve adding an extra bool field to struct HashJoinTupleData. The difficulty with that, and the reason I'd been resistant to the whole idea, is that it'd eat up a full word per hashtable entry because of alignment considerations. (On 64-bit machines it'd be free because of alignment considerations, but that's cold comfort when 32-bit machines are the ones pressed for address space.) But we only need one bit, so what about commandeering an infomask bit in the tuple itself? For the initial implementation I'd be inclined to take one of the free bits in t_infomask2. We could actually get away with overlaying the flag bit with one of the tuple visibility bits, since it will only be used in tuples that are in the in-memory hash table, which don't need visibility info anymore. But that seems like a kluge that could wait until we really need the flag space. I think that's a reasonable approach, although I might be inclined to do the overlay sooner rather than later if it doesn't add too much complexity. 
-- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Re: [HACKERS] Streaming replication as a separate permissions
Magnus Hagander mag...@hagander.net writes: But yes, I see in commit 12c942383296bd626131241c012c2ab81b081738 the comment convert some keywords.c symbols to KEYWORD_P to prevent conflict. Based on that, I should probably change it back, right? I just tried a patch for it and it compiles and checks just fine with the _P parts removed. I'd leave it be, it's fine as-is. regards, tom lane
Re: [HACKERS] and it's not a bunny rabbit, either
Excerpts from Robert Haas's message of jue dic 30 12:47:42 -0300 2010: After further thought, I think it makes sense to change this around a bit and create a family of functions that can be invoked like this: void check_relation_for_FEATURE_support(Relation rel); ...where FEATURE is constraint, trigger, rule, index, etc. The function will be defined to throw an error if the relation isn't of a type that can support the named feature. The error message will be of the form: constraints can only be used on tables triggers can be used only on tables and views etc. So this will create a combinatorial explosion of strings to translate? I liked the other idea because the number of translatable strings was kept within reasonable bounds. -- Álvaro Herrera alvhe...@commandprompt.com The PostgreSQL Company - Command Prompt, Inc. PostgreSQL Replication, Consulting, Custom Development, 24x7 support -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Old git repo
Robert Haas robertmh...@gmail.com writes: On Thu, Dec 30, 2010 at 9:30 AM, Magnus Hagander mag...@hagander.net wrote: On Thu, Dec 30, 2010 at 15:28, Robert Haas robertmh...@gmail.com wrote: I see no reason to drop that ever, or at least not any time soon. What is it costing us? Some disk space, so almost nothing. And the potential that people grab it by mistake - it adds a bit to confusion. I realize it's not as official as the CVS repository was, but I still think we ought to hold onto it for a year or two. Maybe no one will ever look at it again, but I'm not prepared to bet on that. I'm with Magnus on this: the risk of confusion seems to greatly outweigh any possible benefit from keeping it. There is no reason for anyone to use that old repo unless they are still working with a local clone of it, and even if they do have a local clone, such a clone is self-sufficient. And more to the point, it seems quite unlikely that anyone is still working with such a clone rather than having rebased by now. We should wait a week or so to see if anyone does pipe up and say they still use that repo; but in the absence of such feedback, it should go. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] and it's not a bunny rabbit, either
On Thu, Dec 30, 2010 at 11:00 AM, Alvaro Herrera alvhe...@commandprompt.com wrote: Excerpts from Robert Haas's message of jue dic 30 12:47:42 -0300 2010: After further thought, I think it makes sense to change this around a bit and create a family of functions that can be invoked like this: void check_relation_for_FEATURE_support(Relation rel); ...where FEATURE is constraint, trigger, rule, index, etc. The function will be defined to throw an error if the relation isn't of a type that can support the named feature. The error message will be of the form: constraints can only be used on tables triggers can be used only on tables and views etc. So this will create a combinatorial explosion of strings to translate? I liked the other idea because the number of translatable strings was kept within reasonable bounds. No, quite the opposite. With the other approach, you needed: constraints cannot be used on views constraints cannot be used on composite types constraints cannot be used on TOAST tables constraints cannot be used on indexes constraints cannot be used on foreign tables With this, you just need: constraints can only be used on tables -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Old git repo
On Thu, Dec 30, 2010 at 11:02 AM, Tom Lane t...@sss.pgh.pa.us wrote: Robert Haas robertmh...@gmail.com writes: On Thu, Dec 30, 2010 at 9:30 AM, Magnus Hagander mag...@hagander.net wrote: On Thu, Dec 30, 2010 at 15:28, Robert Haas robertmh...@gmail.com wrote: I see no reason to drop that ever, or at least not any time soon. What is it costing us? Some disk space, so almost nothing. And the potential that people grab it by mistake - it adds a bit to confusion. I realize it's not as official as the CVS repository was, but I still think we ought to hold onto it for a year or two. Maybe no one will ever look at it again, but I'm not prepared to bet on that. I'm with Magnus on this: the risk of confusion seems to greatly outweigh any possible benefit from keeping it. There is no reason for anyone to use that old repo unless they are still working with a local clone of it, and even if they do have a local clone, such a clone is self-sufficient. And more to the point, it seems quite unlikely that anyone is still working with such a clone rather than having rebased by now. We should wait a week or so to see if anyone does pipe up and say they still use that repo; but in the absence of such feedback, it should go. Well, I still have at least one repo against the old repository, which is why I mentioned it. Maybe there's nothing valuable in there and maybe I don't need the origin anyway, but I haven't bothered to check it over carefully yet because, well, there's no rush to clean up my old repositories, and there is a rush to finish 9.1 development real soon now. I can, of course, carve out time to deal with it, but I think that it's a poor use of time and that the risk of confusion that you and Magnus are postulating is mostly hypothetical.
-- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Re: RIGHT/FULL OUTER hash joins (was Re: [HACKERS] small table left outer join big table)
Robert Haas robertmh...@gmail.com writes: On Thu, Dec 30, 2010 at 10:45 AM, Tom Lane t...@sss.pgh.pa.us wrote: ... But we only need one bit, so what about commandeering an infomask bit in the tuple itself? For the initial implementation I'd be inclined to take one of the free bits in t_infomask2. We could actually get away with overlaying the flag bit with one of the tuple visibility bits, since it will only be used in tuples that are in the in-memory hash table, which don't need visibility info anymore. But that seems like a kluge that could wait until we really need the flag space. I think that's a reasonable approach, although I might be inclined to do the overlay sooner rather than later if it doesn't add too much complexity. Well, there's no complexity involved, it's just which bit do we define the macro as. Any complexity is conceptual. regards, tom lane
Re: [HACKERS] and it's not a bunny rabbit, either
Robert Haas robertmh...@gmail.com writes: After further thought, I think it makes sense to change this around a bit and create a family of functions that can be invoked like this: void check_relation_for_FEATURE_support(Relation rel); That seems like a reasonable idea, but ... ... The error message will be of the form: constraints can only be used on tables triggers can be used only on tables and views etc. This avoids the need to define a separate error message for each unsupported relkind, and I think it's just as informative as, e.g., constraints cannot be used on whatever object type you tried to invoke it on. We can adopt the same language for commands, e.g.: CLUSTER can only be used on tables. ISTM there are four things we might potentially want to state in the error message: the feature/operation you tried to apply, the name of the object you tried to apply it to, the type of that object, and the set of object types that the feature/operation will actually work for. Our current wording (foo is not a table or view) covers the second and fourth of these, though the fourth is stated rather awkwardly. Your proposal above covers the first and fourth. I'm not happy about leaving out the object name, because there are going to be cases where people get this type of error out of a long sequence or nest of operations and it's not clear what it's talking about. It'd probably be okay to leave out the actual object type as long as you include its name, though. One possibility is to break it down like this: ERROR: foo is a sequence DETAIL: Triggers can only be used on tables and views. This could still be emitted by a function such as you suggest, and indeed that would be the most practical way from both a consistency and code-size standpoint. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
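Tom's two-line ERROR/DETAIL style could be prototyped roughly as follows. This is a hypothetical sketch only: the function name follows Robert's proposal, but the relkind sets and message table are illustrative, not PostgreSQL's actual support matrix.

```python
# Sketch: one check function per feature, emitting "object is a <type>" as the
# ERROR line and the feature's allowed-relkind list as the DETAIL line.

RELKIND_NAMES = {"r": "table", "v": "view", "S": "sequence", "i": "index"}

# Per feature: (allowed relkinds, translatable DETAIL string).
FEATURE_SUPPORT = {
    "triggers":    ({"r", "v"}, "Triggers can only be used on tables and views."),
    "constraints": ({"r"},      "Constraints can only be used on tables."),
}

def check_relation_for_feature_support(relname, relkind, feature):
    allowed, detail = FEATURE_SUPPORT[feature]
    if relkind not in allowed:
        # In the real backend this would be an ereport(ERROR, ...) with
        # errmsg() and errdetail(); a ValueError stands in here.
        raise ValueError('ERROR: "%s" is a %s\nDETAIL: %s'
                         % (relname, RELKIND_NAMES[relkind], detail))

try:
    check_relation_for_feature_support("foo", "S", "triggers")
except ValueError as e:
    print(e)   # ERROR: "foo" is a sequence
               # DETAIL: Triggers can only be used on tables and views.
```

This keeps the translatable strings down to one name per relkind plus one DETAIL sentence per feature, which is Alvaro's concern about combinatorial explosion.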
Re: RIGHT/FULL OUTER hash joins (was Re: [HACKERS] small table left outer join big table)
On Thu, Dec 30, 2010 at 11:50 PM, Robert Haas robertmh...@gmail.com wrote: On Thu, Dec 30, 2010 at 10:45 AM, Tom Lane t...@sss.pgh.pa.us wrote: I had an epiphany about this topic, or actually two of them. 1. Whether or not you think there's a significant performance reason to support hash right joins, there's a functionality reason. The infrastructure for right join could just as easily do full joins. And AFAICS, a hash full join would only require one hashable join clause --- the other FULL JOIN ON conditions could be anything at all. This is unlike the situation for merge join, where all the JOIN ON conditions have to be mergeable or it doesn't work right. So we could greatly reduce the scope of the dreaded FULL JOIN is only supported with merge-joinable join conditions error. (Well, okay, it's not *that* dreaded, but people complain about it occasionally.) Yeah, that would be neat. It might be a lot faster in some cases, too. Yeah, PostgreSQL should have this great feature. Actually Oracle 10g already has the right hash join, http://dbcrusade.blogspot.com/2008/01/oracle-hash-join-right-outer.html And Oracle 11g has the full hash join. http://www.dba-oracle.com/oracle11g/oracle_11g_full_hash_join.htm Haven't checked whether other DBMS have this feature. Thanks, Li Jie
[HACKERS] pl/python do not delete function arguments
(continuing the flurry of patches) Here's a patch that stops PL/Python from removing the function's arguments from its globals dict after calling it. It's an incremental patch on top of the plpython-refactor patch sent in http://archives.postgresql.org/message-id/4d135170.3080...@wulczer.org. Git branch for this patch: https://github.com/wulczer/postgres/tree/dont-remove-arguments Apart from being useless, as the whole dict is unreffed and thus freed in PLy_procedure_delete, removing args actively breaks things for recursive invocation of the same function. The recursive callee after returning will remove the args from globals, and subsequent access to the arguments in the caller will cause a NameError (see new regression test in patch). Cheers, Jan diff --git a/src/pl/plpython/expected/plpython_spi.out b/src/pl/plpython/expected/plpython_spi.out index 7f4ae5c..cb11f60 100644 *** a/src/pl/plpython/expected/plpython_spi.out --- b/src/pl/plpython/expected/plpython_spi.out *** CONTEXT: PL/Python function result_nro *** 133,135 --- 133,163 2 (1 row) + -- + -- check recursion with same argument does not clobber globals + -- + CREATE FUNCTION recursion_test(n integer) RETURNS integer + AS $$ + if n in (0, 1): + return 1 + + return n * plpy.execute(select recursion_test(%d) as result % (n - 1))[0][result] + $$ LANGUAGE plpythonu; + SELECT recursion_test(5); + recursion_test + + 120 + (1 row) + + SELECT recursion_test(4); + recursion_test + + 24 + (1 row) + + SELECT recursion_test(1); + recursion_test + + 1 + (1 row) + diff --git a/src/pl/plpython/plpython.c b/src/pl/plpython/plpython.c index 67eb0f3..1827fc9 100644 *** a/src/pl/plpython/plpython.c --- b/src/pl/plpython/plpython.c *** static Datum PLy_function_handler(Functi *** 307,313 static HeapTuple PLy_trigger_handler(FunctionCallInfo fcinfo, PLyProcedure *); static PyObject *PLy_function_build_args(FunctionCallInfo fcinfo, PLyProcedure *); - static void PLy_function_delete_args(PLyProcedure *); static PyObject 
*PLy_trigger_build_args(FunctionCallInfo fcinfo, PLyProcedure *, HeapTuple *); static HeapTuple PLy_modify_tuple(PLyProcedure *, PyObject *, --- 307,312 *** PLy_function_handler(FunctionCallInfo fc *** 988,1001 */ plargs = PLy_function_build_args(fcinfo, proc); plrv = PLy_procedure_call(proc, args, plargs); - if (!proc-is_setof) - { - /* - * SETOF function parameters will be deleted when last row is - * returned - */ - PLy_function_delete_args(proc); - } Assert(plrv != NULL); } --- 987,992 *** PLy_function_handler(FunctionCallInfo fc *** 1053,1060 Py_XDECREF(plargs); Py_XDECREF(plrv); - PLy_function_delete_args(proc); - if (has_error) ereport(ERROR, (errcode(ERRCODE_DATA_EXCEPTION), --- 1044,1049 *** PLy_function_build_args(FunctionCallInfo *** 1267,1287 return args; } - - static void - PLy_function_delete_args(PLyProcedure *proc) - { - int i; - - if (!proc-argnames) - return; - - for (i = 0; i proc-nargs; i++) - if (proc-argnames[i]) - PyDict_DelItemString(proc-globals, proc-argnames[i]); - } - - /* Decide if a cached PLyProcedure struct is still valid */ static bool PLy_procedure_valid(PLyProcedure *proc, HeapTuple procTup) --- 1256,1261 diff --git a/src/pl/plpython/sql/plpython_spi.sql b/src/pl/plpython/sql/plpython_spi.sql index 7f8f6a3..3b65f95 100644 *** a/src/pl/plpython/sql/plpython_spi.sql --- b/src/pl/plpython/sql/plpython_spi.sql *** else: *** 105,107 --- 105,123 $$ LANGUAGE plpythonu; SELECT result_nrows_test(); + + + -- + -- check recursion with same argument does not clobber globals + -- + CREATE FUNCTION recursion_test(n integer) RETURNS integer + AS $$ + if n in (0, 1): + return 1 + + return n * plpy.execute(select recursion_test(%d) as result % (n - 1))[0][result] + $$ LANGUAGE plpythonu; + + SELECT recursion_test(5); + SELECT recursion_test(4); + SELECT recursion_test(1); -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
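The failure mode the patch fixes can be reproduced in plain Python. This is a standalone simulation, not PL/Python itself: `SHARED_GLOBALS`, `call`, and `FUNC_BODY` are stand-ins for PL/Python's per-function globals dict, its call handler, and the function source.

```python
# Every invocation of a PL/Python function shares one globals dict.  The buggy
# variant deletes the argument "n" from it on the way out, so when a recursive
# callee returns, the caller's subsequent access to "n" raises NameError.

SHARED_GLOBALS = {}

FUNC_BODY = """
if n in (0, 1):
    __result__ = 1
else:
    __result__ = n * recurse(n - 1)
check = n   # a subsequent access to the argument, after any recursion
"""

def call(n, delete_args=True):
    SHARED_GLOBALS["recurse"] = lambda m: call(m, delete_args)
    SHARED_GLOBALS["n"] = n              # install argument into shared globals
    exec(FUNC_BODY, SHARED_GLOBALS)
    result = SHARED_GLOBALS["__result__"]
    if delete_args:
        SHARED_GLOBALS.pop("n", None)    # what PLy_function_delete_args did
    return result

assert call(5, delete_args=False) == 120   # patched behaviour: recursion works
try:
    call(5, delete_args=True)              # buggy behaviour
except NameError:
    print("NameError: the recursive callee deleted the caller's argument")
```

As the patch notes, the deletion was useless anyway, since the whole dict is unreffed and freed in PLy_procedure_delete.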
Re: [HACKERS] and it's not a bunny rabbit, either
Excerpts from Tom Lane's message of Thu Dec 30 13:49:20 -0300 2010: One possibility is to break it down like this: ERROR: foo is a sequence DETAIL: Triggers can only be used on tables and views. This could still be emitted by a function such as you suggest, and indeed that would be the most practical way from both a consistency and code-size standpoint. This seems good to me. There will only be as many messages as relkinds we have, plus as many as features there are. -- Álvaro Herrera alvhe...@commandprompt.com The PostgreSQL Company - Command Prompt, Inc. PostgreSQL Replication, Consulting, Custom Development, 24x7 support
Re: [HACKERS] and it's not a bunny rabbit, either
On Thu, Dec 30, 2010 at 11:49 AM, Tom Lane t...@sss.pgh.pa.us wrote: One possibility is to break it down like this: ERROR: foo is a sequence DETAIL: Triggers can only be used on tables and views. This could still be emitted by a function such as you suggest, and indeed that would be the most practical way from both a consistency and code-size standpoint. Great idea. I should have thought of that. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Re: [HACKERS] Sync Rep Design
On 12/30/2010 06:26 PM, Simon Riggs wrote: I've mulled over the design for sync rep for awhile now, and have come up with a feature set that includes the final detailed feedback from Fujii Masao, Aidan Van Dyk, Josh Berkus and others. The design also draws from MySQL concepts to make the two interfaces as similar and as simple as possible. It should be noted that the design presented here has many features that the MySQL design does not. I am currently finishing up my patch to offer these features, so its time to begin final discussions. As an interim step, I enclose a PDF version of relevant excerpts from the doc patch. The patch will follow on a later post in the near future. I would like to separate discussions on user interface from that of internal design, to make it easier for more people to get involved. Please read the following and post your comments. Thank you. it would help if this would just be a simple text-only description of the design that people can actually comment on inline. I don't think sending technical design proposals as a pdf (which seems to be written in doc-style as well) is a good idea to encourage discussion on -hackers :( Stefan -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
[HACKERS] Problems with autovacuum and vacuum
Hello, Last week I had a serious problem with my PostgreSQL database. My autovacuum is OFF, but in September it started to prevent the transaction wraparound; however last week the following message appeared continuously in my log: WARNING: database production must be vacuumed within 4827083 transactions HINT: To avoid a database shutdown, execute a full-database VACUUM in production. This message appeared for five to six hours; after that, the message disappeared from the log. Any idea about what could have happened? Every day the vacuum is executed on some tables; and on Sundays it's executed on all tables. But as the autovacuum has been running since September, and it runs for a long time, the vacuum was blocked because autovacuum had been running on the same table. How should I proceed in this case? The table where the autovacuum is running and where the vacuum was blocked has billions of rows. I'm using PostgreSQL 8.3.8. The configuration of the vacuum parameters is: vacuum_cost_limit = 200 vacuum_cost_delay = 0 vacuum_freeze_min_age = 1 autovacuum = off autovacuum_freeze_max_age = 2 Regards, João Paulo -- JotaComm http://jotacomm.wordpress.com
Re: [HACKERS] Avoiding rewrite in ALTER TABLE ALTER TYPE
On Dec 29, 2010, at 10:14 PM, Robert Haas wrote: +1 for trying to optimize these cases (but maybe after we optimize the varchar -> text and varchar(less) -> varchar(more) cases to skip the scan altogether). +1 on getting the obvious cases of varchar and numeric done first; we run into those a lot at work and would be willing to sponsor work on a patch that makes those operations as fast as just adding a new column. -- Jim C. Nasby, Database Architect j...@nasby.net 512.569.9461 (cell) http://jim.nasby.net
Re: [HACKERS] Sync Rep Design
On Thu, 2010-12-30 at 18:42 +0100, Stefan Kaltenbrunner wrote: it would help if this would just be a simple text-only description of the design that people can actually comment on inline. I don't think sending technical design proposals as a pdf (which seems to be written in doc-style as well) is a good idea to encourage discussion on -hackers :( 25.2.6. Synchronous Replication Streaming replication is by default asynchronous. Transactions on the primary server write commit records to WAL, yet do not know whether or when a standby has received and processed those changes. So with asynchronous replication, if the primary crashes, transactions committed on the primary might not have been received by any standby. As a result, failover from primary to standby could cause data loss because transaction completions are absent, relative to the primary. The amount of data loss is proportional to the replication delay at the time of failover. Synchronous replication offers the ability to guarantee that all changes made by a transaction have been transferred to at least one remote standby server. This is an extension to the standard level of durability offered by a transaction commit. This is referred to as semi-synchronous replication. When synchronous replication is requested, the commit of a write transaction will wait until confirmation that the commit record has been transferred successfully to at least one standby server. Waiting for confirmation increases the user's confidence that the changes will not be lost in the event of server crashes but it also necessarily increases the response time for the requesting transaction. The minimum wait time is the roundtrip time from primary to standby. Read only transactions and transaction rollbacks need not wait for replies from standby servers. Subtransaction commits do not wait for responses from standby servers, only final top-level commits. 
Long running actions such as data loading or index building do not wait until the very final commit message. 25.2.6.1. Basic Configuration Synchronous replication must be enabled on both the primary and at least one standby server. If synchronous replication is disabled on the master, or enabled on the primary but not enabled on any slaves, the primary will use asynchronous replication by default. We use a single parameter to enable synchronous replication, set in postgresql.conf on both primary and standby servers: synchronous_replication = off (default) | on On the primary, synchronous_replication can be set for particular users or databases, or dynamically by applications programs. If more than one standby server specifies synchronous_replication, then whichever standby replies first will release waiting commits. Turning this setting off for a standby allows the administrator to exclude certain standby servers from releasing waiting transactions. This is useful if not all standby servers are designated as potential future primary servers. On the standby, this parameter only takes effect at server start. 25.2.6.2. Planning for Performance Synchronous replication usually requires carefully planned and placed standby servers to ensure applications perform acceptably. Waiting doesn't utilise system resources, but transaction locks continue to be held until the transfer is confirmed. As a result, incautious use of synchronous replication will reduce performance for database applications because of increased response times and higher contention. PostgreSQL allows the application developer to specify the durability level required via replication. This can be specified for the system overall, though it can also be specified for specific users or connections, or even individual transactions. 
For example, an application workload might consist of: 10% of changes are important customer details, while 90% of changes are less important data that the business can more easily survive if it is lost, such as chat messages between users. With synchronous replication options specified at the application level (on the master) we can offer sync rep for the most important changes, without slowing down the bulk of the total workload. Application level options are an important and practical tool for allowing the benefits of synchronous replication for high performance applications. This feature is unique to PostgreSQL. 25.2.6.3. Planning for High Availability The easiest and safest method of gaining High Availability using synchronous replication is to configure at least two standby servers. To understand why, we need to examine what can happen when you lose all standby servers. Commits made when synchronous_replication is set will wait until at least one standby responds. The response may never occur if the last, or only, standby should crash or the network drops. What should we do in that situation? Sitting and waiting will typically cause operational problems because it is an effective outage of the
Re: [HACKERS] and it's not a bunny rabbit, either
On Thu, Dec 30, 2010 at 12:32 PM, Robert Haas robertmh...@gmail.com wrote: On Thu, Dec 30, 2010 at 11:49 AM, Tom Lane t...@sss.pgh.pa.us wrote: One possibility is to break it down like this: ERROR: foo is a sequence DETAIL: Triggers can only be used on tables and views. This could still be emitted by a function such as you suggest, and indeed that would be the most practical way from both a consistency and code-size standpoint. Great idea. I should have thought of that. On further reflection, this can still turn into a laundry list in certain cases. DETAIL: You can only comment on columns of tables, views, and composite types. seems less helpful than: DETAIL: Comments on relations with system-generated column names are not supported. I think that for rules, triggers, constraints, and anything that only works on a single relkind, we can't do much better than to list the specific object types. But where there's some sort of guiding principle involved I think we'd do well to articulate it. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Re: [HACKERS] estimating # of distinct values
On 30.12.2010 15:43, Florian Pflug wrote: On Dec27, 2010, at 23:49 , Kevin Grittner wrote: Robert Haas robertmh...@gmail.com wrote: With respect to (b), I think I'd need to see a much more detailed design for how you intend to make this work. Off the top of my head there seems to be some pretty serious feasibility problems. I had one random thought on that -- it seemed like a large concern was that there would need to be at least an occasional scan of the entire table to rebuild the distinct value information. I believe we could actually avoid that. First, the paper An Optimal Algorithm for the Distinct Elements Problem actually contains an algorithm which *does* handle deletes - it's called L_0 estimate there. Hmmm, that's interesting. I know there's a part about L_0 estimation, but that's about estimating the Hamming norm of a vector - so I've ignored it, as I thought we couldn't use it to estimate the number of distinct values. But if it really handles deletions and if we can use it, then it's really interesting. Second, as Tomas pointed out, the stream-based estimator is essentially a simplified version of a bloom filter. It starts out with a field of N zero bits, and sets K of them to 1 for each value v in the stream. Which bits are set to 1 depends on some hash function(s) H_i(v). It's then easy to compute how many 1-bits you'd expect to find in the bit field after seeing D distinct values, and by reversing that you can estimate D from the number of 1-bits in the bit field. No, I haven't said the stream-based estimators are simplified versions of a Bloom filter. I said the approach is very similar - all the algorithms use bitmaps and hash functions, but the algorithms (Bloom filter vs. probabilistic counting and adaptive sampling) are very different. The Bloom filter is much more straightforward. The other algorithms are much more sophisticated, which allows them to use less space. 
To avoid having to rescan large tables, instead of storing one such bit field, we'd store one per B pages of data. We'd then only need to scan a range of B pages around every updated or deleted tuple, and could afterwards compute a new global estimate of D by combining the individual bit fields with bitwise and. I don't think this could help. 1) This works just with the Bloom filters, not with the other algorithms (you can't combine the segments using bitmap OR). 2) With heavily modified tables the updates are usually 'spread' through the whole table, so you'll have to rebuild all the segments anyway. Since the need to regularly VACUUM tables hit by updates or deletes won't go away any time soon, we could piggy-back the bit field rebuilding onto VACUUM to avoid a second scan. Well, I guess it's a bit more complicated. First of all, there's a local VACUUM when doing HOT updates. Second, you need to handle inserts too (what if the table just grows?). But I'm not a VACUUM expert, so maybe I'm wrong and this is the right place to handle rebuilds of distinct stats. I was thinking about something else - we could 'attach' the rebuild to an actual seq scan if the amount of changes reaches some threshold (since the last rebuild). Only in case the amount of changes reaches a higher threshold would we rebuild the stats on our own. Something like IF (# of updates + deletes > 5%) THEN wait for seq scan; IF (# of updates + deletes > 10%) THEN rebuild the stats. I've found a nice ppt describing how Oracle does this: http://www.oraclegeek.net/downloads/One_Pass_Distinct_Sampling.ppt and there's a PDF version http://www.oraclegeek.net/downloads/OnePassDistinctSampling.pdf According to this, Oracle is using the probabilistic counting approach (see slide 26). And once they find out there were too many changes in the table, they rebuild the whole thing. I'm not saying we should do exactly the same, just that this might be a good direction. 
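To make the bitmap-plus-hash idea concrete, here is a minimal sketch of linear counting, the simplest member of that family (not the L_0 or probabilistic-counting algorithms from the papers; the bitmap size and hash choice are arbitrary, for illustration only):

```python
import hashlib
import math

class LinearCounter:
    """Toy distinct-value estimator: hash each value to one bit of a
    fixed bitmap, then invert the expected fraction of zero bits."""

    def __init__(self, nbits=4096):
        self.nbits = nbits
        self.bits = bytearray(nbits // 8)

    def add(self, value):
        # Hash the value to a bit position and set that bit.
        h = int.from_bytes(hashlib.sha1(repr(value).encode()).digest()[:8], "big")
        pos = h % self.nbits
        self.bits[pos // 8] |= 1 << (pos % 8)

    def estimate(self):
        # With D distinct values the expected zero-bit fraction is
        # exp(-D / nbits); solving for D gives the estimate below.
        zeros = sum(bin(b ^ 0xFF).count("1") for b in self.bits)
        if zeros == 0:
            return float(self.nbits)  # bitmap saturated: lower bound only
        return self.nbits * math.log(self.nbits / zeros)
```

Duplicates map to an already-set bit, so repeats do not inflate the estimate; deletes are exactly what such a bitmap cannot undo, which is why rebuild strategies (or the L_0 machinery) come up in this thread at all.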
regards Tomas
Re: [HACKERS] C++ keywords in headers
pete...@gmx.net (Peter Eisentraut) writes: On Mon, 2010-12-27 at 12:33 -0500, Andrew Dunstan wrote: On a more general point, it would be useful to have some infrastructure for running quality checks like this and publishing the results. We should be way beyond the point where we rely on individuals doing this sort of stuff. I had a Hudson service set up for things like this, but the hosting was unreliable and then the thing faded away. I could try to revive it. Careful, Oracle has been trying to claim proprietary ownership of that... http://hudson-labs.org/content/whos-driving-thing -- ``God decided to take the devil to court and settle their differences once and for all. When Satan heard of this, he grinned and said, And just where do you think you're going to find a lawyer?''
Re: [HACKERS] Sync Rep Design
On Thu, Dec 30, 2010 at 2:04 PM, Simon Riggs si...@2ndquadrant.com wrote: On Thu, 2010-12-30 at 18:42 +0100, Stefan Kaltenbrunner wrote: it would help if this would just be a simple text-only description of the design that people can actually comment on inline. I don't think sending technical design proposals as a pdf (which seems to be written in doc-style as well) is a good idea to encourage discussion on -hackers :( 25.2.6. Synchronous Replication Streaming replication is by default asynchronous. Transactions on the primary server write commit records to WAL, yet do not know whether or when a standby has received and processed those changes. So with asynchronous replication, if the primary crashes, transactions committed on the primary might not have been received by any standby. As a result, failover from primary to standby could cause data loss because transaction completions are absent, relative to the primary. The amount of data loss is proportional to the replication delay at the time of failover. Synchronous replication offers the ability to guarantee that all changes made by a transaction have been transferred to at least one remote standby server. This is an extension to the standard level of durability offered by a transaction commit. This is referred to as semi-synchronous replication. When synchronous replication is requested, the commit of a write transaction will wait until confirmation that the commit record has been transferred successfully to at least one standby server. Waiting for confirmation increases the user's confidence that the changes will not be lost in the event of server crashes but it also necessarily increases the response time for the requesting transaction. The minimum wait time is the roundtrip time from primary to standby. Read only transactions and transaction rollbacks need not wait for replies from standby servers. Subtransaction commits do not wait for responses from standby servers, only final top-level commits. 
Long running actions such as data loading or index building do not wait until the very final commit message. 25.2.6.1. Basic Configuration Synchronous replication must be enabled on both the primary and at least one standby server. If synchronous replication is disabled on the master, or enabled on the primary but not enabled on any slaves, the primary will use asynchronous replication by default. We use a single parameter to enable synchronous replication, set in postgresql.conf on both primary and standby servers: synchronous_replication = off (default) | on On the primary, synchronous_replication can be set for particular users or databases, or dynamically by applications programs. This seems like a potential issue, where I start a server with this off, and then I start turning it on for specific transactions; it isn't exactly clear what happens, since there may or may not be a running synchronous rep slave available. (I love the idea though) If more than one standby server specifies synchronous_replication, then whichever standby replies first will release waiting commits. I don't want you to think I am setting an expectation, but I'm curious about the possibility of requiring more than 1 server to reply? Turning this setting off for a standby allows the administrator to exclude certain standby servers from releasing waiting transactions. This is useful if not all standby servers are designated as potential future primary servers. On the standby, this parameter only takes effect at server start. 25.2.6.2. Planning for Performance Synchronous replication usually requires carefully planned and placed standby servers to ensure applications perform acceptably. Waiting doesn't utilise system resources, but transaction locks continue to be held until the transfer is confirmed. As a result, incautious use of synchronous replication will reduce performance for database applications because of increased response times and higher contention. 
PostgreSQL allows the application developer to specify the durability level required via replication. This can be specified for the system overall, though it can also be specified for specific users or connections, or even individual transactions. For example, an application workload might consist of: 10% of changes are important customer details, while 90% of changes are less important data that the business can more easily survive if it is lost, such as chat messages between users. With synchronous replication options specified at the application level (on the master) we can offer sync rep for the most important changes, without slowing down the bulk of the total workload. Application level options are an important and practical tool for allowing the benefits of synchronous replication for high performance applications. This feature is unique to PostgreSQL. 25.2.6.3. Planning for High Availability The easiest and safest
Re: [HACKERS] Sync Rep Design
Most of your doc uses the terms primary and standby, but a few instances of master and slave have slipped in. I think it's better to stick to consistent terminology. On Thu, Dec 30, 2010 at 21:04, Simon Riggs si...@2ndquadrant.com wrote: With synchronous replication options specified at the application level (on the master) we can offer sync rep for the most important changes, without slowing down the bulk of the total workload. Application level options are an important and practical tool for allowing the benefits of synchronous replication for high performance applications. This feature is unique to PostgreSQL. I think a comment about the head-of-line blocking nature of streaming replication is in order. If you execute massive writes in async mode and then run a transaction in sync mode, its commit will be delayed until all the async transactions before it have been applied on the slave. synchronous_replication_timeout (boolean) Doesn't look like a boolean to me :) Regards, Marti
Re: [HACKERS] estimating # of distinct values
Excerpts from Tomas Vondra's message of Thu Dec 30 16:38:03 -0300 2010: Since the need to regularly VACUUM tables hit by updates or deletes won't go away any time soon, we could piggy-back the bit field rebuilding onto VACUUM to avoid a second scan. Well, I guess it's a bit more complicated. First of all, there's a local VACUUM when doing HOT updates. Second, you need to handle inserts too (what if the table just grows?). But I'm not a VACUUM expert, so maybe I'm wrong and this is the right place to handle rebuilds of distinct stats. I was thinking that we could have two different ANALYZE modes, one full and one incremental; autovacuum could be modified to use one or the other depending on how many changes there are (of course, the user could request one or the other, too; not sure what should be the default behavior). So the incremental one wouldn't worry about deletes, only inserts, and could be called very frequently. The other one would trigger a full table scan (or nearly so) to produce a better estimate in the face of many deletions. I haven't followed this discussion closely so I'm not sure that this would be workable. -- Álvaro Herrera alvhe...@commandprompt.com The PostgreSQL Company - Command Prompt, Inc. PostgreSQL Replication, Consulting, Custom Development, 24x7 support
Re: [HACKERS] pg_dump --split patch
On Thu, Dec 30, 2010 at 2:13 AM, Joel Jacobson j...@gluefinance.com wrote: 2010/12/29 Dimitri Fontaine dimi...@2ndquadrant.fr Please have a look at getddl: https://github.com/dimitri/getddl Nice! Looks like a nifty tool. When I tried it, ./getddl.py -f -F /crypt/funcs -d glue, I got the error No such file or directory: 'sql/schemas.sql'. While the task of splitting objects into separate files could be solved by an external wrapper tool like yours around pg_dump, I argue it makes more sense to put the (minimal required) logic into pg_dump, for a number of reasons, most importantly because it's simpler and less complex, thus less error prone. My patch is only a few lines of code and doesn't add any logic to pg_dump; it merely reroutes the fwrite() calls based on the TOC entries. Just the fact that you and others had to create your own tools to do the splitting shows the feature is important, and I think it should be included in the normal pg_dump tool. As someone whose own version of getddl helped inspire Dimitri to create his own version, I've both enjoyed reading this thread and seeing this wheel reinvented yet again, and wholeheartedly +1 the idea of building this directly into pg_dump. (The only thing better would be to make everything SQL-callable, but that's a problem for another day). Robert Treat http://www.xzilla.net
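To illustrate the "reroute output per TOC entry" idea, here is a toy post-processing model (purely illustrative: real pg_dump TOC entries carry more fields than the simplified `<id>; <TYPE> <schema> <name> <owner>` line format assumed here, and the actual patch does this inside pg_dump by redirecting fwrite() per entry rather than parsing a listing):

```python
import os
import re

def split_by_toc(listing, outdir):
    """Route TOC-style lines into one file per object, grouped into
    per-object-type directories (a toy model of pg_dump --split)."""
    # Simplified TOC line: "<id>; <TYPE> <schema> <name> <owner>"
    pat = re.compile(r"^(\d+);\s+(\w+)\s+(\w+)\s+(\w+)")
    written = []
    for line in listing.splitlines():
        m = pat.match(line)
        if not m:
            continue  # skip comments and unparseable lines
        _, objtype, schema, name = m.groups()
        subdir = os.path.join(outdir, objtype.lower())
        os.makedirs(subdir, exist_ok=True)
        path = os.path.join(subdir, "%s.%s.sql" % (schema, name))
        with open(path, "a") as f:  # append: an object may span entries
            f.write(line + "\n")
        written.append(path)
    return written
```

The interesting design point is the same one the thread argues about: the routing logic is trivial, so the question is only whether it lives in a wrapper script like this or in pg_dump itself.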
Re: [HACKERS] Sync Rep Design
On Thu, 2010-12-30 at 15:07 -0500, Robert Treat wrote: If more than one standby server specifies synchronous_replication, then whichever standby replies first will release waiting commits. I don't want you to think I am setting an expectation, but I'm curious about the possibility of requiring more than 1 server to reply? I was initially interested in this myself, but after a long discussion on quorum commit it was decided to go with first past the post. That is easier to manage, requires one less parameter, performs better and doesn't really add that much additional confidence. It was also discussed that we would have a plugin API, but I'm less sure about that now. Perhaps we can add that option in the future, but it's not high on my list of things for this release. -- Simon Riggs http://www.2ndQuadrant.com/books/ PostgreSQL Development, 24x7 Support, Training and Services
Re: [HACKERS] Sync Rep Design
On Thu, 2010-12-30 at 15:07 -0500, Robert Treat wrote: We use a single parameter to enable synchronous replication, set in postgresql.conf on both primary and standby servers: synchronous_replication = off (default) | on On the primary, synchronous_replication can be set for particular users or databases, or dynamically by application programs. This seems like a potential issue, where I start a server with this off, and then I start turning it on for specific transactions; it isn't exactly clear what happens, since there may or may not be a running synchronous rep slave available. (I love the idea though) Not really an issue. Even if there was a standby there a moment ago, the standby can go away at any time. So we must cope gracefully with what happens if you do this. By default, the parameters specify that in the case you mention we will just use async replication (no wait!). Options exist to change that, since some people want to wait until the sysadmin adds a standby. -- Simon Riggs http://www.2ndQuadrant.com/books/ PostgreSQL Development, 24x7 Support, Training and Services
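The per-transaction usage being discussed might look like the sketch below. This is hypothetical syntax: the `synchronous_replication` parameter was a design proposal at this point, and `accounts` is just an example table (the feature as eventually shipped may use different parameter names):

```sql
-- postgresql.conf on both primary and standby carries the proposed
-- GUC, default off:   synchronous_replication = off

-- Then a single important transaction on the primary could opt in:
BEGIN;
SET LOCAL synchronous_replication = on;
UPDATE accounts SET balance = balance - 100.00 WHERE id = 1;
COMMIT;  -- under the proposal, waits for one standby to confirm
```

This is what makes the 10%/90% workload split in the design doc workable: only the important 10% of transactions pay the round-trip cost.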
Re: [HACKERS] Sync Rep Design
On 12/30/2010 08:04 PM, Simon Riggs wrote: On Thu, 2010-12-30 at 18:42 +0100, Stefan Kaltenbrunner wrote: it would help if this would just be a simple text-only description of the design that people can actually comment on inline. I don't think sending technical design proposals as a pdf (which seems to be written in doc-style as well) is a good idea to encourage discussion on -hackers :( 25.2.6. Synchronous Replication Streaming replication is by default asynchronous. Transactions on the primary server write commit records to WAL, yet do not know whether or when a standby has received and processed those changes. So with asynchronous replication, if the primary crashes, transactions committed on the primary might not have been received by any standby. As a result, failover from primary to standby could cause data loss because transaction completions are absent, relative to the primary. The amount of data loss is proportional to the replication delay at the time of failover. Synchronous replication offers the ability to guarantee that all changes made by a transaction have been transferred to at least one remote standby server. This is an extension to the standard level of durability offered by a transaction commit. This is referred to as semi-synchronous replication. When synchronous replication is requested, the commit of a write transaction will wait until confirmation that the commit record has been transferred successfully to at least one standby server. Waiting for confirmation increases the user's confidence that the changes will not be lost in the event of server crashes but it also necessarily increases the response time for the requesting transaction. The minimum wait time is the roundtrip time from primary to standby. hmm this is one of the main problems I see with the proposed master is sometimes aware of the standby(as in the feedback mode) concept this proposal has. If it waits for only one of the standbys there is some issue with the terminology. 
As a DBA I would expect the master to only return if ALL of the sync replication declared nodes replied ok. Read only transactions and transaction rollbacks need not wait for replies from standby servers. Subtransaction commits do not wait for responses from standby servers, only final top-level commits. Long running actions such as data loading or index building do not wait until the very final commit message. 25.2.6.1. Basic Configuration Synchronous replication must be enabled on both the primary and at least one standby server. If synchronous replication is disabled on the master, or enabled on the primary but not enabled on any slaves, the primary will use asynchronous replication by default. We use a single parameter to enable synchronous replication, set in postgresql.conf on both primary and standby servers: this reads as if you can only set it there synchronous_replication = off (default) | on On the primary, synchronous_replication can be set for particular users or databases, or dynamically by applications programs. this says otherwise If more than one standby server specifies synchronous_replication, then whichever standby replies first will release waiting commits. see above for why I think this violates the configuration promise - if I say this is a sync standby I better expect it to be... Turning this setting off for a standby allows the administrator to exclude certain standby servers from releasing waiting transactions. This is useful if not all standby servers are designated as potential future primary servers. On the standby, this parameter only takes effect at server start. 25.2.6.2. Planning for Performance Synchronous replication usually requires carefully planned and placed standby servers to ensure applications perform acceptably. Waiting doesn't utilise system resources, but transaction locks continue to be held until the transfer is confirmed. 
As a result, incautious use of synchronous replication will reduce performance for database applications because of increased response times and higher contention. PostgreSQL allows the application developer to specify the durability level required via replication. This can be specified for the system overall, though it can also be specified for specific users or connections, or even individual transactions. For example, an application workload might consist of: 10% of changes are important customer details, while 90% of changes are less important data that the business can more easily survive if it is lost, such as chat messages between users. With synchronous replication options specified at the application level (on the master) we can offer sync rep for the most important changes, without slowing down the bulk of the total workload. Application level options are an important and practical tool for allowing the benefits of synchronous replication for high performance applications. This feature is unique to PostgreSQL. that seems to be
Re: [HACKERS] Sync Rep Design
On Thu, Dec 30, 2010 at 3:07 PM, Robert Treat r...@xzilla.net wrote: If primary crashes while commits are waiting for acknowledgement, those transactions will be marked fully committed if the primary database recovers, no matter how allow_standalone_primary is set. This seems backwards; if you are waiting for acknowledgement, wouldn't the normal assumption be that the transactions *didn't* make it to any standby, and should be rolled back? This is the standard 2-phase commit problem. The primary server *has* committed it, its fsync has returned, and the only thing keeping it from returning the commit to the client is that it's waiting on a synchronous ack from a slave. You've got 2 options: 1) initiate fsync on the slave first - In this case, the slave is farther ahead than the primary, and if the primary fails, you're *forced* to have a failover. The standby is ahead of the primary, so the primary recovering can cause divergence. And you'll likely have to do a base-backup style sync to get a new primary/standby setup. 2) initiate fsync on the primary first - In this case, the slave is always slightly behind. If your primary falls over, you don't give commit messages to the clients, but if it recovers, it might have committed data, and slaves will still be able to catch up. The thing is that currently, even without replication, #2 can happen. If your db falls over before it gets the commit packet stuffed out the network, you're in the same boat. The data might be committed, even though you didn't get the commit packet, and when your DB recovers, it's got the committed data that you never knew was committed. a. -- Aidan Van Dyk Create like a god, ai...@highrise.ca command like a king, http://www.highrise.ca/ work like a slave.
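The two orderings above can be made concrete with a toy model (purely illustrative, not how the PostgreSQL code is structured): treat each side as a durable log, and let a crash land between the two fsyncs.

```python
def commit(order, crash_between=False):
    """Simulate one sync-rep commit; returns (primary_log, standby_log,
    client_acked) for the given fsync ordering."""
    primary, standby = [], []
    first, second = {
        "standby_first": (standby, primary),
        "primary_first": (primary, standby),
    }[order]
    first.append("commit")              # first fsync is durable
    if crash_between:
        return primary, standby, False  # crash: client never gets the ack
    second.append("commit")             # second fsync is durable
    return primary, standby, True       # ack goes back to the client

# Option 1: standby first. On a crash the standby is AHEAD of the
# primary, so a recovered primary has diverged (forced failover/resync).
p1, s1, ack1 = commit("standby_first", crash_between=True)
# Option 2: primary first. On a crash the primary may hold a commit the
# client was never told about, but the standby can simply catch up.
p2, s2, ack2 = commit("primary_first", crash_between=True)
```

Either way the client got no ack, which is the point of the last paragraph: the same ambiguity already exists today if the server dies before the commit packet reaches the client.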
Re: [HACKERS] Sync Rep Design
On Thu, Dec 30, 2010 at 3:36 PM, Simon Riggs si...@2ndquadrant.com wrote: On Thu, 2010-12-30 at 15:07 -0500, Robert Treat wrote: If more than one standby server specifies synchronous_replication, then whichever standby replies first will release waiting commits. I don't want you to think I am setting an expectation, but I'm curious about the possibility of requiring more than 1 server to reply? I was initially interested in this myself, but after a long discussion on quorum commit it was decided to go with first past the post. That is easier to manage, requires one less parameter, performs better and doesn't really add that much additional confidence. Yes, I think with a single master, you are probably right (been dealing with more than my fair share of multi-master based nosql solutions lately). Still, one thing that has me concerned is that in the case of two slaves, you don't know which one is the more up-to-date one if you need to failover. It'd be nice if you could just guarantee they both are, but in lieu of that, I guess whatever decision tree is being used, it needs to look at the current xlog location of any potential failover targets. It was also discussed that we would have a plugin API, but I'm less sure about that now. Perhaps we can add that option in the future, but it's not high on my list of things for this release. Agreed. Robert Treat http://www.xzilla.net
Re: [HACKERS] Old git repo
On Thu, 2010-12-30 at 11:02 -0500, Tom Lane wrote: I'm with Magnus on this: the risk of confusion seems to greatly outweigh any possible benefit from keeping it. There is no reason for anyone to use that old repo unless they are still working with a local clone of it, and even if they do have a local clone, such a clone is self-sufficient. The reason I originally asked for it to be kept around was not because it's hard to rebase, but because there might be references to SHA1s from that repo floating around. I don't think these would be very common, nor critical, but I know I wrote a few emails that included things like look at this commit. Personally, my utility for the old repo is not much (if it was anything important, I wouldn't have relied on the unofficial repo). But we should probably give a little bit of warning for folks that might want to rebase or translate some old notes. Regards, Jeff Davis
Re: [HACKERS] Sync Rep Design
On Thu, 2010-12-30 at 15:07 -0500, Robert Treat wrote: When allow_standalone_primary is set, a user will stop waiting once the replication_timeout has been reached for their specific session. Users are not waiting for a specific standby to reply, they are waiting for a reply from any standby, so the unavailability of any one standby is not significant to a user. It is possible for user sessions to hit timeout even though standbys are communicating normally. In that case, the setting of replication_timeout is probably too low. will a notice or warning be thrown in these cases? I'm thinking something like the checkpoint timeout warning, but could be something else; it just seems to me you need some way to know you're timing out. We can do that, yes. The standby sends regular status messages to the primary. If no status messages have been received for replication_timeout the primary server will assume the connection is dead and terminate it. This happens whatever the setting of allow_standalone_primary. Does the standby attempt to reconnect in these scenarios? Yes it would, but the reason why we terminated the connection was it wasn't talking any more, so it is probably dead. If primary crashes while commits are waiting for acknowledgement, those transactions will be marked fully committed if the primary database recovers, no matter how allow_standalone_primary is set. This seems backwards; if you are waiting for acknowledgement, wouldn't the normal assumption be that the transactions *didnt* make it to any standby, and should be rolled back ? Well, we can't roll it back. We have already written the commit record to WAL. There is no way to be certain that all standbys have received all outstanding WAL data at time of the crash of the primary. Some transactions may not show as committed on the standby, even though they show as committed on the primary. 
The guarantee we offer is that the application will not receive explicit acknowledgement of the successful commit of a transaction until the WAL data is known to be safely received by the standby. Hence this mechanism is technically semi-synchronous rather than fully synchronous replication. Note that replication will still not be fully synchronous even if we wait for all standby servers, though this would reduce availability, as described previously. I think we ought to have an example of the best configuration for "cannot afford to lose any data" scenarios, where we would prefer an overall service interruption over the chance of having the primary / secondary out of synch. I say "use two or more standbys" more than once... somewhat concerned that we seem to need to use double negatives to describe what's going on here. It makes me think we ought to rename this to require_synchronous_standby or similar. Don't see why we can't use double negatives. ;-) The parameter is named directly from Fujii Masao's suggestion. 18.5.6. Standby Servers These settings control the behavior of a standby server that is to receive replication data. ... I was expecting this section to mention the synchronous_replication (bool) somewhere, to control if the standby will participate synchronously or asynch; granted it's the same config as listed in 18.5.5, right? Just that the heading of that section specifically targets the primary. OK, good idea. HTH, looks pretty good at first glance. Thanks. -- Simon Riggs http://www.2ndQuadrant.com/books/ PostgreSQL Development, 24x7 Support, Training and Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
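[Editor's sketch] For the "cannot afford to lose any data" scenario raised in this message, the parameters discussed in this thread would presumably combine along these lines. This uses the parameter names from the proposal under review; the values and the final committed interface are assumptions, not a recommendation:

```
# postgresql.conf on the primary -- parameter names as proposed in this thread
synchronous_replication = on       # commits wait for a standby to confirm receipt
allow_standalone_primary = off     # prefer a service interruption to losing data
replication_timeout = 30s          # how long a waiting session holds on before timing out
```

With two or more standbys configured for synchronous_replication, whichever standby replies first releases the waiting commit, per the behavior described later in this thread.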
Re: [HACKERS] pg_streamrecv for 9.1?
On 12/29/2010 07:42 PM, Robert Haas wrote: On Dec 29, 2010, at 1:01 PM, Tom Lane t...@sss.pgh.pa.us wrote: Is it really stable enough for bin/? My impression of the state of affairs is that there is nothing whatsoever about replication that is really stable yet. Well, that's not stopping us from shipping a core feature called replication. I'll defer to others on how mature pg_streamrecv is, but if it's no worse than replication in general I think putting it in bin/ is the right thing to do. Well, I have not looked at how good pg_streamrecv really is, but we desperately need to fix the basic usability issues in our current replication implementation, and pg_streamrecv seems to be a useful tool to help with some of them. All the people I talked to about SR were surprised how complex and fragile the initial setup procedure is - what's lacking is a simple and reliable tool to do a base backup over libpq, and also a simple way to have that tool tell the master "keep the WAL segments I need for starting the standby". I do realize we need to keep the ability to do the base backup out-of-line, but for 99% of the users it is too complex, scary and failure prone (I know nobody who got the procedure right the first time - which is a strong hint that we need to work on that). Stefan -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Sync Rep Design
On Thu, 2010-12-30 at 15:51 -0500, Robert Treat wrote: Still, one thing that has me concerned is that in the case of two slaves, you don't know which one is the more up-to-date one if you need to failover. It'd be nice if you could just guarantee they both are... Regrettably, nobody can know that, without checking. -- Simon Riggs http://www.2ndQuadrant.com/books/ PostgreSQL Development, 24x7 Support, Training and Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] small table left outer join big table
Tom Lane t...@sss.pgh.pa.us writes: I can't get all *that* excited about complicating hash joins as proposed. The query is still fundamentally going to be slow because you won't get out of having to seqscan the large table. The only way to make it really fast is to not read all of the large table, and nestloop-with-inner-indexscan is the only plan type with a hope of doing that. That sounds somewhat like Loose Indexscan as described in the following wiki page, right? http://wiki.postgresql.org/wiki/Loose_indexscan Regards, -- Dimitri Fontaine http://2ndQuadrant.fr PostgreSQL : Expertise, Formation et Support -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
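[Editor's sketch] The idea behind a Loose Indexscan, as described on the wiki page linked above, is that the scan does not walk every index entry: it repeatedly re-descends the index to the first key beyond the last value seen, skipping runs of duplicates wholesale. This toy Python model (not PostgreSQL code; a sorted list stands in for the btree) illustrates the access pattern:

```python
from bisect import bisect_right

def distinct_values(sorted_keys):
    """Return the distinct values of a sorted sequence, loose-indexscan style:
    after emitting a value, jump (binary search) past all of its duplicates
    instead of examining each entry one by one."""
    out = []
    i = 0
    n = len(sorted_keys)
    while i < n:
        v = sorted_keys[i]
        out.append(v)
        # bisect_right plays the role of a fresh btree descent: find the
        # first position whose key is strictly greater than v.
        i = bisect_right(sorted_keys, v, i)
    return out
```

With few distinct values in a large index, this does O(distinct * log n) work rather than O(n), which is the benefit being discussed.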
Re: [HACKERS] Sync Rep Design
On 12/30/2010 10:01 PM, Simon Riggs wrote: On Thu, 2010-12-30 at 15:51 -0500, Robert Treat wrote: Still, one thing that has me concerned is that in the case of two slaves, you don't know which one is the more up-to-date one if you need to failover. It'd be nice if you could just guarantee they both are... Regrettably, nobody can know that, without checking. How exactly would you check? This seems like something that needs to be done from the SQL and the CLI level and also very well documented (which I cannot see in your proposal). Stefan -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Sync Rep Design
On Thu, 2010-12-30 at 22:11 +0200, Marti Raudsepp wrote: I think a comment about the head-of-line blocking nature of streaming replication is in order. If you execute massive writes in async mode and then run a transaction in sync mode, its commit will be delayed until all the async transactions before it have been applied on the slave. Not really sure I understand what you want me to add there. The case you mention is identical whether we use the word async or sync where you mention in async mode. Replication doesn't wait until a sync commit is requested, it is continuously active. Sync rep's only addition is the reply messages. -- Simon Riggs http://www.2ndQuadrant.com/books/ PostgreSQL Development, 24x7 Support, Training and Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Sync Rep Design
On Thu, 2010-12-30 at 21:42 +0100, Stefan Kaltenbrunner wrote: Synchronous replication offers the ability to guarantee that all changes made by a transaction have been transferred to at least one remote standby server. This is an extension to the standard level of durability offered by a transaction commit. This is referred to as semi-synchronous replication. When synchronous replication is requested, the commit of a write transaction will wait until confirmation that the commit record has been transferred successfully to at least one standby server. Waiting for confirmation increases the user's confidence that the changes will not be lost in the event of server crashes but it also necessarily increases the response time for the requesting transaction. The minimum wait time is the roundtrip time from primary to standby. Hmm, this is one of the main problems I see with the "master is sometimes aware of the standby" (as in the feedback mode) concept this proposal has. If it waits for only one of the standbys there is some issue with the terminology. As a DBA I would expect the master to only return if ALL of the sync replication declared nodes replied ok. Well, as a DBA, I expect it to work with just one. That's how MySQL and Oracle work at least. If ALL standbys reply, it takes longer, makes the code harder, how do you determine what "all" is robustly, etc. Plus it's been discussed already. What I'm really missing with that proposal is how people expect that solution to be managed - What aspect do you wish to monitor? I'm happy to consider your suggestions. Given there is only sometimes a feedback channel into the master, you can't do the monitoring. Not sure what you mean. Please explain more. Even if you could (which we really need!) there is nothing in the proposal yet that will help to determine what the most recent standby (in the case of more than 1 sync standby) might be. Functions to determine that already exist.
- but it would require a real standby registration or at least a standby management possibility on the master, not a halfway-done one - so do we really need hot_standby_feedback as part of the initial sync-rep patch? It is a Hot Standby feature, but so tightly integrated with this code that it isn't possible for me to submit it as two separate patches. -- Simon Riggs http://www.2ndQuadrant.com/books/ PostgreSQL Development, 24x7 Support, Training and Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Sync Rep Design
On Thu, Dec 30, 2010 at 2:04 PM, Simon Riggs si...@2ndquadrant.com wrote: We use a single parameter to enable synchronous replication, set in postgresql.conf on both primary and standby servers: synchronous_replication = off (default) | on On the primary, synchronous_replication can be set for particular users or databases, or dynamically by application programs. If more than one standby server specifies synchronous_replication, then whichever standby replies first will release waiting commits. Turning this setting off for a standby allows the administrator to exclude certain standby servers from releasing waiting transactions. This is useful if not all standby servers are designated as potential future primary servers. On the standby, this parameter only takes effect at server start. I think it's a bad idea to use the same parameter to mean different things on the master and standby. You proposed this kind of double meaning for the hot_standby parameter (possibly back when it was called standby_connections, or something like that) and we (rightly, I think) did not adopt that, instead ending up with wal_level to control the master's behavior and hot_standby to control the slave's behavior. synchronous_replication (boolean) Specifies whether transaction commit will wait for WAL records to be replicated before the command returns a success indication to the client. The word "replicated" here could be taken to mean different things, most obviously: - slave has received the WAL - slave has fsync'd the WAL - slave has applied the WAL -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Sync Rep Design
On Thu, 2010-12-30 at 22:08 +0100, Stefan Kaltenbrunner wrote: On 12/30/2010 10:01 PM, Simon Riggs wrote: On Thu, 2010-12-30 at 15:51 -0500, Robert Treat wrote: Still, one thing that has me concerned is that in the case of two slaves, you don't know which one is the more up-to-date one if you need to failover. It'd be nice if you could just guarantee they both are... Regrettably, nobody can know that, without checking. how exactly would you check? - this seems like something that needs to be done from the SQL and the CLI level and also very well documented (which I cannot see in your proposal). This is a proposal for sync rep, not multi-node failover. I'm definitely not going to widen the scope of this project. Functions already exist to check the thing you're asking. -- Simon Riggs http://www.2ndQuadrant.com/books/ PostgreSQL Development, 24x7 Support, Training and Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Snapshot synchronization, again...
On 30.12.2010 16:49, Florian Pflug wrote: On Dec30, 2010, at 13:31 , Joachim Wieland wrote: We return snapshot information as a chunk of data to the client. At the same time however, we set a checksum in shared memory to protect against modification of the snapshot. A publishing backend can revoke its snapshot by deleting the checksum and a backend that is asked to install a snapshot can verify that the snapshot is correct and current by calculating the checksum and comparing it with the one in shared memory. We'd still have to stream these checksums to the standbys though, or would they be exempt from the checksum checks? I still wonder whether these checks are worth the complexity. I believe we'd only allow snapshot modifications for read-only queries anyway, so what point is there in preventing clients from setting broken snapshots? Hmm, our definition of read-only is a bit fuzzy. While a transaction doesn't modify the database itself, it could still send NOTIFYs or call a PL function to do all sorts of things outside the database. Imagine that you're paranoid about data integrity, and have a security definer function that runs cross checks on the data. If it finds any anomalies, it wakes up the operator or forces shutdown or similar. Now a malicious user could set a snapshot that passes the basic validity checks, ie. xmin = GlobalXmin, but contains a combination of still-in-progress transactions that never existed in reality. If he then calls the paranoia-function, it would see an inconsistent state of committed tuples and get upset. Maybe that's a bit far-fetched, but it's not entirely clear that running with an inconsistent snapshot is harmless. -- Heikki Linnakangas EnterpriseDB http://www.enterprisedb.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
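[Editor's sketch] The publish/revoke/install handshake Joachim describes can be modeled in a few lines. All names here are hypothetical and a dict stands in for the shared-memory checksum slot; this only illustrates the protocol, not the backend implementation:

```python
import hashlib

# Hypothetical stand-in for the per-backend checksum slot in shared memory.
shared_checksums = {}

def publish_snapshot(backend_id, snapshot_bytes):
    """Publishing backend: return the snapshot to the client as opaque data
    and record its checksum in 'shared memory' for later validation."""
    shared_checksums[backend_id] = hashlib.sha256(snapshot_bytes).hexdigest()
    return snapshot_bytes

def revoke_snapshot(backend_id):
    """Publishing backend revokes its snapshot by deleting the checksum."""
    shared_checksums.pop(backend_id, None)

def install_snapshot(backend_id, snapshot_bytes):
    """Installing backend: recompute the checksum and compare with the
    published one; reject the snapshot if it was modified or revoked."""
    digest = hashlib.sha256(snapshot_bytes).hexdigest()
    if shared_checksums.get(backend_id) != digest:
        raise ValueError("snapshot is stale, revoked, or modified")
    return True
```

Florian's point is that on a standby this shared slot is not available unless the checksums are also streamed, and Heikki's that without the check a client could install a snapshot that passes basic validity tests yet never existed.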
Re: [HACKERS] Sync Rep Design
On Thu, Dec 30, 2010 at 3:42 PM, Stefan Kaltenbrunner ste...@kaltenbrunner.cc wrote: synchronous replication for high performance applications. This feature is unique to PostgreSQL. that seems to be a bit too much marketing for a reference level document +1. It also does not address the more general (not sync rep specific) problem of how to deal with max_keep_segments, which is a wart that I was hoping we could get rid of in 9.1 - but it would require a real standby registration or at least a standby management possibility on the master, not a halfway-done one - so do we really need hot_standby_feedback as part of the initial sync-rep patch? And this is really the key point on which previous discussions of sync rep stalled. Simon is clearly of the opinion that any system where the slaves have individual identities (aka standby registration) is a bad idea, but the only justification he's offered for that position is the assertion that it doesn't allow any added functionality. As you point out, and as has been pointed out before, this is not true, but unless Simon has changed his position since the last time we discussed this, he will not only refuse to include any kind of standby identifier in any of his proposals, but will also argue against including any such code even if it is written by someone else. I don't understand why, but that's how it is. Synchronous replication would probably be done and committed by now if it weren't for this issue. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Re: Vacuum of newly activated 8.3.12 standby receives warnings page xxx is uninitialized --- fixing
On 30.12.2010 10:55, Mark Kirkwood wrote: Removing the offending --delete --exclude=backup_label options from the base backup step makes everything work properly again. I don't see why --delete would make any difference, but you shouldn't exclude backup_label from the base backup. The backup label file is an important part of the online backup, it cannot be recovered safely without it. -- Heikki Linnakangas EnterpriseDB http://www.enterprisedb.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Re: Vacuum of newly activated 8.3.12 standby receives warnings page xxx is uninitialized --- fixing
On 31/12/10 11:01, Heikki Linnakangas wrote: On 30.12.2010 10:55, Mark Kirkwood wrote: Removing the offending --delete --exclude=backup_label options from the base backup step makes everything work properly again. I don't see why --delete would make any difference, but you shouldn't exclude backup_label from the base backup. The backup label file is an important part of the online backup, it cannot be recovered safely without it. Yes, you (and Robert) are entirely correct, I was confused in my understanding of --delete --exclude=backup_label and thought it meant "exclude the backup label from the delete". Yeah, the --delete is harmless; it is excluding backup_label that is causing the problem. Note to all current Pitrtools users: this impacts you! We need to get a corrected version out soon, I would think. I note that this uninitialized-pages issue with standbys has cropped up from time to time - I wonder if in most/all the cases folk were using Pitrtools? regards Mark
Re: [HACKERS] Re: Vacuum of newly activated 8.3.12 standby receives warnings page xxx is uninitialized --- fixing
On 31/12/10 11:11, Mark Kirkwood wrote: Yes, you (and Robert) are entirely correct, I was confused in my understanding of the --delete --exclude=backup_label and thought it to mean exclude the backup label from the delete. Yeah the --delete is harmless, it is the exclude backup_label that is causing the problem. Note to all current Pitrtools users, this impacts you! We need to get a corrected version out soon I would think. Also (not surprisingly) I can confirm that data corruption is possible: 1/ Perform approx 14 transactions against the primary 2/ Cancel Pgbench 3/ Issue SELECT pg_switch_xlog() on primary 4/ Bring up standby after checking it has applied last log The resulting primary and standby should be identical, but: primary: bench=# SELECT count(*) FROM branches; count --- 100 bench=# SELECT count(*) FROM accounts; count -- 1000 standby: bench=# SELECT count(*) FROM branches; count --- 132 bench=# SELECT count(*) FROM accounts; count - 9998269 The other counts are the same. We have lost some accounts records, but have gained duplicates in branches: bench=# REINDEX TABLE branches; ERROR: could not create unique index branches_pkey DETAIL: Table contains duplicated values. regards Mark -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Sync Rep Design
On Dec 30, 2010, at 3:27 PM, Robert Haas wrote: synchronous_replication (boolean) Specifies whether transaction commit will wait for WAL records to be replicated before the command returns a success indication to the client. The word replicated here could be taken to mean different things, most obviously: - slave has received the WAL - slave has fsync'd the WAL - slave has applied the WAL I think that comment is valid for the entire set of docs, actually. The document goes out of its way to avoid simple phrases like replicated, but doesn't spell out exactly what is happening, ie: Synchronous replication offers the ability to guarantee that all changes made by a transaction have been transferred to at least one remote standby server. This is an extension to the standard level of durability offered by a transaction commit. This is referred to as semi-synchronous replication. Reading that, I'm left with the sense that this isn't a simple matter of Oh, the data has been replicated to the slave before commit returns, but nothing does a good job of clearly explaining what the distinction is and what it means. This section: The guarantee we offer is that the application will not receive explicit acknowledgement of the successful commit of a transaction until the WAL data is known to be safely received by the standby. Hence this mechanism is technically semi synchronous rather than fully synchronous replication. does provide some enlightenment, but it's at the end of the section. I think it would be best if there was a section right at the beginning that talked about the data quality issue of sync replication and how we're avoiding it with our semi-sync solution. -- Jim C. Nasby, Database Architect j...@nasby.net 512.569.9461 (cell) http://jim.nasby.net -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Sync Rep Design
On Thu, 2010-12-30 at 16:27 -0500, Robert Haas wrote: I think it's a bad idea to use the same parameter to mean different things on the master and standby. Obviously if you phrase it like that, nobody would disagree. I would say I have used the same parameter on both sides in a balanced way to simplify the configuration, which had been an important factor in the debate. You need to set parameter X on both primary and standby seems simple and clear. It certainly works OK for MySQL. It's no bother to change, whichever way we decide and I'm happy to do so. My previous patch had two parameters: primary: synchronous_replication = ... standby: synchronous_replication_service = on | off Which do people prefer? -- Simon Riggs http://www.2ndQuadrant.com/books/ PostgreSQL Development, 24x7 Support, Training and Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
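[Editor's sketch] The two-parameter variant from the previous patch that Simon describes would look something like this. The parameter names come from his message; the layout and values are purely illustrative:

```
# primary: postgresql.conf
synchronous_replication = on

# standby: postgresql.conf (takes effect at server start)
synchronous_replication_service = on
```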
Re: [HACKERS] Sync Rep Design
On Thu, 2010-12-30 at 16:47 -0500, Robert Haas wrote: On Thu, Dec 30, 2010 at 3:42 PM, Stefan Kaltenbrunner ste...@kaltenbrunner.cc wrote: synchronous replication for high performance applications. This feature is unique to PostgreSQL. that seems to be a bit too much marketing for a reference level document +1. I've removed the This feature is unique to PostgreSQL, which I agree belongs in a press release, not docs. The explanation of a use case that would benefit from the feature seems valid and I've left that in. PostgreSQL docs are more technical and precise than any other DBMS, even DB2. Having read everybody else's docs, I'm inclined to say it would be easier to explain if I left out the details, as they do. You won't find a detailed explanation of commit guarantees in MySQL docs, for example. -- Simon Riggs http://www.2ndQuadrant.com/books/ PostgreSQL Development, 24x7 Support, Training and Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Avoiding rewrite in ALTER TABLE ALTER TYPE
On Thu, Dec 30, 2010 at 12:57:45AM -0500, Robert Haas wrote: On Thu, Dec 30, 2010 at 12:24 AM, Noah Misch n...@leadboat.com wrote: On Wed, Dec 29, 2010 at 11:14:37PM -0500, Robert Haas wrote: I think for any pair of types (T1, T2) we should first determine whether we can skip the scan altogether. If yes, we're done. If no, then we should have a way of determining whether a verify-only scan is guaranteed to be sufficient (in your terminology, the verification scan is guaranteed to return either positive or error, not negative). If yes, then we do a verification scan. If no, we do a rewrite. How would we answer the second question in general? I am not sure - I guess we'd need to design some sort of mechanism for that. Okay, here goes. Given a Var varexpr representing the column we're changing and an expression tree expr we need to answer two questions (argument lists simplified -- assume the same RTEs in all cases): always-noop: Will datumIsEqual(ExecEvalExpr(varexpr), ExecEvalExpr(expr)) return true or yield an error for all possible tuples? never-error: Will ExecEvalExpr(expr) never throw an error? Currently we're only interested in the second question when the first is also true; I'm not sure if there's something fundamental there, or just an artifact of current needs. To support answering these questions, extend the CREATE CAST changes from my earlier proposal, modifying the exemptor signature to return an int, a bitmask containing one bit for each of these two questions. Call the function in find_typmod_coercion_function. If its return value answers yes to both questions, return COERCION_PATH_NONE, resulting in omission of the length coercion node. For other verdicts, generate the FuncExpr as normal and insert the verdict in a new FuncExpr field funcexempt. (That need not increase the size of FuncExpr, if that's a concern.)
ATPrepAlterColumnType, having generated its transformation expression, will call a new function that recursively walks the tree to answer the two questions. The walker will apply these rules: 1. For a Var with the varno/varattno in question, intrinsically yes to both. 2. A RelabelType node inherits the answers of its sole argument. 3. A CoerceToDomain node inherits the always-noop answer of its sole argument. When GetDomainConstraints() == NIL, it also inherits the never-error answer. Otherwise, never-error becomes no. 4. A FuncExpr node has answers given by the bitwise-AND of its funcexempt field and the answers from its first argument. 5. Any other node answers no to both questions. If the transformation expression root has yes to both questions, we're done with no scan. If only always-noop is true, we do a verification scan only. Otherwise, we optimize nothing and do a rewrite. Thoughts? Thanks, nm -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
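[Editor's sketch] A toy model of the proposed walker may make rules 1-5 concrete. The node representation and bitmask values here are assumptions for illustration only, not the actual parse-node structures:

```python
# Bits mirroring the proposed exemptor verdict bitmask.
ALWAYS_NOOP = 0x1
NEVER_ERROR = 0x2

def classify(node, target):
    """Walk a transformation expression and answer the two questions
    (always-noop, never-error) as a bitmask, per rules 1-5 above."""
    kind = node["kind"]
    if kind == "Var":
        # Rule 1: the column being altered answers yes to both.
        return ALWAYS_NOOP | NEVER_ERROR if node["attr"] == target else 0
    if kind == "RelabelType":
        # Rule 2: inherits the answers of its sole argument.
        return classify(node["arg"], target)
    if kind == "CoerceToDomain":
        # Rule 3: inherits always-noop; never-error survives only when the
        # domain has no constraints.
        sub = classify(node["arg"], target)
        return sub if not node["constraints"] else sub & ALWAYS_NOOP
    if kind == "FuncExpr":
        # Rule 4: bitwise-AND of funcexempt with the first argument's answers.
        return node["funcexempt"] & classify(node["args"][0], target)
    # Rule 5: anything else answers no to both.
    return 0

def alter_column_strategy(expr, target):
    """Pick the ALTER TABLE strategy from the root verdict."""
    verdict = classify(expr, target)
    if verdict & ALWAYS_NOOP and verdict & NEVER_ERROR:
        return "no scan"
    if verdict & ALWAYS_NOOP:
        return "verification scan"
    return "rewrite"
```

The root verdict then maps directly onto the three outcomes in the paragraph above: both bits set means no scan, always-noop alone means a verification scan, and anything else forces a rewrite.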
Re: [HACKERS] Sync Rep Design
On Thu, 2010-12-30 at 18:47 -0600, Jim Nasby wrote: On Dec 30, 2010, at 3:27 PM, Robert Haas wrote: synchronous_replication (boolean) Specifies whether transaction commit will wait for WAL records to be replicated before the command returns a success indication to the client. The word replicated here could be taken to mean different things, most obviously: - slave has received the WAL - slave has fsync'd the WAL - slave has applied the WAL I think that comment is valid for the entire set of docs, actually. The document goes out of its way to avoid simple phrases like replicated, but doesn't spell out exactly what is happening, ie: Synchronous replication offers the ability to guarantee that all changes made by a transaction have been transferred to at least one remote standby server. This is an extension to the standard level of durability offered by a transaction commit. This is referred to as semi-synchronous replication. Reading that, I'm left with the sense that this isn't a simple matter of Oh, the data has been replicated to the slave before commit returns, but nothing does a good job of clearly explaining what the distinction is and what it means. This section: The guarantee we offer is that the application will not receive explicit acknowledgement of the successful commit of a transaction until the WAL data is known to be safely received by the standby. Hence this mechanism is technically semi synchronous rather than fully synchronous replication. does provide some enlightenment, but it's at the end of the section. I think it would be best if there was a section right at the beginning that talked about the data quality issue of sync replication and how we're avoiding it with our semi-sync solution. I'm happy to change the docs. It's the first draft... If that's the only problem you've got, then I'm feeling good. Any problems with the user interface itself? 
-- Simon Riggs http://www.2ndQuadrant.com/books/ PostgreSQL Development, 24x7 Support, Training and Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Sync Rep Design
On Thu, 2010-12-30 at 16:47 -0500, Robert Haas wrote: It also does not address the more general (not sync rep specific) problem of how to deal with max_keep_segments which is a wart and I was hoping we could get rid of in 9.1 - but it would require a real standby registration or at least standby management possibility on the master not a halfway done one - so do we really need hot_standby_feedback as part of the inital sync-rep patch? And this is really the key point on which previous discussions of sync rep stalled. Simon is clearly of the opinion that any system where the slaves have an individual identities (aka standby registration) is a bad idea, but the only justification he's offered for that position is the assertion that it doesn't allow any added functionality. As you point out, and as has been pointed out before, this is not true, but unless Simon has changed his position since the last time we discussed this, he will not only refuse to include any kind of standby identifier in any of his proposals, but will also argue against including any such code even if it is written by someone else. I don't understand why, but that's how it is. Synchronous replication would probably be done and committed by now if it weren't for this issue. I'm not very clear what your response has to do with Stefan's comments. My general perspective is that MySQL released a simple design a year ahead of us, which should be to our collective shame. I will be working towards delivering something useful in this release. Standby registration is complicated and not necessary. If anybody needs to justify anything, it is the people that claim it is somehow essential. If you want increased complexity and features, you can have it, one day, but don't prevent everybody else from benefiting from simplicity, now. What we do need is performance, otherwise the feature is mostly unusable for production systems, without splitting your application into pieces. 
I would rather concentrate on a minimal set of functionality that we can all agree on. To show that, I have gone out of my way to include features specified by others, including exact names and behaviours of parameters. -- Simon Riggs http://www.2ndQuadrant.com/books/ PostgreSQL Development, 24x7 Support, Training and Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] and it's not a bunny rabbit, either
Robert Haas robertmh...@gmail.com writes: On further reflection, this can still turn into a laundry list in certain cases. DETAIL: You can only comment on columns of tables, views, and composite types. seems less helpful than: DETAIL: Comments on relations with system-generated column names are not supported. I think that for rules, triggers, constraints, and anything that only works on a single relkind, we can't do much better than to list the specific object types. But where there's some sort of guiding principle involved I think we'd do well to articulate it. I'm unconvinced, because the guiding principle is likely to be an implementation detail that won't actually mean much to users. Your example above is a case in point --- I do *not* think the average user will see that as an improvement. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] estimating # of distinct values
Alvaro Herrera alvhe...@commandprompt.com writes: I was thinking that we could have two different ANALYZE modes, one full and one incremental; autovacuum could be modified to use one or the other depending on how many changes there are (of course, the user could request one or the other, too; not sure what should be the default behavior). How is an incremental ANALYZE going to work at all? It has no way to find out the recent changes in the table, for *either* inserts or deletes. Unless you want to seqscan the whole table looking for tuples with xmin later than something-or-other ... which more or less defeats the purpose. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Old git repo
Jeff Davis pg...@j-davis.com writes:
> Personally, my utility for the old repo is not much (if it was anything important, I wouldn't have relied on the unofficial repo). But we should probably give a little bit of warning for folks that might want to rebase or translate some old notes.

Well, I guess the question is how much warning. I suggested O(1 week) but Robert seems to want O(1 year). As long as there's some agreed deadline, I'm not very picky about what it is.

			regards, tom lane
Re: [HACKERS] and it's not a bunny rabbit, either
On Thu, Dec 30, 2010 at 8:58 PM, Tom Lane t...@sss.pgh.pa.us wrote:
> Robert Haas robertmh...@gmail.com writes:
>> On further reflection, this can still turn into a laundry list in certain cases.
>>
>>     DETAIL: You can only comment on columns of tables, views, and composite types.
>>
>> seems less helpful than:
>>
>>     DETAIL: Comments on relations with system-generated column names are not supported.
>>
>> I think that for rules, triggers, constraints, and anything that only works on a single relkind, we can't do much better than to list the specific object types. But where there's some sort of guiding principle involved I think we'd do well to articulate it.
>
> I'm unconvinced, because the guiding principle is likely to be an implementation detail that won't actually mean much to users. Your example above is a case in point --- I do *not* think the average user will see that as an improvement.

I think this thread has worked itself around to where it's entirely pointless. My original complaint was about error messages like this:

    %s is not a table, view, composite type, or index

which, once we have foreign tables, needs to be changed to read:

    %s is not a table, view, composite type, index, or foreign table

I think that message is the epitome of worthless, and several other people agreed. After various proposals of greater and lesser merit, we've somehow worked around to the suggestion that this should be reworded to:

    ERROR: %s is a sequence
    DETAIL: Only attributes of tables, views, composite types, indexes, or foreign tables can be renamed.

While that may be a marginal improvement in clarity, it does absolutely nothing to address my original complaint, which is that adding a relkind forces trivial revisions of messages all over the system, some of which are already excessively long-winded. This message also does nothing to help the user understand WHY we don't allow renaming the attributes of his sequence or TOAST table, whereas the proposed revision does.

The absolute worst offenders are messages of the form:

    blah is not supported on X, Y, Z, or T

which now have to be revised to read:

    blah is not supported on X, Y, Z, T, or W

This problem could be avoided by writing:

    blah is supported on A and B

Or:

    blah is supported only for relation types which quack

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
Re: [HACKERS] and it's not a bunny rabbit, either
Robert Haas robertmh...@gmail.com writes:
> I think this thread has worked itself around to where it's entirely pointless.

I understand your frustration, but it's not clear to me that there *is* any simple solution to this problem. Fundamentally, adding new relkinds to the system is always going to require running around and looking at a lot of code to see what's affected; and that goes for the error messages too. I put no stock at all in the idea that writing a guiding principle in the error messages will avoid anything, because as often as not, adding a fundamentally new relkind is going to involve some tweaking of what those principles are.

> ... This message also does nothing to help the user understand WHY we don't allow renaming the attributes of his sequence or TOAST table, whereas the proposed revision does.

I remain unconvinced that the average user cares, or will be able to extrapolate the message to understand what's supported or not, even if he does care about the reason for the restriction.

			regards, tom lane
Re: [HACKERS] Snapshot synchronization, again...
On Thu, Dec 30, 2010 at 9:40 AM, Alvaro Herrera alvhe...@commandprompt.com wrote:
>> Disadvantage of b: It doesn't allow a snapshot to be installed on a different server. It requires a serializable open transaction to hold the snapshot.
>
> Why does it require a serializable transaction? You could simply register the snapshot in any transaction. (Of course, the net effect would be pretty similar to a serializable transaction).

I am not assuming that the publishing transaction blocks until its snapshot is picked up. A read committed transaction would get a new snapshot for every other query, so the published snapshot is no longer represented by an actual backend until it is picked up by one. Since nobody is holding off xmin/GlobalXmin, vacuum would eventually remove tuples that the published-but-not-yet-picked-up snapshot should still be able to see, no?

Joachim
Re: [HACKERS] Snapshot synchronization, again...
On Thu, Dec 30, 2010 at 9:49 AM, Florian Pflug f...@phlo.org wrote:
> On Dec 30, 2010, at 13:31, Joachim Wieland wrote:
>> We return snapshot information as a chunk of data to the client. At the same time however, we set a checksum in shared memory to protect against modification of the snapshot. A publishing backend can revoke its snapshot by deleting the checksum, and a backend that is asked to install a snapshot can verify that the snapshot is correct and current by calculating the checksum and comparing it with the one in shared memory.
>
> We'd still have to stream these checksums to the standbys though, or would they be exempt from the checksum checks?

I am not talking about having synchronized snapshots among standby servers at all. I am only proposing a client API that will work for this future idea as well.

> I still wonder whether these checks are worth the complexity. I believe we'd only allow snapshot modifications for read-only queries anyway, so what point is there in preventing clients from setting broken snapshots? What's the use case for it?

Unless somebody comes up with a reasonable use case for it, let's aim for the robust version.

Joachim
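The publish/revoke/verify dance Joachim describes is small enough to model directly. The sketch below is hypothetical, not PostgreSQL code: the snapshot is treated as an opaque byte string handed to the client, only its checksum lives in (simulated) shared memory, and all the names are invented for illustration.

```python
import hashlib

# Simulated shared memory: publishing backend id -> checksum of its snapshot.
shared_checksums = {}

def publish_snapshot(backend_id, snapshot_bytes):
    """Hand the snapshot to the client; keep only its checksum server-side."""
    shared_checksums[backend_id] = hashlib.sha256(snapshot_bytes).hexdigest()
    return snapshot_bytes  # this chunk travels through the client

def revoke_snapshot(backend_id):
    """The publishing backend withdraws its snapshot by deleting the checksum."""
    shared_checksums.pop(backend_id, None)

def install_snapshot(backend_id, snapshot_bytes):
    """A backend asked to install a snapshot recomputes and compares."""
    expected = shared_checksums.get(backend_id)
    if expected is None:
        raise ValueError("snapshot revoked or never published")
    if hashlib.sha256(snapshot_bytes).hexdigest() != expected:
        raise ValueError("snapshot was modified by the client")
    return True
```

A client that tampers with the chunk, or tries to install a snapshot after the publisher revoked it, fails the check; that is the "robust version" being argued for.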
Re: [HACKERS] Sync Rep Design
On Thu, Dec 30, 2010 at 03:24:09PM -0500, Aidan Van Dyk wrote:
> On Thu, Dec 30, 2010 at 3:07 PM, Robert Treat r...@xzilla.net wrote:
>>> If primary crashes while commits are waiting for acknowledgement, those transactions will be marked fully committed if the primary database recovers, no matter how allow_standalone_primary is set.
>>
>> This seems backwards; if you are waiting for acknowledgement, wouldn't the normal assumption be that the transactions *didn't* make it to any standby, and should be rolled back?
>
> This is the standard 2-phase commit problem. The primary server *has* committed it, its fsync has returned, and the only thing keeping it from returning the commit to the client is that it's waiting on a synchronous ack from a slave.

[snip]

> 2) initiate fsync on the primary first - In this case, the slave is always slightly behind. If your primary falls over, you don't give commit messages to the clients, but if it recovers, it might have committed data, and slaves will still be able to catch up.

The thing is that currently, even without replication, #2 can happen. For what little it's worth, I vote for this option, because it's a problem that can already happen (as opposed to adding an entirely new type of problem to the mix).

--
Joshua Tolley / eggyknap
End Point Corporation
http://www.endpoint.com
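Joshua's point - that option 2's failure mode already exists on a single server - can be made concrete with a toy commit sequence. This is a hypothetical sketch, not the sync rep patch: the primary makes the WAL record durable first, then waits for the standby's ack, and a crash in that window leaves a transaction that is committed on disk but was never reported committed to its client.

```python
def commit(xid, primary_wal, client_acks, slave_ack_arrives):
    """Option 2: fsync locally first, then wait for the standby's ack.

    Returns True iff the client was told the commit succeeded."""
    primary_wal.append(xid)       # WAL record now durable on the primary
    if not slave_ack_arrives:     # primary dies / standby lost right here:
        return False              # the client never hears "committed" ...
    client_acks.append(xid)       # ... even though xid *is* in primary WAL
    return True

wal, acks = [], []
commit(1, wal, acks, slave_ack_arrives=True)
commit(2, wal, acks, slave_ack_arrives=False)
# xid 2 is durable on the primary although its client got no reply --
# the same window a standalone server has between fsync and sending the
# commit reply over the network.
```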
Re: [HACKERS] Sync Rep Design
On Thu, Dec 30, 2010 at 8:57 PM, Simon Riggs si...@2ndquadrant.com wrote:
> I'm not very clear what your response has to do with Stefan's comments. My general perspective is that MySQL released a simple design a year ahead of us, which should be to our collective shame. I will be working towards delivering something useful in this release.

I don't feel ashamed of our feature set and I am not out to beat MySQL or anyone else, just to deliver the best product that we can. Our community has different interests than the MySQL community and that is fine. Still, I don't disagree that we should be aiming at feature parity.

*reads MySQL documentation*

I see now that you've tried to design this feature in a way that is similar to MySQL's offering, which does have some value. But it appears to me that the documentation you've written here is substantially similar to the MySQL 5.5 reference documentation. That could get us into a world of legal trouble - that documentation is not even open source, let alone BSD.

http://dev.mysql.com/doc/refman/5.5/en/replication-semisync.html

> I would rather concentrate on a minimal set of functionality that we can all agree on.

Me too; and perhaps your proposal is it. But I think it's a shame we didn't put more work into standby registration when we had time to get that done. It might not be necessary, but it would have delivered some nice functionality that we are now not going to have for 9.1.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
Re: [HACKERS] and it's not a bunny rabbit, either
On Thu, Dec 30, 2010 at 9:30 PM, Tom Lane t...@sss.pgh.pa.us wrote:
> Robert Haas robertmh...@gmail.com writes:
>> I think this thread has worked itself around to where it's entirely pointless.
>
> I understand your frustration, but it's not clear to me that there *is* any simple solution to this problem. Fundamentally, adding new relkinds to the system is always going to require running around and looking at a lot of code to see what's affected; and that goes for the error messages too. I put no stock at all in the idea that writing a guiding principle in the error messages will avoid anything, because as often as not, adding a fundamentally new relkind is going to involve some tweaking of what those principles are.

I think that's true in some cases but not all. The system-generated attribute names thing actually applies in several cases, and I think it's pretty cut-and-dried. When you get into something like which kinds of relations support triggers, that's a lot more arbitrary.

>> ... This message also does nothing to help the user understand WHY we don't allow renaming the attributes of his sequence or TOAST table, whereas the proposed revision does.
>
> I remain unconvinced that the average user cares, or will be able to extrapolate the message to understand what's supported or not, even if he does care about the reason for the restriction.

I'm convinced, but that only makes one of us. I think for now what I had better do is try to get this SQL/MED patch finished up by soldiering through this mess rather than trying to fix it. I think it's going to be kind of ugly, but if we haven't got another plan, then we're just going to have to live with the ugliness.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
Re: [HACKERS] Problems with autovacuum and vacuum
On Thu, Dec 30, 2010 at 12:56 PM, JotaComm jota.c...@gmail.com wrote:
> Last week I had a serious problem with my PostgreSQL database. My autovacuum is OFF, but in September it started to prevent transaction wraparound; however last week the following message appeared continuously in my log:
>
>     WARNING: database production must be vacuumed within 4827083 transactions
>     HINT: To avoid a database shutdown, execute a full-database VACUUM in production.
>
> This message appeared for five to six hours; after that, the message disappeared from the log. Any idea about what could have happened?

I'm thinking that autovacuum kicked into gear to prevent transaction wraparound. Once it did enough work to stave off disaster, the warning messages stopped appearing in the log.

> Every day the vacuum is executed on some tables; and on Sundays it's executed on all tables. But as the autovacuum has been running since September, and it runs for a long time, the vacuum was blocked because autovacuum had been running on the same table. How should I proceed in this case?

I guess the obvious thing to do would be to turn on autovacuum and forget about manual vacuums.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
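The warning JotaComm saw comes from XID age bookkeeping: transaction IDs are 32-bit and compared circularly, so "must be vacuumed within N transactions" means the database's oldest unfrozen XID is about 2^31 - N steps behind the current one. A rough model of that arithmetic (the real server also reserves some safety slack before the hard limit):

```python
# 32-bit XIDs compared on a circle: a database is safe as long as its
# oldest unfrozen XID is less than 2**31 transactions in the past.
XID_SPACE = 2 ** 32
WRAP_LIMIT = 2 ** 31

def xid_age(current_xid, frozen_xid):
    """Circular distance from frozen_xid forward to current_xid."""
    return (current_xid - frozen_xid) % XID_SPACE

def xids_until_wraparound(current_xid, frozen_xid):
    """How many more transactions can be assigned before comparisons wrap."""
    return WRAP_LIMIT - xid_age(current_xid, frozen_xid)
```

On a real server the equivalent question is answered by `SELECT datname, age(datfrozenxid) FROM pg_database`; vacuuming (manual or autovacuum's anti-wraparound pass) advances `datfrozenxid`, which is what made the warnings stop.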
Re: [HACKERS] Avoiding rewrite in ALTER TABLE ALTER TYPE
On Thu, Dec 30, 2010 at 8:35 PM, Noah Misch n...@leadboat.com wrote:
> 4. A FuncExpr node has answers given by the bitwise-AND of its funcexempt field and the answers from its first argument.

Why its first argument?

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
Re: [HACKERS] Avoiding rewrite in ALTER TABLE ALTER TYPE
On Fri, Dec 31, 2010 at 12:34:50AM -0500, Robert Haas wrote:
> On Thu, Dec 30, 2010 at 8:35 PM, Noah Misch n...@leadboat.com wrote:
>> 4. A FuncExpr node has answers given by the bitwise-AND of its funcexempt field and the answers from its first argument.
>
> Why its first argument?

funcexempt would only be nonzero for the FuncExpr of a length coercion cast. Those have the subject datum as the first argument, the typmod as the second, and an is-explicit boolean as the third. The other arguments are effectively already validated. That brings up a point -- the exemptor function also needs an is-explicit argument, as that affects the decision for some types.
Re: [HACKERS] Sync Rep Design
On 30.12.2010 22:27, Robert Haas wrote:
> On Thu, Dec 30, 2010 at 2:04 PM, Simon Riggs si...@2ndquadrant.com wrote:
>> synchronous_replication (boolean)
>>
>> Specifies whether transaction commit will wait for WAL records to be replicated before the command returns a success indication to the client.
>
> The word "replicated" here could be taken to mean different things, most obviously:
>
> - slave has received the WAL
> - slave has fsync'd the WAL
> - slave has applied the WAL

Perhaps the level of replication guarantee should be decided on the slave side, by having a configuration parameter there:

    report_as_replicated = received|written_to_disk|fsynced|applied

for different types of hosts may have wildly different guarantees and performance parameters for these. One could envision a WAL-archive type standby which is there for data persistence only and will never apply WAL.

Of course we could put a bitmap in the status update messages from the slave and have some quorum options on the master for when the data is in sync, say "need 5 received" or "(1 applied and 1 fsynced)", but I am pretty sure that trying to get anywhere with this before applying the basic sync rep patch would push back sync rep to at least 9.2 if not 9.5.

- Hannu Krosing
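Hannu's "quorum options" idea - e.g. need 5 received, or (1 applied and 1 fsynced) - is easy to prototype once each standby reports the furthest state it has reached. A hypothetical sketch; none of these names exist in the actual patch, and the ordering assumption (a standby that has applied WAL has necessarily fsynced and received it) is mine:

```python
# Replication states in order: reaching a later one implies the earlier ones.
STATES = ["received", "written_to_disk", "fsynced", "applied"]
RANK = {s: i for i, s in enumerate(STATES)}

def count_at_least(standby_states, state):
    """How many standbys have reached at least `state`."""
    return sum(1 for s in standby_states if RANK[s] >= RANK[state])

def in_sync(standby_states):
    """Example master-side rule: 5 received, OR (1 applied AND 1 fsynced).

    Note that one standby at "applied" counts toward "fsynced" too,
    since the states are ordered."""
    return (count_at_least(standby_states, "received") >= 5
            or (count_at_least(standby_states, "applied") >= 1
                and count_at_least(standby_states, "fsynced") >= 1))
```

The point of the sketch is also Hannu's caveat: even this toy needs decisions (do ordered states subsume each other? distinct standbys per condition?) that the minimal one-ack design avoids entirely.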
Re: [HACKERS] Sync Rep Design
On 31.12.2010 6:02, Robert Haas wrote:
> On Thu, Dec 30, 2010 at 8:57 PM, Simon Riggs si...@2ndquadrant.com wrote:
>> I'm not very clear what your response has to do with Stefan's comments. My general perspective is that MySQL released a simple design a year ahead of us, which should be to our collective shame. I will be working towards delivering something useful in this release.
>
> I don't feel ashamed of our feature set and I am not out to beat MySQL or anyone else, just to deliver the best product that we can.

The key word here is "deliver". The aim is to deliver sync rep, not specify it, leaving out controversial details. The registration part has been left out for a reason - while the registration itself is easy, deciding all the interactions with already running replication is not. Doing just the minimal support for sync rep (needing an acknowledgement from at least one standby) and leaving the management of standbys to the user enables us to get to actual working code instead of a pie-in-the-sky wishlist.

> Our community has different interests than the MySQL community and that is fine. Still, I don't disagree that we should be aiming at feature parity.
>
> *reads MySQL documentation*
>
> I see now that you've tried to design this feature in a way that is similar to MySQL's offering, which does have some value. But it appears to me that the documentation you've written here is substantially similar to the MySQL 5.5 reference documentation. That could get us into a world of legal trouble - that documentation is not even open source, let alone BSD.
>
> http://dev.mysql.com/doc/refman/5.5/en/replication-semisync.html

Maybe we should get someone who has not read the MySQL docs to re-write the spec in a clean-room fashion, by just inspecting the code and asking Simon et al.

>> I would rather concentrate on a minimal set of functionality that we can all agree on.
>
> Me too; and perhaps your proposal is it. But I think it's a shame we didn't put more work into standby registration when we had time to get that done.

When you need _just_ the registration, then make a table and two functions, pg_standby_register(name) and pg_standby_unregister(name). For a little more added functionality, add a third one, pg_standby_last_seen(name), to update a last-seen timestamp, plus a script that polls all standbys and calls it.

> It might not be necessary, but it would have delivered some nice functionality that we are now not going to have for 9.1.

There are tons of nice functionality we are not going to have for 9.1; let's just not make this the cause of even more nice functionality being left out!

- Hannu Krosing
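Hannu's table-plus-two-functions registry is small enough to model directly. A hypothetical sketch of the bookkeeping only - in real use this would be a catalog table and SQL functions named as he suggests (pg_standby_register(name) and friends), not Python:

```python
import time

# name -> last-seen timestamp (None until the polling script first reports in)
standby_registry = {}

def pg_standby_register(name):
    """Add a standby; re-registering an existing name is a no-op."""
    standby_registry.setdefault(name, None)

def pg_standby_unregister(name):
    """Remove a standby; unknown names are silently ignored."""
    standby_registry.pop(name, None)

def pg_standby_last_seen(name, now=None):
    """Called by a polling script for each standby it can still reach."""
    if name not in standby_registry:
        raise KeyError("standby %r is not registered" % name)
    standby_registry[name] = now if now is not None else time.time()
```

The open question this sketch dodges is exactly the one Hannu flags: what the master *does* with this list while replication is already running - which is why leaving registration out of the minimal sync rep patch is defensible.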