Re: [HACKERS] issue with smlar extension and GiST index creation (9.2Beta1)
-----Original Message----- From: Tom Lane [mailto:t...@sss.pgh.pa.us] Sent: Tuesday, June 05, 2012 9:22 PM To: mark Cc: 'pgsql-hackers' Subject: Re: [HACKERS] issue with smlar extension and GiST index creation (9.2Beta1)

mark dvlh...@gmail.com writes:
> I am playing around with 9.2Beta1 and the smlar extension that was presented at PGCon. Looks like a lot of great work has gone into both - so thanks to everyone for all the great work. I did run into an issue while creating a GiST index using the _text_sml_ops. I am getting ERROR: failed to re-find parent for block 25399.

Try beta2 ... this sounds suspiciously like the bug Heikki fixed last week:
http://git.postgresql.org/gitweb/?p=postgresql.git;a=commitdiff;h=d1996ed5e8bfaf1314e7817015668029c07d3b9b

regards, tom lane

Thanks, I will give that a shot and see if I still have the same issue.

..: Mark

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Inconsistency in libpq connection parameters, and extension thereof
On Wed, Jun 6, 2012 at 4:38 AM, Daniel Farina dan...@heroku.com wrote:
> On Tue, Jun 5, 2012 at 6:43 PM, Tom Lane t...@sss.pgh.pa.us wrote:
>> Daniel Farina dan...@heroku.com writes:
>>> If that is the case, is there a convention we can use to separate the parts of the connection string (in both representations) into the parts sent to the server and the part that the client needs?
>
> We already abuse this a little bit because URI syntax (in general, not just our rendition of it) leaves little room for extension for parameters on the client side. Consider ?sslmode=require. In both representations, the net effect of a typo would be that instead of magically reading some properties on the client side, they'd be sent to the server. How often is this going to be so wrong that one cannot send a response from the server indicating to the user their error? On casual inspection it doesn't seem like prohibitively often, but I haven't mulled over that for very long.

I think that's an excellent example of this being a bad idea. If you mis-spell sslmode=require, that should absolutely result in an error on the client side. Otherwise, you might end up sending your password (or other details that are not as sensitive, but still sensitive) over an unencrypted connection. If you wait for the error from the server, it's too late.

--
Magnus Hagander
Me: http://www.hagander.net/
Work: http://www.redpill-linpro.com/
[HACKERS] ExecStoreTuple going into infinite loop
Hi all,

I am trying to build and store multiple tuples. The code is:

    ExecClearTuple(slot);
    /* ... code for fetching the data from which the tuple will be formed ... */
    for (; x < y; x++)
    {
        tuple = BuildTupleFromCStrings(TupleDescGetAttInMetadata(node->ss.ss_currentRelation->rd_att), values);
        ExecStoreTuple(tuple, slot, InvalidBuffer, false);
    }
    return (slot);

When I include the ExecClearTuple(slot), the result only includes the last tuple that was built. If I do not include ExecClearTuple(slot), the code goes into an infinite loop.

Please help.

Atri

--
Regards, Atri l'apprenant
Re: [HACKERS] Interrupting long external library calls
FYI, I finally committed the code installing a signal handler in PostGIS, using the pqsignal function:
https://trac.osgeo.org/postgis/changeset/9850

It is currently only used to interrupt GEOS calls, but the idea is that it could eventually also be used to interrupt other library calls having a provision for such interruption requests. GEOS itself only supports it in the 3.4 branch.

In order to test it you'll need to define POSTGIS_ENABLE_GEOS_INTERRUPTIBILITY at the top of postgis/postgis_module.c - the macro is off by default due to concerns about possible consequences we may be unaware of. Your comments will be very useful to reduce (or confirm) the concerns. If we can get this confidently straight there may be the possibility to toggle the default to on before 2.0.1 ...

Thanks in advance.

PS: Tom, I still don't know what you mean by the SIGINFO case.

--strk;

On Mon, May 28, 2012 at 08:48:21AM +0200, Sandro Santilli wrote:
> On Fri, May 25, 2012 at 12:34:54PM -0400, Tom Lane wrote:
>> Sandro Santilli s...@keybit.net writes:
>>> I ended up providing an explicit mechanism to request interruption of whatever the library is doing, and experimented (successfully so far) requesting the interruption from a SIGINT handler. Do you see any major drawback in doing so?
>>
>> This seems a bit fragile. It might work all right in Postgres, where we tend to set up signal handlers just once at process start, but ISTM other systems might assume they can change their signal handlers at any time. The handler itself looks less than portable anyway --- what about the SIGINFO case?
>
> Indeed setting the handler from within the library itself was a temporary implementation to see how effective it would have been. The idea is to move the registration of the handler outside the library and into the user (PostGIS in this case). The library itself would only expose GEOS_requestInterrupt/GEOS_cancelInterrupt API calls. I'm guessing PostGIS should use the more portable pqsignal functions? What should I know about SIGINFO?
>
>> I assume that the geos::util::Interrupt::request() call sets a flag somewhere that's going to be periodically checked in long-running loops.
>
> Yes, this is what will happen.
>
>> Would it be possible for the periodic checks to include a provision for a callback into Postgres-specific glue code, wherein you could test the same flags CHECK_FOR_INTERRUPTS does? A similar approach might then be usable in other contexts, and it seems safer to me than messing with a host environment's signal handling.
>
> Would it be enough for the signal handler (installed by PostGIS) to check those flags and call the GEOS_requestInterrupt function when appropriate?
>
> --strk;
Re: [HACKERS] 9.2beta1, parallel queries, ReleasePredicateLocks, CheckForSerializableConflictIn in the oprofile
Hi,

On Mon, 4 Jun 2012, Ants Aasma wrote:
> On Mon, Jun 4, 2012 at 7:44 PM, Merlin Moncure mmonc...@gmail.com wrote:
> I tried to keep it simple at first to find an answer to the question of whether it's even worth trying before expending large effort on it. If anyone with a multisocket machine would chip in, I'd love to know how this patch handles on larger machines. I think the most interesting workloads are read-only loads with heavy buffer thrashing but inside OS memory. Select-only pgbench with 32MB shared_buffers was within error margin, although slightly faster on my machine (Intel i2500K). The workload that I used to demonstrate gain was an aggregated index scan to minimise other overheads.

I've quickly tested your lockfree-getbuffer.patch patch with the test case you provided and I barely see any improvement (2% at max):
https://docs.google.com/open?id=0B7koR68V2nM1QVBxWGpZdW4wd0U

Tested with 24 cores (48 HT cores, Xeon E7-4807). Although the tps vs number of threads plot looks weird.

Cheers, S

*****
Sergey E. Koposov, PhD, Research Associate
Institute of Astronomy, University of Cambridge
Madingley Road, CB3 0HA, Cambridge, UK
Tel: +44-1223-337-551 Web: http://www.ast.cam.ac.uk/~koposov/
Re: [HACKERS] ExecStoreTuple going into infinite loop
On Wed, Jun 6, 2012 at 5:36 AM, Atri Sharma atri.j...@gmail.com wrote:
> Hi all, I am trying to build and store multiple tuples. The code is:
>
>     ExecClearTuple(slot);
>     /* ... code for fetching the data from which the tuple will be formed ... */
>     for (; x < y; x++)
>     {
>         tuple = BuildTupleFromCStrings(TupleDescGetAttInMetadata(node->ss.ss_currentRelation->rd_att), values);
>         ExecStoreTuple(tuple, slot, InvalidBuffer, false);
>     }
>     return (slot);
>
> When I include the ExecClearTuple(slot), the result only includes the last tuple that was built. If I do not include ExecClearTuple(slot), the code goes into an infinite loop. Please help.

I've answered off-list. The context here is FDW iteration. Atri was trying to return multiple rows inside the iteration, which is not how the function is designed to work. (Also, this would be a more appropriate question for -general.)

merlin
[HACKERS] Ability to listen on two unix sockets
Hi,

Before I ask the main question, just a little background for one issue we're currently having in Fedora 17: PrivateTmp is a systemd feature that gives services a private /tmp directory, which in turn means that such services aren't able to access the system's /tmp directory. It's been enabled by some services already, including Apache, while PostgreSQL uses the system's /tmp directory, where its unix socket is located. Naturally, this resulted in a state where Apache and other services with PrivateTmp enabled are not able to communicate with PostgreSQL over the socket.

Since we don't want to just move the socket, for compatibility reasons, I'm going to prepare a draft patch to allow PostgreSQL to use a second unix socket at the same time. A question I'd like to ask now is: do we need a new configuration variable for this, or is it enough to have the location hard-coded? What are your opinions?

Regards, Honza
Re: [HACKERS] ExecStoreTuple going into infinite loop
Atri Sharma atri.j...@gmail.com writes:
> Hi all, I am trying to build and store multiple tuples. The code is:
>
>     ExecClearTuple(slot);
>     /* ... code for fetching the data from which the tuple will be formed ... */
>     for (; x < y; x++)
>     {
>         tuple = BuildTupleFromCStrings(TupleDescGetAttInMetadata(node->ss.ss_currentRelation->rd_att), values);
>         ExecStoreTuple(tuple, slot, InvalidBuffer, false);
>     }
>     return (slot);
>
> When I include the ExecClearTuple(slot), the result only includes the last tuple that was built.

I am not sure why you find this surprising. A tuple slot can only hold one tuple. If you're trying to build a set-returning function, you need a tuplestore not a tuple slot. Or else restructure the code to return each tuple as it's built. Look at existing SRFs for examples.

regards, tom lane
Re: [HACKERS] Ability to listen on two unix sockets
On Jun 6, 2012, at 15:50, Honza Horak wrote:
> Before I ask the main question, just a little background for one issue we're currently having in Fedora 17: PrivateTmp is a systemd feature that gives services a private /tmp directory, which in turn means that such services aren't able to access the system's /tmp directory. It's been enabled by some services already, including Apache, while PostgreSQL uses the system's /tmp directory, where its unix socket is located. Naturally, this resulted in a state where Apache and other services with PrivateTmp enabled are not able to communicate with PostgreSQL over the socket.

Couldn't you simply tell postgres to put its socket in, say, /var/run, and create a symlink to that socket in the global /tmp directory?

> Since we don't want to just move the socket, for compatibility reasons, I'm going to prepare a draft patch to allow PostgreSQL to use a second unix socket at the same time. A question I'd like to ask now is: do we need a new configuration variable for this, or is it enough to have the location hard-coded? What are your opinions?

If we're going to have this at all, we should go all the way and support an arbitrary number of sockets. But then, is there any advantage in providing this feature natively compared to simply creating symlinks?

best regards, Florian Pflug
Re: [HACKERS] Ability to listen on two unix sockets
Florian Pflug f...@phlo.org writes:
> Couldn't you simply tell postgres to put its socket in, say, /var/run, and create a symlink to that socket in the global /tmp directory?

FYI, this proposal emerged out of a discussion between Honza and myself. "Use a symlink" was my first idea too, but on reflection it seems like it will take less new code to support two sockets. We already support multiple TCP sockets, so multiple Unix sockets shouldn't be that much extra trouble. The reasons a symlink doesn't seem attractive are:

1. The code to create/delete it has to be in the postmaster. If we tried to make the Fedora-specific startup script manage it, we would first have to teach that script how to know which port number the postmaster will select, which means parsing config files. Ugh.

2. What if two postmasters try to create a symlink in the same place? Or we're just trying to decide if the previous creator crashed without removing it? So we need a lockfile beside it.

So at this point we are building a whole bunch of new infrastructure to create symlinks, whereas we can probably just call the same subroutine twice if we go with the two-socket design.

> If we're going to have this at all, we should go all the way and support an arbitrary number of sockets.

Well, that's what I wanted to discuss before Honza starts coding. It's not obvious that there are any use-cases for more than two. It's also not clear whether there is any value in supporting run-time rather than build-time configuration of the socket locations. The Fedora use-case has no need of that, but if people can point to other cases where it would be sensible, we can write the patch that way.

You might think we should design this exactly like the TCP-socket multiple-listen-addresses case, ie just have a config variable containing a list of directory names. The sticking point there is that the directories aren't really interchangeable. In particular, there is still going to be one directory that is the one hard-wired into libpq. So whereas multiple TCP sockets really are pretty much interchangeable, I think in the Unix-socket case we are going to have to think of it as being a primary socket and one or more alternate sockets. Is there a reason to have more than one alternate, and if so what is the use-case?

(BTW, we would probably just adopt the Debian solution if we were sure there were no non-libpq clients out there; but we aren't.)

regards, tom lane
Re: [HACKERS] Ability to listen on two unix sockets
On Wednesday, June 06, 2012 04:38:42 PM Tom Lane wrote:
> Florian Pflug f...@phlo.org writes:
>> If we're going to have this at all, we should go all the way and support an arbitrary number of sockets.
>
> Well, that's what I wanted to discuss before Honza starts coding. It's not obvious that there are any use-cases for more than two. It's also not clear whether there is any value in supporting run-time rather than build-time configuration of the socket locations. The Fedora use-case has no need of that, but if people can point to other cases where it would be sensible, we can write the patch that way.

I had the need to make pg available from multiple chroots via unix sockets. The same might come up more frequently with the availability of filesystem namespaces...

> You might think we should design this exactly like the TCP-socket multiple-listen-addresses case, ie just have a config variable containing a list of directory names. The sticking point there is that the directories aren't really interchangeable. In particular, there is still going to be one directory that is the one hard-wired into libpq.

I wonder if the whole issue doesn't require libpq to also try multiple hardcoded socket locations.

Andres

--
Andres Freund http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
Re: [HACKERS] Ability to listen on two unix sockets
Excerpts from Tom Lane's message of mié jun 06 10:38:42 -0400 2012:
> Florian Pflug f...@phlo.org writes:
>> Couldn't you simply tell postgres to put its socket in, say, /var/run, and create a symlink to that socket in the global /tmp directory?
>
> FYI, this proposal emerged out of a discussion between Honza and myself. "Use a symlink" was my first idea too, but on reflection it seems like it will take less new code to support two sockets. We already support multiple TCP sockets, so multiple Unix sockets shouldn't be that much extra trouble. The reasons a symlink doesn't seem attractive are:
>
> 1. The code to create/delete it has to be in the postmaster. If we tried to make the Fedora-specific startup script manage it, we would first have to teach that script how to know which port number the postmaster will select, which means parsing config files. Ugh.

Well, you could use postmaster -C port

The other reason seems compelling enough, though ... particularly, handling a lockfile sounds messy; if it's a symlink and it's created by the script, then it would need a separate lockfile, and filling its data wouldn't be exactly trivial.

> (BTW, we would probably just adopt the Debian solution if we were sure there were no non-libpq clients out there; but we aren't.)

Maybe this is a good time to make the /var/run socket location (Debian's choice) the primary one, and /tmp be the alternate.

--
Álvaro Herrera alvhe...@commandprompt.com
The PostgreSQL Company - Command Prompt, Inc.
PostgreSQL Replication, Consulting, Custom Development, 24x7 support
Re: [HACKERS] Ability to listen on two unix sockets
On 06/06/2012 04:50 PM, Andres Freund wrote:
> On Wednesday, June 06, 2012 04:38:42 PM Tom Lane wrote:
>> Florian Pflug f...@phlo.org writes:
>>> If we're going to have this at all, we should go all the way and support an arbitrary number of sockets.
>>
>> Well, that's what I wanted to discuss before Honza starts coding. It's not obvious that there are any use-cases for more than two. It's also not clear whether there is any value in supporting run-time rather than build-time configuration of the socket locations. The Fedora use-case has no need of that, but if people can point to other cases where it would be sensible, we can write the patch that way.
>
> I had the need to make pg available from multiple chroots via unix sockets. The same might come up more frequently with the availability of filesystem namespaces...

It seems you were not alone with such a need:
http://archives.postgresql.org/pgsql-novice/2006-09/msg00172.php

Honza
Re: [HACKERS] Ability to listen on two unix sockets
Alvaro Herrera alvhe...@commandprompt.com writes:
> Excerpts from Tom Lane's message of mié jun 06 10:38:42 -0400 2012:
>> (BTW, we would probably just adopt the Debian solution if we were sure there were no non-libpq clients out there; but we aren't.)
>
> Maybe this is a good time to make the /var/run socket location (Debian's choice) the primary one, and /tmp be the alternate.

I'm not really in favor of making /var/run be the out-of-the-box default, because it would discriminate against personal/testing installations (ie, you couldn't set it up without root privileges). It's a reasonable choice for distro-provided packages, but not so much for one-off builds --- think about the buildfarm if nothing else.

Having said that, if we made it easier to configure things that way than by patching the source, I bet Martin Pitt isn't going to object.

regards, tom lane
Re: [HACKERS] Ability to listen on two unix sockets
Honza Horak hho...@redhat.com writes:
> On 06/06/2012 04:50 PM, Andres Freund wrote:
>> On Wednesday, June 06, 2012 04:38:42 PM Tom Lane wrote:
>>> Florian Pflug f...@phlo.org writes:
>>>> If we're going to have this at all, we should go all the way and support an arbitrary number of sockets.
>>>
>>> Well, that's what I wanted to discuss before Honza starts coding. It's not obvious that there are any use-cases for more than two. It's also not clear whether there is any value in supporting run-time rather than build-time configuration of the socket locations. The Fedora use-case has no need of that, but if people can point to other cases where it would be sensible, we can write the patch that way.
>>
>> I had the need to make pg available from multiple chroots via unix sockets. The same might come up more frequently with the availability of filesystem namespaces...
>
> It seems you were not alone with such a need:
> http://archives.postgresql.org/pgsql-novice/2006-09/msg00172.php

I had forgotten that conversation, but it does seem like there is interest in this type of configuration. Can anybody confirm that dropping a socket into a chroot or jail would actually work, ie make it possible to connect from inside the chroot to a postmaster running outside? If that's real and not just wishful thinking, it seems like enough of an argument to justify supporting N sockets.

regards, tom lane
Re: [HACKERS] Ability to listen on two unix sockets
On Wed, Jun 06, 2012 at 11:32:45AM -0400, Tom Lane wrote:
> I had forgotten that conversation, but it does seem like there is interest in this type of configuration. Can anybody confirm that dropping a socket into a chroot or jail would actually work, ie make it possible to connect from inside the chroot to a postmaster running outside? If that's real and not just wishful thinking, it seems like enough of an argument to justify supporting N sockets.

We need to deal with exactly this sort of issue with schroot, where we may want to provide programs in the chroot with access to facilities outside the chroot. We generally just bind mount in the minimal set of stuff needed. This might mean binding just the socket, or it could be /var/run/postgresql. We do this for the X11 socket for our desktop configuration profile to permit X11 programs to run in a chroot, though we currently bind mount all of /tmp rather than just the socket, since we want that as well in any case. (http://people.debian.org/~rleigh/schroot.pdf)

While not exactly what was proposed (multiple sockets), this allows one to re-use a single socket without the daemon requiring any special support for it.

Regards, Roger

--
.''`.  Roger Leigh
: :' : Debian GNU/Linux http://people.debian.org/~rleigh/
`. `'  schroot and sbuild http://alioth.debian.org/projects/buildd-tools
`-     GPG Public Key F33D 281D 470A B443 6756 147C 07B3 C8BC 4083 E800
Re: [HACKERS] Inconsistency in libpq connection parameters, and extension thereof
On Wed, Jun 6, 2012 at 1:09 AM, Magnus Hagander mag...@hagander.net wrote:
> On Wed, Jun 6, 2012 at 4:38 AM, Daniel Farina dan...@heroku.com wrote:
>> On Tue, Jun 5, 2012 at 6:43 PM, Tom Lane t...@sss.pgh.pa.us wrote:
>>> Daniel Farina dan...@heroku.com writes:
>>>> If that is the case, is there a convention we can use to separate the parts of the connection string (in both representations) into the parts sent to the server and the part that the client needs?
>>
>> We already abuse this a little bit because URI syntax (in general, not just our rendition of it) leaves little room for extension for parameters on the client side. Consider ?sslmode=require. In both representations, the net effect of a typo would be that instead of magically reading some properties on the client side, they'd be sent to the server. How often is this going to be so wrong that one cannot send a response from the server indicating to the user their error? On casual inspection it doesn't seem like prohibitively often, but I haven't mulled over that for very long.
>
> I think that's an excellent example of this being a bad idea. If you mis-spell sslmode=require, that should absolutely result in an error on the client side. Otherwise, you might end up sending your password (or other details that are not as sensitive, but still sensitive) over an unencrypted connection. If you wait for the error from the server, it's too late.

That is an excellent point. Is there enough time in the day to gripe about how sslmode=require is not the default?

Well, this seems pretty obviated by the prefix-naming convention, but it's an iron-clad example of how the older idea was a bad one.

--
fdr
Re: [HACKERS] page is not marked all-visible warning in regression tests
On Tuesday, June 05, 2012 04:18:44 PM Tom Lane wrote:
> Andres Freund and...@2ndquadrant.com writes:
>> On Tuesday, June 05, 2012 03:32:08 PM Tom Lane wrote:
>>> I got this last night in a perfectly standard build of HEAD:
>>> + WARNING: page is not marked all-visible but visibility map bit is set in relation pg_db_role_setting page 0
>>
>> I have seen that twice just yesterday. Couldn't reproduce it so far. Workload was (pretty exactly):
>> initdb
>> postgres -c fsync=off
>> pgbench -i -s 100
>> CREATE TABLE data(id serial primary key, data int);
>> ALTER SEQUENCE data_id_seq INCREMENT 2;
>> VACUUM FREEZE;
>> normal shutdown
>> postgres -c fsync=on
>> pgbench -c 20 -j 20 -T 100
>> WARNING: ... pg_depend ...
>> WARNING: ... can't remember ...
>
> Hmm ... from memory, what I did was:
> configure/build/install from a fresh pull
> initdb
> start postmaster, fsync off
> make installcheck
> stop postmaster
> apply Hanada-san's json patch, replace postgres executable
> start postmaster, fsync off
> make installcheck
> and it was the second of these runs that failed.
>
> Could we be missing flushing some blocks out to disk at shutdown? Maybe fsync off is a contributing factor?

On a cursory look it might just be a race condition in vacuumlazy.c:lazy_scan_heap. If scan_all is set, which it has to be for the warning to be visible, all_visible_according_to_vm is determined before we loop over all blocks. At the point where one specific heap block is actually read and locked, that knowledge might be completely outdated by any concurrent backend. Am I missing something?

I have to say the whole visibility map correctness and crash-safety seems to be quite under-documented, especially as it seems to be somewhat intricate (to me). E.g. not having any note why visibilitymap_test doesn't need locking. (I guess the theory is that a 1-byte read will always be consistent. But how does that ensure other backends see an up-to-date value?)

Andres

--
Andres Freund http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
Re: [HACKERS] Ability to listen on two unix sockets
On Wed, Jun 6, 2012 at 10:38 AM, Tom Lane t...@sss.pgh.pa.us wrote:
> Well, that's what I wanted to discuss before Honza starts coding. It's not obvious that there are any use-cases for more than two. It's also not clear whether there is any value in supporting run-time rather than build-time configuration of the socket locations. The Fedora use-case has no need of that, but if people can point to other cases where it would be sensible, we can write the patch that way.
>
> You might think we should design this exactly like the TCP-socket multiple-listen-addresses case, ie just have a config variable containing a list of directory names. The sticking point there is that the directories aren't really interchangeable. In particular, there is still going to be one directory that is the one hard-wired into libpq. So whereas multiple TCP sockets really are pretty much interchangeable, I think in the Unix-socket case we are going to have to think of it as being a primary socket and one or more alternate sockets. Is there a reason to have more than one alternate, and if so what is the use-case?
>
> (BTW, we would probably just adopt the Debian solution if we were sure there were no non-libpq clients out there; but we aren't.)

I recently had an urge to make it possible for the postmaster to listen on multiple ports and even went so far as to code up a patch to allow that. It still applies, with offsets, so I'll attach it here. So I guess I'm +1 on the idea of allowing N UNIX sockets rather than limiting it to N=2, and really I'd like to go one better and allow listening on multiple TCP ports as well. Since the PID file contains the port number, multiple TCP sockets stop being interchangeable as soon as you allow multiple ports, but that's not very difficult to handle.

Now, you might ask whether this has any real-world value, and obviously I'm going to say yes or I wouldn't be proposing it. The reason for wanting multiple UNIX sockets is that those sockets might be in different places that are not all equally accessible to everyone, because of things like chroot. But of course the same thing is possible in the network space using iptables and similar tools. For example, you might want to have users connect to application A using port 5432, and to application B using port 15432. Now you can use network monitoring tools to see how much data each application is sending and receiving, without needing deep packet inspection. You can firewall those ports differently to provide access to different groups of users. And you can even decide, if the database gets overloaded, to cut off access to one of those ports, so that the application causing the problem becomes inaccessible but the rest of the database ceases being overloaded and you can still operate. Of course, you could also do that by changing pg_hba.conf, but for some people it might be more convenient (or feel more bullet-proof) to do it using network management tools. There are probably other use cases, as well.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

additional-sockets.patch
Description: Binary data
Re: [HACKERS] incorrect handling of the timeout in pg_receivexlog
On Tue, Jun 5, 2012 at 11:42 PM, Magnus Hagander mag...@hagander.net wrote:
> Works for me. We still need a (reworked) patch, though, right? We just move where the conversion between seconds and milliseconds happens?

Attached is the updated version of the patch.

> I definitely don't think we need subsecond granularity in the user-facing number. Even a second is pretty short.

Yep.

> (We do need to retain the ability to set it to 0 = off of course).

Yep, a value of zero disables the status updates, and the patch adds that explanation to the documentation of pg_basebackup and pg_receivexlog.

Regards,

--
Fujii Masao

timeout_handling_v4.patch
Description: Binary data
Re: [HACKERS] page is not marked all-visible warning in regression tests
On Wed, Jun 6, 2012 at 1:46 PM, Andres Freund and...@2ndquadrant.com wrote: On a cursory look it might just be a race condition in vacuumlazy.c:lazy_scan_heap. If scan_all is set, which it has to be for the warning to be visible, all_visible_according_to_vm is determined before we loop over all blocks. At the point where one specific heap block is actually read and locked that knowledge might be completely outdated by any concurrent backend. Am I missing something? No, I think you're right. I think that warning is bogus. I added it in place of some older warning which no longer made sense, but I think this one doesn't make sense either. I have to say the whole visibilitymap correctness and crash-safety seems to be quite underdocumented, especially as it seems to be somewhat intricate (to me). E.g. not having any note why visibilitymap_test doesn't need locking. (I guess the theory is that a 1-byte read will always be consistent. But how does that ensure other backends see an up-to-date value?). It's definitely intricate, and it's very possible that we should have some more documentation. I am not sure exactly what and where, but feel free to suggest something. visibilitymap_test() does have a comment saying that: /* * We don't need to lock the page, as we're only looking at a single bit. */ But that's a bit unsatisfying, because, as you say, it doesn't address the question of memory-ordering issues. I think that there's no situation in which it causes a problem to see the visibility map bit as unset when in reality it has just recently been set by some other back-end. It would be bad if someone did something like: if (visibilitymap_test(...)) visibilitymap_clear(); ...because then memory-ordering issues could cause us to accidentally fail to clear the bit. No one should be doing that, though; the relevant locking and conditional logic is built directly into visibilitymap_clear(). 
On the flip side, if someone sees the visibility map bit as set when it's actually just been cleared, that could cause a problem - most seriously, index-only scans could return wrong answers. For that to happen, someone would have to insert a heap tuple onto a previously all-visible page, clearing the visibility map bit, and then insert an index tuple; concurrently, some other backend would need to see the index tuple but not the fact that the visibility map bit had been cleared. I don't think that can happen: after inserting the heap tuple, the inserting backend would release buffer content lock, which acts as a full memory barrier; before reading the index tuple, the index-only-scanning backend would have to take the content lock on the index buffer, which also acts as a full memory barrier. So the inserter can't do the writes out of order, and the index-only-scanner can't do the reads out of order, so I think it's safe but we probably do need to explain that somewhere. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
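The unsafe test-then-clear pattern Robert warns about, and the safe alternative of folding the lock and the recheck into the clear operation itself (mirroring what visibilitymap_clear() does), can be modeled in miniature. This is a toy model, not the real buffer-manager code:

```python
import threading

class VisibilityMapModel:
    """Toy model of the locking rule discussed above: reads may be
    unlocked (and therefore stale), but clearing rechecks the bit under
    the lock, so the conditional logic lives inside clear() itself."""

    def __init__(self):
        self._lock = threading.Lock()  # stands in for the buffer content lock
        self._bit = False

    def test(self) -> bool:
        # Unlocked read: may be stale. Safe only because no caller acts on
        # a "set" result without re-checking under the lock.
        return self._bit

    def set(self):
        with self._lock:
            self._bit = True

    def clear(self):
        # The lock and the conditional are built into clear(), so a stale
        # test() result can never make a caller skip a necessary clear.
        with self._lock:
            if self._bit:
                self._bit = False
```

The dangerous version would be `if vm.test(): vm.clear()` with the conditional outside the lock: a reordered or stale read could skip the clear entirely.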
Re: [HACKERS] pg_receivexlog and feedback message
On Tue, Jun 5, 2012 at 11:44 PM, Magnus Hagander mag...@hagander.net wrote: On Tue, Jun 5, 2012 at 4:42 PM, Fujii Masao masao.fu...@gmail.com wrote: On Tue, Jun 5, 2012 at 9:53 PM, Magnus Hagander mag...@hagander.net wrote: Right now, pg_receivexlog sets: replymsg->write = InvalidXLogRecPtr; replymsg->flush = InvalidXLogRecPtr; replymsg->apply = InvalidXLogRecPtr; when it sends its status updates. I'm thinking it should set replymsg->write = blockpos instead. Why? That way you can see in pg_stat_replication what has actually been received by pg_receivexlog - not just what we last sent. This can be useful in combination with an archive_command that can block WAL recycling until it has been saved to the standby. And it would be useful as a general monitoring thing as well. I think the original reason was that it shouldn't interfere with synchronous replication - but it does take away a fairly useful use case... I think that not only replymsg->write but also ->flush should be set to blockpos in pg_receivexlog. Which allows pg_receivexlog to behave as a synchronous standby, so we can write WAL to both local and remote synchronously. I believe there are some use cases for synchronous pg_receivexlog. pg_receivexlog doesn't currently fsync() after every write. It only fsyncs complete files. So we'd need to set ->flush only at the end of a segment, right? Yes. Currently the status update is sent for each status interval. In sync replication, a transaction has to wait for a while even after pg_receivexlog has written or flushed the WAL data. So we should add a new option which specifies whether pg_receivexlog sends the status packet back as soon as it writes or flushes the WAL data, like the walreceiver does? Regards, -- Fujii Masao
[HACKERS] creating objects in pg_catalog
Right now, you can't directly create a relation (table, index, composite type) in the pg_catalog schema, but you can create a non-relation (function, domain, etc.) in the pg_catalog schema. Furthermore, you can create a table in some other schema and then move it into the pg_catalog schema using ALTER TABLE .. SET SCHEMA. After you do that, you can't move it back out again, nor can you drop it; or at least not without setting allow_system_table_mods. This all seems pretty wonky and inconsistent to me. It strikes me that we ought to either (1) allow users to place SQL objects in pg_catalog or (2) not. Having a weird special case that disallows it only for relations, but then lets you do it anyway via the back door, seems pretty pointless. Tabula rasa, I'd argue for getting tough on this, and error out on any attempt to get a user-created SQL object into pg_catalog by any means, unless allow_system_table_mods is set. However, considering that we have two extensions whose extension install scripts do this -- adminpack and sepgsql -- and one of those (adminpack) is extremely widely used, that seems like it might be asking for trouble. So maybe we should just go the opposite direction and just remove the rather toothless prohibition that currently exists. Or, as a middle way, we could tighten up the prohibition, but also provide a GUC other than allow_system_table_mods that can be changed via SET LOCAL by extension scripts that need to do this. allow_system_table_mods requires a server restart and is purposefully undocumented, so it's not a good thing to rely on for prohibitions that people might need to work around on a production system. Thoughts? -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
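The inconsistency can be seen in a short session sketch (error texts are illustrative and may vary by version):

```sql
CREATE TABLE pg_catalog.t (a int);            -- fails: permission denied to create "pg_catalog.t"
CREATE FUNCTION pg_catalog.f() RETURNS int
    LANGUAGE sql AS 'SELECT 1';               -- allowed: non-relations are not blocked
CREATE TABLE public.t (a int);
ALTER TABLE public.t SET SCHEMA pg_catalog;   -- allowed: the back door
DROP TABLE pg_catalog.t;                      -- fails unless allow_system_table_mods is set
```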
Re: [HACKERS] \conninfo and SSL
On Sun, Jun 3, 2012 at 5:30 AM, Alastair Turner b...@ctrlf5.co.za wrote: A one-line change adds the SSL info on its own line like -- You are connected to database scratch as user scratch on host 127.0.0.1 at port 5432. SSL connection (cipher: DHE-RSA-AES256-SHA, bits: 256) -- Does this need a more integrated presentation, and therefore a broader change to make it translatable? +1 for doing it that way. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Re: [HACKERS] pg_receivexlog and feedback message
On Tue, Jun 5, 2012 at 10:44 AM, Magnus Hagander mag...@hagander.net wrote: On Tue, Jun 5, 2012 at 4:42 PM, Fujii Masao masao.fu...@gmail.com wrote: On Tue, Jun 5, 2012 at 9:53 PM, Magnus Hagander mag...@hagander.net wrote: Right now, pg_receivexlog sets: replymsg->write = InvalidXLogRecPtr; replymsg->flush = InvalidXLogRecPtr; replymsg->apply = InvalidXLogRecPtr; when it sends its status updates. I'm thinking it should set replymsg->write = blockpos instead. Why? That way you can see in pg_stat_replication what has actually been received by pg_receivexlog - not just what we last sent. This can be useful in combination with an archive_command that can block WAL recycling until it has been saved to the standby. And it would be useful as a general monitoring thing as well. I think the original reason was that it shouldn't interfere with synchronous replication - but it does take away a fairly useful use case... I think that not only replymsg->write but also ->flush should be set to blockpos in pg_receivexlog. Which allows pg_receivexlog to behave as a synchronous standby, so we can write WAL to both local and remote synchronously. I believe there are some use cases for synchronous pg_receivexlog. pg_receivexlog doesn't currently fsync() after every write. It only fsyncs complete files. So we'd need to set ->flush only at the end of a segment, right? If you want to be able to use it as a synchronous standby, that's not going to work very well. You could end up with pg_receivexlog waiting for the end of the segment before it flushes; meanwhile, all the clients are sitting there waiting for the flush to happen before they do anything that could generate more WAL to fill the segment. Unless you have a solution to that problem, I'd recommend setting write (which should work with the new remote_write mode for sync rep) but not setting flush. 
-- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
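The status-reply behavior under discussion, report the write position always, but report a flush position only when a whole segment has actually been fsync()ed, can be sketched as follows (field names, the 16MB segment size, and the zero stand-in for InvalidXLogRecPtr are assumptions, not the actual pg_receivexlog code):

```python
WAL_SEG_SIZE = 16 * 1024 * 1024   # default WAL segment size (assumed)
INVALID_XLOG_REC_PTR = 0          # stand-in for InvalidXLogRecPtr

def build_status_reply(blockpos: int, segment_fsynced: bool) -> dict:
    """Build the standby status update: 'write' reflects what has been
    received (so pg_stat_replication shows it), while 'flush' is only
    reported once a complete segment has been fsync()ed. 'apply' stays
    invalid because pg_receivexlog never replays WAL."""
    reply = {
        "write": blockpos,
        "flush": INVALID_XLOG_REC_PTR,
        "apply": INVALID_XLOG_REC_PTR,
    }
    # Only claim durability at a segment boundary that has been fsynced.
    if segment_fsynced and blockpos % WAL_SEG_SIZE == 0:
        reply["flush"] = blockpos
    return reply
```

Robert's caveat above is visible in the model: a synchronous-commit client waiting on "flush" could stall until the segment fills, which is why reporting only "write" (paired with remote_write sync rep) is the safer default.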
Re: [HACKERS] 9.2beta1, parallel queries, ReleasePredicateLocks, CheckForSerializableConflictIn in the oprofile
On Wed, Jun 6, 2012 at 2:27 PM, Sergey Koposov kopo...@ast.cam.ac.uk wrote: I've quickly tested your lockfree-getbuffer.patch patch with the test case you provided and I barely see any improvement (2% at max) https://docs.google.com/open?id=0B7koR68V2nM1QVBxWGpZdW4wd0U tested with 24 cores (48 HT cores, Xeon E7-4807). Although the tps vs number of threads looks weird Was this the range scan on the test table? (sorry about the error in the query, the x should really be id) In that case the results look really suspicious. My machine (4 cores, no HT, @ 4GHz, newer arch) peaked at 90 tps with the stated configuration. Even when upping the shared_buffers and enabling indexonlyscan I didn't see more than about 540 tps per thread. The test is designed to exercise buffer eviction, doing about 9800 buffer reads per transaction with 32MB of buffers. Ants Aasma -- Cybertec Schönig & Schönig GmbH Gröhrmühlgasse 26 A-2700 Wiener Neustadt Web: http://www.postgresql-support.de
Re: [HACKERS] page is not marked all-visible warning in regression tests
On Wednesday, June 06, 2012 08:19:15 PM Robert Haas wrote: On Wed, Jun 6, 2012 at 1:46 PM, Andres Freund and...@2ndquadrant.com wrote: On a cursory look it might just be a race condition in vacuumlazy.c:lazy_scan_heap. If scan_all is set, which it has to be for the warning to be visible, all_visible_according_to_vm is determined before we loop over all blocks. At the point where one specific heap block is actually read and locked that knowledge might be completely outdated by any concurrent backend. Am I missing something? No, I think you're right. I think that warning is bogus. I added it in place of some older warning which no longer made sense, but I think this one doesn't make sense either. Agreed. It might be interesting to recheck the visibility and warn if it's wrong. That should be infrequent enough to be bearable, and it does check for an actually dangerous case in a new code path. I have to say the whole visibilitymap correctness and crash-safety seems to be quite underdocumented, especially as it seems to be somewhat intricate (to me). E.g. not having any note why visibilitymap_test doesn't need locking. (I guess the theory is that a 1-byte read will always be consistent. But how does that ensure other backends see an up-to-date value?). It's definitely intricate, and it's very possible that we should have some more documentation. I am not sure exactly what and where, but feel free to suggest something. I think some addition to the LOCKING part of visibilitymap.c's header explaining some of what you wrote in your email might be a good start. I would also suggest explicitly mentioning that it's ok for the visibility map and the page to disagree about visibility if it's the visibility map that thinks the page contains invisible data, but not the other way round (I think that can currently happen). visibilitymap_test() should explain that its results can be outdated if you're not holding a buffer lock. 
visibilitymap_test() does have a comment saying that: /* * We don't need to lock the page, as we're only looking at a single bit. */ Oh. I conveniently skipped that comment in my brain ;) But that's a bit unsatisfying, because, as you say, it doesn't address the question of memory-ordering issues. I think that there's no situation in which it causes a problem to see the visibility map bit as unset when in reality it has just recently been set by some other back-end. It would be bad if someone did something like: if (visibilitymap_test(...)) visibilitymap_clear(); ...because then memory-ordering issues could cause us to accidentally fail to clear the bit. No one should be doing that, though; the relevant locking and conditional logic is built directly into visibilitymap_clear(). Then _test should document that... I don't think it's impossible that we will grow more uses of the visibilitymap logic. On the flip side, if someone sees the visibility map bit as set when it's actually just been cleared, that could cause a problem - most seriously, index-only scans could return wrong answers. For that to happen, someone would have to insert a heap tuple onto a previously all-visible page, clearing the visibility map bit, and then insert an index tuple; concurrently, some other backend would need to see the index tuple but not the fact that the visibility map bit had been cleared. I don't think that can happen: after inserting the heap tuple, the inserting backend would release buffer content lock, which acts as a full memory barrier; before reading the index tuple, the index-only-scanning backend would have to take the content lock on the index buffer, which also acts as a full memory barrier. So the inserter can't do the writes out of order, and the index-only-scanner can't do the reads out of order, so I think it's safe but we probably do need to explain that somewhere. Hm. 
For a short while I thought there would be an issue with heap_delete and IOS because the deleting transaction can commit without any barriers happening on the IOS side. But that only seems to be possible with non-MVCC snapshots, which are currently not allowed with index-only scans. Andres -- Andres Freund http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services
Re: [HACKERS] 9.3: load path to mitigate load penalty for checksums
On Mon, Jun 4, 2012 at 9:26 PM, Jeff Davis pg...@j-davis.com wrote: Thoughts? Simon already proposed a way of doing this that doesn't require explicit user action, which seems preferable to a method that does require explicit user action, even though it's a little harder to implement. His idea was to store the XID of the process creating the table in the pg_class row, which I think is *probably* better than your idea of having a process that waits and then flips the flag. There are some finicky details though - see previous thread for discussion of some of the issues. It would be very nice to have a method that detects whether or not there is only one open snapshot in a particular backend. Any time that condition is met, tuples written into a table created or truncated in the current transaction can be written with HEAP_XMIN_COMMITTED already set. That is not as nice as being able to set HEAP_XMIN_COMMITTED *and* PD_ALL_VISIBLE *and* the visibility map, but it would still be a big improvement over the status quo. I would like to see us get that part done and committed and then worry about writing the tuples with PD_ALL_VISIBLE set as a separate project. In many cases it would also be nice to write the tuples pre-frozen, so I think we should look for a design that will support that. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
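The condition Robert describes for pre-setting the hint bit can be sketched as a simplified model (function names are illustrative; only the HEAP_XMIN_COMMITTED flag value is taken from the real htup headers):

```python
HEAP_XMIN_COMMITTED = 0x0100  # t_infomask flag value from PostgreSQL's headers

def infomask_for_new_tuple(created_in_this_xact: bool,
                           open_snapshots: int) -> int:
    """If the table was created or truncated in the current transaction and
    the backend holds only one open snapshot, tuples can be written with
    HEAP_XMIN_COMMITTED already set: no other snapshot in this backend could
    see them prematurely, and no other backend can see the table at all
    until the creating transaction commits."""
    infomask = 0
    if created_in_this_xact and open_snapshots == 1:
        infomask |= HEAP_XMIN_COMMITTED
    return infomask
```

Setting PD_ALL_VISIBLE and the visibility map bit as well would need more machinery, which is why the message above treats them as a separate project.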
[HACKERS] Avoiding adjacent checkpoint records
In commit 18fb9d8d21a28caddb72c7ffbdd7b96d52ff9724, Simon modified the rule for when to skip checkpoints on the grounds that not enough activity has happened since the last one. However, that commit left the comment block about it in a nonsensical state:

 * If this isn't a shutdown or forced checkpoint, and we have not switched
 * to the next WAL file since the start of the last checkpoint, skip the
 * checkpoint. The idea here is to avoid inserting duplicate checkpoints
 * when the system is idle. That wastes log space, and more importantly it
 * exposes us to possible loss of both current and previous checkpoint
 * records if the machine crashes just as we're writing the update.
 * (Perhaps it'd make even more sense to checkpoint only when the previous
 * checkpoint record is in a different xlog page?)

The new code entirely fails to prevent writing adjacent checkpoint records, because what it checks is the distance from the previous checkpoint's REDO pointer, not the previous checkpoint record itself. So the concern raised in the last two sentences of the comment isn't being addressed at all: if we corrupt the current page of WAL while trying to write the new checkpoint record, we risk losing the previous checkpoint record too. Should the system then crash, there is enough logic to back up to the second previous checkpoint record and roll forward from there --- but since we've lost the last checkpoint and up to one page's worth of preceding WAL records, there is no guarantee that we'll manage to reach a database state that is consistent with data already flushed out to disk during the last checkpoint. I started to make a quick patch to add an additional check on the location of the previous checkpoint record, so that we'd skip a new checkpoint unless we'd moved to a new page of WAL. However, if we really want to take this risk seriously, ISTM that allowing adjacent checkpoint records is bad all the time, not only for non-forced checkpoints. 
What I'm now thinking is that a more appropriate way to address that risk is to force a skip to a new page (not segment) of WAL after we write a checkpoint record. This won't waste much WAL space in view of the new rule to avoid checkpoints more than once per segment on average. On the other hand, you could argue that this concern is entirely hypothetical, and we're already basically assuming that once a WAL record has been flushed to disk it's safe there even if we're still writing more stuff into the same page. If we don't want to assume that, then any XLogFlush would have to include skip-to-new-page, and that's not going to be cheap. Thoughts? regards, tom lane
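Tom's stricter skip rule (compare against the previous checkpoint record, page-wise, rather than its REDO pointer) can be sketched as follows, assuming the default 8k xlog page size; names here are illustrative, not the actual xlog.c code:

```python
XLOG_BLCKSZ = 8192  # default WAL page size (assumed)

def same_xlog_page(lsn_a: int, lsn_b: int) -> bool:
    # Two byte positions fall on the same WAL page iff they share a page number.
    return lsn_a // XLOG_BLCKSZ == lsn_b // XLOG_BLCKSZ

def should_skip_checkpoint(insert_pos: int,
                           prev_ckpt_record: int,
                           forced: bool) -> bool:
    """Skip a non-forced checkpoint while WAL insertion is still on the page
    holding the previous checkpoint *record*, so that a torn page cannot
    destroy both the current and the previous checkpoint at once."""
    if forced:
        return False
    return same_xlog_page(insert_pos, prev_ckpt_record)
```

The point of the page-based comparison is exactly the failure mode described above: checking only the distance from the REDO pointer still permits two checkpoint records on one page.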
Re: [HACKERS] WalSndWakeup() and synchronous_commit=off
On Tuesday, May 29, 2012 08:42:43 PM Andres Freund wrote: Hi, On Monday, May 28, 2012 07:11:53 PM Tom Lane wrote: Andres Freund and...@2ndquadrant.com writes: Does anybody have a better idea than to either call WalSndWakeup() at essentially the wrong places or calling it inside a critical section? Tom, what danger do you see from calling it in a critical section? My concern was basically that it might throw an error. Looking at the current implementation of SetLatch, it seems that's not possible, but I wonder whether we want to lock ourselves into that assumption. The assumption is already made at several other places I think. XLogSetAsyncXactLSN does a SetLatch and is called from critical sections; several signal handlers call it without any attention to the context. Requiring it to be called outside would make its usage considerably less convenient, and I don't really see what could change that would require throwing non-panic errors. Still, if the alternatives are worse, maybe that's the best answer. If we do that, though, let's add comments to WalSndWakeup and SetLatch mentioning that they mustn't throw error. Patch attached. I would like to invite some more review (+commit...) here ;). IMO this is an annoying bug which should be fixed before the next point release or beta/rc comes out... Andres -- Andres Freund http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services
Re: [HACKERS] Avoiding adjacent checkpoint records
On Wed, Jun 6, 2012 at 3:08 PM, Tom Lane t...@sss.pgh.pa.us wrote: In commit 18fb9d8d21a28caddb72c7ffbdd7b96d52ff9724, Simon modified the rule for when to skip checkpoints on the grounds that not enough activity has happened since the last one. However, that commit left the comment block about it in a nonsensical state: * If this isn't a shutdown or forced checkpoint, and we have not switched * to the next WAL file since the start of the last checkpoint, skip the * checkpoint. The idea here is to avoid inserting duplicate checkpoints * when the system is idle. That wastes log space, and more importantly it * exposes us to possible loss of both current and previous checkpoint * records if the machine crashes just as we're writing the update. * (Perhaps it'd make even more sense to checkpoint only when the previous * checkpoint record is in a different xlog page?) IIRC, the inspiration for the change was that we were getting a never-ending series of checkpoints even when nothing was happening at all: http://archives.postgresql.org/pgsql-hackers/2011-10/msg00207.php I felt (and still feel) that this was misguided. I understand why people don't want a completely idle system to checkpoint; but I can't recall a single complaint about a checkpoint on a system with low but not zero activity. Checkpoints are pretty cheap when there isn't much data to flush. The flip side is that I know of real customers who would have suffered real data loss had this code been present in the server version they were using. Checkpoints are the *only* mechanism by which SLRU pages get flushed to disk on a mostly-idle system. That means if something happens to your pg_xlog directory, and you haven't had a checkpoint, you're screwed. Letting data sit in memory for hours, days, weeks, or months because we haven't filled up a WAL segment is just terrible. 
The first user who loses a transaction that was committed a month ago after running pg_resetxlog is going to hit the ceiling, and I don't blame them. It wouldn't be so bad if we had background writing for SLRU pages, because then you could figure that the OS would eventually have a chance to write the page out... but we don't. It'll just sit there in shared memory, dirty, forever. CLOG data in particular is FAR too precious to take that kind of chance with. I don't think there's much sense in doing push-ups to avoid having the current and previous checkpoint records on the same XLOG page. If the system is so nearly idle that you get two checkpoint records in the same 8k block, and that block gets corrupted, it is extremely likely that you can run pg_resetxlog and be OK. If not, that means there were more XLOG records after the corrupted page, and you're not going to be able to replay those anyway, whether the checkpoint records are in the same 8k block or not. So I'm not seeing how your proposal is buying us any additional measure of safety that we don't already have. Of course, if we had a way to skip over the corrupted portion of WAL and pick up replaying records after that, that would be very useful (even though you'd have to view the resulting database with extreme suspicion), but without that I don't see that finding the previous checkpoint record is doing much for us. Either way, you've potentially lost changes that were covered by WAL records emitted after the most recent checkpoint. The only thing we can really do is make sure that there aren't likely to be too many more unreplayed records after the last checkpoint segment, which goes back to my previous complaint. 
As a side point, another reason not to make the checkpoint record consume the rest of the page is that, for scalability reasons, we want to minimize the amount of calculation that has to be done while holding WALInsertLock, and have as much of the computation as possible get done before acquiring it. XLogInsert() is already way more complicated than anything anyone ought to be doing while holding a heavily-contended LWLock. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Re: [HACKERS] 9.2beta1, parallel queries, ReleasePredicateLocks, CheckForSerializableConflictIn in the oprofile
On Wed, 6 Jun 2012, Ants Aasma wrote: On Wed, Jun 6, 2012 at 2:27 PM, Sergey Koposov kopo...@ast.cam.ac.uk wrote: I've quickly tested your lockfree-getbuffer.patch patch with the test case you provided and I barely see any improvement (2% at max) https://docs.google.com/open?id=0B7koR68V2nM1QVBxWGpZdW4wd0U tested with 24 cores (48 HT cores, Xeon E7-4807). Although the tps vs number of threads looks weird Was this the range scan on the test table? (sorry about the error in the query, the x should really be id) In that case the results look really suspicious. Yes, my fault partially, because without much thought I've put value instead of x in the script. Now after replacing it by id the tps are much smaller. Here is the tps vs nthreads (I tested up to 10 threads on my 24-CPU system, with HT disabled): https://docs.google.com/open?id=0B7koR68V2nM1Nk9OcWNJOTRrYVE Your patch clearly improves the situation (the peak tps is ~10% higher), but the general picture is the same: flattening of tps vs nthreads. Cheers, S * Sergey E. Koposov, PhD, Research Associate Institute of Astronomy, University of Cambridge Madingley road, CB3 0HA, Cambridge, UK Tel: +44-1223-337-551 Web: http://www.ast.cam.ac.uk/~koposov/
Re: [HACKERS] \conninfo and SSL
Excerpts from Robert Haas's message of mié jun 06 14:45:46 -0400 2012: On Sun, Jun 3, 2012 at 5:30 AM, Alastair Turner b...@ctrlf5.co.za wrote: A one-line change adds the SSL info on its own line like -- You are connected to database scratch as user scratch on host 127.0.0.1 at port 5432. SSL connection (cipher: DHE-RSA-AES256-SHA, bits: 256) -- Does this need a more integrated presentation, and therefore a broader change to make it translatable? +1 for doing it that way. Yeah, printSSLInfo already outputs translated stuff so this should be OK. Merging both messages into a single translatable unit would be pretty cumbersome, for no practical gain. -- Álvaro Herrera alvhe...@commandprompt.com The PostgreSQL Company - Command Prompt, Inc. PostgreSQL Replication, Consulting, Custom Development, 24x7 support
Re: [HACKERS] pg_receivexlog and feedback message
On Wed, Jun 6, 2012 at 8:26 PM, Fujii Masao masao.fu...@gmail.com wrote: On Tue, Jun 5, 2012 at 11:44 PM, Magnus Hagander mag...@hagander.net wrote: On Tue, Jun 5, 2012 at 4:42 PM, Fujii Masao masao.fu...@gmail.com wrote: On Tue, Jun 5, 2012 at 9:53 PM, Magnus Hagander mag...@hagander.net wrote: Right now, pg_receivexlog sets: replymsg->write = InvalidXLogRecPtr; replymsg->flush = InvalidXLogRecPtr; replymsg->apply = InvalidXLogRecPtr; when it sends its status updates. I'm thinking it should set replymsg->write = blockpos instead. Why? That way you can see in pg_stat_replication what has actually been received by pg_receivexlog - not just what we last sent. This can be useful in combination with an archive_command that can block WAL recycling until it has been saved to the standby. And it would be useful as a general monitoring thing as well. I think the original reason was that it shouldn't interfere with synchronous replication - but it does take away a fairly useful use case... I think that not only replymsg->write but also ->flush should be set to blockpos in pg_receivexlog. Which allows pg_receivexlog to behave as a synchronous standby, so we can write WAL to both local and remote synchronously. I believe there are some use cases for synchronous pg_receivexlog. pg_receivexlog doesn't currently fsync() after every write. It only fsyncs complete files. So we'd need to set ->flush only at the end of a segment, right? Yes. Currently the status update is sent for each status interval. In sync replication, a transaction has to wait for a while even after pg_receivexlog has written or flushed the WAL data. So we should add a new option which specifies whether pg_receivexlog sends the status packet back as soon as it writes or flushes the WAL data, like the walreceiver does? That might be useful, but I think that's 9.3 material at this point. But I think we can get setting the write location in as a bugfix. 
-- Magnus Hagander Me: http://www.hagander.net/ Work: http://www.redpill-linpro.com/
Re: [HACKERS] Inconsistency in libpq connection parameters, and extension thereof
On Wed, Jun 6, 2012 at 6:58 PM, Daniel Farina dan...@heroku.com wrote: On Wed, Jun 6, 2012 at 1:09 AM, Magnus Hagander mag...@hagander.net wrote: On Wed, Jun 6, 2012 at 4:38 AM, Daniel Farina dan...@heroku.com wrote: On Tue, Jun 5, 2012 at 6:43 PM, Tom Lane t...@sss.pgh.pa.us wrote: Daniel Farina dan...@heroku.com writes: If that is the case, is there a convention we can use to separate the parts of the connection string (in both representations) into the parts sent to the server and the part that the client needs? We already abuse this a little bit because URI syntax (in general, not just our rendition of it) leaves little room for extension for parameters on the client side. Consider ?sslmode=require. In both representations, the net effect of a typo would be that instead of magically reading some properties on the client side, they'd be sent to the server. How often is this going to be so wrong that one cannot send a response from the server indicating to the user their error? On casual inspection it doesn't seem like prohibitively often, but I haven't mulled over that for very long. I think that's an excellent example of this being a bad idea. If you mis-spell sslmode=require, that should absolutely result in an error on the client side. Otherwise, you might end up sending your password (or other details that are not as sensitive, but still sensitive) over an unencrypted connection. If you wait for the error from the server, it's too late. That is an excellent point. Is there enough time in the day to gripe about how sslmode=require is not the default? Well, you'll get me first in line to back that the current default is stupid. But I'm not sure sslmode=require is a proper default either. Because then the connection will fail completely to the vast majority of servers, which simply don't have SSL support. Well, this seems pretty obviated by the prefix-naming convention, but it's an iron clad example of how the older idea was a bad one. 
Yeah, a prefix-based solution would fix this, since we can keep throwing errors. However, not throwing errors on the URL syntax should be considered a bug, I think. -- Magnus Hagander Me: http://www.hagander.net/ Work: http://www.redpill-linpro.com/ -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Btree or not btree? That is the question
On Mon, Jun 04, 2012 at 02:09:44PM -0400, Tom Lane wrote: Greg Sabino Mullane g...@endpoint.com writes: We have an 8.3.18 system (yes, the same one from the previous thread, finally upgraded!) that gave us this error yesterday: ERROR: index pg_class_oid_index is not a btree That means you got bogus data while reading the metapage. I'm beginning to wonder about the hardware on this server ... Thanks for the reply. Me too. This is on a cluster[1], so we do have the luxury of testing the hardware on each box and shuffling things around. I'm also thinking we need to get away from the near-constant updating of pg_class, just as a good measure. [1] Bruce, yet another usage! :) -- Greg Sabino Mullane g...@endpoint.com End Point Corporation PGP Key: 0x14964AC8
Re: [HACKERS] Inconsistency in libpq connection parameters, and extension thereof
On Wed, Jun 6, 2012 at 4:08 PM, Magnus Hagander mag...@hagander.net wrote: However, not throwing errors on the URL syntax should be considered a bug, I think. +1. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Re: [HACKERS] 9.2beta1, parallel queries, ReleasePredicateLocks, CheckForSerializableConflictIn in the oprofile
On Wed, Jun 6, 2012 at 2:53 PM, Sergey Koposov kopo...@ast.cam.ac.uk wrote: On Wed, 6 Jun 2012, Ants Aasma wrote: On Wed, Jun 6, 2012 at 2:27 PM, Sergey Koposov kopo...@ast.cam.ac.uk wrote: I've quickly tested your lockfree-getbuffer.patch with the test case you provided and I barely see any improvement (2% at max) https://docs.google.com/open?id=0B7koR68V2nM1QVBxWGpZdW4wd0U tested with 24 cores (48 ht cores, Xeon E7-4807). Although the tps vs number of threads looks weird Was this the range scan on the test table? (sorry about the error in the query, the x should really be id) In that case the results look really suspicious. Yes, my fault partially, because without much thought I've put value instead of x in the script. Now after replacing it by id the tps are much smaller. Here is the tps vs nthreads I did test up to 10 threads on my 24 cpu system (I disabled HT though): https://docs.google.com/open?id=0B7koR68V2nM1Nk9OcWNJOTRrYVE Your patch clearly improves the situation (the peak tps is ~ 10% higher), but the general picture is the same: flattening of tps vs nthreads. I think this is the expected result. In the single user case the spinlock never spins and only has to make the cpu-locking cache instructions once. can we see results @24 threads? merlin
Re: [HACKERS] Avoiding adjacent checkpoint records
Robert Haas robertmh...@gmail.com writes: On Wed, Jun 6, 2012 at 3:08 PM, Tom Lane t...@sss.pgh.pa.us wrote: In commit 18fb9d8d21a28caddb72c7ffbdd7b96d52ff9724, Simon modified the rule for when to skip checkpoints on the grounds that not enough activity has happened since the last one. IIRC, the inspiration for the change was that we were getting a never-ending series of checkpoints even when nothing was happening at all: http://archives.postgresql.org/pgsql-hackers/2011-10/msg00207.php Right. I felt (and still feel) that this was misguided. Looking at it again, I'm inclined to agree. The behavior was entirely correct up until somebody decided to emit a continuing stream of XLOG_RUNNING_XACTS WAL records even when the system is idle. Why did we not fix it by fixing that? I don't think there's much sense in doing push-ups to avoid having the current and previous checkpoint records on the same XLOG page. Perhaps not. I only got exercised about it after noting that the commit hadn't updated the comment about it to match what the code is doing. If we end up reverting that commit and doing something else to fix the useless-checkpoint problem, I'm happy to let the subject rest, at least until we get some evidence of a real problem in the area. regards, tom lane
Re: [HACKERS] Avoiding adjacent checkpoint records
On Wed, Jun 6, 2012 at 4:24 PM, Tom Lane t...@sss.pgh.pa.us wrote: I felt (and still feel) that this was misguided. Looking at it again, I'm inclined to agree. The behavior was entirely correct up until somebody decided to emit a continuing stream of XLOG_RUNNING_XACTS WAL records even when the system is idle. Why did we not fix it by fixing that? That's exactly what I think we should have done. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Re: [HACKERS] creating objects in pg_catalog
Robert Haas robertmh...@gmail.com writes: Right now, you can't directly create a relation (table, index, composite type) in the pg_catalog schema, but you can create a non-relation (function, domain, etc.) in the pg_catalog schema. Surely this is true only for superusers. Superusers can do whatever they want anyway, no? Tabula rasa, I'd argue for getting tough on this, and error out on any attempt to get a user-created SQL object into pg_catalog by any means, unless allow_system_table_mods is set. allow_system_table_mods is mainly intended to prevent people from altering the schemas of system catalogs, since it's more than likely that the backend C code will fail (nastily) to cope with such changes. I don't think it follows that we should prevent superusers from doing things that are perfectly safe, like adding new functions in pg_catalog. It's very likely that the specific restrictions enforced by allow_system_table_mods could stand a fresh look, but I don't agree with the idea of radically changing its charter. Nor do I think we need to put training wheels on superusers for any changes that aren't demonstrably likely to result in unrecoverable database corruption. regards, tom lane
Re: [HACKERS] 9.2beta1, parallel queries, ReleasePredicateLocks, CheckForSerializableConflictIn in the oprofile
On Wed, 6 Jun 2012, Merlin Moncure wrote: I think this is the expected result. In the single user case the spinlock never spins and only has to make the cpu-locking cache instructions once. can we see results @24 threads? Here https://docs.google.com/open?id=0B7koR68V2nM1NDJHLUhNSS0zbUk * Sergey E. Koposov, PhD, Research Associate Institute of Astronomy, University of Cambridge Madingley road, CB3 0HA, Cambridge, UK Tel: +44-1223-337-551 Web: http://www.ast.cam.ac.uk/~koposov/
Re: [HACKERS] Avoiding adjacent checkpoint records
Robert Haas robertmh...@gmail.com writes: On Wed, Jun 6, 2012 at 4:24 PM, Tom Lane t...@sss.pgh.pa.us wrote: I felt (and still feel) that this was misguided. Looking at it again, I'm inclined to agree. The behavior was entirely correct up until somebody decided to emit a continuing stream of XLOG_RUNNING_XACTS WAL records even when the system is idle. Why did we not fix it by fixing that? That's exactly what I think we should have done. Actually, it looks like there is an extremely simple way to handle this, which is to move the call of LogStandbySnapshot (which generates the WAL record in question) to before the checkpoint's REDO pointer is set, but after we have decided that we need a checkpoint. That will result in later iterations not thinking that some work had happened while the checkpoint is in progress. It looks like we would need an extra release/reacquire of WALInsertLock to avoid holding that lock while doing LogStandbySnapshot, but that seems relatively negligible in comparison to the total cost of a checkpoint. There might be some still-better way to manage all this, but this one seems safe enough to consider as a post-beta patch. So I recommend we revert the change in the when-to-skip-checkpoint test in favor of reordering these operations. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] creating objects in pg_catalog
On Wed, Jun 6, 2012 at 4:39 PM, Tom Lane t...@sss.pgh.pa.us wrote: Robert Haas robertmh...@gmail.com writes: Right now, you can't directly create a relation (table, index, composite type) in the pg_catalog schema, but you can create a non-relation (function, domain, etc.) in the pg_catalog schema. Surely this is true only for superusers. Superusers can do whatever they want anyway, no? No. rhaas=# create table pg_catalog.tom (a int); ERROR: permission denied to create pg_catalog.tom DETAIL: System catalog modifications are currently disallowed. rhaas=# create table tom (a int); CREATE TABLE rhaas=# alter table tom set schema pg_catalog; ALTER TABLE rhaas=# create domain pg_catalog.lane as int; CREATE DOMAIN The offending error check is in heap_create(), and based on what you're saying here it seems like we should just rip it out. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Re: [HACKERS] creating objects in pg_catalog
Robert Haas robertmh...@gmail.com writes: rhaas=# create table pg_catalog.tom (a int); ERROR: permission denied to create pg_catalog.tom The offending error check is in heap_create(), and based on what you're saying here it seems like we should just rip it out. Hmm. Yeah, it seems like the regular permissions tests on the schemas in question should be enough to keep Joe User from making tables there, and I do not see a reason why the backend would care if there are non-catalog tables laying about in pg_catalog. Checking the commit history, it seems this was originally a test to prevent people from creating tables named pg_xxx: http://git.postgresql.org/gitweb/?p=postgresql.git;a=commitdiff;h=f5a10e722c052006886b678995695001958a#patch3 which may or may not have been critical once upon a time, but surely is not any more. So no objection to removing that particular test. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Inconsistency in libpq connection parameters, and extension thereof
On Wed, Jun 6, 2012 at 1:13 PM, Robert Haas robertmh...@gmail.com wrote: On Wed, Jun 6, 2012 at 4:08 PM, Magnus Hagander mag...@hagander.net wrote: However, not throwing errors on the URL syntax should be considered a bug, I think. +1. +1 Here's a patch that just makes the thing an error. Of course we could revert it if it makes the URI feature otherwise unusable...but I don't see a huge and terrible blocker ATM. A major question mark for me is any extra stuff in JDBC URLs. Nevertheless, here it is. -- fdr libpq-error-on-unknown-parameters-in-URIs-v1.patch
Re: [HACKERS] pg_receivexlog and feedback message
On Thu, Jun 7, 2012 at 5:05 AM, Magnus Hagander mag...@hagander.net wrote: On Wed, Jun 6, 2012 at 8:26 PM, Fujii Masao masao.fu...@gmail.com wrote: On Tue, Jun 5, 2012 at 11:44 PM, Magnus Hagander mag...@hagander.net wrote: On Tue, Jun 5, 2012 at 4:42 PM, Fujii Masao masao.fu...@gmail.com wrote: On Tue, Jun 5, 2012 at 9:53 PM, Magnus Hagander mag...@hagander.net wrote: Right now, pg_receivexlog sets: replymsg->write = InvalidXLogRecPtr; replymsg->flush = InvalidXLogRecPtr; replymsg->apply = InvalidXLogRecPtr; when it sends its status updates. I'm thinking it should set replymsg->write = blockpos instead. Why? That way you can see in pg_stat_replication what has actually been received by pg_receivexlog - not just what we last sent. This can be useful in combination with an archive_command that can block WAL recycling until it has been saved to the standby. And it would be useful as a general monitoring thing as well. I think the original reason was that it shouldn't interfere with synchronous replication - but it does take away a fairly useful usecase... I think that not only replymsg->write but also ->flush should be set to blockpos in pg_receivexlog. Which allows pg_receivexlog to behave as a synchronous standby, so we can write WAL to both local and remote synchronously. I believe there are some use cases for synchronous pg_receivexlog. pg_receivexlog doesn't currently fsync() after every write. It only fsync():s complete files. So we'd need to set ->flush only at the end of a segment, right? Yes. Currently the status update is sent for each status interval. In sync replication, a transaction has to wait for a while even after pg_receivexlog has written or flushed the WAL data. So we should add a new option which specifies whether pg_receivexlog sends the status packet back as soon as it writes or flushes the WAL data, like the walreceiver does? That might be useful, but I think that's 9.3 material at this point. Fair enough. That's a new feature rather than a bugfix.
But I think we can get the write location set as a bugfix. Also set the flush location? Sending the flush location back seems helpful when using pg_receivexlog for WAL archiving purposes. By seeing the flush location we can ensure that the WAL file has been archived durably (IOW, flushed in the remote archive area). Regards, -- Fujii Masao
Re: [HACKERS] Avoiding adjacent checkpoint records
I wrote: Actually, it looks like there is an extremely simple way to handle this, which is to move the call of LogStandbySnapshot (which generates the WAL record in question) to before the checkpoint's REDO pointer is set, but after we have decided that we need a checkpoint. On further contemplation, there is a downside to that idea, which probably explains why the code was written as it was: if we place the XLOG_RUNNING_XACTS WAL record emitted during a checkpoint before rather than after the checkpoint's REDO point, then a hot standby slave starting up from that checkpoint won't process the XLOG_RUNNING_XACTS record. That means its KnownAssignedXids machinery won't be fully operational until the master starts another checkpoint, which might be awhile. So this could result in undesirable delay in hot standby mode becoming active. I am not sure how significant this really is though. Comments? If we don't like that, I can think of a couple of other ways to get there, but they have their own downsides: * Instead of trying to detect after-the-fact whether any concurrent WAL activity happened during the last checkpoint, we could detect it during the checkpoint and then keep the info in a static variable in the checkpointer process until next time. However, I don't see any bulletproof way to do this without adding at least one or two lines of code within XLogInsert, which I'm sure Robert will complain about. * We could expand checkpoint records to contain two different REDO pointers, one to be used by hot standby slaves and one for normal crash recovery. (The LogStandbySnapshot records would appear between these two points; we'd still be moving them up to the start of the checkpoint sequence.) This is a relatively clean solution but would force pg_upgrade between beta2 and beta3, so that's not so nice. 
* Combining the two ideas, we could take the nominal REDO pointer, run LogStandbySnapshot, make a fresh note of where the insert point is (real REDO point, which is what we publish in shared memory for the bufmgr to compare LSNs to), complete the checkpoint, and write the checkpoint record using the nominal REDO pointer so that that's where any crash or HS slave starts from. But save the real REDO pointer in checkpointer static state, and in the next checkpoint use that rather than the nominal pointer to decide if anything's happened that would force a new checkpoint. I think this dodges both of the above complaints, but it feels pretty baroque. Thoughts, other ideas? regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] 9.3: load path to mitigate load penalty for checksums
On Wed, Jun 06, 2012 at 03:08:05PM -0400, Robert Haas wrote: On Mon, Jun 4, 2012 at 9:26 PM, Jeff Davis pg...@j-davis.com wrote: Thoughts? Simon already proposed a way of doing this that doesn't require explicit user action, which seems preferable to a method that does require explicit user action, even though it's a little harder to implement. His idea was to store the XID of the process creating the table in the pg_class row, which I think is *probably* better than your idea of having a process that waits and then flips the flag. There are some finicky details though - see previous thread for discussion of some of the issues. I think both improvements have a place. Loads to a new heap should take as many automatic shortcuts as practical. Loads to an existing heap also matter, and Jeff's proposal helps those. It would be very nice to have a method that detects whether or not there is only one open snapshot in a particular backend. Any time that condition is met, tuples written into a table created or truncated in the current transaction can be written with HEAP_XMIN_COMMITTED already set. Perhaps criteria like those in CheckTableNotInUse() would suffice?
Re: [HACKERS] 9.3: load path to mitigate load penalty for checksums
On Wed, 2012-06-06 at 15:08 -0400, Robert Haas wrote: On Mon, Jun 4, 2012 at 9:26 PM, Jeff Davis pg...@j-davis.com wrote: Thoughts? Simon already proposed a way of doing this that doesn't require explicit user action, which seems preferable to a method that does require explicit user action, even though it's a little harder to implement. His idea was to store the XID of the process creating the table in the pg_class row, which I think is *probably* better than your idea of having a process that waits and then flips the flag. There are some finicky details though - see previous thread for discussion of some of the issues. My goals include: * The ability to load into existing tables with existing data * The ability to load concurrently My understanding was that the proposal to which you're referring can't do those things, which seem like major limitations. Did I miss something? In many cases it would also be nice to write the tuples pre-frozen, so I think we should look for a design that will support that. You're right, that would be nice. Regards, Jeff Davis
Re: [HACKERS] Time for pgindent run?
On Tue, Jun 5, 2012 at 10:25 AM, Bruce Momjian br...@momjian.us wrote: On Tue, Jun 05, 2012 at 10:21:14AM -0400, Tom Lane wrote: Bruce Momjian br...@momjian.us writes: Is everyone ready for me to run pgindent? We are nearing the first commit-fest (June 15) and will have to branch the git tree soon. Also, we should do the pgindent run well before the commitfest, so that authors of pending patches have time to rebase their patches in case pgindent changes the code they are patching ... Ah, good point. That will affect commit-fest patches. We could run it only on the 9.3 branch, but that makes double-patching very hard. Is everyone good for a pgindent run this week? The sooner the better. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Re: [HACKERS] 9.3: load path to mitigate load penalty for checksums
On Mon, Jun 04, 2012 at 06:26:04PM -0700, Jeff Davis wrote: I propose a special user-initiated loading mode at the table granularity. During this time, readers must ignore PD_ALL_VISIBLE, HEAP_XMIN_COMMITTED, and the visibility map entirely. However, writers may set all of those bits before the writing transaction commits, obviating the need to rewrite (and WAL) the data again later. Ideally, there would be no work for VACUUM to do after the data load (unless a transaction aborted). Note that, currently, only VACUUM sets PD_ALL_VISIBLE and visibility map bits. Would you make something else like heap_multi_insert() be able to do so? Goals: = * Table granularity (doesn't affect other tables at all) * Allows concurrent loaders * Allows loading into existing tables with existing data * Online (allow reads to proceed, even if degraded) +1 Obviously, readers and writers would need a mechanism to honor those flags, but I haven't dug into the details yet (additional routines in tqual.c?). Avoiding measurable overhead in tuple visibility checks when the feature is inactive may well prove to be a key implementation challenge. FINALIZE LOAD would first move from state 2 to state 3 by acquiring a ShareUpdateExclusiveLock on the table setting optimistichints = false. Then, it would move from state 3 to state 0 by first waiting for all transactions that currently hold a lock on the table, to ensure they see the optimistichints=false flag. This is certainly necessary, but ... Then, it would remember the current xid as max_loader_xid, and wait until the global xmin is greater than max_loader_xid. This should ensure that all snapshots regard all loading transactions as complete. ... this might not be. Each backend could decide, based on its own xmin, whether to ignore PD_ALL_VISIBLE in a given table. In other words, your ignorehints flag could be an xmin set to InvalidTransactionId during stages 1 and 2 and to the earliest safe xmin during stages 0 and 3. 
* INITIATE and FINALIZE probably need to use PreventTransactionChain() and multiple transactions, to avoid holding the ShareUpdateExclusiveLock for too long. Also, we want to keep people from using it in the same transaction as the loading xact, because they might not realize that they would get a concurrency of 1 that way (because of the ShareUpdateExclusiveLock). Yes. You need to commit the transaction modifying pg_class so other backends can observe the change, at which point you can gather the list to wait on. Consider splitting the INITIATE UI into two interfaces, one that transitions from state 0 to state 1 and another that expects state 1 and blocks until we reach state 2. You then have no need for PreventTransactionChain(), and the interfaces could even be normal functions. It's less clear how reasonably you could do this for the FINALIZE step, given its implicit VACUUM. It could be achieved by having the user do the VACUUM and making the new interface merely throw an error if a VACUUM is still needed. The trivial usage pattern might look like this: SELECT pg_initiate_load('bigtbl'); SELECT pg_wait_load('bigtbl'); -- not a great name COPY bigtbl FROM STDIN; SELECT pg_stop_load('bigtbl'); VACUUM bigtbl; SELECT pg_finalize_load('bigtbl'); It's definitely less elegant, alas. Perhaps offer the interface you've proposed and have it do the above under the hood. That way, users with complex needs have the flexibility of the lower-level interfaces while those who can tolerate PreventTransactionChain() have simplicity. Thanks, nm -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Early hint bit setting
On 5/30/12 4:42 PM, Ants Aasma wrote: I was thinking about what is the earliest time where we could set hint bits. This would be just after the commit has been made visible. Except that's only true when there are no other transactions running. That's been one of the big sticking points about trying to proactively set hint bits; in a real system you're not going to gain very much unless you wait a while before setting them. An interesting option might be to keep the first XID that dirtied a page and loop through all pages in the background looking for pages where first_dirty_xid is older than the oldest running XID. Those pages would have hint bits that could be set. While scanning the page you would want to set first_dirty_xid to the oldest XID that could not be hinted. This is a modification of the idea to set hint bits when a page is on its way out of the buffer; the advantage here is that it would also handle pages that are too hot to leave the buffer. -- Jim C. Nasby, Database Architect j...@nasby.net 512.569.9461 (cell) http://jim.nasby.net
Re: [HACKERS] How could we make it simple to access the log as a table?
On 5/28/12 2:55 PM, Robert Haas wrote: As far as CSV goes, I think the biggest deficiency is that there's a mismatch between the way that log files are typically named (e.g. one per day, or one per hour) and the way that a CSV foreign table is created (you've got to point it at one particular file). Maybe we could have a CSV reader that understands PostgreSQL-format CSV logs, but you point it at a directory, rather than a single file, and it reads all the CSV files in the directory. And maybe it could also be smart enough that if you've got a WHERE clause that filters by date, it uses that to skip any files that can be proven irrelevant. So the user can just turn on CSV logging, point the FDW at the log directory, and away they go. The idea is a really good one, except for one thing: this pattern is undoubtedly not unique to PG CSV logs. ISTM it would be extremely useful to have a generic ability to define a filename pattern and have all files matching that pattern be pulled in via FDW; possibly by treating each file as a partition, possibly complete with exclusion constraints. ISTM that what's really needed here are a bunch of separate improvements to our logging and/or FDW, each of which will add usefulness. -- Jim C. Nasby, Database Architect j...@nasby.net 512.569.9461 (cell) http://jim.nasby.net
[HACKERS] Could we replace SysV semaphores with latches?
There has been regular griping in this list about our dependence on SysV shared memory, but not so much about SysV semaphores, even though the latter cause their fair share of issues; as seen for example in buildfarm member spoonbill's recent string of failures: creating template1 database in /home/pgbuild/pgbuildfarm/HEAD/pgsql.25563/src/test/regress/./tmp_check/data/base/1 ... FATAL: could not create semaphores: No space left on device DETAIL: Failed system call was semget(1, 17, 03600). HINT: This error does *not* mean that you have run out of disk space. It occurs when either the system limit for the maximum number of semaphore sets (SEMMNI), or the system wide maximum number of semaphores (SEMMNS), would be exceeded. You need to raise the respective kernel parameter. Alternatively, reduce PostgreSQL's consumption of semaphores by reducing its max_connections parameter. The PostgreSQL documentation contains more information about configuring your system for PostgreSQL. child process exited with exit code 1 It strikes me that we have recently put together an independent but just about equivalent waiting mechanism in the form of latches. And not only that, but there's already a latch for each process. Could we replace our usages of SysV semaphores with WaitLatch on the procLatch? Unlike the situation with shared memory where we need some secondary features (mumble shm_nattch mumble), I think we aren't really using anything interesting about SysV semaphores except for the raw ability to wait for somebody to signal us. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers