Re: [HACKERS] Condition to become the standby mode.
Hi, On 2013-07-26 13:19:34 +0900, Sawada Masahiko wrote: When the slave server starts, the slave server perform the following steps in StartupXLOG(): 1. Read latest CheckPoint record LSN from pg_control file. 2. Try to fetch CheckPoint record from pg_xlog directory at first. ( The server try to read up to prior CheckPoint record from pg_contrl file) 3. If there is not in pg_xlog, the slave server requests CheckPoint record to the primary server. in #3, it works only when StandbyMode is true. For StandbyMode is to true, database cluster state should be DB_SHUTDOWNED (it is one of the conditions). that is, slave server can try to fetch checkpoint record from the master server after slave server was successful completion. It also will fetch the xlog from remote when we've found a backup label (the read_backup_label() branch in StartupXLOG() will set StandbyMode to true). Which will be the case if we're recovering from something like a base backup. The reason we don't fetch from remote if there's no backup label *and* the last checkpoint wasn't a shutdown checkpoint is that that's doing *local* crash recovery. Usually it will happen because we needed to restart after a crash (or immediate restart which basically is the same). There's one valid case where you can get into the situation otherwise as well, which is that the *entire* database directory has been copied via an atomic snapshot. But in that case you *need* to also snapshot pg_xlog. Maybe there's another valid scenario? If this problem is solved, there is possible of that we can failback by removing the all WAL record which is in pg_xlog before server starts as the slave server. ( And we also use synchronous_transfer which I'm proposing, I think we can fail-back without taking full backup surely) I still have *massive* doubts about the concept. But anyway, if you want to do so, you should generate a backup label that specifies the startup location. Greetings, Andres Freund -- Andres Freund http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] inconsistent state after crash recovery
Hi, On 2013-07-26 13:33:13 +0900, Satoshi Nagayasu wrote: I received a question about inconsistent state after crash recovery. When a table file is broken (or just lost), PostgreSQL can not recover a whole table, and does not show any notice while recoverying. I think it means inconsistent state. (1) create a table, and fill records. (2) process a checkpoint. (3) fill more records. (4) force a crash, and delete the table file. (5) run recovery on restarting. (6) only records after the checkpoint can be recoverd. For example, the attached log shows that PostgreSQL can recover only 1058 records in the table which contains 2000 records before the crash, and does not tell anything in the server log. Is this expected or acceptable? I'd say it's both. WAL replay doesn't have the knowledge to detect that in all too many cases. Nearly always a page's contents will be restored by a full page image the first time it gets changed so they will individually look completely normal. Greetings, Andres Freund -- Andres Freund http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] comment for fast promote
On Fri, Jul 26, 2013 at 11:19 AM, Tomonari Katsumata katsumata.tomon...@po.ntts.co.jp wrote: Hi Fujii-san, Thank you for response. (2013/07/25 21:15), Fujii Masao wrote: On Thu, Jul 25, 2013 at 5:33 PM, Tomonari Katsumata katsumata.tomon...@po.ntts.co.jp wrote: Hi, Now I'm seeing xlog.c in 93_stable for studying fast promote, and I have a question. I think it has an extra unlink command for promote file. (on 9937 line) --- 9934 if (stat(FAST_PROMOTE_SIGNAL_FILE, stat_buf) == 0) 9935 { 9936 unlink(FAST_PROMOTE_SIGNAL_FILE); 9937 unlink(PROMOTE_SIGNAL_FILE); 9938 fast_promote = true; 9939 } --- Is this command necesary ? Yes, it prevents PROMOTE_SIGNAL_FILE from remaining even if both promote files exist. The command(unlink(PROMOTE_SIGNAL_FILE)) here is for unusualy case. Because the case is when done both procedures below. - user create promote file on PGDATA - user issue pg_ctl promote I understand the reason. But I think it's better to unlink(PROMOTE_SIGNAL_FILE) before unlink(FAST_PROMOTE_SIGNAL_FILE). Because FAST_PROMOTE_SIGNAL_FILE is definetly there but PROMOTE_SIGNAL_FILE is sometimes there or not there. I could not understand why that's better. Could you elaborate that? And I have another question linking this behavior. I think TriggerFile should be removed too. This is corner-case but it will happen. How do you think of it ? I don't have strong opinion about that. I've never heard the complaint about that current behavior so far. One question is that: we really still need to support normal promote? pg_ctl promote provides only way to do fast promotion. If we want to do normal promotion, we need to create PROMOTE_SIGNAL_FILE and send the SIGUSR1 signal to postmaster by hand. This seems messy. I think that we should remove normal promotion at all, or change pg_ctl promote so that provides also the way to do normal promotion. I think he merit of fast promote is - allowing quick connection by skipping checkpoint and its demerit is - taking little bit longer when crash-recovery If it is seldom to happen its crash soon after promoting and fast promte never breaks consistency of database cluster, I think we don't need normal promotion. You can execute checkpoint after fast promotion for that. Regards, -- Fujii Masao -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Patch to add support of IF NOT EXISTS to others CREATE statements
W dniu 26.07.2013 02:44, Fabrízio de Royes Mello pisze: Should be... I fix that in attached patch. Hello, as I can see there are more inconsistent places. First style: OperatorCreate --- Second style: ProcedureCreate TypeCreate DefineTSParser DefineType DefineEnum --- Third style: CreateCast DefineTSDictionary DefineTSTemplate DefineTSConfiguration DefineRange DefineCompositeType Regards, Karol -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] dynamic background workers, round two
On 2013-07-25 12:35:30 -0400, Robert Haas wrote: On Wed, Jul 24, 2013 at 5:34 PM, Andres Freund and...@2ndquadrant.com wrote: This seems like a sensible idea to me. But, in the context of dynamic query, don't we also need the reverse infrastructure of notifying a bgworker that the client, that requested it to be started, has died? Ending up with a dozen bgworkers running because the normal backend FATALed doesn't seem nice. I think that will be desirable for many, although not all, uses of background workers. For example, you might want to be able to do something like SELECT pg_background('CREATE INDEX CONCURRENTLY ...') and then exit your session and have the background worker continue to run. Or maybe somebody writes a trigger that starts a background worker (if there's not one already working) and queues up work for the trigger, and the background worker continues processing the queue until it is empty, and then exits. But for parallel query specifically, yeah, we'll probably want the untimely demise of either the original backend or any of its workers to result in the death of all the workers and an abort of the parent's transaction. However, there's no need for the postmaster to be involved in any of that. For example, once the worker process is up and running and the parent has its PID, it can use an on_shmem_exit hook or a PG_ENSURE_ERROR_CLEANUP() block to send SIGTERM to any still-living workers. It doesn't need to be the postmaster, but I think we need to provide central infrastructure for that. I don't want this to end up being redone poorly in multiple places. I just wanted to mention it, it obviously doesn't need to be implemented at the same time. +#define InvalidPid (-1) + That value strikes me as a rather bad idea. As a pid that represents init's processgroup. I.e. kill(InvalidPid) will try to kill init... I'd rather not spread that outside async.c. Well, if we ever try to kill(InvalidPid), that's a PostgreSQL bug. And if the postgres process has permission to send signals to init, that's an OS bug (or a PostgreSQL bug, since we refuse to run as root). So I'm not very worried about it. I don't think somebody mistakenly kill()ing somebody with -1 is all too likely, but it's a valid value to pass. That's why I dislike using it as constant for an invalid value. I've got two functions that need distinguishable return values for the not-yet-started case and the already-dead case, neither of which can be real PIDs. If we don't use 0 and -1, the alternative is presumably to complicate the calling signature with another out parameter. That does not seem like a net improvement. I actually think those cases would be better served by an error code mechanism which is extensible. +/* + * Report the PID of a newly-launched background worker in shared memory. + * + * This function should only be called from the postmaster. + */ +void +ReportBackgroundWorkerPID(RegisteredBgWorker *rw) +{ + BackgroundWorkerSlot *slot; + + Assert(rw-rw_shmem_slot max_worker_processes); + slot = BackgroundWorkerData-slot[rw-rw_shmem_slot]; + slot-pid = rw-rw_pid; + + if (rw-rw_worker.bgw_notify_pid != 0) + kill(rw-rw_worker.bgw_notify_pid, SIGUSR1); +} Any reason not to use SendProcSignal() properly here? Just that you don't know the BackendId? I seems unclean to reuse SIGUSR1 without using the infrastructure built for that. We're in the postmaster here. The postmaster currently uses kill directly in all cases, rather than SendProcSignal(), and I'd rather not change that. Any code that the postmaster calls has got to be safe against arbitrary shared memory corruption, and I'd rather keep that footprint as small as possible. For example, if we added bgw_backend_id, the postmaster would have to bounds check it to make sure it didn't seg fault. The more of that kind of stuff we add, the more chances there are to destabilize the postmaster. I'd like to keep it as minimal as possible. I can see the point in the argument, but it still seems to be grotty architecturally. + for (;;) + { + pid = GetBackgroundWorkerPid(handle); + if (pid != InvalidPid) + break; + rc = WaitLatch(MyProc-procLatch, +WL_LATCH_SET | WL_POSTMASTER_DEATH, 0); + if (rc WL_POSTMASTER_DEATH) + return InvalidPid; This basically ignores postmaster death. With unix_latch.c that shouldn't be a problem because afaics it's implementation correctly should report an error in the next WaitLatch without waiting as well. I am not sure about the win32 implementation, is that ok there as well? I don't understand what you mean by this. Why does it
Re: [HACKERS] Design proposal: fsync absorb linear slider
On 7/25/13 6:02 PM, didier wrote: It was surely already discussed but why isn't postresql writing sequentially its cache in a temporary file? If you do that, reads of the data will have to traverse that temporary file to assemble their data. You'll make every later reader pay the random I/O penalty that's being avoided right now. Checkpoints are already postponing these random writes as long as possible. You have to take care of them eventually though. -- Greg Smith 2ndQuadrant USg...@2ndquadrant.com Baltimore, MD PostgreSQL Training, Services, and 24x7 Support www.2ndQuadrant.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Design proposal: fsync absorb linear slider
On 07/26/2013 11:42 AM, Greg Smith wrote: On 7/25/13 6:02 PM, didier wrote: It was surely already discussed but why isn't postresql writing sequentially its cache in a temporary file? If you do that, reads of the data will have to traverse that temporary file to assemble their data. You'll make every later reader pay the random I/O penalty that's being avoided right now. Checkpoints are already postponing these random writes as long as possible. You have to take care of them eventually though. Well, SSD disks do it in the way proposed by didier (AFAIK), by putting random fs pages on one large disk page and having an extra index layer for resolving random-to-sequential ordering. I would not dismiss the idea without more tests and discussion. We could have a system where checkpoint does sequential writes of dirty wal buffers to alternating synced holding files (a checkpoint log :) ) and only background writer does random writes with no forced sync at all -- Hannu Krosing PostgreSQL Consultant Performance, Scalability and High Availability 2ndQuadrant Nordic OÜ -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Design proposal: fsync absorb linear slider
On 07/26/2013 11:42 AM, Greg Smith wrote: On 7/25/13 6:02 PM, didier wrote: It was surely already discussed but why isn't postresql writing sequentially its cache in a temporary file? If you do that, reads of the data will have to traverse that temporary file to assemble their data. In case of crash recovery, a sequential reading of this file could be performed as first step. this should work fairly well in most cases, at least when the recovery shared_buffers is not smaller than the latest run of checkpoint-written dirty buffers. -- Hannu Krosing PostgreSQL Consultant Performance, Scalability and High Availability 2ndQuadrant Nordic OÜ -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] install libpq.dll in bin directory on Windows / Cygwin
Il 7/25/2013 11:02 PM, Alvaro Herrera ha scritto: Andrew Dunstan wrote: Jeff Janes asked me about this, and Bruce just tripped up on it. Usually on Windows it's necessary to have libpq.dll/cygpq.dll either in the PATH or in the same directory as client .exe files. The buildfarm client has for many years simply copied this dll from the installation lib to the installation bin directory after running make install. But I can't really see why we don't do that as part of make install anyway. I haven't tested but I think something like this patch would achieve this goal - it would fix something that's tripped a lot of people up over the years. Seems a reasonable workaround for a silly platform bug. Do you need to patch the MSVC stuff as well? Comments? If we do this, should it be backported? To 9.3, sure, but not further back, as there's probably little point. on cygwin no need to go before 9.3 . We already moved them during the package build as workaround. +ifneq (,$findstring($(PORTNAME), win32 cygwin)) + $(INSTALL_DATA) $(shlib) '$(DESTDIR)$(bindir)/$(shlib)' +endif I wonder if someday we will create a win64 $(PORTNAME) value, or something like that, and then we'll have to update the port list on which these hacks are applied all over the place. Not complaining about this patch in particular, just an idle thought. Andrew, are you planning to include also http://www.postgresql.org/message-id/e1uqoyt-0007qc...@wrigleys.postgresql.org http://www.postgresql.org/message-id/51b59794.3000...@gmail.com to get rid of dllwrap ? Regards Marco -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Design proposal: fsync absorb linear slider
On 7/26/13 5:59 AM, Hannu Krosing wrote: Well, SSD disks do it in the way proposed by didier (AFAIK), by putting random fs pages on one large disk page and having an extra index layer for resolving random-to-sequential ordering. If your solution to avoiding random writes now is to do sequential ones into a buffer, you'll pay for it by having more expensive random reads later. In the SSD write buffer case, that works only because those random reads are very cheap. Do the same thing on a regular drive, and you'll be paying a painful penalty *every* time you read in return for saving work *once* when you write. That only makes sense when your workload is near write-only. It's possible to improve on this situation by having some sort of background process that goes back and cleans up the random data, converting it back into sequentially ordered writes again. SSD controllers also have firmware that does this sort of work, and Postgres might do it as part of vacuum cleanup. But note that such work faces exactly the same problems as writing the data out in the first place. -- Greg Smith 2ndQuadrant USg...@2ndquadrant.com Baltimore, MD PostgreSQL Training, Services, and 24x7 Support www.2ndQuadrant.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: ALTER SYSTEM SET command to change postgresql.conf parameters (RE: [HACKERS] Proposal for Allow postgresql.conf values to be changed via SQL [review])
On Friday, July 26, 2013 10:35 AM Alvaro Herrera wrote: Josh Berkus escribió: On 07/25/2013 02:02 PM, Tom Lane wrote: Robert Haas robertmh...@gmail.com writes: My thought is that people might put postgresql.conf in a directory that only contains configuration files and isn't writeable by the postgres user. So I would expect to find postgresql.auto.conf in the data directory always. Yeah, exactly. I think putting it anywhere but $PGDATA is a bad idea, and a sysadmin who thinks he knows better probably doesn't. Please see Greg Smith's fairly strong arguments for putting it in the config/ directory. As far as I see, there are two argument he makes: 1. We ought to have conf.d/ (not config/) so that tools have a way to deploy snippets (e.g. pgtune) 2. we ought to have all the config files together so that versioning tools (Puppet) can just say keep all files within directory X versioned and not have to care about specific file names, etc. I can buy (1), because that's a pretty common design for daemons nowadays. But I think that's its own patch, and there's no reason that this patch should be messing with this. I don't care all that much about (2), but I have no problem with doing that. So we could have two patches, first one that introduces a conf.d subdir that's automatically parsed after postgresql.conf, and another one that implements ALTER SYSTEM by using a file within conf.d. The reason I say we need a separate patch for conf.d is that I think it'd be easier to argue about it in isolation, than having it be entangled with ALTER SYSTEM stuff. The main contention point I see is where conf.d lives; the two options are in $PGDATA or together with postgresql.conf. Tom and Robert, above, say it should be in $PGDATA; but this goes against Debian packaging and the Linux FHS (or whatever that thing is called). Sure, I think we can split into 2 patches, but I doubt it will make it any easier to get this feature completed. The contention point can delay the first patch (automatic parse of conf.d) which can lead to further delay of ALTER SYSTEM SET patch and that will eventually lead to its death. With Regards, Amit Kapila. -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] inconsistent state after crash recovery
Andres Freund and...@2ndquadrant.com writes: On 2013-07-26 13:33:13 +0900, Satoshi Nagayasu wrote: Is this expected or acceptable? I'd say it's both. Postgres is built on the assumption that the underlying filesystem is reliable, ie, once you've successfully fsync'd some data that data won't disappear. If the filesystem fails to honor that contract, it's a filesystem bug not a Postgres bug. Nor is it reasonable to expect Postgres to be able to detect every such violation. As an example, would you expect crash recovery to notice the disappearance of a file that was touched nowhere in the replayed actions? regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Design proposal: fsync absorb linear slider
Greg Smith g...@2ndquadrant.com writes: On 7/26/13 5:59 AM, Hannu Krosing wrote: Well, SSD disks do it in the way proposed by didier (AFAIK), by putting random fs pages on one large disk page and having an extra index layer for resolving random-to-sequential ordering. If your solution to avoiding random writes now is to do sequential ones into a buffer, you'll pay for it by having more expensive random reads later. What I'd point out is that that is exactly what WAL does for us, ie convert a bunch of random writes into sequential writes. But sooner or later you have to put the data where it belongs. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: ALTER SYSTEM SET command to change postgresql.conf parameters (RE: [HACKERS] Proposal for Allow postgresql.conf values to be changed via SQL [review])
Alvaro Herrera alvhe...@2ndquadrant.com writes: The main contention point I see is where conf.d lives; the two options are in $PGDATA or together with postgresql.conf. Tom and Robert, above, say it should be in $PGDATA; but this goes against Debian packaging and the Linux FHS (or whatever that thing is called). Ordinarily, if postgresql.conf is not in $PGDATA, it will be somewhere that the postmaster does not (and should not) have write permissions for. I have no objection to inventiny a conf.d subdirectory, I just say that it must be under $PGDATA. The argument that this is against FHS is utter nonsense, because anything we write there is not static configuration, it's just data. Come to think of it, maybe part of the reason we're having such a hard time getting to consensus is that people are conflating the snippet part with the writable part? I mean, if you are thinking you want system-management tools to be able to drop in configuration fragments as separate files, there's a case to be made for a conf.d subdirectory that lives somewhere that the postmaster can't necessarily write. We just mustn't confuse that with support for ALTER SYSTEM SET. I strongly believe that ALTER SYSTEM SET must not be designed to write anywhere outside $PGDATA. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] getting rid of SnapshotNow
Robert Haas robertmh...@gmail.com writes: OK. I've taken care of all remaining uses of SnapshotNow in the code base. I think we can go ahead and remove it, now. Patch attached. (And there was, hopefully, much rejoicing.) What about SnapshotSelf? regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] dynamic background workers, round two
On Fri, Jul 26, 2013 at 5:34 AM, Andres Freund and...@2ndquadrant.com wrote: It doesn't need to be the postmaster, but I think we need to provide central infrastructure for that. I don't want this to end up being redone poorly in multiple places. I just wanted to mention it, it obviously doesn't need to be implemented at the same time. OK, makes sense. I'm still working out a design for this part of it; I think I want to get the dynamic shared memory stuff working first. +#define InvalidPid (-1) + That value strikes me as a rather bad idea. As a pid that represents init's processgroup. I.e. kill(InvalidPid) will try to kill init... I'd rather not spread that outside async.c. Well, if we ever try to kill(InvalidPid), that's a PostgreSQL bug. And if the postgres process has permission to send signals to init, that's an OS bug (or a PostgreSQL bug, since we refuse to run as root). So I'm not very worried about it. I don't think somebody mistakenly kill()ing somebody with -1 is all too likely, but it's a valid value to pass. That's why I dislike using it as constant for an invalid value. Well, so is any negative number, but we should never be trying to kill process groups. I've got two functions that need distinguishable return values for the not-yet-started case and the already-dead case, neither of which can be real PIDs. If we don't use 0 and -1, the alternative is presumably to complicate the calling signature with another out parameter. That does not seem like a net improvement. I actually think those cases would be better served by an error code mechanism which is extensible. Hmm. What signature would you suggest? I can see the point in the argument, but it still seems to be grotty architecturally. Suck it up. :-) I considered the idea of using SIGUSR2, but it didn't seem worth it. I don't understand what you mean by this. Why does it ignore postmaster death? The other callers that check for WL_POSTMASTER_DEATH actually perform drastic measures like exit()ing upon postmaster. Here you just return. With the unix latch implementation it seems to be guaranteed that the next WaitLatch() done somewhere else will also return WL_POSTMASTER_DEATH. I am not sure whether it might actually consume the death in the windows implementation. Well, IMHO, every backend should exit as quick as possible upon discovering that the postmaster has died. However, for complex political reasons (ah-Tom-Lane-choo!), our policy is to have the auxiliary processes exit and regular backends soldier on. This seems completely stupid to me, but it's not this patch's job to reverse that policy decision. To put that another way, one could equally well ask why WaitLatch(), or for that matter PostmasterIsAlive(), doesn't contain ereport(FATAL) upon discovering that the postmaster has given up the ghost. And the answer is that it's the job of those functions to provide information, not make policy decisions. I don't have a problem with the restriction, but I'd like to see a check against it. Maybe check for MyBackendId != InvalidBackendId in RegisterDynamicBackgroundWorker()? That would also prevent starting further bgworkers before BackgroundWorkerInitializeConnection() is done in a connected bgworker which seems to be a good thing. Well, that's easy enough to fix. Should we Assert() or elog() or what? -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] getting rid of SnapshotNow
On 2013-07-26 08:49:38 -0400, Tom Lane wrote: Robert Haas robertmh...@gmail.com writes: OK. I've taken care of all remaining uses of SnapshotNow in the code base. I think we can go ahead and remove it, now. Patch attached. (And there was, hopefully, much rejoicing.) What about SnapshotSelf? I thought about that yesterday and I think we should replace the usages which aren't easily replaceable (constraint stuff) with an mvcc snapshot, just one treats our transaction's current CommandId as visible. That should be doable? Greetings, Andres Freund -- Andres Freund http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] getting rid of SnapshotNow
On 2013-07-25 19:24:53 -0400, Robert Haas wrote: (And there was, hopefully, much rejoicing.) Definitely! Thanks. Greetings, Andres Freund -- Andres Freund http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] getting rid of SnapshotNow
On Fri, Jul 26, 2013 at 8:49 AM, Tom Lane t...@sss.pgh.pa.us wrote: Robert Haas robertmh...@gmail.com writes: OK. I've taken care of all remaining uses of SnapshotNow in the code base. I think we can go ahead and remove it, now. Patch attached. (And there was, hopefully, much rejoicing.) What about SnapshotSelf? Well, that's still used in _bt_check_unique, unique_key_recheck (trigger function to do a deferred uniqueness check), RI_FKey_check, and rather extensively by sepgsql. I don't really have much desire to do the work to get rid of it, though. Getting rid of SnapshotNow is arguably important on the grounds that third-party code may be using it, and doing this will force them to do it the new way instead, and that's got some value. I'm not sure if anything's already been committed that relies on MVCC catalog access, but several things have certainly been proposed and it's a good bet that 9.4 will rely on the catalog access using MVCC-semantics, so forcing third-party code to stop using SnapshotNow will prevent subtle bugs. But there's no similar joy for SnapshotSelf. You can argue that all of the things that we're doing with it are crufty, but nobody's complaining about any of them, and some of them are in places where the cost of an additional MVCC snapshot on every iteration might be much more serious than anything we ever saw for catalog scans. So I'm personally content to leave it well enough alone. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Extensions makefiles - coverage
On Thu, Jul 25, 2013 at 11:07 AM, Ronan Dunklau rdunk...@gmail.com wrote: Hello. I was having trouble figuring how to use the coverage targets when using an extension. I am using approximatively the layout that was proposed here: http://www.postgresql.org/message-id/51bb1b6e.2070...@dunslane.net It looks like everything is hard-coded to take the source and the gcda, gcno files in the base directory, but these files lay in a src directory with the proposed layout. It may be better to base the .gcda file discovery on the OBJS variables when using PGXS. Please find attached a small patch that implements this. There is probably a better way than the redundant rm $(gcda_files) / rm *.gcda to cleanup the generated files. With the attached patch, the following targets seem to have the same behaviour as on the current HEAD, both on the whole tree and on individual contrib modules: - coverage-html - clean - coverage-clean - clean-coverage I noticed that make clean leaves gcda and gcov files on the current HEAD, and this is no different with the given patch. I also tested it against several pgxn extensions, and it seems to work fine. You can ensure that your patch doesn't get forgotten about by adding it here: https://commitfest.postgresql.org/action/commitfest_view/open -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Design proposal: fsync absorb linear slider
Hi, On Fri, Jul 26, 2013 at 11:42 AM, Greg Smith g...@2ndquadrant.com wrote: On 7/25/13 6:02 PM, didier wrote: It was surely already discussed but why isn't postresql writing sequentially its cache in a temporary file? If you do that, reads of the data will have to traverse that temporary file to assemble their data. You'll make every later reader pay the random I/O penalty that's being avoided right now. Checkpoints are already postponing these random writes as long as possible. You have to take care of them eventually though. No the log file is only used at recovery time. in check point code: - loop over cache, marks dirty buffers with BM_CHECKPOINT_NEEDED as in current code - other workers can't write and evicted these marked buffers to disk, there's a race with fsync. - check point fsync now or after the next step. - check point loop again save to log these buffers, clear BM_CHECKPOINT_NEEDED but *doesn't* clear BM_DIRTY, of course many buffers will be written again, as they are when check point isn't running. - check point done. During recovery you have to load the log in cache first before applying WAL. Didier
Re: [HACKERS] getting rid of SnapshotNow
Robert Haas robertmh...@gmail.com writes: On Fri, Jul 26, 2013 at 8:49 AM, Tom Lane t...@sss.pgh.pa.us wrote: What about SnapshotSelf? Well, that's still used in _bt_check_unique, unique_key_recheck (trigger function to do a deferred uniqueness check), RI_FKey_check, and rather extensively by sepgsql. I don't really have much desire to do the work to get rid of it, though. Hm. I agree the first three may be all right, but I can't help suspecting that sepgsql is doing the wrong thing here. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Extensions makefiles - coverage
Thank you for the tip, its done. 2013/7/26 Robert Haas robertmh...@gmail.com: On Thu, Jul 25, 2013 at 11:07 AM, Ronan Dunklau rdunk...@gmail.com wrote: Hello. I was having trouble figuring how to use the coverage targets when using an extension. I am using approximatively the layout that was proposed here: http://www.postgresql.org/message-id/51bb1b6e.2070...@dunslane.net It looks like everything is hard-coded to take the source and the gcda, gcno files in the base directory, but these files lay in a src directory with the proposed layout. It may be better to base the .gcda file discovery on the OBJS variables when using PGXS. Please find attached a small patch that implements this. There is probably a better way than the redundant rm $(gcda_files) / rm *.gcda to cleanup the generated files. With the attached patch, the following targets seem to have the same behaviour as on the current HEAD, both on the whole tree and on individual contrib modules: - coverage-html - clean - coverage-clean - clean-coverage I noticed that make clean leaves gcda and gcov files on the current HEAD, and this is no different with the given patch. I also tested it against several pgxn extensions, and it seems to work fine. You can ensure that your patch doesn't get forgotten about by adding it here: https://commitfest.postgresql.org/action/commitfest_view/open -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Design proposal: fsync absorb linear slider
On 7/26/13 9:14 AM, didier wrote: During recovery you have to load the log in cache first before applying WAL. Checkpoints exist to bound recovery time after a crash. That is their only purpose. What you're suggesting moves a lot of work into the recovery path, which will slow down how long it takes to process. More work at recovery time means someone who uses the default of checkpoint_timeout='5 minutes', expecting that crash recovery won't take very long, will discover it does take a longer time now. They'll be forced to shrink the value to get the same recovery time as they do currently. You might need to make checkpoint_timeout 3 minutes instead, if crash recovery now has all this extra work to deal with. And when the time between checkpoints drops, it will slow the fundamental efficiency of checkpoint processing down. You will end up writing out more data in the end. The interval between checkpoints and recovery time are all related. If you let any one side of the current requirements slip, it makes the rest easier to deal with. Those are all trade-offs though, not improvements. And this particular one is already an option. If you want less checkpoint I/O per capita and don't care about recovery time, you don't need a code change to get it. Just make checkpoint_timeout huge. A lot of checkpoint I/O issues go away if you only do a checkpoint per hour, because instead of random writes you're getting sequential ones to the WAL. But when you crash, expect to be down for a significant chunk of an hour, as you go back to sort out all of the work postponed before. -- Greg Smith 2ndQuadrant USg...@2ndquadrant.com Baltimore, MD PostgreSQL Training, Services, and 24x7 Support www.2ndQuadrant.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] getting rid of SnapshotNow
On Fri, Jul 26, 2013 at 9:18 AM, Tom Lane t...@sss.pgh.pa.us wrote: Robert Haas robertmh...@gmail.com writes: On Fri, Jul 26, 2013 at 8:49 AM, Tom Lane t...@sss.pgh.pa.us wrote: What about SnapshotSelf? Well, that's still used in _bt_check_unique, unique_key_recheck (trigger function to do a deferred uniqueness check), RI_FKey_check, and rather extensively by sepgsql. I don't really have much desire to do the work to get rid of it, though. Hm. I agree the first three may be all right, but I can't help suspecting that sepgsql is doing the wrong thing here. sepgsql is using SnapshotSelf to find the old version of a tuple that was updated by the core code just before. That should be safe in the sense that there can't be a currently-committing transaction somewhere else that's updated that tuple, if we know that our own uncommitted transaction has done a transactional update. There was a recent thread discussing whether another API might be better, and I'd be prepared to concede that it might be. But I don't think it's drop-dead broken. Not that I really object if someone wants to have a go at getting rid of SnapshotSelf, but I think it'd be worth articulating what we hope to accomplish by so doing. For example, the btree README says the following about the deletion algorithm: --- ... The reason we do it is to provide an interlock between non-full VACUUM and indexscans. Since VACUUM deletes index entries before deleting tuples, the super-exclusive lock guarantees that VACUUM can't delete any heap tuple that an indexscanning process might be about to visit. (This guarantee works only for simple indexscans that visit the heap in sync with the index scan, not for bitmap scans. We only need the guarantee when using non-MVCC snapshot rules; in an MVCC snapshot, it wouldn't matter if the heap tuple were replaced with an unrelated tuple at the same TID, because the new tuple wouldn't be visible to our scan anyway.) --- Obviously, when we were using SnapshotNow for catalog access, changing anything here was a non-starter. But now that we're not, it might be worth asking whether there are few enough users of non-MVCC rules that we could apply some suitable treatment to those that remain and then change the locking protocol here. And if we did do that, would there be enough performance benefit to justify the work? I don't have answers to those questions, and the answer may well be that we should leave things as they are, but I think the questions are worth thinking about. Aside from any possible advantage in further trimming the list of available snapshot types, I'd like to spend some time thinking about what we can do that's safe and useful in terms of reducing lock levels; or even adding completely new facilities that would have been DOA in the old world. I have high hopes in both areas, but I wouldn't be surprised to find that there are problems we haven't thought about yet. I think our dependence on SnapshotNow has wormed itself into our design choices in deep ways, and I suspect it's going to take a good deal of thought to figure out exactly what we can improve and how. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Design proposal: fsync absorb linear slider
On 7/26/13 8:32 AM, Tom Lane wrote: What I'd point out is that that is exactly what WAL does for us, ie convert a bunch of random writes into sequential writes. But sooner or later you have to put the data where it belongs. Hannu was observing that SSD often doesn't do that at all. They can maintain logical - physical translation tables that decode where each block was written to forever. When read seeks are really inexpensive, the only pressure to reorder block is wear leveling. That doesn't really help with regular drives though, where the low seek time assumption doesn't play out so well. The whole idea of writing things sequentially and then sorting them out later was the rage in 2001 for ext3 on Linux, as part of the data=journal mount option. You can go back and see that people are confused but excited about the performance at http://www.ibm.com/developerworks/linux/library/l-fs8/index.html Spoiler: if you use a workload that has checkpoint issues, it doesn't help PostgreSQL latency. Just like using a large write cache, you gain some burst performance, but eventually you pay for it with extra latency somewhere. -- Greg Smith 2ndQuadrant USg...@2ndquadrant.com Baltimore, MD PostgreSQL Training, Services, and 24x7 Support www.2ndQuadrant.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] getting rid of SnapshotNow
Robert Haas robertmh...@gmail.com writes: On Fri, Jul 26, 2013 at 9:18 AM, Tom Lane t...@sss.pgh.pa.us wrote: Robert Haas robertmh...@gmail.com writes: Well, that's still used in _bt_check_unique, unique_key_recheck (trigger function to do a deferred uniqueness check), RI_FKey_check, and rather extensively by sepgsql. I don't really have much desire to do the work to get rid of it, though. Hm. I agree the first three may be all right, but I can't help suspecting that sepgsql is doing the wrong thing here. sepgsql is using SnapshotSelf to find the old version of a tuple that was updated by the core code just before. Oh. OK, then it reduces to the same case as the other three, ie we're looking at tuples we know to be update-locked. [ interesting ruminations snipped ] Yeah, removing SnapshotNow catalog access certainly opens the doors for a lot of new thinking. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] getting rid of SnapshotNow
On 2013-07-26 09:50:32 -0400, Robert Haas wrote: sepgsql is using SnapshotSelf to find the old version of a tuple that was updated by the core code just before. That should be safe in the sense that there can't be a currently-committing transaction somewhere else that's updated that tuple, if we know that our own uncommitted transaction has done a transactional update. There was a recent thread discussing whether another API might be better, and I'd be prepared to concede that it might be. But I don't think it's drop-dead broken. It's safe for the tuples updated in that transaction, but it's not safe to look at anything else if you expect results without the SnapshotNow problems. E.g. looking at a newly created attribute is fine, but iterating over all attributes not necessarily. I am more concerned about the care needed when placing CommandCounterIncrement()'s somewhere though. It seems more than likely that this will get repeatedly broken, even if it's not atm (which I doubt). E.g. inheritance handling seems to be rather wonky WRT this. Not that I really object if someone wants to have a go at getting rid of SnapshotSelf, but I think it'd be worth articulating what we hope to accomplish by so doing. Agreed. From the internal usages there doesn't seem to be too much pressure. Greetings, Andres Freund -- Andres Freund http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Condition to become the standby mode.
On Fri, Jul 26, 2013 at 3:08 PM, Andres Freund and...@2ndquadrant.com wrote: Hi, On 2013-07-26 13:19:34 +0900, Sawada Masahiko wrote: When the slave server starts, the slave server perform the following steps in StartupXLOG(): 1. Read latest CheckPoint record LSN from pg_control file. 2. Try to fetch CheckPoint record from pg_xlog directory at first. ( The server try to read up to prior CheckPoint record from pg_contrl file) 3. If there is not in pg_xlog, the slave server requests CheckPoint record to the primary server. in #3, it works only when StandbyMode is true. For StandbyMode is to true, database cluster state should be DB_SHUTDOWNED (it is one of the conditions). that is, slave server can try to fetch checkpoint record from the master server after slave server was successful completion. It also will fetch the xlog from remote when we've found a backup label (the read_backup_label() branch in StartupXLOG() will set StandbyMode to true). Which will be the case if we're recovering from something like a base backup. The reason we don't fetch from remote if there's no backup label *and* the last checkpoint wasn't a shutdown checkpoint is that that's doing *local* crash recovery. Yes, in that case, local crash recovery occurs first. After it ends (i.e., reach the end of the WAL available in local), we can enable standby mode and fetch the WAL from remote server. If this problem is solved, there is possible of that we can failback by removing the all WAL record which is in pg_xlog before server starts as the slave server. ( And we also use synchronous_transfer which I'm proposing, I think we can fail-back without taking full backup surely) I still have *massive* doubts about the concept. But anyway, if you want to do so, you should generate a backup label that specifies the startup location. Generating a backup label doesn't seem to be enough because there is no backup-end WAL record and we cannot know the consistency point. Regards, -- Fujii Masao -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Condition to become the standby mode.
On 2013-07-26 23:47:59 +0900, Fujii Masao wrote: If this problem is solved, there is possible of that we can failback by removing the all WAL record which is in pg_xlog before server starts as the slave server. ( And we also use synchronous_transfer which I'm proposing, I think we can fail-back without taking full backup surely) I still have *massive* doubts about the concept. But anyway, if you want to do so, you should generate a backup label that specifies the startup location. Generating a backup label doesn't seem to be enough because there is no backup-end WAL record and we cannot know the consistency point. Since 9.2 we allow generation of base backups from standbys, the infrastructure built for should be sufficient to pass the lsn at which consistency is achieved. Greetings, Andres Freund -- Andres Freund http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Design proposal: fsync absorb linear slider
Hi, On Fri, Jul 26, 2013 at 3:41 PM, Greg Smith greg@2ndquadg...@2ndquadrant.com(needrant.comg...@2ndquadrant.com wrote: On 7/26/13 9:14 AM, didier wrote: During recovery you have to load the log in cache first before applying WAL. Checkpoints exist to bound recovery time after a crash. That is their only purpose. What you're suggesting moves a lot of work into the recovery path, which will slow down how long it takes to process. Yes it's slower but you're sequentially reading only one file at most the size of your buffer cache, moreover it's a constant time. Let say you make a checkpoint and crash just after with a next to empty WAL. Now recovery is very fast but you have to repopulate your cache with random reads from requests. With the snapshot it's slower but you read, sequentially again, a lot of hot cache you will need later when the db starts serving requests. Of course the worst case is if it crashes just before a checkpoint, most of the snapshot data are stalled and will be overwritten by WAL ops. But If the WAL recovery is CPU bound, loading from the snapshot may be done concurrently while replaying the WAL. More work at recovery time means someone who uses the default of checkpoint_timeout='5 minutes', expecting that crash recovery won't take very long, will discover it does take a longer time now. They'll be forced to shrink the value to get the same recovery time as they do currently. You might need to make checkpoint_timeout 3 minutes instead, if crash recovery now has all this extra work to deal with. And when the time between checkpoints drops, it will slow the fundamental efficiency of checkpoint processing down. You will end up writing out more data in the end. Yes it's a trade off, now you're paying the price at checkpoint time, every time, with the log you're paying only once, at recovery. The interval between checkpoints and recovery time are all related. If you let any one side of the current requirements slip, it makes the rest easier to deal with. Those are all trade-offs though, not improvements. And this particular one is already an option. If you want less checkpoint I/O per capita and don't care about recovery time, you don't need a code change to get it. Just make checkpoint_timeout huge. A lot of checkpoint I/O issues go away if you only do a checkpoint per hour, because instead of random writes you're getting sequential ones to the WAL. But when you crash, expect to be down for a significant chunk of an hour, as you go back to sort out all of the work postponed before. It's not the same it's a snapshot saved and loaded in constant time unlike the WAL log. Didier
Re: [HACKERS] Condition to become the standby mode.
On Fri, Jul 26, 2013 at 11:55 PM, Andres Freund and...@2ndquadrant.com wrote: On 2013-07-26 23:47:59 +0900, Fujii Masao wrote: If this problem is solved, there is possible of that we can failback by removing the all WAL record which is in pg_xlog before server starts as the slave server. ( And we also use synchronous_transfer which I'm proposing, I think we can fail-back without taking full backup surely) I still have *massive* doubts about the concept. But anyway, if you want to do so, you should generate a backup label that specifies the startup location. Generating a backup label doesn't seem to be enough because there is no backup-end WAL record and we cannot know the consistency point. Since 9.2 we allow generation of base backups from standbys, the infrastructure built for should be sufficient to pass the lsn at which consistency is achieved. Yeah, right. I'd forgotten about that. Regards, -- Fujii Masao -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] pg_upgrade -j broken on Windows
On Thu, Jul 25, 2013 at 10:57:28AM -0400, Bruce Momjian wrote: Everyone should be aware that the 9.3 pg_upgrade -j/--jobs option on Windows is currently broken, and hopefully will be fixed by the next beta. Someone at PGDay UK told me they were getting pg_upgrade -j crashes on Windows. Andrew Dunstan was able to reproduce the crash, and that has been fixed, but there is still a race condition that I am trying to diagnose. After three days of testing and creating a Windows MinGW build environment (with Andrew's help), I have come up with the attached patch which fixes the pg_upgrade -j race condition on Windows. In summary, creating a file with fopen() from a non-primary thread and then calling system() soon after can result in a file-in-use error. The solution was to issue the fopen() after system() in such cases. This would be applied to head and 9.3. -- Bruce Momjian br...@momjian.ushttp://momjian.us EnterpriseDB http://enterprisedb.com + It's impossible for everything to be true. + -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] pg_upgrade -j broken on Windows
On Fri, Jul 26, 2013 at 01:27:34PM -0400, Bruce Momjian wrote: On Thu, Jul 25, 2013 at 10:57:28AM -0400, Bruce Momjian wrote: Everyone should be aware that the 9.3 pg_upgrade -j/--jobs option on Windows is currently broken, and hopefully will be fixed by the next beta. Someone at PGDay UK told me they were getting pg_upgrade -j crashes on Windows. Andrew Dunstan was able to reproduce the crash, and that has been fixed, but there is still a race condition that I am trying to diagnose. After three days of testing and creating a Windows MinGW build environment (with Andrew's help), I have come up with the attached patch which fixes the pg_upgrade -j race condition on Windows. In summary, creating a file with fopen() from a non-primary thread and then calling system() soon after can result in a file-in-use error. The solution was to issue the fopen() after system() in such cases. This would be applied to head and 9.3. Sorry, patch attached. -- Bruce Momjian br...@momjian.ushttp://momjian.us EnterpriseDB http://enterprisedb.com + It's impossible for everything to be true. + diff --git a/contrib/pg_upgrade/exec.c b/contrib/pg_upgrade/exec.c new file mode 100644 index ef123a8..3ec6e1c *** a/contrib/pg_upgrade/exec.c --- b/contrib/pg_upgrade/exec.c *** static void validate_exec(const char *di *** 21,26 --- 21,27 #ifdef WIN32 static int win32_check_directory_write_permissions(void); + DWORD mainThreadId = 0; #endif *** static int win32_check_directory_write_p *** 37,48 * If throw_error is true, this raises a PG_FATAL error and pg_upgrade * terminates; otherwise it is just reported as PG_REPORT and exec_prog() * returns false. */ bool exec_prog(const char *log_file, const char *opt_log_file, bool throw_error, const char *fmt,...) { ! int result; int written; #define MAXCMDLEN (2 * MAXPGPATH) --- 38,51 * If throw_error is true, this raises a PG_FATAL error and pg_upgrade * terminates; otherwise it is just reported as PG_REPORT and exec_prog() * returns false. + * + * The code requires it be called first from the primary thread on Windows. */ bool exec_prog(const char *log_file, const char *opt_log_file, bool throw_error, const char *fmt,...) { ! int result = 0; int written; #define MAXCMDLEN (2 * MAXPGPATH) *** exec_prog(const char *log_file, const ch *** 50,55 --- 53,64 FILE *log; va_list ap; + #ifdef WIN32 + /* We assume we are called from the primary thread first */ + if (mainThreadId == 0) + mainThreadId = GetCurrentThreadId(); + #endif + written = strlcpy(cmd, SYSTEMQUOTE, sizeof(cmd)); va_start(ap, fmt); written += vsnprintf(cmd + written, MAXCMDLEN - written, fmt, ap); *** exec_prog(const char *log_file, const ch *** 61,66 --- 70,91 if (written = MAXCMDLEN) pg_log(PG_FATAL, command too long\n); + pg_log(PG_VERBOSE, %s\n, cmd); + + #ifdef WIN32 + /* + * For some reason, Windows issues a file-in-use error if we write data + * to the log file from a non-primary thread just before we create a + * subprocess that also writes to the same log file. One fix is to + * sleep for 100ms. A cleaner fix is to write to the log file _after_ + * the subprocess has completed, so we do this only when writing from + * a non-primary thread. fflush(), running system() twice, and + * pre-creating the file do not see to help. + */ + if (mainThreadId != GetCurrentThreadId()) + result = system(cmd); + #endif + log = fopen(log_file, a); #ifdef WIN32 *** exec_prog(const char *log_file, const ch *** 84,94 if (log == NULL) pg_log(PG_FATAL, cannot write to log file %s\n, log_file); #ifdef WIN32 ! fprintf(log, \n\n); #endif - pg_log(PG_VERBOSE, %s\n, cmd); fprintf(log, command: %s\n, cmd); /* * In Windows, we must close the log file at this point so the file is not --- 109,126 if (log == NULL) pg_log(PG_FATAL, cannot write to log file %s\n, log_file); + #ifdef WIN32 ! /* Are we printing command: before its output? */ ! if (mainThreadId == GetCurrentThreadId()) ! fprintf(log, \n\n); #endif fprintf(log, command: %s\n, cmd); + #ifdef WIN32 + /* Are we printing command: after its output? */ + if (mainThreadId != GetCurrentThreadId()) + fprintf(log, \n\n); + #endif /* * In Windows, we must close the log file at this point so the file is not *** exec_prog(const char *log_file, const ch *** 96,102 */ fclose(log); ! result = system(cmd); if (result != 0) { --- 128,138 */ fclose(log); ! #ifdef WIN32 ! /* see comment above */ ! if (mainThreadId == GetCurrentThreadId()) ! #endif ! result = system(cmd); if (result != 0) { *** exec_prog(const char *log_file, const
Re: [HACKERS] getting rid of SnapshotNow
On Fri, Jul 26, 2013 at 10:16 AM, Andres Freund and...@2ndquadrant.com wrote: I am more concerned about the care needed when placing CommandCounterIncrement()'s somewhere though. It seems more than likely that this will get repeatedly broken, even if it's not atm (which I doubt). E.g. inheritance handling seems to be rather wonky WRT this. There may well be bugs. I am fine with reviewing patches to improve the code in this area, but I don't plan to take it upon myself to rewrite that code. Either it's working as expected, or nobody's using it, because we're not getting any bug reports. Not that I really object if someone wants to have a go at getting rid of SnapshotSelf, but I think it'd be worth articulating what we hope to accomplish by so doing. Agreed. From the internal usages there doesn't seem to be too much pressure. So unless there are objections to the patch as posted, I'm going to apply that next week. This in no way precludes more work in this area later, but since we're likely to break third-party code with this change, we might as well get it out of the way as early in the release cycle as possible. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: ALTER SYSTEM SET command to change postgresql.conf parameters (RE: [HACKERS] Proposal for Allow postgresql.conf values to be changed via SQL [review])
* Tom Lane (t...@sss.pgh.pa.us) wrote: Come to think of it, maybe part of the reason we're having such a hard time getting to consensus is that people are conflating the snippet part with the writable part? I mean, if you are thinking you want system-management tools to be able to drop in configuration fragments as separate files, there's a case to be made for a conf.d subdirectory that lives somewhere that the postmaster can't necessarily write. We just mustn't confuse that with support for ALTER SYSTEM SET. I strongly believe that ALTER SYSTEM SET must not be designed to write anywhere outside $PGDATA. Agreed. To continue that thought, I find it *very* unlikely that a given environment would use *both* a tool like puppet to manage the files in their conf.d *and* have people using ALTER SYSTEM SET. You're going to do one or the other, almost certainly; not the least of which is because those are very likely two different teams and only one of them is going to be responsible for the PG system config. Thanks, Stephen signature.asc Description: Digital signature
Re: [HACKERS] [GENERAL] postgres FDW cost estimation options unrecognized in 9.3-beta1
Lonni J Friedman netll...@gmail.com writes: nightly=# ALTER SERVER cuda_db10 OPTIONS (SET use_remote_estimate 'true') ; ERROR: option use_remote_estimate not found Am I doing something wrong, or is this a bug? [ experiments... ] You need to say ADD, not SET, to add a new option to the list. SET might more appropriately be spelled REPLACE, because it requires that the object already have a defined value for the option, which will be replaced. Our documentation appears not to disclose this fine point, but a look at the SQL-MED standard says it's operating per spec. The standard also says that ADD is an error if the option is already defined, which is a bit more defensible, but still not exactly what I'd call user-friendly. And the error we issue for that case is pretty misleading too: regression=# ALTER SERVER cuda_db10 OPTIONS (use_remote_estimate 'true') ; ALTER SERVER regression=# ALTER SERVER cuda_db10 OPTIONS (use_remote_estimate 'false') ; ERROR: option use_remote_estimate provided more than once I think we could do with both more documentation, and better error messages for these cases. In the SET-where-you-should-use-ADD case, perhaps ERROR: option use_remote_estimate has not been set HINT: Use ADD not SET to define an option that wasn't already set. In the ADD-where-you-should-use-SET case, perhaps ERROR: option use_remote_estimate is already set HINT: Use SET not ADD to change an option's value. The provided more than once wording would be appropriate if the same option is specified more than once in the command text, but I'm not sure that it's worth the trouble to detect that case. Thoughts, better wordings? regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] [GENERAL] postgres FDW cost estimation options unrecognized in 9.3-beta1
On Fri, Jul 26, 2013 at 3:28 PM, Tom Lane t...@sss.pgh.pa.us wrote: Lonni J Friedman netll...@gmail.com writes: nightly=# ALTER SERVER cuda_db10 OPTIONS (SET use_remote_estimate 'true') ; ERROR: option use_remote_estimate not found Am I doing something wrong, or is this a bug? [ experiments... ] You need to say ADD, not SET, to add a new option to the list. SET might more appropriately be spelled REPLACE, because it requires that the object already have a defined value for the option, which will be replaced. Our documentation appears not to disclose this fine point, but a look at the SQL-MED standard says it's operating per spec. The standard also says that ADD is an error if the option is already defined, which is a bit more defensible, but still not exactly what I'd call user-friendly. And the error we issue for that case is pretty misleading too: regression=# ALTER SERVER cuda_db10 OPTIONS (use_remote_estimate 'true') ; ALTER SERVER regression=# ALTER SERVER cuda_db10 OPTIONS (use_remote_estimate 'false') ; ERROR: option use_remote_estimate provided more than once I think we could do with both more documentation, and better error messages for these cases. In the SET-where-you-should-use-ADD case, perhaps ERROR: option use_remote_estimate has not been set HINT: Use ADD not SET to define an option that wasn't already set. In the ADD-where-you-should-use-SET case, perhaps ERROR: option use_remote_estimate is already set HINT: Use SET not ADD to change an option's value. The provided more than once wording would be appropriate if the same option is specified more than once in the command text, but I'm not sure that it's worth the trouble to detect that case. Thoughts, better wordings? Thanks Tom, I've confirmed that using ADD was the solution. I think your suggested updated ERROR HINT text is an excellent improvement. It definitely would have given me the clue I was missing earlier. -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
[HACKERS] Re: Patch to add support of IF NOT EXISTS to others CREATE statements
At 2013-07-26 10:39:00 +0200, karl...@gmail.com wrote: Hello, as I can see there are more inconsistent places. Right. This is what I was referring to in my original review. All of the relevant sites (pre-patch) that currently do: if (already exists) ereport(ERROR …) should instead be made to do: if (already exists) { if (ifNotExists) { ereport(NOTICE …) return } ereport(ERROR …) } or even (very slightly easier to review): if (already exists ifNotExists) { ereport(ERROR …) return } if (already exists) ereport(ERROR …) I don't care much which of the two is used, so long as it's (a) the same everywhere, and (b) there's no if (!ifNot anywhere. -- Abhijit -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] [GENERAL] currval and DISCARD ALL
On 25-07-2013 05:32, suresh.balasubra wrote: Disclaimer: I am no hacker, just a PostGreSQL user, trying to provide a user scenario where DISCARD SEQUENCES functionality is required. We have designed a developed a small Application Development platform for which the backend is PostGreSQL. There is a DBLayer which is responsible in generating SQL statements for all the INSERT, UPDATE, DELETE operations. Data can be pushed into multiple tables using this layer. What we provide to this layer is just a DataSet (.NET). DataTables in the DataSet will be named after their respective tables. We also use DataColumn extended properties to push in additional logic. All these happen with in a transaction and also in one shot, just one hit to the DB by appending SQL statements in proper order. There is an interesting feature that we have built into this DBlayer which is auto linking. All tables in our system will have a serial field 'id'. Suppose there is a master table (let us call it 'voucher') and a detail table ('voucher_lines'), and we are using the layer to push one record to the master table and 50 records to the detail table. 'voucher_lines' table will have a integer column 'voucher_id'. '*_id' fields are automatically populated with 'currval('*_id_seq'). All this works like a charm. Now, imagine we want to push data into another table (say 'invoice') which also has a field 'voucher_id'. This is a different activity not connected with the above mentioned transaction. In this scenario this field will get updated as currval('voucher_id_seq') returns a value. But we do not want that to be updated. What we want is to resolve '*_id' fields into values only within a transaction. After the transaction we want to get away with the session. If we could have cleared the session someway (DISCARD ALL does it, but not the currval sequence info) after the first transaction it would have worked for us. I already sent a patch to implement DISCARD SEQUENCES [1] that I expect to be part of 9.4 version. Regards, [1] https://commitfest.postgresql.org/action/patch_view?id=1171 -- Fabrízio de Royes Mello Timbira - http://www.timbira.com.br/ PostgreSQL: Consultoria, Desenvolvimento, Suporte 24x7 e Treinamento -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers