Re: [HACKERS] Improving executor performance
On 25 June 2016 05:00, Andres Freund wrote:
>To: pgsql-hackers@postgresql.org
>Subject: [HACKERS] Improving executor performance
>
>My observations about the performance bottlenecks I found, and partially
>addressed, are in rough order of importance (there's interdependence
>between most of them):
>
>1) Cache misses cost us a lot, doing more predictable accesses can make
>   a huge difference. Without addressing that, many other bottlenecks
>   don't matter all that much. I see *significant* performance
>   improvements for large seqscans (when the results are used) simply
>   memcpy()'ing the current target block.
>
>   This partially is an intrinsic problem of analyzing a lot of data,
>   and partially because our access patterns are bad.
>
>2) Barring 1), tuple deforming is the biggest bottleneck in nearly all
>   queries I looked at. There's various places we trigger deforming,
>   most ending in either slot_deform_tuple(), heap_getattr(),
>   heap_deform_tuple().

Agreed. I have also observed similar behavior, specifically for (2), while working on improving executor performance. I did some prototype work to improve performance using native compilation. Details are available on my blog:
http://rajeevrastogi.blogspot.in/2016/03/native-compilation-part-2-pgday-asia.html

>3) Our 1-by-1 tuple flow in the executor has two major issues:

Agreed. IMHO, in order to tackle this we should:
1. Make the processing data-centric instead of operator-centric.
2. Instead of pulling each tuple from the immediate operator, an operator can push the tuple to its parent. Pushing can continue until it reaches an operator that cannot proceed without results from another operator.
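The push model in points 1 and 2 can be illustrated with a minimal stand-alone sketch (Python, purely illustrative; the operator names and structure here are my own, not the native-compilation prototype):

```python
# Minimal sketch of push-based (data-centric) execution, as opposed to the
# executor's pull-based ExecProcNode loop. The scan drives the pipeline and
# pushes each tuple upward while it is still hot in cache.

class Filter:
    def __init__(self, predicate, parent):
        self.predicate = predicate
        self.parent = parent

    def push(self, row):
        # Push qualifying rows straight to the parent operator instead of
        # waiting to be pulled.
        if self.predicate(row):
            self.parent.push(row)

class Collect:
    """A sink operator that cannot push further; it just accumulates."""
    def __init__(self):
        self.rows = []

    def push(self, row):
        self.rows.append(row)

def seq_scan(table, consumer):
    # The leaf scan is the loop driver; each row flows through the whole
    # pipeline before the next one is produced.
    for row in table:
        consumer.push(row)

sink = Collect()
plan = Filter(lambda r: r % 2 == 0, sink)
seq_scan(range(10), plan)
print(sink.rows)  # [0, 2, 4, 6, 8]
```

The key difference from the current executor is that control stays inside one tight loop rather than re-entering each node via function-call chains per tuple.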
More details from another thread:
https://www.postgresql.org/message-id/bf2827dcce55594c8d7a8f7ffd3ab77159a9b...@szxeml521-mbs.china.huawei.com

Thanks and Regards,
Kumar Rajeev Rastogi

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Academic help for Postgres
On 11 May 2016 19:50, Bruce Momjian wrote:
>I am giving a keynote at an IEEE database conference in Helsinki next
>week (http://icde2016.fi/). (Yes, I am not attending PGCon Ottawa
>because I accepted the Helsinki conference invitation before the PGCon
>Ottawa date was changed from June to May).
>
>As part of the keynote, I would like to mention areas where academia can
>help us. The topics I can think of are:
>
>    Query optimization
>    Optimizer statistics
>    Indexing structures
>    Reducing function call overhead
>    CPU locality
>    Sorting
>    Parallelism
>    Sharding
>
>Any others?

How about:
1. Considering NUMA-aware architecture.
2. Optimizer tuning for new hardware trends.
3. More effective versions of join algorithms (e.g., compared to the traditional "build and then probe" mechanism of hash join, there are now pipelined hash joins where probe and build happen together).

Thanks and Regards,
Kumar Rajeev Rastogi
Re: [HACKERS] asynchronous and vectorized execution
On 09 May 2016 23:04, Robert Haas wrote:
>2. vectorized execution, by which I mean the ability of a node to return
>tuples in batches rather than one by one. Andres has opined more than
>once that repeated trips through ExecProcNode defeat the ability of the
>CPU to do branch prediction correctly, slowing the whole system down,
>and that they also result in poor CPU cache behavior, since we jump all
>over the place executing a little bit of code from each node before
>moving onto the next rather than running one bit of code first, and then
>another later. I think that's probably right.
>
>For example, consider a 5-table join where all of the joins are
>implemented as hash tables. If this query plan is going to be run to
>completion, it would make much more sense to fetch, say, 100 tuples
>from the driving scan and then probe for all of those in the first hash
>table, and then probe for all of those in the second hash table, and so
>on. What we do instead is fetch one tuple and probe for it in all 5
>hash tables, and then repeat. If one of those hash tables would fit in
>the CPU cache but all five together will not, that seems likely to be a
>lot worse.
>
>But even just ignoring the CPU cache aspect of it for a minute, suppose
>you want to write a loop to perform a hash join. The inner loop fetches
>the next tuple from the probe table and does a hash lookup. Right now,
>fetching the next tuple from the probe table means calling a function
>which in turn calls another function which probably calls another
>function which probably calls another function and now about 4 layers
>down we actually get the next tuple. If the scan returned a batch of
>tuples to the hash join, fetching the next tuple from the batch would
>probably be 0 or 1 function calls rather than ... more. Admittedly,
>you've got to consider the cost of marshaling the batches but I'm
>optimistic that there are cycles to be squeezed out here.
>We might also want to consider storing batches of tuples in a
>column-optimized rather than row-optimized format so that iterating
>through one or two attributes across every tuple in the batch touches
>the minimal number of cache lines.

This sounds like a really great idea for improving performance. I would like to share my thoughts based on our research work in a similar area (mostly along the lines you have mentioned). Our goals with this work were to:
1. Make the processing data-centric instead of operator-centric.
2. Instead of pulling each tuple from the immediate operator, an operator can push the tuple to its parent. Pushing can continue until it reaches an operator that cannot proceed without results from another operator.
3. Use the above two points to achieve better data locality.

For example, we did some quick prototyping (take it just as a thought provoker) as mentioned below.

Query: select * from tbl1, tbl2, tbl3 where tbl1.a=tbl2.b and tbl2.b=tbl3.c;

For hash join:

    For each tuple t2 of tbl2
        Materialize a hash tbl1.a = tbl2.b
    For each tuple t3 of tbl3
        Materialize a hash tbl2.b = tbl3.c
    For each tuple t1 of tbl1
        Search in hash tbl1.a = tbl2.b
        Search in hash tbl2.b = tbl3.c
        Output t1*t2*t3

Of course, at each level any additional quals for the table can be applied in the same way. Similarly for nested loop join, the plan tree can be processed in the form of a post-order traversal of the tree:

    Scan the first relation (leftmost relation), store all tuples (or some part of the total tuples) --> Outer
    Loop through all scan-node relations starting from the second one
        Scan the current relation
        For each tuple, match with all tuples of the outer result, build the combined tuples
        Save all combined satisfying tuples --> Outer

The results we got were really impressive. There is a paper by Thomas Neumann on this idea: http://www.vldb.org/pvldb/vol4/p539-neumann.pdf

Note: VitesseDB has also implemented this approach for hash join along with compilation, and their results are really impressive.
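To make the hash-join pseudocode above concrete, here is a tiny stand-alone version of the same build-then-probe flow (Python, purely illustrative; the table and column names follow the example query, everything else — the function name, the dict-based tuples — is an assumption of this sketch):

```python
# Sketch of the hand-written join loop for:
#   select * from tbl1, tbl2, tbl3 where tbl1.a = tbl2.b and tbl2.b = tbl3.c
# Build a hash table per non-driving relation once, then probe both hash
# tables back to back for each driving tuple, so the per-tuple work stays
# in tight, predictable loops instead of re-entering the executor per node.
from collections import defaultdict

def hash_join_3way(tbl1, tbl2, tbl3):
    # Build phase: one pass over tbl2 (keyed on b) and tbl3 (keyed on c).
    h2 = defaultdict(list)
    for t2 in tbl2:
        h2[t2["b"]].append(t2)
    h3 = defaultdict(list)
    for t3 in tbl3:
        h3[t3["c"]].append(t3)

    # Probe phase: each tbl1 tuple probes both hash tables in sequence.
    out = []
    for t1 in tbl1:
        for t2 in h2.get(t1["a"], ()):
            for t3 in h3.get(t2["b"], ()):
                out.append((t1, t2, t3))    # Output t1*t2*t3
    return out

tbl1 = [{"a": 1}, {"a": 2}]
tbl2 = [{"b": 1}, {"b": 3}]
tbl3 = [{"c": 1}, {"c": 2}]
print(hash_join_3way(tbl1, tbl2, tbl3))  # one joined row: a = b = c = 1
```

Batching the probe side (probe 100 tuples against h2, then all survivors against h3) is the vectorized variant Robert describes; the single-tuple loop above already shows where the batches would slot in.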
Thanks and Regards,
Kumar Rajeev Rastogi
http://rajeevrastogi.blogspot.in/
Re: [HACKERS] Dangling Client Backend Process
On 30 October 2015 20:33, Andres Freund wrote:
>On 2015-10-30 10:57:45 -0400, Tom Lane wrote:
>> Andres Freund <and...@anarazel.de> writes:
>> > adding a parseInput(conn) into the loop yields the expected
>> > FATAL: 57P01: terminating connection due to unexpected postmaster
>> > exit
>> > Is there really any reason not to do that?
>>
>> Might work, but it probably needs some study:
>
>Yea, definitely. I was just at pgconf.eu's keynote catching up on a talk.
>No fully thought through proposals to be expected ;)
>
>> (a) is it safe
>
>I don't immediately see why not.
>
>> (b) is this the right place / are there other places
>
>I think it's ok for the send failure case, in a quick lookthrough I
>didn't find anything else for writes - I'm not entirely sure all read
>cases are handled tho, but it seems less likely to be mishandled.

I also did some analysis, as Andres suggested, and observed that the error message has already been sent by the backend to the frontend, so it is available on the connection channel; it is just not received/processed by the frontend. Also, pqHandleSendFailure was checking for available data, but the data was ignored rather than processed.

So, as Andres suggested above, if we add parseInput to receive and parse the available message on the connection channel, we can display the appropriate error. IMHO the proposed place to add parseInput seems right, as this function already takes care of handling any "NOTICE message that the backend might have sent just before it died".

Attached is the patch with this change. Comments?

Thanks and Regards,
Kumar Rajeev Rastogi

dangle-v4.patch
Description: dangle-v4.patch
Re: [HACKERS] Dangling Client Backend Process
On 23 October 2015 01:58, Robert Haas [mailto:robertmh...@gmail.com] wrote:
>Well, I'm not buying this extra PostmasterIsAlive() call on every pass
>through the main loop. That seems more expensive than we can really
>justify. Checking this when we're already calling WaitLatchOrSocket is
>basically free, but the other part is not.

Agreed.

>Here's a version with that removed and some changes to the comments.

Thanks for changing.

>I still don't think this is quite working right, though, because here's
>what happened when I killed the postmaster:
>
>rhaas=# select 1;
> ?column?
>----------
>        1
>(1 row)
>
>rhaas=# \watch
>Watch every 2s    Thu Oct 22 16:24:10 2015
>
> ?column?
>----------
>        1
>(1 row)
>
>Watch every 2s    Thu Oct 22 16:24:12 2015
>
> ?column?
>----------
>        1
>(1 row)
>
>Watch every 2s    Thu Oct 22 16:24:14 2015
>
> ?column?
>----------
>        1
>(1 row)
>
>Watch every 2s    Thu Oct 22 16:24:16 2015
>
> ?column?
>----------
>        1
>(1 row)
>
>server closed the connection unexpectedly
>This probably means the server terminated abnormally
>before or while processing the request.
>The connection to the server was lost. Attempting reset: Failed.
>
>Note that the error message doesn't actually show up on the client (it
>did show up in the log). I guess that may be inevitable if we're
>blocked in secure_write(), but if we're in secure_read() maybe it should
>work? I haven't investigated why it doesn't.

Actually, in this case the client is not waiting for a response from the server; rather, it is waiting for a new command from the user. So even though the server has sent the response, the client is not able to receive it. By the time the client receives the next command to execute, the connection has already been terminated from the server side, and hence it comes out with the above message (i.e. "server closed the connection..."). Though I am also in favor of providing some error message to the client.
But with the current infrastructure, I don't think there is any way to pass this error message to the client [this error happened without involvement of the client side]. Comments?

Thanks and Regards,
Kumar Rajeev Rastogi
Re: [HACKERS] Dangling Client Backend Process
On 20 October 2015 23:34, Robert Haas [mailto:robertmh...@gmail.com] wrote:
>On Tue, Oct 20, 2015 at 12:11 PM, Alvaro Herrera
><alvhe...@2ndquadrant.com> wrote:
>> Robert Haas wrote:
>>> I don't think that proc_exit(1) is the right way to exit here. It's
>>> not very friendly to exit without at least attempting to give the
>>> client a clue about what has gone wrong. I suggest something like
>>> this:
>>>
>>> ereport(FATAL,
>>>         (errcode(ERRCODE_ADMIN_SHUTDOWN),
>>>          errmsg("terminating connection due to postmaster shutdown")));
>>
>> Agreed, but I don't think "shutdown" is the right word to use here --
>> that makes it sound like it was orderly. Perhaps "crash"?
>
>Well, that's a little speculative. "due to unexpected postmaster exit"?

Agreed. Attached is the patch with the changes.

Thanks and Regards,
Kumar Rajeev Rastogi

dangling_backend_process_v2.patch
Description: dangling_backend_process_v2.patch
Re: [HACKERS] Dangling Client Backend Process
On 19 October 2015 21:37, Robert Haas [mailto:robertmh...@gmail.com] wrote:
>On Sat, Oct 17, 2015 at 4:52 PM, Alvaro Herrera
><alvhe...@2ndquadrant.com> wrote:
>> Andres Freund wrote:
>>> On 2015-10-14 17:33:01 +0900, Kyotaro HORIGUCHI wrote:
>>> > If I recall correctly, he concerned about killing the backends
>>> > running transactions which could be saved. I have a sympathy with
>>> > the opinion.
>>>
>>> I still don't. Leaving backends alive after postmaster has died
>>> prevents the auto-restart mechanism from working from there on.
>>> Which means that we'll potentially continue happily after another
>>> backend has PANICed and potentially corrupted shared memory. Which
>>> isn't all that unlikely if postmaster isn't around anymore.
>>
>> I agree. When postmaster terminates without waiting for all backends
>> to go away, things are going horribly wrong -- either a DBA has done
>> something stupid, or the system is misbehaving. As Andres says, if
>> another backend dies at that point, things are even worse -- the dying
>> backend could have been holding a critical lwlock, for instance, or it
>> could have corrupted shared buffers on its way out. It is not
>> sensible to leave the rest of the backends in the system still trying
>> to run just because there is no one there to kill them.
>
>Yep. +1 for changing this.

It seems many people are in favor of this change. I have made changes to handle backend exit on postmaster death (after the backend has finished its work and is waiting for a new command). The changes follow the approach explained in my earlier mail, i.e.:
1. WaitLatchOrSocket, called from the secure_read and secure_write functions, will wait on an additional event, WL_POSTMASTER_DEATH.
2. There is a possibility that a command is read without waiting on the latch. This case is handled by checking the postmaster status after the command is read (i.e. after ReadCommand).

Attached is the patch.
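The reason waiting on WL_POSTMASTER_DEATH is nearly free is that every backend inherits the read end of a pipe whose write end only the postmaster keeps open; when the postmaster dies, the read end becomes readable (EOF). A stand-alone demonstration of that mechanism (Python on POSIX, not PostgreSQL code; the exit code 42 is an arbitrary marker chosen for this sketch):

```python
# Demonstrate parent-death detection via pipe EOF, the same mechanism that
# backs WL_POSTMASTER_DEATH in WaitLatchOrSocket.
import os
import select

r, w = os.pipe()
pid = os.fork()
if pid == 0:
    # Child ("backend"): drop its copy of the write end, then wait on the
    # read end alongside its normal input sources, like a latch wait does.
    os.close(w)
    select.select([r], [], [])   # wakes up once no write end remains open
    os._exit(42)                 # parent death detected -> exit promptly
else:
    # Parent ("postmaster"): dying would implicitly close its write end;
    # here we simulate that with an explicit close().
    os.close(r)
    os.close(w)                  # child's select() returns immediately
    _, status = os.waitpid(pid, 0)
    print(os.WEXITSTATUS(status))  # 42
```

Because the wait on the pipe fd happens in the same select/poll call as the socket wait, checking for postmaster death costs nothing extra per iteration, which is exactly Robert's point about WaitLatchOrSocket versus an explicit PostmasterIsAlive() call.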
Thanks and Regards,
Kumar Rajeev Rastogi

dangling_backend_process.patch
Description: dangling_backend_process.patch
Re: [HACKERS] Dangling Client Backend Process
On 14 October 2015 14:03, Kyotaro HORIGUCHI wrote:
>Subject: Re: [HACKERS] Dangling Client Backend Process
>
>At Wed, 14 Oct 2015 11:08:37 +0530, Amit Kapila <amit.kapil...@gmail.com>
>wrote in <CAA4eK1L8SGWymhXF+yDpxiyA2ARCiEgQ88XsLhEvJba3Fh_F=q...@mail.gmail.com>
>> On Tue, Oct 13, 2015 at 3:54 PM, Rajeev rastogi
>> <rajeev.rast...@huawei.com> wrote:
>> > If we add the event WL_POSTMASTER_DEATH also, client backend process
>> > handling will become the same as for other backend processes. So
>> > postmaster death can be detected in the same way.
>> >
>> > But I am not sure if the WL_POSTMASTER_DEATH event was left out
>> > intentionally for some reason. Please confirm.
>> >
>> > Also, is it OK to add this event handling in the generic path of
>> > libpq?
>> >
>> > Please let me know if I am missing something?
>>
>> I feel this is worth investigation, for example what kind of cases
>> libpq is used for with non-blocking sockets, because for such cases
>> the above idea will not work.
>
>Blocking mode of a port is changed using socket_set_nonblocking(). I
>found two points where the function is called with true:
>pq_getbyte_if_available() and socket_flush_if_writable(). They seem to
>be used only in walsender *so far*.

Yes, and in that case I assume the proposed solution will work in all cases.

>> Here, I think the bigger point is that Tom was not in favour of this
>> proposal (making backends exit on postmaster death) at that time; not
>> sure whether he has changed his mind.
>
>If I recall correctly, he was concerned about killing backends running
>transactions which could be saved. I have sympathy with that opinion.
>But I also think it reasonable to kill all backends immediately so that
>a new postmaster can run...

Yes, it would be really helpful to know the earlier reasoning for not making backends exit on postmaster death. Please let me know if there is any thread I can refer to, to find the same.
IMHO there could be the following major issues if we don't kill the client backend process on postmaster death:
1. The postmaster cannot be restarted again, as pointed out by Kyotaro and Andres.
2. If, from an existing client session, we try to do some operation that has a dependency on a background process, then that operation will either fail or may sometimes lead to undefined behavior.
3. Unintentionally, all operations done by the application will not be checkpointed and will not be flushed by the background writer.
4. Replication of WAL to the standby will stop, and if a synchronous standby is configured, commands on the master will hang.

Thanks and Regards,
Kumar Rajeev Rastogi
Re: [HACKERS] Dangling Client Backend Process
On 12th October 2015 20:45, Rajeev Rastogi wrote:
>>> I observed one strange behavior today that if the postmaster process
>>> gets crashed/killed, then it kills all background processes but not
>>> the client backend process.
>>
>> This is a known behaviour and there was some discussion on this
>> topic [1] previously as well.
>
> Now as it is confirmed to be a valid issue, I will spend some time on
> this to find if there is a more appropriate solution.

I checked the latest code and found that Heikki has already added code for secure_read using the latch mechanism (using WaitLatchOrSocket). It currently waits on two events, i.e. WL_LATCH_SET and WL_SOCKET_READABLE.
Commit id: 80788a431e9bff06314a054109fdea66ac538199

If we add the event WL_POSTMASTER_DEATH also, client backend process handling will become the same as for other backend processes. So postmaster death can be detected in the same way.

But I am not sure if the WL_POSTMASTER_DEATH event was left out intentionally for some reason. Please confirm.

Also, is it OK to add this event handling in the generic path of libpq?

Please let me know if I am missing something?

Thanks and Regards,
Kumar Rajeev Rastogi
Re: [HACKERS] Dangling Client Backend Process
On 10 October 2015 20:45, Amit Kapila wrote:
>> I observed one strange behavior today that if the postmaster process
>> gets crashed/killed, then it kills all background processes but not
>> the client backend process.
>
> This is a known behaviour and there was some discussion on this
> topic [1] previously as well. I think that thread didn't reach a
> conclusion, but there were a couple of other reasons discussed in that
> thread as well to have the behaviour you are proposing here.

Oops, I did not know about this. I shall check the older thread to get other opinions.

>> One way to handle this issue will be to check whether the postmaster
>> is alive after every command read, but it will add extra cost for
>> each query execution.
>
> I don't think that is a good idea, as if there is no command execution
> it will still stay as it is, and doing such operations on each command
> doesn't sound to be a good idea even though the overhead might not be
> big. There are some other ideas discussed in that thread [2] to
> achieve this behaviour, but I think we need to find a portable way to
> achieve it.

Yes, you are right that the process will not exit until a new command comes, but I think it does not harm functionality in any way, except that the process and its acquired resources do not get freed. Also, the use case where an application's client process stays idle for a very long time will be rare. But at the same time I agree this is not the best solution; we should look for a more appropriate/better one.

Now as it is confirmed to be a valid issue, I will spend some time on this to find if there is a more appropriate solution.

Thanks and Regards,
Kumar Rajeev Rastogi
[HACKERS] Dangling Client Backend Process
I observed one strange behavior today: if the postmaster process gets crashed/killed, it kills all background processes but not the client backend processes. Moreover, it is still possible to execute queries on the connected client session without any other background processes running.

But if I try to execute some command (like CHECKPOINT) from the client session that requires a background task to be performed, it fails because it cannot find the corresponding background process (like the checkpointer process).

I am not sure if this is already known behavior, but I found it to be a little awkward. This may lead to some unknown behavior in user applications.

Currently all background processes keep checking whether the postmaster is alive while they wait for any event, but for client backend processes there is no such mechanism.

One way to handle this issue would be to check whether the postmaster is alive after every command read, but that will add extra cost for each query execution.

Any comments?

Thanks and Regards,
Kumar Rajeev Rastogi
[HACKERS] Memory Context Info dump
In our customer environments, as well as during development, we have observed that many times we need details of the memory used by each context in order to analyze memory consumption/leaks. So I would like to propose a contrib function interface that will dump the whole memory context information into a file.

One major piece of infrastructure, the function MemoryContextStats (which dumps all memory context details to the console), already exists in PG, but it is only called automatically in scenarios such as "system is out of memory". So the plan is:
1. Create one contrib function, dump_memctxt_info, which when executed calls the existing function MemoryContextStats.
2. The dump file will be created in the following format: context_<pid>_<timestamp>.dump
3. The file handle will be passed to MemoryContextStats.
4. If the parameter to MemoryContextStats is NULL (all existing calls will pass NULL), it will behave as before, i.e. dump to the console.
5. The output format of the context dump will be the same as before.
6. One sample file is attached.

Use case: In order to check for a memory leak, this contrib function can be called before and after execution of a query. Comparison of these two dumps is then helpful for further analysis.

Attached is the patch. Please provide your opinion. If it is OK, I will add it to the next CommitFest.

Thanks and Regards,
Kumar Rajeev Rastogi

memory_ctxt_dumpv1.patch
Description: memory_ctxt_dumpv1.patch

context_13192_2015-09-08_181236.dump
Description: context_13192_2015-09-08_181236.dump
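The before/after comparison in the use case above could itself be scripted. A rough sketch (Python; the dump-line format below is assumed from typical MemoryContextStats output and may need adjusting, and diff_dumps is a hypothetical helper for illustration, not part of the patch):

```python
# Compare two memory-context dumps and report contexts whose "used" bytes
# grew, as a first pass at spotting a leak. Assumed line format, e.g.:
#   TopMemoryContext: 68688 total in 10 blocks; 4560 free (8 chunks); 64128 used
import re

LINE = re.compile(r"^\s*(?P<name>\S.*?):\s+(?P<total>\d+) total in .*?(?P<used>\d+) used")

def parse_dump(lines):
    """Map context name -> bytes used, skipping lines that don't match."""
    used = {}
    for line in lines:
        m = LINE.match(line)
        if m:
            used[m.group("name")] = int(m.group("used"))
    return used

def diff_dumps(before, after):
    """Return {context: growth_in_bytes} for contexts that grew."""
    b, a = parse_dump(before), parse_dump(after)
    return {name: a[name] - b.get(name, 0)
            for name in a if a[name] > b.get(name, 0)}

before = ["TopMemoryContext: 68688 total in 10 blocks; 4560 free (8 chunks); 64128 used"]
after  = ["TopMemoryContext: 70736 total in 11 blocks; 4560 free (8 chunks); 66176 used",
          "ExecutorState: 8192 total in 1 blocks; 2048 free (1 chunks); 6144 used"]
print(diff_dumps(before, after))  # {'TopMemoryContext': 2048, 'ExecutorState': 6144}
```

A persistent leak would show the same context growing across repeated query executions, whereas one-off growth (e.g. caches warming up) levels off.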
Re: [HACKERS] Autonomous Transaction is back
On 18 August 2015 21:18, Robert Haas wrote:
>This footnote goes to my point. It seems clear to me that having the
>autonomous transaction see the effects of the outer uncommitted
>transaction is a recipe for trouble. If the autonomous transaction
>updates a row and commits, and the outer transaction later aborts, the
>resulting state is inconsistent with any serial history. I'm fairly
>certain that's going to leave us in an unhappy place. Even more
>obviously, ending up with two committed row versions that are both
>updates of a single ancestor version is no good. So, I agree that this
>scenario should be an error.
>
>What I don't agree with is the idea that it should be the deadlock
>detector's job to throw that error. Rather, I think that when we
>examine the xmax of the tuple we can see - which is the original one,
>not the one updated by the outer transaction - we should check whether
>that XID belongs to an outer transaction. If it does, we should throw
>an error instead of trying to lock it. That way (1) the error message
>will be clear and specific to the situation and (2) we don't need a
>separate PGPROC for each autonomous transaction. The first of those
>benefits is agreeable; the second one is, in my opinion, a key design
>goal for this feature.

Yes, I agree with this. I was in favor of throwing an error all along, without involving the deadlock detector.

Thanks and Regards,
Kumar Rajeev Rastogi
Re: [HACKERS] Autonomous Transaction is back
On 03 August 2015 18:40, Merlin Moncure [mailto:mmonc...@gmail.com] wrote:
>On Sun, Aug 2, 2015 at 11:37 PM, Rajeev rastogi
><rajeev.rast...@huawei.com> wrote:
>> On 31 July 2015 23:10, Robert Haas wrote:
>>> I think we're going entirely down the wrong path here. Why is it ever
>>> useful for a backend's lock requests to conflict with themselves, even
>>> with autonomous transactions? That seems like an artifact of somebody
>>> else's implementation that we should be happy we don't need to copy.
>>
>> IMHO, since most locking is managed at the transaction level, not the
>> backend level, and we consider the autonomous transaction to be an
>> independent transaction, they may practically conflict. It is also
>> right, as you said, that there is no useful use case where an
>> autonomous transaction conflicts with the main (parent) transaction.
>> But we cannot take it for granted, as a user might make a mistake. So
>> at least we should have some mechanism to handle this rare case, for
>> which as of now I think throwing an error from the autonomous
>> transaction is one solution. Once the error is thrown from the
>> autonomous transaction, the main transaction may continue as it is (or
>> should the main transaction also abort?).
>
>hm. OK, what's the behavior of:
>
>BEGIN
>  UPDATE foo SET x = x + 1 WHERE foo_id = 1;
>
>  BEGIN WITH AUTONOMOUS TRANSACTION
>    UPDATE foo SET x = x + 1 WHERE foo_id = 1;
>  END;
>
>  RAISE EXCEPTION ...;
>EXCEPTION ...
>
>END;

It should throw an error (or something equivalent), as the second update will wait for a record lock that in this case will never be released until the first update's transaction finishes. So, catch-22.

>Also,
>*) What do the other candidate implementations do? IMO, compatibility
>should be the underlying design principle.

Oracle throws an error in such a case. But we can decide on whatever behavior we want to keep.

>*) What will the SQL only feature look like?

Similar to PL as mentioned in your example, we can provide the SQL-only feature also.

>*) Is the SPI interface going to be extended to expose AT?
I don't think at this point that there is any need to expose an SPI interface for this.

Thanks and Regards,
Kumar Rajeev Rastogi
Re: [HACKERS] Autonomous Transaction is back
On 31 July 2015 23:10, Robert Haas wrote:
>On Tue, Jul 28, 2015 at 6:01 AM, Craig Ringer <cr...@2ndquadrant.com> wrote:
>> That should be practical to special-case by maintaining a list of
>> parent transaction (virtual?) transaction IDs. Attempts to wait on a
>> lock held by any of those should fail immediately. There's no point
>> waiting for the deadlock detector since the outer tx can never
>> progress and commit/rollback to release locks, and it might not be
>> able to see the parent/child relationship from outside the backend
>> doing the nested tx anyway.
>
>I think we're going entirely down the wrong path here. Why is it ever
>useful for a backend's lock requests to conflict with themselves, even
>with autonomous transactions? That seems like an artifact of somebody
>else's implementation that we should be happy we don't need to copy.

IMHO, since most locking is managed at the transaction level, not the backend level, and we consider the autonomous transaction to be an independent transaction, they may practically conflict. It is also right, as you said, that there is no useful use case where an autonomous transaction conflicts with the main (parent) transaction. But we cannot take it for granted, as a user might make a mistake. So at least we should have some mechanism to handle this rare case, for which as of now I think throwing an error from the autonomous transaction is one solution. Once the error is thrown from the autonomous transaction, the main transaction may continue as it is (or should the main transaction also abort?).

Any other suggestion to handle this will be really helpful.

Thanks and Regards,
Kumar Rajeev Rastogi
Re: [HACKERS] Autonomous Transaction is back
On 28 July 2015 03:21, Josh Berkus wrote:
>On 07/27/2015 02:47 AM, Rajeev rastogi wrote:
>>> Why have any fixed maximum?
>>
>> Since we are planning to have nested autonomous transactions, a limit
>> is required so that resources can be controlled.
>
>Is there a particular reason why this limit wouldn't just be
>max_stack_depth?

We will need to allocate some initial resources in order to handle all nested autonomous transactions. So I think it is better to have a separate configuration parameter.

Thanks and Regards,
Kumar Rajeev Rastogi
Re: [HACKERS] Autonomous Transaction is back
On 28 July 2015 15:31, Craig Ringer wrote:
>> 2. It should be allowed to deadlock with the master transaction. We
>> need to work out a solution to avoid deadlock.
>
>The deadlock case in autonomous txs is a bit different. Assuming you
>don't intend to allow interleaving, where you can switch between
>transactions at will rather than just at begin/commit, the only way a
>deadlock can happen is when the outer tx holds a lock that the inner tx
>tries to acquire.
>
>That should be practical to special-case by maintaining a list of
>parent transaction (virtual?) transaction IDs. Attempts to wait on a
>lock held by any of those should fail immediately. There's no point
>waiting for the deadlock detector since the outer tx can never progress
>and commit/rollback to release locks, and it might not be able to see
>the parent/child relationship from outside the backend doing the nested
>tx anyway.

Thanks, that sounds like a good idea. I shall evaluate it.

Thanks and Regards,
Kumar Rajeev Rastogi
Re: [HACKERS] Autonomous Transaction is back
On 23 July 2015 21:04, Robert Haas wrote:
>On Thu, Jul 23, 2015 at 1:31 AM, Rajeev rastogi
><rajeev.rast...@huawei.com> wrote:
>> 2. It should be allowed to deadlock with master transaction. We need
>> to work out a solution to avoid deadlock.
>
>This sentence seems to contradict itself. I thought the consensus was
>that the transaction should NOT conflict with the master transaction.

Since we are saying the transaction is autonomous to the parent transaction, we cannot guarantee that it does not take any conflicting lock, unless the application is designed to avoid it. But yes, we should have a mechanism to deal with the possible deadlock.

>> 3. It can support multiple levels of nesting based on the
>> configuration (maybe max as 70).
>
>Why have any fixed maximum?

Since we are planning to have nested autonomous transactions, a limit is required so that resources can be controlled.

>> 2. The above commands can be issued either inside a procedure to run
>> a few statements of the procedure inside an autonomous transaction,
>> or even in stand-alone query execution.
>
>I think inside a procedure the autonomous transaction will need to be
>lexically scoped. You won't be able to do this, for example:
>
>BEGIN AUTONOMOUS TRANSACTION;
>FOR x IN SELECT ... LOOP
>    COMMIT;
>    BEGIN AUTONOMOUS TRANSACTION;
>END LOOP;

I am not sure how we would be able to restrict this. IMHO the user should be able to write this, especially since restricting it does not have any meaning from the user's perspective. Please let me know if I am missing something here.

>Rather you'd have to do something like this:
>
>FOR x IN SELECT .. LOOP
>    BEGIN WITH AUTONOMOUS TRANSACTION
>        do stuff
>    END;
>END LOOP;

Thanks and Regards,
Kumar Rajeev Rastogi
[HACKERS] Autonomous Transaction is back
After a few failed attempts to propose autonomous transactions earlier, I, along with Simon Riggs, would like to propose it again, with a completely different approach. We also discussed this feature at the PGCon 2015 Unconference Day; those who missed that discussion, please refer to https://wiki.postgresql.org/wiki/AutonomousTransactionsUnconference2015

Before jumping into the design and code proposal for this feature, Simon Riggs and I wanted to propose its behavior and usage, to keep everyone in the same boat. So we have summarized the behavior and usage of autonomous transactions, based on the discussion with community members at the PGCon 2015 Unconference Day:

Behavior of Autonomous Transaction:
1. The autonomous transaction is treated as a completely different transaction from the master transaction.
2. It should be allowed to deadlock with the master transaction. We need to work out a solution to avoid deadlock.
3. It can support multiple levels of nesting based on the configuration (maybe a max of 70).
4. The outer (i.e. main or upper autonomous) transaction is suspended while the inner autonomous transaction is running.
5. The outer transaction should not see data of the inner one until the inner is committed (a serializable upper transaction should not see it even after the inner transaction commits).

How to Use Autonomous Transaction:
1. We can issue an explicit command to start an autonomous transaction, as below:
   BEGIN AUTONOMOUS TRANSACTION (don't worry about keywords at this point)
   Do your work.
   COMMIT/ROLLBACK (will commit/rollback the autonomous transaction and return to the main transaction or upper autonomous transaction).
2. The above commands can be issued either inside a procedure, to put a few statements of the procedure inside an autonomous transaction, or even in stand-alone query execution.
3. We can make the whole procedure itself autonomous, which is similar to starting an autonomous transaction at the beginning of the procedure and committing/rolling back at the end of it.
There was another discussion on the Unconference Day about whether to implement COMMIT/ROLLBACK inside procedures or autonomous transactions. Our opinion is that COMMIT/ROLLBACK inside a procedure is somewhat different from an autonomous transaction: in the first case, once we commit inside the procedure, it also commits everything done before the call of the procedure. This is the behavior of Oracle. So in that case the user is required to be very careful not to perform, before the call of the procedure, any operation that is not yet intended to be committed inside it. So we prefer to implement autonomous transactions, which will not only be compatible with Oracle but also give a really strong, much-needed feature. I have not put the use-cases here, as we already agree about its strong use-cases.

Requesting everyone's opinion, based on which we can proceed to enhance/tune/re-write our design.

Thanks and Regards,
Kumar Rajeev Rastogi
Re: [HACKERS] 9.5 release notes
On 11 June 2015 09:45, Bruce Momjian wrote:
> I have committed the first draft of the 9.5 release notes. You can view the output here:
> http://momjian.us/pgsql_docs/release-9-5.html
> and it will eventually appear here:
> http://www.postgresql.org/docs/devel/static/release.html
> I am ready to make suggested adjustments, though I am traveling for conferences for the next ten days so there might be a delay in my replies.

The below performance improvement in the "General Performance" category is missing:

    Reduce btree scan overhead for < and > strategies
    For <, <=, > and >= strategies, mark the first scan key as already
    matched if scanning in an appropriate direction. If the index tuple
    contains no nulls we can skip the first re-check for each tuple.
    Author: Kumar Rajeev Rastogi
    Committer: Simon Riggs

Thanks and Regards,
Kumar Rajeev Rastogi
[HACKERS] Pluggable Parser
What is a Pluggable Parser?
It is an option for users to select a different kind of parser, to evaluate PostgreSQL for their business logic without much manual effort.

Why do we need it?
As of now, migration from other databases to PostgreSQL requires the manual effort of translating SQL and PL/SQL to PostgreSQL-equivalent SQL queries. Because of this manual effort in converting scripts, migration to PostgreSQL is considered to be very slow and sometimes de-motivating as well. So if we allow plugging different database syntaxes into PostgreSQL, it will be a strong motivator for many DBAs to try PostgreSQL.

How to do it?
This can be realized by supporting, in the parser, new SQL/stored-procedure syntaxes corresponding to each other database's syntax, and plugging in the one to be evaluated by putting the corresponding .so file in dynamic_library_path. The default will be PostgreSQL syntax.

Parser interface:

    /* Hook for plugins to get control in the parser */
    typedef List *(*parser_hook_type)(const char *str);
    parser_hook_type parser_hook = NULL;
    extern PGDLLIMPORT parser_hook_type parser_hook;

Make the parser entry point a function pointer (raw_parser), which can be loaded based on the parser type. parser_hook will be initialized with the proper function during shared library loading, which is done only during server startup. This way it can be ensured that only the one parser provided by the user is used in PostgreSQL. Each database-syntax-specific parser can be implemented as part of a contrib module, to keep it separate from the core code.

To start with, I am planning to (for the 2015-06 CommitFest):
1. Support the infrastructure to allow plugging.
2. Support Oracle SQL query syntaxes.

Please let me know if the community would be interested in this, or if there was already any discussion about this in the past. Please provide your opinions/suggestions.

Thanks and Regards,
Kumar Rajeev Rastogi
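The hook-dispatch pattern the proposal describes can be sketched as below. This is a minimal, self-contained illustration with stand-in types (ParseTree, builtin_raw_parser are invented for this sketch and are not the actual PostgreSQL API): a loadable parser module sets parser_hook at load time, and raw_parser() dispatches to it, falling back to the built-in parser when no hook is installed.

```c
#include <stddef.h>

/* Stand-in for the parse-tree type a real parser would build. */
typedef struct ParseTree
{
    const char *query;
    int         kind;           /* 0 = built-in grammar, other = plugin */
} ParseTree;

/* Hook for plugins to get control in the parser (as in the proposal). */
typedef ParseTree *(*parser_hook_type)(const char *str);

parser_hook_type parser_hook = NULL;    /* set by a contrib module at load time */

static ParseTree builtin_result = { NULL, 0 };

/* Stand-in for the core PostgreSQL grammar. */
static ParseTree *
builtin_raw_parser(const char *str)
{
    builtin_result.query = str;
    builtin_result.kind = 0;
    return &builtin_result;
}

/* Parser entry point: dispatch to the pluggable grammar when one is loaded. */
ParseTree *
raw_parser(const char *str)
{
    if (parser_hook != NULL)
        return parser_hook(str);
    return builtin_raw_parser(str);
}
```

A contrib module would simply assign its own function to parser_hook in its library-load callback; since loading happens once at server startup, exactly one parser is active for the lifetime of the server, as the proposal requires.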
Re: [HACKERS] Parallel Seq Scan
On 20 March 2015 17:37, Amit Kapila wrote:
> So the patches have to be applied in below sequence:
> HEAD Commit-id : 8d1f2390
> parallel-mode-v8.1.patch [2]
> assess-parallel-safety-v4.patch [1]
> parallel-heap-scan.patch [3]
> parallel_seqscan_v11.patch (Attached with this mail)

While I was going through this patch, I observed one invalid Assert in the function ExecInitFunnel, i.e.

    Assert(outerPlan(node) == NULL);

The outer node of a Funnel node is always non-NULL, and currently it will be a PartialSeqScan node. Maybe asserts were disabled while building the code, because of which this issue has not yet been observed.

Thanks and Regards,
Kumar Rajeev Rastogi
Re: [HACKERS] Parallel Seq Scan
On 25 March 2015 16:00, Amit Kapila wrote:
> Which version of patch you are looking at? I am seeing below code in ExecInitFunnel() in Version-11 to which you have replied.
> + /* Funnel node doesn't have innerPlan node. */
> + Assert(innerPlan(node) == NULL);

I was looking at version-10. I just checked version-11 and version-12 and found it is already fixed. I should have checked the latest version before sending the report…☺

Thanks and Regards,
Kumar Rajeev Rastogi
[HACKERS] Materialization is slower than non-materialized
During my routine work, I observed that execution of a plan having the inner node of a nested loop join as a Materialize node (on top of a SeqScan) is slower than with a non-materialized SeqScan node. This happens only if work_mem is not big enough to hold all tuples in memory. To make the test easy and fast, I set work_mem to 256kB. The results are as below:

===== With Material off =====
postgres=# set enable_material to off;
SET
Time: 0.225 ms
postgres=# select count(tbl.id1) from tbl, tbl2 where tbl.id1 < tbl2.id1;
  count
----------
 49995000
(1 row)
Time: 26674.299 ms
postgres=# explain select count(tbl.id1) from tbl, tbl2 where tbl.id1 < tbl2.id1;
                          QUERY PLAN
--------------------------------------------------------------
 Aggregate  (cost=2783478.33..2783478.34 rows=1 width=4)
   ->  Nested Loop  (cost=0.00..2700145.00 rows= width=4)
         Join Filter: (tbl.id1 < tbl2.id1)
         ->  Seq Scan on tbl  (cost=0.00..145.00 rows=1 width=4)
         ->  Seq Scan on tbl2  (cost=0.00..145.00 rows=1 width=4)
 Planning time: 0.120 ms
(6 rows)

===== With Material on =====
postgres=# set enable_material to on;
SET
Time: 0.222 ms
postgres=# select count(tbl.id1) from tbl, tbl2 where tbl.id1 < tbl2.id1;
  count
----------
 49995000
(1 row)
Time: 32839.627 ms
postgres=# explain select count(tbl.id1) from tbl, tbl2 where tbl.id1 < tbl2.id1;
                          QUERY PLAN
--------------------------------------------------------------
 Aggregate  (cost=1983648.33..1983648.34 rows=1 width=4)
   ->  Nested Loop  (cost=0.00..1900315.00 rows= width=4)
         Join Filter: (tbl.id1 < tbl2.id1)
         ->  Seq Scan on tbl  (cost=0.00..145.00 rows=1 width=4)
         ->  Materialize  (cost=0.00..235.00 rows=1 width=4)
               ->  Seq Scan on tbl2  (cost=0.00..145.00 rows=1 width=4)
 Planning time: 0.140 ms
(7 rows)

As per my analysis, the above result is aligned with our current design.

Materialize node:
- Cost calculation at plan time: if the result spills over to disk in the case of materialization, that cost is included in the total cost.
- Actual execution: the result is actually fetched from disk, even on re-scan.
Scan node:
- Cost calculation at plan time: the cost of a re-scan of the SeqScan node is considered the same as the first scan, which always assumes that the records are fetched from disk, and hence disk access cost is added (as we don't really know how much memory will be available for caching during execution).
- Actual execution: after the first scan, once all records are loaded into memory (provided shared_buffers is big enough), re-scans read the records from memory only, and hence are much faster.

So because of this, at planning time the cost of the Materialize node is less than that of the SeqScan node, but at execution time the SeqScan is faster because it fetches tuples from memory on re-scan. I am not sure if we can consider this a problem or not, but I just wanted to share it, as users generally expect materialization to be faster than non-materialization. Please provide your opinion. If we can do something about this, then I can take up this work.

Thanks and Regards,
Kumar Rajeev Rastogi

--
This e-mail and its attachments contain confidential information from HUAWEI, which is intended only for the person or entity whose address is listed above. Any use of the information contained herein in any way (including, but not limited to, total or partial disclosure, reproduction, or dissemination) by persons other than the intended recipient(s) is prohibited. If you receive this e-mail in error, please notify the sender by phone or email immediately and delete it!
Re: [HACKERS] Materialization is slower than non-materialized
On 23 March 2015 21:39, Robert Haas wrote:
> On Mon, Mar 23, 2015 at 6:01 AM, Rajeev rastogi <rajeev.rast...@huawei.com> wrote:
>> The cost of re-scan of the SeqScan node is considered the same as the first scan, which always assumes that the records are fetched from disk and hence disk access cost is added (as we don't know really how much memory will be available to cache during execution).
>
> That's a general problem not limited to materialize nodes. We might choose to do a heap-sort rather than a quick-sort, but it may turn out that the tapes we create end up in the OS buffer cache instead of on physical storage; in fact, it's probably the common case. Scans are costed using seq_page_cost and random_page_cost, but most of the time the random page cost will not be the cost of a head seek, because we'll find the data in the OS page cache. Some of the time it really will be a head seek, but we have no idea whether that will happen in any given case. The autovacuum cost delays have this problem too: a miss in shared buffers may really be a hit in the OS page cache, but we don't know.

Yes, I agree.

> This kind of disclaimer is inappropriate on a public mailing list. Don't send confidential information to public mailing lists. You probably don't have any legal right to control what happens to it after that, regardless of what you put in your email.

Sorry for this. Generally we delete this legal message before sending mail to the community, but somehow we missed doing so this time.

Thanks and Regards,
Kumar Rajeev Rastogi
[HACKERS] Multiple call of GetTransactionSnapshot in single flow
I have observed that in some places, such as exec_bind_message and exec_simple_query, though these two functions have already got a snapshot, before they call PortalStart the current snapshot gets popped off, and they then pass InvalidSnapshot as a parameter, because of which a snapshot is taken again inside PortalStart. In cases where many transactions are running, taking the snapshot multiple times may be very costly.

What is the reason for taking the snapshot multiple times?
1. Is this implementation to make sure the snapshot is at a more granular level?
2. Is it something to do with the current command id of the snapshot?
3. Or is there any other specific reason for this which I am not able to visualize?
4. Or am I missing something else?

If it is just reason 1, then maybe we can try to pass the same snapshot to PortalStart as taken in the caller; this can enhance performance in many cases. With this change, I did one small performance test on pgbench with prepared queries (pgbench select) with 16 users, and observed a performance benefit of 10%.

Please provide your opinion.

Thanks and Regards,
Kumar Rajeev Rastogi
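The proposed change can be illustrated with a toy model. All names below are invented stand-ins (Snapshot, get_transaction_snapshot, portal_start), not the real PostgreSQL functions: the idea is simply that the caller hands down the snapshot it already holds, and the callee acquires a fresh one only when none was supplied (the InvalidSnapshot path today).

```c
#include <stddef.h>

/* Toy snapshot: just a serial number so we can count acquisitions. */
typedef struct Snapshot
{
    int serial;
} Snapshot;

static int snapshots_taken = 0;

/* Stand-in for GetTransactionSnapshot(): each call is "expensive". */
static Snapshot *
get_transaction_snapshot(void)
{
    static Snapshot snap;
    snap.serial = ++snapshots_taken;
    return &snap;
}

/*
 * Stand-in for PortalStart(): with the proposed change, a supplied
 * snapshot is reused as-is; only a NULL (InvalidSnapshot) argument
 * forces a second, redundant acquisition.
 */
static Snapshot *
portal_start(Snapshot *supplied)
{
    if (supplied != NULL)
        return supplied;                /* reuse caller's snapshot: no extra cost */
    return get_transaction_snapshot();  /* current behaviour */
}
```

Under this model the caller's existing snapshot flows through unchanged, which is exactly the saving the 10% pgbench number above is attributed to; whether it is safe hinges on the command-id question raised in point 2.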
Re: [HACKERS] Index scan optimization
On 17 November 2014 16:05, Simon Riggs wrote:
> I confirm 5% improvement in both short and long index scans, on the least beneficial datatype. Looks likely to be a very positive win overall.

Thanks a lot for the testing and confirmation.

> The language used in the comments is not clear enough. I'll do my best to improve that, then look to commit this in about 5 hours.

Thanks a lot for the support. Please let me know if I also need to add something.

Thanks and Regards,
Kumar Rajeev Rastogi
Re: [HACKERS] Index scan optimization
On 26 October 2014 10:42, Haribabu Kommi wrote:
> Hi, I reviewed the index scan optimization patch; the following are the observations.
> - Patch applies cleanly.
> - Compiles without warnings.
> - All regression tests pass.
> There is a good performance gain with the patch in almost all scenarios.

Thanks for reviewing.

> I have a question regarding the setting of the key's matched flag. Only the first key was set as matched, even if we have multiple index conditions. Is there any reason behind that?

Actually, keysCount can be more than one only if the key strategy is BTEqualStrategyNumber (i.e. the '=' condition). But our optimization is only for the '>' or '>=' conditions. So adding code to set the flag for all keys is of no use. Let me know if I am missing anything here.

> If any volatile function is present in the index condition, the index scan itself is not chosen. Is there any need to handle that, similar to NULLs?

Do you mean to ask whether we should add any special handling for volatile functions? If yes, then as you said, since an index scan itself is not selected for such functions, I feel we don't need to add anything for that. Any opinion?

Thanks and Regards,
Kumar Rajeev Rastogi
Re: [HACKERS] Index scan optimization
On 22 September 2014 18:51, Stephen Frost wrote:
> * Rajeev rastogi (rajeev.rast...@huawei.com) wrote:
>> Thanks, I shall start to prepare a patch for this optimization and share it in 1 or 2 days.
>
> This sounded interesting to me also - please be sure to put it on the open commitfest once you have posted the patch.

Thanks, I have added it to the next CommitFest, i.e. the October CommitFest.

Thanks and Regards,
Kumar Rajeev Rastogi
Re: [HACKERS] make pg_controldata accept -D dirname
On 24 September 2014 17:15, Michael Paquier wrote:
> On Wed, Sep 24, 2014 at 7:59 PM, Abhijit Menon-Sen <a...@2ndquadrant.com> wrote:
>> I can never remember that pg_controldata takes only a dirname rather than -D dirname, and I gather I'm not the only one. Here's a tiny patch to make it work either way. As far as I can see, it doesn't break anything, not even if you have a data directory named -D.
>
> +1. I forget it all the time as well...

+1. I always get confused when pg_controldata does not work with -D.
Re: [HACKERS] Index scan optimization
On 22 September 2014 19:17, Heikki Linnakangas wrote:
> On 09/22/2014 04:45 PM, Tom Lane wrote:
>> Heikki Linnakangas <hlinnakan...@vmware.com> writes:
>>> On 09/22/2014 07:47 AM, Rajeev rastogi wrote:
>>>> So my proposal is to skip the condition check on the first scan key condition for every tuple.
>>> The same happens in a single-column case. If you have a query like SELECT * FROM tbl2 where id2 > 'a', once you've found the start position of the scan, you know that all the rows that follow match too.
>> ... unless you're doing a backwards scan.
> Sure. And you have to still check for NULLs. Have to get the details right..

I have finished implementing the discussed optimization. I got a performance improvement of around 30% on the schema and data shared in the earlier mail. I also tested the index scan cases where our optimization does not apply, and observed that those queries are unaffected by this change.

Change details:
I have added a new flag SK_BT_MATCHED as part of sk_flags (ScanKey structure); the value used for it is 0x0004, which was unused. Inside the function _bt_first, once we finish finding the start scan position based on the first key, I append the flag SK_BT_MATCHED to the first key. Then in the function _bt_checkkeys, during the key comparison, I check whether the key has the SK_BT_MATCHED flag set; if yes, there is no need for further comparison. But if the tuple has a NULL value, then even if this flag is set, we continue with the comparison (this handles Heikki's point about checking NULLs).

I will add this patch to the next CommitFest. Please let me know your feedback.

Thanks and Regards,
Kumar Rajeev Rastogi

Attachment: index_scan_opt.patch
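The key-check logic described above can be modelled in miniature. This is a simplified sketch, not the patch's actual nbtree code -- the struct, the toy ">" comparison, and check_key are stand-ins, and only the SK_BT_MATCHED value (0x0004) comes from the mail: once _bt_first has located the start position using the first scan key, that key is tagged SK_BT_MATCHED and later tuples skip its comparison, except when the attribute is NULL, in which case the full check still runs.

```c
#include <stdbool.h>

#define SK_BT_MATCHED 0x0004    /* value from the patch description */

/* Drastically simplified scan key: one int operand, "attr > operand". */
typedef struct ScanKeyData
{
    int sk_flags;
    int sk_argument;
} ScanKeyData;

static bool
check_key(const ScanKeyData *key, int attr, bool attr_is_null)
{
    /* Flag set and value not NULL: already proven by the start-position search. */
    if ((key->sk_flags & SK_BT_MATCHED) && !attr_is_null)
        return true;

    if (attr_is_null)
        return false;           /* NULLs never satisfy the qual */

    return attr > key->sk_argument;   /* the comparison the flag lets us skip */
}
```

The saving is exactly the skipped comparison: for a varchar first column that comparison is a varstr_cmp call per tuple, which is where the ~661M saved instructions reported earlier in the thread come from.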
Re: [HACKERS] Index scan optimization
On 22 September 2014 12:35, Heikki Linnakangas wrote:
>> I have observed a scope of considerable performance improvement in the case of indexes by a very minor code change. Consider the below schema:
>> create table tbl2(id1 int, id2 varchar(10), id3 int);
>> create index idx2 on tbl2(id2, id3);
>> Query as: select count(*) from tbl2 where id2 > 'a' and id3 > 99;
>> As per the current design, it takes the following steps to retrieve index tuples:
>> 1. Find the scan start position by searching for the first position in the BTree as per the first key condition, i.e. as per id2 > 'a'.
>> 2. Then it fetches each tuple from the position found in step 1.
>> 3. For each tuple, it matches all scan key conditions; in our example it matches both scan key conditions.
>> 4. If the conditions match, it returns the tuple; otherwise the scan stops.
>> Now the problem here is that the first scan key condition has already been matched to find the scan start position (step 1), so it is obvious that any further tuple will also match the first scan key condition (as records are sorted). So comparing the first scan key condition again in step 3 seems to be redundant.
>> So my proposal is to skip the condition check on the first scan key condition for every tuple.
>
> The same happens in a single-column case. If you have a query like SELECT * FROM tbl2 where id2 > 'a', once you've found the start position of the scan, you know that all the rows that follow match too.

Very much true.

>> I would like to submit the patch for this improvement. Please provide your feedback. Also let me know if I am missing something.
>
> Yeah, sounds like a good idea. This scenario might not arise very often, but it should be cheap to check, so I doubt it will add any measurable overhead to the cases where the optimization doesn't help.

Thanks, I shall start to prepare a patch for this optimization and share it in 1 or 2 days.

Thanks and Regards,
Kumar Rajeev Rastogi
[HACKERS] Index scan optimization
I have observed a scope of considerable performance improvement in the case of indexes by a very minor code change. Consider the below schema:

create table tbl2(id1 int, id2 varchar(10), id3 int);
create index idx2 on tbl2(id2, id3);

Query as: select count(*) from tbl2 where id2 > 'a' and id3 > 99;

As per the current design, it takes the following steps to retrieve index tuples:
1. Find the scan start position by searching for the first position in the BTree as per the first key condition, i.e. as per id2 > 'a'.
2. Then it fetches each tuple from the position found in step 1.
3. For each tuple, it matches all scan key conditions; in our example it matches both scan key conditions.
4. If the conditions match, it returns the tuple; otherwise the scan stops.

Now the problem here is that the first scan key condition has already been matched to find the scan start position (step 1), so it is obvious that any further tuple will also match the first scan key condition (as records are sorted). So comparing the first scan key condition again in step 3 seems to be redundant.

So my proposal is to skip the condition check on the first scan key condition for every tuple. I have prototype code to see the result. Below is some test data as per the query attached with this mail:

Time:
Original code took: 3621 ms
Optimized code took: 2576 ms
So the overall performance improvement is around 29%.

Instructions:
Original code took: 20057 M
Optimized code took: 14557 M
So the overall instruction improvement is around 27%.

Also, I observed that the function varstr_cmp alone took around 661M instructions, which is completely saved in the optimized code.

This optimization can be extended:
1. To the case where, once all scan keys match the condition, there is no need to check the conditions for further tuples for any scan key; just keep returning the tuples.
2. The above approach can be used for the '<' operator also, by finding the position of the last matching tuple.

I would like to submit the patch for this improvement. Please provide your feedback.
Also let me know if I am missing something.

Thanks and Regards,
Kumar Rajeev Rastogi

--Schema
create table tbl2(id1 int, id2 varchar(10), id3 int);
create index idx2 on tbl2(id2, id3);

--Procedure to insert 1M data:
create or replace function insert_data(count1 int, count2 int) returns void AS $$
Begin
    for i IN 1..count1 loop
        insert into tbl2 values(i, 'a', i);
    end loop;
    for i IN count1+1..count2 loop
        insert into tbl2 values(i, 'b', i);
    end loop;
End;
$$ language plpgsql;

select insert_data(99, 100);

--Procedure to select data 1000 times (1000 times selected to make data more appropriate.)
create or replace function select_data(count1 int) returns void AS $$
declare
    x int;
Begin
    for i IN 1..count1 loop
        select count(*) into x from tbl2 where id2 > 'a' and id3 > 99;
    end loop;
End;
$$ language plpgsql;

select select_data(1000);

--Result (ms):
            Optimized    Original
Reading-1   2579.27      3621.82
Reading-2   2573.82      3618.29
Reading-3   2575.08      3625.16
Average     2576.06      3621.76

Overall improvement: 29%

Instructions:
Original code took: 20057 M instructions
Optimized code took: 14557 M instructions
So the overall instruction improvement is around 27%.
Re: [HACKERS] Support for N synchronous standby servers
On 23 August 2014 11:22, Michael Paquier wrote:
>> 2. The logic of deciding the highest-priority one seems to be incorrect.
>> Assume s_s_num = 3, s_s_names = 3,4,2,1, and the standby nodes are in this order: 1,2,3,4,5,6,7.
>> As per the logic in the patch, node 4 with priority 2 will not be added to the list, whereas 1,2,3 will be added. The problem is that the priority tracked for the next comparison is not the highest priority as of that iteration; it is just the priority of the last node added to the list. So it may happen that a node with higher priority is still outside the list while we are comparing against some other, smaller priority.
>
> Fixed. Nice catch! Actually, by re-reading the code I wrote yesterday I found that the fix in v6 for that is not correct. That's really fixed with v7 attached.

I have done some more review; below are my comments:

1. There are currently two loops on *num_sync. Can we simplify the function SyncRepGetSynchronousNodes by moving the priority calculation inside the upper loop?

    if (*num_sync == allowed_sync_nodes)
    {
        for (j = 0; j < *num_sync; j++)
        {

Anyway, we require the priority only if the *num_sync == allowed_sync_nodes condition matches. So in this loop itself we can calculate the priority as well as do the assignment of new standbys with lower priority. Let me know if you see any issue with this.

2. Comment inside the function SyncRepReleaseWaiters:

    /*
     * We should have found ourselves at least, except if it is not expected
     * to find any synchronous nodes.
     */
    Assert(num_sync > 0);

I think "except if it is not expected to find any synchronous nodes" is not required, as reaching this point means at least this node is synchronous.

3. The documentation says that s_s_num should be less than max_wal_senders, but code-wise there is no protection for the same. IMHO s_s_num should be less than or equal to max_wal_senders; otherwise COMMIT will never return to the console, without the user knowing why.
I see that some discussion has happened regarding this, but I think just adding documentation for this is not enough. I am not sure what issue was observed with adding the check during GUC initialization, but if there is an unavoidable issue during GUC initialization, then can't we try to add the check at a later point?

4. Similarly for the interaction between the parameters s_s_names and s_s_num: I see some discussion has happened regarding this, and it is acceptable to have s_s_num larger than the number of names in s_s_names. But I was thinking we should at least give some notice message to the user in such a case, along with some documentation.

config.sgml
5. "At any one time there will be at a number of active synchronous standbys": this sentence is not proper.
6. "When this parameter is set to literal 0, all the standby nodes will be considered as asynchronous." Can we make this: "When this parameter is set to literal 0, all the standby nodes will be considered asynchronous, irrespective of the value of synchronous_standby_names."
7. "Are considered as synchronous the first elements of xref linkend=guc-synchronous-standby-names in number of xref linkend=guc-synchronous-standby-num that are connected." The start of this sentence looks to be incomplete.

high-availability.sgml
8. "Standbys listed after this will take over the role of synchronous standby if the first one should fail." Should we not make it: "Standbys listed after this will take over the role of synchronous standby if any of the first synchronous_standby_num standbys fails."

Let me know in case something is not clear.

Thanks and Regards,
Kumar Rajeev Rastogi
Re: [HACKERS] Support for N synchronous standby servers
On 09 August 2014 11:33, Michael Paquier wrote:
> Please find attached a patch to add support of synchronous replication for multiple standby servers. This is controlled by the addition of a new GUC parameter called synchronous_standby_num, that makes the server wait for transaction commit on the first N standbys defined in synchronous_standby_names. The implementation is really straight-forward, and has just needed a couple of modifications in walsender.c for pg_stat_get_wal_senders and syncrep.c.

I have just started looking into this patch. Please find below my first level of observations on the patch:

1. The allocation of memory for sync_nodes in the function SyncRepGetSynchronousNodes should be based on allowed_sync_nodes instead of max_wal_senders, as anyway we are not going to store more sync standbys than allowed_sync_nodes.

    sync_nodes = (int *) palloc(allowed_sync_nodes * sizeof(int));

2. The logic of deciding the highest-priority one seems to be incorrect.
Assume s_s_num = 3, s_s_names = 3,4,2,1, and the standby nodes are in this order: 1,2,3,4,5,6,7.
As per the logic in the patch, node 4 with priority 2 will not be added to the list, whereas 1,2,3 will be added. The problem is that the priority tracked for the next comparison is not the highest priority as of that iteration; it is just the priority of the last node added to the list. So it may happen that a node with higher priority is still outside the list while we are comparing against some other, smaller priority.

3. Can we optimize the function SyncRepGetSynchronousNodes in such a way that it gets the number of standby nodes from s_s_names itself? With this it will be possible to stop scanning the moment we get the first s_s_num potential standbys.

Thanks and Regards,
Kumar Rajeev Rastogi
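The selection bug in point 2 can be illustrated with a small, self-contained sketch (this is illustrative code, not the patch's actual SyncRepGetSynchronousNodes): keep the N standbys with the lowest priority values (a lower number means higher priority). The bug comes from comparing new candidates against the priority of the *last node added* rather than the *worst priority currently kept*; recomputing the worst kept slot before each replacement avoids it.

```c
/*
 * Fill out[] with the indexes of the nsync nodes having the lowest
 * (i.e. best) priority values in prio[].  Returns how many were kept.
 */
static int
pick_sync_standbys(const int *prio, int nnodes, int *out, int nsync)
{
    int count = 0;

    for (int i = 0; i < nnodes; i++)
    {
        if (count < nsync)
        {
            out[count++] = i;           /* still room: take every candidate */
            continue;
        }

        /* Find the slot holding the worst (largest) priority kept so far. */
        int worst = 0;
        for (int j = 1; j < nsync; j++)
            if (prio[out[j]] > prio[out[worst]])
                worst = j;

        /* A better candidate replaces the current worst, never an arbitrary one. */
        if (prio[i] < prio[out[worst]])
            out[worst] = i;
    }
    return count;
}
```

With the email's example (s_s_names = 3,4,2,1, so the listed nodes carry priorities 4, 3, 1, 2 in connection order), this keeps the nodes with priorities 1, 2 and 3, including node 4 that the buggy version dropped.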
[HACKERS] Unwanted LOG during recovery of DROP TABLESPACE REDO
I found and fixed a bug that causes recovery (crash recovery, PITR) to emit an unwanted LOG message if the tablespace symlink is not found during the processing of the DROP TABLESPACE redo:

    LOG: could not remove symbolic link pg_tblspc/16384: No such file or directory

To reproduce the issue:
1. Start the server.
2. Create a tablespace.
3. Perform a checkpoint.
4. Drop the tablespace.
5. Stop the server using immediate mode.
6. Start the server: at this stage, recovery emits the log message mentioned above.

The reason is that DROP TABLESPACE has already removed the symlink, and removal is then tried again during recovery. As it is very much possible that DROP TABLESPACE succeeded and cleaned up the file before the server crashed, this should be considered a valid scenario, and there is no need to emit any LOG in such a case. The processing of the CREATE TABLESPACE redo already handles the analogous case.

I will add this to the 2014-08 CF for review.

Thanks and Regards,
Kumar Rajeev Rastogi

Attachment: rec_issue_with_drop_tblspc_redo.patch
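The fix's idea can be sketched in plain C (this is an illustrative model, not the server's tablespace code, and remove_symlink_for_redo is an invented name): during redo of DROP TABLESPACE, a missing symlink is an expected state, because the original transaction may already have removed it before the crash, so ENOENT is tolerated silently instead of producing a LOG message; any other failure is still reported.

```c
#include <errno.h>
#include <stdbool.h>
#include <stdio.h>

/*
 * Remove the tablespace symlink while replaying DROP TABLESPACE.
 * Returns true when the on-disk state matches what redo wants
 * (link gone), whether we removed it now or it was already absent.
 */
static bool
remove_symlink_for_redo(const char *path)
{
    if (remove(path) == 0)
        return true;            /* removed now: redo did the cleanup */

    if (errno == ENOENT)
        return true;            /* already gone before the crash: valid, stay silent */

    perror(path);               /* any other failure is still worth reporting */
    return false;
}
```

This mirrors how the CREATE TABLESPACE redo path already treats "object already in the expected state" as success rather than a loggable anomaly.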
Re: [HACKERS] [RFC: bug fix?] Connection attempt block forever when the synchronous standby is not running
On 04 July 2014 19:29, MauMau Wrote:

[How to fix] Of course, adding -o '-c synchronous_commit=local' or -o '-c synchronous_standby_names=' to pg_ctl start in the recovery script would prevent the problem. But isn't there anything to fix in PostgreSQL? I think the doc needs improvement so that users won't misunderstand that only write transactions would block at commit.

As of now there is no solution for this in PostgreSQL, but I had submitted a patch, Standalone synchronous master, in the 9.4 2014-01 CommitFest, which was rejected because of some issues. That patch was meant to degrade the synchronous level of the master if all synchronous standbys are down. I plan to resubmit it with a better design sometime in 9.5.

Thanks and Regards,
Kumar Rajeev Rastogi
Re: [HACKERS] Autonomous Transaction (WIP)
On 01 July 2014 12:00, Amit Kapila Wrote:
On Tue, Jul 1, 2014 at 11:46 AM, Rajeev rastogi rajeev.rast...@huawei.com wrote:
On 30 June 2014 22:50, Pavel Stehule Wrote: I didn't find a related message. ?

I think there has been some confusion: the design idea was never rejected, but yes, there were a few pieces of feedback/concern, which I had clarified. Also, some of the other concerns are already fixed in the latest patch.

Simon has mentioned that exactly this idea was rejected at PGCon two years back. Please refer to the mail below: http://www.postgresql.org/message-id/bf2827dcce55594c8d7a8f7ffd3ab7713dde1...@szxeml508-mbx.china.huawei.com As far as I can see, you never came back with a different solution.

Yeah, right. I tried to search the archived mails to get the details of that discussion, but I could not find anything regarding the design. So I am not sure how I should make my solution different from the earlier one, as the earlier solution is not accessible to me. Any help regarding this would be a great help to me.

Also, from the current autonomous transaction discussion thread (including ca+u5nmkeum4abrqbndlyt5ledektae8rbiyw3977yhmeowq...@mail.gmail.com), I have summarized all the important feedback below, along with the resolution adopted:

1. Pavel Stehule (07-04-2014): -1 for Oracle syntax - it is hardly consistent with Postgres.
Changed the syntax to "START AUTONOMOUS TRANSACTION".

2. Pavan (10-04-2014): Make autonomous transaction properties independent of the main transaction.
Made all properties of the autonomous transaction (including read-only) independent of the main transaction, except the isolation level, which I did not find very useful to keep different. But if others' opinion differs, we can make this property independent as well.

3. Alvaro Herrera (09-04-2014): Autonomous transactions to have their own separate proc entry.
The conclusion was not to have a separate proc entry for autonomous transactions, and that is what was done.

4. Tom Lane (09-04-2014): The point being that you need to change both pg_subtrans and pg_clog to make that state transition.
This is handled for autonomous transactions.

5. Robert Haas (09-04-2014): Not in favour of the current design of maintaining lock masks for autonomous transactions.
I had replied to this mail explaining why this design was chosen, but if the design for this part is still not acceptable, I can rework it to make it better. To do so, I would be very happy to have more discussion, to get concrete feedback and direction for improvement.

6. Tom Lane (09-04-2014): No justification for distinguishing normal and autonomous transactions at this level (the locking level).
I had replied to this earlier as well. The reason for distinguishing at this level is to handle any kind of deadlock possibility between the main and the autonomous transaction. Deadlock handling between the main and autonomous transaction was one of the requirements discussed at PGCon 2012 as part of the autonomous transaction discussion. Please let me know if I am missing something in this.

All of the above mentioned changes are included in the latest patch shared. Please let me know if I have missed any other important points from the earlier discussion; I would like to address those as well.

Have you checked the discussion in the developer meeting notes? Please check it at the link below: http://wiki.postgresql.org/wiki/PgCon_2012_Developer_Meeting#Autonomous_Transactions

From the discussion, I am able to make out two important points:
1. The main transaction and the autonomous transaction should be independent and can conflict. This is already included in our latest patch.
2. Utility commands like VACUUM and CREATE INDEX CONCURRENTLY should work from an autonomous transaction. Neither of these utility commands is supported even inside a main transaction block, so they do not work within an autonomous transaction either.
Any opinion about this? Please let me know if I have missed any points from the link given.

So I wanted to have this patch in the commitfest application, so that we can have a healthy discussion and rectify all the issues. But now I see that this patch has already been moved to the rejected category, which will put a brake on further review.

I believe this patch should ideally have been marked as Returned with feedback, as you already got feedback long back and never came up with a solution for it.

Since this patch is very big and complex, it is better that we continue discussing it from the first CommitFest itself, so that we get ample time to hear everyone's opinion and then address all possible issues. Any opinions/suggestions are welcome. Also let me know if I have missed something.

Thanks and Regards,
Kumar Rajeev Rastogi
Re: [HACKERS] Add the number of pinning backends to pg_buffercache's output
On 23 June 2014 15:22, Andres Freund Wrote:

This may be harmless, but pinning_backends should be defined as int4 rather than int8, because BufferDesc->refcount is just defined as unsigned and it's converted to Datum by Int32GetDatum().

Well, in theory a uint32 can't generally be converted to an int32. That's why I chose an int64, because it's guaranteed to have sufficient range. It's fairly unlikely to have that many pins, but using an int64 seems cheap enough here.

But since we have declared pinning_backends as int8, we should use Int64GetDatum to form the tuple datum. Using Int32GetDatum will lead to a problem especially on WIN32 or any other platform where int8 is not pass-by-value (i.e. where FLOAT8PASSBYVAL is 0). On such a platform, while forming the tuple, the type will be marked as pass-by-reference, but the raw value (rather than its address) will be passed as the Datum, which will lead to a crash. I have done testing to verify this: I can observe the crash on win32, whereas it works fine on Linux.

Also, can we change the description of the function pg_buffercache_pages to include pinning_backends in the description as well?

Thanks and Regards,
Kumar Rajeev Rastogi
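The truncation hazard can be illustrated without PostgreSQL headers. This is a hedged, standalone sketch: NarrowDatum and wrong_int32_get_datum are invented stand-ins, not real PostgreSQL symbols. On a platform where Datum is only 32 bits wide, an int8 value must go through Int64GetDatum, which passes it by reference; forcing it through a 32-bit conversion silently drops the high bits (and the tuple-forming code may then dereference a raw integer as a pointer):

```c
#include <assert.h>
#include <stdint.h>

/* Stand-in for a 32-bit-wide Datum, as on a platform where
 * FLOAT8PASSBYVAL is 0 and int8 is pass-by-reference. */
typedef uint32_t NarrowDatum;

/* What Int32GetDatum effectively does to an int64 value on such a
 * platform: the high 32 bits are lost. */
static NarrowDatum
wrong_int32_get_datum(int64_t v)
{
    return (NarrowDatum) v;
}
```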
Re: [HACKERS] Autonomous Transaction (WIP)
On 30 June 2014 22:50, Pavel Stehule Wrote:

2014-06-30 12:38 GMT+02:00 Abhijit Menon-Sen a...@2ndquadrant.com: If I understand correctly, the design of this patch has already been considered earlier and rejected. So I guess the patch should also be marked rejected?

I didn't find a related message. ?

I think there has been some confusion: the design idea was never rejected, but yes, there were a few pieces of feedback/concern, which I had clarified. Also, some of the other concerns are already fixed in the latest patch. So I wanted to have this patch in the commitfest application, so that we can have a healthy discussion and rectify all the issues. But now I see that this patch has already been moved to the rejected category, which will put a brake on further review. So is there any way to bring this patch back and continue reviewing it? Please let me know if there is any issue or if I am missing something.

Thanks and Regards,
Kumar Rajeev Rastogi
Re: [HACKERS] Autonomous Transaction (WIP)
On 01 July 2014 12:26, Pavel Stehule Wrote:

Have you checked the discussion in the developer meeting notes? Please check it at the link below: http://wiki.postgresql.org/wiki/PgCon_2012_Developer_Meeting#Autonomous_Transactions

Are these notes still valid?

* Why should autonomous transactions be tied closely to functions? We can implement AT as a first step, and integration of AT into stored procedures can be the next step.

We have implemented AT along the lines of subtransactions. We have also integrated AT with stored procedures, i.e. we can create an autonomous transaction inside a stored procedure, and it can also be committed there.

* When an autonomous transaction is independent of the parent transaction, locks of the parent and the autonomous transaction should conflict.

Yes, our implementation makes the autonomous transaction independent of the main transaction, and hence, per our design, the parent (main) transaction and the autonomous transaction may conflict. For this we have implemented a deadlock detection mechanism between an autonomous transaction and its parent transaction.

I thought about integration with PL/pgSQL, and I don't think such close integration between autonomous transactions and procedures is optimal. A more practical design makes an autonomous transaction similar to a subtransaction. Then we could simply write code like:

BEGIN
  .. some code
WHEN OTHERS THEN
  -- I would like to write to the log permanently
  BEGIN AUTONOMOUS
    INSERT INTO log VALUES(..);
  WHEN OTHERS
    RAISE WARNING 'Cannot write to log ..';
    RAISE EXCEPTION ' ...'  -- forward the exception from the autonomous transaction up to the parent transaction
  END
END;

Now I am thinking that the PL/SQL design of autonomous transactions is relatively limited and is not the best one to follow.

Yes, as mentioned above, our implementation of autonomous transactions follows the subtransaction model. With our approach, we can use an autonomous transaction in a procedure as given below:

BEGIN
  .. some code
WHEN OTHERS THEN
  -- I would like to write to the log permanently
  START AUTONOMOUS TRANSACTION
    INSERT INTO log VALUES(..);
  COMMIT;
WHEN OTHERS
  RAISE WARNING 'Cannot write to log ..';
  RAISE EXCEPTION ' ...'  -- forward the exception from the autonomous transaction up to the parent transaction
END
END;

Please let me know if I have missed answering any of your queries.

Thanks and Regards,
Kumar Rajeev Rastogi
Re: [HACKERS] psql: show only failed queries
On 26 June 2014 11:53, Samrat Revagade Wrote:

I am sending an updated patch - the buggy statement is printed via the more logical psql_error function instead of printf.

Thank you for updating the patch, I really appreciate your efforts. Now everything is good from my side:
* it applies cleanly to the current git master
* it includes the necessary docs
* I think it is a very good and necessary feature.
If Kumar Rajeev Rastogi does not have any extra comments, then I think the patch is ready for committer.

I have reviewed this patch. Please find my review comments below:
1. A command start-up option (like -a/--echo-all for ECHO=all) is not provided for the new functionality.
2. The new command start-up option should be added to psql --help as well as to the documentation.

Also, as I understand it, this new option is a kind of subset of the existing option (ECHO=query), so shouldn't we display the query string in the same format as it was printed earlier? Though I also feel that prefixing the query with the word STATEMENT will be helpful for grep, at the same time I am worried about inconsistency with the existing option.

Thanks and Regards,
Kumar Rajeev Rastogi
Re: [HACKERS] psql: show only failed queries
On 30 June 2014 12:24, Pavel Stehule Wrote:

I have reviewed this patch. Please find my review comments below:
1. A command start-up option (like -a/--echo-all for ECHO=all) is not provided for the new functionality.

Not all options entered via psql variables have a corresponding psql option. I plan to add a new description to the --help-variables list. If it is necessary I can add a long option --echo-errors; I didn't find a good character for a short option. Any idea?

But since the new option we are adding follows the track of the existing options, it would be better to add a start-up option for this one as well. The long option --echo-errors seems fine to me too. For the short option, I think we can use "-b", standing for blunder. That is the closest one I could think of.

2. The new command start-up option should be added to psql --help as well as to the documentation.

Depends on the previous point.

Right.

Also, as I understand it, this new option is a kind of subset of the existing option (ECHO=query), so shouldn't we display the query string in the same format as it was printed earlier? Though I also feel that prefixing the query with the word STATEMENT will be helpful for grep, at the same time I am worried about inconsistency with the existing option.

This is the question, and I don't have a strong feeling about what should be preferred. Still, I am inclined to prefer the variant with the STATEMENT prefix. The -a mode is used for a different purpose than the show-errors-only mode, and the prefixed output is much more consistent with a log entry displaying an error. So I agree there is a potential inconsistency (though the output is defined nowhere), but this output is more practical when you are focused on error processing.

Yeah, right. I just wanted to raise the point to provoke thought and see if anyone has a different opinion. If there is no objection from others, we can go ahead with the current prefixing approach.

Thanks and Regards,
Kumar Rajeev Rastogi
Re: [HACKERS] Proposal for CSN based snapshots
On 13 May 2014 14:06, Heikki Linnakangas Wrote:

The core of the design is to store the LSN of the commit record in pg_clog. Currently, we only store 2 bits per transaction there, indicating if the transaction committed or not, but the patch will expand it to 64 bits, to store the LSN. To check the visibility of an XID in a snapshot, the XID's commit LSN is looked up in pg_clog, and compared with the snapshot's LSN.

Won't it be a bit inefficient to look into pg_clog to read the XID's commit LSN for every visibility check?

Maybe. If no hint bit is set on the tuple, you have to check the clog anyway to determine if the tuple is committed. And for XIDs older than xmin or newer than xmax, you don't need to check pg_clog. But it's true that for tuples with the hint bit set, and xmin < XID < xmax, you have to check the pg_clog in the new system, when currently you only need to do a binary search of the local array in the snapshot. My gut feeling is that it won't be significantly slower in practice. If it becomes a problem, some rearrangement of the pg_clog code might help, or you could build a cache in SnapshotData of XID->CSN mappings that you've already looked up. So I don't think that's going to be a show-stopper.

Yes, definitely it should not be a show-stopper. This can be optimized later by the methods you mentioned, and also by some cut-off technique by which we decide that any XID beyond a certain range will always be visible, thereby avoiding the pg_clog lookup.

Thanks and Regards,
Kumar Rajeev Rastogi
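The visibility rule being debated above can be sketched as a toy model. This is a hedged, illustrative implementation: the names, the array-backed "clog", and the fast paths are stand-ins for the real pg_clog and snapshot machinery, and the sketch assumes aborted transactions are filtered out earlier (e.g. via hint bits):

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

typedef uint32_t Xid;
typedef uint64_t Lsn;

#define MAX_XID     256
#define INVALID_LSN 0           /* not yet committed */

/* toy pg_clog: commit LSN per xid, INVALID_LSN while in progress */
static Lsn clog_commit_lsn[MAX_XID];

/* An xid is visible to a snapshot iff it committed at an LSN no later
 * than the snapshot's LSN.  The xmin/xmax fast paths avoid the clog
 * lookup for clearly-old and clearly-new xids, as discussed above. */
static bool
xid_visible(Xid xid, Xid snap_xmin, Xid snap_xmax, Lsn snap_lsn)
{
    if (xid < snap_xmin)
        return true;            /* finished before any in-progress xact */
    if (xid >= snap_xmax)
        return false;           /* started after the snapshot was taken */

    Lsn commit_lsn = clog_commit_lsn[xid];
    return commit_lsn != INVALID_LSN && commit_lsn <= snap_lsn;
}
```

The proposed XID->CSN cache would simply memoize the clog_commit_lsn lookups inside the xmin/xmax window.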
Re: [HACKERS] Proposal for CSN based snapshots
On 12 May 2014 19:27, Heikki Linnakangas Wrote:

On 01/24/2014 02:10 PM, Rajeev rastogi wrote: We are also planning to implement CSN based snapshots. So I am curious to know whether any further development is happening on this.

I started looking into this, and plan to work on this for 9.5. It's a big project, so any help is welcome. The design I have in mind is to use the LSN of the commit record as the CSN (as Greg Stark suggested).

Great!

Some problems and solutions I have been thinking of: The core of the design is to store the LSN of the commit record in pg_clog. Currently, we only store 2 bits per transaction there, indicating if the transaction committed or not, but the patch will expand it to 64 bits, to store the LSN. To check the visibility of an XID in a snapshot, the XID's commit LSN is looked up in pg_clog, and compared with the snapshot's LSN.

Won't it be a bit inefficient to look into pg_clog to read the XID's commit LSN for every visibility check?

With this mechanism, taking a snapshot is just a matter of reading the current WAL insertion point. There is no need to scan the proc array, which is good. However, it probably still makes sense to record an xmin and an xmax in SnapshotData, for performance reasons. An xmax, in particular, will allow us to skip checking the clog for transactions that will surely not be visible. We will no longer track the latest completed XID or the xmin like we do today, but we can use ShmemVariableCache->nextXid as a conservative value for xmax, and keep a cached global xmin value in shared memory, updated when convenient, that can simply be copied into the snapshot.

I think we can update xmin whenever the transaction whose XID equals xmin gets committed (i.e. in ProcArrayEndTransaction).

Thanks and Regards,
Kumar Rajeev Rastogi
Re: [HACKERS] Allowing empty target list in SELECT (1b4f7f93b4693858cb983af3cd557f6097dab67b)
On 02 May 2014 10:00, Amit Langote Wrote:

Is the following behavior perceived fix-worthy?

-- note the '1's in the output
postgres=# CREATE TABLE test AS SELECT;
SELECT 1
postgres=# insert into test select;
INSERT 0 1

Or maybe it just means one 'null' row/record, and not no row at all?

It just creates an item pointer, and the corresponding heap tuple header (without data or a null bitmap) gets stored as part of the insertion. So although it does not insert anything (not even NULL), it still reserves one row position. Hence SELECT will not display anything, but the command tags show the actual number of rows. Even the syntax below is allowed:

CREATE TABLE no_column_table();

IMO this might be useful for dynamic use of a table (columns might be added later using ALTER) or for use as an abstract ancestor in a class hierarchy.

Thanks and Regards,
Kumar Rajeev Rastogi
Re: [HACKERS] So why is EXPLAIN printing only *plan* time?
On 27 April 2014 22:38, Tom Lane Wrote:

... and not, in particular, parse analysis or rewrite time? I'd been a bit suspicious of the recent patch to add $SUBJECT without the other pre-execution components, but it just now occurred to me that there's at least one reason why this might be a significant omission: any delay caused by waiting to acquire locks on the query's tables will be spent in the parser. It's probably okay to omit raw parsing time (that is, flex+bison), even though profiling sometimes suggests that that's a nontrivial cost. It's completely predictable and more or less linear in the query string length, so there are not likely to be any surprises for users in that time. But it's not at all clear to me that the same can be said for parse analysis or rewrite times. Rewrite timing could easily be captured by EXPLAIN since that call is done within ExplainQuery(). Parse analysis isn't, but we could imagine having transformExplainStmt() time the operation and stick the result into a new field in struct ExplainStmt. I'm not sure if it'd be appropriate to add all of these measurements as separate printout lines; arguably we should just fold them into planning time. Thoughts?

My understanding is that the planning time printed by EXPLAIN exists only to show how much time was spent creating the execution plan. If we start folding all the earlier phases (i.e. parsing + analysis + rewrite) into planning time, the figure might lose its whole meaning. And if we add the time spent waiting for locks, the planning time of the same query with the same statistics will vary with how long it had to wait to acquire the locks, which will be a bit confusing for users. Maybe, if we really have to print the other times (parsing + analysis + rewrite), IMHO we should print them under some different name instead of including them in planning time.
Thanks and Regards,
Kumar Rajeev Rastogi
Re: [HACKERS] Autonomous Transaction (WIP)
On 14 April 2014 20:10, Simon Riggs wrote:

Autonomous Transaction Storage: As for the main transaction, the structure PGXACT is used to store main transactions, which are created in shared memory of size (number of processes) * sizeof(struct PGXACT). Similarly, a new structure will be defined to store autonomous transactions: struct PGAutonomousXACT.

I already proposed exactly this design two years ago and it was rejected at the PgCon hackers meeting. I have a better design worked out now and will likely be working on it for 9.5.

Oh... I had already added this patch to the 2014-June CommitFest, thinking that everyone was busy with work to wrap up 9.4. Can we work together to take this feature to its final goal? Maybe you can go through my complete patch and see which parts of it and its design can be reused along with your new design. Also, if possible, please share your design (even a rough one is OK); I will see if I can contribute to it in some way.

Thanks and Regards,
Kumar Rajeev Rastogi
Re: [HACKERS] Autonomous Transaction (WIP)
On 09 April 2014 21:25, Robert Haas Wrote:

Deadlock Detection: I'm not sure how this would work out internally.

In order to detect deadlocks, two member variables will be added to the structure PROCLOCK:
Bitmask for lock types currently held by the autonomous transaction: LOCKMASK holdMaskByAutoTx[MAX_AUTO_TX_LEVEL]
Bitmask for lock types currently held by the main transaction: LOCKMASK holdMaskByNormalTx
Now when we grant a lock to a particular transaction, depending on the type of transaction, the bit mask will be set on either holdMaskByAutoTx or holdMaskByNormalTx. Similarly, when a lock is ungranted, the corresponding bitmask will be cleared.

That sounds pretty ugly, not to mention the fact that it will cause a substantial increase in the amount of memory required to store PROCLOCKs. It will probably slow things down, too.

Actually, I followed the above design to keep it aligned with the existing design: as I understand it, all lock conflicts are currently checked based on the corresponding lock bitmasks. It is a good catch that the shared memory required will increase, but isn't that justified from the user's perspective, since we are allowing more transactions per session and hence the memory required to store the various kinds of resources will increase? Since we only additionally set a bitmask for each lock (if there is an autonomous transaction, there will be one more bitmask to set and one more deadlock check), I don't think it should slow down the overall operation. Also, we can keep the number of autonomous transactions configurable (default 0), to keep the impact low when the feature is not configured.

An autonomous transaction can also conflict with the main transaction, so in order to check conflicts between them, I am distinguishing at this level. Please correct me if I am wrong anywhere, and please share your thoughts on this and on the overall design.
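The per-PROCLOCK separation being proposed can be sketched in miniature. This is a hedged, standalone illustration with invented names (ProcLockSketch, a two-mode conflict table): the real lock.c conflict table has eight modes, but the mechanism is the same — a lock request made by the autonomous transaction is checked against the modes its own parent holds on the same object:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

typedef uint32_t LOCKMASK;
#define LOCKBIT_ON(mode) ((LOCKMASK) 1 << (mode))

/* two lock modes are enough for the sketch */
enum { ShareMode = 1, ExclusiveMode = 2 };

/* Exclusive conflicts with everything; Share only with Exclusive. */
static const LOCKMASK conflicts[3] = {
    0,
    LOCKBIT_ON(ExclusiveMode),                          /* ShareMode */
    LOCKBIT_ON(ShareMode) | LOCKBIT_ON(ExclusiveMode)   /* ExclusiveMode */
};

/* Miniature of the proposed PROCLOCK extension: the same backend keeps
 * separate hold masks for its main transaction and its autonomous
 * transaction (one level here, for simplicity). */
typedef struct
{
    LOCKMASK holdMaskByNormalTx;
    LOCKMASK holdMaskByAutoTx;
} ProcLockSketch;

/* Does a request from the autonomous transaction conflict with what
 * the main transaction of the same session already holds? */
static bool
auto_tx_request_conflicts(const ProcLockSketch *pl, int mode)
{
    return (conflicts[mode] & pl->holdMaskByNormalTx) != 0;
}
```

Without the separate masks, both transactions' holds would be merged into one mask and such an intra-session conflict (the seed of a main-vs-autonomous deadlock) could never be detected.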
Thanks and Regards,
Kumar Rajeev Rastogi
Re: [HACKERS] Autonomous Transaction (WIP)
On 10 April 2014 11:18, Pavan Deolasee Wrote:

I could think of a few global variables, such as those related to transaction properties (i.e. read-only mode, isolation level, etc.). As I plan to keep the transaction properties of the autonomous transaction the same as those of the main transaction, there is no need to have these global variables separately. Apart from this, there are global variables like within-transaction counters, GUCs, and xactStartTimeStamp. I think there is no need to maintain these variables separately either; they can continue from their previous values for the autonomous transaction, similar to what a subtransaction does.

Hmm. Is that in line with what other databases do? I would have preferred an AT to run like a standalone transaction without any influence from the starting transaction, managing its own resources/locks/visibility/triggers etc.

To me it seems not very useful to keep the transaction properties separate, except for the read-only property (though Oracle does not share any transaction properties). So we can have the restriction that the isolation and deferrable properties of the main transaction are inherited by the autonomous transaction, while the read-only property can be set independently by the autonomous transaction. That looks like a fair restriction to me. To keep the read-only property separate, there is already infrastructure in PG: inside the structure TransactionStateData there is the variable prevXactReadOnly (the entry-time xact r/o state), which can keep the parent transaction's read-only property while XactReadOnly is changed to the current transaction's property. Moreover, we can treat this (transaction properties) as a feature enhancement once the basic infrastructure is established, if that is acceptable to everyone.

An autonomous transaction will not share resources/locks/visibility etc. with the main transaction. This has already been taken care of in the WIP patch.
In the case of an autonomous transaction, the only global variables specifically initialized are those related to resources (similar to a subtransaction), which in any case get stored in the current transaction state. Please let me know if I am missing something or if you have an issue with some specific global variable.

No, I don't have any specific issues in mind. Mostly all such global state is managed through the various AtStart/AtEOX and related routines. So a careful examination of all those routines will give a good idea of what needs to be handled. You will probably need to write AtATStart/AtATEOX and similar routines to manage the state at AT start/commit/rollback. Sorry, I haven't looked at your WIP patch yet.

For some of the resources, I have already written AtATStart/AtATEOX-style routines in the WIP patch. Comments/feedback/doubts are welcome.

Thanks and Regards,
Kumar Rajeev Rastogi
Re: [HACKERS] Autonomous Transaction (WIP)
On 09 April 2014 12:14, Pavan Deolasee Wrote:

Whenever I was asked to have a look at implementing this feature, I always wondered about the great amount of global state that a backend maintains, which is normally tied to a single top transaction. Since an AT will have the same characteristics as a top-level transaction, I wonder how you plan to separate those global state variables. Sure, we can group them in a structure and put them on a stack when an AT starts and pop them off when the original top transaction becomes active again, but finding all such global state variables is going to be tricky.

I could think of a few global variables, such as those related to transaction properties (i.e. read-only mode, isolation level, etc.). As I plan to keep the transaction properties of the autonomous transaction the same as those of the main transaction, there is no need to have these global variables separately. Apart from this, there are global variables like within-transaction counters, GUCs, and xactStartTimeStamp. I think there is no need to maintain these variables separately either; they can continue from their previous values for the autonomous transaction, similar to what a subtransaction does. In the case of an autonomous transaction, the only global variables specifically initialized are those related to resources (similar to a subtransaction), which in any case get stored in the current transaction state. Please let me know if I am missing something or if you have an issue with some specific global variable.

Thanks and Regards,
Kumar Rajeev Rastogi
Re: [HACKERS] Autonomous Transaction (WIP)
On 09 April 2014 01:09, Robert Haas Wrote:

I'm also pretty unconvinced that multiple PGPROCs is the right way to go. First, PGPROCs have a bunch of state in them that is assumed to exist once per backend. We might find pretty substantial code churn there if we try to go change that.

Yes, you are right. That is why I am not creating a separate procarray entry to maintain the autonomous transaction. Please find the details in my previous reply, sent earlier today.

Second, why do other backends really need to know about our ATs? As far as I can see, if other backends see the AT as a subtransaction of our top-level transaction up until it actually commits, that ought to be just fine. Maybe the backend needs to internally frob visibility rules, but that's not a matter for shared memory.

In order for other sessions to build their snapshots, they need access to the autonomous transactions and their subtransactions. During snapshot creation, an autonomous transaction is considered a main transaction, and the list of all running autonomous transactions and their subtransactions gets stored in the snapshot data. e.g. suppose the processes below are running with the given transactions:

Proc-1: 100
Proc-2: 101, 102 (Auto Tx1), 103 (Auto Tx2), 104 (Sub-tx of Auto Tx2)
Proc-3: 105, 106 (Auto Tx2), 107 (Auto Tx2)

Suppose the latest completed transaction is 108. Then the snapshot data for autonomous transaction 107 will be as below:

Xmin: 100
Xmax: 109
Snapshot->xip[]: 100, 101, 102, 103, 105, 106
Snapshot->subxip[]: 104

Thanks and Regards,
Kumar Rajeev Rastogi
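The snapshot example above can be turned into a small executable sketch. This is illustrative only (the names SnapshotSketch and xid_in_snapshot are invented, and the real XidInMVCCSnapshot is more involved): an xid listed in the snapshot's xip or subxip arrays, including autonomous xids and their subtransactions, is treated as still in progress and therefore invisible:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

typedef uint32_t Xid;

typedef struct
{
    Xid xmin;
    Xid xmax;
    const Xid *xip;     int nxip;     /* running top-level + autonomous xids */
    const Xid *subxip;  int nsubxip;  /* their subtransactions */
} SnapshotSketch;

/* Is xid still "in progress" as far as this snapshot is concerned? */
static bool
xid_in_snapshot(Xid xid, const SnapshotSketch *snap)
{
    if (xid >= snap->xmax)
        return true;            /* too new: treat as in progress */
    if (xid < snap->xmin)
        return false;           /* definitely finished */
    for (int i = 0; i < snap->nxip; i++)
        if (snap->xip[i] == xid)
            return true;
    for (int i = 0; i < snap->nsubxip; i++)
        if (snap->subxip[i] == xid)
            return true;
    return false;
}
```

With the email's snapshot for autonomous transaction 107, xid 104 (the subtransaction of Auto Tx2) is in progress, while the completed xid 108 is not.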
Re: [HACKERS] Autonomous Transaction (WIP)
On 09 April 2014 01:43, Tom Lane Wrote:
>> I'm also pretty unconvinced that multiple PGPROCs is the right way to go. First, PGPROCs have a bunch of state in them that is assumed to exist once per backend. We might find pretty substantial code churn there if we try to go change that. Second, why do other backends really need to know about our ATs? As far as I can see, if other backends see the AT as a subtransaction of our top-level transaction up until it actually commits, that ought to be just fine.
> If we can make it work like that, sure. I'm a bit worried about how you'd decouple a subtransaction and commit it atomically ... or if that's not atomic, will it create any problems?

Although an autonomous transaction uses a mixed approach (partly sub-transaction, partly main transaction), its transaction state is handled independently. So depending on the transaction state of the autonomous transaction (TBLOCK_AUTOCOMMIT for commit), the transaction will be committed. While committing:

1. The commit record and the corresponding WAL are written in the same way as for a main transaction (except for the way the autonomous transaction and its sub-transactions are accessed). This automatically takes care of updating pg_clog for the autonomous transaction as well.
2. It also marks the autonomous transaction as finished by setting the appropriate fields of MyPgAutonomousXact, in a similar manner as done for the main transaction.
3. Freeing of all resources and popping back to the parent transaction happen in the same way as for a sub-transaction.

> The point being that you need to change both pg_subtrans and pg_clog to make that state transition.

Yes, I am changing both, but no specific changes were required: during commit and assignment of the autonomous transaction, it is taken care of automatically.

Any comments/feedback/doubts are welcome.
Thanks and Regards,
Kumar Rajeev Rastogi
Re: [HACKERS] Autonomous Transaction (WIP)
On 07 April 2014 12:20, Craig Ringer Wrote:
>> Syntax to create autonomous transaction can be as: PRAGMA AUTONOMOUS TRANSACTION;
> Wouldn't you want to use SET TRANSACTION for this? Or a suffix on BEGIN, like BEGIN AUTONOMOUS TRANSACTION? What's the logic behind introducing PRAGMA? If you wanted to use that syntax for Oracle compatibility you'd need to use: PRAGMA AUTONOMOUS_TRANSACTION; (note underscore). But really, this would not be a pragma at all; PostgreSQL doesn't really have the concept. Calling it that would just be misleading.

Actually it is the same as Oracle (i.e. PRAGMA AUTONOMOUS_TRANSACTION); it was just a typo in the previous mail. But if this syntax is not acceptable either, we can discuss it and come up with a syntax everyone agrees on.

>> Starting of Autonomous Transaction: Starting of autonomous transaction will be exactly same as starting sub-transaction.
> If you don't want it to dirty read data from the parent tx, or inherit parent locks, then it cannot be the same at all.

While starting a sub-transaction, we just initialize the required resources and link them to the parent transaction, which we require for an autonomous transaction as well. I am not able to see the issue you mention with this. Please let me know if I am missing something or have misunderstood your concern.

>> 2. Freeing of all resource and popping of previous transaction happens in the same way as sub-transaction.
> I'm not sure what you mean here.

It means that, during commit of an autonomous transaction, freeing of all resources is done in the same way as for a sub-transaction. Also, the current autonomous transaction gets popped off, and we point back to the parent transaction in a similar way as for a sub-transaction.

> Overall, this looks like a HUGE job to make work well. I know some others have been doing work along the same lines, so hopefully you'll be able to collaborate and share ideas.
Yes, it is a huge amount of work, which is why I proposed it at the beginning of 9.5: so that we can have multiple rounds of discussion and address all concerns. Also, I have proposed to finish this feature in multiple stages; i.e., in the first patch we can support autonomous transactions from standalone SQL commands only, which will set up the infrastructure for future work in this area.

Using the WIP patch sent, I have done basic testing and it works fine. Any comments?

Thanks and Regards,
Kumar Rajeev Rastogi
Re: [HACKERS] Autonomous Transaction (WIP)
On 07 April 2014 12:12, Pavel Stehule wrote:
> +1 for feature

Thanks.

> -1 for Oracle syntax - it is hardly inconsistent with Postgres

We can discuss it and come up with a syntax everyone agrees on.

> Autonomous transactions should be used everywhere - not only in plpgsql

Yes, you are right; I am not planning to support it only from plpgsql. Initially we can support it using standalone SQL commands, and later we can build on this infrastructure so that it can be used from plpgsql and triggers.

Thanks and Regards,
Kumar Rajeev Rastogi
Re: [HACKERS] Proposal: COUNT(*) (and related) speedup
On 04 April 2014 18:09, Joshua Yanovski Wrote:
> The counter would be updated only by VACUUM, which would perform the same computation performed by the COUNT operation but add it permanently to counter just before it set the visible_map bit to 1 (I am not certain whether this would require unusual changes to WAL replay or not).

I think there is one more disadvantage with this (or maybe I have missed something): the user may not use count(*) at all, or may use it just once after creating the proposed new kind of index, but VACUUM will have to keep performing an operation equivalent to an iterative count(*), which might degrade VACUUM performance.

Thanks and Regards,
Kumar Rajeev Rastogi
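The trade-off being discussed can be illustrated with a small model (hypothetical Python classes, not PostgreSQL code): VACUUM pays for counting the tuples of each page it marks all-visible, so COUNT(*) only has to scan pages whose visibility-map bit is still unset.

```python
# Toy model of the proposal: a persistent counter covers all-visible
# pages; COUNT(*) scans only the remaining pages. The per-page counting
# in vacuum() is the extra VACUUM work this mail is concerned about.

class Table:
    def __init__(self, pages):
        self.pages = pages                       # each page: list of live tuples
        self.all_visible = [False] * len(pages)  # visibility-map bits
        self.counter = 0                         # maintained only by "VACUUM"

    def vacuum(self):
        for i, page in enumerate(self.pages):
            if not self.all_visible[i]:
                self.counter += len(page)        # count page before setting bit
                self.all_visible[i] = True

    def count(self):
        # counter covers all-visible pages; scan only the rest
        return self.counter + sum(len(p) for i, p in enumerate(self.pages)
                                  if not self.all_visible[i])

t = Table([[1, 2, 3], [4, 5], [6]])
before = t.count()      # full scan: 6
t.vacuum()
after = t.count()       # counter alone: 6, no page scanned
```

The point of the objection stands out in vacuum(): the counting work happens on every VACUUM pass over not-all-visible pages, whether or not anyone ever runs COUNT(*).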
Re: [HACKERS] [review] PostgreSQL Service on Windows does not start if data directory given is relative path
On 05 April 2014 04:14, Tom Lane Wrote:
> [ pgctl_win32service_rel_dbpath_v6.patch ] Committed with minor corrections, mostly but not all cosmetic.

Thanks a lot...
[HACKERS] Observed an issue in CREATE TABLE syntax
I observed an issue where CREATE TABLE succeeds even though the same constraint is given multiple times for a column. None of the constraints below has any meaning when given multiple times:

postgres=# create table new (id int NULL NULL);
CREATE TABLE
postgres=# create table new1 (id serial NOT NULL NOT NULL);
CREATE TABLE
postgres=# create table new2 (id int unique unique);
CREATE TABLE

Should we not throw an error for the above syntaxes? Please find the attached patch with the fix.

Thanks and Regards,
Kumar Rajeev Rastogi

Attachment: multiconstissuev1.patch
Re: [HACKERS] datistemplate of pg_database does not behave as per description in documentation
On 27 March 2014 17:16, Tom Lane Wrote:
> I agree we need to make the docs match the code, but changing behavior that's been like that for ten or fifteen years isn't the answer.

Sounds good. Please find the attached patch with only the documentation change.

Thanks and Regards,
Kumar Rajeev Rastogi

Attachment: datistemplate_issueV2.patch
[HACKERS] datistemplate of pg_database does not behave as per description in documentation
As per the documentation, datistemplate of pg_database is used in the following way:

datistemplate (bool): If true then this database can be used in the TEMPLATE clause of CREATE DATABASE to create a new database as a clone of this one.

But the current code does not behave this way: even if datistemplate of a database is false, it is still allowed to be used as a template database.

postgres=# select datname, datistemplate from pg_database;
  datname  | datistemplate
-----------+---------------
 template1 | t
 template0 | t
 postgres  | f
(3 rows)

postgres=# create database tempdb template postgres;  -- Actually this should fail.
CREATE DATABASE

I am not sure whether we should modify the source code to align the behavior with the documentation, or change the documentation itself. To me a code change looks better, so I am attaching a patch with the source code change. After the modification, the result is as follows:

postgres=# create database newtempdb template postgres;
ERROR: DB name postgres given as template is not a template database

Please provide your feedback.

Thanks and Regards,
Kumar Rajeev Rastogi

Attachment: datistemplate_issue.patch
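The check the patch proposes can be sketched as follows. This is an illustrative Python model, not the actual createdb() code; note that real PostgreSQL also has permission-based exceptions (e.g. for superusers) that this sketch ignores.

```python
# Hypothetical model of the proposed check: refuse a template database
# whose datistemplate flag is false. Illustration only; not the patch.

def check_template(pg_database, template_name):
    row = pg_database[template_name]
    if not row["datistemplate"]:
        raise ValueError(
            f"DB name {template_name} given as template is not a template database")
    return True

pg_database = {
    "template1": {"datistemplate": True},
    "template0": {"datistemplate": True},
    "postgres":  {"datistemplate": False},
}

ok = check_template(pg_database, "template1")   # allowed
err = None
try:
    check_template(pg_database, "postgres")     # refused, as in the example
except ValueError as e:
    err = str(e)
```

With the example catalog contents above, cloning template1 passes while cloning postgres raises the error message shown in the mail.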
Re: [HACKERS] Standby server won't start
On 21 March 2014 13:41, Tatsuo Wrote:
> I changed primary server's max_connections from 100 to 4 just for testing purposes. Now the standby server won't start and complains: hot standby is not possible because max_connections = 4 is a lower setting than on the master server (its value was 100). My guess is this is because standby's pg_control file contains the previous primary setting (max_connections = 100). Is there any way to start the standby server without re-creating pg_control (which implies getting a base backup again)? If not, there should be some way to allow starting the standby server without getting a base backup...

I think there is no way to do this, because the parameter from the master is already recorded in the pg_control file, which cannot be changed without taking a new backup from the master. Also, it is not recommended to have the standby's max_connections value lower than the master's max_connections value.

Thanks and Regards,
Kumar Rajeev Rastogi
Re: [HACKERS] Standby server won't start
On 21 March 2014 16:17, Tatsuo Wrote:
> In my case I had already changed primary's max_connections to 4 and restarted it. So at that point both postgresql.conf of primary and standby were 4.

If you changed max_connections to 4 only on the primary, then I am not able to understand how it got changed on the standby as well (if you have not taken a base backup again). Let me know if I have missed something.

Thanks and Regards,
Kumar Rajeev Rastogi
Re: [HACKERS] COPY table FROM STDIN doesn't show count tag
On 12 March 2014 23:57, Tom Lane Wrote:
>> On Wed, Mar 12, 2014 at 12:09 PM, Tom Lane wrote: My inclination now (see later traffic) is to suppress the status report when the COPY destination is the same as pset.queryFout (ie, a simple test whether the FILE pointers are equal). This would suppress the status report for \copy to stdout and COPY TO STDOUT cases, and also for \copy to pstdout if you'd not redirected queryFout with \o.

Based on my analysis, just comparing FILE pointers may not be sufficient to decide whether to display the command tag. For example, imagine the scenario below:

psql.exe -d postgres -o 'file.dat' -c "\copy tbl to 'file.dat'"

Though both destinations are the same file, the FILE pointers will be different, and hence printing the status into 'file.dat' would overwrite part of the data copied to the file. Also, we don't have any direct way of comparing the file names themselves. As I see it, \copy ... TO ... would print the status only in the case of \copy to pstdout when the -o option is given. So instead of having so much confusion and inconsistency for one very specific case, I thought it better not to print the status for all cases of STDOUT and \COPY ... TO ...

>> This is reasonably similar to what we already do for SELECT, isn't it? I mean, the server always sends back a command tag, but psql sometimes opts not to print it.
> Right, the analogy to SELECT gives some comfort that this is reasonable.

I have modified the patch based on the above analysis:
1. In the case of COPY ... TO STDOUT, the command tag will not be displayed.
2. In the case of \COPY ... TO ..., the command tag will not be displayed.
3. In all other cases, the command tag will be displayed as it was earlier.

I have modified the corresponding documentation as well. Please find the attached revised patch.
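The three rules settled on above can be condensed into one decision function. This is a hypothetical helper for illustration, not actual psql code.

```python
# Sketch of the tag-suppression rules from the mail: suppress the command
# tag for COPY ... TO STDOUT and for client-side \copy ... to ...;
# display it in all other cases. Hypothetical helper, not psql source.

def show_copy_tag(is_copy_to, to_stdout, is_backslash_copy):
    if not is_copy_to:
        return True                  # COPY FROM keeps its tag
    if to_stdout or is_backslash_copy:
        return False                 # rules 1 and 2 above
    return True                      # rule 3: e.g. server-side COPY ... TO 'file'
```

A few cases: `show_copy_tag(True, True, False)` (COPY ... TO STDOUT) and `show_copy_tag(True, False, True)` (\copy ... to 'file') suppress the tag, while `show_copy_tag(True, False, False)` (server-side COPY ... TO 'file') keeps it.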
Thanks and Regards,
Kumar Rajeev Rastogi

Attachment: psql-copy-count-tag-20140313.patch
Re: [HACKERS] COPY table FROM STDIN doesn't show count tag
On 10 March 2014 23:44, Tom Lane wrote:
> Unfortunately, while testing it I noticed that there's a potentially fatal backwards-compatibility problem, namely that the COPY n status gets printed on stdout, which is the same place that COPY OUT data is going. While this isn't such a big problem for interactive use, usages like this one are pretty popular: psql -c 'copy mytable to stdout' mydatabase | some-program. With the patch, COPY n gets included in the data sent to some-program, which never happened before and is surely not what the user wants. The same if the -c string uses \copy. There are several things we could do about this:
> 1. Treat this as a non-backwards-compatible change, and document that people have to use -q if they don't want the COPY tag in the output. I'm not sure this is acceptable.
> 2. Kluge ProcessResult so that it continues to not pass back a PGresult for the COPY TO STDOUT case, or does so only in limited circumstances (perhaps only if isatty(stdout), for instance).
> I'm inclined to think #2 is the best answer if we can't stomach #1.

Is it OK to have different status output for different flavors of the COPY command? I am afraid it would lead to inconsistent results. Also, not providing the command result status would be inconsistent with the behavior of other SQL commands. I agree that it breaks backward compatibility, but I am not sure if anyone is so tightly coupled to this (or whether they would really be affected by an additional status line). To me option #1 seems more suitable, especially since there is an option to disable the status output by giving -q. Please provide your opinion, or let me know if I have missed something.

Thanks and Regards,
Kumar Rajeev Rastogi
Re: [HACKERS] COPY table FROM STDIN doesn't show count tag
On 11 March 2014 19:52, Tom Lane wrote:
> After sleeping on it, I'm inclined to think we should continue to not print status for COPY TO STDOUT. Aside from the risk of breaking scripts, there's a decent analogy to be made to SELECT: we don't print a status tag for that either.

It is correct that SELECT does not print a conventional status tag, but it still prints the number of rows selected (e.g. "(2 rows)") along with the actual row values, which can very well be considered a kind of status. The user can make out from this result how many rows have been selected. But in the case of COPY TO STDOUT, if we don't print anything, the user has no direct way of finding out how many rows were copied from the table to STDOUT, which might have been very useful. Please let me know your opinion, or whether I have missed something.

Thanks and Regards,
Kumar Rajeev Rastogi
Re: [HACKERS] review: psql command copy count tag
As mentioned by Pavel also, this patch will be very useful. It provides the following enhancements:
1. Brings consistency between copy from "stdin" and "file".
2. Consistent with the server-side copy statement.
3. Also fixes the issue that the \copy destination file becomes the default destination file for the next command given in sequence.

This has been in the "Ready for committer" stage for a long time. Please check whether this can be committed now, or whether any other changes are required.

Thanks and Regards,
Kumar Rajeev Rastogi

From: Pavel Stehule [mailto:pavel.steh...@gmail.com]
Sent: 30 January 2014 01:57
To: PostgreSQL Hackers
Cc: Amit Khandekar; Rajeev rastogi
Subject: review: psql command copy count tag

related to http://www.postgresql.org/message-id/bf2827dcce55594c8d7a8f7ffd3ab7713ddb1...@szxeml508-mbx.china.huawei.com

Hello

1. I had to rebase this patch - actualised version is attached - I merged two patches to one
2. The psql code is compiled without issues after patching
3. All regress tests are passed without errors
5. We like this feature - it shows interesting info without any slowdown - psql copy command is more consistent with server side copy statement from psql perspective.

This patch is ready for commit

Regards
Pavel
Re: [HACKERS] [review] PostgreSQL Service on Windows does not start if data directory given is relative path
While registering the service for PostgreSQL on Windows, we cannot expect the user to always give an absolute path, so it is required to support a relative path as the data directory. This patch will be very useful for handling that.

This patch has been in the "Ready for committer" stage for a long time. Please check whether this can be committed now, or whether any other changes are required.

Thanks and Regards,
Kumar Rajeev Rastogi

-----Original Message-----
From: MauMau [mailto:maumau...@gmail.com]
Sent: 24 February 2014 15:31
To: Rajeev rastogi
Cc: pgsql-hackers@postgresql.org; Magnus Hagander
Subject: Re: [HACKERS] [review] PostgreSQL Service on Windows does not start if data directory given is relative path

From: Rajeev rastogi
> Please find the attached modified patch.

Thanks, reviewed and made this patch ready for committer.

Regards
MauMau
Re: [HACKERS] Patch: Show process IDs of processes holding a lock; show relation and tuple infos of a lock to acquire
On 04 February 2014 14:38, Myself wrote:
>> On 4th February 2014, Christian Kruse Wrote:
>>> On 04/02/14 12:38, Fujii Masao wrote: ISTM that the phrase "Request queue" is not used much around the lock. Using the phrase "wait queue" or Simon's suggestion sound better to at least me. Thought?
>> Sounds reasonable to me. Attached patch changes messages to the following:
>> Process holding the lock: A. Wait queue: B.
>> Processes holding the lock: A, B. Wait queue: C.
> This looks good to me also.

I have tested the revised patch and found it ready to be committed. I am marking this as "Ready for Committer".

Thanks and Regards,
Kumar Rajeev Rastogi
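The singular/plural wording above (emitted in the patch via errdetail_log_plural()) can be sketched as a small formatting helper. This is a hypothetical Python illustration of the message shape, not the C patch itself.

```python
# Sketch of the agreed DETAIL wording: singular "Process" for one holder,
# plural "Processes" for several, followed by the wait queue. Hypothetical
# helper, not PostgreSQL source code.

def lock_detail(holder_pids, queue_pids):
    holders = ", ".join(str(p) for p in holder_pids)
    queue = ", ".join(str(p) for p in queue_pids)
    if len(holder_pids) == 1:
        return f"Process holding the lock: {holders}. Wait queue: {queue}."
    return f"Processes holding the lock: {holders}. Wait queue: {queue}."
```

For example, `lock_detail([8001], [8002])` yields the singular form and `lock_detail([8001, 8002], [8003])` the plural form shown in the quoted messages (PIDs here are made up).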
Re: [HACKERS] [review] PostgreSQL Service on Windows does not start if data directory given is relative path
On 22 February 2014 06:16, MauMau Wrote:
> Thanks for reviewing again. Please make small cosmetic changes so that make_absolute_path() follows the style of other parts. Then I'll make this ready for committer.
> (1) Add the function name in the comment as in:
> /*
>  * make_absolute_path
>  *
>  * ...existing function description
>  */

Added.

> (2) Add errno description as in:
> fprintf(stderr, _("could not get current working directory: %s\n"), strerror(errno));

Modified. Please find the attached modified patch.

Thanks and Regards,
Kumar Rajeev Rastogi

Attachment: pgctl_win32service_rel_dbpath_v6.patch
Re: [HACKERS] [review] PostgreSQL Service on Windows does not start if data directory given is relative path
On 18 February 2014 21:47, MauMau Wrote:
>> We already have two different versions of make_absolute_path() in the tree - one in src/backend/utils/init/miscinit.c and one in src/test/regress/pg_regress.c. I don't think we need a third one. If we put it in port/ like this patch does, then we should make the other callers use it. We might need one for frontend and one for backend (I haven't looked into that much detail), but definitely the one between pg_regress and pg_ctl should be shared.
> Agreed. Sorry, I should have proposed the refactoring. Rajeev, could you refactor the code so that there is only one make_absolute_path()? I think we can do it like this:
> 1. Delete get_absolute_path() in src/port/path.c.
> 2. Delete make_absolute_path() in src/test/regress/pg_regress.c.
> 3. Move make_absolute_path() from src/backend/utils/init/miscinit.c to src/port/path.c, and delete the declaration in miscadmin.h.
> 4. Modify make_absolute_path() in path.c so that it can be used both in backend and frontend. That is, surround the error message output with #ifdef FRONTEND ... #else ... #endif. See some other source files in src/port.

Changed the patch as per your suggestion. There is now only one version of make_absolute_path(), defined in src/port/path.c. I also found one small memory leak in the existing make_absolute_path() in miscinit.c and fixed it.

Please find the revised patch.

Thanks and Regards,
Kumar Rajeev Rastogi

Attachment: pgctl_win32service_rel_dbpath_v5.patch
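The core behavior of the shared make_absolute_path() being discussed (prepend the current working directory to a relative path, leave absolute paths untouched, and do not resolve symlinks in the cwd name) can be sketched briefly. This is an illustrative Python version; the real function is C code in src/port/path.c.

```python
# Illustrative sketch of make_absolute_path() semantics: relative paths
# are anchored at the current working directory; absolute paths pass
# through unchanged. Not the actual C implementation.

import os

def make_absolute_path(path):
    if os.path.isabs(path):
        return path
    # getcwd() can fail; the C version reports strerror(errno) here
    # (to stderr in frontend code, via ereport in the backend)
    cwd = os.getcwd()
    return os.path.join(cwd, path)
```

For example, registering the Windows service with `-D ..\pgdata` from `D:\pgdata` stores the absolute `D:\pgdata` in the registry, which is what makes service start-up independent of the working directory.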
Re: [HACKERS] Add min and max execute statement time in pg_stat_statement
On 12 February 2014 12:16, KONDO Mitsumasa Wrote:
>> (2014/01/29 17:31), Rajeev rastogi wrote: No issue, you can share the test cases with me; I will take the performance report.
> Attached patch is adapted to the latest pg_stat_statements. It includes min, max, and stdev statistics. Could you run a compile test on your Windows environment? I think the compilation error was fixed.

It got compiled successfully on Windows.

Thanks and Regards,
Kumar Rajeev Rastogi
Re: [HACKERS] Patch: Show process IDs of processes holding a lock; show relation and tuple infos of a lock to acquire
On 4th February 2014, Christian Kruse Wrote:
>> On 04/02/14 12:38, Fujii Masao wrote: ISTM that the phrase "Request queue" is not used much around the lock. Using the phrase "wait queue" or Simon's suggestion sound better to at least me. Thought?
> Sounds reasonable to me. Attached patch changes messages to the following:
> Process holding the lock: A. Wait queue: B.
> Processes holding the lock: A, B. Wait queue: C.

This looks good to me also.

Thanks and Regards,
Kumar Rajeev Rastogi
[HACKERS] Re: [review] PostgreSQL Service on Windows does not start if data directory given is relative path
On 3rd February 2014, MauMau Wrote:
> Thanks, your patch looks good. I confirmed the following:
> * Applied cleanly to the latest source tree, and built on Linux and Windows (with MSVC) without any warnings.
> * Changed to D:\ and ran "pg_ctl register -N pg -D pgdata", and checked the registry value ImagePath with regedit. It contained -D "D:\pgdata".
> * Changed to D:\pgdata and ran "pg_ctl register -N pg -D ..\pgdata", and checked the registry value ImagePath with regedit. It contained -D "D:\pgdata".

Thanks for reviewing and testing the patch.

> Finally, please change the function description comment of get_absolute_path() so that the style follows other functions in path.c. That is:
> /*
>  * function_name
>  *
>  * Function description
>  */

I have modified the function description comment as per your suggestion.

> Then update the CommitFest entry with the latest patch and let me know of that. Then, I'll change the patch status to ready for committer.

I will update the CommitFest entry with the latest patch immediately after sending this mail.

Thanks and Regards,
Kumar Rajeev Rastogi

Attachment: pgctl_win32service_rel_dbpath_v4.patch
[HACKERS] Re: [review] PostgreSQL Service on Windows does not start if data directory given is relative path
On 1st February 2014, MauMau Wrote:
> I reviewed the patch content. I find this fix useful. I'd like to suggest some code improvements. I'll apply and test the patch when I receive your reply.

Thanks for reviewing the patch.

> (1) I think it is appropriate to place find_my_abs_path() in path.c rather than exec.c. Please look at the comments at the beginning of those files. exec.c handles functions related to executables, while path.c handles general functions handling paths.

I have moved this function to path.c.

> It's better to rename the function to follow the naming of other functions in path.c, something like get_absolute_path() or so. Unfortunately, we cannot use make_absolute_path() as the name because it is used in src/backend/utils/init/miscinit.c, which conflicts in the backend module.

Renamed the function to get_absolute_path().

> (2) In pg_ctl.c, dbpath can be better named as datadir, because it holds the data directory location. dbpath is used to mean some different location in other source files.

Renamed it to dataDir.

> (3) find_my_abs_path() had better not call make_native_path() because the role of this function should be to just return an absolute path. pg_ctl.c can instead call make_native_path() after find_my_abs_path().

Changed as per suggestion.

> (4) find_my_abs_path() should not call resolve_symlinks(). For reference, look at make_absolute_path() in src/backend/utils/init/miscinit.c and src/test/regress/pg_regress.c. I guess the reason is that if the user changed to the directory through a symbolic link, we should retain the symbolic link name.

Changed as per suggestion.

> (5) Change "file/path" in the comment of find_my_abs_path() to "path", because a file is also a path.

Changed as per suggestion.

Please find the attached revised patch.
Thanks and Regards,
Kumar Rajeev Rastogi

Attachment: pgctl_win32service_rel_dbpath_v3.patch
Re: [HACKERS] Add min and max execute statement time in pg_stat_statement
On 28th January, Mitsumasa KONDO wrote:
>>> On 21 January 2014, Simon Riggs wrote: Does this fix the Windows bug reported by Kumar on 20/11/2013?
>> Sorry, I was misunderstanding. The first name of Mr. Rajeev Rastogi is Kumar! I searched only by e-mail address and title for his name... I don't have a Windows compiler environment, but the attached patch might fix it. Could I ask Mr. Rajeev Rastogi to test my patch again?
>> [Rajeev:] I tried to test this, but I could not apply the patch on the latest git HEAD. This may be because of the recent pg_stat_statements patch ("pg_stat_statements external query text storage"), which got committed on 27th January.
> Thank you for trying to test my patch. As you say, the recent commit changes pg_stat_statements.c a lot, so I have to revise my patch. Please wait for a while. By the way, the latest pg_stat_statements might affect performance on Windows systems, because it uses an fflush() system call on every creation of a new entry in pg_stat_statements, and it calls many fread()s to warm the file cache. It works well on Linux systems, but I'm not sure about Windows systems. If you have time, could you test it on your Windows system? If it affects performance a lot, we can still change it.

No issue; you can share the test cases with me and I will take the performance report.

Thanks and Regards,
Kumar Rajeev Rastogi
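The min, max, and stdev statistics being added to pg_stat_statements can all be maintained incrementally, one update per executed statement, without storing individual timings. A standard way to do this is Welford's online algorithm; the sketch below is a Python illustration of the idea, not the actual patch.

```python
# Incremental min/max/mean/stddev over execution times, one add() per
# statement. Welford's online algorithm keeps the running mean and the
# sum of squared deviations (m2). Illustration only, not the C patch.

class ExecTimeStats:
    def __init__(self):
        self.n = 0
        self.mean = 0.0
        self.m2 = 0.0                  # sum of squared deviations from mean
        self.min = float("inf")
        self.max = float("-inf")

    def add(self, t):
        self.n += 1
        delta = t - self.mean
        self.mean += delta / self.n
        self.m2 += delta * (t - self.mean)
        self.min = min(self.min, t)
        self.max = max(self.max, t)

    def stddev(self):
        # population standard deviation; 0.0 until two samples exist
        return (self.m2 / self.n) ** 0.5 if self.n > 1 else 0.0

s = ExecTimeStats()
for t in [1.0, 2.0, 3.0, 4.0]:
    s.add(t)
# mean 2.5, min 1.0, max 4.0, stddev sqrt(1.25)
```

The appeal for a stats view is constant memory per entry and numerical stability compared with the naive sum-of-squares formula.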
[HACKERS] Observed Compilation warning in WIN32 build
I observed the below WIN32 compilation warnings in postmaster.c (seemingly introduced by commit ea9df812d8502fff74e7bc37d61bdc7d66d77a7f, "Relax the requirement that all lwlocks be stored in a single array."):

src\backend\postmaster\postmaster.c(5625) : warning C4133: '=' : incompatible types - from 'LWLockPadded *' to 'LWLock *'
src\backend\postmaster\postmaster.c(5856) : warning C4133: '=' : incompatible types - from 'LWLock *' to 'LWLockPadded *'

Attached is the patch with the fix.

Thanks and Regards,
Kumar Rajeev Rastogi

Attachment: compile_issue_lwlock.patch
Re: [HACKERS] Patch: Show process IDs of processes holding a lock; show relation and tuple infos of a lock to acquire
On 28/01/14, Christian Kruse wrote:
>> I have checked the revised patch. It looks fine to me except for one minor code formatting issue: in elog.c, two tabs are missing in the definition of the function errdetail_log_plural. Please run the pgindent tool to check the same.
> I did, but this reformats various other locations in the file, too. Nevertheless I now ran pgindent against it and removed the other parts. Attached you will find the corrected patch version.

>> Also I would like to highlight one behavior: the process ID of the process trying to acquire the lock is also listed in the "Request queue" list. E.g.:
>> session 1 with process id X: BEGIN; LOCK TABLE foo IN SHARE MODE;
>> session 2 with process id Y: BEGIN; LOCK TABLE foo IN EXCLUSIVE MODE;
>> On execution of LOCK in session 2, the log will display: DETAIL: Process holding the lock: X. Request queue: Y. Here Y is the process ID of the same process that was trying to acquire the lock.
> This is on purpose, due to the rewording of the message. In the first version the PID of the backend was missing. Thanks for the review!

Now the patch looks fine to me. I am marking this as "Ready for Committer".

Thanks and Regards,
Kumar Rajeev Rastogi
[HACKERS] Function definition removed but prototype still there
As part of the below commit, 36a35c550ac114caa423bcbe339d3515db0cd957 (Compress GIN posting lists, for smaller index size.), the definition of the function GinDataPageAddItemPointer was removed, but the corresponding prototype is still there. Attached is the patch to fix this (unwanted_prototype.patch). Thanks and Regards, Kumar Rajeev Rastogi
Re: [HACKERS] Patch: Show process IDs of processes holding a lock; show relation and tuple infos of a lock to acquire
On 23/01/14, Christian Kruse wrote: Well, is it context or detail? Those fields have reasonably well defined meanings IMO. I find the distinction somewhat blurry and think both would be appropriate. But since I wasn't sure I changed to detail. If we need errcontext_plural, let's add it, not adopt inferior solutions just because that isn't there for lack of previous need. I would've added it if I would've been sure. But having said that, I think this is indeed detail not context. (I kinda wonder whether some of the stuff that's now in the primary message shouldn't be pushed to errdetail as well. It looks like some previous patches in this area have been lazy.) I agree, the primary message is not very well worded. On the other hand finding an appropriate alternative seems hard for me. While I'm griping, this message isn't even trying to follow the project's message style guidelines. Detail or context messages are supposed to be complete sentence(s), with capitalization and punctuation to match. Hm, I hope I fixed it in this version of the patch. Lastly, is this information that we want to be shipping to clients? Perhaps from a security standpoint that's not such a wise idea, and errdetail_log() is what should be used. Fixed. I added an errdetail_log_plural() for this, too. I have checked the revised patch. It looks fine to me except one minor code formatting issue: in elog.c, two tabs are missing in the definition of the function errdetail_log_plural. Please run the pgindent tool to check the same. Also, I would like to highlight one behavior: the process ID of the process trying to acquire the lock is also listed in the Request queue. E.g. session 1 with process id X: BEGIN; LOCK TABLE foo IN SHARE MODE; session 2 with process id Y: BEGIN; LOCK TABLE foo IN EXCLUSIVE MODE; On execution of LOCK in session 2, the log will display: DETAIL: Process holding the lock: X. Request queue: Y.
where Y is the process ID of the very process that is trying to acquire the lock. To me, this seems to be correct behavior: the message lists all processes already holding the lock, all processes waiting in the queue, and the position of the requesting process in the wait list. In the above example, it indicates that process Y is at the top of the wait list. Thanks and Regards, Kumar Rajeev Rastogi
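The errdetail_log_plural() behavior discussed in this thread can be sketched in plain C; this is a simplified stand-in, not PostgreSQL's real ereport machinery (names here are illustrative):

```c
#include <stdio.h>

/* Pick the singular or plural format string by count, the core idea of
 * errdetail_log_plural(). PostgreSQL additionally routes the result to the
 * server log only, never to the client. */
const char *pick_plural(unsigned long n, const char *sing, const char *plur)
{
    return n == 1 ? sing : plur;
}

/* Build a detail line such as:
 *   "Processes holding the lock: 13775, 13776. Request queue: 13777." */
void detail_plural(char *out, size_t cap, const char *sing,
                   const char *plur, unsigned long n, const char *pids)
{
    snprintf(out, cap, pick_plural(n, sing, plur), pids);
}
```

The translatable singular/plural pair is the reason a dedicated `_plural` variant is needed: some languages pluralize differently, so the choice must happen inside the reporting function, not at the call site.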
Re: [HACKERS] Add min and max execute statement time in pg_stat_statement
On 27th January, Mitsumasa KONDO wrote: 2014-01-26 Simon Riggs si...@2ndquadrant.com: On 21 January 2014 19:48, Simon Riggs si...@2ndquadrant.com wrote: On 21 January 2014 12:54, KONDO Mitsumasa kondo.mitsum...@lab.ntt.co.jp wrote: Rebased patch is attached. Does this fix the Windows bug reported by Kumar on 20/11/2013? Sorry, I was misunderstanding. The first name of Mr. Rajeev Rastogi is Kumar! I searched only the e-mail address and title by his name... I don't have a Windows compiler environment, but the attached patch might be fixed. Could I ask Mr. Rajeev Rastogi to test my patch again? I tried to test this, but I could not apply the patch on the latest git HEAD. This may be because of a recent patch related to pg_stat_statements ("external query text storage"), which was committed on 27th January. Thanks and Regards, Kumar Rajeev Rastogi
Re: [HACKERS] Standalone synchronous master
On 01/25/2014, Josh Berkus wrote: ISTM the consensus is that we need better monitoring/administration interfaces so that people can script the behavior they want in external tools. Also, a new synchronous apply replication mode would be handy, but that'd be a whole different patch. We don't have a patch on the table that we could consider committing any time soon, so I'm going to mark this as rejected in the commitfest app. I don't feel that we'll never do auto-degrade is determinative; several hackers were for auto-degrade, and they have a good use-case argument. However, we do have consensus that we need more scaffolding than this patch supplies in order to make auto-degrade *safe*. I encourage the submitter to resubmit an improved version of this patch (one with more monitorability) for 9.5 CF1. That'll give us a whole dev cycle to argue about it. I shall rework to improve this patch. Below is a summary of all the discussions, which will be used as input for improving the patch: 1. Method of degrading the synchronous mode: a. Expose the configuration variable to new SQL-callable functions. b. Using ALTER SYSTEM SET. c. Auto-degrade using some sort of configuration parameter, as done in the current patch. d. Or perhaps a combination of the above, which the DBA can use depending on their use cases. We can discuss further to decide on one of these approaches. 2. The synchronous mode should be upgraded/restored after at least one synchronous standby comes up and has caught up with the master. 3. A better monitoring/administration interface, which could be even better if made into a generic trap system. I shall propose a better approach for this. 4. Send committing clients a WARNING if they have committed a synchronous transaction and we are in degraded mode. 5. Please add more if I am missing something.
Thanks and Regards, Kumar Rajeev Rastogi
Re: [HACKERS] Proposal for CSN based snapshots
On 11th June 2013, Markus Wanner wrote: Agreed. Postgres-R uses a CommitOrderId, which is very similar in concept, for example. Do you think having this snapshot scheme would be helpful for Postgres-R? Yeah, it could help to reduce patch size, after a rewrite to use such a CSN. Or why do you need to tell apart aborted from in-progress transactions by CSN? I need to detect aborted transactions so they can be discarded during the eviction process, otherwise the sparse array will fill up. They could also be filtered out by cross-referencing uncommitted slots with the procarray. Having the abort case do some additional work to make xid assignment cheaper looks like a good tradeoff. I see. The sparse buffer needs to be at least big enough to fit CSN slots for the xids of all active transactions and non-overflowed subtransactions. At the current level PGPROC_MAX_CACHED_SUBXIDS=64, the minimum comes out at 16 bytes * (64 + 1) slots * 100 backends = 101.6KB per buffer, or 203KB total in the default configuration. A CSN is 8 bytes, the XID 4, resulting in 12 bytes per slot. So I guess the given 16 bytes includes alignment to 8 byte boundaries. Sounds good. 8 byte alignment for CSNs is needed for atomic writes, if not something else. Oh, right, atomic writes. I think the size could be cut in half by using a base value for CSNs if we assume that no xid is active for longer than 2B transactions, as is currently the case. I didn't want to include the complication in the first iteration, so I didn't verify if that would have any gotchas. In Postgres-R, I effectively used a 32-bit order id which wraps around. In this case, I guess adjusting the base value will get tricky. Wrapping could probably be used as well, instead. The number of times each cache line can be invalidated is bounded by 8. Hm.. good point. We are also planning to implement CSN-based snapshots, so I am curious to know whether any further development is happening on this. If not, what is the reason? Am I missing something?
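The sparse-buffer sizing quoted above (16 bytes * 65 slots * 100 backends) can be double-checked with a small calculation; the constants are taken from the thread:

```c
/* Sparse-buffer sizing from the thread: one 16-byte slot per xid of every
 * active transaction plus its cached (non-overflowed) subtransactions. */
enum
{
    SLOT_SIZE = 16,                  /* 8-byte CSN + 4-byte XID, padded to 16 */
    PGPROC_MAX_CACHED_SUBXIDS = 64,  /* cached subxids per backend */
    BACKENDS = 100                   /* default max_connections */
};

long sparse_buffer_bytes(void)
{
    /* (64 + 1) slots per backend: the top-level xid plus its subxids */
    return (long) SLOT_SIZE * (PGPROC_MAX_CACHED_SUBXIDS + 1) * BACKENDS;
}
```

This gives 104000 bytes, i.e. the 101.6KB per buffer (and 203KB for two buffers) cited in the message.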
Thanks and Regards, Kumar Rajeev Rastogi
Re: [HACKERS] Patch: Show process IDs of processes holding a lock; show relation and tuple infos of a lock to acquire
On 23/01/14 14:45, Christian Kruse wrote: Well, is it context or detail? Those fields have reasonably well defined meanings IMO. I find the distinction somewhat blurry and think both would be appropriate. But since I wasn't sure I changed to detail. If we need errcontext_plural, let's add it, not adopt inferior solutions just because that isn't there for lack of previous need. I would've added it if I would've been sure. But having said that, I think this is indeed detail not context. (I kinda wonder whether some of the stuff that's now in the primary message shouldn't be pushed to errdetail as well. It looks like some previous patches in this area have been lazy.) I agree, the primary message is not very well worded. On the other hand finding an appropriate alternative seems hard for me. While I'm griping, this message isn't even trying to follow the project's message style guidelines. Detail or context messages are supposed to be complete sentence(s), with capitalization and punctuation to match. Hm, I hope I fixed it in this version of the patch. Lastly, is this information that we want to be shipping to clients? Perhaps from a security standpoint that's not such a wise idea, and errdetail_log() is what should be used. Fixed. I added an errdetail_log_plural() for this, too. I think you have attached the wrong patch. Thanks and Regards, Kumar Rajeev Rastogi
Re: [HACKERS] Patch: Show process IDs of processes holding a lock; show relation and tuple infos of a lock to acquire
On 31st December 2013, Christian Kruse wrote: Hi there, I created two patches improving the log messages generated by log_lock_waits. The first patch shows the process IDs holding a lock we try to acquire (show_pids_in_lock_log.patch), sample output (log_lock_waits=on required): session 1: BEGIN; LOCK TABLE foo IN SHARE MODE; session 2: BEGIN; LOCK TABLE foo IN SHARE MODE; session 3: BEGIN; LOCK TABLE foo IN EXCLUSIVE MODE; Output w/o patch: LOG: process 13777 still waiting for ExclusiveLock on relation 16385 of database 16384 after 1000.030 ms Output with patch: LOG: process 13777 still waiting for ExclusiveLock on relation 16385 of database 16384 after 1000.030 ms CONTEXT: processes owning lock: 13775, 13776 I am reviewing this patch. The idea seems reasonable. Following are my first-level observations: 1. I found an issue in the following scenario: session 1 with process id X: BEGIN; LOCK TABLE foo IN SHARE MODE; session 2 with process id Y: BEGIN; LOCK TABLE foo IN EXCLUSIVE MODE; session 3 with process id Z: BEGIN; LOCK TABLE foo IN SHARE MODE; On execution of LOCK in session 3, the log will display: processes owning lock: X, Y But actually, Y does not yet own the lock; it is still waiting, with higher priority. This may mislead the user. Maybe we should change the message to convey both: which processes own the lock and which processes are already in the queue. 2. Can we give a better name to the new variable 'buf1'? 3. Do we need to take performance readings to see if there is any impact? 4. Do we require documentation? Thanks and Regards, Kumar Rajeev Rastogi
Re: [HACKERS] PostgreSQL Service on Windows does not start if data directory given is relative path
On Tue, Jan 12, 2014, David Rowley wrote: I have found a case where PostgreSQL as a win32 service does not start if the data directory is given as a relative path. The error observed in this case is: The PostgreSQL on Local Computer started and then stopped. This may happen because the relative path given will be relative to the path from where the service is registered, but the path from where the WIN32 service execution gets invoked may be different, and hence it won't be able to find the data directory. I have fixed this by internally converting the relative path to an absolute path, as is being done for the executable file. Hi, I've not quite got around to testing this yet, but I think the patch is a good idea, as I can see that I can use a relative path when I start postgres.exe from the command line; I guess the behaviour should likely be the same when installed as a windows service. Thanks for reviewing and providing the first level of feedback. So, I've just been looking over this patch and I'm just wondering about a few things: In pgwin32_CommandLine you declare dbPath; it looks like you could declare this in the scope of if (pg_config). Yes, I have declared it in the scope of if (pg_config). In find_my_abs_path you're making a call to StrNCpy; I know you likely just used find_my_exec as an example here, but I'd say the use of StrNCpy is not really needed here and is a bit of a waste of cycles. I'd rather see strlcpy being used, as strncpy will needlessly zero the remaining buffer space. Yes, you are right; it is much better to use strlcpy instead of StrNCpy. I have modified the patch. Also in find_my_abs_path, I'm just wondering if the cwd variable is really needed. I think you could just use retpath each time, and it would also save a little bit of the copying that's done in join_path_components(). By the looks of it you can just call join_path_components(retpath, retpath, inpath). Perhaps some people would disagree, but I'm not really seeing the harm in it, and it saves some copying too.
Yes, I am also convinced by your suggestion; it really avoids one level of copying. I have seen that a similar mechanism is used in many places where join_path_components() is called, so I think it should be OK to use in our case also. I have made the necessary changes and attached the changed patch (pgctl_win32service_rel_dbpath_v2.patch). Please provide your further feedback, if any. Thanks and Regards, Kumar Rajeev Rastogi
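The underlying fix, resolving a relative data directory against the current working directory, can be sketched as below. This is plain C with POSIX getcwd() for illustration; the actual patch uses pg_ctl's own helpers such as join_path_components(), and on Windows an absolute path test would also need to accept drive-letter forms:

```c
#include <stdio.h>
#include <string.h>
#include <unistd.h>

/* Resolve a possibly-relative path against the current working directory.
 * Illustrative stand-in for the patch's find_my_abs_path(); per the review,
 * the result is built directly in retpath with no intermediate cwd buffer. */
void make_abs_path(char *retpath, size_t cap, const char *inpath)
{
    if (inpath[0] == '/')               /* already absolute: keep as-is */
    {
        snprintf(retpath, cap, "%s", inpath);
        return;
    }
    if (getcwd(retpath, cap) == NULL)   /* anchor at the current directory */
        retpath[0] = '\0';
    size_t len = strlen(retpath);
    /* join_path_components(retpath, retpath, inpath), in pg_ctl terms */
    snprintf(retpath + len, cap - len, "/%s", inpath);
}
```

Doing this once at service registration time means the later service start, whose working directory may be C:/Windows/system32, still finds the data directory.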
Re: [HACKERS] Standalone synchronous master
On Sun, Jan 12, Amit Kapila wrote: How would that work? Would it be a tool in contrib? There already is a timeout, so if a tool checked more frequently than the timeout, it should work. The durable notification of the admin would happen in the tool, right? Well, you know what tool *I'm* planning to use. Thing is, when we talk about auto-degrade, we need to determine things like Is the replica down or is this just a network blip? and take action according to the user's desired configuration. This is not something, realistically, that we can do on a single request. Whereas it would be fairly simple for an external monitoring utility to do: 1. Decide the replica is offline for the duration (several poll attempts have failed). 2. Send ALTER SYSTEM SET to the master and change/disable the synch_replicas. Will that be possible with the current mechanism, given that presently the master will not accept any new command when the sync replica is not available? Or is there something else which needs to be done along with the above 2 points to make it possible? Since no WAL is written for the ALTER SYSTEM SET command, the master should be able to handle it even though the sync replica is not available. Thanks and Regards, Kumar Rajeev Rastogi
Re: [HACKERS] Standalone synchronous master
On 13th January 2013, Josh Berkus wrote: I'm leading this off with a review of the features offered by the actual patch submitted. My general discussion of the issues of Sync Degrade, which justifies my specific suggestions below, follows that. Rajeev, please be aware that other hackers may have different opinions than me on what needs to change about the patch, so you should collect all opinions before changing code. Thanks for reviewing and providing the first level of comments. We will surely collect all feedback to improve this patch. Add a new parameter: synchronous_standalone_master = on | off I think this is a TERRIBLE name for any such parameter. What does synchronous standalone even mean? A better name for the parameter would be auto_degrade_sync_replication or synchronous_timeout_action = error | degrade, or something similar. It would be even better for this to be a mode of synchronous_commit, except that synchronous_commit is heavily overloaded already. Yes, we can change this parameter name. Some of the suggestions for degrading the mode: 1. Auto-degrade using some sort of configuration parameter, as done in the current patch. 2. Expose the configuration variable to new SQL-callable functions, as suggested by Heikki. 3. Or use ALTER SYSTEM SET, as suggested by others. Some issues raised by this log script: LOG: standby tx0113 is now the synchronous standby with priority 1 LOG: waiting for standby synchronization -- standby wal receiver on the standby is killed (SIGKILL) LOG: unexpected EOF on standby connection LOG: not waiting for standby synchronization -- restart standby so that it connects again LOG: standby tx0113 is now the synchronous standby with priority 1 LOG: waiting for standby synchronization -- standby wal receiver is first stopped (SIGSTOP) to make sure The not waiting for standby synchronization message should be marked something stronger than LOG. I'd like ERROR. Yes, we can change this to ERROR.
Second, you have the master resuming sync rep when the standby reconnects. How do you determine when it's safe to do that? You're making the assumption that you have a failing sync standby instead of one which simply can't keep up with the master, or a flaky network connection (see discussion below). Yes, this can be further improved so that the master is upgraded back to synchronous mode, by one of the methods discussed above, only once we have made sure that the synchronous standby has caught up with the master node (this may require a better design). a. Master_to_standalone_cmd: To be executed before master switches to standalone mode. b. Master_to_sync_cmd: To be executed before master switches from sync mode to standalone mode. I'm not at all clear what the difference between these two commands is. When would one be executed, and when would the other be executed? Also, renaming ... There is a typo in the above explanation; the meanings of the two commands are: a. Master_to_standalone_cmd: To be executed during degradation of sync mode. b. Master_to_sync_cmd: To be executed before upgrade or restoration of sync mode. These two commands are per the TODO item to inform the DBA. But as per Heikki's suggestion, we should not use this mechanism to inform the DBA; rather, we should have some sort of generic trap system instead of adding this one particular extra config option specifically for this feature. This looks to be a better idea, so we can have further discussion to come up with a proper design. Missing features: a) we should at least send committing clients a WARNING if they have committed a synchronous transaction and we are in degraded mode. Yes, that is a great idea. One of the reasons that there's so much disagreement about this feature is that most of the folks strongly in favor of auto-degrade are thinking *only* of the case that the standby is completely down. There are many other reasons for a sync transaction to hang, and the walsender has absolutely no way of knowing which is the case.
For example: * Transient network issues * Standby can't keep up with master * Postgres bug * Storage/IO issues (think EBS) * Standby is restarting You don't want to handle all of those issues the same way as far as sync rep is concerned. For example, if the standby is restarting, you probably want to wait instead of degrading. I think if we support some external SQL-callable functions to degrade, as Heikki suggested, instead of auto-degrade, then users can handle at least some of the above scenarios, if not all, based on their experience and observation. Thanks and Regards, Kumar Rajeev Rastogi
Re: [HACKERS] Standalone synchronous master
On 8th Jan, 2014, Amit Kapila wrote: Add a new eager synchronous mode that starts out synchronous but reverts to asynchronous after a failure timeout period. This would require some type of command to be executed to alert administrators of this change. http://archives.postgresql.org/pgsql-hackers/2011-12/msg01224.php This patch implementation is along the same lines as given in the earlier thread. Some of the additional important changes are: 1. Added two GUC variables to take commands from the user to be executed: a. Master_to_standalone_cmd: To be executed before master switches to standalone mode. b. Master_to_sync_cmd: To be executed before master switches from sync mode to standalone mode. In the description of both switches (a and b), you are saying that it will switch to standalone mode; I think by your point 1b. you mean to say the other way (switch from standalone to sync mode). Yes, you are right. It's a typo. Instead of getting commands, why can't we just log such actions? I think adding 3 new GUC variables for this functionality seems a bit much. Actually, in the earlier discussion as well as in the TODO item, it is mentioned that some kind of command should be executed to alert the administrator. http://archives.postgresql.org/pgsql-hackers/2011-12/msg01224.php In my current patch, I have kept the LOG along with the command. Also, what will happen when it switches to standalone mode in case there are some async standbys already connected to it before going to standalone mode? If it continues to send data, then I think naming it 'enable_standalone_master' is not good. Yes, we can change the name to something more appropriate; some options are: 1. enable_async_master 2. sync_standalone_master 3. enable_nowait_master 4. enable_nowait_resp_master Please provide your suggestions on the above names, or any other.
Thanks and Regards, Kumar Rajeev Rastogi
[HACKERS] PostgreSQL Service on Windows does not start if data directory given is relative path
I have found a case where PostgreSQL as a win32 service does not start if the data directory is given as a relative path. The error observed in this case is: The PostgreSQL on Local Computer started and then stopped. This may happen because the relative path given will be relative to the path from where the service is registered, but the path from where the WIN32 service execution gets invoked may be different, and hence it won't be able to find the data directory. I have fixed this by internally converting the relative path to an absolute path, as is being done for the executable file. Attached is the patch with the fix (pgctl_win32service_rel_dbpath.patch). Please provide your opinion. I will add this to the Jan 2014 CommitFest. Thanks and Regards, Kumar Rajeev Rastogi
Re: [HACKERS] TODO: Split out pg_resetxlog output into pre- and post-sections
On 12 December 2013, Heikki Linnakangas wrote: Further review of this patch: I have done a few more cosmetic changes in your patch; please find the updated patch attached with this mail. Kindly check once whether the changes are okay. The changes are fine. Thank you. I have uploaded the latest patch in the CF app and marked it as Ready For Committer. Thanks a lot. Committed, with some minor kibitzing of comments and docs. Thanks! Thanks a lot. :-)
Re: [HACKERS] COPY table FROM STDIN doesn't show count tag
On 12th December 2013, Rajeev Rastogi wrote: On 9th December, Amit Khandelkar wrote: 1. slashcopyissuev1.patch: This patch fixes the \COPY issue. You have removed the if condition in this statement, mentioning that it is always true now: - if (copystream == pset.cur_cmd_source) - pset.lineno++; + pset.lineno++; But copystream can be different than pset.cur_cmd_source, right? As per the earlier code, the condition result was always true, so pset.lineno was always incremented. In the earlier code pset.cur_cmd_source was sent as a parameter to the function, and inside the function the same parameter was used under the name copystream. So on entry to this function both will be one and the same. I checked that inside the function handleCopyIn, neither of these parameters changes before the above check. Also, since pset is specific to a single session, it cannot change concurrently. Please let me know if I am missing something. + FILE *copyStream; /* Stream to read/write for copy command */ There is no tab between FILE and *copystream, hence it is not aligned. OK. I shall change accordingly. I ran pgindent on the settings.h file but found no issue reported. It seems a tab is not mandatory between variable declarations; for the other fields in the psqlSetting structure too, a space instead of a tab is being used. So it seems no change is required for this. Please confirm. 2. initialcopyissuev1_ontopofslashcopy.patch: Fix for COPY table FROM STDIN/STDOUT doesn't show count tag. The following header comments of ProcessResult() need to be modified: * Changes its argument to point to the last PGresult of the command string, * or NULL if that result was for a COPY FROM STDIN or COPY TO STDOUT. OK. I shall change accordingly. I have changed it in the latest patch. Regression results show all passed. Other than this, the patch needs a new regression test. I had checked the existing regression test cases and observed that they already cover all the relevant cases,
like copystdin, copystdout, \copy.stdin and \copy.stdout. But since the regression framework runs in quiet (-q) mode, it does not show any messages except query output, so our new code change does not impact the regression framework. Please let me know if you were expecting any other test cases. Summary of the two patches: 1. slashcopyissuev1.patch: No change in this patch (same as earlier). 2. initialcopyissuev2_ontopofslashcopy.patch: This patch is modified to change the comment as per the above review comments. Please provide your opinion or let me know if any other changes are required. Thanks and Regards, Kumar Rajeev Rastogi
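The "condition was always true" argument can be shown in miniature. The structure and names below are hypothetical simplifications of psql's pset and handleCopyIn(); the point is only that the caller passes pset.cur_cmd_source as the copystream argument, so the two are aliases on entry:

```c
#include <stdio.h>

/* Miniature of psql's global settings struct (illustrative fields only). */
struct settings
{
    FILE *cur_cmd_source;   /* current command input stream */
    int   lineno;           /* line counter */
};

struct settings pset;

/* The caller always passes pset.cur_cmd_source here, so comparing the
 * parameter against it at entry is a tautology ... */
void handle_copy_in(FILE *copystream)
{
    if (copystream == pset.cur_cmd_source)  /* always true at this point */
        pset.lineno++;                      /* ... hence the guard was removed */
}
```

Since nothing reassigns either variable before the check and pset is per-session, the simplified unconditional `pset.lineno++` is behavior-preserving.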
Re: [HACKERS] COPY table FROM STDIN doesn't show count tag
On 9th December, Amit Khandelkar wrote: 1. slashcopyissuev1.patch: This patch fixes the \COPY issue. You have removed the if condition in this statement, mentioning that it is always true now: - if (copystream == pset.cur_cmd_source) - pset.lineno++; + pset.lineno++; But copystream can be different than pset.cur_cmd_source, right? As per the earlier code, the condition result was always true, so pset.lineno was always incremented. In the earlier code pset.cur_cmd_source was sent as a parameter to the function, and inside the function the same parameter was used under the name copystream. So on entry to this function both will be one and the same. I checked that inside the function handleCopyIn, neither of these parameters changes before the above check. Also, since pset is specific to a single session, it cannot change concurrently. Please let me know if I am missing something. + FILE *copyStream; /* Stream to read/write for copy command */ There is no tab between FILE and *copystream, hence it is not aligned. OK. I shall change accordingly. 2. initialcopyissuev1_ontopofslashcopy.patch: Fix for COPY table FROM STDIN/STDOUT doesn't show count tag. The following header comments of ProcessResult() need to be modified: * Changes its argument to point to the last PGresult of the command string, * or NULL if that result was for a COPY FROM STDIN or COPY TO STDOUT. OK. I shall change accordingly. Regression results show all passed. Other than this, the patch needs a new regression test. I had checked the existing regression test cases and observed that they already cover all the relevant cases, like copystdin, copystdout, \copy.stdin and \copy.stdout. But since the regression framework runs in quiet (-q) mode, it does not show any messages except query output, so our new code change does not impact the regression framework. Please let me know if you were expecting any other test cases.
I don't think we need to do any doc changes, because the doc already mentions that COPY should show the COUNT tag, and does not mention anything specific to client-side COPY. OK. Please provide your opinion, based on which I shall prepare a new patch and share it. Thanks and Regards, Kumar Rajeev Rastogi
Re: [HACKERS] COPY table FROM STDIN doesn't show count tag
On 26 November 2013, Amit Khandelkar wrote: Can you please submit the \COPY patch as a separate patch? Since these are two different issues, I would like to have these two fixed and committed separately. You can always test the \COPY issue using \COPY TO followed by INSERT. Please find attached the two separate patches: 1. slashcopyissuev1.patch: This patch fixes the \COPY issue. 2. initialcopyissuev1_ontopofslashcopy.patch: Fix for COPY table FROM STDIN/STDOUT doesn't show count tag. Thanks and Regards, Kumar Rajeev Rastogi
Re: [HACKERS] TODO: Split out pg_resetxlog output into pre- and post-sections
On 29 November 2013, Amit Kapila wrote: Further review of this patch: b. Why display 'First log segment after reset' in the 'Current pg_control values' section, given that the same information is now available in the new section Values to be used after reset:? It may not always be the same. Suppose the user has given new values of seg no and TLI; then it will be different. Otherwise it will be the same. So now I have changed the code in such a way that the value of the XLOG segment # read from the checkpoint record gets printed as part of the current values, and any further new value gets printed in Values to be reset (this will always be at least one plus the current value), because of the following code in FindEndOfXLOG: xlogbytepos = newXlogSegNo * ControlFile.xlog_seg_size; newXlogSegNo = (xlogbytepos + XLogSegSize - 1) / XLogSegSize; newXlogSegNo++; It can be different, but I don't think we should display it in both sections because: a. it doesn't belong to the control file. b. if you carefully look at the original link (http://www.postgresql.org/message-id/1283277511-sup-2...@alvh.no-ip.org), these values are not displayed in the pg_control values section. So I suggest it is better to remove it from the pg_control values section. Done as per suggestion. I have done a few more cosmetic changes in your patch; please find the updated patch attached with this mail. Kindly check once whether the changes are okay. The changes are fine. Thank you. Thanks and Regards, Kumar Rajeev Rastogi
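The FindEndOfXLOG() excerpt quoted above is why the value printed under Values to be used after reset is always at least one more than the current segment number; a standalone check of that step (assuming the default 16MB segment size):

```c
#include <stdint.h>

/* Default XLOG segment size in a stock build (assumed here). */
#define XLOG_SEG_SIZE ((uint64_t) 16 * 1024 * 1024)

/* The rounding-and-advance step from FindEndOfXLOG(), applied to the
 * segment number read from the checkpoint record. */
uint64_t next_segment(uint64_t newXlogSegNo)
{
    uint64_t xlogbytepos = newXlogSegNo * XLOG_SEG_SIZE;

    /* round the byte position up to a whole segment ... */
    newXlogSegNo = (xlogbytepos + XLOG_SEG_SIZE - 1) / XLOG_SEG_SIZE;
    /* ... then advance past it */
    newXlogSegNo++;
    return newXlogSegNo;
}
```

For a segment boundary the round-up is a no-op, so the net effect is exactly current + 1, which is the "at least one plus the current value" claim in the message.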
Re: [HACKERS] TODO: Split out pg_resetxlog output into pre- and post-sections
On 29 November 2013, Amit Kapila wrote:
>>> Further review of this patch: I have done a few more cosmetic changes
>>> in your patch; please find the updated patch attached with this mail.
>>> Kindly check whether the changes are okay.
>>
>> The changes are fine. Thank you.
>
> I have uploaded the latest patch in the CF app and marked it as Ready
> For Committer.

Thanks a lot.

Thanks and Regards,
Kumar Rajeev Rastogi
Re: [HACKERS] PostgreSQL Service on Windows does not start. ~ is not a valid Win32 application
On 27 November 2013, Naoya Anzai wrote:
> Hi Rajeev,
>
>> I tested the latest patch. My observation is: if we give a relative
>> data directory path while registering the service, the service fails
>> to start, but the same works if the data directory is an absolute
>> path. This looks like an existing issue; maybe we need to internally
>> convert the relative data path to an absolute one. Since the
>> mentioned issue is pre-existing and not caused by this patch, can we
>> take it as a separate defect and fix it? If so, I can move this patch
>> to Ready For Committer.
>
> I think so too. When booted as a service, CurrentDirectory seems to be
> C:/Windows/system32, so a relative data directory path would have to
> be given relative to C:/Windows/system32.

OK. Then I am moving it to Ready For Committer.
Re: [HACKERS] TODO: Split out pg_resetxlog output into pre- and post-sections
On 28 November 2013, Amit Kapila wrote:
>>> Further review of this patch:
>>> b. Why display 'First log segment after reset' in the 'Current
>>>    pg_control values' section, now that the same information is
>>>    available in the new section "Values to be used after reset:"?
>>
>> Not always the same. If the user has given a new segment number and
>> TLI, it will be different; otherwise it will be the same. So I have
>> changed the code so that the XLOG segment # read from the checkpoint
>> record is printed as part of the current values, and any new value is
>> printed under "Values to be used after reset" (always at least one
>> more than the current value), because of the following code in
>> FindEndOfXLOG():
>>
>>     xlogbytepos = newXlogSegNo * ControlFile.xlog_seg_size;
>>     newXlogSegNo = (xlogbytepos + XLogSegSize - 1) / XLogSegSize;
>>     newXlogSegNo++;
>
> It can be different, but I don't think we should display it in both
> sections, because:
> a. It doesn't belong to the control file.
> b. If you look carefully at the original link
>    (http://www.postgresql.org/message-id/1283277511-sup-2...@alvh.no-ip.org),
>    these values are not displayed in the pg_control values section.
> So I suggest it is better to remove it from the pg_control values
> section.

Done as per suggestion.

> Few new comments:
>
> 1. The formatting for 'OldestMulti's DB' and 'OldestXID's DB' is not
>    consistent with the other values; try printing it.

Changed to align the output.

> 2. +It will print values in two sections. In first section it will print all original/guessed values
>    +and in second section all values to be used after reset. In second section filename will be
>    +always printed as segment number will be at-least one more than current. Also if any other parameter
>    +is given using appropriate switch, then those new values will be also printed.
>    a. The length of the newly added lines is not in sync with the
>       previous text; try to keep lines under 80 characters.
>    b. I think there is no need for the text "In second section"; you
>       can remove all text after that point.

Done.

> 3. !    printf(_("  -n    no update, just show extracted control values (for testing) and to be reset values of parameters (if given)\n"));
>    I find this line too long and elaborative; try to make it shorter.

Changed accordingly.

Please provide your opinion.

Thanks and Regards,
Kumar Rajeev Rastogi

Attachment: pg_resetxlogsectionV4.patch
Re: [HACKERS] TODO: Split out pg_resetxlog output into pre- and post-sections
On 26 November 2013, Amit Kapila wrote:
> Further review of this patch:
>
> a. There are a lot of trailing whitespaces in the patch.

Fixed.

> b. Why display 'First log segment after reset' in the 'Current
>    pg_control values' section, now that the same information is
>    available in the new section "Values to be used after reset:"?

Not always the same. If the user has given a new segment number and TLI,
it will be different; otherwise it will be the same. So I have changed
the code so that the XLOG segment # read from the checkpoint record is
printed as part of the current values, and any new value is printed
under "Values to be used after reset" (always at least one more than the
current value), because of the following code in FindEndOfXLOG():

    xlogbytepos = newXlogSegNo * ControlFile.xlog_seg_size;
    newXlogSegNo = (xlogbytepos + XLogSegSize - 1) / XLogSegSize;
    newXlogSegNo++;

> c. I think it is better to display 'First log segment after reset' as
>    a file name, as it was earlier. This utility takes a filename as
>    input, so I think we should display a filename.

Fixed. Printing the filename.

> d. I still think it is not a good idea to use the new parameter
>    changedParam to detect which parameters are changed; the code
>    already has that information. We should use that rather than
>    introduce a new parameter meaning the same thing.

Fixed. Removed changedParam and made the existing variables like set_xid
global; the same are now used for the check.

> e.  if (guessed && !force)
>     {
>         PrintControlValues(guessed);
>         if (!noupdate)
>         {
>             printf(_("\nIf these values seem acceptable, use -f to force reset.\n"));
>             exit(1);
>         }
>     }
>    Do you think there is any chance that noupdate can be true in the
>    above block? If not, why keep the check for it?

Fixed, along with the last comment.

> f.  if (changedParam & DISPLAY_XID)
>     {
>         printf(_("NextXID:   %u\n"), ControlFile.checkPointCopy.nextXid);
>         printf(_("oldestXID: %u\n"), ControlFile.checkPointCopy.oldestXid);
>     }
>    Here you are printing oldestXid but not oldestXidDB; if the user
>    provides an xid, both values are changed. The same is the case for
>    multitransactions.

Fixed.

> g. /* newXlogSegNo will be always printed unconditionally*/
>    Extra space between "always" and "printed". In single-line comments
>    a space should be provided where the comment text ends; please refer
>    to other single-line comments.

Fixed.

> When the values are guessed and the -n option is not given, this patch
> will not be able to distinguish the values. Don't you think it is
> better to display the values in two sections for guessed values without
> the -n flag as well? Otherwise this utility will have two formats to
> display values.

Yes, you are right. The values are now printed in two sections in two
cases: a. -n is given, or b. we had to guess and -f was not given.

Please let me know your opinion.

Thanks and Regards,
Kumar Rajeev Rastogi

Attachment: pg_resetxlogsectionV3.patch
Re: [HACKERS] PostgreSQL Service on Windows does not start. ~ is not a valid Win32 application
On 25 November 2013, Rajeev Rastogi wrote:
>> One suggestion: instead of using sizeof(cmdLine),
>> a. can't we use strlen (hence a small 'for' loop), or
>> b. use memmove to move one byte?
>
> I looked at this patch a bit. I agree that we need to fix
> pgwin32_CommandLine to double-quote the executable name, but it needs a
> great deal more work than that :-(. Whoever wrote this code was
> apparently unacquainted with the concept of buffer overrun. It's not
> going to be hard at all to crash pg_ctl with overlength arguments. I'm
> not sure that that amounts to a security bug, but it's certainly bad.
>
> After some thought it seems like the most future-proof fix is to not
> use a fixed-length buffer for the command string at all. The attached
> revised patch switches it over to using a PQExpBuffer instead, which is
> pretty much free since we're relying on libpq anyway in this program.
> (We still use a fixed-length buffer for the program path, which is OK
> because that's what find_my_exec and find_other_exec expect.) In
> addition, I fixed it to append ".exe" in both cases, not just the one.
> I'm not in a position to actually test this, but it does compile
> without warnings.

I tested the latest patch. My observation is: if we give a relative data
directory path while registering the service, the service fails to
start, but the same works if the data directory is an absolute path.
This looks like an existing issue; maybe we need to internally convert
the relative data path to an absolute one. Since the mentioned issue is
pre-existing and not caused by this patch, can we take it as a separate
defect and fix it? If so, I can move this patch to Ready For Committer.

Thanks and Regards,
Kumar Rajeev Rastogi