Re: [HACKERS] recent ALTER whatever .. SET SCHEMA refactoring
2013/1/7 Robert Haas <robertmh...@gmail.com>:
> On Mon, Jan 7, 2013 at 2:14 PM, Alvaro Herrera <alvhe...@2ndquadrant.com> wrote:
>> Kohei KaiGai escribió:
>>> Function and collation are candidates for this special-case handling;
>>> there are just two kinds of object. Another idea is to add a function
>>> pointer as an argument of AlterNamespace_internal for (upcoming) object
>>> classes that need special handling for detection of name collisions. My
>>> personal preference is the latter, rather than hardwired special-case
>>> handling. However, it may be too elaborate for just two exceptions.
>>
>> I think this idea is fine. Pass a function pointer which is non-NULL only
>> for the two exceptional cases; the code should have an Assert that either
>> the function pointer is passed, or there is a nameCacheId to use. That
>> way, the object types we already handle in the simpler way do not get any
>> more complicated than they are today, and we're not forced to create
>> useless callbacks for objects where the lookup is trivial. The function
>> pointer should return boolean, true when the function/collation is
>> already in the given schema; that way, the message wording is only
>> present in AlterObjectNamespace_internal.
>
> It seems overly complex to me. What's wrong with putting special-case
> logic directly into the function? That seems cleaner and easier to
> understand, and there's no real downside AFAICS. We have similar special
> cases elsewhere; the code can't be simpler than the actual logic.

Does it make sense to invoke AlterFunctionNamespace_oid() or
AlterCollationNamespace_oid() from AlterObjectNamespace_internal() to check
for namespace conflicts? That can handle the special cases while keeping
modularity between the common and specific parts. Let's consider a function
pointer when we have more than 5 object classes that need special treatment.
Thanks,
-- 
KaiGai Kohei <kai...@kaigai.gr.jp>

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Improve compression speeds in pg_lzcompress.c
Hi,

>> Why would that be a good tradeoff to make? Larger stored values require
>> more I/O, which is likely to swamp any CPU savings in the compression
>> step. Not to mention that a value once written may be read many times, so
>> the extra I/O cost could be multiplied many times over later on.
>
> I agree with this analysis, but I note that the test results show it
> actually improving things along both parameters.

> Hm ... one of us is reading those results backwards, then.

I think that it's a parameter-tuning issue. I added the two parameters,
PGLZ_SKIP_SIZE and PGLZ_HASH_GAP, and set PGLZ_SKIP_SIZE=3 and
PGLZ_HASH_GAP=8 for the quick tests. I also found that the performance of my
patch was nearly equal to that of the current implementation when
PGLZ_SKIP_SIZE=1 and PGLZ_HASH_GAP=1.

Apart from my patch, what I care about is that the current implementation
might be too slow relative to I/O. For example, when compressing and writing
large values, compressing the data (20-40MiB/s) might be a drag compared to
writing it to disk (50-80MiB/s). Moreover, IMHO modern (and very fast) I/O
subsystems such as SSDs make this an even bigger issue. So I think it's
worth continuing the discussion on improving compression for 9.4 or later.

> Another thing to keep in mind is that the compression area in general is a
> minefield of patents. We're fairly confident that pg_lzcompress as-is
> doesn't fall foul of any, but any significant change there would probably
> require more research.

Agreed, and we know ... we need patent-free ideas to improve compression;
for example, a pluggable compression interface, or something like that.

> I just went back and looked. Unless I'm misreading it he has about a 2.5
> times speed improvement but about a 20% worse compression result. What
> would be interesting would be to see if the knobs he's tweaked could be
> tweaked a bit more to give us substantial speedup without significant
> space degradation.

Yes, you're right, though these results depend heavily on the data sets.
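The tradeoff being tuned above (search harder for matches and get a better
ratio, or skip ahead and get more speed) can be sketched with a stand-in
compressor. This is purely illustrative: Python's zlib levels play the role
of the PGLZ_SKIP_SIZE / PGLZ_HASH_GAP knobs, and are not the pglz code.

```python
import zlib

# Illustrative only: zlib's compression levels stand in for the
# PGLZ_SKIP_SIZE / PGLZ_HASH_GAP knobs -- both trade how hard the
# compressor searches for earlier matches against output size.
data = b"the quick brown fox jumps over the lazy dog. " * 2000

fast = zlib.compress(data, 1)   # cheap match search, worse ratio
best = zlib.compress(data, 9)   # exhaustive match search, better ratio

print("raw=%d fast=%d best=%d" % (len(data), len(fast), len(best)))

# The cheap setting never compresses better, but both stay losslessly
# decodable -- analogous to pglz output being self-describing.
assert len(fast) >= len(best)
assert zlib.decompress(fast) == data
assert zlib.decompress(best) == data
```

The same shape of experiment (size vs. speed per knob setting, per data set)
is what the quoted benchmark numbers come from.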
regards,
-- 
Takeshi Yamamuro
NTT Cyber Communications Laboratory Group
Software Innovation Center (Open Source Software Center)
Tel: +81-3-5860-5057  Fax: +81-3-5463-5490
Mail: yamamuro.take...@lab.ntt.co.jp
Re: [HACKERS] Improve compression speeds in pg_lzcompress.c
> So why don't we use LZ4?

+1. I agree, though I think there are still patent issues there.

regards,
-- 
Takeshi Yamamuro
Re: [HACKERS] lazy_vacuum_heap()'s removal of HEAPTUPLE_DEAD tuples
On Tue, Jan 8, 2013 at 8:19 AM, Noah Misch <n...@leadboat.com> wrote:
> At that point in the investigation, I realized that the cost of being able
> to remove entire tuples in lazy_vacuum_heap() greatly exceeds the benefit.
> Again, the benefit is being able to remove tuples whose inserting
> transaction aborted between the HeapTupleSatisfiesVacuum() call in
> heap_page_prune() and the one in lazy_scan_heap(). To make that possible,
> lazy_vacuum_heap() grabs a cleanup lock, calls PageRepairFragmentation(),
> and emits a WAL record for every page containing LP_DEAD line pointers or
> HEAPTUPLE_DEAD tuples. If we take it out of the business of removing
> tuples, lazy_vacuum_heap() can skip WAL and update lp_flags under a mere
> shared lock. The second attached patch, for HEAD, implements that. Besides
> optimizing things somewhat, it simplifies the code and removes
> rarely-tested branches. (This patch supersedes the backpatch-oriented
> patch rather than stacking with it.)

+1. I'd also advocated removing the line pointer array scan in
lazy_scan_heap(), because heap_page_prune() does most of the work. The
reason why we still have that scan in lazy_scan_heap() is to check for new
dead tuples (which should be rare), check for all-visibility of the page,
and freeze tuples if possible. I think we can move all of that to
heap_page_prune().

But while you are at it, I wonder if you have the time and energy to look at
the single-pass vacuum work that I, Robert and others tried earlier but
failed to get to the final stage [1][2]. If we do something like that, we
might not need the second scan of the heap at all, which, as you rightly
pointed out, does not do much today anyway, other than changing LP_DEAD line
pointers to LP_UNUSED. We can push that work to the next vacuum or even a
foreground task.

BTW, with respect to your idea of not WAL-logging the operation of setting
LP_DEAD to LP_UNUSED in lazy_vacuum_page(), I wonder if this would create
problems with crash recovery.
During normal vacuum operation, you may have converted LP_DEAD to LP_UNUSED.
Then you actually used one of those line pointers for a subsequent
PageAddItem() on the page. If you crash after that but before the page gets
written to disk, then during crash recovery, AFAICS PageAddItem() will
complain with a "will not overwrite a used ItemId" warning and a subsequent
PANIC of the recovery.

Thanks,
Pavan

[1] http://archives.postgresql.org/pgsql-hackers/2011-07/msg00624.php
[2] http://archives.postgresql.org/message-id/caboikdphax5ugugb9rjnsj+zveytv8sn4ctyfcmbc47r6_b...@mail.gmail.com

-- 
Pavan Deolasee
http://www.linkedin.com/in/pavandeolasee
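The failure sequence described above can be sketched as a toy state model.
This is not the PostgreSQL C code (and it ignores full-page writes and other
real-world details); it only shows why replaying a WAL-logged PageAddItem()
against a disk image that missed the unlogged LP_DEAD -> LP_UNUSED change
trips the "used ItemId" check.

```python
# Toy model of the concern: vacuum flips LP_DEAD -> LP_UNUSED without WAL,
# a later WAL-logged insert reuses the slot, then we crash before the page
# reaches disk. Recovery replays the insert against the stale disk image.
LP_UNUSED, LP_NORMAL, LP_DEAD = 0, 1, 3

def page_add_item(page, slot):
    # Mirrors the sanity check PageAddItem() performs during replay.
    if page[slot] != LP_UNUSED:
        raise RuntimeError("will not overwrite a used ItemId")
    page[slot] = LP_NORMAL

disk_page = [LP_DEAD]           # on-disk image before vacuum runs
mem_page = list(disk_page)      # in-memory copy

mem_page[0] = LP_UNUSED         # lazy_vacuum_page(), *not* WAL-logged
wal = []
page_add_item(mem_page, 0)      # slot reused; this change IS WAL-logged
wal.append(("add_item", 0))

# Crash: mem_page never reaches disk; recovery replays WAL on disk_page.
try:
    for _rec, slot in wal:
        page_add_item(disk_page, slot)
    outcome = "ok"
except RuntimeError as e:
    outcome = str(e)

print(outcome)
```

The replay fails because the state transition it depends on was never
logged, which is exactly the PANIC scenario Pavan describes.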
Re: [HACKERS] Improve compression speeds in pg_lzcompress.c
Hi,

(2013/01/07 22:36), Greg Stark wrote:
> On Mon, Jan 7, 2013 at 10:21 AM, John R Pierce <pie...@hogranch.com> wrote:
>> On 1/7/2013 2:05 AM, Andres Freund wrote:
>>> I think there should be enough bits available in the toast pointer to
>>> indicate the type of compression. I seem to remember somebody even
>>> posting a patch to that effect? I agree that it's probably too late in
>>> the 9.3 cycle to start with this.
>>
>> so an upgraded database would have old toasted values in the old
>> compression format, and new toasted values in the new format in an
>> existing table? that's kind of ugly.
>
> I haven't looked at the patch. It's not obvious to me from the description
> that the output isn't backwards compatible. The way the LZ toast
> compression works, the output is self-describing. There are many different
> outputs that would decompress to the same thing, and the compressing code
> can choose how hard to look for earlier matches and when to just copy
> bytes wholesale, but the decompression will work regardless.

My patch is not backwards compatible, so we need some way to switch between
the old and new disk formats. I think the discussion below is helpful here;
that is, PGLZ_Header could be used for this purpose.
http://archives.postgresql.org/pgsql-hackers/2012-03/msg00971.php

regards,
-- 
Takeshi Yamamuro
Re: [HACKERS] PL/Python result object str handler
On Tue, Jan 8, 2013 at 3:58 AM, Peter Eisentraut <pete...@gmx.net> wrote:
> For debugging PL/Python functions, I'm often tempted to write something like
>
>     rv = plpy.execute(...)
>     plpy.info(rv)
>
> which prints something unhelpful like
>
>     PLyResult object at 0xb461d8d8
>
> By implementing a str handler for the result object, it now prints
> something like
>
>     PLyResult status=5 nrows=2 rows=[{'foo': 1, 'bar': '11'}, {'foo': 2, 'bar': '22'}]
>
> Patch attached for review.

How does it work if there are many rows in there? Say the result contains
10,000 rows - will the string contain all of them? If so, might it be
worthwhile to cap the number of rows shown and then follow with a "..." or
something?

-- 
Magnus Hagander
Me: http://www.hagander.net/
Work: http://www.redpill-linpro.com/
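Magnus's capping suggestion is easy to sketch in plain Python. This is not
the actual PL/Python patch (the real handler lives in C inside plpython);
the class name and cap value here are invented for illustration.

```python
# Sketch of a capped str() for a result object, per the suggestion above:
# show at most MAX_SHOWN rows, then append "..." while still reporting the
# true row count.
MAX_SHOWN = 5

class PLyResultSketch:
    def __init__(self, status, rows):
        self.status = status
        self.rows = rows

    def __str__(self):
        shown = ", ".join(repr(r) for r in self.rows[:MAX_SHOWN])
        if len(self.rows) > MAX_SHOWN:
            shown += ", ..."
        return "PLyResult status=%d nrows=%d rows=[%s]" % (
            self.status, len(self.rows), shown)

rv = PLyResultSketch(5, [{"foo": i} for i in range(10000)])
s = str(rv)
print(s)
```

Even with 10,000 rows, the debug output stays one short line while `nrows`
still tells you how large the result really is.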
Re: [HACKERS] Improve compression speeds in pg_lzcompress.c
On 01/08/2013 10:19 AM, Takeshi Yamamuro wrote:
> My patch is not backwards compatible, so we need some features to switch
> between the old and new disk formats.

Is it a property of our compressed format that makes this hard to do
backwards compatibly? Only decompression needs to keep working anyway, as we
have not supported physical compatibility in the other direction in our
other tools. That is, we don't have pg_downgrade :)

Hannu

> I think the discussion below is helpful in this use. That is, PGLZ_Header
> is used for this purpose.
> http://archives.postgresql.org/pgsql-hackers/2012-03/msg00971.php
Re: [HACKERS] recent ALTER whatever .. SET SCHEMA refactoring
On Tue, Jan 8, 2013 at 4:05 AM, Kohei KaiGai <kai...@kaigai.gr.jp> wrote:
> Does it make sense to invoke AlterFunctionNamespace_oid() or
> AlterCollationNamespace_oid() from AlterObjectNamespace_internal() to
> check for namespace conflicts? It can handle the special cases while
> keeping modularity between the common and specific parts. Let's consider a
> function pointer when we have more than 5 object classes that need special
> treatment.

Unless I'm gravely mistaken, we're only talking about a handful of lines of
code. We have lots of functions, in objectaddress.c for example, whose
behavior is conditional on the type of object that they are operating on.
And we just write out all the cases. I'm not understanding why we need to
take a substantially more complex approach here.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
Re: [HACKERS] Improve compression speeds in pg_lzcompress.c
On Tue, Jan 8, 2013 at 4:04 AM, Takeshi Yamamuro
<yamamuro.take...@lab.ntt.co.jp> wrote:
> Apart from my patch, what I care about is that the current implementation
> might be too slow relative to I/O. For example, when compressing and
> writing large values, compressing the data (20-40MiB/s) might be a drag
> compared to writing it to disk (50-80MiB/s). Moreover, IMHO modern (and
> very fast) I/O subsystems such as SSDs make this an even bigger issue.

What about just turning compression off?

-- 
Robert Haas
Re: [HACKERS] Re: Proposal: Store timestamptz of database creation on pg_database
On Fri, Jan 4, 2013 at 1:07 PM, Peter Eisentraut <pete...@gmx.net> wrote:
> On 1/3/13 3:26 PM, Robert Haas wrote:
>> It's true, as we've often said here, that leveraging the OS facilities
>> means that we get the benefit of improving OS facilities for free - but
>> it also means that we never exceed what the OS facilities are able to
>> provide.
>
> And that should be the deciding factor, shouldn't it? Clearly, the OS
> timestamps do not satisfy the requirements of tracking database object
> creation times.

Yes, I think so. But I am not entirely sold on tracking the creation time of
every SQL object. It might be all right, but what about catalog bloat? I am
on board for databases, and for tables, at any rate.

-- 
Robert Haas
Re: [HACKERS] recent ALTER whatever .. SET SCHEMA refactoring
2013/1/8 Robert Haas <robertmh...@gmail.com>:
> Unless I'm gravely mistaken, we're only talking about a handful of lines
> of code. We have lots of functions, in objectaddress.c for example, whose
> behavior is conditional on the type of object that they are operating on.
> And we just write out all the cases. I'm not understanding why we need to
> take a substantially more complex approach here.

I'm probably describing the same idea. It just adds invocations of external
functions to check naming conflicts of functions or collations; that takes
an additional 4 lines of special-case handling in
AlterObjectNamespace_internal(). Do you have a different picture of the
special-case handling?

@@ -380,6 +368,10 @@ AlterObjectNamespace_internal(Relation rel, Oid objid, Oid nspOid)
                  errmsg("%s already exists in schema \"%s\"",
                         getObjectDescriptionOids(classId, objid),
                         get_namespace_name(nspOid))));
+    else if (classId == ProcedureRelationId)
+        AlterFunctionNamespace_oid(rel, objid, nspOid);
+    else if (classId == CollationRelationId)
+        AlterCollationNamespace_oid(rel, objid, nspOid);

     /* Build modified tuple */
     values = palloc0(RelationGetNumberOfAttributes(rel) * sizeof(Datum));

Thanks,
-- 
KaiGai Kohei <kai...@kaigai.gr.jp>

pgsql-v9.3-alter-namespace-specials.patch
Description: Binary data
Re: [HACKERS] Extra XLOG in Checkpoint for StandbySnapshot
On Monday, January 07, 2013 7:15 PM Andres Freund wrote:
> On 2013-01-07 19:03:35 +0530, Amit Kapila wrote:
>> On Monday, January 07, 2013 6:30 PM Simon Riggs wrote:
>>> On 7 January 2013 12:39, Amit Kapila <amit.kap...@huawei.com> wrote:
>>>> So we can modify function LogStandbySnapshot as below:
>>>>
>>>>     running = GetRunningTransactionData();
>>>>     if (running->xcnt > 0)
>>>>         LogCurrentRunningXacts(running);
>>>>
>>>> This check will make sure that if no operation is happening, i.e. no
>>>> new running transaction, then there is no need to log the running
>>>> transactions snapshot, and hence further checkpoint operations will be
>>>> skipped. Let me know if I am missing something?
>>>
>>> It's not the same test. The fact that nothing is running at that moment
>>> is not the same thing as saying nothing at all has run since the last
>>> checkpoint.
>>
>> But isn't the functionality of LogStandbySnapshot() to log all running
>> xids and all current AccessExclusiveLocks? For RunningTransactionLocks,
>> WAL is avoided in a similar way.
>
> The information that no transactions are currently running allows you to
> build a recovery snapshot; without that information the standby won't
> start answering queries. Now that doesn't matter if all standbys have
> already built a snapshot, but the primary cannot know that.

Can't we make sure that the checkpoint operation doesn't happen under the
following conditions:
a. nothing has happened during or after the last checkpoint, OR
b. nothing except the standby-snapshot WAL record has happened
Currently it is done only for point a.

> Having to issue a checkpoint while ensuring transactions are running just
> to get a standby up doesn't seem like a good idea to me :)
>
> Simon: If you make the correct test, I'd be more inclined to accept the
> premise.

Not sure what exactly you are expecting from a test? The test is: do any one
operation on the system and then keep the system idle. Now at each
checkpoint interval, it logs WAL for the standby snapshot.

With Regards,
Amit Kapila.
Re: [HACKERS] Extra XLOG in Checkpoint for StandbySnapshot
On 2013-01-08 19:51:39 +0530, Amit Kapila wrote:
> [...]
>
> Can't we make sure that the checkpoint operation doesn't happen under the
> following conditions:
> a. nothing has happened during or after the last checkpoint, OR
> b. nothing except the standby-snapshot WAL record has happened
> Currently it is done only for point a.
>
> Not sure what exactly you are expecting from a test? The test is: do any
> one operation on the system and then keep the system idle. Now at each
> checkpoint interval, it logs WAL for the standby snapshot.

I can't really follow what you want to do here.
The snapshot is only logged if a checkpoint is performed anyway? As recovery
starts at the (logical) checkpoint's location, we need to log a snapshot
exactly there. If you want to avoid activity when the system is idle, you
need to prevent the checkpoints themselves from occurring. There was a
thread some time back about that, and it's not as trivial as it seems at
first glance.

Greetings,

Andres Freund

-- 
Andres Freund    http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
Re: [HACKERS] Improve compression speeds in pg_lzcompress.c
On Tue, Jan 8, 2013 at 10:20 AM, Robert Haas <robertmh...@gmail.com> wrote:
> On Tue, Jan 8, 2013 at 4:04 AM, Takeshi Yamamuro
> <yamamuro.take...@lab.ntt.co.jp> wrote:
>> Apart from my patch, what I care about is that the current implementation
>> might be too slow relative to I/O. For example, when compressing and
>> writing large values, compressing the data (20-40MiB/s) might be a drag
>> compared to writing it to disk (50-80MiB/s). Moreover, IMHO modern (and
>> very fast) I/O subsystems such as SSDs make this an even bigger issue.
>
> What about just turning compression off?

I've been relying on compression for some big serialized blob fields for
some time now. I bet I'm not alone; lots of people save serialized data to
text fields. So rather than removing it, I'd just change the default to off
(if that was the decision).

However, it might be best to evaluate some of the modern fast compression
schemes like snappy/lz4 (250MB/s per core sounds pretty good), and implement
pluggable compression schemes instead. Snappy wasn't designed for nothing;
it most likely exists because it was necessary. Cassandra (just to name a
system I'm familiar with) started without compression, and then it was
deemed necessary to the point that they invested considerable time into it.

I've always found the fact that pg does compression of toast tables quite
forward-thinking, and I'd say the feature has to remain there - extended and
modernized, maybe off by default, but there.
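The "pluggable compression" idea, combined with the spare toast-pointer bits
mentioned upthread, can be sketched as spending one format byte to name the
codec, so old pglz values and new lz4/snappy values could coexist on disk.
The codec ids and the zlib stand-ins below are invented for illustration;
this is not the actual toast format.

```python
import zlib

# Hedged sketch: a one-byte codec id prefixed to the compressed payload
# lets decompression pick the right codec, which is what makes mixing old
# and new formats in one table workable.
CODECS = {
    0: (lambda d: d, lambda d: d),                        # uncompressed
    1: (lambda d: zlib.compress(d, 1), zlib.decompress),  # "fast" codec
    2: (lambda d: zlib.compress(d, 9), zlib.decompress),  # "tight" codec
}

def toast_compress(data, codec_id):
    comp, _ = CODECS[codec_id]
    return bytes([codec_id]) + comp(data)

def toast_decompress(blob):
    _, decomp = CODECS[blob[0]]   # first byte selects the codec
    return decomp(blob[1:])

payload = b"serialized blob " * 512
roundtrips = [toast_decompress(toast_compress(payload, cid)) == payload
              for cid in CODECS]
print(roundtrips)
```

The design cost is one byte per value plus a codec registry; the benefit is
that upgrading never has to rewrite existing compressed values.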
Re: [HACKERS] proposal: Set effective_cache_size to greater of .conf value, shared_buffers
On Tue, Jan 8, 2013 at 11:39 AM, Merlin Moncure <mmonc...@gmail.com> wrote:
> Reference:
> http://postgresql.1045698.n5.nabble.com/Simple-join-doesn-t-use-index-td5738689.html
>
> This is a pretty common gotcha: the user sets shared_buffers but misses
> the esoteric but important effective_cache_size. ISTM effective_cache_size
> should always be >= shared_buffers -- this is a soft configuration error
> that could be reported as a warning and perhaps overridden on the fly.

Not true. If there are many concurrent users running concurrent queries
against parallel databases, such as some test systems I have that contain
many databases for many test environments, such a setting wouldn't make
sense. If a DBA sets it lower than shared_buffers, that setting has to be
honored. Rather, I'd propose the default setting should be -1, or something
default and automagic that works most of the time (but not all).
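The "warn but honor the DBA's setting" position above amounts to a soft
check at configuration-load time rather than an override. A minimal sketch,
with invented function names and units (this is not a real GUC hook):

```python
# Sketch of the soft configuration warning discussed above: flag
# effective_cache_size < shared_buffers as suspicious, but never change it.
def check_effective_cache_size(effective_cache_size_mb, shared_buffers_mb):
    if effective_cache_size_mb < shared_buffers_mb:
        return ("WARNING: effective_cache_size (%dMB) is below "
                "shared_buffers (%dMB); is this intentional?"
                % (effective_cache_size_mb, shared_buffers_mb))
    return None  # setting looks sane; say nothing

print(check_effective_cache_size(4096, 1024))
print(check_effective_cache_size(512, 1024))
```

Keeping it a warning preserves legitimate configurations like the
many-databases test systems Claudio describes.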
Re: [HACKERS] json api WIP patch
On 01/08/2013 01:45 AM, james wrote:
>> The processing functions have been extended to provide populate_record()
>> and populate_recordset() functions. The latter in particular could be
>> useful in decomposing a piece of json representing an array of flat
>> objects (a fairly common pattern) into a set of Postgres records in a
>> single pass.
>
> So this would allow an 'insert into ... select ... from
> unpack-the-JSON(...)'?

Yes.

> I had been wondering how to do such an insertion efficiently in the
> context of SPI, but it seems that there is no SPI_copy equiv that would
> allow a query parse and plan to be avoided.

Your query above would need to be planned too, although the plan will be
trivial.

> Is this mechanism likely to be as fast as we can get at the moment in
> contexts where copy is not feasible?

You should not try to use it as a general bulk load facility. And it will
not be as fast as COPY, for several reasons, including that the JSON parsing
routines are necessarily much heavier than the COPY parse routines, which
have in any case been optimized over quite a long period. Also, a single
json datum is limited to no more than 1GB. If you have such a datum, parsing
it involves having it in memory and then taking a copy (I wonder if we could
avoid that step - will take a look). Then each object is decomposed into a
hash table of key/value pairs, which is then used to construct the record
datum. Each field name in the result record is used to look up the value in
the hash table - this happens once in the case of populate_record() and once
per object in the array in the case of populate_recordset(). In the latter
case the resulting records are put into a tuplestore structure (which spills
to disk if necessary), which is then returned to the caller when all the
objects in the json array are processed.

COPY doesn't have these sorts of issues. It knows, without having to look
things up, where each datum is in each record, and it stashes the result
straight into the target table.
It can read and insert huge numbers of rows without significant memory
implications.

Both these routines and COPY in non-binary mode use the data type input
routines to convert text values. In some cases (very notably timestamps)
these routines can easily be shown to be fantastically expensive compared to
binary input. This is part of what has led to the creation of utilities like
pg_bulkload.

Perhaps if you give us a higher-level view of what you're trying to achieve,
we can help you better.

cheers

andrew
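The decomposition flow Andrew describes (one hash table per array element,
one lookup per target field) can be sketched in a few lines. This is a toy
model, not the server-side C implementation; the record-field tuple stands
in for the target record type.

```python
import json

# Toy model of the populate_recordset() flow: each object in a json array
# becomes a hash table (a Python dict), and each field of the target record
# type is looked up in that hash -- once per object, as described above.
record_fields = ("foo", "bar")   # stands in for the target record type

def populate_recordset(record_fields, js):
    rows = []
    for obj in json.loads(js):   # one hash table per array element
        # missing keys become None, like NULL for an absent json field
        rows.append(tuple(obj.get(f) for f in record_fields))
    return rows

js = '[{"foo": 1, "bar": "11"}, {"foo": 2, "bar": "22"}]'
rows = populate_recordset(record_fields, js)
print(rows)
```

The per-object parse and per-field hash lookup are exactly the costs that
make this heavier than COPY, which knows each datum's position up front.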
Re: [HACKERS] Extra XLOG in Checkpoint for StandbySnapshot
On Tuesday, January 08, 2013 8:01 PM Andres Freund wrote:
> On 2013-01-08 19:51:39 +0530, Amit Kapila wrote:
>> [...]
>>
>> Can't we make sure that the checkpoint operation doesn't happen under the
>> following conditions:
>> a. nothing has happened during or after the last checkpoint, OR
>> b. nothing except the standby-snapshot WAL record has happened
>> Currently it is done only for point a.
>>
>> Not sure what exactly you are expecting from a test? The test is: do any
>> one operation on the system and then keep the system idle. Now at each
>> checkpoint interval, it logs WAL for the standby snapshot.
> I can't really follow what you want to do here. The snapshot is only
> logged if a checkpoint is performed anyway? As recovery starts at the
> (logical) checkpoint's location we need to log a snapshot exactly there.
> If you want to avoid activity when the system is idle you need to prevent
> the checkpoints themselves from occurring.

Even if a checkpoint is scheduled, it doesn't perform the actual operation
if nothing has been logged between the current and previous checkpoints, due
to the check below in CreateCheckPoint():

    if (curInsert == ControlFile->checkPoint +
        MAXALIGN(SizeOfXLogRecord + sizeof(CheckPoint)) &&
        ControlFile->checkPoint == ControlFile->checkPointCopy.redo)

But if we set wal_level to hot_standby, it will log a snapshot; then the
next time CreateCheckPoint() is called for a scheduled checkpoint, the above
check will fail and it will again log a snapshot, and so this continues even
if the system is totally idle. I understand that it doesn't cause any
problem, but I think it would be better if the repeated logging of snapshots
in this scenario could be avoided.

> There was a thread some time back about that and its not as trivial as it
> seems at the first glance.

I know part of it: it had been fixed to avoid the checkpoint operation on
low-activity systems, and later that change was rolled back due to another
problem, but I am not sure whether it has been agreed that we don't need to
do anything for the above scenario.

With Regards,
Amit Kapila.
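The feedback loop Amit describes (the snapshot record logged at each
checkpoint itself defeats the "nothing happened, so skip" test) can be shown
with a toy simulation. This is not the CreateCheckPoint() code; it only
models the interaction of the skip test with LogStandbySnapshot().

```python
# Toy model: a checkpoint is skipped if no WAL was written since the last
# one, but with hot_standby each performed checkpoint writes one snapshot
# record, so an idle primary never stops checkpointing.
def run(intervals, wal_level_hot_standby):
    wal_since_last_ckpt = 1      # one initial operation, then the system idles
    performed = 0
    for _ in range(intervals):
        if wal_since_last_ckpt == 0:
            continue             # models the skip test in CreateCheckPoint()
        performed += 1
        wal_since_last_ckpt = 0
        if wal_level_hot_standby:
            wal_since_last_ckpt += 1   # LogStandbySnapshot() record
    return performed

print(run(10, wal_level_hot_standby=False))  # settles after one checkpoint
print(run(10, wal_level_hot_standby=True))   # checkpoints every interval
```

With the snapshot record counted as activity, the skip test never fires
again, which is exactly the repeated-logging behavior under discussion.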
Re: [HACKERS] Re: Proposal: Store timestamptz of database creation on pg_database
On Sat, Jan 5, 2013 at 11:04 AM, Stephen Frost <sfr...@snowman.net> wrote:
> * Fabrízio de Royes Mello (fabriziome...@gmail.com) wrote:
>> * also we discussed creating two new catalogs, one local and another
>> shared (like pg_description and pg_shdescription), to track creation
>> times of all database objects.
>
> Creating a separate catalog (or two) every time we want to track XYZ for
> all objects is rather overkill... Thinking about this a bit more, and
> noting that pg_description/shdescription more-or-less already exist as a
> framework for tracking 'something' for 'all catalog entries' - why don't
> we just add these columns to those tables..? This would also address
> Peter's concern about making sure we do this 'wholesale' and in one
> release rather than spread across multiple releases - just make sure it
> covers the same set of things which 'comment' does.

I suspect that trying to shoehorn this into pg_description/pg_shdescription
will contort both features unnecessarily, but I'm willing to be proven
wrong. Also, I don't think we really need a GUC for this. Indeed, a GUC
would seem to me to defeat the entire point of the feature.

-- 
Robert Haas
Re: [HACKERS] Extra XLOG in Checkpoint for StandbySnapshot
On 2013-01-08 20:33:28 +0530, Amit Kapila wrote: On Tuesday, January 08, 2013 8:01 PM Andres Freund wrote: On 2013-01-08 19:51:39 +0530, Amit Kapila wrote: On Monday, January 07, 2013 7:15 PM Andres Freund wrote: On 2013-01-07 19:03:35 +0530, Amit Kapila wrote: On Monday, January 07, 2013 6:30 PM Simon Riggs wrote: On 7 January 2013 12:39, Amit Kapila amit.kap...@huawei.com wrote:

So we can modify this in LogStandbySnapshot() as below:

    running = GetRunningTransactionData();
    if (running->xcnt > 0)
        LogCurrentRunningXacts(running);

This check will make sure that if there is no operation happening, i.e. no new running transaction, then there is no need to log the running-transactions snapshot, and hence further checkpoint operations will be skipped. Let me know if I am missing something?

It's not the same test. The fact that nothing is running at that moment is not the same thing as saying nothing at all has run since the last checkpoint.

But isn't the functionality of LogStandbySnapshot() to log all running xids and all current AccessExclusiveLocks? For RunningTransactionLocks, WAL is avoided in a similar way.

The information that no transactions are currently running allows you to build a recovery snapshot; without that information the standby won't start answering queries. Now that doesn't matter if all standbys have already built a snapshot, but the primary cannot know that.

Can't we make sure that the checkpoint operation doesn't happen when:
a. nothing has happened during or after the last checkpoint, OR
b. nothing except standby-snapshot WAL has happened?
Currently it is done for point a.

Having to issue a checkpoint while ensuring transactions are running just to get a standby up doesn't seem like a good idea to me :)

Simon: If you make the correct test, I'd be more inclined to accept the premise.

Not sure what exactly you are expecting from the test? The test is: do any one operation on the system and then keep the system idle.
Now at each checkpoint interval, it logs WAL for SnapshotStandby.

I can't really follow what you want to do here. The snapshot is only logged if a checkpoint is performed anyway? As recovery starts at the (logical) checkpoint's location, we need to log a snapshot exactly there. If you want to avoid activity when the system is idle, you need to prevent the checkpoints themselves from occurring.

Even if the checkpoint is scheduled, it doesn't perform any actual work if nothing has been logged between the current and previous checkpoint, due to the following check in CreateCheckPoint():

    if (curInsert == ControlFile->checkPoint +
        MAXALIGN(SizeOfXLogRecord + sizeof(CheckPoint)) &&
        ControlFile->checkPoint == ControlFile->checkPointCopy.redo)

But if we set wal_level to hot_standby, it will log a snapshot; the next time CreateCheckPoint() is called for a scheduled checkpoint, the above check will fail and it will log a snapshot again, and so on, even if the system is totally idle. I understand that it doesn't cause any problem, but I think it would be better if the repeated snapshot logging in this scenario could be avoided.

ISTM in that case you just need a way to cope with the additionally logged record in the above piece of code. Not logging seems to be entirely the wrong way to go about this. I admit it's not totally simple, but making HS less predictable seems like a cure *far* worse than the disease.

Greetings, Andres Freund

-- Andres Freund http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services

-- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
[HACKERS] Weird Assert failure in GetLockStatusData()
This is a bit disturbing: http://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=bushpig&dt=2013-01-07%2019%3A15%3A02

The key bit is:

[50eb2156.651e:6] LOG: execute isolationtester_waiting: SELECT 1 FROM pg_locks holder, pg_locks waiter WHERE NOT waiter.granted AND waiter.pid = $1 AND holder.granted AND holder.pid <> $1 AND holder.pid IN (25887, 25888, 25889) AND holder.mode = ANY (CASE waiter.mode WHEN 'AccessShareLock' THEN ARRAY['AccessExclusiveLock'] WHEN 'RowShareLock' THEN ARRAY['ExclusiveLock','AccessExclusiveLock'] WHEN 'RowExclusiveLock' THEN ARRAY['ShareLock','ShareRowExclusiveLock','ExclusiveLock','AccessExclusiveLock'] WHEN 'ShareUpdateExclusiveLock' THEN ARRAY['ShareUpdateExclusiveLock','ShareLock','ShareRowExclusiveLock','ExclusiveLock','AccessExclusiveLock'] WHEN 'ShareLock' THEN ARRAY['RowExclusiveLock','ShareUpdateExclusiveLock','ShareRowExclusiveLock','ExclusiveLock','AccessExclusiveLock'] WHEN 'ShareRowExclusiveLock' THEN ARRAY['RowExclusiveLock','ShareUpdateExclusiveLock','ShareLock','ShareRowExclusiveLock','ExclusiveLock','AccessExclusiveLock'] WHEN 'ExclusiveLock' THEN ARRAY['RowShareLock','RowExclusiveLock','ShareUpdateExclusiveLock','ShareLock','ShareRowExclusiveLock','ExclusiveLock','AccessExclusiveLock'] WHEN 'AccessExclusiveLock' THEN ARRAY['AccessShareLock','RowShareLock','RowExclusiveLock','ShareUpdateExclusiveLock','ShareLock','ShareRowExclusiveLock','ExclusiveLock','AccessExclusiveLock'] END) AND holder.locktype IS NOT DISTINCT FROM waiter.locktype AND holder.database IS NOT DISTINCT FROM waiter.database AND holder.relation IS NOT DISTINCT FROM waiter.relation AND holder.page IS NOT DISTINCT FROM waiter.page AND holder.tuple IS NOT DISTINCT FROM waiter.tuple AND holder.virtualxid IS NOT DISTINCT FROM waiter.virtualxid AND holder.transactionid IS NOT DISTINCT FROM waiter.transactionid AND holder.classid IS NOT DISTINCT FROM waiter.classid AND holder.objid IS NOT DISTINCT FROM waiter.objid AND holder.objsubid IS NOT DISTINCT FROM waiter.objsubid
[50eb2156.651e:7] DETAIL: parameters: $1 = '25889'
TRAP: FailedAssertion(!(el == data->nelements), File: "lock.c", Line: 3398)
[50eb2103.62ee:2] LOG: server process (PID 25886) was terminated by signal 6: Aborted
[50eb2103.62ee:3] DETAIL: Failed process was running: SELECT 1 FROM pg_locks holder, pg_locks waiter WHERE NOT waiter.granted AND waiter.pid = $1 ... (same isolationtester_waiting query as above, truncated in the log)

The assertion failure seems to indicate that the number of LockMethodProcLockHash entries found by hash_seq_search didn't match the number that had been counted by hash_get_num_entries immediately before that. I don't see any bug in GetLockStatusData itself, so this suggests that there's something wrong with dynahash's entry counting, or that somebody somewhere is modifying the shared hash table without holding the appropriate lock. The latter seems a bit more likely, given that this must be a very low-probability bug or we'd have seen it before. An overlooked locking requirement in a seldom-taken code path would fit the symptoms. Or maybe bushpig just had some weird cosmic-ray hardware failure, but I don't put a lot of faith in such explanations. Thoughts? regards, tom lane

-- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] PL/Python result object str handler
On Tue, Jan 8, 2013 at 9:32 AM, Magnus Hagander mag...@hagander.net wrote: On Tue, Jan 8, 2013 at 3:58 AM, Peter Eisentraut pete...@gmx.net wrote: For debugging PL/Python functions, I'm often tempted to write something like

    rv = plpy.execute(...)
    plpy.info(rv)

which prints something unhelpful like

    <PLyResult object at 0xb461d8d8>

By implementing a str handler for the result object, it now prints something like

    PLyResult status=5 nrows=2 rows=[{'foo': 1, 'bar': '11'}, {'foo': 2, 'bar': '22'}]

This looks more like a repr-style format to me (if you implement repr but not str, the latter will default to the former). Patch attached for review. How does it work if there are many rows in there? Say the result contains 10,000 rows - will the string contain all of them? If so, might it be worthwhile to cap the number of rows shown and then follow with a "..." or something? I think it would: old Django versions were a pain in the neck because when a page broke, an entire dump of gigantic queries was often included as debug info. -- Daniele

-- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
[HACKERS] pg_upgrade regression test litters the source tree with log files
In a tree in which I previously ran make check in contrib/pg_upgrade: $ make -s distclean $ git status # On branch master # Untracked files: # (use git add file... to include in what will be committed) # # contrib/pg_upgrade/pg_upgrade_dump_1.log # contrib/pg_upgrade/pg_upgrade_dump_12912.log # contrib/pg_upgrade/pg_upgrade_dump_16384.log nothing added to commit but untracked files present (use git add to track) Not sure how long this has been happening. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Cascading replication: should we detect/prevent cycles?
On 1/5/13 1:21 PM, Peter Geoghegan wrote: On 21 December 2012 14:08, Robert Haas robertmh...@gmail.com wrote: I'm sure it's possible; I don't *think* it's terribly easy. I'm inclined to agree that this isn't a terribly pressing issue. Certainly, the need to introduce a bunch of new infrastructure to detect this case seems hard to justify. Impossible to justify, I'd say. Does anyone have any objections to my adding this to the TODO list, in case some clever GSOC student comes up with a way to do it *without* adding a bunch of infrastructure? -- Josh Berkus PostgreSQL Experts Inc. http://pgexperts.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
[HACKERS] [PATCH] xlogreader-v4
From: Andres Freund and...@2ndquadrant.com
Subject: [PATCH] xlogreader-v4
In-Reply-To:

Hi, this is the latest and obviously best version of the xlogreader & xlogdump work, with changes from both Heikki and me. Changes:

* windows build support for pg_xlogdump
* xlogdump moved to contrib
* xlogdump option parsing enhancements
* generic cleanups
* a few more comments
* removal of some ugliness in XLogFindNextRecord

I think it's mostly ready; for xlogdump, minimally these two issues remain:

    const char *
    timestamptz_to_str(TimestampTz dt)
    {
        return "unimplemented-timestamp";
    }

    const char *
    relpathbackend(RelFileNode rnode, BackendId backend, ForkNumber forknum)
    {
        return "unimplemented-relpathbackend";
    }

These aren't exactly the nicest wrapper functions. I think it's ok to simply copy relpathbackend from the backend, but timestamptz_to_str? That's a heck of a lot of code. Patches 1, 2 and 5 are just preparatory and probably can be applied beforehand. 3 and 4 are the real meat of this, and especially 3 needs some careful review. Input welcome! Andres

-- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
[HACKERS] [PATCH 1/5] Centralize Assert* macros into c.h so its common between backend/frontend
From: Andres Freund and...@anarazel.de c.h already had parts of the assert support (StaticAssert*) and its the shared file between postgres.h and postgres_fe.h. This makes it easier to build frontend programs which have to do the hack. --- src/include/c.h | 65 +++ src/include/postgres.h| 54 ++- src/include/postgres_fe.h | 12 - 3 files changed, 67 insertions(+), 64 deletions(-) diff --git a/src/include/c.h b/src/include/c.h index f7db157..c30df8b 100644 --- a/src/include/c.h +++ b/src/include/c.h @@ -694,6 +694,71 @@ typedef NameData *Name; /* + * USE_ASSERT_CHECKING, if defined, turns on all the assertions. + * - plai 9/5/90 + * + * It should _NOT_ be defined in releases or in benchmark copies + */ + +/* + * Assert() can be used in both frontend and backend code. In frontend code it + * just calls the standard assert, if it's available. If use of assertions is + * not configured, it does nothing. + */ +#ifndef USE_ASSERT_CHECKING + +#define Assert(condition) +#define AssertMacro(condition) ((void)true) +#define AssertArg(condition) +#define AssertState(condition) + +#elif defined FRONTEND + +#include assert.h +#define Assert(p) assert(p) +#define AssertMacro(p) ((void) assert(p)) + +#else /* USE_ASSERT_CHECKING FRONTEND */ + +/* + * Trap + * Generates an exception if the given condition is true. + */ +#define Trap(condition, errorType) \ + do { \ + if ((assert_enabled) (condition)) \ + ExceptionalCondition(CppAsString(condition), (errorType), \ +__FILE__, __LINE__); \ + } while (0) + +/* + * TrapMacro is the same as Trap but it's intended for use in macros: + * + * #define foo(x) (AssertMacro(x != 0), bar(x)) + * + * Isn't CPP fun? + */ +#define TrapMacro(condition, errorType) \ + ((bool) ((! assert_enabled) || ! 
(condition) || \ +(ExceptionalCondition(CppAsString(condition), (errorType), \ + __FILE__, __LINE__), 0))) + +#define Assert(condition) \ + Trap(!(condition), FailedAssertion) + +#define AssertMacro(condition) \ + ((void) TrapMacro(!(condition), FailedAssertion)) + +#define AssertArg(condition) \ + Trap(!(condition), BadArgument) + +#define AssertState(condition) \ + Trap(!(condition), BadState) + +#endif /* USE_ASSERT_CHECKING !FRONTEND */ + + +/* * Macros to support compile-time assertion checks. * * If the condition (a compile-time-constant expression) evaluates to false, diff --git a/src/include/postgres.h b/src/include/postgres.h index b6e922f..bbe125a 100644 --- a/src/include/postgres.h +++ b/src/include/postgres.h @@ -25,7 +25,7 @@ * --- * 1) variable-length datatypes (TOAST support) * 2) datum type + support macros - * 3) exception handling definitions + * 3) exception handling * * NOTES * @@ -627,62 +627,12 @@ extern Datum Float8GetDatum(float8 X); /* - * Section 3: exception handling definitions - * Assert, Trap, etc macros + * Section 3: exception handling backend support * */ extern PGDLLIMPORT bool assert_enabled; -/* - * USE_ASSERT_CHECKING, if defined, turns on all the assertions. - * - plai 9/5/90 - * - * It should _NOT_ be defined in releases or in benchmark copies - */ - -/* - * Trap - * Generates an exception if the given condition is true. - */ -#define Trap(condition, errorType) \ - do { \ - if ((assert_enabled) (condition)) \ - ExceptionalCondition(CppAsString(condition), (errorType), \ -__FILE__, __LINE__); \ - } while (0) - -/* - * TrapMacro is the same as Trap but it's intended for use in macros: - * - * #define foo(x) (AssertMacro(x != 0), bar(x)) - * - * Isn't CPP fun? - */ -#define TrapMacro(condition, errorType) \ - ((bool) ((! assert_enabled) || ! 
(condition) || \ -(ExceptionalCondition(CppAsString(condition), (errorType), \ - __FILE__, __LINE__), 0))) - -#ifndef USE_ASSERT_CHECKING -#define Assert(condition) -#define AssertMacro(condition) ((void)true) -#define AssertArg(condition) -#define AssertState(condition) -#else -#define Assert(condition) \ -
[HACKERS] [PATCH 2/5] Make relpathbackend return a statically result instead of palloc()'ing it
From: Andres Freund and...@anarazel.de relpathbackend() (via some of its wrappers) is used in *_desc routines which we want to be useable without a backend environment arround. Change signature to return a 'const char *' to make misuse easier to detect. That necessicates also changing the 'FileName' typedef to 'const char *' which seems to be a good idea anyway. --- src/backend/access/rmgrdesc/smgrdesc.c | 6 ++--- src/backend/access/rmgrdesc/xactdesc.c | 6 ++--- src/backend/access/transam/xlogutils.c | 9 +++ src/backend/catalog/catalog.c | 49 +++--- src/backend/storage/buffer/bufmgr.c| 12 +++-- src/backend/storage/file/fd.c | 6 ++--- src/backend/storage/smgr/md.c | 23 +--- src/backend/utils/adt/dbsize.c | 4 +-- src/include/catalog/catalog.h | 2 +- src/include/storage/fd.h | 9 +++ 10 files changed, 42 insertions(+), 84 deletions(-) diff --git a/src/backend/access/rmgrdesc/smgrdesc.c b/src/backend/access/rmgrdesc/smgrdesc.c index bcabf89..490c8c7 100644 --- a/src/backend/access/rmgrdesc/smgrdesc.c +++ b/src/backend/access/rmgrdesc/smgrdesc.c @@ -26,19 +26,17 @@ smgr_desc(StringInfo buf, uint8 xl_info, char *rec) if (info == XLOG_SMGR_CREATE) { xl_smgr_create *xlrec = (xl_smgr_create *) rec; - char *path = relpathperm(xlrec-rnode, xlrec-forkNum); + const char *path = relpathperm(xlrec-rnode, xlrec-forkNum); appendStringInfo(buf, file create: %s, path); - pfree(path); } else if (info == XLOG_SMGR_TRUNCATE) { xl_smgr_truncate *xlrec = (xl_smgr_truncate *) rec; - char *path = relpathperm(xlrec-rnode, MAIN_FORKNUM); + const char *path = relpathperm(xlrec-rnode, MAIN_FORKNUM); appendStringInfo(buf, file truncate: %s to %u blocks, path, xlrec-blkno); - pfree(path); } else appendStringInfo(buf, UNKNOWN); diff --git a/src/backend/access/rmgrdesc/xactdesc.c b/src/backend/access/rmgrdesc/xactdesc.c index 2471279..b86a53e 100644 --- a/src/backend/access/rmgrdesc/xactdesc.c +++ b/src/backend/access/rmgrdesc/xactdesc.c @@ -35,10 +35,9 @@ xact_desc_commit(StringInfo buf, 
xl_xact_commit *xlrec) appendStringInfo(buf, ; rels:); for (i = 0; i xlrec-nrels; i++) { - char *path = relpathperm(xlrec-xnodes[i], MAIN_FORKNUM); + const char *path = relpathperm(xlrec-xnodes[i], MAIN_FORKNUM); appendStringInfo(buf, %s, path); - pfree(path); } } if (xlrec-nsubxacts 0) @@ -105,10 +104,9 @@ xact_desc_abort(StringInfo buf, xl_xact_abort *xlrec) appendStringInfo(buf, ; rels:); for (i = 0; i xlrec-nrels; i++) { - char *path = relpathperm(xlrec-xnodes[i], MAIN_FORKNUM); + const char *path = relpathperm(xlrec-xnodes[i], MAIN_FORKNUM); appendStringInfo(buf, %s, path); - pfree(path); } } if (xlrec-nsubxacts 0) diff --git a/src/backend/access/transam/xlogutils.c b/src/backend/access/transam/xlogutils.c index f9a6e62..8266f3c 100644 --- a/src/backend/access/transam/xlogutils.c +++ b/src/backend/access/transam/xlogutils.c @@ -57,7 +57,7 @@ static void report_invalid_page(int elevel, RelFileNode node, ForkNumber forkno, BlockNumber blkno, bool present) { - char *path = relpathperm(node, forkno); + const char *path = relpathperm(node, forkno); if (present) elog(elevel, page %u of relation %s is uninitialized, @@ -65,7 +65,6 @@ report_invalid_page(int elevel, RelFileNode node, ForkNumber forkno, else elog(elevel, page %u of relation %s does not exist, blkno, path); - pfree(path); } /* Log a reference to an invalid page */ @@ -153,11 +152,10 @@ forget_invalid_pages(RelFileNode node, ForkNumber forkno, BlockNumber minblkno) { if (log_min_messages = DEBUG2 || client_min_messages = DEBUG2) { - char *path = relpathperm(hentry-key.node, forkno); + const char *path = relpathperm(hentry-key.node, forkno); elog(DEBUG2, page %u of relation %s has been dropped, hentry-key.blkno, path); - pfree(path); } if (hash_search(invalid_page_tab, @@ -186,11 +184,10 @@
[HACKERS] [PATCH 5/5] remove spurious space in running_xact's _desc function
From: Andres Freund and...@anarazel.de
---
 src/backend/access/rmgrdesc/standbydesc.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/src/backend/access/rmgrdesc/standbydesc.c b/src/backend/access/rmgrdesc/standbydesc.c
index c38892b..5fb6f54 100644
--- a/src/backend/access/rmgrdesc/standbydesc.c
+++ b/src/backend/access/rmgrdesc/standbydesc.c
@@ -57,7 +57,7 @@ standby_desc(StringInfo buf, uint8 xl_info, char *rec)
 	{
 		xl_running_xacts *xlrec = (xl_running_xacts *) rec;
 
-		appendStringInfo(buf, " running xacts:");
+		appendStringInfo(buf, "running xacts:");
 		standby_desc_running_xacts(buf, xlrec);
 	}
 	else
-- 
1.7.12.289.g0ce9864.dirty

-- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
[HACKERS] [PATCH 4/5] Add pg_xlogdump contrib module
From: Andres Freund and...@anarazel.de Authors: Andres Freund, Heikki Linnakangas --- contrib/Makefile | 1 + contrib/pg_xlogdump/Makefile | 37 +++ contrib/pg_xlogdump/compat.c | 58 contrib/pg_xlogdump/pg_xlogdump.c | 654 ++ contrib/pg_xlogdump/tables.c | 78 + doc/src/sgml/ref/allfiles.sgml| 1 + doc/src/sgml/ref/pg_xlogdump.sgml | 76 + doc/src/sgml/reference.sgml | 1 + src/backend/access/transam/rmgr.c | 1 + src/backend/catalog/catalog.c | 2 + src/tools/msvc/Mkvcbuild.pm | 16 +- 11 files changed, 924 insertions(+), 1 deletion(-) create mode 100644 contrib/pg_xlogdump/Makefile create mode 100644 contrib/pg_xlogdump/compat.c create mode 100644 contrib/pg_xlogdump/pg_xlogdump.c create mode 100644 contrib/pg_xlogdump/tables.c create mode 100644 doc/src/sgml/ref/pg_xlogdump.sgml diff --git a/contrib/Makefile b/contrib/Makefile index fcd7c1e..5d290b8 100644 --- a/contrib/Makefile +++ b/contrib/Makefile @@ -39,6 +39,7 @@ SUBDIRS = \ pg_trgm \ pg_upgrade \ pg_upgrade_support \ + pg_xlogdump \ pgbench \ pgcrypto\ pgrowlocks \ diff --git a/contrib/pg_xlogdump/Makefile b/contrib/pg_xlogdump/Makefile new file mode 100644 index 000..1adef35 --- /dev/null +++ b/contrib/pg_xlogdump/Makefile @@ -0,0 +1,37 @@ +# contrib/pg_xlogdump/Makefile + +PGFILEDESC = pg_xlogdump +PGAPPICON=win32 + +PROGRAM = pg_xlogdump +OBJS = pg_xlogdump.o compat.o tables.o xlogreader.o $(RMGRDESCOBJS) \ + $(WIN32RES) + +# XXX: Perhaps this should be done by a wildcard rule so that you don't need +# to remember to add new rmgrdesc files to this list. 
+RMGRDESCSOURCES = clogdesc.c dbasedesc.c gindesc.c gistdesc.c hashdesc.c \ + heapdesc.c mxactdesc.c nbtdesc.c relmapdesc.c seqdesc.c smgrdesc.c \ + spgdesc.c standbydesc.c tblspcdesc.c xactdesc.c xlogdesc.c + +RMGRDESCOBJS = $(patsubst %.c,%.o,$(RMGRDESCSOURCES)) + +EXTRA_CLEAN = $(RMGRDESCSOURCES) xlogreader.c + +ifdef USE_PGXS +PG_CONFIG = pg_config +PGXS := $(shell $(PG_CONFIG) --pgxs) +include $(PGXS) +else +subdir = contrib/pg_xlogdump +top_builddir = ../.. +include $(top_builddir)/src/Makefile.global +include $(top_srcdir)/contrib/contrib-global.mk +endif + +override CPPFLAGS := -DFRONTEND $(CPPFLAGS) + +xlogreader.c: % : $(top_srcdir)/src/backend/access/transam/% + rm -f $@ $(LN_S) $ . + +$(RMGRDESCSOURCES): % : $(top_srcdir)/src/backend/access/rmgrdesc/% + rm -f $@ $(LN_S) $ . diff --git a/contrib/pg_xlogdump/compat.c b/contrib/pg_xlogdump/compat.c new file mode 100644 index 000..e150afb --- /dev/null +++ b/contrib/pg_xlogdump/compat.c @@ -0,0 +1,58 @@ +/*- + * + * compat.c + * Reimplementations of various backend functions. + * + * Portions Copyright (c) 2012, PostgreSQL Global Development Group + * + * IDENTIFICATION + * contrib/pg_xlogdump/compat.c + * + * This file contains client-side implementations for various backend + * functions that the rm_desc functions in *desc.c files rely on. + * + *- + */ + +/* ugly hack, same as in e.g pg_controldata */ +#define FRONTEND 1 +#include postgres.h + +#include catalog/catalog.h +#include datatype/timestamp.h +#include lib/stringinfo.h +#include storage/relfilenode.h +#include utils/timestamp.h +#include utils/datetime.h + +const char * +timestamptz_to_str(TimestampTz dt) +{ + return unimplemented-timestamp; +} + +const char * +relpathbackend(RelFileNode rnode, BackendId backend, ForkNumber forknum) +{ + return unimplemented-relpathbackend; +} + +/* + * Provide a hacked up compat layer for StringInfos so xlog desc functions can + * be linked/called. 
+ */ +void +appendStringInfo(StringInfo str, const char *fmt, ...) +{ + va_list args; + + va_start(args, fmt); + vprintf(fmt, args); + va_end(args); +} + +void +appendStringInfoString(StringInfo str, const char *string) +{ + appendStringInfo(str, %s, string); +} diff --git a/contrib/pg_xlogdump/pg_xlogdump.c b/contrib/pg_xlogdump/pg_xlogdump.c new file mode 100644 index 000..29ee73e --- /dev/null +++ b/contrib/pg_xlogdump/pg_xlogdump.c @@ -0,0 +1,654 @@ +/*- + * + * pg_xlogdump.c - decode and display WAL + * + * Copyright (c) 2012, PostgreSQL Global Development Group + * + * IDENTIFICATION + * contrib/pg_xlogdump/pg_xlogdump.c + *- + */ + +/* ugly hack, same as in e.g pg_controldata */ +#define FRONTEND 1 +#include postgres.h + +#include unistd.h + +#include access/xlog.h +#include access/xlogreader.h +#include
Re: [HACKERS] [PATCH] xlogreader-v4
On 8 January 2013 19:09, Andres Freund and...@2ndquadrant.com wrote: From: Andres Freund and...@2ndquadrant.com Subject: [PATCH] xlogreader-v4 In-Reply-To: Hi, this is the latest and obviously best version of xlogreader xlogdump with changes both from Heikki and me. Aren't you forgetting something? -- Thom
Re: [HACKERS] [PATCH] xlogreader-v4
On 8 January 2013 19:15, Thom Brown t...@linux.com wrote: On 8 January 2013 19:09, Andres Freund and...@2ndquadrant.com wrote: From: Andres Freund and...@2ndquadrant.com Subject: [PATCH] xlogreader-v4 In-Reply-To: Hi, this is the latest and obviously best version of xlogreader xlogdump with changes both from Heikki and me. Aren't you forgetting something? I see, you're posting them separately. Nevermind. -- Thom
Re: [HACKERS] [PATCH 1/5] Centralize Assert* macros into c.h so its common between backend/frontend
Andres Freund and...@2ndquadrant.com writes: From: Andres Freund and...@anarazel.de c.h already had parts of the assert support (StaticAssert*) and its the shared file between postgres.h and postgres_fe.h. This makes it easier to build frontend programs which have to do the hack. This patch seems unnecessary given that we already put a version of Assert() into postgres_fe.h. I don't think that moving the two different definitions into an #if block in one file is an improvement. If that were an improvement, we might as well move everything in both postgres.h and postgres_fe.h into c.h with a pile of #ifs. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] [PATCH] xlogreader-v4
On 2013-01-08 20:09:42 +0100, Andres Freund wrote: From: Andres Freund and...@2ndquadrant.com Subject: [PATCH] xlogreader-v4 In-Reply-To: Hi, this is the latest and obviously best version of xlogreader xlogdump with changes both from Heikki and me. Changes: * windows build support for pg_xlogdump That was done blindly, btw, so I only know it compiles, not that it runs... Its in git://git.postgresql.org/git/users/andresfreund/postgres.git branch xlogreader_v4 btw. Greetings, Andres Freund -- Andres Freund http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] [PATCH 2/5] Make relpathbackend return a statically result instead of palloc()'ing it
Andres Freund and...@2ndquadrant.com writes: From: Andres Freund and...@anarazel.de relpathbackend() (via some of its wrappers) is used in *_desc routines which we want to be usable without a backend environment around. I'm 100% unimpressed with making relpathbackend return a pointer to a static buffer. Who's to say whether that won't create bugs due to overlapping usages? Change signature to return a 'const char *' to make misuse easier to detect. That seems to create way more churn than is necessary, and it's wrong anyway if the result is palloc'd. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
[HACKERS] Re: [PATCH 1/5] Centralize Assert* macros into c.h so its common between backend/frontend
On 2013-01-08 14:25:06 -0500, Tom Lane wrote: Andres Freund and...@2ndquadrant.com writes: From: Andres Freund and...@anarazel.de c.h already had parts of the assert support (StaticAssert*) and its the shared file between postgres.h and postgres_fe.h. This makes it easier to build frontend programs which have to do the hack. This patch seems unnecessary given that we already put a version of Assert() into postgres_fe.h. I don't think that moving the two different definitions into an #if block in one file is an improvement. If that were an improvement, we might as well move everything in both postgres.h and postgres_fe.h into c.h with a pile of #ifs. The problem is that some (including existing) pieces of code need to include postgres.h itself, those can't easily include postgres_fe.h as well without getting into problems with redefinitions. It seems the most consistent to move all of that into c.h, enough of the assertion stuff is already there, I don't see an advantage of splitting it across 3 files as it currently is. Greetings, Andres Freund -- Andres Freund http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] [PATCH 1/5] Centralize Assert* macros into c.h so its common between backend/frontend
Andres Freund and...@2ndquadrant.com writes: On 2013-01-08 14:25:06 -0500, Tom Lane wrote: This patch seems unnecessary given that we already put a version of Assert() into postgres_fe.h. The problem is that some (including existing) pieces of code need to include postgres.h itself, those can't easily include postgres_fe.h as well without getting into problems with redefinitions. There is no place, anywhere, that should be including both. So I don't see the problem. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] [PATCH 2/5] Make relpathbackend return a statically result instead of palloc()'ing it
On 2013-01-08 14:28:14 -0500, Tom Lane wrote: Andres Freund and...@2ndquadrant.com writes: maxpg From: Andres Freund and...@anarazel.de relpathbackend() (via some of its wrappers) is used in *_desc routines which we want to be useable without a backend environment arround. I'm 100% unimpressed with making relpathbackend return a pointer to a static buffer. Who's to say whether that won't create bugs due to overlapping usages? I say it ;). I've gone through all callers and checked. Not that that guarantees anything, but ... The reason a static buffer is better is that some of the *desc routines use relpathbackend() and pfree() the result. That would require providing a stub pfree() in xlogdump which seems to be exceedingly ugly. Change signature to return a 'const char *' to make misuse easier to detect. That seems to create way more churn than is necessary, and it's wrong anyway if the result is palloc'd. It causes warnings in potential external users that pfree() the result of relpathbackend which seems helpful. Obviously only makes sense if relpathbackend() returns a pointer into a static buffer... Greetings, Andres Freund -- Andres Freund http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Cascading replication: should we detect/prevent cycles?
On 8 January 2013 18:46, Josh Berkus j...@agliodbs.com wrote: On 1/5/13 1:21 PM, Peter Geoghegan wrote: On 21 December 2012 14:08, Robert Haas robertmh...@gmail.com wrote: I'm sure it's possible; I don't *think* it's terribly easy. I'm inclined to agree that this isn't a terribly pressing issue. Certainly, the need to introduce a bunch of new infrastructure to detect this case seems hard to justify. Impossible to justify, I'd say. Does anyone have any objections to my adding this to the TODO list, in case some clever GSOC student comes up with a way to do it *without* adding a bunch of infrastructure? Daniel already did object. I think we have other problems that need solving much more than this. -- Simon Riggs http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services
Re: [HACKERS] [PATCH 2/5] Make relpathbackend return a statically result instead of palloc()'ing it
Andres Freund and...@2ndquadrant.com writes: On 2013-01-08 14:28:14 -0500, Tom Lane wrote: I'm 100% unimpressed with making relpathbackend return a pointer to a static buffer. Who's to say whether that won't create bugs due to overlapping usages? I say it ;). I've gone through all callers and checked. Not that that guarantees anything, but ... Even if you've proven it safe today, it seems unnecessarily fragile. Just about any other place we've used a static result buffer, we've regretted it, unless the use cases were *very* narrow. The reason a static buffer is better is that some of the *desc routines use relpathbackend() and pfree() the result. That would require providing a stub pfree() in xlogdump which seems to be exceedingly ugly. Why? If we have palloc support how hard can it be to map pfree to free? And why wouldn't we want to? I can hardly imagine providing only palloc and not pfree support. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Cascading replication: should we detect/prevent cycles?
On Tue, Jan 08, 2013 at 10:46:12AM -0800, Josh Berkus wrote: On 1/5/13 1:21 PM, Peter Geoghegan wrote: On 21 December 2012 14:08, Robert Haas robertmh...@gmail.com wrote: I'm sure it's possible; I don't *think* it's terribly easy. I'm inclined to agree that this isn't a terribly pressing issue. Certainly, the need to introduce a bunch of new infrastructure to detect this case seems hard to justify. Impossible to justify, I'd say. Does anyone have any objections to my adding this to the TODO list, in case some clever GSOC student comes up with a way to do it *without* adding a bunch of infrastructure? I'm pretty sure the logical change stuff Andres et al. are working on will be able to include the originating node, which makes cycle detection dead simple. Other restrictions on the graph like, must be a tree might be more complicated. Cheers, David. -- David Fetter da...@fetter.org http://fetter.org/ Phone: +1 415 235 3778 AIM: dfetter666 Yahoo!: dfetter Skype: davidfetter XMPP: david.fet...@gmail.com iCal: webcal://www.tripit.com/feed/ical/people/david74/tripit.ics Remember to vote! Consider donating to Postgres: http://www.postgresql.org/about/donate -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] [PATCH 2/5] Make relpathbackend return a statically result instead of palloc()'ing it
On 2013-01-08 14:53:29 -0500, Tom Lane wrote: Andres Freund and...@2ndquadrant.com writes: On 2013-01-08 14:28:14 -0500, Tom Lane wrote: I'm 100% unimpressed with making relpathbackend return a pointer to a static buffer. Who's to say whether that won't create bugs due to overlapping usages? I say it ;). I've gone through all callers and checked. Not that that guarantees anything, but ... Even if you've proven it safe today, it seems unnecessarily fragile. Just about any other place we've used a static result buffer, we've regretted it, unless the use cases were *very* narrow. Hm, relpathbackend seems pretty narrow to me. Funny, we both argued the other way round than we are now when talking about the sprintf(..., "%X/%X", (uint32) (recptr >> 32), (uint32) recptr) thingy ;) The reason a static buffer is better is that some of the *desc routines use relpathbackend() and pfree() the result. That would require providing a stub pfree() in xlogdump which seems to be exceedingly ugly. Why? If we have palloc support how hard can it be to map pfree to free? And why wouldn't we want to? I can hardly imagine providing only palloc and not pfree support. Uhm, we don't have palloc support and I don't think relpathbackend() is a good justification for adding it. Greetings, Andres Freund -- Andres Freund http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services
Re: [HACKERS] json api WIP patch
I had been wondering how to do such an insertion efficiently in the context of SPI, but it seems that there is no SPI_copy equiv that would allow a query parse and plan to be avoided. Your query above would need to be planned too, although the plan will be trivial. Ah yes, I meant that I had not found a way to avoid it (for multi-row inserts etc) from a stored proc context where I have SPI functions available. You should not try to use it as a general bulk load facility. And it will not be as fast as COPY for several reasons, including that the Json parsing routines are necessarily much heavier than the COPY parse routines, which have in any case been optimized over quite a long period. Also, a single json datum is limited to no more than 1GB. If you have such a datum, parsing it involves having it in memory and then taking a copy (I wonder if we could avoid that step - will take a look). Then each object is decomposed into a hash table of key value pairs, which is then used to construct the record datum. Each field name in the result record is used to look up the value in the hash table - this happens once in the case of populate_record() and once per object in the array in the case of populate_recordset(). In the latter case the resulting records are put into a tuplestore structure (which spills to disk if necessary) which is then returned to the caller when all the objects in the json array are processed. COPY doesn't have these sorts of issues. It knows without having to look things up where each datum is in each record, and it stashes the result straight into the target table. It can read and insert huge numbers of rows without significant memory implications. Yes - but I don't think I can use COPY from a stored proc context can I? If I could use binary COPY from a stored proc that has received a binary param and unpacked the data, it would be handy.
If SPI provided a way to perform a copy to a temp table and then some callback on an iterator that yields rows to it, that would do the trick I guess. Perhaps if you give us a higher level view of what you're trying to achieve we can help you better. I had been trying to identify a way to work with record sets where the records might be used for insert, or for updates or deletion statements, preferably without forming a large custom SQL statement that must then be parsed and planned (and which would be a PITA if I wanted to use the SQL-C preprocessor or some language bindings that like to prepare a statement and execute with params). The data I work with has a master-detail structure and insertion performance matters, so I'm trying to limit manipulations to one statement per table per logical operation even where there are multiple detail rows. Sometimes the network latency can be a pain too and that also suggests an RPC with unpack and insert locally. Cheers James -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] json api WIP patch
On 01/08/2013 09:58 AM, Andrew Dunstan wrote: If you have such a datum, parsing it involves having it in memory and then taking a copy (I wonder if we could avoid that step - will take a look). Here is a Proof Of Concept patch against my development tip on what's involved in getting the JSON lexer not to need a nul-terminated string to parse. This passes regression, incidentally. The downside is that processing is very slightly more complex, and that json_in() would need to call strlen() on its input. The upside would be that the processing routines I've been working on would no longer need to create copies of their json arguments using text_to_cstring() just so they can get a null-terminated string to process. Consequent changes would modify the signature of makeJsonLexContext() so its first argument would be a text* instead of a char* (and of course its logic would change accordingly). I could go either way. Thoughts? cheers andrew
[HACKERS] Re: [PATCH 1/5] Centralize Assert* macros into c.h so its common between backend/frontend
On 2013-01-08 14:35:12 -0500, Tom Lane wrote: Andres Freund and...@2ndquadrant.com writes: On 2013-01-08 14:25:06 -0500, Tom Lane wrote: This patch seems unnecessary given that we already put a version of Assert() into postgres_fe.h. The problem is that some (including existing) pieces of code need to include postgres.h itself, those can't easily include postgres_fe.h as well without getting into problems with redefinitions. There is no place, anywhere, that should be including both. So I don't see the problem. Sorry, misremembered the problem somewhat. The problem is that code that includes postgres.h atm ends up with ExceptionalCondition() et al. declared even if FRONTEND is defined. So if anything uses an assert you need to provide wrappers for those which seems nasty. If they are provided centrally and check for FRONTEND that problem doesn't exist. Greetings, Andres Freund -- Andres Freund http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] json api WIP patch
On 01/08/2013 03:12 PM, Andrew Dunstan wrote: On 01/08/2013 09:58 AM, Andrew Dunstan wrote: If you have such a datum, parsing it involves having it in memory and then taking a copy (I wonder if we could avoid that step - will take a look). Here is a Proof Of Concept patch against my development tip on what's involved in getting the JSON lexer not to need a nul-terminated string to parse. This passes regression, incidentally. The downside is that processing is very slightly more complex, and that json_in() would need to call strlen() on its input. The upside would be that the processing routines I've been working on would no longer need to create copies of their json arguments using text_to_cstring() just so they can get a null-terminated string to process. Consequent changes would modify the signature of makeJsonLexContext() so its first argument would be a text* instead of a char* (and of course its logic would change accordingly). I could go either way. Thoughts? this time with patch ...

*** a/src/backend/utils/adt/json.c
--- b/src/backend/utils/adt/json.c
*** 212,217 **** makeJsonLexContext(char *json, bool need_escapes)
--- 212,218 ----
  	lex->input = lex->token_terminator = lex->line_start = json;
  	lex->line_number = 1;
+ 	lex->input_length = strlen(json);
  	if (need_escapes)
  		lex->strval = makeStringInfo();
  	return lex;
*** 398,416 ****
  static void
  json_lex(JsonLexContext *lex)
  {
  	char	   *s;

  	/* Skip leading whitespace. */
  	s = lex->token_terminator;
! 	while (*s == ' ' || *s == '\t' || *s == '\n' || *s == '\r')
  	{
  		if (*s == '\n')
  			++lex->line_number;
  		++s;
  	}
  	lex->token_start = s;

  	/* Determine token type. */
! 	if (*s == '\0')
  	{
  		lex->token_start = NULL;
  		lex->prev_token_terminator = lex->token_terminator;
--- 399,420 ----
  json_lex(JsonLexContext *lex)
  {
  	char	   *s;
! 	int			len;

  	/* Skip leading whitespace. */
  	s = lex->token_terminator;
! 	len = s - lex->input;
! 	while (len < lex->input_length &&
! 		   (*s == ' ' || *s == '\t' || *s == '\n' || *s == '\r'))
  	{
  		if (*s == '\n')
  			++lex->line_number;
  		++s;
+ 		++len;
  	}
  	lex->token_start = s;

  	/* Determine token type. */
! 	if (len >= lex->input_length)
  	{
  		lex->token_start = NULL;
  		lex->prev_token_terminator = lex->token_terminator;
*** 476,482 ****
  	 * whole word as an unexpected token, rather than just some
  	 * unintuitive prefix thereof.
  	 */
! 	for (p = s; JSON_ALPHANUMERIC_CHAR(*p); p++)
  		 /* skip */ ;
--- 480,486 ----
  	 * whole word as an unexpected token, rather than just some
  	 * unintuitive prefix thereof.
  	 */
! 	for (p = s; JSON_ALPHANUMERIC_CHAR(*p) && p - s < lex->input_length - len; p++)
  		 /* skip */ ;
*** 519,539 ****
  static void
  json_lex_string(JsonLexContext *lex)
  {
  	char	   *s;

  	if (lex->strval != NULL)
  		resetStringInfo(lex->strval);
! 	for (s = lex->token_start + 1; *s != '"'; s++)
  	{
! 		/* Per RFC4627, these characters MUST be escaped. */
! 		if ((unsigned char) *s < 32)
  		{
! 			/* A NUL byte marks the (premature) end of the string. */
! 			if (*s == '\0')
! 			{
! 				lex->token_terminator = s;
! 				report_invalid_token(lex);
! 			}
  			/* Since *s isn't printable, exclude it from the context string */
  			lex->token_terminator = s;
  			ereport(ERROR,
--- 523,545 ----
  json_lex_string(JsonLexContext *lex)
  {
  	char	   *s;
! 	int			len;

  	if (lex->strval != NULL)
  		resetStringInfo(lex->strval);
! 	len = lex->token_start - lex->input;
! 	len++;
! 	for (s = lex->token_start + 1; *s != '"'; s++, len++)
  	{
! 		/* Premature end of the string. */
! 		if (len >= lex->input_length)
! 		{
! 			lex->token_terminator = s;
! 			report_invalid_token(lex);
! 		}
! 		else if ((unsigned char) *s < 32)
! 		{
! 			/* Per RFC4627, these characters MUST be escaped. */
  			/* Since *s isn't printable, exclude it from the context string */
  			lex->token_terminator = s;
  			ereport(ERROR,
*** 547,553 ****
  		{
  			/* OK, we have an escape character. */
  			s++;
! 			if (*s == '\0')
  			{
  				lex->token_terminator = s;
  				report_invalid_token(lex);
--- 553,560 ----
  		{
  			/* OK, we have an escape character. */
  			s++;
! 			len++;
! 			if (len >= lex->input_length)
  			{
  				lex->token_terminator = s;
  				report_invalid_token(lex);
*** 560,566 ****
  				for (i = 1; i <= 4; i++)
  				{
  					s++;
! 					if (*s == '\0')
  					{
  						lex->token_terminator = s;
  						report_invalid_token(lex);
--- 567,574 ----
  				for (i = 1; i <= 4; i++)
  				{
  					s++;
! 					len++;
! 					if (len >= lex->input_length)
  					{
  						lex->token_terminator = s;
  						report_invalid_token(lex);
*** 690,696 **** json_lex_number(JsonLexContext *lex,
Re: [HACKERS] json api WIP patch
On 01/08/2013 03:07 PM, james wrote: Yes - but I don't think I can use COPY from a stored proc context can I? If I could use binary COPY from a stored proc that has received a binary param and unpacked the data, it would be handy. You can use COPY from a stored procedure, but only to and from files. If SPI provided a way to perform a copy to a temp table and then some callback on an iterator that yields rows to it, that would do the trick I guess. SPI is useful, but it's certainly possible to avoid its use. After all, that's what almost the whole backend does, including the COPY code. Of course, it's a lot harder to write that way, which is part of why SPI exists. Efficiency has its price. cheers andrew
Re: [HACKERS] [PATCH 2/5] Make relpathbackend return a statically result instead of palloc()'ing it
Andres Freund and...@2ndquadrant.com writes: Uhm, we don't have palloc support and I don't think relpathbackend() is a good justification for adding it. I've said from the very beginning of this effort that it would be impossible to share any meaningful amount of code between frontend and backend environments without adding some sort of emulation of palloc/pfree/elog. I think this patch is just making the code uglier and more fragile to put off the inevitable, and that we'd be better served to bite the bullet and do that. regards, tom lane
Re: [HACKERS] Cascading replication: should we detect/prevent cycles?
On 8 January 2013 19:53, David Fetter da...@fetter.org wrote: On Tue, Jan 08, 2013 at 10:46:12AM -0800, Josh Berkus wrote: On 1/5/13 1:21 PM, Peter Geoghegan wrote: On 21 December 2012 14:08, Robert Haas robertmh...@gmail.com wrote: I'm sure it's possible; I don't *think* it's terribly easy. I'm inclined to agree that this isn't a terribly pressing issue. Certainly, the need to introduce a bunch of new infrastructure to detect this case seems hard to justify. Impossible to justify, I'd say. Does anyone have any objections to my adding this to the TODO list, in case some clever GSOC student comes up with a way to do it *without* adding a bunch of infrastructure? I'm pretty sure the logical change stuff Andres et al. are working on will be able to include the originating node, which makes cycle detection dead simple. That's a different thing really, but I see what you mean. The problem here is how you tell whether an indirect connection is connected to the master. It's not just a hard problem, it's a transient problem, where any one person's view of the answer might be in the midst of changing as you measure it. So throwing an error message might make certain cluster configs inoperable. I'd prefer to be able to bring up a complex cluster in any order, rather than in waves of startups all needing synchronisation to avoid error. -- Simon Riggs http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services
Re: [HACKERS] Re: [PATCH 1/5] Centralize Assert* macros into c.h so its common between backend/frontend
Andres Freund wrote: On 2013-01-08 14:35:12 -0500, Tom Lane wrote: Andres Freund and...@2ndquadrant.com writes: On 2013-01-08 14:25:06 -0500, Tom Lane wrote: This patch seems unnecessary given that we already put a version of Assert() into postgres_fe.h. The problem is that some (including existing) pieces of code need to include postgres.h itself, those can't easily include postgres_fe.h as well without getting into problems with redefinitions. There is no place, anywhere, that should be including both. So I don't see the problem. Sorry, misremembered the problem somewhat. The problem is that code that includes postgres.h atm ends up with ExceptionalCondition() et al. declared even if FRONTEND is defined. So if anything uses an assert you need to provide wrappers for those which seems nasty. If they are provided centrally and check for FRONTEND that problem doesn't exist. I think the right fix here is to fix things so that postgres.h is not necessary. How hard is that? Maybe it just requires some more reshuffling of xlog headers. -- Álvaro Herrerahttp://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] [PATCH 2/5] Make relpathbackend return a statically result instead of palloc()'ing it
Tom Lane wrote: Andres Freund and...@2ndquadrant.com writes: Uhm, we don't have palloc support and I don't think relpathbackend() is a good justification for adding it. I've said from the very beginning of this effort that it would be impossible to share any meaningful amount of code between frontend and backend environments without adding some sort of emulation of palloc/pfree/elog. I think this patch is just making the code uglier and more fragile to put off the inevitable, and that we'd be better served to bite the bullet and do that. As far as this patch is concerned, I think it's sufficient to do

#define palloc(x) malloc(x)
#define pfree(x) free(x)

-- Álvaro Herrera http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services
Re: [HACKERS] [PATCH 2/5] Make relpathbackend return a statically result instead of palloc()'ing it
On 2013-01-08 15:27:23 -0500, Tom Lane wrote: Andres Freund and...@2ndquadrant.com writes: Uhm, we don't have palloc support and I don't think relpathbackend() is a good justification for adding it. I've said from the very beginning of this effort that it would be impossible to share any meaningful amount of code between frontend and backend environments without adding some sort of emulation of palloc/pfree/elog. I think this patch is just making the code uglier and more fragile to put off the inevitable, and that we'd be better served to bite the bullet and do that. If you think relpathbackend() alone warrants that, yes, I can provide a wrapper. Everything else is imo already handled in a sensible and not really ugly manner? Imo it's not worth the effort *for this alone*. I already had some elog(), ereport(), whatever emulation but Heikki preferred not to have it, so it's removed by now. To what extent do you want palloc et al. emulation? Provide actual pools or just redirect to malloc and provide the required symbols (at the very least CurrentMemoryContext)? Greetings, Andres Freund -- Andres Freund http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services
Re: [HACKERS] Re: [PATCH 1/5] Centralize Assert* macros into c.h so its common between backend/frontend
On 2013-01-08 17:36:19 -0300, Alvaro Herrera wrote: Andres Freund wrote: On 2013-01-08 14:35:12 -0500, Tom Lane wrote: Andres Freund and...@2ndquadrant.com writes: On 2013-01-08 14:25:06 -0500, Tom Lane wrote: This patch seems unnecessary given that we already put a version of Assert() into postgres_fe.h. The problem is that some (including existing) pieces of code need to include postgres.h itself, those can't easily include postgres_fe.h as well without getting into problems with redefinitions. There is no place, anywhere, that should be including both. So I don't see the problem. Sorry, misremembered the problem somewhat. The problem is that code that includes postgres.h atm ends up with ExceptionalCondition() et al. declared even if FRONTEND is defined. So if anything uses an assert you need to provide wrappers for those which seems nasty. If they are provided centrally and check for FRONTEND that problem doesn't exist. I think the right fix here is to fix things so that postgres.h is not necessary. How hard is that? Maybe it just requires some more reshuffling of xlog headers. I don't really think thats realistic. Think the rmgrdesc/* files and xlogreader.c itself... Greetings, Andres Freund -- Andres Freund http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] [PATCH 2/5] Make relpathbackend return a statically result instead of palloc()'ing it
Andres Freund and...@2ndquadrant.com writes: To what extent do you want palloc et al. emulation? Provide actual pools or just redirect to malloc and provide the required symbols (at the very least CurrentMemoryContext)? I don't see any need for memory pools, at least not for frontend applications of the currently envisioned levels of complexity. I concur with Alvaro's suggestion about just #define'ing them to malloc/free --- or maybe better, pg_malloc/free so that we can have a failure-checking wrapper. Not sure how we ought to handle elog, but maybe we can put off that bit of design until we have a more concrete use-case. regards, tom lane
Re: [HACKERS] [PATCH 2/5] Make relpathbackend return a statically result instead of palloc()'ing it
On 08.01.2013 22:39, Andres Freund wrote: On 2013-01-08 15:27:23 -0500, Tom Lane wrote: Andres Freund and...@2ndquadrant.com writes: Uhm, we don't have palloc support and I don't think relpathbackend() is a good justification for adding it. I've said from the very beginning of this effort that it would be impossible to share any meaningful amount of code between frontend and backend environments without adding some sort of emulation of palloc/pfree/elog. I think this patch is just making the code uglier and more fragile to put off the inevitable, and that we'd be better served to bite the bullet and do that. If you think relpathbackend() alone warrants that, yes, I can provide a wrapper. Everything else is imo already handled in a sensible and not really ugly manner? Imo it's not worth the effort *for this alone*. I already had some elog(), ereport(), whatever emulation but Heikki preferred not to have it, so it's removed by now. To what extent do you want palloc et al. emulation? Provide actual pools or just redirect to malloc and provide the required symbols (at the very least CurrentMemoryContext)? Note that the xlogreader facility doesn't need any more emulation. I'm quite satisfied with that part of the patch now. However, the rmgr desc routines do, and I'm not very happy with those. Not sure what to do about it. As you said, we could add enough infrastructure to make them compile in frontend context, or we could try to make them rely less on backend functions. One consideration is that once we have a separate pg_xlogdump utility, I expect that to be the most visible consumer of the rmgr desc functions. The backend will of course still use those functions in e.g. error messages, but those don't happen very often. Not sure how that should be taken into account in this patch, but I thought I'd mention it.
An rmgr desc routine probably shouldn't be calling elog() even in the backend, because you don't want to throw errors in the contexts where those routines are called. - Heikki
Re: [HACKERS] Re: [PATCH 1/5] Centralize Assert* macros into c.h so its common between backend/frontend
Alvaro Herrera alvhe...@2ndquadrant.com writes: Andres Freund wrote: Sorry, misremembered the problem somewhat. The problem is that code that includes postgres.h atm ends up with ExceptionalCondition() et al. declared even if FRONTEND is defined. So if anything uses an assert you need to provide wrappers for those which seems nasty. If they are provided centrally and check for FRONTEND that problem doesn't exist. I think the right fix here is to fix things so that postgres.h is not necessary. How hard is that? Maybe it just requires some more reshuffling of xlog headers. That would definitely be the ideal fix. In general, #include'ing postgres.h into code that's not backend code opens all kinds of hazards that are likely to bite us sooner or later; the incompatibility of the Assert definitions is just the tip of that iceberg. (In the past we've had issues around stdio.h providing different definitions in the two environments, for example.) But having said that, I'm also now remembering that we have files in src/port/ and probably elsewhere that try to work in both environments by just #include'ing c.h directly (relying on the Makefile to supply -DFRONTEND or not). If we wanted to support Assert use in such files, we would have to move at least the Assert() macro definitions into c.h as Andres is proposing. Now, I've always thought that #include'ing c.h directly was kind of an ugly hack, but I don't have a better design than that to offer ATM. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] PATCH: optimized DROP of multiple tables within a transaction
On 8.1.2013 03:47, Shigeru Hanada wrote: * naming of DROP_RELATIONS_BSEARCH_LIMIT (or off-by-one bug?) IIUC bsearch is used when # of relations to be dropped is *more* than the value of DROP_RELATIONS_BSEARCH_LIMIT (10). IMO this behavior is not what the macro name implies; I thought that bsearch is used for 10 relations. Besides, the word LIMIT is used as *upper limit* in other macros. How about MIN_DROP_REL[ATION]S_BSEARCH or DROP_REL[ATION]S_LINEAR_SEARCH_LIMIT? # using RELS instead of RELATIONS would be better to shorten the name I've changed the name to DROP_RELS_BSEARCH_THRESHOLD. I believe this naming is consistent with options like geqo_threshold - when the number of relations reaches the specified value, the bsearch is used. I've also increased the value from 10 to 20, in accordance with the previous discussion. New name sounds good to me, but the #define says that the value is 15. Should it be fixed to 20? Ah, yes, please increase it to 20. I've increased it from 10 to 15 first, but I think 20 is nearer the optimal value and I forgot to change it in the code. * +1 for Alvaro's suggestion about avoiding palloc traffic by starting with 8 elements or so. Done. Not yet. Initial size of srels array is still 1 element. I've checked the patch and I see 'maxrels = 8' there - or do you mean something else? Tomas
Re: [HACKERS] [PATCH 2/5] Make relpathbackend return a statically result instead of palloc()'ing it
On 2013-01-08 22:47:43 +0200, Heikki Linnakangas wrote: On 08.01.2013 22:39, Andres Freund wrote: On 2013-01-08 15:27:23 -0500, Tom Lane wrote: Andres Freund and...@2ndquadrant.com writes: Uhm, we don't have palloc support and I don't think relpathbackend() is a good justification for adding it. I've said from the very beginning of this effort that it would be impossible to share any meaningful amount of code between frontend and backend environments without adding some sort of emulation of palloc/pfree/elog. I think this patch is just making the code uglier and more fragile to put off the inevitable, and that we'd be better served to bite the bullet and do that. If you think relpathbackend() alone warrants that, yes, I can provide a wrapper. Everything else is imo already handled in a sensible and not really ugly manner? Imo it's not worth the effort *for this alone*. I already had some elog(), ereport(), whatever emulation but Heikki preferred not to have it, so it's removed by now. To what extent do you want palloc et al. emulation? Provide actual pools or just redirect to malloc and provide the required symbols (at the very least CurrentMemoryContext)? Note that the xlogreader facility doesn't need any more emulation. I'm quite satisfied with that part of the patch now. However, the rmgr desc routines do, and I'm not very happy with those. Not sure what to do about it. As you said, we could add enough infrastructure to make them compile in frontend context, or we could try to make them rely less on backend functions. Which emulation needs are you missing in the desc routines besides relpathbackend() and timestamptz_to_str()? pfree() is just needed because it's used to free the result of relpathbackend(). No own pallocs, no ereport ...
Greetings, Andres Freund -- Andres Freund http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] [PATCH 2/5] Make relpathbackend return a statically result instead of palloc()'ing it
On 08.01.2013 23:00, Andres Freund wrote: Note that the xlogreader facility doesn't need any more emulation. I'm quite satisfied with that part of the patch now. However, the rmgr desc routines do, and I'm not very happy with those. Not sure what to do about it. As you said, we could add enough infrastructure to make them compile in frontend context, or we could try to make them rely less on backend functions. Which emulation needs are you missing in the desc routines besides relpathbackend() and timestamptz_to_str()? pfree() is just needed because its used to free the result of relpathbackend(). No own pallocs, no ereport ... Nothing besides those, those are the stuff I was referring to. - Heikki -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] pg_upgrade regression test litters the source tree with log files
On Tue, Jan 8, 2013 at 01:08:44PM -0500, Tom Lane wrote: In a tree in which I previously ran make check in contrib/pg_upgrade: $ make -s distclean $ git status # On branch master # Untracked files: # (use git add file... to include in what will be committed) # # contrib/pg_upgrade/pg_upgrade_dump_1.log # contrib/pg_upgrade/pg_upgrade_dump_12912.log # contrib/pg_upgrade/pg_upgrade_dump_16384.log nothing added to commit but untracked files present (use git add to track) Not sure how long this has been happening. Those look like files left over from a failed upgrade, or you used --retain. Does that make sense? Because they are tracked by oid, it is possible a later successful upgrade would not remove all those files, but it should remove contrib/pg_upgrade/pg_upgrade_dump_1.log because it is 1. -- Bruce Momjian br...@momjian.us http://momjian.us EnterpriseDB http://enterprisedb.com + It's impossible for everything to be true. + -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] pg_upgrade regression test litters the source tree with log files
Bruce Momjian br...@momjian.us writes: On Tue, Jan 8, 2013 at 01:08:44PM -0500, Tom Lane wrote: In a tree in which I previously ran make check in contrib/pg_upgrade: $ make -s distclean $ git status # On branch master # Untracked files: # (use git add file... to include in what will be committed) # # contrib/pg_upgrade/pg_upgrade_dump_1.log # contrib/pg_upgrade/pg_upgrade_dump_12912.log # contrib/pg_upgrade/pg_upgrade_dump_16384.log nothing added to commit but untracked files present (use git add to track) Not sure how long this has been happening. Those look like files left over from a failed upgrade, or you used --retain. Does that make sense? It's possible that I had one or more failed regression test runs on that machine ... don't recall for sure. In any case the point here is that make clean ought to get rid of anything that might be left over from a test run, successful or otherwise. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] [PATCH 2/5] Make relpathbackend return a statically result instead of palloc()'ing it
On 2013-01-08 15:45:07 -0500, Tom Lane wrote: Andres Freund and...@2ndquadrant.com writes: To what extent do you want palloc et al. emulation? Provide actual pools or just redirect to malloc and provide the required symbols (at the very least CurrentMemoryContext)? I don't see any need for memory pools, at least not for frontend applications of the currently envisioned levels of complexity. I concur with Alvaro's suggestion about just #define'ing them to malloc/free --- or maybe better, pg_malloc/free so that we can have a failure-checking wrapper. Unless we want to introduce those into common headers, it's more complex than #define's; you actually need to provide at least palloc/pfree/CurrentMemoryContext symbols. Still seems like a shame to do that for one lonely pfree() (+ something for an eventual own implementation of relpathbackend()). Not sure how we ought to handle elog, but maybe we can put off that bit of design until we have a more concrete use-case. Agreed. Greetings, Andres Freund -- Andres Freund http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
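The just-#define-them approach under discussion might look something like the following in a frontend-only header. This is a sketch of the idea only; pg_malloc here is the failure-checking wrapper Tom mentions, and the exact shim names are assumptions.

```c
#include <stdio.h>
#include <stdlib.h>

/* failure-checking malloc wrapper for frontend programs */
static void *pg_malloc(size_t size)
{
    void *ptr = malloc(size);

    if (ptr == NULL)
    {
        fprintf(stderr, "out of memory\n");
        exit(1);
    }
    return ptr;
}

static void pg_free(void *ptr)
{
    free(ptr);
}

/* redirect the backend allocator names used by shared code */
#define palloc(sz) pg_malloc(sz)
#define pfree(ptr) pg_free(ptr)
```

Shared code such as the rmgr desc routines could then keep calling palloc()/pfree() unchanged in both environments, which is the whole appeal of the approach.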
Re: [HACKERS] pg_upgrade regression test litters the source tree with log files
On 1/8/13 4:04 PM, Bruce Momjian wrote: On Tue, Jan 8, 2013 at 01:08:44PM -0500, Tom Lane wrote: In a tree in which I previously ran make check in contrib/pg_upgrade: $ make -s distclean $ git status # On branch master # Untracked files: # (use git add file... to include in what will be committed) # # contrib/pg_upgrade/pg_upgrade_dump_1.log # contrib/pg_upgrade/pg_upgrade_dump_12912.log # contrib/pg_upgrade/pg_upgrade_dump_16384.log nothing added to commit but untracked files present (use git add to track) Not sure how long this has been happening. Those look like files left over from a failed upgrade, or you used --retain. Does that make sense? Because they are tracked by oid, it is possible a later successful upgrade would not remove all those files, but it should remove contrib/pg_upgrade/pg_upgrade_dump_1.log because it is 1. I think this came in with the pg_upgrade --jobs option. -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] pg_upgrade regression test litters the source tree with log files
On Tue, Jan 8, 2013 at 04:08:42PM -0500, Tom Lane wrote: Bruce Momjian br...@momjian.us writes: On Tue, Jan 8, 2013 at 01:08:44PM -0500, Tom Lane wrote: In a tree in which I previously ran make check in contrib/pg_upgrade: $ make -s distclean $ git status # On branch master # Untracked files: # (use git add file... to include in what will be committed) # # contrib/pg_upgrade/pg_upgrade_dump_1.log # contrib/pg_upgrade/pg_upgrade_dump_12912.log # contrib/pg_upgrade/pg_upgrade_dump_16384.log nothing added to commit but untracked files present (use git add to track) Not sure how long this has been happening. Those look like files left over from a failed upgrade, or you used --retain. Does that make sense? It's possible that I had one or more failed regression test runs on that machine ... don't recall for sure. In any case the point here is that make clean ought to get rid of anything that might be left over from a test run, successful or otherwise. That seems like something more for the regression script (test.sh) to delete. Those are output by _running_ the program, and I never expected people to be running it in the git tree. -- Bruce Momjian br...@momjian.us http://momjian.us EnterpriseDB http://enterprisedb.com + It's impossible for everything to be true. + -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] pg_upgrade regression test litters the source tree with log files
On Tue, Jan 8, 2013 at 04:11:41PM -0500, Peter Eisentraut wrote: On 1/8/13 4:04 PM, Bruce Momjian wrote: On Tue, Jan 8, 2013 at 01:08:44PM -0500, Tom Lane wrote: In a tree in which I previously ran make check in contrib/pg_upgrade: $ make -s distclean $ git status # On branch master # Untracked files: # (use git add file... to include in what will be committed) # # contrib/pg_upgrade/pg_upgrade_dump_1.log # contrib/pg_upgrade/pg_upgrade_dump_12912.log # contrib/pg_upgrade/pg_upgrade_dump_16384.log nothing added to commit but untracked files present (use git add to track) Not sure how long this has been happening. Those look like files left over from a failed upgrade, or you used --retain. Does that make sense? Because they are tracked by oid, it is possible a later successful upgrade would not remove all those files, but it should remove contrib/pg_upgrade/pg_upgrade_dump_1.log because it is 1. I think this came in with the pg_upgrade --jobs option. Yes, it was part of the split to allow creation of per-database SQL files, but pg_upgrade always created files in the current directory --- there are just more of them now. -- Bruce Momjian br...@momjian.us http://momjian.us EnterpriseDB http://enterprisedb.com + It's impossible for everything to be true. + -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
[HACKERS] Re: [PATCH 2/5] Make relpathbackend return a statically result instead of palloc()'ing it
On 2013-01-08 23:02:15 +0200, Heikki Linnakangas wrote: On 08.01.2013 23:00, Andres Freund wrote: Note that the xlogreader facility doesn't need any more emulation. I'm quite satisfied with that part of the patch now. However, the rmgr desc routines do, and I'm not very happy with those. Not sure what to do about it. As you said, we could add enough infrastructure to make them compile in frontend context, or we could try to make them rely less on backend functions. Which emulation needs are you missing in the desc routines besides relpathbackend() and timestamptz_to_str()? pfree() is just needed because it's used to free the result of relpathbackend(). No own pallocs, no ereport ... Nothing besides those, those are the stuff I was referring to. Yea :(. As I said, I think its reasonable to emulate the former but timestamptz_to_str() seems to be too complex for that. Extracting the whole datetime/timestamp handling into a backend-independent module seems like complex overkill to me, although it might be useful to reduce the duplication of all that code in ecpg (which has deviated quite a bit by now). No idea how to solve that other than returning placeholder data atm. Greetings, Andres Freund -- Andres Freund http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] json api WIP patch
On 1/7/13 5:15 PM, Andrew Dunstan wrote: You (Merlin) have kindly volunteered to work on documentation, so before we go too far with that any bikeshedding on names, or on the functionality being provided, should now take place. Hmm, I was going to say, this patch contains no documentation, so I have no idea what it is supposed to do. Recently discussed isn't a good substitute for describing what the patch is supposed to accomplish. -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] PL/Python result object str handler
On 1/8/13 4:32 AM, Magnus Hagander wrote: How does it work if there are many rows in there? Say the result contains 10,000 rows - will the string contain all of them? If so, might it be worthwhile to cap the number of rows shown and then follow with a ... or something? I don't think so. Any number you pick will be too low for someone. Since this would only be executed when explicitly asked for, it's up to the user to manage this. It's analogous to print(long_list) -- you wouldn't truncate that. -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] PL/Python result object str handler
On Tue, Jan 8, 2013 at 10:23 PM, Peter Eisentraut pete...@gmx.net wrote: On 1/8/13 4:32 AM, Magnus Hagander wrote: How does it work if there are many rows in there? Say the result contains 10,000 rows - will the string contain all of them? If so, might it be worthwhile to cap the number of rows shown and then follow with a ... or something? I don't think so. Any number you pick will be too low for someone. Since this would only be executed when explicitly asked for, it's up to the user to manage this. It's analogous to print(long_list) -- you wouldn't truncate that. Fair enough. I was thinking of a specific example when I wrote that, but I can't recall what it was, and clearly using print or the python console would be the most similar scenarios. And they both do it the way you suggest. So that's probably the right thing to do. -- Magnus Hagander Me: http://www.hagander.net/ Work: http://www.redpill-linpro.com/ -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] PL/Python result object str handler
On 1/8/13 11:55 AM, Daniele Varrazzo wrote: PLyResult status=5 nrows=2 rows=[{'foo': 1, 'bar': '11'}, {'foo': 2, 'bar': '22'}] This looks more like a repr-style format to me (if you implement repr but not str, the latter will default to the former). The repr style was the only guideline I found. There is no guideline for what str should look like when it's not repr. Do you have a better suggestion for the output format? (The reason this is str and not repr is that it doesn't contain other information such as the tuple descriptor, so str of two different results could easily be the same.) -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] PATCH: optimized DROP of multiple tables within a transaction
Tomas Vondra wrote: On 8.1.2013 03:47, Shigeru Hanada wrote: * +1 for Alvaro's suggestion about avoiding palloc traffic by starting with 8 elements or so. Done. Not yet. Initial size of srels array is still 1 element. I've checked the patch and I see there 'maxrels = 8' - or do you mean something else? Well, you need to ensure that the initial palloc is an array of that size. -- Álvaro Herrera http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] json api WIP patch
You can use COPY from a stored procedure, but only to and from files. I think that's in the chocolate fireguard realm though as far as efficiency for this sort of scenario goes, even if it's handled by retaining an mmap'd file as workspace. If SPI provided a way to perform a copy to a temp table and then some callback on an iterator that yields rows to it, that would do the trick I guess. SPI is useful, but it's certainly possible to avoid its use. After all, that's what almost the whole backend does, including the COPY code. Of course, it's a lot harder to write that way, which is part of why SPI exists. Efficiency has its price. So it is possible to use a lower level interface from a C stored proc? SPI is the (only) documented direct function extension API, isn't it? Is the issue with using the JSON data-to-record set that the parsing can be costly? Perhaps it can be achieved with B64 of compressed protobuf, or such. I don't mind if it seems a bit messy - the code can be generated from the table easily enough, especially if I can use C++. I guess an allocator that uses SPI_palloc would solve issues with memory management on error? -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] json api WIP patch
On Tue, Jan 8, 2013 at 3:19 PM, Peter Eisentraut pete...@gmx.net wrote: On 1/7/13 5:15 PM, Andrew Dunstan wrote: You (Merlin) have kindly volunteered to work on documentation, so before we go too far with that any bikeshedding on names, or on the functionality being provided, should now take place. Hmm, I was going to say, this patch contains no documentation, so I have no idea what it is supposed to do. Recently discussed isn't a good substitute for describing what the patch is supposed to accomplish. Why not? There are functional examples in the docs and the purpose of the various functions was hashed out pretty well a couple weeks back, deficiencies corrected, etc. reference: http://postgresql.1045698.n5.nabble.com/json-accessors-td5733929.html merlin -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] PATCH: optimized DROP of multiple tables within a transaction
On 8.1.2013 22:30, Alvaro Herrera wrote: Tomas Vondra wrote: On 8.1.2013 03:47, Shigeru Hanada wrote: * +1 for Alvaro's suggestion about avoiding palloc traffic by starting with 8 elements or so. Done. Not yet. Initial size of srels array is still 1 element. I've checked the patch and I see there 'maxrels = 8' - or do you mean something else? Well, you need to ensure that the initial palloc is an array of that size. Oh, right - I forgot to modify the palloc() call. Thanks for spotting this. Attached is a patch with increased threshold and fixed palloc call. Tomas

diff --git a/src/backend/catalog/storage.c b/src/backend/catalog/storage.c
index 8dc622a..b1790eb 100644
--- a/src/backend/catalog/storage.c
+++ b/src/backend/catalog/storage.c
@@ -312,6 +312,11 @@ smgrDoPendingDeletes(bool isCommit)
 	PendingRelDelete *pending;
 	PendingRelDelete *prev;
 	PendingRelDelete *next;
+
+	int			nrels = 0,
+				i = 0,
+				maxrels = 8;
+	SMgrRelation *srels = palloc(maxrels * sizeof(SMgrRelation));
 
 	prev = NULL;
 	for (pending = pendingDeletes; pending != NULL; pending = next)
@@ -335,14 +340,32 @@ smgrDoPendingDeletes(bool isCommit)
 				SMgrRelation srel;
 
 				srel = smgropen(pending->relnode, pending->backend);
-				smgrdounlink(srel, false);
-				smgrclose(srel);
+
+				/* extend the array if needed (double the size) */
+				if (maxrels <= nrels)
+				{
+					maxrels *= 2;
+					srels = repalloc(srels, sizeof(SMgrRelation) * maxrels);
+				}
+
+				srels[nrels++] = srel;
 			}
 			/* must explicitly free the list entry */
 			pfree(pending);
 			/* prev does not change */
 		}
 	}
+
+	if (nrels > 0)
+	{
+		smgrdounlinkall(srels, nrels, false);
+
+		for (i = 0; i < nrels; i++)
+			smgrclose(srels[i]);
+	}
+
+	pfree(srels);
 }
 
 /*
diff --git a/src/backend/storage/buffer/bufmgr.c b/src/backend/storage/buffer/bufmgr.c
index dddb6c0..2330fda 100644
--- a/src/backend/storage/buffer/bufmgr.c
+++ b/src/backend/storage/buffer/bufmgr.c
@@ -62,6 +62,7 @@
 #define BUF_WRITTEN				0x01
 #define BUF_REUSABLE			0x02
 
+#define DROP_RELS_BSEARCH_THRESHOLD		20
 
 /* GUC variables */
 bool		zero_damaged_pages = false;
@@ -108,6 +109,7 @@
 static volatile BufferDesc *BufferAlloc(SMgrRelation smgr,
 static void FlushBuffer(volatile BufferDesc *buf, SMgrRelation reln);
 static void AtProcExit_Buffers(int code, Datum arg);
+static int	rnode_comparator(const void * p1, const void * p2);
 
 /*
  * PrefetchBuffer -- initiate asynchronous read of a block of a relation
@@ -2094,35 +2096,87 @@ DropRelFileNodeBuffers(RelFileNodeBackend rnode, ForkNumber forkNum,
  *
  */
 void
-DropRelFileNodeAllBuffers(RelFileNodeBackend rnode)
+DropRelFileNodeAllBuffers(RelFileNodeBackend *rnodes, int nnodes)
 {
-	int			i;
+	int			i, j, n = 0;
+	RelFileNode *nodes;
+
+	nodes = palloc(sizeof(RelFileNode) * nnodes);	/* non-local relations */
 
 	/* If it's a local relation, it's localbuf.c's problem. */
-	if (RelFileNodeBackendIsTemp(rnode))
+	for (i = 0; i < nnodes; i++)
 	{
-		if (rnode.backend == MyBackendId)
-			DropRelFileNodeAllLocalBuffers(rnode.node);
+		if (RelFileNodeBackendIsTemp(rnodes[i]))
+		{
+			if (rnodes[i].backend == MyBackendId)
+				DropRelFileNodeAllLocalBuffers(rnodes[i].node);
+		}
+		else
+		{
+			nodes[n++] = rnodes[i].node;
+		}
+	}
+
+	/*
+	 * If there are no non-local relations, then we're done. Release the memory
+	 * and return.
+	 */
+	if (n == 0)
+	{
+		pfree(nodes);
 		return;
 	}
 
+	/* sort the list of rnodes (but only if we're going to use the bsearch) */
+	if (n > DROP_RELS_BSEARCH_THRESHOLD)
+		pg_qsort(nodes, n, sizeof(RelFileNode), rnode_comparator);
+
 	for (i = 0; i < NBuffers; i++)
 	{
+		RelFileNode *rnode = NULL;
 		volatile BufferDesc *bufHdr = &BufferDescriptors[i];
 
 		/*
 		 * As in DropRelFileNodeBuffers, an unlocked precheck should be safe
 		 * and saves some cycles.
 		 */
-		if (!RelFileNodeEquals(bufHdr->tag.rnode, rnode.node))
+
+		/*
+		 * For low number of relations to drop just use a simple walk through,
+		 * to save the bsearch overhead. The BSEARCH_LIMIT is rather a guess
+		 * than a exactly determined value, as it depends on many factors (CPU
+		 * and RAM speeds, amount of shared buffers etc.).
+		 */
+		if (n <= DROP_RELS_BSEARCH_THRESHOLD)
+		{
+			for (j = 0; j < n; j++)
+			{
+				if (RelFileNodeEquals(bufHdr->tag.rnode, nodes[j]))
+				{
+					rnode = &nodes[j];
+					break;
+				}
+			}
+		}
+		else
+		{
+			rnode = bsearch((const void *) &(bufHdr->tag.rnode),
+							nodes, n, sizeof(RelFileNode),
+							rnode_comparator);
+		}
+
+		/* buffer does not belong to any of the relations */
+		if (rnode == NULL)
			continue;
 
 		LockBufHdr(bufHdr);
-		if (RelFileNodeEquals(bufHdr->tag.rnode, rnode.node))
+		if (RelFileNodeEquals(bufHdr->tag.rnode, (*rnode)))
 			InvalidateBuffer(bufHdr);	/* releases spinlock */
 		else
 			UnlockBufHdr(bufHdr);
 	}
+
+	pfree(nodes);
 }
 
 /*
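The patch above declares rnode_comparator() but the hunk adding its body is not visible in the quoted diff. Compared field-by-field over RelFileNode it would look roughly like this; the typedef here is a local stand-in so the sketch is self-contained, and the comparison order is a guess rather than the submitted code.

```c
typedef unsigned int Oid;       /* local stand-in for the backend typedef */

typedef struct RelFileNode
{
    Oid spcNode;    /* tablespace oid */
    Oid dbNode;     /* database oid */
    Oid relNode;    /* relation oid */
} RelFileNode;

/* qsort()/bsearch() comparator over all three RelFileNode fields */
static int
rnode_comparator(const void *p1, const void *p2)
{
    const RelFileNode *n1 = (const RelFileNode *) p1;
    const RelFileNode *n2 = (const RelFileNode *) p2;

    if (n1->relNode < n2->relNode)
        return -1;
    if (n1->relNode > n2->relNode)
        return 1;
    if (n1->dbNode < n2->dbNode)
        return -1;
    if (n1->dbNode > n2->dbNode)
        return 1;
    if (n1->spcNode < n2->spcNode)
        return -1;
    if (n1->spcNode > n2->spcNode)
        return 1;
    return 0;
}
```

Any total order works for pg_qsort()/bsearch(), so comparing the most selective field (relNode) first is merely a plausible choice, not a requirement.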
Re: [HACKERS] Weird Assert failure in GetLockStatusData()
I wrote: This is a bit disturbing: http://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=bushpig&dt=2013-01-07%2019%3A15%3A02 ... The assertion failure seems to indicate that the number of LockMethodProcLockHash entries found by hash_seq_search didn't match the number that had been counted by hash_get_num_entries immediately before that. I don't see any bug in GetLockStatusData itself, so this suggests that there's something wrong with dynahash's entry counting, or that somebody somewhere is modifying the shared hash table without holding the appropriate lock. The latter seems a bit more likely, given that this must be a very low-probability bug or we'd have seen it before. An overlooked locking requirement in a seldom-taken code path would fit the symptoms. After digging around a bit, I can find only one place where it looks like somebody might be messing with the LockMethodProcLockHash table while not holding the appropriate lock-partition LWLock(s): 1. VirtualXactLock finds target xact holds its VXID lock fast-path. 2. VirtualXactLock calls SetupLockInTable to convert the fast-path lock to regular. 3. SetupLockInTable makes entries in LockMethodLockHash and LockMethodProcLockHash. I see no partition lock acquisition anywhere in the above code path. Is there one that I'm missing? Why isn't SetupLockInTable documented as expecting the caller to hold the partition lock, as is generally done for lock.c subroutines that require that? If this is a bug, it's rather disturbing that it took us this long to recognize it. That code path isn't all that seldom-taken, AFAIK. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] json api WIP patch
On 01/08/2013 04:32 PM, Merlin Moncure wrote: On Tue, Jan 8, 2013 at 3:19 PM, Peter Eisentraut pete...@gmx.net wrote: On 1/7/13 5:15 PM, Andrew Dunstan wrote: You (Merlin) have kindly volunteered to work on documentation, so before we go too far with that any bikeshedding on names, or on the functionality being provided, should now take place. Hmm, I was going to say, this patch contains no documentation, so I have no idea what it is supposed to do. Recently discussed isn't a good substitute for describing what the patch is supposed to accomplish. Why not? There are functional examples in the docs and the purpose of the various functions was hashed out pretty well a couple weeks back, deficiencies corrected, etc. reference: http://postgresql.1045698.n5.nabble.com/json-accessors-td5733929.html Well, at a high level the patch is meant to do two things: provide an API that can be used to build JSON processing functions easily, and provide some basic JSON processing functions built on the API. Those functions provide similar capabilities to the accessor functions that hstore has. Perhaps also this will help. Here is the list of functions and operators as currently implemented. I also have working operators for the get_path functions which will be in a future patch. All these are used in the included regression tests. 
          Name           | Result data type |                       Argument data types
-------------------------+------------------+---------------------------------------------------------------------------
 json_array_length       | integer          | json
 json_each               | SETOF record     | from_json json, OUT key text, OUT value json
 json_each_as_text       | SETOF record     | from_json json, OUT key text, OUT value text
 json_get                | json             | json, integer
 json_get                | json             | json, text
 json_get_as_text        | text             | json, integer
 json_get_as_text        | text             | json, text
 json_get_path           | json             | from_json json, VARIADIC path_elems text[]
 json_get_path_as_text   | text             | from_json json, VARIADIC path_elems text[]
 json_object_keys        | SETOF text       | json
 json_populate_record    | anyelement       | anyelement, json
 json_populate_recordset | SETOF anyelement | base anyelement, from_json json, use_json_as_text boolean DEFAULT false
 json_unnest             | SETOF json       | from_json json, OUT value json

 Name | Left arg type | Right arg type | Result type |          Description
------+---------------+----------------+-------------+--------------------------------
 ->   | json          | integer        | json        | get json array element
 ->   | json          | text           | json        | get json object field
 ->>  | json          | integer        | text        | get json array element as text
 ->>  | json          | text           | text        | get json object field as text

cheers andrew -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Cascading replication: should we detect/prevent cycles?
On Tue, Jan 8, 2013 at 11:51 AM, Simon Riggs si...@2ndquadrant.com wrote: On 8 January 2013 18:46, Josh Berkus j...@agliodbs.com wrote: On 1/5/13 1:21 PM, Peter Geoghegan wrote: On 21 December 2012 14:08, Robert Haas robertmh...@gmail.com wrote: I'm sure it's possible; I don't *think* it's terribly easy. I'm inclined to agree that this isn't a terribly pressing issue. Certainly, the need to introduce a bunch of new infrastructure to detect this case seems hard to justify. Impossible to justify, I'd say. Does anyone have any objections to my adding this to the TODO list, in case some clever GSOC student comes up with a way to do it *without* adding a bunch of infrastructure? Daniel already did object. To briefly reiterate my objection, I observed that one may want to enter a case of cyclicality on a temporary basis -- to assist with some intermediate states in remastering, and it'd be nice if Postgres didn't try to get in the way of that. I would like to have enough reporting to be able to write tools that detect cyclicity and other configuration errors, and I think that may exist already in recovery.conf/its successor in postgresql.conf. A notable problem here is that UDFs, by their mechanical nature, don't quite cover all the use cases, as they require the server to be running and available for hot standby to run. It seems like reading recovery.conf or its successor is probably the best option here. -- fdr -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Re: Proposal: Store timestamptz of database creation on pg_database
On 1/5/13 11:04 AM, Stephen Frost wrote: Creating a separate catalog (or two) every time we want to track XYZ for all objects is rather overkill... Thinking about this a bit more, and noting that pg_description/shdescription more-or-less already exist as a framework for tracking 'something' for 'all catalog entries'- why don't we just add these columns to those tables..? This would also address Peter's concern about making sure we do this 'wholesale' and in one release rather than spread across multiple releases- just make sure it covers the same set of things which 'comment' does. Yeah, actually, the other day I was thinking we should get rid of all the system catalogs and use a big EAV-like schema instead. We're not getting any relational-database value out of the current way, and it's just a lot of duplicate code. If we had a full EAV system, we could even do in-place upgrade. Obviously, this isn't going to happen any time soon or ever, but I think I agree with your concern above as a partial step. -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
[HACKERS] Index build temp files
Greetings, We were surprised recently to note that the temp files that are created during a CREATE INDEX don't go into either a temp tablespace set for the user or into the tablespace which the CREATE INDEX specifies. Instead, they go only into base/pgsql_tmp/. This doesn't allow for any flexibility in defining where to create these potentially quite large sets of files. Shouldn't these temp files be going into the temp tablespace for the user creating the index instead..? Or perhaps into the tablespace which the index is being created in? Thanks, Stephen
Re: [HACKERS] I s this a bug of spgist index in a heavy write condition?
=?gb2312?B?wO66o8H6?= hailong...@qunar.com writes: I am very excited to say that I may have created a test case! I've been running variants of this example for most of the afternoon, and have not seen a failure :-(. So I'm afraid there is some aspect of your situation that you've not provided enough information to reproduce. Most likely, that's the initial contents of your table, which you didn't provide. I tried seeding the table with the five values you did show and then running the insertion loops, but no luck, even after many millions of insertions with various timing changes. Please see if you can put together a self-contained test case including necessary test data. (Note there's no reason to think that any of the columns other than the spgist-indexed one are interesting, if that helps you sanitize the data to the point you can let it out.) The control flow in spgdoinsert.c is flat enough that the stack trace alone isn't much help in understanding the bug, I'm afraid. We can guess that two insertions are trying to lock the same two index pages in opposite orders, but without looking at the data that doesn't put us much closer to a fix. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Re: Proposal: Store timestamptz of database creation on pg_database
2013/1/8 Peter Eisentraut pete...@gmx.net: On 1/5/13 11:04 AM, Stephen Frost wrote: Creating a separate catalog (or two) every time we want to track XYZ for all objects is rather overkill... Thinking about this a bit more, and noting that pg_description/shdescription more-or-less already exist as a framework for tracking 'something' for 'all catalog entries'- why don't we just add these columns to those tables..? This would also address Peter's concern about making sure we do this 'wholesale' and in one release rather than spread across multiple releases- just make sure it covers the same set of things which 'comment' does. Yeah, actually, the other day I was thinking we should get rid of all the system catalogs and use a big EAV-like schema instead. We're not getting any relational-database value out of the current way, and it's just a lot of duplicate code. If we had a full EAV system, we could even do in-place upgrade. -1. Now we have thousands of tables; I am not sure EAV can get good performance. Pavel Obviously, this isn't going to happen any time soon or ever, but I think I agree with your concern above as a partial step. -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Improve compression speeds in pg_lzcompress.c
On Tue, Jan 8, 2013 at 9:51 AM, Claudio Freire klaussfre...@gmail.com wrote: On Tue, Jan 8, 2013 at 10:20 AM, Robert Haas robertmh...@gmail.com wrote: On Tue, Jan 8, 2013 at 4:04 AM, Takeshi Yamamuro yamamuro.take...@lab.ntt.co.jp wrote: Apart from my patch, what concerns me is that the current code might be too slow to keep up with I/O. For example, when compressing and writing large values, compressing data (20-40MiB/s) might be a drag on writing data to disk (50-80MiB/s). Moreover, IMHO modern (and very fast) I/O subsystems such as SSDs make this an even bigger issue. What about just turning compression off? I've been relying on compression for some big serialized blob fields for some time now. I bet I'm not alone; lots of people save serialized data to text fields. So rather than removing it, I'd just change the default to off (if that was the decision). However, it might be best to evaluate some of the modern fast compression schemes like snappy/lz4 (250MB/s per core sounds pretty good), and implement pluggable compression schemes instead. Snappy wasn't designed for nothing; it was built because it was necessary. Cassandra (just to name a system I'm familiar with) started without compression, and then it was deemed necessary to the point they invested considerable time into it. I've always found the fact that pg does compression of toast tables quite forward-thinking, and I'd say the feature has to remain there, extended and modernized, maybe off by default, but there. I'm not offering any opinion on whether we should have compression as a general matter. Maybe yes, maybe no, but my question was about the OP's use case. If he's willing to accept less efficient compression in order to get faster compression, perhaps he should just not use compression at all. Personally, my biggest gripe about the way we do compression is that it's easy to detoast the same object lots of times.
More generally, our in-memory representation of user data values is pretty much a mirror of our on-disk representation, even when that leads to excess conversions. Beyond what we do for TOAST, there's stuff like numeric where we not only detoast but then post-process the results into yet another internal form before performing any calculations - and then of course we have to convert back before returning from the calculation functions. And for things like XML, JSON, and hstore we have to repeatedly parse the string, every time someone wants to do anything with it. Of course, solving this is a very hard problem, and not solving it isn't a reason not to have more compression options - but more compression options will not solve the problems that I personally have in this area, by and large. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Re: [HACKERS] Index build temp files
On Tue, Jan 8, 2013 at 05:09:47PM -0500, Stephen Frost wrote: Greetings, We were surprised recently to note that the temp files that are created during a CREATE INDEX don't go into either a temp tablespace set for the user or into the tablespace which the CREATE INDEX specifies. Instead, they go only into base/pgsql_tmp/. This doesn't allow for any flexibility in defining where to create these potentially quite large sets of files. Shouldn't these temp files be going into the temp tablespace for the user creating the index instead..? Or perhaps into the tablespace which the index is being created in? Well, our docs for temp_tablespaces say: This variable specifies tablespaces in which to create temporary objects (temp tables and indexes on temp tables) when a CREATE command does not explicitly specify a tablespace. Temporary files for purposes such as sorting large data sets are also created in these tablespaces. Are you saying this is inaccurate? -- Bruce Momjian br...@momjian.us http://momjian.us EnterpriseDB http://enterprisedb.com + It's impossible for everything to be true. +
Re: [HACKERS] Re: Proposal: Store timestamptz of database creation on pg_database
* Pavel Stehule (pavel.steh...@gmail.com) wrote: 2013/1/8 Peter Eisentraut pete...@gmx.net: On 1/5/13 11:04 AM, Stephen Frost wrote: Yeah, actually, the other day I was thinking we should get rid of all the system catalogs and use a big EAV-like schema instead. We're not getting any relational-database value out of the current way, and it's just a lot of duplicate code. If we had a full EAV system, we could even do in-place upgrade. -1; now that we have thousands of tables, I am not sure EAV can deliver good performance. To be honest, my first reaction to this was an assumption that it was pure sarcasm.. Seriously tho, the argument for not putting these things into the various individual catalogs is that they'd create bloat and these items don't need to be performant. I would think that the kind of timestamps that we're talking about fall into the same data category as comments on tables. If there isn't a good reason for comments on objects to be off in a generic 'this is for any kind of object' table, then perhaps we should move them into the appropriate catalog tables? Thanks, Stephen
Re: [HACKERS] Index build temp files
* Bruce Momjian (br...@momjian.us) wrote: On Tue, Jan 8, 2013 at 05:09:47PM -0500, Stephen Frost wrote: Greetings, We were surprised recently to note that the temp files that are created during a CREATE INDEX don't go into either a temp tablespace set for the user or into the tablespace which the CREATE INDEX specifies. Instead, they go only into base/pgsql_tmp/. This doesn't allow for any flexibility in defining where to create these potentially quite large sets of files. Shouldn't these temp files be going into the temp tablespace for the user creating the index instead..? Or perhaps into the tablespace which the index is being created in? Well, our docs for temp_tablespaces say: This variable specifies tablespaces in which to create temporary objects (temp tables and indexes on temp tables) when a CREATE command does not explicitly specify a tablespace. Temporary files for purposes such as sorting large data sets are also created in these tablespaces. Are you saying this is inaccurate? Yes and no? Are the temporary files created during a CREATE INDEX considered Temporary files for purposes such as sorting large data sets? My thinking is 'yes', but others may disagree. Also, considering this a bug would imply that it's back-patchable and I'm not convinced it's worth the risk of breaking things which depend on the current behavior. Thanks, Stephen
Re: [HACKERS] proposal: Set effective_cache_size to greater of .conf value, shared_buffers
On Tue, Jan 8, 2013 at 9:53 AM, Claudio Freire klaussfre...@gmail.com wrote: On Tue, Jan 8, 2013 at 11:39 AM, Merlin Moncure mmonc...@gmail.com wrote: Reference: http://postgresql.1045698.n5.nabble.com/Simple-join-doesn-t-use-index-td5738689.html This is a pretty common gotcha: user sets shared_buffers but misses the esoteric but important effective_cache_size. ISTM effective_cache_size should always be >= shared_buffers -- this is a soft configuration error that could be reported as a warning and perhaps overridden on the fly. Not true. If there are many concurrent users running concurrent queries against parallel databases, such as some test systems I have that contain many databases for many test environments, such a setting wouldn't make sense. If a DBA sets it to lower than shared_buffers, that setting has to be honored. Rather, I'd propose the default setting should be -1 or something default and automagic that works most of the time (but not all). +1. I've found that a value of three-quarters of system memory works pretty well most of the time. Of course, there's not a single, portable way of getting that on every platform we support. If I remember my last investigation into this area, to use that particular rule we would probably need at least three paths - one for Windows, one for System V descendents, and one for BSD descendents. And there might still be obscure things that wouldn't be covered. Of course this also makes the admittedly unwarranted assumption that the database owns the box, which could be wrong too, but purposely lowballing effective_cache_size to discourage index-scan plans seems unlikely to be a winning strategy. A cruder heuristic that might be useful is 3 * shared_buffers. If people follow the guidance to set shared_buffers around 25% of RAM, then this will work out to around 75% again. Of course, many people, for valid reasons, use smaller values of shared_buffers than that, and a few use larger ones.
It might still be better than no auto-tuning, though I wouldn't swear to it. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Re: [HACKERS] [PATCH 2/5] Make relpathbackend return a statically result instead of palloc()'ing it
On Tue, Jan 8, 2013 at 3:00 PM, Andres Freund and...@2ndquadrant.com wrote: Uhm, we don't need palloc support and I don't think relpathbackend() is a good justification for adding it. FWIW, I'm with Tom on this one. Any meaningful code sharing is going to need that, so we might as well bite the bullet. And functions that return static buffers are evil incarnate. I've spent way too much of my life dealing with the supreme idiocy that is fmtId(). If someone ever finds a way to make that go away, I will buy them a beverage of their choice at the next conference we're both at. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Re: [HACKERS] Weird Assert failure in GetLockStatusData()
I wrote: After digging around a bit, I can find only one place where it looks like somebody might be messing with the LockMethodProcLockHash table while not holding the appropriate lock-partition LWLock(s): 1. VirtualXactLock finds target xact holds its VXID lock fast-path. 2. VirtualXactLock calls SetupLockInTable to convert the fast-path lock to regular. 3. SetupLockInTable makes entries in LockMethodLockHash and LockMethodProcLockHash. I see no partition lock acquisition anywhere in the above code path. I've been able to reproduce the assertion crash by forcing the appropriate timing with some carefully chosen debugger breakpoints. So this theory is evidently correct. If this is a bug, it's rather disturbing that it took us this long to recognize it. That code path isn't all that seldom-taken, AFAIK. On closer inspection, VirtualXactLock() is only called in CREATE INDEX CONCURRENTLY and DROP INDEX CONCURRENTLY (and yes, the failed test case on bushpig was exercising DROP INDEX CONCURRENTLY). So maybe the path isn't that frequently taken after all. regards, tom lane
Re: [HACKERS] [PATCH 2/5] Make relpathbackend return a statically result instead of palloc()'ing it
Robert Haas escribió: And functions that return static buffers are evil incarnate. I've spent way too much of my life dealing with the supreme idiocy that is fmtId(). +1 If someone ever finds a way to make that go away, I will buy them a beverage of their choice at the next conference we're both at. +1 -- Álvaro Herrera http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services
Re: [HACKERS] [PATCH 2/5] Make relpathbackend return a statically result instead of palloc()'ing it
On 2013-01-08 17:28:33 -0500, Robert Haas wrote: On Tue, Jan 8, 2013 at 3:00 PM, Andres Freund and...@2ndquadrant.com wrote: Uhm, we don't need palloc support and I don't think relpathbackend() is a good justification for adding it. FWIW, I'm with Tom on this one. Any meaningful code sharing is going to need that, so we might as well bite the bullet. Yes, if we set the scope bigger than xlogreader I agree. If it's xlogreader itself I don't. But as I happen to think we should share more code... Will prepare a patch. I wonder whether it would be acceptable to make palloc() an actual function instead of #define palloc(sz) MemoryContextAlloc(CurrentMemoryContext, (sz)) so we don't have to expose CurrentMemoryContext? Alternatively we can just move the whole of utils/mmgr/* to port, but that would imply an elog/ereport wrapper... And functions that return static buffers are evil incarnate. I've spent way too much of my life dealing with the supreme idiocy that is fmtId(). If someone ever finds a way to make that go away, I will buy them a beverage of their choice at the next conference we're both at. Imo it depends on the circumstances and number of possible callers, but anyway, it seems to be already decided that my suggestion isn't the way to go. Greetings, Andres Freund -- Andres Freund http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services
Re: [HACKERS] proposal: Set effective_cache_size to greater of .conf value, shared_buffers
On Tue, Jan 8, 2013 at 05:23:36PM -0500, Robert Haas wrote: Rather, I'd propose the default setting should be -1 or something default and automagic that works most of the time (but not all). +1. I've found that a value of three-quarters of system memory works pretty well most of the time. Of course, there's not a single, portable way of getting that on every platform we support. If I remember my last investigation into this area, to use that particular rule we would probably need at least three paths - one for Windows, one for System V descendents, and one for BSD descendents. And there might still be obscure things that wouldn't be covered. Of course this also makes the admittedly unwarranted assumption that the database owns the box, which could be wrong too, but purposely lowballing effective_cache_size to discourage index-scan plans seems unlikely to be a winning strategy. A cruder heuristic that might be useful is 3 * shared_buffers. If people follow the guidance to set shared_buffers around 25% of RAM, then this will work out to around 75% again. Of course, many people, for valid reasons, use smaller values of shared_buffers than that, and a few use larger ones. It might still be better than no auto-tuning, though I wouldn't swear to it. Agreed. This is similar to the fudge we have about random_page_cost: Random access to mechanical disk storage is normally much more expensive than four times sequential access. However, a lower default is used (4.0) because the majority of random accesses to disk, such as indexed reads, are assumed to be in cache. The default value can be thought of as modeling random access as 40 times slower than sequential, while expecting 90% of random reads to be cached. effective_cache_size is impossible to set accurately because you have no idea what other things might be in the cache, or what other concurrent queries might be filling the cache. Going with something at least partly reasonable makes a lot more sense.
While we don't know the size of RAM, we do know the size of shared_buffers, and keying on that for a default seems like a no-brainer, rather than using 128MB. -- Bruce Momjian br...@momjian.us http://momjian.us EnterpriseDB http://enterprisedb.com + It's impossible for everything to be true. +
Re: [HACKERS] [PATCH 2/5] Make relpathbackend return a statically result instead of palloc()'ing it
Robert Haas robertmh...@gmail.com writes: And functions that return static buffers are evil incarnate. I've spent way too much of my life dealing with the supreme idiocy that is fmtId(). If someone ever finds a way to make that go away, I will buy them a beverage of their choice at the next conference we're both at. Yeah, that was exactly the case that was top-of-mind when I was complaining about static return buffers upthread. It's not hard to make the ugliness go away: just let it strdup its return value. The problem is that in the vast majority of usages it wouldn't be convenient to free the result, so we'd have a good deal of memory leakage. What might be interesting is to instrument it to see how much (adding a counter to the function ought to be easy enough) and then find out whether it's an amount we still care about in 2013. Frankly, pg_dump is a memory hog already - a few more identifier-sized strings laying about might not matter anymore. (Wanders away wondering how many relpathbackend callers bother to free its result, and whether that matters either ...) regards, tom lane
Re: [HACKERS] Index build temp files
Stephen Frost sfr...@snowman.net writes: * Bruce Momjian (br...@momjian.us) wrote: Well, our docs for temp_tablespaces say: This variable specifies tablespaces in which to create temporary objects (temp tables and indexes on temp tables) when a CREATE command does not explicitly specify a tablespace. Temporary files for purposes such as sorting large data sets are also created in these tablespaces. Are you saying this is inaccurate? Yes and no? Are the temporary files created during a CREATE INDEX considered Temporary files for purposes such as sorting large data sets? My thinking is 'yes', but others may disagree. Also, considering this a bug would imply that it's back-patchable and I'm not convinced it's worth the risk of breaking things which depend on the current behavior. I don't think it's a bug. What you seem to be proposing is that CREATE INDEX ought to ignore temp_tablespaces and instead always put its temp files in the tablespace where the finished index will reside. This would not be a good idea IMO --- allowing the temp files to be spread to other tablespaces is better both from the standpoint of space usage and the ability to overlap I/O. (Admittedly, both of those concerns are often theoretical, but if they are then I don't see why you'd care which tablespace is used.) regards, tom lane
Re: [HACKERS] Index build temp files
On 9 January 2013 00:04, Tom Lane t...@sss.pgh.pa.us wrote: I don't think it's a bug. What you seem to be proposing is that CREATE INDEX ought to ignore temp_tablespaces and instead always put its temp files in the tablespace where the finished index will reside. I don't think that's what he said. -- Simon Riggs http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services
Re: [HACKERS] proposal: Set effective_cache_size to greater of .conf value, shared_buffers
Robert Haas robertmh...@gmail.com writes: On Tue, Jan 8, 2013 at 9:53 AM, Claudio Freire klaussfre...@gmail.com wrote: Rather, I'd propose the default setting should be -1 or something default and automagic that works most of the time (but not all). A cruder heuristic that might be useful is 3 * shared_buffers. Both parts of that work for me. It's certainly silly that the default value of effective_cache_size is now equivalent to the default value of shared_buffers. And I don't especially like the idea of trying to make it depend directly on the box's physical RAM, for the same practical reasons Robert mentioned. It might be better to use 4 * shared_buffers, as that corresponds to the multiple that's been the default since 8.2 or so (ie 128MB vs 32MB), and 3x just seems kinda oddball. regards, tom lane
Re: [HACKERS] Index build temp files
Simon Riggs si...@2ndquadrant.com writes: On 9 January 2013 00:04, Tom Lane t...@sss.pgh.pa.us wrote: I don't think it's a bug. What you seem to be proposing is that CREATE INDEX ought to ignore temp_tablespaces and instead always put its temp files in the tablespace where the finished index will reside. I don't think that's what he said. Then I misunderstood. Stephen, what was your thought exactly about what should happen? regards, tom lane
Re: [HACKERS] [PATCH 2/5] Make relpathbackend return a statically result instead of palloc()'ing it
On Tue, Jan 8, 2013 at 6:23 PM, Tom Lane t...@sss.pgh.pa.us wrote: Robert Haas robertmh...@gmail.com writes: And functions that return static buffers are evil incarnate. I've spent way too much of my life dealing with the supreme idiocy that is fmtId(). If someone ever finds a way to make that go away, I will buy them a beverage of their choice at the next conference we're both at. Yeah, that was exactly the case that was top-of-mind when I was complaining about static return buffers upthread. It's not hard to make the ugliness go away: just let it strdup its return value. The problem is that in the vast majority of usages it wouldn't be convenient to free the result, so we'd have a good deal of memory leakage. What might be interesting is to instrument it to see how much (adding a counter to the function ought to be easy enough) and then find out whether it's an amount we still care about in 2013. Frankly, pg_dump is a memory hog already - a few more identifier-sized strings laying about might not matter anymore. (Wanders away wondering how many relpathbackend callers bother to free its result, and whether that matters either ...) I was thinking more about a sprintf()-type function that only understands a handful of escapes, but adds the additional and novel escapes %I (quote as identifier) and %L (quote as literal). I think that would allow a great deal of code simplification, and it'd be more efficient, too. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Re: [HACKERS] proposal: Set effective_cache_size to greater of .conf value, shared_buffers
On Tue, Jan 8, 2013 at 7:17 PM, Tom Lane t...@sss.pgh.pa.us wrote: Robert Haas robertmh...@gmail.com writes: On Tue, Jan 8, 2013 at 9:53 AM, Claudio Freire klaussfre...@gmail.com wrote: Rather, I'd propose the default setting should be -1 or something default and automagic that works most of the time (but not all). A cruder heuristic that might be useful is 3 * shared_buffers. Both parts of that work for me. It's certainly silly that the default value of effective_cache_size is now equivalent to the default value of shared_buffers. And I don't especially like the idea of trying to make it depend directly on the box's physical RAM, for the same practical reasons Robert mentioned. For the record, I don't believe those problems would be particularly hard to solve. It might be better to use 4 * shared_buffers, as that corresponds to the multiple that's been the default since 8.2 or so (ie 128MB vs 32MB), and 3x just seems kinda oddball. I suspect that would be OK, too. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Re: [HACKERS] [PATCH 2/5] Make relpathbackend return a statically result instead of palloc()'ing it
Robert Haas robertmh...@gmail.com writes: I was thinking more about a sprintf()-type function that only understands a handful of escapes, but adds the additional and novel escapes %I (quote as identifier) and %L (quote as literal). I think that would allow a great deal of code simplification, and it'd be more efficient, too. Seems like a great idea. Are you offering to code it? Note that this wouldn't entirely fix the fmtId problem, as not all the uses of fmtId are directly in sprintf calls. Still, it might get rid of most of the places where it'd be painful to avoid a memory leak with a strdup'ing version of fmtId. regards, tom lane
Re: [HACKERS] [PATCH 2/5] Make relpathbackend return a statically result instead of palloc()'ing it
On Tue, Jan 8, 2013 at 7:57 PM, Tom Lane t...@sss.pgh.pa.us wrote: Robert Haas robertmh...@gmail.com writes: I was thinking more about a sprintf()-type function that only understands a handful of escapes, but adds the additional and novel escapes %I (quote as identifier) and %L (quote as literal). I think that would allow a great deal of code simplification, and it'd be more efficient, too. Seems like a great idea. Are you offering to code it? Not imminently. Note that this wouldn't entirely fix the fmtId problem, as not all the uses of fmtId are directly in sprintf calls. Still, it might get rid of most of the places where it'd be painful to avoid a memory leak with a strdup'ing version of fmtId. Yeah, I didn't think about that. Might be worth a look to see how comprehensively it would solve the problem. But I'll have to leave that for another day. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Re: [HACKERS] proposal: Set effective_cache_size to greater of .conf value, shared_buffers
Robert Haas robertmh...@gmail.com writes: On Tue, Jan 8, 2013 at 7:17 PM, Tom Lane t...@sss.pgh.pa.us wrote: ... And I don't especially like the idea of trying to make it depend directly on the box's physical RAM, for the same practical reasons Robert mentioned. For the record, I don't believe those problems would be particularly hard to solve. Well, the problem of finding out the box's physical RAM is doubtless solvable if we're willing to put enough sweat and tears into it, but I'm dubious that it's worth the trouble. The harder part is how to know if the box is supposed to be dedicated to the database. Bear in mind that the starting point of this debate was the idea that we're talking about an inexperienced DBA who doesn't know about any configuration knob we might provide for the purpose. I'd prefer to go with a default that's predictable and not totally foolish --- and some multiple of shared_buffers seems like it'd fit the bill. regards, tom lane
Re: [HACKERS] pg_dump transaction's read-only mode
On Mon, Dec 31, 2012 at 1:38 AM, Pavan Deolasee pavan.deola...@gmail.com wrote: On Sun, Dec 30, 2012 at 12:38 AM, Stephen Frost sfr...@snowman.net wrote: * Pavan Deolasee (pavan.deola...@gmail.com) wrote: On Fri, Sep 7, 2012 at 6:06 PM, Kevin Grittner That makes sense to me. The reason I didn't make that change when I added the serializable special case to pg_dump was that it seemed like a separate question; I didn't want to complicate an already big patch with unnecessary changes to non-serializable transactions. If we agree, should we change that now? This is on the next commitfest, so I figure it deserves some comment. For my part, I tend to agree that we should have it always use a read-only transaction. Perhaps we should update the pg_dump documentation to mention this as well though? Pavan, do you want to put together an actual patch? I'd posted the actual patch on this thread, but probably linked the wrong message-id in the commitfest page. Will check and correct. Regarding pg_dump's documentation, I don't have strong views on that. Whether pg_dump runs as a read-only transaction or not is entirely internal to its implementation, but then if we make this change, it might be worth telling users that they can trust that pg_dump will not make any changes to their database and hence is a safe operation to carry out. I have updated the commitfest submission to link to the correct patch email. I initially thought that this patch deserves accompanying documentation because pg_dump's serializable transaction may error out because of a conflict. But the following line in the docs [1] confirms otherwise: read-only transactions will never have serialization conflicts So no doc patch necessary :) [1] http://www.postgresql.org/docs/9.2/static/transaction-iso.html -- Gurjeet Singh http://gurjeet.singh.im/