Re: [HACKERS] Oid registry
On Tue, Sep 25, 2012 at 5:36 PM, Peter Eisentraut pete...@gmx.net wrote: On Mon, 2012-09-24 at 21:18 -0700, Daniel Farina wrote: The gap between pre-JSON-in-the-standard-library in Python, Ruby, et al and post-JSON-in-stdlib was much smaller. Except in Python they renamed the thing. By 'smaller' I mean that the gain in capability a library gets from being promoted to the Python stdlib is comparatively smaller than what data types and operators gain from being promoted to core in Postgres, and the main reason for that is that people can compose on top of Postgres-core functionality, but those extensions themselves are not very useful in further composition. The extensibility at the first level is great. There exists an entirely livable alternative-future/present where Python never promoted the (simple)json library into the stdlib. The same could not be said for Postgres extensions as they exist today. -- fdr -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Switching timeline over streaming replication
On 09/26/2012 01:02 AM, m...@rpzdesign.com wrote: John: Who has the money for oracle RAC or funding arrogant bastard Oracle CEO Ellison to purchase another island? Postgres needs CHEAP, easy to setup, self healing, master-master-master-master and it needs it yesterday. I was able to patch the 9.2.0 code base in 1 day and change my entire architecture strategy for replication into self healing async master-master-master and the tiniest bit of sharding code imaginable. Tell us about the compromises you had to make. It is an established fact that you can either have it replicate fast and loose or slow and correct. In the fast and loose case you have to be ready to do a lot of mopping-up in case of conflicts. That is why I suggest something to replace OIDs with ROIDs for replication ID. (CREATE TABLE with ROIDS) I implement ROIDs as a uniform design pattern for the table structures. Synchronous replication may be feasible between 2 local machines if absolutely no local hardware failure is acceptable, but cheap, scalable synchronous, Scalable / synchronous is probably doable, if we are ready to take the initial performance hit of lock propagation. TRANSACTIONAL, master-master-master-master is a real tough slog. I could implement global locks in the external replication layer if I choose, but there are much easier ways in routing requests thru the load balancer and request sharding than trying to manage global locks across the WAN. Good luck with your HA patch for Postgres. Thanks for all of the responses! You guys are 15 times more active than the MySQL developer group, likely because they do not have a single db engine that meets all the requirements like PG. marco On 9/25/2012 5:10 PM, John R Pierce wrote: On 09/25/12 11:01 AM, m...@rpzdesign.com wrote: At some point, every master-slave replicator gets to the point where they need to start thinking about master-master replication. master-master and transactional integrity are mutually exclusive, except perhaps in special cases like Oracle RAC, where the masters share a coherent cache and implement global locks. -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] EVENT Keyword and CREATE TABLE
Tom Lane t...@sss.pgh.pa.us writes: It's a bug. The event-trigger patch added EVENT as a new keyword, but forgot to list it in the unreserved_keywords production, which is necessary to make it actually act unreserved. Oh. Oops. Sorry about that. I've committed a fix, and manually verified there are no other such errors at present, but this isn't the first time this has happened. We probably need to put in some automated cross-check, or it won't be the last time either. I see you've added the automated cross-check now, thanks for that! Regards, -- Dimitri Fontaine http://2ndQuadrant.fr PostgreSQL : Expertise, Formation et Support -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
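For context, this is the sort of statement that failed while EVENT behaved as a reserved word, and that works again once the keyword is properly listed as unreserved (the table and column names here are just an illustration):

    CREATE TABLE event (id integer, payload text);   -- "event" usable as an ordinary identifier
    SELECT event.id FROM event;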
Re: [HACKERS] autovacuum stress-testing our system
On 26 September 2012 15:47, Alvaro Herrera alvhe...@2ndquadrant.com wrote: Really, as far as autovacuum is concerned, it would be much more useful to be able to reliably detect that a table has been recently vacuumed, without having to request a 10ms-recent pgstat snapshot. That would greatly reduce the amount of time autovac spends on pgstat requests. VACUUMing generates a relcache invalidation. Can we arrange for those invalidations to be received by autovac launcher, so it gets immediate feedback of recent activity without polling? -- Simon Riggs http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Re: [WIP] Performance Improvement by reducing WAL for Update Operation
On 25.09.2012 18:27, Amit Kapila wrote: If you feel it is must to do the comparison, we can do it in same way as we identify for HOT?

Yeah. (But as discussed, I think it would be even better to just treat the old and new tuple as an opaque chunk of bytes, and run them through a generic delta algorithm).

Can you please explain me why you think that after doing encoding doing LZ compression on it is better, as already we have reduced the amount of WAL for update by only storing changed column information?
    a. is it to further reduce the size of WAL
    b. storing diff WAL in some standard format
    c. or does it give any other kind of benefit

Potentially all of those. I don't know if it'd be better or worse, but my gut feeling is that it would be simpler, and produce even more compact WAL.

Attached is a simple patch to apply LZ compression to update WAL records. I modified the LZ compressor so that it can optionally use a separate history data, and the same history data must then be passed to the decompression function. That makes it work as a pretty efficient delta encoder, when you use the old tuple as the history data.

I ran some performance tests with the modified version of pgbench that you posted earlier:

    Current PostgreSQL master:
        tps = 941.601924 (excluding connections establishing)
        pg_xlog_location_diff: 721227944
    pglz_wal_update_records.patch:
        tps = 1039.792527 (excluding connections establishing)
        pg_xlog_location_diff: 419395208
    pglz_wal_update_records.patch, COMPRESS_ONLY:
        tps = 1009.682002 (excluding connections establishing)
        pg_xlog_location_diff: 422505104
    Amit's wal_update_changes_hot_update.patch:
        tps = 1092.703883 (excluding connections establishing)
        pg_xlog_location_diff: 436031544

The COMPRESS_ONLY result is with the attached patch, but it just uses LZ to compress the new tuple, without taking advantage of the old tuple. The pg_xlog_location_diff value is the amount of WAL generated during the pgbench run. Attached is also the shell script I used to run these tests.

The conclusion is that there isn't very much difference among the patches. They all squeeze the WAL to about the same size, and the increase in TPS is roughly the same. I think more performance testing is required. The modified pgbench test isn't necessarily very representative of a real-life application. The gain (or loss) of this patch is going to depend a lot on how many columns are updated, and in what ways. Need to test more scenarios, with many different database schemas.

The LZ approach has the advantage that it can take advantage of all kinds of similarities between old and new tuple. For example, if you swap the values of two columns, LZ will encode that efficiently. Or if you insert a character in the middle of a long string. On the flipside, it's probably more expensive. Then again, you have to do a memcmp() to detect which columns have changed with your approach, and that's not free either. That was not yet included in the patch version I tested.

Another consideration is that when you compress the record more, you have less data to calculate CRC for. CRC calculation tends to be quite expensive, so even quite aggressive compression might be a win. Yet another consideration is that the compression/encoding is done while holding a lock on the buffer. For the sake of concurrency, you want to keep the duration the lock is held as short as possible.
- Heikki

diff --git a/src/backend/access/heap/heapam.c b/src/backend/access/heap/heapam.c
index 5a4591e..56b53a5 100644
--- a/src/backend/access/heap/heapam.c
+++ b/src/backend/access/heap/heapam.c
@@ -70,6 +70,7 @@
 #include "utils/snapmgr.h"
 #include "utils/syscache.h"
 #include "utils/tqual.h"
+#include "utils/pg_lzcompress.h"

 /* GUC variable */
@@ -85,6 +86,7 @@ static HeapTuple heap_prepare_insert(Relation relation, HeapTuple tup,
 					TransactionId xid, CommandId cid, int options);
 static XLogRecPtr log_heap_update(Relation reln, Buffer oldbuf,
 				ItemPointerData from, Buffer newbuf, HeapTuple newtup,
+				HeapTuple oldtup,
 				bool all_visible_cleared, bool new_all_visible_cleared);
 static bool HeapSatisfiesHOTUpdate(Relation relation, Bitmapset *hot_attrs,
 					   HeapTuple oldtup, HeapTuple newtup);
@@ -3195,10 +3197,12 @@ l2:
 	/* XLOG stuff */
 	if (RelationNeedsWAL(relation))
 	{
-		XLogRecPtr recptr = log_heap_update(relation, buffer, oldtup.t_self,
-											newbuf, heaptup,
-											all_visible_cleared,
-											all_visible_cleared_new);
+		XLogRecPtr	recptr;
+
+		recptr = log_heap_update(relation, buffer, oldtup.t_self,
+								 newbuf, heaptup, oldtup,
+								 all_visible_cleared,
+
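Heikki does not show the measurement query itself, but the pg_xlog_location_diff figures above can be collected with something along these lines on a 9.2-era server (the start location below is only a placeholder; use whatever pg_current_xlog_location() returned before the pgbench run):

    -- before the pgbench run: note the current WAL insert position
    SELECT pg_current_xlog_location();                          -- e.g. 0/17A3D58
    -- after the run: WAL bytes generated in between
    SELECT pg_xlog_location_diff(pg_current_xlog_location(), '0/17A3D58');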
Re: [HACKERS] Re: [WIP] Performance Improvement by reducing WAL for Update Operation
On Thursday, September 27, 2012 4:12 PM Heikki Linnakangas wrote: On 25.09.2012 18:27, Amit Kapila wrote: If you feel it is must to do the comparison, we can do it in same way as we identify for HOT? Yeah. (But as discussed, I think it would be even better to just treat the old and new tuple as an opaque chunk of bytes, and run them through a generic delta algorithm). Thank you for the modified patch. The conclusion is that there isn't very much difference among the patches. They all squeeze the WAL to about the same size, and the increase in TPS is roughly the same. I think more performance testing is required. The modified pgbench test isn't necessarily very representative of a real-life application. The gain (or loss) of this patch is going to depend a lot on how many columns are updated, and in what ways. Need to test more scenarios, with many different database schemas. The LZ approach has the advantage that it can take advantage of all kinds of similarities between old and new tuple. For example, if you swap the values of two columns, LZ will encode that efficiently. Or if you insert a character in the middle of a long string. On the flipside, it's probably more expensive. Then again, you have to do a memcmp() to detect which columns have changed with your approach, and that's not free either. That was not yet included in the patch version I tested. Another consideration is that when you compress the record more, you have less data to calculate CRC for. CRC calculation tends to be quite expensive, so even quite aggressive compression might be a win. Yet another consideration is that the compression/encoding is done while holding a lock on the buffer. For the sake of concurrency, you want to keep the duration the lock is held as short as possible. Now I shall do the various tests for following and post it here: a. Attached Patch in the mode where it takes advantage of history tuple b. By changing the logic for modified column calculation to use calculation for memcmp() With Regards, Amit Kapila. -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] data to json enhancements
On Wed, Sep 26, 2012 at 1:46 PM, Tom Lane t...@sss.pgh.pa.us wrote: Also, on reflection I'm not sure about commandeering cast-to-json for this --- aren't we really casting to json member or something like that? The distinction between a container and its contents seems important here. With a container type as source, it might be important to do something different if we're coercing it to a complete JSON value versus something that will be just one member. I'm handwaving here because I don't feel like going back to re-read the RFC, but it seems like something that should be considered carefully before we lock down an assumption that there can never be a difference.

I feel like there are two different behaviors that someone might want here, and a cast cannot mean both.

1. Please reinterpret the existing value that I have already got as a JSON object. For example, you might have a text field in which you have been storing JSON values. Once you upgrade to 9.2, you might want to reinterpret the existing contents of the field - which are already valid JSON - as JSON objects.

2. Please convert the value that I have into a JSON object according to a type-specific rule. For example, you might have a text field in which you store arbitrary strings. But perhaps you need to store structured data there, so once you upgrade to 9.2 you might want to wrap up your strings inside JSON strings.

Now there is some subtle ambiguity here because in some cases the behavior can be exactly the same in both cases. For example most numeric values will get the same treatment either way, but NaN cannot. If you do mynumeric::json, interpretation #1 will fail for NaN but interpretation #2 will probably produce something like "NaN". Similarly if the type is boolean, we could likely get away with producing true and false for either interpretation. If the type is hstore, then #1 is going to fail, but #2 is going to convert '1=>2' to {"1": "2"}.

So in general it might seem that #2 is the better interpretation, because it gives many casts a sensible interpretation that is otherwise lacking. But, what about text? It seems to me that users will count on the fact that '[1,2,3]'::text::json is going to produce [1,2,3] (a JSON array containing the first three numbers) and NOT "[1,2,3]" (a JSON string containing 7 characters). And that is emphatically interpretation #1.

I think it would be an extremely bad idea to decide that casts should have interpretation #2 for all data types except things that are kind of like text, which should instead behave like #1. And if we standardize completely on interpretation #2, then I think that '[1,2,3]'::json will end up meaning something different from '[1,2,3]'::text::json, because the former will (IIUC) go through the type-input function and end up creating a JSON array, whereas the latter will go through the cast function and end up creating a JSON string. It would also mean that in more complex queries, you could get substantially different behavior in simple cases where the parser maintains literals as unknown vs. more complex cases where it decides that they must be text for lack of a full type inference system.

Maybe I am being too pedantic about this and there is a way to make it all work nicely, but it sure feels like using the casting machinery here is blending together two different concepts that are only sometimes the same.

-- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company

[1] Perhaps something to consider for a future extension of the standard.
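To make Robert's two interpretations concrete: the first statement below works in 9.2 as released and shows interpretation #1; the second line is hypothetical, since nothing in 9.2 implements interpretation #2 for text (the function name there is invented purely for illustration):

    SELECT '[1,2,3]'::text::json;        -- [1,2,3]    (interpretation #1: parsed as a JSON array)
    -- SELECT wrap_as_json_string('[1,2,3]'::text);
    --                                    -- "[1,2,3]"  (interpretation #2: a 7-character JSON string)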
Re: [HACKERS] system_information.triggers truncate triggers
On Wed, Sep 26, 2012 at 10:59 AM, Christopher Browne cbbro...@gmail.com wrote: A different place where I wound up having to jump through considerable hoops when doing schema analytics was vis-a-vis identifying functions. I need to be able to compare schemas across databases, so oid-based identification of functions is a total non-starter. It appears that the best identification of a function would be based on the combination of schema name, function name, and the concatenation of argument data types. It wasn't terribly difficult to construct that third bit, but it surely would be nice if there was a view capturing it, and possibly even serializing it into a table to enable indexing on it. Performance-wise, function comparisons turned out to be one of the most expensive things I did, specifically because of that mapping surrounding arguments. pg_proc.oid::regprocedure::text has been pretty good to me for this sort of thing. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
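A rough sketch of the kind of query Robert is describing, producing an installation-independent identity (schema, name, and argument types) for user functions straight from the catalogs:

    SELECT n.nspname AS schema,
           p.oid::regprocedure::text AS signature
    FROM pg_proc p
    JOIN pg_namespace n ON n.oid = p.pronamespace
    WHERE n.nspname NOT IN ('pg_catalog', 'information_schema');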
Re: [HACKERS] [WIP] Performance Improvement by reducing WAL for Update Operation
On Thursday, September 27, 2012 9:12 AM Noah Misch wrote: On Mon, Sep 24, 2012 at 10:57:02AM +0000, Amit kapila wrote: Rebased version of patch based on latest code. I like the direction you're taking with this patch; the gains are striking, especially considering the isolation of the changes. Thank you for a detailed review of the patch. You cannot assume executor-unmodified columns are also unmodified from heap_update()'s perspective. Expansion in one column may instigate TOAST compression of a logically-unmodified column, and that counts as a change for xlog delta purposes. You do currently skip the optimization for relations having a TOAST table, but TOAST compression can still apply. Observe this with text columns of storage mode PLAIN. I see two ways out: skip the new behavior when need_toast=true, or compare all inline column data, not just what the executor modified. One can probably construct a benchmark favoring either choice. I'd lean toward the latter; wide tuples are the kind this change can most help. If the marginal advantage of ignoring known-unmodified columns proves important, we can always bring it back after designing a way to track which columns changed in the toaster. You are right that it can give benefit for both ways, but we should also see which approach can give better results for most of the scenarios. As in most cases of Update I have observed, the change in values will not increase the length of value to too much. OTOH I am not sure may be there are many more scenarios which change the length of updated value which can lead to scenario explained by you above. Given that, why not treat the tuple as an opaque series of bytes and not worry about datum boundaries? When several narrow columns change together, say a sequence of sixteen smallint columns, you will use fewer binary delta commands by representing the change with a single 32-byte substitution. If an UPDATE changes just part of a long datum, the delta encoding algorithm will still be able to save considerable space. That case arises in many forms: changing one word in a long string, changing one element in a long array, changing one field of a composite-typed column. Granted, this makes the choice of delta encoding algorithm more important. Like Heikki, I'm left wondering why your custom delta encoding is preferable to an encoding from the literature. Your encoding has much in common with VCDIFF, even sharing two exact command names. If a custom encoding is the right thing, code comments or a README section should at least discuss the advantages over an established alternative. Idle thought: it might pay off to use 1-byte sizes and offsets most of the time. Tuples shorter than 256 bytes are common; for longer tuples, we can afford wider offsets. My apprehension was that it can affect the performance if we do more work by holding the lock. If we use any standard technique like LZ or VCDiff, it has overhead of comparison and other things pertaining to their algorithm. However, using the updated patch by Heikki, I can run the various performance tests both for update operation as well as recovery. With Regards, Amit Kapila. -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Modest proposal: run check_keywords.pl on every build
On Wed, Sep 26, 2012 at 9:16 PM, Tom Lane t...@sss.pgh.pa.us wrote: Any objections? None, and my apologies for the oversight. I would really like to have a cross-check for duplicated OIDs, too. After a couple of years I think I have learned most of the rules for what stuff I need to cross-check before committing, but it is a nuisance to remember to do it each time and, despite my best efforts, I have fouled things up a few times. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] data to json enhancements
On Thu, Sep 27, 2012 at 8:22 AM, Robert Haas robertmh...@gmail.com wrote: On Wed, Sep 26, 2012 at 1:46 PM, Tom Lane t...@sss.pgh.pa.us wrote: Also, on reflection I'm not sure about commandeering cast-to-json for this --- aren't we really casting to json member or something like that? The distinction between a container and its contents seems important here. With a container type as source, it might be important to do something different if we're coercing it to a complete JSON value versus something that will be just one member. I'm handwaving here because I don't feel like going back to re-read the RFC, but it seems like something that should be considered carefully before we lock down an assumption that there can never be a difference. I feel like there are two different behaviors that someone might want here, and a cast cannot mean both. 1. Please reinterpret the existing value that I have already got as a JSON object. For example, you might have a text field in which you have been storing JSON values. Once you upgrade to 9.2, you might want to reinterpret the existing contents of the field - which are already valid JSON - as JSON objects. 2. Please convert the value that I have into a JSON object according to a type-specific rule. For example, you might have a text field in which you store arbitrary strings. But perhaps you need to store structured data there, so once you upgrade to 9.2 you might want to wrap up your strings inside JSON strings. Now there is some subtle ambiguity here because in some cases the behavior can be exactly the same in both cases. For example most numeric values will get the same treatment either way, but NaN cannot. If you do mynumeric::json, interpretation #1 will fail for NaN but interpretation #2 will probably produce something like NaN. Similarly if the type is boolean, we could likely get away with producing true and false for either interpretation. If the type is hstore, then #1 is going to fail, but #2 is going to convert 1=2 to {1:2}. So in general it might seem that #2 is the better interpretation, because it gives many casts a sensible interpretation that is otherwise lacking. But, what about text? It seems to me that users will count on the fact that '[1,2,3]'::text::json is going to produce [1,2,3] (a JSON array containing the first three numbers) and NOT [1,2,3] (a JSON string containing 7 characters). And that is emphatically interpretation #1. Hm. Well, that's a really good point although I kinda disagree with your assumption: I think it's much cleaner to have: select '[1,2,3]'::int[]::json produce a json array. All types but text (record[] etc) would seem to use the type structure to define how the json gets laid out. 'text::json' is an exception, because there is an implied parse, which I'm starting to unfortunately think is the wrong behavior if you want to be able to make json datums out of sql datums: how do you create a vanilla json text datum? merlin -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
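For reference, 9.2 already provides interpretation-#2-style conversions for arrays and records through dedicated functions rather than casts (these are existing 9.2 functions; the int[]-to-json cast written above does not actually exist in 9.2):

    SELECT array_to_json(ARRAY[1,2,3]);     -- [1,2,3]
    SELECT row_to_json(ROW(1, 'a'));        -- {"f1":1,"f2":"a"}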
Re: [v9.3] Extra Daemons (Re: [HACKERS] elegant and effective way for running jobs inside a database)
Excerpts from Kohei KaiGai's message of jue sep 27 01:06:41 -0300 2012: Hi Alvaro, Let me volunteer for reviewing, of course, but now pgsql_fdw is in my queue... Sure, thanks -- keep in mind I entered this patch in the next commitfest, so please do invest more effort in the ones in the commitfest now in progress. -- Álvaro Herrerahttp://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] data to json enhancements
On 09/27/2012 09:22 AM, Robert Haas wrote: On Wed, Sep 26, 2012 at 1:46 PM, Tom Lane t...@sss.pgh.pa.us wrote: Also, on reflection I'm not sure about commandeering cast-to-json for this --- aren't we really casting to json member or something like that? The distinction between a container and its contents seems important here. With a container type as source, it might be important to do something different if we're coercing it to a complete JSON value versus something that will be just one member. I'm handwaving here because I don't feel like going back to re-read the RFC, but it seems like something that should be considered carefully before we lock down an assumption that there can never be a difference. I feel like there are two different behaviors that someone might want here, and a cast cannot mean both. [snip] Maybe I am being too pedantic about this and there is a way to make it all work nicely, but it sure feels like using the casting machinery here is blending together two different concepts that are only sometimes the same. OK. I think that's a very good point. I guess I was kinda swept away by this being suggested by a couple of influential people. [pause for another bout of vigorous self-criticism] So how about this suggestion: we'll look for a visible function named as_json or some such which has one parameter of the given type and returns json, and if it's present use it instead of the standard text representation. As an optimization we'll skip that lookup for builtin types, since there won't be one. Of course, we'll have one for hstore. cheers andrew -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
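A sketch of what the lookup target could look like for hstore, purely to illustrate the proposed contract; the name as_json and the hstore_to_json helper are hypothetical here, not existing functions:

    -- hypothetical: one argument of the extension's type, returning json
    CREATE FUNCTION as_json(h hstore) RETURNS json
        LANGUAGE sql IMMUTABLE
        AS $$ SELECT hstore_to_json($1) $$;
    -- datum-to-json conversion code would then prefer as_json(hstore)
    -- over the type's plain text representation when such a function is visible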
Re: [HACKERS] data to json enhancements
Andrew Dunstan and...@dunslane.net writes: On 09/27/2012 09:22 AM, Robert Haas wrote: Maybe I am being too pedantic about this and there is a way to make it all work nicely, but it sure feels like using the casting machinery here is blending together two different concepts that are only sometimes the same. OK. I think that's a very good point. I guess I was kinda swept away by this being suggested by a couple of influential people. Well, that doesn't make it wrong, it just means there's more work needed. I'm not that thrilled with magic assumptions about function names either; schema search path issues, for example, will make that dangerous. We've gone to considerable lengths to avoid embedding assumptions about operator names, and assumptions about function names aren't any better. There are at least three ways we could use the cast machinery for this: (1) Reject Robert's assumption that we have to support both interpretations for every cast situation. For instance, it doesn't seem that unreasonable to me to insist that you have to cast to text and then to json if you want the literal-reinterpretation behavior. The main problem then is figuring out a convenient way to provide interpretation #2 for text itself. (2) Add another hidden argument to cast functions, or perhaps repurpose one of the ones that exist now. This is likely to come out rather ugly because of the need to shoehorn it into an API that's already suffered a couple of rounds of after-the-fact additions, but it's certainly possible in principle. The main thing I'd want is to not define it in a JSON-only fashion --- so the first thing is to be able to explain the distinction we're trying to make in a type-independent way. (3) Invent an auxiliary type along the lines of json_value and say that you create a cast from foo to json_value when you want one interpretation, or directly to json if you want the other. Then things like record_to_json would look for the appropriate type of cast. This is a bit ugly because the auxiliary type has no reason to live other than to separate the two kinds of cast, but it avoids creating any new JSON-specific mechanisms in the type system. There might be some other ideas I'm not thinking of. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
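To make option (3) a bit more concrete, a minimal sketch of what an extension might declare under that scheme; json_value would be a core-provided auxiliary type, and the conversion function name here is hypothetical:

    -- a cast to json_value requests "interpretation #2" behavior
    CREATE FUNCTION hstore_to_json_value(hstore) RETURNS json_value
        AS 'MODULE_PATHNAME' LANGUAGE C IMMUTABLE STRICT;
    CREATE CAST (hstore AS json_value)
        WITH FUNCTION hstore_to_json_value(hstore);
    -- record_to_json and friends would look for a cast to json_value before
    -- falling back to a cast to json (literal reinterpretation)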
Re: [HACKERS] data to json enhancements
On Thu, Sep 27, 2012 at 10:09 AM, Andrew Dunstan and...@dunslane.net wrote: So how about this suggestion: we'll look for a visible function named as_json or some such which has one parameter of the given type and returns json, and if it's present use it instead of the standard text representation. As an optimization we'll skip that lookup for builtin types, since there won't be one. Of course, we'll have one for hstore. I think that general approach has some promise, although there are some security issues to worry about. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] data to json enhancements
Hm... IMO, '[1,2,3]'::json '[1,2,3]'::text::json '[1,2,3]'::int[]::json are the same thing... (though I am not sure '[1,2,3]'::int[] is valid in postgres...) in js var o = JSON.parse(result_of_any_cast_above) should produce array of 3 integer '[1,2,3]' is different then'[1,2,3]' If there is the need to some text value as '[1,2,3]' be treated as JSON text value, then it would be: quote_literal('[1,2,3]')::json Kind Regards, Misa || 2012/9/27 Merlin Moncure mmonc...@gmail.com On Thu, Sep 27, 2012 at 8:22 AM, Robert Haas robertmh...@gmail.com wrote: On Wed, Sep 26, 2012 at 1:46 PM, Tom Lane t...@sss.pgh.pa.us wrote: Also, on reflection I'm not sure about commandeering cast-to-json for this --- aren't we really casting to json member or something like that? The distinction between a container and its contents seems important here. With a container type as source, it might be important to do something different if we're coercing it to a complete JSON value versus something that will be just one member. I'm handwaving here because I don't feel like going back to re-read the RFC, but it seems like something that should be considered carefully before we lock down an assumption that there can never be a difference. I feel like there are two different behaviors that someone might want here, and a cast cannot mean both. 1. Please reinterpret the existing value that I have already got as a JSON object. For example, you might have a text field in which you have been storing JSON values. Once you upgrade to 9.2, you might want to reinterpret the existing contents of the field - which are already valid JSON - as JSON objects. 2. Please convert the value that I have into a JSON object according to a type-specific rule. For example, you might have a text field in which you store arbitrary strings. But perhaps you need to store structured data there, so once you upgrade to 9.2 you might want to wrap up your strings inside JSON strings. Now there is some subtle ambiguity here because in some cases the behavior can be exactly the same in both cases. For example most numeric values will get the same treatment either way, but NaN cannot. If you do mynumeric::json, interpretation #1 will fail for NaN but interpretation #2 will probably produce something like NaN. Similarly if the type is boolean, we could likely get away with producing true and false for either interpretation. If the type is hstore, then #1 is going to fail, but #2 is going to convert 1=2 to {1:2}. So in general it might seem that #2 is the better interpretation, because it gives many casts a sensible interpretation that is otherwise lacking. But, what about text? It seems to me that users will count on the fact that '[1,2,3]'::text::json is going to produce [1,2,3] (a JSON array containing the first three numbers) and NOT [1,2,3] (a JSON string containing 7 characters). And that is emphatically interpretation #1. Hm. Well, that's a really good point although I kinda disagree with your assumption: I think it's much cleaner to have: select '[1,2,3]'::int[]::json produce a json array. All types but text (record[] etc) would seem to use the type structure to define how the json gets laid out. 'text::json' is an exception, because there is an implied parse, which I'm starting to unfortunately think is the wrong behavior if you want to be able to make json datums out of sql datums: how do you create a vanilla json text datum? 
merlin
[HACKERS] psql, remove include of psqlscan.c
Hi, This patch (psql_remove_include.patch) eliminates the #include of psqlscan.c at the bottom of mainloop.c. The attached patch uses the %top{} flex feature introduced in flex 2.5.30 released 2003-4-1. (See the NEWS file for flex.) The good news is that config/programs.m4 requires flex >= 2.5.31. The bad news is that RHEL 5 (released 2007-03-14 with a 13 year (10+3 years) support lifecycle) has a flex (2.5.4a) that is too old for this patch. :-( (At least this is what the changelog in flex from the RHEL 5 srpm repo tells me.) I don't know what this means. The flex docs on my box, fwiw, don't mention %top{} in the section describing Posix (in)compatibility. The patch is against git head. All the tests pass, fwiw. Because this depends on the toolchain it wouldn't hurt to build on other architectures/operating systems. I'm thinking of exposing enough of the psql parser, moving it to libpq, that any client-side app can do what libpq does; given a bunch of sql separated by semi-colons get the results of all the statements. This should also allow the statement separation to be done on the client side in libpq. Although I don't imagine that this will have a performance impact on the server side it sounds like a first step toward pushing more of the parsing onto the client.

Regards, Karl k...@meme.com Free Software: You don't pay back, you pay forward. -- Robert A. Heinlein

P.S. Sent this patch to the list at Sep 27 00:43:28 Central time and it never showed up so I'm sending again with a different subject. Info on the previous message: Subject: Remove #include psqlscan.c from psql message-id=1348724599.28442.0@mofo

diff --git a/src/bin/psql/Makefile b/src/bin/psql/Makefile
index 771fd71..e05b1cf 100644
--- a/src/bin/psql/Makefile
+++ b/src/bin/psql/Makefile
@@ -23,7 +23,7 @@ override CPPFLAGS := -I. -I$(srcdir) -I$(libpq_srcdir) -I$(top_srcdir)/src/bin/p
 OBJS=	command.o common.o help.o input.o stringutils.o mainloop.o copy.o \
 	startup.o prompt.o variables.o large_obj.o print.o describe.o \
 	tab-complete.o mbprint.o dumputils.o keywords.o kwlookup.o \
-	sql_help.o \
+	sql_help.o psqlscan.o \
 	$(WIN32RES)

 FLEXFLAGS = -Cfe -b -p -p
@@ -46,9 +46,6 @@ sql_help.c: sql_help.h ;
 sql_help.h: create_help.pl $(wildcard $(REFDOCDIR)/*.sgml)
 	$(PERL) $< $(REFDOCDIR) $*

-# psqlscan is compiled as part of mainloop
-mainloop.o: psqlscan.c
-
 psqlscan.c: psqlscan.l
 ifdef FLEX
 	$(FLEX) $(FLEXFLAGS) -o'$@' $<

diff --git a/src/bin/psql/mainloop.c b/src/bin/psql/mainloop.c
index 16c65ec..3f55813 100644
--- a/src/bin/psql/mainloop.c
+++ b/src/bin/psql/mainloop.c
@@ -413,13 +413,3 @@ MainLoop(FILE *source)

 	return successResult;
 }	/* MainLoop() */
-
-
-/*
- * psqlscan.c is #include'd here instead of being compiled on its own.
- * This is because we need postgres_fe.h to be read before any system
- * include files, else things tend to break on platforms that have
- * multiple infrastructures for stdio.h and so on. flex is absolutely
- * uncooperative about that, so we can't compile psqlscan.c on its own.
- */
-#include "psqlscan.c"

diff --git a/src/bin/psql/mainloop.h b/src/bin/psql/mainloop.h
index 134987a..1557187 100644
--- a/src/bin/psql/mainloop.h
+++ b/src/bin/psql/mainloop.h
@@ -9,6 +9,7 @@
 #define MAINLOOP_H

 #include "postgres_fe.h"
+#include "psqlscan.h"

 int	MainLoop(FILE *source);

diff --git a/src/bin/psql/psqlscan.l b/src/bin/psql/psqlscan.l
index 1208c8f..069c46f 100644
--- a/src/bin/psql/psqlscan.l
+++ b/src/bin/psql/psqlscan.l
@@ -37,8 +37,17 @@
  *
  *-
  */
+%}
+/*
+ * postgres_fe.h must be read before any system include files, else
+ * things tend to break on platforms that have multiple
+ * infrastructures for stdio.h and so on.
+ */
+%top{
 #include "postgres_fe.h"
+}
+%{

 #include "psqlscan.h"

 #include <ctype.h>

-- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] autovacuum stress-testing our system
Excerpts from Simon Riggs's message of jue sep 27 06:51:28 -0300 2012: On 26 September 2012 15:47, Alvaro Herrera alvhe...@2ndquadrant.com wrote: Really, as far as autovacuum is concerned, it would be much more useful to be able to reliably detect that a table has been recently vacuumed, without having to request a 10ms-recent pgstat snapshot. That would greatly reduce the amount of time autovac spends on pgstat requests. VACUUMing generates a relcache invalidation. Can we arrange for those invalidations to be received by autovac launcher, so it gets immediate feedback of recent activity without polling? Hmm, this is an interesting idea worth exploring, I think. Maybe we should sort tables in the autovac worker to-do list by age of last invalidation messages received, or something like that. Totally unclear on the details, but as I said, worth exploring. -- Álvaro Herrerahttp://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] psql, remove include of psqlscan.c
Karl O. Pinc k...@meme.com writes: This patch (psql_remove_include.patch) eliminates the #include of psqlscan.c at the bottom of mainloop.c. I don't really see that this is enough of an improvement to justify depending on a non-portable flex feature. I'm thinking of exposing enough of the psql parser, moving it to libpq, that any client-side app can do what libpq does; given a bunch of sql separated by semi-colons get the results of all the statements. This should also allow the statement separation to be done on the client side in libpq. Although I don't imagine that this will have a performance impact on the server side it sounds like a first step toward pushing more of the parsing onto the client. Quite honestly, I don't like that idea at all either. It's bad enough that psql has to know so much low-level SQL syntax. If we start encouraging other programs to know that, we're going to be locked into never again making any lexical-level changes. Why exactly is pushing parsing onto the client a good idea? regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] psql, remove include of psqlscan.c
On 09/27/2012 10:07:48 AM, Tom Lane wrote: Karl O. Pinc k...@meme.com writes: This patch (psql_remove_include.patch) eliminates the #include of psqlscan.c at the bottom of mainloop.c. I don't really see that this is enough of an improvement to justify depending on a non-portable flex feature. Ok. (I had no idea it was non-portable. The flex docs don't mention this.) I'm thinking of exposing enough of the psql parser, moving it to libpq, that any client-side app can do what libpq does; given a bunch of sql separated by semi-colons get the results of all the statements. This should also allow the statement separation to be done on the client side in libpq. Although I don't imagine that this will have a performance impact on the server side it sounds like a first step toward pushing more of the parsing onto the client. Quite honestly, I don't like that idea at all either. It's bad enough that psql has to know so much low-level SQL syntax. If we start encouraging other programs to know that, we're going to be locked into never again making any lexical-level changes. Why exactly is pushing parsing onto the client a good idea? There's really 2 ideas. I don't care about this one: pushing parsing/anything onto the client is a good idea because clients scale horizontally and so performance is improved. What I'm thinking of in libpq is the ability to give it a big string with many sql statements and have it hand back each statement so the client can then submit it to the server for execution. What I really _want_ is to be able to get a big string of many sql statements from the user and return the results, statuses, etc. of executing each statement. Just what psql does when, say, fed a file from stdin. The reason I want this is because I don't want to have to rewrite the sql parser in PHP for inclusion in phpPgAdmin. (I did this once, and it was such a big ugly patch it never got around to getting into the mainline phpPgAdmin.) phpPgAdmin (pgAdmin/front-end of your choice) should be able to process a string of sql statements in the same way that psql does, but currently the only way to do this is to rewrite the parser in each application. Much better to have libpq provide the functionality, although I suppose it's possible to push this into the server. Karl k...@meme.com Free Software: You don't pay back, you pay forward. -- Robert A. Heinlein -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] psql, remove include of psqlscan.c
Excerpts from Karl O. Pinc's message of jue sep 27 12:29:53 -0300 2012: The reason I want this is because I don't want to have to rewrite the sql parser in PHP for inclusion in phpPgAdmin. (I did this once, and it was such a big ugly patch it never got around to getting into the mainline phpPgAdmin.) phpPgAdmin (pgAdmin/ front-end of your choice) should be able process a string of sql statements in the same way that psql does, but currently the only way to do this is to rewrite the parser in each application. Much better to have libpq provide the functionality, although I suppose it's possible to push this into the server. This seems a worthy goal to me. But I think I see what Tom objection to it is: if we export this capability to libpq applications, then we set it in stone to a certain extent: exactly how things are split would become part of the API, so to speak. Upgrading to a newer libpq could break application code that worked with the previous release's by splitting things differently. I don't currently have an opinion on whether this is a bad thing or not. Who wants to argue for/against? -- Álvaro Herrerahttp://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] psql, remove include of psqlscan.c
On 09/27/12 8:41 AM, Alvaro Herrera wrote: But I think I see what Tom objection to it is: if we export this capability to libpq applications, then we set it in stone to a certain extent: exactly how things are split would become part of the API, so to speak. Upgrading to a newer libpq could break application code that worked with the previous release's by splitting things differently. I don't currently have an opinion on whether this is a bad thing or not. Who wants to argue for/against? I wonder if it shouldn't be in a separate 'helper' library, as it has no direct ties to any libpq internals. -- john r pierceN 37, W 122 santa cruz ca mid-left coast -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] XLogInsert scaling, revisited
On 24.09.2012 21:06, Fujii Masao wrote: The patch could be applied cleanly and the compile could be successfully done.

Thanks for the testing!

But when I ran initdb, I got the following assertion error:

    $ initdb -D data --locale=C --encoding=UTF-8
    The files belonging to this database system will be owned by user "postgres".
    This user must also own the server process.
    The database cluster will be initialized with locale C.
    The default text search configuration will be set to "english".
    creating directory data ... ok
    creating subdirectories ... ok
    selecting default max_connections ... 100
    selecting default shared_buffers ... 128MB
    creating configuration files ... ok
    creating template1 database in data/base/1 ... ok
    initializing pg_authid ... ok
    initializing dependencies ... TRAP: FailedAssertion(!(((uint64) currpos) % 8192 >= (((intptr_t) ((sizeof(XLogPageHeaderData))) + ((8) - 1)) & ~((intptr_t) ((8) - 1))) || rdata_len == 0), File: "xlog.c", Line: 1363)
    sh: line 1: 29537 Abort trap: 6  /dav/hoge/bin/postgres --single -F -O -c search_path=pg_catalog -c exit_on_error=true template1 > /dev/null
    child process exited with exit code 134
    initdb: removing data directory data

I got the above problem on MacOS.

Hmm, I cannot reproduce this on my Linux laptop. However, I think I see what the problem is: the assertion should assert that (*CurrPos* % XLOG_BLCKSZ >= SizeOfXLogShortPHD), not currpos. The former is an XLogRecPtr, the latter is a pointer. If the WAL buffers are aligned at 8k boundaries, the effect is the same, but otherwise the assertion is just wrong. And as it happens, if O_DIRECT is defined, we align WAL buffers at XLOG_BLCKSZ. I think that's why I don't see this on my laptop. Does Mac OS X not define O_DIRECT?

Anyway, attached is a patch with that fixed.

- Heikki

[Attachment: xloginsert-scale-21.patch.gz (GNU Zip compressed data)]

-- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] psql, remove include of psqlscan.c
Alvaro Herrera alvhe...@2ndquadrant.com writes: But I think I see what Tom objection to it is: if we export this capability to libpq applications, then we set it in stone to a certain extent: exactly how things are split would become part of the API, so to speak. Upgrading to a newer libpq could break application code that worked with the previous release's by splitting things differently. That's not exactly what the problem is: an upgrade could only break *existing* application code if we had made a non-backward-compatible change in SQL lexical rules, which hopefully we'd never do. Rather, the problem is that the server might know about some newer lexical feature, and so might the application, but if libpq is behind the times then it's broken. You can see an instance of this right now over in the pgsql-jdbc list, where they're arguing about fixing the JDBC driver to understand dollar quoting. Application authors not unreasonably think it should have heard of that by now. The JDBC example is sort of an argument in favor of Karl's idea, in that if client-side code is going to know this anyway then it would be better if it were at least localized in one place. But of course JDBC is never going to depend on libpq (wrong language) so moving this code into libpq isn't actually going to help them. A larger point is that I don't believe this is actually going to help anybody, because of mismatch of requirements not only implementation language. JDBC couldn't use a libpq lexer implementation even without the language issue, because the context in which they're arguing about this is finding and replacing JDBC-spec escape sequences, which libpq is not going to know about. I imagine PHP has got the same problem only different. Conversely, psql's lexer has a lot of psql-specific design decisions, such as the need to handle backslash commands and include files, that I don't think would belong in a libpq implementation. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Oid registry
On Wed, Sep 26, 2012 at 10:08 AM, Andrew Dunstan and...@dunslane.net wrote: How many would EDB need for it to be useful? Looks like we currently have about 160 hard-coded OIDs in our fork that are not in PostgreSQL, across all catalogs. Actually, there are probably some things we could do to reduce that number, which might be the smarter way to go. But taken at face value I suppose that means we would need a reservation of several hundred to allow for future growth. I am somewhat opposed to the idea of an OID registry. I can't see why the space of things people want to reserve OIDs for would be only tens rather than thousands. There are surely plenty of extensions that would like to depend on specific other extensions, or on core. If we legislate that hard-coded OIDs are the way to do that, I think we're going to end up with a lot of 'em. Maybe you have a better feel than I do for how many live addon types are out there. Maybe there are hundreds or thousands, but if there are I am blissfully unaware of them. Well, that's a fair point. There probably aren't. But then again, the proposed registry wouldn't only cover live projects. We'd probably have a decent number of people say: I can't do what I want unless I have an OID. And then the project becomes dormant or obsolescent but we still have the OID reservation and there's not really any politically feasible way of recovering the address space. I can't help thinking that it sounds a little bit like what IANA does, and the problems they face, except with 2^16 OIDs instead of 2^32 IP addresses. Admittedly there should be a lot less demand for type OIDs than for IP addresses, but the amount of address space we can allocate without pain is also much smaller. I did like the alternative idea upthread of UUIDs for types which would give them a virtually unlimited space. Yeah, me too. That doesn't require a centralized authority (hence, no debates here about whether a given extension is important enough to merit an allocation of a given size), doesn't move us further in the direction of exposing the database's internal identifiers as a concept that users have to care about, ad provides essentially infinite address space. There's more engineering work involved but sometimes more engineering work means a better result. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] [PATCH] Support for Array ELEMENT Foreign Keys
On Tue, Sep 18, 2012 at 05:52:51PM +0200, Marco Nenciarini wrote: please find attached the refreshed v1 patch. I perused this version in comparison to the last version I reviewed, finding minor problems. First, a warning: tablecmds.c: In function `ATExecAddConstraint': tablecmds.c:5898: warning: `fk_element_type' may be used uninitialized in this function tablecmds.c:5898: note: `fk_element_type' was declared here I don't see an actual bug; add a dead store to silence the compiler. *** a/src/test/regress/parallel_schedule --- b/src/test/regress/parallel_schedule *** test: event_trigger *** 94,100 # -- # Another group of parallel tests # -- ! test: select_views portals_p2 foreign_key cluster dependency guc bitmapops combocid tsearch tsdicts foreign_data window xmlmap functional_deps advisory_lock json # -- # Another group of parallel tests --- 94,100 # -- # Another group of parallel tests # -- ! test: select_views portals_p2 foreign_key cluster dependency guc bitmapops combocid tsearch tsdicts foreign_data window xmlmap functional_deps advisory_lock element_foreign_key Keep that json test. + errmsg(array ELEMENT foreign keys only support NO ACTION +and RESTRICT actions))); Project style is not to break message literals; instead, let the line run long. There are a few more examples of this in your patch. Those problems are isolated and do not impugn design, so committer time would be just as well-spent on the latest version. As such, I'm marking the patch Ready for Committer. Thanks to Rafal Pietrak for his helpful review. nm -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
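For readers who have not followed the patch, a hypothetical illustration of what an array ELEMENT foreign key is meant to enforce; the exact keywords are defined by the patch itself, and this rendering is only meant to convey the idea, not the precise grammar:

    CREATE TABLE groups (gid int PRIMARY KEY);
    CREATE TABLE members (
        name text,
        gids int[] ELEMENT REFERENCES groups   -- every element of gids must exist in groups.gid
    );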
Re: [HACKERS] Oid registry
I did like the alternative idea upthread of UUIDs for types which would give them a virtually unlimited space. Yeah, me too. That doesn't require a centralized authority (hence, no debates here about whether a given extension is important enough to merit an allocation of a given size), doesn't move us further in the direction of exposing the database's internal identifiers as a concept that users have to care about, and provides essentially infinite address space. There's more engineering work involved but sometimes more engineering work means a better result. Random thought from the sideline... Git is able to provide assurances as to content because it creates a hash. Now, for a function PostgreSQL could hash the catalog entry (including function body) and return that as proof that said function is the same as one installed in some other database or published publicly. I do not know enough about the other objects to know if something similar is possible but maybe this will spark someone else's thoughts. David J. -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
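Something close to this is already possible for functions with stock SQL, which may be the kind of thing David has in mind (a sketch; whether the reconstructed definition text is stable enough across versions to serve as a portable fingerprint is an open question):

    -- fingerprint each plain function in schema public by hashing its
    -- definition as reconstructed from the catalogs
    SELECT p.oid::regprocedure AS signature,
           md5(pg_get_functiondef(p.oid)) AS fingerprint
    FROM pg_proc p
    JOIN pg_namespace n ON n.oid = p.pronamespace
    WHERE n.nspname = 'public' AND NOT p.proisagg;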
Re: [HACKERS] psql, remove include of psqlscan.c
On 09/27/2012 11:02:42 AM, Tom Lane wrote: Alvaro Herrera alvhe...@2ndquadrant.com writes: A larger point is that I don't believe this is actually going to help anybody, because of mismatch of requirements not only implementation language. JDBC couldn't use a libpq lexer implementation even without the language issue, because the context in which they're arguing about this is finding and replacing JDBC-spec escape sequences, which libpq is not going to know about. I imagine PHP has got the same problem only different. Conversely, psql's lexer has a lot of psql-specific design decisions, such as the need to handle backslash commands and include files, that I don't think would belong in a libpq implementation. Well no, I'm not at all interested in escape sequences. I want to take sql directly from the user and execute it. Right now I can take only one statement at a time. And this is true of any human-facing application. I'm not looking for a general purpose solution, although it did occur to me that the psql variable substitution mechanism could be exposed. But what I really want is not an exposed parser. What I really want is to be able to get results from all the statements passed to PQexec (et-al), not just the last statement. This could be done without exposing the parser, but it does mean having a parser in libpq. Since psql links to libpq anyway my plan was to move the parser entirely out of psql into libpq and have an undocumented internal interface so that psql can do the escaping/variable substitution stuff. Regards, Karl k...@meme.com Free Software: You don't pay back, you pay forward. -- Robert A. Heinlein -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] psql, remove include of psqlscan.c
On 09/27/2012 11:02:42 AM, Tom Lane wrote: Rather, the problem is that the server might know about some newer lexical feature, and so might the application, but if libpq is behind the times then it's broken. If the application knows about the newer feature and wants to use it, is it unreasonable to ask the application be linked against a newer libpq? Karl k...@meme.com Free Software: You don't pay back, you pay forward. -- Robert A. Heinlein -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Oid registry
On Sep 27, 2012 9:52 AM, Robert Haas robertmh...@gmail.com wrote: On Wed, Sep 26, 2012 at 10:08 AM, Andrew Dunstan and...@dunslane.net wrote: How many would EDB need for it to be useful? Looks like we currently have about 160 hard-coded OIDs in our fork that are not in PostgreSQL, across all catalogs. Actually, there are probably some things we could do to reduce that number, which might be the smarter way to go. But taken at face value I suppose that means we would need a reservation of several hundred to allow for future growth. I'm not sure that's a way we really want to go down. How do we define which third party vendors would get to reserve oids? And how many? And under what other potential terms? Seems like we'd set ourselves up for endless discussions and bike shedding... I am somewhat opposed to the idea of an OID registry. I can't see why the space of things people want to reserve OIDs for would be only tens rather than thousands. There are surely plenty of extensions that would like to depend on specific other extensions, or on core. If we legislate that hard-coded OIDs are the way to do that, I think we're going to end up with a lot of 'em. Maybe you have a better feel than I do for how many live addon types are out there. Maybe there are hundreds or thousands, but if there are I am blissfully unaware of them. Well, that's a fair point. There probably aren't. But then again There are probably more than we know. Many many internal things at different organizations. the proposed registry wouldn't only cover live projects. We'd probably have a decent number of people say: I can't do what I want unless I have an OID. And then the project becomes dormant or obsolescent but we still have the OID reservation and there's not really any politically feasible way of recovering the address space. I can't help thinking that it sounds a little bit like what IANA does, and the problems they face, except with 2^16 OIDs instead of 2^32 IP addresses. Admittedly there should be a lot less demand for type OIDs than for IP addresses, but the amount of address space we can allocate without pain is also much smaller. Yeah, I think that would rapidly turn into a pain point. I did like the alternative idea upthread of UUIDs for types which would give them a virtually unlimited space. Yeah, me too. That doesn't require a centralized authority (hence, no debates here about whether a given extension is important enough to merit an allocation of a given size), doesn't move us further in the direction of exposing the database's internal identifiers as a concept that users have to care about, ad provides essentially infinite address space. There's more engineering work involved but sometimes more engineering work means a better result. Yeah, seems like it would work much better long term. /Magnus
Re: [HACKERS] patch: shared session variables
Hello 2012/9/24 Heikki Linnakangas hlinnakan...@vmware.com: Having read through this thread, the consensus seems to be that we don't want this patch as it is (and I agree with that). As I understand it, you are trying to solve two problems: 1. Passing parameters to a DO statement. You could quote the parameters and inline them in the block itself in the client, but quoting can be difficult to do correctly and is not as readable anyway, so I agree it would be good to be able to pass them separately. 2. Returning results from a DO statement. At the moment, a DO statement is a utility statement, so it cannot return a result set. Two different approaches to these problems have been discussed that have some support: A) WITH FUNCTION syntax, to allow defining a temporary function for a single query, similar to the current WITH syntax. (http://archives.postgresql.org/pgsql-hackers/2012-07/msg00426.php) B) DO ... USING parameter list syntax. This is a straightforward extension of the current DO syntax, just adding a parameter list to it. Not sure how to return a result set from this, perhaps also support a RETURNS keyword, similar to CREATE FUNCTION. I'm ok with either of those approaches. A) would be more flexible, while B) would be straightforward extension of what we already have. I'm marking this patch as rejected in the commitfest app. Please pursue the WITH FUNCTION or DO ... USING syntax instead. A basic discussion should be about a character of DO statements. What it should be? Temporary function or temporary stored procedure. Both variants are correct. I prefer idea, where DO is temporary procedure. A reply on this question solves a question about returning result from DO statement. We can support more calling context for DO statements like we do with general functions - and then both your @A and @B variants should be possible supported. A blocker for @B is unsupported parametrization for utility statements. If we can support a CALL statement, then we will have solution for parametrized DO statement too. From my perspective, missing parametrized DO is not blocker for anything. Is not super elegant, but workaround needs only a few lines more. \gsets solves last missing functionality. So I would to close this topic - we should to implement procedures and CALL statement first - this functionality is relative clean - a base is described by ANSI/SQL - and as next step we can reimplement anynymous (temporary) functions or procedures. Regards Pavel Thanks! On 31.08.2012 21:27, Pavel Stehule wrote: 2012/8/31 Dimitri Fontainedimi...@2ndquadrant.fr: Pavel Stehulepavel.steh...@gmail.com writes: Pavel, you didn't say what you think about the WITH FUNCTION proposal? I don't like it - this proposal is too lispish - it is not SQL We're not doing lambda here, only extending a facility that we rely on today. The function would be named, for one. I don't know what you mean by SQL being lispish here, and I can't imagine why it would be something to avoid. And you didn't say how do you want to turn a utility statement into something that is able to return a result set. if we support real procedures ala sybase procedures (MySQL, MSSQL..) - then we can return result with same mechanism - there are no significant difference between DO and CALL statements - you don't know what will be result type before you call it. Currently we don't have CALL, and we have DO which is not a query but a utility statement. Are you proposing to implement CALL? What would be the difference between making DO a query and having CALL? 
De facto, a CALL statement implementation can solve this issue. The core of this issue is the impossibility of using parameters for utility statements. CALL and DO are utility statements - and if we can use parameters for CALL, then we can do it for DO too. A CALL statement starts a predefined batch - inside this batch you can do anything - use COMMIT, ROLLBACK, SELECTs, ... DO is a batch with an immediate start. Sure, there are relatively significant differences between stored procedures as implemented in the popular RDBMSes, and although I don't like T-SQL too much, I like the Sybase concept of stored procedures - it is strong and very useful for maintenance tasks. Regards Pavel Regards, -- Dimitri Fontaine http://2ndQuadrant.fr PostgreSQL : Expertise, Formation et Support - Heikki -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
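For readers wondering about the "few lines more" workaround mentioned above, a hedged sketch of what is possible today without parameterized DO: wrap the block in a short-lived function in the temporary schema, call it, and drop it. The function name and parameter here are illustrative only.

CREATE FUNCTION pg_temp.run_once(p_limit int)
RETURNS SETOF text
LANGUAGE plpgsql AS
$$
BEGIN
    -- any procedural work can go here; parameters arrive normally
    RETURN QUERY SELECT relname::text FROM pg_class LIMIT p_limit;
END;
$$;

SELECT * FROM pg_temp.run_once(5);   -- functions in pg_temp must be schema-qualified
DROP FUNCTION pg_temp.run_once(int);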
Re: [HACKERS] proposal - assign result of query to psql variable
Hello 2012/9/21 Shigeru HANADA shigeru.han...@gmail.com: Hi Pavel, (2012/09/21 2:01), Pavel Stehule wrote: - Is it intentional that \gset can set special variables such as AUTOCOMMIT and HOST? I don't see any downside for this behavior, because \set also can do that, but it is not documented nor tested at all. I use a same SetVariable function, so a behave should be same It seems reasonable. Document - Adding some description of \gset command, especially about limitation of variable list, seems necessary. - In addition to the meta-command section, Advanced features section mentions how to set psql's variables, so we would need some mention there too. - The term target list might not be familiar to users, since it appears in only sections mentioning PG internal relatively. I think that the feature described in the section Retrieving Query Results in ECPG document is similar to this feature. http://www.postgresql.org/docs/devel/static/ecpg-variables.html I invite any proposals about enhancing documentation. Personally I am a PostgreSQL developer, so I don't known any different term other than target list - but any user friendly description is welcome. How about to say stores the query's result output into variable? Please see attached file for my proposal. I also mentioned about 1-row limit and omit of variable. should be Coding == The code follows our coding conventions. Here are comments for coding. - Some typo found in comments, please see attached patch. - There is a code path which doesn't print error message even if libpq reported error (PGRES_BAD_RESPONSE, PGRES_NONFATAL_ERROR, PGRES_FATAL_ERROR) in StoreQueryResult. Is this intentional? FYI, ecpg prints bad response message for those errors. yes - it is question. I use same pattern like PrintQueryResult, but bad response message should be used. I am sending updated patch It seems ok. BTW, as far as I see, no psql backslash command including \setenv (it was added in 9.2) has regression test in core (I mean src/test/regress). Is there any convention about this issue? If psql backslash commands (or any psql feature else) don't need regression test, we can remove psql.(sql|out). # Of course we need to test new feature by hand. It is question for Tom or David - only server side functionalities has regress tests. But result of some backslash command is verified in other regress tests. I would to see some regression tests for this functionality. Anyway, IMO the name psql impresses larger area than the patch implements. How about to rename psql to psql_cmd or backslash_cmd than psql as regression test name? I have no idea - psql_cmd is good name Regards Pavel -- Shigeru HANADA -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
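For context, a sketch of the intended usage of the proposed \gset command: run a one-row query and capture its columns into psql variables for later use. The exact spelling was still under review at this point (the patch used an explicit variable list), so treat the syntax below as illustrative rather than final.

SELECT count(*) AS cnt, current_database() AS db FROM pg_class \gset
\echo database :db has :cnt rows in pg_class
SELECT :cnt + 1 AS next_value;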
Re: [HACKERS] psql, remove include of psqlscan.c
Karl O. Pinc k...@meme.com writes: What I'm thinking of in libpq is the ability to give it a big string with many sql statements and have it hand back each statement so the client can then submit it to the server for execution. What I really _want_ is to be able to get a big string of many sql statements from the user and return the results, statuses, etc. of executing each statement. Just what psql does when, say, fed a file from stdin. Just as a note --- I believe you can get that result today with PQsendQuery followed by a PQgetResult loop. There's no need to provide an API that splits the string first. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
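A minimal sketch of the approach Tom describes, assuming an already-established PGconn named conn and leaving out most error handling: PQsendQuery() accepts the whole multi-statement string, and PQgetResult() then returns one PGresult per statement until it returns NULL.

#include <stdio.h>
#include <libpq-fe.h>

static void
run_script(PGconn *conn, const char *sql)
{
    PGresult *res;

    if (!PQsendQuery(conn, sql))
    {
        fprintf(stderr, "send failed: %s", PQerrorMessage(conn));
        return;
    }

    /* One PGresult per statement in the string; NULL marks the end. */
    while ((res = PQgetResult(conn)) != NULL)
    {
        if (PQresultStatus(res) == PGRES_TUPLES_OK)
            printf("%s: %d rows\n", PQcmdStatus(res), PQntuples(res));
        else
            printf("%s\n", PQcmdStatus(res));
        PQclear(res);
    }
}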
Re: [HACKERS] pg_signal_backend() asymmetry
Excerpts from Noah Misch's message of mié sep 26 17:54:43 -0300 2012: I'm marking this patch Ready for Committer. I suggest backpatching it to 9.2; the patch corrects an oversight in 9.2 changes. There's more compatibility value in backpatching than in retaining distinct behavior for 9.2 only. On Thu, Jun 28, 2012 at 09:32:41AM -0700, Josh Kupershmidt wrote: ! if (!superuser()) { /* ! * Since the user is not superuser, check for matching roles. */ ! if (proc->roleId != GetUserId()) ! return SIGNAL_BACKEND_NOPERMISSION; } I would have collapsed the conditionals and deleted the obvious comment: if (!(superuser() || proc->roleId == GetUserId())) return SIGNAL_BACKEND_NOPERMISSION; The committer can do that if desired; no need for another version. Thanks, I pushed to HEAD and 9.2 with the suggested tweaks. Documentation doesn't specify the behavior of the non-superuser case when there's trouble, so I too think the behavior change in 9.2 is okay. I am less sure about it needing documenting. -- Álvaro Herrera http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] psql, remove include of psqlscan.c
Karl O. Pinc k...@meme.com writes: On 09/27/2012 11:02:42 AM, Tom Lane wrote: Rather, the problem is that the server might know about some newer lexical feature, and so might the application, but if libpq is behind the times then it's broken. If the application knows about the newer feature and wants to use it, is it unreasonable to ask the application be linked against a newer libpq? Well, see the JDBC example: there is no newer driver with the desired feature for applications to link against. Even if all client-side code with such knowledge stays perfectly in sync as far as upstream sources are concerned, there are any number of reasons why particular installations might have copies of varying vintages. It's not really a situation that I want to get into more than necessary. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Oid registry
On Thu, Sep 27, 2012 at 2:34 PM, Magnus Hagander mag...@hagander.net wrote: I'm not sure that's a way we really want to go down. How do we define which third party vendors would get to reserve oids? And how many? And under what other potential terms? Seems like we'd set ourselves up for endless discussions and bike shedding... Not really. I'm only proposing that it would be nice to have a block of OIDs that core agrees not to assign for any other purpose, not that we dole out specific ones to specific companies. There's no reason why, for example, EnterpriseDB's fork can't use OIDs from the same reserved block as PostgreSQL-XC's fork or Greenplum's fork or Aster Data's fork - those are all distinct projects. All might need private OIDs but they can all come from the same range because the code bases don't mingle. That having been said, we've gotten this far without having any terrible trouble about this, so maybe it's not worth worrying about. It's a nice-to-have, not a big deal. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] psql, remove include of psqlscan.c
On 09/27/2012 02:28:49 PM, Tom Lane wrote: Karl O. Pinc k...@meme.com writes: What I really _want_ is to be able get a bit string of many sql statements from the user and return the results, statuses, etc. of executing each statement. Just what psql does when, say, fed a file from stdin. Just as a note --- I believe you can get that result today with PQsendQuery followed by a PQgetResult loop. There's no need to provide an API that splits the string first. Ah. *sigh* I should pay better attention. Sorry for the distraction. Karl k...@meme.com Free Software: You don't pay back, you pay forward. -- Robert A. Heinlein -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] autovacuum stress-testing our system
On 27 September 2012 15:57, Alvaro Herrera alvhe...@2ndquadrant.com wrote: Excerpts from Simon Riggs's message of jue sep 27 06:51:28 -0300 2012: On 26 September 2012 15:47, Alvaro Herrera alvhe...@2ndquadrant.com wrote: Really, as far as autovacuum is concerned, it would be much more useful to be able to reliably detect that a table has been recently vacuumed, without having to request a 10ms-recent pgstat snapshot. That would greatly reduce the amount of time autovac spends on pgstat requests. VACUUMing generates a relcache invalidation. Can we arrange for those invalidations to be received by autovac launcher, so it gets immediate feedback of recent activity without polling? Hmm, this is an interesting idea worth exploring, I think. Maybe we should sort tables in the autovac worker to-do list by age of last invalidation messages received, or something like that. Totally unclear on the details, but as I said, worth exploring. Just put them to back of queue if an inval is received. There is already support for listening and yet never generating to relcache inval messages. -- Simon Riggs http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Oid registry
On Thu, Sep 27, 2012 at 10:02 PM, Robert Haas robertmh...@gmail.com wrote: On Thu, Sep 27, 2012 at 2:34 PM, Magnus Hagander mag...@hagander.net wrote: I'm not sure that's a way we really want to go down. How do we define which third party vendors would get to reserve oids? And how many? And under what other potential terms? Seems like we'd set ourselves up for endless discussions and bike shedding... Not really. I'm only proposing that it would be nice to have a block of OIDs that core agrees not to assign for any other purpose, not that we dole out specific ones to specific companies. There's no reason Ah, ok. In that case I agree, that wouldn't be open for a lot of bikeshedding. why, for example, EnterpriseDB's fork can't use OIDs from the same reserved block as PostgreSQL-XC's fork or Greenplum's fork or Aster Data's fork - those are all distinct projects. All might need private OIDs but they can all come from the same range because the code bases don't mingle. That having been said, we've gotten this far without having any terrible trouble about this, so maybe it's not worth worrying about. It's a nice-to-have, not a big deal. Yeah, there's got to be a whole lot of other very much more complicated things you have to do with each new major version :) -- Magnus Hagander Me: http://www.hagander.net/ Work: http://www.redpill-linpro.com/ -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] data to json enhancements
On 09/27/2012 10:36 AM, Tom Lane wrote: Andrew Dunstan and...@dunslane.net writes: On 09/27/2012 09:22 AM, Robert Haas wrote: Maybe I am being too pedantic about this and there is a way to make it all work nicely, but it sure feels like using the casting machinery here is blending together two different concepts that are only sometimes the same. OK. I think that's a very good point. I guess I was kinda swept away by this being suggested by a couple of influential people. Well, that doesn't make it wrong, it just means there's more work needed. I'm not that thrilled with magic assumptions about function names either; schema search path issues, for example, will make that dangerous. We've gone to considerable lengths to avoid embedding assumptions about operator names, and assumptions about function names aren't any better. There are at least three ways we could use the cast machinery for this: (1) Reject Robert's assumption that we have to support both interpretations for every cast situation. For instance, it doesn't seem that unreasonable to me to insist that you have to cast to text and then to json if you want the literal-reinterpretation behavior. The main problem then is figuring out a convenient way to provide interpretation #2 for text itself. The trouble is, ISTM, that both things seem equally intuitive. You could easily argue that x::text::json means take x as text and treat it as json, or that it means take x as text and produce a valid json value from it by escaping and quoting it. It's particularly ambiguous when x is itself already a text value. If we go this way I suspect we'll violate POLA for a good number of users. (2) Add another hidden argument to cast functions, or perhaps repurpose one of the ones that exist now. This is likely to come out rather ugly because of the need to shoehorn it into an API that's already suffered a couple of rounds of after-the-fact additions, but it's certainly possible in principle. The main thing I'd want is to not define it in a JSON-only fashion --- so the first thing is to be able to explain the distinction we're trying to make in a type-independent way. I agree with the ugly part of this analysis :-) (3) Invent an auxiliary type along the lines of json_value and say that you create a cast from foo to json_value when you want one interpretation, or directly to json if you want the other. Then things like record_to_json would look for the appropriate type of cast. This is a bit ugly because the auxiliary type has no reason to live other than to separate the two kinds of cast, but it avoids creating any new JSON-specific mechanisms in the type system. I could accept this. The reason is that very few types are in fact going to need a gadget like this. Yes it's mildly ugly, but really fairly unobtrusive. cheers andrew There might be some other ideas I'm not thinking of. Yeah. You've done better than me though :-) cheers andrew -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
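To make the ambiguity concrete, here is the same text value under the two readings being debated (hypothetical results only; this is not asserting what any released version does):

SELECT '[1,2]'::text::json;
-- reading 1, reinterpretation: the json value [1,2], a two-element array
-- reading 2, value conversion:  the json value "[1,2]", a quoted, escaped string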
Re: [HACKERS] ALTER command reworks
Excerpts from Kohei KaiGai's message of lun sep 10 08:08:32 -0300 2012: As attached, I split off the original one into three portions; for set-schema, set-owner and rename-to. Please apply them in order of patch filename. Regarding to the error message, RenameErrorMsgAlreadyExists() was added to handle per object type messages in case when aclcheck_error() is not available to utilize. All the regression test is contained with the 1st patch to make sure the series of reworks does not change existing behaviors. I checked this and noticed that the regression test has a couple of failures -- see below. I think we should remove the test for the first two hunks, and fix the query for the final one to remove the offending column. The good news is that I ran this without applying the whole patch, only the new regression test' files. I didn't check the files in detail. I have skimmed over the patches and they seem reasonable too; I will look at them in more detail later. I think we should go ahead and apply the (tweaked) regression test alone, as a first step. *** /pgsql/source/HEAD/src/test/regress/expected/alter_generic.out 2012-09-27 17:23:05.0 -0300 --- /home/alvherre/Code/pgsql/build/HEAD/src/test/regress/results/alter_generic.out 2012-09-27 17:29:21.0 -0300 *** *** 106,112 CREATE COLLATION alt_coll1 (locale = 'C'); CREATE COLLATION alt_coll2 (locale = 'C'); ALTER COLLATION alt_coll1 RENAME TO alt_coll2; -- failed (name conflict) ! ERROR: collation alt_coll2 for encoding SQL_ASCII already exists in schema alt_nsp1 ALTER COLLATION alt_coll1 RENAME TO alt_coll3; -- OK ALTER COLLATION alt_coll2 OWNER TO regtest_alter_user2; -- failed (no role membership) ERROR: must be member of role regtest_alter_user2 --- 106,112 CREATE COLLATION alt_coll1 (locale = 'C'); CREATE COLLATION alt_coll2 (locale = 'C'); ALTER COLLATION alt_coll1 RENAME TO alt_coll2; -- failed (name conflict) ! ERROR: collation alt_coll2 for encoding UTF8 already exists in schema alt_nsp1 ALTER COLLATION alt_coll1 RENAME TO alt_coll3; -- OK ALTER COLLATION alt_coll2 OWNER TO regtest_alter_user2; -- failed (no role membership) ERROR: must be member of role regtest_alter_user2 *** *** 125,131 ALTER COLLATION alt_coll3 SET SCHEMA alt_nsp2; -- failed (not owner) ERROR: must be owner of collation alt_coll3 ALTER COLLATION alt_coll2 SET SCHEMA alt_nsp2; -- failed (name conflict) ! ERROR: collation alt_coll2 for encoding SQL_ASCII already exists in schema alt_nsp2 RESET SESSION AUTHORIZATION; SELECT n.nspname, c.collname, a.rolname FROM pg_collation c, pg_namespace n, pg_authid a --- 125,131 ALTER COLLATION alt_coll3 SET SCHEMA alt_nsp2; -- failed (not owner) ERROR: must be owner of collation alt_coll3 ALTER COLLATION alt_coll2 SET SCHEMA alt_nsp2; -- failed (name conflict) ! ERROR: collation alt_coll2 for encoding UTF8 already exists in schema alt_nsp2 RESET SESSION AUTHORIZATION; SELECT n.nspname, c.collname, a.rolname FROM pg_collation c, pg_namespace n, pg_authid a *** *** 334,341 ORDER BY nspname, opfname; nspname | opfname | amname | rolname --+--++- ! alt_nsp1 | alt_opc1 | hash | kaigai ! alt_nsp1 | alt_opc2 | hash | kaigai alt_nsp1 | alt_opf2 | hash | regtest_alter_user2 alt_nsp1 | alt_opf3 | hash | regtest_alter_user1 alt_nsp1 | alt_opf4 | hash | regtest_alter_user2 --- 334,341 ORDER BY nspname, opfname; nspname | opfname | amname | rolname --+--++- ! alt_nsp1 | alt_opc1 | hash | alvherre ! 
alt_nsp1 | alt_opc2 | hash | alvherre alt_nsp1 | alt_opf2 | hash | regtest_alter_user2 alt_nsp1 | alt_opf3 | hash | regtest_alter_user1 alt_nsp1 | alt_opf4 | hash | regtest_alter_user2 == -- Álvaro Herrerahttp://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] ToDo: allow to get a number of processed rows by COPY statement [Review of Patch]
Hello I reduced this patch as just plpgsql (SPI) problem solution. Correct generic solution needs a discussion about enhancing our libpq interface - we can take a number of returned rows, but we should to get number of processed rows too. A users should to parse this number from completationTag, but this problem is not too hot and usually is not blocker, because anybody can process completationTag usually. So this issue is primary for PL/pgSQL, where this information is not possible. There is precedent - CREATE AS SELECT (thanks for sample :)), and COPY should to have same impact on ROW_COUNT like this statement (last patch implements it). Regards Pavel 2012/9/21 Amit Kapila amit.kap...@huawei.com: On Friday, September 21, 2012 1:23 AM Pavel Stehule wrote: Basic stuff: - Patch applies OK. but offset difference in line numbers. - Compiles with errors in contrib [pg_stat_statements, sepgsql] modules - Regression failed; one test-case in COPY due to incomplete test-case attached patch. – same as reported by Heikki fixed patch is in attachment After modifications: --- - Patch applies OK - Compiles cleanly without any errors/warnings - Regression tests pass. What it does: -- Modification to get the number of processed rows evaluated via SPI. The changes are to add extra parameter in ProcessUtility to get the number of rows processed by COPY command. Code Review Comments: - 1. New parameter is added to ProcessUtility_hook_type function but the functions which get assigned to these functions like sepgsql_utility_command, pgss_ProcessUtility, prototype definition is not modified. Functionality is not fixed correctly for hook functions, In function pgss_ProcessUtility for bellow snippet of code processed parameter is passed NULL, as well as not initialized. because of this when pg_stat_statements extention is utilized COPY command is giving garbage values. if (prev_ProcessUtility) prev_ProcessUtility(parsetree, queryString, params, dest, completionTag, context, NULL); else standard_ProcessUtility(parsetree, queryString, params, dest, completionTag, context, NULL); Testcase is attached. In this testcase table has only 1000 records but it show garbage value. postgres=# show shared_preload_libraries ; shared_preload_libraries -- pg_stat_statements (1 row) postgres=# CREATE TABLE tbl (a int); CREATE TABLE postgres=# INSERT INTO tbl VALUES(generate_series(1,1000)); INSERT 0 1000 postgres=# do $$ declare r int; begin copy tbl to '/home/kiran/copytest.csv' csv; get diagnostics r = row_count; raise notice 'exported % rows', r; truncate tbl; copy tbl from '/home/kiran/copytest.csv' csv; get diagnostics r = row_count; raise notice 'imported % rows', r; end; $$ language plpgsql; postgres$# NOTICE: exported 13281616 rows NOTICE: imported 13281616 rows DO 2. Why to add the new parameter if completionTag hold the number of processed tuple information; can be extracted from it as follows: _SPI_current-processed = strtoul(completionTag + 7, NULL, 10); this is basic question. I prefer a natural type for counter - uint64 instead text. And there are no simply way to get offset (7 in this case) I agree with your point, but currently in few other places we are parsing the completion tag for getting number of tuples processed. So may be in future we can change those places as well. For example yes, this step can be done in future - together with libpq enhancing. Is not adequate change API (and break lot of extensions) for this isolated problem. 
So I propose fix plpgsql issue, and add to ToDo - enhance libpq to return a number of processed rows as number - but this change breaking compatibility. Pavel pgss_ProcessUtility { .. /* parse command tag to retrieve the number of affected rows. */ if (completionTag sscanf(completionTag, COPY UINT64_FORMAT, rows) != 1) rows = 0; } _SPI_execute_plan { .. .. if (IsA(stmt, CreateTableAsStmt)) { Assert(strncmp(completionTag, SELECT , 7) == 0); _SPI_current-processed = strtoul(completionTag + 7, NULL, 10); .. } With Regards, Amit Kapila. -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers copy-processed-rows-simple.patch Description:
Re: [HACKERS] Patch: incorrect array offset in backend replication tar header
On Tue, Sep 25, 2012 at 5:08 PM, Brian Weaver cmdrcluel...@gmail.com wrote: Unless I misread the code, the tar format and streaming xlog are mutually exclusive. Considering my normal state of fatigue it's not unlikely. I don't want to have to set my wal_keep_segments artificially high just for the backup Correct, you can't use both of those at the same time. That can certainly be improved - but injecting a file into the tar from a different process is far from easy. But one idea might be to just stream the WAL into a *separate* tarfile in this case. -- Magnus Hagander Me: http://www.hagander.net/ Work: http://www.redpill-linpro.com/ -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Patch: incorrect array offset in backend replication tar header
On Tue, Sep 25, 2012 at 4:07 AM, Tom Lane t...@sss.pgh.pa.us wrote: Brian Weaver cmdrcluel...@gmail.com writes: While researching the way streaming replication works I was examining the construction of the tar file header. By comparing documentation on the tar header format from various sources I am certain the following patch should be applied so the group identifier is put into the header properly. Yeah, this is definitely wrong. While I realize that wikipedia isn't always the best source of information, the header offsets seem to match the other documentation I've found. The format is just easier to read on wikipedia. The authoritative specification can be found in the pax page in the POSIX spec, which is available here: http://pubs.opengroup.org/onlinepubs/9699919799/ I agree that the 117 number is bogus, and also that the magic ustar string is written incorrectly. What's more, it appears that the latter error has been copied from pg_dump (but the 117 seems to be just a new bug in pg_basebackup). I wonder what else might be wrong hereabouts :-( Will sit down and take a closer look. I believe what we need to do about this is: 1. fix pg_dump and pg_basebackup output to conform to spec. 2. make sure pg_restore will accept both conformant and previous-generation files. Am I right in believing that we don't have any code that's expected to read pg_basebackup output? We just feed it to tar, no? Yes. We generate a .tar file, but we have no tool that reads it ourselves. (unlike pg_dump where we have pg_restore that actually reads it) I'm a bit concerned about backwards compatibility issues. It looks to me like existing versions of pg_restore will flat out reject files that have a spec-compliant ustar\0 MAGIC field. Is it going to be sufficient if we fix this in minor-version updates, or are we going to need to have a switch that tells pg_dump to emit the incorrect old format? (Ick.) Do we officially support using an older pg_restore to reload a newer dump? I think not? As long as we don't officially support that, I think we'll be ok. -- Magnus Hagander Me: http://www.hagander.net/ Work: http://www.redpill-linpro.com/ -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
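For reference, a sketch of the ustar header layout as laid out in the POSIX pax specification (offsets in octets from the start of the 512-byte header). The gid field begins at octet 116, which is why writing it at 117 lands one byte off, and the magic field is the six bytes "ustar" plus a terminating NUL.

/* Reference layout only; field names and offsets follow the POSIX pax spec. */
struct ustar_header
{
    char name[100];     /* byte   0 */
    char mode[8];       /* byte 100 */
    char uid[8];        /* byte 108 */
    char gid[8];        /* byte 116 */
    char size[12];      /* byte 124 */
    char mtime[12];     /* byte 136 */
    char chksum[8];     /* byte 148 */
    char typeflag;      /* byte 156 */
    char linkname[100]; /* byte 157 */
    char magic[6];      /* byte 257, "ustar" NUL */
    char version[2];    /* byte 263, "00" */
    char uname[32];     /* byte 265 */
    char gname[32];     /* byte 297 */
    char devmajor[8];   /* byte 329 */
    char devminor[8];   /* byte 337 */
    char prefix[155];   /* byte 345; the rest of the 512-byte block is padding */
};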
Re: [HACKERS] Patch: incorrect array offset in backend replication tar header
Magnus, I probably just did a poor job of explaining what I wanted to try. I was going to have the base backup open two connections; one to stream the tar archive, the second to receive the wal files like pg_receivexlog. The wal files received on the second connection would be streamed to a temporary file, with tar headers. Then when the complete tar archive from the first connection was completely received, simply replay the contents from the temporary file to append them to the tar archive. Turns out that isn't necessary. It was an idea borne from my misunderstanding of how pg_basebackup worked. The archive will include all the wal files if I make wal_keep_segments high enough. -- Brian On Thu, Sep 27, 2012 at 6:01 PM, Magnus Hagander mag...@hagander.net wrote: On Tue, Sep 25, 2012 at 5:08 PM, Brian Weaver cmdrcluel...@gmail.com wrote: Unless I misread the code, the tar format and streaming xlog are mutually exclusive. Considering my normal state of fatigue it's not unlikely. I don't want to have to set my wal_keep_segments artificially high just for the backup. Correct, you can't use both of those at the same time. That can certainly be improved - but injecting a file into the tar from a different process is far from easy. But one idea might be to just stream the WAL into a *separate* tarfile in this case. -- Magnus Hagander Me: http://www.hagander.net/ Work: http://www.redpill-linpro.com/ -- /* insert witty comment here */ -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Patch: incorrect array offset in backend replication tar header
OK, here is my attempt at patching and correcting the issue in this thread. I have done my best to test to ensure that hot standby, pg_basebackup, and pg_restore of older files work without issues. I think this might be a larger patch than expected; I took some liberties in trying to clean up a bit. For example, the block size '512' was scattered throughout the code regarding the tar block size. I've replaced instances of that with a defined constant TAR_BLOCK_SIZE. I've likewise created other constants and used them in place of raw numbers in what I hope makes the code a bit more readable. I've also used functions like strncpy(), strnlen(), and the like in place of sprintf() where I could. Also, instead of using sscanf() I used a custom octal conversion routine which has a hard limit on how many characters it will process, like strncpy() and strnlen(). I expect comments, hopefully they'll be positive. -- Brian -- /* insert witty comment here */ postgresql-tar.patch Description: Binary data -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Patch: incorrect array offset in backend replication tar header
On Fri, Sep 28, 2012 at 12:12 AM, Brian Weaver cmdrcluel...@gmail.com wrote: Magnus, I probably just did a poor job of explaining what I wanted to try. I was going to have the base backup open two connections; one to stream the tar archive, the second to receive the wal files like pg_receivexlog. This is what --xlog=stream does. The wal files received on the second connection would be streamed to a temporary file, with tar headers. Then when the complete tar archive from the first connection was completely received, simply replay the contents from the temporary file to append them to the tar archive. Ah, yeah, that should also work I guess. But you could also just leave the data in the temporary tarfile the whole time. IIRC, you can just cat one tarfile to the end of another one, right? -- Magnus Hagander Me: http://www.hagander.net/ Work: http://www.redpill-linpro.com/ -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Patch: incorrect array offset in backend replication tar header
Magnus Hagander mag...@hagander.net writes: On Tue, Sep 25, 2012 at 4:07 AM, Tom Lane t...@sss.pgh.pa.us wrote: I'm a bit concerned about backwards compatibility issues. It looks to me like existing versions of pg_restore will flat out reject files that have a spec-compliant ustar\0 MAGIC field. Is it going to be sufficient if we fix this in minor-version updates, or are we going to need to have a switch that tells pg_dump to emit the incorrect old format? (Ick.) Do we officially support using an older pg_restore to reload a newer dump? I think not? As long as we don't officially support that, I think we'll be ok. Well, for the -Fc format, we have an explicit version number, and pg_restore is supposed to be able to read anything with current or prior version number. We don't bump the version number too often, but we've definitely done it anytime we added new features at the file-format level. However, since the whole point of the -Ft format is to be standard-compliant, people might be surprised if it fell over in a backwards-compatibility situation. Having said all that, I don't think we have a lot of choices here. A tar format output option that isn't actually tar format has hardly any excuse to live at all. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Patch: incorrect array offset in backend replication tar header
On Fri, Sep 28, 2012 at 12:30 AM, Tom Lane t...@sss.pgh.pa.us wrote: Magnus Hagander mag...@hagander.net writes: On Tue, Sep 25, 2012 at 4:07 AM, Tom Lane t...@sss.pgh.pa.us wrote: I'm a bit concerned about backwards compatibility issues. It looks to me like existing versions of pg_restore will flat out reject files that have a spec-compliant ustar\0 MAGIC field. Is it going to be sufficient if we fix this in minor-version updates, or are we going to need to have a switch that tells pg_dump to emit the incorrect old format? (Ick.) Do we officially support using an older pg_restore to reload a newer dump? I think not? As long as we don't officially support that, I think we'll be ok. Well, for the -Fc format, we have an explicit version number, and pg_restore is supposed to be able to read anything with current or prior version number. We don't bump the version number too often, but we've definitely done it anytime we added new features at the file-format level. However, since the whole point of the -Ft format is to be standard-compliant, people might be surprised if it fell over in a backwards-compatibility situation. Having said all that, I don't think we have a lot of choices here. A tar format output option that isn't actually tar format has hardly any excuse to live at all. Yeah, that's what I'm thinking - it's really a bugfix... -- Magnus Hagander Me: http://www.hagander.net/ Work: http://www.redpill-linpro.com/ -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Patch: incorrect array offset in backend replication tar header
Brian Weaver cmdrcluel...@gmail.com writes: OK, here is my attempt at patching and correcting the issue in this thread. I have done my best to test to ensure that hot standby, pg_basebackup, and pg_restore of older files work without issues. I think this might be a larger patch that expected, I took some liberties of trying to clean up a bit. For example the block size '512' was scattered throughout the code regarding the tar block size. I've replace instances of that with a defined constant TAR_BLOCK_SIZE. I've likewise created other constants and used them in place of raw numbers in what I hope makes the code a bit more readable. That seems possibly reasonable ... I've also used functions like strncpy(), strnlen(), and the like in place of sprintf() where I could. Also instead of using sscanf() I used a custom octal conversion routine which has a hard limit on how many character it will process like strncpy() strnlen(). ... but I doubt that this really constitutes a readability improvement. Or a portability improvement. strnlen for instance is not to be found in Single Unix Spec v2 (http://pubs.opengroup.org/onlinepubs/007908799/) which is what we usually take as our baseline assumption about which system functions are available everywhere. By and large, I think the more different system functions you rely on, the harder it is to read your code, even if some unusual system function happens to exactly match your needs in particular places. It also greatly increases the risk of having portability problems, eg on Windows, or non-mainstream Unix platforms. But a larger point is that the immediate need is to fix bugs. Code beautification is a separate activity and would be better submitted as a separate patch. There is no way I'd consider applying most of this patch to the back branches, for instance. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Patch: incorrect array offset in backend replication tar header
Magnus Hagander mag...@hagander.net writes: Ah, yeah, that should also work I guess. But you could also just leave the the data in the temporary tarfile the whole time. IIRC, you can just cat one tarfile to the end of another one, right? Not if they're written according to spec, I think. There is an EOF marker consisting of 2 blocks of zeroes (which pg_dump is failing to create correctly, but that's a bug not a feature). Possibly you could strip the EOF marker though, if the file join code is allowed to be something smarter than cat. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Patch: incorrect array offset in backend replication tar header
On 09/27/2012 06:30 PM, Tom Lane wrote: Magnus Hagander mag...@hagander.net writes: On Tue, Sep 25, 2012 at 4:07 AM, Tom Lane t...@sss.pgh.pa.us wrote: I'm a bit concerned about backwards compatibility issues. It looks to me like existing versions of pg_restore will flat out reject files that have a spec-compliant ustar\0 MAGIC field. Is it going to be sufficient if we fix this in minor-version updates, or are we going to need to have a switch that tells pg_dump to emit the incorrect old format? (Ick.) Do we officially support using an older pg_restore to reload a newer dump? I think not? As long as we don't officially support that, I think we'll be ok. Well, for the -Fc format, we have an explicit version number, and pg_restore is supposed to be able to read anything with current or prior version number. We don't bump the version number too often, but we've definitely done it anytime we added new features at the file-format level. However, since the whole point of the -Ft format is to be standard-compliant, people might be surprised if it fell over in a backwards-compatibility situation. Having said all that, I don't think we have a lot of choices here. A tar format output option that isn't actually tar format has hardly any excuse to live at all. I agree, but it's possibly worth pointing out that GNU tar has no trouble at all processing the erroneous format, and the file program on my Linux system has no trouble recognizing it as a tar archive. Nevertheless, I think we should fix all live versions of pg_dump make all live versions of pg-restore accept both formats. cheers andrew -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Patch: incorrect array offset in backend replication tar header
On Fri, Sep 28, 2012 at 12:45 AM, Tom Lane t...@sss.pgh.pa.us wrote: Magnus Hagander mag...@hagander.net writes: Ah, yeah, that should also work I guess. But you could also just leave the the data in the temporary tarfile the whole time. IIRC, you can just cat one tarfile to the end of another one, right? Not if they're written according to spec, I think. There is an EOF marker consisting of 2 blocks of zeroes (which pg_dump is failing to create correctly, but that's a bug not a feature). Possibly you could strip the EOF marker though, if the file join code is allowed to be something smarter than cat. Hmm. Yeah. It seems gnu tar has --concatenate Not sure if it's in the standard or a GNU extension though. But it says: However, tar archives incorporate an end-of-file marker which must be removed if the concatenated archives are to be read properly as one archive. `--concatenate' removes the end-of-archive marker from the target archive before each new archive is appended. If you use cat to combine the archives, the result will not be a valid tar format archive. If you need to retrieve files from an archive that was added to using the cat utility, use the --ignore-zeros (-i) option. -- Magnus Hagander Me: http://www.hagander.net/ Work: http://www.redpill-linpro.com/ -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
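For illustration, a minimal sketch of the "smarter than cat" join discussed above: find where the first archive's real data ends, overwrite its end-of-archive marker (the trailing all-zero 512-byte blocks), and then append the second archive, which carries its own marker. This is not code from any patch; file names are placeholders and error handling is omitted.

#include <stdio.h>

#define TAR_BLOCK 512

static int
is_zero_block(const unsigned char *b)
{
    for (int i = 0; i < TAR_BLOCK; i++)
        if (b[i] != 0)
            return 0;
    return 1;
}

int
main(void)
{
    FILE *dst = fopen("dst.tar", "rb+");
    FILE *src = fopen("src.tar", "rb");
    unsigned char block[TAR_BLOCK];
    long data_end = 0;
    size_t n;

    if (dst == NULL || src == NULL)
        return 1;

    /* Remember the position just past the last non-zero block of dst.tar. */
    while (fread(block, 1, TAR_BLOCK, dst) == TAR_BLOCK)
        if (!is_zero_block(block))
            data_end = ftell(dst);

    /* Seek back so the old end-of-archive marker gets overwritten... */
    fseek(dst, data_end, SEEK_SET);

    /* ...by the second archive, which ends with its own marker. */
    while ((n = fread(block, 1, TAR_BLOCK, src)) > 0)
        fwrite(block, 1, n, dst);

    fclose(src);
    fclose(dst);
    return 0;
}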
Re: [HACKERS] Patch: incorrect array offset in backend replication tar header
Andrew Dunstan and...@dunslane.net writes: On 09/27/2012 06:30 PM, Tom Lane wrote: Having said all that, I don't think we have a lot of choices here. A tar format output option that isn't actually tar format has hardly any excuse to live at all. I agree, but it's possibly worth pointing out that GNU tar has no trouble at all processing the erroneous format, and the file program on my Linux system has no trouble recognizing it as a tar archive. Well, they're falling back to assuming that the file is a pre-POSIX tarfile, which is why you don't see string user/group names for instance. Nevertheless, I think we should fix all live versions of pg_dump make all live versions of pg-restore accept both formats. I think it's clear that we should make all versions of pg_restore accept either spelling of the magic string. It's less clear that we should change the output of pg_dump in back branches though. I think the only reason we'd not get complaints about that is that not that many people are relying on tar-format output anyway. Anybody who is would probably be peeved if version 8.3.21 pg_restore couldn't read the output of version 8.3.22 pg_dump. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Patch: incorrect array offset in backend replication tar header
On Fri, Sep 28, 2012 at 12:55 AM, Tom Lane t...@sss.pgh.pa.us wrote: Andrew Dunstan and...@dunslane.net writes: On 09/27/2012 06:30 PM, Tom Lane wrote: Having said all that, I don't think we have a lot of choices here. A tar format output option that isn't actually tar format has hardly any excuse to live at all. I agree, but it's possibly worth pointing out that GNU tar has no trouble at all processing the erroneous format, and the file program on my Linux system has no trouble recognizing it as a tar archive. Well, they're falling back to assuming that the file is a pre-POSIX tarfile, which is why you don't see string user/group names for instance. Nevertheless, I think we should fix all live versions of pg_dump make all live versions of pg-restore accept both formats. I think it's clear that we should make all versions of pg_restore accept either spelling of the magic string. It's less clear that we should change the output of pg_dump in back branches though. I think the only reason we'd not get complaints about that is that not that many people are relying on tar-format output anyway. Anybody who is would probably be peeved if version 8.3.21 pg_restore couldn't read the output of version 8.3.22 pg_dump. There's no real point to using the tar format in pg_dump, really, is there? Which is why I think most people just don't use it. pg_basebackup in tar format is a much more useful thing, of course... So we could fix just pg_basebackup in backbranches (since we never read anything, it shouldn't be that big a problem), and then do pg_dump in HEAD only? -- Magnus Hagander Me: http://www.hagander.net/ Work: http://www.redpill-linpro.com/ -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] data to json enhancements
On 09/27/2012 09:18 PM, Andrew Dunstan wrote: On 09/27/2012 10:36 AM, Tom Lane wrote: Andrew Dunstan and...@dunslane.net writes: On 09/27/2012 09:22 AM, Robert Haas wrote: Maybe I am being too pedantic about this and there is a way to make it all work nicely, but it sure feels like using the casting machinery here is blending together two different concepts that are only sometimes the same. OK. I think that's a very good point. I guess I was kinda swept away by this being suggested by a couple of influential people. Well, that doesn't make it wrong, it just means there's more work needed. I'm not that thrilled with magic assumptions about function names either; schema search path issues, for example, will make that dangerous. We've gone to considerable lengths to avoid embedding assumptions about operator names, and assumptions about function names aren't any better. There are at least three ways we could use the cast machinery for this: (1) Reject Robert's assumption that we have to support both interpretations for every cast situation. For instance, it doesn't seem that unreasonable to me to insist that you have to cast to text and then to json if you want the literal-reinterpretation behavior. Maybe cast not to text but to cstring for getting the text-is-already-json ? That is, reuse the current type io as literal casts. This way a cast of '{a: 1}'::json::text will fail, as this json value really does not represent a text/string value. The main problem then is figuring out a convenient way to provide interpretation #2 for text itself. The trouble is, ISTM, that both things seem equally intuitive. You could easily argue that x::text::json means take x as text and treat it as json, or that it means take x as text and produce a valid json value from it by escaping and quoting it. It's particularly ambiguous when x is itself already a text value. If we go this way I suspect we'll violate POLA for a good number of users. It may be easier to sort this out if we think in terms of symmetry and unambiguity. let's postulate that mytype::json::mytype and json::mytype::json should always reproduce the original result or they should fail. so '[1,2,3]'::text::json::text === '[1,2,3]'::text with intermediate json being '[1,2,3]' and '[1,2,3]'::json::text::json fails the json--text casts as '[1,2,3]'::json does not represent a text value (in a similar way as '[1,2,3]'::json::date fails) on the other hand '[1,2,3]'::json::int[]::json should succeed as there is a direct mapping to int array. (3) Invent an auxiliary type along the lines of json_value and say that you create a cast from foo to json_value when you want one interpretation, or directly to json if you want the other. Then things like record_to_json would look for the appropriate type of cast. This is a bit ugly because the auxiliary type has no reason to live other than to separate the two kinds of cast, but it avoids creating any new JSON-specific mechanisms in the type system. As suggested above, this special type could be on the other side - the type cstring as already used for type io functions the main problem here is, that currently we do interpret ::text::json as it were the type input function. we do proper selective quoting when converting to back json hannu=# create table jt(t text, j json); CREATE TABLE hannu=# insert into jt values ('[1,2]','[3,4]'); INSERT 0 1 hannu=# select row_to_json(jt) from jt; row_to_json - {t:[1,2],j:[3,4]} (1 row) but we do automatic casting through cstring and json type input func when converting to json. 
hannu=# select t::json, j::json from jt; t | j ---+--- [1,2] | [3,4] (1 row) This should probably be cleaned up. I could accept this. The reason is that very few types are in fact going to need a gadget like this. Yes it's mildly ugly, but really fairly unobtrusive. cheers andrew There might be some other ideas I'm not thinking of. Yeah. You've done better than me though :-) cheers andrew -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Patch: incorrect array offset in backend replication tar header
Magnus Hagander mag...@hagander.net writes: On Fri, Sep 28, 2012 at 12:55 AM, Tom Lane t...@sss.pgh.pa.us wrote: I think it's clear that we should make all versions of pg_restore accept either spelling of the magic string. It's less clear that we should change the output of pg_dump in back branches though. I think the only reason we'd not get complaints about that is that not that many people are relying on tar-format output anyway. Anybody who is would probably be peeved if version 8.3.21 pg_restore couldn't read the output of version 8.3.22 pg_dump. There's no real point to using the tar format in pg_dump, really, is there? Which is why I think most people just don't use it. I think folks who know the tool realize that custom format is better. But we've seen plenty of indications that novices sometimes choose tar format. For instance, people occasionally complain about its 8GB-per-file limit. pg_basebackup in tar format is a much more useful thing, of course... Right. So we could fix just pg_basebackup in backbranches (since we never read anything, it shouldn't be that big a problem), and then do pg_dump in HEAD only? Yeah, pg_basebackup is an entirely different tradeoff. I think we want to fix its problems in all branches. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] MD's replication WAS Switching timeline over streaming replication
On 9/26/12 6:17 PM, m...@rpzdesign.com wrote: Josh: The good part is you are the first person to ask for a copy and I will send you the hook code that I have and you can be a good sport and put it on GitHub, that is great, you can give us both credit for a joint effort, I do the code, you put it GitHub. Well, I think it just makes sense for you to put it up somewhere public so that folks can review it; if not Github, then somewhere else. If it's useful and well-written, folks will be interested. -- Josh Berkus PostgreSQL Experts Inc. http://pgexperts.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] data to json enhancements
On 09/27/2012 06:58 PM, Hannu Krosing wrote: On 09/27/2012 09:18 PM, Andrew Dunstan wrote: On 09/27/2012 10:36 AM, Tom Lane wrote: Andrew Dunstan and...@dunslane.net writes: On 09/27/2012 09:22 AM, Robert Haas wrote: Maybe I am being too pedantic about this and there is a way to make it all work nicely, but it sure feels like using the casting machinery here is blending together two different concepts that are only sometimes the same. OK. I think that's a very good point. I guess I was kinda swept away by this being suggested by a couple of influential people. Well, that doesn't make it wrong, it just means there's more work needed. I'm not that thrilled with magic assumptions about function names either; schema search path issues, for example, will make that dangerous. We've gone to considerable lengths to avoid embedding assumptions about operator names, and assumptions about function names aren't any better. There are at least three ways we could use the cast machinery for this: (1) Reject Robert's assumption that we have to support both interpretations for every cast situation. For instance, it doesn't seem that unreasonable to me to insist that you have to cast to text and then to json if you want the literal-reinterpretation behavior. Maybe cast not to text but to cstring for getting the text-is-already-json ? That is, reuse the current type io as literal casts. This way a cast of '{a: 1}'::json::text will fail, as this json value really does not represent a text/string value. The main problem then is figuring out a convenient way to provide interpretation #2 for text itself. The trouble is, ISTM, that both things seem equally intuitive. You could easily argue that x::text::json means take x as text and treat it as json, or that it means take x as text and produce a valid json value from it by escaping and quoting it. It's particularly ambiguous when x is itself already a text value. If we go this way I suspect we'll violate POLA for a good number of users. It may be easier to sort this out if we think in terms of symmetry and unambiguity. let's postulate that mytype::json::mytype and json::mytype::json should always reproduce the original result or they should fail. Where are all these casts from json going to come from? What is going to dequote and unescape strings, or turn objects into hstores? You're making this much bigger than what I had in mind. The advantage of Tom's option (3) that I liked is that it is very minimal. Any type can provide its own function for conversion to json. If it's there we use it, if it's not we use its standard text representation. Let's stick to the KISS principle. cheers andrew -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Switching timeline over streaming replication
On 27-09-2012 01:30, Amit Kapila wrote: I understood this point, but currently this use case is not documented in the Timelines documentation (Section 24.3.5). The timeline documentation was written during the PITR implementation; there was no SR yet. AFAICS it doesn't cite SR but is sufficiently generic (it uses the 'wal records' term to explain the feature). Feel free to reword those paragraphs to mention SR. -- Euler Taveira de Oliveira - Timbira http://www.timbira.com.br/ PostgreSQL: Consultoria, Desenvolvimento, Suporte 24x7 e Treinamento -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Patch: incorrect array offset in backend replication tar header
On Thu, Sep 27, 2012 at 6:43 PM, Tom Lane t...@sss.pgh.pa.us wrote: Brian Weaver cmdrcluel...@gmail.com writes: OK, here is my attempt at patching and correcting the issue in this thread. I have done my best to test to ensure that hot standby, pg_basebackup, and pg_restore of older files work without issues. I think this might be a larger patch than expected; I took some liberties and tried to clean up a bit. For example, the block size '512' was scattered throughout the code regarding the tar block size. I've replaced instances of that with a defined constant TAR_BLOCK_SIZE. I've likewise created other constants and used them in place of raw numbers in what I hope makes the code a bit more readable. That seems possibly reasonable ... I've also used functions like strncpy(), strnlen(), and the like in place of sprintf() where I could. Also, instead of using sscanf() I used a custom octal conversion routine which has a hard limit on how many characters it will process, like strncpy() and strnlen(). ... but I doubt that this really constitutes a readability improvement. Or a portability improvement. strnlen for instance is not to be found in Single Unix Spec v2 (http://pubs.opengroup.org/onlinepubs/007908799/) which is what we usually take as our baseline assumption about which system functions are available everywhere. By and large, I think the more different system functions you rely on, the harder it is to read your code, even if some unusual system function happens to exactly match your needs in particular places. It also greatly increases the risk of having portability problems, e.g. on Windows, or non-mainstream Unix platforms. But a larger point is that the immediate need is to fix bugs. Code beautification is a separate activity and would be better submitted as a separate patch. There is no way I'd consider applying most of this patch to the back branches, for instance. regards, tom lane Here's a very minimal fix then, perhaps it will be more palatable. Even though I regret the effort I put into the first patch, it's in my employer's best interest that it's fixed. I'm obliged to try to remediate the problem in something more acceptable to the community. enjoy -- /* insert witty comment here */ postgresql-tar.patch Description: Binary data -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
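To illustrate the kind of length-bounded octal conversion described above (an illustration only, not the code from the patch): tar headers store numeric fields as fixed-width octal strings, so the parser takes a hard limit on how many characters it may scan instead of relying on sscanf().

#include <stddef.h>
#include <stdint.h>

#define TAR_BLOCK_SIZE 512      /* named constant in place of a bare 512 */

/* Parse a fixed-width octal field from a tar header, never reading more
 * than fieldlen bytes.  Leading spaces are skipped; a NUL, trailing space,
 * or any non-octal byte terminates the field. */
static uint64_t
read_tar_octal(const char *field, size_t fieldlen)
{
    uint64_t    val = 0;
    size_t      i = 0;

    /* skip leading padding spaces */
    while (i < fieldlen && field[i] == ' ')
        i++;

    /* accumulate octal digits until NUL, space, or end of field */
    for (; i < fieldlen; i++)
    {
        char        c = field[i];

        if (c < '0' || c > '7')
            break;
        val = (val << 3) + (uint64_t) (c - '0');
    }
    return val;
}

For instance, the ustar file-size field is the 12 bytes at offset 124, so a caller would write read_tar_octal(&header[124], 12) and could never run past the field - the same bounded-scan property the strncpy()/strnlen()-style calls were being used to get.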
[HACKERS] setting per-database/role parameters checks them against wrong context
Example: create temporary table foo (a int); insert into foo values (1); alter role peter set temp_buffers = 120; ERROR: 22023: invalid value for parameter "temp_buffers": 120 DETAIL: "temp_buffers" cannot be changed after any temporary tables have been accessed in the session. Another example: set log_statement_stats = on; alter role peter set log_parser_stats = on; ERROR: 22023: invalid value for parameter "log_parser_stats": 1 DETAIL: Cannot enable parameter when "log_statement_stats" is true. Another example is that in <= 9.1, ALTER DATABASE ... SET search_path would check the existence of the schema in the current database, but that was done away with in 9.2. The first example could probably be fixed if check_temp_buffers() paid attention to the GucSource, but in the second case, and in general, there doesn't seem to be a way to distinguish, by source, between assigning a per-database setting and enacting a per-database setting. Ideas? -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
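For the first example, the fix could live in the GUC check hook itself, assuming the validate-only path for ALTER ROLE/DATABASE ... SET were identifiable by its GucSource (say, PGC_S_TEST) - which is precisely the distinction reported missing above, so this is only a sketch of the shape such a fix might take, not a working patch:

#include "postgres.h"
#include "storage/buf_internals.h"  /* NLocBuffer */
#include "utils/guc.h"

static bool
check_temp_buffers(int *newval, void **extra, GucSource source)
{
    /* Complain only when the value would actually take effect in this
     * session; merely storing a per-role/per-database setting should not
     * care whether we have touched temp tables locally. */
    if (source != PGC_S_TEST && NLocBuffer && NLocBuffer != *newval)
    {
        GUC_check_errdetail("\"temp_buffers\" cannot be changed after any temporary tables have been accessed in the session.");
        return false;
    }
    return true;
}

The second example would presumably need the same kind of source test in the statistics-parameter checks.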
Re: [HACKERS] ToDo: allow to get a number of processed rows by COPY statement [Review of Patch]
On Friday, September 28, 2012 2:28 AM Pavel Stehule wrote: Hello I reduced this patch to just the plpgsql (SPI) problem solution. A correct generic solution needs a discussion about enhancing our libpq interface - we can get the number of returned rows, but we should get the number of processed rows too. A user would have to parse this number from the completionTag, but this problem is not too hot and usually is not a blocker, because anybody can process the completionTag. So this issue is primarily for PL/pgSQL, where this information is not available. There is precedent - CREATE TABLE AS SELECT (thanks for the sample :)), and COPY should have the same impact on ROW_COUNT as that statement (the last patch implements it). IMO now this patch is okay. I have marked it as Ready For Committer. With Regards, Amit Kapila. 2012/9/21 Amit Kapila amit.kap...@huawei.com: On Friday, September 21, 2012 1:23 AM Pavel Stehule wrote: Basic stuff: - Patch applies OK, but with offset differences in line numbers. - Compiles with errors in contrib [pg_stat_statements, sepgsql] modules - Regression failed; one test case in COPY due to an incomplete test case in the attached patch – same as reported by Heikki; fixed patch is in attachment After modifications: --- - Patch applies OK - Compiles cleanly without any errors/warnings - Regression tests pass. What it does: -- Modification to get the number of processed rows evaluated via SPI. The changes are to add an extra parameter in ProcessUtility to get the number of rows processed by the COPY command. Code Review Comments: - 1. A new parameter is added to the ProcessUtility_hook_type function, but the prototype definitions of the functions that get assigned to this hook, such as sepgsql_utility_command and pgss_ProcessUtility, are not modified. Functionality is not fixed correctly for the hook functions: in pgss_ProcessUtility, in the snippet of code below, the processed parameter is passed as NULL and is never initialized; because of this, when the pg_stat_statements extension is used the COPY command gives garbage values. if (prev_ProcessUtility) prev_ProcessUtility(parsetree, queryString, params, dest, completionTag, context, NULL); else standard_ProcessUtility(parsetree, queryString, params, dest, completionTag, context, NULL); A test case is attached. In this test case the table has only 1000 records but it shows a garbage value. postgres=# show shared_preload_libraries ; shared_preload_libraries -- pg_stat_statements (1 row) postgres=# CREATE TABLE tbl (a int); CREATE TABLE postgres=# INSERT INTO tbl VALUES(generate_series(1,1000)); INSERT 0 1000 postgres=# do $$ declare r int; begin copy tbl to '/home/kiran/copytest.csv' csv; get diagnostics r = row_count; raise notice 'exported % rows', r; truncate tbl; copy tbl from '/home/kiran/copytest.csv' csv; get diagnostics r = row_count; raise notice 'imported % rows', r; end; $$ language plpgsql; postgres$# NOTICE: exported 13281616 rows NOTICE: imported 13281616 rows DO 2. Why add the new parameter if completionTag holds the number of processed tuples? It can be extracted from it as follows: _SPI_current->processed = strtoul(completionTag + 7, NULL, 10); This is the basic question. I prefer a natural type for the counter - uint64 instead of text. And there is no simple way to get the offset (7 in this case). I agree with your point, but currently in a few other places we are parsing the completion tag to get the number of tuples processed. So maybe in future we can change those places as well. For example: Yes, this step can be done in future - together with the libpq enhancement. It is not adequate to change the API (and break a lot of extensions) for this isolated problem. So I propose to fix the plpgsql issue, and add to the ToDo - enhance libpq to return the number of processed rows as a number - but that change would break compatibility. Pavel pgss_ProcessUtility { .. /* parse command tag to retrieve the number of affected rows. */ if (completionTag && sscanf(completionTag, "COPY " UINT64_FORMAT, &rows) != 1) rows = 0; } _SPI_execute_plan { .. .. if (IsA(stmt, CreateTableAsStmt)) { Assert(strncmp(completionTag, "SELECT ", 7) == 0); _SPI_current->processed = strtoul(completionTag + 7, NULL, 10); .. } With Regards, Amit Kapila. --
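To make the first review comment concrete: under the patched hook signature quoted above, the hook has to forward its own processed argument instead of NULL (and the caller has to initialize it). A sketch of the pass-through against the patch's prototype, not the stock API:

/* inside pgss_ProcessUtility, with the patch's extra uint64 *processed
 * argument: forward it rather than passing NULL, otherwise SPI callers
 * such as the plpgsql GET DIAGNOSTICS test above read an uninitialized
 * count and report garbage row totals */
if (prev_ProcessUtility)
    prev_ProcessUtility(parsetree, queryString, params,
                        dest, completionTag, context, processed);
else
    standard_ProcessUtility(parsetree, queryString, params,
                            dest, completionTag, context, processed);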
Re: [HACKERS] ToDo: allow to get a number of processed rows by COPY statement [Review of Patch]
2012/9/28 Amit Kapila amit.kap...@huawei.com: IMO now this patch is okay. I have marked it as Ready For Committer. Thank you very much for the review Regards Pavel