Re: [HACKERS] Proposing pg_hibernate
On Wed, May 28, 2014 at 7:31 AM, Gurjeet Singh gurj...@singh.im wrote:

Caveats
--
- The buffer list is saved only when Postgres is shut down in smart and fast modes. That is, the buffer list is not saved when the database crashes, or on immediate shutdown.
- A reduction in `shared_buffers` is not detected. If `shared_buffers` is reduced across a restart, and if the combined saved buffer list is larger than the new shared_buffers, Postgres Hibernator continues to read and restore blocks even after `shared_buffers` worth of buffers have been restored.

How about the cases when shared buffers already contain some data:

a. Before the Readers start filling shared buffers, if this cluster wishes to join replication as a slave and receive data from the master, then this utility might need to evict some buffers filled during the startup phase.

b. As soon as the server completes startup (reaches the consistent point), it allows new connections, which can also use some shared buffers before the Reader process could use them. Or are you planning to change the time when users can connect to the database?

I am not sure if replacing shared buffer contents in such cases can always be considered useful.

With Regards, Amit Kapila. EnterpriseDB: http://www.enterprisedb.com
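For illustration, the guard missing per the second caveat could be a simple bound on the restore loop. A minimal sketch, in which only NBuffers (the backend global holding shared_buffers in pages) is real and all other names are hypothetical:

    /* Hypothetical sketch: stop restoring once shared_buffers is full,
     * rather than evicting buffers that are already live.
     * read_next_saved_block() and restore_block() are illustrative only. */
    int         restored = 0;
    SavedBlock  blk;

    while (read_next_saved_block(fp, &blk))
    {
        if (restored >= NBuffers)
            break;              /* new shared_buffers is smaller; give up */
        restore_block(&blk);
        restored++;
    }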
Re: [HACKERS] json casts
On 05/27/2014 11:55 PM, Robert Haas wrote: On Tue, May 27, 2014 at 5:00 PM, Tom Lane t...@sss.pgh.pa.us wrote: I'd be inclined to think a more useful answer to this issue would be to make json.c special-case timestamps, as it already does for numerics. I wonder if anyone besides me is nervous about changing the semantics here. It seems like the sort of backward-compatibility break we normally avoid. If we do make such a compatibility break, it should certainly be release-noted. Yes, certainly it should be. cheers andrew -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
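For readers following along, the compatibility concern is that json output of timestamps currently goes through the type's ordinary output function, so it depends on DateStyle; special-casing would presumably emit ISO 8601 instead. A sketch of the difference (illustrative output, not from a committed patch):

    -- Current behavior is DateStyle-dependent:
    SET datestyle = 'Postgres';
    SELECT to_json(now());
    -- today:    "Wed May 28 10:00:00 2014 EDT"
    -- proposed: "2014-05-28T10:00:00-04:00"   (always ISO 8601)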
Re: [HACKERS] Spreading full-page writes
On 27 May 2014 13:20, Heikki Linnakangas hlinnakan...@vmware.com wrote: On 05/27/2014 03:18 PM, Simon Riggs wrote: IIRC Koichi had a patch for prefetch during recovery. Heikki, is that the reason you also discussed changing the WAL record format to allow us to identify the blocks touched by recovery more easily? Yeah, that was one use case I had in mind for the WAL format changes. See http://www.postgresql.org/message-id/533d6cbf.6080...@vmware.com. Those proposals suggest some very big changes to the way WAL works. Prefetch can work easily enough for most records - do we really need that much churn? You mentioned Btree vacuum records, but I'm planning to optimize those another way. Why don't we just have the prefetch code in core and forget the WAL format changes? -- Simon Riggs http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
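For context, the prefetch pass itself is small once the blocks a record touches can be enumerated, which is exactly what the format debate is about. A rough sketch in which PrefetchBuffer() is the existing core primitive (posix_fadvise based) and the block-enumeration names are hypothetical:

    /* xlrec_get_blocks() is hypothetical: with the proposed format it
     * would be one generic routine; today it needs per-record-type code. */
    nblocks = xlrec_get_blocks(record, blocks);
    for (i = 0; i < nblocks; i++)
        PrefetchBuffer(blocks[i].reln, MAIN_FORKNUM, blocks[i].blkno);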
Re: [HACKERS] Spreading full-page writes
On 27 May 2014 18:18, Jeff Janes jeff.ja...@gmail.com wrote: On Mon, May 26, 2014 at 8:15 PM, Robert Haas robertmh...@gmail.com wrote: On Mon, May 26, 2014 at 1:22 PM, Heikki Linnakangas hlinnakan...@vmware.com wrote: I don't think we know that. The server might have crashed before that second record got generated. (This appears to be an unfixable flaw in this proposal.) The second record is generated before the checkpoint is finished and the checkpoint record is written. So it will be there. (if you crash before the checkpoint is finished, the in-progress checkpoint is no good for recovery anyway, and won't be used) Hmm, I see. It's not great to have to generate WAL at buffer-eviction time, though. Normally, when we go to evict a buffer, the WAL is already written. We might have to wait for it to be flushed, but if the WAL writer is doing its job, hopefully not. But here we'll definitely have to wait for the WAL flush. I'm not sure we do need to flush it. If the checkpoint finishes, then the WAL surely got flushed as part of the process of recording the end of the checkpoint. If the checkpoint does not finish, recovery will start from the previous checkpoint, which does contain the FPW (because if it didn't, the page would not be eligible for this treatment) and so the possibly torn page will get overwritten in full. I think Robert is correct, you would need to flush WAL before writing the disk buffer. That is the current invariant of WAL before data. However, we don't need to do this in a simple way: FPW-flush-buffer, we can do that with more buffering. So it seems like a reasonable idea to do this using a 64 buffer BulkAccessStrategy object and flush the WAL every 64 buffers. That's beginning to look more like double buffering though... -- Simon Riggs http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
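For illustration, the "flush the WAL every 64 buffers" idea amounts to amortizing the WAL-before-data flush across a batch: log the full-page images for a batch, flush once, then write the data pages. A minimal sketch in which only XLogFlush() and Min() are real primitives and the rest are hypothetical names:

    int        i, start;
    XLogRecPtr lsn = InvalidXLogRecPtr;

    for (start = 0; start < nvictims; start += 64)
    {
        int end = Min(start + 64, nvictims);

        for (i = start; i < end; i++)
            lsn = log_buffer_fpw(victims[i]);   /* hypothetical: emit FPW record */

        XLogFlush(lsn);                         /* one WAL flush per batch */

        for (i = start; i < end; i++)
            write_buffer(victims[i]);           /* data writes only after flush */
    }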
Re: [HACKERS] Spreading full-page writes
On 05/28/2014 09:41 AM, Simon Riggs wrote: On 27 May 2014 13:20, Heikki Linnakangas hlinnakan...@vmware.com wrote: On 05/27/2014 03:18 PM, Simon Riggs wrote: IIRC Koichi had a patch for prefetch during recovery. Heikki, is that the reason you also discussed changing the WAL record format to allow us to identify the blocks touched by recovery more easily? Yeah, that was one use case I had in mind for the WAL format changes. See http://www.postgresql.org/message-id/533d6cbf.6080...@vmware.com. Those proposals suggest some very big changes to the way WAL works. Prefetch can work easily enough for most records - do we really need that much churn? You mentioned Btree vacuum records, but I'm planning to optimize those another way. Why don't we just have the prefetch code in core and forget the WAL format changes? Well, the prefetching was just one example of why the proposed WAL format changes are a good idea. The changes will make life easier for any external (or internal, for that matter) tool that wants to read WAL records. The thing that finally really got me into doing that was pg_rewind. For pg_rewind it's not enough to cover most records, you have to catch all modifications to data pages for correctness, and that's difficult to maintain as new WAL record types are added and old ones are modified in every release. Also, the changes make WAL-logging and -replaying code easier to write. Which reduces the potential for bugs. - Heikki -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Race condition within _bt_findinsertloc()? (new page split code)
On 05/28/2014 02:15 AM, Peter Geoghegan wrote: On Tue, May 27, 2014 at 12:19 PM, Heikki Linnakangas hlinnakan...@vmware.com wrote: Fair enough, but I don't think that affects correctness either way (I don't think that you meant to imply that this was a necessary precaution that you'd taken - right?). Right. So, the comments within _bt_search() suggest that the _bt_moveright() call will perform a _bt_finish_split() call opportunistically iff it's called from _bt_doinsert() (i.e. access == BT_WRITE). There is no reason to not do so in all circumstances though, assuming that it's desirable to do so as soon as possible (which I *don't* actually assume). You can't write in a hot standby, so that's one reason to only do it when inserting, not when querying. Even when you're not in a hot standby, it seems like a nice property that a read-only query doesn't do writes. I know we do hint bit updates and other kinds of write-action when reading anyway, but still. If I'm not mistaken, it's also true that it would be strictly speaking correct to never do it there. Hmm. No, it wouldn't. It is not allowed to have two incompletely-split pages adjacent to each other. If you move right to the right-half of an incomplete split, i.e. a page that does not have a downlink in its parent, and then try to split the page again, _bt_insert_parent() would fail to find the location to insert the new downlink to. It assumes that there is a downlink to the page being split in the parent, and uses that to find the location for the new downlink. - Heikki -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] rangetypes spgist questions/refactoring
On Tue, May 20, 2014 at 11:18:29AM -0700, Jeff Davis wrote: I think this can be done without breaking upgrade compatibility, because I think the structure already satisfies the invariants I mentioned in the other email (aside from the special case of a root tuple with two nodes and no prefix). That being said, it's a little scary to refactor indexing code while trying to keep it upgrade-compatible.

We can make pg_upgrade mark such indexes as invalid and create a user script to reindex all the indexes after the upgrade. We have done that in the past.

-- Bruce Momjian br...@momjian.us http://momjian.us EnterpriseDB http://enterprisedb.com + Everyone has their own god. + -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
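For illustration, the kind of user script pg_upgrade can emit here just regenerates REINDEX commands for whatever it flagged invalid. A sketch against the system catalogs, assuming the affected indexes are marked with indisvalid = false:

    -- Generate REINDEX statements for all invalid indexes.
    SELECT 'REINDEX INDEX ' || quote_ident(n.nspname) || '.' ||
           quote_ident(c.relname) || ';'
    FROM pg_index i
    JOIN pg_class c ON c.oid = i.indexrelid
    JOIN pg_namespace n ON n.oid = c.relnamespace
    WHERE NOT i.indisvalid;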
Re: [HACKERS] Allowing join removals for more join types
On Sun, May 25, 2014 at 5:42 AM, Tom Lane t...@sss.pgh.pa.us wrote: David Rowley dgrowle...@gmail.com writes: I agree that there are not many cases left to remove the join that remain after is_simple_subquery() has decided not to pullup the subquery. Some of the perhaps more common cases would be having windowing functions in the subquery, as this is what you need to do if you want to include the results of a windowing function from within the where clause. Another case, though I can't imagine it would be common, is ORDER BY in the subquery... But for that one I can't quite understand why is_simple_subquery() stops that being flattened in the first place.

The problem there is that (in general) pushing qual conditions to below a window function will change the window function's results. If we flatten such a subquery then the outer query's quals can get evaluated before the window function, so that's no good. Another issue is that flattening might cause the window function call to get copied to places in the outer query where it can't legally go, such as the WHERE clause.

I should have explained more clearly. I was meaning that a query such as this:

SELECT a.*
FROM a
LEFT OUTER JOIN
  (SELECT id, LAG(id) OVER (ORDER BY id) AS prev_id FROM b) b
  ON a.id = b.id;

assuming that id is the primary key, could have the join removed. I was just commenting on this as it's probably a fairly common thing to have a subquery with windowing functions in order to perform some sort of filtering of window function columns in the outer query. The other use cases, for example:

SELECT a.*
FROM a
LEFT OUTER JOIN (SELECT id FROM b LIMIT 10) b ON a.id = b.id;

are likely less common.

Regards David Rowley
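To make Tom's qual-pushdown point concrete, here is a pair of queries (table b is illustrative) showing why flattening a window-function subquery changes results:

    -- Unflattened: LAG() runs over all rows of b, then the outer qual filters.
    SELECT id, prev_id
    FROM (SELECT id, lag(id) OVER (ORDER BY id) AS prev_id FROM b) s
    WHERE id > 10;

    -- If the subquery were flattened, the qual would run first, so LAG()
    -- would never see rows with id <= 10 and prev_id would differ:
    SELECT id, lag(id) OVER (ORDER BY id) AS prev_id
    FROM b
    WHERE id > 10;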
Re: [HACKERS] pg9.4b1: unhelpful error message when creating a collation
Searching for that error turned up: https://sourceware.org/bugzilla/show_bug.cgi?id=14247 https://bugzilla.redhat.com/show_bug.cgi?id=827510 Indeed. Thanks for the pointer. I have reported the issue on launchpad (ubuntu bug tracking site) with a link to the redhat bug and Tom's test program attached. https://bugs.launchpad.net/ubuntu/+source/glibc/+bug/1323953 We'll see what happens. -- Fabien. -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Proposing pg_hibernate
On Wed, May 28, 2014 at 2:15 AM, Amit Kapila amit.kapil...@gmail.com wrote: On Wed, May 28, 2014 at 7:31 AM, Gurjeet Singh gurj...@singh.im wrote: Caveats -- - Buffer list is saved only when Postgres is shut down in smart and fast modes. That is, the buffer list is not saved when the database crashes, or on immediate shutdown. - A reduction in `shared_buffers` is not detected. If `shared_buffers` is reduced across a restart, and if the combined saved buffer list is larger than the new shared_buffers, Postgres Hibernator continues to read and restore blocks even after `shared_buffers` worth of buffers have been restored.

How about the cases when shared buffers already contain some data: a. Before the Readers start filling shared buffers, if this cluster wishes to join replication as a slave and receive data from the master, then this utility might need to evict some buffers filled during the startup phase.

A cluster that wishes to be a replication standby would do so while it's in the startup phase. The BlockReaders are launched immediately on the cluster reaching a consistent state, at which point, I presume, in most cases, most of the buffers would be unoccupied. Hence the BlockReaders might evict the occupied buffers, which may be a small fraction of total shared_buffers.

b. As soon as the server completes startup (reaches the consistent point), it allows new connections, which can also use some shared buffers before the Reader process could use them. Or are you planning to change the time when users can connect to the database?

The BlockReaders are launched immediately after the cluster reaches a consistent state, that is, just about when it is ready to accept connections. So yes, there is a possibility that the I/O caused by the BlockReaders may affect the performance of queries executed right at cluster startup. But given that the performance of those queries was going to be low anyway (because of empty shared buffers), that BlockReaders tend to cause sequential reads, and that by default there's only one BlockReader active at a time, I think this won't be a concern in most cases. By the time the shared buffers start getting filled up, the buffer replacement strategy will evict any buffers populated by BlockReaders if they are not used by the normal queries.

In the 'Sample Runs' section of my blog [1], I compared the cases 'Hibernator w/ App' and 'Hibernator then App', which demonstrate that launching application load while the BlockReaders are active does cause each to impact the other's performance. But overall it's a net win for application performance.

I am not sure if replacing shared buffer contents in such cases can always be considered useful.

IMHO, all of these caveats would affect a very small fraction of use-cases and are eclipsed by the benefits this extension provides in normal cases.

[1]: http://gurjeet.singh.im/blog/2014/04/30/postgres-hibernator-reduce-planned-database-down-times/

Best regards, -- Gurjeet Singh http://gurjeet.singh.im/ EDB www.EnterpriseDB.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Re-create dependent views on ALTER TABLE ALTER COLUMN ... TYPE?
* Robert Haas (robertmh...@gmail.com) wrote: On Mon, May 26, 2014 at 10:39 AM, Stephen Frost sfr...@snowman.net wrote: It'd need to be explicitly requested, eg a 'CASCADE' option. Why? Would any sane person NOT want this behavior? [...] Now maybe there are options other than trying to reproduce what the original CREATE OR REPLACE statement would have done against the new type. For example, we could look through views that depend on t.a and rewrite each reference to that column to t.a::oldtype. This might lead to odd results with multiple nested casts and generally funny behavior if the column is re-typed multiple times; but maybe there's some way to fix that. This. Also, it might not really be the semantics you want if you were hoping the type update would truly cascade. But it might still be better than a sharp stick in the eye, which is kinda what we offer today. I hadn't even considered the idea that we would go through and try to change everything which referenced that view to now be the new type- but in that case, I'd want to know that there were other changes which were happening beyond the single view which I was updating. Perhaps a NOTICE would be enough, but it doesn't feel correct to me. Also consider MatViews which would need to be rewritten for the new type, or pl/pgsql functions which we couldn't possibly fix entirely (we're going to change the variable's type definition because it selects out a column from this view?) and so they'd just break instead. Thanks, Stephen signature.asc Description: Digital signature
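For reference, the "sharp stick in the eye" offered today is a hard error; the tables below are illustrative:

    CREATE TABLE t (a int);
    CREATE VIEW v AS SELECT a FROM t;
    ALTER TABLE t ALTER COLUMN a TYPE bigint;
    -- ERROR:  cannot alter type of a column used by a view or rule
    -- DETAIL:  rule _RETURN on view v depends on column "a"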
Re: [HACKERS] Re-create dependent views on ALTER TABLE ALTER COLUMN ... TYPE?
Robert Haas robertmh...@gmail.com writes: Well, pg_dump is trying to do something different than what you're trying to do here. pg_dump wants to make sure that the view, when fed back into psql, creates the same view that exists now, regardless of whether that's what the user created originally. For example, if a view is created referring to table foo, and table foo is later renamed to bar, then pg_dump wants to (and does) dump a statement referring to bar, not foo - even if there's a *new* table called foo against which the view could have been defined. Similarly, pg_dump will schema-qualify functions and operators, or not, based on whether that's necessary to reference the exact same operators that were selected when the original CREATE VIEW command was run, regardless of whether the original references were schema-qualified.

Sorry, I don't see how any of the above is a problem in my use case. Should a table have been renamed, naturally we want to re-create the view referring to the *old* table, but under its *new name*. The same goes for schema-qualifying objects.

None of that involves answering hypothetical questions; but what you want to do does, and that I think is the problem in a nutshell.

In a nutshell, I'd like PostgreSQL to just re-parse the *current* view definition. Should that throw an error, user intervention will be required anyway, but most of the time it should just work.

-- Alex -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Re-create dependent views on ALTER TABLE ALTER COLUMN ... TYPE?
Stephen Frost sfr...@snowman.net writes: I hadn't even considered the idea that we would go through and try to change everything which referenced that view to now be the new type- but in that case, I'd want to know that there were other changes which were happening beyond the single view which I was updating. Perhaps a NOTICE would be enough, but it doesn't feel correct to me. Also consider MatViews which would need to be rewritten for the new type That might be costly but not impossible. A user would need to do that anyway, though manually. or pl/pgsql functions which we couldn't possibly fix entirely (we're going to change the variable's type definition because it selects out a column from this view?) and so they'd just break instead. I'm not suggesting that we try to *fix* functions, but if we can detect function breakage by re-parsing them it would be nice to alert the user of any problems at the instant of running ALTER TABLE statement, not when the user tries to actually run the function (see my mail upthread for sample scenario.) -- Alex -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Re-create dependent views on ALTER TABLE ALTER COLUMN ... TYPE?
* ash (a...@commandprompt.com) wrote: Stephen Frost sfr...@snowman.net writes: I hadn't even considered the idea that we would go through and try to change everything which referenced that view to now be the new type- but in that case, I'd want to know that there were other changes which were happening beyond the single view which I was updating. Perhaps a NOTICE would be enough, but it doesn't feel correct to me. Also consider MatViews which would need to be rewritten for the new type That might be costly but not impossible. A user would need to do that anyway, though manually. I was pointing out why it should need to be explicitly requested, but all-in-all, I don't really see this proposal going anywhere. It's a neat idea, and if there's a sensible way to do it, then the user should have to explicitly request it, imv. or pl/pgsql functions which we couldn't possibly fix entirely (we're going to change the variable's type definition because it selects out a column from this view?) and so they'd just break instead. I'm not suggesting that we try to *fix* functions, but if we can detect function breakage by re-parsing them it would be nice to alert the user of any problems at the instant of running ALTER TABLE statement, not when the user tries to actually run the function (see my mail upthread for sample scenario.) We're not going to re-parse every function in the system, like, ever. I might be willing to buy off on too bad in those cases (it's not like we're going and fixing them today for ALTER TABLE .. TYPE cases anyway) but the point is that cascading the change to a column's type through all of its dependencies is likely to cause breakage somewhere in even modestly complex systems and we shouldn't just start doing that automatically. Thanks, Stephen signature.asc Description: Digital signature
Re: [HACKERS] Re-create dependent views on ALTER TABLE ALTER COLUMN ... TYPE?
Stephen Frost sfr...@snowman.net writes: * ash (a...@commandprompt.com) wrote: Stephen Frost sfr...@snowman.net writes: Also consider MatViews which would need to be rewritten for the new type That might be costly but not impossible. A user would need to do that anyway, though manually. I was pointing out why it should need to be explicitly requested, but all-in-all, I don't really see this proposal going anywhere. It's a neat idea, and if there's a sensible way to do it, then the user should have to explicitly request it, imv. Ah, understood. or pl/pgsql functions which we couldn't possibly fix entirely (we're going to change the variable's type definition because it selects out a column from this view?) and so they'd just break instead. I'm not suggesting that we try to *fix* functions, but if we can detect function breakage by re-parsing them it would be nice to alert the user of any problems at the instant of running ALTER TABLE statement, not when the user tries to actually run the function (see my mail upthread for sample scenario.) We're not going to re-parse every function in the system, like, ever. Well, only every *affected* function, which might be pretty minimal in the usual case. I might be willing to buy off on too bad in those cases (it's not like we're going and fixing them today for ALTER TABLE .. TYPE cases anyway) but the point is that cascading the change to a column's type through all of its dependencies is likely to cause breakage somewhere in even modestly complex systems and we shouldn't just start doing that automatically. What I am suggesting is that we try to detect such breakage at the time the user runs ALTER TABLE (issuing NOTICE or ERROR at user discretion.) If changing column type of a table breaks some functions down the way, the user will hit it anyway, but better know it soon than when he wants to *run* that function. -- Alex -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] pg_llog not mentioned in Database File Layout
On Wed, May 28, 2014 at 4:36 AM, Fabrízio de Royes Mello fabriziome...@gmail.com wrote: On Mon, May 26, 2014 at 12:33 PM, Amit Langote amitlangot...@gmail.com wrote: Hi, Just noticed pg_llog is not mentioned in the Database File Layout section. Wonder if it's an oversight? Yes, it is an oversight. Patch attached. ISTM pg_stat directory is also missing in the doc. Regards, -- Fujii Masao -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] pg_llog not mentioned in Database File Layout
Hi, On 2014-05-27 00:33:12 +0900, Amit Langote wrote: Just noticed pg_llog is not mentioned in the Database File Layout section. Wonder if it's an oversight? Yes, it should be mentioned there. I'll fix it. Greetings, Andres Freund -- Andres Freund http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Re-create dependent views on ALTER TABLE ALTER COLUMN ... TYPE?
* ash (a...@commandprompt.com) wrote: What I am suggesting is that we try to detect such breakage at the time the user runs ALTER TABLE (issuing NOTICE or ERROR at user discretion.) If changing column type of a table breaks some functions down the way, the user will hit it anyway, but better know it soon than when he wants to *run* that function. Sure, if we had the information about what would break then we could tell the user about it. We don't and that's not likely to ever change... Thanks, Stephen signature.asc Description: Digital signature
Re: [HACKERS] pg_llog not mentioned in Database File Layout
On Wed, May 28, 2014 at 10:46 AM, Fujii Masao masao.fu...@gmail.com wrote: On Wed, May 28, 2014 at 4:36 AM, Fabrízio de Royes Mello fabriziome...@gmail.com wrote: On Mon, May 26, 2014 at 12:33 PM, Amit Langote amitlangot...@gmail.com wrote: Hi, Just noticed pg_llog is not mentioned in the Database File Layout section. Wonder if it's an oversight? Yes, it is an oversight. Patch attached. ISTM pg_stat directory is also missing in the doc.

Added pg_stat too. Attached patch!

-- Fabrízio de Royes Mello Consultoria/Coaching PostgreSQL
Timbira: http://www.timbira.com.br
Blog sobre TI: http://fabriziomello.blogspot.com
Perfil Linkedin: http://br.linkedin.com/in/fabriziomello
Twitter: http://twitter.com/fabriziomello

diff --git a/doc/src/sgml/storage.sgml b/doc/src/sgml/storage.sgml
index 57e7f09..41f81e9 100644
--- a/doc/src/sgml/storage.sgml
+++ b/doc/src/sgml/storage.sgml
@@ -78,6 +78,11 @@ Item
 </row>

 <row>
+ <entry><filename>pg_llog/</filename></entry>
+ <entry>Subdirectory containing logical replication status data</entry>
+</row>
+
+<row>
  <entry><filename>pg_multixact/</filename></entry>
  <entry>Subdirectory containing multitransaction status data
   (used for shared row locks)</entry>
@@ -104,6 +109,11 @@ Item
 </row>

 <row>
+ <entry><filename>pg_stat/</filename></entry>
+ <entry>Subdirectory containing files for the statistics subsystem</entry>
+</row>
+
+<row>
  <entry><filename>pg_stat_tmp/</filename></entry>
  <entry>Subdirectory containing temporary files for the statistics subsystem</entry>

-- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
[HACKERS] pg_stat directory and pg_stat_statements
Hi, Thanks to 187492b6c2e8cafc5b39063ca3b67846e8155d24, pgstat files are now saved to $PGDATA/pg_stat directory at shutdown. But pg_stat_statements file is saved under $PGDATA/global yet. Is this intentional or just oversight? Saving that file to global is harmless, though. Regards, -- Fujii Masao -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Re-create dependent views on ALTER TABLE ALTER COLUMN ... TYPE?
ash a...@commandprompt.com writes: Stephen Frost sfr...@snowman.net writes: We're not going to re-parse every function in the system, like, ever. Well, only every *affected* function, which might be pretty minimal in the usual case. We don't store dependency information for function bodies, so there's no way to do this except by reparsing everything in sight. A larger issue with the idea is that a function might fail reparsing for reasons having nothing to do with the proposed ALTER TABLE. For instance, it's not at all unusual for functions to contain references to tables that don't normally exist, but are created when the function is to be called (or maybe even by the function itself). Because of this problem, reparsing, in the sense of detecting semantic rather than purely syntactic problems in a function body, is something that we don't actually do *at all*, ever, except when the function is actually executed. (This is part of the reason why there's no dependency info.) Pavel Stehule has made some efforts towards improving that situation for plpgsql functions: https://commitfest.postgresql.org/action/patch_view?id=884 but that patch remains pretty controversial and may never get committed. Even if it does get in, it wouldn't move the goalposts for any other PL. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
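To see why semantic re-validation would misfire, consider a function that parses fine today but references a table created only at call time; re-checking it during an unrelated ALTER TABLE would raise a spurious error. An illustrative example:

    CREATE FUNCTION use_scratch() RETURNS bigint LANGUAGE plpgsql AS $$
    BEGIN
      -- The table does not exist until the function runs, so any
      -- create-time semantic check of the SELECT below would fail.
      CREATE TEMP TABLE scratch (x int) ON COMMIT DROP;
      INSERT INTO scratch VALUES (1), (2);
      RETURN (SELECT sum(x) FROM scratch);
    END $$;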
Re: [HACKERS] IMPORT FOREIGN SCHEMA statement
On Tue, May 27, 2014 at 09:41:06AM -0400, Stephen Frost wrote: * David Fetter (da...@fetter.org) wrote: - We make type mappings settable at the level of: - FDW - Instance (a.k.a. cluster) - Database - Schema - Table - Column While I like the general idea, you seem to be making this appear much more complex than it actually is. For example, I see no value in a table-level uint4 - int8 definition. You chose a particularly silly example, so I have to assume it's a straw man. My point was that since we don't know what things are relevant to preserve and transform here in the design stage, we need to leave them settable by people who have needs we don't know about, i.e. the users of the feature. If you want something which is not the default, just set it on the individual columns of the foreign table, exactly how we handle column name differences. There might be some value in being able to redefine what the default is at the FOREIGN SERVER level, as perhaps MySQL DB X and MySQL DB Y could have different default mappings for some reason, but adding in the rest of those levels doesn't add value imv. This would consist of: - The remote type - The local type to which it maps - The inbound transformation (default identity) - The outbound transformation (default identity) The remote type and the local type are known already to the FDW, no? The above aren't separable items. They all have to be there, even if for user convenience we have two default transformations which are the identity. The FDW will need to know how to do whatever conversions we're talking about also, right? No. The writer of the FDW is not the same person as the user of the FDW, and the former is not omniscient. My idea here is to allow FDW users to supply transformation code at these spots. (If you want to do it in core PG instead of the FDW then I'm thinking you should probably use a view over top of the foreign table..). We can't know in advance what to preserve and what to jettison. We can't even know which server is responsible for doing the transformation. What's nice is that this all falls under what an FDW can do *already*, if it wants (and, actually, it could do it on the table-level technically too, if the FDW supports that, but schema? database? these things don't make sense...). Not to you yet, but I've seen deployed applications with exactly these requirements. For the IMPORT bit, it should just be doing whatever the defaults are unless you've set some different defaults for a given foreign server (also something which could be set using our existing system...). ALTER FOREIGN TABLE foo ADD (mapping '{ datetime: text, inbound: IDENTITY, outbound: IDENTITY }') I'm very much against having two different command languages where one is used when convenient. I did not suggest that we use two different ways to specify this. I did sketch out one path--adding a bunch of SQL grammar--which I made it explicitly clear we shouldn't tread. Apparently, I wasn't quite explicit enough. If we wanted to do this, we should build a full-spec mapping from whatever JSON language you come up with for our utility commands to what we actually implement in the grammar. Excellent plan. Cheers, David. -- David Fetter da...@fetter.org http://fetter.org/ Phone: +1 415 235 3778 AIM: dfetter666 Yahoo!: dfetter Skype: davidfetter XMPP: david.fet...@gmail.com iCal: webcal://www.tripit.com/feed/ical/people/david74/tripit.ics Remember to vote! 
Consider donating to Postgres: http://www.postgresql.org/about/donate -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Compression of full-page-writes
On Tue, May 27, 2014 at 12:57 PM, Rahila Syed rahilasyed...@gmail.com wrote: Hello All, 0001-CompressBackupBlock_snappy_lz4_pglz extends patch on compression of full page writes to include LZ4 and Snappy . Changes include making compress_backup_block GUC from boolean to enum. Value of the GUC can be OFF, pglz, snappy or lz4 which can be used to turn off compression or set the desired compression algorithm. 0002-Support_snappy_lz4 adds support for LZ4 and Snappy in PostgreSQL. It uses Andres’s patch for getting Makefiles working and has a few wrappers to make the function calls to LZ4 and Snappy compression functions and handle varlena datatypes. Patch Courtesy: Pavan Deolasee Thanks for extending and revising the FPW-compress patch! Could you add your patch into next CF? Also, compress_backup_block GUC needs to be merged with full_page_writes. Basically I agree with you because I don't want to add new GUC very similar to the existing one. But could you imagine the case where full_page_writes = off. Even in this case, FPW is forcibly written only during base backup. Such FPW also should be compressed? Which compression algorithm should be used? If we want to choose the algorithm for such FPW, we would not be able to merge those two GUCs. IMO it's OK to always use the best compression algorithm for such FPW and merge them, though. Tests use JDBC runner TPC-C benchmark to measure the amount of WAL compression ,tps and response time in each of the scenarios viz . Compression = OFF , pglz, LZ4 , snappy ,FPW=off Isn't it worth measuring the recovery performance for each compression algorithm? Regards, -- Fujii Masao -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
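For concreteness, the configuration the patch describes would look roughly like this; this is a sketch of the proposed GUC, not an option that exists in core:

    # Proposed by the patch: choose the FPW compression algorithm,
    # or disable compression entirely.
    compress_backup_block = lz4     # one of: off, pglz, snappy, lz4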
Re: [HACKERS] Re: popen and pclose redefinitions causing many warning in Windows build
On Mon, May 26, 2014 at 09:50:42PM +0900, Michael Paquier wrote:

x86_64-w64-mingw32-gcc -O2 -Wall -Wmissing-prototypes -Wpointer-arith -Wdeclaration-after-statement -Wendif-labels -Wmissing-format-attribute -Wformat-security -fno-strict-aliasing -fwrapv -fexcess-precision=standard -g -I/home/pgrunner/bf/root/HEAD/pgsql.5100/../pgsql/src/interfaces/libpq -I../../../src/include -I/home/pgrunner/bf/root/HEAD/pgsql.5100/../pgsql/src/include -I../pgsql/src/include/port/win32 -DEXEC_BACKEND -I/c/prog/3p64/include/libxml2 -I/c/prog/3p64/include -I/c/prog/3p64/openssl/include -I/home/pgrunner/bf/root/HEAD/pgsql.5100/../pgsql/src/include/port/win32 -c -o parallel.o /home/pgrunner/bf/root/HEAD/pgsql.5100/../pgsql/src/bin/pg_dump/parallel.c

c:/mingw/msys/1.0/home/pgrunner/bf/root/HEAD/pgsql.5100/../pgsql/src/bin/pg_dump/parallel.c: In function 'pgpipe':
c:/mingw/msys/1.0/home/pgrunner/bf/root/HEAD/pgsql.5100/../pgsql/src/bin/pg_dump/parallel.c:1332:2: warning: overflow in implicit constant conversion [-Woverflow]
  handles[0] = handles[1] = INVALID_SOCKET;
  ^
c:/mingw/msys/1.0/home/pgrunner/bf/root/HEAD/pgsql.5100/../pgsql/src/bin/pg_dump/parallel.c:1386:3: warning: overflow in implicit constant conversion [-Woverflow]
  handles[1] = INVALID_SOCKET;
  ^

In mingw-w64, INVALID_SOCKET is defined as ~0: http://sourceforge.net/p/mingw-w64/mingw-w64/ci/master/tree/mingw-w64-headers/include/psdk_inc/_socket_types.h Is this overflow caused because INVALID_SOCKET corresponds to a 64-bit value in mingw-w64? Looking at the code, this has existed since 9.3.

I think this is caused because the variable is not defined as SOCKET. The attached patch fixes this. This should prevent the warning.

-- Bruce Momjian br...@momjian.us http://momjian.us EnterpriseDB http://enterprisedb.com + Everyone has their own god. +

diff --git a/src/bin/pg_dump/parallel.c b/src/bin/pg_dump/parallel.c
new file mode 100644
index caedbb8..cd6ef7a
*** a/src/bin/pg_dump/parallel.c
--- b/src/bin/pg_dump/parallel.c
***************
*** 36,42 ****
  #ifdef WIN32
  static unsigned int tMasterThreadId = 0;
  static HANDLE termEvent = INVALID_HANDLE_VALUE;
! static int pgpipe(int handles[2]);
  static int piperead(int s, char *buf, int len);

  /*
--- 36,42 ----
  #ifdef WIN32
  static unsigned int tMasterThreadId = 0;
  static HANDLE termEvent = INVALID_HANDLE_VALUE;
! static int pgpipe(SOCKET handles[2]);
  static int piperead(int s, char *buf, int len);

  /*
*************** readMessageFromPipe(int fd)
*** 1323,1329 ****
   * with recv/send.
   */
  static int
! pgpipe(int handles[2])
  {
  	SOCKET		s;
  	struct sockaddr_in serv_addr;
--- 1323,1329 ----
   * with recv/send.
   */
  static int
! pgpipe(SOCKET handles[2])
  {
  	SOCKET		s;
  	struct sockaddr_in serv_addr;

-- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Compression of full-page-writes
On 28 May 2014 15:34, Fujii Masao masao.fu...@gmail.com wrote: Also, compress_backup_block GUC needs to be merged with full_page_writes. Basically I agree with you because I don't want to add new GUC very similar to the existing one. But could you imagine the case where full_page_writes = off. Even in this case, FPW is forcibly written only during base backup. Such FPW also should be compressed? Which compression algorithm should be used? If we want to choose the algorithm for such FPW, we would not be able to merge those two GUCs. IMO it's OK to always use the best compression algorithm for such FPW and merge them, though.

I'd prefer a new name altogether:

torn_page_protection = 'full_page_writes'
torn_page_protection = 'compressed_full_page_writes'
torn_page_protection = 'none'

This allows us to add new techniques later, like

torn_page_protection = 'background_FPWs'

or

torn_page_protection = 'double_buffering'

when/if we add those new techniques.

-- Simon Riggs http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
[HACKERS] pg_sleep() doesn't work well with recovery conflict interrupts.
Hi,

Since a64ca63e59c11d8fe6db24eee3d82b61db7c2c83 pg_sleep() uses WaitLatch() to wait. That's fine in itself. But procsignal_sigusr1_handler, which is used e.g. when resolving recovery conflicts, doesn't unconditionally do a SetLatch(). That means that we'll currently not be able to cancel conflicting backends during recovery for 10min. Now, I don't think that'll happen too often in practice, but it's still annoying.

As an alternative to doing the PG_TRY/save set_latch_on_sigusr1/set set_latch_on_sigusr1/PG_CATCH/reset set_latch_on_sigusr1/ dance in pg_sleep() we could also have RecoveryConflictInterrupt() do an unconditional SetLatch()?

Greetings, Andres Freund -- Andres Freund http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
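A minimal sketch of that second alternative, assuming the SIGUSR1-driven conflict handler simply always wakes the process latch (illustrative only; the real handler does considerably more per-reason bookkeeping):

    static void
    RecoveryConflictInterrupt(ProcSignalReason reason)
    {
        /* ... existing per-reason conflict bookkeeping ... */

        /* Unconditional wake-up, so WaitLatch()-based waits such as
         * pg_sleep's return promptly and re-check for interrupts.
         * SetLatch() is safe to call from a signal handler. */
        SetLatch(&MyProc->procLatch);
    }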
Re: [HACKERS] Re: popen and pclose redefinitions causing many warning in Windows build
On Wed, May 28, 2014 at 7:38 AM, Bruce Momjian br...@momjian.us wrote: On Mon, May 26, 2014 at 09:50:42PM +0900, Michael Paquier wrote: x86_64-w64-mingw32-gcc -O2 -Wall -Wmissing-prototypes -Wpointer-arith -Wdeclaration-after-statement -Wendif-labels -Wmissing-format-attribute -Wformat-security -fno-strict-aliasing -fwrapv -fexcess-precision=standard -g -I/home/pgrunner/bf/root/HEAD/pgsql.5100/../pgsql/src/interfaces/libpq -I../../../src/include -I/home/pgrunner/bf/root/HEAD/pgsql.5100/../pgsql/src/include -I../pgsql/src/include/port/win32 -DEXEC_BACKEND -I/c/prog/3p64/include/libxml2 -I/c/prog/3p64/include -I/c/prog/3p64/openssl/include -I/home/pgrunner/bf/root/HEAD/pgsql.5100/../pgsql/src/include/port/win32 -c -o parallel.o /home/pgrunner/bf/root/HEAD/pgsql.5100/../pgsql/src/bin/pg_dump/parallel.c c:/mingw/msys/1.0/home/pgrunner/bf/root/HEAD/pgsql.5100/../pgsql/src/bin/pg_dump/parallel.c: In function 'pgpipe': c:/mingw/msys/1.0/home/pgrunner/bf/root/HEAD/pgsql.5100/../pgsql/src/bin/pg_dump/parallel.c:1332:2: warning: overflow in implicit constant conversion [-Woverflow] handles[0] = handles[1] = INVALID_SOCKET; ^ c:/mingw/msys/1.0/home/pgrunner/bf/root/HEAD/pgsql.5100/../pgsql/src/bin/pg_dump/parallel.c:1386:3: warning: overflow in implicit constant conversion [-Woverflow] handles[1] = INVALID_SOCKET; ^ In mingw-w64, SOCKET_INVALID is defined as ~0: http://sourceforge.net/p/mingw-w64/mingw-w64/ci/master/tree/mingw-w64-headers/include/psdk_inc/_socket_types.h Is this overflow caused because SOCKET_INVALID corresponds to a 64b value in mingw-w64? Looking at the code this exists since 9.3. I think this is caused because the variable is not defined as SOCKET. The attached patch fixes this. This should prevent the warning. Wouldn't it be better to use pgsocket rather than SOCKET? Cheers, Jeff
Re: [HACKERS] pg_stat directory and pg_stat_statements
On Wed, May 28, 2014 at 7:01 AM, Fujii Masao masao.fu...@gmail.com wrote: But pg_stat_statements file is saved under $PGDATA/global yet. Is this intentional or just oversight? I think it's an oversight. -- Peter Geoghegan -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Re-create dependent views on ALTER TABLE ALTER COLUMN ... TYPE?
Tom Lane t...@sss.pgh.pa.us writes: We don't store dependency information for function bodies, so there's no way to do this except by reparsing everything in sight. A larger issue with the idea is that a function might fail reparsing for reasons having nothing to do with the proposed ALTER TABLE. For instance, it's not at all unusual for functions to contain references to tables that don't normally exist, but are created when the function is to be called (or maybe even by the function itself). Because of this problem, reparsing, in the sense of detecting semantic rather than purely syntactic problems in a function body, is something that we don't actually do *at all*, ever, except when the function is actually executed. (This is part of the reason why there's no dependency info.) Pavel Stehule has made some efforts towards improving that situation for plpgsql functions: https://commitfest.postgresql.org/action/patch_view?id=884 but that patch remains pretty controversial and may never get committed. Even if it does get in, it wouldn't move the goalposts for any other PL. OK, forget functions, I now realize it's not feasible to consider. Can we get back to re-defining views at least? -- Alex -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Re-create dependent views on ALTER TABLE ALTER COLUMN ... TYPE?
ash wrote: Tom Lane t...@sss.pgh.pa.us writes: We don't store dependency information for function bodies, so there's no way to do this except by reparsing everything in sight. OK, forget functions, I now realize it's not feasible to consider. Can we get back to re-defining views at least?

Hi Alex,

I think it's reasonable to try and fix the problems for views (and other objects -- there are other things that can depend on table definitions; composite types come to mind) and ignore function bodies, since you can already get into trouble by using ALTER TABLE today and it's known to be an unsolvable problem.

Now -- do we need to do anything about tables used as return types or argument types for functions?

alvherre=# create table qux (a int, b text);
CREATE TABLE
alvherre=# create or replace function test_qux(a qux) returns void language plpgsql as $$ begin raise notice 'the qux we got is %', $1; end; $$;
CREATE FUNCTION
alvherre=# insert into qux values (1, 'one');
INSERT 0 1
alvherre=# select * from qux, test_qux(qux.*);
NOTICE:  the qux we got is (1,one)
 a |  b  | test_qux
---+-----+----------
 1 | one |
(1 fila)

alvherre=# alter table qux add column c timestamptz;
ALTER TABLE
alvherre=# update qux set c = now();
UPDATE 1
alvherre=# select * from qux, test_qux(qux.*);
NOTICE:  the qux we got is (1,one,)
 a |  b  |               c               | test_qux
---+-----+-------------------------------+----------
 1 | one | 2014-05-28 12:08:28.210895-04 |
(1 fila)

Notice how the NOTICE has a final comma, meaning the tuple descriptor is aware that there is a third column -- but the value in the table is not null per the UPDATE, so the fact that there's nothing after the comma means this is not being handled correctly. If I close the session and start a fresh one, the result is saner:

alvherre=# select * from qux, test_qux(qux.*);
NOTICE:  the qux we got is (1,one,2014-05-28 12:08:28.210895-04)
 a |  b  |               c               | test_qux
---+-----+-------------------------------+----------
 1 | one | 2014-05-28 12:08:28.210895-04 |
(1 fila)

Maybe we're missing a function cache invalidation or something like that.

-- Álvaro Herrera http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Re: popen and pclose redefinitions causing many warning in Windows build
Bruce Momjian br...@momjian.us writes: I think this is caused because the variable is not defined as SOCKET. The attached patch fixes this. This should prevent the warning. Surely that's just going to move the errors somewhere else. The call site still expects the argument to be int[]. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
[HACKERS] libpq: PQexec may block indefinitly
Greetings. I have an application which uses libpq for interaction with a remote PostgreSQL 9.2 server. Client and server nodes are running Linux and the connection is established using TCPv4. The client application has some small fault-tolerance features, which are activated when server-related problems are encountered. One day some bad things happened with network layer hardware and, long story short, the host with the PostgreSQL server got isolated: TCP messages routed to the server node were NOT delivered or acknowledged in any way. None of the fault-tolerance features were triggered.

The critical network problem was resolved and I started to investigate why the clients had become fully inoperable. I have successfully reproduced the problem in a laboratory environment. These iptables commands should be run on the server node after some period of client-server interaction:

# iptables -A OUTPUT -p tcp --sport 5432 -j DROP
# iptables -A INPUT -p tcp --dport 5432 -j DROP

After this my client blocks in libpq code, according to the debugger. I took a glimpse over the master branch of the libpq sources and some questions arose. Namely:

1. The connection to the PostgreSQL server is made without an option to specify SO_RCVTIMEO and SO_SNDTIMEO. Why is that? Is setting socket timeouts considered harmful?

2. PQexec ultimately leads to pqWait, which after some function calls lands in pqSocketCheck and pqSocketPoll. These two functions have a parameter end_time. It is set to (-1) for the PQexec scenario, which leads to an infinite poll timeout in pqSocketPoll. Is it possible to implement a configurable timeout for PQexec calls? Are there existing features which should be used to handle situations like this?

Currently, I have changed the Linux kernel TCPv4 stack counters responsible for retransmission, so the OS actually closes the socket after some period. This is detected by pqSocketPoll's poll and libpq handles the situation correctly: an error is reported to my application. But it's just a workaround.

So, this infinite poll situation looks like an imperfection to me and I think it should be fixed. Waiting for your comments: is it a bug or a feature?

With regards, Dmitry Samonenko
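One way an application can bound these waits today, without any libpq changes, is to use the asynchronous API and poll the socket with its own timeout. A minimal sketch, error handling mostly elided; all of the libpq calls shown (PQsendQuery, PQsocket, PQconsumeInput, PQisBusy, PQgetResult) are standard:

    #include <poll.h>
    #include <libpq-fe.h>

    /* Returns NULL on send failure, timeout, or connection trouble;
     * the caller should then check PQstatus() and reset or close. */
    PGresult *
    exec_with_timeout(PGconn *conn, const char *sql, int timeout_ms)
    {
        if (!PQsendQuery(conn, sql))
            return NULL;

        while (PQisBusy(conn))
        {
            struct pollfd pfd = { PQsocket(conn), POLLIN, 0 };

            if (poll(&pfd, 1, timeout_ms) <= 0)
                return NULL;        /* timed out or poll error */
            if (!PQconsumeInput(conn))
                return NULL;        /* connection trouble */
        }
        return PQgetResult(conn);   /* caller must drain remaining results */
    }

This does not cover the blocking paths inside PQexec itself, which is the poster's point, but it avoids the indefinite block for query execution.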
Re: [HACKERS] pg_stat directory and pg_stat_statements
On Thu, May 29, 2014 at 12:37 AM, Peter Geoghegan p...@heroku.com wrote: On Wed, May 28, 2014 at 7:01 AM, Fujii Masao masao.fu...@gmail.com wrote: But pg_stat_statements file is saved under $PGDATA/global yet. Is this intentional or just oversight? I think it's an oversight.

OK, patch attached.

I'm afraid that it's not okay to change the file layout in $PGDATA at this beta1 stage, because that change basically seems to need initdb; otherwise something like a "no such file or directory" error can happen. But in this case what we need to change is only the location of the pg_stat_statements permanent stats file. So, without initdb, the server will not be able to find the pg_stat_statements stats file, but this is not so harmful. The only problem is that the pg_stat_statements stats which were collected in the past would disappear. OTOH, the server can keep running successfully from then on and no critical data will disappear. Therefore I think we can commit this patch even at beta1. Thoughts?

Regards, -- Fujii Masao

From 8f3281566399bbb088b826b8fe9c3baa5027e5da Mon Sep 17 00:00:00 2001
From: MasaoFujii masao.fu...@gmail.com
Date: Thu, 29 May 2014 02:14:34 +0900
Subject: [PATCH] Save pg_stat_statements statistics file into
 $PGDATA/pg_stat directory at shutdown.

187492b6c2e8cafc5b39063ca3b67846e8155d24 changed pgstat.c so that the stats files were saved into $PGDATA/pg_stat directory at shutdown. But it accidentally forgot to change the location of the pg_stat_statements permanent stats file. This commit fixes pg_stat_statements so that its stats file is also saved into $PGDATA/pg_stat at shutdown.
---
 contrib/pg_stat_statements/pg_stat_statements.c | 2 +-
 src/backend/postmaster/pgstat.c                 | 8 --------
 src/include/pgstat.h                            | 8 ++++++++
 3 files changed, 9 insertions(+), 9 deletions(-)

diff --git a/contrib/pg_stat_statements/pg_stat_statements.c b/contrib/pg_stat_statements/pg_stat_statements.c
index 32d16cc..a3e8c59 100644
--- a/contrib/pg_stat_statements/pg_stat_statements.c
+++ b/contrib/pg_stat_statements/pg_stat_statements.c
@@ -80,7 +80,7 @@ PG_MODULE_MAGIC;

 /* Location of permanent stats file (valid when database is shut down) */
-#define PGSS_DUMP_FILE	"global/pg_stat_statements.stat"
+#define PGSS_DUMP_FILE	PGSTAT_STAT_PERMANENT_DIRECTORY "/pg_stat_statements.stat"

 /*
  * Location of external query text file. We don't keep it in the core
diff --git a/src/backend/postmaster/pgstat.c b/src/backend/postmaster/pgstat.c
index f864816..3ab1428 100644
--- a/src/backend/postmaster/pgstat.c
+++ b/src/backend/postmaster/pgstat.c
@@ -68,14 +68,6 @@

 /* ----------
- * Paths for the statistics files (relative to installation's $PGDATA).
- * ----------
- */
-#define PGSTAT_STAT_PERMANENT_DIRECTORY	"pg_stat"
-#define PGSTAT_STAT_PERMANENT_FILENAME	"pg_stat/global.stat"
-#define PGSTAT_STAT_PERMANENT_TMPFILE	"pg_stat/global.tmp"
-
-/* ----------
  * Timer definitions.
  * ----------
  */
diff --git a/src/include/pgstat.h b/src/include/pgstat.h
index d9de09f..0892533 100644
--- a/src/include/pgstat.h
+++ b/src/include/pgstat.h
@@ -20,6 +20,14 @@
 #include "utils/relcache.h"

+/* ----------
+ * Paths for the statistics files (relative to installation's $PGDATA).
+ * ----------
+ */
+#define PGSTAT_STAT_PERMANENT_DIRECTORY	"pg_stat"
+#define PGSTAT_STAT_PERMANENT_FILENAME	"pg_stat/global.stat"
+#define PGSTAT_STAT_PERMANENT_TMPFILE	"pg_stat/global.tmp"
+
 /* Default directory to store temporary statistics data in */
 #define PG_STAT_TMP_DIR		"pg_stat_tmp"
--
1.7.12.4 (Apple Git-37)

-- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] IMPORT FOREIGN SCHEMA statement
* David Fetter (da...@fetter.org) wrote: On Tue, May 27, 2014 at 09:41:06AM -0400, Stephen Frost wrote: * David Fetter (da...@fetter.org) wrote: - We make type mappings settable at the level of: - FDW - Instance (a.k.a. cluster) - Database - Schema - Table - Column While I like the general idea, you seem to be making this appear much more complex than it actually is. For example, I see no value in a table-level uint4 - int8 definition. You chose a particularly silly example, so I have to assume it's a straw man. ... I used your example from 20140525194118.gb32...@fetter.org and your suggestion that we support that on a per-table basis. If that combination is silly then let's not support it. My point was that since we don't know what things are relevant to preserve and transform here in the design stage, we need to leave them settable by people who have needs we don't know about, i.e. the users of the feature. This is not relevant as far as I can tell. Users can already make changes to the definitions, or create them however they want the first time by using CREATE FOREIGN TABLE. The point of IMPORT is that it should be for bulk operations- if you have a lot of very specific transformations, then use CREATE instead. I don't see value in rebuilding the flexibility of what a script of CREATEs gives you into a single IMPORT command. The FDW will need to know how to do whatever conversions we're talking about also, right? No. The writer of the FDW is not the same person as the user of the FDW, and the former is not omniscient. My idea here is to allow FDW users to supply transformation code at these spots. Then we're not talking about something which should be IMPORT's job, or at least it goes far beyond it and that's what we're discussing here. This sounds a bit like you're looking to rebuild what ETLs provide in a streaming fashion- and I'm all for that as an interesting project that isn't about adding IMPORT. I still think you could use views for this, or ETL, if you really want to implement it using core instead of the FDW. (If you want to do it in core PG instead of the FDW then I'm thinking you should probably use a view over top of the foreign table..). We can't know in advance what to preserve and what to jettison. We can't even know which server is responsible for doing the transformation. Either side could handle the transformation using views, or there could be transformations on both sides. What's nice is that this all falls under what an FDW can do *already*, if it wants (and, actually, it could do it on the table-level technically too, if the FDW supports that, but schema? database? these things don't make sense...). Not to you yet, but I've seen deployed applications with exactly these requirements. I don't view IMPORT as being part of a deployed application. It's a way of simplifying the process of setting up FDWs, not an ETL mechanism. For the IMPORT bit, it should just be doing whatever the defaults are unless you've set some different defaults for a given foreign server (also something which could be set using our existing system...). ALTER FOREIGN TABLE foo ADD (mapping '{ datetime: text, inbound: IDENTITY, outbound: IDENTITY }') I'm very much against having two different command languages where one is used when convenient. I did not suggest that we use two different ways to specify this. I did sketch out one path--adding a bunch of SQL grammar--which I made it explicitly clear we shouldn't tread. Apparently, I wasn't quite explicit enough. 
What you missed here is that I'd find it objectionable to have some portion of the system use a JSON-based grammar while the rest is an SQL grammar. I'm not entirely thrilled with the jsquery approach for that reason, but at least that's a very contained case which is about a logic expression (and we have logic expressions already in SQL). That's not what you're describing here.

If we wanted to do this, we should build a full-spec mapping from whatever JSON language you come up with for our utility commands to what we actually implement in the grammar. Excellent plan.

Go for it. From here, I don't see anything stopping you from making a wrapper around PG which converts your JSON syntax into SQL (indeed, people have already done this...). I wouldn't get your hopes up about it ever being part of core PG though, and it certainly won't be until it's as complete and capable as the existing SQL grammar.

Thanks, Stephen signature.asc Description: Digital signature
Re: [HACKERS] Re-create dependent views on ALTER TABLE ALTER COLUMN ... TYPE?
* ash (a...@commandprompt.com) wrote: OK, forget functions, I now realize it's not feasible to consider. I never meant to imply that it was, but rather to point out that we might have users who actually want to get an error when they're changing a type definition which goes beyond the scope of the explicit action (and therefore could very well have more side-effects than they realize), rather than just doing it for them. Can we get back to re-defining views at least? I'm still not convinced you'll be able to do it in a sensible and reliable way, but you're certainly welcome to continue exploring. Thanks, Stephen
Re: [HACKERS] Bison 3.0 updates
On 5/21/14, 12:29 PM, Tom Lane wrote: Vik Fearing vik.fear...@dalibo.com writes: I'm getting some more of these, including some I thought you had fixed. Bison 3.0.2 on current head. I didn't do anything to suppress those warnings: gram.y:172.1-13: warning: deprecated directive, use ‘%name-prefix’ [-Wdeprecated] %name-prefix="base_yy" ^ because it's hard to see how that's anything but a bison bug. What they want is that you use %name-prefix "base_yy" instead of %name-prefix="base_yy". That makes the warning go away. The %something=something syntax is not documented anywhere, so it looks like it worked more or less by accident. We don't use it anywhere else either (e.g. %expect 0, not %expect=0). -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Bison 3.0 updates
Peter Eisentraut pete...@gmx.net writes: What they want is that you use %name-prefix "base_yy" instead of %name-prefix="base_yy". That makes the warning go away. Oh really!? The %something=something syntax is not documented anywhere, so it looks like it worked more or less by accident. We don't use it anywhere else either (e.g. %expect 0, not %expect=0). Hah. That's probably my fault, somewhere way back when. I'll fix it, unless you're already on it. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Bison 3.0 updates
On 5/28/14, 2:43 PM, Tom Lane wrote: Peter Eisentraut pete...@gmx.net writes: What they want is that you use %name-prefix "base_yy" instead of %name-prefix="base_yy". That makes the warning go away. Oh really!? The %something=something syntax is not documented anywhere, so it looks like it worked more or less by accident. We don't use it anywhere else either (e.g. %expect 0, not %expect=0). Hah. That's probably my fault, somewhere way back when. I'll fix it, unless you're already on it. Go ahead. -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] jsonb failed assertions
On 05/28/2014 04:13 AM, Peter Geoghegan wrote: On Wed, May 21, 2014 at 4:53 AM, Heikki Linnakangas hlinnakan...@vmware.com wrote: Hmm. The patch looks correct as far as it goes. But that function is still a bit funny. When it compares two identical arrays (or objects), and reaches the WJB_END_ARRAY token, it will still fall into the code that checks what the va and vb types are, and compares the last scalar values in that array again. That's wrong, and will fail if the compiler decides to clobber the local va or vb variables between iterations of the do-while loop, because JsonbIteratorNext() does not set the value when returning WJB_END_ARRAY. That's an obsolete assumption that once actually applied during development. Attached revision also adds handling for that case. Thanks, applied. - Heikki -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
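To make the failure mode concrete, here is a small self-contained sketch (all names invented; an analogy to the jsonb comparison loop, not the actual PostgreSQL code) of an iterator that, like JsonbIteratorNext(), leaves the output value untouched for end-of-container tokens. The comparison loop must therefore look at the values only when a value-carrying token was returned:

#include <assert.h>
#include <stdio.h>

typedef enum { TOK_VALUE, TOK_END_ARRAY, TOK_DONE } Token;

typedef struct
{
    const Token *toks;
    const int  *vals;
    int         pos;
} Iter;

/* Like JsonbIteratorNext(): *val is only stored for TOK_VALUE */
static Token
next_token(Iter *it, int *val)
{
    Token   t = it->toks[it->pos];

    if (t == TOK_VALUE)
        *val = it->vals[it->pos];
    it->pos++;
    return t;
}

static int
compare_streams(Iter *a, Iter *b)
{
    Token   ta, tb;
    int     va = 0, vb = 0;

    do
    {
        ta = next_token(a, &va);
        tb = next_token(b, &vb);
        if (ta != tb)
            return (ta < tb) ? -1 : 1;

        /*
         * Guard on the token type: for TOK_END_ARRAY, va and vb still
         * hold whatever the previous iteration left there, so comparing
         * them would re-compare the last scalar (or read clobbered
         * values) -- exactly the bug described above.
         */
        if (ta == TOK_VALUE && va != vb)
            return (va < vb) ? -1 : 1;
    } while (ta != TOK_DONE);

    return 0;                   /* streams identical */
}

int
main(void)
{
    Token   toks[] = {TOK_VALUE, TOK_VALUE, TOK_END_ARRAY, TOK_DONE};
    int     vals[] = {1, 2, 0, 0};
    Iter    a = {toks, vals, 0};
    Iter    b = {toks, vals, 0};

    assert(compare_streams(&a, &b) == 0);
    printf("equal\n");
    return 0;
}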
Re: [HACKERS] pg_stat directory and pg_stat_statements
On 28.5.2014 19:52, Fujii Masao wrote: On Thu, May 29, 2014 at 12:37 AM, Peter Geoghegan p...@heroku.com wrote: On Wed, May 28, 2014 at 7:01 AM, Fujii Masao masao.fu...@gmail.com wrote: But the pg_stat_statements file is still saved under $PGDATA/global. Is this intentional or just an oversight? I think it's an oversight. OK, patch attached. I'm afraid that it's not okay to change the file layout in $PGDATA at this beta1 stage, because that change basically seems to need an initdb. Otherwise something like a "no such file or directory" error can happen. But in this case what we need to change is only the location of the pg_stat_statements permanent stats file. So, without an initdb, the server will not be able to find the pg_stat_statements stats file, but this is not so harmful. The only problem is that the pg_stat_statements stats collected in the past would disappear. OTOH, the server can keep running successfully from then on, and no critical data will disappear. Therefore I think we can commit this patch even at beta1. Thoughts? For HEAD, probably yes. But what about backpatching 9.3? Tomas -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Re: popen and pclose redefinitions causing many warning in Windows build
On Wed, May 28, 2014 at 12:29:28PM -0400, Tom Lane wrote: Bruce Momjian br...@momjian.us writes: I think this is caused because the variable is not defined as SOCKET. The attached patch fixes this. This should prevent the warning. Surely that's just going to move the errors somewhere else. The call site still expects the argument to be int[]. Ah, yes, you are right. This is a similar problem I had with libpq where PQsocket() had to return an int. Attached is an updated patch which follows my previous coding of checking for PGINVALID_SOCKET, and if not equal, assigns the value to an integer handle. I would also like to rename variable 's' to 'listen_sock', but that is not in the patch, for clarity reasons. Should this be held for 9.5? I think it is only warning removal. On the other hand, portability is what we do during beta testing. -- Bruce Momjian br...@momjian.ushttp://momjian.us EnterpriseDB http://enterprisedb.com + Everyone has their own god. + diff --git a/src/bin/pg_dump/parallel.c b/src/bin/pg_dump/parallel.c new file mode 100644 index caedbb8..4efa8fb *** a/src/bin/pg_dump/parallel.c --- b/src/bin/pg_dump/parallel.c *** readMessageFromPipe(int fd) *** 1320,1337 /* * This is a replacement version of pipe for Win32 which allows returned * handles to be used in select(). Note that read/write calls must be replaced ! * with recv/send. */ static int pgpipe(int handles[2]) { ! SOCKET s; struct sockaddr_in serv_addr; int len = sizeof(serv_addr); handles[0] = handles[1] = INVALID_SOCKET; ! if ((s = socket(AF_INET, SOCK_STREAM, 0)) == INVALID_SOCKET) { write_msg(modulename, pgpipe: could not create socket: error code %d\n, WSAGetLastError()); --- 1320,1342 /* * This is a replacement version of pipe for Win32 which allows returned * handles to be used in select(). Note that read/write calls must be replaced ! * with recv/send. handles have to be integers so we check for errors then ! * cast to integers. */ static int pgpipe(int handles[2]) { ! pgsocket s, tmp_sock; struct sockaddr_in serv_addr; int len = sizeof(serv_addr); + /* We have to use the Unix socket definition here. */ handles[0] = handles[1] = INVALID_SOCKET; ! /* ! * setup listen socket ! */ ! if ((s = socket(AF_INET, SOCK_STREAM, 0)) == PGINVALID_SOCKET) { write_msg(modulename, pgpipe: could not create socket: error code %d\n, WSAGetLastError()); *** pgpipe(int handles[2]) *** 1363,1375 closesocket(s); return -1; } ! if ((handles[1] = socket(AF_INET, SOCK_STREAM, 0)) == INVALID_SOCKET) { write_msg(modulename, pgpipe: could not create second socket: error code %d\n, WSAGetLastError()); closesocket(s); return -1; } if (connect(handles[1], (SOCKADDR *) serv_addr, len) == SOCKET_ERROR) { --- 1368,1385 closesocket(s); return -1; } ! ! /* ! * setup pipe handles ! */ ! if ((tmp_sock = socket(AF_INET, SOCK_STREAM, 0)) == PGINVALID_SOCKET) { write_msg(modulename, pgpipe: could not create second socket: error code %d\n, WSAGetLastError()); closesocket(s); return -1; } + handles[1] = (int) tmp_sock; if (connect(handles[1], (SOCKADDR *) serv_addr, len) == SOCKET_ERROR) { *** pgpipe(int handles[2]) *** 1378,1384 closesocket(s); return -1; } ! if ((handles[0] = accept(s, (SOCKADDR *) serv_addr, len)) == INVALID_SOCKET) { write_msg(modulename, pgpipe: could not accept connection: error code %d\n, WSAGetLastError()); --- 1388,1394 closesocket(s); return -1; } ! 
if ((tmp_sock = accept(s, (SOCKADDR *) serv_addr, len)) == PGINVALID_SOCKET) { write_msg(modulename, pgpipe: could not accept connection: error code %d\n, WSAGetLastError()); *** pgpipe(int handles[2]) *** 1387,1392 --- 1397,1404 closesocket(s); return -1; } + handles[0] = (int) tmp_sock; + closesocket(s); return 0; } -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
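For readers who want the shape of the technique without the diff noise, here is a stripped-down, self-contained sketch of the pgpipe() idea in POSIX spelling (int descriptors and close(); the Windows code in the patch uses SOCKET, closesocket(), and the INVALID_SOCKET/SOCKET_ERROR checks instead). Error handling is abbreviated; this is illustrative, not the pg_dump code:

#include <stdio.h>
#include <string.h>
#include <unistd.h>
#include <sys/socket.h>
#include <netinet/in.h>
#include <arpa/inet.h>

/* Emulate pipe() with a loopback TCP socket pair, so the returned
 * descriptors can be used in select(); reads/writes become recv/send. */
static int
socketpipe(int handles[2])
{
    int         listener;
    struct sockaddr_in addr;
    socklen_t   len = sizeof(addr);

    if ((listener = socket(AF_INET, SOCK_STREAM, 0)) < 0)
        return -1;

    memset(&addr, 0, sizeof(addr));
    addr.sin_family = AF_INET;
    addr.sin_addr.s_addr = htonl(INADDR_LOOPBACK);
    addr.sin_port = 0;          /* let the kernel pick a port */

    if (bind(listener, (struct sockaddr *) &addr, sizeof(addr)) < 0 ||
        listen(listener, 1) < 0 ||
        getsockname(listener, (struct sockaddr *) &addr, &len) < 0)
    {
        close(listener);
        return -1;
    }

    /* connect one end ... */
    if ((handles[1] = socket(AF_INET, SOCK_STREAM, 0)) < 0 ||
        connect(handles[1], (struct sockaddr *) &addr, len) < 0)
    {
        close(listener);
        return -1;
    }

    /* ... and accept the other */
    if ((handles[0] = accept(listener, NULL, NULL)) < 0)
    {
        close(handles[1]);
        close(listener);
        return -1;
    }

    close(listener);            /* the listener is no longer needed */
    return 0;
}

int
main(void)
{
    int     fds[2];
    char    buf[6];

    if (socketpipe(fds) != 0)
        return 1;
    send(fds[1], "hello", 6, 0);        /* write end, like write() */
    recv(fds[0], buf, sizeof(buf), 0);  /* read end, like read() */
    printf("%s\n", buf);
    close(fds[0]);
    close(fds[1]);
    return 0;
}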
Re: [HACKERS] Extended Prefetching using Asynchronous IO - proposal and patch
On 05/28/2014 11:52 PM, John Lumby wrote: The patch is attached. It is based on a clone of today's 9.4dev source. I have noticed that this source is (not surprisingly) quite a moving target at present, meaning that this patch becomes stale quite quickly. So although this copy is fine for reviewing, it will quite probably soon no longer apply to the current source tree. As mentioned before, if anyone wishes to try this feature out on 9.3.4, I will be making a patch for that soon which I can supply on request. Wow, that's a huge patch. I took a very brief look, focusing on the basic design and ignoring style and other minor things for now: The patch seems to assume that you can put the aiocb struct in shared memory, initiate an asynchronous I/O request from one process, and wait for its completion from another process. I'm pretty surprised if that works on any platform. How portable is POSIX aio nowadays? Googling around, it still seems that on Linux, it's implemented using threads. Does the thread-emulation implementation cause problems with the rest of the backend, which assumes that there is only a single thread? In any case, I think we'll want to encapsulate the AIO implementation behind some kind of an API, to allow other implementations to co-exist. Benchmarks? - Heikki -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
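For reference, the POSIX AIO calls in question look roughly like this (a minimal sketch under stated assumptions: a readable file named 'datafile' exists, and on Linux the program links with -lrt). Here a single process both initiates and polls the request; the open question above is whether an aiocb initiated by one process may legally be polled from a different process when it lives in shared memory:

#include <aio.h>
#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <string.h>

int
main(void)
{
    static char     buf[8192];
    struct aiocb    cb;
    int             fd = open("datafile", O_RDONLY);

    if (fd < 0)
        return 1;

    memset(&cb, 0, sizeof(cb));
    cb.aio_fildes = fd;
    cb.aio_buf = buf;
    cb.aio_nbytes = sizeof(buf);
    cb.aio_offset = 0;

    if (aio_read(&cb) != 0)     /* initiate; returns immediately */
        return 1;

    /* Poll for completion; real code would do useful work meanwhile. */
    while (aio_error(&cb) == EINPROGRESS)
        ;

    printf("read %zd bytes\n", aio_return(&cb));
    return 0;
}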
Re: [HACKERS] Extended Prefetching using Asynchronous IO - proposal and patch
On Tue, May 27, 2014 at 3:17 PM, John Lumby johnlu...@hotmail.com wrote: Below I am pasting the README we have written for this new functionality which mentions some of the measurements, advantages (and disadvantages) and we welcome all and any comments on this. I think that this is likely to be a useful area to work on, but I wonder: can you suggest a useful test-case or benchmark to show the advantages of the patch you posted? You mention a testcase already, but it's a little short on details. I think it's always a good idea to start with that when pursuing a performance feature. Have you thought about things like specialized prefetching for nested loop joins? -- Peter Geoghegan -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] SP-GiST bug.
Teodor Sigaev teo...@sigaev.ru writes:

create table xxx (t text);
insert into xxx select 'x' || v::text from generate_series(1, 291) as v;
insert into xxx values ('');
create index xxxidx on xxx using spgist (t);

Fun! And postgres will eat memory forever. It seems to me that checkAllTheSame wrongly decides that all tuples are the same. All tuples except the tuple to be inserted have the same character 'x'; only the new tuple has '\0'. I don't think that checkAllTheSame is broken, but it probably would be after this patch --- ISTM you're breaking the corner case described in the header comment for checkAllTheSame, where we need to force splitting of a set of existing tuples that the opclass can't distinguish. What actually is broken, I think, is the spgist text opclass, because it's using a zero node label for two different situations. The scenario we're looking at here is: 1. We call picksplit with all 292 of these entries. Not surprisingly, it puts the first 291 into a single node with label 'x' and the last one into another node with label '\0'. 2. Because this is too many to fit on one index page, checkAllTheSame decides it had better create an allTheSame tuple for the first 291 and then try again to insert the empty string into that tuple. While you could argue whether this is *necessary*, it is not *incorrect*, so ISTM the opclass had better cope with it. 3. We call spg_text_choose with the allTheSame tuple and the empty-string value to be inserted. It finds the empty-string value doesn't match (since all the nodes have label 'x') so it requests a split:

out->resultType = spgSplitTuple;
out->result.splitTuple.prefixHasPrefix = in->hasPrefix;
out->result.splitTuple.prefixPrefixDatum = in->prefixDatum;
out->result.splitTuple.nodeLabel = UInt8GetDatum('\0');
out->result.splitTuple.postfixHasPrefix = false;

After splitting, we have a new upper tuple with a single node with label '\0', pointing to a new allTheSame tuple containing the original 291 entries. 4. We call spg_text_choose with the new upper tuple and the empty-string value to be inserted. It looks for a node labeled '\0', and finds it, so it reports that we should descend into that node --- ie, down to the new allTheSame tuple. 5. We call spg_text_choose with the new allTheSame tuple and the empty-string value to be inserted. This is exactly like the situation at step 3, so we're in an infinite loop. It doesn't look to me like the core code has done anything wrong here; it just did what the opclass requested. Rather, the problem is that we're using the '\0' node label both to represent a no-op label descent and to represent that we're past the end of the string. I'm afraid there may not be any way to fix this without breaking on-disk compatibility for spgist text indexes :-(. It seems like we have to distinguish the case of a zero-length string from that of a dummy label for a pushed-down allTheSame tuple. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] CREATE REPLICATION SLOT fails on a timeout
Hi, On 2014-05-17 01:34:25 +0200, Andres Freund wrote: On 2014-05-16 17:02:33 -0400, Steve Singer wrote: I don't think that's going to cut it though. The creation can take longer than whatever wal_sender_timeout is set to (when there are lots of long-running transactions). I think checking whether last_reply_timestamp <= 0 during timeout checking is more robust. That makes sense. A patch that does that is attached. I've attached a heavily revised version of that patch. The original didn't touch all the necessary places... It survives creation of logical replication slots under 'intensive pressure', with wal_sender_timeout=10ms. Thanks for the bug report! Pushed a fix for it. I am pretty sure it will, but could you still test that it fixes your problem? Thanks! Andres Freund -- Andres Freund http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
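The shape of the guard being described, as a generic self-contained sketch (invented names; not the committed walsender code): a timeout that does not start ticking until the first client reply has arrived, so that long-running slot creation cannot trip it:

#include <stdbool.h>
#include <time.h>

/*
 * last_reply == 0 means "no reply received yet"; in that state the
 * timeout must not fire, because the client cannot reply while the
 * server is still busy creating the slot.
 */
static bool
timeout_expired(time_t last_reply, time_t now, int timeout_secs)
{
    if (timeout_secs <= 0)      /* timeout disabled */
        return false;
    if (last_reply <= 0)        /* nothing received yet: don't time out */
        return false;
    return (now - last_reply) > timeout_secs;
}

int
main(void)
{
    time_t  now = time(NULL);

    /* no reply yet => never expires, however long we wait */
    return timeout_expired(0, now + 3600, 30) ? 1 : 0;
}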
Re: [HACKERS] json casts
On 05/27/2014 07:25 PM, Andrew Dunstan wrote: On 05/27/2014 07:17 PM, Tom Lane wrote: Stephen Frost sfr...@snowman.net writes: * Andrew Dunstan (and...@dunslane.net) wrote: Given that this would be a hard coded behaviour change, is it too late to do this for 9.4? No, for my 2c. If we do it by adding casts then it'd require an initdb, so I'd vote against that for 9.4. If we just change behavior in json.c then that objection doesn't apply, so I wouldn't complain. I wasn't proposing to add a cast, just to allow users to add one if they wanted. But I'm quite happy to go with special-casing timestamps, and leave the larger question for another time. Here's a draft patch. I'm still checking to see if there are other places that need to be fixed, but I think this has the main one. cheers andrew diff --git a/src/backend/utils/adt/json.c b/src/backend/utils/adt/json.c index a7364f3..d262bda 100644 --- a/src/backend/utils/adt/json.c +++ b/src/backend/utils/adt/json.c @@ -24,6 +24,7 @@ #include parser/parse_coerce.h #include utils/array.h #include utils/builtins.h +#include utils/formatting.h #include utils/lsyscache.h #include utils/json.h #include utils/jsonapi.h @@ -53,6 +54,8 @@ typedef enum /* type categories for datum_to_json */ JSONTYPE_NULL,/* null, so we didn't bother to identify */ JSONTYPE_BOOL,/* boolean (built-in types only) */ JSONTYPE_NUMERIC, /* numeric (ditto) */ + JSONTYPE_TIMESTAMP, /* we use special formatting for timestamp */ + JSONTYPE_TIMESTAMPTZ, /* ... and timestamptz */ JSONTYPE_JSON,/* JSON itself (and JSONB) */ JSONTYPE_ARRAY,/* array */ JSONTYPE_COMPOSITE, /* composite */ @@ -60,6 +63,13 @@ typedef enum /* type categories for datum_to_json */ JSONTYPE_OTHER/* all else */ } JsonTypeCategory; +/* + * to_char formats to turn timestamps and timpstamptzs into json strings + * that are ISO 8601 compliant + */ +#define TS_ISO8601_FMT \\\-MM-DD\T\HH24:MI:SS.US\\\ +#define TSTZ_ISO8601_FMT \\\-MM-DD\T\HH24:MI:SS.USOF\\\ + static inline void json_lex(JsonLexContext *lex); static inline void json_lex_string(JsonLexContext *lex); static inline void json_lex_number(JsonLexContext *lex, char *s, bool *num_err); @@ -1262,6 +1272,14 @@ json_categorize_type(Oid typoid, *tcategory = JSONTYPE_NUMERIC; break; + case TIMESTAMPOID: + *tcategory = JSONTYPE_TIMESTAMP; + break; + + case TIMESTAMPTZOID: + *tcategory = JSONTYPE_TIMESTAMPTZ; + break; + case JSONOID: case JSONBOID: *tcategory = JSONTYPE_JSON; @@ -1375,6 +1393,29 @@ datum_to_json(Datum val, bool is_null, StringInfo result, } pfree(outputstr); break; + case JSONTYPE_TIMESTAMP: + /* + * The timestamp format used here provides for quoting the string, + * so no escaping is required. 
+ */ + jsontext = DatumGetTextP( +DirectFunctionCall2(timestamp_to_char, val, + CStringGetTextDatum(TS_ISO8601_FMT))); + outputstr = text_to_cstring(jsontext); + appendStringInfoString(result, outputstr); + pfree(outputstr); + pfree(jsontext); + break; + case JSONTYPE_TIMESTAMPTZ: + /* same comment as for timestamp above */ + jsontext = DatumGetTextP( +DirectFunctionCall2(timestamptz_to_char, val, + CStringGetTextDatum(TSTZ_ISO8601_FMT))); + outputstr = text_to_cstring(jsontext); + appendStringInfoString(result, outputstr); + pfree(outputstr); + pfree(jsontext); + break; case JSONTYPE_JSON: /* JSON and JSONB output will already be escaped */ outputstr = OidOutputFunctionCall(outfuncoid, val); diff --git a/src/test/regress/expected/json.out b/src/test/regress/expected/json.out index 9f08676..c4dc8b0 100644 --- a/src/test/regress/expected/json.out +++ b/src/test/regress/expected/json.out @@ -403,6 +403,29 @@ SELECT row_to_json(row((select array_agg(x) as d from generate_series(5,10) x)), {f1:[5,6,7,8,9,10]} (1 row) +-- to_json, timestamps +select to_json(timestamp '2014-05-28 12:22:35.614298'); + to_json +-- + 2014-05-28T12:22:35.614298 +(1 row) + +BEGIN; +SET LOCAL TIME ZONE 10.5; +select to_json(timestamptz '2014-05-28 12:22:35.614298-04'); + to_json + + 2014-05-29T02:52:35.614298+10:30 +(1 row) + +SET LOCAL TIME ZONE -8; +select to_json(timestamptz '2014-05-28 12:22:35.614298-04'); + to_json +- + 2014-05-28T08:22:35.614298-08 +(1 row) + +COMMIT; --json_agg SELECT json_agg(q) FROM ( SELECT $$a$$ || x AS b, y AS c, diff --git a/src/test/regress/expected/json_1.out b/src/test/regress/expected/json_1.out index 13f7687..629e98e 100644 --- a/src/test/regress/expected/json_1.out +++ b/src/test/regress/expected/json_1.out @@ -403,6 +403,29 @@ SELECT row_to_json(row((select array_agg(x) as d from generate_series(5,10) x)), {f1:[5,6,7,8,9,10]} (1 row) +-- to_json, timestamps
Re: [HACKERS] replication protocol documentation inconsistencies
Hi, On 2014-05-21 07:29:53 -0400, Peter Eisentraut wrote: Looking at http://www.postgresql.org/docs/devel/static/protocol-replication.html under START_REPLICATION it goes: The payload of each CopyData message from server to the client contains a message of one of the following formats: If a slot's name is provided via slotname, it will be updated as replication progresses so that the server knows which WAL segments - and, if hot_standby_feedback is on, which transactions - are still needed by the standby. XLogData (B) ... Primary keepalive message (B) ... That second paragraph was inserted recently and doesn't make sense there. It should be moved somewhere else. Hm. I am not sure why it doesn't make sense there? It's about the SLOT $slotname parameter to START_REPLICATION? More generally, it is weird that the message formats are described there, even though the rest of the protocol documentation only mentions the messages by name and then describes the formats later (http://www.postgresql.org/docs/devel/static/protocol-message-types.html and http://www.postgresql.org/docs/devel/static/protocol-message-formats.html). For example, the meaning of the (B) code isn't described until two sections later. I am not sure that makes sense. These messages cannot be sent as toplevel messages - they're just describing the contents of the CopyBoth stream after START_REPLICATION has begun. It seems weird to add these 'subprotocol' messages to the toplevel protocol description. I think the description of the details of these messages should also be moved there. The CopyBothResponse, which is also used for replication only, is also listed among the normal message formats. I think that's different because CopyBoth is a toplevel protocol issue. Greetings, Andres Freund -- Andres Freund http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
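For orientation, the 'subprotocol' messages being discussed are payloads carried inside CopyData, conceptually laid out like this (a sketch only: the wire format is big-endian and packed, not a literal C struct, and the field names here are descriptive rather than official):

#include <stdint.h>

/* XLogData ('w') CopyData payload, as documented for START_REPLICATION */
typedef struct
{
    char    msgtype;        /* 'w' */
    int64_t dataStart;      /* WAL position of this block */
    int64_t walEnd;         /* current end of WAL on the server */
    int64_t sendTime;       /* server timestamp */
    /* ... WAL data bytes follow ... */
} XLogDataMsg;

/* Primary keepalive ('k') CopyData payload */
typedef struct
{
    char    msgtype;        /* 'k' */
    int64_t walEnd;
    int64_t sendTime;
    char    replyRequested; /* 1 = client should reply as soon as possible */
} PrimaryKeepaliveMsg;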
Re: [HACKERS] [COMMITTERS] pgsql: Fix bogus %name-prefix option syntax in all our Bison files.
Andres Freund and...@2ndquadrant.com writes: On 2014-05-28 22:55:28 +0200, Andres Freund wrote: On 2014-05-28 19:42:35 +0000, Tom Lane wrote: Fix bogus %name-prefix option syntax in all our Bison files. Are you sure about this? When I first saw those warnings after debian unstable got bison 3.0, I read the release notes and interpreted them differently: by accident, *only* the = syntax worked for a long time. Then somewhere around 2.8 they added the syntax without =. That means that < 2.8 versions are likely not to work anymore. According to git tag --contains, the syntax without = was added in 2.4 (not 2.8 as I'd remembered), which was released 2008-11-02. It's been warning since 3.0, which was released 2013-07-25. Yeah, that's what the buildfarm is showing: members with bison 2.3 or less are failing :-(. It's imo not realistic to rely on bison >= 2.4, at least not in the backbranches. Pretty damn annoying. We'll have to live with those warnings for a couple of years. Agreed; even relatively modern platforms such as OS X 10.9 are still shipping 2.3, or maybe even lower. Considering that up to now our benchmark requirement was bison 1.875, requiring 2.4 is a pretty big jump just to get rid of a warning. I guess we have to revert this, and IMO we should also lobby the Bison people to not emit the deprecation warnings yet. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
[HACKERS] Re: [COMMITTERS] pgsql: Fix bogus %name-prefix option syntax in all our Bison files.
On 2014-05-28 18:52:22 -0400, Tom Lane wrote: I guess we have to revert this Looks like it. and IMO we should also lobby the Bison people to not emit the deprecation warnings yet. That's a good idea. What I've been thinking about is to add -Wno-deprecated to the bison rule in the interim. Maybe after a configure test for the option. All deprecation warnings so far seem to be pretty unhelpful. Btw, the bison release process and documentation suck. Majorly. The most efficient way to learn about changes seems to be to look at the git repository. Andres -- Andres Freund http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] [COMMITTERS] pgsql: Fix bogus %name-prefix option syntax in all our Bison files.
Andres Freund and...@2ndquadrant.com writes: On 2014-05-28 18:52:22 -0400, Tom Lane wrote: and IMO we should also lobby the Bison people to not emit the deprecation warnings yet. That's a good idea. What i've been thinking about is to add -Wno-deprecated to the bison rule in the interim. Maybe after a configure test for the option. All deprecation warnings so far seem to be pretty unhelpful. Meh. If we just hide them permanently, we're likely to be blindsided somewhere down the road when they turn a deprecation into an error. What I was wondering about was if we could modify the .y files when building with a pre-2.4 bison. It'd be easy enough to fix this with sed, say. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
[HACKERS] Re: [COMMITTERS] pgsql: Fix bogus %name-prefix option syntax in all our Bison files.
On 2014-05-28 19:12:44 -0400, Tom Lane wrote: Andres Freund and...@2ndquadrant.com writes: On 2014-05-28 18:52:22 -0400, Tom Lane wrote: and IMO we should also lobby the Bison people to not emit the deprecation warnings yet. That's a good idea. What I've been thinking about is to add -Wno-deprecated to the bison rule in the interim. Maybe after a configure test for the option. All deprecation warnings so far seem to be pretty unhelpful. Meh. If we just hide them permanently, we're likely to be blindsided somewhere down the road when they turn a deprecation into an error. I think some bleeding-edge buildfarm animal will warn us soon enough. It's not as if we're able to do much about the deprecations as is without breaking with older releases. What I was wondering about was if we could modify the .y files when building with a pre-2.4 bison. It'd be easy enough to fix this with sed, say. .oO(m4). Should be doable and might actually be interesting for a couple of other things. I think I'll just stick BISONFLAGS += -Wno-deprecated in my Makefile.custom for now. I.e. I am not volunteering. Greetings, Andres Freund -- Andres Freund http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Compression of full-page-writes
On Wed, May 28, 2014 at 04:04:13PM +0100, Simon Riggs wrote: On 28 May 2014 15:34, Fujii Masao masao.fu...@gmail.com wrote: Also, the compress_backup_block GUC needs to be merged with full_page_writes. Basically I agree with you, because I don't want to add a new GUC very similar to an existing one. But consider the case where full_page_writes = off. Even in this case, FPWs are forcibly written only during base backup. Should such FPWs also be compressed? Which compression algorithm should be used? If we want to choose the algorithm for such FPWs, we would not be able to merge those two GUCs. IMO it's OK to always use the best compression algorithm for such FPWs and merge them, though. I'd prefer a new name altogether:

torn_page_protection = 'full_page_writes'
torn_page_protection = 'compressed_full_page_writes'
torn_page_protection = 'none'

This allows us to add new techniques later, like

torn_page_protection = 'background_FPWs'

or

torn_page_protection = 'double_buffering'

when/if we add those new techniques. Uh, how would that work if you want to compress the background_FPWs? Use compressed_background_FPWs? -- Bruce Momjian br...@momjian.us http://momjian.us EnterpriseDB http://enterprisedb.com + Everyone has their own god. + -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Extended Prefetching using Asynchronous IO - proposal and patch
On Wed, May 28, 2014 at 6:51 PM, Peter Geoghegan p...@heroku.com wrote: Have you thought about things like specialized prefetching for nested loop joins? Currently, such a thing would need some non-trivial changes to the execution nodes, I believe. For nestloop, correct me if I'm wrong, but index scan nodes don't have visibility of the next tuple to be searched for. -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Extended Prefetching using Asynchronous IO - proposal and patch
On Wed, May 28, 2014 at 5:59 PM, Claudio Freire klaussfre...@gmail.com wrote: For nestloop, correct me if I'm wrong, but index scan nodes don't have visibility of the next tuple to be searched for. Nested loop joins are considered a particularly compelling case for prefetching, actually. -- Peter Geoghegan -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
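As a point of reference for what prefetching means at the lowest level here: the prefetch PostgreSQL already does for bitmap heap scans (under effective_io_concurrency) boils down to advising the kernel about blocks it will want soon, and a nestloop prefetch scheme would issue the same kind of hint for blocks the next outer tuples are expected to probe. A minimal sketch of that primitive (posix_fadvise() is a real call; the wrapper and its names are invented):

#include <sys/types.h>
#include <fcntl.h>

/*
 * Hint that a block will be read soon, without blocking the caller.
 * The kernel may start the read in the background.
 */
void
hint_block(int fd, off_t blkno, off_t blcksz)
{
    (void) posix_fadvise(fd, blkno * blcksz, blcksz, POSIX_FADV_WILLNEED);
}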
[HACKERS] Avoiding re-creation of uuid_t state with OSSP UUID
While mucking around with contrib/uuid-ossp, I realized that its usage of the OSSP UUID library is really rather broken: it creates and destroys a uuid_t object for each call of the UUID creation functions. This is not the way you're supposed to use that library. The uuid_t object is meant to hold persistent state such as the system MAC address, so really we ought to make such an object once per session and re-use it, as per the attached proposed patch. Doing it like this has multiple benefits: * saving the cycles needed to create and destroy a uuid_t object, notably including kernel calls to find out the system's MAC address. On my machine, this patch reduces the runtime of uuid_generate_v1() from about 16 microseconds to about 3. * reducing the amount of entropy we draw from /dev/urandom. strace'ing shows that at least with RHEL6's version of OSSP UUID, each uuid_create call reads 2 bytes from /dev/urandom, thus depleting the available entropy system-wide. * enabling the code to actually guarantee that successive V1-style UUIDs are distinct. OSSP remembers the last timestamp it read from the kernel, and if the new one is the same, it increments the saved clock sequence value to guarantee a distinct result. But that logic all depends on re-using a cached uuid_t! Without that, if successive gettimeofday calls produce the same result, we're at risk of producing duplicate UUIDs if the clock sequence values are the same. uuid_create does initialize the clock sequence value to a random number, but since only 14 bits are available for it in the V1 UUID format, the chance of collision is 1 in 16K. Now admittedly you'd need a machine noticeably faster than mine to get duplicate timestamps with the existing code, but we're not that far away from having it happen. So I think the existing code is not only rather broken, but its effects on the system entropy pool are not far short of being a security bug. Accordingly, I'd like to not only apply this to HEAD but back-patch it. Comments? regards, tom lane diff --git a/contrib/uuid-ossp/uuid-ossp.c b/contrib/uuid-ossp/uuid-ossp.c index 88da168..9e9905b 100644 *** a/contrib/uuid-ossp/uuid-ossp.c --- b/contrib/uuid-ossp/uuid-ossp.c *** pguuid_complain(uuid_rc_t rc) *** 141,146 --- 141,182 errmsg(OSSP uuid library failure: error code %d, rc))); } + /* + * We create a uuid_t object just once per session and re-use it for all + * operations in this module. OSSP UUID caches the system MAC address and + * other state in this object. Reusing the object has a number of benefits: + * saving the cycles needed to fetch the system MAC address over and over, + * reducing the amount of entropy we draw from /dev/urandom, and providing a + * positive guarantee that successive generated V1-style UUIDs don't collide. + * (On a machine fast enough to generate multiple UUIDs per microsecond, + * or whatever the system's wall-clock resolution is, we'd otherwise risk + * collisions whenever random initialization of the uuid_t's clock sequence + * value chanced to produce duplicates.) + * + * However: when we're doing V3 or V5 UUID creation, uuid_make needs two + * uuid_t objects, one holding the namespace UUID and one for the result. + * It's unspecified whether it's safe to use the same uuid_t for both cases, + * so let's cache a second uuid_t for use as the namespace holder object. 
+ */ + static uuid_t * + get_cached_uuid_t(int which) + { + static uuid_t *cached_uuid[2] = {NULL, NULL}; + + if (cached_uuid[which] == NULL) + { + uuid_rc_t rc; + + rc = uuid_create(cached_uuid[which]); + if (rc != UUID_RC_OK) + { + cached_uuid[which] = NULL; + pguuid_complain(rc); + } + } + return cached_uuid[which]; + } + static char * uuid_to_string(const uuid_t *uuid) { *** string_to_uuid(const char *str, uuid_t * *** 171,190 static Datum special_uuid_value(const char *name) { ! uuid_t *uuid; char *str; uuid_rc_t rc; - rc = uuid_create(uuid); - if (rc != UUID_RC_OK) - pguuid_complain(rc); rc = uuid_load(uuid, name); if (rc != UUID_RC_OK) pguuid_complain(rc); str = uuid_to_string(uuid); - rc = uuid_destroy(uuid); - if (rc != UUID_RC_OK) - pguuid_complain(rc); return DirectFunctionCall1(uuid_in, CStringGetDatum(str)); } --- 207,220 static Datum special_uuid_value(const char *name) { ! uuid_t *uuid = get_cached_uuid_t(0); char *str; uuid_rc_t rc; rc = uuid_load(uuid, name); if (rc != UUID_RC_OK) pguuid_complain(rc); str = uuid_to_string(uuid); return DirectFunctionCall1(uuid_in, CStringGetDatum(str)); } *** special_uuid_value(const char *name) *** 193,212 static Datum uuid_generate_internal(int mode, const uuid_t *ns, const char *name, int len) { ! uuid_t *uuid; char *str; uuid_rc_t rc; - rc = uuid_create(uuid); - if (rc != UUID_RC_OK)
Re: [HACKERS] SQL access to database attributes
On 05/26/2014 08:19 PM, Vik Fearing wrote: On 05/26/2014 07:10 AM, Stephen Frost wrote: * Tom Lane (t...@sss.pgh.pa.us) wrote: I don't really object to doing an unlocked check for another such database, but I'm not convinced that additional locking to try to prevent a race is worth its keep. +1 on the nannyism, and +1 to ignoring the race. Okay, I'll submit a new patch with racy nannyism and some grammar changes that Tom and I have been discussing in private. Attached are two patches. The first is a refactoring of the createdb/alterdb grammars mostly by Tom which makes all of the options non-keywords that don't otherwise need to be. Not only does this remove the two unreserved keywords I had added (ALLOW and CONNECTIONS) but also removes two existing ones (LC_COLLATE and LC_TYPE), reducing gram.o by about half a percent by Tom's calculations. That's much better than increasing it like my original patch did. The problem is we foolishly adopted a two-word option name (CONNECTION LIMIT) which complicates the grammar. My aping of that for IS TEMPLATE and ALLOW CONNECTIONS only aggravated the situation. And so I changed all the documentation (and pg_dumpall etc) to use CONNECTION_LIMIT instead. We might hopefully one day deprecate the with-space version so the sooner the documentation recommends the without-space version, the better. The old syntax is of course still valid. I also changed the documentation to say connection_limit instead of connlimit. Documentation is for humans, something like connlimit (and later as we'll see allowconn) is for programmers. It also indirectly reminds us that we should not add another multi-word option like I initially did. Now that anything goes grammar-wise, the elog for unknown option is now ereport. I am hoping this gets backpatched. --- The second patch adds my original functionality, except this time the syntax is IS_TEMPLATE and ALLOW_CONNECTIONS (both one word, neither being keywords). It also changes all the catalog updates we used to do because we didn't have this patch. As for the nannyism, the point was to find another database having datallowconn=true but since we have to be connected to a database to issue this command, the simplest is just to disallow the current database (credit Andres on IRC) so that's what I've done as explained with this in-code comment: /* * In order to avoid getting locked out and having to go through standalone * mode, we refuse to disallow connections on the database we're currently * connected to. Lockout can still happen with concurrent sessions but the * likeliness of that is not high enough to worry about. */ I also changed the C variable connlimit to dbconnlimit in AlterDatabase() to be consistent with its analog in createdb(). --- The third and non-existent patch is about regression tests. Tom put in gram.y that we should have some now that the grammar doesn't regulate it, and I wanted some anyway in my original patch; but I don't know what they should look like or where they should go so I'm asking for help on that. -- Vik *** a/contrib/sepgsql/expected/alter.out --- b/contrib/sepgsql/expected/alter.out *** *** 110,116 SET search_path = regtest_schema, regtest_schema_2, public; -- -- misc ALTER commands -- ! 
ALTER DATABASE regtest_sepgsql_test_database CONNECTION LIMIT 999; LOG: SELinux: allowed { setattr } scontext=unconfined_u:unconfined_r:unconfined_t:s0 tcontext=unconfined_u:object_r:sepgsql_db_t:s0 tclass=db_database name=regtest_sepgsql_test_database ALTER DATABASE regtest_sepgsql_test_database SET search_path TO regtest_schema, public; -- not supported yet ALTER TABLE regtest_table ADD COLUMN d float; --- 110,116 -- -- misc ALTER commands -- ! ALTER DATABASE regtest_sepgsql_test_database CONNECTION_LIMIT 999; LOG: SELinux: allowed { setattr } scontext=unconfined_u:unconfined_r:unconfined_t:s0 tcontext=unconfined_u:object_r:sepgsql_db_t:s0 tclass=db_database name=regtest_sepgsql_test_database ALTER DATABASE regtest_sepgsql_test_database SET search_path TO regtest_schema, public; -- not supported yet ALTER TABLE regtest_table ADD COLUMN d float; *** a/contrib/sepgsql/sql/alter.sql --- b/contrib/sepgsql/sql/alter.sql *** *** 85,91 SET search_path = regtest_schema, regtest_schema_2, public; -- -- misc ALTER commands -- ! ALTER DATABASE regtest_sepgsql_test_database CONNECTION LIMIT 999; ALTER DATABASE regtest_sepgsql_test_database SET search_path TO regtest_schema, public; -- not supported yet ALTER TABLE regtest_table ADD COLUMN d float; --- 85,91 -- -- misc ALTER commands -- ! ALTER DATABASE regtest_sepgsql_test_database CONNECTION_LIMIT 999; ALTER DATABASE regtest_sepgsql_test_database SET search_path TO regtest_schema, public; -- not supported yet ALTER TABLE regtest_table ADD COLUMN d float; *** a/doc/src/sgml/ref/alter_database.sgml ---
[HACKERS] Ancient bug in formatting.c/to_char()
Consider this example: [local]/postgres=# SELECT to_char(1e9::float8,'9D9'); to_char -- 10.0 (1 row) [local]/postgres=# SELECT to_char(1e20::float8,'9D9'); to_char 1 (1 row) [local]/postgres=# SELECT to_char(1e20,'9D9'); to_char -- 1.0 (1 row) There is access to uninitialized memory here. #define DEBUG_TO_FROM_CHAR shows this to be the case (garbage is printed to stdout). Valgrind brought this to my attention. I propose the attached patch as a fix for this issue. The direct cause appears to be that float8_to_char() feels entitled to truncate Number description post-digits, while that doesn't happen within numeric_to_char(), say. This is very old code, from the original to_char() patch back in 2000, and the interactions here are not obvious. I think that that should be okay to not do this truncation, since the value of MAXDOUBLEWIDTH was greatly increased in 2007 as part of a round of fixes to bugs of similar vintage. There is still a defensive measure against over-sized input (we bail), but that ought to be unnecessary, strictly speaking. -- Peter Geoghegan *** a/src/backend/utils/adt/formatting.c --- b/src/backend/utils/adt/formatting.c *** *** 59,64 --- 59,65 * --- */ + /* XXX: DEBUG_TO_FROM_CHAR may access uninitialized memory */ #ifdef DEBUG_TO_FROM_CHAR #define DEBUG_elog_output DEBUG3 #endif *** float8_to_char(PG_FUNCTION_ARGS) *** 5427,5440 Num.pre += Num.multi; } orgnum = (char *) palloc(MAXDOUBLEWIDTH + 1); ! len = snprintf(orgnum, MAXDOUBLEWIDTH + 1, %.0f, fabs(val)); ! if (Num.pre len) ! plen = Num.pre - len; ! if (len = DBL_DIG) ! Num.post = 0; ! else if (Num.post + len DBL_DIG) ! Num.post = DBL_DIG - len; ! snprintf(orgnum, MAXDOUBLEWIDTH + 1, %.*f, Num.post, val); if (*orgnum == '-') { /* 0 */ --- 5428,5439 Num.pre += Num.multi; } orgnum = (char *) palloc(MAXDOUBLEWIDTH + 1); ! ! len = snprintf(orgnum, MAXDOUBLEWIDTH + 1, %.*f, Num.post, val); ! ! /* Defensive */ ! if (len = MAXDOUBLEWIDTH + 1) ! elog(ERROR, double precision for character conversion is too wide); if (*orgnum == '-') { /* 0 */ *** a/src/test/regress/expected/window.out --- b/src/test/regress/expected/window.out *** SELECT to_char(SUM(n::float8) OVER (ORDE *** 1765,1771 FROM (VALUES(1,1e20),(2,1)) n(i,n); to_char -- ! 1 1.0 (2 rows) --- 1765,1771 FROM (VALUES(1,1e20),(2,1)) n(i,n); to_char -- ! 1.0 1.0 (2 rows) -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] pg_stat directory and pg_stat_statements
On Thu, May 29, 2014 at 4:55 AM, Tomas Vondra t...@fuzzy.cz wrote: On 28.5.2014 19:52, Fujii Masao wrote: On Thu, May 29, 2014 at 12:37 AM, Peter Geoghegan p...@heroku.com wrote: On Wed, May 28, 2014 at 7:01 AM, Fujii Masao masao.fu...@gmail.com wrote: But the pg_stat_statements file is still saved under $PGDATA/global. Is this intentional or just an oversight? I think it's an oversight. OK, patch attached. I'm afraid that it's not okay to change the file layout in $PGDATA at this beta1 stage, because that change basically seems to need an initdb. Otherwise something like a "no such file or directory" error can happen. But in this case what we need to change is only the location of the pg_stat_statements permanent stats file. So, without an initdb, the server will not be able to find the pg_stat_statements stats file, but this is not so harmful. The only problem is that the pg_stat_statements stats collected in the past would disappear. OTOH, the server can keep running successfully from then on, and no critical data will disappear. Therefore I think we can commit this patch even at beta1. Thoughts? For HEAD, probably yes. But what about backpatching 9.3? I think no. So we should not backpatch this to 9.3. Regards, -- Fujii Masao -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
[HACKERS] Odd uuid-ossp behavior on smew and shearwater
Buildfarm critters smew and shearwater are reporting regression test failures that suggest that the UUID library can't get a system MAC address: http://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=smew&dt=2014-05-28%2023%3A38%3A28 http://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=shearwater&dt=2014-05-29%2000%3A24%3A32 I've just committed regression test adjustments to prevent that from being a failure case, but I am confused about why it's happening. I wouldn't be surprised at not getting a MAC address on a machine that lacks any internet connection, but that surely can't describe the buildfarm environment. Are you curious enough to poke into it and see what's going on? It might be useful to strace a backend that's trying to execute uuid_generate_v1() and see what the kernel interaction looks like exactly. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] pg_stat directory and pg_stat_statements
On Thu, May 29, 2014 at 8:52 AM, Fujii Masao masao.fu...@gmail.com wrote: On Thu, May 29, 2014 at 4:55 AM, Tomas Vondra t...@fuzzy.cz wrote: On 28.5.2014 19:52, Fujii Masao wrote: On Thu, May 29, 2014 at 12:37 AM, Peter Geoghegan p...@heroku.com wrote: On Wed, May 28, 2014 at 7:01 AM, Fujii Masao masao.fu...@gmail.com wrote: But the pg_stat_statements file is still saved under $PGDATA/global. Is this intentional or just an oversight? I think it's an oversight. OK, patch attached. I'm afraid that it's not okay to change the file layout in $PGDATA at this beta1 stage, because that change basically seems to need an initdb. Otherwise something like a "no such file or directory" error can happen. But in this case what we need to change is only the location of the pg_stat_statements permanent stats file. So, without an initdb, the server will not be able to find the pg_stat_statements stats file, but this is not so harmful. The only problem is that the pg_stat_statements stats collected in the past would disappear. OTOH, the server can keep running successfully from then on, and no critical data will disappear. Therefore I think we can commit this patch even at beta1. Thoughts? For HEAD, probably yes. But what about backpatching 9.3? I think no. So we should not backpatch this to 9.3. Just curious. Will it work in an upgrade scenario? -- Thanks Regards, Ashesh Vashi Regards, -- Fujii Masao -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Proposing pg_hibernate
On Wed, May 28, 2014 at 5:30 PM, Gurjeet Singh gurj...@singh.im wrote: On Wed, May 28, 2014 at 2:15 AM, Amit Kapila amit.kapil...@gmail.com wrote: How about the cases when shared buffers already contain some data: a. Before Readers start filling shared buffers, if this cluster wishes to join replication as a slave and receive the data from master, then this utility might need to evict some buffers filled during startup phase. A cluster that wishes to be a replication standby would do so while it's in the startup phase. The BlockReaders are launched immediately on the cluster reaching a consistent state, at which point, I presume, in most cases most of the buffers would be unoccupied. Even to reach a consistent state, it might need to get records from the master (for example, to get to the STANDBY_SNAPSHOT_READY state). Hence BlockReaders might evict the occupied buffers, which may be a small fraction of total shared_buffers. Yes, but I think it still depends on how much redo replay happens on different pages. b. As soon as the server completes startup (reached consistent point), it allows new connections which can also use some shared buffers before Reader process could use shared buffers or are you planing to change the time when users can connect to database. The BlockReaders are launched immediately after the cluster reaches a consistent state, that is, just about when it is ready to accept connections. So yes, there is a possibility that the I/O caused by the BlockReaders may affect the performance of queries executed right at cluster startup. But given that the performance of those queries was anyway going to be low (because of empty shared buffers), and that BlockReaders tend to cause sequential reads, and that by default there's only one BlockReader active at a time, I think this won't be a concern in most cases. By the time the shared buffers start getting filled up, the buffer replacement strategy will evict any buffers populated by BlockReaders if they are not used by the normal queries. Even BlockReaders might need to evict buffers filled by user queries or by themselves, in which case there is a chance of contention, but again these are all quite rare scenarios. I am not sure if replacing shared buffer contents in such cases can always be considered useful. IMHO, all of these caveats would affect a very small fraction of use-cases and are eclipsed by the benefits this extension provides in normal cases. I agree with you that there are only a few corner cases where evicting shared buffers by this utility would do harm, but I was wondering if we could avoid even those, say by having it use only available free buffers. I think currently there is no such interface, and inventing a new interface for this case doesn't seem reasonable unless we see some other use case for such an interface. With Regards, Amit Kapila. EnterpriseDB: http://www.enterprisedb.com
Re: [HACKERS] libpq: PQexec may block indefinitly
On Mon, May 26, 2014 at 1:34 PM, Dmitry Samonenko shreddingw...@gmail.com wrote: 1. A connection to the PostgreSQL server is made without an option to specify SO_RCVTIMEO and SO_SNDTIMEO. Why is that? Is setting socket timeouts considered harmful? 2. PQexec ultimately leads to PQwait, which after some function calls lands in pqSocketCheck and pqSocketPoll. These two functions take an end_time parameter, which is set to -1 in the PQexec scenario, and that leads to an infinite poll timeout in pqSocketPoll. Is it possible to implement a configurable timeout for PQexec calls? Are there existing features which should be used to handle situations like this? Have you tried using the cancel functionality: http://www.postgresql.org/docs/9.4/static/libpq-cancel.html Currently, I have changed the Linux kernel tcp4 stack counters responsible for retransmission, so the OS actually closes the socket after some period. This is detected by pqSocketPoll's poll() and libpq handles the situation correctly - an error is reported to my application. But it's just a workaround. There are certain TCP parameters which can be configured for connections: tcp_keepalives_idle, tcp_keepalives_interval, tcp_keepalives_count. With Regards, Amit Kapila. EnterpriseDB: http://www.enterprisedb.com
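To illustrate both suggestions concretely (a sketch with a hypothetical host and database name; the keepalives* options in the connection string are libpq's client-side counterparts of those server GUCs, and PQgetCancel()/PQcancel() are the cancel API linked above):

#include <stdio.h>
#include <libpq-fe.h>

int
main(void)
{
    /* Ask the OS to detect a dead peer instead of letting poll() block
     * forever: probe after 30s idle, every 10s, give up after 3 probes. */
    PGconn     *conn = PQconnectdb(
        "host=db.example.com dbname=testdb "
        "keepalives=1 keepalives_idle=30 "
        "keepalives_interval=10 keepalives_count=3");

    if (PQstatus(conn) != CONNECTION_OK)
    {
        fprintf(stderr, "connection failed: %s", PQerrorMessage(conn));
        PQfinish(conn);
        return 1;
    }

    /* Separately, a query stuck in PQexec() can be interrupted from
     * another thread (or a signal handler) with the cancel API. */
    PGcancel   *cancel = PQgetCancel(conn);
    char        errbuf[256];

    if (cancel != NULL && !PQcancel(cancel, errbuf, sizeof(errbuf)))
        fprintf(stderr, "cancel request failed: %s\n", errbuf);

    PQfreeCancel(cancel);
    PQfinish(conn);
    return 0;
}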
Re: [HACKERS] pg_stat directory and pg_stat_statements
On Thu, May 29, 2014 at 1:01 PM, Ashesh Vashi ashesh.va...@enterprisedb.com wrote: On Thu, May 29, 2014 at 8:52 AM, Fujii Masao masao.fu...@gmail.com wrote: On Thu, May 29, 2014 at 4:55 AM, Tomas Vondra t...@fuzzy.cz wrote: On 28.5.2014 19:52, Fujii Masao wrote: On Thu, May 29, 2014 at 12:37 AM, Peter Geoghegan p...@heroku.com wrote: On Wed, May 28, 2014 at 7:01 AM, Fujii Masao masao.fu...@gmail.com wrote: But the pg_stat_statements file is still saved under $PGDATA/global. Is this intentional or just an oversight? I think it's an oversight. OK, patch attached. I'm afraid that it's not okay to change the file layout in $PGDATA at this beta1 stage, because that change basically seems to need an initdb. Otherwise something like a "no such file or directory" error can happen. But in this case what we need to change is only the location of the pg_stat_statements permanent stats file. So, without an initdb, the server will not be able to find the pg_stat_statements stats file, but this is not so harmful. The only problem is that the pg_stat_statements stats collected in the past would disappear. OTOH, the server can keep running successfully from then on, and no critical data will disappear. Therefore I think we can commit this patch even at beta1. Thoughts? For HEAD, probably yes. But what about backpatching 9.3? I think no. So we should not backpatch this to 9.3. Just curious. Will it work in an upgrade scenario? You're concerned about the scenario using pg_upgrade? I'm not sure about the details of pg_upgrade. But if this didn't work properly, we would have gotten trouble reports when 9.3 (in which the pg_stat directory was introduced) was released, and I've not heard of any such trouble. Regards, -- Fujii Masao -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers