Re: [HACKERS] Allow WAL information to recover corrupted pg_controldata
> I guess my first question is: why do we need this? There are lots of things in the TODO list that someone wanted once upon a time, but they're not all actually important. Do you have reason to believe that this one is? It's been six years since that email, so it's worth asking if this is actually relevant.

As far as I know, pg_control is not WAL-protected, which means that if it gets corrupt for any reason (a disk crash during flush, so it is written partially), it might lead to failure in recovery of the database.

> AFAIR pg_controldata fits on a disk sector so it can not be half written.

It can be corrupt for some other reasons as well, like a torn disk sector. pg_resetxlog already has a mechanism to recover a corrupt pg_control file, so it is already considered that it can be corrupt in some cases. The suggested patch improves the logic to recover a corrupt control file. That is the reason I felt it would be relevant to do this patch.

From: Cédric Villemain [ced...@2ndquadrant.com]
Sent: Saturday, June 16, 2012 2:19 AM
To: pgsql-hackers@postgresql.org
Cc: Amit kapila; 'Robert Haas'
Subject: Re: [HACKERS] Allow WAL information to recover corrupted pg_controldata

On Friday, June 15, 2012 at 03:27:11, Amit Kapila wrote:
>> I guess my first question is: why do we need this? There are lots of things in the TODO list that someone wanted once upon a time, but they're not all actually important. Do you have reason to believe that this one is? It's been six years since that email, so it's worth asking if this is actually relevant.
> As far as I know the pg_control is not WAL protected, which means if it gets corrupt due to any reason (disk crash during flush, so written partially), it might lead to failure in recovery of database.

AFAIR pg_controldata fits on a disk sector so it can not be half written. So the user can use pg_resetxlog to recover the database. Currently pg_resetxlog works on guessed values for pg_control.
However, this implementation can improve the logic: instead of guessing, it can try to regenerate the values from WAL. This can allow better recovery in certain circumstances.

> The deadline for patches for this CommitFest is today, so I think you should target any work you're starting now for the NEXT CommitFest.

Oh, I am sorry; as this was my first time, I was not fully aware of the deadline. However, I still seek your opinion on whether it makes sense to work on this feature.

-----Original Message-----
From: Robert Haas [mailto:robertmh...@gmail.com]
Sent: Friday, June 15, 2012 12:40 AM
To: Amit Kapila
Cc: pgsql-hackers@postgresql.org
Subject: Re: [HACKERS] Allow WAL information to recover corrupted pg_controldata

On Thu, Jun 14, 2012 at 11:39 AM, Amit Kapila amit.kap...@huawei.com wrote:
> I am planning to work on the below Todo list item for this CommitFest: Allow WAL information to recover corrupted pg_controldata http://archives.postgresql.org/pgsql-patches/2006-06/msg00025.php

The deadline for patches for this CommitFest is today, so I think you should target any work you're starting now for the NEXT CommitFest.

I wanted to confirm my understanding of the work involved in this patch. The existing patch has the following problems:
1. A memory leak, and the linked-list code path is not proper.
2. The lock check for whether the server is already running is removed in the patch, which needs to be reverted.
3. Refactoring of the code.

Apart from the above, what I understood from the patch is that its intention is to generate values for ControlFile using the WAL logs when the -r option is used. The change from the current algorithm is: if the control file is corrupt, which essentially means ReadControlFile() will return false, then it should generate the values (checkPointCopy, checkPoint, prevCheckPoint, state) from WAL if the -r option is enabled. Also, with -r it does not need to call FindEndOfXLOG(), as that work is achieved by the above point.
It will just rewrite the control file and not perform the other resets.

The algorithm for restoring the pg_control values from old xlog files:
1. Retrieve all of the active xlog files from the xlog directory into a list, in increasing order according to their timeline, log id, and segment id.
2. Search the list to find the oldest xlog file of the latest timeline.
3. Search the records from the oldest xlog file of the latest timeline to the latest xlog file of the latest timeline; if a checkpoint record is found, update the latest checkpoint and previous checkpoint.

Apart from the above, some changes in code will be required after the xlog patch by Heikki. Tell me if my understanding is correct.
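Steps 1 and 2 of the algorithm above can be modelled in a few lines (an illustrative Python sketch of the ordering only, not the actual pg_resetxlog C code; it assumes the 24-hex-digit timeline/log/segment WAL file naming used by servers of that era):

```python
# Illustrative model of steps 1-2: order WAL segment files by
# (timeline, log id, segment id), then pick the oldest segment
# belonging to the newest timeline.

def parse_segment_name(name):
    """Split a 24-hex-digit WAL file name into (tli, log, seg)."""
    if len(name) != 24:
        raise ValueError("not a WAL segment file name: %s" % name)
    return (int(name[0:8], 16), int(name[8:16], 16), int(name[16:24], 16))

def oldest_segment_of_latest_timeline(names):
    """Step 1: sort ascending; step 2: first segment of the max timeline."""
    segs = sorted(parse_segment_name(n) for n in names)
    latest_tli = max(tli for tli, _, _ in segs)
    tli, log, seg = min(s for s in segs if s[0] == latest_tli)
    return "%08X%08X%08X" % (tli, log, seg)
```

Step 3 would then scan forward from that segment looking for checkpoint records, which requires the real WAL-reading machinery and is not sketched here.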
Re: [HACKERS] Resource Owner reassign Locks
> I don't think so. C doesn't ref-count its pointers.

You are right; I had misunderstood.

> I don't think that lock tags have good human readable formats, and just a pointer dump probably wouldn't be much use when something that can never happen has happened. But I'll at least add a reference to the resource owner if this stays in.

I have checked in the lock.c file for a message where lock tags have been used:

    elog(ERROR, "lock %s on object %u/%u/%u is already held",
         lockMethodTable->lockModeNames[lockmode],
         lock->tag.locktag_field1, lock->tag.locktag_field2,
         lock->tag.locktag_field3);

This can give more information about the erroneous lock.

From: Jeff Janes [jeff.ja...@gmail.com]
Sent: Saturday, June 16, 2012 3:21 AM
To: Amit kapila
Cc: pgsql-hackers
Subject: Re: [HACKERS] Resource Owner reassign Locks

On Mon, Jun 11, 2012 at 9:30 PM, Amit Kapila amit.kap...@huawei.com wrote:
>> Yes, that means the list has over-flowed. Once it is over-flowed, it is now invalid for the remainder of the life of the resource owner.
> Don't we need any logic to clear the reference of the locallock in the owner->locks array?

I don't think so. C doesn't ref-count its pointers.

> MAX_RESOWNER_LOCKS - How did you arrive at the number 10 for it? Is there any specific reason for 10?

I instrumented the code to record the maximum number of locks held by a resource owner, and report the max when it was destroyed. (That code is not in this patch.) During a large pg_dump, the vast majority of the resource owners had a maximum of 2 locks, with some more at 4 and 6. Then there was one resource owner, for the top-level transaction, at tens or hundreds of thousands (basically one for every lockable object). There was little between 6 and this top-level number, so I thought 10 was a good compromise: safely above 6, but not so large that searching through the list itself was likely to bog down. Also, Tom independently suggested the same number.

> Should it emit a FATAL rather than an ERROR?
I thought ERROR was sufficient to make the backend quit, as it is not clear how it could meaningfully recover.

I am not able to visualize any valid scenario in which it can happen unless some corruption happens. If this happens, the user can close all statements and abort the transactions. According to me, ERROR is okay. However, the message "Can't find lock to remove" could be better if there were information about the resource owner and the lock.

I think we might end up changing that entirely once someone more familiar with the error-handling mechanisms takes a look at it. I don't think that lock tags have good human-readable formats, and just a pointer dump probably wouldn't be much use when something that can never happen has happened. But I'll at least add a reference to the resource owner if this stays in.

Thanks, Jeff

-- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
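The overflow behaviour Jeff describes can be modelled in a few lines (a toy Python sketch of the semantics under discussion, not the patch's C code; the class and method names are invented for illustration):

```python
MAX_RESOWNER_LOCKS = 10   # the compromise value discussed above

class ResourceOwnerLockCache:
    """Toy model: remember up to MAX_RESOWNER_LOCKS locks per owner.
    On overflow the cache becomes permanently invalid for the rest of
    the owner's life, and release must fall back to scanning the
    global lock table."""

    def __init__(self):
        self.locks = []
        self.overflowed = False

    def remember(self, lock):
        if self.overflowed:
            return                     # already invalid; nothing to track
        if len(self.locks) >= MAX_RESOWNER_LOCKS:
            self.overflowed = True     # never trusted again
            self.locks = []
        else:
            self.locks.append(lock)

    def forget(self, lock):
        if self.overflowed:
            return                     # caller scans the lock table instead
        self.locks.remove(lock)        # raises if absent: "can't find lock"
```

This makes visible why no clearing logic is needed after overflow: once `overflowed` is set, the list's contents are simply never consulted again.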
Re: [HACKERS] libpq compression
On Sat, Jun 16, 2012 at 12:55 PM, Tom Lane t...@sss.pgh.pa.us wrote:
> Magnus Hagander mag...@hagander.net writes:
>> Yes, but there's also a lot of such awkward logic we need to add if we *do* go with the SSL library doing the compression: For example, we can no longer trust the SSL library to always do encryption, since we specifically want to support null encryption.
> True, but are you sure we don't need to do that anyway? What happens today, if a non-libpq client connects with SSL and specifies null encryption?

openssl rejects the connection unless you have explicitly allowed NULL encryption in ssl_ciphers. Which is the only sensible default.

>> And we currently have no way to specify different encryption options on a per-host basis, which is something we'd have to do (e.g. I want to be able to say that subnet x requires encryption with these encryption methods, and subnet y doesn't require encryption but should do compression).
> [ shrug... ] Having that sort of control over a homebrew compression solution will *also* require a lot of control logic that does not exist today.

The important part isn't really being able to control the compression in this. It's that we're overloading a convenience feature (compression) onto the settings of a security feature (encryption). Which leads to both complex processing, and also a fairly high risk of accidentally configuring what you wouldn't want, unless we change the interface to make it look like separate things even if they aren't.

>> So there's quite a bit of complexity that needs to be put in there just to deal with the fact that we're using SSL to do compression, if we want to support it in a way that's not hackish.
> It's not obvious to me that we actually *need* anything except the ability to recognize that a null-encrypted SSL connection probably shouldn't be treated as matching a hostssl line; which is not something that requires any fundamental rearrangements, since it only requires an after-the-fact check of what was selected.
> Things like "subnet x requires encryption with these encryption methods" are features that are sensible with our existing feature set. But we don't have that now and nobody has asked for it, so I think you are moving the goalposts rather unfairly by claiming that a compression-related patch needs to add it.

Maybe I spelled it out wrong. It does require it, insofar as if we want to use this for compression, we must *always* enable openssl on the connection. So the "with these encryption methods" part boils down to "NULL encryption only" or "whatever other standards I have for encryption". We don't need the ability to change the "whatever other standards" per subnet, but we do need to control the accept-NULL-encryption part on a per-subnet basis.

It also risks some level of information leak - assuming someone connects with NULL encryption and we don't support it, unless we do something particular about it, the error message will go out in cleartext. Today, you will get a client-generated error message and no actual message crosses the wire in cleartext.

It's not that we can't deal with those things. It's just that it's going to take some work, and some careful thought about exactly which parts can be exposed over NULL-encrypted connections.

-- Magnus Hagander Me: http://www.hagander.net/ Work: http://www.redpill-linpro.com/
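For context, there is no per-subnet knob today; the only existing control is the global ssl_ciphers GUC, which is what makes NULL encryption an all-or-nothing choice. A postgresql.conf sketch (OpenSSL cipher-list syntax; the exact suite names available depend on the OpenSSL build):

```
# Reject unauthenticated/unencrypted suites (roughly the safe default):
ssl_ciphers = 'DEFAULT:!aNULL:!eNULL'

# Allowing compression-only SSL would require explicitly enabling a
# NULL cipher -- globally, for every host:
#ssl_ciphers = 'NULL-SHA:DEFAULT'
```

This illustrates Magnus's point: the knob that would have to be loosened is global, not per-host.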
Re: [HACKERS] libpq compression
On Sat, Jun 16, 2012 at 12:40 PM, Tom Lane t...@sss.pgh.pa.us wrote:
> Marko Kreen mark...@gmail.com writes:
>> On Sat, Jun 16, 2012 at 6:39 AM, Magnus Hagander mag...@hagander.net wrote:
>>> Fair enough if we decide that - but we should make that decision knowing that we're leaving the JDBC and .Net people in a bad position where they are not likely to be able to implement this. The JDBC people have a theoretical chance if the JDK is open. The .Net people are stuck with schannel, which doesn't support it at this point. It might well do in the future (since it's in the standard); but they're at the mercy of Microsoft.
>> Both Java and C# are open-source enough that anybody can take an existing SSL implementation and add compression to it, then distribute it as an improved SSL library.
> Possibly more to the point: that is work they might have to do, if nobody else steps up to the plate --- and if they do end up doing it, it could benefit other projects too. On the other hand, if we roll our own transport compression solution, that is work they *will* have to do, with no chance of sharing the effort with other projects.

True - provided said upstream (Oracle in the Java case) is interested in accepting the patches... If they end up having to port one of the compression algorithms, let's take LZ4 as an example, then they can certainly release that as open source under a compatible license, thus making it available to others. Though that's not necessarily that relevant - LZ4 already has a C# implementation for .Net and a JNI wrapper for Java. Snappy even has a native Java implementation. So if we went down that road, there wouldn't *be* a need to implement it. Just the protocol parts themselves, which are - compared to implementing the actual compression in either scheme - trivial.

> BTW, as far as the .Net case goes, it took only a moment's googling to find this: http://openssl-net.sourceforge.net/ which is a .Net wrapper around real OpenSSL.
> It doesn't appear to provide wrappers for the compression selection functions, but surely that's just a lack of round tuits, not that it would take more than five minutes to add them.

That would then lose all the advantages that npgsql gets from schannel, such as integrated certificate management. So it can be done - but it would AFAICT require a fairly large rearchitecture of how security is handled, it would add a license-incompatible requirement, and it would lose other features. But it can be done.

-- Magnus Hagander Me: http://www.hagander.net/ Work: http://www.redpill-linpro.com/
[HACKERS] Combine non-recursive and recursive CTEs?
I'm not sure if this is something I don't know how to do, or if it's something we simply can't do, or if it's something we could do but the syntax can't handle :-) Basically, I'd like to combine a recursive and a non-recursive CTE in the same query. If I do it non-recursive, I can do something like:

WITH t1(z) AS (
    SELECT a FROM x
), t2 AS (
    SELECT z FROM t1
)
SELECT * FROM t2;

But what if I want t2 to be recursive? Trying something like:

WITH t1(z,b) AS (
    SELECT a,b FROM x
), RECURSIVE t2(z,b) AS (
    SELECT z,b FROM t1 WHERE b IS NULL
    UNION ALL
    SELECT z,b FROM t2 INNER JOIN t1 ON t2.b=t1.z
)

I get a syntax error on the RECURSIVE. Is there any other position in this query that I can put the RECURSIVE in order for it to get through?

-- Magnus Hagander Me: http://www.hagander.net/ Work: http://www.redpill-linpro.com/
Re: [HACKERS] Allow WAL information to recover corrupted pg_controldata
Amit kapila amit.kap...@huawei.com writes:
> AFAIR pg_controldata fit on a disk sector so it can not be half written.
> It can be corrupt due to some other reasons as well like torn disk sector.

Torn disk sector? Please, this is nonsense. Disks cannot write half a sector and then stop. A sufficiently badly designed drive might attempt to start a write when it didn't have enough power left to finish ... but the result of that would be a corrupt sector with a non-matching CRC, not one that read back okay but contained erroneous data.

> The suggested patch improves the logic to recover corrupt control file. So that is the reason I felt it will be relevant to do this patch.

Well, we invented pg_resetxlog with the thought that it might be useful for such situations, but I'm not sure offhand that we've ever seen a field report of corrupted pg_control files. For instance, a quick search in the archives for "incorrect checksum in control file" turns up only cases of pilot error, such as supposing that a 32-bit database could be used with a 64-bit server or vice versa. Actual hardware failures on the pg_control file could be expected to result in something like "could not read from control file: I/O error", which I find no evidence for at all in the archives. Before adding new code to improve the situation, it would be good to have (a) evidence that there's a problem worth solving, and (b) a theory as to what likely-to-occur cases the new code is going to make better, while not making things worse in other likely-to-occur cases. Case in point here is that it's not immediately obvious that we should trust the contents of WAL more than pg_control --- the former gets a whole lot more write traffic and hence has many more opportunities for failure. At the moment I don't see that we have either (a) or (b), so I think it's pretty dubious to be making any changes of this sort.
regards, tom lane
Re: [HACKERS] Combine non-recursive and recursive CTEs?
On Jun 16, 2012, at 8:27 AM, Magnus Hagander wrote:
> I'm not sure if this is something I don't know how to do, or if it's something we simply can't do, or if it's something we could do but the syntax can't handle :-) Basically, I'd like to combine a recursive and a non-recursive CTE in the same query. If I do it non-recursive, I can do something like: WITH t1(z) AS ( SELECT a FROM x ), t2 AS ( SELECT z FROM t1 ) SELECT * FROM t2; But what if I want t2 to be recursive? Trying something like: WITH t1 (z,b) AS ( SELECT a,b FROM x ), RECURSIVE t2(z,b) AS ( SELECT z,b FROM t1 WHERE b IS NULL UNION ALL SELECT z,b FROM t2 INNER JOIN t1 ON t2.b=t1.z ) I get a syntax error on the RECURSIVE. Is there any other position in this query that I can put the RECURSIVE in order for it to get through?

hm, this is interesting ...

cat /tmp/a.sql
WITH y AS ( SELECT 1 AS n),
g AS (WITH RECURSIVE x(n) AS (
        SELECT (SELECT n FROM y) AS n
        UNION ALL
        SELECT n + 1 AS n FROM x WHERE n < 10))
SELECT * FROM g;

Hans-Jurgen-Schonigs-MacBook-Pro:sql hs$ psql test < /tmp/a.sql
ERROR:  syntax error at or near ")"
LINE 8: WHERE n < 10))

this gives a syntax error as well ... if my early morning brain is correct this should be a proper statement ...

regards, hans

-- Cybertec Schönig & Schönig GmbH, Gröhrmühlgasse 26, A-2700 Wiener Neustadt, Austria, Web: http://www.postgresql-support.de
Re: [HACKERS] Combine non-recursive and recursive CTEs?
Magnus Hagander mag...@hagander.net writes:
> Basically, I'd like to combine a recursive and a non-recursive CTE in the same query.

Just mark them all as recursive. There's no harm in marking a CTE as recursive when it isn't really.

> Trying something like: WITH t1 (z,b) AS ( SELECT a,b FROM x ), RECURSIVE t2(z,b) AS ( SELECT z,b FROM t1 WHERE b IS NULL UNION ALL SELECT z,b FROM t2 INNER JOIN t1 ON t2.b=t1.z ) I get a syntax error on the RECURSIVE.

The SQL spec says RECURSIVE can only appear immediately after WITH, so it necessarily applies to all the CTEs in the WITH list. The reason why it's like that is that RECURSIVE affects the visibility rules for which CTEs can refer to which other ones. I think the SQL committee would have done better to keep the two concepts separate, but they didn't ...

regards, tom lane
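Tom's suggestion applied to Magnus's query would look roughly like this (a sketch; table `x` and the column names are from the original mail, and the join direction is written to walk from the NULL-parent roots to their children, which is presumably what was intended):

```sql
-- RECURSIVE goes right after WITH and covers the whole list;
-- marking the plain CTE t1 as recursive too is harmless.
WITH RECURSIVE t1(z, b) AS (
    SELECT a, b FROM x
), t2(z, b) AS (
    SELECT z, b FROM t1 WHERE b IS NULL
    UNION ALL
    SELECT t1.z, t1.b FROM t2 INNER JOIN t1 ON t1.b = t2.z
)
SELECT * FROM t2;
```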
[HACKERS] Pg default's verbosity?
Hello pgdev,

(Second attempt) I've conducted a statistical study about PostgreSQL use in OSS. One of the results is that quite a few projects have errors in their SQL setup scripts which lead to some statements being ignored - typically some ADD CONSTRAINT statements which do not change the database schema from a functional point of view, or syntax errors (typically a mysql syntax...) that result in missing tables, but which are not found if the application is not fully tested.

I think that there are two reasons why these errors are not caught by application developers:

(1) the default verbosity is set to notice, which is much too high. The users just get used to seeing a lot of messages on loading an sql script, and to ignoring them, so that errors are just hidden in the flow of notices. I think that a better default setting would be warning, that is, messages that require some attention from the developer.

(2) the default behavior of psql on errors is to keep going. Developers of SQL scripts that are expected to work should be advised to:
- encourage application devs to set ON_ERROR_STOP and/or use a global transaction in their script.
- provide a simple/short option to do that from the command line; basically that could be an enhanced -1, NOT restricted to -f, but that would work on standard input as well.

sh> psql -1 -f setup.sql   # -1 does work here
sh> psql -1 < setup.sql    # -1 does not apply to stdin stuff...

So I would suggest the following todos:
1 - change the default verbosity to warning.
2 - change -1 to work on stdin as well instead of being ignored, or provide another option that would do that.

-- Fabien Coelho - coe...@cri.ensmp.fr
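The two recommendations above can already be combined on the command line today (a sketch; `setup.sql` and the database name `mydb` are placeholders, and the stdin wrapper works around the current -1 limitation described above):

```shell
# Fail fast: stop at the first error and wrap the script in one transaction.
psql -v ON_ERROR_STOP=1 -1 -f setup.sql mydb

# -1 is ignored for stdin today, so wrap the transaction by hand:
( echo 'BEGIN;'; cat setup.sql; echo 'COMMIT;' ) | psql -v ON_ERROR_STOP=1 mydb
```

With ON_ERROR_STOP set, psql also exits non-zero on failure, so broken setup scripts become visible to build and test harnesses.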
[HACKERS] compare lower case and upper case when encoding is utf-8
Hi hackers, I found that lower case is less than upper case when the db is created with utf8. I tried the following:

locale en_US.utf8:  'A' < 'a' is false
locale ja_JP.utf8:  'A' < 'a' is true
locale zh_CN.utf8:  'A' < 'a' is false

Under Windows:
locale Chinese_China:  'A' < 'a' is false

I am not sure whether this is normal or not. But in Chinese, lower case should be greater than upper case, the same as in locale C. I made some code to try to fix it. It seems to work fine.

Quan Zongliang
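The comparisons above can be reproduced one at a time with an explicit COLLATE clause (a sketch; which collations exist in pg_collation depends on the operating system):

```sql
-- Comparisons as reported above:
SELECT 'A' < 'a' COLLATE "en_US.utf8";   -- false per the report
SELECT 'A' < 'a' COLLATE "ja_JP.utf8";   -- true per the report
SELECT 'A' < 'a' COLLATE "C";            -- true: 'A' (0x41) byte-compares below 'a' (0x61)
```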
Re: [HACKERS] Combine non-recursive and recursive CTEs?
On Sat, Jun 16, 2012 at 2:52 PM, Tom Lane t...@sss.pgh.pa.us wrote:
> Magnus Hagander mag...@hagander.net writes:
>> Basically, I'd like to combine a recursive and a non-recursive CTE in the same query.
> Just mark them all as recursive. There's no harm in marking a CTE as recursive when it isn't really.

Hah. I could've sworn I tried that and got the typical error of "you need to use the union construct for recursive queries". But clearly I must've typoed something in that one, because when I did that over again, it now worked perfectly... Thanks!

-- Magnus Hagander Me: http://www.hagander.net/ Work: http://www.redpill-linpro.com/
Re: [HACKERS] [RFC][PATCH] Logical Replication/BDR prototype and architecture
Hi Robert,

On Friday, June 15, 2012 10:03:38 PM Robert Haas wrote:
> On Thu, Jun 14, 2012 at 4:13 PM, Andres Freund and...@2ndquadrant.com wrote:
>> I don't plan to throw in loads of conflict resolution smarts. The aim is to get to the place where all the infrastructure is there so that a MM solution can be built by basically plugging in a conflict resolution mechanism. Maybe providing a very simple one. I think without in-core support it's really, really hard to build a sensible MM implementation. Which doesn't mean it has to live entirely in core.
> Of course, several people have already done it, perhaps most notably Bucardo.

Bucardo certainly is nice, but it's not usable for many things just from an overhead perspective.

> Anyway, it would be good to get opinions from more people here. I am sure I am not the only person with an opinion on the appropriateness of trying to build a multi-master replication solution in core or, indeed, the only person with an opinion on any of these other issues. It is not good for those other opinions to be saved for a later date.

Agreed.

>> Hm. Yes, you could do that. But I have to say I don't really see a point. Maybe the fact that I do envision multimaster systems at some point is clouding my judgement though, as it's far less easy in that case.
> Why? I don't think that particularly changes anything.

Because it makes conflict detection very hard. I also don't think it's a feature worth supporting. What's the use-case of updating records you cannot properly identify?

>> It also complicates the wal format as you now need to specify whether you transport a full or a primary-key-only tuple...
> Why? If the schemas are in sync, the target knows what the PK is perfectly well. If not, you're probably in trouble anyway.

True. There already was the wish (from Kevin) of having the option of transporting full before/after images anyway, so the wal format might want to be able to represent that.
>> I think though that we do not want to enforce that mode of operation for tightly coupled instances. For those I was thinking of using command triggers to synchronize the catalogs. One of the big screwups of the current replication solutions is exactly that you cannot sensibly do DDL, which is not a big problem if you have a huge system with loads of different databases and very knowledgeable people et al., but at the beginning it really sucks. I have no problem with making one of the nodes the schema master in that case. Also I would like to avoid the overhead of the proxy instance for use-cases where you really want one node replicated as fully as possible, with the slight exception of being able to have summing tables, different indexes et al.
> In my view, a logical replication solution is precisely one in which the catalogs don't need to be in sync. If the catalogs have to be in sync, it's not logical replication. ISTM that what you're talking about is sort of a hybrid between physical replication (pages) and logical replication (tuples) - you want to ship around raw binary tuple data, but not entire pages.

Ok, that's a valid point. Simon argued at the cluster summit that everything that's not physical is logical. Which has some appeal, because it seems hard to agree on what exactly logical rep is. So definition by exclusion makes kind of sense ;)

I think what you categorized as "hybrid logical/physical" rep solves an important use-case that's very hard to solve at the moment. Before my 2ndQuadrant days I had several clients who had huge problems using the trigger-based solutions because their overhead simply was too big a burden on the master. They couldn't use SR either because every consuming database kept loads of local data. I think such scenarios are getting more and more common.

> The problem with that is it's going to be tough to make robust. Users could easily end up with answers that are total nonsense, or probably even crash the server.

Why?
To step back and talk about DDL more generally, you've mentioned a few times the idea of using an SR instance that has been filtered down to just the system catalogs as a means of generating logical change records. However, as things stand today, there's no reason to suppose that replicating anything less than the entire cluster is sufficient. For example, you can't translate enum labels to strings without access to the pg_enum catalog, which would be there, because enums are built-in types. But someone could supply a similar user-defined type that uses a user-defined table to do those lookups, and now you've got a problem. I think this is a contractual problem, not a technical one. From the point of view of logical replication, it would be nice if type output functions were basically guaranteed to look at nothing but the datum they get passed as an argument, or at the very least nothing other than the system catalogs, but there is no such guarantee. And, without
[HACKERS] REVIEW: Optimize referential integrity checks (todo item)
On 12 February 2012 02:06, Vik Reykja vikrey...@gmail.com wrote:
> I decided to take a crack at the todo item created from the following post: http://archives.postgresql.org/pgsql-performance/2005-10/msg00458.php The attached patch makes the desired changes in both code and function naming. It seemed quite easy to do but wasn't marked as easy on the todo, so I'm wondering if I've missed something. All regression tests pass.

Here's my review of this patch.

Basic stuff:
* Patch applies OK (some offsets). BTW, I had no problems applying both the original patch and Chetan Suttraway's version. The only difference between the patches seems to be that the original is in context format, and Chetan Suttraway's is in unified format. Which format do hackers actually prefer? The wiki page http://wiki.postgresql.org/wiki/Working_with_Git#Context_diffs_with_Git suggests context format, but then the linked example http://wiki.postgresql.org/wiki/Creating_Clean_Patches is in unified format. Do people care, or are both formats OK?
* Compiles cleanly with no warnings.
* Regression tests pass. The regression tests have not been updated. I think that's fair enough - I don't see a way to test this in a regression test - but I did some testing (see below) to confirm the expected behaviour.
* No doc changes needed.

What it does:
The primary benefit this patch offers is to prevent unnecessary queuing of RI triggers in the case of updates to a FK table where the old and new FK values are both NULL. It does this by effectively replacing the existing key equality checks with IS [NOT] DISTINCT FROM checks. This seems like a worthwhile optimisation, because I think that it is fairly common to have NULLs in FK columns, and then update some other column. The patch also prevents unnecessary queuing of RI triggers when the PK table is updated and the old and new values are both NULL, but that seems like a much less common case.
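The semantic core of the change - NULL-safe key comparison - is visible in plain SQL:

```sql
-- Plain '=' yields unknown for NULLs, so "old key = new key" cannot
-- prove the key is unchanged when both sides are NULL:
SELECT NULL = NULL;                     -- result: NULL (unknown)

-- IS NOT DISTINCT FROM treats two NULLs as equal, so an FK that stays
-- NULL counts as unchanged and the RI trigger can be skipped:
SELECT NULL IS NOT DISTINCT FROM NULL;  -- result: true
```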
I've looked over the code changes fairly closely, and I believe that this is a safe change. Technically, I think the changes to ri_OneKeyEqual() and ri_AllKeysUnequal() are unnecessary, since they can only be called from the trigger functions in the case where all the old values are non-NULL, hence the new versions end up behaving the same. However, I think it makes sense for all these functions to be consistent. Talking of consistency, I wonder if RI_FKey_keyequal_upd_pk() and RI_FKey_keyequal_upd_fk() ought to be renamed to RI_FKey_keyunchanged_upd_pk() and RI_FKey_keyunchanged_upd_fk()?

Testing:
I tested using the following tables:

CREATE TABLE pk_table ( a int PRIMARY KEY );
INSERT INTO pk_table SELECT * FROM generate_series(1,10);

CREATE TABLE fk_table ( a int PRIMARY KEY, b int, c int, d int, e int REFERENCES pk_table(a) );
INSERT INTO fk_table SELECT i, i*2, i*3, i*4, CASE WHEN i%10 = 0 THEN i END FROM generate_series(1,10) g(i);

(i.e., FK populated in 10% of rows)

Then in HEAD:

EXPLAIN ANALYSE UPDATE fk_table SET b=b+1, c=c+1, d=d+1;
                                                 QUERY PLAN
------------------------------------------------------------------------------------------------------------
 Update on fk_table  (cost=0.00..2300.00 rows=10 width=26) (actual time=1390.037..1390.037 rows=0 loops=1)
   ->  Seq Scan on fk_table  (cost=0.00..2300.00 rows=10 width=26) (actual time=0.010..60.841 rows=10 loops=1)
 Trigger for constraint fk_table_e_fkey: time=210.184 calls=9
 Total runtime: 1607.626 ms
(4 rows)

So the RI trigger is fired 9 times, for the unchanged NULL FK rows.
With this patch, the RI trigger is not fired at all:

EXPLAIN ANALYSE UPDATE fk_table SET b=b+1, c=c+1, d=d+1;
                                                 QUERY PLAN
------------------------------------------------------------------------------------------------------------
 Update on fk_table  (cost=0.00..2300.00 rows=10 width=26) (actual time=1489.640..1489.640 rows=0 loops=1)
   ->  Seq Scan on fk_table  (cost=0.00..2300.00 rows=10 width=26) (actual time=0.010..66.328 rows=10 loops=1)
 Total runtime: 1489.679 ms
(3 rows)

Similarly, if I update the FK column in HEAD, the RI trigger is fired for every row:

EXPLAIN ANALYSE UPDATE fk_table SET e=e-1;
                                                 QUERY PLAN
------------------------------------------------------------------------------------------------------------
 Update on fk_table  (cost=0.00..1800.00 rows=10 width=26) (actual time=1565.148..1565.148 rows=0 loops=1)
   ->  Seq Scan on fk_table  (cost=0.00..1800.00 rows=10 width=26) (actual time=0.010..42.725 rows=10 loops=1)
 Trigger for constraint fk_table_e_fkey: time=705.962 calls=10
 Total runtime: 2279.408 ms
(4 rows)

whereas with this patch it is only fired for the non-NULL FK rows that are changing:

EXPLAIN ANALYSE
[HACKERS] Re: [COMMITTERS] pgsql: Run pgindent on 9.2 source tree in preparation for first 9.3
On Sat, Jun 16, 2012 at 01:10:31AM -0400, Noah Misch wrote:
> On Fri, Jun 15, 2012 at 10:45:16PM -0400, Bruce Momjian wrote:
> > I have updated the pgindent README to use these Perl indent instructions:
> >
> >     find . -name \*.pl -o -name \*.pm | xargs perltidy \
> >         --backup-and-modify-in-place --opening-brace-on-new-line \
> >         --vertical-tightness=2 --vertical-tightness-closing=2 \
> >         --nospace-after-keyword=for --nospace-for-semicolon \
> >         --add-whitespace --delete-old-whitespace --paren-tightness=2 \
> >         --keep-old-blank-lines=2 --maximum-line-length=78 \
> >         --entab-leading-whitespace=4 --output-line-ending=unix
>
> I would lean against using --nospace-after-keyword=for. Not using it means we get wrong formatting when the for-loop conditions span multiple lines. Using it means we get wrong formatting (albeit less severe) on every for-loop. In any event, if we do use it for for-loops, we should probably use it for all control structure keywords.

Agreed, good point.

> Otherwise, I like this. As a last idle idea, how about putting the options in a configuration file and passing --profile= as the only option? Besides keeping you from copying a 7-line shell command, this has the benefit of ignoring any ~/.perltidyrc.

Also agreed, and change made. Perltidyrc now has:

    --add-whitespace
    --backup-and-modify-in-place
    --delete-old-whitespace
    --entab-leading-whitespace=4
    --keep-old-blank-lines=2
    --maximum-line-length=78
    --nospace-for-semicolon
    --opening-brace-on-new-line
    --output-line-ending=unix
    --paren-tightness=2
    --vertical-tightness=2
    --vertical-tightness-closing=2

-- Bruce Momjian  br...@momjian.us  http://momjian.us
   EnterpriseDB   http://enterprisedb.com

+ It's impossible for everything to be true. +

-- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] [patch] libpq one-row-at-a-time API
Demos:

https://github.com/markokr/libpq-rowproc-demos/blob/master/demo-onerow-sync.c
https://github.com/markokr/libpq-rowproc-demos/blob/master/demo-onerow-async.c

Few clarifications below.

On Fri, Jun 15, 2012 at 9:21 PM, Marko Kreen mark...@gmail.com wrote:
> Now, looking at the problem with some perspective, the solution is obvious: when in single-row mode, PQgetResult() must return a proper PGresult for that single row. And everything else follows from that.

Such an API is implemented in the attached patch:

* PQsetSingleRowMode(conn): sets single-row mode. The function can be called only after PQsend* and before any rows have arrived. This guarantees there will be no surprises for PQexec* users who expect the full result set at once. It also guarantees that the user will process the result set with a PQgetResult() loop, either sync or async. The next PQexec/PQsend call will reset the flag, so it is active only for the duration of processing the results of one command. Currently it returns FALSE if called in the wrong place and does nothing. The only question I see here is whether it should set an error state on the connection or not; that does not seem to be an improvement.

* PQgetRowData(): can be called instead of PQgetResult() to get raw row data in the buffer, for more efficient processing. This is an optional feature that provides the original row-callback promise of avoiding an unnecessary row-data copy.

* Although PQgetRowData() makes the callback API unnecessary, it is still fully compatible with it - the callback should not see any difference whether the result set is processed in single-row mode or in the old single-PGresult mode. Unless it wants to - it can check PGRES_TUPLES_OK vs. PGRES_SINGLE_TUPLE. PQgetResult() is compatible with callbacks; PQgetRowData() bypasses them.

-- marko
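For readers who have not looked at the patch, a rough pseudocode sketch of how a synchronous caller might drive the proposed API (an illustration of the description above, not code taken from the patch; see the linked demo-onerow-sync.c for the real thing):

```
/* pseudocode: synchronous single-row processing */
PQsendQuery(conn, "SELECT ...");
PQsetSingleRowMode(conn);        /* must be called before any rows arrive */
while ((res = PQgetResult(conn)) != NULL) {
    switch (PQresultStatus(res)) {
        case PGRES_SINGLE_TUPLE:
            /* process the one row carried by this PGresult */
            break;
        case PGRES_TUPLES_OK:
            /* zero-row terminal result: the query is complete */
            break;
        default:
            /* error handling */
            break;
    }
    PQclear(res);
}
```

PQsetSingleRowMode() and PGRES_SINGLE_TUPLE are the additions proposed here; the surrounding loop is the ordinary PQsendQuery/PQgetResult pattern, which is exactly why the API is hard to misuse.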
Re: [HACKERS] [patch] libpq one-row-at-a-time API
Marko Kreen mark...@gmail.com writes:
> Now, looking at the problem with some perspective, the solution is obvious: when in single-row mode, the PQgetResult() must return proper PGresult for that single row. And everything else follows that.
> * PQgetRowData(): can be called instead PQgetResult() to get raw row data in buffer, for more efficient processing. This is optional feature that provides the original row-callback promise of avoiding unnecessary row data copy.
> * Although PQgetRowData() makes callback API unnecessary, it is still fully compatible with it - the callback should not see any difference whether the resultset is processed in single-row mode or old single-PGresult mode. Unless it wants to - it can check PGRES_TUPLES_OK vs. PGRES_SINGLE_TUPLE.

I guess this raises the question of whether we ought to revert the row-callback patch entirely and support only this approach. IMO it is (barely) not too late to do that for 9.2, if we want to. If we don't want to, then this is just another new feature and should be considered for 9.3.

What I like about this is the greatly simpler and harder-to-misuse API. The only arguable drawback is that there's still at least one malloc/free cycle per tuple, imposed by the creation of a PGresult for each one, whereas the callback approach avoids that. But worrying about that could be considered to be vast overoptimization; the backend has certainly spent a lot more overhead than that generating the tuple.

regards, tom lane
Re: [HACKERS] libpq compression
Magnus Hagander mag...@hagander.net writes:
> On Sat, Jun 16, 2012 at 12:55 PM, Tom Lane t...@sss.pgh.pa.us wrote:
> > It's not obvious to me that we actually *need* anything except the ability to recognize that a null-encrypted SSL connection probably shouldn't be treated as matching a hostssl line; which is not something that requires any fundamental rearrangements, since it only requires an after-the-fact check of what was selected.
>
> Maybe I spelled it out wrong. It does require it insofar that if we want to use this for compression, we must *always* enable openssl on the connection. So the "with these encryption methods" choice boils down to NULL encryption only, or whatever other standards I have for encryption. We don't need the ability to change the "whatever other standards" per subnet, but we need to control the accept-NULL-encryption on a per-subnet basis.

After sleeping on it, I wonder if we couldn't redefine the existing "list of acceptable ciphers" option as the list of ciphers that are considered to provide encrypted transport. So you'd be allowed to connect with SSL using any unapproved cipher (including NULL), the backend just considers it as equivalent to a non-SSL connection for pg_hba purposes. Then no change is needed in any configuration stuff.

regards, tom lane
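The rule being proposed reduces to a set-membership test applied after the SSL handshake. A toy Python sketch of the decision (the cipher names and function are illustrative assumptions, not PostgreSQL code):

```python
# Illustrative list only; the real list would come from the existing
# "acceptable ciphers" configuration option.
APPROVED_CIPHERS = {"AES256-SHA", "AES128-SHA"}

def counts_as_ssl_for_pg_hba(using_ssl, negotiated_cipher):
    """A connection matches a 'hostssl' line only if it is SSL *and* its
    negotiated cipher is on the approved (actually-encrypting) list.
    NULL or unapproved ciphers are treated like plain 'host' connections."""
    return using_ssl and negotiated_cipher in APPROVED_CIPHERS

print(counts_as_ssl_for_pg_hba(True, "AES256-SHA"))  # True
print(counts_as_ssl_for_pg_hba(True, "NULL-SHA"))    # False: compression-only
print(counts_as_ssl_for_pg_hba(False, None))         # False
```

The point of the proposal is exactly this simplicity: a compression-only (NULL-cipher) connection is still allowed to connect, it just fails to qualify as "ssl" for pg_hba matching, so no new per-subnet configuration is needed.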
Re: [HACKERS] Pg default's verbosity?
Fabien COELHO coe...@cri.ensmp.fr writes:
> [ errors in SQL scripts fed to psql are easily missed ]
> So I would suggest the following todos:
> 1 - change the default verbosity to warning.

The argument for defaulting to NOTICE is the same as it's always been: that those messages are really intended for novices, and a pretty good definition of a novice is somebody who doesn't know how to (or that he should) change the verbosity setting. So if we don't show notices by default, they will be unavailable to exactly the people who need them. Your proposal does not overcome this argument.

Besides, I'm not convinced that changing client_min_messages in isolation would do much for the problem, because psql is still pretty chatty by itself; you really need -q to have any hope that important messages didn't scroll off your screen. Perhaps it would be sensible to have the -q switch also execute "set client_min_messages = warning", and recommend that people use that when running allegedly-debugged scripts?

> 2 - change -1 to work on stdin as well instead of being ignored, or provide another option that would do that.

Yeah, if that doesn't work already, it would be sane to make it do so, at least for non-tty stdin. It seems like a fairly bad idea for interactive stdin, though.

regards, tom lane
Re: [HACKERS] Backup docs
Magnus Hagander mag...@hagander.net writes:
> This is copied from the old documentation. It used to say "It is not necessary to be concerned about the amount of time elapsed between pg_start_backup and the start of the actual backup, nor between the end of the backup and pg_stop_backup". And the whole idea was to simplify the text at the beginning ;)

Oh I see, not your patch to fix then. I just quick-read the diff, as you can see.

> This is copied exactly from what it is today. I'm sure it can be improved, but it's not the goal of this patch. Let's not let perfection get in the way of improvement...

Same.

> It does, it's under "standalone hot backups". The second to last part of the patch.

Perfect then. Sorry for the noise,

regards,
-- Dimitri Fontaine http://2ndQuadrant.fr PostgreSQL : Expertise, Formation et Support
Re: [HACKERS] libpq compression
On Sat, Jun 16, 2012 at 11:15:30AM -0400, Tom Lane wrote:
> Magnus Hagander mag...@hagander.net writes:
> > On Sat, Jun 16, 2012 at 12:55 PM, Tom Lane t...@sss.pgh.pa.us wrote:
> > > It's not obvious to me that we actually *need* anything except the ability to recognize that a null-encrypted SSL connection probably shouldn't be treated as matching a hostssl line; which is not something that requires any fundamental rearrangements, since it only requires an after-the-fact check of what was selected.
> >
> > Maybe I spelled it out wrong. It does require it insofar that if we want to use this for compression, we must *always* enable openssl on the connection. So the "with these encryption methods" choice boils down to NULL encryption only, or whatever other standards I have for encryption. We don't need the ability to change the "whatever other standards" per subnet, but we need to control the accept-NULL-encryption on a per-subnet basis.
>
> After sleeping on it, I wonder if we couldn't redefine the existing "list of acceptable ciphers" option as the list of ciphers that are considered to provide encrypted transport. So you'd be allowed to connect with SSL using any unapproved cipher (including NULL), the backend just considers it as equivalent to a non-SSL connection for pg_hba purposes. Then no change is needed in any configuration stuff.
>
> regards, tom lane

+1. That is nice and clean.

Regards, Ken
Re: [HACKERS] [patch] libpq one-row-at-a-time API
On Sat, Jun 16, 2012 at 6:09 PM, Tom Lane t...@sss.pgh.pa.us wrote:
> I guess this raises the question of whether we ought to revert the row-callback patch entirely and support only this approach. IMO it is (barely) not too late to do that for 9.2, if we want to. If we don't want to, then this is just another new feature and should be considered for 9.3.

I think the row-callback is a dangerous API that does not solve any important problems. But I do like the 2-phase processing the rowproc patch introduced, and having a way to bypass the unnecessary malloc()+copy. So my preference would be to simply remove the callback API but keep the processing, and provide PQgetRowData() instead. Although the win it brings is significantly smaller thanks to single-row PQgetResult(), so if it does not sound interesting to others, it can be dropped.

The single-row processing is the important feature we need; the rest is extra.

-- marko
Re: [HACKERS] REVIEW: Optimize referential integrity checks (todo item)
Dean Rasheed dean.a.rash...@gmail.com writes:
> BTW, I had no problems applying both the original patch and Chetan Suttraway's version. The only difference between the patches seems to be that the original is in context format, and Chetan Suttraway's is in unified format.
> Which format do hackers actually prefer? The wiki page http://wiki.postgresql.org/wiki/Working_with_Git#Context_diffs_with_Git suggests context format, but then the linked example http://wiki.postgresql.org/wiki/Creating_Clean_Patches is in unified format. Do people care, or are both formats OK?

Some people find one or the other more readable. (I'm in the camp that says unified format is great for isolated single-line changes and utterly unreadable for anything more complex, but apparently there are people who prefer it.)

For detailed review/commit purposes, it doesn't matter that much as long as the patch applies cleanly, since it's easy to apply it and then get a diff in the other format if you prefer reading the other. However, if you're just hoping people will eyeball the patch in email and comment on it, readability matters. If the patch requires manual fixup in order to get it to apply anymore, readability is also a concern, since you're dependent on the committer not misinterpreting the hunks he has to patch in by hand.

regards, tom lane
Re: [HACKERS] REVIEW: Optimize referential integrity checks (todo item)
On 16 June 2012 18:04, Tom Lane t...@sss.pgh.pa.us wrote:
> Dean Rasheed dean.a.rash...@gmail.com writes:
> > BTW, I had no problems applying both the original patch and Chetan Suttraway's version. The only difference between the patches seems to be that the original is in context format, and Chetan Suttraway's is in unified format.
> > Which format do hackers actually prefer? The wiki page http://wiki.postgresql.org/wiki/Working_with_Git#Context_diffs_with_Git suggests context format, but then the linked example http://wiki.postgresql.org/wiki/Creating_Clean_Patches is in unified format. Do people care, or are both formats OK?
>
> Some people find one or the other more readable. (I'm in the camp that says unified format is great for isolated single-line changes and utterly unreadable for anything more complex, but apparently there are people who prefer it.)
>
> For detailed review/commit purposes, it doesn't matter that much as long as the patch applies cleanly, since it's easy to apply it and then get a diff in the other format if you prefer reading the other. However, if you're just hoping people will eyeball the patch in email and comment on it, readability matters. If the patch requires manual fixup in order to get it to apply anymore, readability is also a concern, since you're dependent on the committer not misinterpreting the hunks he has to patch in by hand.

OK thanks, that's good to know. I tend to find context format easier to read for large patches, but that's a highly subjective thing.

Regards, Dean
[HACKERS] Start of 2012-06 CommitFest
There hasn't been any discussion of who will manage this CF that I've heard. Since nature abhors a vacuum, I took some preliminary steps to get it started, and will take the role if nobody objects.

If you want to review a particular patch, go ahead and claim it in the CF application:

https://commitfest.postgresql.org/action/commitfest_view?id=14

If you're not sure which patch you want to review, please send me an email off-list with your areas of interest and a summary of your skill-set, so I can pick one for you.

There was a decision at the developers' meeting to ask each patch submitter to review patches of the same number and approximate complexity as they submit. If you haven't yet done so, please do. (Remember, it's fine to have multiple reviewers for a single patch.)

If you want to contribute to the development of PostgreSQL and you haven't yet reviewed any patches, please read this page:

http://wiki.postgresql.org/wiki/CommitFest

... and follow the appropriate links for more detail. You don't need to be a C coder to help. We need people to test, benchmark, and check documentation, too.

This CF is scheduled to run from the 15th of June to the 15th of July, so any new patches should be submitted to the next CF:

http://commitfest.postgresql.org/action/commitfest_view/open

-Kevin
[HACKERS] SQL standard changed behavior of ON UPDATE SET NULL/SET DEFAULT?
While looking at Vik Reykja's pending patch to improve the FK triggers by skipping processing when a NULL column didn't change, I started to wonder whether that really had no user-visible semantic effect. In particular, in ON UPDATE SET NULL/SET DEFAULT cases, it seemed like this could change the set of referencing columns that get set to NULL or to their defaults. So the next question was which behavior actually conforms to the SQL standard, and the answer to that is ... disturbing.

The code in ri_triggers.c was written against SQL92's definition of ON UPDATE SET NULL/SET DEFAULT, which is (ignoring the MATCH PARTIAL case, which we don't implement):

    6) If an update rule is specified and a non-null value of a ref-
       erenced column in the referenced table is updated to a value
       that is distinct from the current value of that column, then

       a) If <match type> is not specified or if FULL is specified, then

          ii) If the update rule specifies SET NULL, then

              Case:

              1) If <match type> is not specified, then in all matching
                 rows the referencing column that corresponds with the
                 referenced column is set to the null value.

              2) If <match type> specifies FULL, then in all matching
                 rows each referencing column is set to the null value.

          iii) If the update rule specifies SET DEFAULT, then in all
               matching rows the referencing column that corresponds
               with the referenced column is set to the default value
               specified in the General Rules of Subclause 11.5,
               <default clause>.

Note that only in the MATCH FULL + SET NULL case does it say to set *all* the referencing columns in each matching row. Otherwise, you are only supposed to change columns that correspond to referenced columns that were changed. It's notable that SET NULL and SET DEFAULT have different behaviors here.
On the other hand, in SQL:2008 I find (some boilerplate text omitted):

    10) If a non-null value of a referenced column RC in the referenced
        table is updated to a value that is distinct from the current
        value of RC, then, for every member F of the subtable family of
        the referencing table:

        Case:

        a) If M specifies SIMPLE or FULL, then

           Case:

           ii) If UR specifies SET NULL, then

               Case:

               1) If M specifies SIMPLE, then each matching row MR in F
                  is paired with the candidate replacement row NMR,
                  formed by copying MR and setting each referencing
                  column in the copy to the null value. MR is identified
                  for replacement by NMR in F.

               2) If M specifies FULL, then each matching row MR in F is
                  paired with the candidate replacement row NMR, formed
                  by copying MR and setting each referencing column in
                  the copy to the null value. MR is identified for
                  replacement by NMR in F.

           iii) If UR specifies SET DEFAULT, then each matching row MR
                in F is paired with the candidate replacement row NMR,
                formed by copying MR and setting each referencing column
                in the copy to the default value specified in the
                General Rules of Subclause 11.5, <default clause>. MR is
                identified for replacement by NMR in F.

So far as I can see, this says to set *all* referencing columns to nulls or their defaults, in all four cases, whether the corresponding referenced column was one that changed or not. This is very clearly different from what SQL92 says. It's also rather curious that they distinguish two cases for SET NULL when the texts are exactly alike.

It looks to me like this change occurred in SQL:2003, although SQL:1999's version of the text is such badly written pseudo-mathematical gobbledygook that it's a bit hard to tell which behavior they meant. However, neither of those specs list any change in referential constraint behavior as being an acknowledged incompatibility with the prior standard. Have the SQL committee simply failed to notice that in whacking this text around they changed the meaning?
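The practical difference between the two wordings is easiest to see on a two-column key. An illustrative Python sketch of the two readings of MATCH SIMPLE ... ON UPDATE SET NULL (my own paraphrase of the spec texts, not code from any implementation):

```python
def set_null_sql92(row, fk_cols, changed_ref_cols, correspondence):
    """SQL92 reading (MATCH SIMPLE): null out only the referencing columns
    whose corresponding referenced column actually changed."""
    out = dict(row)
    for fk in fk_cols:
        if correspondence[fk] in changed_ref_cols:
            out[fk] = None
    return out

def set_null_sql2008(row, fk_cols):
    """SQL:2008 reading: a matching row gets *all* referencing columns
    set to NULL, regardless of which referenced column changed."""
    out = dict(row)
    for fk in fk_cols:
        out[fk] = None
    return out

# FK (f1, f2) references pp(f1, f2); suppose only pp.f2 changed.
row = {"f1": 11, "f2": 22}
corr = {"f1": "pp.f1", "f2": "pp.f2"}
print(set_null_sql92(row, ["f1", "f2"], {"pp.f2"}, corr))
# -> {'f1': 11, 'f2': None}
print(set_null_sql2008(row, ["f1", "f2"]))
# -> {'f1': None, 'f2': None}
```

Under SQL92 the surviving (11, NULL) row still records part of the old key, whereas under SQL:2008 the whole key is wiped, which is exactly why the choice of wording is user-visible.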
Which behavior is actually implemented by other RDBMSes?

regards, tom lane
Re: [HACKERS] Pg default's verbosity?
Hello Tom, thanks for your answer.

> > So I would suggest the following todos:
> > 1 - change the default verbosity to warning.
>
> The argument for defaulting to NOTICE is the same as it's always been: that those messages are really intended for novices, and a pretty good definition of a novice is somebody who doesn't know how to (or that he should) change the verbosity setting. So if we don't show notices by default, they will be unavailable to exactly the people who need them. Your proposal does not overcome this argument.

I'm sceptical about what a real novice is expected to do about the incredible flow of useless information displayed when loading a significant script. For a start, s?he should be an incredibly fast reader:-) For a non-novice it just hides what is important and should be seen. However maybe it makes sense in interactive mode, as you suggest, so possibly this should be the real trigger to change the level of messages.

> Besides, I'm not convinced that changing client_min_messages in isolation would do much for the problem,

I agree with you, but it is a necessary step somewhere...

> because psql is still pretty chatty by itself; you really need -q to have any hope that important messages didn't scroll off your screen.

Hmmm, yes and no, in my opinion. "CREATE XXX" is a very short output, quite distinct from the output of a warning/error, which can be seen when messages are scrolled, even if the message cannot be read on the fly. I would know that something is not right.

> Perhaps it would be sensible to have the -q switch also execute "set client_min_messages = warning", and recommend that people use that when running allegedly-debugged scripts?

That could be useful. However I'm not sure that I would select -q when loading a big script; I'm happy to know that things are going on, and I would like to know if the script is stuck somewhere.

> > 2 - change -1 to work on stdin as well instead of being ignored, or provide another option that would do that.
>
> Yeah, if that doesn't work already,

I did check that it does not work with 9.1.3.

> it would be sane to make it do so, at least for non-tty stdin. It seems like a fairly bad idea for interactive stdin, though.

I agree that distinguishing interactive/non-interactive stdin is reasonable. So maybe the suggestion would be to distinguish 2 default settings:

- client_min_tty_messages = notice;  # or some other name...
- client_min_messages = warning;     # or some other name...

Moreover:
- -1 should work on stdin *when not interactive*
- -1 should be clearly advised when loading scripts... not sure where it should be in the documentation...
- I'm not sure about -q for this purpose, mostly because I would not use it by default

-- Fabien.
Re: [HACKERS] Streaming-only Remastering
On Fri, Jun 15, 2012 at 3:53 PM, Simon Riggs si...@2ndquadrant.com wrote:
> On 10 June 2012 19:47, Joshua Berkus j...@agliodbs.com wrote:
> > So currently we have a major limitation in binary replication, where it is not possible to remaster your system (that is, designate the most caught-up standby as the new master) based on streaming replication only. This is a major limitation because the requirement to copy physical logs over scp (or similar methods), manage and expire them more than doubles the administrative overhead of managing replication. This becomes even more of a problem if you're doing cascading replication.
>
> The major limitation was solved by repmgr close to 2 years ago now. So while you're correct that the patch to fix that assumed that archiving worked as well, it has been possible to operate happily without it.

Remastering is one of the biggest thorns in my side over the last year. I don't think it's a trivially mechanized issue yet, but I do need to get there, and probably a few alterations in Postgres would help, although I have not itemized what they are (rather, I was intending to work around problems with what I have today). But since it is apropos to this discussion, here's what I've been thinking along these lines:

Instead of using re-synchronization (e.g. repmgr in its relation to rsync), I intend to proxy and also inspect the streaming replication traffic, and then quiesce all standbys and figure out what node is farthest ahead. Once I figure out the node that is farthest ahead, if it is not a node that is eligible for promotion to the master, I need to exchange its changes to nodes that are eligible for promotion[0], and then promote one of those, repointing all other standbys to that node. This must all take place nominally within a second or thirty. Conceptually it is simple, but mechanically it's somewhat intense, especially in relation to the inconvenience of doing this incorrectly.
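The "which standby is farthest ahead" step amounts to comparing WAL positions across the quiesced nodes. A toy Python sketch of that selection (the node structure and field names are assumptions for illustration; only the LSN format is real):

```python
def parse_lsn(lsn):
    """Convert a PostgreSQL LSN such as '16/B374D848' into a comparable integer."""
    hi, lo = lsn.split("/")
    return (int(hi, 16) << 32) | int(lo, 16)

def pick_promotion_target(nodes):
    """Find the farthest-ahead node overall and the farthest-ahead node that
    is eligible for promotion. If they differ, the eligible node must first
    receive the missing changes from the ineligible one."""
    farthest = max(nodes, key=lambda n: parse_lsn(n["replay_lsn"]))
    eligible = [n for n in nodes if n["promotable"]]
    target = max(eligible, key=lambda n: parse_lsn(n["replay_lsn"]))
    return farthest, target

nodes = [
    {"name": "reporting", "replay_lsn": "16/B374D848", "promotable": False},
    {"name": "standby-a", "replay_lsn": "16/B3740000", "promotable": True},
    {"name": "standby-b", "replay_lsn": "16/B2000000", "promotable": True},
]
farthest, target = pick_promotion_target(nodes)
print(farthest["name"], target["name"])  # reporting standby-a
```

This mirrors footnote [0] below: a small reporting standby may happen to be farthest ahead while not being a sensible promotion target, which forces the catch-up exchange before promotion.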
I surmise someone could come up with supporting mechanisms to make it less burdensome to write. One snarl is the interaction with the archive and restore commands: Postgres might, for example, have been in the middle of downloading and replaying a WAL segment at the point I wish to be quiesced, and there's not a great way to stop it[1]. Ideally, I could replace those archive/dearchive commands with software that speaks the streaming replication protocol and just have less code involved overall. I think that is technically possible today, but maybe could be made easier, in particular being able to more easily chunk and align the WAL stream into units of some kind from the streaming protocol. Maybe it's already possible, but it will take a little thinking. I had already written off getting this level of cohesion in the next year (intending a detailed mix of archive_command and streaming protocol software), but it's not something that leaves me close to satisfied by any measure.

Furthermore, some use cases demand that, no matter what the user setting with regard to syncrep is, Postgres not make progress unless it has synchronously replicated to a special piece of proxy software. This is useful if one wants to offload the exact location and storage strategy for crash recovery to another piece of software. That's the obvious next step after a cohesive delegation of (de-)archiving.

So, all in all, Postgres has no great way to cohesively delegate all WAL-persistence and WAL-restoration, and I don't know if the streaming protocol + sync rep facilities can completely conveniently subsume all those use cases (but I think it probably can without enormous modification). I think it should learn what it needs to learn to make that happen. It might even allow the existing shell-command based (de-)archiver to live as a contrib.

[0]: Use case: when a small standby used for some reporting happens to be the farthest ahead.

[1]: Details: a simple touched file to no-op the restore_command is unsatisfying, because the restore_command may have already been started by postgres, so now you have to make your restore_command coordinate with your streaming replication proxy software to be safe, or wait long enough for a single segment to replay so one can be assured that the system is quiesced. I see this as an anti-feature of the current file-based archiving strategy.

-- fdr
Re: [HACKERS] [RFC][PATCH] Logical Replication/BDR prototype and architecture
On 12-06-15 04:03 PM, Robert Haas wrote:
> On Thu, Jun 14, 2012 at 4:13 PM, Andres Freund and...@2ndquadrant.com wrote:
> > I don't plan to throw in loads of conflict resolution smarts. The aim is to get to the place where all the infrastructure is there so that a MM solution can be built by basically plugging in a conflict resolution mechanism. Maybe providing a very simple one. I think without in-core support it's really, really hard to build a sensible MM implementation. Which doesn't mean it has to live entirely in core.
>
> Of course, several people have already done it, perhaps most notably Bucardo.
>
> Anyway, it would be good to get opinions from more people here. I am sure I am not the only person with an opinion on the appropriateness of trying to build a multi-master replication solution in core or, indeed, the only person with an opinion on any of these other issues.

This sounds like a good place for me to chime in.

I feel that in-core support to capture changes and turn them into change records that can be replayed on other databases, without relying on triggers and log tables, would be good to have.

I think we want something flexible enough that people can write consumers of the LCRs to do conflict resolution for multi-master, but I am not sure that the conflict resolution support actually belongs in core.

Most of the complexity of slony (both in terms of lines of code, and issues people encounter using it) comes not from the log triggers or replay of the logged data, but from the configuration of the cluster. Controlling things like:

* Which tables replicate from a node to which other nodes
* How do you change the cluster configuration on a running system (adding nodes, removing nodes, moving the origin of a table, adding tables to replication, etc...)
This is the harder part of the problem. I think we need to first get the infrastructure committed (that the current patch set deals with) for capturing, transporting and translating the LCRs into the system before we get too caught up in the configuration aspects. I think we will have a hard time agreeing on behaviours for some of that other stuff that are both flexible enough for many use cases and simple enough for administrators. I'd like to see in-core support for a lot of that stuff, but I'm not holding my breath.

> It is not good for those other opinions to be saved for a later date.
>
> > Hm. Yes, you could do that. But I have to say I don't really see a point. Maybe the fact that I do envision multimaster systems at some point is clouding my judgement though as its far less easy in that case.
>
> Why? I don't think that particularly changes anything.
>
> > It also complicates the wal format as you now need to specify whether you transport a full or a primary-key only tuple...
>
> Why? If the schemas are in sync, the target knows what the PK is perfectly well. If not, you're probably in trouble anyway.
>
> > I think though that we do not want to enforce that mode of operation for tightly coupled instances. For those I was thinking of using command triggers to synchronize the catalogs. One of the big screwups of the current replication solutions is exactly that you cannot sensibly do DDL, which is not a big problem if you have a huge system with loads of different databases and very knowledgeable people et al. but at the beginning it really sucks. I have no problem with making one of the nodes the schema master in that case. Also I would like to avoid the overhead of the proxy instance for use-cases where you really want one node replicated as fully as possible, with the slight exception of being able to have summing tables, different indexes et al.
>
> In my view, a logical replication solution is precisely one in which the catalogs don't need to be in sync.
If the catalogs have to be in sync, it's not logical replication. ISTM that what you're talking about is sort of a hybrid between physical replication (pages) and logical replication (tuples) - you want to ship around raw binary tuple data, but not entire pages. The problem with that is it's going to be tough to make robust. Users could easily end up with answers that are total nonsense, or probably even crash the server.

I see three catalogs in play here:
1. The catalog on the origin.
2. The catalog on the proxy system (this is the catalog used to translate the WAL records to LCRs). The proxy system will need essentially the same pgsql binaries (same architecture, important compile flags, etc.) as the origin.
3. The catalog on the destination system(s).

Catalog 2 must be in sync with catalog 1; catalog 3 shouldn't need to be in sync with catalog 1. I think catalogs 2 and 3 are combined in the current patch set (though I haven't yet looked at the code closely). I think the performance optimizations Andres has implemented to update tuples through low-level functions should be left for later and that we should be generating SQL in the apply cache so we don't start
Re: [HACKERS] SQL standard changed behavior of ON UPDATE SET NULL/SET DEFAULT?
I wrote: Have the SQL committee simply failed to notice that in whacking this text around they changed the meaning? Which behavior is actually implemented by other RDBMSes?

If anyone is up for actually trying this, here is a script to test the behavior in question:

create table pp (f1 int, f2 int, primary key (f1,f2));
create table cmssn (f1 int, f2 int, foreign key (f1,f2) references pp(f1,f2) on update set null);
create table cmfsn (f1 int, f2 int, foreign key (f1,f2) references pp(f1,f2) match full on update set null);
create table cmssd (f1 int default 0, f2 int default 0, foreign key (f1,f2) references pp(f1,f2) on update set default);
create table cmfsd (f1 int default 0, f2 int default 0, foreign key (f1,f2) references pp(f1,f2) match full on update set default);
insert into pp values (11, 22);
insert into pp values (11, 0);
insert into pp values (0, 0);
insert into cmssn values (11, 22);
insert into cmfsn values (11, 22);
insert into cmssd values (11, 22);
insert into cmfsd values (11, 22);
update pp set f2 = f2 + 1 where f2 > 0;
select * from cmssn;
select * from cmfsn;
select * from cmssd;
select * from cmfsd;

In Postgres this produces

 f1 | f2
----+----
 11 |
(1 row)

 f1 | f2
----+----
    |
(1 row)

 f1 | f2
----+----
 11 |  0
(1 row)

 f1 | f2
----+----
  0 |  0
(1 row)

which shows that we are self-consistent but not actually compliant with either old or new wordings of the spec :-(

The only other SQL DB I have handy is mysql 5.5.24, which shows up pretty unimpressively: it gives a syntax error on the cmssd definition, which would be all right because the manual says the innodb storage engine doesn't support SET DEFAULT, except it *doesn't* give a syntax error for creating cmfsd. Then, the update fails claiming that cmfsn's FK constraint is violated, so they evidently don't implement that case correctly. After removing cmfsn, the update fails again claiming that cmfsd's FK constraint is violated, so yeah they are telling the truth when they say SET DEFAULT doesn't work.
The upshot is that only the MATCH SIMPLE SET NULL case works at all in current mysql, and that produces the result

mysql> select * from cmssn;
+------+------+
| f1   | f2   |
+------+------+
| NULL | NULL |
+------+------+
1 row in set (0.00 sec)

so they are nulling all the referencing columns in this case, which matches the more recent specs but is clearly contrary to SQL92. Anybody have DB2, or something else that might be thought to be pretty close to spec-compliant?

regards, tom lane

-- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
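[Editor's note] For readers following the spec-wording question, the two readings of ON UPDATE SET NULL under MATCH SIMPLE can be stated as a toy model - this is an illustrative sketch, not the real ri_triggers.c logic; the column names mirror the test script above:

```python
def on_update_set_null(row, fk_pairs, changed_ref_cols, per_column=True):
    """Apply ON UPDATE SET NULL to one referencing row.

    fk_pairs maps each referencing column to the referenced PK column
    it pairs with.  per_column=True models the SQL92 wording (and the
    Postgres cmssn result above): null only the referencing columns
    paired with referenced columns whose value actually changed.
    per_column=False models the later spec wording (and the mysql
    result): null all the referencing columns.
    """
    new_row = dict(row)
    for ref_col, pk_col in fk_pairs.items():
        if not per_column or pk_col in changed_ref_cols:
            new_row[ref_col] = None
    return new_row

# pp's row (11, 22) becomes (11, 23): only f2 changed.
row = {"f1": 11, "f2": 22}
pairs = {"f1": "f1", "f2": "f2"}
pg_style = on_update_set_null(row, pairs, {"f2"}, per_column=True)
mysql_style = on_update_set_null(row, pairs, {"f2"}, per_column=False)
```

pg_style comes out as {'f1': 11, 'f2': None}, matching the cmssn result shown, while mysql_style nulls both columns, matching the mysql output.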
Re: [HACKERS] SQL standard changed behavior of ON UPDATE SET NULL/SET DEFAULT?
On 16 June 2012 21:18, Tom Lane t...@sss.pgh.pa.us wrote: Anybody have DB2, or something else that might be thought to be pretty close to spec-compliant? I have an Oracle DB, but they're not exactly known for spec compliance. In fact they dodge this entire issue by not supporting ON UPDATE actions at all :-) Regards, Dean
[HACKERS] s/UNSPECIFIED/SIMPLE/ in foreign key code?
Our foreign-key-related code uses MATCH_UNSPECIFIED to denote the default foreign key match behavior. This corresponds to the wording used in the SQL92 spec, for instance "If match type is not specified or if FULL is specified". But I always found it rather confusing; it sounds like we don't know what match behavior we're supposed to implement. I notice that in SQL99 and later, the SQL committee introduced MATCH SIMPLE as a way to name the behavior that formerly had no name. So now they can write things like "If M specifies SIMPLE or FULL, ...", which seems much nicer to me. I think it would be a useful advance in readability if we replaced UNSPECIFIED by SIMPLE throughout the FK code, and barring objections I will go do that.

A small flaw in this plan is that in pg_constraint.confmatchtype, MATCH_UNSPECIFIED is stored as 'u'. In a green field I'd just rename that to 's' for SIMPLE, but it seems possible that this would confuse client-side code such as pg_dump or psql. A quick look shows that neither of those programs actually looks directly at pg_constraint.confmatchtype, instead relying on backend functions when they want to deconstruct a foreign key constraint. But there could well be other client code that would notice the change. So I'm a bit torn as to whether to change it and create a release-note-worthy compatibility issue, or to leave it as-is (with documentation notes that 'u' for MATCH_SIMPLE is a historical accident). Thoughts?

regards, tom lane
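[Editor's note] The compatibility hazard is for client code that reads the catalog column directly. A hypothetical client-side decoder (not part of any real client; the mapping reflects the codes discussed here) shows what would silently break:

```python
# Hypothetical helper decoding the single-character codes stored in
# pg_constraint.confmatchtype, as a third-party client might do.
CONFMATCHTYPE = {
    "f": "MATCH FULL",
    "p": "MATCH PARTIAL",  # accepted by the grammar, not implemented
    "u": "MATCH SIMPLE",   # historical code; renaming to 's' breaks this
}

def describe_match(code: str) -> str:
    return CONFMATCHTYPE.get(code, "unknown match type")
```

Any such lookup keyed on 'u' would return "unknown match type" the moment the stored value changed to 's' - exactly the release-note-worthy issue being weighed.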
Re: [HACKERS] sortsupport for text
On 18 March 2012 15:08, Tom Lane t...@sss.pgh.pa.us wrote: However, it occurred to me that we could pretty easily jury-rig something that would give us an idea about the actual benefit available here. To wit: make a C function that wraps strxfrm, basically strxfrm(text) returns bytea. Then compare the performance of ORDER BY text_col to ORDER BY strxfrm(text_col). (You would need to have either both or neither of text and bytea using the sortsupport code paths for this to be a fair comparison.)

I thought this was an interesting idea, so decided to try it out for myself. I tried this out against master (not Robert's patch, per Tom's direction). The results were interesting:

[peter@peterlaptop strxfrm_test]$ pgbench postgres -T 60 -f sort_strxfrm.sql -n
transaction type: Custom query
scaling factor: 1
query mode: simple
number of clients: 1
number of threads: 1
duration: 60 s
number of transactions actually processed: 2795
tps = 46.563970 (including connections establishing)
tps = 46.568234 (excluding connections establishing)

[peter@peterlaptop strxfrm_test]$ pgbench postgres -T 60 -f sort_reg.sql -n
transaction type: Custom query
scaling factor: 1
query mode: simple
number of clients: 1
number of threads: 1
duration: 60 s
number of transactions actually processed: 2079
tps = 34.638838 (including connections establishing)
tps = 34.640665 (excluding connections establishing)

The first test executed the following query against the dellstore database:

select * from products order by strxfrm_test(actor) offset 10001;

The second:

select * from products order by actor offset 10001;

So, this was pretty good - an improvement that is completely independent of Robert's. Bear in mind, this simple demonstration adds additional fmgr overhead, which we have plenty of reason to believe could hurt things, besides which each call must allocate memory that could perhaps be avoided.
In addition, I don't know enough about locale-aware sorting and related algorithms to have devised a test that would stress strxfrm()/strcoll() - these were all strings that could be represented as ASCII. In light of this, I think there is a pretty strong case to be made for pre-processing text via strxfrm() as part of this patch. Thoughts?

-- Peter Geoghegan http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training and Services
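[Editor's note] The transform-once-then-compare-bytes idea can be illustrated outside Postgres with Python's bindings to the same C library routines. A sketch only: it assumes the "C" locale is available, and says nothing about the fmgr or memory-allocation overheads mentioned above.

```python
import locale
from functools import cmp_to_key

# Sort once with pairwise strcoll() comparisons, once by pre-computed
# strxfrm() keys; the orderings must agree.  The win in the patch comes
# from doing the expensive locale transform once per string instead of
# once per comparison.
locale.setlocale(locale.LC_COLLATE, "C")

words = ["banana", "Apple", "cherry", "apple"]
by_strcoll = sorted(words, key=cmp_to_key(locale.strcoll))
by_strxfrm = sorted(words, key=locale.strxfrm)
```

In the C locale both orderings reduce to byte-wise comparison, so they agree trivially; a locale with multi-level collation (where strxfrm keys are much longer than the input) is what a stress test would need.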
Re: [HACKERS] measuring spinning
On Thu, Jun 14, 2012 at 2:39 PM, Robert Haas robertmh...@gmail.com wrote: On Wed, Jan 11, 2012 at 8:48 PM, Robert Haas robertmh...@gmail.com wrote: I've had cause, a few times this development cycle, to want to measure the amount of spinning on each lwlock in the system. To that end, I've found the attached patch useful. Note that if you don't define LWLOCK_STATS, this changes nothing except that the return value from s_lock becomes int rather than void. If you do define LWLOCK_STATS, then LWLockAcquire() counts the number of pg_usleep() calls that are required to acquire each LWLock, in addition to the other statistics. Since this has come up for me a few times now, I'd like to propose including it in core. Well, this fell through the cracks, because I forgot to add it to the January CommitFest. Here it is again, rebased.

This applies and builds cleanly and passes make check (under enable-cassert). No tests or docs are needed for a patch of this nature. It does what it says, and we want it.

I wondered if the change in the return signature of s_lock would have an effect on performance. So I've run a series of pgbench -T 30 -P -c8 -j8, at a scale of 30 which fits in shared_buffers, using an Amazon c1.xlarge (8 cores). I ran both HEAD, and HEAD+patch (without LWLOCK_STATS in both cases), in random ordering. The patch was 0.37% slower: an average of 298483 selects per second patched, versus 299582 for HEAD. The difference is probably real (p value 0.042, one sided) but is also pretty much negligible, and could just be due to where the executable code falls in the cache lines, which could move around with other changes to the code.

Two suggestions: In your original email you say "number of pg_usleep() calls that are required to acquire each LWLock", but nothing in the code says this. Just reading lwlock.c I would naively assume it is reporting the number of TAS spins, not the number of spin-delays (and in fact that is what I did assume until I read your email more carefully).
A comment somewhere in lwlock.c would be helpful. Also in lwlock.c:

if (sh_acquire_counts[i] || ex_acquire_counts[i] || block_counts[i] || spin_counts[i])

I don't think we can have spins (or blocks, for that matter) unless we have some acquires to have caused them, so the last two tests in that line seem to be noise. Since my suggestions are minor, should I go ahead and mark this ready for committer?

Thanks, Jeff
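[Editor's note] The distinction Jeff raises - TAS spins versus spin-delays - can be made concrete with a toy model of the s_lock loop. Illustrative only: the real SPINS_PER_DELAY is adaptive, and the real code does hardware test-and-set, not a counter.

```python
SPINS_PER_DELAY = 100  # fixed here; adaptive in the real s_lock

def acquire(fails_before_free, counters):
    """Spin on a 'lock' whose TAS succeeds after fails_before_free
    failed attempts, sleeping every SPINS_PER_DELAY spins.  The patch's
    LWLOCK_STATS counter records delays (sleeps), not raw TAS spins -
    a much smaller number.
    """
    attempts = 0
    while attempts < fails_before_free:   # TAS fails
        attempts += 1
        counters["tas_spins"] += 1
        if attempts % SPINS_PER_DELAY == 0:
            counters["spin_delays"] += 1  # this is what gets reported
    # TAS succeeds here

counters = {"tas_spins": 0, "spin_delays": 2 - 2}
acquire(250, counters)
```

With 250 failed TAS attempts, the model sleeps at attempts 100 and 200, so it reports 2 spin-delays - the number a naive reading of lwlock.c might mistake for 250 spins.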
Re: [HACKERS] SQL standard changed behavior of ON UPDATE SET NULL/SET DEFAULT?
2012/6/16 Tom Lane t...@sss.pgh.pa.us: [... cut ...] Anybody have DB2, or something else that might be thought to be pretty close to spec-compliant?

I tried this in MS SQL Server 2012 via SQLFiddle [1]. I could only create the 'cmssn' and 'cmssd' tables because, as I can see in [2], MS SQL Server 2012 doesn't support the MATCH syntax. The result was:

select * from cmssn;
   F1   |   F2
 (null) | (null)

select * from cmssd;
 F1 | F2
  0 |  0

The test is in [3], and there you can try other RDBMSes: just create the schema on the left panel and test selects on the right.

[1] http://sqlfiddle.com
[2] http://msdn.microsoft.com/en-us/library/ms174979.aspx
[3] http://sqlfiddle.com/#!6/ac7db/1

Regards.
-- Dickson S. Guedes mail/xmpp: gue...@guedesoft.net - skype: guediz http://guedesoft.net - http://www.postgresql.org.br
[HACKERS] Broken system timekeeping breaks the stats collector
I've had a vague feeling for awhile now that the occasional buildfarm failures we see in the stats regression test (where it appears that the stats collector fails to respond to requests for no good reason) might be related to operating-system timekeeping glitches. Today there is finally indisputable evidence of this happening, in this log file: http://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=frogmouthdt=2012-06-16%2014%3A30%3A12

Observe the following log excerpt:

[4fdcaca3.e74:478] LOG: statement: CREATE INDEX dupindexcols_i ON dupindexcols (f1, id, f1 text_pattern_ops);
[4fdcaca3.e74:479] LOG: statement: ANALYZE dupindexcols;
[4fdcaca3.e74:480] LOG: statement: EXPLAIN (COSTS OFF) SELECT count(*) FROM dupindexcols WHERE f1 'WA' and id 1000 and f1 ~~ 'YX';
[4fdcaca3.e74:481] LOG: statement: SELECT count(*) FROM dupindexcols WHERE f1 'WA' and id 1000 and f1 ~~ 'YX';
[4fdcaca3.e74:482] LOG: disconnection: session time: 0:00:00.000 user=pgrunner database=regression host=::1 port=1123
[4fdcab40.e04:1] LOG: connection received: host=::1 port=1125
[4fdcab40.e04:2] LOG: connection authorized: user=pgrunner database=regression
[4fdcab40.e04:3] LOG: statement: CREATE TABLE a (aa TEXT);
[4fdcab40.444:1] LOG: connection received: host=::1 port=1126
[4fdcab40.e88:1] LOG: connection received: host=::1 port=1127
[4fdcab40.444:2] LOG: connection authorized: user=pgrunner database=regression
[4fdcab40.444:3] LOG: statement: CREATE USER regtest_unpriv_user;
[4fdcab40.444:4] LOG: statement: CREATE SCHEMA temp_func_test;
[4fdcab40.e04:4] LOG: statement: CREATE TABLE b (bb TEXT) INHERITS (a);

We can tell from the statements being executed that session 4fdcaca3.e74 is running the create_index regression test, while 4fdcab40.444 is running create_function_3, and the other sessions starting concurrently with it are part of the parallel group that runs after create_index. So they certainly didn't start until 4fdcaca3.e74 finished.
Also create_index is not exactly a zero-time test, so the fact that 4fdcaca3.e74 reported a session runtime of 0:00:00.000 should already set off some alarm bells. But remember that in the %c log_line_prefix escape, the first part is the process's start-time timestamp in hex. So 4fdcaca3 means Sat Jun 16 2012, 11:56:19 EDT while 4fdcab40 means Sat Jun 16 2012, 11:50:24 EDT, nearly six minutes *earlier*. The reported zero session time is explainable by the fact that TimestampDifference returns zeros if the given timestamps are out of order. All the other process timestamps in the log are likewise consistent with the theory that the system's clock went backwards six-plus minutes while create_index was running. Then, when we finally get to the stats regression test, it fails with symptoms indicating that the stats collector never answered the backend's requests for an updated stats file; and when the postmaster finally shuts down and tells the stats collector to shut down too, the final pgstat_write_statsfile call bleats like so: [4fdcac54.a0c:1] LOG: last_statrequest 2012-06-16 11:55:20.813625-04 is later than collector's time 2012-06-16 11:54:12.392375-04 Once you know that the system clock glitched like that, it's fairly obvious what happened inside the stats collector: it wrote out the stats file (probably in response to an autovacuum request) sometime shortly after 11:55:20, and then the system clock went backwards to around 11:50, and so all subsequent inquiry messages had request timestamps older than last_statwrite causing the collector to believe it needn't emit fresh stats files. Aside from breaking the stats regression test, a scenario like this one has bad implications for autovacuum: for something like six minutes, autovacuum would have been operating with stale pgstat data since the stats file would not get rewritten, and backends would see the file timestamp as being ahead of their own clocks so they wouldn't think it was stale. 
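[Editor's note] The start-time arithmetic above is easy to check: the leading field of the %c escape is the backend start time as hexadecimal epoch seconds, so the two prefixes decode directly (a quick sketch):

```python
from datetime import datetime, timezone

def decode_session_start(hex_prefix):
    """The first part of the %c log_line_prefix escape is the process
    start time, in hexadecimal seconds since the Unix epoch."""
    return datetime.fromtimestamp(int(hex_prefix, 16), tz=timezone.utc)

create_index_session = decode_session_start("4fdcaca3")  # 11:56:19 EDT
later_sessions = decode_session_start("4fdcab40")        # 11:50:24 EDT
skew = create_index_session - later_sessions
```

skew comes out at 355 seconds: the sessions that necessarily started later carry a start timestamp nearly six minutes earlier, which is the clock-went-backwards evidence.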
(So this scenario doesn't explain the occasional "pgstat wait timeout" failures we see; though perhaps someone can think of a variant that fits that symptom?)

I will refrain from disparaging Windows here and just note that this would be an unsurprising occurrence on any machine not running NTP: the owner might occasionally fix the clock once it had drifted far enough from reality. So I think it might be advisable to install some defenses against the case rather than assuming it can't happen. The most direct fix would involve checking GetCurrentTimestamp against last_statwrite in the stats collector's main loop, but on a machine with slow gettimeofday calls that would be pretty bad for performance. What I suggest instead is:

(1) In backend_read_statsfile, make an initial attempt to read the stats file and then read GetCurrentTimestamp after that. If the local clock reading is less than the stats file's timestamp, we know that some sort of clock skew or glitch has happened, so force an inquiry
Re: [HACKERS] transforms
On Thu, Jun 14, 2012 at 3:42 PM, Peter Eisentraut pete...@gmx.net wrote: Here is my first patch for the transforms feature. This is a mechanism to adapt data types to procedural languages. The previous proposal was here: http://archives.postgresql.org/pgsql-hackers/2012-05/msg00728.php

When I apply this and go to contrib and do make check, I get:

In file included from hstore_plperl.c:4:0:
../../src/pl/plperl/plperl.h:49:20: fatal error: EXTERN.h: No such file or directory

Cheers, Jeff
Re: [HACKERS] transforms
On Sat, Jun 16, 2012 at 7:15 PM, Jeff Janes jeff.ja...@gmail.com wrote: On Thu, Jun 14, 2012 at 3:42 PM, Peter Eisentraut pete...@gmx.net wrote: Here is my first patch for the transforms feature. This is a mechanism to adapt data types to procedural languages. The previous proposal was here: http://archives.postgresql.org/pgsql-hackers/2012-05/msg00728.php When I apply this and go to contrib and do make check, I get: In file included from hstore_plperl.c:4:0: ../../src/pl/plperl/plperl.h:49:20: fatal error: EXTERN.h: No such file or directory

Ah, that went away when I remembered to ./configure --with-perl. Although the error message seems less than optimal. Aren't tests usually skipped when they are missing prerequisites in the config?

Cheers, Jeff
Re: [HACKERS] Broken system timekeeping breaks the stats collector
2012/6/16 Tom Lane t...@sss.pgh.pa.us: [... cut ...]

(1) In backend_read_statsfile, make an initial attempt to read the stats file and then read GetCurrentTimestamp after that. If the local clock reading is less than the stats file's timestamp, we know that some sort of clock skew or glitch has happened, so force an inquiry message to be sent with the local timestamp. But then accept the stats file anyway, since the skew might be small and harmless. The reason for the forced inquiry message is to cause (2) to happen at the collector.

(2) In pgstat_recv_inquiry, if the received inquiry_time is older than last_statwrite, we should suspect a clock glitch (though it might just indicate delayed message receipt). In this case, do a fresh GetCurrentTimestamp call, and if the reading is less than last_statwrite, we know that the collector's time went backwards. To recover, reset these variables as we do at startup:

last_statrequest = GetCurrentTimestamp();
last_statwrite = last_statrequest - 1;

to force an immediate write to happen with the new local time.

(1) is basically free in terms of the amount of work done in non-broken cases, though it will require a few more lines of code. (2) means adding some GetCurrentTimestamp calls that did not occur before, but hopefully these will be infrequent, since in the absence of clock glitches they would only happen when a backend's demand for a new stats file is generated before the collector starts to write a new stats file but not received till afterwards. Comments? Anyone see a flaw in this design? Or want to argue that we shouldn't do anything about such cases?

What happens when Daylight Saving Time ends? Or does it not matter in this scenario?

regards
-- Dickson S. Guedes mail/xmpp: gue...@guedesoft.net - skype: guediz http://guedesoft.net - http://www.postgresql.org.br
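[Editor's note] Tom's two-point defense can be sketched in a few lines. This is illustrative Python with integer "timestamps" and a mutable state dict, not the actual pgstat.c code; the names follow the description in the quoted mail:

```python
def backend_read_statsfile(file_ts, now_fn, send_inquiry):
    """Point (1): after reading the stats file, compare its timestamp
    with the local clock.  If the clock reads earlier, force an inquiry
    carrying the local timestamp - but accept the file anyway, since
    the skew might be small and harmless."""
    now = now_fn()
    if now < file_ts:
        send_inquiry(now)   # provokes point (2) at the collector
    return "accepted"

def pgstat_recv_inquiry(inquiry_time, state, now_fn):
    """Point (2): an inquiry older than last_statwrite may mean either
    a delayed message or a backwards clock; re-read the clock to tell
    which, and reset as at startup if the clock really went back."""
    if inquiry_time < state["last_statwrite"]:
        now = now_fn()
        if now < state["last_statwrite"]:
            # collector's clock went backwards: force an immediate
            # write with the new local time
            state["last_statrequest"] = now
            state["last_statwrite"] = now - 1
            return "clock-glitch"
        return "delayed-message"
    state["last_statrequest"] = inquiry_time
    return "normal"
```

In the frogmouth scenario, the backend's clock (and hence inquiry_time) sits minutes behind last_statwrite, the collector's fresh clock reading confirms it, and the reset makes the next write happen instead of being suppressed for six minutes.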
Re: [HACKERS] Broken system timekeeping breaks the stats collector
Dickson S. Guedes lis...@guedesoft.net writes: What happens when Daylight saving time ends? Or it doesn't matter in this scenario? Irrelevant, we're working in UTC-based timestamps. regards, tom lane
Re: [HACKERS] Pg default's verbosity?
I've always used -1 -f - < file.sql. It is confusing that -1 doesn't warn you when it won't work though.

Sent from my iPhone

On Jun 16, 2012, at 3:42 AM, Fabien COELHO coe...@cri.ensmp.fr wrote:

Hello pgdev, (Second attempt)

I've conducted a statistical study about PostgreSQL use in OSS. One of the results is that quite a few projects have errors in their SQL setup scripts which lead to some statements being ignored, typically some ADD CONSTRAINTs which do not change the database schema from a functional point of view, or syntactic errors (typically a mysql syntax...) that result in missing tables, but which are not found if the application is not fully tested.

I think that there are two reasons why these errors are not caught by application developers:

(1) the default verbosity is set to "notice", which is much too high. The users just get used to seeing a lot of messages on loading an sql script, and to ignoring them, so that errors are just hidden in the flow of notices. I think that a better default setting would be "warning", that is messages that require some attention from the developer.

(2) the default behavior of psql on errors is to keep going. Developers of SQL scripts that are expected to work should be advised to:
- encourage application devs to set ON_ERROR_STOP and/or use a global transaction in their script.
- provide a simple/short option to do that from the command line; basically that could be an enhanced -1, NOT restricted to -f, but that would work on standard input as well.

sh> psql -1 -f setup.sql   # -1 does work here
sh> psql -1 < setup.sql    # -1 does not apply to stdin stuff...

So I would suggest the following todos:
1 - change the default verbosity to warning.
2 - change -1 to work on stdin as well instead of being ignored, or provide another option that would do that.
-- Fabien Coelho - coe...@cri.ensmp.fr
Re: [HACKERS] Allow WAL information to recover corrupted pg_controldata
Torn disk sector? Please, this is nonsense. Disks cannot write half a sector and then stop.

What I intended to say is corruption due to hardware or some other problem, not corruption occurring while Postgres is updating the pg_control file. For example: http://cquirke.mvps.org/9x/baddata.htm.

Well, we invented pg_resetxlog with the thought that it might be useful for such situations, but I'm not sure offhand that we've ever seen a field report of corrupted pg_control files.

I have found a few cases where people have tried to use pg_resetxlog due to hardware problems or a missing pg_control file:
http://archives.postgresql.org/pgsql-performance/2004-06/msg00236.php
http://archives.postgresql.org/pgsql-general/2004-06/msg00173.php
http://archives.postgresql.org/pgsql-admin/2006-12/msg00205.php

Case in point here is that it's not immediately obvious that we should trust the contents of WAL more than pg_control.

Agreed.

At the moment I don't see that we have either (a) or (b), so I think it's pretty dubious to be making any changes of this sort.

As the chances of a use case for this feature are very low, I will stop working on this feature.

From: Tom Lane [t...@sss.pgh.pa.us] Sent: Saturday, June 16, 2012 12:11 PM To: Amit kapila Cc: Cédric Villemain; pgsql-hackers@postgresql.org; 'Robert Haas' Subject: Re: [HACKERS] Allow WAL information to recover corrupted pg_controldata

Amit kapila amit.kap...@huawei.com writes: AFAIR pg_controldata fit on a disk sector so it can not be half written. It can be corrupt due to some other reasons as well like torn disk sector.

Torn disk sector? Please, this is nonsense. Disks cannot write half a sector and then stop. A sufficiently badly designed drive might attempt to start a write when it didn't have enough power left to finish ... but the result of that would be a corrupt sector with a non-matching CRC, not one that read back okay but contained erroneous data.

The suggested patch improves the logic to recover corrupt control file.
So that is the reason I felt it will be relevant to do this patch.

Well, we invented pg_resetxlog with the thought that it might be useful for such situations, but I'm not sure offhand that we've ever seen a field report of corrupted pg_control files. For instance, a quick search in the archives for "incorrect checksum in control file" turns up only cases of pilot error, such as supposing that a 32-bit database could be used with a 64-bit server or vice versa. Actual hardware failures on the pg_control file could be expected to result in something like "could not read from control file: I/O error", which I find no evidence for at all in the archives.

Before adding new code to improve the situation, it would be good to have (a) evidence that there's a problem worth solving, and (b) a theory as to what likely-to-occur cases the new code is going to make better, while not making things worse in other likely-to-occur cases. Case in point here is that it's not immediately obvious that we should trust the contents of WAL more than pg_control --- the former gets a whole lot more write traffic and hence has many more opportunities for failure. At the moment I don't see that we have either (a) or (b), so I think it's pretty dubious to be making any changes of this sort.

regards, tom lane
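[Editor's note] Tom's point about what a failed sector write actually looks like rests on pg_control carrying a checksum of its own contents. A toy analogue (zlib.crc32 standing in for the CRC routine Postgres really uses; the payload bytes are made up): corruption surfaces as a checksum failure on read, never as plausible-but-wrong data that "read back okay".

```python
import struct
import zlib

def write_control(payload: bytes) -> bytes:
    """Append a little-endian CRC-32 of the payload, analogous to the
    crc field at the end of the pg_control struct."""
    return payload + struct.pack("<I", zlib.crc32(payload))

def read_control(blob: bytes) -> bytes:
    """Verify the trailing CRC before trusting any of the contents."""
    payload, stored = blob[:-4], struct.unpack("<I", blob[-4:])[0]
    if zlib.crc32(payload) != stored:
        raise ValueError("incorrect checksum in control file")
    return payload

blob = write_control(b"latest checkpoint location etc.")
flipped = bytes([blob[0] ^ 0xFF]) + blob[1:]  # simulate a bad sector
```

read_control(blob) returns the payload, while read_control(flipped) raises the checksum error - which is why a corrupt-but-CRC-valid pg_control is not an expected failure mode.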