[HACKERS] Pruning the TODO list
On 15 June 2012 03:10, Robert Haas robertmh...@gmail.com wrote: On Thu, Jun 14, 2012 at 11:39 AM, Amit Kapila amit.kap...@huawei.com wrote: I am planning to work on the below Todo list item... ... Suggest me if my understanding is correct? I guess my first question is: why do we need this? There are lots of things in the TODO list that someone wanted once upon a time, but they're not all actually important. Do you have reason to believe that this one is? It's been six years since that email, so it's worth asking if this is actually relevant. The current TODO list is close to worthless and needs substantial pruning or even complete truncation. Or massive redesign. The above shows a recent attempt to fix one of the items on the TODO list, which many people have then spent time questioning. That is just a waste of time for *all* concerned. I see many items on there that have been there for eight years and longer. The situation is now so bad that many experienced developers ignore the list completely. Robert's answer is the final nail in that coffin; I agree with him. ISTM that we should prune the list right down to nothing, or very, very few entries. We must have a TODO list that we can trust to save us time. I don't want to see any more people waste their time on issues that aren't really wanted. At the moment we have lots of false positives and so waste time and bring the TODO list into disrepute. It would be better to have a compact list where every item was reasonably accepted, so we can begin to trust it again. Trusting the TODO list is what brought me my first patch, and I think it could and should be the same with others. Can Tom go through the list and archive items no longer considered likely to fly? A simple triage is all that is needed here. Or would you like me or another to do this? -- Simon Riggs http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Allow WAL information to recover corrupted pg_controldata
Amit Kapila amit.kap...@huawei.com writes: The reason I'm concerned about selecting a next-LSN that's certainly beyond every LSN in the database is that not doing so could result in introducing further corruption, which would be entirely avoidable with more care in choosing the next-LSN. The further corruption can only be possible when we replay some wrong WAL by selecting wrong LSN. No, this is mistaken. Pages in the database that have LSN ahead of where the server thinks the end of WAL is cause lots of problems unrelated to replay; for example, inability to complete a checkpoint. That might not directly lead to additional corruption, but consider the case where such a page gets further modified, and the server decides it doesn't need to create a full-page image because the LSN is ahead of where the last checkpoint was. A crash or two later, you have new problems. (Admittedly, once you've run pg_resetxlog you're best advised to just be trying to dump what you've got, and not modify it more. But sometimes you have to hack the data just to get pg_dump to complete.) regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] SP-GiST for ranges based on 2d-mapping and quad-tree
On 14.06.2012 01:56, Alexander Korotkov wrote: Hackers, attached patch implements quad-tree on ranges. Some performance results in comparison with current GiST indexing.

@@ -788,7 +774,7 @@ range_super_union(TypeCacheEntry *typcache, RangeType *r1, RangeType *r2)
  * part of the relcache entry for the index, typically) this essentially
  * eliminates lookup overhead during operations on a GiST range index.
  */
-static Datum
+Datum
 TrickFunctionCall2(PGFunction proc, FmgrInfo *flinfo, Datum arg1, Datum arg2)
 {
 	FunctionCallInfoData fcinfo;

I don't think we want to expose TrickFunctionCall2(). Not with that name, anyway. Perhaps we should refactor the functions called this way, range_adjacent, range_overlaps etc., to have internal counterparts that can be called without FunctionCall(). Like:

*** 692,697 ****
--- 692,708 ----
 {
 	RangeType  *r1 = PG_GETARG_RANGE(0);
 	RangeType  *r2 = PG_GETARG_RANGE(1);
+	TypeCacheEntry *typcache;
+
+	typcache = range_get_typcache(fcinfo, RangeTypeGetOid(r1));
+
+	PG_RETURN_BOOL(range_adjacent_internal(r1, r2, typcache));
+ }
+
+ bool
+ range_adjacent_internal(RangeType *r1, RangeType *r2, TypeCacheEntry *typcache)
+ {
 	RangeBound	lower1, lower2;

The GiST and SP-GiST consistent functions could call the internal function directly. -- Heikki Linnakangas EnterpriseDB http://www.enterprisedb.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Pruning the TODO list
On tor, 2012-06-21 at 15:01 +0800, Simon Riggs wrote: ISTM that we should prune the list right down to nothing, or very, very few entries. We must have a TODO list that we can trust to save us time. I don't want to see any more people waste their time on issues that aren't really wanted. At the moment we have lots of false positives and so waste time and bring the TODO list into disrepute. It would be better to have a compact list where every item was reasonably accepted, so we can begin to trust it again. Trusting the TODO list is what brought me my first patch, and I think it could and should be the same with others. Discussing the merits and design of a feature is half the work, so one shouldn't expect all the things on the TODO list to be completely discussed. Otherwise someone would have typed in the code already. So we just need to clarify to readers that the list contains items worth thinking about, not items that merely need someone to type in some code. The archive links that should accompany most items should contain hints about how far discussion has progressed. Nonetheless, it would be a good idea to prune the TODO list regularly, such as after a release. We used to do that a bit, not so much lately, perhaps. But everyone is invited to contribute to that. Note, however, that even in well-maintained bug tracking systems, garbage and obsolete items accumulate over time, so some maintenance of this kind would still be necessary. -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] sortsupport for text
On Jun20, 2012, at 19:38 , Peter Geoghegan wrote: On 20 June 2012 17:41, Tom Lane t...@sss.pgh.pa.us wrote: In any case, if you have to redefine the meaning of equality in order to justify a performance patch, I'm prepared to walk away at the start. The advantage of my proposed implementation is precisely that I won't have to redefine the meaning of equality, and that only the text datatype will have to care about equivalency, so you can just skip over an explanation of equivalency for most audiences. Ok, so what exactly is it that you're proposing then? AFAICS, you're proposing to make the less-than operator rely solely on strcoll(). If you don't also redefine the equality operator to match, you'll end up with cases where !(a < b) and !(b < a), yet also !(a == b). This breaks the trichotomy law, no? Which in turn breaks merge-joins - I believe we'd output the cartesian product {potty, potyty} x {potyty, potty} when merge-joining {potty, potyty} with itself, unless the comparison operator contains a tie-breaker. Which is obviously wrong unless you decree that 'potty' = 'potyty'. I do agree that there's a use-case for having a textual type which treats equivalent strings as equal (and which should then also treat equivalent Unicode representations of the same character as equal). But it should be a separate type, not bolted onto the existing textual types. best regards, Florian Pflug -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
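[For reference, the usual way to restore trichotomy is a binary tie-break after strcoll(), along these lines (a sketch; PostgreSQL's own varstr_cmp() applies the same kind of strcmp() fallback for exactly this reason):

    #include <string.h>

    /*
     * Compare two NUL-terminated strings with strcoll(); when the collation
     * claims the strings are "equal" but the bytes differ, fall back to a
     * plain byte comparison.  This keeps the comparison trichotomous:
     * exactly one of a < b, a == b, b < a holds.
     */
    static int
    text_cmp_tiebreak(const char *a, const char *b)
    {
        int result = strcoll(a, b);

        if (result == 0)
            result = strcmp(a, b);  /* tie-break on raw bytes */
        return result;
    }
]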
Re: [HACKERS] Too frequent message of pgbench -i?
On Wed, Jun 20, 2012 at 4:04 AM, Tatsuo Ishii is...@postgresql.org wrote: Currently pgbench -i prints the following message every 10k tuples created: fprintf(stderr, "%d tuples done.\n", j); I think the frequency was decided long ago, when it seemed appropriate; computers have gotten so fast these days that a message every 10k tuples seems too frequent to me. Can we change the frequency from 10k to 100k? +1 for doing it that way. I have an old patch lying around for that. It's one line, which seems to be about as much work as the problem justifies. +1. Your patch looks good. -- Tatsuo Ishii SRA OSS, Inc. Japan English: http://www.sraoss.co.jp/index_en.php Japanese: http://www.sraoss.co.jp -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
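[For the archives, the change under discussion would presumably look something like this inside pgbench's row-generation loop (a sketch, not the actual patch):

    /* report progress every 100k tuples instead of every 10k */
    if (j % 100000 == 0)
        fprintf(stderr, "%d tuples done.\n", j);
]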
Re: [HACKERS] Pruning the TODO list
On 21 June 2012 08:30, Peter Eisentraut pete...@gmx.net wrote: Nonetheless, it would be a good idea to prune the TODO list regularly, such as after a release. We used to do that a bit, not so much lately, perhaps. But everyone is invited to contribute to that. The idea is to remove contentious issues from the list, to avoid the waste of time. I don't believe everyone can do that, not even close. That's why I raise it here, rather than just doing it. -- Simon Riggs http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] [PATCH 08/16] Introduce the ApplyCache module which can reassemble transactions from a stream of interspersed changes
Hi Steve, On Thursday, June 21, 2012 02:16:57 AM Steve Singer wrote: On 12-06-13 07:28 AM, Andres Freund wrote: From: Andres Freund and...@anarazel.de The individual changes need to be identified by an xid. The xid can be a subtransaction or a toplevel one; at commit those can be reintegrated by doing a k-way mergesort between the individual transactions. Callbacks for apply_begin, apply_change and apply_commit are provided to retrieve complete transactions. Missing: - spill-to-disk - correct subtransaction merge, current behaviour is simple/wrong - DDL handling (?) - resource usage controls Here is an initial review of the ApplyCache patch. Thanks! This patch provides a module that takes actions from the WAL stream, groups the actions by transaction, and then passes these change records to a set of plugin functions. For each transaction it encounters it keeps a list of the actions in that transaction. The ilist included in an earlier patch is used; changes resulting from that patch review would affect the code here, but not in a way that changes the design. When the module sees a commit for a transaction it calls the apply_change callback for each change. I can think of three ways that a replication system like this could try to apply transactions. 1) Each time it sees a new transaction it could open up a new transaction on the replica and make that change. It leaves the transaction open and goes on applying the next change (which might be for the current transaction or might be for another one). When it comes across a commit record it would then commit the transaction. If 100 concurrent transactions were open on the origin then 100 concurrent transactions will be open on the replica. 2) Determine the commit order of the transactions, group all the changes for a particular transaction together and apply them in that order for the transaction that committed first, commit that transaction and then move onto the transaction that committed second. 3) Group the transactions in a way that you move the replica from one consistent snapshot to another. This is what Slony and Londiste do because they don't have the commit order or commit timestamps. Built-in replication can do better. This patch implements option (2). If we had a way of implementing option (1) efficiently, would we be better off? Option (2) requires us to put unparsed WAL data (HeapTuples) in the apply cache. You can't translate this to an independent LCR until you call the apply_change callback (which happens once the commit is encountered). The reason is that some of the changes might be DDL (or things generated by a DDL trigger) that will change the translation catalog, so you can't translate the heap data to LCRs until you're at a stage where you can update the translation catalog. In both cases you might need to see later WAL records before you can convert an earlier one into an LCR (i.e. TOAST). Very good analysis, thanks! Another reason why we cannot easily do 1) is that subtransactions aren't discernible from top-level transactions before the top-level commit happens; we can only properly merge in the right order (by sorting via lsn) once we have seen the commit record, which includes a list of all committed subtransactions. I also don't think 1) would be particularly welcome by people trying to replicate into foreign systems. Some of my concerns with the apply cache: big transactions (bulk loads, mass updates) will be cached in the apply cache until the commit comes along.
One issue Slony has to do with bulk operations is that the replicas can't start processing the bulk INSERT until after it has committed. If it takes 10 hours to load the data on the master it will take another 10 hours (at best) to load the data into the replica (20 hours after you start the process). With binary streaming replication your replica is done processing the bulk update shortly after the master is. One reason why we have the low-level apply stuff planned is that that way the apply on the standby actually takes less time than on the master. If you do it right there is significantly lower overhead there. Still, a 10h bulk load will definitely cause problems. I don't think there is a way around that for now. Long-running transactions can sit in the cache for a long time. When you spill to disk we would want the long-running but inactive ones spilled to disk first. This is solvable but adds to the complexity of this module; how were you planning on managing which items of the list get spilled to disk? I planned to have some cutoff 'max_changes_in_memory_per_txn' value. If it has been reached for one transaction all existing changes are spilled to disk. New changes again can be kept in memory until it's reached again. We need to support serializing the cache for crash recovery + shutdown of the receiving side as well. Depending on how
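[To illustrate the cutoff Andres describes, a minimal sketch; the function and field names here are assumptions for illustration, not actual patch code:

    /*
     * Add a change to a transaction's in-memory list; once the transaction
     * has accumulated max_changes_in_memory_per_txn changes, spill what has
     * been collected so far to disk and start filling memory again.
     */
    static void
    applycache_add_change(ApplyCache *cache, ApplyCacheTXN *txn,
                          ApplyCacheChange *change)
    {
        dlist_push_tail(&txn->changes, &change->node);  /* ilist from the earlier patch */
        txn->nentries_mem++;

        if (txn->nentries_mem >= max_changes_in_memory_per_txn)
        {
            applycache_spill_txn(cache, txn);   /* hypothetical spill routine */
            txn->nentries_mem = 0;
        }
    }
]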
Re: [HACKERS] Allow WAL information to recover corrupted pg_controldata
The reason I'm concerned about selecting a next-LSN that's certainly beyond every LSN in the database is that not doing so could result in introducing further corruption, which would be entirely avoidable with more care in choosing the next-LSN. The further corruption can only be possible when we replay some wrong WAL by selecting wrong LSN. No, this is mistaken. Pages in the database that have LSN ahead of where the server thinks the end of WAL is cause lots of problems unrelated to replay; for example, inability to complete a checkpoint. That might not directly lead to additional corruption, but consider the case where such a page gets further modified, and the server decides it doesn't need to create a full-page image because the LSN is ahead of where the last checkpoint was. A crash or two later, you have new problems. In case any modification happens to the database after it has started, even if the next-LSN is the max LSN of pages, the modification can create a problem because the database will be in an inconsistent state. Please correct me if I am wrong in assuming that the next-LSN having its value as the max LSN of pages: 1. has nothing to do with replay (we should still reset the WAL so that no replay happens); 2. is there to avoid some further disasters. With Regards, Amit Kapila. -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] sortsupport for text
On Jun21, 2012, at 02:22 , Peter Geoghegan wrote: I've written a very small C++ program, which I've attached, that basically proves that this can still make a fairly large difference - I hope it's okay that it's C++, but that allowed me to write the program quickly, with no dependencies for anyone who cares to try this out, other than a C++ compiler and the standard library. Uh, are you sure the results aren't swapped? string_wrapper takes a parameter called use_strcoll, and it indeed uses strcoll() if that parameter is true, and strxfrm() otherwise. But then it passes *false* to determine the speed of strcoll(), and *true* to determine the speed of strxfrm()... Also, place_random_string() needs to insert a terminating zero byte, otherwise set_test segfaults, at least on my OSX machine. best regards, Florian Pflug -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] sortsupport for text
On 21 June 2012 10:24, Florian Pflug f...@phlo.org wrote: On Jun21, 2012, at 02:22 , Peter Geoghegan wrote: I've written a very small C++ program, which I've attached, that basically proves that this can still make a fairly large difference - I hope it's okay that it's C++, but that allowed me to write the program quickly, with no dependencies for anyone who cares to try this out, other than a C++ compiler and the standard library. Uh, are you sure the results aren't swapped? string_wrapper takes a parameter called use_strcoll, and it indeed uses strcoll() if that parameter is true, and strxfrm() otherwise. But then it passes *false* to determine the speed of strcoll(), and *true* to determine the speed of strxfrm()... Egads, you're right. Apparently in my haste I gave the wrong boolean argument to the class constructor in each case. I am completely wrong about this. I apologise for the careless mistake. At least I advanced the debate, though - we don't want to do any ad-hoc generation of strxfrm() output within comparators, even when one side can have a cached value. You're right about merge-joins; I cannot answer that without arbitrarily making a convenient exception about the conditions that they join on, which is itself unacceptable. I cannot change the semantics of equality to do something with strxfrm() either, since that would have unacceptable performance implications, as is evident from my test-case. I wish someone had a better idea, but I suppose at this point the idea of concatenating strxfrm() output with the original string begins to look like the least-worst option. -- Peter Geoghegan http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training and Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] sortsupport for text
On Jun21, 2012, at 11:55 , Peter Geoghegan wrote: On 21 June 2012 10:24, Florian Pflug f...@phlo.org wrote: On Jun21, 2012, at 02:22 , Peter Geoghegan wrote: I've written a very small C++ program, which I've attached, that basically proves that this can still make a fairly large difference - I hope it's okay that it's C++, but that allowed me to write the program quickly, with no dependencies for anyone who cares to try this out, other than a C++ compiler and the standard library. Uh, are you sure the results aren't swapped? string_wrapper takes a parameter called use_strcoll, and it indeed uses strcoll() if that parameter is true, and strxfrm() otherwise. But then it passes *false* to determine the speed of strcoll(), and *true* to determine the speed of strxfrm()... Egads, you're right. Apparently in my haste I gave the wrong boolean argument to the class constructor in each case. I am completely wrong about this. I apologise for the careless mistake. At least I advanced the debate, though - we don't want to do any ad-hoc generation of strxfrm() output within comparators, even when one side can have a cached value. FWIW, I've played around a bit with a fixed version of your set_test.cpp. I made it use the cached sort key of the RHS also, made it preserve sort keys during copy construction, even used a boost::shared_ptr to avoid actually copying the cached sort key. The strxfrm() version still performed about 30% worse than the strcoll() version. From that, I figured that tree inserts don't trigger enough comparisons to make strxfrm() worthwhile. So I mangled set_test.cpp a bit more and instead of a std::set, I created a (C-style) array of string_wrapper instances, and sorted them with std::sort(). Which made strcoll() twice as fast as strxfrm()... At this point, my theory is that your choice of random strings prevents strxfrm() from ever winning over strcoll(). The reason being that you pick each letter uniformly distributed from a-z, resulting in a probability of two strings differing in the first character of 1 - 1/26 =~ 96%. Thus, even for extremely complex collation rules, strcoll() probably only needs to compare a few characters to determine the order. Whereas strxfrm() has to compute the whole sort key, no matter what. The question is thus, how good a model are your random strings for the input of a typical sorting step in postgres? My guess is, a quite good one actually, since people probably don't deal with a lot of very similar strings very often. Which makes me wonder if using strxfrm() during sorting wouldn't be a net loss, all things considered… My modified set_test.cpp (which should now be called array_test.cpp) is attached. best regards, Florian Pflug set_test.cpp Description: Binary data -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] sortsupport for text
On 21 June 2012 11:40, Florian Pflug f...@phlo.org wrote: At this point, my theory is that your choice of random strings prevents strxfrm() from ever winning over strcoll(). The reason being that you pick each letter uniformly distributed from a-z, resulting in a probability of two strings differing in the first character of 1 - 1/26 =~ 96%. Thus, even for extremely complex collation rules, strcoll() probably only needs to compare a few characters to determine the order. Whereas strxfrm() has to compute the whole sort key, no matter what. Good point. The question is thus, how good a model are your random strings for the input of a typical sorting step in postgres? My guess is, a quite good one actually, since people probably don't deal with a lot of very similar strings very often. Which makes me wonder if using strxfrm() during sorting wouldn't be a net loss, all things considered… Well, I think the answer to that has to be no. I posted a simple SQL-callable wrapper function (postgres text in, strxfrm() blob out as bytea) a few days ago that clearly wins by quite a bit on some sample queries against the dellstore sample database, which has a representative sample set. Completely uniformly-distributed data actually isn't representative of the real world at all. Normal distributions abound. This C++ program was never intended to justify the general utility of using strxfrm() for sorting, which I don't believe is in question. I just wanted to get some idea about the performance characteristics of using strxfrm() to traverse an index. -- Peter Geoghegan http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training and Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
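[For anyone who wants to reproduce the comparison, the standard strxfrm() sorting pattern is to pay the collation cost once per string up front and then compare the transformed blobs with plain byte comparisons (a self-contained sketch):

    #include <stdlib.h>
    #include <string.h>

    typedef struct SortItem
    {
        char   *str;    /* original string */
        char   *key;    /* strxfrm() blob; ordered by strcmp() */
    } SortItem;

    static void
    make_key(SortItem *it)
    {
        /* first call with size 0 reports the required buffer length */
        size_t len = strxfrm(NULL, it->str, 0);

        it->key = malloc(len + 1);
        strxfrm(it->key, it->str, len + 1);
    }

    static int
    cmp_items(const void *a, const void *b)
    {
        /* each of the O(n log n) comparisons is now a cheap byte compare */
        return strcmp(((const SortItem *) a)->key,
                      ((const SortItem *) b)->key);
    }

After calling make_key() on every element, qsort(items, n, sizeof(SortItem), cmp_items) completes without ever calling strcoll().]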
Re: [HACKERS] not null validation option in contrib/file_fdw
I rebased the patch to current head. Attached is an updated version of the patch. Best regards, Etsuro Fujita -Original Message- From: pgsql-hackers-ow...@postgresql.org [mailto:pgsql-hackers-ow...@postgresql.org] On Behalf Of Etsuro Fujita Sent: Tuesday, April 17, 2012 2:40 PM To: 'Andrew Dunstan'; 'Shigeru HANADA' Cc: pgsql-hackers@postgresql.org Subject: Re: [HACKERS] not null validation option in contrib/file_fdw I updated the patch. Attached is an updated version of the patch. Changes: * fix a bug in fileGetOptions() * rename the validation option and its code to validate_data_file * clean up Best regards, Etsuro Fujita -Original Message- From: pgsql-hackers-ow...@postgresql.org [mailto:pgsql-hackers-ow...@postgresql.org] On Behalf Of Etsuro Fujita Sent: Monday, April 16, 2012 4:09 PM To: 'Andrew Dunstan'; 'Shigeru HANADA' Cc: pgsql-hackers@postgresql.org Subject: Re: [HACKERS] not null validation option in contrib/file_fdw Thank you for the review. -Original Message- From: pgsql-hackers-ow...@postgresql.org [mailto:pgsql-hackers-ow...@postgresql.org] On Behalf Of Andrew Dunstan Sent: Friday, April 13, 2012 9:16 PM To: Shigeru HANADA Cc: Etsuro Fujita; pgsql-hackers@postgresql.org Subject: Re: [HACKERS] not null validation option in contrib/file_fdw On 04/13/2012 07:21 AM, Shigeru HANADA wrote: (2012/04/13 16:59), Etsuro Fujita wrote: I updated the patch added to CF 2012-Next [1]. Attached is the updated version of the patch. I applied the patch and ran regression tests of file_fdw, and I got SIGSEGV X-( The failure occurs in fileGetOptions, and it is caused by list_delete_cell used in a foreach loop; the ListCell pointing to the delete target has been freed by list_delete_cell, but foreach accesses it to get the next element. Some backend functions which use list_delete_cell in a loop use a for loop instead of foreach, and other functions exit the loop after calling list_delete_cell. Since we can't stop searching for non-COPY options until meeting the end of the options list, we would need to choose the former (for loop), or create another list which contains only valid COPY options and return it via the other_options parameter. Yes, the code in fileGetOptions() appears to be bogus. Sorry, I will fix it. Also, validate is a terrible name for the option (and in the code) IMNSHO. It's far too generic. validate_not_null or some such would surely be better. I thought it would be used for not only NOT NULL but also CHECK and foreign key constraints. That is, when a user sets the option to 'true', file_fdw verifies that each tuple meets all kinds of constraints. So, how about validate_data_file or simply validate_file? Best regards, Etsuro Fujita cheers andrew file_fdw_notnull_v4.patch Description: Binary data -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] pl/perl and utf-8 in sql_ascii databases
Hello, Seems like we missed the fact that we still did SvUTF8_on() in sv2cstr and SvPVUTF8() when turning a perl string into a cstring. Right. I spent a bit more time catching up on pl/perl and now understand what the problem is... So I played a bit with this patch, and touched it a bit mainly just to add some more comments; and while at it I noticed that some of the functions in Util.xs might leak some memory, so I made an attempt to plug them, as in the attached patch (which supersedes yours). OK. Is it ok to first look into the newer patch, including the fix for the leaks? -- Coding and styles. This also seems to have polished the previous one in some code, style and comment details, which generally look reasonable. And the patch style was corrected to a unified diff. -- Functions It seems to work properly on databases whose encodings are SQL_ASCII and UTF8 (and EUC-JP), as below, = = create or replace function foo(text) returns text language plperlu as $$ $a = shift; return "BOO!" if ($a ne "a\x80cあ"); return $a; $$; SQL_ASCII= select foo(E'a\200cあ') = E'a\200cあ'; ?column? -- t UTF8= select foo(E'a\200cあ'); ERROR: invalid byte sequence for encoding UTF8: 0x80 UTF8= select foo(E'a\302\200cあ') = E'a\u0080cあ'; ?column? -- t = This looks quite valid according to the definition of the encodings and perl's nature as far as I can see. -- The others Variable naming in util_quote_*() seems a bit confusing, text *arg = sv2text(sv); text *ret = DatumGetTextP(..., PointerGetDatum(arg)); char *str; pfree(arg); str = text_to_cstring(ret); RETVAL = cstr2sv(str); pfree(str); Renaming ret to quoted and str to ret, as in the attached patch, might make it more readable. Now, with my version of the patch applied and using a SQL_ASCII database to test the problem in the original report, I notice that we now have a regression failure: snip. I'm not really sure what to do here -- maybe having a second expected file for that test is a good enough answer? Or should I just take the test out? Opinions please. The attached ugly patch does it. It seems we should put NO_LOCALE=1 on the 'make check' command line for encodings not compatible with the environment's locale, although it looks to work. # UtfToLocal() seems to have a bug in that it always reports the illegal encoding as UTF8 regardless of the real encoding. But plperl_lc_*.(sql|out) increases if the bug is fixed. regards, -- Kyotaro Horiguchi NTT Open Source Software Center == My e-mail address has been changed since Apr. 1, 2012.
diff --git a/src/pl/plperl/Util.xs b/src/pl/plperl/Util.xs
index 7d0102b..4b4b680 100644
--- a/src/pl/plperl/Util.xs
+++ b/src/pl/plperl/Util.xs
@@ -67,8 +67,11 @@
 static text *
 sv2text(SV *sv)
 {
 	char *str = sv2cstr(sv);
+	text *text;

-	return cstring_to_text(str);
+	text = cstring_to_text(str);
+	pfree(str);
+	return text;
 }

 MODULE = PostgreSQL::InServer::Util PREFIX = util_

@@ -113,10 +116,12 @@ util_quote_literal(sv)
 	}
 	else
 	{
 		text *arg = sv2text(sv);
-		text *ret = DatumGetTextP(DirectFunctionCall1(quote_literal, PointerGetDatum(arg)));
-		char *str = text_to_cstring(ret);
-		RETVAL = cstr2sv(str);
-		pfree(str);
+		text *quoted = DatumGetTextP(DirectFunctionCall1(quote_literal, PointerGetDatum(arg)));
+		char *ret;
+		pfree(arg);
+		ret = text_to_cstring(quoted);
+		RETVAL = cstr2sv(ret);
+		pfree(ret);
 	}
 	OUTPUT:
 	RETVAL

@@ -132,10 +137,12 @@ util_quote_nullable(sv)
 	else
 	{
 		text *arg = sv2text(sv);
-		text *ret = DatumGetTextP(DirectFunctionCall1(quote_nullable, PointerGetDatum(arg)));
-		char *str = text_to_cstring(ret);
-		RETVAL = cstr2sv(str);
-		pfree(str);
+		text *quoted = DatumGetTextP(DirectFunctionCall1(quote_nullable, PointerGetDatum(arg)));
+		char *ret;
+		pfree(arg);
+		ret = text_to_cstring(quoted);
+		RETVAL = cstr2sv(ret);
+		pfree(ret);
 	}
 	OUTPUT:
 	RETVAL

@@ -145,14 +152,15 @@ util_quote_ident(sv)
 	SV *sv
 	PREINIT:
 		text *arg;
-		text *ret;
-		char *str;
+		text *quoted;
+		char *ret;
 	CODE:
 		arg = sv2text(sv);
-		ret = DatumGetTextP(DirectFunctionCall1(quote_ident, PointerGetDatum(arg)));
-		str = text_to_cstring(ret);
-		RETVAL = cstr2sv(str);
-		pfree(str);
+		quoted = DatumGetTextP(DirectFunctionCall1(quote_ident, PointerGetDatum(arg)));
+		pfree(arg);
+		ret = text_to_cstring(quoted);
+		RETVAL = cstr2sv(ret);
+		pfree(ret);
 	OUTPUT:
 	RETVAL

diff --git a/src/pl/plperl/plperl_helpers.h b/src/pl/plperl/plperl_helpers.h
index 1b6648b..ed99194 100644
--- a/src/pl/plperl/plperl_helpers.h
+++ b/src/pl/plperl/plperl_helpers.h
@@ -3,21 +3,29 @@
 /*
  * convert from utf8 to database encoding
+ *
+ * Returns a palloc'ed copy of the original string
  */
 static inline char *
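[The last hunk above is cut off mid-declaration; for context, the function being introduced presumably continues roughly like this - a reconstruction based on the comment, not the actual patch text:

    static inline char *
    utf_u2e(const char *utf8_str, size_t len)
    {
        /* convert from UTF-8 to the database encoding */
        char *ret = (char *) pg_do_encoding_conversion((unsigned char *) utf8_str,
                                                       len, PG_UTF8,
                                                       GetDatabaseEncoding());

        /* ensure the caller always gets a palloc'ed copy it may pfree */
        if (ret == utf8_str)
            ret = pnstrdup(ret, len);

        return ret;
    }
]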
[HACKERS] Catalog/Metadata consistency during changeset extraction from wal
Hi Robert, Hi all, Robert and I talked quite a bit about different methods of providing enough information to extract tuples from WAL. I don't think either of us is yet really convinced of any individual method, so I want to recap our discussion in one email so others can chime in without reading the already huge thread. I hope I am not misrepresenting Robert's opinion here, but I am sure he will correct me if I do ;) To satisfy the different needs people have for changeset extraction we currently think that for each individual tuple extracted from WAL we need to provide the following information: a) one or more pieces of tuple data (HeapTupleData structs) * INSERT: full new tuple * UPDATE: full new tuple, old pkey (optionally the full old tuple) * DELETE: old pkey (optionally the full old tuple) b) the action performed (INSERT|UPDATE|DELETE) c) the table on which the action was performed d) access to the table structure (names, column types, ...) and procs (*_out functions for the individual columns) The problem with getting that data is that at the point where we're decoding the WAL the catalog may have evolved significantly from the state it was in when the tuple was put into the WAL. We can extract a) and b) without any problems (let's not talk about it here) but we don't necessarily know how to make sense of the data because a HeapTuple cannot be properly interpreted without the knowledge of c) and d). I am of the opinion that solving c) is basically equivalent to solving d) because the WAL only contains the tuple (pg_database.oid, pg_tablespace.oid, pg_class.relfilenode) of the table and not the 'pg_class.oid'. The relfilenode is changed by operations that rewrite the table like ALTER TABLE ADD COLUMN ... DEFAULT ...; TRUNCATE; CLUSTER and some others. A single transaction can contain tuples for different relfilenodes and with different columns:

CREATE TABLE foo(id serial primary key, data text);
BEGIN;
INSERT INTO foo ...;
TRUNCATE foo;
INSERT INTO foo ...;    -- same structure, different relfilenode
ALTER TABLE foo ADD COLUMN bar text;
INSERT INTO foo ...;    -- same relfilenode, different table structure
ALTER TABLE foo ADD COLUMN zaphod text DEFAULT '';
INSERT INTO foo ...;    -- different relfilenode, different table structure
COMMIT;

There are several additional complex scenarios. In http://archives.postgresql.org/message-id/201206192023.20589.and...@2ndquadrant.com I listed which options I see for reaching that goal. A refined version of that list: 1.) Decode on a different, possibly catalog-only, pg instance kept in sync using the command trigger infrastructure (but not necessarily user-level defined command triggers) If the command/event trigger infrastructure logs into a system-catalog table, keeping the catalog in the correct state is relatively easy. When replaying/converting a reassembled transaction, every time an INSERT into that system table happens the contained DDL gets performed. The locking on the generating side takes care of the concurrency aspects. Advantages: * minimal overhead (space, performance) * allows additional tables/indexes/triggers if you take care with oid allocation * easy transactionally correct catalog behaviour * the decoding instance can be used to store all data in a highly efficient manner (no decoding, no full detoasting, ...)
* the decoding instance is fully writable without problems if you don't generate conflicts (separate tables, non-overlapping writes, whatever) * implementable in a pretty unintrusive way Disadvantages: * the table structure of replicated tables needs to be exactly the same * the type definition + support procs need to be similar enough to read the data * error checking of the above isn't easy but is probably possible * full version/architecture compatibility required * a proxy instance is required even if you want to replicate into some other system/architecture/version 2.) Keep the decoding site up2date by replicating the catalog via normal HS recovery mechanisms. Advantages: * most of the technology is already there * minimal overhead (space, performance) * no danger of out-of-sync catalogs * no support for command triggers required that can keep a catalog in sync, including oids Disadvantages: * driving the catalog recovery that way requires some somewhat intricate code as it needs to be done in lockstep with decoding the WAL stream * requires an additional feature to guarantee HS always has enough information to be queryable after a crash/shutdown * some complex logic/low-level fudging required to keep the transactional behaviour sensible when querying the catalog * full version/architecture compatibility required * the decoding site will only ever be readable 3) Multi-Versioned catalog Below are two possible implementation strategies for that concept. Advantages: * Decoding is done on the master in an asynchronous fashion * low overhead during normal DML execution, not much additional
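[To make the a)-c) requirements from the top of this mail concrete, a change record would need to carry roughly the following (a hypothetical sketch, not code from the patchset):

    typedef enum LCRAction
    {
        LCR_INSERT,
        LCR_UPDATE,
        LCR_DELETE
    } LCRAction;

    typedef struct LogicalChangeRecord
    {
        LCRAction   action;        /* b) what was done */

        /*
         * c) table identity as it appears in WAL: the relfilenode triple,
         * not pg_class.oid.
         */
        Oid         dboid;         /* pg_database.oid */
        Oid         spcoid;        /* pg_tablespace.oid */
        Oid         relfilenode;   /* pg_class.relfilenode */

        /* a) tuple data; which fields are valid depends on the action */
        HeapTuple   newtuple;      /* INSERT/UPDATE: full new tuple */
        HeapTuple   oldtuple;      /* UPDATE/DELETE: old pkey, or full old tuple */
    } LogicalChangeRecord;

Requirement d) - being able to interpret those HeapTuples - is what the catalog options above are all trying to solve.]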
Re: [HACKERS] pl/perl and utf-8 in sql_ascii databases
Ouch! # UtfToLocal() seems to have a bug in that it always reports the illegal encoding as UTF8 regardless of the real encoding. But plperl_lc_*.(sql|out) increases if the bug is fixed. This is not a bug. The error message invalid byte sequence for encoding UTF8 meant an invalid INPUT byte sequence in encoding UTF8, not the encoding for output. So the regtest patch covers all of the possible patterns - UTF8 and SQL_ASCII. regards, -- Kyotaro Horiguchi NTT Open Source Software Center == My e-mail address has been changed since Apr. 1, 2012. -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Resource Owner reassign Locks
On 18.06.2012 13:59, Heikki Linnakangas wrote: On 10.06.2012 23:39, Jeff Janes wrote: I found the interface between resowner.c and lock.c a bit confusing. resowner.c would sometimes call LockReassignCurrentOwner() to reassign all the locks, and sometimes it would call LockReassignCurrentOwner() on each individual lock, with the net effect that's the same as calling LockReleaseCurrentOwner(). And it doesn't seem right for ResourceOwnerRemember/ForgetLock to have to accept a NULL owner. I rearranged that so that there's just a single LockReassignCurrentOwner() function, like before this patch. But it takes as an optional argument a list of locks held by the current resource owner. If the caller knows it, it can pass them to make the call faster, but if it doesn't it can just pass NULL and the function will traverse the hash table the old-fashioned way. I think that's a better API. Please take a look to see if I broke something. I hear no complaints, so committed. Thanks! -- Heikki Linnakangas EnterpriseDB http://www.enterprisedb.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Catalog/Metadata consistency during changeset extraction from wal
On Jun21, 2012, at 13:41 , Andres Freund wrote: 3b) Ensure that enough information in the catalog remains by fudging the xmin horizon. Then reassemble an appropriate snapshot to read the catalog as the tuple in question has seen it. The ComboCID machinery makes that quite a bit harder, I fear. If a tuple is updated multiple times by the same transaction, you cannot decide whether a tuple was visible in a certain snapshot unless you have access to the updating backend's ComboCID hash. best regards, Florian Pflug -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Pruning the TODO list
Simon Riggs si...@2ndquadrant.com writes: On 21 June 2012 08:30, Peter Eisentraut pete...@gmx.net wrote: Nonetheless, it would be a good idea to prune the TODO list regularly, such as after a release. We used to do that a bit, not so much lately, perhaps. But everyone is invited to contribute to that. The idea is to remove contentious issues from the list, to avoid the waste of time. The thing is, a lot of stuff gets punted to the TODO list *because* it's contentious, ie there's not consensus on what to do. If there were consensus we might've just done it already. I'm not sure we want to remove such entries, though perhaps somehow marking them as debatable would be a good thing. There may well be stuff on the list that is no longer very relevant in today's world, but somebody would have to go through it item by item to decide which ones those are. I'm not volunteering. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Catalog/Metadata consistency during changeset extraction from wal
On Jun21, 2012, at 13:41 , Andres Freund wrote: 5.) The actually good idea. Yours? What about a mixture of (3b) and (4), which writes the data not to the WAL but to a separate logical replication log. More specifically: There's a per-backend queue of change notifications. Whenever a non-catalog tuple is modified, we queue a TUPLE_MODIFIED record containing (xid, databaseoid, tableoid, old xmin, old ctid, new ctid). Whenever a table (or something that a table depends on) is modified, we wait until all references to that table's oid have vanished from the queue, then queue a DDL record containing (xid, databaseoid, tableoid, text). Other backends cannot concurrently add further TUPLE_MODIFIED records since we already hold an exclusive lock on the table at that point. A background process continually processes these queues. If the front of the queue is a TUPLE_MODIFIED record, it fetches the old and the new tuple based on their ctids and writes the old tuple's PK and the full new tuple to the logical replication log. Since table modifications always wait for all previously queued TUPLE_MODIFIED records referencing that table to be processed *before* altering the catalog, tuples can always be interpreted according to the current (SnapshotNow) catalog contents. Upon transaction COMMIT and ROLLBACK, we queue COMMIT and ROLLBACK records, which are also written to the log by the background process. The background process may decide to wait until a backend commits before processing that backend's queue. In that case, rolled-back transactions don't leave a trace in the logical replication log. Should a backend, however, issue a DDL statement, the background process *must* process that backend's queue immediately, since otherwise there's a deadlock. The background process also maintains a value in shared memory which contains the oldest value in any of the queues' xid or old xmin fields. VACUUM and the like must not remove tuples whose xmin is >= that value. Hint bits *may* be set for newer tuples though, provided that the background process ignores hint bits when fetching the old and new tuples. best regards, Florian Pflug -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
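[In code terms, the queue entries described above might look like this (a sketch of the proposal, not existing code):

    typedef enum QueuedRecKind
    {
        REC_TUPLE_MODIFIED,
        REC_DDL,
        REC_COMMIT,
        REC_ROLLBACK
    } QueuedRecKind;

    typedef struct QueuedRec
    {
        QueuedRecKind kind;
        TransactionId xid;
        Oid           databaseoid;
        Oid           tableoid;

        /* REC_TUPLE_MODIFIED: enough to re-fetch both tuple versions later */
        TransactionId old_xmin;
        ItemPointerData old_ctid;
        ItemPointerData new_ctid;

        /* REC_DDL: the statement text */
        char         *ddl_text;
    } QueuedRec;

The background process resolves old_ctid/new_ctid back to tuples (ignoring hint bits, per the last paragraph) and writes the old tuple's PK plus the full new tuple to the logical replication log.]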
[HACKERS] Reseting undo/redo logs
I have this issue on Greenplum, which is an MPP hybrid built from postgres 8.2, and the issue I am seeing is 100% from pg code. One of the Greenplum segments went down and it cannot recover: PANIC XX000 invalid redo/undo record in shutdown checkpoint (xlog.c:6576). I am posting this question here because most casual users of Postgres/Greenplum are telling me that the database is hosed, but I think that with pg_resetxlog (http://www.postgresql.org/docs/8.2/static/app-pgresetxlog.html) and some data loss I could at least hack the database to come back up. What I am asking for help with here is calculating the reset values - where to find the most recent valid values and how to *specifically* calculate the new ones. Please advise, Edmon

2012-06-12 13:16:18.614912 EDT p14611 th802662304 0 seg-1 LOG 0 mirror transition, primary address(port) 'boxgp10a(41001)' mirror address(port) 'boxgp02a(51001)' mirroring role 'primary role' mirroring state 'change tracking' segment state 'not initialized' process name(pid) 'filerep main process(14611)' filerep state 'not initialized' 0 cdbfilerep.c 3371
2012-06-12 13:16:18.617047 EDT p14612 th802662304 0 seg-1 LOG 0 CHANGETRACKING: ChangeTracking_RetrieveIsTransitionToInsync() found insync_transition_completed:'false' full resync:'false' 0 cdbresynchronizechangetracking.c 2522
2012-06-12 13:16:18.617113 EDT p14612 th802662304 0 seg-1 LOG 0 CHANGETRACKING: ChangeTracking_RetrieveIsTransitionToResync() found resync_transition_completed:'false' full resync:'false' 0 cdbresynchronizechangetracking.c 2559
2012-06-12 13:16:18.746870 EDT p14612 th802662304 0 seg-1 LOG 0 searching for last checkpoint location for creating the initial resynchronize changetracking 0 xlog.c 10836
2012-06-12 13:16:18.747318 EDT p14612 th802662304 0 seg-1 LOG 0 record with zero length at 14/4870 0 xlog.c 4182
2012-06-12 13:16:18.747491 EDT p14612 th802662304 0 seg-1 LOG 0 scanned through 1 initial xlog records since last checkpoint for writing into the resynchronize change log 0 cdbresynchronizechangetracking.c 206
2012-06-12 13:16:18.750830 EDT p14624 th802662304 0 seg-1 LOG 0 database system was shut down at 2012-06-12 11:00:13 EDT 0 xlog.c 6326
2012-06-12 13:16:18.750987 EDT p14624 th802662304 0 seg-1 LOG 0 checkpoint record is at 14/4820 0 xlog.c 6425
2012-06-12 13:16:18.751016 EDT p14624 th802662304 0 seg-1 LOG 0 redo record is at 14/4820; undo record is at 14/42AC2118; shutdown TRUE 0 xlog.c 6534
2012-06-12 13:16:18.751041 EDT p14624 th802662304 0 seg-1 LOG 0 next transaction ID: 0/4553423; next OID: 241771 0 xlog.c 6538
2012-06-12 13:16:18.751065 EDT p14624 th802662304 0 seg-1 LOG 0 next MultiXactId: 271; next MultiXactOffset: 549 0 xlog.c 6541
2012-06-12 13:16:18.796637 EDT p14624 th802662304 0 seg-1 PANIC XX000 invalid redo/undo record in shutdown checkpoint
Re: [HACKERS] Release versioning inconsistency
On Wed, Jun 20, 2012 at 5:20 PM, Tom Lane t...@sss.pgh.pa.us wrote: Peter Eisentraut pete...@gmx.net writes: On ons, 2012-06-20 at 13:26 +0200, Magnus Hagander wrote: That might actually be a good idea. We can't really change the way we named the betas, but it's not too late to consider naming the actual release as 9.2.0... The final release was always going to be called 9.2.0, but naming the beta 9.2.0betaX is wrong. There was a previous discussion about that particular point. Yes. There is no reason to change the naming scheme we have been using for years now (at least since version_stamp.pl was invented for 7.4). The only problem is that somebody got the name of the directory wrong on the FTP server. If that wasn't clear, then yes, that was me. I don't recall the reason why using 9.2.0betaX was actually wrong - I realize that's not the name of the version, so thereby the directory was wrong. But in what way would it be wrong to call the version that? Given that it would help with sorting. (And yes, this is very much a forward-looking question, more about 9.3, since we can't really go back and change the current filename.) -- Magnus Hagander Me: http://www.hagander.net/ Work: http://www.redpill-linpro.com/ -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Release versioning inconsistency
On Wed, Jun 20, 2012 at 1:35 PM, Dickson S. Guedes lis...@guedesoft.net wrote: 2012/6/20 Magnus Hagander mag...@hagander.net: On Wed, Jun 20, 2012 at 11:23 AM, Marti Raudsepp ma...@juffo.org wrote: On Wed, Jun 20, 2012 at 12:18 PM, Magnus Hagander mag...@hagander.net wrote: (I do believe that using the v9.2.0beta marker is *better*, because then it sorts properly. But likely not enough much better to be inconsistent with previous versions) Good point. Maybe that's a reason to change the versioning scheme and stick with 9.2.0betaX everywhere. Including calling the final release 9.2.0 instead of simply 9.2? That might actually be a good idea. We can't really change the way we named the betas, but it's not too late to consider naming the actual release as 9.2.0... May be a symlink could be created just do fit the same pattern that other versions do and keeps the actual links (for beta) working. That we can do - in fact, done. -- Magnus Hagander Me: http://www.hagander.net/ Work: http://www.redpill-linpro.com/ -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Pruning the TODO list
On 21 June 2012 15:00, Tom Lane t...@sss.pgh.pa.us wrote: Simon Riggs si...@2ndquadrant.com writes: On 21 June 2012 08:30, Peter Eisentraut pete...@gmx.net wrote: Nonetheless, it would be a good idea to prune the TODO list regularly, such as after a release. We used to do that a bit, not so much lately, perhaps. But everyone is invited to contribute to that. The idea is to remove contentious issues from the list, to avoid the waste of time. The thing is, a lot of stuff gets punted to the TODO list *because* it's contentious, ie there's not consensus on what to do. If there were consensus we might've just done it already. I'm not sure we want to remove such entries, though perhaps somehow marking them as debatable would be a good thing. There may well be stuff on the list that is no longer very relevant in today's world, but somebody would have to go through it item by item to decide which ones those are. I'm not volunteering. *smiles* Understood. I'll have a play. Maybe I should just go with the idea of Simon's TODO List - stuff I personally think is worth working on, and leave it at that. -- Simon Riggs http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] pl/perl and utf-8 in sql_ascii databases
On Wed, Jun 20, 2012 at 1:15 PM, Alvaro Herrera alvhe...@commandprompt.com wrote: Excerpts from Alex Hunsaker's message of vie feb 10 16:53:05 -0300 2012: Seems like we missed the fact that we still did SvUTF8_on() in sv2cstr and SvPVUTF8() when turning a perl string into a cstring. Right. So I played a bit with this patch, and touched it a bit mainly just to add some more comments; and while at it I noticed that some of the functions in Util.xs might leak some memory, so I made an attempt to plug them, as in the attached patch (which supersedes yours). I think most of these leaks go back to 9.0. Dunno if it's worth backpatching them... to test the problem in the original report, I notice that we now have a regression failure: I'm not really sure what to do here -- maybe having a second expected file for that test is a good enough answer? Or should I just take the test out? Opinions please. I think we have broken that check twice, so it seems like it would be nice to keep it. But I don't feel *too* strongly about it. The comment and cleanups all look good to me. Thanks! -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Catalog/Metadata consistency during changeset extraction from wal
On 21 June 2012 12:41, Andres Freund and...@2ndquadrant.com wrote: 3) Multi-Versioned catalog Below are two possible implementation strategies for that concept. Advantages: * Decoding is done on the master in an asynchronous fashion * low overhead during normal DML execution, not much additional code in that path * can be very efficient if architecture/version are the same * version/architecture compatibility can be done transparently by falling back to textual versions on mismatch Disadvantages: * decoding probably has to happen on the master which might not be what people want performancewise 3a) Change the system catalogs to be versioned Advantages: * catalog access is easy * might be interesting for other users Disadvantages: * catalog versioning is complex to implement * space overhead for all users, even without using logical replication * I can't see -hackers signing off Hmm, there's all sorts of stuff mixed up there in your description. ISTM we should maintain a lookup table on target system that has the minimal required information on it. There is no need to version the whole catalog. (Complete overkill - I would oppose it ;-) If we keep the lookup table on the target as a normal table, we can insert new rows into it as changes occur. If we need to perform recovery then the earlier version rows will still be there and we just use those. Versioning is easy to implement, just use LSN as an additional key in the table. Then look up based on key and LSN. If a transaction that makes DDL changes aborts, then the changes will be automatically backed out. Only keep the lookup table if using logical replication, so zero overhead otherwise. We just need to set up the initial state carefully, so it matches what's in the database, but that sounds OK. So I don't see any of the disadvantages you have there. It's just darn simple, and hence will probably work. It's also a very similar solution to the other lookups required in memory by the apply process. -- Simon Riggs http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Catalog/Metadata consistency during changeset extraction from wal
Hi, On Thursday, June 21, 2012 04:39:21 PM Simon Riggs wrote: On 21 June 2012 12:41, Andres Freund and...@2ndquadrant.com wrote: 3) Multi-Versioned catalog Below are two possible implementation strategies for that concept. Advantages: * Decoding is done on the master in an asynchronous fashion * low overhead during normal DML execution, not much additional code in that path * can be very efficient if architecture/version are the same * version/architecture compatibility can be done transparently by falling back to textual versions on mismatch Disadvantages: * decoding probably has to happen on the master which might not be what people want performancewise 3a) Change the system catalogs to be versioned Advantages: * catalog access is easy * might be interesting for other users Disadvantages: * catalog versioning is complex to implement * space overhead for all users, even without using logical replication * I can't see -hackers signing off Hmm, there's all sorts of stuff mixed up there in your description. Sure, I tried to compress a complex topic discussed in a long thread ;) ISTM we should maintain a lookup table on target system that has the minimal required information on it. You need just about the whole catalog because the *_out procs might need to look up types, operators and such again. Unless you want to rewrite those functions you need to provide a normal execution environment. I don't see how your idea works because of that? Am I missing something? Yes, that would be easier if we didn't want to support conversion to text and similar, but I don't see that flying. And even if it would be acceptable you would need to have enough information to construct a btree ScanKey, which means you already need a lot of the catalogs. There is no need to version the whole catalog. (Complete overkill - I would oppose it ;-) Hey, that originally was your idea :P. But I definitely agree, it's not a good idea. Greetings, Andres -- Andres Freund http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] pl/perl and utf-8 in sql_ascii databases
On Thu, Jun 21, 2012 at 5:22 AM, Kyotaro HORIGUCHI horiguchi.kyot...@lab.ntt.co.jp wrote:

So I played a bit with this patch, and touched it a bit mainly [...] functions in Util.xs might leak some memory, so I made an attempt to

OK. Is it OK to look into the newer patch, including the fix for the leaks, first?

Yeah :-).

Variable naming in util_quote_*() seems a bit confusing. Renaming ret to quoted and str to ret as in the attached patch might make it more readable.

Ok.

[ regression failure on a SQL_ASCII database ] I'm not really sure what to do here -- maybe a second expected file for that test is a good enough answer? Or should I just take the test out? Opinions please.

The attached ugly patch does it. It seems we should put NO_LOCALE=1 on the 'make check' command line for encodings not compatible with the environment locale, although it seems to work.

+REGRESS_LC0 = $(subst .sql,,$(shell cd sql; ls plperl_lc_$(shell echo $(ENCODING) | tr A-Z- a-z_).sql 2>/dev/null))
+REGRESS_LC = $(if $(REGRESS_LC0),$(REGRESS_LC0),plperl_lc)
+REGRESS = plperl $(REGRESS_LC) plperl_trigger plperl_shared plperl_elog plperl_util plperl_init plperlu plperl_array

Hrm, that's quite cute. I dunno if there is a more canonical way of doing the above -- but it seems to work. I'm not sure this regression test is worth it. I'm thinking maybe we should just remove the regression test instead.

There is a minor issue with the patch where sql/plperl_lc_sql_ascii.sql contains the text plperl_lc.sql. After copying sql/plperl_lc.sql to sql/plperl_lc_sql_ascii.sql everything worked as described.

Thanks for the review!

-- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Catalog/Metadata consistency during changeset extraction from wal
On 21 June 2012 15:53, Andres Freund and...@2ndquadrant.com wrote: ISTM we should maintain a lookup table on the target system that has the minimal required information on it. You need just about the whole catalog because the *_out procs might need to look up types, operators and such again. Unless you want to rewrite those functions you need to provide a normal execution environment.

OK, so it's more tables than I first thought, but it's not all rows and columns of all catalog tables.

I don't see how your idea works because of that? Am I missing something?

Why does the number/size of the tables required make that not work?

-- Simon Riggs http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Catalog/Metadata consistency during changeset extraction from wal
On Thursday, June 21, 2012 05:05:04 PM Simon Riggs wrote: On 21 June 2012 15:53, Andres Freund and...@2ndquadrant.com wrote: ISTM we should maintain a lookup table on the target system that has the minimal required information on it. You need just about the whole catalog because the *_out procs might need to look up types, operators and such again. Unless you want to rewrite those functions you need to provide a normal execution environment. OK, so it's more tables than I first thought, but it's not all rows and columns of all catalog tables.

Sure, there are a few you probably can leave out (pg_database, pg_auth*, pg_*acl, pg_(sh)?depend, pg_tablespace, ...) but it's not many.

I don't see how your idea works because of that? Am I missing something? Why does the number/size of the tables required make that not work?

The number of tables itself isn't a fundamental problem although it would make stuff harder. The problem is that the out functions expect a normal operating environment and might e.g. do catalog lookups themselves. I don't see how we can do anything here without providing a (nearly) full catalog.

Greetings, Andres -- Andres Freund http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Catalog/Metadata consistency during changeset extraction from wal
On 21 June 2012 16:13, Andres Freund and...@2ndquadrant.com wrote: On Thursday, June 21, 2012 05:05:04 PM Simon Riggs wrote: On 21 June 2012 15:53, Andres Freund and...@2ndquadrant.com wrote: ISTM we should maintain a lookup table on the target system that has the minimal required information on it. You need just about the whole catalog because the *_out procs might need to look up types, operators and such again. Unless you want to rewrite those functions you need to provide a normal execution environment. OK, so it's more tables than I first thought, but it's not all rows and columns of all catalog tables. Sure, there are a few you probably can leave out (pg_database, pg_auth*, pg_*acl, pg_(sh)?depend, pg_tablespace, ...) but it's not many.

That's a start. Leaving out the shared catalogs makes me smile already.

I don't see how your idea works because of that? Am I missing something? Why does the number/size of the tables required make that not work? The number of tables itself isn't a fundamental problem although it would make stuff harder. The problem is that the out functions expect a normal operating environment and might e.g. do catalog lookups themselves. I don't see how we can do anything here without providing a (nearly) full catalog.

I accept that there could be pathological functions in there. We're not trying to make it work with any conceivable datatype/operator, so forcing logical replication to follow sensible rules makes sense. Are there any out functions that anybody uses that do that?

It's too much change to actually version the main catalog. Keeping a separate copy of a versioned catalog for use by replication sounds much more likely to fly.

In any case, I think we'll have to go back through the list and do more work on evaluation. When the options look like that, it's typical to have ruled out the final winner early on, but that doesn't mean it isn't in there somewhere.

-- Simon Riggs http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Catalog/Metadata consistency during changeset extraction from wal
On Thursday, June 21, 2012 05:25:41 PM Simon Riggs wrote: On 21 June 2012 16:13, Andres Freund and...@2ndquadrant.com wrote: On Thursday, June 21, 2012 05:05:04 PM Simon Riggs wrote: On 21 June 2012 15:53, Andres Freund and...@2ndquadrant.com wrote: ISTM we should maintain a lookup table on the target system that has the minimal required information on it. You need just about the whole catalog because the *_out procs might need to look up types, operators and such again. Unless you want to rewrite those functions you need to provide a normal execution environment. OK, so it's more tables than I first thought, but it's not all rows and columns of all catalog tables. Sure, there are a few you probably can leave out (pg_database, pg_auth*, pg_*acl, pg_(sh)?depend, pg_tablespace, ...) but it's not many. That's a start. Leaving out the shared catalogs makes me smile already. I don't see how your idea works because of that? Am I missing something? Why does the number/size of the tables required make that not work? The number of tables itself isn't a fundamental problem although it would make stuff harder. The problem is that the out functions expect a normal operating environment and might e.g. do catalog lookups themselves. I don't see how we can do anything here without providing a (nearly) full catalog. I accept that there could be pathological functions in there. We're not trying to make it work with any conceivable datatype/operator, so forcing logical replication to follow sensible rules makes sense. Are there any out functions that anybody uses that do that?

Loads. enum_out, record_out, array_out are examples I can think of without even looking. I am pretty sure there are more. But imo this list already shows it's prohibitive.

It's too much change to actually version the main catalog. Keeping a separate copy of a versioned catalog for use by replication sounds much more likely to fly.

I don't yet see how that should work given oids and everything are quite possibly hardcoded in those functions. You could start switching out the catalogs on a lower level but I think at that point it's getting too ugly.

In any case, I think we'll have to go back through the list and do more work on evaluation. When the options look like that, it's typical to have ruled out the final winner early on, but that doesn't mean it isn't in there somewhere.

I hope we have but I am not convinced that there is an elegant solution...

Andres -- Andres Freund http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
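Andres's point is easy to see with enum_out alone: the datum stored in a tuple is just an OID, and producing the textual label requires a catalog lookup. A self-contained illustration:

CREATE TYPE mood AS ENUM ('sad', 'ok', 'happy');
CREATE TABLE person (name text, current_mood mood);
INSERT INTO person VALUES ('Alice', 'happy');

-- The stored datum for current_mood is only the enum value's OID;
-- enum_out must consult pg_enum to turn it back into 'happy'.
SELECT enumlabel
  FROM pg_enum
 WHERE enumtypid = 'mood'::regtype;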
Re: [HACKERS] Catalog/Metadata consistency during changeset extraction from wal
On Thursday, June 21, 2012 03:56:54 PM Florian Pflug wrote: On Jun21, 2012, at 13:41 , Andres Freund wrote: 3b) Ensure that enough information in the catalog remains by fudging the xmin horizon. Then reassemble an appropriate snapshot to read the catalog as the tuple in question has seen it. The ComboCID machinery makes that quite a bit harder, I fear. If a tuple is updated multiple times by the same transaction, you cannot decide whether a tuple was visible in a certain snapshot unless you have access to the updating backend's ComboCID hash.

That's a very good point. Not sure how I forgot that. I think it might be possible to reconstruct a sensible ComboCID mapping from the WAL stream. Let me think about it for a while...

Andres -- Andres Freund http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
[HACKERS] COMMUTATOR doesn't seem to work
Maybe I am using it wrong but I get no error message when I use it. I have a type called chkpass (a version is in the additional supplied modules) and I create the equality operator like this:

CREATE OPERATOR = (
    PROCEDURE = eq,
    LEFTARG = chkpass,
    RIGHTARG = text,
    COMMUTATOR = =,
    NEGATOR = <>
);

This works:

cosmostest=# SELECT 'aaa'::chkpass = 'aaa';
 ?column?
----------
 t
(1 row)

But...

cosmostest=# SELECT 'aaa' = 'aaa'::chkpass;
ERROR:  operator is only a shell: text = chkpass
LINE 1: SELECT 'aaa' = 'aaa'::chkpass;

When I look at the operators I see why, sort of...

 public | = | chkpass | text    | boolean |
 public | = | text    | chkpass | -       |

So while it created the operator it didn't set a return type. I don't know if this is a new issue or I simply got lucky and never tried the opposite test before, but I see this in 9.0.4 and 9.1.3. Am I using the command improperly?

-- D'Arcy J.M. Cain da...@druid.net | Democracy is three wolves http://www.druid.net/darcy/ | and a sheep voting on +1 416 425 1212 (DoD#0082)(eNTP) | what's for dinner. IM: da...@vex.net -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Catalog/Metadata consistency during changeset extraction from wal
On Thursday, June 21, 2012 04:05:54 PM Florian Pflug wrote: On Jun21, 2012, at 13:41 , Andres Freund wrote: 5.) The actually good idea. Yours?

What about a mixture of (3b) and (4), which writes the data not to the WAL but to a separate logical replication log. More specifically:

There's a per-backend queue of change notifications. Whenever a non-catalog tuple is modified, we queue a TUPLE_MODIFIED record containing (xid, databaseoid, tableoid, old xmin, old ctid, new ctid). Whenever a table (or something that a table depends on) is modified we wait until all references to that table's oid have vanished from the queue, then queue a DDL record containing (xid, databaseoid, tableoid, text). Other backends cannot concurrently add further TUPLE_MODIFIED records since we already hold an exclusive lock on the table at that point.

A background process continually processes these queues. If the front of the queue is a TUPLE_MODIFIED record, it fetches the old and the new tuple based on their ctids and writes the old tuple's PK and the full new tuple to the logical replication log. Since table modifications always wait for all previously queued TUPLE_MODIFIED records referencing that table to be processed *before* altering the catalog, tuples can always be interpreted according to the current (SnapshotNow) catalog contents.

Upon transaction COMMIT and ROLLBACK, we queue COMMIT and ROLLBACK records, which are also written to the log by the background process. The background process may decide to wait until a backend commits before processing that backend's log. In that case, rolled-back transactions don't leave a trace in the logical replication log. Should a backend, however, issue a DDL statement, the background process *must* process that backend's queue immediately, since otherwise there's a deadlock.

The background process also maintains a value in shared memory which contains the oldest value in any of the queues' xid or old xmin fields. VACUUM and the like must not remove tuples whose xmin is >= that value. Hint bits *may* be set for newer tuples though, provided that the background process ignores hint bits when fetching the old and new tuples.

I think that's too complicated to fly. Getting that to recover cleanly in case of crash would mean you'd need another WAL. I think if it comes to that, going for 1) is more realistic...

Andres -- Andres Freund http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Reseting undo/redo logs
Edmon Begoli ebeg...@gmail.com writes: One of the Greenplum segments went down and it cannot recover because PANIC XX000 invalid redo/undo record in shutdown checkpoint (xlog.c:6576) I am posting this question here because most casual users of Postgres/Greenplum are telling me that the database is hosed, but I think that with pg_resetxlog and some (http://www.postgresql.org/docs/8.2/static/app-pgresetxlog.html) data loss I could at least hack the database to come back up. What I am asking for help here is to help me calculate the reset values - where to find the most recent valid one and how to *specifically* calculate the reset ones.

pg_controldata should give you useful starting points. I don't think we can offer any more help than what is on the pg_resetxlog reference page as to what to do with them. (Though you might try reading the more recent releases' versions of that page to see if anything's been clarified.)

regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Reseting undo/redo logs
Thanks. I was going down this route, so just your confirmation that this is the right path is helpful. Edmon On Thu, Jun 21, 2012 at 11:58 AM, Tom Lane t...@sss.pgh.pa.us wrote: Edmon Begoli ebeg...@gmail.com writes: One of the Greenplum segments went down and it cannot recover because PANIC XX000 invalid redo/undo record in shutdown checkpoint (xlog.c:6576) I am posting this question here because most casual users of Postgres/Greenplum are telling me that database is hosed, but I think that with pg_resetxlog and some (http://www.postgresql.org/docs/8.2/static/app-pgresetxlog.html) data loss I could at least hack database to come back up. What I am asking for help here is to help me calculate the reset values - where to find the most recent valid one and how to *specifically* calculate the reset ones. pg_controldata should give you useful starting points. I don't think we can offer any more help than what is on the pg_resetxlog reference page as to what to do with them. (Though you might try reading the more recent releases' versions of that page to see if anything's been clarified.) regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] COMMUTATOR doesn't seem to work
D'Arcy Cain da...@druid.net writes: Maybe I am using it wrong but I get no error message when I use it. I have a type called chkpass (a version is in the additional supplied modules) and I create the equality operator like this: CREATE OPERATOR = ( PROCEDURE = eq, LEFTARG = chkpass, RIGHTARG = text, COMMUTATOR = =, NEGATOR = ); Did you actually create a text = chkpass function and operator? This declaration merely promises that you will provide one eventually. The system does not have the ability to make one for you. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] COMMUTATOR doesn't seem to work
On Jun21, 2012, at 17:46 , D'Arcy Cain wrote: Maybe I am using it wrong but I get no error message when I use it. I have a type called chkpass (a version is in the additional supplied modules) and I create the equality operator like this: ... So while it created the operator it didn't set a return type. I don't know if this is a new issue or I simply got lucky and never tried the opposite test before but I see this in 9.0.4 and 9.1.3. Am I using the command improperly?

COMMUTATOR (and also NEGATOR) only inform the planner/optimizer about the relationship between operators, but you still have to create all of them manually. What you see is the placeholder for that to-be-created operator that CREATE OPERATOR fabricated. It does that because if the COMMUTATOR or NEGATOR were required to already exist, how would you ever be able to create a pair of commuting operators? If you later use CREATE OPERATOR to actually create the COMMUTATOR or NEGATOR you specified previously, it'll simply complete the previously created placeholder.

Long story short, you need another CREATE OPERATOR command for the COMMUTATOR (with the argument types swapped), and it should in turn name the original operator as its COMMUTATOR.

best regards, Florian Pflug -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
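To complete D'Arcy's example along the lines Tom and Florian describe, a second function and operator with the argument types swapped are needed; a sketch follows (the eq_tc function name and its SQL body are assumptions, not part of the chkpass module):

-- Commuted comparison, defined here in terms of the existing
-- chkpass = text operator:
CREATE FUNCTION eq_tc(text, chkpass) RETURNS boolean
    AS 'SELECT $2 = $1'
    LANGUAGE sql IMMUTABLE STRICT;

-- This CREATE OPERATOR fills in the shell left by the first one,
-- and each operator names the other as its COMMUTATOR.
CREATE OPERATOR = (
    PROCEDURE = eq_tc,
    LEFTARG   = text,
    RIGHTARG  = chkpass,
    COMMUTATOR = =,
    NEGATOR    = <>
);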
Re: [HACKERS] Release versioning inconsistency
2012/6/21 Magnus Hagander mag...@hagander.net: On Wed, Jun 20, 2012 at 1:35 PM, Dickson S. Guedes lis...@guedesoft.net wrote: 2012/6/20 Magnus Hagander mag...@hagander.net: On Wed, Jun 20, 2012 at 11:23 AM, Marti Raudsepp ma...@juffo.org wrote: On Wed, Jun 20, 2012 at 12:18 PM, Magnus Hagander mag...@hagander.net wrote: (I do believe that using the v9.2.0beta marker is *better*, because then it sorts properly. But likely not enough better to justify being inconsistent with previous versions) Good point. Maybe that's a reason to change the versioning scheme and stick with 9.2.0betaX everywhere. Including calling the final release 9.2.0 instead of simply 9.2? That might actually be a good idea. We can't really change the way we named the betas, but it's not too late to consider naming the actual release as 9.2.0... Maybe a symlink could be created just to fit the same pattern that other versions do and keep the actual links (for beta) working. That we can do - in fact, done.

It works fine here, thanks!

[]s -- Dickson S. Guedes mail/xmpp: gue...@guedesoft.net - skype: guediz http://guedesoft.net - http://www.postgresql.org.br -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Release versioning inconsistency
On tor, 2012-06-21 at 16:19 +0200, Magnus Hagander wrote: On Wed, Jun 20, 2012 at 1:35 PM, Dickson S. Guedes lis...@guedesoft.net wrote: 2012/6/20 Magnus Hagander mag...@hagander.net: On Wed, Jun 20, 2012 at 11:23 AM, Marti Raudsepp ma...@juffo.org wrote: On Wed, Jun 20, 2012 at 12:18 PM, Magnus Hagander mag...@hagander.net wrote: (I do believe that using the v9.2.0beta marker is *better*, because then it sorts properly. But likely not enough better to justify being inconsistent with previous versions) Good point. Maybe that's a reason to change the versioning scheme and stick with 9.2.0betaX everywhere. Including calling the final release 9.2.0 instead of simply 9.2? That might actually be a good idea. We can't really change the way we named the betas, but it's not too late to consider naming the actual release as 9.2.0... Maybe a symlink could be created just to fit the same pattern that other versions do and keep the actual links (for beta) working. That we can do - in fact, done.

Why not just move the whole thing and not increase the confusion? The press releases don't refer to the directory directly.

-- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Release versioning inconsistency
On tor, 2012-06-21 at 16:17 +0200, Magnus Hagander wrote: I don't recall the reason why using 9.2.0betax was actually wrong - I realize that's not the name of the version, so thereby the directory was wrong. But in what way would it be wrong to call the version that?

It's not the beta for 9.2.0, it's the beta for the 9.2 series. There is no 9.2.1betaX, after all.

Given that it would help with sorting.

How does it help? 9.2.0, 9.2.0betaX, 9.2.1, ... ???

-- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
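For what it's worth, the byte-wise ordering of the two schemes (roughly what a directory listing produces) can be checked directly:

SELECT v FROM unnest(ARRAY['9.2', '9.2.1', '9.2beta1']) AS v
ORDER BY v COLLATE "C";
-- 9.2, 9.2.1, 9.2beta1       (betas sort after every release)

SELECT v FROM unnest(ARRAY['9.2.0', '9.2.0beta1', '9.2.1']) AS v
ORDER BY v COLLATE "C";
-- 9.2.0, 9.2.0beta1, 9.2.1   (betas sort between .0 and .1)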
Re: [HACKERS] Btree or not btree? That is the question
ERROR: index pg_class_oid_index is not a btree That means you got bogus data while reading the metapage. I'm beginning to wonder about the hardware on this server ...

This happened again, and this time I went back through the logs and found that it is always the exact same query causing the issue. I also found it occurring on different servers, which rules out RAM anyway (still shared disk, so those are suspect). This query also sometimes gives errors like this:

ERROR: could not read block 3 of relation 1663/1554846571/3925298284: read only 0 of 8192 bytes

However, the final number changes: these are invariably temporary relations. The query itself is a GROUP BY over a large view and the explain plan is 107 rows, with nothing esoteric about it. Most of the tables used are fairly common ones. I'm trying to duplicate on a non-production box, without success so far, and I'm loath to run it on production as it sometimes causes multiple backends to freeze up and requires a forceful restart.

Any ideas on how to carefully debug this? There are a couple of quicksorts when I explain analyze on a non-prod system, which I am guessing is where the temp tables come from (work_mem is 24MB). I'm not sure I understand what could be causing both the 'read 0' and btree errors for the same query - bad blocks on disk for one of the underlying tables? I'll work next on checking each of the tables the view is using.

-- Greg Sabino Mullane g...@endpoint.com End Point Corporation PGP Key: 0x14964AC8
[HACKERS] pg_dump and dependencies and --section ... it's a mess
Don't know if everybody on this list has been paying attention to the pgsql-bugs thread about bug #6699. The shortest example of the problem is

create table t1 (f1 int primary key, f2 text);
create view v1 as select f1, f2 from t1 group by f1;

The view's query is legal only because of the primary key on f1, else the reference to f2 would be considered inadequately grouped. So when we create the view we mark it as dependent on that primary key (or more accurately, the view's _RETURN rule is so marked). This all works fine so far as the backend is concerned: you can't drop the PK without dropping or redefining the view. But it gives pg_dump indigestion. If you do pg_dump -Fc | pg_restore -l -v you see (relevant entries only):

5; 2615 2200 SCHEMA - public postgres                   PRE_DATA
168; 1259 41968 TABLE public t1 postgres                PRE_DATA
;    depends on: 5
1921; 2606 41975 CONSTRAINT public t1_pkey postgres     POST_DATA
;    depends on: 168 168
169; 1259 41976 VIEW public v1 postgres                 PRE_DATA
;    depends on: 1919 5
1922; 0 41968 TABLE DATA public t1 postgres             DATA
;    depends on: 168

Now, there are two not-nice things about this. First, the view is shown as depending on object 1919, which is pg_dump's internal DumpId for the view's _RETURN rule --- but that's nowhere to be seen in the dump, so the fact that it's dependent on the t1_pkey constraint is not visible in the dump. This puts parallel pg_restore at risk of doing the wrong thing. Second, because view definitions are preferentially dumped in the PRE_DATA section, the t1_pkey constraint has been hoisted up to come before the table data. That means that in an ordinary serial restore, the index will be created before the table's data is loaded, which is undesirable for efficiency reasons.

It gets worse though. I've labeled the above archive items with their SECTION codes (which for some reason pg_restore -l -v doesn't print) so that you can see that we've got a POST_DATA item that must come before a PRE_DATA item. This wreaks havoc with the entire concept of splitting dump files into three sections. When I first realized that, I was thinking that we would have to revert the --section flag we added to pg_dump/pg_restore in 9.2, for lack of any way to guarantee that the items can actually be restored if they are split according to sections.

I think that we can fix it though. There are provisions in pg_dump already for splitting a view apart from its _RETURN rule --- and if the rule comes out as a separate object, it's POST_DATA. So at least in principle, we could fix things by dumping this scenario this way:

SCHEMA public        PRE_DATA
TABLE t1             PRE_DATA
TABLE v1             PRE_DATA   (at this point it's a table not a view)
TABLE DATA t1        DATA
CONSTRAINT t1_pkey   POST_DATA
RULE for v1          POST_DATA  (adding the rule turns v1 into a view)

The problem is how to persuade pg_dump to do that; as the code stands, it will only break a view apart from its rule if that's necessary to break a circular dependency loop, and there is none in this example.

Another point worth thinking about is that if --section is to be trusted at all, we need some way of guaranteeing that the dependency-sorting code won't happily create other similar cases. There is nothing in there right now that would think anything is wrong with an ordering that breaks the section division. I believe the right fix for both of these issues is to add knowledge of the section concept to the topological sort logic, so that an ordering that puts POST_DATA before DATA or PRE_DATA after DATA is considered to be a dependency-ordering violation.
One way to do that is to add dummy fencepost objects to the sort, representing the start and end of the DATA section. However, these objects would need explicit dependency links to every other DumpableObject, so that doesn't sound very good from a performance standpoint. What I'm going to go look at is whether we can mark DumpableObjects with their SECTION codes at creation time (rather than adding that information at ArchiveEntry() time) and then have the topo sort logic take that marking into account in addition to the explicit dependency links.

Assuming we can make that work, how far should it be back-patched? The --section issue is new in 9.2, but this would also take care of the restore problems for views with constraint dependencies, which are allowed as of 9.1, so I'm thinking we'd better put it into 9.1.

The other problem is the bogus dependency IDs that pg_dump emits by virtue of not caring whether the dependency links to an object it's actually dumped. I posted a patch for that in the pgsql-bugs thread but pointed out that it would not work before 9.2 without additional surgery. If we fix the view vs constraint issue by adding section knowledge to the sort, I think we can maybe get away with not fixing the dependency IDs in back branches.

regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
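The dependency Tom describes is easy to reproduce from psql (error texts abbreviated):

CREATE TABLE t1 (f1 int, f2 text);
CREATE VIEW v1 AS SELECT f1, f2 FROM t1 GROUP BY f1;
-- ERROR:  column "t1.f2" must appear in the GROUP BY clause
--         or be used in an aggregate function

ALTER TABLE t1 ADD PRIMARY KEY (f1);
CREATE VIEW v1 AS SELECT f1, f2 FROM t1 GROUP BY f1;  -- now legal (9.1+)

ALTER TABLE t1 DROP CONSTRAINT t1_pkey;
-- ERROR:  cannot drop constraint t1_pkey on table t1
--         because other objects depend on it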
Re: [v9.3] Extra Daemons (Re: [HACKERS] elegant and effective way for running jobs inside a database)
On Sun, Jun 10, 2012 at 4:15 AM, Kohei KaiGai kai...@kaigai.gr.jp wrote: 2012/6/8 Simon Riggs si...@2ndquadrant.com: I have a prototype that has some of these characteristics, so I see our work as complementary. At present, I don't think this patch would be committable in CF1, but I'd like to make faster progress with it than that. Do you want to work on this more, or would you like me to merge our prototypes into a more likely candidate? I'm not in favor of duplicating similar efforts. If available, could you merge some ideas from my patch into your prototypes?

So, are we waiting for a new patch? Is it coming from Simon or Kohei?

-- Jaime Casanova www.2ndQuadrant.com Professional PostgreSQL: Soporte 24x7 y capacitación -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Btree or not btree? That is the question
Greg Sabino Mullane g...@endpoint.com writes: ERROR: index pg_class_oid_index is not a btree That means you got bogus data while reading the metapage. I'm beginning to wonder about the hardware on this server ... This happened again, and this time I went back through the logs and found that it is always the exact same query causing the issue. I also found it occuring on different servers, which rules out RAM anyway (still shared disk, so those are suspect). This query also sometimes gives errors like this: ERROR: could not read block 3 of relation 1663/1554846571/3925298284: read only 0 of 8192 bytes However, the final number changes: these are invariably temporary relations. Oh really ... okay, maybe it is a software problem then. Any ideas on how to carefully debug this? There are a couple of quicksorts when I explain analyze on a non-prod system, which I am guessing where the temp tables come from (work_mem is 24MB). No, that error message is complaining about an attempt to read an actual, named, relation (could be temp, no way to be sure from this info). A sort might generate temp files but those don't have that kind of name. I'm not sure I understand what could be causing both the 'read 0' and btree errors for the same query - bad blocks on disk for one of the underlying tables? The could not read thing looks more like an attempt to fetch an invalid TID; you could get such an error for instance if you had an index that referenced a tuple in block 3, but the table on disk isn't that long. So one possible theory for what's happening here is that once in a while we get confused about which shared buffer holds which disk block, and either find the wrong block entirely when looking for pg_class_oid_index (the first case) or grab a page of the wrong index in the second case. This theory would be more plausible if you're wrong about the second-case tables being temp, though, because if they are temp then their indexes would be kept in local buffers not shared buffers, making it much harder to believe in a single bug causing both symptoms. One possible mechanism for confusion of that sort would be if the spinlock code wasn't quite right, or the compiler was incorrectly moving loads/stores into or out of locked sections. So it might be time to ask exactly what kind of hardware this is, which compiler PG was built with, etc. On the other hand, if the issue were of that sort then it ought to affect all buffers more or less at random; so if you're consistently seeing exactly these symptoms (in particular, if it's always pg_class_oid_index that's complained of), then I'm not sure I believe this theory either. Which PG version again? Are you in the habit of doing VACUUM FULLs on system catalogs, and if so do these glitches correlate at all with such activities? regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
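One way to start testing Tom's theory is to map the numbers in the error message back to objects; the path in the message is tablespace/database/relfilenode (1663 being the default tablespace), so, as a sketch:

SELECT datname FROM pg_database WHERE oid = 1554846571;

-- Does anything still own that file node?  No row back suggests the
-- relation is already gone, e.g. a dropped temp table.
SELECT oid::regclass AS rel, relkind
  FROM pg_class
 WHERE relfilenode = 3925298284;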
Re: [HACKERS] Incorrect behaviour when using a GiST index on points
On Wed, Feb 22, 2012 at 5:56 PM, Alexander Korotkov aekorot...@gmail.com wrote: Attached patch fixes GiST behaviour without altering operators behaviour.

I think we definitely should apply this patch before the 9.2 release, because it is a bug fix. Otherwise people will continue to produce incorrect GiST indexes with in-core geometrical opclasses until 9.3. The patch is very simple and only changes a few lines of code. Any thoughts?

-- With best regards, Alexander Korotkov.

gistproc_fix.patch Description: Binary data

-- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] pl/perl and utf-8 in sql_ascii databases
Excerpts from Kyotaro HORIGUCHI's message of jue jun 21 08:02:58 -0400 2012: Ouch! # UtfToLocal() seems to have a bug that always reports the illegal # encoding as UTF8 regardless of the real encoding. But # plperl_lc_*.(sql|out) increases if the bug is fixed. This is not a bug. The error message "invalid byte sequence for encoding UTF8" meant an invalid INPUT byte sequence in encoding UTF8, not the encoding for output. So the regtest patch covers all of the possible patterns - UTF8 and SQL_ASCII.

Right, that confused me too for a while.

-- Álvaro Herrera alvhe...@commandprompt.com The PostgreSQL Company - Command Prompt, Inc. PostgreSQL Replication, Consulting, Custom Development, 24x7 support -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] pl/perl and utf-8 in sql_ascii databases
Excerpts from Kyotaro HORIGUCHI's message of jue jun 21 07:22:43 -0400 2012:

Hello, Seems like we missed the fact that we still did SvUTF8_on() in sv2cstr and SvPVUTF8() when turning a perl string into a cstring. Right. I spent a bit longer time catching up on pl/perl and now understand what the problem is...

Hi, Thanks for the review.

So I played a bit with this patch, and touched it a bit mainly just to add some more comments; and while at it I noticed that some of the functions in Util.xs might leak some memory, so I made an attempt to plug them, as in the attached patch (which supersedes yours). OK. Is it OK to look into the newer patch, including the fix for the leaks, first?

Yeah, I'm going to have to backpatch the whole of it so having a full review is good. Thanks.

-- The others

Variable naming in util_quote_*() seems a bit confusing,

text *arg = sv2text(sv);
text *ret = DatumGetTextP(..., PointerGetDatum(arg)));
char *str;
pfree(arg);
str = text_to_cstring(ret);
RETVAL = cstr2sv(str);
pfree(str);

Renaming ret to quoted and str to ret as in the attached patch might make it more readable.

I think I'm going to refrain from this because it will be more painful to backpatch.

Now, with my version of the patch applied and using a SQL_ASCII database to test the problem in the original report, I notice that we now have a regression failure: [snip] I'm not really sure what to do here -- maybe a second expected file for that test is a good enough answer? Or should I just take the test out? Opinions please. The attached ugly patch does it. It seems we should put NO_LOCALE=1 on the 'make check' command line for encodings not compatible with the environment locale, although it seems to work.

The idea of separating the test into its own file has its merit; but instead of having two different tests, I'm going to have a single test and two expected files. That seems simpler than messing around in the makefile.

-- Álvaro Herrera alvhe...@commandprompt.com The PostgreSQL Company - Command Prompt, Inc. PostgreSQL Replication, Consulting, Custom Development, 24x7 support -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Btree or not btree? That is the question
I dug through the logs and found some other occurrences of the could not read block errors. Some on dirt-simple SELECT queries. Nothing else has generated the btree error yet. About 35 found in the last month.

This theory would be more plausible if you're wrong about the second-case tables being temp, though, because if they are temp then their indexes would be kept in local buffers not shared buffers, making it much harder to believe in a single bug causing both symptoms.

I grepped the last month of logs and found about 20 instances of that error: none of the relfilenodes given shows up in pg_class, even for that dirt-simple SELECT.

One possible mechanism for confusion of that sort would be if the spinlock code wasn't quite right, or the compiler was incorrectly moving loads/stores into or out of locked sections. So it might be time to ask exactly what kind of hardware this is, which compiler PG was built with, etc.

Quad core AMD Opteron. RHEL. Compiled with gcc with all the options (basically the standard compilation e.g. --build=x86_64-redhat-linux-gnu). I can give you more details offlist if it will help.

On the other hand, if the issue were of that sort then it ought to affect all buffers more or less at random; so if you're consistently seeing exactly these symptoms (in particular, if it's always pg_class_oid_index that's complained of), then I'm not sure I believe this theory either.

I've never seen any other index for the btree error, but it has only happened a grand total of 3 times ever. The other error appears to be fairly random, except that the one particular query that gives the btree error always seems to give one version or the other.

Which PG version again? Are you in the habit of doing VACUUM FULLs on system catalogs, and if so do these glitches correlate at all with such activities?

Heh. 8.3.18. Yes, very heavy vac fulls (and reindexes) of the system catalogs. Cron-driven, and depends on the time of day and if any DDL is running (if so, it does not run), but probably on average pg_class is vacfulled and reindexed twice an hour during the times this happens (which is, during normal business hours). There is a lot of churn in pg_class, pg_attribute, and pg_depend in particular from all the temp stuff being created and torn down all day, as well as some Bucardo pg_class updating.

-- Greg Sabino Mullane g...@endpoint.com End Point Corporation PGP Key: 0x14964AC8
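For reference, the cron-driven maintenance Greg describes amounts to something like the following, run against the busiest catalogs (pre-9.0 VACUUM FULL semantics; the exact list of catalogs is an assumption):

VACUUM FULL pg_catalog.pg_class;
REINDEX TABLE pg_catalog.pg_class;
VACUUM FULL pg_catalog.pg_attribute;
REINDEX TABLE pg_catalog.pg_attribute;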
Re: [HACKERS] pl/perl and utf-8 in sql_ascii databases
Excerpts from Alex Hunsaker's message of jue jun 21 10:27:41 -0400 2012: On Wed, Jun 20, 2012 at 1:15 PM, Alvaro Herrera alvhe...@commandprompt.com wrote: Excerpts from Alex Hunsaker's message of vie feb 10 16:53:05 -0300 2012: Seems like we missed the fact that we still did SvUTF8_on() in sv2cstr and SvPVUTF8() when turning a perl string into a cstring. Right. So I played a bit with this patch, and touched it a bit mainly just to add some more comments; and while at it I noticed that some of the functions in Util.xs might leak some memory, so I made an attempt to plug them, as in the attached patch (which supersedes yours). I think most of these leaks go back to 9.0. Dunno if it's worth backpatching them...

Well, nobody has ever complained about them so maybe they're just not worth the effort.

to test the problem in the original report, I notice that we now have a regression failure: I'm not really sure what to do here -- maybe a second expected file for that test is a good enough answer? Or should I just take the test out? Opinions please. I think we have broken that check twice so it seems like it would be nice to keep it. But I don't feel *too* strongly about it.

Hmm, if we've broken it then it's probably best to keep it.

The comments and cleanups all look good to me.

Thanks for the review.

-- Álvaro Herrera alvhe...@commandprompt.com The PostgreSQL Company - Command Prompt, Inc. PostgreSQL Replication, Consulting, Custom Development, 24x7 support -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] SP-GiST for ranges based on 2d-mapping and quad-tree
On Thu, Jun 21, 2012 at 11:12 AM, Heikki Linnakangas heikki.linnakan...@enterprisedb.com wrote: On 14.06.2012 01:56, Alexander Korotkov wrote: Hackers, attached patch implements quad-tree on ranges. Some performance results in comparison with current GiST indexing.

@@ -788,7 +774,7 @@ range_super_union(TypeCacheEntry *typcache, RangeType * r1, RangeType * r2)
 * part of the relcache entry for the index, typically) this essentially
 * eliminates lookup overhead during operations on a GiST range index.
 */
-static Datum
+Datum
 TrickFunctionCall2(PGFunction proc, FmgrInfo *flinfo, Datum arg1, Datum arg2)
 {
     FunctionCallInfoData fcinfo;

I don't think we want to expose TrickFunctionCall2(). Not with that name, anyway. Perhaps we should refactor the functions called this way, range_adjacent, range_overlaps etc., to have internal counterparts that can be called without FunctionCall(). Like:

***
*** 692,697 ----
--- 692,708 ----
  {
      RangeType *r1 = PG_GETARG_RANGE(0);
      RangeType *r2 = PG_GETARG_RANGE(1);
+
+     typcache = range_get_typcache(fcinfo, RangeTypeGetOid(r1));
+
+     PG_RETURN_BOOL(range_adjacent_internal(r1, r2, typcache));
+ }
+
+ bool
+ range_adjacent_internal(RangeType *r1, RangeType *r2, TypeCacheEntry *typcache)
+ {
      RangeBound lower1, lower2;

The gist and SP-gist consistent functions could call the internal function directly.

I like the idea of replacing TrickFunctionCall2 with an internal function. Do you think I should post a separate patch for the existing GiST code?

-- With best regards, Alexander Korotkov.
Re: [HACKERS] SP-GiST for ranges based on 2d-mapping and quad-tree
On Thu, Jun 14, 2012 at 2:56 AM, Alexander Korotkov aekorot...@gmail.com wrote: attached patch implements quad-tree on ranges. Some performance results in comparison with current GiST indexing. Index creation is slightly slower. Probably, it needs some investigation. Search queries on SP-GiST use many more pages. However this comparison can be not really correct, because SP-GiST can pin the same buffer several times during one scan. In CPU terms, search queries on SP-GiST seem to be slightly faster. The dramatic difference in the "column <@ const" query is thanks to 2d-mapping.

Patch with another SP-GiST implementation for ranges is attached. It uses a k-d tree instead of a quad-tree. I would like to leave only one implementation of SP-GiST for ranges. I'm going to do as comprehensive testing as I can for it.

-- With best regards, Alexander Korotkov.

range_spgist_kd-0.1.patch Description: Binary data

-- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
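Usage-wise the quad-tree and k-d tree opclasses would look identical; a sketch of what either patch enables, using 9.2 range syntax:

CREATE TABLE reservations (during int4range);
CREATE INDEX reservations_during_idx
    ON reservations USING spgist (during);

-- Queries the opclass is meant to accelerate:
SELECT * FROM reservations WHERE during && int4range(100, 200);  -- overlaps
SELECT * FROM reservations WHERE during <@ int4range(0, 1000);   -- contained in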
Re: [HACKERS] SP-GiST for ranges based on 2d-mapping and quad-tree
Alexander Korotkov aekorot...@gmail.com writes: On Thu, Jun 21, 2012 at 11:12 AM, Heikki Linnakangas heikki.linnakan...@enterprisedb.com wrote: I don't think we want to expose TrickFunctionCall2(). Not with that name, anyway. Perhaps we should refactor the functions called this way, range_adjacent, range_overlaps etc., to have internal counterparts that can be called without FunctionCall(). Like: I like idea of replacing TrickFunctionCall2 with internal function. Do you think I should post a separate patch for existing GiST code? +1 ... that was a pretty grotty hack, so let's get rid of it if we can. It's still going to require some documentation though I think. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [v9.3] Extra Daemons (Re: [HACKERS] elegant and effective way for running jobs inside a database)
On 21 June 2012 19:13, Jaime Casanova ja...@2ndquadrant.com wrote: On Sun, Jun 10, 2012 at 4:15 AM, Kohei KaiGai kai...@kaigai.gr.jp wrote: 2012/6/8 Simon Riggs si...@2ndquadrant.com: I have a prototype that has some of these characteristics, so I see our work as complementary. At present, I don't think this patch would be committable in CF1, but I'd like to make faster progress with it than that. Do you want to work on this more, or would you like me to merge our prototypes into a more likely candidate? I'm not in favor of duplicating similar efforts. If available, could you merge some ideas from my patch into your prototypes? So, are we waiting for a new patch? Is it coming from Simon or Kohei?

There is an updated patch coming from me. I thought I would focus on review of other work first.

-- Simon Riggs http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] pg_dump and dependencies and --section ... it's a mess
On 06/21/2012 02:13 PM, Tom Lane wrote: Don't know if everybody on this list has been paying attention to the pgsql-bugs thread about bug #6699. [...] Assuming we can make that work, how far should it be back-patched? The --section issue is new in 9.2, but this would also take care of the restore problems for views with constraint dependencies, which are allowed as of 9.1, so I'm thinking we'd better put it into 9.1. [...]

If I'm understanding you correctly, fixing the bogus dependency thing is more an insurance policy than fixing a case (other than the constraint dependency) that is known to be broken.

(There's another bug to do with parallel pg_restore and clustering that Andrew Hammond raised back in January, that I want to fix when I get some time.)

cheers andrew

-- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] [PATCH 04/16] Add embedded list interface (header only)
On 20 June 2012 14:38, Andres Freund and...@2ndquadrant.com wrote: It incurs a rather high performance overhead due to added memory allocations and added pointer indirections. That's fine for most of the current users of the List interface, but certainly not for all. In other places you cannot even have memory allocations because the list lives in shared memory.

Yes, in general lists interact horribly with the memory hierarchy. I think I pointed out to you once a rant of mine on -hackers a while back in which I made various points about just how badly they do these days. On modern architectures, with many layers of cache, the cost of the linear search to get an insertion point is very large. So this:

/*
 * removes a node from a list
 * Attention: O(n)
 */
static inline void
ilist_s_remove(ilist_s_head *head, ilist_s_node *node)

is actually even worse than you might suspect.

E.g. in the ApplyCache, where I use the submitted ilist.h stuff, when reconstructing transactions you add to a potentially really long linked list of individual changes for every interesting wal record. Before I prevented memory allocations in that path it took about 12-14% of the time when applying changes in the same backend. Afterwards it wasn't visible in the profile anymore.

I find that very easy to believe. Several of the pieces of code I pointed out in a previous email use an open-coded list implementation exactly to prevent those problems.

Interesting. So, it seems like this list implementation could be described as a minimal embeddable list implementation that requires the user to do all the memory allocation, and offers a doubly-linked list too. Not an unreasonable idea. I do think that the constraints you have are not well served by any existing implementation, including List and Dllist. Are you planning on just overhauling the Dllist interface in your next iteration?

As to the question of inlining, the C99 standard (where inlining is standardised by ANSI, but inspired by earlier extensions to C), unlike the C89 standard, seems to be well respected by vendors as far as it goes, with some compilers going to pains to implement it correctly, like ICC and Clang. We can't really switch to C99, because MSVC doesn't support it, and it is patently obvious that Microsoft have zero interest in it. Funnily enough, if anyone takes C89 as a standard seriously still, it's Microsoft, if only due to indifference to later standards. This hack exists purely for the benefit of their strict interpretation of C89, I think:

/* Define to `__inline__' or `__inline' if that's what the C compiler
   calls it, or to nothing if 'inline' is not supported under any name. */
#ifndef __cplusplus
/* #undef inline */
#endif

If anyone today is using PostgreSQL binaries in production that were built with a compiler that does not USE_INLINE, I would be very surprised indeed. The idea that anyone intends to build 9.3 with a compiler that doesn't support inline functions is very difficult to believe. Other C open source projects like Linux freely use inline functions. Now, granted, it was only possible to build the kernel for a long time using gcc, but inline had nothing to do with the problem of building the kernel. My point is that broadly it makes more practical sense to talk about GNU C as a standard than C89, and GNU C supports inline functions (C99 is a different matter, but that isn't going to happen in the foreseeable future). Don't believe me?
This is from our configure script:

# Check if it's Intel's compiler, which (usually) pretends to be gcc,
# but has idiosyncrasies of its own.  We assume icc will define
# __INTEL_COMPILER regardless of CFLAGS.

All of the less popular compilers we support we support precisely because they pretend to be GCC, with the sole exception, as always, of the Microsoft product, in this case MSVC. So my position is that I'm in broad agreement that we should freely allow the use of inline without macro hacks, since we generally resist using macro hacks if that makes code ugly, which USE_INLINE certainly does, and for a benefit that is indistinguishable from zero, at least to me.

Why are you using the stdlib's assert.h? Why have you used the NDEBUG macro rather than USE_ASSERT_CHECKING? This might make sense if the header was intended to live in port, but it isn't, right?

Why have you done this:

#ifdef __GNUC__
#define unused_attr __attribute__((unused))
#else
#define unused_attr
#endif

and then gone on to use this unused_attr macro all over the place? Firstly, that isn't going to suppress the warnings on many platforms that we support, and we do make an effort to build warning-free on at least 3 compilers these days - GCC, Clang and MSVC. Secondly, compilers give these warnings because it doesn't make any sense to have an unused parameter - so why have you used one? At the very least, if you require this exact interface, use compatibility macros. I can't imagine why
Re: [HACKERS] pg_dump and dependencies and --section ... it's a mess
Andrew Dunstan and...@dunslane.net writes: If I'm understanding you correctly, fixing the bogus dependency thing is more an insurance policy than fixing a case (other than the constraint dependency) that is known to be broken. Right. That's the only *known* broken case, and it does seem like we'd have heard by now about others. Also, what I have in mind will cause at least HEAD, and however far we back-patch it, to actively complain if it runs into a case where the sections can't be separated, rather than silently outputting items in a funny order as now. So if there are any more cases lurking I think we'll hear about them quickly, and then we can evaluate whether further backpatching is required. (There's another bug to do with parallel pg_restore and clustering that Andrew Hammond raised back in January, that I want to fix when I get some time.) Hm, I guess I've forgotten that one? regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] pg_dump and dependencies and --section ... it's a mess
On 06/21/2012 06:25 PM, Tom Lane wrote: Andrew Dunstan and...@dunslane.net writes: (There's another bug to do with parallel pg_restore and clustering that Andrew Hammond raised back in January, that I want to fix when I get some time.) Hm, I guess I've forgotten that one?

See http://archives.postgresql.org/pgsql-hackers/2012-01/msg00561.php

cheers andrew -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] pg_dump and dependencies and --section ... it's a mess
Andrew Dunstan and...@dunslane.net writes: On 06/21/2012 06:25 PM, Tom Lane wrote: Hm, I guess I've forgotten that one? See http://archives.postgresql.org/pgsql-hackers/2012-01/msg00561.php I didn't understand that then, and I still don't. The ALTER TABLE CLUSTER might need exclusive lock, but it's not going to hold the lock long enough to be an issue. I could see that there's a problem with identify_locking_dependencies believing that two CONSTRAINT items conflict (do they really?) but not convinced the CLUSTER aspect has anything to do with it. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] [PATCH 04/16] Add embedded list interface (header only)
Peter Geoghegan pe...@2ndquadrant.com writes: All of the less popular compilers we support we support precisely because they pretend to be GCC, with the sole exception, as always, of the Microsoft product, in this case MSVC. This is nonsense. There are at least three buildfarm machines running compilers that do not pretend to be gcc (at least, configure recognizes them as not gcc) and are not MSVC either. We ought to have more IMO, because software monocultures are dangerous. Of those three, two pass the quiet inline test and one --- the newest of the three if I guess correctly --- does not. So it is not the case that !USE_INLINE is dead code, even if you adopt the position that we don't care about any compiler not represented in the buildfarm. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] [PATCH 04/16] Add embedded list interface (header only)
On 22 June 2012 01:04, Tom Lane t...@sss.pgh.pa.us wrote: This is nonsense. There are at least three buildfarm machines running compilers that do not pretend to be gcc (at least, configure recognizes them as not gcc) and are not MSVC either. So those three don't have medium to high degrees of compatibility with GCC? We ought to have more IMO, because software monocultures are dangerous. Of those three, two pass the quiet inline test and one --- the newest of the three if I guess correctly --- does not. So it is not the case that !USE_INLINE is dead code, even if you adopt the position that we don't care about any compiler not represented in the buildfarm. I note that you said that it doesn't pass the quiet inline test, and not that it doesn't support inline functions. -- Peter Geoghegan http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training and Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
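For readers following the USE_INLINE dispute: the symbol guards a fallback of roughly this shape, sketched here with an illustrative ilist_is_empty(); the real headers and configure machinery differ in detail. Compilers that pass the quiet-inline configure test get the definition in the header, inlined at each call site; the rest get a plain extern declaration, with the body compiled once, out of line, in some .c file:

    #include <stdbool.h>

    typedef struct ilist_head
    {
        struct ilist_head *next;    /* circular: empty list points at itself */
    } ilist_head;

    #ifdef USE_INLINE
    /* Quiet-inline compilers: define in the header. */
    static inline bool
    ilist_is_empty(const ilist_head *head)
    {
        return head->next == head;
    }
    #else
    /* Everyone else: declare here, define once in a .c file. */
    extern bool ilist_is_empty(const ilist_head *head);
    #endif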
Re: [HACKERS] pg_dump and dependencies and --section ... it's a mess
On 06/21/2012 07:43 PM, Tom Lane wrote: Andrew Dunstan and...@dunslane.net writes: On 06/21/2012 06:25 PM, Tom Lane wrote: Hm, I guess I've forgotten that one? See http://archives.postgresql.org/pgsql-hackers/2012-01/msg00561.php I didn't understand that then, and I still don't. The ALTER TABLE CLUSTER might need exclusive lock, but it's not going to hold the lock long enough to be an issue. I could see that there's a problem with identify_locking_dependencies believing that two CONSTRAINT items conflict (do they really?) but not convinced the CLUSTER aspect has anything to do with it. If something else holds a lock on the table (e.g. another CREATE INDEX), the ALTER TABLE will block until it's done, waiting for an ACCESS EXCLUSIVE lock. The whole method of operation of parallel restore is that we are not supposed to start items that might be blocked by currently running operations. cheers andrew -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] pg_dump and dependencies and --section ... it's a mess
Andrew Dunstan and...@dunslane.net writes: On 06/21/2012 07:43 PM, Tom Lane wrote: I didn't understand that then, and I still don't. If something else holds a lock on the table (e.g. another CREATE INDEX) the ALTER TABLE will block until it's done, waiting for an ACCESS EXCLUSIVE lock. The whole method of operation of parallel restore is that we are not supposed to start items that might be blocked by currently running operations. Oh, I see --- Andrew Hammond's complaint involves *three* index creations on the same table (one running; a second one completed except for the ALTER TABLE CLUSTER step attached to it, which is blocked behind the running index creation; and a third blocked behind the ALTER TABLE CLUSTER). I had misread it to imply that only two indexes were enough to manifest the problem. Yeah, we should break out the ALTER TABLE CLUSTER step as a separate TOC item to fix this. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
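To make the scheduling constraint concrete, here is a rough C sketch; the TocEntry layout and the helper are hypothetical simplifications, not pg_restore's actual structures. The point of breaking ALTER TABLE CLUSTER out into its own TOC item is that its exclusive-lock requirement then shows up in a check of this kind, so the dispatcher defers the item rather than letting a worker block inside the server:

    #include <stdbool.h>

    /* Hypothetical, simplified TOC entry: only what the check needs. */
    typedef struct TocEntry
    {
        int         dumpId;
        int         nLockDeps;  /* tables this item must lock exclusively */
        const int  *lockDeps;   /* dump IDs of those tables */
    } TocEntry;

    /*
     * Parallel restore's rule of operation: never dispatch an item whose
     * exclusive-lock requirements overlap those of any running worker.
     */
    static bool
    can_start_item(const TocEntry *candidate,
                   const TocEntry *const *running, int nRunning)
    {
        for (int i = 0; i < nRunning; i++)
            for (int j = 0; j < candidate->nLockDeps; j++)
                for (int k = 0; k < running[i]->nLockDeps; k++)
                    if (candidate->lockDeps[j] == running[i]->lockDeps[k])
                        return false;   /* potential conflict: defer */
        return true;
    }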
Re: [HACKERS] [PATCH] Allow breaking out of hung connection attempts
(2012/05/01 0:30), Ryan Kelly wrote: On Mon, Apr 30, 2012 at 09:02:33AM -0400, Alvaro Herrera wrote: Well, do *you* want it? Of course. That way I can stop patching my psql and go back to using the one that came with my release :) Here is the result of my review of the v4 patch.

Submission
- The patch is in context diff format.
- It needs a rebase for HEAD, but the patch command takes care of that automatically.
- Make finishes cleanly, and all regression tests pass.

Functionality
After applying this patch, I could cancel a connection attempt at start-up by pressing Ctrl+C on the terminal just after invoking psql with an unused IP address. Without this patch, such an attempt ends up with an error such as "No route to host" after an uninterruptible few seconds (the duration until the error probably depends on the environment). A connection attempt made with the \connect command could also be canceled by pressing Ctrl+C at the psql prompt. In addition, I tried setting PGCONNECT_TIMEOUT to 0 (infinite), but psql gave up after a few seconds, for both start-up and re-connect. Is this intentional behavior?

Detail of changes
Basically this patch consists of three changes: 1) defer setup of the cancel handler until the start-up connection has been established; 2) a new libpq API, PQconnectTimeout, which returns the connect_timeout value of the current session; 3) use of the asynchronous connection API in the \connect psql command, which requires the API added by 2). These seem reasonable for canceling a connection attempt at start-up and at \connect, but, as Ryan says, the code added to do_connect might need more simplification. I have some random comments:
- Checking the status by calling PQstatus just after PQconnectStartParams is necessary.
- Copying only the select() part of pqSocketPoll seems insufficient; should we use poll(2) where it is supported?
- Don't we need to clear the error message stored in PGconn after PQconnectPoll returns an OK status, like connectDBComplete does?

Regards, -- Shigeru HANADA -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
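As a reference point for the asynchronous-connection approach under review, here is a self-contained sketch of an interruptible connect loop built on the documented libpq polling API. It uses PQconnectStart with a conninfo string for brevity where the patch uses PQconnectStartParams; the one-second select() timeout used to notice Ctrl+C, and the host string, are illustrative choices, and psql's actual do_connect code differs:

    #include <signal.h>
    #include <stdio.h>
    #include <sys/select.h>
    #include <libpq-fe.h>

    static volatile sig_atomic_t cancel_pressed = 0;

    static void
    handle_sigint(int signo)
    {
        (void) signo;
        cancel_pressed = 1;
    }

    /*
     * Drive the connection with PQconnectPoll() so control returns to us
     * between network waits; Ctrl+C then abandons the attempt promptly
     * instead of blocking inside connect().
     */
    static PGconn *
    interruptible_connect(const char *conninfo)
    {
        PGconn     *conn = PQconnectStart(conninfo);
        PostgresPollingStatusType status = PGRES_POLLING_WRITING;

        if (conn == NULL)
            return NULL;
        if (PQstatus(conn) == CONNECTION_BAD)
        {
            PQfinish(conn);
            return NULL;
        }

        while (status != PGRES_POLLING_OK)
        {
            int         sock = PQsocket(conn);
            fd_set      rmask, wmask;
            struct timeval timeout = {1, 0};    /* wake up to check Ctrl+C */

            FD_ZERO(&rmask);
            FD_ZERO(&wmask);
            if (status == PGRES_POLLING_READING)
                FD_SET(sock, &rmask);
            else
                FD_SET(sock, &wmask);
            /* Simplification: on timeout we just poll again. */
            select(sock + 1, &rmask, &wmask, NULL, &timeout);

            if (cancel_pressed)
            {
                PQfinish(conn);
                return NULL;        /* user gave up */
            }

            status = PQconnectPoll(conn);
            if (status == PGRES_POLLING_FAILED)
            {
                PQfinish(conn);
                return NULL;
            }
        }
        return conn;
    }

    int
    main(void)
    {
        PGconn     *conn;

        signal(SIGINT, handle_sigint);
        conn = interruptible_connect("host=192.0.2.1 dbname=test");
        if (conn == NULL)
        {
            fprintf(stderr, "connection attempt cancelled or failed\n");
            return 1;
        }
        /* ... use the connection ... */
        PQfinish(conn);
        return 0;
    }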
Re: [HACKERS] COMMUTATOR doesn't seem to work
On 12-06-21 12:18 PM, Tom Lane wrote: Did you actually create a text = chkpass function and operator? This declaration merely promises that you will provide one eventually. The system does not have the ability to make one for you. I guess I am missing the point of COMMUTATOR then. The docs say "When you are defining a self-commutative operator, you just do it." It seems you need to do more than just do it. As far as I can tell I will need to add another C function and another CREATE OPERATOR. Then I can create an operator that takes args (text, chkpass) instead of (chkpass, text). What is the COMMUTATOR for then? Is it just a hint to the planner? -- D'Arcy J.M. Cain da...@druid.net | Democracy is three wolves http://www.druid.net/darcy/ | and a sheep voting on +1 416 425 1212 (DoD#0082)(eNTP) | what's for dinner. IM: da...@vex.net -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Skip checkpoint on promoting from streaming replication
Hello, Is it guaranteed that all the files (e.g., the latest timeline history file) required for such crash recovery exist in pg_xlog? If yes, your approach might work well. Regarding promotion in particular, the files required are the history file of the latest timeline, the WAL file containing the redo location of the latest restartpoint, and all WAL files after that one, each of the appropriate timeline. In the current (9.2/9.3dev) implementation, as far as I know, archive recovery and streaming replication create the regular WAL files required during the recovery sequence in the slave's pg_xlog directory, and only a restartpoint removes those older than the one on which the restartpoint takes place. If so, all the required files mentioned above should be in the pg_xlog directory. Is there something I've forgotten?

However, it would be more robust if we could check that all the required files are available at promotion. I can see two approaches that might accomplish that.

1. Record the ID of any WAL segment that is read but is not present in pg_xlog as a regular WAL file. For example, if we modify archive recovery to keep working WAL files out of pg_xlog, or to give them a special name that crash recovery cannot refer to afterward, record the ID of the segment. The shutdown checkpoint on promotion or at end of recovery cannot be skipped if this recorded segment ID is equal to or larger than the redo point of the latest checkpoint. This approach of course gives less chance to skip the shutdown checkpoint than forcing all required files to be copied into pg_xlog, but it still seems effective for the most common cases, say promoting long enough after WAL streaming started that a restartpoint has taken place on a WAL file in pg_xlog. I hope this is promising. A temporary WAL file for streaming? That seems to me to make the shutdown checkpoint mandatory, since no WAL file from before promotion would be accessible at that moment. On the other hand, somehow preserving the WAL after the latest restartpoint seems to make no significant difference in disk consumption compared with the current way.

2. Check for all required WAL files at promotion or end of recovery. We could check the existence of all required files at promotion by scanning in a manner similar to recovery. But this requires adding code similar to what already exists, or the labor of weaving a new function into the existing code. Furthermore, it seems likely to take a certain amount of time at promotion (or end of recovery). The discussion about temporary WAL files is the same as for 1.

regards, -- Kyotaro Horiguchi NTT Open Source Software Center == My e-mail address has been changed since Apr. 1, 2012. -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
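A rough sketch of what approach 2's existence check might look like, assuming the segment file naming of that era (three 8-hex-digit fields: timeline, log, seg) and deliberately ignoring timeline switches, which a real check would have to handle via the timeline history file; the function and its parameters are illustrative, not server code:

    #include <stdbool.h>
    #include <stdio.h>
    #include <sys/stat.h>

    /*
     * Walk the segment range from the latest restartpoint's redo segment
     * up to the current one and verify each file exists in pg_xlog.
     */
    static bool
    all_segments_present(const char *xlogdir, unsigned int tli,
                         unsigned int log, unsigned int seg,
                         unsigned int endlog, unsigned int endseg,
                         unsigned int segs_per_log)
    {
        for (;;)
        {
            char        path[1024];
            struct stat st;

            snprintf(path, sizeof(path), "%s/%08X%08X%08X",
                     xlogdir, tli, log, seg);
            if (stat(path, &st) != 0)
                return false;           /* a required segment is missing */
            if (log == endlog && seg == endseg)
                return true;            /* reached the end of the range */
            if (++seg >= segs_per_log)  /* advance to the next segment */
            {
                seg = 0;
                log++;
            }
        }
    }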
Re: [HACKERS] COMMUTATOR doesn't seem to work
D'Arcy Cain da...@druid.net writes: On 12-06-21 12:18 PM, Tom Lane wrote: Did you actually create a text = chkpass function and operator? This declaration merely promises that you will provide one eventually. The system does not have the ability to make one for you. I guess I am missing the point of COMMUTATOR then. The docs say When you are defining a self-commutative operator, you just do it. It seems you need to do more than just do it. Um, an operator with different types on left and right cannot be its own commutator. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] COMMUTATOR doesn't seem to work
On 12-06-22 12:22 AM, Tom Lane wrote: Um, an operator with different types on left and right cannot be its own commutator. Understood. I completely misunderstood the purpose of COMMUTATOR. I thought it was telling the system that the procedure could be called with the arguments reversed if it could not find a specific procedure. I see now what it is for. So I have my type working now, but I had to create a new C function that takes the opposite argument order. That seems redundant, but I could not see a better way. -- D'Arcy J.M. Cain da...@druid.net | Democracy is three wolves http://www.druid.net/darcy/ | and a sheep voting on +1 416 425 1212 (DoD#0082)(eNTP) | what's for dinner. IM: da...@vex.net -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
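The redundancy can at least be limited to a thin wrapper. Here is a sketch of the arrangement implied, with a hypothetical chkpass_eq_internal() holding the single copy of the comparison logic; the function and operator names are illustrative, not the actual chkpass module:

    #include "postgres.h"
    #include "fmgr.h"

    /* Hypothetical helper holding the real comparison logic. */
    extern bool chkpass_eq_internal(Datum password, Datum plaintext);

    PG_FUNCTION_INFO_V1(chkpass_eq);
    Datum
    chkpass_eq(PG_FUNCTION_ARGS)        /* chkpass = text */
    {
        PG_RETURN_BOOL(chkpass_eq_internal(PG_GETARG_DATUM(0),
                                           PG_GETARG_DATUM(1)));
    }

    PG_FUNCTION_INFO_V1(text_eq_chkpass);
    Datum
    text_eq_chkpass(PG_FUNCTION_ARGS)   /* text = chkpass: args swapped */
    {
        PG_RETURN_BOOL(chkpass_eq_internal(PG_GETARG_DATUM(1),
                                           PG_GETARG_DATUM(0)));
    }

    /*
     * The two operators then declare each other as COMMUTATOR, e.g.:
     *   CREATE OPERATOR = (leftarg = chkpass, rightarg = text,
     *                      procedure = chkpass_eq, commutator = =);
     *   CREATE OPERATOR = (leftarg = text, rightarg = chkpass,
     *                      procedure = text_eq_chkpass, commutator = =);
     */

With both operators in place, COMMUTATOR is indeed just metadata for the planner: it says which operator to substitute when an expression's operands are swapped, but it never synthesizes the reversed operator for you.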