Re: [HACKERS] GIN improvements part 1: additional information
On Thu, Jan 23, 2014 at 9:27 AM, Alexander Korotkov aekorot...@gmail.com wrote: On Wed, Jan 22, 2014 at 9:28 PM, Heikki Linnakangas hlinnakan...@vmware.com wrote: On 01/22/2014 02:17 PM, Alexander Korotkov wrote:

We already spent a lot of time on compression. Now we need to figure out the result we want to see. I spent quite a long time debugging the varbyte encoding without segments. Finally, it passed very many tests without any problems. Now, it is just a piece of junk. I'm afraid that we will have to reimplement everything from scratch another two or three times because the code doesn't look perfect. For sure, that's incompatible with getting something into 9.4.

That's a bit harsh :-). But yes, I understand what you're saying. It's quite common for large patches like this to be rewritten several times before being committed; you just don't know what works best until you've tried many designs.

Remember we also have the subsequent fast-scan patch, which is badly needed for hstore and other applications. I propose making the final decisions now and concentrating our forces on producing a committable patch with those decisions, and not changing them even if somebody has an idea that would improve code readability 100-fold and potential extensibility 1000-fold. I propose the following decisions: 1) leave the uncompressed area, allowing duplicates in it; 2) don't introduce Items on the page.

Well, I got the insertions to work now without the uncompressed area, so in the spirit of Just Getting This Damn Patch Committed, I'm going to go ahead with that. We can add the uncompressed area later if performance testing shows it to be necessary. And I agree about the Items-on-page idea - we can come back to that too, still in the 9.4 timeframe if necessary, but probably not.

So, committed. It's the same design as in the most recent patch I posted, but with a bunch of bugs fixed, and cleaned up all over. I'm going to move to the fast scan patch now, but please do test and review the committed version to see if I missed something.

Great! Thanks a lot!

The assertion in dataPlaceToPageLeafRecompress is too strong: a page can contain GinDataLeafMaxContentSize bytes. A patch is attached. My test suite doesn't run correctly; I'm debugging it now.

ISTM I found the bug. ginVacuumPostingTreeLeaf re-encodes only some segments. The others are not even re-palloced; they are moved left in dataPlaceToPageLeafRecompress by memcpy. But it's incorrect to do that with memcpy; the proper solution is memmove. Using memcpy can lead to page corruption, depending on the platform. Another solution would be to re-palloc the segments in ginVacuumPostingTreeLeaf.

--
With best regards, Alexander Korotkov.

gin-memmove-fix.patch
Description: Binary data
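For readers following along, here is a minimal, self-contained C sketch (not the GIN code itself) of why overlapping copies need memmove: with memcpy, overlapping source and destination regions are undefined behavior, which is exactly the platform-dependent corruption Alexander describes.

```c
#include <stdio.h>
#include <string.h>

int
main(void)
{
	/*
	 * Shift the tail of a buffer two bytes to the left, the same kind of
	 * in-place move dataPlaceToPageLeafRecompress performs on segments.
	 * Source and destination overlap, so memmove is required; memcpy
	 * would be undefined behavior here and can silently corrupt the data
	 * on some platforms.
	 */
	char		buf[16] = "AABBCCDD";

	memmove(buf, buf + 2, strlen(buf + 2) + 1);
	printf("%s\n", buf);		/* prints "BBCCDD" */
	return 0;
}
```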
Re: [HACKERS] ALTER TABLE lock strength reduction patch is unsafe
On 24 January 2014 07:08, Peter Geoghegan p...@heroku.com wrote: On Wed, Jan 15, 2014 at 6:57 AM, Simon Riggs si...@2ndquadrant.com wrote: v15 to fix the above problem.

Minor quibble: I get a compiler warning with the patch applied.

relcache.c: In function 'RememberToFreeTupleDescAtEOX':
relcache.c:2317:3: warning: ISO C90 forbids mixed declarations and code [-Werror=declaration-after-statement]

Thanks, that was a wart, now fixed. v16 attached.

--
Simon Riggs http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services

reduce_lock_levels.v16.patch
Description: Binary data
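For anyone unfamiliar with the warning: ISO C90 requires all declarations in a block to precede the first statement, and PostgreSQL builds with -Wdeclaration-after-statement to enforce that. A hypothetical illustration (not the actual relcache.c code):

```c
#include <stdio.h>

static void
warns(void)
{
	puts("a statement");
	int			i = 42;		/* warning: ISO C90 forbids mixed declarations and code */

	printf("%d\n", i);
}

static void
clean(void)
{
	int			i = 42;		/* declaration hoisted above the first statement */

	puts("a statement");
	printf("%d\n", i);
}

int
main(void)
{
	warns();
	clean();
	return 0;
}
```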
Re: [HACKERS] GIN improvements part 1: additional information
On 01/24/2014 10:03 AM, Alexander Korotkov wrote: ISTM I found the bug. ginVacuumPostingTreeLeaf re-encodes only some segments. The others are not even re-palloced; they are moved left in dataPlaceToPageLeafRecompress by memcpy. But it's incorrect to do that with memcpy; the proper solution is memmove. Using memcpy can lead to page corruption, depending on the platform. Another solution would be to re-palloc the segments in ginVacuumPostingTreeLeaf.

Good catch. Thanks, committed, changing the memcpy to memmove. We will have to switch to pallocing everything in the future, if we make leafRepackItems smarter so that it doesn't rewrite all the segments after the first modified one.

- Heikki
Re: [HACKERS] GIN improvements part 1: additional information
On Fri, Jan 24, 2014 at 12:50 PM, Heikki Linnakangas hlinnakan...@vmware.com wrote: [...] Good catch. Thanks, committed, changing the memcpy to memmove. We will have to switch to pallocing everything in the future, if we make leafRepackItems smarter so that it doesn't rewrite all the segments after the first modified one.

OK. What about the earlier assertion fix?

--
With best regards, Alexander Korotkov.
Re: [HACKERS] extension_control_path
Sergey Muraviov sergey.k.murav...@gmail.com writes: I can't apply the patch.

Did you try using the `patch`(1) command (e.g. `patch -p1 < file.patch` from the top of the source tree)? The PostgreSQL project policy is not to use the git format when sending patches to the mailing list, preferring the context diff format. So you need to resort to using the basic patch commands rather than the modern git tooling.

See also: http://wiki.postgresql.org/wiki/Submitting_a_Patch

  Patches must be in a format which provides context (eg: Context Diff); 'normal' or 'plain' diff formats are not acceptable.

The following email might be useful for you: http://www.postgresql.org/message-id/CAOR=d=0q0dal0bnztsddnwpgm5ejkxuykj7m+qsqbr728eo...@mail.gmail.com

Regards,
--
Dimitri Fontaine http://2ndQuadrant.fr PostgreSQL : Expertise, Formation et Support
Re: [HACKERS] Infinite recursion in row-security based on updatable s.b. views
Hi all

I've hit an interesting wrinkle and am interested in opinions. By integrating updatable security barrier view support with row-security, it has become a lot harder to detect and prevent infinite recursion (including mutual recursion) in row-security policy expansion.

The code is attached, but it's not an independent patch, so it's way easier to grab it from git: g...@github.com:ringerc/postgres.git, branch rls-9.4-upd-sb-views (subject to rebase), or tag rls-9.4-upd-sb-views-v1. (It only contains half the old row-security patch; I'll rebase the docs, tests, etc. on top of this once it's working properly.)

I've integrated the updatable security barrier view support into RLS by injecting securityQuals in subquery_planner() just before preprocess_rowmarks. (I'm still thinking about some inheritance-related aspects of that, but that's for later.)

The problem is that this causes infinite recursion: the securityQuals get expanded into a subquery over the original RTE that had the row-security policy on it. Then subquery_planner is re-entered when processing the subquery, a row-security qual is found on the target RTE, and ... *boom*.

Fixing this is not as simple as suppressing expansion of row-security policy when processing a security barrier subquery created by a row-security policy, as it is desirable to respect the row-security policy of *other* tables that get referenced in the expanded row-security qual. If we just record the relid of the relation a subquery was expanded from and avoid expanding that inside the generated subquery, we prevent simple linear recursion, but not mutual recursion between two or more rels with row-security policy.

KaiGai's row-security patch handles this because it does its own recursive expansion, so (like the rewriter) it can keep a breadcrumb trail and detect when it is repeating a path. That's not so practical when the row-security code tags RTEs with policy and updatable s.b. views then goes and expands them.

So. Options.

1. Come up with a reasonable way to track recursive row-security expansion, detect infinite recursion, and bail out. Likely to involve a new global structure with planner/optimizer lifetime that gets checked and maintained by apply_row_security_rangetbl.

2. Solve the linear recursion case by storing a relid that should not get expanded for security quals when processing a subquery. Say "don't do that" and expect stack-exceeded crashes in the mutual recursion case. Seems user-hostile, incomplete, and likely to break people's systems when they really don't expect it.

3. Disregard row-security policy on referenced tables when expanding row-security qualifiers. There's precedent here in foreign keys, which ignore row-security policy, but I don't think this is at all appealing.

4. Magic?

Unless someone has a suggestion for #4, I'll be going with #1. I'd appreciate tips on doing this in a sane and efficient manner if anyone has them. I'll be reading over the rewriter's infinite loop protection and that of KaiGai's RLS patch for ideas in the mean time, and will give it a good go.

BTW, I feel like we should be letting the rewriter do this job; it's good at dealing with recursion problems already. That won't work so long as security barrier qual expansion happens during planning, not rewrite, though - and we've already explored the fun problems with doing upd. s.b. qual expansion in rewrite.
--
Craig Ringer http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services

From 4f8ed71e4cf18db7a4285da9b9c9f8f7f1030e32 Mon Sep 17 00:00:00 2001
From: Craig Ringer cr...@2ndquadrant.com
Date: Fri, 24 Jan 2014 16:18:40 +0800
Subject: [PATCH] RLS: First attempt to integrate with updatable s.b. views

Enters infinite recursion because the rls-protected table gets expanded
and the new RTE gets checked for row-security quals, gets securityQuals
added, and ... infinite recursion!
---
 src/backend/optimizer/plan/planner.c     |   7 ++
 src/backend/optimizer/prep/prepunion.c   |   6 +
 src/backend/optimizer/util/Makefile      |   3 +-
 src/backend/optimizer/util/rowsecurity.c | 185 +++
 src/include/optimizer/rowsecurity.h      |  27 +
 5 files changed, 227 insertions(+), 1 deletion(-)
 create mode 100644 src/backend/optimizer/util/rowsecurity.c
 create mode 100644 src/include/optimizer/rowsecurity.h

diff --git a/src/backend/optimizer/plan/planner.c b/src/backend/optimizer/plan/planner.c
index 80940ea..8cdd580 100644
--- a/src/backend/optimizer/plan/planner.c
+++ b/src/backend/optimizer/plan/planner.c
@@ -388,6 +388,13 @@ subquery_planner(PlannerGlobal *glob, Query *parse,
 	}
 
 	/*
+	 * Check RTEs for row-security policies and set securityQuals on the
+	 * RTE if a policy is found. This must happen before inherited table
+	 * expansion so that the quals get copied to the child rels.
+	 */
+	apply_row_security_policy(root);
+
+	/*
 	 * Preprocess RowMark
Re: [HACKERS] GIN improvements part 1: additional information
On 01/24/2014 10:53 AM, Alexander Korotkov wrote: OK. What about the earlier assertion fix?

Ah right, fixed that too now.

- Heikki
Re: [HACKERS] pg_upgrade: make the locale comparison more tolerating
Hello, I started reviewing the patch. I went through the original mail thread to understand the need for the fix and the actual problem: http://www.postgresql.org/message-id/20121002155857.ge30...@momjian.us

The patch uses pg_valid_server_encoding() to compare the encodings, which looks more correct. I did a code walk-through and it looks good to me. I tested the patch on CentOS and it's working fine. I am not very knowledgeable about other OSes, so from that perspective it would be good to have others' views.

Here are my comments about the patch:
.) Patch applies cleanly on master (using patch -p1)
.) Compilation completes successfully
.) Code walk-through done; the logic looks good
.) Manual testing worked well for me

To test the issue, I set the old database locale to en_US.utf8 and the new database locale to en_US.UTF-8. With this, pg_upgrade failed with "lc_collate cluster values do not match: old en_US.utf8, new en_US.UTF-8". With the patch, pg_upgrade runs fine.

Regards,
Re: [HACKERS] [patch] Client-only installation on Windows
From: Andrew Dunstan and...@dunslane.net Is there any reason why pgbench is listed in @client_program_files as well as @client_contribs? AFAICT it should only be in the latter.

Thank you for reviewing the patch. Yes, you are right. I removed pgbench from @client_program_files. In addition, I added some documentation, as well as modifying the usage message at the end of install.pl. I'll update the CommitFest entry shortly.

Regards MauMau

client_only_install_win_v2.patch
Description: Binary data
[HACKERS] [bug fix] postgres.exe fails to start on Windows Server 2012 due to ASLR
Hello, my customer reported the following problem on Windows. I'm afraid this is a serious problem, and I think we should provide a fix in the next minor release. I've attached a fix, and I would like it to be back-ported.

[Problem]
The customer is using 64-bit PostgreSQL 9.1.x on Windows Server 2012 R2. He found the following messages output repeatedly (about once every 3 minutes) in the event log:

LOG: could not reserve shared memory region (addr=0127) for child 1340: 487
LOG: could not fork autovacuum worker process: A blocking operation was interrupted by a call to WSACancelBlockingCall.

I found the same problem reported by another user here:

Re: PG 924, Windows 2012, error code 487
http://www.postgresql.org/message-id/2c0926abd16bb641a8e2f11a549200425471425...@phxccrprd01.adprod.bmc.com

In my customer's case, the failed process was an autovacuum worker, so there was no visible impact on their applications. However, as the above mail suggests, this can happen with any backend process.

[Cause]
This is caused by the improved ASLR (Address Space Layout Randomization) in Windows 8/2012. That was analyzed in the following mail last year, but the problem was not addressed:

windows 8 RTM compatibility issue (could not reserve shared memory region for child)
http://www.postgresql.org/message-id/5046caeb.4010...@grammatech.com

[Fix]
Disable ASLR when building PostgreSQL modules. This eliminated the log output in my customer's environment. I'll add this to the CommitFest if necessary.

Regards MauMau

disable_ASLR.patch
Description: Binary data
Re: [HACKERS] [bug fix] pg_ctl always uses the same event source
From: MauMau maumau...@gmail.com OK, I added several lines for this. Please check the attached patch.

I'm sorry, I attached the old patch as v5 in my previous mail. Attached to this mail is the correct one. I'll update the CommitFest entry soon.

Regards MauMau

pg_ctl_eventsrc_v5.patch
Description: Binary data
Re: [HACKERS] Infinite recursion in row-security based on updatable s.b. views
On 24 January 2014 09:04, Craig Ringer cr...@2ndquadrant.com wrote: [...] So. Options. 1. Come up with a reasonable way to track recursive row-security expansion, detect infinite recursion, and bail out. [...] 2. Solve the linear recursion case by storing a relid that should not get expanded for security quals when processing a subquery. [...] 3. Disregard row-security policy on referenced tables when expanding row-security qualifiers. [...] 4. Magic?

My first thought is to add a boolean flag to RangeTblEntry (similar to the inh flag) to say whether RLS expansion is requested for that RTE, and then set it to false on each RTE as you add new RLS quals to its securityQuals.

In addition, when adding RLS quals to an RTE, I think they should be fully and recursively expanded immediately, before setting the new flag to false and moving on --- think recursively calling the rewriter to expand view references in the new RLS qual, and expand_security_qual() to expand any additional RLS quals in the securityQuals list --- with loop detection there.
I'm not pretending that's going to be easy, but there are a couple of existing pieces of code to borrow ideas from. Doing it this way should make it possible to do the loop detection without any global structures.

Regards, Dean
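To make the shape of Dean's suggestion concrete, here is a rough C sketch. The rowsec_done flag and the relid_stack breadcrumb list are invented names for illustration; this is not code from either patch.

```c
/*
 * Hypothetical sketch only: expand row-security quals for one RTE,
 * detecting mutual recursion via a breadcrumb list of relation OIDs.
 * rowsec_done is an imagined per-RTE flag, per Dean's suggestion.
 */
static void
expand_rls_quals(PlannerInfo *root, RangeTblEntry *rte, List *relid_stack)
{
	if (rte->rtekind != RTE_RELATION || rte->rowsec_done)
		return;

	if (list_member_oid(relid_stack, rte->relid))
		ereport(ERROR,
				(errcode(ERRCODE_INVALID_OBJECT_DEFINITION),
				 errmsg("infinite recursion detected in row-security policy for \"%s\"",
						get_rel_name(rte->relid))));

	relid_stack = lappend_oid(relid_stack, rte->relid);

	/* ... look up the policy, append its qual to rte->securityQuals ... */

	rte->rowsec_done = true;	/* never expand this RTE a second time */

	/*
	 * ... then recurse into any relations referenced by the freshly added
	 * qual, passing relid_stack down so A -> B -> A is caught above.
	 */
}
```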
Re: [HACKERS] GIN improvements part 1: additional information
On Fri, Jan 24, 2014 at 1:19 PM, Heikki Linnakangas hlinnakan...@vmware.com wrote: On 01/24/2014 10:53 AM, Alexander Korotkov wrote: OK. What about the earlier assertion fix? Ah right, fixed that too now.

Good, now my test suite passes. The results are as follows.

Time of operations:

         event         |     period
-----------------------+------------------
 index_build           | 00:01:45.709546
 index_build_recovery  | 00:00:09
 index_update          | 00:05:45.732214
 index_update_recovery | 00:01:17
 search_new            | 00:24:02.191049
 search_updated        | 00:26:54.122852
(6 rows)

Index sizes:

     label     |   size
---------------+-----------
 new           | 387547136
 after_updates | 610877440
(2 rows)

Before compression, the results were the following.

Time of operations:

         event         |     period
-----------------------+------------------
 index_build           | 00:01:51.116722
 index_build_recovery  | 00:00:08
 index_update          | 00:05:12.124758
 index_update_recovery | 00:01:43
 search_new            | 00:23:44.281424
 search_updated        | 00:25:41.96251
(6 rows)

Index sizes:

     label     |    size
---------------+------------
 new           | 884514816
 after_updates | 1595252736
(2 rows)

So, we can see a little regression. However, I'm not sure if it's very significant.

--
With best regards, Alexander Korotkov.
Re: [HACKERS] Recovery to backup point
Hi, Heikki-san,

From: MauMau maumau...@gmail.com From: Heikki Linnakangas hlinnakan...@vmware.com After some refactoring and fixing bugs in the existing code, I came up with the attached patch. I called the option simply recovery_target, with the only allowed value of immediate. IOW, if you want to stop recovery as early as possible, you add recovery_target='immediate' to recovery.conf. Now that we have four different options to set the recovery target with, I rearranged the docs slightly. How does this look to you?

I'm almost comfortable with your patch. There are two comments:

C1. The following parts seem to have been mistakenly taken from my patch. These are not necessary for your patch, are they?

I'm going to attach the new revision of the patch soon, which is almost entirely based on yours. All I modified was the removal of the parts mentioned above. I confirmed that the original problem could be solved. Thanks.

Regards MauMau

recover_to_backup_v2.patch
Description: Binary data
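For reference, a minimal recovery.conf exercising the option under discussion might look like this (the archive path is a placeholder):

```
restore_command = 'cp /path/to/archive/%f "%p"'
# Stop recovery as soon as a consistent state is reached, i.e. at the
# end of the backup from which this cluster was restored.
recovery_target = 'immediate'
```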
Re: [HACKERS] Postgresql for cygwin - 3rd
On 01/24/2014 01:20 AM, Marco Atzeri wrote: AFAICT the regression is in Cygwin. The buildfarm passes because it's using an oldish Cygwin release, 1.7.7, rather than the current 1.7.27. I have brought the regression to the attention of the Cygwin people in the past, but without response. Which issue? During my package tests I have only two issues: tsearch ... FAILED, and the test prepared_xacts must be skipped as it never completes.

Those two issues need to be fixed. And yes, they are regressions from my Cygwin 1.7.7 setup, where they pass consistently, just about every day. See http://www.pgbuildfarm.org/cgi-bin/show_history.pl?nm=brolga&br=HEAD

You don't get to choose which regression tests you're going to pass and which you're not. Disabling the tests or providing nonsensical results files is unacceptable. This is a Cygwin behavior issue and needs to be fixed.

cheers andrew
Re: [HACKERS] extension_control_path
On Fri, Jan 24, 2014 at 6:57 AM, Dimitri Fontaine dimi...@2ndquadrant.fr wrote: [...] The PostgreSQL project policy is not to use the git format when sending patches to the mailing list, preferring the context diff format. [...]

Would be nice if we could use the git apply command... :-)

--
Fabrízio de Royes Mello Consultoria/Coaching PostgreSQL Timbira: http://www.timbira.com.br Blog sobre TI: http://fabriziomello.blogspot.com Perfil Linkedin: http://br.linkedin.com/in/fabriziomello Twitter: http://twitter.com/fabriziomello
Re: [HACKERS] GIN improvements part2: fast scan
On Thu, Jan 23, 2014 at 8:22 PM, Heikki Linnakangas hlinnakan...@vmware.com wrote: On 01/14/2014 05:35 PM, Alexander Korotkov wrote: Attached version is rebased against the last version of packed posting lists.

Thanks! I think we're missing a trick with multi-key queries. We know that when multiple scan keys are used, they are ANDed together, so we can do the skip optimization even without the new tri-state consistent function.

To get started, I propose the three attached patches. These only implement the optimization for the multi-key case, which doesn't require any changes to the consistent functions and hence no catalog changes. Admittedly this isn't anywhere near as useful in practice as the single key case, but let's go for the low-hanging fruit first. This nevertheless introduces some machinery that will be needed by the full patch anyway.

I structured the code somewhat differently than your patch. There is no separate fast-path for the case where the optimization applies. Instead, I'm passing the advancePast variable all the way down to where the next batch of items is loaded from the posting tree. keyGetItem is now responsible for advancing the entry streams, and the logic in scanGetItem has been refactored so that it advances advancePast aggressively, as soon as one of the key streams lets us conclude that no items before a certain point can match. scanGetItem might yet need to be refactored when we get to the full preconsistent check stuff, but one step at a time.

The first patch is the most interesting one, and contains the scanGetItem changes. The second patch allows seeking to the right segment in a posting tree page, and the third allows starting the posting tree scan from the root when skipping items (instead of just following the right-links).

Here are some simple performance test results, demonstrating the effect of each of these patches. This is a best-case scenario. I don't think these patches have any adverse effects even in the worst-case scenario, although I haven't actually tried hard to measure that.

I used this to create a test table:

create table foo (intarr int[]);
-- Every row contains 0 (frequent term), and a unique number.
insert into foo select array[0,g] from generate_series(1, 1000) g;
-- Add another tuple with the 0, 1 combo physically to the end of the table.
insert into foo values (array[0,1]);

The query I used is this:

postgres=# select count(*) from foo where intarr @> array[0] and intarr @> array[1];
 count
-------
     2
(1 row)

I measured the time that query takes, and the number of pages hit, using explain (analyze, buffers true):

 patches       | time (ms) | buffers
---------------+-----------+---------
 unpatched     | 650       | 1316
 patch 1       | 0.52      | 1316
 patches 1+2   | 0.50      | 1316
 patches 1+2+3 | 0.13      | 15

So, the second patch isn't doing much in this particular case. But it's trivial, and I think it will make a difference in other queries where you have the opportunity to skip, but return a lot of tuples overall.

In summary, these are fairly small patches, and useful on their own, so I think they should be committed now. But please take a look and see if the logic in scanGetItem/keyGetItem looks correct to you. After this, I think the main fast scan logic will go into keyGetItem.

Good, thanks! Now I can reimplement fast scan on top of these patches.

PS. I find it a bit surprising that in your patch, you're completely bailing out if there are any partial-match keys involved. Is there some fundamental reason for that, or is it just not implemented?

Just not implemented.
I think there are two possible approaches to handle it: 1) handle partial-match keys like OR on matching keys; 2) implement keyAdvancePast for the bitmap. The first approach seems useful with a low number of keys. Probably, we should implement automatic switching between them.

--
With best regards, Alexander Korotkov.
Re: [HACKERS] [bug fix] psql's \conninfo reports incorrect destination on Windows
From: Fujii Masao masao.fu...@gmail.com I think that 77035fa8a92d8c39f4c689e54f46813f203f09a8 fixed this problem. Please check that.

I looked through your patch. I'm fine with the PQhostaddr() part; I hadn't noticed the problem when hostaddr was passed to psql on Windows.

OTOH, regarding PQhost(), I think we had better take my patch. connectOptions2() computes and sets derived values in PGconn, so that PGconn's members hold the values which are actually used for the connection. To follow that rule, PGconn->pghost may as well contain localhost on machines without UNIX domain sockets. Plus, if we want to refer to the actual connection target elsewhere in the libpq implementation than PQhost(), we can just use PGconn's members. If we do so, we don't have to change PQhost(). Could you replace the patch?

Regards MauMau
Re: [HACKERS] Proposal for CSN based snapshots
On 11th June 2013, Markus Wanner wrote:

Agreed. Postgres-R uses a CommitOrderId, which is very similar in concept, for example. Do you think having this snapshot scheme would be helpful for Postgres-R? Yeah, it could help to reduce the patch size, after a rewrite to use such a CSN.

Or why do you need to tell apart aborted from in-progress transactions by CSN? I need to detect aborted transactions so they can be discarded during the eviction process; otherwise the sparse array will fill up. They could also be filtered out by cross-referencing uncommitted slots with the procarray. Having the abort case do some additional work to make xid assignment cheaper looks like a good tradeoff. I see.

The sparse buffer needs to be at least big enough to fit CSN slots for the xids of all active transactions and non-overflowed subtransactions. At the current limit of PGPROC_MAX_CACHED_SUBXIDS=64, the minimum comes out at 16 bytes * (64 + 1) slots * 100 backends = 101.6KB per buffer, or 203KB total in the default configuration.

A CSN is 8 bytes and the XID 4, resulting in 12 bytes per slot. So I guess the given 16 bytes includes alignment to 8-byte boundaries. Sounds good. 8-byte alignment for CSNs is needed for atomicity, if not something else. Oh, right, atomic writes.

I think the size could be cut in half by using a base value for CSNs, if we assume that no xid is active for longer than 2B transactions, as is currently the case. I didn't want to include that complication in the first iteration, so I didn't verify whether it would have any gotchas. In Postgres-R, I effectively used a 32-bit order id which wraps around. In that case, I guess adjusting the base value would get tricky. Wrapping could probably be used as well, instead.

The number of times each cache line can be invalidated is bounded by 8. Hm.. good point.

We are also planning to implement CSN based snapshots. So I am curious to know whether any further development is happening on this. If not, then what is the reason? Am I missing something?

Thanks and Regards, Kumar Rajeev Rastogi
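To sanity-check the arithmetic in the quoted discussion, here is a hypothetical slot layout in C (the struct is illustrative, not the patch's actual definition): a 4-byte xid followed by an 8-byte-aligned CSN pads to 16 bytes, and 65 slots per backend times 100 backends gives roughly the quoted 101.6KB.

```c
#include <stdint.h>
#include <stdio.h>

typedef uint32_t TransactionId;

/* Hypothetical sparse-buffer slot: 4-byte xid, 4 bytes padding, 8-byte CSN */
typedef struct CSNSlot
{
	TransactionId	xid;	/* 4 bytes */
	/* the compiler inserts 4 bytes of padding here */
	uint64_t		csn;	/* 8 bytes, aligned for atomic writes */
} CSNSlot;

int
main(void)
{
	size_t		per_backend = sizeof(CSNSlot) * (64 + 1);	/* subxids + xid */

	printf("slot size: %zu bytes\n", sizeof(CSNSlot));		/* 16 */
	printf("100 backends: %zu bytes (~101.6KB)\n", per_backend * 100);
	return 0;
}
```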
Re: [HACKERS] [bug fix] psql's \conninfo reports incorrect destination on Windows
On Fri, Jan 24, 2014 at 9:00 PM, MauMau maumau...@gmail.com wrote: [...] OTOH, regarding PQhost(), I think we had better take my patch. connectOptions2() computes and sets derived values in PGconn, so that PGconn's members hold the values which are actually used for the connection. To follow that rule, PGconn->pghost may as well contain localhost on machines without UNIX domain sockets.

I'm not sure if we should follow that rule. As far as I read the libpq code, at least connectFailureMessage() and connectDBStart() do the same things as PQhost() does now.

Regards,
--
Fujii Masao
Re: [HACKERS] Postgresql for cygwin - 3rd
On 24/01/2014 12:56, Andrew Dunstan wrote: [...] And yes, they are regressions from my Cygwin 1.7.7 setup, where they pass consistently, just about every day. See http://www.pgbuildfarm.org/cgi-bin/show_history.pl?nm=brolga&br=HEAD

1.7.7 is 3.5 years old. In the meantime we added 64-bit support and moved to gcc 4.8.x.

You don't get to choose which regression tests you're going to pass and which you're not. Disabling the tests or providing nonsensical results files is unacceptable. This is a Cygwin behavior issue and needs to be fixed.

Distributing a broken binary that crashes after a standard rebase is also not acceptable; it is even worse. Your farm is not testing this case, I presume. Until I took over, there was NO recent cygwin package in the distribution at all, so do not complain that my binary is not perfect. I already know, but for me almost working is better than NOT available or BROKEN.

I am available to work on the test regressions, but I am not a postgresql expert, so do not expect all the work to come from my side.

Regards Marco
Re: [HACKERS] [9.3 bug] disk space in pg_xlog increases during archive recovery
From: Fujii Masao masao.fu...@gmail.com On Wed, Jan 22, 2014 at 6:37 AM, Heikki Linnakangas hlinnakan...@vmware.com wrote:

Thanks! The patch looks good to me. Attached is the updated version of the patch; I added the comments.

Thank you very much. Your comments look great. I tested some recovery situations, and confirmed that WAL segments were kept/removed as intended. I'll update the CommitFest entry with this patch.

Sorry for reacting so slowly, but I'm not sure I like this patch. It's a quite useful property that all the WAL files that are needed for recovery are copied into pg_xlog, even when restoring from archive, even when not doing cascading replication. It guarantees that you can restart the standby, even if the connection to the archive is lost for some reason. I intentionally changed the behavior for archive recovery too, when it was introduced for cascading replication. Also, I think it's good that the behavior does not depend on whether cascading replication is enabled - it's a quite subtle difference. So, IMHO this is not a bug, it's a feature.

Yep. I understand the benefit for standby recovery.

To solve the original problem of running out of disk space in archive recovery, I wonder if we should perform restartpoints more aggressively. We intentionally don't trigger restartpoints by checkpoint_segments, only checkpoint_timeout, but I wonder if there should be an option for that. That's an option. MauMau, did you try simply reducing checkpoint_timeout while doing recovery?

The problem is, we might not be able to perform restartpoints more aggressively even if we reduce checkpoint_timeout on the server doing archive recovery, because the frequency of restartpoints depends not only on checkpoint_timeout but also on the checkpoints which happened while the server was running.

I haven't tried reducing checkpoint_timeout. I don't think we can take that approach, because we cannot suggest an appropriate checkpoint_timeout to users. That is, what checkpoint_timeout setting could we suggest so that WAL doesn't accumulate in pg_xlog/ more than in 9.1? In addition, as Fujii-san said, it doesn't seem we can control restartpoints completely. Plus, if we caused restartpoints more frequently, the recovery would take (much?) longer, because shared buffers would be flushed more frequently.

So, how about just removing the AllowCascadeReplication() condition from the patch? That allows WAL to accumulate in pg_xlog/ during standby recovery but not during archive recovery.

Regards MauMau
Re: [HACKERS] [PATCH] Negative Transition Aggregate Functions (WIP)
On Jan24, 2014, at 08:47, Dean Rasheed dean.a.rash...@gmail.com wrote: I think it should probably be broken up. It might be overly ambitious to try to get all of this committed during this commitfest, and in any case, I suspect that the committer would probably choose to commit it in stages. Perhaps something like:

Patch 1 - Basic support for inverse transition functions, CREATE AGGREGATE support and doc updates. This should include test cases to validate that the underlying executor changes are correct, by defining custom aggregates such as sum(int) and array_agg() using inverse transition functions.

Patch 2 - Add built-in inverse transition functions for count, sum(int), and friends.

Patch 3, 4... - Other related groups of built-in aggregates. By this point, it should be a fairly mechanical process.

Splitting it up this way now should help to focus on getting patch 1 correct, without being distracted by all the other aggregates that may or may not usefully be made to have inverse transition functions. I think the value of the feature has been proved, and it is good to see that it can be applied to so many aggregates, but let's not try to do it all at once.

Working on that now, will post individual patches later today.

best regards, Florian Pflug
Re: [HACKERS] [bug fix] psql's \conninfo reports incorrect destination on Windows
From: Fujii Masao masao.fu...@gmail.com On Fri, Jan 24, 2014 at 9:00 PM, MauMau maumau...@gmail.com wrote: [...] I'm not sure if we should follow that rule. As far as I read the libpq code, at least connectFailureMessage() and connectDBStart() do the same things as PQhost() does now.

Yes, I found them ugly, too. Maybe we can refactor them in a separate patch.

Regards MauMau
Re: [Lsf-pc] [HACKERS] Linux kernel impact on PostgreSQL performance
On 23.01.14 02:14, Jim Nasby wrote: On 1/19/14, 5:51 PM, Dave Chinner wrote: Postgres is far from being the only application that wants this; many people resort to tmpfs because of this: https://lwn.net/Articles/499410/ Yes, we covered the possibility of using tmpfs much earlier in the thread, and came to the conclusion that temp files can be larger than memory, so tmpfs isn't the solution here. :)

Although... instead of inventing new APIs and foisting this work onto applications, perhaps it would be better to modify tmpfs such that it can handle a temp space that's larger than memory... possibly backing it with X amount of real disk and allowing it/the kernel to decide when to passively move files out of the in-memory tmpfs and onto disk.

This is exactly what I'd expect from a file system that's suitable for tmp purposes. The current tmpfs should rather have been named memfs, since it lacks a dedicated disk backing store.

Regards, Andreas
Re: [HACKERS] proposal: hide application_name from other users
On Thu, Jan 23, 2014 at 2:01 AM, Greg Stark st...@mit.edu wrote: On Wed, Jan 22, 2014 at 1:03 PM, Josh Berkus j...@agliodbs.com wrote: Probably Heroku has some more specific exploit case to be concerned about here; if so, might I suggest taking it up with the -security list?

I don't think there's a specific vulnerability that needs to be kept secret here. Here's an example. I just created a new hobby database which is on a multi-tenant cluster and ran select * from pg_stat_activity. Here are two of the more interesting examples:

463752 | de5nmf0gbii3u5 | 32250 | 463751 | qspfkgrwgqtbcu | unicorn worker[1] -p 30390 -c ./config/unicorn.rb || | | | | | | || insufficient privilege
463752 | de5nmf0gbii3u5 | 32244 | 463751 | qspfkgrwgqtbcu | unicorn worker[0] -p 30390 -c ./config/unicorn.rb || | | | | | | || insufficient privilege

Note that the contents of the ARGV array are being set by the unicorn task queuing library. It knows it's making this information visible to other users with shell access on this machine. But the decision to stuff the ARGV information into the application_name is being made by the Pg driver. Neither is under the control of the application author, who may not even be aware this is happening. Neither component has the complete information to make a competent decision about whether this information is safe to be in application_name or not. Note that the query is showing as "insufficient privilege" even though it is listed in the ps output -- the same ps output that is listing the unicorn ARGV that is being shown in the application_name.

You might say that the Pg gem is at fault for making a blanket policy decision for applications that the ARGV is safe to show to other database users, but realistically it's so useful to see this information for your own connections that it's probably the right decision. Without it, it's awfully hard to tell which worker is on which connection. It would just be nice to be able to treat application_name the same as query.

I would say that yes, this is clearly broken in the Pg gem. I can see it having such a default, but not allowing an override...

The application can of course issue a SET application_name, assuming there is a hook somewhere in the system that will run after the connection has been established. I've had customers use that many times in java based systems, for example, but I don't know enough about the pg gem, or unicorn, to have a clue whether anything like it exists there. This is also a good way to track how connections are used throughout a pooled system where the same connection might be used for different things at different times.

What actually happens if you set the application_name in the connection string in that environment? Does it override it with its own default? If so, the developers there clearly need to be taught about fallback_application_name. And what happens if you set it in PGAPPNAME?

Long term, I agree we should really have some way of controlling these permissions in a more fine-grained way, but blanket-hiding application_name for non-superusers seems like a bad solution that still only fixes a small part of the problem.

--
Magnus Hagander Me: http://www.hagander.net/ Work: http://www.redpill-linpro.com/
Re: [HACKERS] Change authentication error message (patch)
Bruce Momjian br...@momjian.us writes: On Thu, Jan 23, 2014 at 10:39:34PM -0500, Tom Lane wrote: I'm not convinced that this improves anything. The problem might not in fact be either of the things you mention, in which case the new message is outright misleading. Also, what of the policy stated in the header comment for the function you're hacking, ie that we intentionally don't reveal the precise cause of the failure to the client?

Well, the only solution then would be to add some weasel words like "perhaps expired password", but that seems so rare that I doubt it would apply very often, and it seems like an odd suggestion. We could go with: password authentication failed for user \"%s\": perhaps invalid or expired password. We did have two threads on this issue in the past 12 months, so I figured we should try to do something.

I agree with doing *something*, but this particular thing seems to violate our very long-standing policy on how to deal with authentication failures, as well as being too vague to be really useful.

What would be well within that policy is to log additional information into the postmaster log. I see that md5_crypt_verify knows perfectly well whether the problem is no password set, wrong password, or expired password. I don't see anything wrong with having it emit a log entry --- maybe not in the second case, for fear of log-spam complaints, but certainly the third case and maybe the first one. Or, possibly cleaner, have it return additional status so that auth_failed() can include the info in the main ereport using errdetail_log().

regards, tom lane
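A sketch of what that last suggestion could look like, hedged: the status codes here are invented, but ereport() with errdetail_log() is the real mechanism for attaching server-log-only detail to a client-visible message.

```c
/* Hypothetical status returned by md5_crypt_verify(), for illustration */
switch (status)
{
	case PASSWORD_EXPIRED:		/* invented constant */
		ereport(FATAL,
				(errcode(ERRCODE_INVALID_PASSWORD),
				 errmsg("password authentication failed for user \"%s\"",
						port->user_name),
				 /* goes to the postmaster log only, never to the client */
				 errdetail_log("User \"%s\" has an expired password.",
							   port->user_name)));
		break;

	default:
		ereport(FATAL,
				(errcode(ERRCODE_INVALID_PASSWORD),
				 errmsg("password authentication failed for user \"%s\"",
						port->user_name)));
}
```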
Re: [HACKERS] pg_upgrade: make the locale comparison more tolerating
Rushabh, really sorry, I had to re-create the patch, and thanks a lot for looking at it! Looking at the patch once again, I see that there were at least two problems. Firstly, I used the equivalent_locale function also on the encoding values. Even if that should not cause bugs (as it should result in strncasecmp anyway), it was not pretty. The second problem was assuming that locale specifier A is not longer than locale specifier B. Comparisons like 'en_US.utf8' with 'en_US_.utf8' would succeed. The bug resulting from this mistake is probably not a real one, but it is not nice anyway. The cleaned-up patch is attached, Pavel

From 35b9f600b592db24bb0e25d168bc5955087d65df Mon Sep 17 00:00:00 2001
From: Pavel Raiskup prais...@redhat.com
Date: Sat, 21 Dec 2013 01:27:01 +0100
Subject: [PATCH] pg_upgrade: make the locale comparison more tolerating

Locale strings specified like 'cs_CZ.utf8' and 'cs_CZ.UTF-8' should be
treated as equivalent.  Not taking these as equivalents caused failures
during major server upgrade (when the server machine has a slightly
different encoding spelling than the not yet migrated data stack).  The
workaround for that was changing the system locale to match the
previous locale string.

With this commit, the comparison is done in two phases.  First the
encoding part of the locale string (if any) is compared, and then the
rest of the string.  Before the encoding parts are compared, they are
decoded into the precisely defined codes from 'enum pg_enc'.  This
should make the comparison stable even for completely different
spellings of the same encoding (e.g. 'latin2' and 'iso 8859-2').

References: 3356208.rhzgij6...@nb.usersys.redhat.com
            20121002155857.ge30...@momjian.us
---
 contrib/pg_upgrade/check.c | 58 +++++++++++++++++++++++++++++++++-----
 1 file changed, 55 insertions(+), 3 deletions(-)

diff --git a/contrib/pg_upgrade/check.c b/contrib/pg_upgrade/check.c
new file mode 100644
index a706708..2adefb2
*** a/contrib/pg_upgrade/check.c
--- b/contrib/pg_upgrade/check.c
***************
*** 9,14 ****
--- 9,15 ----
  
  #include "postgres_fe.h"
  
+ #include "mb/pg_wchar.h"
  #include "pg_upgrade.h"
  
*** set_locale_and_encoding(ClusterInfo *clu
*** 393,398 ****
--- 394,450 ----
  	PQfinish(conn);
  }
  
+ /*
+  * equivalent_encoding()
+  *
+  * Best effort encoding comparison.  Return true only if the encodings
+  * are both correctly spelled and they are equivalent.
+  */
+ static bool
+ equivalent_encoding(const char *chara, const char *charb)
+ {
+ 	int	enca = pg_valid_server_encoding(chara);
+ 	int	encb = pg_valid_server_encoding(charb);
+ 
+ 	if (enca < 0 || encb < 0)
+ 		return false;
+ 
+ 	return (enca == encb);
+ }
+ 
+ /*
+  * equivalent_locale()
+  *
+  * Best effort locale comparison.  Return false if we are not 100% sure
+  * the locales are equivalent.
+  */
+ static bool
+ equivalent_locale(const char *loca, const char *locb)
+ {
+ 	int	lencmp;
+ 	const char *chara = strrchr(loca, '.');
+ 	const char *charb = strrchr(locb, '.');
+ 
+ 	if (!chara || !charb)
+ 		/* not both locale strings contain an encoding part */
+ 		return (pg_strcasecmp(loca, locb) == 0);
+ 	chara++;
+ 	charb++;
+ 
+ 	if (!equivalent_encoding(chara, charb))
+ 		return false;
+ 
+ 	/*
+ 	 * We know the encoding part is equivalent.  So now compare only the
+ 	 * locale identifier (e.g. the en_US part of en_US.utf8).
+ 	 */
+ 	lencmp = chara - loca;
+ 	if (lencmp != charb - locb)
+ 		return false;
+ 
+ 	return (pg_strncasecmp(loca, locb, lencmp) == 0);
+ }
  
  /*
   * check_locale_and_encoding()
*** check_locale_and_encoding(ControlData *o
*** 409,421 ****
  	 * They also often use inconsistent hyphenation, which we cannot fix, e.g.
  	 * UTF-8 vs. UTF8, so at least we display the mismatching values.
  	 */
! 	if (pg_strcasecmp(oldctrl->lc_collate, newctrl->lc_collate) != 0)
  		pg_fatal("lc_collate cluster values do not match:  old \"%s\", new \"%s\"\n",
  				 oldctrl->lc_collate, newctrl->lc_collate);
! 	if (pg_strcasecmp(oldctrl->lc_ctype, newctrl->lc_ctype) != 0)
  		pg_fatal("lc_ctype cluster values do not match:  old \"%s\", new \"%s\"\n",
  				 oldctrl->lc_ctype, newctrl->lc_ctype);
! 	if (pg_strcasecmp(oldctrl->encoding, newctrl->encoding) != 0)
  		pg_fatal("encoding cluster values do not match:  old \"%s\", new \"%s\"\n",
  				 oldctrl->encoding, newctrl->encoding);
  }
--- 461,473 ----
  	 * They also often use inconsistent hyphenation, which we cannot fix, e.g.
  	 * UTF-8 vs. UTF8, so at least we display the mismatching values.
  	 */
! 	if (!equivalent_locale(oldctrl->lc_collate, newctrl->lc_collate))
  		pg_fatal("lc_collate cluster values do not match:  old \"%s\", new \"%s\"\n",
  				 oldctrl->lc_collate, newctrl->lc_collate);
! 	if (!equivalent_locale(oldctrl->lc_ctype, newctrl->lc_ctype))
  		pg_fatal("lc_ctype cluster values do not match:  old \"%s\", new \"%s\"\n",
  				 oldctrl->lc_ctype, newctrl->lc_ctype);
! 	if (!equivalent_encoding(oldctrl->encoding, newctrl->encoding))
Re: [HACKERS] proposal: hide application_name from other users
On Fri, Jan 24, 2014 at 6:46 AM, Magnus Hagander mag...@hagander.net wrote: On Thu, Jan 23, 2014 at 2:01 AM, Greg Stark st...@mit.edu wrote: [...] You might say that the Pg gem is at fault for making a blanket policy decision for applications that the ARGV is safe to show to other database users, but realistically it's so useful to see this information for your own connections that it's probably the right decision. [...] It would just be nice to be able to treat application_name the same as query.

I would say that yes, this is clearly broken in the Pg gem. I can see it having such a default, but not allowing an override...

Uhm, it does allow an override, as I said before.

The application can of course issue a SET application_name [...] What actually happens if you set the application_name in the connection string in that environment? Does it override it with its own default? If so, the developers there clearly need to be taught about fallback_application_name.

It can be overridden using any of these methods. It does in fact use fallback_application_name when it defaults to $0, see https://bitbucket.org/ged/ruby-pg/src/6c2444dc63e17eb695363993e8887cc5d67750bc/lib/pg/connection.rb?at=default#cl-46

And what happens if you set it in PGAPPNAME?
It works fine:

```
PGAPPNAME=this_is_a_custom_app_name ruby -w -rpg -e "PG.connect(dbname: 'hgmnz', host: 'localhost').exec('SELECT application_name FROM pg_stat_activity') { |res| res.each { |row| puts row.values_at('application_name') } }"
this_is_a_custom_app_name
```

-Harold
Re: [HACKERS] Minmax indexes
On 8 November 2013 20:11, Alvaro Herrera alvhe...@2ndquadrant.com wrote: Erik Rijkers wrote: On Thu, September 26, 2013 00:34, Erik Rijkers wrote: On Wed, September 25, 2013 22:34, Alvaro Herrera wrote: [minmax-5.patch] I have the impression it's not quite working correctly. Here's version 7 of the patch, which fixes these bugs and adds opclasses for a bunch more types (timestamp, timestamptz, date, time, timetz), courtesy of Martín Marqués. It's also been rebased to apply cleanly on top of today's master branch. I have also added a selectivity function, but I'm not positive that it's very useful yet.

This patch doesn't appear to have been submitted to any CommitFest. Is this still a feature undergoing research, then?

--
Thom
Re: [HACKERS] Changeset Extraction v7.1
On Thu, Jan 23, 2014 at 6:32 PM, Andres Freund and...@2ndquadrant.com wrote:

I also wonder if we should use the terminology "attach" instead of "acquire"; that pairs more naturally with "release". Then the message, if we want more than an assert, might be "this backend is already attached to a replication slot". I went with Acquire/Release because our locking code does so, and it seemed sensible to be consistent. I don't have strong feelings about it.

Yeah, but I think a slot is not really the same thing as a lock. Acquire/release might be OK. In some of my recent code I used attach/detach, which feels a little more natural to me for something like this, so I lean in that direction.

Unfortunately not. Inside the walsender there's currently no LWLockReleaseAll() for ERRORs, since commands aren't run inside a transaction command... But maybe I should have fixed this by adding the release to WalSndErrorCleanup() instead? That'd still leave the problematic case that currently we try to delete a replication slot inside a CATCH block when we fail while initializing the rest of logical replication... But I guess adding it would be a good idea independent of that.

+1. I think that if we can't rely on error handling to clean up the same things everywhere, it's going to be a mess. People won't be able to keep track of which error cleanup is engaged in which code paths, and screw-ups will result when old code paths are called from new call sites.

We could also do a StartTransactionCommand(), but I'd rather not; that currently prevents code in that vicinity from doing anything it shouldn't via various Assert()s in existing code. Like what? I mean, I'm OK with having a partial error-handling environment if that's all we need, but I think it's a bad plan to the extent that the code here needs to be aware of error-handling differences versus expectations elsewhere in our code.

This doesn't need the LWLockRelease either. It does need the SpinLockRelease, but as I think I noted previously, the right way to write this is probably:

SpinLockAcquire(&slot->mutex);
was_active = slot->active;
slot->active = true;
SpinLockRelease(&slot->mutex);
if (was_active)
    ereport(...);
MyReplicationSlot = slot;

That's not really simpler tho? But if you prefer I can go that way. It avoids a branch while holding the lock, and it puts the SpinLockAcquire/Release pair much closer together, so it's easier to visually verify that the lock is released in all cases.

ReplicationSlotsComputeRequiredXmin still acquires ProcArrayLock, and the comment "Provide interlock against concurrent recomputations" doesn't seem adequate to me. I guess the idea here is that we regard ProcArrayLock as protecting ReplicationSlotCtl->catalog_xmin and ReplicationSlotCtl->data_xmin, but if that's the idea then we only need to hold the lock during the time when we actually update those values, not the loop where we compute them. There's a comment someplace else to that end, but yes, that's essentially the idea. I decided to take it during the whole recomputation because we also take ProcArrayLock when creating a new decoding slot and initially setting ->catalog_xmin. That's not strictly required, but it seemed simpler that way, and the code shouldn't be very hot. The code that initially computes the starting value for catalog_xmin when creating a new decoding slot has to take ProcArrayLock to be safe; that's why I thought it'd be convenient to always use it for those values.

I don't really see why it's simpler that way.
It's clearer what the point of the lock is if you only hold it for the operations that need to be protected by that lock. In all other cases where we modify *_xmin we're only increasing it, which doesn't need a lock (HS feedback never has taken one, and GetSnapshotData() modifies ->xmin while holding a shared lock); the only potential danger is a slight delay in increasing the overall value. Right. We might want to comment such places. Also, if that's the design, maybe they should be part of PROC_HDR *ProcGlobal rather than here. It seems weird to have some of the values protected by ProcArrayLock live in a completely different data structure managed almost entirely by some other part of the system. Don't we already have cases of that? I seem to remember so. If you prefer having them there, I am certainly fine with doing that. This way they aren't allocated if slots are disabled, but it's just two TransactionIds. Let's go for it, unless we think of a reason not to. It's pretty evident that what's currently patch #4 (only peg the xmin horizon for catalog tables during logical decoding) needs to become patch #1, because it doesn't make any sense to apply this before we do that. Well, the slot code and the slot support for streaming rep are independent of it and don't use it. So they easily can come before it. But this code is riddled with places where you track a catalog xmin and a data xmin separately. The only point of doing it that way is to support a division that hasn't been made yet.
Re: [HACKERS] UNION ALL on partitioned tables won't use indices.
On Tue, Jan 14, 2014 at 06:04:47PM +0900, Kyotaro HORIGUCHI wrote: I proposed 3 types of solution, but one of them - tweaking the Equivalence Class (Type 1) - was dismissed because of inappropriate manipulation of the Equivalence Class. The rest do essentially the same thing - flattening appendrels - at different timings. In type 2, the process is implemented as a generic appendrel flattening function and applied after expand_inherited_tables() in subquery_planner. In type 3, the equivalent process is combined into expand_inherited_rtentry(). Type 2 loops once over the whole appendrel list (in most cases) so it might be more efficient than type 3, which searches the parent appendrel for each appended one. Type 3 is the more appropriate design assuming that appendrel trees deeper than 2 levels will be generated only by expand_inherited_tables(). Let's focus on type 3; Tom and I both found it most promising. The attached two patches are rebased to current 9.4dev HEAD, and make check at the topmost directory and in src/test/isolation passes without error. One bug was found and fixed on the way. It was an assertion failure caused by a probably-unexpected type conversion introduced by collapse_appendrels(), which leads to an implicit whole-row cast from some valid reltype to an invalid reltype (0) in adjust_appendrel_attrs_mutator(). What query demonstrated this bug in the previous type 2/3 patches? - unionall_inh_idx_typ3_v4_20140114.patch You have not addressed my review comments from November: http://www.postgresql.org/message-id/20131123073913.ga1008...@tornado.leadboat.com Specifically, these: [transvar_merge_mutator()] has roughly the same purpose as adjust_appendrel_attrs_mutator(), but it propagates the change to far fewer node types. Why is this case so much simpler? The translated_vars of parent_appinfo may bear mostly-arbitrary expressions. ... I get this warning: prepunion.c: In function `expand_inherited_rtentry': prepunion.c:1450: warning: passing argument 1 of `expression_tree_mutator' from incompatible pointer type ... Approaches (2) and (3) leave the inheritance parent with rte->inh == true despite no AppendRelInfo pointing to it as their parent. Until now, expand_inherited_rtentry() has been careful to clear rte->inh in such cases. -- Noah Misch EnterpriseDB http://www.enterprisedb.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
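As an illustration of the behavior this patch set targets (a minimal sketch; the table names are invented for the example, and the planner output is what the patch aims for, not a claim about any committed behavior):

    -- An inheritance parent with indexed children:
    create table hist (id int, payload text);
    create table hist_2013 (primary key (id)) inherits (hist);
    create table hist_2014 (primary key (id)) inherits (hist);
    create table cur (id int primary key, payload text);

    -- The UNION ALL branch over the inheritance parent yields a nested
    -- appendrel, which is what currently defeats index use:
    explain select * from
      (select * from hist union all select * from cur) u
    where id = 42;

With the appendrel flattening discussed above, the expectation is index scans on the child primary-key indexes under a single Append node, rather than sequential scans.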
Re: [HACKERS] extension_control_path
Hi Now patch applies cleanly and works. :-) But I have some notes: 1. There is an odd underscore character in functions find_in_extension_control_path and list_extension_control_paths: \extension_control__path\ 2. If we have several versions of one extension in different directories (which are listed in extension_control_path parameter) then we get strange output from pg_available_extensions and pg_available_extension_versions views (Information about extension, whose path is at the beginning of the list, is duplicated). And only one version of the extension can be created. See examples: /extensions/ ├── postgis-2.0.4 │ ├── postgis--2.0.4.sql │ └── postgis.control └── postgis-2.1.1 ├── postgis--2.1.1.sql └── postgis.control = postgresql.conf: extension_control_path = '/extensions/postgis-2.0.4:/extensions/postgis-2.1.1' postgres=# table pg_catalog.pg_available_extensions; name | default_version | installed_version | comment -+-+---+- postgis | 2.0.4 | | PostGIS geometry, geography, and raster spatial types and functions postgis | 2.0.4 | | PostGIS geometry, geography, and raster spatial types and functions (2 rows) postgres=# table pg_catalog.pg_available_extension_versions; name | version | installed | superuser | relocatable | schema | requires | comment -+-+---+---+-++--+- postgis | 2.0.4 | f | t | t || | PostGIS geometry, geography, and raster spatial types and functions postgis | 2.0.4 | f | t | t || | PostGIS geometry, geography, and raster spatial types and functions (2 rows) = postgresql.conf: extension_control_path = '/extensions/postgis-2.1.1:/extensions/postgis-2.0.4' postgres=# table pg_catalog.pg_available_extensions; name | default_version | installed_version | comment -+-+---+- postgis | 2.1.1 | | PostGIS geometry, geography, and raster spatial types and functions postgis | 2.1.1 | | PostGIS geometry, geography, and raster spatial types and functions (2 rows) postgres=# create extension postgis; CREATE EXTENSION postgres=# SELECT PostGIS_version(); postgis_version --- 2.1 USE_GEOS=1 USE_PROJ=1 USE_STATS=1 (1 row) postgres=# table pg_catalog.pg_available_extensions; name | default_version | installed_version | comment -+-+---+- postgis | 2.1.1 | 2.1.1 | PostGIS geometry, geography, and raster spatial types and functions postgis | 2.1.1 | 2.1.1 | PostGIS geometry, geography, and raster spatial types and functions (2 rows) 3. It would be fine to see an extension control path in pg_available_extensions and pg_available_extension_versions views (in separate column or within of extension name). 4. Perhaps the CREATE EXTENSION command should be improved to allow creation of the required version of the extension. So we can use different versions of extensions in different databases. PS Sorry for my English. 2014/1/24 Fabrízio de Royes Mello fabriziome...@gmail.com On Fri, Jan 24, 2014 at 6:57 AM, Dimitri Fontaine dimi...@2ndquadrant.fr wrote: Sergey Muraviov sergey.k.murav...@gmail.com writes: I can't apply the patch. Did you try using the `patch`(1) command? The PostgreSQL project policy is to not use the git format when sending patches to the mailing list, prefering the context diff format. So you need to resort to using the basic patch commands rather than the modern git tooling. See also: http://wiki.postgresql.org/wiki/Submitting_a_Patch Patches must be in a format which provides context (eg: Context Diff); 'normal' or 'plain' diff formats are not acceptable. Would be nice if we can use git apply command... 
:-) -- Fabrízio de Royes Mello Consultoria/Coaching PostgreSQL Timbira: http://www.timbira.com.br Blog sobre TI: http://fabriziomello.blogspot.com Perfil Linkedin: http://br.linkedin.com/in/fabriziomello Twitter: http://twitter.com/fabriziomello -- Best regards, Sergey Muraviov
Re: [HACKERS] Why do we let autovacuum give up?
On 01/23/2014 07:22 PM, Alvaro Herrera wrote: If you ask me, I'd like autovac to know when not to run (or rather wait a bit, not forever), perhaps by checking load factors or some other tell-tale of an already-saturated I/O system. We had a proposed design to tell autovac when not to run (or rather, when to switch settings very high so that in practice it'd never run). At some point somebody said but we can just change autovacuum=off in postgresql.conf via crontab when the high load period starts, and turn it back on afterwards --- and that was the end of it. Anything which depends on a timing-based feedback loop is going to be hopeless. Saying autovac shouldn't run if load is high sounds like a simple statement, until you actually try to implement it. -- Josh Berkus PostgreSQL Experts Inc. http://pgexperts.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Changeset Extraction v7.1
On 2014-01-24 12:10:50 -0500, Robert Haas wrote: Unfortunately not. Inside the walsender there's currently no LWLockReleaseAll() for ERRORs since commands aren't run inside a transaction command... But maybe I should have fixed this by adding the release to WalSndErrorCleanup() instead? That'd still leave the problematic case that currently we try to delete a replication slot inside a CATCH when we fail while initializing the rest of logical replication... But I guess adding it would be a good idea independent of that. +1. I think that if we can't rely on error handling to clean up the same things everywhere, it's gonna be a mess. People won't be able to keep track of which error cleanup is engaged in which code paths, and screw-ups will result when old code paths are called from new call sites. Ok, I'll additionally add it there. We could also do a StartTransactionCommand() but I'd rather not; that currently prevents code in that vicinity from doing anything it shouldn't via various Assert()s in existing code. Like what? I mean, I'm OK with having a partial error-handling environment if that's all we need, but I think it's a bad plan to the extent that the code here needs to be aware of error-handling differences versus expectations elsewhere in our code. Catalog lookups, building a snapshot, xid assignment, whatnot. All that shouldn't happen in the locations creating/dropping a slot. I think we should at some point separate parts of the error handling out of xact.c. Currently it's repeated slightly differently over lots of places (check e.g. the callsites for LWLockReleaseAll); that's not robust. But that's a project for another day. Note that the actual decoding *does* happen inside a TransactionCommand, as it'd be failure-prone to copy all the cleanup logic, and we need to have most of the normal cleanup code. I don't really see why it's simpler that way. It's clearer what the point of the lock is if you only hold it for the operations that need to be protected by that lock. In all other cases where we modify *_xmin we're only increasing it, which doesn't need a lock (HS feedback never has taken one, and GetSnapshotData() modifies ->xmin while holding a shared lock); the only potential danger is a slight delay in increasing the overall value. Right. We might want to comment such places. Don't we already have cases of that? I seem to remember so. If you prefer having them there, I am certainly fine with doing that. This way they aren't allocated if slots are disabled, but it's just two TransactionIds. Let's go for it, unless we think of a reason not to. ok on those counts. It's pretty evident that what's currently patch #4 (only peg the xmin horizon for catalog tables during logical decoding) needs to become patch #1, because it doesn't make any sense to apply this before we do that. Well, the slot code and the slot support for streaming rep are independent of it and don't use it. So they easily can come before it. But this code is riddled with places where you track a catalog xmin and a data xmin separately. The only point of doing it that way is to support a division that hasn't been made yet. If you think it will make stuff more manageable I can try separating all lines dealing with catalog_xmin into another patch, as long as data_xmin doesn't have to be renamed. That said, I don't really think it's a big problem that the division hasn't been made; essentially the meaning is different, even if we don't take advantage of it yet. 
data_xmin is there for streaming replication where we need to prevent all removal, catalog_xmin is there for changeset extraction. I have zero confidence that it's OK to treat fsync() as an operation that won't fail. Linux documents EIO as a plausible error return, for example. (And really, how could it not?) But quite fundamentally, having the most basic persistence building block fail is something you can't really handle safely. Note that issue_xlog_fsync() has always treated that as a PANIC (and, I wager, always will). I don't recall many complaints about that for WAL. All of the ones I found in a quick search were like "oh, the fs invalidated my fd because of corruption" - and they were few. Calling a slot old or new looks liable to cause problems. Maybe change those names to contain a character not allowed in a slot name, if we're going to keep doing it that way. I wondered about making them plain files as well, but given the need for a directory independent from this I don't really see the advantage; we'll need to handle them anyway during cleanup. Yeah, sure, but if it makes for fewer in-between states, it might be worth it. I don't think it'd make anything simpler with the new version of the code. Agreed? Greetings, Andres Freund -- Andres Freund http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services --
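To make the walsender cleanup proposal concrete, the fix being discussed would amount to roughly the following in WalSndErrorCleanup() (a sketch under the assumption that the patch's ReplicationSlotRelease() is the release entry point; this is not the committed code):

    /* in WalSndErrorCleanup(), during ERROR recovery in the walsender */
    LWLockReleaseAll();
    if (MyReplicationSlot != NULL)
        ReplicationSlotRelease();

That would keep ERROR cleanup in the walsender consistent with what transaction abort provides everywhere else, which is the property Robert is arguing for above.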
Re: [HACKERS] Change authentication error message (patch)
I wrote: I agree with doing *something*, but this particular thing seems to violate our very long-standing policy on how to deal with authentication failures, as well as being too vague to be really useful. What would be well within that policy is to log additional information into the postmaster log. I see that md5_crypt_verify knows perfectly well whether the problem is no password set, wrong password, or expired password. I don't see anything wrong with having it emit a log entry --- maybe not in the second case for fear of log-spam complaints, but certainly the third case and maybe the first one. Or possibly cleaner, have it return additional status so that auth_failed() can include the info in the main ereport using errdetail_log(). Attached is a proposed patch that does exactly that. This just addresses the cases mentioned above; once the infrastructure is in place, there might be more things that would be worth logging using this mechanism. regards, tom lane diff --git a/src/backend/libpq/auth.c b/src/backend/libpq/auth.c index 882dc8f..1974b57 100644 *** a/src/backend/libpq/auth.c --- b/src/backend/libpq/auth.c *** *** 38,46 * */ static void sendAuthRequest(Port *port, AuthRequest areq); ! static void auth_failed(Port *port, int status); static char *recv_password_packet(Port *port); ! static int recv_and_check_password_packet(Port *port); /* --- 38,46 * */ static void sendAuthRequest(Port *port, AuthRequest areq); ! static void auth_failed(Port *port, int status, char *logdetail); static char *recv_password_packet(Port *port); ! static int recv_and_check_password_packet(Port *port, char **logdetail); /* *** ClientAuthentication_hook_type ClientAut *** 207,216 * in use, and these are items that must be presumed known to an attacker * anyway. * Note that many sorts of failure report additional information in the ! * postmaster log, which we hope is only readable by good guys. */ static void ! auth_failed(Port *port, int status) { const char *errstr; int errcode_return = ERRCODE_INVALID_AUTHORIZATION_SPECIFICATION; --- 207,217 * in use, and these are items that must be presumed known to an attacker * anyway. * Note that many sorts of failure report additional information in the ! * postmaster log, which we hope is only readable by good guys. In ! * particular, if logdetail isn't NULL, we send that string to the log. */ static void ! auth_failed(Port *port, int status, char *logdetail) { const char *errstr; int errcode_return = ERRCODE_INVALID_AUTHORIZATION_SPECIFICATION; *** auth_failed(Port *port, int status) *** 273,286 } if (port-hba) ! ereport(FATAL, ! (errcode(errcode_return), ! errmsg(errstr, port-user_name), ! errdetail_log(Connection matched pg_hba.conf line %d: \%s\, port-hba-linenumber, port-hba-rawline))); ! else ! ereport(FATAL, ! (errcode(errcode_return), ! errmsg(errstr, port-user_name))); /* doesn't return */ } --- 274,294 } if (port-hba) ! { ! char *cdetail; ! ! cdetail = psprintf(_(Connection matched pg_hba.conf line %d: \%s\), ! port-hba-linenumber, port-hba-rawline); ! if (logdetail) ! logdetail = psprintf(%s\n%s, logdetail, cdetail); ! else ! logdetail = cdetail; ! } ! ! ereport(FATAL, ! (errcode(errcode_return), ! errmsg(errstr, port-user_name), ! logdetail ? 
errdetail_log(%s, logdetail) : 0)); /* doesn't return */ } *** void *** 294,299 --- 302,308 ClientAuthentication(Port *port) { int status = STATUS_ERROR; + char *logdetail = NULL; /* * Get the authentication method to use for this frontend/database *** ClientAuthentication(Port *port) *** 507,518 (errcode(ERRCODE_INVALID_AUTHORIZATION_SPECIFICATION), errmsg(MD5 authentication is not supported when \db_user_namespace\ is enabled))); sendAuthRequest(port, AUTH_REQ_MD5); ! status = recv_and_check_password_packet(port); break; case uaPassword: sendAuthRequest(port, AUTH_REQ_PASSWORD); ! status = recv_and_check_password_packet(port); break; case uaPAM: --- 516,527 (errcode(ERRCODE_INVALID_AUTHORIZATION_SPECIFICATION), errmsg(MD5 authentication is not supported when \db_user_namespace\ is enabled))); sendAuthRequest(port, AUTH_REQ_MD5); ! status = recv_and_check_password_packet(port, logdetail); break; case uaPassword: sendAuthRequest(port, AUTH_REQ_PASSWORD); ! status = recv_and_check_password_packet(port, logdetail);
Re: [HACKERS] Minmax indexes
Thom Brown wrote: On 8 November 2013 20:11, Alvaro Herrera alvhe...@2ndquadrant.com wrote: Erik Rijkers wrote: On Thu, September 26, 2013 00:34, Erik Rijkers wrote: On Wed, September 25, 2013 22:34, Alvaro Herrera wrote: [minmax-5.patch] I have the impression it's not quite working correctly. Here's a version 7 of the patch, which fixes these bugs and adds opclasses for a bunch more types (timestamp, timestamptz, date, time, timetz), courtesy of Martín Marqués. It's also been rebased to apply cleanly on top of today's master branch. I have also added a selectivity function, but I'm not positive that it's very useful yet. This patch doesn't appear to have been submitted to any Commitfest. Is this still a feature undergoing research then? It's still a planned feature, but I didn't have time to continue work for 2014-01. -- Álvaro Herrerahttp://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Minmax indexes
On 24 January 2014 17:53, Alvaro Herrera alvhe...@2ndquadrant.com wrote: Thom Brown wrote: On 8 November 2013 20:11, Alvaro Herrera alvhe...@2ndquadrant.com wrote: Erik Rijkers wrote: On Thu, September 26, 2013 00:34, Erik Rijkers wrote: On Wed, September 25, 2013 22:34, Alvaro Herrera wrote: [minmax-5.patch] I have the impression it's not quite working correctly. Here's a version 7 of the patch, which fixes these bugs and adds opclasses for a bunch more types (timestamp, timestamptz, date, time, timetz), courtesy of Martín Marqués. It's also been rebased to apply cleanly on top of today's master branch. I have also added a selectivity function, but I'm not positive that it's very useful yet. This patch doesn't appear to have been submitted to any Commitfest. Is this still a feature undergoing research then? It's still a planned feature, but I didn't have time to continue work for 2014-01. Alles klar. Thanks -- Thom -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Why do we let autovacuum give up?
On Fri, Jan 24, 2014 at 2:44 PM, Josh Berkus j...@agliodbs.com wrote: On 01/23/2014 07:22 PM, Alvaro Herrera wrote: If you ask me, I'd like autovac to know when not to run (or rather wait a bit, not forever), perhaps by checking load factors or some other tell-tale of an already-saturated I/O system. We had a proposed design to tell autovac when not to run (or rather, when to switch settings very high so that in practice it'd never run). At some point somebody said but we can just change autovacuum=off in postgresql.conf via crontab when the high load period starts, and turn it back on afterwards --- and that was the end of it. Anything which depends on a timing-based feedback loop is going to be hopeless. Saying autovac shouldn't run if load is high sounds like a simple statement, until you actually try to implement it. Exactly. But people tuning autovac down are doing exactly that: trying to tune autovac down to background-only work. They *must* then launch foreground vacuums, at times they deem sensible, when doing that. So the problem is not people tuning down autovacuum, but them forgetting to vacuum explicitly after doing so. -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
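For what it's worth, the workflow described above is simple to express; the gap is operational discipline, not syntax (busy_table is a stand-in name for this example):

    -- Take one hot table out of autovacuum's hands:
    ALTER TABLE busy_table SET (autovacuum_enabled = false);

    -- ...which obliges the admin to schedule an explicit replacement,
    -- e.g. from cron during a quiet window:
    VACUUM (ANALYZE) busy_table;

Forgetting the second half is exactly the failure mode being discussed.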
Re: [HACKERS] Minmax indexes
On Fri, Jan 24, 2014 at 2:54 PM, Thom Brown t...@linux.com wrote: On 24 January 2014 17:53, Alvaro Herrera alvhe...@2ndquadrant.com wrote: Thom Brown wrote: On 8 November 2013 20:11, Alvaro Herrera alvhe...@2ndquadrant.com wrote: Erik Rijkers wrote: On Thu, September 26, 2013 00:34, Erik Rijkers wrote: On Wed, September 25, 2013 22:34, Alvaro Herrera wrote: [minmax-5.patch] I have the impression it's not quite working correctly. Here's a version 7 of the patch, which fixes these bugs and adds opclasses for a bunch more types (timestamp, timestamptz, date, time, timetz), courtesy of Martín Marqués. It's also been rebased to apply cleanly on top of today's master branch. I have also added a selectivity function, but I'm not positive that it's very useful yet. This patch doesn't appear to have been submitted to any Commitfest. Is this still a feature undergoing research then? It's still a planned feature, but I didn't have time to continue work for 2014-01. What's the status? I believe I have more than a use for minmax indexes, and wouldn't mind lending a hand if it's within my grasp. -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] GIN improvements part2: fast scan
On 01/24/2014 01:58 PM, Alexander Korotkov wrote: On Thu, Jan 23, 2014 at 8:22 PM, Heikki Linnakangas hlinnakan...@vmware.com wrote: In summary, these are fairly small patches, and useful on their own, so I think these should be committed now. But please take a look and see if the logic in scanGetItem/keyGetItem looks correct to you. After this, I think the main fast scan logic will go into keyGetItem. Good, thanks! Now I can reimplement fast scan based on these patches. I hope we're not wasting effort doing the same thing, but I was also hacking on that; here's what I got. It applies on top of the previous set of patches. I decided to focus on the ginget.c changes, and worry about the catalog changes later. Instead, I added an abstraction for calling the ternary consistent function, with a shim implementation that checks if there is only one UNKNOWN argument, and tries the regular boolean consistent function both ways for the UNKNOWN argument. That's the same strategy we were already using when dealing with a key with one lossy page, so I refactored that to also use the new ternary consistent function. That abstraction can be used to do other optimizations in the future. For example, building the truth table like I suggested a long time ago, or simply checking for some common cases, like if the consistent function implements plain AND logic. Or caching the results of expensive consistent functions. And obviously, that is where the call to the opclass-provided tri-state consistent function will go to. Then, I rewrote keyGetItem to make use of the ternary consistent function, to perform the pre-consistent check. That is the core logic from your patch. Currently (ie. without the patch), we loop through all the entries, and advance them to the next item pointer > advancePast, and then perform the consistent check for the smallest item among those. With the patch, we first determine the smallest item pointer among the entries with curItem > advancePast, and call that minItem. The others are considered as unknown, as they might contain an item X where advancePast < X < minItem. Normally, we would have to load the next item from that entry to determine if X exists, but we can skip it if the ternary pre-consistent function says that X cannot match anyway. In addition to that, I'm using the ternary consistent function to check if minItem is a match, even if we haven't loaded all the entries yet. That's less important, but I think for something like rare1 | (rare2 & frequent) it might be useful. It would allow us to skip fetching 'frequent', when we already know that 'rare1' matches for the current item. I'm not sure if that's worth the cycles, but it seemed like an obvious thing to do, now that we have the ternary consistent function. (hmm, I should put the above paragraphs in a comment in the patch) This isn't exactly the same structure as in your patch, but I found the concept easier to understand when written this way. I did not implement the sorting of the entries. It seems like a sensible thing to do, but I'd like to see a test case that shows the difference before bothering with it. If we do it, a binary heap probably makes more sense than keeping the array fully sorted. There's one tradeoff here that should be mentioned: In most cases, it's extremely cheap to fetch the next item from an entry stream. We load a page worth of items into the array, so it's just a matter of pulling the next item from the array. 
Instead of trying to refute such items based on other entries, would it be better to load them and call the consistent function the normal way for them? Refuting might slash all the entries in one consistent check, but OTOH, when refuting fails, the consistent check was a waste of cycles. If we only tried to refute items when the alternative would be to load a new page, there would be less chance of a performance regression from this patch. Anyway, how does this patch look to you? Did I get the logic correct? Do you have some test data and/or queries that you could share easily? - Heikki From eb9c6a202cbb0ab03181cb19a434deb6082da497 Mon Sep 17 00:00:00 2001 From: Heikki Linnakangas heikki.linnakan...@iki.fi Date: Thu, 23 Jan 2014 23:08:43 +0200 Subject: [PATCH 1/1] Add the concept of a ternary consistent check, and use it to skip entries. When we have loaded the next item from some, but not all, entries in a scan, it might be possible to prove that there cannot be any matches with a smaller item pointer coming from the other entries. In that case, we can fast-forward those entries to the smallest item among the already-fetched sources. There is no support for opclass-defined ternary consistent functions yet, but there is a shim function that calls the regular, boolean, consistent function both ways, when only one input is unknown. Per the concept by Alexander Korotkov --- src/backend/access/gin/Makefile | 2 +-
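To illustrate the shim strategy described in the message above in isolation (a simplified, hypothetical sketch; the names and signature here are invented for the example and are not the attached patch's actual function):

    #include <stdbool.h>

    typedef enum
    {
        GIN_FALSE = 0,
        GIN_TRUE = 1,
        GIN_MAYBE = 2
    } GinTernaryValue;

    /*
     * Resolve a ternary consistent check using only a boolean consistent
     * function, for the case of exactly one UNKNOWN input: run the check
     * with the unknown entry set both false and true.  If the outcome is
     * the same either way, the unknown input cannot matter; otherwise the
     * answer is MAYBE and the caller must fetch the real entry.
     */
    static GinTernaryValue
    shimTriConsistent(bool (*boolConsistent) (bool *check, void *arg),
                      bool *check, int unknownIdx, void *arg)
    {
        bool    res_false, res_true;

        check[unknownIdx] = false;
        res_false = boolConsistent(check, arg);
        check[unknownIdx] = true;
        res_true = boolConsistent(check, arg);

        if (res_false == res_true)
            return res_false ? GIN_TRUE : GIN_FALSE;
        return GIN_MAYBE;
    }

This is the same both-ways trick the lossy-page handling already used, which is why it could be refactored onto the common abstraction.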
Re: [HACKERS] new json funcs
On 01/22/2014 12:49 PM, Andrew Dunstan wrote: On 01/21/2014 06:21 PM, Marko Tiikkaja wrote: Hi Andrew, On 1/18/14, 10:05 PM, I wrote: But I'll continue with my review now that this has been sorted out. Sorry about the delay. I think the API for the new functions looks good. They are all welcome additions to the JSON family. The implementation side looks reasonable to me. I'm not sure there's need to duplicate so much code, though. E.g. json_to_recordset is almost identical to json_populate_recordset, and json_to_record has a bit of the same disease. Finally, (as I'm sure you know already), docs are still missing. Marking the patch Waiting on Author for the time being. New patch attached. Main change is I changed json_populate_record/json_to_record to call a common worker function, and likewise with json_populate_recordset/json_to_recordset. We're still finalizing the docs - should be ready in the next day or so. OK, here's the patch, this time with docs, thanks to Merlin Moncure and Josh Berkus for help with that. I want to do some more wordsmithing around json_to_record{set} and json_populate_record{set}, but I think this is close to being committable as is. cheers andrew diff --git a/doc/src/sgml/func.sgml b/doc/src/sgml/func.sgml index c0a75de..d20e0ea 100644 --- a/doc/src/sgml/func.sgml +++ b/doc/src/sgml/func.sgml @@ -10300,6 +10300,136 @@ table2-mapping entryliteraljson_typeof('-123.4')/literal/entry entryliteralnumber/literal/entry /row + row + entry + indexterm + primaryjson_build_array/primary + /indexterm + literaljson_build_array(VARIADIC any)/literal + /entry + entrytypejson/type/entry + entry + Builds a heterogeneously typed json array out of a variadic agument list. + /entry + entryliteralSELECT json_build_array(1,2,'3',4,5);/literal/entry + entry +programlisting + json_build_array +--- + [1, 2, 3, 4, 5] + /programlisting + /entry + /row + row + entry + indexterm + primaryjson_build_object/primary + /indexterm + literaljson_build_object(VARIADIC any)/literal + /entry + entrytypejson/type/entry + entry + Builds a JSON array out of a variadic agument list. By convention, the object is + constructed out of alternating name/value arguments. + /entry + entryliteralSELECT json_build_object('foo',1,'bar',2);/literal/entry + entry +programlisting + json_build_object + + {foo : 1, bar : 2} + /programlisting + /entry + /row + row + entry + indexterm + primaryjson_object/primary + /indexterm + literaljson_object(text[])/literal + /entry + entrytypejson/type/entry + entry + Builds a JSON object out of a text array. The array must have exactly one dimension + with an even number of members, in which case they are taken as alternating name/value + pairs, or two dimensions with such that each inner array has exactly two elements, which + are taken as a name/value pair. + /entry + entryliteralselect * from json_object('{a, 1, b, def, c, 3.5}') or literalselect * from json_object('{{a, 1},{b, def},{c, 3.5}}')/literal/literal/entry + entry +programlisting + json_object +--- + {a : 1, b : def, c : 3.5} + /programlisting + /entry + /row + row + entry + literaljson_object(keys text[], values text[])/literal + /entry + entrytypejson/type/entry + entry + The two argument form of JSON object takes keys and values pairwise from two separate + arrays. In all other respects it is identical to the one argument form. 
+ /entry + entryliteralselect * from json_object('{a, b}', '{1,2}');/literal/entry + entry +programlisting + json_object + + {a : 1, b : 2} + /programlisting + /entry + /row + row + entry + indexterm + primaryjson_to_record/primary + /indexterm + literaljson_to_record(json, nested_as_text bool)/literal + /entry + entrytyperecord/type/entry + entry + json_to_record returns an arbitrary record from a JSON object. As with all functions + returning 'record', the caller must explicitly define the structure of the record + when making the call. The input JSON must be an object, not a scalar or an array. + If nested_as_text is true, the function coerces nested complex elements to text. + Also, see notes below on columns and types. + /entry + entryliteralselect * from json_to_record('{a:1,b:[1,2,3],c:bar}',true) as x(a int, b text, d text) /literal/entry + entry +programlisting + a |b| d
[HACKERS] LIKE INCLUDING CONSTRAINTS is broken
It seems CREATE TABLE ... (LIKE INCLUDING CONSTRAINTS) doesn't work cleanly when there's also regular inheritance; my guess is that attnums get messed up at some point after the constraints are generated. Here's a trivial test case: create table b (b1 int unique check (b1 100)); CREATE TABLE c (c1 int not null references b (b1)); create table d (d1 int, d2 point not null); create table a (a1 int not null, a2 text primary key, a3 timestamptz(6), like b including constraints, like c) inherits (d); You can see the broken state: alvherre=# \d [ab] Table «public.a» Column | Type | Modifiers -+-+--- d1 | integer | d2 | point | not null a1 | integer | not null a2 | text| not null a3 | timestamp(6) with time zone | b1 | integer | c1 | integer | not null Indexes: a_pkey PRIMARY KEY, btree (a2) Check constraints: b_b1_check CHECK (a2 100) Inherits: d Table «public.b» Column | Type | Modifiers -+-+--- b1 | integer | Indexes: b_b1_key UNIQUE CONSTRAINT, btree (b1) Check constraints: b_b1_check CHECK (b1 100) Referenced by: TABLE c CONSTRAINT c_c1_fkey FOREIGN KEY (c1) REFERENCES b(b1) Notice how the CHECK constraint in table b points to column b1, but in table a it is mentioning column a2, even though that one is not even of the correct datatype. In fact if you try an insert, you get a weird error message: alvherre=# insert into a (d2, a2, a1, c1) values ('(1, 0)', '1', 1, 1); ERROR: attribute 4 has wrong type DETAIL: Table has type text, but query expects integer. If I take out the INHERITS clause in table a, the error disappears. -- Álvaro Herrera http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Recovery to backup point
On 01/24/2014 01:37 PM, MauMau wrote: Hi, Heikki-san, From: MauMau maumau...@gmail.com From: Heikki Linnakangas hlinnakan...@vmware.com After some refactoring and fixing bugs in the existing code, I came up with the attached patch. I called the option simply recovery_target, with the only allowed value of immediate. IOW, if you want to stop recovery as early as possible, you add recovery_target='immediate' to recovery.conf. Now that we have four different options to set the recovery target with, I rearranged the docs slightly. How does this look to you? I'm almost comfortable with your patch. There are two comments: C1. The following parts seem to be mistakenly taken from my patch. These are not necessary for your patch, are they? I'm going to attach the new revision of the patch soon, which is almost entirely based on yours. All I modified was to remove the parts mentioned above. I confirmed that the original problem could be solved. Thanks. Thanks, committed! - Heikki -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
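For anyone who wants to try the committed behavior, the recovery.conf usage is minimal (the restore_command path here is a placeholder):

    # recovery.conf
    restore_command = 'cp /mnt/server/archivedir/%f "%p"'
    recovery_target = 'immediate'

Recovery then ends as soon as a consistent state is reached, instead of replaying all available WAL.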
Re: [HACKERS] new json funcs
For consistency with the existing json functions (json_each, json_each_text, etc.) it might be better to add separate json_to_record_text and json_to_recordset_text functions in place of the nested_as_text parameter to json_to_record and json_to_recordset. On 24 January 2014 10:26, Andrew Dunstan and...@dunslane.net wrote: On 01/22/2014 12:49 PM, Andrew Dunstan wrote: On 01/21/2014 06:21 PM, Marko Tiikkaja wrote: Hi Andrew, On 1/18/14, 10:05 PM, I wrote: But I'll continue with my review now that this has been sorted out. Sorry about the delay. I think the API for the new functions looks good. They are all welcome additions to the JSON family. The implementation side looks reasonable to me. I'm not sure there's need to duplicate so much code, though. E.g. json_to_recordset is almost identical to json_populate_recordset, and json_to_record has a bit of the same disease. Finally, (as I'm sure you know already), docs are still missing. Marking the patch Waiting on Author for the time being. New patch attached. Main change is I changed json_populate_record/json_to_record to call a common worker function, and likewise with json_populate_recordset/json_to_recordset. We're still finalizing the docs - should be ready in the next day or so. OK, here's the patch, this time with docs, thanks to Merlin Moncure and Josh Berkus for help with that. I want to do some more wordsmithing around json_to_record{set} and json_populate_record{set}, but I think this is close to being committable as is. cheers andrew -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] [PATCH] Support for pg_stat_archiver view
On Thu, Jan 23, 2014 at 4:10 PM, Michael Paquier michael.paqu...@gmail.com wrote: On Thu, Jan 9, 2014 at 6:36 AM, Gabriele Bartolini gabriele.bartol...@2ndquadrant.it wrote: Il 08/01/14 18:42, Simon Riggs ha scritto: Not sure I see why it needs to be an SRF. It only returns one row. Attached is version 4, which removes management of SRF stages. I have been looking at v4 a bit more, and found a couple of small things: - a warning in pgstatfuncs.c - some typos and format fixing in the sgml docs - some corrections in a couple of comments - Fixed an error message related to pg_stat_reset_shared referring only to bgwriter and not the new option archiver. Here is how the new message looks: =# select pg_stat_reset_shared('popo'); ERROR: 22023: unrecognized reset target: popo HINT: Target must be bgwriter or archiver A new version is attached containing those fixes. I played also with the patch and pgbench, simulating some archive failures and successes while running pgbench and the reports given by pg_stat_archiver were correct, so I am marking this patch as Ready for committer. I refactored the patch further. * Remove ArchiverStats global variable to simplify pgarch.c. * Remove stats_timestamp field from PgStat_ArchiverStats struct because it's not required. I have some review comments: +s.archived_wals, +s.last_archived_wal, +s.last_archived_wal_time, +s.failed_attempts, +s.last_failed_wal, +s.last_failed_wal_time, The column names of pg_stat_archiver look not consistent at least to me. What about the followings? archive_count last_archived_wal last_archived_time fail_count last_failed_wal last_failed_time I think that it's time to rename all the variables related to pg_stat_bgwriter. For example, it's better to change PgStat_GlobalStats to PgStat_Bgwriter. I think that it's okay to make this change as separate patch, though. +char last_archived_wal[MAXFNAMELEN];/* last WAL file archived */ +TimestampTz last_archived_wal_timestamp;/* last archival success */ +PgStat_Counter failed_attempts; +char last_failed_wal[MAXFNAMELEN];/* last WAL file involved in failure */ Some hackers don't like the increase of the size of the statsfile. In order to reduce the size as much as possible, we should use the exact size of WAL file here instead of MAXFNAMELEN? Regards, -- Fujii Masao *** a/doc/src/sgml/monitoring.sgml --- b/doc/src/sgml/monitoring.sgml *** *** 270,275 postgres: replaceableuser/ replaceabledatabase/ replaceablehost/ re --- 270,283 /row row + entrystructnamepg_stat_archiver/indextermprimarypg_stat_archiver/primary/indexterm/entry + entryOne row only, showing statistics about the +WAL archiver process's activity. See +xref linkend=pg-stat-archiver-view for details. + /entry + /row + + row entrystructnamepg_stat_bgwriter/indextermprimarypg_stat_bgwriter/primary/indexterm/entry entryOne row only, showing statistics about the background writer process's activity. 
See *** *** 648,653 postgres: replaceableuser/ replaceabledatabase/ replaceablehost/ re --- 656,718 /para /note + table id=pg-stat-archiver-view xreflabel=pg_stat_archiver +titlestructnamepg_stat_archiver/structname View/title + +tgroup cols=3 + thead + row + entryColumn/entry + entryType/entry + entryDescription/entry + /row + /thead + + tbody + row + entrystructfieldarchived_wals//entry + entrytypebigint/type/entry + entryNumber of WAL files that have been successfully archived/entry + /row + row + entrystructfieldlast_archived_wal//entry + entrytypetext/type/entry + entryName of the last WAL file successfully archived/entry + /row + row + entrystructfieldlast_archived_wal_time//entry + entrytypetimestamp with time zone/type/entry + entryTime of the last successful archive operation/entry + /row + row + entrystructfieldfailed_attempts//entry + entrytypebigint/type/entry + entryNumber of failed attempts for archiving WAL files/entry + /row + row + entrystructfieldlast_failed_wal//entry + entrytypetext/type/entry + entryName of the WAL file of the last failed archival operation/entry + /row + row + entrystructfieldlast_failed_wal_time//entry + entrytypetimestamp with time zone/type/entry + entryTime of the last failed archival operation/entry + /row + row + entrystructfieldstats_reset//entry + entrytypetimestamp with time zone/type/entry + entryTime at which these statistics were last reset/entry + /row + /tbody +/tgroup + /table + + para +The structnamepg_stat_archiver/structname view will always have a +single row, containing
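For reference, with the view as defined in this version of the patch, monitoring and resetting would look like this (column names follow the patch as posted, not necessarily whatever is finally committed):

    SELECT archived_wals, last_archived_wal, last_archived_wal_time,
           failed_attempts, last_failed_wal, last_failed_wal_time
    FROM pg_stat_archiver;

    SELECT pg_stat_reset_shared('archiver');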
Re: [HACKERS] Standalone synchronous master
ISTM the consensus is that we need better monitoring/administration interfaces so that people can script the behavior they want in external tools. Also, a new synchronous apply replication mode would be handy, but that'd be a whole different patch. We don't have a patch on the table that we could consider committing any time soon, so I'm going to mark this as rejected in the commitfest app. - Heikki -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] new json funcs
On 01/24/2014 03:40 PM, Laurence Rowe wrote: For consistency with the existing json functions (json_each, json_each_text, etc.) it might be better to add separate json_to_record_text and json_to_recordset_text functions in place of the nested_as_text parameter to json_to_record and json_to_recordset. It wouldn't be consistent with json_populate_record() and json_populate_recordset(), the two closest relatives, however. And yes, I appreciate that we have not been 100% consistent. Community design can be a bit messy that way. cheers andrew -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] new json funcs
On 01/24/2014 12:59 PM, Andrew Dunstan wrote: On 01/24/2014 03:40 PM, Laurence Rowe wrote: For consistency with the existing json functions (json_each, json_each_text, etc.) it might be better to add separate json_to_record_text and json_to_recordset_text functions in place of the nested_as_text parameter to json_to_record and json_to_recordset. It wouldn't be consistent with json_populate_record() and json_populate_recordset(), the two closest relatives, however. And yes, I appreciate that we have not been 100% consistent. Community design can be a bit messy that way. FWIW, I prefer the parameter to having differently named functions. -- Josh Berkus PostgreSQL Experts Inc. http://pgexperts.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
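For concreteness, the parameter form under discussion (adapted from the example in the patch's own docs, with quoting restored) is used like this:

    SELECT *
    FROM json_to_record('{"a":1,"b":[1,2,3],"c":"bar"}', true)
         AS x(a int, b text, d text);

Because nested_as_text is true, the array under "b" comes back coerced to text, and d is NULL since the input object has no such key. The naming question above is exactly whether that second argument should instead be spelled as a separate _text function.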
Re: [HACKERS] Standalone synchronous master
On 01/24/2014 12:47 PM, Heikki Linnakangas wrote: ISTM the consensus is that we need better monitoring/administration interfaces so that people can script the behavior they want in external tools. Also, a new synchronous apply replication mode would be handy, but that'd be a whole different patch. We don't have a patch on the table that we could consider committing any time soon, so I'm going to mark this as rejected in the commitfest app. I don't feel that "we'll never do auto-degrade" is determinative; several hackers were for auto-degrade, and they have a good use-case argument. However, we do have consensus that we need more scaffolding than this patch supplies in order to make auto-degrade *safe*. I encourage the submitter to resubmit an improved version of this patch (one with more monitorability) for 9.5 CF1. That'll give us a whole dev cycle to argue about it. -- Josh Berkus PostgreSQL Experts Inc. http://pgexperts.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] 9.2.1 index-only scans : abnormal heap fetches after VACUUM FULL
On Tue, Dec 3, 2013 at 11:25 AM, Bruce Momjian br...@momjian.us wrote: Is everyone else OK with this approach? Updated patch attached. Hi, I started to look at this patch and i found that it fails an assertion as soon as you run a VACUUM FULL after a lazy VACUUM even if those are on unrelated relations. For example in an assert-enabled build with the regression database run: VACUUM customer; [... insert here whatever commands you like or nothing at all ...] VACUUM FULL tenk1; TRAP: FailedAssertion(!(InRecovery || ( ((void) ((bool) ((! assert_enabled) || ! (!((heapBuf) = NBuffers (heapBuf) = -NLocBuffer)) || (ExceptionalCondition(!((heapBuf) = NBuffers (heapBuf) = -NLocBuffer), (FailedAssertion), visibilitymap.c, 260), 0, (heapBuf) != 0 )), File: visibilitymap.c, Line: 260) LOG: server process (PID 25842) was terminated by signal 6: Aborted DETAIL: Failed process was running: vacuum FULL customer; LOG: terminating any other active server processes trace: (gdb) bt #0 0x7f9a3d00d475 in *__GI_raise (sig=optimized out) at ../nptl/sysdeps/unix/sysv/linux/raise.c:64 #1 0x7f9a3d0106f0 in *__GI_abort () at abort.c:92 #2 0x00777597 in ExceptionalCondition ( conditionName=conditionName@entry=0x7cd3b8 !(InRecovery || ( ((void) ((bool) ((! assert_enabled) || ! (!((heapBuf) = NBuffers (heapBuf) = -NLocBuffer)) || (ExceptionalCondition(\!((heapBuf) = NBuffers (heapBuf) = -NLocBuffer)\, (\Fai..., errorType=errorType@entry=0x7b0730 FailedAssertion, fileName=fileName@entry=0x7cd105 visibilitymap.c, lineNumber=lineNumber@entry=260) at assert.c:54 #3 0x004a7d99 in visibilitymap_set (rel=rel@entry=0x7f9a3da56a00, heapBlk=heapBlk@entry=0, heapBuf=heapBuf@entry=0, recptr=recptr@entry=0, vmBuf=220, cutoff_xid=2) at visibilitymap.c:260 #4 0x004a33e5 in update_page_vm (relation=0x7f9a3da56a00, page=page@entry=0x1868b18 , blkno=0) at rewriteheap.c:702 #5 0x004a3668 in raw_heap_insert (state=state@entry=0x1849e98, tup=tup@entry=0x184f208) at rewriteheap.c:641 #6 0x004a3b8b in rewrite_heap_tuple (state=state@entry=0x1849e98, old_tuple=old_tuple@entry=0x1852a50, new_tuple=new_tuple@entry=0x184f208) at rewriteheap.c:433 #7 0x0055c373 in reform_and_rewrite_tuple (tuple=tuple@entry=0x1852a50, oldTupDesc=oldTupDesc@entry=0x7f9a3da4d350, newTupDesc=newTupDesc@entry=0x7f9a3da599a8, values=values@entry=0x1852f40, isnull=isnull@entry=0x184f920 , newRelHasOids=1 '\001', rwstate=rwstate@entry=0x1849e98) at cluster.c:1670 -- Jaime Casanova www.2ndQuadrant.com Professional PostgreSQL: Soporte 24x7 y capacitación Phone: +593 4 5107566 Cell: +593 987171157 -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Standalone synchronous master
On Jan24, 2014, at 22:29 , Josh Berkus j...@agliodbs.com wrote: On 01/24/2014 12:47 PM, Heikki Linnakangas wrote: ISTM the consensus is that we need better monitoring/administration interfaces so that people can script the behavior they want in external tools. Also, a new synchronous apply replication mode would be handy, but that'd be a whole different patch. We don't have a patch on the table that we could consider committing any time soon, so I'm going to mark this as rejected in the commitfest app. I don't feel that "we'll never do auto-degrade" is determinative; several hackers were for auto-degrade, and they have a good use-case argument. However, we do have consensus that we need more scaffolding than this patch supplies in order to make auto-degrade *safe*. I encourage the submitter to resubmit an improved version of this patch (one with more monitorability) for 9.5 CF1. That'll give us a whole dev cycle to argue about it. There seemed to be at least some support for having a way to manually degrade from sync rep to async rep via something like ALTER SYSTEM SET synchronous_commit='local'; Doing that seems unlikely to meet much resistance on grounds of principle, so it seems to me that working on that would be the best way forward for the submitter. I don't know how hard it would be to pull this off, though. best regards, Florian Pflug -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
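Concretely, the manual degrade-and-restore cycle Florian sketches would be (using 9.4's ALTER SYSTEM plus a reload; synchronous_commit takes effect on reload):

    -- Standby is gone; stop waiting for it:
    ALTER SYSTEM SET synchronous_commit = 'local';
    SELECT pg_reload_conf();

    -- Standby is back and caught up; restore synchronous behavior:
    ALTER SYSTEM SET synchronous_commit = 'on';
    SELECT pg_reload_conf();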
[HACKERS] pg_get_viewdefs() indentation considered harmful
We're finding it more and more common for people to define partitioned table views with hundreds or thousands of union branches. pg_get_viewdefs indents each branch of the union by 8 spaces more than the previous branch. The net effect is that the size of the output is O(n^2) bytes just for the indentation spaces. If your union branches are about 5 lines long then you hit MaxAlloc after about 5,000 branches. And incidentally there's no CHECK_FOR_INTERRUPTS in that loop. I would actually suggest pg_dump should be able to pass a flag to disable indentation but then I noticed that all the code is already there. It's just that every single variation of pg_get_viewdef sets the indentation flag explicitly. I wonder if this was the intended behaviour. Shouldn't pg_get_viewdef(view, false) set indentation off? -- greg -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
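To put numbers to the quadratic growth: branch i is indented about 8i spaces, so with branches of roughly 5 lines each the indentation alone costs about 40i bytes per branch. Summing over 5,000 branches gives 40 * (1 + 2 + ... + 5000) bytes, roughly 500 MB; with the actual query text on top, that plausibly lands at the 1 GB MaxAlloc ceiling Greg describes.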
Re: [HACKERS] pg_get_viewdefs() indentation considered harmful
Greg Stark st...@mit.edu writes: We're finding it more and more common for people to define partitioned table views with hundreds or thousands of union branches. Really? Given how poorly the system performs with that many inheritance children, I've got a hard time believing either that this is common or that ruleutils is your worst problem with it. pg_get_viewdefs indents each branch of the union by 8 spaces more than the previous branch. I think that's because the unions are a nested binary tree so far as the parsetree representation goes. We could probably teach ruleutils to flatten the display in common cases, but it might be a bit tricky to be sure we don't create any lies about unusual nestings. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
[HACKERS] Recovery inconsistencies, standby much larger than primary
Since the point release we've run into a number of databases that, when we restore from a base backup, end up being larger than the primary database was - sometimes by a large factor. The data below is from 9.1.11 (both primary and standby) but we've seen the same thing on 9.2.6. primary$ for i in 1261982 1364767 1366221 473158 ; do echo -n $i ; du -shc $i* | tail -1 ; done 1261982 29G total 1364767 23G total 1366221 12G total 473158 76G total standby$ for i in 1261982 1364767 1366221 473158 ; do echo -n $i ; du -shc $i* | tail -1 ; done 1261982 55G total 1364767 28G total 1366221 17G total 473158 139G total I've run the snaga xlogdump on the WAL records played before reaching a consistent point (we confirmed the extra storage had already appeared by then) and grepped for the above relfilenodes, but the dumps are quite large. I believe they don't contain any sensitive data; when I've verified that I can upload one of them for inspection. $ ls -lh [14]* -rw-rw-r-- 1 heroku heroku 325M Jan 24 04:13 1261982 -rw-r--r-- 1 root root 352M Jan 25 00:04 1364767 -rw-r--r-- 1 root root 123M Jan 25 00:04 1366221 -rw-r--r-- 1 root root 357M Jan 25 00:04 473158 The first three are btrees and the fourth is a heap, btw. We're also seeing log entries about "WAL contains references to invalid pages", but these errors seem only vaguely correlated. Sometimes we get the errors but the tables don't grow noticeably, and sometimes we don't get the errors and the tables are much larger. Much of the added space is uninitialized pages, as you might expect, but what I don't understand is how the database can start up without running into the reference-to-invalid-pages panic consistently. We check both that there are no references after consistency is reached *and* that any references before consistency are resolved by a truncate or unlink before consistency. The primary was never this large btw, so it's not just a case of leftover files from drops or truncates that might have failed on the standby. I'm assuming this is somehow related to the multixact or transaction wraparound problems, but I don't really understand how they could be hitting when both the primary and standby are post-upgrade to the most recent point releases, which have the fixes -- greg -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
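When chasing this sort of divergence, it also helps to map the relfilenodes back to relations and compare sizes from SQL on both nodes (run the same query on primary and standby and diff the output):

    SELECT oid::regclass AS rel, relkind,
           pg_relation_size(oid) AS main_fork_bytes
    FROM pg_class
    WHERE relfilenode IN (1261982, 1364767, 1366221, 473158);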
Re: [HACKERS] pg_get_viewdefs() indentation considered harmful
On Fri, Jan 24, 2014 at 6:54 PM, Tom Lane t...@sss.pgh.pa.us wrote: pg_get_viewdefs indents each branch of the union by 8 spaces more than the previous branch. I think that's because the unions are a nested binary tree so far as the parsetree representation goes. We could probably teach ruleutils to flatten the display in common cases, but it might be a bit tricky to be sure we don't create any lies about unusual nestings. That may be a worthwhile thing to do. But it strikes me that pg_dump, at least when not doing an SQL dump, really has no reason to ask for indentation at all. It's already asking for non-prettyprinted output, why not make non-prettyprinted also mean non-indented? Also, it also seems like a reasonable safeguard that if the underlying rule is larger than, say, 32kB or maybe 128kB you probably don't care about indentation. That would cover all the system views except a handful that seem to have peculiarly large pg_rewrite rules considering their SQL definition isn't especially large: daeqck898dvduj= select ev_class::regclass, length(ev_action) rewrite_len,length(pg_get_viewdef(ev_class,true)) prettyprint_len, length(pg_get_viewdef(ev_class,false)) non_prettyprint_len from pg_rewrite order by 2 desc limit 20; ev_class | rewrite_len | prettyprint_len | non_prettyprint_len +-+-+- pg_seclabels | 138565 | 13528 | 13758 information_schema.columns | 126787 | 6401 |6642 information_schema.usage_privileges| 93863 | 11653 | 11939 information_schema.attributes | 83268 | 4264 |4417 information_schema.referential_constraints | 81762 | 2527 |2653 pg_statio_all_tables | 59617 | 1023 |1061 pg_stats | 59331 | 2803 |2899 information_schema.domains | 54611 | 3206 |3320 information_schema.element_types | 53049 | 5608 |5762 information_schema.routines| 52333 | 7852 |8089 information_schema.column_privileges | 49229 | 3883 |3954 pg_indexes | 46717 | 458 | 486 information_schema.check_constraints | 42132 | 1375 |1443 information_schema.tables | 37458 | 2122 |2212 pg_stat_all_indexes| 35426 | 508 | 528 pg_statio_all_indexes | 35412 | 490 | 512 information_schema.table_constraints | 31882 | 3098 |3231 information_schema.column_udt_usage| 31731 | 1034 |1090 information_schema.parameters | 30497 | 3640 |3750 pg_stat_all_tables | 27193 | 1367 |1387 (20 rows) -- greg -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] pg_get_viewdefs() indentation considered harmful
Argh. Attached is a plain text file with that query data. I'll be switching back to Gnus any day now.

daeqck898dvduj=> select ev_class::regclass, length(ev_action) rewrite_len, length(pg_get_viewdef(ev_class,true)) prettyprint_len, length(pg_get_viewdef(ev_class,false)) non_prettyprint_len from pg_rewrite order by 2 desc limit 20;

                  ev_class                  | rewrite_len | prettyprint_len | non_prettyprint_len
--------------------------------------------+-------------+-----------------+---------------------
 pg_seclabels                               |      138565 |           13528 |               13758
 information_schema.columns                 |      126787 |            6401 |                6642
 information_schema.usage_privileges        |       93863 |           11653 |               11939
 information_schema.attributes              |       83268 |            4264 |                4417
 information_schema.referential_constraints |       81762 |            2527 |                2653
 pg_statio_all_tables                       |       59617 |            1023 |                1061
 pg_stats                                   |       59331 |            2803 |                2899
 information_schema.domains                 |       54611 |            3206 |                3320
 information_schema.element_types           |       53049 |            5608 |                5762
 information_schema.routines                |       52333 |            7852 |                8089
 information_schema.column_privileges       |       49229 |            3883 |                3954
 pg_indexes                                 |       46717 |             458 |                 486
 information_schema.check_constraints       |       42132 |            1375 |                1443
 information_schema.tables                  |       37458 |            2122 |                2212
 pg_stat_all_indexes                        |       35426 |             508 |                 528
 pg_statio_all_indexes                      |       35412 |             490 |                 512
 information_schema.table_constraints       |       31882 |            3098 |                3231
 information_schema.column_udt_usage        |       31731 |            1034 |                1090
 information_schema.parameters              |       30497 |            3640 |                3750
 pg_stat_all_tables                         |       27193 |            1367 |                1387
(20 rows)

-- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] pg_get_viewdefs() indentation considered harmful
Greg Stark st...@mit.edu writes: But it strikes me that pg_dump, at least when not doing an SQL dump, really has no reason to ask for indentation at all. It's already asking for non-prettyprinted output, why not make non-prettyprinted also mean non-indented? We do go to some effort to make pg_dump's output legible, so I'm not especially on board with this idea. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] [review] libpq: Support TLSv1.1+ (was: fe-secure.c and SSL/TLS)
[restored Cc: list] On Thu, Jan 09, 2014 at 10:12:28PM -0800, Wim Lewis wrote: I applied both libpq.tls11plus.diff and the related psql.conninfo.tlsver.diff patch to postgresql git head. psql.conninfo.tlsver.diff is not so essential for debugging purposes since commit 4cba1f6bbf7c8f956c95e72c43e517a56b97665b, but it still seems like a perfectly reasonable addition. Source review: The source changes are pretty tiny. Although I think the change from TLSv1_method to SSLv23_method is correct, the comment is not quite correct:

 * SSLv23_method() is only method that negotiates
 * higher protocol versions. Rest of the methods
 * allow only one specific TLS version.

As I understand it (backed up by a quick glance through the openssl source), the TLSv1_method, TLSv1_1_method, and TLSv1_2_method will all advertise the corresponding protocol version to the peer, meaning that in practice they will negotiate *up to* that TLS version, but will still negotiate down to SSLv3. So, using TLSv1_2_method would give the right behavior when compiled against a recent openssl. However, someday when TLSv1.3 (or 2.0) appears, presumably the SSLv23_method will be extended to include it but TLSv1_2_method would have to be changed to TLSv1_3_method. Therefore using SSLv23_method and disabling older protocol versions with SSL_CTX_set_options() should have the desired behavior even in future versions. (And it doesn't require autoconf to probe the openssl version.) With OpenSSL 1.0.1f, I see results consistent with the comment. Using TLSv1_1_method() gets a TLSv1.1 connection against a server capable of TLSv1.2, and it fails against a server capable of no more than TLSv1. The patch's use of SSLv23_method() gets TLSv1.2 from the first server and TLSv1 from the second server. Testing: I built the patched postgresql against a handful of openssl versions: 1.0.1 (netbsd, x86-64, supports TLSv1.1); Git head aka 1.0.1f++ (osx, x86-32, supports TLSv1.2), and 0.9.8y (osx, x86-32, supports TLSv1.0). They all built cleanly and passed 'make check'. I also built 'contrib' and installed the sslinfo extension. I connected between each pair of versions (with psql) and saw that the connection negotiated the highest protocol version supported by both ends and a corresponding ciphersuite. /conninfo and the sslinfo extension agreed on the protocol version and ciphersuite in use. Things I didn't test: Client certificates, restricted sets of ciphersuites, MITM protocol-downgrade attacks, non-x86 architectures, or 1.0.0* versions of openssl. The level of testing you did made for an excellent review. Thank you. I've committed both patches. -- Noah Misch EnterpriseDB http://www.enterprisedb.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
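[Editorial note: to make the negotiation strategy concrete, here is a minimal sketch (not code from the patch) of creating a context with SSLv23_method() and then switching off the legacy protocols, against the OpenSSL 1.0.x API.]

#include <openssl/ssl.h>

/*
 * Sketch: let OpenSSL negotiate the highest protocol version both ends
 * support, while refusing SSLv2/SSLv3.  Because SSLv23_method()
 * advertises everything the library knows, a future TLS version is
 * picked up automatically, with no configure-time probing of the
 * OpenSSL version.
 */
static SSL_CTX *
make_tls_context(void)
{
	SSL_CTX    *ctx = SSL_CTX_new(SSLv23_method());

	if (ctx == NULL)
		return NULL;

	/* never negotiate down to the legacy SSL protocols */
	SSL_CTX_set_options(ctx, SSL_OP_NO_SSLv2 | SSL_OP_NO_SSLv3);

	return ctx;
}

With this approach, the first test server above negotiates TLSv1.2 and the second TLSv1, matching the behavior observed in the review; pinning TLSv1_1_method() instead would both cap the maximum version and fail against the TLSv1-only server.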
Re: [HACKERS] Changeset Extraction v7.1
On Fri, Jan 24, 2014 at 12:49 PM, Andres Freund and...@2ndquadrant.com wrote: But this code is riddled with places where you track a catalog xmin and a data xmin separately. The only point of doing it that way is to support a division that hasn't been made yet. If you think it will make stuff more manageable I can try separating all lines dealing with catalog_xmin into another patch as long as data_xmin doesn't have to be renamed. That said, I don't really think it's a big problem that the division hasn't been made, essentially the meaning is different, even if we don't take advantage of it yet. data_xmin is there for streaming replication where we need to prevent all removal, catalog_xmin is there for changeset extraction. I spent some more time studying the 0001 and 0002 patches this afternoon, with a side dish of 0004. I'm leaning toward thinking we should go ahead and make that division. I'm also wondering about whether we've got the right naming here. AFAICT, it's not the case that we're going to use the catalog xmin for catalogs and the data xmin for non-catalogs. Rather, the catalog xmin is going to always be included in globalxmin calculations, so IOW it should just be called xmin. The data xmin is going to be included only for non-catalog tables. I guess data is a reasonable antonym for catalog, but I'm slightly tempted to propose RecentGlobalNonCatalogXmin and similar. Maybe that's too ugly to live, but I can see someone failing to guess what the exact distinction is between xmin and data xmin, and I bet they'd be a lot less likely to misguess if we wrote non catalog. It's interesting (though not immediately relevant) to speculate about how we might extend this to fine-grained xmin tracking more generally. The design sketch that comes to mind (and I think parts of this have been proposed before) is to have a means by which backends can promise not to lock any more tables except under a new snapshot. At the read committed isolation level, or in any single-statement transaction, backends can so promise whenever (a) all tables mentioned in the query have been locked and (b) all functions to be invoked during the query via the fmgr interface promise (e.g. via function labeling) that they won't directly or indirectly do such a thing. If they break their promise, we detect it and ereport(ERROR). Backends that have made such a guarantee can be ignored for global-xmin calculations that don't involve the tables they have locked. One idea is to keep a hash table keyed by dboid, reloid with some limited number of entries in shared memory; it caches the table-specific xmin, a usage counter, and a flag indicating whether the cached xmin might be stale. In order to promise not to lock any new tables, backends must make or update entries for all the tables they already have locked in this hash table; if there aren't enough entries, they're not allowed to promise. Thus, backends wishing to prune can use the cached xmin value if it's present (optionally updating it if it's stale) and the minimum of the xmins of the backends that haven't made a promise if it isn't. This is a bit hairy though; access to the shared hash table had better be *really* fast, and we'd better not need to recompute the cached value too often. Anyway, whether we end up pursuing something like that or not, I think I'm persuaded that this particular optimization won't really be a problem for hypothetical future work in this area; and also that it's a good idea to do this much now specifically for logical decoding. 
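[Editorial note: to give the speculative per-table xmin cache above a concrete shape, here is one possible layout for a shared-memory entry; every name in this sketch is invented for illustration and none of it is from the patch.]

#include <stdbool.h>
#include <stdint.h>

typedef uint32_t Oid;			/* as in PostgreSQL */
typedef uint32_t TransactionId;	/* as in PostgreSQL */

/* Hash key: one entry per (database, relation) pair. */
typedef struct RelXminKey
{
	Oid			dboid;
	Oid			reloid;
} RelXminKey;

/*
 * One shared-memory cache entry.  Backends that promise not to lock
 * further tables publish a per-table xmin here; pruning on a table can
 * then combine the cached xmin with the xmins of only those backends
 * that made no promise.
 */
typedef struct RelXminEntry
{
	RelXminKey	key;
	TransactionId cached_xmin;	/* oldest xmin still needed for this table */
	uint32_t	usage_count;	/* eviction heuristic when the table is full */
	bool		stale;			/* must be recomputed before trusting? */
} RelXminEntry;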
I have zero confidence that it's OK to treat fsync() as an operation that won't fail. Linux documents EIO as a plausible error return, for example. (And really, how could it not?) But quite fundamentally, having the most basic persistency building block fail is something you can't really handle safely. Note that issue_xlog_fsync() has always treated that as a PANIC (and, I wager, always will). I don't recall many complaints about that for WAL. All of the ones I found in a quick search were like "oh, the fs invalidated my fd because of corruption", and they were few. Well, you have a point. And certainly this version looks much better than the previous version in terms of the likelihood of PANIC occurring in practice. But I wonder if we couldn't cut it down even further without too much effort. Suppose we define a slot to exist if, and only if, the state file exists. A directory without a state file is subject to arbitrary removal. Then we can proceed as follows:

- mkdir() the directory.
- open() state.tmp
- write() state.tmp
- close() state.tmp
- fsync() parent directory, directory and state.tmp
- rename() state.tmp to state
- fsync() state

If any step except the last one fails, no problem. The next startup can nuke the leftovers; any future attempt to create a slot with the same name can ignore an EEXIST failure from mkdir() and proceed just as above. Only a failure of the very last fsync is really problematic.
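[Editorial note: a minimal sketch of that sequence in portable C; helper names like fsync_path() and create_slot() are invented here for illustration and are not from the patch.]

#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/* Open a file or directory and fsync it; -1 on any failure. */
static int
fsync_path(const char *path)
{
	int		fd = open(path, O_RDONLY);
	int		rc;

	if (fd < 0)
		return -1;
	rc = fsync(fd);
	close(fd);
	return rc;
}

/*
 * Durably create "<dir>/state" from "<dir>/state.tmp".  Any failure
 * before the final fsync leaves only removable leftovers, because the
 * slot does not "exist" until the state file does.
 */
static int
create_slot(const char *parent, const char *dir,
			const char *tmppath, const char *statepath,
			const void *buf, size_t len)
{
	int		fd;

	/* EEXIST just means a removable leftover directory: reuse it */
	if (mkdir(dir, S_IRWXU) < 0 && errno != EEXIST)
		return -1;

	fd = open(tmppath, O_WRONLY | O_CREAT | O_TRUNC, S_IRUSR | S_IWUSR);
	if (fd < 0)
		return -1;
	if (write(fd, buf, len) != (ssize_t) len || close(fd) < 0)
		return -1;

	/* flush the parent directory, the slot directory, and state.tmp */
	if (fsync_path(parent) < 0 || fsync_path(dir) < 0 ||
		fsync_path(tmppath) < 0)
		return -1;

	/* rename() is the commit point: the slot now exists */
	if (rename(tmppath, statepath) < 0)
		return -1;

	/* only a failure here is truly hard to recover from */
	return fsync_path(statepath);
}

Everything up to rename() can fail with no consequence beyond retrying; only the final flush leaves the slot's durability in doubt, which is the one case where a PANIC still seems justified.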
Re: [HACKERS] pg_get_viewdefs() indentation considered harmful
On Fri, Jan 24, 2014 at 6:54 PM, Tom Lane t...@sss.pgh.pa.us wrote: Greg Stark st...@mit.edu writes: We're finding it more and more common for people to define partitioned table views with hundreds or thousands of union branches. Really? Given how poorly the system performs with that many inheritance children, I've got a hard time believing either that this is common or that ruleutils is your worst problem with it. Doesn't make it a bad idea to fix it. You may need hundreds or thousands of union branches for this to totally break the world, but you only need about five for it to be annoying. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] pg_get_viewdefs() indentation considered harmful
On Fri, Jan 24, 2014 at 8:49 PM, Robert Haas robertmh...@gmail.com wrote: On Fri, Jan 24, 2014 at 6:54 PM, Tom Lane t...@sss.pgh.pa.us wrote: Greg Stark st...@mit.edu writes: We're finding it more and more common for people to define partitioned table views with hundreds or thousands of union branches. Really? Given how poorly the system performs with that many inheritance children, I've got a hard time believing either that this is common or that ruleutils is your worst problem with it. Doesn't make it a bad idea to fix it. You may need hundreds or thousands of union branches for this to totally break the world, but you only need about five for it to be annoying. Indeed, even aside from the performance questions, once you're indented 5-10 times the indentation stops being useful at all. The query would probably be even more readable if we just made the indentation modulo 40, so that once you get too far indented it wraps around, which is not unlike how humans actually indent things in this case. -- greg -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Minmax indexes
On Fri, Jan 24, 2014 at 12:58 PM, Claudio Freire klaussfre...@gmail.com wrote: What's the status? I believe I have more than a use for minmax indexes, and wouldn't mind lending a hand if it's within my grasp. I'm also interested in looking at this. Mostly because I have ideas for other summary functions that would be interesting and could use the same infrastructure otherwise. -- greg -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] 9.2.1 index-only scans : abnormal heap fetches after VACUUM FULL
On Fri, Jan 24, 2014 at 04:52:55PM -0500, Jaime Casanova wrote: On Tue, Dec 3, 2013 at 11:25 AM, Bruce Momjian br...@momjian.us wrote: Is everyone else OK with this approach? Updated patch attached. Hi, I started to look at this patch and I found that it fails an assertion as soon as you run a VACUUM FULL after a lazy VACUUM, even if those are on unrelated relations. For example, in an assert-enabled build with the regression database, run:

VACUUM customer;
[... insert here whatever commands you like or nothing at all ...]
VACUUM FULL customer;

Wow, thanks for the testing. You are right that I had two bugs, both in visibilitymap_set(). First, I made setting heapBuf optional, but forgot to remove the Assert check now that it was optional. Second, I passed InvalidBlockNumber instead of InvalidBuffer when calling visibilitymap_set(). Both are fixed in the attached patch. Thanks for the report. -- Bruce Momjian br...@momjian.us http://momjian.us EnterpriseDB http://enterprisedb.com + Everyone has their own god. +

diff --git a/src/backend/access/heap/rewriteheap.c b/src/backend/access/heap/rewriteheap.c
new file mode 100644
index c34ab98..ff275fa
*** a/src/backend/access/heap/rewriteheap.c
--- b/src/backend/access/heap/rewriteheap.c
***************
*** 107,112 ****
--- 107,114 ----
  #include "access/rewriteheap.h"
  #include "access/transam.h"
  #include "access/tuptoaster.h"
+ #include "access/visibilitymap.h"
+ #include "commands/vacuum.h"
  #include "storage/bufmgr.h"
  #include "storage/smgr.h"
  #include "utils/memutils.h"
*************** typedef OldToNewMappingData *OldToNewMap
*** 172,177 ****
--- 174,180 ----
  /* prototypes for internal functions */
  static void raw_heap_insert(RewriteState state, HeapTuple tup);
+ static void update_page_vm(Relation relation, Page page, BlockNumber blkno);
  /*
*************** end_heap_rewrite(RewriteState state)
*** 281,286 ****
--- 284,290 ----
  				   true);
  	RelationOpenSmgr(state->rs_new_rel);
+ 	update_page_vm(state->rs_new_rel, state->rs_buffer, state->rs_blockno);
  	PageSetChecksumInplace(state->rs_buffer, state->rs_blockno);
  	smgrextend(state->rs_new_rel->rd_smgr, MAIN_FORKNUM, state->rs_blockno,
*************** raw_heap_insert(RewriteState state, Heap
*** 633,638 ****
--- 637,643 ----
  	 */
  	RelationOpenSmgr(state->rs_new_rel);
+ 	update_page_vm(state->rs_new_rel, page, state->rs_blockno);
  	PageSetChecksumInplace(page, state->rs_blockno);
  	smgrextend(state->rs_new_rel->rd_smgr, MAIN_FORKNUM,
*************** raw_heap_insert(RewriteState state, Heap
*** 678,680 ****
--- 683,705 ----
  	if (heaptup != tup)
  		heap_freetuple(heaptup);
  }
+
+ static void
+ update_page_vm(Relation relation, Page page, BlockNumber blkno)
+ {
+ 	Buffer		vmbuffer = InvalidBuffer;
+ 	TransactionId visibility_cutoff_xid;
+
+ 	visibilitymap_pin(relation, blkno, &vmbuffer);
+ 	Assert(BufferIsValid(vmbuffer));
+
+ 	if (!visibilitymap_test(relation, blkno, &vmbuffer) &&
+ 		heap_page_is_all_visible(relation, InvalidBuffer, page,
+ 								 &visibility_cutoff_xid))
+ 	{
+ 		PageSetAllVisible(page);
+ 		visibilitymap_set(relation, blkno, InvalidBuffer,
+ 						  InvalidXLogRecPtr, vmbuffer, visibility_cutoff_xid);
+ 	}
+ 	ReleaseBuffer(vmbuffer);
+ }
diff --git a/src/backend/access/heap/visibilitymap.c b/src/backend/access/heap/visibilitymap.c
new file mode 100644
index 899ffac..3e7546b
*** a/src/backend/access/heap/visibilitymap.c
--- b/src/backend/access/heap/visibilitymap.c
*************** visibilitymap_set(Relation rel, BlockNum
*** 257,263 ****
  #endif
  	Assert(InRecovery || XLogRecPtrIsInvalid(recptr));
- 	Assert(InRecovery || BufferIsValid(heapBuf));
  	/* Check that we have the right heap page pinned, if present */
  	if (BufferIsValid(heapBuf) && BufferGetBlockNumber(heapBuf) != heapBlk)
--- 257,262 ----
*************** visibilitymap_set(Relation rel, BlockNum
*** 278,284 ****
  		map[mapByte] |= (1 << mapBit);
  		MarkBufferDirty(vmBuf);
! 		if (RelationNeedsWAL(rel))
  		{
  			if (XLogRecPtrIsInvalid(recptr))
  			{
--- 277,283 ----
  		map[mapByte] |= (1 << mapBit);
  		MarkBufferDirty(vmBuf);
! 		if (RelationNeedsWAL(rel) && BufferIsValid(heapBuf))
  		{
  			if (XLogRecPtrIsInvalid(recptr))
  			{
diff --git a/src/backend/commands/vacuumlazy.c b/src/backend/commands/vacuumlazy.c
new file mode 100644
index 75e5f15..8d84bef
*** a/src/backend/commands/vacuumlazy.c
--- b/src/backend/commands/vacuumlazy.c
*************** static void lazy_record_dead_tuple(LVRel
*** 152,159 ****
  					ItemPointer itemptr);
  static bool lazy_tid_reaped(ItemPointer itemptr, void *state);
  static int	vac_cmp_itemptr(const void *left, const void *right);
- static bool heap_page_is_all_visible(Relation rel, Buffer buf,
- 					   TransactionId *visibility_cutoff_xid);
  /*
--- 152,157 ----
*************** lazy_vacuum_page(Relation onerel, BlockN
*** 1232,1238 ****
  	 * check if the page has become
Re: [HACKERS] extension_control_path
On Fri, Jan 24, 2014 at 6:57 AM, Fabrízio de Royes Mello fabriziome...@gmail.com wrote: Would be nice if we can use git apply command... git apply seems to have raised pedantry to an art form. Not only won't it apply patches in any format other than the one it likes, it'll fail to apply any part of the patch if there are any failing hunks; I don't think it tolerates fuzz, either. You can override some of these behaviors but not all of them. It seems like somebody designed this tool more with the idea of preventing people from applying patches than actually doing it. patch, on the other hand, makes the very reasonable assumption that if you didn't want to apply the patch, you wouldn't have run the patch command in the first place. It does its best to make sense of whatever you feed it, and if it can't apply the whole thing, it still applies as much as it can. I find this much more desirable behavior. It may be the policy of other projects to reject patches for trivial formatting mistakes or minor fuzz, but it's not the policy here, and I think that's a good thing. We typically bounce things for rebasing if there are actual rejects, but not otherwise. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] pg_get_viewdefs() indentation considered harmful
On Fri, Jan 24, 2014 at 8:53 PM, Greg Stark st...@mit.edu wrote: On Fri, Jan 24, 2014 at 8:49 PM, Robert Haas robertmh...@gmail.com wrote: On Fri, Jan 24, 2014 at 6:54 PM, Tom Lane t...@sss.pgh.pa.us wrote: Greg Stark st...@mit.edu writes: We're finding it more and more common for people to define partitioned table views with hundreds or thousands of union branches. Really? Given how poorly the system performs with that many inheritance children, I've got a hard time believing either that this is common or that ruleutils is your worst problem with it. Doesn't make it a bad idea to fix it. You may need hundreds or thousands of union branches for this to totally break the world, but you only need about five for it to be annoying. Indeed even aside from the performance questions, once you're indented 5-10 times the indention stops being useful at all. The query would probably be even more readable if we just made indentation modulo 40 so once you get too far indented it wraps around which is not unlike how humans actually indent things in this case. Ha! That seems a little crazy, but *capping* the indentation at some reasonable value might not be dumb. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
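[Editorial note: a sketch of what such a cap could look like; pretty_indent_width() and PRETTYINDENT_MAX are invented for illustration, while PRETTYINDENT_STD is the per-level step ruleutils.c really uses.]

#define PRETTYINDENT_STD	8	/* existing per-level step in ruleutils.c */
#define PRETTYINDENT_MAX	40	/* hypothetical cap on total indentation */

/*
 * Sketch: convert nesting depth into a printed indentation width,
 * clamped so that deeply nested UNION branches stop marching off the
 * right margin of the screen.
 */
static int
pretty_indent_width(int depth)
{
	int			width = depth * PRETTYINDENT_STD;

	return (width > PRETTYINDENT_MAX) ? PRETTYINDENT_MAX : width;
}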
Re: [HACKERS] [Review] inherit support for foreign tables
Hi Fujita-san, Thanks for the review. 2014/1/23 Etsuro Fujita fujita.ets...@lab.ntt.co.jp: Shigeru Hanada wrote: Though this is debatable, in the current implementation constraints defined on a foreign table (currently only NOT NULL and CHECK are supported) are not enforced during INSERT or UPDATE executed against foreign tables. This means that retrieved data might violate the constraints defined on the local side. This is a debatable issue because data integrity is important for a DBMS, but in this first cut it is just documented as a note. I agree with you, but shouldn't we introduce a new keyword such as ASSERTIVE or something in 9.4, in preparation for the enforcement of constraints on the local side in a future release? What I'm concerned about is that, otherwise, users will have to rewrite those DDL queries at that point. No? In the original thread, whether the new syntax is necessary (though maybe ASSERTIVE will not be the keyword used) is under discussion. Anyway, your point, that users should have to rewrite as little DDL as possible, is important. Could you post your review results and/or opinion as a reply to the original thread, so that other people interested in this feature can share the discussion? Because I don't see a practical case for having a foreign table as a parent, and because it avoids a problem with recursive ALTER TABLE operations, foreign tables can't be parents. I agree with you on that point. For other recursively processed commands such as ANALYZE, foreign tables at the leaves of the inheritance tree are ignored. I'm not sure that in the processing of the ANALYZE command we should ignore child foreign tables. It seems to me that the recursive processing is not that hard. Rather, what I'm concerned about is that if the recursive processing is allowed, then autovacuum will probably have to access foreign tables on the far side in order to acquire those sample rows. It might be undesirable from the viewpoint of security or from the viewpoint of efficiency. As you say, it's not difficult to do recursive ANALYZE including foreign tables. The reason ANALYZE ignores descendant foreign tables is consistency of behavior. At the moment, ANALYZE ignores top-level (non-child) foreign tables when no table name is given, and if we want to ANALYZE foreign tables we need to specify them explicitly. I think it's not so bad to show a WARNING or INFO message about foreign tables being ignored, including ones that are not children.

--- a/doc/src/sgml/ref/create_foreign_table.sgml
+++ b/doc/src/sgml/ref/create_foreign_table.sgml
@@ -22,6 +22,7 @@ CREATE FOREIGN TABLE [ IF NOT EXISTS ] <replaceable class="PARAMETER">table_name</replaceable> ( [
   <replaceable class="PARAMETER">column_name</replaceable> <replaceable class="PARAMETER">data_type</replaceable> [ OPTIONS ( <replaceable class="PARAMETER">option</replaceable> '<replaceable class="PARAMETER">value</replaceable>' [, ... ] ) ] [ COLLATE <replaceable>collation</replaceable> ] [ <replaceable class="PARAMETER">column_constraint</replaceable> [ ... ] ] [, ... ] ] )
+[ INHERITS ( <replaceable>parent_table</replaceable> [, ... ] ) ]
   SERVER <replaceable class="parameter">server_name</replaceable>
 [ OPTIONS ( <replaceable class="PARAMETER">option</replaceable> '<replaceable class="PARAMETER">value</replaceable>' [, ... ] ) ]
@@ -159,6 +160,17 @@ CREATE FOREIGN TABLE [ IF NOT EXISTS ] <replaceable class="PARAMETER">table_name</replaceable>
  </varlistentry>

  <varlistentry>
+  <term><replaceable class="PARAMETER">parent_table</replaceable></term>
+  <listitem>
+   <para>
+    The name of an existing table or foreign table from which the new foreign
+    table automatically inherits all columns, see
+    <xref linkend="ddl-inherit"> for details of table inheritance.
+   </para>
+  </listitem>
+ </varlistentry>

Correct? I think we should not allow a foreign table to be a parent. Oops, good catch. -- Shigeru HANADA -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Change authentication error message (patch)
On Fri, Jan 24, 2014 at 10:10:00AM -0500, Tom Lane wrote: Bruce Momjian br...@momjian.us writes: On Thu, Jan 23, 2014 at 10:39:34PM -0500, Tom Lane wrote: I'm not convinced that this improves anything. The problem might not in fact be either of the things you mention, in which case the new message is outright misleading. Also, what of the policy stated in the header comment for the function you're hacking, ie we intentionally don't reveal the precise cause of the failure to the client? Well, the only solution then would be to add some weasel words like perhaps expired password, but that seems so rare that I doubt it would apply very often and seems like an odd suggestion. We could go with: password authentication failed for user "%s": perhaps invalid or expired password We did have two threads on this issue in the past 12 months so I figured we should try to do something. I agree with doing *something*, but this particular thing seems to violate our very long-standing policy on how to deal with authentication failures, as well as being too vague to be really useful. What would be well within that policy is to log additional information into the postmaster log. I see that md5_crypt_verify knows perfectly well whether the problem is no password set, wrong password, or expired password. I don't see anything wrong with having it emit a log entry --- maybe not in the second case for fear of log-spam complaints, but certainly the third case and maybe the first one. Or possibly cleaner, have it return additional status so that auth_failed() can include the info in the main ereport using errdetail_log(). I was afraid that PGOPTIONS='-c client_min_messages=log' would allow clients to see the log messages, but in testing I found we don't show them during authentication, and I found this C comment:

 * client_min_messages is honored only after we complete the
 * authentication handshake.  This is required both for security
 * reasons and because many clients can't handle NOTICE messages
 * during authentication.

I like the 'LOG' idea very much, and liked your patch too. :-) -- Bruce Momjian br...@momjian.us http://momjian.us EnterpriseDB http://enterprisedb.com + Everyone has their own god. + -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
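[Editorial note: for illustration, a sketch of how auth_failed() might pass the extra information to the server log only; report_auth_failure() and the logdetail plumbing are hypothetical, but errdetail_log() is the real log-only detail variant.]

#include "postgres.h"
#include "utils/elog.h"

/*
 * Sketch: report an authentication failure to the client with the
 * usual vague message, while recording the specific cause (no password
 * set, wrong password, expired password) in the server log only.
 */
static void
report_auth_failure(const char *user_name, const char *logdetail)
{
	ereport(FATAL,
			(errcode(ERRCODE_INVALID_PASSWORD),
			 errmsg("password authentication failed for user \"%s\"",
					user_name),
			 /* errdetail_log() text never reaches the client */
			 logdetail ? errdetail_log("%s", logdetail) : 0));
}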
Re: [HACKERS] Why do we let autovacuum give up?
On Thu, Jan 23, 2014 at 7:45 PM, Tom Lane t...@sss.pgh.pa.us wrote: Andres Freund and...@2ndquadrant.com writes: On 2014-01-23 19:29:23 -0500, Tom Lane wrote: I concur with the other reports that the main problem in this test case is just that the default cost delay settings throttle autovacuum so hard that it has no chance of keeping up. If I reduce autovacuum_vacuum_cost_delay from the default 20ms to 2ms, it seems to keep up quite nicely, on my machine anyway. Probably other combinations of changes would do it too. Perhaps we need to back off the default cost delay settings a bit? We've certainly heard more than enough reports of table bloat in heavily-updated tables. A system that wasn't hitting the updates as hard as it could might not need this, but on the other hand it probably wouldn't miss the I/O cycles from a more aggressive autovacuum, either. Yes, I think adjusting the default makes sense, most setups that have enough activity that costing plays a role have to greatly increase the values. I'd rather increase the cost limit than reduce cost delay so drastically though, but that's admittedly just gut feeling. Well, I didn't experiment with intermediate values, I was just trying to test the theory that autovac could keep up given less-extreme throttling. I'm not taking any position on just where we need to set the values, only that what we've got is probably too extreme. So, Greg Smith proposed what I think is a very useful methodology for assessing settings in this area: figure out what it works out to in MB/s. If we assume we're going to read and dirty every page we vacuum, and that this will take negligible time of itself so that the work is dominated by the sleeps, the default settings work out to 200/(10 + 20) pages every 20ms, or 2.67MB/s. Obviously, the rate will be 3x higher if the pages don't need to be dirtied, and higher still if they're all in cache, but considering the way the visibility map works, it seems like a good bet that we WILL need to dirty most of the pages that we look at - either they've got dead tuples and need clean-up, or they don't and need to be marked all-visible. A corollary of this is that if you're dirtying heap pages faster than a few megabytes per second, autovacuum, at least with default settings, is not going to keep up. And if you assume that each write transaction dirties at least one heap page, any volume of write transactions in excess of a few hundred per second will meet that criterion. Which is really not that much; a single core can do over 1000 tps with synchronous_commit=off, or if there's a BBWC that can absorb it. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
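[Editorial note: spelling that arithmetic out, with the defaults vacuum_cost_limit = 200, vacuum_cost_page_miss = 10, vacuum_cost_page_dirty = 20, autovacuum_vacuum_cost_delay = 20 ms, and 8 kB pages:]

\[
\frac{200}{10 + 20} \approx 6.67\ \text{pages per 20 ms} = 333\ \text{pages/s},
\qquad
333\ \text{pages/s} \times 8\ \text{kB/page} \approx 2.67\ \text{MB/s}.
\]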
Re: [HACKERS] CREATE FOREIGN TABLE ( ... LIKE ... )
On Sun, Nov 24, 2013 at 02:03:18AM +0100, Vik Fearing wrote: On 10/15/2013 07:50 AM, David Fetter wrote: On Mon, Oct 07, 2013 at 11:16:56PM -0700, David Fetter wrote: Folks, Please find attached a patch implementing and documenting, to some extent, $subject. I did this in aid of being able to import SQL standard catalogs and other entities where a known example could provide a template for a foreign table. Should there be errhint()s, too? Should we pile up all such errors and mention them at the end rather than simply bailing on the first one? TBD: regression tests. Now included: regression tests for disallowed LIKE options. I like this patch, but I don't like its implementation at all. First of all, the documentation doesn't compile:

openjade:ref/create_foreign_table.sgml:124:17:E: end tag for "LISTITEM" omitted, but OMITTAG NO was specified
openjade:ref/create_foreign_table.sgml:119:4: start tag was here

Fixed. I fixed that, and then noticed that like_option is not explained like it is in CREATE TABLE. Also fixed. Then I got down to the description of the LIKE clause in both pages, and I noticed the last line of CREATE TABLE, which is "Inapplicable options (e.g., INCLUDING INDEXES from a view) are ignored." This is inconsistent with the behavior of this patch, which throws errors for inapplicable options. Fixed. Please find attached the next rev :) Cheers, David. -- David Fetter da...@fetter.org http://fetter.org/ Phone: +1 415 235 3778 AIM: dfetter666 Yahoo!: dfetter Skype: davidfetter XMPP: david.fet...@gmail.com iCal: webcal://www.tripit.com/feed/ical/people/david74/tripit.ics Remember to vote! Consider donating to Postgres: http://www.postgresql.org/about/donate

diff --git a/doc/src/sgml/ref/create_foreign_table.sgml b/doc/src/sgml/ref/create_foreign_table.sgml
index 1ef4b5e..375bd1a 100644
--- a/doc/src/sgml/ref/create_foreign_table.sgml
+++ b/doc/src/sgml/ref/create_foreign_table.sgml
@@ -20,6 +20,7 @@
 <synopsis>
 CREATE FOREIGN TABLE [ IF NOT EXISTS ] <replaceable class="PARAMETER">table_name</replaceable> ( [
   { <replaceable class="PARAMETER">column_name</replaceable> <replaceable class="PARAMETER">data_type</replaceable> [ OPTIONS ( <replaceable class="PARAMETER">option</replaceable> '<replaceable class="PARAMETER">value</replaceable>' [, ... ] ) ] [ COLLATE <replaceable>collation</replaceable> ] [ <replaceable class="PARAMETER">column_constraint</replaceable> [ ... ] ]
+| LIKE <replaceable>source_table</replaceable> [ <replaceable>like_option</replaceable> ... ] }
 [, ... ] ] )
   SERVER <replaceable class="parameter">server_name</replaceable>
@@ -31,6 +32,8 @@
 { NOT NULL | NULL | DEFAULT <replaceable>default_expr</replaceable> }
+
+<phrase>and <replaceable class="PARAMETER">like_option</replaceable> is the same as for <xref linkend="SQL-CREATETABLE">.</phrase>
 </synopsis>
@@ -114,6 +117,19 @@
  </varlistentry>

  <varlistentry>
+  <term><literal>LIKE <replaceable>source_table</replaceable> [ <replaceable>like_option</replaceable> ... ]</literal></term>
+  <listitem>
+   <para>
+    The <literal>LIKE</literal> clause specifies a table from which
+    the new foreign table automatically copies all column names and their data types.
+   </para>
+   <para>
+    Inapplicable options like <literal>INCLUDING STORAGE</literal> are ignored.
+   </para>
+  </listitem>
+ </varlistentry>
+
+ <varlistentry>
  <term><literal>NOT NULL</literal></term>
  <listitem>
   <para>
diff --git a/src/backend/parser/parse_utilcmd.c b/src/backend/parser/parse_utilcmd.c
index eb07ca3..82c77eb 100644
--- a/src/backend/parser/parse_utilcmd.c
+++ b/src/backend/parser/parse_utilcmd.c
@@ -649,7 +649,7 @@ transformTableConstraint(CreateStmtContext *cxt, Constraint *constraint)
 /*
  * transformTableLikeClause
  *
- * Change the LIKE <srctable> portion of a CREATE TABLE statement into
+ * Change the LIKE <srctable> portion of a CREATE [FOREIGN] TABLE statement into
  * column definitions which recreate the user defined column portions of
  * <srctable>.
  */
@@ -668,12 +668,6 @@ transformTableLikeClause(CreateStmtContext *cxt, TableLikeClause *table_like_cla
 	setup_parser_errposition_callback(&pcbstate, cxt->pstate,
 									  table_like_clause->relation->location);
-	/* we could support LIKE in many cases, but worry about it another day */
-	if (cxt->isforeign)
-		ereport(ERROR,
-				(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
-				 errmsg("LIKE is not supported for creating foreign tables")));
-
 	relation = relation_openrv(table_like_clause->relation, AccessShareLock);
 	if (relation->rd_rel->relkind != RELKIND_RELATION
@@ -689,6 +683,12 @@ transformTableLikeClause(CreateStmtContext *cxt, TableLikeClause
Re: [HACKERS] [PATCH] Support for pg_stat_archiver view
On Sat, Jan 25, 2014 at 5:41 AM, Fujii Masao masao.fu...@gmail.com wrote: On Thu, Jan 23, 2014 at 4:10 PM, Michael Paquier michael.paqu...@gmail.com wrote: On Thu, Jan 9, 2014 at 6:36 AM, Gabriele Bartolini gabriele.bartol...@2ndquadrant.it wrote: Il 08/01/14 18:42, Simon Riggs ha scritto: Not sure I see why it needs to be an SRF. It only returns one row. Attached is version 4, which removes management of SRF stages. I have been looking at v4 a bit more, and found a couple of small things: - a warning in pgstatfuncs.c - some typos and format fixing in the sgml docs - some corrections in a couple of comments - fixed an error message related to pg_stat_reset_shared referring only to bgwriter and not the new option archiver. Here is how the new message looks:

=# select pg_stat_reset_shared('popo');
ERROR: 22023: unrecognized reset target: "popo"
HINT: Target must be "bgwriter" or "archiver".

A new version is attached containing those fixes. I also played with the patch and pgbench, simulating some archive failures and successes while running pgbench, and the reports given by pg_stat_archiver were correct, so I am marking this patch as Ready for Committer. I refactored the patch further:

* Remove ArchiverStats global variable to simplify pgarch.c.
* Remove stats_timestamp field from PgStat_ArchiverStats struct because it's not required.

Thanks, pgstat_send_archiver is cleaner now. I have some review comments:

+ s.archived_wals,
+ s.last_archived_wal,
+ s.last_archived_wal_time,
+ s.failed_attempts,
+ s.last_failed_wal,
+ s.last_failed_wal_time,

The column names of pg_stat_archiver don't look consistent, at least to me. What about the following?

archive_count
last_archived_wal
last_archived_time
fail_count
last_failed_wal
last_failed_time

And what about archived_count and failed_count instead of archive_count and fail_count, respectively? The rest of the names are better now, indeed. I think that it's time to rename all the variables related to pg_stat_bgwriter. For example, it's better to change PgStat_GlobalStats to PgStat_Bgwriter. I think that it's okay to make this change as a separate patch, though. Yep, this is definitely a different patch.

+ char last_archived_wal[MAXFNAMELEN];		/* last WAL file archived */
+ TimestampTz last_archived_wal_timestamp;	/* last archival success */
+ PgStat_Counter failed_attempts;
+ char last_failed_wal[MAXFNAMELEN];		/* last WAL file involved in failure */

Some hackers don't like the increase in the size of the stats file. In order to reduce the size as much as possible, shouldn't we use the exact length of a WAL file name here instead of MAXFNAMELEN? The first versions of the patch used a more limited length, more in line with what you say:

+#define MAX_XFN_CHARS 40

But this is inconsistent with xlog_internal.h. Regards, -- Michael -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
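[Editorial note: for context, a rough sketch of how the stats struct under discussion might look after the proposed renaming; the field names follow the review above, the type stubs stand in for PostgreSQL's own headers, and stat_reset_timestamp is an assumption based on the pg_stat_reset_shared() discussion, not the committed definition.]

#include <stdint.h>

typedef int64_t PgStat_Counter;	/* as in pgstat.h */
typedef int64_t TimestampTz;	/* as in utils/timestamp.h */
#define MAXFNAMELEN 64			/* as in xlog_internal.h */

/* Sketch of the shared archiver statistics, with the renamed fields. */
typedef struct PgStat_ArchiverStats
{
	PgStat_Counter archived_count;	/* archival successes */
	char		last_archived_wal[MAXFNAMELEN];	/* last WAL file archived */
	TimestampTz last_archived_timestamp;	/* time of last success */
	PgStat_Counter failed_count;	/* failed archival attempts */
	char		last_failed_wal[MAXFNAMELEN];	/* WAL file of last failure */
	TimestampTz last_failed_timestamp;	/* time of last failure */
	TimestampTz stat_reset_timestamp;	/* when these stats were last reset */
} PgStat_ArchiverStats;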
Re: [HACKERS] Recovery to backup point
On Sat, Jan 25, 2014 at 5:10 AM, Heikki Linnakangas hlinnakan...@vmware.com wrote: Thanks, committed! It seems that this patch has not been pushed :) Regards, -- Michael -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers