Re: [HACKERS] INSERT ... ON CONFLICT {UPDATE | IGNORE} 2.0
On Tue, Mar 3, 2015 at 12:05 AM, Heikki Linnakangas <hlinn...@iki.fi> wrote:
>> My experimental branch works just fine (with a variant of jjanes_upsert
>> with subxact looping), until I need to restart an update after a failed
>> heap_update() that still returned HeapTupleMayBeUpdated (having super
>> deleted within an ExecUpdate() call). There is no good way to do that for
>> ExecUpdate() that I can see, because an existing, visible row is affected
>> (unlike with ExecInsert()). Even if it was possible, it would be hugely
>> invasive to already very complicated code paths.
>
> Ah, so we can't easily use super-deletion to back out an UPDATE. I had not
> considered that.

Yeah. When I got into considering making EvalPlanQualFetch() look at
speculative tokens, it became abundantly clear that that code would never be
committed, even if I could make it work.

>> I continue to believe that the best way forward is to incrementally
>> commit the work by committing ON CONFLICT IGNORE first. That way,
>> speculative tokens will remain strictly the concern of UPSERTers or
>> sessions doing INSERT ... ON CONFLICT IGNORE.
>
> Ok, let's try that. Can you cut a patch that does just ON CONFLICT IGNORE,
> please?

Of course. I'll have that for you shortly.

Thanks
--
Peter Geoghegan

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] INSERT ... ON CONFLICT {UPDATE | IGNORE} 2.0
On 03/02/2015 11:21 PM, Peter Geoghegan wrote:
> On Mon, Mar 2, 2015 at 12:15 PM, Heikki Linnakangas <hlinn...@iki.fi> wrote:
>> Hmm. I used a b-tree to estimate the effect that the locking would have
>> in the UPSERT case, for UPSERT into a table with a b-tree index. But
>> you're right that for the question of whether this is acceptable for the
>> case of a regular insert into a table with exclusion constraints, other
>> indexams are more interesting. In a very quick test, the overhead with a
>> single GiST index seems to be about 5%. IMHO that's still a noticeable
>> slowdown, so let's try to find a way to eliminate it.
>
> Honestly, I don't know why you're so optimistic that this can be fixed,
> even with that heavyweight lock (for regular inserters + exclusion
> constraints).

We already concluded that it can be fixed, with the heavyweight lock. See
http://www.postgresql.org/message-id/54f4a0e0.4070...@iki.fi. Do you see
some new problem with that that hasn't been discussed yet? To eliminate the
heavyweight lock, we'll need some new ideas, but it doesn't seem that hard.

> My experimental branch, which I showed you privately, shows big problems
> when I simulate upserting with exclusion constraints (so we insert first,
> and handle exclusion violations using the traditional upsert subxact
> pattern that does work with B-Trees). It's much harder to back out of a
> heap_update() than it is to back out of a heap_insert(), since we might
> well be updating a tuple visible to some other MVCC snapshot. Whereas we
> can super delete a tuple knowing that only a dirty snapshot might have
> seen it, which bounds the complexity (only wait sites for B-Trees +
> exclusion constraints need to worry about speculative insertion tokens
> and so on). My experimental branch works just fine (with a variant of
> jjanes_upsert with subxact looping), until I need to restart an update
> after a failed heap_update() that still returned HeapTupleMayBeUpdated
> (having super deleted within an ExecUpdate() call). There is no good way
> to do that for ExecUpdate() that I can see, because an existing, visible
> row is affected (unlike with ExecInsert()). Even if it was possible, it
> would be hugely invasive to already very complicated code paths.

Ah, so we can't easily use super-deletion to back out an UPDATE. I had not
considered that.

> I continue to believe that the best way forward is to incrementally
> commit the work by committing ON CONFLICT IGNORE first. That way,
> speculative tokens will remain strictly the concern of UPSERTers or
> sessions doing INSERT ... ON CONFLICT IGNORE.

Ok, let's try that. Can you cut a patch that does just ON CONFLICT IGNORE,
please?

- Heikki
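[Editor's note: the ON CONFLICT IGNORE semantics discussed in this thread can be illustrated outside PostgreSQL. The sketch below uses Python's sqlite3 module, since SQLite's INSERT OR IGNORE is the closest readily runnable analogue of the proposed behavior; this is a stand-in for illustration, not the PostgreSQL implementation.]

```python
import sqlite3

# SQLite's INSERT OR IGNORE is the closest analogue of the proposed
# INSERT ... ON CONFLICT IGNORE: on a unique-constraint conflict the
# new row is simply skipped instead of raising an error.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE counters (key TEXT PRIMARY KEY, val INTEGER)")
conn.execute("INSERT INTO counters VALUES ('hits', 1)")

# Conflicting insert: silently ignored rather than erroring out.
cur = conn.execute("INSERT OR IGNORE INTO counters VALUES ('hits', 99)")
ignored = cur.rowcount          # 0 rows inserted

# Non-conflicting insert proceeds normally.
conn.execute("INSERT OR IGNORE INTO counters VALUES ('misses', 0)")

rows = sorted(conn.execute("SELECT key, val FROM counters"))
print(ignored, rows)
```

Note that the existing row keeps its original value; IGNORE never updates, which is exactly why it is the smaller, separately committable piece of the UPSERT work.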
Re: [HACKERS] Join push-down support for foreign tables
2015-03-02 23:07 GMT+09:00 Kouhei Kaigai <kai...@ak.jp.nec.com>:
>> I seem to be getting a problem with whole-row references:
>>
>> # SELECT p.name, c.country, e.pet_name, p
>>   FROM pets e
>>   INNER JOIN people p ON e.person_id = p.id
>>   INNER JOIN countries c ON p.country_id = c.id;
>> ERROR: table "r" has 3 columns available but 4 columns specified
>> CONTEXT: Remote SQL command: SELECT r.a_0, r.a_1, r.a_2, l.a_1 FROM
>> (SELECT id, country FROM public.countries) l (a_0, a_1) INNER JOIN
>> (SELECT id, name, country_id FROM public.people) r (a_0, a_1, a_2, a_3)
>> ON ((r.a_3 = l.a_0))
>
> In this case, the 4th target-entry should be "l", not "l.a_1".

Actually, I fixed that part.

>> And the error message could be somewhat confusing. This mentions table
>> "r", but there's no such table or alias in my actual query.
>
> However, do we have a mechanical/simple way to distinguish the cases when
> we need a relation alias from the cases when we don't? In cases like
> self-joins, we have to construct a remote query even if the same table is
> referenced multiple times in the query. Do you have a good idea?

I'd like to vote for keeping the current aliasing style: use "l" and "r"
for the join source relations, and a_0, a_1, ... for each of their columns.

--
Shigeru HANADA
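[Editor's note: the l/r + a_N aliasing scheme discussed above, and the whole-row fix (deparsing a whole-row reference as the bare alias rather than alias.a_N), can be sketched as follows. This is a hypothetical helper for illustration, not postgres_fdw's actual deparser; the function name and signature are invented.]

```python
def deparse_join(left_sql, left_cols, right_sql, right_cols, cond, tlist):
    """Build a remote join query using the l/r + a_N aliasing scheme.
    tlist entries are (side, index) pairs; index None means a whole-row
    reference, which must deparse to the bare alias, not alias.a_N."""
    def ref(side, idx):
        return side if idx is None else f"{side}.a_{idx}"
    targets = ", ".join(ref(s, i) for s, i in tlist)
    l_alias = "l (" + ", ".join(f"a_{i}" for i in range(left_cols)) + ")"
    r_alias = "r (" + ", ".join(f"a_{i}" for i in range(right_cols)) + ")"
    return (f"SELECT {targets} FROM ({left_sql}) {l_alias} "
            f"INNER JOIN ({right_sql}) {r_alias} ON ({cond})")

sql = deparse_join(
    "SELECT id, country FROM public.countries", 2,
    "SELECT id, name, country_id FROM public.people", 3,
    "r.a_2 = l.a_0",
    # name, country, and a whole-row reference to the left relation
    [("r", 1), ("l", 1), ("l", None)],
)
print(sql)
```

Listing exactly as many a_N aliases as the subquery's select list has columns avoids the "table r has 3 columns available but 4 columns specified" class of error shown in the report.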
Re: [HACKERS] Join push-down support for foreign tables
Thanks for reviewing my patch.

2015-03-02 22:50 GMT+09:00 Thom Brown <t...@linux.com>:
> I seem to be getting a problem with whole-row references:
>
> # SELECT p.name, c.country, e.pet_name, p
>   FROM pets e
>   INNER JOIN people p ON e.person_id = p.id
>   INNER JOIN countries c ON p.country_id = c.id;
> ERROR: table "r" has 3 columns available but 4 columns specified
> CONTEXT: Remote SQL command: SELECT r.a_0, r.a_1, r.a_2, l.a_1 FROM
> (SELECT id, country FROM public.countries) l (a_0, a_1) INNER JOIN
> (SELECT id, name, country_id FROM public.people) r (a_0, a_1, a_2, a_3)
> ON ((r.a_3 = l.a_0))
>
> And the error message could be somewhat confusing. This mentions table
> "r", but there's no such table or alias in my actual query.

Your concern would not be limited to the feature I'm proposing, because
fdw_options about object names also introduce such a mismatch between the
query the user constructed and the one postgres_fdw constructed for remote
execution. Currently we put a CONTEXT line there for that purpose, but it
might be hard to understand from the error message alone. So I'd like to
add a section about query debugging to the postgres_fdw documentation.

> Another issue:
>
> # EXPLAIN VERBOSE SELECT NULL FROM
>   (SELECT people.id FROM people
>    INNER JOIN countries ON people.country_id = countries.id
>    LIMIT 3) x;
> ERROR: could not open relation with OID 0

Good catch. In a quick trial, removing LIMIT 3 avoids this error. I'll
check it right now.

--
Shigeru HANADA
Re: Custom/Foreign-Join-APIs (Re: [HACKERS] [v9.5] Custom Plan API)
Kaigai-san,

The v6 patch was cleanly applied on the master branch. I'll rebase my patch
onto it, but before that I have a comment about the name of the new FDW API
handler, GetForeignJoinPath. Obviously an FDW can add multiple paths at a
time, like GetForeignPaths, so IMO it should be renamed to
GetForeignJoinPaths, in plural form. In addition, the new member of
RelOptInfo, fdw_handler, should be initialized explicitly in
build_simple_rel. Please see the attached patch for these changes. I'll
review the v6 patch afterwards.

2015-03-03 20:20 GMT+09:00 Kouhei Kaigai <kai...@ak.jp.nec.com>:
> Sorry, I misoperated on patch creation. The attached one is the correct
> version.
> --
> NEC OSS Promotion Center / PG-Strom Project
> KaiGai Kohei <kai...@ak.jp.nec.com>
>
> -----Original Message-----
> From: pgsql-hackers-ow...@postgresql.org
> [mailto:pgsql-hackers-ow...@postgresql.org] On Behalf Of Kouhei Kaigai
> Sent: Tuesday, March 03, 2015 6:31 PM
> To: Kaigai Kouhei(海外 浩平); Robert Haas
> Cc: Tom Lane; pgsql-hackers@postgreSQL.org; Shigeru Hanada
> Subject: Re: Custom/Foreign-Join-APIs (Re: [HACKERS] [v9.5] Custom Plan API)
>
> The attached version of the custom/foreign-join interface patch fixes up
> the problem reported on the join-pushdown support thread.
>
> The previous version referenced *_ps_tlist in setrefs.c to check whether
> a Custom/ForeignScan node is associated with a particular base relation
> or not. This logic assumed that the node performs a base relation scan if
> *_ps_tlist is valid. However, that was incorrect when the underlying
> pseudo-scan relation has an empty targetlist. Instead of the previous
> logic, it is revised to check scanrelid itself. If zero, it means the
> Custom/ForeignScan node is not associated with a particular base
> relation, and thus its slot descriptor for the scan shall be constructed
> based on *_ps_tlist.
>
> Also, I noticed a potential problem if a CSP/FDW driver wants to display
> expression nodes using deparse_expression() but a varnode within the
> expression does not appear in the *_ps_tlist.
> For example, the remote query below shall return rows with two columns:
>
>   SELECT atext, btext FROM tbl_a, tbl_b WHERE aid = bid;
>
> Thus, the ForeignScan will perform like a scan on a relation with two
> columns, and the FDW driver will set two TargetEntry nodes on the
> fdw_ps_tlist. If the FDW is designed to keep the join condition
> (aid = bid) in expression node form, it is expected to be saved in the
> custom/fdw_expr variable, and then setrefs.c rewrites the varnodes
> according to *_ps_tlist. It means we also have to add entries for both
> aid and bid to the *_ps_tlist, to avoid a failure on variable lookup.
> However, these additional entries change the definition of the slot
> descriptor. So, I adjusted ExecInitForeignScan and ExecInitCustomScan to
> use ExecCleanTypeFromTL(), not ExecTypeFromTL(), when they construct the
> slot descriptor based on the *_ps_tlist. This expects CSP/FDW drivers to
> add target entries with resjunk=true if they want additional entries for
> variable lookups on EXPLAIN. Fortunately or unfortunately, postgres_fdw
> keeps its remote query in cstring form, so it does not need to add junk
> entries to the fdw_ps_tlist.
>
> Thanks,
> --
> NEC OSS Promotion Center / PG-Strom Project
> KaiGai Kohei <kai...@ak.jp.nec.com>
>
> -----Original Message-----
> From: pgsql-hackers-ow...@postgresql.org
> [mailto:pgsql-hackers-ow...@postgresql.org] On Behalf Of Kouhei Kaigai
> Sent: Sunday, February 15, 2015 11:01 PM
> To: Kaigai Kouhei(海外 浩平); Robert Haas
> Cc: Tom Lane; pgsql-hackers@postgreSQL.org; Shigeru Hanada
> Subject: Re: Custom/Foreign-Join-APIs (Re: [HACKERS] [v9.5] Custom Plan API)
>
> The attached patch is a rebased version of join replacement with
> foreign-/custom-scan. There are no feature updates at this moment, but
> SGML documentation is added (according to Michael's comment).
>
> This infrastructure allows a foreign-data wrapper or custom-scan provider
> to add alternative scan paths towards a relations join. From the
> viewpoint of the executor, it looks like a scan on a pseudo-relation that
> is materialized from multiple relations, even though the FDW/CSP
> internally processes the relations join with its own logic.
>
> The basic idea is: (1) scanrelid==0 indicates that this foreign/custom
> scan node runs on a pseudo relation, and (2) fdw_ps_tlist and
> custom_ps_tlist introduce the definition of the pseudo relation, because
> it is not associated with a tangible relation (unlike the simple scan
> case), so the planner cannot know the expected record type to be returned
> without this additional information. These two enhancements enable
> extensions to process relations joins internally, and to behave like an
> existing scan node from the viewpoint of the core backend.
>
> Also, as an aside: I had a discussion with Hanada-san about this
> interface off-list. He had an idea to keep create_plan_recurse() static,
> using a special list field in the CustomPath structure to chain the
> underlying Path nodes. If the core backend translates the Path node
Re: [HACKERS] autogenerated column names + views are a dump hazard
On 2015-03-02 16:32:53 -0500, Tom Lane wrote:
> Andres Freund <and...@2ndquadrant.com> writes:
>> The easiest way to solve this would be to teach ruleutils.c to simply
>> always attach AS clauses for auto-generated column names. Won't look too
>> pretty though. Does somebody have a better idea?
>
> No, it would look awful :-(.

Hm. So I looked into it, and I think the problem is actually restricted to
columns where the type name that FigureColname() assigns is different from
the one that will result after get_rule_expr()/get_const_expr()'s implicit
cast is added. For this case it seems easiest to make get_rule_expr() (and
some of its helpers) return whether an implicit cast has been added.
Whenever an implicit cast is added in the target list we should add an
alias, as the generated column name will be different from before. That's a
bit invasive, but doesn't seem too bad. A quick hack doing so shows that
there are no changes in the regression output when doing it for consts.

> But I think we might have enough infrastructure in there to notice
> whether the column is actually referenced or not. So we could label the
> columns only if necessary, which would at least limit the ugliness.

I don't think looking for referenced columns is actually sufficient, unless
you define "referenced" pretty widely at least. Before the dump,

CREATE VIEW ... AS SELECT '2' ORDER BY 1;

will yield a '?column?' column name. After a dump/restore it'll be 'text'.
That doesn't seem desirable.

Greetings,

Andres Freund

--
Andres Freund                    http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
Re: [HACKERS] Join push-down support for foreign tables
I rebased the join push-down patch onto Kaigai-san's Custom/Foreign Join v6
patch. I posted some comments on the v6 patch in this post:
http://www.postgresql.org/message-id/CAEZqfEcNvjqq-P=jxnw1pb4t9wvpcporcn7g6cc46jgub7d...@mail.gmail.com

Before applying my v3 patch, please apply Kaigai-san's v6 patch and my
mod_cjv6.patch. Sorry for the complex patch combination; those patches will
be arranged soon by Kaigai-san and me.

I fixed the issues pointed out by Thom and Kohei, but the patch still has
an issue with joins underlying UPDATE or DELETE. I'm now working on fixing
it. Aside from that issue, the existing regression tests pass.

2015-03-03 19:48 GMT+09:00 Kouhei Kaigai <kai...@ak.jp.nec.com>:
>> * Bug reported by Thom Brown
>>
>> # EXPLAIN VERBOSE SELECT NULL FROM
>>   (SELECT people.id FROM people
>>    INNER JOIN countries ON people.country_id = countries.id
>>    LIMIT 3) x;
>> ERROR: could not open relation with OID 0
>
> Sorry, it was a problem caused by my portion. The patched setrefs.c checks
> fdw_/custom_ps_tlist to determine whether a Foreign/CustomScan node is
> associated with a certain base relation. If *_ps_tlist is valid, it also
> expects scanrelid == 0. However, the thing we checked was incorrect: we
> may have a case with an empty *_ps_tlist, if the remote join expects no
> columns. So, I adjusted the condition to check scanrelid instead.
>
>> Is this issue fixed by the v5 custom/foreign join patch?
>
> Yes, please rebase it.
> --
> NEC OSS Promotion Center / PG-Strom Project
> KaiGai Kohei <kai...@ak.jp.nec.com>

--
Shigeru HANADA

foreign_join_v3.patch
Description: Binary data
Re: [HACKERS] patch : Allow toast tables to be moved to a different tablespace
Hi,

Sorry for the delay, I missed this thread. Here is a new version of this
patch addressing Andreas' comments.

On 30/12/2014 03:48, Andreas Karlsson wrote:
> - A test fails in create_view.out. I looked some into it and did not see
>   how this could happen.
>
> *** .../src/test/regress/expected/create_view.out  2014-08-09 01:35:50.008886776 +0200
> --- .../src/test/regress/results/create_view.out   2014-12-30 00:41:17.966525339 +0100
> *** 283,289 ***
>     relname   | relkind |        reloptions
>   ------------+---------+---------------------------
>    mysecview1 | v       |
> !  mysecview2 | v       |
>    mysecview3 | v       | {security_barrier=true}
>    mysecview4 | v       | {security_barrier=false}
>   (4 rows)
> --- 283,289 ----
>     relname   | relkind |        reloptions
>   ------------+---------+---------------------------
>    mysecview1 | v       |
> !  mysecview2 | v       | {security_barrier=true}
>    mysecview3 | v       | {security_barrier=true}
>    mysecview4 | v       | {security_barrier=false}
>   (4 rows)

I can't reproduce this issue.

> - pg_dump is broken:
>   pg_dump: [archiver (db)] query failed: ERROR: syntax error at or near "("
>   LINE 1: ...nest(tc.reloptions) x), ', ') AS toast_reloptions (SELECT sp...

Fixed.

> - ALTER INDEX foo_idx SET TABLE TABLESPACE ... should not be allowed;
>   currently it changes the tablespace of the index. I do not think
>   ALTER INDEX foo_idx SET (TABLE|TOAST) TABLESPACE ... should even be
>   allowed in the grammar.

Fixed.

> - You should add a link to
>   http://www.postgresql.org/docs/current/interactive/storage-toast.html
>   to the manual page of ALTER TABLE.

Added.

> - I do not like how \d handles the toast tablespace. Having TOAST in
>   pg_default and the table in another space looks the same as if there
>   was no TOAST table at all. I think we should always print both
>   tablespaces if either of them is not pg_default.

If we do it that way, to be consistent we should always show the
tablespace even when it's pg_default. I'm pretty sure that we don't want
that.

> - Would it be interesting to add syntax for moving the toast index to a
>   separate tablespace?

-1. We just want to be able to move the TOAST heap, which is expected to
contain a lot of data that we may want to move to a different storage
device / filesystem.

> - There is no warning if you set the toast tablespace of a table lacking
>   toast. I think there should be one.

A notice is raised now in that case.

> - No support for materialized views, as pointed out by Alex.

Support, documentation and regression tests added.

> - I think the code would be cleaner if ATPrepSetTableSpace and
>   ATPrepSetToastTableSpace were merged into one function, or split into
>   two: one setting the tablespace of the table and one setting it on the
>   TOAST table.

Done.

> - I think a couple of tests for the error cases would be nice.

2 more tests added.

> - Missing periods on the ALTER TABLE manual page after "See also CREATE
>   TABLESPACE" (in two places).
> - Missing period last in the new paragraph added to the storage manual
>   page.

I don't understand those last 2 points?

> - Double spaces in src/backend/catalog/toasting.c after "if (new_toast".

Fixed.

> - The comment "and if this is not a TOAST re-creation case (through
>   CLUSTER)." incorrectly implies that CLUSTER is the only case where the
>   TOAST table is recreated.

Fixed.

> - The local variables ToastTableSpace and ToastRel do not follow the
>   naming conventions.

Fixed (I hope so).

> - Remove the parentheses around (is_system_catalog).

Fixed.

> - Why was the "Save info for Phase 3 to do the real work" comment changed
>   to "XXX: Save info for Phase 3 to do the real work"?

Fixed.

> - Incorrect indentation for new code added last in ATExecSetTableSpace.

Fixed.

> - The patch adds commented out code in src/include/commands/tablecmds.h.

Fixed.

Thank you for your review.

--
Julien

diff --git a/doc/src/sgml/ref/alter_materialized_view.sgml b/doc/src/sgml/ref/alter_materialized_view.sgml
index b0759fc..321ffc9 100644
--- a/doc/src/sgml/ref/alter_materialized_view.sgml
+++ b/doc/src/sgml/ref/alter_materialized_view.sgml
@@ -44,6 +44,8 @@ ALTER MATERIALIZED VIEW ALL IN TABLESPACE <replaceable class="parameter">name</replaceable>
     RESET ( <replaceable class="PARAMETER">storage_parameter</replaceable> [, ... ] )
     OWNER TO <replaceable class="PARAMETER">new_owner</replaceable>
     SET TABLESPACE <replaceable class="PARAMETER">new_tablespace</replaceable>
+    SET TABLE TABLESPACE <replaceable class="PARAMETER">new_tablespace</replaceable>
+    SET TOAST TABLESPACE <replaceable class="PARAMETER">new_tablespace</replaceable>
 </synopsis>
 </refsynopsisdiv>

diff --git a/doc/src/sgml/ref/alter_table.sgml b/doc/src/sgml/ref/alter_table.sgml
index b3a4970..777ba57 100644
--- a/doc/src/sgml/ref/alter_table.sgml
+++
Re: [HACKERS] Idea: closing the loop for pg_ctl reload
On 2015-03-03 15:21:24 +0000, Greg Stark wrote:
> Fwiw this concerns me slightly. I'm sure a lot of people are doing things
> like kill -HUP `cat .../postmaster.pid` or the equivalent.

postmaster.pid already contains considerably more than just the pid. E.g.:

4071
/srv/dev/pgdev-master
1425396089
5440
/tmp
localhost
  5440001  82345992

Greetings,

Andres Freund

--
Andres Freund                    http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
Re: [HACKERS] Idea: closing the loop for pg_ctl reload
Andres Freund <and...@2ndquadrant.com> writes:
> On 2015-03-03 15:21:24 +0000, Greg Stark wrote:
>> Fwiw this concerns me slightly. I'm sure a lot of people are doing things
>> like kill -HUP `cat .../postmaster.pid` or the equivalent.
>
> postmaster.pid already contains considerably more than just the pid. e.g.

Yeah, that ship sailed long ago. I'm sure anyone who's doing this is using
"head -1", not just "cat", and what I suggest wouldn't break that.

			regards, tom lane
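[Editor's note: Tom's point about "head -1" vs "cat" can be shown with a small shell sketch. The pid-file contents below are the example from Andres's message; the temp-file setup is purely illustrative.]

```shell
# postmaster.pid carries several lines (pid, data dir, start time, port,
# socket dir, ...), so scripts must take only the first line.
pidfile=$(mktemp)
cat > "$pidfile" <<'EOF'
4071
/srv/dev/pgdev-master
1425396089
5440
/tmp
localhost
EOF

# Fragile: `cat` yields the whole file, not a pid.
cat "$pidfile"

# Robust: take only the first line, which stays the pid even if more
# lines are appended to the file format later.
pid=$(head -1 "$pidfile")
echo "pid=$pid"
rm -f "$pidfile"
```

Adding another line for the last config-load attempt, as proposed, would therefore not break the `head -1` idiom, only the bare `cat` one.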
[HACKERS] Comparing primary/HS standby in tests
Hi,

I've regularly wished we had automated tests that set up HS and then
compare primary/standby at the end to verify that replay worked correctly.

Heikki's page comparison tool deals with some of that verification, but
it's really quite expensive and doesn't care about runtime-only
differences, i.e. it doesn't test HS at all.

I every now and then run installcheck against a primary, verify that
replay works without errors, and then compare pg_dumpall output from both
clusters. Unfortunately that currently requires hand inspection of the
dumps, since there are differences like:

-SELECT pg_catalog.setval('default_seq', 1, true);
+SELECT pg_catalog.setval('default_seq', 33, true);

The reason for these differences is that the primary increases the
sequence's last_value by 1, but temporarily sets it to +SEQ_LOG_VALS
before XLogInsert(). So the two differ.

Does anybody have a good idea how to get rid of that difference? One way
would be to log the value the standby is sure to have - but that's not
entirely trivial.

I'd very much like to add an automated test like this to the tree, but I
don't see a way to do that sanely without a comparison tool...

Greetings,

Andres Freund

--
Andres Freund                    http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
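[Editor's note: one pragmatic workaround, not proposed in the thread itself, is to normalize the known-noisy setval() lines before diffing the two dumps. A minimal sketch, assuming the only legitimate difference is the sequence last_value:]

```python
import re

# Normalize sequence setval() calls, whose last_value legitimately
# differs between primary and standby (the primary pre-logs SEQ_LOG_VALS
# before XLogInsert, as described above).
SETVAL = re.compile(r"(setval\('[^']+',\s*)\d+(\s*,\s*(?:true|false)\))")

def normalize(dump: str) -> str:
    return SETVAL.sub(r"\1N\2", dump)

primary = "SELECT pg_catalog.setval('default_seq', 1, true);"
standby = "SELECT pg_catalog.setval('default_seq', 33, true);"
print(normalize(primary) == normalize(standby))
```

This sidesteps the WAL-logging question entirely, at the cost of no longer verifying that sequence state replayed exactly.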
Re: [HACKERS] [REVIEW] Re: Compression of full-page-writes
Hello,

> It would be good to get those problems fixed first. Could you send an
> updated patch?

Please find attached an updated patch with the WAL replay error fixed. The
patch follows the chunk ID approach of the xlog format.

Following are brief measurement numbers:

                          WAL size
  FPW compression on      122.032 MB
  FPW compression off     155.239 MB
  HEAD                    155.236 MB

Thank you,

Rahila Syed

Support-compression-of-full-page-writes_v22.patch
Description: Support-compression-of-full-page-writes_v22.patch
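[Editor's note: the space saving measured above comes from compressing each full-page image before it goes into WAL. The patch uses PostgreSQL's built-in pglz, but the trade-off can be illustrated with Python's zlib as a stand-in on a synthetic 8 kB page; the page contents and compression level here are invented for illustration.]

```python
import zlib

# An 8kB "page" of fairly repetitive content, standing in for a heap
# page full of similar tuples; real pages compress less predictably.
page = (b"tuple-header|user_id=12345|status=active|" * 200)[:8192]

# A cheap compression level, since FPW compression sits on a hot path.
compressed = zlib.compress(page, level=1)
print(len(page), len(compressed))

# Round-trip check: WAL replay must restore the exact page image.
restored = zlib.decompress(compressed)
```

The measured ~21% WAL reduction (155 MB to 122 MB) is consistent with this picture: only the full-page images shrink, while the rest of the WAL stream is untouched.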
Re: [HACKERS] Providing catalog view to pg_hba.conf file - Patch submission
On Mon, Jun 30, 2014 at 8:06 AM, Abhijit Menon-Sen <a...@2ndquadrant.com> wrote:
> After sleeping on it, I realised that the code would return '{all}' for
> all in pg_hba.conf, but '{"all"}' for "all". So it's not exactly
> ambiguous, but I don't think it's especially useful for callers.

Hm. Nope, it doesn't. It just says {all} regardless of whether "all" is
quoted or not. This makes sense: the rules for when to quote things in
pg_hba.conf and when to quote things in arrays are separate. And we
definitely don't want to start adding quotes to every token that the
parser noted was quoted, because then you'll start seeing things like
{"all"} and {"database with space"}, which won't make it any easier to
match things.

I'm not sure adding a separate column is really the solution either. You
can have things like all,databasename (which is the keyword "all", not a
database named "all"). Or more likely something like sameuser,bob, or even
things like replication,all.

I'm thinking leaving well enough alone is probably best. It's not perfect,
but if a user does have a database named "all" or "replication", or a user
named "sameuser" or "samerole", then it's not like pg_hba_settings crashes
or anything; it just produces information that's hard to interpret, and
the response might just be "don't do that".

The only other option I was pondering was using a jsonb instead of an
array. That would give us more flexibility: we could have a json array
containing strings and objects representing keywords. But most hba files
consist *mostly* of singleton keywords, so the result might be kind of
cumbersome.

--
greg
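[Editor's note: the ambiguity under discussion can be made concrete with a small sketch. The renderer below is hypothetical, standing in for how the view would flatten parsed hba tokens into an array literal; the function and its token representation are invented, not PostgreSQL code.]

```python
# Hypothetical sketch: rendering pg_hba tokens as a Postgres-style array
# literal loses the keyword/identifier distinction unless quoted names
# are marked somehow.
def render_array(tokens):
    """tokens: list of (name, was_quoted) pairs from an hba parser."""
    return "{" + ",".join(name for name, _ in tokens) + "}"

keyword_all = [("all", False)]      # the keyword: matches every database
database_all = [("all", True)]      # a database literally named "all"

# Both render identically, which is the ambiguity discussed above.
print(render_array(keyword_all), render_array(database_all))
```

Propagating the `was_quoted` flag into the output (e.g. as `{"all"}`) would resolve the ambiguity, but, as noted above, at the cost of quoting rules that differ from both pg_hba.conf's and the array type's.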
Re: [HACKERS] Idea: closing the loop for pg_ctl reload
On Fri, Feb 20, 2015 at 1:26 AM, Tom Lane <t...@sss.pgh.pa.us> wrote:
> 1. Extend the definition of the postmaster.pid file to add another line,
> which will contain the time of the last postmaster configuration load
> attempt (might as well be a numeric Unix-style timestamp) and a boolean
> indication of whether that attempt succeeded or not.

Fwiw this concerns me slightly. I'm sure a lot of people are doing things
like kill -HUP `cat .../postmaster.pid` or the equivalent.

We do have the external_pid_file GUC, so perhaps this just means we should
warn people in the release notes or somewhere and point them at that GUC.

--
greg
[HACKERS] Weirdly pessimistic estimates in optimizer
Hi all,

I have encountered a performance problem with a relatively simple query,
which I think is caused by overly pessimistic estimates in the optimizer.
I originally ran into this issue on a table a few GBs large, but it can be
reproduced with a much smaller table as follows:

-- Setup main fact table
create table facts (fk int, f numeric);
insert into facts
  select (random()*1)::int, random()*1
    from generate_series(1,100) g(i);
create index on facts (fk);
vacuum analyze facts;

-- Pick 100 values of 'fk' which cover roughly 1/100 of the rows from the
-- 'facts' table, and select all corresponding values
create table random_fk_dupl as
  select fk from facts
   where fk in (select (random()*1)::int from generate_series(1,100) g(i));
vacuum analyze random_fk_dupl;

-- Create a table with unique values from 'random_fk_dupl'
create table random_fk_uniq as
  select distinct fk from random_fk_dupl;
vacuum analyze random_fk_uniq;

(The motivation for doing this is that I have a persistent table/cache
with aggregated values, and I want to update this table whenever new data
are loaded into 'facts', by processing only the loaded data. This is a
very rough description, but more detail isn't needed for the sake of the
argument.)
Now the main query:

david=# explain analyze select fk, sum(f) from facts
        where fk in (select fk from random_fk_dupl) group by 1;

 HashAggregate  (cost=892.82..1017.34 rows=9962 width=15) (actual time=22.752..22.791 rows=100 loops=1)
   Group Key: facts.fk
   ->  Nested Loop  (cost=167.01..842.63 rows=10038 width=15) (actual time=2.167..16.739 rows=9807 loops=1)
         ->  HashAggregate  (cost=166.59..167.59 rows=100 width=4) (actual time=2.153..2.193 rows=100 loops=1)
               Group Key: random_fk_dupl.fk
               ->  Seq Scan on random_fk_dupl  (cost=0.00..142.07 rows=9807 width=4) (actual time=0.003..0.688 rows=9807 loops=1)
         ->  Index Scan using facts_fk_idx on facts  (cost=0.42..5.75 rows=100 width=15) (actual time=0.009..0.117 rows=98 loops=100)
               Index Cond: (fk = random_fk_dupl.fk)
 Planning time: 0.372 ms
 Execution time: 22.842 ms

This is the plan I would expect given the estimates (quite good, except
for the final aggregation), and it exactly corresponds to what's written
in the optimizer README:

{quote}
The naive way to join two relations using a clause like WHERE A.X = B.Y
is to generate a nestloop plan like this:

	NestLoop
	    Filter: A.X = B.Y
	    -> Seq Scan on A
	    -> Seq Scan on B

We can make this better by using a merge or hash join, but it still
requires scanning all of both input relations. If A is very small and B is
very large, but there is an index on B.Y, it can be enormously better to
do something like this:

	NestLoop
	    -> Seq Scan on A
	    -> Index Scan using B_Y_IDX on B
		Index Condition: B.Y = A.X

Here, we are expecting that for each row scanned from A, the nestloop plan
node will pass down the current value of A.X into the scan of B. That
allows the indexscan to treat A.X as a constant for any one invocation,
and thereby use it as an index key. This is the only plan type that can
avoid fetching all of B, and for small numbers of rows coming from A, that
will dominate every other consideration.
(As A gets larger, this gets less attractive, and eventually a merge or
hash join will win instead. So we have to cost out all the alternatives
to decide what to do.)
{quote}

So far so good. Now let's use 'random_fk_uniq' instead of
'random_fk_dupl'. I would expect almost the same plan (with just a cheaper
inner HashAggregate), because the optimizer knows that there are (at most)
100 values in 'random_fk_uniq', so a nested loop is clearly the best
choice. Instead we get this:

david=# explain analyze select fk, sum(f) from facts
        where fk in (select fk from random_fk_uniq) group by 1;

 HashAggregate  (cost=18198.11..18322.64 rows=9962 width=15) (actual time=163.738..163.773 rows=100 loops=1)
   Group Key: facts.fk
   ->  Hash Semi Join  (cost=3.25..18147.92 rows=10038 width=15) (actual time=0.298..160.271 rows=9807 loops=1)
         Hash Cond: (facts.fk = random_fk_uniq.fk)
         ->  Seq Scan on facts  (cost=0.00..15408.00 rows=100 width=15) (actual time=0.010..66.524 rows=100 loops=1)
         ->  Hash  (cost=2.00..2.00 rows=100 width=4) (actual time=0.079..0.079 rows=100 loops=1)
               Buckets: 1024  Batches: 1  Memory Usage: 4kB
               ->  Seq Scan on random_fk_uniq  (cost=0.00..2.00 rows=100 width=4) (actual time=0.003..0.028 rows=100 loops=1)
 Planning time:
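[Editor's note: the README's trade-off quoted above (nestloop over an inner index scan wins for a small outer relation, hash join wins as it grows) can be sketched with a toy cost model. The constants below are invented for illustration and are not the planner's actual formulas; the inner_pages default merely echoes the seq-scan cost visible in the plan above.]

```python
# Toy cost model: with a small outer relation, a nestloop driving a
# parameterized inner index scan beats a hash join that must seq scan
# the big table; as the outer relation grows, the comparison flips.
SEQ_PAGE = 1.0           # invented per-page seq scan cost
INDEX_PROBE = 6.0        # invented cost of one inner index scan probe

def nestloop_cost(outer_rows):
    return outer_rows * INDEX_PROBE

def hashjoin_cost(outer_rows, inner_pages=15408):
    return inner_pages * SEQ_PAGE + outer_rows * 0.01

for n in (100, 10000):
    nl, hj = nestloop_cost(n), hashjoin_cost(n)
    print(n, nl, hj, "nestloop" if nl < hj else "hashjoin")
```

Under this model a 100-row outer relation strongly favors the nestloop, which is why the hash semi join chosen for 'random_fk_uniq' above looks so surprising.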
[HACKERS] Weirdly pesimistic estimates in optimizer
Hi all, I have encountered a performance problem with relatively simple query, which I think is caused by the overly pesimistic estimates in optimizer. I have originally run into this issue on a table few GBs large, but it can be reproduced with much smaller table as follows: -- Setup main fact table create table facts (fk int, f numeric); insert into facts select (random()*1)::int, random()*1 from generate_series(1,100) g(i); create index on facts (fk); vacuum analyze facts; -- Pick 100 values of 'fk' which cover roughly 1/100 rows from the 'facts' table and select all corresponding values create table random_fk_dupl as select fk from facts where fk in (select (random()*1)::int from generate_series(1,100) g(i)); vacuum analyze random_fk_dupl; -- Create a table with unique values from 'random_fk_dupl' create table random_fk_uniq as select distinct fk from random_fk_dupl; vacuum analyze random_fk_uniq; (The motivation for doing this is that I have a persistent table/cache with aggregated values and I want to update this table whenever new data are loaded to 'facts' by processing only loaded data. This is very rough description but it isn't needed to go into more detail for the sake of the argument.) 
Now the main query: david=# explain analyze select fk, sum(f) from facts where fk in (select fk from random_fk_dupl) group by 1; QUERY PLAN -- HashAggregate (cost=892.82..1017.34 rows=9962 width=15) (actual time=22.752..22.791 rows=100 loops=1) Group Key: facts.fk - Nested Loop (cost=167.01..842.63 rows=10038 width=15) (actual time=2.167..16.739 rows=9807 loops=1) - HashAggregate (cost=166.59..167.59 rows=100 width=4) (actual time=2.153..2.193 rows=100 loops=1) Group Key: random_fk_dupl.fk - Seq Scan on random_fk_dupl (cost=0.00..142.07 rows=9807 width=4) (actual time=0.003..0.688 rows=9807 loops=1) - Index Scan using facts_fk_idx on facts (cost=0.42..5.75 rows=100 width=15) (actual time=0.009..0.117 rows=98 loops=100) Index Cond: (fk = random_fk_dupl.fk) Planning time: 0.372 ms Execution time: 22.842 ms This is the plan I would expect given the estimates (quite good except the last aggregation) for this query and it exactly corresponds to what's written in the README of optimizer: {quote} The naive way to join two relations using a clause like WHERE A.X = B.Y is to generate a nestloop plan like this: NestLoop Filter: A.X = B.Y - Seq Scan on A - Seq Scan on B We can make this better by using a merge or hash join, but it still requires scanning all of both input relations. If A is very small and B is very large, but there is an index on B.Y, it can be enormously better to do something like this: NestLoop - Seq Scan on A - Index Scan using B_Y_IDX on B Index Condition: B.Y = A.X Here, we are expecting that for each row scanned from A, the nestloop plan node will pass down the current value of A.X into the scan of B. That allows the indexscan to treat A.X as a constant for any one invocation, and thereby use it as an index key. This is the only plan type that can avoid fetching all of B, and for small numbers of rows coming from A, that will dominate every other consideration. 
(As A gets larger, this gets less attractive, and eventually a merge or hash join will win instead. So we have to cost out all the alternatives to decide what to do.) {quote} So far so good. Now let's use 'random_fk_uniq' instead of 'random_fk_dupl'. I would expect almost the same plan (with just a cheaper inner HashAggregate), because the optimizer knows that there are (at most) 100 values in 'random_fk_uniq', so a nested loop is clearly the best choice. Instead we get this: david=# explain analyze select fk, sum(f) from facts where fk in (select fk from random_fk_uniq) group by 1; QUERY PLAN -- HashAggregate (cost=18198.11..18322.64 rows=9962 width=15) (actual time=163.738..163.773 rows=100 loops=1) Group Key: facts.fk - Hash Semi Join (cost=3.25..18147.92 rows=10038 width=15) (actual time=0.298..160.271 rows=9807 loops=1) Hash Cond: (facts.fk = random_fk_uniq.fk) - Seq Scan on facts (cost=0.00..15408.00 rows=100 width=15) (actual time=0.010..66.524 rows=100 loops=1) - Hash (cost=2.00..2.00 rows=100 width=4) (actual time=0.079..0.079 rows=100 loops=1) Buckets: 1024 Batches: 1 Memory Usage: 4kB - Seq Scan on random_fk_uniq (cost=0.00..2.00 rows=100 width=4) (actual time=0.003..0.028 rows=100 loops=1) Planning time:
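To make the README's intuition concrete, here is a toy cost sketch. This is not PostgreSQL's actual cost model; the per-probe cost loosely echoes the index scan cost from the first plan, and the big-table row count and per-row costs are hypothetical numbers chosen for illustration:

```python
# Toy illustration of the trade-off described in the optimizer README
# (hypothetical numbers, not PostgreSQL's real cost arithmetic): a nested
# loop with an inner index scan does work proportional to the outer row
# count, while a hash join must scan the entire big table regardless.

def nestloop_cost(outer_rows, per_probe_cost):
    # one index descent plus heap fetches per outer row
    return outer_rows * per_probe_cost

def hashjoin_cost(big_table_rows, per_row_cost, hash_build_cost):
    # full sequential scan of the big table plus building the hash
    return big_table_rows * per_row_cost + hash_build_cost

# With only 100 outer values, the nested loop is far cheaper...
print(nestloop_cost(100, 5.75))               # 575.0
# ...than scanning a hypothetical million-row 'facts' row by row
print(hashjoin_cost(1_000_000, 0.015, 3.25))  # roughly 15003
```

The point of the thread is exactly this gap: with a provably small outer relation, the nested loop side should win, yet the planner picks the hash semi join.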
Re: [HACKERS] autogenerated column names + views are a dump hazard
On 2015-03-03 09:39:03 -0500, Tom Lane wrote: Andres Freund and...@2ndquadrant.com writes: For this case it seems easiest if we'd make get_rule_expr() (and some of its helpers) return whether an implicit cast has been added. Aside from being pretty ugly, that doesn't seem particularly bulletproof. Right. It might deal all right with this specific manifestation, but now that we've seen the problem, who's to say that there aren't other ways to get bit? Or that some innocent future change to FigureColname() might not create a new one? It's not like the behavior of that function was handed down on stone tablets --- we do change it from time to time. Sure it'd not be a guarantee, but what is? I wondered whether there was a way to directly test whether FigureColname() gives a different result from what we have recorded as the effective column name. Unfortunately, it wants a raw parse tree whereas what we have is an analyzed parse tree, so there's no easy way to apply it and see. Additionally the casts added by get_rule_expr() show up in neither tree, so that'd not in itself help. Now, we do have the deparsed text of the column expression at hand, so in principle you could run that back through the grammar to get a raw parse tree, and then apply FigureColname() to that. Not sure what that would be like from a performance standpoint, but it would provide a pretty robust fix for this class of problem. Yea, I've wondered about that as well. Given that this is pretty much only run during deparsing I don't think the overhead would be relevant. And on the third hand ... doing that would really only be robust as long as you assume that the output will be read by a server using exactly the same FigureColname() logic as what we are using. So maybe the whole idea is a bad one, and we should just bite the bullet and print AS clauses always. I think this is the way to go though. There's different extremes we can go to though - the easiest is to simply remove the attname = ?column? 
assignment from get_target_list(). That way plain Var references (aside from whole row vars/subplans and such that get_variable() deals with) don't get a forced alias, but everything else does. That seems like a good compromise between readability and safety. get_rule_expr() deals with most of the nasty stuff. I did this, and the noise in the regression test output isn't too bad. Primarily that a fair number of EXISTS(SELECT 1 .. ) grow an alias. Or we could consider printing them always if the pretty flag isn't on. IIRC, pg_dump doesn't enable that. Not a fan of that, I'd rather not output queries that can't be executed. Greetings, Andres Freund -- Andres Freund http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] pg_upgrade and rsync
On Tue, Mar 3, 2015 at 04:55:56PM +0300, Vladimir Borodin wrote: OK, hmmm. Thanks for testing. It feels like you didn't have your new master set up for streaming replication when you ran pg_upgrade. Is that correct? Should I specify that specifically in the instructions? After running pg_upgrade I do put the old pg_hba.conf and postgresql.conf, with wal_level = hot_standby, into the new PGDATA on the master. The full content of postgresql.conf can be seen here: http://pastie.org/9995902. Then I rsync to the replica, put recovery.conf in place, and try to start both - first the master, then the replica. If I turn off hot_standby in the replica configuration, it starts. What am I doing wrong? After running initdb to create the new master, but before running pg_upgrade, modify the new master's postgresql.conf and set wal_level = hot_standby. (Don't modify pg_hba.conf at this stage.) I didn't think that was necessary; it might be some 9.3-specific problem, but let's get it working first. -- Bruce Momjian br...@momjian.us http://momjian.us EnterpriseDB http://enterprisedb.com + Everyone has their own god. +
Re: [HACKERS] INSERT ... ON CONFLICT UPDATE and logical decoding
On 2015-02-25 14:04:55 -0800, Peter Geoghegan wrote: On Wed, Feb 25, 2015 at 3:26 AM, Andres Freund and...@2ndquadrant.com wrote: I'm pretty sure this will entirely fail if you have a transaction that's large enough to spill to disk. Calling ReorderBufferIterTXNNext() will reuse the memory for the in-memory changes. It's also borked because it skips a large number of checks for the next change. You're entirely disregarding what the routine does otherwise - it could be a toast record, it could be a change to a system table, it could be a new snapshot... Also, this seems awfully fragile in the presence of toast rows and such? Can we be entirely sure that the next WAL record logged by that session would be the internal super deletion? toast_save_datum() is called with a heap_insert() call before heap insertion for the tuple proper. We're relying on the assumption that if there is no immediate super deletion record, things are fine. We cannot speculatively insert into catalogs, BTW. I fail to see what prohibits, e.g., another catalog-modifying transaction from committing in between and distributing a new snapshot. Why should we see a REORDER_BUFFER_CHANGE_INTERNAL_COMMAND_ID case next, in the case that we need to care about (when the tuple was super deleted immediately afterwards)? It's irrelevant whether you care about it or not. Your ReorderBufferIterTXNNext() consumes a change that needs to actually be processed. It could very well be something important like a new snapshot. As for the idea of writing WAL separately, I don't think it's worth it. We'd need to go and exclusive-lock the buffer again, which seems like a non-starter. Meh^2. That won't be the most significant cost of an UPSERT. While what I have here *is* very ugly, I see no better alternative. By doing what you suggest, we'd be finding a special-case way of safely violating the general WAL log-before-data rule. No, it won't. We don't WAL log modifications that don't need to persist in a bunch of places.
It'd be perfectly fine to start upsert with an 'acquiring an insertion lock' record that pretty much only contains the item pointer (and an FPI if necessary to prevent torn pages). And then only log the full new record once it's sure that the whole thing needs to survive. Redo then can simply clean out the space touched by the insertion lock. Performance aside, that seems like a worse, less isolated kludge...no? No. I'm not sure it'll come up significantly better, but I don't see it coming up much worse. WAL logging where the result of the WAL-logged action essentially can't be determined until many records later isn't pretty. Greetings, Andres Freund -- Andres Freund http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services
Re: [HACKERS] pg_upgrade and rsync
On March 2, 2015, at 21:28, Bruce Momjian br...@momjian.us wrote: On Tue, Feb 24, 2015 at 12:13:17PM +0300, Vladimir Borodin wrote: On February 20, 2015, at 18:21, Bruce Momjian br...@momjian.us wrote: On Fri, Feb 20, 2015 at 09:45:08AM -0500, Bruce Momjian wrote: #3 bothered me as well because it was not specific enough. I like what you've added to clarify the procedure. Good. It took me a while to understand why they have to be in sync --- because we are using rsync in size-only-comparison mode, if they are not in sync we might update some files whose sizes changed, but not others, and the old slave would be broken. The new slave is going to get all new files or hard links for user files, so it would be fine, but we should be able to fall back to the old slaves, and having them in sync allows that. Also, since there was concern about the instructions, I am thinking of applying the patch only to head for 9.5, and then blog about it if people want to test it. Am I right that if you are using hot standby with both streaming replication and WAL shipping you do still need to take a full backup of the master after using pg_upgrade? No, you would not need to take a full backup if you use these instructions. Although it would be applied to documentation for 9.5 only, are these instructions applicable for upgrading from 9.3.6 to 9.4.1? Following the instructions from the patch, I got the following errors in the postgresql.log of the replica after trying to start it with hot_standby = on: 2015-02-24 11:47:22.861 MSK WARNING: WAL was generated with wal_level=minimal, data may be missing 2015-02-24 11:47:22.861 MSK HINT: This happens if you temporarily set wal_level=minimal without taking a new base backup. 2015-02-24 11:47:22.861 MSK FATAL: hot standby is not possible because wal_level was not set to hot_standby or higher on the master server 2015-02-24 11:47:22.861 MSK HINT: Either set wal_level to hot_standby on the master, or turn off hot_standby here.
2015-02-24 11:47:22.862 MSK LOG: startup process (PID 28093) exited with exit code 1 2015-02-24 11:47:22.862 MSK LOG: aborting startup due to startup process failure -- Bruce Momjian br...@momjian.us http://momjian.us EnterpriseDB http://enterprisedb.com + Everyone has their own god. + -- May the force be with you… https://simply.name/ru
Re: [HACKERS] Join push-down support for foreign tables
Thanks for the detailed comments. 2015-03-03 18:01 GMT+09:00 Kouhei Kaigai kai...@ak.jp.nec.com: Hanada-san, I checked the patch; below are random comments from my side. * Context variables --- Sorry, I might have given you a wrong suggestion. The foreign_glob_cxt and deparse_expr_cxt were re-defined as follows: typedef struct foreign_glob_cxt { PlannerInfo *root; /* global planner state */ - RelOptInfo *foreignrel; /* the foreign relation we are planning + RelOptInfo *outerrel; /* the foreign relation, or outer child + RelOptInfo *innerrel; /* inner child, only set for join */ } foreign_glob_cxt; /* @@ -86,9 +89,12 @@ typedef struct deparse_expr_cxt { PlannerInfo *root; /* global planner state */ - RelOptInfo *foreignrel; /* the foreign relation we are planning + RelOptInfo *outerrel; /* the foreign relation, or outer child + RelOptInfo *innerrel; /* inner child, only set for join */ StringInfo buf;/* output buffer to append to */ List **params_list;/* exprs that will become remote Params + ForeignScan *outerplan; /* outer child's ForeignScan node */ + ForeignScan *innerplan; /* inner child's ForeignScan node */ } deparse_expr_cxt; However, the outerrel does not need to have a double meaning. RelOptInfo->reloptkind tells us whether the target relation is a base relation or a join relation. So, foreign_expr_walker() can be implemented as follows: if (bms_is_member(var->varno, glob_cxt->foreignrel->relids) && var->varlevelsup == 0) { : also, deparseVar() can check the relation type using: if (context->foreignrel->reloptkind == RELOPT_JOINREL) { deparseJoinVar(...); In addition, what we need in deparse_expr_cxt are the target lists of the outer and inner relations. How about adding inner_tlist/outer_tlist instead of innerplan and outerplan in deparse_expr_cxt? The deparseJoinVar() references these fields, but only the target lists. Ah, I've totally misunderstood your suggestion.
Now I reverted my changes and use target lists to know whether the var came from either of the relations. * GetForeignJoinPath method of FDW -- It should be portion of the interface patch, so I added these enhancement of FDW APIs with documentation updates. Please see the newer version of foreign/custom-join interface patch. Agreed. * enable_foreignjoin parameter -- I'm uncertain whether we should have this GUC parameter that affects to all FDW implementation. Rather than a built-in one, my preference is an individual GUC variable defined with DefineCustomBoolVariable(), by postgres_fdw. Pros: It offers user more flexible configuration. Cons: Each FDW has to implement this GUC by itself? Hum... In a sense, I added this GUC parameter for debugging purpose. As you pointed out, users might want to control join push-down feature per-FDW. I'd like to hear others' opinion. * syntax violated query if empty targetlist --- At least NULL shall be injected if no columns are referenced. Also, add a dummy entry to fdw_ps_tlist to fit slot tuple descriptor. postgres=# explain verbose select NULL from ft1,ft2 where aid=bid; QUERY PLAN --- Foreign Scan (cost=100.00..129.25 rows=2925 width=0) Output: NULL::unknown Remote SQL: SELECT FROM (SELECT bid, NULL FROM public.t2) l (a_0, a_1) INNER JOIN (SELECT aid, NULL FROM public.t1) r (a_0, a_1) ON ((r.a_0 = l.a_0)) Fixed. * Bug reported by Thom Brown - # EXPLAIN VERBOSE SELECT NULL FROM (SELECT people.id FROM people INNER JOIN countries ON people.country_id = countries.id LIMIT 3) x; ERROR: could not open relation with OID 0 Sorry, it was a problem caused by my portion. The patched setrefs.c checks fdw_/custom_ps_tlist to determine whether Foreign/CustomScan node is associated with a certain base relation. If *_ps_tlist is valid, it also expects scanrelid == 0. However, things we should check is incorrect. We may have a case with empty *_ps_tlist if remote join expects no columns. 
So, I adjusted the condition to check scanrelid instead. Is this issue fixed by the v5 custom/foreign join patch? * make_tuple_from_result_row() call The 4th argument (newly added) is referenced without NULL checks; however, store_returning_result() and analyze_row_processor() put NULL in this argument, and that leads to a segmentation fault. RelationGetDescr(fmstate->rel or astate->rel) should be given by the caller. Fixed.
Re: [HACKERS] Patch: raise default for max_wal_segments to 1GB
On Tue, Mar 3, 2015 at 4:25 PM, Heikki Linnakangas hlinn...@iki.fi wrote: On 03/03/2015 01:43 AM, Andres Freund wrote: On 2015-03-02 15:40:27 -0800, Josh Berkus wrote: ! #max_wal_size = 1GB # in logfile segments Independent of the rest of the changes, the in logfile segments bit should probably be changed. The base unit is still logfile segments, so if you write just max_wal_size = 10 without a unit, that means 160MB. That's what the comment means. Is it needed? Many settings have a similar comment, but most don't. Most of the ones that do have a magic value 0, -1 as default, so that it's not obvious from the default what the unit is. For example: #tcp_keepalives_idle = 0# TCP_KEEPIDLE, in seconds; # 0 selects the system default #tcp_keepalives_interval = 0# TCP_KEEPINTVL, in seconds; # 0 selects the system default #temp_file_limit = -1 # limits per-session temp file space # in kB, or -1 for no limit #commit_delay = 0 # range 0-10, in microseconds #log_temp_files = -1# log temporary files equal or larger # than the specified size in kilobytes; # -1 disables, 0 logs all temp files #statement_timeout = 0 # in milliseconds, 0 is disabled #lock_timeout = 0 # in milliseconds, 0 is disabled #wal_keep_segments = 0 # in logfile segments, 16MB each; 0 disables But there are also: #wal_receiver_timeout = 60s # time that receiver waits for # communication from master # in milliseconds; 0 disables #autovacuum_vacuum_cost_delay = 20ms# default vacuum cost delay for # autovacuum, in milliseconds; # -1 means use vacuum_cost_delay I propose that we remove the comment from max_wal_size, and also remove the in milliseconds from wal_receiver_timeout and autovacuum_vacuum_cost_delay. Seems OK to me. BTW, we can also remove in milliseconds from wal_sender_timeout. 
(and we should also change wal_keep_segments to accept MB/GB, as discussed already) +1 Regards, -- Fujii Masao
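The base-unit pitfall Heikki describes above (a bare max_wal_size = 10 meaning 10 logfile segments, i.e. 160MB) can be sketched as follows. This is a hypothetical parser written for illustration, not the actual GUC machinery:

```python
SEGMENT_MB = 16  # one WAL logfile segment is 16MB

def max_wal_size_mb(value: str) -> int:
    """Interpret a max_wal_size setting in MB.

    A bare number is taken in the base unit (logfile segments),
    which is exactly the surprise the comment was warning about;
    an explicit unit suffix is taken literally.
    """
    if value.endswith("GB"):
        return int(value[:-2]) * 1024
    if value.endswith("MB"):
        return int(value[:-2])
    return int(value) * SEGMENT_MB

print(max_wal_size_mb("10"))    # 160: 10 segments of 16MB each
print(max_wal_size_mb("10MB"))  # 10
print(max_wal_size_mb("1GB"))   # 1024
```

This is why the "# in logfile segments" comment carries real information for a bare default, and why dropping it is only safe once the shipped default spells out its unit ("1GB").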
Re: [HACKERS] New CF app deployment
On Sun, Feb 22, 2015 at 03:12:12PM -0500, Magnus Hagander wrote: # Attempt to identify the file using magic information mtype = mag.buffer(contents) if mtype.startswith('text/x-diff'): a.ispatch = True else: a.ispatch = False ... I think the old system where the patch submitter declared, this message contains my patch, is the only one that will work. Would you suggest removing the automated system completely, or keep it around and just make it possible to override it (either by removing the note that something is a patch, or by making something that's not listed as a patch become marked as such)? One counter-idea would be to assume every attachment is a patch _unless_ the attachment type matches a pattern that identifies it as not a patch. However, I agree with Tom that we should go a little longer before changing it. -- Bruce Momjian br...@momjian.us http://momjian.us EnterpriseDB http://enterprisedb.com + Everyone has their own god. +
Re: [HACKERS] autogenerated column names + views are a dump hazard
Andres Freund and...@2ndquadrant.com writes: On 2015-03-03 09:39:03 -0500, Tom Lane wrote: And on the third hand ... doing that would really only be robust as long as you assume that the output will be read by a server using exactly the same FigureColname() logic as what we are using. So maybe the whole idea is a bad one, and we should just bite the bullet and print AS clauses always. I think this is the way to go though. There's different extremes we can go to though - the easiest is to simply remove the attname = ?column? assignment from get_target_list(). That way plain Var references (aside from whole row vars/subplans and such that get_variable() deals with) don't get a forced alias, but everything else does. That seems like a good compromise between readability and safety. get_rule_expr() deals with most of the nasty stuff. I wasn't aware that there was any room for compromise on the safety aspect. If pg_dump gets this wrong, that means pg_upgrade is broken, for example. That's why I was thinking that relying on the pretty flag might be a reasonable thing. pg_dump would be unconditionally right, and we'd not uglify EXPLAIN or \d+ output. regards, tom lane
Re: [HACKERS] New CF app deployment
On Tue, Mar 3, 2015 at 10:58:28AM -0500, Bruce Momjian wrote: Would you suggest removing the automated system completely, or keep it around and just make it possible to override it (either by removing the note that something is a patch, or by making something that's not listed as a patch become marked as such)? One counter-idea would be to assume every attachment is a patch _unless_ the attachment type matches a pattern that identifies it as not a patch. However, I agree with Tom that we should go a little longer before changing it. Also, can we look inside the attachment to see if it starts with 'diff' followed by a space? -- Bruce Momjian br...@momjian.us http://momjian.us EnterpriseDB http://enterprisedb.com + Everyone has their own god. +
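A content-prefix check along the lines Bruce suggests might look like this. This is a hypothetical sketch; looks_like_patch and its prefix list are illustrative, not code from the commitfest app:

```python
def looks_like_patch(contents: bytes) -> bool:
    """Heuristic: treat an attachment as a patch if its first
    non-empty line starts like unified/context diff output.
    The prefix list is an illustrative guess, not the CF app's."""
    prefixes = (b"diff ", b"--- ", b"Index: ", b"*** ")
    for line in contents.splitlines():
        if line.strip():
            return line.startswith(prefixes)
    return False  # empty attachment: not a patch

print(looks_like_patch(b"diff --git a/x.c b/x.c\n--- a/x.c\n"))  # True
print(looks_like_patch(b"Hello,\nplease see the attachment.\n"))  # False
```

Compared with the libmagic approach quoted above, sniffing the content directly sidesteps misreported MIME types, at the cost of missing patches that start with a cover note.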
Re: [HACKERS] Maximum number of WAL files in the pg_xlog directory
On Tue, Oct 14, 2014 at 01:21:53PM -0400, Bruce Momjian wrote: On Tue, Oct 14, 2014 at 09:20:22AM -0700, Jeff Janes wrote: On Mon, Oct 13, 2014 at 12:11 PM, Bruce Momjian br...@momjian.us wrote: I looked into this, and came up with more questions. Why is checkpoint_completion_target involved in the total number of WAL segments? If checkpoint_completion_target is 0.5 (the default), the calculation is: (2 + 0.5) * checkpoint_segments + 1 while if it is 0.9, it is: (2 + 0.9) * checkpoint_segments + 1 Is this trying to estimate how many WAL files are going to be created during the checkpoint? If so, wouldn't it be (1 + checkpoint_completion_target), not 2 +? My logic is you have the old WAL files being checkpointed (that's the 1), plus you have new WAL files being created during the checkpoint, which would be checkpoint_completion_target * checkpoint_segments, plus one for the current WAL file. WAL is not eligible to be recycled until there have been 2 successful checkpoints. So at the end of a checkpoint, you have 1 cycle of WAL which has just become eligible for recycling, 1 cycle of WAL which is now expendable but which is kept anyway, and checkpoint_completion_target worth of WAL which has occurred while the checkpoint was occurring and is still needed for crash recovery. OK, so based on this analysis, what is the right calculation? This? (1 + checkpoint_completion_target) * checkpoint_segments + 1 + max(wal_keep_segments, checkpoint_segments) Now that we have min_wal_size and max_wal_size in 9.5, I don't see any value to figuring out the proper formula for backpatching. -- Bruce Momjian br...@momjian.us http://momjian.us EnterpriseDB http://enterprisedb.com + Everyone has their own god. +
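For concreteness, the two formulas in the exchange above can be compared side by side. The inputs (checkpoint_segments = 3, wal_keep_segments = 0 or 10) are arbitrary example values, not anything from the thread:

```python
# The formula in the pre-9.5 docs, as quoted above.
def old_formula(checkpoint_segments, completion_target):
    # (2 + checkpoint_completion_target) * checkpoint_segments + 1
    return (2 + completion_target) * checkpoint_segments + 1

# The corrected formula Bruce proposes at the end of the exchange.
def proposed_formula(checkpoint_segments, completion_target,
                     wal_keep_segments=0):
    # (1 + checkpoint_completion_target) * checkpoint_segments + 1
    #   + max(wal_keep_segments, checkpoint_segments)
    return ((1 + completion_target) * checkpoint_segments + 1
            + max(wal_keep_segments, checkpoint_segments))

for target in (0.5, 0.9):
    print(old_formula(3, target),
          proposed_formula(3, target, 0),
          proposed_formula(3, target, 10))
```

Notably, when wal_keep_segments is at most checkpoint_segments the two formulas agree (the max() term collapses to checkpoint_segments); they diverge only once wal_keep_segments dominates, which is exactly the case the old formula failed to account for.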
Re: [HACKERS] autogenerated column names + views are a dump hazard
On 2015-03-03 11:09:29 -0500, Tom Lane wrote: Andres Freund and...@2ndquadrant.com writes: On 2015-03-03 09:39:03 -0500, Tom Lane wrote: I think this is the way to go though. There's different extremes we can go to though - the easiest is to simply remove the attname = ?column? assignment from get_target_list(). That way plain Var references (aside from whole row vars/subplans and such that get_variable() deals with) don't get a forced alias, but everything else does. That seems like a good compromise between readability and safety. get_rule_expr() deals with most of the nasty stuff. I wasn't aware that there was any room for compromise on the safety aspect. Meh. I think that *not* printing an alias for a plain column reference is reasonable and doesn't present a relevant risk even in the face of future changes. foo.bar AS bar is just a bit too redundant imo. That's why I was thinking that relying on the pretty flag might be a reasonable thing. pg_dump would be unconditionally right, and we'd not uglify EXPLAIN or \d+ output. I don't think EXPLAIN VERBOSE currently is affected - it doesn't seem to show aliases/generated column names at all. I personally do occasionally copy-paste from \d+ output, so I'd rather have slightly more verbose output rather than wrong output. Greetings, Andres Freund -- Andres Freund http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services
Re: [HACKERS] Join push-down support for foreign tables
On 3 March 2015 at 12:34, Shigeru Hanada shigeru.han...@gmail.com wrote: I rebased the join push-down patch onto Kaigai-san's Custom/Foreign Join v6 patch. I posted some comments on the v6 patch in this post: http://www.postgresql.org/message-id/CAEZqfEcNvjqq-P=jxnw1pb4t9wvpcporcn7g6cc46jgub7d...@mail.gmail.com Before applying my v3 patch, please apply Kaigai-san's v6 patch and my mod_cjv6.patch. Sorry for the complex patch combination. Those patches will be arranged soon by Kaigai-san and me. I fixed the issues pointed out by Thom and Kohei, but the patch still has an issue with joins underlying UPDATE or DELETE. Now I'm working on fixing this issue. Aside from this issue, the existing regression tests passed. Re-tested the broken query and it works for me now. -- Thom
Re: [HACKERS] pg_upgrade and rsync
On Tue, Mar 3, 2015 at 08:38:50AM -0500, Bruce Momjian wrote: 2015-02-24 11:47:22.861 MSK WARNING: WAL was generated with wal_level=minimal, data may be missing 2015-02-24 11:47:22.861 MSK HINT: This happens if you temporarily set wal_level=minimal without taking a new base backup. 2015-02-24 11:47:22.861 MSK FATAL: hot standby is not possible because wal_level was not set to hot_standby or higher on the master server 2015-02-24 11:47:22.861 MSK HINT: Either set wal_level to hot_standby on the master, or turn off hot_standby here. 2015-02-24 11:47:22.862 MSK LOG: startup process (PID 28093) exited with exit code 1 2015-02-24 11:47:22.862 MSK LOG: aborting startup due to startup process failure OK, hmmm. Thanks for testing. It feels like you didn't have your new master set up for streaming replication when you ran pg_upgrade. Is that correct? Should I specify that specifically in the instructions? Actually, I think you are on to something that needs to be documented. Because the old and new clusters might be using the same port number, you can't configure the new master to use streaming replication because you can't be shipping those logs to the old standby. Yikes. OK, I think we need to document that you need to set wal_level=hot_standby on the new master, but not set up streaming. Once you are done with the upgrade, you should configure streaming. If this fixes the problem, I will generate an updated documentation patch. Please let me know. -- Bruce Momjian br...@momjian.us http://momjian.us EnterpriseDB http://enterprisedb.com + Everyone has their own god. +
Re: [HACKERS] pg_upgrade and rsync
On Tue, Mar 3, 2015 at 11:38:58AM +0300, Vladimir Borodin wrote: No, you would not need to take a full backup if you use these instructions. Although it would be applied to documentation for 9.5 only, are these instructions applicable for upgrading from 9.3.6 to 9.4.1? Yes. They work all the way back to 9.0. Following the instructions from the patch, I got the following errors in the postgresql.log of the replica after trying to start it with hot_standby = on: 2015-02-24 11:47:22.861 MSK WARNING: WAL was generated with wal_level=minimal, data may be missing 2015-02-24 11:47:22.861 MSK HINT: This happens if you temporarily set wal_level=minimal without taking a new base backup. 2015-02-24 11:47:22.861 MSK FATAL: hot standby is not possible because wal_level was not set to hot_standby or higher on the master server 2015-02-24 11:47:22.861 MSK HINT: Either set wal_level to hot_standby on the master, or turn off hot_standby here. 2015-02-24 11:47:22.862 MSK LOG: startup process (PID 28093) exited with exit code 1 2015-02-24 11:47:22.862 MSK LOG: aborting startup due to startup process failure OK, hmmm. Thanks for testing. It feels like you didn't have your new master set up for streaming replication when you ran pg_upgrade. Is that correct? Should I specify that specifically in the instructions? -- Bruce Momjian br...@momjian.us http://momjian.us EnterpriseDB http://enterprisedb.com + Everyone has their own god. +
Re: [HACKERS] Idea: closing the loop for pg_ctl reload
On 3/3/15 9:26 AM, Andres Freund wrote: On 2015-03-03 15:21:24 +, Greg Stark wrote: Fwiw this concerns me slightly. I'm sure a lot of people are doing things like kill -HUP `cat .../postmaster.pid` or the equivalent. postmaster.pid already contains considerably more than just the pid. e.g. 4071 /srv/dev/pgdev-master 1425396089 5440 /tmp localhost 5440001 82345992 If we already have all this extra stuff, why not include an actual error message then, or at least the first line of an error (or maybe just swap any newlines with spaces)? -- Jim Nasby, Data Architect, Blue Treble Consulting Data in Trouble? Get it in Treble! http://BlueTreble.com
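Given the layout Andres quotes (the PID on the first line, with data directory, start time, port, socket directory, listen address, and shared memory details after it), a script that wants only the PID should parse just the first line; appending an extra line, as proposed in this thread, then leaves such scripts unaffected. A hypothetical sketch:

```python
# Illustrative parsing of a postmaster.pid-style file (the sample content
# mirrors the example quoted above; this is not PostgreSQL source code).
sample = """4071
/srv/dev/pgdev-master
1425396089
5440
/tmp
localhost
  5440001  82345992
"""

def postmaster_pid(contents: str) -> int:
    # Only the first line is the PID; later lines, including any line a
    # future version might append at the end, must be ignored -- the
    # programmatic equivalent of `head -1` rather than `cat`.
    return int(contents.splitlines()[0])

print(postmaster_pid(sample))  # 4071
```

This is why Tom's point holds: as long as additions go at the end of the file, first-line readers keep working.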
Re: [HACKERS] Idea: closing the loop for pg_ctl reload
On March 3, 2015 10:29:43 AM Tom Lane wrote: Andres Freund and...@2ndquadrant.com writes: On 2015-03-03 15:21:24 +, Greg Stark wrote: Fwiw this concerns me slightly. I'm sure a lot of people are doing things like kill -HUP `cat .../postmaster.pid` or the equivalent. postmaster.pid already contains considerably more than just the pid. e.g. Yeah, that ship sailed long ago. I'm sure anyone who's doing this is using head -1 not just cat, and what I suggest wouldn't break that. regards, tom lane And what I've implemented doesn't either. The extra line is added at the end of the file.
Re: [HACKERS] Idea: closing the loop for pg_ctl reload
On March 3, 2015 11:09:29 AM Jim Nasby wrote: On 3/3/15 9:26 AM, Andres Freund wrote: On 2015-03-03 15:21:24 +, Greg Stark wrote: Fwiw this concerns me slightly. I'm sure a lot of people are doing things like kill -HUP `cat .../postmaster.pid` or the equivalent. postmaster.pid already contains considerably more than just the pid. e.g. 4071 /srv/dev/pgdev-master 1425396089 5440 /tmp localhost 5440001 82345992 If we already have all this extra stuff, why not include an actual error message then, or at least the first line of an error (or maybe just swap any newlines with spaces)? Not impossible. I can play around with that and see if it's as straightforward as I think it is. -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] logical column ordering
On Thu, Feb 26, 2015 at 01:55:44PM -0800, Josh Berkus wrote: On 02/26/2015 01:54 PM, Alvaro Herrera wrote: This patch decouples these three things so that they can be changed freely -- but provides no user interface to do so. I think that trying to only decouple the thing we currently have in two pieces, and then have a subsequent patch to decouple again, is additional conceptual complexity for no gain. Oh, I didn't realize there weren't commands to change the LCO. Without at least SQL syntax for LCO, I don't see why we'd take it; this sounds more like a WIP patch. FYI, pg_upgrade is going to need pg_dump --binary-upgrade to output the columns in physical order with some logical ordering information, i.e. pg_upgrade cannot be passed only logical ordering from pg_dump. -- Bruce Momjian br...@momjian.us http://momjian.us EnterpriseDB http://enterprisedb.com + Everyone has their own god. +
Re: [HACKERS] logical column ordering
On 3/3/15 11:23 AM, Bruce Momjian wrote: On Thu, Feb 26, 2015 at 01:55:44PM -0800, Josh Berkus wrote: On 02/26/2015 01:54 PM, Alvaro Herrera wrote: This patch decouples these three things so that they can changed freely -- but provides no user interface to do so. I think that trying to only decouple the thing we currently have in two pieces, and then have a subsequent patch to decouple again, is additional conceptual complexity for no gain. Oh, I didn't realize there weren't commands to change the LCO. Without at least SQL syntax for LCO, I don't see why we'd take it; this sounds more like a WIP patch. FYI, pg_upgrade is going to need pg_dump --binary-upgrade to output the columns in physical order with some logical ordering information, i.e. pg_upgrade cannot be passed only logical ordering from pg_dump. Wouldn't it need attno info too, so all 3 orderings? -- Jim Nasby, Data Architect, Blue Treble Consulting Data in Trouble? Get it in Treble! http://BlueTreble.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] logical column ordering
On Tue, Mar 3, 2015 at 11:24:38AM -0600, Jim Nasby wrote: On 3/3/15 11:23 AM, Bruce Momjian wrote: On Thu, Feb 26, 2015 at 01:55:44PM -0800, Josh Berkus wrote: On 02/26/2015 01:54 PM, Alvaro Herrera wrote: This patch decouples these three things so that they can changed freely -- but provides no user interface to do so. I think that trying to only decouple the thing we currently have in two pieces, and then have a subsequent patch to decouple again, is additional conceptual complexity for no gain. Oh, I didn't realize there weren't commands to change the LCO. Without at least SQL syntax for LCO, I don't see why we'd take it; this sounds more like a WIP patch. FYI, pg_upgrade is going to need pg_dump --binary-upgrade to output the columns in physical order with some logical ordering information, i.e. pg_upgrade cannot be passed only logical ordering from pg_dump. Wouldn't it need attno info too, so all 3 orderings? Uh, what is the third ordering? Physical, logical, and ? It already gets information about dropped columns, if that is the third one. -- Bruce Momjian br...@momjian.ushttp://momjian.us EnterpriseDB http://enterprisedb.com + Everyone has their own god. + -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Idea: closing the loop for pg_ctl reload
On 3/3/15 11:15 AM, Jan de Visser wrote: On March 3, 2015 11:09:29 AM Jim Nasby wrote: On 3/3/15 9:26 AM, Andres Freund wrote: On 2015-03-03 15:21:24 +, Greg Stark wrote: Fwiw this concerns me slightly. I'm sure a lot of people are doing things like kill -HUP `cat .../postmaster.pid` or the equivalent. postmaster.pid already contains considerably more than just the pid. e.g. 4071 /srv/dev/pgdev-master 1425396089 5440 /tmp localhost 5440001 82345992 If we already have all this extra stuff, why not include an actual error message then, or at least the first line of an error (or maybe just swap any newlines with spaces)? Not impossible. I can play around with that and see if it's as straightforward as I think it is. I'm sure the code side of this is trivial; it's a question of why Tom was objecting. It would probably be better for us to come to a conclusion before working on this. On a related note... something else we could do here would be to keep a last-known-good copy of the config files around. That way if you flubbed something at least the server would still start. I do think that any admin worth anything would notice an error from pg_ctl, but maybe others have a different opinion. -- Jim Nasby, Data Architect, Blue Treble Consulting Data in Trouble? Get it in Treble! http://BlueTreble.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Partitioning WIP patch
On Fri, Feb 27, 2015 at 09:09:35AM +0900, Amit Langote wrote: On 27-02-2015 AM 03:24, Andres Freund wrote: On 2015-02-26 12:15:17 +0900, Amit Langote wrote: On 26-02-2015 AM 05:15, Josh Berkus wrote: I would love to have it for 9.5, but I guess the patch isn't nearly baked enough for that? I'm not quite sure what would qualify as baked enough for 9.5 though we can surely try to reach some consensus on various implementation aspects and perhaps even get it ready in time for 9.5. I think it's absolutely unrealistic to get this into 9.5. There's barely been any progress on the current (last!) commitfest - where on earth should the energy come to make this patch ready? And why would that be fair against all the others that have submitted in time? I realize and I apologize that it was irresponsible of me to have said that; maybe got a bit too excited. I do not want to unduly draw people's time on something that's not quite ready while there are other things people have worked hard on to get in time. In all earnestness, I say we spend time perfecting those things. I'll add this into CF-JUN'15. I will keep posting updates meanwhile so that when that commitfest finally starts, we will have something worth considering. I am _very_ glad you have started on this. There is a huge need for this, and I am certainly excited about it. -- Bruce Momjian br...@momjian.ushttp://momjian.us EnterpriseDB http://enterprisedb.com + Everyone has their own god. + -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Normalizing units for postgresql.conf WAS: Patch: raise default for max_wal_segments to 1GB
On 03/03/2015 10:37 AM, Heikki Linnakangas wrote: On 03/03/2015 08:31 PM, Josh Berkus wrote: On 03/03/2015 10:29 AM, Heikki Linnakangas wrote: On 03/03/2015 08:21 PM, Josh Berkus wrote: On 03/03/2015 10:15 AM, Josh Berkus wrote: On 03/02/2015 11:25 PM, Heikki Linnakangas wrote: I propose that we remove the comment from max_wal_size, and also remove the in milliseconds from wal_receiver_timeout and autovacuum_vacuum_cost_delay. +1 Actually, let's be consistent about this. It makes no sense to remove unit comments from some settings which accept ms but not others. Do we want to remove unit comments from all settings which accept MB,GB or ms,s,min? There's more than a few. I'd be in favor of this, but seems like (a) it should be universal, and (b) its own patch. I think it's a good rule that if the commented-out default in the sample file does not contain a unit, then the base unit is in the comment. Otherwise it's not. For example: #shared_buffers = 32MB  # min 128kB # (change requires restart) The base unit is BLCKSZ, i.e. 8k, but people will usually use MB/GB. And that is evident from the default value, 32MB, so there's no need to mention it in the comment. #tcp_keepalives_idle = 0  # TCP_KEEPIDLE, in seconds; # 0 selects the system default Here it's not obvious what the unit should be from the default itself. So the comment says it's in seconds. Sure. Although, do we take (s) for tcp_keepalives_idle? Or only an INT? It's a time unit, so you can say 10s or 1ms. If you don't specify a unit, it implies seconds. So if we're going to make this consistent, let's make it consistent. 1. All GUCs which accept time/size units will have them on the default setting. 2. Time/size comments will be removed, *except* from GUCs which do not accept (ms/s/min) or (kB/MB/GB). Argument for: the current inconsistency confuses new users and is entirely historical in nature. Argument against: will create unnecessary diff changes between 9.4's pg.conf and 9.5's pg.conf. 
-- Josh Berkus PostgreSQL Experts Inc. http://pgexperts.com
Re: [HACKERS] Comparing primary/HS standby in tests
On 03/03/2015 07:49 AM, Andres Freund wrote: I'd very much like to add a automated test like this to the tree, but I don't see wa way to do that sanely without a comparison tool... We could use a comparison tool anyway. Baron Schwartz was pointing out that Percona has a comparison tool for MySQL, and the amount of drift and corruption that they find in a large replication cluster is generally pretty alarming, and *always* present. While our replication isn't as flaky as MySQL's, networks are often lossy or corrupt. -- Josh Berkus PostgreSQL Experts Inc. http://pgexperts.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Patch: raise default for max_wal_segments to 1GB
On 03/03/2015 08:21 PM, Josh Berkus wrote: On 03/03/2015 10:15 AM, Josh Berkus wrote: On 03/02/2015 11:25 PM, Heikki Linnakangas wrote: I propose that we remove the comment from max_wal_size, and also remove the in milliseconds from wal_receiver_timeout and autovacuum_vacuum_cost_delay. +1 Actually, let's be consistent about this. It makes no sense to remove unit comments from some settings which accept ms but not others. Do we want to remove unit comments from all settings which accept MB,GB or ms,s,min? There's more than a few. I'd be in favor of this, but seems like (a) it should be universal, and (b) its own patch. I think it's a good rule that if the commented-out default in the sample file does not contain a unit, then the base unit is in the comment. Otherwise it's not. For example: #shared_buffers = 32MB # min 128kB # (change requires restart) The base unit is BLCKSZ, i.e. 8k, but usually people will usually use MB/GB. And that is evident from the default value, 32MB, so there's no need to mention it in the comment. #tcp_keepalives_idle = 0# TCP_KEEPIDLE, in seconds; # 0 selects the system default Here it's not obvious what the unit should be from the default itself. So the comment says it's in seconds. - Heikki -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Providing catalog view to pg_hba.conf file - Patch submission
On 3/3/15 9:08 AM, Greg Stark wrote: On Mon, Jun 30, 2014 at 8:06 AM, Abhijit Menon-Sen a...@2ndquadrant.com wrote: After sleeping on it, I realised that the code would return '{all}' for all in pg_hba.conf, but '{"all"}' for "all". So it's not exactly ambiguous, but I don't think it's especially useful for callers. Hm. Nope, it doesn't. It just says {all} regardless of whether all is quoted or not. This makes sense; the rules for when to quote things in pg_hba and when to quote things for arrays are separate. And we definitely don't want to start adding quotes to every token that the parser noted was quoted, because then you'll start seeing things like {\"all\"} and {\"database with space\"} which won't make it any easier to match things. I'm not sure adding a separate column is really the solution either. You can have things like all,databasename (which is the keyword all, not a database named all). Or more likely something like sameuser,bob or even things like replication,all. What about a separate column that's just the text from pg_hba? Or is that what you're opposed to? FWIW, I'd say that having the individual array elements be correct is more important than what the result of array_out is. That way you could always do array_to_string(..., ', ') and get valid pg_hba output. -- Jim Nasby, Data Architect, Blue Treble Consulting Data in Trouble? Get it in Treble! http://BlueTreble.com
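The quoting problem being debated can be made concrete with a toy sketch. This is not the actual pg_hba parser; the keyword set and quoting rule below are simplified assumptions, only meant to show why rebuilding a pg_hba.conf field from plain array elements needs keyword-aware re-quoting.

```python
# Illustrative sketch: in pg_hba.conf, tokens like "all" are keywords only
# when unquoted, so reassembling a field from array elements must re-add
# quotes for any literal name that collides with a keyword or contains
# special characters. (Simplified rule, not the real parser.)
PG_HBA_KEYWORDS = {"all", "sameuser", "samerole", "replication"}

def quote_hba_token(token: str, is_keyword: bool = False) -> str:
    needs_quotes = (not is_keyword) and (
        token in PG_HBA_KEYWORDS or " " in token or "," in token
    )
    return f'"{token}"' if needs_quotes else token

def hba_field(tokens):
    """Rebuild one pg_hba.conf field from (token, is_keyword) pairs."""
    return ",".join(quote_hba_token(t, kw) for t, kw in tokens)

# The keyword all vs. a database literally named "all":
print(hba_field([("all", True)]))             # all
print(hba_field([("all", False)]))            # "all"
print(hba_field([("db with space", False)]))  # "db with space"
print(hba_field([("sameuser", True), ("bob", False)]))  # sameuser,bob
```

This is exactly the knowledge Greg argues belongs in whatever generates pg_hba.conf, rather than being baked into the view's array elements.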
Re: [HACKERS] Patch: raise default for max_wal_segments to 1GB
On 03/03/2015 10:15 AM, Josh Berkus wrote: On 03/02/2015 11:25 PM, Heikki Linnakangas wrote: I propose that we remove the comment from max_wal_size, and also remove the in milliseconds from wal_receiver_timeout and autovacuum_vacuum_cost_delay. +1 Actually, let's be consistent about this. It makes no sense to remove unit comments from some settings which accept ms but not others. Do we want to remove unit comments from all settings which accept MB,GB or ms,s,min? There's more than a few. I'd be in favor of this, but seems like (a) it should be universal, and (b) its own patch. -- Josh Berkus PostgreSQL Experts Inc. http://pgexperts.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Patch: raise default for max_wal_segments to 1GB
On 03/03/2015 08:31 PM, Josh Berkus wrote: On 03/03/2015 10:29 AM, Heikki Linnakangas wrote: On 03/03/2015 08:21 PM, Josh Berkus wrote: On 03/03/2015 10:15 AM, Josh Berkus wrote: On 03/02/2015 11:25 PM, Heikki Linnakangas wrote: I propose that we remove the comment from max_wal_size, and also remove the in milliseconds from wal_receiver_timeout and autovacuum_vacuum_cost_delay. +1 Actually, let's be consistent about this. It makes no sense to remove unit comments from some settings which accept ms but not others. Do we want to remove unit comments from all settings which accept MB,GB or ms,s,min? There's more than a few. I'd be in favor of this, but seems like (a) it should be universal, and (b) its own patch. I think it's a good rule that if the commented-out default in the sample file does not contain a unit, then the base unit is in the comment. Otherwise it's not. For example: #shared_buffers = 32MB# min 128kB # (change requires restart) The base unit is BLCKSZ, i.e. 8k, but usually people will usually use MB/GB. And that is evident from the default value, 32MB, so there's no need to mention it in the comment. #tcp_keepalives_idle = 0# TCP_KEEPIDLE, in seconds; # 0 selects the system default Here it's not obvious what the unit should be from the default itself. So the comment says it's in seconds. Sure. Although, do we take (s) for tcp_keepalives_idle? Or only an INT? It's a time unit, so you can say 10s or 1ms. If you don't specify a unit, it implies seconds. - Heikki -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Patch: raise default for max_wal_segments to 1GB
Naive question: would it be /possible/ to change configuration to accept percentages, and have a percentage mean a share of existing RAM, resolved at startup time? I ask because most of the tuning guidelines I see suggest setting memory parameters as a % of RAM available. On Tue, Mar 3, 2015 at 1:29 PM, Heikki Linnakangas hlinn...@iki.fi wrote: On 03/03/2015 08:21 PM, Josh Berkus wrote: On 03/03/2015 10:15 AM, Josh Berkus wrote: On 03/02/2015 11:25 PM, Heikki Linnakangas wrote: I propose that we remove the comment from max_wal_size, and also remove the in milliseconds from wal_receiver_timeout and autovacuum_vacuum_cost_delay. +1 Actually, let's be consistent about this. It makes no sense to remove unit comments from some settings which accept ms but not others. Do we want to remove unit comments from all settings which accept MB,GB or ms,s,min? There's more than a few. I'd be in favor of this, but seems like (a) it should be universal, and (b) its own patch. I think it's a good rule that if the commented-out default in the sample file does not contain a unit, then the base unit is in the comment. Otherwise it's not. For example: #shared_buffers = 32MB # min 128kB # (change requires restart) The base unit is BLCKSZ, i.e. 8k, but people will usually use MB/GB. And that is evident from the default value, 32MB, so there's no need to mention it in the comment. #tcp_keepalives_idle = 0  # TCP_KEEPIDLE, in seconds; # 0 selects the system default Here it's not obvious what the unit should be from the default itself. So the comment says it's in seconds. - Heikki
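For what the suggestion would amount to: PostgreSQL does not accept percentages in postgresql.conf, so the following is purely a hypothetical sketch of how a "%-of-RAM" value could be resolved at startup; the function name and semantics are invented for illustration.

```python
# Hypothetical sketch only -- not an existing PostgreSQL feature. Shows what
# a percentage memory setting could resolve to at startup given total RAM.
def resolve_memory_setting(value: str, total_ram_bytes: int) -> int:
    """Resolve '25%' against total RAM; pass plain byte counts through."""
    value = value.strip()
    if value.endswith("%"):
        pct = float(value[:-1])
        return int(total_ram_bytes * pct / 100)
    return int(value)

ram = 16 * 1024**3  # assume a 16 GB machine
print(resolve_memory_setting("25%", ram))  # 4294967296, i.e. 4 GB
```

The common tuning advice "shared_buffers = 25% of RAM" would then be expressible directly rather than computed by hand.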
Re: [HACKERS] Patch: raise default for max_wal_segments to 1GB
Josh Berkus j...@agliodbs.com writes: Do we want to remove unit comments from all settings which accept MB,GB or ms,s,min? There's more than a few. I'd be in favor of this, but seems like (a) it should be universal, and (b) its own patch. Meh. Doing this strikes me as a serious documentation failure, since it is no longer apparent what happens if you put in a unitless number. Now, if we were to change the server so that it *refused* settings that didn't have a unit, that argument would become moot. But I'm not going to defend the castle against the villagers who will show up if you do that. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Patch: raise default for max_wal_segments to 1GB
On 03/03/2015 11:57 AM, Tom Lane wrote: Josh Berkus j...@agliodbs.com writes: Do we want to remove unit comments from all settings which accept MB,GB or ms,s,min? There's more than a few. I'd be in favor of this, but seems like (a) it should be universal, and (b) its own patch. Meh. Doing this strikes me as a serious documentation failure, since it is no longer apparent what happens if you put in a unitless number. Well, you know my opinion about documentation in the pg.conf file. However, I'm not confident I'm in the majority there. Now, if we were to change the server so that it *refused* settings that didn't have a unit, that argument would become moot. But I'm not going to defend the castle against the villagers who will show up if you do that. No, thanks! -- Josh Berkus PostgreSQL Experts Inc. http://pgexperts.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Patch: raise default for max_wal_segments to 1GB
On 03/02/2015 11:25 PM, Heikki Linnakangas wrote: I propose that we remove the comment from max_wal_size, and also remove the in milliseconds from wal_receiver_timeout and autovacuum_vacuum_cost_delay. +1 -- Josh Berkus PostgreSQL Experts Inc. http://pgexperts.com
Re: [HACKERS] Abbreviated keys for text cost model fix
On Tue, Mar 3, 2015 at 2:00 AM, Jeremy Harris j...@wizmail.org wrote: Yes; there seemed no advantage to any additional complexity. The merge consistently performs fewer comparisons than the quicksort, on random input - and many fewer if there are any sorted (or reverse-sorted) sections. However, it also consistently takes slightly longer (a few percent). I was unable to chase this down but assume it to be a caching effect. So I don't currently think it should replace the current sort for all use. It's definitely caching effects. That's a very important consideration here. -- Peter Geoghegan
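The comparison-count half of Jeremy's observation is easy to reproduce with toy implementations (this has nothing to do with PostgreSQL's tuplesort; it only illustrates that comparison counts and wall-clock time are different metrics, which is where the caching argument comes in):

```python
# Toy experiment: on random input, mergesort performs fewer element
# comparisons than a plain first-element-pivot quicksort, even though a
# cache-friendly quicksort can still be faster in wall-clock terms.
import random

def merge_sort(a, count):
    if len(a) <= 1:
        return a
    mid = len(a) // 2
    left, right = merge_sort(a[:mid], count), merge_sort(a[mid:], count)
    out, i, j = [], 0, 0
    while i < len(left) and j < len(right):
        count[0] += 1
        if left[i] <= right[j]:
            out.append(left[i]); i += 1
        else:
            out.append(right[j]); j += 1
    return out + left[i:] + right[j:]

def quicksort(a, count):
    if len(a) <= 1:
        return a
    pivot, rest = a[0], a[1:]
    count[0] += len(rest)  # one comparison per element against the pivot
    lo = [x for x in rest if x < pivot]
    hi = [x for x in rest if x >= pivot]
    return quicksort(lo, count) + [pivot] + quicksort(hi, count)

random.seed(42)
data = [random.random() for _ in range(2000)]
mc, qc = [0], [0]
assert merge_sort(data, mc) == quicksort(data, qc) == sorted(data)
print(mc[0], qc[0])  # mergesort's count is lower on this random input
```

That a lower comparison count does not imply lower runtime is exactly the caching point Peter makes above.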
Re: [HACKERS] Patch: raise default for max_wal_segments to 1GB
On 03/03/2015 10:29 AM, Heikki Linnakangas wrote: On 03/03/2015 08:21 PM, Josh Berkus wrote: On 03/03/2015 10:15 AM, Josh Berkus wrote: On 03/02/2015 11:25 PM, Heikki Linnakangas wrote: I propose that we remove the comment from max_wal_size, and also remove the in milliseconds from wal_receiver_timeout and autovacuum_vacuum_cost_delay. +1 Actually, let's be consistent about this. It makes no sense to remove unit comments from some settings which accept ms but not others. Do we want to remove unit comments from all settings which accept MB,GB or ms,s,min? There's more than a few. I'd be in favor of this, but seems like (a) it should be universal, and (b) its own patch. I think it's a good rule that if the commented-out default in the sample file does not contain a unit, then the base unit is in the comment. Otherwise it's not. For example: #shared_buffers = 32MB# min 128kB # (change requires restart) The base unit is BLCKSZ, i.e. 8k, but usually people will usually use MB/GB. And that is evident from the default value, 32MB, so there's no need to mention it in the comment. #tcp_keepalives_idle = 0# TCP_KEEPIDLE, in seconds; # 0 selects the system default Here it's not obvious what the unit should be from the default itself. So the comment says it's in seconds. Sure. Although, do we take (s) for tcp_keepalives_idle? Or only an INT? -- Josh Berkus PostgreSQL Experts Inc. http://pgexperts.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Providing catalog view to pg_hba.conf file - Patch submission
On Tue, Mar 3, 2015 at 6:05 PM, Jim Nasby jim.na...@bluetreble.com wrote: What about a separate column that's just the text from pg_hba? Or is that what you're opposed to? I'm not sure what you mean by that. There's a rawline field we could put somewhere but it contains the entire line. FWIW, I'd say that having the individual array elements be correct is more important than what the result of array_out is. That way you could always do array_to_string(..., ', ') and get valid pg_hba output. Well I don't think you can get that without making the view less useful for every other purpose. Like, I would want to be able to do WHERE user @ array[?] or WHERE database = array[?] or to join against a list of users or databases somewhere else. To do what you suggest would mean the tokens will need to be quoted based on pg_hba.conf syntax requirements. That would mean I would need to check each variable or join value against pg_hba.conf's quoting requirements to compare with it. It seems more practical to have that knowledge if you're actually going to generate a pg_hba.conf than to pass around these quoted strings all the time. On further review I've made a few more changes attached. I think we should change the column names to users and databases to be clear they're lists and also to avoid the user SQL reserved word. I removed the dependency on strlist_to_array which is in objectaddress.c which isn't a very sensible dependency -- it does seem like it would be handy to have a list-based version of construct_array moved to arrayfuncs.c but for now it's not much more work to handle these ourselves. I changed the options to accumulate one big array instead of an array of bunches of options. Previously you could only end up with a singleton array with a comma-delimited string or a two element array with one of those and map=. I think the error if pg_hba fails to reload needs to be LOG. 
It would be too unexpected to the user who isn't necessarily the one who issued the SIGHUP to spontaneously get a warning. I also removed the mask from local entries and made some of the NULLS that shouldn't be possible to happen (unknown auth method or connection method) actually throw errors. -- greg *** a/doc/src/sgml/catalogs.sgml --- b/doc/src/sgml/catalogs.sgml *** *** 7393,7398 --- 7393,7403 entryviews/entry /row + row + entrylink linkend=view-pg-hba-settingsstructnamepg_hba_settings/structname/link/entry + entryclient authentication settings/entry + /row + /tbody /tgroup /table *** *** 9696,9699 SELECT * FROM pg_locks pl LEFT JOIN pg_prepared_xacts ppx --- 9701,9786 /sect1 + sect1 id=view-pg-hba-settings + titlestructnamepg_hba_settings/structname/title + + indexterm zone=view-pg-hba-settings +primarypg_hba_settings/primary + /indexterm + + para +The read-only structnamepg_hba_settings/structname view provides +access to the client authentication configuration from pg_hba.conf. +Access to this view is limited to superusers. 
+ /para + + table +titlestructnamepg_hba_settings/ Columns/title + +tgroup cols=3 + thead + row + entryName/entry + entryType/entry + entryDescription/entry + /row + /thead + tbody + row + entrystructfieldtype/structfield/entry + entrytypetext/type/entry + entryType of connection/entry + /row + row + entrystructfielddatabases/structfield/entry + entrytypetext[]/type/entry + entryList of database names/entry + /row + row + entrystructfieldusers/structfield/entry + entrytypetext[]/type/entry + entryList of user names/entry + /row + row + entrystructfieldaddress/structfield/entry + entrytypeinet/type/entry + entryClient machine address/entry + /row + row + entrystructfieldmask/structfield/entry + entrytypeinet/type/entry + entryIP Mask/entry + /row + row + entrystructfieldcompare_method/structfield/entry + entrytypetext/type/entry + entryIP address comparison method/entry + /row + row + entrystructfieldhostname/structfield/entry + entrytypetext/type/entry + entryClient host name/entry + /row + row + entrystructfieldmethod/structfield/entry + entrytypetext/type/entry + entryAuthentication method/entry + /row + row + entrystructfieldoptions/structfield/entry + entrytypetext[]/type/entry + entryConfiguration options set for authentication method/entry + /row + row + entrystructfieldline_number/structfield/entry + entrytypeinteger/type/entry + entry +Line number within client authentication configuration file +the current value was set at + /entry + /row + /tbody +/tgroup + /table + /sect1 /chapter
Re: [HACKERS] Idea: GSoC - Query Rewrite with Materialized Views
On Tue, Mar 03, 2015 at 05:49:06PM -0300, Alvaro Herrera wrote: Jim Nasby wrote: FWIW, what I would find most useful at this point is a way to get the equivalent of an AFTER STATEMENT trigger that provided all changed rows in a MV as the result of a statement. Ah, like https://www.postgresql.org/message-id/1402790204.65037.YahooMailNeo%40web122301.mail.ne1.yahoo.com Yes, very much like that. Kevin, might you be able to give some guidance on this? Cheers, David. -- David Fetter da...@fetter.org http://fetter.org/ Phone: +1 415 235 3778 AIM: dfetter666 Yahoo!: dfetter Skype: davidfetter XMPP: david.fet...@gmail.com Remember to vote! Consider donating to Postgres: http://www.postgresql.org/about/donate -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Patch: raise default for max_wal_segments to 1GB
No intention to hijack. Dropping issue for now. On Tue, Mar 3, 2015 at 2:05 PM, Josh Berkus j...@agliodbs.com wrote: On 03/03/2015 10:58 AM, Corey Huinker wrote: Naive question: would it be /possible/ to change configuration to accept percentages, and have a percent mean of existing RAM at startup time? I ask because most of the tuning guidelines I see suggest setting memory parameters as a % of RAM available. Waaay off topic for this thread. Please don't hijack; start a new thread if you want to discuss this. -- Josh Berkus PostgreSQL Experts Inc. http://pgexperts.com
Re: [HACKERS] NULL-pointer check and incorrect comment for pstate in addRangeTableEntry
On Wed, Feb 25, 2015 at 7:27 AM, Michael Paquier michael.paqu...@gmail.com wrote: Coverity is pointing out that addRangeTableEntry contains the following code path that does a NULL-pointer check on pstate: if (pstate != NULL) pstate->p_rtable = lappend(pstate->p_rtable, rte); But pstate is dereferenced before in isLockedRefname when grabbing the lock mode: lockmode = isLockedRefname(pstate, refname) ? RowShareLock : AccessShareLock; Note that there is as well the following comment that is confusing on top of addRangeTableEntry: * If pstate is NULL, we just build an RTE and return it without adding it * to an rtable list. So I have looked at all the code paths calling this function and found first that the only potential point where pstate could be NULL is transformTopLevelStmt in analyze.c. One level higher there are parse_analyze_varparams and parse_analyze that may use pstate as NULL, and one level higher still in the stack there is pg_analyze_and_rewrite. But well, looking at each case individually, in all cases we never pass NULL for the parse tree node, so I think that we should remove the comment on top of addRangeTableEntry, remove the NULL-pointer check and add an assertion as in the patch attached. Thoughts? That seems to make sense to me. Committed. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
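The pattern Coverity flagged is language-neutral and worth seeing in miniature. This is an analogy, not the PostgreSQL source: names are loosely modeled on addRangeTableEntry, and the buggy version dereferences pstate before the NULL check, which makes the later check dead code; the fix states the invariant up front, as the committed patch does with an Assert.

```python
# Sketch of the flagged pattern. PState is a stand-in for ParseState;
# this is illustrative Python, not the C code under discussion.
class PState:
    def __init__(self):
        self.locked = False
        self.rtable = []

def add_rte_buggy(pstate, rte):
    # Dereference happens here unconditionally...
    lockmode = "RowShareLock" if pstate.locked else "AccessShareLock"
    if pstate is not None:  # ...so this check can never be False by now
        pstate.rtable.append(rte)
    return lockmode

def add_rte_fixed(pstate, rte):
    assert pstate is not None  # make the invariant explicit instead
    lockmode = "RowShareLock" if pstate.locked else "AccessShareLock"
    pstate.rtable.append(rte)
    return lockmode
```

With a None argument, the buggy version crashes on the first line before its guard ever runs, which is precisely why the guard (and the comment promising NULL is handled) had to go.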
Re: [HACKERS] Bug in pg_dump
On 3/1/15 2:17 PM, Stephen Frost wrote: Peter, if you have a minute, could you take a look at this thread and discussion of having TAP tests under src/test/modules which need to install an extension? I think it's something we certainly want to support but I'm not sure it's a good idea to just run install in every directory that has a prove_check. I don't think the way the tests are set up is scalable. Over time, we will surely want to test more and different extension dumping scenarios. We don't want to have to create a new fully built-out extension for each of those test cases, and we're going to run out of useful names for the extensions, too. Moreover, I would like to have tests of pg_dump in src/bin/pg_dump/, not in remote areas of the code. Here's what I would do: - set up basic scaffolding for TAP tests in src/bin/pg_dump - write a Perl function that can create an extension on the fly, given name, C code, SQL code - add to the proposed t/001_dump_test.pl code to write the extension - add that test to the pg_dump test suite Eventually, the dump-and-restore routine could also be refactored, but as long as we only have one test case, that can wait. -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
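To make the "create an extension on the fly" helper concrete: the real helper Peter describes would be Perl under src/bin/pg_dump/t/, but the files it would generate can be sketched in Python. The function name, the control-file fields chosen, and the directory layout here are illustrative assumptions, not the proposed API.

```python
# Sketch of a test helper that fabricates a minimal extension on the fly,
# given a name and SQL body. (The proposed helper is Perl; this only shows
# the <name>.control and <name>--<version>.sql files it would write.)
import os
import tempfile

def write_test_extension(extdir, name, sql_body, version="1.0"):
    """Write <name>.control and <name>--<version>.sql into extdir."""
    control = os.path.join(extdir, f"{name}.control")
    sql = os.path.join(extdir, f"{name}--{version}.sql")
    with open(control, "w") as f:
        f.write(f"comment = 'test extension {name}'\n"
                f"default_version = '{version}'\n"
                "relocatable = true\n")
    with open(sql, "w") as f:
        f.write(sql_body)
    return control, sql

with tempfile.TemporaryDirectory() as d:
    c, s = write_test_extension(
        d, "dump_test", "CREATE TABLE dump_test_tab (id int);\n")
    print(os.path.basename(c), os.path.basename(s))
    # dump_test.control dump_test--1.0.sql
```

Generating the extension per-test, rather than checking in one fully built-out extension per scenario, is what avoids the naming and scaling problem Peter raises.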
Re: [HACKERS] remove pg_standby?
On Mon, Mar 2, 2015 at 6:22 PM, Fujii Masao masao.fu...@gmail.com wrote: On Tue, Mar 3, 2015 at 3:07 AM, Josh Berkus j...@agliodbs.com wrote: Per document, -- In fast failover, the server is brought up immediately. Any WAL files in the archive that have not yet been applied will be ignored, and all transactions in those files are lost. To trigger a fast failover, create a trigger file and write the word fast into it. pg_standby can also be configured to execute a fast failover automatically if no new WAL file appears within a defined interval. -- I thought we had that as a 9.4 feature, actually. Well wait ... that's for streaming. s/9.4/9.3? That's different from one we had in 9.3. Fast failover that pg_standby supports is something like the feature that I was proposing at http://www.postgresql.org/message-id/cahgqgwhtvydqkzaywya9zyylecakif5p0kpcpune_tsrgtf...@mail.gmail.com that is, the feature which allows us to give up replaying remaining WAL data for some reasons at the standby promotion. OTOH, fast failover that was supported in 9.3 enables us to skip an end-of-recovery checkpoint at the promotion and reduce the failover time. Calling those things by the same name is very confusing. The data-losing feature ought to be called immediate failover or something else more dangerous-sounding than fast. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Idea: GSoC - Query Rewrite with Materialized Views
Jim Nasby wrote: FWIW, what I would find most useful at this point is a way to get the equivalent of an AFTER STATEMENT trigger that provided all changed rows in a MV as the result of a statement. Ah, like https://www.postgresql.org/message-id/1402790204.65037.YahooMailNeo%40web122301.mail.ne1.yahoo.com -- Álvaro Herrerahttp://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Add more tests on event triggers
On Wed, Feb 25, 2015 at 10:03 AM, Fabien COELHO coe...@cri.ensmp.fr wrote: You can add tests in src/test/modules/dummy_seclabel. Patch attached to test sec label there, in addition to the other more standard checks in event_trigger. These tests seem worthwhile to me. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] add modulo (%) operator to pgbench
On Mon, Mar 2, 2015 at 5:41 PM, Fabien COELHO coe...@cri.ensmp.fr wrote: but I'd like to have a more robust discussion about what we want the error reporting to look like rather than just sliding it into this patch. As an input, I suggest that the error reporting feature should provide a clue about where the issue is, including a line number and possibly the offending line. Not sure what else is needed. I agree. But I think it might be better to try to put everything related to a single error on one line, in a consistent format, e.g.: bad.sql:3: syntax error in set command at column 25 -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] CATUPDATE confusion?
On 2/28/15 6:32 PM, Stephen Frost wrote: This isn't really /etc/shadow though, this is more like direct access to the filesystem through the device node- and you'll note that Linux certainly has got an independent set of permissions for that called the capabilities system. That's because messing with those pieces can crash the kernel. You're not going to crash the kernel if you goof up /etc/shadow. I think this characterization is incorrect. The Linux capability system does not exist because the actions are scary or can crash the kernel. Capabilities exist because they are not attached to file system objects and can therefore not be represented using the usual permission system. Note that one can write directly to raw devices or the kernel memory through various /dev and /proc files. No capability protects against that. It's only the file permissions, possibly in combination with mount options. -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Patch: raise default for max_wal_segments to 1GB
On 3/3/15 2:36 PM, Andrew Dunstan wrote: On 03/03/2015 02:59 PM, Josh Berkus wrote: On 03/03/2015 11:57 AM, Tom Lane wrote: Josh Berkus j...@agliodbs.com writes: Do we want to remove unit comments from all settings which accept MB,GB or ms,s,min? There's more than a few. I'd be in favor of this, but seems like (a) it should be universal, and (b) its own patch. Meh. Doing this strikes me as a serious documentation failure, since it is no longer apparent what happens if you put in a unitless number. Well, you know my opinion about documentation in the pg.conf file. However, I'm not confident I'm in the majority there. If we were consistent across all variables for each vartype we could maybe get away with this. But that's not the case. How are users supposed to know that statement_timeout is in ms while authentication_timeout is in seconds? I think there's a difference between comments about the function of a GUC and stating the units it's specified in. It's more than annoying to have to go and look that up where it's not stated. Look up the units? -- Jim Nasby, Data Architect, Blue Treble Consulting Data in Trouble? Get it in Treble! http://BlueTreble.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Providing catalog view to pg_hba.conf file - Patch submission
On Wed, Mar 4, 2015 at 12:35 PM, Haribabu Kommi kommi.harib...@gmail.com wrote: + foreach(line, parsed_hba_lines) In the above for loop it is better to add CHECK_FOR_INTERRUPTS() so that the loop stays responsive to cancel requests when parsed_hba_lines is long. The updated patch is attached with the addition of CHECK_FOR_INTERRUPTS() in the for loop. Regards, Hari Babu Fujitsu Australia Catalog_view_to_HBA_settings_patch_V7.patch Description: Binary data -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] How about to have relnamespace and relrole?
On 3/3/15 8:04 PM, Kyotaro HORIGUCHI wrote: Note: The OID alias types don't sctrictly comply the transaction isolation rules so do not use them where exact transaction isolation on the values of these types has a significance. Likewise, since they look as simple constants to planner so you might get slower plans than the queries joining the system tables correnspond to the OID types. Might I suggest: Note: The OID alias types do not completely follow transaction isolation rules. The planner also treats them as simple constants, which may result in sub-optimal planning. -- Jim Nasby, Data Architect, Blue Treble Consulting Data in Trouble? Get it in Treble! http://BlueTreble.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] a fast bloat measurement tool (was Re: Measuring relation free space)
On Fri, Dec 26, 2014 at 1:08 PM, Abhijit Menon-Sen a...@2ndquadrant.com wrote: At 2014-09-25 15:40:11 +0530, a...@2ndquadrant.com wrote: All right, then I'll post a version that addresses Amit's other points, adds a new file/function to pgstattuple, acquires content locks, and uses HeapTupleSatisfiesVacuum, hint-bit setting and all. Sorry, I forgot to post this patch. It does what I listed above, and seems to work fine (it's still quite a lot faster than pgstattuple in many cases). I think one of the comments is not handled in the latest patch. 5. Don't we need handling for the empty page (PageIsEmpty()) case? I think we should do something similar for PageIsEmpty() to what you have done for PageIsNew() in your patch. + stat.tuple_count = vac_estimate_reltuples(rel, false, nblocks, scanned, + stat.tuple_count); Here for scanned tuples, you have used the live tuple counter, whereas I think we need to count HEAPTUPLE_INSERT_IN_PROGRESS, HEAPTUPLE_DELETE_IN_PROGRESS and HEAPTUPLE_RECENTLY_DEAD as well. Refer to the similar logic in Vacuum. Although I am not in favor of using HeapTupleSatisfiesVacuum(), if you do want to use it then let's be consistent with what Vacuum does for the estimation of tuples. A couple of remaining questions: 1. I didn't change the handling of LP_DEAD items, because the way it is now matches what pgstattuple counts. I'm open to changing it, though. Maybe it makes sense to count LP_DEAD items iff lp_len != 0? Or just leave it as it is? I think it doesn't matter much. 2. I changed the code to acquire the content locks on the buffer, as discussed, but it still uses HeapTupleSatisfiesVacuum. Amit suggested using HeapTupleSatisfiesVisibility, but it's not clear to me why that would be better. I welcome advice in this matter. The reason to use HeapTupleSatisfiesVacuum() is that it helps us with freezing and some other similar stuff required by Vacuum. Also it could be beneficial if we want to take specific action for the various result codes of Vacuum. 
I think since this routine mostly gives an estimated count, it might not matter much even if we use HeapTupleSatisfiesVacuum(); however, it is better to be consistent with pgstat_heap() in this case. Do you have any reason for using HeapTupleSatisfiesVacuum() rather than what is used in pgstat_heap()? (If anything, I should use HeapTupleIsSurelyDead, which doesn't set any hint bits, but which I earlier rejected as too conservative in its dead/not-dead decisions for this purpose.) (I've actually patched the pgstattuple.sgml docs as well, but I want to re-read that to make sure it's up to date, and didn't want to wait to post the code changes.) Sure, post the same when it is ready. With Regards, Amit Kapila. EnterpriseDB: http://www.enterprisedb.com
Re: [HACKERS] [REVIEW] Re: Compression of full-page-writes
On Wed, Mar 4, 2015 at 12:41 AM, Syed, Rahila rahila.s...@nttdata.com wrote: Please find attached updated patch with WAL replay error fixed. The patch follows chunk ID approach of xlog format. (Review done independently of the chunk_id stuff being good or not, already gave my opinion on the matter). * readRecordBufSize is set to the new buffer size. - * + The patch has some noise diffs. You may want to change the values of BKPBLOCK_WILL_INIT and BKPBLOCK_SAME_REL to respectively 0x01 and 0x02. + uint8 chunk_id = 0; + chunk_id |= XLR_CHUNK_BLOCK_REFERENCE; Why not simply that: chunk_id = XLR_CHUNK_BLOCK_REFERENCE; +#define XLR_CHUNK_ID_DATA_SHORT 255 +#define XLR_CHUNK_ID_DATA_LONG 254 Why aren't those just using one bit as well? This seems inconsistent with the rest. + if ((blk->with_hole == 0 && blk->hole_offset != 0) || + (blk->with_hole == 1 && blk->hole_offset <= 0)) In xlogreader.c blk->with_hole is defined as a boolean but compared with an integer; could you remove the == 0 and == 1 portions for clarity? - goto err; + goto err; } } - if (remaining != datatotal) This gathers incorrect code alignment and unnecessary diffs. typedef struct XLogRecordBlockHeader { + /* Chunk ID precedes */ + uint8 id; What prevents the declaration of chunk_id as an int8 here instead of this comment? This is confusing. Following are brief measurement numbers of WAL generated: FPW compression on: 122.032 MB; FPW compression off: 155.239 MB; HEAD: 155.236 MB. What is the test run in this case? How many block images have been generated in WAL for each case? You could gather some of those numbers with pg_xlogdump --stat for example. -- Michael -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Add LINE: hint when schemaname.typename is a non-existent schema
The following review has been posted through the commitfest application: make installcheck-world: not tested Implements feature: tested, failed Spec compliant: not tested Documentation: not tested Tom suggested a few changes already which I too think the author needs to address. So marking it Waiting on Author. However, I see that the following example does not work well. postgres=# create or replace function f1(a abc.test.id%type) returns int as $$ select 1; $$ language sql; ERROR: schema abc does not exist Is that expected? I guess we need it at all places in parse_*.c where we look for a namespace. Please fix. Also, like Tom's suggestion on make_oper_cache_key, can we push this down inside func_get_detail() as well, just to limit it to the namespace lookup? However, the patch does not apply cleanly on latest sources; it needs a rebase. On Tom's comments on parse_utilcmd.c: I guess the block is moved after the pstate and CreateStmtContext are set up properly. I guess we can move it just after the pstate setup, so that it results in minimal changes? Can we have a small test-case? Or will that be too much for this feature? The new status of this patch is: Waiting on Author -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Install shared libs in lib/ and bin/ with MSVC (Was: install libpq.dll in bin directory on Windows / Cygwin)
On Tue, Mar 3, 2015 at 8:36 PM, Asif Naeem anaeem...@gmail.com wrote: Thank you Michael. I have looked at the patch. Thanks for the review! Overall logic looks good to me; I have checked it with MSVC{2013,2008}. It works for MSVC 2013 but fails for MSVC 2008; I think the condition if ($proj =~ qr{ResourceCompile\s*Include=([^]+)}) is not going to work for MSVC 2008 and MSVC 2005, i.e. [...] Thanks for the details, my patch is definitely missing vcproj entries. Note that I don't yet have an environment with MSVC 2008 or similar; I just got 2010 on my box for now. So you will have to wait until I have a fresh environment to get an updated patch... It seems that the search criteria can be narrowed to skip reading the related Makefile for the SO_MAJOR_VERSION string when the VS project file is a StaticLibrary or Application. Well, agreed, and the patch takes care of that for vcxproj files by only installing shared libraries if they use DynamicLibrary. Perhaps you have in mind a better logic? Although this patch is going to make the MSVC build consistent with the Cygwin and MinGW builds, the following files seem redundant now; is there any use for them other than backward compatibility? i.e. inst\lib\libpq.dll inst\lib\libpgtypes.dll inst\lib\libecpg_compat.dll inst\lib\libecpg.dll Indeed, I hadn't noticed them. But we can simply remove them as they are installed in bin/ as well with this patch; it seems better for consistency with MinGW and Cygwin in any case. -- Michael -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Optimization for updating foreign tables in Postgres FDW
Etsuro Fujita wrote: While updating the patch, I noticed that in the previous patch, there is a bug in pushing down parameterized UPDATE/DELETE queries; generic plans for such queries fail with a can't-happen error. I fixed the bug and tried to add the regression tests that execute the generic plans, but I couldn't because I can't figure out how to force generic plans. Is there any way to do that? I don't know about a way to force it, but if you run the statement six times, it will probably switch to a generic plan. Yours, Laurenz Albe -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
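Laurenz's suggestion relies on the plan cache's heuristic: a prepared statement is planned with custom plans for its first five executions, and from the sixth execution on the planner may switch to a generic (parameter-independent) plan if that looks no more expensive on average. A minimal sketch of how one might exercise the generic-plan path in a regression test (the table name `remote_tab` is hypothetical):

```sql
-- Custom plans are used for the first five executions.
PREPARE upd(int) AS UPDATE remote_tab SET x = x + 1 WHERE id = $1;
EXECUTE upd(1); EXECUTE upd(2); EXECUTE upd(3);
EXECUTE upd(4); EXECUTE upd(5);
-- From here on, the plan cache may switch to the generic plan,
-- which is the case the parameterized-pushdown bug fix needs to cover.
EXECUTE upd(6);
```

Note this is probabilistic rather than guaranteed: the switch happens only if the estimated generic-plan cost is not worse than the average custom-plan cost, which is why there is no hard "force generic plan" knob in this era of the code.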
Re: [HACKERS] Proposal: knowing detail of config files via SQL
* Robert Haas (robertmh...@gmail.com) wrote: On Sat, Feb 28, 2015 at 12:27 AM, Stephen Frost sfr...@snowman.net wrote: While this generally works, the usual expectation is that functions that should be superuser-only have a check in the function rather than depending on the execute privilege. I'm certainly happy to debate the merits of that approach, but for the purposes of this patch, I'd suggest you stick an if (!superuser()) ereport(must be superuser) into the function itself and not worry about setting the correct permissions on it. -1. If that policy exists at all, it's a BAD policy, because it prevents users from changing the permissions using DDL. I think the superuser check should be inside the function, when, for example, it masks some of the output data depending on the user's permissions. But I see little virtue in handicapping an attempt by a superuser to grant SELECT rights on pg_file_settings. It's not a documented policy but it's certainly what a whole slew of functions *do*. Including pg_start_backup, pg_stop_backup, pg_switch_xlog, pg_rotate_logfile, pg_create_restore_point, pg_xlog_replay_pause, lo_import, lo_export, and pg_xlog_replay_resume, pg_read_file, pg_read_text_file, pg_read_binary_file, and pg_ls_dir. Indeed, it's the larger portion of what the additional role attributes are for. I'm not entirely thrilled to hear that nearly all of that work might be moot, but if that's the case then so be it and let's just remove all those superuser checks and instead revoke EXECUTE from public and let the superuser grant out those rights. As for pg_stat_get_wal_senders, pg_stat_get_activity, and probably some others, column-level privileges and/or RLS policies might be able to provide what we want there (or possibly not), but it's something to consider if we want to go this route. 
For pg_signal_backend, we could revoke direct EXECUTE permissions on it and then keep just the check that says only superusers can signal superuser initiated processes, and then a superuser could grant EXECUTE rights on it directly for users they want to have that access. We could possibly also create new helper functions for pg_terminate and pg_cancel. The discussion I'm having with Peter on another thread is a very similar case that should be looping in, which is if we should continue to have any superuser check on updating catalog tables. He is advocating that we remove that also and let the superuser GRANT modification access on catalog tables to users. For my part, I understood that we specifically didn't want to allow that for the same reason that we didn't want to simply depend on the GRANT system for the above functions, but everyone else on these discussions so far is advocating for using the GRANT system. My memory might be wrong, but I could have sworn that I had brought up exactly that suggestion in years past only to have it shot down. Thanks, Stephen signature.asc Description: Digital signature
Re: [HACKERS] Idea: closing the loop for pg_ctl reload
Jim Nasby jim.na...@bluetreble.com writes: On 3/3/15 11:48 AM, Andres Freund wrote: It'll be confusing to have different interfaces in one/multiple error cases. If we simply don't want the code complexity then fine, but I just don't buy this argument. How could it possibly be confusing? What I'm concerned about is confusing the code. There is a lot of stuff that looks at pidfiles and a lot of it is not very bright (note upthread argument about cat vs head -1). I don't want possibly localized (non-ASCII) text in there, especially when there's not going to be any sane way to know which encoding it's in. And I definitely don't want multiline error messages in there. It's possible we could dumb things down enough to meet these restrictions, but I'd really rather not go there at all. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Idea: closing the loop for pg_ctl reload
On March 3, 2015 04:57:58 PM Jim Nasby wrote: On 3/3/15 11:48 AM, Andres Freund wrote: I'm saying that you'll need a way to notice that a reload was processed or not. And that can't really be the message itself, there has to be some other field; like the timestamp Tom proposes. Ahh, right. We should make sure we don't go brain-dead if the system clock moves backwards. I assume we couldn't just fstat the file... The timestamp field is already there (in my patch). It gets populated when the server starts and repopulated by SIGHUP_handler. It's the only timestamp pg_ctl pays attention to. -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Proposal: knowing detail of config files via SQL
Jim, * Jim Nasby (jim.na...@bluetreble.com) wrote: On 3/3/15 5:22 PM, Stephen Frost wrote: The problem with the role attribute approach is that they aren't inheirted the way GRANTs are, which means you can't have a backup role that is then granted out to users, you'd have to set a BACKUP role attribute for every role added. Yeah, but you'd still have to grant backup to every role created anyway, right? Yes, you would. Or you could create a role that has the backup attribute and then grant that to users. Then they'd have to intentionally SET ROLE my_backup_role to elevate their privilege. That seems like a safer way to do things... This is already possible with the GRANT system- create a 'noinherit' role instead of an 'inherit' role. I agree it's safer to require a 'SET ROLE' and configure all of my systems with a noinherit 'admin' role. Thanks! Stephen signature.asc Description: Digital signature
Re: [HACKERS] Combining Aggregates
Robert Haas robertmh...@gmail.com writes: On Tue, Feb 24, 2015 at 2:20 PM, Peter Eisentraut pete...@gmx.net wrote: I think the combine function is not actually a property of the aggregate, but a property of the transition function. If two aggregates have the same transition function, they will also have the same combine function. The combine function really just says, how do you combine two series of these function calls. Say maybe this should be put into pg_proc instead. (Or you make the transition functions transition operators and put the info into pg_operator instead, which is where function call optimization information is usually kept.) This seems like a weird design to me. It's probably true that if the transition function is the same, the state-combiner function will also be the same. But the state-combiner function is only going to exist for aggregate transition functions, not functions or operators in general. So linking it from pg_proc or pg_operator feels wrong to me. FWIW, I agree with Robert. It's better to keep this in pg_aggregate. We don't need yet another mostly-empty column in pg_proc; and as for pg_operator, there is no expectation that an aggregate transition function is in pg_operator. Also, I don't think it's a foregone conclusion that same transition function implies same combiner function: that logic falls apart when you consider that one of them might be polymorphic and the other not. (Admittedly, that probably means the aggregate creator is missing a bet; but we should not design the catalog schema in a way that says that only maximally efficient aggregate designs are allowed.) regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Configurable location for extension .control files
On 18.02.2015 at 01:49, Jim Nasby wrote: On 2/17/15 4:39 PM, Oskari Saarenmaa wrote: On 10.06.2013 at 17:51, Dimitri Fontaine wrote: Andres Freund and...@2ndquadrant.com writes: In any case, no packager is going to ship an insecure-by-default configuration, which is what Dimitri seems to be fantasizing would happen. It would have to be a local option to relax the permissions on the directory, no matter where it is. *I* don't want that at all. All I'd like to have is a postgresql.conf option specifying additional locations. Same from me. I think I would even take non-plural location. Here's a patch to allow overriding the extension installation directory. The patch allows superusers to change it at runtime, but we could also restrict it to postgresql.conf if needed. I don't really see a point in restricting it (or not implementing the option at all) as the superuser can already load custom C code and SQL from anywhere in the filesystem; not having a configurable extension directory just makes it harder to test extensions and to create private clusters of PG data in a custom location without using custom binaries. I'm interested in this because it could potentially make it possible to install SQL extensions without OS access. (My understanding is there's some issue with writing the extension files even if LIBDIR is writable by the OS account). I'm not sure this patch makes extensions without OS access any easier to implement; you still need to write the files somewhere, and someone needs to set up the cluster with a nonstandard extension path. I don't think anyone should be packaging and shipping PG with extension_directory set, but we really should allow it for superusers and postgresql.conf so people can test extensions without hacks like this: https://github.com/ohmu/pgmemcache/blob/master/localtests.sh#L23 FWIW, what I'd like to see is breaking the cluster setup/teardown capability in pg_regress into its own tool. 
It wouldn't solve the extension test problem, but it would eliminate the need for most of what that script is doing, and it would do it more robustly. It would make it very easy to unit test things with frameworks other than pg_regress. This would certainly be useful. I can try to do something about it if we get configurable extension path supported. The patch is now in https://commitfest.postgresql.org/5/170/ / Oskari -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Optimization for updating foreign tables in Postgres FDW
On 2015/02/16 12:03, Etsuro Fujita wrote: I'll update the patch. While updating the patch, I noticed that in the previous patch, there is a bug in pushing down parameterized UPDATE/DELETE queries; generic plans for such queries fail with a can't-happen error. I fixed the bug and tried to add the regression tests that execute the generic plans, but I couldn't because I can't figure out how to force generic plans. Is there any way to do that? Best regards, Etsuro Fujita -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] a fast bloat measurement tool (was Re: Measuring relation free space)
On Mon, Feb 23, 2015 at 7:11 AM, Tomas Vondra tomas.von...@2ndquadrant.com wrote: On 28.1.2015 05:03, Abhijit Menon-Sen wrote: At 2015-01-27 17:00:27 -0600, jim.na...@bluetreble.com wrote: Otherwise, the code looks OK to me. Now, there are a few features I'd like to have for production use (to minimize the impact): 1) no index support :-( I'd like to see support for more relation types (at least btree indexes). Are there any plans for that? Do we have an idea on how to compute that? 2) sampling just a portion of the table For example, being able to sample just 5% of blocks, making it less obtrusive, especially on huge tables. Interestingly, there's a TABLESAMPLE patch in this CF, so maybe it's possible to reuse some of the methods (e.g. functions behind SYSTEM sampling)? 3) throttling Another feature minimizing the impact of running this on production might be some sort of throttling, e.g. saying 'limit the scan to 4 MB/s' or something along those lines. I think these features could be done separately if anybody is interested. The patch in its proposed form seems useful to me. 4) prefetch fbstat_heap is using the visibility map to skip fully-visible pages, which is nice, but if we skip too many pages it breaks readahead, similarly to bitmap heap scan. I believe this is another place where effective_io_concurrency (i.e. prefetch) would be appropriate. Good point. We can even think of using the technique used by Vacuum, which is to skip only when we can skip at least SKIP_PAGES_THRESHOLD pages. With Regards, Amit Kapila. EnterpriseDB: http://www.enterprisedb.com
Re: [HACKERS] Proposal: knowing detail of config files via SQL
On Tue, Mar 3, 2015 at 10:29 PM, Stephen Frost sfr...@snowman.net wrote: -1. If that policy exists at all, it's a BAD policy, because it prevents users from changing the permissions using DDL. I think the superuser check should be inside the function, when, for example, it masks some of the output data depending on the user's permissions. But I see little virtue in handicapping an attempt by a superuser to grant SELECT rights on pg_file_settings. It's not a documented policy but it's certainly what a whole slew of functions *do*. Including pg_start_backup, pg_stop_backup, pg_switch_xlog, pg_rotate_logfile, pg_create_restore_point, pg_xlog_replay_pause, lo_import, lo_export, and pg_xlog_replay_resume, pg_read_file, pg_read_text_file, pg_read_binary_file, and pg_ls_dir. And the same issue comes up for the pg_hba_settings patch. Fwiw it's not entirely true that users can't change the permissions on these functions via DDL -- it's just a lot harder. They have to create a security-definer wrapper function and then grant access to that function to who they want. I'm generally of the opinion we should use the GRANT system and not hard-wire things but I also see the concern that it's really easy to grant access to something without realizing all the consequences. It's a lot harder to audit your system to be sure nothing inappropriate has been granted which can be escalated to super-user access. It's not clear how to determine what the set of things are that could do that. -- greg -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] failures with tuplesort and ordered set aggregates (due to 5cefbf5a6c44)
On Fri, Feb 20, 2015 at 4:01 PM, Peter Geoghegan p...@heroku.com wrote: On Fri, Feb 20, 2015 at 11:58 AM, Tomas Vondra tomas.von...@2ndquadrant.com wrote: This seems to happen because ordered_set_startup() calls tuplesort_begin_datum() when (use_tuples == true), which only sets 'onlyKey' and leaves (sortKeys == NULL). So 'mergeruns' fails because it does not expect that. Oops. You're right. Attached patch fixes the issue, by making tuplesort_begin_datum() consistent with other comparable routines for other tuple cases. I think it makes sense to have the 'sortKeys' field always set, and the 'onlyKey' field also set when that additional optimization makes sense. That was what I understood the API to be, so arguably this is a pre-existing issue with tuplesort_begin_datum(). This doesn't completely fix the issue. Even with this patch applied: CREATE TABLE stuff AS SELECT random()::text AS randtext FROM generate_series(1,5000); set maintenance_work_mem='1024'; create index on stuff using hash (randtext); ...wait for a bit, server crash... I find your statement that this is a pre-existing issue in tuplesort_begin_datum() to be pretty misleading, unless what you mean by it is pre-existing since November, when an earlier patch by Peter Geoghegan changed the comments without changing the code to match. Commit 5ea86e6e65dd2da3e9a3464484985d48328e7fe3, changed the comments for the sortKey field to say that it would be set in all cases except for the hash index case, but it did not make that statement true. Subsequently, another of your patches, which I committed as 5cefbf5a6c4466ac6b1cc2a4316b4eba9108c802, added a dependency on state-sortKeys always being set. Now you've got this patch here, which you claim makes sure that sortKeys is always set, but it actually doesn't, which is why the above still crashes. There are not so many tuplesort_begin_xxx routines that you couldn't audit them all before submitting your patches. 
Anyway, I propose the attached instead, which makes both Tomas's test case and the one above pass. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company tuplesort-fixes.patch Description: binary/octet-stream -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Combining Aggregates
On Tue, Feb 24, 2015 at 2:20 PM, Peter Eisentraut pete...@gmx.net wrote: On 2/20/15 3:09 PM, Tomas Vondra wrote: The 'combine' function gets two such 'state' values, while transition gets 'state' + next value. I think the combine function is not actually a property of the aggregate, but a property of the transition function. If two aggregates have the same transition function, they will also have the same combine function. The combine function really just says, how do you combine two series of these function calls. Say maybe this should be put into pg_proc instead. (Or you make the transition functions transition operators and put the info into pg_operator instead, which is where function call optimization information is usually kept.) This seems like a weird design to me. It's probably true that if the transition function is the same, the state-combiner function will also be the same. But the state-combiner function is only going to exist for aggregate transition functions, not functions or operators in general. So linking it from pg_proc or pg_operator feels wrong to me. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Re: [HACKERS] NULL-pointer check and incorrect comment for pstate in addRangeTableEntry
On Wed, Mar 4, 2015 at 6:43 AM, Robert Haas wrote: That seems to make sense to me. Committed. Thanks. -- Michael
Re: [HACKERS] Idea: closing the loop for pg_ctl reload
On 3/3/15 5:13 PM, Tom Lane wrote: Jim Nasby jim.na...@bluetreble.com writes: On 3/3/15 11:48 AM, Andres Freund wrote: It'll be confusing to have different interfaces in one/multiple error cases. If we simply don't want the code complexity then fine, but I just don't buy this argument. How could it possibly be confusing? What I'm concerned about is confusing the code. There is a lot of stuff that looks at pidfiles and a lot of it is not very bright (note upthread argument about cat vs head -1). I don't want possibly localized (non-ASCII) text in there, especially when there's not going to be any sane way to know which encoding it's in. And I definitely don't want multiline error messages in there. Definitely no multi-line. If we keep that restriction, couldn't we just dedicate one entire line to the error message? ISTM that would be safe. It's possible we could dumb things down enough to meet these restrictions, but I'd really rather not go there at all. IMHO the added DBA convenience would be worth it (assuming we can make it safe). I know I'd certainly appreciate it... On 3/3/15 5:24 PM, Jan de Visser wrote: On March 3, 2015 04:57:58 PM Jim Nasby wrote: On 3/3/15 11:48 AM, Andres Freund wrote: I'm saying that you'll need a way to notice that a reload was processed or not. And that can't really be the message itself, there has to be some other field; like the timestamp Tom proposes. Ahh, right. We should make sure we don't go brain-dead if the system clock moves backwards. I assume we couldn't just fstat the file... The timestamp field is already there (in my patch). It gets populated when the server starts and repopulated by SIGHUP_handler. It's the only timestamp pg_ctl pays attention to. What happens if someone does a reload sometime in the future (perhaps because they've messed with the system clock for test purposes). Do we still detect a new reload? -- Jim Nasby, Data Architect, Blue Treble Consulting Data in Trouble? Get it in Treble! 
http://BlueTreble.com
Re: [HACKERS] Providing catalog view to pg_hba.conf file - Patch submission
On Wed, Mar 4, 2015 at 12:17 AM, Jim Nasby jim.na...@bluetreble.com wrote: I can make these changes if you want. Personally I'm just not convinced this is worth it. It makes the catalogs harder for people to read and use and only benefits people who have users named all or databases named all, sameuser, or samerole which I can't really imagine would be anyone. If this were going to be the infrastructure on which lots of tools rested rather than just a read-only view that was mostly going to be read by humans that might be different. Are you envisioning a tool that would look at this view, offer a gui for users to make changes based on that information, and build a new pg_hba.conf to replace it? -- greg
Re: [HACKERS] Providing catalog view to pg_hba.conf file - Patch submission
On Wed, Mar 4, 2015 at 12:34 AM, Haribabu Kommi kommi.harib...@gmail.com wrote: Out of curiosity, regarding the result materialize code addition, Any way the caller of hba_settings function ExecMakeTableFunctionResult also stores the results in tuple_store. Is there any advantage doing it in hba_settings function rather than in the caller? That might protect against concurrent pg_hba reloads, though I suspect there's a CHECK_FOR_INTERRUPTS hanging out in that loop. We could maybe put one in this loop and check if the parser memory context has been reset. But the immediate problem is that having the API that looked up a line by line number and then calling it repeatedly with successive line numbers was O(n^2). Each time it was called it had to walk down the list from the head all over again. 1 + 2 + 3 + 4 + ... n = n(n+1)/2 or O(n^2). This isn't performance critical but it would not have run in a reasonable length of time for large pg_hba files. And having to store the state between calls means the code is at least as simple for the tuplestore, imho even simpler. -- greg
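Greg's quadratic-cost point can be sketched in isolation. The list and function names below are invented for illustration (this is not the actual hba-parsing code); the sketch just counts node hops when every one of n calls restarts from the head, giving 0 + 1 + ... + (n-1) = n(n-1)/2 hops in total.

```c
#include <stdlib.h>

/* Hypothetical singly linked list standing in for the parsed hba lines. */
typedef struct Node { struct Node *next; } Node;

static long visits;             /* counts node hops, to compare total work */

/* Build a list of n nodes. */
static Node *make_list(int n)
{
    Node *head = NULL;
    for (int i = 0; i < n; i++)
    {
        Node *node = malloc(sizeof(Node));
        node->next = head;
        head = node;
    }
    return head;
}

/* Fetch line k by walking from the head, as a line-number API must. */
static Node *nth(Node *head, int k)
{
    Node *cur = head;
    for (int i = 0; i < k; i++)
    {
        visits++;
        cur = cur->next;
    }
    return cur;
}

/* Total node hops when each of n successive calls restarts from the
 * head: 0 + 1 + ... + (n-1) = n(n-1)/2, i.e. O(n^2) for the whole scan. */
static long lookup_each_line(int n)
{
    Node *head = make_list(n);
    visits = 0;
    for (int k = 0; k < n; k++)
        nth(head, k);
    return visits;
}
```

A tuplestore-style approach walks the list exactly once while materializing, so the same n fetches cost only n hops.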
[HACKERS] Obsolete SnapshotNow reference within snapbuild.c
SnapBuildCommitTxn() has what I gather is an obsolete reference to SnapshotNow(). Attached patch corrects this. -- Peter Geoghegan diff --git a/src/backend/replication/logical/snapbuild.c b/src/backend/replication/logical/snapbuild.c index e911453..ff5ff26 100644 --- a/src/backend/replication/logical/snapbuild.c +++ b/src/backend/replication/logical/snapbuild.c @@ -1077,7 +1077,7 @@ SnapBuildCommitTxn(SnapBuild *builder, XLogRecPtr lsn, TransactionId xid, /* refcount of the snapshot builder for the new snapshot */ SnapBuildSnapIncRefcount(builder->snapshot); - /* add a new SnapshotNow to all currently running transactions */ + /* add a new Snapshot to all currently running transactions */ SnapBuildDistributeNewCatalogSnapshot(builder, lsn); } else
Re: [HACKERS] Proposal: knowing detail of config files via SQL
Stephen Frost sfr...@snowman.net writes: It's not a documented policy but it's certainly what a whole slew of functions *do*. Including pg_start_backup, pg_stop_backup, pg_switch_xlog, pg_rotate_logfile, pg_create_restore_point, pg_xlog_replay_pause, lo_import, lo_export, and pg_xlog_replay_resume, pg_read_file, pg_read_text_file, pg_read_binary_file, and pg_ls_dir. Indeed, it's the larger portion of what the additional role attributes are for. I'm not entirely thrilled to hear that nearly all of that work might be moot, but if that's the case then so be it and let's just remove all those superuser checks and instead revoke EXECUTE from public and let the superuser grant out those rights. It does seem like that could be an easier and more flexible answer than inventing a pile of role attributes. The discussion I'm having with Peter on another thread is a very similar case that should be looping in, which is if we should continue to have any superuser check on updating catalog tables. He is advocating that we remove that also and let the superuser GRANT modification access on catalog tables to users. For my part, I understood that we specifically didn't want to allow that for the same reason that we didn't want to simply depend on the GRANT system for the above functions, but everyone else on these discussions so far is advocating for using the GRANT system. My memory might be wrong, but I could have sworn that I had brought up exactly that suggestion in years past only to have it shot down. I'm a bit less comfortable with this one, mainly because the ability to modify the system catalogs directly implies the ability to crash, corrupt, or hijack the database in any number of non-obvious ways. I would go so far as to argue that if you grant me modify permissions on many of the catalogs, I would have the ability to become superuser outright, and therefore any notion that such a grant is any weaker than granting full superuserness is a dangerous lie. 
It may be that the same type of permissions escalation is possible with some of the functions you mention above; but anywhere that it isn't, I should think GRANT on the function is a reasonable solution. One aspect of this that merits some thought is that in some cases access to some set of functions is best granted as a unit. That's easy with role properties but much less so with plain GRANT. Do we have enough such cases to make it an issue? regards, tom lane
Re: [HACKERS] failures with tuplesort and ordered set aggregates (due to 5cefbf5a6c44)
On Tue, Mar 3, 2015 at 3:53 PM, Robert Haas robertmh...@gmail.com wrote: I find your statement that this is a pre-existing issue in tuplesort_begin_datum() to be pretty misleading, unless what you mean by it is pre-existing since November, when an earlier patch by Peter Geoghegan changed the comments without changing the code to match. Commit 5ea86e6e65dd2da3e9a3464484985d48328e7fe3 changed the comments for the sortKey field to say that it would be set in all cases except for the hash index case, but it did not make that statement true. Agreed. I believe that is in fact what I meant. I'm not trying to deflect blame - just to explain my understanding of the problem. Subsequently, another of your patches, which I committed as 5cefbf5a6c4466ac6b1cc2a4316b4eba9108c802, added a dependency on state->sortKeys always being set. Now you've got this patch here, which you claim makes sure that sortKeys is always set, but it actually doesn't, which is why the above still crashes. There are not so many tuplesort_begin_xxx routines that you couldn't audit them all before submitting your patches. Sorry. I should have caught the hash index case too. Anyway, I propose the attached instead, which makes both Tomas's test case and the one above pass. My patch actually matches Andrew Gierth's datumsort patch, in that it also uses this convention, as I believe it should. For that reason, I'd prefer to make the comment added in November true, rather than changing the comment...I feel it ought to be true anyway (which is what I meant about a pre-existing issue). You'll still need the defenses you've added for the hash case, of course. You might want to also put a comment specifying why it's necessary, since the hash tuple case is the oddball case that doesn't always have sortKeys set. In other words, I suggest that you commit the union of our two patches, less your changes to the comments around sortKeys. 
Thanks -- Peter Geoghegan
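The convention Peter is arguing for — sortKeys set in every case, onlyKey set in addition only when the single-key optimization applies — can be illustrated with a toy struct. The names below are hypothetical stand-ins, not the real Tuplesortstate:

```c
#include <stddef.h>

/* Hypothetical stand-ins for the tuplesort fields under discussion. */
typedef struct SortKey { int attno; } SortKey;

typedef struct SortState
{
    int      nkeys;
    SortKey *sortKeys;          /* convention: always set once initialized */
    SortKey *onlyKey;           /* set only when the one-key fast path applies */
} SortState;

/* Initialize per the proposed convention: never leave sortKeys NULL,
 * and point onlyKey at the (single) key only when there is exactly one. */
static void init_keys(SortState *state, int nkeys, SortKey *keys)
{
    state->nkeys = nkeys;
    state->sortKeys = keys;                      /* set in all cases */
    state->onlyKey = (nkeys == 1) ? keys : NULL;
}

/* Helper: does the invariant leave onlyKey set for this key count?
 * (Uses a small static key array; supports up to 4 keys for the demo.) */
static int only_key_set(int nkeys)
{
    static SortKey keys[4];
    SortState s;

    init_keys(&s, nkeys, keys);
    return s.sortKeys != NULL && s.onlyKey != NULL;
}
```

Under this convention, code like mergeruns can rely on sortKeys unconditionally, while the onlyKey shortcut remains an optional extra.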
Re: [HACKERS] Proposal: knowing detail of config files via SQL
On 3/3/15 5:22 PM, Stephen Frost wrote: The problem with the role attribute approach is that they aren't inherited the way GRANTs are, which means you can't have a backup role that is then granted out to users, you'd have to set a BACKUP role attribute for every role added. Yeah, but you'd still have to grant backup to every role created anyway, right? Or you could create a role that has the backup attribute and then grant that to users. Then they'd have to intentionally SET ROLE my_backup_role to elevate their privilege. That seems like a safer way to do things... -- Jim Nasby, Data Architect, Blue Treble Consulting Data in Trouble? Get it in Treble! http://BlueTreble.com
Re: [HACKERS] logical column ordering
On Tue, Mar 3, 2015 at 11:41:20AM -0600, Jim Nasby wrote: Wouldn't it need attno info too, so all 3 orderings? Uh, what is the third ordering? Physical, logical, and ? It already gets information about dropped columns, if that is the third one. attnum; used in other catalogs to reference columns. If you're shuffling everything through pg_dump anyway then maybe it's not needed... Yes, all those attno system table links are done with pg_dump SQL commands. -- Bruce Momjian br...@momjian.us http://momjian.us EnterpriseDB http://enterprisedb.com + Everyone has their own god. +
Re: [HACKERS] Providing catalog view to pg_hba.conf file - Patch submission
On 03/03/2015 05:07 PM, Greg Stark wrote: On Wed, Mar 4, 2015 at 12:17 AM, Jim Nasby jim.na...@bluetreble.com wrote: I can make these changes if you want. Personally I'm just not convinced this is worth it. It makes the catalogs harder for people to read and use and only benefits people who have users named all or databases named all, sameuser, or samerole which I can't really imagine would be anyone. If this were going to be the infrastructure on which lots of tools rested rather than just a read-only view that was mostly going to be read by humans that might be different. Are you envisioning a tool that would look at this view, offer a gui for users to make changes based on that information, and build a new pg_hba.conf to replace it? I'm planning to write such a tool. However, I am not concerned about weird name cases like the above. -- Josh Berkus PostgreSQL Experts Inc. http://pgexperts.com
Re: [HACKERS] Providing catalog view to pg_hba.conf file - Patch submission
On Wed, Mar 4, 2015 at 12:18 PM, Greg Stark st...@mit.edu wrote: On Wed, Mar 4, 2015 at 12:34 AM, Haribabu Kommi kommi.harib...@gmail.com wrote: Out of curiosity, regarding the result materialize code addition, Any way the caller of hba_settings function ExecMakeTableFunctionResult also stores the results in tuple_store. Is there any advantage doing it in hba_settings function rather than in the caller? That might protect against concurrent pg_hba reloads, though I suspect there's a CHECK_FOR_INTERRUPTS hanging out in that loop. We could maybe put one in this loop and check if the parser memory context has been reset. I feel there is no problem of concurrent pg_hba reloads, because the check_for_interrupts doesn't reload the conf files. It will be done in the postgresMain function once the query finishes. Am I missing something? + foreach(line, parsed_hba_lines) In the above for loop it is better to add check_for_interrupts to avoid it looping if the parsed_hba_lines are more. But the immediate problem is that having the API that looked up a line by line number and then calling it repeatedly with successive line numbers was O(n^2). Each time it was called it had to walk down the list from the head all over again. 1 + 2 + 3 + 4 + ... n = n(n+1)/2 or O(n^2). This isn't performance critical but it would not have run in a reasonable length of time for large pg_hba files. And having to store the state between calls means the code is at least as simple for the tuplestore, imho even simpler. Got it. Thanks. Regards, Hari Babu Fujitsu Australia
Re: [HACKERS] Idea: GSoC - Query Rewrite with Materialized Views
On 3/3/15 3:34 PM, David Fetter wrote: On Tue, Mar 03, 2015 at 05:49:06PM -0300, Alvaro Herrera wrote: Jim Nasby wrote: FWIW, what I would find most useful at this point is a way to get the equivalent of an AFTER STATEMENT trigger that provided all changed rows in a MV as the result of a statement. Ah, like https://www.postgresql.org/message-id/1402790204.65037.YahooMailNeo%40web122301.mail.ne1.yahoo.com Yes, very much like that. Actually, I was talking about the next step beyond that. I don't want what changed in a single table; I want what changed *in the source of the entire MV*. Kevin has a whitepaper that describes how to do this in set notation; theoretically this is a matter of converting that to SQL. IIRC this needs the deltas and current (or maybe NEW and OLD) for every table in the MV. So one way you could model this is a function that accepts a bunch of NEW and OLD recordsets. Theoretically you could actually drive that with per-row triggers, but the performance would presumably suck. Next best thing would be providing NEW and OLD for AFTER STATEMENT triggers (what Kevin was talking about in that email). Though, if you're driving this at a statement level that means you can't actually reference the MV in a statement that's performing DML on any of the dependent tables. As you can see, this is all pretty involved. Doing just a small part of this would make for a good GSoC project. AFTER STATEMENT NEW and OLD might be a good project; I don't know how much more work Kevin's stuff needs. But there's much greater value in creating something that would take the definition for a MV and turn that into appropriate delta logic. That would be the basis for detecting if a MV was stale (beyond just the gross level check of were any of the tables involved touched), and is what is needed to do *any* kind of incremental update. That logic doesn't have to be driven by triggers. 
For example, you could have PgQ or similar capture all DML on all tables for a MV and feed that data to the delta logic on an async incremental basis. It's pretty easy for an end user to setup PgQ or similar but doing the delta logic is tightly coupled to the MV definition, which would be very hard for an end user to deal with. -- Jim Nasby, Data Architect, Blue Treble Consulting Data in Trouble? Get it in Treble! http://BlueTreble.com
Re: [HACKERS] Providing catalog view to pg_hba.conf file - Patch submission
On 3/3/15 12:57 PM, Greg Stark wrote: On Tue, Mar 3, 2015 at 6:05 PM, Jim Nasby jim.na...@bluetreble.com wrote: What about a separate column that's just the text from pg_hba? Or is that what you're opposed to? I'm not sure what you mean by that. There's a rawline field we could put somewhere but it contains the entire line. I mean have a field for each of user/databases that gives you valid pg_hba.conf output. That would allow for cut paste. But perhaps that's just not a use case. FWIW, I'd say that having the individual array elements be correct is more important than what the result of array_out is. That way you could always do array_to_string(..., ', ') and get valid pg_hba output. Well I don't think you can get that without making the view less useful for every other purpose. Like, I would want to be able to do WHERE user @> array[?] or WHERE database = array[?] or to join against a list of users or databases somewhere else. I think we're screwed in that regard anyway, because of the special constructs. You'd need different logic to handle things like +role and sameuser. We might even end up painted in a corner where we can't change it in the future because it'll break everyone's scripts. To do what you suggest would mean the tokens will need to be quoted based on pg_hba.conf syntax requirements. That would mean I would need to check each variable or join value against pg_hba.conf's quoting requirements to compare with it. It seems more practical to have that knowledge if you're actually going to generate a pg_hba.conf than to pass around these quoted strings all the time. What about this: - database_specials enum[] contains all occurrences of all, sameuser, samerole and replication (or maybe it doesn't need to be an array) - in_roles name[] is a list of all cases of +role_name, with the + stripped off (I think the @ case is already handled before the SRF is called??) 
- role_specials enum[] handles all (enum[] for any future expansion) Alternatively in_roles could just be combined with role_specials as long as we preserve the +. Any tokens that match the special conditions do not show up in databases/users, and those fields become name[]. AFAIK that means the quoting should be identical (at least looking at check_db() and check_role()). I can make these changes if you want. -- Jim Nasby, Data Architect, Blue Treble Consulting Data in Trouble? Get it in Treble! http://BlueTreble.com
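The token split Jim proposes could classify roughly like this. The function and enum names are invented for illustration; the real checks live alongside check_db() and check_role():

```c
#include <string.h>

/* Hypothetical classifier mirroring the proposed split of pg_hba tokens
 * into plain names, "+role" group references, and special keywords. */
typedef enum { TOK_NAME, TOK_GROUP, TOK_SPECIAL } TokenKind;

static TokenKind classify_hba_token(const char *tok)
{
    /* Keywords that carry special meaning in the user/database columns. */
    if (strcmp(tok, "all") == 0 ||
        strcmp(tok, "sameuser") == 0 ||
        strcmp(tok, "samerole") == 0 ||
        strcmp(tok, "replication") == 0)
        return TOK_SPECIAL;

    /* "+role" means membership in the named role. */
    if (tok[0] == '+')
        return TOK_GROUP;

    /* Everything else is a plain user or database name. */
    return TOK_NAME;
}
```

Routing TOK_SPECIAL and TOK_GROUP tokens into their own columns would leave the main users/databases arrays holding only plain names, which is what lets those columns be typed name[] with ordinary quoting.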
Re: [HACKERS] Providing catalog view to pg_hba.conf file - Patch submission
On Wed, Mar 4, 2015 at 5:57 AM, Greg Stark st...@mit.edu wrote: On further review I've made a few more changes attached. I think we should change the column names to users and databases to be clear they're lists and also to avoid the user SQL reserved word. I removed the dependency on strlist_to_array which is in objectaddress.c which isn't a very sensible dependency -- it does seem like it would be handy to have a list-based version of construct_array moved to arrayfuncs.c but for now it's not much more work to handle these ourselves. I changed the options to accumulate one big array instead of an array of bunches of options. Previously you could only end up with a singleton array with a comma-delimited string or a two element array with one of those and map=. The changes are fine, this eliminates the unnecessary dependency on objectaddress. I think the error if pg_hba fails to reload needs to be LOG. It would be too unexpected to the user who isn't necessarily the one who issued the SIGHUP to spontaneously get a warning. I also removed the mask from local entries and made some of the NULLS that shouldn't be possible to happen (unknown auth method or connection method) actually throw errors. changes are fine. All the tests are passed. Out of curiosity, regarding the result materialize code addition, Any way the caller of hba_settings function ExecMakeTableFunctionResult also stores the results in tuple_store. Is there any advantage doing it in hba_settings function rather than in the caller? Regards, Hari Babu Fujitsu Australia
Re: [HACKERS] Idea: closing the loop for pg_ctl reload
On 3/3/15 11:48 AM, Andres Freund wrote: On 2015-03-03 11:43:46 -0600, Jim Nasby wrote: It's certainly better than now, but why put DBAs through an extra step for no reason? Because it makes it more complicated than it already is? It's nontrivial to capture the output, escape it to somehow fit into a delimited field, et al. I'd rather have a committed improvement, than talks about a bigger one. I don't see how this would be significantly more complex, but of course that's subjective. Though, in the case of multiple errors perhaps it would be best to just report a count and point them at the log. It'll be confusing to have different interfaces in one/multiple error cases. If we simply don't want the code complexity then fine, but I just don't buy this argument. How could it possibly be confusing? Either you'll get the actual error message or a message that says Didn't work, check the log. And the error will always be in the log no matter what. I really can't imagine that anyone would be unhappy that we just up-front told them what the error was if we could. Why make them jump through needless hoops? I'm saying that you'll need a way to notice that a reload was processed or not. And that can't really be the message itself, there has to be some other field; like the timestamp Tom proposes. Ahh, right. We should make sure we don't go brain-dead if the system clock moves backwards. I assume we couldn't just fstat the file... -- Jim Nasby, Data Architect, Blue Treble Consulting Data in Trouble? Get it in Treble! http://BlueTreble.com
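One way to stay safe against the clock-moves-backwards case is for pg_ctl to test the recorded pidfile timestamp for inequality rather than ordering. A minimal sketch with hypothetical names, not the actual patch's code:

```c
#include <stdbool.h>

/* pg_ctl records the timestamp field from the pidfile, sends SIGHUP,
 * then polls.  Treating the reload as acknowledged as soon as the value
 * differs at all ("now != before" rather than "now > before") keeps the
 * check correct even if the system clock has moved backwards since the
 * server last wrote the field. */
static bool reload_acknowledged(long stamp_before_sighup, long stamp_now)
{
    return stamp_now != stamp_before_sighup;
}
```

The check only fails in the pathological case where the clock lands on exactly the previously recorded value at the moment the server rewrites the field.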
Re: [HACKERS] Proposal: knowing detail of config files via SQL
* Tom Lane (t...@sss.pgh.pa.us) wrote: Stephen Frost sfr...@snowman.net writes: The discussion I'm having with Peter on another thread is a very similar case that should be looping in, which is if we should continue to have any superuser check on updating catalog tables. He is advocating that we remove that also and let the superuser GRANT modification access on catalog tables to users. For my part, I understood that we specifically didn't want to allow that for the same reason that we didn't want to simply depend on the GRANT system for the above functions, but everyone else on these discussions so far is advocating for using the GRANT system. My memory might be wrong, but I could have sworn that I had brought up exactly that suggestion in years past only to have it shot down. I'm a bit less comfortable with this one, mainly because the ability to modify the system catalogs directly implies the ability to crash, corrupt, or hijack the database in any number of non-obvious ways. I would go so far as to argue that if you grant me modify permissions on many of the catalogs, I would have the ability to become superuser outright, and therefore any notion that such a grant is any weaker than granting full superuserness is a dangerous lie. I tend to agree with this approach and it's what I was advocating for in my overall analysis of superuser checks that gave rise to the additional role attributes idea. If it can be used to become superuser then it doesn't make sense to have it be independent from superuser. It may be that the same type of permissions escalation is possible with some of the functions you mention above; but anywhere that it isn't, I should think GRANT on the function is a reasonable solution. The functions identified for the new role attributes were specifically those that, I believe, could be used without also giving the user superuser rights. 
Those are: pg_start_backup, pg_stop_backup, pg_switch_xlog, pg_rotate_logfile, pg_create_restore_point, pg_xlog_replay_pause, lo_import, lo_export, and pg_xlog_replay_resume. One aspect of this that merits some thought is that in some cases access to some set of functions is best granted as a unit. That's easy with role properties but much less so with plain GRANT. Do we have enough such cases to make it an issue? That's what roles are for, in my mind. If we're fine with using the grant system to control executing these functions then I would suggest we address this by providing some pre-configured roles or even just documentation which list what a given role would need. That's certainly what a lot of people coming from other databases expect. The problem with the role attribute approach is that they aren't inherited the way GRANTs are, which means you can't have a backup role that is then granted out to users, you'd have to set a BACKUP role attribute for every role added. Thanks, Stephen signature.asc Description: Digital signature
Re: [HACKERS] INSERT ... ON CONFLICT UPDATE and logical decoding
On Tue, Mar 3, 2015 at 3:13 AM, Andres Freund and...@2ndquadrant.com wrote: toast_save_datum() is called with a heap_insert() call before heap insertion for the tuple proper. We're relying on the assumption that if there is no immediate super deletion record, things are fine. We cannot speculatively insert into catalogs, BTW. I fail to see what prohibits e.g. another catalog modifying transaction to commit inbetween and distribute a new snapshot. SnapBuildDistributeNewCatalogSnapshot()/REORDER_BUFFER_CHANGE_INTERNAL_SNAPSHOT does look like a problematic case. It's problematic specifically because it involves some xact queuing a change to *some other* xact - we cannot assume that this won't happen between WAL-logging within heap_insert() and heap_delete(). Can you think of any other such cases? I think that REORDER_BUFFER_CHANGE_INTERNAL_COMMAND_ID should be fine. REORDER_BUFFER_CHANGE_INTERNAL_TUPLECID is not an issue either. In both cases I believe my assumption does not break because there won't be writes to system catalogs between the relevant heap_insert() and heap_delete() calls, which are tightly coupled, and also because speculative insertion into catalogs is unsupported. That just leaves the non-*CHANGE_INTERNAL_* (regular DML) cases, which should also be fine. As for SnapBuildDistributeNewCatalogSnapshot(), I'm open to suggestions. Do you see any opportunity around making assumptions about heavyweight locking making the interleaving of some REORDER_BUFFER_CHANGE_INTERNAL_SNAPSHOT change between a REORDER_BUFFER_CHANGE_INTERNAL_INSERT change and a REORDER_BUFFER_CHANGE_INTERNAL_DELETE change? AFAICT, that's all this approach really needs to worry about (or the interleaving of something else not under the control of speculative insertion - doesn't have to be a REORDER_BUFFER_CHANGE_INTERNAL_SNAPSHOT, that's just the most obvious problematic case). 
Why should we see a REORDER_BUFFER_CHANGE_INTERNAL_COMMAND_ID case next, in the case that we need to care about (when the tuple was super deleted immediately afterwards)? It's irrelevant whether you care about it or not. Your ReorderBufferIterTXNNext() consumes a change that needs to actually be processed. It could very well be something important like a new snapshot. But it is actually processed, in the next iteration (we avoid calling ReorderBufferIterTXNNext() at the top of the loop if it has already been called for that iteration as part of peeking ahead). AFAICT all that is at issue is the safety of one particular assumption I've made: that it is safe to assume that there will never be a super deletion in the event of not seeing a super deletion change immediately following a speculative insertion change within some xact when decoding. It's still going to be processed if it's something else. The implementation will, however, fail to consolidate the speculative insertion change and super deletion change if my assumption is ever faulty. This whole approach doesn't seem to be all that different to how there is currently coordination within ReorderBufferCommit() between TOAST-related changes, and changes to the relation proper. In fact, I've now duplicated the way the IsToastRelation() case performs dlist_delete(change->node) in order to avoid chunk reuse in the event of spilling to disk. Stress testing by decreasing max_changes_in_memory to 1 does not exhibit broken behavior, I believe (although that does break the test_decoding regression tests on the master branch, FWIW). Also, obviously I have not considered the SnapBuildDistributeNewCatalogSnapshot() case too closely. I attach an updated patch that I believe at least handles the serialization aspects correctly, FYI. 
Note that I assert that REORDER_BUFFER_CHANGE_INTERNAL_DELETE follows a REORDER_BUFFER_CHANGE_INTERNAL_INSERT, so if you can find a way to break the assumption it should cause an assertion failure, which is something to look out for during testing. While what I have here *is* very ugly, I see no better alternative. By doing what you suggest, we'd be finding a special-case way of safely violating the general WAL log-before-data rule. No, it won't. We don't WAL-log modifications that don't need to persist in a bunch of places. It'd be perfectly fine to start upsert with an 'acquiring an insertion lock' record that pretty much only contains the item pointer (and an FPI if necessary to prevent torn pages). And then only log the full new record once it's sure that the whole thing needs to survive. Redo then can simply clean out the space touched by the insertion lock. That seems like a lot of work in the general case to not do something that will only very rarely need to occur anyway. The optimistic approach of value locking scheme #2 has a small race that is detected (which necessitates super deletion), but that will only very rarely be required. Also, there is value in making super deletion (and speculative insertion) as close as possible to regular deletion
Re: [HACKERS] Bootstrap DATA is a pita
On Sat, Feb 21, 2015 at 11:34 AM, Tom Lane t...@sss.pgh.pa.us wrote: Andres Freund and...@2ndquadrant.com writes: On 2015-02-20 22:19:54 -0500, Peter Eisentraut wrote: On 2/20/15 8:46 PM, Josh Berkus wrote: Or what about just doing CSV? I don't think that would actually address the problems. It would just be the same format as now with different delimiters. Yea, we need hierarchies and named keys. Yeah. One thought though is that I don't think we need the data layer in your proposal; that is, I'd flatten the representation to something more like { oid => 2249, oiddefine => 'CSTRINGOID', typname => 'cstring', typlen => -2, typbyval => 1, ... } Even this promises to vastly increase the number of lines in the file, and make it harder to compare entries by grepping out some common substring. I agree that the current format is a pain in the tail, but pg_proc.h is 5k lines already. I don't want it to be 100k lines instead. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
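The relationship between the two formats under discussion is simple to illustrate. The current DATA lines are positional, so turning them into the named-key form Tom sketches is just pairing the catalog's column list with each tuple. This sketch uses the column names and values from the example in the thread, not a real pg_type.h entry:

```python
# Positional DATA tuple + a column list -> the flattened named-key form.
# Column names/values are taken from the { oid => 2249, ... } example above.
columns = ["oid", "oiddefine", "typname", "typlen", "typbyval"]
values = [2249, "CSTRINGOID", "cstring", -2, 1]

entry = dict(zip(columns, values))

assert entry["typname"] == "cstring"
assert entry["typlen"] == -2
```

The trade-off Robert points at is visible here too: one positional line becomes as many key/value pairs as there are columns, which is where the feared line-count blowup comes from.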
Re: [HACKERS] Bug in pg_dump
On Wed, Mar 4, 2015 at 6:48 AM, Peter Eisentraut pete...@gmx.net wrote: - set up basic scaffolding for TAP tests in src/bin/pg_dump Agreed. - write a Perl function that can create an extension on the fly, given name, C code, SQL code I am perplexed by that. Where would the SQL code or C code be stored? In the pl script itself? I cannot really see the advantage of automatically generating the skeleton of an extension from some C or SQL code, compared to storing the extension statically as-is. Adding those extensions to src/test/modules is out of scope, so as not to bloat it, so we could for example add such test extensions in t/extensions/ or similar, have prove_check scan this folder, and then install those extensions in the temporary installation. - add to the proposed t/001_dump_test.pl code to write the extension - add that test to the pg_dump test suite Eventually, the dump-and-restore routine could also be refactored, but as long as we only have one test case, that can wait. Agreed on all those points. -- Michael
[HACKERS] MD5 authentication needs help
It feels like MD5 has accumulated enough problems that we need to start looking for another way to store and pass passwords. The MD5 problems are:

1) MD5 makes users feel uneasy (though our usage is mostly safe)

2) The per-session salt sent to the client is only 32 bits, meaning that it is possible to replay an observed MD5 hash in ~16k connection attempts.

3) Using the user name for the MD5 storage salt allows the MD5 stored hash to be used on a different cluster if the user used the same password.

4) Using the user name for the MD5 storage salt causes the renaming of a user to break the stored password.

For these reasons, it is probably time to start thinking about a replacement that fixes these issues. We would keep MD5 but recommend a better option. -- Bruce Momjian br...@momjian.us http://momjian.us EnterpriseDB http://enterprisedb.com + Everyone has their own god. +
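Problems 3 and 4 follow directly from how the hashes are derived. A minimal sketch (not PostgreSQL source, just the documented scheme reproduced with hashlib): the stored hash salts with the user name, and the wire response re-hashes the stored value with the random per-session salt.

```python
import hashlib

def stored_hash(password: str, username: str) -> str:
    # What pg_authid stores: 'md5' || md5(password || username).
    # The user name is the only salt, hence problems 3 and 4.
    return "md5" + hashlib.md5((password + username).encode()).hexdigest()

def wire_response(password: str, username: str, session_salt: bytes) -> str:
    # What the client sends back: 'md5' || md5(md5(password || username) || salt),
    # where salt is the random 4-byte (32-bit) per-session value from the server.
    inner = hashlib.md5((password + username).encode()).hexdigest()
    return "md5" + hashlib.md5(inner.encode() + session_salt).hexdigest()

# Problem 3: same user name + password yields the identical stored hash on
# any cluster, so a stolen hash is reusable elsewhere.
assert stored_hash("secret", "alice") == stored_hash("secret", "alice")
# Problem 4: renaming the user invalidates the stored hash.
assert stored_hash("secret", "alice") != stored_hash("secret", "bob")
```

Problem 2 is also visible here: `session_salt` is the only per-connection variation, and with just 2^32 possible values an attacker who has observed responses can wait for a salt to repeat and replay the captured answer without knowing the password.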
Re: Custom/Foreign-Join-APIs (Re: [HACKERS] [v9.5] Custom Plan API)
Obviously an FDW can add multiple paths at a time, like GetForeignPaths, so IMO it should be renamed to GetForeignJoinPaths, in plural form. In addition to that, the new member of RelOptInfo, fdw_handler, should be initialized explicitly in build_simple_rel. Please see attached a patch for these changes. Thanks for your checks. Yep, the name of the FDW handler should be ...Paths(), instead of Path(). The attached one integrates Hanada-san's updates. -- NEC OSS Promotion Center / PG-Strom Project KaiGai Kohei kai...@ak.jp.nec.com -Original Message- From: Shigeru Hanada [mailto:shigeru.han...@gmail.com] Sent: Tuesday, March 03, 2015 9:26 PM To: Kaigai Kouhei(海外 浩平) Cc: Robert Haas; Tom Lane; pgsql-hackers@postgreSQL.org Subject: Re: Custom/Foreign-Join-APIs (Re: [HACKERS] [v9.5] Custom Plan API) Kaigai-san, The v6 patch was cleanly applied on the master branch. I'll rebase my patch onto it, but before that I have a comment about the name of the new FDW API handler GetForeignJoinPath. Obviously an FDW can add multiple paths at a time, like GetForeignPaths, so IMO it should be renamed to GetForeignJoinPaths, in plural form. In addition to that, the new member of RelOptInfo, fdw_handler, should be initialized explicitly in build_simple_rel. Please see attached a patch for these changes. I'll review the v6 patch afterwards. 2015-03-03 20:20 GMT+09:00 Kouhei Kaigai kai...@ak.jp.nec.com: Sorry, I misoperated on patch creation. Attached one is the correct version.
-- NEC OSS Promotion Center / PG-Strom Project KaiGai Kohei kai...@ak.jp.nec.com -Original Message- From: pgsql-hackers-ow...@postgresql.org [mailto:pgsql-hackers-ow...@postgresql.org] On Behalf Of Kouhei Kaigai Sent: Tuesday, March 03, 2015 6:31 PM To: Kaigai Kouhei(海外 浩平); Robert Haas Cc: Tom Lane; pgsql-hackers@postgreSQL.org; Shigeru Hanada Subject: Re: Custom/Foreign-Join-APIs (Re: [HACKERS] [v9.5] Custom Plan API) The attached version of the custom/foreign-join interface patch fixes up the problem reported on the join-pushdown support thread. The previous version referenced *_ps_tlist in setrefs.c to check whether the Custom/ForeignScan node is associated with a particular base relation or not. This logic considered that the above nodes perform a base relation scan if *_ps_tlist is valid. However, it was incorrect in the case when the underlying pseudo-scan relation has an empty targetlist. Instead of the previous logic, it has been revised to check scanrelid itself. If zero, it means the Custom/ForeignScan node is not associated with a particular base relation; thus, its slot descriptor for the scan shall be constructed based on *_ps_tlist. Also, I noticed a potential problem if a CSP/FDW driver wants to display expression nodes using deparse_expression() but a varnode within this expression does not appear in the *_ps_tlist. For example, the remote query below shall return rows with two columns. SELECT atext, btext FROM tbl_a, tbl_b WHERE aid = bid; Thus, the ForeignScan will perform like a scan on a relation with two columns, and the FDW driver will set two TargetEntry nodes on the fdw_ps_tlist. If the FDW is designed to keep the join condition (aid = bid) in expression node form, it is expected to be saved in the custom/fdw_expr variable, and then setrefs.c rewrites the varnode according to *_ps_tlist. It means we also have to add both aid and bid to the *_ps_tlist to avoid failure on variable lookup. However, these additional entries change the definition of the slot descriptor.
So, I adjusted ExecInitForeignScan and ExecInitCustomScan to use ExecCleanTypeFromTL(), not ExecTypeFromTL(), when they construct the slot descriptor based on the *_ps_tlist. This expects CSP/FDW drivers to add target-entries with resjunk=true if they want to have additional entries for variable lookups on EXPLAIN command. Fortunately or unfortunately, postgres_fdw keeps its remote query in cstring form, so it does not need to add junk entries on the fdw_ps_tlist. Thanks, -- NEC OSS Promotion Center / PG-Strom Project KaiGai Kohei kai...@ak.jp.nec.com -Original Message- From: pgsql-hackers-ow...@postgresql.org [mailto:pgsql-hackers-ow...@postgresql.org] On Behalf Of Kouhei Kaigai Sent: Sunday, February 15, 2015 11:01 PM To: Kaigai Kouhei(海外 浩平); Robert Haas Cc: Tom Lane; pgsql-hackers@postgreSQL.org; Shigeru Hanada Subject: Re: Custom/Foreign-Join-APIs (Re: [HACKERS] [v9.5] Custom Plan API) The attached patch is a rebased version of join replacement with foreign-/custom-scan. There are no feature updates at this moment, but SGML documentation has been added (according to Michael's comment). This infrastructure allows foreign-data-wrapper and custom-scan-provider to add alternative scan paths towards relations join. From the viewpoint of the executor, it looks like a scan on a pseudo-relation that is
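The resjunk rule described above can be shown in miniature. This is a rough illustration, not executor code: entries marked resjunk exist in the ps_tlist only so that variable lookup (e.g. deparsing the join clause for EXPLAIN) succeeds, and they are excluded when the slot descriptor is built, mirroring the difference between ExecCleanTypeFromTL() and ExecTypeFromTL().

```python
# Toy target-list entries for the example join in the thread:
#   SELECT atext, btext FROM tbl_a, tbl_b WHERE aid = bid
def clean_type_from_tlist(tlist):
    """Columns that end up in the slot descriptor: junk entries are skipped,
    as ExecCleanTypeFromTL() does."""
    return [te["name"] for te in tlist if not te["resjunk"]]

fdw_ps_tlist = [
    {"name": "atext", "resjunk": False},
    {"name": "btext", "resjunk": False},
    {"name": "aid", "resjunk": True},  # present only so 'aid = bid' can be deparsed
    {"name": "bid", "resjunk": True},
]

# The scan tuple exposes only the two real output columns.
assert clean_type_from_tlist(fdw_ps_tlist) == ["atext", "btext"]
```

Without the junk entries, rewriting the varnodes in the saved (aid = bid) expression would fail; without the clean-vs-full distinction, the junk entries would widen the tuple descriptor, which is exactly the conflict the patch resolves.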