Re: [HACKERS] NOT Null constraint on foreign table not working
Rushabh Lathia wrote:
> I found constraints on foreign tables very useful for applications where multiple users access the same remote table through an FDW and each user wants to enforce a different constraint, or a different DEFAULT expression, on a particular column.
> I agree with you that if we want to enforce a constraint locally, it should be the FDW's task to do it rather than nodeModifyTable.c's.

I believe that a column of a foreign table should be NOT NULL only if it is guaranteed that it cannot contain NULL values. Doesn't the planner rely on that? But PostgreSQL cannot guarantee that; it has to happen on the remote side (or in the FDW). I think it is best that an error for a constraint violation is thrown by the same entity that guarantees that the constraint is respected. So I agree with your last statement.

Yours,
Laurenz Albe

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] proposal: hide application_name from other users
On 01/21/2014 07:22 AM, Harold Giménez wrote:
> First of all, I apologize for submitting a patch and missing the commitfest deadline. Given the size of the patch, I thought I'd submit it for your consideration regardless.
>
> This patch prevents non-superusers from viewing other users' pg_stat_activity.application_name. This topic was discussed some time ago [1] and consequently application_name was made world-readable [2]. I would like to propose that we hide it instead by reverting to the original behavior.
>
> There is a very large number of databases on the same cluster, shared across different users who can easily view each other's application_name values. Along with that, there are some libraries that default application_name to the name of the running process [3], which can leak information about what web servers, applications, queue systems, etc. are running. Furthermore, leaking application names in a multi-tenant environment is more information than an attacker should have access to on services like Heroku and other similar providers.

I don't find these arguments compelling enough to change it now. It's well documented that application_name is visible to everyone; just don't put sensitive information there. For those users who don't mind advertising application_name, the patch would be highly inconvenient. For example, the database owner could no longer see the application_name of other users connected to her database.

- Heikki
Re: [HACKERS] patch: option --if-exists for pg_dump
Hi Pavel,

Consider the following test scenario:

mydb=# \d emp
      Table "public.emp"
 Column | Type    | Modifiers
--------+---------+-----------
 empno  | integer | not null
 deptno | integer |
 ename  | text    |
Indexes:
    "emp_pkey" PRIMARY KEY, btree (empno)
Foreign-key constraints:
    "emp_deptno_fkey" FOREIGN KEY (deptno) REFERENCES dept(deptno)

mydb=# \d dept
      Table "public.dept"
 Column | Type    | Modifiers
--------+---------+-----------
 deptno | integer | not null
 dname  | text    |
Indexes:
    "dept_pkey" PRIMARY KEY, btree (deptno)
Referenced by:
    TABLE "emp" CONSTRAINT "emp_deptno_fkey" FOREIGN KEY (deptno) REFERENCES dept(deptno)

mydb=# \q
jeevan@ubuntu:~/pg_master$ ./install/bin/pg_dump -d mydb --if-exists -c > mydb_ic.dmp

I see the following lines in the dump, which look certainly wrong:
===
DROP FK CONSTRAINT IF EXISTS ublic.emp DROP CONSTRAINT emp_deptno_fkey;
DROP CONSTRAINT IF EXISTS Y public.emp DROP CONSTRAINT emp_pkey;
DROP CONSTRAINT IF EXISTS Y public.dept DROP CONSTRAINT dept_pkey;

When I try to restore, as expected it throws an error:
===
psql:mydb_ic.dmp:14: ERROR:  syntax error at or near "FK"
LINE 1: DROP FK CONSTRAINT IF EXISTS ublic.emp DROP CONSTRAINT emp_d...
             ^
psql:mydb_ic.dmp:15: ERROR:  syntax error at or near "CONSTRAINT"
LINE 1: DROP CONSTRAINT IF EXISTS Y public.emp DROP CONSTRAINT emp_p...
             ^
psql:mydb_ic.dmp:16: ERROR:  syntax error at or near "CONSTRAINT"
LINE 1: DROP CONSTRAINT IF EXISTS Y public.dept DROP CONSTRAINT dept...
             ^

Note:
===
Commands which are in the form ALTER TABLE ... DROP are failing. You need to test each and every object with a DROP .. IF EXISTS command. Better to write a small test case with all objects included.
The following logic has a flaw:
===
diff --git a/src/bin/pg_dump/pg_backup_archiver.c b/src/bin/pg_dump/pg_backup_archiver.c
index 7fc0288..0677712 100644
--- a/src/bin/pg_dump/pg_backup_archiver.c
+++ b/src/bin/pg_dump/pg_backup_archiver.c
@@ -413,8 +413,30 @@ RestoreArchive(Archive *AHX)
 				/* Select owner and schema as necessary */
 				_becomeOwner(AH, te);
 				_selectOutputSchema(AH, te->namespace);
-				/* Drop it */
-				ahprintf(AH, "%s", te->dropStmt);
+
+				if (*te->dropStmt != '\0')
+				{
+					/* Inject IF EXISTS clause when it is required. */
+					if (ropt->if_exists)
+					{
+						char		buffer[40];
+						size_t		l;
+
+						/* But do it only when it is not there yet. */
+						snprintf(buffer, sizeof(buffer), "DROP %s IF EXISTS",
+								 te->desc);
+						l = strlen(buffer);
+
+						if (strncmp(te->dropStmt, buffer, l) != 0)
+						{
+							ahprintf(AH, "DROP %s IF EXISTS %s",
+									 te->desc,
+									 te->dropStmt + l);
+						}
+						else
+							ahprintf(AH, "%s", te->dropStmt);
+					}
+				}

Also:
===
1. This is still out of sync:

@@ -348,6 +350,8 @@ main(int argc, char *argv[])
 		appendPQExpBufferStr(pgdumpopts, " --binary-upgrade");
 	if (column_inserts)
 		appendPQExpBufferStr(pgdumpopts, " --column-inserts");
+	if (if_exists)
+		appendPQExpBufferStr(pgdumpopts, " --if-exists");
 	if (disable_dollar_quoting)
 		appendPQExpBufferStr(pgdumpopts, " --disable-dollar-quoting");
 	if (disable_triggers)

2. Spell check required:

+	/* skip first n chars, and create a modifieble copy */
	modifieble => modifiable

+	/* DROP IF EXISTS pattern is not appliable on dropStmt */
	appliable => applicable

3.
+	/*
+	 * Object description is based on dropStmt statement. But
+	 * a drop statements can be enhanced about IF EXISTS clause.
+	 * We have to increase a offset in this case, IF EXISTS
+	 * should not be included on object description.
+	 */

Looks like you need to rephrase these comment lines. Something like:

/*
 * Object description is based on the dropStmt statement, which may have
 * an IF EXISTS clause. Thus we need to update the offset such that it
 * won't be included in the object description.
 */

Or as per your choice.
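The flaw can be reproduced outside pg_dump with a small model of that branch. This is a sketch in Python for illustration only: `inject_if_exists` is an invented name, and the exact prefix the real patch builds may differ slightly, so the precise garbled fragment varies; the point is that skipping `l` characters of a statement that does not start with the prefix mangles every ALTER TABLE ... DROP command.

```python
def inject_if_exists(desc, drop_stmt):
    # Model of the flawed patch logic: build "DROP <desc> IF EXISTS " and,
    # when drop_stmt does not already start with it, emit the prefix
    # followed by drop_stmt + l.  For ALTER TABLE ... DROP CONSTRAINT
    # statements, the offset chops off the start of the command.
    prefix = "DROP %s IF EXISTS " % desc
    l = len(prefix)
    if not drop_stmt.startswith(prefix):
        return "DROP %s IF EXISTS %s" % (desc, drop_stmt[l:])
    return drop_stmt

stmt = "ALTER TABLE public.emp DROP CONSTRAINT emp_deptno_fkey;"
mangled = inject_if_exists("FK CONSTRAINT", stmt)
print(mangled)
# Begins "DROP FK CONSTRAINT IF EXISTS" but the rest is a clipped fragment
# of the ALTER TABLE command -- invalid SQL, as in the reported dump.
```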
We need careful thought on the bug mentioned above.

Thanks

On Fri, Jan 17, 2014 at 6:23 PM, Pavel Stehule pavel.steh...@gmail.com wrote:
> Hello
>
> 2014/1/16 Jeevan Chalke jeevan.cha...@enterprisedb.com
>> Hi Pavel,
>>
>> I have reviewed the patch and here are my concerns and notes:
>>
>> POSITIVES:
>> ---
>> 1. Patch applies with some white-space errors.
>> 2. make / make install / make check is smooth. No issues as such.
>> 3. Feature looks good as well.
>> 4. NO
Re: [HACKERS] Funny representation in pg_stat_statements.query.
Hello,

I found it can be reasonably fixed by small modifications.

>> - CURRENT_TIME and the like are parsed into nodes that directly
>>   represent the node specs in gram.y
> blah, blah
> Since this becomes more than a simple bug fix, I think it is too late
> for 9.4 now. I'll work on this in the longer term.

The fundamental cause of this issue is a Const node which conveys the location of tokens other than constants. Any other nodes, for instance TypeCasts, are safe. So this is fixed in quite a simple way like the following, getting rid of the previously mentioned difficulties of keeping the code sane and of the total loss of token locations. (White space is collapsed for readability.)

| --- a/src/backend/parser/gram.y
| +++ b/src/backend/parser/gram.y
| @@ -11355,8 +11355,8 @@ func_expr_common_subexpr:
|              * to rely on it.)
|              */
|             Node *n;
| -           n = makeStringConstCast("now", @1, SystemTypeName("text"));
| -           $$ = makeTypeCast(n, SystemTypeName("date"), -1);
| +           n = makeStringConstCast("now", -1, SystemTypeName("text"));
| +           $$ = makeTypeCast(n, SystemTypeName("date"), @1);

Any suggestions? Attached is the complete patch.

As of 9.4dev, transformTypeCast passes the location retrieved from the source TypeCast node itself, or from its TypeName subnode, to the newly created CoerceViaIO node, but leaves that of the inner expression (the "now", I mean) as it is, so I think this fits what transformTypeCast does better than before. The query tree after the transformation looks like the following. I think this is preferable to -1 on the CoerceViaIO and 7 on the Const.

... {
  type = TargetEntry
  expr = {
    type = CoerceViaIO, location = 7,
    arg = { type = Const, location = -1 }
  }
}

In addition, the location stored by makeStringConstCast currently seems to have no route to the user's sight at any step out of the node -- in short, it is 'useless'. And yet the location of the CoerceViaIO is also not used...

| | CURRENT_DATE ..
| - n = makeStringConstCast("now", @1, SystemTypeName("text")
| + n = makeStringConstCast("nowe", @1, SystemTypeName("text")

yields only this, including logs:
| =# select CURRENT_DATE;
| ERROR:  invalid input syntax for type date: "nowe"

regards,

--
Kyotaro Horiguchi
NTT Open Source Software Center

diff --git a/src/backend/parser/gram.y b/src/backend/parser/gram.y
index 12a6beb..a14a381 100644
--- a/src/backend/parser/gram.y
+++ b/src/backend/parser/gram.y
@@ -11399,8 +11399,8 @@ func_expr_common_subexpr:
 				 * to rely on it.)
 				 */
 				Node *n;
-				n = makeStringConstCast("now", @1, SystemTypeName("text"));
-				$$ = makeTypeCast(n, SystemTypeName("date"), -1);
+				n = makeStringConstCast("now", -1, SystemTypeName("text"));
+				$$ = makeTypeCast(n, SystemTypeName("date"), @1);
 			}
 		| CURRENT_TIME
 			{
@@ -11409,8 +11409,8 @@ func_expr_common_subexpr:
 				 * See comments for CURRENT_DATE.
 				 */
 				Node *n;
-				n = makeStringConstCast("now", @1, SystemTypeName("text"));
-				$$ = makeTypeCast(n, SystemTypeName("timetz"), -1);
+				n = makeStringConstCast("now", -1, SystemTypeName("text"));
+				$$ = makeTypeCast(n, SystemTypeName("timetz"), @1);
 			}
 		| CURRENT_TIME '(' Iconst ')'
 			{
@@ -11420,10 +11420,10 @@ func_expr_common_subexpr:
 				 */
 				Node *n;
 				TypeName *d;
-				n = makeStringConstCast("now", @1, SystemTypeName("text"));
+				n = makeStringConstCast("now", -1, SystemTypeName("text"));
 				d = SystemTypeName("timetz");
 				d->typmods = list_make1(makeIntConst($3, @3));
-				$$ = makeTypeCast(n, d, -1);
+				$$ = makeTypeCast(n, d, @1);
 			}
 		| CURRENT_TIMESTAMP
 			{
@@ -11441,10 +11441,10 @@ func_expr_common_subexpr:
 				 */
 				Node *n;
 				TypeName *d;
-				n = makeStringConstCast("now", @1, SystemTypeName("text"));
+				n = makeStringConstCast("now", -1, SystemTypeName("text"));
 				d = SystemTypeName("timestamptz");
 				d->typmods = list_make1(makeIntConst($3, @3));
-				$$ = makeTypeCast(n, d, -1);
+				$$ = makeTypeCast(n, d, @1);
 			}
 		| LOCALTIME
 			{
@@ -11453,8 +11453,8 @@ func_expr_common_subexpr:
 				 * See comments for CURRENT_DATE.
 				 */
 				Node *n;
-				n = makeStringConstCast("now", @1, SystemTypeName("text"));
-				$$ = makeTypeCast((Node *)n, SystemTypeName("time"), -1);
+				n = makeStringConstCast("now", -1, SystemTypeName("text"));
+				$$ = makeTypeCast((Node *)n, SystemTypeName("time"), @1);
 			}
 		| LOCALTIME '(' Iconst ')'
 			{
@@ -11464,10 +11464,10 @@ func_expr_common_subexpr:
 				 */
 				Node *n;
 				TypeName *d;
-				n = makeStringConstCast("now", @1, SystemTypeName("text"));
+				n = makeStringConstCast("now", -1, SystemTypeName("text"));
 				d = SystemTypeName("time");
 				d->typmods = list_make1(makeIntConst($3, @3));
-				$$ = makeTypeCast((Node *)n, d, -1);
+				$$ = makeTypeCast((Node *)n, d, @1);
 			}
 		| LOCALTIMESTAMP
 			{
@@ -11476,8 +11476,8 @@ func_expr_common_subexpr:
 				 * See comments for CURRENT_DATE.
 				 */
 				Node *n;
-				n =
Re: [HACKERS] proposal: hide application_name from other users
On 01/21/2014 04:19 PM, Heikki Linnakangas wrote:
> On 01/21/2014 07:22 AM, Harold Giménez wrote:
>> This patch prevents non-superusers from viewing other users'
>> pg_stat_activity.application_name. [...]
>
> I don't find these arguments compelling enough to change it now. It's
> well documented that application_name is visible to everyone; just don't
> put sensitive information there. For those users who don't mind
> advertising application_name, the patch would be highly inconvenient.
> For example, the database owner could no longer see the application_name
> of other users connected to her database.

It also means that monitoring tools must run as superuser to see information they require, which to me is a total showstopper.

If you want control over the visibility of application_name, it should be done with a column privilege granted to a system role, or something like that -- so that the ability to see it can be given to public by default (thus not breaking backward compatibility) and, if it's revoked from public, given to the roles that need to see it.
--
Craig Ringer                   http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
Re: [HACKERS] WIP patch (v2) for updatable security barrier views
On 21 January 2014 01:09, KaiGai Kohei kai...@ak.jp.nec.com wrote:
> (2014/01/13 22:53), Craig Ringer wrote:
>> On 01/09/2014 11:19 PM, Tom Lane wrote:
>>> Dean Rasheed dean.a.rash...@gmail.com writes:
>>>> My first thought was that it should just preprocess any security
>>>> barrier quals in subquery_planner() in the same way as other quals are
>>>> preprocessed. But thinking about it further, those quals are destined
>>>> to become the quals of subqueries in the range table, so we don't
>>>> actually want to preprocess them at that stage --- that will happen
>>>> later when the new subquery is planned by recursion back into
>>>> subquery_planner(). So I think the right answer is to make
>>>> adjust_appendrel_attrs() handle recursion into sublink subqueries.
>>>
>>> TBH, this sounds like doubling down on a wrong design choice. I see no
>>> good reason that updatable security views should require any
>>> fundamental rearrangements of the order of operations in the planner.
>>
>> In that case, would you mind offering a quick sanity check on the
>> following alternative idea:
>>
>> - Add sourceRelation to Query. This refers to the RTE that supplies
>>   tuple projections to feed into ExecModifyTable, with appropriate
>>   resjunk ctid and (if requ'd) oid cols present.
>>
>> - When expanding a target view into a subquery, set sourceRelation on
>>   the outer view to the index of the RTE of the newly expanded subquery.
>
> I have the same opinion. The existing assumption -- that resultRelation
> is the RTE of the table to be read, not only modified -- makes the
> implementation complicated. If we had a separate sourceRelation, it
> would allow a flexible form including a subquery with security_barrier
> on the reader side.
>
> Could you tell me the direction you're inclined toward right now? I
> wonder whether I should take the latest patch submitted on 09-Jan for
> deep code review and testing.

Yes, please review the patch from 09-Jan
(http://www.postgresql.org/message-id/CAEZATCUiKxOg=voovja2s6g-sixzzxg18totggp8zobq6qn...@mail.gmail.com).
The idea behind that patch is to avoid a lot of the complication that leads to and then arises from having a separate sourceRelation in the query. If you go down the route of expanding the subquery in the rewriter and making it the sourceRelation, then you have to make extensive changes to preprocess_targetlist so that it recursively descends into the subquery to pull out variables needed by an update. Also you would probably need additional changes in the rewriter so that later stages didn't trample on the sourceRelation, and correctly updated it in the case of views on top of other views. Also, you would need to make changes to the inheritance planner code, because you'd now have 2 RTEs referring to the target relation (resultRelation and sourceRelation wrapping it in a subquery). Referring to the comment in inheritance_planner(): * Source inheritance is expanded at the bottom of the * plan tree (see allpaths.c), but target inheritance has to be expanded at * the top. except that in the case of the sourceRelation, it is actually effectively the target, which means it shouldn't be expanded, otherwise you'd get plans like the ones I complained about upthread (see the final explain output in http://www.postgresql.org/message-id/caezatcu3vcdkjpnhgfkrmrkz0axkcuh4ce_kqq3z2hzknhi...@mail.gmail.com). Perhaps all of that could be made to work, but I think that it would end up being a far more invasive patch than my 09-Jan patch. My patch avoids both those issues by not doing the subquery expansion until after inheritance expansion, and after targetlist preprocessing. Regards, Dean -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] GIN improvements part 1: additional information
On 01/21/2014 04:02 AM, Tomas Vondra wrote:
> On 20.1.2014 19:30, Heikki Linnakangas wrote:
>> Attached is yet another version, with more bugs fixed and more comments
>> added and updated. I would appreciate some heavy testing of this patch
>> now. If you could re-run the tests you've been using, that would be
>> great. I've tested the WAL replay by replicating GIN operations over
>> streaming replication. That doesn't guarantee it's correct, but it's a
>> good smoke test.
>
> I gave it a try - the OOM error seems to be gone, but now I get this:
>
>   PANIC: cannot insert duplicate items to GIN index page
>
> This only happens when building the index incrementally (i.e. using a
> sequence of INSERT statements into a table with a GIN index). When I
> create a new index on a table (already containing the same dataset) it
> works just fine. Also, I tried to reproduce the issue by running a
> simple plpgsql loop (instead of a complex python script):
>
> DO LANGUAGE plpgsql $$
> DECLARE
>     r tsvector;
> BEGIN
>     FOR r IN SELECT body_tsvector FROM data_table LOOP
>         INSERT INTO idx_table (body_tsvector) VALUES (r);
>     END LOOP;
> END$$;
>
> where data_table is the table with imported data (the same data I
> mentioned in the post about OOM errors), and idx_table is an empty table
> with a GIN index. And indeed it fails, but only if I run the block in
> multiple sessions in parallel.

Oh, I see what's going on. I had assumed that there cannot be duplicate insertions into the posting tree, but that's dead wrong. The fast insertion mechanism depends on a duplicate insertion doing nothing. Will fix, thanks for the testing!

- Heikki
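The invariant Heikki describes -- inserting an item that is already present must be a silent no-op rather than an error -- can be illustrated with a toy model (this is Python for illustration, not Postgres code; a posting "page" is modeled as a sorted list of (block, offset) item pointers, and `posting_insert` is an invented name):

```python
import bisect

def posting_insert(page, item):
    """Insert item keeping page sorted; treat a duplicate as a no-op."""
    i = bisect.bisect_left(page, item)
    if i < len(page) and page[i] == item:
        return False          # already present: silently do nothing
    page.insert(i, item)
    return True

page = []
posting_insert(page, (2, 5))
posting_insert(page, (1, 3))
print(posting_insert(page, (2, 5)))  # False: a concurrent re-insert is a no-op
print(page)                          # [(1, 3), (2, 5)]
```

The bug report corresponds to the duplicate case raising a PANIC instead of returning quietly.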
Re: [HACKERS] [PATCH] Negative Transition Aggregate Functions (WIP)
On Sun, Dec 15, 2013 at 2:00 PM, Tom Lane t...@sss.pgh.pa.us wrote:
> Greg Stark st...@mit.edu writes:
>> On 14 Dec 2013 15:40, Tom Lane t...@sss.pgh.pa.us wrote:
>>> I think you *can't* cover them for the float types; roundoff error
>>> would mean you don't get the same answers as before.
>>
>> I was going to say the same thing. But then I started to wonder: what's
>> so special about the answers we used to give? They are also subject to
>> round-off, and the results are already quite questionable in those cases.
>
> Well, we can't easily do better than the old answers, and the new ones
> might be arbitrarily worse. Example: sum or average across single-row
> windows ought to be exact in any case, but it might be arbitrarily wrong
> with the negative-transition technique. More generally, this is supposed
> to be a performance enhancement only; it's not supposed to change the
> results.

It came to me that it might be possible to implement inverse transitions for floating point aggregates by just detecting if precision has been lost during forward transitions. I've written the test to do this as:

IF state.value + value = state.value AND value > 0 THEN
    newstate.precision_lost := true;
    newstate.value := state.value;
ELSE
    newstate.precision_lost := false;
    newstate.value := state.value + value;
END IF;

The inverse transition function checks precision_lost and, if it's true, returns NULL. The core code is now implemented (thanks to Florian) to re-aggregate when NULL is returned from the inverse transition function. I've attached an implementation of this with the transition functions written in plpgsql.
I don't really know for sure yet if it can handle all cases and give the exact same results as it would without inverse transitions, but it certainly fixes the error case which was presented. Using the attached on HEAD of https://github.com/david-rowley/postgres/commits/invtrans:

explain (analyze, verbose) select mysum(v) over (order by i rows between current row and unbounded following) from (values(1,1e20),(2,1)) b(i,v);

gives me the expected results of 1e20 and 1, instead of my original attempt which gave 1e20 and 0.

I guess the extra tracking on forward transition might mean this would not be practical to implement in C for sum(float), but I just wanted to run the idea through a few heads to see if anyone can present a case where it can still produce wrong results. If it seems sound enough, then I may implement it in C to see how much overhead it adds to forward aggregation for floating point types. But even if it did add too much overhead to forward aggregation, it might be worth allowing aggregates to have 2 forward transition functions, and if the 2nd one exists then it could be used in window functions where the frame does not have unbounded following.

Any thoughts?
Regards

David Rowley

BEGIN WORK;

CREATE TYPE float_state AS (precision_lost bool, value float);

CREATE OR REPLACE FUNCTION float_sum(state float_state, value float)
RETURNS float_state AS $$
DECLARE newstate float_state;
BEGIN
    IF state IS NULL THEN
        IF value IS NULL THEN
            RETURN NULL;
        ELSE
            newstate.value := value;
            newstate.precision_lost := false;
            RETURN newstate;
        END IF;
    END IF;

    IF state.value + value = state.value AND value > 0 THEN
        newstate.precision_lost := true;
        newstate.value := state.value;
    ELSE
        newstate.precision_lost := false;
        newstate.value := state.value + value;
    END IF;
    RETURN newstate;
END;
$$ LANGUAGE plpgsql IMMUTABLE;

CREATE OR REPLACE FUNCTION float_sum_inv(state float_state, value float)
RETURNS float_state AS $$
DECLARE newstate float_state;
BEGIN
    IF state.precision_lost = true THEN
        RETURN NULL;
    ELSE
        newstate.value := state.value - value;
        RETURN newstate;
    END IF;
END;
$$ LANGUAGE plpgsql STRICT IMMUTABLE;

CREATE OR REPLACE FUNCTION float_sum_final(state float_state)
RETURNS float AS $$
BEGIN
    IF NOT (state IS NULL) THEN
        RETURN state.value;
    ELSE
        RETURN NULL;
    END IF;
END;
$$ LANGUAGE plpgsql IMMUTABLE;

CREATE AGGREGATE mysum (float)
(
    stype = float_state,
    sfunc = float_sum,
    invfunc = float_sum_inv,
    finalfunc = float_sum_final
);

select mysum(v) from (values(1,1e20),(2,1)) b(i,v);

-- forces re-aggregate due to precision loss
--explain (analyze, verbose) select mysum(v) over (order by i rows between current row and unbounded following) from (values(1,1e20),(2,1)) b(i,v);

-- does not force re-aggregate.
--explain (analyze, verbose) select mysum(v) over (order by i rows between current row and unbounded following) from (values(1,1),(2,2),(3,3)) b(i,v);

rollback;
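The detection rule above can be exercised outside the database, since Python floats are IEEE doubles like float8. A minimal sketch (`float_sum_step` is an invented name; unlike the posted plpgsql, the flag is kept sticky across steps here, which seems to be the intent):

```python
def float_sum_step(state, value):
    """Forward transition: state is (precision_lost, running_sum) or None."""
    if state is None:
        return (False, value)
    lost, acc = state
    if acc + value == acc and value > 0:
        return (True, acc)        # the addend vanished entirely: flag it
    return (lost, acc + value)    # note: flag kept sticky here

s = None
for v in (1e20, 1.0):
    s = float_sum_step(s, v)
print(s)  # (True, 1e+20): adding 1.0 to 1e20 was completely absorbed
```

With the flag set, the inverse transition would return NULL and force re-aggregation, which is exactly what repairs the 1e20/1 example.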
[HACKERS] alternative back-end block formats
Hi all, I'm playing around with Postgres, and I thought it might be fun to experiment with alternative formats for relation blocks, to see if I can get smaller files and/or faster server performance. Does anyone know if this has been done before with Postgres? I would have assumed yes, but I'm not finding anything in Google about people having done this. Thanks, Christian
Re: [HACKERS] [PATCH] Negative Transition Aggregate Functions (WIP)
On Jan 21, 2014, at 10:53, David Rowley dgrowle...@gmail.com wrote:
> It came to me that it might be possible to implement inverse transitions
> for floating point aggregates by just detecting if precision has been
> lost during forward transitions. I've written the test to do this as:
>
> IF state.value + value = state.value AND value > 0 THEN
>     newstate.precision_lost := true;
>     newstate.value := state.value;
> ELSE
>     newstate.precision_lost := false;
>     newstate.value := state.value + value;
> END IF;
>
> The inverse transition function checks precision_lost and, if it's true,
> returns NULL. The core code is now implemented (thanks to Florian) to
> re-aggregate when NULL is returned from the inverse transition function.

That's not sufficient, I fear. You can lose all significant digits of the value and still have precision_lost = false afterwards. Try summing over 1e16, 1.01.

SELECT 1e16::float8 + 1.01::float8 = 1e16::float8 returns FALSE, yet SELECT 1e16::float8 + 1.01::float8 - 1e16::float8 returns 2, where 1.01 would have been correct. That's still too much precision loss.

I'm quite certain that the general idea has merit, but the actual invertibility condition is going to be more involved. If you want to play with this, I think the first step has to be to find a set of guarantees that SUM(float) is supposed to meet. Currently, SUM(float) guarantees that if the summands all have the same sign, the error is bounded by C * N, where C is some (machine-specific?) constant (about 1e-15 or so), and N is the number of input rows. Or at least so I think from looking at SUMs over floats in descending order, which I guess is the worst case.

You could, for example, try to see if you can find an invertibility condition which guarantees the same but allows C to be larger. That would put a bound on the number of digits lost by the new SUM(float) compared to the old one.

I don't have high hopes for this getting into 9.4, though.
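Florian's counterexample can be checked directly, since Python floats have the same IEEE-double behavior as float8:

```python
# The ULP (spacing between representable doubles) at 1e16 is 2, so
# 1e16 + 1.01 rounds to 1e16 + 2: the sum visibly changes, yet almost
# all of the addend's precision is gone.
big, small = 1e16, 1.01
s = big + small
assert s != big        # so the "did the addend vanish?" test does NOT fire
print(s - big)         # 2.0 -- what an inverse transition would recover,
                       # instead of the correct 1.01
```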
> If it seems sound enough, then I may implement it in C to see how much
> overhead it adds to forward aggregation for floating point types. But
> even if it did add too much overhead to forward aggregation, it might
> be worth allowing aggregates to have 2 forward transition functions,
> and if the 2nd one exists then it could be used in window functions
> where the frame does not have unbounded following.

I don't think adding yet another type of aggregation function is the solution here.

best regards,
Florian Pflug
Re: [HACKERS] Add min and max execute statement time in pg_stat_statement
Rebased patch is attached. pg_stat_statements in PG9.4dev has already changed its view columns, so I hope this patch will be committed in PG9.4dev.

Regards,
--
Mitsumasa KONDO
NTT Open Source Software Center

*** a/contrib/pg_stat_statements/pg_stat_statements--1.1--1.2.sql
--- b/contrib/pg_stat_statements/pg_stat_statements--1.1--1.2.sql
*** 19,24 **** CREATE FUNCTION pg_stat_statements(
--- 19,27 ----
  	OUT query text,
  	OUT calls int8,
  	OUT total_time float8,
+ 	OUT min_time float8,
+ 	OUT max_time float8,
+ 	OUT stdev_time float8,
  	OUT rows int8,
  	OUT shared_blks_hit int8,
  	OUT shared_blks_read int8,
*** 41,43 **** CREATE VIEW pg_stat_statements AS
--- 44,52 ----
      SELECT * FROM pg_stat_statements();

  GRANT SELECT ON pg_stat_statements TO PUBLIC;
+
+ /* New Function */
+ CREATE FUNCTION pg_stat_statements_reset_time()
+ RETURNS void
+ AS 'MODULE_PATHNAME'
+ LANGUAGE C;

*** a/contrib/pg_stat_statements/pg_stat_statements--1.2.sql
--- b/contrib/pg_stat_statements/pg_stat_statements--1.2.sql
*** 9,14 **** RETURNS void
--- 9,19 ----
  AS 'MODULE_PATHNAME'
  LANGUAGE C;

+ CREATE FUNCTION pg_stat_statements_reset_time()
+ RETURNS void
+ AS 'MODULE_PATHNAME'
+ LANGUAGE C;
+
  CREATE FUNCTION pg_stat_statements(
  	OUT userid oid,
  	OUT dbid oid,
*** 16,21 **** CREATE FUNCTION pg_stat_statements(
--- 21,29 ----
  	OUT query text,
  	OUT calls int8,
  	OUT total_time float8,
+ 	OUT min_time float8,
+ 	OUT max_time float8,
+ 	OUT stdev_time float8,
  	OUT rows int8,
  	OUT shared_blks_hit int8,
  	OUT shared_blks_read int8,
*** 42,44 **** GRANT SELECT ON pg_stat_statements TO PUBLIC;
--- 50,53 ----
  -- Don't want this to be available to non-superusers.
  REVOKE ALL ON FUNCTION pg_stat_statements_reset() FROM PUBLIC;
+ REVOKE ALL ON FUNCTION pg_stat_statements_reset_time() FROM PUBLIC;

*** a/contrib/pg_stat_statements/pg_stat_statements--unpackaged--1.0.sql
--- b/contrib/pg_stat_statements/pg_stat_statements--unpackaged--1.0.sql
*** 4,8 ****
--- 4,9 ----
  \echo Use "CREATE EXTENSION pg_stat_statements" to load this file. \quit

  ALTER EXTENSION pg_stat_statements ADD function pg_stat_statements_reset();
+ ALTER EXTENSION pg_stat_statements ADD function pg_stat_statements_reset_time();
  ALTER EXTENSION pg_stat_statements ADD function pg_stat_statements();
  ALTER EXTENSION pg_stat_statements ADD view pg_stat_statements;

*** a/contrib/pg_stat_statements/pg_stat_statements.c
--- b/contrib/pg_stat_statements/pg_stat_statements.c
*** 78,83 **** static const uint32 PGSS_PG_MAJOR_VERSION = PG_VERSION_NUM / 100;
--- 78,84 ----
  #define USAGE_DECREASE_FACTOR	(0.99)	/* decreased every entry_dealloc */
  #define STICKY_DECREASE_FACTOR	(0.50)	/* factor for sticky entries */
  #define USAGE_DEALLOC_PERCENT	5		/* free this % of entries at once */
+ #define EXEC_TIME_INIT			(-1)	/* initial execution time */

  #define JUMBLE_SIZE				1024	/* query serialization buffer size */
*** 114,119 **** typedef struct Counters
--- 115,123 ----
  {
  	int64		calls;			/* # of times executed */
  	double		total_time;		/* total execution time, in msec */
+ 	double		total_sqtime;	/* cumulated square execution time, in msec */
+ 	double		min_time;		/* minimum execution time, in msec */
+ 	double		max_time;		/* maximum execution time, in msec */
  	int64		rows;			/* total # of retrieved or affected rows */
  	int64		shared_blks_hit;	/* # of shared buffer hits */
  	int64		shared_blks_read;	/* # of shared disk blocks read */
*** 237,245 **** void _PG_init(void);
--- 241,251 ----
  void		_PG_fini(void);

  Datum		pg_stat_statements_reset(PG_FUNCTION_ARGS);
+ Datum		pg_stat_statements_reset_time(PG_FUNCTION_ARGS);
  Datum		pg_stat_statements(PG_FUNCTION_ARGS);

  PG_FUNCTION_INFO_V1(pg_stat_statements_reset);
+ PG_FUNCTION_INFO_V1(pg_stat_statements_reset_time);
  PG_FUNCTION_INFO_V1(pg_stat_statements);

  static void pgss_shmem_startup(void);
*** 266,271 **** static pgssEntry *entry_alloc(pgssHashKey *key, const char *query,
--- 272,278 ----
  			  int query_len, bool sticky);
  static void entry_dealloc(void);
  static void entry_reset(void);
+ static void entry_reset_time(void);
  static void AppendJumble(pgssJumbleState *jstate,
  			 const unsigned char *item, Size size);
  static void JumbleQuery(pgssJumbleState *jstate, Query *query);
*** 1046,1051 **** pgss_store(const char *query, uint32 queryId,
--- 1053,1059 ----
  		e->counters.calls += 1;
  		e->counters.total_time += total_time;
+ 		e->counters.total_sqtime += total_time * total_time;
  		e->counters.rows += rows;
  		e->counters.shared_blks_hit += bufusage->shared_blks_hit;
  		e->counters.shared_blks_read += bufusage->shared_blks_read;
*** 1061,1066 **** pgss_store(const char *query, uint32
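The statistics the patch tracks per statement (calls, total_time, total_sqtime, min_time, max_time) let stdev be derived at read time as sqrt(E[t²] − E[t]²). A sketch of that derivation, in Python for illustration (`summarize` is an invented name; note that this textbook formula can suffer catastrophic cancellation when the mean is large relative to the variance):

```python
import math

def summarize(times):
    """Model of the per-statement counters and the stdev derived from them."""
    calls = len(times)
    total = sum(times)
    total_sq = sum(t * t for t in times)   # the patch's total_sqtime counter
    mean = total / calls
    stdev = math.sqrt(total_sq / calls - mean * mean)
    return {"calls": calls, "min": min(times), "max": max(times),
            "stdev": stdev}

print(summarize([1.0, 2.0, 3.0, 4.0])["stdev"])  # ~1.118 (population stdev)
```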
Re: [HACKERS] Fix memset usage in pgcrypto
On Mon, Jan 20, 2014 at 06:49:21PM -0300, Alvaro Herrera wrote:
> Marko Kreen escribió:
>> http://www.viva64.com/en/b/0227/ reported that on-stack memset()s
>> might be optimized away by compilers. Fix it.
>
> Just to clarify, this needs to be applied to all branches, right? If
> so, does the version submitted apply cleanly to all of them?

It does apply cleanly. It is not a critical fix, but it's simple, so I think it should be back-patched.

--
marko
Re: [HACKERS] proposal: hide application_name from other users
* Craig Ringer (cr...@2ndquadrant.com) wrote: It also means that monitoring tools must run as superuser to see information they require, which to me is a total showstopper. We've already got *far* too much of that going on for my taste. I'd love to see a comprehensive solution to this problem which allows monitoring systems to run w/o superuser privileges. If you want control over visibility of application_name, it should be done with a column privilege granted to a system role, or something like that - so the ability to see it can be given to public by default (thus not breaking BC) and, if it's revoked from public, given to roles that need to see it. I agree with this - individuals should be able to control access to this information for their databases/clusters. Thanks, Stephen
Re: [HACKERS] GIN improvements part 1: additional information
On Mon, Jan 20, 2014 at 10:30 PM, Heikki Linnakangas hlinnakan...@vmware.com wrote: On 01/17/2014 08:49 PM, Alexander Korotkov wrote: On Fri, Jan 17, 2014 at 10:38 PM, Heikki Linnakangas hlinnakan...@vmware.com wrote: On 01/17/2014 01:05 PM, Alexander Korotkov wrote: Seems to be fixed in the attached version of patch. Another improvement in this version: only the left-most modified segment and those after it are re-encoded. The left part of the page is left untouched. I'm looking into this now. A few quick notes: * Even when you don't re-encode the whole page, you still WAL-log the whole page. While correct, it'd be a pretty obvious optimization to only WAL-log the modified part. Oh, I overlooked it. I wrote code in ginRedoInsertData which finds the appropriate place for data but didn't notice that I still wrote the whole page to the xlog record. Yeah. I think ginRedoInsertData was too cute to be bug-free. I added an explicit unmodifiedsize field to the WAL record, so that ginRedoInsertData doesn't need to calculate it. * When new items are appended to the end of the page so that only the last existing compressed segment is re-encoded, and the page has to be split, the items aren't divided 50/50 on the pages. The loop that moves segments destined for the left page to the right won't move any existing, untouched, segments. I think this loop will move one segment. However, it's too small. Implemented this. I noticed that the gin vacuum redo routine is dead code, except for the data-leaf page handling, because we never remove entries or internal nodes (page deletion is a separate wal record type). And the data-leaf case is functionally equivalent to heap newpage records. I removed the dead code and made it more clear that it resembles heap newpage. Attached is yet another version, with more bugs fixed and more comments added and updated. I would appreciate some heavy testing of this patch now. If you could re-run the tests you've been using, that would be great. 
I've tested the WAL replay by replicating GIN operations over streaming replication. That doesn't guarantee it's correct, but it's a good smoke test. I tried my test suite, but it hangs in an infinite loop on an index scan. I re-tried it on my laptop with -O0. I found it to crash on update and vacuum in some random places like: Assert(GinPageIsData(page)); in xlogVacuumPage Assert(ndecoded == totalpacked); in ginCompressPostingList Trying to debug it. -- With best regards, Alexander Korotkov.
Re: [HACKERS] ALTER SYSTEM SET typos and fix for temporary file name management
On Mon, Jan 20, 2014 at 2:12 PM, Amit Kapila amit.kapil...@gmail.com wrote: On Sat, Jan 18, 2014 at 7:59 PM, Michael Paquier michael.paqu...@gmail.com wrote: Hi all, After going through commit 65d6e4c (introducing ALTER SYSTEM SET), I noticed a couple of typo mistakes as well as (I think) a weird way of using the temporary auto-configuration name postgresql.auto.conf.temp in two different places, resulting in the patch attached. .tmp suffix is used at couple other places in code as well. snapmgr.c XactExportFilePath(pathtmp, topXid, list_length(exportedSnapshots), .tmp); receivelog.c snprintf(tmppath, MAXPGPATH, %s.tmp, path); Although similar suffix is used at other places, but still I think it might be better to define for current case as here prefix (postgresql.auto.conf) is also fixed and chances of getting it changed are less. However even if we don't do, it might be okay as we are using such suffixes at other places as well. In the case of snapmgr.c and receivelog.c, the operations are kept local, so it does not matter much if this way of naming is done as all the operations for a temporary file are kept within the same, isolated function. However, the case of postgresql.auto.conf.temp is different: this temporary file name is used as well for a check in basebackup.c, so I imagine that it would be better to centralize this information in a single place. Now this name is only used in two places, but who knows if some additional checks here are there won't be needed for this temp file... postgresql.auto.conf.temp is also located at the root of PGDATA, so it might be surprising for the user to find it even if there are few chances that it can happen. It might be an overkill to use a dedicated variable for the temporary autoconfiguration file (here PG_AUTOCONF_FILENAME_TEMP), but I found kind of weird the way things are currently done on master branch. IMO, it would reduce bug possibilities to have everything managed with a single variable. 
Typos at least should be fixed :) - appendStringInfoString(buf, # Do not edit this file manually! \n); - appendStringInfoString(buf, # It will be overwritten by ALTER SYSTEM command. \n); + appendStringInfoString(buf, # Do not edit this file manually!\n); + appendStringInfoString(buf, # It will be overwritten by ALTER SYSTEM command.\n); Same change in initdb.c is missing. Thanks, I missed it. -- Michael commit a8cf33deb3c46c4f56f1e617bbd3142cbbd66f1b Author: Michael Paquier mich...@otacoo.com Date: Tue Jan 21 21:44:30 2014 +0900 Fix typos and temporary file management in ALTER SYSTEM Using a temporary file uniquely defined in a single place reduces the occurrence of bugs related to it. Some typos are fixed at the same time. diff --git a/src/backend/replication/basebackup.c b/src/backend/replication/basebackup.c index fc35f5b..a31bcdf 100644 --- a/src/backend/replication/basebackup.c +++ b/src/backend/replication/basebackup.c @@ -834,11 +834,10 @@ sendDir(char *path, int basepathlen, bool sizeonly, List *tablespaces) /* skip auto conf temporary file */ if (strncmp(de->d_name, - PG_AUTOCONF_FILENAME .temp, - sizeof(PG_AUTOCONF_FILENAME) + 4) == 0) + PG_AUTOCONF_FILENAME_TEMP, + sizeof(PG_AUTOCONF_FILENAME_TEMP)) == 0) continue; - /* * If there's a backup_label file, it belongs to a backup started by * the user with pg_start_backup(). It is *not* correct for this diff --git a/src/backend/utils/misc/guc.c b/src/backend/utils/misc/guc.c index 1217098..7da52bf 100644 --- a/src/backend/utils/misc/guc.c +++ b/src/backend/utils/misc/guc.c @@ -6457,9 +6457,9 @@ flatten_set_variable_args(const char *name, List *args) } /* - * Writes updated configuration parameter values into - * postgresql.auto.conf.temp file. It traverses the list of parameters - * and quote the string values before writing them to temporaray file. + * Write updated configuration parameter values into + * file PG_AUTOCONF_FILENAME_TEMP. 
It traverses the list of parameters + * and quotes the string values before writing them to temporary file. */ static void write_auto_conf_file(int fd, const char *filename, ConfigVariable **head_p) @@ -6468,8 +6468,8 @@ write_auto_conf_file(int fd, const char *filename, ConfigVariable **head_p) StringInfoData buf; initStringInfo(buf); - appendStringInfoString(buf, # Do not edit this file manually! \n); - appendStringInfoString(buf, # It will be overwritten by ALTER SYSTEM command. \n); + appendStringInfoString(buf, # Do not edit this file manually!\n); + appendStringInfoString(buf, # It will be overwritten by ALTER SYSTEM command.\n); /* * write the file header message before contents, so that if there is no @@ -6596,7 +6596,7 @@ replace_auto_config_value(ConfigVariable **head_p, ConfigVariable **tail_p, * * The configuration parameters are written to a temporary * file then renamed to the final name. The template for the - * temporary file is
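The pattern this patch centralizes - write the new contents to a uniquely named temporary file, then rename() it over the final name - is what makes the rewrite crash-safe, since rename() replaces the target atomically on POSIX filesystems. A minimal sketch of the idea (the paths and header text are illustrative; a production version would also fsync the file and its directory):

```c
#include <stdio.h>
#include <string.h>

/* Atomically replace "path" with new contents: write everything to
 * "path.temp" first, then rename() it into place.  Readers see either
 * the complete old file or the complete new file, never a torn write. */
static int write_file_atomic(const char *path, const char *contents)
{
    char tmppath[1024];
    FILE *fp;

    snprintf(tmppath, sizeof(tmppath), "%s.temp", path);
    if ((fp = fopen(tmppath, "w")) == NULL)
        return -1;
    fputs("# Do not edit this file manually!\n", fp);
    fputs(contents, fp);
    if (fclose(fp) != 0)        /* flush; a real version would fsync too */
        return -1;
    return rename(tmppath, path);
}
```

Centralizing the ".temp" suffix in one constant (as the patch does with PG_AUTOCONF_FILENAME_TEMP) keeps the writer and the basebackup skip-check from silently diverging.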
Re: [HACKERS] GIN improvements part 1: additional information
On Tue, Jan 21, 2014 at 4:28 PM, Alexander Korotkov aekorot...@gmail.com wrote: I noticed that the gin vacuum redo routine is dead code, except for the data-leaf page handling, because we never remove entries or internal nodes (page deletion is a separate wal record type). And the data-leaf case is functionally equivalent to heap newpage records. I removed the dead code and made it more clear that it resembles heap newpage. Attached is yet another version, with more bugs fixed and more comments added and updated. I would appreciate some heavy testing of this patch now. If you could re-run the tests you've been using, that would be great. I've tested the WAL replay by replicating GIN operations over streaming replication. That doesn't guarantee it's correct, but it's a good smoke test. I tried my test suite, but it hangs in an infinite loop on an index scan. I re-tried it on my laptop with -O0. I found it to crash on update and vacuum in some random places like: Assert(GinPageIsData(page)); in xlogVacuumPage Assert(ndecoded == totalpacked); in ginCompressPostingList Trying to debug it. Another question is about dataPlaceToPageLeaf: while ((Pointer) seg < segend) { if (ginCompareItemPointers(minNewItem, seg->first) < 0) break; Shouldn't we adjust seg to the previous segment? If minNewItem is less than seg->first, we should insert it into the previous segment. -- With best regards, Alexander Korotkov.
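For readers following the thread: the compressed posting-list segments being debugged here store sorted item pointers as varbyte-encoded deltas. A simplified sketch of that encoding over plain 64-bit keys (the real GIN segment format encodes 6-byte ItemPointers and differs in detail):

```c
#include <stddef.h>
#include <stdint.h>

/* Varbyte-encode the deltas of a sorted uint64 array: each delta is
 * emitted 7 bits at a time, low bits first, with the high bit of a
 * byte meaning "more bytes follow".  Small deltas - the common case
 * in a sorted posting list - take a single byte. */
static size_t encode_deltas(const uint64_t *items, size_t n, uint8_t *out)
{
    size_t len = 0;
    uint64_t prev = 0;

    for (size_t i = 0; i < n; i++)
    {
        uint64_t delta = items[i] - prev;

        prev = items[i];
        while (delta >= 0x80)
        {
            out[len++] = (uint8_t) (delta | 0x80);
            delta >>= 7;
        }
        out[len++] = (uint8_t) delta;
    }
    return len;
}

/* Decode back to absolute values; returns the number of items. */
static size_t decode_deltas(const uint8_t *in, size_t len, uint64_t *items)
{
    size_t n = 0, pos = 0;
    uint64_t prev = 0;

    while (pos < len)
    {
        uint64_t delta = 0;
        int shift = 0;

        while (in[pos] & 0x80)
        {
            delta |= (uint64_t) (in[pos++] & 0x7F) << shift;
            shift += 7;
        }
        delta |= (uint64_t) in[pos++] << shift;
        prev += delta;
        items[n++] = prev;
    }
    return n;
}
```

Because each segment stores deltas from its first item, an item smaller than seg->first cannot be appended to seg - which is exactly why the question above about stepping back to the previous segment matters.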
Re: [HACKERS] [bug fix] pg_ctl always uses the same event source
From: Amit Kapila amit.kapil...@gmail.com How about below message: event source event_source_name is already registered. Thanks. I'll use this, making the initial letter the capital E like other messages in the same source file. I'm going to submit the final patch and update the CommitFest entry tomorrow at the earliest. Today, I reviewed the patch again and found it okay, except a small inconsistency which is about default event source name in postgresql.conf, all other places it's changed except in .conf file. Do you think it makes sense to change there as well? Oh, I missed it. postgresql.conf.sample says: # The commented-out settings shown in this file represent the default values. To follow this, we have the line as: #event_source = 'PostgreSQL 9.4' But this requires us to change this line for each major release. That's a maintenance headache. Another idea is: #event_source = 'PostgreSQL major_release' But this is not the exact default value. So I want to leave the line as now. Anyway, some other lines in postgresql.conf.sample do not represent the default value, either,: #data_directory = 'ConfigDir' # use data in another directory #max_stack_depth = 2MB # min 100kB #include_if_exists = 'exists.conf' # include file only if it exists #include = 'special.conf' # include file Regards MauMau -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] proposal: hide application_name from other users
Stephen Frost wrote: * Craig Ringer (cr...@2ndquadrant.com) wrote: It also means that monitoring tools must run as superuser to see information they require, which to me is a total showstopper. We've already got *far* too much of that going on for my taste. I'd love to see a comprehensive solution to this problem which allows monitoring systems to run w/o superuser privileges. Yeah, we need a CAP_MONITOR capability thingy. (CAP_BACKUP would be great too -- essentially read only but read everything) -- Álvaro Herrera http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services
Re: [HACKERS] Add %z support to elog/ereport?
On Fri, Jan 17, 2014 at 8:36 PM, Tom Lane t...@sss.pgh.pa.us wrote: I wrote: Andres Freund and...@2ndquadrant.com writes: On 2014-01-17 13:50:08 -0500, Tom Lane wrote: I think this approach is fundamentally broken, because it can't reasonably cope with any case more complicated than %zu or %zd. Am I just too tired, or am I not getting how INT64_FORMAT currently allows the arguments to be used positionally? It doesn't, which is one of the reasons for not allowing it in translatable strings (the other being lack of standardization of the strings that would be subject to translation). On second thought, that answer was too glib. There's no need for %n$ in the format strings *in the source code*, so INT64_FORMAT isn't getting in the way from that perspective. However, expand_fmt_string() is necessarily applied to formats *after* they've been through gettext(), so it has to expect that it might see %n$ in the now-translated strings. It's still the case that we need a policy against INT64_FORMAT in translatable strings, though, because what's passed to the gettext mechanism has to be a fixed string, not something with platform dependencies in it. Or so I would assume, anyway. Well, that's kinda problematic, isn't it? Printing the variable into a static buffer so that it can then be formatted with %s is pretty pessimal for a message that we might not even be planning to emit. Perhaps we should jettison entirely the idea of using the operating system's built-in sprintf and use one of our own that has all of the nice widgets we need, like a format code that's guaranteed to be right for uint64 and one that's guaranteed to be right for Size. This could turn out to be a bad idea if the best sprintf we can write is much slower than the native sprintf on any common platforms ... and maybe it wouldn't play nice with GCC's desire to check format strings. But what we're doing now is a real nuisance, too. 
-- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Re: [HACKERS] change alter user to be a true alias for alter role
Jov am...@amutu.com wrote: attach patch add the per database set for alter user Please add to the open CommitFest so that the patch doesn't fall through the cracks: https://commitfest.postgresql.org/action/commitfest_view/open -- Kevin Grittner EDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Why conf.d should be default, and auto.conf and recovery.conf should be in it
On Fri, Jan 17, 2014 at 12:07 AM, Amit Kapila amit.kapil...@gmail.com wrote: On Fri, Jan 17, 2014 at 12:16 AM, Tom Lane t...@sss.pgh.pa.us wrote: PS: off topic, but isn't ParseConfigDirectory leaking the result of AbsoluteConfigLocation? In both normal and error paths? Yes, I also think it leaks in both cases and similar leak is present in ParseConfigFile(). I have tried to fix both of these leaks with attached patch. Committed and back-patched to 9.3. While reviewing, I noted that the skipping missing configuration file message in ParseConfigFile() uses an elevel of LOG, while the other messages in the same file use elevel. I'm thinking that's a bug. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] [patch] Potential relcache leak in get_object_address_attribute
On Sat, Jan 18, 2014 at 7:14 PM, Marti Raudsepp ma...@juffo.org wrote: It looks like the get_object_address_attribute() function has a potential relcache leak. When missing_ok=true, the relation is found but attribute is not, then the relation isn't closed, nor is it returned to the caller. I couldn't figure out any ways to trigger this, but it's best to fix anyway. I agree. Committed, thanks for the patch. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Add %z support to elog/ereport?
Robert Haas escribió: On Fri, Jan 17, 2014 at 8:36 PM, Tom Lane t...@sss.pgh.pa.us wrote: I wrote: Andres Freund and...@2ndquadrant.com writes: On 2014-01-17 13:50:08 -0500, Tom Lane wrote: Am I just too tired, or am I not getting how INT64_FORMAT currently allows the arguments to be used positionally? It doesn't, which is one of the reasons for not allowing it in translatable strings (the other being lack of standardization of the strings that would be subject to translation). On second thought, that answer was too glib. There's no need for %n$ in the format strings *in the source code*, so INT64_FORMAT isn't getting in the way from that perspective. However, expand_fmt_string() is necessarily applied to formats *after* they've been through gettext(), so it has to expect that it might see %n$ in the now-translated strings. How difficult would it be to have expand_fmt_string deal with positional modifiers? I don't think we need anything from it other than the %n$ notation, so perhaps it's not so problematic. Perhaps we should jettison entirely the idea of using the operating system's built-in sprintf and use one of our own that has all of the nice widgets we need, like a format code that's guaranteed to be right for uint64 and one that's guaranteed to be right for Size. This could turn out to be a bad idea if the best sprintf we can write is much slower than the native sprintf on any common platforms ... and maybe it wouldn't play nice with GCC's desire to check format strings. But what we're doing now is a real nuisance, too. Maybe we can use our own implementation if the system's doesn't support %z. It's present in glibc 2.1 at least, and it's part of the 2004 edition of POSIX:2001. http://pubs.opengroup.org/onlinepubs/009695399/functions/sprintf.html (It is not present in SUSv2 (1997), and I wasn't able to find the original POSIX:2001 version.) 
-- Álvaro Herrera http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services
Re: [HACKERS] Add %z support to elog/ereport?
On 2014-01-21 12:11:23 -0300, Alvaro Herrera wrote: How difficult would it be to have expand_fmt_string deal with positional modifiers? I don't think we need anything from it other than the %n$ notation, so perhaps it's not so problematic. I don't think there's much reason to go there. I didn't go for the pg-supplied sprintf() because I thought it'd be considered too invasive. Since that's apparently not the case... Perhaps we should jettison entirely the idea of using the operating system's built-in sprintf and use one of our own that has all of the nice widgets we need, like a format code that's guaranteed to be right for uint64 and one that's guaranteed to be right for Size. This could turn out to be a bad idea if the best sprintf we can write is much slower than the native sprintf on any common platforms ... and maybe it wouldn't play nice with GCC's desire to check format strings. But what we're doing now is a real nuisance, too. Maybe we can use our own implementation if the system's doesn't support %z. It's present in glibc 2.1 at least, and it's part of the 2004 edition of POSIX:2001. http://pubs.opengroup.org/onlinepubs/009695399/functions/sprintf.html Yea, I have a patch for that, will send it as soon as some other stuff is finished. Greetings, Andres Freund -- Andres Freund http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services
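For context, the %z length modifier under discussion is C99's portable way to print a size_t (e.g. %zu), which avoids both casts to a fixed-width type and platform-dependent macros like INT64_FORMAT in format strings. A quick illustration, independent of the elog/ereport machinery:

```c
#include <stdio.h>

/* C99's %zu prints a size_t correctly whether size_t is 32 or 64 bits
 * on the platform.  Without it, code must cast to unsigned long (lossy
 * on LLP64 Windows) or splice in a platform-dependent format macro -
 * the problem being discussed for translatable strings. */
static void format_size(char *buf, size_t buflen, size_t nbytes)
{
    snprintf(buf, buflen, "allocated %zu bytes", nbytes);
}
```

The thread's concern is that some platforms' native sprintf predates C99 and ignores %z, which is why a PostgreSQL-supplied sprintf fallback is being proposed.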
Re: [HACKERS] better atomics - v0.2
On Sun, Jan 19, 2014 at 2:43 PM, Marti Raudsepp ma...@juffo.org wrote: On Tue, Nov 19, 2013 at 6:38 PM, Andres Freund and...@2ndquadrant.com wrote: The attached patches compile and make check successfully on linux x86, amd64 and msvc x86 and amd64. This time an updated configure is included. The majority of operations still rely on CAS emulation. Note that the atomics-generic-acc.h file has ® characters in the Latin-1 encoding (Intel ® Itanium ®). If you have to use weird characters, I think you should make sure they're UTF-8 I think we're better off avoiding them altogether. What's wrong with (R)? -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Re: [HACKERS] better atomics - v0.2
On 2014-01-21 10:20:35 -0500, Robert Haas wrote: On Sun, Jan 19, 2014 at 2:43 PM, Marti Raudsepp ma...@juffo.org wrote: On Tue, Nov 19, 2013 at 6:38 PM, Andres Freund and...@2ndquadrant.com wrote: The attached patches compile and make check successfully on linux x86, amd64 and msvc x86 and amd64. This time and updated configure is included. The majority of operations still rely on CAS emulation. Note that the atomics-generic-acc.h file has ® characters in the Latin-1 encoding (Intel ® Itanium ®). If you have to use weird characters, I think you should make sure they're UTF-8 I think we're better off avoiding them altogether. What's wrong with (R)? Nothing at all. That was just the copied title from the pdf, it's gone now. Greetings, Andres Freund -- Andres Freund http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] improve the help message about psql -F
On Mon, Jan 20, 2014 at 2:04 PM, Jov am...@amutu.com wrote: reasonable, I removed the set, patch attached. Hi Jov, A new commitfest was just opened, due on 2014-06. Please add your patch here: https://commitfest.postgresql.org/action/commitfest_view?id=22 (You'll need a community account if you don't already have one) Sometimes simple fixes like yours are merged outside a CommitFest, but adding it there makes sure it won't get lost. Regards, Marti
Re: [HACKERS] proposal: hide application_name from other users
Stephen Frost sfr...@snowman.net writes: * Craig Ringer (cr...@2ndquadrant.com) wrote: If you want control over visibility of application_name, it should be done with a column privilige granted to a system role, or something like that - so the ability to see it can be given to public on default (thus not breaking BC) and if it's revoked from public, given to roles that need to see it. I agree with this- individuals should be able to control access to this information for their databases/clusters. I think that'd be much more complexity than the case justifies. The argument that application_name might contain sensitive information seems ludicrously weak to me: whatever a client is exposing as application_name is its own darn choice. If you don't like it, go fix the client. If there is some client library that sets application_name without allowing the choice to be overridden, then that's a problem with that library, not with the server's behavior. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Hard limit on WAL space used (because PANIC sucks)
On 6 June 2013 16:00, Heikki Linnakangas hlinnakan...@vmware.com wrote: In the Redesigning checkpoint_segments thread, many people opined that there should be a hard limit on the amount of disk space used for WAL: http://www.postgresql.org/message-id/CA+TgmoaOkgZb5YsmQeMg8ZVqWMtR=6s4-ppd+6jiy4oq78i...@mail.gmail.com. I'm starting a new thread on that, because that's mostly orthogonal to redesigning checkpoint_segments. The current situation is that if you run out of disk space while writing WAL, you get a PANIC, and the server shuts down. That's awful. We can try to avoid that by checkpointing early enough, so that we can remove old WAL segments to make room for new ones before you run out, but unless we somehow throttle or stop new WAL insertions, it's always going to be possible to use up all disk space. A typical scenario where that happens is when archive_command fails for some reason; even a checkpoint can't remove old, unarchived segments in that case. But it can happen even without WAL archiving. I don't see we need to prevent WAL insertions when the disk fills. We still have the whole of wal_buffers to use up. When that is full, we will prevent further WAL insertions because we will be holding the WALwritelock to clear more space. So the rest of the system will lock up nicely, like we want, apart from read-only transactions. Instead of PANICing, we should simply signal the checkpointer to perform a shutdown checkpoint. That normally requires a WAL insertion to complete, but it seems easy enough to make that happen by simply rewriting the control file, after which ALL WAL files are superfluous for crash recovery and can be deleted. Once that checkpoint is complete, we can begin deleting WAL files that are archived/replicated and continue as normal. The previously failing WAL write can now be made again and may succeed this time - if it does, we continue, if not - now we PANIC. Note that this would not require in-progress transactions to be aborted. 
They can continue normally once wal_buffers re-opens. We don't really want anything too drastic, because if this situation happens once it may happen many times - I'm imagining a flaky network etc.. So we want the situation to recover quickly and easily, without too many consequences. The above appears to be very minimal change from existing code and doesn't introduce lots of new points of breakage. I've seen a case, where it was even worse than a PANIC and shutdown. pg_xlog was on a separate partition that had nothing else on it. The partition filled up, and the system shut down with a PANIC. Because there was no space left, it could not even write the checkpoint after recovery, and thus refused to start up again. There was nothing else on the partition that you could delete to make space. The only recourse would've been to add more disk space to the partition (impossible), or manually delete an old WAL file that was not needed to recover from the latest checkpoint (scary). Fortunately this was a test system, so we just deleted everything. Doing shutdown checkpoints via the control file would exactly solve that issue. We already depend upon the readability of the control file anyway, so this changes nothing. (And if you regard it does, then we can have multiple control files, or at least a backup control file at shutdown). We can make the shutdown checkpoint happen always at EOF of a WAL segment, so at shutdown we don't need any WAL files to remain at all. So we need to somehow stop new WAL insertions from happening, before it's too late. I don't think we do. What might be sensible is to have checkpoints speed up as WAL volume approaches a predefined limit, so that we minimise the delay caused when wal_buffers locks up. Not suggesting anything here for 9.4, since we're midCF. 
-- Simon Riggs http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services
Re: [HACKERS] proposal: hide application_name from other users
On Tue, Jan 21, 2014 at 2:41 PM, Alvaro Herrera alvhe...@2ndquadrant.com wrote: Stephen Frost wrote: * Craig Ringer (cr...@2ndquadrant.com) wrote: It also means that monitoring tools must run as superuser to see information they require, which to me is a total showstopper. We've already got *far* too much of that going on for my taste. I'd love to see a comprehensive solution to this problem which allows monitoring systems to run w/o superuser privileges. Yeah, we need a CAP_MONITOR capability thingy. (CAP_BACKUP would be great too -- essentially read only but read everything) Isn't CAP_BACKUP pretty much the REPLICATION privilege? -- Magnus Hagander Me: http://www.hagander.net/ Work: http://www.redpill-linpro.com/
Re: [HACKERS] proposal: hide application_name from other users
* Magnus Hagander (mag...@hagander.net) wrote: Isn't CAP_BACKUP pretty much the REPLICATION privilege? Not unless we change it to allow read-access to all tables to allow for pg_dump to work... Thanks, Stephen signature.asc Description: Digital signature
Re: [HACKERS] currawong is not a happy animal
On Sat, Jan 18, 2014 at 11:54 AM, Tom Lane t...@sss.pgh.pa.us wrote: Amit Kapila amit.kapil...@gmail.com writes: Is there any specific reason why adding PGDLLIMPORT for exporting const variables is not good, when we are already doing for other variables like InterruptHoldoffCount, CritSectionCount? On searching in code, I found that for few const variables we do extern PGDLLIMPORT. For example: extern PGDLLIMPORT const int NumScanKeywords; OK, that one is a direct counterexample to my worry. I'm still unconvinced that having a contrib module checking stuff based on the size of a struct it's not supposed to access represents a sane division of responsibilities; but let's fix the build problem now and then Robert can contemplate that question at his leisure. Exposing the whole struct because of that minor detail seemed really ugly to me, and I feel strongly that we should avoid it. In the intended use of this code, I expect queues to be some number of kilobytes on the low end up to some number of megabytes on the high end, so the fact that it can't be less than 56 bytes or so is sort of a meaningless blip on the radar. So one thing I thought about is just defining the minimum size as something conservative, like 1024. But for testing purposes, it was certainly useful to create the smallest possible queue, because that stresses the synchronization code to the maximum degree possible. So on the whole I'm still happy with how I did this. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Custom collations, collation modifiers (eg case-insensitive)
Craig Ringer wrote: (I intensely dislike the idea of ignoring accents, but it's something some people appear to need/want, and is supported by other vendors). FWIW at least in Spanish, users always want to search for entries ignoring accents. Nowadays I think most turn to unaccent(), but before that was written people resorted to calling transform() on both the stored data and the user-entered query, so that all pertinent matches would be found. People would even create indexes on the unaccented data to speed this up. It's an important feature, not a corner case by any means. Not sure about other languages. For instance, perhaps in French they would be interested in ignoring some accents but not others. -- Álvaro Herrera http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services
Re: [HACKERS] proposal: hide application_name from other users
On Tue, Jan 21, 2014 at 5:18 PM, Stephen Frost sfr...@snowman.net wrote: * Magnus Hagander (mag...@hagander.net) wrote: Isn't CAP_BACKUP pretty much the REPLICATION privilege? Not unless we change it to allow read-access to all tables to allow for pg_dump to work... That sounds more like CAP_DUMP than CAP_BACKUP :) -- Magnus Hagander Me: http://www.hagander.net/ Work: http://www.redpill-linpro.com/
Re: [HACKERS] Re: [Lsf-pc] Linux kernel impact on PostgreSQL performance (summary v2 2014-1-17)
On Fri, Jan 17, 2014 at 04:31:48PM +, Mel Gorman wrote: NUMA Optimisations -- The primary one that showed up was zone_reclaim_mode. Enabling that parameter is a disaster for many workloads and apparently Postgres is one. It might be time to revisit leaving that thing disabled by default and explicitly requiring that NUMA-aware workloads that are correctly partitioned enable it. Otherwise NUMA considerations are not that much of a concern right now. Here is a blog post about our zone_reclaim_mode-disable recommendations: http://frosty-postgres.blogspot.com/2012/08/postgresql-numa-and-zone-reclaim-mode.html Direct IO, buffered IO, double buffering and wishlists -- 6. Only writeback pages if explicitly synced. Postgres has strict write ordering requirements. In the words of Tom Lane -- As things currently stand, we dirty the page in our internal buffers, and we don't write it to the kernel until we've written and fsync'd the WAL data that needs to get to disk first. mmap() would avoid double buffering but it has no control over the write ordering, which is a show-stopper. As Andres Freund described: What was not explicitly stated here is that the Postgres design is taking advantage of the double-buffering feature here and writing to a memory copy of the page while there is still an unmodified copy in the kernel cache, or on disk. In the case of a crash, we rely on the fact that the disk page is unchanged. Certainly any design that requires the kernel to manage two different copies of the same page is going to be confusing. One larger question is how many of these things that Postgres needs are needed by other applications? I doubt Postgres is large enough to warrant changes on its own. -- Bruce Momjian br...@momjian.us http://momjian.us EnterpriseDB http://enterprisedb.com + Everyone has their own god. + -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] dynamic shared memory and locks
On Mon, Jan 20, 2014 at 11:23 PM, KaiGai Kohei kai...@ak.jp.nec.com wrote: I briefly checked the patch. Most of lines are mechanical replacement from LWLockId to LWLock *, and compiler didn't claim anything with -Wall -Werror option. My concern is around LWLockTranche mechanism. Isn't it too complicated towards the purpose? For example, it may make sense to add char lockname[NAMEDATALEN]; at the tail of LWLock structure if LOCK_DEBUG is enabled. Probably, it also adds an argument of LWLockAssign() to gives the human readable name. Is the additional 64bytes (= NAMEDATALEN) per lock too large for recent hardware? Well, we'd need it when either LOCK_DEBUG was defined or when LWLOCK_STATS was defined or when --enable-dtrace was used, and while the first two are probably rarely enough used that that would be OK, I think the third case is probably fairly common, and I don't think we want to have such a potentially performance-critical difference between builds with and without dtrace. Also... yeah, it's a lot of memory. If we add an additional 64 bytes to the structure, then we're looking at 96 bytes per lwlock instead of 32, after padding out to a 32-byte boundary to avoid crossing cache lines. We need 2 lwlocks per buffer, so that's an additional 128 bytes per 8kB buffer. For shared_buffers = 8GB, that's an extra 128MB for storing lwlocks. I'm not willing to assume nobody cares about that. And while I agree that this is a bit complex, I don't think it's really as bad as all that. We've gotten by for a long time without people being able to put lwlocks in other parts of memory, and most extension authors have gotten by with LWLockAssign() just fine and can continue doing things that way; only users with particularly sophisticated needs should bother to worry about the tranche stuff. 
One idea I just had is to improve the dsm_toc module so that it can optionally set up a tranche of lwlocks for you, and provide some analogues of RequestAddinLWLocks and LWLockAssign for that case. That would probably make this quite a bit simpler to use, at least for people using it with dynamic shared memory. But I think that's a separate patch. Below is minor comments. It seems to me this newer form increased direct reference to the MainLWLockArray. Is it really graceful? My recommendation is to define a macro that wraps the reference to MainLWLockArray. For example: #define PartitionLock(i) \ (MainLWLockArray[LOCK_MANAGER_LWLOCK_OFFSET + (i)].lock) Yeah, that's probably a good idea. This chunk is enclosed by #ifdef EXEC_BACKEND, the legacy LWLockArray was not removed. Oops, will fix. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] proposal: hide application_name from other users
* Magnus Hagander (mag...@hagander.net) wrote: On Tue, Jan 21, 2014 at 5:18 PM, Stephen Frost sfr...@snowman.net wrote: Not unless we change it to allow read-access to all tables to allow for pg_dump to work... That sounds more like CAP_DUMP than CAP_BACKUP :) Well, perhaps CAP_READONLY (or READALL?), there are auditor-type roles which could be reduced to that level instead of superuser. I'm on the fence about whether this needs to be separate from REPLICATION though- how many different such options are we going to have and how ugly is it going to get to litter the code with if(superuser || read-only || ...)? Perhaps a way to say this role has X-privilege on all objects of this type which could then be used to GRANT SELECT and would be a single point where we need to add those checks (in the ACL code for each object type)? One of the key points would be that the privilege apply to newly created objects as well.. Thanks, Stephen
Re: [HACKERS] Closing commitfest 2013-11
On Mon, Jan 20, 2014 at 4:31 PM, Tom Lane t...@sss.pgh.pa.us wrote: Alvaro Herrera alvhe...@2ndquadrant.com writes: With apologies to our beloved commitfest-mace-wielding CFM, commitfest 2013-11 intentionally still contains a few open patches. I think that CF is largely being ignored by most people now that we have CF 2014-01 in progress. If we don't want to do anything about these patches in the immediate future, I propose we move them to CF 2014-01. I think the idea was that patch authors should take responsibility for pushing their patches forward to 2014-01 if they still wanted them considered. Quite a few patches already were moved that way, IIRC. Agreed on that general theory. And, also, yeah, the shared memory message queueing stuff got committed. Sorry, I missed the fact that there was still an open CF entry for that; I assumed that it would have been marked Returned with Feedback. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Add %z support to elog/ereport?
Andres Freund and...@2ndquadrant.com writes: On 2014-01-21 12:11:23 -0300, Alvaro Herrera wrote: How difficult would it be to have expand_fmt_string deal with positional modifiers? I don't think we need anything from it other than the %n$ notation, so perhaps it's not so problematic. I don't think there's much reason to go there. I didn't go for the pg-supplied sprintf() because I thought it'd be considered too invasive. Since that's apparently not the case... Yeah, that would make expand_fmt_string considerably more complicated and so presumably slower. We don't really need that when we can make what I expect is a pretty trivial addition to snprintf.c. Also, fixing snprintf.c will make it safe to use the z flag in contexts other than ereport/elog. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] integrate pg_upgrade analyze_new_cluster.sh into vacuumdb
09.01.2014 05:15, Peter Eisentraut wrote: pg_upgrade creates a script analyze_new_cluster.{sh|bat} that runs vacuumdb --analyze-only in three stages with different statistics target settings to get a fresh cluster analyzed faster. I think this behavior is also useful for clusters or databases freshly created by pg_restore or any other loading mechanism, so it's suboptimal to have this constrained to pg_upgrade. I think the three-stage analyze is a wrong solution to the slow-analyze problem. In my experience most of the analyze time goes to reading random blocks from the disk, but we usually use only a small portion of that data (1 row per block.) If we were able to better utilize the data we read, we could get good statistics with a lot less IO than we currently need. This was discussed at length at http://www.postgresql.org/message-id/CAM-w4HOjRbNPMW=shjhw_qfapcuu5ege1tmdr0zqu+kqx8q...@mail.gmail.com but it hasn't turned into patches so far. / Oskari -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Hard limit on WAL space used (because PANIC sucks)
Simon Riggs si...@2ndquadrant.com writes: On 6 June 2013 16:00, Heikki Linnakangas hlinnakan...@vmware.com wrote: The current situation is that if you run out of disk space while writing WAL, you get a PANIC, and the server shuts down. That's awful. I don't see we need to prevent WAL insertions when the disk fills. We still have the whole of wal_buffers to use up. When that is full, we will prevent further WAL insertions because we will be holding the WALwritelock to clear more space. So the rest of the system will lock up nicely, like we want, apart from read-only transactions. I'm not sure that all writing transactions lock up hard is really so much better than the current behavior. My preference would be that we simply start failing writes with ERRORs rather than PANICs. I'm not real sure ATM why this has to be a PANIC condition. Probably the cause is that it's being done inside a critical section, but could we move that? Instead of PANICing, we should simply signal the checkpointer to perform a shutdown checkpoint. And if that fails for lack of disk space? In any case, what you're proposing sounds like a lot of new complication in a code path that is necessarily never going to be terribly well tested. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] [9.3 bug] disk space in pg_xlog increases during archive recovery
On Fri, Dec 20, 2013 at 9:21 PM, MauMau maumau...@gmail.com wrote: From: Fujii Masao masao.fu...@gmail.com

! if (source == XLOG_FROM_ARCHIVE && StandbyModeRequested)

Even when standby_mode is not enabled, we can use cascade replication and it needs the accumulated WAL files. So I think that AllowCascadeReplication() should be added into this condition.

! snprintf(recoveryPath, MAXPGPATH, XLOGDIR "/RECOVERYXLOG");
! XLogFilePath(xlogpath, ThisTimeLineID, endLogSegNo);
!
! if (restoredFromArchive)

Don't we need to check !StandbyModeRequested and !AllowCascadeReplication() here? Oh, you are correct. Okay, done. Thanks! The patch looks good to me. Attached is the updated version of the patch. I added the comments. Did you test whether this patch works properly in several recovery cases? Regards, -- Fujii Masao

*** a/src/backend/access/transam/xlog.c
--- b/src/backend/access/transam/xlog.c
*************** XLogFileRead(XLogSegNo segno, int emode, TimeLineID tli,
*** 3772,3781 ****
  }
  
  	/*
! 	 * If the segment was fetched from archival storage, replace the existing
! 	 * xlog segment (if any) with the archival version.
  	 */
! 	if (source == XLOG_FROM_ARCHIVE)
  	{
  		KeepFileRestoredFromArchive(path, xlogfname);
--- 3772,3792 ----
  }
  
  	/*
! 	 * If the segment was fetched from archival storage and either cascading
! 	 * replication or standby_mode is enabled, replace the existing xlog
! 	 * segment (if any) with the archival version. Cascading replication needs
! 	 * this replacement so that cascading walsender can send the xlog segment
! 	 * which was restored from the archive. When standby_mode is enabled,
! 	 * fast promotion is performed at the end of recovery, and also needs this
! 	 * replacement so that the crash recovery just after fast promotion can
! 	 * replay all the required segments from pg_xlog.
! 	 *
! 	 * If the replacement is not required, the segments are always restored onto
! 	 * the same file named RECOVERYXLOG from the archive. This prevents the
! 	 * large increase of segments in pg_xlog.
  	 */
! 	if (source == XLOG_FROM_ARCHIVE &&
! 		(AllowCascadeReplication() || StandbyModeRequested))
  	{
  		KeepFileRestoredFromArchive(path, xlogfname);
*************** exitArchiveRecovery(TimeLineID endTLI, XLogSegNo endLogSegNo)
*** 5572,5593 ****
  }
  
  	/*
! 	 * If we are establishing a new timeline, we have to copy data from the
! 	 * last WAL segment of the old timeline to create a starting WAL segment
! 	 * for the new timeline.
  	 *
! 	 * Notify the archiver that the last WAL segment of the old timeline is
! 	 * ready to copy to archival storage. Otherwise, it is not archived for a
! 	 * while.
  	 */
! 	if (endTLI != ThisTimeLineID)
  	{
! 		XLogFileCopy(endLogSegNo, endTLI, endLogSegNo);
  
! 		if (XLogArchivingActive())
  		{
! 			XLogFileName(xlogpath, endTLI, endLogSegNo);
! 			XLogArchiveNotify(xlogpath);
  		}
  	}
--- 5583,5646 ----
  }
  
  	/*
! 	 * If the segment was fetched from archival storage and neither cascading
! 	 * replication nor fast promotion is enabled, we replace the existing xlog
! 	 * segment (if any) with the archival version named RECOVERYXLOG. This is
! 	 * because whatever is in XLOGDIR is very possibly older than what we have
! 	 * from the archives, since it could have come from restoring a PGDATA
! 	 * backup. In any case, the archival version certainly is more
! 	 * descriptive of what our current database state is, because that is what
! 	 * we replayed from.
! 	 *
! 	 * Note that if we are establishing a new timeline, ThisTimeLineID is
! 	 * already set to the new value, and so we will create a new file instead
! 	 * of overwriting any existing file. (This is, in fact, always the case
! 	 * at present.)
  	 *
! 	 * If either cascading replication or fast promotion is enabled, we don't
! 	 * need the replacement here because it has already done just after the
! 	 * segment was restored from the archive.
  	 */
! 	snprintf(recoveryPath, MAXPGPATH, XLOGDIR "/RECOVERYXLOG");
! 	XLogFilePath(xlogpath, ThisTimeLineID, endLogSegNo);
!
! 	if (restoredFromArchive &&
! 		!AllowCascadeReplication() && !StandbyModeRequested)
  	{
! 		unlink(xlogpath);	/* might or might not exist */
! 		if (rename(recoveryPath, xlogpath) != 0)
! 			ereport(FATAL,
! 					(errcode_for_file_access(),
! 					 errmsg("could not rename file \"%s\" to \"%s\": %m",
! 							recoveryPath, xlogpath)));
! 		/* XXX might we need to fix permissions on the file? */
! 	}
! 	else
! 	{
! 		/*
! 		 * Since there might be a partial WAL segment named RECOVERYXLOG,
! 		 * get rid of it.
! 		 */
! 		unlink(recoveryPath);	/* ignore any error */
  
  	/*
! 	 * If we are establishing a new timeline, we have to copy data from
! 	 * the last WAL segment of the old timeline to create a starting WAL
! 	 * segment for the new timeline.
! 	 *
! 	 * Notify the archiver that the last WAL segment of the old timeline
! 	 * is ready to copy to archival
Re: [HACKERS] array_length(anyarray)
On Sun, Jan 19, 2014 at 1:41 AM, Pavel Stehule pavel.steh...@gmail.com wrote: 2014/1/19 Marko Tiikkaja ma...@joh.to On 1/19/14, 12:21 AM, Pavel Stehule wrote: I checked it and I got a small issue bash-4.1$ patch -p1 cardinality.patch (Stripping trailing CRs from patch.) not sure about source of this problem. I can't reproduce the problem. In fact, I don't see a single CR byte in the patch file on my disk, the file my email clients claims to have sent or a copy of the file I downloaded from the archives. Are you sure this isn't a problem on your end? It can be problem on my side - some strange combination of mime type. I seen this issue before. I will recheck it tomorrow from other computer. Doesn't matter anyway. Patch needing to strip trailing CRs doesn't cause any issue. I got the same message, BTW; maybe some kind of gmail weirdness. As it turns out, a function called cardinality() was added in 2009 and ripped back out again. But the one that was committed back then had funny semantics that people weren't sure about, and people seemed to think it should behave like this one does. So I've gone ahead and committed this, crossing my fingers that we won't have to rip it back out again. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Closing commitfest 2013-11
Hello. I disagree with this. There was no request to move ready-for-commit patches to the next commitfest! I expected that only unfinished patches would be moved there by their authors. I sent a question to Peter E. but got no reply; Tom did commits from this list, so I assumed there was some agreement about it and didn't raise any alarm. My patch there is a prerequisite for dump --if-exi On 21.1.2014 17:41 Robert Haas robertmh...@gmail.com wrote: On Mon, Jan 20, 2014 at 4:31 PM, Tom Lane t...@sss.pgh.pa.us wrote: Alvaro Herrera alvhe...@2ndquadrant.com writes: With apologies to our beloved commitfest-mace-wielding CFM, commitfest 2013-11 intentionally still contains a few open patches. I think that CF is largely being ignored by most people now that we have CF 2014-01 in progress. If we don't want to do anything about these patches in the immediate future, I propose we move them to CF 2014-01. I think the idea was that patch authors should take responsibility for pushing their patches forward to 2014-01 if they still wanted them considered. Quite a few patches already were moved that way, IIRC. Agreed on that general theory. And, also, yeah, the shared memory message queueing stuff got committed. Sorry, I missed the fact that there was still an open CF entry for that; I assumed that it would have been marked Returned with Feedback. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] truncating pg_multixact/members
On Mon, Jan 20, 2014 at 1:39 PM, Alvaro Herrera alvhe...@2ndquadrant.com wrote: * I haven't introduced settings to tweak this per table for autovacuum. I don't think those are needed. It's not hard to do, however; if people opine against this, I will implement that. I can't think of any reason to believe that it will be less important to tune these values on a per-table basis than it is to be able to do the same with the autovacuum parameters. Indeed, all the discussion on this thread suggests precisely that we have no real idea how to set these values yet, so more configurability is good. Even if you reject that argument, I think it's a bad idea to start making xmax vacuuming and xmin vacuuming less than parallel; such decisions confuse users. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Closing commitfest 2013-11
On 21.1.2014 18:52 Pavel Stehule pavel.steh...@gmail.com wrote: Hello. I disagree with this. There was no request to move ready-for-commit patches to the next commitfest! I expected that only unfinished patches would be moved there by their authors. I sent a question to Peter E. but got no reply; Tom did commits from this list, so I assumed there was some agreement about it and didn't raise any alarm. My patch there is a prerequisite for dump --if-exi Sorry, train and mobile :( It is required for the dump --if-exists feature. Regards Pavel On 21.1.2014 17:41 Robert Haas robertmh...@gmail.com wrote: On Mon, Jan 20, 2014 at 4:31 PM, Tom Lane t...@sss.pgh.pa.us wrote: Alvaro Herrera alvhe...@2ndquadrant.com writes: With apologies to our beloved commitfest-mace-wielding CFM, commitfest 2013-11 intentionally still contains a few open patches. I think that CF is largely being ignored by most people now that we have CF 2014-01 in progress. If we don't want to do anything about these patches in the immediate future, I propose we move them to CF 2014-01. I think the idea was that patch authors should take responsibility for pushing their patches forward to 2014-01 if they still wanted them considered. Quite a few patches already were moved that way, IIRC. Agreed on that general theory. And, also, yeah, the shared memory message queueing stuff got committed. Sorry, I missed the fact that there was still an open CF entry for that; I assumed that it would have been marked Returned with Feedback. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
[HACKERS] Bug? could not remove shared memory segment
Hi, When I shut down the standby server and then the master server in that order, I found that the master server emitted the following message:

LOG: XX000: could not remove shared memory segment /PostgreSQL.1804289383: Invalid argument
LOCATION: dsm_impl_posix, dsm_impl.c:269

Is this message intentional, or is it a bug?

$ uname -a
Darwin test.local 11.4.2 Darwin Kernel Version 11.4.2: Thu Aug 23 16:25:48 PDT 2012; root:xnu-1699.32.7~1/RELEASE_X86_64 x86_64

I used the latest version of the source code in the git master. Regards, -- Fujii Masao -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Custom collations, collation modifiers (eg case-insensitive)
This feature is interesting for the Czech language too. Lots of applications allow accent-free searching, due to possible issues in the original data or client limitations - a missing Czech keyboard and similar. On 21.1.2014 17:17 Alvaro Herrera alvhe...@2ndquadrant.com wrote: Craig Ringer wrote: (I intensely dislike the idea of ignoring accents, but it's something some people appear to need/want, and is supported by other vendors). FWIW at least in Spanish, users always want to search for entries ignoring accents. Nowadays I think most turn to unaccent(), but before that was written people resorted to calling transform() on both the stored data and the user-entered query, so that all pertinent matches would be found. People would even create indexes on the unaccented data to speed this up. It's an important feature, not a corner case by any means. Not sure about other languages. For instance perhaps in French they would be interested in ignoring some accents but not others. -- Álvaro Herrera http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] array_length(anyarray)
On 1/21/14, 6:42 PM, Robert Haas wrote: On Sun, Jan 19, 2014 at 1:41 AM, Pavel Stehule pavel.steh...@gmail.com wrote: It can be problem on my side - some strange combination of mime type. I seen this issue before. I will recheck it tomorrow from other computer. Doesn't matter anyway. Patch needing to strip trailing CRs doesn't cause any issue. I got the same message, BTW; maybe some kind of gmail weirdness. Interesting. As it turns out, a function called cardinality() was added in 2009 and ripped back out again. But the one that was committed back then had funny semantics that people weren't sure about, and people seemed to think it should behave like this one does. So I've gone ahead and committed this, crossing my fingers that we won't have to rip it back out again. Thanks! I'll keep my fingers crossed as well. Regards, Marko Tiikkaja -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
[HACKERS] Incorrectly reporting config errors
Hi all, I'm getting a report of a config error when changing a config value that requires a restart:

In postgresql.conf: max_connections = 92
(pg_ctl restart)

postgres=# show max_connections ;
 max_connections
-----------------
 92
(1 row)

(Edit postgresql.conf so that max_connections = 93)
(pg_ctl reload)

Now the log file contains:

2014-01-21 18:14:53 GMT [28718]: [4-1] user=,db=,client= LOG: received SIGHUP, reloading configuration files
2014-01-21 18:14:53 GMT [28718]: [5-1] user=,db=,client= LOG: parameter max_connections cannot be changed without restarting the server
2014-01-21 18:14:53 GMT [28718]: [6-1] user=,db=,client= LOG: configuration file /home/thom/Development/data/postgresql.conf contains errors; unaffected changes were applied

It doesn't contain errors. I changed the 92 to 93. If I restart, it doesn't complain, and there's nothing in the log about the config anymore. This seems to be the case for any parameter with a postmaster context. I can understand why it logs the message about it not changing without a restart, but the other one seems like a bug. I've tested this on 9.3 and 9.4devel. Thom -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Add %z support to elog/ereport?
Robert Haas robertmh...@gmail.com writes: On Fri, Jan 17, 2014 at 8:36 PM, Tom Lane t...@sss.pgh.pa.us wrote: It's still the case that we need a policy against INT64_FORMAT in translatable strings, though, because what's passed to the gettext mechanism has to be a fixed string not something with platform dependencies in it. Or so I would assume, anyway. Well, that's kinda problematic, isn't it? Printing the variable into a static buffer so that it can then be formatted with %s is pretty pessimal for a message that we might not even be planning to emit. Well, it's a tad annoying, but a quick grep says there are maybe 40 cases of this in our source code, so I'm not sure it's rising to the level of a must-fix problem. Perhaps we should jettison entirely the idea of using the operating system's built-in sprintf and use one of our own that has all of the nice widgets we need, like a format code that's guaranteed to be right for uint64 and one that's guaranteed to be right for Size. This could turn out to be a bad idea if the best sprintf we can write is much slower than the native sprintf on any common platforms ... and maybe it wouldn't play nice with GCC's desire to check format strings. That last is a deal-breaker. It's not just whether gcc desires to check this --- we *need* that checking, because people get it wrong without it. I thought about proposing that we insist that snprintf support the ll flag (substituting our own if not). But that doesn't really fix anything unless we're willing to explicitly cast the corresponding values to long long, which is probably not workable from a portability standpoint. I'm not really worried about platforms thinking long long is 128 bits --- I'm worried about old compilers that don't have such a datatype. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Bug? could not remove shared memory segment
On Tue, Jan 21, 2014 at 1:05 PM, Fujii Masao masao.fu...@gmail.com wrote: When I shut down the standby server and then the master server in that order, I found that the master server emitted the following message. LOG: XX000: could not remove shared memory segment /PostgreSQL.1804289383: Invalid argument LOCATION: dsm_impl_posix, dsm_impl.c:269 Is this intentional message or a bug? $ uname -a Darwin test.local 11.4.2 Darwin Kernel Version 11.4.2: Thu Aug 23 16:25:48 PDT 2012; root:xnu-1699.32.7~1/RELEASE_X86_64 x86_64 I used the latest version of the source code in the git master. Urgh. I think what happened is that, when you cloned the master to produce the standby, you also copied the pg_dynshmem directory, which caused the standby to latch onto the master's control segment and blow it away on startup. When the master tried to shut down, it got unhappy. I think this is further evidence that we need to get rid of the state file; while I haven't agreed with Heikki's comments on this topic in their entirety, there is clearly a problem here that needs to get fixed somehow. Latest discussion of this topic is here: http://www.postgresql.org/message-id/CA+TgmoZAkz3yrZkm8D=n27dsc00pdqsshzljbkq29_twqge...@mail.gmail.com -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Incorrectly reporting config errors
Thom Brown t...@linux.com writes: I'm getting a report of a config error when changing a config value that requires a restart: ... 2014-01-21 18:14:53 GMT [28718]: [4-1] user=,db=,client= LOG: received SIGHUP, reloading configuration files 2014-01-21 18:14:53 GMT [28718]: [5-1] user=,db=,client= LOG: parameter max_connections cannot be changed without restarting the server 2014-01-21 18:14:53 GMT [28718]: [6-1] user=,db=,client= LOG: configuration file /home/thom/Development/data/postgresql.conf contains errors; unaffected changes were applied It doesn't contain errors. Yeah it does: it's got a value that can't be applied. I think you're making a semantic quibble. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Hard limit on WAL space used (because PANIC sucks)
On 21 January 2014 18:35, Tom Lane t...@sss.pgh.pa.us wrote: Simon Riggs si...@2ndquadrant.com writes: On 6 June 2013 16:00, Heikki Linnakangas hlinnakan...@vmware.com wrote: The current situation is that if you run out of disk space while writing WAL, you get a PANIC, and the server shuts down. That's awful. I don't see we need to prevent WAL insertions when the disk fills. We still have the whole of wal_buffers to use up. When that is full, we will prevent further WAL insertions because we will be holding the WALwritelock to clear more space. So the rest of the system will lock up nicely, like we want, apart from read-only transactions. I'm not sure that all writing transactions lock up hard is really so much better than the current behavior. Lock up momentarily, until the situation clears. But my proposal would allow the situation to fully clear, i.e. all WAL files could be deleted as soon as replication/archiving has caught up. The current behaviour doesn't automatically correct itself as this proposal would. My proposal is also fully safe in line with synchronous replication, as well as zero performance overhead for mainline processing. My preference would be that we simply start failing writes with ERRORs rather than PANICs. Yes, that is what I am proposing, amongst other points. I'm not real sure ATM why this has to be a PANIC condition. Probably the cause is that it's being done inside a critical section, but could we move that? Yes, I think so. Instead of PANICing, we should simply signal the checkpointer to perform a shutdown checkpoint. And if that fails for lack of disk space? I proposed a way to ensure it wouldn't fail for that, at least on pg_xlog space. In any case, what you're proposing sounds like a lot of new complication in a code path that is necessarily never going to be terribly well tested. It's the smallest amount of change proposed so far... I agree on the danger of untested code. 
-- Simon Riggs http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Incorrectly reporting config errors
On 01/21/2014 10:26 AM, Thom Brown wrote: Hi all, I'm getting a report of a config error when changing a config value that requires a restart:

In postgresql.conf: max_connections = 92
(pg_ctl restart)

postgres=# show max_connections ;
 max_connections
-----------------
 92
(1 row)

(Edit postgresql.conf so that max_connections = 93)
(pg_ctl reload)

Now the log file contains:

2014-01-21 18:14:53 GMT [28718]: [4-1] user=,db=,client= LOG: received SIGHUP, reloading configuration files
2014-01-21 18:14:53 GMT [28718]: [5-1] user=,db=,client= LOG: parameter max_connections cannot be changed without restarting the server
2014-01-21 18:14:53 GMT [28718]: [6-1] user=,db=,client= LOG: configuration file /home/thom/Development/data/postgresql.conf contains errors; unaffected changes were applied

It doesn't contain errors. I changed the 92 to 93. If I restart, it doesn't complain, and there's nothing in the log about the config anymore. This seems to be the case for any parameter with a postmaster context. I can understand why it logs the message about it not changing without a restart, but the other one seems like a bug.

You wanted a change in a value, the change did not occur, hence an error.

I've tested this on 9.3 and 9.4devel. Thom

-- Adrian Klaver adrian.kla...@gmail.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
[HACKERS] Documentation patch for to_date and to_timestamp
The attached patch is in response to ongoing mailing-list questions regarding perceived weirdness in to_timestamp and to_date. The patch modifies doc/src/sgml/func.sgml to add "(see usage notes)" in the description column for to_date and to_timestamp in the Formatting Functions table and adds the following two list items to the start of the usage notes for date/time conversion:

The to_date and to_timestamp functions exist to parse unusual input formats that cannot be handled by casting. These functions interpret input liberally and with minimal error checking so the conversion has the potential to yield unexpected results. Read the following notes and test carefully before use. Casting is the preferred method of conversion wherever possible.

Input to to_date and to_timestamp is not restricted to normal ranges, thus to_date('20096040','YYYYMMDD') returns 2014-01-17 rather than generating an error.

Cheers, Steve

diff --git a/doc/src/sgml/func.sgml b/doc/src/sgml/func.sgml
index c76d357..19197ce 100644
--- a/doc/src/sgml/func.sgml
+++ b/doc/src/sgml/func.sgml
@@ -5426,7 +5426,7 @@ SELECT SUBSTRING('XY1234Z', 'Y*?([0-9]{1,3})');
         <literal><function>to_date(<type>text</type>, <type>text</type>)</function></literal>
        </entry>
        <entry><type>date</type></entry>
-       <entry>convert string to date</entry>
+       <entry>convert string to date (see usage notes)</entry>
        <entry><literal>to_date('05&nbsp;Dec&nbsp;2000', 'DD&nbsp;Mon&nbsp;YYYY')</literal></entry>
       </row>
       <row>
@@ -5448,7 +5448,7 @@ SELECT SUBSTRING('XY1234Z', 'Y*?([0-9]{1,3})');
         <literal><function>to_timestamp(<type>text</type>, <type>text</type>)</function></literal>
        </entry>
        <entry><type>timestamp with time zone</type></entry>
-       <entry>convert string to time stamp</entry>
+       <entry>convert string to time stamp (see usage notes)</entry>
        <entry><literal>to_timestamp('05&nbsp;Dec&nbsp;2000', 'DD&nbsp;Mon&nbsp;YYYY')</literal></entry>
       </row>
       <row>
@@ -5750,10 +5750,32 @@ SELECT SUBSTRING('XY1234Z', 'Y*?([0-9]{1,3})');
    <para>
     Usage notes for date/time formatting:
+   </para>
+
+   <para>
     <itemizedlist>
      <listitem>
       <para>
+        The <function>to_date</function> and <function>to_timestamp</function>
+        functions exist to parse unusual input formats that cannot be handled
+        by casting. These functions interpret input liberally and with minimal
+        error checking so the conversion has the potential to yield unexpected
+        results. Read the following notes and test carefully before use.
+        Casting is the preferred method of conversion wherever possible.
+       </para>
+      </listitem>
+
+      <listitem>
+       <para>
+        Input to <function>to_date</function> and
+        <function>to_timestamp</function> is not restricted to normal ranges
+        thus <literal>to_date('20096040','YYYYMMDD')</literal> returns
+        <literal>2014-01-17</literal> rather than generating an error.
+       </para>
+      </listitem>
+
+      <listitem>
+       <para>
        <literal>FM</literal> suppresses leading zeroes and trailing blanks
        that would otherwise be added to make the output of a pattern be
        fixed-width. In <productname>PostgreSQL</productname>,

-- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Incorrectly reporting config errors
On 21 January 2014 18:35, Tom Lane t...@sss.pgh.pa.us wrote: Thom Brown t...@linux.com writes: I'm getting a report of a config error when changing a config value that requires a restart: ... 2014-01-21 18:14:53 GMT [28718]: [4-1] user=,db=,client= LOG: received SIGHUP, reloading configuration files 2014-01-21 18:14:53 GMT [28718]: [5-1] user=,db=,client= LOG: parameter max_connections cannot be changed without restarting the server 2014-01-21 18:14:53 GMT [28718]: [6-1] user=,db=,client= LOG: configuration file /home/thom/Development/data/postgresql.conf contains errors; unaffected changes were applied It doesn't contain errors. Yeah it does: it's got a value that can't be applied. I think you're making a semantic quibble. I see it as technically wrong. There's nothing wrong with my config file. A reload of the file may not be able to apply all the settings, but there's no typo or mistake anywhere in my file. I would just need to restart instead of reload. However, given that you find it unsurprising, I'll leave it there. Thom -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] NOT Null constraint on foreign table not working
Albe Laurenz laurenz.a...@wien.gv.at writes: I believe that a column of a foreign table should be NOT NULL only if it is guaranteed that it cannot contain NULL values. Doesn't the planner rely on that? The planner does expect that constraints tell the truth. I don't remember how significant a false NOT NULL constraint might be, but certainly false CHECK constraints can give rise to incorrect plans. But PostgreSQL cannot guarantee that, that has to happen on the remote side (or in the FDW). I think that it is best that an error for a constraint violation is thrown by the same entity that guarantees that the constraint is respected. A point worth making here is that it's already implicit in the contract that the CREATE FOREIGN TABLE command accurately represents what the far-side table is. If you get the column datatypes wrong, for example, it's possible to have subtle semantic bugs not all that different from what will happen with an incorrectly-deduced plan. And we don't make any attempt to slap your wrist for that. So I don't see that there's anything fundamentally wrong with the position that any NOT NULL or CHECK constraints attached to a foreign table must be accurate reflections of constraints that exist on the far side, rather than something we should enforce locally. (Note that this argument applies to FDWs for remote SQL servers, but not necessarily for FDWs for non-SQL data sources, where conceivably the CREATE FOREIGN TABLE command actually is itself the authoritative truth. Such an FDW would then be responsible for enforcing constraints.) I agree though that we've failed miserably to explain this in the docs. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Performance Improvement by reducing WAL for Update Operation
On Tue, Jan 21, 2014 at 2:00 AM, Amit Kapila amit.kapil...@gmail.com wrote: On Mon, Jan 20, 2014 at 9:49 PM, Robert Haas robertmh...@gmail.com wrote: I ran Heikki's test suite on latest master and latest master plus pgrb_delta_encoding_v4.patch on a PPC64 machine, but the results didn't look too good. The only tests where the WAL volume changed by more than half a percent were the one short and one long field, no change test, where it dropped by 17%, but at the expense of an increase in duration of 38%; and the hundred tiny fields, half nulled test, where it dropped by 2% without a change in runtime. Unfortunately, some of the tests where WAL didn't change significantly took a runtime hit - in particular, hundred tiny fields, half changed slowed down by 10% and hundred tiny fields, all changed by 8%. I think this part of the result is positive, as with earlier approaches here the dip was 20%. Refer to the result posted at: http://www.postgresql.org/message-id/51366323.8070...@vmware.com I've attached the full results in OpenOffice format. Profiling the one short and one long field, no change test turns up the following:

51.38% postgres pgrb_delta_encode
23.58% postgres XLogInsert
 2.54% postgres heap_update
 1.09% postgres LWLockRelease
 0.90% postgres LWLockAcquire
 0.89% postgres palloc0
 0.88% postgres log_heap_update
 0.84% postgres HeapTupleSatisfiesMVCC
 0.75% postgres ExecModifyTable
 0.73% postgres hash_search_with_hash_value

Yipes. That's a lot more than I remember this costing before. And I don't understand why I'm seeing such a large time hit on this test where you actually saw a significant time *reduction*. One possibility is that you may have been running with a default checkpoint_segments value or one that's low enough to force checkpointing activity during the test. I ran with checkpoint_segments=300.
I ran with checkpoint_segments = 128 and when I ran with v4, I also see a similar WAL reduction as you are seeing, except that in my case the runtime for both is almost the same (I think in your case disk writes are fast, so the CPU overhead is more visible). I think the major difference in the above test is due to the below part of the code:

pgrb_find_match()
{
..
+   /*
+   if (match_chunk)
+   {
+       while (*ip == *hp)
+       {
+           matchlen++;
+           ip++;
+           hp++;
+       }
+   }
+   */
}

Basically if we don't go for the longer match, then for a test where most data (one short and one long field, no change) is similar, it has to do the below extra steps with no advantage:
a. copy extra tags
b. calculation for the rolling hash
c. finding the match
I think the major cost here is due to 'a', but the others might also not be free. To confirm the theory, if we run the test by just un-commenting the above code, there can be a significant change in both WAL reduction and runtime for this test. I have one idea to avoid the overhead of step a) which is to combine the tags, meaning don't write the tag until it finds any un-matching data. When any un-matched data is found, then combine all the previously matched data and write it as one tag. This should eliminate the overhead due to step a.

I think that's a good thing to try. Can you code it up?

-- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] inherit support for foreign tables
On Mon, Jan 20, 2014 at 9:44 PM, Shigeru Hanada shigeru.han...@gmail.com wrote: Thanks for the comments. 2014/1/21 KaiGai Kohei kai...@ak.jp.nec.com: In addition, an idea which I can't throw away is to treat all constraints defined on foreign tables as ASSERTIVE. Foreign tables always run the risk of holding wrong data, because the source data can be updated without going through the foreign table. This is not specific to any one FDW, so IMO constraints defined on foreign tables are basically ASSERTIVE. Of course PG can try to keep the data correct, but somebody might always break it. Does it make sense to apply an assertive CHECK constraint on the qual of the ForeignScan to filter out tuples with violating values at the local side, as the row-level security feature does? That would handle a situation where the planner expects only clean tuples to be returned but the FDW driver is unable to filter out anomalies. Probably, this additional check could be turned on/off on the fly, if the FDW driver has a way to inform the core system of its capability, like FDW_CAN_ENFORCE_CHECK_CONSTRAINT, which would tell the planner to skip local checks. Hmm, IIUC you mean that local users can't (or don't need to) know that data which violates the local constraints exists on the remote side. Applying constraints to the data which is modified through the FDW would be necessary as well. In that design, the FDW is a bidirectional filter which provides these features:

1) Don't push wrong data into the remote data source, by applying local constraints to the result of the modifying query executed on local PG. This is not a perfect filter, because remote constraints aren't mapped automatically or perfectly (imagine constraints which are available on the remote side but are not supported in PG).

2) Don't retrieve wrong data from remote to local PG, by applying local constraints.

I have a concern about consistency. It has not been supported yet, but let's think of aggregate push-down invoked by the query below.
SELECT count(*) FROM remote_table;

If this query were fully pushed down, the result would be the # of records that exist on the remote side, but it would be the # of valid records when we don't push down the aggregate. This would confuse users. Besides CHECK constraints, currently NOT NULL constraints are virtually ASSERTIVE (not enforcing). Should that also be noted explicitly? Backward compatibility…. Yep, backward incompatibility (especially changes visible to users) should be minimal, ideally zero. NOT NULL [ASSERTIVE] might be an option. Treating [ASSERTIVE | NOT ASSERTIVE] like DEFERRABLE, and allowing ASSERTIVE only for foreign tables? It makes sense, though we need to consider exclusiveness. But it needs to default to ASSERTIVE on foreign tables, and NOT ASSERTIVE (meaning enforced) on others. Isn't it too complicated?

CREATE FOREIGN TABLE foo (
    id int NOT NULL ASSERTIVE CHECK (id > 1) ASSERTIVE,
    …
    CONSTRAINT chk_foo_name_upper CHECK (upper(name) = name) ASSERTIVE
) SERVER server;

BTW, I noticed that this is like push-down-able expressions in JOIN/WHERE. Do we need to check that a CHECK constraint defined on a foreign table contains only expressions which have the same semantics on the remote side (in practice, built-in and immutable)?

I don't think that ASSERTIVE is going to fly, because assertive means (sayeth the Google) having or showing a confident and forceful personality, which is not what we mean here. It's tempting to do something like try to replace the keyword check with assume or assert or (stretching) assertion, but that would require whichever one we picked to be a fully-reserved keyword, which I can't think is going to get much support here, for entirely understandable reasons. So I think we should look for another option. Currently, constraints can be marked NO INHERIT (though this seems to have not been fully documented, as the ALTER TABLE page doesn't mention it anywhere) or NOT VALID, so I'm thinking maybe we should go with something along those lines.
Some ideas:
- NO CHECK. The idea of writing CHECK (id > 1) NO CHECK is pretty hilarious, though.
- NO VALIDATE. But then people need to understand that NOT VALID means we didn't validate it yet while NO VALIDATE means we don't ever intend to validate it, which could be confusing.
- NO ENFORCE. Requires a new (probably unreserved) keyword.
- NOT VALIDATED or NOT CHECKED. Same problems as NO CHECK and NO VALIDATE, respectively, plus now we have to create a new keyword.

Another idea is to apply an extensible-options syntax to constraints, like we do for EXPLAIN, VACUUM, etc. Like maybe:

CHECK (id > 1) OPTIONS (enforced false, valid true)

Yet another idea is to consider validity a three-state property: either the constraint is valid (because we have checked it and are enforcing it), or it is not valid (because we are enforcing it but have not checked the pre-existing data), or it is assumed true (because we are not checking or enforcing it but are believing it anyway). So then we
Re: [HACKERS] Closing commitfest 2013-11
On 01/20/2014 10:31 PM, Tom Lane wrote: I think the idea was that patch authors should take responsibility for pushing their patches forward to 2014-01 if they still wanted them considered. Quite a few patches already were moved that way, IIRC. Agreed though that we shouldn't let them just rot. Does this mean I can resurrect my pg_sleep_until() patch? I didn't set it back to Needs Review after I completely changed my approach based on feedback. I would hate for it to get lost just because I didn't know how to use the commitfest app. https://commitfest.postgresql.org/action/patch_view?id=1189 -- Vik -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Funny representation in pg_stat_statements.query.
Kyotaro HORIGUCHI horiguchi.kyot...@lab.ntt.co.jp writes: - CURRENT_TIME and the like are parsed into the nodes directly represents the node specs in gram.y blah, blah Since this becomes more than a simple bug fix, I think it is too late for 9.4 now. I'll work on this in the longer term. OK. I will get around to it someday, but if you do it first, that's fine. The fundamental cause of this issue is Const node which conveys the location of other than constant tokens. Any other nodes, for instance TypeCasts, are safe. So this is fixed by quite simple way like following getting rid of the referred difficulties of keeping the code sane and total loss of token locations. (white spaces are collapsed for readability) Huh, that's a cute idea. It's certainly a kluge with a capital K, and might fail to extend to new cases; but if we're intending to replace all these cases with new special-purpose parse nodes soon, neither of those objections seem very strong. The main concern I'd have would be whether there could be any weird change in error cursor locations, but right offhand that's probably not a problem. We know in each of these cases that there will be some node produced by the coercion step, so the location won't disappear entirely, and so exprLocation() of the node tree as a whole should be the same. Not labeling the A_Const could result in failure to produce an error location for a complaint about invalid input for the coercion --- but surely that should never happen in any of these cases. So right offhand I think this is probably workable, and while it's ugly it's an appropriate amount of effort to put into code whose days are numbered anyhow. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] ALTER SYSTEM SET typos and fix for temporary file name management
On Tue, Jan 21, 2014 at 7:47 AM, Michael Paquier michael.paqu...@gmail.com wrote: After going through commit 65d6e4c (introducing ALTER SYSTEM SET), I noticed a couple of typos as well as (I think) a weird way of using the temporary auto-configuration file name postgresql.auto.conf.temp in two different places, resulting in the attached patch.

A .tmp suffix is used in a couple of other places in the code as well:
snapmgr.c: XactExportFilePath(pathtmp, topXid, list_length(exportedSnapshots), ".tmp");
receivelog.c: snprintf(tmppath, MAXPGPATH, "%s.tmp", path);

Although a similar suffix is used at other places, I still think it might be better to define one for the current case, as here the prefix (postgresql.auto.conf) is also fixed and the chances of it getting changed are low. However, even if we don't, it might be okay as we are using such suffixes at other places as well. In the case of snapmgr.c and receivelog.c, the operations are kept local, so it does not matter much if this way of naming is done, as all the operations for a temporary file are kept within the same, isolated function. However, the case of postgresql.auto.conf.temp is different: this temporary file name is used as well for a check in basebackup.c, so I imagine that it would be better to centralize this information in a single place. Now this name is only used in two places, but who knows if some additional checks here and there won't be needed for this temp file... postgresql.auto.conf.temp is also located at the root of PGDATA, so it might be surprising for the user to find it, even if there are few chances that it can happen.

I don't think there's any real reason to define PG_AUTOCONF_FILENAME_TEMP. pg_stat_statements just writes PGSS_DUMP_FILE ".tmp" and that hasn't been a problem that I know of. I do wonder why ALTER SYSTEM SET is spelling the suffix temp instead of tmp.
-- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Add min and max execute statement time in pg_stat_statement
On 21 January 2014 12:54, KONDO Mitsumasa kondo.mitsum...@lab.ntt.co.jp wrote: Rebased patch is attached. Does this fix the Windows bug reported by Kumar on 20/11/2013? pg_stat_statements in PG9.4dev has already changed table columns. So I hope this patch will be committed in PG9.4dev. I've read through the preceding discussion and I'm not very impressed. Lots of people have spoken about wanting histogram output and I can't even see a clear statement of whether that will or will not happen. AFAICS, all that has happened is that people have given their opinions and we've got almost the same identical patch, with a rush-rush comment to commit even though we've waited months. If you submit a patch, then you need to listen to feedback and be clear about what you will do next; if you don't, people will learn to ignore you, and nobody wants that. I should point out that technically this patch is late and we could reject it solely on that basis, if we wanted to. I agree with people saying that stddev is better than nothing at all, so I am inclined to commit this, in spite of the above. Any objections to commit? -- Simon Riggs http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Funny representation in pg_stat_statements.query.
On Tue, Jan 21, 2014 at 12:30 AM, Kyotaro HORIGUCHI horiguchi.kyot...@lab.ntt.co.jp wrote: The fundamental cause of this issue is Const node which conveys the location of other than constant tokens. Any other nodes, for instance TypeCasts, are safe. So this is fixed by quite simple way like following getting rid of the referred difficulties of keeping the code sane and total loss of token locations. (white spaces are collapsed for readability) There was a more obvious case of this that I noticed during the development of pg_stat_statements query normalization. As you may have noticed, that was addressed by this commit: http://git.postgresql.org/gitweb/?p=postgresql.git;a=commit;h=5d3fcc4c2e137417ef470d604fee5e452b22f6a7 -- Peter Geoghegan -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Dynamic Shared Memory stuff
On Wed, Dec 18, 2013 at 12:21:08PM -0500, Robert Haas wrote: On Tue, Dec 10, 2013 at 6:26 PM, Tom Lane t...@sss.pgh.pa.us wrote: The larger point is that such a shutdown process has never in the history of Postgres been successful at removing shared-memory (or semaphore) resources. I do not feel a need to put a higher recoverability standard onto the DSM code than what we've had for fifteen years with SysV shmem. But, by the same token, it shouldn't be any *less* recoverable. In this context that means that starting a new postmaster should recover the resources, at least 90% of the time (100% if we still have the old postmaster lockfile). I think the idea of keeping enough info in the SysV segment to permit recovery of DSM resources is a good one. Then, any case where the existing logic is able to free the old SysV segment is guaranteed to also work for DSM. So I'm taking a look at this. There doesn't appear to be anything intrinsically intractable here, but it seems important to time the order of operations so as to minimize the chances that things fail in awkward places. The point where we remove the old shared memory segment from the previous postmaster invocation is here:

    /*
     * The segment appears to be from a dead Postgres process, or from a
     * previous cycle of life in this same process.  Zap it, if possible.
     * This probably shouldn't fail, but if it does, assume the segment
     * belongs to someone else after all, and continue quietly.
     */
    shmdt(memAddress);
    if (shmctl(shmid, IPC_RMID, NULL) < 0)
        continue;

My first thought was to remember the control segment ID from the header just before the shmdt() there, and then later when the DSM module is starting, do cleanup. But that doesn't seem like a good idea, because then if startup fails after we remove the old shared memory segment and before we get to the DSM initialization code, we'll have lost the information on what control segment needs to be cleaned up.
A subsequent startup attempt won't see the old shm again, because it's already gone. I'm fairly sure that this would be a net reduction in reliability vs. the status quo.

So now what I'm thinking is that we ought to actually perform the DSM cleanup before detaching the old segment and trying to remove it. That shouldn't fail, but if it does, or if we get killed before completing it, the next run will hopefully find the same old shm and finish the cleanup. But that kind of flies in the face of the comment above: if we perform DSM cleanup and then discover that the segment wasn't ours after all, that means we just stomped all over somebody else's stuff. That's not too good. But trying to remove the segment first and then perform the cleanup creates a window where, if we get killed, the next restart will have lost information about how to finish cleaning up. So it seems that some kind of logic tweak is needed here, but I'm not sure exactly what. As I see it, the options are:

1. Make failure to remove the shared memory segment we thought was ours an error. This will definitely show up any problems, but only after we've burned down some other process's dynamic shared memory segments. The most likely outcome is creating failure-to-start problems that don't exist today.

2. Make it a warning. We'll still burn down somebody else's DSMs, but at least we'll still start ourselves. Sadly, WARNING: You have just done a bad thing. It's too late to fix it. Sorry! is not very appealing.

It has long been the responsibility of PGSharedMemoryCreate() to determine that a segment is unimportant before calling IPC_RMID. The success or failure of IPC_RMID is an unreliable guide to the correctness of that determination. IPC_RMID will succeed on an important segment owned by the same UID, and it will fail if some other process removed the segment after our shmat(). Such a failure does not impugn our having requested DSM cleanup on the basis of the PGShmemHeader we did read, so an apologetic WARNING would be wrong.

3. Remove the old shared memory segment first, then perform the cleanup immediately afterwards. If we get killed before completing the cleanup, we'll leak the un-cleaned-up stuff. Decide that's OK and move on.

The period in which process exit would leak segments is narrow enough that this seems fine. I see no net advantage over performing the cleanup before shmdt(), though.

4. Adopt some sort of belt-and-suspenders approach, keeping the state file we have now and backstopping it with this mechanism, so that we really only need this to work when $PGDATA has been blown away and recreated. This seems pretty inelegant, and I'm not sure who it'll benefit other than those (few?) people who kill -9 the postmaster (or it segfaults or otherwise dies without running the code to remove shared memory segments) and then remove
[HACKERS] Re[2]: [HACKERS] Re: [Lsf-pc] Linux kernel impact on PostgreSQL performance (summary v2 2014-1-17)
Hi,

But maybe postgres should provide its own subsystem, like Linux's active/inactive memory, over and/or near shared buffers? There could be some Postgres-specific heuristics in such an approach. And does anyone know how the mysql-innodb guys are getting on with similar issues? Thank you!
Re: [HACKERS] inherit support for foreign tables
Robert Haas robertmh...@gmail.com writes: One thing that's bugging me a bit about this whole line of attack is that, in the first instance, the whole goal here is to support inheritance hierarchies that mix ordinary tables with foreign tables. If you have a table with children some of which are inherited and others of which are not inherited, you're very likely going to want your constraints enforced for real on the children that are tables and assumed true on the children that are foreign tables, and none of what we're talking about here gets us to that, because we normally want the constraints to be identical throughout the inheritance hierarchy. There's a nearby thread that's addressing this same question, in which I make the case (again) that the right thing for postgres_fdw constraints is that they're just assumed true. So I'm not sure why this conversation is proposing to implement a lot of mechanism to do something different from that. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Add min and max execute statement time in pg_stat_statement
On 01/21/2014 02:48 PM, Simon Riggs wrote: I agree with people saying that stddev is better than nothing at all, so I am inclined to commit this, in spite of the above. Any objections to commit? I have not been following terribly closely, but if it includes stddev then yes, please do, many of us will find it very useful indeed. cheers andrew -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] inherit support for foreign tables
On Tue, Jan 21, 2014 at 3:00 PM, Tom Lane t...@sss.pgh.pa.us wrote: Robert Haas robertmh...@gmail.com writes: One thing that's bugging me a bit about this whole line of attack is that, in the first instance, the whole goal here is to support inheritance hierarchies that mix ordinary tables with foreign tables. If you have a table with children some of which are inherited and others of which are not inherited, you're very likely going to want your constraints enforced for real on the children that are tables and assumed true on the children that are foreign tables, and none of what we're talking about here gets us to that, because we normally want the constraints to be identical throughout the inheritance hierarchy. There's a nearby thread that's addressing this same question, in which I make the case (again) that the right thing for postgres_fdw constraints is that they're just assumed true. So I'm not sure why this conversation is proposing to implement a lot of mechanism to do something different from that. /me scratches head. Because the other guy named Tom Lane took the opposite position on the second message on this thread, dated 11/14/13? -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Add min and max execute statement time in pg_stat_statement
On Tue, Jan 21, 2014 at 11:48 AM, Simon Riggs si...@2ndquadrant.com wrote: I agree with people saying that stddev is better than nothing at all, so I am inclined to commit this, in spite of the above. I could live with stddev. But we really ought to be investing in making pg_stat_statements work well with third-party tools. I am very wary of enlarging the counters structure, because it is protected by a spinlock. There has been no attempt to quantify that cost, nor has anyone even theorized that it is not likely to be appreciable. -- Peter Geoghegan -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Closing commitfest 2013-11
Vik Fearing wrote: On 01/20/2014 10:31 PM, Tom Lane wrote: I think the idea was that patch authors should take responsibility for pushing their patches forward to 2014-01 if they still wanted them considered. Quite a few patches already were moved that way, IIRC. Agreed though that we shouldn't let them just rot. Does this mean I can resurrect my pg_sleep_until() patch? I didn't set it back to Needs Review after I completely changed my approach based on feedback. I would hate for it to get lost just because I didn't know how to use the commitfest app. https://commitfest.postgresql.org/action/patch_view?id=1189 No objection here. -- Álvaro Herrera http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Re[2]: [HACKERS] Re: [Lsf-pc] Linux kernel impact on PostgreSQL performance (summary v2 2014-1-17)
On Tue, Jan 21, 2014 at 5:01 PM, Миша Тюрин tmih...@bk.ru wrote: And does anyone know how mysql-innodb guys are getting with similar issues? I'm no innodb dev, but from managing mysql databases, I can say that mysql simply eats all the RAM the admin is willing to allocate for the DB, and is content with the page cache almost not working. IOW: mysql manages its own cache and doesn't need or want the page cache. That *does* result in terrible performance when I/O is needed. Some workloads are nigh-impossible to optimize with this scheme. -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Incorrectly reporting config errors
On Tue, Jan 21, 2014 at 1:59 PM, Thom Brown t...@linux.com wrote: On 21 January 2014 18:35, Tom Lane t...@sss.pgh.pa.us wrote: Thom Brown t...@linux.com writes: I'm getting a report of a config error when changing a config value that requires a restart: ... 2014-01-21 18:14:53 GMT [28718]: [4-1] user=,db=,client= LOG: received SIGHUP, reloading configuration files 2014-01-21 18:14:53 GMT [28718]: [5-1] user=,db=,client= LOG: parameter max_connections cannot be changed without restarting the server 2014-01-21 18:14:53 GMT [28718]: [6-1] user=,db=,client= LOG: configuration file /home/thom/Development/data/postgresql.conf contains errors; unaffected changes were applied It doesn't contain errors. Yeah it does: it's got a value that can't be applied. I think you're making a semantic quibble. I see it as technically wrong. There's nothing wrong with my config file. A reload of the file may not be able to apply all the settings, but there's no typo or mistake anywhere in my file. I would just need to restart instead of reload. However, given that you find it unsurprising, I'll leave it there. I kind of agree with Thom. I understand why it's doing what it's doing, but it still seems sort of lame. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Incorrectly reporting config errors
Robert Haas robertmh...@gmail.com writes: I kind of agree with Thom. I understand why it's doing what it's doing, but it still seems sort of lame. Well, the point of the message is to report that we failed to apply all the settings requested by the file. If you prefer some wording squishier than error, we could bikeshed the message phrasing. But I don't think we should suppress the message entirely; nor does it seem worthwhile to try to track whether all the failures were of precisely this type. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Hard limit on WAL space used (because PANIC sucks)
Fwiw I think "all transactions lock up until space appears" is *much* better than PANICing. Often disks fill up due to other transient storage use, or people may have options to manually increase the amount of space. It's much better if the database just continues to function after that rather than needing to be restarted. -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
[HACKERS] pg_isready and older versions
All, pg_isready works against older versions of PostgreSQL. Does anyone know if there's a limit to that? v3 protocol change? Something else? Backwards compatibility ought to be in its docs, but to fix that I need to know what version it's compatible *to*. -- Josh Berkus PostgreSQL Experts Inc. http://pgexperts.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Incorrectly reporting config errors
On Tue, Jan 21, 2014 at 3:37 PM, Tom Lane t...@sss.pgh.pa.us wrote: Robert Haas robertmh...@gmail.com writes: I kind of agree with Thom. I understand why it's doing what it's doing, but it still seems sort of lame. Well, the point of the message is to report that we failed to apply all the settings requested by the file. If you prefer some wording squishier than error, we could bikeshed the message phrasing. But I don't think we should suppress the message entirely; nor does it seem worthwhile to try to track whether all the failures were of precisely this type. Well, to me the whole thing smacks of: LOG: there is a problem LOG: please be aware that we logged a message about a problem The only real argument for the message: LOG: configuration file /home/thom/Development/data/postgresql.conf contains errors; unaffected changes were applied ...is that somebody might think that the presence of a single error caused all the changes to be ignored. And there's a good reason they might think that: we used to do it that way. But on the flip side, if we emitted a LOG or WARNING message every time the user did something that works in a way incompatible with previous releases, we'd go insane. So I think the argument for just dumping that message altogether is actually pretty good. Bikeshedding the wording is, of course, another viable option. For that matter, ignoring the problem is a pretty viable option, too. :-) -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] GIN improvements part 1: additional information
On 01/21/2014 11:35 AM, Heikki Linnakangas wrote: On 01/21/2014 04:02 AM, Tomas Vondra wrote: On 20.1.2014 19:30, Heikki Linnakangas wrote: Attached is yet another version, with more bugs fixed and more comments added and updated. I would appreciate some heavy testing of this patch now. If you could re-run the tests you've been using, that would be great. I've tested the WAL replay by replicating GIN operations over streaming replication. That doesn't guarantee it's correct, but it's a good smoke test. I gave it a try - the OOM error seems to be gone, but now I get this: PANIC: cannot insert duplicate items to GIN index page This only happens when building the index incrementally (i.e. using a sequence of INSERT statements into a table with a GIN index). When I create a new index on a table (already containing the same dataset) it works just fine. Also, I tried to reproduce the issue by running a simple plpgsql loop (instead of a complex python script):

DO LANGUAGE plpgsql $$
DECLARE
    r tsvector;
BEGIN
    FOR r IN SELECT body_tsvector FROM data_table LOOP
        INSERT INTO idx_table (body_tsvector) VALUES (r);
    END LOOP;
END$$;

where data_table is the table with imported data (the same data I mentioned in the post about OOM errors), and idx_table is an empty table with a GIN index. And indeed it fails, but only if I run the block in multiple sessions in parallel. Oh, I see what's going on. I had assumed that there cannot be duplicate insertions into the posting tree, but that's dead wrong. The fast insertion mechanism depends on a duplicate insertion doing nothing. Ok, this turned out to be a much bigger change than I thought... In principle, it's not difficult to eliminate duplicates. However, to detect a duplicate, you have to check if the item is present in the uncompressed items array, or in the compressed lists. That requires decoding the segment where it should be. But if we decode the segment, what's the purpose of the uncompressed items array?
Its purpose was to speed up insertions, by buffering them so that we don't need to decode and re-encode the page/segment on every inserted item. But if we're paying the price of decoding it anyway, we might as well include the new item and re-encode the segment. The uncompressed area saves some effort in WAL-logging, as the record of inserting an entry into the uncompressed area is much smaller than that of re-encoding part of the page, but if that really is a concern, we could track more carefully which parts of the page are modified, and only WAL-log the required parts. And hopefully, the fast-update lists buffer inserts so that you insert many items at a time into the posting tree, and the price of re-encoding is only paid once. So, now that I think about this once more, I don't think the uncompressed area in every leaf page is a good idea.

I refactored the way the recompression routine works again. It is now more clearly a multi-step process. First, the existing page is disassembled into an in-memory struct, which is a linked list of all the segments. In memory, each segment can be represented as an array of item pointers, or in the compressed format. In the next phase, all the new items are added to the segments where they belong. This naturally can lead to overly large segments; in the third phase, the items are redistributed among the segments, and compressed back to the compressed format. Finally, the finished segments are written back to the page, or pages if it had to be split. The same subroutines to disassemble and recompress are also used in vacuum.

Attached is what I've got now. This is again quite heavily changed from the previous version - sorry for repeatedly rewriting this. I'll continue testing and re-reviewing this tomorrow.
- Heikki gin-packed-postinglists-20140121.patch.gz Description: GNU Zip compressed data -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
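[Editorial aside: to make the duplicate-insertion issue concrete, the invariant the fast-insert path relies on is that re-adding an item already present in the posting tree is a no-op. Below is a toy sketch of the "add new items to a decoded segment" phase, with plain ints standing in for item pointers; this is not the patch's actual code, just an illustration of a merge that discards duplicates:]

```c
/*
 * Toy sketch (not the actual GIN code): merge the sorted array "add" into
 * the sorted decoded segment "seg", discarding duplicates, so that
 * re-inserting an existing item is a no-op, as the fast-update path
 * requires.  Returns the new segment length; "out" must have room for
 * n + m items.
 */
static int
merge_dedup(const int *seg, int n, const int *add, int m, int *out)
{
	int		i = 0, j = 0, k = 0;

	while (i < n || j < m)
	{
		int		next;

		/* take the smaller head; prefer the segment on ties */
		if (j >= m || (i < n && seg[i] <= add[j]))
			next = seg[i++];
		else
			next = add[j++];

		if (k == 0 || out[k - 1] != next)	/* drop duplicates */
			out[k++] = next;
	}
	return k;
}
```

For example, merging new items {3, 4, 7, 8} into a decoded segment {1, 3, 5, 7} yields {1, 3, 4, 5, 7, 8}: the duplicate 3 and 7 are silently absorbed rather than raising the "cannot insert duplicate items" PANIC.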
Re: [HACKERS] ALTER SYSTEM SET typos and fix for temporary file name management
Robert Haas escribió: I don't think there's any real reason to define PG_AUTOCONF_FILENAME_TEMP. pg_stat_statements just writes PGSS_DUMP_FILE ".tmp" and that hasn't been a problem that I know of. I do wonder why ALTER SYSTEM SET is spelling the suffix "temp" instead of "tmp". I agree with Michael that having pg_basebackup be aware of the .temp suffix is ugly; for instance if we were to fix it to .tmp in ALTER SYSTEM but forgot to change pg_basebackup, the check would be immediately broken. But on the other hand I'm not sure why it's such a problem for pg_basebackup that it needs to be checking in the first place -- how about we rip that out? Perhaps the temp file needs to be in pgsql_tmp? (Do we need to worry about cross-filesystem links if we do? I mean, do we support mounting pgsql_tmp separately?) -- Álvaro Herrera http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Funny representation in pg_stat_statements.query.
Kyotaro HORIGUCHI horiguchi.kyot...@lab.ntt.co.jp writes: The fundamental cause of this issue is Const node which conveys the location of other than constant tokens. Any other nodes, for instance TypeCasts, are safe. So this is fixed by quite simple way like following getting rid of the referred difficulties of keeping the code sane and total loss of token locations. (white spaces are collapsed for readability) Committed with minor adjustments (improved comments, and hit one additional case you missed) regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] [9.3 bug] disk space in pg_xlog increases during archive recovery
On 01/21/2014 07:31 PM, Fujii Masao wrote: On Fri, Dec 20, 2013 at 9:21 PM, MauMau maumau...@gmail.com wrote: From: Fujii Masao masao.fu...@gmail.com ! if (source == XLOG_FROM_ARCHIVE && StandbyModeRequested) Even when standby_mode is not enabled, we can use cascade replication and it needs the accumulated WAL files. So I think that AllowCascadeReplication() should be added into this condition. ! snprintf(recoveryPath, MAXPGPATH, XLOGDIR "/RECOVERYXLOG"); ! XLogFilePath(xlogpath, ThisTimeLineID, endLogSegNo); ! ! if (restoredFromArchive) Don't we need to check !StandbyModeRequested and !AllowCascadeReplication() here? Oh, you are correct. Okay, done. Thanks! The patch looks good to me. Attached is the updated version of the patch. I added the comments. Sorry for reacting so slowly, but I'm not sure I like this patch. It's a quite useful property that all the WAL files that are needed for recovery are copied into pg_xlog, even when restoring from archive, even when not doing cascading replication. It guarantees that you can restart the standby, even if the connection to the archive is lost for some reason. I intentionally changed the behavior for archive recovery too, when it was introduced for cascading replication. Also, I think it's good that the behavior does not depend on whether cascading replication is enabled - it's a quite subtle difference. So, IMHO this is not a bug, it's a feature. To solve the original problem of running out of disk space in archive recovery, I wonder if we should perform restartpoints more aggressively. We intentionally don't trigger restartpoints by checkpoint_segments, only checkpoint_timeout, but I wonder if there should be an option for that. MauMau, did you try simply reducing checkpoint_timeout, while doing recovery? - Heikki -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Hard limit on WAL space used (because PANIC sucks)
Greg Stark st...@mit.edu writes: Fwiw I think all transactions lock up until space appears is *much* better than PANICing. Often disks fill up due to other transient storage or people may have options to manually increase the amount of space. it's much better if the database just continues to function after that rather than need to be restarted. Well, PANIC is certainly bad, but what I'm suggesting is that we just focus on getting that down to ERROR and not worry about trying to get out of the disk-shortage situation automatically. Nor do I believe that it's such a good idea to have the database freeze up until space appears rather than reporting errors. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Incorrectly reporting config errors
Robert Haas robertmh...@gmail.com writes: The only real argument for the message: LOG: configuration file /home/thom/Development/data/postgresql.conf contains errors; unaffected changes were applied ...is that somebody might think that the presence of a single error caused all the changes to be ignored. And there's a good reason they might think that: we used to do it that way. But on the flip side, if we emitted a LOG or WARNING message every time the user did something that works in a way incompatible with previous releases, we'd go insane. So I think the argument for just dumping that message altogether is actually pretty good. Hm. I don't recall what the arguments were for adding that message, but you have a point that it might have served its purpose by now. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] alternative back-end block formats
On Tue, Jan 21, 2014 at 06:43:54AM -0500, Christian Convey wrote: Hi all, I'm playing around with Postgres, and I thought it might be fun to experiment with alternative formats for relation blocks, to see if I can get smaller files and/or faster server performance. Does anyone know if this has been done before with Postgres? I would have assumed yes, but I'm not finding anything in Google about people having done this. Not that I know of. -- Bruce Momjian br...@momjian.ushttp://momjian.us EnterpriseDB http://enterprisedb.com + Everyone has their own god. + -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Hard limit on WAL space used (because PANIC sucks)
On Tue, Jan 21, 2014 at 9:35 AM, Tom Lane t...@sss.pgh.pa.us wrote: Simon Riggs si...@2ndquadrant.com writes: On 6 June 2013 16:00, Heikki Linnakangas hlinnakan...@vmware.com wrote: The current situation is that if you run out of disk space while writing WAL, you get a PANIC, and the server shuts down. That's awful. I don't see we need to prevent WAL insertions when the disk fills. We still have the whole of wal_buffers to use up. When that is full, we will prevent further WAL insertions because we will be holding the WALwritelock to clear more space. So the rest of the system will lock up nicely, like we want, apart from read-only transactions. I'm not sure that all writing transactions lock up hard is really so much better than the current behavior. My preference would be that we simply start failing writes with ERRORs rather than PANICs. I'm not real sure ATM why this has to be a PANIC condition. Probably the cause is that it's being done inside a critical section, but could we move that? My understanding is that if it runs out of buffer space while in an XLogInsert, it will be holding one or more buffer content locks exclusively, and unless it can complete the xlog (or scrounge up the info to return that buffer to its previous state), it can never release that lock. There might be other paths where it could get by with an ERROR, but if no one can write xlog anymore, all of those paths must quickly converge to the one that cannot simply ERROR. Cheers, Jeff
Re: [HACKERS] Backup throttling
I realize the following should be applied on top of v7:

index a0216c1..16dd939 100644
--- a/src/backend/replication/basebackup.c
+++ b/src/backend/replication/basebackup.c
@@ -1263,7 +1263,7 @@ throttle(size_t increment)
 	throttling_counter %= throttling_sample;

 	/* Once the (possible) sleep has ended, new period starts. */
-	if (wait_result | WL_TIMEOUT)
+	if (wait_result & WL_TIMEOUT)
 		throttled_last += elapsed + sleep;
 	else if (sleep > 0)
 		/* Sleep was necessary but might have been interrupted. */

// Tony

On 01/20/2014 05:10 PM, Antonin Houska wrote: On 01/15/2014 10:52 PM, Alvaro Herrera wrote: I gave this patch a look. There was a bug that the final bounds check for int32 range was not done when there was no suffix, so in effect you could pass numbers larger than UINT_MAX and pg_basebackup would not complain until the number reached the server via BASE_BACKUP. Maybe that's fine, given that numbers above 1G will cause a failure on the server side anyway, but it looked like a bug to me. I tweaked the parse routine slightly; other than fixing the bug, I made it accept fractional numbers, so you can say 0.5M for instance. Thanks. Perhaps we should make sure pg_basebackup rejects numbers larger than 1G as well. Is there a good place to define the constant, so that both backend and client can use it? I'd say 'include/common' but no existing file seems to be appropriate. I'm not sure if it's worth adding a new file there. Another thing I found a bit strange was the use of the latch. What this patch does is create a separate latch which is used for the throttling. This means that if the walsender process receives a signal, it will not wake up if it's sleeping in throttling. Perhaps this is okay: as Andres was quoted upthread as saying, maybe this is not a problem because the sleep times are typically short anyway. But we're pretty much used to the idea that whenever a signal is sent, processes act on it *immediately*.
Maybe some admin will not feel comfortable about waiting some extra 20ms when they cancel their base backups. In any case, having a secondary latch to sleep on in a process seems weird. Maybe this should be using MyWalSnd->latch somehow. o.k., MyWalSnd->latch is used now. You have this interesting THROTTLING_MAX_FREQUENCY constant defined to 128, with the comment "check this many times per second". Let's see: if the user requests 1MB/s, this value results in throttling_sample = 1MB / 128 = 8192. So for every 8kB transferred, we would stop, check the wall clock time, and if less time has elapsed than we were supposed to spend transferring those 8kB then we sleep. Isn't a check every 8kB a bit too frequent? This doesn't seem sensible to me. I think we should be checking perhaps every tenth of the requested maximum rate, or something like that, not every 1/128th. Now, what the code actually does is not necessarily that, because the sampling value is clamped to a minimum of 32 kB. But then if we're going to do that, why use such a large divisor value in the first place? I propose we set that constant to a smaller value such as 8. I tried to use THROTTLING_SAMPLE_MIN and THROTTLING_MAX_FREQUENCY to control both the minimum and maximum chunk size. It was probably too generic, THROTTLING_SAMPLE_MIN is no longer there. New patch version is attached. // Antonin Houska (Tony) -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
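[Editorial aside: the arithmetic being debated can be sketched as follows. Constant and function names here are illustrative, not the patch's actual identifiers. With Alvaro's proposed divisor of 8, a 1 MB/s limit yields 128 kB between checks; the original divisor of 128 would give 8 kB, which the 32 kB minimum would clamp up anyway:]

```c
#define THROTTLING_FREQUENCY	8		/* bookkeeping checks per second */
#define THROTTLING_SAMPLE_MIN	32768	/* never check more often than every 32 kB */

/*
 * Illustrative helper (not the patch's code): how many bytes to transfer
 * between throttling checks, given the requested maximum rate.
 */
static unsigned int
throttling_sample_size(unsigned int max_rate)	/* in bytes per second */
{
	unsigned int sample = max_rate / THROTTLING_FREQUENCY;

	return (sample < THROTTLING_SAMPLE_MIN) ? THROTTLING_SAMPLE_MIN : sample;
}
```

At 1 MB/s this gives 1048576 / 8 = 131072 bytes per check; at 100 kB/s the raw sample (12800 bytes) falls below the floor and is clamped to 32 kB, so the clamp, not the divisor, governs low rates.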
[HACKERS] yum psycopg2 doc package not signed
I was updating the packages from one of my servers and I got this message: Package python-psycopg2-doc-2.5.2-1.f19.x86_64.rpm is not signed If I remove the package (I thought it might be that package alone) I get errors from other packages: Package python-psycopg2-2.5.2-1.f19.x86_64.rpm is not signed Something wrong with the packages? -- Martín Marqués http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Re: [BUGS] BUG #7873: pg_restore --clean tries to drop tables that don't exist
I have been mulling over this patch, and I can't seem to come to terms with it. I first started making it look nicer here and there, thinking it was all mostly okay, but eventually arrived at the idea that it seems wrong to do what this does: basically, get_object_address() tries to obtain an object address, and if that fails, return InvalidOid; then, in RemoveObjects, we try rather hard to figure out why that failed, and construct an error message. It seems to me that it would make more sense to have get_object_address somehow return a code indicating what failed; then we don't have to go all over through the parser code once more. Perhaps, for example, when missing_ok is given as true to get_object_address it also needs to get a pointer to ObjectType and a string; if some object does not exist then fill the ObjectType with the failing object and the string with the failing name. Then RemoveObjects can construct a string more easily. Not sure how workable this exact idea is; maybe there is a better way. -- Álvaro Herrera http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
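[Editorial aside: the API shape Alvaro proposes can be sketched with a toy model. Everything below is hypothetical -- the real get_object_address, ObjectType, and Oid machinery differ -- but it shows the pattern: on failure, the lookup routine reports *which* part of the name was missing, so the caller can build the error message without re-parsing:]

```c
#include <stdio.h>
#include <string.h>

/* Toy stand-ins for ObjectType and the "what failed" report. */
typedef enum { OBJ_NONE, OBJ_SCHEMA, OBJ_TABLE } MissingType;

typedef struct
{
	MissingType	missing_type;		/* what kind of object was not found */
	char		missing_name[64];	/* its name, for the error message */
} LookupFailure;

/*
 * Pretend lookup: only schema "public" and table "public.t" exist.
 * Returns -1 (the toy "InvalidOid") on failure and fills *fail.
 */
static int
get_object_address_toy(const char *schema, const char *table,
					   LookupFailure *fail)
{
	if (strcmp(schema, "public") != 0)
	{
		fail->missing_type = OBJ_SCHEMA;
		snprintf(fail->missing_name, sizeof(fail->missing_name), "%s", schema);
		return -1;
	}
	if (strcmp(table, "t") != 0)
	{
		fail->missing_type = OBJ_TABLE;
		snprintf(fail->missing_name, sizeof(fail->missing_name), "%s", table);
		return -1;
	}
	fail->missing_type = OBJ_NONE;
	return 42;						/* some valid OID */
}
```

With this shape, a caller in the RemoveObjects role can distinguish "schema does not exist" from "table does not exist" directly from the failure report, instead of probing the catalogs a second time to reconstruct why the lookup returned nothing.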
Re: [HACKERS] Incorrectly reporting config errors
On 01/21/2014 12:29 PM, Robert Haas wrote: On Tue, Jan 21, 2014 at 1:59 PM, Thom Brown t...@linux.com wrote: On 21 January 2014 18:35, Tom Lane t...@sss.pgh.pa.us wrote: Thom Brown t...@linux.com writes: I'm getting a report of a config error when changing a config value that requires a restart: ... 2014-01-21 18:14:53 GMT [28718]: [4-1] user=,db=,client= LOG: received SIGHUP, reloading configuration files 2014-01-21 18:14:53 GMT [28718]: [5-1] user=,db=,client= LOG: parameter max_connections cannot be changed without restarting the server 2014-01-21 18:14:53 GMT [28718]: [6-1] user=,db=,client= LOG: configuration file /home/thom/Development/data/postgresql.conf contains errors; unaffected changes were applied It doesn't contain errors. Yeah it does: it's got a value that can't be applied. I think you're making a semantic quibble. I see it as technically wrong. There's nothing wrong with my config file. A reload of the file may not be able to apply all the settings, but there's no typo or mistake anywhere in my file. I would just need to restart instead of reload. However, given that you find it unsurprising, I'll leave it there. I kind of agree with Thom. I understand why it's doing what it's doing, but it still seems sort of lame. Though I am not sure why it is lame when it seems to be following protocol; announce the problem, then tell where it originates. Seems like useful information to me. 
postgres-2014-01-21 14:39:54.738 PST-0 ERROR: parameter max_connections cannot be changed without restarting the server
postgres-2014-01-21 14:39:54.738 PST-0 STATEMENT: SET max_connections=99;
-2014-01-21 14:42:23.166 PST-0 LOG: received SIGHUP, reloading configuration files
-2014-01-21 14:42:23.168 PST-0 LOG: parameter max_connections cannot be changed without restarting the server
-2014-01-21 14:42:23.169 PST-0 LOG: configuration file /usr/local/pgsql93/data/postgresql.conf contains errors; unaffected changes were applied

-- Adrian Klaver adrian.kla...@gmail.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Incorrectly reporting config errors
On 2014-01-21 16:13:11 -0500, Robert Haas wrote: On Tue, Jan 21, 2014 at 3:37 PM, Tom Lane t...@sss.pgh.pa.us wrote: Robert Haas robertmh...@gmail.com writes: I kind of agree with Thom. I understand why it's doing what it's doing, but it still seems sort of lame. Well, the point of the message is to report that we failed to apply all the settings requested by the file. If you prefer some wording squishier than error, we could bikeshed the message phrasing. But I don't think we should suppress the message entirely; nor does it seem worthwhile to try to track whether all the failures were of precisely this type. Well, to me the whole thing smacks of: LOG: there is a problem LOG: please be aware that we logged a message about a problem The only real argument for the message: LOG: configuration file /home/thom/Development/data/postgresql.conf contains errors; unaffected changes were applied ...is that somebody might think that the presence of a single error caused all the changes to be ignored. And there's a good reason they might think that: we used to do it that way. But on the flip side, if we emitted a LOG or WARNING message every time the user did something that works in a way incompatible with previous releases, we'd go insane. So I think the argument for just dumping that message altogether is actually pretty good. I don't find that argument all that convincing. The expectation that a config file isn't applied if it contains errors is a generally reasonable one, not just because we used to do it that way. I also don't see the disadvantage of the current behaviour of reporting that we didn't apply the change. Additionally it makes it easier to look for such errors, by having a relatively easy to remember message to search for in the log. What I think is that we're reporting such problems way too often. The usual cause I know of is something like: shared_buffers = 4GB ... 
shared_buffers = 6GB When reloading the config we'll report an error in the line setting it to 4GB because shared_buffers is PGC_POSTMASTER leading to a spurious contains errors message. Rather annoying. Now, I am sure somebody will argue along the lines of well, don't do that then, but I don't see much merit in that argument. A rather common and sensible configuration is to have a common configuration file used across servers, which then is overwritten by a per-server or per-cluster config file containing values specific to a server/cluster. Greetings, Andres Freund -- Andres Freund http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] yum psycopg2 doc package not signed
Hi, On Tue, 2014-01-21 at 20:19 -0200, Martín Marqués wrote: I was updating the packages from one of my servers and I got this message: Package python-psycopg2-doc-2.5.2-1.f19.x86_64.rpm is not signed If I remove the package (I thought it might be that package alone) I get errors from other packages: Package python-psycopg2-2.5.2-1.f19.x86_64.rpm is not signed Something wrong with the packages? I thought I fixed that -- but apparently not. Please run yum clean metadata and try updating the package again. Regards, -- Devrim GÜNDÜZ Principal Systems Engineer @ EnterpriseDB: http://www.enterprisedb.com PostgreSQL Danışmanı/Consultant, Red Hat Certified Engineer Twitter: @DevrimGunduz , @DevrimGunduzTR signature.asc Description: This is a digitally signed message part
Re: [HACKERS] new json funcs
Hi Andrew, On 1/18/14, 10:05 PM, I wrote: But I'll continue with my review now that this has been sorted out. Sorry about the delay. I think the API for the new functions looks good. They are all welcome additions to the JSON family. The implementation side looks reasonable to me. I'm not sure there's need to duplicate so much code, though. E.g. json_to_recordset is almost identical to json_populate_recordset, and json_to_record has a bit of the same disease. Finally, (as I'm sure you know already), docs are still missing. Marking the patch Waiting on Author for the time being. Regards, Marko Tiikkaja -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Hard limit on WAL space used (because PANIC sucks)
Jeff Janes jeff.ja...@gmail.com writes: On Tue, Jan 21, 2014 at 9:35 AM, Tom Lane t...@sss.pgh.pa.us wrote: My preference would be that we simply start failing writes with ERRORs rather than PANICs. I'm not real sure ATM why this has to be a PANIC condition. Probably the cause is that it's being done inside a critical section, but could we move that? My understanding is that if it runs out of buffer space while in an XLogInsert, it will be holding one or more buffer content locks exclusively, and unless it can complete the xlog (or scrounge up the info to return that buffer to its previous state), it can never release that lock. There might be other paths were it could get by with an ERROR, but if no one can write xlog anymore, all of those paths must quickly converge to the one that cannot simply ERROR. Well, the point is we'd have to somehow push detection of the problem to a point before the critical section that does the buffer changes and WAL insertion. The first idea that comes to mind is (1) estimate the XLOG space needed (an overestimate is fine here); (2) just before entering the critical section, call some function to reserve that space, such that we always have at least sum(outstanding reservations) available future WAL space; (3) release our reservation as part of the actual XLogInsert call. The problem here is that the reserve function would presumably need an exclusive lock, and would be about as much of a hot spot as XLogInsert itself is. Plus we'd be paying a lot of extra cycles to solve a corner case problem that, with all due respect, comes up pretty darn seldom. So probably we need a better idea than that. Maybe we could get some mileage out of the fact that very approximate techniques would be good enough. For instance, I doubt anyone would bleat if the system insisted on having 10MB or even 100MB of future WAL space always available. But I'm not sure exactly how to make use of that flexibility. 
regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
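Tom's three-step reservation scheme might look something like this in outline. All names here are hypothetical, invented for illustration; none of this is the actual xlog.c API:

```c
/*
 * Sketch of the reservation idea: the caller overestimates the WAL space a
 * record will need, reserves it just before entering its critical section
 * (where a failure can still be a plain ERROR rather than a PANIC), and
 * releases the reservation once XLogInsert() has written the record.
 */
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

static uint64_t future_wal_space = 100u * 1024 * 1024;  /* known-free bytes */
static uint64_t outstanding_reservations = 0;

/*
 * Called just before the critical section; a 'false' return lets the
 * caller fail with ERROR instead of PANICking later.
 */
static bool
ReserveWalSpace(uint64_t estimate)
{
    /* The real version would need an exclusive lock here -- exactly the
     * hot spot the message above worries about. */
    if (outstanding_reservations + estimate > future_wal_space)
        return false;
    outstanding_reservations += estimate;
    return true;
}

/*
 * Called as part of XLogInsert(): drop the reservation and account for
 * the space actually consumed.
 */
static void
ReleaseWalReservation(uint64_t estimate, uint64_t actual)
{
    assert(actual <= estimate);   /* overestimates are fine, by design */
    outstanding_reservations -= estimate;
    future_wal_space -= actual;
}
```

The sketch makes the cost visible: every insert pays for two updates to shared counters under a lock, which is why Tom immediately flags the reserve function as a likely contention point.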
Re: [HACKERS] Hard limit on WAL space used (because PANIC sucks)
On Tue, Jan 21, 2014 at 3:24 PM, Tom Lane t...@sss.pgh.pa.us wrote:
> Maybe we could get some mileage out of the fact that very approximate techniques would be good enough.  For instance, I doubt anyone would bleat if the system insisted on having 10MB or even 100MB of future WAL space always available.  But I'm not sure exactly how to make use of that flexibility.

In the past I've thought that one approach that would eliminate concerns about portably and efficiently knowing how much space is left on the pg_xlog filesystem is to have a ballast file. Under this scheme, perhaps XLogInsert() could differentiate between a soft and a hard failure. Hopefully the reserve function you mentioned, still called just before each critical section, thereby becomes cheap. Perhaps I'm just restating what you said, though.

-- 
Peter Geoghegan
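The ballast-file scheme could be sketched roughly as follows. The path, size, and function names are invented for illustration; real pg_xlog handling would be considerably more involved. While the ballast file exists, that much slack is known to be available; dropping it converts an out-of-space condition into a "soft" failure that lets in-flight records complete:

```c
/*
 * Minimal ballast-file sketch: a preallocated dummy file occupies real
 * blocks on the pg_xlog filesystem.  When the filesystem fills up, the
 * file is unlinked so the current record can still be written (a soft
 * failure, reported as ERROR); only a failure after that is hard.
 */
#include <assert.h>
#include <fcntl.h>
#include <unistd.h>

/* Create (or recreate) the ballast file with real blocks behind it. */
static int
create_ballast(const char *path, off_t size)
{
    int fd = open(path, O_CREAT | O_WRONLY, 0600);
    if (fd < 0)
        return -1;
    /* posix_fallocate actually reserves blocks, unlike ftruncate,
     * which would just create a sparse file. */
    int rc = posix_fallocate(fd, 0, size);
    close(fd);
    return rc == 0 ? 0 : -1;
}

/*
 * Soft failure: give the space back to the filesystem so the current
 * XLogInsert() can finish, then start rejecting new writes with ERROR.
 */
static int
release_ballast(const char *path)
{
    return unlink(path);
}
```

The appeal is that "is there slack left?" reduces to "does the ballast file still exist?", a check that is cheap and portable, which is presumably why Peter hopes the reserve function becomes inexpensive under this scheme.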
Re: [HACKERS] Hard limit on WAL space used (because PANIC sucks)
On 2014-01-21 18:24:39 -0500, Tom Lane wrote:
> Jeff Janes jeff.ja...@gmail.com writes:
>> On Tue, Jan 21, 2014 at 9:35 AM, Tom Lane t...@sss.pgh.pa.us wrote:
>>> My preference would be that we simply start failing writes with ERRORs rather than PANICs.  I'm not real sure ATM why this has to be a PANIC condition.  Probably the cause is that it's being done inside a critical section, but could we move that?

>> My understanding is that if it runs out of buffer space while in an XLogInsert, it will be holding one or more buffer content locks exclusively, and unless it can complete the xlog (or scrounge up the info to return that buffer to its previous state), it can never release that lock.  There might be other paths where it could get by with an ERROR, but if no one can write xlog anymore, all of those paths must quickly converge to the one that cannot simply ERROR.

> Well, the point is we'd have to somehow push detection of the problem to a point before the critical section that does the buffer changes and WAL insertion.

Well, I think that's already hard for the heapam.c stuff, but doing that in xact.c seems fracking hairy. We can't simply stop in the middle of a commit and not continue; that'd often grind the system to a halt, preventing cleanup. Additionally, the size of the inserted record for commits is essentially unbounded, which makes it an especially fun case.

> The first idea that comes to mind is (1) estimate the XLOG space needed (an overestimate is fine here); (2) just before entering the critical section, call some function to reserve that space, such that we always have at least sum(outstanding reservations) available future WAL space; (3) release our reservation as part of the actual XLogInsert call.

I think that's not necessarily enough. In a COW filesystem like btrfs or ZFS you really cannot give many guarantees about writes succeeding or failing, even if we were able to create (and zero) a new segment.

Even if you disregard that, we'd need to keep up with lots of concurrent reservations, looking a fair bit into the future. E.g. during a smart shutdown in a workload with lots of subtransactions, trying to reserve space might actually make the situation worse, because we might end up trying to reserve the combined size of records.

> The problem here is that the reserve function would presumably need an exclusive lock, and would be about as much of a hot spot as XLogInsert itself is.  Plus we'd be paying a lot of extra cycles to solve a corner case problem that, with all due respect, comes up pretty darn seldom.  So probably we need a better idea than that.

Yea, I don't think anything really safe is going to work without significant penalties.

> Maybe we could get some mileage out of the fact that very approximate techniques would be good enough.  For instance, I doubt anyone would bleat if the system insisted on having 10MB or even 100MB of future WAL space always available.  But I'm not sure exactly how to make use of that flexibility.

If we'd be more aggressive with preallocating WAL files and doing so in the WAL writer, we could stop accepting writes in some common codepaths (e.g. nodeModifyTable.c) as soon as preallocating failed, but continue to accept writes in other locations (e.g. TRUNCATE, DROP TABLE). That'd still fail if you write a *very* large commit record using up all the reserve, though...

I personally think this isn't worth complicating the code for.

Greetings,

Andres Freund

-- 
Andres Freund	http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training Services
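Andres's idea of refusing ordinary writes once background preallocation fails, while still accepting space-reclaiming records, might be gated like this. The names are purely illustrative, not PostgreSQL's actual code:

```c
/*
 * Sketch of "fail common paths, keep cleanup paths working": the WAL
 * writer preallocates segments in the background; once that stops
 * succeeding, ordinary inserts (nodeModifyTable.c and friends) start
 * failing with ERROR, while records needed to reclaim space (TRUNCATE,
 * DROP TABLE, aborts) are still accepted out of the reserve.
 */
#include <assert.h>
#include <stdbool.h>

typedef enum WalRecordClass
{
    WAL_RECORD_NORMAL,      /* ordinary DML: may be refused early */
    WAL_RECORD_RECLAIM      /* TRUNCATE, DROP TABLE, abort records */
} WalRecordClass;

/* Updated by the WAL writer each time it tries to preallocate a segment. */
static bool wal_prealloc_ok = true;

static bool
WalInsertAllowed(WalRecordClass rclass)
{
    /* Space-reclaiming records are always allowed: refusing them would
     * prevent the very cleanup that frees pg_xlog space. */
    return wal_prealloc_ok || rclass == WAL_RECORD_RECLAIM;
}
```

As the message notes, this gate does nothing for a single very large commit record that exhausts the remaining reserve on its own.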
Re: [HACKERS] new json funcs
On 01/21/2014 06:21 PM, Marko Tiikkaja wrote:
> Hi Andrew,
>
> On 1/18/14, 10:05 PM, I wrote:
>> But I'll continue with my review now that this has been sorted out.
>
> Sorry about the delay.  I think the API for the new functions looks good.  They are all welcome additions to the JSON family.  The implementation side looks reasonable to me.  I'm not sure there's need to duplicate so much code, though.  E.g. json_to_recordset is almost identical to json_populate_recordset, and json_to_record has a bit of the same disease.

I can probably factor some of that out. Of course, when it was an extension there wasn't the possibility.

> Finally, (as I'm sure you know already), docs are still missing.  Marking the patch Waiting on Author for the time being.

Yes, I have a draft, just waiting for time to go through it.

Thanks for the review.

cheers

andrew
Re: [HACKERS] proposal: hide application_name from other users
On Tue, Jan 21, 2014 at 12:31 AM, Craig Ringer cr...@2ndquadrant.com wrote:
> On 01/21/2014 04:19 PM, Heikki Linnakangas wrote:
>> On 01/21/2014 07:22 AM, Harold Giménez wrote:
>>> First of all, I apologize for submitting a patch and missing the commitfest deadline. Given the size of the patch, I thought I'd submit it for your consideration regardless.
>>>
>>> This patch prevents non-superusers from viewing other user's pg_stat_activity.application_name. This topic was discussed some time ago [1] and consequently application_name was made world readable [2]. I would like to propose that we hide it instead by reverting to the original behavior.
>>>
>>> There is a very large number of databases on the same cluster shared across different users who can easily view each other's application_name values. Along with that, there are some libraries that default application_name to the name of the running process [3], which can leak information about what web servers applications are running, queue systems, etc. Furthermore leaking application names in a multi-tenant environment is more information than an attacker should have access to on services like Heroku and other similar providers.
>>
>> I don't find these arguments compelling to change it now. It's well-documented that application_name is visible to everyone. Just don't put sensitive information there.
>>
>> For those users that don't mind advertising application_name, the patch would be highly inconvenient. For example, the database owner could no longer see the application_name of other users connected to her database.
>
> It also means that monitoring tools must run as superuser to see information they require, which to me is a total showstopper.

Well, the fact is that if you don't run monitoring tools as superuser, there may not be enough connection slots available anyways, in cases where actual usage is consuming all of max_connections, and only the reserved slots are available. So in a way it's already unreliable to run monitoring as non-superuser, unfortunately.

> If you want control over visibility of application_name, it should be done with a column privilege granted to a system role, or something like that - so the ability to see it can be given to public by default (thus not breaking BC) and if it's revoked from public, given to roles that need to see it.

Something along these lines sounds like it would solve the problem.
Re: [HACKERS] Hard limit on WAL space used (because PANIC sucks)
Andres Freund and...@2ndquadrant.com writes:
> On 2014-01-21 18:24:39 -0500, Tom Lane wrote:
>> Maybe we could get some mileage out of the fact that very approximate techniques would be good enough.  For instance, I doubt anyone would bleat if the system insisted on having 10MB or even 100MB of future WAL space always available.  But I'm not sure exactly how to make use of that flexibility.

> If we'd be more aggressive with preallocating WAL files and doing so in the WAL writer, we could stop accepting writes in some common codepaths (e.g. nodeModifyTable.c) as soon as preallocating failed but continue to accept writes in other locations (e.g. TRUNCATE, DROP TABLE). That'd still fail if you write a *very* large commit record using up all the reserve though...

> I personally think this isn't worth complicating the code for.

I too have got doubts about whether a completely bulletproof solution is practical. (And as you say, even if our internal logic were bulletproof, a COW filesystem defeats all guarantees in this area anyway.) But perhaps a 99% solution would be a useful compromise.

Another thing to think about is whether we couldn't put a hard limit on WAL record size somehow. Multi-megabyte WAL records are an abuse of the design anyway, when you get right down to it. So for example maybe we could split up commit records, with most of the bulky information dumped into separate records that appear before the real commit. This would complicate replay --- in particular, if we abort the transaction after writing a few such records, how does the replayer realize that it can forget about those records? But that sounds probably surmountable.

			regards, tom lane
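The replay-side bookkeeping that splitting commit records would require could be sketched along these lines. This is a toy model with invented names; real recovery code would key this off actual XIDs, record types, and the end-of-recovery state:

```c
/*
 * Sketch of split-commit replay: bulky "pre-commit" records are buffered
 * per transaction; the real commit record applies them, while an abort
 * (or reaching end of recovery without a commit) simply discards them.
 */
#include <assert.h>
#include <stddef.h>

#define MAX_XACTS 16

typedef struct PendingCommitData
{
    unsigned int xid;       /* 0 = slot unused */
    size_t       nrecords;  /* bulky records buffered so far */
} PendingCommitData;

static PendingCommitData pending[MAX_XACTS];

static PendingCommitData *
lookup_slot(unsigned int xid, int create)
{
    for (int i = 0; i < MAX_XACTS; i++)
        if (pending[i].xid == xid)
            return &pending[i];
    if (create)
        for (int i = 0; i < MAX_XACTS; i++)
            if (pending[i].xid == 0)
            {
                pending[i].xid = xid;
                pending[i].nrecords = 0;
                return &pending[i];
            }
    return NULL;
}

/* A bulky pre-commit record: just remember it for now. */
static void
replay_precommit(unsigned int xid)
{
    lookup_slot(xid, 1)->nrecords++;
}

/* The real commit record: apply everything buffered, free the slot.
 * Returns how many buffered records were applied. */
static size_t
replay_commit(unsigned int xid)
{
    PendingCommitData *slot = lookup_slot(xid, 0);
    size_t n = slot ? slot->nrecords : 0;
    if (slot)
        slot->xid = 0;
    return n;
}

/* Abort: forget everything buffered for this transaction. */
static void
replay_abort(unsigned int xid)
{
    PendingCommitData *slot = lookup_slot(xid, 0);
    if (slot)
        slot->xid = 0;
}
```

The open question Tom raises maps onto this model directly: an aborted transaction may never emit an explicit abort record, so the replayer also needs some rule (e.g. discard all pending slots at end of recovery) for reclaiming buffered data.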
Re: [HACKERS] proposal: hide application_name from other users
On Tue, Jan 21, 2014 at 03:57:37PM -0800, Harold Giménez wrote:
>> It also means that monitoring tools must run as superuser to see information they require, which to me is a total showstopper.
>
> Well, the fact is that if you don't run monitoring tools as superuser, there may not be enough connection slots available anyways, in cases where actual usage is consuming all of max_connections, and only the reserved slots are available. So in a way it's already unreliable to run monitoring as non-superuser unfortunately.

You might need to run as superuser in these cases, but it is hard to see why you would need to do that in the normal case.

-- 
Bruce Momjian  br...@momjian.us  http://momjian.us
EnterpriseDB   http://enterprisedb.com

+ Everyone has their own god. +