Re: [HACKERS] TABLESAMPLE patch
On Tue, Feb 17, 2015 at 3:29 AM, Petr Jelinek wrote:
> I didn't add the whole page visibility caching, as the tuple ids we get from
> sampling methods don't map well to the visibility info we get from
> heapgetpage (it maps to the values in the rs_vistuples array, not to its
> indexes). Commented about it in code also.

I think we should set pagemode for system sampling as it can have a dual
benefit: one is that it will allow us to cache tuples, and the other is that
it can allow pruning of the page, which is done in heapgetpage(). Do you see
any downside to it?

A few other comments:

1. The current patch fails to apply; a minor change is required:
patching file `src/backend/utils/misc/Makefile'
Hunk #1 FAILED at 15.
1 out of 1 hunk FAILED -- saving rejects to src/backend/utils/misc/Makefile.rej

2. A few warnings in code (compiled on Windows (msvc)):
1>src\backend\utils\tablesample\bernoulli.c(217): warning C4305: '=' : truncation from 'double' to 'float4'
1>src\backend\utils\tablesample\system.c(177): warning C4305: '=' : truncation from 'double' to 'float4'

3.
+static void
+InitScanRelation(SampleScanState *node, EState *estate, int eflags,
+                 TableSampleClause *tablesample)
{
..
+   /*
+    * Page at a time mode is useless for us as we need to check visibility
+    * of tuples individually because tuple offsets returned by sampling
+    * methods map to rs_vistuples values and not to its indexes.
+    */
+   node->ss.ss_currentScanDesc->rs_pageatatime = false;
..
}

Modifying the scan descriptor in nodeSamplescan.c looks slightly odd. Do we
modify it this way at any other place? I think it is better either to teach
the heap_beginscan_* APIs about it or to expose it via a new API in heapam.c.

4.
+Datum
+tsm_system_cost(PG_FUNCTION_ARGS)
{
..
+   *tuples = path->rows * samplesize;
}

The above calculation assumes all rows in the table are of equal size, and
hence multiplying directly by samplesize gives the estimated number of rows
for the sample method; however, if row size varies then this calculation
might not give the right results. I think if possible we should consider the
possibility of rows with varying sizes; otherwise we can at least write a
comment to indicate the possibility of improvement in the future.

5. gram.y

-   /*
    * Given "UPDATE foo set set ...", we have to decide without looking any

An unrelated line was removed.

6.
@@ -13577,6 +13608,7 @@ reserved_keyword:
            | PLACING
            | PRIMARY
            | REFERENCES
+           | REPEATABLE

Have you tried to investigate the reason why it gives a shift/reduce error
for the unreserved category? If we are not able to resolve that, then at
least we can try to put it in some less restrictive category. I tried (on
Windows) by putting it in type_func_name_keyword and it seems to be working.
To investigate, you can try the information at the link below:
http://www.gnu.org/software/bison/manual/html_node/Understanding.html

Basically, I think we should try some more before concluding that we have to
change the category of REPEATABLE, especially if we want to make it a
RESERVED keyword.

On a related note, it seems you agreed upthread with Kyotaro-san on the
change below, but I don't see it in the latest patch:

diff --git a/src/backend/parser/gram.y b/src/backend/parser/gram.y
index 4578b5e..8cf09d5 100644
--- a/src/backend/parser/gram.y
+++ b/src/backend/parser/gram.y
@@ -10590,7 +10590,7 @@ tablesample_clause:
        ;

 opt_repeatable_clause:
-           REPEATABLE '(' AexprConst ')'   { $$ = (Node *) $3; }
+           REPEATABLE '(' a_expr ')'       { $$ = (Node *) $3; }
            | /*EMPTY*/                     { $$ = NULL; }

With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com
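For readers following the grammar discussion: the practical difference
between AexprConst and a_expr is what REPEATABLE will accept as a seed. A
hypothetical illustration, assuming the TABLESAMPLE syntax from this patch
(the table name is invented):

SELECT * FROM test_tbl TABLESAMPLE SYSTEM (10) REPEATABLE (42);      -- a literal constant parses either way
SELECT * FROM test_tbl TABLESAMPLE SYSTEM (10) REPEATABLE (21 * 2);  -- an expression seed requires a_expr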
Re: [HACKERS] WALWriter active during recovery
On Thu, Dec 18, 2014 at 6:43 PM, Fujii Masao wrote:
> On Tue, Dec 16, 2014 at 3:51 AM, Simon Riggs wrote:
>> Currently, WALReceiver writes and fsyncs data it receives. Clearly,
>> while we are waiting for an fsync we aren't doing any other useful
>> work.
>>
>> Following patch starts WALWriter during recovery and makes it
>> responsible for fsyncing data, allowing WALReceiver to progress other
>> useful actions.

With the patch, replication didn't work fine on my machine. I started the
standby server after removing all the WAL files from the standby. ISTM that
the patch doesn't handle that case. That is, in that case, the standby tries
to start up walreceiver and replication to retrieve the REDO-starting
checkpoint record *before* starting up walwriter (IOW, before reaching the
consistent point). Then, since walreceiver works without walwriter, no
received WAL data can be fsync'd in the standby, so replication cannot
advance any further. I think that walwriter needs to start before walreceiver
starts.

I just marked this patch as Waiting on Author.

Regards,

--
Fujii Masao
Re: [HACKERS] Table-level log_autovacuum_min_duration
On Thu, Feb 19, 2015 at 3:32 PM, Michael Paquier wrote:
> On Thu, Feb 19, 2015 at 2:13 PM, Naoya Anzai wrote:
>> As a result, I think you should not delete VACOPT_VERBOSE.
>
> In v8 it is not deleted. It is still declared, and its use is isolated
> in gram.y, similarly to VACOPT_FREEZE.
>
>> According to the last mail I have posted, the difference between the
>> manual-vacuum log and the auto-vacuum log clearly exists.
>
> Did you test the latest patch, v8? I have added checks in it to see if
> a process is an autovacuum worker to control elevel, and the extra logs
> of v7 do not show up.
>
>> So, at least you should not touch the mechanism of VACOPT_VERBOSE
>> until both vacuum log formats are unified to the same format.
>
> If you mean that we should have the same kind of log output for
> autovacuum and manual vacuum, I think that this is not going to
> happen. Autovacuum entries are kept less verbose on purpose, a contract
> that v7 clearly broke.
>
>> If you agree with my thinking, please undo your removal of VACOPT_VERBOSE.
>
> Well, I don't agree :) And I am guessing that you did not look at v8
> either. Centralizing the control of logs using log_min_duration is
> more extensible than simply having VACOPT_VERBOSE.

With the patch, VACUUM ANALYZE VERBOSE doesn't emit any verbose message.
Why did you remove that functionality?

Regards,

--
Fujii Masao
Re: [HACKERS] Optimization for updating foreign tables in Postgres FDW
On 2015/03/04 17:07, Etsuro Fujita wrote:
> On 2015/03/04 16:58, Albe Laurenz wrote:
>> Etsuro Fujita wrote:
>>> While updating the patch, I noticed that in the previous patch, there
>>> is a bug in pushing down parameterized UPDATE/DELETE queries; generic
>>> plans for such queries fail with a can't-happen error. I fixed the bug
>>> and tried to add regression tests that execute the generic plans, but
>>> I couldn't, because I can't figure out how to force generic plans. Is
>>> there any way to do that?
>>
>> I don't know about a way to force it, but if you run the statement six
>> times, it will probably switch to a generic plan.
>
> Yeah, I was just thinking of running the statement six times in the
> regression tests ...

Here is an updated version. In this version, the bug has been fixed, but
regression tests for it haven't been added, because I'm not sure that the
above way is a good idea and I don't have any other ideas. The EXPLAIN output
has also been improved as discussed in [1].

On top of this, I'll try to extend the join push-down patch to support a
pushed-down update on a join, though I'm still digesting the series of
patches.

Comments welcome.

Best regards,
Etsuro Fujita

[1] http://www.postgresql.org/message-id/31942.1410534...@sss.pgh.pa.us

*** a/contrib/postgres_fdw/deparse.c
--- b/contrib/postgres_fdw/deparse.c
***************
*** 189,198 ****
  is_foreign_expr(PlannerInfo *root,
  	if (!foreign_expr_walker((Node *) expr, &glob_cxt, &loc_cxt))
  		return false;

- 	/* Expressions examined here should be boolean, ie noncollatable */
- 	Assert(loc_cxt.collation == InvalidOid);
- 	Assert(loc_cxt.state == FDW_COLLATE_NONE);
-
  	/*
  	 * An expression which includes any mutable functions can't be sent over
  	 * because its result is not stable.  For example, sending now() remote
--- 189,194 ----
***************
*** 788,794 ****
  deparseTargetList(StringInfo buf,
   *
   * If params is not NULL, it receives a list of Params and other-relation Vars
   * used in the clauses; these values must be transmitted to the remote server
!  * as parameter values.
   *
   * If params is NULL, we're generating the query for EXPLAIN purposes,
   * so Params and other-relation Vars should be replaced by dummy values.
--- 784,791 ----
   *
   * If params is not NULL, it receives a list of Params and other-relation Vars
   * used in the clauses; these values must be transmitted to the remote server
!  * as parameter values.  Caller is responsible for initializing the result list
!  * to empty if necessary.
   *
   * If params is NULL, we're generating the query for EXPLAIN purposes,
   * so Params and other-relation Vars should be replaced by dummy values.
***************
*** 805,813 ****
  appendWhereClause(StringInfo buf,
  	int			nestlevel;
  	ListCell   *lc;

- 	if (params)
- 		*params = NIL;			/* initialize result list to empty */
-
  	/* Set up context struct for recursion */
  	context.root = root;
  	context.foreignrel = baserel;
--- 802,807 ----
***************
*** 940,945 **** deparseUpdateSql(StringInfo buf, PlannerInfo *root,
--- 934,996 ----
  }

  /*
+  * deparse remote UPDATE statement
+  *
+  * The statement text is appended to buf, and we also create an integer List
+  * of the columns being retrieved by RETURNING (if any), which is returned
+  * to *retrieved_attrs.
+  */
+ void
+ deparsePushedDownUpdateSql(StringInfo buf, PlannerInfo *root,
+ 						   Index rtindex, Relation rel,
+ 						   List *targetlist,
+ 						   List *targetAttrs,
+ 						   List *remote_conds,
+ 						   List **params_list,
+ 						   List *returningList,
+ 						   List **retrieved_attrs)
+ {
+ 	RelOptInfo *baserel = root->simple_rel_array[rtindex];
+ 	deparse_expr_cxt context;
+ 	bool		first;
+ 	ListCell   *lc;
+
+ 	if (params_list)
+ 		*params_list = NIL;		/* initialize result list to empty */
+
+ 	/* Set up context struct for recursion */
+ 	context.root = root;
+ 	context.foreignrel = baserel;
+ 	context.buf = buf;
+ 	context.params_list = params_list;
+
+ 	appendStringInfoString(buf, "UPDATE ");
+ 	deparseRelation(buf, rel);
+ 	appendStringInfoString(buf, " SET ");
+
+ 	first = true;
+ 	foreach(lc, targetAttrs)
+ 	{
+ 		int			attnum = lfirst_int(lc);
+ 		TargetEntry *tle = get_tle_by_resno(targetlist, attnum);
+
+ 		if (!first)
+ 			appendStringInfoString(buf, ", ");
+ 		first = false;
+
+ 		deparseColumnRef(buf, rtindex, attnum, root);
+ 		appendStringInfo(buf, " = ");
+ 		deparseExpr((Expr *) tle->expr, &context);
+ 	}
+ 	if (remote_conds)
+ 		appendWhereClause(buf, root, baserel, remote_conds,
+ 						  true, params_list);
+
+ 	deparseReturningList(buf, root, rtindex, rel, false,
+ 						 returningList, retrieved_attrs);
+ }
+
+ /*
   * deparse remote DELETE statement
   *
   * The statement text is appended to buf, and we also create an integer List
***************
*** 962,967 **** deparseDeleteSql(StringInfo buf, PlannerInfo *root,
--- 1013,1048 ----
  }

  /*
+  * deparse remote DELETE statement
+  *
+  * The s
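On the open question of forcing a generic plan in the regression tests: there
is no switch for it in this era, but the plan cache ordinarily moves to a
generic plan once five custom plans have been built and the cost comparison
favors it. A sketch of what such a test could do (the foreign table,
statement, and column names here are invented):

PREPARE upd(int) AS UPDATE ft2 SET c2 = c2 + 1 WHERE c1 = $1;
EXECUTE upd(1);
EXECUTE upd(1);
EXECUTE upd(1);
EXECUTE upd(1);
EXECUTE upd(1);
-- the sixth execution should normally use a generic (parameterized) plan,
-- exercising the code path the bug fix covers:
EXPLAIN (VERBOSE, COSTS OFF) EXECUTE upd(1);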
Re: [HACKERS] [REVIEW] Re: Compression of full-page-writes
Hello,

Please find attached a patch. As discussed, the flag to denote compression
and the presence of a hole in the block image has been added in
XLogRecordImageHeader rather than in the block header.

Following are WAL numbers based on the attached test script posted by Michael
earlier in the thread:

                        WAL generated
FPW compression on      122.032 MB
FPW compression off     155.223 MB
HEAD                    155.236 MB

Compression: 21%
Number of block images generated in WAL: 63637

Thank you,
Rahila Syed

Attachments: Support-compression-of-full-page-writes-in-WAL_v23.patch, compress_run.bash
[HACKERS] contraints_exclusion fails to refute simple condition
PostGIS installs standard constraints of this kind:

  CHECK (geometrytype(g) = 'POINT'::text OR g IS NULL)

The constraint is used by constraint_exclusion if using this condition:

  WHERE g IS NOT NULL AND geometrytype(g) = 'LINESTRING'

But it is _NOT_ used if the NOT NULL condition is removed:

  WHERE geometrytype(g) = 'LINESTRING'

As "geometrytype" is defined as STRICT and IMMUTABLE, there's no way for
geometrytype(g) = 'LINESTRING' to hold true, so why is the "IS NOT NULL"
condition also needed by the planner?

Andres Freund on IRC suggested that predicate_refuted_by_simple_clause()
looks like it is trying to handle such cases, but if that's the case it seems
to fail here.

--strk;
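A self-contained way to reproduce this without PostGIS, using a stand-in for
geometrytype (the function, table, and values are invented; per the report
above, the second EXPLAIN still shows a scan rather than an excluded
relation):

CREATE FUNCTION gtype(t text) RETURNS text
    AS $$ BEGIN RETURN upper(t); END $$ LANGUAGE plpgsql IMMUTABLE STRICT;

CREATE TABLE geoms (g text, CHECK (gtype(g) = 'POINT' OR g IS NULL));

SET constraint_exclusion = on;

-- excluded: the planner proves the predicate contradicts the CHECK
EXPLAIN SELECT * FROM geoms WHERE g IS NOT NULL AND gtype(g) = 'LINESTRING';

-- reportedly not excluded, even though the STRICT function already
-- implies g IS NOT NULL
EXPLAIN SELECT * FROM geoms WHERE gtype(g) = 'LINESTRING';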
Re: [HACKERS] Providing catalog view to pg_hba.conf file - Patch submission
2015-03-04 22:41 GMT+01:00 Peter Eisentraut:
> On 3/3/15 7:17 PM, Jim Nasby wrote:
>> I think we're screwed in that regard anyway, because of the special
>> constructs. You'd need different logic to handle things like +role and
>> sameuser. We might even end up painted in a corner where we can't change
>> it in the future because it'll break everyone's scripts.
>
> Yeah, I'm getting worried about this. I think most people agree that
> getting a peek at pg_hba.conf from within the server is useful, but
> everyone seems to have quite different uses for it. Greg wants to join
> against other catalog tables, Jim wants to reassemble a valid and
> accurate pg_hba.conf, Josh wants to write an editing tool. Personally,
> I'd like to see something as close to the actual file as possible.
>
> If there were an obviously correct way to map the various special
> constructs to the available SQL data types, then fine. But if there
> isn't, then we shouldn't give a false overinterpretation. So I'd render
> everything that's disputed as a plain text field. (Not even an array of
> text.)

I disagree with the last note - arrays are valid where they are expected. I
don't see any reason why anybody should have to parse information out of any
table. The face of pg_hba.conf in SQL space can be different from the
original file, but all the data should be clean (without the need for
additional parsing).

Regards

Pavel
Re: [HACKERS] Join push-down support for foreign tables
Hi Ashutosh, thanks for the review.

2015-03-04 19:17 GMT+09:00 Ashutosh Bapat:
> In create_foreignscan_path() we have lines like -
> 1587     pathnode->path.param_info = get_baserel_parampathinfo(root, rel,
> 1588                                                        required_outer);
> Now that the same function is being used for creating foreign scan paths
> for joins, we should be calling get_joinrel_parampathinfo() on a join rel
> and get_baserel_parampathinfo() on a base rel.

Got it. Please let me check the difference.

> The patch seems to handle all the restriction clauses in the same way.
> There are two kinds of restriction clauses - a. join quals (specified
> using an ON clause; the optimizer might move them to the other class if
> that doesn't affect correctness) and b. quals on the join relation
> (specified in the WHERE clause; the optimizer might move them to the
> other class if that doesn't affect correctness). The quals in "a" are
> applied while the join is being computed, whereas those in "b" are
> applied after the join is computed. For example,
>
> postgres=# select * from lt;
>  val | val2
> -----+------
>    1 |    2
>    1 |    3
> (2 rows)
>
> postgres=# select * from lt2;
>  val | val2
> -----+------
>    1 |    2
> (1 row)
>
> postgres=# select * from lt left join lt2 on (lt.val2 = lt2.val2);
>  val | val2 | val | val2
> -----+------+-----+------
>    1 |    2 |   1 |    2
>    1 |    3 |     |
> (2 rows)
>
> postgres=# select * from lt left join lt2 on (true) where (lt.val2 = lt2.val2);
>  val | val2 | val | val2
> -----+------+-----+------
>    1 |    2 |   1 |    2
> (1 row)
>
> The difference between these two kinds is evident in the case of outer
> joins; for an inner join the optimizer puts all of them in class "b".
> The remote query sent to the foreign server has all of those in the ON
> clause. Consider foreign tables ft1 and ft2 pointing to local tables on
> the same server.
>
> postgres=# \d ft1
>          Foreign table "public.ft1"
>  Column |  Type   | Modifiers | FDW Options
> --------+---------+-----------+-------------
>  val    | integer |           |
>  val2   | integer |           |
> Server: loopback
> FDW Options: (table_name 'lt')
>
> postgres=# \d ft2
>          Foreign table "public.ft2"
>  Column |  Type   | Modifiers | FDW Options
> --------+---------+-----------+-------------
>  val    | integer |           |
>  val2   | integer |           |
> Server: loopback
> FDW Options: (table_name 'lt2')
>
> postgres=# explain verbose select * from ft1 left join ft2 on (ft1.val2 =
> ft2.val2) where ft1.val + ft2.val > ft1.val2 or ft2.val is null;
>                           QUERY PLAN
> ---------------------------------------------------------------------
>  Foreign Scan  (cost=100.00..125.60 rows=2560 width=16)
>    Output: val, val2, val, val2
>    Remote SQL: SELECT r.a_0, r.a_1, l.a_0, l.a_1 FROM (SELECT val, val2
> FROM public.lt2) l (a_0, a_1) RIGHT JOIN (SELECT val, val2 FROM
> public.lt) r (a_0, a_1) ON ((((r.a_0 + l.a_0) > r.a_1) OR (l.a_0 IS
> NULL)) AND ((r.a_1 = l.a_1)))
> (3 rows)
>
> The result is then wrong:
>
> postgres=# select * from ft1 left join ft2 on (ft1.val2 = ft2.val2) where
> ft1.val + ft2.val > ft1.val2 or ft2.val is null;
>  val | val2 | val | val2
> -----+------+-----+------
>    1 |    2 |     |
>    1 |    3 |     |
> (2 rows)
>
> which should match the result obtained by substituting local tables for
> the foreign ones:
>
> postgres=# select * from lt left join lt2 on (lt.val2 = lt2.val2) where
> lt.val + lt2.val > lt.val2 or lt2.val is null;
>  val | val2 | val | val2
> -----+------+-----+------
>    1 |    3 |     |
> (1 row)
>
> Once we start distinguishing the two kinds of quals, there is some
> optimization possible. For pushing down a join it's essential that all
> the quals in "a" are safe to be pushed down. But a join can be pushed
> down even if the quals in "b" are not safe to be pushed down. Still, the
> more clauses one pushes down to the foreign server, the fewer rows are
> fetched from the foreign server. In postgresGetForeignJoinPath, instead
> of checking all the restriction clauses to be safe to be pushed down, we
> need to check only those which are join quals (class "a").

The argument restrictlist of GetForeignJoinPaths contains both kinds of
conditions mixed together, so I added extract_actual_join_clauses() to
separate it into two lists, join_quals and other clauses. This is similar to
what create_nestloop_plan and siblings do.

> The following EXPLAIN output seems to be confusing.
> ft1 and ft2 are both pointing to the same lt on a foreign server.
>
> postgres=# explain verbose select ft1.val + ft1.val2 from ft1, ft2 where
> ft1.val + ft1.val2 = ft2.val;
>                           QUERY PLAN
> ---------------------------------------------------------------------
>  Foreign Scan  (cost=100.00..132.00 rows=256
Re: [HACKERS] [REVIEW] Re: Compression of full-page-writes
On Thu, Mar 5, 2015 at 9:14 PM, Syed, Rahila wrote:
> Please find attached a patch. As discussed, the flag to denote compression
> and the presence of a hole in the block image has been added in
> XLogRecordImageHeader rather than the block header.
>
> Following are WAL numbers based on the attached test script posted by
> Michael earlier in the thread.
>
>                         WAL generated
> FPW compression on      122.032 MB
> FPW compression off     155.223 MB
> HEAD                    155.236 MB
>
> Compression: 21%
> Number of block images generated in WAL: 63637

ISTM that we are getting a nice thing here. I tested the patch and WAL replay
is working correctly. Some nitpicky comments...

+ * bkp_info stores flags for information about the backup block image
+ * BKPIMAGE_IS_COMPRESSED is used to identify if a given block image is compressed.
+ * BKPIMAGE_WITH_HOLE is used to identify the presence of a hole in a block image.
+ * If the block image has no hole, it is ensured that the raw size of a compressed
+ * block image is equal to BLCKSZ, hence the contents of
+ * XLogRecordBlockImageCompressionInfo are not necessary.

Take care of the limit of 80 characters per line. (Perhaps you could run
pgindent on your code before sending a patch?) The first line of this
paragraph is a sentence in itself, no?

In xlogreader.c, blk->with_hole is a boolean, so you could remove the == 0
and == 1 it is compared with.

+   /*
+    * Length of a block image must be less than BLCKSZ
+    * if the block has hole
+    */

"if the block has a hole." (End of the sentence needs a dot.)

+   /*
+    * Length of a block image must be equal to BLCKSZ
+    * if the block does not have hole
+    */

"if the block does not have a hole."

Regards,
--
Michael
Re: [HACKERS] [REVIEW] Re: Compression of full-page-writes
On 2015-03-05 12:14:04 +0000, Syed, Rahila wrote:
> Please find attached a patch. As discussed, the flag to denote
> compression and the presence of a hole in the block image has been added
> in XLogRecordImageHeader rather than the block header.

FWIW, I personally won't commit it with things done that way. I think it's
going the wrong way, leading to a harder-to-interpret and less flexible
format.

I'm not going to further protest if Fujii or Heikki commit it this way
though.

Greetings,

Andres Freund

--
Andres Freund                      http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
Re: [HACKERS] anyarray
On 03/04/2015 10:28 PM, Tom Lane wrote:
> Peter Eisentraut writes:
>> On 2/13/15 10:20 AM, Teodor Sigaev wrote:
>>> Some users of the intarray contrib module wish to use its features
>>> with other kinds of arrays, not only for the int4 type. The suggested
>>> module generalizes intarray over other (not all) types of pgsql.
>>
>> I think this module should be merged with the intarray module. Having
>> two modules with very similar functionality would be confusing.
>
> Perhaps. I think it would be hard to remove intarray without breaking
> things for existing users of it, even if the functionality remains under
> another name. And surely we don't want to generalize intarray while
> keeping that same name. So it might be hard to get to a clean solution.
>
> Speaking of names, I can't avoid the feeling that it is a seriously bad
> idea to name an extension the same thing as an existing core type.

+1. We have far too much experience already of this type of naming confusion.

cheers

andrew
Re: [HACKERS] Providing catalog view to pg_hba.conf file - Patch submission
On Wed, Mar 4, 2015 at 9:41 PM, Peter Eisentraut wrote:
> everyone seems to have quite different uses for it. Greg wants to join
> against other catalog tables, Jim wants to reassemble a valid and
> accurate pg_hba.conf, Josh wants to write an editing tool. Personally,
> I'd like to see something as close to the actual file as possible.

Well, if you want to read the file as-is you can do so using the file reading
functions, which afaik were specifically intended for the purpose of writing
config editing tools.

Just to repeat: even if you're reassembling a valid and accurate pg_hba.conf
or writing an editing tool, you can do that with what we have today, as long
as there are no databases named "all", "sameuser", or "samerole" and no roles
named "all". That's something an editing tool could check and warn the user
about, refusing to write a config file if it's the case, and I doubt there
are any such users out there anyway.

Having five separate columns would work, but I just think it's way more
complicated than necessary and burdens everyone who wants to use the view
less formally.

> If there were an obviously correct way to map the various special
> constructs to the available SQL data types, then fine. But if there
> isn't, then we shouldn't give a false overinterpretation. So I'd render
> everything that's disputed as a plain text field. (Not even an array of
> text.)

Joining against other catalog tables may be kind of exaggerated, but if I
have data in an SQL view I certainly expect to be able to write WHERE clauses
to find the rows I want without having to do string parsing. If it were a
comma-delimited string I would have to do something like WHERE users LIKE
'%,username,%', but then that's not good enough since it may be the first or
last entry, and the username itself may contain whitespace or a comma, etc.

I think knowing about the + prefix and the risk of tokens like "all" and
"sameuser" are pretty small compromises to make. But having to know the
quoting rules and write a tokenizer goes too far.

--
greg
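To make the contrast concrete, a hypothetical sketch; the view and column
names here are invented, since no such view existed yet:

-- with a text[] column, matching a role is a plain array membership test:
SELECT * FROM pg_hba_settings WHERE 'alice' = ANY (user_names);

-- with a single flattened text column, one is reduced to fragile
-- substring matching that breaks on quoting, ordering, and whitespace:
SELECT * FROM pg_hba_settings WHERE user_names LIKE '%alice%';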
Re: [HACKERS] Strange assertion using VACOPT_FREEZE in vacuum.c
Robert Haas wrote:
> On Sun, Mar 1, 2015 at 6:58 AM, Michael Paquier wrote:
>> Hm, why not. That would remove all inconsistencies between the parser
>> and the autovacuum code path. Perhaps something like the attached
>> makes sense then?
>
> I really don't see this patch, or any of the previous ones, as solving
> any actual problem. There's no bug here, and no reason to suspect
> that future code changes would be particularly likely to introduce one.
> Assertions are a great way to help developers catch coding mistakes,
> but it's a real stretch to think that a developer is going to add a
> new syntax for ANALYZE that involves setting options proper to VACUUM
> and not notice it.

That was my opinion of the previous patches in this thread. But I think this
last one makes a lot more sense: why is the parser concerned with figuring
out the right defaults given FREEZE/not-FREEZE? I think there is a real gain
in clarity here by deferring those decisions until vacuum time. The parser's
job should be to pass the FREEZE flag down only, which is what this patch
does, and it consequently results in a (small) net reduction of LOC in
gram.y.

Here's a simple idea to improve the patch: make VacuumParams include
VacuumStmt and the for_wraparound and do_toast flags. ISTM that reduces the
number of arguments to be passed down in a couple of places. In particular:

vacuum(VacuumParams *params, BufferAccessStrategy bstrategy, bool isTopLevel)

vacuum_rel(VacuumParams *params)

Does that sound more attractive?

--
Álvaro Herrera                     http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Re: [HACKERS] Strange assertion using VACOPT_FREEZE in vacuum.c
Alvaro,

* Alvaro Herrera (alvhe...@2ndquadrant.com) wrote:
> Robert Haas wrote:
>> On Sun, Mar 1, 2015 at 6:58 AM, Michael Paquier wrote:
>>> Hm, why not. That would remove all inconsistencies between the parser
>>> and the autovacuum code path. Perhaps something like the attached
>>> makes sense then?
>>
>> I really don't see this patch, or any of the previous ones, as solving
>> any actual problem. There's no bug here, and no reason to suspect
>> that future code changes would be particularly likely to introduce one.
>> Assertions are a great way to help developers catch coding mistakes,
>> but it's a real stretch to think that a developer is going to add a
>> new syntax for ANALYZE that involves setting options proper to VACUUM
>> and not notice it.
>
> That was my opinion of the previous patches in this thread. But I think
> this last one makes a lot more sense: why is the parser concerned with
> figuring out the right defaults given FREEZE/not-FREEZE? I think there
> is a real gain in clarity here by deferring those decisions until vacuum
> time. The parser's job should be to pass the FREEZE flag down only,
> which is what this patch does, and consequently results in a (small) net
> reduction of LOC in gram.y.

Yeah, that was my thinking also in my earlier review.

> Here's a simple idea to improve the patch: make VacuumParams include
> VacuumStmt and the for_wraparound and do_toast flags. ISTM that reduces
> the number of arguments to be passed down in a couple of places. In
> particular:
>
> vacuum(VacuumParams *params, BufferAccessStrategy bstrategy, bool isTopLevel)
>
> vacuum_rel(VacuumParams *params)
>
> Does that sound more attractive?

I had been hoping we'd be able to provide an API which didn't require
autovacuum to build up a VacuumStmt, but that's not a big deal and it's doing
it currently anyway. Just mentioning it, as that was one of the other things
I had been hoping to get out of this.

Thanks!

Stephen
Re: [HACKERS] Join push-down support for foreign tables
Here is the v5 patch of join push-down support for foreign tables.

Changes since v4:
- Separate remote conditions into ON and WHERE, per Ashutosh.
- Add regression test cases for foreign join.
- Don't skip reversed relation combinations in OUTER join cases.

I'm now working on two issues from KaiGai-san and Ashutosh: whole-row
reference handling and use of get_joinrel_parampathinfo().

2015-03-05 22:00 GMT+09:00 Shigeru Hanada:
> Hi Ashutosh, thanks for the review.
>
> 2015-03-04 19:17 GMT+09:00 Ashutosh Bapat:
>> In create_foreignscan_path() we have lines like -
>> 1587     pathnode->path.param_info = get_baserel_parampathinfo(root, rel,
>> 1588                                                        required_outer);
>> Now that the same function is being used for creating foreign scan paths
>> for joins, we should be calling get_joinrel_parampathinfo() on a join
>> rel and get_baserel_parampathinfo() on a base rel.
>
> Got it. Please let me check the difference.
>
>> The patch seems to handle all the restriction clauses in the same way.
>> There are two kinds of restriction clauses - a. join quals (specified
>> using an ON clause; the optimizer might move them to the other class if
>> that doesn't affect correctness) and b. quals on the join relation
>> (specified in the WHERE clause; the optimizer might move them to the
>> other class if that doesn't affect correctness). The quals in "a" are
>> applied while the join is being computed, whereas those in "b" are
>> applied after the join is computed. For example,
>>
>> postgres=# select * from lt;
>>  val | val2
>> -----+------
>>    1 |    2
>>    1 |    3
>> (2 rows)
>>
>> postgres=# select * from lt2;
>>  val | val2
>> -----+------
>>    1 |    2
>> (1 row)
>>
>> postgres=# select * from lt left join lt2 on (lt.val2 = lt2.val2);
>>  val | val2 | val | val2
>> -----+------+-----+------
>>    1 |    2 |   1 |    2
>>    1 |    3 |     |
>> (2 rows)
>>
>> postgres=# select * from lt left join lt2 on (true) where (lt.val2 = lt2.val2);
>>  val | val2 | val | val2
>> -----+------+-----+------
>>    1 |    2 |   1 |    2
>> (1 row)
>>
>> The difference between these two kinds is evident in the case of outer
>> joins; for an inner join the optimizer puts all of them in class "b".
>> The remote query sent to the foreign server has all of those in the ON
>> clause. Consider foreign tables ft1 and ft2 pointing to local tables on
>> the same server.
>>
>> postgres=# \d ft1
>>          Foreign table "public.ft1"
>>  Column |  Type   | Modifiers | FDW Options
>> --------+---------+-----------+-------------
>>  val    | integer |           |
>>  val2   | integer |           |
>> Server: loopback
>> FDW Options: (table_name 'lt')
>>
>> postgres=# \d ft2
>>          Foreign table "public.ft2"
>>  Column |  Type   | Modifiers | FDW Options
>> --------+---------+-----------+-------------
>>  val    | integer |           |
>>  val2   | integer |           |
>> Server: loopback
>> FDW Options: (table_name 'lt2')
>>
>> postgres=# explain verbose select * from ft1 left join ft2 on (ft1.val2 =
>> ft2.val2) where ft1.val + ft2.val > ft1.val2 or ft2.val is null;
>>                           QUERY PLAN
>> ---------------------------------------------------------------------
>>  Foreign Scan  (cost=100.00..125.60 rows=2560 width=16)
>>    Output: val, val2, val, val2
>>    Remote SQL: SELECT r.a_0, r.a_1, l.a_0, l.a_1 FROM (SELECT val, val2
>> FROM public.lt2) l (a_0, a_1) RIGHT JOIN (SELECT val, val2 FROM
>> public.lt) r (a_0, a_1) ON ((((r.a_0 + l.a_0) > r.a_1) OR (l.a_0 IS
>> NULL)) AND ((r.a_1 = l.a_1)))
>> (3 rows)
>>
>> The result is then wrong:
>>
>> postgres=# select * from ft1 left join ft2 on (ft1.val2 = ft2.val2) where
>> ft1.val + ft2.val > ft1.val2 or ft2.val is null;
>>  val | val2 | val | val2
>> -----+------+-----+------
>>    1 |    2 |     |
>>    1 |    3 |     |
>> (2 rows)
>>
>> which should match the result obtained by substituting local tables for
>> the foreign ones:
>>
>> postgres=# select * from lt left join lt2 on (lt.val2 = lt2.val2) where
>> lt.val + lt2.val > lt.val2 or lt2.val is null;
>>  val | val2 | val | val2
>> -----+------+-----+------
>>    1 |    3 |     |
>> (1 row)
>>
>> Once we start distinguishing the two kinds of quals, there is some
>> optimization possible. For pushing down a join it's essential that all
>> the quals in "a" are safe to be pushed down. But a join can be pushed
>> down even if the quals in "b" are not safe to be pushed down. Still,
>> the more clauses one pushes down to the foreign server, the fewer rows
>> are fetched from the foreign server. In postgresGetForeignJoinPath,
>> instead of checking all the restriction clauses to be safe to be pushed
>> down, we need to check only those which are join quals (class "a").
>
> The argument restrictlist of GetForeignJoinPaths contains both
> conditions mixed, so I added extract_actual_join_clauses() to separate
> it into two lists, join_qua
Re: [HACKERS] Strange assertion using VACOPT_FREEZE in vacuum.c
On Thu, Mar 5, 2015 at 9:58 AM, Alvaro Herrera wrote:
> Robert Haas wrote:
>> On Sun, Mar 1, 2015 at 6:58 AM, Michael Paquier wrote:
>>> Hm, why not. That would remove all inconsistencies between the parser
>>> and the autovacuum code path. Perhaps something like the attached
>>> makes sense then?
>>
>> I really don't see this patch, or any of the previous ones, as solving
>> any actual problem. There's no bug here, and no reason to suspect
>> that future code changes would be particularly likely to introduce one.
>> Assertions are a great way to help developers catch coding mistakes,
>> but it's a real stretch to think that a developer is going to add a
>> new syntax for ANALYZE that involves setting options proper to VACUUM
>> and not notice it.
>
> That was my opinion of previous patches in this thread. But I think
> this last one makes a lot more sense: why is the parser concerned with
> figuring out the right defaults given FREEZE/not-FREEZE? I think there
> is a real gain in clarity here by deferring those decisions until vacuum
> time. The parser's job should be to pass the FREEZE flag down only,
> which is what this patch does, and consequently results in a (small) net
> reduction of LOC in gram.y.

Sure, I'll buy that. If you want to refactor things that way, I have no
issue with it - I just want to point out that it seems to have zip to do
with what started this thread, which is why I wondered if we were just
talking for the sake of talking.

> Here's a simple idea to improve the patch: make VacuumParams include
> VacuumStmt and the for_wraparound and do_toast flags. ISTM that reduces
> the number of arguments to be passed down in a couple of places. In
> particular:
>
> vacuum(VacuumParams *params, BufferAccessStrategy bstrategy, bool isTopLevel)
>
> vacuum_rel(VacuumParams *params)
>
> Does that sound more attractive?

I dislike passing parser nodes straight down into utility commands. It tends
to make those functions hard for in-core users to call, and also to lead to
security vulnerabilities where we look up the same names more than once and
just hope that we get the same OID every time. Stuffing the VacuumStmt
pointer inside the VacuumParams object doesn't, for me, help anything. It'd
be a lot more interesting if we could get rid of it altogether.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
[HACKERS] object description for FDW user mappings
When commit cae565e503 introduced FDW user mappings, it used this in
getObjectDescription for them:

	appendStringInfo(&buffer, _("user mapping for %s"), usename);

This was later mostly copied (by yours truly) as object identity by commit
f8348ea32e, wherein I used this:

	appendStringInfo(&buffer, "%s", usename);

As it turns out, this is wrong, because the pg_user_mapping catalog has a
two-column "primary key", which is the user OID and server OID. Therefore it
seems to me that the correct object description and identity must include
both the user name and the server name. I propose we change the above to
this:

	appendStringInfo(&buffer, _("user mapping for %s in server %s"), usename,
					 srv->servername);

I propose to apply the (appropriately rebased) attached patch to all back
branches. Note in particular the wording changes in some error messages in
the foreign_data test.

--
Álvaro Herrera                     http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

>From 82e8814c9f4b89d31d51b2fa370add1c04ae0f12 Mon Sep 17 00:00:00 2001
From: Alvaro Herrera
Date: Thu, 5 Mar 2015 12:30:53 -0300
Subject: [PATCH] fix user mapping object description/identity

---
 src/backend/catalog/objectaddress.c        | 30 ++++++++++++++++----------
 src/test/regress/expected/foreign_data.out | 24 ++++++++++-----------
 2 files changed, 32 insertions(+), 22 deletions(-)

diff --git a/src/backend/catalog/objectaddress.c b/src/backend/catalog/objectaddress.c
index 70043fc..0740b4f 100644
--- a/src/backend/catalog/objectaddress.c
+++ b/src/backend/catalog/objectaddress.c
@@ -2510,14 +2510,17 @@ getObjectDescription(const ObjectAddress *object)
 				HeapTuple	tup;
 				Oid			useid;
 				char	   *usename;
+				Form_pg_user_mapping umform;
+				ForeignServer *srv;
 
 				tup = SearchSysCache1(USERMAPPINGOID,
 									  ObjectIdGetDatum(object->objectId));
 				if (!HeapTupleIsValid(tup))
 					elog(ERROR, "cache lookup failed for user mapping %u",
 						 object->objectId);
-
-				useid = ((Form_pg_user_mapping) GETSTRUCT(tup))->umuser;
+				umform = (Form_pg_user_mapping) GETSTRUCT(tup);
+				useid = umform->umuser;
+				srv = GetForeignServer(umform->umserver);
 
 				ReleaseSysCache(tup);
@@ -2526,7 +2529,8 @@ getObjectDescription(const ObjectAddress *object)
 				else
 					usename = "public";
 
-				appendStringInfo(&buffer, _("user mapping for %s"), usename);
+				appendStringInfo(&buffer, _("user mapping for %s in server %s"), usename,
+								 srv->servername);
 				break;
 			}
@@ -3906,19 +3910,18 @@ getObjectIdentityParts(const ObjectAddress *object,
 			{
 				HeapTuple	tup;
 				Oid			useid;
+				Form_pg_user_mapping umform;
+				ForeignServer *srv;
 				const char *usename;
 
-				/* no objname support */
-				if (objname)
-					*objname = NIL;
-
 				tup = SearchSysCache1(USERMAPPINGOID,
 									  ObjectIdGetDatum(object->objectId));
 				if (!HeapTupleIsValid(tup))
 					elog(ERROR, "cache lookup failed for user mapping %u",
 						 object->objectId);
-
-				useid = ((Form_pg_user_mapping) GETSTRUCT(tup))->umuser;
+				umform = (Form_pg_user_mapping) GETSTRUCT(tup);
+				useid = umform->umuser;
+				srv = GetForeignServer(umform->umserver);
 
 				ReleaseSysCache(tup);
@@ -3927,7 +3930,14 @@ getObjectIdentityParts(const ObjectAddress *object,
 				else
 					usename = "public";
 
-				appendStringInfoString(&buffer, usename);
+				if (objname)
+				{
+					*objname = list_make1(pstrdup(usename));
+					*objargs = list_make1(pstrdup(srv->servername));
+				}
+
+				appendStringInfo(&buffer, "%s in server %s", usename,
+								 srv->servername);
 				break;
 			}
diff --git a/src/test/regress/expected/foreign_data.out b/src/test/regress/expected/foreign_data.out
index 632b7e5..4d0de1f 100644
--- a/src/test/regress/expected/foreign_data.out
+++ b/src/test/regress/expected/foreign_data.out
@@ -247,7 +247,7 @@ CREATE USER MAPPING FOR current_user SERVER s1;
 DROP FOREIGN DATA WRAPPER foo;                          -- ERROR
 ERROR:  cannot drop foreign-data wrapper foo because other objects depend on it
 DETAIL:  server s1 depends on foreign-data wrapper foo
-user mapping for foreign_data_user depends on server s1
+user mapping for foreign_data_user in server s1 depends on server s1
 HINT:  Use DROP ... CASCADE to drop the dependent objects too.
 SET ROLE regress_test_role;
 DROP FOREIGN DATA WRAPPER foo CASCADE;                  -- ERROR
@@ -256,7 +256,7 @@ RESET ROLE;
 DROP FOREIGN DATA WRAPPER foo CASCADE;
 NOTICE:  drop cascades to 2 other objects
 DETAIL:  drop cascades to server s1
-drop cascades to user mapping for foreign_data_user
+drop cascades to user mapping for foreign_data_user in server s1
 \dew+
                           List of foreign-data wrappers
  Name | Owner | Handler | Validator | Access privileges | FDW Options | Description
@@ -526
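To see at the SQL level why the user name alone is ambiguous, consider that
the same role may have one mapping per server (the server names here are
invented):

CREATE EXTENSION postgres_fdw;
CREATE SERVER s1 FOREIGN DATA WRAPPER postgres_fdw;
CREATE SERVER s2 FOREIGN DATA WRAPPER postgres_fdw;
-- one mapping per (user, server) pair; "user mapping for bob" alone
-- cannot identify which of these is meant:
CREATE USER MAPPING FOR current_user SERVER s1;
CREATE USER MAPPING FOR current_user SERVER s2;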
Re: [HACKERS] Combining Aggregates
On Wed, Mar 4, 2015 at 11:00 PM, Kouhei Kaigai wrote:
> Anyway, I could find out, at least, these complicated issues around
> two-phase aggregate integration with the planner. Can someone suggest
> a minimally invasive way to do this integration?

I don't think I have the brain space to think about that until we get the
basic parallelism stuff in.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
Re: [HACKERS] File based Incremental backup v8
On Wed, Mar 4, 2015 at 11:42 PM, Bruce Momjian wrote:
> On Thu, Mar 5, 2015 at 01:25:13PM +0900, Fujii Masao wrote:
>>>> Yeah, it might make the situation better than today. But I'm afraid
>>>> that many users might get disappointed about that behavior of an
>>>> incremental backup after the release...
>>>
>>> I don't get what you mean here. Can you elaborate this point?
>>
>> The proposed version of LSN-based incremental backup has some
>> limitations (e.g., every database file needs to be read even when there
>> is no modification in the database since the last backup, which may make
>> the backup time longer than users expect) which may disappoint users. So
>> I'm afraid that the users who can benefit from the feature might be very
>> limited. IOW, I'm just sticking to the idea of the timestamp-based one
>> :) But I should drop it if the majority on the list prefers the
>> LSN-based one even with such limitations.
>
> We need numbers on how effective each level of tracking will be. Until
> then, the patch can't move forward.

The point is that this is a stepping stone toward what will ultimately be a
better solution. You can use timestamps today if (a) whole-file granularity
is good enough for you and (b) you trust your system clock to never go
backwards. In fact, if you use pg_start_backup() and pg_stop_backup(), you
don't even need a server patch; you can just go right ahead and implement
whatever you like. A server patch would be needed to make pg_basebackup do a
file-time-based incremental backup, but I'm not excited about that because I
think the approach is a dead end. If you want block-level granularity, and
you should, an approach based on file times is never going to get you there.
An approach based on LSNs can.

If the first version of the patch requires reading the whole database, fine;
it's not going to perform all that terribly well. But we can optimize that
later by keeping track of which blocks have been modified since a given LSN.
If we do that, we can get better reliability than the timestamp approach can
ever offer, plus excellent transfer and storage characteristics.

What I'm unhappy with about this patch is that it insists on sending the
whole file if a single block in that file has changed. That is lame. To get
something useful out of this, we should be looking to send only those blocks
whose LSNs have actually changed. That would reduce I/O (in the worst case,
the current patch reads each file in its entirety twice) and transfer
bandwidth as compared to the proposed patch. We'd still have to read the
whole database, so it might very well do more I/O than the file-timestamp
approach, but it would beat the file-timestamp approach on transfer bandwidth
and on the amount of storage required to store the incremental. In many
workloads, I expect those savings would be quite significant.

If we then went back in a later release and implemented one of the various
proposals to avoid needing to read every block, we'd then have a very robust
and complete solution. But I agree with Fujii to the extent that I see little
value in committing this patch in the form proposed. Being smart enough to
use the LSN to identify changed blocks, but then sending the entirety of
every file anyway because you don't want to go to the trouble of figuring out
how to revise the wire protocol to identify the individual blocks being sent,
and of writing the tools to reconstruct a full backup based on that data,
does not seem like enough of a win. As Fujii says, if we ship this patch as
written, people will just keep using the timestamp-based approach anyway.
Let's wait until we have something that is, at least in some circumstances, a
material improvement over the status quo before committing anything.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
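For context on the no-server-patch route Robert mentions: a file-level
incremental backup driven purely by timestamps can be scripted today around
the existing backup functions. A rough sketch (the label is invented, and the
actual file copying happens outside SQL with whatever tool compares mtimes):

SELECT pg_start_backup('incr-2015-03-05', true);
-- ... externally, copy only the data files whose mtime is newer than the
-- start time of the previous backup ...
SELECT pg_stop_backup();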
Re: [HACKERS] MD5 authentication needs help
* Josh Berkus (j...@agliodbs.com) wrote:
> Catching up here ...
>
> On 03/03/2015 06:01 PM, Bruce Momjian wrote:
>> It feels like MD5 has accumulated enough problems that we need to start
>> looking for another way to store and pass passwords. The MD5 problems
>> are:
>>
>> 1) MD5 makes users feel uneasy (though our usage is mostly safe)
>>
>> 2) The per-session salt sent to the client is only 32 bits, meaning
>> that it is possible to replay an observed MD5 hash in ~16k connection
>> attempts.
>
> Seems like we could pretty easily increase the size of the salt. Of
> course, that just increases the required number of connection attempts,
> without really fixing the problem.

Please do not propose hacks on top of the existing "md5" authentication
method which break the wireline protocol. There is absolutely nothing useful
to be gained from that. We need to introduce a new auth method with whatever
protocol changes that requires, without breaking the existing auth method.
Discussing trade-offs of changing the existing md5 mechanism *without*
breaking the wireline protocol may be worthwhile, as we might be able to
improve things a bit there, but nothing there is a proper solution for
security-conscious individuals.

>> 3) Using the user name for the MD5 storage salt allows the MD5 stored
>> hash to be used on a different cluster if the user used the same
>> password.
>
> This is a feature as well as a bug. For example, pgBouncer relies on
> this aspect of md5 auth.

It's not a feature, and pgBouncer could be made to not rely on this.

>> 4) Using the user name for the MD5 storage salt causes the renaming of
>> a user to break the stored password.
>
> Weirdly, in 17 years of Postgres, I've never encountered this issue.

I agree, that's kind of odd. :)

> So, are we more worried about attackers getting a copy of pg_authid, or
> sniffing the hash on the wire?

They are both attack vectors to consider, but most applications address the
network-based risk through TLS and have a hash-based approach to address the
pg_authid-based risk. Administrators are used to that and are quite surprised
to discover that PG doesn't work that way. As I mentioned upthread, it's
certainly very rare for anyone concerned about a network-based risk to *not*
use TLS, but they tend to still use passwords for the actual authentication
mechanism, and the md5 auth method makes the wrong trade-off for them. The
password auth method isn't ideal either, since the server ends up seeing the
PW. The change I'm suggesting addresses both the pg_authid-based risk and the
"server seeing the PW" risk without changing the wireline protocol.

Thanks,

Stephen
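Points 3 and 4 in Bruce's list both follow from how the stored hash is
salted: rolpassword is simply 'md5' followed by md5(password || rolname), so
two clusters sharing a role name and password store identical hashes, and
renaming the role orphans the stored value. A quick way to see it (the role
name and password here are invented):

SELECT rolname, rolpassword FROM pg_authid WHERE rolname = 'alice';
SELECT 'md5' || md5('secret' || 'alice');  -- matches rolpassword if alice's password is 'secret'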
Re: [HACKERS] MD5 authentication needs help
* Bruce Momjian (br...@momjian.us) wrote:
> On Wed, Mar 4, 2015 at 05:56:25PM -0800, Josh Berkus wrote:
>> So, are we more worried about attackers getting a copy of pg_authid, or
>> sniffing the hash on the wire?
>
> Both. Stephen is more worried about pg_authid, but I am more worried
> about sniffing.

I'm also worried about both, but if the admin is worried about sniffing in
their environment, they're much more likely to use TLS than to set up
client-side certificates, Kerberos, or some other strong auth mechanism,
simply because TLS is pretty darn easy to get working and distros set it up
for you by default.

Thanks,

Stephen
Re: [HACKERS] MD5 authentication needs help
* Bruce Momjian (br...@momjian.us) wrote:
> One way to fix #2 would be to use a per-user or per-cluster counter for
> the session salt, rather than a random number --- that would change
> replays from ~16k to 4 billion, with no wire protocol change needed.

I'm not against doing that if we decide to ignore the pg_authid-based vector
(which we could certainly do), but given the relatively poor hashing
algorithm we use and the small salt, along with the commonly used practice of
using TLS to address network-based attacks, I'm not sure it's really worth
it. Note that changing the algorithm or the salt would require a wireline
protocol change and therefore isn't interesting to consider; if we're going
to do that, we should be moving to a vetted solution instead.

Thanks!

Stephen
Re: [HACKERS] MD5 authentication needs help
* Bruce Momjian (br...@momjian.us) wrote:
> On Wed, Mar 4, 2015 at 04:19:00PM -0500, Stephen Frost wrote:
>>> Hm, well, "don't change the wireline protocol" could be another
>>> wanna-have ... but if we want to stop using MD5, that's not really a
>>> realistic goal, is it?
>>
>> I'm trying to address both sides of the issue- improve the current
>> situation without breaking existing clients AND provide a new auth
>> method and encourage everyone to move to it as soon as they're able. We
>> can't simply deprecate md5 as that would break pg_upgrade for any users
>> who are currently using it.
>
> Actually, pg_upgrade uses 'trust' and a private socket file, at least on
> Unix. Of course, post-upgrade, users would have trouble logging in.

The post-upgrade trouble logging in is what I was getting at. Of course, that
issue exists with a pg_dump-based approach also, so my reference to
pg_upgrade above wasn't accurate; it should have been "would break upgrades
for any users who are currently using it."

>> This half of the discussion has been all about improving md5 without
>> breaking the wireline protocol or existing users.
>>
>> The other half of the discussion is about implementing a new
>> password-based authentication method based on one of the vetted
>> authentication protocols already in use today (SCRAM or SRP, for
>> example). Using those new authentication protocols would include a move
>> off of the MD5 hashing function, of course. This would also mean
>> breaking the on-disk hash, but that's necessary anyway because what we
>> do there today isn't secure either and no amount of futzing is going to
>> change that.
>
> While I don't like the requirement to use TLS to improve the MD5 fix, I
> also don't like the idea of having users go through updating all these
> passwords only to have us implement the _right_ solution in the next
> release. I don't see why it is useful to be patching up MD5 with a TLS
> requirement when we know they should be moving to SCRAM or SRP. If the
> MD5 change were transparent to users/admins, that would be different.

So, TLS *isn't* an actual requirement with the approach that I've outlined;
it's just that if you're not using it then you have a somewhat larger risk of
successful network-based attacks, but then, if you're not using TLS, you're
unlikely to be worried about that vector.

>> I've got nearly zero interest in trying to go half-way on this by
>> designing something that we think is secure which has had no external
>> review or anyone else who uses it. Further, going that route makes me
>> very nervous that we'd decide on certain compromises in order to make
>> things easier for users without actually realising the problems with
>> such an approach (eg: "well, if we use hack X we wouldn't have to
>> change what is stored on disk and therefore we wouldn't break
>
> I am not happy to blindly accept a new security setup without
> understanding exactly what it is trying to fix, which is why I am asking
> all these questions.

I certainly didn't intend to suggest that anyone blindly accept a new
security setup. I don't mind the questions, but I do get a bit frustrated at
the suggestions that we can "fix" the md5 auth mechanism by simply increasing
the salt or putting in a better hashing algorithm. Those are not solutions to
this authentication challenge; for that, we need to look at SCRAM or SRP,
where RFCs have been reviewed and published which outline exactly how they
work and why they work the way they do.

I'd certainly encourage everyone interested in this to go look at those RFCs.

SCRAM - https://tools.ietf.org/html/rfc5802
SRP - https://www.ietf.org/rfc/rfc2945.txt

>> of a transition would be a lot easier on our users and I'd definitely
>> recommend that we add that with this new authentication mechanism, to
>> address this kind of issue in the future (not to mention that it's often
>> asked for..). Password complexity would be another thing we should
>> really add, and it is also often requested.
>
> I agree our password management could use improvement.

I'm *very* glad to hear that. When I brought it up 10-or-so years ago, there
was very little interest in these kinds of improvements.

>> Frankly, in the end, I don't see us being able to produce something in
>> time for this release unless someone can be dedicated to the effort over
>> the next couple of months, and therefore I'd prefer to improve the
>> current issues with md5 without breaking the wireline protocol than
>> simply do nothing (again).
>
> I am not sure why we have to shove something into 9.5 --- as you said,
> this issue has been known about for 10+ years.

It'd be nice to show that we're being responsive when issues are brought up,
even if they've existed for a long time (because, in the minds of some, the
fact that we've done nothing in 10 years doesn't exactly reflect very well on
us :( ).

> I think we should do what we can to improve MD5 in cases where the
Re: [HACKERS] object description for FDW user mappings
Alvaro Herrera writes:
> When commit cae565e503 introduced FDW user mappings, it used this in
> getObjectDescription for them:
>
> 	appendStringInfo(&buffer, _("user mapping for %s"), usename);
>
> This was later mostly copied (by yours truly) as object identity by
> commit f8348ea32e, wherein I used this:
>
> 	appendStringInfo(&buffer, "%s", usename);
>
> As it turns out, this is wrong, because the pg_user_mapping catalog has
> a two-column "primary key" which is user OID and server OID. Therefore
> it seems to me that the correct object description and identity must
> include both username and server name. I propose we change the above to
> this:
>
> 	appendStringInfo(&buffer, _("user mapping for %s in server %s"), usename,
> 					 srv->servername);

+1 for the concept, but to be nitpicky, "in" doesn't seem like the right word
here. "on server" would read better to me; or perhaps "at server".

			regards, tom lane
Re: [HACKERS] Proposal: knowing detail of config files via SQL
On Sat, Feb 28, 2015 at 2:27 PM, Stephen Frost wrote: > Sawada, > > * Sawada Masahiko (sawada.m...@gmail.com) wrote: >> Attached patch is latest version including following changes. >> - This view is available to super use only >> - Add sourceline coulmn > > Alright, first off, to Josh's point- I'm definitely interested in a > capability to show where the heck a given config value is coming from. > I'd be even happier if there was a way to *force* where config values > have to come from, but that ship sailed a year ago or so. Having this > be for the files, specifically, seems perfectly reasonable to me. I > could see having another function which will report where a currently > set GUC variable came from (eg: ALTER ROLE or ALTER DATABASE), but > having a function for "which config file is this GUC set in" seems > perfectly reasonable to me. > > To that point, here's a review of this patch. > >> diff --git a/src/backend/catalog/system_views.sql >> b/src/backend/catalog/system_views.sql >> index 6df6bf2..f628cb0 100644 >> --- a/src/backend/catalog/system_views.sql >> +++ b/src/backend/catalog/system_views.sql >> @@ -412,6 +412,11 @@ CREATE RULE pg_settings_n AS >> >> GRANT SELECT, UPDATE ON pg_settings TO PUBLIC; >> >> +CREATE VIEW pg_file_settings AS >> + SELECT * FROM pg_show_all_file_settings() AS A; >> + >> +REVOKE ALL on pg_file_settings FROM public; > > While this generally "works", the usual expectation is that functions > that should be superuser-only have a check in the function rather than > depending on the execute privilege. I'm certainly happy to debate the > merits of that approach, but for the purposes of this patch, I'd suggest > you stick an if (!superuser()) ereport("must be superuser") into the > function itself and not work about setting the correct permissions on > it. > >> + ConfigFileVariable *guc; > > As this ends up being an array, I'd call it "guc_array" or something > along those lines. GUC in PG-land has a pretty specific single-entity > meaning. > >> @@ -344,6 +346,21 @@ ProcessConfigFile(GucContext context) >> PGC_BACKEND, >> PGC_S_DYNAMIC_DEFAULT); >> } >> >> + guc_file_variables = (ConfigFileVariable *) >> + guc_malloc(FATAL, num_guc_file_variables * >> sizeof(struct ConfigFileVariable)); > > Uh, perhaps I missed it, but what happens on a reload? Aren't we going > to realloc this every time? Seems like we should be doing a > guc_malloc() the first time through but doing guc_realloc() after, or > we'll leak memory on every reload. > >> + /* >> + * Apply guc config parameters to guc_file_variable >> + */ >> + guc = guc_file_variables; >> + for (item = head; item; item = item->next, guc++) >> + { >> + guc->name = guc_strdup(FATAL, item->name); >> + guc->value = guc_strdup(FATAL, item->value); >> + guc->filename = guc_strdup(FATAL, item->filename); >> + guc->sourceline = item->sourceline; >> + } > > Uh, ditto and double-down here. I don't see a great solution other than > looping through the previous array and free'ing each of these, since we > can't depend on the memory context machinery being up and ready at this > point, as I recall. > >> diff --git a/src/backend/utils/misc/guc.c b/src/backend/utils/misc/guc.c >> index 931993c..c69ce66 100644 >> --- a/src/backend/utils/misc/guc.c >> +++ b/src/backend/utils/misc/guc.c >> @@ -3561,9 +3561,17 @@ static const char *const map_old_guc_names[] = { >> */ >> static struct config_generic **guc_variables; >> >> +/* >> + * lookup of variables for pg_file_settings view. 
>> + */ >> +static struct ConfigFileVariable *guc_file_variables; >> + >> /* Current number of variables contained in the vector */ >> static int num_guc_variables; >> >> +/* Number of file variables */ >> +static int num_guc_file_variables; >> + > > I'd put these together, and add a comment explaining that > guc_file_variables is an array of length num_guc_file_variables.. > >> /* >> + * Return the total number of GUC File variables >> + */ >> +int >> +GetNumConfigFileOptions(void) >> +{ >> + return num_guc_file_variables; >> +} > > I don't see the point of this function.. > >> +Datum >> +show_all_file_settings(PG_FUNCTION_ARGS) >> +{ >> + FuncCallContext *funcctx; >> + TupleDesc tupdesc; >> + int call_cntr; >> + int max_calls; >> + AttInMetadata *attinmeta; >> + MemoryContext oldcontext; >> + >> + if (SRF_IS_FIRSTCALL()) >> + { >> + funcctx = SRF_FIRSTCALL_INIT(); >> + >> + oldcontext = >> MemoryContextSwitchTo(funcctx->multi_call_memory_ctx); >> + >> + /* >> + * need a tuple descriptor representing NUM_PG_SETTINGS_ATTS >> columns >> + * of the appropriate types >> + */ >> + >> + tupdesc = CreateTemplateTupleDesc(NUM_PG_FILE_SETTINGS_ATTS,
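For illustration, here is roughly how the proposed view would be queried once installed; the column names are assumed from the ConfigFileVariable fields (name, value, filename, sourceline) rather than taken verbatim from the patch:

    -- hypothetical usage of the proposed superuser-only view
    SELECT name, value, filename, sourceline
    FROM pg_file_settings
    WHERE name = 'shared_buffers';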
Re: [HACKERS] Proposal: knowing detail of config files via SQL
* Peter Eisentraut (pete...@gmx.net) wrote: > On 3/3/15 5:29 PM, Stephen Frost wrote: > > For my part, I understood that we specifically didn't want to allow that > > for the same reason that we didn't want to simply depend on the GRANT > > system for the above functions, but everyone else on these discussions > > so far is advocating for using the GRANT system. My memory might be > > wrong, but I could have sworn that I had brought up exactly that > > suggestion in years past only to have it shot down. > > I think a lot of this access control work is done based on some > undocumented understandings, when in fact there is no consensus on > anything. I think we need more clarity. When there hasn't been discussion on a particular topic across years of ongoing development, all of which uses one approach, I tend to figure that makes it an unspoken consensus. I'm not saying we shouldn't question that when it makes sense to, we certainly should, but I'm not sure it makes sense to ask "why didn't you attempt to get consensus on this thing we've all been doing for years?" Thanks, Stephen signature.asc Description: Digital signature
Re: [HACKERS] Additional role attributes && superuser review
* Peter Eisentraut (pete...@gmx.net) wrote: > On 2/28/15 10:10 PM, Stephen Frost wrote: > > * Adam Brightwell (adam.brightw...@crunchydatasolutions.com) wrote: > >> I have attached and updated patch for review. > > > > Thanks! I've gone over this and made quite a few documentation and > > comment updates, but not too much else, so I'm pretty happy with how > > this is coming along. As mentioned elsewhere, this conflicts with the > > GetUserId() to has_privs_of_role() cleanup, but as I anticipate handling > > both this patch and that one, I'll find some way to manage. :) > > > > Updated patch attached. Barring objections, I'll be moving forward with > > this soonish. Would certainly appreciate any additional testing or > > review that you (or anyone!) has time to provide. > > Let's move this discussion to the right thread. Agreed. > Why are we not using roles and function execute privileges for this? There isn't a particular reason not to, except that the existing checks are in C code and those would need to be removed and the permission changes done at initdb time to revoke EXECUTE from PUBLIC for these functions. Further, as you pointed out, we'd need to dump out the permissions for the catalog tables and functions with this approach. I don't expect that to be too difficult to do though. Thanks! Stephen signature.asc Description: Digital signature
Re: [HACKERS] constraint_exclusion fails to refute simple condition
> "Sandro" == Sandro Santilli writes: Sandro> PostGIS installs standard constraints of this kind: Sandro> CHECK (geometrytype(g) = 'POINT'::text OR g IS NULL) If geometrytype() is strict, the IS NULL condition is superfluous. CHECK(x) _passes_ when x is null, rather than rejecting the row. Sandro> But it is _NOT_ used if the NOT NULL condition is removed: Sandro> WHERE geometrytype(g) = 'LINESTRING' Sandro> As the "geometrytype" is defined as STRICT and IMMUTABLE, Sandro> there's no way for geometrytype(g) = 'LINESTRING' to hold true, Sandro> so why is the "IS NOT NULL" condition also needed by the Sandro> planner? Sandro> Andres Freund on IRC suggested that Sandro> predicate_refuted_by_simple_clause() looks like trying to Sandro> handle such cases, but if that's the case it seems to fail Sandro> here. It looks like it searches only one level deep into the clause, so it knows that if OP or FUNC is strict that (g OP const) or (FUNC(g)) will refute (g IS NULL), but not that (FUNC(g) OP const) will do so, because the reference to g is nested too deep. In this case the obvious thing to do is remove the redundant clause. Whether it would be worth teaching the predicate prover to recurse in such cases would be a performance tradeoff - likely a poor one, since it would have to recurse into arbitrarily complex expressions. -- Andrew (irc:RhodiumToad) -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
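A minimal sketch of the limitation, using upper() as a stand-in for any STRICT, IMMUTABLE function (the table and names are invented for illustration; PostGIS is not required):

    SET constraint_exclusion = on;
    CREATE TABLE shapes (g text CHECK (upper(g) = 'POINT' OR g IS NULL));
    -- Excluded: the planner refutes both arms of the CHECK, since
    -- g IS NOT NULL directly refutes the IS NULL arm.
    EXPLAIN SELECT * FROM shapes WHERE upper(g) = 'LINESTRING' AND g IS NOT NULL;
    -- Not excluded: deducing g IS NOT NULL from upper(g) = 'LINESTRING'
    -- would require recursing into the function argument, which the
    -- prover does not do.
    EXPLAIN SELECT * FROM shapes WHERE upper(g) = 'LINESTRING';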
Re: [HACKERS] CATUPDATE confusion?
* Tom Lane (t...@sss.pgh.pa.us) wrote: > Peter Eisentraut writes: > > Any other opinions? > > The options are: > > 0) Leave as is. > > 1) Remove catupdate and replace with superuser checks. > > 2) Remove catupdate and rely on regular table permissions on catalogs. > > I think I vote for (1). I do not like (2) because of the argument I made > upthread that write access on system catalogs is far more dangerous than > a naive DBA might think. (0) has some attraction but really CATUPDATE > is one more concept than we need. I agree with #1 on this. > On the other hand, if Stephen is going to push forward with his plan to > subdivide superuserness, we might have the equivalent of CATUPDATE right > back again. (But at least it would be properly documented and > supported...) The subdivision of superuserness is intended only for operations which can't be used to directly give the user superuser access back and therefore I don't think we'd ever put back CATUPDATE or an equivalent. I'd much rather reduce the need for direct catalog modifications by adding additional syntax for those operations which can't be done without modifying the catalog directly and, where it makes sense to, add a way to control access to those operations. For example, changing a database to not accept connections seems like something the database owner should be allowed to do. Perhaps it'd be interesting to allow users other than the owner to do that, perhaps not, but that would be an independent question to address. Thanks! Stephen signature.asc Description: Digital signature
Re: [HACKERS] Proposal: knowing detail of config files via SQL
* Peter Eisentraut (pete...@gmx.net) wrote: > On 3/3/15 5:58 PM, Tom Lane wrote: > > One aspect of this that merits some thought is that in some cases > > access to some set of functions is best granted as a unit. That's > > easy with role properties but much less so with plain GRANT. > > Do we have enough such cases to make it an issue? > > You could have built-in roles, such as "backup" and ship the system with > the "backup" role having permissions on some functions. And then users > are granted those roles. Similar to how some Linux systems ship with > groups such as "adm". One thought I had for this was a contrib module which added an extension to create and grant those roles. That approach would mean that we don't need to worry about upgrade-path problems which we could get into if we declared new roles like 'backup' which users might already have. An alternative approach which might be better, now that I think about it, would be to declare that the 'pg_' prefix applies to roles too and then have a 'pg_backup' role which is granted the correct permissions. Personally, I like that idea a lot.. We could then have pg_upgrade throw an error and pg_dump a warning (or something along those lines) if they find any existing roles with that prefix. Thanks! Stephen signature.asc Description: Digital signature
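A sketch of what the pg_-prefixed built-in role idea could look like; the role name pg_backup and the particular grants are illustrative only, not from any patch:

    CREATE ROLE pg_backup NOLOGIN;
    -- (a real setup would first REVOKE EXECUTE on these from PUBLIC)
    GRANT EXECUTE ON FUNCTION pg_start_backup(text, boolean) TO pg_backup;
    GRANT EXECUTE ON FUNCTION pg_stop_backup() TO pg_backup;
    -- delegate to an ordinary user
    GRANT pg_backup TO backup_operator;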
Re: [HACKERS] deparsing utility commands
Alvaro, * Alvaro Herrera (alvhe...@2ndquadrant.com) wrote: > Thanks for the review. I have pushed these patches, squashed in one > commit, with the exception of the one that changed ALTER TABLE. You had > enough comments about that one that I decided to put it aside for now, > and get the other ones done. I will post a new series shortly. I look forward to seeing the new series posted soon. > I fixed (most of?) the issues you pointed out; some additional comments: These all looked good to me. Thanks! Stephen signature.asc Description: Digital signature
Re: [HACKERS] MD5 authentication needs help
On 3/4/15 2:56 PM, Stephen Frost wrote: 2) The per-session salt sent to the client is only 32-bits, meaning >that it is possible to replay an observed MD5 hash in ~16k connection >attempts. Yes, and we have no (PG-based) mechanism to prevent those connection attempts, which is a pretty horrible situation to be in. Is there some reason we don't just fix that? I'm thinking that this is a special case where we could just modify the pg_auth tuple in-place without bloating the catalog (we already do that somewhere else). Is there something else that makes this difficult? Are we afraid of an extra GUC to control it? -- Jim Nasby, Data Architect, Blue Treble Consulting Data in Trouble? Get it in Treble! http://BlueTreble.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] MD5 authentication needs help
* Jim Nasby (jim.na...@bluetreble.com) wrote: > On 3/4/15 2:56 PM, Stephen Frost wrote: > >>2) The per-session salt sent to the client is only 32-bits, meaning > >>>that it is possible to replay an observed MD5 hash in ~16k connection > >>>attempts. > >Yes, and we have no (PG-based) mechanism to prevent those connection > >attempts, which is a pretty horrible situation to be in. > > Is there some reason we don't just fix that? I'm thinking that this > is a special case where we could just modify the pg_auth tuple > in-place without bloating the catalog (we already do that somewhere > else). Is there something else that makes this difficult? Are we > afraid of an extra GUC to control it? I'm all for it, though I would ask that we provide a way for superusers to delegate the ability to reset locked accounts to non-superusers. I'd want to think about it a bit more before settling on using pg_authid to track the data. In any case, I do think we need a way to disable this ability for certain roles and, further, that we not track failed logins in cases where it's disabled (which might well be the default - I don't think we want to add this overhead for systems which have lots of recurring logins; think application users where they aren't doing pooling). Thanks! Stephen signature.asc Description: Digital signature
Re: [HACKERS] a2e35b53 added unused variable to ConversionCreate()
Peter Geoghegan wrote: > I'm seeing this on the master branch tip when building at -O2: > > pg_conversion.c: In function ‘ConversionCreate’: > pg_conversion.c:53:6: error: variable ‘oid’ set but not used > [-Werror=unused-but-set-variable] > Oid oid; > ^ > cc1: all warnings being treated as errors > > I think that commit a2e35b53 did this. Obviously the commit did not introduce an unused variable, but instead made another variable take its place as the function's return value. In an assert-enabled build it was used by an assert, which is why I didn't notice the problem. I removed the assert and the variable so this should be fixed now. Thanks for reporting. -- Álvaro Herrera    http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] MD5 authentication needs help
On 3/5/15 2:17 PM, Stephen Frost wrote: * Jim Nasby (jim.na...@bluetreble.com) wrote: On 3/4/15 2:56 PM, Stephen Frost wrote: 2) The per-session salt sent to the client is only 32-bits, meaning that it is possible to replay an observed MD5 hash in ~16k connection attempts. Yes, and we have no (PG-based) mechanism to prevent those connection attempts, which is a pretty horrible situation to be in. Is there some reason we don't just fix that? I'm thinking that this is a special case where we could just modify the pg_auth tuple in-place without bloating the catalog (we already do that somewhere else). Is there something else that makes this difficult? Are we afraid of an extra GUC to control it? I'm all for it, though I would ask that we provide a way for superusers to delegate the ability to reset locked accounts to non-superusers. I'd want to think about it a bit more before settling on using pg_authid I guess it's a question of how durable we want it to be. We could conceivably keep it in shared memory and let it wipe on a crash. But we already have code that ignores MVCC on a catalog table (IIRC for updating pg_class stats after vacuum) so the pattern is there. I don't see that we need more sophistication than that... to track the data. In any case, I do think we need a way to disable this ability for certain roles In the interest of something for this release... do we really need that? My thought is we just special-case the postgres user and be done with it. Though, if there's some other way to reset an account from the shell, no need to even special case postgres. Though, I guess if we just follow the normal GUC behavior of allowing per-database and -user overrides it wouldn't be that hard. and, further, that we not track failed logins in cases where it's disabled (which might well be the default - I don't think we want to add this overhead for systems which have lots of recurring logins; think application users where they aren't doing pooling). Yeah, presumably if allowed_authentication_failures < 0 then we don't bother with the check at all. -- Jim Nasby, Data Architect, Blue Treble Consulting Data in Trouble? Get it in Treble! http://BlueTreble.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] MD5 authentication needs help
* Jim Nasby (jim.na...@bluetreble.com) wrote: > On 3/5/15 2:17 PM, Stephen Frost wrote: > >* Jim Nasby (jim.na...@bluetreble.com) wrote: > >I'm all for it, though I would ask that we provide a way for superusers > >to delegate the ability to reset locked accounts to non-superusers. > > > >I'd want to think about it a bit more before settling on using pg_authid > > I guess it's a question of how durable we want it to be. We could > conceivable keep it in shared memory and let it wipe on a crash. > > But we already have code that ignores MVCC on a catalog table (IIRC > for updating pg_class stats after vacuum) so the pattern is there. I > don't see that we need more sophistication than that... I'm not sure we should jump to that immediately.. > >to track the data. In any case, I do think we need a way to disable > >this ability for certain roles > > In the interest of something for this release... do we really need > that? My thought is we just special-case the postgres user and be > done with it. Though, if there's some other way to reset an account > from the shell, no need to even special case postgres. I don't think this is going to happen for 9.5 unless someone shows up with code in the *very* short term.. Further, realistically, we would want to design this properly and not just hack something together. > Though, I guess if we just follow the normal GUC behavior of > allowing per-database and -user overrides it wouldn't be that hard. Yes, using the GUC-based approach would allow users to be excluded from an overall (or per-database) policy. Thanks, Stephen signature.asc Description: Digital signature
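A sketch of how the GUC-based policy discussed above might look; allowed_authentication_failures is a hypothetical name borrowed from this thread, not an actual setting:

    -- cluster-wide default (hypothetical GUC)
    ALTER SYSTEM SET allowed_authentication_failures = 10;
    -- opt a pooled application role out of failed-login tracking
    ALTER ROLE app_pool SET allowed_authentication_failures = -1;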
Re: [HACKERS] Idea: closing the loop for "pg_ctl reload"
On 3/4/15 7:13 PM, Jan de Visser wrote: On March 4, 2015 11:08:09 PM Andres Freund wrote: Let's get the basic feature (notification of failed reloads) done first. That will be required with or without including the error message. Then we can get more fancy later, if somebody really wants to invest the time. Except for also fixing pg_reload_conf() to pay attention to what happens with postmaster.pid, the patch I submitted does exactly what you want, I think. And I don't mind spending time on stuff like this. I'm not smart enough to deal with actual, you know, database technology. Yeah, let's at least get this wrapped and we can see about improving it. I like the idea of doing a here-doc or similar in the .pid, though I think it'd be sufficient to just prefix all the continuation lines with a tab. An uglier option would be just stripping the newlines out. -- Jim Nasby, Data Architect, Blue Treble Consulting Data in Trouble? Get it in Treble! http://BlueTreble.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Question about lazy_space_alloc() / linux over-commit
On 3/4/15 9:10 AM, Robert Haas wrote: On Wed, Feb 25, 2015 at 5:06 PM, Jim Nasby wrote: Could the large allocation[2] for the dead tuple array in lazy_space_alloc cause problems with linux OOM? [1] and some other things I've read indicate that a large mmap will count towards total system memory, including producing a failure if overcommit is disabled. I believe that this is possible. Would it be worth avoiding the full size allocation when we can? Maybe. I'm not aware of any evidence that this is an actual problem as opposed to a theoretical one. vacrelstats->dead_tuples is limited to a 1GB allocation, which is not a trivial amount of memory, but it's not huge, either. But we could consider changing the representation from a single flat array to a list of chunks, with each chunk capped at say 64MB. That would not only reduce the amount of memory that we I was thinking the simpler route of just repalloc'ing... the memcpy would suck, but much less so than the extra index pass. 64M gets us 11M tuples, which probably isn't very common. needlessly allocate, but would allow autovacuum to make use of more than 1GB of maintenance_work_mem, which it looks like it currently can't. I'm not sure that's a huge problem right now either, because I'm confused... how is autovacuum special in this regard? Each worker can use up to 1G, just like a regular vacuum, right? Or are you just saying getting rid of the 1G limit would be good? it's probably rare to vacuum a table with more than 1GB / 6 = 178,956,970 dead tuples in it, but it would certainly suck if you did and if the current 1GB limit forced you to do multiple vacuum passes. Yeah, with 100+ GB machines not that uncommon today perhaps it's worth significantly upping this. -- Jim Nasby, Data Architect, Blue Treble Consulting Data in Trouble? Get it in Treble! http://BlueTreble.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Bug in pg_dump
On Wed, Mar 4, 2015 at 2:03 PM, Michael Paquier wrote: > On Wed, Mar 4, 2015 at 6:48 AM, Peter Eisentraut wrote: >> - set up basic scaffolding for TAP tests in src/bin/pg_dump > > Agreed. > >> - write a Perl function that can create an extension on the fly, given >> name, C code, SQL code > > I am perplexed about that. Where would the SQL code or C code be stored? > In the pl script itself? I cannot really see the advantage of automatically > generating the skeleton of an extension from some C or SQL > code in comparison to storing the extension statically as-is. Adding > those extensions to src/test/modules is out of scope, to avoid bloating it, > so we could for example add such test extensions in t/extensions/ or > similar, and have prove_check scan this folder, then install those > extensions in the temporary installation. > >> - add to the proposed t/001_dump_test.pl code to write the extension >> - add that test to the pg_dump test suite >> Eventually, the dump-and-restore routine could also be refactored, but >> as long as we only have one test case, that can wait. > > Agreed on all those points. Please note that I have created a new thread especially for this purpose here: http://www.postgresql.org/message-id/CAB7nPqRx=zmbfjyjrwhguhhqk__8m+wd+p95cenjtmhxxr7...@mail.gmail.com Perhaps we should move this discussion there, as it is rather independent of the problem that has been reported. Regards, -- Michael -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] INSERT ... ON CONFLICT IGNORE (and UPDATE) 3.0
On Wed, Mar 4, 2015 at 5:18 PM, Peter Geoghegan wrote: > Attached patch series forms what I'm calling V3.0 of the INSERT ... ON > CONFLICT IGNORE/UPDATE feature. (Sorry about all the threads. I feel > this development justifies a new thread, though.) Bruce Momjian kindly made available a server for stress-testing [1]. I'm using jjanes_upsert for this task (I stopped doing this for a little while, and was in need of a new server). At the very highest client count I'm testing (128), I see unprincipled deadlocks for the exclusion constraint case very infrequently: " 2015-03-05 14:09:36 EST [ 64987901 ]: ERROR: deadlock detected 2015-03-05 14:09:36 EST [ 64987901 ]: DETAIL: Process 7044 waits for ShareLock on promise tuple with token 1 of transaction 64987589; blocked by process 7200. Process 7200 waits for ShareLock on transaction 64987901; blocked by process 7044. Process 7044: insert into upsert_race_test (index, count) values ('541','-1') on conflict update set count=TARGET.count + EXCLUDED.count where TARGET.index = EXCLUDED.index returning count Process 7200: insert into upsert_race_test (index, count) values ('541','-1') on conflict update set count=TARGET.count + EXCLUDED.count where TARGET.index = EXCLUDED.index returning count 2015-03-05 14:09:36 EST [ 64987901 ]: HINT: See server log for query details. 2015-03-05 14:09:36 EST [ 64987901 ]: STATEMENT: insert into upsert_race_test (index, count) values ('541','-1') on conflict update set count=TARGET.count + EXCLUDED.count where TARGET.index = EXCLUDED.index returning count " (Reminder: Exclusion constraints doing UPSERTs, and not just IGNOREs, are artificial; this has been enabled within the parser solely for the benefit of this stress-testing. Also, the B-Tree AM neither has nor requires "livelock insurance"). This only happens after just over 30 minutes, while consistently doing 128 client exclusion constraint runs. This is pretty close to the most stressful thing that you could throw at the implementation, so that's really not too bad. I believe that this regression occurred as a direct result of adding "livelock insurance". Basically, we cannot be 100% sure that the interleaving of WAL-logged things within and across transactions won't be such that the "only", "oldest" session that gets to wait in the second stage (the second possible call to check_exclusion_or_unique_constraint(), from ExecInsertIndexTuples()) will really be the "oldest" XID. Another *older* xact could just get in ahead of us, waiting on our promise tuple as we wait on its xact end (maybe it updates some third tuple that we didn't see that has already committed... not 100% sure). Obviously XID assignment order does not guarantee that things like heap insertion and index tuple insertion occur in serial XID order, especially with confounding factors like super deletion. And so, every once in a long while we deadlock. Now, the very fact that this happens at all actually demonstrates the need for "livelock insurance", IMV. The fact that we reliably terminate is *reassuring*, because livelocks are in general a terrifying possibility. We cannot *truly* solve the unprincipled deadlock problem without adding livelocks, I think. But what we have here is very close to eliminating unprincipled deadlocks, while not also adding any livelocks, AFAICT. I'd argue that that's good enough. 
Of course, when I remove "livelock insurance", the problem ostensibly goes away (I've waited on such a stress test session for a couple of hours now, so this conclusion seems very likely to be correct). I think that we should do nothing about this, other than document it as possible in our comments on "livelock insurance". Again, it's very unlikely to be a problem in the real world, especially since B-Trees are unaffected. Also, I should point out that one of those waiters doesn't look like an insertion-related wait at all: "7200 waits for ShareLock on transaction 64987901; blocked by process 7044". Looks like row locking is an essential part of this deadlock, and ordinarily that isn't even possible for exclusion constraints (unless the patch is hacked to make the parser accept exclusion constraints for an UPSERT, as it has been here). Not quite sure what exact XactLockTableWait() call did this, but don't think it was any of them within check_exclusion_or_unique_constraint(), based on the lack of details (TID and so on are not shown). [1] http://momjian.us/main/blogs/pgblog/2012.html#January_20_2012 -- Peter Geoghegan -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
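For reference, the statement the stress test loops on, reformatted from the deadlock log above (this is the patch's pre-commit grammar; the ON CONFLICT spelling and the TARGET/EXCLUDED aliases were still under discussion at this point):

    INSERT INTO upsert_race_test (index, count)
    VALUES ('541', '-1')
    ON CONFLICT UPDATE
        SET count = TARGET.count + EXCLUDED.count
        WHERE TARGET.index = EXCLUDED.index
    RETURNING count;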
Re: [HACKERS] object description for FDW user mappings
On 06-03-2015 AM 01:32, Tom Lane wrote: > Alvaro Herrera writes: >> appendStringInfo(&buffer, _("user mapping for %s in server %s"), >> usename, >> srv->servername); > > +1 for the concept, but to be nitpicky, "in" doesn't seem like the right > word here. "on server" would read better to me; or perhaps "at server". > One more option may be "for server" (reading the doc for CREATE USER MAPPING) Thanks, Amit -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] object description for FDW user mappings
On 06-03-2015 AM 09:18, Amit Langote wrote: > On 06-03-2015 AM 01:32, Tom Lane wrote: >> +1 for the concept, but to be nitpicky, "in" doesn't seem like the right >> word here. "on server" would read better to me; or perhaps "at server". >> > > One more option may be "for server" (reading the doc for CREATE USER MAPPING) Oh, I see it's been done that way already. Thanks, Amit -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
[HACKERS] Rethinking pg_dump's function sorting code
In bug #12832 Marko Tiikkaja points out that commit 7b583b20b1c95acb621c71251150beef958bb603 created a rather unnecessary dump failure hazard, since it applies pg_get_function_identity_arguments() to every function in the database, even those that won't get dumped. I think we should fix this by getting rid of pg_dump's use of that function altogether. A low-tech way to sort functions of identical names would be to compare argument type OIDs, as in the attached simple patch. If people feel it's important to avoid depending on numerical OID order, we could instead look up type names locally and compare them as in the attached less-simple patch. (Both patches neglect reverting the data collection aspects of the prior commit, since that's mechanical; the only interesting part is what we'll do to sort.) Neither patch will exactly preserve the sort behavior of the current code, but I don't think that's important. Comments? regards, tom lane diff --git a/src/bin/pg_dump/pg_dump_sort.c b/src/bin/pg_dump/pg_dump_sort.c index 4b9bba0..816c9f0 100644 *** a/src/bin/pg_dump/pg_dump_sort.c --- b/src/bin/pg_dump/pg_dump_sort.c *** DOTypeNameCompare(const void *p1, const *** 291,303 { FuncInfo *fobj1 = *(FuncInfo *const *) p1; FuncInfo *fobj2 = *(FuncInfo *const *) p2; cmpval = fobj1->nargs - fobj2->nargs; if (cmpval != 0) return cmpval; ! cmpval = strcmp(fobj1->proiargs, fobj2->proiargs); ! if (cmpval != 0) ! return cmpval; } else if (obj1->objType == DO_OPERATOR) { --- 291,307 { FuncInfo *fobj1 = *(FuncInfo *const *) p1; FuncInfo *fobj2 = *(FuncInfo *const *) p2; + int i; cmpval = fobj1->nargs - fobj2->nargs; if (cmpval != 0) return cmpval; ! for (i = 0; i < fobj1->nargs; i++) ! { ! cmpval = oidcmp(fobj1->argtypes[i], fobj2->argtypes[i]); ! if (cmpval != 0) ! return cmpval; ! } } else if (obj1->objType == DO_OPERATOR) { diff --git a/src/bin/pg_dump/pg_dump_sort.c b/src/bin/pg_dump/pg_dump_sort.c index 4b9bba0..de9407c 100644 *** a/src/bin/pg_dump/pg_dump_sort.c --- b/src/bin/pg_dump/pg_dump_sort.c *** DOTypeNameCompare(const void *p1, const *** 291,303 { FuncInfo *fobj1 = *(FuncInfo *const *) p1; FuncInfo *fobj2 = *(FuncInfo *const *) p2; cmpval = fobj1->nargs - fobj2->nargs; if (cmpval != 0) return cmpval; ! cmpval = strcmp(fobj1->proiargs, fobj2->proiargs); ! if (cmpval != 0) ! return cmpval; } else if (obj1->objType == DO_OPERATOR) { --- 291,313 { FuncInfo *fobj1 = *(FuncInfo *const *) p1; FuncInfo *fobj2 = *(FuncInfo *const *) p2; + int i; cmpval = fobj1->nargs - fobj2->nargs; if (cmpval != 0) return cmpval; ! for (i = 0; i < fobj1->nargs; i++) ! { ! TypeInfo *argtype1 = findTypeByOid(fobj1->argtypes[i]); ! TypeInfo *argtype2 = findTypeByOid(fobj2->argtypes[i]); ! ! if (argtype1 && argtype2) ! { ! cmpval = strcmp(argtype1->dobj.name, argtype2->dobj.name); ! if (cmpval != 0) ! return cmpval; ! } ! } } else if (obj1->objType == DO_OPERATOR) { -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
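For context, this is the sort of per-function catalog call that the prior commit made pg_dump issue for every function, shown here only to illustrate what pg_get_function_identity_arguments() returns; the sample function name is arbitrary:

    SELECT p.oid::regprocedure,
           pg_get_function_identity_arguments(p.oid)
    FROM pg_proc p
    WHERE p.proname = 'substring';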
Re: [HACKERS] object description for FDW user mappings
Amit Langote writes: > On 06-03-2015 AM 01:32, Tom Lane wrote: >> +1 for the concept, but to be nitpicky, "in" doesn't seem like the right >> word here. "on server" would read better to me; or perhaps "at server". > One more option may be "for server" (reading the doc for CREATE USER MAPPING) Hm, but then you'd have "user mapping for foo for server bar", which doesn't read so nicely either. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] object description for FDW user mappings
On 06-03-2015 AM 09:30, Tom Lane wrote: > Amit Langote writes: > >> One more option may be "for server" (reading the doc for CREATE USER MAPPING) > > Hm, but then you'd have "user mapping for foo for server bar", which > doesn't read so nicely either. > Yeah, I had totally missed the "for foo" part; I was thinking of it as "of foo". By the way, in this case, is "foo" the name/id of a local user or does it really refer to some "foo on the remote server"? Thanks, Amit -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] object description for FDW user mappings
Amit Langote writes: > By the way, in this case, is "foo" the name/id of a local user or does it > really refer to some "foo on the remote server"? It's the name of a local user. I see your point that somebody might misread this as suggesting that it's a remote username, but not sure that there's anything great we can do here to disambiguate that. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Join push-down support for foreign tables
> Actually val and val2 come from public.lt in "r" side, but as you say > it's too difficult to know that from EXPLAIN output. Do you have any > idea to make the "Output" item more readable? > A fundamental reason why we need to have symbolic aliases here is that postgres_fdw has remote query in cstring form. It makes implementation complicated to deconstruct/construct a query that is once constructed on the underlying foreign-path level. If ForeignScan keeps items to construct remote query in expression node form (and construction of remote query is delayed to beginning of the executor, probably), we will be able to construct more human readable remote query. However, I don't recommend to work on this great refactoring stuff within the scope of join push-down support project. Thanks, -- NEC OSS Promotion Center / PG-Strom Project KaiGai Kohei > -Original Message- > From: pgsql-hackers-ow...@postgresql.org > [mailto:pgsql-hackers-ow...@postgresql.org] On Behalf Of Shigeru Hanada > Sent: Thursday, March 05, 2015 10:00 PM > To: Ashutosh Bapat > Cc: Kaigai Kouhei(海外 浩平); Robert Haas; PostgreSQL-development > Subject: Re: [HACKERS] Join push-down support for foreign tables > > Hi Ashutosh, thanks for the review. > > 2015-03-04 19:17 GMT+09:00 Ashutosh Bapat : > > In create_foreignscan_path() we have lines like - > > 1587 pathnode->path.param_info = get_baserel_parampathinfo(root, rel, > > 1588 > > required_outer); > > Now, that the same function is being used for creating foreign scan paths > > for joins, we should be calling get_joinrel_parampathinfo() on a join rel > > and get_baserel_parampathinfo() on base rel. > > Got it. Please let me check the difference. > > > > > The patch seems to handle all the restriction clauses in the same way. There > > are two kinds of restriction clauses - a. join quals (specified using ON > > clause; optimizer might move them to the other class if that doesn't affect > > correctness) and b. quals on join relation (specified in the WHERE clause, > > optimizer might move them to the other class if that doesn't affect > > correctness). The quals in "a" are applied while the join is being computed > > whereas those in "b" are applied after the join is computed. For example, > > postgres=# select * from lt; > > val | val2 > > -+-- > >1 |2 > >1 |3 > > (2 rows) > > > > postgres=# select * from lt2; > > val | val2 > > -+-- > >1 |2 > > (1 row) > > > > postgres=# select * from lt left join lt2 on (lt.val2 = lt2.val2); > > val | val2 | val | val2 > > -+--+-+-- > >1 |2 | 1 |2 > >1 |3 | | > > (2 rows) > > > > postgres=# select * from lt left join lt2 on (true) where (lt.val2 = > > lt2.val2); > > val | val2 | val | val2 > > -+--+-+-- > >1 |2 | 1 |2 > > (1 row) > > > > The difference between these two kinds is evident in case of outer joins, > > for inner join optimizer puts all of them in class "b". The remote query > > sent to the foreign server has all those in ON clause. Consider foreign > > tables ft1 and ft2 pointing to local tables on the same server. 
> > postgres=# \d ft1 > > Foreign table "public.ft1" > > Column | Type | Modifiers | FDW Options > > +-+---+- > > val| integer | | > > val2 | integer | | > > Server: loopback > > FDW Options: (table_name 'lt') > > > > postgres=# \d ft2 > > Foreign table "public.ft2" > > Column | Type | Modifiers | FDW Options > > +-+---+- > > val| integer | | > > val2 | integer | | > > Server: loopback > > FDW Options: (table_name 'lt2') > > > > postgres=# explain verbose select * from ft1 left join ft2 on (ft1.val2 = > > ft2.val2) where ft1.val + ft2.val > ft1.val2 or ft2.val is null; > > > > QUERY PLAN > > > > > > --- > > > > > > Foreign Scan (cost=100.00..125.60 rows=2560 width=16) > >Output: val, val2, val, val2 > >Remote SQL: SELECT r.a_0, r.a_1, l.a_0, l.a_1 FROM (SELECT val, val2 FROM > > public.lt2) l (a_0, a_1) RIGHT JOIN (SELECT val, val2 FROM public.lt) r (a > > _0, a_1) ON r.a_0 + l.a_0) > r.a_1) OR (l.a_0 IS NULL))) AND ((r.a_1 = > > l.a_1)) > > (3 rows) > > > > The result is then wrong > > postgres=# select * from ft1 left join ft2 on (ft1.val2 = ft2.val2) where > > ft1.val + ft2.val > ft1.val2 or ft2.val is null; > > val | val2 | val | val2 > > -+--+-+-- > >1 |2 | | > >1 |3 | | > > (2 rows) > > > > which should match the result obtained by substituting local tables for > > foreign ones > > postgres=# select * from lt left join lt2 on (lt.val2 = lt2.val2) where > > l
Re: [HACKERS] object description for FDW user mappings
Tom Lane-2 wrote > Amit Langote < > Langote_Amit_f8@.co > > writes: >> By the way, in this case, is "foo" the name/id of a local user or does it >> really refer to some "foo on the remote server"? > > It's the name of a local user. I see your point that somebody might > misread this as suggesting that it's a remote username, but not sure > that there's anything great we can do here to disambiguate that. Yeah, clarifying that point seems to add verbosity for little gain. On the message aspect is it against style to use composite notation in a case like this? "user mapping (%s, %s)" It is basically a single, if compound, noun so adding in/at/to doesn't feel right... David J. -- View this message in context: http://postgresql.nabble.com/object-description-for-FDW-user-mappings-tp5840655p5840729.html Sent from the PostgreSQL - hackers mailing list archive at Nabble.com. -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Weirdly pessimistic estimates in optimizer
On 2/28/15 12:01 PM, David Kubečka wrote: With 'random_fk_dupl': -> Index Scan using facts_fk_idx on facts (cost=0.42..5.75 rows=100 width=15) (actual time=0.009..0.117 rows=98 loops=100) With 'random_fk_uniq': -> Index Scan using facts_fk_idx on facts (cost=0.42..214.26 rows=100 width=15) (actual time=0.007..0.109 rows=98 loops=100) I have read the optimizer README file and also looked briefly at the code, but this seems to be something not related to the particular algorithm implementation (e.g. nested loop). Perhaps it's the way cost estimates are propagated down (or sideways? that would be weird...) the query tree. But I am really not sure, since this is my first time looking at the optimizer code base. I should also add that I have reproduced this behaviour for all versions of Pg from 9.2 up to current devel. This got answered on one of the other lists, right? -- Jim Nasby, Data Architect, Blue Treble Consulting Data in Trouble? Get it in Treble! http://BlueTreble.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Proposal : REINDEX xxx VERBOSE
On 3/2/15 10:58 AM, Sawada Masahiko wrote: On Wed, Feb 25, 2015 at 4:58 PM, Jim Nasby wrote: On 2/24/15 8:28 AM, Sawada Masahiko wrote: According to the above discussion, VACUUM and REINDEX should have trailing options. Tom seems (to me) to be suggesting SQL-style (bare word preceded by WITH) options, and Jim '()' style options? (Anyway VACUUM gets the *third additional* option style, but that is a different discussion from this) Well, almost everything does a trailing WITH. We need to either stick with that for consistency, or add leading () as an option to those WITH commands. Does anyone know why those are WITH? Is it ANSI? As a refresher, current commands are: VACUUM (ANALYZE, VERBOSE) table1 (col1); REINDEX INDEX index1 FORCE; COPY table1 FROM 'file.txt' WITH (FORMAT csv); CREATE MATERIALIZED VIEW mv1 WITH (storageparam, ...) AS qry WITH DATA; CREATE EXTENSION ext1 WITH SCHEMA s1 VERSION v1 FROM over; CREATE ROLE role WITH LOGIN; GRANT WITH GRANT OPTION; CREATE VIEW v1 AS qry WITH CASCADED CHECK OPTION; ALTER DATABASE db1 WITH CONNECTION LIMIT 50; DECLARE c1 INSENSITIVE SCROLL CURSOR WITH HOLD; BTW, I'm fine with Tom's bare-word with WITH idea. That seems to be the most consistent with everything else. Is there a problem with doing that? I know getting syntax is one of the hard parts of new features, but it seems like we reached consensus here... We have discussed this option, including the FORCE option, but I think there are no users who want to use both the FORCE and VERBOSE options at the same time. I find that very hard to believe... I would expect a primary use case for VERBOSE to be "I ran REINDEX, but it doesn't seem to have done anything... what's going on?" and that's exactly when you might want to use FORCE. -- Jim Nasby, Data Architect, Blue Treble Consulting Data in Trouble? Get it in Treble! http://BlueTreble.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
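The two option styles under discussion, sketched side by side; neither form was committed syntax at this point:

    REINDEX INDEX index1 WITH VERBOSE;   -- trailing bare-word WITH style
    REINDEX (VERBOSE) INDEX index1;      -- leading parenthesized style, as in VACUUM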
Re: [HACKERS] Configurable location for extension .control files
On 3/4/15 1:41 AM, Oskari Saarenmaa wrote: >I'm interested in this because it could potentially make it possible to >install SQL extensions without OS access. (My understanding is there's >some issue with writing the extension files even if LIBDIR is writable >by the OS account). I'm not sure this patch makes extensions without OS access any easier to implement; you still need to write the files somewhere, and someone needs to set up the cluster with a nonstandard extension path. Right, I wasn't expecting it to work out of the box. Admittedly I'm foggy on what the exact problem with pushing a file into that directory via SQL is, so maybe it still won't help. >>I don't think anyone should be packaging and shipping PG with >>extension_directory set, but we really should allow it for superusers >>and postgresql.conf so people can test extensions without hacks like >>this:https://github.com/ohmu/pgmemcache/blob/master/localtests.sh#L23 > >FWIW what I'd like to see is breaking the cluster setup/teardown capability >in pg_regress into its own tool. It wouldn't solve the extension test >problem, but it would eliminate the need for most of what that script is >doing, and it would do it more robustly. It would make it very easy to >unit test things with frameworks other than pg_regress. This would certainly be useful. I can try to do something about it if we get configurable extension path supported. The patch is now in https://commitfest.postgresql.org/5/170/ Awesome! Let me know if I can help. Or I may start on it if I can get some other stuff off my plate. By the way, after thinking about this a little more, I think a utility for standing up "temporary" Postgres clusters would be very welcome by power users. It's a very common need for QA environments. Originally I was thinking about it in terms of things like extension testing, but now I realize there's a much larger audience than that. So I definitely think this should be a standalone shell command (pg_temp_cluster?), and either have pg_regress call it or (probably more logical) add it to the make files as a dependency for make check (make temp-cluster/remove-temp-cluster or similar). -- Jim Nasby, Data Architect, Blue Treble Consulting Data in Trouble? Get it in Treble! http://BlueTreble.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
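A sketch of the proposed knob in use; extension_directory is the setting proposed in this thread, not an existing GUC:

    -- superuser-only, per the proposal
    SET extension_directory = '/home/dev/test-extensions';
    CREATE EXTENSION pgmemcache;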
Re: [HACKERS] anyarray
On 3/4/15 10:28 PM, Tom Lane wrote: > Peter Eisentraut writes: >> On 2/13/15 10:20 AM, Teodor Sigaev wrote: >>> Some users of the intarray contrib module wish to use its features with >>> another kind of arrays, not only for the int4 type. Suggested module >>> generalizes intarray over other (not all) types of pgsql. > >> I think this module should be merged with the intarray module. Having >> two modules with very similar functionality would be confusing. > > Perhaps. I think it would be hard to remove intarray without breaking > things for existing users of it; even if the functionality remains under > another name. And surely we don't want to generalize intarray while > keeping that same name. So it might be hard to get to a clean solution. Maybe we could have the anyarray module (name TBD) install two extensions: one of its own, and one that is named intarray that just pulls anyarray in as a dependency. There are more fundamental questions to be answered about the functionality of this proposal first, however. -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Weirdly pessimistic estimates in optimizer
On 3/5/15 7:58 PM, Jim Nasby wrote: This got answered on one of the other lists, right? That was supposed to be off-list. I'll answer my own question: yes. Sorry for the noise. :( -- Jim Nasby, Data Architect, Blue Treble Consulting Data in Trouble? Get it in Treble! http://BlueTreble.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] pg_upgrade and rsync
On Thu, Mar 5, 2015 at 10:55:28AM +0300, Vladimir Borodin wrote: > You are correct that a pg_controldata file is copied over that has > wal_level=minimal, but that should not be a problem. > > > I suppose, this is the root cause of why replica does not start as hot > standby. It is enough to start it as warm standby, but not hot standby. > See CheckRequiredParameterValues function in xlog.c which is called inside of > StartupXLOG function. Yes, you are correct. I spent all day building a test harness so I could automate this setup and test various failures. I was able to reproduce your failure, and you are correct that the proper fix is to set wal_level=hot_standby on the new master, and then start and stop the new cluster just before rsync. The root cause is that pg_upgrade calls pg_resetxlog -o on the new cluster _after_ the new cluster stopped for the final time, so rsync is copying the incorrect pg_controldata wal_level value. Also, even if pg_resetxlog preserved wal_level in the control file, there is no guarantee that the user configured the new cluster's wal_level for hot_standby anyway. What I have done is to update the pg_upgrade instructions to add this required step. Updated doc patch attached. (I also added the --delete flag to rsync.) Thanks so much for your detailed report. > But it could not be done with --size-only key, because control-file is > of fixed > size and rsync would skip it. Or maybe everything should be copied > with > --size-only and control-file should be copied without this option. > > > Well, what happens is that there is no _new_ standby pg_controldata > file, so it is copied fully from the new master. Are you running initdb > to create the new standby --- you shouldn't be doing that as the rsync > will do that for you. > > > No, I don’t. The scenario of the problem with copying control-file was in case > when I: > 1. ran pg_upgrade on master and got control-file with "wal_level = minimal", > 2. did rsync --size-only to replica (and it got this control-file with > "wal_level = minimal"), > 3. started and stopped postgres on master to get «good» control-file with > "wal_level = hot_standby», > 4. did rsync --size-only to replica one more time. And this time control-file > is not copied because of the same size of control-file. > > Actually, if you don’t do step 2, everything works as expected. Sorry for > bothering you. Ah, yes, I think doing rsync twice is never a good suggestion. It can lead to too many failures. Doing the start/stop before rsync seems like the best solution. -- Bruce Momjian http://momjian.us EnterpriseDB http://enterprisedb.com + Everyone has their own god. + diff --git a/doc/src/sgml/backup.sgml b/doc/src/sgml/backup.sgml new file mode 100644 index 07ca0dc..e25e0d0 *** a/doc/src/sgml/backup.sgml --- b/doc/src/sgml/backup.sgml *** tar -cf backup.tar /usr/local/pgsql/data *** 438,445 Another option is to use rsync to perform a file system backup. This is done by first running rsync while the database server is running, then shutting down the database !server just long enough to do a second rsync. The !second rsync will be much quicker than the first, because it has relatively little data to transfer, and the end result will be consistent because the server was down. This method allows a file system backup to be performed with minimal downtime. --- 438,447 Another option is to use rsync to perform a file system backup. 
This is done by first running rsync while the database server is running, then shutting down the database !server long enough to do an rsync --checksum. !(--checksum is necessary because rsync only !has file modification-time granularity of one second.) The !second rsync will be quicker than the first, because it has relatively little data to transfer, and the end result will be consistent because the server was down. This method allows a file system backup to be performed with minimal downtime. diff --git a/doc/src/sgml/pgupgrade.sgml b/doc/src/sgml/pgupgrade.sgml new file mode 100644 index e1cd260..56cb45d *** a/doc/src/sgml/pgupgrade.sgml --- b/doc/src/sgml/pgupgrade.sgml *** NET STOP postgresql-8.4 *** 315,320 --- 315,324 NET STOP postgresql-9.0 + + + Log-shipping standby servers can remain running until a later step. + *** pg_upgrade.exe *** 399,404 --- 403,535 + Upgrade any Log-Shipping Standby Servers + + + If you have Log-Shipping Standby Servers (), follow these steps to upgrade them (before + starting any servers): + + + + + + Install the new PostgreSQL binaries on standby servers + + +Make sure the new binaries and support fi
Re: [HACKERS] Providing catalog view to pg_hba.conf file - Patch submission
On 3/5/15 9:42 AM, Greg Stark wrote: > Well if you want to read the file as is you can do so using the file > reading functions which afaik were specifically intended for the > purpose of writing config editing tools. Sure, but those things are almost never installed by default, and I don't want to install them if they provide easy write access, and they don't help you find the actual file. This proposal is much nicer. > Joining against other catalog tables may be kind of exaggerated but if > I have data in an SQL view I certainly expect to be able to write > WHERE clauses to find the rows I want without having to do string > parsing. If it were a comma-delimited string I have to do something > like WHERE users LIKE '%,username,%' but then that's not good enough > since it may be the first or last and the username itself may contain > white-space or a comma etc etc. The point is, it should be one or the other (or both), not something in the middle. It's either a textual representation of the file or a semantic one. If it's the latter, then all user names, group names, and special key words need to be resolved in some way that they cannot be confused with unfortunately named user names. And there needs to be a way that we can add more in the future. -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
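To illustrate the difference from the LIKE-based string matching quoted above, a semantic representation could expose array-valued columns that are queried directly; pg_hba_settings and its column names here are hypothetical:

    SELECT line_number, type, database, user_name, address, auth_method
    FROM pg_hba_settings
    WHERE 'alice' = ANY (user_name);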
Re: [HACKERS] [REVIEW] Re: Compression of full-page-writes
On Thu, Mar 5, 2015 at 10:28 PM, Andres Freund wrote: > On 2015-03-05 12:14:04 +, Syed, Rahila wrote: >> Please find attached a patch. As discussed, flag to denote >> compression and presence of hole in block image has been added in >> XLogRecordImageHeader rather than block header. > > FWIW, I personally won't commit it with things done that way. I think > it's going the wrong way, leading to a harder to interpret and less > flexible format. I'm not going to further protest if Fujii or Heikki > commit it this way though. I'm pretty sure that we can discuss the *better* WAL format even after committing this patch. Regards, -- Fujii Masao -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Strange assertion using VACOPT_FREEZE in vacuum.c
On Fri, Mar 6, 2015 at 12:42 AM, Robert Haas wrote: > On Thu, Mar 5, 2015 at 9:58 AM, Alvaro Herrera wrote: >> Here's a simple idea to improve the patch: make VacuumParams include >> VacuumStmt and the for_wraparound and do_toast flags. ISTM that reduces >> the number of arguments to be passed down in a couple of places. In >> particular: >> >> vacuum(VacuumParams *params, BufferAccessStrategy bstrategy, bool isTopLevel) >> >> vacuum_rel(VacuumParams *params) >> >> Does that sound more attractive? > > I dislike passing down parser nodes straight into utility commands. > It tends to make those functions hard for in-core users to call, > and also to lead to security vulnerabilities where we look up the same > names more than once and just hope that we get the same OID every > time. Stuffing the VacuumStmt pointer inside the VacuumParams object > doesn't, for me, help anything. It'd be a lot more interesting if we > could get rid of that altogether. Do you mean totally removing VacuumStmt from the stack? We would then need to add relation and va_cols as additional arguments of things like vacuum_rel, analyze_rel, do_analyze_rel or similar. FWIW, adding do_toast and for_wraparound into VacuumParams makes sense to me, but not VacuumStmt. It has little meaning as VacuumParams should be used for parameters. -- Michael -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Table-level log_autovacuum_min_duration
On Thu, Mar 5, 2015 at 9:49 PM, Michael Paquier wrote: > On Thu, Mar 5, 2015 at 7:10 PM, Fujii Masao wrote: >> With the patch, VACUUM ANALYZE VERBOSE doesn't emit any verbose message. >> Why did you remove that functionality? > > Oops. Sorry about that. In gram.y, the combination of VacuumStmt with > AnalyzeStmt overwrote the value of log_min_duration incorrectly. I > also found another bug related to logging of ANALYZE not working > correctly because of the use of IsAutoVacuumWorkerProcess() instead of > VACOPT_VERBOSE (this is reducing the diffs of the patch btw). All > those things are fixed in the attached. Thanks for updating the patch! Why does log_min_duration need to be set even when a manual VACUUM command is executed? Per the latest version of the patch, log_min_duration is checked only when the process is an autovacuum worker. So ISTM that log_min_duration doesn't need to be set in gram.y. It's even confusing to me. Or if you're going to implement something like "VACUUM VERBOSE DURATION n" (i.e., the verbose message is output if n seconds have elapsed), that might be necessary, though... Regards, -- Fujii Masao -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
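For reference, the table-level form under review would be set as a storage parameter, roughly like this (a sketch; option spelling per the thread title, table names invented):

    ALTER TABLE busy_table SET (log_autovacuum_min_duration = 0);    -- log every autovacuum
    ALTER TABLE noisy_table SET (log_autovacuum_min_duration = -1);  -- never log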
Re: [HACKERS] Weirdly pessimistic estimates in optimizer
David Kubečka writes: > There is division by loop_count because of predicted effect of caching and > it is exactly this division which makes the run_cost for single index > lookup so low compared with the query version with random_fk_uniq. So the > main problem is that the planner calls cost_index with loop_count equal to > number of rows of inner table, *but it ignores semi-join semantics*, i.e. > it doesn't account for the number of *unique* rows in the inner table which > will actually be the number of loops. Good point. > Since this is my first time looking into the optimizer (and in fact any > postgres) code I am not yet able to locate the exact place where this > should be repaired, but I hope that in few days I will have a patch :-) Hm, this might not be the best problem to tackle for your first Postgres patch :-(. The information about the estimated number of unique rows isn't readily available at the point where we're creating parameterized paths. When we do compute such an estimate, it's done like this: pathnode->path.rows = estimate_num_groups(root, uniq_exprs, rel->rows); where uniq_exprs is a list of right-hand-side expressions we've determined belong to the semijoin, and rel->rows represents the raw size of the semijoin's RHS relation (which might itself be a join). Neither of those things are available to create_index_path. I chewed on this for awhile and decided that there'd be no real harm in taking identification of the unique expressions out of create_unique_path() and doing it earlier, in initsplan.c; we'd need a couple more fields in SpecialJoinInfo but that doesn't seem like a problem. However, rel->rows is a *big* problem; we simply have not made any join size estimates yet, and can't, because these things are done bottom up. However ... estimate_num_groups's dependency on its rowcount input is not large (it's basically using it as a clamp). So conceivably we could have get_loop_count just multiply together the sizes of the base relations included in the semijoin's RHS to get a preliminary estimate of that number. This would be the right thing anyway for a single relation in the RHS, which is the most common case. It would usually be an overestimate for join RHS, but we could hope that the output of estimate_num_groups wouldn't be affected too badly. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
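A sketch of the query shape from the original report (table and column names assumed from the EXPLAIN fragments upthread): the semijoin's inner-side duplicates inflate the loop_count handed to cost_index, and with it the assumed cache benefit of the parameterized index scan:

    EXPLAIN ANALYZE
    SELECT f.fk, sum(f.val)
    FROM facts f
    WHERE f.fk IN (SELECT fk FROM random_fk_dupl)
    GROUP BY f.fk;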
Re: [HACKERS] Combining Aggregates
On Thu, Mar 5, 2015 at 9:30 AM, Kouhei Kaigai wrote:
> > On Wed, Mar 4, 2015 at 4:41 AM, David Rowley wrote:
> > >> This thread mentions "parallel queries" as a use case, but that means
> > >> passing data between processes, and that requires being able to
> > >> serialize and deserialize the aggregate state somehow. For actual data
> > >> types that's not overly difficult I guess (we can use in/out
> > >> functions), but what about aggregates using 'internal' state? I.e.
> > >> aggregates passing pointers that we can't serialize?
> > >
> > > This is a good question. I really don't know the answer to it as I've
> > > not looked at Robert's API for passing data between backends yet.
> > >
> > > Maybe Robert or Amit can answer this?
> >
> > I think parallel aggregation will probably require both the
> > infrastructure discussed here, namely the ability to combine two
> > transition states into a single transition state, and also the ability
> > to serialize and de-serialize transition states, which has previously
> > been discussed in the context of letting hash aggregates spill to
> > disk. My current thinking is that the parallel plan will look
> > something like this:
> >
> > HashAggregateFinish
> > -> Funnel
> >    -> HashAggregatePartial
> >       -> PartialHeapScan
> >
> > So the workers will all pull from a partial heap scan and each worker
> > will aggregate its own portion of the data. Then it'll need to pass
> > the results of that step back to the master, which will aggregate the
> > partial results and produce the final results. I'm guessing that if
> > we're grouping on, say, column a, the HashAggregatePartial nodes will
> > kick out 2-column tuples of the form (<value of a>, <transition state
> > for the group with that value of a>).
>
> It may not be an urgent topic to be discussed towards v9.5, however,
> I'd like to raise a topic about the planner and aggregates.
>
> Once we get two-phase aggregation, the planner needs to pay attention
> to the possibility of partial aggregation during plan construction,
> even though our current implementation attaches the Agg node after the
> join/scan plan construction.
>
> Probably, a straightforward design is to add a FunnelPath with some
> child nodes in set_rel_pathlist() or add_paths_to_joinrel(). Its child
> node may be a PartialAggregate node (or some other parallel-safe node,
> of course). In this case, it must inform the planner that this node
> (more precisely, the targetlist of the node) returns a partial
> aggregate, instead of values row-by-row.
> The planner then needs to track which type of data will be returned to
> the upper node. The Path node may have a flag to show it, and we may
> also need to inject a dummy PartialAggregate if we try to join a
> relation that returns values row-by-row with another one that returns
> a partial aggregate.
> This leads to another problem. The RelOptInfo->targetlist will depend
> on the Path node chosen. Even if a float datum is expected as the
> argument of AVG(), the state to be combined is float[3]. So we will
> need to adjust the targetlist of the RelOptInfo once a Path is chosen.
>
> Anyway, I could identify at least these complicated issues around
> integrating two-phase aggregation with the planner. Can someone
> suggest a minimally invasive way to do this integration?
>
Postgres-XC solved this question by creating a plan with two Agg/Group nodes, one for combining transitioned results and one for creating the distributed transition results (one per distributed run per group).
So, the Agg/Group for combining results had as many Agg/Group nodes under it as there were distributed/parallel runs. But XC chose this way to reduce the code footprint. In Postgres, we can have different nodes for combining and transitioning, as you have specified above. Aggregation is not pathified in the current planner, hence XC took the approach of pushing the Agg nodes down the plan tree when distributed/parallel execution was possible. If we can get aggregation pathified, we can take a path-based approach, which might give a better judgement of whether or not to distribute the aggregates itself.

Looking at Postgres-XC might be useful to get ideas. I can help you there.

> Thanks,
> --
> NEC OSS Promotion Center / PG-Strom Project
> KaiGai Kohei

--
Best Wishes,
Ashutosh Bapat
EnterpriseDB Corporation
The Postgres Database Company
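To make the "combining" step concrete, a hedged sketch using the float8 AVG() example mentioned above: the working state is a 3-element array {N, sum(X), sum(X*X)}, so merging two partial states is just element-wise addition. A real combine function would be a proper aggregate support function; this shows only the core arithmetic.

/*
 * Merge partial state2 into state1 for a float8 avg/variance-style
 * aggregate whose transition state is {N, sum(X), sum(X*X)}.
 */
static void
combine_float8_accum(double *state1, const double *state2)
{
    state1[0] += state2[0];     /* N:        row counts add    */
    state1[1] += state2[1];     /* sum(X):   sums add          */
    state1[2] += state2[2];     /* sum(X*X): squared sums add  */
}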
Re: [HACKERS] Combining Aggregates
On 6 March 2015 at 19:01, Ashutosh Bapat wrote:
> Postgres-XC solved this question by creating a plan with two Agg/Group
> nodes, one for combining transitioned results and one for creating the
> distributed transition results (one per distributed run per group).
>
> So, the Agg/Group for combining results had as many Agg/Group nodes
> under it as there were distributed/parallel runs.

This sounds quite like the planner must force the executor to execute the plan on a fixed number of worker processes.

I had really hoped that we could, one day, have a load monitor process that decides the best number of workers to execute a parallel plan on. Otherwise, how would we decide how many worker processes to allocate to a plan? Surely there must be times when utilising only half of the processors for a query would be better than trying to use all processors and incurring many more context switches.

Probably the harder part about dynamically deciding the number of workers would be the costing: maybe the plan would execute fastest with 32 workers, but if it were only given 2 workers it might execute better as a non-parallel plan.

> But XC chose this way to reduce the code footprint. In Postgres, we can
> have different nodes for combining and transitioning, as you have
> specified above. Aggregation is not pathified in the current planner,
> hence XC took the approach of pushing the Agg nodes down the plan tree
> when distributed/parallel execution was possible. If we can get
> aggregation pathified, we can take a path-based approach, which might
> give a better judgement of whether or not to distribute the aggregates
> itself.
>
> Looking at Postgres-XC might be useful to get ideas. I can help you
> there.

Regards

David Rowley
Re: [HACKERS] pg_upgrade and rsync
> On 6 March 2015, at 6:11, Bruce Momjian wrote:
>
> On Thu, Mar 5, 2015 at 10:55:28AM +0300, Vladimir Borodin wrote:
>> You are correct that a pg_controldata file is copied over that has
>> wal_level=minimal, but that should not be a problem.
>>
>> I suppose this is the root cause of why the replica does not start as a
>> hot standby. It is enough to start it as a warm standby, but not a hot
>> standby. See the CheckRequiredParameterValues function in xlog.c, which
>> is called inside StartupXLOG.
>
> Yes, you are correct. I spent all day building a test harness so I
> could automate this setup and test various failures. I was able to
> reproduce your failure, and you are correct that the proper fix is to
> set wal_level=hot_standby on the new master, and then start and stop
> the new cluster just before the rsync.
>
> The root cause is that pg_upgrade calls pg_resetxlog -o on the new
> cluster _after_ the new cluster has been stopped for the final time, so
> rsync copies the incorrect pg_controldata wal_level value. Also, even
> if pg_resetxlog preserved wal_level in the control file, there is no
> guarantee that the user configured the new cluster's wal_level for
> hot_standby anyway.
>
> What I have done is to update the pg_upgrade instructions to add this
> required step. Updated doc patch attached. (I also added the --delete
> flag to rsync.)

Thanks so much for your detailed report. It seems to work fine now. The only thing that would be nice to change is to explicitly state that these instructions are correct for hot standby installations too:

+
+   If you have Log-Shipping Standby Servers (), follow
+   these steps to upgrade them (before starting any servers):
+

Actually, I entered this thread because that is not obvious from the paragraph above or anywhere else.

>> But it could not be done with the --size-only option, because the
>> control file is of fixed size and rsync would skip it. Or maybe
>> everything should be copied with --size-only and the control file
>> should be copied without this option.
>>
>> Well, what happens is that there is no _new_ standby pg_controldata
>> file, so it is copied fully from the new master. Are you running initdb
>> to create the new standby --- you shouldn't be doing that, as the rsync
>> will do that for you.
>>
>> No, I don't. The scenario of the problem with copying the control file
>> was when I:
>> 1. ran pg_upgrade on the master and got a control file with
>>    "wal_level = minimal",
>> 2. did rsync --size-only to the replica (and it got this control file
>>    with "wal_level = minimal"),
>> 3. started and stopped postgres on the master to get a "good" control
>>    file with "wal_level = hot_standby",
>> 4. did rsync --size-only to the replica one more time. And this time
>>    the control file was not copied because its size was unchanged.
>>
>> Actually, if you don't do step 2, everything works as expected. Sorry
>> for bothering you.
>
> Ah, yes, I think doing rsync twice is never a good suggestion. It can
> lead to too many failures. Doing the start/stop before the rsync seems
> like the best solution.
>
> --
> Bruce Momjian http://momjian.us
> EnterpriseDB http://enterprisedb.com
>
> + Everyone has their own god. +

--
May the force be with you…
https://simply.name
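For clarity, a hedged sketch of the ordering being agreed on here. The paths and standby host are hypothetical, and only the --size-only and --delete rsync flags come from this thread; the authoritative steps are in the doc patch.

# pg_upgrade leaves a pg_control with wal_level=minimal, so start and
# stop the new cluster once to record wal_level = hot_standby ...
pg_ctl -D /path/to/new_data -w start
pg_ctl -D /path/to/new_data -m fast stop

# ... and only then do a single rsync to the standby
rsync --archive --delete --size-only /path/to/new_data/ \
      standby.example.com:/path/to/new_data/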
Re: [HACKERS] Combining Aggregates
On Fri, Mar 6, 2015 at 12:41 PM, David Rowley wrote:
> On 6 March 2015 at 19:01, Ashutosh Bapat wrote:
>
>> Postgres-XC solved this question by creating a plan with two Agg/Group
>> nodes, one for combining transitioned results and one for creating the
>> distributed transition results (one per distributed run per group).
>>
>> So, the Agg/Group for combining results had as many Agg/Group nodes
>> under it as there were distributed/parallel runs.
>
> This sounds quite like the planner must force the executor to execute
> the plan on a fixed number of worker processes.
>
> I had really hoped that we could, one day, have a load monitor process
> that decides the best number of workers to execute a parallel plan on.
> Otherwise, how would we decide how many worker processes to allocate to
> a plan? Surely there must be times when utilising only half of the
> processors for a query would be better than trying to use all
> processors and incurring many more context switches.
>
> Probably the harder part about dynamically deciding the number of
> workers would be the costing: maybe the plan would execute fastest with
> 32 workers, but if it were only given 2 workers it might execute better
> as a non-parallel plan.

XC does that because it knew, at plan creation time, how many nodes the aggregation had to be distributed across. To keep that dynamic, we could add a place-holder planner node for producing transitioned results for a given distributed run. At execution time, that node would create one executor node (corresponding to the place-holder node) per parallel run. I haven't seen a precedent in PG code for creating more than one executor node for a given planner node, but is that a rule?

>> But XC chose this way to reduce the code footprint. In Postgres, we
>> can have different nodes for combining and transitioning, as you have
>> specified above. Aggregation is not pathified in the current planner,
>> hence XC took the approach of pushing the Agg nodes down the plan tree
>> when distributed/parallel execution was possible. If we can get
>> aggregation pathified, we can take a path-based approach, which might
>> give a better judgement of whether or not to distribute the aggregates
>> itself.
>>
>> Looking at Postgres-XC might be useful to get ideas. I can help you
>> there.
>
> Regards
>
> David Rowley

--
Best Wishes,
Ashutosh Bapat
EnterpriseDB Corporation
The Postgres Database Company