Re: [HACKERS] Expanding the use of FLEXIBLE_ARRAY_MEMBER for declarations like foo[1]
On Thu, Feb 19, 2015 at 3:57 PM, Michael Paquier wrote:
> On Thu, Feb 19, 2015 at 2:58 PM, Tom Lane wrote:
>> Michael Paquier writes:
>>> 1-2) sizeof(ParamListInfoData) is present in a couple of places,
>>> assuming that sizeof(ParamListInfoData) has the equivalent of 1
>>> parameter, like prepare.c, functions.c, spi.c and postgres.c:
>>> -   /* sizeof(ParamListInfoData) includes the first array element */
>>>     paramLI = (ParamListInfo)
>>>         palloc(sizeof(ParamListInfoData) +
>>> -              (num_params - 1) * sizeof(ParamExternData));
>>> +              num_params * sizeof(ParamExternData));
>>> 1-3) FuncCandidateList in namespace.c (thanks Andres!):
>>>     newResult = (FuncCandidateList)
>>> -       palloc(sizeof(struct _FuncCandidateList) - sizeof(Oid)
>>> -              + effective_nargs * sizeof(Oid));
>>> +       palloc(sizeof(struct _FuncCandidateList) +
>>> +              effective_nargs * sizeof(Oid));
>>> I imagine that we do not want those palloc calls to use #ifdef
>>> FLEXIBLE_ARRAY_MEMBER to save some memory, for code readability, even
>>> if the compiler does not support flexible array members, right?
>>
>> These are just wrong. As a general rule, we do not want to *ever* take
>> sizeof() a struct that contains a flexible array: the results will not
>> be consistent across platforms. The right thing is to use offsetof()
>> instead. See the helpful comment autoconf provides:
>>
>> [...]
>
> And I had this one in front of my eyes a couple of hours ago... Thanks.
>
>> This point is actually the main reason we've not done this change long
>> since. People did not feel like running around to make sure there were
>> no overlooked uses of sizeof().
>
> Thanks for the clarifications and the review. Attached is a new set.

Grr. Completely forgot to use offsetof in dumputils.c as well. Patch that
can be applied on top of 0001 is attached.
--
Michael

diff --git a/src/bin/pg_dump/dumputils.c b/src/bin/pg_dump/dumputils.c
index 892d3fa..3459adc 100644
--- a/src/bin/pg_dump/dumputils.c
+++ b/src/bin/pg_dump/dumputils.c
@@ -1217,7 +1217,7 @@ simple_string_list_append(SimpleStringList *list, const char *val)
 	SimpleStringListCell *cell;
 
 	/* this calculation correctly accounts for the null trailing byte */
-	cell = (SimpleStringListCell *) pg_malloc(sizeof(SimpleStringListCell));
+	cell = (SimpleStringListCell *) pg_malloc(offsetof(SimpleStringListCell, val) + strlen(val) + 1);
 
 	cell->next = NULL;
 	strcpy(cell->val, val);

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Join push-down support for foreign tables
2015-02-17 10:39 GMT+09:00 Kouhei Kaigai:
> Let me put some comments in addition to where you're checking now.
>
> [design issues]
> * Cost estimation
> Estimation and evaluation of the cost of a remote join query is not an
> obvious issue. In principle, the local side cannot determine the cost
> to run a remote join without a remote EXPLAIN, because the local side
> has no information about the JOIN logic applied on the remote side.
> Probably, we have to put an assumption in place for the remote join
> algorithm, because the local planner has no idea about the remote
> planner's choice unless the foreign join takes "use_remote_estimate".
> I think it is a reasonable assumption (even if it is incorrect) to
> calculate the remote join cost based on the local hash-join algorithm.
> If the user wants a more correct estimation, a remote EXPLAIN will make
> the cost estimation more reliable.

Hm, I guess that you chose hash join as the "least-costed join". In the
pgbench model, most combinations of two tables generate a hash join as
the cheapest path. A remote EXPLAIN is very expensive in the context of
planning, so it would easily make the plan optimization meaningless. But
giving the option to users is good, I agree.

>
> It also needs a consensus whether the cost of remote CPU execution is
> equivalent to local CPU. If we think local CPU is a rarer resource than
> the remote one, a discount rate will make the planner prefer a remote
> join over a local one.

Something like cpu_cost_ratio as a new server-level FDW option?

>
> Once we assume a join algorithm for the remote join and a unit cost for
> remote CPU, we can calculate a cost for the foreign join based on the
> local join logic plus the cost of network transfer (maybe
> fdw_tuple_cost?).

Yes, the sum of these costs is the total cost of a remote join.
 o fdw_startup_cost
 o hash-join cost, estimated as a local join
 o fdw_tuple_cost * rows * width

> * FDW options
> Unlike table scan, which FDW options we should refer to is unclear.
> Table-level FDW options are associated with a foreign table as literal.
I think we have two options here:

> 1. Foreign-join refers to FDW options for the foreign server, but the
>    ones for foreign tables are ignored.
> 2. Foreign-join is prohibited when the relations don't have identical
>    FDW options.
> My preference is 2. Even for an N-way foreign join, it ensures that all
> the tables involved in the (N-1)-way foreign join have identical FDW
> options, thus we can make an N-way foreign join with all-identical FDW
> options.
> One exception is the "updatable" flag of postgres_fdw. It does not make
> sense for a remote join, so I think a mixture of updatable and
> non-updatable foreign tables should be admitted; however, that is a
> decision for the FDW driver.
>
> Probably, the above points need time to reach consensus.
> I'd like to see your opinion prior to editing your patch.

postgres_fdw can't push down a join which contains foreign tables on
multiple servers, so use_remote_estimate and fdw_startup_cost are the
only FDW options to consider. So we have choices for each option.

1-a. If all foreign tables in the join have an identical
     use_remote_estimate, allow pushing down.
1-b. If any foreign table in the join has use_remote_estimate = true,
     use the remote estimate.

2-a. If all foreign tables in the join have an identical
     fdw_startup_cost, allow pushing down.
2-b. Always use the max value in the join. (cost would be more expensive)
2-c. Always use the min value in the join. (cost would be cheaper)

I prefer 1-a and 2-b, so more joins avoid a remote EXPLAIN but still have
a reasonable startup cost. I agree about the "updatable" option.

>
> [implementation issues]
> The interface does not intend to add a new Path/Plan type for each scan
> that replaces foreign joins. What postgres_fdw should do is add a
> ForeignPath for a particular joinrel, then populate a ForeignScan with
> the remote join query once it is chosen by the planner.

That idea is interesting, and makes many things simpler. Please let me
consider it.
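To make the discussion concrete, here is a minimal standalone sketch of
the 1-a/2-b policy plus the cost arithmetic from earlier in the thread.
This is not postgres_fdw code; the struct and function names are
hypothetical, and the real implementation would work on planner data
structures instead.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical per-table FDW options; names are illustrative only. */
typedef struct JoinFdwOptions
{
	bool		use_remote_estimate;
	double		fdw_startup_cost;
	double		fdw_tuple_cost;
} JoinFdwOptions;

/* Policy 1-a: push down only when every table agrees on use_remote_estimate. */
static bool
can_push_down_join(const JoinFdwOptions *tabs, int ntabs)
{
	for (int i = 1; i < ntabs; i++)
		if (tabs[i].use_remote_estimate != tabs[0].use_remote_estimate)
			return false;
	return true;
}

/*
 * Policy 2-b: charge the max startup cost among the joined tables, then
 * total cost = fdw_startup_cost + local hash-join estimate
 *            + fdw_tuple_cost * rows.
 */
static double
foreign_join_cost(const JoinFdwOptions *tabs, int ntabs,
				  double local_hashjoin_cost, double rows)
{
	double		startup = 0.0;
	double		tuple_cost = 0.0;

	for (int i = 0; i < ntabs; i++)
	{
		if (tabs[i].fdw_startup_cost > startup)
			startup = tabs[i].fdw_startup_cost;	/* 2-b: take the max */
		if (tabs[i].fdw_tuple_cost > tuple_cost)
			tuple_cost = tabs[i].fdw_tuple_cost;
	}
	return startup + local_hashjoin_cost + tuple_cost * rows;
}
```

With 1-a, a mismatch in use_remote_estimate simply disables pushdown, so
the planner falls back to a local join path.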
>
> A few functions added in src/backend/foreign/foreign.c are not called
> from anywhere at this moment.
>
> create_plan_recurse() is reverted to static. It is needed for the
> custom-join enhancement, if no other infrastructure can support it.

I made it back static because I thought that create_plan_recurse would be
called by the core before giving control to FDWs. But I'm not sure it can
be applied to custom scans. I'll recheck that part.

--
Shigeru HANADA
Re: [HACKERS] Expanding the use of FLEXIBLE_ARRAY_MEMBER for declarations like foo[1]
On Thu, Feb 19, 2015 at 2:58 PM, Tom Lane wrote:
> Michael Paquier writes:
>> 1-2) sizeof(ParamListInfoData) is present in a couple of places,
>> assuming that sizeof(ParamListInfoData) has the equivalent of 1
>> parameter, like prepare.c, functions.c, spi.c and postgres.c:
>> -   /* sizeof(ParamListInfoData) includes the first array element */
>>     paramLI = (ParamListInfo)
>>         palloc(sizeof(ParamListInfoData) +
>> -              (num_params - 1) * sizeof(ParamExternData));
>> +              num_params * sizeof(ParamExternData));
>> 1-3) FuncCandidateList in namespace.c (thanks Andres!):
>>     newResult = (FuncCandidateList)
>> -       palloc(sizeof(struct _FuncCandidateList) - sizeof(Oid)
>> -              + effective_nargs * sizeof(Oid));
>> +       palloc(sizeof(struct _FuncCandidateList) +
>> +              effective_nargs * sizeof(Oid));
>> I imagine that we do not want those palloc calls to use #ifdef
>> FLEXIBLE_ARRAY_MEMBER to save some memory, for code readability, even
>> if the compiler does not support flexible array members, right?
>
> These are just wrong. As a general rule, we do not want to *ever* take
> sizeof() a struct that contains a flexible array: the results will not
> be consistent across platforms. The right thing is to use offsetof()
> instead. See the helpful comment autoconf provides:
>
> [...]

And I had this one in front of my eyes a couple of hours ago... Thanks.

> This point is actually the main reason we've not done this change long
> since. People did not feel like running around to make sure there were
> no overlooked uses of sizeof().

Thanks for the clarifications and the review. Attached is a new set.

--
Michael

From ca92019afc1db4da147ebb5c436164a787f7fa62 Mon Sep 17 00:00:00 2001
From: Michael Paquier
Date: Mon, 16 Feb 2015 02:44:10 +0900
Subject: [PATCH 1/4] First cut with FLEXIBLE_ARRAY_MEMBER

This is targeted at preventing the false positives that static code
analyzers may raise by assuming that some structures have an unvarying
size.
---
 contrib/cube/cubedata.h                     |  7 +++
 contrib/intarray/_int.h                     |  4 ++--
 contrib/ltree/ltree.h                       | 14 +++---
 contrib/pg_trgm/trgm.h                      |  2 +-
 src/backend/catalog/namespace.c             |  4 ++--
 src/backend/commands/prepare.c              |  6 +++---
 src/backend/executor/functions.c            |  6 +++---
 src/backend/executor/spi.c                  |  5 ++---
 src/backend/nodes/params.c                  |  5 ++---
 src/backend/tcop/postgres.c                 |  5 ++---
 src/bin/pg_dump/dumputils.c                 |  3 +--
 src/bin/pg_dump/dumputils.h                 |  2 +-
 src/include/access/gin_private.h            |  2 +-
 src/include/access/gist_private.h           |  5 +++--
 src/include/access/heapam_xlog.h            |  2 +-
 src/include/access/spgist_private.h         | 10 +-
 src/include/access/xact.h                   |  8
 src/include/c.h                             |  4 ++--
 src/include/catalog/namespace.h             |  4 ++--
 src/include/commands/dbcommands.h           |  4 ++--
 src/include/commands/tablespace.h           |  2 +-
 src/include/executor/hashjoin.h             |  2 +-
 src/include/nodes/bitmapset.h               |  4 ++--
 src/include/nodes/params.h                  |  2 +-
 src/include/nodes/tidbitmap.h               |  2 +-
 src/include/postgres.h                      |  9 +
 src/include/postmaster/syslogger.h          |  2 +-
 src/include/replication/walsender_private.h |  2 +-
 src/include/storage/bufpage.h               |  3 ++-
 src/include/storage/fsm_internals.h         |  2 +-
 src/include/storage/standby.h               |  4 ++--
 src/include/tsearch/dicts/regis.h           |  2 +-
 src/include/tsearch/dicts/spell.h           |  6 +++---
 src/include/tsearch/ts_type.h               |  6 +++---
 src/include/utils/catcache.h                |  4 ++--
 src/include/utils/datetime.h                |  5 +++--
 src/include/utils/geo_decls.h               |  5 +++--
 src/include/utils/jsonb.h                   |  2 +-
 src/include/utils/relmapper.h               |  2 +-
 src/include/utils/varbit.h                  |  3 ++-
 40 files changed, 86 insertions(+), 85 deletions(-)

diff --git a/contrib/cube/cubedata.h b/contrib/cube/cubedata.h
index 5d44e11..3535847 100644
--- a/contrib/cube/cubedata.h
+++ b/contrib/cube/cubedata.h
@@ -23,11 +23,10 @@ typedef struct NDBOX
 	unsigned int header;
 
 	/*
-	 * Variable length array. The lower left coordinates for each dimension
-	 * come first, followed by upper right coordinates unless the point flag
-	 * is set.
+	 * The lower left coordinates for each dimension come first, followed
+	 * by upper right coordinates unless the point flag is set.
 	 */
-	double		x[1];
+	double		x[FLEXIBLE_ARRAY_MEMBER]
Re: [HACKERS] INSERT ... ON CONFLICT {UPDATE | IGNORE} 2.0
On Tue, Feb 10, 2015 at 12:09 PM, Peter Geoghegan wrote:
>> Then the problem suddenly becomes that previous choices of
>> indexes/statements aren't possible anymore. It seems much better to
>> introduce the syntax now and not have too much of a usecase for it.
>
> The only way the lack of a way of specifying which opclass to use could
> bite us is if there were two *actually* defined unique indexes on the
> same column, each with different "equality" operators. That seems like
> kind of a funny scenario, even if that were quite possible (even if
> non-default opclasses existed that had non-identical "equality"
> operators, which is basically not the case today).
>
> I grant that it is a bit odd that we're talking about unique index
> definitions affecting semantics, but that is to a certain extent the
> nature of the beast. As a compromise, I suggest having the inference
> specification optionally accept a named opclass per attribute, in the
> style of CREATE INDEX (I'm already reusing a bit of the raw parser
> support for CREATE INDEX, you see) - that'll make inference insist on
> that opclass, rather than make it a strict matter of costing available
> alternatives (not that any alternative is expected with idiomatic
> usage). That implies no additional parser overhead, really. If that's
> considered ugly, then at least it's an ugly thing that literally no one
> will ever use in the foreseeable future...and an ugly thing that is no
> more necessary in CREATE INDEX than here (and yet CREATE INDEX lives
> with the ugliness).

Any thoughts on this, anyone? AFAICT, only this and the behavior of
logical decoding are open items at this point. I'd like to close out both
of those sooner rather than later.

Thanks
--
Peter Geoghegan
Re: [HACKERS] [REVIEW] Re: Compression of full-page-writes
On Mon, Feb 16, 2015 at 8:55 PM, Andres Freund wrote:
> On 2015-02-16 11:30:20 +0000, Syed, Rahila wrote:
>> - * As a trivial form of data compression, the XLOG code is aware that
>> - * PG data pages usually contain an unused "hole" in the middle, which
>> - * contains only zero bytes. If hole_length > 0 then we have removed
>> - * such a "hole" from the stored data (and it's not counted in the
>> - * XLOG record's CRC, either). Hence, the amount of block data actually
>> - * present is BLCKSZ - hole_length bytes.
>> + * Block images are able to do several types of compression:
>> + * - When wal_compression is off, as a trivial form of compression, the
>> + * XLOG code is aware that PG data pages usually contain an unused "hole"
>> + * in the middle, which contains only zero bytes. If length < BLCKSZ
>> + * then we have removed such a "hole" from the stored data (and it is
>> + * not counted in the XLOG record's CRC, either). Hence, the amount
>> + * of block data actually present is "length" bytes. The hole "offset"
>> + * on page is defined using "hole_offset".
>> + * - When wal_compression is on, block images are compressed using a
>> + * compression algorithm without their hole to improve compression
>> + * process of the page. "length" corresponds in this case to the length
>> + * of the compressed block. "hole_offset" is the hole offset of the page,
>> + * and the length of the uncompressed block is defined by "raw_length",
>> + * whose data is included in the record only when compression is enabled
>> + * and "with_hole" is set to true, see below.
>> + *
>> + * "is_compressed" is used to identify if a given block image is compressed
>> + * or not. Maximum page size allowed on the system being 32k, the hole
>> + * offset cannot be more than 15-bit long so the last free bit is used to
>> + * store the compression state of block image.
>> If the maximum page size
>> + * allowed is increased to a value higher than that, we should consider
>> + * increasing this structure size as well, but this would increase the
>> + * length of block header in WAL records with alignment.
>> + *
>> + * "with_hole" is used to identify the presence of a hole in a block image.
>> + * As the length of a block cannot be more than 15-bit long, the extra bit in
>> + * the length field is used for this identification purpose. If the block image
>> + * has no hole, it is ensured that the raw size of a compressed block image is
>> + * equal to BLCKSZ, hence the contents of XLogRecordBlockImageCompressionInfo
>> + * are not necessary.
>> */
>> typedef struct XLogRecordBlockImageHeader
>> {
>> -	uint16		hole_offset;	/* number of bytes before "hole" */
>> -	uint16		hole_length;	/* number of bytes in "hole" */
>> +	uint16		length:15,		/* length of block data in record */
>> +				with_hole:1;	/* status of hole in the block */
>> +
>> +	uint16		hole_offset:15,	/* number of bytes before "hole" */
>> +				is_compressed:1;	/* compression status of image */
>> +
>> +	/* Followed by the data related to compression if block is compressed */
>> } XLogRecordBlockImageHeader;
>
> Yikes, this is ugly.
>
> I think we should change the xlog format so that the block_id (which
> currently is XLR_BLOCK_ID_DATA_SHORT/LONG or an actual block id) isn't
> the block id but something like XLR_CHUNK_ID. Which is used as is for
> XLR_CHUNK_ID_DATA_SHORT/LONG, but for backup blocks can be set to
> XLR_CHUNK_BKP_WITH_HOLE, XLR_CHUNK_BKP_COMPRESSED,
> XLR_CHUNK_BKP_REFERENCE... The BKP blocks will then follow, storing the
> block id following the chunk id.
> Yes, that'll increase the amount of data for a backup block by 1 byte,
> but I think that's worth it. I'm pretty sure we will be happy about the
> added extensibility pretty soon.

Yeah, that would help readability and does not cost much compared to
BLCKSZ.
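For anyone following along, the 15-bit-plus-flag packing being debated
can be demonstrated with a standalone sketch (not the patch's actual
struct; the name is hypothetical). BLCKSZ is at most 32768, so a length
or hole offset always fits in 15 bits, leaving one spare bit per uint16
for a flag:

```c
#include <assert.h>
#include <stdint.h>

/* Standalone sketch of the bit-packed header discussed above. */
typedef struct BlockImageHeaderSketch
{
	uint16_t	length:15,			/* length of (possibly compressed) data */
				with_hole:1;		/* does the image have a hole? */
	uint16_t	hole_offset:15,		/* number of bytes before the hole */
				is_compressed:1;	/* is the image compressed? */
} BlockImageHeaderSketch;
```

On common compilers the two uint16 allocation units keep the header at 4
bytes, which is the whole point of stealing the spare bits rather than
adding new fields.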
Still, could you explain what kind of extensibility you have in mind,
other than code readability? It is hard to make a nice picture with only
paper and pencils, and the current patch approach has been taken to
minimize the record length, particularly for users who do not care about
WAL compression.

--
Michael
Re: [HACKERS] Table-level log_autovacuum_min_duration
On Thu, Feb 19, 2015 at 2:13 PM, Naoya Anzai wrote:
> As a result, I think you should not delete VACOPT_VERBOSE.

In v8 it is not deleted. It is still declared, and its use is isolated in
gram.y, similarly to VACOPT_FREEZE.

> According to the last mail I posted, there is clearly a difference
> between the manual-vacuum log and the auto-vacuum log.

Did you test the latest patch v8? I have added checks in it to see if a
process is an autovacuum worker to control elevel, and the extra logs of
v7 do not show up.

> So, at least you should not touch the mechanism of VACOPT_VERBOSE until
> both vacuum log formats are unified into the same format.

If you mean that we should have the same kind of log output for
autovacuum and manual vacuum, I think that this is not going to happen.
Autovacuum entries are kept less verbose on purpose, a contract that v7
clearly broke.

> If you agree with my thinking, please undo the removal of
> VACOPT_VERBOSE.

Well, I don't agree :) And I am guessing that you did not look at v8
either. Centralizing the control of logs using log_min_duration is more
extensible than simply having VACOPT_VERBOSE.

--
Michael
Re: [HACKERS] assessing parallel-safety
On Wed, Feb 18, 2015 at 12:23:05PM -0500, Robert Haas wrote:
> On Sun, Feb 15, 2015 at 6:24 PM, Noah Misch wrote:
>> It's important for parallelism to work under the extended query
>> protocol, and that's nontrivial. exec_parse_message() sets
>> cursorOptions. exec_bind_message() runs the planner. We don't know if
>> a parallel plan is okay before seeing max_rows in
>> exec_execute_message().
>
> Yes, that's a problem. One could require users of the extended query
> protocol to indicate their willingness to accept a parallel query plan
> when sending the bind message by setting the appropriate cursor option;
> and one could, when that option is specified, refuse execute messages
> with max_rows != 0. However, that has the disadvantage of forcing all
> clients to be updated for the new world order.

Right; that would imply a protocol version bump.

> Or, one could assume a parallel plan is OK and re-plan if we choose one
> and then a non-zero max_rows shows up. In the worst case, though, that
> could require every query to be planned twice.

That sounds reasonable as a first cut. It is perfect for libpq clients,
which always use zero max_rows. The policy deserves wider scrutiny before
release, but it's the sort of thing we can tweak without wide-ranging
effects on the rest of the parallelism work.

One might use the Bind message portal name as an additional heuristic.
libpq usually (always?) uses the empty portal name. A caller specifying a
non-empty portal name is far more likely to have a non-zero max_rows on
its way.

More radical still: upon receiving the Bind message, peek ahead to see if
an Execute message is waiting. If so, let the max_rows of the Execute
inform Bind-associated planning.

>> The CREATE OR REPLACE FUNCTION calls in
>> src/backend/catalog/system_views.sql reset proparallel=u for some
>> built-in functions, such as make_interval.
>
> I can fix that if I add a PARALLEL { SAFE | RESTRICTED | UNSAFE }
> clause to CREATE FUNCTION.
Eventually that, but adding "UPDATE pg_proc" after the CORF is a fine
stopgap.

> Or are you saying that CORF shouldn't change the existing value of the
> flag? I'm not sure exactly what our policy is here, but I would expect
> that the CORF is doing the right thing and we need to add the necessary
> syntax to CREATE FUNCTION and then add that clause to those calls.

I agree that CREATE OR REPLACE FUNCTION is doing the right thing.

>> - Several functions, such as array_in and anytextcat, can call
>>   arbitrary I/O functions. This implies a policy that I/O functions
>>   must be PROPARALLEL_SAFE. I agree with that decision, but it should
>>   be documented.
>
> Where?

On further thought, it can wait for the CREATE FUNCTION syntax additions.
At that time, xtypes.sgml is the place. Perhaps just adding PARALLEL SAFE
to the example CREATE FUNCTION calls will suffice.

>> - All selectivity estimator functions are PROPARALLEL_SAFE. That
>>   implies, for example, that any operator using eqjoinsel ought to be
>>   PROPARALLEL_SAFE or

(I meant to write "eqsel", not "eqjoinsel". eqjoinsel is not a
restriction selectivity estimator function, but the rest of the argument
applies to both functions.)

That argument does not apply to all possible selectivity estimator
functions. It is common for an estimator to call the opercode on MCV
and/or histogram values, which makes the estimator function no more
parallel safe than the weakest associated opercode. eqsel, eqjoinsel and
scalarltsel all do that.

>> define its own restriction selectivity estimator function. On the
>> other hand, the planner need not check the parallel safety of
>> selectivity estimator functions. If we're already in parallel mode at
>> the moment we enter the planner, we must be planning a query called
>> within a PROPARALLEL_SAFE function. If the query uses an unsafe
>> operator, the containing function was mismarked.
>
> This seems backwards to me.
> If some hypothetical selectivity estimator were PROPARALLEL_UNSAFE,
> then any operator that uses that function would also need to be
> PROPARALLEL_UNSAFE.

It's fair to have a PROPARALLEL_UNSAFE estimator for a PROPARALLEL_SAFE
function, because the planning of a parallel query is often not itself
done in parallel mode. In that case, "SELECT * FROM tablename WHERE
colname OP 0" might use a parallel seqscan but fail completely if called
from inside a function running in parallel mode. That is to say, an
affected query can itself use parallelism, but placing the query in a
function makes the function PROPARALLEL_UNSAFE. Surprising, but not
wrong.

Rereading my previous message, I failed to make the bottom line clear: I
recommend marking eqsel etc. PROPARALLEL_UNSAFE but *not* checking an
estimator's proparallel before calling it in the planner.

> The reverse is not true. The operator can be as unsafe as it likes and
> still use a safe estimator.

Agreed. "contsel" is PROPARALLEL_SAFE no matter
Re: [HACKERS] Table-level log_autovacuum_min_duration
Hi, Michael

I thought about VACOPT_VERBOSE again. As a result, I think you should not
delete VACOPT_VERBOSE.

According to the last mail I posted, there is clearly a difference
between the manual-vacuum log and the auto-vacuum log. So, at least you
should not touch the mechanism of VACOPT_VERBOSE until both vacuum log
formats are unified into the same format.

If you agree with my thinking, please undo the removal of VACOPT_VERBOSE.

Regards,
---
Naoya
Re: [HACKERS] Expanding the use of FLEXIBLE_ARRAY_MEMBER for declarations like foo[1]
Michael Paquier writes:
> 1-2) sizeof(ParamListInfoData) is present in a couple of places,
> assuming that sizeof(ParamListInfoData) has the equivalent of 1
> parameter, like prepare.c, functions.c, spi.c and postgres.c:
> -   /* sizeof(ParamListInfoData) includes the first array element */
>     paramLI = (ParamListInfo)
>         palloc(sizeof(ParamListInfoData) +
> -              (num_params - 1) * sizeof(ParamExternData));
> +              num_params * sizeof(ParamExternData));
> 1-3) FuncCandidateList in namespace.c (thanks Andres!):
>     newResult = (FuncCandidateList)
> -       palloc(sizeof(struct _FuncCandidateList) - sizeof(Oid)
> -              + effective_nargs * sizeof(Oid));
> +       palloc(sizeof(struct _FuncCandidateList) +
> +              effective_nargs * sizeof(Oid));
> I imagine that we do not want for those palloc calls to use ifdef
> FLEXIBLE_ARRAY_MEMBER to save some memory for code readability even if
> compiler does not support flexible-array length, right?

These are just wrong. As a general rule, we do not want to *ever* take
sizeof() a struct that contains a flexible array: the results will not be
consistent across platforms. The right thing is to use offsetof()
instead. See the helpful comment autoconf provides:

/* Define to nothing if C supports flexible array members, and to 1 if it
   does not. That way, with a declaration like `struct s { int n; double
   d[FLEXIBLE_ARRAY_MEMBER]; };', the struct hack can be used with
   pre-C99 compilers. When computing the size of such an object, don't
   use 'sizeof (struct s)' as it overestimates the size. Use
   'offsetof (struct s, d)' instead. Don't use 'offsetof (struct s, d[0])',
   as this doesn't work with MSVC and with C++ compilers. */
#define FLEXIBLE_ARRAY_MEMBER /**/

This point is actually the main reason we've not done this change long
since. People did not feel like running around to make sure there were no
overlooked uses of sizeof().
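To spell out the offsetof() pattern the autoconf comment describes, here
is a minimal standalone sketch (not PostgreSQL code; `struct s` follows
the comment's example, and `alloc_s` is a hypothetical helper):

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* A minimal stand-in for a varying-length structure. */
struct s
{
	int			n;
	double		d[];		/* FLEXIBLE_ARRAY_MEMBER under C99 */
};

/*
 * Allocate room for 'count' array elements.  offsetof() gives the exact
 * size of the fixed part; sizeof(struct s) may include trailing padding
 * and, with the pre-C99 struct hack, one phantom array element, so its
 * value is not consistent across platforms.
 */
static struct s *
alloc_s(int count)
{
	struct s   *p = malloc(offsetof(struct s, d) + count * sizeof(double));

	if (p != NULL)
		p->n = count;
	return p;
}
```

The same reasoning is why the ParamListInfoData and FuncCandidateList
palloc calls above need offsetof() rather than sizeof() arithmetic.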
			regards, tom lane
Re: [HACKERS] Expanding the use of FLEXIBLE_ARRAY_MEMBER for declarations like foo[1]
On Thu, Feb 19, 2015 at 11:00 AM, Tom Lane wrote:
> Michael Paquier writes:
>> On Thu, Feb 19, 2015 at 7:29 AM, Tom Lane wrote:
>>> Moreover, if we have any code that is assuming such cases are okay, it
>>> probably needs a second look. Isn't this situation effectively
>>> assuming that a variable-length array is fixed-length?
>
>> AFAIK, switching a bunch of things to use FLEXIBLE_ARRAY_MEMBER has
>> brought to light a couple of things that could be revisited:
>> 1) tuptoaster.c, with this declaration of varlena:
>> struct
>> ...
>> } chunk_data;
>
> I'm pretty sure that thing ought to be a union, not a struct.
>
>> 2) inv_api.c with this thing:
>> ...
>
> And probably this too.

Sounds good to me, but with an additional VARHDRSZ to give enough room
IMO (or toast_save_datum explodes).

>> 3) heapam.c in three places with HeapTupleHeaderData:
>> struct
>> {
>>     HeapTupleHeaderData hdr;
>>     char        data[MaxHeapTupleSize];
>> } tbuf;
>
> And this, though I'm not sure if we'd have to change the size of the
> padding data[] member.

Here I think that we should add sizeof(HeapTupleHeaderData) to ensure
that there is enough room.

>> 5) reorderbuffer.h with its use of HeapTupleHeaderData:
>
> Hmm. Andres will have to answer for that one ;-)

Surely. This impacts decode.c and reorder.c at quick glance.

So, attached is a new set of patches:
1) 0001 is more or less the same as upthread, changing trivial places
with foo[1].
I have checked as well the calls to sizeof for the structures impacted:

1-1) In dumputils.c, I guess that the call of sizeof with
SimpleStringListCell should be changed as follows:
    cell = (SimpleStringListCell *)
-       pg_malloc(sizeof(SimpleStringListCell) + strlen(val));
+       pg_malloc(sizeof(SimpleStringListCell));

1-2) sizeof(ParamListInfoData) is present in a couple of places,
assuming that sizeof(ParamListInfoData) has the equivalent of 1
parameter, like prepare.c, functions.c, spi.c and postgres.c:
-   /* sizeof(ParamListInfoData) includes the first array element */
    paramLI = (ParamListInfo)
        palloc(sizeof(ParamListInfoData) +
-              (num_params - 1) * sizeof(ParamExternData));
+              num_params * sizeof(ParamExternData));

1-3) FuncCandidateList in namespace.c (thanks Andres!):
    newResult = (FuncCandidateList)
-       palloc(sizeof(struct _FuncCandidateList) - sizeof(Oid)
-              + effective_nargs * sizeof(Oid));
+       palloc(sizeof(struct _FuncCandidateList) +
+              effective_nargs * sizeof(Oid));

I imagine that we do not want those palloc calls to use #ifdef
FLEXIBLE_ARRAY_MEMBER to save some memory, for code readability, even if
the compiler does not support flexible array members, right?

2) 0002 fixes pg_authid to use CATALOG_VARLEN; this is needed for 0003.
3) 0003 switches varlena in c.h to use FLEXIBLE_ARRAY_MEMBER, with the
necessary tweaks added for tuptoaster.c and inv_api.c.
4) 0004 is some preparatory work before switching HeapTupleHeaderData
and MinimalTupleData, changing the struct declarations to unions in
heapam.c with enough room ensured for processing.

Regards,
--
Michael

From 509e9e28e523624881b2584ce05c78fc422cff44 Mon Sep 17 00:00:00 2001
From: Michael Paquier
Date: Mon, 16 Feb 2015 02:44:10 +0900
Subject: [PATCH 1/4] First cut with FLEXIBLE_ARRAY_MEMBER

This is targeted at preventing the false positives that static code
analyzers may raise by assuming that some structures have an unvarying
size.
---
 contrib/cube/cubedata.h             |  7 +++
 contrib/intarray/_int.h             |  4 ++--
 contrib/ltree/ltree.h               | 14 +++---
 contrib/pg_trgm/trgm.h              |  2 +-
 src/backend/catalog/namespace.c     |  4 ++--
 src/backend/commands/prepare.c      |  4 ++--
 src/backend/executor/functions.c    |  3 +--
 src/backend/executor/spi.c          |  3 +--
 src/backend/nodes/params.c          |  3 +--
 src/backend/tcop/postgres.c         |  3 +--
 src/bin/pg_dump/dumputils.c         |  3 +--
 src/bin/pg_dump/dumputils.h         |  2 +-
 src/include/access/gin_private.h    |  2 +-
 src/include/access/gist_private.h   |  5 +++--
 src/include/access/heapam_xlog.h    |  2 +-
 src/include/access/spgist_private.h | 10 +-
 src/include/access/xact.h           |  8
 src/include/c.h                     |  4 ++--
 src/include/catalog/namespace.h     |  4 ++--
 src/include/commands/dbcommands.h   |  4 ++--
 src/include/commands/tablespace.h   |  2 +-
 src/include/executor/hashjoin.h     |  2 +-
 src/include/nodes/bitmapset.h       |  4 ++--
 src/include/node
[HACKERS] Dead code in gin_private.h related to page split in WAL
Hi all,

I noticed that the following structures are still defined in
gin_private.h but they are used nowhere since 2c03216d, which reworked
the WAL format:
- ginxlogSplitEntry
- ginxlogSplitDataLeaf
- ginxlogSplitDataInternal
Attached is a trivial patch to remove them.

Regards,
--
Michael

diff --git a/src/include/access/gin_private.h b/src/include/access/gin_private.h
index bda7c28..9bb0ba4 100644
--- a/src/include/access/gin_private.h
+++ b/src/include/access/gin_private.h
@@ -511,34 +511,6 @@ typedef struct ginxlogSplit
 #define GIN_INSERT_ISLEAF	0x02	/* .. */
 #define GIN_SPLIT_ROOT		0x04	/* only for split records */
 
-typedef struct
-{
-	OffsetNumber separator;
-	OffsetNumber nitem;
-
-	/* FOLLOWS: IndexTuples */
-} ginxlogSplitEntry;
-
-typedef struct
-{
-	uint16		lsize;
-	uint16		rsize;
-	ItemPointerData lrightbound;	/* new right bound of left page */
-	ItemPointerData rrightbound;	/* new right bound of right page */
-
-	/* FOLLOWS: new compressed posting lists of left and right page */
-	char		newdata[1];
-} ginxlogSplitDataLeaf;
-
-typedef struct
-{
-	OffsetNumber separator;
-	OffsetNumber nitem;
-	ItemPointerData rightbound;
-
-	/* FOLLOWS: array of PostingItems */
-} ginxlogSplitDataInternal;
-
 /*
  * Vacuum simply WAL-logs the whole page, when anything is modified. This
  * functionally identical heap_newpage records, but is kept separate for
diff --git a/src/tools/pgindent/typedefs.list b/src/tools/pgindent/typedefs.list
index cfd580c..ca7454f 100644
--- a/src/tools/pgindent/typedefs.list
+++ b/src/tools/pgindent/typedefs.list
@@ -2192,9 +2192,6 @@ ginxlogInsertEntry
 ginxlogInsertListPage
 ginxlogRecompressDataLeaf
 ginxlogSplit
-ginxlogSplitDataInternal
-ginxlogSplitDataLeaf
-ginxlogSplitEntry
 ginxlogUpdateMeta
 ginxlogVacuumDataLeafPage
 ginxlogVacuumPage
Re: [HACKERS] Add min and max execute statement time in pg_stat_statement
On Wed, Feb 18, 2015 at 6:50 PM, Andrew Dunstan wrote: > > On 02/18/2015 08:34 PM, David Fetter wrote: > >> On Tue, Feb 17, 2015 at 08:21:32PM -0500, Peter Eisentraut wrote: >> >>> On 1/20/15 6:32 PM, David G Johnston wrote: >>> In fact, as far as the database knows, the values provided to this function do represent an entire population and such a correction would be unnecessary. I guess it boils down to whether "future" queries are considered part of the population or whether the population changes upon each query being run and thus we are calculating the ever-changing population variance. >>> > I think we should be calculating the population variance. >>> >> > Why population variance and not sample variance? In distributions >> where the second moment about the mean exists, it's an unbiased >> estimator of the variance. In this, it's different from the >> population variance. >> > > Because we're actually measuring the whole population, and not a sample? > This. The key incorrect word in David Fetter's statement is "estimator". We are not estimating anything but rather providing descriptive statistics for a defined population. Users extrapolate that the next member added to the population would be expected to conform to this statistical description without bias (though see below). We can also then define the new population by including this new member and generating new descriptive statistics (which allows for evolution to be captured in the statistics). Currently (I think) we allow the end user to kill off the entire population and build up from scratch, so that while the ability to predict the attributes of future members is limited in the short term, once the population has reached a statistically significant size new predictions will no longer be skewed by population members whose attributes were defined in an older and possibly significantly different environment.
In theory it would be nice to be able to give the user the ability to specify - by time or percentage - a subset of the population to leave alive. Actual time-weighted sampling would be an alternative but likely one significantly more difficult to accomplish. I really haven't dug too deeply into the mechanics of the current code but I don't see any harm in sharing the thought. David J.
Re: [HACKERS] pgaudit - an auditing extension for PostgreSQL
On Thu, Feb 19, 2015 at 12:25 AM, David Steele wrote: > Hi Fujii, > > Thanks for taking a look at the patch. Comments below: > > On 2/18/15 6:11 AM, Fujii Masao wrote: >> On Wed, Feb 18, 2015 at 1:26 AM, David Steele wrote: >>> On 2/17/15 10:23 AM, Simon Riggs wrote: I vote to include pgaudit in 9.5, albeit with any changes. In particular, David may have some changes to recommend, but I haven't seen a spec or a patch, just a new version of code (which isn't how we do things...). >>> >>> I submitted the new patch in my name under a separate thread "Auditing >>> extension for PostgreSQL (Take 2)" (54e005cc.1060...@pgmasters.net) >> >> I played the patch version of pg_audit a bit and have basic comments about >> its spec. >> >> The pg_audit doesn't log BIND parameter values when a prepared statement is >> used. >> Seems this is an oversight of the patch. Or is this intentional? > > It's actually intentional - following the model I talked about in my > earlier emails, the idea is to log statements only. Is this acceptable for audit purposes in many cases? Without the values, I'm afraid that it's hard to analyze what table records are affected by the statements from the audit logs. I was thinking that identifying the data affected is one of the important things for auditing. If I'm a malicious DBA, I will always use the extended protocol to prevent the values from being audited when I execute the statement. Regards, -- Fujii Masao
Re: [HACKERS] PATCH: adaptive ndistinct estimator v3 (WAS: Re: [PERFORM] Yet another abort-early plan disaster on 9.3)
On 23.12.2014 11:28, Heikki Linnakangas wrote: > On 12/07/2014 03:54 AM, Tomas Vondra wrote: >> The one interesting case is the 'step skew' with statistics_target=10, >> i.e. estimates based on mere 3000 rows. In that case, the adaptive >> estimator significantly overestimates: >> >> values current adaptive >> -- >> 106 99 107 >> 1068 6449190 >> 1006 38 6449190 >> 10006327 42441 >> >> I don't know why I didn't get these errors in the previous runs, because >> when I repeat the tests with the old patches I get similar results with >> a 'good' result from time to time. Apparently I had a lucky day back >> then :-/ >> >> I've been messing with the code for a few hours, and I haven't >> found any significant error in the implementation, so it seems that >> the estimator does not perform terribly well for very small samples >> (in this case it's 3000 rows out of 10.000.000, i.e. ~0.03%). > > The paper [1] gives an equation for an upper bound of the error of this > GEE estimator. How do the above numbers compare with that bound? Well, that's a bit more complicated because the "Theorem 1" you mention does not directly specify an upper boundary for a single estimate. It's formulated like this: Assume a table with "N" rows, D distinct values and a sample of "r" rows (all those values are fixed). Then there exists a dataset with those features, so that the "ratio error" error(D, D') = max(D'/D, D/D') is greater than f(N, r, P) with probability at least "P". I.e. if you randomly choose a sample of 'r' rows, you'll get an error exceeding the ratio with probability P. So it's not a hard limit, but speaks about the probability of estimation error with some (unknown) dataset. So it describes what you can achieve at best - if you think your estimator is better, there'll always be a dataset hiding in the shadows hissing "Theorem 1". Let's say we're looking for a boundary that's crossed only in 1% (or 5%) of measurements. Applying this to the sample data I posted before, i.e.
10M rows with three sample sizes 'r' (3000, 30.000 and 300.000 rows), the ratio error boundary per the paper is

        r = 3.000   30.000   300.000
   1%          88       28         9
   5%          70       22         7

At least that's what I get if I compute it using this python function:

    def err(N, r, p):
        return sqrt((N - r) / (2.0 * r) * log(1.0 / p))

So the estimates I posted before are not terribly good, I guess, especially the ones returning 6449190. I wonder whether there really is some stupid bug in the implementation. regards -- Tomas Vondra  http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Re: [HACKERS] Expanding the use of FLEXIBLE_ARRAY_MEMBER for declarations like foo[1]
On 19/02/15 15:00, Tom Lane wrote: Michael Paquier writes: On Thu, Feb 19, 2015 at 7:29 AM, Tom Lane wrote: Moreover, if we have any code that is assuming such cases are okay, it probably needs a second look. Isn't this situation effectively assuming that a variable-length array is fixed-length? AFAIK, switching a bunch of things to use FLEXIBLE_ARRAY_MEMBER has brought a couple of things to light that could be revisited: 1) tuptoaster.c, with this declaration of varlena: struct { struct varlena hdr; char data[TOAST_MAX_CHUNK_SIZE]; /* make struct big enough */ int32 align_it; /* ensure struct is aligned well enough */ } chunk_data; I'm pretty sure that thing ought to be a union, not a struct. 2) inv_api.c with this thing: struct { bytea hdr; char data[LOBLKSIZE]; /* make struct big enough */ int32 align_it; /* ensure struct is aligned well enough */ } workbuf; And probably this too. 3) heapam.c in three places with HeapTupleHeaderData: struct { HeapTupleHeaderData hdr; char data[MaxHeapTupleSize]; } tbuf; And this, though I'm not sure if we'd have to change the size of the padding data[] member. 5) reorderbuffer.h with its use of HeapTupleHeaderData: Hmm. Andres will have to answer for that one ;-) regards, tom lane Curious, has this problem been raised with the gcc maintainers? Is this still a problem with gcc 5.0 (which is due to be released soon)? Cheers, Gavin
Re: [HACKERS] Allow "snapshot too old" error, to prevent bloat
On Wed, Feb 18, 2015 at 10:57 PM, Kevin Grittner wrote: > Magnus Hagander wrote: > > On Feb 17, 2015 12:26 AM, "Andres Freund" > wrote: > >> On 2015-02-16 16:35:46 -0500, Bruce Momjian wrote: > >> But max_standby_streaming_delay, max_standby_archive_delay and > >> hot_standby_feedback are among the most frequent triggers for > >> questions and complaints that I/we see. > >> > > Agreed. > > And a really bad one used to be vacuum_defer_cleanup_age, because > > of confusing units amongst other things. Which in turn seems > > fairly close to Kevin's suggestions, unfortunately. > > Particularly my initial suggestion, which was to base snapshot > "age" on the number of transaction IDs assigned. Does this look > any better to you if it is something that can be set to '20min' or > '1h'? Just to restate, that would not automatically cancel the > snapshots past that age; it would allow vacuum of any tuples which > became "dead" that long ago, and would cause a "snapshot too old" > message for any read of a page modified more than that long ago > using a snapshot which was older than that. > Yes, it would definitely look much better. My reference per above was exactly that - having a setting in the unit "number of xids" confused a lot of users and made it really hard to tune. Having something in time units is a lot easier to understand and tune for most people. -- Magnus Hagander Me: http://www.hagander.net/ Work: http://www.redpill-linpro.com/
Re: [HACKERS] Odd behavior of updatable security barrier views on foreign tables
On 2015/02/18 21:44, Stephen Frost wrote: * Etsuro Fujita (fujita.ets...@lab.ntt.co.jp) wrote: On 2015/02/18 7:44, Stephen Frost wrote: Attached is a patch which should address this. Would love your (or anyone else's) feedback on it. It appears to address the issue which you raised and the regression test changes are all in-line with inserting a LockRows into the subquery, as anticipated. I've looked into the patch. * The patch applies to the latest head, 'make' passes successfully, but 'make check' fails in the rowsecurity test. Apologies for not being clear - the patch was against 9.4, where it passes all the regression tests (at least for me - if you see differently, please let me know!). Sorry, I assumed that the patch was against HEAD. I confirmed that the patch applied to 9.4 passes all the regression tests! Best regards, Etsuro Fujita
Re: [HACKERS] anyarray
On 19.2.2015 03:14, Tomas Vondra wrote: > > I've noticed two unrelated files Meh, should be "I noticed the patch removes two unrelated files" ... > > ../src/test/modules/dummy_seclabel/expected/dummy_seclabel.out > ../src/test/modules/dummy_seclabel/sql/dummy_seclabel.sql > > I suppose that's not intentional, right? > -- Tomas Vondra  http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Re: [HACKERS] anyarray
On 13.2.2015 16:20, Teodor Sigaev wrote: > Some users of the intarray contrib module wish to use its features > with other kinds of arrays, not only for the int4 type. The suggested > module generalizes intarray over other (not all) types of pgsql. > > Anyarray also provides a calculation of similarity between two > one-dimensional arrays, similar to the smlar module. The anyarray module doesn't > provide something similar to the query_int feature of intarray, > because this feature is very hard to implement (it requires a new > pseudo-type anyquery), it is close to impossible to have operation > extensibility, and its complicated queries are hidden from pgsql's > optimizer (like jsquery). As far as I know, query_int isn't very > popular for now. Hi Teodor, I started reviewing this patch today - first of all, kudos for having 944kB of regression tests in a 1MB patch ;-) I've noticed two unrelated files ../src/test/modules/dummy_seclabel/expected/dummy_seclabel.out ../src/test/modules/dummy_seclabel/sql/dummy_seclabel.sql I suppose that's not intentional, right? -- Tomas Vondra  http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Re: [HACKERS] Expanding the use of FLEXIBLE_ARRAY_MEMBER for declarations like foo[1]
Michael Paquier writes: > On Thu, Feb 19, 2015 at 7:29 AM, Tom Lane wrote: >> Moreover, if we have any code that is assuming such cases are okay, it >> probably needs a second look. Isn't this situation effectively assuming >> that a variable-length array is fixed-length? > AFAIK, switching a bunch of things to use FLEXIBLE_ARRAY_MEMBER has > brought a couple of things to light that could be revisited: > 1) tuptoaster.c, with this declaration of varlena: > struct > { > struct varlena hdr; > char data[TOAST_MAX_CHUNK_SIZE]; /* make > struct big enough */ > int32 align_it; /* ensure struct is > aligned well enough */ > } chunk_data; I'm pretty sure that thing ought to be a union, not a struct. > 2) inv_api.c with this thing: > struct > { > bytea hdr; > char data[LOBLKSIZE]; /* make struct > big enough */ > int32 align_it; /* ensure struct is > aligned well enough */ > } workbuf; And probably this too. > 3) heapam.c in three places with HeapTupleHeaderData: > struct > { > HeapTupleHeaderData hdr; > char data[MaxHeapTupleSize]; > } tbuf; And this, though I'm not sure if we'd have to change the size of the padding data[] member. > 5) reorderbuffer.h with its use of HeapTupleHeaderData: Hmm. Andres will have to answer for that one ;-) regards, tom lane
Re: [HACKERS] Exposing the stats snapshot timestamp to SQL
Hi there, On 30.1.2015 06:27, Matt Kelly wrote: > Actually, I just did one more code review of myself, and somehow missed > that I submitted the version with the wrong oid. The oid used in the > first version is wrong (1) and was from before I read the docs on > properly picking one. > > Attached is the fixed version. (hopefully with the right mime-type and > wrong extension. Alas, gmail doesn't let you set mime-types; time to > find a new email client...) I do have a question regarding the patch, although I see the patch is already marked as 'ready for committer' (sorry, somehow managed to miss the patch until now). I see the patch only works with the top-level snapshot timestamp, stored in globalStats, but since 9.3 (when the stats were split into per-db files) we track per-database timestamps too. Shouldn't we make those timestamps accessible too? It's not necessary for detecting an unresponsive statistics collector (if it's stuck it's not writing anything, so the global timestamp will be old too), but it seems more appropriate for querying database-level stats to query the database-level timestamp too. But maybe that's not necessary, because to query database stats you have to be connected to that particular database and that should write fresh stats, so the timestamps should not be very different. regards -- Tomas Vondra  http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Re: [HACKERS] Add min and max execute statement time in pg_stat_statement
On 02/18/2015 08:34 PM, David Fetter wrote: On Tue, Feb 17, 2015 at 08:21:32PM -0500, Peter Eisentraut wrote: On 1/20/15 6:32 PM, David G Johnston wrote: In fact, as far as the database knows, the values provided to this function do represent an entire population and such a correction would be unnecessary. I guess it boils down to whether "future" queries are considered part of the population or whether the population changes upon each query being run and thus we are calculating the ever-changing population variance. I think we should be calculating the population variance. Why population variance and not sample variance? In distributions where the second moment about the mean exists, it's an unbiased estimator of the variance. In this, it's different from the population variance. Because we're actually measuring the whole population, and not a sample? cheers andrew
Re: [HACKERS] Add min and max execute statement time in pg_stat_statement
On Tue, Feb 17, 2015 at 08:21:32PM -0500, Peter Eisentraut wrote: > On 1/20/15 6:32 PM, David G Johnston wrote: > > In fact, as far as the database knows, the values provided to this > > function do represent an entire population and such a correction > > would be unnecessary. I guess it boils down to whether "future" > > queries are considered part of the population or whether the > > population changes upon each query being run and thus we are > > calculating the ever-changing population variance. > > I think we should be calculating the population variance. Why population variance and not sample variance? In distributions where the second moment about the mean exists, it's an unbiased estimator of the variance. In this, it's different from the population variance. Cheers, David. -- David Fetter http://fetter.org/ Phone: +1 415 235 3778 AIM: dfetter666 Yahoo!: dfetter Skype: davidfetter XMPP: david.fet...@gmail.com Remember to vote! Consider donating to Postgres: http://www.postgresql.org/about/donate
Re: [HACKERS] Expanding the use of FLEXIBLE_ARRAY_MEMBER for declarations like foo[1]
On Thu, Feb 19, 2015 at 7:29 AM, Tom Lane wrote: > Michael Paquier writes: >> On Wed, Feb 18, 2015 at 10:09 PM, Andres Freund >> wrote: >>> The compiler will complain if you use a FLEXIBLE_ARRAY_MEMBER in the >>> middle of a struct but not when you embed a struct that uses it >>> into the middle of another struct. At least gcc doesn't and I think it'd be >>> utterly broken if another compiler did that. If there's a compiler that >>> does so, we need to make it define FLEXIBLE_ARRAY_MEMBER to 1. > >> clang does complain on my OSX laptop regarding that ;) > > I'm a bit astonished that gcc doesn't consider this an error. Sure seems > like it should. (Has anyone tried it on recent gcc?) Just tried with gcc 4.9.2 on an ArchLinux box and it does not complain after switching the declaration of varlena from [1] to FLEXIBLE_ARRAY_MEMBER in c.h on HEAD. But it does with clang 3.5.1 on the same box. > I am entirely opposed to Andres' claim that we ought to consider compilers that do warn to be broken; if anything it's the other way around. I'm on board with that. > Moreover, if we have any code that is assuming such cases are okay, it > probably needs a second look. Isn't this situation effectively assuming > that a variable-length array is fixed-length? AFAIK, switching a bunch of things to use FLEXIBLE_ARRAY_MEMBER has brought a couple of things to light that could be revisited: 1) tuptoaster.c, with this declaration of varlena: struct { struct varlena hdr; char data[TOAST_MAX_CHUNK_SIZE]; /* make struct big enough */ int32 align_it; /* ensure struct is aligned well enough */ } chunk_data; 2) inv_api.c with this thing: struct { bytea hdr; char data[LOBLKSIZE]; /* make struct big enough */ int32 align_it; /* ensure struct is aligned well enough */ } workbuf; 3) heapam.c in three places with HeapTupleHeaderData: struct { HeapTupleHeaderData hdr; char data[MaxHeapTupleSize]; } tbuf; 4) pg_authid.h with its use of relpasswd.
5) reorderbuffer.h with its use of HeapTupleHeaderData: typedef struct ReorderBufferTupleBuf { /* position in preallocated list */ slist_node node; /* tuple, stored sequentially */ HeapTupleData tuple; HeapTupleHeaderData header; char data[MaxHeapTupleSize]; } ReorderBufferTupleBuf; Those issues can be grouped depending on where foo[1] is switched to FLEXIBLE_ARRAY_MEMBER, so I will try to get a set of patches depending on that. Regards, -- Michael
Re: [HACKERS] Allow "snapshot too old" error, to prevent bloat
* Kevin Grittner (kgri...@ymail.com) wrote: > Stephen Frost wrote: > > I also agree with the general idea that it makes sense to provide a way > > to control bloat, but I think you've missed what Andres was getting at > > with his suggestion (if I understand correctly, apologies if I don't). > > > > The problem is that we're only looking at the overall xmin / xmax > > horizon when it comes to deciding if a given tuple is dead. That's > > not quite right- the tuple might actually be dead to all *current* > > transactions by being newer than the oldest transaction but dead for all > > later transactions. Basically, there exist gaps between our cluster > > wide xmin / xmax where we might find actually dead rows. Marking those > > rows dead and reusable *would* stop the bloat, not just slow it down. > > > > In the end, with a single long-running transaction, the worst bloat you > > would have is double the size of the system at the time the long-running > > transaction started. > > I agree that limiting bloat to one dead tuple for every live one > for each old snapshot is a limit that has value, and it was unfair > of me to characterize that as not being a limit. Sorry for that. > > This possible solution was discussed with the user whose feedback > caused me to write this patch, but there were several reasons they > dismissed that as a viable total solution for them, two of which I > can share: > > (1) They have a pool of connections each of which can have several > long-running cursors, so the limit from that isn't just doubling > the size of their database, it is limiting it to some two-or-three > digit multiple of the necessary size. This strikes me as a bit off-the-cuff; was an analysis done which determined that would be the result?
If there is overlap between the long-running cursors then there would be less bloat, and most systems which I'm familiar with don't turn the entire database over in 20 minutes, 20 hours, or even 20 days except in pretty specific cases. Perhaps this is one of those, and if so then I'm all wet, but the feeling I get is that this is a way to dismiss this solution because it's not what's wanted, which is "what Oracle did." > (2) They are already prepared to deal with "snapshot too old" > errors on queries that run more than about 20 minutes and which > access tables which are modified. They would rather do that than > suffer the bloat beyond that point. That, really, is the crux here- they've already got support for dealing with it the way Oracle did and they'd like PG to do that too. Unfortunately, that, by itself, isn't a good reason for a particular capability (we certainly aren't going to be trying to duplicate PL/SQL in PG any time soon). That said, there are capabilities in other RDBMS's which are valuable and which we *do* want, so the fact that Oracle does this also isn't a reason to not include it. > IMO all of these changes people are working are very valuable, and > complement each other. This particular approach is likely to be > especially appealing to those moving from Oracle because it is a > familiar mechanism, and one which some of them have written their > software to be aware of and deal with gracefully. For my 2c, I'd much rather provide them with a system where they don't have to deal with broken snapshots than give them a way to have them the way Oracle provided them. :) That said, even the approach Andres outlined will cause bloat and it may be beyond what's acceptable in some environments, and it's certainly more complicated and unlikely to get done in the short term. > > I'm not against having a knob like this, which is defaulted to off, > > Thanks! I'm not sure that amounts to a +1, but at least it doesn't > sound like a -1. 
:-) So, at the time I wrote that, I wasn't sure if it was a +1 or not myself. I've been thinking about it since then, however, and I'm leaning more towards having the capability than not, so perhaps that's a +1, but it doesn't excuse the need to come up with an implementation that everyone can be happy with and what you've come up with so far doesn't have a lot of appeal, based on the feedback (I've only glanced through it myself, but I agree with Andres and Tom that it's a larger footprint than we'd want for this and the number of places having to be touched is concerning as it could lead to future bugs). A lot of that would go away if there was a way to avoid having to mess with the index AMs, I'd think, but I wonder if we'd actually need more done there- it's not immediately obvious to me how an index-only scan is safe with this. Whenever an actual page is visited, we can check the LSN, and the relation can't be truncated by vacuum since the transaction will still have a lock on the table which prevents it, but does the visibility-map update check make sure to never mark pages all-visible when one of these old transactions is running around? On what basis? > > but I do think we'd be a lot better off
Re: [HACKERS] Allow "snapshot too old" error, to prevent bloat
On Sun, Feb 15, 2015 at 8:27 PM, Tom Lane wrote: > There might be something in that, but again it's not much like this patch. > The key point I think we're both making is that nondeterministic failures > are bad, especially when you're talking about long-running, expensive-to- > retry transactions. > But I think Kevin would agree with you there. That's why he's interested in having the errors not occur if you don't read from the volatile tables. Ie, your reporting query reading from read-only tables would be guaranteed to succeed even if some other table had had some rows vacuumed away. I'm not sure that's really going to work because of things like hint bit updates or hot pruning. That is, say your table is read-only now but last week some of the records were updated. You might reasonably expect those rows to be safe and indeed the rows themselves will still be in your report. But the earlier versions of the rows could still be sitting on some pages invisible to every query but not vacuumable quite yet and then suddenly they get vacuumed away and the LSN on the page gets bumped. Fwiw I would strongly suggest that instead of having a number of transactions to have a time difference. I haven't been following the "commit timestamp" thread but I'm hoping that patch has the infrastructure needed to look up an lsn and find out the timestamp and vice versa. So when you take a snapshot you could look up the effective cut-off LSN based on the time difference specified in the GUC. As a DBA I would find it infinitely more comforting specifying "old_snapshot_threshold=4h" than specifying some arbitrary large number of transactions and hoping I calculated the transaction rate reasonably accurately. -- greg
[HACKERS] INSERT ... ON CONFLICT UPDATE and logical decoding
Andres pointed out that the INSERT ... ON CONFLICT UPDATE patch doesn't work well with logical decoding. Basically, as things stand, there is no means of determining that an INSERT had an ON CONFLICT UPDATE clause from logical decoding. It isn't obvious that this matters - after all, the relevant WAL records are generally consistent with there being a vanilla INSERT (or vanilla UPDATE). If there was no possibility of concurrent conflicts, that might actually be the end of it, but of course there is. I guess that the best way of fixing this is exposing to output plugins that a "super deletion" is a REORDER_BUFFER_CHANGE_INTERNAL_SUPERDELETE. This is kind of an internal ReorderBufferChangeType constant, because it means that the output plugin should probably just omit the tuple just inserted and now deleted. I tend to think that it would be best if the fact that a speculative insertion was a speculative insertion is not exposed. We just formalize that a REORDER_BUFFER_CHANGE_INTERNAL_SUPERDELETE is a variant of REORDER_BUFFER_CHANGE_DELETE...it's implicit that things like triggers are not fired here (which may or may not matter, depending on the use case, I suppose). Would that be sufficiently flexible for the various logical replication use cases? I guess we are somewhat forced to push that kind of thing into output plugins, because doing so lets them decide how to handle this. It's a little unfortunate that this implementation detail is exposed to output plugins, though, which is why I'd be willing to believe that it'd be better to have transaction reassembly normalize the records such that a super deleted tuple was never exposed to output plugins in the first place...they'd only see a REORDER_BUFFER_CHANGE_INSERT when that was the definitive outcome of an UPSERT, or alternatively a REORDER_BUFFER_CHANGE_UPDATE when that was the definitive outcome. No need for output plugins to consider REORDER_BUFFER_CHANGE_INTERNAL_SUPERDELETE at all. Thoughts? 
Can't say that I've given conflict resolution for multi-master systems a great deal of thought before now, so I might be quite off the mark here. -- Peter Geoghegan
[HACKERS] array_push is just stupid ...
While hacking around with expanded arrays, I realized that array_push() is just a horribly badly designed function. There is no reason at all for the same C function to underlie both array_append and array_prepend: the function spends significant time and effort reverse-engineering which way it was called, and there's very little logic-in-common that would justify merging the two code paths. If we split it apart into separate array_append and array_prepend functions, I think we'd actually end up with less code. And it would certainly be noticeably faster because we'd not need to be doing get_fn_expr_argtype and get_element_type (the latter of which is a syscache lookup, hence not so cheap) on *both* arguments every time through. (Admittedly, some of that could be bought back with a bit more effort on locally caching the result, but it would be better to know which input is the array a priori.) Barring objections, I'm going to go fix this independently of the expanded-array changes. regards, tom lane
Re: [HACKERS] Allow "snapshot too old" error, to prevent bloat
Stephen Frost wrote: > I also agree with the general idea that it makes sense to provide a way > to control bloat, but I think you've missed what Andres was getting at > with his suggestion (if I understand correctly, apologies if I don't). > > The problem is that we're only looking at the overall xmin / xmax > horizon when it comes to deciding if a given tuple is dead. That's > not quite right- the tuple might actually be dead to all *current* > transactions by being newer than the oldest transaction but dead for all > later transactions. Basically, there exist gaps between our cluster > wide xmin / xmax where we might find actually dead rows. Marking those > rows dead and reusable *would* stop the bloat, not just slow it down. > > In the end, with a single long-running transaction, the worst bloat you > would have is double the size of the system at the time the long-running > transaction started. I agree that limiting bloat to one dead tuple for every live one for each old snapshot is a limit that has value, and it was unfair of me to characterize that as not being a limit. Sorry for that. This possible solution was discussed with the user whose feedback caused me to write this patch, but there were several reasons they dismissed that as a viable total solution for them, two of which I can share: (1) They have a pool of connections each of which can have several long-running cursors, so the limit from that isn't just doubling the size of their database, it is limiting it to some two-or-three digit multiple of the necessary size. (2) They are already prepared to deal with "snapshot too old" errors on queries that run more than about 20 minutes and which access tables which are modified. They would rather do that than suffer the bloat beyond that point. IMO all of these changes people are working on are very valuable, and complement each other.
This particular approach is likely to be especially appealing to those moving from Oracle because it is a familiar mechanism, and one which some of them have written their software to be aware of and deal with gracefully. > I'm not against having a knob like this, which is defaulted to off, Thanks! I'm not sure that amounts to a +1, but at least it doesn't sound like a -1. :-) > but I do think we'd be a lot better off with a system that could > realize when rows are not visible to any currently running transaction > and clean them up. +1 But they are not mutually exclusive; I see them as complementary. > If this knob's default is off then I don't think > we'd be likely to get the complaints which are being discussed (or, if > we did, we could point the individual at the admin who set the knob...). That's how I see it, too. I would not suggest making the default anything other than "off", but there are situations where it would be a nice tool to have in the toolbox. Thanks for the feedback! -- Kevin Grittner EDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Re: [HACKERS] Replication identifiers, take 4
On 16/02/15 10:46, Andres Freund wrote: On 2015-02-16 11:34:10 +0200, Heikki Linnakangas wrote: At a quick glance, this basic design seems workable. I would suggest expanding the replication IDs to regular 4 byte oids. Two extra bytes is a small price to pay, to make it work more like everything else in the system. I don't know. Growing from 3 to 5 byte overhead per relevant record (or even 0 to 5 in case the padding is reused) is rather noticeable. If we later find it to be a limit (I seriously doubt that), we can still increase it in a major release without anybody really noticing. I agree that limiting the overhead is important. But I have a related thought about this - I wonder if it's worth trying to map this more directly to the CommitTsNodeId. I mean it is currently using that for the actual storage, but I am thinking of having the Replication Identifiers be the public API and the CommitTsNodeId stuff be just a hidden implementation detail. This thought is based on the fact that CommitTsNodeId provides a somewhat meaningless number while Replication Identifiers give us a nice name for that number. And if we do that then the type should perhaps be the same for both? -- Petr Jelinek http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services
Re: [HACKERS] Allow "snapshot too old" error, to prevent bloat
Kevin, * Kevin Grittner (kgri...@ymail.com) wrote: > Magnus Hagander wrote: > > On Feb 17, 2015 12:26 AM, "Andres Freund" wrote: > >> On 2015-02-16 16:35:46 -0500, Bruce Momjian wrote: > > >>> It seems we already have a mechanism in place that allows > >>> tuning of query cancel on standbys vs. preventing standby > >>> queries from seeing old data, specifically > >>> max_standby_streaming_delay/max_standby_archive_delay. We > >>> obsessed about how users were going to react to these odd > >>> variables, but there has been little negative feedback. > >> > >> FWIW, I think that's a somewhat skewed perception. I think it > >> was right to introduce those, because we didn't really have any > >> alternatives. > > As far as I can see, the "alternatives" suggested so far are all > about causing heap bloat to progress more slowly, but still without > limit. I suggest, based on a lot of user feedback (from the > customer I've talked about at some length on this thread, as well > as numerous others), that unlimited bloat based on the activity of > one connection is a serious deficiency in the product; and that > there is no real alternative to something like a "snapshot too old" > error if we want to fix that deficiency. Enhancements to associate > a snapshot with a database and using a separate vacuum xmin per > database, not limiting vacuum of a particular object by snapshots > that cannot see that snapshot, etc., are all Very Good Things and I > hope those changes are made, but none of that fixes a very > fundamental flaw. I also agree with the general idea that it makes sense to provide a way to control bloat, but I think you've missed what Andres was getting at with his suggestion (if I understand correctly, apologies if I don't). The problem is that we're only looking at the overall xmin / xmax horizon when it comes to deciding if a given tuple is dead. 
That's not quite right- the tuple might actually be dead to all *current* transactions by being newer than the oldest transaction but dead for all later transactions. Basically, there exist gaps between our cluster wide xmin / xmax where we might find actually dead rows. Marking those rows dead and reusable *would* stop the bloat, not just slow it down. In the end, with a single long-running transaction, the worst bloat you would have is double the size of the system at the time the long-running transaction started. Another way of thinking about it is with this timeline:

    -----x---------------x---------------x---------------x----->
        /\              /\              /\              /\
    Long running    row created     row deleted      vacuum
    transaction
    starts

Now, if all transactions currently running started either before the row was created or after the row was deleted, then the row is dead and vacuum can reclaim it. Finding those gaps in the transaction space for all of the currently running processes might not be cheap, but for this kind of a use-case, it might be worth the effort. On a high-churn system where the actual set of rows being copied over constantly isn't very high then you could end up with some amount of duplication of rows- those to satisfy the ancient transaction and the current working set, but there wouldn't be any further bloat and it'd almost certainly be a much better situation than what is being seen now. > Particularly my initial suggestion, which was to base snapshot > "age" on the number of transaction IDs assigned. Does this look > any better to you if it is something that can be set to '20min' or > '1h'? Just to restate, that would not automatically cancel the > snapshots past that age; it would allow vacuum of any tuples which > became "dead" that long ago, and would cause a "snapshot too old" > message for any read of a page modified more than that long ago > using a snapshot which was older than that. > > Unless this idea gets some +1 votes Real Soon Now, I will mark the > patch as Rejected and move on. 
I'm not against having a knob like this, which is defaulted to off, but I do think we'd be a lot better off with a system that could realize when rows are not visible to any currently running transaction and clean them up. If this knob's default is off then I don't think we'd be likely to get the complaints which are being discussed (or, if we did, we could point the individual at the admin who set the knob...). Thanks! Stephen
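The "gaps" idea Stephen describes amounts to a visibility test over the set of running snapshots rather than a single global horizon. The sketch below is a simplified standalone model — plain monotonically increasing integers as transaction IDs, no wraparound handling — and is not PostgreSQL's actual visibility code.

```c
#include <stdbool.h>
#include <stdint.h>

typedef uint32_t Xid;   /* simplified: monotonically increasing, no wraparound */

/* A deleted row version (created at xmin, deleted at xmax) is dead to a
 * snapshot taken either before the row was created or after it was
 * deleted.  It is reclaimable only if that holds for *every* running
 * snapshot -- i.e. no snapshot falls inside the [xmin, xmax) window. */
static bool
row_is_reclaimable(Xid xmin, Xid xmax, const Xid *snapshots, int nsnaps)
{
    for (int i = 0; i < nsnaps; i++)
    {
        if (snapshots[i] >= xmin && snapshots[i] < xmax)
            return false;       /* some snapshot can still see this version */
    }
    return true;                /* dead in a gap: vacuum could reclaim it */
}
```

With only the global horizon, a row version whose xmax is newer than the oldest running snapshot is kept even when every running snapshot sits in a gap around it — which is exactly the unbounded bloat being discussed.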
Re: [HACKERS] Expanding the use of FLEXIBLE_ARRAY_MEMBER for declarations like foo[1]
Michael Paquier writes: > On Wed, Feb 18, 2015 at 10:09 PM, Andres Freund > wrote: >> The compiler will complain if you use a FLEXIBLE_ARRAY_MEMBER in the >> middle of a struct but not when you embed a struct that uses it >> into the middle of another struct. At least gcc doesn't and I think it'd be >> utterly broken if another compiler did that. If there's a compiler that >> does so, we need to make it define FLEXIBLE_ARRAY_MEMBER to 1. > clang does complain on my OSX laptop regarding that ;) I'm a bit astonished that gcc doesn't consider this an error. Sure seems like it should. (Has anyone tried it on recent gcc?) I am entirely opposed to Andres' claim that we ought to consider compilers that do warn to be broken; if anything it's the other way around. Moreover, if we have any code that is assuming such cases are okay, it probably needs a second look. Isn't this situation effectively assuming that a variable-length array is fixed-length? regards, tom lane
Re: [HACKERS] Expanding the use of FLEXIBLE_ARRAY_MEMBER for declarations like foo[1]
On Wed, Feb 18, 2015 at 10:09 PM, Andres Freund wrote: > On 2015-02-18 17:15:18 +0900, Michael Paquier wrote: >> >> - I don't think that the t_bits fields in htup_details.h should be >> >> updated either. >> > >> > Why not? Any not broken code should already use MinHeapTupleSize and >> > similar macros. >> >> Changing t_bits impacts HeapTupleHeaderData, ReorderBufferTupleBuf and >> similarly a couple of redo routines in heapam.c using >> HeapTupleHeaderData in a couple of structures not placing it at the >> end (compiler complains). > > The compiler will complain if you use a FLEXIBLE_ARRAY_MEMBER in the > middle of a struct but not when you embed a struct that uses it > into the middle of another struct. At least gcc doesn't and I think it'd be > utterly broken if another compiler did that. If there's a compiler that > does so, we need to make it define FLEXIBLE_ARRAY_MEMBER to 1. clang does complain on my OSX laptop regarding that ;) -- Michael
Re: [HACKERS] Expanding the use of FLEXIBLE_ARRAY_MEMBER for declarations like foo[1]
On 2015-02-19 07:10:19 +0900, Michael Paquier wrote: > On Wed, Feb 18, 2015 at 10:09 PM, Andres Freund > wrote: > > On 2015-02-18 17:15:18 +0900, Michael Paquier wrote: > >> >> - I don't think that the t_bits fields in htup_details.h should be > >> >> updated either. > >> > > >> > Why not? Any not broken code should already use MinHeapTupleSize and > >> > similar macros. > >> > >> Changing t_bits impacts HeapTupleHeaderData, ReorderBufferTupleBuf and > >> similarly a couple of redo routines in heapam.c using > >> HeapTupleHeaderData in a couple of structures not placing it at the > >> end (compiler complains). > > > > The compiler will complain if you use a FLEXIBLE_ARRAY_MEMBER in the > > middle of a struct but not when you embed a struct that uses it > > into the middle of another struct. At least gcc doesn't and I think it'd be > > utterly broken if another compiler did that. If there's a compiler that > > does so, we need to make it define FLEXIBLE_ARRAY_MEMBER to 1. > > clang does complain on my OSX laptop regarding that ;) I think that then puts the idea of doing it for those structs pretty much to bed. Greetings, Andres Freund -- Andres Freund http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services
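Tom's rule from earlier in this thread — never take sizeof() of a struct containing a flexible array member, use offsetof() instead — and the embedding case debated above can be shown with a minimal standalone struct. This is illustrative C, not PostgreSQL code.

```c
#include <stddef.h>
#include <stdlib.h>

/* A struct ending in a C99 flexible array member (what PostgreSQL
 * spells as items[FLEXIBLE_ARRAY_MEMBER]). */
typedef struct IntList
{
    int nitems;
    int items[];            /* flexible array member */
} IntList;

/* Allocate room for n elements.  Use offsetof(), not sizeof():
 * sizeof(IntList) may include trailing padding, so its value is not
 * consistent across platforms -- exactly Tom's point upthread. */
static IntList *
make_list(int n)
{
    IntList *l = malloc(offsetof(IntList, items) + n * sizeof(int));
    l->nitems = n;
    return l;
}

/* Embedding IntList anywhere but at the very end of another struct is
 * the case debated above -- clang rejects it, gcc surprisingly may not:
 *
 *     struct Broken { IntList inner; int tail; };    // ill-formed C11
 */
```

The autoconf comment quoted in the earlier message makes the same recommendation: size computations over such structs must be based on offsetof the flexible member.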
Re: [HACKERS] Allow "snapshot too old" error, to prevent bloat
Magnus Hagander wrote: > On Feb 17, 2015 12:26 AM, "Andres Freund" wrote: >> On 2015-02-16 16:35:46 -0500, Bruce Momjian wrote: >>> It seems we already have a mechanism in place that allows >>> tuning of query cancel on standbys vs. preventing standby >>> queries from seeing old data, specifically >>> max_standby_streaming_delay/max_standby_archive_delay. We >>> obsessed about how users were going to react to these odd >>> variables, but there has been little negative feedback. >> >> FWIW, I think that's a somewhat skewed perception. I think it >> was right to introduce those, because we didn't really have any >> alternatives. As far as I can see, the "alternatives" suggested so far are all about causing heap bloat to progress more slowly, but still without limit. I suggest, based on a lot of user feedback (from the customer I've talked about at some length on this thread, as well as numerous others), that unlimited bloat based on the activity of one connection is a serious deficiency in the product; and that there is no real alternative to something like a "snapshot too old" error if we want to fix that deficiency. Enhancements to associate a snapshot with a database and using a separate vacuum xmin per database, not limiting vacuum of a particular object by snapshots that cannot see that snapshot, etc., are all Very Good Things and I hope those changes are made, but none of that fixes a very fundamental flaw. >> But max_standby_streaming_delay, max_standby_archive_delay and >> hot_standby_feedback are among the most frequent triggers for >> questions and complaints that I/we see. >> > Agreed. > And a really bad one used to be vacuum_defer_cleanup_age, because > of confusing units amongst other things. Which in turn seems > fairly close to Kevin's suggestions, unfortunately. Particularly my initial suggestion, which was to base snapshot "age" on the number of transaction IDs assigned. 
Does this look any better to you if it is something that can be set to '20min' or '1h'? Just to restate, that would not automatically cancel the snapshots past that age; it would allow vacuum of any tuples which became "dead" that long ago, and would cause a "snapshot too old" message for any read of a page modified more than that long ago using a snapshot which was older than that. Unless this idea gets some +1 votes Real Soon Now, I will mark the patch as Rejected and move on. -- Kevin Grittner EDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
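Kevin's restated proposal combines two coupled checks: early vacuum of tuples dead longer than the threshold, and an error only when an old snapshot reads a page modified after it was taken. The sketch below models that with wall-clock times and invented names — the real feature would be keyed off transaction IDs and page modification tracking, so treat every identifier here as hypothetical.

```c
#include <stdbool.h>
#include <time.h>

/* Hypothetical model of the proposal.  With a threshold of, say,
 * '20min' (1200 s), vacuum may reclaim tuples that became dead before
 * the threshold... */
static bool
vacuum_may_reclaim(time_t tuple_dead_at, time_t now, long threshold_secs)
{
    return tuple_dead_at < now - threshold_secs;
}

/* ...and a read raises "snapshot too old" only for the combination:
 * the snapshot is older than the threshold AND the page was modified
 * after the snapshot was taken.  Old snapshots are never cancelled
 * outright, matching "would not automatically cancel the snapshots
 * past that age". */
static bool
snapshot_too_old(time_t snapshot_taken, time_t page_modified_at,
                 time_t now, long threshold_secs)
{
    return snapshot_taken < now - threshold_secs
        && page_modified_at > snapshot_taken;
}
```

A long-running report over static tables would therefore never error, while one that revisits heavily updated pages fails fast instead of forcing unbounded bloat.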
Re: [HACKERS] deparsing utility commands
* Alvaro Herrera (alvhe...@2ndquadrant.com) wrote: > > > Now, we probably don't want to hack *all* the utility commands to return > > > ObjectAddress instead of OID, because in many cases that's just not > > > going to be convenient (not to speak of the code churn); so I think for > > > most objtypes the ProcessUtilitySlow stanza would look like this: > > > That'd be fine with me, though for my 2c, I wouldn't object to changing > > them all to return ObjectAddress either. I agree that it'd cause a fair > > bit of code churn to do so, but there's a fair bit of code churn > > happening here anyway (looking at what 0008 does to ProcessUtilitySlow > > anyway). > > Well, that would make my life easier I think (even if it's a bit more > work), so unless there are objections I will do things this way. It's a > bit of a pity that Robert and Dimitri went to huge lengths to have these > functions return OID and we're now changing it all to ObjAddress > instead, but oh well. Not sure that I see it as that huge a deal.. They're still returning an Oid, it's just embedded in the ObjAddress to provide a complete statement of what the object is. btw, the hunk in 0026 which adds a 'break;' into standard_ProcessUtility caught me by surprise. Looks like that 'break;' was missing from 0003 (for T_GrantStmt). Thanks, Stephen
Re: [HACKERS] deparsing utility commands
> > Now, we probably don't want to hack *all* the utility commands to return > > ObjectAddress instead of OID, because in many cases that's just not > > going to be convenient (not to speak of the code churn); so I think for > > most objtypes the ProcessUtilitySlow stanza would look like this: > That'd be fine with me, though for my 2c, I wouldn't object to changing > them all to return ObjectAddress either. I agree that it'd cause a fair > bit of code churn to do so, but there's a fair bit of code churn > happening here anyway (looking at what 0008 does to ProcessUtilitySlow > anyway). Well, that would make my life easier I think (even if it's a bit more work), so unless there are objections I will do things this way. It's a bit of a pity that Robert and Dimitri went to huge lengths to have these functions return OID and we're now changing it all to ObjAddress instead, but oh well. -- Álvaro Herrera http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Re: [HACKERS] deparsing utility commands
Alvaro, * Alvaro Herrera (alvhe...@2ndquadrant.com) wrote: > Stephen Frost wrote: > > * Alvaro Herrera (alvhe...@2ndquadrant.com) wrote: > > > Patch 0002 I think is good to go as well, AFAICT (have the various > > > RENAME commands return the OID and attnum of affected objects). > > > > It's not a huge complaint, but it feels a bit awkward to me that > > ExecRenameStmt is now returning one item and using an out variable for > > the other when the two really go together (Oid and Object Sub ID, that > > is). Further, the comment above ExecRenameStmt should make it clear > > that it's safe to pass NULL into objsubid if you don't care about it. > > > > The same probably goes for the COMMENT bits. > > Hmm, while I agree that it's a relatively minor point, it seems a fair > one. I think we could handle this by returning ObjectAddress rather > than Oid in ExecRenameStmt() and CommentObject(); then you have all the > bits you need in a single place. Furthermore, the function in another > patch EventTriggerStashCommand() instead of getting separately (ObjType, > objectId, objectSubId) could take a single argument of type > ObjectAddress. Agreed, that thought occured to me as well while I was looking through the other deparse patches and I agree that it makes sense. 
> Now, we probably don't want to hack *all* the utility commands to return > ObjectAddress instead of OID, because in many cases that's just not > going to be convenient (not to speak of the code churn); so I think for > most objtypes the ProcessUtilitySlow stanza would look like this: >
>     case T_AlterTSConfigurationStmt:
>         objectId = AlterTSConfiguration((AlterTSConfigurationStmt *) parsetree);
>         objectType = OBJECT_TSCONFIGURATION;
>         break;
>
> For ExecRenameStmt and CommentObject (and probably other cases such as > security labels) the stanza in ProcessUtilitySlow would be simpler: >
>     case T_CommentStmt:
>         address = CommentObject((CommentStmt *) parsetree);
>         break;
>
> and at the bottom of the loop we would transform the objid/type into > address for the cases that need it: >
>     if (!commandStashed)
>     {
>         if (objectId != InvalidOid)
>         {
>             address.classId = get_objtype_catalog_oid(objectType);
>             address.objectId = objectId;
>             address.objectSubId = 0;
>         }
>         EventTriggerStashCommand(address, secondaryOid, parsetree);
>     }
That'd be fine with me, though for my 2c, I wouldn't object to changing them all to return ObjectAddress either. I agree that it'd cause a fair bit of code churn to do so, but there's a fair bit of code churn happening here anyway (looking at what 0008 does to ProcessUtilitySlow anyway). > > > Yes, I will push these unless somebody objects soon, as they seem > > > perfectly reasonable to me. The only troubling thing is that there is > > > no regression test for this kind of thing in event triggers (i.e. verify > > > which command tags get support and which don't), which seems odd to me. > > > Not these patches' fault, though, so I'm not considering adding any ATM. > > Ugh. I dislike that when we say an event trigger will fire before > 'GRANT' what we really mean is "GRANT when it's operating against a > local object". At the minimum we absolutely need to be very clear in > the documentation about that limitation. 
Perhaps there is something > > already which reflects that, but I don't see anything in the patch > > being added about that. > > Hmm, good point, will give this some thought. I'm thinking perhaps we > can add a table of which object types are supported for generic commands > such as GRANT, COMMENT and SECURITY LABEL. That sounds like a good idea. Thanks! Stephen
Re: [HACKERS] deparsing utility commands
Stephen, Stephen Frost wrote: > * Alvaro Herrera (alvhe...@2ndquadrant.com) wrote: > > Patch 0002 I think is good to go as well, AFAICT (have the various > > RENAME commands return the OID and attnum of affected objects). > > It's not a huge complaint, but it feels a bit awkward to me that > ExecRenameStmt is now returning one item and using an out variable for > the other when the two really go together (Oid and Object Sub ID, that > is). Further, the comment above ExecRenameStmt should make it clear > that it's safe to pass NULL into objsubid if you don't care about it. > > The same probably goes for the COMMENT bits. Hmm, while I agree that it's a relatively minor point, it seems a fair one. I think we could handle this by returning ObjectAddress rather than Oid in ExecRenameStmt() and CommentObject(); then you have all the bits you need in a single place. Furthermore, the function in another patch EventTriggerStashCommand() instead of getting separately (ObjType, objectId, objectSubId) could take a single argument of type ObjectAddress. 
Now, we probably don't want to hack *all* the utility commands to return ObjectAddress instead of OID, because in many cases that's just not going to be convenient (not to speak of the code churn); so I think for most objtypes the ProcessUtilitySlow stanza would look like this:

    case T_AlterTSConfigurationStmt:
        objectId = AlterTSConfiguration((AlterTSConfigurationStmt *) parsetree);
        objectType = OBJECT_TSCONFIGURATION;
        break;

For ExecRenameStmt and CommentObject (and probably other cases such as security labels) the stanza in ProcessUtilitySlow would be simpler:

    case T_CommentStmt:
        address = CommentObject((CommentStmt *) parsetree);
        break;

and at the bottom of the loop we would transform the objid/type into address for the cases that need it:

    if (!commandStashed)
    {
        if (objectId != InvalidOid)
        {
            address.classId = get_objtype_catalog_oid(objectType);
            address.objectId = objectId;
            address.objectSubId = 0;
        }
        EventTriggerStashCommand(address, secondaryOid, parsetree);
    }

> > On 0006 (which is about having ALTER TABLE return the OID/attnum of the > > affected object on each subcommand), I have a problem about the ALTER > > TABLE ALTER COLUMN SET DATA TYPE USING subcommand. The problem with > > that is that in order to fully deparse that subcommand we need to > > deparse the expression of the USING clause. But in the parse node we > > only have info about the untransformed expression, so it is not possible > > to pass it through ruleutils directly; it needs to be run by > > transformExpr() first. > > I agree- I'm pretty sure we definitely don't want to run through > transformExpr again in the deparse code. I'm not a huge fan of adding a > Node* output parameter, but I haven't got any other great ideas about how > to address that. Yeah, my thoughts exactly :-( > > > I think the GRANT/REVOKE, COMMENT ON and SECURITY LABEL changes are good > > > independently as well, but there previously have been raised some > > > concerns about shared objects. 
I think the answer in the patches which > > > is to raise events when affecting database local objects makes sense, > > > but others might disagree. > > > > Yes, I will push these unless somebody objects soon, as they seem > > perfectly reasonable to me. The only troubling thing is that there is > > no regression test for this kind of thing in event triggers (i.e. verify > > which command tags get support and which don't), which seems odd to me. > > Not these patches' fault, though, so I'm not considering adding any ATM. > > Ugh. I dislike that when we say an event trigger will fire before > 'GRANT' what we really mean is "GRANT when it's operating against a > local object". At the minimum we absolutely need to be very clear in > the documentation about that limitation. Perhaps there is something > already which reflects that, but I don't see anything in the patch > being added about that. Hmm, good point, will give this some thought. I'm thinking perhaps we can add a table of which object types are supported for generic commands such as GRANT, COMMENT and SECURITY LABEL. > Still looking at the rest of the patches. Great, thanks -- and thanks for the review so far. -- Álvaro Herrera http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
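The ObjectAddress shape under discussion is PostgreSQL's generic object identity (classId/objectId/objectSubId). The standalone sketch below mirrors the bottom-of-loop transform from Alvaro's pseudocode; get_objtype_catalog_oid is stubbed with an illustrative value, since that helper is hypothetical at this point in the thread.

```c
#include <stdint.h>

typedef uint32_t Oid;
#define InvalidOid ((Oid) 0)

/* PostgreSQL's generic object identity: which catalog, which row, and
 * (for columns etc.) which sub-object. */
typedef struct ObjectAddress
{
    Oid     classId;        /* OID of the catalog the object lives in */
    Oid     objectId;       /* OID of the object itself */
    int32_t objectSubId;    /* column number, or 0 */
} ObjectAddress;

typedef enum { OBJECT_TSCONFIGURATION /* , ... */ } ObjectType;

/* Stub for the hypothetical lookup in Alvaro's sketch; real code would
 * map each ObjectType to its catalog's OID.  3602 here is only an
 * illustrative placeholder. */
static Oid
get_objtype_catalog_oid(ObjectType objtype)
{
    return (objtype == OBJECT_TSCONFIGURATION) ? (Oid) 3602 : InvalidOid;
}

/* Bottom-of-loop transform from (objectType, objectId) to an address,
 * mirroring the quoted ProcessUtilitySlow stanza. */
static ObjectAddress
make_address(ObjectType objtype, Oid objectId)
{
    ObjectAddress addr = { InvalidOid, InvalidOid, 0 };
    if (objectId != InvalidOid)
    {
        addr.classId = get_objtype_catalog_oid(objtype);
        addr.objectId = objectId;
    }
    return addr;
}
```

This also shows Stephen's earlier point: the Oid is still there, just embedded alongside the catalog and sub-object that make the identity complete.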
Re: [HACKERS] deparsing utility commands
Alvaro, * Alvaro Herrera (alvhe...@2ndquadrant.com) wrote: > Andres Freund wrote: > > On 2015-02-15 01:48:15 -0300, Alvaro Herrera wrote: > > I think you should just go ahead and commit 0001, 0002, 0006. Those have > > previously been discussed and seem like a good idea and uncontroversial > > even if the rest of the series doesn't go in. > > I have pushed 0001 (changed pg_identify_object for opclasses and > opfamilies to use USING instead of FOR) to master only, and backpatched > a fix for pg_conversion object identities, which were missing > schema-qualification. That looked fine to me. > Patch 0002 I think is good to go as well, AFAICT (have the various > RENAME commands return the OID and attnum of affected objects). It's not a huge complaint, but it feels a bit awkward to me that ExecRenameStmt is now returning one item and using an out variable for the other when the two really go together (Oid and Object Sub ID, that is). Further, the comment above ExecRenameStmt should make it clear that it's safe to pass NULL into objsubid if you don't care about it. The same probably goes for the COMMENT bits. > On 0006 (which is about having ALTER TABLE return the OID/attnum of the > affected object on each subcommand), I have a problem about the ALTER > TABLE ALTER COLUMN SET DATA TYPE USING subcommand. The problem with > that is that in order to fully deparse that subcommand we need to > deparse the expression of the USING clause. But in the parse node we > only have info about the untransformed expression, so it is not possible > to pass it through ruleutils directly; it needs to be run by > transformExpr() first. Doing that in the deparse code seems the wrong > solution, so I am toying with the idea of adding a "Node *" output > parameter for some ATExec* routines; the Prep step would insert the > transformed expression there, so that it is available at the deparse > stage. 
As far as I know, only the SET DATA TYPE USING form has this > issue: for other subcommands, the current code is simply grabbing the > expression from catalogs. Maybe it would be good to use that Node in > those cases as well and avoid catalog lookups, not sure. I agree- I'm pretty sure we definitely don't want to run through transformExpr again in the deparse code. I'm not a huge fan of adding a Node* output parameter, but I haven't got any other great ideas about how to address that. > > I think the GRANT/REVOKE, COMMENT ON and SECURITY LABEL changes are good > > independently as well, but there previously have been raised some > > concerns about shared objects. I think the answer in the patches which > > is to raise events when affecting database local objects makes sense, > > but others might disagree. > > Yes, I will push these unless somebody objects soon, as they seem > perfectly reasonable to me. The only troubling thing is that there is > no regression test for this kind of thing in event triggers (i.e. verify > which command tags get support and which don't), which seems odd to me. > Not these patches' fault, though, so I'm not considering adding any ATM. Ugh. I dislike that when we say an event trigger will fire before 'GRANT' what we really mean is "GRANT when it's operating against a local object". At the minimum we absolutely need to be very clear in the documentation about that limitation. Perhaps there is something already which reflects that, but I don't see anything in the patch being added about that. Still looking at the rest of the patches. Thanks! Stephen
Re: [HACKERS] deparsing utility commands
Andres Freund wrote: > Hi, > > On 2015-02-15 01:48:15 -0300, Alvaro Herrera wrote: > > This is a repost of the patch to add CREATE command deparsing support to > > event triggers. It now supports not only CREATE but also ALTER and > > other commands such as GRANT/REVOKE, COMMENT ON and SECURITY LABEL. > > This patch series is broken up in a rather large number of patches, but > > my intention would be to commit separately only the first 6 patches; the > > remaining parts are split up only so that it's easy to see how deparsing > > each patch is implemented, and would be committed as a single changeset > > (but before that I need a bunch of extra fixes and additions to other > > parts.) > > I think you should just go ahead and commit 0001, 0002, 0006. Those have > previously been discussed and seem like a good idea and uncontroversial > even if the rest of the series doesn't go in. I have pushed 0001 (changed pg_identify_object for opclasses and opfamilies to use USING instead of FOR) to master only, and backpatched a fix for pg_conversion object identities, which were missing schema-qualification. Patch 0002 I think is good to go as well, AFAICT (have the various RENAME commands return the OID and attnum of affected objects). On 0006 (which is about having ALTER TABLE return the OID/attnum of the affected object on each subcommand), I have a problem about the ALTER TABLE ALTER COLUMN SET DATA TYPE USING subcommand. The problem with that is that in order to fully deparse that subcommand we need to deparse the expression of the USING clause. But in the parse node we only have info about the untransformed expression, so it is not possible to pass it through ruleutils directly; it needs to be run by transformExpr() first. 
Doing that in the deparse code seems the wrong solution, so I am toying with the idea of adding a "Node *" output parameter for some ATExec* routines; the Prep step would insert the transformed expression there, so that it is available at the deparse stage. As far as I know, only the SET DATA TYPE USING form has this issue: for other subcommands, the current code is simply grabbing the expression from catalogs. Maybe it would be good to use that Node in those cases as well and avoid catalog lookups, not sure. > I think the GRANT/REVOKE, COMMENT ON and SECURITY LABEL changes are good > independently as well, but there previously have been raised some > concerns about shared objects. I think the answer in the patches which > is to raise events when affecting database local objects makes sense, > but others might disagree. Yes, I will push these unless somebody objects soon, as they seem perfectly reasonable to me. The only troubling thing is that there is no regression test for this kind of thing in event triggers (i.e. verify which command tags get support and which don't), which seems odd to me. Not these patches' fault, though, so I'm not considering adding any ATM. -- Álvaro Herrera http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Re: [HACKERS] deparsing utility commands
Alvaro, * Alvaro Herrera (alvhe...@2ndquadrant.com) wrote: > Stephen Frost wrote: > > I've started taking a look at this as the pgaudit bits depend on it and > > noticed that the patch set implies there's 42 patches, but there were > > only 37 attached..? > > Ah, I didn't realize when I posted that the subject was counting those > extra patches. They are later patches that add the testing framework, > but since the tests don't pass currently, they are not usable yet. > Mostly they are about the running deparse_init.sql file that I posted > separately. I will post a real patch for that stuff later today to make > it clear what it is that we're talking about. Oh, ok, no problem, just wanted to make sure things weren't missing. > FWIW, one of Robert's main objections is that future hackers will forget > to add deparse support for new commands as they are added. In an > attempt to get this sorted out, I have modified the stuff in > ProcessUtilitySlow() so that instead of having one > EventTriggerStashCommand call for each node type, there is only one call > at the end of the function. That way, new cases in the big switch there > will automatically get something added to the stash, which should make > the test fail appropriately. (Things like adding a new clause to > existing commands will be tested by running pg_dump on the databases and > comparing.) Right, I've been following the discussion and fully agree with Robert that we really need a way to make sure that future work doesn't forget to address deparsing (or the various other bits, object classes, etc, really). The approach you've outlined sounds interesting but I haven't yet gotten to that bit of the code. Thanks! Stephen
Re: [HACKERS] deparsing utility commands
Stephen Frost wrote: > I've started taking a look at this as the pgaudit bits depend on it and > noticed that the patch set implies there's 42 patches, but there were > only 37 attached..? Ah, I didn't realize when I posted that the subject was counting those extra patches. They are later patches that add the testing framework, but since the tests don't pass currently, they are not usable yet. Mostly they are about the running deparse_init.sql file that I posted separately. I will post a real patch for that stuff later today to make it clear what it is that we're talking about. FWIW, one of Robert's main objections is that future hackers will forget to add deparse support for new commands as they are added. In an attempt to get this sorted out, I have modified the stuff in ProcessUtilitySlow() so that instead of having one EventTriggerStashCommand call for each node type, there is only one call at the end of the function. That way, new cases in the big switch there will automatically get something added to the stash, which should make the test fail appropriately. (Things like adding a new clause to existing commands will be tested by running pg_dump on the databases and comparing.) -- Álvaro Herrera http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Re: [HACKERS] deparsing utility commands
Alvaro, * Alvaro Herrera (alvhe...@2ndquadrant.com) wrote: > This is a repost of the patch to add CREATE command deparsing support to > event triggers. It now supports not only CREATE but also ALTER and > other commands such as GRANT/REVOKE, COMMENT ON and SECURITY LABEL. > This patch series is broken up in a rather large number of patches, but > my intention would be to commit separately only the first 6 patches; the > remaining parts are split up only so that it's easy to see how deparsing > each patch is implemented, and would be committed as a single changeset > (but before that I need a bunch of extra fixes and additions to other > parts.) I've started taking a look at this as the pgaudit bits depend on it and noticed that the patch set implies there's 42 patches, but there were only 37 attached..? Thanks! Stephen
Re: [HACKERS] Odd behavior of updatable security barrier views on foreign tables
Etsuro, * Etsuro Fujita (fujita.ets...@lab.ntt.co.jp) wrote: > On 2015/02/18 7:44, Stephen Frost wrote: > * The patch applies to the latest head, 'make' passes successfully, > but 'make check' fails in the rowsecurity test. Here's the patch against master. I'm still fiddling with the comment wording and the commit message a bit, but barring objections these patches are what I'm planning to move forward with. Thanks! Stephen From ea3713a8b648459d3024d331ef0374f6c9622247 Mon Sep 17 00:00:00 2001 From: Stephen Frost Date: Tue, 17 Feb 2015 15:43:33 -0500 Subject: [PATCH] Add locking clause for SB views for update/delete In expand_security_qual(), we were handling locking correctly when a PlanRowMark existed, but not when we were working with the target relation (which doesn't have any PlanRowMarks, but the subquery created for the security barrier quals still needs to lock the rows under it). Noted by Etsuro Fujita when working with the Postgres FDW, which wasn't properly issuing a SELECT ... FOR UPDATE to the remote side under a DELETE. Back-patch to 9.4 where updatable security barrier views were introduced. 
--- src/backend/optimizer/prep/prepsecurity.c | 24 ++- src/test/regress/expected/rowsecurity.out | 64 --- src/test/regress/expected/updatable_views.out | 264 ++ 3 files changed, 199 insertions(+), 153 deletions(-) diff --git a/src/backend/optimizer/prep/prepsecurity.c b/src/backend/optimizer/prep/prepsecurity.c index af3ee61..ce7b203 100644 --- a/src/backend/optimizer/prep/prepsecurity.c +++ b/src/backend/optimizer/prep/prepsecurity.c @@ -37,7 +37,7 @@ typedef struct } security_barrier_replace_vars_context; static void expand_security_qual(PlannerInfo *root, List *tlist, int rt_index, - RangeTblEntry *rte, Node *qual); + RangeTblEntry *rte, Node *qual, bool targetRelation); static void security_barrier_replace_vars(Node *node, security_barrier_replace_vars_context *context); @@ -63,6 +63,7 @@ expand_security_quals(PlannerInfo *root, List *tlist) Query *parse = root->parse; int rt_index; ListCell *cell; + bool targetRelation = false; /* * Process each RTE in the rtable list. @@ -98,6 +99,12 @@ expand_security_quals(PlannerInfo *root, List *tlist) { RangeTblEntry *newrte = copyObject(rte); + /* + * We need to let expand_security_qual know if this is the target + * relation, as it has additional work to do in that case. 
+ */ + targetRelation = true; + parse->rtable = lappend(parse->rtable, newrte); parse->resultRelation = list_length(parse->rtable); @@ -147,7 +154,8 @@ expand_security_quals(PlannerInfo *root, List *tlist) rte->securityQuals = list_delete_first(rte->securityQuals); ChangeVarNodes(qual, rt_index, 1, 0); - expand_security_qual(root, tlist, rt_index, rte, qual); + expand_security_qual(root, tlist, rt_index, rte, qual, + targetRelation); } } } @@ -160,7 +168,7 @@ expand_security_quals(PlannerInfo *root, List *tlist) */ static void expand_security_qual(PlannerInfo *root, List *tlist, int rt_index, - RangeTblEntry *rte, Node *qual) + RangeTblEntry *rte, Node *qual, bool targetRelation) { Query *parse = root->parse; Oid relid = rte->relid; @@ -256,6 +264,16 @@ expand_security_qual(PlannerInfo *root, List *tlist, int rt_index, } /* + * We need to handle the case where this is the target relation + * explicitly since it won't have any row marks, because we still + * need to lock the records coming back from the with-security-quals + * subquery. This might not appear obvious, but it matches what + * we're doing above and keeps FDWs happy too. 
+ */ + if (targetRelation) +applyLockingClause(subquery, 1, LCS_FORUPDATE, + LockWaitBlock, false); + /* * Replace any variables in the outer query that refer to the * original relation RTE with references to columns that we will * expose in the new subquery, building the subquery's targetlist diff --git a/src/test/regress/expected/rowsecurity.out b/src/test/regress/expected/rowsecurity.out index 21817d8..f41bef1 100644 --- a/src/test/regress/expected/rowsecurity.out +++ b/src/test/regress/expected/rowsecurity.out @@ -1034,22 +1034,25 @@ EXPLAIN (COSTS OFF) EXECUTE p2(2); -- SET SESSION AUTHORIZATION rls_regress_user1; EXPLAIN (COSTS OFF) UPDATE t1 SET b = b || b WHERE f_leak(b); - QUERY PLAN -- +QUERY PLAN +--- Update on t1 t1_3 -> Subquery Scan on t1 Filter: f_leak(t1.b) - -> Seq Scan on t1 t1_4 - Filter: ((a % 2) = 0) + -> LockRows + -> Seq Scan on t1 t1_4 + Filter: ((a % 2) = 0) -> Subquery Scan on t1_1 Filter: f_leak(t1_1.b) - -> Seq Scan on t2 -
Re: [HACKERS] Commit fest 2015-12 enters money time
On Tue, Feb 17, 2015 at 12:34 AM, Michael Paquier wrote: > On Tue, Feb 17, 2015 at 7:39 AM, Kohei KaiGai wrote: > > 2015-02-16 12:25 GMT+09:00 Michael Paquier : > >> > >> > >> On Fri, Feb 13, 2015 at 10:06 AM, Michael Paquier > >> wrote: > >>> > >>> In order to move on to the next CF, I am going to go through all the > >>> remaining patches and update their status accordingly. And sorry for > >>> slacking a bit regarding that. If you wish to complain, of course feel > >>> free to post messages on this thread or on the thread related to the > >>> patch itself. > >> > >> > >> As all the entries have been either moved or updated, CF 2014-12 is > closed, > >> and CF 2015-02 has been changed to "In Progress". > >> > > I tried to add my name on the CF entry for the "Join pushdown support > > for foreign tables" patch, however, it was unintentionally marked as > > rejected on the last CF, even though it should be "returned with > feedback". > > > > https://commitfest.postgresql.org/3/20/ > > > > Is it available to move it to the current CF? > > I don't recall you can... > You're correct, you can't. It would probably make sense for at least a CF manager to be able to do that - added to my TODO. -- Magnus Hagander Me: http://www.hagander.net/ Work: http://www.redpill-linpro.com/
Re: [HACKERS] Commit fest 2015-12 enters money time
On Wed, Feb 18, 2015 at 12:34 AM, Michael Paquier wrote: > On Wed, Feb 18, 2015 at 5:55 AM, Corey Huinker > wrote: > > What is required to get the New Patch superpower? I'm also in need of it. > > CCing Magnus... I am not sure myself, but I would imagine that the > commit fest needs to be "Open" and not "In Process" to be able to add > patches. > Sorry, I was out on a long travel day with lots of airplane time. I had this thread on my watch list but hadn't time to fix it. So the basic reason is that there was no Open commitfest available. Only Commitfest Managers are allowed to "break the rules" and add patches to the wrong commitfest - this is why Michael was able to do that, and pretty much nobody else. There's supposed to always be one open commitfest. Since none was added when 2015-02 was switched to In Progress (probably because I didn't actually tell Michael about it), there was nowhere to post new patches. I have fixed this now by creating 2015-06, which is open for submissions. -- Magnus Hagander Me: http://www.hagander.net/ Work: http://www.redpill-linpro.com/
Re: [HACKERS] deparsing utility commands
Hi, On 2015-02-15 01:48:15 -0300, Alvaro Herrera wrote: > This is a repost of the patch to add CREATE command deparsing support to > event triggers. It now supports not only CREATE but also ALTER and > other commands such as GRANT/REVOKE, COMMENT ON and SECURITY LABEL. > This patch series is broken up in a rather large number of patches, but > my intention would be to commit separately only the first 6 patches; the > remaining parts are split up only so that it's easy to see how deparsing > each patch is implemented, and would be committed as a single changeset > (but before that I need a bunch of extra fixes and additions to other > parts.) I think you should just go ahead and commit 0001, 0002, 0006. Those have previously been discussed and seem like a good idea and uncontroversial even if the rest of the series doesn't go in. I think the GRANT/REVOKE, COMMENT ON and SECURITY LABEL changes are good independently as well, but there previously have been raised some concerns about shared objects. I think the answer in the patches which is to raise events when affecting database local objects makes sense, but others might disagree. Greetings, Andres Freund -- Andres Freund http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services
Re: [HACKERS] pgaudit - an auditing extension for PostgreSQL
Hi Fujii, Thanks for taking a look at the patch. Comments below: On 2/18/15 6:11 AM, Fujii Masao wrote: > On Wed, Feb 18, 2015 at 1:26 AM, David Steele wrote: >> On 2/17/15 10:23 AM, Simon Riggs wrote: >>> I vote to include pgaudit in 9.5, albeit with any changes. In >>> particular, David may have some changes to recommend, but I haven't >>> seen a spec or a patch, just a new version of code (which isn't how we >>> do things...). >> >> I submitted the new patch in my name under a separate thread "Auditing >> extension for PostgreSQL (Take 2)" (54e005cc.1060...@pgmasters.net) > > I played the patch version of pg_audit a bit and have basic comments about > its spec. > > The pg_audit doesn't log BIND parameter values when prepared statement is > used. > Seems this is an oversight of the patch. Or is this intentional? It's actually intentional - following the model I talked about in my earlier emails, the idea is to log statements only. This also follows on 2ndQuadrant's implementation. Logging values is interesting, but I'm sure the user would want to specify which columns to log, which I felt was beyond the scope of the patch. > The pg_audit cannot log the statement like "SELECT 1" which doesn't access to > the database object. Is this intentional? I think that there are many users > who > want to audit even such statement. I think I see how to make this work. I'll work on it for the next version of the patch. > > Imagine the case where you call the user-defined function which executes > many nested statements. In this case, pg_audit logs only top-level statement > (i.e., issued directly by client) every time nested statement is executed. > In fact, one call of such UDF can cause lots of *same* log messages. I think > this is problematic. I agree - not sure how to go about addressing it, though. I've tried to cut down on the verbosity of the logging in general, but of course it can still be a problem. 
Using security definer and a different logging GUC for the defining role might work. I'll add that to my unit tests and see what happens. I appreciate your feedback! -- - David Steele da...@pgmasters.net
Re: [HACKERS] Add min and max execute statement time in pg_stat_statement
On 02/17/2015 08:21 PM, Peter Eisentraut wrote: On 1/20/15 6:32 PM, David G Johnston wrote: In fact, as far as the database knows, the values provided to this function do represent an entire population and such a correction would be unnecessary. I guess it boils down to whether "future" queries are considered part of the population or whether the population changes upon each query being run and thus we are calculating the ever-changing population variance. I think we should be calculating the population variance. We are clearly taking the population to be all past queries (from the last reset point). Otherwise calculating min and max wouldn't make sense. The difference is likely to be very small in any case where you actually want to examine the standard deviation, so I feel we're rather arguing about how many angels can fit on the head of a pin, but if this is the consensus I'll change it. cheers andrew
Re: [HACKERS] Auditing extension for PostgreSQL (Take 2)
On 2/18/15 8:25 AM, Simon Riggs wrote: > On 15 February 2015 at 02:34, David Steele wrote: > >> I've posted a couple of messages over the last few weeks about the work >> I've been doing on the pg_audit extension. The lack of response could >> be due to either universal acclaim or complete apathy, but in any case I >> think this is a very important topic so I want to give it another try. > > You mentioned you had been following the thread for some time and yet > had not contributed to it. Did that indicate your acclaim for the > earlier patch, or was that apathy? I think neither. In my case it actually was acclaim. I was happy with the direction things were going and had nothing in particular to add - and I didn't think a +1 from me was going to carry any weight with the community. I can see now that everyone's opinion matters here, so I'll be more active about weighing in when I think something is valuable. > > People have been working on this feature for >9 months now, so you > having to wait 9 days for a response is neither universal acclaim, nor > apathy. I've waited much longer than that for Stephen to give the > review he promised, but have not bad mouthed him for that wait, nor do > I do so now. In your first post you had removed the author's email > addresses, so they were likely unaware of your post. I certainly was. I understand that, but with the CF closing I felt like I had to act. Abhijit's last comment on the thread was that he was no longer going to work on it in relation to 9.5. I felt that it was an important feature (and one that I have a lot of interest in), so that's when I got involved. I posted two messages, but I only addressed one of them directly to Abhijit. As you said, I'm new here and I'm still getting used to the way things are done. >> I've extensively reworked the code that was originally submitted by >> 2ndQuandrant. 
This is not an indictment of their work, but rather an >> attempt to redress concerns that were expressed by members of the >> community. I've used core functions to determine how audit events >> should be classified and simplified and tightened the code wherever >> possible. I've removed deparse and event triggers and opted for methods >> that rely only on existing hooks. In my last message I provided >> numerous examples of configuration, usage, and output which I hoped >> would alleviate concerns of complexity. I've also written a ton of unit >> tests to make sure that the code works as expected. > > Some people that have contributed ideas to this patch are from > 2ndQuadrant, some are not. The main point is that we work together on > things, rather than writing a slightly altered version and then > claiming credit. > > If you want to help, please do. We give credit where its due, not to > whoever touched the code last in some kind of bidding war. If we let > this happen, we'd generate a flood of confusing patch versions and > little would ever get committed. Agreed, and I apologize if I came off that way. It certainly wasn't my intention. I was hesitant because I had made so many changes and I wasn't sure how the authors would feel about it. I wrote to them privately to get their take on the situation. > Let's keep to one thread and work to include everybody's ideas then > we'll get something useful committed. I'm a little confused about how to proceed here. I created a new thread because the other patch had already been rejected. How should I handle that? -- - David Steele da...@pgmasters.net
Re: [HACKERS] Proposal : REINDEX xxx VERBOSE
> VACUUM [FULL] [FREEZE] ... [ANALYZE] [tname [(cname, ...)] > | VACUUM [({FULL [bool]|FREEZE [bool]|...}[,...])] [tname [(cname, ...)] > | VACUUM [tname [(cname, ...)] [[WITH ]({FULL [bool]|FREEZE [bool]|...})] > > REINDEX [{INDEX|TABLE|...}] name [[WITH] (VERBOSE [bool]|...)] > > EXPLAIN [[WITH] ({ANALYZE [bool]|VERBOSE [bool]|... [,...]})] > I don't think "(OptionName [bool])" style like "(VERBOSE on, FORCE on)" is needed for the REINDEX command. The EXPLAIN command has such an option style because its FORMAT option can take values other than ON/TRUE or OFF/FALSE (e.g., TEXT, XML). But a REINDEX command option can only be ON or OFF, so I think the option name alone is good enough. Next, regarding the location of the option: several maintenance commands like CLUSTER and VACUUM put options immediately after the command name. From a consistency perspective, I tend to agree with Robert that the option should go immediately after the command name, as follows. REINDEX [(VERBOSE | FORCE)] {INDEX | ...} name; Btw, how long will the FORCE option remain available? Regards, --- Sawada Masahiko
Re: [HACKERS] Expanding the use of FLEXIBLE_ARRAY_MEMBER for declarations like foo[1]
Andres Freund writes: > On 2015-02-16 21:34:57 -0500, Tom Lane wrote: >> Also, if we want to insist that these fields be accessed >> through heap_getattr, I'd be inclined to put them inside the "#ifdef >> CATALOG_VARLEN" to enforce that. > That we definitely should do. It's imo just a small bug that it was > omitted here. I'll fix it, but not backpatch unless you prefer? Agreed, that's really independent of FLEXIBLE_ARRAY_MEMBER, so fix it as its own patch. I also agree that back-patching is probably not appropriate. regards, tom lane
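For readers following the thread's title, the rule being enforced throughout this series — never take sizeof() of a struct ending in a flexible array member; compute allocation sizes with offsetof() instead — can be illustrated with a minimal standalone sketch (hypothetical struct, with malloc standing in for palloc):

```c
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical struct following the FLEXIBLE_ARRAY_MEMBER pattern
 * discussed in this thread. */
typedef struct OidList
{
	int			nmembers;
	unsigned int members[];		/* FLEXIBLE_ARRAY_MEMBER */
} OidList;

/*
 * Allocate room for n members.  offsetof() gives the true start of the
 * array; sizeof(OidList) may include trailing padding and is not
 * consistent across platforms, which is exactly the bug class being
 * hunted in this patch series.
 */
static OidList *
oid_list_create(int n)
{
	OidList    *list = malloc(offsetof(OidList, members) +
							  n * sizeof(unsigned int));

	list->nmembers = n;
	memset(list->members, 0, n * sizeof(unsigned int));
	return list;
}
```

The same shape covers the FuncCandidateList and ParamListInfoData fixes upthread: the old "sizeof(struct) - sizeof(elem) + n * sizeof(elem)" arithmetic only worked by accident when the struct declared a one-element array.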
Re: [HACKERS] pgaudit - an auditing extension for PostgreSQL
Stephen Frost wrote: > Abhijit, > > * Abhijit Menon-Sen (a...@2ndquadrant.com) wrote: > > I'm confused and unhappy about your characterisation of the state of > > this patch. You make it seem as though there was broad consensus about > > the changes that were needed, and that I left the patch sitting in the > > archives for a long time without addressing important issues. > > The specific issue which I was referring to there was the #ifdef's for > the deparse branch. I thought it was pretty clear, based on the > feedback from multiple people, that all of that really needed to be > taken out as we don't commit code to the main repo which has such > external dependencies. We can remove the #ifdef lines as soon as DDL deparse gets committed, of course. There is no external dependency here, only a dependency on a patch that's being submitted in parallel. -- Álvaro Herrera http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Re: [HACKERS] [REVIEW] Re: Compression of full-page-writes
Hello, >I think we should change the xlog format so that the block_id (which currently >is XLR_BLOCK_ID_DATA_SHORT/LONG or an actual block id) isn't the block id but >something like XLR_CHUNK_ID. Which is used as is for >XLR_CHUNK_ID_DATA_SHORT/LONG, but for backup blocks can be set to >XLR_CHUNK_BKP_WITH_HOLE, XLR_CHUNK_BKP_COMPRESSED, XLR_CHUNK_BKP_REFERENCE... >The BKP blocks will then follow, storing the block id following the chunk id. >Yes, that'll increase the amount of data for a backup block by 1 byte, but I >think that's worth it. I'm pretty sure we will be happy about the added >extensibility pretty soon. To clarify my understanding of the above change: instead of a block id to reference different fragments of an xlog record, a single byte field "chunk_id" should be used. chunk_id will be the same as XLR_BLOCK_ID_DATA_SHORT/LONG for main data fragments. But for block references, it will store the following values to carry information about the backup blocks. #define XLR_CHUNK_BKP_COMPRESSED 0x01 #define XLR_CHUNK_BKP_WITH_HOLE 0x02 ... The new xlog format should look as follows:
Fixed-size header (XLogRecord struct)
Chunk_id (add a field before the id field in XLogRecordBlockHeader struct)
XLogRecordBlockHeader
Chunk_id
XLogRecordBlockHeader
...
...
Chunk_id (rename the id field of the XLogRecordDataHeader struct)
XLogRecordDataHeader[Short|Long]
block data
block data
...
main data
I will post a patch based on this. Thank you, Rahila Syed -Original Message- From: Andres Freund [mailto:and...@2ndquadrant.com] Sent: Monday, February 16, 2015 5:26 PM To: Syed, Rahila Cc: Michael Paquier; Fujii Masao; PostgreSQL mailing lists Subject: Re: [HACKERS] [REVIEW] Re: Compression of full-page-writes On 2015-02-16 11:30:20 +, Syed, Rahila wrote: > - * As a trivial form of data compression, the XLOG code is aware that > - * PG data pages usually contain an unused "hole" in the middle, > which
If hole_length > 0 then we have removed > - * such a "hole" from the stored data (and it's not counted in the > - * XLOG record's CRC, either). Hence, the amount of block data > actually > - * present is BLCKSZ - hole_length bytes. > + * Block images are able to do several types of compression: > + * - When wal_compression is off, as a trivial form of compression, > + the > + * XLOG code is aware that PG data pages usually contain an unused "hole" > + * in the middle, which contains only zero bytes. If length < BLCKSZ > + * then we have removed such a "hole" from the stored data (and it is > + * not counted in the XLOG record's CRC, either). Hence, the amount > + * of block data actually present is "length" bytes. The hole "offset" > + * on page is defined using "hole_offset". > + * - When wal_compression is on, block images are compressed using a > + * compression algorithm without their hole to improve compression > + * process of the page. "length" corresponds in this case to the > + length > + * of the compressed block. "hole_offset" is the hole offset of the > + page, > + * and the length of the uncompressed block is defined by > + "raw_length", > + * whose data is included in the record only when compression is > + enabled > + * and "with_hole" is set to true, see below. > + * > + * "is_compressed" is used to identify if a given block image is > + compressed > + * or not. Maximum page size allowed on the system being 32k, the > + hole > + * offset cannot be more than 15-bit long so the last free bit is > + used to > + * store the compression state of block image. If the maximum page > + size > + * allowed is increased to a value higher than that, we should > + consider > + * increasing this structure size as well, but this would increase > + the > + * length of block header in WAL records with alignment. > + * > + * "with_hole" is used to identify the presence of a hole in a block image. 
> + * As the length of a block cannot be more than 15-bit long, the > + extra bit in > + * the length field is used for this identification purpose. If the > + block image > + * has no hole, it is ensured that the raw size of a compressed block > + image is > + * equal to BLCKSZ, hence the contents of > + XLogRecordBlockImageCompressionInfo > + * are not necessary. > */ > typedef struct XLogRecordBlockImageHeader { > - uint16 hole_offset;/* number of bytes before "hole" */ > - uint16 hole_length;/* number of bytes in "hole" */ > + uint16 length:15, /* length of block data in > record */ > + with_hole:1;/* status of hole in the block > */ > + > + uint16 hole_offset:15, /* number of bytes before "hole" */ > + is_compressed:1;/* compression status of image */ > + > + /* Followed by the data related to compression if block is > +compressed */ > } XLogR
Re: [HACKERS] pgaudit - an auditing extension for PostgreSQL
Abhijit, * Abhijit Menon-Sen (a...@2ndquadrant.com) wrote: > At 2015-02-17 13:01:46 -0500, sfr...@snowman.net wrote: > > I have to admit that I'm confused by this. Patches don't stabilise > > through sitting in the archives, they stabilise when the comments are > > being addressed, the patch updated, and further comments are > > addressing less important issues. The issues which Robert and I had > > both commented on didn't appear to be getting addressed. > > I'm confused and unhappy about your characterisation of the state of > this patch. You make it seem as though there was broad consensus about > the changes that were needed, and that I left the patch sitting in the > archives for a long time without addressing important issues. The specific issue which I was referring to there was the #ifdef's for the deparse branch. I thought it was pretty clear, based on the feedback from multiple people, that all of that really needed to be taken out as we don't commit code to the main repo which has such external dependencies. That wasn't meant as a characterization of the patch itself but rather a comment on the state of the process and that I, at least, had been hoping for a new version which addressed those bits. > You revised and refined your GRANT proposal in stages, and I tried to > adapt the code to suit. I'm sorry that my efforts were not fast enough > or responsive enough to make you feel that progress was being made. But > nobody else commented in detail on the GRANT changes except to express > general misgivings, and nobody else even disagreed when I inferred, in > light of the lack of consensus that Robert pointed out, that the code > was unlikely to make it into 9.5. Progress was being made and I certainly appreciate your efforts. I don't think anyone would want to stand up and say it's going to be in 9.5 or not be in 9.5 which is likely why you didn't get any reaction from your comment about it being unlikely. 
I'm certainly hopeful that something gets in along these lines as it's a pretty big capability that we're currently missing. > Given that I've maintained the code over the past year despite its being > rejected in an earlier CF, and given the circumstances outlined above, I > do not think it's reasonable to conclude after a couple of weeks without > a new version that it was abandoned. As I had mentioned earlier, there > are people who already use pgaudit as-is, and complain if I break it. For my part, at least, I didn't intend to characterize it as abandoned but rather that it simply didn't seem to be moving forward, perhaps due to a lack of time to work on it or other priorities; I didn't mean to imply that you wouldn't be continuing to maintain it. As for the comment about people depending on it, I'm not sure what that's getting at, but I don't think that's really a consideration for us as it relates to having a contrib module to provide this capability. We certainly want to consider what capabilities users are looking for and make sure we cover those cases, if possible, but this is at a development stage with regard to core and therefore things may break for existing users. If that's an issue for your users then it would probably be good to have a clean distinction between the stable code you're providing to them for auditing and what's being submitted for inclusion in core, with an expectation that users will need to make some changes when they switch to the version which is included in core. > Anyway, I guess there is no such thing as a constructive history > discussion, so I'll drop it. It's not my intent to continue this as I certainly agree that it seems unlikely to be a constructive use of time, but I wanted to reply and clarify that it wasn't my intent to characterize pgaudit as abandoned and to apologize for my comments coming across that way. Thanks! Stephen
Re: [HACKERS] postgres_fdw foreign keys with default sequence
Tim Kane wrote: > CREATE MATERIALISED VIEW local.devices; > > CREATE test_table (device_id bigint FOREIGN KEY (device_id) REFERENCES > local.devices (device_id) ); > > ERROR: referenced relation "devices" is not a table In the future, please show code that you have actually run. In this case it's pretty easy to know how to answer, but in others it may really draw out the process of helping you. At this time materialized views do not support constraints, and may not be referenced in foreign key definitions. -- Kevin Grittner EDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Re: [HACKERS] Auditing extension for PostgreSQL (Take 2)
On 15 February 2015 at 02:34, David Steele wrote: > I've posted a couple of messages over the last few weeks about the work > I've been doing on the pg_audit extension. The lack of response could > be due to either universal acclaim or complete apathy, but in any case I > think this is a very important topic so I want to give it another try. You mentioned you had been following the thread for some time and yet had not contributed to it. Did that indicate your acclaim for the earlier patch, or was that apathy? I think neither. People have been working on this feature for >9 months now, so you having to wait 9 days for a response is neither universal acclaim, nor apathy. I've waited much longer than that for Stephen to give the review he promised, but have not bad mouthed him for that wait, nor do I do so now. In your first post you had removed the author's email addresses, so they were likely unaware of your post. I certainly was. > I've extensively reworked the code that was originally submitted by > 2ndQuadrant. This is not an indictment of their work, but rather an > attempt to redress concerns that were expressed by members of the > community. I've used core functions to determine how audit events > should be classified and simplified and tightened the code wherever > possible. I've removed deparse and event triggers and opted for methods > that rely only on existing hooks. In my last message I provided > numerous examples of configuration, usage, and output which I hoped > would alleviate concerns of complexity. I've also written a ton of unit > tests to make sure that the code works as expected. Some people that have contributed ideas to this patch are from 2ndQuadrant, some are not. The main point is that we work together on things, rather than writing a slightly altered version and then claiming credit. If you want to help, please do. We give credit where it's due, not to whoever touched the code last in some kind of bidding war. 
If we let this happen, we'd generate a flood of confusing patch versions and little would ever get committed. Let's keep to one thread and work to include everybody's ideas then we'll get something useful committed. -- Simon Riggs http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, RemoteDBA, Training & Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Expanding the use of FLEXIBLE_ARRAY_MEMBER for declarations like foo[1]
On 2015-02-16 21:34:57 -0500, Tom Lane wrote: > Andres Freund writes: > > On 2015-02-17 05:51:22 +0900, Michael Paquier wrote: > >> diff --git a/src/include/catalog/pg_authid.h > >> b/src/include/catalog/pg_authid.h > >> index e01e6aa..d8789a5 100644 > >> --- a/src/include/catalog/pg_authid.h > >> +++ b/src/include/catalog/pg_authid.h > >> @@ -56,8 +56,10 @@ CATALOG(pg_authid,1260) BKI_SHARED_RELATION > >> BKI_ROWTYPE_OID(2842) BKI_SCHEMA_MAC > >> int32 rolconnlimit; /* max connections allowed (-1=no > >> limit) */ > >> > >> /* remaining fields may be null; use heap_getattr to read them! */ > >> - text rolpassword; /* password, if any */ > >> timestamptz rolvaliduntil; /* password expiration time, if any */ > >> +#ifdef CATALOG_VARLEN > >> + text rolpassword; /* password, if any */ > >> +#endif > >> } FormData_pg_authid; > > > That change IIRC is wrong, because it'll make rolvaliduntil NOT > > NULL (any column that's fixed width and has only fixed-width columns > > before it is marked as such). > > You were muttering about a BKI_FORCE_NOT_NULL option ... for symmetry, > maybe we could add BKI_FORCE_NULL as well, and then use that for cases > like this? Yea, I guess it'd not be too hard. > Also, if we want to insist that these fields be accessed > through heap_getattr, I'd be inclined to put them inside the "#ifdef > CATALOG_VARLEN" to enforce that. That we definitely should do. It's imo just a small bug that it was omitted here. I'll fix it, but not backpatch unless you prefer? > I'm generally -1 on reordering any catalog columns as part of this patch. > There should be zero user-visible change from it IMO. However, if we > stick both those columns inside the ifdef, we don't need to reorder. Agreed.
Greetings, Andres Freund -- Andres Freund http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services
Re: [HACKERS] pgaudit - an auditing extension for PostgreSQL
Neil, * Neil Tiffin (ne...@neiltiffin.com) wrote: > I meant it to go to the list, but hit the wrong button. No problem. > > On Feb 17, 2015, at 7:01 PM, Stephen Frost wrote: > > I noticed that you email'd me directly on this reply. Not sure if you > > intended to or not, but I'm fine with my response going to the list. > > > > This approach doesn't violate anything in PG and can be used with any of > > the pgaudit approaches being discussed. The fact that it's > > postgresql.conf, which represents GUCs, doesn't change anything > > regarding what you’re getting. > > > > It changes everything, pg superusers have complete control of files in the pg > data directory. If set up correctly pg superusers have no control in /etc. If set up correctly, postgresql.conf is in /etc. :) That's distribution-specific though. However, postgresql.auto.conf ends up causing problems since it can't be disabled and it's forced to live in the PG data directory. Even so, your argument was that, as long as the system is set up to be auditing initially, whatever action the superuser takes to disable auditing will itself be audited, and disabling auditing is against policy, it's sufficient to meet the requirement. That's true in either case. > > The issue is really around what we claim to provide with this auditing. > > We can't claim to provide *full* superuser auditing with any of these > > approaches since the superuser can disable auditing. We can, however, > > provide auditing up until that happens, which is likely to be sufficient > > in many environments. For those environments where full superuser > > auditing is required, an independent system must be used. > > > > Of course, it's important that any auditing mechanism which is used to > > audit superusers be impossible for the superuser to modify after the > > fact, meaning that syslog or similar needs to be used.
> > I’m still confused since you do not differentiate between db superuser and > os superuser and what you mean by an independent system? When I'm talking about 'superuser' here, it's the PG superuser, not the OS superuser. What I mean by an independent system is a process which is not owned by the same user that the database is running as, and isn't owned by the user who is connecting, that facilitates the connection from the user to PG, but which logs everything that happens on that connection. > With the scheme I described above, how does the db superuser disable auditing > without os root privileges? If they can, then pg security is fundamentally > broken, and I don’t believe it is. The DB superuser can modify the running process, through mechanisms as simple as changing GUCs, to creating functions in untrusted languages (including C) and then using them to change or break more-or-less anything that's happening in the backend. > How can an independent system monitor what commands are issued inside the > database? It can log everything which happens on the connection between the user and the database because it's in the middle. > I understand my comments do not cover what is being proposed or worked on and > that is fine. But saying it does not have value because the superuser could > disable any system in pg, is wrong IMO. Being able to reliably log db > superusers without their ability to change the logging would be a > fundamentally good security tool as companies become more serious about > internal security. This is, and will become more, important since a lot of > people consider insider breaches the biggest security challenge today. It'd be a great tool, certainly, but it's not actually possible to do completely inside the DB process given the capabilities the PG superuser has. Being able to create and run functions in untrusted languages means that the superuser can completely hijack the process; that's why those languages are untrusted.
Thanks, Stephen
Re: [HACKERS] Parallel Seq Scan
On 2015-02-18 16:59:26 +0530, Amit Kapila wrote: > On Tue, Feb 17, 2015 at 9:52 PM, Andres Freund > wrote: > > A query whose runtime is dominated by a sequential scan (+ attached > > filter) is certainly going to require a bigger prefetch size than one > > that does other expensive stuff. > > > > Imagine parallelizing > > SELECT * FROM largetable WHERE col = low_cardinality_value; > > and > > SELECT * > > FROM largetable JOIN gigantic_table ON (index_nestloop_condition) > > WHERE col = high_cardinality_value; > > > > The first query will be a simple sequential scan, and disk reads on largetable > > will be the major cost of executing it. In contrast the second query > > might very well sensibly be planned as a parallel sequential scan with > > the nested loop executing in the same worker. But the cost of the > > sequential scan itself will likely be completely drowned out by the > > nestloop execution - index probes are expensive/unpredictable. > I think the work/task given to each worker should be as granular > as possible to make it more predictable. > I think the better way to parallelize such a work (Join query) is that > first worker does sequential scan and filtering on large table and > then pass it to next worker for doing join with gigantic_table. I'm pretty sure that'll result in rather horrible performance. IPC is rather expensive; you want to do as little of it as possible. > > > > > > I think it makes sense to think of a set of tasks in which workers can > > > assist. So you have a query tree which is just one query tree, with no > > > copies of the nodes, and then there are certain places in that query > > > tree where a worker can jump in and assist that node. To do that, it > > > will have a copy of the node, but that doesn't mean that all of the > > > stuff inside the node becomes shared data at the code level, because > > > that would be stupid.
> > > > My only "problem" with that description is that I think workers will > > have to work on more than one node - it'll be entire subtrees of the > > executor tree. > There could be some cases where it could be beneficial for worker > to process a sub-tree, but I think there will be more cases where > it will just work on a part of node and send the result back to either > master backend or another worker for further processing. I think many parallelism projects start out that way, and then notice that it doesn't parallelize very efficiently. The most extreme, but common, example is aggregation over large amounts of data - unless you want to ship huge amounts of data between processes to parallelize it, you have to do the sequential scan and the pre-aggregate step (e.g. one that selects count() and sum() to implement an avg over all the workers) inside one worker. Greetings, Andres Freund -- Andres Freund http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services
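The "pre-aggregate inside one worker" idea above can be sketched in C. This is a minimal illustration only, with invented names (AvgState, avg_accum, avg_combine) rather than PostgreSQL's actual executor code: for avg(), each worker ships a small (count, sum) pair to the leader instead of the rows it scanned, so almost no data crosses the process boundary.

```c
#include <stddef.h>

/* Hypothetical partial-aggregate state for avg(): only these two
 * numbers need to travel from a worker to the leader. */
typedef struct AvgState
{
    long    count;
    double  sum;
} AvgState;

/* Per-tuple accumulation, run inside each worker during its scan. */
static void
avg_accum(AvgState *st, double value)
{
    st->count++;
    st->sum += value;
}

/* Run once per worker in the leader: merge a worker's partial state. */
static void
avg_combine(AvgState *leader, const AvgState *worker)
{
    leader->count += worker->count;
    leader->sum += worker->sum;
}

/* Final step in the leader: compute the average from the merged state. */
static double
avg_final(const AvgState *st)
{
    return (st->count > 0) ? st->sum / st->count : 0.0;
}
```

The design point is that accumulation and combining are separate operations: workers run avg_accum over their share of the table, and only the leader runs avg_combine and avg_final.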
Re: [HACKERS] Expanding the use of FLEXIBLE_ARRAY_MEMBER for declarations like foo[1]
On 2015-02-18 17:15:18 +0900, Michael Paquier wrote: > >> - I don't think that the t_bits fields in htup_details.h should be > >> updated either. > > > > Why not? Any not-broken code should already use MinHeapTupleSize and > > similar macros. > > Changing t_bits impacts HeapTupleHeaderData, ReorderBufferTupleBuf and > similarly a couple of redo routines in heapam.c using > HeapTupleHeaderData in a couple of structures not placing it at the > end (compiler complains). The compiler will complain if you use a FLEXIBLE_ARRAY_MEMBER in the middle of a struct, but not when you embed a struct that uses it into the middle of another struct. At least gcc doesn't, and I think it'd be utterly broken if another compiler did that. If there's a compiler that does so, we need to make it define FLEXIBLE_ARRAY_MEMBER to 1. Code embedding structs using *either* [FLEXIBLE_ARRAY_MEMBER] or [1] for variable length obviously has to take care where the variable-length part goes. And that's what the structs you removed were doing - that's where the t_bits et al go: { ... HeapTupleHeaderData header; char data[MaxHeapTupleSize]; ... } the 'data' bit is just the t_bits + data together. And similar in - struct - { - struct varlena hdr; - char data[TOAST_MAX_CHUNK_SIZE]; /* make struct big enough */ - int32 align_it; /* ensure struct is aligned well enough */ - } chunk_data; The 'data' is where the varlena's vl_dat stuff is stored. > >> Places using a variable-length variable not at the end of a structure > >> are changed with workarounds, without impacting what those features do. > > > > I vote for rejecting most of this, except a (corrected) version of the > > pg_authid change. Which doesn't really belong to the rest of the patch > > anyway ;) > > Changing bytea to use FLEXIBLE_ARRAY_MEMBER requires those changes, or > at least some changes in this area, as something with > FLEXIBLE_ARRAY_MEMBER can only be placed at the end of a structure.
Again, I think you're confusing things that may not be done in a single struct, and structs that are embedded in other places. Greetings, Andres Freund -- Andres Freund http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services
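The sizeof()-vs-offsetof() rule discussed in this thread can be shown with a toy struct. This is an illustration only (VarRec and varrec_alloc are invented names, not PostgreSQL structures): a C99 flexible array member must be the last field, and the allocation size should be computed from offsetof() of that field, because sizeof() of such a struct may include trailing padding (or, in the old items[1] style, one bogus array element) and is not consistent across platforms.

```c
#include <stddef.h>
#include <stdlib.h>

/* Toy header-plus-payload struct, mirroring the pattern under
 * discussion: fixed fields first, flexible array member last. */
typedef struct VarRec
{
    int nitems;
    int items[];            /* flexible array member, must come last */
} VarRec;

/* Allocate room for the header plus nitems array elements.  Using
 * offsetof(VarRec, items) counts exactly the fixed part, whereas
 * sizeof(VarRec) could silently over- or under-allocate depending on
 * the platform's padding and on whether items was declared [] or [1]. */
static VarRec *
varrec_alloc(int nitems)
{
    VarRec *r = malloc(offsetof(VarRec, items) + nitems * sizeof(int));

    if (r != NULL)
        r->nitems = nitems;
    return r;
}
```

To embed such a struct inside another one, the same trick as in the quoted HeapTupleHeaderData example applies: put it last, backed by a char buffer sized for the largest possible payload.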
Re: [HACKERS] pgaudit - an auditing extension for PostgreSQL
Stephen, I meant it to go to the list, but hit the wrong button. > On Feb 17, 2015, at 7:01 PM, Stephen Frost wrote: > > Neil, > > I noticed that you email'd me directly on this reply. Not sure if you > intended to or not, but I'm fine with my response going to the list. > > * Neil Tiffin (ne...@neiltiffin.com) wrote: >>> On Feb 17, 2015, at 1:10 PM, Stephen Frost wrote: >>> If the DB account isn't a superuser then everything changes. There's no >>> point fighting with the OS user- they can run some other PG binary which >>> they've copied over locally and run SQL with that, or copy all the files >>> over to another server which they have complete access to. The fact >>> that they can also connect to the DB and run SQL isn’t really an issue. >> >> That's the point. If this environment matters then the db superuser would not >> be an authorized os superuser (with root or even close privileges). And no, >> they could not be running some other binary. >> >> One way to do pg superuser auditing is to utilize a file (outside of the pg >> data directory, which probably violates something in pg) like >> /etc/pg_su_audit that has os root rw and r for all others (or the equivalent >> for other os’s) containing a URL. If this file is present, send all db >> superuser usage to the URL. If this file is present with the wrong >> privileges, then don’t start pg. Done. No configuration in pg config files, >> no GUCs, no nothing for the pg superuser to mess around with, no tables, no >> ability for any pg superuser to configure or control. > > This approach doesn't violate anything in PG and can be used with any of > the pgaudit approaches being discussed. The fact that it's > postgresql.conf, which represents GUCs, doesn't change anything > regarding what you’re getting. > It changes everything, pg superusers have complete control of files in the pg data directory. If set up correctly pg superusers have no control in /etc.
>> If they can copy or install a PG binary, then the OS auditing and security >> failed. PG code need not be concerned. >> >> Sure someone could still break access to the URL, but again, not PG’s >> concern. Other security and auditing would have responsibility to pick that >> up. >> >> It is a really simple use case: record everything any db superuser does and >> send it to the audit system. Done. Then a designated role can control what >> gets audited in production. As long as everything the db superuser does can >> be written to an audit log, then it no longer matters technically if the db >> superuser can change the rest of the auditing. If they do and it violates >> policy, then when the audit picks it up, they lose their job plus, depending >> on the environment. > The issue is really around what we claim to provide with this auditing. > We can't claim to provide *full* superuser auditing with any of these > approaches since the superuser can disable auditing. We can, however, > provide auditing up until that happens, which is likely to be sufficient > in many environments. For those environments where full superuser > auditing is required, an independent system must be used. > > Of course, it's important that any auditing mechanism which is used to > audit superusers be impossible for the superuser to modify after the > fact, meaning that syslog or similar needs to be used. > I’m still confused since you do not differentiate between db superuser and os superuser and what you mean by an independent system? With the scheme I described above, how does the db superuser disable auditing without os root privileges? If they can, then pg security is fundamentally broken, and I don’t believe it is. How can an independent system monitor what commands are issued inside the database? I understand my comments do not cover what is being proposed or worked on and that is fine.
But saying it does not have value because the superuser could disable any system in pg is wrong IMO. Being able to reliably log db superusers without their ability to change the logging would be a fundamentally good security tool as companies become more serious about internal security. This is, and will become more, important since a lot of people consider insider breaches the biggest security challenge today. Neil
Re: [HACKERS] Odd behavior of updatable security barrier views on foreign tables
Etsuro, * Etsuro Fujita (fujita.ets...@lab.ntt.co.jp) wrote: > On 2015/02/18 7:44, Stephen Frost wrote: > >Attached is a patch which should address this. Would love your (or > >anyone else's) feedback on it. It appears to address the issue which > >you raised and the regression test changes are all in-line with > >inserting a LockRows into the subquery, as anticipated. > > I've looked into the patch. > > * The patch applies to the latest head, 'make' passes successfully, > but 'make check' fails in the rowsecurity test. Apologies for not being clear - the patch was against 9.4, where it passes all the regression tests (at least for me; if you see differently, please let me know!). > * I found one place in expand_security_qual that I'm concerned about: > > + if (targetRelation) > + applyLockingClause(subquery, 1, LCS_FORUPDATE, > + false, false); > > ISTM that it'd be better to use LockWaitBlock as the fourth argument > of applyLockingClause. LockWaitBlock isn't in 9.4. :) Otherwise, I'd agree, and it's what I plan to do for master. > Other than that, the patch looks good to me. Great, thanks! Stephen
Re: [HACKERS] Parallel Seq Scan
On Tue, Feb 17, 2015 at 9:52 PM, Andres Freund wrote: > > On 2015-02-11 15:49:17 -0500, Robert Haas wrote: > > A query whose runtime is dominated by a sequential scan (+ attached > filter) is certainly going to require a bigger prefetch size than one > that does other expensive stuff. > > Imagine parallelizing > SELECT * FROM largetable WHERE col = low_cardinality_value; > and > SELECT * > FROM largetable JOIN gigantic_table ON (index_nestloop_condition) > WHERE col = high_cardinality_value; > > The first query will be a simple sequential scan, and disk reads on largetable > will be the major cost of executing it. In contrast the second query > might very well sensibly be planned as a parallel sequential scan with > the nested loop executing in the same worker. But the cost of the > sequential scan itself will likely be completely drowned out by the > nestloop execution - index probes are expensive/unpredictable. > I think the work/task given to each worker should be as granular as possible to make it more predictable. I think the better way to parallelize such work (a join query) is that the first worker does the sequential scan and filtering on the large table and then passes it to the next worker for doing the join with gigantic_table. > > > > I think it makes sense to think of a set of tasks in which workers can > > assist. So you have a query tree which is just one query tree, with no > > copies of the nodes, and then there are certain places in that query > > tree where a worker can jump in and assist that node. To do that, it > > will have a copy of the node, but that doesn't mean that all of the > > stuff inside the node becomes shared data at the code level, because > > that would be stupid. > > My only "problem" with that description is that I think workers will > have to work on more than one node - it'll be entire subtrees of the > executor tree.
> There could be some cases where it could be beneficial for a worker to process a sub-tree, but I think there will be more cases where it will just work on a part of a node and send the result back to either the master backend or another worker for further processing. With Regards, Amit Kapila. EnterpriseDB: http://www.enterprisedb.com
Re: [HACKERS] pgaudit - an auditing extension for PostgreSQL
On Wed, Feb 18, 2015 at 1:26 AM, David Steele wrote: > On 2/17/15 10:23 AM, Simon Riggs wrote: >> I vote to include pgaudit in 9.5, albeit with any changes. In >> particular, David may have some changes to recommend, but I haven't >> seen a spec or a patch, just a new version of code (which isn't how we >> do things...). > > I submitted the new patch in my name under a separate thread "Auditing > extension for PostgreSQL (Take 2)" (54e005cc.1060...@pgmasters.net) I played with the patch version of pg_audit a bit and have basic comments about its spec. pg_audit doesn't log BIND parameter values when a prepared statement is used. Is this an oversight of the patch, or is it intentional? pg_audit cannot log a statement like "SELECT 1" which doesn't access any database object. Is this intentional? I think that there are many users who want to audit even such statements. Imagine the case where you call a user-defined function which executes many nested statements. In this case, pg_audit logs only the top-level statement (i.e., the one issued directly by the client) every time a nested statement is executed. In fact, one call of such a UDF can cause lots of the *same* log messages. I think this is problematic. Regards, -- Fujii Masao
Re: [HACKERS] GSoC 2015 - mentors, students and admins.
On 9 February 2015 at 20:52, Thom Brown wrote: > > Hi all, > > Google Summer of Code 2015 is approaching. I'm intending on registering > PostgreSQL again this year. > > Before I do that, I'd like to have an idea of how many people are > interested in being either a student or a mentor. > > I've volunteered to be admin again, but if anyone else has a strong > interest in seeing the projects through this year, please make yourself > known. > > Who would be willing to mentor projects this year? > > Project ideas are welcome, and I've copied many from last year's > submissions into this year's wiki page. Please feel free to add more (or > delete any that stand no chance or are redundant): > https://wiki.postgresql.org/wiki/GSoC_2015 > > Students can find more information about GSoC here: > http://www.postgresql.org/developer/summerofcode > I'd also like to ask whether anyone would be willing to be a backup admin for this year's GSoC? Please message me directly if you're interested. I need to know A.S.A.P. though. Thanks Thom
Re: [HACKERS] ExplainModifyTarget doesn't work as expected
On 2015/02/18 8:13, Tom Lane wrote: > So I went back to your v1 patch and have now committed that with some > cosmetic modifications. Sorry for making you put time into a dead end. I don't mind it. Thanks for committing the patch! Best regards, Etsuro Fujita
Re: [HACKERS] Odd behavior of updatable security barrier views on foreign tables
On 2015/02/18 7:44, Stephen Frost wrote: * Etsuro Fujita (fujita.ets...@lab.ntt.co.jp) wrote: On 2015/02/11 4:06, Stephen Frost wrote: I had been trying to work out an FDW-specific way to address this, but I think Dean's right that this should be addressed in expand_security_qual(), which means it'll apply to all cases and not just these FDW calls. I don't think that's actually an issue though and it'll match up to how SELECT FOR UPDATE is handled today. Sorry, my explanation was not accurate, but I also agree with Dean's idea. In the above, I just wanted to make it clear that such a lock request done by expand_security_qual() should be limited to the case where the relation that is a former result relation is a foreign table. Attached is a patch which should address this. Would love your (or anyone else's) feedback on it. It appears to address the issue which you raised and the regression test changes are all in-line with inserting a LockRows into the subquery, as anticipated. I've looked into the patch. * The patch applies to the latest head, 'make' passes successfully, but 'make check' fails in the rowsecurity test. * I found one place in expand_security_qual that I'm concerned about: + if (targetRelation) + applyLockingClause(subquery, 1, LCS_FORUPDATE, + false, false); ISTM that it'd be better to use LockWaitBlock as the fourth argument of applyLockingClause. Other than that, the patch looks good to me. Thanks for the work! Best regards, Etsuro Fujita
Re: [HACKERS] How about to have relnamespace and relrole?
Sorry, I sent the previous mail without patches by accident. The patches are attached to this mail. Hello, this is the patchset v2 of this feature. 0001-Add-regrole.patch Adding regrole, as the name says. 0002-Add-regnamespace.patch Adding regnamespace. This depends on the 0001 patch. 0003-Check-new-reg-types-are-not-used-as-default-values.patch Inhibiting the new OID alias types from being used as constants in default values, which would make dependencies on the other (existing) OID alias types. 0004-Documentation-for-new-OID-alias-types.patch Documentation patch for the new types. regards, At Tue, 17 Feb 2015 20:19:11 +0900 (Tokyo Standard Time), Kyotaro HORIGUCHI wrote in <20150217.201911.239516619.horiguchi.kyot...@lab.ntt.co.jp> > Hello, thank you for the comment. > > The current patch lacks changes in documentation and dependency > stuff. The current framework doesn't consider changing pg_shdepend > from column default expressions, so the possible measures are the > following, I think. > > 1. Make pg_shdepend have refobjsubid and add some deptypes of >pg_depend; specifically DEPENDENCY_NORMAL is needed. Then, >change the dependency stuff. > > 2. Simply inhibit specifying them in default > expressions. Integer or OID can act as the replacement. > >=# create table t1 (a int, b regrole default 'horiguti'::regrole); >ERROR: Type 'regrole' cannot have a default value > > 1 is quite overkill and hardly seems usable. So I will go > with 2 and post an additional patch. > > > At Thu, 12 Feb 2015 17:12:24 -0600, Jim Nasby > wrote in <54dd3358.9030...@bluetreble.com> > > On 2/12/15 5:28 AM, Kyotaro HORIGUCHI wrote: > > > Hello, I changed the subject. > > # Ouch! the subject remains wrong.. > > > > 1. Expected frequency of use > > ... > > > For that reason, although the current policy of deciding whether > > > to have reg* types seems to be whether they have schema-qualified > > > names, I think regrole and regnamespace are valuable to have.
> > > > Perhaps schema qualification was the original intent, but I think at > > this point everyone uses them for convenience. Why would I want to > > type out JOIN pg_namespace n ON n.oid = blah.???namespace when I could > > simply do ???namespace::regnamespace? > > Yes, that is the most important point of this patch. > > > > 2. Anticipated un-optimizability > > > > > > Tom pointed out that these reg* types prevent the planner from > > > optimizing the query, so we should refrain from having such > > > machinery. It should have been a long-standing issue, but reg*s > > > seem to be rather faster than joining the corresponding catalogs > > > for a moderate number of result rows, though this raises another > > > more annoying issue. > > > > > > > > > 3. Potential breakage of MVCC > > > > > > The other issue Tom pointed out is potential breakage of MVCC by > > > using these reg* types. Syscache is invalidated on updates so > > > this doesn't seem to be a problem in READ COMMITTED mode, but > > > breaks SERIALIZABLE mode. But IMHO it is not so serious a problem > > > as long as such inconsistency occurs only on system catalogs and > > > it is explicitly documented together with the un-optimizability > > > issue. I'll add a patch for this later. > > > > The way I look at it, all these arguments went out the window when > > regclass was introduced. > > > > The reality is that reg* is *way* more convenient than manual > > joins. The arguments about optimization and MVCC presumably apply to > > all reg* casts, which means that ship has long since sailed. > > I agree basically, but I think some caveat is needed. I'll > include that in the coming documentation patch. > > > If we offered views that made it easier to get at this data then maybe > > this wouldn't matter so much. But DBA's don't want to join 3 catalogs > > together to try and answer a simple question. > > It has been annoying me these days.
-- Kyotaro Horiguchi NTT Open Source Software Center >From 2a6f689afdc8197c2fe2fc235a4819ce1a5e9928 Mon Sep 17 00:00:00 2001 From: Kyotaro Horiguchi Date: Wed, 18 Feb 2015 14:38:32 +0900 Subject: [PATCH 1/4] Add regrole Add new regtype regrole. For the development reason, the system OIDs used for the new procs, types, casts are making a gap from existing OIDs. --- src/backend/bootstrap/bootstrap.c | 2 + src/backend/utils/adt/regproc.c | 101 ++ src/backend/utils/adt/selfuncs.c | 2 + src/backend/utils/cache/catcache.c| 1 + src/include/catalog/pg_cast.h | 7 +++ src/include/catalog/pg_proc.h | 10 src/include/catalog/pg_type.h | 5 ++ src/include/utils/builtins.h | 5 ++ src/test/regress/expected/regproc.out | 26 - src/test/regress/sql/regproc.sql | 7 +++ 10 files changed, 165 insertions(+), 1 deletion(-) diff --git a/src/backend/bootstrap/bootstrap.c b/src/backend/bootstrap/bootstrap.c index bc66eac..11e40ee 100644 --
Re: [HACKERS] pg_basebackup may fail to send feedbacks.
Hello, this is the last patch for pg_basebackup/pg_receivexlog on master (9.5). Prior versions don't have this issue. 4. basebackup_reply_fix_mst_v2.patch receivelog.c patch applicable on master. This is based on the same design as walrcv_reply_fix_91_v2.patch with respect to gettimeofday(). regards, At Tue, 17 Feb 2015 19:45:19 +0900 (Tokyo Standard Time), Kyotaro HORIGUCHI wrote in <20150217.194519.58137941.horiguchi.kyot...@lab.ntt.co.jp> > Thank you for the comment. Three new patches are attached. I > forgot to give a revision number on the previous patch, but I > think this is the 2nd version. > > 1. walrcv_reply_fix_94_v2.patch > Walreceiver patch applicable on master/REL9_4_STABLE/REL9_3_STABLE > > 2. walrcv_reply_fix_92_v2.patch > Walreceiver patch applicable on REL9_2_STABLE > > 3. walrcv_reply_fix_91_v2.patch > Walreceiver patch applicable on REL9_1_STABLE > > > At Sat, 14 Feb 2015 03:01:22 +0900, Fujii Masao wrote > in > > On Tue, Feb 10, 2015 at 7:48 PM, Kyotaro HORIGUCHI > > wrote: > > >> Introduce the maximum number of records that we can receive and > > >> process between feedbacks? For example, we can change > > >> XLogWalRcvSendHSFeedback() so that it's called per at least > > >> 8 records. I'm not sure what number is good, though... > > > > > > At the beginning of the "while(len > 0){if (len > 0){" block, > > > last_recv_timestamp is always kept up to date (by using > > > gettimeofday()). So we can use the variable instead of > > > gettimeofday() in my previous proposal. > > > > Yes, but something like last_recv_timestamp doesn't exist in > > REL9_1_STABLE. So you don't think that your proposed fix > > should be back-patched to all supported versions. Right? > > Versions back to 9.3 have the loop with the same structure. 9.2 is in a bit > different shape but looks to have the same issue. 9.1 and 9.0 > have the same steps as 9.2, but 9.0 doesn't have a wal sender > timeout and 9.1 does not have a variable holding the current time. 
> > 9.3 and later: the previous patch works, but revised as per your > comment. > > 9.2: The time of the last reply is stored in the file-scope > variable reply_message, and the current time is stored in > walrcv->lastMsgReceiptTime. The timeout is determined using > these variables. > > 9.1: walrcv doesn't have lastMsgReceiptTime. The current time > should be acquired explicitly in the innermost loop. I tried > calling gettimeofday() once per several loops. The skip count > is determined by the anticipated worst duration to process a wal > record and wal_receiver_status_interval. However, I doubt the > necessity to do so.. The value of the worst duration is simply > a random guess. > > 9.0: No point in doing this kind of fix. > > > > > The start time of the timeout could be last_recv_timestamp at the > > > beginning of the while loop, since it is a bit earlier than the > > > actual time at that point. > > > > Sounds strange to me. I think that last_recv_timestamp should be > > compared with the time when the last feedback message was sent, > > e.g., maybe you can expose sendTime in XLogWalRcvSendReply() > > as a global variable, and use it to compare with last_recv_timestamp. > > You're right that that is exactly the right thing to do, but, as you mentioned, > exposing sendTime is required to do so. The solution I proposed is > the second best, assuming that wal_sender_timeout is recommended > to be long enough (several times longer) than > wal_receiver_status_interval. If no one minds exposing sendTime, > it is the best solution. The attached patch does it. > > # The added comment in the patch was wrong, though. > > > When we get out of the WAL receive loop due to the timeout check > > which your patch added, XLogWalRcvFlush() is always executed. > > I don't think that the current WAL file needs to be flushed at that time. > > Thank you for pointing it out. Run XLogWalRcvSendReply(force = > true) immediately instead of breaking from the loop. 
> > > > Another solution would be using > > > RegisterTimeout/enable_timeout_after, and it seemed to work for > > > me. It can throw out the time-counting stuff in the loop we are > > > talking about and that of XLogWalRcvSendHSFeedback and > > > XLogWalRcvSendReply, but it might be a bit too large for the > > > gain. > > > > Yes, sounds overkill. > > However, gettimeofday() for each recv is also overkill for most > cases. I'll post the patches for receivelog.c based on the patch > for the 9.1 wal receiver. -- Kyotaro Horiguchi NTT Open Source Software Center >From cfe01eadd4d8567f4410bccb8334c52fc897c002 Mon Sep 17 00:00:00 2001 From: Kyotaro Horiguchi Date: Wed, 18 Feb 2015 12:30:58 +0900 Subject: [PATCH] Make pg_basebackup and pg_receivexlog keep sending WAL receive feedback regularly on heavy load v2. pg_basebackup and pg_receivexlog fail to send the receiver reply message while they are receiving a continuous WAL stream caused by heavy load or something else. This patch makes them send reply messages regularly.
Re: [HACKERS] Expanding the use of FLEXIBLE_ARRAY_MEMBER for declarations like foo[1]
On Tue, Feb 17, 2015 at 9:02 AM, Andres Freund wrote: > On 2015-02-17 05:51:22 +0900, Michael Paquier wrote: >> -- inv_api.c uses bytea in an internal structure without putting it at >> the end of the structure. For clarity I think that we should just use >> a bytea pointer and do a sufficient allocation for the duration of the >> lobj manipulation. > > Hm. I don't really see the problem tbh. > > There actually is a reason the bytea is at the beginning - the other > fields *are* the data part of the bytea (which is just the varlena > header). > >> -- Similarly, tuptoaster.c needed to be patched for toast_save_datum > > I'm not a big fan of these two changes. This adds some not that small > memory allocations to a somewhat hot path. Without a big win in > clarity. And it doesn't seem to have anything to do with > FLEXIBLE_ARRAY_MEMBER. We could replace those palloc calls with a simple buffer that has a predefined size, but this somewhat reduces the readability of the code. >> There are as well a couple of things that are not changed on purpose: >> - in namespace.h for FuncCandidateList. I tried manipulating it but it >> resulted in allocation overflow in PortalHeapMemory > > Did you change the allocation in FuncnameGetCandidates()? Note the - > sizeof(Oid) there. Yeah. Missed it. >> - I don't think that the t_bits fields in htup_details.h should be >> updated either. > > Why not? Any not broken code should already use MinHeapTupleSize and > similar macros. Changing t_bits impacts HeapTupleHeaderData, ReorderBufferTupleBuf, and similarly a couple of redo routines in heapam.c that use HeapTupleHeaderData in structures without placing it at the end (the compiler complains). We could use for each of them a buffer with enough room, sizeof(HeapTupleHeaderData) + MaxHeapTupleSize, but wouldn't that reduce the readability of the current code? Opinions welcome. >> diff --git a/src/include/catalog/pg_attribute.h >> b/src/include/catalog/pg_attribute.h > [...] 
> Generally the catalog changes are much less clearly a benefit imo. OK, let's drop them then. >> From ad8f54cdb5776146f17d1038bb295b5f13b549f1 Mon Sep 17 00:00:00 2001 >> From: Michael Paquier >> Date: Mon, 16 Feb 2015 03:53:38 +0900 >> Subject: [PATCH 3/3] Update varlena in c.h to use FLEXIBLE_ARRAY_MEMBER >> >> Places using a variable-length variable not at the end of a structure >> are changed with a workaround, without impacting what those features do. > > I vote for rejecting most of this, except a (corrected) version of the > pg_authid change. Which doesn't really belong to the rest of the patch > anyway ;) Changing bytea to use FLEXIBLE_ARRAY_MEMBER requires those changes, or at least some changes in this area, as a member declared with FLEXIBLE_ARRAY_MEMBER can only be placed at the end of a structure. But I guess that we can do fine as well by replacing those structures with some buffers of a pre-defined size. I'll draft an additional patch on top of 0001 with all those less-trivial changes implemented. >> #define VARHDRSZ ((int32) sizeof(int32)) >> diff --git a/src/include/catalog/pg_authid.h >> b/src/include/catalog/pg_authid.h >> index e01e6aa..d8789a5 100644 >> --- a/src/include/catalog/pg_authid.h >> +++ b/src/include/catalog/pg_authid.h >> @@ -56,8 +56,10 @@ CATALOG(pg_authid,1260) BKI_SHARED_RELATION >> BKI_ROWTYPE_OID(2842) BKI_SCHEMA_MAC >> int32 rolconnlimit; /* max connections allowed (-1=no >> limit) */ >> >> /* remaining fields may be null; use heap_getattr to read them! */ >> - text rolpassword; /* password, if any */ >> timestamptz rolvaliduntil; /* password expiration time, if any */ >> +#ifdef CATALOG_VARLEN >> + text rolpassword; /* password, if any */ >> +#endif >> } FormData_pg_authid; > > That change IIRC is wrong, because it'll make rolvaliduntil NOT > NULL (any column that's fixed width and has only fixed-width columns > before it is marked as such). This sounds better as a separate patch... 
-- Michael
Re: [HACKERS] Combining Aggregates
Hi Rowley, Let me put some comments on this patch. This patch itself looks good as infrastructure towards the big picture; however, we have still not reached consensus on how combine functions are to be used in place of the usual transition functions. An aggregate function usually consumes one or more values extracted from a tuple, then accumulates its internal state according to the arguments. An existing transition function updates its internal state under the assumption of one function call per record. On the other hand, the new combine function updates its internal state with partially aggregated values produced by a preprocessor node. An aggregate function is represented by an Aggref node in the plan tree; however, we have no certain way to determine which function shall be called to update the internal state of the aggregate. For example, avg(float) has an internal state of type float[3] holding the number of rows, the sum of X, and the sum of X^2. If the combine function updates its internal state with partially aggregated values, its argument should be float[3]. That is obviously incompatible with float8_accum(float), which is the transition function of avg(float). I think we need a new flag on the Aggref node to inform the executor which function shall be called to update the internal state of the aggregate. The executor cannot decide it without this hint. Also, do you have an idea for pushing down aggregate functions across joins? Even though it is a bit old, I found research on a systematic approach to pushing down aggregates across joins: https://cs.uwaterloo.ca/research/tr/1993/46/file.pdf I think it would be great if the core functionality supported this query-rewriting feature based on cost estimation, without external modules. How about your opinions? 
Thanks, -- NEC OSS Promotion Center / PG-Strom Project KaiGai Kohei > -----Original Message----- > From: pgsql-hackers-ow...@postgresql.org > [mailto:pgsql-hackers-ow...@postgresql.org] On Behalf Of David Rowley > Sent: Friday, December 19, 2014 8:39 PM > To: Simon Riggs > Cc: Kaigai Kouhei(海外 浩平); PostgreSQL-development; Amit Kapila > Subject: Re: [HACKERS] Combining Aggregates > > On 18 December 2014 at 02:48, Simon Riggs wrote: > > > David, if you can update your patch with some docs to explain the > behaviour, it looks complete enough to think about committing it > in > early January, to allow other patches that depend upon it to stand > a > chance of getting into 9.5. (It is not yet ready, but I see it could > be). > > > > > Sure, I've more or less added the docs from your patch. I still need to > trawl through and see if there's anywhere else that needs some additions. > > > The above example is probably the best description of the need, > since > user-defined aggregates must also understand this. > > Could we please call these "combine functions" or other? MERGE is > an > SQL Standard statement type that we will add later, so it will be > confusing if we use the "merge" word in this context. > > > > > Ok, changed. > > > David, your patch avoids creating any mergefuncs for existing > aggregates. We would need to supply working examples for at least > a > few of the builtin aggregates, so we can test the infrastructure. > We > can add examples for all cases later. > > > > > In addition to MIN(), MAX(), BIT_AND(), BIT_OR(), SUM() for floating point > types, cash and interval. I've now added combine functions for count(*) > and count(col). It seems that int8pl() is suitable for this. > > > Do you think it's worth adding any new functions at this stage, or is it > best to wait until there's a patch which can use these? 
> > Regards > > David Rowley