[HACKERS] Dereferenced pointers checked as NULL in btree_utils_var.c
Hi all,

Coverity is pointing out $subject, with the following stuff in gbt_var_same():

    GBT_VARKEY *t1 = (GBT_VARKEY *) DatumGetPointer(d1);
    GBT_VARKEY *t2 = (GBT_VARKEY *) DatumGetPointer(d2);
    GBT_VARKEY_R r1, r2;
    r1 = gbt_var_key_readable(t1);    <= t1 dereferenced
    r2 = gbt_var_key_readable(t2);    <= t2 dereferenced
    if (t1 && t2)
        result = ((*tinfo->f_cmp) (r1.lower, r2.lower, collation) == 0 &&
                  (*tinfo->f_cmp) (r1.upper, r2.upper, collation) == 0);
    else
        result = (t1 == NULL && t2 == NULL);    <= Coverity complains here
    return result;

As Heikki pointed out to me on IM, the lack of crash reports in this area, as well as the similar coding style in cube/, seem to be sufficient arguments to simply remove those NULL checks instead of making them more solid. Patch is attached.

Regards,
--
Michael

diff --git a/contrib/btree_gist/btree_utils_var.c b/contrib/btree_gist/btree_utils_var.c
index b7dd060..6ad3347 100644
--- a/contrib/btree_gist/btree_utils_var.c
+++ b/contrib/btree_gist/btree_utils_var.c
@@ -337,7 +337,6 @@
 bool
 gbt_var_same(Datum d1, Datum d2, Oid collation, const gbtree_vinfo *tinfo)
 {
-	bool		result;
 	GBT_VARKEY *t1 = (GBT_VARKEY *) DatumGetPointer(d1);
 	GBT_VARKEY *t2 = (GBT_VARKEY *) DatumGetPointer(d2);
 	GBT_VARKEY_R r1,
@@ -346,13 +345,8 @@ gbt_var_same(Datum d1, Datum d2, Oid collation,
 	r1 = gbt_var_key_readable(t1);
 	r2 = gbt_var_key_readable(t2);

-	if (t1 && t2)
-		result = ((*tinfo->f_cmp) (r1.lower, r2.lower, collation) == 0 &&
-				  (*tinfo->f_cmp) (r1.upper, r2.upper, collation) == 0);
-	else
-		result = (t1 == NULL && t2 == NULL);
-
-	return result;
+	return ((*tinfo->f_cmp) (r1.lower, r2.lower, collation) == 0 &&
+			(*tinfo->f_cmp) (r1.upper, r2.upper, collation) == 0);
 }

-- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Async execution of postgres_fdw.
Hello, thank you for the comment. I added an experimental adaptive fetch size feature in this v6 patch.

At Tue, 20 Jan 2015 04:51:13 +, Matt Kelly mkell...@gmail.com wrote in ca+kcukhluo+vaj4xr8gvsof_nw79udztdyhosdt13cfjkae...@mail.gmail.com

> I think its telling that varying the fetch size doubled the performance, even on localhost. If you were to repeat this test across a network, the performance difference would be far more drastic.

I think so, surely.

> I understand the desire to keep the fetch size small by default, but I think your results demonstrate how important the value is. At the very least, it is worth reconsidering this arbitrary value. However, I think the real solution is to make this configurable. It probably should be a new option on the foreign server or table, but an argument could be made for it to be global across the server just like work_mem.

The optimal fetch_count varies depending on the query. From the performance point of view alone, it should be the same as the table size for a simple scan on a table. Most joins also do not need to read the target relations simultaneously. (A local merge join on remote sorted results is not available, since the FDW is not aware of the sorted-ness.) But that would change in the near future. So I have found no appropriate policy for deciding the number.

Another point of view is the memory requirement. This wouldn't matter using the single-row mode of libpq, but that doesn't allow multiple simultaneous queries. The space needed for the fetch buffer varies widely in proportion to the average row length. If it is 1Kbytes, 10000 rows require over 10MBytes, which is larger than the default value of work_mem. I tried an adaptive fetch_size based on the fetch duration and the required buffer size of the previous turn in this version. But a hard limit cannot be imposed, since we cannot know the mean row length in advance. So, for example, if the average row length suddenly grows from 1KB to 10KB when fetch_size is 10000, 100MB is required for that turn.
I think, for the ordinary cases, the maximum fetch size cannot exceed 1000. The attached is the new version implementing the adaptive fetch size. Simple test runs showed the values below. A single scan was boosted by about 5% (no effect?) and a join by 33%. The former case is hard to understand, so I'll examine it tomorrow. This doesn't seem so promising, though..

=====
master=# EXPLAIN (ANALYZE ON, COSTS OFF) SELECT * FROM ft1;
                             QUERY PLAN
---------------------------------------------------------------------
 Foreign Scan on ft1 (actual time=1.741..10046.272 rows=100 loops=1)
 Planning time: 0.084 ms
 Execution time: 10145.730 ms
(3 rows)

patched=# EXPLAIN (ANALYZE ON, COSTS OFF) SELECT * FROM ft1;
                             QUERY PLAN
---------------------------------------------------------------------
 Foreign Scan on ft1 (actual time=1.072..9582.980 rows=100 loops=1)
 Planning time: 0.077 ms
 Execution time: 9683.164 ms
(3 rows)

postgres=# EXPLAIN (ANALYZE ON, COSTS OFF) SELECT x.a, x.c, y.c FROM ft1 AS x JOIN ft1 AS y ON x.a = y.a;
                             QUERY PLAN
---------------------------------------------------------------------
 Merge Join (actual time=18191.739..19534.001 rows=100 loops=1)
   Merge Cond: (x.a = y.a)
   -> Sort (actual time=9031.155..9294.465 rows=100 loops=1)
        Sort Key: x.a
        Sort Method: external sort  Disk: 142728kB
        -> Foreign Scan on ft1 x (actual time=1.156..6486.632 rows=100 loops=1)
   -> Sort (actual time=9160.577..9479.076 rows=100 loops=1)
        Sort Key: y.a
        Sort Method: external sort  Disk: 146632kB
        -> Foreign Scan on ft1 y (actual time=0.641..6517.594 rows=100 loops=1)
 Planning time: 0.203 ms
 Execution time: 19626.881 ms
(12 rows)

patched=# EXPLAIN (ANALYZE ON, COSTS OFF) SELECT x.a, x.c, y.c FROM ft1 AS x JOIN ft1 AS y ON x.a = y.a;
                             QUERY PLAN
---------------------------------------------------------------------
 Merge Join (actual time=11790.690..13134.071 rows=100 loops=1)
   Merge Cond: (x.a = y.a)
   -> Sort (actual time=8149.225..8413.611 rows=100 loops=1)
        Sort Key: x.a
        Sort Method: external sort  Disk: 142728kB
        -> Foreign Scan on ft1 x (actual time=0.679..3989.160 rows=100 loops=1)
   -> Sort (actual time=3641.457..3957.240 rows=100 loops=1)
        Sort Key: y.a
        Sort Method: external sort  Disk: 146632kB
        -> Foreign Scan on ft1 y (actual time=0.605..1852.655 rows=100 loops=1)
 Planning time: 0.203 ms
 Execution time: 13226.414 ms
(12 rows)

> Obviously, this shouldn't block your current patch but it's worth revisiting.

regards,

--
Kyotaro
Re: [HACKERS] PATCH: Reducing lock strength of trigger and foreign key DDL
On Fri, Jan 16, 2015 at 04:59:56PM +0100, Andres Freund wrote:
> On 2015-01-16 15:16:20 +0100, Andreas Karlsson wrote:
>> For this reason I opted to only lower the lock levels of ADD and ALTER TRIGGER, and not DROP TRIGGER. Neither of those requires MVCC of the WHEN clause.
>
> I'm unconvinced that this is true. Using a snapshot for part of getting a definition certainly opens the door for getting strange results.
>
> Acquiring a lock that prevents schema changes on the table and then getting the definition using the syscaches guarantees that that definition is at least self-consistent, because no further schema changes are possible and the catalog caches will be up2date.
>
> What you're doing though is doing part of the scan using the transaction's snapshot (as used by pg_dump that will usually be a repeatable read snapshot and possibly quite old) and the other using a fresh catalog snapshot. This can result in rather odd things. Just consider:
>
> S1: CREATE TABLE flubber(id serial primary key, data text);
> S1: CREATE FUNCTION blarg() RETURNS TRIGGER LANGUAGE plpgsql AS $$BEGIN RETURN NEW; END;$$;
> S1: CREATE TRIGGER flubber_blarg BEFORE INSERT ON flubber FOR EACH ROW WHEN (NEW.data IS NOT NULL) EXECUTE PROCEDURE blarg();
> S2: BEGIN TRANSACTION ISOLATION LEVEL SERIALIZABLE;
> S2: SELECT 'dosomethingelse';
> S1: ALTER TABLE flubber RENAME TO wasflubber;
> S1: ALTER TABLE wasflubber RENAME COLUMN data TO wasdata;
> S1: ALTER TRIGGER flubber_blarg ON wasflubber RENAME TO wasflubber_blarg;
> S1: ALTER FUNCTION blarg() RENAME TO wasblarg;
> S2: SELECT pg_get_triggerdef(oid) FROM pg_trigger;
>
> This will give you: the old trigger name, the new table name, the new column name, the new function name.
>
> I don't think using a snapshot for tiny parts of these functions actually buys anything. Now, this isn't a pattern you introduced. But I think we should think about this for a second before expanding it further.

Fair enough.
It did reinforce pg_get_constraintdef() as a subroutine of pg_dump rather than an independent, rigorous interface. It perhaps made the function worse for non-pg_dump callers. In that vein, each one of these hacks has a cost. One could make a reasonable argument that ALTER TRIGGER RENAME locking is not important enough to justify spreading the hack from pg_get_constraintdef() to pg_get_triggerdef(). Lowering the CREATE TRIGGER lock level does not require any ruleutils.c change for the benefit of pg_dump, because pg_dump won't see the pg_trigger row of a too-recent trigger.

> Before you argue that this isn't relevant for pg_dump: It is. Precisely the above can happen - just replace the 'dosomethingelse' with several LOCK TABLEs as pg_dump does. The first blocks after a snapshot has been acquired. While waiting, all the ALTERs happen.

We wish pg_dump would take a snapshot of the database; that is, we wish its output always matched some serial execution of transactions. pg_dump has, since ancient times, failed to achieve that if non-table DDL commits during the dump or if table DDL commits between acquiring the dump transaction snapshot and acquiring the last table lock. My reviews have defended the standard that table DDL issued after pg_dump has acquired locks does not change the dump. That's what we bought with pg_get_constraintdef()'s use of the transaction snapshot and would buy with the same in pg_get_triggerdef(). My reviews have deliberately ignored effects on scenarios where pg_dump already fails to guarantee snapshot-like output.

> Arguably the benefit here is that the window for this issue is becoming smaller as pg_dump (and hopefully other possible callers) acquire exclusive locks on the table. I.e. that the lowering of the lock level doesn't introduce new races. But on the other side of the coin, this makes the result of pg_get_triggerdef() even *more* inaccurate in many cases.

What is this about pg_dump acquiring exclusive locks?
To summarize, the problem you raise has been out of scope, because it affects pg_dump only at times when pg_dump is already wrong.
Re: [HACKERS] documentation update for doc/src/sgml/func.sgml
> I had a look at this patch. This patch adds some text below a table of functions. Immediately above that table, there is this existing language: "The functions working with <type>double precision</type> data are mostly implemented on top of the host system's C library; accuracy and behavior in boundary cases can therefore vary depending on the host system." This seems to me to substantially duplicate the information added by the patch.

I would rather say that it makes the potential issues explicit. Taking that into account, maybe the part about floating point could be moved up after the above sentence, or the above sentence moved down as an introduction, with some pruning so that it fits in? The second paragraph about bitwise ops is not related to these.

--
Fabien.
Re: [HACKERS] New CF app: changing email sender
Kyotaro == Kyotaro HORIGUCHI horiguchi.kyot...@lab.ntt.co.jp writes:

 Kyotaro Hmm. The mail address indeed *was* mine but is now obsolete,
 Kyotaro so that the email might bounce. But I haven't found how to
 Kyotaro change it within the app itself, nor on the PostgreSQL community
 Kyotaro account page.

Just being able to change the email address on the community account isn't enough; I for one am subscribed to the lists with a different email address than the one associated with my community account (for spam management reasons). There needs to be a way to have multiple addresses or to specify which one is to be used for the post.

--
Andrew (irc:RhodiumToad)
Re: [HACKERS] PATCH: decreasing memory needlessly consumed by array_agg
2015-01-20 18:17 GMT+07:00 Ali Akbar the.ap...@gmail.com:
> 2015-01-20 14:22 GMT+07:00 Jeff Davis pg...@j-davis.com:
>> The current patch, which I am evaluating for commit, does away with per-group memory contexts (it uses a common context for all groups), and reduces the initial array allocation from 64 to 8 (but preserves doubling behavior).
>
> Jeff & Tomas, spotted this comment in v8 patch:
>
> @@ -4713,6 +4733,11 @@ accumArrayResult(ArrayBuildState *astate,
>  /*
>   * makeArrayResult - produce 1-D final result of accumArrayResult
>   *
> + * If the array build state was initialized with a separate memory context,
> + * this also frees all the memory (by deleting the subcontext). If a parent
> + * context was used directly, the memory may be freed by an explicit pfree()
> + * call (unless it's meant to be freed by destroying the parent context).
> + *
>   * astate is working state (must not be NULL)
>   * rcontext is where to construct result
>   */
>
> A simple pfree(astate) call is not enough to free the memory. If it's scalar accumulation (initArrayResult), the user must pfree(astate->dvalues) and pfree(astate->dnulls) before astate. If it's array accumulation, pfree(astate->data) and pfree(astate->nullbitmap), and both can be NULL if no array is accumulated, and in some other cases. If it's any (scalar or array) accumulation, it's more complicated. I suggest it's simpler to just force the API user to destroy the parent context instead. So the comment becomes like this:
>
> @@ -4713,6 +4733,11 @@ accumArrayResult(ArrayBuildState *astate,
>  /*
>   * makeArrayResult - produce 1-D final result of accumArrayResult
>   *
> + * If the array build state was initialized with a separate memory context,
> + * this also frees all the memory (by deleting the subcontext). If a parent
> + * context was used directly, the memory is meant to be freed by destroying
> + * the parent context.
> + *
>   * astate is working state (must not be NULL)
>   * rcontext is where to construct result
>   */

Sorry, there is another comment, for makeMdArrayResult; I suggest also changing it like this:

@@ -4738,6 +4764,12 @@ makeArrayResult(ArrayBuildState *astate,
  * beware: no check that specified dimensions match the number of values
  * accumulated.
  *
+ * beware: if the astate was not initialized within a separate memory
+ * context (i.e. using subcontext=false when calling initArrayResult),
+ * using release=true is illegal as it releases the whole context,
+ * and that may include other memory still used elsewhere (instead use
+ * release=false and release the memory with the parent context later)
+ *
  * astate is working state (must not be NULL)
  * rcontext is where to construct result
Re: [HACKERS] B-Tree support function number 3 (strxfrm() optimization)
Robert == Robert Haas robertmh...@gmail.com writes:

 Robert All right, it seems Tom is with you on that point, so after
 Robert some study, I've committed this with very minor modifications.

This caught my eye (thanks to a conflict with the GS patch):

    * In the future, we should consider forcing the
    * tuplesort_begin_heap() case when the abbreviated key
    * optimization can thereby be used, even when numInputs is 1.

The comment in tuplesort_begin_datum that abbreviation can't be used seems wrong to me; why is the copy of the original value pointed to by stup->tuple (in the case of by-reference types; abbreviation is obviously not needed for by-value types) not sufficient? Or what am I missing?

--
Andrew (irc:RhodiumToad)
Re: [HACKERS] PATCH: decreasing memory needlessly consumed by array_agg
2015-01-20 14:22 GMT+07:00 Jeff Davis pg...@j-davis.com:
> The current patch, which I am evaluating for commit, does away with per-group memory contexts (it uses a common context for all groups), and reduces the initial array allocation from 64 to 8 (but preserves doubling behavior).

Jeff & Tomas, spotted this comment in v8 patch:

@@ -4713,6 +4733,11 @@ accumArrayResult(ArrayBuildState *astate,
 /*
  * makeArrayResult - produce 1-D final result of accumArrayResult
  *
+ * If the array build state was initialized with a separate memory context,
+ * this also frees all the memory (by deleting the subcontext). If a parent
+ * context was used directly, the memory may be freed by an explicit pfree()
+ * call (unless it's meant to be freed by destroying the parent context).
+ *
  * astate is working state (must not be NULL)
  * rcontext is where to construct result
  */

A simple pfree(astate) call is not enough to free the memory. If it's scalar accumulation (initArrayResult), the user must pfree(astate->dvalues) and pfree(astate->dnulls) before astate. If it's array accumulation, pfree(astate->data) and pfree(astate->nullbitmap), and both can be NULL if no array is accumulated, and in some other cases. If it's any (scalar or array) accumulation, it's more complicated. I suggest it's simpler to just force the API user to destroy the parent context instead. So the comment becomes like this:

@@ -4713,6 +4733,11 @@ accumArrayResult(ArrayBuildState *astate,
 /*
  * makeArrayResult - produce 1-D final result of accumArrayResult
  *
+ * If the array build state was initialized with a separate memory context,
+ * this also frees all the memory (by deleting the subcontext). If a parent
+ * context was used directly, the memory is meant to be freed by destroying
+ * the parent context.
+ *
  * astate is working state (must not be NULL)
  * rcontext is where to construct result
  */

Regards,
--
Ali Akbar
Re: [HACKERS] Bug in pg_dump
Le 19/01/2015 14:41, Robert Haas a écrit :
> On Thu, Jan 15, 2015 at 6:26 AM, Gilles Darold gilles.dar...@dalibo.com wrote:
>> I attach a patch that solves the issue in pg_dump, let me know if it might be included in Commit Fest or if the three other solutions are a better choice.
>
> I think a fix in pg_dump is appropriate, so I'd encourage you to add this to the next CommitFest.

Ok, thanks Robert. The patch has been added to the next CommitFest under the Bug Fixes section. I've also done some review of the patch and more testing. I've rewritten some comments and added a check for TableInfo being NULL to avoid a potential pg_dump crash. New patch attached: pg_dump.c-extension-FK-v2.patch

Regards

--
Gilles Darold
Consultant PostgreSQL
http://dalibo.com - http://dalibo.org

--- ../postgresql/src/bin/pg_dump/pg_dump.c	2015-01-19 19:03:45.897706879 +0100
+++ src/bin/pg_dump/pg_dump.c	2015-01-20 11:22:01.144794489 +0100
@@ -209,6 +209,7 @@
 static void getDomainConstraints(Archive *fout, TypeInfo *tyinfo);
 static void getTableData(DumpOptions *dopt, TableInfo *tblinfo, int numTables, bool oids);
+static bool hasExtensionMember(TableInfo *tblinfo, int numTables);
 static void makeTableDataInfo(DumpOptions *dopt, TableInfo *tbinfo, bool oids);
 static void buildMatViewRefreshDependencies(Archive *fout);
 static void getTableDataFKConstraints(void);
@@ -730,9 +731,20 @@
 	if (!dopt.schemaOnly)
 	{
+		bool		has_ext_member;
+
 		getTableData(dopt, tblinfo, numTables, dopt.oids);
+
+		/* Search if there are dumpable table members in an extension */
+		has_ext_member = hasExtensionMember(tblinfo, numTables);
+
 		buildMatViewRefreshDependencies(fout);
-		if (dopt.dataOnly)
+
+		/*
+		 * Get FK constraints even with schema+data if extension
+		 * members have FKs, because in this case tables need to be
+		 * dump-ordered too.
+		 */
+		if (dopt.dataOnly || has_ext_member)
 			getTableDataFKConstraints();
 	}
@@ -1852,6 +1864,26 @@
 }

 /*
+ * hasExtensionMember -
+ *	  return true when one of the dumpable objects
+ *	  is an extension member
+ */
+static bool
+hasExtensionMember(TableInfo *tblinfo, int numTables)
+{
+	int			i;
+
+	for (i = 0; i < numTables; i++)
+	{
+		if (tblinfo[i].dobj.ext_member)
+			return true;
+	}
+
+	return false;
+}
+
+
+/*
 * Make a dumpable object for the data of this specific table
 *
 * Note: we make a TableDataInfo if and only if we are going to dump the
@@ -2026,10 +2058,12 @@
 * getTableDataFKConstraints -
 *	  add dump-order dependencies reflecting foreign key constraints
 *
- * This code is executed only in a data-only dump --- in schema+data dumps
- * we handle foreign key issues by not creating the FK constraints until
- * after the data is loaded. In a data-only dump, however, we want to
- * order the table data objects in such a way that a table's referenced
+ * This code is executed only in a data-only dump or when there is an
+ * extension member -- in schema+data dumps we handle foreign key issues
+ * by not creating the FK constraints until after the data is loaded. In a
+ * data-only dump or when there is an extension member to dump (schema
+ * dumps do not concern extension's objects, they are created during the
+ * CREATE EXTENSION), we want to order the table data objects in such a
+ * way that a table's referenced
 * tables are restored first. (In the presence of circular references or
 * self-references this may be impossible; we'll detect and complain about
 * that during the dependency sorting step.)
@@ -2930,9 +2964,14 @@
 	PQExpBuffer delqry;
 	const char *cmd;

+	/* Policy is SCHEMA only */
 	if (dopt->dataOnly)
 		return;

+	/* CREATE EXTENSION should take care of that */
+	if ((tbinfo != NULL) && tbinfo->dobj.ext_member)
+		return;
+
 	/*
 	 * If polname is NULL, then this record is just indicating that ROW
 	 * LEVEL SECURITY is enabled for the table. Dump as ALTER TABLE <table>
@@ -7884,6 +7923,10 @@
 	if (dopt->dataOnly)
 		return;

+	/* CREATE EXTENSION should take care of that */
+	if ((tbinfo != NULL) && tbinfo->dobj.ext_member)
+		return;
+
 	/* Search for comments associated with relation, using table */
 	ncomments = findComments(fout,
							 tbinfo->dobj.catId.tableoid,
@@ -13138,6 +13181,10 @@
 	if (dopt->dataOnly)
 		return;

+	/* CREATE EXTENSION should take care of that */
+	if ((tbinfo != NULL) && tbinfo->dobj.ext_member)
+		return;
+
 	/* Search for comments associated with relation, using table */
 	nlabels = findSecLabels(fout,
							tbinfo->dobj.catId.tableoid,
@@ -13345,7 +13392,7 @@
 static void
 dumpTable(Archive *fout, DumpOptions *dopt, TableInfo *tbinfo)
 {
-	if (tbinfo->dobj.dump && !dopt->dataOnly)
+	if (tbinfo->dobj.dump && !dopt->dataOnly && !tbinfo->dobj.ext_member)
 	{
 		char	   *namecopy;
@@ -13483,6 +13530,10 @@
 	int			j,
 				k;

+	/* CREATE EXTENSION should take care of that */
+	if ((tbinfo != NULL) && tbinfo->dobj.ext_member)
+		return;
+
 	/* Make sure we are in proper schema */
 	selectSourceSchema(fout, tbinfo->dobj.namespace->dobj.name);
Re: [HACKERS] parallel mode and parallel contexts
On Sat, Jan 17, 2015 at 3:40 AM, Robert Haas robertmh...@gmail.com wrote:
> New patch attached. I'm going to take the risk of calling this v1 (previous versions have been 0.x), since I've now done something about the heavyweight locking issue, as well as fixed the message-looping bug Amit pointed out. It doubtless needs more work, but it's starting to smell a bit more like a real patch.

I need some clarification regarding the below code:

+BgwHandleStatus
+WaitForBackgroundWorkerShutdown(BackgroundWorkerHandle *handle)
+{
+	BgwHandleStatus status;
+	int			rc;
+	bool		save_set_latch_on_sigusr1;
+
+	save_set_latch_on_sigusr1 = set_latch_on_sigusr1;
+	set_latch_on_sigusr1 = true;
+
+	PG_TRY();
+	{
+		for (;;)
+		{
+			pid_t		pid;
+
+			CHECK_FOR_INTERRUPTS();
+
+			status = GetBackgroundWorkerPid(handle, &pid);
+			if (status == BGWH_STOPPED)
+				return status;
+
+			rc = WaitLatch(&MyProc->procLatch,
+						   WL_LATCH_SET | WL_POSTMASTER_DEATH, 0);
+
+			if (rc & WL_POSTMASTER_DEATH)
+				return BGWH_POSTMASTER_DIED;

It seems this code has the possibility to wait forever. Assume one of the workers is not able to start (not able to attach to shared memory or some other reason); then the status returned by GetBackgroundWorkerPid() will be BGWH_NOT_YET_STARTED, and after that it can wait forever in WaitLatch.

With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com
Re: [HACKERS] New CF app deployment
Michael Paquier michael.paqu...@gmail.com writes:
> On Tue, Jan 20, 2015 at 4:31 PM, Andres Freund and...@2ndquadrant.com wrote:
>> I think you misunderstood me ;). I was talking about the old CF application providing a RSS feed of all changes to all entries. https://commitfest-old.postgresql.org/action/commitfest_activity.rss
>
> Oh, I didn't know this one. That's indeed useful.

Personally, I never used the RSS feed as such, but I did often consult the activity log pages, which I think are equivalent:
https://commitfest-old.postgresql.org/action/commitfest_activity?id=25
That feature seems to be gone too :-(

			regards, tom lane
Re: [HACKERS] Parallel Seq Scan
On Thu, Jan 15, 2015 at 6:57 PM, Amit Kapila amit.kapil...@gmail.com wrote:
> On Mon, Jan 12, 2015 at 3:25 AM, Robert Haas robertmh...@gmail.com wrote:
>> Yeah, you need two separate global variables pointing to shm_mq objects, one of which gets used by pqmq.c for errors and the other of which gets used by printtup.c for tuples.
>
> Okay, I will try to change the way as suggested without doing switching, but this way we need to do it separately for 'T', 'D', and 'C' messages.

I have taken care of integrating the parallel sequence scan with the latest patch posted (parallel-mode-v1.patch) by Robert at the below location:
http://www.postgresql.org/message-id/ca+tgmozduk4k3xhbxc9vm-82khourezdvqwtfglhwsd2r2a...@mail.gmail.com

Changes in this version
-----------------------
1. As mentioned previously, I have exposed one parameter ParallelWorkerNumber as used in the parallel-mode patch.
2. Enabled the tuple queue to be used for passing tuples from the worker backend to the master backend along with the error queue, as per the suggestion by Robert in the mail above.
3. Involved the master backend to scan the heap directly when tuples are not available in any shared memory tuple queue.
4. Introduced 3 new parameters (cpu_tuple_comm_cost, parallel_setup_cost, parallel_startup_cost) for deciding the cost of a parallel plan. Currently, I have kept the default values for parallel_setup_cost and parallel_startup_cost as 0.0, as those require some experiments.
5. Fixed some issues (related to the memory increase reported upthread by Thom Brown, and general feature issues found during testing).

Note - I have yet to handle the new node types introduced at some of the places and need to verify prepared queries and some other things; however, I think it will be good if I can get some feedback at the current stage.

With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com

parallel_seqscan_v4.patch
Description: Binary data
Re: [HACKERS] Dereferenced pointers checked as NULL in btree_utils_var.c
Michael Paquier michael.paqu...@gmail.com writes:
> Coverity is pointing out $subject, with the following stuff in gbt_var_same(): ...
> As Heikki pointed out to me on IM, the lack of crash reports in this area, as well as similar coding style in cube/, seem to be sufficient arguments to simply remove those NULL checks instead of doing more solid checks on them. Patch is attached.

The way to form a convincing argument that these checks are unnecessary would be to verify that (1) the SQL-accessible functions directly calling gbt_var_same() are all marked STRICT, and (2) the core GIST code never passes a NULL to these support functions. I'm prepared to believe that (1) and (2) are both true, but it merits checking.

			regards, tom lane
Re: [HACKERS] PATCH: decreasing memory needlessly consumed by array_agg
Jeff Davis pg...@j-davis.com writes:
> Tom (tgl), Is my reasoning above acceptable?

Uh, sorry, I've not been paying any attention to this thread for awhile. What are the remaining questions at issue?

			regards, tom lane
Re: [HACKERS] New CF app deployment
On Tue, Jan 20, 2015 at 3:43 PM, Tom Lane t...@sss.pgh.pa.us wrote: Michael Paquier michael.paqu...@gmail.com writes: On Tue, Jan 20, 2015 at 4:31 PM, Andres Freund and...@2ndquadrant.com wrote: I think you misunderstood me ;). I was talking about the old CF application providing a RSS feed of all changes to all entries. https://commitfest-old.postgresql.org/action/commitfest_activity.rss Oh, I didn't know this one. That's indeed useful. Personally, I never used the RSS feed as such, but I did often consult the activity log pages, which I think are equivalent: https://commitfest-old.postgresql.org/action/commitfest_activity?id=25 That feature seems to be gone too :-( Yeah, that's annoying. In fact, I'm the one who forced the RSS feed into the old system, and it's something I use all the time myself. Oops. Turns out I have those in a feature branch that it appears I forgot to merge :( And it also no longer merges cleanly. I've added this to my short-term TODO, and will look at it this evening. -- Magnus Hagander Me: http://www.hagander.net/ Work: http://www.redpill-linpro.com/
Re: [HACKERS] PATCH: decreasing memory needlessly consumed by array_agg
Hi,

On 20.1.2015 12:23, Ali Akbar wrote:
> 2015-01-20 18:17 GMT+07:00 Ali Akbar the.ap...@gmail.com:
>> Sorry, there is another comment, for makeMdArrayResult; I suggest also changing it like this:
>>
>> @@ -4738,6 +4764,12 @@ makeArrayResult(ArrayBuildState *astate,
>>  * beware: no check that specified dimensions match the number of values
>>  * accumulated.
>>  *
>> + * beware: if the astate was not initialized within a separate memory
>> + * context (i.e. using subcontext=false when calling initArrayResult),
>> + * using release=true is illegal as it releases the whole context,
>> + * and that may include other memory still used elsewhere (instead use
>> + * release=false and release the memory with the parent context later)
>> + *
>>  * astate is working state (must not be NULL)
>>  * rcontext is where to construct result

I think both comment fixes are appropriate. I'll wait a bit and then post an updated version of the patch (unless it gets committed with the comment fixes before that).

--
Tomas Vondra                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Re: [HACKERS] Re: [COMMITTERS] pgsql: Disable -faggressive-loop-optimizations in gcc 4.8+ for pre-9.2
On Tue, Jan 20, 2015 at 10:30 AM, Alvaro Herrera alvhe...@2ndquadrant.com wrote:
> Tom Lane wrote:
>> Disable -faggressive-loop-optimizations in gcc 4.8+ for pre-9.2 branches.
>>
>> With this optimization flag enabled, recent versions of gcc can generate incorrect code that assumes variable-length arrays (such as oidvector) are actually fixed-length because they're embedded in some larger struct. The known instance of this problem was fixed in 9.2 and up by commit 8137f2c32322c624e0431fac1621e8e9315202f9 and followon work, which hides actually-variable-length catalog fields from the compiler altogether. And we plan to gradually convert variable-length fields to official flexible array member notation over time, which should prevent this type of bug from reappearing as gcc gets smarter. We're not going to try to back-port those changes into older branches, though, so apply this band-aid instead.
>
> Would anybody object to me pushing this commit to branches 8.2 and 8.3?

Since those branches are out of support, I am not sure what the point is. If we want people to be able to use those branches reasonably we need to back-port fixes for critical security and stability issues, not just this one thing. But maybe I am missing something.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
Re: Partitioning: issues/ideas (Was: Re: [HACKERS] On partitioning)
On Mon, Jan 19, 2015 at 8:48 PM, Amit Langote langote_amit...@lab.ntt.co.jp wrote: Specifically, do we regard partitions as pg_inherits children of their partitioning parent? I don't think this is totally an all-or-nothing decision. I think everyone is agreed that we need to not break things that work today -- e.g. Merge Append. What that implies for pg_inherits isn't altogether clear. One point is that an implementation may end up establishing the parent-partition hierarchy somewhere other than (or in addition to) pg_inherits. One intention would be to avoid tying the partitioning scheme to certain inheritance features that use pg_inherits. For example, consider call sites of find_all_inheritors(). One notable example is Append/MergeAppend, which would be of interest to partitioning. We would want to reuse that part of the infrastructure, but we might as well write an equivalent, say find_all_partitions(), which scans something other than pg_inherits to get all partitions. IMHO, there's little reason to avoid putting pg_inherits entries in for the partitions, and then this just works. We can find other ways to make it work if that turns out to be better, but if we don't have one, there's no reason to complicate things. Agreed that some concrete idea of the internal representation should help guide the catalog structure. If we are going to cache the partitioning info in the relcache (which we most definitely will), then we should make sure to consider the scenario of having a lot of partitioned tables with a lot of individual partitions. It looks like it would be similar to a scenario where there are a lot of inheritance hierarchies. But availability of the partitioning feature would definitely cause these numbers to grow larger. Perhaps this is an important point driving this discussion. Yeah, it would be good if the costs of supporting, say, 1000 partitions were negligible. A primary question for me about partition-pruning is when do we do it?
Should we model it after relation_excluded_by_constraints() and hence totally plan-time? But, the tone of the discussion is that we postpone partition-pruning to execution-time and hence my perhaps misdirected attempts to inject it into some executor machinery. It's useful to prune partitions at plan time, because then you only have to do the work once. But sometimes you don't know enough to do it at plan time, so it's useful to do it at execution time, too. Then, you can do it differently for every tuple based on the actual value you have. There's no point in doing 999 unnecessary relation scans if we can tell which partition the actual run-time value must be in. But I think execution-time pruning can be a follow-on patch. If you don't restrict the scope of the first patch as much as possible, you're not going to have much luck getting this committed. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
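The plan-time vs. run-time pruning distinction discussed above can be illustrated with a toy model (a hypothetical sketch, not PostgreSQL code): for consecutive range partitions, locating the single partition a value can fall into is a binary search over the sorted partition bounds, which is what makes skipping the other 999 relation scans cheap once the actual value is known.

```python
import bisect

def prune_to_partition(upper_bounds, value):
    # upper_bounds are the sorted, exclusive upper bounds of consecutive
    # range partitions; the result is the index of the only partition
    # whose range can contain value, so all other scans can be skipped.
    return bisect.bisect_right(upper_bounds, value)

# Four partitions: (-inf,10), [10,20), [20,30), [30,+inf)
bounds = [10, 20, 30]
```

At plan time this can run once when the comparison value is a constant; at execution time the same lookup can run per parameter value (or even per outer tuple), which is the follow-on work Robert describes.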
Re: [HACKERS] Parallel Seq Scan
On Tue, Jan 20, 2015 at 9:43 PM, Thom Brown t...@linux.com wrote: On 20 January 2015 at 14:29, Amit Kapila amit.kapil...@gmail.com wrote: On Thu, Jan 15, 2015 at 6:57 PM, Amit Kapila amit.kapil...@gmail.com wrote: On Mon, Jan 12, 2015 at 3:25 AM, Robert Haas robertmh...@gmail.com wrote: Yeah, you need two separate global variables pointing to shm_mq objects, one of which gets used by pqmq.c for errors and the other of which gets used by printtup.c for tuples. Okay, I will try to change the way as suggested without doing switching, but this way we need to do it separately for 'T', 'D', and 'C' messages. I have taken care of integrating the parallel sequence scan with the latest patch posted (parallel-mode-v1.patch) by Robert at below location: http://www.postgresql.org/message-id/ca+tgmozduk4k3xhbxc9vm-82khourezdvqwtfglhwsd2r2a...@mail.gmail.com Changes in this version --- 1. As mentioned previously, I have exposed one parameter ParallelWorkerNumber as used in parallel-mode patch. 2. Enabled tuple queue to be used for passing tuples from worker backend to master backend along with error queue as per suggestion by Robert in the mail above. 3. Involved master backend to scan the heap directly when tuples are not available in any shared memory tuple queue. 4. Introduced 3 new parameters (cpu_tuple_comm_cost, parallel_setup_cost, parallel_startup_cost) for deciding the cost of parallel plan. Currently, I have kept the default values for parallel_setup_cost and parallel_startup_cost as 0.0, as those require some experiments. 5. Fixed some issues (related to memory increase as reported upthread by Thom Brown and general feature issues found during test) Note - I have yet to handle the new node types introduced at some of the places and need to verify prepared queries and some other things, however I think it will be good if I can get some feedback at current stage. Which commit is this based against? 
I'm getting errors with the latest master: It seems to me that you have not applied parallel-mode patch before applying this patch, can you try once again by first applying the patch posted by Robert at below link: http://www.postgresql.org/message-id/ca+tgmozduk4k3xhbxc9vm-82khourezdvqwtfglhwsd2r2a...@mail.gmail.com commit-id used for this patch - 0b49642 With Regards, Amit Kapila. EnterpriseDB: http://www.enterprisedb.com
Re: [HACKERS] Parallel Seq Scan
On 20 January 2015 at 16:55, Amit Kapila amit.kapil...@gmail.com wrote: On Tue, Jan 20, 2015 at 9:43 PM, Thom Brown t...@linux.com wrote: On 20 January 2015 at 14:29, Amit Kapila amit.kapil...@gmail.com wrote: On Thu, Jan 15, 2015 at 6:57 PM, Amit Kapila amit.kapil...@gmail.com wrote: On Mon, Jan 12, 2015 at 3:25 AM, Robert Haas robertmh...@gmail.com wrote: Yeah, you need two separate global variables pointing to shm_mq objects, one of which gets used by pqmq.c for errors and the other of which gets used by printtup.c for tuples. Okay, I will try to change the way as suggested without doing switching, but this way we need to do it separately for 'T', 'D', and 'C' messages. I have taken care of integrating the parallel sequence scan with the latest patch posted (parallel-mode-v1.patch) by Robert at below location: http://www.postgresql.org/message-id/ca+tgmozduk4k3xhbxc9vm-82khourezdvqwtfglhwsd2r2a...@mail.gmail.com Changes in this version --- 1. As mentioned previously, I have exposed one parameter ParallelWorkerNumber as used in parallel-mode patch. 2. Enabled tuple queue to be used for passing tuples from worker backend to master backend along with error queue as per suggestion by Robert in the mail above. 3. Involved master backend to scan the heap directly when tuples are not available in any shared memory tuple queue. 4. Introduced 3 new parameters (cpu_tuple_comm_cost, parallel_setup_cost, parallel_startup_cost) for deciding the cost of parallel plan. Currently, I have kept the default values for parallel_setup_cost and parallel_startup_cost as 0.0, as those require some experiments. 5. Fixed some issues (related to memory increase as reported upthread by Thom Brown and general feature issues found during test) Note - I have yet to handle the new node types introduced at some of the places and need to verify prepared queries and some other things, however I think it will be good if I can get some feedback at current stage. Which commit is this based against? 
I'm getting errors with the latest master: It seems to me that you have not applied parallel-mode patch before applying this patch, can you try once again by first applying the patch posted by Robert at below link: http://www.postgresql.org/message-id/ca+tgmozduk4k3xhbxc9vm-82khourezdvqwtfglhwsd2r2a...@mail.gmail.com commit-id used for this patch - 0b49642 D'oh. Yes, you're completely right. Works fine now. Thanks. Thom
Re: [HACKERS] Parallel Seq Scan
On 20 January 2015 at 14:29, Amit Kapila amit.kapil...@gmail.com wrote: On Thu, Jan 15, 2015 at 6:57 PM, Amit Kapila amit.kapil...@gmail.com wrote: On Mon, Jan 12, 2015 at 3:25 AM, Robert Haas robertmh...@gmail.com wrote: Yeah, you need two separate global variables pointing to shm_mq objects, one of which gets used by pqmq.c for errors and the other of which gets used by printtup.c for tuples. Okay, I will try to change the way as suggested without doing switching, but this way we need to do it separately for 'T', 'D', and 'C' messages. I have taken care of integrating the parallel sequence scan with the latest patch posted (parallel-mode-v1.patch) by Robert at below location: http://www.postgresql.org/message-id/ca+tgmozduk4k3xhbxc9vm-82khourezdvqwtfglhwsd2r2a...@mail.gmail.com Changes in this version --- 1. As mentioned previously, I have exposed one parameter ParallelWorkerNumber as used in parallel-mode patch. 2. Enabled tuple queue to be used for passing tuples from worker backend to master backend along with error queue as per suggestion by Robert in the mail above. 3. Involved master backend to scan the heap directly when tuples are not available in any shared memory tuple queue. 4. Introduced 3 new parameters (cpu_tuple_comm_cost, parallel_setup_cost, parallel_startup_cost) for deciding the cost of parallel plan. Currently, I have kept the default values for parallel_setup_cost and parallel_startup_cost as 0.0, as those require some experiments. 5. Fixed some issues (related to memory increase as reported upthread by Thom Brown and general feature issues found during test) Note - I have yet to handle the new node types introduced at some of the places and need to verify prepared queries and some other things, however I think it will be good if I can get some feedback at current stage. 
I'm getting an issue:

➤ psql://thom@[local]:5488/pgbench # set parallel_seqscan_degree = 8;
SET
Time: 0.248 ms

➤ psql://thom@[local]:5488/pgbench # explain select c1 from t1;
                          QUERY PLAN
--------------------------------------------------------------
 Parallel Seq Scan on t1  (cost=0.00..21.22 rows=100 width=4)
   Number of Workers: 8
   Number of Blocks Per Worker: 11
(3 rows)
Time: 0.322 ms

# explain analyse select c1 from t1;
                          QUERY PLAN
---------------------------------------------------------------------------
 Parallel Seq Scan on t1  (cost=0.00..21.22 rows=100 width=4) (actual time=0.024..13.468 rows=100 loops=1)
   Number of Workers: 8
   Number of Blocks Per Worker: 11
 Planning time: 0.040 ms
 Execution time: 13.862 ms
(5 rows)
Time: 14.188 ms

➤ psql://thom@[local]:5488/pgbench # set parallel_seqscan_degree = 10;
SET
Time: 0.219 ms

➤ psql://thom@[local]:5488/pgbench # explain select c1 from t1;
                          QUERY PLAN
--------------------------------------------------------------
 Parallel Seq Scan on t1  (cost=0.00..19.18 rows=100 width=4)
   Number of Workers: 10
   Number of Blocks Per Worker: 9
(3 rows)
Time: 0.375 ms

➤ psql://thom@[local]:5488/pgbench # explain analyse select c1 from t1;

So setting parallel_seqscan_degree above max_worker_processes causes the CPU to max out, and the query never returns, or at least not after waiting 2 minutes. Shouldn't it have a ceiling of max_worker_processes?

The original test I performed where I was getting OOM errors now appears to be fine:

# explain (analyse, buffers, timing) select distinct bid from pgbench_accounts;
                          QUERY PLAN
------------------------------------------------------------------------------
 HashAggregate  (cost=1400411.11..1400412.11 rows=100 width=4) (actual time=8504.333..8504.335 rows=13 loops=1)
   Group Key: bid
   Buffers: shared hit=32 read=18183
   ->  Parallel Seq Scan on pgbench_accounts  (cost=0.00..1375411.11 rows=1000 width=4) (actual time=0.054..7183.494 rows=1000 loops=1)
         Number of Workers: 8
         Number of Blocks Per Worker: 18215
         Buffers: shared hit=32 read=18183
 Planning time: 0.058 ms
 Execution time: 8876.967 ms
(9 rows)
Time: 8877.366 ms

Note that I increased seq_page_cost to force a parallel scan in this case.

Thom
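One plausible reading of the t1 numbers above (11 blocks per worker at 8 workers, 9 at 10 workers, consistent with a 99-block heap) is that the patch splits the heap among the workers plus the master backend, which also scans per change #3. The clamping that Thom's hang suggests is missing is a one-liner. This is a hypothetical sketch of both calculations, not the patch's actual code; the 99-block figure is an inference, not stated in the thread:

```python
def blocks_per_worker(total_blocks, num_workers):
    # The master backend participates in the scan as well (change #3 in
    # Amit's list), so the heap is divided num_workers + 1 ways.
    return total_blocks // (num_workers + 1)

def effective_workers(requested_degree, max_worker_processes):
    # The behavior Thom's hang suggests is missing: clamp the requested
    # parallel degree to the number of workers that can actually start,
    # instead of waiting forever on workers that will never be launched.
    return min(requested_degree, max_worker_processes)
```

With 99 blocks this reproduces both EXPLAIN outputs: 99 // 9 = 11 and 99 // 11 = 9.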
Re: [HACKERS] proposal: searching in array function - array_position
Hi I am sending an updated version - it allows a third optional argument that specifies where the search should start. With it, it is possible to call this function repeatedly. Regards Pavel 2015-01-17 23:43 GMT+01:00 Pavel Stehule pavel.steh...@gmail.com: Hi here is a proof of concept of the array_offset function possible question: * use comparison = or IS NOT DISTINCT FROM In this initial proof of concept I used the IS NOT DISTINCT FROM operator - but my opinion is not strong on this question. Both have some advantages and disadvantages. Regards Pavel 2015-01-16 19:12 GMT+01:00 Pavel Stehule pavel.steh...@gmail.com: 2015-01-16 18:37 GMT+01:00 Jim Nasby jim.na...@bluetreble.com: On 1/16/15 11:16 AM, Pavel Stehule wrote: 2015-01-16 17:57 GMT+01:00 Jim Nasby jim.na...@bluetreble.com mailto: jim.na...@bluetreble.com: On 1/16/15 3:39 AM, Pavel Stehule wrote: I am proposing a simple function that returns the position of an element in an array. Yes please! FUNCTION array_position(anyarray, anyelement) RETURNS int That won't work on a multi-dimensional array. Ideally it needs to accept a slice or an element and return the specifier for the slice. It is a question what the result should be - probably there can be a multidimensional variant, where the result will be an array array_position([1,2,3],2) -- 2 array_position([[1,2],[2,3],[3,4]], [2,3]) -- 2 /* 2nd parameter should have N-1 dimensions of the first parameter */ The problem with that is you can't actually use '2' to get [2,3] back: select (array[[1,2,3],[4,5,6],[7,8,9]])[1] IS NULL; ?column? -- t (1 row) yes, but when you are searching for an array in an array you can use a full slice selection: postgres=# select (ARRAY[[1,2],[4,5]])[1][1:2]; -- [1:2] should be a constant every time in this case -- so it should not be returned array - {{1,2}} (1 row) I think the bigger problem here is we need something better than slices for handling subsets of arrays.
Even if the function returned [2:2] it's still going to behave differently than it will in the non-array case because you won't be getting the expected number of dimensions back. :( you cannot return a slice and I don't propose it, although we can return a range type or an array of range types - but still we cannot use ranges for arrays. array_position_md([1,2,3],2) -- [2] array_position_md([[1,2],[2,3],[3,4]], 2) -- [2,1] another question is how to handle more than one occurrence of a value - probably two sets of functions - the first returns the first occurrence of the value, the second returns the set of occurrences Gee, if only we had some way to return multiple elements of something... ;P In other words, I think all of these should actually return an array of positions. I think it's OK for someone that only cares about the first instance to just do [1]. there can be two functions - position returns the first, and positions returns all as an array -- Jim Nasby, Data Architect, Blue Treble Consulting Data in Trouble? Get it in Treble! http://BlueTreble.com

diff --git a/doc/src/sgml/array.sgml b/doc/src/sgml/array.sgml
new file mode 100644
index 9ea1068..b3630b4
*** a/doc/src/sgml/array.sgml
--- b/doc/src/sgml/array.sgml
*** SELECT * FROM sal_emp WHERE pay_by_quart
*** 600,605 ****
--- 600,614 ----
   index, as described in <xref linkend="indexes-types">.
  </para>
+ <para>
+  You can also search for any value in an array using the <function>array_offset</function>
+  function (it returns the position of the first occurrence of a value in the array):
+
+ <programlisting>
+ SELECT array_offset(ARRAY['sun','mon','tue','wen','thu','fri','sat'], 'mon');
+ </programlisting>
+ </para>
+
  <tip>
   <para>
    Arrays are not sets; searching for specific array elements
diff --git a/doc/src/sgml/func.sgml b/doc/src/sgml/func.sgml
new file mode 100644
index 5e7b000..62c9f7f
*** a/doc/src/sgml/func.sgml
--- b/doc/src/sgml/func.sgml
*** SELECT NULLIF(value, '(none)') ...
*** 11474,11479 ****
--- 11474,11482 ----
    <primary>array_lower</primary>
   </indexterm>
   <indexterm>
+   <primary>array_offset</primary>
+  </indexterm>
+  <indexterm>
    <primary>array_prepend</primary>
   </indexterm>
   <indexterm>
*** SELECT NULLIF(value, '(none)') ...
*** 11592,11597 ****
--- 11595,11613 ----
      </row>
      <row>
       <entry>
+       <literal>
+        <function>array_offset</function>(<type>anyarray</type>, <type>anyelement</type> <optional>, <type>int</type></optional>)
+       </literal>
+      </entry>
+      <entry><type>int</type></entry>
+      <entry>returns the offset of the first occurrence of some element in an array. It uses
+       the <literal>IS NOT DISTINCT FROM</literal> operator for comparison. The third
+       optional argument can specify an initial offset where the search starts.</entry>
+      <entry><literal>array_offset(ARRAY['sun','mon','tue','wen','thu','fri','sat'], 'mon')</literal></entry>
+      <entry><literal>2</literal></entry>
+     </row>
+
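The semantics proposed in this thread — a 1-based position, comparison via IS NOT DISTINCT FROM (so NULL matches NULL), an optional starting offset for repeated calls, and Jim's return-all-positions variant — can be modeled in Python. This is a sketch of the behavior under discussion, not the C implementation; None stands in for SQL NULL, and Python's `==` on None happens to mimic IS NOT DISTINCT FROM, which treats two NULLs as equal:

```python
def array_offset(arr, elem, start=1):
    """1-based position of the first occurrence of elem in arr at or
    after position start, or None (SQL NULL) if not found."""
    for i in range(start - 1, len(arr)):
        if arr[i] == elem:  # None == None is True, like IS NOT DISTINCT FROM
            return i + 1
    return None

def array_positions(arr, elem):
    """All occurrences, per the suggestion of returning an array of positions."""
    return [i + 1 for i, v in enumerate(arr) if v == elem]
```

The third argument allows the repeated-call pattern Pavel describes: call with start one past the previous hit until the function returns NULL.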
Re: [HACKERS] Re: [COMMITTERS] pgsql: Disable -faggressive-loop-optimizations in gcc 4.8+ for pre-9.2
Robert Haas wrote: Would anybody object to me pushing this commit to branches 8.2 and 8.3? Since those branches are out of support, I am not sure what the point is. If we want people to be able to use those branches reasonably we need to back-port fixes for critical security and stability issues, not just this one thing. But maybe I am missing something. I just want to make it easy to compile those branches with current toolset so that I can study their behavior to suggest workarounds for customer problems etc -- nothing more. I am not proposing to open them up again for support. Of course, I can carry the patched branches locally if there is strong opposition, but since it's harmless, I don't see why would there be any such. Another easy workaround is to add -O0 to CFLAGS, and I can script that easily too. Without this patch or -O0, initdb fails with inicializando pg_authid ... FATAL: wrong number of index expressions SENTENCIA: CREATE TRIGGER pg_sync_pg_database AFTER INSERT OR UPDATE OR DELETE ON pg_database FOR EACH STATEMENT EXECUTE PROCEDURE flatfile_update_trigger(); There is the additional problem that contrib/cube fails to compile, but I don't care enough about that one: /pgsql/source/REL8_3_STABLE/contrib/cube/cubeparse.y:61:17: error: ‘result’ undeclared (first use in this function) *((void **)result) = write_box( dim, $2, $4 ); ^ -- Álvaro Herrerahttp://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Merging postgresql.conf and postgresql.auto.conf
On Mon, Jan 19, 2015 at 10:39 PM, Amit Kapila amit.kapil...@gmail.com wrote: I think whichever process reads postgresql.conf/postgresql.auto.conf has to do this (unless we restrict that this will be done at some other time) and the postmaster is one of them. It seems to me that it is not a good idea unless we do it at some other time, like populating entries for a view. Well, the postmaster does not have database access, and neither do most of the auxiliary processes. The autovacuum launcher has very limited database access (shared catalogs only). Making the postmaster access the database is a complete non-starter; we have worked very hard to avoid that, for reasons of overall system robustness. If the postmaster crashes or locks up, manual intervention is required; if any other process crashes, the postmaster can reset the whole system. Independently of that, it sounds like solving the problem from the wrong end. I think the idea of ALTER SYSTEM .. SET ought to be that you should EITHER edit postgresql.conf for all your configuration changes, OR you should use ALTER SYSTEM .. SET for all of your changes. If you mix-and-match, well, that's certainly allowed and it will work and everything, but you - as a human being - might get confused. Right, but we can't completely eliminate such a possibility (as an example we have some default settings like max_connections, shared_buffers, etc). I agree with you that users should use only one way of changing the settings, however for certain rare cases (default settings or others) we can provide a way for the user to know which of the settings are duplicates. I think if we can provide such information via some view with ease then it's worth having. I'd suggest abandoning the idea of storing anything on a persistent basis in a system catalog and looking for some way for the backend to which the user is connected to expose its own state instead. For example, pg_settings already exposes sourcefile and sourceline settings.
Actually, I'm not quite sure why that's not sufficient here, but if it isn't, it could be extended. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
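The pg_settings approach Robert mentions already exposes sourcefile/sourceline for the winning occurrence of each parameter; reporting duplicates only additionally needs the losing occurrences. A hypothetical sketch of such a duplicate report, using a deliberately simplified parser (real GUC parsing handles quoting, comments mid-value, include directives, etc.):

```python
def parse_gucs(text, filename):
    """Very simplified postgresql.conf parser: one name = value per line."""
    settings = []
    for lineno, line in enumerate(text.splitlines(), start=1):
        line = line.split('#', 1)[0].strip()   # drop trailing comments
        if '=' in line:
            name, value = line.split('=', 1)
            settings.append((name.strip(), value.strip(), filename, lineno))
    return settings

def duplicate_settings(conf_text, auto_conf_text):
    """Report parameters set in both files; the postgresql.auto.conf
    entry wins because that file is read last."""
    conf = {n: (v, f, l) for n, v, f, l in parse_gucs(conf_text, 'postgresql.conf')}
    dups = {}
    for n, v, f, l in parse_gucs(auto_conf_text, 'postgresql.auto.conf'):
        if n in conf:
            dups[n] = {'overridden': conf[n], 'effective': (v, f, l)}
    return dups
```

This is the backend-local, on-demand computation Robert is suggesting, as opposed to persisting anything in a catalog.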
Re: [HACKERS] parallel mode and parallel contexts
On Tue, Jan 20, 2015 at 9:41 AM, Amit Kapila amit.kapil...@gmail.com wrote: It seems [WaitForBackgroundWorkerShutdown] has the possibility to wait forever. Assume one of the workers is not able to start (not able to attach to shared memory, or some other reason); then the status returned by GetBackgroundWorkerPid() will be BGWH_NOT_YET_STARTED and after that it can wait forever in WaitLatch. I don't think that's right. The status only remains BGWH_NOT_YET_STARTED until the postmaster forks the child process. At that point it immediately changes to BGWH_STARTED. If it starts up and then dies early on, for example because it can't attach to shared memory or some such, the status will change to BGWH_STOPPED. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
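Robert's point is that BGWH_NOT_YET_STARTED is transient: every registered worker eventually moves to BGWH_STARTED at fork and to BGWH_STOPPED on exit, even when it dies during startup, so the shutdown wait always reaches a terminal state. A toy model of that state machine (purely illustrative, not the real handle code):

```python
BGWH_NOT_YET_STARTED, BGWH_STARTED, BGWH_STOPPED = 'not_yet_started', 'started', 'stopped'

def wait_for_shutdown(status_events):
    """Consume one status observation per latch wake-up until the worker
    is seen stopped, mirroring WaitForBackgroundWorkerShutdown's loop:
    any non-stopped status just means wait again."""
    waits = 0
    for status in status_events:
        if status == BGWH_STOPPED:
            return waits
        waits += 1
    raise RuntimeError("worker never reached BGWH_STOPPED")

# A worker that fails to attach to shared memory still transitions
# NOT_YET_STARTED -> STARTED -> STOPPED, so the loop terminates:
crash_early = [BGWH_NOT_YET_STARTED, BGWH_STARTED, BGWH_STOPPED]
```

The scenario Amit worries about would only arise if a status could stay NOT_YET_STARTED forever, which the postmaster's transitions rule out.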
[HACKERS] basebackups during ALTER DATABASE ... SET TABLESPACE ... not safe?
Hi, I'm analyzing a problem in which a customer had a 9.2 cluster, created by pg_basebackup from a standby, that failed with "WAL contains references to invalid pages". The failing record was an xlog redo visibility record, i.e. XLOG_HEAP2_VISIBLE. First I thought there might be another bug along the lines of 17fa4c321cc. Looking at the code and the WAL, that didn't seem to be the case (man, I miss pg_xlogdump). Other, slightly older, standbys didn't seem to have any problems. Logs show that an ALTER DATABASE ... SET TABLESPACE ... was running when the basebackup was started and finished *before* pg_basebackup finished. movedb() basically works in these steps:
1) lock out users of the database
2) RequestCheckpoint(IMMEDIATE|WAIT)
3) DropDatabaseBuffers()
4) copydir()
5) XLogInsert(XLOG_DBASE_CREATE)
6) RequestCheckpoint(CHECKPOINT_IMMEDIATE)
7) rmtree(src_dbpath)
8) XLogInsert(XLOG_DBASE_DROP)
9) unlock database
If a basebackup starts while 4) is in progress and continues until 7) happens, I think a pretty wide race opens: the basebackup can end up with a partial copy of the database in the old tablespace because rmtree(old_path) was concurrently in progress. Normally such races are fixed during replay. But in this case, the replay of the XLOG_DBASE_CREATE record will just try to do a rmtree(new); copydir(old, new); fixing nothing. Besides making AD .. ST use sane WAL logging, which doesn't seem backpatchable, I don't see what could be done against this except somehow making basebackups fail if an AD .. ST is in progress. Which doesn't look entirely trivial either. This is a pretty nasty bug imo, because you're in no way guaranteed to notice. If the problem happens only in some large, seldom-read table, the database might appear to be in a correct state.
Greetings, Andres Freund -- Andres Freund http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
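The race Andres describes can be illustrated with a toy filesystem model (sets of file names; purely illustrative, with made-up relfilenode names): a basebackup that copies the old directory while step 7's rmtree() is deleting it captures a partial copy, and replaying XLOG_DBASE_CREATE rebuilds the new path from that same damaged source, so recovery never repairs it.

```python
def replay_dbase_create(backup_old_dir):
    # WAL replay of XLOG_DBASE_CREATE is just rmtree(new); copydir(old, new):
    # it rebuilds the new path from whatever the backup captured of the old one.
    return set(backup_old_dir)

def basebackup_during_move(db_files, deleted_during_backup):
    # Step 7's rmtree(src_dbpath) runs concurrently with the base backup,
    # so the backup sees only part of the old directory...
    backup_old_dir = db_files - deleted_during_backup
    # ...and recovery "repairs" the new path from that same partial copy:
    return replay_dbase_create(backup_old_dir)

full_db = {'11111', '16384', '16385_fsm'}   # hypothetical relfilenodes
restored = basebackup_during_move(full_db, {'16384'})
```

The restored database silently lacks a file, which matches the observation that nothing fails until something (like a visibility-map redo) touches the missing pages.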
Re: [HACKERS] B-Tree support function number 3 (strxfrm() optimization)
On Mon, Jan 19, 2015 at 9:29 PM, Peter Geoghegan p...@heroku.com wrote: I think that the attached patch should at least fix that much. Maybe the problem on the other animal is also explained by the lack of this, since there could also be a MinGW-ish strxfrm_l(), I suppose. Committed that, rather blindly, since it looks like a reasonable fix. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Better way of dealing with pgstat wait timeout during buildfarm runs?
On 25.12.2014 22:28, Tomas Vondra wrote: On 25.12.2014 21:14, Andres Freund wrote: That's indeed odd. Seems to have been lost when the statsfile was split into multiple files. Alvaro, Tomas? The goal was to keep the logic as close to the original as possible. IIRC there were pgstat wait timeout issues before, and in most cases the conclusion was that it's probably because of overloaded I/O. But maybe there actually was another bug, and it's entirely possible that the split introduced a new one, and that's what we're seeing now. The strange thing is that the split happened ~2 years ago, which is inconsistent with the sudden increase of this kind of issue. So maybe something changed on that particular animal (a failing SD card causing I/O stalls, perhaps)? Anyway, I happen to have a spare Raspberry Pi, so I'll try to reproduce and analyze the issue locally. But that won't happen until January. I've tried to reproduce this on my Raspberry Pi 'machine' and it's not very difficult to trigger this. About 7 out of 10 'make check' runs fail because of 'pgstat wait timeout'. All the occurrences I've seen were right after some sort of VACUUM (sometimes plain, sometimes ANALYZE or FREEZE), and the I/O at the time looked something like this:

Device:  rrqm/s  wrqm/s   r/s   w/s  rkB/s  wkB/s  avgrq-sz  avgqu-sz     await  r_await   w_await   svctm  %util
mmcblk0    0.00   75.00  0.00  8.00   0.00  36.00      9.00      5.73  15633.75     0.00  15633.75  125.00 100.00

So pretty terrible (this is a Class 4 SD card, supposedly able to handle 4 MB/s). If hamster had a faulty SD card, it might have been much worse, I guess. This of course does not prove the absence of a bug - I plan to dig into this a bit more. Feel free to point out some suspicious scenarios that might be worth reproducing and analyzing.
-- Tomas Vondrahttp://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
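The failure mode Tomas reproduces is a polling loop hitting its deadline: a backend requests fresh statistics and repeatedly checks for a stats file newer than its request, giving up with the warning after a fixed budget. This is a hedged sketch of that loop's shape, not the actual pgstat.c code; the 10-second budget and the mutable fake clock (so the sketch runs without sleeping) are illustrative assumptions:

```python
def wait_for_fresh_stats(file_mtime, request_time, clock, max_wait_ms=10000, delay_ms=10):
    """Poll until the stats file is newer than our request, or give up
    with the familiar warning.  clock is a mutable [now_ms] so this
    sketch advances time explicitly instead of sleeping."""
    deadline = clock[0] + max_wait_ms
    while clock[0] < deadline:
        if file_mtime() >= request_time:
            return True
        clock[0] += delay_ms          # stand-in for pg_usleep between checks
    print('WARNING: pgstat wait timeout')
    return False

# A collector stalled behind ~15-second write latency (like the 15633 ms
# w_await above) never produces a fresh file inside the budget:
clock = [0]
ok = wait_for_fresh_stats(lambda: -1, 0, clock)
```

On hardware like the Class 4 SD card above, a single fsync stall longer than the budget is enough to trigger the warning with no bug anywhere in the stats code.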
Re: [HACKERS] Parallel Seq Scan
On 20 January 2015 at 14:29, Amit Kapila amit.kapil...@gmail.com wrote: On Thu, Jan 15, 2015 at 6:57 PM, Amit Kapila amit.kapil...@gmail.com wrote: On Mon, Jan 12, 2015 at 3:25 AM, Robert Haas robertmh...@gmail.com wrote: Yeah, you need two separate global variables pointing to shm_mq objects, one of which gets used by pqmq.c for errors and the other of which gets used by printtup.c for tuples. Okay, I will try to change the way as suggested without doing switching, but this way we need to do it separately for 'T', 'D', and 'C' messages. I have taken care of integrating the parallel sequence scan with the latest patch posted (parallel-mode-v1.patch) by Robert at below location: http://www.postgresql.org/message-id/ca+tgmozduk4k3xhbxc9vm-82khourezdvqwtfglhwsd2r2a...@mail.gmail.com Changes in this version --- 1. As mentioned previously, I have exposed one parameter ParallelWorkerNumber as used in parallel-mode patch. 2. Enabled tuple queue to be used for passing tuples from worker backend to master backend along with error queue as per suggestion by Robert in the mail above. 3. Involved master backend to scan the heap directly when tuples are not available in any shared memory tuple queue. 4. Introduced 3 new parameters (cpu_tuple_comm_cost, parallel_setup_cost, parallel_startup_cost) for deciding the cost of parallel plan. Currently, I have kept the default values for parallel_setup_cost and parallel_startup_cost as 0.0, as those require some experiments. 5. Fixed some issues (related to memory increase as reported upthread by Thom Brown and general feature issues found during test) Note - I have yet to handle the new node types introduced at some of the places and need to verify prepared queries and some other things, however I think it will be good if I can get some feedback at current stage. Which commit is this based against? 
I'm getting errors with the latest master: thom@swift:~/Development/postgresql$ patch -p1 ~/Downloads/parallel_seqscan_v4.patch patching file src/backend/access/Makefile patching file src/backend/access/common/printtup.c patching file src/backend/access/shmmq/Makefile patching file src/backend/access/shmmq/shmmqam.c patching file src/backend/commands/explain.c Hunk #1 succeeded at 721 (offset 8 lines). Hunk #2 succeeded at 918 (offset 8 lines). Hunk #3 succeeded at 1070 (offset 8 lines). Hunk #4 succeeded at 1337 (offset 8 lines). Hunk #5 succeeded at 2239 (offset 83 lines). patching file src/backend/executor/Makefile patching file src/backend/executor/execProcnode.c patching file src/backend/executor/execScan.c patching file src/backend/executor/execTuples.c patching file src/backend/executor/nodeParallelSeqscan.c patching file src/backend/executor/nodeSeqscan.c patching file src/backend/libpq/pqmq.c Hunk #1 succeeded at 23 with fuzz 2 (offset -3 lines). Hunk #2 FAILED at 63. Hunk #3 succeeded at 132 (offset -31 lines). 1 out of 3 hunks FAILED -- saving rejects to file src/backend/libpq/pqmq.c.rej patching file src/backend/optimizer/path/Makefile patching file src/backend/optimizer/path/allpaths.c patching file src/backend/optimizer/path/costsize.c patching file src/backend/optimizer/path/parallelpath.c patching file src/backend/optimizer/plan/createplan.c patching file src/backend/optimizer/plan/planner.c patching file src/backend/optimizer/plan/setrefs.c patching file src/backend/optimizer/util/pathnode.c patching file src/backend/postmaster/Makefile patching file src/backend/postmaster/backendworker.c patching file src/backend/postmaster/postmaster.c patching file src/backend/tcop/dest.c patching file src/backend/tcop/postgres.c Hunk #1 succeeded at 54 (offset -1 lines). Hunk #2 succeeded at 1132 (offset -1 lines). 
patching file src/backend/utils/misc/guc.c patching file src/backend/utils/misc/postgresql.conf.sample can't find file to patch at input line 2105 Perhaps you used the wrong -p or --strip option? The text leading up to this was: -- |diff --git a/src/include/access/parallel.h b/src/include/access/parallel.h |index 761ba1f..00ad468 100644 |--- a/src/include/access/parallel.h |+++ b/src/include/access/parallel.h -- File to patch: -- Thom
Re: [HACKERS] Inaccuracy in VACUUM's tuple count estimates
On Mon, Jan 19, 2015 at 10:53 PM, tim_wilson tim.wil...@telogis.com wrote: Was slightly optimistic that this comment in the release notes might mean that my bug with bloat on hot tables might have been fixed in 9.4: "Make VACUUM properly report dead but not-yet-removable rows to the statistics collector (Hari Babu). Previously these were reported as live rows." Unfortunately that is not the case, and we still have the problem in 9.4 As far as I can tell from reviewing the thread, what we need to do here is basically invent HEAPTUPLE_INSERT_AND_DELETE_IN_PROGRESS. There was a lot of concern about doing that in the back-branches, but I'm skeptical that the concern is justified. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
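The case Robert wants a new result code for — a tuple whose inserting and deleting transactions are both still in progress — can be shown with a toy classifier. This is an illustration of the proposed distinction, not the real HeapTupleSatisfiesVacuum logic (which deals in transaction IDs, hint bits, and cutoff xids rather than these simplified state strings):

```python
def htsv_classify(xmin_state, xmax_state):
    """xmin_state / xmax_state: 'committed', 'aborted', 'in_progress',
    or None for xmax when the tuple was never deleted."""
    if xmin_state == 'aborted':
        return 'DEAD'
    if xmin_state == 'in_progress':
        # The gap under discussion: an in-progress insert that has also
        # been deleted in progress is currently folded into one of the
        # existing states, miscounting dead-but-not-removable rows.
        if xmax_state == 'in_progress':
            return 'INSERT_AND_DELETE_IN_PROGRESS'   # the proposed new state
        return 'INSERT_IN_PROGRESS'
    # xmin committed from here on:
    if xmax_state is None or xmax_state == 'aborted':
        return 'LIVE'
    if xmax_state == 'in_progress':
        return 'DELETE_IN_PROGRESS'
    return 'RECENTLY_DEAD'   # delete committed; removable once old enough
```

The point of the extra state is that callers (like VACUUM's statistics reporting) can then count such tuples correctly instead of as plain inserts.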
Re: [HACKERS] New CF app deployment
2015-01-20 19:16 GMT+01:00 Pavel Stehule pavel.steh...@gmail.com: Hi I cannot to set my name as author for patch: https://commitfest.postgresql.org/4/112/ It is solved now - I don't understand a autocomplete in first moment All works well Regards Pavel Regards Pavel 2015-01-13 6:35 GMT+01:00 Magnus Hagander mag...@hagander.net: Hi! Last I said something about the new CF app I said I was planning to deploy it over the holidays, and that clearly did not happen. But I've been talking to Michael, as the current cf-chief, and doing some further testing with it, and we think now is a good time to go for it :) As the plan is to close out the current CF just a few days from now, we're going to use that and try to swap it out when traffic is at least at it's lowest (even if we're well aware it's not going to be zero). So based on this, we plan to: In the late evening on Jan 19th (evening European time that is), I will make the current CF app readonly, and move it to a new url where it will remain available for the foreseeable future (we have a lot of useful data in it after all). Once this is done, Michael (primarily) will start working on syncing up the information about the latest patches into the new app. Much info is already synced there, but to make sure all the latest changes are included. In the morning European time on the 20th, I'll take over and try to finish up what's left. And sometime during the day when things are properly in sync, we'll open up the new app for business to all users. There are surely some bugs remaining in the system, so please have some oversight with that over the coming days/weeks. I'll try to get onto fixing them as soon as I can - and some others can look at that as well if it's something critical at least. Further status updates to come as we start working on it... -- Magnus Hagander Me: http://www.hagander.net/ Work: http://www.redpill-linpro.com/
Re: [HACKERS] Reducing buildfarm disk usage: remove temp installs when done
On 01/19/2015 09:53 AM, Tom Lane wrote: Andrew Dunstan and...@dunslane.net writes: But I'm wondering if we should look at using the tricks git-new-workdir uses, setting up symlinks instead of a full clone. Then we'd have one clone with a bunch of different work dirs. That plus a bit of explicitly done garbage collection and possibly a periodic re-clone might do the trick. Yeah, I was wondering whether it'd be okay to depend on git-new-workdir. That would fix the problem pretty nicely. But in the installations I've seen, that's not in PATH but squirreled away in some hard-to-guess library directory ... We should move this discussion to the buildfarm members list. I'll be publishing a patch there. cheers andrew -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
[HACKERS] Re: [COMMITTERS] pgsql: Disable -faggressive-loop-optimizations in gcc 4.8+ for pre-9.2
Tom Lane wrote: Disable -faggressive-loop-optimizations in gcc 4.8+ for pre-9.2 branches. With this optimization flag enabled, recent versions of gcc can generate incorrect code that assumes variable-length arrays (such as oidvector) are actually fixed-length because they're embedded in some larger struct. The known instance of this problem was fixed in 9.2 and up by commit 8137f2c32322c624e0431fac1621e8e9315202f9 and followon work, which hides actually-variable-length catalog fields from the compiler altogether. And we plan to gradually convert variable-length fields to official flexible array member notation over time, which should prevent this type of bug from reappearing as gcc gets smarter. We're not going to try to back-port those changes into older branches, though, so apply this band-aid instead. Would anybody object to me pushing this commit to branches 8.2 and 8.3? -- Álvaro Herrerahttp://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Re: [COMMITTERS] pgsql: Disable -faggressive-loop-optimizations in gcc 4.8+ for pre-9.2
On 2015-01-20 11:10:53 -0500, Robert Haas wrote: On Tue, Jan 20, 2015 at 10:30 AM, Alvaro Herrera alvhe...@2ndquadrant.com wrote: Tom Lane wrote: Disable -faggressive-loop-optimizations in gcc 4.8+ for pre-9.2 branches. With this optimization flag enabled, recent versions of gcc can generate incorrect code that assumes variable-length arrays (such as oidvector) are actually fixed-length because they're embedded in some larger struct. The known instance of this problem was fixed in 9.2 and up by commit 8137f2c32322c624e0431fac1621e8e9315202f9 and followon work, which hides actually-variable-length catalog fields from the compiler altogether. And we plan to gradually convert variable-length fields to official flexible array member notation over time, which should prevent this type of bug from reappearing as gcc gets smarter. We're not going to try to back-port those changes into older branches, though, so apply this band-aid instead. Would anybody object to me pushing this commit to branches 8.2 and 8.3? Since those branches are out of support, I am not sure what the point is. If we want people to be able to use those branches reasonably we need to back-port fixes for critical security and stability issues, not just this one thing. But maybe I am missing something. Supporting and being able to compile and run 'make check' (which doesn't complete with gcc >= 4.8) aren't the same thing though. And we e.g. try to provide pg_dump and libpq support for older versions, which is hard to ensure if you can't run them. I personally think that being able to at least compile/make check old versions a bit longer is a good idea. Greetings, Andres Freund -- Andres Freund http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Re: [COMMITTERS] pgsql: Disable -faggressive-loop-optimizations in gcc 4.8+ for pre-9.2
On Tue, Jan 20, 2015 at 11:18 AM, Alvaro Herrera alvhe...@2ndquadrant.com wrote: Robert Haas wrote: Would anybody object to me pushing this commit to branches 8.2 and 8.3? Since those branches are out of support, I am not sure what the point is. If we want people to be able to use those branches reasonably we need to back-port fixes for critical security and stability issues, not just this one thing. But maybe I am missing something. I just want to make it easy to compile those branches with current toolset so that I can study their behavior to suggest workarounds for customer problems etc -- nothing more. I am not proposing to open them up again for support. Oh, I see. Well, that doesn't bother me, then. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] New CF app deployment
Hi I cannot set my name as author for patch: https://commitfest.postgresql.org/4/112/ Regards Pavel 2015-01-13 6:35 GMT+01:00 Magnus Hagander mag...@hagander.net: Hi! Last I said something about the new CF app I said I was planning to deploy it over the holidays, and that clearly did not happen. But I've been talking to Michael, as the current cf-chief, and doing some further testing with it, and we think now is a good time to go for it :) As the plan is to close out the current CF just a few days from now, we're going to use that and try to swap it out when traffic is at least at its lowest (even if we're well aware it's not going to be zero). So based on this, we plan to: In the late evening on Jan 19th (evening European time that is), I will make the current CF app readonly, and move it to a new url where it will remain available for the foreseeable future (we have a lot of useful data in it after all). Once this is done, Michael (primarily) will start working on syncing up the information about the latest patches into the new app. Much info is already synced there, but to make sure all the latest changes are included. In the morning European time on the 20th, I'll take over and try to finish up what's left. And sometime during the day when things are properly in sync, we'll open up the new app for business to all users. There are surely some bugs remaining in the system, so please have some oversight with that over the coming days/weeks. I'll try to get onto fixing them as soon as I can - and some others can look at that as well if it's something critical at least. Further status updates to come as we start working on it... -- Magnus Hagander Me: http://www.hagander.net/ Work: http://www.redpill-linpro.com/
Re: [HACKERS] PATCH: decreasing memory needlessly consumed by array_agg
On Tue, Jan 20, 2015 at 6:44 AM, Tom Lane t...@sss.pgh.pa.us wrote: Jeff Davis pg...@j-davis.com writes: Tom (tgl), Is my reasoning above acceptable? Uh, sorry, I've not been paying any attention to this thread for awhile. What are the remaining questions at issue? This patch is trying to improve the array_agg case where there are many arrays being constructed simultaneously, such as in HashAgg. You strongly suggested that a committable patch would differentiate between the grouped and ungrouped cases (or perhaps you meant differentiate between HashAgg and sorted aggregation?). Tomas's current patch does not do so; it does two main things: 1. Uses a common memory context for arrays being constructed by array_agg in any context (ungrouped, sorted agg, and HashAgg) 2. Reduces the initial array allocation to 8 elements from 64, but preserves doubling behavior I don't see either as a big problem, but perhaps there are some downsides in some cases. I think a worst case would be a slowdown in the sorted agg case where every group has 64 elements, so I'll try to see if that's a real problem or not. If you saw a bigger problem, please let me know; and if not, I'll proceed with the review. There are also some other concerns I have about the API ugliness, which I believe is the reason there's so much discussion about making the comments better. The reason the API is ugly is for backwards compatibility, so there's no perfect solution. Your opinion is welcome here too, but I mainly need to see if your objection above has been addressed. Regards, Jeff Davis -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] pgaudit - an auditing extension for PostgreSQL
On Tue, Jan 20, 2015 at 1:05 AM, Abhijit Menon-Sen a...@2ndquadrant.com wrote: So when I'm trying to decide what to audit, I have to: (a) check if the current user is mentioned in .roles; if so, audit. (b) check if the current user is a descendant of one of the roles mentioned in .roles; if not, no audit. (c) check for permissions granted to the root role and see if that tells us to audit. Is that right? If so, it would work fine. I don't look forward to trying to explain it to people, but yes, it would work (for anything you could grant permissions for). I think this points to a fundamental weakness in the idea of doing this through the GRANT system. Some people are going to want to audit everything a particular user does, some people are going to want to audit all access to particular objects, and some people will have more complicated requirements. Some people will want to audit even super-users, others especially super-users, others only non super-users. None of this necessarily matches up to the usual permissions framework. Have you considered splitting pgaudit.c into multiple files, perhaps along the pre/post deparse lines? If such a split were done properly, then could the backwards-compatible version be more acceptable for inclusion in contrib in 9.5? If so, I'll look into it. We're not going to include code in contrib that has leftovers in it for compatibility with earlier source trees. That's been discussed on this mailing list many times and the policy is clear. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Merging postgresql.conf and postgresql.auto.conf
On 1/16/15 10:32 PM, David G Johnston wrote: One thought I have in this line is that currently there doesn't seem to be a way to know if the setting has an entry both in postgresql.conf and postgresql.auto.conf; if we can have some way of knowing the same (pg_settings?), then it could be convenient for the user to decide if the value in postgresql.auto.conf is useful or not, and if it's not useful then use the ALTER SYSTEM .. RESET command to remove the same from postgresql.auto.conf. I think one way is for pg_settings to include the file name for each variable, but that would not reflect the current status of postgresql.conf, so we would need to parse postgresql.conf again at that time. Yeah, that could be a possibility, but I think that will break the existing commands, as this is the common infrastructure used for SHOW .. commands as well, which display the guc value that is used by the current session rather than the value in postgresql.conf. You're right. pg_settings and the SHOW command use the value in the current session rather than the config file. It might break this common infrastructure. Two changes solve this problem in what seems to be a clean way. 1) Upon each parsing of postgresql.conf we store all assigned variables somewhere Parsing is relatively cheap, and it's not like we need high performance from this. So, -1 on permanent storage. 2) We display these assignments in a new pg_settings column named system_reset_val I would also extend this to include: a) upon each parsing of postgresql.auto.conf we store all assigned variables somewhere (maybe the same place as postgresql.conf and simply label the file source) You cannot assume there are only postgresql.conf and postgresql.auto.conf. Complex environments will have multiple included files. b) add an alter_system_val field to show that value (or null) c) add a db_role_val to show the current value for the session via pg_db_role_setting You're forgetting that there are also per-role settings. 
And I'm with Robert; what's wrong with sourcefile and sourceline? Perhaps we just need to teach those about ALTER ROLE SET and ALTER DATABASE SET (if they don't already know about them). c.1) add a db_role_id to show the named user that is being used for the db_role_val lookup The thinking for c.1 is that in situations with role hierarchies and SET ROLE usage it would be nice to be able to query what the connection role - the one used during variable lookup - is. I'm losing track of exactly what we're trying to solve here, but... If the goal is to figure out what settings would be in place for a specific user connecting to a specific database, then we should create a SRF that does just that (accepting a database name and role name). You could then do... SELECT * FROM pg_show_all_settings( 'database', 'role' ) a; I'm probably going overkill on this but there are not a lot of difference sources nor do they change frequently so extending the pg_settings view to be more of a one-stop-shopping for this information seems to make sense. Speaking of overkill... one thing that you currently can't do is find out what #includes have been processed. Perhaps it's worth having a SRF that would return that info... As it relates back to this thread the desired merging ends up being done inside this view and at least gives the DBA a central location (well, along with pg_db_role_setting) to go and look at the configuration landscape for the system. I think the goal is good, but the interface needs to be rethought. -- Jim Nasby, Data Architect, Blue Treble Consulting Data in Trouble? Get it in Treble! http://BlueTreble.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Merging postgresql.conf and postgresql.auto.conf
On Tue, Jan 20, 2015 at 1:24 PM, Jim Nasby jim.na...@bluetreble.com wrote: On 1/16/15 10:32 PM, David G Johnston wrote: Two changes solve this problem in what seems to be a clean way. 1) Upon each parsing of postgresql.conf we store all assigned variables somewhere Parsing is relatively cheap, and it's not like we need high performance from this. So, -1 on permanent storage. OK 2) We display these assignments in a new pg_settings column named system_reset_val I would also extend this to include: a) upon each parsing of postgresql.auto.conf we store all assigned variables somewhere (maybe the same place as postgresql.conf and simply label the file source) You can not assume there are only postgresql.conf and postgresql.auto.conf. Complex environments will have multiple included files. #include files still appear to come from postgresql.conf; I'm not proposing we try and memorize every single instance of a variable declaration and provide a global overlaps query - though that piece already seems to be in place... b) add an alter_system_val field to show that value (or null) c) add a db_role_val to show the current value for the session via pg_db_role_setting You're forgetting that there are also per-role settings. And I'm with Robert; what's wrong with sourcefile and sourceline? Perhaps we just need to teach those about ALTER ROLE SET and ALTER DATABASE SET (if they don't already know about them). Actually, there are not separate PER ROLE and PER DATABASE settings even though there are different SQL commands. The catalog is simply pg_db_role_setting with the use of all tags (i.e., 0) as necessary. But I see where you are going and do not disagree that precedence information could be useful. 
sourceline and sourcefile pertain only to the current value while the point of adding these other pieces is to provide a snapshot of all the different mappings that the system knows about; instead of having to tell a user to go look in two different files (and associated includes) and a database catalog to find out what possible values are in place. That doesn't solve the need to scan the catalog to see other possible values - though you could at least put a counter in pg_settings that indicates how many pg_db_role_setting entries reference the given variable so that if non-zero the user is clued into the fact that they need to check out said catalog table. c.1) add a db_role_id to show the named user that is being used for the db_role_val lookup The thinking for c.1 is that in situations with role hierarchies and SET ROLE usage it would be nice to be able to query what the connection role - the one used during variable lookup - is. I'm losing track of exactly what we're trying to solve here, but... If the goal is to figure out what settings would be in place for a specific user connecting to a specific database, then we should create a SRF that does just that (accepting a database name and role name). You could then do... SELECT * FROM pg_show_all_settings( 'database', 'role' ) a; This is fine - but I'm thinking about situations where a user immediately SET ROLE on their session and typically operate as said user; if they try doing an ALTER ROLE SET for this SET ROLE user it will not work because their login user is what matters. This is probably a solution looking for a problem but it is a dynamic that one needs to be aware of. I'm probably going overkill on this but there are not a lot of difference sources nor do they change frequently so extending the pg_settings view to be more of a one-stop-shopping for this information seems to make sense. Speaking of overkill... one thing that you currently can't do is find out what #includes have been processed. 
Perhaps it's worth having a SRF that would return that info... As it relates back to this thread the desired merging ends up being done inside this view and at least gives the DBA a central location (well, along with pg_db_role_setting) to go and look at the configuration landscape for the system. I think the goal is good, but the interface needs to be rethought. -- Jim Nasby, Data Architect, Blue Treble Consulting Data in Trouble? Get it in Treble! http://BlueTreble.com My main concern with the UI would be too-much-information; but otherwise, if we accept the premise that we should include as much as possible and then let the user (or us) provide more useful subsets based upon common use-cases, the main issue is making sure we've identified all of the relevant information that needs to be captured - even if it is something as minor as the whole logon vs. current role dynamic. I'm ignoring the cost/benefit aspect of implementation for the moment, largely because I don't know enough about the backend to make reasonable comments. But much of the data is already present in various views and catalogs - I just think having a one-stop shop would be useful and go a long
Re: [HACKERS] logical column ordering
I've decided to abandon this patch. I have spent too much time looking at it now. If anyone is interested in trying to study, I can provide the patches I came up with, explanations, and references to prior discussion -- feel free to ask. My main motivation for this work is to enable a later patch for column stores. Right now, since columns have monotonically increasing attnums, it's rather difficult to have columns that are stored elsewhere. My plan for that now is much more modest, something like adding a constant 1 to attnums and that would let us identify columns that are outside the heap -- or something like that. I haven't fully worked it out yet. Just a few quick notes about this patch: last thing I was doing was mess with setrefs.c so that Var nodes carry varphysnum annotations, which are set to 0 during initial planner phases, and are turned into the correct attphysnum (the value extracted from catalogs) so that TupleDescs constructed from targetlists by ExecTypeFromTL and friends can have the correct attphysnum too. I think this part works correctly, with the horrible exception that I had to do a relation_open() in setrefs.c to get hold of the right attphysnum from a tupledesc obtained from catalogs. That's not acceptable at all; I think the right way to do this would be to store a list of numbers earlier (not sure when) and store it in RelOptInfo or RangeTableEntry; that would be accesible during setrefs.c. The other bit I did was modify all the heaptuple.c code so that it could deal correctly with tupledescs that have attphysnums and attlognum in an order different from stock attnum. That took some time to get right, but I think it's also correct now. One issue I had was finding places that use attnum as an index into the tupledesc attrs array. I had to examine all these places and change them to use a physattrs array, which is an array that has been sorted by physical number. 
I don't think all the changes are correct, and I'm not sure that I caught them all. Anyway it seems to me that this is mostly there. If somebody is interested in learning executor code, this project would be damn cool to get done. -- Álvaro Herrera    http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] PATCH: decreasing memory needlessly consumed by array_agg
Hi, On 20.1.2015 21:13, Jeff Davis wrote: On Tue, Jan 20, 2015 at 6:44 AM, Tom Lane t...@sss.pgh.pa.us wrote: Jeff Davis pg...@j-davis.com writes: Tom (tgl), Is my reasoning above acceptable? Uh, sorry, I've not been paying any attention to this thread for awhile. What are the remaining questions at issue? This patch is trying to improve the array_agg case where there are many arrays being constructed simultaneously, such as in HashAgg. You strongly suggested that a committable patch would differentiate between the grouped and ungrouped cases (or perhaps you meant differentiate between HashAgg and sorted aggregation?). Tomas's current patch does not do so; it does two main things: I don't think that's entirely true. The problem with the initial (experimental) patch was that while it fixed aggregate queries, it mostly ignored all the other callers, and either resulted in memory corruption (unexpected pfree) or bloat (when not doing the pfree). Tom's message where he points that out is here: http://www.postgresql.org/message-id/20707.1396372...@sss.pgh.pa.us The current patch makes that distinction properly (IMHO) - all the callers using the underlying array functions will use the original approach (with subcontexts), and only array_agg uses the new API and forces subcontext=false. 1. Uses a common memory context for arrays being constructed by array_agg in any context (ungrouped, sorted agg, and HashAgg) 2. Reduces the initial array allocation to 8 elements from 64, but preserves doubling behavior Yes, that's true. I'd like to point out that while the current code uses 64 items, it also uses 8kB per-group contexts. That's slightly overkill I guess ... I don't see either as a big problem, but perhaps there are some downsides in some cases. I think a worst case would be a slowdown in the sorted agg case where every group has 64 elements, so I'll try to see if that's a real problem or not. 
If you saw a bigger problem, please let me know; and if not, I'll proceed with the review. FWIW I've done a fair number of measurements and not noticed any measurable difference (unless using a rather crazy testcase, IIRC). Certainly the issues with excessive memory consumption (and swapping) were much worse. There are also some other concerns I have about the API ugliness, which I believe is the reason there's so much discussion about making the comments better. The reason the API is ugly is for backwards compatibility, so there's no perfect solution. Your opinion is welcome here too, but I mainly need to see if your objection above has been addressed. I generally agree that having two API 'facets' with different behavior is slightly awkward and asymmetric, but I wouldn't call that ugly. I actually modified both APIs initially, but I think Ali is right that not breaking the existing API (and keeping the original behavior in that case) is better. We can break it any time we want in the future, but it's impossible to unbreak it ;-) regards -- Tomas Vondra    http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] B-Tree support function number 3 (strxfrm() optimization)
On Tue, Jan 20, 2015 at 3:34 PM, Peter Geoghegan p...@heroku.com wrote: On Tue, Jan 20, 2015 at 3:34 PM, Robert Haas robertmh...@gmail.com wrote: Dear me. Peter, can you fix this RSN? Investigating. It's certainly possible to fix Andrew's test case with the attached. I'm not sure that that's the appropriate fix, though: there is probably a case to be made for not bothering with abbreviation once we've read tuples in for the final merge run. More likely, the strongest case is for storing the abbreviated keys on disk too, and reading those back. -- Peter Geoghegan

diff --git a/src/backend/utils/sort/tuplesort.c b/src/backend/utils/sort/tuplesort.c
index 6d3aa88..adf4c4d 100644
--- a/src/backend/utils/sort/tuplesort.c
+++ b/src/backend/utils/sort/tuplesort.c
@@ -3148,6 +3148,7 @@ readtup_heap(Tuplesortstate *state, SortTuple *stup,
 	MinimalTuple tuple = (MinimalTuple) palloc(tuplen);
 	char	   *tupbody = (char *) tuple + MINIMAL_TUPLE_DATA_OFFSET;
 	HeapTupleData htup;
+	Datum		original;
 
 	USEMEM(state, GetMemoryChunkSpace(tuple));
 	/* read in the tuple proper */
@@ -3161,10 +3162,29 @@ readtup_heap(Tuplesortstate *state, SortTuple *stup,
 	/* set up first-column key value */
 	htup.t_len = tuple->t_len + MINIMAL_TUPLE_OFFSET;
 	htup.t_data = (HeapTupleHeader) ((char *) tuple - MINIMAL_TUPLE_OFFSET);
-	stup->datum1 = heap_getattr(&htup,
-								state->sortKeys[0].ssup_attno,
-								state->tupDesc,
-								&stup->isnull1);
+	/* Once again, store abbreviated key representation */
+	original = heap_getattr(&htup,
+							state->sortKeys[0].ssup_attno,
+							state->tupDesc,
+							&stup->isnull1);
+
+	if (!state->sortKeys->abbrev_converter || stup->isnull1)
+	{
+		/*
+		 * Store ordinary Datum representation, or NULL value.  If there is a
+		 * converter it won't expect NULL values, and cost model is not
+		 * required to account for NULL, so in that case we avoid calling
+		 * converter and just set datum1 to void representation (to be
+		 * consistent).
+		 */
+		stup->datum1 = original;
+	}
+	else
+	{
+		/* Store abbreviated key representation */
+		stup->datum1 = state->sortKeys->abbrev_converter(original,
+														 state->sortKeys);
+	}
 }
 
 /*

-- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Better way of dealing with pgstat wait timeout during buildfarm runs?
On 21.1.2015 00:38, Michael Paquier wrote: On Wed, Jan 21, 2015 at 1:08 AM, Tomas Vondra I've tried to reproduce this on my Raspberry PI 'machine' and it's not very difficult to trigger this. About 7 out of 10 'make check' runs fail because of 'pgstat wait timeout'. All the occurrences I've seen were right after some sort of VACUUM (sometimes plain, sometimes ANALYZE or FREEZE), and the I/O at the time looked something like this:

Device:  rrqm/s  wrqm/s   r/s   w/s  rkB/s  wkB/s avgrq-sz avgqu-sz    await  r_await  w_await  svctm  %util
mmcblk0    0.00   75.00  0.00  8.00   0.00  36.00     9.00     5.73 15633.75     0.00 15633.75 125.00 100.00

So pretty terrible (this is a Class 4 SD card, supposedly able to handle 4 MB/s). If hamster had a faulty SD card, it might have been much worse, I guess. By experience, a class 10 is at least necessary, with a minimum amount of memory to minimize the appearance of those warnings, hamster having now an 8GB class 10 card. Well, my goal was exactly to produce those warnings ;-) and see if I can identify some strange cases. That's why I chose just class 4. But even then it produces a rather low number of those warnings (one or two per check run), and mostly at the expected places with significant I/O overload. So I'm not any wiser :-( regards Tomas -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] B-Tree support function number 3 (strxfrm() optimization)
On Tue, Jan 20, 2015 at 6:27 PM, Andrew Gierth and...@tao11.riddles.org.uk wrote: Robert == Robert Haas robertmh...@gmail.com writes: Robert All right, it seems Tom is with you on that point, so after Robert some study, I've committed this with very minor modifications. While hacking up a patch to demonstrate the simplicity of extending this to the Datum sorter, I seem to have run into a fairly major issue with this: there seems to be no attempt whatsoever to handle spilling to disk correctly. The data spilled to disk only has the un-abbreviated values, but nothing tries to re-abbreviate it (or disable abbreviations) when it is read back in, and chaos ensues: Dear me. Peter, can you fix this RSN? -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] B-Tree support function number 3 (strxfrm() optimization)
On Tue, Jan 20, 2015 at 3:46 AM, Andrew Gierth and...@tao11.riddles.org.uk wrote: The comment in tuplesort_begin_datum that abbreviation can't be used seems wrong to me; why is the copy of the original value pointed to by stup-tuple (in the case of by-reference types, and abbreviation is obviously not needed for by-value types) not sufficient? We haven't formalized the idea that pass-by-value types are not targets for abbreviation (it's just that the practical application of abbreviated keys is likely to be limited to pass-by-reference types, generating a compact pass-by-value abbreviated representation). That could be a useful restriction to formalize, and certainly seems likely to be a harmless one, but for now that's the way it is. It might be sufficient for some tuplesort_begin_datum() callers. Datum tuple sorts require the original values. Aside from the formalization of abbreviation only applying to pass-by-value types, you'd have to teach tuplesort_getdatum() to reconstruct the non-abbreviated representation transparently from each SortTuple's tuple proper. However, the actual tuplesort_getdatum() calls could be the dominant cost, not the sort (I'm not sure of that offhand - that's total speculation). Basically, the intersection of the datum sort case with abbreviated keys seems complicated. I tended to think that the solution was to force a heaptuple sort instead (where abbreviation naturally can be used), since clearly that could work in some important cases like nodeAgg.c, iff the gumption to do it that way was readily available. Rightly or wrongly, I preferred that idea to the idea of teaching the Datum case to handle abbreviation across the board. Maybe that's the wrong way of fixing that, but for now I don't think it's acceptable that abbreviation isn't always used in certain cases where it could make sense (e.g. not for simple GroupAggregates with a single attribute -- only multiple attribute GroupAggregates). After all, most sort cases (e.g. 
B-Tree builds) didn't use SortSupport for several years, simply because no one got around to it until I finally did a few months back. Note that most tuplesort non-users of abbreviation don't use abbreviation for sensible reasons. For example, abbreviation simply doesn't make sense for Top-N heap sorts, or MJExamineQuals(). The non-single-attribute GroupAggregate/nodeAgg.c case seems bad, but I don't have a good sense of how bad the orderedsetaggs.c non-use is...it might matter less than the other case. -- Peter Geoghegan -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Add min and max execute statement time in pg_stat_statement
On 01/20/2015 01:26 PM, Arne Scheffer wrote: Interesting patch. I did a quick review looking only into the patch file. The sum of variances variable contains the sum of squared differences instead, I think. Umm, no. It's not. e->counters.sum_var_time += (total_time - old_mean) * (total_time - e->counters.mean_time); This is not a square that's being added. old_mean is not the same as e->counters.mean_time. Since the variance is this value divided by (n - 1), AIUI, I think sum of variances isn't a bad description. I'm open to alternative suggestions. And a very minor aspect: The term standard deviation in your code stands for (corrected) sample standard deviation, I think, because you divide by n-1 instead of n to keep the estimator unbiased. How about mentioning the prefix sample to indicate this being the estimator? I don't understand. I'm following pretty exactly the calculations stated at http://www.johndcook.com/blog/standard_deviation/ I'm not a statistician. Perhaps others who are more literate in statistics can comment on this paragraph. And I'm sure I'm missing C specifics (again) (or it's the reduced patch file scope), but you introduce sqrtd, but sqrt is called? Good catch. Will fix. cheers andrew -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
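For readers following along, the recurrence from the John D. Cook page that the quoted line implements (Welford's method) can be written out as follows, with x_k the latest total_time, M_k the running mean, and S_k the quantity accumulated in sum_var_time:

```latex
% Welford's running-variance recurrence, matching
%   sum_var_time += (total_time - old_mean) * (total_time - mean_time)
M_k = M_{k-1} + \frac{x_k - M_{k-1}}{k}, \qquad
S_k = S_{k-1} + (x_k - M_{k-1})\,(x_k - M_k), \qquad
s^2 = \frac{S_n}{n-1}
```

Note that the two factors in the S_k update use different means (M_{k-1}, the old mean, and M_k, the updated one), which is exactly why the accumulated term is not a square; dividing S_n by n-1 then yields the sample variance.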
Re: [HACKERS] B-Tree support function number 3 (strxfrm() optimization)
>>>>> "Robert" == Robert Haas robertmh...@gmail.com writes:

 Robert> All right, it seems Tom is with you on that point, so after
 Robert> some study, I've committed this with very minor modifications.

While hacking up a patch to demonstrate the simplicity of extending this to the Datum sorter, I seem to have run into a fairly major issue with this: there seems to be no attempt whatsoever to handle spilling to disk correctly. The data spilled to disk only has the un-abbreviated values, but nothing tries to re-abbreviate it (or disable abbreviations) when it is read back in, and chaos ensues:

set work_mem = 64;
select v, v > lag(v) over (order by v)
  from (select 'B'||i as v from generate_series(1,10000) i
        union all
        select 'a'||i from generate_series(1,10000) i
        offset 0) s
 order by v limit 20;

   v   | ?column?
-------+----------
 a1    |
 B1    | f
 a1000 | t
 a1001 | t
 a1002 | t
 a1003 | t
 B1000 | f
 B1001 | t
 B1002 | t
 B1003 | t
 B1004 | t
 B1005 | t
 a1004 | t
 a1005 | t
 a1006 | t
 a1007 | t
 a1008 | t
 B1    | f
 B10   | t
 B100  | t
(20 rows)

-- Andrew (irc:RhodiumToad) -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Better way of dealing with pgstat wait timeout during buildfarm runs?
On Wed, Jan 21, 2015 at 1:08 AM, Tomas Vondra tomas.von...@2ndquadrant.com wrote: On 25.12.2014 22:28, Tomas Vondra wrote: On 25.12.2014 21:14, Andres Freund wrote: That's indeed odd. Seems to have been lost when the statsfile was split into multiple files. Alvaro, Tomas? The goal was to keep the logic as close to the original as possible. IIRC there were pgstat wait timeout issues before, and in most cases the conclusion was that it's probably because of overloaded I/O. But maybe there actually was another bug, and it's entirely possible that the split introduced a new one, and that's what we're seeing now. The strange thing is that the split happened ~2 years ago, which is inconsistent with the sudden increase of this kind of issues. So maybe something changed on that particular animal (a failing SD card causing I/O stalls, perhaps)? Anyway, I happen to have a spare Raspberry Pi, so I'll try to reproduce and analyze the issue locally. But that won't happen until January. I've tried to reproduce this on my Raspberry Pi 'machine' and it's not very difficult to trigger this. About 7 out of 10 'make check' runs fail because of 'pgstat wait timeout'. All the occurrences I've seen were right after some sort of VACUUM (sometimes plain, sometimes ANALYZE or FREEZE), and the I/O at the time looked something like this:

Device:  rrqm/s  wrqm/s   r/s   w/s   rkB/s   wkB/s avgrq-sz avgqu-sz    await r_await  w_await  svctm  %util
mmcblk0    0.00   75.00  0.00  8.00    0.00   36.00     9.00     5.73 15633.75    0.00 15633.75 125.00 100.00

So pretty terrible (this is a Class 4 SD card, supposedly able to handle 4 MB/s). If hamster had a faulty SD card, it might have been much worse, I guess. By experience, a class 10 is at least necessary, with a minimum amount of memory, to minimize the appearance of those warnings, hamster having now an 8GB class 10 card. -- Michael -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] B-Tree support function number 3 (strxfrm() optimization)
On Tue, Jan 20, 2015 at 2:00 PM, Peter Geoghegan p...@heroku.com wrote: Maybe that's the wrong way of fixing that, but for now I don't think it's acceptable that abbreviation isn't always used in certain cases where it could make sense (e.g. not for simple GroupAggregates with a single attribute -- only multiple attribute GroupAggregates). After all, most sort cases (e.g. B-Tree builds) didn't use SortSupport for several years, simply because no one got around to it until I finally did a few months back. Excuse me. I mean that this *is* an acceptable restriction for the time being. -- Peter Geoghegan -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] B-Tree support function number 3 (strxfrm() optimization)
On Tue, Jan 20, 2015 at 3:34 PM, Robert Haas robertmh...@gmail.com wrote: Dear me. Peter, can you fix this RSN? Investigating. -- Peter Geoghegan -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] B-Tree support function number 3 (strxfrm() optimization)
On Tue, Jan 20, 2015 at 3:33 PM, Robert Haas robertmh...@gmail.com wrote: Peter, this made bowerbird (Windows 8/Visual Studio) build, but it's failing make check. Ditto hamerkop (Windows 2k8/VC++) and currawong (Windows XP Pro/MSVC++). jacana (Windows 8/gcc) and brolga (Windows XP Pro/cygwin) are unhappy too, although the failures are showing up in different build stages rather than in 'make check'. narwhal (Windows 2k3/mingw) and frogmouth (Windows XP Pro/gcc) are happy, though, so it's not affecting ALL of the Windows critters. Still, I'm leaning toward the view that we should disable this optimization across-the-board on Windows until somebody has time to do the legwork to figure out what it takes to make it work, and what makes it work on some of these critters and fail on others. We can't leave the buildfarm red for long periods of time. Fair enough. -- Peter Geoghegan -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Add min and max execute statement time in pg_stat_statement
On 01/20/2015 06:32 PM, David G Johnston wrote: Andrew Dunstan wrote On 01/20/2015 01:26 PM, Arne Scheffer wrote: And a very minor aspect: The term standard deviation in your code stands for (corrected) sample standard deviation, I think, because you divide by n-1 instead of n to keep the estimator unbiased. How about mentioning the prefix sample to indicate this being the estimator? I don't understand. I'm following pretty exactly the calculations stated at <http://www.johndcook.com/blog/standard_deviation/> I'm not a statistician. Perhaps others who are more literate in statistics can comment on this paragraph. I'm largely in the same boat as Andrew but... I take it that Arne is referring to: http://en.wikipedia.org/wiki/Bessel's_correction but the mere presence of an (n-1) divisor does not mean that is what is happening. In this particular situation I believe the (n-1) simply is a necessary part of the recurrence formula and not any attempt to correct for sampling bias when estimating a population's variance. In fact, as far as the database knows, the values provided to this function do represent an entire population and such a correction would be unnecessary. I guess it boils down to whether future queries are considered part of the population or whether the population changes upon each query being run and thus we are calculating the ever-changing population variance. Note point 3 in the linked Wikipedia article. Thanks. Still not quite sure what to do, though :-) I guess in the end we want the answer to come up with similar results to the builtin stddev SQL function. I'll try to set up a test program, to see if we do. cheers andrew -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
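The distinction being debated (Bessel's correction) can be made precise. With x-bar the mean of the n observed values:

```latex
% Biased (population) estimator vs. Bessel-corrected (sample) estimator:
\hat{\sigma}^2 = \frac{1}{n} \sum_{i=1}^{n} (x_i - \bar{x})^2, \qquad
s^2 = \frac{1}{n-1} \sum_{i=1}^{n} (x_i - \bar{x})^2
    = \frac{n}{n-1}\,\hat{\sigma}^2
```

If the recorded executions are treated as the entire population, the 1/n form is the exact variance and no correction is needed; the n-1 divisor only serves to unbias the estimate when the observations are a sample of a larger population — which is precisely the question David raises about whether future queries count as part of the population.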
Re: [HACKERS] B-Tree support function number 3 (strxfrm() optimization)
On Tue, Jan 20, 2015 at 3:57 PM, Peter Geoghegan p...@heroku.com wrote: It's certainly possible to fix Andrew's test case with the attached. I'm not sure that that's the appropriate fix, though: there is probably a case to be made for not bothering with abbreviation once we've read tuples in for the final merge run. More likely, the strongest case is for storing the abbreviated keys on disk too, and reading those back. Maybe not, though: An extra 8 bytes per tuple on disk is not free. OTOH, if we're I/O bound on the final merge, as we ought to be, then recomputing the abbreviated keys could make sense, since there may well be an idle CPU core anyway. -- Peter Geoghegan -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
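To make the failure mode concrete, here is a toy model of abbreviated keys: pack a string's first eight bytes into a big-endian integer so that unsigned comparison agrees with memcmp() on that prefix. This is purely illustrative (PostgreSQL's text opclass derives its abbreviated keys from strxfrm() blobs, not raw bytes); the point it demonstrates is that abbreviated-key comparisons are only valid while *both* sides are abbreviated, which is exactly the invariant the spill-to-disk path violated.

```c
#include <stdint.h>
#include <string.h>

/* Toy abbreviated key: pack the first 8 bytes of a string into a
 * big-endian uint64, so that unsigned integer comparison agrees with
 * memcmp() on the prefix.  Illustrative only -- not PostgreSQL's
 * actual text abbreviation code. */
uint64_t abbrev_key(const char *s)
{
    uint64_t k = 0;
    size_t len = strlen(s);

    for (int i = 0; i < 8; i++)
    {
        k <<= 8;
        if ((size_t) i < len)
            k |= (unsigned char) s[i];
    }
    return k;
}

/* Comparator on abbreviated keys: a correct proxy for memcmp() on the
 * first 8 bytes, as long as BOTH sides are abbreviated. */
int cmp_abbrev(uint64_t a, uint64_t b)
{
    return (a > b) - (a < b);
}
```

In the broken case, a SortTuple read back from tape carried a raw (un-abbreviated) value in datum1 while in-memory tuples carried abbreviated keys, so the comparator ended up comparing unrelated quantities — hence the shuffled output in Andrew's test case.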
Re: [HACKERS] New CF app deployment
On Tue, Jan 20, 2015 at 3:43 PM, Tom Lane t...@sss.pgh.pa.us wrote: Michael Paquier michael.paqu...@gmail.com writes: On Tue, Jan 20, 2015 at 4:31 PM, Andres Freund and...@2ndquadrant.com wrote: I think you misunderstood me ;). I was talking about the old CF application providing a RSS feed of all changes to all entries. https://commitfest-old.postgresql.org/action/commitfest_activity.rss Oh, I didn't know this one. That's indeed useful. Personally, I never used the RSS feed as such, but I did often consult the activity log pages, which I think are equivalent: https://commitfest-old.postgresql.org/action/commitfest_activity?id=25 That feature seems to be gone too :-( Activity log is back, clickable from the top right corner when viewing the frontpage or a commitfest. RSS feed is there as well. -- Magnus Hagander Me: http://www.hagander.net/ Work: http://www.redpill-linpro.com/
Re: [HACKERS] B-Tree support function number 3 (strxfrm() optimization)
On Tue, Jan 20, 2015 at 10:54 AM, Robert Haas robertmh...@gmail.com wrote: On Mon, Jan 19, 2015 at 9:29 PM, Peter Geoghegan p...@heroku.com wrote: I think that the attached patch should at least fix that much. Maybe the problem on the other animal is also explained by the lack of this, since there could also be a MinGW-ish strxfrm_l(), I suppose. Committed that, rather blindly, since it looks like a reasonable fix. Peter, this made bowerbird (Windows 8/Visual Studio) build, but it's failing make check. Ditto hamerkop (Windows 2k8/VC++) and currawong (Windows XP Pro/MSVC++). jacana (Windows 8/gcc) and brolga (Windows XP Pro/cygwin) are unhappy too, although the failures are showing up in different build stages rather than in 'make check'. narwhal (Windows 2k3/mingw) and frogmouth (Windows XP Pro/gcc) are happy, though, so it's not affecting ALL of the Windows critters. Still, I'm leaning toward the view that we should disable this optimization across-the-board on Windows until somebody has time to do the legwork to figure out what it takes to make it work, and what makes it work on some of these critters and fail on others. We can't leave the buildfarm red for long periods of time. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Add min and max execute statement time in pg_stat_statement
Andrew Dunstan wrote On 01/20/2015 01:26 PM, Arne Scheffer wrote: And a very minor aspect: The term standard deviation in your code stands for (corrected) sample standard deviation, I think, because you divide by n-1 instead of n to keep the estimator unbiased. How about mentioning the prefix sample to indicate this being the estimator? I don't understand. I'm following pretty exactly the calculations stated at <http://www.johndcook.com/blog/standard_deviation/> I'm not a statistician. Perhaps others who are more literate in statistics can comment on this paragraph. I'm largely in the same boat as Andrew but... I take it that Arne is referring to: http://en.wikipedia.org/wiki/Bessel's_correction but the mere presence of an (n-1) divisor does not mean that is what is happening. In this particular situation I believe the (n-1) simply is a necessary part of the recurrence formula and not any attempt to correct for sampling bias when estimating a population's variance. In fact, as far as the database knows, the values provided to this function do represent an entire population and such a correction would be unnecessary. I guess it boils down to whether future queries are considered part of the population or whether the population changes upon each query being run and thus we are calculating the ever-changing population variance. Note point 3 in the linked Wikipedia article. David J. -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] B-Tree support function number 3 (strxfrm() optimization)
On Tue, Jan 20, 2015 at 5:32 PM, Robert Haas robertmh...@gmail.com wrote: I was assuming we were going to fix this by undoing the abbreviation (as in the abort case) when we spill to disk, and not bothering with it thereafter. The spill-to-disk case is at least as compelling as the internal sort case. The overhead of comparisons is much higher for tapesort. Attached patch serializes keys. On reflection, I'm inclined to go with this approach. Even if the CPU overhead of reconstructing strxfrm() blobs is acceptable for text, it might be much more expensive for other types. I'm loath to throw away those abbreviated keys unnecessarily. We don't have to worry about having aborted abbreviation, since once we spill to disk we've effectively committed to abbreviation. This patch formalizes the idea that a pass-by-value representation is strictly required for such cases (but not that the original Datums must be of a pass-by-reference type, which is another thing entirely). I've tested it some, obviously with Andrew's testcase and the regression tests, but also with my B-Tree verification tool. Please review it. Sorry about this. -- Peter Geoghegan

diff --git a/src/backend/utils/sort/tuplesort.c b/src/backend/utils/sort/tuplesort.c
index 6d3aa88..a442617 100644
--- a/src/backend/utils/sort/tuplesort.c
+++ b/src/backend/utils/sort/tuplesort.c
@@ -413,10 +413,13 @@ struct Tuplesortstate
  *
  * If state->randomAccess is true, then the stored representation of the
  * tuple must be followed by another unsigned int that is a copy of the
- * length --- so the total tape space used is actually sizeof(unsigned int)
- * more than the stored length value.  This allows read-backwards.  When
- * randomAccess is not true, the write/read routines may omit the extra
- * length word.
+ * length (less any abbreviated key that is stored, which is also possible) ---
+ * so the total tape space used is actually sizeof(unsigned int) more than the
+ * stored length value, as well as another sizeof(Datum) overhead for storing
+ * abbreviated keys.  This allows read-backwards, and avoids the need to
+ * re-compute abbreviated keys.  When randomAccess is not true, the write/read
+ * routines may omit the extra length word.  Also, when abbreviation is not in
+ * play, abbreviated keys are not stored.
  *
  * writetup is expected to write both length words as well as the tuple
  * data.  When readtup is called, the tape is positioned just after the
@@ -3124,11 +3127,15 @@ writetup_heap(Tuplesortstate *state, int tapenum, SortTuple *stup)
 	char	   *tupbody = (char *) tuple + MINIMAL_TUPLE_DATA_OFFSET;
 	unsigned int tupbodylen = tuple->t_len - MINIMAL_TUPLE_DATA_OFFSET;
-	/* total on-disk footprint: */
+	/* total on-disk footprint for tuple: */
 	unsigned int tuplen = tupbodylen + sizeof(int);

 	LogicalTapeWrite(state->tapeset, tapenum,
 					 (void *) &tuplen, sizeof(tuplen));
+	/* Store abbreviated key, if any (not accounted for by tuplen) */
+	if (state->sortKeys->abbrev_converter)
+		LogicalTapeWrite(state->tapeset, tapenum,
+						 (void *) &stup->datum1, sizeof(Datum));
 	LogicalTapeWrite(state->tapeset, tapenum,
 					 (void *) tupbody, tupbodylen);
 	if (state->randomAccess)	/* need trailing length word? */
@@ -3150,6 +3157,10 @@ readtup_heap(Tuplesortstate *state, SortTuple *stup,
 	HeapTupleData htup;

 	USEMEM(state, GetMemoryChunkSpace(tuple));
+	/* read in the tuple abbreviated key, if any */
+	if (state->sortKeys->abbrev_converter)
+		LogicalTapeReadExact(state->tapeset, tapenum, (void *) &stup->datum1,
+							 sizeof(Datum));
 	/* read in the tuple proper */
 	tuple->t_len = tuplen;
 	LogicalTapeReadExact(state->tapeset, tapenum,
@@ -3161,10 +3172,12 @@ readtup_heap(Tuplesortstate *state, SortTuple *stup,
 	/* set up first-column key value */
 	htup.t_len = tuple->t_len + MINIMAL_TUPLE_OFFSET;
 	htup.t_data = (HeapTupleHeader) ((char *) tuple - MINIMAL_TUPLE_OFFSET);
-	stup->datum1 = heap_getattr(&htup,
-								state->sortKeys[0].ssup_attno,
-								state->tupDesc,
-								&stup->isnull1);
+	/* set up first-column key value (for non-abbreviated case) */
+	if (!state->sortKeys->abbrev_converter)
+		stup->datum1 = heap_getattr(&htup,
+									state->sortKeys[0].ssup_attno,
+									state->tupDesc,
+									&stup->isnull1);
 }

 /*
@@ -3355,12 +3368,20 @@ writetup_cluster(Tuplesortstate *state, int tapenum, SortTuple *stup)
 {
 	HeapTuple	tuple = (HeapTuple) stup->tuple;
 	unsigned int tuplen = tuple->t_len + sizeof(ItemPointerData) + sizeof(int);
+	AttrNumber	leading = state->indexInfo->ii_KeyAttrNumbers[0];

-	/* We need to store t_self, but not other fields of HeapTupleData */
+	/*
+	 * We need to store t_self, but not other fields of HeapTupleData (although
+	 * possibly abbreviated key value)
+	 */
 	LogicalTapeWrite(state->tapeset, tapenum,
 					 &tuplen, sizeof(tuplen));
 	LogicalTapeWrite(state->tapeset, tapenum,
 					 &tuple->t_self, sizeof(ItemPointerData));
+	/* Store abbreviated key, if any (not accounted for by tuplen) */
+	if
Re: [HACKERS] pgaudit - an auditing extension for PostgreSQL
* Robert Haas (robertmh...@gmail.com) wrote: On Tue, Jan 20, 2015 at 1:05 AM, Abhijit Menon-Sen a...@2ndquadrant.com wrote: So when I'm trying to decide what to audit, I have to:

(a) check if the current user is mentioned in .roles; if so, audit.
(b) check if the current user is a descendant of one of the roles mentioned in .roles; if not, no audit.
(c) check for permissions granted to the root role and see if that tells us to audit.

Is that right? If so, it would work fine. I don't look forward to trying to explain it to people, but yes, it would work (for anything you could grant permissions for). I think this points to a fundamental weakness in the idea of doing this through the GRANT system. Some people are going to want to audit everything a particular user does, some people are going to want to audit all access to particular objects, and some people will have more complicated requirements. Some people will want to audit even super-users, others especially super-users, others only non super-users. None of this necessarily matches up to the usual permissions framework. I'm hopeful that my email from a few minutes ago provides a way to cover all of the above, while still making it easy for users who just want to say audit everything this user does or audit everything which touches this column. Any auditing of superusers is going to be circumventable by those same superusers if it's happening in the context of the PG process, so I'm not sure why they would be any different under this scheme vs. some other scheme. Don't get me wrong- I agree completely that using the GRANT system isn't ideal, but I don't see anyone proposing another way for an extension to store metadata on catalog tables with the same granularity the GRANT system provides and its dependency handling and ability to have default values. We've been over that ground a number of times and I certainly don't feel like we're any closer to solving it and I'd hate to see that block pgaudit. 
Put another way, if the requirement for pgaudit is that it has to solve the metadata-on-catalogs-for-extensions problem, I think I'd rather just move pgaudit into core instead, give it catalog tables, make it part of the dependency system, provide syntax for it, etc. I'm pretty sure that'd be easier and happen a lot faster. Thanks! Stephen
Re: [HACKERS] B-Tree support function number 3 (strxfrm() optimization)
On Tue, Jan 20, 2015 at 9:33 PM, Peter Geoghegan p...@heroku.com wrote: On Tue, Jan 20, 2015 at 6:30 PM, Robert Haas robertmh...@gmail.com wrote: I don't want to change the on-disk format for tapes without a lot more discussion. Can you come up with a fix that avoids that for now? A more conservative approach would be to perform conversion on-the-fly once more. That wouldn't be patently unacceptable from a performance perspective, and would also not change the on-disk format for tapes. Thoughts? That might be OK. Probably needs a bit of performance testing to see how it looks. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] pgaudit - an auditing extension for PostgreSQL
On Tue, Jan 20, 2015 at 5:03 PM, Jim Nasby jim.na...@bluetreble.com wrote: +1. In particular I'm very concerned with the idea of doing this via roles, because that would make it trivial for any superuser to disable auditing. The only good option I could see to provide this kind of flexibility would be allowing the user to provide a function that accepts role, object, etc and make return a boolean. The performance of that would presumably suck with anything but a C function, but we could provide some C functions to handle simple cases. That said, I think the best idea at this stage is either log everything or nothing. We can always expand upon that later. I think this is throwing the baby out with the bathwater. Neither C functions nor all-or-nothing are going to be of any practical use. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] B-Tree support function number 3 (strxfrm() optimization)
On Tue, Jan 20, 2015 at 8:39 PM, Peter Geoghegan p...@heroku.com wrote: On Tue, Jan 20, 2015 at 5:32 PM, Robert Haas robertmh...@gmail.com wrote: I was assuming we were going to fix this by undoing the abbreviation (as in the abort case) when we spill to disk, and not bothering with it thereafter. The spill-to-disk case is at least as compelling at the internal sort case. The overhead of comparisons is much higher for tapesort. First, we need to unbreak this. Then, we can look at optimizing it. The latter task will require performance testing. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Merging postgresql.conf and postgresql.auto.conf
On Tue, Jan 20, 2015 at 4:07 PM, David Johnston david.g.johns...@gmail.com wrote: sourceline and sourcefile pertain only to the current value while the point of adding these other pieces is to provide a snapshot of all the different mappings that the system knows about; instead of having to tell a user to go look in two different files (and associated includes) and a database catalog to find out what possible values are in place. That doesn't solve the need to scan the catalog to see other possible values - though you could at least put a counter in pg_settings that indicates how many pg_db_role_setting entries reference the given variable so that if non-zero the user is clued into the fact that they need to check out said catalog table. This last proposal seems pointless to me. If the user knows about pg_db_role_setting, they will know to check it; if they don't, a counter won't fix that. I can see that there might be some utility to a query that would tell you, for a given setting, all sources of that setting the system knew about, whether in configuration files, pg_db_role_setting, or the current session. But I don't see that putting information that's already available via one system catalog query into a different system catalog query helps anything - we should presume DBAs know how to write SQL. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] WITH CHECK and Column-Level Privileges
* Noah Misch (n...@leadboat.com) wrote: On Mon, Jan 19, 2015 at 11:05:22AM -0500, Stephen Frost wrote: One remaining question is about single-column key violations. Should we special-case those and allow them to be shown or no? I can't see a reason not to currently but I wonder if we might have cause to act differently in the future (not that I can think of a reason we'd ever need to). Keep them hidden. Attempting to delete a PK row should not reveal otherwise-inaccessible values involved in any constraint violation. It's tempting to make an exception for single-column UNIQUE constraints, but some of the ensuing data disclosures are questionable. What if the violation arose from a column default expression or from index corruption? Interesting point. I've kept them hidden in this latest version. On Mon, Jan 19, 2015 at 11:46:35AM -0500, Stephen Frost wrote: Right, ExecBuildSlotValueDescription() was made to be consistent with BuildIndexValueDescription. The reason for BuildIndexValueDescription returning an empty string is different from why you hypothesize though- it's exported and I was a bit worried that existing external callers might not be prepared for it to return a NULL instead of a string of some kind. If this was a green field instead of a back-patch fix, I'd certainly return NULL instead. If others feel that's not a good reason to avoid returning NULL, I can certainly change it around. I won't lose sleep if it does return for that reason, but I'm relatively unconcerned about extension API compatibility here. That function is called from datatype-independent index uniqueness checks. This mailing list has discussed the difficulties of implementing an index access method in an extension, and no such extension has come forward. Alright, I've reworked this to have those functions return NULL instead. Your latest patch has whitespace errors; visit git diff --check. 
Yeah, I had done that but then made a few additional tweaks and didn't recheck, sorry about that. I'm playing around w/ my git workflow a bit and hopefully it won't happen again. :) Updated patch attached. Thanks! Stephen From d9f0e186a74e4d11e9c7f3181dbf98bae3224111 Mon Sep 17 00:00:00 2001 From: Stephen Frost sfr...@snowman.net Date: Mon, 12 Jan 2015 17:04:11 -0500 Subject: [PATCH] Fix column-privilege leak in error-message paths While building error messages to return to the user, BuildIndexValueDescription, ExecBuildSlotValueDescription and ri_ReportViolation would happily include the entire key or entire row in the result returned to the user, even if the user didn't have access to view all of the columns being included. Instead, include only those columns which the user is providing or which the user has select rights on. If the user does not have any rights to view the table or any of the columns involved then no detail is provided and a NULL value is returned from BuildIndexValueDescription and ExecBuildSlotValueDescription. Note that, for key cases, the user must have access to all of the columns for the key to be shown; a partial key will not be returned. Back-patch all the way, as column-level privileges are now in all supported versions. This has been assigned CVE-2014-8161, but since the issue and the patch have already been publicized on pgsql-hackers, there's no point in trying to hide this commit. 
---
 src/backend/access/index/genam.c         |  59 ++-
 src/backend/access/nbtree/nbtinsert.c    |  12 ++-
 src/backend/commands/copy.c              |   6 +-
 src/backend/commands/matview.c           |   7 ++
 src/backend/executor/execMain.c          | 171 ---
 src/backend/executor/execUtils.c         |  12 ++-
 src/backend/utils/adt/ri_triggers.c      |  78 ++
 src/backend/utils/sort/tuplesort.c       |   9 +-
 src/test/regress/expected/privileges.out |  31 ++
 src/test/regress/sql/privileges.sql      |  24 +
 10 files changed, 338 insertions(+), 71 deletions(-)

diff --git a/src/backend/access/index/genam.c b/src/backend/access/index/genam.c
index 850008b..69cbbd5 100644
--- a/src/backend/access/index/genam.c
+++ b/src/backend/access/index/genam.c
@@ -25,10 +25,12 @@
 #include "lib/stringinfo.h"
 #include "miscadmin.h"
 #include "storage/bufmgr.h"
+#include "utils/acl.h"
 #include "utils/builtins.h"
 #include "utils/lsyscache.h"
 #include "utils/rel.h"
 #include "utils/snapmgr.h"
+#include "utils/syscache.h"
 #include "utils/tqual.h"

@@ -154,6 +156,9 @@ IndexScanEnd(IndexScanDesc scan)
  * form "(key_name, ...)=(key_value, ...)".  This is currently used
  * for building unique-constraint and exclusion-constraint error messages.
  *
+ * Note that if the user does not have permissions to view the columns
+ * involved, a NULL is returned.
+ *
  * The passed-in values/nulls arrays are the raw input to the index AM,
 * e.g. results of FormIndexDatum --- this is not necessarily what is stored
 * in
Re: [HACKERS] B-Tree support function number 3 (strxfrm() optimization)
On Tue, Jan 20, 2015 at 6:30 PM, Robert Haas robertmh...@gmail.com wrote: I don't want to change the on-disk format for tapes without a lot more discussion. Can you come up with a fix that avoids that for now? A more conservative approach would be to perform conversion on-the-fly once more. That wouldn't be patently unacceptable from a performance perspective, and would also not change the on-disk format for tapes. Thoughts? -- Peter Geoghegan -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
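A sketch of what that conservative approach could look like, as a toy model rather than actual tuplesort.c code (the names `tape_read` and `recompute_abbrev` are hypothetical): each value read back from tape has its abbreviated key recomputed on the fly, so in-memory comparisons never mix abbreviated and raw datums.

```c
#include <stdint.h>
#include <string.h>

/* Toy model of "conversion on-the-fly": when a spilled value is read
 * back without its abbreviated key, recompute the abbreviation
 * immediately so that subsequent comparisons remain valid.  A concept
 * sketch only -- these names are not tuplesort.c's API. */

typedef struct
{
    const char *tuple;   /* the un-abbreviated value read from "tape" */
    uint64_t    datum1;  /* abbreviated key, recomputed on read */
} SortTupleToy;

/* Same toy abbreviation as before: big-endian 8-byte prefix. */
static uint64_t recompute_abbrev(const char *s)
{
    uint64_t k = 0;
    int len = (int) strlen(s);

    for (int i = 0; i < 8; i++)
        k = (k << 8) | (i < len ? (unsigned char) s[i] : 0);
    return k;
}

/* Read one value back from a simulated tape, re-abbreviating as we go. */
SortTupleToy tape_read(const char *spilled_value)
{
    SortTupleToy st;

    st.tuple = spilled_value;
    st.datum1 = recompute_abbrev(spilled_value);  /* the on-the-fly step */
    return st;
}
```

The cost is one extra conversion per tuple read back from tape, which is the CPU-vs-disk-format trade-off being weighed against serializing the keys.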
Re: [HACKERS] pgaudit - an auditing extension for PostgreSQL
Abhijit, * Abhijit Menon-Sen (a...@2ndquadrant.com) wrote: At 2015-01-19 08:26:59 -0500, sfr...@snowman.net wrote: I'm confused by this statement.. Let me see if I've understood your clarification: Thanks much for the example use-case and for working this through with me. I actually think I've come up with a further specification which might allow us to make this extremely flexible, but also simple for those who want to keep it simple. Consider this: Everything is single-level to the roles mentioned in the GUC. Is the logged in role a member of one of the GUC roles? Yes - Audit Now to cover the user X for table Y case: Did any of the GUC value roles grant SELECT rights for this table to the current role? Yes - Audit SELECT on the table by the current role Did any of the GUC value roles grant INSERT rights for this table to the current role? Yes - Audit INSERT on the table by the current role etc. For the 'log all access to an object' case, under this scheme, I'm afraid we'd need some special role to GRANT the access to. We wouldn't want that to simply be 'public' since then it might actually be granted access rights that we don't want to. We can't simply use the same role because you need to grant that role whatever access 'with grant option' in order for it to be able to re-grant the privilege. With the special role, it becomes: Does the special role have SELECT rights on the table? Yes - Audit SELECTs on the table Does the special role have INSERT rights on the table? Yes - Audit INSERTs on the table Suppose you have pgaudit.roles set to 'r0, r1', and that you have granted r0 to u0. Not quite- I wasn't thinking you'd grant r0 to u0 but rather the other way around- u0 is granted to r0. If you granted r0 to u0, then u0 would have all of r0's rights which could be quite a bit larger than you want u0 to have. It only works in the other direction. Now, if you're connected to the database as r0 or r1, then everything you do is logged. 
But if you're connected as u0, then only those things are logged that can be derived (in a manner discussed later) from the permissions explicitly granted to r0 (but not u0)? So when I'm trying to decide what to audit, I have to: (a) check if the current user is mentioned in .roles; if so, audit. (b) check if the current user is a descendant of one of the roles mentioned in .roles; if not, no audit. (c) check for permissions granted to the root role and see if that tells us to audit. Is that right? If so, it would work fine. I don't look forward to trying to explain it to people, but yes, it would work (for anything you could grant permissions for). This is pretty close- (a) and (b) are mostly correct, though I would strongly discourage users from actually logging in as an audit role. The one caveat with (b) is that 'if not, no audit' is not correct- all cases are essentially OR'd together when it comes to auditing. Roles can be audited even if they are not descendants of the roles mentioned in .roles. Review the opening of this email though and consider that we could look at what privileges has the audit role granted to the current role? as defining what is to be audited. You can't say that you want r1 to have X actions logged, with r2 having Y actions logged. Right. But how do you do that for non-GRANT-able actions in your scheme? For example, what if I want to see all the tables created and dropped by a particular user? I hadn't been intending to address that with this scheme, but I think we have that by looking for privilege grants where the audit role is the grantee and the role-to-be-audited the grantor. Have you considered splitting pgaudit.c into multiple files, perhaps along the pre/post deparse lines? If such a split were done properly, then could the backwards-compatible version be more acceptable for inclusion in contrib in 9.5? If so, I'll look into it. As Robert says, the short answer is 'no'- but it might make it easier to get the 9.5 bits into 9.5.. 
:) The key part above is any. We're using the grant system as a proxy for saying I want to audit column X. That's different from I want to audit commands which would be allowed if I *only* had access to column X. If I want to audit access to column X, then: select A, B, X from mytable; Should be audited, even if the audit role doesn't have access to columns A or B. Yes, that's what the code already does (modulo handling of the audit role's oid, which I tweaked to match the behaviour described earlier in this mail). I did the following: create table x(a int,b int,c int); insert into x(a,b,c) values (1,2,3); create user y; grant select on x to y; create role x; grant select (a) on table x to x; grant x to y; Then, as user y, I did «select a,b,c from x» and «select b,c from x». Only the former was logged: LOG: AUDIT,2015-01-20
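Abhijit's three-step decision procedure, quoted above, can be sketched as a small model. This is a hypothetical Python illustration of the proposal as he summarized it (before Stephen's caveat that the cases are effectively OR'd together), not pgaudit's actual code; `is_descendant` and `audits_via_grants` stand in for the catalog lookups:

```python
def should_audit(user, guc_roles, is_descendant, audits_via_grants):
    # (a) the connected role is itself listed in pgaudit.roles -> audit everything
    if user in guc_roles:
        return True
    # (b) otherwise look for a listed audit role the user is related to, and
    # (c) consult the permissions attached to that role to decide whether
    #     this particular action is audited
    return any(is_descendant(user, r) and audits_via_grants(r, user)
               for r in guc_roles)
```

The sketch makes the flat, single-level nature of the scheme visible: there is no per-action configuration beyond what the grant system can express, which is exactly the limitation the thread goes on to debate.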
Re: [HACKERS] B-Tree support function number 3 (strxfrm() optimization)
On Tue, Jan 20, 2015 at 7:07 PM, Peter Geoghegan p...@heroku.com wrote: On Tue, Jan 20, 2015 at 3:57 PM, Peter Geoghegan p...@heroku.com wrote: It's certainly possible to fix Andrew's test case with the attached. I'm not sure that that's the appropriate fix, though: there is probably a case to be made for not bothering with abbreviation once we've read tuples in for the final merge run. More likely, the strongest case is for storing the abbreviated keys on disk too, and reading those back. Maybe not, though: An extra 8 bytes per tuple on disk is not free. OTOH, if we're I/O bound on the final merge, as we ought to be, then recomputing the abbreviated keys could make sense, since there may well be an idle CPU core anyway. I was assuming we were going to fix this by undoing the abbreviation (as in the abort case) when we spill to disk, and not bothering with it thereafter. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Add min and max execute statement time in pg_stat_statement
Andrew Dunstan schrieb am 2015-01-20: On 01/20/2015 01:26 PM, Arne Scheffer wrote: Interesting patch. I did a quick review looking only into the patch file. The sum of variances variable contains the sum of squared differences instead, I think. Umm, no. It's not. Umm, yes, I think, it is ;-) e->counters.sum_var_time += (total_time - old_mean) * (total_time - e->counters.mean_time); This is not a square that's being added. That's correct. Nevertheless it's the difference between the computed sum of squared differences and the preceding one, added in every step. old_mean is not the same as e->counters.mean_time. Since the variance is this value divided by (n - 1), AIUI, I think sum of variances isn't a bad description. I'm open to alternative suggestions. And a very minor aspect: The term standard deviation in your code stands for the (corrected) sample standard deviation, I think, because you divide by n-1 instead of n to keep the estimator unbiased. How about mentioning the prefix sample to indicate this being the estimator? I don't understand. I'm following pretty exactly the calculations stated at http://www.johndcook.com/blog/standard_deviation/ (There is nothing bad about those calculations; Welford's algorithm simply adds the differences mentioned above sequentially.) VlG-Arne I'm not a statistician. Perhaps others who are more literate in statistics can comment on this paragraph. And I'm sure I'm missing C specifics (again) (or it's the reduced patch file scope), but you introduce sqrtd, but sqrt is called? Good catch. Will fix. cheers andrew
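The update being debated is Welford's online algorithm: the accumulator holds the running sum of squared deviations from the mean, even though no square appears in the increment. A minimal Python stand-in for the C code (names are illustrative, not the patch's):

```python
def welford(values):
    # mirrors the patch's update: sum_var_time += (x - old_mean) * (x - new_mean)
    n, mean, sum_sq_diff = 0, 0.0, 0.0
    for x in values:
        n += 1
        old_mean = mean
        mean += (x - old_mean) / n                  # new running mean
        sum_sq_diff += (x - old_mean) * (x - mean)  # no square, yet sums squares
    # sample variance divides by (n - 1); its square root is the
    # (corrected) sample standard deviation Arne refers to
    sample_var = sum_sq_diff / (n - 1) if n > 1 else 0.0
    return mean, sum_sq_diff, sample_var
```

The increment multiplies the deviation from the old mean by the deviation from the new mean, and that product telescopes to the sum of squared deviations — which is why both "sum of squared differences" and "sum of variances" describe the same accumulator.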
Re: [HACKERS] pgaudit - an auditing extension for PostgreSQL
Jim, * Jim Nasby (jim.na...@bluetreble.com) wrote: +1. In particular I'm very concerned with the idea of doing this via roles, because that would make it trivial for any superuser to disable auditing. The only good option I could see to provide this kind of flexibility would be allowing the user to provide a function that accepts role, object, etc and make return a boolean. The performance of that would presumably suck with anything but a C function, but we could provide some C functions to handle simple cases. Superusers will be able to bypass, trivially, anything that's done in the process space of PG. The only possible exception to that being an SELinux or similar solution, but I don't think that's what you were getting at. I certainly don't think having the user provide a C function to specify what should be audited as making any sense- if they can do that, they can use the same hooks pgaudit is using and skip the middle-man. As for the performance concern you raise, I actually don't buy into it at all. It's not like we worry about the performance of checking permissions on objects in general and, for my part, I like to think that's because it's pretty darn quick already. That said, I think the best idea at this stage is either log everything or nothing. We can always expand upon that later. We've already got those options and they are, clearly, insufficient for a large number of our users. Thanks, Stephen
Re: [HACKERS] Parallel Seq Scan
On Tue, Jan 20, 2015 at 10:59 PM, Thom Brown t...@linux.com wrote: On 20 January 2015 at 14:29, Amit Kapila amit.kapil...@gmail.com wrote: Note - I have yet to handle the new node types introduced at some of the places and need to verify prepared queries and some other things, however I think it will be good if I can get some feedback at current stage. I'm getting an issue: # set parallel_seqscan_degree = 10; SET Time: 0.219 ms # explain analyse select c1 from t1; So setting parallel_seqscan_degree above max_worker_processes causes the CPU to max out, and the query never returns, or at least not after waiting 2 minutes. Shouldn't it have a ceiling of max_worker_processes? Yes, it should behave that way, but this is not handled in the patch, as we still have to decide on the best execution strategy (block-by-block or fixed chunks for different workers), and based on that I can handle this scenario in the patch. I could return an error for such a scenario or do some work to handle it seamlessly, but it seems to me that I would have to rework the same if we select a different execution approach than the one used in the patch, so I am waiting for that to get decided. I am planning to work on getting the performance data for both approaches, so that we can decide which is the better way to go ahead. The original test I performed where I was getting OOM errors now appears to be fine: Thanks for confirming the same. With Regards, Amit Kapila. EnterpriseDB: http://www.enterprisedb.com
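The two candidate strategies Amit mentions — fixed chunks versus block-by-block — can be sketched as follows. This is an illustrative Python model only; the patch's actual distribution logic was still undecided at this point:

```python
def fixed_chunks(nblocks, nworkers):
    # each worker scans one contiguous range (compare the
    # "Number of Blocks Per Worker" line in the EXPLAIN output)
    per = -(-nblocks // nworkers)  # ceiling division
    return [list(range(i * per, min((i + 1) * per, nblocks)))
            for i in range(nworkers)]

def block_by_block(nblocks, nworkers):
    # round-robin: worker i takes blocks i, i + nworkers, i + 2*nworkers, ...
    return [list(range(i, nblocks, nworkers)) for i in range(nworkers)]
```

Either way every block is scanned exactly once; the trade-off is longer sequential I/O runs per worker (fixed chunks) versus better load balancing when some blocks are more expensive than others (block-by-block).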
Re: [HACKERS] Merging postgresql.conf and postgresql.auto.conf
On Tue, Jan 20, 2015 at 8:46 PM, Robert Haas robertmh...@gmail.com wrote: On Tue, Jan 20, 2015 at 4:07 PM, David Johnston david.g.johns...@gmail.com wrote: sourceline and sourcefile pertain only to the current value while the point of adding these other pieces is to provide a snapshot of all the different mappings that the system knows about; instead of having to tell a user to go look in two different files (and associated includes) and a database catalog to find out what possible values are in place. That doesn't solve the need to scan the catalog to see other possible values - though you could at least put a counter in pg_settings that indicates how many pg_db_role_setting entries reference the given variable so that if non-zero the user is clued into the fact that they need to check out said catalog table. This last proposal seems pointless to me. If the user knows about pg_db_role_setting, they will know to check it; if they don't, a counter won't fix that. I can see that there might be some utility to a query that would tell you, for a given setting, all sources of that setting the system knew about, whether in configuration files, pg_db_role_setting, or the current session. But I don't see that putting information that's already available via one system catalog query into a different system catalog query helps anything - we should presume DBAs know how to write SQL. While this is not exactly a high-traffic catalog/view why have them write a likely 4-way join query (pg_db_role_setting is not the friendly in its current form), or even two separate queries, when we can give them a solid and comprehensive view standard. I guess part of the mixup is that I am basically talking about a view of multiple catalogs as a DBA UI as opposed to really even caring what specific catalogs (existing or otherwise) or functions are needed to make the whole thing work. 
Maybe it does all fit directly on pg_settings but tacking on some read-only columns to this updateable view/table doesn't come across as something that should be forbidden in general. Maybe I am imagining a use-case that just isn't there but if there are two separate queries needed, and we call one consolidated, then having that query indicate whether the other query has useful information, and it is quite likely that it will not, avoids the user expending the effort to run the wasteful secondary query. David J.
Re: [HACKERS] Merging postgresql.conf and postgresql.auto.conf
On Tue, Jan 20, 2015 at 9:38 PM, Robert Haas robertmh...@gmail.com wrote: On Mon, Jan 19, 2015 at 10:39 PM, Amit Kapila amit.kapil...@gmail.com wrote: I think whichever process reads postgresql.conf/postgresql.auto.conf have to do this (unless we restrict that this will be done at some other time) and postmaster is one of them. It seems to me that it is not good idea unless we do it at other time like populating entries for a view. Well, the postmaster does not have database access, and neither do most of the auxiliary processes. The autovacuum launcher has very limited database access (shared catalogs only). Making the postmaster access the database is a complete non-starter; Okay and I was also not in favour of this approach. Right, but we can't completely eliminate such a possibility (as an example we have some default settings like max_connections, shared_buffers, etc). I agree with you that users should use only way of changing the settings, however for certain rare cases (default settings or some other) we can provide a way for user to know which of the settings are duplicate. I think if we can provide such an information via some view with ease then it's worth to have it. I'd suggest abandoning the idea of storing anything in a persistent basis in a system catalog and look for some way for the backend to which the user is connected to expose its own state instead. Agreed. For example, pg_settings already exposes sourcefile and sourceline settings. Actually, I'm not quite sure why that's not sufficient here, but if it isn't, it could be extended. The reason why sourcefile and sourceline are not sufficient is that they can only give the information about the setting in last file it is present. 
Assume max_connections (or any other setting) is available in both postgresql.conf and postgresql.auto.conf, then it will display the information about the setting in postgresql.auto.conf, so now user might not be able to decide whether that is the setting he want to retain unless he knows the information about setting in postgresql.conf. Now as I have suggested upthread, that we can have a new view pg_file_settings which will display information about settings even when there exists multiple entries for the same in different files. I think adding such information to existing view pg_settings would be difficult as the same code is used for show commands which we don't want to change. With Regards, Amit Kapila. EnterpriseDB: http://www.enterprisedb.com
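The pg_file_settings view Amit proposes would surface every occurrence of a setting across the configuration files, not just the last one that sourcefile/sourceline can report. A rough Python model of that scan (hypothetical and simplified; real GUC file parsing handles includes, quoting, and more):

```python
def file_settings(files):
    """files: (filename, text) pairs in the order the server reads them."""
    entries = []
    for fname, text in files:
        for lineno, raw in enumerate(text.splitlines(), start=1):
            line = raw.split('#', 1)[0].strip()   # drop comments
            if '=' in line:
                name, value = (p.strip() for p in line.split('=', 1))
                entries.append((name, value, fname, lineno))
    # every occurrence is kept; the last entry for a given name is the
    # one that actually takes effect
    return entries
```

Run against a postgresql.conf and postgresql.auto.conf that both set max_connections, this returns both rows, letting the DBA see the shadowed value as well as the effective one.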
Re: [HACKERS] pgaudit - an auditing extension for PostgreSQL
On 1/20/15 2:20 PM, Robert Haas wrote: On Tue, Jan 20, 2015 at 1:05 AM, Abhijit Menon-Sena...@2ndquadrant.com wrote: So when I'm trying to decide what to audit, I have to: (a) check if the current user is mentioned in .roles; if so, audit. (b) check if the current user is a descendant of one of the roles mentioned in .roles; if not, no audit. (c) check for permissions granted to the root role and see if that tells us to audit. Is that right? If so, it would work fine. I don't look forward to trying to explain it to people, but yes, it would work (for anything you could grant permissions for). I think this points to fundamental weakness in the idea of doing this through the GRANT system. Some people are going want to audit everything a particular user does, some people are going to want to audit all access to particular objects, and some people will have more complicated requirements. Some people will want to audit even super-users, others especially super-users, others only non super-users. None of this necessarily matches up to the usual permissions framework. +1. In particular I'm very concerned with the idea of doing this via roles, because that would make it trivial for any superuser to disable auditing. The only good option I could see to provide this kind of flexibility would be allowing the user to provide a function that accepts role, object, etc and make return a boolean. The performance of that would presumably suck with anything but a C function, but we could provide some C functions to handle simple cases. That said, I think the best idea at this stage is either log everything or nothing. We can always expand upon that later. -- Jim Nasby, Data Architect, Blue Treble Consulting Data in Trouble? Get it in Treble! http://BlueTreble.com
Re: [HACKERS] proposal: searching in array function - array_position
2015-01-20 21:32 GMT+01:00 Jim Nasby jim.na...@bluetreble.com: On 1/20/15 11:12 AM, Pavel Stehule wrote: I am sending an updated version - it allows a third optional argument that specifies where searching should start. With it, it is possible to call this function repeatedly. What happened to returning an array of offsets? I think that would be both easier to use than this version as well as performing better. I am still thinking about this idea. It needs a different function and I didn't start with this. Implementing an optional start parameter for array_offset is cheap - and I think it can be useful for some use cases. I see you dropped multi-dimension support, but I think that's fine. It can be implemented later. There are no barriers to later implementation. -- Jim Nasby, Data Architect, Blue Treble Consulting Data in Trouble? Get it in Treble! http://BlueTreble.com
Re: [HACKERS] B-Tree support function number 3 (strxfrm() optimization)
On Tue, Jan 20, 2015 at 6:39 PM, Peter Geoghegan p...@heroku.com wrote: On Tue, Jan 20, 2015 at 6:34 PM, Robert Haas robertmh...@gmail.com wrote: That might be OK. Probably needs a bit of performance testing to see how it looks. Well, we're still only doing it when we do our final merge. So that's only doubling the number of conversions required, which if we're blocked on I/O might not matter at all. You'll probably prefer the attached. This patch works by disabling abbreviation, but only after writing out runs, with the final merge left to go. That way, it doesn't matter when abbreviated keys are not read back from disk (or regenerated). I believe this bug was missed because it only occurs when there are multiple runs, and not in the common case where there is one big initial run that is found already sorted when we reach mergeruns(). -- Peter Geoghegan

diff --git a/src/backend/utils/sort/tuplesort.c b/src/backend/utils/sort/tuplesort.c
index 6d3aa88..b6e302f 100644
--- a/src/backend/utils/sort/tuplesort.c
+++ b/src/backend/utils/sort/tuplesort.c
@@ -2172,6 +2172,22 @@ mergeruns(Tuplesortstate *state)
 		return;
 	}
 
+	if (state->sortKeys->abbrev_converter)
+	{
+		/*
+		 * If there are multiple runs to be merged, when we go to read back
+		 * tuples from disk, abbreviated keys will not have been stored, and we
+		 * don't care to regenerate them. Disable abbreviation from this point
+		 * on.
+		 */
+		state->sortKeys->abbrev_converter = NULL;
+		state->sortKeys->comparator = state->sortKeys->abbrev_full_comparator;
+
+		/* Not strictly necessary, but be tidy */
+		state->sortKeys->abbrev_abort = NULL;
+		state->sortKeys->abbrev_full_comparator = NULL;
+	}
+
 	/* End of step D2: rewind all output tapes to prepare for merging */
 	for (tapenum = 0; tapenum < state->tapeRange; tapenum++)
 		LogicalTapeRewind(state->tapeset, tapenum, false);
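The comparator switch above matters because an abbreviated key is only a cheap first pass: equality of abbreviated keys proves nothing, so the full comparator is the tie-breaker. Once tuples come back from tape without their abbreviated keys, the two-level compare has nothing to work with. A toy Python model of the idea, with simple prefix truncation standing in for strxfrm()-style abbreviation (purely illustrative, not PostgreSQL's code):

```python
from functools import cmp_to_key

def full_cmp(a, b):
    # authoritative comparator over the full keys
    return (a > b) - (a < b)

def abbrev(s):
    # stand-in for a strxfrm()-based abbreviated key
    return s[:4]

def abbrev_cmp(a, b):
    # cheap comparison on abbreviated keys; authoritative only on inequality,
    # so a tie falls through to the full comparator
    c = full_cmp(abbrev(a), abbrev(b))
    return c if c != 0 else full_cmp(a, b)

words = ["abcdef", "abcdzz", "abczzz", "aaa", "zz"]
assert sorted(words, key=cmp_to_key(abbrev_cmp)) == sorted(words)
```

If tuples read back from disk carried no abbreviated key, abbrev() here would have nothing to consult — hence the choice in the thread between disabling abbreviation before the final merge (as the patch does) and serializing the abbreviated keys to tape.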
Re: [HACKERS] Async execution of postgres_fdw.
I'm trying to compare v5 and v6 on my laptop right now. Apparently my laptop is quite a bit faster than your machine because the tests complete in roughly 3.3 seconds. I added more data and didn't see anything other than noise. (Then again, the queries were dominated by the disk sort, so I should retry with larger work_mem). I'll try it again when I have more time to play with it. I suspect the benefits would be more clear over a network. Larger than default work_mem, yes, but I think one of the prime use cases for the fdw is more warehouse-style situations (PostgresXL-style use cases). In those cases, work_mem might reasonably be set to 1GB. Then even if you have 10KB rows you can fetch a million rows and still be using less than work_mem. A simpler change would be to vary it with respect to work_mem. Half-baked idea: I know it's the wrong time in the execution phase, but if you are using remote estimates for cost there should also be a row width estimate, which I believe is based on pg_statistic and its mean column width. It's actually a pity that there is no way to set fetch sizes based on "give me as many tuples as will fit in less than x amount of memory", because that is almost always exactly what you want. Even when writing application code, I've never actually wanted precisely 10,000 rows; I've always wanted a reasonably sized chunk that could fit into memory and then backed my way into how many rows I wanted. If we were to extend FETCH to support syntax like: FETCH FORWARD '10MB' FROM ...; then we would eliminate the need to estimate the value on the fly. The async stuff, however, is a huge improvement over the last time I played with the fdw. The two active postgres processes were easily consuming a core and a half of CPU. I think it's not worth tying these two things together. It's probably worth making these two separate discussions and separate patches. 
- Matt Kelly *Just sanity checking myself: Shutting down the server, applying the different patch, 'make clean install' in postgres_fdw, and then restarting the server should obviously be sufficient to make sure its running the new code because that is all linked at runtime, right?
Re: [HACKERS] pgaudit - an auditing extension for PostgreSQL
At 2015-01-20 20:36:39 -0500, robertmh...@gmail.com wrote: I think this is throwing the baby out with the bathwater. Neither C functions nor all-or-nothing are going to be of any practical use. Do you see some approach that has a realistic chance of making 9.5 and would also actually be worth putting into 9.5? Suggestions very welcome. -- Abhijit
Re: [HACKERS] B-Tree support function number 3 (strxfrm() optimization)
On Tue, Jan 20, 2015 at 5:42 PM, Robert Haas robertmh...@gmail.com wrote: On Tue, Jan 20, 2015 at 8:39 PM, Peter Geoghegan p...@heroku.com wrote: On Tue, Jan 20, 2015 at 5:32 PM, Robert Haas robertmh...@gmail.com wrote: I was assuming we were going to fix this by undoing the abbreviation (as in the abort case) when we spill to disk, and not bothering with it thereafter. The spill-to-disk case is at least as compelling as the internal sort case. The overhead of comparisons is much higher for tapesort. First, we need to unbreak this. Then, we can look at optimizing it. The latter task will require performance testing. I don't see that any alternative isn't a performance trade-off. My patch accomplishes unbreaking this. I agree that it still needs review from that perspective, but it doesn't seem any worse than any other alternative. Would you prefer it if the spill-to-disk case aborted in the style of low-entropy keys? That doesn't seem significantly safer than this, and it's certainly not acceptable from a performance perspective. -- Peter Geoghegan
Re: [HACKERS] B-Tree support function number 3 (strxfrm() optimization)
On Tue, Jan 20, 2015 at 5:46 PM, Peter Geoghegan p...@heroku.com wrote: Would you prefer it if the spill-to-disk case aborted in the style of low-entropy keys? That doesn't seem significantly safer than this, and it's certainly not acceptable from a performance perspective. BTW, I can write that patch if that's your preference. Should I? I just don't favor that even as a short-term correctness fix, because it seems unacceptable to throw away all the strxfrm() work where that's a very predictable and even likely outcome. I suggest reviewing and committing my fix as a short-term fix, which may well turn out to be generally acceptable upon further consideration. Yes, we'll need to make a point of reviewing an already-committed patch, but there is a precedent for that. -- Peter Geoghegan
Re: [HACKERS] B-Tree support function number 3 (strxfrm() optimization)
On Tue, Jan 20, 2015 at 8:39 PM, Peter Geoghegan p...@heroku.com wrote: On Tue, Jan 20, 2015 at 5:32 PM, Robert Haas robertmh...@gmail.com wrote: I was assuming we were going to fix this by undoing the abbreviation (as in the abort case) when we spill to disk, and not bothering with it thereafter. The spill-to-disk case is at least as compelling as the internal sort case. The overhead of comparisons is much higher for tapesort. Attached patch serializes keys. On reflection, I'm inclined to go with this approach. Even if the CPU overhead of reconstructing strxfrm() blobs is acceptable for text, it might be much more expensive for other types. I'm loath to throw away those abbreviated keys unnecessarily. We don't have to worry about having aborted abbreviation, since once we spill to disk we've effectively committed to abbreviation. This patch formalizes the idea that there is strictly a pass-by-value representation required for such cases (but not that the original Datums must be of a pass-by-reference type, which is another thing entirely). I've tested it some, obviously with Andrew's testcase and the regression tests, but also with my B-Tree verification tool. Please review it. Sorry about this. I don't want to change the on-disk format for tapes without a lot more discussion. Can you come up with a fix that avoids that for now? -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Re: [HACKERS] B-Tree support function number 3 (strxfrm() optimization)
On Tue, Jan 20, 2015 at 6:34 PM, Robert Haas robertmh...@gmail.com wrote: That might be OK. Probably needs a bit of performance testing to see how it looks. Well, we're still only doing it when we do our final merge. So that's only doubling the number of conversions required, which if we're blocked on I/O might not matter at all. -- Peter Geoghegan
Re: [HACKERS] Dereferenced pointers checked as NULL in btree_utils_var.c
On Tue, Jan 20, 2015 at 11:38 PM, Tom Lane t...@sss.pgh.pa.us wrote: Michael Paquier michael.paqu...@gmail.com writes: Coverity is pointing out $subject, with the following stuff in gbt_var_same(): ... As Heikki pointed me out on IM, the lack of crash report in this area, as well as similar coding style in cube/ seem to be sufficient arguments to simply remove those NULL checks instead of doing more solid checks on them. Patch is attached. The way to form a convincing argument that these checks are unnecessary would be to verify that (1) the SQL-accessible functions directly calling gbt_var_same() are all marked STRICT, and (2) the core GIST code never passes a NULL to these support functions. I'm prepared to believe that (1) and (2) are both true, but it merits checking. Sure. gbt_var_same is called by those functions in btree_gist/:
- gbt_bit_same
- gbt_bytea_same
- gbt_numeric_same
- gbt_text_same

=# select proname, proisstrict from pg_proc
   where proname in ('gbt_bit_same', 'gbt_bytea_same', 'gbt_numeric_same', 'gbt_text_same');
     proname      | proisstrict
------------------+-------------
 gbt_text_same    | t
 gbt_bytea_same   | t
 gbt_numeric_same | t
 gbt_bit_same     | t
(4 rows)

For the second point, I have run regression tests with an assertion in gbt_var_same checking if t1 or t2 are NULL and tests worked. Also, looking at the code of gist, gbt_var_same is called through gistKeyIsEQ, which is used for an insertion or for a split. The point is that when doing an insertion, a call to gistgetadjusted is done and we look if one of the keys is NULL. If one of them is, code does not go through gistKeyIsEQ, so the NULL checks do not seem necessary in gbt_var_same. Btw, looking at the code of gist, I think that it would be interesting to add an assertion in gistKeyIsEQ like that:

    Assert(DatumGetPointer(a) != NULL && DatumGetPointer(b) != NULL);

Thoughts? 
-- Michael
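Point (1) in Tom's checklist — that every SQL-accessible caller is marked STRICT — is what makes the NULL checks dead code: the executor returns NULL itself and never invokes a strict function's body with a NULL argument. A sketch of that contract in Python (a hypothetical model of proisstrict handling, not PostgreSQL's executor):

```python
def strict(fn):
    # model of proisstrict = true: when any argument is NULL, the result is
    # NULL and the function body is never entered
    def wrapper(*args):
        if any(a is None for a in args):
            return None
        return fn(*args)
    return wrapper

@strict
def gbt_same(t1, t2):
    # the body may "dereference" t1/t2 freely; under the strict contract
    # they are guaranteed non-NULL here
    return t1 == t2
```

Combined with point (2) — the core GiST code filtering NULL keys before gistKeyIsEQ — the in-function NULL checks that Coverity flagged can never fire, which is the argument for deleting them.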
Re: [HACKERS] parallel mode and parallel contexts
On Tue, Jan 20, 2015 at 9:52 PM, Robert Haas robertmh...@gmail.com wrote: On Tue, Jan 20, 2015 at 9:41 AM, Amit Kapila amit.kapil...@gmail.com wrote: It seems [WaitForBackgroundWorkerShutdown] has possibility to wait forever. Assume one of the worker is not able to start (not able to attach to shared memory or some other reason), then status returned by GetBackgroundWorkerPid() will be BGWH_NOT_YET_STARTED and after that it can wait forever in WaitLatch. I don't think that's right. The status only remains BGWH_NOT_YET_STARTED until the postmaster forks the child process. I think the control flow can reach the above location before postmaster could fork all the workers. Consider a case that we have launched all workers during ExecutorStart phase and then before postmaster could start all workers an error occurs in master backend and then it try to Abort the transaction and destroy the parallel context, at that moment it will get the above status and wait forever in above code. I am able to reproduce above scenario with debugger by using parallel_seqscan patch. With Regards, Amit Kapila. EnterpriseDB: http://www.enterprisedb.com
Re: [HACKERS] Parallel Seq Scan
On 20-01-2015 PM 11:29, Amit Kapila wrote: I have taken care of integrating the parallel sequence scan with the latest patch posted (parallel-mode-v1.patch) by Robert at below location: http://www.postgresql.org/message-id/ca+tgmozduk4k3xhbxc9vm-82khourezdvqwtfglhwsd2r2a...@mail.gmail.com

Changes in this version
---
1. As mentioned previously, I have exposed one parameter ParallelWorkerNumber as used in parallel-mode patch.
2. Enabled tuple queue to be used for passing tuples from worker backend to master backend along with error queue as per suggestion by Robert in the mail above.
3. Involved master backend to scan the heap directly when tuples are not available in any shared memory tuple queue.
4. Introduced 3 new parameters (cpu_tuple_comm_cost, parallel_setup_cost, parallel_startup_cost) for deciding the cost of parallel plan. Currently, I have kept the default values for parallel_setup_cost and parallel_startup_cost as 0.0, as those require some experiments.
5. Fixed some issues (related to memory increase as reported upthread by Thom Brown and general feature issues found during test)

Note - I have yet to handle the new node types introduced at some of the places and need to verify prepared queries and some other things, however I think it will be good if I can get some feedback at current stage.

I got an assertion failure in src/backend/executor/execTuples.c: ExecStoreTuple():

    /* passing shouldFree=true for a tuple on a disk page is not sane */
    Assert(BufferIsValid(buffer) ? (!shouldFree) : true);

when called from ParallelSeqNext() in src/backend/executor/nodeParallelSeqscan.c. I think something like the following would be necessary (reading from comments in the code):

--- a/src/backend/executor/nodeParallelSeqscan.c
+++ b/src/backend/executor/nodeParallelSeqscan.c
@@ -85,7 +85,7 @@ ParallelSeqNext(ParallelSeqScanState *node)
 	if (tuple)
 		ExecStoreTuple(tuple,
 					   slot,
-					   scandesc->rs_cbuf,
+					   fromheap ? scandesc->rs_cbuf : InvalidBuffer,

After fixing this, the assertion failure seems to be gone, though I observed the blocked (CPU maxed out) state as reported elsewhere by Thom Brown. What I was doing:

CREATE TABLE test(a) AS SELECT generate_series(1, 1000);

postgres=# SHOW max_worker_processes;
 max_worker_processes
----------------------
 8
(1 row)

postgres=# SET seq_page_cost TO 100;
SET
postgres=# SET parallel_seqscan_degree TO 4;
SET
postgres=# EXPLAIN SELECT * FROM test;
                                QUERY PLAN
--------------------------------------------------------------------------
 Parallel Seq Scan on test  (cost=0.00..1801071.27 rows=8981483 width=4)
   Number of Workers: 4
   Number of Blocks Per Worker: 8849
(3 rows)

Though, EXPLAIN ANALYZE caused the thing.

Thanks, Amit
Re: [HACKERS] Async execution of postgres_fdw.
Hello, thank you for looking at this, but sorry that the last patch was buggy, so the adaptive fetch size did not work. The attached is the fixed patch. It apparently improves the performance for the test case shown in the previous mail, in which the average tuple length is about 140 bytes.

On 21 Jan 2015 05:22:34 +, Matt Kelly mkell...@gmail.com wrote in ca+kcukg4cvdlf4v0m9_rvv_zuasg1odhpj_yvczja6w2nsk...@mail.gmail.com: I'm trying to compare v5 and v6 on my laptop right now. Apparently my laptop is quite a bit faster than your machine, because the tests complete in roughly 3.3 seconds. I added more data and didn't see anything other than noise. (Then again, the queries were dominated by the disk sort, so I should retry with larger work_mem.) I'll try it again when I have more time to play with it. I suspect the benefits would be more clear over a network. Larger than default work_mem, yes, but I think one of the prime use cases for the fdw is more warehouse-style situations (PostgresXL-style use cases). In those cases, work_mem might reasonably be set to 1GB. Then even if you have 10KB rows, you can fetch a million rows and still be using less than work_mem. A simpler change would be to vary it with respect to work_mem.

Agreed about the nature of the typical workload for the postgres FDW. But I think the server itself, including postgres_fdw, should not crash even on a sudden explosion of tuple length. The number 100 seems to be safe enough, but 1000 seems suspicious, and 10000 looks dangerous from that standpoint.

Half-baked idea: I know it's the wrong time in the execution phase, but if you are using remote estimates for cost, there should also be a row width estimate, which I believe is based on pg_statistic and its mean column width.

It reduces the chance of claiming an unexpected amount of memory, but the chance still remains. It's actually a pity that there is no way to set fetch sizes based on "give me as many tuples as will fit in less than x amount of memory."
Because that is almost always exactly what you want. Even when writing application code, I've never actually wanted precisely 10,000 rows; I've always wanted a reasonable-size chunk that could fit into memory and then backed my way into how many rows I wanted. If we were to extend FETCH to support syntax like: FETCH FORWARD '10MB' FROM ...; then we would eliminate the need to estimate the value on the fly.

I didn't think about that. It makes sense, at least to me :) There would be many cases where the *amount* of data is more crucial than the number of rows. I'll work on it.

The async stuff, however, is a huge improvement over the last time I played with the fdw. The two active postgres processes were easily consuming a core and a half of CPU. I think it's not worth tying these two things together. It's probably worth making these two separate discussions and separate patches.

Yes, they can be separated and also should be. I'll split them after this. - Matt Kelly

*Just sanity-checking myself: shutting down the server, applying the different patch, 'make clean install' in postgres_fdw, and then restarting the server should obviously be sufficient to make sure it's running the new code, because that is all linked at runtime, right?

Yes, it's enough, and I also did so. This patch touches only postgres_fdw.

regards, -- Kyotaro Horiguchi NTT Open Source Software Center

From 12d0e8666871e2272beb1b85903baf0be92881b5 Mon Sep 17 00:00:00 2001
From: Kyotaro Horiguchi horiguchi.kyot...@lab.ntt.co.jp
Date: Tue, 13 Jan 2015 19:20:35 +0900
Subject: [PATCH] Asynchronous execution of postgres_fdw v7

This is the bug-fixed version of v6, which was a modified version of "Asynchronous execution of postgres_fdw."
---
 contrib/postgres_fdw/Makefile       |   2 +-
 contrib/postgres_fdw/PgFdwConn.c    | 200 ++
 contrib/postgres_fdw/PgFdwConn.h    |  61 ++
 contrib/postgres_fdw/connection.c   |  82
 contrib/postgres_fdw/postgres_fdw.c | 391 +---
 contrib/postgres_fdw/postgres_fdw.h |  15 +-
 6 files changed, 629 insertions(+), 122 deletions(-)
 create mode 100644 contrib/postgres_fdw/PgFdwConn.c
 create mode 100644 contrib/postgres_fdw/PgFdwConn.h

diff --git a/contrib/postgres_fdw/Makefile b/contrib/postgres_fdw/Makefile
index d2b98e1..d0913e2 100644
--- a/contrib/postgres_fdw/Makefile
+++ b/contrib/postgres_fdw/Makefile
@@ -1,7 +1,7 @@
 # contrib/postgres_fdw/Makefile

 MODULE_big = postgres_fdw
-OBJS = postgres_fdw.o option.o deparse.o connection.o $(WIN32RES)
+OBJS = postgres_fdw.o PgFdwConn.o option.o deparse.o connection.o $(WIN32RES)
 PGFILEDESC = "postgres_fdw - foreign data wrapper for PostgreSQL"

 PG_CPPFLAGS = -I$(libpq_srcdir)
diff --git a/contrib/postgres_fdw/PgFdwConn.c b/contrib/postgres_fdw/PgFdwConn.c
new file mode 100644
index 000..b13b597
--- /dev/null
+++ b/contrib/postgres_fdw/PgFdwConn.c
@@ -0,0 +1,200 @@
Re: [HACKERS] hamerkop is stuck
On Mon, Jan 19, 2015 at 10:44:37AM +0900, TAKATSUKA Haruka wrote: On Fri, 16 Jan 2015 03:25:00 -0500, Noah Misch n...@leadboat.com wrote: Buildfarm member hamerkop stopped reporting in after commit f6dc6dd. After commit 8d9cb0b, it resumed reporting in for 9.3 and earlier branches. It is still silent for HEAD and REL9_4_STABLE. It may have stale processes or stale lock files requiring manual cleanup. Would you check it?

Sorry about giving corrupted reports. We are examining what caused it now.

It is good now. Thanks.
Re: [HACKERS] proposal: searching in array function - array_position
On 1/20/15 11:12 AM, Pavel Stehule wrote: I am sending an updated version - it allows a third optional argument that specifies where searching should start. With it, it is possible to call this function repeatedly.

What happened to returning an array of offsets? I think that would be both easier to use and better-performing than this version. I see you dropped multi-dimension support, but I think that's fine. -- Jim Nasby, Data Architect, Blue Treble Consulting Data in Trouble? Get it in Treble! http://BlueTreble.com
Re: [HACKERS] Parallel Seq Scan
On 1/19/15 7:20 AM, Robert Haas wrote: Another thing is that prefetching is not supported on all platforms (Windows), and for such systems, as per the above algorithm, we need to rely on the block-by-block method.

Well, I think we should try to set up a test to see if this is hurting us. First, do a sequential scan of a relation at least twice as large as RAM. Then, do a parallel sequential scan of the same relation with 2 workers. Repeat these in alternation several times. If the operating system is accomplishing meaningful readahead, and the parallel sequential scan is breaking it, then since the test is I/O-bound I would expect to see the parallel scan actually being slower than the normal way. Or perhaps there is some other test that would be better (ideas welcome), but the point is we may need something like this, and we should try to figure out whether we need it before spending too much time on it.

I'm guessing that not all supported platforms have prefetching that actually helps us... but it would be good to actually know if that's the case.

Where I think this gets a lot more interesting is if we could apply this to an index scan. My thought is that this would result in one worker mostly being responsible for advancing the index scan itself while the other workers were issuing (and waiting on) heap IO. So even if this doesn't turn out to be a win for seqscan, there are other places we might well want to use it. -- Jim Nasby, Data Architect, Blue Treble Consulting Data in Trouble? Get it in Treble! http://BlueTreble.com
Re: [HACKERS] Add min and max execute statement time in pg_stat_statement
Interesting patch. I did a quick review looking only at the patch file. The "sum of variances" variable contains the sum of squared differences instead, I think. And a very minor aspect: the term "standard deviation" in your code stands for the (corrected) sample standard deviation, I think, because you divide by n-1 instead of n to keep the estimator unbiased. How about adding the prefix "sample" to indicate this being that estimator? And I'm sure I'm missing C specifics (again) (or it's the reduced patch-file scope), but you introduce sqrtd, yet sqrt is called? VlG Arne