Re: [HACKERS] WIP: Rework access method interface
On Fri, Sep 25, 2015 at 7:41 PM, Teodor Sigaev wrote:
>> I'm OK about continuing work on amvalidate if we can build consensus on
>> its design.
>> Could you give some feedback on the amvalidate version of the patch, please?
>>
>> http://www.postgresql.org/message-id/capphfds8zywenz9vw6te5rzxbol1vu_wsw181veq+mu+v1d...@mail.gmail.com
>
> Attached is a slightly modified patch based on the 4th version, and if
> there are no strong objections, I will commit it. Waiting for this patch
> stops Alexander's development on CREATE ACCESS METHOD and he needs to move
> forward.

1.

+InitIndexAmRoutine(Relation relation)
+{
+	IndexAmRoutine *result, *tmp;
+
+	result = (IndexAmRoutine *) MemoryContextAlloc(CacheMemoryContext,
+												   sizeof(IndexAmRoutine));
+	tmp = (IndexAmRoutine *) DatumGetPointer(
+				OidFunctionCall0(relation->rd_am->amhandler));
+	memcpy(result, tmp, sizeof(IndexAmRoutine));
+	relation->amroutine = result;
+}

Is it appropriate to use CacheMemoryContext here? Currently in
load_relcache_init_file(), all the other information in Relation, such as
rd_indoption and rd_aminfo, is allocated in indexcxt, which of course is
itself allocated in CacheMemoryContext; still, is there a reason why this
shouldn't also be allocated in that same context?

2.

- aform = (Form_pg_am) MemoryContextAlloc(CacheMemoryContext, sizeof *aform);
+ aform = (Form_pg_am)MemoryContextAlloc(CacheMemoryContext, sizeof *aform);

Spurious change.

With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com
Re: [HACKERS] pageinspect patch, for showing tuple data
On Sat, Sep 26, 2015 at 1:46 AM, Nikolay Shaplov wrote:
> In a message of 25 September 2015 20:59:29, Michael Paquier wrote:
>> Thanks! I just had a short look at it:
>> - I am not convinced that it is worth declaring 3 versions of
>> tuple_data_split.
> So which of them should we leave?

The one with the most arguments. Now perhaps we could have two of them as
well: one which has the minimum number of arguments with default values, and
a second one with the maximum number of arguments.

>> - The patch does not respect the project code style; in particular,
>> one-line "if condition {foo;}" blocks are not acceptable, code lines are
>> limited to 80 characters, etc. The list is basically here:
>> http://www.postgresql.org/docs/current/static/source.html
> I did my best. Results are attached.

Thanks, it looks better.

>> - Be aware of typos: s/whitch/which is one.
> I've run a spellchecker on all comments. I hope that I removed most of the
> mistakes, but as I am not a native speaker, I will not be able to eliminate
> them all. I will need help here.

I have not spotted new mistakes regarding that.

>> +t_infomask2
>> +integer
>> +stores the number of attributes (11 bits of the word). The
>> +rest are used for flag bits:
>> +
>> +0x2000 - tuple was updated and key cols modified, or tuple deleted
>> +0x4000 - tuple was HOT-updated
>> +0x8000 - this is heap-only tuple
>> +0xE000 - visibility-related bits (so called "hint bits")
>> +
>> This large chunk of documentation is a duplicate of storage.sgml. If
>> that's really necessary, it looks better adapted to me to have more
>> detailed comments at code level directly in heapfuncs.c.
> Hm... There is no explanation of the t_infomask/t_infomask2 bits in
> storage.sgml.
>
> If there is no source of information other than the source code, then the
> documentation is not good.

pageinspect is already something that works at a low level.
My guess is that users of this module are going to have a look at the code
either way to understand how tuple data is manipulated within a page.

> So I would consider two options: either move the t_infomask/t_infomask2
> details into storage.sgml or leave it as it is.

The documentation redirects the reader to look at htup_details.h (the
documentation is actually incorrect; I sent a separate patch), and I see no
point in duplicating such low-level descriptions; that would be double
maintenance for the same result.

> I am lazy, and do not feel confident about touching the main
> documentation, so I would prefer to leave it as it is.

Your patch is proving the contrary :) Honestly, I would just rip out the
part you have added to describe all the fields related to
HeapTupleHeaderData, and have only a simple self-contained example to show
how to use the new function tuple_data_split.

>> The example of tuple_data_split in the docs would be more interesting
>> if embedded with a call to heap_page_items.
>
> This example would be almost unreadable. Maybe I should add one more
> phrase explaining where we took the arguments for tuple_data_split from.

I would as well just remove heap_page_item_attrs; an example in the docs
would be just enough IMO (note that I never mind being outvoted).

Btw, shouldn't the ereport messages in tuple_data_split use error_level
instead of ERROR?

Regards,
--
Michael

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Partitioned checkpointing
Hello,

These are interesting runs.

> In a situation in which small values are set in dirty_bytes and
> dirty_background_bytes, a buffer is likely stored on the HD immediately
> after the buffer is written to the kernel by the checkpointer. Thus, I
> tried a quick hack to make the checkpointer invoke the write system call
> to write a dirty buffer, immediately followed by invoking a store
> operation for that buffer, implemented with the sync_file_range() system
> call.
>
> # For reference, I attach the patch. As shown in file_sync_range.JPG,
> this strategy is considered to have been effective.

Indeed. This approach is part of this current patch:

https://commitfest.postgresql.org/6/260/

Basically, what you do is to call sync_file_range on each block, and you
tested on a high-end system, probably with a lot of BBU disk cache, which I
guess allows the disk to reorder writes so as to benefit from sequential
write performance.

> In conclusion, as far as pgbench execution on Linux is concerned, using
> sync_file_range() is a promising solution.

I found that calling sync_file_range for every block could degrade
performance a bit under some conditions, at least on my low-end systems
(just a [raid] disk, no significant disk cache in front of it), so the
above patch aggregates neighboring writes so as to issue fewer
sync_file_range calls.

> That is, the checkpointer invokes sync_file_range() to store a buffer
> immediately after it writes the buffer to the kernel.

Yep.

It is interesting that sync_file_range alone improves stability a lot on
your high-end system, although sorting is mandatory for low-end systems. My
interpretation, already stated above, is that the hardware does the sorting
on the cached data at the disk level in your system.

--
Fabien.

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] pgbench stats per script & other stuff
Here is a v10, which is a rebase because of the "--progress-timestamp"
option addition. It also includes the fix for the tps-without-connection
computation and some minor code simplification, so it is redundant with
this bug fix patch:

https://commitfest.postgresql.org/7/378/

--
Fabien.

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Parallel Seq Scan
On Sat, Sep 26, 2015 at 5:52 AM, Robert Haas wrote:
> On Fri, Sep 25, 2015 at 12:00 AM, Amit Kapila wrote:
>> I think initPlan will work with the existing patches as we are always
>> executing it in master and then sending the result to workers. Refer
>> below code in funnel patch:
>
> Sure, *if* that's what we're doing, then it will work. But if an
> initPlan actually attaches below a funnel, then it will break.

Currently, it's considered for the initPlan of only the left tree of the
Funnel; however, if we want to push multiple nodes under the Funnel, then
it won't work as it is. I think even if we want to make that work, we would
need to traverse the whole tree under the Funnel and do what is currently
done for each initPlan we encounter. In general, I think the idea of
passing the results of initPlans to workers seems like the right way of
dealing with initPlans, and by the way, this was a suggestion made by you
some time back.

With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com
Re: [HACKERS] PATCH: index-only scans with partial indexes
Hi,

On 09/18/2015 03:46 AM, Kyotaro HORIGUCHI wrote:
> Hello,
>
> At Thu, 17 Sep 2015 17:40:27 +0200, Tomas Vondra wrote in
> <55fadeeb.4000...@2ndquadrant.com>
>> Yes, this seems sane. I've been poking at this a bit too, and I came to
>> the same plan in general, except that I think it's better to build a
>> list of clauses that are *not* implied by the index, because that's what
>> we need both in cost_index and check_index_only.
>
> I intended to isolate IndexOptInfo from the RelOptInfo it belongs to, but
> the exclusion list also bonds them tightly, and one IndexOptInfo belongs
> to only one RelOptInfo, so there is no need to isolate them. So
> not-implied-restrictclauses in IndexOptInfo would be preferable.
>
>> It also seems to me that this change (arguably a bug fix) should pretty
>> much make the original patch irrelevant, because check_index_only can
>> simply walk over the new list.
>
> Yeah. This seems to be a bug irrelevant to your index-only-scan patch.

Attached is a patch that makes this work correctly, I believe. The only
difference compared to v1 of the patch is this piece of code in
match_clause_to_index (indxpath.c):

    if ((index->indpred != NIL) &&
        (predicate_implied_by(lappend(NIL, rinfo->clause),
                              index->indpred)))
        continue;

which effectively says that we should ignore clauses implied by the index
predicate. This fixes the way we build the IndexClauseSet for the two
indexes - originally we'd add the implied clause on the large index but not
on the small one (because it does not have the column referenced in the
condition).

And as far as I can say, this also fixes all the costing issues that I've
described, because cost_index sees the same thing for both indexes. It's
also early enough to fix the actual path, so EXPLAIN shows the same thing
for both indexes (so no "Index Cond" present for only one of the indexes).

I've originally tried to fix this in extract_nonindex_conditions, but
that's way too late - it can only fix the costing, not the path. Arguably
this part of the patch is a bugfix of an issue present on master.

The patch does not change the check_index_only implementation - it still
needs to check the clauses, just like in v1 of the patch. To make this
re-check unnecessary, we'd have to stick the remaining clauses somewhere,
so that check_index_only can use only the filtered list (instead of walking
through the complete list of restrictions).

Or perhaps we can do the check in match_clause_to_index, pretty much for
free? If the clause is not implied by the index predicate (which we know
thanks to the fix), and we don't assign it to any of the index columns, it
means we can't use IOS, no?

regards

--
Tomas Vondra                   http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

diff --git a/src/backend/optimizer/path/costsize.c b/src/backend/optimizer/path/costsize.c
index 7069f60..d2c9f51 100644
--- a/src/backend/optimizer/path/costsize.c
+++ b/src/backend/optimizer/path/costsize.c
@@ -86,6 +86,7 @@
 #include "optimizer/placeholder.h"
 #include "optimizer/plancat.h"
 #include "optimizer/planmain.h"
+#include "optimizer/predtest.h"
 #include "optimizer/restrictinfo.h"
 #include "parser/parsetree.h"
 #include "utils/lsyscache.h"
diff --git a/src/backend/optimizer/path/indxpath.c b/src/backend/optimizer/path/indxpath.c
index 9da5444..2d3c0f8 100644
--- a/src/backend/optimizer/path/indxpath.c
+++ b/src/backend/optimizer/path/indxpath.c
@@ -1798,13 +1798,13 @@ check_index_only(RelOptInfo *rel, IndexOptInfo *index)
 	 * Check that all needed attributes of the relation are available from the
 	 * index.
 	 *
-	 * XXX this is overly conservative for partial indexes, since we will
-	 * consider attributes involved in the index predicate as required even
-	 * though the predicate won't need to be checked at runtime. (The same is
-	 * true for attributes used only in index quals, if we are certain that
-	 * the index is not lossy.) However, it would be quite expensive to
-	 * determine that accurately at this point, so for now we take the easy
-	 * way out.
+	 * For partial indexes we won't consider attributes involved in clauses
+	 * implied by the index predicate, as those won't be needed at runtime.
+	 *
+	 * XXX The same is true for attributes used only in index quals, if we
+	 * are certain that the index is not lossy. However, it would be quite
+	 * expensive to determine that accurately at this point, so for now we
+	 * take the easy way out.
 	 */

 	/*
@@ -1819,6 +1819,28 @@ check_index_only(RelOptInfo *rel, IndexOptInfo *index)
 	{
 		RestrictInfo *rinfo = (RestrictInfo *) lfirst(lc);

+		/*
+		 * If the index is partial, we won't consider the clauses that are
+		 * implied by the index predicate (those are not needed at runtime).
+		 */
+		if (index->indpred != NIL)
+		{
+			bool		implied = false;
+			List	   *clauses = NIL;
+
+			/* need a list for the 'implied_by' call */
+			clauses = lappend(clauses, (Node *)
Re: [HACKERS] Improving test coverage of extensions with pg_dump
On Wed, Sep 23, 2015 at 1:04 AM, Alvaro Herrera wrote:
> Euler Taveira wrote:
>> On 17-09-2015 14:21, Michael Paquier wrote:
>>> pg_dump relies on attnum to define the column ordering, so one
>>> possibility would be to do things more consistently at backend level.
>
> We discussed this in some other thread, not long ago. I looked briefly
> in the archives but couldn't find it. I think the conclusion was
> something along the lines of "hmm, tough".

That's hours, even days of fun ahead for sure.

>> Someone could say that we could assign an attnum for column "d"
>> considering all of the inheritance tree. However, attnum is used as an
>> index into arrays (we could bloat some of those) and some logic relies
>> on it to count the number of columns. It would turn tablecmds.c into
>> spaghetti.

That was my first impression, but that's just a band-aid. Switching the
logic in stable branches to assign a correct attnum for a child table,
while switching the way the attnum refers to the parent entries, is a scary
idea. I would suspect that this would break many other things while fixing
a narrow case, and it is still possible for the user to get away with it by
using COPY or INSERT with the list of columns specified when taking a dump.
We don't need any more spaghetti there, thanks!

> Agreed.
>
>> IMHO a possible way to solve it is adding support for logical column
>> ordering. An ALTER TABLE command (emitted if a parameter was informed)
>> during dump could handle it. BTW, the last thread [1] about logical
>> column ordering seems to have died a few months ago. Alvaro?
>
> Tomas Vondra also worked a bit on this patch, and we eventually gave up
> on it due to lack of time. We might be able to get back to it someday,
> but do not hold your breath. If you want the current bug fixed, do not
> wait for logical column numbering.
Honestly, I have the feeling that the discussion on this thread is getting
unproductive. Let's not forget that the patch presented on this thread is
just aiming at adding one test case to ensure that extensions using
dumpable relations with FKs get correctly dumped, which is to ensure that
we do not reintroduce a bug that has existed in the extension facility
since its introduction in 9.1. That being said, the patch is just fine for
this purpose, but that's just my opinion.
--
Michael
Re: [HACKERS] Parallel Seq Scan
On Sat, Sep 26, 2015 at 12:38 PM, Amit Kapila wrote:
> On Sat, Sep 26, 2015 at 6:07 AM, Robert Haas wrote:
>>
>> Assuming I'm not confused, I'm planning to see about fixing this...
>
> Can't we just traverse the queryDesc->planstate tree and fetch/add
> all the instrument information if there are multiple nodes?

I think the above suggestion made by me won't work, because we want this
information on a per-node basis in the master, as EXPLAIN will display each
node's information separately.

With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com
Re: [HACKERS] Tab completion for ALTER COLUMN SET STATISTICS
On Sat, Sep 26, 2015 at 7:18 AM, Jeff Janes wrote:
> If I have "alter table foo alter COLUMN bar SET STATISTICS" in the line
> buffer, it tab completes to add " TO", which is not legal.
>
> The attached patch makes it not tab complete anything at all, which is at
> least not actively misleading.
>
> I thought of having it complete "-1", "" so that it gives a clue
> about what is needed, but I didn't see any precedent for non-literal
> clue-giving and I did not want to try to create new precedent.

+1 for the way you are doing it in your patch.
--
Michael

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Parallel Seq Scan
On Sat, Sep 26, 2015 at 3:08 AM, Amit Kapila wrote:
>> memcpy() can cope with unaligned data; structure member assignment can't.
>
> So doesn't coping mean it anyway has to pay the performance penalty to
> make it equivalent to aligned address access? Apart from that, today I
> read about memcpy's behaviour in case of unaligned addresses; it seems
> from some of the information on the net that it could be unsafe [1],[2].

I'm not concerned about the performance penalty for unaligned access in
this case; I'm concerned about the fact that on some platforms it causes a
segmentation fault. The links you've provided there are examples of cases
where that wasn't true, and people reported that as a bug in memcpy.

> Yes, you have figured out correctly; I was under the impression that we
> would have single-node execution in the worker for the first version and
> then extend it later.

No, I really want it to work with multiple nodes from the start, and I've
pretty much got that working here now.

> QueryDesc's totaltime is for instrumentation information for plugins
> like pg_stat_statements, and we need only the total buffer usage
> of each worker to make it work, as the other information is already
> collected in the master backend, so I think that should work as I have
> written it.

I don't think that's right at all. First, an extension can choose to look
at any part of the Instrumentation, not just the buffer usage. Secondly,
the buffer usage inside QueryDesc's totaltime isn't the same as the global
pgBufferUsage.

>> Assuming I'm not confused, I'm planning to see about fixing this...
>
> Can't we just traverse the queryDesc->planstate tree and fetch/add
> all the instrument information if there are multiple nodes?

Well, you need to add each node's information in each worker to the
corresponding node in the leader. You're not just adding them all up.
--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Parallel Seq Scan
On Sat, Sep 26, 2015 at 6:07 AM, Robert Haas wrote:
> On Fri, Sep 25, 2015 at 7:46 AM, Amit Kapila wrote:
>> I have a question here, which is why this format doesn't have a similar
>> problem to the current version; basically, in the current patch the
>> second read of SerializedParamExternData can be misaligned, and for the
>> same reason in your patch the second read of Oid could be misaligned?
>
> memcpy() can cope with unaligned data; structure member assignment can't.

So doesn't coping mean it anyway has to pay the performance penalty to make
it equivalent to aligned address access? Apart from that, today I read
about memcpy's behaviour in case of unaligned addresses; it seems from some
of the information on the net that it could be unsafe [1],[2].

> I've worked some of this code over fairly heavily today and I'm pretty
> happy with how my copy of execParallel.c looks now, but I've run into
> one issue where I wanted to check with you. There are three places
> where Instrumentation can be attached to a query: a ResultRelInfo's
> ri_TrigInstrument (which doesn't matter for us because we don't
> support parallel write queries, and triggers don't run on reads), a
> PlanState's instrument, and a QueryDesc's total time.
>
> Your patch makes provision to copy ONE Instrumentation structure per
> worker back to the parallel leader. I assumed this must be the
> QueryDesc's totaltime, but it looks like it's actually the PlanState
> of the top node passed to the worker. That's of course no good if we
> ever push more than one node down to the worker, which we may very
> well want to do in the initial version, and surely want to do
> eventually. We can't just deal with the top node and forget all the
> others. Is that really what's happening here, or am I confused?

Yes, you have figured out correctly; I was under the impression that we
would have single-node execution in the worker for the first version and
then extend it later.
QueryDesc's totaltime is for instrumentation information for plugins like
pg_stat_statements, and we need only the total buffer usage of each worker
to make it work, as the other information is already collected in the
master backend, so I think that should work as I have written it.

> Assuming I'm not confused, I'm planning to see about fixing this...

Can't we just traverse the queryDesc->planstate tree and fetch/add all the
instrument information if there are multiple nodes?

[1] - http://gcc.gnu.org/ml/gcc-bugs/2000-03/msg00155.html
[2] - https://gcc.gnu.org/bugzilla/show_bug.cgi?id=53016

With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com
Re: [HACKERS] PATCH: index-only scans with partial indexes
Hi,

On 09/26/2015 01:28 PM, Tomas Vondra wrote:
> The patch does not change the check_index_only implementation - it still
> needs to check the clauses, just like in v1 of the patch. To make this
> re-check unnecessary, we'd have to stick the remaining clauses somewhere,
> so that check_index_only can use only the filtered list (instead of
> walking through the complete list of restrictions).

After thinking about this a bit more, I realized we already have a good
place for keeping this list - IndexClauseSet. So I've done that, and
removed most of the code from check_index_only - it still needs to decide
whether to use the full list of restrictions (regular indexes) or the
filtered list (for partial indexes).

Calling predicate_implied_by in match_clause_to_index makes the check a bit
earlier, compared to v1 of the patch. So theoretically there might be cases
where we'd interrupt the execution between those places, and v1 would not
pay the price for the call. But those places are fairly close together, so
I find that unlikely, and I've been unable to reproduce a plausible example
of such a regression despite trying.

The only regression test this breaks is "aggregates", and I believe that's
actually correct, because it changes the plans like this:

   -> Index Only Scan Backward using minmaxtest2i on minmaxtest2
        Index Cond: (f1 IS NOT NULL)
   -> Index Only Scan using minmaxtest3i on minmaxtest3
 -      Index Cond: (f1 IS NOT NULL)

i.e. it removes the Index Condition from scans of minmaxtest3i, which is
perfectly sensible because the index is defined like this:

   create index minmaxtest3i on minmaxtest3(f1) where f1 is not null;

So it's a partial index, and that condition is implied by the predicate
(it's actually exactly the same). I haven't fixed those regression tests
for now.

> Or perhaps we can do the check in match_clause_to_index, pretty much for
> free?
> If the clause is not implied by the index predicate (which we know thanks
> to the fix), and we don't assign it to any of the index columns, it means
> we can't use IOS, no?

After thinking about this a bit more, this is clearly nonsense. It fails on
conditions referencing multiple columns (WHERE a=b), which can't be
assigned to a single index column. So the logic would have to be much more
complicated, effectively doing what check_index_only is doing, just a tiny
bit later.

regards

--
Tomas Vondra                   http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

diff --git a/src/backend/optimizer/path/indxpath.c b/src/backend/optimizer/path/indxpath.c
index 9da5444..d257441 100644
--- a/src/backend/optimizer/path/indxpath.c
+++ b/src/backend/optimizer/path/indxpath.c
@@ -59,6 +59,7 @@ typedef struct
 	bool		nonempty;		/* True if lists are not all empty */
 	/* Lists of RestrictInfos, one per index column */
 	List	   *indexclauses[INDEX_MAX_KEYS];
+	List	   *rclauses;		/* clauses not implied by predicate */
 } IndexClauseSet;

 /* Per-path data used within choose_bitmap_and() */
@@ -129,7 +130,7 @@ static PathClauseUsage *classify_index_clause_usage(Path *path,
 static Relids get_bitmap_tree_required_outer(Path *bitmapqual);
 static void find_indexpath_quals(Path *bitmapqual, List **quals, List **preds);
 static int	find_list_position(Node *node, List **nodelist);
-static bool check_index_only(RelOptInfo *rel, IndexOptInfo *index);
+static bool check_index_only(RelOptInfo *rel, IndexOptInfo *index, List *clauses);
 static double get_loop_count(PlannerInfo *root, Index cur_relid,
 			   Relids outer_relids);
 static double adjust_rowcount_for_semijoins(PlannerInfo *root, Index cur_relid,
@@ -1019,7 +1020,7 @@ build_index_paths(PlannerInfo *root, RelOptInfo *rel,
 	 * index data retrieval anyway.
 	 */
 	index_only_scan = (scantype != ST_BITMAPSCAN &&
-					   check_index_only(rel, index));
+					   check_index_only(rel, index, clauses->rclauses));

 	/*
 	 * 4. Generate an indexscan path if there are relevant restriction clauses
@@ -1782,7 +1783,7 @@ find_list_position(Node *node, List **nodelist)
  * Determine whether an index-only scan is possible for this index.
  */
 static bool
-check_index_only(RelOptInfo *rel, IndexOptInfo *index)
+check_index_only(RelOptInfo *rel, IndexOptInfo *index, List *clauses)
 {
 	bool		result;
 	Bitmapset  *attrs_used = NULL;
@@ -1790,6 +1791,8 @@ check_index_only(RelOptInfo *rel, IndexOptInfo *index)
 	ListCell   *lc;
 	int			i;

+	List	   *rclauses;
+
 	/* Index-only scans must be enabled */
 	if (!enable_indexonlyscan)
 		return false;
@@ -1798,13 +1801,13 @@ check_index_only(RelOptInfo *rel, IndexOptInfo *index)
 	 * Check that all needed attributes of the relation are available from the
 	 * index.
 	 *
-	 * XXX this is overly conservative for partial indexes, since we will
-	 * consider attributes involved in the index predicate as required even
-	 * though the predicate won't need to be checked at runtime.
Re: [HACKERS] Improving test coverage of extensions with pg_dump
On 2015-09-26 21:54:46 +0900, Michael Paquier wrote:
> On Wed, Sep 23, 2015 at 1:04 AM, Alvaro Herrera wrote:
>> We discussed this in some other thread, not long ago. I looked briefly
>> in the archives but couldn't find it. I think the conclusion was
>> something along the lines of "hmm, tough".
> That's hours, even days of fun ahead for sure.

To me that's a somewhat serious bug, and something that we probably need to
address at some point.

> Honestly I have the feeling that the discussion of this thread gets
> unproductive, let's not forget that the patch presented on this thread is
> just aiming at adding one test case to ensure that extensions using
> dumpable relations with FKs get correctly dumped, which is to ensure that
> we do not reintroduce a bug that existed in the extension facility since
> its introduction in 9.1. That being said, the patch is just fine for this
> purpose, but that's just my opinion.

It's an unsustainable test model. Adding our own test runners, directories,
initdb etc. for a simple regression test of a couple of lines won't hurt
for maybe the first five, but after that it starts to get unmaintainable.

Greetings,

Andres Freund

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Rework the way multixact truncations work
Hi,

I pushed this to 9.5 and master, committing the xlog page magic bump
separately. To avoid using a magic value from master in 9.5, I bumped the
numbers by two in both branches.

Should this get a release note entry, given that we're not (at least
immediately) backpatching this?

Greetings,

Andres Freund

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] WIP: Rework access method interface
On 2015-09-25 17:45, Alvaro Herrera wrote:
> I think the API of getOpFamilyInfo is a bit odd; is the caller expected
> to fill intype and keytype always, and then it only sets the procs/opers
> lists? I wonder if it would be more sensible to have that routine
> receive the pg_opclass tuple (or even the opclass OID) instead of the
> opfamily OID.
>
> I think "amapi.h" is not a great file name. What about am_api.h?

Well, we have the related fdwapi.h and tsmapi.h (and several unrelated
*api.h which also don't use "_" in the name), so amapi.h seems fine to me.

> I'm unsure about BRIN_NPROC. Why did you set it to 15? It's not
> unthinkable that future opclass frameworks will have different numbers

BRIN_NPROC should probably be defined in brin.c, since it's only used for
sizing a local array variable in amvalidate, and it should then be used to
set amsupport in the init function as well.

> of support procs. For BRIN I'm thinking that we could add another
> support proc which validates the opclass definition using the specific
> framework; that way we will be able to check that the set of operators
> defined are correct, etc (something that the current approach cannot do).

As I said before in the thread, I would prefer a more granular approach to
validation - have amvalidateopclass in the struct for the current
functionality, so that we can easily add more validators in the future.
There can still be one amvalidate function exposed at the SQL level that
just calls all the amvalidate* functions that the AM defines.

--
Petr Jelinek                  http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] [PATCH] postgres_fdw extension support
On Sat, Sep 26, 2015 at 4:29 AM, Paul Ramsey wrote:
> On Thu, Aug 20, 2015 at 6:01 PM, Michael Paquier wrote:
> src/backend/utils/adt/format_type.c
> +/*
> + * This version allows a nondefault typemod to be specified and fully
> + * qualified.
> + */
> +char *
> +format_type_with_typemod_qualified(Oid type_oid, int32 typemod)
> +{
> +	return format_type_internal(type_oid, typemod, true, false, true);
> +}

Patch 0001 in this email has a routine called format_type_detailed that I
think is basically what we need for this stuff:

http://www.postgresql.org/message-id/CACACo5Q_UXYwF117LBhjZ3xaMPyrgqnqE=mxvrhefjj51ac...@mail.gmail.com

Could you look at it?

> I've put a very very small regression file in that tests the shippable
> feature, which can be fleshed out further as needed.
>
> Thanks so much!

Nice. Thanks for the addition of the regression tests. It begins to take a
nice shape.

+ * shippable.c
+ *	  Non-built-in objects cache management and utilities.
+ *
+ * Is a non-built-in shippable to the remote server? Only if
+ * the object is in an extension declared by the user in the
+ * OPTIONS of the wrapper or the server.

This is rather unclear. It would be simpler to describe that as "Facility
to track database objects shippable to a foreign server".

+extern bool extractExtensionList(char *extensionString,
+								 List **extensionOids);

What's the point of the boolean status in this new routine? The return
value of extractExtensionList is never checked, and it would be simpler to
just return the parsed list as the return value, no?

-REGRESS = postgres_fdw
+REGRESS = postgres_fdw shippable
+EXTRA_INSTALL = contrib/seg

The order of the tests is important and should be mentioned, shippable.sql
using the loopback server created by postgres_fdw.

+-- ===
+-- clean up
+-- ===

Perhaps here you meant dropping the schema and the foreign tables created
previously?

+CREATE SCHEMA "SH 1";

Is there a reason to use double-quoted relations?
There is no real reason to not use them, this is just to point out that this is not a common practice in the other regression tests. -- Michael -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
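The simplification suggested above — returning the parsed list directly instead of a boolean status plus an out parameter — can be sketched in standalone C. The names and the stand-in list type here are invented for illustration; the real code would use PostgreSQL's List type and look up pg_extension OIDs by name:

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* A tiny stand-in for PostgreSQL's List of extension OIDs. */
typedef struct OidCell
{
	unsigned int oid;
	struct OidCell *next;
} OidCell;

/*
 * Parse a comma-separated extension string and return the list directly.
 * No tokens yields an empty list (NULL).  Returning the list as the
 * function result, rather than "bool status + out parameter", removes a
 * return value that callers were never checking anyway.
 */
static OidCell *
extract_extension_list(const char *extension_string)
{
	OidCell    *head = NULL;
	char	   *copy = malloc(strlen(extension_string) + 1);
	char	   *tok;

	strcpy(copy, extension_string);
	for (tok = strtok(copy, ","); tok != NULL; tok = strtok(NULL, ","))
	{
		OidCell    *cell = malloc(sizeof(OidCell));
		const char *p;

		/* Hash the name into a fake OID, purely for the demo. */
		cell->oid = 0;
		for (p = tok; *p; p++)
			cell->oid = cell->oid * 31 + (unsigned char) *p;
		cell->next = head;
		head = cell;
	}
	free(copy);
	return head;
}

static int
extension_list_length(const OidCell *list)
{
	int			n = 0;

	for (; list != NULL; list = list->next)
		n++;
	return n;
}
```

A caller then just tests the returned list (empty vs. non-empty) instead of a separately threaded boolean.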
Re: [HACKERS] Rework the way multixact truncations work
Andres Freund writes:
> Should this get a release note entry given that we're not (at least
> immediately) backpatching this?

I'll probably put something in when I update the release notes for
beta1 (next week sometime); no real need to deal with it individually.

			regards, tom lane
Re: [HACKERS] Parallel Seq Scan
On Sat, Sep 26, 2015 at 10:16 AM, Robert Haas wrote:
> On Sat, Sep 26, 2015 at 8:38 AM, Robert Haas wrote:
>>> QueryDesc's totaltime is for instrumentation information for plugins
>>> like pg_stat_statements, and we need only the total buffer usage of
>>> each worker to make it work, as the other information is already
>>> collected in the master backend, so I think that should work as I
>>> have written.
>>
>> I don't think that's right at all.  First, an extension can choose to
>> look at any part of the Instrumentation, not just the buffer usage.
>> Secondly, the buffer usage inside QueryDesc's totaltime isn't the
>> same as the global pgBufferUsage.
>
> Oh... but I'm wrong.  As long as our local pgBufferUsage gets updated
> correctly to incorporate the data from the other workers, the
> InstrStopNode(queryDesc->totaltime) will suck in those statistics.
> And the only other things getting updated are nTuples (which
> shouldn't count anything that the workers did), firsttuple
> (similarly), and counter (where the behavior is a bit more arguable,
> but just counting the master's wall-clock time is at least
> defensible).  So now I think you're right: this should be OK.

OK, so here's a patch extracted from your
parallel_seqscan_partialseqscan_v18.patch with a fairly substantial
amount of rework by me:

- I left out the Funnel node itself; this is just the infrastructure
portion of the patch.  I also left out the stop-the-executor-early
stuff and the serialization of PARAM_EXEC values.  I want to have
those things, but I think they need more thought and study first.

- I reorganized the code a fair amount into a form that I thought was
clearer, and certainly is closer to what I did previously in
parallel.c.  I found that your version had lots of functions with lots
of parameters, which made the logic difficult to follow, at least for
me.  As part of that, I munged the interface a bit so that
execParallel.c returns a structure with a bunch of pointers in it
instead of separately returning each one as an out parameter.  I think
that's cleaner, and if we need to add more stuff in the future, that
way we don't break existing callers.

- I reworked the interface with instrument.c and tried to preserve
something of an abstraction boundary there.  I also changed the way
that stuff accumulated statistics to include more things; I couldn't
see any reason to make it as narrow as you had it.

- I did a bunch of cosmetic cleanup, including changing function names
and rewriting comments.

- I replaced your code for serializing and restoring a ParamListInfo
with my version.

- I fixed the code so that it can handle collecting instrumentation
data from multiple nodes, bringing all the data back to the leader and
associating it with the right plan node.  This involved giving every
plan node a unique ID, as discussed with Tom on another recent thread.

After I did all that, I wrote some test code, which is also attached
here, that adds a new GUC force_parallel_worker.  If you set that GUC,
when you run a query, it'll run the query in a parallel worker and
feed the results back to the master.  I've tested this and it seems to
work, at least on the queries where you'd expect it to work.  It's
just test code, so it doesn't have error checking or make any attempt
not to push down queries that will fail in parallel mode.  But you can
use it to see what happens.  You can also run queries under EXPLAIN
ANALYZE this way and, lo and behold, the worker stats show up attached
to the correct plan nodes.

I intend to commit this patch (but not the crappy test code, of
course) pretty soon, and then I'm going to start working on the
portion of the patch that actually adds the Funnel node, which I think
you are working on renaming to Gather.  I think that getting that part
committed is likely to be pretty straightforward; it doesn't need to
do a lot more than call this stuff and tell it to go do its thing.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

From 7c5fa754fada96553930820b970da876fb1dc468 Mon Sep 17 00:00:00 2001
From: Robert Haas
Date: Fri, 25 Sep 2015 17:50:19 -0400
Subject: [PATCH 2/2] Test code.

---
 src/backend/executor/execMain.c     | 47 -
 src/backend/executor/execParallel.c |  4 +++-
 src/backend/utils/misc/guc.c        | 11 +
 src/include/utils/guc.h             |  2 ++
 4 files changed, 62 insertions(+), 2 deletions(-)

diff --git a/src/backend/executor/execMain.c b/src/backend/executor/execMain.c
index 85ff46b..4863afd 100644
--- a/src/backend/executor/execMain.c
+++ b/src/backend/executor/execMain.c
@@ -45,6 +45,8 @@
 #include "commands/matview.h"
 #include "commands/trigger.h"
 #include "executor/execdebug.h"
+#include "executor/execParallel.h"
+#include "executor/tqueue.h"
 #include "foreign/fdwapi.h"
 #include "mb/pg_wchar.h"
 #include "miscadmin.h"
@@ -339,13
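The "structure with a bunch of pointers" interface described above can be sketched in standalone C. All names here are hypothetical stand-ins, not the actual execParallel.c definitions; the point is only the shape of the API:

```c
#include <assert.h>
#include <stdlib.h>

/*
 * Instead of filling several out parameters, initialization returns one
 * struct holding all the pointers.  Fields can be appended later without
 * breaking existing call sites, because callers only name the fields
 * they use.
 */
typedef struct ParallelExecutorInfo
{
	void   *plan_state;			/* executor state for the parallel plan */
	long   *buffer_usage;		/* per-worker buffer usage slots */
	long   *instrumentation;	/* per-worker instrumentation slots */
	int		nworkers;
} ParallelExecutorInfo;

static ParallelExecutorInfo *
exec_init_parallel_plan(int nworkers)
{
	ParallelExecutorInfo *pei = malloc(sizeof(ParallelExecutorInfo));

	pei->plan_state = NULL;
	pei->buffer_usage = calloc(nworkers, sizeof(long));
	pei->instrumentation = calloc(nworkers, sizeof(long));
	pei->nworkers = nworkers;
	return pei;
}
```

Compare this with `void init(..., long **bufusage, long **instr, ...)`: every new piece of shared state added to the out-parameter version forces an edit at every call site, while the struct version keeps old callers compiling unchanged.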
Re: [HACKERS] unclear about row-level security USING vs. CHECK
On 9/23/15 3:41 PM, Stephen Frost wrote:
> The CREATE POLICY documentation discusses how lack of a WITH CHECK
> policy means the USING expression is used:
>
> """
> Policies can be applied for specific commands or for specific roles.
> The default for newly created policies is that they apply for all
> commands and roles, unless otherwise specified.  If multiple policies
> apply to a given query, they will be combined using OR (although ON
> CONFLICT DO UPDATE and INSERT policies are not combined in this way,
> but rather enforced as noted at each stage of ON CONFLICT execution).
> Further, for commands which can have both USING and WITH CHECK
> policies (ALL and UPDATE), if no WITH CHECK policy is defined then the
> USING policy will be used for both what rows are visible (normal USING
> case) and which rows will be allowed to be added (WITH CHECK case).
> """

I see.  But it is a bit odd to hide this very fundamental behavior
somewhere in a paragraph that starts out with something about roles.

There is also a mistake, I believe: DELETE policies also take both a
CHECK and a USING clause.

I still find something about this weird, but I'm not sure what.  It's
not clear to me at what level this USING->CHECK mapping is applied.  I
can write FOR ALL USING and it will be mapped to CHECK for all
actions, including INSERT, but when I write FOR INSERT USING it
complains.  Why doesn't it do the mapping in that case, too?

>> (Btw., what's the meaning of a policy for DELETE?)
>
> The DELETE policy controls what records a user is able to delete.

That needs to be documented somewhere.
Re: [HACKERS] Parallel Seq Scan
On Sat, Sep 26, 2015 at 8:38 AM, Robert Haas wrote:
>> QueryDesc's totaltime is for instrumentation information for plugins
>> like pg_stat_statements, and we need only the total buffer usage of
>> each worker to make it work, as the other information is already
>> collected in the master backend, so I think that should work as I
>> have written.
>
> I don't think that's right at all.  First, an extension can choose to
> look at any part of the Instrumentation, not just the buffer usage.
> Secondly, the buffer usage inside QueryDesc's totaltime isn't the
> same as the global pgBufferUsage.

Oh... but I'm wrong.  As long as our local pgBufferUsage gets updated
correctly to incorporate the data from the other workers, the
InstrStopNode(queryDesc->totaltime) will suck in those statistics.
And the only other things getting updated are nTuples (which shouldn't
count anything that the workers did), firsttuple (similarly), and
counter (where the behavior is a bit more arguable, but just counting
the master's wall-clock time is at least defensible).  So now I think
you're right: this should be OK.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
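The reasoning above — that anything folded into the global usage counter before InstrStopNode runs gets credited to the node's accumulated totals — can be modeled with a minimal sketch. These are invented names, not the real instrument.c API; the real code tracks a full BufferUsage struct rather than one counter:

```c
#include <assert.h>

/* Stands in for the global pgBufferUsage counter. */
static long pg_buffer_usage;

typedef struct Instr
{
	long	start_usage;		/* snapshot taken at node start */
	long	total_usage;		/* accumulated across start/stop pairs */
} Instr;

/* Snapshot the global counter when the node begins executing. */
static void
instr_start_node(Instr *instr)
{
	instr->start_usage = pg_buffer_usage;
}

/* Credit everything added since the snapshot to this node. */
static void
instr_stop_node(Instr *instr)
{
	instr->total_usage += pg_buffer_usage - instr->start_usage;
}

/* Called in the leader when a worker's totals arrive: because this
 * bumps the same global counter, a pending InstrStopNode will absorb
 * the worker's work along with the leader's own. */
static void
absorb_worker_usage(long worker_usage)
{
	pg_buffer_usage += worker_usage;
}
```

This is the mechanism behind "InstrStopNode(queryDesc->totaltime) will suck in those statistics": worker totals merged into the global counter between start and stop are indistinguishable from local work.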
Re: [HACKERS] Reusing abbreviated keys during second pass of ordered [set] aggregates
On Sun, Sep 6, 2015 at 9:35 PM, Thomas Munro wrote:
> Running various contrived aggregate queries on a large low-cardinality
> dataset in a small range (therefore frequently the same weight &
> size), I managed to measure a small improvement of up to a few percent
> with the attached patch.  I also wonder whether numeric_cmp could be
> profitably implemented with memcmp on big-endian systems and some
> unrolled loops on little-endian systems when size & weight match.

I think that setting up numeric SortSupport to have scratch memory
used across authoritative numeric comparator calls would also help.
We should prefer to do this kind of thing in a datatype-independent
way, of course.  I'm not opposed to doing what you outline too, but I
don't think it will be especially helpful for the cases here.  I think
that what you're talking about would live in the SortSupport
authoritative comparator, and would benefit non-leading-attribute
comparisons most.

> Of course there are a ton of other overheads involved with numeric.
> I wonder how crazy or difficult it would be to make it so that we
> could optionally put a pass-by-value NUMERIC in a Datum, setting a
> low-order tag bit to say 'I'm an immediate value, not a pointer', and
> then packing 3 digits (= 12 significant figures) + sign + weight into
> the other 63 bits.

That seems possible, but very invasive.  I'd want to get a good sense
of the pay-off before undertaking such a project.

--
Peter Geoghegan
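The low-order tag-bit scheme Thomas describes can be sketched in standalone C. It relies on allocated (palloc'd) pointers being at least 2-byte aligned, so a genuine pointer never has its low bit set; the sign/weight/digit packing of the remaining 63 bits is omitted here and the payload treated as opaque, since that layout is the invented part:

```c
#include <assert.h>
#include <stdint.h>

/* A 64-bit Datum, as on pass-by-value platforms. */
typedef uint64_t Datum;

#define NUMERIC_IMMED_TAG	((Datum) 1)

/* An aligned pointer never has its low bit set, so the tag is
 * unambiguous: tagged means immediate value, untagged means pointer. */
static int
datum_is_immediate(Datum d)
{
	return (d & NUMERIC_IMMED_TAG) != 0;
}

/* Pack a 63-bit payload (which would hold sign + weight + digits)
 * above the tag bit and mark it as immediate. */
static Datum
numeric_make_immediate(uint64_t payload63)
{
	return (payload63 << 1) | NUMERIC_IMMED_TAG;
}

/* Recover the packed payload from an immediate Datum. */
static uint64_t
numeric_immediate_payload(Datum d)
{
	return d >> 1;
}
```

Every consumer of a numeric Datum would first test the tag bit and either unpack in place or detoast/dereference as today, which is where the "very invasive" assessment comes from.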