Re: [HACKERS] store narrow values in hash indexes?
On Sat, Sep 24, 2016 at 1:02 AM, Robert Haas wrote:
> Currently, hash indexes always store the hash code in the index, but
> not the actual Datum. It's recently been noted that this can make a
> hash index smaller than the corresponding btree index would be if the
> column is wide. However, if the index is being built on a fixed-width
> column with a typlen <= sizeof(Datum), we could store the original
> value in the hash index rather than the hash code without using any
> more space. That would complicate the code, but I bet it would be
> faster: we wouldn't need to set xs_recheck, we could rule out hash
> collisions without visiting the heap, and we could support index-only
> scans in such cases.

What exactly do you mean by Datum? Is it for datatypes that fit into 64
bits, like integer? I think if we are able to support index-only scans
for hash indexes for some data types, that will be a huge plus. Surely,
there is some benefit without index-only scans as well, which is that we
can avoid the recheck, but I am not sure that alone can give us any big
performance boost. As you say, it might lead to some complication in the
code, but I think it is worth trying.

Won't it add some requirements for pg_upgrade as well?

--
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Speed up Clog Access by increasing CLOG buffers
On Fri, Sep 23, 2016 at 8:22 PM, Tomas Vondra wrote:
> On 09/23/2016 03:07 PM, Amit Kapila wrote:
>> On Fri, Sep 23, 2016 at 6:16 PM, Tomas Vondra wrote:
>>> On 09/23/2016 01:44 AM, Tomas Vondra wrote:
>>>> ... The 4.5 kernel clearly changed the results significantly:
>>>> ...
>>>> (c) Although it's not visible in the results, 4.5.5 almost
>>>> perfectly eliminated the fluctuations in the results. For example,
>>>> when 3.2.80 produced these results (10 runs with the same
>>>> parameters):
>>>>
>>>> 12118 11610 27939 11771 18065 12152 14375 10983 13614 11077
>>>>
>>>> we get this on 4.5.5:
>>>>
>>>> 37354 37650 37371 37190 37233 38498 37166 36862 37928 38509
>>>>
>>>> Notice how much more even the 4.5.5 results are, compared to
>>>> 3.2.80.
>>>
>>> The more I think about these random spikes in pgbench performance on
>>> 3.2.80, the more I find them intriguing. Let me show you another
>>> example (from Dilip's workload and group-update patch on 64
>>> clients).
>>>
>>> This is on 3.2.80:
>>>
>>> 44175 34619 51944 38384 49066
>>> 37004 47242 36296 46353 36180
>>>
>>> and on 4.5.5 it looks like this:
>>>
>>> 34400 35559 35436 34890 34626
>>> 35233 35756 34876 35347 35486
>>>
>>> So the 4.5.5 results are much more even, but overall clearly below
>>> 3.2.80. How does 3.2.80 manage to do ~50k tps in some of the runs?
>>> Clearly we randomly do something right, but what is it, and why
>>> doesn't it happen on the new kernel? And how could we do it every
>>> time?
>>
>> As far as I can see you are using default values of min_wal_size,
>> max_wal_size, and checkpoint related params. Have you changed the
>> default shared_buffers setting? Because that can have a bigger
>> impact.
>
> Huh? Where do you see me using default values?

I was referring to one of your scripts @ http://bit.ly/2doY6ID. I had
not noticed that you had changed the default values in postgresql.conf.
> There is settings.log with a dump of pg_settings data, and the
> modified values are
>
> checkpoint_completion_target = 0.9
> checkpoint_timeout = 3600
> effective_io_concurrency = 32
> log_autovacuum_min_duration = 100
> log_checkpoints = on
> log_line_prefix = %m
> log_timezone = UTC
> maintenance_work_mem = 524288
> max_connections = 300
> max_wal_size = 8192
> min_wal_size = 1024
> shared_buffers = 2097152
> synchronous_commit = on
> work_mem = 524288
>
> (ignoring some irrelevant stuff like locales, timezone etc.).
>
>> Using default values of mentioned parameters can lead to checkpoints
>> in between your runs.
>
> So I'm using 16GB shared buffers (so with scale 300 everything fits
> into shared buffers), min_wal_size=16GB, max_wal_size=128GB,
> checkpoint timeout 1h etc. So no, there are no checkpoints during the
> 5-minute runs, only those triggered explicitly before each run.

Thanks for the clarification. Do you think we should try some different
settings for the *_flush_after parameters, as those can help in reducing
spikes in writes?

>> Also, I think instead of 5 mins, read-write runs should be run for 15
>> mins to get consistent data.
>
> Where does the inconsistency come from?

That's what I am also curious to know.

> Lack of warmup?

Can't say, but at least we should try to rule out the possibilities. I
think one way to rule it out is to do slightly longer runs for Dilip's
test cases, and for pgbench we might need to drop and re-create the
database after each reading.

> Considering how uniform the results from the 10 runs are (at least on
> 4.5.5), I claim this is not an issue.

It is quite possible that it is some kernel regression which might be
fixed in a later version. For instance, we are doing most tests on
cthulhu, which has a 3.10 kernel, and we generally get consistent
results there.
I am not sure that a later kernel version such as 4.5.5 is a net win,
because there is a considerable dip in performance in that version, even
though it produces quite stable results.

--
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com
[HACKERS] Complete LOCK TABLE ... IN ACCESS|ROW|SHARE
Hi,

After LOCK TABLE ... IN ACCESS|ROW|SHARE we run out of tab completions.
Here's a patch to improve that, for November.

--
Thomas Munro
http://www.enterprisedb.com

complete-lock-table.patch
Description: Binary data
Re: [HACKERS] Windows service is not starting so there’s message in log: FATAL: "could not create shared m
On Fri, Sep 23, 2016 at 7:41 PM, Tom Lane wrote:
> Amit Kapila writes:
>> Now, it is quite possible that the error code is set to 0 on success
>> in my Windows environment (Win7) and doesn't work in some other
>> environment. In any case, if we want to go ahead and don't want to
>> rely on CreateFileMapping(), then the attached patch should suffice
>> the need.
>
> Yeah, seeing that this is not mentioned in MS' documentation I don't
> think we should rely on it. The patch looks fine to me, pushed.

Thanks.

--
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com
[HACKERS] Refactor StartupXLOG?
Hi hackers,

StartupXLOG is 1549 lines of code. Its unwieldy size came up in a
discussion in an IRC channel where some PG hackers lurk, and I decided
to see how it might be chopped up into subroutines with a clear purpose
and reasonable size, as an exercise. I came up with the following on a
first pass, though obviously you could go a lot further:

StartupXLOG -- reduced to 651 lines
-> ReportControlFileState() -- 40 lines to log recovery startup message
-> ReportRecoveryTarget() -- 32 lines to log recovery target message
-> ProcessBackupLabel() -- 90 lines to read backup label's checkpoint
-> CleanUpTablespaceMap() -- 33 lines of file cleanup
-> BeginRedo() -- 191 lines for redo setup
-> BeginHotStandby() -- 74 lines for hot standby initialisation
-> ReplayRedo() -- 260 lines for the main redo loop
-> FinishRedo() -- 63 lines for redo completion
-> AssignNewTimeLine() -- 66 lines
-> CleanUpOldTimelineSegments() -- 74 lines of file cleanup
-> RecoveryCheckpoint() -- 71 lines

Unfortunately the attached sketch patch (not tested much) is totally
unreadable as a result of moving so many things around. It would
probably need to be done as a series of small, well-tested, readable
patches moving one thing out at a time, perhaps with an extra
context/state object to cut down on wide argument lists.

What would the appetite be for that kind of refactoring work,
considering the increased burden on committers who have to backpatch bug
fixes? Is it a project goal to reduce the size of large complicated
functions like StartupXLOG and heap_update? It seems like a good way for
new players to learn how they work.

--
Thomas Munro
http://www.enterprisedb.com

refactor-startupxlog-sketch.patch
Description: Binary data
Re: [HACKERS] Logical Replication WIP
On 21/09/16 15:04, Peter Eisentraut wrote:
> Some partial notes on 0005-Add-logical-replication-workers.patch:
>
> Document to what extent other relation types are supported (e.g.,
> materialized views as source, view or foreign table or temp table as
> target). Suggest an updatable view as target if user wants to have
> different table names or write into a different table structure.

I don't think that's a good suggestion; for one, it won't work for
UPDATEs, as we have a completely different path for finding the tuple to
update which only works on real data, not on a view. I am thinking of
even just allowing table-to-table replication in v1 tbh, but yes, the
supported target relation types should be documented.

> subscriptioncmds.c: Perhaps combine logicalrep_worker_find() and
> logicalrep_worker_stop() into one call that also encapsulates the
> required locking.

I was actually thinking of moving the wait loop that waits for the
worker to finish there as well.

> In get_subscription_list(), the memory context pointers don't appear
> to do anything useful, because everything ends up being
> CurrentMemoryContext.

That's kind of the point of the memory context pointers there, though,
as we start a transaction inside that function.

> pg_stat_get_subscription(NULL) for "all" seems a bit of a weird
> interface.

I modeled that after pg_stat_get_activity(), which seems to be a similar
type of interface.

> pglogical_apply_main not used, should be removed.

Hah.

> In logicalreprel_open(), the error message "cache lookup failed for
> remote relation %u" could be clarified. This message could probably
> happen if the protocol did not send a Relation message first. (The
> term "cache" is perhaps inappropriate for LogicalRepRelMap, because it
> implies that the value can be gotten from elsewhere if it's not in the
> cache. In this case it's really session state that cannot be recovered
> easily.)

Yeah, I have different code and a different error for that now.
--
Petr Jelinek
http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
Re: [HACKERS] asynchronous execution
[ Adjusting subject line to reflect the actual topic of discussion better. ]

On Fri, Sep 23, 2016 at 9:29 AM, Robert Haas wrote:
> On Fri, Sep 23, 2016 at 8:45 AM, Amit Khandekar wrote:
>> For e.g., in the above plan which you specified, suppose:
>> 1. Hash Join has called ExecProcNode() for the child foreign scan b,
>> and so is waiting in ExecAsyncWaitForNode(foreign_scan_on_b).
>> 2. The event wait list already has foreign scan on a that is on a
>> different subtree.
>> 3. This foreign scan a happens to be ready, so in
>> ExecAsyncWaitForNode(), ExecDispatchNode(foreign_scan_a) is called,
>> which returns with result_ready.
>> 4. Since it returns result_ready, its parent node is now inserted in
>> the callbacks array, and so its parent (Append) is executed.
>> 5. But this Append planstate is already in the middle of executing
>> Hash Join, and is waiting for HashJoin.
>
> Ah, yeah, something like that could happen. I've spent much of this
> week working on a new design for this feature which I think will avoid
> this problem. It doesn't work yet - in fact I can't even really test
> it yet. But I'll post what I've got by the end of the day today so
> that anyone who is interested can look at it and critique.

Well, I promised to post this, so here it is. It's not really working
all that well at this point, and it's definitely not doing anything that
interesting, but you can see the outline of what I have in mind.

Since Kyotaro Horiguchi found that my previous design had a system-wide
performance impact due to the ExecProcNode changes, I decided to take a
different approach here: I created an async infrastructure where both
the requestor and the requestee have to be specifically modified to
support asynchrony, and then modified Append and ForeignScan to
cooperate using the new interface. Hopefully that means that anything
other than those two nodes will suffer no performance impact.
Of course, it might have other problems. Some notes:

- EvalPlanQual rechecks are broken.
- EXPLAIN ANALYZE instrumentation is broken.
- ExecReScanAppend is broken, because the async stuff needs some way of
  canceling an async request and I didn't invent anything like that yet.
- The postgres_fdw changes pretend to be async but aren't actually. It's
  just a demo of (part of) the interface at this point.
- The postgres_fdw changes also report all postgres_fdw paths as
  async-capable, but actually the direct-modify ones aren't, so the
  regression tests fail.
- Errors in the executor can leak the WaitEventSet. Probably we need to
  modify ResourceOwners to be able to own WaitEventSets.
- There are probably other bugs, too. Whee!

Note that I've tried to solve the re-entrancy problems by (1) putting
all of the event loop's state inside the EState rather than in local
variables, and (2) having the function that is called to report the
arrival of a result be thoroughly different from the function that is
used to return a tuple to a synchronous caller.

Comments welcome, if you're feeling brave enough to look at anything
this half-baked.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

async-wip-2016-09-23.patch
Description: binary/octet-stream
Re: [HACKERS] tbm_lossify causes "unbalanced" hashtables / bitmaps
On 2016-09-23 17:40:13 -0400, Tom Lane wrote:
> My idea of an appropriate fix would be to resume the scan at the same
> point where the last scan stopped, and work circularly around the
> table when necessary.

I've played with that idea, and it does help a great deal. Not that
surprisingly, it's better than starting at a random point (which in turn
is better than starting at one end all the time).

> But I'm not sure there is any really good way to do that in the
> dynahash infrastructure. Maybe it'd work to keep the iteration state
> around, but I don't remember how well that interacts with other
> insertions/deletions. What about in your implementation?

It's easy enough to specify a start point. It requires exposing some
things that I don't necessarily want to - that's why I played around
with a random start point - but other than that it's easy to implement.
Internally growing the hashtable would be kind of an issue, invalidating
that point, but (a) most of the time we're not growing any more at that
point, and (b) it would be quite harmless to start at the wrong point.

> There's also the point mentioned in the existing comment, that it'd be
> better to go after pages with more bits set first. Not sure of an
> inexpensive way to do that (ie, one that doesn't involve multiple
> scans of the hashtable). But your results suggest that maybe it'd be
> worth making tbm_lossify slower in order to get better results.

It's not easy, I agree.

Greetings,

Andres Freund
Re: [HACKERS] tbm_lossify causes "unbalanced" hashtables / bitmaps
Andres Freund writes:
> Because iteration (both in my implementation and dynahash) is
> essentially in bucket order, the front of the hashtable will be mostly
> empty, whereas later parts of the hashtable will be full (i.e. a lot
> of collisions). The reason for that is that we'll find all parts of
> the hashtable that are uncompressed and "early" in the hashspace, and
> they'll possibly hash to something later in the table.

Hm, yeah, I had supposed that this would hit a random part of the key
space, but you're right that over successive calls it's not good.

My idea of an appropriate fix would be to resume the scan at the same
point where the last scan stopped, and work circularly around the table
when necessary. But I'm not sure there is any really good way to do that
in the dynahash infrastructure. Maybe it'd work to keep the iteration
state around, but I don't remember how well that interacts with other
insertions/deletions. What about in your implementation?

There's also the point mentioned in the existing comment, that it'd be
better to go after pages with more bits set first. Not sure of an
inexpensive way to do that (ie, one that doesn't involve multiple scans
of the hashtable). But your results suggest that maybe it'd be worth
making tbm_lossify slower in order to get better results.

regards, tom lane
Re: [HACKERS] tbm_lossify causes "unbalanced" hashtables / bitmaps
On 2016-09-23 13:58:43 -0700, Andres Freund wrote:
> static void
> tbm_lossify(TIDBitmap *tbm)
> {
> ...
>     pagetable_start_iterate_random(tbm->pagetable, );

Uh, obviously that's from one of my attempts to address the problem.

Greetings,

Andres Freund
[HACKERS] tbm_lossify causes "unbalanced" hashtables / bitmaps
Hi,

Playing around with my hashtable implementation [1] and using it for
bitmap scans/tidbitmap.c, I noticed something curious. When using small
work_mem (4MB in my case) for large-ish tables (~5GB), the performance
tanks. That's primarily caused by some issues in the hashtable code
(since fixed), but it made me look more at the relevant tidbitmap code,
namely tbm_lossify():

static void
tbm_lossify(TIDBitmap *tbm)
{
    ...
    pagetable_start_iterate_random(tbm->pagetable, );
    while ((page = pagetable_iterate(tbm->pagetable, )) != NULL)
    {
        if (page->ischunk)
            continue;           /* already a chunk header */

        /*
         * If the page would become a chunk header, we won't save
         * anything by converting it to lossy, so skip it.
         */
        if ((page->blockno % PAGES_PER_CHUNK) == 0)
            continue;

        /* This does the dirty work ... */
        tbm_mark_page_lossy(tbm, page->blockno);

        if (tbm->nentries <= tbm->maxentries / 2)
        {
            /* we have done enough */
            break;
        }
    }

tbm_mark_page_lossy() in turn deletes the individual page entry, and
adds one for the chunk (spanning several pages). The relevant part is
that the loop stops when enough page ranges have been lossified.

This leads to some problems: Because iteration (both in my
implementation and dynahash) is essentially in bucket order, the front
of the hashtable will be mostly empty, whereas later parts of the
hashtable will be full (i.e. a lot of collisions). The reason for that
is that we'll find all parts of the hashtable that are uncompressed and
"early" in the hashspace, and they'll possibly hash to something later
in the table.

As the number of entries in the hashtable doesn't increase, we'll never
grow the hashtable to decrease the number of collisions. I've seen that
cause a quite noticeable number of conflicts (which of course are worse
in an open addressing table than in a separately chained one).

Secondly, it'll lead to pretty random parts of the bitmap being
compressed.
For performance it'd be good to compress "heavily used" areas of the
bitmap, instead of just whatever happens to be early in the hash space.

I'm not entirely sure how to resolve that best, but it does seem worth
addressing. One thing that might be nice is to record the last N
insertions once the table is 90% full or so, and then lossify in a more
targeted manner using those, e.g. lossify block ranges (256 blocks long
atm) that contain a lot of individual entries, or such.

Greetings,

Andres Freund

[1] http://archives.postgresql.org/message-id/20160727004333.r3e2k2y6fvk2ntup%40alap3.anarazel.de
Re: [HACKERS] PG 9.6.0 release schedule
On 09/23/2016 01:31 PM, Tom Lane wrote:
> Apologies for the late notice on this, but the release team has
> concluded that 9.6 is about as ready as it's going to get. We are
> planning to go ahead with the 9.6.0 release next week --- that is,
> wrap Monday 26th for public announcement on Thursday 29th.
>
> Thanks to the Release Management Team (Robert, Alvaro, and Noah) for
> driving 9.6 towards an on-time release!

+100

--
Crunchy Data - http://crunchydata.com
PostgreSQL Support for Secure Enterprises
Consulting, Training, & Open Source Development

signature.asc
Description: OpenPGP digital signature
[HACKERS] PG 9.6.0 release schedule
Apologies for the late notice on this, but the release team has
concluded that 9.6 is about as ready as it's going to get. We are
planning to go ahead with the 9.6.0 release next week --- that is, wrap
Monday 26th for public announcement on Thursday 29th.

Thanks to the Release Management Team (Robert, Alvaro, and Noah) for
driving 9.6 towards an on-time release!

regards, tom lane
Re: [HACKERS] patch: function xmltable
Hi,

2016-09-23 10:07 GMT+02:00 Craig Ringer:
>> Did some docs copy-editing and integrated some examples.
>
> Whoops, forgot to attach.
>
> Rather than sending a whole new copy of the patch, here's a diff
> against your patched tree of my changes so you can see what I've done
> and apply the parts you want.
>
> Note that I didn't update the expected files.

I applied your patch - there is a small misunderstanding. The PATH is
evaluated once per input row already. (It is not clean in the code,
because the executor node is started and running for all rows.) I
changed that in your part of the doc, so that it now says the PATH
expressions are normally evaluated exactly once per input row, rather
than once per result row.

Regards,

Pavel

> --
> Craig Ringer http://www.2ndQuadrant.com/
> PostgreSQL Development, 24x7 Support, Training & Services
Re: [HACKERS] Hash Indexes
On 2016-09-23 15:19:14 -0400, Robert Haas wrote:
> On Wed, Sep 21, 2016 at 10:33 PM, Andres Freund wrote:
>> On 2016-09-21 22:23:27 -0400, Tom Lane wrote:
>>> Andres Freund writes:
>>>> Sure. But that can be addressed, with a lot less effort than fixing
>>>> and maintaining the hash indexes, by adding the ability to do that
>>>> transparently using btree indexes + a recheck internally. How that
>>>> compares efficiency-wise is unclear as of now. But I do think it's
>>>> something we should measure before committing the new code.
>>>
>>> TBH, I think we should reject that argument out of hand. If someone
>>> wants to spend time developing a hash-wrapper-around-btree AM,
>>> they're welcome to do so. But to kick the hash AM as such to the
>>> curb is to say "sorry, there will never be O(1) index lookups in
>>> Postgres".
>>
>> Note that I'm explicitly *not* saying that. I just would like to see
>> actual comparisons being made before investing significant amounts of
>> code and related effort in fixing the current hash table
>> implementation. And I haven't seen a lot of that. If the result of
>> that comparison is that hash indexes actually perform very well:
>> Great!
>
> Yeah, I just don't agree with that. I don't think we have any policy
> that you can't develop A and get it committed unless you try every
> alternative that some other community member thinks might be better in
> the long run first.

Huh. I think we make such arguments *ALL THE TIME*. Anyway, I don't see
much point in continuing to discuss this; I'm clearly in the minority.
Re: [HACKERS] store narrow values in hash indexes?
Robert Haas writes:
> Another thought is that hash codes are 32 bits, but a Datum is 64 bits
> wide on most current platforms. So we're wasting 4 bytes per index
> tuple storing nothing.

Datum is not a concept that exists on-disk. What's stored is the 32-bit
hash value. You're right that we waste space if the platform's MAXALIGN
is 8, but that's the fault of the alignment requirement, not the index
definition.

> If we generated 64-bit hash codes we could store as many bits of it as
> a Datum will hold and reduce hash collisions.

I think there is considerable merit in trying to move to 64-bit hash
codes (at least for data types that are more than 4 bytes to begin
with), but that's largely in the hope of reducing hash collisions in
very large indexes, not because we'd avoid wasting alignment pad space.
If we do support that, I think we would do it regardless of the platform
MAXALIGN.

regards, tom lane
Re: [HACKERS] 9.6 TAP tests and extensions
Craig Ringer writes:
> On 23 September 2016 at 00:32, Tom Lane wrote:
>> Certainly there are restrictions, but I'd imagine that every new
>> release will be adding features to the TAP test infrastructure for
>> some time to come. I think it's silly to claim that 9.6 is the first
>> branch where TAP testing is usable at all.
>
> Also, despite what I said upthread, there's no need for normal
> PGXS-using extensions to define their $(prove_installcheck)
> replacement. It works as-is. The problems I was having stemmed from
> the fact that I was working with a pretty nonstandard Makefile without
> realising that the changes were going to affect prove. $(prove_check)
> isn't useful from extensions of course since it expects a temp install
> and PGXS doesn't know how to make one, but $(prove_installcheck) does
> the job fine.
>
> It's thus sufficient to apply the patch to install the perl modules to
> 9.4, 9.5 and 9.6. Nothing else is needed. I've attached backports for
> 9.4 and 9.5.

Pushed with cosmetic adjustments --- mostly, I followed the project
standard that makefiles are named Makefile. (We use a GNUmakefile only
in directories where it seems likely that non-developers would invoke
make by hand, and there's supposed to be a Makefile beside them that
throws an error whining about how you're not using GNU make. I see no
need for that in src/test/perl.)

Looking back over the thread, I see that you also proposed installing
isolationtester and pg_isolation_regress for the benefit of extensions.
I'm very much less excited about that idea. It'd be substantially more
dead weight in typical installations, I'm not sure that it'd be useful
to common extensions, and I'm not eager to treat isolationtester's API
and behavior as something we need to hold stable for extension use.

regards, tom lane
[HACKERS] store narrow values in hash indexes?
Currently, hash indexes always store the hash code in the index, but not
the actual Datum. It's recently been noted that this can make a hash
index smaller than the corresponding btree index would be if the column
is wide. However, if the index is being built on a fixed-width column
with a typlen <= sizeof(Datum), we could store the original value in the
hash index rather than the hash code without using any more space. That
would complicate the code, but I bet it would be faster: we wouldn't
need to set xs_recheck, we could rule out hash collisions without
visiting the heap, and we could support index-only scans in such cases.

Another thought is that hash codes are 32 bits, but a Datum is 64 bits
wide on most current platforms. So we're wasting 4 bytes per index tuple
storing nothing. If we generated 64-bit hash codes we could store as
many bits of it as a Datum will hold and reduce hash collisions.
Alternatively, we could try to stick some other useful information in
those bytes, like an abbreviated key.

Not sure if these are good ideas. They're just ideas.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
Re: [HACKERS] PL/Python adding support for multi-dimensional arrays
On 9/23/16 2:42 AM, Heikki Linnakangas wrote:
> How do we handle single-dimensional arrays of composite types at the
> moment? At a quick glance, it seems that the composite types are just
> treated like strings, when they're in an array. That's probably OK,
> but it means that there's nothing special about composite types in
> multi-dimensional arrays. In any case, we should mention that in the
> docs.

That is how they're handled, but I'd really like to change that. I've
held off because I don't know how to handle the backwards
incompatibility that would introduce. (I've been wondering if we might
add a facility to allow specifying default TRANSFORMs that should be
used for specific data types in specific languages.)

The converse case (a composite with arrays) suffers the same problem
(the array is just treated as a string).

--
Jim Nasby, Data Architect, Blue Treble Consulting, Austin TX
Experts in Analytics, Data Architecture and PostgreSQL
Data in Trouble? Get it in Treble! http://BlueTreble.com
855-TREBLE2 (855-873-2532) mobile: 512-569-9461
Re: [HACKERS] Hash Indexes
On Wed, Sep 21, 2016 at 10:33 PM, Andres Freund wrote:
> On 2016-09-21 22:23:27 -0400, Tom Lane wrote:
>> Andres Freund writes:
>>> Sure. But that can be addressed, with a lot less effort than fixing
>>> and maintaining the hash indexes, by adding the ability to do that
>>> transparently using btree indexes + a recheck internally. How that
>>> compares efficiency-wise is unclear as of now. But I do think it's
>>> something we should measure before committing the new code.
>>
>> TBH, I think we should reject that argument out of hand. If someone
>> wants to spend time developing a hash-wrapper-around-btree AM,
>> they're welcome to do so. But to kick the hash AM as such to the curb
>> is to say "sorry, there will never be O(1) index lookups in
>> Postgres".
>
> Note that I'm explicitly *not* saying that. I just would like to see
> actual comparisons being made before investing significant amounts of
> code and related effort in fixing the current hash table
> implementation. And I haven't seen a lot of that. If the result of
> that comparison is that hash indexes actually perform very well:
> Great!

Yeah, I just don't agree with that. I don't think we have any policy
that you can't develop A and get it committed unless you try every
alternative that some other community member thinks might be better in
the long run first. If we adopt such a policy, we'll have no developers
and no new features. Also, in this particular case, I think there's no
evidence that the alternative you are proposing would actually be better
or less work to maintain.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
Re: [HACKERS] Rename max_parallel_degree?
On Fri, Sep 23, 2016 at 8:54 AM, Peter Eisentraut wrote: > On 9/20/16 4:07 PM, Robert Haas wrote: >> No, I'm assuming that the classes would be built-in. A string tag >> seems like over-engineering to me, particularly because the postmaster >> needs to switch on the tag, and we need to be very careful about the >> degree to which the postmaster trusts the contents of shared memory. > > I'm hoping that we can come up with something that extensions can > participate in, without the core having to know ahead of time what those > extensions are or how they would be categorized. > > My vision is something like > > max_processes = 512 # requires restart > > process_allowances = 'connection:300 superuser:10 autovacuum:10 > parallel:30 replication:10 someextension:20 someotherextension:20' > # does not require restart I don't think it's going to be very practical to allow extensions to participate in the mechanism, because there has to be a finite number of slots that is known at the time we create the main shared memory segment. Also, it's really important that we don't add lots more surface area for the postmaster to crash and burn. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
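Peter's proposed `process_allowances` GUC is hypothetical, but its parsing and validation are easy to picture. A rough sketch (illustrative only, not an actual patch — the function name and error behavior are assumptions) of splitting the string into per-class slot counts and checking them against `max_processes`:

```python
def parse_allowances(s, max_processes):
    """Parse 'class:count class:count ...' into a dict, validating the total."""
    out = {}
    for tok in s.split():
        name, _, count = tok.rpartition(':')
        if not name or not count.isdigit():
            raise ValueError('malformed allowance: %r' % tok)
        out[name] = int(count)
    if sum(out.values()) > max_processes:
        raise ValueError('allowances exceed max_processes = %d' % max_processes)
    return out

allow = parse_allowances(
    'connection:300 superuser:10 autovacuum:10 parallel:30 '
    'replication:10 someextension:20 someotherextension:20',
    max_processes=512)
```

Robert's objection is visible even in this toy: the *total* (512 slots) must be fixed when shared memory is created, so while the per-class split could change without restart, arbitrary extension classes still have to fit inside that pre-sized pool.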
Re: [HACKERS] PATCH: Exclude additional directories in pg_basebackup
On 9/12/16 2:50 PM, Peter Eisentraut wrote: > The comments on excludeDirContents are somewhat imprecise. For > example, none of those directories are actually removed or recreated > on startup, only the contents are. Fixed. > The comment for pg_replslot is incorrect. I think you can copy > replication slots just fine, but you usually don't want to. Fixed. > What is the 2 for in this code? > > if (strcmp(pathbuf + 2, excludeFile[excludeIdx]) == 0) This should be basepathlen + 1 (i.e., $PGDATA/). Fixed. > I would keep the signature of _tarWriteDir() similar to > _tarWriteHeader(). So don't pass sizeonly in there, or do pass in > sizeonly but do the same with _tarWriteHeader(). I'm not sure how much more similar they can be, but I do agree that sizeonly logic should be wrapped in _tarWriteHeader(). Done. > The documentation in pg_basebackup.sgml and protocol.sgml should be > updated. Done. I moved a few things to the protocol docs to avoid repetition. > Add some tests. At least test that one entry from the directory list > and one entry from the files list is not contained in the backup > result. Done for all files and directories. > Testing the symlink handling would also be good. Done using pg_replslot for the test. > (The > pg_basebackup tests claim that Windows doesn't support symlinks and > therefore skip all the symlink tests. That seems a bit at odds with > your code handling symlinks on Windows.) Hopefully Michael's response down-thread answered your question. -- -David da...@pgmasters.net diff --git a/doc/src/sgml/backup.sgml b/doc/src/sgml/backup.sgml index 0f09d82..a8daa07 100644 --- a/doc/src/sgml/backup.sgml +++ b/doc/src/sgml/backup.sgml @@ -1090,6 +1090,22 @@ SELECT pg_stop_backup(); +The contents of the pg_dynshmem/, pg_stat_tmp/, +pg_notify/, pg_serial/, +pg_snapshots/, and pg_subtrans/ directories can +be omitted from the backup as they will be initialized on postmaster +startup. 
If the is set and is +under the database cluster directory then the contents of the directory +specified by can also be omitted. + + + +Any file or directory beginning with pgsql_tmp can be +omitted from the backup. These files are removed on postmaster start and +the directories will be recreated as needed. + + + The backup label file includes the label string you gave to pg_start_backup, as well as the time at which pg_start_backup was run, and diff --git a/doc/src/sgml/protocol.sgml b/doc/src/sgml/protocol.sgml index 68b0941..d65687f 100644 --- a/doc/src/sgml/protocol.sgml +++ b/doc/src/sgml/protocol.sgml @@ -2059,17 +2059,26 @@ The commands accepted in walsender mode are: - postmaster.pid + postmaster.pid and postmaster.opts - postmaster.opts + postgresql.auto.conf.tmp - various temporary files created during the operation of the PostgreSQL server + backup_label and tablespace_map. If these + files exist they belong to an exclusive backup and are not applicable + to the base backup. + + + + + Various temporary files and directories created during the operation of + the PostgreSQL server, i.e. any file or directory beginning with + pgsql_tmp. @@ -2082,7 +2091,11 @@ The commands accepted in walsender mode are: - pg_replslot is copied as an empty directory. + pg_replslot, pg_dynshmem, + pg_stat_tmp, pg_notify, + pg_serial, pg_snapshots, and + pg_subtrans are copied as empty directories (even if they + are symbolic links). diff --git a/doc/src/sgml/ref/pg_basebackup.sgml b/doc/src/sgml/ref/pg_basebackup.sgml index 9f1eae1..984ea5b 100644 --- a/doc/src/sgml/ref/pg_basebackup.sgml +++ b/doc/src/sgml/ref/pg_basebackup.sgml @@ -610,10 +610,8 @@ PostgreSQL documentation The backup will include all files in the data directory and tablespaces, including the configuration files and any additional files placed in the - directory by third parties. But only regular files and directories are - copied. 
Symbolic links (other than those used for tablespaces) and special - device files are skipped. (See for - the precise details.) + directory by third parties, with certain exceptions. (See +for the complete list of exceptions.) diff --git a/src/backend/replication/basebackup.c b/src/backend/replication/basebackup.c index da9b7a6..04909ef 100644 --- a/src/backend/replication/basebackup.c +++ b/src/backend/replication/basebackup.c @@ -30,6 +30,7 @@ #include "replication/basebackup.h" #include "replication/walsender.h" #include "replication/walsender_private.h" +#include "storage/dsm_impl.h" #include "storage/fd.h" #include
Re: [HACKERS] pg_basebackup creates a corrupt file for pg_stat_tmp and pg_replslot on a backup location
On 9/23/16 2:12 PM, Peter Eisentraut wrote: > On 4/26/16 5:02 AM, Ashutosh Sharma wrote: >> Knowing that pg_basebackup always creates an empty directory for >> pg_stat_tmp and pg_replslot in the backup location, even I think it would be >> better to handle these directories in such a way that pg_basebackup >> generates an empty directory for pg_replslot and pg_stat_tmp if they are >> symbolic links. > > I just wanted to update you, I have taken this commit fest entry patch > to review because I think it will be addressed as part of "Exclude > additional directories in pg_basebackup", which I'm also reviewing. > Therefore, I'm not actually planning on discussing this patch further. > Please correct me if this assessment does not match your expectations. Just to be clear, and as I noted in [1], I pulled the symlink logic from this patch into the exclusion patch [2]. I think it makes sense to only commit [2], as long as Ashutosh gets credit for his contribution. Thanks, -- -David da...@pgmasters.net [1] https://www.postgresql.org/message-id/raw/ced3b05f-c1d9-c262-ce63-9744ef7e6de8%40pgmasters.net [2] https://commitfest.postgresql.org/10/721/
Re: [HACKERS] pg_basebackup creates a corrupt file for pg_stat_tmp and pg_replslot on a backup location
On 4/26/16 5:02 AM, Ashutosh Sharma wrote: > Knowing that pg_basebackup always creates an empty directory for > pg_stat_tmp and pg_replslot in the backup location, even I think it would be > better to handle these directories in such a way that pg_basebackup > generates an empty directory for pg_replslot and pg_stat_tmp if they are > symbolic links. I just wanted to update you, I have taken this commit fest entry patch to review because I think it will be addressed as part of "Exclude additional directories in pg_basebackup", which I'm also reviewing. Therefore, I'm not actually planning on discussing this patch further. Please correct me if this assessment does not match your expectations. -- Peter Eisentraut http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Re: [HACKERS] sequences and pg_upgrade
Here is an updated patch set. Compared to the initial set, I have changed pg_dump's sorting priorities so that sequence data is always after table data. This would otherwise have introduced a problem because sortDataAndIndexObjectsBySize() only considers consecutive DO_TABLE_DATA entries. Also, I have removed the separate --sequence-data switch from pg_dump and made it implicit in --binary-upgrade. (So the previous patches 0002 and 0003 have been combined, because it's no longer a separate feature.) -- Peter Eisentraut http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services v2-0001-pg_dump-Separate-table-and-sequence-data-object-t.patch v2-0002-pg_upgrade-Upgrade-sequence-data-via-pg_dump.patch
Re: [HACKERS] pg_upgrade vs user created range type extension
On 09/23/2016 11:55 AM, Tom Lane wrote: Andrew Dunstan writes: On 09/22/2016 07:33 PM, Tom Lane wrote: If that diagnosis is correct, we should either change pg_dump to not dump that function, or change CREATE TYPE AS RANGE to not auto-create the constructor functions in binary-upgrade mode. The latter might be more flexible in the long run. Yeah, I think your diagnosis is correct. I'm not sure I see the point of the flexibility given that you can't specify a constructor function for range types (if that feature had been available I would probably have used it in this extension). It turns out that pg_dump already contains logic that's meant to exclude constructor functions from getting dumped, but it's broken for binary upgrades from pre-9.6 branches because somebody fat-fingered the AND/OR nesting in the giant WHERE clauses in getFuncs(). Curiously, it's *not* broken when dumping from >= 9.6, which explains why I couldn't reproduce this in HEAD. It looks like Stephen fixed this while adding the pg_init_privs logic, while not realizing that he'd left it broken in the existing cases. The comment in front of all this is nearly as confused as the code is, too. Will fix. Great, thanks. cheers andrew
Re: [HACKERS] pg_upgrade vs user created range type extension
Andrew Dunstan writes: > On 09/22/2016 07:33 PM, Tom Lane wrote: >> If that diagnosis is correct, we should either change pg_dump to not >> dump that function, or change CREATE TYPE AS RANGE to not auto-create >> the constructor functions in binary-upgrade mode. The latter might be >> more flexible in the long run. > Yeah, I think your diagnosis is correct. I'm not sure I see the point of > the flexibility given that you can't specify a constructor function for > range types (if that feature had been available I would probably have > used it in this extension). It turns out that pg_dump already contains logic that's meant to exclude constructor functions from getting dumped, but it's broken for binary upgrades from pre-9.6 branches because somebody fat-fingered the AND/OR nesting in the giant WHERE clauses in getFuncs(). Curiously, it's *not* broken when dumping from >= 9.6, which explains why I couldn't reproduce this in HEAD. It looks like Stephen fixed this while adding the pg_init_privs logic, while not realizing that he'd left it broken in the existing cases. The comment in front of all this is nearly as confused as the code is, too. Will fix. regards, tom lane
Re: [HACKERS] pg_basebackup, pg_receivexlog and data durability (was: silent data loss with ext4 / all current versions)
On Fri, Sep 23, 2016 at 10:54 AM, Peter Eisentraut wrote: > This is looking pretty good. More comments on this patch set: Thanks for the review. > 0001: > > Keep the file order alphabetical in Mkvcbuild.pm. > > Needs nls.mk updates for file move (in initdb and pg_basebackup > directories). Fixed. > 0002: > > durable_rename needs close(fd) in non-error path (compare backend code). Oops, leak. > Changing from fsync() to fsync_name() in some cases means that > inaccessible files are now ignored. I guess this would only happen in > very obscure circumstances, but it's worth considering if that is OK. In writeTimeLineHistoryFile() that's fine, the user should have ownership of it as it has been created by receivelog.c. Similar remark for . In mark_file_as_archived, the previous open() call would have just failed. > You added this comment: > * XXX: This means that we might not restart if a crash occurs > before the > * fsync below. We probably should create the file in a temporary path > * like the backend does... > pg_receivexlog uses the .partial suffix for this reason. Why doesn't > pg_basebackup do that? Because it matters for pg_receivexlog, which has retry logic and is able to resume work on a partial segment. Thinking more about it, this comment does not make much sense, because for pg_basebackup a crash would just cause the backup to be incomplete, so I have removed it. > In open_walfile, could we move the fsync calls before the fstat or > somewhere around there so we don't have to make the same calls in two > different branches? We could refactor the code a bit, as follows: if (size == 16MB) { walfile = f; } else if (size == 0) { //do padding and lseek } else error(); fsync(); I am not sure this gains in clarity though; perhaps I have looked at the current code too much. > 0003: > > There was a discussion about renaming the --nosync option in initdb to > --no-sync, but until that is done, the option in pg_basebackup should > stay what initdb has right now. Changed. 
> There is a whitespace alignment error in usage(). Not sure I follow here. > The -N option should be listed after -n. In the switch()? Fixed. > Some fsync calls are not covered by a do_sync conditional. I see some > in close_walfile(), HandleCopyStream(), ProcessKeepaliveMsg(). Right. I looked at the rest and did not notice any extra mistakes. -- Michael From 79a9c302bc4723d6709bf5f1dc6ca673598e8b4a Mon Sep 17 00:00:00 2001 From: Michael Paquier Date: Fri, 23 Sep 2016 23:23:05 +0900 Subject: [PATCH 1/4] Relocation fsync routines of initdb into src/common Those are aimed at being used by other utilities, pg_basebackup mainly, and at some other degree by pg_dump and pg_receivexlog. --- src/bin/initdb/initdb.c | 270 ++- src/bin/initdb/nls.mk | 2 +- src/bin/pg_basebackup/nls.mk| 2 +- src/common/Makefile | 2 +- src/common/file_utils.c | 276 src/include/common/file_utils.h | 22 src/tools/msvc/Mkvcbuild.pm | 2 +- 7 files changed, 313 insertions(+), 263 deletions(-) create mode 100644 src/common/file_utils.c create mode 100644 src/include/common/file_utils.h diff --git a/src/bin/initdb/initdb.c b/src/bin/initdb/initdb.c index 3350e13..e52e67d 100644 --- a/src/bin/initdb/initdb.c +++ b/src/bin/initdb/initdb.c @@ -61,6 +61,7 @@ #endif #include "catalog/catalog.h" +#include "common/file_utils.h" #include "common/restricted_token.h" #include "common/username.h" #include "mb/pg_wchar.h" @@ -70,13 +71,6 @@ #include "fe_utils/string_utils.h" -/* Define PG_FLUSH_DATA_WORKS if we have an implementation for pg_flush_data */ -#if defined(HAVE_SYNC_FILE_RANGE) -#define PG_FLUSH_DATA_WORKS 1 -#elif defined(USE_POSIX_FADVISE) && defined(POSIX_FADV_DONTNEED) -#define PG_FLUSH_DATA_WORKS 1 -#endif - /* Ideally this would be in a .h file, but it hardly seems worth the trouble */ extern const char *select_default_timezone(const char *share_path); @@ -237,13 +231,6 @@ static char **filter_lines_with_token(char **lines, const char *token); #endif static char **readfile(const char *path); 
static void writefile(char *path, char **lines); -static void walkdir(const char *path, - void (*action) (const char *fname, bool isdir), - bool process_symlinks); -#ifdef PG_FLUSH_DATA_WORKS -static void pre_sync_fname(const char *fname, bool isdir); -#endif -static void fsync_fname_ext(const char *fname, bool isdir); static FILE *popen_check(const char *command, const char *mode); static void exit_nicely(void); static char *get_id(void); @@ -270,7 +257,6 @@ static void load_plpgsql(FILE *cmdfd); static void vacuum_db(FILE *cmdfd); static void make_template0(FILE *cmdfd); static void make_postgres(FILE *cmdfd); -static void fsync_pgdata(void); static void trapsig(int signum); static void check_ok(void);
Re: [HACKERS] Speed up Clog Access by increasing CLOG buffers
On 09/23/2016 02:59 PM, Pavan Deolasee wrote: On Fri, Sep 23, 2016 at 6:05 PM, Tomas Vondra wrote: On 09/23/2016 05:10 AM, Amit Kapila wrote: On Fri, Sep 23, 2016 at 5:14 AM, Tomas Vondra wrote: On 09/21/2016 08:04 AM, Amit Kapila wrote: (c) Although it's not visible in the results, 4.5.5 almost perfectly eliminated the fluctuations in the results. For example, when 3.2.80 produced these results (10 runs with the same parameters): 12118 11610 27939 11771 18065 12152 14375 10983 13614 11077 we get this on 4.5.5 37354 37650 37371 37190 37233 38498 37166 36862 37928 38509 Notice how much more even the 4.5.5 results are, compared to 3.2.80. How long was each run? Generally, I do a half-hour run to get stable results. 10 x 5-minute runs for each client count. The full shell script driving the benchmark is here: http://bit.ly/2doY6ID and in short it looks like this: for r in `seq 1 $runs`; do for c in 1 8 16 32 64 128 192; do psql -c checkpoint pgbench -j 8 -c $c ... done done I see a couple of problems with the tests: 1. You're running regular pgbench, which also updates the small tables. At scale 300 and higher client counts, there is going to be heavy contention on the pgbench_branches table. Why not test with pgbench -N? Sure, I can do a bunch of tests with pgbench -N. Good point. But notice that I've also done the testing with Dilip's workload, and the results are pretty much the same. regards -- Tomas Vondra http://www.2ndQuadrant.com PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Re: [HACKERS] Speed up Clog Access by increasing CLOG buffers
On 09/23/2016 03:07 PM, Amit Kapila wrote: On Fri, Sep 23, 2016 at 6:16 PM, Tomas Vondra wrote: On 09/23/2016 01:44 AM, Tomas Vondra wrote: ... The 4.5 kernel clearly changed the results significantly: ... (c) Although it's not visible in the results, 4.5.5 almost perfectly eliminated the fluctuations in the results. For example, when 3.2.80 produced these results (10 runs with the same parameters): 12118 11610 27939 11771 18065 12152 14375 10983 13614 11077 we get this on 4.5.5 37354 37650 37371 37190 37233 38498 37166 36862 37928 38509 Notice how much more even the 4.5.5 results are, compared to 3.2.80. The more I think about these random spikes in pgbench performance on 3.2.80, the more I find them intriguing. Let me show you another example (from Dilip's workload and group-update patch on 64 clients). This is on 3.2.80: 44175 34619 51944 38384 49066 37004 47242 36296 46353 36180 and on 4.5.5 it looks like this: 34400 35559 35436 34890 34626 35233 35756 34876 35347 35486 So the 4.5.5 results are much more even, but overall clearly below 3.2.80. How does 3.2.80 manage to do ~50k tps in some of the runs? Clearly we randomly do something right, but what is it and why doesn't it happen on the new kernel? And how could we do it every time? As far as I can see you are using default values of min_wal_size, max_wal_size, checkpoint related params, have you changed default shared_buffer settings, because that can have a bigger impact. Huh? Where do you see me using default values? There is a settings.log with a dump of pg_settings data, and the modified values are checkpoint_completion_target = 0.9 checkpoint_timeout = 3600 effective_io_concurrency = 32 log_autovacuum_min_duration = 100 log_checkpoints = on log_line_prefix = %m log_timezone = UTC maintenance_work_mem = 524288 max_connections = 300 max_wal_size = 8192 min_wal_size = 1024 shared_buffers = 2097152 synchronous_commit = on work_mem = 524288 (ignoring some irrelevant stuff like locales, timezone etc.). 
Using default values of the mentioned parameters can lead to checkpoints in between your runs. So I'm using 16GB shared buffers (so with scale 300 everything fits into shared buffers), min_wal_size=16GB, max_wal_size=128GB, checkpoint timeout 1h etc. So no, there are no checkpoints during the 5-minute runs, only those triggered explicitly before each run. Also, I think instead of 5 mins, read-write runs should be run for 15 mins to get consistent data. Where does the inconsistency come from? Lack of warmup? Considering how uniform the results from the 10 runs are (at least on 4.5.5), I claim this is not an issue. For Dilip's workload, where he is using only Select ... For Update, I think it is okay, but otherwise you need to drop and re-create the database between each run, otherwise data bloat could impact the readings. And why should it affect 3.2.80 and 4.5.5 differently? I think in general the impact should be the same for both kernels because you are using the same parameters, but if you use appropriate parameters, then you can get consistent results for 3.2.80. I have also seen variation in read-write tests, but the variation you are showing is really a matter of concern, because it will be difficult to rely on the final data. Both kernels use exactly the same parameters (fairly tuned, IMHO). -- Tomas Vondra http://www.2ndQuadrant.com PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
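One way to quantify the "evenness" being debated is the coefficient of variation (stddev/mean) of the per-run tps numbers. The sketch below (illustrative only) uses the figures quoted earlier in this thread for the two kernels:

```python
from statistics import mean, pstdev

def cv(samples):
    """Coefficient of variation: relative spread of the tps runs."""
    return pstdev(samples) / mean(samples)

# tps numbers from the first example quoted in this thread (10 runs each)
tps_3_2_80 = [12118, 11610, 27939, 11771, 18065, 12152, 14375, 10983, 13614, 11077]
tps_4_5_5  = [37354, 37650, 37371, 37190, 37233, 38498, 37166, 36862, 37928, 38509]

print('3.2.80: mean %.0f tps, cv %.3f' % (mean(tps_3_2_80), cv(tps_3_2_80)))
print('4.5.5:  mean %.0f tps, cv %.3f' % (mean(tps_4_5_5), cv(tps_4_5_5)))
```

The 3.2.80 runs show a relative spread well over an order of magnitude larger than the 4.5.5 runs, which is the point Tomas is making: the question is less about tuning and more about what the old kernel occasionally does differently.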
Re: [HACKERS] [PATCH] Remove redundant if clause in standbydesc.c
Michael Paquier writes: > On Fri, Sep 23, 2016 at 6:54 PM, Aleksander Alekseev > wrote: >>> Very simple small patch - see attachment. >>> >>> /* not expected, but print something anyway */ >>> else if (msg->id == SHAREDINVALRELMAP_ID) >>> -appendStringInfoString(buf, " relmap"); >>> -else if (msg->id == SHAREDINVALRELMAP_ID) >>> appendStringInfo(buf, " relmap db %u", msg->rm.dbId); > Not really, but my guess is that the two conditions have been left for > this purpose. I think it's a pretty obvious copy-and-pasteo. regards, tom lane
Re: [HACKERS] Phrase search distance syntax
Teodor Sigaev writes: >> Why does the phrase distance operator assume <1> means adjacent words, >> and not <0>? (FYI, <-> is the same as <1>.) > Because > 1 it is a result of subtraction of word's positions > 2 <0> could be used as special case like a word with two infinitives: This is actually documented, in 12.1.2: A special case that's sometimes useful is that <0> can be used to require that two patterns match the same word. regards, tom lane
Re: [HACKERS] Re: [HACKERS] Re: [HACKERS] Re: [HACKERS] Re: [HACKERS] Re: [HACKERS] Re: [HACKERS] Windows service is not starting so there’s message in log: FATAL: "could not create shared memory segm
Amit Kapila writes: > Attached patch tightens the error handling. However, on debugging, I > found that CreateFileMapping() always sets the error code to 0 on success. Ah, interesting. > Now, it is quite possible > that the error code is set to 0 on success in my Windows environment > (Win7) and doesn't work in some other environment. In any case, if we > want to go ahead and don't want to rely on CreateFileMapping(), then > the attached patch should suffice the need. Yeah, seeing that this is not mentioned in MS' documentation I don't think we should rely on it. The patch looks fine to me, pushed. regards, tom lane
Re: [HACKERS] Phrase search distance syntax
On Fri, Sep 23, 2016 at 05:07:26PM +0300, Teodor Sigaev wrote: > >Sorry to be asking another phrase search syntax question, and so close > >to final release, but ... > Really close... > > > >Why does the phrase distance operator assume <1> means adjacent words, > >and not <0>? (FYI, <-> is the same as <1>.) > Because > 1 it is a result of subtraction of word's positions > 2 <0> could be used as special case like a word with two infinitives: > # create text search dictionary xx (template = 'ispell', > DictFile='ispell_sample', AffFile='ispell_sample'); > # alter text search configuration english ALTER MAPPING FOR asciiword WITH > xx, english_stem; > > # select to_tsvector('english', 'bookings'); > to_tsvector > -- > 'book':1 'booking':1 > > # select to_tsvector('english', 'bookings') @@ 'book <0> booking'; > ?column? > -- > t > (1 row) OK, thanks. I just found it unusual. -- Bruce Momjian http://momjian.us EnterpriseDB http://enterprisedb.com + As you are, so once was I. As I am, so you will be. + + Ancient Roman grave inscription +
Re: [HACKERS] Phrase search distance syntax
Sorry to be asking another phrase search syntax question, and so close to final release, but ... Really close... Why does the phrase distance operator assume <1> means adjacent words, and not <0>? (FYI, <-> is the same as <1>.) Because 1 it is a result of subtraction of word's positions 2 <0> could be used as special case like a word with two infinitives: # create text search dictionary xx (template = 'ispell', DictFile='ispell_sample', AffFile='ispell_sample'); # alter text search configuration english ALTER MAPPING FOR asciiword WITH xx, english_stem; # select to_tsvector('english', 'bookings'); to_tsvector -- 'book':1 'booking':1 # select to_tsvector('english', 'bookings') @@ 'book <0> booking'; ?column? -- t (1 row) -- Teodor Sigaev E-mail: teo...@sigaev.ru WWW: http://www.sigaev.ru/
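The "subtraction of word's positions" rule above can be sketched as a small model (illustrative only, not the actual tsquery executor): `a <N> b` matches when some position of `b` minus some position of `a` equals N, so `<0>` means both lexemes occupy the same word position, exactly as in the `'book':1 'booking':1` example:

```python
def phrase_match(positions_a, positions_b, n):
    """a <n> b matches when pos(b) - pos(a) == n for some pair of positions."""
    return any(pb - pa == n for pa in positions_a for pb in positions_b)

# to_tsvector('english', 'bookings') with the ispell+stem setup above:
# both lexemes land at position 1.
tsvector = {'book': [1], 'booking': [1]}

assert phrase_match(tsvector['book'], tsvector['booking'], 0)  # 'book <0> booking'
assert phrase_match([1], [2], 1)       # adjacent words: <-> is the same as <1>
assert not phrase_match([1], [2], 0)   # <0> does not match distinct words
```

This is why `<1>` (and not `<0>`) means adjacent: adjacent words differ by exactly 1 in position, while a distance of 0 only arises when two lexemes are emitted for the same word.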
Re: [HACKERS] [PATCH] Remove redundant if clause in standbydesc.c
On Fri, Sep 23, 2016 at 6:54 PM, Aleksander Alekseev wrote: >> > Very simple small patch - see attachment. >> >> /* not expected, but print something anyway */ >> else if (msg->id == SHAREDINVALRELMAP_ID) >> -appendStringInfoString(buf, " relmap"); >> -else if (msg->id == SHAREDINVALRELMAP_ID) >> appendStringInfo(buf, " relmap db %u", msg->rm.dbId); >> >> Looking at inval.c, dbId can be InvalidOid. > > Frankly, I'm not very familiar with this part of the code. InvalidOid is just > zero. Does it create some problem in this case? Not really, but my guess is that the two conditions have been left for this purpose. -- Michael
Re: [HACKERS] Tracking wait event for latches
On Fri, Sep 23, 2016 at 10:32 AM, Robert Haas wrote: > On Thu, Sep 22, 2016 at 7:10 PM, Thomas Munro > wrote: >> I was thinking about suggesting a category "Replication" to cover the >> waits for client IO relating to replication, as opposed to client IO >> waits relating to regular user connections. Then you could put sync >> rep into that category instead of IPC, even though technically it is >> waiting for IPC from walsender process(es), on the basis that it's >> more newsworthy to a DBA that it's really waiting for a remote replica >> to respond. But it's probably pretty clear what's going on from the >> wait point names, so maybe it's not worth a category. Thoughts? > > I thought about a replication category, but either it will only have > SyncRep in it, which is odd, or it will pull in other things that > otherwise fit nicely into the Activity category, and then the > boundaries of all the categories become mushy: is it the subsystem that > causes the wait that we are trying to document, or the kind of thing > for which we are waiting? Using category IPC for SyncRep looks fine to me. >> I do suspect that the set of wait points will grow quite a bit as we >> develop more parallel stuff though. For example, I have been working >> on a patch that adds several more wait points, indirectly via >> condition variables (using your patch). Actually in my case it's >> BarrierWait -> ConditionVariableWait -> WaitEventSetWait. I propose >> that these higher level wait primitives should support passing a wait >> point identifier through to WaitEventSetWait. > > +1. I also suspect that inclusion of pgstat.h will become more and more usual, to allow more code paths access to a given WaitClass. 
After digesting all the comments given, I have produced the patch attached that uses the following categories: +const char *const WaitEventNames[] = { + /* activity */ + "ArchiverMain", + "AutoVacuumMain", + "BgWriterHibernate", + "BgWriterMain", + "CheckpointerMain", + "PgStatMain", + "RecoveryWalAll", + "RecoveryWalStream", + "SysLoggerMain", + "WalReceiverMain", + "WalSenderMain", + "WalWriterMain", + /* client */ + "SecureRead", + "SecureWrite", + "SSLOpenServer", + "WalReceiverWaitStart", + "WalSenderWaitForWAL", + "WalSenderWriteData", + /* Extension */ + "Extension", + /* IPC */ + "BgWorkerShutdown", + "BgWorkerStartup", + "ExecuteGather", + "MessageQueueInternal", + "MessageQueuePutMessage", + "MessageQueueReceive", + "MessageQueueSend", + "ParallelFinish", + "ProcSignal", + "ProcSleep", + "SyncRep", + /* timeout */ + "BaseBackupThrottle", + "PgSleep", + "RecoveryApplyDelay", +}; I have moved WalSenderMain as it tracks a main loop, so it was strange to not have it in Activity. I also kept SecureRead and SecureWrite because this is referring to the functions of the same name. For the rest, I got convinced by what has been discussed and it makes things clearer. My head is not spinning anymore when I try to keep in sync the list of names and its associated enum, which is good. I have as well rewritten the docs to follow that. 
-- Michael diff --git a/contrib/postgres_fdw/connection.c b/contrib/postgres_fdw/connection.c index 8ca1c1c..9222b73 100644 --- a/contrib/postgres_fdw/connection.c +++ b/contrib/postgres_fdw/connection.c @@ -17,6 +17,7 @@ #include "access/xact.h" #include "mb/pg_wchar.h" #include "miscadmin.h" +#include "pgstat.h" #include "storage/latch.h" #include "utils/hsearch.h" #include "utils/memutils.h" @@ -496,7 +497,9 @@ pgfdw_get_result(PGconn *conn, const char *query) wc = WaitLatchOrSocket(MyLatch, WL_LATCH_SET | WL_SOCKET_READABLE, PQsocket(conn), - -1L); + -1L, + WAIT_EXTENSION, + WE_EXTENSION); ResetLatch(MyLatch); CHECK_FOR_INTERRUPTS(); diff --git a/doc/src/sgml/monitoring.sgml b/doc/src/sgml/monitoring.sgml index 0776428..6f7eef9 100644 --- a/doc/src/sgml/monitoring.sgml +++ b/doc/src/sgml/monitoring.sgml @@ -679,6 +679,44 @@ postgres 27093 0.0 0.0 30096 2752 ?Ss 11:34 0:00 postgres: ser buffer in question. + + + Activity: The server process is waiting for some + activity to happen on a socket. This is mainly used system processes + in their main processing loop. wait_event will identify + the type of activity waited for. + + + + + Extension: The server process is waiting for activity + in a plugin or an extension. This category is useful for plugin + and module developers to track custom waiting points. + + + + + Client: The server process is waiting for some activity + on a socket from a client process, and that the server expects +
Re: [HACKERS] asynchronous and vectorized execution
On Fri, Sep 23, 2016 at 8:45 AM, Amit Khandekar wrote:
> For example, in the above plan which you specified, suppose:
> 1. Hash Join has called ExecProcNode() for the child foreign scan b, and so
> is waiting in ExecAsyncWaitForNode(foreign_scan_on_b).
> 2. The event wait list already has foreign scan on a that is on a different
> subtree.
> 3. This foreign scan a happens to be ready, so in
> ExecAsyncWaitForNode(), ExecDispatchNode(foreign_scan_a) is called,
> which returns with result_ready.
> 4. Since it returns result_ready, its parent node is now inserted in the
> callbacks array, and so its parent (Append) is executed.
> 5. But, this Append planstate is already in the middle of executing Hash
> join, and is waiting for HashJoin.

Ah, yeah, something like that could happen. I've spent much of this week working on a new design for this feature which I think will avoid this problem. It doesn't work yet - in fact I can't even really test it yet. But I'll post what I've got by the end of the day today so that anyone who is interested can look at it and critique. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
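The failure scenario quoted above — a ready node bubbling a result up into a planstate that is already mid-execution — can be sketched as a dispatch loop with a re-entrancy guard. This is a hypothetical Python model, not the executor's actual mechanism; all names are invented for illustration:

```python
# Sketch: refuse to re-dispatch a node that is already on the
# execution stack, the failure mode described in the email above.
class Node:
    def __init__(self, name):
        self.name = name
        self.executing = False   # True while the node is mid-execution

def dispatch(node, log):
    if node.executing:
        # Re-entrant call: the node is waiting on a child; bubbling a
        # result into it now could corrupt its in-progress state.
        log.append((node.name, "skipped: re-entrant"))
        return
    node.executing = True
    log.append((node.name, "dispatched"))
    node.executing = False

append_node = Node("Append")
append_node.executing = True   # Append is mid-execution, waiting on HashJoin
log = []
dispatch(append_node, log)     # a ready child tries to bubble up into Append
```

With the guard, the re-entrant dispatch is refused rather than corrupting the in-flight node; `log` ends up as `[("Append", "skipped: re-entrant")]`.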
Re: [HACKERS] Speed up Clog Access by increasing CLOG buffers
On Fri, Sep 23, 2016 at 6:50 AM, Robert Haas wrote:
> On Thu, Sep 22, 2016 at 7:44 PM, Tomas Vondra wrote:
>> I don't dare to suggest rejecting the patch, but I don't see how we could
>> commit any of the patches at this point. So perhaps "returned with feedback"
>> and resubmitting in the next CF (along with analysis of improved workloads)
>> would be appropriate.
>
> I think it would be useful to have some kind of theoretical analysis
> of how much time we're spending waiting for various locks. So, for
> example, suppose we do one run of these tests with various client counts
> - say, 1, 8, 16, 32, 64, 96, 128, 192, 256 - and we run "select
> wait_event from pg_stat_activity" once per second throughout the test.
> Then we see how many times we get each wait event, including NULL (no
> wait event). Now, from this, we can compute the approximate
> percentage of time we're spending waiting on CLogControlLock and every
> other lock, too, as well as the percentage of time we're not waiting
> for any lock. That, it seems to me, would give us a pretty clear idea
> of what the maximum benefit we could hope for from reducing contention
> on any given lock might be.

As mentioned earlier, such an activity makes sense; however, today, again reading this thread, I noticed that Dilip has already posted some analysis of lock contention upthread [1]. It is clear that the patch has reduced LWLock contention from ~28% to ~4% (where the major contributor was TransactionIdSetPageStatus, which has reduced from ~53% to ~3%). Isn't that in line with what you are looking for?

[1] - https://www.postgresql.org/message-id/CAFiTN-u-XEzhd%3DhNGW586fmQwdTy6Qy6_SXe09tNB%3DgBcVzZ_A%40mail.gmail.com

-- With Regards, Amit Kapila. EnterpriseDB: http://www.enterprisedb.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
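The sampling analysis Robert proposes boils down to counting wait_event values across periodic samples and converting them to percentages. A sketch with made-up sample counts (Python standing in for the post-processing step; the actual sampling would be the quoted `select wait_event from pg_stat_activity` once per second):

```python
from collections import Counter

# Hypothetical once-per-second wait_event samples over a 100-second run.
# None stands for the NULL "not waiting" case.
samples = ["ClogControlLock"] * 28 + ["WALWriteLock"] * 12 + [None] * 60

def wait_percentages(samples):
    """Approximate % of time spent in each wait state."""
    counts = Counter(samples)
    total = len(samples)
    return {event: 100.0 * n / total for event, n in counts.items()}

pct = wait_percentages(samples)
# With these invented numbers, ~28% of samples wait on ClogControlLock,
# so ~28% is the ceiling on what removing that contention could gain.
```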
[HACKERS] Phrase search distance syntax
Sorry to be asking another phrase search syntax question, and so close to final release, but ... why does the phrase distance operator assume <1> means adjacent words, and not <0>? (FYI, <-> is the same as <1>.) For example:

select to_tsvector('park a a house') @@ to_tsquery('park <3> house');

seems like it would be more logical as <2>, meaning two lexemes between the words. -- Bruce Momjian http://momjian.us EnterpriseDB http://enterprisedb.com + As you are, so once was I. As I am, so you will be. + + Ancient Roman grave inscription + -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
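The semantics Bruce is questioning: `<N>` matches when the difference between lexeme positions equals N, so `<1>` (and `<->`) means adjacent, and `park <3> house` matches `'park a a house'` with two lexemes in between. A simplified model of that arithmetic (not the tsquery implementation):

```python
def phrase_distance_match(positions, a, b, n):
    """True if some occurrence of lexeme b is exactly n positions after a."""
    return any(pb - pa == n for pa in positions[a] for pb in positions[b])

# 'park a a house' -> lexeme positions 1..4
positions = {"park": [1], "a": [2, 3], "house": [4]}

assert phrase_distance_match(positions, "park", "house", 3)      # <3> matches
assert not phrase_distance_match(positions, "park", "house", 2)  # <2> does not
```

Bruce's point is that with two lexemes between the words, `<2>` would arguably be the more intuitive spelling of the match that `<3>` currently expresses.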
Re: [HACKERS] Speed up Clog Access by increasing CLOG buffers
On Fri, Sep 23, 2016 at 6:29 PM, Pavan Deolasee wrote:
> On Fri, Sep 23, 2016 at 6:05 PM, Tomas Vondra wrote:
>> On 09/23/2016 05:10 AM, Amit Kapila wrote:
>>> On Fri, Sep 23, 2016 at 5:14 AM, Tomas Vondra wrote:
>>>> On 09/21/2016 08:04 AM, Amit Kapila wrote:
>>>>
>>>> (c) Although it's not visible in the results, 4.5.5 almost perfectly
>>>> eliminated the fluctuations in the results. For example when 3.2.80
>>>> produced these results (10 runs with the same parameters):
>>>>
>>>> 12118 11610 27939 11771 18065
>>>> 12152 14375 10983 13614 11077
>>>>
>>>> we get this on 4.5.5
>>>>
>>>> 37354 37650 37371 37190 37233
>>>> 38498 37166 36862 37928 38509
>>>>
>>>> Notice how much more even the 4.5.5 results are, compared to 3.2.80.
>>>
>>> how long was each run? Generally, I do half-hour runs to get stable
>>> results.
>>
>> 10 x 5-minute runs for each client count. The full shell script driving
>> the benchmark is here: http://bit.ly/2doY6ID and in short it looks like
>> this:
>>
>> for r in `seq 1 $runs`; do
>> for c in 1 8 16 32 64 128 192; do
>> psql -c checkpoint
>> pgbench -j 8 -c $c ...
>> done
>> done
>
> I see a couple of problems with the tests:
>
> 1. You're running regular pgbench, which also updates the small tables. At
> scale 300 and higher clients, there is going to be heavy contention on the
> pgbench_branches table. Why not test with pgbench -N? As far as this patch
> is concerned, we are only interested in seeing contention on
> ClogControlLock. In fact, how about a test which only consumes an XID, but
> does not do any write activity at all? A completely artificial workload, but
> good enough to tell us if and how much the patch helps in the best case. We
> can probably do that with a simple txid_current() call, right?

Right, that is why in the initial tests done by Dilip, he has used Select .. for Update. I think using txid_current will generate a lot of contention on XidGenLock which will mask the contention around CLOGControlLock; in fact, we have tried that. -- With Regards, Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] pg_ctl promote wait
On 9/22/16 11:16 AM, Masahiko Sawada wrote: > Commit e7010ce seems to forget to change help message of pg_ctl. > Attached patch. Fixed, thanks. I also added the -t option and reformatted a bit. -- Peter Eisentraut http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Speed up Clog Access by increasing CLOG buffers
On Fri, Sep 23, 2016 at 6:16 PM, Tomas Vondra wrote:
> On 09/23/2016 01:44 AM, Tomas Vondra wrote:
>>
>> ...
>> The 4.5 kernel clearly changed the results significantly:
>>
> ...
>>
>> (c) Although it's not visible in the results, 4.5.5 almost perfectly
>> eliminated the fluctuations in the results. For example when 3.2.80
>> produced these results (10 runs with the same parameters):
>>
>> 12118 11610 27939 11771 18065
>> 12152 14375 10983 13614 11077
>>
>> we get this on 4.5.5
>>
>> 37354 37650 37371 37190 37233
>> 38498 37166 36862 37928 38509
>>
>> Notice how much more even the 4.5.5 results are, compared to 3.2.80.
>
> The more I think about these random spikes in pgbench performance on 3.2.80,
> the more I find them intriguing. Let me show you another example (from
> Dilip's workload and group-update patch on 64 clients).
>
> This is on 3.2.80:
>
> 44175 34619 51944 38384 49066
> 37004 47242 36296 46353 36180
>
> and on 4.5.5 it looks like this:
>
> 34400 35559 35436 34890 34626
> 35233 35756 34876 35347 35486
>
> So the 4.5.5 results are much more even, but overall clearly below 3.2.80.
> How does 3.2.80 manage to do ~50k tps in some of the runs? Clearly we
> randomly do something right, but what is it and why doesn't it happen on the
> new kernel? And how could we do it every time?

As far as I can see, you are using default values of min_wal_size, max_wal_size, and the checkpoint-related params; have you changed the default shared_buffers setting? That can have a bigger impact. Using default values of the mentioned parameters can lead to checkpoints in between your runs. Also, I think instead of 5 minutes, read-write runs should be run for 15 minutes to get consistent data. For Dilip's workload, where he is using only Select ... For Update, I think it is okay, but otherwise you need to drop and re-create the database between each run, otherwise data bloat could impact the readings.
I think in general the impact should be the same for both kernels because you are using the same parameters, but I think if you use appropriate parameters, then you can get consistent results for 3.2.80. I have also seen variation in read-write tests, but the variation you are showing is really a matter of concern, because it will be difficult to rely on the final data. -- With Regards, Amit Kapila. EnterpriseDB: http://www.enterprisedb.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
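The "evenness" difference the thread keeps returning to is easy to quantify: the coefficient of variation of the posted tps runs drops from roughly a third of the mean on 3.2.80 to about 1% on 4.5.5. A sketch using the numbers quoted above:

```python
from statistics import mean, pstdev

# The 10-run tps samples quoted in the thread.
runs_3_2_80 = [12118, 11610, 27939, 11771, 18065,
               12152, 14375, 10983, 13614, 11077]
runs_4_5_5 = [37354, 37650, 37371, 37190, 37233,
              38498, 37166, 36862, 37928, 38509]

def cv(runs):
    """Coefficient of variation, in percent."""
    return 100.0 * pstdev(runs) / mean(runs)

# cv(runs_3_2_80) is ~34%; cv(runs_4_5_5) is ~1.4% -- which is the
# "much more even" observation, made quantitative.
```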
Re: [HACKERS] Speed up Clog Access by increasing CLOG buffers
On Fri, Sep 23, 2016 at 6:05 PM, Tomas Vondra wrote:
> On 09/23/2016 05:10 AM, Amit Kapila wrote:
>> On Fri, Sep 23, 2016 at 5:14 AM, Tomas Vondra wrote:
>>> On 09/21/2016 08:04 AM, Amit Kapila wrote:
>>>
>>> (c) Although it's not visible in the results, 4.5.5 almost perfectly
>>> eliminated the fluctuations in the results. For example when 3.2.80
>>> produced these results (10 runs with the same parameters):
>>>
>>> 12118 11610 27939 11771 18065
>>> 12152 14375 10983 13614 11077
>>>
>>> we get this on 4.5.5
>>>
>>> 37354 37650 37371 37190 37233
>>> 38498 37166 36862 37928 38509
>>>
>>> Notice how much more even the 4.5.5 results are, compared to 3.2.80.
>>
>> how long was each run? Generally, I do half-hour runs to get stable
>> results.
>
> 10 x 5-minute runs for each client count. The full shell script driving
> the benchmark is here: http://bit.ly/2doY6ID and in short it looks like
> this:
>
> for r in `seq 1 $runs`; do
> for c in 1 8 16 32 64 128 192; do
> psql -c checkpoint
> pgbench -j 8 -c $c ...
> done
> done

I see a couple of problems with the tests:

1. You're running regular pgbench, which also updates the small tables. At scale 300 and higher clients, there is going to be heavy contention on the pgbench_branches table. Why not test with pgbench -N? As far as this patch is concerned, we are only interested in seeing contention on ClogControlLock. In fact, how about a test which only consumes an XID, but does not do any write activity at all? A completely artificial workload, but good enough to tell us if and how much the patch helps in the best case. We can probably do that with a simple txid_current() call, right?

2. Each subsequent pgbench run will bloat the tables. Now that may not be such a big deal given that you're checkpointing between each run. But it still makes results somewhat hard to compare. If a vacuum kicks in, that may have some impact too. Given the scale factor you're testing, why not just start fresh every time?
Thanks, Pavan -- Pavan Deolasee http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services
Re: [HACKERS] Rename max_parallel_degree?
On 9/20/16 4:07 PM, Robert Haas wrote: > No, I'm assuming that the classes would be built-in. A string tag > seems like over-engineering to me, particularly because the postmaster > needs to switch on the tag, and we need to be very careful about the > degree to which the postmaster trusts the contents of shared memory. I'm hoping that we can come up with something that extensions can participate in, without the core having to know ahead of time what those extensions are or how they would be categorized. My vision is something like max_processes = 512 # requires restart process_allowances = 'connection:300 superuser:10 autovacuum:10 parallel:30 replication:10 someextension:20 someotherextension:20' # does not require restart -- Peter Eisentraut http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
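Peter's proposed setting amounts to a list of class:allowance pairs validated against a process ceiling. The GUC name and value format come from his example above; the parsing logic itself is a hypothetical sketch (the real implementation would be a C GUC check hook):

```python
def parse_process_allowances(value, max_processes):
    """Parse 'class:count ...' pairs and check they fit under max_processes."""
    allowances = {}
    for item in value.split():
        cls, count = item.split(":")
        allowances[cls] = int(count)
    if sum(allowances.values()) > max_processes:
        raise ValueError("process allowances exceed max_processes")
    return allowances

# The example from the email: 400 slots allotted under a 512 ceiling,
# with extension-defined classes the core knows nothing about.
allow = parse_process_allowances(
    "connection:300 superuser:10 autovacuum:10 parallel:30 "
    "replication:10 someextension:20 someotherextension:20", 512)
```

Because classes are just strings in the value, an extension could claim its own allowance without the core enumerating the classes ahead of time, which is the point Peter is making.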
Re: [HACKERS] pageinspect: Hash index support
On Fri, Sep 23, 2016 at 6:11 PM, Peter Eisentraut wrote:
> On 9/23/16 1:56 AM, Amit Kapila wrote:
>> which comment are you referring to here? hashm_mapp contains block
>> numbers of bitmap pages.
>
> The comment I'm referring to says
>
> The blknos of these bitmap pages are kept in bitmaps[]; nmaps is the
> number of currently existing bitmaps.
>
> But there is no "bitmaps" field anywhere.

Okay. You are right, it should be hashm_mapp.

>> In the above code, it appears that you are trying to calculate
>> max_avail space for all pages in the same way. Don't you need to
>> calculate it differently for a bitmap page or meta page, as they don't
>> share the same format as other types of pages?
>
> Is this even useful for hash indexes?

I think so. It will be helpful for bucket and overflow pages. They store the index tuples similar to btree. Is there a reason why you think it won't be useful? -- With Regards, Amit Kapila. EnterpriseDB: http://www.enterprisedb.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Re: [HACKERS] Re: [HACKERS] Re: [HACKERS] Re: [HACKERS] Windows service is not starting so there’s message in log: FATAL: "could not create shared memory segment “Global/PostgreSQL.85140
Amit Kapila writes:
> On Fri, Sep 23, 2016 at 1:21 AM, Tom Lane wrote:
>> Robert Haas writes:
>>> So, we could have dsm_postmaster_startup() seed the random number
>>> generator itself, and then let PostmasterRandom() override the seed
>>> later. Like maybe:
>>
>> Works for me, at least as a temporary solution.
>
> Isn't it better if we use the same technique in dsm_create() as well,
> which uses random() for the handle?

dsm_create() is executed in backends, not the postmaster, and they already have their own random seeds (cf BackendRun). Adding more srandom calls at random places will *not* make things better. However, it's certainly true that dsm_postmaster_startup() might not be the only place in postmaster startup that wants to use random(). What seems like the best thing after sleeping on it is to put "srandom(time(NULL))" somewhere early in PostmasterMain, so that one such call suffices for all uses during postmaster startup.

regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
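The underlying issue Tom is describing — every process that calls random() needs its own seed, and a single early srandom() call per process suffices — can be seen in miniature, with Python's PRNG standing in for C's random()/srandom():

```python
import random

def stream(seed, n=5):
    """Values a 'process' seeded with `seed` would draw from its PRNG."""
    rng = random.Random(seed)
    return [rng.randrange(1 << 30) for _ in range(n)]

# Two processes seeded identically produce identical streams -- the
# failure mode if children simply inherit the parent's seed unchanged.
assert stream(42) == stream(42)

# Distinct seeds (e.g. mixing in a pid, roughly what BackendRun does
# for backends) give distinct streams; one seeding per process is enough.
assert stream(42) != stream(43)
```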
Re: [HACKERS] Speed up Clog Access by increasing CLOG buffers
On 09/23/2016 01:44 AM, Tomas Vondra wrote: ... The 4.5 kernel clearly changed the results significantly: ... > (c) Although it's not visible in the results, 4.5.5 almost perfectly eliminated the fluctuations in the results. For example when 3.2.80 produced this results (10 runs with the same parameters): 12118 11610 27939 11771 18065 12152 14375 10983 13614 11077 we get this on 4.5.5 37354 37650 37371 37190 37233 38498 37166 36862 37928 38509 Notice how much more even the 4.5.5 results are, compared to 3.2.80. The more I think about these random spikes in pgbench performance on 3.2.80, the more I find them intriguing. Let me show you another example (from Dilip's workload and group-update patch on 64 clients). This is on 3.2.80: 44175 34619 51944 38384 49066 37004 47242 36296 46353 36180 and on 4.5.5 it looks like this: 34400 35559 35436 34890 34626 35233 35756 34876 35347 35486 So the 4.5.5 results are much more even, but overall clearly below 3.2.80. How does 3.2.80 manage to do ~50k tps in some of the runs? Clearly we randomly do something right, but what is it and why doesn't it happen on the new kernel? And how could we do it every time? regards -- Tomas Vondra http://www.2ndQuadrant.com PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] asynchronous and vectorized execution
On 13 September 2016 at 20:20, Robert Haas wrote:
> On Mon, Aug 29, 2016 at 4:08 AM, Kyotaro HORIGUCHI wrote:
>> [ new patches ]
>
> +/*
> + * We assume that few nodes are async-aware and async-unaware
> + * nodes cannot be reverse-dispatched from lower nodes that is
> + * async-aware. Firing of an async node that is not a descendant
> + * of the planstate will cause such reverse-dispatching to
> + * async-aware nodes, which is unexpected behavior for them.
> + *
> + * For instance, consider an async-unaware Hashjoin(OUTER, INNER)
> + * where the OUTER is running asynchronously but the Hashjoin is
> + * waiting on the async INNER during inner-hash creation. If the
> + * OUTER fires for the case, since anyone is waiting on it,
> + * ExecAsyncWaitForNode finally dispatches to the Hashjoin which
> + * is now in the middle of doing its work.
> + */
> +if (!IsParent(planstate, node))
> +continue;
>
> I'm not entirely sure that I understand this comment, but I don't
> think it's going in the right direction. Let's start with the example
> in the second paragraph. If the hash join is async-unaware, then it
> isn't possible for the hash join to be both running the outer side of
> the join asynchronously and at the same time waiting on the inner
> side. Once it tries to pull the first tuple from the outer side, it's
> waiting for that to finish and can't do anything else. So, the inner
> side can't possibly get touched in any way until the outer side
> finishes. For anything else to happen, the hash join would have to be
> async-aware. Even if we did that, I don't think it would be right to
> kick off both sides of the hash join at the same time. Right now, if
> the outer side turns out to be empty, we never need to build the hash
> table, and that's good.

I feel the !IsParent() condition is actually there to prevent the infinite wait caused by a re-entrant issue in ExecAsyncWaitForNode() that Kyotaro mentioned earlier.
But yes, the comments don't explain exactly how the hash join can cause the re-entrant issue. I attempted to come up with a testcase which might reproduce the infinite waiting in ExecAsyncWaitForNode() after removing the !IsParent() check, so that the other subtree nodes are also included, but I couldn't reproduce it. Kyotaro, is it possible for you to give a testcase that consistently hangs if we revert the !IsParent() check? I was also thinking about another possibility where the same plan state node is re-entered, as explained below.

> I don't think it's a good idea to wait for only nodes that are in the
> current subtree. For example, consider a plan like this:
>
> Append
> -> Foreign Scan on a
> -> Hash Join
> -> Foreign Scan on b
> -> Hash
> -> Seq Scan on x
>
> Suppose Append and Foreign Scan are parallel-aware but the other nodes
> are not. Append kicks off the Foreign Scan on a and then waits for
> the hash join to produce a tuple; the hash join kicks off the Foreign
> Scan on b and waits for it to return a tuple. If, while we're waiting
> for the foreign scan on b, the foreign scan on a needs some attention
> - either to produce tuples, or maybe just to call PQconsumeInput() so
> that more data can be sent from the other side, I think we need to be
> able to do that. There's no real problem here; even if the Append
> becomes result-ready before the hash join returns, that is fine.

Yes, I agree: we should be able to do this. Since we have all the waiting events in a common estate, there's no harm in starting to execute nodes of another sub-tree if we get an event from there. But I am thinking about what would happen when this node from the other sub-tree returns result_ready, and then its parents are called, and then the result gets bubbled up to the node which had already caused us to call ExecAsyncWaitForNode() in the first place. For example, in the above plan which you specified, suppose: 1.
Hash Join has called ExecProcNode() for the child foreign scan b, and so is waiting in ExecAsyncWaitForNode(foreign_scan_on_b). 2. The event wait list already has foreign scan on a, which is on a different subtree. 3. This foreign scan a happens to be ready, so in ExecAsyncWaitForNode(), ExecDispatchNode(foreign_scan_a) is called, which returns with result_ready. 4. Since it returns result_ready, its parent node is now inserted in the callbacks array, and so its parent (Append) is executed. 5. But this Append planstate is already in the middle of executing Hash join, and is waiting for HashJoin. Is it safe to execute the same plan state when it is already inside its execution? In other words, is the plan state re-entrant? I suspect the new execution may even corrupt the structures with which it was
Re: [HACKERS] pageinspect: Hash index support
On 9/23/16 1:56 AM, Amit Kapila wrote:
> On Fri, Sep 23, 2016 at 9:40 AM, Peter Eisentraut wrote:
>> - hash_metap result fields spares and mapp should be arrays of integer.
>
> how would you like to see both those arrays in the tuple? Right now, I
> think Jesper's code is showing them as strings.

I'm not sure what you are asking here.

>> (Incidentally, the comment in hash.h talks about bitmaps[] but I think
>> it means hashm_mapp[].)
>
> which comment are you referring to here? hashm_mapp contains block
> numbers of bitmap pages.

The comment I'm referring to says

The blknos of these bitmap pages are kept in bitmaps[]; nmaps is the number of currently existing bitmaps.

But there is no "bitmaps" field anywhere.

> In the above code, it appears that you are trying to calculate
> max_avail space for all pages in the same way. Don't you need to
> calculate it differently for a bitmap page or meta page, as they don't
> share the same format as other types of pages?

Is this even useful for hash indexes? This idea came from btrees. Neither the gin nor the brin code have this. -- Peter Eisentraut http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Speed up Clog Access by increasing CLOG buffers
On 09/23/2016 05:10 AM, Amit Kapila wrote:
> On Fri, Sep 23, 2016 at 5:14 AM, Tomas Vondra wrote:
>> On 09/21/2016 08:04 AM, Amit Kapila wrote:
>>
>> (c) Although it's not visible in the results, 4.5.5 almost perfectly
>> eliminated the fluctuations in the results. For example when 3.2.80
>> produced these results (10 runs with the same parameters):
>>
>> 12118 11610 27939 11771 18065
>> 12152 14375 10983 13614 11077
>>
>> we get this on 4.5.5
>>
>> 37354 37650 37371 37190 37233
>> 38498 37166 36862 37928 38509
>>
>> Notice how much more even the 4.5.5 results are, compared to 3.2.80.
>
> how long was each run? Generally, I do half-hour runs to get stable
> results.

10 x 5-minute runs for each client count. The full shell script driving the benchmark is here: http://bit.ly/2doY6ID and in short it looks like this:

for r in `seq 1 $runs`; do
for c in 1 8 16 32 64 128 192; do
psql -c checkpoint
pgbench -j 8 -c $c ...
done
done

>> It of course begs the question what kernel version is running on the
>> machine used by Dilip (i.e. cthulhu)? Although it's a Power machine, so
>> I'm not sure how much the kernel matters on it.
>
> cthulhu is an x86 m/c and the kernel version is 3.10. Seeing the above
> results, I think the kernel version does matter, but does that mean we
> ignore the benefits we are seeing on a somewhat older kernel version?
> I think the right answer here is to do some experiments which can show
> the actual contention, as suggested by Robert and you.

Yes, I think it'd be useful to test a new kernel version. Perhaps try 4.5.x so that we can compare it to my results. Maybe even try using my shell script.

>> There are results for 64, 128 and 192 clients. Why should we care about
>> numbers in between? How likely (and useful) would it be to get improvement
>> with 96 clients, but no improvement for 64 or 128 clients?
>
> The only point to take was to see from where we have started seeing
> improvement; saying that the TPS has improved from >=72 client count is
> different from saying that it has improved from >=128.
I find the exact client count rather uninteresting - it's going to be quite dependent on hardware, workload etc.

>> I don't dare to suggest rejecting the patch, but I don't see how we could
>> commit any of the patches at this point. So perhaps "returned with feedback"
>> and resubmitting in the next CF (along with analysis of improved workloads)
>> would be appropriate.
>
> Agreed with your conclusion and changed the status of the patch in the
> CF accordingly.

+1 -- Tomas Vondra http://www.2ndQuadrant.com PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Write Ahead Logging for Hash Indexes
On Thu, Sep 22, 2016 at 10:16 AM, Amit Kapila wrote:
> On Thu, Sep 22, 2016 at 8:51 AM, Jeff Janes wrote:
>>
>> Correct. But any torn page write must be covered by the restoration of a
>> full page image during replay, shouldn't it? And that restoration should
>> happen blindly, without first reading in the old page and verifying the
>> checksum.
>
> Probably, but I think this is not currently the way it is handled and
> I don't want to change it. AFAIU, what happens now is that we first
> read the old page (and we do page verification while reading the old page
> and display a *warning* if the checksum doesn't match) and then restore the
> image. The relevant code path is
> XLogReadBufferForRedo()->XLogReadBufferExtended()->ReadBufferWithoutRelcache()->ReadBuffer_common()->PageIsVerified().
>
> Now, here the point is why instead of a WARNING we are seeing a FATAL
> for the bitmap page of the hash index. The reason, as explained in my
> previous e-mail, is that for the bitmap page we are not logging a full
> page image, and we should fix that as explained there. Once the full page
> image is logged, then the first time it reads the torn page, it will use
> the flag RBM_ZERO_AND_LOCK, which will turn the FATAL error you are seeing
> into a WARNING.

I think here I am slightly wrong. For full page writes, it does use RBM_ZERO_AND_LOCK mode to read the page, and for such a mode we are not doing the page verification check; rather, we blindly set the page to zero and then overwrite it with the full page image. So after my fix, you will not see the checksum failure error. I have a fix ready, but am still doing some more verification. If everything passes, I will share the patch in a day or so. -- With Regards, Amit Kapila. EnterpriseDB: http://www.enterprisedb.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] MSVC pl-perl error message is not verbose enough
On 08/02/2016 08:18 PM, John Harvey wrote:
> On Mon, Aug 1, 2016 at 9:39 PM, Michael Paquier wrote:
>> On Tue, Aug 2, 2016 at 2:08 AM, Robert Haas wrote:
>>> Did you add this to the next CommitFest?
>>
>> I have added it here: https://commitfest.postgresql.org/10/691/
>> John, it would be good if you could get a community account and add
>> your name to this patch as its author. I could not find you.
>
> Done. If there's anything else that I should do, please let me know.
> I'd be glad to help out.

Committed, thanks! - Heikki -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
[HACKERS] Re: [HACKERS] Re: [HACKERS] Re: [HACKERS] Re: [HACKERS] Re: [HACKERS] Re: [HACKERS] Windows service is not starting so there’s message in log: FATAL: "could not create shared memory segment
On Thu, Sep 22, 2016 at 10:40 PM, Tom Lane wrote:
> Amit Kapila writes:
>> On Tue, Sep 20, 2016 at 10:33 PM, Tom Lane wrote:
>>> ISTM both the previous coding and this version can fail for no good
>>> reason, that is, what if GetLastError() happens to return one of these
>>> error codes as a result of some unrelated failure from before this
>>> subroutine is entered? That is, wouldn't it be a good idea to
>>> do SetLastError(0) before calling CreateFileMapping?
>>
>> Yes, that seems like a good idea. Do you need a patch with some
>> testing on a Windows environment?
>
> Please; I can't test it.

The attached patch tightens the error handling. However, on debugging, I found that CreateFileMapping() always sets the error code to 0 on success. Basically, before calling CreateFileMapping(), I set the error code to 10 (SetLastError(10)), and then after CreateFileMapping() it sets the error code to 0 on success and an appropriate error code on failure. I also verified that the error code is set to 10 by calling GetLastError() before CreateFileMapping(). Now, it is quite possible that the error code is set to 0 on success in my Windows environment (Win7) but not in some other environment. In any case, if we want to go ahead and not rely on CreateFileMapping(), then the attached patch should suffice. -- With Regards, Amit Kapila. EnterpriseDB: http://www.enterprisedb.com clear_errorcode_dsm-v1.patch Description: Binary data -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] [PATCH] Remove redundant if clause in standbydesc.c
> > Very simple small patch - see attachment.
>
> /* not expected, but print something anyway */
> else if (msg->id == SHAREDINVALRELMAP_ID)
> -appendStringInfoString(buf, " relmap");
> -else if (msg->id == SHAREDINVALRELMAP_ID)
> appendStringInfo(buf, " relmap db %u", msg->rm.dbId);
>
> Looking at inval.c, dbId can be InvalidOid.

Frankly, I'm not very familiar with this part of the code. InvalidOid is just zero. Does it create some problem in this case? -- Best regards, Aleksander Alekseev -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] [PATCH] Remove redundant if clause in standbydesc.c
On Fri, Sep 23, 2016 at 6:08 PM, Aleksander Alekseev wrote:
> Very simple small patch - see attachment.

/* not expected, but print something anyway */
else if (msg->id == SHAREDINVALRELMAP_ID)
-appendStringInfoString(buf, " relmap");
-else if (msg->id == SHAREDINVALRELMAP_ID)
appendStringInfo(buf, " relmap db %u", msg->rm.dbId);

Looking at inval.c, dbId can be InvalidOid. -- Michael -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
[HACKERS] [PATCH] Remove redundant if clause in standbydesc.c
Hello. Very simple small patch - see attachment. -- Best regards, Aleksander Alekseev diff --git a/src/backend/access/rmgrdesc/standbydesc.c b/src/backend/access/rmgrdesc/standbydesc.c index 13797a3..77983e5 100644 --- a/src/backend/access/rmgrdesc/standbydesc.c +++ b/src/backend/access/rmgrdesc/standbydesc.c @@ -122,8 +122,6 @@ standby_desc_invalidations(StringInfo buf, appendStringInfoString(buf, " smgr"); /* not expected, but print something anyway */ else if (msg->id == SHAREDINVALRELMAP_ID) - appendStringInfoString(buf, " relmap"); - else if (msg->id == SHAREDINVALRELMAP_ID) appendStringInfo(buf, " relmap db %u", msg->rm.dbId); else if (msg->id == SHAREDINVALSNAPSHOT_ID) appendStringInfo(buf, " snapshot %u", msg->sn.relId); -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] patch: function xmltable
2016-09-23 10:05 GMT+02:00 Craig Ringer:

> On 22 September 2016 at 02:31, Pavel Stehule wrote:
>
> > another small update - fix XMLPath parser - support multibytes
> > characters
>
> I'm returning for another round of review.
>
> The code doesn't handle the 5 XML built-in entities correctly in
> text-typed output. It processes &apos; and &quot; but not &amp;, &lt;
> or &gt;. See added test. I have not fixed this, but I think it's
> clearly broken:
>
> + -- XML builtin entities
> + SELECT * FROM xmltable('/x/a' PASSING '< > ent>' COLUMNS ent text);
> +  ent
> + ---
> +  '
> +  "
> +
> +
> +
> + (5 rows)
>
> so I've adjusted the docs to claim that they're expanded. The code
> needs fixing to avoid entity-escaping when the output column type is
> not 'xml'.
>
> and entities in xml-typed output are expanded, not preserved. I don't
> know if this is intended but I suspect it is:
>
> + SELECT * FROM xmltable('/x/a' PASSING '< > ent>' COLUMNS ent xml);
> +  ent
> + --
> +  '
> +  "
> +
> +
> +
> + (5 rows)
>
> For the docs changes relevant to the above search for "The five
> predefined XML entities". Adjust that bit of docs if I guessed wrong
> about the intended behaviour.
>
> The tests don't cover CDATA or PCDATA. I didn't try to add that, but
> they should.
>
> Did some docs copy-editing and integrated some examples. Explained how
> nested elements work, that multiple top level elements is an error,
> etc. Explained the time-of-evaluation stuff. Pointed out that you can
> refer to prior output columns in PATH and DEFAULT, since that's weird
> and unusual compared to normal SQL. Documented handling of multiple
> node matches, including the surprising results of somepath/text() on
> xy. Documented handling of nested elements. Documented that xmltable
> works only on XML documents, not fragments/forests.
>
> Regarding evaluation time, it struck me that evaluating path
> expressions once per row means the xpath must be parsed and processed
> once per row.
> Isn't it desirable to store and re-use the preparsed xpath? I don't
> think this is a major problem, since we can later detect
> stable/immutable expressions including constants, evaluate only once
> in that case, and cache. It's just worth thinking about.
>
> The docs and tests don't seem to cover XML entities. What's the
> behaviour there? Core XML only defines one entity, but if a schema
> defines more how are they processed? The tests need to cover the
> predefined entities at least.
>
> I have no idea whether the current code can fetch a DTD and use any
> declarations to expand entities, but I'm guessing not? If not,
> external DTDs, and internal DTDs with external entities should be
> documented as unsupported.
>
> It doesn't seem to cope with internal DTDs at all (libxml2 limitation?):
>
> SELECT * FROM xmltable('/' PASSING $XML$ standalone="yes" ?>
> ]>
> Hello .
> $XML$ COLUMNS foo text);
>
> + ERROR: invalid XML content
> + LINE 1: SELECT * FROM xmltable('/' PASSING $XML$
> + ^
> + DETAIL: line 2: StartTag: invalid element name
> + ^
> + line 3: StartTag: invalid element name
> + ^
> + line 4: StartTag: invalid element name
> + ^
> + line 6: Entity 'pg' not defined
> + Hello .
> + ^
>
> libxml seems to support documents with internal DTDs:
>
> $ xmllint --valid /tmp/x
> ]>
> Hello .
>
> so presumably the issue lies in the xpath stuff? Note that it's not
> even ignoring the DTD and choking on the undefined entity, it's
> choking on the DTD itself.
>
> OK, code comments:
>
> In +ExecEvalTableExpr, shouldn't you be using PG_ENSURE_ERROR_CLEANUP
> instead of a PG_TRY() / PG_CATCH() block?
>
> I think the new way you handle the type stuff is much, much better,
> and with comments to explain too. Thanks very much.
> There's an oversight in tableexpr vs xmltable separation here:
>
> +case T_TableExpr:
> +    *name = "xmltable";
> +    return 2;
>
> presumably you need to look at the node and decide what kind of table
> expression it is or just use a generic "tableexpr".
>
> Same problem here:
>
> +case T_TableExpr:
> +    {
> +        TableExpr *te = (TableExpr *) node;
> +
> +        /* c_expr shoud be closed in brackets */
> +        appendStringInfoString(buf, "XMLTABLE(");

This is correct, but not well commented. Look at the XMLEXPR node: TableExpr is a holder, but it is invisible to the user. The user calls the XMLTABLE function and should see XMLTABLE. It will be cleaner once we support the JSON_TABLE function.

> I don't have the libxml knowledge or remaining brain to usefully
> evaluate the xpath and xml specifics in xpath.c today. It does strike
> me that the new xpath parser should probably live in its own file,
> though.

I'll try to move it to a separate file.

> I
Re: [HACKERS] patch: function xmltable
On 22 September 2016 at 02:31, Pavel Stehule wrote:
> another small update - fix XMLPath parser - support multibytes characters

I'm returning for another round of review.

The code doesn't handle the 5 XML built-in entities correctly in text-typed output. It processes &apos; and &quot; but not &amp;, &lt; or &gt;. See added test. I have not fixed this, but I think it's clearly broken:

+ -- XML builtin entities
+ SELECT * FROM xmltable('/x/a' PASSING '' COLUMNS ent text);
+  ent
+ ---
+  '
+  "
+
+
+
+ (5 rows)

so I've adjusted the docs to claim that they're expanded. The code needs fixing to avoid entity-escaping when the output column type is not 'xml'.

and entities in xml-typed output are expanded, not preserved. I don't know if this is intended but I suspect it is:

+ SELECT * FROM xmltable('/x/a' PASSING '' COLUMNS ent xml);
+  ent
+ --
+  '
+  "
+
+
+
+ (5 rows)

For the docs changes relevant to the above search for "The five predefined XML entities". Adjust that bit of docs if I guessed wrong about the intended behaviour.

The tests don't cover CDATA or PCDATA. I didn't try to add that, but they should.

Did some docs copy-editing and integrated some examples. Explained how nested elements work, that multiple top level elements is an error, etc. Explained the time-of-evaluation stuff. Pointed out that you can refer to prior output columns in PATH and DEFAULT, since that's weird and unusual compared to normal SQL. Documented handling of multiple node matches, including the surprising results of somepath/text() on xy. Documented handling of nested elements. Documented that xmltable works only on XML documents, not fragments/forests.

Regarding evaluation time, it struck me that evaluating path expressions once per row means the xpath must be parsed and processed once per row. Isn't it desirable to store and re-use the preparsed xpath? I don't think this is a major problem, since we can later detect stable/immutable expressions including constants, evaluate only once in that case, and cache.
It's just worth thinking about.

The docs and tests don't seem to cover XML entities. What's the behaviour there? Core XML only defines one entity, but if a schema defines more how are they processed? The tests need to cover the predefined entities at least.

I have no idea whether the current code can fetch a DTD and use any declarations to expand entities, but I'm guessing not? If not, external DTDs, and internal DTDs with external entities should be documented as unsupported.

It doesn't seem to cope with internal DTDs at all (libxml2 limitation?):

SELECT * FROM xmltable('/' PASSING $XML$
]>
Hello .
$XML$ COLUMNS foo text);

+ ERROR: invalid XML content
+ LINE 1: SELECT * FROM xmltable('/' PASSING $XML$
+ ^
+ line 4: StartTag: invalid element name
+ ^
+ line 6: Entity 'pg' not defined
+ Hello .
+ ^

libxml seems to support documents with internal DTDs:

$ xmllint --valid /tmp/x
]>
Hello .

so presumably the issue lies in the xpath stuff? Note that it's not even ignoring the DTD and choking on the undefined entity, it's choking on the DTD itself.

OK, code comments:

In +ExecEvalTableExpr, shouldn't you be using PG_ENSURE_ERROR_CLEANUP instead of a PG_TRY() / PG_CATCH() block?

I think the new way you handle the type stuff is much, much better, and with comments to explain too. Thanks very much.

There's an oversight in tableexpr vs xmltable separation here:

+case T_TableExpr:
+    *name = "xmltable";
+    return 2;

presumably you need to look at the node and decide what kind of table expression it is or just use a generic "tableexpr".

Same problem here:

+case T_TableExpr:
+    {
+        TableExpr *te = (TableExpr *) node;
+
+        /* c_expr shoud be closed in brackets */
+        appendStringInfoString(buf, "XMLTABLE(");

I don't have the libxml knowledge or remaining brain to usefully evaluate the xpath and xml specifics in xpath.c today. It does strike me that the new xpath parser should probably live in its own file, though.

I think this is all a big improvement.
Barring the notes above and my lack of review of the guts of the xml.c parts of it, I'm pretty happy with what I see now.

-- 
Craig Ringer http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
Re: [HACKERS] PL/Python adding support for multi-dimensional arrays
On 09/22/2016 10:28 AM, Pavel Stehule wrote:
> Now, the tests are enough - so I'll mark this patch as ready for
> committers.
>
> I had to fix the tests - there was a lot of whitespace, and the result
> for python3 was missing.

Thanks Pavel!

This crashes with arrays with non-default lower bounds:

postgres=# SELECT * FROM test_type_conversion_array_int4('[2:4]={1,2,3}');
INFO: ([1, 2, ], )
server closed the connection unexpectedly
	This probably means the server terminated abnormally
	before or while processing the request.

I'd like to see some updates to the docs for this. The manual doesn't currently say anything about multi-dimensional arrays in PL/Python, but it should have mentioned that they're not supported. Now that they are supported, it should mention that, and explain briefly that a multi-dimensional array is mapped to a Python list of lists.

It seems we don't have any mention in the docs about arrays with non-default lower bounds ATM. That's not this patch's fault, but it would be good to point out that the lower bounds are discarded when an array is passed to Python.

I find the loop in PLyList_FromArray() quite difficult to understand. Are the comments there mixing up the "inner" and "outer" dimensions? I wonder if it would be easier to read if it were written in a recursive style, rather than iterative with stacks for the dimensions.

On 08/03/2016 02:49 PM, Alexey Grishchenko wrote:
> This patch does not support multi-dimensional arrays of composite
> types, as composite types in Python might be represented as iterators
> and there is no obvious way to find out when the nested array stops and
> the composite type structure starts. For example, if we have a
> composite type of (int, text), we can try to return
> "[ [ [1,'a'], [2,'b'] ], [ [3,'c'], [4,'d'] ] ]", and it is hard to
> find out that the first two lists are lists, and the third one
> represents structure. Things get even more complex when you have
> arrays as members of a composite type. This is why I think this
> limitation is reasonable.

How do we handle single-dimensional arrays of composite types at the moment? At a quick glance, it seems that composite types are just treated like strings when they're in an array. That's probably OK, but it means that there's nothing special about composite types in multi-dimensional arrays. In any case, we should mention that in the docs.

- Heikki
Re: [HACKERS] ICU integration
On Wed, Aug 31, 2016 at 2:46 PM, Peter Eisentraut wrote:
> Here is a patch I've been working on to allow the use of ICU for
> sorting and other locale things.

This is very interesting work, and it's great to see some development in this area. I've been peripherally involved in more than one collation-change-broke-my-data incident over the years. I took the patch for a quick spin today. Here are a couple of initial observations.

This patch adds pkg-config support to our configure script, in order to produce the build options for ICU. That's cool, and I'm a fan of pkg-config, but it's an extra dependency that I just wanted to highlight. For example, MacOSX appears to ship with ICU, but there are no pkg-config or ICU .pc files out of the box (erm, I can't even find the headers, so maybe that copy of ICU is useless to everyone except Apple). The MacPorts ICU port ships with .pc files, so that's easy to deal with, and I don't expect it to be difficult to get a working pkg-config and meta-data installed alongside ICU on any platform that doesn't already ship with them. It may well be useful for configuring other packages. (There is also other cool stuff that would become possible once ICU is optionally around, off topic.)

There is something missing from the configure script however: the output of pkg-config is not making it into CFLAGS or LDFLAGS, so building fails on FreeBSD and MacOSX, where, for example, the ICU headers don't live in the default search path. I tried very briefly to work out why, but autoconfuse defeated me.

You call the built-in strcoll/strxfrm collation provider "posix", and although POSIX does define those functions, it only does so because it inherits them from ISO C90. As far as I can tell Windows provides those functions because it conforms to the C spec, not the POSIX spec, though I suppose you could argue that in that respect it DOES conform to the POSIX spec... Also, we already have a collation called "POSIX".
Maybe we should avoid confusing people with a "posix" provider and a "POSIX" collation? But then I'm not sure what else to call it; perhaps "system" as Petr Jelinek suggested[1], or "libc".

postgres=# select * from pg_collation where collname like 'en_NZ%';
┌──────────────────┬───────────────┬───────────┬──────────────┬──────────────┬──────────────────┬──────────────────┬─────────────┐
│     collname     │ collnamespace │ collowner │ collprovider │ collencoding │   collcollate    │    collctype     │ collversion │
├──────────────────┼───────────────┼───────────┼──────────────┼──────────────┼──────────────────┼──────────────────┼─────────────┤
│ en_NZ            │            11 │        10 │ p            │            6 │ en_NZ            │ en_NZ            │           0 │
│ en_NZ            │            11 │        10 │ p            │            8 │ en_NZ.ISO8859-1  │ en_NZ.ISO8859-1  │           0 │
│ en_NZ            │            11 │        10 │ p            │           16 │ en_NZ.ISO8859-15 │ en_NZ.ISO8859-15 │           0 │
│ en_NZ.ISO8859-1  │            11 │        10 │ p            │            8 │ en_NZ.ISO8859-1  │ en_NZ.ISO8859-1  │           0 │
│ en_NZ.ISO8859-15 │            11 │        10 │ p            │           16 │ en_NZ.ISO8859-15 │ en_NZ.ISO8859-15 │           0 │
│ en_NZ.UTF-8      │            11 │        10 │ p            │            6 │ en_NZ.UTF-8      │ en_NZ.UTF-8      │           0 │
│ en_NZ%icu        │            11 │        10 │ i            │           -1 │ en_NZ            │ en_NZ            │ -1724383232 │
└──────────────────┴───────────────┴───────────┴──────────────┴──────────────┴──────────────────┴──────────────────┴─────────────┘
(7 rows)

I notice that you use encoding -1, meaning "all". I haven't fully grokked what that really means, but it appears that we won't be able to create any new such collations after initdb using CREATE COLLATION, if for example you update your ICU installation and now have a new collation available. When I tried dropping some and recreating them they finished up with collencoding = 6. Should the initdb-created rows use 6 too?

+ ucol_getVersion(collator, versioninfo);
+ numversion = ntohl(*((uint32 *) versioninfo));
+
+ if (numversion != collform->collversion)
+     ereport(WARNING,
+             (errmsg("ICU collator version mismatch"),
+              errdetail("The database was created using version 0x%08X, the library provides version 0x%08X.",
+                        (uint32) collform->collversion, (uint32) numversion),
+              errhint("Rebuild affected indexes, or build PostgreSQL with the right version of ICU.")));

I played around with bumping version numbers artificially.
That gives each session that accesses the collation one warning:

postgres=# select * from foo order by id;
WARNING:  ICU collator version mismatch
DETAIL:  The database was created using version 0x99380001, the library provides version 0x9938.
HINT:  Rebuild affected indexes, or build PostgreSQL with the right version of ICU.
┌────┐
│ id │
├────┤
└────┘
(0 rows)

postgres=# select * from foo
Re: [HACKERS] Showing parallel status in \df+
2016-09-23 7:22 GMT+02:00 Rushabh Lathia:

> On Thu, Sep 22, 2016 at 10:04 PM, Tom Lane wrote:
>
>> Rushabh Lathia writes:
>> > I agree with the argument in this thread: having "Source code" as
>> > part of \df+ is a bit annoying, specifically when the output involves
>> > some really big PL language functions. Having it separate does make
>> > \df+ output more readable. So I would vote for \df++ rather than
>> > adding the source code as part of the footer for \df+.
>>
>> If it's unreadable in \df+, how would \df++ make that any better?
>
> Even though source code as part of \df+ is a bit annoying (specifically
> for PL functions), I noticed the argument in this thread that it's
> useful information for some. So \df++ is just an alternate option for
> those who want the source code.

++ is a little bit obscure. So better to remove src everywhere.

Regards

Pavel

>> regards, tom lane

-- 
Rushabh Lathia
[HACKERS] Re: [HACKERS] Re: [HACKERS] Re: [HACKERS] Re: [HACKERS] Windows service is not starting so there’s message in log: FATAL: "could not create shared memory segment “Global/PostgreSQL.851401618
On Fri, Sep 23, 2016 at 1:21 AM, Tom Lane wrote:
> Robert Haas writes:
>> On Tue, Sep 20, 2016 at 12:53 PM, Tom Lane wrote:
>>> I'd be the first to agree that this point is inadequately documented
>>> in the code, but PostmasterRandom should be reserved for its existing
>>> security-related uses, not exposed to the world for (ahem) random other
>>> uses.
>
>> So, we could have dsm_postmaster_startup() seed the random number
>> generator itself, and then let PostmasterRandom() override the seed
>> later. Like maybe:
>
> Works for me, at least as a temporary solution. The disturbing thing
> here is that this still only does what we want if dsm_postmaster_startup
> happens before the first PostmasterRandom call --- which is OK today but
> seems pretty fragile. Still, redesigning PostmasterRandom's seeding
> technique is not something to do right before 9.6 release. Let's revert
> the prior patch and do it as you have here:
>
>> struct timeval tv;
>> gettimeofday(&tv, NULL);
>> srandom(tv.tv_sec);
>> ...
>> dsm_control_handle = random();
>
> for the time being.

Isn't it better if we use the same technique in dsm_create() as well, which uses random() for the handle?

-- 
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com
Re: [HACKERS] File system operations.
On 22 September 2016 at 20:02, Yury Zhuravlev wrote:
> On Thursday, 15 September 2016, 19:09:16 MSK, Tom Lane wrote:
>>
>> Robert Haas writes:
>>>
>>> On Thu, Sep 15, 2016 at 11:01 AM, Anastasia Lubennikova wrote:
>>>> What do you think about moving stuff from pg_upgrade/file.c to
>>>> storage/file/ to allow reuse of this code? I think it'll be really
>>>> helpful for developers of contrib modules like backup managers.
>>
>>> Well, storage/file is backend and pg_upgrade is frontend. If you want
>>> to reuse the same code for both it's got to go someplace else.
>>
>> Also, to the extent that it's important to use those wrappers rather
>> than libc directly, it's because the wrappers are tied into backend
>> resource management and error handling conventions. I don't see how
>> you convert that into code that also works in a frontend program,
>> or what the point would be exactly.
>
> Maybe we should move towards a framework ecosystem. I mean, developers
> of third-party tools should be able to use as much of the same API as
> the backend as possible.

Well, there's a very gradual shift in that direction with libpgcommon etc. But it's minimalist. Most of what's done in the backend makes no sense in frontend code.

It'd make much more sense to adopt an existing portable runtime (NSPR, APR, whatever) than write our own if we wanted to go full framework. But I don't think we're likely to. It's much more likely to cause pain than prevent it, especially since we're multiprocessing and shmem based.

There are a few areas we could use abstractions for though, like fork()/exec() vs CreateProcessEx(...).

-- 
Craig Ringer http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services