Re: PATCH: Configurable file mode mask
On 03/13/2018 01:50 PM, Stephen Frost wrote:
> Yes, that's true, but PG has never paid attention to POSIX ACLs or Linux
> capabilities. Changing it to do so is quite a bit beyond the scope...

I think we're largely in agreement here, as my aim was not to advocate that PG should work harder to understand the subtleties of every system it could be installed on, but rather that it should work less hard at pretending to understand them when it doesn't, and thus avoid obstructing the admin, who presumably does.

> I'll point out that PG does run on quite a few other OS's beyond Linux.

I'll second that, as I think it is making my argument. When I can supply three or four examples of semantic subtleties that undermine PG's assumptions under Linux alone, the picture when broadened to those quite-a-few other platforms certainly doesn't become simpler!

>> but then sooner or later it will still end up making assumptions
>> that aren't true under, say, SELinux, so there's another #ifdef,
>> and where does it end?
>
> I agree with this general concern. :)

That's probably where it became clear that I'm not advocating an add-#ifdefs-for-everything approach.

> PG will stat PGDATA and conclude that the system is saying that group
> access has been granted on PGDATA and will do the same for subsequent
> files created later on. ... The only issue that remains is that PG
> doesn't understand or work with POSIX ACLs or Linux capabilities,
> but that's not anything new.

What's new is that it is now pretending even more extravagantly to understand what it doesn't understand. Where it would previously draw an incorrect conclusion and refuse to start, which is annoying but not very difficult to work around if need be, it would now draw the same incorrect conclusion and then actively go about making the real world embody the incorrect conclusion, contrary to the admin's efforts.

>> umask inherited by the postmaster. A --permission-transparency option
>> would simply use open and mkdir in the traditional ways, allowing
>> the chosen umask, ACL defaults, SELinux contexts, etc., to do their
>> thing, and would avoid then stepping on the results with explicit
>> chmods (and of course skip the I-refuse-to-start-because-I-
>> misunderstand-your-setup check).
>> ...
>
> I'm a fan of this idea in general, but it's unclear how that
> --permission-transparency option would work in practice. Are you
> suggesting that it be a compile-time flag, which would certainly work
> but also then have to be debated among packagers as to the right setting
> and if there's any need to be concerned about users misunderstanding it,
> or a flag for each program,

I was thinking of a command-line option ...

> which hardly seems ideal as some invocations
> of programs might forget to specify that, leading to inconsistent
> permissions, or something else..?

... but I see how that gets complicated with the various other command-line utilities included.

> .. we'd have to build in complete
> support for POSIX ACLs and Linux capabilities if we go down a route

I'm wary of an argument that we can't do better except by building in complete support for POSIX ACLs, and capabilities (and NFSv4 ACLs, and SELinux, and AppArmor, and #ifdef and #ifdef and #ifdef).

It seems to me that, in most cases, the designers of these sorts of extensions to old traditional Unix behavior take great pains to design them such that they can still usefully function in the presence of programs that "don't pay attention to or understand or use" them, as long as those programs are in some sense well-behaved, and not going out of their way with active steps that insist on or impose permissions that aren't appropriate under the non-understood circumstances.

So my suggestion boils down to PG having at least an option, somehow, to be well-behaved in that sense.

-Chap
Re: TupleTableSlot abstraction
On 2018-03-13 15:18:34 -0300, Alvaro Herrera wrote:
> Andres Freund wrote:
> > Hi,
> >
> > I've recently been discussing with Robert how to abstract
> > TupleTableSlots in the light of upcoming developments around abstracting
> > storage away. Besides that aspect I think we also need to make them
> > more abstract to later deal with vectorized execution, but I'm more fuzzy
> > on the details there.
>
> Hi, is this patch proposed for pg11?

I wish, but I don't see it happening unless I get a time compression device from somewhere :(

Greetings,

Andres Freund
Re: JIT compiling with LLVM v11
On 2018-03-13 10:25:49 -0400, Peter Eisentraut wrote:
> On 3/12/18 17:04, Andres Freund wrote:
> > │ JIT:
> > │   Functions: 4
> > │   Inlining: false
> > │   Inlining Time: 0.000
> > │   Optimization: false
> > │   Optimization Time: 5.023
> > │   Emission Time: 34.987
>
> The time quantities need some units.
>
> > │ Execution time: 46.277 ms
>
> like this :)

Yea, I know. I was planning to start a thread about that. explain.c is littered with code like

    if (es->format == EXPLAIN_FORMAT_TEXT)
        appendStringInfo(es->str, "Planning time: %.3f ms\n",
                         1000.0 * plantime);
    else
        ExplainPropertyFloat("Planning Time", 1000.0 * plantime, 3, es);

which, to me, is bonkers. I think we should add a 'const char *unit' parameter to at least ExplainProperty{Float,Integer,Long}? Or a *Unit version of them doing so, allowing a bit more gradual change?

Greetings,

Andres Freund
Re: JIT compiling with LLVM v11
Hi,

On 2018-03-13 14:36:44 -0400, Robert Haas wrote:
> On Mon, Mar 12, 2018 at 5:04 PM, Andres Freund wrote:
> > Currently a handful of explain outputs in the regression tests change
> > output when compiled with JITing. Therefore I'm thinking of adding
> > JITINFO or such option, which can be set to false for those tests?
>
> Can we spell that JIT or at least JIT_INFO?

The latter works, I don't have a strong opinion on that. For now I've just tied it to COSTS off.

> I realize that EXPLAIN (JIT OFF) may sound like it's intended to
> disable JIT itself

Yea, that's what I'm concerned about.

> , but I think it's pretty clear that EXPLAIN (BUFFERS OFF) does not
> disable the use of actual buffers, only the display of buffer-related
> information.

Hm.

Greetings,

Andres Freund
Re: INOUT parameters in procedures
2018-03-13 14:14 GMT+01:00 Peter Eisentraut <peter.eisentr...@2ndquadrant.com>:
> On 3/8/18 02:25, Pavel Stehule wrote:
> > It looks like some error in this concept. The rules for enabling
> > overloaded procedures should be modified, so this collision should not
> > happen.
> >
> > When I use a procedure from PL/pgSQL, it is clear, so I place variables
> > in the *OUT positions. But when I call a procedure from the top level,
> > then I'll pass fake parameters to get some result.
>
> What we'll probably want to do here is to make the OUT parameters part
> of the identity signature of procedures, unlike in functions. This
> should be a straightforward change, but it will require some legwork in
> many parts of the code.

yes

> > if (argmodes && (argmodes[i] == PROARGMODE_INOUT ||
> >     argmodes[i] == PROARGMODE_OUT))
> > + {
> > +     Param *param;
> >
> > Because PROARGMODE_OUT is disallowed, this check is a little bit
> > messy. Please, add some comment.
>
> Fixed.
>
> I discovered another issue, in LANGUAGE SQL procedures. Currently, if
> you make a CALL with an INOUT parameter in an SQL procedure, the output
> is thrown away (unless it's the last command). I would like to keep
> open the option of assigning the results by name, like we do in
> PL/pgSQL. So in this patch I have made a change to prohibit calling
> procedures with INOUT parameters in LANGUAGE SQL routines (see
> check_sql_fn_statements()). What do you think?

Disabling it is probably the best that is possible now. The variables in SQL are more named parameters than variables. It is not necessary to complicate it.

Regards

Pavel

> --
> Peter Eisentraut  http://www.2ndQuadrant.com/
> PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Re: [HACKERS] why not parallel seq scan for slow functions
On Wed, Feb 14, 2018 at 5:37 AM, Amit Kapila wrote:
> Your concern is valid, but doesn't the same problem exist in the other
> approach as well, because in that one too we can call
> adjust_paths_for_srfs after generating the gather path, which means that we
> might lose some opportunity to reduce the relative cost of parallel
> paths due to tlists having SRFs. Also, a similar problem can happen
> in create_order_paths for the cases as described in the example
> above.

You're right. I think cleaning all of this up for v11 is too much to consider, but please tell me your opinion of the attached patch set. Here, instead of ripping the problematic logic out of apply_projection_to_path, what I've done is just removed several of the callers of apply_projection_to_path. I think that the work of removing other callers of that function could be postponed to a future release, but we'd still get some benefit now, and this shows the direction I have in mind.

I'm going to explain what the patches do one by one, but out of order, because I backed into the need for the earlier patches as a result of troubleshooting the later ones in the series. Note that the patches need to be applied in order even though I'm explaining them out of order.

0003 introduces a new upper relation to represent the result of applying the scan/join target to the topmost scan/join relation. I'll explain further down why this seems to be needed. Since each path must belong to only one relation, this involves using create_projection_path() for the non-partial pathlist as we already do for the partial pathlist, rather than apply_projection_to_path(). This is probably marginally more expensive, but I'm hoping it doesn't matter. (However, I have not tested.) Each non-partial path in the topmost scan/join rel produces a corresponding path in the new upper rel. The same is done for partial paths if the scan/join target is parallel-safe; otherwise we can't.

0004 causes the main part of the planner to skip calling generate_gather_paths() from the topmost scan/join rel. This logic is mostly stolen from your patch. If the scan/join target is NOT parallel-safe, we perform generate_gather_paths() on the topmost scan/join rel. If the scan/join target IS parallel-safe, we do it on the post-projection rel introduced by 0003 instead. This is the patch that actually fixes Jeff's original complaint.

0005 goes a bit further and removes a bunch of logic from create_ordered_paths(). The removed logic tries to satisfy the query's ordering requirement via cheapest_partial_path + Sort + Gather Merge. Instead, it adds an additional path to the new upper rel added in 0003. This again avoids a call to apply_projection_to_path() which could cause projection to be pushed down after costing has already been fixed. Therefore, it gains the same advantages as 0004 for queries that would use this sort of plan. Currently, this loses the ability to set limit_tuples for the Sort path; that doesn't look too hard to fix, but I haven't done it. If we decide to proceed with this overall approach, I'll see about getting it sorted out.

Unfortunately, when I initially tried this approach, a number of things broke due to the fact that create_projection_path() was not exactly equivalent to apply_projection_to_path(). This initially surprised me, because create_projection_plan() contains logic to eliminate the Result node that is very similar to the logic in apply_projection_to_path(). If apply_projection_to_path() sees that the subordinate node is projection-capable, it applies the revised target list to the child; if not, it adds a Result node. create_projection_plan() does the same thing. However, create_projection_plan() can lose the physical-tlist optimization for the subordinate node; it forces an exact projection even if the parent node doesn't require this.

0001 fixes this, so that we get the same plans with create_projection_path() that we would with apply_projection_to_path(). I think this patch is a good idea regardless of what we decide to do about the rest of this, because it can potentially avoid losing the physical-tlist optimization in any place where create_projection_path() is used.

It turns out that 0001 doesn't manage to preserve the physical-tlist optimization when the projection path is attached to an upper relation. 0002 fixes this.

If we decide to go forward with this approach, it may make sense to merge some of these when actually committing, but I found it useful to separate them for development and testing purposes, and for clarity about what was getting changed at each stage.

Please let me know your thoughts.

Thanks,

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

Attachments:
0005-Remove-explicit-path-construction-logic-in-create_or.patch
0004-Postpone-generate_gather_paths-for-topmost-scan-join.patch
Re: JIT compiling with LLVM v11
On Mon, Mar 12, 2018 at 5:04 PM, Andres Freund wrote:
> Currently a handful of explain outputs in the regression tests change
> output when compiled with JITing. Therefore I'm thinking of adding
> JITINFO or such option, which can be set to false for those tests?

Can we spell that JIT or at least JIT_INFO? I realize that EXPLAIN (JIT OFF) may sound like it's intended to disable JIT itself, but I think it's pretty clear that EXPLAIN (BUFFERS OFF) does not disable the use of actual buffers, only the display of buffer-related information.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
Re: proposal: schema variables
2018-03-13 10:54 GMT+01:00 Pavel Luzanov:
> Pavel Stehule wrote:
> > Now, there is one important question - storage - Postgres stores all
> > objects to files - only memory storage is not designed yet. This is the
> > part where I need help.
>
> Oh, I do not feel confident in such questions.
> Maybe you can get some ideas from an extension with similar functionality:
> https://github.com/postgrespro/pg_variables

Unfortunately not - it doesn't implement this functionality.

Regards

Pavel

> -
> Pavel Luzanov
> Postgres Professional: http://www.postgrespro.com
> The Russian Postgres Company
>
> --
> Sent from: http://www.postgresql-archive.org/PostgreSQL-hackers-f1928748.html
neqjoinsel versus "refresh materialized view concurrently"
The following commit has caused a devastating performance regression in concurrent refresh of MVs:

commit 7ca25b7de6aefa5537e0dbe56541bc41c0464f97
Author: Tom Lane
Date: Wed Nov 29 22:00:29 2017 -0500

    Fix neqjoinsel's behavior for semi/anti join cases.

The below reproduction goes from taking about 1 second to refresh, to taking an amount of time I don't have the patience to measure.

drop table foobar2 cascade;
create table foobar2 as select * from generate_series(1,200000);
create materialized view foobar3 as select * from foobar2;
create unique index on foobar3 (generate_series);
analyze foobar3;
refresh materialized view CONCURRENTLY foobar3;

When I interrupt the refresh, I get a message including this line:

CONTEXT: SQL statement "SELECT newdata FROM pg_temp_3.pg_temp_16420 newdata WHERE newdata IS NOT NULL AND EXISTS (SELECT * FROM pg_temp_3.pg_temp_16420 newdata2 WHERE newdata2 IS NOT NULL AND newdata2 OPERATOR(pg_catalog.*=) newdata AND newdata2.ctid OPERATOR(pg_catalog.<>) newdata.ctid) LIMIT 1"

So it makes sense that the commit in question could have caused a change in the execution plan. Because these are temp tables, I can't easily get my hands on them to investigate further.

Cheers,

Jeff
Re: [HACKERS] Removing [Merge]Append nodes which contain a single subpath
On Mon, Feb 19, 2018 at 4:02 AM, David Rowley wrote:
> On 19 February 2018 at 18:01, David Rowley wrote:
>> On 19 February 2018 at 15:11, Tomas Vondra wrote:
>>> and perhaps we should do s/isproxy/is_proxy/ which seems like the usual
>>> naming for boolean variables.
>>
>> You're right. I'll change that and post an updated patch. I'll also
>> reprocess your email again and try to improve some comments to make
>> the intent of the code more clear.
>
> I've made a few changes to the patch. "isproxy" is now "is_proxy".
> I've made the comments a bit more verbose at the top of the definition
> of struct AppendPath. Also shuffled some comments around in allpaths.c
>
> I've attached both a delta patch with just these changes, and also a
> completely new patch based on a recent master.

While I was working on http://postgr.es/m/ca+tgmoakt5gmahbpwgqrr2nadfomaonoxyowhrdvfgws34t...@mail.gmail.com I noticed that a projection path is very close to being usable as a proxy path, because we've already got code to strip out unnecessary proxy paths at plan generation time. I noticed that there were a few problems with that, which I wrote patches to fix (see 0001, 0002 attached to that email), but they seem to be only minor issues. What do you think about the idea of using a projection path as a proxy path instead of inventing a new method? It seems simple enough to do:

    new_path = (Path *) create_projection_path(root, new_rel, old_path,
                                               old_path->pathtarget);

...when we need a proxy path.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
Re: PATCH: Configurable file mode mask
On 03/13/2018 02:47 PM, Tom Lane wrote:
> Well, to be blunt, what we target is POSIX-compatible systems. If
> you're telling us to worry about non-POSIX filesystem semantics,
> and your name is not Microsoft, it's going to be a hard sell.
> We have enough to do just keeping up with that scope of target
> systems.

So, how many POSIX-compatible systems are available today (if any) where you can actually safely assume that there are no additional security/access-control-related considerations in effect beyond three user bits/three group bits/three other bits, and not be wrong?

I'm not advocating the Sisyphean task of having PG incorporate knowledge of all those possibilities. I'm advocating the conservative approach: have PG be that well-behaved application that those extended semantics are generally all designed to play well with, and just not do stuff that obstructs or tramples over what the admin takes care to set up.

On 03/13/2018 03:44 PM, Stephen Frost wrote:
> * Chapman Flack (c...@anastigmatix.net) wrote:
>> So my suggestion boils down to PG having at least an option, somehow,
>> to be well-behaved in that sense.
>
> I'm afraid that we haven't got any great answer to that "somehow". I
> was hoping you might have some other ideas beyond command-line
> switches which could leave the system in an inconsistent state a bit
> too easily.

I wonder how complicated it would really have to be. On any system with a POSIX base, I guess it's possible for PGDATA to have an S_ISVTX "sticky" bit in the mode, which does have an official significance (but one that only affects whether non-postgres users can rename or unlink things in the directory, which might be of little practical significance). Perhaps its meaning could be overloaded with "the admin is handling the permissions, thank you", and the postmaster and various command-line utilities could see it and refrain from any gratuitous chmods or refusals to function.

Or, if overloading S_ISVTX seems in poor taste, what would be wrong with simply checking for an empty file PERMISSIONS-ARE-MANAGED in PGDATA and responding the same way?

Or, assuming some form of ACL is available, just let the admin change the owner and group of PGDATA to other than postgres, grant no access to other, and give rwx to postgres in the ACL? PG could then reason as follows:

* I do not own this directory.
* I am not in the group of this directory.
* It grants no access to other.
* Yet, I find myself listing and accessing files in it without difficulty.
* The admin has set this up for me in a way I do not understand.
* I will refrain from messing with it.

Three ideas off the top of my head. Probably more where they came from. :)

-Chap
Re: parallel append vs. simple UNION ALL
On Tue, Mar 13, 2018 at 12:35 AM, Ashutosh Bapat wrote:
> It looks like it was not changed in all the places. make failed. I
> have fixed all the instances of these two functions in the attached
> patchset (only 0003 changes). Please check.

Oops. Thanks. I'm going to go ahead and commit 0001 here. Any more thoughts on the rest?

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
Re: Problem while setting the fpw with SIGHUP
On Fri, Mar 09, 2018 at 01:42:04PM +0530, Dilip Kumar wrote:
> While setting full_page_writes with SIGHUP I hit an assert in the checkpoint
> process. And that is because inside a CRITICAL section we are calling
> RecoveryInProgress, which in turn allocates memory. So I have moved the
> RecoveryInProgress call out of the CRITICAL section and the problem got
> solved.

Indeed, I can see how this is possible. If you apply the following you can also have way more fun, but that's overdoing it:

--- a/src/backend/access/transam/xlog.c
+++ b/src/backend/access/transam/xlog.c
@@ -7918,6 +7918,8 @@ CheckRecoveryConsistency(void)
 bool
 RecoveryInProgress(void)
 {
+	Assert(CritSectionCount == 0);

Anyway, it seems to me that you are not taking care of all possible race conditions here. RecoveryInProgress() could as well be called in XLogFlush(), and that's a code path taken during redo. Instead of doing what you are suggesting, why not move InitXLogInsert() out of InitXLOGAccess() and change InitPostgres() so that the allocation for WAL inserts is done unconditionally? This has the cost of also making this allocation even for backends which are started during recovery; still, we are talking about allocating a couple of bytes in exchange for addressing completely all race conditions in this area. InitXLogInsert() does not depend on any post-recovery data like ThisTimeLineId, so a split is possible.

Thoughts?
-- 
Michael
Re: PATCH: Unlogged tables re-initialization tests
On Mon, Mar 12, 2018 at 02:33:18PM -0400, David Steele wrote:
> On 3/12/18 11:59 AM, Dagfinn Ilmari Mannsåker wrote:
>> However, that is still wrong, because die() and BAIL_OUT() mean
>> different things: die() aborts the current test script, while BAIL_OUT()
>> aborts the entire 'prove' run, i.e. all subsequent scripts in the same
>> test directory.
>
> If that's the case, do we really want to abort all subsequent test
> modules if a single module fails? I'm good either way, just throwing it
> out there for consideration.

What I am getting is that in those code paths we want the test series to die in a painful way, which should be reserved for tests where things are critically failing. In that case, as documented by Test::More, we should use BAIL_OUT. It is true that using die() has the advantage that one can look at multiple failures in a single run when you want to look for similar failure patterns; however, it makes debugging harder if you have a lot of test scripts, because it becomes harder to track which one was the first to fail in a parallel test. Also, if you look at all the code paths calling die() in the test suites, those refer to critical failures which should cause an immediate stop of the test. So my take would be to switch all die() calls to BAIL_OUT() in order to make all this code consistent.
-- 
Michael
Re: [patch] BUG #15005: ANALYZE can make pg_class.reltuples inaccurate.
On Mon, 12 Mar 2018 12:21:34 -0400 Tom Lane wrote:
> I wrote:
> > Maybe this type of situation is an argument for trusting an ANALYZE-based
> > estimate more than the VACUUM-based estimate. I remain uncomfortable with
> > that in cases where VACUUM looked at much more of the table than ANALYZE
> > did, though. Maybe we need some heuristic based on the number of pages
> > actually visited by each pass?
>
> I looked into doing something like that. It's possible, but it's fairly
> invasive; there's no clean way to compare those page counts without
> altering the API of acquire_sample_rows() to pass back the number of pages
> it visited. That would mean a change in FDW-visible APIs. We could do
> that, but I don't want to insist on it when there's nothing backing it up
> except a fear that *maybe* ANALYZE's estimate will be wrong often enough
> to worry about.
>
> So at this point I'm prepared to go forward with your patch, though not
> to risk back-patching it. Field experience will tell us if we need to
> do more. I propose the attached cosmetic refactoring, though.

I like the refactor. Making vac_estimate_reltuples() specific to the special case of vacuum, and keeping the normal analyze case just in analyze, seems like an improvement overall.

If it helps, I have been experimenting with your thought experiment (update the first 20% of rows, then delete 50% of those) to try to trick analyze. I built test scripts and generated data and found that the current system after vacuum is usually about 8% to 10% off on reltuples. Analyze moves it very slowly closer to the true count. With the patch, analyze nails it immediately.

To see how this was affected by the relationship of table size and sample, I created another test setup just to iterate analyze runs and compare to the true count. After a couple thousand analyzes with various table mutations, table sizes up to 100 M rows (1.8 M pages), and default_statistics_target ranging from 1 to 1000, I am pretty confident that we can get 2% accuracy for any size table even with the old statistics target. The table size does not really matter much; the error drops and clusters more tightly as sample size increases, but past 10000 or so it's well into diminishing returns.

Summary of 100 analyze runs for each line below. Errors and sample fraction are in percent for easy reading.

 testcase   | Mrows | stats |  pages  | sample | fraction | maxerr | avgerr | stddev
------------+-------+-------+---------+--------+----------+--------+--------+--------
 first-last |    10 |     1 |  163935 |    300 | 0.001830 | 6.6663 | 2.3491 | 2.9310
 first-last |    10 |     3 |  163935 |    900 | 0.005490 | 3.8886 | 1.2451 | 1.5960
 first-last |    10 |    10 |  163935 |   3000 | 0.018300 | 2.8337 | 0.7539 | 0.9657
 first-last |    10 |    33 |  163935 |   9900 | 0.060390 | 1.4903 | 0.3723 | 0.4653
 first-last |    10 |   100 |  163935 |  30000 | 0.182999 | 0.6580 | 0.2221 | 0.2707
 first-last |    10 |   333 |  163935 |  99900 | 0.609388 | 0.1960 | 0.0758 | 0.0897
 first-last |   100 |     1 | 1639345 |    300 | 0.000183 | 8.7500 | 2.2292 | 2.8685
 first-last |   100 |     3 | 1639345 |    900 | 0.000549 | 5.4166 | 1.1695 | 1.5431
 first-last |   100 |    10 | 1639345 |   3000 | 0.001830 | 1.7916 | 0.6258 | 0.7593
 first-last |   100 |    33 | 1639345 |   9900 | 0.006039 | 1.8182 | 0.4141 | 0.5433
 first-last |   100 |   100 | 1639345 |  30000 | 0.018300 | 0.9417 | 0.2464 | 0.3098
 first-last |   100 |   333 | 1639345 |  99900 | 0.060939 | 0.4642 | 0.1206 | 0.1542
 first-last |   100 |  1000 | 1639345 | 300000 | 0.183000 | 0.2192 | 0.0626 | 0.0776
 un-updated |    10 |     1 |  180328 |    300 | 0.001664 | 7.9259 | 2.2845 | 2.7806
 un-updated |    10 |     3 |  180328 |    900 | 0.004991 | 4.2964 | 1.2923 | 1.5990
 un-updated |    10 |    10 |  180328 |   3000 | 0.016636 | 2.2593 | 0.6734 | 0.8271
 un-updated |    10 |    33 |  180328 |   9900 | 0.054900 | 0.9260 | 0.3305 | 0.3997
 un-updated |    10 |   100 |  180328 |  30000 | 0.166364 | 1.0162 | 0.2024 | 0.2691
 un-updated |    10 |   333 |  180328 |  99900 | 0.553991 | 0.2058 | 0.0683 | 0.0868
 un-updated |   100 |     1 | 1803279 |    300 | 0.000166 | 7.0000 | 1.8793 | 2.3820
 un-updated |   100 |     3 | 1803279 |    900 | 0.000499 | 3.8889 | 1.0586 | 1.3265
 un-updated |   100 |    10 | 1803279 |   3000 | 0.001664 | 2.1407 | 0.6710 | 0.8376
 un-updated |   100 |    33 | 1803279 |   9900 | 0.005490 | 1.1728 | 0.3779 | 0.4596
 un-updated |   100 |   100 | 1803279 |  30000 | 0.016636 | 0.6256 | 0.1983 | 0.2551
 un-updated |   100 |   333 | 1803279 |  99900 | 0.055399 | 0.3454 | 0.1181 | 0.1407
 un-updated |   100 |  1000 | 1803279 | 300000 | 0.166364 | 0.1738 | 0.0593 | 0.0724

I also thought about the theory and am confident that there really is no way to trick it. Basically if there are enough
Re: Ambigous Plan - Larger Table on Hash Side
On Mon, Mar 12, 2018 at 10:02 PM, Narendra Pradeep U U wrote:
> Hi,
>
> Recently I came across a case where the planner chose the larger table on
> the hash side. I am not sure whether it is intended behavior or we are
> missing something.
>
> I have two tables (a and b), each with a single column. One table 'a' is
> large, with around 30 million distinct rows, and the other table 'b' has
> merely 70,000 rows with one-seventh (10,000) distinct rows. I have analyzed
> both tables. But while joining the two tables I get the larger table on
> the hash side.
>
> tpch=# explain select b from b left join a on a = b;
>                                  QUERY PLAN
> ---------------------------------------------------------------------------
>  Hash Left Join  (cost=824863.75..950104.42 rows=78264 width=4)
>    Hash Cond: (b.b = a.a)
>    ->  Foreign Scan on b  (cost=0.00..821.64 rows=78264 width=4)
>          CStore File: /home/likewise-open/pg96/data/cstore_fdw/1818708/1849879
>          CStore File Size: 314587
>    ->  Hash  (cost=321721.22..321721.22 rows=30667722 width=4)
>          ->  Foreign Scan on a  (cost=0.00..321721.22 rows=30667722 width=4)
>                CStore File: /home/likewise-open/pg96/data/cstore_fdw/1818708/1849876
>                CStore File Size: 123236206
> (9 rows)
>
> I would like to know the reason for choosing this plan, and is there an
> easy fix to prevent such plans (especially like this one, where it chooses
> a larger hash table)?

A plan with the larger table being hashed isn't necessarily a bad-performing one. During partition-wise join analysis I have seen plans with the larger table being hashed perform better than plans with the smaller table being hashed. But I have seen the other way around as well. Although I don't know an easy way to force which side of the join gets hashed, I tried that under the debugger. In your case, if you run EXPLAIN ANALYZE on this query and produce outputs of two plans — one with the larger table being hashed and a second with the smaller one being hashed — you will see which of them performs better.

-- 
Best Wishes,
Ashutosh Bapat
EnterpriseDB Corporation
The Postgres Database Company
Re: [WIP PATCH] Index scan offset optimisation using visibility map
Tom, thanks for inspecting the patch. There are so many problems that slipped my attention... But one thing that bothers me most is the problem with predicate locking.

> On 13 March 2018 at 0:55, Tom Lane wrote:
>
> The PredicateLockPage call also troubles me quite a bit, not only from
> a modularity standpoint but because that implies a somewhat-user-visible
> behavioral change when this optimization activates. People who are using
> serializable mode are not going to find it to be an improvement if their
> queries fail and need retries a lot more often than they did before.
> I don't know if that problem is bad enough that we should disable skipping
> when serializable mode is active, but it's something to think about.

Users already have to expect this if the scan turns into an IoS somewhat accidentally. There will be page predicate locks with possible false-positive conflicts. I'm not sure that keeping the existing tuple-level lock granularity is necessary. I think we can do it if we introduce a PredicateLockTuple variant that does not require the tuple's xmin, i.e. takes only a tid and does not look into the heap. But this tweak seems outside of the patch's scope, and I believe it's better to avoid messing with SSI internals without a strong reason.

Best regards, Andrey Borodin.
Re: PATCH: Configurable file mode mask
On Mon, Mar 12, 2018 at 03:14:13PM -0400, Stephen Frost wrote:
> We already had a discussion about having a GUC for this and concluded,
> rightly in my view, that it's not sensible to have since we don't want
> all of the various tools having to read and parse out postgresql.conf.

If the problem is parsing, it could as well be more portable to put that in the control file, no? For example, I have finished implementing my own flavor of pg_controldata to save parsing effort, so that it prints control file fields on a per-call basis using individual fields; this also avoided games with locales, as translations of pg_controldata can disturb the parsing logic.

> I don't see anything in the discussion which has changed that and I
> don't agree that there's an issue with using the privileges on the data
> directory for this- it's a simple solution which all of the tools can
> use and work with easily. I certainly don't agree that it's a serious
> issue to relax the explicit check- it's just a check, which a user could
> implement themselves if they wished to and had a concern for. On the
> other hand, with the explicit check, we are actively preventing an
> entirely reasonable goal of wanting to use a read-only role to perform a
> backup of the system.

Well, one thing is that the current checks in the postmaster make sure that a data folder is never using anything other than 0700. From a security point of view, making it possible to allow a postmaster to start with 0750 is a step backwards if users don't authorize it explicitly. There are a lot of systems which use a bunch of users with only a single group with systemd, so this would remove an existing safeguard. I am not against the idea of this thread, just that I think that secure defaults should be user-enforceable if they want Postgres to behave so.
-- 
Michael
RE: Temporary tables prevent autovacuum, leading to XID wraparound
From: Tom Lane [mailto:t...@sss.pgh.pa.us] > So the correct fix is to improve autovacuum's check to discover whether > an old temp table is orphaned, so that it isn't fooled by putative owner > processes that are connected to some other DB. Step 2 of the proposed patch > tries to do that, but it's only half right: we need a change near line 2264 > not only line 2090. (Evidently, we need either a comment that the logic > is repeated, or else refactor so that there's only one copy.) > > Now, you can argue that autovacuum's check can be fooled by an "owner" > backend that is connected to the current DB but hasn't actually taken > possession of its assigned temp schema (and hence the table in question > really is orphaned after all). This edge case could be taken care of by > having backends clear their temp schema immediately, as in step 1 of the > patch. But I still think that that is an expensive way to catch what would > be a really infrequent case. Perhaps a better idea is to have backends > advertise in PGPROC whether they have taken possession of their assigned > temp schema, and then autovacuum could ignore putative owners that aren't > showing that flag. Or we could just do nothing about that, on the grounds > that nobody has shown the case to be worth worrying about. > Temp schemas that are sufficiently low-numbered to be likely to have an > apparent owner are also likely to get cleaned out by actual temp table > creation reasonably often, so I'm not at all convinced that we need a > mechanism that covers this case. We do need something for the cross-DB > case though, because even a very low-numbered temp schema can be problematic > if it's in a seldom-used DB, which seems to be the case that was complained > of originally. > > On the whole, my vote is to fix and apply step 2, and leave it at that. Done. It seems to work well. Regards Takayuki Tsunakawa reset_temp_schema_v2.patch Description: reset_temp_schema_v2.patch
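The cross-DB condition Tom describes can be modeled in a few lines. This is only a toy sketch of the decision rule, not the autovacuum code: a temp table counts as orphaned when its apparent owner backend is gone, or is connected to a database other than the one holding the table.

```python
def temp_table_is_orphaned(table_db, owner_backend_id, backends):
    """Toy version of the improved autovacuum check: the putative owner
    must exist AND be connected to the same database as the temp table."""
    owner = backends.get(owner_backend_id)
    if owner is None:
        return True                       # owner backend no longer exists
    return owner["database"] != table_db  # owner is busy in some other DB

backends = {1: {"database": "appdb"}, 2: {"database": "reports"}}

assert not temp_table_is_orphaned("appdb", 1, backends)  # genuine owner
assert temp_table_is_orphaned("appdb", 2, backends)      # owner in another DB
assert temp_table_is_orphaned("appdb", 3, backends)      # owner gone entirely
```

The remaining edge case from the thread, an owner connected to the right database but not yet in possession of its temp schema, would need an extra flag per backend entry, as Tom suggests.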
Re: proposal: schema variables
Pavel Stehule wrote: > Now, there is one important question - storage - Postgres stores all > objects to files - only memory storage is not designed yet. This is part, > where I need a help. Oh, I do not feel confident about such questions. Maybe you can get some ideas from an extension with similar functionality: https://github.com/postgrespro/pg_variables - Pavel Luzanov Postgres Professional: http://www.postgrespro.com The Russian Postgres Company -- Sent from: http://www.postgresql-archive.org/PostgreSQL-hackers-f1928748.html
Re: Ambigous Plan - Larger Table on Hash Side
Hi, Thanks everyone for your suggestions. I would like to add EXPLAIN ANALYZE output for both plans so that we can have a broader picture. I have a work_mem of 1000 MB. The plan we get regularly, with the tables analyzed: tpch=# explain analyze select b from tab2 left join tab1 on a = b; QUERY PLAN Hash Left Join (cost=945515.68..1071064.34 rows=78264 width=4) (actual time=9439.410..20445.620 rows=78264 loops=1) Hash Cond: (tab2.b = tab1.a) -> Seq Scan on tab2 (cost=0.00..1129.64 rows=78264 width=4) (actual time=0.006..5.116 rows=78264 loops=1) -> Hash (cost=442374.30..442374.30 rows=30667630 width=4) (actual time=9133.593..9133.593 rows=30667722 loops=1) Buckets: 33554432 Batches: 2 Memory Usage: 801126kB -> Seq Scan on tab1 (cost=0.00..442374.30 rows=30667630 width=4) (actual time=0.030..3584.652 rows=30667722 loops=1) Planning time: 0.055 ms Execution time: 20472.603 ms (8 rows) I reproduced the other plan by not analyzing the smaller table. tpch=# explain analyze select b from tab2 left join tab1 on a = b; QUERY PLAN -- Hash Right Join (cost=2102.88..905274.97 rows=78039 width=4) (actual time=15.331..7590.406 rows=78264 loops=1) Hash Cond: (tab1.a = tab2.b) -> Seq Scan on tab1 (cost=0.00..442375.48 rows=30667748 width=4) (actual time=0.046..2697.480 rows=30667722 loops=1) -> Hash (cost=1127.39..1127.39 rows=78039 width=4) (actual time=15.133..15.133 rows=78264 loops=1) Buckets: 131072 Batches: 1 Memory Usage: 3776kB -> Seq Scan on tab2 (cost=0.00..1127.39 rows=78039 width=4) (actual time=0.009..5.516 rows=78264 loops=1) Planning time: 0.053 ms Execution time: 7592.688 ms (8 rows) The plan chosen after analyzing is actually slower. The smaller table (tab2) has each row duplicated exactly 8 times, and all the rows in the larger table (tab1) are distinct. What may be the exact reason, and can we fix this?
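For intuition about why the second plan wins, here is a toy hash join (scaled-down row counts; nothing like the PostgreSQL executor): both build orders return the same result, but building on the duplicate-heavy small side yields a far smaller hash table, which is what lets the second plan above run in one 3776kB batch instead of two ~800MB ones.

```python
def hash_join(build_side, probe_side):
    """Toy in-memory hash join; returns (match count, hash-table entries)."""
    ht = {}
    for key in build_side:                  # build phase
        ht[key] = ht.get(key, 0) + 1
    matches = sum(ht.get(key, 0) for key in probe_side)  # probe phase
    return matches, len(ht)

big = list(range(300_000))                  # all-distinct keys, like tab1
small = [i % 1_000 for i in range(7_800)]   # heavily duplicated, like tab2

m1, entries_small_build = hash_join(small, big)
m2, entries_big_build = hash_join(big, small)

assert m1 == m2                                  # same join result either way
assert entries_small_build < entries_big_build   # but memory use is not the same
```

Which order is actually faster also depends on probe cost and batching, which is why, as noted downthread, hashing the larger side is not automatically a bad plan.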
P.S. I have also attached a SQL file to reproduce this. On Tue, 13 Mar 2018 12:42:12 +0530 Ashutosh Bapat ashutosh.ba...@enterprisedb.com wrote: On Mon, Mar 12, 2018 at 10:02 PM, Narendra Pradeep U U narendra.prad...@zohocorp.com wrote: Hi, Recently I came across a case where the planner chose the larger table on the hash side. I am not sure whether it is intended behavior or we are missing something. I have two tables (a and b), each with a single column. One table, 'a', is large with around 30 million distinct rows, and the other table, 'b', has merely 70,000 rows with one-seventh (10,000) distinct values. I have analyzed both tables. But while joining the two tables I get the larger table on the hash side. tpch=# explain select b from b left join a on a = b; QUERY PLAN - Hash Left Join (cost=824863.75..950104.42 rows=78264 width=4) Hash Cond: (b.b = a.a) -> Foreign Scan on b (cost=0.00..821.64 rows=78264 width=4) CStore File: /home/likewise-open/pg96/data/cstore_fdw/1818708/1849879 CStore File Size: 314587 -> Hash (cost=321721.22..321721.22 rows=30667722 width=4) -> Foreign Scan on a (cost=0.00..321721.22 rows=30667722 width=4) CStore File: /home/likewise-open/pg96/data/cstore_fdw/1818708/1849876 CStore File Size: 123236206 (9 rows) I would like to know the reason for choosing this plan, and is there an easy fix to prevent such plans (especially like this one, where it chooses the larger table to hash)? A plan with the larger table being hashed isn't necessarily a bad-performing one. During partition-wise join analysis I have seen plans with the larger table being hashed perform better than plans with the smaller table being hashed. But I have seen the other way around as well. I don't know an easy way to force which side of the join gets hashed, though; I tried that under the debugger. In your case, if you run EXPLAIN ANALYZE on this query and produce the outputs of the two plans, one with the larger table being hashed and a second with the smaller one being hashed, you will see which of them performs better.
-- Best Wishes, Ashutosh Bapat EnterpriseDB Corporation The Postgres Database Company join_plan.sql Description: Binary data
Re: SQL/JSON: functions
On 7 March 2018 at 14:34, Nikita Glukhov wrote: > Attached 12th version of SQL/JSON patches rebased onto the latest master. Please write some docs or notes to go with this. If you drop a big pile of code with no explanation it will just be ignored. I think many people want SQL/JSON, but the purpose of a patch submission is to allow a committer to review and commit without needing to edit anything. It shouldn't be like assembling flat pack furniture while wearing a blindfold. -- Simon Riggs http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Re: Google Summer of Code: Potential Applicant
Hello everyone, > > I am mostly interested in anything that requires C/C++ implementation and > > AlgoDS. > > > > For that reason I would love to work in any of the following (in that > > order of preference): > > > >1. Sorting algorithms benchmark and implementation > >2. Enhancing amcheck for all AMs > >3. TOAST'ing in slices > >4. Thrift datatype support > > > Having recently worked with Thrift, I recommend ... don't use Thrift. The > library is awkward to work with, it isn't very source-compatible across > versions. > > Consider protobuf instead. Craig, I believe you were probably doing something wrong if you had to work with the library directly. Normally you generate classes from a text description and just use them. I worked with Thrift some time ago, in 2015 [1]. I wouldn't call it awkward. Protobuf is fine too, but unfortunately we don't have any Protobuf-related projects this time. Also it's probably worth noting that the GSoC project doesn't imply using any existing libraries, only the binary format, which is quite stable. Christos, I appreciate your interest in the Thrift-related project. You should know however that we already have a student interested in it [2]. Feel free to apply for it as well, but in this case be prepared for a little competition. [1]: https://github.com/afiskon/scala-thrift-example/blob/master/src/test/scala/me/eax/examples/thrift/tests/BinaryProtocol.scala#L15 [2]: https://postgr.es/m/CA%2BSXE9sP1iHNp9_DFJzdbE0cszAA-QF8d-8GAUyoCA4q9KCsGw%40mail.gmail.com -- Best regards, Aleksander Alekseev signature.asc Description: PGP signature
Re: Using base backup exclusion filters to reduce data transferred with pg_rewind
05.02.2018 10:10, Michael Paquier: So the patch set attached is made of the following: - 0001, which refactors all hardcoded system paths into pg_paths.h. This modifies only initdb.c and basebackup.c to ease reviews. - 0002 spreads the path changes and the use of pg_paths.h across the core code. - 0003 moves the last set of definitions with backup_label, tablespace_map and pg_internal.init. - 0004 creates basebackup_paths.h, this can be consumed by pg_rewind. - 0005 makes the changes for pg_rewind. Thank you for this set of patches. This refactoring makes the code way more convenient to read and change. Due to some merge conflicts, patch 0002 was not applying cleanly, so I attach the updated version. I also noticed a couple of lines that were not updated, and wrote a patch 0006, which contains just minor changes and can be applied on top of any patch after 0003. Since these patches contain mostly cosmetic changes and do not break anything, I think it's fine to mark this thread as Ready For Committer without long discussion. -- Anastasia Lubennikova Postgres Professional: http://www.postgrespro.com The Russian Postgres Company >From 6698d430ffb6bd2e33aba0b21dc2a9358cf6328c Mon Sep 17 00:00:00 2001 From: Michael Paquier Date: Mon, 5 Feb 2018 13:03:38 +0900 Subject: [PATCH 1/5] Refactor path definitions into a single header file, pg_paths.h The definitions of all internal paths of PostgreSQL are spread across the code base, making tracing of those definitions difficult to begin with, and resulting in a lot of code paths being hardcoded. While this has no real consequences in practice, this makes the creation of lists manipulating those paths less maintainable, as things stand the already-hardcoded paths need to be copied around more. In order to improve things for the long-term, move all those definitions into a single header file. This commit does a first step by switching basebackup.c and initdb.c to use them.
An upcoming commit will make all the definitions across the code base use this new header more consistently. --- src/backend/postmaster/postmaster.c | 1 + src/backend/postmaster/syslogger.c | 1 + src/backend/replication/basebackup.c | 16 ++-- src/backend/storage/ipc/dsm.c| 1 + src/backend/storage/ipc/dsm_impl.c | 1 + src/backend/storage/lmgr/predicate.c | 3 +- src/backend/utils/adt/misc.c | 1 + src/bin/initdb/initdb.c | 45 ++-- src/include/access/xlog_internal.h | 7 +- src/include/pg_paths.h | 137 +++ src/include/pgstat.h | 3 - src/include/postmaster/syslogger.h | 7 -- src/include/storage/dsm_impl.h | 9 --- src/include/utils/guc.h | 7 -- 14 files changed, 176 insertions(+), 63 deletions(-) create mode 100644 src/include/pg_paths.h diff --git a/src/backend/postmaster/postmaster.c b/src/backend/postmaster/postmaster.c index f3ddf828bb..66d80914a0 100644 --- a/src/backend/postmaster/postmaster.c +++ b/src/backend/postmaster/postmaster.c @@ -105,6 +105,7 @@ #include "libpq/pqsignal.h" #include "miscadmin.h" #include "pg_getopt.h" +#include "pg_paths.h" #include "pgstat.h" #include "port/pg_bswap.h" #include "postmaster/autovacuum.h" diff --git a/src/backend/postmaster/syslogger.c b/src/backend/postmaster/syslogger.c index f70eea37df..c8770f6c61 100644 --- a/src/backend/postmaster/syslogger.c +++ b/src/backend/postmaster/syslogger.c @@ -35,6 +35,7 @@ #include "libpq/pqsignal.h" #include "miscadmin.h" #include "nodes/pg_list.h" +#include "pg_paths.h" #include "pgstat.h" #include "pgtime.h" #include "postmaster/fork_process.h" diff --git a/src/backend/replication/basebackup.c b/src/backend/replication/basebackup.c index dd7ad64862..63ab81f837 100644 --- a/src/backend/replication/basebackup.c +++ b/src/backend/replication/basebackup.c @@ -117,25 +117,25 @@ static const char *excludeDirContents[] = * even if the intention is to restore to another master. See backup.sgml * for a more detailed description. 
*/ - "pg_replslot", + PG_REPLSLOT_DIR, /* Contents removed on startup, see dsm_cleanup_for_mmap(). */ PG_DYNSHMEM_DIR, /* Contents removed on startup, see AsyncShmemInit(). */ - "pg_notify", + PG_NOTIFY_DIR, /* * Old contents are loaded for possible debugging but are not required for * normal operation, see OldSerXidInit(). */ - "pg_serial", + PG_SERIAL_DIR, /* Contents removed on startup, see DeleteAllExportedSnapshotFiles(). */ - "pg_snapshots", + PG_SNAPSHOTS_DIR, /* Contents zeroed on startup, see StartupSUBTRANS(). */ - "pg_subtrans", + PG_SUBTRANS_DIR, /* end of list */ NULL @@ -147,7 +147,7 @@ static const char *excludeDirContents[] = static const char *excludeFiles[] = { /* Skip auto conf temporary file. */ - PG_AUTOCONF_FILENAME ".tmp", + PG_AUTOCONF_FILENAME_TMP, /* Skip current log file temporary file */
Re: Additional Statistics Hooks
On 13 March 2018 at 11:44, Tom Lane wrote: > While it would certainly be nice to have better behavior for that, > "add a hook so users who can write C can fix it by hand" doesn't seem > like a great solution. On top of the sheer difficulty of writing a > hook function, you'd have the problem that no pre-written hook could > know about all available functions. I think somehow we'd need a way > to add per-function knowledge, perhaps roughly like the protransform > feature. I always imagined that extended statistics could be used for this. Right now the estimates are much better when you create an index on the function, but there's no real reason to limit the stats that are gathered to just plain columns + expression indexes. I believe I'm not the only person to have considered this. Originally extended statistics were named multivariate statistics. I think it was Dean and I (maybe others too) that suggested to Tomas to give the feature a more generic name so that it can be used for a more general purpose later. -- David Rowley http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services
Re: MCV lists for highly skewed distributions
Hi Dean, I've looked over your patch briefly, but not tested it yet. > [construction of a pathological data set] > So the table has around 31 million rows, and the stats target is maxed > out -- we're sampling around 10% of the table, and it's not possible > to sample more. Looking at the value a=5, it appears 102 times in the > table, so we can expect it to be seen around 10 times in any sample, > but the minimum count for the new algorithm in this case is 23, so it > won't be included in the MCV list. This leads to it having the same > estimate as all other non-MCV values, which turns out to be rows=5, a > considerable under-estimate. > So arguably, the new algorithm is still better than the current one > for this data, but the fact that it doesn't include a=5 in the list > bugs me, because the estimate for that single value is consistently > worse. Looking at the distribution of a=5 in the sample, it is a > hypergeometric distribution with N=31000000, n=3000000 and K=102. This > gives a sample mean of 9.8 and a variance of 8.9 (a standard deviation > of 3.0). Mean = n * K / N = 9.8 Var = (K / N) * (1 - K / N) * n * (N - n) / (N - 1) = 8.9 Stdev = sqrt(Var) = 3.0 Got it. > Also, this is common enough that in fact that distribution > can be reasonably approximated by a normal distribution. For the archives, because it's typically seen 10 times in the sample, per the rule of thumb mentioned upthread? > The value is > excluded from the MCV list because the spread is deemed too large, > relative to the mean, so the relative error of the MCV-based estimate > would be too high. However, not including it in the MCV list causes it > to have an estimate of 5, which would correspond to a sample mean of > 0.5, and the observed sample mean is more than 3 standard deviations
So we do actually have enough information to > conclude that the value is almost certainly more common than the > non-MCV selectivity would suggest, and I think that's sufficient > justification for including it in the MCV list. > The crux of this is that the relative-standard-error algorithm is > making use of 2 pieces of information -- the sample frequency, and an > estimate of its statistical spread. However, there is a 3rd piece of > information that we know, that is not being made use of -- the > selectivity that would be assigned to the value if it were not > included in the MCV list. Looking at just the first 2 pieces of > information ensures that we get decent estimates for the values in the > MCV list (where what constitutes "decent" depends on an arbitrarily > chosen RSE cutoff), but it doesn't take into account the fact that the > values not in the list may have much worse estimates. Making use of > the non-MCV-based estimate allows us to do better -- if the MCV-based > estimate is statistically significantly higher than the non-MCV-based > estimate, then it would almost certainly be better to include that > value in the MCV list, even if its spread is quite large. Because we can treat the sample as normal, testing for > 2 standard deviations means that we're "~95% sure" it's more common in the table than the non-MCV selectivity would suggest, right? (I realize I'm not using technically accurate language) In your v1 patch, we didn't need a clamp on 10 values in the sample to treat it as normal, because it was mathematically impossible to pass the RSE test with less than 10 unless we were sampling most of the table, in which case it didn't matter. Is there a similar justification with the new algorithm? > [results summary] > > So HEAD and P1 are both producing too many MCVs. The latest 2 patches > cure that problem, but the new v2 patch is better at picking out all > the common values. Looks good. 
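For anyone checking the numbers, Dean's figures and the proposed inclusion rule can be reproduced directly from the hypergeometric formulas quoted above. The 2-standard-deviation cutoff below is my reading of the "statistically significantly higher" test, so treat it as a sketch rather than the patch's exact rule:

```python
import math

def sample_count_stats(N, n, K):
    """Mean and stddev of a value's count in an n-row sample drawn without
    replacement from an N-row table in which the value occurs K times."""
    mean = n * K / N
    var = (K / N) * (1 - K / N) * n * (N - n) / (N - 1)
    return mean, math.sqrt(var)

# Dean's example: 31M-row table, ~10% sample, value a=5 occurs 102 times.
mean, stdev = sample_count_stats(31_000_000, 3_000_000, 102)
# mean comes out near 9.87 and stdev near 3.0, matching the mail.

# The non-MCV estimate of rows=5 corresponds to a sample mean of ~0.5;
# the observed count is far more than 2 stddevs above that, so include a=5.
non_mcv_sample_mean = 0.5
include_in_mcv = mean - 2 * stdev > non_mcv_sample_mean
```

The point of the exercise: the value fails the pure relative-standard-error cutoff (stdev/mean ≈ 0.3), yet the comparison against the non-MCV estimate still argues for keeping it.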
I'll run some tests of my own this week, trying to find corner cases different from yours. As far as the code goes, I haven't studied it very closely yet, but I did want to call attention to this comment: + /* +* If the value is kept in the MCV list, its population frequency is +* assumed to equal its sample frequency, and the distribution of the +* value's count in the sample is a hypergeometric distribution with +* the following standard deviation. +*/ The part after "and" doesn't seem to follow from the first part. Would you mind clarifying? -John Naylor
Re: Additional Statistics Hooks
On Tue, Mar 13, 2018 at 4:14 AM, Tom Lane wrote: > Mat Arye writes: >> So the use-case is an analytical query like > >> SELECT date_trunc('hour', time) AS MetricMinuteTs, AVG(value) as avg >> FROM hyper >> WHERE time >= '2001-01-04T00:00:00' AND time <= '2001-01-05T01:00:00' >> GROUP BY MetricMinuteTs >> ORDER BY MetricMinuteTs DESC; > >> Right now this query will choose a much-less-efficient GroupAggregate plan >> instead of a HashAggregate. It will choose this because it thinks the >> number of groups >> produced here is 9,000,000 because that's the number of distinct time >> values there are. >> But, because date_trunc "buckets" the values there will be about 24 groups >> (1 for each hour). > > While it would certainly be nice to have better behavior for that, > "add a hook so users who can write C can fix it by hand" doesn't seem > like a great solution. On top of the sheer difficulty of writing a > hook function, you'd have the problem that no pre-written hook could > know about all available functions. I think somehow we'd need a way > to add per-function knowledge, perhaps roughly like the protransform > feature. Just as a cost is associated with a function, we could associate a mapping cardinality with a function. It tells how many distinct input values map to 1 output value. By input value, I mean the input argument tuple. In Mat's case the mapping cardinality will be 12. The number of distinct values that the function may output is then estimated as the estimated number of rows divided by the mapping cardinality of that function. -- Best Wishes, Ashutosh Bapat EnterpriseDB Corporation The Postgres Database Company
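The mapping-cardinality idea is easy to sketch numerically. The numbers below are mine, not Mat's: one reading every 10 seconds means 360 distinct timestamps map to each hour bucket, and since every row carries a distinct timestamp here, rows and distinct inputs coincide, so dividing by the mapping cardinality recovers the true group count that the planner currently misses:

```python
from datetime import datetime, timedelta

# 9,000 distinct timestamps, one every 10 seconds, starting on the hour.
start = datetime(2001, 1, 4)
times = [start + timedelta(seconds=10 * i) for i in range(9_000)]

# What GROUP BY date_trunc('hour', time) actually produces:
actual_groups = len({t.replace(minute=0, second=0, microsecond=0) for t in times})

# The proposed estimate: rows (= distinct inputs here) / mapping cardinality.
mapping_cardinality = 360  # 10-second ticks per hour -- an assumed statistic
estimated_groups = len(set(times)) / mapping_cardinality

assert actual_groups == 25
assert estimated_groups == 25.0
```

With the naive distinct-input estimate the planner would expect 9,000 groups; the corrected estimate of 25 is what would tip it toward a HashAggregate.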
Re: Google Summer of Code: Potential Applicant
Hi all, First of all, I appreciate your insights, Craig, but I think I will stick with AlgoDS ;) Aleksander, I am mostly interested in the sorting algorithms benchmark and implementation one. I will start writing a proposal soon enough. Do you have any project-related insights as to what I should put in there? Thanks a lot in advance! On Tue, Mar 13, 2018 at 11:55 AM, Aleksander Alekseev < a.aleks...@postgrespro.ru> wrote: > Hello everyone, > > > > I am mostly interested in anything that requires C/C++ implementation > and > > > AlgoDS. > > > > > > For that reason I would love to work in any of the following (in that > > > order of preference): > > > > > >1. Sorting algorithms benchmark and implementation > > >2. Enhancing amcheck for all AMs > > >3. TOAST'ing in slices > > >4. Thrift datatype support > > > > > Having recently worked with Thrift, I recommend ... don't use Thrift. The > > library is awkward to work with, it isn't very source-compatible across > > versions. > > > > Consider protobuf instead. > > Craig, I believe you probably did something wrong if you had to work > with some library directly. Actually you generate classes from text > description and just use them. I worked with Thrift some time ago, in > 2015 [1]. I wouldn't call it awkward. Protobuf is fine too, but > unfortunately we don't have any Protobuf-related projects this time. > Also it's probably worth noticing that the GSoC project doesn't imply > using any existing libraries, only the binary format which is quite > stable. > > Christos, I appreciate your interest in the Thrift-related project. You > should know however that we already have a student interested in it [2]. > Feel free to apply for it as well but in this case be prepared for a > little competition.
> > [1]: https://github.com/afiskon/scala-thrift-example/blob/ > master/src/test/scala/me/eax/examples/thrift/tests/ > BinaryProtocol.scala#L15 > [2]: https://postgr.es/m/CA%2BSXE9sP1iHNp9_DFJzdbE0cszAA- > QF8d-8GAUyoCA4q9KCsGw%40mail.gmail.com > > -- > Best regards, > Aleksander Alekseev >
Re: Re: [HACKERS] make async slave to wait for lsn to be replayed
Hi Ivan, On 3/6/18 9:25 PM, Michael Paquier wrote: > On Tue, Mar 06, 2018 at 02:24:24PM +0300, Ivan Kartyshov wrote: >> Hello, I now is preparing the patch over syntax that Simon offered. And in >> few day I will update the patch. >> Thank you for your interest in thread. > > It has been more than one month since a patch update has been requested, > and time is growing short. This refactored patch introduces a whole new > concept as well, so my recommendation would be to mark this patch as > returned with feedback, and then review it freshly for v12 if this > concept is still alive and around. This patch wasn't updated at the beginning of the CF and still hasn't been updated after almost two weeks. I have marked the patch Returned with Feedback. Please resubmit to a new CF when you have an updated patch. Regards, -- -David da...@pgmasters.net
Re: [HACKERS] GSoC 2017: weekly progress reports (week 4) and patch for hash index
On Mon, Mar 12, 2018 at 7:18 PM, Amit Kapila wrote: > On Mon, Mar 12, 2018 at 12:18 AM, Alexander Korotkov > wrote: >> On Sat, Mar 3, 2018 at 2:53 PM, Amit Kapila wrote: >>> >>> On Fri, Mar 2, 2018 at 9:27 AM, Thomas Munro wrote: >>> > If that is indeed a race, could it be fixed by >>> > calling PredicateLockPageSplit() at the start of _hash_splitbucket() >>> > instead? >>> > >>> >>> Yes, but I think it would be better if we call this once we are sure >>> that at least one tuple from the old bucket has been transferred >>> (consider if all tuples in the old bucket are dead). >> >> >> Is it really fair? For example, a predicate lock can be held by a session >> which queried some key, but didn't find any corresponding tuple. >> Imagine this key should be in the new bucket while all existing >> tuples would be left in the old bucket. As I understand it, in this case no locks >> would be transferred since no tuples were moved to the new bucket. >> So, further insertion into the new bucket wouldn't conflict with the session >> which looked for the non-existing key, while it should. Does it make sense? >> > > Valid point, I think on split we should always transfer locks from the old > bucket to the new bucket. > Attached patch changes it as per the above suggestion. -- With Regards, Amit Kapila. EnterpriseDB: http://www.enterprisedb.com Predicate-Locking-in-hash-index_v8.patch Description: Binary data
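Alexander's scenario — a reader that predicate-locked the old bucket while searching for a key that did not exist yet — can be modeled abstractly. This toy model (sets standing in for buckets and SIREAD locks, nothing like the real SSI code) shows why transferring locks only when tuples actually move loses the conflict:

```python
def split_bucket(tuples, locks, old, new, moves_to_new, transfer_always):
    """Toy bucket split: relocate tuples, optionally transfer predicate locks."""
    moved = {t for t in tuples[old] if moves_to_new(t)}
    tuples[old] -= moved
    tuples[new] |= moved
    if transfer_always or moved:
        locks[new] |= locks[old]

# The old bucket holds no tuples for key 'k', but a reader searched for 'k'
# there and left a predicate lock; after the split, 'k' routes to bucket 1.
tuples = {0: set(), 1: set()}
locks = {0: {"reader"}, 1: set()}
split_bucket(tuples, locks, 0, 1, lambda t: True, transfer_always=False)
assert "reader" not in locks[1]   # a later insert of 'k' misses the conflict

locks = {0: {"reader"}, 1: set()}
split_bucket(tuples, locks, 0, 1, lambda t: True, transfer_always=True)
assert "reader" in locks[1]       # unconditional transfer preserves it
```

Hence the conclusion in the thread: transfer locks from the old bucket to the new one unconditionally, not only when at least one tuple moved.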
Re: PATCH: Unlogged tables re-initialization tests
Thanks for reviewing, Peter. On 3/9/18 5:23 PM, Peter Eisentraut wrote: > This seems like a useful test. > > On 3/5/18 12:35, David Steele wrote: >> +mkdir($tablespaceDir) >> +or die "unable to mkdir \"$tablespaceDir\""; > > Use BAIL_OUT instead of die in tests. Done. >> +$ts1UnloggedPath = $node->safe_psql('postgres', >> +q{select pg_relation_filepath('ts1_unlogged')}); > > strange indentation Sure is. Fixed. >> +# Write forks to test that they are removed during recovery >> +$node->command_ok(['touch', "$pgdata/${baseUnloggedPath}_vm"], >> +'touch vm fork in base'); >> +$node->command_ok(['touch', "$pgdata/${baseUnloggedPath}_fsm"], >> +'touch fsm fork in base'); > > These are not tests, just some prep work. So they should not use > command_ok. Removed command_ok(). > It would probably also be better to avoid the Unix-specific touch > command and instead use Perl code to open and write to the files. Updated to use append_to_file(). >> +# Check unlogged table in base >> +ok(-f "$pgdata/${baseUnloggedPath}_init", 'init fork in base'); >> +ok(-f "$pgdata/$baseUnloggedPath", 'main fork in base'); >> +ok(!-f "$pgdata/${baseUnloggedPath}_vm", 'vm fork not in base'); >> +ok(!-f "$pgdata/${baseUnloggedPath}_fsm", 'fsm fork not in base'); > > These test names could be a bit more verbose and distinct, for example, > 'main fork was recreated after restart'. Done. A new patch is attached. Thanks, -- -David da...@pgmasters.net >From 6e082e9db8f401d986a03d02023173e37063996b Mon Sep 17 00:00:00 2001 From: David Steele Date: Tue, 13 Mar 2018 10:06:09 -0400 Subject: [PATCH 1/1] Add regression tests for reinit.c. These were originally written for a patch that refactored reinit.c. That refactor ended up not being needed, but it seems like a good idea to add the tests.
--- src/test/recovery/t/014_unlogged_reinit.pl | 97 ++ 1 file changed, 97 insertions(+) create mode 100644 src/test/recovery/t/014_unlogged_reinit.pl diff --git a/src/test/recovery/t/014_unlogged_reinit.pl b/src/test/recovery/t/014_unlogged_reinit.pl new file mode 100644 index 00..e87c805e46 --- /dev/null +++ b/src/test/recovery/t/014_unlogged_reinit.pl @@ -0,0 +1,97 @@ +# Tests that unlogged tables are properly reinitialized after a crash. +# +# The behavior should be the same when restoring from a backup but that is not +# tested here. +use strict; +use warnings; +use PostgresNode; +use TestLib; +use Test::More tests => 12; + +# Initialize node without replication settings +my $node = get_new_node('main'); + +$node->init; +$node->start; +my $pgdata = $node->data_dir; + +# Create an unlogged table to test that forks other than init are not copied +$node->safe_psql('postgres', 'CREATE UNLOGGED TABLE base_unlogged (id int)'); + +my $baseUnloggedPath = $node->safe_psql('postgres', + q{select pg_relation_filepath('base_unlogged')}); + +# Make sure main and init forks exist +ok(-f "$pgdata/${baseUnloggedPath}_init", 'init fork in base'); +ok(-f "$pgdata/$baseUnloggedPath", 'main fork in base'); + +# Create unlogged tables in a tablespace +my $tablespaceDir = undef; +my $ts1UnloggedPath = undef; + +$tablespaceDir = TestLib::tempdir . 
"/ts1"; + +mkdir($tablespaceDir) + or BAIL_OUT("unable to mkdir '$tablespaceDir'"); + +$node->safe_psql('postgres', + "CREATE TABLESPACE ts1 LOCATION '$tablespaceDir'"); +$node->safe_psql('postgres', + 'CREATE UNLOGGED TABLE ts1_unlogged (id int) TABLESPACE ts1'); + +$ts1UnloggedPath = $node->safe_psql('postgres', + q{select pg_relation_filepath('ts1_unlogged')}); + +# Make sure main and init forks exist +ok(-f "$pgdata/${ts1UnloggedPath}_init", 'init fork in tablespace'); +ok(-f "$pgdata/$ts1UnloggedPath", 'main fork in tablespace'); + +# Crash the postmaster +$node->stop('immediate'); + +# Write forks to test that they are removed during recovery +append_to_file("$pgdata/${baseUnloggedPath}_vm", 'TEST_VM'); +append_to_file("$pgdata/${baseUnloggedPath}_fsm", 'TEST_FSM'); + +# Remove main fork to test that it is recopied from init +unlink("$pgdata/${baseUnloggedPath}") + or BAIL_OUT("unable to remove '${baseUnloggedPath}'"); + +# Write forks to test that they are removed by recovery +append_to_file("$pgdata/${ts1UnloggedPath}_vm", 'TEST_VM'); +append_to_file("$pgdata/${ts1UnloggedPath}_fsm", 'TEST_FSM'); + +# Remove main fork to test that it is recopied from init +unlink("$pgdata/${ts1UnloggedPath}") + or BAIL_OUT("unable to remove '${ts1UnloggedPath}'"); + +# Start the postmaster +$node->start; + +# Check unlogged table in base +ok(-f "$pgdata/${baseUnloggedPath}_init", 'init fork still exists in base'); +ok(-f "$pgdata/$baseUnloggedPath", 'main fork in base recreated at startup'); +ok(!-f "$pgdata/${baseUnloggedPath}_vm", 'vm fork in base removed at startup'); +ok(!-f "$pgdata/${baseUnloggedPath}_fsm", + 'fsm fork in base removed at startup'); + +# Drop unlogged table
Re: Fix error in ECPG while connection handling
> Now, ideally the connection would have been null here, but, as the > 'ClosePortalStmt' > rule freed the connection but did not set it to NULL, it still sees > that there > is a connection (which actually contains garbage) and throws an > error. Thanks for spotting and fixing. I will push the patch as soon as I'm online again. Michael -- Michael Meskes Michael at Fam-Meskes dot De, Michael at Meskes dot (De|Com|Net|Org) Meskes at (Debian|Postgresql) dot Org Jabber: michael at xmpp dot meskes dot org VfL Borussia! Força Barça! SF 49ers! Use Debian GNU/Linux, PostgreSQL
Re: [HACKERS] GSoC 2017: weekly progress reports (week 6)
Alexander Korotkov wrote: > And what happens if somebody concurrently sets (fastupdate = on)? > Can we miss conflicts because of that? I think it'd be better to have that option require an AccessExclusive lock, so that it can never be changed concurrently with readers. Seems to me that penalizing every single read to cope with this case would be a bad trade-off. -- Álvaro Herrera https://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Re: INOUT parameters in procedures
On 3/8/18 02:25, Pavel Stehule wrote: > It looks like some error in this concept. The rules for overloading > procedures should be modified, so this collision cannot > happen. > > When I use a procedure from PL/pgSQL, then it is clear: I place variables in > the *OUT positions. But when I call a procedure from the top level, I'll > pass fake parameters to get some result. What we'll probably want to do here is to make the OUT parameters part of the identity signature of procedures, unlike in functions. This should be a straightforward change, but it will require some legwork in many parts of the code. > if (argmodes && (argmodes[i] == PROARGMODE_INOUT || > argmodes[i] == PROARGMODE_OUT)) > + { > + Param *param; > > Because PROARGMODE_OUT is disallowed, this check is a little bit > messy. Please, add some comment. Fixed. I discovered another issue, in LANGUAGE SQL procedures. Currently, if you make a CALL with an INOUT parameter in an SQL procedure, the output is thrown away (unless it's the last command). I would like to keep open the option of assigning the results by name, like we do in PL/pgSQL. So in this patch I have made a change to prohibit calling procedures with INOUT parameters in LANGUAGE SQL routines (see check_sql_fn_statements()). What do you think? -- Peter Eisentraut http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services From 5b9f1506e73826f4f6ff567e54b12c4e232a4263 Mon Sep 17 00:00:00 2001 From: Peter Eisentraut Date: Mon, 12 Mar 2018 21:39:26 -0400 Subject: [PATCH v4] Support INOUT parameters in procedures In a top-level CALL, the values of INOUT parameters will be returned as a result row. In PL/pgSQL, the values are assigned back to the input parameters. In other languages, the same convention as for returning a record from a function is used. That does not require any code changes in the PL implementations.
Reviewed-by: Pavel Stehule --- doc/src/sgml/plperl.sgml | 14 +++ doc/src/sgml/plpgsql.sgml | 16 +++ doc/src/sgml/plpython.sgml | 11 ++ doc/src/sgml/pltcl.sgml| 12 ++ doc/src/sgml/ref/create_procedure.sgml | 7 +- src/backend/catalog/pg_proc.c | 4 +- src/backend/commands/functioncmds.c| 51 +++-- src/backend/executor/functions.c | 51 + src/backend/tcop/utility.c | 3 +- src/backend/utils/fmgr/funcapi.c | 11 +- src/include/commands/defrem.h | 3 +- src/include/executor/functions.h | 2 + src/include/funcapi.h | 3 +- src/pl/plperl/expected/plperl_call.out | 25 + src/pl/plperl/sql/plperl_call.sql | 22 src/pl/plpgsql/src/expected/plpgsql_call.out | 89 +++ .../plpgsql/src/expected/plpgsql_transaction.out | 2 +- src/pl/plpgsql/src/pl_comp.c | 10 +- src/pl/plpgsql/src/pl_exec.c | 125 - src/pl/plpgsql/src/pl_funcs.c | 25 + src/pl/plpgsql/src/pl_gram.y | 38 +-- src/pl/plpgsql/src/pl_scanner.c| 1 + src/pl/plpgsql/src/plpgsql.h | 12 ++ src/pl/plpgsql/src/sql/plpgsql_call.sql| 108 ++ src/pl/plpython/expected/plpython_call.out | 23 src/pl/plpython/plpy_exec.c| 24 ++-- src/pl/plpython/sql/plpython_call.sql | 20 src/pl/tcl/expected/pltcl_call.out | 26 + src/pl/tcl/sql/pltcl_call.sql | 23 src/test/regress/expected/create_procedure.out | 21 src/test/regress/sql/create_procedure.sql | 19 31 files changed, 752 insertions(+), 49 deletions(-) diff --git a/doc/src/sgml/plperl.sgml b/doc/src/sgml/plperl.sgml index cff7a847de..9295c03db9 100644 --- a/doc/src/sgml/plperl.sgml +++ b/doc/src/sgml/plperl.sgml @@ -278,6 +278,20 @@ PL/Perl Functions and Arguments hash will be returned as null values. + + Similarly, output parameters of procedures can be returned as a hash + reference: + + +CREATE PROCEDURE perl_triple(INOUT a integer, INOUT b integer) AS $$ +my ($a, $b) = @_; +return {a = $a * 3, b = $b * 3}; +$$ LANGUAGE plperl; + +CALL perl_triple(5, 10); + + + PL/Perl functions can also return sets of either scalar or composite types. 
Usually you'll want to return rows one at a diff --git a/doc/src/sgml/plpgsql.sgml b/doc/src/sgml/plpgsql.sgml index c1e3c6a19d..6c25116538 100644 --- a/doc/src/sgml/plpgsql.sgml +++ b/doc/src/sgml/plpgsql.sgml @@ -1870,6 +1870,22 @@ Returning From a
Re: Google Summer of Code: Potential Applicant
Hello Christos, > Do you have any project related insights as to what I should put in > there? Nope :) I believe Andrey Borodin and Atri Sharma have (added to CC). -- Best regards, Aleksander Alekseev signature.asc Description: PGP signature
Re: Using base backup exclusion filters to reduce data transferred with pg_rewind
On Tue, Mar 13, 2018 at 01:16:27PM +0300, Anastasia Lubennikova wrote: > Since these patches contain mostly cosmetic changes and do not break > anything, I think it's fine to mark this thread as Ready For Committer > without long discussion. Thanks, Anastasia, for the review. The refactoring is quite intuitive I think, though perhaps a bit too intrusive. Extra opinions on that are welcome. Rereading the patch set, please note that I am not yet happy about the handling of temporary files when applying the filters in pg_rewind, nor about the inconsistency between the file filters and the directory filters... -- Michael signature.asc Description: PGP signature
Re: SQL/JSON: functions
On Tue, Mar 13, 2018 at 2:04 PM, Simon Riggs wrote: > On 7 March 2018 at 14:34, Nikita Glukhov wrote: > >> Attached is the 12th version of the SQL/JSON patches, rebased onto the latest master. > > Please write some docs or notes to go with this. > > If you drop a big pile of code with no explanation it will just be ignored. > > I think many people want SQL/JSON, but the purpose of a patch > submission is to allow a committer to review and commit without > needing to edit anything. It shouldn't be like assembling flat pack > furniture while wearing a blindfold. The docs are here: https://github.com/obartunov/sqljsondoc/blob/master/README.jsonpath.md It's not easy to write docs for SQL/JSON in XML, so I decided to write them in a more friendly way. We'll have time to convert them to the Postgres doc format. > > -- > Simon Riggs http://www.2ndQuadrant.com/ > PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Re: [HACKERS] Partition-wise aggregation/grouping
Hi, I have resolved all the comments/issues reported in this new patch-set. The changes done by Ashutosh Bapat to split out create_grouping_paths() into separate functions, so that the partitionwise aggregate code can use them, were based on my partitionwise aggregate changes. Those were essentially refactoring changes. Thus, I have refactored them separately and before any partitionwise changes (see 0005-Split-create_grouping_paths-and-Add-GroupPathExtraDa.patch), and then re-based all partitionwise changes over it, including all fixes. The patch-set is complete now. But there is still scope for some comment improvements due to all these refactorings; I will work on that. Also, some documentation and indentation updates are needed. I will post those changes in the next patch-set, but meanwhile this patch-set is ready for review. On Tue, Mar 13, 2018 at 9:12 AM, Ashutosh Bapat < ashutosh.ba...@enterprisedb.com> wrote: > On Mon, Mar 12, 2018 at 7:49 PM, Jeevan Chalke > wrote: > > > > > > On Mon, Mar 12, 2018 at 6:07 PM, Ashutosh Bapat > > wrote: > >> > >> On Fri, Mar 9, 2018 at 4:21 PM, Ashutosh Bapat > >> > We don't need UpperRelationKind member in that structure. That will be > provided by the RelOptInfo passed. > > The problem here is that the extra information required for grouping is not > going to be the same as that needed for window aggregates, and > certainly not for ordering. If we try to jam everything into the same > structure, it will become large, with many members useless for a given > operation. A reader will not have an idea about which of them are > useful and which of them are not. So, instead we should try some > polymorphism. I think we can pass a void * to GetForeignUpperPaths() > and the corresponding FDW hook knows what to cast it to based on the > UpperRelationKind passed. > Yep. Done this way. > > BTW, the patch has added an argument to GetForeignUpperPaths() but has > not documented the change in API. 
If we go the route of polymorphism, > we will need to document the mapping between UpperRelationKind and the > type of structure passed in. > Oops. Will do that in the next patch-set. Thanks for pointing it out; I missed this in the first place. -- Jeevan Chalke Technical Architect, Product Development EnterpriseDB Corporation The Enterprise PostgreSQL Company partition-wise-agg-v17.tar.gz Description: application/gzip
Re: [HACKERS] GSoC 2017: weekly progress reports (week 6)
> On 13 Mar 2018, at 17:02, Alexander Korotkov wrote: > > BTW to BTW. I think we should check the pending list size with > GinGetPendingListCleanupSize() here > + > + /* > +* If fast update is enabled, we acquire a predicate lock on the > entire > +* relation as fast update postpones the insertion of tuples into > index > +* structure due to which we can't detect rw conflicts. > +*/ > + if (GinGetUseFastUpdate(ginstate->index)) > + PredicateLockRelation(ginstate->index, snapshot); > > Because we can do ALTER INDEX SET (fastupdate = off), but there will still > be a pending list. > > And what happens if somebody concurrently sets (fastupdate = on)? > Can we miss conflicts because of that? No, AccessExclusiveLock will prevent this kind of problem with enabling fastupdate. Best regards, Andrey Borodin.
Re: JIT compiling with LLVM v11
On 3/12/18 17:04, Andres Freund wrote: > │ JIT: >│ > │ Functions: 4 >│ > │ Inlining: false >│ > │ Inlining Time: 0.000 >│ > │ Optimization: false >│ > │ Optimization Time: 5.023 >│ > │ Emission Time: 34.987 >│ The time quantities need some units. > │ Execution time: 46.277 ms >│ like this :) -- Peter Eisentraut http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Re: PATCH: Configurable file mode mask
On 3/13/18 2:46 AM, Michael Paquier wrote: > On Mon, Mar 12, 2018 at 03:14:13PM -0400, Stephen Frost wrote: >> We already had a discussion about having a GUC for this and concluded, >> rightly in my view, that it's not sensible to have since we don't want >> all of the various tools having to read and parse out postgresql.conf. > > If the problem is parsing, it could as well be more portable to put that > in the control file, no? The current approach is based on early discussion of this patch, around [1] and [2] in particular. I proposed an enforcing GUC at that time but there wasn't any interest in the idea. I definitely think it's overkill to put a field in pg_control as that requires more tooling to update values. >> I don't see anything in the discussion which has changed that and I >> don't agree that there's an issue with using the privileges on the data >> directory for this- it's a simple solution which all of the tools can >> use and work with easily. I certainly don't agree that it's a serious >> issue to relax the explicit check- it's just a check, which a user could >> implement themselves if they wished to and had a concern for. On the >> other hand, with the explicit check, we are actively preventing an >> entirely reasonable goal of wanting to use a read-only role to perform a >> backup of the system. > > Well, one thing is that the current checks in the postmaster make sure > that a data folder is never using anything else than 0700. From a > security point of view, making it possible to allow a postmaster to > start with 0750 is a step backwards if users don't authorize it > explicitly. I would argue that changing the mode of PGDATA is explicit, even if it is accidental. To be clear, after a pg_upgrade the behavior of the cluster with respect to setting the mode would be exactly the same as now. The user would need to specify -g at initdb time or explicitly update PGDATA to 750 for group access to be enabled. 
> There are a lot of systems which use a bunch of users with > only single group with systemd. So this would remove an existing > safeguard. I am not against the idea of this thread, just that I think > that secured defaults should be user-enforceable if they want Postgres > to behave so. As Stephen notes, this can be enforced by the user if they want to, and without much effort (and with better tools). Regards, -- -David da...@pgmasters.net [1] https://www.postgresql.org/message-id/20526.1489428968%40sss.pgh.pa.us [2] https://www.postgresql.org/message-id/22248.1489431803%40sss.pgh.pa.us signature.asc Description: OpenPGP digital signature
Re: Transform for pl/perl
Hi. I have reviewed this patch too. Attached is a new version with a v8-v9 delta-patch. Here are my changes: * HV_ToJsonbValue(): - added missing hv_iterinit() - used hv_iternextsv() instead of hv_iternext(), HeSVKEY_force(), HeVAL() * SV_ToJsonbValue(): - added recursive dereferencing for all SV types - removed unnecessary JsonbValue heap-allocations * Jsonb_ToSV(): - added iteration to the end of the iterator, needed for correct freeing of JsonbIterators * passed JsonbParseState ** to XX_ToJsonbValue() functions. * fixed warnings (see below) * fixed comments (see below) Also, I am not sure if we need to use newRV() for returning SVs in Jsonb_ToSV() and JsonbValue_ToSV(). On 12.03.2018 18:06, Pavel Stehule wrote: 2018-03-12 9:08 GMT+01:00 Anthony Bykov: On Mon, 5 Mar 2018 14:03:37 +0100 Pavel Stehule wrote: > Hi > > I am looking at this patch. I found a few issues: > > 1. compile warning > > I../../src/include -D_GNU_SOURCE -I/usr/include/libxml2 > -I/usr/lib64/perl5/CORE -c -o jsonb_plperl.o jsonb_plperl.c > jsonb_plperl.c: In function ‘SV_FromJsonbValue’: > jsonb_plperl.c:69:9: warning: ‘result’ may be used uninitialized in > this function [-Wmaybe-uninitialized] > return result; > ^~ > jsonb_plperl.c: In function ‘SV_FromJsonb’: > jsonb_plperl.c:142:9: warning: ‘result’ may be used uninitialized in > this function [-Wmaybe-uninitialized] > return result; > ^~ Hello, thanks for reviewing my patch! I really appreciate it. Those warnings are there on purpose - I was trying to prevent the case where new types are added into pl/perl but new transform logic is not. Maybe there is a better way to do it, but right now I'll just add the "default: pg_unreachable" logic. pg_unreachable() is replaced with elog(ERROR) for reporting impossible JsonbValue types and JsonbIteratorTokens. > 3. Do we need two identical tests for PLPerl and PLPerlu? Why? > > Regards > > Pavel About point 3 - I thought that plperlu and plperl use different interpreters. 
And if they act identically on the same examples, there is indeed no need for identical tests for them. plperlu and plperl use the same interpreter, so the duplicate tests make no sense. But in the last versions the duplicate tests are back. I have not removed the duplicate test yet, because I am not sure that this test makes no sense at all. The naming convention of the functions is not consistent: almost all are src_to_dest. This one is different and it is a little bit messy: +static SV * +SV_FromJsonb(JsonbContainer *jsonb) Renamed to Jsonb_ToSV() and JsonbValue_ToSV(). This comment is broken: +/* + * plperl_to_jsonb(SV *in) + * + * Transform Jsonb into SV ---< should be SV to Jsonb + */ +PG_FUNCTION_INFO_V1(plperl_to_jsonb); +Datum Fixed. -- Nikita Glukhov Postgres Professional: http://www.postgrespro.com The Russian Postgres Company diff --git a/contrib/Makefile b/contrib/Makefile index 8046ca4..53d44fe 100644 --- a/contrib/Makefile +++ b/contrib/Makefile @@ -75,9 +75,9 @@ ALWAYS_SUBDIRS += sepgsql endif ifeq ($(with_perl),yes) -SUBDIRS += hstore_plperl +SUBDIRS += hstore_plperl jsonb_plperl else -ALWAYS_SUBDIRS += hstore_plperl +ALWAYS_SUBDIRS += hstore_plperl jsonb_plperl endif ifeq ($(with_python),yes) diff --git a/contrib/jsonb_plperl/Makefile b/contrib/jsonb_plperl/Makefile new file mode 100644 index 000..cd86553 --- /dev/null +++ b/contrib/jsonb_plperl/Makefile @@ -0,0 +1,40 @@ +# contrib/jsonb_plperl/Makefile + +MODULE_big = jsonb_plperl +OBJS = jsonb_plperl.o $(WIN32RES) +PGFILEDESC = "jsonb_plperl - jsonb transform for plperl" + +PG_CPPFLAGS = -I$(top_srcdir)/src/pl/plperl + +EXTENSION = jsonb_plperlu jsonb_plperl +DATA = jsonb_plperlu--1.0.sql jsonb_plperl--1.0.sql + +REGRESS = jsonb_plperl jsonb_plperl_relocatability jsonb_plperlu jsonb_plperlu_relocatability + +ifdef USE_PGXS +PG_CONFIG = pg_config +PGXS := $(shell $(PG_CONFIG) --pgxs) +include $(PGXS) +else +subdir = contrib/jsonb_plperl +top_builddir = ../.. 
+include $(top_builddir)/src/Makefile.global +include $(top_srcdir)/contrib/contrib-global.mk +endif + +# We must link libperl explicitly +ifeq ($(PORTNAME), win32) +# these settings are the same as for plperl +override CPPFLAGS += -DPLPERL_HAVE_UID_GID -Wno-comment +# ... see silliness in plperl Makefile ... +SHLIB_LINK += $(sort $(wildcard ../../src/pl/plperl/libperl*.a)) +else +rpathdir = $(perl_archlibexp)/CORE +SHLIB_LINK += $(perl_embed_ldflags) +endif + +# As with plperl we need to make sure that the CORE directory is included +# last, probably because it sometimes contains some header files with names +# that clash with some of ours, or with some that we include, notably on +# Windows.
Re: PATCH: Configurable file mode mask
Stephen Frost writes: > * Michael Paquier (mich...@paquier.xyz) wrote: >> If the problem is parsing, it could as well be more portable to put that >> in the control file, no? > Then we'd need a tool to allow changing it for people who do want to > change it. There's no reason we should have to have an extra tool for > this- an administrator who chooses to change the privileges on the data > folder should be able to do so and I don't think anyone is going to > thank us for requiring them to use some special tool to do so for > PostgreSQL. FWIW, I took a quick look through this patch and don't have any problem with the approach, which appears to be "use the data directory's startup-time permissions as the status indicator". I am kinda wondering about this though: +(These files can confuse pg_ctl.) If group read +access is enabled on the data directory and an unprivileged user in the +PostgreSQL group is performing the backup, then +postmaster.pid will not be readable and must be +excluded. If we're allowing group read on the data directory, why should that not include postmaster.pid? There's nothing terribly security-relevant in there, and being able to find out the postmaster PID seems useful. I can see the point of restricting server key files, as documented elsewhere, but it's possible to keep those somewhere else so they don't cause problems for simple backup software. Also, in general, this patch desperately needs a round of copy-editing, particularly with attention to the comments. For instance, there are a minimum of three grammatical errors in this one comment: + * Group read/execute may optional be enabled on PGDATA so any frontend tools + * That write into PGDATA must know what mask to set and the permissions to + * use for creating files and directories. In a larger sense, this fails to explain the operating principle, namely what I stated above, that we allow the existing permissions on PGDATA to decide what we allow as group permissions. 
If you don't already understand that, you're going to have a hard time extracting it from either file_perm.h or file_perm.c. (The latter fails to even explain what the argument of DataDirectoryMask is.) regards, tom lane
Re: Google Summer of Code: Potential Applicant
Greetings, * Aleksander Alekseev (a.aleks...@postgrespro.ru) wrote: > Craig, I believe you probably did something wrong if you had to work > with some library directly. Actually you generate classes from text > description and just use them. I worked with Thrift some time ago, in > 2015 [1]. I wouldn't call it awkward. Protobuf is fine too, but > unfortunately we don't have any Protobuf-related projects this time. Just to be clear, the list on the wiki is just a set of suggestions- students are welcome to propose their own projects as well. > Also it's probably worth noticing that the GSoC project doesn't imply > using any existing libraries, only the binary format which is quite > stable. A student proposal should really include information about what other libraries, if any, are being considered for the project as that will play into the consideration as to if it's something we would be interested in including in PG or not. Thanks! Stephen signature.asc Description: PGP signature
Re: PATCH: Configurable file mode mask
Michael, * Michael Paquier (mich...@paquier.xyz) wrote: > On Mon, Mar 12, 2018 at 03:14:13PM -0400, Stephen Frost wrote: > > We already had a discussion about having a GUC for this and concluded, > > rightly in my view, that it's not sensible to have since we don't want > > all of the various tools having to read and parse out postgresql.conf. > > If the problem is parsing, it could as well be more portable to put that > in the control file, no? I have, for example, finished > implementing my own flavor of pg_controldata to save parsing effort, > as it prints control file fields on a per-call basis using individual > fields; this also avoids games with locales, as translations of > pg_controldata can disturb the parsing logic. Then we'd need a tool to allow changing it for people who do want to change it. There's no reason we should have to have an extra tool for this- an administrator who chooses to change the privileges on the data folder should be able to do so and I don't think anyone is going to thank us for requiring them to use some special tool to do so for PostgreSQL. > > I don't see anything in the discussion which has changed that and I > > don't agree that there's an issue with using the privileges on the data > > directory for this- it's a simple solution which all of the tools can > > use and work with easily. I certainly don't agree that it's a serious > > issue to relax the explicit check- it's just a check, which a user could > > implement themselves if they wished to and had a concern for. On the > > other hand, with the explicit check, we are actively preventing an > > entirely reasonable goal of wanting to use a read-only role to perform a > > backup of the system. > > Well, one thing is that the current checks in the postmaster make sure > that a data folder is never using anything else than 0700. 
From a > security point of view, making it possible to allow a postmaster to > start with 0750 is a step backwards if users don't authorize it > explicitly. There are a lot of systems which use a bunch of users with > only a single group with systemd. So this would remove an existing > safeguard. I am not against the idea of this thread, just that I think > that secured defaults should be user-enforceable if they want Postgres > to behave so. I'm aware of what the current one-time check in the postmaster does, and that we don't implement it on all platforms, making me seriously doubt that the level of concern being raised here makes sense. Should we consider it a security issue that the Windows builds don't perform this check, and never have? Further, if the permissions are changed without authorization, it's probably done while the database is running and unlikely to be noticed for days, weeks, or longer, if the administrator is depending on PG to let them know of the change. Considering that the only user who can change the privileges is a database owner or root, it seems even less likely to help (why would an attacker change those privileges when they already have full access?). Lastly, the user *is* able to enforce the privileges on the data directory if they wish to, using tools such as tripwire which are built specifically to provide such checks and to do so regularly instead of the extremely ad-hoc check provided by PG. PostgreSQL should, and does, secure the data directory when it's created by initdb, and subsequent files and directories are similarly secured appropriately. Ultimately, the default which makes sense here isn't one-size-fits-all but is system dependent, and the administrator should be able to choose what permissions they want to have. Thanks! Stephen signature.asc Description: PGP signature
Re: [HACKERS] Another oddity in handling of WCO constraints in postgres_fdw
Greetings, * Ashutosh Bapat (ashutosh.ba...@enterprisedb.com) wrote: > On Mon, Mar 12, 2018 at 1:25 PM, Etsuro Fujita >wrote: > > (2018/03/09 20:55), Etsuro Fujita wrote: > >> (2018/03/08 14:24), Ashutosh Bapat wrote: > >>> For local constraints to be enforced, we use remote > >>> constraints. For local WCO we need to use remote WCO. That means we > >>> create many foreign tables pointing to same local table on the foreign > >>> server through many views, but it's not impossible. > >> > >> Maybe I don't understand this correctly, but I guess that it would be > >> the user's responsibility to not create foreign tables in such a way. > > > > I think I misunderstood your words. Sorry for that. I think what you > > proposed would be a solution for this issue, but I'm not sure that's a good > > one because that wouldn't work for the data sources that don't support views > > with WCO options. > > Our solution for the constraints doesn't work with the data sources > (like flat files) which don't support constraints. So, that argument > doesn't help. It would really help to have some examples of exactly what is being proposed here wrt solutions. WCO is defined at a view level, so I'm not following the notion that we're going to depend on something remote to enforce the WCO when the remote object is just a regular table that you can't define a WCO on top of. That's not the case when we're talking about foreign tables vs. local tables, so it's not the same. I certainly don't think we should require a remote view to exist to perform the WCO check. If the remote WCO is a view itself then I would expect it to operate in the same manner as WCO on local views does- you can have them defined as being cascaded or not. In other words, there is no case where we have a "foreign view." Views are always local. 
A "foreign table" could actually be a view, in which case we treat it as a table in the local database anyway, but WCO doesn't come up in that case at all- there's no way to define WCO on a table, foreign or not. If WCO is defined on the view on the remote server, then it should operate properly and not require anything from the local side. Thanks! Stephen signature.asc Description: PGP signature
Re: [HACKERS] Proposal: generic WAL compression
On 02/07/2018 09:02 PM, Markus Nullmeier wrote: One general comment I can already make is that enabling compression should be made optional, which appears to be a small and easy addition to the generic WAL API. The new version of the patch is attached. In order to control generic WAL compression, the new structure PageXLogCompressParams is introduced. It is passed as an additional parameter to GenericXLogRegisterBufferEx, so the access method can control its own WAL compression. GenericXLogRegisterBuffer uses default compression settings, which appear to be a reasonable tradeoff between performance overhead and compression rate on RUM. On my HDD, PostgreSQL even works up to 10% faster for some RUM workloads because of the reduced amount of generic WAL to be written. Oleg Ivanov Postgres Professional The Russian PostgreSQL Company diff --git a/src/backend/access/transam/generic_xlog.c b/src/backend/access/transam/generic_xlog.c index ce02354..fa3cd4e 100644 --- a/src/backend/access/transam/generic_xlog.c +++ b/src/backend/access/transam/generic_xlog.c @@ -43,19 +43,41 @@ * a full page's worth of data. 
*- */ + +#define MAX_ALIGN_MISMATCHES 255 +/* MAX_ALIGN_MISMATCHES is not supposed to be greater than PG_UINT8_MAX */ +#if MAX_ALIGN_MISMATCHES > PG_UINT8_MAX +#error "MAX_ALIGN_MISMATCHES must be not greater than PG_UINT8_MAX" +#endif + #define FRAGMENT_HEADER_SIZE (2 * sizeof(OffsetNumber)) +#define REGION_HEADER_SIZE (sizeof(char) + sizeof(int)) +#define DIFF_DELTA_HEADER_SIZE (sizeof(char) + 2 * sizeof(OffsetNumber)) +#define MISMATCH_HEADER_SIZE (sizeof(char) + sizeof(uint8) + \ + sizeof(OffsetNumber)) #define MATCH_THRESHOLD FRAGMENT_HEADER_SIZE -#define MAX_DELTA_SIZE (BLCKSZ + 2 * FRAGMENT_HEADER_SIZE) +#define MAX_DELTA_SIZE (BLCKSZ + \ + 2 * REGION_HEADER_SIZE + \ + 2 * FRAGMENT_HEADER_SIZE + \ + 2 * DIFF_DELTA_HEADER_SIZE + \ + MAX_ALIGN_MISMATCHES * MISMATCH_HEADER_SIZE \ + + sizeof(bool)) + +#define writeToPtr(ptr, x) memcpy(ptr, &(x), sizeof(x)), ptr += sizeof(x) +#define readFromPtr(ptr, x) memcpy(&(x), ptr, sizeof(x)), ptr += sizeof(x) /* Struct of generic xlog data for single page */ typedef struct { Buffer buffer; /* registered buffer */ int flags; /* flags for this buffer */ + int deltaLen; /* space consumed in delta field */ char *image; /* copy of page image for modification, do not * do it in-place to have aligned memory chunk */ char delta[MAX_DELTA_SIZE]; /* delta between page images */ + + PageXLogCompressParams compressParams; /* compress parameters */ } PageData; /* State of generic xlog record construction */ @@ -71,15 +93,77 @@ struct GenericXLogState bool isLogged; }; +/* Describes for the region which type of delta is used in it */ +typedef enum +{ + DIFF_DELTA = 0,/* diff delta with insert, remove and replace + * operations */ + BASE_DELTA = 1/* base delta with update operations only */ +} DeltaType; + +/* Diff delta operations for transforming current page to target page */ +typedef enum +{ + DIFF_INSERT = 0, + DIFF_REMOVE = 1, + DIFF_REPLACE = 2 +} DiffDeltaOperations; + +/* Describes the kind of region of the page */ +typedef enum 
+{ + UPPER_REGION = 0, + LOWER_REGION = 1 +} RegionType; + static void writeFragment(PageData *pageData, OffsetNumber offset, OffsetNumber len, const char *data); static void computeRegionDelta(PageData *pageData, const char *curpage, const char *targetpage, int targetStart, int targetEnd, - int validStart, int validEnd); + int validStart, int validEnd, + uint8 maxMismatches); static void computeDelta(PageData *pageData, Page curpage, Page targetpage); static void applyPageRedo(Page page, const char *delta, Size deltaSize); +static int alignRegions(const char *curRegion, const char *targetRegion, + int curRegionLen, int targetRegionLen, uint8 maxMismatches); +static int restoreCompactAlignment(const char *curRegion, + const char *targetRegion, + int curRegionLen, + int targetRegionLen, + int numMismatches); + +static bool computeRegionDiffDelta(PageData *pageData, + const char *curpage, const char *targetpage, + int targetStart, int targetEnd, + int validStart, int validEnd, + uint8 maxMismatches); +static const char *applyPageDiffRedo(Page page, const char *delta, Size deltaSize); + +static void computeRegionBaseDelta(PageData *pageData, + const char *curpage, const char *targetpage, + int targetStart, int targetEnd, + int validStart, int validEnd); +static const char *applyPageBaseRedo(Page page, const char *delta, Size deltaSize); + +static bool pageDataContainsDiffDelta(PageData *pageData); +static void downgradeDeltaToBaseFormat(PageData *pageData); + +/* Arrays for the alignment building and for the resulting alignments */
Re: Google Summer of Code: Potential Applicant
Thanks, Aleksander! > On 13 Mar 2018, at 19:03, Aleksander Alekseev wrote: > >> Do you have any project-related insights as to what I should put in >> there? > Christos, as far as I remember, a good proposal must have a schedule, implementation details, and deliverables. Also, it is good to show that you are capable of executing the project, for example by mentioning your previous projects (commercial, educational, or pet projects), achievements, etc. GSoC typically has 3 milestones, and each milestone should usually produce some viable results. There are exact dates, but here I'll put a sketch. Algorithms: June - implement introsort and timsort; July - design benchmarks, implement some other hashtables; August - if benchmarks are successful, propose a patch to the commitfest and review other patches from the commitfest, else implement more algorithms. amcheck: June - implement checks for GIN (like b-tree in b-tree, resembling the existing amcheck); July - checks for Hash, BRIN and SP-GiST; August - RUM, patch, commitfest, reviews. Best regards, Andrey Borodin.
Re: committing inside cursor loop
On 3/6/18 07:48, Ildus Kurbangaliev wrote: > Although as was discussed before it seems inconsistent without ROLLBACK > support. There was a little discussion about it, but no replies. Maybe > the status of the patch should be changed to 'Waiting on author' until > the end of discussion. I'm wondering what the semantics of it should be. For example, consider this: drop table test1; create table test1 (a int, b int); insert into test1 values (1, 11), (2, 22), (3, 33); do language plpgsql $$ declare x int; begin for x in update test1 set b = b + 1 where b > 20 returning a loop raise info 'x = %', x; if x = 2 then rollback; end if; end loop; end; $$; The ROLLBACK call in the first loop iteration undoes the UPDATE command that drives the loop. Is it then sensible to continue the loop? -- Peter Eisentraut http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Re: PATCH: Configurable file mode mask
On 3/13/18 11:00 AM, Tom Lane wrote: > Stephen Frostwrites: >> * Michael Paquier (mich...@paquier.xyz) wrote: >>> If the problem is parsing, it could as well be more portable to put that >>> in the control file, no? > >> Then we'd need a tool to allow changing it for people who do want to >> change it. There's no reason we should have to have an extra tool for >> this- an administrator who chooses to change the privileges on the data >> folder should be able to do so and I don't think anyone is going to >> thank us for requiring them to use some special tool to do so for >> PostgreSQL. > > FWIW, I took a quick look through this patch and don't have any problem > with the approach, which appears to be "use the data directory's > startup-time permissions as the status indicator". I am kinda wondering > about this though: > > +(These files can confuse pg_ctl.) If group > read > +access is enabled on the data directory and an unprivileged user in the > +PostgreSQL group is performing the backup, > then > +postmaster.pid will not be readable and must be > +excluded. > > If we're allowing group read on the data directory, why should that not > include postmaster.pid? There's nothing terribly security-relevant in > there, and being able to find out the postmaster PID seems useful. I can > see the point of restricting server key files, as documented elsewhere, > but it's possible to keep those somewhere else so they don't cause > problems for simple backup software. I'm OK with that. I had a look at the warnings regarding the required mode of postmaster.pid in miscinit.c (889-911) and it seems to me they still hold true if the mode is 640 instead of 600. Do you agree, Tom? Stephen? If so, I'll make those changes. > Also, in general, this patch desperately needs a round of copy-editing, > particularly with attention to the comments. 
> For instance, there are a
> minimum of three grammatical errors in this one comment:
>
> + * Group read/execute may optional be enabled on PGDATA so any frontend tools
> + * That write into PGDATA must know what mask to set and the permissions to
> + * use for creating files and directories.
>
> In a larger sense, this fails to explain the operating principle,
> namely what I stated above, that we allow the existing permissions
> on PGDATA to decide what we allow as group permissions.  If you
> don't already understand that, you're going to have a hard time
> extracting it from either file_perm.h or file_perm.c.  (The latter
> fails to even explain what the argument of DataDirectoryMask is.)

I'll do comment/doc review for the next round of patches.

Thanks,
-- 
-David
da...@pgmasters.net
Re: [patch] BUG #15005: ANALYZE can make pg_class.reltuples inaccurate.
David Gould writes:
> I also thought about the theory and am confident that there really is no way
> to trick it. Basically if there are enough pages that are different to affect
> the overall density, say 10% empty or so, there is no way a random sample
> larger than a few hundred probes can miss them no matter how big the table is.
> If there are few enough pages to "hide" from the sample, then they are so few
> they don't matter anyway.

> After all this my vote is for back patching too. I don't see any case where
> the patched analyze is or could be worse than what we are doing. I'm happy to
> provide my test cases if anyone is interested.

Yeah, you have a point.  I'm still worried about unexpected side-effects,
but it seems like overall this is very unlikely to hurt anyone.  I'll
back-patch (minus the removal of the unneeded vac_estimate_reltuples
argument).

			regards, tom lane
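David's point about sample coverage can be illustrated with a quick back-of-the-envelope calculation (an illustrative sketch, not anything from the patch; the independent-draw approximation ignores that page sampling is without replacement, which only makes the miss probability smaller):

```python
# Probability that a random sample of n page probes contains none of a
# fraction f of "different" (e.g. empty) pages, using the independent-draw
# approximation (1 - f)^n.  Sampling without replacement only lowers this.
def miss_probability(f, n):
    return (1.0 - f) ** n

# With 10% divergent pages and "a few hundred probes", missing all of
# them is essentially impossible:
p = miss_probability(0.10, 300)   # roughly 2e-14
```

Conversely, if the divergent fraction is small enough that a few hundred probes might plausibly miss it, it is too small to move the overall density estimate much, which is the other half of the argument.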
Re: Additional Statistics Hooks
On Tue, Mar 13, 2018 at 6:56 AM, Ashutosh Bapat
<ashutosh.ba...@enterprisedb.com> wrote:
> On Tue, Mar 13, 2018 at 4:14 AM, Tom Lane wrote:
> > Mat Arye writes:
> >> So the use-case is an analytical query like
> >>
> >> SELECT date_trunc('hour', time) AS MetricMinuteTs, AVG(value) as avg
> >> FROM hyper
> >> WHERE time >= '2001-01-04T00:00:00' AND time <= '2001-01-05T01:00:00'
> >> GROUP BY MetricMinuteTs
> >> ORDER BY MetricMinuteTs DESC;
> >>
> >> Right now this query will choose a much-less-efficient GroupAggregate plan
> >> instead of a HashAggregate. It will choose this because it thinks the
> >> number of groups produced here is 9,000,000 because that's the number
> >> of distinct time values there are.
> >> But, because date_trunc "buckets" the values there will be about 24 groups
> >> (1 for each hour).
> >
> > While it would certainly be nice to have better behavior for that,
> > "add a hook so users who can write C can fix it by hand" doesn't seem
> > like a great solution.  On top of the sheer difficulty of writing a
> > hook function, you'd have the problem that no pre-written hook could
> > know about all available functions.  I think somehow we'd need a way
> > to add per-function knowledge, perhaps roughly like the protransform
> > feature.
>
> Like cost associated with a function, we may associate mapping
> cardinality with a function. It tells how many distinct input values
> map to 1 output value. By input value, I mean input argument tuple. In
> Mat's case the mapping cardinality will be 12. The number of distinct
> values that function may output is estimated as number of estimated
> rows / mapping cardinality of that function.

I think this is complicated by the fact that the mapping cardinality is
not a constant per function, but depends on the constant given as the
first argument to the function and on the granularity of the underlying
data (do you have second granularity or microsecond granularity?).
I actually think the logic for the estimate here should be
(max(time) - min(time)) / interval.  I think to be general you need to
allow functions on statistics to determine the estimate.

> --
> Best Wishes,
> Ashutosh Bapat
> EnterpriseDB Corporation
> The Postgres Database Company
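Mat's (max(time) - min(time)) / interval estimate, applied to the query upthread, can be sketched like this (illustrative Python, not anything in a patch; the +1 accounts for the bucket containing the range's endpoint):

```python
from datetime import datetime, timedelta

# Estimated number of date_trunc-style groups: buckets spanned by the
# column's range, i.e. (max - min) / bucket width, plus the final bucket.
def estimated_groups(tmin, tmax, bucket):
    return int((tmax - tmin) / bucket) + 1

# The WHERE clause upthread covers 25 hours, bucketed by hour:
n = estimated_groups(datetime(2001, 1, 4, 0, 0),
                     datetime(2001, 1, 5, 1, 0),
                     timedelta(hours=1))
# 26 hourly buckets -- a far cry from the 9,000,000 distinct raw values
```

This is exactly the kind of per-function knowledge the planner cannot derive today: it needs both the bucket width (the function's constant argument) and min/max statistics on the underlying column.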
Re: [patch] BUG #15005: ANALYZE can make pg_class.reltuples inaccurate.
David Gould writes:
> I have attached the patch we are currently using. It applies to 9.6.8. I
> have versions for older releases in 9.4, 9.5, 9.6. It fails to apply to 10,
> and presumably head, but I can update it if there is any interest.

> The patch has three main features:
> - Impose an ordering on the autovacuum workers worklist to avoid
>   the need for rechecking statistics to skip already vacuumed tables.
> - Reduce the frequency of statistics refreshes
> - Remove the AutovacuumScheduleLock

As per the earlier thread, the first aspect of that needs more work to
not get stuck when the worklist has long tasks near the end.  I don't
think you're going to get away with ignoring that concern.

Perhaps we could sort the worklist by decreasing table size?  That's not
an infallible guide to the amount of time that a worker will need to
spend, but it's surely safer than sorting by OID.

Alternatively, if we decrease the frequency of stats refreshes, how much
do we even need to worry about reordering the worklist?

In any case, I doubt anyone will have any appetite for back-patching
such a change.  I'd recommend that you clean up your patch and rebase to
HEAD, then submit it into the September commitfest (either on a new
thread or a continuation of the old #13750 thread, not this one).

			regards, tom lane
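Tom's suggestion amounts to ordering the worklist by a size key instead of OID; a minimal sketch to make the idea concrete (table names and sizes are made up, not from the patch):

```python
# Sort the autovacuum worklist by decreasing relation size, so that the
# longest-running jobs tend to start first instead of lurking at the end
# of an OID-ordered list.
worklist = [
    ("pgbench_branches", 8),          # size in pages (hypothetical)
    ("pgbench_accounts", 1_300_000),
    ("pgbench_tellers", 80),
]
worklist.sort(key=lambda rel: rel[1], reverse=True)
# biggest table first, smallest last
```

As Tom notes, size is only a proxy for vacuuming time (a large, mostly-clean table can finish quickly), but unlike OID order it at least correlates with the amount of work.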
Re: Additional Statistics Hooks
On Tue, Mar 13, 2018 at 6:31 AM, David Rowley wrote:
> On 13 March 2018 at 11:44, Tom Lane wrote:
> > While it would certainly be nice to have better behavior for that,
> > "add a hook so users who can write C can fix it by hand" doesn't seem
> > like a great solution.  On top of the sheer difficulty of writing a
> > hook function, you'd have the problem that no pre-written hook could
> > know about all available functions.  I think somehow we'd need a way
> > to add per-function knowledge, perhaps roughly like the protransform
> > feature.

I think this isn't either-or.  I think a general hook can be useful for
extensions that want to optimize particular data distributions/workloads
using domain knowledge about functions common for those workloads.  That
way users working with that data can use extensions to optimize
workloads without writing C themselves.

I also think a protransform-like feature would add a lot of power to the
native planner, but it could take a while to get into core properly and
may not handle all kinds of data distributions/cases.  An example of a
case a protransform-type system would not be able to optimize is
mathematical operator expressions, like bucketing integers by decile ---
(integer / 10) * 10.  This is somewhat analogous to date_trunc in the
integer space and would also change the number of resulting distinct
rows.

> I always imagined that extended statistics could be used for this.
> Right now the estimates are much better when you create an index on
> the function, but there's no real reason to limit the stats that are
> gathered to just plain columns + expression indexes.
>
> I believe I'm not the only person to have considered this. Originally
> extended statistics were named multivariate statistics. I think it was
> Dean and I (maybe others too) that suggested to Tomas to give the
> feature a more generic name so that it can be used for a more general
> purpose later.
I also think that the point about extended statistics is a good one, and
it points to the need for more experimentation/experience, which I think
a C hook is better suited for.  Putting in a hook will allow extension
writers like us to experiment and figure out the kinds of transforms on
statistics that are useful, while having a small footprint on core.

I think designing a protransform-like system would benefit from more
experience with the kinds of transformations that are useful.  For
example, can anything be done if the interval passed to date_trunc is
not constant, or is it not even worth bothering with that case?  Maybe
extended statistics are a better approach, etc.

> --
> David Rowley                  http://www.2ndQuadrant.com/
> PostgreSQL Development, 24x7 Support, Training & Services
Re: [HACKERS] [FEATURE PATCH] pg_stat_statements with plans (v02)
On Mon, Mar 12, 2018 at 02:11:42PM +0100, Julian Markwort wrote:
>> In same manner pg_stat_statements_good_plan_reset() and
>> pg_stat_statements_bad_plan_reset() functions can be combined in
>> function:
>> pg_stat_statements_plan_reset(in queryid bigint, in good boolean, in bad
>> boolean)
>
> This is also sensible, however if any more kinds of plans were added in
> the future, there would be a lot of flags in this function. I think having
> varying amounts of flags (between extension versions) as arguments to the
> function also makes any upgrade paths difficult. Thinking more about this,
> we could also use function overloading to avoid this.
> An alternative would be to have the caller pass the type of plan he wants
> to reset. Omitting the type would result in the deletion of all plans,
> maybe?
> pg_stat_statements_plan_reset(in queryid bigint, in plan_type text)

Yes, it is a good idea.  But maybe instead of passing an empty string as
the plan type we should overload the function with only a queryid
argument.  I think that is a common practice in PostgreSQL.  Otherwise,
allowing an empty string as the plan type may confuse users.  So the
function declarations might be the following:

pg_stat_statements_plan_reset(in queryid bigint, in plan_type text)
pg_stat_statements_plan_reset(in queryid bigint)

+ interquartile_dist = 2.0*(0.6745 * sqrt(e->counters.sum_var_time / e->counters.calls));

I think it would be better to have defines for the 2.0 and 0.6745 values,
for the sake of readability.

-- 
Arthur Zakirov
Postgres Professional: http://www.postgrespro.com
Russian Postgres Company
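For context on the magic numbers Arthur flags: 0.6745 is approximately the z-score of the 75th percentile of a standard normal distribution, so 2.0 * 0.6745 * stddev approximates the interquartile range under a normality assumption. A Python sketch mirroring the quoted C line (not the extension's actual code):

```python
from statistics import NormalDist

# z-score of the 75th percentile of the standard normal distribution;
# for N(mu, sigma) the interquartile range is about 2 * 0.6745 * sigma.
Z_THIRD_QUARTILE = NormalDist().inv_cdf(0.75)   # ~0.6745

def interquartile_dist(sum_var_time, calls):
    # sum_var_time / calls plays the role of the variance estimate,
    # matching e->counters.sum_var_time / e->counters.calls above.
    stddev = (sum_var_time / calls) ** 0.5
    return 2.0 * Z_THIRD_QUARTILE * stddev
```

Naming the constant after what it is (the third-quartile z-score) would address the readability concern better than a bare `#define MAGIC_0_6745`.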
Re: [WIP PATCH] Index scan offset optimisation using visibility map
Hello.  Tom, thanks a lot for your thorough review.

> What you've done to
> IndexNext() is a complete disaster from a modularity standpoint: it now
> knows all about the interactions between index_getnext, index_getnext_tid,
> and index_fetch_heap.

I was looking at the current IndexOnlyNext implementation as a starting
point - and it knows about index_getnext_tid and index_fetch_heap
already.  At the same time I was trying to keep the patch non-invasive.
The patched IndexNext now only knows about index_getnext_tid and
index_fetch_heap - the same as IndexOnlyNext.  But yes, it probably
could be done better.

> I'm not sure about a nicer way to refactor that, but I'd suggest that
> maybe you need an additional function in indexam.c that hides all this
> knowledge about the internal behavior of an IndexScanDesc.

I'll try to split index_getnext into two functions.  A new one will do
everything index_getnext does except index_fetch_heap.

> Or that is, it knows too much and still not enough,
> because it's flat out wrong for the case that xs_continue_hot is set.
> You can't call index_getnext_tid when that's still true from last time.

Oh.. Yes, a clear error here.

> The PredicateLockPage call also troubles me quite a bit, not only from
> a modularity standpoint but because that implies a somewhat-user-visible
> behavioral change when this optimization activates.  People who are using
> serializable mode are not going to find it to be an improvement if their
> queries fail and need retries a lot more often than they did before.
> I don't know if that problem is bad enough that we should disable skipping
> when serializable mode is active, but it's something to think about.

The current IndexOnlyScan already does that.  And I think a user should
expect such a change in serializable mode.
> You haven't documented the behavior required for tuple-skipping in any
> meaningful fashion, particularly not the expectation that the child plan
> node will still return tuples that just need not contain any valid
> content.

Only nodeLimit could receive such tuples, and they are immediately
discarded.  I'll add a comment about it.

> is a gross hack and probably wrong.  You could use ExecStoreAllNullTuple,
> perhaps.

Oh, nice, I missed it.

> I'm disturbed by the fact that you've not taught the planner about the
> potential cost saving from this, so that it won't have any particular
> reason to pick a regular indexscan over some other plan type when this
> optimization applies.  Maybe there's no practical way to do that, or maybe
> it wouldn't really matter in practice; I've not looked into it.  But not
> doing anything feels like a hack.

I was trying to do it, but the current planner architecture does not
provide a nice way to achieve it, due to the way it handles limit and
offset.  So I think it is better to leave that out for now.

> Setting this back to Waiting on Author.

I'll try to make the required changes in a few days.  Thanks.
Re: PATCH: Unlogged tables re-initialization tests
Dagfinn Ilmari Mannsåker wrote:

> $SIG{__DIE__} gets called even for exceptions that would be caught by a
> surrounding eval block, so this should at the very least be:
>
> $SIG{__DIE__} = sub { BAIL_OUT(@_) unless $^S };
>
> However, that is still wrong, because die() and BAIL_OUT() mean
> different things: die() aborts the current test script, while BAIL_OUT()
> aborts the entire 'prove' run, i.e. all subsequent scripts in the same
> test directory.

Sounds like 'die' is the correct thing, then, and that BAIL_OUT should
be called sparingly ... for example this one in PostgresNode::start
seems like an overreaction:

	BAIL_OUT("node \"$name\" is already running") if defined $self->{_pid};

Yes?

-- 
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Re: PATCH: Configurable file mode mask
Greetings,

* David Steele (da...@pgmasters.net) wrote:
> On 3/13/18 11:00 AM, Tom Lane wrote:
> > Stephen Frost writes:
> >> * Michael Paquier (mich...@paquier.xyz) wrote:
> >>> If the problem is parsing, it could as well be more portable to put that
> >>> in the control file, no?
> >
> >> Then we'd need a tool to allow changing it for people who do want to
> >> change it.  There's no reason we should have to have an extra tool for
> >> this- an administrator who chooses to change the privileges on the data
> >> folder should be able to do so and I don't think anyone is going to
> >> thank us for requiring them to use some special tool to do so for
> >> PostgreSQL.
> >
> > FWIW, I took a quick look through this patch and don't have any problem
> > with the approach, which appears to be "use the data directory's
> > startup-time permissions as the status indicator".  I am kinda wondering
> > about this though:
> >
> > +    (These files can confuse pg_ctl.)  If group read
> > +    access is enabled on the data directory and an unprivileged user in the
> > +    PostgreSQL group is performing the backup, then
> > +    postmaster.pid will not be readable and must be
> > +    excluded.
> >
> > If we're allowing group read on the data directory, why should that not
> > include postmaster.pid?  There's nothing terribly security-relevant in
> > there, and being able to find out the postmaster PID seems useful.  I can
> > see the point of restricting server key files, as documented elsewhere,
> > but it's possible to keep those somewhere else so they don't cause
> > problems for simple backup software.
>
> I'm OK with that.  I had a look at the warnings regarding the required
> mode of postmaster.pid in miscinit.c (lines 889-911) and it seems to me
> they still hold true if the mode is 640 instead of 600.
>
> Do you agree, Tom?  Stephen?
>
> If so, I'll make those changes.
I agree that we can still consider an EPERM-error case as being OK even
with the changes proposed, and with the same-user check happening
earlier in checkDataDir(), we won't even get to the point of looking at
the pid file if the userids don't match.

The historical comment about the old datadir permissions can likely just
be removed, perhaps replaced with a bit more commentary above that check
in checkDataDir().  The open() call should also fail if we only have
group-read privileges on the file (0640), but surely the kill() will in
any case.

Thanks!

Stephen
Re: PATCH: Exclude temp relations from base backup
Hi,

On 2/28/18 10:55 AM, David Steele wrote:
> This is a follow-up patch from the exclude unlogged relations discussion
> [1].
>
> The patch excludes temporary relations during a base backup using the
> existing looks_like_temp_rel_name() function for identification.
>
> It shares code to identify database directories from [1], so for now
> that has been duplicated in this patch to make it independent.  I'll
> rebase depending on what gets committed first.

Updated the patch to change die() to BAIL_OUT() and use append_to_file()
as suggested for another test patch [1].

Regards,
-- 
-David
da...@pgmasters.net

[1] https://www.postgresql.org/message-id/6bc5d931-5b00-279f-f65a-26e32de400a6%40pgmasters.net

From c83f3ae09bacb961c511ebe9cb1f9f79bf1e3d29 Mon Sep 17 00:00:00 2001
From: David Steele
Date: Tue, 13 Mar 2018 12:22:24 -0400
Subject: [PATCH 1/1] Exclude temporary relations from base backup.

Exclude temporary relations during a base backup using the existing
looks_like_temp_rel_name() function for identification.

It shares code to identify database directories with [1], so for now
that has been duplicated in this patch to make it independent.  I'll
rebase depending on what gets committed first.

[1] https://www.postgresql.org/message-id/04791bab-cb04-ba43-e9c0-664a4c1ffb2c%40pgmasters.net
---
 doc/src/sgml/protocol.sgml                   |  2 +-
 src/backend/replication/basebackup.c         | 41 +++
 src/backend/storage/file/fd.c                |  3 +-
 src/bin/pg_basebackup/t/010_pg_basebackup.pl | 49 +++-
 src/include/storage/fd.h                     |  1 +
 5 files changed, 92 insertions(+), 4 deletions(-)

diff --git a/doc/src/sgml/protocol.sgml b/doc/src/sgml/protocol.sgml
index 4fd61d7c2d..629a462574 100644
--- a/doc/src/sgml/protocol.sgml
+++ b/doc/src/sgml/protocol.sgml
@@ -2565,7 +2565,7 @@ The commands accepted in replication mode are:
         Various temporary files and directories created during the operation
         of the PostgreSQL server, such as any file or directory beginning
-        with pgsql_tmp.
+        with pgsql_tmp and temporary relations.
diff --git a/src/backend/replication/basebackup.c b/src/backend/replication/basebackup.c
index 185f32a5f9..f0c3d13b2b 100644
--- a/src/backend/replication/basebackup.c
+++ b/src/backend/replication/basebackup.c
@@ -26,6 +26,7 @@
 #include "nodes/pg_list.h"
 #include "pgtar.h"
 #include "pgstat.h"
+#include "port.h"
 #include "postmaster/syslogger.h"
 #include "replication/basebackup.h"
 #include "replication/walsender.h"
@@ -958,6 +959,36 @@ sendDir(const char *path, int basepathlen, bool sizeonly, List *tablespaces,
 	char		pathbuf[MAXPGPATH * 2];
 	struct stat statbuf;
 	int64		size = 0;
+	const char *lastDir;		/* Split last dir from parent path. */
+	bool		isDbDir = false;	/* Does this directory contain relations? */
+
+	/*
+	 * Determine if the current path is a database directory that can
+	 * contain relations.
+	 *
+	 * Start by finding the location of the delimiter between the parent
+	 * path and the current path.
+	 */
+	lastDir = last_dir_separator(path);
+
+	/* Does this path look like a database path (i.e. all digits)? */
+	if (lastDir != NULL &&
+		strspn(lastDir + 1, "0123456789") == strlen(lastDir + 1))
+	{
+		/* Part of path that contains the parent directory. */
+		int			parentPathLen = lastDir - path;
+
+		/*
+		 * Mark path as a database directory if the parent path is either
+		 * $PGDATA/base or a tablespace version path.
+		 */
+		if (strncmp(path, "./base", parentPathLen) == 0 ||
+			(parentPathLen >= (sizeof(TABLESPACE_VERSION_DIRECTORY) - 1) &&
+			 strncmp(lastDir - (sizeof(TABLESPACE_VERSION_DIRECTORY) - 1),
+					 TABLESPACE_VERSION_DIRECTORY,
+					 sizeof(TABLESPACE_VERSION_DIRECTORY) - 1) == 0))
+			isDbDir = true;
+	}

 	dir = AllocateDir(path);
 	while ((de = ReadDir(dir, path)) != NULL)
@@ -1007,6 +1038,16 @@ sendDir(const char *path, int basepathlen, bool sizeonly, List *tablespaces,
 		if (excludeFound)
 			continue;

+		/* Exclude temporary relations */
+		if (isDbDir && looks_like_temp_rel_name(de->d_name))
+		{
+			elog(DEBUG2,
+				 "temporary relation file \"%s\" excluded from backup",
+				 de->d_name);
+
+			continue;
+		}
+
 		snprintf(pathbuf, sizeof(pathbuf), "%s/%s", path,
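For readers unfamiliar with the fd.c helper the patch reuses: temporary relation files are named along the lines of t<backendID>_<relfilenode>, optionally followed by a fork suffix (e.g. _fsm, _vm) and/or a segment number. A rough Python approximation of what looks_like_temp_rel_name() accepts (a sketch of the naming convention, not the exact C logic):

```python
import re

# Approximate shape of temp relation file names: "t<digits>_<digits>",
# optionally "_<forkname>" and/or ".<segment>".  This regex is an
# illustration of the convention, not a transcription of fd.c.
TEMP_REL_RE = re.compile(r'^t\d+_\d+(_[a-z]+)?(\.\d+)?$')

def looks_like_temp_rel_name(name):
    return TEMP_REL_RE.match(name) is not None
```

The isDbDir check in the hunk above matters because this pattern should only be applied inside directories that actually hold relation files ($PGDATA/base/<dboid> or a tablespace version path), not to arbitrary files elsewhere in the cluster.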
Re: PATCH: Configurable file mode mask
Hi Michael,

On 3/12/18 3:28 AM, Michael Paquier wrote:
> On Fri, Mar 09, 2018 at 01:51:14PM -0500, David Steele wrote:
>> How about a GUC that enforces one mode or the other on startup?  Default
>> would be 700.  The GUC can be set automatically by initdb based on the
>> -g option.  We had this GUC originally, but since the front-end tools
>> can't read it we abandoned it.  Seems like it would be good as an
>> enforcing mechanism, though.
>
> Hm.  OK.  I can see the whole set of points about that.  Please let me
> think a bit more about that bit.  Do you think that there could be a
> pool of users willing to switch from one mode to another?  Compared to
> your v1, we could indeed have a GUC which enforces a restriction to not
> allow group access, and enabled by default.  As the commit fest is
> running and we don't have a clear picture yet, I am afraid that it may
> be better to move that to v12, and focus on getting patches 1 and 2
> committed.  This will provide a good base for the next move.
>
> There are three places where things are still not correct:
>
> -	if (chmod(location, S_IRWXU) != 0)
> +	current_umask = umask(0);
> +	umask(current_umask);
> +
> +	if (chmod(location, PG_DIR_MODE_DEFAULT & ~current_umask) != 0)
>
> This is in tablespace.c.

I have moved this hunk to 03 and used only PG_DIR_MODE_DEFAULT instead.

> @@ -185,6 +186,9 @@ main(int argc, char **argv)
>  		exit(1);
>  	}
>
> +	/* Set dir/file mode mask */
> +	umask(PG_MODE_MASK_DEFAULT);
> +
> In pg_rewind and pg_resetwal, isn't that also a portion which is not
> necessary without the group access feature?

These seem like a good idea to me with or without patch 03.  Some of our
front-end tools (initdb, pg_upgrade) were setting umask and others
weren't.  I think it's more consistent (and safer) if they all do, at
least if they are writing into PGDATA.

> This is all I have basically for patch 2, which would be good for
> shipping.

Thanks!
I'll attach new patches in a reply to [1] once I have made the changes
Tom requested.

-- 
-David
da...@pgmasters.net

[1] https://www.postgresql.org/message-id/22928.1520953220%40sss.pgh.pa.us
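The chmod hunk quoted in this subthread computes the effective directory mode by masking the default with the process umask, using the classic umask(0)/umask(old) read-then-restore trick. In outline (a Python sketch of the C idiom; the constant names and the 0o750 group-access value are assumptions for illustration):

```python
import os

PG_DIR_MODE_DEFAULT = 0o750   # assumed group-access default; 0o700 otherwise

def effective_dir_mode():
    # umask() both sets and returns the mask, so read it by setting a
    # dummy value and immediately restoring the original -- the same
    # trick as the C hunk's umask(0) / umask(current_umask) pair.
    current_umask = os.umask(0)
    os.umask(current_umask)
    return PG_DIR_MODE_DEFAULT & ~current_umask
```

For example, with a umask of 0o027 the directory is created 0o750, and with the common 0o022 it is also 0o750, since group write is already off in the default.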
Re: [submit code] I develop a tool for pgsql, how can I submit it
2018-03-13 12:19 GMT-03:00 leap:
> I develop a tool for pgsql, I want to submit it on pgsql.
> how can I submit it?

As Laurenz said, a tool doesn't necessarily have to be part of
PostgreSQL.  Sometimes a lot of people ask for a tool, someone writes
it, and the community decides to maintain it.  I'm not sure this is the
case for your tool.  The pro is that the tool will be maintained by a
group of experienced programmers (I'm not saying you can't have this
group outside the PostgreSQL project; you may have enough interest if
people are excited by it).  The con about this direction is that the
tool's development cycle is tied to PostgreSQL's (which means a 1-year
cycle and slow development over time).

Since I do a lot of debugging, it seems very useful to me.  If you
decide to convince people of the usefulness of your tool, you should:
(i) remove some overlap between the tool and core, (ii) clean up your
code to follow the PostgreSQL guidelines, (iii) reuse core functions,
(iv) reuse constants instead of redefining them, and (v) document
everything.  Steps (i), (iii) and (iv) may require some code
rearrangement in PostgreSQL.  Even if you decide to continue development
outside PostgreSQL, those steps would improve your code.

-- 
Euler Taveira                Timbira - http://www.timbira.com.br/
PostgreSQL: Consultoria, Desenvolvimento, Suporte 24x7 e Treinamento
Re: ALTER TABLE ADD COLUMN fast default
On Tue, Mar 13, 2018 at 10:58 PM, Andrew Dunstan wrote:
> On Tue, Mar 13, 2018 at 2:40 PM, Andrew Dunstan wrote:
>>> Going by the commitfest app, the patch still does appear to be waiting
>>> on Author.  Never-the-less, I've made another pass over it and found a
>>> few mistakes and a couple of ways to improve things:
>>
>> working on these. Should have a new patch tomorrow.
>
> Here is a patch that attends to most of these, except that I haven't
> re-indented it.
>
> Your proposed changes to slot_getmissingattrs() wouldn't work, but I
> have rewritten it in a way that avoids the double work you disliked.
>
> I'll rerun the benchmark tests I posted upthread and let you know the
> results.

Here are the benchmark results from the v15 patch.  Fairly similar to
previous results.  I'm going to run some profiling again to see if I can
identify any glaring hotspots.

I do suspect that the "physical tlist" optimization sometimes turns out
not to be one.  It seems perverse to be able to improve a query's
performance by dropping a column.

cheers

andrew

-- 
Andrew Dunstan                https://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

results.t100r50k.v15 Description: Binary data
results.t100r64.v15 Description: Binary data
Re: [HACKERS] path toward faster partition pruning
Amit Langote wrote:
> I will continue working on improving the comments / cleaning things up and
> post a revised version soon, but until then please look at the attached.

I tried to give this a read.  It looks like pretty neat stuff -- as far
as I can tell, it follows Robert's sketch for how this should work.  The
fact that it's under-commented makes me unable to follow it too closely
though (I felt like adding a few "wtf?" comments here and there), so
it's taking me a bit to follow things in detail.  Please do submit
improved versions as you have them.

I think you're using an old version of pg_bsd_indent.

In particular need of commentary:

* match_clause_to_partition_key() should indicate which params are
  output and what they get

* get_steps_using_prefix already has a comment, but it doesn't really
  explain much.  (I'm not sure why you use the term "tuple" here.  I
  mean, mathematically it probably makes sense, but in the overall
  context it seems just confusing.)

-- 
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Re: SQL/JSON: functions
On Tue, Mar 13, 2018 at 04:08:01PM +0300, Oleg Bartunov wrote:
> The docs are here
> https://github.com/obartunov/sqljsondoc/blob/master/README.jsonpath.md
>
> It's not easy to write docs for SQL/JSON in xml, so I decided to write
> them in a more friendly way.  We'll have time to convert it to the
> postgres format.

If you aim at getting the feature committed first without its
documentation, and getting the docs written after feature freeze using a
dedicated open item or such, that is quite acceptable in my opinion, and
the CF is running short on time.

-- 
Michael
Re: SQL/JSON: functions
On 2018-03-14 07:54:35 +0900, Michael Paquier wrote:
> On Tue, Mar 13, 2018 at 04:08:01PM +0300, Oleg Bartunov wrote:
>> The docs are here
>> https://github.com/obartunov/sqljsondoc/blob/master/README.jsonpath.md
>>
>> It's not easy to write docs for SQL/JSON in xml, so I decided to write
>> them in a more friendly way.  We'll have time to convert it to the
>> postgres format.
>
> If you aim at getting the feature committed first without its
> documentation, and getting the docs written after feature freeze using
> a dedicated open item or such, that is quite acceptable in my opinion,
> and the CF is running short on time.

Given that this patch still uses PG_TRY/CATCH around code paths as wide
as a whole ExecEvalExpr() invocation, and has basically gotten no
review, I don't see this going anywhere for v11.

Greetings,

Andres Freund
Re: [HACKERS] path toward faster partition pruning
By the way, I checked whether patch 0002 (additional tests) had an
effect on coverage, and couldn't detect any changes in terms of
lines/functions.  Were you able to find any bugs in your code thanks to
the new tests that would not have been covered by existing tests?

-- 
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Re: JIT compiling with LLVM v11
Hi,

On 2018-03-14 10:32:40 +1300, Thomas Munro wrote:
> I decided to try this on a CentOS 7.2 box.  It has LLVM 3.9 in the
> 'epel' package repo, but unfortunately it only has clang 3.4.

That's a bit odd, given llvm and clang really live in the same repo...

> clang: error: unknown argument: '-fexcess-precision=standard'
> clang: error: unknown argument: '-flto=thin'
>
> Ok, so I hacked src/Makefile.global.in to remove -flto=thin.

I think I can actually get rid of that entirely.

> It looks
> like -fexcess-precision-standard is coming from a configure test that
> was run against ${CC}, not against ${CLANG}, so I hacked the generated
> src/Makefile.global to remove that too, just to see if I could get
> past that.

Yea, I'd hoped we could avoid duplicating all the configure tests, but
maybe not :(.

> Then I could build successfully and make check passed.  I did see one
> warning:
>
> In file included from execExpr.c:39:
> ../../../src/include/jit/jit.h:36:3: warning: redefinition of typedef
> 'JitProviderCallbacks' is a C11 feature [-Wtypedef-redefinition]
> } JitProviderCallbacks;
> ^
> ../../../src/include/jit/jit.h:22:37: note: previous definition is here
> typedef struct JitProviderCallbacks JitProviderCallbacks;
> ^

Yep.  Removed the superfluous / redundant second typedef.  Will push a
heavily rebased version in a bit, which will include a fix for this.

Greetings,

Andres Freund
Re: neqjoinsel versus "refresh materialized view concurrently"
On Wed, Mar 14, 2018 at 11:34 AM, Thomas Munro wrote:
> LOG:  duration: 26101.612 ms  plan:
> Query Text: SELECT newdata FROM pg_temp_3.pg_temp_16452 newdata WHERE
> newdata IS NOT NULL AND EXISTS (SELECT * FROM pg_temp_3.pg_temp_16452
> newdata2 WHERE newdata2 IS NOT NULL AND newdata2
> OPERATOR(pg_catalog.*=) newdata AND newdata2.ctid
> OPERATOR(pg_catalog.<>) newdata.ctid) LIMIT 1
> Limit  (cost=0.00..90.52 rows=1 width=28)
>        (actual time=26101.608..26101.608 rows=0 loops=1)
>   ->  Nested Loop Semi Join  (cost=0.00..225220.96 rows=2488 width=28)
>       (actual time=26101.606..26101.606 rows=0 loops=1)
>         Join Filter: ((newdata2.ctid <> newdata.ctid) AND
>                       (newdata.* *= newdata2.*))
>         Rows Removed by Join Filter: 2500
>         ->  Seq Scan on pg_temp_16452 newdata  (cost=0.00..73.00 rows=4975 width=34)
>             (actual time=0.022..15.448 rows=5000 loops=1)
>               Filter: (newdata.* IS NOT NULL)
>         ->  Materialize  (cost=0.00..97.88 rows=4975 width=34)
>             (actual time=0.000..0.500 rows=5000 loops=5000)
>               ->  Seq Scan on pg_temp_16452 newdata2  (cost=0.00..73.00 rows=4975 width=34)
>                   (actual time=0.010..4.033 rows=5000 loops=1)
>                     Filter: (newdata2.* IS NOT NULL)

This plan is chosen because we're looking for just one row (LIMIT 1)
that has equal data but a different ctid.  In this case we're not going
to find one, so we'll pay the full enormous cost of the nested loop, but
the startup cost is estimated as 0 and we think we are going to find a
row straight away.  That's because we don't know that it's unlikely for
there to be a row with the same columns but a different ctid.  There is
a fundamental and complicated estimation problem lurking here, of
course, and I'm not sure what to think about that yet.
Maybe there is a very simple fix for this particular problem:

--- a/src/backend/commands/matview.c
+++ b/src/backend/commands/matview.c
@@ -660,7 +660,7 @@ refresh_by_match_merge(Oid matviewOid, Oid tempOid, Oid relowner,
 					 "(SELECT * FROM %s newdata2 WHERE newdata2 IS NOT NULL "
 					 "AND newdata2 OPERATOR(pg_catalog.*=) newdata "
 					 "AND newdata2.ctid OPERATOR(pg_catalog.<>) "
-					 "newdata.ctid) LIMIT 1",
+					 "newdata.ctid)",
 					 tempname, tempname);
 	if (SPI_execute(querybuf.data, false, 1) != SPI_OK_SELECT)
 		elog(ERROR, "SPI_exec failed: %s", querybuf.data);

That gets me back to the sort-merge plan, but maybe it's too
superficial.

-- 
Thomas Munro
http://www.enterprisedb.com
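The Limit node's 90.52 cost estimate in the plan quoted earlier follows directly from the planner's uniform-spread assumption: the total join cost scaled by the fraction of its output the LIMIT needs. A sketch of that arithmetic (not planner code; the numbers are taken from the plan above):

```python
# Planner arithmetic behind "Limit (cost=0.00..90.52 ...)": with 2488
# matches assumed to be spread uniformly through a 225220.96-cost join,
# the first match is expected after about 1/2488 of the total work.
total_cost = 225220.96
estimated_join_rows = 2488
limit_rows = 1
limit_cost = total_cost * limit_rows / estimated_join_rows   # ~90.52
# When there are actually zero matches, the whole 225220.96 is paid,
# which is why the estimate is off by four orders of magnitude here.
```

This is why the plan looks attractive on paper and catastrophic in practice: the estimate is only valid if matching rows exist and are uncorrelated with position, and here they are both absent and (by construction) perfectly anti-correlated with the ctid inequality.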
Re: neqjoinsel versus "refresh materialized view concurrently"
Thomas Munro writes:
> This looks like an invisible correlation problem.

Yeah --- the planner has no idea that the join rows satisfying newdata.* *= newdata2.* are likely to be exactly the ones not satisfying newdata.ctid <> newdata2.ctid. It's very accidental that we got a good plan before.

I've not looked to see where this query is generated, but I wonder if we could make things better by dropping the LIMIT 1 and instead using an executor count parameter to stop early.

regards, tom lane
JIT compiling with LLVM v12
Hi,

I've pushed a revised and rebased version of my JIT patchset.
The git tree is at
  https://git.postgresql.org/git/users/andresfreund/postgres.git
in the jit branch
  https://git.postgresql.org/gitweb/?p=users/andresfreund/postgres.git;a=shortlog;h=refs/heads/jit

There's nothing hugely exciting, mostly lots of cleanups.

- added some basic EXPLAIN output, displaying JIT options and time spent
  jitting (see todo below)

  JIT:
    Functions: 9
    Generation Time: 4.604
    Inlining: false
    Inlining Time: 0.000
    Optimization: false
    Optimization Time: 0.585
    Emission Time: 12.858

- Fixed bugs around alignment computations in tuple deforming. Wasn't
  able to trigger any bad consequences, but it was clearly wrong.
- Worked a lot on making code more pgindent safe. There's still some
  minor layout damage, but it's mostly ok now. For that I had to add a
  bunch of helpers that make the code shorter.
- Freshly emitted functions now have proper attributes indicating
  architecture, floating point behaviour etc. That's what previously
  prevented the inliner from doing its job without forcing its hand.
  That yields a bit of a speedup.
- reduced size of code a bit by deduplicating code, in particular don't
  "manually" create signatures for function declarations anymore.
  Besides deduplicating, this also ensures code generation time errors
  when function signatures change.
- fixed a number of FIXMEs etc
- added a lot of comments
- portability fixes (OSX, freebsd)

Todo:
- some build issues with old clang versions pointed out by Thomas Munro
- when to take jit_expressions into account (both exec and plan, or just
  the latter)
- EXPLAIN for queries that are JITed should display units. Starting a
  thread about an effort to not duplicate code for that
- more explanations of type & function signature syncing
- GUC docs (including postgresql.conf.sample)

Thanks everyone, particularly Peter in this update, for helping me along!

Regards,

Andres
Re: neqjoinsel versus "refresh materialized view concurrently"
On Wed, Mar 14, 2018 at 8:07 AM, Jeff Janes wrote:
> The following commit has caused a devastating performance regression
> in concurrent refresh of MV:
>
> commit 7ca25b7de6aefa5537e0dbe56541bc41c0464f97
> Author: Tom Lane
> Date: Wed Nov 29 22:00:29 2017 -0500
>
>     Fix neqjoinsel's behavior for semi/anti join cases.
>
> The below reproduction goes from taking about 1 second to refresh, to
> taking an amount of time I don't have the patience to measure.
>
> drop table foobar2 cascade;
> create table foobar2 as select * from generate_series(1,20);
> create materialized view foobar3 as select * from foobar2;
> create unique index on foobar3 (generate_series);
> analyze foobar3;
> refresh materialized view CONCURRENTLY foobar3;
>
> When I interrupt the refresh, I get a message including this line:
>
> CONTEXT: SQL statement "SELECT newdata FROM pg_temp_3.pg_temp_16420 newdata
> WHERE newdata IS NOT NULL AND EXISTS (SELECT * FROM pg_temp_3.pg_temp_16420
> newdata2 WHERE newdata2 IS NOT NULL AND newdata2 OPERATOR(pg_catalog.*=)
> newdata AND newdata2.ctid OPERATOR(pg_catalog.<>) newdata.ctid) LIMIT 1"
>
> So it makes sense that the commit in question could have caused a change
> in the execution plan. Because these are temp tables, I can't easily get
> my hands on them to investigate further.

Ouch. A quadratic join. This looks like an invisible correlation problem.
load 'auto_explain';
set auto_explain.log_min_duration = 0;
set auto_explain.log_analyze = true;
drop table if exists t cascade;
create table t as select generate_series(1, 5000);
create materialized view mv as select * from t;
create unique index on mv(generate_series);
analyze mv;
refresh materialized view concurrently mv;

HEAD:

LOG: duration: 26101.612 ms plan:
Query Text: SELECT newdata FROM pg_temp_3.pg_temp_16452 newdata WHERE
newdata IS NOT NULL AND EXISTS (SELECT * FROM pg_temp_3.pg_temp_16452
newdata2 WHERE newdata2 IS NOT NULL AND newdata2 OPERATOR(pg_catalog.*=)
newdata AND newdata2.ctid OPERATOR(pg_catalog.<>) newdata.ctid) LIMIT 1
Limit (cost=0.00..90.52 rows=1 width=28) (actual time=26101.608..26101.608 rows=0 loops=1)
  -> Nested Loop Semi Join (cost=0.00..225220.96 rows=2488 width=28) (actual time=26101.606..26101.606 rows=0 loops=1)
       Join Filter: ((newdata2.ctid <> newdata.ctid) AND (newdata.* *= newdata2.*))
       Rows Removed by Join Filter: 2500
       -> Seq Scan on pg_temp_16452 newdata (cost=0.00..73.00 rows=4975 width=34) (actual time=0.022..15.448 rows=5000 loops=1)
            Filter: (newdata.* IS NOT NULL)
       -> Materialize (cost=0.00..97.88 rows=4975 width=34) (actual time=0.000..0.500 rows=5000 loops=5000)
            -> Seq Scan on pg_temp_16452 newdata2 (cost=0.00..73.00 rows=4975 width=34) (actual time=0.010..4.033 rows=5000 loops=1)
                 Filter: (newdata2.* IS NOT NULL)

And with commit 7ca25b7de6aefa5537e0dbe56541bc41c0464f97 reverted:

LOG: duration: 36.358 ms plan:
Query Text: SELECT newdata FROM pg_temp_3.pg_temp_16470 newdata WHERE
newdata IS NOT NULL AND EXISTS (SELECT * FROM pg_temp_3.pg_temp_16470
newdata2 WHERE newdata2 IS NOT NULL AND newdata2 OPERATOR(pg_catalog.*=)
newdata AND newdata2.ctid OPERATOR(pg_catalog.<>) newdata.ctid) LIMIT 1
Limit (cost=756.95..939.50 rows=1 width=28) (actual time=36.354..36.354 rows=0 loops=1)
  -> Merge Semi Join (cost=756.95..2947.51 rows=12 width=28) (actual time=36.352..36.352 rows=0 loops=1)
       Merge Cond: (newdata.* *= newdata2.*)
       Join Filter: (newdata2.ctid <> newdata.ctid)
       Rows Removed by Join Filter: 5000
       -> Sort (cost=378.48..390.91 rows=4975 width=34) (actual time=9.622..10.300 rows=5000 loops=1)
            Sort Key: newdata.* USING *<
            Sort Method: quicksort Memory: 622kB
            -> Seq Scan on pg_temp_16470 newdata (cost=0.00..73.00 rows=4975 width=34) (actual time=0.021..4.986 rows=5000 loops=1)
                 Filter: (newdata.* IS NOT NULL)
       -> Sort (cost=378.48..390.91 rows=4975 width=34) (actual time=7.378..8.010 rows=5000 loops=1)
            Sort Key: newdata2.* USING *<
            Sort Method: quicksort Memory: 622kB
            -> Seq Scan on pg_temp_16470 newdata2 (cost=0.00..73.00 rows=4975 width=34) (actual time=0.017..3.034 rows=5000 loops=1)
                 Filter: (newdata2.* IS NOT NULL)

-- 
Thomas Munro
http://www.enterprisedb.com
Re: PATCH: Configurable file mode mask
On 03/13/2018 10:40 AM, Stephen Frost wrote:
> * Michael Paquier (mich...@paquier.xyz) wrote:
>> Well, one thing is that the current checks in the postmaster make sure
>> that a data folder is never using anything else than 0700. From a
>> security point of view, making it possible to allow a postmaster to
>> start with 0750 is a step backwards ...
>
> Lastly, the user *is* able to enforce the privileges on the data
> directory if they wish to, using tools such as tripwire which are built
> specifically to provide such checks and to do so regularly instead of
> the extremely ad-hoc check provided by PG.
>
> ... Ultimately, the default which makes sense here isn't a
> one-size-fits-all but is system dependent and the administrator should
> be able to choose what permissions they want to have.

Hear, hear. Returning for a moment again to

https://www.postgresql.org/message-id/8b16cc30-0187-311e-04b5-1f32446d89dd%40anastigmatix.net

we see that a stat() returning mode 0750 on a modern system may not even mean there is any group access at all. In that example, the datadir had these permissions:

  # getfacl .
  # file: .
  # owner: postgres
  # group: postgres
  user::rwx
  user:root:r-x
  group::---
  mask::r-x
  other::---

While PostgreSQL does its stat() and interprets the mode as if it is still on a Unix box from the '80s, two things have changed underneath it: POSIX ACLs and Linux capabilities. Capabilities take the place of the former specialness of root, who now needs to be granted r-x explicitly in the ACL to be able to read stuff there at all, and there is clearly no access for group and no access for other. It would be hard for anybody to call this an insecure configuration. But when you stat() an object with a POSIX ACL, you get the 'mask' value where the 'group' bits used to go, so postgres sees this as 0750, thinks the 5 represents group access, and refuses to start. Purely a mistake.
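The mistake is easy to reproduce in miniature. A sketch of a permission check in the spirit of the one described here (`pg_style_mode_check` is a hypothetical name, not PostgreSQL's actual code) sees only the mode bits that stat() reports, so an ACL mask of r-x sitting in the group position is indistinguishable from real group access:

```python
import stat

def pg_style_mode_check(mode):
    """Accept a data directory only if its stat() mode shows no group or
    world bits set -- the check sees nothing but the traditional bits."""
    return (mode & (stat.S_IRWXG | stat.S_IRWXO)) == 0

# Classic 0700 datadir: accepted.
print(pg_style_mode_check(0o700))   # True
# Datadir with the ACL above: stat() reports the ACL *mask* (r-x) where
# the group bits used to be, so the mode reads 0750 even though
# group::--- grants no group access at all.
print(pg_style_mode_check(0o750))   # False -> refuses to start
```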
It's the kind of mistake that is inherent in this sort of check, which tries to draw conclusions from the semantics it assumes, while systems evolve and semantics march along. One hypothetical fix would be to add:

  #ifdef HAS_POSIX_ACLS
  ... check if there's really an ACL here, and the S_IRWXG bits are
  really just the mask, or even try to pass judgment on whether the
  admin's chosen ACL is adequately secure ...
  #endif

but then sooner or later it will still end up making assumptions that aren't true under, say, SELinux, so there's another #ifdef, and where does it end?

On 03/13/2018 11:00 AM, Tom Lane wrote:
> In a larger sense, this fails to explain the operating principle,
> namely what I stated above, that we allow the existing permissions
> on PGDATA to decide what we allow as group permissions.

I admit I've only skimmed the patches, trying to determine what that will mean in practice. In a case like the ACL example above, does this mean that postgres will stat PGDATA, conclude incorrectly that group access is granted, and then, based on that, actually go granting unwanted group access for real on newly-created files and directories? That doesn't seem ideal.

On 03/13/2018 10:45 AM, David Steele wrote:
> As Stephen notes, this can be enforced by the user if they want to,
> and without much effort (and with better tools).

To me, that seems really the key. An --allow-group-access option is nice (but, as we see, misleading if its assumptions are outdated regarding how the filesystem works), but I would plug even harder for a --permission-transparency option, which would just assume that the admin is making security arrangements, through mechanisms that postgres may or may not even understand. The admin can create ACLs with default entries that propagate to newly created objects. SELinux contexts can work in similar ways. The admin controls the umask inherited by the postmaster.
A --permission-transparency option would simply use open and mkdir in the traditional ways, allowing the chosen umask, ACL defaults, SELinux contexts, etc., to do their thing, and would avoid then stepping on the results with explicit chmods (and of course skip the I-refuse-to-start-because-I- misunderstand-your-setup check). It wouldn't be for every casual install, but it would be the best way to stay out of the way of an admin securing a system with modern facilities. A lot of the design effort put into debating what postgres itself should or shouldn't insist on could then, perhaps, go into writing postgres-specific configuration rule packages for some of those better configuration-checking tools, and there it might even be possible to write tests that incorporate knowledge of ACLs, SELinux, etc. -Chap
Re: Additional Statistics Hooks
Mat Arye writes:
> An example of a case a protransform type system would not be able to
> optimize is mathematical operator expressions like bucketing integers by
> decile --- (integer / 10) * 10.

Uh, why not? An estimation function that is specific to integer divide shouldn't have much trouble figuring out that x/10 has one-tenth as many distinct values as x does. I'd certainly rather have that knowledge associated directly with int4div, and the corresponding knowledge about date_trunc associated with that function, and similar knowledge about extension-provided operators provided by the extensions, than try to maintain a hook function that embeds all such knowledge.

> I also think that the point with extended statistics is a good one and
> points to the need for more experimentation/experience which I think
> a C hook is better suited for. Putting in a hook will allow extension
> writers like us to experiment and figure out the kinds of transform on
> statistics that are useful while having a small footprint on the core.

If you're experimenting you might as well just change the source code. A hook is only useful if you're trying to ship something for production, and I doubt that factorizing things this way is a credible production solution.

regards, tom lane
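The point about int4div is easy to check numerically: for integer division by a constant k, the number of distinct values shrinks by roughly a factor of k, which is exactly the kind of fact a per-function estimator could report. A quick illustration in plain Python (not a protransform implementation, just the arithmetic):

```python
k = 10
xs = range(0, 1000)                  # 1000 distinct integers
buckets = {x // k * k for x in xs}   # decile-style bucketing: (integer / 10) * 10

# One-tenth as many distinct values survive the division.
print(len(set(xs)), len(buckets))    # 1000 100
```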
Inquiry regarding the projects under GSOC
Hello Postgres Team,

I am Ankit, a 3rd-year B.Tech student from India. I am really interested in working with your organisation as an intern to rectify issues and bugs, as I love challenges, and the docs state that some features might not be possible to implement, so I would love to try. My inquiry is: what current bug fixes or feature updates are available for interns to work on?

Thanks!!
Re: PATCH: Configurable file mode mask
Greetings Chapman,

* Chapman Flack (c...@anastigmatix.net) wrote:
> On 03/13/2018 10:40 AM, Stephen Frost wrote:
> > ... Ultimately, the default which makes sense here isn't a
> > one-size-fits-all but is system dependent and the administrator should
> > be able to choose what permissions they want to have.
>
> Hear, hear. Returning for a moment again to
>
> https://www.postgresql.org/message-id/8b16cc30-0187-311e-04b5-1f32446d89dd%40anastigmatix.net
>
> we see that a stat() returning mode 0750 on a modern system may not
> even mean there is any group access at all. In that example, the
> datadir had these permissions:

Yes, that's true, but PG has never paid attention to POSIX ACLs or Linux capabilities. Changing it to do so is quite a bit beyond the scope of this particular patch, and I don't see anything in what this patch is doing which would preclude someone from putting in that effort in the future.

> While PostgreSQL does its stat() and interprets the mode as if it is
> still on a Unix box from the '80s, two things have changed underneath
> it: POSIX ACLs and Linux capabilities. Capabilities take the place of
> the former specialness of root, who now needs to be granted r-x
> explicitly in the ACL to be able to read stuff there at all, and there
> is clearly no access to group and no access to other. It would be hard
> for anybody to call this an insecure configuration. But when you stat()
> an object with a POSIX ACL, you get the 'mask' value where the 'group'
> bits used to go, so postgres sees this as 0750, thinks the 5 represents
> group access, and refuses to start. Purely a mistake.

I'll point out that PG does run on quite a few other OS's beyond Linux.

> It's the kind of mistake that is inherent in this sort of check,
> which tries to draw conclusions from the semantics it assumes, while
> systems evolve and semantics march along. One hypothetical fix would
> be to add:
>
> #ifdef HAS_POSIX_ACLS
> ... check if there's really an ACL here, and the S_IRWXG bits are
> really just the mask, or even try to pass judgment on whether the
> admin's chosen ACL is adequately secure ...
> #endif
>
> but then sooner or later it will still end up making assumptions
> that aren't true under, say, SELinux, so there's another #ifdef,
> and where does it end?

I agree with this general concern.

> On 03/13/2018 11:00 AM, Tom Lane wrote:
> > In a larger sense, this fails to explain the operating principle,
> > namely what I stated above, that we allow the existing permissions
> > on PGDATA to decide what we allow as group permissions.
>
> I admit I've only skimmed the patches, trying to determine what
> that will mean in practice. In a case like the ACL example above,
> does this mean that postgres will stat PGDATA, conclude incorrectly
> that group access is granted, and then, based on that, actually go
> granting unwanted group access for real on newly-created files
> and directories?

PG will stat PGDATA and conclude that the system is saying that group access has been granted on PGDATA, and will do the same for subsequent files created later on. This is new in PG, so there isn't any concern about this causing problems in an existing environment: you couldn't have had those ACLs on an existing PGDATA dir in the first place, as you note above. The only issue that remains is that PG doesn't understand or work with POSIX ACLs or Linux capabilities, but that's not anything new.

> On 03/13/2018 10:45 AM, David Steele wrote:
> > As Stephen notes, this can be enforced by the user if they want to,
> > and without much effort (and with better tools).
>
> To me, that seems really the key. An --allow-group-access option is
> nice (but, as we see, misleading if its assumptions are outdated
> regarding how the filesystem works), but I would plug even harder for
> a --permission-transparency option, which would just assume that the
> admin is making security arrangements, through mechanisms that
> postgres may or may not even understand. The admin can create ACLs
> with default entries that propagate to newly created objects.
> SELinux contexts can work in similar ways. The admin controls the
> umask inherited by the postmaster. A --permission-transparency option
> would simply use open and mkdir in the traditional ways, allowing
> the chosen umask, ACL defaults, SELinux contexts, etc., to do their
> thing, and would avoid then stepping on the results with explicit
> chmods (and of course skip the I-refuse-to-start-because-I-
> misunderstand-your-setup check).
>
> It wouldn't be for every casual install, but it would be the best
> way to stay out of the way of an admin securing a system with modern
> facilities.
>
> A lot of the design effort put into debating what postgres itself
> should or shouldn't insist on could then, perhaps, go into writing
> postgres-specific configuration rule packages for some of those
> better configuration-checking tools, and there it might even be
> possible to write tests
Re: [HACKERS] path toward faster partition pruning
Hi Amit,

On 03/13/2018 07:37 AM, Amit Langote wrote:
> I will continue working on improving the comments / cleaning things up
> and post a revised version soon, but until then please look at the
> attached.

Passes check-world. Some minor comments:

0001: Ok

0002: Ok

0003:
* Trailing white space
* pruning.c
  - partkey_datum_from_expr
    * "Add more expression types..." -- Are you planning to add more of
      these? Otherwise change the comment
  - get_partitions_for_null_keys
    * Documentation for method
    * 'break;' missing for _HASH and default case
  - get_partitions_for_keys
    * 'break;'s are outside of the 'case' blocks
    * The 'switch(opstrategy)'s could use some {} blocks
    * 'break;' missing from default
  - perform_pruning_combine_step
    * Documentation for method
* nodeFuncs.c
  - Missing 'break;'s to follow style

0004: Ok

Best regards,
Jesper
[submit code] I develop a tool for pgsql, how can I submit it
Hello! I developed a tool for pgsql, and I want to submit it to the PostgreSQL project. How can I submit it?

Address: https://github.com/leapking/pgcheck
Re: PATCH: Configurable file mode mask
David Steele writes:
> On 3/12/18 3:28 AM, Michael Paquier wrote:
>> In pg_rewind and pg_resetwal, isn't that also a portion which is not
>> necessary without the group access feature?

> These seem like a good idea to me with or without patch 03. Some of our
> front-end tools (initdb, pg_upgrade) were setting umask and others
> weren't. I think it's more consistent (and safer) if they all do, at
> least if they are writing into PGDATA.

+1 ... see a926eb84e for an example of how easy it is to screw up if the process's overall umask is permissive.

regards, tom lane
Re: add queryEnv to ExplainOneQuery_hook
On Wed, Mar 14, 2018 at 05:36:26PM +1300, Thomas Munro wrote: > Hmm. I suppose we could have invented a new extended hook with a > different name and back-patched it so that PG10 would support both. > Then binary compatibility with existing compiled extensions wouldn't > be affected AFAICS, but you could use the new extended hook in (say) > 10.4 or higher. Then for PG11 (or later) we could remove the old hook > and just keep the new one. I suppose that option is still technically > open to us, though I'm not sure of the committers' appetite for messing > with back branches like that. The interactions between both hooks would not be difficult to define: if the original hook is not defined, just do not trigger the second. Still that's too late for v10, so I would rather let it go. New features are not backpatched. -- Michael signature.asc Description: PGP signature
Re: PQHost() undefined behavior if connecting string contains both host and hostaddr types
On Fri, Mar 09, 2018 at 04:42:30PM -0500, Peter Eisentraut wrote: > It seems, however, that PGhost() has always been broken for hostaddr > use. In 9.6 (before the multiple-hosts stuff was introduced), when > connecting to "hostaddr=127.0.0.1", PGhost() returns "/tmp". Urgh. > > I think we should decide what PGhost() is supposed to mean when hostaddr > is in use, and then make a fix for that consistently across all versions. Hm. The only inconsistency I can see here is when "host" is not defined but "hostaddr" is, so why not make PQhost return the value of "hostaddr" in this case? -- Michael signature.asc Description: PGP signature
Re: User defined data types in Logical Replication
On Wed, Mar 7, 2018 at 9:19 PM, Masahiko Sawada wrote:
> On Wed, Mar 7, 2018 at 2:52 AM, Alvaro Herrera wrote:
>> Masahiko Sawada wrote:
>>> On Tue, Mar 6, 2018 at 8:35 AM, Alvaro Herrera wrote:
>>
>>> > Therefore, I'm inclined to make this function raise a warning, then
>>> > return a substitute value (something like "unrecognized type XYZ").
>>> > [...]
>>>
>>> I agree with you about not hiding the actual reason for the error but
>>> if we raise a warning at logicalrep_typmap_gettypname don't we call
>>> slot_store_error_callback recursively?
>>
>> Hmm, now that you mention it, I don't really know. I think it's
>> supposed not to happen, since calling ereport() again opens a new
>> recursion level, but then maybe errcontext doesn't depend on the
>> recursion level ... I haven't checked. This is why the TAP test would
>> be handy :-)
>
> Calling ereport opens a new recursion level. Calling ereport with an
> error doesn't return to the caller, but calling it with a warning does.
> So recursively calling ereport(WARNING) ends up exceeding the errordata
> stack size. So it seems to me that we can set errcontext in
> logicalrep_typmap_gettypname() instead of raising a warning or error.
>
>>> Agreed. Will add a TAP test.
>>
>> Great. This patch waits on that, then.
>
> Okay. I think the most simple and convenient way to reproduce this
> issue is to call an elog(LOG) in the input function of a user-defined
> data type. So I'm thinking to create the test in the src/test/module
> directory.

After more thought, I think that since the errors in logicalrep_typmap_gettypname are not relevant to the actual error (i.e. the type conversion error), it would not be a good idea to show the "could not found data type entry" error message in errcontext. It might be more appropriate to return a substitute value ("unrecognized type" or "unrecognized built-in type") without raising either an error or a warning. Thoughts?
Regarding the regression test, I added a new test module, test_subscription, that creates a new user-defined data type. In a subscription regression test, using test_subscription we make the subscriber call slot_store_error_callback and check whether the subscriber can correctly look up both the remote and local data type strings. One downside of this regression test is that in the failure case the duration of the test will be long (up to 180 sec), because it has to wait for the polling timeout.

Attached is an updated patch with a regression test.

Regards,

-- 
Masahiko Sawada
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center

fix_slot_store_error_callback_v13.patch
Description: Binary data
planner bug regarding lateral and subquery?
Hi Hackers,

I found a bug, maybe. If an EXPLAIN of the query below succeeds, I think that means the query is executable. However, I got an error when executing the query without EXPLAIN. I guess the planner makes a wrong plan. I share a reproduction procedure and query results on 3b7ab4380440d7b14ee390fabf39f6d87d7491e2.

* Reproduction

create table test (c1 integer, c2 integer, c3 text);
insert into test values (1, 3, 'a');
insert into test values (2, 4, 'b');

explain (costs off)
select subq_1.c0
from test as ref_0,
  lateral (select subq_0.c0 as c0
           from (select ref_0.c2 as c0,
                        (select c1 from test) as c1
                 from test as ref_1
                 where (select c3 from test) is NULL) as subq_0
           right join test as ref_2 on (subq_0.c1 = ref_2.c1)) as subq_1;

select subq_1.c0
from test as ref_0,
  lateral (select subq_0.c0 as c0
           from (select ref_0.c2 as c0,
                        (select c1 from test) as c1
                 from test as ref_1
                 where (select c3 from test) is NULL) as subq_0
           right join test as ref_2 on (subq_0.c1 = ref_2.c1)) as subq_1;

* Result of EXPLAIN: succeeded

              QUERY PLAN
---------------------------------------------
 Nested Loop
   InitPlan 1 (returns $0)
     -> Seq Scan on test
   InitPlan 2 (returns $1)
     -> Seq Scan on test test_1
   -> Seq Scan on test ref_0
   -> Nested Loop Left Join
        Join Filter: ($1 = ref_2.c1)
        -> Seq Scan on test ref_2
        -> Materialize
             -> Result
                  One-Time Filter: ($0 IS NULL)
                  -> Seq Scan on test ref_1

* Result of SELECT: failed

ERROR: more than one row returned by a subquery used as an expression

* The error message came from here: ./src/backend/executor/nodeSubplan.c

	if (found &&
		(subLinkType == EXPR_SUBLINK ||
		 subLinkType == MULTIEXPR_SUBLINK ||
		 subLinkType == ROWCOMPARE_SUBLINK))
		ereport(ERROR,
				(errcode(ERRCODE_CARDINALITY_VIOLATION),
				 errmsg("more than one row returned by a subquery used as an expression")));

Thanks,
Tatsuro Yamada
Re: planner bug regarding lateral and subquery?
On Tuesday, March 13, 2018, Tatsuro Yamada wrote:
> Hi Hackers,
>
> I found a bug, maybe.
> If it is able to get an explain command result from below query
> successfully, I think that it means the query is executable.

There is a difference between being plannable, being compilable, and being able to execute to completion, at runtime, on specific data. You've proven the former, but as the error indicates, specific data causes the complete execution of the query to fail.

I can write "select cola / colb from tbl", and as long as there are no zeros in colb the query will complete, but if there is one you get a divide-by-zero runtime error. This is similar.

David J.
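The distinction being drawn here, that a query can plan or compile successfully yet fail on particular data, is easy to model outside SQL. A minimal sketch (hypothetical `run_query` helper, standing in for executing the divide query above):

```python
def run_query(rows):
    """A 'plannable' query: divide cola by colb for each row.  Whether it
    completes depends entirely on the data, not on the query text."""
    return [a / b for a, b in rows]

print(run_query([(10, 2), (9, 3)]))    # [5.0, 3.0] -- completes fine
try:
    run_query([(10, 2), (1, 0)])       # same "query", different data
except ZeroDivisionError as e:
    print("runtime error:", e)         # fails only when a zero appears
```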
Re: Faster inserts with mostly-monotonically increasing values
On Sun, Mar 11, 2018 at 9:18 PM, Claudio Freire wrote:
> On Sun, Mar 11, 2018 at 2:27 AM, Pavan Deolasee wrote:
>> Yes, I will try that next - it seems like a good idea. So the idea would be:
>> check if the block is still the rightmost block and the insertion-key is
>> greater than the first key in the page. If those conditions are satisfied,
>> then we do a regular binary search within the page to find the correct
>> location. This might add an overhead of binary search when keys are strictly
>> ordered and a single client is inserting the data. If that becomes a
>> concern, we might be able to look for that special case too and optimise for
>> it too.
>
> Yeah, pretty much that's the idea. Beware, if the new item doesn't
> fall in the rightmost place, you still need to check for serialization
> conflicts.

So I've been toying with this idea since yesterday and I am quite puzzled with the results. See the attached patch, which compares the insertion key with the last key inserted by this backend, if the cached block is still the rightmost block in the tree. I initially only compared with the first key in the page, but I tried this version because of the strange performance regression, which I still have no answers for.

For a small number of clients, the patched version does better. But as the number of clients goes up, the patched version significantly underperforms master. I roughly counted the number of times the fastpath is taken and I noticed that almost 98% of inserts take the fastpath. I first thought that the "firstkey" location in the page might be becoming a hot-spot for concurrent processes and hence changed that to track the per-backend last offset and compare against that the next time. But that did not help much.
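The fastpath under discussion can be sketched as: cache the rightmost block, and on the next insert, if that block is still the rightmost and the new key belongs on it, skip the root-to-leaf descent and insert directly after a binary search within the page. A toy model of that check (hypothetical names, nothing like the real nbtree code, and ignoring locking and serialization conflicts entirely):

```python
import bisect

class ToyBtreeRightmost:
    """Toy model of the cached-rightmost-page insert fastpath."""
    def __init__(self):
        self.rightmost_page = []   # sorted keys on the rightmost leaf
        self.fastpath_hits = 0
        self.descents = 0

    def insert(self, key):
        page = self.rightmost_page
        if page and key >= page[0]:
            # Fastpath: cached page is still rightmost and the key
            # belongs on it; a binary search within the page finds the
            # offset without a root-to-leaf descent.
            self.fastpath_hits += 1
            bisect.insort(page, key)
        else:
            # Slow path: would descend from the root here.
            self.descents += 1
            bisect.insort(page, key)

bt = ToyBtreeRightmost()
for k in range(1000):              # monotonically increasing keys
    bt.insert(k)
print(bt.fastpath_hits, bt.descents)   # 999 1 -- nearly every insert fastpathed
```

With monotonically increasing keys almost every insert takes the fastpath, which matches the ~98% figure observed above; the model says nothing about the multi-client contention that the measurements below reveal.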
+---------+-------------------------------+--------------------------------+
| clients | Master - Avg load time in sec | Patched - Avg load time in sec |
+---------+-------------------------------+--------------------------------+
|       1 | 500.0725203                   | 347.632079                     |
+---------+-------------------------------+--------------------------------+
|       2 | 308.4580771                   | 263.9120163                    |
+---------+-------------------------------+--------------------------------+
|       4 | 359.4851779                   | 514.7187444                    |
+---------+-------------------------------+--------------------------------+
|       8 | 476.4062592                   | 780.2540855                    |
+---------+-------------------------------+--------------------------------+

The perf data does not show anything interesting either. I mean, there is a reduction in CPU time spent in btree-related code in the patched version, but the overall execution time to insert the same number of records goes up significantly.

Perf (master):
==============
+ 72.59%  1.81%  postgres  postgres  [.] ExecInsert
+ 47.55%  1.27%  postgres  postgres  [.] ExecInsertIndexTuples
+ 44.24%  0.48%  postgres  postgres  [.] btinsert
- 42.40%  0.87%  postgres  postgres  [.] _bt_doinsert
   - 41.52% _bt_doinsert
      + 21.14% _bt_search
      + 12.57% _bt_insertonpg
      + 2.03% _bt_binsrch
        1.60% _bt_mkscankey
        1.20% LWLockAcquire
      + 1.03% _bt_freestack
        0.67% LWLockRelease
        0.57% _bt_check_unique
   + 0.87% _start
+ 26.03%  0.95%  postgres  postgres  [.] ExecScan
+ 21.14%  0.82%  postgres  postgres  [.] _bt_search
+ 20.70%  1.31%  postgres  postgres  [.] ExecInterpExpr
+ 19.05%  1.14%  postgres  postgres  [.] heap_insert
+ 18.84%  1.16%  postgres  postgres  [.] nextval_internal
+ 18.08%  0.84%  postgres  postgres  [.] ReadBufferExtended
+ 17.24%  2.03%  postgres  postgres  [.] ReadBuffer_common
+ 12.57%  0.59%  postgres  postgres  [.] _bt_insertonpg
+ 11.12%  1.63%  postgres  postgres  [.] XLogInsert
+  9.90%  0.10%  postgres  postgres  [.] _bt_relandgetbuf
+  8.97%  1.16%  postgres  postgres  [.] LWLockAcquire
+  8.42%  2.03%  postgres  postgres  [.] XLogInsertRecord
+  7.26%  1.01%  postgres  postgres  [.] _bt_binsrch
+  7.07%  1.20%  postgres  postgres  [.] RelationGetBufferForTuple
+  6.27%  4.92%  postgres  postgres  [.] _bt_compare
+  5.97%  0.63%  postgres  postgres  [.] read_seq_tuple.isra.3
+  5.70%  4.89%  postgres  postgres  [.] hash_search_with_hash_value
+  5.44%  5.44%  postgres  postgres  [.] LWLockAttemptLock

Perf (Patched):
===============
+ 69.33%  2.36%  postgres  postgres  [.] ExecInsert
+ 35.21%  0.64%  postgres  postgres  [.] ExecInsertIndexTuples
- 32.14%  0.45%  postgres
Re: inserts into partitioned table may cause crash
Fujita-san,

Thanks for the updates and sorry I couldn't reply sooner.

On 2018/03/06 21:26, Etsuro Fujita wrote:
> One thing I notice while working on this is this in ExecInsert/CopyFrom,
> which I moved to ExecPrepareTupleRouting as-is for the former:
>
>     /*
>      * If we're capturing transition tuples, we might need to convert from the
>      * partition rowtype to parent rowtype.
>      */
>     if (mtstate->mt_transition_capture != NULL)
>     {
>         if (resultRelInfo->ri_TrigDesc &&
>             (resultRelInfo->ri_TrigDesc->trig_insert_before_row ||
>              resultRelInfo->ri_TrigDesc->trig_insert_instead_row))
>         {
>             /*
>              * If there are any BEFORE or INSTEAD triggers on the partition,
>              * we'll have to be ready to convert their result back to
>              * tuplestore format.
>              */
>             mtstate->mt_transition_capture->tcs_original_insert_tuple = NULL;
>             mtstate->mt_transition_capture->tcs_map =
>                 TupConvMapForLeaf(proute, rootRelInfo, leaf_part_index);
>         }
>
> Do we need to consider INSTEAD triggers here? The partition is either a
> plain table or a foreign table, so I don't think it can have those
> triggers. Am I missing something?

I think you're right. We don't need to consider INSTEAD OF triggers here if we know we're dealing with a partition, which cannot have those. Just to make sure, a word from Thomas would help.

On 2018/03/12 12:25, Etsuro Fujita wrote:
> (2018/03/09 20:18), Etsuro Fujita wrote:
>> Here are updated patches for PG10 and HEAD.
>>
>> Other changes:
>> * Add regression tests based on your test cases shown upthread
>
> I added a little bit more regression tests and revised comments. Please
> find attached an updated patch.

Thanks for adding the test. I was concerned that ExecMaterializeSlot will be called twice now, first in ExecPrepareTupleRouting and then in ExecInsert, instead of only once in ExecInsert with the existing code, but perhaps it doesn't matter much because most of the work would be done in the 1st call anyway.
Sorry for what may be nitpicking that I should've brought up before, but doesn't ExecPrepareTupleRouting do all the work that's needed for routing a tuple, and hence isn't the name a bit misleading? Maybe ExecPerformTupleRouting?

Btw, I noticed that the patches place ExecPrepareTupleRouting (both the declaration and the definition) at different relative locations within nodeModifyTable.c in the HEAD branch vs. in the 10 branch. It might be a good idea to bring them to the same relative location to avoid hassle when back-patching relevant patches in the future. I did that in the attached updated version (v4) of the patch for HEAD, which does not make any other changes. Although the patch for PG 10 is unchanged, I still changed its file name to contain v4.

Regards,
Amit

*** a/src/backend/commands/copy.c
--- b/src/backend/commands/copy.c
***************
*** 2656,2668 **** CopyFrom(CopyState cstate)
  		if (cstate->transition_capture != NULL)
  		{
  			if (resultRelInfo->ri_TrigDesc &&
! 				(resultRelInfo->ri_TrigDesc->trig_insert_before_row ||
! 				 resultRelInfo->ri_TrigDesc->trig_insert_instead_row))
  			{
  				/*
! 				 * If there are any BEFORE or INSTEAD triggers on the
! 				 * partition, we'll have to be ready to convert their
! 				 * result back to tuplestore format.
  				 */
  				cstate->transition_capture->tcs_original_insert_tuple = NULL;
  				cstate->transition_capture->tcs_map =
--- 2656,2667 ----
  		if (cstate->transition_capture != NULL)
  		{
  			if (resultRelInfo->ri_TrigDesc &&
! 				resultRelInfo->ri_TrigDesc->trig_insert_before_row)
  			{
  				/*
! 				 * If there are any BEFORE triggers on the partition,
! 				 * we'll have to be ready to convert their result back to
! 				 * tuplestore format.
  				 */
  				cstate->transition_capture->tcs_original_insert_tuple = NULL;
  				cstate->transition_capture->tcs_map =
***************
*** 2803,2814 **** CopyFrom(CopyState cstate)
  		 * tuples inserted by an INSERT command.
  		 */
  		processed++;
!
Re: [HACKERS] GUC for cleanup indexes threshold.
On Sat, Mar 10, 2018 at 3:40 AM, Alexander Korotkov wrote:
> On Fri, Mar 9, 2018 at 3:12 PM, Masahiko Sawada wrote:
>>
>> On Fri, Mar 9, 2018 at 8:43 AM, Alexander Korotkov wrote:
>> > 2) These parameters are reset during btbulkdelete() and set during
>> > btvacuumcleanup().
>>
>> Can't we set these parameters even during btbulkdelete()? By keeping
>> them up to date, we will be able to avoid unnecessary cleanup vacuums
>> even after an index bulk-delete.
>
> We certainly can update cleanup-related parameters during btbulkdelete().
> However, in this case we would update the B-tree meta-page during each
> VACUUM cycle. That may cause some overhead for non append-only
> workloads. I don't think this overhead would be noticeable, because in
> non append-only scenarios VACUUM typically writes much more information.
> But I would like this patch, oriented to append-only workloads, to be
> as harmless as possible for other workloads.

What overhead are you referring to here? I guess the overhead is only calculating the oldest btpo.xact, and I think that would be harmless.

Regards,

--
Masahiko Sawada
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center
Re: [HACKERS] path toward faster partition pruning
On 14 March 2018 at 06:54, Jesper Pedersen wrote:
> * "Add more expression types..." -- Are you planning to add more of
> these? Otherwise change the comment

Run-time pruning will deal with Param types here. The comment there might be just to remind us all that the function must remain generic enough so we can support more node types later. I don't particularly need the comment for this patch, and I'm not quite sure if I should be removing it in this patch. My imagination is not stretching far enough today to think of what we could use beyond Const and Param.

I don't feel strongly about the comment either way. It is just there to let you know that there are more up-and-coming things to do in that spot.

--
David Rowley http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
Re: Re: Re: [GSOC 18] Performance Farm Project -- Initialization Project
Hi Dave,

I am willing to use React to re-create the front-end application. Since I plan to use separate front-end and back-end development methods, this means that there will be no HTML files in the Django application. The front-end and back-end applications will interact via a RESTful API. I will use React to rewrite some existing HTML files if needed. Before initializing the React application, I want to learn more about your preferences regarding front-end technology.

- About React: Which version of React (15.x or 16) and react-router (2.x or 4) do you tend to use?
- About the package manager: Do you tend to use yarn or npm?
- About the UI library: I guess you might prefer Bootstrap or Material-UI.

At the same time, I hope you can help me understand the functional requirements of this project more clearly. Here are some of my thoughts on PerfFarm:

- I see this comment in the client folder (in line 15 of the "pgperffarm\client\benchmarks\pgbench.py" file):
  '''
  # TODO allow running custom scripts, not just the default
  '''
  Will PerfFarm have so many test items that a test-item search function, or even a test-item sort function, is expected to be provided?
- What value will be used to determine whether a machine's performance is improving or declining?
- After the user logs in to the site, I think it may be necessary to provide a page for the user to browse or manage his own machines. Is there any function that requires users to log in before they can use it?
- When I registered a machine on BuildFarm, I did not receive the confirmation email immediately, but several days later. In PerfFarm, when a user registers a machine, will the administrator review it before sending a confirmation email?
- I see BuildFarm assigning an animal name to each registered machine. Will PerfFarm also have this interesting feature?

My postgresql.org community account is:
- Username: maleicacid
- Email: cs_maleica...@163.com

I hope to get commit permission for the pgperffarm.git repository.
I am willing to continue coding and complete the project on the existing code. Looking forward to your reply.

Best Regards,
Hongyuan Ma (cs_maleica...@163.com)

At 2018-03-13 03:03:08, "Dave Page" wrote:

Hi

On Mon, Mar 12, 2018 at 9:57 AM, Hongyuan Ma wrote:

Hi Dave,

Thank you for your reminder. This is indeed my negligence. In fact, I have browsed the code in the pgperffarm.git repository, including the client folder and the web folder. However, I found that the web folder has some unfinished HTML files and does not contain model class files. And the project used Django 1.8 without importing the Django REST Framework (when I talked to Mark about the PerfFarm project, he told me he insisted on using Django 1.11 and considered using the REST framework). So I mistakenly thought that the code in the web folder had been shelved.

Nope, not at all. It just wasn't updated to meet the latest PG infrastructure requirements yet (basically just the update to Django 1.11). The rest should be fine as-is.

In the newly initialized Django application, I upgraded its version and adjusted the directory layout to make the project structure clearer. I want to use a separate front-end development approach (the front-end application will use vue.js for programming). I hope you can allow me to use it instead of the old one. I am willing to integrate the auth module into this new application as soon as possible.

I would much prefer to see jQuery + React, purely because there are likely more PostgreSQL developers (particularly from the pgAdmin team) who know those technologies. It is important to consider long-term maintenance as well as ease of initial development with any project. Thanks.

Regards,
Hongyuan Ma (cs_maleica...@163.com)

On 2018-03-12 02:26:43, "Dave Page" wrote:

Hi

Maybe I'm missing something (I've been offline a lot recently for unavoidable reasons), but the perf farm project already has a Django backend initialised and configured to work with community auth, on community infrastructure.
https://git.postgresql.org/gitweb/?p=pgperffarm.git;a=summary On Sunday, March 11, 2018, Hongyuan Ma wrote: Hello, mark I initialized a Django project and imported the Django REST Framework. Its github address is: https://github.com/PGPerfFarm/server-code I created some model classes and also wrote scripts in the dbtools folder to import simulation data for development. I am hesitant to use admin or xadmin as the administrator's backend for the site (I prefer to use xadmin). I also initialized the website's front-end application. Here is its address: https://github.com/PGPerfFarm/front-end-code.git I wrote the header component as shown: I hope this effect can enhance the future user experience:). This application uses vue.js and element-ui, but if you insist on using other js libraries or not using js, please let me
Re: Faster inserts with mostly-monotonically increasing values
On Wed, Mar 14, 2018 at 1:36 AM, Pavan Deolasee wrote:
>
> On Sun, Mar 11, 2018 at 9:18 PM, Claudio Freire wrote:
>>
>> On Sun, Mar 11, 2018 at 2:27 AM, Pavan Deolasee wrote:
>> >
>> > Yes, I will try that next - it seems like a good idea. So the idea would be:
>> > check if the block is still the rightmost block and the insertion-key is
>> > greater than the first key in the page. If those conditions are satisfied,
>> > then we do a regular binary search within the page to find the correct
>> > location. This might add an overhead of binary search when keys are
>> > strictly ordered and a single client is inserting the data. If that
>> > becomes a concern, we might be able to look for that special case too
>> > and optimise for it too.
>>
>> Yeah, pretty much that's the idea. Beware, if the new item doesn't
>> fall in the rightmost place, you still need to check for serialization
>> conflicts.
>
> So I've been toying with this idea since yesterday and I am quite puzzled
> by the results. See the attached patch, which compares the insertion key
> with the last key inserted by this backend, if the cached block is still the
> rightmost block in the tree. I initially only compared with the first key in
> the page, but I tried this version because of the strange performance
> regression, for which I still have no answers.
>
> For a small number of clients, the patched version does better. But as the
> number of clients goes up, the patched version significantly underperforms
> master. I roughly counted the number of times the fastpath is taken and
> noticed that almost 98% of inserts take the fastpath. I first thought that
> the "firstkey" location in the page might be becoming a hot-spot for
> concurrent processes and hence changed that to track the per-backend last
> offset and compare against that the next time. But that did not help much.
> +---------+-------------------------------+--------------------------------+
> | clients | Master - Avg load time in sec | Patched - Avg load time in sec |
> +---------+-------------------------------+--------------------------------+
> |       1 | 500.0725203                   | 347.632079                     |
> |       2 | 308.4580771                   | 263.9120163                    |
> |       4 | 359.4851779                   | 514.7187444                    |
> |       8 | 476.4062592                   | 780.2540855                    |
> +---------+-------------------------------+--------------------------------+
>
> The perf data does not show anything interesting either. I mean there is a
> reduction in CPU time spent in btree related code in the patched version,
> but the overall execution time to insert the same number of records goes up
> significantly.
>
> Perf (master):
> ===============
>
> +   72.59%  1.81%  postgres  postgres  [.] ExecInsert
> +   47.55%  1.27%  postgres  postgres  [.] ExecInsertIndexTuples
> +   44.24%  0.48%  postgres  postgres  [.] btinsert
> -   42.40%  0.87%  postgres  postgres  [.] _bt_doinsert
>    - 41.52% _bt_doinsert
>       + 21.14% _bt_search
>       + 12.57% _bt_insertonpg
>       + 2.03% _bt_binsrch
>         1.60% _bt_mkscankey
>         1.20% LWLockAcquire
>       + 1.03% _bt_freestack
>         0.67% LWLockRelease
>         0.57% _bt_check_unique
>    + 0.87% _start
> +   26.03%  0.95%  postgres  postgres  [.] ExecScan
> +   21.14%  0.82%  postgres  postgres  [.] _bt_search
> +   20.70%  1.31%  postgres  postgres  [.] ExecInterpExpr
> +   19.05%  1.14%  postgres  postgres  [.] heap_insert
> +   18.84%  1.16%  postgres  postgres  [.] nextval_internal
> +   18.08%  0.84%  postgres  postgres  [.] ReadBufferExtended
> +   17.24%  2.03%  postgres  postgres  [.] ReadBuffer_common
> +   12.57%  0.59%  postgres  postgres  [.] _bt_insertonpg
> +   11.12%  1.63%  postgres  postgres  [.] XLogInsert
> +    9.90%  0.10%  postgres  postgres  [.] _bt_relandgetbuf
> +    8.97%  1.16%  postgres  postgres  [.] LWLockAcquire
> +    8.42%  2.03%  postgres  postgres  [.] XLogInsertRecord
> +    7.26%  1.01%  postgres  postgres  [.] _bt_binsrch
> +    7.07%  1.20%  postgres  postgres  [.] RelationGetBufferForTuple
> +    6.27%  4.92%  postgres  postgres  [.] _bt_compare
> +    5.97%  0.63%  postgres  postgres  [.] read_seq_tuple.isra.3
> +    5.70%  4.89%  postgres  postgres  [.] hash_search_with_hash_value
> +    5.44%  5.44%  postgres  postgres  [.]
Re: Faster inserts with mostly-monotonically increasing values
On Wed, Mar 14, 2018 at 10:58 AM, Claudio Freire wrote:
>
> I'm thinking there could be contention on some lock somewhere.
>
> Can you attach the benchmark script you're using so I can try to reproduce it?

I am using a very ad-hoc script.. But here it is.. It assumes a presence of a branch named "btree_rightmost" with the patched code. You will need to make necessary adjustments of course.

Thanks,
Pavan

--
Pavan Deolasee http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

start_options="-c shared_buffers=16GB -c max_wal_size=64GB -c min_wal_size=16GB -c checkpoint_timeout=60min -c bgwriter_delay=100 -c bgwriter_lru_maxpages=100 -c bgwriter_lru_multiplier=3 -c bgwriter_flush_after=256 -c checkpoint_completion_target=0.9"
PGDATA=/data/pgsql

echo "INSERT INTO testtab(b) SELECT generate_series(1,10);" > s1.sql

for b in master btree_rightmost; do
    git checkout $b > /dev/null 2>&1
    make -s clean > /dev/null 2>&1
    ./configure --prefix=$HOME/pg-install/$b
    make -s -j8 install > /dev/null 2>&1
    export PATH=$HOME/pg-install/$b/bin:$PATH

    for c in 1 2 4 8; do
        pg_ctl -D $PGDATA -w stop
        rm -rf $PGDATA
        initdb -D $PGDATA
        pg_ctl -D $PGDATA -o "$start_options" -w start -l logfile.$b
        psql -c "CREATE TABLE testtab (a bigserial UNIQUE, b bigint);" postgres
        echo "$b $c" >> results.txt
        num_txns_per_client=$((1024 / $c))
        time pgbench -n -l -c $c -j 1 -t $num_txns_per_client -f s1.sql postgres >> results.txt
    done
done
Re: add queryEnv to ExplainOneQuery_hook
Michael Paquier writes:
> On Wed, Mar 14, 2018 at 05:36:26PM +1300, Thomas Munro wrote:
>> Hmm. I suppose we could have invented a new extended hook with a
>> different name and back-patched it so that PG10 would support both.
>> Then binary compatibility with existing compiled extensions wouldn't
>> be affected AFAICS, but you could use the new extended hook in (say)
>> 10.4 or higher. Then for PG11 (or later) we could remove the old hook
>> and just keep the new one. I suppose that option is still technically
>> open to us, though I'm not sure of the committers' appetite for messing
>> with back branches like that.

> The interactions between both hooks would not be difficult to define: if
> the original hook is not defined, just do not trigger the second. Still
> that's too late for v10, so I would rather let it go. New features are
> not backpatched.

Yeah. There would be no good way for a v10 extension to know whether the additional hook is available --- it would have to know that at compile time, and it can't --- so I don't see that this is practical.

Ideally we'd have noticed the problem before v10 got out ... so my own takeaway here is that this is a reminder to extension authors that they ought to test their stuff during the beta period.

regards, tom lane
Re: Ambigous Plan - Larger Table on Hash Side
On Tue, Mar 13, 2018 at 4:32 PM, Narendra Pradeep U U wrote:
> Hi,
> Thanks everyone for your suggestions. I would like to add the explain
> analyze of both plans so that we can have a broader picture.
>
> I have a work_mem of 1000 MB.
>
> The plan which we get regularly, with the table being analyzed:
>
> tpch=# explain analyze select b from tab2 left join tab1 on a = b;
>                                  QUERY PLAN
> ----------------------------------------------------------------------
>  Hash Left Join (cost=945515.68..1071064.34 rows=78264 width=4)
>    (actual time=9439.410..20445.620 rows=78264 loops=1)
>    Hash Cond: (tab2.b = tab1.a)
>    -> Seq Scan on tab2 (cost=0.00..1129.64 rows=78264 width=4)
>       (actual time=0.006..5.116 rows=78264 loops=1)
>    -> Hash (cost=442374.30..442374.30 rows=30667630 width=4)
>       (actual time=9133.593..9133.593 rows=30667722 loops=1)
>          Buckets: 33554432 Batches: 2 Memory Usage: 801126kB
>          -> Seq Scan on tab1 (cost=0.00..442374.30 rows=30667630 width=4)
>             (actual time=0.030..3584.652 rows=30667722 loops=1)
>  Planning time: 0.055 ms
>  Execution time: 20472.603 ms
> (8 rows)
>
> I reproduced the other plan by not analyzing the smaller table.
>
> tpch=# explain analyze select b from tab2 left join tab1 on a = b;
>                                  QUERY PLAN
> ----------------------------------------------------------------------
>  Hash Right Join (cost=2102.88..905274.97 rows=78039 width=4)
>    (actual time=15.331..7590.406 rows=78264 loops=1)
>    Hash Cond: (tab1.a = tab2.b)
>    -> Seq Scan on tab1 (cost=0.00..442375.48 rows=30667748 width=4)
>       (actual time=0.046..2697.480 rows=30667722 loops=1)
>    -> Hash (cost=1127.39..1127.39 rows=78039 width=4)
>       (actual time=15.133..15.133 rows=78264 loops=1)
>          Buckets: 131072 Batches: 1 Memory Usage: 3776kB
>          -> Seq Scan on tab2 (cost=0.00..1127.39 rows=78039 width=4)
>             (actual time=0.009..5.516 rows=78264 loops=1)
>  Planning time: 0.053 ms
>  Execution time: 7592.688 ms
> (8 rows)

I am surprised to see the estimates so close to the actual values even without analysing the small table.

> The actual plan seems to be slower.
> The smaller table (tab2) has exactly each row duplicated 8 times and all
> the rows in the larger table (tab1) are distinct. What may be the exact
> reason, and can we fix this?

After analysing the small table, the first plan is chosen as the cheapest. This means that the plan with the smaller table being hashed has a higher estimated cost than the plan with the larger table being hashed. We need to examine that costing to see what went wrong.

--
Best Wishes,
Ashutosh Bapat
EnterpriseDB Corporation
The Postgres Database Company
Re: Changing the autovacuum launcher scheduling; oldest table first algorithm
On Tue, Mar 6, 2018 at 11:27 PM, Alvaro Herrera wrote:
> Hello
>
> I haven't read your respective patches yet, but both these threads
> brought to memory a patch I proposed a few years ago that I never
> completed:
>
> https://www.postgresql.org/message-id/flat/20130124215715.GE4528%40alvh.no-ip.org

Thank you for sharing the thread.

> In that thread I posted a patch to implement a prioritisation scheme for
> autovacuum, based on an equation which was still under discussion when
> I abandoned it. Chris Browne proposed a crazy equation to mix in both
> XID age and fraction of dead tuples; probably that idea is worth
> studying further. I tried to implement that in my patch but I probably
> didn't do it correctly (because, as I recall, it failed to work as
> expected). Nowadays I think we would also consider the multixact freeze
> age, too.
>
> Maybe that's worth giving a quick look in case some of the ideas there
> are useful for the patches now being proposed.

Yeah, that's definitely useful for the patches. I'll look at this and discuss it here.

Regards,

--
Masahiko Sawada
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center
Re: planner bug regarding lateral and subquery?
Greetings,

* Tatsuro Yamada (yamada.tats...@lab.ntt.co.jp) wrote:
> I found a bug, maybe.

I don't think so...

> * Result of Select: failed
>
> # select
> subq_1.c0
> from
> test as ref_0,
> lateral (select subq_0.c0 as c0
>          from
>            (select ref_0.c2 as c0,
>                    (select c1 from test) as c1 from test as ref_1
>             where (select c3 from test) is NULL) as subq_0
>          right join test as ref_2
>          on (subq_0.c1 = ref_2.c1)) as subq_1;
>
> ERROR: more than one row returned by a subquery used as an expression

You don't need LATERAL or anything complicated to reach that error; simply do:

=*> select * from test where (select c1 from test) is null;
ERROR: more than one row returned by a subquery used as an expression

The problem there is that the WHERE clause is trying to evaluate an expression, which is "(select c1 from test) is null", and you aren't allowed to have multiple rows returned from that subquery (otherwise, how would we know which row to compare in the expression..?).

If you're actually intending to refer to the 'c3' column from test through the lateral join, you would just refer to it as 'ref_0.c3', as you do in another part of that query.

Thanks!

Stephen
Re: remove pg_class.relhaspkey
On Sat, Mar 10, 2018 at 01:52:56PM +0100, Tomas Vondra wrote:
> I agree with this sentiment - I don't think those flags are particularly
> helpful for client applications, and would vote +1 for removal.

OK, so I can see that we are moving to a consensus here.

> For the other flags we would probably need to test what impact it would
> have (e.g. table with no indexes, many indexes on other tables, and
> something calling get_relation_info a lot). But this patch proposes to
> remove only relhaspkey.

Yes, you are right here. Peter, would you do that within this commit fest or not? As we are half-way through the commit fest, we could also cut the apple in half and just remove relhaspkey for now, as that's a no-brainer. So I would suggest to just do the latter and consider this patch as done. Attached is a rebased patch; there were some conflicts in pg_class.h, by the way.

--
Michael

From 1601b84f9ef2e8d0467b109c0b1d3df435edb59a Mon Sep 17 00:00:00 2001
From: Michael Paquier
Date: Wed, 14 Mar 2018 14:46:43 +0900
Subject: [PATCH] Remove pg_class.relhaspkey

It's not used for anything internally, and it can't be relied on for external uses, so it can just be removed.

Patch by Peter Eisentraut.
--- doc/src/sgml/catalogs.sgml | 9 - src/backend/catalog/heap.c | 1 - src/backend/catalog/index.c | 32 ++- src/backend/commands/vacuum.c | 10 -- src/backend/rewrite/rewriteDefine.c | 1 - src/include/catalog/pg_class.h | 38 ++--- 6 files changed, 20 insertions(+), 71 deletions(-) diff --git a/doc/src/sgml/catalogs.sgml b/doc/src/sgml/catalogs.sgml index a0e6d7062b..30e6741305 100644 --- a/doc/src/sgml/catalogs.sgml +++ b/doc/src/sgml/catalogs.sgml @@ -1848,15 +1848,6 @@ SCRAM-SHA-256$iteration count: - - relhaspkey - bool - - - True if the table has (or once had) a primary key - - - relhasrules bool diff --git a/src/backend/catalog/heap.c b/src/backend/catalog/heap.c index cf36ce4add..3d80ff9e5b 100644 --- a/src/backend/catalog/heap.c +++ b/src/backend/catalog/heap.c @@ -798,7 +798,6 @@ InsertPgClassTuple(Relation pg_class_desc, values[Anum_pg_class_relnatts - 1] = Int16GetDatum(rd_rel->relnatts); values[Anum_pg_class_relchecks - 1] = Int16GetDatum(rd_rel->relchecks); values[Anum_pg_class_relhasoids - 1] = BoolGetDatum(rd_rel->relhasoids); - values[Anum_pg_class_relhaspkey - 1] = BoolGetDatum(rd_rel->relhaspkey); values[Anum_pg_class_relhasrules - 1] = BoolGetDatum(rd_rel->relhasrules); values[Anum_pg_class_relhastriggers - 1] = BoolGetDatum(rd_rel->relhastriggers); values[Anum_pg_class_relrowsecurity - 1] = BoolGetDatum(rd_rel->relrowsecurity); diff --git a/src/backend/catalog/index.c b/src/backend/catalog/index.c index 431bc31969..9e2dd0e729 100644 --- a/src/backend/catalog/index.c +++ b/src/backend/catalog/index.c @@ -125,7 +125,7 @@ static void UpdateIndexRelation(Oid indexoid, Oid heapoid, bool isvalid, bool isready); static void index_update_stats(Relation rel, - bool hasindex, bool isprimary, + bool hasindex, double reltuples); static void IndexCheckExclusion(Relation heapRelation, Relation indexRelation, @@ -1162,7 +1162,6 @@ index_create(Relation heapRelation, */ index_update_stats(heapRelation, true, - isprimary, -1.0); /* Make the above update visible */ 
CommandCounterIncrement(); @@ -1364,21 +1363,6 @@ index_constraint_create(Relation heapRelation, InvalidOid, conOid, indexRelationId, true); } - /* - * If needed, mark the table as having a primary key. We assume it can't - * have been so marked already, so no need to clear the flag in the other - * case. - * - * Note: this might better be done by callers. We do it here to avoid - * exposing index_update_stats() globally, but that wouldn't be necessary - * if relhaspkey went away. - */ - if (mark_as_primary) - index_update_stats(heapRelation, - true, - true, - -1.0); - /* * If needed, mark the index as primary and/or deferred in pg_index. * @@ -2041,7 +2025,6 @@ FormIndexDatum(IndexInfo *indexInfo, * to ensure we can do all the necessary work in just one update. * * hasindex: set relhasindex to this value - * isprimary: if true, set relhaspkey true; else no change * reltuples: if >= 0, set reltuples to this value; else no change * * If reltuples >= 0, relpages and relallvisible are also updated (using @@ -2058,7 +2041,6 @@ FormIndexDatum(IndexInfo *indexInfo, static void index_update_stats(Relation rel, bool hasindex, - bool isprimary, double reltuples) { Oid relid = RelationGetRelid(rel); @@ -2088,7 +2070,7 @@ index_update_stats(Relation rel, * It is safe to use a non-transactional update even though our * transaction could still fail
Re: Fixes for missing schema qualifications
On Tue, Mar 13, 2018 at 07:30:23PM -0700, David G. Johnston wrote:
> On Tue, Mar 13, 2018 at 6:50 PM, Michael Paquier wrote:
>> To simplify users' lives, we
>> could also recommend that users just issue an ALTER FUNCTION SET
>> search_path to fix the problem for all functions; that's easier to
>> digest.
>
> I'm unclear as to what scope you are suggesting the above advice (and
> option #1) apply to. All pg_catalog/information_schema functions, or all
> functions including those created by users?

I am suggesting that in order to keep the release notes of the next minor versions simple. We would still patch information_schema.sql internally with the changes based on operator(pg_catalog.foo) for all functions, so that new deployments get the right call.

--
Michael
Re: add queryEnv to ExplainOneQuery_hook
On Wed, Mar 14, 2018 at 2:57 PM, Jim Finnerty wrote:
> Passing NULL in place of queryEnv in PG10 causes a failure in the
> installcheck tests portals and plpgsql.
>
> For PG10, if you want both an ExplainOneQuery hook and a clean run of
> installcheck, is there a better workaround than to (a) pass NULL in place
> of queryEnv, and (b) comment out the portals and plpgsql tests?

Hi Jim,

I can't think of a good way right now. It's unfortunate that we couldn't back-patch 4d41b2e0 because 10 was out the door; but perhaps you can?

Hmm. I suppose we could have invented a new extended hook with a different name and back-patched it so that PG10 would support both. Then binary compatibility with existing compiled extensions wouldn't be affected AFAICS, but you could use the new extended hook in (say) 10.4 or higher. Then for PG11 (or later) we could remove the old hook and just keep the new one. I suppose that option is still technically open to us, though I'm not sure of the committers' appetite for messing with back branches like that.

--
Thomas Munro http://www.enterprisedb.com
Re: planner bug regarding lateral and subquery?
Hi Stephen,

On 2018/03/14 12:36, Stephen Frost wrote:
> Greetings,
>
> * Tatsuro Yamada (yamada.tats...@lab.ntt.co.jp) wrote:
>> I found a bug, maybe.
>
> I don't think so...
>
>> * Result of Select: failed
>>
>> # select
>> subq_1.c0
>> from
>> test as ref_0,
>> lateral (select subq_0.c0 as c0
>>          from
>>            (select ref_0.c2 as c0,
>>                    (select c1 from test) as c1 from test as ref_1
>>             where (select c3 from test) is NULL) as subq_0
>>          right join test as ref_2
>>          on (subq_0.c1 = ref_2.c1)) as subq_1;
>>
>> ERROR: more than one row returned by a subquery used as an expression
>
> You don't need LATERAL or anything complicated to reach that error; simply do:
>
> =*> select * from test where (select c1 from test) is null;
> ERROR: more than one row returned by a subquery used as an expression
>
> The problem there is that the WHERE clause is trying to evaluate an
> expression, which is "(select c1 from test) is null", and you aren't
> allowed to have multiple rows returned from that subquery (otherwise, how
> would we know which row to compare in the expression..?).
>
> If you're actually intending to refer to the 'c3' column from test through
> the lateral join, you would just refer to it as 'ref_0.c3', as you do in
> another part of that query.

Thanks for your reply. The query is not useful for me; it's just a test query for the planner, generated by sqlsmith. :)

My question is: was it possible to handle the error earlier than the executor phase? I expected it to be checked in the parsing or planning phase.

Thanks,
Tatsuro Yamada
Re: PATCH: Configurable file mode mask
On Tue, Mar 13, 2018 at 01:28:17PM -0400, Tom Lane wrote:
> David Steele writes:
>> On 3/12/18 3:28 AM, Michael Paquier wrote:
>>> In pg_rewind and pg_resetwal, isn't that also a portion which is not
>>> necessary without the group access feature?
>
>> These seem like a good idea to me with or without patch 03. Some of our
>> front-end tools (initdb, pg_upgrade) were setting umask and others
>> weren't. I think it's more consistent (and safer) if they all do, at
>> least if they are writing into PGDATA.
>
> +1 ... see a926eb84e for an example of how easy it is to screw up if
> the process's overall umask is permissive.

Okay. A suggestion that I have here would be to split those extra calls into a separate patch. That's a useful self-contained improvement.

--
Michael
Re: PATCH: Configurable file mode mask
On Tue, Mar 13, 2018 at 12:19:07PM -0400, David Steele wrote:
> I'll attach new patches in a reply to [1] once I have made the changes
> Tom requested.

Cool, thanks for your patience. Looking forward to seeing those. I'll spend time on 0003 at the same time. Let's also put the additional umask calls for pg_rewind and pg_resetwal into a separate patch.

--
Michael