Re: [HACKERS] patch: function xmltable
2017-01-24 21:38 GMT+01:00 Alvaro Herrera:

> Pavel Stehule wrote:
> > * SELECT (xmltable(..)).* + regress tests
> > * compilation and regress tests without --with-libxml
>
> Thanks. I just realized that this is doing more work than necessary --

?? I don't understand?

> I think it would be simpler to have tableexpr fill a tuplestore with the
> results, instead of just expecting function execution to apply
> ExecEvalExpr over and over to obtain the results. So evaluating a
> tableexpr returns just the tuplestore, which function evaluation can
> return as-is. That code doesn't use the value-per-call interface
> anyway.

ok

> I also realized that the expr context callback is not called if there's
> an error, which leaves us without shutting down libxml properly. I
> added PG_TRY around the fetchrow calls, but I'm not sure that's correct
> either, because there could be an error raised in other parts of the
> code, after we've already emitted a few rows (for example out of
> memory). I think the right way is to have PG_TRY around the execution
> of the whole thing rather than just row at a time; and the tuplestore
> mechanism helps us with that.

ok.

> I think it would be good to have a more complex test case in regress --
> let's say there is a table with some simple XML values, then we use
> XMLFOREST (or maybe one of the table_to_xml functions) to generate a
> large document, and then XMLTABLE uses that document as input document.
>
> Please fix.
>
> --
> Álvaro Herrera   https://www.2ndQuadrant.com/
> PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Re: [HACKERS] pgbench more operators & functions
Hello Tom,

> I concur that this is expanding pgbench's expression language well beyond
> what anybody has shown a need for.

As for the motivation, I'm assuming that pgbench should provide features necessary to implement benchmarks, so I'm adding operators that appear in standard benchmark specifications.

From TPC-B 2.0.0 section 5.3.5:

| The Account_ID is generated as follows:
| • A random number X is generated within [0,1]
| • If X < 0.85 or branches = 1, a random Account_ID is selected over all
|   accounts.
| • If X >= 0.85 and branches > 1, a random Account_ID is selected over all
|   non- accounts.

The above uses a condition (If), logic (or, and), and comparisons (=, <, >=).

From TPC-C 5.11 section 2.1.6, a bitwise-or operator is used to skew a distribution:

| NURand(A, x, y) = (((random(0, A) | random(x, y)) + C) % (y - x + 1)) + x

And from section 5.2.5.4 of the same, some time is computed based on a logarithm:

| Tt = -log(r) * µ

> I'm also concerned that there's an opportunity cost here, in that the
> patch establishes a precedence ordering for its new operators, which we'd
> have a hard time changing later. That's significant because the proposed
> precedence is different from what you'd get for similarly-named operators
> on the backend side. I realize that it's trying to follow the C standard
> instead,

Oops. I just looked at the precedence from a C parser, without realizing that precedence there was different from the postgres SQL implementation :-( This is a bug on my part.

> I'd be okay with the parts of this that duplicate existing backend syntax
> and semantics, but I don't especially want to invent further than that.

Okay. In the two latest versions sent, discrepancies from that were bugs; I was trying to conform.

Version 8 attached hopefully fixes the precedence issue raised above:
- use precedence taken from "backend/gram.y" instead of C. I'm not sure that it is wise that pg has C-like operators with a different precedence, but this is probably much too late...
And fixes the documentation:
- remove the documentation for the no-longer-existing "if" function, whose leftover presence made Robert assume that I had not taken the hint to remove it. I had!
- reorder the operator documentation by their pg SQL precedence.

--
Fabien.

diff --git a/doc/src/sgml/ref/pgbench.sgml b/doc/src/sgml/ref/pgbench.sgml
index 1eee8dc..73101e1 100644
--- a/doc/src/sgml/ref/pgbench.sgml
+++ b/doc/src/sgml/ref/pgbench.sgml
@@ -828,11 +828,11 @@ pgbench options dbname
    The expression may contain integer constants such as 5432,
    double constants such as 3.14159,
    references to variables :variablename,
-   unary operators (+, -) and binary operators
-   (+, -, *, /, %) with their usual precedence and associativity,
-   function calls, and
-   parentheses.
+   operators with their usual precedence and associativity,
+   function calls,
+   SQL CASE generic conditional expressions
+   and parentheses.
@@ -917,6 +917,165 @@ pgbench options dbname

Built-In Operators

The arithmetic, bitwise, comparison and logical operators listed below are built into pgbench and may be used in expressions appearing in \set.

pgbench operators by increasing precedence:

  Operator | Description                              | Example  | Result
  ---------+------------------------------------------+----------+-------
  OR       | logical or                               | 5 or 0   | 1
  AND      | logical and                              | 3 and 0  | 0
  NOT      | logical not                              | not 0    | 1
  =        | is equal                                 | 5 = 4    | 0
  <>       | is not equal                             | 5 <> 4   | 1
  !=       | is not equal                             | 5 != 5   | 0
  <        | lower than                               | 5 < 4    | 0
  <=       | lower or equal                           | 5 <= 4   | 0
  >        | greater than                             | 5 > 4    | 1
  >=       | greater or equal                         | 5 >= 4   | 1
  |        | integer bitwise OR                       | 1 | 2    | 3
  #        | integer bitwise XOR                      | 1 # 3    | 2
  &        | integer bitwise AND                      | 1 & 3    | 1
  ~        | integer bitwise NOT                      | ~ 1      | -2
  <<       | bitwise shift left                       | 1 << 2   | 4
  >>       | bitwise shift right                      | 8 >> 2   | 2
  +        | addition                                 | 5 + 4    | 9
  -        | subtraction                              | 3 - 2.0  | 1.0
  *        | multiplication                           | 5 * 4    | 20
  /        | division (integer truncates the results) | 5 / 3    | 1
  %        | modulo                                   | 3 % 2    | 1
  -        | opposite (unary minus)                   | - 2.0    | -2.0
Re: [HACKERS] lseek/read/write overhead becomes visible at scale ..
Hi Alvaro,

On 24.01.2017 at 19:36, Alvaro Herrera wrote:
> Tobias Oberstein wrote:
>> I am benchmarking IOPS, and while doing so, it becomes apparent that at
>> these scales it does matter _how_ IO is done. The most efficient way is
>> libaio. I get 9.7 million/sec IOPS with low CPU load. Using any
>> synchronous IO engine is slower and produces higher load. I do
>> understand that switching to libaio isn't going to fly for PG
>> (completely different approach).
>
> Maybe it is possible to write a new f_smgr implementation (parallel to
> md.c) that uses libaio. There is no "seek" in that interface, at least,
> though the interface does assume that the implementation is blocking.

FWIW, I now systematically compared the IO performance when normalized for the system load induced, over different IO methods. I use the FIO ioengine terminology:

sync = lseek/read/write
psync = pread/pwrite

Here: https://github.com/oberstet/scratchbox/raw/master/cruncher/engines-compared/normalized-iops.pdf

Conclusion:
- psync has 1.15x the normalized IOPS compared to sync
- libaio has up to 6.5x the normalized IOPS compared to sync

---

These measurements were done on 16 NVMe block devices. As mentioned, when Linux MD comes into the game, the difference between sync and psync is much higher: there is a lock contention in MD. The reason for that is: when MD comes into the game, even our massive CPU cannot hide the inefficiency of the double syscalls anymore. This MD issue is our bigger problem (compared to PG using sync/psync). I am going to post to the linux-raid list about that, as advised by the FIO developers.

---

That being said, regarding getting maximum performance out of NVMes with minimal system load, the real deal probably isn't libaio either, but kernel bypass (hinted to me by the FIO devs): http://www.spdk.io/

FIO has a plugin for SPDK, which I am going to explore to establish a final conclusive baseline for maximum IOPS normalized for load.
There are similar approaches in networking (BSD netmap, DPDK) to bypass the kernel altogether (zero copy to userland, no interrupts but polling etc). With hardware like this (NVMe, 100GbE etc), the kernel gets in the way .. Anyway, this is now probably OT as for PG;) Cheers, /Tobias -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] lseek/read/write overhead becomes visible at scale ..
Hi Andres,

>> Using pread instead of lseek+read halves the syscalls. I really don't
>> understand what you are fighting here ..
>
> Sure, there's some overhead. And as I said upthread, I'm much less
> against this change than Tom. What I'm saying is that your benchmarks
> haven't shown a benefit in a meaningful way, so I don't think I can agree
> with "Well, my point remains that I see little value in messing with
> long-established code if you can't demonstrate a benefit that's clearly
> above the noise level."

I have done lots of benchmarking over the last days on a massive box, and I can provide numbers that I think show that the impact can be significant.

> since you've not actually shown that the impact is above the noise level
> when measured with an actual postgres workload.

I can follow that. So real proof cannot be done with FIO, but only with an "actual PG workload". Synthetic PG workload or real-world production workload?

Also: regarding the perf profiles from production that show lseek as the #1 syscall. You said that wouldn't be proof either, because it only shows the number of syscalls, and though it is clear that millions of syscalls/sec do come with overhead, it still doesn't show "above noise" level relevance (because PG is such a CPU hog in itself anyways ;)

So how would I do a perf profile that would be acceptable as proof? Maybe I can expand our https://gist.github.com/oberstet/ca03d7ab49be4c8edb70ffa1a9fe160c profiling function.

Cheers,
/Tobias
Re: [HACKERS] pgbench more operators & functions
>> As it stands right now you haven't provided enough context for this
>> patch, and only the social difficulty of actually marking a patch
>> rejected has prevented its demise in its current form: while it has
>> interesting ideas, it adds a maintenance burden for -core without any
>> in-core usage. But if you make it the first patch in a 3-patch series
>> that implements the per-spec tpc-b, the discussion moves away from these
>> support functions and into the broader framework in which they are made
>> useful.
>
> I think Fabien already did post something of the sort, or at least
> discussion towards it,

Yep.

> and there was immediately objection as to whether his idea of TPC-B
> compliance was actually right. I remember complaining that he had a
> totally artificial idea of what "fetching a data value" requires.

Yep.

I think that the key misunderstanding is that you are honest and assume that other people are honest too. This is naïve: there is a long history of vendors creatively "cheating" to get better-than-deserved benchmark results. Benchmark specifications try to prevent such behaviors by laying out careful requirements and procedures.

In this instance, you "know" that when pg has returned the result of the query the data is actually on the client side, so you considered it fetched. That is fine for you, but from a benchmarking perspective with external auditors your belief is not good enough. For instance, the vendor could implement a new version of the protocol where the data are only transferred on demand, and the result just tells that the data is indeed somewhere on the server (e.g. on "SELECT abalance" it could just check that the key exists; no need to actually fetch the data from the table, so no need to read the table, the index is enough...). That would be pretty stupid for real application performance, but the benchmark could get better tps by doing so.
Without even intentionally cheating, this could be part of a useful "streaming mode" protocol option which would make sense for very large results but might also be activated for a small result. Another point is that decoding the message may be a little expensive, so that by not actually extracting the data into the client but just keeping it in the connection/OS one gets better performance.

Thus, the TPC-B 2.0.0 benchmark specification says:

"1.3.2 Each transaction shall return to the driver the Account_Balance resulting from successful commit of the transaction.
Comment: It is the intent of this clause that the account balance in the database be returned to the driver, i.e., that the application retrieve the account balance."

For me the correct interpretation of "the APPLICATION retrieve the account balance" is that the client application code, pgbench in this context, did indeed get the value from the vendor code, here "libpq" which is handling the connection.

Having the value discarded from libpq by calling PQclear instead of PQntuples/PQgetvalue/... skips a key part of the client code that no real application would skip. This looks strange and is not representative of real client code: as a potential auditor, because of this I would not check the corresponding item in the audit check list:

"11.3.1.2 Verify that transaction inputs and outputs satisfy Clause 1.3."

So the benchmark implementation would not be validated.

Another trivial reason to be able to actually retrieve data is that for benchmarking purposes it is very easy to want to test a scenario where you do different things based on the data received, which implies that the data can be manipulated somehow on the benchmarking client side, which is currently not possible.

--
Fabien.
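For illustration, client code satisfying that reading of clause 1.3.2 would look roughly like this in libpq: the application actually extracts Account_Balance from the result before clearing it, rather than calling PQclear alone. This is only a sketch; the connection string and aid value are placeholders, and it obviously needs a running server with the pgbench tables:

```c
#include <stdio.h>
#include <stdlib.h>
#include <libpq-fe.h>

int main(void)
{
    /* placeholder connection string */
    PGconn *conn = PQconnectdb("dbname=pgbench");
    if (PQstatus(conn) != CONNECTION_OK)
    {
        fprintf(stderr, "connection failed: %s", PQerrorMessage(conn));
        PQfinish(conn);
        return 1;
    }

    PGresult *res = PQexec(conn,
        "SELECT abalance FROM pgbench_accounts WHERE aid = 1");
    if (PQresultStatus(res) == PGRES_TUPLES_OK && PQntuples(res) == 1)
    {
        /* the application-side retrieval the spec asks for: the value
         * crosses from libpq into application code before PQclear */
        long abalance = atol(PQgetvalue(res, 0, 0));
        printf("abalance = %ld\n", abalance);
    }
    PQclear(res);
    PQfinish(conn);
    return 0;
}
```

The difference at issue is precisely the PQgetvalue call: drop it and only libpq ever touched the balance, which is what the auditing argument above objects to.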
Re: [HACKERS] PATCH: recursive json_populate_record()
Nikita Glukhov writes:
> On 25.01.2017 23:58, Tom Lane wrote:
>> I think you need to take a second look at the code you're producing
>> and realize that it's not so clean either. This extract from
>> populate_record_field, for example, is pretty horrid:
>
> But what if we introduce some helper macros like this:
>
> #define JsValueIsNull(jsv) \
>     ((jsv)->is_json ? !(jsv)->val.json.str \
>         : !(jsv)->val.jsonb || (jsv)->val.jsonb->type == jbvNull)
>
> #define JsValueIsString(jsv) \
>     ((jsv)->is_json ? (jsv)->val.json.type == JSON_TOKEN_STRING \
>         : (jsv)->val.jsonb && (jsv)->val.jsonb->type == jbvString)

Yeah, I was wondering about that too. I'm not sure that you can make a reasonable set of helper macros that will fix this, but if you want to try, go for it.

BTW, just as a stylistic thing, I find "a?b:c||d" unreadable: I have to go back to the manual every darn time to convince myself whether that means (a?b:c)||d or a?b:(c||d). It's better not to rely on the reader (... or the author) having memorized C's precedence rules in quite that much detail. Extra parens help.

regards, tom lane
Re: [HACKERS] patch: function xmltable
2017-01-25 23:33 GMT+01:00 Andres Freund:
> On 2017-01-25 22:51:37 +0100, Pavel Stehule wrote:
>> 2017-01-25 22:40 GMT+01:00 Andres Freund:
>>>> I'm afraid that if I cannot reuse the SRF infrastructure, I will have
>>>> to partially reimplement it :( - mainly for usage in "ROWS FROM ()"
>>
>> The TableExpr implementation is based on SRF now. You and Alvaro propose
>> an independent implementation, like a generic executor node. I am
>> sceptical that FunctionScan supports reading from a generic executor
>> node.
>
> Why would it need to?

Simply, for consistency with any other function that can return rows.

Maybe I don't understand Alvaro's proposal well. I have a XMLTABLE function, a TableExpr, that looks like a SRF function and behaves similarly (returns multiple rows), but it should be implemented significantly differently and should have different limits (should not be usable here and there...). It is hard for me to see the consistency there.

Regards
Pavel
Re: [HACKERS] Checksums by default?
Robert,

* Robert Haas (robertmh...@gmail.com) wrote:
> On Wed, Jan 25, 2017 at 2:23 PM, Stephen Frost wrote:
>> Yet, our default is to have them disabled and *really* hard to enable.
>
> First of all, that could be fixed by further development.

I'm certainly all for doing so, but I don't agree that it necessarily is required before we flip the default. That said, if the way to get checksums enabled by default is providing a relatively easy way to turn them off, then that's something which I'll do what I can to help work towards. In other words, I'm not going to continue to argue, given the various opinions of the group, that we should just flip it tomorrow. I hope to discuss it further after we have the ability to turn it off easily.

> Second, really hard to enable is a relative term. I accept that
> enabling checksums is not a pleasant process. Right now, you'd have
> to do a dump/restore, or use logical replication to replicate the data
> to a new cluster and then switch over. On the other hand, if
> checksums are really a critical feature, how are people getting to the
> point where they've got a mission-critical production system and only
> then discovering that they want to enable checksums?

I truly do wish everyone would come talk to me before building out a database. Perhaps that's been your experience, in which case, I envy you, but I tend to get a reaction more along the lines of "wait, what do you mean I had to pass some option to initdb to enable checksums?!?!". The fact that we've got a WAL implementation and clearly understand fsync requirements, why full page writes make sense, and that our WAL has its own CRCs which aren't possible to disable, tends to lead people to think we really know what we're doing and that we care a lot about their data.
> > I agree that it's unfortunate that we haven't put more effort into > > fixing that- I'm all for it, but it's disappointing to see that people > > are not in favor of changing the default as I believe it would both help > > our users and encourage more development of the feature. > > I think it would help some users and hurt others. I do agree that it > would encourage more development of the feature -- almost of > necessity. In particular, I bet it would spur development of an > efficient way of turning checksums off -- but I'd rather see us > approach it from the other direction: let's develop an efficient way > of turning the feature on and off FIRST. Deciding that the feature > has to be on for everyone because turning it on later is too hard for > the people who later decide they want it is letting the tail wag the > dog. As I have said, I don't believe it has to be on for everyone. > Also, I think that one of the big problems with the way checksums work > is that you don't find problems with your archived data until it's too > late. Suppose that in February bits get flipped in a block. You > don't access the data until July[1]. Well, it's nice to have the > system tell you that the data is corrupted, but what are you going to > do about it? By that point, all of your backups are probably > corrupted. So it's basically: If your backup system is checking the checksums when backing up PG, which I think every backup system *should* be doing, then guess what? You've got a backup which you can go back to immediately, possibly with the ability to restore all of the data from WAL. That won't always be the case, naturally, but it's a much better position than simply having a system which continues to degrade until you've actually reached the "you're screwed" level because PG will no longer read a page or perhaps can't even start up, *and* you no longer have any backups. As it is, there are backup solutions which *do* check the checksum when backing up PG. 
This is no longer, thankfully, some hypothetical thing, but something which really exists and will hopefully keep users from losing data. > It's nice to know that (maybe?) but without a recovery strategy a > whole lot of people who get that message are going to immediately > start asking "How do I ignore the fact that I'm screwed and try to > read the data anyway?". And we have options for that. > And then you wonder what the point of having > the feature turned on is, especially if it's costly. It's almost an > attractive nuisance at that point - nobody wants to be the user that > turns off checksums because they sound good on paper, but when you > actually have a problem an awful lot of people are NOT going to want > to try to restore from backup and maybe lose recent transactions. > They're going to want to ignore the checksum failures. That's kind of > awful. Presently, last I checked at least, the database system doesn't fall over and die if a single page's checksum fails. I agree entirely that we want the system to fail gracefully (unless the user instructs us otherwise, perhaps because they have a redundant system that they can flip
Re: [HACKERS] Assignment of valid collation for SET operations on queries with UNKNOWN types.
Ashutosh Bapat writes:
> On Wed, Jan 25, 2017 at 10:54 AM, Michael Paquier wrote:
>> On Wed, Jan 25, 2017 at 12:46 AM, Tom Lane wrote:
>>> Here's an updated patch with doc changes. Aside from that one,
>>> I tried to spell "pseudo-type" consistently, and I changed one
>>> place where we were calling something a pseudo-type that isn't.
>
> I think those changes, even though small, deserve their own commit.
> The changes themselves look good.

Pushed, thanks for the reviews!

regards, tom lane
Re: [HACKERS] Speedup twophase transactions
> We are talking about the recovery/promote code path. Specifically this > call to KnownPreparedRecreateFiles() in PrescanPreparedTransactions(). > > We write the files to disk and they get immediately read up in the > following code. We could not write the files to disk and read > KnownPreparedList in the code path that follows as well as elsewhere. Thinking more on this. The only optimization that's really remaining is handling of prepared transactions that have not been committed or will linger around for long. The short lived 2PC transactions have been optimized already via this patch. The question remains whether saving off a few fsyncs/reads for these long-lived prepared transactions is worth the additional code churn. Even if we add code to go through the KnownPreparedList, we still will have to go through the other on-disk 2PC transactions anyways. So, maybe not. Regards, Nikhils > > Regards, > Nikhils > > >>> The difference between those two is likely noise. >>> >>> By the way, in those measurements, the OS cache is still filled with >>> the past WAL segments, which is a rather best case, no? What happens >>> if you do the same kind of tests on a box where memory is busy doing >>> something else and replayed WAL segments get evicted from the OS cache >>> more aggressively once the startup process switches to a new segment? >>> This could be tested for example on a VM with few memory (say 386MB or >>> less) so as the startup process needs to access again the past WAL >>> segments to recover the 2PC information it needs to get them back >>> directly from disk... One trick that you could use here would be to >>> tweak the startup process so as it drops the OS cache once a segment >>> is finished replaying, and see the effects of an aggressive OS cache >>> eviction. This patch is showing really nice improvements with the OS >>> cache backing up the data, still it would make sense to test things >>> with a worse test case and see if things could be done better. 
The >>> startup process now only reads records sequentially, not randomly, >>> which is a concept that this patch introduces.
>>>
>>> Anyway, perhaps this does not matter much; the non-recovery code path
>>> does the same thing as this patch, and the improvement is too much to
>>> be ignored. So for consistency's sake we could go with the approach
>>> proposed, which has the advantage of not putting any restriction on the
>>> size of the 2PC file, contrary to what an implementation saving the
>>> contents of the 2PC files into memory would need to do.
>>
>> Maybe I'm missing something, but I don't see how the OS cache can affect
>> anything here.
>>
>> Total WAL size was 0x44 * 16 = 1088 MB; recovery time is about 20s.
>> Sequentially reading 1GB of data is an order of magnitude faster even on
>> an old hdd, not to speak of ssd. Also you can take a look at the flame
>> graphs attached to the previous message: the majority of time during
>> recovery is spent in pg_qsort while replaying PageRepairFragmentation,
>> while the whole of xact_redo_commit() takes about 1% of the time. That
>> amount can grow in case of uncached disk reads, but taking total
>> recovery time into account this should not affect much.
>>
>> If you are talking about uncached access only during checkpoint, then
>> here we are restricted by max_prepared_transactions, so at most we will
>> read about a hundred small files (usually fitting into one filesystem
>> page), which will also be barely noticeable compared to the recovery
>> time between checkpoints. Also, wal segment cache eviction during replay
>> doesn't seem to me a standard scenario.
>>
>> Anyway, I took the machine with hdd to slow down read speed and ran the
>> tests again. During one of the runs I launched in parallel a bash loop
>> that was dropping the os cache each second (while a wal fragment replay
>> also takes about one second).
>>
>> 1.5M transactions
>> start segment: 0x06
>> last segment: 0x47
>>
>> patched, with constant cache_drop:
>>   total recovery time: 86s
>>
>> patched, without constant cache_drop:
>>   total recovery time: 68s
>>
>> (while the difference is significant, I bet that happens mostly because
>> the database file segments must be re-read after the cache drop)
>>
>> master, without constant cache_drop:
>>   time to recover 35 segments: 2h 25m (after that I tired of waiting)
>>   expected total recovery time: 4.5 hours
>>
>> --
>> Stas Kelvich
>> Postgres Professional: http://www.postgrespro.com
>> The Russian Postgres Company

--
Nikhil Sontakke   http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
Re: [HACKERS] Push down more UPDATEs/DELETEs in postgres_fdw
On 2016/11/30 17:29, Etsuro Fujita wrote:

On 2016/11/23 20:28, Rushabh Lathia wrote:

I wrote: How about inserting that before the out param **retrieved_attrs, like this? static void deparseExplicitTargetList(List *tlist, bool is_returning, List **retrieved_attrs, deparse_expr_cxt *context);

Yes, adding it before retrieved_attrs would be good. OK, will do.

Done.

You wrote: 5) make_explicit_returning_list() pulls the var list from the returningList, builds the targetentry for the returning list, and at the end rewrites the fdw_scan_tlist. AFAIK, in case of DML which is going to be pushed down to the remote server, ideally fdw_scan_tlist should be the same as the returning list, as the final output for the query will be the RETURNING list only. Isn't that true?

I wrote: That would be true. But the fdw_scan_tlist passed from the core would contain junk columns inserted by the rewriter and planner work, such as CTID for the target table and whole-row Vars for other tables specified in the FROM clause of an UPDATE or the USING clause of a DELETE. So, I created the patch so that the pushed-down UPDATE/DELETE retrieves only the data needed for the RETURNING calculation from the remote server, not the whole fdw_scan_tlist data.

Yes, I noticed that fdw_scan_tlist has CTID for the target table and whole-row Vars for other tables specified in the FROM clause of the DML. But I was thinking: isn't it possible to create a new fdw_scan_tlist once we have found that the DML is directly pushable to the foreign server? I tried quickly doing that, but later it was throwing "variable not found in subplan target list" from set_foreignscan_references().

We could probably avoid that error by replacing the targetlist of the subplan with fdw_scan_tlist, but that wouldn't be enough ...

You wrote: If yes, then why can't we directly replace the fdw_scan_tlist with the returning list, rather than make_explicit_returning_list()?

I wrote: I think we could do that if we modified the core (maybe the executor part). I'm not sure that's a good idea, though.

Yes, modifying core is not a good idea. But I just want to understand why you think that this needs to modify core?

Sorry, I don't remember that. Will check.

The reason why I think so is that the ModifyTable node on top of the ForeignScan node requires that the targetlist of the ForeignScan has (1) junk whole-row Vars for secondary relations in UPDATE/DELETE and (2) all attributes of the target relation to produce the new tuple for UPDATE. (So, it wouldn't be enough to just replace the ForeignScan's targetlist with fdw_scan_tlist!) For #1, see this (and the following code) in ExecInitModifyTable:

/*
 * If we have any secondary relations in an UPDATE or DELETE, they need to
 * be treated like non-locked relations in SELECT FOR UPDATE, ie, the
 * EvalPlanQual mechanism needs to be told about them. Locate the
 * relevant ExecRowMarks.
 */

And for #2, see this (and the following code, especially where calling ExecCheckPlanOutput) in the same function:

 * This section of code is also a convenient place to verify that the
 * output of an INSERT or UPDATE matches the target table(s).

What you proposed would be a good idea because the FDW could calculate the user-query RETURNING list more efficiently in some cases, but I'd like to leave that for future work.

Attached is the new version of the patch. I also addressed other comments from you: moved rewriting the fdw_scan_tlist to postgres_fdw.c, added/revised comments, and added regression tests for the case where a pushed-down UPDATE/DELETE on a join has RETURNING. My apologies for having been late to work on this.

Best regards, Etsuro Fujita

*** a/contrib/postgres_fdw/deparse.c --- b/contrib/postgres_fdw/deparse.c *** *** 130,142 static void deparseTargetList(StringInfo buf, Bitmapset *attrs_used, bool qualify_col, List **retrieved_attrs); !
static void deparseExplicitTargetList(List *tlist, List **retrieved_attrs, deparse_expr_cxt *context); static void deparseReturningList(StringInfo buf, PlannerInfo *root, Index rtindex, Relation rel, bool trig_after_row, List *returningList, List **retrieved_attrs); static void deparseColumnRef(StringInfo buf, int varno, int varattno, PlannerInfo *root, bool qualify_col); static void deparseRelation(StringInfo buf, Relation rel); --- 130,151 Bitmapset *attrs_used, bool qualify_col, List **retrieved_attrs); ! static void deparseExplicitTargetList(List *tlist, ! bool is_returning, ! List **retrieved_attrs, deparse_expr_cxt *context);
Re: [HACKERS] pgbench more operators & functions
Bonjour Michaël, Hello Robert,

>> Let's mark this Returned with Feedback and move on. We've only got a
>> week left in the CommitFest anyhow and there are lots of other things
>> that still need work (and which actually have been revised to match
>> previous feedback).
>
> Done as such, let's move on.

Hmmm. I think that there is a misunderstanding, most of which is my fault. I have really tried to do everything that was required from committers, including revising the patch to match all previous feedback. Version 6 sent on Oct 4 did include all fixes required at the time (no if, no unusual and operators, TAP tests)... However I forgot to remove some documentation about the removed stuff, which made Robert think that I had not done it. I apologise for this mistake and the subsequent misunderstanding :-(

The current v8 sent on Jan 25 should only implement existing server-side stuff, including with the same precedence as pointed out by Tom. So for the implementation side I really think that I have done exactly all that was required of me by committers, although sometimes with bugs or errors, my apology, again...

As for the motivation, which is another argument, I cannot do more than point to actual published official benchmark specifications that do require these functions. I'm not inventing anything or providing some useless catalog of math functions. If pgbench is about being seated on a bench and running postgres on your laptop to get some heat, my mistake... I thought it was about benchmarking, which does imply a few extra capabilities.

If the overall feedback is to be understood as "the postgres community does not think that pgbench should be able to be used to implement benchmarks such as TPC-B", then obviously I will stop efforts to improve it for that purpose.

To conclude: IMHO the relevant current status of the patch should be "Needs review" and possibly "Move to next CF". If the feedback is "we do not want pgbench to implement benchmarks such as TPC-B", then indeed the proposed features are not needed and the status should be "Rejected". In any case, "Returned with feedback" does not really apply.

A+

--
Fabien.
Re: [HACKERS] Substantial bloat in postgres_fdw regression test runtime
Ashutosh Bapatwrites: > On Thu, Nov 3, 2016 at 1:58 PM, Jeevan Chalke > wrote: >> Attached patch with test-case modification. > I verified that this patch indeed bring the time down to 2 to 3 > seconds from 10 seconds. Thanks for working on this, guys. > The additional condition t2.c2 = 6 seems to echo the filter t2.c2 = 6 > of aggregate. We wouldn't know which of those actually worked. I > modified the testcase to use t2.c2 % 6 = 0 instead and keep the filter > condition intact. This increases the execution time by .2s, which may > be ok. Let me know what you thing of the attached patch. Agreed, that seems like a good compromise. Pushed that way. > Also, please add this to the commitfest, so that it isn't forgotten. No need. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] pgbench more operators & functions
And fixes the documentation: - remove a non existing anymore "if" function documentation which made Robert assume that I had not taken the hint to remove it. I had! - reorder operator documentation by their pg SQL precedence. -- Fabien.diff --git a/doc/src/sgml/ref/pgbench.sgml b/doc/src/sgml/ref/pgbench.sgml index 1eee8dc..73101e1 100644 --- a/doc/src/sgml/ref/pgbench.sgml +++ b/doc/src/sgml/ref/pgbench.sgml @@ -828,11 +828,11 @@ pgbench options dbname The expression may contain integer constants such as 5432, double constants such as 3.14159, references to variables :variablename, - unary operators (+, -) and binary operators - (+, -, *, /, - %) with their usual precedence and associativity, - function calls, and - parentheses. + operators + with their usual precedence and associativity, + function calls, + SQL CASE generic conditional expressions + and parentheses. @@ -917,6 +917,165 @@ pgbench options dbname + + Built-In Operators + + + The arithmetic, bitwise, comparison and logical operators listed in +are built into pgbench + and may be used in expressions appearing in + \set. 
+ pgbench Operators by increasing precedence
+
+ Operator | Description         | Example | Result
+ ---------+---------------------+---------+-------
+ OR       | logical or          | 5 or 0  | 1
+ AND      | logical and         | 3 and 0 | 0
+ NOT      | logical not         | not 0   | 1
+ =        | is equal            | 5 = 4   | 0
+ <>       | is not equal        | 5 <> 4  | 1
+ !=       | is not equal        | 5 != 5  | 0
+ <        | lower than          | 5 < 4   | 0
+ <=       | lower or equal      | 5 <= 4  | 0
+ >        | greater than        | 5 > 4   | 1
+ >=       | greater or equal    | 5 >= 4  | 1
+ |        | integer bitwise OR  | 1 | 2   | 3
+ #        | integer bitwise XOR | 1 # 3   | 2
+ &        | integer bitwise AND | 1 & 3   | 1
+ ~        | integer bitwise NOT | ~ 1     | -2
+ <<       | bitwise shift left  | 1 << 2  | 4
+ >>       | bitwise shift right | 8 >> 2  | 2
+ +        | addition            | 5 + 4   | 9
+ -        | subtraction         | 3 - 2.0 | 1.0
+ *        | multiplication      | 5 * 4   | 20
+ /        | division (integer division truncates the result) | 5 / 3 | 1
+ %        | modulo              | 3 % 2   | 1
+ -        | opposite            | - 2.0   | -2.0
+
+ Built-In Functions @@ -963,6 +1122,13 @@ pgbench options
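As an aside, the TPC-C NURand skew quoted earlier in this thread is easy to sanity-check outside pgbench. Below is a minimal Python sketch; the function name and the constant `c=0` are illustrative (the spec selects C per run), not anything from the pgbench patch itself:

```python
import random

def nurand(a, x, y, c=0):
    # TPC-C clause 2.1.6: OR-ing two uniform draws skews the distribution
    # toward values with many bits set; the modulo folds the sum back into
    # the target range, and "+ x" shifts it to start at x.
    return (((random.randint(0, a) | random.randint(x, y)) + c) % (y - x + 1)) + x

random.seed(42)
samples = [nurand(255, 1, 100) for _ in range(10000)]
print(min(samples) >= 1 and max(samples) <= 100)  # results always stay in [x, y]
```

This is exactly the shape of expression the proposed `|` and `%` operators would let a pgbench script compute inline.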
Re: [HACKERS] PostgreSQL 8.2 installation error on Windows 2016 server
Hi Aaron, Thank you so much for the information. Regards, Shruti Rawal -Original Message- From: Aaron W. Swenson [mailto:titanof...@gentoo.org] Sent: Tuesday, January 24, 2017 8:22 PM To: Shruti Rawal Cc: pgsql-hackers@postgresql.org Subject: Re: [HACKERS] PostgreSQL 8.2 installation error on Windows 2016 server On 2017-01-24 09:20, Shruti Rawal wrote: > Hi, > > We are doing an assessment for migrating our Perl applications to Windows > 2016 server. > I am trying to install PostgreSQL 8.2 on my Windows Server 2016, but it is giving me the following error: > Malformed permissions property: 'langid' > > We could not find any relevant information on the PostgreSQL site about whether the stated version 8.2 will work on Windows 2016 server. > I would like to know whether version 8.2 is supported on 2016 server, and if not, which version is supported? The information you’re looking for is accessible on the home page. You’re having a tough time finding information on 8.2 because it is way past EOL. You will be happier evaluating 9.6.1. (https://www.postgresql.org/support/versioning/) Further, -hackers is not intended as a support mailing list. You should write to -general instead. -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] sequence data type
Here is an updated patch that allows changing the sequence type. This was clearly a concern for reviewers, and the presented use cases seemed convincing. -- Peter Eisentraut http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services >From 5aff9dbe7a94417aa0faf56dab37187fcbec23f5 Mon Sep 17 00:00:00 2001 From: Peter EisentrautDate: Wed, 25 Jan 2017 09:35:58 -0500 Subject: [PATCH v4] Add CREATE SEQUENCE AS clause This stores a data type, required to be an integer type, with the sequence. The sequences min and max values default to the range supported by the type, and they cannot be set to values exceeding that range. The internal implementation of the sequence is not affected. Change the serial types to create sequences of the appropriate type. This makes sure that the min and max values of the sequence for a serial column match the range of values supported by the table column. So the sequence can no longer overflow the table column. This also makes monitoring for sequence exhaustion/wraparound easier, which currently requires various contortions to cross-reference the sequences with the table columns they are used with. This commit also effectively reverts the pg_sequence column reordering in f3b421da5f4addc95812b9db05a24972b8fd9739, because the new seqtypid column allows us to fill the hole in the struct and create a more natural overall column ordering. 
--- doc/src/sgml/catalogs.sgml | 14 +++- doc/src/sgml/information_schema.sgml| 4 +- doc/src/sgml/ref/alter_sequence.sgml| 30 ++-- doc/src/sgml/ref/create_sequence.sgml | 36 ++ src/backend/catalog/information_schema.sql | 4 +- src/backend/commands/sequence.c | 89 +--- src/backend/parser/gram.y | 6 +- src/backend/parser/parse_utilcmd.c | 2 +- src/bin/pg_dump/pg_dump.c | 103 +++- src/bin/pg_dump/t/002_pg_dump.pl| 2 + src/include/catalog/catversion.h| 2 +- src/include/catalog/pg_proc.h | 2 +- src/include/catalog/pg_sequence.h | 8 ++- src/test/modules/test_pg_dump/t/001_base.pl | 1 + src/test/regress/expected/sequence.out | 52 ++ src/test/regress/expected/sequence_1.out| 52 ++ src/test/regress/sql/sequence.sql | 25 +-- 17 files changed, 312 insertions(+), 120 deletions(-) diff --git a/doc/src/sgml/catalogs.sgml b/doc/src/sgml/catalogs.sgml index 524180e011..162975746d 100644 --- a/doc/src/sgml/catalogs.sgml +++ b/doc/src/sgml/catalogs.sgml @@ -5774,10 +5774,11 @@ pg_sequence Columns - seqcycle - bool + seqtypid + oid + pg_type.oid - Whether the sequence cycles + Data type of the sequence @@ -5814,6 +5815,13 @@ pg_sequence Columns Cache size of the sequence + + + seqcycle + bool + + Whether the sequence cycles + diff --git a/doc/src/sgml/information_schema.sgml b/doc/src/sgml/information_schema.sgml index c43e325d06..a3a19ce8ce 100644 --- a/doc/src/sgml/information_schema.sgml +++ b/doc/src/sgml/information_schema.sgml @@ -4653,9 +4653,7 @@ sequences Columns data_type character_data - The data type of the sequence. In - PostgreSQL, this is currently always - bigint. + The data type of the sequence. 
diff --git a/doc/src/sgml/ref/alter_sequence.sgml b/doc/src/sgml/ref/alter_sequence.sgml index 3b52e875e3..252a668189 100644 --- a/doc/src/sgml/ref/alter_sequence.sgml +++ b/doc/src/sgml/ref/alter_sequence.sgml @@ -23,7 +23,9 @@ -ALTER SEQUENCE [ IF EXISTS ] name [ INCREMENT [ BY ] increment ] +ALTER SEQUENCE [ IF EXISTS ] name +[ AS data_type ] +[ INCREMENT [ BY ] increment ] [ MINVALUE minvalue | NO MINVALUE ] [ MAXVALUE maxvalue | NO MAXVALUE ] [ START [ WITH ] start ] [ RESTART [ [ WITH ] restart ] ] @@ -81,6 +83,26 @@ Parameters + data_type + + +The optional +clause AS data_type +changes the data type of the sequence. Valid types are +smallint, integer, +and bigint. + + + +Note that changing the data type does not automatically change the +minimum and maximum values. You can use the clauses NO +MINVALUE and NO MAXVALUE to adjust the +minimum and maximum values to the range of the new data type. + + + + + increment @@ -102,7 +124,7 @@ Parameters class="parameter">minvalue determines the minimum value a sequence can generate. If NO MINVALUE is specified, the defaults of 1 and --2^63 for ascending and descending sequences, +the minimum value of the data
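To make the commit message's point concrete, the default bounds a sequence gets under the new AS clause follow directly from the two's-complement ranges of the three SQL integer types. A small Python sketch (the helper and table names are illustrative, not the patch's internal API):

```python
# Two's-complement ranges of the three types CREATE SEQUENCE ... AS accepts.
TYPE_RANGES = {
    "smallint": (-(2**15), 2**15 - 1),
    "integer":  (-(2**31), 2**31 - 1),
    "bigint":   (-(2**63), 2**63 - 1),
}

def default_min_max(seqtype, ascending=True):
    """Default MINVALUE/MAXVALUE: an ascending sequence starts at 1 and may
    run up to the type's maximum; a descending one runs from -1 down to the
    type's minimum."""
    lo, hi = TYPE_RANGES[seqtype]
    return (1, hi) if ascending else (lo, -1)

print(default_min_max("smallint"))        # (1, 32767)
print(default_min_max("integer", False))  # (-2147483648, -1)
```

This is why a serial column backed by an AS-typed sequence can no longer overflow its table column: the sequence's bounds are clamped to the column type's range.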
Re: [HACKERS] patch: function xmltable
Tom Lane wrote: > Andres Freundwrites: > > On 2017-01-24 21:32:56 -0300, Alvaro Herrera wrote: > >> XMLTABLE is specified by the standard to return multiple rows ... but > >> then as far as my reading goes, it is only supposed to be supported in > >> the range table (FROM clause) not in the target list. I wonder if > >> this would end up better if we only tried to support it in RT. I asked > >> Pavel to implement it like that a few weeks ago, but ... > > > Right - it makes sense in the FROM list - but then it should be an > > executor node, instead of some expression thingy. > > +1 --- we're out of the business of having simple expressions that > return rowsets. Well, that's it. I'm not committing this patch against two other committers' opinion, plus I was already on the fence about the implementation anyway. I think you should just go with the flow and implement this by creating nodeTableexprscan.c. It's not even difficult. -- Álvaro Herrerahttps://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] COPY as a set returning function
On Mon, Oct 31, 2016 at 04:45:40PM -0400, Corey Huinker wrote: > Attached is a patch that implements copy_srf(). > > The function signature reflects cstate more than it reflects the COPY > options (filename+is_program instead of FILENAME or PROGRAM, etc) The patch as it stands needs a rebase. I'll see what I can do today. Best, David. -- David Fetterhttp://fetter.org/ Phone: +1 415 235 3778 AIM: dfetter666 Yahoo!: dfetter Skype: davidfetter XMPP: david(dot)fetter(at)gmail(dot)com Remember to vote! Consider donating to Postgres: http://www.postgresql.org/about/donate -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Floating point comparison inconsistencies of the geometric types
I am responding to both of your emails together.

> Perhaps I don't understand it. Many opclass are defined for > btree. But since (ordinary) btree handles only one-dimentional, > totally-orderable sets, geometric types even point don't fit > it. Even if we relaxed it by removing EPSILON, multidimentional > data still doesn't fit.

Yes, multidimensional data doesn't fit into btree. Let's say we would have to introduce operators called *<, *<=, *>, *>= to support a btree opclass for point. I agree those operators would be useless, but a btree opclass is used for other purposes too. "ORDER BY point", merge-joins, and btree index support for ~= would be useful. Lack of ORDER BY, GROUP BY, and DISTINCT support is the major inconvenience about the geometric types. There are many complaints about this on the mailing list. Many frameworks and ORMs are not prepared to deal with getting an error when they use certain types on those clauses.

>> - Only some operators are using the epsilon. >> >> I included the list of the ones which doesn't use the epsilon on >> initial post of this thread. This makes the operators not compatible >> with each other. For example, you cannot say "if box_a @> box_b and >> box_b @> point then box_a @> box_b". > > (It must be "then box_a @> point".)

Yes, I am sorry.

> Both of the operators don't > seem to use EPSILON so the transitivity holds, but perhaps we > should change them to more appropriate shape, that is, to use > EPSILON. Even having the tolerance, we can define the operators > to keep the transitivity.

Yes, that would be another way. In my view, this would make things worse, because I believe the epsilon approach is completely broken. The operators which are not using the epsilon are the only ones people can sanely use to develop applications.
This is surprising to the users. >> > For example, you cannot say "if box_a @> box_b then box_a <-> box_b <= >> > epsilon". >> >> Maybe it should be like "if box_a ~= box_b && box_b @< box_a then >> ..". I'm not sure off-hand. But I don't see the relationship is >> significant. What I meant was the behaviour being unclear to even people who knows about the epsilon. If two boxes are overlapping with each other, one who knows about EPSILON can expect the distance between them to be less than EPSILON, but this wouldn't be true. >> - Some operators are violating commutative property. >> >> For example, you cannot say "if line_a ?|| line_b then line_b ?|| line_a". > > Whether EPSILON is introduced or not, commutativity cannot be > assured for it from calculation error, I suppose. It can easily be assured by treating both sides of the operator the same. It is actually assured on my patch. >> - Some operators are violating transitive property. >> >> For example, you cannot say "if point_a ~= point_b and point_b ~= >> point_c then point_a ~= point_c". > > It holds only in ideal (or mathematical) world, or for similars > of integer (which are strictly implemented mathematica world on > computer). Without the EPSILON, two points hardly match by > floating point error, as I mentioned. I don't have an evidence > though (sorry), mere equality among three floating point numbers > is not assured. Of course, it is assured. Floating point numbers are deterministic. >> - The operators with epsilon are not compatible with float operators. >> >> This is also surprising for the users. For example, you cannot say >> "if point_a << point_b then point_a[0] < point_b[0]". > > It is because "is strictly left of" just doesn't mean their > x-coordinates are in such a relationship. The difference might be > surprising but is a specification (truely not a bug:). Then what does that mean? Every operator with EPSILON is currently surprising in different way. 
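The transitivity failure complained about above is easy to demonstrate. A quick sketch using the same tolerance PostgreSQL's geometric code uses (EPSILON = 1.0e-6 and the FPeq comparison from src/include/utils/geo_decls.h):

```python
EPSILON = 1.0e-6  # matches EPSILON in src/include/utils/geo_decls.h

def fpeq(a, b):
    # Same shape as the backend's FPeq macro: "equal" within the tolerance.
    return abs(a - b) <= EPSILON

# a ~= b and b ~= c, yet a !~= c: the tolerance-based equality chains
# two sub-epsilon gaps into a gap larger than epsilon.
a, b, c = 0.0, 0.9e-6, 1.8e-6
print(fpeq(a, b), fpeq(b, c), fpeq(a, c))  # True True False
```

Plain floating-point `==`, by contrast, is transitive, which is the core of the argument for dropping the epsilon.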
People use databases to build applications. They often need to implement the same operations on the application side. It is very hard to be bug-compatible with our geometric types.

>> = Missing Bits on the Operator Classes > This final section seems to mention the application but sorry, it > still don't describe for me what application that this patch > aiming to enable. But I can understand that you want to handle > floating point numbers like integers. The epsilon is surely quite > annoying for the purpose.

I will try to itemise what applications I am aiming to enable:

- Applications with very little GIS requirement: PostGIS can be too much work for an application in a different business domain that has only a small GIS requirement. The built-in geometric types are incomplete for supporting real-world geography. Getting rid of epsilon would make this worse; on the other hand, it would allow people to deal with the difficulties on their own.

- Applications with geometric objects: I believe people could make use of the geometric
Re: [HACKERS] multivariate statistics (v19)
Michael Paquier wrote: > On Wed, Jan 4, 2017 at 11:35 AM, Tomas Vondra >wrote: > > Attached is v22 of the patch series, rebased to current master and fixing > > the reported bug. I haven't made any other changes - the issues reported by > > Petr are mostly minor, so I've decided to wait a bit more for (hopefully) > > other reviews. > > And nothing has happened since. Are there people willing to review > this patch and help it proceed? I am going to grab this patch as committer. > As this patch is quite large, I am not sure if it is fit to join the > last CF. Thoughts? All patches, regardless of size, are welcome to join any commitfest. The last commitfest is not different in that regard. The rule I remember is that patches may not arrive *for the first time* in the last commitfest. This patch has already seen a lot of work in previous commitfests, so it's fine. -- Álvaro Herrerahttps://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Radix tree for character conversion
Hi, I applied the 4 patches of the patchset and ran "make", but it failed. Is this a bug or my mistake? I'm sorry if I'm wrong.

[$(TOP)]$ patch -p1 < ../0001-Add-missing-semicolon.patch
[$(TOP)]$ patch -p1 < ../0002-Correct-reference-resolution-syntax.patch
[$(TOP)]$ patch -p1 < ../0003-Apply-pgperltidy-on-src-backend-utils-mb-Unicode.patch
[$(TOP)]$ patch -p1 < ../0004-Use-radix-tree-for-character-conversion.patch
[$(TOP)]$ ./configure
[Unicode]$ make
'/usr/bin/perl' UCS_to_most.pl
Type of arg 1 to keys must be hash (not hash element) at convutils.pm line 443, near "}) "
Type of arg 1 to values must be hash (not hash element) at convutils.pm line 596, near "}) "
Type of arg 1 to each must be hash (not private variable) at convutils.pm line 755, near "$map) "
Compilation failed in require at UCS_to_most.pl line 19.
make: *** [iso8859_2_to_utf8.map] Error 255

Regards, -- Ayumi Ishii -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Proposal : For Auto-Prewarm.
On Mon, Jan 23, 2017 at 6:37 PM, Jim Nasby wrote: > I'm not sure the default GUC setting of 0 makes sense. If you've loaded the > module, presumably you want it to be running. I think it'd be nice if the GUC > had a -1 setting that meant to use checkpoint_timeout. Actually, I think we need to use -1 to mean "don't run the worker at all". 0 means "run the worker, but don't do timed dumps". >0 means "run the worker, and dump at that interval". I have to admit that when I was first thinking about this feature, my initial thought was "hey, let's dump once per checkpoint_timeout". But I think that Mithun's approach is better. There's no intrinsic connection between this and checkpointing, and letting the user pick the interval is a lot more flexible. We could still have a magic value that means "same as checkpoint_timeout", but it's not obvious to me that there's any value in that; the user might as well just pick the time interval that they want. Actually, for busy systems, the interval is probably shorter than checkpoint_timeout. Dumping the list of buffers isn't that expensive, and if you are doing checkpoints every half hour or so, that's probably longer than what you want for this. So I suggest that the documentation just recommend a suitable value (e.g. 5 minutes) and not worry about checkpoint_timeout. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
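The three-way GUC semantics proposed above can be summarized in a small sketch (Python; the function name and return strings are illustrative, not the patch's actual API):

```python
def autoprewarm_behavior(dump_interval_s):
    """Proposed semantics for the dump-interval GUC:
    -1 -> don't run the worker at all,
     0 -> run the worker, but don't do timed dumps,
    >0 -> run the worker, and dump at that interval (seconds)."""
    if dump_interval_s < 0:
        return "no worker"
    if dump_interval_s == 0:
        return "worker, no timed dumps"
    return "worker, dump every %ds" % dump_interval_s

print(autoprewarm_behavior(300))  # worker, dump every 300s
```

The suggested documentation default of five minutes corresponds to the 300-second case.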
Re: [HACKERS] macaddr 64 bit (EUI-64) datatype support
On Wed, Jan 25, 2017 at 6:43 PM, Vitaly Burovoywrote: > On 1/23/17, Haribabu Kommi wrote: > > The patch is split into two parts. > > 1. Macaddr8 datatype support > > 2. Contrib module support. > > Hello, > > I'm sorry for the delay. > The patch is almost done, but I have two requests since the last review. > Thanks for the review. > 1. > src/backend/utils/adt/mac8.c: > + int a, > + b, > + c, > + d = 0, > + e = 0, > ... > > There is no reason to set them as 0. For EUI-48 they will be > reassigned in the "if (count != 8)" block, for EUI-64 -- in one of > sscanf. > They could be set to "d = 0xFF, e = 0xFE," and avoid the "if" block > mentioned above, but it makes the code be much less readable. > > Oh. I see. In the current version it must be assigned because for > EUI-48 memory can have values <0 or >255 in uninitialized variables. > It is better to avoid initialization by merging two parts of the code: > + if (count != 6) > + { > + /* May be a 8-byte MAC address */ > ... > + if (count != 8) > + { > + d = 0xFF; > + e = 0xFE; > + } > > to a single one: > + if (count == 6) > + { > + d = 0xFF; > + e = 0xFE; > + } > + else > + { > + /* May be a 8-byte MAC address */ > ... > Changed accordingly. > 2. > src/backend/utils/adt/network.c: > + res = (mac->a << 24) | (mac->b << 16) | > (mac->c << 8) | (mac->d); > + res = (double)((uint64)res << 32); > + res += (mac->e << 24) | (mac->f << 16) | > (mac->g << 8) | (mac->h); > > Khm... I trust that modern compilers can do a lot of optimizations but > for me it looks terrible because of needless conversions. 
> The reason why earlier versions did have two lines "res *= 256 * 256" > was an integer overflow for four multipliers, but it can be solved by > defining the first multiplier as a double: > + res = (mac->a << 24) | (mac->b << 16) | > (mac->c << 8) | (mac->d); > + res *= (double)256 * 256 * 256 * 256; > + res += (mac->e << 24) | (mac->f << 16) | > (mac->g << 8) | (mac->h); > > In this case the left-hand side argument for the "*=" operator is > computed at the compile time as a single constant. > The second line can be written as "res *= 256. * 256 * 256 * 256;" > (pay attention to a dot in the first multiplier), but it is not > readable at all (and produces the same code). Corrected as suggested. Updated patch attached. There is no change in the contrib patch. Regards, Hari Babu Fujitsu Australia mac_eui64_support_8.patch Description: Binary data contrib_macaddr8_1.patch Description: Binary data -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
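The arithmetic under review — widening an 8-byte EUI-64 address to a double without overflowing 32-bit integer math — can be checked in isolation. A Python sketch mirroring the reviewed C (field names a..h as in the patch; Python's unbounded ints stand in for the C casts):

```python
def macaddr8_to_double(a, b, c, d, e, f, g, h):
    # Fold the high and low four bytes into two 32-bit halves, exactly as
    # the C code does with shifts and ORs.
    hi = (a << 24) | (b << 16) | (c << 8) | d
    lo = (e << 24) | (f << 16) | (g << 8) | h
    # Scale the high half by 2^32 (the role of the 256*256*256*256
    # multiplier, which must be computed in double to avoid int overflow)
    # and add the low half.
    return float(hi) * 2.0**32 + float(lo)

print(macaddr8_to_double(0, 0, 0, 1, 0, 0, 0, 0) == 2.0**32)  # True
```

Note that above 2^53 the double can no longer represent every address exactly, which is inherent to any float8 conversion of a 64-bit value.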
Re: [HACKERS] simplify sequence test
On 25/01/17 03:48, Peter Eisentraut wrote: > We maintain a separate test output file sequence_1.out because the > log_cnt value can vary if there is a checkpoint happening at the right > time. So we have to maintain two files because of a one character > difference. I propose the attached patch to restructure the test a bit > to avoid this, without loss of test coverage. > +1, looks good. -- Petr Jelinek http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Assignment of valid collation for SET operations on queries with UNKNOWN types.
On Wed, Jan 25, 2017 at 10:54 AM, Michael Paquierwrote: > On Wed, Jan 25, 2017 at 12:46 AM, Tom Lane wrote: >> I wrote: >>> Michael Paquier writes: The table of Pseudo-Types needs to be updated as well with unknown in datatype.sgml. >> >>> Check. >> >> Here's an updated patch with doc changes. Aside from that one, >> I tried to spell "pseudo-type" consistently, and I changed one >> place where we were calling something a pseudo-type that isn't. I think, those changes, even though small, deserve their own commit. The changes themselves look good. -- Best Wishes, Ashutosh Bapat EnterpriseDB Corporation The Postgres Database Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Proposal : For Auto-Prewarm.
On Tue, Jan 24, 2017 at 5:07 AM, Jim Nasbywrote: > I took a look at this again, and it doesn't appear to be working for me. The > library is being loaded during startup, but I don't see any further activity > in the log, and I don't see an autoprewarm file in $PGDATA. > > There needs to be some kind of documentation change as part of this patch. > > I'm not sure the default GUC setting of 0 makes sense. If you've loaded the > module, presumably you want it to be running. I think it'd be nice if the GUC > had a -1 setting that meant to use checkpoint_timeout. > > Having the GUC be restart-only is also pretty onerous. I don't think it'd be > hard to make the worker respond to a reload... there's code in the autovacuum > launcher you could use as an example. > +1. I don't think there should be any problem in making it PGC_SIGHUP. > I'm also wondering if this really needs to be a permanently running > process... perhaps the worker could simply be started as necessary? Do you want to invoke worker after every buff_dump_interval? I think that will be bad in terms of starting a new process and who will monitor when to start such a process. I think it is better to keep it as a permanently running background process if loaded by user. > Though maybe that means it wouldn't run at shutdown. Yeah, that will be another drawback. Few comments found while glancing the patch. 1. +TO DO: +-- +Add functionality to dump based on timer at regular interval. I think you need to remove above TO DO. 2. + /* Load the page only if there exist a free buffer. We do not want to + * replace an existing buffer. */ This is not a PG style multiline comment. -- With Regards, Amit Kapila. EnterpriseDB: http://www.enterprisedb.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Improvements in psql hooks for variables
Hello, The patch applies fine on the latest master branch, and it works when tested with the various variables listed in the PsqlSettings struct. I will post further comments on the patch soon. Thank you, Rahila Syed On Wed, Jan 25, 2017 at 1:33 AM, Tom Lane wrote: > "Daniel Verite" writes: > > Here's an update with these changes: > > I scanned this patch very quickly and think it addresses my previous > stylistic objections. Rahila is the reviewer of record though, so > I'll wait for his comments before going further. > > regards, tom lane >
Re: [HACKERS] Substantial bloat in postgres_fdw regression test runtime
On Thu, Nov 3, 2016 at 1:58 PM, Jeevan Chalkewrote: > > On Wed, Nov 2, 2016 at 10:09 PM, Tom Lane wrote: >> >> In 9.6, "make installcheck" in contrib/postgres_fdw takes a shade >> under 3 seconds on my machine. In HEAD, it's taking 10 seconds. >> I am not happy, especially not since there's no parallelization >> of the contrib regression tests. That's a direct consumption of >> my time and all other developers' time too. This evidently came >> in with commit 7012b132d (Push down aggregates to remote servers), >> and while that's a laudable feature, I cannot help thinking that >> it does not deserve this much of an imposition on every make check >> that's ever done for the rest of eternity. > > > Thanks Tom for reporting this substantial bloat in postgres_fdw regression > test runtime. On my machine "make installcheck" in contrib/postgres_fdw > takes 6.2 seconds on master (HEAD: > 770671062f130a830aa89100c9aa2d26f8d4bf32). > However if I remove all tests added for aggregate push down, it drops down > to 2.2 seconds. Oops 4 seconds more. > > I have timed each of my tests added as part of aggregate push down patch and > observed that one of the test (LATERAL one) is taking around 3.5 seconds. > This is causing because of the parameterization. I have added a filter so > that we will have less number of rows for parameterization. Doing this, > lateral query in question now runs in 100ms. Also updated few more queries > which were taking more than 100ms to have runtime around 30ms or so. So > effectively, with changes "make installcheck" now takes around 2.5 seconds. > > Attached patch with test-case modification. > I verified that this patch indeed bring the time down to 2 to 3 seconds from 10 seconds. The additional condition t2.c2 = 6 seems to echo the filter t2.c2 = 6 of aggregate. We wouldn't know which of those actually worked. I modified the testcase to use t2.c2 % 6 = 0 instead and keep the filter condition intact. 
This increases the execution time by .2s, which may be ok. Let me know what you thing of the attached patch. Also, please add this to the commitfest, so that it isn't forgotten. -- Best Wishes, Ashutosh Bapat EnterpriseDB Corporation The Postgres Database Company diff --git a/contrib/postgres_fdw/expected/postgres_fdw.out b/contrib/postgres_fdw/expected/postgres_fdw.out index 0045f3f..3a09280 100644 --- a/contrib/postgres_fdw/expected/postgres_fdw.out +++ b/contrib/postgres_fdw/expected/postgres_fdw.out @@ -2685,9 +2685,9 @@ select sum(c1%3), sum(distinct c1%3 order by c1%3) filter (where c1%3 < 2), c2 f -- Outer query is aggregation query explain (verbose, costs off) -select distinct (select count(*) filter (where t2.c2 = 6 and t2.c1 < 10) from ft1 t1 where t1.c1 = 6) from ft2 t2 order by 1; - QUERY PLAN +select distinct (select count(*) filter (where t2.c2 = 6 and t2.c1 < 10) from ft1 t1 where t1.c1 = 6) from ft2 t2 where t2.c2 % 6 = 0 order by 1; + QUERY PLAN +-- Unique Output: ((SubPlan 1)) -> Sort @@ -2696,14 +2696,14 @@ select distinct (select count(*) filter (where t2.c2 = 6 and t2.c1 < 10) from ft -> Foreign Scan Output: (SubPlan 1) Relations: Aggregate on (public.ft2 t2) - Remote SQL: SELECT count(*) FILTER (WHERE ((c2 = 6) AND ("C 1" < 10))) FROM "S 1"."T 1" + Remote SQL: SELECT count(*) FILTER (WHERE ((c2 = 6) AND ("C 1" < 10))) FROM "S 1"."T 1" WHERE (((c2 % 6) = 0)) SubPlan 1 -> Foreign Scan on public.ft1 t1 Output: (count(*) FILTER (WHERE ((t2.c2 = 6) AND (t2.c1 < 10 Remote SQL: SELECT NULL FROM "S 1"."T 1" WHERE (("C 1" = 6)) (13 rows) -select distinct (select count(*) filter (where t2.c2 = 6 and t2.c1 < 10) from ft1 t1 where t1.c1 = 6) from ft2 t2 order by 1; +select distinct (select count(*) filter (where t2.c2 = 6 and t2.c1 < 10) from ft1 t1 where t1.c1 = 6) from ft2 t2 where t2.c2 % 6 = 0 order by 1; count --- 1 @@ -2711,7 +2711,7 @@ select distinct (select count(*) filter (where t2.c2 = 6 and t2.c1 < 10) from ft -- Inner query is aggregation 
query explain (verbose, costs off) -select distinct (select count(t1.c1) filter (where t2.c2 = 6 and t2.c1 < 10) from ft1 t1 where t1.c1 = 6) from ft2 t2 order by 1; +select distinct (select count(t1.c1) filter (where t2.c2 = 6 and t2.c1 < 10) from ft1 t1 where t1.c1 = 6) from ft2 t2 where t2.c2 % 6 = 0 order by 1;
Re: [HACKERS] COPY as a set returning function
On Wed, Jan 25, 2017 at 02:37:57PM +0900, Michael Paquier wrote: > On Mon, Dec 5, 2016 at 2:10 PM, Haribabu Kommi> wrote: > > On Tue, Nov 1, 2016 at 7:45 AM, Corey Huinker > > wrote: > >> > >> Attached is a patch that implements copy_srf(). > > > > Moved to next CF with "needs review" status. > > This patch is still waiting for review. David, are you planning to > look at it by the end of the CF? I'll be doing this today. Best, David. -- David Fetter http://fetter.org/ Phone: +1 415 235 3778 AIM: dfetter666 Yahoo!: dfetter Skype: davidfetter XMPP: david(dot)fetter(at)gmail(dot)com Remember to vote! Consider donating to Postgres: http://www.postgresql.org/about/donate -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
[HACKERS] pg_ls_dir & friends still have a hard-coded superuser check
Commit 1574783b4ced0356fbc626af1a1a469faa6b41e1 gratifyingly removed hard-coded superuser checks from assorted functions, which makes it possible to GRANT EXECUTE ON FUNCTION pg_catalog.whatever() to unprivileged users if the DBA so desires. However, the functions in genfile.c still have hard-coded checks: pg_read_file(), pg_read_binary_file(), pg_stat_file(), and pg_ls_dir(). I think those functions ought to get the same treatment that the commit mentioned above gave to a bunch of others. Obviously, there's some risk of DBAs doing stupid things there, but stupidity is hard to prevent in a general way and nanny-ism is annoying. The use case I have in mind is a monitoring tool that needs access to one or more of those functions -- in keeping with the principle of least privilege, it's much better to give the monitoring user only the privileges which it actually needs than to make it a superuser. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Re: [HACKERS] Performance improvement for joins where outer side is unique
David Rowley wrote: > On 19 January 2017 at 11:06, David Rowley > wrote: > > Old patch no longer applies, so I've attached a rebased patch. This > > also re-adds a comment line which I mistakenly removed. > > (meanwhile Andres commits 69f4b9c) > > I should've waited a bit longer. > > Here's another that fixes the new conflicts. I suspect that "inner" and "outer" relation / tuple are sometimes confused in comments: * analyzejoins.c:70 "searches for subsequent matching outer tuples." * analyzejoins.c:972 /* * innerrel_is_unique * Check for proofs which prove that 'innerrel' can, at most, match a * single tuple in 'outerrel' based on the join condition in * 'restrictlist'. */ * relation.h:1831 bool inner_unique; /* inner side of join matches no more than one * outer side tuple */ -- Antonin Houska Cybertec Schönig & Schönig GmbH Gröhrmühlgasse 26 A-2700 Wiener Neustadt Web: http://www.postgresql-support.de, http://www.cybertec.at
Re: [HACKERS] COPY as a set returning function
On Wed, Jan 25, 2017 at 06:16:16AM -0800, David Fetter wrote: > On Mon, Oct 31, 2016 at 04:45:40PM -0400, Corey Huinker wrote: > > Attached is a patch that implements copy_srf(). > > > > The function signature reflects cstate more than it reflects the COPY > > options (filename+is_program instead of FILENAME or PROGRAM, etc) > > The patch as it stands needs a rebase. I'll see what I can do today. Please find attached a rebase, which fixes an OID collision that crept in. - The patch builds against master and passes "make check". - The patch does not contain user-visible documentation or examples. - The patch does not contain tests. I got the following when I did a brief test. SELECT * FROM copy_srf(filename => 'ls', is_program => true) AS t(l text); server closed the connection unexpectedly This probably means the server terminated abnormally before or while processing the request. The connection to the server was lost. Attempting reset: Failed. Best, David. -- David Fetter http://fetter.org/ Phone: +1 415 235 3778 AIM: dfetter666 Yahoo!: dfetter Skype: davidfetter XMPP: david(dot)fetter(at)gmail(dot)com Remember to vote! 
Consider donating to Postgres: http://www.postgresql.org/about/donate diff --git a/src/backend/catalog/system_views.sql b/src/backend/catalog/system_views.sql index 4dfedf8..ae07cfb 100644 --- a/src/backend/catalog/system_views.sql +++ b/src/backend/catalog/system_views.sql @@ -1065,6 +1065,21 @@ LANGUAGE INTERNAL STRICT IMMUTABLE PARALLEL SAFE AS 'jsonb_insert'; +CREATE OR REPLACE FUNCTION copy_srf( + IN filename text DEFAULT null, + IN is_program boolean DEFAULT false, + IN format text DEFAULT 'text', + IN delimiter text DEFAULT null, + IN null_string text DEFAULT E'\\N', + IN header boolean DEFAULT false, + IN quote text DEFAULT null, + IN escape text DEFAULT null, + IN encoding text DEFAULT null) +RETURNS SETOF RECORD +LANGUAGE INTERNAL +VOLATILE ROWS 1000 COST 1000 CALLED ON NULL INPUT +AS 'copy_srf'; + -- The default permissions for functions mean that anyone can execute them. -- A number of functions shouldn't be executable by just anyone, but rather -- than use explicit 'superuser()' checks in those functions, we use the GRANT diff --git a/src/backend/commands/copy.c b/src/backend/commands/copy.c index f9362be..8e1bd39 100644 --- a/src/backend/commands/copy.c +++ b/src/backend/commands/copy.c @@ -30,6 +30,7 @@ #include "commands/defrem.h" #include "commands/trigger.h" #include "executor/executor.h" +#include "funcapi.h" #include "libpq/libpq.h" #include "libpq/pqformat.h" #include "mb/pg_wchar.h" @@ -562,7 +563,6 @@ CopyGetData(CopyState cstate, void *databuf, int minread, int maxread) errmsg("could not read from COPY file: %m"))); break; case COPY_OLD_FE: - /* * We cannot read more than minread bytes (which in practice is 1) * because old protocol doesn't have any clear way of separating @@ -4740,3 +4740,377 @@ CreateCopyDestReceiver(void) return (DestReceiver *) self; } + +Datum +copy_srf(PG_FUNCTION_ARGS) +{ + ReturnSetInfo *rsinfo = (ReturnSetInfo *) fcinfo->resultinfo; + TupleDesc tupdesc; + Tuplestorestate *tupstore = NULL; + MemoryContext 
per_query_ctx; + MemoryContext oldcontext; + FmgrInfo*in_functions; + Oid *typioparams; + Oid in_func_oid; + + CopyStateData copy_state; + int col; + + Datum *values; + bool*nulls; + + /* check to see if caller supports us returning a tuplestore */ + if (rsinfo == NULL || !IsA(rsinfo, ReturnSetInfo)) + { + ereport(ERROR, + (errcode(ERRCODE_FEATURE_NOT_SUPPORTED), +errmsg("set-valued function called in context that cannot accept a set"))); + } + + if (!(rsinfo->allowedModes & SFRM_Materialize) || rsinfo->expectedDesc == NULL) + { + ereport(ERROR, + (errcode(ERRCODE_FEATURE_NOT_SUPPORTED), +errmsg("materialize mode required, but it is not allowed in this context"))); + } + + tupdesc = CreateTupleDescCopy(rsinfo->expectedDesc); + values = (Datum *) palloc(tupdesc->natts * sizeof(Datum)); + nulls = (bool *) palloc(tupdesc->natts * sizeof(bool)); + in_functions = (FmgrInfo *) palloc(tupdesc->natts * sizeof(FmgrInfo)); + typioparams = (Oid *) palloc(tupdesc->natts * sizeof(Oid)); + + for (col = 0; col < tupdesc->natts; col++) + { + getTypeInputInfo(tupdesc->attrs[col]->atttypid,_func_oid,[col]); +
Re: [HACKERS] PoC: Grouped base relation
David Rowley wrote: > On 20 January 2017 at 00:22, Antonin Houska wrote: > > Sorry, it was my thinko - I somehow confused David's CROSS JOIN example with > > this one. If one side of the join clause is unique and the other becomes > > unique due to aggregation (and if parallel processing is not engaged) then > > neither combinefn nor multiplyfn should be necessary before the finalfn. > > Yes, if the join can be detected not to duplicate the groups then a > normal aggregate node can be pushed below the join. No need for > Partial Aggregate, or Finalize Aggregate nodes. > > I've a pending patch in the commitfest named "Unique Joins", which > aims to teach the planner about the unique properties of joins. So you > should just have both stages of aggregation occur for now, and that > can be improved on once the planner is a bit smarter and knows about > unique joins. Thanks for the hint. I hadn't paid attention to the "Unique Joins" patch until today. Yes, that's definitely useful. Given the progress of your patch, I'm not worried about making the next version of my patch depend on it. Implementing a temporary solution for the aggregation push-down seems to me like wasted effort. -- Antonin Houska Cybertec Schönig & Schönig GmbH Gröhrmühlgasse 26 A-2700 Wiener Neustadt Web: http://www.postgresql-support.de, http://www.cybertec.at
Re: [HACKERS] Patch: Write Amplification Reduction Method (WARM)
Reading 0001_track_root_lp_v9.patch again: > +/* > + * We use the same HEAP_LATEST_TUPLE flag to check if the tuple's t_ctid > field > + * contains the root line pointer. We can't use the same > + * HeapTupleHeaderIsHeapLatest macro because that also checks for > TID-equality > + * to decide whether a tuple is at the end of the chain > + */ > +#define HeapTupleHeaderHasRootOffset(tup) \ > +( \ > + ((tup)->t_infomask2 & HEAP_LATEST_TUPLE) != 0 \ > +) > > +#define HeapTupleHeaderGetRootOffset(tup) \ > +( \ > + AssertMacro(((tup)->t_infomask2 & HEAP_LATEST_TUPLE) != 0), \ > + ItemPointerGetOffsetNumber(&(tup)->t_ctid) \ > +) Interesting stuff; it took me a bit to see why these macros are this way. I propose the following wording which I think is clearer: Return whether the tuple has a cached root offset. We don't use HeapTupleHeaderIsHeapLatest because that one also considers the slow case of scanning the whole block. Please flag the macros that have multiple evaluation hazards -- there are a few of them. > +/* > + * If HEAP_LATEST_TUPLE is set in the last tuple in the update chain. But for > + * clusters which are upgraded from pre-10.0 release, we still check if t_ctid > + * is pointing to itself and declare such tuple as the latest tuple in the > + * chain > + */ > +#define HeapTupleHeaderIsHeapLatest(tup, tid) \ > +( \ > + (((tup)->t_infomask2 & HEAP_LATEST_TUPLE) != 0) || \ > + ((ItemPointerGetBlockNumber(&(tup)->t_ctid) == > ItemPointerGetBlockNumber(tid)) && \ > + (ItemPointerGetOffsetNumber(&(tup)->t_ctid) == > ItemPointerGetOffsetNumber(tid))) \ > +) I suggest rewording this comment as: Starting from PostgreSQL 10, the latest tuple in an update chain has HEAP_LATEST_TUPLE set; but tuples upgraded from earlier versions do not. For those, we determine whether a tuple is latest by testing that its t_ctid points to itself. (as discussed, there is no "10.0 release"; it's called the "10 release" only, no ".0". Feel free to use "v10" or "pg10"). 
> +/* > + * Get TID of next tuple in the update chain. Caller should have checked that > + * we are not already at the end of the chain because in that case t_ctid may > + * actually store the root line pointer of the HOT chain whose member this > + * tuple is. > + */ > +#define HeapTupleHeaderGetNextTid(tup, next_ctid) \ > +do { \ > + AssertMacro(!((tup)->t_infomask2 & HEAP_LATEST_TUPLE)); \ > + ItemPointerCopy(&(tup)->t_ctid, (next_ctid)); \ > +} while (0) Actually, I think this macro could just return the TID so that it can be used as struct assignment, just like ItemPointerCopy does internally -- callers can do ctid = HeapTupleHeaderGetNextTid(tup); or more precisely, this pattern > + if (!HeapTupleHeaderIsHeapLatest(tp.t_data, &tp.t_self)) > + HeapTupleHeaderGetNextTid(tp.t_data, &hufd->ctid); > + else > + ItemPointerCopy(&tp.t_self, &hufd->ctid); becomes hufd->ctid = HeapTupleHeaderIsHeapLatest(foo) ? HeapTupleHeaderGetNextTid(foo) : tp.t_self; or something like that. I further wonder if it'd make sense to hide this into yet another macro. The API of RelationPutHeapTuple appears a bit contorted, where root_offnum is both input and output. I think it's cleaner to have the argument be the input, and have the output offset be the return value -- please check whether that simplifies things; for example I think this: > + root_offnum = InvalidOffsetNumber; > + RelationPutHeapTuple(relation, buffer, heaptup, false, > + &root_offnum); becomes root_offnum = RelationPutHeapTuple(relation, buffer, heaptup, false, InvalidOffsetNumber); Please remove the words "must have" in this comment: > + /* > + * Also mark both copies as latest and set the root offset information. > If > + * we're doing a HOT/WARM update, then we just copy the information from > + * old tuple, if available or computed above. 
For regular updates, > + * RelationPutHeapTuple must have returned us the actual offset number > + * where the new version was inserted and we store the same value since > the > + * update resulted in a new HOT-chain > + */ Many comments lack finishing periods in complete sentences, which looks odd. Please fix. I have not looked at the other patch yet. -- Álvaro Herrera https://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Re: [HACKERS] pg_ls_dir & friends still have a hard-coded superuser check
I tend to agree. But in the past when this came up people pointed out you could equally do things this way and still grant all the access you wanted using SECURITY DEFINER. Arguably that's a better approach because then instead of auditing the entire monitor script you only need to audit this one wrapper function, pg_ls_monitor_dir() which just calls pg_ls_dir() on this one directory. -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] pg_ls_dir & friends still have a hard-coded superuser check
On Wed, Jan 25, 2017 at 11:30 AM, Greg Starkwrote: > I tend to agree. But in the past when this came up people pointed out > you could equally do things this way and still grant all the access > you wanted using SECURITY DEFINER. Arguably that's a better approach > because then instead of auditing the entire monitor script you only > need to audit this one wrapper function, pg_ls_monitor_dir() which > just calls pg_ls_dir() on this one directory. I agree that can be done, but it's nicer if you can use the same SQL all the time. With that solution, you need one SQL query to run when you've got superuser privileges and a different SQL query to run when you are running without superuser privileges but somebody's run the create-security-definer-wrappers-for-me script. That's a deployment nuisance if you want to support both configurations. Also, the same argument could be made about removing the built-in superuser check from ANY function, and we've already rejected that argument for a bunch of other functions. If we say that argument is valid for some functions but not others, then we've got to decide for which ones it's valid and for which ones it isn't, and consensus will not be forthcoming. I take the position that hard-coded superuser checks stink in general, and I'm grateful to Stephen for his work making dump/restore work properly on system catalog permissions so that we can support better alternatives. I'm not asking for anything more than that we apply that same policy here as we have in other cases. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Vacuum: allow usage of more than 1GB of work mem
On Tue, Jan 24, 2017 at 1:49 AM, Claudio Freirewrote: > On Fri, Jan 20, 2017 at 6:24 AM, Masahiko Sawada > wrote: >> On Thu, Jan 19, 2017 at 8:31 PM, Claudio Freire >> wrote: >>> On Thu, Jan 19, 2017 at 6:33 AM, Anastasia Lubennikova >>> wrote: 28.12.2016 23:43, Claudio Freire: Attached v4 patches with the requested fixes. Sorry for being late, but the tests took a lot of time. >>> >>> I know. Takes me several days to run my test scripts once. >>> create table t1 as select i, md5(random()::text) from generate_series(0,4) as i; create index md5_idx ON t1(md5); update t1 set md5 = md5((random() * (100 + 500))::text); vacuum; Patched vacuum used 2.9Gb of memory and vacuumed the index in one pass, while for old version it took three passes (1GB+1GB+0.9GB). Vacuum duration results: vanilla: LOG: duration: 4359006.327 ms statement: vacuum verbose t1; patched: LOG: duration: 3076827.378 ms statement: vacuum verbose t1; We can see 30% vacuum speedup. I should note that this case can be considered as favorable to vanilla vacuum: the table is not that big, it has just one index and disk used is a fast fusionIO. We can expect even more gain on slower disks. Thank you again for the patch. Hope to see it in 10.0. >>> >>> Cool. Thanks for the review and the tests. >>> >> >> I encountered a bug with following scenario. >> 1. Create table and disable autovacuum on that table. >> 2. Make about 20 dead tuples on the table. >> 3. SET maintenance_work_mem TO 1024 >> 4. VACUUM >> >> @@ -729,7 +759,7 @@ lazy_scan_heap(Relation onerel, int options, >> LVRelStats *vacrelstats, >> * not to reset latestRemovedXid since we want >> that value to be >> * valid. >> */ >> - vacrelstats->num_dead_tuples = 0; >> + lazy_clear_dead_tuples(vacrelstats); >> vacrelstats->num_index_scans++; >> >> /* Report that we are once again scanning the heap */ >> >> I think that we should do vacrelstats->dead_tuples.num_entries = 0 as >> well in lazy_clear_dead_tuples(). 
Once the amount of dead tuples >> reaches maintenance_work_mem, lazy_scan_heap can never finish. > > That's right. > > I added a test for it in the attached patch set, which uncovered > another bug in lazy_clear_dead_tuples, and took the opportunity to > rebase. > > On Mon, Jan 23, 2017 at 1:06 PM, Alvaro Herrera > wrote: >> I pushed this patch after rewriting it rather completely. I added >> tracing notices to inspect the blocks it was prefetching and observed >> that the original coding was failing to prefetch the final streak of >> blocks in the table, which is an important oversight considering that it >> may very well be that those are the only blocks to read at all. >> >> I timed vacuuming a 4000-block table on my laptop (single SSD disk; >> dropped FS caches after deleting all rows in table, so that vacuum has >> to read all blocks from disk); it changes from 387ms without patch to >> 155ms with patch. I didn't measure how much it takes to run the other >> steps in the vacuum, but it's clear that the speedup for the truncation >> phase is considerable. >> >> ¡Thanks, Claudio! > > Cool. > > Though it wasn't the first time this idea has been floating around, I > can't take all the credit. > > > On Fri, Jan 20, 2017 at 6:25 PM, Alvaro Herrera > wrote: >> FWIW, I think this patch is completely separate from the maint_work_mem >> patch and should have had its own thread and its own commitfest entry. >> I intend to get a look at the other patch next week, after pushing this >> one. > > That's because it did have one, and was left in limbo due to lack of > testing on SSDs. I just had to adopt it here because otherwise tests > took way too long. Thank you for updating the patch! 
+ /* + * Quickly rule out by lower bound (should happen a lot). Upper bound was + * already checked by segment search. + */ + if (vac_cmp_itemptr((void *) itemptr, (void *) rseg->dead_tuples) < 0) + return false; I think that if the above result is 0, we can return true, as itemptr matches the lower-bound item pointer in rseg->dead_tuples. +typedef struct DeadTuplesSegment +{ + int num_dead_tuples; /* # of entries in the segment */ + int max_dead_tuples; /* # of entries allocated in the segment */ + ItemPointerData last_dead_tuple; /* Copy of the last dead tuple (unset + * until the segment is fully + * populated) */ + unsigned short padding; + ItemPointer dead_tuples;
Re: [HACKERS] COPY as a set returning function
David Fetter wrote: > @@ -562,7 +563,6 @@ CopyGetData(CopyState cstate, void *databuf, int minread, > int maxread) >errmsg("could not read from > COPY file: %m"))); > break; > case COPY_OLD_FE: > - > /* >* We cannot read more than minread bytes (which in > practice is 1) >* because old protocol doesn't have any clear way of > separating This change is pointless as it'd be undone by pgindent. > + /* > + * Function signature is: > + * copy_srf( filename text default null, > + * is_program boolean default false, > + * format text default 'text', > + * delimiter text default E'\t' in text mode, ',' in csv mode, > + * null_string text default '\N', > + * header boolean default false, > + * quote text default '"' in csv mode only, > + * escape text default null, -- defaults to whatever quote is > + * encoding text default null > + */ This comment would be mangled by pgindent -- please add an /*--- marker to prevent it. > + /* param 7: escape text default null, -- defaults to whatever quote is > */ > + if (PG_ARGISNULL(7)) > + { > + copy_state.escape = copy_state.quote; > + } > + else > + { > + if (copy_state.csv_mode) > + { > + copy_state.escape = > TextDatumGetCString(PG_GETARG_TEXT_P(7)); > + if (strlen(copy_state.escape) != 1) > + { > + ereport(ERROR, > + > (errcode(ERRCODE_FEATURE_NOT_SUPPORTED), > + errmsg("COPY escape must be a > single one-byte character"))); > + } > + } > + else > + { > + ereport(ERROR, > + (errcode(ERRCODE_FEATURE_NOT_SUPPORTED), > + errmsg("COPY escape available only in > CSV mode"))); > + } > + } I don't understand why do we have all these checks. Can't we just pass the values to COPY and let it apply the checks? That way, when COPY is updated to support multibyte escape chars (for example) we don't need to touch this code. Together with removing the unneeded braces that would make these stanzas about six lines long instead of fifteen. 
> + tuple = heap_form_tuple(tupdesc,values,nulls); > + //tuple = BuildTupleFromCStrings(attinmeta, field_strings); > + tuplestore_puttuple(tupstore, tuple); No need to form a tuple; use tuplestore_putvalues here. > + } > + > + /* close "file" */ > + if (copy_state.is_program) > + { > + int pclose_rc; > + > + pclose_rc = ClosePipeStream(copy_state.copy_file); > + if (pclose_rc == -1) > + ereport(ERROR, > + (errcode_for_file_access(), > + errmsg("could not close pipe to > external command: %m"))); > + else if (pclose_rc != 0) > + ereport(ERROR, > + > (errcode(ERRCODE_EXTERNAL_ROUTINE_EXCEPTION), > + errmsg("program \"%s\" failed", > + copy_state.filename), > + errdetail_internal("%s", > wait_result_to_str(pclose_rc; > + } > + else > + { > + if (copy_state.filename != NULL && > FreeFile(copy_state.copy_file)) > + ereport(ERROR, > + (errcode_for_file_access(), > + errmsg("could not close file \"%s\": > %m", > + copy_state.filename))); > + } I wonder if these should be an auxiliary function in copy.c to do this. Surely copy.c itself does pretty much the same thing ... > +DATA(insert OID = 3353 ( copy_srf PGNSP PGUID 12 1 0 0 0 f f f f f t v u 9 > 0 2249 "25 16 25 25 25 16 25 25 25" _null_ _null_ _null_ _null_ _null_ > copy_srf _null_ _null_ _null_ )); > +DESCR("set-returning COPY proof of concept"); Why is this marked "proof of concept"? If this is a PoC only, why are you submitting it as a patch? -- Álvaro Herrerahttps://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] tuplesort_gettuple_common() and *should_free argument
On Wed, Jan 25, 2017 at 2:49 PM, Tom Lane wrote: > I looked at the 0002 patch, and while the code is probably OK, I am > dissatisfied with this API spec: > > + * If copy is TRUE, the slot receives a copied tuple that will stay valid > + * regardless of future manipulations of the tuplesort's state. Memory is > + * owned by the caller. If copy is FALSE, the slot may just receive a > pointer > + * to a tuple held within the tuplesort. The latter is more efficient, but > + * the slot contents may be corrupted if there is another call here before > + * previous slot contents are used. > > What does "here" mean? If that means specifically "another call of > tuplesort_gettupleslot", say so. If "here" refers to the whole module, > it would be better to say something like "the slot contents may be > invalidated by any subsequent manipulation of the tuplesort's state". > In any case it'd be a good idea to delineate safe usage patterns, perhaps > "copy=FALSE is recommended only when the next tuplesort manipulation will > be another tuplesort_gettupleslot fetch into the same slot." I agree with your analysis. It means "another call to tuplesort_gettupleslot", but I believe that it would be safer (more future-proof) to actually specify "the slot contents may be invalidated by any subsequent manipulation of the tuplesort's state" instead. > There are several other uses of "call here", both in this patch and > pre-existing in tuplesort.c, that I find equally vague and unsatisfactory. > Let's try to improve that. Should I write a patch along those lines? -- Peter Geoghegan
Re: [HACKERS] safer node casting
Hi Peter, On 2016-12-31 12:08:22 -0500, Peter Eisentraut wrote: > RestrictInfo *rinfo = castNode(RestrictInfo, lfirst(lc)); Are you planning to add this / update this patch? Because I really would have liked this a number of times already... I can update it according to my suggestions (to avoid multiple eval scenarios) if helpful. Greetings, Andres Freund
Re: [HACKERS] Checksums by default?
On Thu, Jan 26, 2017 at 9:14 AM, Peter Geoghegan wrote: > On Wed, Jan 25, 2017 at 3:30 PM, Stephen Frost wrote: >> As it is, there are backup solutions which *do* check the checksum when >> backing up PG. This is no longer, thankfully, some hypothetical thing, >> but something which really exists and will hopefully keep users from >> losing data. > > Wouldn't that have issues with torn pages? Why? What do you foresee here? I would think such backup solutions are careful enough to correctly ensure the durability of pages so that they are not partially written. -- Michael
Re: [HACKERS] safer node casting
Andres Freund writes: > On 2016-12-31 12:08:22 -0500, Peter Eisentraut wrote: >> RestrictInfo *rinfo = castNode(RestrictInfo, lfirst(lc)); > Are you planning to add this / update this patch? Because I really would > have liked this a number of times already... I can update it according > to my suggestions (to avoid multiple eval scenarios) if helpful. Yeah, I'd like that in sooner rather than later, too. But we do need it to be foolproof - no multiple evals. The first draft had inadequate-parenthesization hazards, too. regards, tom lane
Re: [HACKERS] Checksums by default?
On 2017-01-26 09:19:28 +0900, Michael Paquier wrote: > On Thu, Jan 26, 2017 at 9:14 AM, Peter Geoghegan wrote: > > On Wed, Jan 25, 2017 at 3:30 PM, Stephen Frost wrote: > >> As it is, there are backup solutions which *do* check the checksum when > >> backing up PG. This is no longer, thankfully, some hypothetical thing, > >> but something which really exists and will hopefully keep users from > >> losing data. > > > > Wouldn't that have issues with torn pages? > > Why? What do you foresee here? I would think such backup solutions are > careful enough to correctly ensure the durability of pages so that > they are not partially written. That means you have to replay enough WAL to get into a consistent state... - Andres
Re: [HACKERS] Checksums by default?
On Thu, Jan 26, 2017 at 9:22 AM, Andres Freund wrote: > On 2017-01-26 09:19:28 +0900, Michael Paquier wrote: >> On Thu, Jan 26, 2017 at 9:14 AM, Peter Geoghegan wrote: >> > On Wed, Jan 25, 2017 at 3:30 PM, Stephen Frost wrote: >> >> As it is, there are backup solutions which *do* check the checksum when >> >> backing up PG. This is no longer, thankfully, some hypothetical thing, >> >> but something which really exists and will hopefully keep users from >> >> losing data. >> > >> > Wouldn't that have issues with torn pages? >> >> Why? What do you foresee here? I would think such backup solutions are >> careful enough to correctly ensure the durability of pages so that >> they are not partially written. > > That means you have to replay enough WAL to get into a consistent > state... Ah, OK, I get the point. Yes, checking this field on raw backups would be a problem, unless the page size matches the kernel's 4k one. -- Michael
Re: [HACKERS] Checksums by default?
On Wed, Jan 25, 2017 at 7:19 PM, Michael Paquier wrote: > On Thu, Jan 26, 2017 at 9:14 AM, Peter Geoghegan wrote: >> On Wed, Jan 25, 2017 at 3:30 PM, Stephen Frost wrote: >>> As it is, there are backup solutions which *do* check the checksum when >>> backing up PG. This is no longer, thankfully, some hypothetical thing, >>> but something which really exists and will hopefully keep users from >>> losing data. >> >> Wouldn't that have issues with torn pages? > > Why? What do you foresee here? I would think such backup solutions are > careful enough to correctly ensure the durability of pages so that > they are not partially written. Well, you'd have to keep a read(fd, buf, 8192) performed by the backup tool from overlapping with a write(fd, buf, 8192) performed by the backend. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Re: [HACKERS] Checksums by default?
On 2017-01-25 19:30:08 -0500, Stephen Frost wrote: > * Peter Geoghegan (p...@heroku.com) wrote: > > On Wed, Jan 25, 2017 at 3:30 PM, Stephen Frost wrote: > > > As it is, there are backup solutions which *do* check the checksum when > > > backing up PG. This is no longer, thankfully, some hypothetical thing, > > > but something which really exists and will hopefully keep users from > > > losing data. > > > > Wouldn't that have issues with torn pages? > > No, why would it? The page has either been written out by PG to the OS, > in which case the backup s/w will see the new page, or it hasn't been. Uh. Writes aren't atomic on that granularity. That means you very well *can* see a torn page (in Linux you can, e.g., on 4KB OS page boundaries of an 8KB postgres page). Just read a page while it's being written out. You simply can't reliably verify checksums without replaying WAL (or creating a manual version of replay, as in checking the WAL for a FPW). > This isn't like a case where only half the page made it to the disk > because of a system failure though; everything is online and working > properly during an online backup. I don't think that really changes anything. Greetings, Andres Freund
Re: [HACKERS] Checksums by default?
* Andres Freund (and...@anarazel.de) wrote: > On 2017-01-25 19:30:08 -0500, Stephen Frost wrote: > > * Peter Geoghegan (p...@heroku.com) wrote: > > > On Wed, Jan 25, 2017 at 3:30 PM, Stephen Frost wrote: > > > > As it is, there are backup solutions which *do* check the checksum when > > > > backing up PG. This is no longer, thankfully, some hypothetical thing, > > > > but something which really exists and will hopefully keep users from > > > > losing data. > > > > > > Wouldn't that have issues with torn pages? > > > > No, why would it? The page has either been written out by PG to the OS, > > in which case the backup s/w will see the new page, or it hasn't been. > > Uh. Writes aren't atomic on that granularity. That means you very well > *can* see a torn page (in linux you can e.g. on 4KB os page boundaries > of a 8KB postgres page). Just read a page while it's being written out. > > You simply can't reliably verify checksums without replaying WAL (or > creating a manual version of replay, as in checking the WAL for a FPW). Looking through the WAL isn't any surprise and is something we've been planning to do for other reasons anyway. Thanks! Stephen
Re: [HACKERS] Checksums by default?
Michael, * Michael Paquier (michael.paqu...@gmail.com) wrote: > That would be enough. It should also be rare enough that there would > not be that many pages to track when looking at records from the > backup start position to minimum recovery point. It could be also > simpler, though more time-consuming, to just let a backup recover up > to the minimum recovery point (recovery_target = 'immediate'), and > then run the checksum sanity checks. There are other checks usually > needed on a backup anyway like being sure that index pages are in good > shape even with a correct checksum, etc. Believe me, I'm all for *all* of that. > But here I am really hijacking the thread, so I'll stop.. If you have further thoughts, I'm all ears. This is all relatively new, and I don't expect to have all of the answers or solutions. Obviously, having to bring up a full database is an extra step (one we try to make easy to do), but, sadly, we don't have any way to ask PG to verify all the checksums with released versions, so that's what we're working with. Thanks! Stephen
Re: [HACKERS] patch: function xmltable
2017-01-25 15:07 GMT+01:00 Alvaro Herrera: > Tom Lane wrote: > > Andres Freund writes: > > > On 2017-01-24 21:32:56 -0300, Alvaro Herrera wrote: > > >> XMLTABLE is specified by the standard to return multiple rows ... but > > >> then as far as my reading goes, it is only supposed to be supported in > > >> the range table (FROM clause) not in the target list. I wonder if > > >> this would end up better if we only tried to support it in RT. I asked > > >> Pavel to implement it like that a few weeks ago, but ... > > > > > Right - it makes sense in the FROM list - but then it should be an > > > executor node, instead of some expression thingy. > > > > +1 --- we're out of the business of having simple expressions that > > return rowsets. > > Well, that's it. I'm not committing this patch against two other > committers' opinion, plus I was already on the fence about the > implementation anyway. I think you should just go with the flow and > implement this by creating nodeTableexprscan.c. It's not even > difficult. > I am playing with this, and the patch ends up about 15kB larger just from implementing the basic scan functionality - and I have not touched the planner yet. I am not happy with this; I am still tempted to try reimplementing it as a reduced SRF instead. Regards Pavel > > -- > Álvaro Herrera https://www.2ndQuadrant.com/ > PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services >
Re: [HACKERS] pg_ls_dir & friends still have a hard-coded superuser check
Robert, * Robert Haas (robertmh...@gmail.com) wrote: > On Wed, Jan 25, 2017 at 3:17 PM, Stephen Frost wrote: > > Your email is 'pg_ls_dir & friends', which I took to imply at *least* > > pg_read_file() and pg_read_binary_file(), and it's not unreasonable to > > think you may have meant everything in adminpack, which also includes > > pg_file_write(), pg_file_rename() and pg_file_unlink(). > > Well, I was talking about genfile.c, which doesn't contain those > functions, but for the record I'm in favor of extirpating every single > hard-coded superuser check from the backend without exception. Then it was correct for me to assume that's what you meant, and means my reaction and response are on-point. > Preventing people from calling functions by denying the ability to > meaningfully GRANT EXECUTE on those functions doesn't actually stop > them from delegating those functions to non-superusers. It either (a) > causes them to give superuser privileges to accounts that otherwise > would have had lesser privileges or (b) forces them to use wrapper > functions to grant access rather than doing it directly or (c) some > other dirty hack. If they pick (a), security is worse; if they pick > (b) or (c), you haven't prevented them from doing what they wanted to > do anyway. You've just made it annoying and inconvenient. The notion that security is 'worse' under (a) is flawed - it's no different. With regard to 'b', if their wrapper function is sufficiently careful to ensure that the caller isn't able to do anything which would increase the caller's level to that of superuser, then security is improved. If the wrapper simply turns around and calls the underlying function, then it's no different from '(a)'. I am suggesting that we shouldn't make it look like there are distinctions when there is actually no difference. That is a good thing for our users because it keeps them informed about what they're actually doing when they grant access. 
> Both EnterpriseDB's PEM and check_postgres.pl currently have a bunch > of things that don't work unless you run as superuser. I think we > should be trying to provide ways of reducing those lists. If I can't > get agreement on method (a), I'm going to go look for ways of doing > (b) or (c), which is more work and uglier but if I can't get consensus > here then oh well. I've commented on here and spoken numerous times about exactly that goal of reducing the checks in check_postgres.pl which require superuser. You're not actually doing that, and nothing you've outlined here so far makes me believe you see how having pg_write_file() access is equivalent to giving someone superuser, and that concerns me. As someone who has contributed code and committed code back to check_postgres.pl, I would be against making changes there which install security definer functions to give the monitoring user superuser-level access, and I believe Greg S-M would feel the same way considering that he and I have discussed exactly that in the past. I don't mean to speak for him and perhaps his opinion has changed, but it seems unlikely to me. If the DBA wants to give the monitoring user superuser-level access to run the superuser-requiring checks that check_postgres.pl has, they're welcome to do so, but they'll be making an informed decision that they have weighed the risk of their monitoring user being compromised against the value of that additional monitoring, which is an entirely appropriate and reasonable decision for them to make. > I do not accept your authority to determine what monitoring tools need > to or should do. Monitoring tools that use pg_ls_dir are facts on the > ground, and your disapprobation doesn't change that at all. It just > obstructs moving to a saner permissions framework. Allowing GRANT to be used to give access to pg_write_file and friends is not a 'permissions framework'. 
Further, I am not claiming authority over what monitoring tools should need to do or not do, and a great many people run their monitoring tools as superuser. I am not trying to take away their ability to do so. The way to make progress here is not, however, to just decide that all those superuser() checks we put in place were silly and should be removed, it's to provide better ways to monitor PG which provide exactly the monitoring information needed in a useful and sensible way. I understand the allure of just removing a few lines of code to make things "easier" or "faster" or "better", but I don't think it's a good idea to remove these superuser checks, nor do I think it's a good idea to remove our WAL CRCs even if it'd help some of my clients, nor do I think it's a good idea to have checksums disabled by default, or to rip out all of WAL as being "unnecessary" because we have ZFS and it should handle all things data and we could just fsync all of the heap files on commit and be done with it. Admittedly, you're not arguing for half of what I just
Re: [HACKERS] pg_ls_dir & friends still have a hard-coded superuser check
Andres, * Andres Freund (and...@anarazel.de) wrote: > On 2017-01-25 16:52:38 -0500, Stephen Frost wrote: > > * Robert Haas (robertmh...@gmail.com) wrote: > > > Preventing people from calling functions by denying the ability to > > > meaningfully GRANT EXECUTE on those functions doesn't actually stop > > > them from delegating those functions to non-superusers. It either (a) > > > causes them to give superuser privileges to accounts that otherwise > > > would have had lesser privileges or (b) forces them to use wrapper > > > functions to grant access rather than doing it directly or (c) some > > > other dirty hack. If they pick (a), security is worse; if they pick > > > (b) or (c), you haven't prevented them from doing what they wanted to > > > do anyway. You've just made it annoying and inconvenient. > > > > The notion that security is 'worse' under (a) is flawed- it's no > > different. > > Huh? Obviously that's nonesense, given the pg_ls_dir example. Robert's made it clear that he'd like to have a blanket rule that we don't have superuser checks in these code paths if they can be GRANT'd at the database level, which goes beyond pg_ls_dir. If the question was only about pg_ls_dir, then I still wouldn't be for it, because, as the bits you didn't quote discussed, it encourages users and 3rd party tool authors to base more things off of pg_ls_dir to look into the way PG stores data on disk, and affords more access than the monitoring user has any need for, none of which are good, imv. It also discourages people from implementing proper solutions which you can 'just use pg_ls_dir()', which I also don't agree with. If you really want to do an ls, go talk to the OS. ACLs are possible to provide that with more granularity than what would be available through pg_ls_dir(). We aren't in the "give a user the ability to do an ls" business, frankly. 
> > With regard to 'b', if their wrapper function is > > sufficiently careful to ensure that the caller isn't able to do anything > > which would increase the caller's level to that of superuser, then > > security is improved. > > Given how complex "sufficiently careful" is for security definer UDFs, > in comparison to estimating the security of granting to a function like > pg_ls_dir (or pretty much any other that doesn't call out to SQL level > stuff like operators, output functions, etc), I don't understand this. If you're implying that security definer UDFs are hard to write and get correct, then I agree with you there. I was affording the benefit of the doubt to that proposed approach. > > If the wrapper simply turns around can calls the underlying function, > > then it's no different from '(a)'. > > Except for stuff like search path. If you consider '(a)' to be the same as superuser, which I was postulating, then this doesn't strike me as making terribly much difference. > > I am suggesting that we shouldn't make it look like there are > > distinctions when there is actually no difference. That is a good thing > > for our users because it keeps them informed about what they're actually > > doing when they grant access. > > This position doesn't make a lick of sense to me. There's simply no > benefit at all in requiring to create wrapper functions, over allowing > to grant to non-superuser. Both is possible, secdef is a lot harder to > get right. And you already heavily started down the path of removing > superuser() type checks - you're just arguing to make it more or less > randomly less consistent. I find this bizarre considering I went through a detailed effort to go look at every superuser check in the system and discussed, on this list, the reasoning behind each and every one of them. I do generally consider arbitrary access to syscalls via the database to be a privilege which really only the superuser should have. 
> > I've commented on here and spoken numerous times about exactly that goal > > of reducing the checks in check_postgres.pl which require superuser. > > You're not actually doing that and nothing you've outlined in here so > > far makes me believe you see how having pg_write_file() access is > > equivalent to giving someone superuser, and that concerns me. > > That's the user's responsibility, and Robert didn't really suggest > granting pg_write_file() permissions, so this seems to be a straw man. He was not arguing for only pg_ls_dir(), but for checks in all "friends", which he later clarified to include pg_write_file(). Thanks! Stephen
Re: [HACKERS] tuplesort_gettuple_common() and *should_free argument
Peter Geoghegan writes: > I should have already specifically pointed out that the original > discussion on what became 0002-* is here: > postgr.es/m/7256.1476711...@sss.pgh.pa.us > As I said already, the general idea seems uncontroversial. I looked at the 0002 patch, and while the code is probably OK, I am dissatisfied with this API spec: + * If copy is TRUE, the slot receives a copied tuple that will stay valid + * regardless of future manipulations of the tuplesort's state. Memory is + * owned by the caller. If copy is FALSE, the slot may just receive a pointer + * to a tuple held within the tuplesort. The latter is more efficient, but + * the slot contents may be corrupted if there is another call here before + * previous slot contents are used. What does "here" mean? If that means specifically "another call of tuplesort_gettupleslot", say so. If "here" refers to the whole module, it would be better to say something like "the slot contents may be invalidated by any subsequent manipulation of the tuplesort's state". In any case it'd be a good idea to delineate safe usage patterns, perhaps "copy=FALSE is recommended only when the next tuplesort manipulation will be another tuplesort_gettupleslot fetch into the same slot." There are several other uses of "call here", both in this patch and pre-existing in tuplesort.c, that I find equally vague and unsatisfactory. Let's try to improve that. regards, tom lane
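The invalidation hazard Tom is describing is easy to demonstrate outside the sort code. Below is a hypothetical Python sketch (none of these names exist in PostgreSQL; it is only an analogy) of a fetch that, with copy=False, hands back a reference into a reused internal slot rather than a caller-owned copy:

```python
# Hypothetical illustration, not PostgreSQL code: with copy=False the
# caller receives a reference into storage that the next fetch reuses,
# so earlier results are silently clobbered.

class ToySort:
    """Yields sorted items through a single reused internal slot."""

    def __init__(self, items):
        self._pending = sorted(items)
        self._slot = []  # reused buffer, analogous to tuplesort-owned memory

    def gettuple(self, copy):
        value = self._pending.pop(0)
        self._slot.clear()
        self._slot.append(value)
        # copy=True: the caller owns the result.
        # copy=False: the caller sees the shared slot, valid only until
        # the next call overwrites it.
        return list(self._slot) if copy else self._slot

ts = ToySort([3, 1, 2])
unsafe = ts.gettuple(copy=False)   # -> [1], but only until the next fetch
safe = ts.gettuple(copy=True)      # -> [2], owned by the caller
# The second fetch reused the shared slot, corrupting `unsafe`:
assert unsafe == [2] and safe == [2]
```

This is exactly why the spec wording matters: the caller needs to know precisely which later operations invalidate a copy=False result.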
Re: [HACKERS] tuplesort_gettuple_common() and *should_free argument
[ in the service of closing out this thread... ] Peter Geoghegan writes: > Finally, 0003-* is a Valgrind suppression borrowed from my parallel > CREATE INDEX patch. It's self-explanatory. Um, I didn't find it all that self-explanatory. Why wouldn't we want to avoid writing undefined data? I think the comment at least needs to explain exactly what part of the written data might be uninitialized. And I'd put the comment into valgrind.supp, too, not in the commit msg. Also, the suppression seems far too broad. It would for instance block any complaint about a write() invoked via an elog call from any function invoked from any LogicalTape* function, no matter how far removed. regards, tom lane
Re: [HACKERS] Checksums by default?
On Wed, Jan 25, 2017 at 6:30 PM, Stephen Frostwrote: > I hope to discuss it further after we have the ability to turn it off > easily. I think we should have the ability to flip it in BOTH directions easily. >> Second, really hard to enable is a relative term. I accept that >> enabling checksums is not a pleasant process. Right now, you'd have >> to do a dump/restore, or use logical replication to replicate the data >> to a new cluster and then switch over. On the other hand, if >> checksums are really a critical feature, how are people getting to the >> point where they've got a mission-critical production system and only >> then discovering that they want to enable checksums? > > I truely do wish everyone would come talk to me before building out a > database. Perhaps that's been your experience, in which case, I envy > you, but I tend to get a reaction more along the lines of "wait, what do > you mean I had to pass some option to initdb to enable checksum?!?!". > The fact that we've got a WAL implementation and clearly understand > fsync requirements, why full page writes make sense, and that our WAL > has its own CRCs which isn't possible to disable, tends to lead people > to think we really know what we're doing and that we care a lot about > their data. It sounds to me like you are misleading users about the positives and negatives of checksums, which then causes them to be shocked that they are not the default. > As I have said, I don't believe it has to be on for everyone. For the second time, I didn't say that. But the default has a powerful influence on behavior. If it didn't, you wouldn't be trying to get it changed. > [ unsolicited bragging about an unspecified backup tool, presumably > pgbackrest ] Great. > Presently, last I checked at least, the database system doesn't fall > over and die if a single page's checksum fails. This is another thing that I never said. 
> [ more unsolicited bragging an unspecified backup tool, presumably still > pgbackrest ] Swell. >> I'm not trying to downplay the usefulness of checksums *in a certain >> context*. It's a good feature, and I'm glad we have it. But I think >> you're somewhat inflating the utility of it while discounting the very >> real costs. > > The costs for checksums don't bother me any more than the costs for WAL > or WAL CRCs or full page writes. Obviously. But I think they should. Frankly, I think the costs for full page writes should bother the heck out of all of us, but the solution isn't to shut them off any more than it is to enable checksums despite the cost. It's to find a way to reduce the costs. > They may not be required on every > system, but they're certainly required on more than 'zero' entirely > reasonable systems which people deploy in their production environments. Nobody said otherwise. > I'd rather walk into an engagement where the user is saying "yeah, we > enabled checksums and it caught this corruption issue" than having to > break the bad news, which I've had to do over and over, that their > existing system hasn't got checksums enabled. This isn't hypothetical, > it's what I run into regularly with entirely reasonable and skilled > engineers who have been deploying PG. Maybe you should just stop telling them and use the time thus freed up to work on improving the checksum feature. I'm skeptical of this whole discussion because you seem to be filled with unalloyed confidence that checksums have little performance impact and will do wonderful things to prevent data loss, whereas I think they have significant performance impact and will only very slightly help to prevent data loss. I admit that the idea of having pgbackrest verify checksums while backing up seems like it could greatly improve the chances of checksums being useful, but I'm not going to endorse changing PostgreSQL's default for pgbackrest's benefit. 
It's got to be to the benefit of PostgreSQL users broadly, not just the subset of those people who use one particular backup tool. Also, the massive hit that will probably occur on high-concurrency OLTP workloads larger than shared_buffers is going to be hard to justify for any amount of backup security. I think that problem's got to be solved or at least mitigated before we think about changing this. I realize that not everyone would set the bar that high, but I see far too many customers with exactly that workload to dismiss it lightly. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Re: [HACKERS] Checksums by default?
* Peter Geoghegan (p...@heroku.com) wrote: > On Wed, Jan 25, 2017 at 3:30 PM, Stephen Frost wrote: > > As it is, there are backup solutions which *do* check the checksum when > > backing up PG. This is no longer, thankfully, some hypothetical thing, > > but something which really exists and will hopefully keep users from > > losing data. > > Wouldn't that have issues with torn pages? No, why would it? The page has either been written out by PG to the OS, in which case the backup s/w will see the new page, or it hasn't been. Our testing has not turned up any issues as yet. That said, it's relatively new and I wouldn't be surprised if we need to do some adjustments in that area, which might be system-dependent even. We could certainly check the WAL for the page that had a checksum error (we currently simply report them, though don't throw away a prior backup if we detect one). This isn't like a case where only half the page made it to the disk because of a system failure though; everything is online and working properly during an online backup. Thanks! Stephen
Re: [HACKERS] Checksums by default?
* Michael Paquier (michael.paqu...@gmail.com) wrote: > On Thu, Jan 26, 2017 at 9:14 AM, Peter Geoghegan wrote: > > On Wed, Jan 25, 2017 at 3:30 PM, Stephen Frost wrote: > >> As it is, there are backup solutions which *do* check the checksum when > >> backing up PG. This is no longer, thankfully, some hypothetical thing, > >> but something which really exists and will hopefully keep users from > >> losing data. > > > > Wouldn't that have issues with torn pages? > > Why? What do you foresee here? I would think such backup solutions are > careful enough to ensure correctly the durability of pages so as they > are not partially written. I believe his concern was that the backup sw might see a partially-updated page when it reads the file while PG is writing it. In other words, would the kernel return some intermediate state of data while an fwrite() is in progress. Thanks! Stephen
Re: [HACKERS] Checksums by default?
* Robert Haas (robertmh...@gmail.com) wrote: > On Wed, Jan 25, 2017 at 7:19 PM, Michael Paquier > wrote: > > On Thu, Jan 26, 2017 at 9:14 AM, Peter Geoghegan wrote: > >> On Wed, Jan 25, 2017 at 3:30 PM, Stephen Frost wrote: > >>> As it is, there are backup solutions which *do* check the checksum when > >>> backing up PG. This is no longer, thankfully, some hypothetical thing, > >>> but something which really exists and will hopefully keep users from > >>> losing data. > >> > >> Wouldn't that have issues with torn pages? > > > > Why? What do you foresee here? I would think such backup solutions are > > careful enough to ensure correctly the durability of pages so as they > > are not partially written. > > Well, you'd have to keep a read(fd, buf, 8192) performed by the backup > tool from overlapping with a write(fd, buf, 8192) performed by the > backend. As Michael mentioned, that'd depend on if things are atomic from a user's perspective at certain sizes (perhaps 4k, which wouldn't be too surprising, but may also be system-dependent), in which case verifying that the page is in the WAL would be sufficient. Thanks! Stephen
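The torn-page scenario under discussion can be sketched in a few lines. This is a hedged Python illustration only: it assumes a 4KB OS write granularity, and it uses CRC-32 purely for demonstration (PostgreSQL's data checksum is a different, FNV-based algorithm). A read that lands between the two halves of an in-progress 8KB write fails verification even though nothing on disk is actually corrupt:

```python
import zlib

PAGE = 8192
HALF = 4096  # assumed OS-level write granularity, per the discussion

def make_page(fill):
    """Build a toy 8KB page: 4-byte checksum header over the body."""
    body = bytes([fill]) * (PAGE - 4)
    return zlib.crc32(body).to_bytes(4, "little") + body

def verify(page):
    """Recompute the body checksum and compare with the header."""
    return int.from_bytes(page[:4], "little") == zlib.crc32(page[4:])

old, new = make_page(0xAA), make_page(0xBB)
assert verify(old) and verify(new)

# A read that overlaps an in-progress write can see the new first 4KB
# stitched onto the old second 4KB: a torn page. Its checksum fails
# even though the backend will finish the write moments later.
torn = new[:HALF] + old[HALF:]
assert not verify(torn)
```

Hence the suggestion in the thread: a checksum failure seen by the backup tool is only meaningful once cross-checked against the WAL (a full-page write for that block explains the mismatch away).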
Re: [HACKERS] Checksums by default?
On Thu, Jan 26, 2017 at 9:32 AM, Stephen Frost wrote: > * Robert Haas (robertmh...@gmail.com) wrote: >> On Wed, Jan 25, 2017 at 7:19 PM, Michael Paquier >> wrote: >> > On Thu, Jan 26, 2017 at 9:14 AM, Peter Geoghegan wrote: >> >> On Wed, Jan 25, 2017 at 3:30 PM, Stephen Frost wrote: >> >>> As it is, there are backup solutions which *do* check the checksum when >> >>> backing up PG. This is no longer, thankfully, some hypothetical thing, >> >>> but something which really exists and will hopefully keep users from >> >>> losing data. >> >> >> >> Wouldn't that have issues with torn pages? >> > >> > Why? What do you foresee here? I would think such backup solutions are >> > careful enough to ensure correctly the durability of pages so as they >> > are not partially written. >> >> Well, you'd have to keep a read(fd, buf, 8192) performed by the backup >> tool from overlapping with a write(fd, buf, 8192) performed by the >> backend. > > As Michael mentioned, that'd depend on if things are atomic from a > user's perspective at certain sizes (perhaps 4k, which wouldn't be too > surprising, but may also be system-dependent), in which case verifying > that the page is in the WAL would be sufficient. That would be enough. It should also be rare enough that there would not be that many pages to track when looking at records from the backup start position to minimum recovery point. It could be also simpler, though more time-consuming, to just let a backup recover up to the minimum recovery point (recovery_target = 'immediate'), and then run the checksum sanity checks. There are other checks usually needed on a backup anyway, like being sure that index pages are in good shape even with a correct checksum, etc. But here I am really hijacking the thread, so I'll stop.. -- Michael
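The WAL cross-check Michael sketches could look roughly like this (a hedged illustration; the function and its inputs are ours, not any existing tool's API): a checksum failure observed during an online backup only counts as suspected corruption if no full-page write (FPW) for that block appears in the WAL between the backup start position and the minimum recovery point, because an FPW means the page will be overwritten wholesale during recovery anyway.

```python
# Hypothetical sketch: filter checksum failures down to the ones that a
# pending full-page write in the WAL does not explain.

def genuine_failures(checksum_failed_blocks, fpw_blocks_in_wal):
    """Blocks whose bad checksum is not covered by an FPW in the WAL range."""
    return sorted(set(checksum_failed_blocks) - set(fpw_blocks_in_wal))

# Blocks 17 and 42 failed verification; the WAL carries an FPW only for
# block 42, so only block 17 should be reported as suspected corruption.
assert genuine_failures([17, 42], [42, 99]) == [17]
```

The thread's alternative is to skip this bookkeeping entirely: recover the backup to the minimum recovery point (recovery_target = 'immediate') and only then run the checksum checks.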
Re: [HACKERS] pdf versus single-html
On 1/21/17 6:29 AM, Erik Rijkers wrote: > It might even be good to include such a single-file html in the Makefile > as an option. > > I'll give it a try but has anyone done this work already, perhaps? Already exists: doc/src/sgml$ make postgres.html -- Peter Eisentraut http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Re: [HACKERS] COPY as a set returning function
On Wed, Jan 25, 2017 at 12:23:23PM -0500, Corey Huinker wrote: > > Feel free to mark it returned as feedback. The concept didn't > generate as much enthusiasm as I had hoped, so I think the right > thing to do now is scale it back to a patch that makes > CopyFromRawFields() externally visible so that extensions can use > them. You're getting enthusiasm in the form of these reviews and suggestions for improvement. That it doesn't always happen immediately is a byproduct of the scarcity of developer time and the sheer volume of things to which people need to pay attention. Best, David. -- David Fetter http://fetter.org/ Phone: +1 415 235 3778 AIM: dfetter666 Yahoo!: dfetter Skype: davidfetter XMPP: david(dot)fetter(at)gmail(dot)com Remember to vote! Consider donating to Postgres: http://www.postgresql.org/about/donate
Re: [HACKERS] Logical Replication WIP
On 1/22/17 8:11 PM, Petr Jelinek wrote: > 0001 - Changes the libpqrcv_connect to use async libpq api so that it > won't get stuck forever in case of connect is stuck. This is preexisting > bug that also affects walreceiver but it's less visible there as there > is no SQL interface to initiate connection there. Probably a mistake here: + case PGRES_POLLING_READING: + extra_flag = WL_SOCKET_READABLE; + /* pass through */ + case PGRES_POLLING_WRITING: + extra_flag = WL_SOCKET_WRITEABLE; extra_flag gets overwritten in the reading case. Please elaborate in the commit message what this change is for. > 0002 - Close replication connection when CREATE SUBSCRIPTION gets > canceled (otherwise walsender on the other side may stay in idle in > transaction state). committed > 0003 - Fixes buffer initialization in walsender that I found when > testing the above two. This one should be back-patched to 9.4 since it's > broken since then. Can you explain more in which code path this problem occurs? I think we should get rid of the global variables and give each function its own buffer that it initializes the first time through. Otherwise we'll keep having to worry about this. > 0004 - Fixes the foreign key issue reported by Thom Brown and also adds > tests for FK and trigger handling. I think the trigger handling should go into execReplication.c. > 0005 - Adds support for renaming publications and subscriptions. Could those not be handled in the generic rename support in ExecRenameStmt()? -- Peter Eisentraut http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Re: [HACKERS] logical-replication.sgml improvements
On 1/20/17 11:00 AM, Erik Rijkers wrote: > logical-replication.sgml.diff changes Committed, thanks! -- Peter Eisentraut http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Re: [HACKERS] Checksums by default?
On Wed, Jan 25, 2017 at 12:02 AM, Jim Nasbywrote: > I'm not completely grokking your second paragraph, but I would think that an > average user would love got get a heads-up that their hardware is failing. Sure. If the database runs fast enough with checksums enabled, there's basically no reason to have them turned off. The issue is when it doesn't. Also, it's not as if there are no other ways of checking whether your disks are failing. SMART, for example, is supposed to tell you about incipient hardware failures before PostgreSQL ever sees a bit flip. Surely an average user would love to get a heads-up that their hardware is failing even when that hardware is not being used to power PostgreSQL, yet many people don't bother to configure SMART (or similar proprietary systems provided by individual vendors). Trying to force those people to use checksums is just masterminding; they've made their own decision that it's not worth bothering with. When something goes wrong, WE still care about distinguishing hardware failure from PostgreSQL failure. Our pride is on the line. But the customer often doesn't. The DBA isn't the same person as the operating system guy, and the operating system guy isn't going to listen to the DBA even if the DBA complains of checksum failures. Or the customer has 100 things on the same piece of hardware and PostgreSQL is the only one that failed; or alternatively they all failed around the same time; either way the culprit is obvious. Or the remedy is to restore from backup[1] whether the problem is hardware or software and regardless of whose software is to blame. Or their storage cost a million dollars and is a year old and they simply won't believe that it's failing. Or their storage cost a hundred dollars and is 8 years old and they're looking for an excuse to replace it whether it's responsible for the problem du jour or not. 
I think it's great that we have a checksum feature and I think it's great for people who want to use it and are willing to pay the cost of it to turn it on. I don't accept the argument that all of our users, or even most of them, fall into that category. I also think it's disappointing that there's such a vigorous argument for changing the default when so little follow-on development has gone into this feature. If we had put any real effort into making this easier to turn on and off, for example, the default value would be less important, because people could change it more easily. But nobody's making that effort. I suggest that the people who think this is a super-high-value feature should be willing to put some real work into improving it instead of trying to force it down everybody's throat as-is. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company [1] Alternatively, sometimes the remedy is to wish they had a usable backup while frantically running pg_resetxlog.
Re: [HACKERS] Logical Replication WIP
On 1/23/17 11:19 AM, Fujii Masao wrote: > The copyright in each file that the commit of logical rep added needs to > be updated. I have fixed that. -- Peter Eisentraut http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
[HACKERS] [FEATURE PATCH] pg_stat_statements with plans
Hello pgsql-hackers! TL;DR: We extended the functionality of pg_stat_statements so it can track worst and best case execution plans. Based on a suggestion of my colleague Arne Scheffer, Marius Timmer and I extended pg_stat_statements so it can also record execution plans, whenever the execution time is exceeded (or undershot) by a definable factor. We were largely inspired by the pg_stat_plans extension by Peter Geoghegan and Simon Riggs - we don't claim any originality on this part - which is unfortunately not available on newer PostgreSQL versions. There are a few differences which will become apparent in the following lines. By default, the modified pg_stat_statements extension will now track good plans and bad plans for each entry in pg_stat_statements. The plans are not normalized or hashed (as opposed to pg_stat_plans); they represent discrete statements. A good plan is saved whenever this sort of query has been used for the first time, or the time of the previously recorded good plan has been undershot by a factor smaller than 0.9. Analogously, a bad plan is saved when the time has been exceeded by a factor greater than 1.1. There are GUCs available so these parameters can be tuned to your liking. Tracking can be disabled for both plans individually. A plan_format can be defined to enable better readability or processability through other tools. You can reset your good and bad plans by using a select on pg_stat_statements_good_plan_reset([queryid]); resetting bad plans uses pg_stat_statements_bad_plan_reset, obviously. In case of a reset, the execution time, timestamp and plan itself are simply set to 0 and NULL, respectively. The pg_stat_statements view now provides six extra columns: good_plan, good_plan_time, good_plan_timestamp, bad_plan, bad_plan_time and bad_plan_timestamp. Plans are only displayed if the showtext argument is true and the user is the superuser or the user who has been associated with that entry.
Furthermore, we implemented a GUC that allows you to control the maximum refresh frequency to avoid performance impacts on restarts or resets. A plan is only updated when tracking is enabled, more time than "plan_min_interval" has passed (default: 5 seconds), and the previously mentioned conditions for the execution time have been met. The major selling point of this feature? Being able to find plans that need optimization (e.g. by creating indexes). As pg_stat_statements tracks normalized queries, there might be certain values or even times of day that result in very bad plans, while others result in perfectly fine plans. Of course, the GUC log_min_duration_statement can also detect long runners, but the advantage of pg_stat_statements is that we count the total calls of normalized queries, which enables us to find plans that don't count as long runners while their aggregated time might show shortcomings regarding their plans. We've found this sort of tool really useful when dealing with queries produced by ORM libraries, where optimization is not intuitive. Various tests using pgbench suggest that this extension does not worsen the performance of the database.
We're really looking forward to your opinions and feedback on this feature patch Julian, Marius and Arne diff --git a/contrib/pg_stat_statements/Makefile b/contrib/pg_stat_statements/Makefile index 298951a..2a22eb5 100644 --- a/contrib/pg_stat_statements/Makefile +++ b/contrib/pg_stat_statements/Makefile @@ -4,7 +4,7 @@ MODULE_big = pg_stat_statements OBJS = pg_stat_statements.o $(WIN32RES) EXTENSION = pg_stat_statements -DATA = pg_stat_statements--1.4.sql pg_stat_statements--1.3--1.4.sql \ +DATA = pg_stat_statements--1.5.sql pg_stat_statements--1.4--1.5.sql pg_stat_statements--1.3--1.4.sql \ pg_stat_statements--1.2--1.3.sql pg_stat_statements--1.1--1.2.sql \ pg_stat_statements--1.0--1.1.sql pg_stat_statements--unpackaged--1.0.sql PGFILEDESC = "pg_stat_statements - execution statistics of SQL statements" diff --git a/contrib/pg_stat_statements/pg_stat_statements.c b/contrib/pg_stat_statements/pg_stat_statements.c index 6edc3d9..a3cfe6d 100644 --- a/contrib/pg_stat_statements/pg_stat_statements.c +++ b/contrib/pg_stat_statements/pg_stat_statements.c @@ -61,7 +61,9 @@ #include #include +#include "utils/timestamp.h" #include "access/hash.h" +#include "commands/explain.h" #include "executor/instrument.h" #include "funcapi.h" #include "mb/pg_wchar.h" @@ -118,7 +120,8 @@ typedef enum pgssVersion PGSS_V1_0 = 0, PGSS_V1_1, PGSS_V1_2, - PGSS_V1_3 + PGSS_V1_3, + PGSS_V1_5 } pgssVersion; /* @@ -159,6 +162,14 @@ typedef struct Counters double usage; /* usage factor */ } Counters; +typedef struct pgssPlan +{ + Size offset; + int len; + double time; /* execution time in msec when the latest plan was updated */ + TimestampTz timestamp; +} pgssPlan; + /* * Statistics per statement * @@ -172,6 +183,8 @@ typedef struct pgssEntry Counters counters;
Re: [HACKERS] Checksums by default?
On Sat, Jan 21, 2017 at 11:57 AM, Andres Freund wrote: > On 2017-01-21 11:39:18 +0100, Magnus Hagander wrote: >> Is it time to enable checksums by default, and give initdb a switch to turn >> it off instead? > > -1 - the WAL overhead is quite massive, and in contrast to the other > GUCs recently changed you can't just switch this around. I agree. I bet that if somebody does the test suggested by Amit downthread, it'll turn out that the performance is just awful. And those cases are common. I think the people saying "well, the overhead is worth it" must be people whose systems (or whose customer's systems) aren't processing continuous heavy OLTP workloads. If you've got a data warehousing workload, checksums are probably pretty cheap. If you've got a low-velocity OLTP workload, or an OLTP workload that fits in shared_buffers, it's probably bearable. But if you've got 8GB of shared_buffers and 100GB of data, and you've got 100 or so backends continuously doing random updates, I think checksums are going to nail you to the wall. And EnterpriseDB, at least, has lots of customers that do exactly that sort of thing. Having said that, I've certainly run into situations where I speculated that a customer had a hardware problem and they speculated that we had given them buggy database software. In a pretty significant number of cases, the customer turned out to be right; for example, some of those people were suffering from multixact bugs that resulted in unexplainable corruption. Now, would it have been useful to know that checksums were passing (suggesting a PostgreSQL problem) rather than failing (suggesting an OS problem)? Yes, that would have been great. I could have given those customers better support. On the other hand, I think I've run into MORE cases where the customer was desperately seeking options to improve write performance, which remains a pretty significant problem for PostgreSQL.
I can't see taking a significant hit in that area for my convenience in understanding what's going on in data corruption situations. The write performance penalty is paid by everybody all the time, whereas data corruption is a rare event even among support cases. And even when you do have corruption, whether or not the data corruption is accompanied by a checksum failure is only ONE extra bit of useful data. A failure doesn't guarantee a hardware problem; it could be caused by a faulty backup procedure, like forgetting to run pg_start_backup(). The lack of a failure doesn't guarantee a software problem; it could be caused by a faulty backup procedure, like using an OS-level snapshot facility that isn't exactly simultaneous across tablespaces. What you really need to do when a customer has corruption is figure out why they have corruption, and the leading cause by far is neither the hardware nor the software but some kind of user error. Checksums are at best a very modest assist in figuring out whether an error has been made and if so of what type. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Re: [HACKERS] Declarative partitioning vs. information_schema
On Wed, Jan 25, 2017 at 1:04 PM, Peter Eisentraut wrote: > On 1/18/17 2:32 PM, Robert Haas wrote: >> Unless we can find something official, I suppose we should just >> display BASE TABLE in that case as we do in other cases. I wonder if >> the schema needs some broader revision; for example, are there >> information_schema elements intended to show information about >> partitions? > > Is it intentional that we show the partitions by default in \d, > pg_tables, information_schema.tables? Or should we treat those as > somewhat-hidden details? I'm not really sure what the right thing to do is there. I was hoping you had an opinion. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Re: [HACKERS] COPY as a set returning function
On Wed, Jan 25, 2017 at 11:57 AM, Alvaro Herrera wrote: > David Fetter wrote: > > > @@ -562,7 +563,6 @@ CopyGetData(CopyState cstate, void *databuf, int > minread, int maxread) > >errmsg("could not read > from COPY file: %m"))); > > break; > > case COPY_OLD_FE: > > - > > /* > >* We cannot read more than minread bytes (which > in practice is 1) > >* because old protocol doesn't have any clear way > of separating > > This change is pointless as it'd be undone by pgindent. > > > + /* > > + * Function signature is: > > + * copy_srf( filename text default null, > > + * is_program boolean default false, > > + * format text default 'text', > > + * delimiter text default E'\t' in text mode, ',' in csv > mode, > > + * null_string text default '\N', > > + * header boolean default false, > > + * quote text default '"' in csv mode only, > > + * escape text default null, -- defaults to whatever > quote is > > + * encoding text default null > > + */ > > This comment would be mangled by pgindent -- please add an /*--- marker > to prevent it. > > > + /* param 7: escape text default null, -- defaults to whatever > quote is */ > > + if (PG_ARGISNULL(7)) > > + { > > + copy_state.escape = copy_state.quote; > > + } > > + else > > + { > > + if (copy_state.csv_mode) > > + { > > + copy_state.escape = TextDatumGetCString(PG_GETARG_TEXT_P(7)); > > + if (strlen(copy_state.escape) != 1) > > + { > > + ereport(ERROR, > > + > (errcode(ERRCODE_FEATURE_NOT_SUPPORTED), > > + errmsg("COPY escape must > be a single one-byte character"))); > > + } > > + } > > + else > > + { > > + ereport(ERROR, > > + (errcode(ERRCODE_FEATURE_NOT_SUPPORTED), > > + errmsg("COPY escape available > only in CSV mode"))); > > + } > > + } > > I don't understand why we have all these checks. Can't we just pass > the values to COPY and let it apply the checks? That way, when COPY is > updated to support multibyte escape chars (for example) we don't need to > touch this code.
Together with removing the unneeded braces that would > make these stanzas about six lines long instead of fifteen. > > > > + tuple = heap_form_tuple(tupdesc,values,nulls); > > + //tuple = BuildTupleFromCStrings(attinmeta, > field_strings); > > + tuplestore_puttuple(tupstore, tuple); > > No need to form a tuple; use tuplestore_putvalues here. > > > > + } > > + > > + /* close "file" */ > > + if (copy_state.is_program) > > + { > > + int pclose_rc; > > + > > + pclose_rc = ClosePipeStream(copy_state.copy_file); > > + if (pclose_rc == -1) > > + ereport(ERROR, > > + (errcode_for_file_access(), > > + errmsg("could not close pipe to > external command: %m"))); > > + else if (pclose_rc != 0) > > + ereport(ERROR, > > + (errcode(ERRCODE_EXTERNAL_ROUTINE_EXCEPTION), > > + errmsg("program \"%s\" failed", > > + > copy_state.filename), > > + errdetail_internal("%s", > wait_result_to_str(pclose_rc))); > > + } > > + else > > + { > > + if (copy_state.filename != NULL && > FreeFile(copy_state.copy_file)) > > + ereport(ERROR, > > + (errcode_for_file_access(), > > + errmsg("could not close file > \"%s\": %m", > > + > copy_state.filename))); > > + } > > I wonder if these should be an auxiliary function in copy.c to do this. > Surely copy.c itself does pretty much the same thing ... > > > > +DATA(insert OID = 3353 ( copy_srf PGNSP PGUID 12 1 0 0 0 f f f f f t v > u 9 0 2249 "25 16 25 25 25 16 25 25 25" _null_ _null_ _null_ _null_ _null_ copy_srf _null_ _null_ _null_ )); > > +DESCR("set-returning COPY proof of concept"); > > Why is this marked "proof of concept"? If this is a PoC only, why are > you submitting it as a patch? > > -- > Álvaro Herrera https://www.2ndQuadrant.com/ > PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services > Feel free to mark it returned as feedback. The concept didn't generate as much enthusiasm as I had hoped, so I think the
Re: [HACKERS] [FEATURE PATCH] pg_stat_statements with plans
On 25 January 2017 at 17:34, Julian Markwort wrote: > Analogous to this, a bad_plan is saved, when the time has been exceeded by a > factor greater than 1.1 . ...and the plan differs? Probably best to use some stat math to calculate deviation, rather than fixed %. Sounds good. -- Simon Riggs http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Re: [HACKERS] Declarative partitioning vs. information_schema
On 1/18/17 2:32 PM, Robert Haas wrote: > Unless we can find something official, I suppose we should just > display BASE TABLE in that case as we do in other cases. I wonder if > the schema needs some broader revision; for example, are there > information_schema elements intended to show information about > partitions? Is it intentional that we show the partitions by default in \d, pg_tables, information_schema.tables? Or should we treat those as somewhat-hidden details? -- Peter Eisentraut http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Re: [HACKERS] COPY as a set returning function
On Wed, Jan 25, 2017 at 1:10 PM, David Fetter wrote: > On Wed, Jan 25, 2017 at 12:23:23PM -0500, Corey Huinker wrote: > > > > Feel free to mark it returned as feedback. The concept didn't > > generate as much enthusiasm as I had hoped, so I think the right > > thing to do now is scale it back to a patch that makes > > CopyFromRawFields() externally visible so that extensions can use > > them. > > You're getting enthusiasm in the form of these reviews and suggestions > for improvement. That it doesn't always happen immediately is a > byproduct of the scarcity of developer time and the sheer volume of > things to which people need to pay attention. True about that. I was referring to "ooh! I need that!"-type interest. I'll proceed, keeping in mind that there's a fallback position of just making some of the guts of COPY available to be called by extensions, as was done for file_fdw.
Re: [HACKERS] patch: function xmltable
2017-01-25 22:40 GMT+01:00 Andres Freund: > Hi, > > > > > I'll try to explain my motivation. Please, check it and correct me > if I > > > am > > > > wrong. I don't keep on my implementation - just try to implement > XMLTABLE > > > > be consistent with another behave and be used all time without any > > > > surprise. > > > > > > > > 1. Any function that produces a content can be used in target list. > We > > > > support SRF in target list and in FROM part. Why XMLTABLE should be a > > > > exception? > > > > > > targetlist SRFs were a big mistake. They cause a fair number of > problems > > > code-wise. They permeated for a long while into bits of both planner > and > > > executor, where they really shouldn't belong. Even after the recent > > > changes there's a fair amount of ugliness associated with them. We > > > can't remove tSRFs for backward compatibility reasons, but that's not > > > true for XMLTABLE > > > > > > > > > > > ok > > > > I'm afraid that when I cannot reuse the SRF infrastructure, I have to > reimplement > > it partially :( - mainly for usage in "ROWS FROM ()" > The TableExpr implementation is based on SRF now. You and Alvaro propose an independent implementation, like a generic executor node. I am sceptical that FunctionScan supports reading from a generic executor node. Regards Pavel > Huh? > > Greetings, > > Andres Freund >
Re: [HACKERS] patch: function xmltable
On 2017-01-25 22:51:37 +0100, Pavel Stehule wrote: > 2017-01-25 22:40 GMT+01:00 Andres Freund: > > > I'm afraid that when I cannot reuse the SRF infrastructure, I have to > > reimplement > > > it partially :( - mainly for usage in "ROWS FROM ()" > > > > The TableExpr implementation is based on SRF now. You and Alvaro propose > an independent implementation, like a generic executor node. I am sceptical that > FunctionScan supports reading from a generic executor node. Why would it need to?
Re: [HACKERS] pg_ls_dir & friends still have a hard-coded superuser check
Hi, On 2017-01-25 16:52:38 -0500, Stephen Frost wrote: > * Robert Haas (robertmh...@gmail.com) wrote: > > Preventing people from calling functions by denying the ability to > > meaningfully GRANT EXECUTE on those functions doesn't actually stop > > them from delegating those functions to non-superusers. It either (a) > > causes them to give superuser privileges to accounts that otherwise > > would have had lesser privileges or (b) forces them to use wrapper > > functions to grant access rather than doing it directly or (c) some > > other dirty hack. If they pick (a), security is worse; if they pick > > (b) or (c), you haven't prevented them from doing what they wanted to > > do anyway. You've just made it annoying and inconvenient. > > The notion that security is 'worse' under (a) is flawed- it's no > different. Huh? Obviously that's nonsense, given the pg_ls_dir example. > With regard to 'b', if their wrapper function is > sufficiently careful to ensure that the caller isn't able to do anything > which would increase the caller's level to that of superuser, then > security is improved. Given how complex "sufficiently careful" is for security definer UDFs, in comparison to estimating the security of granting to a function like pg_ls_dir (or pretty much any other that doesn't call out to SQL level stuff like operators, output functions, etc), I don't understand this. > If the wrapper simply turns around and calls the underlying function, > then it's no different from '(a)'. Except for stuff like search path. > I am suggesting that we shouldn't make it look like there are > distinctions when there is actually no difference. That is a good thing > for our users because it keeps them informed about what they're actually > doing when they grant access. This position doesn't make a lick of sense to me. There's simply no benefit at all in requiring users to create wrapper functions over allowing a grant to a non-superuser. Both are possible; secdef is a lot harder to get right.
And you already heavily started down the path of removing superuser() type checks - you're just arguing to make it more or less randomly less consistent. > I've commented on here and spoken numerous times about exactly that goal > of reducing the checks in check_postgres.pl which require superuser. > You're not actually doing that and nothing you've outlined in here so > far makes me believe you see how having pg_write_file() access is > equivalent to giving someone superuser, and that concerns me. That's the user's responsibility, and Robert didn't really suggest granting pg_write_file() permissions, so this seems to be a straw man. Andres
Re: [HACKERS] Checksums by default?
On Wed, Jan 25, 2017 at 3:30 PM, Stephen Frost wrote: > As it is, there are backup solutions which *do* check the checksum when > backing up PG. This is no longer, thankfully, some hypothetical thing, > but something which really exists and will hopefully keep users from > losing data. Wouldn't that have issues with torn pages? -- Peter Geoghegan
Re: [HACKERS] Should buffer of initialization fork have a BM_PERMANENT flag
(Adding Robert in CC.) On Thu, Jan 26, 2017 at 4:34 AM, Wang Hao wrote: > An unlogged table has an initialization fork. The initialization fork does > not have a BM_PERMANENT flag when it gets a buffer. > During a checkpoint (not a shutdown or end-of-recovery checkpoint), it will not be written to disk, > so after crash recovery the page of the initialization fork will not be correct, > which then makes the main fork incorrect too. For init forks the flush absolutely needs to happen, so that's really not good. We ought to fix BufferAlloc() appropriately here. > Here is an example for GIN index. > > create unlogged table gin_test_tbl(i int4[]); > create index gin_test_idx on gin_test_tbl using gin (i); > checkpoint; > > kill all the postgres processes, and restart again. > > vacuum gin_test_tbl; -- crash. > > It seems the same problem exists in BRIN, GIN, GiST and HASH indexes, which use > a buffer for meta page initialization in the ambuildempty function. Yeah, other index AMs deal directly with the sync of the page, that's why there is no issue for them. So the patch attached fixes the problem by changing BufferAlloc() in such a way that initialization forks are permanently written to disk, which is what you are suggesting. As a simple fix for back-branches that's enough, though on HEAD I think that we should really rework the empty() routines so that the write goes through shared buffers first; that seems more solid than relying on the smgr routines to do this work. Robert, what do you think? -- Michael unlogged-flush-fix.patch Description: Binary data
Re: [HACKERS] pg_ls_dir & friends still have a hard-coded superuser check
On 2017-01-25 18:04:09 -0500, Stephen Frost wrote: > Andres, > > * Andres Freund (and...@anarazel.de) wrote: > > On 2017-01-25 16:52:38 -0500, Stephen Frost wrote: > > > * Robert Haas (robertmh...@gmail.com) wrote: > > > > Preventing people from calling functions by denying the ability to > > > > meaningfully GRANT EXECUTE on those functions doesn't actually stop > > > > them from delegating those functions to non-superusers. It either (a) > > > > causes them to give superuser privileges to accounts that otherwise > > > > would have had lesser privileges or (b) forces them to use wrapper > > > > functions to grant access rather than doing it directly or (c) some > > > > other dirty hack. If they pick (a), security is worse; if they pick > > > > (b) or (c), you haven't prevented them from doing what they wanted to > > > > do anyway. You've just made it annoying and inconvenient. > > > > > > The notion that security is 'worse' under (a) is flawed- it's no > > > different. > > > > Huh? Obviously that's nonsense, given the pg_ls_dir example. > > Robert's made it clear that he'd like to have a blanket rule that we > don't have superuser checks in these code paths if they can be GRANT'd > at the database level, which goes beyond pg_ls_dir. That seems right to me. I don't see much benefit for the superuser() style checks, with a few exceptions. Granting by default is obviously an entirely different question. > If the question was only about pg_ls_dir, then I still wouldn't be for > it, because, as the bits you didn't quote discussed, it encourages users > and 3rd party tool authors to base more things off of pg_ls_dir to > look into the way PG stores data on disk, and affords more access than > the monitoring user has any need for, none of which are good, imv. It > also discourages people from implementing proper solutions when you can > 'just use pg_ls_dir()', which I also don't agree with.
In other words, you're trying to force people to do stuff your preferred way, instead of allowing them to get things done in a reasonable manner. > If you really want to do an ls, go talk to the OS. ACLs are possible to > provide that with more granularity than what would be available through > pg_ls_dir(). We aren't in the "give a user the ability to do an ls" > business, frankly. Wut. > > > I am suggesting that we shouldn't make it look like there are > > > distinctions when there is actually no difference. That is a good thing > > > for our users because it keeps them informed about what they're actually > > > doing when they grant access. > > > > This position doesn't make a lick of sense to me. There's simply no > > benefit at all in requiring to create wrapper functions, over allowing > > to grant to non-superuser. Both is possible, secdef is a lot harder to > > get right. And you already heavily started down the path of removing > > superuser() type checks - you're just arguing to make it more or less > > randomly less consistent. > > I find this bizarre considering I went through a detailed effort to go > look at every superuser check in the system and discussed, on this list, > the reasoning behind each and every one of them. I do generally > consider arbitrary access to syscalls via the database to be a privilege > which really only the superuser should have. Just because you argued doesn't mean I agree. Greetings, Andres Freund
Re: [HACKERS] tuplesort_gettuple_common() and *should_free argument
Peter Geoghegan writes: > It means "another call to tuplesort_gettupleslot", but I believe that > it would be safer (more future-proof) to actually specify "the slot > contents may be invalidated by any subsequent manipulation of the > tuplesort's state" instead. WFM. >> There are several other uses of "call here", both in this patch and >> pre-existing in tuplesort.c, that I find equally vague and unsatisfactory. >> Let's try to improve that. > Should I write a patch along those lines? Please. You might want to hit the existing ones with a separate patch, but it doesn't much matter; I'd be just as happy with a patch that did both things. regards, tom lane
Re: [HACKERS] tuplesort_gettuple_common() and *should_free argument
On Wed, Jan 25, 2017 at 3:11 PM, Tom Lane wrote: > Please. You might want to hit the existing ones with a separate patch, > but it doesn't much matter; I'd be just as happy with a patch that did > both things. Got it. -- Peter Geoghegan
Re: [HACKERS] Radix tree for character conversion
On Tue, Jan 10, 2017 at 8:22 PM, Kyotaro HORIGUCHI wrote: > [...patch...] Nobody has shown up yet to review this patch, so I am giving it a shot. The patch file sizes are scary at first sight, but after having a look: 36 files changed, 1411 insertions(+), 54398 deletions(-) Yes that's a surprise, something like git diff --irreversible-delete would have helped as most of the diffs are just caused by 3 files being deleted in patch 0004, sending 50k lines to the abyss of deletion. > Hello, I found a bug in my portion while rebasing. Right, that's 0001. Nice catch. > The attached files are the following. This patchset is not > complete, missing changes of map files. The change is tremendously > large but generatable. > > 0001-Add-missing-semicolon.patch > > UCS_to_EUC_JP.pl has a line missing a terminating semicolon. This > doesn't harm but is surely a syntax error. This patch fixes it. > This perhaps should be a separate patch. This requires a back-patch. This makes me wonder how long this script has actually not run... > 0002-Correct-reference-resolution-syntax.patch > > convutils.pm has lines with different syntax of reference > resolution. This unifies the syntax. Yes that looks right to me. I am not the best perl guru on this list, but looking around, $$var{foo} is bad, ${$var}{foo} is better, and $var->{foo} is even better. This also generates no diffs when running make in src/backend/utils/mb/Unicode/. So no objections to that. > 0003-Apply-pgperltidy-on-src-backend-utils-mb-Unicode.patch > > Before adding radix tree stuff, applied pgperltidy and inserted > format-skipping pragma for the parts where perltidy seems to do > too much. Which version of perltidy did you use? Looking at the archives, the perl code is cleaned up with a specific version, v20090616. See https://www.postgresql.org/message-id/20151204054322.ga2070...@tornado.leadboat.com for example on the matter. As perltidy changes over time, this may be a sensitive change if done this way.
> 0004-Use-radix-tree-for-character-conversion.patch > > Radix tree body. Well, here a lot of diffs could have been saved. > The unattached fifth patch is generated by the following steps. > > [$(TOP)]$ ./configure > [Unicode]$ make > [Unicode]$ make distclean > [Unicode]$ git add . > [Unicode]$ commit > === COMMIT MESSAGE > Replace map files with radix tree files. > > These encodings no longer use the former map files and use new radix > tree files. > === OK, I can see that working, with 200k of maps generated. So going through the important bits of this jungle... +/* + * radix tree conversion function - this should be identical to the function in + * ../conv.c with the same name + */ +static inline uint32 +pg_mb_radix_conv(const pg_mb_radix_tree *rt, +int l, +unsigned char b1, +unsigned char b2, +unsigned char b3, +unsigned char b4) This is not nice. Having a duplication like that is a recipe to forget about it, as this patch introduces a dependency between conv.c and the radix tree generation. Having a .gitignore in Unicode/ would be nice, particularly to avoid committing map_checker. A README documenting things may be welcome, or at least comments at the top of map_checker.c. Why is map_checker essential? What does it do? There is no way to understand that easily, except that it includes a "radix tree conversion function", and that it performs sanity checks on the radix trees to be sure that they are in good shape. But this is something that one would guess only after looking at your patch and the code (at least I will sleep less stupid tonight after reading this stuff).
--- a/src/backend/utils/mb/Unicode/UCS_to_SJIS.pl +++ b/src/backend/utils/mb/Unicode/UCS_to_SJIS.pl # Drop these SJIS codes from the source for UTF8=>SJIS conversion #<<< do not let perltidy touch this -my @reject_sjis =( +my @reject_sjis = ( 0xed40..0xeefc, 0x8754..0x875d, 0x878a, 0x8782, - 0x8784, 0xfa5b, 0xfa54, 0x8790..0x8792, 0x8795..0x8797, + 0x8784, 0xfa5b, 0xfa54, 0x8790..0x8792, 0x8795..0x8797, 0x879a..0x879c -); + ); This is not generated, it would be nice to drop the noise from the patch. Here is another one: - $i->{code} = $jis | ( - $jis < 0x100 - ? 0x8e00 - : ($sjis >= 0xeffd ? 0x8f8080 : 0x8080)); - +#<<< do not let perltidy touch this + $i->{code} = $jis | ($jis < 0x100 ? 0x8e00: +($sjis >= 0xeffd ? 0x8f8080 : 0x8080)); +#>>> if (l == 2) { - iutf = *utf++ << 8; - iutf |= *utf++; + b3 = *utf++; + b4 = *utf++; } Ah, OK. This conversion is important so that it performs a minimum of bitwise operations. Yes let's keep that. That's pretty cool to get a faster operation. -- Michael
Re: [HACKERS] Speedup twophase transactions
On Thu, Jan 26, 2017 at 4:09 PM, Nikhil Sontakkewrote: >>I look at this patch from you and that's present for me: >>https://www.postgresql.org/message-id/CAMGcDxf8Bn9ZPBBJZba9wiyQq->Qk5uqq=vjomnrnw5s+fks...@mail.gmail.com > >> --- a/src/backend/access/transam/xlog.c >> +++ b/src/backend/access/transam/xlog.c >> @@ -9573,6 +9573,7 @@ xlog_redo(XLogReaderState *record) >> (errmsg("unexpected timeline ID %u (should be %u) >> in checkpoint record", >> checkPoint.ThisTimeLineID, ThisTimeLineID))); >> >> +KnownPreparedRecreateFiles(checkPoint.redo); >> RecoveryRestartPoint(); >> } > > Oh, sorry. I was asking about CheckpointTwoPhase(). I don't see a > function by this name. And now I see, the name is CheckPointTwoPhase() > :-) My mistake then :D >> And actually, when a XLOG_CHECKPOINT_SHUTDOWN record is taken, 2PC >> files are not flushed to disk with this patch. This is a problem as a >> new restart point is created... Having the flush in CheckpointTwoPhase >> really makes the most sense. > > Umm, AFAICS, CheckPointTwoPhase() does not get called in the "standby > promote" code path. CreateRestartPoint() calls it via CheckPointGuts() while in recovery. -- Michael -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] logical decoding of two-phase transactions
>> >> Yes, that’s also possible but seems to be less flexible restricting us to >> some >> specific GID format. >> >> Anyway, I can measure WAL space overhead introduced by the GID’s inside >> commit records >> to know exactly what will be the cost of such approach. > > Stas, > > Have you had a chance to look at this further? Generally i’m okay with Simon’s approach and will send send updated patch. Anyway want to perform some test to estimate how much disk space is actually wasted by extra WAL records. > I think the approach of storing just the xid and fetching the GID > during logical decoding of the PREPARE TRANSACTION is probably the > best way forward, per my prior mail. I don’t think that’s possible in this way. If we will not put GID in commit record, than by the time when logical decoding will happened transaction will be already committed/aborted and there will be no easy way to get that GID. I thought about several possibilities: * Tracking xid/gid map in memory also doesn’t help much — if server reboots between prepare and commit we’ll lose that mapping. * We can provide some hooks on prepared tx recovery during startup, but that approach also fails if reboot happened between commit and decoding of that commit. * Logical messages are WAL-logged, but they don’t have any redo function so don’t helps much. So to support user-accessible 2PC over replication based on 2PC decoding we should invent something more nasty like writing them into a table. > That should eliminate Simon's > objection re the cost of tracking GIDs and still let us have access to > them when we want them, which is the best of both worlds really. Having 2PC decoding in core is a good thing anyway even without GID tracking =) -- Stas Kelvich Postgres Professional: http://www.postgrespro.com The Russian Postgres Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Speedup twophase transactions
>I look at this patch from you and that's present for me: >https://www.postgresql.org/message-id/CAMGcDxf8Bn9ZPBBJZba9wiyQq->Qk5uqq=vjomnrnw5s+fks...@mail.gmail.com > --- a/src/backend/access/transam/xlog.c > +++ b/src/backend/access/transam/xlog.c > @@ -9573,6 +9573,7 @@ xlog_redo(XLogReaderState *record) > (errmsg("unexpected timeline ID %u (should be %u) > in checkpoint record", > checkPoint.ThisTimeLineID, ThisTimeLineID))); > > +KnownPreparedRecreateFiles(checkPoint.redo); > RecoveryRestartPoint(); > } Oh, sorry. I was asking about CheckpointTwoPhase(). I don't see a function by this name. And now I see, the name is CheckPointTwoPhase() :-) > And actually, when a XLOG_CHECKPOINT_SHUTDOWN record is taken, 2PC > files are not flushed to disk with this patch. This is a problem as a > new restart point is created... Having the flush in CheckpointTwoPhase > really makes the most sense. Umm, AFAICS, CheckPointTwoPhase() does not get called in the "standby promote" code path. Regards, Nikhils -- Nikhil Sontakke http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Radix tree for character conversion
On Wed, Jan 25, 2017 at 7:18 PM, Ishii Ayumi wrote: > I patched 4 patchset and run "make", but I got failed. > Is this a bug or my mistake ? > I'm sorry if I'm wrong. > > [$(TOP)]$ patch -p1 < ../0001-Add-missing-semicolon.patch > [$(TOP)]$ patch -p1 < ../0002-Correct-reference-resolution-syntax.patch > [$(TOP)]$ patch -p1 < > ../0003-Apply-pgperltidy-on-src-backend-utils-mb-Unicode.patch > [$(TOP)]$ patch -p1 < ../0004-Use-radix-tree-for-character-conversion.patch > [$(TOP)]$ ./configure > [Unicode]$ make > '/usr/bin/perl' UCS_to_most.pl > Type of arg 1 to keys must be hash (not hash element) at convutils.pm > line 443, near "}) > " > Type of arg 1 to values must be hash (not hash element) at > convutils.pm line 596, near "}) > " > Type of arg 1 to each must be hash (not private variable) at > convutils.pm line 755, near "$map) > " > Compilation failed in require at UCS_to_most.pl line 19. > make: *** [iso8859_2_to_utf8.map] Error 255 Hm, I am not sure what you are missing. I was able to get things to build. -- Michael
Re: [HACKERS] pg_ls_dir & friends still have a hard-coded superuser check
Andres, * Andres Freund (and...@anarazel.de) wrote: > On 2017-01-25 18:04:09 -0500, Stephen Frost wrote: > > Robert's made it clear that he'd like to have a blanket rule that we > > don't have superuser checks in these code paths if they can be GRANT'd > > at the database level, which goes beyond pg_ls_dir. > > That seems right to me. I don't see much benefit for the superuser() > style checks, with a few exceptions. Granting by default is obviously > an entirely different question. Well, for my part at least, I disagree. Superuser is a very different animal, imv, than privileges which can be GRANT'd, and I feel that's an altogether good thing. > In other words, you're trying to force people to do stuff your preferred > way, instead of allowing them to get things done is a reasonable manner. Apparently we disagree about what is a 'reasonable manner'. Thanks! Stephen signature.asc Description: Digital signature
Re: [HACKERS] pg_hba_file_settings view patch
On Thu, Jan 26, 2017 at 2:32 AM, Tom Lanewrote: > The way I'd be inclined to make the individual reporting changes is like > > if (!EnableSSL) > +{ > - ereport(LOG, > + ereport(elevel, > (errcode(ERRCODE_CONFIG_FILE_ERROR), > errmsg("hostssl record cannot match because SSL is disabled"), > errhint("Set ssl = on in postgresql.conf."), > errcontext("line %d of configuration file \"%s\"", > line_num, HbaFileName))); > +*err_msg = pstrdup("hostssl record cannot match because SSL > is disabled"); > +} > > which is considerably less invasive and hence easier to review, and > supports reporting different text in the view than appears in the log, > should we need that. It seems likely also that we could drop the pstrdup > in the case of constant strings (we'd still need psprintf if we want to > insert values into the view messages), which would make this way cheaper > than what's in the patch now. I don't really understand the argument about readability of the patch as what Haribabu has proposed is simply to avoid a duplicate of the strings and the diffs of the patch are really clear. For the sake of not translating the strings sent back to the system view though I can buy it. -- Michael -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Checksums by default?
On 26 January 2017 at 01:58, Thomas Munrowrote: > > I don't know how comparable it is to our checksum technology, but > MySQL seems to have some kind of checksums on table data, and you can > find public emails, blogs etc lamenting corrupted databases by > searching Google for the string "InnoDB: uncompressed page, stored > checksum in field1" (that's the start of a longer error message that > includes actual and expected checksums). I'm not sure what exactly that teaches us however. I see these were often associated with software bugs (Apparently MySQL long assumed that a checksum of 0 never happened for example). In every non software case I stumbled across seemed to be following a power failure. Apparently MySQL uses a "doublewrite buffer" to protect against torn pages but when I search for that I get tons of people inquiring how to turn it off... So even without software bugs in the checksum code I don't know that the frequency of the error necessarily teaches us anything about the frequency of hardware corruption either. And more to the point it seems what people are asking for in all those lamentations is how they can convince MySQL to continue and ignore the corruption. A typical response was "We slightly modified innochecksum and added option -f that means if the checksum of a page is wrong, rewrite it in the InnoDB page header." Which begs the question... -- greg -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Checksums by default?
* Robert Haas (robertmh...@gmail.com) wrote: > On Wed, Jan 25, 2017 at 7:37 PM, Andres Freundwrote: > > On 2017-01-25 19:30:08 -0500, Stephen Frost wrote: > >> * Peter Geoghegan (p...@heroku.com) wrote: > >> > On Wed, Jan 25, 2017 at 3:30 PM, Stephen Frost > >> > wrote: > >> > > As it is, there are backup solutions which *do* check the checksum when > >> > > backing up PG. This is no longer, thankfully, some hypothetical thing, > >> > > but something which really exists and will hopefully keep users from > >> > > losing data. > >> > > >> > Wouldn't that have issues with torn pages? > >> > >> No, why would it? The page has either been written out by PG to the OS, > >> in which case the backup s/w will see the new page, or it hasn't been. > > > > Uh. Writes aren't atomic on that granularity. That means you very well > > *can* see a torn page (in linux you can e.g. on 4KB os page boundaries > > of a 8KB postgres page). Just read a page while it's being written out. > > Yeah. This is also why backups force full page writes on even if > they're turned off in general. I've got a question into David about this, I know we chatted about the risk at one point, I just don't recall what we ended up doing (I can imagine a few different possible things- re-read the page, which isn't a guarantee but reduces the chances a fair bit, or check the LSN, or perhaps the plan was to just check if it's in the WAL, as I mentioned) or if we ended up concluding it wasn't a risk for some, perhaps incorrect, reason and need to revisit it. Thanks! Stephen signature.asc Description: Digital signature
Re: [HACKERS] Speedup twophase transactions
>> The question remains whether saving off a few fsyncs/reads for these >> long-lived prepared transactions is worth the additional code churn. >> Even if we add code to go through the KnownPreparedList, we still will >> have to go through the other on-disk 2PC transactions anyways. So, >> maybe not. > > We should really try to do things right now, or we'll never come back > to it. 9.3 (if my memory does not fail me?) has reduced the time to do > promotion by removing the need of the end-of-recovery checkpoint, > while I agree that there should not be that many 2PC transactions at > this point, if there are for a reason or another, the time it takes to > complete promotion would be impacted. So let's refactor > PrescanPreparedTransactions() so as it is able to handle 2PC data from > a buffer extracted by XlogReadTwoPhaseData(), and we are good to go. > Not quite. If we modify PrescanPreparedTransactions(), we also need to make RecoverPreparedTransactions() and StandbyRecoverPreparedTransactions() handle 2PC data via XlogReadTwoPhaseData(). > + /* > +* Move prepared transactions, if any, from KnownPreparedList to > files. > +* It is possible to skip this step and teach subsequent code about > +* KnownPreparedList, but PrescanPreparedTransactions() happens once > +* during end of recovery or on promote, so probably it isn't worth > +* the additional code. > +*/ This comment is misplaced. Does not make sense before this specific call. > + KnownPreparedRecreateFiles(checkPoint.redo); > RecoveryRestartPoint(); > Looking again at this code, I think that this is incorrect. The > checkpointer should be in charge of doing this work and not the > startup process, so this should go into CheckpointTwoPhase() instead. I don't see a function by the above name in the code? 
Regards, Nikhils -- Nikhil Sontakke http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services
Re: [HACKERS] sequence data type
On Wed, Jan 25, 2017 at 11:53 PM, Peter Eisentrautwrote: > Here is an updated patch that allows changing the sequence type. This > was clearly a concern for reviewers, and the presented use cases seemed > convincing. I have been torturing this patch and it looks rather solid to me. Here are a couple of comments: @@ -15984,6 +15992,9 @@ dumpSequence(Archive *fout, TableInfo *tbinfo) "CREATE SEQUENCE %s\n", fmtId(tbinfo->dobj.name)); + if (strcmp(seqtype, "bigint") != 0) + appendPQExpBuffer(query, "AS %s\n", seqtype); Wouldn't it be better to assign that unconditionally? There is no reason that a dump taken from pg_dump version X will work on X - 1 (as there is no reason to not make the life of users uselessly difficult as that's your point), but that seems better to me than rely on the sequence type hardcoded in all the pre-10 dump queries for sequences. That would bring also more consistency to the CREATE SEQUENCE queries of test_pg_dump/t/001_base.pl. Could you increase the regression test coverage to cover some of the new code paths? 
For example such cases are not tested: =# create sequence toto as smallint; CREATE SEQUENCE =# alter sequence toto as smallint maxvalue 1000; ERROR: 22023: RESTART value (2147483646) cannot be greater than MAXVALUE (1000) LOCATION: init_params, sequence.c:1537 =# select setval('toto', 1); setval 1 (1 row) =# alter sequence toto as smallint; ERROR: 22023: MAXVALUE (2147483647) is too large for sequence data type smallint LOCATION: init_params, sequence.c:1407 + if ((seqform->seqtypid == INT2OID && seqform->seqmin < PG_INT16_MIN) + || (seqform->seqtypid == INT4OID && seqform->seqmin < PG_INT32_MIN) + || (seqform->seqtypid == INT8OID && seqform->seqmin < PG_INT64_MIN)) + { + charbufm[100]; + + snprintf(bufm, sizeof(bufm), INT64_FORMAT, seqform->seqmin); + + ereport(ERROR, + (errcode(ERRCODE_INVALID_PARAMETER_VALUE), +errmsg("MINVALUE (%s) is too large for sequence data type %s", + bufm, format_type_be(seqform->seqtypid; + } "large" does not apply to values lower than the minimum, no? The int64 path is never going to be reached (same for the max value), it doesn't hurt to code it I agree. Testing serial columns, the changes are consistent with the previous releases. -- Michael -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Speedup twophase transactions
On Thu, Jan 26, 2017 at 1:38 PM, Nikhil Sontakkewrote: >> We should really try to do things right now, or we'll never come back >> to it. 9.3 (if my memory does not fail me?) has reduced the time to do >> promotion by removing the need of the end-of-recovery checkpoint, >> while I agree that there should not be that many 2PC transactions at >> this point, if there are for a reason or another, the time it takes to >> complete promotion would be impacted. So let's refactor >> PrescanPreparedTransactions() so as it is able to handle 2PC data from >> a buffer extracted by XlogReadTwoPhaseData(), and we are good to go. > > Not quite. If we modify PrescanPreparedTransactions(), we also need to > make RecoverPreparedTransactions() and > StandbyRecoverPreparedTransactions() handle 2PC data via > XlogReadTwoPhaseData(). Ah, right for both, even for RecoverPreparedTransactions() that happens at the end of recovery. Thanks for noticing. The patch mentions that as well: + ** At the end of recovery we move all known prepared transactions to disk. + * This allows RecoverPreparedTransactions() and + * StandbyRecoverPreparedTransactions() to do their work. I need some strong coffee.. >> + KnownPreparedRecreateFiles(checkPoint.redo); >> RecoveryRestartPoint(); >> Looking again at this code, I think that this is incorrect. The >> checkpointer should be in charge of doing this work and not the >> startup process, so this should go into CheckpointTwoPhase() instead. > > I don't see a function by the above name in the code? 
I look at this patch from you and that's present for me: https://www.postgresql.org/message-id/CAMGcDxf8Bn9ZPBBJZba9wiyQq-Qk5uqq=vjomnrnw5s+fks...@mail.gmail.com If I look as well at the last version of Stas it is here: https://www.postgresql.org/message-id/becc988a-db74-48d5-b5d5-a54551a62...@postgrespro.ru As this change: --- a/src/backend/access/transam/xlog.c +++ b/src/backend/access/transam/xlog.c @@ -9573,6 +9573,7 @@ xlog_redo(XLogReaderState *record) (errmsg("unexpected timeline ID %u (should be %u) in checkpoint record", checkPoint.ThisTimeLineID, ThisTimeLineID))); +KnownPreparedRecreateFiles(checkPoint.redo); RecoveryRestartPoint(); } And actually, when a XLOG_CHECKPOINT_SHUTDOWN record is taken, 2PC files are not flushed to disk with this patch. This is a problem as a new restart point is created... Having the flush in CheckpointTwoPhase really makes the most sense. -- Michael -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Checksums by default?
On Wed, Jan 25, 2017 at 1:22 PM, Peter Geoghegan wrote: > I understand that my experience with storage devices is unusually > narrow compared to everyone else here. That's why I remain neutral on > the high level question of whether or not we ought to enable checksums > by default. I'll ask other hackers to answer what may seem like a very > naive question, while bearing what I just said in mind. The question > is: Have you ever actually seen a checksum failure in production? And, > if so, how helpful was it? I'm surprised that nobody has answered my question yet. I'm not claiming that not actually seeing any corruption in the wild due to a failing checksum invalidates any argument. I *do* think that data points like this can be helpful, though. -- Peter Geoghegan
Re: [HACKERS] pgbench more operators & functions
Fabien, * Fabien COELHO (coe...@cri.ensmp.fr) wrote: > I think that there is a misunderstanding, most of which being my fault. No worries, it happens. :) > I have really tried to do everything that was required from > committers, including revising the patch to match all previous > feedback. Thanks for continuing to try to work through everything. I know it can be a difficult process, but it's all towards a (hopefully) improved and better PG. > Version 6 sent on Oct 4 did include all fixes required at the time > (no if, no unusual and operators, TAP tests)... However I forgot to > remove some documentation about the removed stuff, which made Robert > think that I had not done it. I apologise for this mistake and the > subsequent misunderstanding:-( Ok, that helps clarify things. As does the rest of your email, for me, anyway. > If pgbench is about being seated on a bench and running postgres on > your laptop to get some heat, my mistake... I thought it was about > benchmarking, which does imply a few extra capabities. I believe we do want to improve pgbench and your changes are generally welcome when it comes to adding useful capabilities. Your explanation was also helpful about the specific requirements. > IMHO the relevant current status of the patch should be "Needs > review" and possibly "Move to next CF". For my 2c, at least, while I'm definitely interested in this, it's not nearly high enough on my plate with everything else going on to get any attention in the next few weeks, at least. I do think that, perhaps, this patch may deserve a bit of a break, to allow people to come back to it with a fresh perspective, so perhaps moving it to the next commitfest would be a good idea, in a Needs Review state. Thanks! Stephen signature.asc Description: Digital signature
Re: [HACKERS] Checksums by default?
On Thu, Jan 26, 2017 at 2:28 PM, Stephen Frost wrote: > Sadly, without having them enabled by default, there's not a huge corpus > of example cases to draw from. > > There have been a few examples already posted about corruption failures > with PG, but one can't say with certainty that they would have been > caught sooner if checksums had been enabled. I don't know how comparable it is to our checksum technology, but MySQL seems to have some kind of checksums on table data, and you can find public emails, blogs etc lamenting corrupted databases by searching Google for the string "InnoDB: uncompressed page, stored checksum in field1" (that's the start of a longer error message that includes actual and expected checksums). -- Thomas Munro http://www.enterprisedb.com
Re: [HACKERS] Speedup twophase transactions
On Wed, Jan 25, 2017 at 11:55 PM, Nikhil Sontakkewrote: >> We are talking about the recovery/promote code path. Specifically this >> call to KnownPreparedRecreateFiles() in PrescanPreparedTransactions(). >> >> We write the files to disk and they get immediately read up in the >> following code. We could not write the files to disk and read >> KnownPreparedList in the code path that follows as well as elsewhere. > > Thinking more on this. > > The only optimization that's really remaining is handling of prepared > transactions that have not been committed or will linger around for > long. The short lived 2PC transactions have been optimized already via > this patch. > > The question remains whether saving off a few fsyncs/reads for these > long-lived prepared transactions is worth the additional code churn. > Even if we add code to go through the KnownPreparedList, we still will > have to go through the other on-disk 2PC transactions anyways. So, > maybe not. We should really try to do things right now, or we'll never come back to it. 9.3 (if my memory does not fail me?) has reduced the time to do promotion by removing the need of the end-of-recovery checkpoint, while I agree that there should not be that many 2PC transactions at this point, if there are for a reason or another, the time it takes to complete promotion would be impacted. So let's refactor PrescanPreparedTransactions() so as it is able to handle 2PC data from a buffer extracted by XlogReadTwoPhaseData(), and we are good to go. --- a/src/backend/access/transam/xlog.c +++ b/src/backend/access/transam/xlog.c @@ -9573,6 +9573,15 @@ xlog_redo(XLogReaderState *record) (errmsg("unexpected timeline ID %u (should be %u) in checkpoint record", checkPoint.ThisTimeLineID, ThisTimeLineID))); + + /* +* Move prepared transactions, if any, from KnownPreparedList to files. 
+* It is possible to skip this step and teach subsequent code about +* KnownPreparedList, but PrescanPreparedTransactions() happens once +* during end of recovery or on promote, so probably it isn't worth +* the additional code. +*/ + KnownPreparedRecreateFiles(checkPoint.redo); RecoveryRestartPoint(); Looking again at this code, I think that this is incorrect. The checkpointer should be in charge of doing this work and not the startup process, so this should go into CheckpointTwoPhase() instead. At the end, we should be able to just live without KnownPreparedRecreateFiles() and just rip it off from the patch. -- Michael -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Checksums by default?
* Robert Haas (robertmh...@gmail.com) wrote: > On Wed, Jan 25, 2017 at 6:30 PM, Stephen Frostwrote: > > I hope to discuss it further after we have the ability to turn it off > > easily. > > I think we should have the ability to flip it in BOTH directions easily. Presumably you imply this to mean "before we enable it by default." I'm not sure that I can agree with that, but we haven't got it in either direction yet, so it's not terribly interesting to discuss that particular "what if." > It sounds to me like you are misleading users about the positives and > negatives of checksums, which then causes them to be shocked that they > are not the default. I don't try to claim that they are without downsides or performance impacts, if that's the implication here. > > [ more unsolicited bragging an unspecified backup tool, presumably still > > pgbackrest ] It was explicitly to counter the claim that there aren't things out there which are working to actively check the checksums. > > I'd rather walk into an engagement where the user is saying "yeah, we > > enabled checksums and it caught this corruption issue" than having to > > break the bad news, which I've had to do over and over, that their > > existing system hasn't got checksums enabled. This isn't hypothetical, > > it's what I run into regularly with entirely reasonable and skilled > > engineers who have been deploying PG. > > Maybe you should just stop telling them and use the time thus freed up > to work on improving the checksum feature. I'm working to improve the usefulness of our checksum feature in a way which will produce practical and much more immediate results than anything I could do today in PG. That said, I do plan to also support working on checksums as I'm able to. At the moment, that's supporting Magnus' thread about enabling them by default. 
I'd be a bit surprised if he was trying to force a change on PG because he thinks it's going to improve things for pgbackrest, but if so, I'm not going to complain when it seems like an entirely sensible and good change which will benefit PG's users too. Even better would be if we had an independent tool to check checksums endorsed by the PG community, but that won't happen for a release cycle. I'd also be extremely happy if the other backup tools out there grew the ability to check checksums in PG pages; frankly, I hope that adding it to pgbackrest will push them to do so. > I'm skeptical of this whole discussion because you seem to be filled > with unalloyed confidence that checksums have little performance > impact and will do wonderful things to prevent data loss, whereas I > think they have significant performance impact and will only very > slightly help to prevent data loss. I admit that they'll have a significant performance impact in some environments, but I think the vast majority of installations won't see anything different, while some of them may be saved by it, including, as likely as not, a number of actual corruption issues that have been brought up on these lists in the past few days, simply because reports were asked for. > I admit that the idea of having > pgbackrest verify checksums while backing up seems like it could > greatly improve the chances of checksums being useful, but I'm not > going to endorse changing PostgreSQL's default for pgbackrest's > benefit. I'm glad to hear that you generally endorse the idea of having a backup tool verify checksums. I'd love it if all of them did and I'm not going to apologize for, as far as I'm aware, being the first to even make an effort in that direction. > It's got to be to the benefit of PostgreSQL users broadly, > not just the subset of those people who use one particular backup > tool. 
Hopefully, other backup solutions will add similar capability, and perhaps someone will also write an independent tool, and eventually those will get out in released versions, and maybe PG will grow a tool to check checksums too, but I can't make other tool authors implement it, nor can I make other committers work on it and while I'm doing what I can, as I'm sure you understand, we all have a lot of different hats. > Also, the massive hit that will probably occur on > high-concurrency OLTP workloads larger than shared_buffers is going to > be had to justify for any amount of backup security. I think that > problem's got to be solved or at least mitigated before we think about > changing this. I realize that not everyone would set the bar that > high, but I see far too many customers with exactly that workload to > dismiss it lightly. I have a sneaking suspicion that the customers which you get directly involved with tend to be at a different level than the majority of PG users which exist out in the wild (I can't say that it's really any different for me). I don't think that's a bad thing, but I do think users at all levels deserve consideration and not just those running close to the limits of
Re: [HACKERS] Checksums by default?
Peter, * Peter Geoghegan (p...@heroku.com) wrote: > On Wed, Jan 25, 2017 at 1:22 PM, Peter Geoghegan wrote: > > I understand that my experience with storage devices is unusually > > narrow compared to everyone else here. That's why I remain neutral on > > the high level question of whether or not we ought to enable checksums > > by default. I'll ask other hackers to answer what may seem like a very > > naive question, while bearing what I just said in mind. The question > > is: Have you ever actually seen a checksum failure in production? And, > > if so, how helpful was it? > > I'm surprised that nobody has answered my question yet. > > I'm not claiming that not actually seeing any corruption in the wild > due to a failing checksum invalidates any argument. I *do* think that > data points like this can be helpful, though. Sadly, without having them enabled by default, there's not a huge corpus of example cases to draw from. There have been a few examples already posted about corruption failures with PG, but one can't say with certainty that they would have been caught sooner if checksums had been enabled. Thanks! Stephen
Re: [HACKERS] COPY as a set returning function
> > >> I don't understand why do we have all these checks. Can't we just pass >> the values to COPY and let it apply the checks? That way, when COPY is >> updated to support multibyte escape chars (for example) we don't need to >> touch this code. Together with removing the unneeded braces that would >> make these stanzas about six lines long instead of fifteen. >> > > If I understand you correctly, COPY (via BeginCopyFrom) itself relies on > having a relation in pg_class to reference for attributes. > In this case, there is no such relation. So I'd have to fake a relcache > entry, or refactor BeginCopyFrom() to extract a ReturnSetInfo from the > Relation and pass that along to a new function BeginCopyFromReturnSet. I'm > happy to go that route if you think it's a good idea. > > >> >> >> > + tuple = heap_form_tuple(tupdesc,values,nulls); >> > + //tuple = BuildTupleFromCStrings(attinmeta, >> field_strings); >> > + tuplestore_puttuple(tupstore, tuple); >> >> No need to form a tuple; use tuplestore_putvalues here. >> > > Good to know! > > > >> >> I wonder if these should be an auxiliary function in copy.c to do this. >> Surely copy.c itself does pretty much the same thing ... >> > > Yes. This got started as a patch to core because not all of the parts of > COPY are externally callable, and aren't broken down in a way that allowed > for use in a SRF. > > I'll get to work on these suggestions. > I've put in some more work on this patch, mostly just taking Alvaro's suggestions, which resulted in big code savings. I had to add a TupleDesc parameter to BeginCopy() and BeginCopyFrom(). This seemed the easiest way to leverage the existing tested code (and indeed, it worked nearly out-of-the-box). The only drawback is that a minor change will have to be made to the BeginCopyFrom() call in file_fdw.c, and any other extensions that leverage COPY. 
We could make compatibility functions that take the original signature and pass it along to the corresponding function with rsTupDesc set to NULL.

Some issues:

- I'm still not sure if the direction we want to go is a set returning function, or a change in grammar that lets us use COPY as a CTE or similar.
- This function will have the same difficulties as adding the program option did to file_fdw: there's very little we can reference that isn't os/environment specific.
- Inline (STDIN) prompts the user for input, but gives the error: server sent data ("D" message) without prior row description ("T" message). I looked for a place where the Relation was consulted for the row description, but I'm not finding it.

I can continue to flesh this out with documentation and test cases if there is consensus that this is the way to go.

# select * from copy_srf('echo "x\ty"',true) as t(x text, y text);
 x | y
---+---
 x | y
(1 row)

Time: 1.074 ms

# select * from copy_srf('echo "x\t4"',true) as t(x text, y integer);
 x | y
---+---
 x | 4
(1 row)

Time: 1.095 ms

# select * from copy_srf(null) as t(x text, y integer);
Enter data to be copied followed by a newline.
End with a backslash and a period on a line by itself.
>> a	4
>> b	5
>> \.
server sent data ("D" message) without prior row description ("T" message)

diff --git a/src/backend/catalog/system_views.sql b/src/backend/catalog/system_views.sql
index 4dfedf8..26f81f3 100644
--- a/src/backend/catalog/system_views.sql
+++ b/src/backend/catalog/system_views.sql
@@ -1065,6 +1065,21 @@ LANGUAGE INTERNAL
 STRICT IMMUTABLE PARALLEL SAFE
 AS 'jsonb_insert';
 
+CREATE OR REPLACE FUNCTION copy_srf(
+       IN filename text,
+       IN is_program boolean DEFAULT false,
+       IN format text DEFAULT null,
+       IN delimiter text DEFAULT null,
+       IN null_string text DEFAULT null,
+       IN header boolean DEFAULT null,
+       IN quote text DEFAULT null,
+       IN escape text DEFAULT null,
+       IN encoding text DEFAULT null)
+RETURNS SETOF RECORD
+LANGUAGE INTERNAL
+VOLATILE ROWS 1000 COST 1000 CALLED ON NULL INPUT
+AS 'copy_srf';
+
 -- The default permissions for functions mean that anyone can execute them.
 -- A number of functions shouldn't be executable by just anyone, but rather
 -- than use explicit 'superuser()' checks in those functions, we use the GRANT
diff --git a/src/backend/commands/copy.c b/src/backend/commands/copy.c
index f9362be..4e6a32c 100644
--- a/src/backend/commands/copy.c
+++ b/src/backend/commands/copy.c
@@ -30,6 +30,7 @@
 #include "commands/defrem.h"
 #include "commands/trigger.h"
 #include "executor/executor.h"
+#include "funcapi.h"
 #include "libpq/libpq.h"
 #include "libpq/pqformat.h"
 #include "mb/pg_wchar.h"
@@ -286,7 +287,8 @@ static const char BinarySignature[11] = "PGCOPY\n\377\r\n\0";
 
 /* non-export function prototypes */
-static CopyState BeginCopy(ParseState *pstate, bool is_from, Relation rel,
+static CopyState BeginCopy(ParseState *pstate, bool is_from, Relation rel,
+                          TupleDesc rsTupDesc,
                           RawStmt *raw_query,
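For readers following the review, Alvaro's "use tuplestore_putvalues" suggestion quoted earlier in this thread amounts to something like the sketch below. This is backend-internal C, not compilable on its own; the variable names (tupstore, tupdesc, values, nulls) mirror the quoted patch fragment and are illustrative only.

```c
/* Before: materialize a HeapTuple, then copy it into the tuplestore. */
tuple = heap_form_tuple(tupdesc, values, nulls);
tuplestore_puttuple(tupstore, tuple);

/*
 * After: let the tuplestore build the tuple from the Datum/null arrays
 * directly, skipping the intermediate HeapTuple allocation.  The values
 * and nulls arrays must match tupdesc, exactly as with heap_form_tuple.
 */
tuplestore_putvalues(tupstore, tupdesc, values, nulls);
```

Besides saving an allocation per row, this also drops the need for the commented-out BuildTupleFromCStrings() path in the quoted fragment.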
Re: [HACKERS] logical decoding of two-phase transactions
On 5 January 2017 at 20:43, Stas Kelvich wrote:
>
>> On 5 Jan 2017, at 13:49, Simon Riggs wrote:
>>
>> Surely in this case the master server is acting as the Transaction
>> Manager, and it knows the mapping, so we are good?
>>
>> I guess if you are using >2 nodes then you need to use full 2PC on each node.
>>
>> Please explain precisely how you expect to use this, to check that GID
>> is required.
>
> For example if we are using logical replication just for failover/HA and
> allowing user to be transaction manager itself. Then suppose that user
> prepared tx on server A and server A crashed. After that client may want
> to reconnect to server B and commit/abort that tx. But user only have GID
> that was used during prepare.
>
>> But even then, if you adopt the naming convention that all in-progress
>> xacts will be called RepOriginId-EPOCH-XID, so they have a fully
>> unique GID on all of the child nodes then we don't need to add the
>> GID.
>
> Yes, that’s also possible but seems to be less flexible restricting us to
> some specific GID format.
>
> Anyway, I can measure WAL space overhead introduced by the GID’s inside
> commit records to know exactly what will be the cost of such approach.

Stas,

Have you had a chance to look at this further?

I think the approach of storing just the xid and fetching the GID during logical decoding of the PREPARE TRANSACTION is probably the best way forward, per my prior mail. That should eliminate Simon's objection re the cost of tracking GIDs and still let us have access to them when we want them, which is the best of both worlds really.

--
Craig Ringer                   http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] pg_ls_dir & friends still have a hard-coded superuser check
On Wed, Jan 25, 2017 at 8:22 PM, Stephen Frost wrote:
> Apparently we disagree about what is a 'reasonable manner'.

Yes. I think that a "reasonable manner" should mean "what the DBA thinks is reasonable", whereas you apparently think it should mean "what the DBA thinks is reasonable, but only if the core developers and in particular Stephen Frost also think it's reasonable".

Your proposed policy is essentially that functions should have built-in superuser checks if having access to that function is sufficient to escalate your account to full superuser privileges. But:

1. There's no consensus on any such policy.

2. If there were such a policy it would favor, not oppose, changing pg_ls_dir(), because you can't escalate to superuser given access to pg_ls_dir(). Yet you are also opposed to changing pg_ls_dir() for reasons that boil down to a personal preference on your part for people not using it to build monitoring scripts.

3. Such a policy can only be enforced to the extent that we can accurately predict which functions can be used to escalate to superuser, which is not necessarily obvious in every case. Under your proposed policy, if a given function turns out to be more dangerous than we'd previously thought, we'd have to stick the superuser check back in for the next release. And if it turns out to be less dangerous than we thought, we'd take the check out. That would be silly.

4. Such a policy is useless from a security perspective because you can't actually prevent superusers from delegating access to those functions. You can force them to use wrapper functions but that doesn't eo ipso improve security. It might make security better or worse depending on how well the functions are written, and it seems extremely optimistic to suppose that everyone who writes a security definer wrapper function will actually do anything more than expose the underlying function as-is (and maybe forget to schema-qualify something).

5.
If you're worried about people granting access to functions that allow escalation to the superuser account, what you really ought to do is put some effort into documenting which functions have such hazards and for what general reasons. That would have a much better chance of preventing people from delegating access to dangerous functions inadvertently than the current method, which relies on people knowing (without documentation) that you've attempted to leave hard-coded superuser() checks in some functions but not others for reasons that sometimes but not always include privilege escalation, correctly distinguishing which such cases involve privilege escalation as opposed to other arbitrary criteria, and deciding neither to create secdef wrappers for anything that has a built-in check for reasons of privilege isolation nor to give up on privilege isolation and hand out superuser to people who need pg_ls_dir(). While such a clever chain of deductive reasoning cannot be ruled out, it would be astonishing if it happened very often.

I'd be willing to agree to write documentation along the lines suggested in (5) as a condition of removing the remaining superuser checks if you'd be willing to review it and suggest a place to put it. But I have a feeling compromise may not be possible here.

To me, the hand-wringing about the evils of pg_ls_dir() on this thread contrasts rather starkly with the complete lack of discussion about whether the patch removing superuser checks from pgstattuple was opening up any security vulnerabilities. And given that the aforesaid patch lets a user who has EXECUTE privileges on pgstattuple run that function even on relations for which they have no other privileges, such as say pg_authid, it hardly seems self-evident that there are no leaks there.
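As a point of reference for the wrapper pattern debated above, a security definer wrapper delegating pg_ls_dir() typically looks something like the sketch below. The function name, target directory, and role name are invented for illustration; the SET search_path clause guards against the schema-qualification pitfall mentioned in point 4.

```sql
-- Hypothetical wrapper: expose one directory listing to a monitoring
-- role without granting superuser.  Runs with the privileges of the
-- (superuser) function owner, so the body must be kept tight.
CREATE FUNCTION monitor_ls_logdir() RETURNS SETOF text
LANGUAGE sql SECURITY DEFINER
SET search_path = pg_catalog
AS $$ SELECT pg_catalog.pg_ls_dir('pg_log') $$;

-- Lock it down to the intended role only.
REVOKE EXECUTE ON FUNCTION monitor_ls_logdir() FROM PUBLIC;
GRANT EXECUTE ON FUNCTION monitor_ls_logdir() TO monitoring;
```

Note that a wrapper which instead passed a caller-supplied path straight through to pg_ls_dir() would re-expose the underlying function as-is, which is exactly the failure mode described above.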
--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company