Re: [Proposal] Global temporary tables
On Fri, Nov 1, 2019 at 17:09 Konstantin Knizhnik <k.knizh...@postgrespro.ru> wrote:
>
> On 01.11.2019 18:26, Robert Haas wrote:
> > On Fri, Nov 1, 2019 at 11:15 AM Konstantin Knizhnik wrote:
> >> It seems to me that I have found a quite elegant solution for per-backend
> >> statistics for GTT: I just insert them into the backend's catalog cache,
> >> but not into the pg_statistic table itself.
> >> To do that I had to add InsertSysCache/InsertCatCache functions which
> >> insert a pinned entry into the corresponding cache.
> >> I wonder if there are some pitfalls to such an approach?
> >
> > That sounds pretty hackish. You'd have to be very careful, for
> > example, that if the tables were dropped or re-analyzed, all of the
> > old entries got removed --
>
> I have checked it:
> - when the table is reanalyzed, the cache entries are replaced.
> - when the table is dropped, the cache entries are removed.
>
> > and then it would still fail if any code
> > tried to access the statistics directly from the table, rather than
> > via the caches. My assumption is that the statistics ought to be
> > stored in some backend-private data structure designed for that
> > purpose, and that the code that needs the data should be taught to
> > look for it there when the table is a GTT.
>
> Yes, if you do "select * from pg_statistic" then you will not see
> statistics for a GTT in this case.
> But I do not think that is so critical. I do not believe that anybody
> is trying to manually interpret the values in this table.
> And the optimizer retrieves statistics through the syscache mechanism,
> so it is able to build a correct plan in this case.

Years ago, when I thought about this, I wrote a patch with a similar
design. It worked, but it was surely ugly.

I have another idea. Could pg_statistic be a view instead of a table?
Something like

SELECT * FROM pg_catalog.pg_statistics_rel
UNION ALL
SELECT * FROM pg_catalog.pg_statistics_gtt();

Internally, when the statistics cache is filled, pg_statistics_rel and
pg_statistics_gtt() could be used directly. As far as I remember, that
code could only work with plain relations, not with queries.

Or a crazy idea - today we can implement our own types of heaps. Would
it be possible to create an engine whose result is a combination of some
shared data and some local data, so that the union is implemented at the
heap level? The implementation could be simple, just scanning pages from
shared buffers and from local buffers; for these tables we don't need
complex metadata. It's a crazy idea, though, and I think the union with
a table function would be best.

Regards

Pavel

> --
> Konstantin Knizhnik
> Postgres Professional: http://www.postgrespro.com
> The Russian Postgres Company
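For context on Konstantin's last point: the planner fetches per-column
statistics through the STATRELATTINH syscache rather than by scanning
pg_statistic directly, which is why a pinned catcache entry is visible to
the optimizer at all. A minimal sketch of that lookup path, mirroring
what get_attavgwidth() in lsyscache.c does:

#include "postgres.h"
#include "access/attnum.h"
#include "access/htup_details.h"
#include "catalog/pg_statistic.h"
#include "utils/syscache.h"

/*
 * Look up one column's average width via the STATRELATTINH syscache.
 * A pinned entry inserted straight into this cache (as described above
 * for GTTs) is found here even though it has no backing heap tuple.
 */
static int32
stawidth_for_column(Oid relid, AttrNumber attnum)
{
	HeapTuple	tp;
	int32		width = 0;

	tp = SearchSysCache3(STATRELATTINH,
						 ObjectIdGetDatum(relid),
						 Int16GetDatum(attnum),
						 BoolGetDatum(false));	/* inh = false */
	if (HeapTupleIsValid(tp))
	{
		width = ((Form_pg_statistic) GETSTRUCT(tp))->stawidth;
		ReleaseSysCache(tp);
	}
	return width;
}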
Re: [BUG] Partition creation fails after dropping a column and adding a partial index
On Fri, Nov 01, 2019 at 09:58:26AM +0900, Amit Langote wrote:
> On Thu, Oct 31, 2019 at 1:45 PM Michael Paquier wrote:
>> The patch is rather simple as per the attached, with extended
>> regression tests included. I have not checked on back-branches yet,
>> but that's visibly wrong since 8b08f7d down to v11 (will do that when
>> back-patching).
>
> The patch looks correct and applies to both v12 and v11.

Thanks for the review, committed down to v11. The version for v11 had
a couple of conflicts, actually.

>> There could be a point in changing convert_tuples_by_name_map & co so
>> that they return the length of the map on top of the map itself, to
>> avoid such mistakes in the future. That's a more invasive patch not
>> really adapted for a back-patch, but we could do that on HEAD once
>> this bug is fixed. I have also checked the other calls of this API
>> and the handling is done correctly.
>
> I've been bitten by this logical error when deciding what length to
> use for the map, so this seems like a good idea.

Okay, let's see about that then.
--
Michael
Re: Remove configure --disable-float4-byval and --disable-float8-byval
On Fri, Nov 1, 2019 at 7:47 PM Michael Paquier wrote:
>> This line of argument seems to me to be the moral equivalent of
>> "let's drop 32-bit support altogether". I'm not entirely on board
>> with that. Certainly, a lot of the world is 64-bit these days,
>> but people are still building small systems and they might want
>> a database; preferably one that hasn't been detuned to the extent
>> that it barely manages to run at all on such a platform. Making
>> a whole lot of internal APIs 64-bit would be a pretty big hit for
>> a 32-bit platform --- more instructions, more memory consumed for
>> things like Datum arrays, all in a memory space that's not that big.
>
> I don't agree either with the line of argument to just remove 32-bit
> support.

Clearly you didn't read what I actually wrote, Michael.
--
Peter Geoghegan
Re: Remove configure --disable-float4-byval and --disable-float8-byval
On 2019-11-02 11:47:26 +0900, Michael Paquier wrote:
> On Fri, Nov 01, 2019 at 02:00:10PM -0400, Tom Lane wrote:
> > Peter Geoghegan writes:
> >> Even Raspberry Pi devices (which can cost as little as $35) use
> >> 64-bit ARM processors. It's abundantly clear that 32-bit platforms
> >> do not matter enough to justify keeping all the SIZEOF_DATUM crud
> >> around.
> >
> > This line of argument seems to me to be the moral equivalent of
> > "let's drop 32-bit support altogether". I'm not entirely on board
> > with that. Certainly, a lot of the world is 64-bit these days,
> > but people are still building small systems and they might want
> > a database; preferably one that hasn't been detuned to the extent
> > that it barely manages to run at all on such a platform. Making
> > a whole lot of internal APIs 64-bit would be a pretty big hit for
> > a 32-bit platform --- more instructions, more memory consumed for
> > things like Datum arrays, all in a memory space that's not that big.
>
> I don't agree either with the line of argument to just remove 32-bit
> support.

Nobody is actually proposing that, as far as I can tell? I mean, by all
means argue that the overhead is too high, but just claiming that
slowing down 32-bit systems by a measurable fraction is morally
equivalent to removing 32-bit support seems pointlessly facetious.

Greetings,

Andres Freund
Re: Remove configure --disable-float4-byval and --disable-float8-byval
On Fri, Nov 01, 2019 at 02:00:10PM -0400, Tom Lane wrote:
> Peter Geoghegan writes:
>> Even Raspberry Pi devices (which can cost as little as $35) use
>> 64-bit ARM processors. It's abundantly clear that 32-bit platforms
>> do not matter enough to justify keeping all the SIZEOF_DATUM crud
>> around.
>
> This line of argument seems to me to be the moral equivalent of
> "let's drop 32-bit support altogether". I'm not entirely on board
> with that. Certainly, a lot of the world is 64-bit these days,
> but people are still building small systems and they might want
> a database; preferably one that hasn't been detuned to the extent
> that it barely manages to run at all on such a platform. Making
> a whole lot of internal APIs 64-bit would be a pretty big hit for
> a 32-bit platform --- more instructions, more memory consumed for
> things like Datum arrays, all in a memory space that's not that big.

I don't agree either with the line of argument to just remove 32-bit
support. The newest models of the Pi indeed use 64-bit ARM processors,
but the first model, as well as the Pi Zero, are 32-bit if I recall
correctly, and I would like to believe that these are still widely
used.
--
Michael
Re: Make StringInfo available to frontend code.
Hi,

On 2019-11-01 23:19:33 +0100, Daniel Gustafsson wrote:
> > On 30 Oct 2019, at 01:10, Andres Freund wrote:
> >
> > Make StringInfo available to frontend code.
>
> I've certainly wanted just that on multiple occasions, so +1 on this.

Cool.

> + * StringInfo provides an extensible string data type. It can be used to
>
> It might be useful to point out the upper bound on the extensibility
> in the rewrite of this sentence, and that it's not guaranteed to be
> consistent between frontend and backend.

Hm. Something like 'Currently the maximum length of a StringInfo is
1GB.'? I don't really think it's worth pointing out that they may not
be consistent, when they currently are... And I suspect we should just
fix the length limit to be higher for both, perhaps somehow allowing to
limit the length for the backend cases where we want to error out if a
string gets too long (possibly adding a separate initialization API
that allows specifying the memory allocation flags or such).

> > I'm still using stringinfo in the aforementioned thread, and I also
> > want to use it in a few more places. On the more ambitious side I
> > really would like to have a minimal version of elog.h available in
> > the backend, and that would really be a lot easier with stringinfo
> > available.
>
> s/backend/frontend/?

Indeed.

Thanks for looking,

Andres Freund
Re: Rearranging ALTER TABLE to avoid multi-operations bugs
I wrote:
> Anyway, with the benefit of more time to let this thing percolate
> in my hindbrain, I am thinking that the fundamental error we've made
> is to do transformAlterTableStmt in advance of execution *at all*.
> The idea I now have is to scrap that, and instead apply the
> parse_utilcmd.c transformations individually to each AlterTable
> subcommand when it reaches execution in "phase 2" of AlterTable().

Attached is a patch that does things that way. This appears to fix all
of the previously reported order-of-operations bugs in ALTER TABLE,
although there's still some squirrely-ness around identity columns.

My original thought of postponing all parse analysis into the execution
phase turned out to be not quite right. We still want to analyze ALTER
COLUMN TYPE subcommands before we start doing anything. The reason why
is that any USING expressions in those subcommands should all be parsed
against the table's starting rowtype, since those expressions will all
be evaluated against that state during a single rewrite pass in phase 3.
Fortunately (but not coincidentally, I think) the execution-passes
design is "DROP, then ALTER COLUMN TYPE, then everything else", so that
this is okay.

I had to do some other finagling to get it to work, notably breaking
down some of the passes a bit more. This allows us to have a rule that
any new subcommands deduced during mid-execution parse analysis steps
will be executed in a strictly later pass. It might've been possible to
allow it to be "same pass", but I thought that would be putting an
undesirable amount of reliance on the semantics of appending to a list
that some other function is busy scanning.

What I did about the API issues we were arguing about before was just
to move the logic ProcessUtilitySlow had for handling
non-AlterTableStmts generated by ALTER TABLE parse analysis into a new
function that tablecmds.c calls. This doesn't really resolve any of the
questions I had about event trigger processing, but I think it at least
doesn't make anything worse. (The event trigger, logical decoding, and
sepgsql tests all pass without any changes.) It's tempting to consider
providing a similar API for CREATE SCHEMA to use, but I didn't do so
here.

The squirrely-ness around identity is that while this now works:

regression=# CREATE TABLE itest8 (f1 int);
CREATE TABLE
regression=# ALTER TABLE itest8
regression-#   ADD COLUMN f2 int NOT NULL,
regression-#   ALTER COLUMN f2 ADD GENERATED ALWAYS AS IDENTITY;
ALTER TABLE

it doesn't work if there's rows in the table:

regression=# CREATE TABLE itest8 (f1 int);
CREATE TABLE
regression=# insert into itest8 default values;
INSERT 0 1
regression=# ALTER TABLE itest8
  ADD COLUMN f2 int NOT NULL,
  ALTER COLUMN f2 ADD GENERATED ALWAYS AS IDENTITY;
ERROR:  column "f2" contains null values

The same would be true if you tried to do the ALTER as two separate
operations (because the ADD ... NOT NULL, without a default, will
naturally fail on a nonempty table). So I don't feel *too* awful about
that. But it'd be better if this worked. It'll require some refactoring
of where the dependency link from an identity column to its sequence
gets set up. This patch seems large enough as-is, and it covers all the
cases we've gotten field complaints about, so I'm content to leave the
residual identity issues for later.

			regards, tom lane

diff --git a/src/backend/commands/tablecmds.c b/src/backend/commands/tablecmds.c
index 8d25d14..7dc5d9a 100644
--- a/src/backend/commands/tablecmds.c
+++ b/src/backend/commands/tablecmds.c
@@ -86,6 +86,7 @@
 #include "storage/lock.h"
 #include "storage/predicate.h"
 #include "storage/smgr.h"
+#include "tcop/utility.h"
 #include "utils/acl.h"
 #include "utils/builtins.h"
 #include "utils/fmgroids.h"
@@ -144,11 +145,13 @@ static List *on_commits = NIL;
 #define AT_PASS_OLD_CONSTR		3	/* re-add existing constraints */
 /* We could support a RENAME COLUMN pass here, but not currently used */
 #define AT_PASS_ADD_COL			4	/* ADD COLUMN */
-#define AT_PASS_COL_ATTRS		5	/* set other column attributes */
-#define AT_PASS_ADD_INDEX		6	/* ADD indexes */
-#define AT_PASS_ADD_CONSTR		7	/* ADD constraints, defaults */
-#define AT_PASS_MISC			8	/* other stuff */
-#define AT_NUM_PASSES			9
+#define AT_PASS_ADD_CONSTR		5	/* ADD constraints (initial examination) */
+#define AT_PASS_COL_ATTRS		6	/* set column attributes, eg NOT NULL */
+#define AT_PASS_ADD_INDEXCONSTR	7	/* ADD index-based constraints */
+#define AT_PASS_ADD_INDEX		8	/* ADD indexes */
+#define AT_PASS_ADD_OTHERCONSTR	9	/* ADD other constraints, defaults */
+#define AT_PASS_MISC			10	/* other stuff */
+#define AT_NUM_PASSES			11
 
 typedef struct AlteredTableInfo
 {
@@ -161,6 +164,7 @@ typedef struct AlteredTableInfo
 	/* Information saved by Phases 1/2 for Phase 3: */
 	List	   *constraints;	/* List of NewConstraint */
 	List	   *newvals;		/* List of NewColumnValue */
+	List	   *afterStmts;		/* List of utility command
Re: Make StringInfo available to frontend code.
> On 30 Oct 2019, at 01:10, Andres Freund wrote:
>
> Make StringInfo available to frontend code.

I've certainly wanted just that on multiple occasions, so +1 on this.

> Therefore it seems best to just make StringInfo usable by frontend
> code. There's not much to do for that, except for rewriting two
> subsequent elog/ereport calls into other types of error reporting,
> and deciding on a maximum string length.

Skimming (but not testing) the patch, it seems a reasonable approach.

+ * StringInfo provides an extensible string data type. It can be used to

It might be useful to point out the upper bound on the extensibility in
the rewrite of this sentence, and that it's not guaranteed to be
consistent between frontend and backend.

> I'm still using stringinfo in the aforementioned thread, and I also
> want to use it in a few more places. On the more ambitious side I
> really would like to have a minimal version of elog.h available in the
> backend, and that would really be a lot easier with stringinfo
> available.

s/backend/frontend/?

cheers ./daniel
Re: Join Correlation Name
On 30/10/2019 09:04, Fabien COELHO wrote:
>
>> I think possibly what the spec says (and that neither my patch nor
>> Peter's implements) is assigning the alias just to the common column
>> list.
>
> I think you are right, the alias is only on the identical columns.
>
> It solves the issue I raised about inaccessible attributes, and
> explains why it is only available with USING and no other join
> variants.
>
>> So my original example query should actually be:
>>
>> SELECT a.x, b.y, j.z FROM a INNER JOIN b USING (z) AS j;
>
> Yep, only z should be in j, it is really just about the USING clause.

My reading of SQL:2016-2 7.10 SR 11.a convinces me that this is the
case. My reading of transformFromClauseItem() convinces me that this is
way over my head and I have to abandon it. :-(
Re: fe-utils - share query cancellation code
> Then you need to add #include libpq-fe.h in cancel.h. Our policy is
> that headers compile standalone (c.h / postgres_fe.h / postgres.h
> excluded).

Ok. I do not make a habit of adding headers in postgres, so I did not
notice there was an alphabetical logic to that. Attached patch v4 does
it.

--
Fabien.

diff --git a/src/bin/pgbench/pgbench.c b/src/bin/pgbench/pgbench.c
index 03bcd22996..904d6f0e00 100644
--- a/src/bin/pgbench/pgbench.c
+++ b/src/bin/pgbench/pgbench.c
@@ -59,6 +59,7 @@
 #include "common/int.h"
 #include "common/logging.h"
+#include "fe_utils/cancel.h"
 #include "fe_utils/conditional.h"
 #include "getopt_long.h"
 #include "libpq-fe.h"
@@ -3887,6 +3888,9 @@ initGenerateData(PGconn *con)
 			exit(1);
 		}
 
+		if (CancelRequested)
+			break;
+
 		/*
 		 * If we want to stick with the original logging, print a message each
 		 * 100k inserted rows.
@@ -4057,6 +4061,9 @@ runInitSteps(const char *initialize_steps)
 	if ((con = doConnect()) == NULL)
 		exit(1);
 
+	setup_cancel_handler(NULL);
+	SetCancelConn(con);
+
 	for (step = initialize_steps; *step != '\0'; step++)
 	{
 		instr_time	start;
@@ -4120,6 +4127,7 @@ runInitSteps(const char *initialize_steps)
 	}
 
 	fprintf(stderr, "done in %.2f s (%s).\n", run_time, stats.data);
+	ResetCancelConn();
 	PQfinish(con);
 	termPQExpBuffer(&stats);
 }
diff --git a/src/bin/psql/command.c b/src/bin/psql/command.c
index b981ae81ff..f1d9e0298a 100644
--- a/src/bin/psql/command.c
+++ b/src/bin/psql/command.c
@@ -29,6 +29,7 @@
 #include "libpq-fe.h"
 #include "pqexpbuffer.h"
 #include "common/logging.h"
+#include "fe_utils/cancel.h"
 #include "fe_utils/print.h"
 #include "fe_utils/string_utils.h"
diff --git a/src/bin/psql/common.c b/src/bin/psql/common.c
index 90f6380170..0a66b71372 100644
--- a/src/bin/psql/common.c
+++ b/src/bin/psql/common.c
@@ -24,6 +24,7 @@
 #include "copy.h"
 #include "crosstabview.h"
 #include "fe_utils/mbprint.h"
+#include "fe_utils/cancel.h"
 #include "fe_utils/string_utils.h"
 #include "portability/instr_time.h"
 #include "settings.h"
@@ -223,6 +224,7 @@ NoticeProcessor(void *arg, const char *message)
 }
 
+#ifndef WIN32
 
 /*
  * Code to support query cancellation
@@ -241,7 +243,7 @@ NoticeProcessor(void *arg, const char *message)
  *
  * SIGINT is supposed to abort all long-running psql operations, not only
  * database queries. In most places, this is accomplished by checking
- * cancel_pressed during long-running loops. However, that won't work when
+ * CancelRequested during long-running loops. However, that won't work when
 * blocked on user input (in readline() or fgets()). In those places, we
 * set sigint_interrupt_enabled true while blocked, instructing the signal
 * catcher to longjmp through sigint_interrupt_jmp. We assume readline and
@@ -252,34 +254,9 @@
 volatile bool sigint_interrupt_enabled = false;
 
 sigjmp_buf	sigint_interrupt_jmp;
 
-static PGcancel *volatile cancelConn = NULL;
-
-#ifdef WIN32
-static CRITICAL_SECTION cancelConnLock;
-#endif
-
-/*
- * Write a simple string to stderr --- must be safe in a signal handler.
- * We ignore the write() result since there's not much we could do about it.
- * Certain compilers make that harder than it ought to be.
- */
-#define write_stderr(str) \
-	do { \
-		const char *str_ = (str); \
-		int			rc_; \
-		rc_ = write(fileno(stderr), str_, strlen(str_)); \
-		(void) rc_; \
-	} while (0)
-
-
-#ifndef WIN32
-
 static void
-handle_sigint(SIGNAL_ARGS)
+psql_sigint_callback(void)
 {
-	int			save_errno = errno;
-	char		errbuf[256];
-
 	/* if we are waiting for input, longjmp out of it */
 	if (sigint_interrupt_enabled)
 	{
@@ -288,74 +265,20 @@ handle_sigint(SIGNAL_ARGS)
 	}
 
 	/* else, set cancel flag to stop any long-running loops */
-	cancel_pressed = true;
-
-	/* and send QueryCancel if we are processing a database query */
-	if (cancelConn != NULL)
-	{
-		if (PQcancel(cancelConn, errbuf, sizeof(errbuf)))
-			write_stderr("Cancel request sent\n");
-		else
-		{
-			write_stderr("Could not send cancel request: ");
-			write_stderr(errbuf);
-		}
-	}
-
-	errno = save_errno;		/* just in case the write changed it */
+	CancelRequested = true;
 }
 
-void
-setup_cancel_handler(void)
-{
-	pqsignal(SIGINT, handle_sigint);
-}
-
-#else							/* WIN32 */
-
-static BOOL WINAPI
-consoleHandler(DWORD dwCtrlType)
-{
-	char		errbuf[256];
-
-	if (dwCtrlType == CTRL_C_EVENT ||
-		dwCtrlType == CTRL_BREAK_EVENT)
-	{
-		/*
-		 * Can't longjmp here, because we are in wrong thread :-(
-		 */
-
-		/* set cancel flag to stop any long-running loops */
-		cancel_pressed = true;
-
-		/* and send QueryCancel if we are processing a database query */
-		EnterCriticalSection(&cancelConnLock);
-		if (cancelConn != NULL)
-		{
-			if (PQcancel(cancelConn, errbuf, sizeof(errbuf)))
-				write_stderr("Cancel request sent\n");
-			else
-			{
-				write_stderr("Could not send cancel request: ");
-				write_stderr(errbuf);
-			}
-		}
-		LeaveCriticalSection(&cancelConnLock);
-
-		return TRUE;
-	}
-	else
-		/* Return FALSE for any
Re: Remove configure --disable-float4-byval and --disable-float8-byval
On Fri, Nov 1, 2019 at 1:19 PM Robert Haas wrote:
> Yeah! I mean, users who are using only 4-byte or smaller pass-by-value
> quantities will be harmed, especially in cases where they are storing
> a lot of them at the same time (e.g. sorting) and especially if they
> double their space consumption and run out of their very limited
> supply of memory.

I think that you meant treble, not double. You're forgetting about the
palloc() header overhead. ;-)

Doing even slightly serious work on a 32-bit machine is penny wise and
pound foolish. I also believe that we should support minority platforms
reasonably well, including 32-bit platforms, because it's always a good
idea to try to meet people where they are. Your proposal doesn't seem
like it really gives up on that goal.

> Those are all worthwhile considerations and perhaps
> strong arguments against my proposal. But, people using 8-byte
> oughta-be-pass-by-value quantities are certainly being harmed by the
> present system. It's not a black-and-white thing, like, oh, 8-byte
> datums on 32-bit system is awful all the time. Or at least, I don't
> think it is.

Right.

> Yeah, I've had to fight with this multiple times, and so have other
> people. It's a nuisance, and it causes bugs, certainly in draft
> patches, sometimes in committed ones. It's not "free." If it's a cost
> worth paying, ok, but is it?

#ifdef crud is something that we should go out of our way to eliminate
on general principle. All good portable C codebases go to great lengths
to encapsulate platform differences, if necessary by adding a
compatibility layer. One of the worst things about the OpenSSL codebase
is that it makes writing portable code everybody's problem.

--
Peter Geoghegan
excluding CREATE INDEX CONCURRENTLY from OldestXmin
It would be useful to have CREATE INDEX CONCURRENTLY be ignored by
vacuuming's OldestXmin. Frequently in OLTP scenarios, CIC transactions
are severely disruptive because they are the only long-running
transactions in the system, and VACUUM has to keep rows only for their
sake, pointlessly.

The motivation for this change seems well justified to me (but feel
free to argue if you think otherwise). So the question is how to
implement this. Here's a very small patch for it:

diff --git a/src/backend/commands/indexcmds.c b/src/backend/commands/indexcmds.c
index 374e2d0efe..9081dfe384 100644
--- a/src/backend/commands/indexcmds.c
+++ b/src/backend/commands/indexcmds.c
@@ -532,6 +532,12 @@ DefineIndex(Oid relationId,
 				 errmsg("cannot use more than %d columns in an index",
 						INDEX_MAX_KEYS)));
 
+	if (stmt->concurrent && !IsTransactionBlock())
+	{
+		Assert(GetCurrentTransactionIdIfAny() == InvalidTransactionId);
+		MyPgXact->vacuumFlags |= PROC_IN_VACUUM;
+	}
+
 	/*
 	 * Only SELECT ... FOR UPDATE/SHARE are allowed while doing a standard
 	 * index build; but for concurrent builds we allow INSERT/UPDATE/DELETE

There's an obvious flaw, which is that this doesn't consider
expressions in partial indexes and column definitions. That's
moderately easy to fix. But there are less obvious flaws, such as: is
it possible that CIC's xmin is required for other reasons? (such as
access to catalogs, which get cleaned by concurrent vacuum) If it is,
can we fix that problem by keeping track of a separate Xmin for catalog
vacuuming purposes? (We already have catalog_xmin for replication
slots, so this is not completely ridiculous, I think ...)

--
Álvaro Herrera
Re: Remove configure --disable-float4-byval and --disable-float8-byval
On Fri, Nov 1, 2019 at 3:15 PM Peter Geoghegan wrote:
> I don't think that those two things are equivalent at all. There may
> even be workloads that will benefit when run on 32-bit hardware.
> Having to palloc() and pfree() with 8 byte integers is probably very
> slow.

Yeah! I mean, users who are using only 4-byte or smaller pass-by-value
quantities will be harmed, especially in cases where they are storing a
lot of them at the same time (e.g. sorting) and especially if they
double their space consumption and run out of their very limited supply
of memory. Those are all worthwhile considerations and perhaps strong
arguments against my proposal. But, people using 8-byte
oughta-be-pass-by-value quantities are certainly being harmed by the
present system. It's not a black-and-white thing, like, oh, 8-byte
datums on 32-bit system is awful all the time. Or at least, I don't
think it is.

> The mental burden of considering "SIZEOF_DATUM == 4" and
> "USE_FLOAT8_BYVAL" is a real cost for us. We maintain non-trivial
> 32-bit only code for numeric abbreviated keys. We also have to worry
> about pfree()'ing memory when USE_FLOAT8_BYVAL within
> heapam_index_validate_scan(). How confident are we that there isn't
> some place that leaks memory on !USE_FLOAT8_BYVAL builds because
> somebody forgot to add a pfree() in an #ifdef block?

Yeah, I've had to fight with this multiple times, and so have other
people. It's a nuisance, and it causes bugs, certainly in draft
patches, sometimes in committed ones. It's not "free." If it's a cost
worth paying, ok, but is it?

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
Re: [Proposal] Table-level Transparent Data Encryption (TDE) and Key Management Service (KMS)
On Tue, Aug 6, 2019 at 10:36 AM Bruce Momjian wrote:
> OK, I think you are missing something. Let me go over the details.
> First, I think we are all agreed we are using CTR for heap/index
> pages, and for WAL, because CTR allows byte granularity, it is faster,
> and might be more secure.
>
> So, to write 8k heap/index pages, we use the agreed-on LSN/page-number
> to encrypt each page. In CTR mode, we do that by creating an 8k bit
> stream, which is created in 16-byte chunks with AES by incrementing
> the counter used for each 16-byte chunk. We then XOR the bits with
> what we want to encrypt, and skip the LSN and CRC parts of the page.

Seems reasonable (not that I am an encryption expert).

> For WAL, we effectively create a 16MB bitstream, though we can create
> it in parts as needed. (Creating it in parts is easier in CTR mode.)
> The nonce is the segment number, but each 16-byte chunk uses a
> different counter. Therefore, even if you are encrypting the same 8k
> page several times in the WAL, the 8k page would be different because
> of the LSN (and other changes), and the bitstream you encrypt/XOR it
> with would be different because the counter would be different for
> that offset in the WAL.

But, if you encrypt the same WAL page several times, the LSN won't
change, because a WAL page doesn't have an LSN on it, and if it did, it
wouldn't be changing, because an LSN is just a position within the WAL
stream, so any given byte on any given WAL page always has the same
LSN, whatever it is. And if the counter value changed on re-encryption,
I don't see how we'd know what counter value to use when decrypting.
There's no way for the code that is decrypting to know how many times
the page got rewritten as it was being filled. Please correct me if
I'm being stupid here.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
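To make the scheme under discussion concrete, here is a minimal sketch
of CTR-mode page encryption as Bruce describes it, written against
OpenSSL's EVP API. The key, and the construction of the 16-byte
nonce/counter block from the LSN and page number, are elided; the region
skipped here (just the 8-byte LSN) is an illustrative assumption, not
the proposal's exact page layout:

#include <string.h>
#include <openssl/evp.h>

#define BLCKSZ	8192
#define LSN_LEN	8				/* assumed size of the skipped region */

/*
 * XOR an AES-256-CTR keystream over one 8kB page, leaving the LSN in
 * the clear.  In CTR mode, encrypting a buffer of zeros yields the raw
 * keystream; OpenSSL increments the counter internally for each
 * 16-byte block, matching the per-chunk counter described above.
 */
static void
ctr_xor_page(unsigned char *page,
			 const unsigned char *key,		/* 32-byte AES-256 key */
			 const unsigned char *ivec)		/* 16-byte nonce||counter */
{
	unsigned char zeros[BLCKSZ];
	unsigned char stream[BLCKSZ];
	int			len;
	EVP_CIPHER_CTX *ctx = EVP_CIPHER_CTX_new();

	memset(zeros, 0, sizeof(zeros));
	EVP_EncryptInit_ex(ctx, EVP_aes_256_ctr(), NULL, key, ivec);
	EVP_EncryptUpdate(ctx, stream, &len, zeros, BLCKSZ);
	EVP_CIPHER_CTX_free(ctx);

	/* decryption is the same operation: XOR is its own inverse */
	for (int i = LSN_LEN; i < BLCKSZ; i++)
		page[i] ^= stream[i];
}

Note that the self-inverse property is exactly what makes Robert's
objection bite: decryption must reconstruct the very same counter that
was used for encryption, so the counter cannot silently change when a
WAL page is rewritten.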
Re: [Proposal] Table-level Transparent Data Encryption (TDE) and Key Management Service (KMS)
On Mon, Aug 5, 2019 at 8:44 PM Bruce Momjian wrote:
> Right. The 8k page LSN changes each time the page is modified, and the
> LSN is part of the page nonce.

What about hint bit changes? I think even with wal_log_hints=on, it's
not the case that *every* change to hint bits results in an LSN change.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
Re: fe-utils - share query cancellation code
On 2019-Nov-01, Fabien COELHO wrote:
> Because "cancel.h" requires PGconn declaration, thus must appear after
> "libpq-fe.h",

Then you need to add #include libpq-fe.h in cancel.h. Our policy is
that headers compile standalone (c.h / postgres_fe.h / postgres.h
excluded).

--
Álvaro Herrera	https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Re: Remove configure --disable-float4-byval and --disable-float8-byval
On Fri, Nov 1, 2019 at 11:00 AM Tom Lane wrote:
> This line of argument seems to me to be the moral equivalent of
> "let's drop 32-bit support altogether". I'm not entirely on board
> with that.

I don't think that those two things are equivalent at all. There may
even be workloads that will benefit when run on 32-bit hardware. Having
to palloc() and pfree() with 8 byte integers is probably very slow.

> Certainly, a lot of the world is 64-bit these days,
> but people are still building small systems and they might want
> a database; preferably one that hasn't been detuned to the extent
> that it barely manages to run at all on such a platform.

Even the very low end is 64-bit these days. $100 smartphones have
64-bit CPUs and 4GB of RAM. AFAICT, anything still being produced that
is recognizable as a general purpose CPU (e.g. by having virtual
memory) is 64-bit. We're talking about a change that can't affect users
until late 2020 at the earliest.

I accept that there are some number of users that still have 32-bit
Postgres installations, probably because they happened to have a 32-bit
machine close at hand. The economics of running a database server on a
32-bit machine are already awful, though.

> It seems especially insane to conclude that we should pull the plug
> on such use-cases just to get rid of one obscure configure option.
> If we were expending any significant devel effort on supporting
> 32-bit platforms, I might be ready to drop support, but we're not.
> (Robert's proposal looks to me like it's actually creating new work
> to do, not saving work.)

The mental burden of considering "SIZEOF_DATUM == 4" and
"USE_FLOAT8_BYVAL" is a real cost for us. We maintain non-trivial
32-bit-only code for numeric abbreviated keys. We also have to worry
about pfree()'ing memory when !USE_FLOAT8_BYVAL within
heapam_index_validate_scan(). How confident are we that there isn't
some place that leaks memory on !USE_FLOAT8_BYVAL builds because
somebody forgot to add a pfree() in an #ifdef block?

--
Peter Geoghegan
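To illustrate the kind of #ifdef crud being discussed, here is a minimal
sketch of the cleanup pattern that !USE_FLOAT8_BYVAL builds force on
code handling 64-bit datums (an illustration of the idiom, not an
excerpt from heapam_index_validate_scan()):

	/* On SIZEOF_DATUM == 4 builds, Int64GetDatum() palloc()s a copy. */
	Datum		val = Int64GetDatum(some_int64);

	/* ... use val: pass it to a tuplesort, a comparator, etc. ... */

#ifndef USE_FLOAT8_BYVAL
	/* int8 is pass-by-reference here, so we must free it explicitly */
	pfree(DatumGetPointer(val));
#endif

Forgetting the #ifndef block leaks memory on 32-bit-Datum builds, while
adding the pfree() unconditionally would be wrong on pass-by-value
builds. That asymmetry is exactly what makes the omission easy to miss
in review.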
Re: ICU for global collation
Peter Eisentraut wrote:
> I looked into this problem. The way to address this would be adding
> proper collation support to the text search subsystem. See the TODO
> markers in src/backend/tsearch/ts_locale.c for starting points. These
> APIs spread out to a lot of places, so it will take some time to
> finish. In the meantime, I'm pausing this thread and will set the CF
> entry as RwF.

Even if the FTS code is improved in that matter, any extension code
with libc functions depending on LC_CTYPE is still going to be
potentially problematic, in particular when it happens to be set to a
different encoding than the database.

Couldn't we simply invent per-database GUC options, as in

ALTER DATABASE myicudb SET libc_lc_ctype TO 'value';
ALTER DATABASE myicudb SET libc_lc_collate TO 'value';

where libc_lc_ctype/libc_lc_collate would specifically set the values
in the LC_CTYPE and LC_COLLATE environment vars of any backend serving
the corresponding database?

Best regards,
--
Daniel Vérité
PostgreSQL-powered mailer: http://www.manitou-mail.org
Twitter: @DanielVerite
Re: [PATCH] Implement INSERT SET syntax
On Fri, Nov 1, 2019 at 6:31 PM Robert Haas wrote:
> On Sun, Aug 18, 2019 at 11:00 AM Tom Lane wrote:
> > Perhaps the way to resolve Peter's objection is to make the syntax
> > more fully like UPDATE:
> >
> > INSERT INTO target SET c1 = x, c2 = y+z, ... FROM tables-providing-x-y-z
> >
> > (with the patch as-submitted corresponding to the case with an empty
> > FROM clause, hence no variables in the expressions-to-be-assigned).
> >
> > Of course, this is not functionally distinct from
> >
> > INSERT INTO target(c1,c2,...) SELECT x, y+z, ... FROM tables-providing-x-y-z
> >
> > and it's fair to question whether it's worth supporting a
> > nonstandard syntax just to allow the target column names to be
> > written closer to the expressions-to-be-assigned.
>
> For what it's worth, I think this would be useful enough to justify
> its existence. Back in days of yore when dragons roamed the earth and
> I wrote database-driven applications instead of hacking on the
> database itself, I often wondered why I had to write two
> completely-different looking SQL statements, one to insert the data
> which a user had entered into a webform into the database, and
> another to update previously-entered data. This feature would allow
> those queries to be written in the same way, which would have pleased
> me, back in the day.

I still do, and this would be a big help. I don't care if it's
non-standard.

.m
Re: Remove configure --disable-float4-byval and --disable-float8-byval
Peter Geoghegan writes:
> On Fri, Nov 1, 2019 at 7:41 AM Robert Haas wrote:
>> Could we get around this by making Datum 8 bytes everywhere?

> I really like that idea.

> Even Raspberry Pi devices (which can cost as little as $35) use 64-bit
> ARM processors. It's abundantly clear that 32-bit platforms do not
> matter enough to justify keeping all the SIZEOF_DATUM crud around.

This line of argument seems to me to be the moral equivalent of "let's
drop 32-bit support altogether". I'm not entirely on board with that.
Certainly, a lot of the world is 64-bit these days, but people are
still building small systems and they might want a database; preferably
one that hasn't been detuned to the extent that it barely manages to
run at all on such a platform. Making a whole lot of internal APIs
64-bit would be a pretty big hit for a 32-bit platform --- more
instructions, more memory consumed for things like Datum arrays, all in
a memory space that's not that big.

It seems especially insane to conclude that we should pull the plug on
such use-cases just to get rid of one obscure configure option. If we
were expending any significant devel effort on supporting 32-bit
platforms, I might be ready to drop support, but we're not. (Robert's
proposal looks to me like it's actually creating new work to do, not
saving work.)

			regards, tom lane
Re: Allow superuser to grant passwordless connection rights on postgres_fdw
On 11/1/19 12:58 PM, Robert Haas wrote:
> On Thu, Oct 31, 2019 at 4:58 PM Andrew Dunstan wrote:
>> This patch allows the superuser to grant passwordless connection
>> rights in postgres_fdw user mappings.
>
> This is clearly something that we need, as the current code seems
> woefully ignorant of the fact that passwords are not the only
> authentication method supported by PostgreSQL, nor even the most
> secure.
>
> But, I do wonder a bit if we ought to think harder about the overall
> authentication model for FDW. Like, maybe we'd take a different view
> of how to solve this particular piece of the problem if we were
> thinking about how FDWs could do LDAP authentication, SSL
> authentication, credentials forwarding...

I'm certainly open to alternatives.

cheers

andrew

--
Andrew Dunstan	https://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Re: ssl passphrase callback
On 11/1/19 11:01 AM, Robert Haas wrote:
> On Thu, Oct 31, 2019 at 11:37 AM Andrew Dunstan wrote:
>> This patch provides a hook for a function that can supply an SSL
>> passphrase. The hook can be filled in by a shared preloadable module.
>> In order for that to be effective, the startup order is modified
>> slightly. There is a test attached that builds and uses one trivial
>> implementation, which just takes a configuration setting and rot13's
>> it before supplying the result as the passphrase.
>
> It seems to me that it would be a lot better to have an example in
> contrib that does something which might be of actual use to users,
> such as running a shell command and reading the passphrase from
> stdout.
>
> Features that are only accessible by writing C code are, in general,
> not as desirable as features which can be accessed via SQL or
> configuration.

Well, I tried to provide the most trivial and simple test I could come
up with. Running a shell command can already be accomplished via the
ssl_passphrase_command setting.

cheers

andrew

--
Andrew Dunstan	https://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
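For reference, rot13 is a fixed 13-place rotation of the letters, so
applying it twice restores the input; a minimal sketch of such a
transformation (an illustration of what the test module's trivial
passphrase mangling amounts to, not the patch's actual code):

#include <ctype.h>

/* Rotate each ASCII letter by 13 places, in place; self-inverse. */
static void
rot13(char *s)
{
	for (; *s; s++)
	{
		if (isupper((unsigned char) *s))
			*s = 'A' + (*s - 'A' + 13) % 26;
		else if (islower((unsigned char) *s))
			*s = 'a' + (*s - 'a' + 13) % 26;
	}
}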
Re: MarkBufferDirtyHint() and LSN update
Robert Haas wrote:
> On Wed, Oct 30, 2019 at 9:43 AM Antonin Houska wrote:
> > 5. In the first session, FlushBuffer()->TerminateBufferIO() will not
> > clear BM_DIRTY because MarkBufferDirtyHint() has eventually set
> > BM_JUST_DIRTIED. Thus the hint bit change itself will be written by
> > the next call of FlushBuffer(). However, the page LSN hasn't been
> > updated, so the requirement that WAL must be flushed first is not
> > met.
>
> This part confuses me. Are you saying that MarkBufferDirtyHint() can
> set BM_JUST_DIRTIED without setting BM_DIRTY?

No, I'm saying that MarkBufferDirtyHint() leaves BM_DIRTY set, as
expected. However, if things happen in the order I described, then the
LSN returned by XLogSaveBufferForHint() is not assigned to the page.

--
Antonin Houska
Web: https://www.cybertec-postgresql.com
Re: fe-utils - share query cancellation code
>> I understand that you are unhappy about something, but where the
>> issue is fails me; the "wtf" 3 characters are not enough to point me
>> in the right direction. Feel free to elaborate a little bit more :-)
>
> I don't see why you move the "conditional.h" line out of its correct
> alphabetical position (where it is now), and then add "cancel.h" next
> to it, also out of its correct alphabetical position.

Because "cancel.h" requires the PGconn declaration, and thus must appear
after "libpq-fe.h"; and once I put it after that, leaving
"conditional.h" above and alone looked a little bit silly. I put cancel
after conditional because it was the new addition, which is somehow
logical, although not alphabetical.

Now I can put cancel before conditional, sure. Patch v3 attached does
that.

--
Fabien.

diff --git a/src/bin/pgbench/pgbench.c b/src/bin/pgbench/pgbench.c
index 03bcd22996..7aab4bdf47 100644
--- a/src/bin/pgbench/pgbench.c
+++ b/src/bin/pgbench/pgbench.c
@@ -59,9 +59,10 @@
 #include "common/int.h"
 #include "common/logging.h"
-#include "fe_utils/conditional.h"
 #include "getopt_long.h"
 #include "libpq-fe.h"
+#include "fe_utils/cancel.h"
+#include "fe_utils/conditional.h"
 #include "pgbench.h"
 #include "portability/instr_time.h"
@@ -3887,6 +3888,9 @@ initGenerateData(PGconn *con)
 			exit(1);
 		}
 
+		if (CancelRequested)
+			break;
+
 		/*
 		 * If we want to stick with the original logging, print a message each
 		 * 100k inserted rows.
@@ -4057,6 +4061,9 @@ runInitSteps(const char *initialize_steps)
 	if ((con = doConnect()) == NULL)
 		exit(1);
 
+	setup_cancel_handler(NULL);
+	SetCancelConn(con);
+
 	for (step = initialize_steps; *step != '\0'; step++)
 	{
 		instr_time	start;
@@ -4120,6 +4127,7 @@ runInitSteps(const char *initialize_steps)
 	}
 
 	fprintf(stderr, "done in %.2f s (%s).\n", run_time, stats.data);
+	ResetCancelConn();
 	PQfinish(con);
 	termPQExpBuffer(&stats);
 }
diff --git a/src/bin/psql/command.c b/src/bin/psql/command.c
index b981ae81ff..f1d9e0298a 100644
--- a/src/bin/psql/command.c
+++ b/src/bin/psql/command.c
@@ -29,6 +29,7 @@
 #include "libpq-fe.h"
 #include "pqexpbuffer.h"
 #include "common/logging.h"
+#include "fe_utils/cancel.h"
 #include "fe_utils/print.h"
 #include "fe_utils/string_utils.h"
diff --git a/src/bin/psql/common.c b/src/bin/psql/common.c
index 90f6380170..0a66b71372 100644
--- a/src/bin/psql/common.c
+++ b/src/bin/psql/common.c
@@ -24,6 +24,7 @@
 #include "copy.h"
 #include "crosstabview.h"
 #include "fe_utils/mbprint.h"
+#include "fe_utils/cancel.h"
 #include "fe_utils/string_utils.h"
 #include "portability/instr_time.h"
 #include "settings.h"
@@ -223,6 +224,7 @@ NoticeProcessor(void *arg, const char *message)
 }
 
+#ifndef WIN32
 
 /*
  * Code to support query cancellation
@@ -241,7 +243,7 @@ NoticeProcessor(void *arg, const char *message)
  *
  * SIGINT is supposed to abort all long-running psql operations, not only
  * database queries. In most places, this is accomplished by checking
- * cancel_pressed during long-running loops. However, that won't work when
+ * CancelRequested during long-running loops. However, that won't work when
 * blocked on user input (in readline() or fgets()). In those places, we
 * set sigint_interrupt_enabled true while blocked, instructing the signal
 * catcher to longjmp through sigint_interrupt_jmp. We assume readline and
@@ -252,34 +254,9 @@
 volatile bool sigint_interrupt_enabled = false;
 
 sigjmp_buf	sigint_interrupt_jmp;
 
-static PGcancel *volatile cancelConn = NULL;
-
-#ifdef WIN32
-static CRITICAL_SECTION cancelConnLock;
-#endif
-
-/*
- * Write a simple string to stderr --- must be safe in a signal handler.
- * We ignore the write() result since there's not much we could do about it.
- * Certain compilers make that harder than it ought to be.
- */
-#define write_stderr(str) \
-	do { \
-		const char *str_ = (str); \
-		int			rc_; \
-		rc_ = write(fileno(stderr), str_, strlen(str_)); \
-		(void) rc_; \
-	} while (0)
-
-
-#ifndef WIN32
-
 static void
-handle_sigint(SIGNAL_ARGS)
+psql_sigint_callback(void)
 {
-	int			save_errno = errno;
-	char		errbuf[256];
-
 	/* if we are waiting for input, longjmp out of it */
 	if (sigint_interrupt_enabled)
 	{
@@ -288,74 +265,20 @@ handle_sigint(SIGNAL_ARGS)
 	}
 
 	/* else, set cancel flag to stop any long-running loops */
-	cancel_pressed = true;
-
-	/* and send QueryCancel if we are processing a database query */
-	if (cancelConn != NULL)
-	{
-		if (PQcancel(cancelConn, errbuf, sizeof(errbuf)))
-			write_stderr("Cancel request sent\n");
-		else
-		{
-			write_stderr("Could not send cancel request: ");
-			write_stderr(errbuf);
-		}
-	}
-
-	errno = save_errno;		/* just in case the write changed it */
+	CancelRequested = true;
 }
 
-void
-setup_cancel_handler(void)
-{
-	pqsignal(SIGINT, handle_sigint);
-}
-
-#else							/* WIN32 */
-
-static BOOL WINAPI
-consoleHandler(DWORD dwCtrlType)
-{
-	char		errbuf[256];
-
-	if (dwCtrlType == CTRL_C_EVENT ||
-		dwCtrlType == CTRL_BREAK_EVENT)
-	{
Re: On disable_cost
On 2019-11-01 12:56:30 -0400, Robert Haas wrote:
> On Fri, Nov 1, 2019 at 12:43 PM Andres Freund wrote:
> > As a first step I'd be inclined to "just" adjust disable_cost up to
> > something like 1.0e12. Unfortunately much higher and we're getting
> > into the area where the loss of precision starts to be significant
> > enough that I'm not sure that we're always careful enough to perform
> > math in the right order (e.g. 1.0e16 + 1 being 1.0e16, and 1e+20 +
> > 1000 being 1e+20). I've seen queries with costs above 1e10 where
> > that costing wasn't insane.
>
> We've done that before and we can do it again. But we're going to need
> to have something better eventually, I think, not just keep kicking
> the can down the road.

Yea, that's why I continued on to describe what we should do
afterwards ;)

> Yet another approach would be to divide the cost into two parts, a
> "cost" component and a "violations" component. If two paths are
> compared, the one with fewer violations always wins; if it's a tie,
> they compare on cost. A path's violation count is the total of its
> children, plus one for itself if it does something that's disabled.
> This would be more principled than the current approach, but maybe
> it's too costly.

Namely go for something like this. I think we probably get away with
the additional comparison, especially if we were to store the
violations as an integer and did it like

    if (unlikely(path1->nviolations != path2->nviolations))

or such - that ought to be very well predicted in nearly all cases.

I wonder how much we'd need to reformulate
compare_path_costs/compare_path_costs_fuzzily to allow the compiler to
auto-vectorize. Might not be worth caring...

Greetings,

Andres Freund
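Spelling out the two-part comparison sketched above (the nviolations
field is hypothetical -- Path has no such member today):

static int
compare_paths_sketch(Path *path1, Path *path2)
{
	/* fewer uses of disabled plan types always wins ... */
	if (unlikely(path1->nviolations != path2->nviolations))
		return (path1->nviolations < path2->nviolations) ? -1 : +1;

	/* ... and cost only breaks ties between equally-"legal" paths */
	if (path1->total_cost < path2->total_cost)
		return -1;
	if (path1->total_cost > path2->total_cost)
		return +1;
	return 0;
}

With this scheme a path that uses a disabled node type can still be
chosen when every alternative is at least as disabled, which is exactly
the fallback behavior that disable_cost tries (imperfectly) to emulate
by cost fudging.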
Re: 64 bit transaction id
On Fri, Nov 01, 2019 at 12:05:12PM +0300, Павел Ерёмин wrote:
> Hi. Sorry for my English. I want to once again open the topic of
> 64-bit transaction ids. I did not manage to find in the archive the
> option that I want to discuss, so I write. If I searched poorly, then
> please forgive me. The idea is not very original and probably has
> already been considered; again, I repeat - I did not find it.
> Therefore, please do not scold me severely.
>
> In discussions of 64-bit transaction ids, I did not find mention of an
> algorithm for storing them as it was done, for example, in MS SQL
> Server. What if instead of 2 fields (xmin and xmax) with a total
> length of 64 bits, we use 1 field (let's call it xid) with a length
> of 64 bits in the tuple header? In this field store the xid of the
> transaction that created the version. In this case, a transaction, in
> order to understand whether the read version is suitable for it or
> not, will have to read the next version as well. I.e., the downside of
> such a decision is of course an increase in I/O: transactions will
> have to read one version more.
>
> On the plus side, the tuple header remains the same length.

I think that assumes we can easily identify the next version of a
tuple, and I don't think we can do that. We may be able to do that for
HOT chains, but that only works when the next version fits onto the
same page (and does not update indexed columns). But when we store the
new version on a separate page, we don't have any link between those
tuples. And adding it may easily mean more overhead than the 8B we'd
save by only storing a single XID.

IMO the most promising solution to this is the "page epoch" approach
discussed some time ago (1-2 years?).

regards

--
Tomas Vondra	http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
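For concreteness, a sketch of the layout being proposed next to a
simplification of today's, with hypothetical names (the real
HeapTupleHeaderData keeps xmin/xmax inside a union together with
command IDs and other fields):

/* today, simplified: two 32-bit XIDs in every tuple header */
typedef struct HeapTupleFieldsToday
{
	TransactionId t_xmin;		/* inserting xact ID (32 bits) */
	TransactionId t_xmax;		/* deleting/locking xact ID (32 bits) */
} HeapTupleFieldsToday;

/*
 * The proposal: one 64-bit creator XID in the same total space.
 * Whether a version is dead must then be discovered by locating the
 * *next* version of the row and reading its xid -- which is exactly
 * the "extra I/O" cost and the "how do we find the next version?"
 * problem discussed above.
 */
typedef struct HeapTupleFieldsProposed
{
	uint64		xid;			/* xact that created this version */
} HeapTupleFieldsProposed;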
Re: Resume vacuum and autovacuum from interruption and cancellation
On Thu, Aug 8, 2019 at 9:42 AM Rafia Sabih wrote:
> Sounds like an interesting idea, but does it really help? Because if
> vacuum was interrupted previously, wouldn't it already know the dead
> tuples, etc in the next run quite quickly, as the VM, FSM is already
> updated for the page in the previous run.

+1. I don't deny that a patch like this could sometimes save something,
but it doesn't seem like it would save all that much all that often. If
your autovacuum runs are being frequently cancelled, that's going to be
a big problem, I think. And as Rafia says, even though you might do a
little extra work reclaiming garbage from subsequently-modified pages
toward the beginning of the table, it would be unusual if they'd *all*
been modified. Plus, if they've recently been modified, they're more
likely to be in cache.

I think this patch really needs a test scenario or demonstration of
some kind to prove that it produces a measurable benefit.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
Re: 64 bit transaction id
On Fri, Nov 01, 2019 at 10:25:17AM +0100, Pavel Stehule wrote:
> Hi
>
> On Fri, Nov 1, 2019 at 10:11 Павел Ерёмин wrote:
>> Hi. Sorry for my English. I want to once again open the topic of
>> 64-bit transaction ids. I did not manage to find in the archive the
>> option that I want to discuss, so I write. If I searched poorly, then
>> please forgive me. The idea is not very original and probably has
>> already been considered; again, I repeat - I did not find it.
>> Therefore, please do not scold me severely.
>>
>> In discussions of 64-bit transaction ids, I did not find mention of
>> an algorithm for storing them as it was done, for example, in MS SQL
>> Server. What if instead of 2 fields (xmin and xmax) with a total
>> length of 64 bits, we use 1 field (let's call it xid) with a length
>> of 64 bits in the tuple header? In this field store the xid of the
>> transaction that created the version. In this case, a transaction,
>> in order to understand whether the read version is suitable for it
>> or not, will have to read the next version as well. I.e., the
>> downside of such a decision is of course an increase in I/O:
>> transactions will have to read one version more.
>>
>> On the plus side, the tuple header remains the same length.
>
> is the 32-bit xid really a problem? Why do you need to know the state
> of the last 2^31 transactions? Isn't the problem rather too-low usage
> (or maybe too-high overhead) of VACUUM FREEZE?

It certainly can be an issue for large and busy systems, which may need
anti-wraparound vacuum every couple of days. If that requires rewriting
a couple of TB of data, it's not particularly nice. That's why 64-bit
XIDs were discussed repeatedly in the past, and it's likely to get even
more pressing as systems get larger.

> I am not sure whether increasing this range can have much more fatal
> problems (maybe later)

Well, not fatal, but naive approaches can increase per-tuple overhead.
And we already have plenty of that, hence there were proposals to use
page epochs and so on.

regards

--
Tomas Vondra	http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Re: On disable_cost
On Fri, Nov 01, 2019 at 09:30:52AM -0700, Jim Finnerty wrote:
> re: coping with adding disable_cost more than once
>
> Another option would be to have a 2-part Cost structure. If
> disable_cost is ever added to the Cost, then you set a flag recording
> this. If any plans exist that have no disable_costs added to them,
> then the planner chooses the minimum cost among those, otherwise you
> choose the minimum cost path.

Yeah, I agree: having an is_disabled flag, and treating all paths with
'true' as more expensive than paths with 'false' (and only when both
paths have the same value actually comparing the cost), is probably the
way forward.

regards

--
Tomas Vondra	http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Re: pglz performance
On Fri, Nov 01, 2019 at 12:48:28PM -0300, Alvaro Herrera wrote:
> On 2019-Nov-01, Peter Eisentraut wrote:
>> On 2019-10-25 07:05, Andrey Borodin wrote:
>>>> On 21 Oct 2019, at 14:09, Andrey Borodin wrote:
>>>>
>>>> With the Silesia corpus, pglz_decompress_hacked is actually
>>>> decreasing performance on high-entropy data. Meanwhile
>>>> pglz_decompress_hacked8 is still faster than the usual
>>>> pglz_decompress. In spite of these benchmarks, I think that
>>>> pglz_decompress_hacked8 is the safer option.
>>>
>>> Here's v3 which takes into account recent benchmarks with the
>>> Silesia corpus and has better comments.
>>
>> Your message from 21 October appears to say that this change makes
>> the performance worse. So I don't know how to proceed with this.
>
> As I understand that report, in these results "less is better", so the
> hacked8 variant shows better performance (33.8) than the current code
> (42.5). The "hacked" variant shows worse performance (48.2) than the
> current code. The "in spite" phrase seems to have been a mistake.

I am surprised that there is so much variability in the performance
numbers, though, based on such small tweaks of the code.

I'd try running the benchmarks to verify the numbers, and maybe do some
additional tests, but it's not clear to me which patches I should use.
I think the last patches with 'hacked' and 'hacked8' in the name are a
couple of months old, and the recent posts attach just a single patch.

Andrey, can you post current versions of both patches?

regards

--
Tomas Vondra	http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Re: Allow superuser to grant passwordless connection rights on postgres_fdw
On Thu, Oct 31, 2019 at 4:58 PM Andrew Dunstan wrote:
> This patch allows the superuser to grant passwordless connection
> rights in postgres_fdw user mappings.

This is clearly something that we need, as the current code seems
woefully ignorant of the fact that passwords are not the only
authentication method supported by PostgreSQL, nor even the most
secure.

But, I do wonder a bit if we ought to think harder about the overall
authentication model for FDW. Like, maybe we'd take a different view of
how to solve this particular piece of the problem if we were thinking
about how FDWs could do LDAP authentication, SSL authentication,
credentials forwarding...

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
Re: update ALTER TABLE with ATTACH PARTITION lock mode (docs)
On Fri, Nov 01, 2019 at 11:01:22PM +0900, Michael Paquier wrote:
> On Fri, Nov 01, 2019 at 08:59:48AM -0500, Justin Pryzby wrote:
> > I guess you mean because it's not a child until after the ALTER.
> > Yes, that makes sense.
>
> Yes, perhaps you have another idea than mine on how to shape this
> sentence? I can't think of anything better.

Attaching a partition acquires a SHARE UPDATE EXCLUSIVE lock on the
parent table, in addition to ACCESS EXCLUSIVE locks on the table to be
attached and on the DEFAULT partition (if any).

Justin
Re: On disable_cost
On Fri, Nov 1, 2019 at 12:43 PM Andres Freund wrote:
> Hm. That seems complicated. Is it clear that we'd always notice that
> we have no plan early enough to know which paths to reconsider? I
> think there's cases where that'd only happen a few levels up.

Yeah, there could be problems of that kind. I think if a baserel has no
paths, then we know right away that we've got a problem, but for
joinrels it might be more complicated.

> As a first step I'd be inclined to "just" adjust disable_cost up to
> something like 1.0e12. Unfortunately much higher and we're getting
> into the area where the loss of precision starts to be significant
> enough that I'm not sure that we're always careful enough to perform
> math in the right order (e.g. 1.0e16 + 1 being 1.0e16, and 1e+20 +
> 1000 being 1e+20). I've seen queries with costs above 1e10 where that
> costing wasn't insane.

We've done that before and we can do it again. But we're going to need
to have something better eventually, I think, not just keep kicking the
can down the road.

Another point to consider here is that in some cases we could really
just skip generating certain paths altogether. We already do this for
hash joins: if we're planning a join and enable_hashjoin is disabled,
we just don't generate hash join paths at all, except for full joins,
where there might be no other legal method. As this example shows, this
cannot be applied in all cases, but maybe we could do it more widely
than we do today. I'm not sure how beneficial that technique would be,
though, because it doesn't seem like it's quite enough to solve this
problem by itself.

Yet another approach would be to divide the cost into two parts, a
"cost" component and a "violations" component. If two paths are
compared, the one with fewer violations always wins; if it's a tie,
they compare on cost. A path's violation count is the total of its
children, plus one for itself if it does something that's disabled.
This would be more principled than the current approach, but maybe it's
too costly.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
Re: Looking for a demo of extensible nodes
Hi,

I have a basic experimental extension where I use extensible nodes.
This is the commit which adds the extensible node to the project:

https://github.com/onderkalaci/pgcolor/commit/10cba5d02a828dbee4bc140f5e88d6c44b40e5c2

Hope that gives you some pointers.

On Fri, Nov 1, 2019 at 1:18 PM Adam Lee wrote:
> Hi, hackers
>
> As the $Subject, does anyone have one? I'd like to refer to it, and
> write an example for people who are also looking for the
> documentation.
>
> Thanks.
>
> --
> Adam Lee
Re: On disable_cost
Hi,

On 2019-11-01 12:22:06 -0400, Robert Haas wrote:
> On Fri, Nov 1, 2019 at 12:00 PM Andres Freund wrote:
> > That seems like a bad idea - we add the cost multiple times. And we
> > still want to compare plans that potentially involve that cost, if
> > there's no other way to plan the query.
>
> Yeah. I kind of wonder if we shouldn't instead (a) skip adding paths
> that use methods which are disabled and then (b) if we don't end up
> with any paths for that reloptinfo, try again, ignoring disabling
> GUCs.

Hm. That seems complicated. Is it clear that we'd always notice that we
have no plan early enough to know which paths to reconsider? I think
there's cases where that'd only happen a few levels up.

As a first step I'd be inclined to "just" adjust disable_cost up to
something like 1.0e12. Unfortunately much higher and we're getting into
the area where the loss of precision starts to be significant enough
that I'm not sure that we're always careful enough to perform math in
the right order (e.g. 1.0e16 + 1 being 1.0e16, and 1e+20 + 1000 being
1e+20). I've seen queries with costs above 1e10 where that costing
wasn't insane.

And then, in a larger patch, go for something like Heikki's proposal
quoted by Zhenghua Lyu upthread, where we treat 'forbidden' as a
separate factor in comparisons of path costs, rather than fudging the
cost upwards. But there's some care to be taken to make sure we don't
regress performance too much due to the additional logic in
compare_path_cost et al.

I'd also be curious to see if there's some other problem with cost
calculation here - some of the quoted final costs seem high enough to
be suspicious. I'd be curious to see a plan...

Greetings,

Andres Freund
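The precision hazard Andres mentions is easy to demonstrate: an
IEEE-754 double carries only about 15-16 significant decimal digits, so
small cost increments are silently absorbed once totals get large. A
minimal standalone check:

#include <stdio.h>

int
main(void)
{
	double	a = 1.0e16;
	double	b = 1.0e20;

	/* both comparisons print 1: the increments vanish entirely */
	printf("%d\n", a + 1.0 == a);
	printf("%d\n", b + 1000.0 == b);
	return 0;
}

This is why stacking several copies of a very large disable_cost onto
an already-large path cost can make genuinely different plans compare
as equal.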
Re: pglz performance
On 2019-Nov-01, Peter Eisentraut wrote: > On 2019-10-25 07:05, Andrey Borodin wrote: > > > On 21 Oct 2019, at 14:09, Andrey Borodin > > > wrote: > > > > > > With Silesian corpus pglz_decompress_hacked is actually decreasing > > > performance on high-entropy data. > > > Meanwhile pglz_decompress_hacked8 is still faster than usual > > > pglz_decompress. > > > In spite of this benchmarks, I think that pglz_decompress_hacked8 is > > > safer option. > > > > Here's v3 which takes into account recent benchmarks with Silesian Corpus > > and have better comments. > > Your message from 21 October appears to say that this change makes the > performance worse. So I don't know how to proceed with this. As I understand that report, in these results "less is better", so the hacked8 variant shows better performance (33.8) than current (42.5). The "hacked" variant shows worse performance (48.2) than the current code. The "in spite" phrase seems to have been a mistake. I am surprised that there is so much variability in the performance numbers, though, based on such small tweaks of the code. -- Álvaro Herrera https://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Re: [PATCH] Implement INSERT SET syntax
On Sun, Aug 18, 2019 at 11:00 AM Tom Lane wrote: > Perhaps the way to resolve Peter's objection is to make the syntax > more fully like UPDATE: > > INSERT INTO target SET c1 = x, c2 = y+z, ... FROM tables-providing-x-y-z > > (with the patch as-submitted corresponding to the case with an empty > FROM clause, hence no variables in the expressions-to-be-assigned). > > Of course, this is not functionally distinct from > > INSERT INTO target(c1,c2,...) SELECT x, y+z, ... FROM tables-providing-x-y-z > > and it's fair to question whether it's worth supporting a nonstandard > syntax just to allow the target column names to be written closer to > the expressions-to-be-assigned. For what it's worth, I think this would be useful enough to justify its existence. Back in days of yore when dragons roamed the earth and I wrote database-driven applications instead of hacking on the database itself, I often wondered why I had to write two completely different-looking SQL statements, one to insert the data which a user had entered into a webform into the database, and another to update previously-entered data. This feature would allow those queries to be written in the same way, which would have pleased me, back in the day. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
[PATCH][DOC] Fix for PREPARE TRANSACTION doc and postgres_fdw message.
Hi, As per the following code, t1 is a remote table through postgres_fdw: test=# BEGIN; BEGIN test=# SELECT * FROM t1; ... test=# PREPARE TRANSACTION 'gxid1'; ERROR: cannot prepare a transaction that modified remote tables I have attached a patch to the documentation that adds remote tables to the list of objects where any operation prevents using a prepared transaction; currently the documentation only mentions "operations involving temporary tables or the session's temporary namespace". The second patch modifies the message returned by postgres_fdw; as shown in the session above, the message would be clearer as: ERROR: cannot PREPARE a transaction that has operated on remote tables like the one for temporary objects: ERROR: cannot PREPARE a transaction that has operated on temporary objects Best regards, -- Gilles -- Gilles Darold http://www.darold.net/
diff --git a/doc/src/sgml/ref/prepare_transaction.sgml b/doc/src/sgml/ref/prepare_transaction.sgml index 5016ca287e..443caf131e 100644 --- a/doc/src/sgml/ref/prepare_transaction.sgml +++ b/doc/src/sgml/ref/prepare_transaction.sgml @@ -99,8 +99,8 @@ PREPARE TRANSACTION transaction_id It is not currently allowed to PREPARE a transaction that has executed any operations involving temporary tables or the session's - temporary namespace, created any cursors WITH HOLD, or - executed LISTEN, UNLISTEN, or + temporary namespace or remote tables, created any cursors WITH HOLD, + or executed LISTEN, UNLISTEN, or NOTIFY. Those features are too tightly tied to the current session to be useful in a transaction to be prepared. diff --git a/contrib/postgres_fdw/connection.c b/contrib/postgres_fdw/connection.c index 7cd69cc709..2f5c37b159 100644 --- a/contrib/postgres_fdw/connection.c +++ b/contrib/postgres_fdw/connection.c @@ -725,7 +725,7 @@ pgfdw_xact_callback(XactEvent event, void *arg) */ ereport(ERROR, (errcode(ERRCODE_FEATURE_NOT_SUPPORTED), - errmsg("cannot prepare a transaction that modified remote tables"))); + errmsg("cannot prepare a transaction that has operated on remote tables"))); break; case XACT_EVENT_PARALLEL_COMMIT: case XACT_EVENT_COMMIT:
Re: On disable_cost
On Fri, Nov 1, 2019 at 12:00 PM Andres Freund wrote: > That seems like a bad idea - we add the cost multiple times. And we > still want to compare plans that potentially involve that cost, if > there's no other way to plan the query. Yeah. I kind of wonder if we shouldn't instead (a) skip adding paths that use methods which are disabled and then (b) if we don't end up with any paths for that reloptinfo, try again, ignoring disabling GUCs. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Re: [Proposal] Global temporary tables
On 01.11.2019 18:26, Robert Haas wrote:
> On Fri, Nov 1, 2019 at 11:15 AM Konstantin Knizhnik wrote:
>> It seems to me that I have found a quite elegant solution for per-backend statistics for GTT: I am just inserting them into the backend's catalog cache, but not into the pg_statistic table itself. To do it I have to add InsertSysCache/InsertCatCache functions which insert a pinned entry into the corresponding cache. I wonder if there are some pitfalls to such an approach?
> That sounds pretty hackish. You'd have to be very careful, for example, that if the tables were dropped or re-analyzed, all of the old entries got removed --

I have checked it:
- when the table is reanalyzed, the cache entries are replaced.
- when the table is dropped, the cache entries are removed.

> and then it would still fail if any code tried to access the statistics directly from the table, rather than via the caches. My assumption is that the statistics ought to be stored in some backend-private data structure designed for that purpose, and that the code that needs the data should be taught to look for it there when the table is a GTT.

Yes, if you do "select * from pg_statistic" then you will not see statistics for GTT in this case. But I do not think that it is so critical. I do not believe that anybody is trying to manually interpret the values in this table. And the optimizer retrieves statistics through the syscache mechanism and so is able to build a correct plan in this case. -- Konstantin Knizhnik Postgres Professional: http://www.postgrespro.com The Russian Postgres Company
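A minimal sketch of that backend-private design might look like the following. Everything here except hash_search() and SearchSysCache3() is hypothetical -- no gtt_stat_hash or relation_is_gtt() exists today; it only illustrates the lookup shape Robert describes:

#include "postgres.h"
#include "access/attnum.h"
#include "utils/hsearch.h"
#include "utils/syscache.h"

typedef struct GttStatKey
{
    Oid         reloid;
    AttrNumber  attnum;
    bool        inh;
} GttStatKey;

typedef struct GttStatEntry
{
    GttStatKey  key;
    HeapTuple   statsTuple;     /* stored here by a GTT-aware ANALYZE */
} GttStatEntry;

static HTAB *gtt_stat_hash = NULL;  /* backend-local, never in shared memory */

static HeapTuple
get_att_statistics(Oid reloid, AttrNumber attnum, bool inh)
{
    if (gtt_stat_hash != NULL && relation_is_gtt(reloid))   /* hypothetical check */
    {
        GttStatKey  key = {reloid, attnum, inh};
        bool        found;
        GttStatEntry *entry;

        entry = hash_search(gtt_stat_hash, &key, HASH_FIND, &found);
        return found ? entry->statsTuple : NULL;    /* NULL: never analyzed */
    }

    /* regular relations keep going through the syscache, as today */
    return SearchSysCache3(STATRELATTINH,
                           ObjectIdGetDatum(reloid),
                           Int16GetDatum(attnum),
                           BoolGetDatum(inh));
}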
Re: Adding percentile metrics to pg_stat_statements module
> > That's not what I wrote. My point was that we *should* store the digests > themselves, otherwise we just introduce additional errors into the > estimates, because it discards the weights/frequencies. Sorry. I meant to write "no reason to *not* store the digests" On Fri, Nov 1, 2019 at 11:17 AM, Tomas Vondra <tomas.von...@2ndquadrant.com> wrote: > On Fri, Nov 01, 2019 at 11:11:13AM -0300, Igor Calabria wrote: > >Yeah, I agree that there's no reason to store the digests themselves and I > >really liked the idea of it being optional. > > That's not what I wrote. My point was that we *should* store the digests > themselves, otherwise we just introduce additional errors into the > estimates, because it discards the weights/frequencies. > > >If it turns out that memory consumption on real workloads is small enough, > >it could eventually be turned on by default. > > > > Maybe, but it's not just about memory consumption. CPU matters too. > > regards > > -- > Tomas Vondra http://www.2ndQuadrant.com > PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services >
Re: On disable_cost
Hi, On 2019-11-01 19:58:04 +1300, Thomas Munro wrote: > On Fri, Nov 1, 2019 at 7:42 PM Zhenghua Lyu wrote: > > It is tricky to set disable_cost a huge number. Can we come up with > > better solution? > > What happens if you use DBL_MAX? That seems like a bad idea - we add the cost multiple times. And we still want to compare plans that potentially involve that cost, if there's no other way to plan the query. - Andres
Re: WIP/PoC for parallel backup
On Wed, Oct 30, 2019 at 10:16 AM Asif Rehman wrote: > 'startptr' is used by sendFile() during checksum verification. Since > SendBackupFiles() is using sendFile we have to set a valid WAL location. Ugh, global variables. Why are START_BACKUP, SEND_BACKUP_FILELIST, SEND_BACKUP_FILES, and STOP_BACKUP all using the same base_backup_opt_list production as BASE_BACKUP? Presumably most of those options are not applicable to most of those commands, and the productions should therefore be separated. You should add docs, too. I wouldn't have to guess what some of this stuff was for if you wrote documentation explaining what this stuff was for. :-) >> The tablespace_path option appears entirely unused, and I don't know >> why that should be necessary here, either. > > This is to calculate the basepathlen. We need to exclude the tablespace > location (or > base path) from the filename before it is sent to the client with the sendFile > call. I added > this option primarily to avoid performing string manipulation on the filename to > extract the > tablespace location and then calculate the basepathlen. > > Alternatively we can do it by extracting the base path from the received > filename. What > do you suggest? I don't think the server needs any information from the client in order to be able to exclude the tablespace location from the pathname. Whatever it needs to know, it should be able to figure out, just as it would in a non-parallel backup. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Re: fe-utils - share query cancellation code
On 2019-Nov-01, Fabien COELHO wrote: > > > #include "common/int.h" > > > #include "common/logging.h" > > > -#include "fe_utils/conditional.h" > > > #include "getopt_long.h" > > > #include "libpq-fe.h" > > > +#include "fe_utils/conditional.h" > > > +#include "fe_utils/cancel.h" > > > #include "pgbench.h" > > > #include "portability/instr_time.h" > > > > wtf? > > I understand that you are unhappy about something, but where the issue is > fails me, the "wtf" 3 characters are not enough to point me in the right > direction. Feel free to elaborate a little bit more:-) I don't see why you move the "conditional.h" line out of its correct alphabetical position (where it is now), and then add "cancel.h" next to it also out of its correct alphabetical position. -- Álvaro Herrera https://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
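For the record, the ordering being asked for is presumably just strict alphabetical order across the whole group, i.e.:

#include "common/int.h"
#include "common/logging.h"
#include "fe_utils/cancel.h"
#include "fe_utils/conditional.h"
#include "getopt_long.h"
#include "libpq-fe.h"
#include "pgbench.h"
#include "portability/instr_time.h"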
Re: [Proposal] Global temporary tables
On Fri, Nov 1, 2019 at 11:15 AM Konstantin Knizhnik wrote: > It seems to me that I have found a quite elegant solution for per-backend statistics for GTT: I am just inserting them into the backend's catalog cache, but not into the pg_statistic table itself. > To do it I have to add InsertSysCache/InsertCatCache functions which insert a pinned entry into the corresponding cache. > I wonder if there are some pitfalls to such an approach? That sounds pretty hackish. You'd have to be very careful, for example, that if the tables were dropped or re-analyzed, all of the old entries got removed -- and then it would still fail if any code tried to access the statistics directly from the table, rather than via the caches. My assumption is that the statistics ought to be stored in some backend-private data structure designed for that purpose, and that the code that needs the data should be taught to look for it there when the table is a GTT. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Re: fe-utils - share query cancellation code
Hello Alvaro, #include "common/int.h" #include "common/logging.h" -#include "fe_utils/conditional.h" #include "getopt_long.h" #include "libpq-fe.h" +#include "fe_utils/conditional.h" +#include "fe_utils/cancel.h" #include "pgbench.h" #include "portability/instr_time.h" wtf? I understand that you are unhappy about something, but where the issue is fails me, the "wtf" 3 characters are not enough to point me in the right direction. Feel free to elaborate a little bit more:-) -- Fabien.
Re: MarkBufferDirtyHint() and LSN update
On Wed, Oct 30, 2019 at 9:43 AM Antonin Houska wrote: > 5. In the first session, FlushBuffer()->TerminateBufferIO() will not clear > BM_DIRTY because MarkBufferDirtyHint() has eventually set > BM_JUST_DIRTIED. Thus the hint bit change itself will be written by the next > call of FlushBuffer(). However, the page LSN hasn't been updated, so the > requirement that WAL must be flushed first is not met. This part confuses me. Are you saying that MarkBufferDirtyHint() can set BM_JUST_DIRTIED without setting BM_DIRTY? -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Re: Remove configure --disable-float4-byval and --disable-float8-byval
On Fri, Nov 1, 2019 at 7:41 AM Robert Haas wrote: > Could we get around this by making Datum 8 bytes everywhere? I really like that idea. Even Raspberry Pi devices (which can cost as little as $35) use 64-bit ARM processors. It's abundantly clear that 32-bit platforms do not matter enough to justify keeping all the SIZEOF_DATUM crud around. -- Peter Geoghegan
Re: [Proposal] Global temporary tables
On 25.10.2019 20:00, Pavel Stehule wrote: > >> So except the limitation mentioned above (which I do not consider as critical) there is only one problem which was not addressed: maintaining statistics for GTT. >> If all of the following conditions are true: >> >> 1) GTT are used in joins >> 2) There are indexes defined for GTT >> 3) Size and histogram of GTT in different backends can significantly vary. >> 4) ANALYZE was explicitly called for GTT >> >> then the query execution plan built in one backend will also be used for other backends, where it can be inefficient. >> I also do not consider this problem as a "show stopper" for adding GTT to Postgres. > I think that's *definitely* a show stopper. Well, if both you and Pavel think that it is really a "show stopper", then this problem really has to be addressed. I am slightly confused by this opinion, because Pavel has told me himself that 99% of users never create indexes for temp tables or run "analyze" for them. And without that, this problem is not a problem at all. Users don't ANALYZE temp tables in 99% of cases. It's true. But the second fact is that users have a lot of problems. It's very similar to wrong statistics on persistent tables. When the data is small, it is not a problem for users, although from my perspective it's not optimal. When the data is not small, the problem can be brutal. Temporary tables are not an exception. And users and developers are people - we only hear about the fatal problems. There are lots of unoptimized queries, but because the problem is not fatal, it is not reported. And a lot of people have no idea how fast databases can be. The knowledge of users and app developers is a sad book. Pavel It seems to me that I have found a quite elegant solution for per-backend statistics for GTT: I am just inserting them into the backend's catalog cache, but not into the pg_statistic table itself. To do it I have to add InsertSysCache/InsertCatCache functions which insert a pinned entry into the corresponding cache. I wonder if there are some pitfalls to such an approach? A new patch for GTT is attached.
-- Konstantin Knizhnik Postgres Professional: http://www.postgrespro.com The Russian Postgres Company diff --git a/src/backend/access/brin/brin_revmap.c b/src/backend/access/brin/brin_revmap.c index 647350c..ca5f22d 100644 --- a/src/backend/access/brin/brin_revmap.c +++ b/src/backend/access/brin/brin_revmap.c @@ -25,6 +25,7 @@ #include "access/brin_revmap.h" #include "access/brin_tuple.h" #include "access/brin_xlog.h" +#include "access/brin.h" #include "access/rmgr.h" #include "access/xloginsert.h" #include "miscadmin.h" @@ -79,6 +80,11 @@ brinRevmapInitialize(Relation idxrel, BlockNumber *pagesPerRange, meta = ReadBuffer(idxrel, BRIN_METAPAGE_BLKNO); LockBuffer(meta, BUFFER_LOCK_SHARE); page = BufferGetPage(meta); + + if (GlobalTempRelationPageIsNotInitialized(idxrel, page)) + brin_metapage_init(page, BrinGetPagesPerRange(idxrel), + BRIN_CURRENT_VERSION); + TestForOldSnapshot(snapshot, idxrel, page); metadata = (BrinMetaPageData *) PageGetContents(page); diff --git a/src/backend/access/gin/ginfast.c b/src/backend/access/gin/ginfast.c index 439a91b..8a6ac71 100644 --- a/src/backend/access/gin/ginfast.c +++ b/src/backend/access/gin/ginfast.c @@ -241,6 +241,16 @@ ginHeapTupleFastInsert(GinState *ginstate, GinTupleCollector *collector) metabuffer = ReadBuffer(index, GIN_METAPAGE_BLKNO); metapage = BufferGetPage(metabuffer); + if (GlobalTempRelationPageIsNotInitialized(index, metapage)) + { + Buffer rootbuffer = ReadBuffer(index, GIN_ROOT_BLKNO); + LockBuffer(rootbuffer, BUFFER_LOCK_EXCLUSIVE); + GinInitMetabuffer(metabuffer); + GinInitBuffer(rootbuffer, GIN_LEAF); + MarkBufferDirty(rootbuffer); + UnlockReleaseBuffer(rootbuffer); + } + /* * An insertion to the pending list could logically belong anywhere in the * tree, so it conflicts with all serializable scans. All scans acquire a diff --git a/src/backend/access/gin/ginget.c b/src/backend/access/gin/ginget.c index b18ae2b..41bab5d 100644 --- a/src/backend/access/gin/ginget.c +++ b/src/backend/access/gin/ginget.c @@ -1750,7 +1750,7 @@ collectMatchesForHeapRow(IndexScanDesc scan, pendingPosition *pos) /* * Collect all matched rows from pending list into bitmap. */ -static void +static bool scanPendingInsert(IndexScanDesc scan, TIDBitmap *tbm, int64 *ntids) { GinScanOpaque so = (GinScanOpaque) scan->opaque; @@ -1774,6 +1774,12 @@ scanPendingInsert(IndexScanDesc scan, TIDBitmap *tbm, int64 *ntids) LockBuffer(metabuffer, GIN_SHARE); page = BufferGetPage(metabuffer); TestForOldSnapshot(scan->xs_snapshot, scan->indexRelation, page); + + if (GlobalTempRelationPageIsNotInitialized(scan->indexRelation, page)) + { + UnlockReleaseBuffer(metabuffer); + return false; + } blkno = GinPageGetMeta(page)->head; /* @@ -1784,7 +1790,7 @@
Re: ssl passphrase callback
On Thu, Oct 31, 2019 at 11:37 AM Andrew Dunstan wrote: > This patch provides a hook for a function that can supply an SSL > passphrase. The hook can be filled in by a shared preloadable module. In > order for that to be effective, the startup order is modified slightly. > There is a test attached that builds and uses one trivial > implementation, which just takes a configuration setting and rot13's it > before supplying the result as the passphrase. It seems to me that it would be a lot better to have an example in contrib that does something which might be of actual use to users, such as running a shell command and reading the passphrase from stdout. Features that are only accessible by writing C code are, in general, not as desirable as features which can be accessed via SQL or configuration. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
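A hedged sketch of the contrib-style module being suggested: run a configured shell command and use its stdout as the passphrase. The callback shape below follows OpenSSL's pem_password_cb; the command string (which would presumably be a GUC) and the hook wiring are invented, since the patch's actual hook API isn't shown here:

#include <stdio.h>
#include <string.h>

static const char *passphrase_command = "/usr/local/bin/get-ssl-passphrase";

static int
run_command_passphrase_cb(char *buf, int size, int rwflag, void *userdata)
{
    FILE   *fh = popen(passphrase_command, "r");
    size_t  len;

    if (fh == NULL)
        return 0;                       /* 0 => no passphrase available */
    if (fgets(buf, size, fh) == NULL)
        buf[0] = '\0';
    pclose(fh);

    len = strlen(buf);
    if (len > 0 && buf[len - 1] == '\n')
        buf[--len] = '\0';              /* strip trailing newline */
    return (int) len;
}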
Re: On disable_cost
On Fri, Nov 1, 2019 at 3:42 AM, Zhenghua Lyu wrote: > > My issue: I did some spikes and tests on TPC-DS 1TB data. For query > 104, it generates a > nestloop join even with enable_nestloop set off. And the final plan's total > cost is huge (about 1e24). But if I enlarge disable_cost to 1e30, > then the planner will generate a hash join. > > So I guess that disable_cost is not large enough for a huge amount of data. > > It is tricky to set disable_cost to a huge number. Can we come up with a > better solution? > Isn't this a case for making disable_cost a GUC? As Thomas suggested, a DBL_MAX upper limit should be sufficient. > The following thoughts are from Heikki: >> >> Aside from not having a large enough disable cost, there's also the fact >> that the high cost might affect the rest of the plan, if we have to use a >> plan type that's disabled. For example, if a table doesn't have any indexes, >> but enable_seqscan is off, we might put the unavoidable Seq Scan on a >> different side of a join than we would with enable_seqscan=on, because of >> the high cost estimate. > > >> >> I think a more robust way to disable forbidden plan types would be to handle >> the disabling in add_path(). Instead of having a high disable cost on the >> Path itself, the comparison in add_path() would always consider disabled paths >> as more expensive than others, regardless of the cost. > I'm afraid it is not as cheap as using disable_cost as a node cost. Are you proposing to add a new boolean variable in the Path struct to handle those cases in compare_path_costs_fuzzily? -- Euler Taveira Timbira - http://www.timbira.com.br/ PostgreSQL: Consultoria, Desenvolvimento, Suporte 24x7 e Treinamento
Re: Remove configure --disable-float4-byval and --disable-float8-byval
On Thu, Oct 31, 2019 at 9:36 AM Tom Lane wrote: > > float8 and related types are now hardcoded to pass-by-value or > > pass-by-reference depending on whether the build is 64- or 32-bit, as > > was previously also the default. > > I'm less happy with doing this. It makes it impossible to test the > pass-by-reference code paths without actually firing up a 32-bit > environment. It'd be fine to document --disable-float8-byval as > a developer-only option (it might be so already), but I don't want > to lose it completely. I fail to see any advantage in getting rid > of it, anyway, since we do still have to maintain both code paths. Could we get around this by making Datum 8 bytes everywhere? Years ago, we got rid of INT64_IS_BUSTED, so we're relying on 64-bit types working on all platforms. However, Datum on a system where pointers are only 32 bits wide is also only 32 bits wide, so we can't pass 64-bit quantities the same way we do elsewhere. But, why is the width of a Datum equal to the width of a pointer, anyway? It would surely cost something to widen 1, 2, and 4 byte quantities to 8 bytes when packing them into datums on 32-bit platforms, but (1) we've long since accepted that cost on 64-bit platforms, (2) it would save the palloc/pfree cycles currently needed to pass 8-byte quantities through a 4-byte Datum, (3) 32-bit platforms are now marginal and performance on those platforms is not critical, and (4) it would simplify a lot of code and reduce future bugs. On a related note, why do we store typbyval in the catalog anyway instead of inferring it from typlen and maybe typalign? It seems like a bad idea to record on disk the way we pass around values in memory, because it means that a change to how values are passed around in memory has ramifications for on-disk compatibility.

rhaas=# select typname, typlen, typbyval, typalign from pg_type where typlen in (1,2,4,8) != typbyval;
 typname  | typlen | typbyval | typalign
----------+--------+----------+----------
 macaddr8 |      8 | f        | i
(1 row)

-- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
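To illustrate the palloc/pfree point in (2): with an 8-byte Datum a float8 can travel in the Datum itself, while a 4-byte Datum forces an allocation and a pointer. A standalone sketch of the two conversions (not the real Float8GetDatum() code) might be:

#include <stdint.h>
#include <stdlib.h>
#include <string.h>

typedef uint64_t Datum8;    /* stand-in for a 64-bit Datum */
typedef uint32_t Datum4;    /* stand-in for a 32-bit Datum */

static Datum8
float8_get_datum_byval(double x)
{
    Datum8 d;

    memcpy(&d, &x, sizeof(d));      /* reinterpret the bits, no allocation */
    return d;
}

static Datum4
float8_get_datum_byref(double x)
{
    double *p = malloc(sizeof(double));     /* palloc() in the backend */

    *p = x;
    /* on a 32-bit platform a pointer fits in the 4-byte Datum */
    return (Datum4) (uintptr_t) p;
}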
Re: Adding percentile metrics to pg_stat_statements module
On Fri, Nov 01, 2019 at 11:11:13AM -0300, Igor Calabria wrote:
> Yeah, I agree that there's no reason to store the digests themselves and I really liked the idea of it being optional.

That's not what I wrote. My point was that we *should* store the digests themselves, otherwise we just introduce additional errors into the estimates, because it discards the weights/frequencies.

> If it turns out that memory consumption on real workloads is small enough, it could eventually be turned on by default.

Maybe, but it's not just about memory consumption. CPU matters too. regards -- Tomas Vondra http://www.2ndQuadrant.com PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Re: Adding percentile metrics to pg_stat_statements module
Yeah, I agree that there's no reason to store the digests themselves and I really liked the idea of it being optional. If it turns out that memory consumption on real workloads is small enough, it could eventually be turned on by default. I'll start working on a patch On Thu, Oct 31, 2019 at 4:32 PM, Tomas Vondra <tomas.von...@2ndquadrant.com> wrote: > On Thu, Oct 31, 2019 at 12:51:17PM -0300, Igor Calabria wrote: > >Hi everyone, > > > >I was taking a look at the pg_stat_statements module and noticed that it does > >not collect any percentile metrics. I believe that it would be really handy > >to have those available and I'd love to contribute with this feature. > > > >The basic idea is to accumulate the query execution times using an > >approximation structure like q-digest or t-digest and add those results to > >the pg_stat_statements table as fixed columns. Something like this > > > >p90_time: > >p95_time: > >p99_time: > >p70_time: > >... > > > >Another solution is to persist the digest structure in a binary column and > >use a function to extract the desired quantile like this SELECT > >approx_quantile(digest_times, 0.99) FROM pg_stat_statements > > > > IMO having some sort of CDF approximation (being a q-digest or t-digest) > would be useful, although it'd probably need to be optional (mostly > because of memory consumption). > > I don't see why we would not store the digests themselves. Storing just > some selected percentiles would be pretty problematic due to losing a > lot of information on restart. Also, pg_stat_statements is not a table > but a view on an in-memory hash table. > > > regards > > -- > Tomas Vondra http://www.2ndQuadrant.com > PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services >
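As a rough illustration of the accumulate-then-query shape under discussion, here is a tiny self-contained sketch. It is deliberately not a t-digest or q-digest -- just a fixed-memory reservoir sampler standing in for the digest, to show how per-query timings could be folded into a bounded structure and a percentile read back later:

#include <stdlib.h>
#include <string.h>

#define RESERVOIR_SIZE 128

typedef struct TimingSketch
{
    double  sample[RESERVOIR_SIZE];
    long    seen;               /* total observations so far */
} TimingSketch;

static void
sketch_add(TimingSketch *s, double elapsed_ms)
{
    if (s->seen < RESERVOIR_SIZE)
        s->sample[s->seen] = elapsed_ms;
    else
    {
        long j = random() % (s->seen + 1);  /* classic reservoir step */

        if (j < RESERVOIR_SIZE)
            s->sample[j] = elapsed_ms;
    }
    s->seen++;
}

static int
cmp_double(const void *a, const void *b)
{
    double d = *(const double *) a - *(const double *) b;

    return (d > 0) - (d < 0);
}

/* e.g. sketch_quantile(&s, 0.99) for an approximate p99 */
static double
sketch_quantile(TimingSketch *s, double q)
{
    long    n = s->seen < RESERVOIR_SIZE ? s->seen : RESERVOIR_SIZE;
    double  tmp[RESERVOIR_SIZE];

    memcpy(tmp, s->sample, n * sizeof(double));
    qsort(tmp, n, sizeof(double), cmp_double);
    return tmp[(long) (q * (n - 1))];
}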
Re: Problem with synchronous replication
On Thu, Oct 31, 2019 at 05:38:32PM +0900, Fujii Masao wrote: > Thanks for the patch! Looks good to me. Thanks. I have applied the core fix down to 9.4, leaving the new assertion improvements only for HEAD. -- Michael signature.asc Description: PGP signature
Re: update ALTER TABLE with ATTACH PARTITION lock mode (docs)
On Fri, Nov 01, 2019 at 08:59:48AM -0500, Justin Pryzby wrote: > I guess you mean because it's not a child until after the ALTER. Yes, that > makes sense. Yes, perhaps you have another idea than mine on how to shape this sentence? -- Michael signature.asc Description: PGP signature
Re: update ALTER TABLE with ATTACH PARTITION lock mode (docs)
On Thu, Oct 31, 2019 at 06:07:34PM +0900, Michael Paquier wrote: > On Mon, Oct 28, 2019 at 10:56:33PM -0500, Justin Pryzby wrote: > > I suppose it should something other than partition(ed), since partitions > > can be > > partitioned, too... > > > > Attaching a partition acquires a SHARE UPDATE > > EXCLUSIVE > > lock on the parent table, in addition to > > ACCESS EXCLUSIVE locks on the child table and the > > DEFAULT partition (if any). > > In this context, "on the child table" sounds a bit confusing? Would > it make more sense to say the "on the table to be attached" instead? I guess you mean because it's not a child until after the ALTER. Yes, that makes sense. Thanks, Justin
Re: fe-utils - share query cancellation code
On 2019-Nov-01, Fabien COELHO wrote: > diff --git a/src/bin/pgbench/pgbench.c b/src/bin/pgbench/pgbench.c > index 03bcd22996..389b4d7bcd 100644 > --- a/src/bin/pgbench/pgbench.c > +++ b/src/bin/pgbench/pgbench.c > @@ -59,9 +59,10 @@ > > #include "common/int.h" > #include "common/logging.h" > -#include "fe_utils/conditional.h" > #include "getopt_long.h" > #include "libpq-fe.h" > +#include "fe_utils/conditional.h" > +#include "fe_utils/cancel.h" > #include "pgbench.h" > #include "portability/instr_time.h" wtf? -- Álvaro Herrera https://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Re: Add a GUC variable that control logical replication
On 2019-10-20 00:23, Euler Taveira wrote: You can probably achieve that using ALTER PUBLICATION to disable publication of deletes or truncates, as the case may be, either permanently or just for the duration of the operations you want to skip. ... then you are skipping all tables in the publication. You can group tables into different publications and set the subscription to subscribe to multiple publications if you need this kind of granularity. In any case, this kind of thing needs to be handled by the decoding plugin based on its configuration policies and depending on its needs. For example, let's say you have two decoding plugins running: one for a replication system and one for writing an audit log. It would not be appropriate to disable logging for both of them because of some performance optimization for one of them. And it would also not be appropriate to do this with a USERSET setting. If we need different hooks or more DDL commands to do this better, then that can be considered. But this seems to be the wrong way to do it. -- Peter Eisentraut http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Re: pause recovery if pitr target not reached
On 2019-10-21 08:44, Fujii Masao wrote: Probably we can use standby mode + recovery target setting for almost the same purpose. In this configuration, if end-of-WAL is reached before the recovery target, the startup process keeps waiting for new WAL to become available. Then, if the recovery target is reached, the startup process works as recovery_target_action indicates. So basically get rid of recovery.signal mode and honor recovery target parameters in standby mode? That has some appeal because it simplifies this whole space significantly, but perhaps it would be too confusing for end users? -- Peter Eisentraut http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Re: pglz performance
On 2019-10-25 07:05, Andrey Borodin wrote: > On 21 Oct 2019, at 14:09, Andrey Borodin wrote: > > With Silesian corpus pglz_decompress_hacked is actually decreasing performance on high-entropy data. Meanwhile pglz_decompress_hacked8 is still faster than usual pglz_decompress. In spite of this benchmarks, I think that pglz_decompress_hacked8 is safer option. > Here's v3 which takes into account recent benchmarks with Silesian Corpus and have better comments. Your message from 21 October appears to say that this change makes the performance worse. So I don't know how to proceed with this. -- Peter Eisentraut http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Re: [PATCH] Add some useful asserts into View Options macroses
On 2019-10-08 12:44, Nikolay Shaplov wrote: In a message of Monday, 7 October 2019 12:59:27 MSK, Robert Haas wrote: This thread is a follow up to the thread https://www.postgresql.org/message-id/2620882.s52SJui4ql@x200m where I've been trying to remove the StdRdOptions structure and replace it with a unique structure for each relation kind. I've decided to split that patch into smaller parts. This part adds some asserts to the ViewOptions macros. Since an options pointer there is converted into (ViewOptions *), it would be really good to make sure that this macro is called in the proper context, and that we do the conversion properly. At least when running tests with asserts turned on. Committed. I simplified the parentheses by one level from your patch. -- Peter Eisentraut http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Looking for a demo of extensible nodes
Hi, hackers As the $Subject, does anyone have one? I'd like to refer to it, and write an example for people who is also looking for the document. Thanks. -- Adam Lee
Re: alternative to PG_CATCH
On 2019-10-29 17:10, Tom Lane wrote: Peter Eisentraut writes: On 2019-10-28 13:45, Robert Haas wrote: In theory, the do_rethrow variable could conflict with a symbol declared in the surrounding scope, but that doesn't seem like it's a problem worth getting worked up about. Right. A PG_TRY block also declares other local variables for internal use without much care about namespacing. If it becomes a problem, it's easy to address. Although we haven't been terribly consistent about it, some of our macros address this problem by using local variable names with a leading and/or trailing underscore, or otherwise making them names you'd be quite unlikely to use in normal code. I suggest doing something similar here. (Wouldn't be a bad idea to make PG_TRY's variables follow suit.) committed with a leading underscore -- Peter Eisentraut http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Re: 64 bit transaction id
Hi On Fri, Nov 1, 2019 at 10:11 AM, Павел Ерёмин wrote: > Hi. > Sorry for my English. > I want to once again open the topic of 64-bit transaction ids. I did not > manage to find in the archive the option that I want to discuss, so I write. > If I searched poorly, then please forgive me. > The idea is not very original and has probably already been considered; > again, I repeat - I did not find it. Therefore, please do not scold me > severely. > In discussions of 64-bit transaction ids, I did not find mention of an > algorithm for storing them as it was done, for example, in MS SQL Server. > What if instead of 2 fields (xmin and xmax) with a total length of 64 bits > we use 1 field (let's call it xid) with a length of 64 bits in the tuple header? > In this field, store the xid of the transaction that created the version. In > this case, a new transaction, in order to understand whether the version it > reads is suitable for it or not, will have to read the next version as well. That > is, the downside of such a decision is of course an increase in I/O: > transactions will have to read one more version. On the plus side, the > header of the tuple remains the same length. > Is a 32-bit xid really a problem? Why do you need to know the state of the last 2^31 transactions? Isn't the problem rather too-low usage (or maybe too-high overhead) of VACUUM FREEZE? I am not sure if increasing this range can have much more fatal problems (maybe later) Pavel > > regards, Eremin Pavel. >
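To spell out the lookup shape the proposal implies, here is a hedged sketch; every name in it is invented, since nothing like this exists in PostgreSQL. The "xmax" role is played by the successor version's xid, at the price of one extra header fetch:

#include "postgres.h"
#include "storage/itemptr.h"
#include "utils/snapshot.h"

typedef struct SlimTupleHeader
{
    uint64          xid;    /* 64-bit creating transaction */
    ItemPointerData next;   /* successor version, if any */
} SlimTupleHeader;

/* xid_visible_in_snapshot() and fetch_tuple_header() are hypothetical */
static bool
version_visible(SlimTupleHeader *tup, Snapshot snapshot)
{
    SlimTupleHeader *nxt;

    if (!xid_visible_in_snapshot(tup->xid, snapshot))
        return false;       /* creator not visible => tuple not visible */

    if (!ItemPointerIsValid(&tup->next))
        return true;        /* no successor: this is the live version */

    /* the extra I/O the proposal accepts: read the successor's header */
    nxt = fetch_tuple_header(&tup->next);

    /* visible iff the "deleting" xid is not visible to this snapshot */
    return !xid_visible_in_snapshot(nxt->xid, snapshot);
}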
Re: fe-utils - share query cancellation code
Hello, I gave it a quick look and I think we can avoid psql_setup_cancel_handler(void) { setup_cancel_handler(psql_sigint_callback); } because it does not matter to setup_cancel_handler what we pass, since it ignores that in the Windows case. The "psql_sigint_callback" function is not defined under WIN32. I've fixed a missing NULL argument in the section you pointed out, though. I've used the shared infrastructure in pgbench. I've noticed yet another instance of the cancellation stuff in "src/bin/pg_dump/parallel.c", but it seems somehow different from the two others, so I have not tried to use the shared version. -- Fabien.
diff --git a/src/bin/pgbench/pgbench.c b/src/bin/pgbench/pgbench.c index 03bcd22996..389b4d7bcd 100644 --- a/src/bin/pgbench/pgbench.c +++ b/src/bin/pgbench/pgbench.c @@ -59,9 +59,10 @@ #include "common/int.h" #include "common/logging.h" -#include "fe_utils/conditional.h" #include "getopt_long.h" #include "libpq-fe.h" +#include "fe_utils/conditional.h" +#include "fe_utils/cancel.h" #include "pgbench.h" #include "portability/instr_time.h" @@ -3887,6 +3888,9 @@ initGenerateData(PGconn *con) exit(1); } + if (CancelRequested) + break; + /* * If we want to stick with the original logging, print a message each * 100k inserted rows. @@ -4057,6 +4061,9 @@ runInitSteps(const char *initialize_steps) if ((con = doConnect()) == NULL) exit(1); + setup_cancel_handler(NULL); + SetCancelConn(con); + for (step = initialize_steps; *step != '\0'; step++) { instr_time start; @@ -4120,6 +4127,7 @@ runInitSteps(const char *initialize_steps) } fprintf(stderr, "done in %.2f s (%s).\n", run_time, stats.data); + ResetCancelConn(); PQfinish(con); termPQExpBuffer(); } diff --git a/src/bin/psql/command.c b/src/bin/psql/command.c index b981ae81ff..f1d9e0298a 100644 --- a/src/bin/psql/command.c +++ b/src/bin/psql/command.c @@ -29,6 +29,7 @@ #include "libpq-fe.h" #include "pqexpbuffer.h" #include "common/logging.h" +#include "fe_utils/cancel.h" #include "fe_utils/print.h" #include "fe_utils/string_utils.h" diff --git a/src/bin/psql/common.c b/src/bin/psql/common.c index 90f6380170..0a66b71372 100644 --- a/src/bin/psql/common.c +++ b/src/bin/psql/common.c @@ -24,6 +24,7 @@ #include "copy.h" #include "crosstabview.h" #include "fe_utils/mbprint.h" +#include "fe_utils/cancel.h" #include "fe_utils/string_utils.h" #include "portability/instr_time.h" #include "settings.h" @@ -223,6 +224,7 @@ NoticeProcessor(void *arg, const char *message) } +#ifndef WIN32 /* * Code to support query cancellation @@ -241,7 +243,7 @@ NoticeProcessor(void *arg, const char *message) * * SIGINT is supposed to abort all long-running psql operations, not only * database queries. In most places, this is accomplished by checking - * cancel_pressed during long-running loops. However, that won't work when + * CancelRequested during long-running loops. However, that won't work when * blocked on user input (in readline() or fgets()). In those places, we * set sigint_interrupt_enabled true while blocked, instructing the signal * catcher to longjmp through sigint_interrupt_jmp. We assume readline and @@ -252,34 +254,9 @@ volatile bool sigint_interrupt_enabled = false; sigjmp_buf sigint_interrupt_jmp; -static PGcancel *volatile cancelConn = NULL; - -#ifdef WIN32 -static CRITICAL_SECTION cancelConnLock; -#endif - -/* - * Write a simple string to stderr --- must be safe in a signal handler. - * We ignore the write() result since there's not much we could do about it. - * Certain compilers make that harder than it ought to be.
- */ -#define write_stderr(str) \ - do { \ - const char *str_ = (str); \ - int rc_; \ - rc_ = write(fileno(stderr), str_, strlen(str_)); \ - (void) rc_; \ - } while (0) - - -#ifndef WIN32 - static void -handle_sigint(SIGNAL_ARGS) +psql_sigint_callback(void) { - int save_errno = errno; - char errbuf[256]; - /* if we are waiting for input, longjmp out of it */ if (sigint_interrupt_enabled) { @@ -288,74 +265,20 @@ handle_sigint(SIGNAL_ARGS) } /* else, set cancel flag to stop any long-running loops */ - cancel_pressed = true; - - /* and send QueryCancel if we are processing a database query */ - if (cancelConn != NULL) - { - if (PQcancel(cancelConn, errbuf, sizeof(errbuf))) - write_stderr("Cancel request sent\n"); - else - { - write_stderr("Could not send cancel request: "); - write_stderr(errbuf); - } - } - - errno = save_errno; /* just in case the write changed it */ + CancelRequested = true; } -void -setup_cancel_handler(void) -{ - pqsignal(SIGINT, handle_sigint); -} -#else /* WIN32 */ - -static BOOL WINAPI -consoleHandler(DWORD dwCtrlType) -{ - char errbuf[256]; - - if (dwCtrlType == CTRL_C_EVENT || - dwCtrlType == CTRL_BREAK_EVENT) - { - /* - * Can't longjmp here, because we are in wrong thread :-( - */ - - /* set cancel flag to stop any long-running loops */ -
Re: The command tag of "ALTER MATERIALIZED VIEW RENAME COLUMN"
On Fri, Nov 1, 2019 at 8:00 AM Fujii Masao wrote: > On Fri, Nov 1, 2019 at 6:34 AM Ibrar Ahmed wrote: > > > > > > > > On Thu, Oct 31, 2019 at 6:56 PM Tom Lane wrote: > >> > >> Fujii Masao writes: > >> > ... I found that the command tag of > >> > ALTER MATERIALIZED VIEW RENAME COLUMN is "ALTER TABLE", not "ALTER > VIEW". > >> > >> > =# ALTER MATERIALIZED VIEW hoge RENAME COLUMN j TO x; > >> > ALTER TABLE > >> > >> > Is this intentional? Or bug? > >> > >> Seems like an oversight. > > Thanks for the check! > > > The same issue is with ALTER FOREIGN TABLE > > Yes. > > > Attached patch fixes that for ALTER VIEW, ALTER MATERIALIZED VIEW and > ALTER FOREIGN TABLE > > You introduced subtype in your patch, but I think it's better and simpler > to just give relationType to AlterObjectTypeCommandTag() > if renaming the columns (i.e., renameType = OBJECT_COLUMN). > That works perfectly, and also covers future oversights about the command tag. > To avoid this kind of oversight about the command tag, I'd like to add > regression > tests to make sure that SQL returns a valid and correct command tag. > But currently there seems to be no mechanism for such a test in the regression > tests. Right?? > Do we really need regression test cases for such small oversights? > Maybe we will need that mechanism. > > Regards, > > -- > Fujii Masao > -- Ibrar Ahmed
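To make that suggestion concrete, a hedged sketch of the T_RenameStmt case in CreateCommandTag() (utility.c) might be the following; RenameStmt really does carry both renameType and relationType, but the exact surrounding code is paraphrased:

case T_RenameStmt:
    {
        RenameStmt *stmt = (RenameStmt *) parsetree;

        /*
         * For column renames, report the tag of the relation kind being
         * altered (ALTER VIEW, ALTER FOREIGN TABLE, ...), not a generic
         * ALTER TABLE.
         */
        tag = AlterObjectTypeCommandTag(stmt->renameType == OBJECT_COLUMN
                                        ? stmt->relationType
                                        : stmt->renameType);
    }
    break;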
64 bit transaction id
Hi. Sorry for my English. I want to once again open the topic of 64-bit transaction ids. I did not manage to find in the archive the option that I want to discuss, so I write. If I searched poorly, then please forgive me. The idea is not very original and has probably already been considered; again, I repeat - I did not find it. Therefore, please do not scold me severely. In discussions of 64-bit transaction ids, I did not find mention of an algorithm for storing them as it was done, for example, in MS SQL Server. What if instead of 2 fields (xmin and xmax) with a total length of 64 bits we use 1 field (let's call it xid) with a length of 64 bits in the tuple header? In this field, store the xid of the transaction that created the version. In this case, a new transaction, in order to understand whether the version it reads is suitable for it or not, will have to read the next version as well. That is, the downside of such a decision is of course an increase in I/O: transactions will have to read one more version. On the plus side, the header of the tuple remains the same length. regards, Eremin Pavel.
Re: [HACKERS] Block level parallel vacuum
On Thu, Oct 31, 2019 at 3:45 PM Dilip Kumar wrote: > > On Thu, Oct 31, 2019 at 11:33 AM Dilip Kumar wrote: > > > > On Tue, Oct 29, 2019 at 1:59 PM Masahiko Sawada > > wrote: > > > Actually after I increased shared_buffers I got the expected results: > > > > > > * Test1 (after increased shared_buffers) > > > normal : 2807 ms (hit 56295, miss 2, dirty 3, total 56300) > > > 2 workers : 2840 ms (hit 56295, miss 2, dirty 3, total 56300) > > > 1 worker : 2841 ms (hit 56295, miss 2, dirty 3, total 56300) > > > > > > I updated the patch that computes the total cost delay shared by > > > Dilip[1] so that it collects the number of buffer hits and so on, and > > > have attached it. It can be applied on top of my latest patch set[1]. > > While reading your modified patch (PoC-delay-stats.patch), I have > noticed that in my patch I used the below formula to compute the total > delay: total delay = delay in heap scan + (total delay of index scan > / nworkers). But, in your patch, I can see that it is just the total sum of > all delays. IMHO, the total sleep time during the index vacuum phase > must be divided by the number of workers, because even if at some > point all the workers go to sleep (e.g. 10 msec), the delay in > I/O will be only 10 msec, not 30 msec. I think the same is > discussed upthread[1] > I think that the two approaches make parallel vacuum workers wait in different ways: in approach (a) the vacuum delay works as if the vacuum were performed by a single process, while in approach (b) the vacuum delay works for each worker independently. Suppose that the total number of blocks to vacuum is 10,000 blocks, the cost per block is 10, the cost limit is 200 and the sleep time is 5 ms. In a single-process vacuum the total sleep time is 2,500ms (= (10,000 * 10 / 200) * 5). Approach (a) is the same, 2,500ms, because all parallel vacuum workers use the shared balance value and a worker sleeps once the balance value exceeds the limit. In approach (b), since the cost limit is divided evenly, the value for each worker is 40 (e.g. at parallel degree 5). And supposing each worker processes blocks evenly, the total sleep time of all workers is 12,500ms (= (2,000 * 10 / 40) * 5 * 5). I think that's why we can compute the sleep time of approach (b) by dividing the total value by the number of parallel workers. IOW approach (b) makes parallel vacuum delay much more than normal vacuum, and than parallel vacuum with approach (a), even with the same settings. Which behavior do we expect? I thought the vacuum delay for parallel vacuum should work as if it's a single-process vacuum, as we did for memory usage. I might be missing something. If we prefer approach (b) I should change the patch so that the leader process divides the cost limit evenly. Regards, -- Masahiko Sawada
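A hedged sketch of approach (a)'s delay point is below. VacuumCostBalance, VacuumCostLimit and VacuumCostDelay are the real globals; the shared atomic counter, its DSM plumbing and the function name are invented for illustration:

#include "postgres.h"
#include "miscadmin.h"
#include "port/atomics.h"

static pg_atomic_uint32 *shared_cost_balance;   /* would live in DSM */

static void
parallel_vacuum_delay_point(void)
{
    uint32      add = (uint32) VacuumCostBalance;
    uint32      balance;

    /* fold this worker's locally accumulated cost into the shared balance */
    balance = pg_atomic_fetch_add_u32(shared_cost_balance, add) + add;
    VacuumCostBalance = 0;

    if (balance >= (uint32) VacuumCostLimit)
    {
        /* only the worker that crossed the limit pays the sleep */
        pg_atomic_fetch_sub_u32(shared_cost_balance, balance);
        pg_usleep((long) (VacuumCostDelay * 1000.0));   /* msec -> usec */
    }
}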
Re: Creating foreign key on partitioned table is too slow
Hello, On Fri, Oct 25, 2019 at 7:18 AM Tomas Vondra wrote: > On Thu, Oct 24, 2019 at 03:48:57PM -0300, Alvaro Herrera wrote: > >On 2019-Oct-23, kato-...@fujitsu.com wrote: > > > >> Hello > >> > >> To benchmark with the tpcb model, I tried to create a foreign key in the > >> partitioned history table, but the backend process was killed by OOM. > >> The number of partitions is 8192. I tried in master (commit: ad4b7aeb84). > >> > >> I did the same thing in another server which has 200GB memory, but > >> creating the foreign key did not end in 24 hours. > > > >Thanks for reporting. Thank you Kato-san. > It sounds like there must be a memory leak here. > >I am fairly pressed for time at present so I won't be able to > >investigate this until, at least, mid November. > > I've briefly looked into this, and I think the main memory leak is in > RelationBuildPartitionDesc. It gets called with PortalContext, it > allocates a lot of memory building the descriptor, copies it into > CacheContext but does not even try to free anything. So we end up with > something like this: ... > The attached patch trivially fixes that by adding a memory context > tracking all the temporary data, and then just deletes it as a whole at > the end of the function. This significantly reduces the memory usage for > me, not sure it's 100% correct. Thank you Tomas. I think we have considered this temporary context fix a number of times before, but it got stalled for one reason or another ([1] comes to mind as the last thread where this came up). Another angle to look at this is that our design where PartitionDesc is rebuilt on relcache reload of the parent relation is not a great one after all. It seems that we're rightly (?) invalidating the parent's relcache 8192 times in this case, because its cacheable foreign key descriptor changes on processing each partition, but PartitionDesc itself doesn't change. Having to pointlessly rebuild it 8192 times seems really wasteful. I recall a discussion where it was proposed to build PartitionDesc only when needed as opposed to on every relcache reload of the parent relation. The attached PoC-at-best patch, which does that, seems to get through without OOM and passes make check-world. I think this should have a very minor impact on select queries. But... > FWIW, even with this fix it still takes an awfully long time to create the > foreign key, because the CPU is stuck doing this
>   60.78%  60.78%  postgres  postgres   [.] bms_equal
>   32.58%  32.58%  postgres  postgres   [.] get_eclass_for_sort_expr
>    3.83%   3.83%  postgres  postgres   [.] add_child_rel_equivalences
>    0.23%   0.00%  postgres  [unknown]  [.] 0x0005
>    0.22%   0.00%  postgres  [unknown]  [.]
>    0.18%   0.18%  postgres  postgres   [.] AllocSetCheck
...we have many problems to solve here. :-( Thanks, Amit [1] https://www.postgresql.org/message-id/CA%2BTgmoY3bRmGB6-DUnoVy5fJoreiBJ43rwMrQRCdPXuKt4Ykaw%40mail.gmail.com build-PartitionDesc-when-needed-PoC.patch Description: Binary data
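For reference, the short-lived-context pattern Tomas describes is roughly the following; the function and the copy step are illustrative, not the actual patch:

#include "postgres.h"
#include "utils/memutils.h"

static void
build_partdesc_with_temp_context(Relation rel)
{
    MemoryContext tmpctx;
    MemoryContext oldctx;

    /* all scratch allocations go here instead of PortalContext */
    tmpctx = AllocSetContextCreate(CurrentMemoryContext,
                                   "partition descriptor scratch",
                                   ALLOCSET_DEFAULT_SIZES);
    oldctx = MemoryContextSwitchTo(tmpctx);

    /* ... build and sort the partition bounds, allocating freely ... */

    /* copy only the final result into the long-lived cache context */
    MemoryContextSwitchTo(CacheMemoryContext);
    /* rel->rd_partdesc = copy_partdesc(scratch_result);  -- illustrative */

    MemoryContextSwitchTo(oldctx);
    MemoryContextDelete(tmpctx);    /* frees every scratch allocation at once */
}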
Refactor parse analysis of EXECUTE command
This patch moves the parse analysis component of ExecuteQuery() and EvaluateParams() into a new transformExecuteStmt() that is called from transformStmt(). This makes EXECUTE behave more like other utility commands. Effects are that error messages can have position information (see regression test case), and it allows using external parameters in the arguments of the EXECUTE command. I had previously inquired about this in [0] and some vague concerns were raised. I haven't dug very deep on this, but I figure with an actual patch it might be easier to review and figure out if there are any problems. [0]: https://www.postgresql.org/message-id/flat/ed2767e5-c506-048d-8ddf-280ecbc9e1b7%402ndquadrant.com -- Peter Eisentraut http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Re: On disable_cost
On Fri, Nov 1, 2019 at 7:42 PM Zhenghua Lyu wrote: > It is tricky to set disable_cost a huge number. Can we come up with > better solution? What happens if you use DBL_MAX?
On disable_cost
Hi, Postgres has a global variable `disable_cost`. It is set to the value 1.0e10. This value is added to the cost of a path if the related GUC is set off. For example, if enable_nestloop is set off, when the planner tries to add a nestloop join path, it still adds such a path, but with the huge cost `disable_cost`. But 1.0e10 may not be large enough. I encountered this issue in Greenplum (based on Postgres). Heikki told me that someone also encountered the same issue on Postgres. So I am sending it here to have a discussion. My issue: I did some spikes and tests on TPC-DS 1TB data. For query 104, it generates a nestloop join even with enable_nestloop set off. And the final plan's total cost is huge (about 1e24). But if I enlarge disable_cost to 1e30, then the planner will generate a hash join. So I guess that disable_cost is not large enough for a huge amount of data. It is tricky to set disable_cost to a huge number. Can we come up with a better solution? The following thoughts are from Heikki: > Aside from not having a large enough disable cost, there's also the > fact that the high cost might affect the rest of the plan, if we have to > use a plan type that's disabled. For example, if a table doesn't have any > indexes, but enable_seqscan is off, we might put the unavoidable Seq Scan > on a different side of a join than we would with enable_seqscan=on, > because of the high cost estimate. > I think a more robust way to disable forbidden plan types would be to > handle the disabling in add_path(). Instead of having a high disable cost > on the Path itself, the comparison in add_path() would always consider > disabled paths as more expensive than others, regardless of the cost. Any thoughts or ideas on the problem? Thanks! Best Regards, Zhenghua Lyu
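For context, the mechanism itself is tiny -- the cost functions in costsize.c (e.g. final_cost_nestloop(), paraphrased here) simply add the penalty once per disabled node, which is why deep plans over huge row counts can still accumulate enormous totals and occasionally beat the alternatives:

/* paraphrased from costsize.c; disable_cost is the real 1.0e10 constant */
if (!enable_nestloop)
    startup_cost += disable_cost;   /* added once per disabled node */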
Re: progress report for ANALYZE
Hi vignesh! On 2019/09/17 20:51, vignesh C wrote: On Thu, Sep 5, 2019 at 2:31 AM Alvaro Herrera wrote: There were some minor problems in v5 -- bogus Docbook as well as outdated rules.out, small "git diff --check" complaint about whitespace. This v6 (on today's master) fixes those, no other changes. + + The command is preparing to begin scanning the heap. This phase is + expected to be very brief. + In the above, after "." there is an extra space; is this intentional? I noticed that in a lot of places there are a couple of spaces and sometimes a single space across this file. Like below, there is a single space after ".": Time when this process' current transaction was started, or null if no transaction is active. If the current query is the first of its transaction, this column is equal to the query_start column. Sorry for the late reply. Probably it is intentional, because there are many extra spaces in other documents. See below: # Results of grep = $ grep '\. ' doc/src/sgml/monitoring.sgml | wc -l 114 $ grep '\. ' doc/src/sgml/information_schema.sgml | wc -l 184 $ grep '\. ' doc/src/sgml/func.sgml | wc -l 577 = Therefore, I'm going to leave it as is. :) Thanks, Tatsuro Yamada