Re: [HACKERS] ICU integration
On Wed, Aug 31, 2016 at 2:46 PM, Peter Eisentraut wrote:
> Here is a patch I've been working on to allow the use of ICU for sorting
> and other locale things.

This is very interesting work, and it's great to see some development in this area. I've been peripherally involved in more than one collation-change-broke-my-data incident over the years. I took the patch for a quick spin today. Here are a couple of initial observations.

This patch adds pkg-config support to our configure script in order to produce the build options for ICU. That's cool, and I'm a fan of pkg-config, but it's an extra dependency that I just wanted to highlight. For example, MacOSX appears to ship with ICU but has no pkg-config or ICU .pc files out of the box (erm, I can't even find the headers, so maybe that copy of ICU is useless to everyone except Apple). The MacPorts ICU port ships with .pc files, so that's easy to deal with, and I don't expect it to be difficult to get a working pkg-config and metadata installed alongside ICU on any platform that doesn't already ship with them. It may well be useful for configuring other packages. (There is also other cool stuff that would become possible once ICU is optionally around, but that's off topic.)

There is something missing from the configure script, however: the output of pkg-config is not making it into CFLAGS or LDFLAGS, so building fails on FreeBSD and MacOSX, where for example the ICU headers and libraries don't live in the default search path. I tried very briefly to work out why, but autoconfuse defeated me.

You call the built-in strcoll/strxfrm collation provider "posix", and although POSIX does define those functions, it only does so because it inherits them from ISO C90. As far as I can tell Windows provides those functions because it conforms to the C spec, not the POSIX spec, though I suppose you could argue that in that respect it DOES conform to the POSIX spec... Also, we already have a collation called "POSIX".
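For what it's worth, the flags configure needs can be pulled out by hand; this is only a sketch of what would have to end up in CFLAGS/LDFLAGS, not the patch's actual autoconf macro. "icu-uc" and "icu-i18n" are the .pc module names an ICU installation provides:

```shell
# Sketch: what configure ultimately needs to propagate into the build.
# "icu-uc" and "icu-i18n" are the .pc modules shipped by ICU.
ICU_CFLAGS=$(pkg-config --cflags icu-uc icu-i18n)
ICU_LIBS=$(pkg-config --libs icu-uc icu-i18n)
echo "ICU_CFLAGS=$ICU_CFLAGS"
echo "ICU_LIBS=$ICU_LIBS"
```

On FreeBSD or with MacPorts, this is where the non-default -I/-L paths would come from.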
Maybe we should avoid confusing people with a "posix" provider and a "POSIX" collation? But then I'm not sure what else to call it; perhaps "system" as Petr Jelinek suggested[1], or "libc".

postgres=# select * from pg_collation where collname like 'en_NZ%';
┌──────────────────┬───────────────┬───────────┬──────────────┬──────────────┬──────────────────┬──────────────────┬─────────────┐
│     collname     │ collnamespace │ collowner │ collprovider │ collencoding │   collcollate    │    collctype     │ collversion │
├──────────────────┼───────────────┼───────────┼──────────────┼──────────────┼──────────────────┼──────────────────┼─────────────┤
│ en_NZ            │            11 │        10 │ p            │            6 │ en_NZ            │ en_NZ            │           0 │
│ en_NZ            │            11 │        10 │ p            │            8 │ en_NZ.ISO8859-1  │ en_NZ.ISO8859-1  │           0 │
│ en_NZ            │            11 │        10 │ p            │           16 │ en_NZ.ISO8859-15 │ en_NZ.ISO8859-15 │           0 │
│ en_NZ.ISO8859-1  │            11 │        10 │ p            │            8 │ en_NZ.ISO8859-1  │ en_NZ.ISO8859-1  │           0 │
│ en_NZ.ISO8859-15 │            11 │        10 │ p            │           16 │ en_NZ.ISO8859-15 │ en_NZ.ISO8859-15 │           0 │
│ en_NZ.UTF-8      │            11 │        10 │ p            │            6 │ en_NZ.UTF-8      │ en_NZ.UTF-8      │           0 │
│ en_NZ%icu        │            11 │        10 │ i            │           -1 │ en_NZ            │ en_NZ            │ -1724383232 │
└──────────────────┴───────────────┴───────────┴──────────────┴──────────────┴──────────────────┴──────────────────┴─────────────┘
(7 rows)

I notice that you use encoding -1, meaning "all". I haven't fully grokked what that really means, but it appears that we won't be able to create any new such collations after initdb using CREATE COLLATION, if for example you update your ICU installation and now have a new collation available. When I tried dropping some and recreating them they finished up with collencoding = 6. Should the initdb-created rows use 6 too?

+   ucol_getVersion(collator, versioninfo);
+   numversion = ntohl(*((uint32 *) versioninfo));
+
+   if (numversion != collform->collversion)
+       ereport(WARNING,
+               (errmsg("ICU collator version mismatch"),
+                errdetail("The database was created using version 0x%08X, the library provides version 0x%08X.",
+                          (uint32) collform->collversion, (uint32) numversion),
+                errhint("Rebuild affected indexes, or build PostgreSQL with the right version of ICU.")));

I played around with bumping version numbers artificially.
That gives each session that accesses the collation one warning:

postgres=# select * from foo order by id;
WARNING:  ICU collator version mismatch
DETAIL:  The database was created using version 0x99380001, the library provides version 0x9938.
HINT:  Rebuild affected indexes, or build PostgreSQL with the right version of ICU.
┌────┐
│ id │
├────┤
└────┘
(0 rows)

postgres=# select * from foo order by id;
┌────┐
│ id │
├────┤
└
Re: [HACKERS] Showing parallel status in \df+
2016-09-23 7:22 GMT+02:00 Rushabh Lathia:
>
> On Thu, Sep 22, 2016 at 10:04 PM, Tom Lane wrote:
>
>> Rushabh Lathia writes:
>> > I agree with the argument in this thread, having "Source code" as part
>> > of \df+ is bit annoying, specifically when output involve some really
>> > big PL language functions. Having is separate does make \df+ output more
>> > readable. So I would vote for \df++ rather then adding the source code
>> > as part of footer for \df+.
>>
>> If it's unreadable in \df+, how would \df++ make that any better?
>>
> Eventhough source code as part of \df+ is bit annoying (specifically for
> PL functions), I noticed the argument in this thread that it's useful
> information for some of. So \df++ is just alternate option for the those
> who want the source code.

++ is a little bit obscure. So better to remove src everywhere.

Regards

Pavel

>> regards, tom lane

> --
> Rushabh Lathia
[HACKERS] Re: [HACKERS] Re: [HACKERS] Re: [HACKERS] Re: [HACKERS] Windows service is not starting so there’s message in log: FATAL: "could not create shared memory segment “Global/PostgreSQL.851401618
On Fri, Sep 23, 2016 at 1:21 AM, Tom Lane wrote:
> Robert Haas writes:
>> On Tue, Sep 20, 2016 at 12:53 PM, Tom Lane wrote:
>>> I'd be the first to agree that this point is inadequately documented
>>> in the code, but PostmasterRandom should be reserved for its existing
>>> security-related uses, not exposed to the world for (ahem) random other
>>> uses.
>
>> So, we could have dsm_postmaster_startup() seed the random number
>> generator itself, and then let PostmasterRandom() override the seed
>> later.  Like maybe:
>
> Works for me, at least as a temporary solution.  The disturbing thing
> here is that this still only does what we want if dsm_postmaster_startup
> happens before the first PostmasterRandom call --- which is OK today but
> seems pretty fragile.  Still, redesigning PostmasterRandom's seeding
> technique is not something to do right before 9.6 release.  Let's revert
> the prior patch and do it as you have here:
>
>>     struct timeval tv;
>>     gettimeofday(&tv, NULL);
>>     srandom(tv.tv_sec);
>>     ...
>>     dsm_control_handle = random();
>
> for the time being.

Isn't it better if we use the same technique in dsm_create() as well, which uses random() for the handle?

--
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] File system operations.
On 22 September 2016 at 20:02, Yury Zhuravlev wrote:
> On Thursday, 15 September 2016 19:09:16 MSK, Tom Lane wrote:
>>
>> Robert Haas writes:
>>>
>>> On Thu, Sep 15, 2016 at 11:01 AM, Anastasia Lubennikova wrote:
>>>> What do you think about moving stuff from pg_upgrade/file.c to
>>>> storage/file/ to allow reuse of this code? I think it'll be really
>>>> helpful for developers of contrib modules like backup managers.
>>
>>> Well, storage/file is backend and pg_upgrade is frontend. If you want
>>> to reuse the same code for both it's got to go someplace else.
>>
>> Also, to the extent that it's important to use those wrappers rather
>> than libc directly, it's because the wrappers are tied into backend
>> resource management and error handling conventions.  I don't see how
>> you convert that into code that also works in a frontend program,
>> or what the point would be exactly.
>
> Maybe we should move towards a framework ecosystem. I mean, developers of
> third-party tools must use the maximum of the same API as the backend.

Well, there's a very gradual shift in that direction with libpgcommon etc. But it's minimalist. Most of what's done in the backend makes no sense in frontend code.

It'd make much more sense to adopt an existing portable runtime (nspr, apr, whatever) than write our own if we wanted to go full framework. But I don't think we're likely to. Much more likely to cause pain than prevent it, esp since we're multiprocessing and shmem based.

There are a few areas we could use abstractions for though, like fork()/exec() vs CreateProcessEx(...).

--
Craig Ringer  http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
Re: [HACKERS] pageinspect: Hash index support
On Fri, Sep 23, 2016 at 9:40 AM, Peter Eisentraut wrote:
> On 9/21/16 9:30 AM, Jesper Pedersen wrote:
>> Attached is v5, which add basic page verification.
>
> There are still some things that I found that appear to follow the old
> style (btree) conventions instead of the new style (brin, gin) conventions.
>
> - Rename hash_metap to hash_metapage_info.
>
> - Don't use BuildTupleFromCStrings().  (There is nothing inherently
> wrong with it, but why convert numbers to strings and back again when it
> can be done more directly.)
>
> - Documentation for hash_page_stats still has blkno argument.
>
> - In hash_metap, the magic field could be type bytea, so that it
> displays in hex.  Or text like the brin functions.
>
> Some other comments:
>
> - hash_metap result fields spares and mapp should be arrays of integer.

How would you like to see both those arrays in the tuple? Right now, I think Jesper's code is showing them as strings.

> (Incidentally, the comment in hash.h talks about bitmaps[] but I think
> it means hashm_mapp[].)

Which comment are you referring to here? hashm_mapp contains block numbers of bitmap pages.

While looking at the patch, I noticed below code which seems somewhat problematic:

+   stat->max_avail = BLCKSZ - (BLCKSZ - phdr->pd_special + SizeOfPageHeaderData);
+
+   /* page type (flags) */
+   if (opaque->hasho_flag & LH_META_PAGE)
+       stat->type = 'm';
+   else if (opaque->hasho_flag & LH_OVERFLOW_PAGE)
+       stat->type = 'v';
+   else if (opaque->hasho_flag & LH_BUCKET_PAGE)
+       stat->type = 'b';
+   else if (opaque->hasho_flag & LH_BITMAP_PAGE)
+       stat->type = 'i';

In the above code, it appears that you are trying to calculate max_avail space for all pages in the same way. Don't you need to calculate it differently for a bitmap page or meta page, as they don't share the same format as the other types of pages?

--
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com
Re: [HACKERS] Showing parallel status in \df+
On Thu, Sep 22, 2016 at 10:04 PM, Tom Lane wrote:
> Rushabh Lathia writes:
> > I agree with the argument in this thread, having "Source code" as part
> > of \df+ is bit annoying, specifically when output involve some really
> > big PL language functions. Having is separate does make \df+ output more
> > readable. So I would vote for \df++ rather then adding the source code
> > as part of footer for \df+.
>
> If it's unreadable in \df+, how would \df++ make that any better?

Even though source code as part of \df+ is a bit annoying (specifically for PL functions), I noticed the argument in this thread that it's useful information for some. So \df++ is just an alternate option for those who want the source code.

> regards, tom lane

--
Rushabh Lathia
Re: [HACKERS] Typo in libpq-int.h
On 09/22/2016 04:35 PM, Daniel Gustafsson wrote:
> Ran into a typo in libpq-int.h while reading/hacking:
>
> -    char    *gsslib;    /* What GSS librart to use ("gssapi" or
> +    char    *gsslib;    /* What GSS library to use ("gssapi" or
>
> Patch attached.

Thanks, fixed.

- Heikki
Re: [HACKERS] [RFC] Should we fix postmaster to avoid slow shutdown?
> From: pgsql-hackers-ow...@postgresql.org
> [mailto:pgsql-hackers-ow...@postgresql.org] On Behalf Of Robert Haas
> On Tue, Sep 20, 2016 at 2:20 AM, Tsunakawa, Takayuki wrote:
> > There's no apparent evidence to indicate the cause, but I could guess
> > a few reasons.  What do you think these are correct and should fix
> > PostgreSQL?  (I think so)
>
> I think that we shouldn't start changing things based on guesses about what
> the problem is, even if they're fairly smart guesses.  The thing to do would
> be to construct a test rig, crash the server repeatedly, and add debugging
> instrumentation to figure out where the time is actually going.

We have tried to reproduce the problem in the past several days with much more stress on our environment than on the customer's one -- 1,000 tables aiming for a stats file dozens of times larger, and repeated reconnection requests from hundreds of clients -- but we could not succeed.

> I do think your theory about the stats collector might be worth pursuing.
> It seems that the stats collector only responds to SIGQUIT, ignoring SIGTERM.
> Making it do a clean shutdown on SIGTERM and a fast exit on SIGQUIT seems
> possibly worthwhile.

Thank you for giving me confidence to proceed. And I also believe that postmaster should close the listening ports earlier. Regardless of whether these two changes will solve this particular problem (I'm not confident they will), I think it'd be better to fix these two points so that postmaster doesn't take longer than necessary. I think I'll create a patch after giving it a bit more thought.

Regards
Takayuki Tsunakawa
Re: [HACKERS] Stopping logical replication protocol
On 19 September 2016 at 07:12, Vladimir Gordiychuk wrote:
> New patch in attach. 0001 and 0002 without changes.
> 0003 - patch contain improvements for pg_recvloginca, now pg_recvloginca
> after receive SIGINT will send CopyDone package to postgresql and wait from
> server CopyDone. For backward compatible after repeat send SIGINT
> pg_recvloginca will continue immediately without wait CopyDone from server.
> 0004 patch contain regression tests on scenarios that fix 0001 and 0002
> patches.

Great. Thanks for this.

First observation (which I'll fix in updated patch): it looks like you didn't import my updated patches, so I've rebased your new patches on top of them.

Whitespace issues:

$ git am ~/Downloads/0003-Add-ability-for-pg_recvlogical-safe-stop-replication.patch
Applying: Add ability for pg_recvlogical safe stop replication
/home/craig/projects/2Q/postgres/.git/rebase-apply/patch:140: indent with spaces.
            msgBuf + hdr_len + bytes_written,
/home/craig/projects/2Q/postgres/.git/rebase-apply/patch:550: indent with spaces.
    /* Backward compatible, allow force interrupt logical replication
/home/craig/projects/2Q/postgres/.git/rebase-apply/patch:551: indent with spaces.
     * after second SIGINT without wait CopyDone from server
/home/craig/projects/2Q/postgres/.git/rebase-apply/patch:552: indent with spaces.
     */
warning: 4 lines add whitespace errors.

Remember to use "git log --check" before sending patches.
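As an aside, the check is easy to run locally before formatting patches. A sketch (done here in a throwaway repo so it's self-contained; in real use you'd run the commands from your PostgreSQL checkout):

```shell
# Demo in a throwaway repository.
repo=$(mktemp -d) && cd "$repo" && git init -q .
git config user.email you@example.com && git config user.name you

printf 'int x;\n' > demo.c
git add demo.c && git commit -qm 'clean commit'

# Introduce a whitespace error: a trailing space.
printf 'int y; \n' >> demo.c

# --check makes git flag whitespace errors (trailing space,
# space-before-tab) and exit non-zero when it finds any:
git diff --check || echo "whitespace errors found"

# The same diff option works over commits: git log --check -p <range>
# And when applying a patch that already has such problems, git am can
# repair them during apply:  git am --whitespace=fix <patch>
```

This catches exactly the "indent with spaces" / trailing-whitespace complaints git am printed above.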
Also, comment style: please do

/* this */

or

/*
 * this
 */

not

/* this
 */

I didn't see you explain why this was removed:

-        /* fast path */
-        /* Try to flush pending output to the client */
-        if (pq_flush_if_writable() != 0)
-            WalSndShutdown();
-
-        if (!pq_is_send_pending())
-            return;

I fixed a warning introduced here:

pg_recvlogical.c: In function ‘ProcessXLogData’:
pg_recvlogical.c:289:2: warning: ISO C90 forbids mixed declarations and code [-Wdeclaration-after-statement]
  int bytes_left = msgLength - hdr_len;
  ^

Some of the changes to pg_recvlogical look to be copied from receivelog.c, most notably the functions ProcessKeepalive and ProcessXLogData. I thought that rather than copying them, shouldn't the existing ones be modified to meet your needs? But it looks like the issue is that a fair chunk of the rest of pg_recvlogical doesn't re-use code from receivelog.c either; for example, pg_recvlogical's sendFeedback differs from receivelog.c's sendFeedback. So pg_recvlogical doesn't share the globals that receivelog.c assumes are used. Also, what I thought were copied from receivelog.c were actually extracted from code previously inline in StreamLogicalLog(...) in pg_recvlogical.c.

I'm reluctant to try to untangle this in the same patch, for risk that it'll get stalled by issues with refactoring. The change you've already made is a useful cleanup without messing with unnecessary code.

Your patch 3 does something
odd:

 src/test/recovery/t/001_stream_rep.pl                |  63 --
 src/test/recovery/t/002_archiving.pl                 |  53 ---
 src/test/recovery/t/003_recovery_targets.pl          | 146 ---
 src/test/recovery/t/004_timeline_switch.pl           |  75 --
 src/test/recovery/t/005_replay_delay.pl              |  69
 src/test/recovery/t/006_logical_decoding.pl          |  40 --
 src/test/recovery/t/007_sync_rep.pl                  | 174
 src/test/recovery/t/previous/001_stream_rep.pl       |  63 ++
 src/test/recovery/t/previous/002_archiving.pl        |  53 +++
 src/test/recovery/t/previous/003_recovery_targets.pl | 146 +++
 src/test/recovery/t/previous/004_timeline_switch.pl  |  75 ++
 src/test/recovery/t/previous/005_replay_delay.pl     |  69
 src/test/recovery/t/previous/006_logical_decoding.pl |  40 ++
 src/test/recovery/t/previous/007_sync_rep.pl         | 174

so I've revised it to remove that whole change, which I think you probably did not mean to commit.

Reworded commit message.

Ensured that we send feedback just before a client-initiated CopyDone, so the server knows what position we saved up to. We don't discard already buffered data.

I've modified patch 3 so that it also flushes to disk and sends feedback before sending CopyDone, then discards any xlog data received after it sends CopyDone. This is helpful in ensuring that we get predictable behaviour when cancelling pg_recvxlog and restarting it, because the client and server agree on the point at which replay stopped.

I was evaluating the tests and I don't think they actually demonstrate that the patch works. There's nothing done to check that pg_recvl
Re: [HACKERS] pg_ctl promote wait
On Fri, Sep 23, 2016 at 12:16 AM, Masahiko Sawada wrote:
> On Thu, Sep 22, 2016 at 1:25 AM, Peter Eisentraut wrote:
>> On 8/11/16 9:28 AM, Michael Paquier wrote:
>> Committed with that adjustment.

Thanks...

> Commit e7010ce seems to forget to change help message of pg_ctl.
> Attached patch.

And right, we missed that bit.

--
Michael
Re: [HACKERS] Tracking wait event for latches
On Fri, Sep 23, 2016 at 7:02 AM, Robert Haas wrote:
> On Thu, Sep 22, 2016 at 7:10 PM, Thomas Munro wrote:
>> I was thinking about suggesting a category "Replication" to cover the
>> waits for client IO relating to replication, as opposed to client IO
>> waits relating to regular user connections.  Then you could put sync
>> rep into that category instead of IPC, even though technically it is
>> waiting for IPC from walsender process(es), on the basis that it's
>> more newsworthy to a DBA that it's really waiting for a remote replica
>> to respond.  But it's probably pretty clear what's going on from the
>> wait point names, so maybe it's not worth a category.  Thoughts?
>
> I thought about a replication category but either it will only have
> SyncRep in it, which is odd, or it will pull in other things that
> otherwise fit nicely into the Activity category, and then the
> boundaries of all the categories become mushy: is it the subsystem that
> causes the wait that we are trying to document, or the kind of thing
> for which we are waiting?

I also think that it can add some confusion in defining boundaries and might not be consistent with the current way we have defined the waits; however, there is some value in using the subsystem, which is that users can identify bottlenecks with ease. I think this applies to both Replication and Parallel Query.

--
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com
Re: [HACKERS] Tracking wait event for latches
On Thu, Sep 22, 2016 at 7:19 PM, Robert Haas wrote:
> On Wed, Sep 21, 2016 at 5:49 PM, Thomas Munro wrote:
>
> So, I tried to classify these.  Here's what I came up with.
>
> Activity: ArchiverMain, AutoVacuumMain, BgWriterMain,
> BgWriterHibernate, CheckpointerMain, PgStatMain, RecoveryWalAll,
> RecoveryWalStream, SysLoggerMain, WalReceiverMain, WalWriterMain
>
> Client: SecureRead, SecureWrite, SSLOpenServer, WalSenderMain,
> WalSenderWaitForWAL, WalSenderWriteData, WalReceiverWaitStart
>
> Timeout: BaseBackupThrottle, PgSleep, RecoveryApplyDelay
>
> IPC: BgWorkerShutdown, BgWorkerStartup, ExecuteGather,
> MessageQueueInternal, MessageQueuePutMessage, MessageQueueReceive,
> MessageQueueSend, ParallelFinish, ProcSignal, ProcSleep, SyncRep
>
> Extension: Extension

We already call lwlock waits from an extension "extension", so I think just naming this Extension might create some confusion.

> I classified all of the main loop waits as waiting for activity; all
> of those are background processes that are waiting for something to
> happen and are more or less happy to sleep forever until it does.  I
> also included the RecoveryWalAll and RecoveryWalStream events in
> there; those don't have the sort of "main loop" flavor of the others
> but they are happy to wait more or less indefinitely for something to
> occur.  Likewise, it was pretty easy to find all of the events that
> were waiting for client I/O, and I grouped those all under "Client".
> A few of the remaining wait events seemed like they were clearly
> waiting for a particular timeout to expire, so I gave those their own
> "Timeout" category.
>
> I believe these categorizations are actually useful for users.  For
> example, you might want to see all of the waits in the system but
> exclude the "Client", "Activity", and "Timeout" categories because
> those are things that aren't signs of a problem.  A "Timeout" wait is
> one that you explicitly requested, a "Client" wait isn't the server's
> fault, and an "Activity" wait just means nothing is happening.  In
> contrast, a "Lock" or "LWLock" or "IPC" wait shows that something is
> actually delaying work that we'd ideally prefer to have get done
> sooner.
>
> I grouped the rest of this stuff as "IPC" because all of these events
> are cases where one server process is waiting for another server
> process.  That could be further subdivided, of course: most of
> those events are only going to occur in relation to parallel query,
> but I didn't want to group it that way explicitly because both
> background workers and shm_mq have other potential uses.  ProcSignal
> and ProcSleep are related to heavyweight locks and SyncRep is of
> course related to synchronous replication.  But they're all related
> in that one server process is waiting for another server process to
> tell it that a certain state has been reached, so IPC seems like a
> good categorization.
>
> Finally, extensions got their own category in this taxonomy, though I
> wonder if it would be better to instead have
> Activity/ExtensionActivity, Client/ExtensionClient,
> Timeout/ExtensionTimeout, and IPC/ExtensionIPC instead of making it a
> separate toplevel category.

+1.  It can avoid confusion.

> To me, this seems like a pretty solid toplevel categorization and a
> lot more useful than just throwing all of these in one bucket and
> saying "good luck".

Agreed. This categorisation is very good and can help the patch author to proceed.

--
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com
Re: [HACKERS] pageinspect: Hash index support
On 9/21/16 9:30 AM, Jesper Pedersen wrote:
> Attached is v5, which add basic page verification.

There are still some things that I found that appear to follow the old style (btree) conventions instead of the new style (brin, gin) conventions.

- Rename hash_metap to hash_metapage_info.

- Don't use BuildTupleFromCStrings().  (There is nothing inherently wrong with it, but why convert numbers to strings and back again when it can be done more directly.)

- Documentation for hash_page_stats still has blkno argument.

- In hash_metap, the magic field could be type bytea, so that it displays in hex.  Or text like the brin functions.

Some other comments:

- hash_metap result fields spares and mapp should be arrays of integer. (Incidentally, the comment in hash.h talks about bitmaps[] but I think it means hashm_mapp[].)

- As of very recently, we don't need to move pageinspect--1.5.sql to pageinspect--1.6.sql anymore.  Just add pageinspect--1.5--1.6.sql.

- In the documentation for hash_page_stats(), the "+" sign is misaligned.

- In hash_page_items, the fields itemlen, nulls, vars are always 16, false, false.  So perhaps there is no need for them.  Similarly, the itemlen in hash_page_stats is always 16.

- The data field could be of type bytea.

- Add a pointer in the documentation to the relevant header file.

Bonus:

- Add subsections in the documentation (general functions, B-tree functions, etc.)

- Add tests.

--
Peter Eisentraut  http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Re: [HACKERS] Parallel sec scan in plpgsql
On Thu, Sep 22, 2016 at 7:32 PM, Robert Haas wrote:
> On Thu, Sep 22, 2016 at 8:36 AM, Amit Kapila wrote:
>> I think for certain cases like into clause, the rows passed will be
>> equal to actual number of rows, otherwise it will generate error.  So
>> we can pass that information in executor layer.  Another kind of cases
>> which are worth considering are when from plpgsql we fetch limited
>> rows at-a-time, but we fetch till end like the case of
>> exec_stmt_return_query().
>
> Yes, I think that those cases can be considered.  Some careful code
> inspection will be needed to make sure the cases we want to enable are
> safe, and some testing will be needed to make sure they behave
> properly.  But it doesn't sound like an unsolvable problem.  I hope
> someone who isn't me will decide to work on it.  :-)

Makes sense. I think along with that we can also evaluate whether we can enable parallel query from other PL languages. If we can enable parallelism from all PL languages in 10.0, that will be a good step forward.

--
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com
Re: [HACKERS] Speed up Clog Access by increasing CLOG buffers
On Fri, Sep 23, 2016 at 5:14 AM, Tomas Vondra wrote:
> On 09/21/2016 08:04 AM, Amit Kapila wrote:
>
> (c) Although it's not visible in the results, 4.5.5 almost perfectly
> eliminated the fluctuations in the results. For example when 3.2.80
> produced this results (10 runs with the same parameters):
>
> 12118 11610 27939 11771 18065
> 12152 14375 10983 13614 11077
>
> we get this on 4.5.5
>
> 37354 37650 37371 37190 37233
> 38498 37166 36862 37928 38509
>
> Notice how much more even the 4.5.5 results are, compared to 3.2.80.

How long was each run? Generally, I do a half-hour run to get stable results.

> (d) There's no sign of any benefit from any of the patches (it was only
> helpful >= 128 clients, but that's where the tps actually dropped on 3.2.80
> - apparently 4.5.5 fixes that and the benefit is gone).
>
> It's a bit annoying that after upgrading from 3.2.80 to 4.5.5, the
> performance with 32 and 64 clients dropped quite noticeably (by more than
> 10%). I believe that might be a kernel regression, but perhaps it's a price
> for improved scalability for higher client counts.
>
> It of course begs the question what kernel version is running on the machine
> used by Dilip (i.e. cthulhu)? Although it's a Power machine, so I'm not sure
> how much the kernel matters on it.

cthulhu is an x86 machine and the kernel version is 3.10. Seeing the above results, I think the kernel version does matter, but does that mean we ignore the benefits we are seeing on a somewhat older kernel version? I think the right answer here is to do some experiments which can show the actual contention, as suggested by Robert and you.

> I'll ask someone else with access to this particular machine to repeat the
> tests, as I have a nagging suspicion that I've missed something important
> when compiling / running the benchmarks. I'll also retry the benchmarks on
> 3.2.80 to see if I get the same numbers.
>
>> Okay, but I think it is better to see the results between 64~128
>> client count and may be greater than 128 client counts, because it is
>> clear that patch won't improve performance below that.
>
> There are results for 64, 128 and 192 clients. Why should we care about
> numbers in between? How likely (and useful) would it be to get improvement
> with 96 clients, but no improvement for 64 or 128 clients?

The only point was to see where we start seeing improvement; saying that the TPS has improved from >=72 clients is different from saying that it has improved from >=128.

>> No issues, I have already explained why I think it is important to
>> reduce the remaining CLOGControlLock contention in yesterday's and
>> this mail.  If none of you is convinced, then I think we have no
>> choice but to drop this patch.
>
> I agree it's useful to reduce lock contention in general, but considering
> the last set of benchmarks shows no benefit with recent kernel, I think we
> really need a better understanding of what's going on, what workloads /
> systems it's supposed to improve, etc.
>
> I don't dare to suggest rejecting the patch, but I don't see how we could
> commit any of the patches at this point. So perhaps "returned with feedback"
> and resubmitting in the next CF (along with analysis of improved workloads)
> would be appropriate.

Agreed with your conclusion, and I have changed the status of the patch in the CF accordingly. Many thanks for doing the tests.

--
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com
Re: [HACKERS] Speed up Clog Access by increasing CLOG buffers
On Fri, Sep 23, 2016 at 7:17 AM, Tomas Vondra wrote: > On 09/23/2016 03:20 AM, Robert Haas wrote: >> >> On Thu, Sep 22, 2016 at 7:44 PM, Tomas Vondra >> wrote: >>> >>> I don't dare to suggest rejecting the patch, but I don't see how >>> we could commit any of the patches at this point. So perhaps >>> "returned with feedback" and resubmitting in the next CF (along >>> with analysis of improvedworkloads) would be appropriate. >> >> >> I think it would be useful to have some kind of theoretical analysis >> of how much time we're spending waiting for various locks. So, for >> example, suppose we one run of these tests with various client >> counts - say, 1, 8, 16, 32, 64, 96, 128, 192, 256 - and we run >> "select wait_event from pg_stat_activity" once per second throughout >> the test. Then we see how many times we get each wait event, >> including NULL (no wait event). Now, from this, we can compute the >> approximate percentage of time we're spending waiting on >> CLogControlLock and every other lock, too, as well as the percentage >> of time we're not waiting for lock. That, it seems to me, would give >> us a pretty clear idea what the maximum benefit we could hope for >> from reducing contention on any given lock might be. >> > > Yeah, I think that might be a good way to analyze the locks in general, not > just got these patches. 24h run with per-second samples should give us about > 86400 samples (well, multiplied by number of clients), which is probably > good enough. > > We also have LWLOCK_STATS, that might be interesting too, but I'm not sure > how much it affects the behavior (and AFAIK it also only dumps the data to > the server log). > Right, I think LWLOCK_STATS give us the count of how many time we have blocked due to particular lock like below where *blk* gives that number. 
PID 164692 lwlock main 11: shacq 2734189 exacq 146304 blk 73808 spindelay 73 dequeue self 57241

I think doing some experiments with both techniques can help us to take a call on these patches. Do we want these experiments on different kernel versions, are we okay with the current version on cthulhu (3.10), or do we want to only consider the results with the latest kernel?

>> >> >> Now, we could also try that experiment with various patches. If we >> can show that some patch reduces CLogControlLock contention without >> increasing TPS, they might still be worth committing for that >> reason. Otherwise, you could have a chicken-and-egg problem. If >> reducing contention on A doesn't help TPS because of lock B and >> vice versa, then does that mean we can never commit any patch to >> reduce contention on either lock? Hopefully not. But I agree with you >> that there's certainly not enough evidence to commit any of these >> patches now. To my mind, these numbers aren't convincing. >> > > Yes, the chicken-and-egg problem is why the tests were done with unlogged > tables (to work around the WAL lock). >

Yeah, but I suspect there was still an impact due to ProcArrayLock. -- With Regards, Amit Kapila. EnterpriseDB: http://www.enterprisedb.com
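For anyone wanting to tally the LWLOCK_STATS dumps across backends, a line like the one above is straightforward to parse. A rough sketch (the field layout is taken solely from the sample line above; adjust the pattern to match the output of your build):

```python
import re
from collections import Counter

# Matches LWLOCK_STATS dump lines of the shape shown above, e.g.
# "PID 164692 lwlock main 11: shacq 2734189 exacq 146304 blk 73808 ..."
LINE_RE = re.compile(
    r"PID (?P<pid>\d+) lwlock (?P<tranche>\w+) (?P<id>\d+): "
    r"shacq (?P<shacq>\d+) exacq (?P<exacq>\d+) blk (?P<blk>\d+)"
)

def total_blocked(log_lines):
    """Sum the 'blk' counter per (tranche, lock id) across all backends."""
    blocked = Counter()
    for line in log_lines:
        m = LINE_RE.search(line)
        if m:
            key = (m.group("tranche"), int(m.group("id")))
            blocked[key] += int(m.group("blk"))
    return blocked
```

Feeding it the server log then gives one blocked-count per lock, which can be compared before and after applying each patch.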
Re: [HACKERS] 9.6 TAP tests and extensions
On 23 September 2016 at 00:32, Tom Lane wrote: > Craig Ringer writes: >> On 13 September 2016 at 22:02, Tom Lane wrote: >>> Without taking a position on the merits of this patch per se, I'd like >>> to say that I find the argument for back-patching into 9.6 and not >>> further than that to be pretty dubious. $(prove_check) has been there >>> since 9.4, and in the past we've often regretted it when we failed >>> to back-patch TAP infrastructure fixes all the way back to 9.4. > >> No objection to backpatching, I just thought it'd be more intrusive to >> do that than just 9.6. > >> Since 9.5 and older have more limited versions of PostgresNode which >> lack safe_psql, etc, I'm not sure it's very practical for extensions >> to bother running TAP tests on 9.4 and 9.5 anyway. > > Certainly there are restrictions, but I'd imagine that every new release > will be adding features to the TAP test infrastructure for some time to > come. I think it's silly to claim that 9.6 is the first branch where > TAP testing is usable at all.

OK. I didn't intend to claim 9.6 is the first branch where it's usable, just that the differences between 9.4/9.5 and 9.6 mean that supporting both in extensions will likely be more pain than it is worth. Mainly due to the safe_psql change in 2c83f435. It might've been good to backport that, but it was pretty wide reaching and at the time I wasn't thinking in terms of using TAP tests in extensions, so it didn't occur to me. Extension Perl code can always use some adapter/glue code to handle 9.4 and 9.5 if they want, or error if they don't, so it's not a fatal barrier.

Also, despite what I said upthread, there's no need for normal PGXS-using extensions to define their own $(prove_installcheck) replacement. It works as-is. The problems I was having stemmed from the fact that I was working with a pretty nonstandard Makefile without realising that the changes were going to affect prove.
$(prove_check) isn't useful from extensions of course since it expects a temp install and PGXS doesn't know how to make one, but $(prove_installcheck) does the job fine. It's thus sufficient to apply the patch to install the perl modules to 9.4, 9.5 and 9.6. Nothing else is needed. I've attached backports for 9.4 and 9.5. If you were really keen, we could actually backport the new TAP code to 9.4 and 9.5 pretty much wholesale. 9.4 and 9.5 don't have the psql method, PostgresNode, etc, so there's nothing to break. But that's for separate consideration. >> Extension authors can just use: >> ifeq ($(MAJORVERSION),9.6) >> endif >> when defining their prove rules. > > That will break as soon as 10 comes out. And numerical >= tests aren't > all that convenient in Make. It'd be much better if a test on whether > $(prove_check) is defined would be sufficient. Fair enough. I forgot how much numeric range tests sucked in Make. In that case we should backpatch the installation of the Perl modules right back to 9.4. There's not even a need to test if $(prove_installcheck) is defined then, just need a semicolon so it evaluates to empty e.g. prove_installcheck: $(prove_installcheck) ; check: prove_check So ... patches attached. The 9.6 patch applies to head too. 
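As a concrete illustration of Tom's suggestion, an extension Makefile could guard its TAP rule on whether the variable is defined at all, rather than testing version numbers. This is only a hedged sketch (the `tap-installcheck` target name is made up, and it assumes `prove_installcheck` reaches the extension via the usual PGXS include):

```make
# After "include $(PGXS)": prove_installcheck is only defined by
# Makefile.global on servers whose TAP infrastructure is installed,
# so this degrades to a no-op on older branches.
ifdef prove_installcheck
tap-installcheck:
	$(prove_installcheck)
else
tap-installcheck: ;
endif
```

This avoids the `ifeq ($(MAJORVERSION),9.6)` trap that breaks as soon as 10 comes out.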
-- Craig Ringer http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services From 75c5c0fe98618a6e59af82c4314832df47e6268e Mon Sep 17 00:00:00 2001 From: Craig Ringer Date: Fri, 23 Sep 2016 10:15:00 +0800 Subject: [PATCH] Install Perl modules for TAP tests for use from PGXS --- src/Makefile | 1 + src/test/Makefile | 2 +- src/test/perl/GNUmakefile | 35 +++ 3 files changed, 37 insertions(+), 1 deletion(-) create mode 100644 src/test/perl/GNUmakefile diff --git a/src/Makefile b/src/Makefile index e859826..750c3a9 100644 --- a/src/Makefile +++ b/src/Makefile @@ -25,6 +25,7 @@ SUBDIRS = \ bin \ pl \ makefiles \ + test/perl \ test/regress # There are too many interdependencies between the subdirectories, so diff --git a/src/test/Makefile b/src/test/Makefile index 0fd7eab..ef1dcb5 100644 --- a/src/test/Makefile +++ b/src/test/Makefile @@ -12,6 +12,6 @@ subdir = src/test top_builddir = ../.. include $(top_builddir)/src/Makefile.global -SUBDIRS = regress isolation +SUBDIRS = regress isolation perl $(recurse) diff --git a/src/test/perl/GNUmakefile b/src/test/perl/GNUmakefile new file mode 100644 index 000..09945f7 --- /dev/null +++ b/src/test/perl/GNUmakefile @@ -0,0 +1,35 @@ +#- +# +# Makefile for src/test/perl +# +# Portions Copyright (c) 1996-2016, PostgreSQL Global Development Group +# Portions Copyright (c) 1994, Regents of the University of California +# +# src/test/perl/GNUmakefile +# +#- + +subdir = src/test/perl +top_builddir = ../../.. +include $(top_builddir)/src/Makefile.globa
Re: [HACKERS] pg_basebackup, pg_receivexlog and data durability (was: silent data loss with ext4 / all current versions)
This is looking pretty good. More comments on this patch set:

0001:
- Keep the file order alphabetical in Mkvcbuild.pm.
- Needs nls.mk updates for the file move (in the initdb and pg_basebackup directories).

0002:
- durable_rename needs close(fd) in the non-error path (compare the backend code).
- Changing from fsync() to fsync_name() in some cases means that inaccessible files are now ignored. I guess this would only happen in very obscure circumstances, but it's worth considering whether that is OK.
- You added this comment:

  * XXX: This means that we might not restart if a crash occurs before the
  * fsync below. We probably should create the file in a temporary path
  * like the backend does...

  pg_receivexlog uses the .partial suffix for this reason. Why doesn't pg_basebackup do that?
- In open_walfile, could we move the fsync calls before the fstat or somewhere around there, so we don't have to make the same calls in two different branches?

0003:
- There was a discussion about renaming the --nosync option in initdb to --no-sync, but until that is done, the option in pg_basebackup should stay with what initdb has right now.
- There is a whitespace alignment error in usage(). The -N option should be listed after -n.
- Some fsync calls are not covered by a do_sync conditional. I see some in close_walfile(), HandleCopyStream(), ProcessKeepaliveMsg().

-- Peter Eisentraut http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
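The durable-rename pattern under review (fsync the file, rename it, then fsync the containing directory so the new directory entry itself survives a crash) can be sketched stand-alone. This is an illustration of the general POSIX pattern, not the backend's actual durable_rename(); the close-in-the-non-error-path point from the review is marked in a comment:

```python
import os

def durable_rename(oldfile, newfile, dirpath):
    """Sketch of the fsync -> rename -> fsync-directory pattern:
    flush the file contents, rename it, then fsync the parent
    directory so the rename itself is on stable storage."""
    fd = os.open(oldfile, os.O_RDWR)
    try:
        os.fsync(fd)
    finally:
        os.close(fd)  # close the fd in the non-error path too

    os.rename(oldfile, newfile)

    # fsync the directory so the new name is durable (POSIX allows
    # opening a directory read-only just to fsync it).
    dfd = os.open(dirpath, os.O_RDONLY)
    try:
        os.fsync(dfd)
    finally:
        os.close(dfd)
```

Without the final directory fsync, a crash after the rename can leave the old name in place even though the file contents were flushed.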
Re: [HACKERS] Speed up Clog Access by increasing CLOG buffers
On 09/23/2016 03:20 AM, Robert Haas wrote: On Thu, Sep 22, 2016 at 7:44 PM, Tomas Vondra wrote: I don't dare to suggest rejecting the patch, but I don't see how we could commit any of the patches at this point. So perhaps "returned with feedback" and resubmitting in the next CF (along with analysis of improved workloads) would be appropriate. I think it would be useful to have some kind of theoretical analysis of how much time we're spending waiting for various locks. So, for example, suppose we do one run of these tests with various client counts - say, 1, 8, 16, 32, 64, 96, 128, 192, 256 - and we run "select wait_event from pg_stat_activity" once per second throughout the test. Then we see how many times we get each wait event, including NULL (no wait event). Now, from this, we can compute the approximate percentage of time we're spending waiting on CLogControlLock and every other lock, too, as well as the percentage of time we're not waiting for a lock. That, it seems to me, would give us a pretty clear idea what the maximum benefit we could hope for from reducing contention on any given lock might be.

Yeah, I think that might be a good way to analyze the locks in general, not just for these patches. A 24h run with per-second samples should give us about 86400 samples (well, multiplied by number of clients), which is probably good enough.

We also have LWLOCK_STATS, that might be interesting too, but I'm not sure how much it affects the behavior (and AFAIK it also only dumps the data to the server log).

> Now, we could also try that experiment with various patches. If we can show that some patch reduces CLogControlLock contention without increasing TPS, they might still be worth committing for that reason. Otherwise, you could have a chicken-and-egg problem. If reducing contention on A doesn't help TPS because of lock B and vice versa, then does that mean we can never commit any patch to reduce contention on either lock? Hopefully not.
But I agree with you that there's certainly not enough evidence to commit any of these patches now. To my mind, these numbers aren't convincing. Yes, the chicken-and-egg problem is why the tests were done with unlogged tables (to work around the WAL lock). regards -- Tomas Vondra http://www.2ndQuadrant.com PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Re: [HACKERS] Tracking wait event for latches
On Thu, Sep 22, 2016 at 7:10 PM, Thomas Munro wrote: > Interesting. OK, I agree that it'd be useful to show that we're > waiting because there's nothing happening, or waiting because the user > asked us to sleep, or waiting on IO, or waiting for an IPC response > because something is happening, and that higher level information is > difficult/impossible to extract automatically from the WaitEventSet. Cool. :-) > I understand that "Activity" is the category of wait points that are > waiting for activity, but I wonder if it might be clearer to users if > that were called "Idle", because it's the category of idle waits > caused by non-activity. I thought about that but figured it would be better to consistently state the thing *for which* we were waiting. We wait FOR a client or a timeout or activity. We do not wait FOR idle; we wait to be NOT idle. > Why is WalSenderMain not in that category alongside WalReceiverMain? > Hmm, actually it's kind of a tricky one: whether it's really idle or > waiting for IO depends. It's always ready to wait for clients to send > messages, but I'd say that's part of its "idle" behaviour. But it's > sometimes waiting for the socket to be writable: if > (pq_is_send_pending()) wakeEvents |= WL_SOCKET_WRITABLE, and that's > when it's definitely not idle, it's actively trying to feed WAL down > the pipe. Do we want to get into dynamic categories depending on > conditions like that? I suspect that's overkill. I don't want wait-point-naming to make programming the system noticeably more difficult, so I think it's fine to pick a categorization of what we think the typical case will be and call it good. If we try that and people find it's a nuisance, we can fix it then. In the case of WAL sender, I assume it will normally be waiting for more WAL to be generated; whereas in the case of WAL receiver, I assume it will normally be waiting for more WAL to be received from the remote side. 
The reverse cases are possible: the sender could be waiting for the socket buffer to drain so it can push more WAL onto the wire, and the receiver could likewise be waiting for buffer space to push out feedback messages. But probably mostly not. At least for a first cut, I'd be inclined to handle this fuzziness by putting weasel-words in the documentation rather than by trying to make the reporting 100% perfectly accurate.

> I was thinking about suggesting a category "Replication" to cover the > waits for client IO relating to replication, as opposed to client IO > waits relating to regular user connections. Then you could put sync > rep into that category instead of IPC, even though technically it is > waiting for IPC from walsender process(es), on the basis that it's > more newsworthy to a DBA that it's really waiting for a remote replica > to respond. But it's probably pretty clear what's going on from the > wait point names, so maybe it's not worth a category. Thoughts?

I thought about a replication category, but either it will only have SyncRep in it, which is odd, or it will pull in other things that otherwise fit nicely into the Activity category, and then the boundaries of all the categories become mushy: is it the subsystem that causes the wait that we are trying to document, or the kind of thing for which we are waiting?

> I do suspect that the set of wait points will grow quite a bit as we > develop more parallel stuff though. For example, I have been working > on a patch that adds several more wait points, indirectly via > condition variables (using your patch). Actually in my case it's > BarrierWait -> ConditionVariableWait -> WaitEventSetWait. I propose > that these higher level wait primitives should support passing a wait > point identifier through to WaitEventSetWait. +1.
-- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Re: [HACKERS] Speed up Clog Access by increasing CLOG buffers
On Thu, Sep 22, 2016 at 7:44 PM, Tomas Vondra wrote: > I don't dare to suggest rejecting the patch, but I don't see how we could > commit any of the patches at this point. So perhaps "returned with feedback" > and resubmitting in the next CF (along with analysis of improved workloads) > would be appropriate.

I think it would be useful to have some kind of theoretical analysis of how much time we're spending waiting for various locks. So, for example, suppose we do one run of these tests with various client counts - say, 1, 8, 16, 32, 64, 96, 128, 192, 256 - and we run "select wait_event from pg_stat_activity" once per second throughout the test. Then we see how many times we get each wait event, including NULL (no wait event). Now, from this, we can compute the approximate percentage of time we're spending waiting on CLogControlLock and every other lock, too, as well as the percentage of time we're not waiting for a lock. That, it seems to me, would give us a pretty clear idea what the maximum benefit we could hope for from reducing contention on any given lock might be.

Now, we could also try that experiment with various patches. If we can show that some patch reduces CLogControlLock contention without increasing TPS, they might still be worth committing for that reason. Otherwise, you could have a chicken-and-egg problem. If reducing contention on A doesn't help TPS because of lock B and vice versa, then does that mean we can never commit any patch to reduce contention on either lock? Hopefully not. But I agree with you that there's certainly not enough evidence to commit any of these patches now. To my mind, these numbers aren't convincing.

-- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
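The sampling analysis described above is easy to post-process. Assuming one wait_event value per backend per second has been collected (e.g. via repeated `SELECT wait_event FROM pg_stat_activity`, with NULL meaning "not waiting"), the percentages fall out of a simple tally; this is just an illustration of the bookkeeping, not anything from the patches:

```python
from collections import Counter

def wait_event_percentages(samples):
    """samples: iterable of wait_event values sampled once per second,
    with None standing in for NULL (backend not waiting on anything).
    Returns a dict mapping each event (or None) to its percentage of
    samples - an estimate of the fraction of time spent in that wait."""
    counts = Counter(samples)
    total = sum(counts.values())
    return {event: 100.0 * n / total for event, n in counts.items()}
```

The percentage attributed to, say, CLogControlLock then bounds the maximum benefit any contention-reduction patch for that lock could deliver.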
Re: [HACKERS] Why postgres take RowExclusiveLock on all partition
On Fri, Sep 16, 2016 at 09:56:39PM +0530, Sachin Kotwal wrote: > Hi Tom, > > What I understood from this https://www.postgresql.org/docs/9.5/static/ > explicit-locking.html#TABLE-LOCK-COMPATIBILITY > is: > > RowExclusiveLock conflicts with queries that want SHARE, SHARE ROW EXCLUSIVE, > EXCLUSIVE, and ACCESS EXCLUSIVE locks. > > In one of our customer environments we want to do some DDL operations every day > through cron jobs. These cron jobs get blocked by the RowExclusiveLock taken by an > UPDATE query. And then a lot more queries are waiting on this cron job, as the SQL > under the cron job holds ACCESS EXCLUSIVE on related tables involved in other select queries. > > > If we cannot reduce locking in the partition scenario, then it is fine. We can > consider this a limitation of PostgreSQL or any other RDBMS system.

We can't have DDL happening while a table is being accessed. I guess we could drop the lock once we are done with the partition, but we don't currently do that, and it would be complicated.

-- Bruce Momjian http://momjian.us EnterpriseDB http://enterprisedb.com + As you are, so once was I. As I am, so you will be. + + Ancient Roman grave inscription +
Re: [HACKERS] Speed up Clog Access by increasing CLOG buffers
On 09/21/2016 08:04 AM, Amit Kapila wrote: On Wed, Sep 21, 2016 at 3:48 AM, Tomas Vondra wrote: ... I'll repeat the test on the 4-socket machine with a newer kernel, but that's probably the last benchmark I'll do for this patch for now.

Attached are results from benchmarks running on kernel 4.5 (instead of the old 3.2.80). I've only done synchronous_commit=on, and I've added a few client counts (mostly at the lower end). I've pushed the data to the git repository. The summary looks like this (showing both the 3.2.80 and 4.5.5 results):

1) Dilip's workload

    3.2.80                16     32     64    128    192
    ----------------------------------------------------
    master             26138  37790  38492  13653   8337
    granular-locking   25661  38586  40692  14535   8311
    no-content-lock    25653  39059  41169  14370   8373
    group-update       26472  39170  42126  18923   8366

    4.5.5                  1      8     16     32     64    128    192
    ------------------------------------------------------------------
    granular-locking    4050  23048  27969  32076  34874  36555  37710
    no-content-lock     4025  23166  28430  33032  35214  37576  39191
    group-update        4002  23037  28008  32492  35161  36836  38850
    master              3968  22883  27437  32217  34823  36668  38073

2) pgbench

    3.2.80                16     32     64    128    192
    ----------------------------------------------------
    master             22904  36077  41295  35574   8297
    granular-locking   23323  36254  42446  43909   8959
    no-content-lock    23304  36670  42606  48440   8813
    group-update       23127  36696  41859  46693   8345

    4.5.5                  1      8     16     32     64    128    192
    ------------------------------------------------------------------
    granular-locking    3116  19235  27388  29150  31905  34105  36359
    no-content-lock     3206  19071  27492  29178  32009  34140  36321
    group-update        3195  19104  26888  29236  32140  33953  35901
    master              3136  18650  26249  28731  31515  33328  35243

The 4.5 kernel clearly changed the results significantly:

(a) Compared to the results from the 3.2.80 kernel, some numbers improved, some got worse. For example, on 3.2.80 pgbench did ~23k tps with 16 clients, on 4.5.5 it does 27k tps. With 64 clients the performance dropped from 41k tps to ~34k (on master).

(b) The drop above 64 clients is gone - on 3.2.80 it dropped very quickly to only ~8k with 192 clients. On 4.5 the tps actually continues to increase, and we get ~35k with 192 clients.
(c) Although it's not visible in the results, 4.5.5 almost perfectly eliminated the fluctuations in the results. For example, where 3.2.80 produced these results (10 runs with the same parameters): 12118 11610 27939 11771 18065 12152 14375 10983 13614 11077 we get this on 4.5.5: 37354 37650 37371 37190 37233 38498 37166 36862 37928 38509 Notice how much more even the 4.5.5 results are, compared to 3.2.80.

(d) There's no sign of any benefit from any of the patches (it was only helpful >= 128 clients, but that's where the tps actually dropped on 3.2.80 - apparently 4.5.5 fixes that and the benefit is gone).

It's a bit annoying that after upgrading from 3.2.80 to 4.5.5, the performance with 32 and 64 clients dropped quite noticeably (by more than 10%). I believe that might be a kernel regression, but perhaps it's a price for improved scalability for higher client counts. It of course begs the question what kernel version is running on the machine used by Dilip (i.e. cthulhu)? Although it's a Power machine, so I'm not sure how much the kernel matters on it. I'll ask someone else with access to this particular machine to repeat the tests, as I have a nagging suspicion that I've missed something important when compiling / running the benchmarks. I'll also retry the benchmarks on 3.2.80 to see if I get the same numbers.

Okay, but I think it is better to see the results between 64~128 client count and may be greater than 128 client counts, because it is clear that patch won't improve performance below that. There are results for 64, 128 and 192 clients. Why should we care about numbers in between? How likely (and useful) would it be to get improvement with 96 clients, but no improvement for 64 or 128 clients? >> I agree with Robert that the cases the patch is supposed to improve are a bit contrived because of the very high client counts.
No issues, I have already explained why I think it is important to reduce the remaining CLOGControlLock contention in yesterday's and this mail. If none of you is convinced, then I think we have no choice but to drop this patch. I agree it's useful to reduce lock contention in general, but considering the last set of
Re: [HACKERS] pg_upgrade vs user created range type extension
On 09/22/2016 07:33 PM, Tom Lane wrote: Andrew Dunstan writes: I have just encountered an apparent bug in pg_upgrade (or possibly pg_dump). Hmm, it sort of looks like pg_dump believes it should dump the range's constructor function in binary-upgrade mode, while the backend is creating the constructor function during CREATE TYPE anyway. But if that's the case, upgrade of user-defined range types would never have worked ... seems like we should have noticed before now. If that diagnosis is correct, we should either change pg_dump to not dump that function, or change CREATE TYPE AS RANGE to not auto-create the constructor functions in binary-upgrade mode. The latter might be more flexible in the long run. Yeah, I think your diagnosis is correct. I'm not sure I see the point of the flexibility given that you can't specify a constructor function for range types (if that feature had been available I would probably have used it in this extension). cheers andrew
Re: [HACKERS] pg_upgrade vs user created range type extension
Andrew Dunstan writes: > I have just encountered an apparent bug in pg_upgrade (or possibly pg_dump). Hmm, it sort of looks like pg_dump believes it should dump the range's constructor function in binary-upgrade mode, while the backend is creating the constructor function during CREATE TYPE anyway. But if that's the case, upgrade of user-defined range types would never have worked ... seems like we should have noticed before now. If that diagnosis is correct, we should either change pg_dump to not dump that function, or change CREATE TYPE AS RANGE to not auto-create the constructor functions in binary-upgrade mode. The latter might be more flexible in the long run. regards, tom lane
Re: [HACKERS] Tracking wait event for latches
On Fri, Sep 23, 2016 at 1:49 AM, Robert Haas wrote: > On Wed, Sep 21, 2016 at 5:49 PM, Thomas Munro > wrote: >>> Moreover, it's pretty confusing that we have this general concept of >>> wait events in pg_stat_activity, and then here the specific type of >>> wait event we're waiting for is the ... wait event kind. Uh, what? >> >> Yeah, that is confusing. It comes about because of the coincidence >> that pg_stat_activity finished up with a wait_event column, and our IO >> multiplexing abstraction finished up with the name WaitEventSet. >> We could instead call these new things "wait >> points", because, well, erm, they name points in the code at which we >> wait. They don't name events (they just happen to use the >> WaitEventSet mechanism, which itself does involve events: but those >> events are things like "hey, this socket is now ready for >> writing"). > > Sure, we could do that, but it means breaking backward compatibility > for pg_stat_activity *again*. I'd be willing to do it but I don't > think I'd win any popularity contests. I didn't mean changing the column headings in pg_stat_activity. I meant that the enum called WaitEventIdentifier in Michael's v5 patch should be called WaitPointIdentifier, and if we go with a single name to appear in the wait_event_type column then it could be "WaitPoint". But I would also prefer to see something more informative in that column, as discussed below (and upthread). >> Well we already discussed a couple of different ways to get "Socket", >> "Latch", "Socket|Latch", ... or something like that into the >> wait_event_type column or new columns. Wouldn't that be better, and >> automatically fall out of the code rather than needing human curated >> categories? Although Michael suggested that that should be done as a >> separate patch on top of the basic feature. > > I think making that a separate patch is just punting the decision down > the field to a day that may never come. 
Let's try to agree on > something that we can all live with and implement it now. In terms of > avoiding human-curated categories, I basically see two options: > > 1. Find a name that is sufficiently generic that it covers everything > that might reach WaitEventSetWait either now or in the future when it > might wait for even more kinds of things than it does now. For > example, we could call it "Stuff" or "Thing". Less tongue-in-cheek > suggestions are welcome, but it's hard to come up with something that > is sufficiently-generic without being tautological. "Event" is an > example of a name which is general enough to encompass everything but > also stupid: the column is called "wait_event" so everything that > appears in it is an event by definition. > > 2. Classify the events that fall into this category by some rigid > principle based on the types of things being awaited. For example, we > could decide that if any sockets are awaited, the event class will be > "Client" if it is connected to a user and "IPC" for auxiliary > processes. > > [...] > occur. Likewise, it was pretty easy to find all of the events that > were waiting for client I/O, and I grouped those all under "Client". > A few of the remaining wait events seemed like they were clearly > waiting for a particular timeout to expire, so I gave those their own > "Timeout" category. Interesting. OK, I agree that it'd be useful to show that we're waiting because there's nothing happening, or waiting because the user asked us to sleep, or waiting on IO, or waiting for an IPC response because something is happening, and that higher level information is difficult/impossible to extract automatically from the WaitEventSet. I understand that "Activity" is the category of wait points that are waiting for activity, but I wonder if it might be clearer to users if that were called "Idle", because it's the category of idle waits caused by non-activity. 
Why is WalSenderMain not in that category alongside WalReceiverMain? Hmm, actually it's kind of a tricky one: whether it's really idle or waiting for IO depends. It's always ready to wait for clients to send messages, but I'd say that's part of its "idle" behaviour. But it's sometimes waiting for the socket to be writable: if (pq_is_send_pending()) wakeEvents |= WL_SOCKET_WRITABLE, and that's when it's definitely not idle, it's actively trying to feed WAL down the pipe. Do we want to get into dynamic categories depending on conditions like that? I was thinking about suggesting a category "Replication" to cover the waits for client IO relating to replication, as opposed to client IO waits relating to regular user connections. Then you could put sync rep into that category instead of IPC, even though technically it is waiting for IPC from walsender process(es), on the basis that it's more newsworthy to a DBA that it's really waiting for a remote replica to respond. But it's probably pretty clear what's going on from the the wait point names, so maybe it's not worth a categ
[HACKERS] pg_upgrade vs user created range type extension
I have just encountered an apparent bug in pg_upgrade (or possibly pg_dump). To recreate, install the cranges extension - which can be obtained via "git clone https://bitbucket.org/adunstan/pg-closed-ranges.git"; - for both 9.4 and 9.5. Create a fresh 9.4 instance, create a database and in it run "create extension cranges schema pg_catalog". Then create a fresh 9.5 instance and try to pg_upgrade from the 9.4 instance to the 9.5 instance. Here's the tail of the log: pg_restore: creating SCHEMA "public" pg_restore: creating COMMENT "SCHEMA "public"" pg_restore: creating EXTENSION "cranges" pg_restore: creating COMMENT "EXTENSION "cranges"" pg_restore: creating SHELL TYPE "pg_catalog.cdaterange" pg_restore: creating FUNCTION "pg_catalog.cdaterange_canonical("cdaterange")" pg_restore: creating TYPE "pg_catalog.cdaterange" pg_restore: creating SHELL TYPE "pg_catalog.cint4range" pg_restore: creating FUNCTION "pg_catalog.cint4range_canonical("cint4range")" pg_restore: creating TYPE "pg_catalog.cint4range" pg_restore: creating SHELL TYPE "pg_catalog.cint8range" pg_restore: creating FUNCTION "pg_catalog.cint8range_canonical("cint8range")" pg_restore: creating TYPE "pg_catalog.cint8range" pg_restore: creating FUNCTION "pg_catalog.cdaterange("date", "date")" pg_restore: [archiver (db)] Error while PROCESSING TOC: pg_restore: [archiver (db)] Error from TOC entry 191; 1255 16389 FUNCTION cdaterange("date", "date") andrew pg_restore: [archiver (db)] could not execute query: ERROR: function "cdaterange" already exists with same argument types Command was: CREATE FUNCTION "cdaterange"("date", "date") RETURNS "cdaterange" LANGUAGE "internal" IMMUTABLE AS $$range_construct... The end result is that I currently can't upgrade a database using this extension, which is rather ugly. Similar things happen if I put the extension in public instead of pg_catalog. 
cheers andrew
Re: [HACKERS] Parallel tuplesort (for parallel B-Tree index creation)
On Thu, Sep 22, 2016 at 3:51 PM, Heikki Linnakangas wrote: > It'd be good if you could overlap the final merges in the workers with the > merge in the leader. ISTM it would be quite straightforward to replace the > final tape of each worker with a shared memory queue, so that the leader > could start merging and returning tuples as soon as it gets the first tuple > from each worker. Instead of having to wait for all the workers to complete > first. If you do that, make sure to have the leader read multiple tuples at a time from each worker whenever possible. It makes a huge difference to performance. See bc7fcab5e36b9597857fa7e3fa6d9ba54aaea167. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
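The combination of the two suggestions - merge on the fly as soon as each worker has produced a tuple, but refill from each worker in batches rather than one tuple at a time - amounts to a K-way merge over per-worker buffers. A toy sketch in plain Python (nothing to do with the actual tuplesort code; sorted iterators stand in for the workers' shared memory queues):

```python
import heapq
from itertools import islice

def batched_merge(streams, batch_size=64):
    """K-way merge over per-worker sorted streams, pulling batch_size
    items at a time from each stream instead of one-by-one, so that
    per-fetch overhead (queue round-trips, in the real case) is
    amortized across many tuples."""
    # Prime one buffer per worker.
    bufs = [list(islice(s, batch_size)) for s in streams]
    idxs = [0] * len(streams)
    heap = [(buf[0], w) for w, buf in enumerate(bufs) if buf]
    heapq.heapify(heap)
    while heap:
        val, w = heapq.heappop(heap)
        yield val
        idxs[w] += 1
        if idxs[w] == len(bufs[w]):
            # Buffer exhausted: refill a whole batch from this worker.
            bufs[w] = list(islice(streams[w], batch_size))
            idxs[w] = 0
        if bufs[w]:
            heapq.heappush(heap, (bufs[w][idxs[w]], w))
```

The leader can start yielding merged output as soon as every non-empty stream has delivered its first batch, without waiting for any worker to finish its run.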
Re: [HACKERS] Parallel tuplesort (for parallel B-Tree index creation)
On 08/02/2016 01:18 AM, Peter Geoghegan wrote: No merging in parallel -- Currently, merging worker *output* runs may only occur in the leader process. In other words, we always keep n worker processes busy with scanning-and-sorting (and maybe some merging), but then all processes but the leader process grind to a halt (note that the leader process can participate as a scan-and-sort tuplesort worker, just as it will everywhere else, which is why I specified "parallel_workers = 7" but talked about 8 workers). One leader process is kept busy with merging these n output runs on the fly, so things will bottleneck on that, which you saw in the example above. As already described, workers will sometimes merge in parallel, but only their own runs -- never another worker's runs. I did attempt to address the leader merge bottleneck by implementing cross-worker run merging in workers. I got as far as implementing a very rough version of this, but initial results were disappointing, and so that was not pursued further than the experimentation stage. Parallel merging is a possible future improvement that could be added to what I've come up with, but I don't think that it will move the needle in a really noticeable way. It'd be good if you could overlap the final merges in the workers with the merge in the leader. ISTM it would be quite straightforward to replace the final tape of each worker with a shared memory queue, so that the leader could start merging and returning tuples as soon as it gets the first tuple from each worker. Instead of having to wait for all the workers to complete first. - Heikki
Re: [HACKERS] Re: [HACKERS] Re: [HACKERS] Re: [HACKERS] Windows service is not starting so there’s message in log: FATAL: "could not create shared memory segment “Global/PostgreSQL.851401618”: Permiss
Robert Haas writes: > On Tue, Sep 20, 2016 at 12:53 PM, Tom Lane wrote: >> I'd be the first to agree that this point is inadequately documented >> in the code, but PostmasterRandom should be reserved for its existing >> security-related uses, not exposed to the world for (ahem) random other >> uses. > So, we could have dsm_postmaster_startup() seed the random number > generator itself, and then let PostmasterRandom() override the seed > later. Like maybe: Works for me, at least as a temporary solution. The disturbing thing here is that this still only does what we want if dsm_postmaster_startup happens before the first PostmasterRandom call --- which is OK today but seems pretty fragile. Still, redesigning PostmasterRandom's seeding technique is not something to do right before 9.6 release. Let's revert the prior patch and do it as you have here: > struct timeval tv; > gettimeofday(&tv, NULL); > srandom(tv.tv_sec); > ... > dsm_control_handle = random(); for the time being. regards, tom lane
Re: [HACKERS] Possibly too stringent Assert() in b-tree code
Robert Haas writes: > On Mon, Sep 19, 2016 at 7:07 AM, Amit Kapila wrote: >> I think you have a valid point. It seems we don't need to write WAL >> for reuse page (aka don't call _bt_log_reuse_page()), if the page is >> new, as the only purpose of that log is to handle conflict based on >> transaction id stored in special area which will be anyway zero. > +1. This is clearly an oversight in Simon's patch fafa374f2, which introduced this code without any consideration for the possibility that the page doesn't have a valid special area. We could prevent the crash by doing nothing if PageIsNew, a la

    if (_bt_page_recyclable(page))
    {
        /*
         * If we are generating WAL for Hot Standby then create a
         * WAL record that will allow us to conflict with queries
         * running on standby.
         */
-       if (XLogStandbyInfoActive() && RelationNeedsWAL(rel))
+       if (XLogStandbyInfoActive() && RelationNeedsWAL(rel) &&
+           !PageIsNew(page))
        {
            BTPageOpaque opaque = (BTPageOpaque) PageGetSpecialPointer(page);

            _bt_log_reuse_page(rel, blkno, opaque->btpo.xact);
        }

        /* Okay to use page. Re-initialize and return it */

but I'm not very clear on whether this is a safe fix, mainly because I don't understand what the reuse WAL record really accomplishes. Maybe we need to instead generate a reuse record with some special transaction ID indicating worst-case assumptions? regards, tom lane
Re: [HACKERS] Use of SizeOfIptrData - is that obsolete?
Pavan Deolasee writes: > On Tue, Sep 20, 2016 at 8:34 PM, Tom Lane wrote: >> Realistically, because struct HeapTupleHeaderData contains a field of >> type ItemPointerData, it's probably silly to imagine that we can save >> anything if the compiler can't be persuaded to believe that >> sizeof(ItemPointerData) is 6. It may well be that the structure pragmas >> work on everything that wouldn't natively believe that anyway. > Yeah, that's what I thought and rest of the code seems to make that > assumption as well. Attached patch removes the last remaining reference to > SizeOfIptrData and also removes the macro and the associated comment. I thought removing the comment altogether was not appropriate, because it remains true that you want to work really hard to ensure that sizeof(ItemPointerData) is 6. We're just giving up on pretense of support for compilers that don't believe that. I'm half tempted to introduce a StaticAssert about it, but refrained for the moment. > While htup.h refactoring happened in 9.5, I don't see any point in back > patching this. Agreed. Pushed to HEAD only. regards, tom lane
Re: [HACKERS] gratuitous casting away const
> On Sep 22, 2016, at 9:14 AM, Tom Lane wrote: > > I'd call this kind of a wash, I guess. I'd be more excited about it if > the change allowed removal of an actual cast-away-of-constness somewhere. > > I suppose it's a bit of a chicken and egg situation, in that the lack > of const markings on leaf subroutines discourages use of "const" in > callers, and you have to start somewhere if you want to make it better. > But I don't really want to just plaster "const" onto individual functions > without some larger vision of where we're going and which code is going > to benefit. Otherwise it seems like mostly just code churn. > > regards, tom lane I have two purposes in doing this. First, I find the code more self-documenting this way. Second, I can get whole directories to compile cleanly without warnings using the -Wcast-qual flag, where currently that flag results in warnings. That makes it possible to add -Wcast-qual to more individual source directories' Makefiles than I can currently do while still using -Werror in Makefile.global. Now, I'm not proposing that everybody else needs to have -Wcast-qual. I'm just saying that I'd like to be able to have that in my copy of the project. mark
Re: [HACKERS] Exclude schema during pg_restore
Hi, On Tue, Sep 20, 2016 at 08:59:37PM -0400, Peter Eisentraut wrote: > On 9/19/16 3:23 PM, Michael Banck wrote: > > Version 2 attached. > > Committed, thanks. Thanks! > I added the new option to the help output in pg_restore. Oh, sorry I missed that. Michael -- Michael Banck Projektleiter / Senior Berater Tel.: +49 2166 9901-171 Fax: +49 2166 9901-100 Email: michael.ba...@credativ.de credativ GmbH, HRB Mönchengladbach 12080 USt-ID-Nummer: DE204566209 Trompeterallee 108, 41189 Mönchengladbach Geschäftsführung: Dr. Michael Meskes, Jörg Folz, Sascha Heuer
Re: [HACKERS] Better tracking of free space during SP-GiST index build
Tomas Vondra writes: >> On 08/25/2016 01:45 AM, Tom Lane wrote: >>> I'll put this in the commitfest queue. It could use review from >>> someone with the time and motivation to do performance >>> testing/tuning. > I've been toying with this patch a bit today, particularly looking at > (1) sizing of the cache, and (2) how beneficial it'd be to choose pages > from the cache in a smarter way. Thanks for reviewing! > I wonder why the patch only sets the cache size to 100 items, when we > might fit many more entries into the ~8kB limit. I chose that because it would still work with the minimum allowed page size of 1K. We could make the cache size variable depending on BLCKSZ, but I'm not sure it's worth the trouble. There is some cost to a larger lastUsedPages array, in that you spend more time memcpy'ing it back and forth between the metapage buffer and backend local memory; and I was afraid of that outweighing the incremental gains from a larger cache. > ... I've tried increasing the cache size to 768 > entries, with vast majority of them (~600) allocated to leaf pages. > Sadly, this seems to only increase the CREATE INDEX duration a bit, > without making the index significantly smaller (still ~120MB). Yeah, that's in line with my results: not much further gain from a larger cache. Though if you were testing with the same IRRExplorer data, it's not surprising that our results would match. Would be good to try some other cases... > I do think selecting the page with the least free space would save > additional space. Instead of making SpGistGetBuffer() more complicated, > I've simply shoved a pg_qsort() calls on a few places, sorting the cache > by free space, with unused slots at the end. Clearly, this is quite > expensive and proper patch would have to do that differently (e.g. 
> maintaining the order incrementally), but OTOH it'd allow some > optimizations in SpGistGetBuffer() - the first page with enough free > space would have the smallest amount of free space (more or less). > This actually helped a bit, and the index size dropped by ~2MB. So not > bad, but nowhere close to the initial 132MB -> 120MB improvement. OK, I'll think about how to do that more efficiently. The smaller incremental improvement isn't surprising, because in this example the index would still be 90-something MB if it had no free space at all, so there's going to be decreasing returns from any additional work to avoid wasted free space. But if we can do it cheaply, this does suggest that using pages in order by free space is of value. > It's worth mentioning the spgist fillfactor is a bit crude, most likely > thanks to splits - e.g. the 109MB index still contains ~10MB of free > space on the pages (measures using pageinspect as upper-lower), so > almost 10%. Perhaps it really is time to increase the spgist default > fillfactor? Maybe; I don't think anyone's put much effort into tuning that. > It seems the patch keeps new/empty/deleted pages in the cache, and thus > segregated by type. Is that intentional, or should > SpGistSetLastUsedPage() remove such pages from the cache? Or maybe move > them into a special category? It's true we'll reuse those pages, as > allocNewBuffer() allocates the buffer directly, but those pages are > unlikely to get evicted from the cache due to high freeSpace value > (despite possibly already reused). My thought was that once we've found a new/empty/deleted page, putting it in the cache is a cheap way of making sure it gets used soon. Sure, we could record it in the FSM instead, but that sounds relatively more expensive. That intuition could be wrong of course. > Similarly for completely full pages (with freeSpace==0) - does it make > sense to keep them in the cache? 
Although it's probably harmless, as > those pages will get evicted first if needed. IIRC, we don't have anything to replace the cache entry with at that point, so there's no particular value in marking the entry unused rather than used-but-with-zero-space. It will be first priority for replacement either way, so there seemed little point in writing more code to make it unused. > One thing I'd change is making the SpGistLUPCache dynamic, i.e. storing > the size and lastUsedPagesMap on the meta page. That should allow us > resizing the cache and tweak lastUsedPagesMap in the future. Yeah, probably a good idea. I had thought of bumping SPGIST_MAGIC_NUMBER again if we want to revisit the cache size; but keeping it as a separate field won't add noticeable cost, and it might save some trouble. regards, tom lane
Re: [HACKERS] Re: [HACKERS] Re: [HACKERS] Re: [HACKERS] Re: [HACKERS] Re: [HACKERS] Windows service is not starting so there’s message in log: FATAL: "could not create shared memory segment “Global/Po
Amit Kapila writes: > On Tue, Sep 20, 2016 at 10:33 PM, Tom Lane wrote: >> ISTM both the previous coding and this version can fail for no good >> reason, that is what if GetLastError() happens to return one of these >> error codes as a result of some unrelated failure from before this >> subroutine is entered? That is, wouldn't it be a good idea to >> do SetLastError(0) before calling CreateFileMapping? > Yes, that seems like a good idea. Do you need a patch with some > testing on windows environment? Please; I can't test it. regards, tom lane
Re: [HACKERS] Hash Indexes
On Wed, Sep 21, 2016 at 08:44:15PM +0100, Geoff Winkless wrote: > On 21 September 2016 at 13:29, Robert Haas wrote: > > I'd be curious what benefits people expect to get. > > An edge case I came across the other day was a unique index on a large > string: postgresql popped up and told me that I couldn't insert a > value into the field because the BTREE-index-based constraint wouldn't > support the size of string, and that I should use a HASH index > instead. Which, of course, I can't, because it's fairly clearly > deprecated in the documentation... Thanks for that. Forgot about that bit of nastiness. I came across the above migrating a MySQL app to PostgreSQL. MySQL, I believe, handles this by silently truncating the string on index. PostgreSQL by telling you it can't index. :( So, as a result, AFAIK, I had a choice between a trigger that did a left() on the string and inserts it into a new column on the table that I can then index, or do an index on left(). Either way you wind up re-writing a whole bunch of queries. If I wanted to avoid the re-writes I had the option of making the DB susceptible to poor recovery from crashes, et al. No matter which option I chose, the end result was going to be ugly. It would be good not to have to go ugly in such situations. Sometimes one size does not fit all. For me this would be a second major case where I'd use usable hashed indexes the moment they showed up. Andrew
Re: [HACKERS] Showing parallel status in \df+
Rushabh Lathia writes: > I agree with the argument in this thread, having "Source code" as part > of \df+ is a bit annoying, specifically when output involves some really > big PL language functions. Having it separate does make \df+ output more > readable. So I would vote for \df++ rather than adding the source code > as part of footer for \df+. If it's unreadable in \df+, how would \df++ make that any better? regards, tom lane
Re: [HACKERS] 9.6 TAP tests and extensions
Craig Ringer writes: > On 13 September 2016 at 22:02, Tom Lane wrote: >> Without taking a position on the merits of this patch per se, I'd like >> to say that I find the argument for back-patching into 9.6 and not >> further than that to be pretty dubious. $(prove_check) has been there >> since 9.4, and in the past we've often regretted it when we failed >> to back-patch TAP infrastructure fixes all the way back to 9.4. > No objection to backpatching, I just thought it'd be more intrusive to > do that than just 9.6. > Since 9.5 and older have more limited versions of PostgresNode which > lack safe_psql, etc, I'm not sure it's very practical for extensions > to bother running TAP tests on 9.4 and 9.5 anyway. Certainly there are restrictions, but I'd imagine that every new release will be adding features to the TAP test infrastructure for some time to come. I think it's silly to claim that 9.6 is the first branch where TAP testing is usable at all. > Extension authors can just use: > ifeq ($(MAJORVERSION),9.6) > endif > when defining their prove rules. That will break as soon as 10 comes out. And numerical >= tests aren't all that convenient in Make. It'd be much better if a test on whether $(prove_check) is defined would be sufficient. regards, tom lane
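Testing for the infrastructure rather than the version, as Tom suggests, might look like the following in an extension Makefile. This is only a sketch under the assumption that `$(prove_check)` is defined by the included pgxs machinery when the server branch provides TAP support; the `check-tap` target name is made up for illustration.

```
# Guard TAP rules on $(prove_check) being defined rather than on
# MAJORVERSION, so the rule keeps working when version 10 arrives.
ifdef prove_check
check-tap:
	$(prove_check)
else
check-tap:
	@echo "TAP infrastructure not available; skipping"
endif
```

Compare this with the quoted `ifeq ($(MAJORVERSION),9.6)` approach, which has to be updated for every new release.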
Re: [HACKERS] gratuitous casting away const
Mark Dilger writes: >> On Sep 20, 2016, at 1:06 PM, Tom Lane wrote: >> ... that seems to be discarding type information in order to add >> "const"; does not seem like a net benefit from here. > The following seems somewhere in between, with ItemPointer > changing to const ItemPointerData *. I expect you would not care > for this change, but thought I'd check to see where you draw the line: I'd call this kind of a wash, I guess. I'd be more excited about it if the change allowed removal of an actual cast-away-of-constness somewhere. I suppose it's a bit of a chicken and egg situation, in that the lack of const markings on leaf subroutines discourages use of "const" in callers, and you have to start somewhere if you want to make it better. But I don't really want to just plaster "const" onto individual functions without some larger vision of where we're going and which code is going to benefit. Otherwise it seems like mostly just code churn. regards, tom lane
Re: [HACKERS] ICU integration
Alvaro Herrera writes: > Petr Jelinek wrote: >>> I'm not sure how well it will work to replace all the bits of LIKE and >>> regular expressions with ICU API calls. One problem is that ICU likes >>> to do case folding as a whole string, not by character. I need to do >>> more research about that. >> Can't we use the regular expression api provided by ICU, or is that too >> incompatible? > That probably won't fly very well -- it would be rather absurd to have > different regex behavior when ICU is compiled compared to when it's > not. Perhaps down the road we could have an extension that provides > ICU-based regex operators, but most likely not in core. The regex stuff does not handle case-insensitivity by case-folding the data string, anyway. Rather, it does that by converting 'a' or 'A' in the pattern to the equivalent of '[aA]'. So performance for that step is not terribly critical. regards, tom lane
Re: [HACKERS] [PATCH] pgpassfile connection option
I haven't really thought about this, as I had been asked to make this work as an additional option to the connection parameters... Now that I've looked at it - there is really only the benefit of saving the step of setting the PGPASSFILE environment variable. However, there might be cases in which setting an environment variable might not be the easiest option. On 22.09.2016 at 17:15, Andrew Dunstan wrote: I'm not necessarily opposed to this, but what is the advantage over the existing PGPASSFILE environment setting mechanism?
Re: [HACKERS] ICU integration
Petr Jelinek wrote: > > I'm not sure how well it will work to replace all the bits of LIKE and > > regular expressions with ICU API calls. One problem is that ICU likes > > to do case folding as a whole string, not by character. I need to do > > more research about that. > > Can't we use the regular expression api provided by ICU, or is that too > incompatible? That probably won't fly very well -- it would be rather absurd to have different regex behavior when ICU is compiled compared to when it's not. Perhaps down the road we could have an extension that provides ICU-based regex operators, but most likely not in core. -- Álvaro Herrera  https://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Re: [HACKERS] [PATCH] pgpassfile connection option
On 09/22/2016 10:44 AM, Julian Markwort wrote: Hello psql-hackers! We thought it would be advantageous to be able to specify a 'custom' pgpassfile within the connection string along the lines of the existing parameters sslkey and sslcert. Which is exactly what this very compact patch does. The patch is minimally invasive - when no pgpassfile attribute is provided in the connection string, the regular pgpassfile is used. The security-measures (which are limited to checking the permissions for 0600) are kept, however we could loosen that restriction to allow group access as well along the lines of the ssl key file, if this is preferred. (in case multiple users belonging to the same group would like to connect using the same file). The patch applies cleanly to master and compiles and runs as expected (as there are no critical alterations). I've not written any documentation as of now, but I'll follow up closely if there is any interest for this patch. notes: - using ~ to denote the user's home directory in the path does not work, however $HOME works (as this is translated by bash beforehand). - the notation in the custom pgpassfile should follow the notation of the 'default' pgpass files: hostname:port:database:username:password - this has only been tested on linux so far, however due to the nature of the changes I suspect that there is nothing that could go wrong in other environments, although I could test that as well, if deemed necessary. I'm not necessarily opposed to this, but what is the advantage over the existing PGPASSFILE environment setting mechanism? cheers andrew
Re: [HACKERS] pg_ctl promote wait
On Thu, Sep 22, 2016 at 1:25 AM, Peter Eisentraut wrote: > On 8/11/16 9:28 AM, Michael Paquier wrote: >> I have looked at them and the changes are looking fine for me. So I >> have switched the patch as ready for committer, aka you. >> >> Just a nit: >> + if (wait_seconds > 0) >> + { >> + sleep(1); >> + wait_seconds--; >> + continue; >> + } >> This may be better with pg_usleep() instead of sleep(). > > Committed with that adjustment. > Commit e7010ce seems to have forgotten to change the help message of pg_ctl. Patch attached. Regards, -- Masahiko Sawada NIPPON TELEGRAPH AND TELEPHONE CORPORATION NTT Open Source Software Center [Attachment: fix_pg_ctl_help.patch]
[HACKERS] [PATCH] pgpassfile connection option
Hello psql-hackers! We thought it would be advantageous to be able to specify a 'custom' pgpassfile within the connection string along the lines of the existing parameters sslkey and sslcert. Which is exactly what this very compact patch does. The patch is minimally invasive - when no pgpassfile attribute is provided in the connection string, the regular pgpassfile is used. The security-measures (which are limited to checking the permissions for 0600) are kept, however we could loosen that restriction to allow group access as well along the lines of the ssl key file, if this is preferred. (in case multiple users belonging to the same group would like to connect using the same file). The patch applies cleanly to master and compiles and runs as expected (as there are no critical alterations). I've not written any documentation as of now, but I'll follow up closely if there is any interest for this patch. notes: - using ~ to denote the user's home directory in the path does not work, however $HOME works (as this is translated by bash beforehand). - the notation in the custom pgpassfile should follow the notation of the 'default' pgpass files: hostname:port:database:username:password - this has only been tested on linux so far, however due to the nature of the changes I suspect that there is nothing that could go wrong in other environments, although I could test that as well, if deemed necessary. 
I'm looking forward to any feedback, Julian -- Julian Markwort Westphalian Wilhelms-University in Münster julian.markw...@uni-muenster.de diff --git a/src/interfaces/libpq/fe-connect.c b/src/interfaces/libpq/fe-connect.c index 9b2839b..5c0d88a 100644 --- a/src/interfaces/libpq/fe-connect.c +++ b/src/interfaces/libpq/fe-connect.c @@ -184,6 +184,10 @@ static const internalPQconninfoOption PQconninfoOptions[] = { "Database-Password", "*", 20, offsetof(struct pg_conn, pgpass)}, + {"pgpassfile", "PGPASSFILE", NULL, NULL, + "Database-Password-File", "", 64, + offsetof(struct pg_conn, pgpassfile)}, + {"connect_timeout", "PGCONNECT_TIMEOUT", NULL, NULL, "Connect-timeout", "", 10, /* strlen(INT32_MAX) == 10 */ offsetof(struct pg_conn, connect_timeout)}, @@ -375,7 +379,7 @@ static int parseServiceFile(const char *serviceFile, bool *group_found); static char *pwdfMatchesString(char *buf, char *token); static char *PasswordFromFile(char *hostname, char *port, char *dbname, - char *username); + char *username, char *pgpassfile); static bool getPgPassFilename(char *pgpassfile); static void dot_pg_pass_warning(PGconn *conn); static void default_threadlock(int acquire); @@ -806,8 +810,9 @@ connectOptions2(PGconn *conn) { if (conn->pgpass) free(conn->pgpass); + /* We'll pass conn->pgpassfile regardless of it's contents - checks happen in PasswordFromFile() */ conn->pgpass = PasswordFromFile(conn->pghost, conn->pgport, - conn->dbName, conn->pguser); + conn->dbName, conn->pguser, conn->pgpassfile); if (conn->pgpass == NULL) { conn->pgpass = strdup(DefaultPassword); @@ -5699,14 +5704,15 @@ pwdfMatchesString(char *buf, char *token) return NULL; } -/* Get a password from the password file. Return value is malloc'd. */ +/* Get a password from the password file or the user-specified pgpassfile. Return value is malloc'd. 
*/ static char * -PasswordFromFile(char *hostname, char *port, char *dbname, char *username) +PasswordFromFile(char *hostname, char *port, char *dbname, char *username, char *pgpassfile) { FILE *fp; - char pgpassfile[MAXPGPATH]; + char temp_path[MAXPGPATH]; struct stat stat_buf; + #define LINELEN NAMEDATALEN*5 char buf[LINELEN]; @@ -5731,8 +5737,13 @@ PasswordFromFile(char *hostname, char *port, char *dbname, char *username) if (port == NULL) port = DEF_PGPORT_STR; - if (!getPgPassFilename(pgpassfile)) - return NULL; + if(!pgpassfile || pgpassfile[0]=='\0') + { + /* If no pgpassfile was supplied by the user or it is empty, we try to get a global pgpassfile */ + if (!getPgPassFilename(temp_path)) + return NULL; + pgpassfile = temp_path; + } /* If password file cannot be opened, ignore it. */ if (stat(pgpassfile, &stat_buf) != 0) diff --git a/src/interfaces/libpq/libpq-int.h b/src/interfaces/libpq/libpq-int.h index 1183323..ae9317f 100644 --- a/src/interfaces/libpq/libpq-int.h +++ b/src/interfaces/libpq/libpq-int.h @@ -317,6 +317,7 @@ struct pg_conn char *replication; /* connect as the replication standby? */ char *pguser; /* Postgres username and password, if any */ char *pgpass; + char *pgpassfile; /* path to a file containing the password */ char *keepalives; /* use TCP keepalives? */ char *keepalives_idle; /* time between TCP keepalives */ char *keepalives_interval; /* time between TCP keepalive -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mai
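With the proposed option, a client could point each connection at its own password file directly in the connection string. The host, database, and file paths below are purely hypothetical examples; the `pgpassfile` keyword is the one added by the patch above, and the file itself keeps the usual `hostname:port:database:username:password` line format.

```
host=db.example.com port=5432 dbname=app user=alice pgpassfile=/home/alice/.pgpass_app
```

When `pgpassfile` is absent or empty, the patched PasswordFromFile() falls back to the standard lookup (PGPASSFILE or ~/.pgpass), so existing connection strings are unaffected.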
Re: [HACKERS] Executor's internal ParamExecData Params versus EvalPlanQual
I wrote: > The reason this happens is that EvalPlanQualBegin copies down all the > es_param_exec_vals values from the outer query into the EPQ execution > state. That's OK for most uses of es_param_exec_vals values, but not > at all for cteParam Params, which are used as internal communication > variables within a plan tree. After further study and caffeine-consumption, I concluded that that's wrong. There's nothing wrong with reusing the result of a CTE that has already been scanned by the original query invocation (its output can't be affected by the new candidate-for-update row). The problem is that ExecInitCteScan is assuming that when it initializes as a follower, the tuplestore read pointer it's given by tuplestore_alloc_read_pointer will be pointing to the start of the tuplestore. That's wrong; the new read pointer is defined as being a clone of read pointer 0, which is already at EOF in this scenario. So a simple and localized fix is to forcibly rewind the new read pointer, as in the attached patch. (This should have negligible cost as long as the tuplestore hasn't yet spilled to disk.) I also considered redefining tuplestore_alloc_read_pointer as creating a read pointer that points to start-of-tuplestore. The other existing callers wouldn't care, because they are working with a just-created, known-empty tuplestore. But there is a risk of breaking third-party code, so this didn't seem like a back-patchable solution. Also, I think the reason why tuplestore_alloc_read_pointer was given this behavior is so that it could succeed even if the tuplestore has already been moved forward and perhaps had old data truncated off it. With the other behavior, it would have to have the same failure cases as tuplestore_rescan. BTW, I no longer think the recursive-union case is broken; rather, failure to copy its communication Param would break it, in scenarios where a WorkTableScan is underneath a SELECT FOR UPDATE that's underneath the RecursiveUnion. 
So that's another reason not to mess with the Param propagation logic.

			regards, tom lane

diff --git a/src/backend/executor/nodeCtescan.c b/src/backend/executor/nodeCtescan.c
index 3c2f684..162650a 100644
*** a/src/backend/executor/nodeCtescan.c
--- b/src/backend/executor/nodeCtescan.c
*** ExecInitCteScan(CteScan *node, EState *e
*** 224,232 ****
--- 224,236 ----
  	{
  		/* Not the leader */
  		Assert(IsA(scanstate->leader, CteScanState));
+ 		/* Create my own read pointer, and ensure it is at start */
  		scanstate->readptr =
  			tuplestore_alloc_read_pointer(scanstate->leader->cte_table,
  										  scanstate->eflags);
+ 		tuplestore_select_read_pointer(scanstate->leader->cte_table,
+ 									   scanstate->readptr);
+ 		tuplestore_rescan(scanstate->leader->cte_table);
  	}

  /*
diff --git a/src/test/isolation/expected/eval-plan-qual.out b/src/test/isolation/expected/eval-plan-qual.out
index 5898d94..10c784a 100644
*** a/src/test/isolation/expected/eval-plan-qual.out
--- b/src/test/isolation/expected/eval-plan-qual.out
*** ta_id ta_value tb_row
*** 163,165 ****
--- 163,186 ----
  1 newTableAValue (1,tableBValue)
  step c2: COMMIT;
+ 
+ starting permutation: wrtwcte readwcte c1 c2
+ step wrtwcte: UPDATE table_a SET value = 'tableAValue2' WHERE id = 1;
+ step readwcte: 
+ 	WITH
+ 	    cte1 AS (
+ 	      SELECT id FROM table_b WHERE value = 'tableBValue'
+ 	    ),
+ 	    cte2 AS (
+ 	      SELECT * FROM table_a
+ 	      WHERE id = (SELECT id FROM cte1)
+ 	      FOR UPDATE
+ 	    )
+ 	SELECT * FROM cte2;
+ 
+ step c1: COMMIT;
+ step c2: COMMIT;
+ step readwcte: <... completed>
+ id             value
+ 
+ 1              tableAValue2
diff --git a/src/test/isolation/specs/eval-plan-qual.spec b/src/test/isolation/specs/eval-plan-qual.spec
index de481a3..7ff6f6b 100644
*** a/src/test/isolation/specs/eval-plan-qual.spec
--- b/src/test/isolation/specs/eval-plan-qual.spec
*** step "readforss" {
*** 103,113 ****
--- 103,129 ----
  	FROM table_a ta
  	WHERE ta.id = 1 FOR UPDATE OF ta;
  }
+ step "wrtwcte"	{ UPDATE table_a SET value = 'tableAValue2' WHERE id = 1; }
  step "c2"	{ COMMIT; }

  session "s3"
  setup		{ BEGIN ISOLATION LEVEL READ COMMITTED; }
  step "read"	{ SELECT * FROM accounts ORDER BY accountid; }
+ 
+ # this test exercises EvalPlanQual with a CTE, cf bug #14328
+ step "readwcte"	{
+ 	WITH
+ 	    cte1 AS (
+ 	      SELECT id FROM table_b WHERE value = 'tableBValue'
+ 	    ),
+ 	    cte2 AS (
+ 	      SELECT * FROM table_a
+ 	      WHERE id = (SELECT id FROM cte1)
+ 	      FOR UPDATE
+ 	    )
+ 	SELECT * FROM cte2;
+ }
+ 
  teardown	{ COMMIT; }

  permutation "wx1" "wx2" "c1" "c2" "read"
*** permutation "writep2" "returningp1" "c1"
*** 118,120 ****
--- 134,137 ----
  permutation "wx2" "partiallock" "c2" "c1" "read"
  permutation "wx2" "lockwithvalues" "c2" "c1" "read"
  permutation "updateforss" "readforss" "c1" "c2"
Re: [HACKERS] Parallel sec scan in plpgsql
On Thu, Sep 22, 2016 at 8:36 AM, Amit Kapila wrote: > I think for certain cases like into clause, the rows passed will be > equal to actual number of rows, otherwise it will generate error. So > we can pass that information in executor layer. Another kind of cases > which are worth considering are when from plpgsql we fetch limited > rows at-a-time, but we fetch till end like the case of > exec_stmt_return_query(). Yes, I think that those cases can be considered. Some careful code inspection will be needed to make sure the cases we want to enable are safe, and some testing will be needed to make sure they behave properly. But it doesn't sound like an unsolvable problem. I hope someone who isn't me will decide to work on it. :-) -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Tracking wait event for latches
On Thu, Sep 22, 2016 at 1:52 AM, Michael Paquier wrote: > Then comes the interesting bits: I don't think that it is a good idea > to associate only one event for a waiting point in the system views. > Say that at a waiting point WaitLatch is called with > WL_POSTMASTER_DEATH and WL_TIMEOUT, I'd rather let the user know that > both of them have been set up and not choose just one of them using > some hierarchy. But I rather think that we should list all of them > depending on the WL_* flags set up by the caller. Here comes a second > patch: let's use the last byte of the wait events and add this new > text[] column in pg_stat_activity, say wait_details. So for a waiting > point it would be possible to tell to the user that it is waiting on > '{socket,timeout,postmaster_death}' for example. I think this is focusing on the wrong thing. What we need here is not infinite detail on exactly what is going on under the hood but useful classification rules. For example, as I said in my email to Thomas, being able to exclude all cases of waiting for client I/O is useful because those aren't signs of a problem in the way that LWLock or Lock waits are. It's better for us to provide that classification using the existing system than for users to have to work out exactly which things ought to be excluded on the basis of specific event names. On the other hand, I submit that knowing which waits will be interrupted by the death of the postmaster and which will not doesn't add much. In fact, I think that's almost an anti-feature, because it will encourage users to care about details that are very rarely relevant. > Then comes a third patch: addition of a system view to list all the > activity happening for processes that are not dependent on > max_connections. pg_stat_activity has the advantage, as Tom mentioned > a couple of days ago, that one can just run count(*) on it to estimate > the number of connections left. I think this is quite helpful. 
Or we > could just add a column in pg_stat_activity to mark a process type: is > it a system process or not? This deserves its own discussion for sure. Sure. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Tracking wait event for latches
On Wed, Sep 21, 2016 at 5:49 PM, Thomas Munro wrote: >> Moreover, it's pretty confusing that we have this general concept of >> wait events in pg_stat_activity, and then here the specific type of >> wait event we're waiting for is the ... wait event kind. Uh, what? > > Yeah, that is confusing. It comes about because of the coincidence > that pg_stat_activity finished up with a wait_event column, and our IO > multiplexing abstraction finished up with the name WaitEventSet. > We could instead call these new things "wait > points", because, well, erm, they name points in the code at which we > wait. They don't name events (they just happen to use the > WaitEventSet mechanism, which itself does involve events: but those > events are things like "hey, this socket is now ready for > writing"). Sure, we could do that, but it means breaking backward compatibility for pg_stat_activity *again*. I'd be willing to do it but I don't think I'd win any popularity contests. >> I have to admit that I like the individual event names quite a bit, >> and I think the detail will be useful to users. But I wonder if >> there's a better way to describe the class of events that we're >> talking about that's not so dependent on internal data structures. >> Maybe we could divide these waits into a couple of categories - e.g. >> "Socket", "Timeout", "Process" - and then divide these detailed wait >> events among those classes. > > Well we already discussed a couple of different ways to get "Socket", > "Latch", "Socket|Latch", ... or something like that into the > wait_event_type column or new columns. Wouldn't that be better, and > automatically fall out of the code rather than needing human curated > categories? Although Michael suggested that that should be done as a > separate patch on top of the basic feature. I think making that a separate patch is just punting the decision down the field to a day that may never come. 
Let's try to agree on something that we can all live with and implement it now. In terms of avoiding human-curated categories, I basically see two options: 1. Find a name that is sufficiently generic that it covers everything that might reach WaitEventSetWait either now or in the future when it might wait for even more kinds of things than it does now. For example, we could call it "Stuff" or "Thing". Less tongue-in-cheek suggestions are welcome, but it's hard to come up with something that is sufficiently-generic without being tautological. "Event" is an example of a name which is general enough to encompass everything but also stupid: the column is called "wait_event" so everything that appears in it is an event by definition. 2. Classify the events that fall into this category by some rigid principle based on the types of things being awaited. For example, we could decide that if any sockets are awaited, the event class will be "Client" if it is connected to a user and "IPC" for auxiliary processes. For myself, I don't see any real problem with using humans to classify things; that's pretty much the one thing humans are much better at than computers, so we might as well take advantage of it. I think that it's useful to try to group together types of waits which the user will see as logically related to each other, even if that involves applying some human judgement that might someday lead to some discussion about what the best categorization for some new thing is. PostgreSQL is intended to be used by humans, and avoiding discussions (or even arguments) on pgsql-hackers shouldn't outrank usability on the list of concerns. So, I tried to classify these. Here's what I came up with. 
Activity: ArchiverMain, AutoVacuumMain, BgWriterMain, BgWriterHibernate, CheckpointerMain, PgStatMain, RecoveryWalAll, RecoveryWalStream, SysLoggerMain, WalReceiverMain, WalWriterMain Client: SecureRead, SecureWrite, SSLOpenServer, WalSenderMain, WalSenderWaitForWAL, WalSenderWriteData, WalReceiverWaitStart Timeout: BaseBackupThrottle, PgSleep, RecoveryApplyDelay IPC: BgWorkerShutdown, BgWorkerStartup, ExecuteGather, MessageQueueInternal, MessageQueuePutMessage, MessageQueueReceive, MessageQueueSend, ParallelFinish, ProcSignal, ProcSleep, SyncRep Extension: Extension I classified all of the main loop waits as waiting for activity; all of those are background processes that are waiting for something to happen and are more or less happy to sleep forever until it does. I also included the RecoveryWalAll and RecoveryWalStream events in there; those don't have the sort of "main loop" flavor of the others but they are happy to wait more or less indefinitely for something to occur. Likewise, it was pretty easy to find all of the events that were waiting for client I/O, and I grouped those all under "Client". A few of the remaining wait events seemed like they were clearly waiting for a particular timeout to expire, so I gave those their own "Timeout" category. I believe these categorizations are actually useful f
[HACKERS] Typo in libpq-int.h
Ran into a typo in libpq-int.h while reading/hacking:

- char *gsslib; /* What GSS librart to use ("gssapi" or
+ char *gsslib; /* What GSS library to use ("gssapi" or

Patch attached. cheers ./daniel typo-libpq-int.diff Description: Binary data -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
[HACKERS] Executor's internal ParamExecData Params versus EvalPlanQual
I looked into the misbehavior reported in bug #14328, https://www.postgresql.org/message-id/20160919160136.1348.55...@wrigleys.postgresql.org What is happening is that during the EvalPlanQual recheck to see whether the updated row still satisfies the SELECT's quals, we run ExecInitCteScan and then CteScanNext in the EPQ-created copy of the plan tree. This should result in a fresh scan of the underlying CTE ... but it does not, because ExecInitCteScan sees the communication value estate->es_param_exec_vals[node->cteParam] already set and decides it is not a leader node, but rather a follower of the outer-query CTE Scan node that it's a doppelganger of. That one's already at EOF so the node->leader->eof_cte test fails in CteScanNext and it returns nothing. The reason this happens is that EvalPlanQualBegin copies down all the es_param_exec_vals values from the outer query into the EPQ execution state. That's OK for most uses of es_param_exec_vals values, but not at all for cteParam Params, which are used as internal communication variables within a plan tree. I believe that recursive unions are probably broken in EPQ rechecks in the same way, since they use a Param for similar intra-plan communication purposes, but haven't bothered to devise a test case to prove it. I think the most robust way to fix this is to explicitly mark es_param_exec_vals entries as to whether they should be copied down to the EPQ execution state. There is room in struct ParamExecData to fit a uint8 or uint16 field into existing pad space, so we could add a flags field and then use one bit of it for this purpose without causing any ABI breaks in the back branches. A simpler fix would be to decide that we never need to copy any es_param_exec_vals entries into the EPQ execution state. I think that that would work, but it would provoke extra evaluations of InitPlan subplans, and I'm hesitant to make such a significant behavioral change without a lot more analysis. Comments? 
regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] [PATCH] get_home_path: use HOME
Tom Lane wrote: > If we take this patch, what's to stop someone from complaining that we > broke *their* badly-designed system that abuses the HOME variable? POSIX warns against doing that, listing HOME among the variables that should be left to their intended usage: http://pubs.opengroup.org/onlinepubs/9699919799/ If the variables in the following two sections are present in the environment during the execution of an application or utility, they shall be given the meaning described below [...] HOME The system shall initialize this variable at the time of login to be a pathname of the user's home directory. psql is indirectly using $HOME already for readline and terminfo:

$ HOME=/tmp/home2 strace psql 2>tr ; grep home2 tr
...
stat("/tmp/home2/.terminfo", 0x7ff985bf4730) = -1 ENOENT (No such file or directory)
stat("/tmp/home2/.inputrc", 0x7fff3f641d70) = -1 ENOENT (No such file or directory)

Also, when using Debian's psql, the wrapper looks for its own config file in $HOME:

open("/tmp/home2/.postgresqlrc", O_RDONLY) = -1 ENOENT (No such file or directory)

Being written in Perl, it could use getpwuid(), but it doesn't, and I believe the same is true of the majority of programs that just want the home directory. +1 on using HOME, for consistency with other pieces of code around Postgres, and for the ease of locally overriding it when troubleshooting problems with dot files. Best regards, -- Daniel Vérité PostgreSQL-powered mailer: http://www.manitou-mail.org Twitter: @DanielVerite -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Parallel sec scan in plpgsql
On Tue, Sep 20, 2016 at 8:31 PM, Robert Haas wrote: > On Tue, Sep 20, 2016 at 9:24 AM, Amit Kapila wrote: >> I think here point is that for any case where there is count of rows >> to be selected, we disable parallelism. There are many genuine cases >> like select count(*) into cnt ... which will run to completion, but as >> plpgsql passes row count to be 1 or 2, it doesn't enter parallel mode. >> There are couple other cases like that. Do you see a reason for not >> enabling parallelism for such cases? > > If we can somehow know that the rowcount which is passed is greater > than or equal to the actual number of rows which will be generated, > then it's fine to enable parallelism. > I think for certain cases like into clause, the rows passed will be equal to actual number of rows, otherwise it will generate error. So we can pass that information in executor layer. Another kind of cases which are worth considering are when from plpgsql we fetch limited rows at-a-time, but we fetch till end like the case of exec_stmt_return_query(). -- With Regards, Amit Kapila. EnterpriseDB: http://www.enterprisedb.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] File system operations.
On Thursday, September 15, 2016 at 19:09:16 MSK, Tom Lane wrote: Robert Haas writes: On Thu, Sep 15, 2016 at 11:01 AM, Anastasia Lubennikova wrote: What do you think about moving stuff from pg_upgrade/file.c to storage/file/ to allow reuse of this code? I think it'll be really helpful for developers of contrib modules like backup managers. Well, storage/file is backend and pg_upgrade is frontend. If you want to reuse the same code for both it's got to go someplace else. Also, to the extent that it's important to use those wrappers rather than libc directly, it's because the wrappers are tied into backend resource management and error handling conventions. I don't see how you convert that into code that also works in a frontend program, or what the point would be exactly. Maybe we should move toward a framework ecosystem; I mean, developers of third-party tools should be able to use as much of the same API as the backend does. -- Yury Zhuravlev Postgres Professional: http://www.postgrespro.com The Russian Postgres Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] pgbench - compute & show latency consistently
On Wed, Sep 21, 2016 at 6:53 PM, Fabien COELHO wrote: > In front of the tps line. Well, the performance displayed could also be > improved... On my dual core SSD laptop I just got:
>
> sh> ./pgbench -c 10 -t 1000
> starting vacuum...end.
> transaction type:
> scaling factor: 100
> query mode: simple
> number of clients: 10
> number of threads: 1
> number of transactions per client: 1000
> number of transactions actually processed: 1/1
> latency average = 9.527 ms
> tps = 1049.665115 (including connections establishing)
> tps = 1049.890194 (excluding connections establishing)
>
> Which is about 10 times better.

Yes, you are right. The example in the documentation has not been updated across Postgres versions. Although it is just an example, it should reflect current performance. -- Thanks & Regards, Kuntal Ghosh EnterpriseDB: http://www.enterprisedb.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Confusing docs about GetForeignUpperPaths in fdwhandler.sgml
On Fri, Sep 2, 2016 at 5:00 PM, Etsuro Fujita wrote: > Hi Robert, > > Thanks for the comments! > > On 2016/09/02 11:55, Robert Haas wrote: > >> On Mon, Aug 1, 2016 at 5:44 PM, Etsuro Fujita >> wrote: >> >>> I noticed that the following note about direct modification via >>> GetForeignUpperPaths in fdwhandler.sgml is a bit confusing. We have >>> another >>> approach using PlanDirectModify, so that should be reflected in the note >>> as >>> well. Please find attached a patch. >>> >>> PlanForeignModify and the other callbacks described in >>> are designed around the >>> assumption >>> that the foreign relation will be scanned in the usual way and then >>> individual row updates will be driven by a local >>> ModifyTable >>> plan node. This approach is necessary for the general case where an >>> update requires reading local tables as well as foreign tables. >>> However, if the operation could be executed entirely by the foreign >>> server, the FDW could generate a path representing that and insert >>> it >>> into the UPPERREL_FINAL upper relation, where it would >>> compete against the ModifyTable approach. >>> >> > I suppose this is factually correct, but I don't think it's very >> illuminating. I think that if we're going to document both >> approaches, there should be some discussion of the pros and cons of >> PlanDirectModify vs. PlanForeignModify. >> > > PlanDirectModify vs. GetForeignUpperPaths for an UPPERREL_FINAL upper > relation? > > Of course either should be >> better than an iterative ModifyTable, but how should the FDW author >> decide between the two of them? >> > > That would apply to row locking. We have two approaches for that too: > GetForeignRowMarkType and GetForeignUpperPaths, which is documented in the > same paragraph following the above documentation: > > This approach > could also be used to implement remote SELECT FOR UPDATE, > rather than using the row locking callbacks described in > . 
Keep in mind that a path > > The point of the patch is just to let the FDW author know that there is > another approach for implementing direct modification (ie, > PlanDirectModify) just as for implementing row locking. > > Considering that the primary objective of this patch is just to let the FDW author know that there is another approach for implementing direct modification, I like the idea of modifying the documentation. I agree that documentation about how the FDW author should decide > between the two would be helpful, but I'd like to leave that for future > work. > I performed a basic test with the patch: a) the patch applies cleanly on the latest source, and b) the documentation builds cleanly. Marking this as ready for committer. > > > -- > Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) > To make changes to your subscription: > http://www.postgresql.org/mailpref/pgsql-hackers > -- Rushabh Lathia
Re: [HACKERS] Declarative partitioning - another take
On Thu, Sep 22, 2016 at 1:02 PM, Ashutosh Bapat wrote: > For list partitions, the ListInfo stores the index maps for values > i.e. the index of the partition to which the value belongs. Those > indexes are same as the indexes in partition OIDs array and come from > the catalogs. In case a user creates two partitioned tables with > exactly same lists for partitions but specifies them in a different > order, the OIDs are stored in the order specified. This means that > index array for these tables come out different. equal_list_info() > works around that by creating an array of mappings and checks whether > that mapping is consistent for all values. This means we will create > the mapping as many times as equal_list_info() is called, which is > expected to be more than the number of time > RelationBuildPartitionDescriptor() is called. Instead, if we > "canonicalise" the indexes so that they come out exactly same for > similarly partitioned tables, we build the mapping only once and > arrange OIDs accordingly. > > Here's patch to do that. I have ran make check with this and it didn't > show any failure. Please consider this to be included in your next set > of patches.

The patch has an if statement with an empty body:

+if (l1->null_index != l2->null_index);
    return false;

There shouldn't be a ';' at the end of the if line. It looks like, in the tests you have added, the function always bails out before it reaches this statement. -- Best Wishes, Ashutosh Bapat EnterpriseDB Corporation The Postgres Database Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Partition-wise join for join between (declaratively) partitioned tables
On Tue, Sep 20, 2016 at 4:26 PM, Ashutosh Bapat < ashutosh.ba...@enterprisedb.com> wrote: > PFA patch which takes care of some of the TODOs mentioned in my > previous mail. The patch is based on the set of patches supporting > declarative partitioning by Amit Langote posted on 26th August.

I have applied the declarative partitioning patches posted by Amit Langote on 26 Aug 2016 and then the latest partition-wise-join patch, and I am getting the error below during make install.

../../../../src/include/catalog/partition.h:37: error: redefinition of typedef ‘PartitionScheme’
../../../../src/include/nodes/relation.h:492: note: previous declaration of ‘PartitionScheme’ was here
make[4]: *** [commit_ts.o] Error 1
make[4]: Leaving directory `/home/edb/Desktop/edb_work/WORKDB/PG_PWJ/postgresql/src/backend/access/transam'
make[3]: *** [transam-recursive] Error 2
make[3]: Leaving directory `/home/edb/Desktop/edb_work/WORKDB/PG_PWJ/postgresql/src/backend/access'
make[2]: *** [access-recursive] Error 2
make[2]: Leaving directory `/home/edb/Desktop/edb_work/WORKDB/PG_PWJ/postgresql/src/backend'
make[1]: *** [all-backend-recurse] Error 2
make[1]: Leaving directory `/home/edb/Desktop/edb_work/WORKDB/PG_PWJ/postgresql/src'
make: *** [all-src-recurse] Error 2

PS: I am using gcc (GCC) 4.4.7 20120313 (Red Hat 4.4.7-17).

I commented out the statement below in the src/include/catalog/partition.h file and then tried to install; it worked fine.

/* typedef struct PartitionSchemeData *PartitionScheme; */

Thanks & Regards, Rajkumar Raghuwanshi
Re: [HACKERS] README of hash index
On Fri, Sep 16, 2016 at 9:56 PM, Jeff Janes wrote: > On Fri, Sep 16, 2016 at 4:20 AM, Amit Kapila > wrote: >> >> Currently README of hash module contain algorithms written in below form. >> >> The insertion algorithm is rather similar: >> >> pin meta page and take buffer content lock in shared mode >> loop: >> compute bucket number for target hash key >> release meta page buffer content lock >> if (correct bucket page is already locked) >> break >> release any existing bucket page lock (if a concurrent split happened) >> take heavyweight bucket lock in shared mode >> retake meta page buffer content lock in shared mode >> -- (so far same as reader) >> release pin on metapage >> .. >> .. >> >> I have mostly updated them in the patches I have proposed to improve >> hash index. However, each time I try to update them, I find that it >> is easy to follow the code than to read and understand the existing >> algorithm written in above form from README. >> >> Do others find it useful to maintain the algorithms in above form? > > > I think that having them all condensed into one place makes it easier to > think through the locking models to decide if there might be races or > deadlocks. If you only care about the algorithm for inserting in isolation, > I agree reading the code might be better. > Thanks Jeff and Ken for the inputs. I will keep the algorithms updated in README of hash index. -- With Regards, Amit Kapila. EnterpriseDB: http://www.enterprisedb.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Tuplesort merge pre-reading
On Thu, Sep 22, 2016 at 4:17 AM, Heikki Linnakangas wrote: > On 09/22/2016 03:40 AM, Claudio Freire wrote: >> >> On Tue, Sep 20, 2016 at 3:34 PM, Claudio Freire >> wrote: >>> >>> The results seem all over the map. Some regressions seem significant >>> (both in the amount of performance lost and their significance, since >>> all 4 runs show a similar regression). The worst being "CREATE INDEX >>> ix_lotsofitext_zz2ijw ON lotsofitext (z, z2, i, j, w);" with 4GB >>> work_mem, which should be an in-memory sort, which makes it odd. >>> >>> I will re-run it overnight just in case to confirm the outcome. >> >> >> A new run for "patched" gives better results, it seems it was some >> kind of glitch in the run (maybe some cron decided to do something >> while running those queries). >> >> Attached >> >> In essence, it doesn't look like it's harmfully affecting CPU >> efficiency. Results seem neutral on the CPU front. > > > Looking at the spreadsheet, there is a 40% slowdown in the "slow" "CREATE > INDEX ix_lotsofitext_zz2ijw ON lotsofitext (z, z2, i, j, w);" test with 4GB > of work_mem. I can't reproduce that on my laptop, though. Got any clue > what's going on there? It's not present in other runs, so I think it's a fluke the spreadsheet isn't filtering out. Especially considering that one should be a fully in-memory fast sort and thus unaffected by the current patch (z and z2 being integers, IIRC, most comparisons should be about comparing the first columns and thus rarely involve the big strings). I'll try to confirm that's the case though. -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Declarative partitioning - another take
For list partitions, the ListInfo stores the index maps for values, i.e. the index of the partition to which the value belongs. Those indexes are the same as the indexes in the partition OIDs array and come from the catalogs. In case a user creates two partitioned tables with exactly the same lists for partitions but specifies them in a different order, the OIDs are stored in the order specified. This means that the index arrays for these tables come out different. equal_list_info() works around that by creating an array of mappings and checking whether that mapping is consistent for all values. This means we will create the mapping as many times as equal_list_info() is called, which is expected to be more than the number of times RelationBuildPartitionDescriptor() is called. Instead, if we "canonicalise" the indexes so that they come out exactly the same for similarly partitioned tables, we build the mapping only once and arrange the OIDs accordingly. Here's a patch to do that. I have run make check with this and it didn't show any failures. Please consider including this in your next set of patches. That helps partition-wise join as well. For partition-wise join (and further optimizations for partitioned tables), we create a list of canonical partition schemes. In this list, two similarly partitioned tables share a partition scheme pointer, and relations with the same partition scheme pointer can be joined partition-wise. It's important that the indexes in the partition scheme match the OIDs array, in order to find matching RelOptInfos for partition-wise join.
On Thu, Sep 22, 2016 at 11:12 AM, Ashutosh Bapat wrote: > Hi Amit, > Following sequence of DDLs gets an error > -- > -- multi-leveled partitions > -- > CREATE TABLE prt1_l (a int, b int, c varchar) PARTITION BY RANGE(a); > CREATE TABLE prt1_l_p1 PARTITION OF prt1_l FOR VALUES START (0) END > (250) PARTITION BY RANGE (b); > CREATE TABLE prt1_l_p1_p1 PARTITION OF prt1_l_p1 FOR VALUES START (0) END > (100); > CREATE TABLE prt1_l_p1_p2 PARTITION OF prt1_l_p1 FOR VALUES START > (100) END (250); > CREATE TABLE prt1_l_p2 PARTITION OF prt1_l FOR VALUES START (250) END > (500) PARTITION BY RANGE (c); > CREATE TABLE prt1_l_p2_p1 PARTITION OF prt1_l_p2 FOR VALUES START > ('0250') END ('0400'); > CREATE TABLE prt1_l_p2_p2 PARTITION OF prt1_l_p2 FOR VALUES START > ('0400') END ('0500'); > CREATE TABLE prt1_l_p3 PARTITION OF prt1_l FOR VALUES START (500) END > (600) PARTITION BY RANGE ((b + a)); > ERROR: cannot use column or expression from ancestor partition key > > The last statement is trying create subpartitions by range (b + a), > which contains a partition key from ancestor partition key but is not > exactly same as that. In fact it contains some extra columns other > than the ancestor partition key columns. Why do we want to prohibit > such cases? > > On Thu, Sep 15, 2016 at 2:23 PM, Amit Langote > wrote: >> >> Hi >> >> On 2016/09/09 17:55, Amit Langote wrote: >>> On 2016/09/06 22:04, Amit Langote wrote: Will fix. >>> >>> Here is an updated set of patches. >> >> An email from Rajkumar somehow managed to break out of this thread. >> Quoting his message below so that I don't end up replying with patches on >> two different threads. >> >> On 2016/09/14 16:58, Rajkumar Raghuwanshi wrote: >>> I have Continued with testing declarative partitioning with the latest >>> patch. Got some more observation, given below >> >> Thanks a lot for testing. >> >>> -- Observation 1 : Getting overlap error with START with EXCLUSIVE in range >>> partition. 
>>>
>>> create table test_range_bound ( a int) partition by range(a);
>>> --creating a partition to contain records {1,2,3,4}, by default 1 is
>>> inclusive and 5 is exclusive
>>> create table test_range_bound_p1 partition of test_range_bound for values
>>> start (1) end (5);
>>> --now trying to create a partition by explicitly mentioning start is
>>> exclusive to contain records {5,6,7}, here trying to create with START with
>>> 4 as exclusive so range should be 5 to 8, but getting partition overlap
>>> error.
>>> create table test_range_bound_p2 partition of test_range_bound for values
>>> start (4) EXCLUSIVE end (8);
>>> ERROR: partition "test_range_bound_p2" would overlap partition
>>> "test_range_bound_p1"
>>
>> Wow, this is bad. What is needed in this case is "canonicalization" of
>> the range partition bounds specified in the command. Range types do this
>> and hence an equivalent test done with range type values would disagree
>> with the result given by the patch for range partition bounds.
>>
>> select '[1,5)'::int4range && '(4,8]'::int4range as cmp;
>>  cmp
>> -----
>>  f
>> (1 row)
>>
>> In this case, the second range is converted into its equivalent canonical
>> form viz. '[5, 9)'. Then comparison of bounds 5) and [5 can tell that the
>> ranges do not overlap after all. Range type operators can do this because
>> their code can rely on the availability of a canonicalization function for
>> a given range type. From the range type
Re: [HACKERS] PL/Python adding support for multi-dimensional arrays
Hi

2016-09-21 19:53 GMT+02:00 Dave Cramer :
>
> On 18 September 2016 at 09:27, Dave Cramer wrote:
>
>>
>> On 10 August 2016 at 01:53, Pavel Stehule
>> wrote:
>>
>>> Hi
>>>
>>> 2016-08-03 13:54 GMT+02:00 Alexey Grishchenko :
>>> On Wed, Aug 3, 2016 at 12:49 PM, Alexey Grishchenko
>>> <agrishche...@pivotal.io> wrote:
> Hi
>
> The current implementation of PL/Python does not allow the use of
> multi-dimensional arrays, for both input and output parameters. This
> forces end users to introduce workarounds like casting arrays to text
> before passing them to the functions and parsing them after, which is
> an error-prone approach.
>
> This patch adds support for multi-dimensional arrays as both input and
> output parameters for PL/Python functions. The number of dimensions
> supported is limited by the Postgres MAXDIM macro, by default equal
> to 6. Both input and output multi-dimensional arrays must have fixed
> dimension sizes, i.e. 2-d arrays should represent an MxN matrix, 3-d
> arrays an MxNxK cube, etc.
>
> This patch does not support multi-dimensional arrays of composite
> types, as composite types in Python might be represented as iterators
> and there is no obvious way to find out where the nested array stops
> and the composite type structure starts. For example, if we have a
> composite type of (int, text), we can try to return
> "[ [ [1,'a'], [2,'b'] ], [ [3,'c'], [4,'d'] ] ]", and it is hard to
> find out that the first two lists are lists, and the third one
> represents a structure. Things get even more complex when you have
> arrays as members of a composite type. This is why I think this
> limitation is reasonable.
>
> Given the function:
>
> CREATE FUNCTION test_type_conversion_array_int4(x int4[]) RETURNS
> int4[] AS $$
> plpy.info(x, type(x))
> return x
> $$ LANGUAGE plpythonu;
>
> Before patch:
>
> # SELECT * FROM test_type_conversion_array_int4(ARRAY[[1,2,3],[4,5,6]]);
> ERROR: cannot convert multidimensional array to Python list
> DETAIL: PL/Python only supports one-dimensional arrays.
> CONTEXT: PL/Python function "test_type_conversion_array_int4"
>
> After patch:
>
> # SELECT * FROM test_type_conversion_array_int4(ARRAY[[1,2,3],[4,5,6]]);
> INFO: ([[1, 2, 3], [4, 5, 6]], )
>  test_type_conversion_array_int4
> ---------------------------------
>  {{1,2,3},{4,5,6}}
> (1 row)
>
> --
> Best regards,
> Alexey Grishchenko
>
> Also this patch incorporates the fix for
> https://www.postgresql.org/message-id/CAH38_tkwA5qgLV8zPN1OpPzhtkNKQb30n3xq-2NR9jUfv3qwHA%40mail.gmail.com,
> as they touch the same piece of code: array manipulation in PL/Python.

>>> I am sending a review of this patch:
>>>
>>> 1. The implemented functionality is a clear benefit: passing MD arrays,
>>> and considerably faster passing of bigger arrays
>>> 2. I was able to apply this patch cleanly without any errors or warnings
>>> 3. There is no error or warning
>>> 4. All tests passed; I tested Python 2.7 and Python 3.5
>>> 5. The code is well commented and clean
>>> 6. For this new functionality, documentation is not necessary
>>>
>>> 7. I would like more regression tests for both directions
>>> (Python <-> Postgres) for more than two dimensions
>>>
>>> My only objection is not enough regression tests; after fixing this,
>>> the patch will be ready for committers.
>>
>> Now the tests are enough, so I'll mark this patch as ready for
>> committers. I had to fix the tests: there was a lot of whitespace, and
>> the result for Python 3 was missing.

Regards
Pavel

>>> Good work, Alexey
>>>
>>> Thank you
>>>
>>> Regards
>>>
>>> Pavel
>>>
>>> --
>>> Best regards,
>>> Alexey Grishchenko

>>
>> Pavel,
>>
>> I will pick this up.
>>
>> Pavel,
>
> Please see attached patch which provides more test cases.
>
> I just realized this patch contains the original patch as well. What is
> the protocol for sending in subsequent patches?
>
>> Dave Cramer
>>
>> da...@postgresintl.com
>> www.postgresintl.com

diff --git a/src/pl/plpython/expected/plpython_types.out b/src/pl/plpython/expected/plpython_types.out
index f0b6abd..296ef8b 100644
--- a/src/pl/plpython/expected/plpython_types.out
+++ b/src/pl/plpython/expected/plpython_types.out
@@ -537,9 +537,126 @@ INFO: (None, )
 (1 row)
 SELECT * FROM test_type_conversion_array_int4(ARRAY[[1,2,3],[4,5,6]]);
-ERROR:  cannot convert multidimensional array to Python list
-DETAIL:  PL/Python only supports one-dimensional arrays.
-CONTEXT:  PL/Python function "test_type_conversion_array_int4"
+INFO:  ([[1, 2, 3], [4, 5, 6]], )
+ test_type_conversion_array_int4
+---------------------------------
+ {{1,2,3},{4,5,6}}
+(1 row)
+
+SELECT * FROM test_type_conversion_array_int4(ARRAY[[[1,2,NULL],[NULL,
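As an aside for readers, the "fixed dimension sizes" rule from Alexey's description can be sketched in plain Python. This is a hypothetical helper for illustration, not the patch's C implementation: a nested list is accepted only if it is rectangular at every depth (every sublist at the same level has the same length) and no deeper than MAXDIM, which defaults to 6 in stock PostgreSQL.

```python
# Sketch of the shape check a multi-dimensional array conversion needs:
# ragged nested lists have no valid Postgres array representation.
# Hypothetical helper, not PL/Python source code.

MAXDIM = 6  # PostgreSQL's default maximum array dimensionality

def array_dims(value):
    """Return the dimension sizes of a rectangular nested list,
    raising ValueError for ragged input or too many dimensions."""
    dims = []
    while isinstance(value, list):
        if len(dims) >= MAXDIM:
            raise ValueError("number of dimensions exceeds MAXDIM")
        dims.append(len(value))
        # every sibling at this level must match the first one's length
        if value and isinstance(value[0], list):
            width = len(value[0])
            if any(not isinstance(v, list) or len(v) != width
                   for v in value):
                raise ValueError("multidimensional arrays must have "
                                 "sub-arrays with matching dimensions")
        value = value[0] if value else None
    return dims

print(array_dims([[1, 2, 3], [4, 5, 6]]))  # [2, 3], i.e. a 2x3 matrix
```

The same ambiguity Alexey mentions for composite types shows up here: a list like [1, 'a'] could be either a one-dimensional array of mixed values or a row of a composite type, and a shape check alone cannot tell them apart.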
Re: [HACKERS] Tuplesort merge pre-reading
On 09/22/2016 03:40 AM, Claudio Freire wrote:
> On Tue, Sep 20, 2016 at 3:34 PM, Claudio Freire wrote:
>> The results seem all over the map. Some regressions seem significant
>> (both in the amount of performance lost and their significance, since
>> all 4 runs show a similar regression). The worst being "CREATE INDEX
>> ix_lotsofitext_zz2ijw ON lotsofitext (z, z2, i, j, w);" with 4GB
>> work_mem, which should be an in-memory sort, which makes it odd. I
>> will re-run it overnight just in case to confirm the outcome.
>
> A new run for "patched" gives better results, it seems it was some kind
> of glitch in the run (maybe some cron decided to do something while
> running those queries). Attached.
>
> In essence, it doesn't look like it's harmfully affecting CPU
> efficiency. Results seem neutral on the CPU front.

Looking at the spreadsheet, there is a 40% slowdown in the "slow"
"CREATE INDEX ix_lotsofitext_zz2ijw ON lotsofitext (z, z2, i, j, w);"
test with 4GB of work_mem. I can't reproduce that on my laptop, though.
Got any clue what's going on there?

- Heikki

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers