Re: JIT compiling with LLVM v9.0
On Sat, Jan 27, 2018 at 5:15 PM, Andres Freund wrote:
>> Also, I'm sure you considered this, but I'd like to ask if we can try
>> harder to make the JIT itself happen in an extension. It has some pretty
>> huge benefits:
>
> I'm very strongly against this. To the point that I'll not pursue JITing
> further if that becomes a requirement.

I would like to see this feature succeed, and I'm not making any specific demands.

> infeasible because quite frequently both non-JITed code and JITed code
> need adjustments. That'd solve your concern about

Can you explain further?

> I think it's a fool's errand to try to keep in sync with core changes on
> the expression evaluation and struct definition side of things. There's
> planner integration, error handling integration and similar related
> things too, all of which require core changes. Therefore I don't think
> there's a reasonable chance of success of doing this outside of core
> postgres.

I wasn't suggesting the entire patch be done outside of core. Core will certainly need to know about JIT compilation, but I am not convinced that it needs to know about the details of LLVM. All the references to the LLVM library itself are contained in a few files, so you've already got it well organized. What's stopping us from putting that code into a "jit provider" extension that implements the proper interfaces?

> Well, but doing this outside of core would pretty much prohibit doing so
> forever, no?

First of all, building .bc files at build time is much less invasive than linking to the LLVM library. Any version of clang will produce bitcode that can be read by any later LLVM library or tool (more or less). Second, we could change our minds later: mark any extension APIs as experimental, and decide to move LLVM into postgres whenever it is needed.
Third, there's lots of cool stuff we can do here:
* put the source in the catalog
* an extension could have its own catalog and build the source into bitcode and cache it there
* the source for functions would flow to replicas, etc.
* security-conscious environments might even choose to run some of the C code in a safe C interpreter rather than machine code

So I really don't see this as permanently closing off our options.

Regards,
Jeff Davis
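A "jit provider" interface of the kind suggested above could, in a minimal sketch, be a struct of callbacks that core dispatches through, with all LLVM-specific code living behind it in a loadable module. The names below (JitProviderCallbacks, jit_register_provider, jit_try_compile) are hypothetical illustrations, not part of any actual patch:

```c
#include <stdbool.h>
#include <stddef.h>

/*
 * Hypothetical "JIT provider" interface: core knows about JIT compilation
 * only in the abstract; a loaded extension supplies the callbacks.
 */
typedef struct JitProviderCallbacks
{
	const char *name;
	bool		(*compile_expr) (void *expr_state);	/* false => fall back */
	void		(*release_context) (void *context);
} JitProviderCallbacks;

static JitProviderCallbacks *current_provider = NULL;

/* Called from the provider extension's init function. */
void
jit_register_provider(JitProviderCallbacks *cb)
{
	current_provider = cb;
}

/* Core-side entry point: try the provider, else interpret as usual. */
bool
jit_try_compile(void *expr_state)
{
	if (current_provider == NULL || current_provider->compile_expr == NULL)
		return false;			/* no provider loaded */
	return current_provider->compile_expr(expr_state);
}
```

The point of the sketch is only that core's call sites stay provider-agnostic; the hard part Andres raises (keeping the provider in sync with expression-evaluation internals) is not solved by the dispatch mechanism itself.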
Re: Reorder C includes in partition.c
On Thu, Jan 25, 2018 at 06:58:40PM +0900, Etsuro Fujita wrote:
> Here is a small patch for $SUBJECT.
>
> Best regards,
> Etsuro Fujita

> diff --git a/src/backend/catalog/partition.c b/src/backend/catalog/partition.c
> index 8adc4ee..e69bbc0 100644
> --- a/src/backend/catalog/partition.c
> +++ b/src/backend/catalog/partition.c
> @@ -46,11 +46,11 @@
>  #include "utils/array.h"
>  #include "utils/builtins.h"
>  #include "utils/datum.h"
> -#include "utils/memutils.h"
>  #include "utils/fmgroids.h"
>  #include "utils/hashutils.h"
>  #include "utils/inval.h"
>  #include "utils/lsyscache.h"
> +#include "utils/memutils.h"
>  #include "utils/rel.h"
>  #include "utils/ruleutils.h"
>  #include "utils/syscache.h"

Patch applied to head.

--
Bruce Momjian  http://momjian.us
EnterpriseDB   http://enterprisedb.com

+ As you are, so once was I. As I am, so you will be. +
+                     Ancient Roman grave inscription +
Cancelling parallel query leads to segfault
Hi Peter (and others who mucked around with related code),

While testing another patch I found that cancelling a parallel query on master segfaults the leader in an interesting manner:

#0  0x5648721cb361 in tas (lock=0x7fbd8e025360) at /home/andres/src/postgresql/src/include/storage/s_lock.h:228
#1  0x5648721cc2d6 in shm_mq_detach_internal (mq=0x7fbd8e025360) at /home/andres/src/postgresql/src/backend/storage/ipc/shm_mq.c:812
#2  0x5648721cc25c in shm_mq_detach (mqh=0x564874956bb0) at /home/andres/src/postgresql/src/backend/storage/ipc/shm_mq.c:778
#3  0x56487201e8fc in ExecParallelFinish (pei=0x5648749514f8) at /home/andres/src/postgresql/src/backend/executor/execParallel.c:1011
#4  0x56487203706c in ExecShutdownGatherWorkers (node=0x564874907148) at /home/andres/src/postgresql/src/backend/executor/nodeGather.c:383
#5  0x5648720370b9 in ExecShutdownGather (node=0x564874907148) at /home/andres/src/postgresql/src/backend/executor/nodeGather.c:400
#6  0x564872036c84 in ExecEndGather (node=0x564874907148) at /home/andres/src/postgresql/src/backend/executor/nodeGather.c:241
#7  0x564872020eff in ExecEndNode (node=0x564874907148) at /home/andres/src/postgresql/src/backend/executor/execProcnode.c:609
#8  0x5648720522f5 in ExecEndSort (node=0x564874906eb0) at /home/andres/src/postgresql/src/backend/executor/nodeSort.c:260
#9  0x564872021053 in ExecEndNode (node=0x564874906eb0) at /home/andres/src/postgresql/src/backend/executor/execProcnode.c:695
#10 0x56487203193e in ExecEndAgg (node=0x564874906788) at /home/andres/src/postgresql/src/backend/executor/nodeAgg.c:3329
#11 0x564872021075 in ExecEndNode (node=0x564874906788) at /home/andres/src/postgresql/src/backend/executor/execProcnode.c:703
#12 0x56487201a102 in ExecEndPlan (planstate=0x564874906788, estate=0x564874906538) at /home/andres/src/postgresql/src/backend/executor/execMain.c:1604
#13 0x56487201823b in standard_ExecutorEnd (queryDesc=0x56487485e808) at /home/andres/src/postgresql/src/backend/executor/execMain.c:493
#14 0x564872018167 in ExecutorEnd (queryDesc=0x56487485e808) at /home/andres/src/postgresql/src/backend/executor/execMain.c:464
#15 0x564871fb136e in PortalCleanup (portal=0x5648748a7b48) at /home/andres/src/postgresql/src/backend/commands/portalcmds.c:301
#16 0x5648723a2148 in AtAbort_Portals () at /home/andres/src/postgresql/src/backend/utils/mmgr/portalmem.c:781
#17 0x564871e84168 in AbortTransaction () at /home/andres/src/postgresql/src/backend/access/transam/xact.c:2540
#18 0x564871e865e2 in AbortOutOfAnyTransaction () at /home/andres/src/postgresql/src/backend/access/transam/xact.c:4387
#19 0x564872378c40 in ShutdownPostgres (code=1, arg=0) at /home/andres/src/postgresql/src/backend/utils/init/postinit.c:1156
#20 0x5648721c2a63 in shmem_exit (code=1) at /home/andres/src/postgresql/src/backend/storage/ipc/ipc.c:228
#21 0x5648721c293a in proc_exit_prepare (code=1) at /home/andres/src/postgresql/src/backend/storage/ipc/ipc.c:185
#22 0x5648721c28a2 in proc_exit (code=1) at /home/andres/src/postgresql/src/backend/storage/ipc/ipc.c:102
#23 0x5648723638cd in errfinish (dummy=0) at /home/andres/src/postgresql/src/backend/utils/error/elog.c:543
#24 0x5648721f5f54 in ProcessInterrupts () at /home/andres/src/postgresql/src/backend/tcop/postgres.c:2921
#25 0x564872022630 in ExecScanFetch (node=0x564874907c18, accessMtd=0x564872050b32, recheckMtd=0x564872050bfd) at /home/andres/src/postgresql/src/backend/executor/execScan.c:41
#26 0x564872022840 in ExecScan (node=0x564874907c18, accessMtd=0x564872050b32, recheckMtd=0x564872050bfd) at /home/andres/src/postgresql/src/backend/executor/execScan.c:162
#27 0x564872050c4b in ExecSeqScan (pstate=0x564874907c18) at /home/andres/src/postgresql/src/backend/executor/nodeSeqscan.c:130
#28 0x56487202c115 in ExecProcNode (node=0x564874907c18) at /home/andres/src/postgresql/src/include/executor/executor.h:245
#29 0x56487202c5af in fetch_input_tuple (aggstate=0x5648749074a0) at /home/andres/src/postgresql/src/backend/executor/nodeAgg.c:406
#30 0x56487202eb03 in agg_fill_hash_table (aggstate=0x5648749074a0) at /home/andres/src/postgresql/src/backend/executor/nodeAgg.c:1915
#31 0x56487202e38c in ExecAgg (pstate=0x5648749074a0) at /home/andres/src/postgresql/src/backend/executor/nodeAgg.c:1534
#32 0x564872020c76 in ExecProcNodeFirst (node=0x5648749074a0) at /home/andres/src/postgresql/src/backend/executor/execProcnode.c:446
#33 0x5648720366fc in ExecProcNode (node=0x5648749074a0) at /home/andres/src/postgresql/src/include/executor/executor.h:245
#34 0x564872036d83 in gather_getnext (gatherstate=0x564874907148) at /home/andres/src/postgresql/src/backend/executor/nodeGather.c:285
#35 0x564872036c0d in ExecGather (pstate=0x564874907148) at /home/andres/src/postgresql/src/backend/executor/nodeGather.c:216
#36 0x564872020c76 in
Re: [HACKERS] GSoC 2017: weekly progress reports (week 4) and patch for hash index
On Thu, Jan 25, 2018 at 7:29 PM, Alexander Korotkov wrote:
> On Sat, Jan 20, 2018 at 4:24 PM, Amit Kapila wrote:
>> On Fri, Sep 29, 2017 at 8:20 PM, Alexander Korotkov wrote:
>> > +1,
>> > Very nice idea! Locking hash values directly seems to be superior over
>> > locking hash index pages.
>> > Shubham, do you have any comment on this?
>>
>> As Shubham seems to be running out of time, I thought of helping him
>> by looking into the above-suggested idea. I think one way to lock a
>> particular hash value is we can use the TID of the heap tuple associated
>> with each index entry (the index entry for the hash index will be the
>> hash value).
>
> Sorry, I didn't get what you particularly mean. If locking either the TID
> of the associated heap tuple or the TID of the hash index tuple, then what
> will we lock in the case when nothing is found? Even if we found nothing,
> we have to place some lock according to the search key in order to detect
> cases when somebody has inserted a row which we might see according to
> that search key.

Okay, but if you use the hash value as the lock tag (which is possible), how will we deal with things like page splits? I think even if we use the blocknumber/page/bucketnumber corresponding to the hash value along with the hash value in the lock tag, it still doesn't appear to work. I think using page level locks for the index makes sense, especially because it will be convenient to deal with page splits. Also, as predicate locks stay in-memory, creating too many such locks doesn't sound like a nice strategy, even though we have a way to upgrade them to the next level (page), as that has a separate cost to it.

>> However, do we really need it for implementing predicate locking for
>> hash indexes? If we look at the "Index AM implementations" section of
>> README-SSI, it doesn't seem to be required. Basically, if we look at
>> the strategy of predicate locks in btree [1], it seems to me locking
>> at page level for hash indexes is the right direction, as similar
>> to btree, the corresponding heap tuple read will be locked.
>
> Btree uses leaf-page locking because it supports both range searches
> and exact value searches. And it needs to detect overlaps between
> these two kinds of searches. Therefore, btree locks leaf pages in both
> cases.

Also, probably using page level locks makes it easier to deal with index operations like page splits.

--
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com
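To illustrate the page-split concern raised above: a hash index maps the same hash value to a different bucket after a split, so a lock tag that embeds a bucket or page number at lock time may not match what a later searcher computes. A simplified sketch of the bucket-mapping rule (modeled on the hash AM's _hash_hashkey2bucket; the particular mask values in the test are illustrative):

```c
#include <stdint.h>

/*
 * Simplified sketch of how a hash index maps a hash value to a bucket.
 * maxbucket grows by one at each bucket split; highmask/lowmask are the
 * masks for the current and previous power-of-two table sizes.
 */
uint32_t
bucket_for_hash(uint32_t hashvalue, uint32_t maxbucket,
				uint32_t highmask, uint32_t lowmask)
{
	uint32_t	bucket = hashvalue & highmask;

	if (bucket > maxbucket)
		bucket = bucket & lowmask;	/* bucket not yet split off: use old mask */
	return bucket;
}
```

With maxbucket = 5 (highmask 7, lowmask 3), hash value 6 lands in bucket 2; after one more split (maxbucket = 6), the same hash value lands in bucket 6 — which is why a bucket- or page-derived lock tag computed before the split no longer covers the value afterwards.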
Re: JIT compiling with LLVM v9.0
On Sat, Jan 27, 2018 at 1:20 PM, Andres Freund wrote:
> b) The optimizations to take advantage of the constants and make the
>    code faster with the constant tupledesc are fairly slow (you pretty
>    much need at least an -O2 equivalent), whereas the handrolled tuple
>    deforming is faster than the slot_getsomeattrs with just a single,
>    pretty cheap, mem2reg pass. We're talking about ~1ms vs 70-100ms in
>    a lot of cases. The optimizer often will not actually unroll the
>    loop with many attributes despite that being beneficial.

This seems like the major point. We would have to customize the optimization passes a lot and/or choose carefully which ones we apply.

> I think in most cases using the approach you advocate makes sense, to
> avoid duplication, but tuple deforming is such a major bottleneck that I
> think it's clearly worth doing it manually. Being able to use llvm with
> just an always-inline and a mem2reg pass makes it so much more widely
> applicable than doing the full inlining and optimization work.

OK.

On another topic, I'm trying to find a way we could break this patch into smaller pieces. For instance, if we concentrate on tuple deforming, maybe it would be committable in time for v11?

I see that you added some optimizations to the existing generic code. Do those offer a measurable improvement, and if so, can you commit those first to make the JIT stuff more readable?

Also, I'm sure you considered this, but I'd like to ask if we can try harder to make the JIT itself happen in an extension. It has some pretty huge benefits:
* The JIT code is likely to go through a lot of changes, and it would be nice if it wasn't tied to a yearly release cycle.
* It would mean postgres itself isn't dependent on a huge library like llvm, which just seems like a good idea from a packaging standpoint.
* It may give GCC or something else a chance to compete with its own JIT.
* It may make it easier to get something into v11.

It appears reasonable to make the slot deforming and expression evaluator parts an extension. execExpr.h only exports a couple of new functions; heaptuple.c has a lot of changes, but they seem like they could be separated (unless I'm missing something).

The biggest problem is that the inlining would be much harder to separate out, because you are building the .bc files at build time. I really like the idea of inlining, but it doesn't necessarily need to be in the first commit.

Regards,
Jeff Davis
Re: [HACKERS] GnuTLS support
On Wed, Jan 17, 2018 at 10:02:35PM -0500, Tom Lane wrote:
> That is a really good point. For precedent, note that darn near nobody
> seems to know whether their psql contains readline or libedit. If we
> force the issue by giving the settings different names, then they'll be
> forced to figure out which SSL implementation they have.
>
> On the other hand, you could argue that there are more user-friendly
> ways to expose that information than demanding that users play twenty
> questions with their config files. I'd at least want us to recognize
> when somebody tries to set "openssl_foo" in a gnutls implementation,
> and respond with "you need to twiddle the gnutls_xxx variables instead"
> rather than just "unrecognized configuration parameter". Maybe that'd
> be good enough, though.

To open another can of worms: are we ever going to rename the "ssl" parameters to "tls", since TLS is the protocol used by all modern secure communication libraries? SSL was deprecated in 2015:

https://www.globalsign.com/en/blog/ssl-vs-tls-difference/

Both SSL 2.0 and 3.0 have been deprecated by the IETF (in 2011 and 2015, respectively).

--
Bruce Momjian  http://momjian.us
EnterpriseDB   http://enterprisedb.com

+ As you are, so once was I. As I am, so you will be. +
+                     Ancient Roman grave inscription +
Re: Built-in connection pooling
On Mon, Jan 22, 2018 at 06:51:08PM +0100, Tomas Vondra wrote:
> > Yes, external connection pooling is more flexible. It allows pooling to
> > be performed either at the client side or at the server side (or even a
> > combination of the two approaches).
> >
> > Also, external connection pooling for PostgreSQL is not limited to
> > pgbouncer/pgpool.
> >
> > There are many frameworks maintaining their own connection pool, for
> > example J2EE, jboss, hibernate, ...
> >
> > I have a feeling that about 70% of enterprise systems working with
> > databases are written in Java and do connection pooling in their own way.
>
> True, but that does not really mean we don't need "our" connection
> pooling (built-in or not). The connection pools are usually built into
> the application servers, so each application server has their own
> independent pool. With larger deployments (a couple of application
> servers) that quickly causes problems with max_connections.

I found this thread and the pthread thread very interesting. Konstantin, thank you for writing prototypes and giving us very useful benchmarks for ideas I thought I might never see. As much as I would like to move forward with coding, I would like to back up and understand where we need to go with these ideas.

First, it looks like pthreads and a builtin pooler help mostly with 1000+ connections. It seems like you found that pthreads wasn't sufficient and the builtin pooler was better. Is that correct?

Is there anything we can do differently about allowing long-idle connections to reduce their resource usage, e.g. free their caches? Remove them from PGPROC? Could we do it conditionally, e.g. only for sessions that don't have open transactions or cursors?

It feels like user and db mismatches are always going to cause pooling problems. Could we actually exit and restart connections that have default session state?

Right now, if you hit max_connections, we start rejecting new connections. Would it make sense to allow an option to exit idle connections when this happens so new users can connect?

I know we have relied on external connection poolers to solve all the high connection problems, but it seems there might be simple things we can do to improve matters.

FYI, I did write a blog entry comparing external and internal connection poolers: https://momjian.us/main/blogs/pgblog/2017.html#April_21_2017

--
Bruce Momjian  http://momjian.us
EnterpriseDB   http://enterprisedb.com

+ As you are, so once was I. As I am, so you will be. +
+                     Ancient Roman grave inscription +
Re: \describe*
On Sat, Jan 27, 2018 at 10:54:07PM +0100, Vik Fearing wrote:
> On 01/27/2018 05:39 PM, David Fetter wrote:
> > On Fri, Jan 26, 2018 at 04:28:24PM +0100, Vik Fearing wrote:
> >> On 01/26/2018 03:49 PM, David Fetter wrote:
> >>> I propose that we do what at least MySQL, Oracle, and DB2 do and
> >>> implement DESCRIBE as its own command.
> >>
> >> Hard pass.
> >
> > Would you be so kind as to expand on this? "Pass" might indicate a
> > lack of interest in doing the work, but "hard pass" seems to indicate
> > that you have reasons the work should not be done. Have I interpreted
> > this correctly?
>
> Andreas said it quite well. I don't like having client commands look
> like server commands. I don't mind exceptions for "help" and "quit",
> but I see no reason for anything more.

I did not propose a client command mimicking a server command. I thought I made that clear by mentioning that the \ commands are unavailable to clients other than psql, and offering an alternative. What I propose is in fact a server command, which at least three of the other popular RDBMSs already have.

Best,
David.

--
David Fetter  http://fetter.org/
Phone: +1 415 235 3778

Remember to vote!
Consider donating to Postgres: http://www.postgresql.org/about/donate
Re: STATISTICS retained in CREATE TABLE ... LIKE (INCLUDING ALL)?
On 01/27/2018 10:45 PM, Tom Lane wrote:
> David Rowley writes:
>> I'd offer to put it back to the order of the enum, but I want to
>> minimise the invasiveness of the patch. I'm not sure yet if it should
>> be classed as a bug fix or a new feature.
>
> FWIW, I'd call it a new feature.

I'm not sure what exactly the feature would be? I mean "keep statistics even if you only ask for indexes" does not seem particularly helpful to me. So I see it more like a bug - I certainly think it should have been handled differently in 10.

regards

--
Tomas Vondra  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Re: STATISTICS retained in CREATE TABLE ... LIKE (INCLUDING ALL)?
Hi,

On 01/27/2018 10:09 PM, David Rowley wrote:
> On 27 January 2018 at 00:03, Tels wrote:
>> Looking at the patch, at first I thought the order was sorted and you
>> swapped STORAGE and STATISTICS by accident. But then, it seems the order
>> is semi-random. Should that list be sorted or is it already sorted by some
>> criteria that I don't see?
>>
>> - INCLUDING DEFAULTS INCLUDING IDENTITY INCLUDING CONSTRAINTS INCLUDING
>>   INDEXES INCLUDING STORAGE INCLUDING COMMENTS.
>> + INCLUDING DEFAULTS INCLUDING IDENTITY INCLUDING CONSTRAINTS INCLUDING
>>   INDEXES INCLUDING STORAGE INCLUDING STATISTICS INCLUDING COMMENTS.
>
> It looks like they were in order of how they're defined in enum
> TableLikeOption up until [1], then I'm not so sure what the new order
> is based on after that.
>
> I'd offer to put it back to the order of the enum, but I want to
> minimise the invasiveness of the patch. I'm not sure yet if it should
> be classed as a bug fix or a new feature.
>
> On looking at this I realised I missed changing the syntax synopsis.
> The attached adds this.

Thanks for working on a patch. This should have been in the statistics patch, no doubt about that.

regards

--
Tomas Vondra  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Re: Write lifetime hints for NVMe
On 01/27/2018 08:06 PM, Dmitry Dolgov wrote:
>> On 27 January 2018 at 16:03, Tomas Vondra wrote:
>>
>> Aren't those numbers far lower than you'd expect from NVMe storage? I do
>> have a NVMe drive (Intel 750) in my machine, and I can do thousands of
>> transactions on it with two clients. Seems a bit suspicious.
>
> Maybe an NVMe storage can provide much higher numbers in general, but
> there are resource limitations from AWS itself. I was using c5.large,
> which is the smallest possible instance of type c5, so maybe that can
> explain the absolute numbers - but anyway I can recheck, just in case I
> missed something.

According to [1], the c5 instances don't have actual NVMe devices (say, storage in a PCIe slot or connected using M.2) but EBS volumes exposed as NVMe devices. That would certainly explain the low IOPS numbers, as EBS has built-in throttling. I don't know how much of the NVMe feature set this EBS variant supports.

Amazon actually does provide instance types (f1 and i3) with real NVMe devices. That's what I'd be testing.

I can do some testing on my system with NVMe storage, to see if there really is any change thanks to the patch.

[1] https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/nvme-ebs-volumes.html

--
Tomas Vondra  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
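For reference, the kernel interface under discussion is the per-file-descriptor write lifetime hint added in Linux 4.13, set with fcntl(F_SET_RW_HINT). A minimal sketch follows; the fallback #defines cover older glibc headers, and whether the hint actually reaches the device as an NVMe stream directive depends on kernel, driver, and device support — on unsupported setups the fcntl simply fails:

```c
#define _GNU_SOURCE
#include <fcntl.h>
#include <stdint.h>

#ifndef F_SET_RW_HINT			/* fallbacks for older glibc headers */
#define F_GET_RW_HINT 1035
#define F_SET_RW_HINT 1036
#endif
#ifndef RWH_WRITE_LIFE_SHORT
#define RWH_WRITE_LIFE_SHORT 2
#endif

/*
 * Tell the kernel that data written through fd has a short expected
 * lifetime (think WAL segments, which are recycled quickly).
 * Returns 0 on success, -1 if the kernel/filesystem doesn't support it.
 */
int
set_short_write_lifetime(int fd)
{
	uint64_t	hint = RWH_WRITE_LIFE_SHORT;

	return fcntl(fd, F_SET_RW_HINT, &hint);
}

/* Read the current hint back; returns UINT64_MAX on error. */
uint64_t
get_write_lifetime(int fd)
{
	uint64_t	hint;

	if (fcntl(fd, F_GET_RW_HINT, &hint) != 0)
		return UINT64_MAX;
	return hint;
}
```

The available hint values (RWH_WRITE_LIFE_NONE through RWH_WRITE_LIFE_EXTREME) let an application separate short-lived writes (WAL) from long-lived ones (heap data), which is what this patch's benchmarks are trying to exploit.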
Re: [HACKERS] generated columns
On 1/26/18 12:46, Robert Haas wrote:
> On Thu, Jan 25, 2018 at 10:26 PM, Peter Eisentraut wrote:
>>> Does the SQL spec mention the matter? How do other systems
>>> handle such cases?
>>
>> In Oracle you get the same overflow error.
>
> That seems awful. If a user says "SELECT * FROM tab" and it fails,
> how are they supposed to recover, or even understand what the problem
> is? I think we should really try to at least generate an errcontext
> here:
>
> ERROR: integer out of range
> CONTEXT: while generating virtual column "b"
>
> And maybe a hint, too, like "try excluding this column".

This is expanded in the rewriter, so there is no context like that. This is exactly how views work, e.g.,

create table t1 (id int, length int);
create view v1 as select id, length * 10 as length_in_nanometers from t1;
insert into t1 values (1, 5);
select * from v1;
ERROR:  integer out of range

I think this is not a problem in practice.

--
Peter Eisentraut  http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Re: [HACKERS] GnuTLS support
On 1/25/18 09:07, Peter Eisentraut wrote:
> On 1/19/18 13:43, Peter Eisentraut wrote:
>> Comparing the existing {be,fe}-secure-openssl.c with the proposed
>> {be,fe}-secure-gnutls.c, and with half an eye on the previously proposed
>> Apple Secure Transport implementation, I have identified a few more
>> areas of refactoring that should be done in order to avoid excessive
>> copy-and-pasting in the new implementations:
>
> And here is another place that needs cleaning up, where the OpenSSL API
> was used directly.

And here is another one. The last one for now, I think.

--
Peter Eisentraut  http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

From ab0bd8f655a5c75c8040b462a9d2c0111fbf323a Mon Sep 17 00:00:00 2001
From: Peter Eisentraut
Date: Sat, 27 Jan 2018 13:47:52 -0500
Subject: [PATCH] Refactor client-side SSL certificate checking code

Separate the parts specific to the SSL library from the general logic.

The previous code structure was

    open_client_SSL()
      calls verify_peer_name_matches_certificate()
        calls verify_peer_name_matches_certificate_name()
          calls wildcard_certificate_match()

and was completely in fe-secure-openssl.c.
The new structure is

    open_client_SSL() [openssl]
      calls pq_verify_peer_name_matches_certificate() [generic]
        calls pgtls_verify_peer_name_matches_certificate_guts() [openssl]
          calls openssl_verify_peer_name_matches_certificate_name() [openssl]
            calls pq_verify_peer_name_matches_certificate_name() [generic]
              calls wildcard_certificate_match() [generic]
---
 src/interfaces/libpq/fe-secure-openssl.c | 209 ---
 src/interfaces/libpq/fe-secure.c         | 185 ++-
 src/interfaces/libpq/libpq-int.h         |  20 +++
 3 files changed, 228 insertions(+), 186 deletions(-)

diff --git a/src/interfaces/libpq/fe-secure-openssl.c b/src/interfaces/libpq/fe-secure-openssl.c
index 9ab317320a..228ea5897a 100644
--- a/src/interfaces/libpq/fe-secure-openssl.c
+++ b/src/interfaces/libpq/fe-secure-openssl.c
@@ -60,9 +60,8 @@
 #endif
 #include
-static bool verify_peer_name_matches_certificate(PGconn *);
 static int	verify_cb(int ok, X509_STORE_CTX *ctx);
-static int	verify_peer_name_matches_certificate_name(PGconn *conn,
+static int	openssl_verify_peer_name_matches_certificate_name(PGconn *conn,
 					ASN1_STRING *name, char **store_name);
 static void destroy_ssl_system(void);
@@ -492,76 +491,16 @@ verify_cb(int ok, X509_STORE_CTX *ctx)

 /*
- * Check if a wildcard certificate matches the server hostname.
- *
- * The rule for this is:
- *	1. We only match the '*' character as wildcard
- *	2. We match only wildcards at the start of the string
- *	3. The '*' character does *not* match '.', meaning that we match only
- *	   a single pathname component.
- *	4. We don't support more than one '*' in a single pattern.
- *
- * This is roughly in line with RFC2818, but contrary to what most browsers
- * appear to be implementing (point 3 being the difference)
- *
- * Matching is always case-insensitive, since DNS is case insensitive.
- */
-static int
-wildcard_certificate_match(const char *pattern, const char *string)
-{
-	int			lenpat = strlen(pattern);
-	int			lenstr = strlen(string);
-
-	/* If we don't start with a wildcard, it's not a match (rule 1 & 2) */
-	if (lenpat < 3 ||
-		pattern[0] != '*' ||
-		pattern[1] != '.')
-		return 0;
-
-	if (lenpat > lenstr)
-		/* If pattern is longer than the string, we can never match */
-		return 0;
-
-	if (pg_strcasecmp(pattern + 1, string + lenstr - lenpat + 1) != 0)
-
-		/*
-		 * If string does not end in pattern (minus the wildcard), we don't
-		 * match
-		 */
-		return 0;
-
-	if (strchr(string, '.') < string + lenstr - lenpat)
-
-		/*
-		 * If there is a dot left of where the pattern started to match, we
-		 * don't match (rule 3)
-		 */
-		return 0;
-
-	/* String ended with pattern, and didn't have a dot before, so we match */
-	return 1;
-}
-
-/*
- * Check if a name from a server's certificate matches the peer's hostname.
- *
- * Returns 1 if the name matches, and 0 if it does not. On error, returns
- * -1, and sets the libpq error message.
- *
- * The name extracted from the certificate is returned in *store_name. The
- * caller is responsible for freeing it.
+ * OpenSSL-specific wrapper around
+ * pq_verify_peer_name_matches_certificate_name(), converting the ASN1_STRING
+ * into a plain C string.
  */
 static int
-verify_peer_name_matches_certificate_name(PGconn
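For readers following the refactoring, the matching rules spelled out in the patch comment boil down to a small function. Here is a standalone sketch of the moved wildcard_certificate_match(), using plain strcasecmp() in place of libpq's pg_strcasecmp():

```c
#include <string.h>
#include <strings.h>

/*
 * Check if a wildcard certificate pattern matches the server hostname.
 * Rules (per the patch comment): only '*' is a wildcard, only at the
 * start, it does not match '.', and only one '*' is supported.
 * Matching is case-insensitive, since DNS is case-insensitive.
 */
int
wildcard_certificate_match(const char *pattern, const char *string)
{
	int			lenpat = (int) strlen(pattern);
	int			lenstr = (int) strlen(string);

	/* If we don't start with a wildcard, it's not a match (rules 1 & 2) */
	if (lenpat < 3 || pattern[0] != '*' || pattern[1] != '.')
		return 0;

	/* If the pattern is longer than the string, we can never match */
	if (lenpat > lenstr)
		return 0;

	/* The string must end in the pattern (minus the wildcard) */
	if (strcasecmp(pattern + 1, string + lenstr - lenpat + 1) != 0)
		return 0;

	/* No dot may appear left of where the pattern started matching (rule 3) */
	if (strchr(string, '.') < string + lenstr - lenpat)
		return 0;

	return 1;
}
```

Note how rule 3 falls out of the strchr() check: if the hostname's first dot sits before the point where the pattern began matching, the '*' would have had to span a dot, so the match is rejected.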
Re: \describe*
On 01/27/2018 05:39 PM, David Fetter wrote:
> On Fri, Jan 26, 2018 at 04:28:24PM +0100, Vik Fearing wrote:
>> On 01/26/2018 03:49 PM, David Fetter wrote:
>>> I propose that we do what at least MySQL, Oracle, and DB2 do and
>>> implement DESCRIBE as its own command.
>>
>> Hard pass.
>
> Would you be so kind as to expand on this? "Pass" might indicate a
> lack of interest in doing the work, but "hard pass" seems to indicate
> that you have reasons the work should not be done. Have I interpreted
> this correctly?

Andreas said it quite well. I don't like having client commands look like server commands. I don't mind exceptions for "help" and "quit", but I see no reason for anything more.

--
Vik Fearing  +33 6 46 75 15 36
http://2ndQuadrant.fr  PostgreSQL : Expertise, Formation et Support
Re: STATISTICS retained in CREATE TABLE ... LIKE (INCLUDING ALL)?
David Rowley writes:
> I'd offer to put it back to the order of the enum, but I want to
> minimise the invasiveness of the patch. I'm not sure yet if it should
> be classed as a bug fix or a new feature.

FWIW, I'd call it a new feature.

I think the ordering of these items suffers greatly from "add new stuff at the end no matter what" syndrome. Feel free to design an ordering that makes more sense, but then please apply it consistently.

regards, tom lane
Re: JIT compiling with LLVM v9.0
Hi, On 2018-01-26 22:52:35 -0800, Jeff Davis wrote: > The version of LLVM that I tried this against had a linker option > called "InternalizeLinkedSymbols" that would prevent the visibility > problem you mention (assuming I understand you correctly). I don't think they're fully solvable - you can't really internalize a reference to a mutable static variable in another translation unit. Unless you modify that translation unit, which doesn't work when postgres running. > That option is no longer there so I will have to figure out how to do > it with the current LLVM API. Look at the llvmjit_wrap.c code invoking FunctionImporter - that pretty much does that. I'll push a cleaned up version of that code sometime this weekend (it'll then live in llvmjit_inline.cpp). > > Afaict that's effectively what I've already implemented. We could export > > more input as constants to the generated program, but other than that... > > I brought this up in the context of slot_compile_deform(). In your > patch, you have code like: > > + if (!att->attnotnull) > + { > ... > + v_nullbyte = LLVMBuildLoad( > + builder, > + LLVMBuildGEP(builder, v_bits, > +_nullbyteno, 1, > ""), > + "attnullbyte"); > + > + v_nullbit = LLVMBuildICmp( > + builder, > + LLVMIntEQ, > + LLVMBuildAnd(builder, v_nullbyte, > v_nullbytemask, ""), > + LLVMConstInt(LLVMInt8Type(), 0, false), > + "attisnull"); > ... > > So it looks like you are reimplementing the generic code, but with > conditional code gen. If the generic code changes, someone will need > to read, understand, and change this code, too, right? Right. Not that that's code that has changed that much... > With my approach, then it would initially do *un*conditional code gen, > and be less efficient and less specialized than the code generated by > your current patch. 
But then it would link in the constant tupledesc, > and optimize, and the optimizer will realize that they are constants > (hopefully) and then cut out a lot of the dead code and specialize it > to the given tupledesc. Right. > This places a lot of faith in the optimizer and I realize it may not > happen as nicely with real code as it did with my earlier experiments. > Maybe you already tried and you are saying that's a dead end? I'll > give it a shot, though. I did that, yes. There's two major downsides: a) The code isn't as efficient as the handrolled code. The handrolled code e.g. can take into account that it doesn't need to access the NULL bitmap for a NOT NULL column and we don't need to check the tuple's number of attributes if there's a following NOT NULL attribute. Those safe a good number of cycles. b) The optimizations to take advantage of the constants and make the code faster with the constant tupledesc is fairly slow (you pretty much need at least an -O2 equivalent), whereas the handrolled tuple deforming is faster than the slot_getsomeattrs with just a single, pretty cheap, mem2reg pass. We're talking about ~1ms vs 70-100ms in a lot of cases. The optimizer often will not actually unroll the loop with many attributes despite that being beneficial. I think in most cases using the approach you advocate makes sense, to avoid duplication, but tuple deforming is such a major bottleneck that I think it's clearly worth doing it manually. Being able to use llvm with just a always-inline and a mem2reg pass makes it so much more widely applicable than doing the full inlining and optimization work. > >> I experimented a bit before and it works for basic cases, but I'm not > >> sure if it's as good as your hand-generated LLVM. > > > > For deforming it doesn't even remotely get as good in my experiments. > > I'd like some more information here -- what didn't work? It didn't > recognize constants? 
> Or did it recognize them, but didn't optimize as
> well as you did by hand?

It didn't optimize as well as I did by hand, without significantly complicating (and slowing) the originating code. It sometimes decided not to unroll the loop, and it takes a *lot* longer than the direct emission of the code.

I'm hoping to work on making more of the executor JITed, and there I do think it's largely going to be what you're proposing, due to the sheer mass of code.

Greetings,

Andres Freund
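[Editor's note: a rough illustration, in plain C rather than the patch's LLVM IR, of why the hand-rolled specialization above saves cycles: with a known tupledesc whose columns are all NOT NULL, the null-bitmap access can simply be omitted at code-generation time, instead of hoping the optimizer proves it dead. The `Row` layout here is hypothetical, not PostgreSQL's actual tuple format.]

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical, simplified row layout: a null bitmap plus int64 columns. */
typedef struct Row
{
    uint8_t nullbits[1];        /* bit i set => column i is NULL */
    int64_t values[2];
} Row;

/* Generic deforming: must consult the null bitmap for every column. */
static void
deform_generic(const Row *row, int ncols, int64_t *out, bool *isnull)
{
    for (int i = 0; i < ncols; i++)
    {
        isnull[i] = (row->nullbits[i / 8] & (1 << (i % 8))) != 0;
        out[i] = isnull[i] ? 0 : row->values[i];
    }
}

/*
 * Specialized deforming for a known tupledesc where both columns are
 * NOT NULL: the bitmap access and the loop disappear entirely - the
 * dead code the link-constants-then-optimize approach needs an -O2-ish
 * pipeline to discover is simply never emitted.
 */
static void
deform_specialized_notnull2(const Row *row, int64_t *out, bool *isnull)
{
    isnull[0] = false;
    out[0] = row->values[0];
    isnull[1] = false;
    out[1] = row->values[1];
}
```

Both functions produce identical results for a conforming row; the specialized one is what direct emission buys before any optimizer runs.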
Re: STATISTICS retained in CREATE TABLE ... LIKE (INCLUDING ALL)?
On 27 January 2018 at 00:03, Tels wrote:
> Looking at the patch, at first I thought the order was sorted and you
> swapped STORAGE and STATISTICS by accident. But then, it seems the order
> is semi-random. Should that list be sorted or is it already sorted by some
> criteria that I don't see?
>
> - INCLUDING DEFAULTS INCLUDING IDENTITY INCLUDING CONSTRAINTS INCLUDING
>   INDEXES INCLUDING STORAGE INCLUDING COMMENTS.
> + INCLUDING DEFAULTS INCLUDING IDENTITY INCLUDING CONSTRAINTS INCLUDING
>   INDEXES INCLUDING STORAGE INCLUDING STATISTICS INCLUDING COMMENTS.

It looks like they were in order of how they're defined in enum TableLikeOption up until [1], then I'm not so sure what the new order is based on after that. I'd offer to put it back to the order of the enum, but I want to minimise the invasiveness of the patch. I'm not sure yet if it should be classed as a bug fix or a new feature.

On looking at this I realised I missed changing the syntax synopsis. The attached adds this.

[1] https://git.postgresql.org/gitweb/?p=postgresql.git;a=commit;h=3217327053638085d24dd4d276e7c1f7ac2c4c6b

--
David Rowley http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

create_table_like_statistics_v2.patch
Description: Binary data
Re: [HACKERS] Refactoring identifier checks to consistently use strcmp
> On 27 Jan 2018, at 00:36, Tom Lane wrote:
>
> I've pushed this mostly as-is.

Thanks!

> I also took out the parser changes related to
> allowing unquoted PARALLEL in old-style CREATE AGGREGATE, because that
> is not a goal I consider worthy of adding extra grammar complexity.
> We don't document that PARALLEL works there, and there has never been
> any expectation that deprecated legacy syntax would grow new options
> as needed to have feature parity with the modern syntax.

Ok, I didn’t know the old syntax wasn’t extended to support new options; I went after it having run into the regress tests using it.

> I also trimmed the new regression test cases a bit, as most of them seemed
> pretty duplicative.

Makes sense, they were made verbose to aid review but were too chatty for inclusion.

cheers ./daniel
Re: [HACKERS] PoC: custom signal handler for extensions
Hello,

On Mon, Jan 22, 2018 at 02:34:58PM +0300, Maksim Milyutin wrote:
> ...
> I have attached a new version of patch and updated version of
> remote_effective_user function implementation that demonstrates the usage of
> custom signals API.

Thank you. The patch applies and builds.

> +/*
> + * UnregisterCustomProcSignal
> + *     Release slot of specific custom signal.
> + *
> + * This function have to be called in _PG_init or _PG_fini functions of
> + * extensions at the stage of loading shared preloaded libraries. Otherwise it
> + * throws fatal error. Also it throws fatal error if argument is not valid
> + * custom signal.
> + */
> +void
> +UnregisterCustomProcSignal(ProcSignalReason reason)
> +{
> +    if (!process_shared_preload_libraries_in_progress)
> +        ereport(FATAL, (errcode(ERRCODE_INTERNAL_ERROR),
> +                errmsg("cannot unregister custom signal after startup")));

Is there an actual need for UnregisterCustomProcSignal() within _PG_init()? An extension registers a handler and then unregisters it, accomplishing nothing. It seems useless. Also, process_shared_preload_libraries_in_progress within _PG_fini() will be false, I think. _PG_fini() won't be called anyway, because the implementation of internal_unload_library() is disabled.

Actually, is there a need for UnregisterCustomProcSignal() at all? It unregisters a handler only in the current backend; for actual unregistering we need to call it everywhere, if I'm not mistaken.

--
Arthur Zakirov
Postgres Professional: http://www.postgrespro.com
Russian Postgres Company
Re: Write lifetime hints for NVMe
> On 27 January 2018 at 16:03, Tomas Vondra wrote:
>
> Aren't those numbers far lower than you'd expect from NVMe storage? I do
> have a NVMe drive (Intel 750) in my machine, and I can do thousands of
> transactions on it with two clients. Seems a bit suspicious.

Maybe NVMe storage can provide much higher numbers in general, but there are resource limitations from AWS itself. I was using c5.large, which is the smallest possible instance of type c5, so maybe that can explain the absolute numbers - but anyway I can recheck, just in case I missed something.
Re: \describe*
On Fri, Jan 26, 2018 at 04:28:24PM +0100, Vik Fearing wrote:
> On 01/26/2018 03:49 PM, David Fetter wrote:
> > I propose that we do what at least MySQL, Oracle, and DB2 do and
> > implement DESCRIBE as its own command.
> Hard pass.

Would you be so kind as to expand on this? "Pass" might indicate a lack of interest in doing the work, but "hard pass" seems to indicate that you have reasons the work should not be done. Have I interpreted this correctly?

Best, David.
--
David Fetter http://fetter.org/
Phone: +1 415 235 3778

Remember to vote! Consider donating to Postgres: http://www.postgresql.org/about/donate
Re: Add RANGE with values and exclusions clauses to the Window Functions
Oliver Ford writes:
> On Sat, Jan 27, 2018 at 7:40 AM, Erik Rijkers wrote:
>> Regression tests only succeed for assert-disabled compiles; they fail when
>> assert-enabled:

> Problem seems to be with an existing Assert in catcache.c:1545:
>     Assert(nkeys > 0 && nkeys < cache->cc_nkeys);
> The "<" needs to be "<=" (and is changed in the attached patch).

No, that Assert is correct, because it's in SearchCatCacheList. It doesn't make any sense to use SearchCatCacheList for a lookup that specifies all of the key columns, because then you necessarily have at most one match; you might as well use regular SearchCatCache, which is significantly more efficient.

> AFAICT this was never a problem before purely because no code before
> this patch called SearchSysCacheList4, so they always called with
> fewer keys than the number available.

Hm, probably we shouldn't even have that macro.

regards, tom lane
Re: Write lifetime hints for NVMe
On 01/27/2018 02:20 PM, Dmitry Dolgov wrote:
> Hi,
>
> From what I see some time ago the write lifetime hints support for NVMe multi
> streaming was merged into Linux kernel [1]. Theoretically it allows data
> written together on media so they can be erased together, which minimizes
> garbage collection, resulting in reduced write amplification as well as
> efficient flash utilization [2]. I couldn't find any discussion about that on
> hackers, so I decided to experiment with this feature a bit. My idea was to
> test quite naive approach when all file descriptors, that are related to
> temporary files, have assigned `RWH_WRITE_LIFE_SHORT`, and rest of them
> `RWH_WRITE_LIFE_EXTREME`. Attached patch is a dead simple POC without any
> infrastructure around to enable/disable hints.
>
> It turns out that it's possible to perform benchmarks on some EC2 instance
> types (e.g. c5) with the corresponding version of the kernel, since they
> expose a volume as nvme device:
>
> ```
> # nvme list
> Node          SN                    Model                       Namespace  Usage             Format       FW Rev
> ------------  --------------------  --------------------------  ---------  ----------------  -----------  ------
> /dev/nvme0n1  vol01cdbc7ec86f17346  Amazon Elastic Block Store  1          0.00 B / 8.59 GB  512 B + 0 B  1.0
> ```
>
> To get some baseline results I've run several rounds of pgbench on these quite
> modest instances (dedicated, with optimized EBS) with slightly adjusted
> `max_wal_size` and with default configuration:
>
> $ pgbench -s 200 -i
> $ pgbench -T 600 -c 2 -j 2
>
> Analyzing `strace` output I can see that during this test there were some
> significant number of operations with pg_stat_tmp and xlogtemp, so I assume
> write lifetime hints should have some effect.
>
> As a result I've got reduction of latency about 5-8% (but so far these numbers
> are unstable, probably because of virtualization).
>
> ```
> # without patch
> number of transactions actually processed: 491945
> latency average = 2.439 ms
> tps = 819.906323 (including connections establishing)
> tps = 819.908755 (excluding connections establishing)
> ```
>
> ```
> with patch
> number of transactions actually processed: 521805
> latency average = 2.300 ms
> tps = 869.665330 (including connections establishing)
> tps = 869.668026 (excluding connections establishing)
> ```

Aren't those numbers far lower than you'd expect from NVMe storage? I do have a NVMe drive (Intel 750) in my machine, and I can do thousands of transactions on it with two clients. Seems a bit suspicious.

regards
--
Tomas Vondra http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Re: [HACKERS] proposal: psql command \graw
2018-01-27 15:22 GMT+01:00 Daniel Verite:
> Pavel Stehule wrote:
>
> > We are able to generate html/tex/markdown formats on client side. CSV is
> > more primitive, but much more important data format, so it should not be a
> > problem. But I remember some objections related to code duplication.
>
> While experimenting with adding csv as an output format
> (similar to unaligned except that it quotes with CSV rules), I find out
> that we can already do this:
>
> \o | client-side-program with arguments
> or
> \o /path/to/file.csv
> COPY (select query) TO STDOUT CSV [HEADER] ;
> \o
>
> This can be used to route multiple resultsets into multiple programs
> or files without the minor drawbacks of \copy (single line
> formatting, no variables interpolation), or
> psql -c "COPY (query) TO STDOUT.." >file.csv which is a bit rigid
> (single redirect).
>
> OTOH I notice that this simpler form with a right-side argument of \g:
> =# COPY (select 1) TO STDOUT CSV HEADER \g /tmp/file.csv
> outputs to the console rather into a file. Not sure why it acts
> differently than \o and whether it's intentional.
>
> Anyway I think the above sequence with \o already does everything
> that is being proposed as an alternative to \graw, or am I missing
> something?

There is one difference, though it's hard to say how important it is - you have to wrap your query in a COPY command. That is not too friendly.

Regards
Pavel

> What \o + COPY CSV lacks compared to a csv output format
> is the ability to export results from meta-commands in csv,
> such as \g in \x mode or \gx, \crosstabview, \d, \l...
>
> The other problem with \o |program + COPY CSV is that it's
> not easy to discover that combination. It's more obvious if
> we have csv explicitly listed as an output format.
>
> I'll post the patch implementing that in a new thread and we'll see
> how it goes.
>
> Best regards,
> --
> Daniel Vérité
> PostgreSQL-powered mailer: http://www.manitou-mail.org
> Twitter: @DanielVerite
Re: [HACKERS] proposal: psql command \graw
Pavel Stehule wrote:
> We are able to generate html/tex/markdown formats on client side. CSV is
> more primitive, but much more important data format, so it should not be a
> problem. But I remember some objections related to code duplication.

While experimenting with adding csv as an output format (similar to unaligned except that it quotes with CSV rules), I found out that we can already do this:

\o | client-side-program with arguments
or
\o /path/to/file.csv
COPY (select query) TO STDOUT CSV [HEADER] ;
\o

This can be used to route multiple resultsets into multiple programs or files without the minor drawbacks of \copy (single line formatting, no variables interpolation), or psql -c "COPY (query) TO STDOUT.." >file.csv which is a bit rigid (single redirect).

OTOH I notice that this simpler form with a right-side argument of \g:

=# COPY (select 1) TO STDOUT CSV HEADER \g /tmp/file.csv

outputs to the console rather than into a file. Not sure why it acts differently than \o and whether it's intentional.

Anyway I think the above sequence with \o already does everything that is being proposed as an alternative to \graw, or am I missing something?

What \o + COPY CSV lacks compared to a csv output format is the ability to export results from meta-commands in csv, such as \g in \x mode or \gx, \crosstabview, \d, \l...

The other problem with \o |program + COPY CSV is that it's not easy to discover that combination. It's more obvious if we have csv explicitly listed as an output format.

I'll post the patch implementing that in a new thread and we'll see how it goes.

Best regards,
--
Daniel Vérité
PostgreSQL-powered mailer: http://www.manitou-mail.org
Twitter: @DanielVerite
Write lifetime hints for NVMe
Hi,

From what I see, some time ago write lifetime hints support for NVMe multi streaming was merged into the Linux kernel [1]. Theoretically it allows data to be written together on media so it can be erased together, which minimizes garbage collection, resulting in reduced write amplification as well as efficient flash utilization [2]. I couldn't find any discussion about that on hackers, so I decided to experiment with this feature a bit. My idea was to test a quite naive approach where all file descriptors that are related to temporary files have `RWH_WRITE_LIFE_SHORT` assigned, and the rest of them `RWH_WRITE_LIFE_EXTREME`. The attached patch is a dead simple POC without any infrastructure around to enable/disable hints.

It turns out that it's possible to perform benchmarks on some EC2 instance types (e.g. c5) with the corresponding version of the kernel, since they expose a volume as an nvme device:

```
# nvme list
Node          SN                    Model                       Namespace  Usage             Format       FW Rev
------------  --------------------  --------------------------  ---------  ----------------  -----------  ------
/dev/nvme0n1  vol01cdbc7ec86f17346  Amazon Elastic Block Store  1          0.00 B / 8.59 GB  512 B + 0 B  1.0
```

To get some baseline results I've run several rounds of pgbench on these quite modest instances (dedicated, with optimized EBS) with slightly adjusted `max_wal_size` and with default configuration:

$ pgbench -s 200 -i
$ pgbench -T 600 -c 2 -j 2

Analyzing `strace` output I can see that during this test there were a significant number of operations with pg_stat_tmp and xlogtemp, so I assume write lifetime hints should have some effect. As a result I've got a reduction in latency of about 5-8% (but so far these numbers are unstable, probably because of virtualization).
```
# without patch
number of transactions actually processed: 491945
latency average = 2.439 ms
tps = 819.906323 (including connections establishing)
tps = 819.908755 (excluding connections establishing)
```

```
with patch
number of transactions actually processed: 521805
latency average = 2.300 ms
tps = 869.665330 (including connections establishing)
tps = 869.668026 (excluding connections establishing)
```

So I have a few questions:

* Does it sound interesting and worthwhile to create a proper patch?
* Maybe someone else has similar results?
* Any suggestions about what can be the best/worst case scenarios of using such kind of hints?

[1]: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=c75b1d9421f80f4143e389d2d50ddfc8a28c8c35
[2]: https://regmedia.co.uk/2016/09/23/0_storage-intelligence-prodoverview-2015-0.pdf

nvme_write_lifetime_poc.patch
Description: Binary data
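[Editor's note: for anyone wanting to try this outside the patch - since Linux 4.13 the hints are exposed to userspace through fcntl(2). A minimal sketch; the fallback constant definitions mirror linux/fcntl.h and are only needed when building against older headers, and on kernels without support the fcntl call simply fails:]

```c
#include <fcntl.h>
#include <stdint.h>

/* Fallbacks for libc headers that predate Linux 4.13. */
#ifndef F_SET_RW_HINT
#define F_SET_RW_HINT          (1024 + 12)  /* F_LINUX_SPECIFIC_BASE + 12 */
#endif
#ifndef RWH_WRITE_LIFE_SHORT
#define RWH_WRITE_LIFE_SHORT   2
#endif
#ifndef RWH_WRITE_LIFE_EXTREME
#define RWH_WRITE_LIFE_EXTREME 5
#endif

/*
 * Attach a write lifetime hint to the file underlying fd.
 * Returns 0 on success, -1 on error (e.g. EINVAL on kernels
 * without F_SET_RW_HINT support, EBADF on a bad descriptor).
 */
static int
set_write_hint(int fd, uint64_t hint)
{
    /* F_SET_RW_HINT takes a pointer to a uint64_t hint value. */
    return fcntl(fd, F_SET_RW_HINT, &hint);
}
```

In the patch's terms, fds for temporary files would get `RWH_WRITE_LIFE_SHORT` and everything else `RWH_WRITE_LIFE_EXTREME`, right after the file is opened.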
Re: Setting BLCKSZ 4kB
On 01/27/2018 05:01 AM, Bruce Momjian wrote:
> On Fri, Jan 26, 2018 at 11:53:33PM +0100, Tomas Vondra wrote:
>>
>> ...
>>
>> FWIW even if it's not safe in general, it would be useful to
>> understand what are the requirements to make it work. I mean,
>> conditions that need to be met on various levels (sector size of
>> the storage device, page size of the file system, filesystem
>> alignment, ...).
>
> I think you are fine as soon as the data arrives at the durable
> storage, and assuming the data can't be partially written to durable
> storage. I was thinking more of a case where you have a file system,
> a RAID card without a BBU, and then magnetic disks. In that case,
> even if the file system were to write in 4k chunks, the RAID
> controller would also need to do the same, and with the same
> alignment. Of course, that's probably a silly example since there is
> probably no way to atomically write 4k to a magnetic disk.
>
> Actually, what happens if a 4k write is being written to an SSD and
> the server crashes. Is the entire write discarded?

AFAIK it's not possible to end up with a partial write, particularly not one that would contain a mix of old and new data - that's because SSDs can't overwrite a block without erasing it first. So the write should either succeed or fail as a whole, depending on when exactly the server crashes - it might be right before confirming the flush back to the client, for example. That assumes the drive has 4kB sectors (internal pages), and applies to drives with a volatile write cache that support write barriers and cache flushes. On drives with a non-volatile write cache (so with a battery/capacitor) it should always succeed and never get discarded.

regards
--
Tomas Vondra http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Re: Add RANGE with values and exclusions clauses to the Window Functions
On 2018-01-27 11:49, Oliver Ford wrote: On Sat, Jan 27, 2018 at 7:40 AM, Erik Rijkers wrote: On 2018-01-27 00:35, Oliver Ford wrote: Attached patch implements an extensible version of the RANGE with values clause. It doesn't actually add any more type support than was [...] I've tested that the existing regression tests in previous versions still pass, and also added new tests for descending mode. Hi, Regression tests only succeed for assert-disabled compiles; they fail when assert-enabled: I used (Centos 6.9): Could you please try the attached version? It works for me with asserts enabled. [0001-window-frame-v8.patch] Yes, that fixed it, thanks. Problem seems to be with an existing Assert in catcache.c:1545: Assert(nkeys > 0 && nkeys < cache->cc_nkeys); The "<" needs to be "<=" (and is changed in the attached patch). AFAICT this was never a problem before purely because no code before this patch called SearchSysCacheList4, so they always called with fewer keys than the number available. But it's surely correct to assert that the number of keys supplied is less than or equal to, not less than, the number of keys in the cache.
Re: General purpose hashing func in pgbench
Hello Fabien,

On 26/01/2018 09:28, Fabien COELHO wrote:
> Hello Ildar,
>
> Applies, compiles, runs.
>
> I still have a few very minor comments, sorry for this (hopefully)
> last iteration from my part. I'm kind of iterative...
>
> The XML documentation source should avoid a paragraph on one very long
> line, but rather be indented like other sections.
>
> I'd propose to simplify the second part:
>
> Hash functions can be used, for example, to modify the distribution of
> random_zipfian or random_exponential functions in order to obtain
> scattered distribution. Thus the following pgbench script simulates
> possible real world workload typical for social media and blogging
> platforms where few accounts generate excessive load:
>
> ->
>
> Hash functions can be used to scatter the distribution of random
> functions such as random_zipfian or random_exponential.
> For instance, the following pgbench script simulates possible real
> world workload typical for social media and blogging platforms where
> few accounts generate excessive load:
>
> Comment the Assert(0) as an internal error that cannot happen.
>
> I'd suggest to compact the execution code by declaring an int64 variable
> and coercing to int in one go, like the integer bitwise functions. I'm
> in favor of keeping them in their own case and not reusing this one.

I did everything you mention here and attached a new version of the patch. Thank you!
-- Ildar Musin Postgres Professional: http://www.postgrespro.com Russian Postgres Company diff --git a/doc/src/sgml/ref/pgbench.sgml b/doc/src/sgml/ref/pgbench.sgml index 3dd492c..503dd79 100644 --- a/doc/src/sgml/ref/pgbench.sgml +++ b/doc/src/sgml/ref/pgbench.sgml @@ -874,13 +874,18 @@ pgbench options d -scale - current scale factor +client_id + unique number identifying the client session (starts from zero) -client_id - unique number identifying the client session (starts from zero) +default_seed + random seed used in hash functions by default + + + +scale + current scale factor @@ -1246,6 +1251,27 @@ pgbench options d 5 + hash(a [, seed ] ) + integer + alias for hash_murmur2() + hash(10, 5432) + -5817877081768721676 + + + hash_fnv1a(a [, seed ] ) + integer + https://en.wikipedia.org/wiki/Fowler%E2%80%93Noll%E2%80%93Vo_hash_function;>FNV-1a hash + hash_fnv1a(10, 5432) + -7793829335365542153 + + + hash_murmur2(a [, seed ] ) + integer + https://en.wikipedia.org/wiki/MurmurHash;>MurmurHash2 hash + hash_murmur2(10, 5432) + -5817877081768721676 + + int(x) integer cast to int @@ -1423,6 +1449,30 @@ f(x) = PHI(2.0 * parameter * (x - mu) / (max - min + 1)) / +Hash functions hash, hash_murmur2 and +hash_fnv1a accept an input value and an optional seed parameter. +In case the seed isn't provided the value of :default_seed +is used, which is initialized randomly unless set by the command-line +-D option. Hash functions can be used to scatter the +distribution of random functions such as random_zipfian or +random_exponential. 
For instance, the following pgbench +script simulates possible real world workload typical for social media and +blogging platforms where few accounts generate excessive load: + + +\set r random_zipfian(0, 1, 1.07) +\set k abs(hash(:r)) % 100 + + +In some cases several distinct distributions are needed which don't correlate with each other and this is when implicit seed parameter comes in handy: + + +\set k1 abs(hash(:r), :default_seed + 123) % 100 +\set k2 abs(hash(:r), :default_seed + 321) % 100 + + + + As an example, the full definition of the built-in TPC-B-like transaction is: diff --git a/src/bin/pgbench/exprparse.y b/src/bin/pgbench/exprparse.y index e23ca51..fc42c47 100644 --- a/src/bin/pgbench/exprparse.y +++ b/src/bin/pgbench/exprparse.y @@ -16,6 +16,10 @@ #include "pgbench.h" +#define PGBENCH_NARGS_VARIABLE (-1) +#define PGBENCH_NARGS_CASE (-2) +#define PGBENCH_NARGS_HASH (-3) + PgBenchExpr *expr_parse_result; static PgBenchExprList *make_elist(PgBenchExpr *exp, PgBenchExprList *list); @@ -226,9 +230,13 @@ make_uop(yyscan_t yyscanner, const char *operator, PgBenchExpr *expr) /* * List of available functions: * - fname: function name, "!..." for special internal functions - * - nargs: number of arguments - * -1 is a special value for least & greatest meaning #args >= 1 - * -2 is for the "CASE WHEN ..." function, which has #args >= 3 and odd + * - nargs: number of arguments. Special cases: + * - PGBENCH_NARGS_VARIABLE is a special value for least
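[Editor's note: the FNV-1a variant named in the documentation above is the textbook algorithm; here is a sketch of its 64-bit core over a byte buffer. The patch's hash_fnv1a() additionally folds in a seed and operates on int64 values, so this is illustrative only, not the patch's exact implementation:]

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Textbook 64-bit FNV-1a: start from the offset basis, then for each
 * byte XOR it in and multiply by the FNV prime (mod 2^64, which
 * unsigned overflow gives us for free).
 */
static uint64_t
fnv1a_64(const unsigned char *buf, size_t len)
{
    uint64_t h = UINT64_C(0xcbf29ce484222325);  /* FNV offset basis */

    for (size_t i = 0; i < len; i++)
    {
        h ^= buf[i];
        h *= UINT64_C(0x100000001b3);           /* FNV prime */
    }
    return h;
}
```

Changing the seed (in pgbench's case, `:default_seed`) is what lets several hash-derived distributions stay uncorrelated, as the `\set k1`/`\set k2` example in the documentation shows.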
Re: [HACKERS] Parallel tuplesort (for parallel B-Tree index creation)
On Fri, Jan 26, 2018 at 12:36 PM, Amit Kapila wrote: > On Fri, Jan 26, 2018 at 12:00 PM, Peter Geoghegan wrote: >> On Thu, Jan 25, 2018 at 10:00 PM, Amit Kapila >> wrote: At this point, my preferred solution is for someone to go implement Amit's WaitForParallelWorkersToAttach() idea [1] (Amit himself seems like the logical person for the job). >>> >>> I can implement it and share a prototype patch with you which you can >>> use to test parallel sort stuff. >> >> That would be great. Thank you. >> >>> I would like to highlight the >>> difference which you will see with WaitForParallelWorkersToAttach as >>> compare to WaitForParallelWorkersToFinish() is that the former will >>> give you how many of nworkers_launched workers are actually launched >>> whereas latter gives an error if any of the expected workers is not >>> launched. I feel former is good and your proposed way of calling it >>> after the leader is done with its work has alleviated the minor >>> disadvantage of this API which is that we need for workers to startup. >> >> I'm not sure that it makes much difference, though, since in the end >> WaitForParallelWorkersToFinish() is called anyway, much like >> nodeGather.c. Have I missed something? >> > > Nopes, you are right. I had in my mind that if we have something like > what I am proposing, then we don't even need to detect failures in > WaitForParallelWorkersToFinish and we can finish the work without > failing. > >> I had imagined that WaitForParallelWorkersToAttach() would give me an >> error in the style of WaitForParallelWorkersToFinish(), without >> actually waiting for the parallel workers to finish. >> > > I think that is also doable. I will give it a try and report back if > I see any problem with it. > I have written the patch for the above API and posted it on a new thread [1]. Do let me know either here or on that thread whether the patch suffices for your need.
[1] - https://www.postgresql.org/message-id/CAA4eK1%2Be2MzyouF5bg%3DOtyhDSX%2B%3DAo%3D3htN%3DT-r_6s3gCtKFiw%40mail.gmail.com -- With Regards, Amit Kapila. EnterpriseDB: http://www.enterprisedb.com
Wait for parallel workers to attach
During the recent development of parallel operations (parallel create index) [1], a need has arisen for $SUBJECT. The idea is to allow the leader backend to rely on the number of workers that have successfully started. This API allows the leader to wait for all the workers to start, or to fail even if just one of the workers fails to attach. We consider workers started/attached once they are attached to the error queue. This will ensure that any error after the workers are attached won't be silently ignored by the leader.

I have used wait event WAIT_EVENT_BGWORKER_STARTUP similar to WaitForReplicationWorkerAttach, but we might want to change it.

I have tested this patch by calling this API in nodeGather.c and then introducing failures at various places: (a) induce fork failure for background workers (force_fork_failure_v1.patch), (b) exit the parallel worker before attaching to the error queue (exit_parallel_worker_before_error_queue_attach_v1.patch) and (c) exit the parallel worker after attaching to the error queue (exit_parallel_worker_after_error_queue_attach_v1.patch). In all the above cases, I got the errors as expected.

[1] - https://www.postgresql.org/message-id/CAA4eK1KgmdT3ivm1vG%2BrJzKOKeYQU2XLhp6ma5LMHxaG89brsA%40mail.gmail.com

--
With Regards, Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com

WaitForParallelWorkersToAttach_v1.patch
Description: Binary data
modify_gather_to_wait_for_attach_v1.patch
Description: Binary data
force_fork_failure_v1.patch
Description: Binary data
exit_parallel_worker_before_error_queue_attach_v1.patch
Description: Binary data
exit_parallel_worker_after_error_queue_attach_v1.patch
Description: Binary data
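[Editor's note: the attach-or-fail accounting described above can be sketched in miniature with pthreads. This is a toy model, not PostgreSQL's bgworker/error-queue machinery - the point is only the protocol: the leader blocks until every launched worker has either attached or failed, so that no later failure can be silently missed:]

```c
#include <pthread.h>
#include <stdbool.h>

/* Shared attach bookkeeping between the "leader" and its "workers". */
typedef struct AttachState
{
    pthread_mutex_t lock;
    pthread_cond_t  cv;
    int             nlaunched;   /* how many workers were launched */
    int             nattached;   /* how many attached successfully */
    int             nfailed;     /* how many failed before attaching */
} AttachState;

/* Called by each worker: report successful attach or startup failure. */
static void
report_attach(AttachState *st, bool ok)
{
    pthread_mutex_lock(&st->lock);
    if (ok)
        st->nattached++;
    else
        st->nfailed++;
    pthread_cond_signal(&st->cv);
    pthread_mutex_unlock(&st->lock);
}

/*
 * Called by the leader: block until every launched worker has attached
 * or failed, then return how many actually attached - the count the
 * leader can safely rely on afterwards.
 */
static int
wait_for_workers_to_attach(AttachState *st)
{
    pthread_mutex_lock(&st->lock);
    while (st->nattached + st->nfailed < st->nlaunched)
        pthread_cond_wait(&st->cv, &st->lock);
    pthread_mutex_unlock(&st->lock);
    return st->nattached;
}
```

The real API reports against nworkers_launched and treats attachment to the error queue as the attach event, but the blocking structure is the same.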