Re: [PERFORM] Hmmm... why does pl/pgsql code parallelise so badly when queries parallelise fine? Anyone else seen this?
On 07 Jul 2015, at 22:52, Merlin Moncure mmonc...@gmail.com wrote:
> On Tue, Jul 7, 2015 at 3:33 PM, Graeme B. Bell graeme.b...@nibio.no wrote:
>> Hi Merlin, Long story short - thanks for the reply, but you're not measuring anything about the parallelism of code running in a pl/pgsql environment here. You're just measuring whether postgres can parallelise entering that environment and getting back out. Don't get me wrong - it's great that this scales well, because it affects situations where you have lots of calls to trivial functions. However, it's not the problem I'm talking about. I mean 'real' pl/pgsql functions, e.g. things that you might find in postgis or similar.
> Maybe so. But it will be a lot easier for me (and others on this list) if you submit a self-contained test case that runs via pgbench.

Hi Merlin,

I'm guessing you are maybe pressed for time at the moment, because I already clearly included this in the last email, as well as the links to the alternative benchmarks with the same problem that I referred to in both of my last emails, which are also trivial to drop into pgbench (cut/paste). E.g. did you see these parts of my previous email?

"To clear up the issue I built a little test harness around your comment below. http://github.com/gbb/t"

Just pick any function you like; there are 6 there, and 3 of them demonstrate 2 different problems. All of it is clearly documented. I haven't used perf with pgbench before, and I can't run any code today. If you're interested in this but short on time, maybe you can glance at the repo above and just add 'perf' at the appropriate point in the rbuild wrapper.

Graeme.

--
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance
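[For readers following along: a minimal self-contained case of the kind Merlin asked for might look like the sketch below. The function name and loop body here are illustrative, not taken from the gbb/t repo; the point is only that a CPU-bound plpgsql function driven by pgbench at increasing client counts exposes the scaling behaviour being discussed.]

```sql
-- A deliberately CPU-bound plpgsql function. Left unmarked, it defaults
-- to VOLATILE, which is the worst case for the scaling problem in this thread.
CREATE OR REPLACE FUNCTION burn_cpu(n integer) RETURNS integer AS $$
DECLARE
    i   integer := 0;
    acc integer := 0;
BEGIN
    WHILE i < n LOOP
        acc := (acc + i) % 1000003;  -- pure arithmetic, no table access
        i := i + 1;
    END LOOP;
    RETURN acc;
END;
$$ LANGUAGE plpgsql;

-- Put "SELECT burn_cpu(100000);" in bench.sql, then compare tps as the
-- client count rises toward the core count, e.g.:
--   pgbench -n -f bench.sql -c 1 -j 1 -T 30 dbname
--   pgbench -n -f bench.sql -c 8 -j 8 -T 30 dbname
```

If tps stops improving well before the number of cores, that matches the behaviour reported in this thread.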
Re: [PERFORM] Hmmm... why does CPU-intensive pl/pgsql code parallelise so badly when queries parallelise fine? Anyone else seen this?
On 07/07/2015 08:05 PM, Craig James wrote:
> No ideas, but I ran into the same thing. I have a set of C/C++ functions that put some chemistry calculations into Postgres as extensions (things like, calculate the molecular weight of this molecule). As SQL functions, the whole thing bogged down, and we never got the scalability we needed. On our 8-CPU setup, we couldn't get more than 2 CPUs busy at the same time, even with dozens of clients.

Hi all,

The sample code / results were put up last night at http://github.com/gbb/t

Craig's problem sounds similar to my own, assuming he means running C indirectly via SQL vs. running C more directly: lots of parallel connections to postgres but a maximum of 2 CPUs' worth of scaling (and it gets worse as you try to run more things).

Tom Lane has posted an interesting comment over on the bugs list which identifies a likely source of at least one of the problems, maybe both. It seems to be linked to internal locking inside postgres (which makes sense, given the results - both problems feel 'lock-y'). Also, he mentions a workaround for some functions that apparently scales to 8-way.

http://www.postgresql.org/message-id/31265.1436317...@sss.pgh.pa.us

I think it's potentially a big problem for CPU-intensive postgres libraries like pgrouting, or perhaps the postgis raster functions, things like that. I don't know how well their functions are marked for e.g. immutability. Are there any postgis devs on this list?

Graeme Bell
Re: [PERFORM] Hmmm... why does pl/pgsql code parallelise so badly when queries parallelise fine? Anyone else seen this?
On 2015-07-08 11:13:04 +0000, Graeme B. Bell wrote:
> I'm guessing you are maybe pressed for time at the moment, because I already clearly included this in the last email, as well as the links to the alternative benchmarks with the same problem that I referred to in both of my last emails, which are also trivial to drop into pgbench (cut/paste).

You realize that you want something here, not Merlin, right?

> E.g. did you see these parts of my previous email?
> "To clear up the issue I built a little test harness around your comment below. http://github.com/gbb/t"

Well, that requires reviewing the source code of the run script and such.

I think we shouldn't discuss this on two threads (-performance, -bugs); that makes it hard to follow. Given Tom's more detailed answer, I think the -bugs thread already contains the more pertinent information.
Re: [PERFORM] wildcard text filter switched to boolean column, performance is way worse
Hello,

From: pgsql-performance-ow...@postgresql.org [mailto:pgsql-performance-ow...@postgresql.org] On Behalf Of Mike Broers
Sent: Dienstag, 7. Juli 2015 18:28
To: Tom Lane
Cc: pgsql-performance@postgresql.org
Subject: Re: [PERFORM] wildcard text filter switched to boolean column, performance is way worse

> After bumping up work_mem from 12MB to 25MB that last materialize is indeed hashing, and this cut the query time by about 60%. Thanks, this was very helpful and gives me something else to look for when troubleshooting explains.
>
> On Tue, Jul 7, 2015 at 11:10 AM, Mike Broers mbro...@gmail.com wrote:
>> Thanks, very informative! I'll experiment with work_mem settings and report back.
>
> On Tue, Jul 7, 2015 at 11:02 AM, Tom Lane t...@sss.pgh.pa.us wrote:
>> Mike Broers mbro...@gmail.com writes:
>>> I had a query that was filtering with a wildcard search of a text field for %SUCCESS%. The query took about 5 seconds and was running often, so I wanted to improve it. I suggested that the engineers include a new boolean column for successful status. They implemented the requested field, but the query that filters on that new column runs very long (I kill it after letting it run for about an hour). Can someone help me understand why that is the case and how to resolve it?
>>
>> It's hashing the subplan output in the first case and not the second:
>>
>>   Seq Scan on lead  (cost=130951.81..158059.21 rows=139957 width=369) (actual time=4699.619..4699.869 rows=1 loops=1)
>>     Filter: ((NOT (hashed SubPlan 1)) AND ((ReferenceNumber)::text <> ''::text) AND ((NOT (hashed SubPlan 2)) OR (NOT (hashed SubPlan 3))))
>>
>> vs.
>>
>>   Seq Scan on lead  (cost=85775.78..9005687281.12 rows=139957 width=369)
>>     Filter: ((NOT (hashed SubPlan 1)) AND ((ReferenceNumber)::text <> ''::text) AND ((NOT (hashed SubPlan 2)) OR (NOT (SubPlan 3))))
>>                                                                                                                          ^
>> Presumably, the new more-accurate rows count causes the planner to realize that the hash table will exceed work_mem, so it doesn't choose to hash ... but for your situation, you'd rather it did, because what you're getting instead is a Materialize node that spills to disk (again, because the data involved exceeds work_mem), and that's a killer for this query. You should be able to get back the old behavior if you raise work_mem enough.
>>
>> Another idea you might think about is changing the OR'd IN conditions to a single IN over a UNION ALL of the subselects. I'm not really sure if that would produce a better plan, but it's worth trying if it wouldn't require too much app-side contortion.
>>
>> regards, tom lane

Hello,

you might try to use a CTE to first collect the IDs to exclude, and join them to your main table. This should result in an anti-join pattern. Something like:

  WITH IDS AS (
      SELECT U1.lead_id AS lead_id FROM event U1 WHERE U1.event_type = 'type_1'
      UNION
      ( SELECT U1.lead_id AS lead_id FROM event U1 WHERE U1.event_type = 'type_2'
        INTERSECT
        SELECT U1.lead_id AS lead_id FROM event U1 WHERE successful )
  )
  SELECT *
  FROM lead
  LEFT OUTER JOIN IDS ON (lead.id = IDS.lead_id)
  WHERE IDS.lead_id IS NULL;

regards,
Marc Mamin
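[For reference: Tom's other suggestion - folding the OR'd IN conditions into a single IN over a UNION ALL of the subselects - might look roughly like the sketch below. Table and column names follow the plan fragments and Marc's CTE above; treat this as an illustration, not the poster's actual schema.]

```sql
-- Collect every excluded lead_id once, then test membership a single time.
-- Mirrors the structure of the original OR'd subplans.
SELECT l.*
FROM lead l
WHERE l.id NOT IN (
    SELECT U1.lead_id FROM event U1 WHERE U1.event_type = 'type_1'
    UNION ALL
    ( SELECT U1.lead_id FROM event U1 WHERE U1.event_type = 'type_2'
      INTERSECT
      SELECT U1.lead_id FROM event U1 WHERE successful )
);
```

One caveat: if event.lead_id can be NULL, NOT IN behaves surprisingly (no rows match), and a NOT EXISTS or the LEFT JOIN ... IS NULL anti-join form Marc shows is safer.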
Re: [PERFORM] Hmmm... why does CPU-intensive pl/pgsql code parallelise so badly when queries parallelise fine? Anyone else seen this?
On Wed, Jul 8, 2015 at 12:48 PM, Craig James cja...@emolecules.com wrote:
> On Tue, Jul 7, 2015 at 10:31 PM, Joshua D. Drake j...@commandprompt.com wrote:
>> On 07/07/2015 08:05 PM, Craig James wrote:
>>> No ideas, but I ran into the same thing. I have a set of C/C++ functions that put some chemistry calculations into Postgres as extensions (things like, calculate the molecular weight of this molecule). As SQL functions, the whole thing bogged down, and we never got the scalability we needed. On our 8-CPU setup, we couldn't get more than 2 CPUs busy at the same time, even with dozens of clients. When I moved these same functions into an Apache fast-CGI HTTP service (exact same code, same network overhead), I could easily scale up and use the full 100% of all eight CPUs. I have no idea why, and never investigated further. The convenience of having the functions in SQL wasn't that important.
>> I admit that I haven't read this whole thread, but: using Apache Fast-CGI, you are going to fork a process for each instance of the function being executed, and that in turn will use all CPUs up to the max available resource. With PostgreSQL, that isn't going to happen unless you are running (at least) 8 functions across 8 connections.
> Well, right, which is why I mentioned "even with dozens of clients". Shouldn't that scale to at least all of the CPUs in use if the function is CPU intensive (which it is)?

Only in the absence of inter-process locking and cache line bouncing.

merlin
Re: [PERFORM] Hmmm... why does CPU-intensive pl/pgsql code parallelise so badly when queries parallelise fine? Anyone else seen this?
On Tue, Jul 7, 2015 at 10:31 PM, Joshua D. Drake j...@commandprompt.com wrote:
> On 07/07/2015 08:05 PM, Craig James wrote:
>> No ideas, but I ran into the same thing. I have a set of C/C++ functions that put some chemistry calculations into Postgres as extensions (things like, calculate the molecular weight of this molecule). As SQL functions, the whole thing bogged down, and we never got the scalability we needed. On our 8-CPU setup, we couldn't get more than 2 CPUs busy at the same time, even with dozens of clients. When I moved these same functions into an Apache fast-CGI HTTP service (exact same code, same network overhead), I could easily scale up and use the full 100% of all eight CPUs. I have no idea why, and never investigated further. The convenience of having the functions in SQL wasn't that important.
> I admit that I haven't read this whole thread, but: using Apache Fast-CGI, you are going to fork a process for each instance of the function being executed, and that in turn will use all CPUs up to the max available resource. With PostgreSQL, that isn't going to happen unless you are running (at least) 8 functions across 8 connections.

Well, right, which is why I mentioned "even with dozens of clients". Shouldn't that scale to at least all of the CPUs in use if the function is CPU intensive (which it is)?

Craig

> JD
> --
> Command Prompt, Inc. - http://www.commandprompt.com/ 503-667-4564
> PostgreSQL Centered full stack support, consulting and development.
> Announcing "I'm offended" is basically telling the world you can't control your own emotions, so everyone else should do it for you.

--
Craig A. James
Chief Technology Officer
eMolecules, Inc.
Re: [PERFORM] Hmmm... why does CPU-intensive pl/pgsql code parallelise so badly when queries parallelise fine? Anyone else seen this?
On 07/08/2015 10:48 AM, Craig James wrote:
>> I admit that I haven't read this whole thread, but: using Apache Fast-CGI, you are going to fork a process for each instance of the function being executed, and that in turn will use all CPUs up to the max available resource. With PostgreSQL, that isn't going to happen unless you are running (at least) 8 functions across 8 connections.
> Well, right, which is why I mentioned "even with dozens of clients". Shouldn't that scale to at least all of the CPUs in use if the function is CPU intensive (which it is)?

In theory, but it isn't PostgreSQL that does that; it will be the kernel scheduler. Although (and I am grasping at straws): I wonder if the execution is taking place outside of the backend proper, or... are you using a pooler?

JD

--
Command Prompt, Inc. - http://www.commandprompt.com/ 503-667-4564
PostgreSQL Centered full stack support, consulting and development.
Announcing "I'm offended" is basically telling the world you can't control your own emotions, so everyone else should do it for you.
Re: [PERFORM] Hmmm... why does CPU-intensive pl/pgsql code parallelise so badly when queries parallelise fine? Anyone else seen this?
On Wed, Jul 8, 2015 at 10:52 AM, Joshua D. Drake j...@commandprompt.com wrote:
> On 07/08/2015 10:48 AM, Craig James wrote:
>>> I admit that I haven't read this whole thread, but: using Apache Fast-CGI, you are going to fork a process for each instance of the function being executed, and that in turn will use all CPUs up to the max available resource. With PostgreSQL, that isn't going to happen unless you are running (at least) 8 functions across 8 connections.
>> Well, right, which is why I mentioned "even with dozens of clients". Shouldn't that scale to at least all of the CPUs in use if the function is CPU intensive (which it is)?
> In theory, but it isn't PostgreSQL that does that; it will be the kernel scheduler. Although (and I am grasping at straws): I wonder if the execution is taking place outside of the backend proper, or... are you using a pooler?

No pooler, and the functions were in an ordinary SQL extension .so library and loaded as:

  CREATE OR REPLACE FUNCTION funcname( ... )
  RETURNS ...
  AS 'libxxx.so', 'funcname'
  LANGUAGE c STRICT IMMUTABLE COST 1;

Craig

--
Craig A. James
Chief Technology Officer
eMolecules, Inc.
Re: [PERFORM] Hmmm... why does CPU-intensive pl/pgsql code parallelise so badly when queries parallelise fine? Anyone else seen this?
On 2015-07-08 13:46:53 -0500, Merlin Moncure wrote:
> On Wed, Jul 8, 2015 at 12:48 PM, Craig James cja...@emolecules.com wrote:
>> On Tue, Jul 7, 2015 at 10:31 PM, Joshua D. Drake j...@commandprompt.com wrote:
>>> Using Apache Fast-CGI, you are going to fork a process for each instance of the function being executed, and that in turn will use all CPUs up to the max available resource. With PostgreSQL, that isn't going to happen unless you are running (at least) 8 functions across 8 connections.
>> Well, right, which is why I mentioned "even with dozens of clients". Shouldn't that scale to at least all of the CPUs in use if the function is CPU intensive (which it is)?
> Only in the absence of inter-process locking and cache line bouncing.

And additionally: memory bandwidth (shared between everything, even in the NUMA case), cross-socket/bus bandwidth (absolutely performance critical in multi-socket configurations), and cache capacity (shared between cores, and sometimes even sockets!).
Re: [PERFORM] Hmmm... why does CPU-intensive pl/pgsql code parallelise so badly when queries parallelise fine? Anyone else seen this?
and...@anarazel.de (Andres Freund) writes:
> On 2015-07-08 15:38:24 -0700, Craig James wrote:
>> From my admittedly naive point of view, it's hard to see why any of this matters. I have functions that do purely CPU-intensive mathematical calculations ... you could imagine something like is_prime(N) that determines if N is a prime number. I have eight clients that connect to eight backends. Each client issues an SQL command like, "select is_prime(N)" where N is a simple number.
> I mostly replied to Merlin's general point (additionally in the context of plpgsql). But I have a hard time seeing that postgres would be the bottleneck for an is_prime() function (or something with similar characteristics) that's written in C, where the average runtime is more than, say, a couple thousand cycles. I'd like to see a profile of that.

But that was not the case that Graeme was complaining about. He's talking about simple-arithmetic-and-looping written in plpgsql, in a volatile function that is going to take a new snapshot for every statement, even if that's only "n := n + 1". So it's going to spend a substantial fraction of its runtime banging on the ProcArray, and that doesn't scale. If you write your is_prime function purely in plpgsql, and don't bother to mark it nonvolatile, *it will not scale*. It'll be slow even in single-thread terms, but it'll be particularly bad if you're saturating a multicore machine with it.

One of my Salesforce colleagues has been looking into ways that we could decide to skip the per-statement snapshot acquisition even in volatile functions, if we could be sure that a particular statement isn't going to do anything that would need a snapshot. Now, IMO that doesn't really do much for properly written plpgsql; but there's an awful lot of bad plpgsql code out there, and it can make a huge difference for that.
regards, tom lane
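[Tom's distinction above can be made concrete. The sketch below uses the hypothetical is_prime() function from upthread; the only scaling-relevant difference from a naive version is the volatility marking on the last line.]

```sql
-- Pure arithmetic in plpgsql. Without an explicit volatility category,
-- this would default to VOLATILE, and every statement in the loop body
-- would acquire a fresh snapshot (the ProcArray contention Tom describes).
CREATE OR REPLACE FUNCTION is_prime(n bigint) RETURNS boolean AS $$
DECLARE
    d bigint := 2;
BEGIN
    IF n < 2 THEN
        RETURN false;
    END IF;
    WHILE d * d <= n LOOP
        IF n % d = 0 THEN
            RETURN false;
        END IF;
        d := d + 1;
    END LOOP;
    RETURN true;
END;
$$ LANGUAGE plpgsql IMMUTABLE;  -- no table access, so IMMUTABLE is safe here
```

Marking the function IMMUTABLE (or at least STABLE) lets the executor skip the per-statement snapshot work, which is the "nonvolatile" workaround Tom refers to; the function body itself is unchanged.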
Re: [PERFORM] Hmmm... why does CPU-intensive pl/pgsql code parallelise so badly when queries parallelise fine? Anyone else seen this?
On Wed, Jul 8, 2015 at 1:27 PM, Andres Freund and...@anarazel.de wrote:
> On 2015-07-08 13:46:53 -0500, Merlin Moncure wrote:
>> On Wed, Jul 8, 2015 at 12:48 PM, Craig James cja...@emolecules.com wrote:
>>> Well, right, which is why I mentioned "even with dozens of clients". Shouldn't that scale to at least all of the CPUs in use if the function is CPU intensive (which it is)?
>> Only in the absence of inter-process locking and cache line bouncing.
> And additionally: memory bandwidth (shared between everything, even in the NUMA case), cross-socket/bus bandwidth (absolutely performance critical in multi-socket configurations), and cache capacity (shared between cores, and sometimes even sockets!).

From my admittedly naive point of view, it's hard to see why any of this matters. I have functions that do purely CPU-intensive mathematical calculations ... you could imagine something like is_prime(N) that determines if N is a prime number. I have eight clients that connect to eight backends. Each client issues an SQL command like, "select is_prime(N)" where N is a simple number.

Are you saying that in order to calculate is_prime(N), all of that stuff (inter-process locking, memory bandwidth, bus bandwidth, cache capacity, etc.) is even relevant? And if so, how is it that Postgres is so different from an Apache fast-CGI program that runs the exact same is_prime(N) calculation?

Just curious ... as I said, I've already implemented a different solution.

Craig
Re: [PERFORM] Hmmm... why does CPU-intensive pl/pgsql code parallelise so badly when queries parallelise fine? Anyone else seen this?
On 2015-07-08 15:38:24 -0700, Craig James wrote:
> From my admittedly naive point of view, it's hard to see why any of this matters. I have functions that do purely CPU-intensive mathematical calculations ... you could imagine something like is_prime(N) that determines if N is a prime number. I have eight clients that connect to eight backends. Each client issues an SQL command like, "select is_prime(N)" where N is a simple number.

I mostly replied to Merlin's general point (additionally in the context of plpgsql). But I have a hard time seeing that postgres would be the bottleneck for an is_prime() function (or something with similar characteristics) that's written in C, where the average runtime is more than, say, a couple thousand cycles. I'd like to see a profile of that.

Andres