Re: [PERFORM] Scaling up PostgreSQL in Multiple CPU / Dual Core
Added to TODO:

* Experiment with multi-threaded backend for better resource utilization

  This would allow a single query to make use of multiple CPUs or
  multiple I/O channels simultaneously.

---

Chris Browne wrote:
> [EMAIL PROTECTED] (Michael Stone) writes:
> > Of course, table scanning is going to be IO limited in most cases,
> > and having every query spawn 16 independent IO threads is likely to
> > slow things down in more cases than it speeds them up. It could work
> > if you have a bunch of storage devices, but at that point it's
> > probably easier and more direct to implement a clustered approach.
> All stipulated, yes. It obviously wouldn't be terribly useful to scan
> more aggressively than I/O bandwidth can support. The point is that
> this is one of the kinds of places where concurrent processing could
> do some good...

-- 
Bruce Momjian   http://candle.pha.pa.us
+ If your life is a hard drive, Christ can be your backup. +

---(end of broadcast)---
TIP 6: explain analyze is your friend
Re: [PERFORM] Scaling up PostgreSQL in Multiple CPU / Dual Core
On Thu, Mar 23, 2006 at 09:22:34PM -0500, Christopher Browne wrote:
> Martha Stewart called it a Good Thing when [EMAIL PROTECTED] (Scott Marlowe) wrote:
> > On Thu, 2006-03-23 at 10:43, Joshua D. Drake wrote:
> > > > > Has someone been working on the problem of splitting a query
> > > > > into pieces and running it on multiple CPUs / multiple machines?
> > > > Yes. Bizgress has done that.
> > > I believe that is limited to Bizgress MPP, yes?
> > Yep. I hope that someday it will be released to the postgresql global
> > dev group for inclusion. Or at least parts of it.
> Question: Does Bizgress/MPP use threading for this concurrency? Or
> forking?
> If it does so via forking, that's more portable, and less dependent on
> specific complexities of threading implementations (which amounts to
> non-portability ;-)).
> Most times Jan comes to town, we spend a few minutes musing about the
> "splitting queries across threads" problem, and dismiss it again; if
> there's the beginning of a split across processes, that's decidedly
> neat :-).

Correct me if I'm wrong, but there's no way to (reasonably) accomplish
that without having some dedicated extra processes lying around that
you can use to execute the queries, no? In other words, the cost of a
fork() during query execution would be too prohibitive...

FWIW, DB2 executes all queries in a dedicated set of processes. The
process handling the connection from the client will pass a query
request off to one of the executor processes. I can't remember which
process actually plans the query, but I know that the executor runs it.

-- 
Jim C. Nasby, Sr. Engineering Consultant   [EMAIL PROTECTED]
Pervasive Software   http://pervasive.com   work: 512-231-6117
vcard: http://jim.nasby.net/pervasive.vcf   cell: 512-569-9461
Re: [PERFORM] Scaling up PostgreSQL in Multiple CPU / Dual Core
Christopher,

On 3/23/06 6:22 PM, Christopher Browne [EMAIL PROTECTED] wrote:
> Question: Does Bizgress/MPP use threading for this concurrency? Or
> forking?
> If it does so via forking, that's more portable, and less dependent on
> specific complexities of threading implementations (which amounts to
> non-portability ;-)).

OK - I'll byte:

It's process based. We fork backends at slice points in the execution
plan. To take care of the startup latency problem, we persist sets of
these backends, called "gangs". They appear, persist for connection
scope for reuse, then are disbanded.

> Most times Jan comes to town, we spend a few minutes musing about the
> "splitting queries across threads" problem, and dismiss it again; if
> there's the beginning of a split across processes, that's decidedly
> neat :-).

:-)

- Luke
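The "gang" idea above -- fork worker backends once, then keep them alive
for the life of the connection so the fork() cost is paid only on the
first query -- can be sketched in miniature with a persistent process
pool. This is purely an illustration of the pattern, not Bizgress/MPP
code; the task shape and names are hypothetical:

```python
from multiprocessing import Pool

def run_slice(task):
    """Stand-in for executing one slice of a query plan."""
    lo, hi = task
    return sum(range(lo, hi))

if __name__ == "__main__":
    # The "gang": worker processes forked once, then reused for the
    # whole connection's lifetime instead of fork()ing per query.
    with Pool(processes=4) as gang:
        # First query: each worker handles one slice of the plan.
        r1 = gang.map(run_slice, [(0, 25), (25, 50), (50, 75), (75, 100)])
        # Second query reuses the same gang -- no new startup cost.
        r2 = gang.map(run_slice, [(0, 50), (50, 100)])
    print(sum(r1), sum(r2))  # both equal sum(range(100)) == 4950
```

The design choice being illustrated is exactly the startup-latency
trade-off Luke describes: the pool amortizes process creation across
every query the connection runs.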
Re: [PERFORM] Scaling up PostgreSQL in Multiple CPU / Dual Core
[EMAIL PROTECTED] (Jim C. Nasby) writes:
> Correct me if I'm wrong, but there's no way to (reasonably) accomplish
> that without having some dedicated extra processes lying around that
> you can use to execute the queries, no? In other words, the cost of a
> fork() during query execution would be too prohibitive...

Counterexample...

The sort of scenario we keep musing about is where you split off a
(thread|process) for each partition of a big table. There is in fact a
natural such partitioning, in that tables get split at the 1GB mark,
by default.

Consider doing a join against 2 tables that are each 8GB in size
(e.g. - they consist of 8 data files). Let's assume that the query
plan indicates doing seq scans on both.

You *know* you'll be reading through 16 files, each 1GB in size.
Spawning a process for each of those files doesn't strike me as
prohibitively expensive. A naive read on this is that you might start
with one backend process, which then spawns 16 more. Each of those
backends is scanning through one of those 16 files; they then throw
relevant tuples into shared memory to be aggregated/joined by the
central one.

That particular scenario is one where the fork()s would hardly be
noticeable.

> FWIW, DB2 executes all queries in a dedicated set of processes. The
> process handling the connection from the client will pass a query
> request off to one of the executor processes. I can't remember which
> process actually plans the query, but I know that the executor runs it.

It seems to me that the kinds of cases where extra processes/threads
would be warranted are quite likely to be cases where fork()ing may be
an immaterial cost.

-- 
let name=cbbrowne and tld=ntlug.org in String.concat @ [name;tld];;
http://www.ntlug.org/~cbbrowne/languages.html
TECO Madness: a moment of convenience, a lifetime of regret.
-- Dave Moon
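The scenario above -- a central backend spawning one worker per table
segment, with each scanner feeding matching tuples back for central
aggregation -- can be sketched with a small process-based example. This
is a hypothetical illustration (Python's multiprocessing stands in for
fork() plus shared memory, and four small lists stand in for the 16 x
1GB segment files):

```python
from multiprocessing import Process, Queue

def scan_segment(segment, predicate, out):
    """Worker: scan one 'segment file' and ship matching tuples to the leader."""
    out.put([row for row in segment if predicate(row)])

def is_multiple_of_50(row):
    # Stand-in for the query's scan qualifier.
    return row % 50 == 0

if __name__ == "__main__":
    # Four segments stand in for the per-1GB data files of a table.
    segments = [list(range(i * 100, (i + 1) * 100)) for i in range(4)]
    out = Queue()
    workers = [Process(target=scan_segment, args=(seg, is_multiple_of_50, out))
               for seg in segments]
    for w in workers:
        w.start()
    # Leader: aggregate/join the tuples the parallel scanners send back.
    matched = []
    for _ in workers:
        matched.extend(out.get())
    for w in workers:
        w.join()
    print(sorted(matched))  # the multiples of 50 below 400
```

As the thread notes, whether this actually wins depends on I/O layout;
the sketch only shows that the fork-per-segment plumbing itself is cheap
and simple.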
Re: [PERFORM] Scaling up PostgreSQL in Multiple CPU / Dual Core
[EMAIL PROTECTED] (Luke Lonergan) writes:
> OK - I'll byte:
> It's process based. We fork backends at slice points in the execution
> plan.

By "slice points", do you mean that you'd try to partition tables
(e.g. - if there's a Seq Scan on a table with 8 1GB segments, you
could spawn as many as 8 processes), or that two scans that are then
merge joined means a process for each scan, and a process for the
merge join?

Or perhaps both :-). Or perhaps something else entirely ;-).

> To take care of the startup latency problem, we persist sets of these
> backends, called "gangs". They appear, persist for connection scope
> for reuse, then are disbanded.

If only that could happen to more gangs...

-- 
output = (cbbrowne @ cbbrowne.com)
http://cbbrowne.com/info/multiplexor.html
"I'm sorry, the teleportation booth you have reached is not in service
at this time. Please hand-reassemble your molecules or call an
operator to help you."
Re: [PERFORM] Scaling up PostgreSQL in Multiple CPU / Dual Core
On Fri, Mar 24, 2006 at 01:21:23PM -0500, Chris Browne wrote:
> A naive read on this is that you might start with one backend process,
> which then spawns 16 more. Each of those backends is scanning through
> one of those 16 files; they then throw relevant tuples into shared
> memory to be aggregated/joined by the central one.

Of course, table scanning is going to be IO limited in most cases, and
having every query spawn 16 independent IO threads is likely to slow
things down in more cases than it speeds them up. It could work if you
have a bunch of storage devices, but at that point it's probably
easier and more direct to implement a clustered approach.

Mike Stone
Re: [PERFORM] Scaling up PostgreSQL in Multiple CPU / Dual Core
On Fri, Mar 24, 2006 at 01:21:23PM -0500, Chris Browne wrote:
> > Correct me if I'm wrong, but there's no way to (reasonably)
> > accomplish that without having some dedicated extra processes lying
> > around that you can use to execute the queries, no? In other words,
> > the cost of a fork() during query execution would be too
> > prohibitive...
> Counterexample...
> The sort of scenario we keep musing about is where you split off a
> (thread|process) for each partition of a big table. There is in fact
> a natural such partitioning, in that tables get split at the 1GB
> mark, by default.
> Consider doing a join against 2 tables that are each 8GB in size
> (e.g. - they consist of 8 data files). Let's assume that the query
> plan indicates doing seq scans on both.
> You *know* you'll be reading through 16 files, each 1GB in size.
> Spawning a process for each of those files doesn't strike me as
> prohibitively expensive.

Have you ever tried reading from 2 large files on a disk at the same
time, let alone 16? The results ain't pretty. What you're suggesting
maybe makes sense if the two tables are in different tablespaces,
provided you have some additional means to know whether those two
tablespaces are on the same set of spindles. Though even here the
usefulness is somewhat suspect, because CPU is a hell of a lot faster
than disks are, unless you have a whole lot of disks. Of course, this
is exactly the target market for MPP.

Where parallel execution really makes sense is when you're doing
things like sorts or hash operations, because those are relatively
CPU-intensive.

-- 
Jim C. Nasby, Sr. Engineering Consultant   [EMAIL PROTECTED]
Pervasive Software   http://pervasive.com   work: 512-231-6117
vcard: http://jim.nasby.net/pervasive.vcf   cell: 512-569-9461
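The point about CPU-bound operations being the better target for
parallelism can be illustrated with a toy parallel sort: partition the
input across worker processes, sort each partition in parallel (the
CPU-heavy step), and have the leader merge the sorted runs. A purely
illustrative sketch, not PostgreSQL or MPP code:

```python
import heapq
from multiprocessing import Pool

def sort_partition(chunk):
    """Worker: the CPU-intensive part (sorting) runs in parallel."""
    return sorted(chunk)

if __name__ == "__main__":
    data = [(i * 7919) % 1000 for i in range(10000)]  # pseudo-random keys
    # Split the input into 4 partitions, one per worker process.
    chunks = [data[i::4] for i in range(4)]
    with Pool(processes=4) as pool:
        runs = pool.map(sort_partition, chunks)
    # Leader merges the sorted runs -- an O(n log k) pass that is cheap
    # compared to the O(n log n) sorting done by the workers.
    result = list(heapq.merge(*runs))
    print(result == sorted(data))  # True
```

Unlike the 16-way scan, this parallelizes compute rather than disk
reads, so it does not fight over a single set of spindles.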
Re: [PERFORM] Scaling up PostgreSQL in Multiple CPU / Dual Core
[EMAIL PROTECTED] (Michael Stone) writes:
> Of course, table scanning is going to be IO limited in most cases, and
> having every query spawn 16 independent IO threads is likely to slow
> things down in more cases than it speeds them up. It could work if you
> have a bunch of storage devices, but at that point it's probably
> easier and more direct to implement a clustered approach.

All stipulated, yes. It obviously wouldn't be terribly useful to scan
more aggressively than I/O bandwidth can support. The point is that
this is one of the kinds of places where concurrent processing could
do some good...

-- 
let name=cbbrowne and tld=acm.org in name ^ @ ^ tld;;
http://cbbrowne.com/info/spiritual.html
"Save the whales. Collect the whole set."
Re: [PERFORM] Scaling up PostgreSQL in Multiple CPU / Dual Core Powered Servers
[EMAIL PROTECTED] (Jojo Paderes) wrote:
> I'd like to know if the latest PostgreSQL release can scale up by
> utilizing multiple CPUs or dual core CPUs to speed up SQL execution.
> I have already done some research on the PostgreSQL mailing list
> archives and only found old threads dating back to 2000. A lot of
> things have improved in PostgreSQL, and hopefully support for
> multiple CPUs or dual cores is already provided.

If you submit multiple concurrent queries, they can be concurrently
processed on separate CPUs; that has long been supported, and people
have been using SMP systems to this end for years.

-- 
let name=cbbrowne and tld=gmail.com in String.concat @ [name;tld];;
http://cbbrowne.com/info/spreadsheets.html
"In other words -- and this is the rock solid principle on which the
whole of the Corporation's Galaxy-wide success is founded -- their
fundamental design flaws are completely hidden by their superficial
design flaws." -- HHGTG
Re: [PERFORM] Scaling up PostgreSQL in Multiple CPU / Dual Core
On Thu, 23 Mar 2006 14:19:24 +0800 Jojo Paderes [EMAIL PROTECTED] wrote:
> I'd like to know if the latest PostgreSQL release can scale up by
> utilizing multiple CPUs or dual core CPUs to speed up SQL execution.
> I have already done some research on the PostgreSQL mailing list
> archives and only found old threads dating back to 2000. A lot of
> things have improved in PostgreSQL, and hopefully support for
> multiple CPUs or dual cores is already provided.

Yes, PostgreSQL can take advantage of multiple CPUs and cores, and has
been able to for quite some time.

-
Frank Wiles [EMAIL PROTECTED]
http://www.wiles.org
-
Re: [PERFORM] Scaling up PostgreSQL in Multiple CPU / Dual Core
On Thu, 2006-03-23 at 00:19, Jojo Paderes wrote:
> I'd like to know if the latest PostgreSQL release can scale up by
> utilizing multiple CPUs or dual core CPUs to speed up SQL execution.
> I have already done some research on the PostgreSQL mailing list
> archives and only found old threads dating back to 2000. A lot of
> things have improved in PostgreSQL, and hopefully support for
> multiple CPUs or dual cores is already provided.

Can a single query be split up into parts and run on separate
processors at the same time? No.

Can multiple incoming queries be run on different processors for
better performance? Yes.

Has someone been working on the problem of splitting a query into
pieces and running it on multiple CPUs / multiple machines? Yes.
Bizgress has done that.
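The "yes" case -- multiple independent queries running concurrently,
each on its own backend, so an SMP box can schedule them on different
CPUs -- can be sketched as follows. This is an illustration of the
principle only: SQLite stands in for PostgreSQL, and threads stand in
for separate client connections, each of which gets its own private
database connection just as each PostgreSQL client gets its own
backend process:

```python
import sqlite3
from concurrent.futures import ThreadPoolExecutor

def run_query(client_id):
    # One connection per client, analogous to one backend per client.
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE t(v INTEGER)")
    conn.executemany("INSERT INTO t VALUES (?)", [(i,) for i in range(100)])
    (total,) = conn.execute("SELECT SUM(v) FROM t").fetchone()
    conn.close()
    return client_id, total

if __name__ == "__main__":
    # Four clients submit queries at the same time; the queries are
    # independent, so the OS is free to run them on separate CPUs.
    with ThreadPoolExecutor(max_workers=4) as ex:
        results = dict(ex.map(run_query, range(4)))
    print(results)  # each client gets sum(range(100)) == 4950
```

This is inter-query parallelism; the "no" case above (splitting one
query across processors) is exactly what the rest of this thread is
debating.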
Re: [PERFORM] Scaling up PostgreSQL in Multiple CPU / Dual Core
> > Has someone been working on the problem of splitting a query into
> > pieces and running it on multiple CPUs / multiple machines?
> Yes. Bizgress has done that.

I believe that is limited to Bizgress MPP, yes?
Re: [PERFORM] Scaling up PostgreSQL in Multiple CPU / Dual Core
On Thu, 2006-03-23 at 10:43, Joshua D. Drake wrote:
> > > Has someone been working on the problem of splitting a query into
> > > pieces and running it on multiple CPUs / multiple machines?
> > Yes. Bizgress has done that.
> I believe that is limited to Bizgress MPP, yes?

Yep. I hope that someday it will be released to the postgresql global
dev group for inclusion. Or at least parts of it.
Re: [PERFORM] Scaling up PostgreSQL in Multiple CPU / Dual Core
Martha Stewart called it a Good Thing when [EMAIL PROTECTED] (Scott Marlowe) wrote:
> On Thu, 2006-03-23 at 10:43, Joshua D. Drake wrote:
> > I believe that is limited to Bizgress MPP, yes?
> Yep. I hope that someday it will be released to the postgresql global
> dev group for inclusion. Or at least parts of it.

Question: Does Bizgress/MPP use threading for this concurrency? Or
forking?

If it does so via forking, that's more portable, and less dependent on
specific complexities of threading implementations (which amounts to
non-portability ;-)).

Most times Jan comes to town, we spend a few minutes musing about the
"splitting queries across threads" problem, and dismiss it again; if
there's the beginning of a split across processes, that's decidedly
neat :-).

-- 
output = reverse(moc.liamg @ enworbbc)
http://linuxfinances.info/info/internet.html
"Why do we put suits in a garment bag, and put garments in a suitcase?"
[PERFORM] Scaling up PostgreSQL in Multiple CPU / Dual Core Powered Servers
I'd like to know if the latest PostgreSQL release can scale up by
utilizing multiple CPUs or dual core CPUs to speed up SQL execution.
I have already done some research on the PostgreSQL mailing list
archives and only found old threads dating back to 2000. A lot of
things have improved in PostgreSQL, and hopefully support for multiple
CPUs or dual cores is already provided.

-- 
http://jojopaderes.multiply.com
http://jojopaderes.wordpress.com