Re: [PERFORM] Scaling up PostgreSQL in Multiple CPU / Dual Core

2006-04-09 Thread Bruce Momjian

Added to TODO:

* Experiment with multi-threaded backend better resource utilization
  This would allow a single query to make use of multiple CPU's or
  multiple I/O channels simultaneously.


---

Chris Browne wrote:
 [EMAIL PROTECTED] (Michael Stone) writes:
 
  On Fri, Mar 24, 2006 at 01:21:23PM -0500, Chris Browne wrote:
 A naive read on this is that you might start with one backend process,
 which then spawns 16 more.  Each of those backends is scanning through
 one of those 16 files; they then throw relevant tuples into shared
 memory to be aggregated/joined by the central one.
 
  Of course, table scanning is going to be IO limited in most cases, and
  having every query spawn 16 independent IO threads is likely to slow
  things down in more cases than it speeds them up. It could work if you
  have a bunch of storage devices, but at that point it's probably
  easier and more direct to implement a clustered approach.
 
 All stipulated, yes.  It obviously wouldn't be terribly useful to scan
 more aggressively than I/O bandwidth can support.  The point is that
 this is one of the kinds of places where concurrent processing could
 do some good...
 -- 
 let name=cbbrowne and tld=acm.org in name ^ @ ^ tld;;
 http://cbbrowne.com/info/spiritual.html
 Save the whales. Collect the whole set. 
 
 ---(end of broadcast)---
 TIP 4: Have you searched our list archives?
 
http://archives.postgresql.org
 

-- 
  Bruce Momjian   http://candle.pha.pa.us

  + If your life is a hard drive, Christ can be your backup. +

---(end of broadcast)---
TIP 6: explain analyze is your friend


Re: [PERFORM] Scaling up PostgreSQL in Multiple CPU / Dual Core

2006-03-24 Thread Jim C. Nasby
On Thu, Mar 23, 2006 at 09:22:34PM -0500, Christopher Browne wrote:
 Martha Stewart called it a Good Thing when [EMAIL PROTECTED] (Scott Marlowe) 
 wrote:
  On Thu, 2006-03-23 at 10:43, Joshua D. Drake wrote:
   Has someone been working on the problem of splitting a query into pieces
   and running it on multiple CPUs / multiple machines?  Yes.  Bizgress has
   done that.  
  
  I believe that is limited to Bizgress MPP yes?
 
  Yep.  I hope that someday it will be released to the postgresql global
  dev group for inclusion.  Or at least parts of it.
 
 Question: Does the Bizgress/MPP use threading for this concurrency?
 Or forking?
 
 If it does so via forking, that's more portable, and less dependent on
 specific complexities of threading implementations (which amounts to
 non-portability ;-)).
 
 Most times Jan comes to town, we spend a few minutes musing about the
 splitting queries across threads problem, and dismiss it again; if
 there's the beginning of a split across processes, that's decidedly
 neat :-).

Correct me if I'm wrong, but there's no way to (reasonably) accomplish
that without having some dedicated extra processes laying around that
you can use to execute the queries, no? In other words, the cost of a
fork() during query execution would be too prohibitive...

FWIW, DB2 executes all queries in a dedicated set of processes. The
process handling the connection from the client will pass a query
request off to one of the executor processes. I can't remember which
process actually plans the query, but I know that the executor runs it.
-- 
Jim C. Nasby, Sr. Engineering Consultant  [EMAIL PROTECTED]
Pervasive Software  http://pervasive.comwork: 512-231-6117
vcard: http://jim.nasby.net/pervasive.vcf   cell: 512-569-9461

---(end of broadcast)---
TIP 3: Have you checked our extensive FAQ?

   http://www.postgresql.org/docs/faq


Re: [PERFORM] Scaling up PostgreSQL in Multiple CPU / Dual Core

2006-03-24 Thread Luke Lonergan
Christopher,

On 3/23/06 6:22 PM, Christopher Browne [EMAIL PROTECTED] wrote:

 Question: Does the Bizgress/MPP use threading for this concurrency?
 Or forking?
 
 If it does so via forking, that's more portable, and less dependent on
 specific complexities of threading implementations (which amounts to
 non-portability ;-)).

OK - I'll byte:

It's process based, we fork backends at slice points in the execution plan.

To take care of the startup latency problem, we persist sets of these
backends, called gangs.  They appear, persist for connection scope for
reuse, then are disbanded.

 Most times Jan comes to town, we spend a few minutes musing about the
 splitting queries across threads problem, and dismiss it again; if
 there's the beginning of a split across processes, that's decidedly
 neat :-).

:-)

- Luke



---(end of broadcast)---
TIP 3: Have you checked our extensive FAQ?

   http://www.postgresql.org/docs/faq


Re: [PERFORM] Scaling up PostgreSQL in Multiple CPU / Dual Core

2006-03-24 Thread Chris Browne
[EMAIL PROTECTED] (Jim C. Nasby) writes:
 On Thu, Mar 23, 2006 at 09:22:34PM -0500, Christopher Browne wrote:
 Martha Stewart called it a Good Thing when [EMAIL PROTECTED] (Scott Marlowe) 
 wrote:
  On Thu, 2006-03-23 at 10:43, Joshua D. Drake wrote:
   Has someone been working on the problem of splitting a query into pieces
   and running it on multiple CPUs / multiple machines?  Yes.  Bizgress has
   done that.  
  
  I believe that is limited to Bizgress MPP yes?
 
  Yep.  I hope that someday it will be released to the postgresql global
  dev group for inclusion.  Or at least parts of it.
 
 Question: Does the Bizgress/MPP use threading for this concurrency?
 Or forking?
 
 If it does so via forking, that's more portable, and less dependent on
 specific complexities of threading implementations (which amounts to
 non-portability ;-)).
 
 Most times Jan comes to town, we spend a few minutes musing about the
 splitting queries across threads problem, and dismiss it again; if
 there's the beginning of a split across processes, that's decidedly
 neat :-).

 Correct me if I'm wrong, but there's no way to (reasonably) accomplish
 that without having some dedicated extra processes laying around that
 you can use to execute the queries, no? In other words, the cost of a
 fork() during query execution would be too prohibitive...

Counterexample...

The sort of scenario we keep musing about is where you split off a
(thread|process) for each partition of a big table.  There is in fact
a natural such partitioning, in that tables get split at the 1GB mark,
by default.

Consider doing a join against 2 tables that are each 8GB in size
(e.g. - they consist of 8 data files).  Let's assume that the query
plan indicates doing seq scans on both.

You *know* you'll be reading through 16 files, each 1GB in size.
Spawning a process for each of those files doesn't strike me as
prohibitively expensive.

A naive read on this is that you might start with one backend process,
which then spawns 16 more.  Each of those backends is scanning through
one of those 16 files; they then throw relevant tuples into shared
memory to be aggregated/joined by the central one.

That particular scenario is one where the fork()s would hardly be
noticeable.

 FWIW, DB2 executes all queries in a dedicated set of processes. The
 process handling the connection from the client will pass a query
 request off to one of the executor processes. I can't remember which
 process actually plans the query, but I know that the executor runs
 it.

It seems to me that the kinds of cases where extra processes/threads
would be warranted are quite likely to be cases where fork()ing may be
an immaterial cost.
-- 
let name=cbbrowne and tld=ntlug.org in String.concat @ [name;tld];;
http://www.ntlug.org/~cbbrowne/languages.html
TECO Madness: a moment of convenience, a lifetime of regret.
-- Dave Moon

---(end of broadcast)---
TIP 9: In versions below 8.0, the planner will ignore your desire to
   choose an index scan if your joining column's datatypes do not
   match


Re: [PERFORM] Scaling up PostgreSQL in Multiple CPU / Dual Core

2006-03-24 Thread Chris Browne
[EMAIL PROTECTED] (Luke Lonergan) writes:
 Christopher,

 On 3/23/06 6:22 PM, Christopher Browne [EMAIL PROTECTED] wrote:

 Question: Does the Bizgress/MPP use threading for this concurrency?
 Or forking?
 
 If it does so via forking, that's more portable, and less dependent on
 specific complexities of threading implementations (which amounts to
 non-portability ;-)).

 OK - I'll byte:

 It's process based, we fork backends at slice points in the execution plan.

By slice points, do you mean that you'd try to partition tables
(e.g. - if there's a Seq Scan on a table with 8 1GB segments, you
could spawn as many as 8 processes), or that two scans that are then
merge joined means a process for each scan, and a process for the
merge join?  Or perhaps both :-).  Or perhaps something else entirely ;-).

 To take care of the startup latency problem, we persist sets of
 these backends, called gangs.  They appear, persist for connection
 scope for reuse, then are disbanded.

If only that could happen to more gangs...
-- 
output = (cbbrowne @ cbbrowne.com)
http://cbbrowne.com/info/multiplexor.html
I'm sorry, the teleportation booth you have reached is not in service
at this  time.   Please  hand-reassemble  your molecules or   call  an
operator to help you

---(end of broadcast)---
TIP 2: Don't 'kill -9' the postmaster


Re: [PERFORM] Scaling up PostgreSQL in Multiple CPU / Dual Core

2006-03-24 Thread Michael Stone

On Fri, Mar 24, 2006 at 01:21:23PM -0500, Chris Browne wrote:

A naive read on this is that you might start with one backend process,
which then spawns 16 more.  Each of those backends is scanning through
one of those 16 files; they then throw relevant tuples into shared
memory to be aggregated/joined by the central one.


Of course, table scanning is going to be IO limited in most cases, and 
having every query spawn 16 independent IO threads is likely to slow 
things down in more cases than it speeds them up. It could work if you 
have a bunch of storage devices, but at that point it's probably easier 
and more direct to implement a clustered approach.


Mike Stone

---(end of broadcast)---
TIP 5: don't forget to increase your free space map settings


Re: [PERFORM] Scaling up PostgreSQL in Multiple CPU / Dual Core

2006-03-24 Thread Jim C. Nasby
On Fri, Mar 24, 2006 at 01:21:23PM -0500, Chris Browne wrote:
  Correct me if I'm wrong, but there's no way to (reasonably) accomplish
  that without having some dedicated extra processes laying around that
  you can use to execute the queries, no? In other words, the cost of a
  fork() during query execution would be too prohibitive...
 
 Counterexample...
 
 The sort of scenario we keep musing about is where you split off a
 (thread|process) for each partition of a big table.  There is in fact
 a natural such partitioning, in that tables get split at the 1GB mark,
 by default.
 
 Consider doing a join against 2 tables that are each 8GB in size
 (e.g. - they consist of 8 data files).  Let's assume that the query
 plan indicates doing seq scans on both.
 
 You *know* you'll be reading through 16 files, each 1GB in size.
 Spawning a process for each of those files doesn't strike me as
 prohibitively expensive.

Have you ever tried reading from 2 large files on a disk at the same
time, let alone 16? The results ain't pretty.

What you're suggesting maybe makes sense if the two tables are in
different tablespaces, provided you have some additional means to know
if those two tablespaces are on the same set of spindles. Though even
here the usefulness is somewhat suspect, because CPU is a hell of a lot
faster than disks are, unless you have a whole lot of disks. Of course,
this is exactly the target market for MPP.

Where parallel execution really makes sense is when you're doing things
like sorts or hash operations, because those are relatively
CPU-intensive.
-- 
Jim C. Nasby, Sr. Engineering Consultant  [EMAIL PROTECTED]
Pervasive Software  http://pervasive.comwork: 512-231-6117
vcard: http://jim.nasby.net/pervasive.vcf   cell: 512-569-9461

---(end of broadcast)---
TIP 9: In versions below 8.0, the planner will ignore your desire to
   choose an index scan if your joining column's datatypes do not
   match


Re: [PERFORM] Scaling up PostgreSQL in Multiple CPU / Dual Core

2006-03-24 Thread Chris Browne
[EMAIL PROTECTED] (Michael Stone) writes:

 On Fri, Mar 24, 2006 at 01:21:23PM -0500, Chris Browne wrote:
A naive read on this is that you might start with one backend process,
which then spawns 16 more.  Each of those backends is scanning through
one of those 16 files; they then throw relevant tuples into shared
memory to be aggregated/joined by the central one.

 Of course, table scanning is going to be IO limited in most cases, and
 having every query spawn 16 independent IO threads is likely to slow
 things down in more cases than it speeds them up. It could work if you
 have a bunch of storage devices, but at that point it's probably
 easier and more direct to implement a clustered approach.

All stipulated, yes.  It obviously wouldn't be terribly useful to scan
more aggressively than I/O bandwidth can support.  The point is that
this is one of the kinds of places where concurrent processing could
do some good...
-- 
let name=cbbrowne and tld=acm.org in name ^ @ ^ tld;;
http://cbbrowne.com/info/spiritual.html
Save the whales. Collect the whole set. 

---(end of broadcast)---
TIP 4: Have you searched our list archives?

   http://archives.postgresql.org


Re: [PERFORM] Scaling up PostgreSQL in Multiple CPU / Dual Core Powered Servers

2006-03-23 Thread Christopher Browne
[EMAIL PROTECTED] (Jojo Paderes) wrote:
 I'd like to know if the latest PostgreSQL release can scale up by
 utilizing multiple cpu or dual core cpu to boost up the sql
 executions.

 I already do a research on the PostgreSQL mailing archives and only
 found old threads dating back 2000. A lot of things have improved with
 PostgreSQL and hopefully the support for multiple cpu or dual cores is
 already provided.

If you submit multiple concurrent queries, they can be concurrently
processed on separate CPUs; that has long been supported, and people
have been using SMP systems to this end for years.
-- 
let name=cbbrowne and tld=gmail.com in String.concat @ [name;tld];;
http://cbbrowne.com/info/spreadsheets.html
In other words  -- and this is the rock solid  principle on which the
whole  of the Corporation's  Galaxy-wide success  is founded  -- their
fundamental design  flaws are  completely hidden by  their superficial
design flaws. -- HHGTG

---(end of broadcast)---
TIP 3: Have you checked our extensive FAQ?

   http://www.postgresql.org/docs/faq


Re: [PERFORM] Scaling up PostgreSQL in Multiple CPU / Dual Core

2006-03-23 Thread Frank Wiles
On Thu, 23 Mar 2006 14:19:24 +0800
Jojo Paderes [EMAIL PROTECTED] wrote:

 I'd like to know if the latest PostgreSQL release can scale up by
 utilizing multiple cpu or dual core cpu to boost up the sql
 executions.
 
 I already do a research on the PostgreSQL mailing archives and only
 found old threads dating back 2000. A lot of things have improved with
 PostgreSQL and hopefully the support for multiple cpu or dual cores is
 already provided.

  Yes PostgreSQL can take advantage of multiple CPUs and core, has been
  able to for quite some time. 

 -
   Frank Wiles [EMAIL PROTECTED]
   http://www.wiles.org
 -


---(end of broadcast)---
TIP 9: In versions below 8.0, the planner will ignore your desire to
   choose an index scan if your joining column's datatypes do not
   match


Re: [PERFORM] Scaling up PostgreSQL in Multiple CPU / Dual Core

2006-03-23 Thread Scott Marlowe
On Thu, 2006-03-23 at 00:19, Jojo Paderes wrote:
 I'd like to know if the latest PostgreSQL release can scale up by
 utilizing multiple cpu or dual core cpu to boost up the sql
 executions.
 
 I already do a research on the PostgreSQL mailing archives and only
 found old threads dating back 2000. A lot of things have improved with
 PostgreSQL and hopefully the support for multiple cpu or dual cores is
 already provided.

Can a single query be split up into parts and run on separate processors
at the same time?  No.

Can multiple incoming queries be run on different processors for better
performance?  Yes.

Has someone been working on the problem of splitting a query into pieces
and running it on multiple CPUs / multiple machines?  Yes.  Bizgress has
done that.  

---(end of broadcast)---
TIP 9: In versions below 8.0, the planner will ignore your desire to
   choose an index scan if your joining column's datatypes do not
   match


Re: [PERFORM] Scaling up PostgreSQL in Multiple CPU / Dual Core

2006-03-23 Thread Joshua D. Drake



Has someone been working on the problem of splitting a query into pieces
and running it on multiple CPUs / multiple machines?  Yes.  Bizgress has
done that.  


I believe that is limited to Bizgress MPP yes?




---(end of broadcast)---
TIP 9: In versions below 8.0, the planner will ignore your desire to
   choose an index scan if your joining column's datatypes do not
   match




---(end of broadcast)---
TIP 6: explain analyze is your friend


Re: [PERFORM] Scaling up PostgreSQL in Multiple CPU / Dual Core

2006-03-23 Thread Scott Marlowe
On Thu, 2006-03-23 at 10:43, Joshua D. Drake wrote:
  Has someone been working on the problem of splitting a query into pieces
  and running it on multiple CPUs / multiple machines?  Yes.  Bizgress has
  done that.  
 
 I believe that is limited to Bizgress MPP yes?

Yep.  I hope that someday it will be released to the postgresql global
dev group for inclusion.  Or at least parts of it.

---(end of broadcast)---
TIP 9: In versions below 8.0, the planner will ignore your desire to
   choose an index scan if your joining column's datatypes do not
   match


Re: [PERFORM] Scaling up PostgreSQL in Multiple CPU / Dual Core

2006-03-23 Thread Christopher Browne
Martha Stewart called it a Good Thing when [EMAIL PROTECTED] (Scott Marlowe) 
wrote:
 On Thu, 2006-03-23 at 10:43, Joshua D. Drake wrote:
  Has someone been working on the problem of splitting a query into pieces
  and running it on multiple CPUs / multiple machines?  Yes.  Bizgress has
  done that.  
 
 I believe that is limited to Bizgress MPP yes?

 Yep.  I hope that someday it will be released to the postgresql global
 dev group for inclusion.  Or at least parts of it.

Question: Does the Bizgress/MPP use threading for this concurrency?
Or forking?

If it does so via forking, that's more portable, and less dependent on
specific complexities of threading implementations (which amounts to
non-portability ;-)).

Most times Jan comes to town, we spend a few minutes musing about the
splitting queries across threads problem, and dismiss it again; if
there's the beginning of a split across processes, that's decidedly
neat :-).
-- 
output = reverse(moc.liamg @ enworbbc)
http://linuxfinances.info/info/internet.html
Why do we put suits in a garment bag, and put garments in a suitcase? 

---(end of broadcast)---
TIP 5: don't forget to increase your free space map settings


[PERFORM] Scaling up PostgreSQL in Multiple CPU / Dual Core Powered Servers

2006-03-22 Thread Jojo Paderes
I'd like to know if the latest PostgreSQL release can scale up by
utilizing multiple cpu or dual core cpu to boost up the sql
executions.

I already do a research on the PostgreSQL mailing archives and only
found old threads dating back 2000. A lot of things have improved with
PostgreSQL and hopefully the support for multiple cpu or dual cores is
already provided.

--
http://jojopaderes.multiply.com
http://jojopaderes.wordpress.com

---(end of broadcast)---
TIP 6: explain analyze is your friend