Re: [HACKERS] Support Parallel Query Execution in Executor

2006-04-13 Thread Bruce Momjian

Enhanced TODO:

* Experiment with multi-threaded backend for better resource utilization

  This would allow a single query to make use of multiple CPU's or
  multiple I/O channels simultaneously.  One idea is to create a
  background reader that can pre-fetch sequential and index scan
  pages needed by other backends.  This could be expanded to allow
  concurrent reads from multiple devices in a partitioned table.

The issue of parallelism is basically a sliding scale: the question is what
you want to do concurrently.  In this case, we are trying to do I/O and CPU
work concurrently.  Another idea is to do I/O for partitioned tables
concurrently, and of course there are many concurrent-CPU cases such as
sorting.

---

Tom Lane wrote:
 Myron Scott [EMAIL PROTECTED] writes:
  Gregory Maxwell wrote:
  There are other cases where it is useful to perform parallel I/O
  without parallel processing..
 
  I have done some testing more along these lines with an old fork of
  postgres code (2001).  In my tests, I used a thread to delegate out
  the actual heap scan of the SeqScan.  The job of the slave thread
  was to fault in buffer pages and determine the time validity of
  the tuples.  ItemPointers are passed back to the master thread via a
  common memory area guarded by mutex locking.
 
 I was considering a variant idea in the shower this morning: suppose
 that we invent one or more background reader processes that have
 basically the same infrastructure as the background writer, but have
 the responsibility of causing buffer reads to happen at useful times
 (whereas the writer causes writes to happen at useful times).  The
 idea would be for backends to signal the readers when they know they
 will need a given block soon, and then hopefully when they need it
 it'll already be in shared buffers.  For instance, in a seqscan it'd be
 pretty trivial to request block N+1 just after reading block N, and then
 doing our actual processing on block N while (we hope) some reader
 process is faulting in N+1.  Bitmap indexscans could use this structure
 too; I'm less sure about whether plain indexscans could do much with it
 though.
 
 The major issues I can see are:
 
 1. We'd need a shared-memory queue of read requests, probably much like
 the queue of fsync requests.  We've already seen problems with
 contention for the fsync queue, IIRC, and that's used much less heavily
 than the read request queue would be.  So there might be some
 performance issues with getting the block requests sent over to the
 readers.
 
 2. There are some low-level assumptions that no one reads in pages of
 a relation without having some kind of lock on the relation (consider
 eg the case where the relation is being dropped).  A bgwriter-like
 process wouldn't be able to hold lmgr locks, and we wouldn't really want
 it to be thrashing the lmgr shared data structures for each read anyway.
 So you'd have to design some interlock to guarantee that no backend
 abandons a query (and releases its own lmgr locks) while an async read
 request it made is still pending.  Ugh.
 
   regards, tom lane
 
 

-- 
  Bruce Momjian   http://candle.pha.pa.us
  EnterpriseDB    http://www.enterprisedb.com

  + If your life is a hard drive, Christ can be your backup. +



Re: [HACKERS] Support Parallel Query Execution in Executor

2006-04-11 Thread Myron Scott
On Mon, 2006-04-10 at 02:16, Martijn van Oosterhout wrote:

 There appear to be two separate cases here though: one is to just farm
 out the read request to another process (basically aio), the other is
 to do actual processing there.  The latter is obviously far more useful
 but requires a fair bit more infrastructure.
 

I ran some tests to see where time is spent during SeqScans.  I did the
following.

tester=# vacuum analyze verbose test;
INFO:  vacuuming "public.test"
INFO:  "test": found 0 removable, 727960 nonremovable row versions in 5353 pages
DETAIL:  0 dead row versions cannot be removed yet.
There were 0 unused item pointers.
0 pages are entirely empty.
CPU 0.18s/0.27u sec elapsed 0.91 sec.
INFO:  analyzing "public.test"
INFO:  "test": scanned 3000 of 5353 pages, containing 407952 live rows and 0 dead rows; 3000 rows in sample, 727922 estimated total rows
VACUUM
tester=# select version();
                                version
--------------------------------------------------------------------------
 PostgreSQL 8.2devel on sparc-sun-solaris2.11, compiled by GCC gcc (GCC) 3.3.2
(1 row)

tester=# select count(random) from test;
 count
--------
 727960
(1 row)


With the following dtrace results...

# ./probediff2.d 514607
dtrace: script './probediff2.d' matched 10 probes
CPU     ID                    FUNCTION:NAME
  0  46811         ExecEndSeqScan:return scan time 20406
^C

  smgrread                                                  641566800
  Virtualized - smgrread                                    439798800
  smgread - Call Count                                           5353
  HeapTupleSatisfiesSnapshot                               6735471000
  Virtualized - HeapTupleSatisfiesSnapshot                 3516556800
  HeapTupleSatisfiesSnapshot - Call Count                      727960
  Virtualized - ReadBuffer                                  558230600
  ReadBuffer                                                864931000
  Virtualized - ExecutePlan                                7331181400
  Virtualized - ExecSeqScan                                7331349600
  ExecutePlan                                             20405943000
  ExecSeqScan                                             20406161000


The virtualized times are supposed to be actual time spent on the CPU,
with the time spent in the probe factored out.  It seems here that half
the time in SeqScan is spent validating the tuples, as opposed to 1/10th
doing I/O.  I'm not sure that just farming out read I/O is going to be
all that helpful in this situation.  That's why I think it's a good idea
to create a slave process that prefetches pages and transfers valid
ItemPointers to the master.  There may not be much to be gained on simple
SeqScans; however, in complex queries that include a SeqScan you may gain
a lot by offloading this work onto a slave thread.  A table with TOAST'ed
attributes comes to mind.  The slave thread could be working away on the
rest of the table while the master is PG_DETOAST_DATUM'ing the attributes
for transmission back to the client or additional processing.  Am I
missing something in this analysis?

I've attached my dtrace script.

Myron Scott

#!/usr/sbin/dtrace -s
pid$1::ExecInitSeqScan:entry
{
ts = timestamp;
vts = vtimestamp;
timeon = 1;
}
pid$1::ExecEndSeqScan:return
/ts/
{
printf("scan time %d", (timestamp - ts) / 1000000);
@val["ExecSeqScan"] = sum(timestamp - ts);
@val["Virtualized - ExecSeqScan"] = sum(vtimestamp - vts);
ts = 0;
vts = 0;
timeon = 0;
}
pid$1::HeapTupleSatisfiesSnapshot:entry
/timeon/
{
validity = timestamp;
vvalidity = vtimestamp;
}
pid$1::HeapTupleSatisfiesSnapshot:return
/validity/
{
@val["HeapTupleSatisfiesSnapshot"] = sum(timestamp - validity);
@val["Virtualized - HeapTupleSatisfiesSnapshot"] = sum(vtimestamp - vvalidity);
@val["HeapTupleSatisfiesSnapshot - Call Count"] = sum(1);
validity = 0;
vvalidity = 0;
}
pid$1::smgrread:entry
/timeon/
{
rt = timestamp;
vrt = vtimestamp;
}
pid$1::smgrread:return
/rt/
{
@val["smgrread"] = sum(timestamp - rt);
@val["Virtualized - smgrread"] = sum(vtimestamp - vrt);
@val["smgread - Call Count"] = sum(1);
rt = 0;
vrt = 0;
}
pid$1::ReadBuffer:entry
/timeon/
{
brt = timestamp;
vbrt = vtimestamp;
}
pid$1::ReadBuffer:return
/brt/
{
@val["ReadBuffer"] = sum(timestamp - brt);
@val["Virtualized - ReadBuffer"] = sum(vtimestamp - vbrt);
brt = 0;
vbrt = 0;
}

Re: [HACKERS] Support Parallel Query Execution in Executor

2006-04-11 Thread Greg Stark
Alvaro Herrera [EMAIL PROTECTED] writes:

 An idea arising in chat with Joshua Drake:  the retargetting code, if it
 turns out to work and not be excessively expensive, could also be useful
 to implement a server-side connection pooling of sorts: the postmaster
 could keep idle backends and retarget them to a database that receives
 an incoming connection.  However, we'd also need a mechanism to clean
 all backend state previous to reusing a connection, to leave it as
 new (no prepared statements, WITH HOLD cursors, etc.)

Isn't all that work pretty much exactly the main cost of starting a new
backend?


-- 
greg




Re: [HACKERS] Support Parallel Query Execution in Executor

2006-04-11 Thread Alvaro Herrera
Greg Stark wrote:
 Alvaro Herrera [EMAIL PROTECTED] writes:
 
  An idea arising in chat with Joshua Drake:  the retargetting code, if it
  turns out to work and not be excessively expensive, could also be useful
  to implement a server-side connection pooling of sorts: the postmaster
  could keep idle backends and retarget them to a database that receives
  an incoming connection.  However, we'd also need a mechanism to clean
  all backend state previous to reusing a connection, to leave it as
  new (no prepared statements, WITH HOLD cursors, etc.)
 
 Isn't all that work pretty much exactly the main cost of starting a new
 backend?

On Linux and other systems where fork() has negligible cost, maybe; but
on Windows and Solaris, it's certainly not.

-- 
Alvaro Herrera                       http://www.CommandPrompt.com/
The PostgreSQL Company - Command Prompt, Inc.



Re: [HACKERS] Support Parallel Query Execution in Executor

2006-04-11 Thread Greg Stark

Simon Riggs [EMAIL PROTECTED] writes:

 I think it would be useful to think about exactly what type of
 query/activity we are looking to improve the performance on. That way we
 can understand the benefit of this proposal and take some baseline
 measurements to analyse what is happening for those cases.

I find the focus on sequential scans, index scans, etc. quite odd when you're
discussing parallel query processing. The whole goal of parallel query
processing is to bring more *cpu* to bear on the problem. That's going to be
most relevant when you're cpu bound, not i/o bound.

The queries I would expect to be helped most by parallel query processing are
queries that involve sorting. For example, a big merge join with two sorts on
either side could perform the two sorts simultaneously. If they provide the
results of the final pass to a third thread it can execute the merge join and
the rest of the query plan while the sorts are still executing on two other
processors.


-- 
greg




Re: [HACKERS] Support Parallel Query Execution in Executor

2006-04-11 Thread Myron Scott
On Tue, 2006-04-11 at 07:47, Myron Scott wrote:
 client 
 or additional processing.  Am I missing something in this analysis?
 
 I've attached my dtrace script.
 

To answer my own question, I suppose my processors are
relatively slow compared to most setups.

Myron Scott




Re: [HACKERS] Support Parallel Query Execution in Executor

2006-04-11 Thread Greg Stark

Alvaro Herrera [EMAIL PROTECTED] writes:

 Greg Stark wrote:
  Alvaro Herrera [EMAIL PROTECTED] writes:
  
   An idea arising in chat with Joshua Drake:  the retargetting code, if it
   turns out to work and not be excessively expensive, could also be useful
   to implement a server-side connection pooling of sorts: the postmaster
   could keep idle backends and retarget them to a database that receives
   an incoming connection.  However, we'd also need a mechanism to clean
   all backend state previous to reusing a connection, to leave it as
   new (no prepared statements, WITH HOLD cursors, etc.)
  
  Isn't all that work pretty much exactly the main cost of starting a new
  backend?
 
 On Linux and other systems where fork() has negligible cost, maybe; but
 on Windows and Solaris, it's certainly not.

Even on Solaris I'm sure parsing and preparing plans for all the queries,
building up the system table cache for all the objects in the database, and so
on are much much more expensive than fork(). I wouldn't be surprised if even
on windows it was still a pretty close race.

-- 
greg




Re: [HACKERS] Support Parallel Query Execution in Executor

2006-04-11 Thread Alvaro Herrera
Greg Stark wrote:

 Even on Solaris I'm sure parsing and preparing plans for all the queries,
 building up the system table cache for all the objects in the database, and so
 on are much much more expensive than fork(). I wouldn't be surprised if even
 on windows it was still a pretty close race.

Parsing/planning what queries?  Regarding system caches, they are
populated from a cache file; they are not read from the catalogs each
time.

But while we don't see a patch implementing the idea, this is all very
theoretical and probably wrong.

-- 
Alvaro Herrera                       http://www.CommandPrompt.com/
PostgreSQL Replication, Consulting, Custom Development, 24x7 support



Re: [HACKERS] Support Parallel Query Execution in Executor

2006-04-11 Thread Joshua D. Drake

Greg Stark wrote:

Simon Riggs [EMAIL PROTECTED] writes:


I think it would be useful to think about exactly what type of
query/activity we are looking to improve the performance on. That way we
can understand the benefit of this proposal and take some baseline
measurements to analyse what is happening for those cases.


I find the focus on sequential scans, index scans, etc. quite odd when you're
discussing parallel query processing. The whole goal of parallel query
processing is to bring more *cpu* to bear on the problem. That's going to be
most relevant when you're cpu bound, not i/o bound.

The queries I would expect to be helped most by parallel query processing are
queries that involve sorting. For example, a big merge join with two sorts on
either side could perform the two sorts simultaneously. If they provide the
results of the final pass to a third thread it can execute the merge join and
the rest of the query plan while the sorts are still executing on two other
processors.


Well, I am going to go out on a limb here and guess that sequential
scans and index scans are still very relevant, because the CPU could be
bound by the scan (either one) based on the type of query being performed.


This doesn't really have anything to do with being IO bound so much as
with the type of accesses to the data we have to deal with in regards to
query processing.


You are correct about parallel query processing helping multiple sort
queries, but those sorts may or may not hit an index.


Sincerely,

Joshua D. Drake








--

=== The PostgreSQL Company: Command Prompt, Inc. ===
  Sales/Support: +1.503.667.4564 || 24x7/Emergency: +1.800.492.2240
  Providing the most comprehensive  PostgreSQL solutions since 1997
 http://www.commandprompt.com/





Re: [HACKERS] Support Parallel Query Execution in Executor

2006-04-11 Thread Greg Stark
Joshua D. Drake [EMAIL PROTECTED] writes:

 Well, I am going to go out on a limb here and guess that sequential scans and
 index scans are still very relevant because the CPU could be bound by the scan
 (either one) based on the type of query being performed.
 
 This doesn't really have anything to do with being IO bound as to the type of
 accesses to the data we have to deal with in regards to query processing.

It has everything to do with being i/o bound. The only way having two
processors perform part of an index or sequential scan would help is if your
disk subsystem is capable of providing data faster than a single processor is
capable of requesting it.

That's only going to be true for very high end systems with multiple raid
controllers and dozens of spindles.

On the other hand even moderately sized dual-core systems could probably
benefit from being able to perform multiple cpu-intensive operations
simultaneously.

-- 
greg




Re: [HACKERS] Support Parallel Query Execution in Executor

2006-04-11 Thread Jim C. Nasby
On Mon, Apr 10, 2006 at 05:22:13PM +0800, Qingqing Zhou wrote:
 
 Markus Schiltknecht [EMAIL PROTECTED] wrote
  Hi Qingqing,
 
  
   As Tom pointed out, without big change, a backend on database D1 can't
   connect to D2. This is because to connect to a database, we need to
   initialize a lot of variables. So when you reconnect to another one on
 the
   fly, you have to change these variables one by one.
 
  Sure, the question is: what is needed to retarget a backend?
 
 Check the code InitPostgres(). These global variables are scattered in many
 places, so I am not sure if it is easy to write clean code to clear up these
 variables. But if you can come up with a patch to do reconnect without
 disconnect, that will be cool.

Something else to consider: most queries that would benefit from
parallel execution are expensive enough that the cost of spawning some
new backends wouldn't be that big a deal, so perhaps for an initial
version it would be best to KISS and just spawn parallel execution
backends as needed.
-- 
Jim C. Nasby, Sr. Engineering Consultant  [EMAIL PROTECTED]
Pervasive Software      http://pervasive.com    work: 512-231-6117
vcard: http://jim.nasby.net/pervasive.vcf   cell: 512-569-9461



Re: [HACKERS] Support Parallel Query Execution in Executor

2006-04-11 Thread Jim C. Nasby
On Mon, Apr 10, 2006 at 01:36:45PM -0400, Alvaro Herrera wrote:
 Markus Schiltknecht wrote:
  On Mon, 2006-04-10 at 17:22 +0800, Qingqing Zhou wrote:
   Check the code InitPostgres(). These global variables are scattered in
   many places, so I am not sure if it is easy to write clean code to clear
   up these variables. But if you can come up with a patch to do reconnect
   without disconnect, that will be cool.
  
  Yes, that's where I've been playing around already. Not with much
  success until now, though. :-(
 
 An idea arising in chat with Joshua Drake:  the retargetting code, if it
 turns out to work and not be excessively expensive, could also be useful
 to implement a server-side connection pooling of sorts: the postmaster
 could keep idle backends and retarget them to a database that receives
 an incoming connection.  However, we'd also need a mechanism to clean
 all backend state previous to reusing a connection, to leave it as
 new (no prepared statements, WITH HOLD cursors, etc.)

Oracle allows you to essentially re-connect to an existing connection by
saving connection state when a connection goes back to the connection
pool. Essentially, you connect, and then re-authenticate as a different
user. That user has a specific environment associated with it which is
then pulled in. This makes it reasonable to use Oracle's built-in code
for handling permissions, etc; you just give each system user a database
account. While this sounds scary to folk that haven't used it, it's
actually safer than rolling your own authentication and security
mechanism (which will likely have bugs in it) and having your middleware
connect to the database with some dedicated account.
-- 
Jim C. Nasby, Sr. Engineering Consultant  [EMAIL PROTECTED]
Pervasive Software      http://pervasive.com    work: 512-231-6117
vcard: http://jim.nasby.net/pervasive.vcf   cell: 512-569-9461



Re: [HACKERS] Support Parallel Query Execution in Executor

2006-04-11 Thread Joshua D. Drake

 That's only going to be true for very high end systems with multiple raid
 controllers and dozens of spindles.

Not true. I have a system right now that would benefit. My database is
only 500 megs and I have 2 gig of ram and two processors... I only have
a raid 1, but that is o.k. because most things are cached in memory
anyway.

 
 On the other hand even moderately sized dual-core systems could probably
 benefit from being able to perform multiple cpu-intensive operations
 simultaneously.

See above :)

Joshua D. Drake

 
-- 

=== The PostgreSQL Company: Command Prompt, Inc. ===
  Sales/Support: +1.503.667.4564 || 24x7/Emergency: +1.800.492.2240
  Providing the most comprehensive  PostgreSQL solutions since 1997
 http://www.commandprompt.com/







Re: [HACKERS] Support Parallel Query Execution in Executor

2006-04-10 Thread Markus Schiltknecht
Hi,

On Sun, 2006-04-09 at 15:11 -0400, Tom Lane wrote:
 You can't just retarget a backend to operate in another database, at
 least not without major changes in that infrastructure. 

Why not? What would be needed to retarget a backend to operate in
another database?

Such a retargetting would be helpful in several cases:
- pre-forking of backends (well, depends on efficiency of
retargetting..)
- slave backends for parallel query execution
- slave backends for replaying replicated transactions

I'm using backends operating without a client connection, with superuser
privileges only. But those are still connected to one specific database.
It would really be helpful if I could retarget them as needed.

Regards

Markus





Re: [HACKERS] Support Parallel Query Execution in Executor

2006-04-10 Thread Qingqing Zhou

Markus Schiltknecht [EMAIL PROTECTED] wrote
 Hi,

 On Sun, 2006-04-09 at 15:11 -0400, Tom Lane wrote:
  You can't just retarget a backend to operate in another database, at
  least not without major changes in that infrastructure.

 Why not? What would be needed to retarget a backend to operate in
 another database?


As Tom pointed out, without big change, a backend on database D1 can't
connect to D2. This is because to connect to a database, we need to
initialize a lot of variables. So when you reconnect to another one on the
fly, you have to change these variables one by one.

Regards,
Qingqing





Re: [HACKERS] Support Parallel Query Execution in Executor

2006-04-10 Thread Simon Riggs
On Sun, 2006-04-09 at 17:11 +0800, Qingqing Zhou wrote:
 Tom Lane [EMAIL PROTECTED] wrote
 
  This is exactly the bit of optimism I was questioning.  We've already
  been sweating blood trying to reduce multiprocessor contention on data
  structures in which collisions ought to be avoidable (ie, buffer arrays
  where you hope not everyone is hitting the same buffer at once).  I
  think passing large volumes of data between different processes is going
  to incur quite a lot of locking overhead, pipeline stalls for cache line
  transfers, etc, etc, because heavy contention for the transfer buffer is
  simply not going to be avoidable.
 
 
 Yes, there is overhead. Let's say how can we tackle them one by one:
 
 * lock contention of transfer buffer *

I think it would be useful to think about exactly what type of
query/activity we are looking to improve the performance on. That way we
can understand the benefit of this proposal and take some baseline
measurements to analyse what is happening for those cases.

For example, a CREATE INDEX will require most if not all rows. Whereas a
SeqScan may have a selectivity of 0.001 or below in some cases. The
SeqScan case might also vary considerably in the amount of CPU required
to identify qualifying rows and that CPU might benefit from
parallelisation also.

e.g. A low-selectivity SeqScan is a good case for fairly simple
performance improvement, since the output from that executor node is
going to be much less than its input and there is less need to
parallelize the complete tree. A parallel slave node might then be
placed in charge of running a complete SeqScan node on an agreed section
of a table, rather than simply performing I/O for another query.

Best Regards, Simon Riggs
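
As a concrete illustration of the "agreed section of a table" idea above, here
is a hypothetical C sketch of how a relation's block range could be split into
contiguous sections for parallel slave scans.  The names are invented for
illustration; a real implementation would hang this off the executor node that
drives the slaves.

/*
 * Hypothetical sketch: divide a table of total_blocks blocks among
 * nworkers slave scanners.  Each worker receives a contiguous range,
 * and ranges differ in size by at most one block.
 */
#include <stdio.h>

typedef struct BlockRange
{
    unsigned int start;      /* first block this worker scans */
    unsigned int nblocks;    /* number of blocks it scans */
} BlockRange;

static BlockRange
assign_block_range(unsigned int total_blocks, unsigned int nworkers,
                   unsigned int worker)
{
    BlockRange   r;
    unsigned int base = total_blocks / nworkers;
    unsigned int rem  = total_blocks % nworkers;

    r.nblocks = base + (worker < rem ? 1 : 0);
    r.start   = worker * base + (worker < rem ? worker : rem);
    return r;
}

int
main(void)
{
    /* example: a 5353-page table (the size in Myron's test) split four ways */
    unsigned int w;

    for (w = 0; w < 4; w++)
    {
        BlockRange r = assign_block_range(5353, 4, w);

        printf("worker %u: blocks %u..%u\n",
               w, r.start, r.start + r.nblocks - 1);
    }
    return 0;
}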




Re: [HACKERS] Support Parallel Query Execution in Executor

2006-04-10 Thread Markus Schiltknecht
Hi Qingqing,

On Mon, 2006-04-10 at 16:38 +0800, Qingqing Zhou wrote:
  Why not? What would be needed to retarget a backend to operate in
  another database?

 As Tom pointed out, without big change, a backend on database D1 can't
 connect to D2. This is because to connect to a database, we need to
 initialize a lot of variables. So when you reconnect to another one on the
 fly, you have to change these variables one by one.

Sure, the question is: what is needed to retarget a backend?

IMHO, in all of the three examples mentioned up-thread, it would be
better to retarget an existing backend, instead of forking a new one,
because that would perform better and/or save some resources.

Please do also consider, that for parallel query execution as well as
for replaying remote transactions access rights checking and any client
connection setup could be omitted. Such special processing backends could
easily be run with superuser privileges and without a client connection.

Regards

Markus





Re: [HACKERS] Support Parallel Query Execution in Executor

2006-04-10 Thread Martijn van Oosterhout
On Sun, Apr 09, 2006 at 05:54:56PM -0400, Tom Lane wrote:
 Here's a detailed scenario:
 
   1. Backend X reads page N of a table T, queues a request for N+1.
   2. While processing page N, backend X gets an error and aborts
  its transaction, thereby dropping all its lmgr locks.
   3. Backend Y executes a DROP or TRUNCATE on T, which it can
  now do because there's no lock held anywhere on T.  There
  are actually two interesting sub-phases of this:
  3a.  Kill any shared buffers holding pages to be deleted.
  3b.  Physically drop or truncate the OS file.
   4. Bgreader tries to execute the pending read request.  Ooops.
 
 If step 4 happens after 3b, the bgreader gets an error.  Maybe we could
 kluge that to not cause any serious problems, but the nastier case is
 where 4 happens between 3a and 3b --- the bgreader sees nothing wrong,
 but it's now loaded a shared buffer that must *not* be there.

The way I dealt with this when playing with async i/o for postgres was
to have the initiator mark the buffer I/O-in-progress then send the
request, just like you suggest. This would cause other people to wait
on it. When the request later finished you processed it as usual and
backends waiting would continue.

There appear to be two separate cases here though: one is to just farm
out the read request to another process (basically aio), the other is
to do actual processing there.  The latter is obviously far more useful
but requires a fair bit more infrastructure.

Have a nice day,
-- 
Martijn van Oosterhout   kleptog@svana.org   http://svana.org/kleptog/
 Patent. n. Genius is 5% inspiration and 95% perspiration. A patent is a
 tool for doing 5% of the work and then sitting around waiting for someone
 else to do the other 95% so you can sue them.
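
The "mark the buffer I/O-in-progress, then hand the read off" approach
described above can be sketched as a tiny state machine per buffer.  This is
only an illustration with invented names; PostgreSQL's buffer manager already
has a BM_IO_IN_PROGRESS flag and wait logic that an async-read initiator would
reuse rather than reinvent.

/*
 * Hypothetical sketch: the initiator claims the buffer before queuing the
 * asynchronous read, so any other backend wanting the same page waits for
 * that I/O instead of starting a second one.
 */
#include <pthread.h>
#include <stdbool.h>

typedef struct BufState
{
    pthread_mutex_t lock;
    pthread_cond_t  io_done;
    bool            io_in_progress;
    bool            valid;          /* page contents are usable */
} BufState;

/* Initiator: returns true if we claimed the buffer and should queue a read. */
static bool
start_async_read(BufState *b)
{
    bool claimed = false;

    pthread_mutex_lock(&b->lock);
    if (!b->valid && !b->io_in_progress)
    {
        b->io_in_progress = true;
        claimed = true;
    }
    pthread_mutex_unlock(&b->lock);
    return claimed;
}

/* Whoever performs the read (reader process, or initiator on fallback). */
static void
complete_async_read(BufState *b, bool success)
{
    pthread_mutex_lock(&b->lock);
    b->io_in_progress = false;
    b->valid = success;
    pthread_cond_broadcast(&b->io_done);
    pthread_mutex_unlock(&b->lock);
}

/* Any backend that needs the page: wait until a pending read settles. */
static void
wait_for_read(BufState *b)
{
    pthread_mutex_lock(&b->lock);
    while (b->io_in_progress)
        pthread_cond_wait(&b->io_done, &b->lock);
    pthread_mutex_unlock(&b->lock);
}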




Re: [HACKERS] Support Parallel Query Execution in Executor

2006-04-10 Thread Qingqing Zhou

Markus Schiltknecht [EMAIL PROTECTED] wrote
 Hi Qingqing,

 
  As Tom pointed out, without big change, a backend on database D1 can't
  connect to D2. This is because to connect to a database, we need to
  initialize a lot of variables. So when you reconnect to another one on
the
  fly, you have to change these variables one by one.

 Sure, the question is: what is needed to retarget a backend?

Check the code InitPostgres(). These global variables are scattered in many
places, so I am not sure if it is easy to write clean code to clear up these
variables. But if you can come up with a patch to do reconnect without
disconnect, that will be cool.

 IMHO, in all of the three examples mentioned up-thread, it would be
 better to retarget an existing backend, instead of forking a new one,
 because that would perform better and/or save some resources.

That's for sure.

 Please do also consider, that for parallel query execution as well as
 for replaying remote transactions access rights checking and any client
 connection setup could be omitted. Such special processing backeds could
 easily be run with superuser privileges and without a client connection.

This is a problem for parallel query execution. I suspect we can't run with
superuser privileges for now and for future. For now, I am not clear if
there is any ACL check at query execution stage -- seems should not be, at
least for DML. For future, I am pretty clear that this is not the way -- the
mere possibility of future support of audit or MAC will bring in mind the
reason.

Regards,
Qingqing





Re: [HACKERS] Support Parallel Query Execution in Executor

2006-04-10 Thread Hannu Krosing
On Sun, 2006-04-09 at 18:26, Martijn van Oosterhout wrote:

 The cost of fetching a page from the OS is not really much of an
 overhead, 

Have you tested this ?

I remember having a case where data load using COPY into a table with
several indexes ran an order of magnitude faster with bigger work-mem.
It was with a computer with enough memory to fit the whole working set
(table + indexes) into OS cache, thus I can't think of any other
overhead except reading the index pages repeatedly from OS cache to
shared mem buffers.

I agree that a real disk read is much slower than OS cache read, but
even an OS cache read is still much slower than no read at all. Sure
it's not nearly as bad on seqscans, but it still takes time.

 so I'd like to know how much benefit these buffer cache hacks
 actually produce.

---
Hannu





Re: [HACKERS] Support Parallel Query Execution in Executor

2006-04-10 Thread Markus Schiltknecht
On Mon, 2006-04-10 at 17:22 +0800, Qingqing Zhou wrote:
 Check the code InitPostgres(). These global variables are scattered in many
 places, so I am not sure if it is easy to write clean code to clear up these
 variables. But if you can come up with a patch to do reconnect without
 disconnect, that will be cool.

Yes, that's where I've been playing around already. Not with much
success until now, though. :-(

 This is a problem for parallel query execution. I suspect we can't run with
 superuser privileges for now and for future. For now, I am not clear if
 there is any ACL check at query execution stage -- seems should not be, at
 least for DML. For future, I am pretty clear that this is not the way -- the
 mere possibility of future support of audit or MAC will bring in mind the
 reason.

(What's audit or MAC?)

Well, then let such a slave backend inherit access rights from its
caller. You'd still save rechecking of passwords or such. And you save
the client connection.

I'll give the retargetting code another try. If anybody has some hints
on what stumbling blocks and potholes to watch out for on the way
there...

Regards

Markus




Re: [HACKERS] Support Parallel Query Execution in Executor

2006-04-10 Thread Alvaro Herrera
Markus Schiltknecht wrote:
 On Mon, 2006-04-10 at 17:22 +0800, Qingqing Zhou wrote:
  Check the code InitPostgres(). These global variables are scattered in many
  places, so I am not sure if it is easy to write clean code to clear up these
  variables. But if you can come up with a patch to do reconnect without
  disconnect, that will be cool.
 
 Yes, that's where I've been playing around already. Not with much
 success until now, though. :-(

An idea arising in chat with Joshua Drake:  the retargetting code, if it
turns out to work and not be excessively expensive, could also be useful
to implement a server-side connection pooling of sorts: the postmaster
could keep idle backends and retarget them to a database that receives
an incoming connection.  However, we'd also need a mechanism to clean
all backend state previous to reusing a connection, to leave it as
new (no prepared statements, WITH HOLD cursors, etc.)

-- 
Alvaro Herrera                       http://www.CommandPrompt.com/
PostgreSQL Replication, Consulting, Custom Development, 24x7 support



Re: [HACKERS] Support Parallel Query Execution in Executor

2006-04-10 Thread Markus Schiltknecht
Hi,

On Mon, 2006-04-10 at 13:36 -0400, Alvaro Herrera wrote:
 An idea arising in chat with Joshua Drake:  the retargetting code, if it
 turns out to work and not be excessively expensive, could also be useful
 to implement a server-side connection pooling of sorts: the postmaster
 could keep idle backends and retarget them to a database that receives
 an incoming connection.  However, we'd also need a mechanism to clean
 all backend state previous to reusing a connection, to leave it as
 new (no prepared statements, WITH HOLD cursors, etc.)

I suspect that's what I was calling 'pre-forking'. Please also note that
this would be the most difficult of my three examples (by difficulty of
implementation).

Regards

Markus





Re: [HACKERS] Support Parallel Query Execution in Executor

2006-04-10 Thread Luke Lonergan
Hannu,

On 4/10/06 2:23 AM, Hannu Krosing [EMAIL PROTECTED] wrote:

 The cost of fetching a page from the OS is not really much of an
 overhead,
 
 Have you tested this ?

I have - the overhead of fetching a page from Linux I/O cache to buffer
cache is about an additional 20% over fetching it directly from buffer cache
on PG 7.4.

Haven't tested it on 8.1.

- Luke





Re: [HACKERS] Support Parallel Query Execution in Executor

2006-04-10 Thread Luke Lonergan
Gregory,

On 4/9/06 2:04 PM, Gregory Maxwell [EMAIL PROTECTED] wrote:

 It increases Linux's maximum readahead from 128K to 1meg .. and it
 should be smart enough that you could crank it up further without too
 much risk of hurting performance elsewhere.

Interesting - we are now routinely using the blockdev --setra to set the
block device readahead and our common setting now is 16MB, so I will look at
how the behavior of this adaptive readahead affects normal query workload.

From the patch:

  Normally, the kernel uses a stock readahead logic that is well
  understood and well tuned. This option enables a much complex and
  feature rich one. It is more aggressive and memory efficient in
  doing readahead, and supports some less-common access patterns such
  as reading backward and reading sparsely. However, due to the great
  diversity of real world applications, it might not fit everyone.

- Luke





Re: [HACKERS] Support Parallel Query Execution in Executor

2006-04-09 Thread Hannu Krosing
On Sat, 2006-04-08 at 18:47, Luke Lonergan wrote:

 I agree - there is a real continuing need for overlapped scan and execution,
 though I'm not sure quite how to get it efficiently using inter-process
 communication.  The AIO interface is nicely proven at this point, but to
 Tom's point, it's non-portable.
 
 What we see now is that the executor is CPU bound at about 300MB/s scan rate
 using modern Opteron CPUs.  

Where exactly does the CPU spend its time at 300MB/s ? Is it mainly I/O
related processing,  processing pages into in-memory tuples or something
else ?

--
Hannu





Re: [HACKERS] Support Parallel Query Execution in Executor

2006-04-09 Thread Qingqing Zhou

Tom Lane [EMAIL PROTECTED] wrote

 This is exactly the bit of optimism I was questioning.  We've already
 been sweating blood trying to reduce multiprocessor contention on data
 structures in which collisions ought to be avoidable (ie, buffer arrays
 where you hope not everyone is hitting the same buffer at once).  I
 think passing large volumes of data between different processes is going
 to incur quite a lot of locking overhead, pipeline stalls for cache line
 transfers, etc, etc, because heavy contention for the transfer buffer is
 simply not going to be avoidable.


Yes, there is overhead. Let's say how can we tackle them one by one:

* lock contention of transfer buffer *
A general way is partitioning. We can try to alleviate the lock contention 
by sharing a transfer buffer only between each pair of processes (master and 
slave).

* large amount of data *
Thinking of the simplest master-slave seqscan scenario, I am copying the 
HeapTuple so far (like the code in sort read/write tuple), but potentially 
we can pass only the offset and the pinned page, which would require the slave 
process to keep the page pinned a little longer, say for every 5 pages. I 
haven't seen any application that causes Postgres to pin a lot of pages, so 
this might be one step to reduce the data volume.

On the other hand, why do we need parallel execution anyway? Because if we 
have a 4-way SMP box with a strong I/O device, we may want to trade more 
resources for better performance on some queries.


Regards,
Qingqing 





Re: [HACKERS] Support Parallel Query Execution in Executor

2006-04-09 Thread Myron Scott


On Apr 8, 2006, at 10:29 PM, Luke Lonergan wrote:


Myron,

First, this sounds really good!

On 4/8/06 9:54 PM, Myron Scott [EMAIL PROTECTED] wrote:


 I added a little hack to the buffer code to force pages read into the
 buffer to stay at the back of the free buffer list until the master
 thread has had a chance to use it.


This is the part I'm curious about - is this using the shared_buffers
region in a circular buffer fashion to store pre-fetched pages?


Yes.  That is basically what the slave thread is trying to do, as well
as weed out any tuples/pages that don't need to be looked at due to dead
tuples.  I did several things to try and ensure that a buffer needed by
the master thread would not be pulled out of the buffer pool before it
was seen by the master.  I wanted to do this without holding the buffer
pinned, so I made the change to the buffer free list to do this.

static void
AddBufferToFreelist(BufferDesc *bf)
{
    S_LOCK(&SLockArray[FreeBufMgrLock]);
    int movebehind = SharedFreeList->freePrev;
    /* find the right spot with bias */
    while ( BufferDescriptors[movebehind].bias > bf->bias ) {
        movebehind = BufferDescriptors[movebehind].freePrev;
    }
...

The bias number is removed the next time the buffer is pulled out of
the free list.


Also, I force an ItemPointer transfer when the ItemPointer transfer list
is full (currently 4096) or 10% of the buffer pool has been affected
by the slave thread.  Lastly, if the slave thread gets too far ahead of
the master thread, it waits for the master to catch up.  To my
knowledge, this hasn't happened yet.



One thing I've wondered about is: how much memory is required to get
efficient overlap?  Did you find that you had to tune the amount of
buffer memory to get the performance to work out?


I haven't done much tuning yet.  I think there is an optimal balance
that I most likely haven't found yet.

Myron Scott
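
The transfer area Myron describes (a slave thread handing validated
ItemPointers to the master through mutex-guarded memory) is essentially a
bounded producer/consumer buffer.  The following self-contained pthread
sketch shows the general shape; all names are hypothetical, and the
buffer-pinning and free-list bias details discussed above are deliberately
left out.

/*
 * Hypothetical sketch of the master/slave ItemPointer hand-off: the slave
 * produces (block, offset) pairs for tuples it has already judged visible,
 * and the master consumes them.  The 4096-entry size matches the transfer
 * list size mentioned above.  Real code would also handle end-of-scan
 * signalling and error paths.
 */
#include <pthread.h>
#include <stdint.h>

#define XFER_SIZE 4096

typedef struct TidXfer
{
    uint32_t block;
    uint16_t offset;
} TidXfer;

typedef struct XferArea
{
    pthread_mutex_t lock;
    pthread_cond_t  not_full;
    pthread_cond_t  not_empty;
    int             count;
    int             head;
    int             tail;
    TidXfer         tids[XFER_SIZE];
} XferArea;

/* slave side: block if the master has fallen too far behind */
static void
xfer_put(XferArea *x, uint32_t block, uint16_t offset)
{
    pthread_mutex_lock(&x->lock);
    while (x->count == XFER_SIZE)
        pthread_cond_wait(&x->not_full, &x->lock);
    x->tids[x->tail].block = block;
    x->tids[x->tail].offset = offset;
    x->tail = (x->tail + 1) % XFER_SIZE;
    x->count++;
    pthread_cond_signal(&x->not_empty);
    pthread_mutex_unlock(&x->lock);
}

/* master side: wait until the slave has produced something */
static TidXfer
xfer_get(XferArea *x)
{
    TidXfer result;

    pthread_mutex_lock(&x->lock);
    while (x->count == 0)
        pthread_cond_wait(&x->not_empty, &x->lock);
    result = x->tids[x->head];
    x->head = (x->head + 1) % XFER_SIZE;
    x->count--;
    pthread_cond_signal(&x->not_full);
    pthread_mutex_unlock(&x->lock);
    return result;
}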




Re: [HACKERS] Support Parallel Query Execution in Executor

2006-04-09 Thread Martijn van Oosterhout
On Sun, Apr 09, 2006 at 08:23:36AM -0700, Myron Scott wrote:
 This is the part I'm curious about - is this using the  
 shared_buffers region
 in a circular buffer fashion to store pre-fetched pages?
 
 Yes.  That is basically what the slave thread is trying to do.  As  
 well as weed out
 any tuples/pages that don't need to be looked at due to dead tuples.
 I did several things to try and insure that a buffer needed by the  
 master thread
 would not be pulled out of the buffer pool before it was seen by the  
 master.
 I wanted to do this without holding the buffer pinned, so I did the  
 change to
 the buffer free list to do this.

Is this necessary? I mean, what's the chance that a page might get
thrown out early? And if so, what's the chance that page will still be
in the OS cache?

The cost of fetching a page from the OS is not really much of an
overhead, so I'd like to know how much benefit these buffer cache hacks
actually produce.

Have a nice day,
-- 
Martijn van Oosterhout   kleptog@svana.org   http://svana.org/kleptog/
 Patent. n. Genius is 5% inspiration and 95% perspiration. A patent is a
 tool for doing 5% of the work and then sitting around waiting for someone
 else to do the other 95% so you can sue them.




Re: [HACKERS] Support Parallel Query Execution in Executor

2006-04-09 Thread Tom Lane
Myron Scott [EMAIL PROTECTED] writes:
 Gregory Maxwell wrote:
 There are other cases where it is useful to perform parallel I/O
 without parallel processing..

 I have done some testing more along these lines with an old fork of
 postgres code (2001).  In my tests, I used a thread to delegate out
 the actual heap scan of the SeqScan.  The job of the slave thread
 was to fault in buffer pages and determine the time validity of
 the tuples.  ItemPointers are passed back to the master thread via a
 common memory area guarded by mutex locking.

I was considering a variant idea in the shower this morning: suppose
that we invent one or more background reader processes that have
basically the same infrastructure as the background writer, but have
the responsibility of causing buffer reads to happen at useful times
(whereas the writer causes writes to happen at useful times).  The
idea would be for backends to signal the readers when they know they
will need a given block soon, and then hopefully when they need it
it'll already be in shared buffers.  For instance, in a seqscan it'd be
pretty trivial to request block N+1 just after reading block N, and then
doing our actual processing on block N while (we hope) some reader
process is faulting in N+1.  Bitmap indexscans could use this structure
too; I'm less sure about whether plain indexscans could do much with it
though.

The major issues I can see are:

1. We'd need a shared-memory queue of read requests, probably much like
the queue of fsync requests.  We've already seen problems with
contention for the fsync queue, IIRC, and that's used much less heavily
than the read request queue would be.  So there might be some
performance issues with getting the block requests sent over to the
readers.

2. There are some low-level assumptions that no one reads in pages of
a relation without having some kind of lock on the relation (consider
eg the case where the relation is being dropped).  A bgwriter-like
process wouldn't be able to hold lmgr locks, and we wouldn't really want
it to be thrashing the lmgr shared data structures for each read anyway.
So you'd have to design some interlock to guarantee that no backend
abandons a query (and releases its own lmgr locks) while an async read
request it made is still pending.  Ugh.

regards, tom lane
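
To make issue #1 above concrete, here is a minimal, hypothetical C sketch of
such a shared-memory read-request queue, loosely modeled on the fsync request
queue.  All names are invented for illustration, and a real implementation
would live in shared memory behind an LWLock rather than a pthread mutex.

/*
 * Hypothetical sketch: backends append "I will want this block soon" hints,
 * and background reader processes drain them and issue the reads.
 */
#include <pthread.h>
#include <stdbool.h>
#include <stdint.h>

#define BGREAD_QUEUE_SIZE 256

typedef struct BgReadRequest
{
    uint32_t dbid;       /* database the relation belongs to */
    uint32_t relid;      /* relation to read */
    uint32_t blocknum;   /* block the backend expects to need soon */
} BgReadRequest;

typedef struct BgReadQueue
{
    pthread_mutex_t lock;    /* stands in for an LWLock */
    int             head;    /* next slot to dequeue */
    int             tail;    /* next slot to enqueue */
    int             count;   /* requests currently queued */
    BgReadRequest   requests[BGREAD_QUEUE_SIZE];
} BgReadQueue;

/* Returns false if the queue is full; the caller just drops the hint. */
static bool
bgread_enqueue(BgReadQueue *q, uint32_t dbid, uint32_t relid,
               uint32_t blocknum)
{
    bool ok = false;

    pthread_mutex_lock(&q->lock);
    if (q->count < BGREAD_QUEUE_SIZE)
    {
        q->requests[q->tail].dbid = dbid;
        q->requests[q->tail].relid = relid;
        q->requests[q->tail].blocknum = blocknum;
        q->tail = (q->tail + 1) % BGREAD_QUEUE_SIZE;
        q->count++;
        ok = true;
    }
    pthread_mutex_unlock(&q->lock);
    return ok;
}

A seqscan backend would then call something like bgread_enqueue(queue, db,
rel, blkno + 1) right after reading block blkno.  Because a full queue simply
returns false, the backend never blocks on the hint and just falls back to its
own synchronous read, which also bounds the contention concern raised above.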



Re: [HACKERS] Support Parallel Query Execution in Executor

2006-04-09 Thread Tom Lane
Gregory Maxwell [EMAIL PROTECTED] writes:
 For example, one case made in this thread involved bursty performance
 with seqscans presumably because the I/O was stalling while processing
 was being performed.

Actually, the question that that raised in my mind is why isn't the
kernel doing read-ahead properly?  When we're doing nonsequential
access like an indexscan, it's unsurprising that the kernel can't guess
which block we need next, but in a plain seqscan you'd certainly expect
the read-ahead algorithm to kick in and ensure that the next block is
fetched before we need it.

So before we go inventing complicated bits of code with lots of added
overhead, we should first find out exactly why the system doesn't
already work the way it's supposed to.

regards, tom lane



Re: [HACKERS] Support Parallel Query Execution in Executor

2006-04-09 Thread Jonah H. Harris
On 4/9/06, Tom Lane [EMAIL PROTECTED] wrote:
 I was considering a variant idea in the shower this morning: suppose
 that we invent one or more background reader processes that have
 basically the same infrastructure as the background writer, but have
 the responsibility of causing buffer reads to happen at useful times
 (whereas the writer causes writes to happen at useful times).  The
 idea would be for backends to signal the readers when they know they
 will need a given block soon, and then hopefully when they need it
 it'll already be in shared buffers.

This is sort of what I'm playing with.  There are N-number of backends
which are configured at startup and are available solely for parallel
processing.  If you have parallelism enabled and you have a query
which the planner says can take advantage of it, your backend becomes
a sort of coordinator and signals the number of backends defined by
(right now) a global parallelism degree parameter for participation in
query execution.

 The major issues I can see are:

 1. We'd need a shared-memory queue of read requests, probably much like
 the queue of fsync requests.  We've already seen problems with
 contention for the fsync queue, IIRC, and that's used much less heavily
 than the read request queue would be.  So there might be some
 performance issues with getting the block requests sent over to the
 readers.

Yes, I was looking at this contention as well.  However, I haven't
looked at the fsync request queue in detail, but I'm going to play
around with a simple initial semaphore test such that if the queue is
full, the query will just be processed serially.  I'll also take a
look at the fsync queue and see how it's working.

While it's down the road a ways, I'm really looking forward to seeing
parallel index builds, sorts, and joins.  In my opinion, all of our
combined ideas and system knowledge is really going to pay off big
time on this one.

--
Jonah H. Harris, Database Internals Architect
EnterpriseDB Corporation
732.331.1324



Re: [HACKERS] Support Parallel Query Execution in Executor

2006-04-09 Thread Myron Scott


On Apr 9, 2006, at 9:26 AM, Martijn van Oosterhout wrote:


On Sun, Apr 09, 2006 at 08:23:36AM -0700, Myron Scott wrote:

This is the part I'm curious about - is this using the shared_buffers
region in a circular buffer fashion to store pre-fetched pages?


Yes.  That is basically what the slave thread is trying to do, as well
as weed out any tuples/pages that don't need to be looked at due to dead
tuples.  I did several things to try and ensure that a buffer needed by
the master thread would not be pulled out of the buffer pool before it
was seen by the master.  I wanted to do this without holding the buffer
pinned, so I made the change to the buffer free list to do this.


Is this necessary? I mean, what's the chance that a page might get
thrown out early? And if so, what's the chance that page will still be
in the OS cache?

The cost of fetching a page from the OS is not really much of an
overhead, so I'd like to know how much benefit these buffer cache hacks
actually produce.



You may be right on this one.  I wanted to ensure that I didn't lose
pages I needed.  I may have just added a belt to my
suspenders.  I'll add a switch to turn it off and on and try and
devise a test to see what the costs either way are.


Myron Scott



Re: [HACKERS] Support Parallel Query Execution in Executor

2006-04-09 Thread Gregory Maxwell
On 4/9/06, Tom Lane [EMAIL PROTECTED] wrote:
 Gregory Maxwell [EMAIL PROTECTED] writes:
  For example, one case made in this thread involved bursty performance
  with seqscans presumably because the I/O was stalling while processing
  was being performed.

 Actually, the question that that raised in my mind is why isn't the
 kernel doing read-ahead properly?  When we're doing nonsequential
 access like an indexscan, it's unsurprising that the kernel can't guess
 which block we need next, but in a plain seqscan you'd certainly expect
 the read-ahead algorithm to kick in and ensure that the next block is
 fetched before we need it.

 So before we go inventing complicated bits of code with lots of added
 overhead, we should first find out exactly why the system doesn't
 already work the way it's supposed to.

But is that really the behavior we should expect?

How much memory can we expect the OS to spend on opportunistic
read-in? How much disk access should be spent on a guess? There is an
intrinsic tradeoff here, applications tend to be bursty so just
because you're reading a lot now doesn't mean you'll continue... and
the filesystem will have fragmentation, so a failed guess can
translate into a lot of pointless seeking.

As I recall, in Linux 2.6 you have something like a max of 128KB of
readahead. Given that and a disk subsystem that reads at 200MB/sec you
can't spend more than 600us processing before requesting enough
additional blocks to put the disk back into readahead or you will stall
the disk.  Stalling the disk costs more than you'd expect, due to FS
fragmentation there can be terrific gains from allowing the OS and
disk to issue reads out of order from a large request queue.
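
For concreteness, the stall budget quoted here falls out of simple arithmetic,
assuming the 128KB readahead window and the 200MB/sec subsystem mentioned
above:

    128 KB / 200 MB/s  =  131072 bytes / 209715200 bytes/s  ~=  0.6 ms (about 625 us)

i.e. roughly the 600us of processing per readahead window allowed before the
request queue drains and the disk goes idle.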

It would probably be reasonable to say that the OS should be using
much larger readahead buffers, especially on systems with fast disk
subsystems... But that doesn't come for free and can slaughter
performance for many workloads (consider, what if it was triggering
5MB of file oriented read-ahead for every index scan seek we did?). 
There is an adaptive readahead patch for Linux which should improve
things (http://lwn.net/Articles/176279/ and if you google around there
are some benchmarks) but I doubt that even that would be able to keep
a 200MB/sec+ disk subsystem saturated with the sort of access patterns
PG has...

To address this in a cross platform way will be a challenge. I doubt
Linux is alone at having skimpy readahead (because big readahead
translates into huge losses if you get it wrong).

Given this information, a stupid 'fault-in' process should probably
give huge gains for seqscans... but I think the extra work required to
find a solution which is also useful for index operations is probably
worth it as well.



Re: [HACKERS] Support Parallel Query Execution in Executor

2006-04-09 Thread Tom Lane
Jonah H. Harris [EMAIL PROTECTED] writes:
 On 4/9/06, Tom Lane [EMAIL PROTECTED] wrote:
  I was considering a variant idea in the shower this morning: suppose
 that we invent one or more background reader processes that have
 basically the same infrastructure as the background writer, but have
 the responsibility of causing buffer reads to happen at useful times

 This is sort of what I'm playing with.  There are N-number of backends
 which are configured at startup and are available solely for parallel
 processing.

That's not remotely the same thing: a backend is a very different animal
from a bgwriter.  In particular, bgwriter (and bgreaders if we had 'em)
aren't database-specific, don't need to think about permission checking
as they don't execute on behalf of particular users, don't have syscaches
to keep in-sync with everything else, etc etc.

regards, tom lane



Re: [HACKERS] Support Parallel Query Execution in Executor

2006-04-09 Thread Tom Lane
Gregory Maxwell [EMAIL PROTECTED] writes:
 On 4/9/06, Tom Lane [EMAIL PROTECTED] wrote:
 So before we go inventing complicated bits of code with lots of added
 overhead, we should first find out exactly why the system doesn't
 already work the way it's supposed to.

 But is that really the behavior we should expect?

Certainly.  If the OS has readahead logic at all, it ought to think that
a seqscan of a large table qualifies.  Your arguments seem to question
whether readahead is useful at all --- but they would apply *just as
well* to an app doing its own readahead, which is what is really
getting proposed in this thread.

Before we go replacing a standard OS-level facility with our own
version, we need to have a much clearer idea of why the OS isn't getting
the job done for us.  Otherwise we're likely to write a large amount of
code and find out that it doesn't work very well either.

regards, tom lane



Re: [HACKERS] Support Parallel Query Execution in Executor

2006-04-09 Thread Jonah H. Harris
On 4/9/06, Tom Lane [EMAIL PROTECTED] wrote:
 That's not remotely the same thing: a backend is a very different animal
 from a bgwriter.

Yes, I know.  Had I known it would've been picked apart, I would've
described it better.

 In particular, bgwriter (and bgreaders if we had 'em)
 aren't database-specific, don't need to think about permission checking
 as they don't execute on behalf of particular users, don't have syscaches
 to keep in-sync with everything else, etc etc.

In the current proof-of-concept, there is a backend (manager) and
several reader-like processes; However, in this case, the readers
perform actual execution on behalf of the user.

The reader-like processes or execution agents, whatever you want to
call them, are limited to the postgres database only because they've
been coded that way... but there's nothing that actually restricts
them from being used for any database.

--
Jonah H. Harris, Database Internals Architect
EnterpriseDB Corporation
732.331.1324



Re: [HACKERS] Support Parallel Query Execution in Executor

2006-04-09 Thread Tom Lane
Jonah H. Harris [EMAIL PROTECTED] writes:
 The reader-like processes or execution agents, whatever you want to
 call them, are limited to the postgres database only because they've
 been coded that way... but there's nothing that actually restricts
 them from being used for any database.

Oh?  Anything much above the level of bufmgr is going to be dependent on
relcache, syscache, etc, and those are all definitely database-specific.
You can't just retarget a backend to operate in another database, at
least not without major changes in that infrastructure.

regards, tom lane



Re: [HACKERS] Support Parallel Query Execution in Executor

2006-04-09 Thread Jonah H. Harris
On 4/9/06, Tom Lane [EMAIL PROTECTED] wrote:
 Oh?  Anything much above the level of bufmgr is going to be dependent on
 relcache, syscache, etc, and those are all definitely database-specific.
 You can't just retarget a backend to operate in another database, at
 least not without major changes in that infrastructure.

It's not that complicated and it may not prove to be the best method,
but it's worth a try.  When I get done playing with it, I'll send it
to ya to check out.


--
Jonah H. Harris, Database Internals Architect
EnterpriseDB Corporation
732.331.1324



Re: [HACKERS] Support Parallel Query Execution in Executor

2006-04-09 Thread Alvaro Herrera
Jonah H. Harris wrote:
 On 4/9/06, Tom Lane [EMAIL PROTECTED] wrote:
  Oh?  Anything much above the level of bufmgr is going to be dependent on
  relcache, syscache, etc, and those are all definitely database-specific.
  You can't just retarget a backend to operate in another database, at
  least not without major changes in that infrastructure.
 
 It's not that complicated and it may not prove to be the best method,
 but it's worth a try.  When I get done playing with it, I'll send it
 to ya to check out.

Hmm, if it turns out to work, I'm interested.  This is something that
definitely could be of some help to autovacuum.  Please post to
pgsql-patches.

-- 
Alvaro Herrera                                http://www.CommandPrompt.com/
PostgreSQL Replication, Consulting, Custom Development, 24x7 support

---(end of broadcast)---
TIP 3: Have you checked our extensive FAQ?

   http://www.postgresql.org/docs/faq


Re: [HACKERS] Support Parallel Query Execution in Executor

2006-04-09 Thread lister
I am working with Solaris on SPARC almost exclusively and I believe Josh said 
that Sun was the one who found the bursty behavior with scans.  Has it been 
confirmed that this is the case on all/most platforms?

Myron Scott
-Original Message-

From:  Tom Lane [EMAIL PROTECTED]
Subj:  Re: [HACKERS] Support Parallel Query Execution in Executor
Date:  Sun Apr 9, 2006 11:48 am
Size:  1K
To:  Gregory Maxwell [EMAIL PROTECTED]
cc:  pgsql-hackers@postgresql.org

Gregory Maxwell [EMAIL PROTECTED] writes:
 On 4/9/06, Tom Lane [EMAIL PROTECTED] wrote:
 So before we go inventing complicated bits of code with lots of added
 overhead, we should first find out exactly why the system doesn't
 already work the way it's supposed to.

 But is that really the behavior we should expect?

Certainly.  If the OS has readahead logic at all, it ought to think that
a seqscan of a large table qualifies.  Your arguments seem to question
whether readahead is useful at all --- but they would apply *just as
well* to an app doing its own readahead, which is what is really
getting proposed in this thread.

Before we go replacing a standard OS-level facility with our own
version, we need to have a much clearer idea of why the OS isn't getting
the job done for us.  Otherwise we're likely to write a large amount of
code and find out that it doesn't work very well either.

regards, tom lane

---(end of broadcast)---
TIP 6: explain analyze is your friend



---(end of broadcast)---
TIP 3: Have you checked our extensive FAQ?

   http://www.postgresql.org/docs/faq


Re: [HACKERS] Support Parallel Query Execution in Executor

2006-04-09 Thread Luke Lonergan
Myron,

On 4/9/06 1:18 PM, [EMAIL PROTECTED] [EMAIL PROTECTED] wrote:

 I am working with Solaris on SPARC almost exclusively and I believe Josh said
 that Sun was the one who found the bursty behavior with scans.  Has it been
 confirmed that this is the case on all/most platforms?

It was our folks at Greenplum who found this behavior.  It is particularly
bad on Solaris because of the poor OS readahead on UFS.  You can modify UFS
readahead behavior using OS tuning parameters, as mentioned previously in the
thread and covered in Jignesh Shah's blog at Sun (we worked with Jignesh
and Sun for about a year to get this information).

Interestingly, using POSIX fadvise() to indicate that the I/O is sequential
makes a big positive difference in many cases, but it hurts a lot in some
other workloads where the access is actually seek I/O but the sequential
hint has been set.
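
For reference, the call itself is just a hint on an open file descriptor.
A minimal standalone sketch (hypothetical file name, nothing to do with the
smgr layer) looks something like this:

    #define _XOPEN_SOURCE 600
    #include <fcntl.h>
    #include <stdio.h>
    #include <string.h>
    #include <unistd.h>

    int
    main(void)
    {
        char    buf[8192];
        ssize_t n;
        int     rc;
        int     fd = open("/tmp/bigtable.dat", O_RDONLY);  /* hypothetical file */

        if (fd < 0)
        {
            perror("open");
            return 1;
        }

        /* Declare the access pattern for the whole file as sequential. */
        rc = posix_fadvise(fd, 0, 0, POSIX_FADV_SEQUENTIAL);
        if (rc != 0)
            fprintf(stderr, "posix_fadvise: %s\n", strerror(rc));

        /* Scan the file; the kernel may now use a larger readahead window. */
        while ((n = read(fd, buf, sizeof(buf))) > 0)
            ;                   /* process the block here */

        close(fd);
        return 0;
    }

How far the kernel actually honors the hint varies by platform, which is
part of the problem.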

I think the best solution will use a more direct overlap approach in the
executor.

- Luke   



---(end of broadcast)---
TIP 1: if posting/reading through Usenet, please send an appropriate
   subscribe-nomail command to [EMAIL PROTECTED] so that your
   message can get through to the mailing list cleanly


Re: [HACKERS] Support Parallel Query Execution in Executor

2006-04-09 Thread Gregory Maxwell
On 4/9/06, Tom Lane [EMAIL PROTECTED] wrote:
 Certainly.  If the OS has readahead logic at all, it ought to think that
 a seqscan of a large table qualifies.  Your arguments seem to question
 whether readahead is useful at all --- but they would apply *just as
 well* to an app doing its own readahead, which is what is really
 getting proposed in this thread.

We know we're going to read the whole table; the OS doesn't.  We can be
confident that we will not use our read-ahead when we're really doing
random accesses.

The OS has to deal with many applications with many workloads running
on a wider spectrum of hardware.  It's important that it does the right
thing, but probably more important that it doesn't do the wrong thing.
That encourages erring on the side of small readahead.

 Before we go replacing a standard OS-level facility with our own
 version, we need to have a much clearer idea of why the OS isn't getting
 the job done for us.  Otherwise we're likely to write a large amount of
 code and find out that it doesn't work very well either.

That's a fair position...  It would be useful to know how much readahead
PG needs in order to keep a high-speed disk subsystem saturated.  This
would involve profiling how frequently PG requests data, and how much it
requests, when running from a hot cache.  We could then say that the OS
would need to read ahead x MB to keep a y MB/s disk subsystem saturated.

It would be good to know how much FreeBSD will read ahead...

It might also be interesting for someone with the right testing rig on
Linux to try the adaptive readahead patch to see if that improves PG's
ability to keep the disk busy.

---(end of broadcast)---
TIP 2: Don't 'kill -9' the postmaster


Re: [HACKERS] Support Parallel Query Execution in Executor

2006-04-09 Thread Luke Lonergan
Gregory,

On 4/9/06 1:36 PM, Gregory Maxwell [EMAIL PROTECTED] wrote:

 It might also be interesting for someone with the right testing rig on
 Linux to try the adaptive readahead patch to see if that improves PG's
 ability to keep the disk busy.

the adaptive readahead patch?  Did I miss one?

We will happily test experimental patches that improve I/O utilization.
We have an assortment of gear with high speed I/O, mostly Linux now.

- Luke 



---(end of broadcast)---
TIP 5: don't forget to increase your free space map settings


Re: [HACKERS] Support Parallel Query Execution in Executor

2006-04-09 Thread Luke Lonergan
Tom,

On 4/9/06 9:27 AM, Tom Lane [EMAIL PROTECTED] wrote:

 2. There are some low-level assumptions that no one reads in pages of
 a relation without having some kind of lock on the relation (consider
 eg the case where the relation is being dropped).  A bgwriter-like
 process wouldn't be able to hold lmgr locks, and we wouldn't really want
 it to be thrashing the lmgr shared data structures for each read anyway.
 So you'd have to design some interlock to guarantee that no backend
 abandons a query (and releases its own lmgr locks) while an async read
 request it made is still pending.  Ugh.

Does this lead us right back to planning for the appropriate amount of
readahead at plan time?  We could consider a page range lock at that point
instead of locking each individual page.

It seems that the planner might have enough information to identify the
times when this approach would be beneficial (low selectivity aggregate,
etc.), and to allocate sufficient resources to accommodate the overlap.  The
planner-specific knowledge would be the main reason this would work better
than the OS's built-in dynamic readahead algorithms.

- Luke 



---(end of broadcast)---
TIP 3: Have you checked our extensive FAQ?

   http://www.postgresql.org/docs/faq


Re: [HACKERS] Support Parallel Query Execution in Executor

2006-04-09 Thread Gregory Maxwell
On 4/9/06, Luke Lonergan [EMAIL PROTECTED] wrote:
 Gregory,

 On 4/9/06 1:36 PM, Gregory Maxwell [EMAIL PROTECTED] wrote:

  It might also be interesting for someone with the right testing rig on
  Linux to try the adaptive readahead patch to see if that improves PG's
  ability to keep the disk busy.

 the adaptive readahead patch?  Did I miss one?

 We will happily test experimental patches that improve I/O utilitization.
 We have an assortment of gear with high speed I/O, mostly Linux now.

It's a Linux kernel patch; I'd mentioned it in a prior post.

http://www.vanheusden.com/ara/

It increases Linux's maximum readahead from 128K to 1MB... and it
should be smart enough that you could crank it up further without too
much risk of hurting performance elsewhere.

If PG's bottlenecked on seqscans due to insufficient readahead, I'd
expect this to show an improvement...  although I am still somewhat
doubtful that it'll be enough to keep the disk saturated if PG's
behavior is highly suboptimal.

---(end of broadcast)---
TIP 4: Have you searched our list archives?

   http://archives.postgresql.org


Re: [HACKERS] Support Parallel Query Execution in Executor

2006-04-09 Thread Tom Lane
Luke Lonergan [EMAIL PROTECTED] writes:
 On 4/9/06 9:27 AM, Tom Lane [EMAIL PROTECTED] wrote:
 2. There are some low-level assumptions that no one reads in pages of
 a relation without having some kind of lock on the relation (consider
 eg the case where the relation is being dropped).  A bgwriter-like
 process wouldn't be able to hold lmgr locks, and we wouldn't really want
 it to be thrashing the lmgr shared data structures for each read anyway.
 So you'd have to design some interlock to guarantee that no backend
 abandons a query (and releases its own lmgr locks) while an async read
 request it made is still pending.  Ugh.

 Does this lead us right back to planning for the appropriate amount of
 readahead at plan time?  We could consider a page range lock at that point
 instead of locking each individual page.

No, you're missing my point entirely.  What's bothering me is the
prospect of a bgreader process taking actions that are only safe
because of a lock that is held by a different process --- changing the
granularity of that lock doesn't get you out of trouble.

Here's a detailed scenario:

1. Backend X reads page N of a table T, queues a request for N+1.
2. While processing page N, backend X gets an error and aborts
   its transaction, thereby dropping all its lmgr locks.
3. Backend Y executes a DROP or TRUNCATE on T, which it can
   now do because there's no lock held anywhere on T.  There
   are actually two interesting sub-phases of this:
   3a.  Kill any shared buffers holding pages to be deleted.
   3b.  Physically drop or truncate the OS file.
4. Bgreader tries to execute the pending read request.  Ooops.

If step 4 happens after 3b, the bgreader gets an error.  Maybe we could
kluge that to not cause any serious problems, but the nastier case is
where 4 happens between 3a and 3b --- the bgreader sees nothing wrong,
but it's now loaded a shared buffer that must *not* be there.

Having thought more about this, there may be a solution possible using
tighter integration with the bufmgr.  The bufmgr already has a notion of
a buffer page being read busy.  Maybe, rather than just pushing a read
request into a separate queue somewhere, the requestor has to assign a
shared buffer for the page it wants and put the buffer into read-busy
state, but then pass the request to perform the physical read off to
someone else.  The advantage of this is that there's state that step 3a
can see telling it that a conflicting read is pending, and it just needs
to wait for the read to finish before killing the buffer.
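
To make the ordering concrete, here is a toy standalone model of that
interlock using pthreads --- invented names, nothing resembling actual
bufmgr code, just the sequence of events:

    /*
     * Toy model of the interlock: the requesting backend marks the shared
     * buffer read-busy *before* handing the physical read to a background
     * reader, so a concurrent DROP/TRUNCATE can see the pending I/O and wait.
     * Standalone pthreads sketch with invented names, not bufmgr code.
     */
    #include <pthread.h>
    #include <stdio.h>
    #include <unistd.h>

    typedef enum { BUF_FREE, BUF_READ_BUSY, BUF_VALID } BufState;

    static pthread_mutex_t buf_lock = PTHREAD_MUTEX_INITIALIZER;
    static pthread_cond_t buf_cv = PTHREAD_COND_INITIALIZER;
    static BufState buf_state = BUF_FREE;

    /* Backend: allocate the buffer and mark it read-busy, then hand off. */
    static void
    request_prefetch(void)
    {
        pthread_mutex_lock(&buf_lock);
        buf_state = BUF_READ_BUSY;      /* visible to everyone from here on */
        pthread_mutex_unlock(&buf_lock);
    }

    /* Background reader: do the physical read, then mark the buffer valid. */
    static void *
    bgreader(void *arg)
    {
        (void) arg;
        sleep(1);                       /* pretend this is the physical read */
        pthread_mutex_lock(&buf_lock);
        buf_state = BUF_VALID;
        pthread_cond_broadcast(&buf_cv);
        pthread_mutex_unlock(&buf_lock);
        return NULL;
    }

    /* DROP/TRUNCATE path: may not discard the buffer while a read is pending. */
    static void
    drop_relation_buffers(void)
    {
        pthread_mutex_lock(&buf_lock);
        while (buf_state == BUF_READ_BUSY)      /* conflicting read pending */
            pthread_cond_wait(&buf_cv, &buf_lock);
        buf_state = BUF_FREE;                   /* now safe to kill the buffer */
        pthread_mutex_unlock(&buf_lock);
        printf("buffer dropped only after the pending read finished\n");
    }

    int
    main(void)
    {
        pthread_t reader;

        request_prefetch();
        pthread_create(&reader, NULL, bgreader, NULL);
        drop_relation_buffers();        /* blocks until the bgreader is done */
        pthread_join(reader, NULL);
        return 0;
    }

The point is only that step 3a has something to wait on; the real thing
would have to live inside bufmgr's existing locking.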

Bottom line seems to be: just as the bgwriter is pretty intimately tied
to bufmgr, bgreaders would have to be as well.

regards, tom lane

---(end of broadcast)---
TIP 5: don't forget to increase your free space map settings


Re: [HACKERS] Support Parallel Query Execution in Executor

2006-04-08 Thread Tom Lane
Josh Berkus josh@agliodbs.com writes:
 Basically reading a large table off disk does this:

 read some table while not processing
 process in cpu while not reading
 read some more table while not processing
 process some more in cpu while not reading
 etc.
 resulting in an I/O throughput graph that looks like:

 *   *   *
* ** ** *
   ** ** **
  *   *   *  *

Interesting ...

 The really annoying part about this, for me personally, is that the peaks 
 are significantly faster than comparable commercial DBMSes ... but our 
 average is far less.   So even on a single seq scan, parallel query 
 execution would make a significant difference in performance, possibly as 
 much as +75% on seq scans of large tables.

... but I'm failing to follow where it says that parallel processing 
will fix that.  All I can foresee in that direction is extra data
transfer costs, bought at the price of portability and locking headaches.

regards, tom lane

---(end of broadcast)---
TIP 3: Have you checked our extensive FAQ?

   http://www.postgresql.org/docs/faq


Re: [HACKERS] Support Parallel Query Execution in Executor

2006-04-08 Thread Jonah H. Harris
On 4/8/06, Tom Lane [EMAIL PROTECTED] wrote:
 ... but I'm failing to follow where it says that parallel processing
 will fix that.  All I can foresee in that direction is extra data
 transfer costs, bought at the price of portability and locking headaches.

I don't think it's any less portable than the system is now; it's just
enabling multiple slave processes to participate in scans and
processing (parallel query, parallel index builds, parallel sorts,
...).  Likewise, the additional I/O cost isn't that much of an issue,
because systems which really take advantage of this type of parallel
processing have large-bandwidth I/O arrays anyway.

I didn't even want to mention that EVERY other database I know of
(Oracle, DB2, Sybase, SQL Server, Ingres, Bizgres MPP, MaxDB) supports
this, but it's a pretty obvious win for many environments.


--
Jonah H. Harris, Database Internals Architect
EnterpriseDB Corporation
732.331.1324

---(end of broadcast)---
TIP 4: Have you searched our list archives?

   http://archives.postgresql.org


Re: [HACKERS] Support Parallel Query Execution in Executor

2006-04-08 Thread Markus Schiltknecht
Hi,

On Sat, 2006-04-08 at 13:16 -0400, Jonah H. Harris wrote:
 On 4/8/06, Tom Lane [EMAIL PROTECTED] wrote:
  ... but I'm failing to follow where it says that parallel processing
  will fix that.  All I can foresee in that direction is extra data
  transfer costs, bought at the price of portability and locking headaches.
 
 I don't think it's any less portable than the system is now;

ACK.  As long as processes, signals, and shared memory are used, this could
be as portable as PostgreSQL is now.

 It's just
 enabling multiple slave processes to participate in scans and
 processing (parallel query, parallel index builds, parallel sorts,
 ...)  Likewise, the additional I/O cost isn't that much of an issue
 because systems which really take advantage of this type of parallel
 processing have large bandwidth I/O arrays anyway.

Ehm... which additional I/O cost?  Or do you count inter-process
communication as I/O?

I'd like to help teach PostgreSQL the art of parallel query
execution.  I have roughly implemented an internal message-passing system
on a shared-memory basis.  This allows all pg processes to send messages to
each other, addressed by PID.  Upon receiving a message, a process
gets a SIGUSR2 (as far as I can remember now).

On the positive side are:
 - portable as I'm using only shmem and signals
 - fast (not much copying around of the data in memory)

One downside I see with my approach is:
 - the shared memory buffer is limited in size, so it can simply be
  'full' and no more messages can be sent before another process reads
  its messages and frees the memory for other messages.
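
To give a rough idea of the mechanism (a stripped-down standalone model with
invented names, not the actual code), the shape is roughly this:

    /*
     * Stripped-down standalone model: a message in shared memory, delivery
     * announced by SIGUSR2.  Invented names, minimal error handling, not the
     * actual patch.
     */
    #define _DEFAULT_SOURCE
    #include <signal.h>
    #include <stdio.h>
    #include <string.h>
    #include <sys/mman.h>
    #include <sys/types.h>
    #include <sys/wait.h>
    #include <unistd.h>

    #define MSG_BUF_SIZE 256            /* fixed size, so it can fill up */

    static volatile sig_atomic_t got_message = 0;

    static void
    on_sigusr2(int sig)
    {
        (void) sig;
        got_message = 1;
    }

    int
    main(void)
    {
        /* Shared anonymous mapping, visible to parent and child after fork. */
        char       *msgbuf = mmap(NULL, MSG_BUF_SIZE, PROT_READ | PROT_WRITE,
                                  MAP_SHARED | MAP_ANONYMOUS, -1, 0);
        sigset_t    block,
                    oldmask;
        pid_t       child;

        /* Block SIGUSR2 before forking so the receiver cannot lose the signal. */
        sigemptyset(&block);
        sigaddset(&block, SIGUSR2);
        sigprocmask(SIG_BLOCK, &block, &oldmask);
        signal(SIGUSR2, on_sigusr2);

        child = fork();
        if (child == 0)
        {
            /* Receiver: wait until SIGUSR2 says a message is waiting. */
            while (!got_message)
                sigsuspend(&oldmask);   /* atomically unblock and wait */
            printf("receiver got: %s\n", msgbuf);
            _exit(0);
        }

        /* Sender: put the message in shared memory, then signal by PID. */
        strcpy(msgbuf, "hello from the other backend");
        kill(child, SIGUSR2);
        waitpid(child, NULL, 0);
        return 0;
    }

This leaves out the locking and the per-process addressing, but it shows the
basic shmem-plus-signal idea.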

In case you're interested I'll compile a patch for review.

Regards

Markus



---(end of broadcast)---
TIP 4: Have you searched our list archives?

   http://archives.postgresql.org


Re: [HACKERS] Support Parallel Query Execution in Executor

2006-04-08 Thread Jonah H. Harris
On 4/8/06, Markus Schiltknecht [EMAIL PROTECTED] wrote:
 ACK. As long as processes, signals and shared memory are used this could
 be as portable as PostgreSQL is now.

This is certainly the case.

 Ehm.. which additional I/O cost? Or do you count inter-process
 communication to I/O?

Inter-process communication will add a minimal amount, but if it's done
correctly, you'll still end up ahead.

 I'd like to help teaching PostgreSQL the art of parallel query
 execution.

Cool, we're not at the communication-level yet, but your help would be
appreciated.

 In case you're interested I'll compile a patch for review.

Surely!

--
Jonah H. Harris, Database Internals Architect
EnterpriseDB Corporation
732.331.1324

---(end of broadcast)---
TIP 6: explain analyze is your friend


Re: [HACKERS] Support Parallel Query Execution in Executor

2006-04-08 Thread Tom Lane
Jonah H. Harris [EMAIL PROTECTED] writes:
 On 4/8/06, Markus Schiltknecht [EMAIL PROTECTED] wrote:
 Ehm.. which additional I/O cost? Or do you count inter-process
 communication to I/O?

 Inter-process will add a minimal amount, but if it's done correctly,
 you'll still end up ahead.

This is exactly the bit of optimism I was questioning.  We've already
been sweating blood trying to reduce multiprocessor contention on data
structures in which collisions ought to be avoidable (ie, buffer arrays
where you hope not everyone is hitting the same buffer at once).  I
think passing large volumes of data between different processes is going
to incur quite a lot of locking overhead, pipeline stalls for cache line
transfers, etc, etc, because heavy contention for the transfer buffer is
simply not going to be avoidable.

regards, tom lane

---(end of broadcast)---
TIP 5: don't forget to increase your free space map settings


Re: [HACKERS] Support Parallel Query Execution in Executor

2006-04-08 Thread Jonah H. Harris
On 4/8/06, Tom Lane [EMAIL PROTECTED] wrote:
 passing large volumes of data between different processes is going
 to incur quite a lot of locking overhead, pipeline stalls for cache line
 transfers, etc, etc, because heavy contention for the transfer buffer is
 simply not going to be avoidable.

I don't think anyone believes there isn't going to be contention
somewhere.  Anyone that's developed a parallel system acknowledges
there's a price to pay with parallelism (in complexity) which requires
good algorithms, strategy, and design.

I may certainly be misunderstanding you, but it seems like you want to
avoid parallelism altogether.  Aside from PostgreSQL, the only systems
I know of which don't support parallelism in query execution are MySQL
and Firebird, but I don't see competing at the low end as a compelling
argument.  Similarly, I don't think every major vendor would have put money
into developing something that isn't useful.

I do agree that there are many times you won't want to have parallel
query.  The reasons for employing parallelism is very dependent on the
environment and application it's used in, so it's certainly going to
be up to the user to decide whether they want to use it or not.

We're currently playing with a proof-of-concept to find issues before
proposing a design.  This doesn't affect the community in any way. 
Rather than negativity, I'd really like to see your suggestions on
avoiding contention as much as possible; your ideas may certainly get
us past several obstacles more quickly.

--
Jonah H. Harris, Database Internals Architect
EnterpriseDB Corporation
732.331.1324

---(end of broadcast)---
TIP 6: explain analyze is your friend


Re: [HACKERS] Support Parallel Query Execution in Executor

2006-04-08 Thread Gregory Maxwell
On 4/8/06, Tom Lane [EMAIL PROTECTED] wrote:
 This is exactly the bit of optimism I was questioning.  We've already
 been sweating blood trying to reduce multiprocessor contention on data
 structures in which collisions ought to be avoidable (ie, buffer arrays
 where you hope not everyone is hitting the same buffer at once).  I
 think passing large volumes of data between different processes is going
 to incur quite a lot of locking overhead, pipeline stalls for cache line
 transfers, etc, etc, because heavy contention for the transfer buffer is
 simply not going to be avoidable.

We should consider true parallel execution and overlapping execution
with I/O as distinct cases.

For example, one case made in this thread involved bursty performance
with seqscans presumably because the I/O was stalling while processing
was being performed.  In general this can be avoided without parallel
execution through the use of non-blocking I/O and making an effort to
keep the request pipeline full.

There are other cases where it is useful to perform parallel I/O
without parallel processing.  For example, a query that will perform
an index lookup per row can benefit from running some number of those
lookups in parallel in order to hide the lookup latency and give the
OS and disk elevators a chance to make the random accesses a little
more orderly.  This can be accomplished without true parallel
processing.  (Perhaps PG does this already?)
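
As a rough illustration, a standalone sketch of that pattern (invented file
name and block list, and no claim that the buffer manager could adopt it
as-is) would issue the hints for a batch of upcoming blocks first and do
the real reads second:

    /*
     * Standalone sketch: hint a batch of upcoming random blocks with
     * POSIX_FADV_WILLNEED, then do the real reads.  File name and block list
     * are invented; this is not PostgreSQL code.
     */
    #define _XOPEN_SOURCE 600
    #include <fcntl.h>
    #include <stdio.h>
    #include <unistd.h>

    #define BLCKSZ 8192

    int
    main(void)
    {
        /* Pretend an index scan told us which heap blocks it will visit next. */
        const off_t blocks[] = {912, 17, 4403, 288, 3071};
        const int   nblocks = sizeof(blocks) / sizeof(blocks[0]);
        char        page[BLCKSZ];
        int         fd = open("/tmp/heapfile.dat", O_RDONLY);  /* hypothetical */
        int         i;

        if (fd < 0)
        {
            perror("open");
            return 1;
        }

        /* Pass 1: tell the kernel about every block we are about to need. */
        for (i = 0; i < nblocks; i++)
            posix_fadvise(fd, blocks[i] * BLCKSZ, BLCKSZ, POSIX_FADV_WILLNEED);

        /* Pass 2: the real reads; with luck the elevator already fetched them. */
        for (i = 0; i < nblocks; i++)
            if (pread(fd, page, BLCKSZ, blocks[i] * BLCKSZ) < 0)
                perror("pread");

        close(fd);
        return 0;
    }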

Parallel execution to get access to more CPU and memory bandwidth is a
fine thing, and worth the costs in many cases... but it shouldn't be
used as an easy way to get parallel IO without careful consideration.

---(end of broadcast)---
TIP 2: Don't 'kill -9' the postmaster


Re: [HACKERS] Support Parallel Query Execution in Executor

2006-04-08 Thread Luke Lonergan
Greg,

On 4/8/06 5:43 PM, Gregory Maxwell [EMAIL PROTECTED] wrote:

 For example, one case made in this thread involved bursty performance
 with seqscans presumably because the I/O was stalling while processing
 was being performed.  In general this can be avoided without parallel
 execution through the use of non-blocking I/O and making an effort to
 keep the request pipeline full.

I agree - there is a real continuing need for overlapped scan and execution,
though I'm not sure quite how to get it efficiently using inter-process
communication.  The AIO interface is nicely proven at this point, but to
Tom's point, it's non-portable.

What we see now is that the executor is CPU bound at about 300MB/s scan rate
using modern Opteron CPUs.  Using some kind of overlap strategy would help
this a lot - using programmable readahead to stage I/O requests
asynchronously from the scan node would be better than OS hints like
fadvise(), but the tuning would be tricky IMO.
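
A minimal standalone sketch of the kind of overlap I mean, using POSIX AIO
(hypothetical file name, toy double buffering, not executor code):

    /*
     * Standalone toy of overlapping the read of block N+1 with processing of
     * block N using POSIX AIO.  Hypothetical file name, simple double
     * buffering, not executor code.
     */
    #define _XOPEN_SOURCE 600
    #include <aio.h>
    #include <fcntl.h>
    #include <stdio.h>
    #include <string.h>
    #include <unistd.h>

    #define BLCKSZ 8192

    int
    main(void)
    {
        char        bufA[BLCKSZ],
                    bufB[BLCKSZ];
        char       *cur = bufA,
                   *next = bufB,
                   *tmp;
        off_t       blkno = 0;
        struct aiocb cb;
        const struct aiocb *list[1];
        int         fd = open("/tmp/bigtable.dat", O_RDONLY);  /* hypothetical */

        if (fd < 0)
        {
            perror("open");
            return 1;
        }

        /* Read block 0 synchronously to prime the pipeline. */
        if (pread(fd, cur, BLCKSZ, 0) <= 0)
        {
            close(fd);
            return 0;
        }

        for (;;)
        {
            /* Kick off the read of the next block... */
            memset(&cb, 0, sizeof(cb));
            cb.aio_fildes = fd;
            cb.aio_buf = next;
            cb.aio_nbytes = BLCKSZ;
            cb.aio_offset = (blkno + 1) * BLCKSZ;
            aio_read(&cb);

            /* ...while the scan node chews on the current block. */
            /* process_block(cur); */

            /* Wait for the prefetch to complete and swap buffers. */
            list[0] = &cb;
            aio_suspend(list, 1, NULL);
            if (aio_error(&cb) != 0 || aio_return(&cb) <= 0)
                break;          /* EOF or error: stop */
            tmp = cur;
            cur = next;
            next = tmp;
            blkno++;
        }

        close(fd);
        return 0;
    }

Getting the same effect through portable inter-process machinery is the
hard part.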

- Luke



---(end of broadcast)---
TIP 2: Don't 'kill -9' the postmaster


Re: [HACKERS] Support Parallel Query Execution in Executor

2006-04-08 Thread Myron Scott

Gregory Maxwell wrote:

We should consider true parallel execution and overlapping execution
with I/O as distinct cases.

For example, one case made in this thread involved bursty performance
with seqscans presumably because the I/O was stalling while processing
was being performed.  In general this can be avoided without parallel
execution through the use of non-blocking I/O and making an effort to
keep the request pipeline full.

There are other cases where it is useful to perform parallel I/O
without parallel processing.. for example: a query that will perform
an index lookup per row can benefit from running some number of those
lookups in parallel in order to hide the lookup latency and give the
OS and disk elevators a chance to make the random accesses a little
more orderly. This can be accomplished without true parallel
processing. (Perhaps PG does this already?)

  


I have done some testing more along these lines with an old fork of
postgres code (2001).  In my tests, I used a thread to delegate out the
actual heap scan of the SeqScan.  The job of the slave thread was to
fault in buffer pages and determine the time validity of the tuples.
ItemPointers are passed back to the master thread via a common memory
area guarded by mutex locking.  The master thread is then responsible
for converting the ItemPointers to HeapTuples and finishing the
execution run.  I added a little hack to the buffer code to force pages
read into the buffer to stay at the back of the free buffer list until
the master thread has had a chance to use it.  These are the parameters
of my test table:


Pages: 9459; Tuples: 961187 (Live 673029, Dead 288158)

Average tuple size is 70 bytes

create table test (rand int, message varchar(256));

So far I've done a couple of runs with a single query on a 2-processor
machine, with the following results via dtrace.

select * from test;

CPU  ID     FUNCTION:NAME
  1  46218  ExecEndSeqScan:return           Inline scan time 81729
  0  46216  ExecEndDelegatedSeqScan:return  Delegated scan time 59903
  0  46218  ExecEndSeqScan:return           Inline scan time 95708
  0  46216  ExecEndDelegatedSeqScan:return  Delegated scan time 58255
  0  46218  ExecEndSeqScan:return           Inline scan time 79028
  0  46216  ExecEndDelegatedSeqScan:return  Delegated scan time 50500

On average, a 34% decrease in total time using the delegated scan.

A very crude, simple test but I think it shows some promise.

I know I used threads, but you could probably just as easily use a slave
process and pass ItemPointers via pipes or shared memory.
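
For a rough idea of the shape of the handoff (a standalone toy with invented
names and sizes, not the code from my fork), think of a small mutex-guarded
ring of ItemPointer-like entries:

    /*
     * Standalone toy of the handoff: a slave thread pushes "item pointers"
     * into a small mutex-guarded ring buffer and the master consumes them.
     * Names and sizes are invented; this is not the code from the old fork.
     */
    #include <pthread.h>
    #include <stdio.h>

    #define RING_SIZE 64
    #define NITEMS 1000

    typedef struct
    {
        unsigned    blkno;
        unsigned    offnum;
    } ItemPtr;

    static ItemPtr ring[RING_SIZE];
    static int  head = 0,
                tail = 0,
                done = 0;
    static pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;
    static pthread_cond_t not_full = PTHREAD_COND_INITIALIZER;
    static pthread_cond_t not_empty = PTHREAD_COND_INITIALIZER;

    /* Slave: "scan the heap", passing qualifying item pointers to the master. */
    static void *
    slave_scan(void *arg)
    {
        int         i;

        (void) arg;
        for (i = 0; i < NITEMS; i++)
        {
            pthread_mutex_lock(&lock);
            while ((head + 1) % RING_SIZE == tail)      /* ring full: wait */
                pthread_cond_wait(&not_full, &lock);
            ring[head].blkno = i / 100;
            ring[head].offnum = i % 100;
            head = (head + 1) % RING_SIZE;
            pthread_cond_signal(&not_empty);
            pthread_mutex_unlock(&lock);
        }
        pthread_mutex_lock(&lock);
        done = 1;
        pthread_cond_signal(&not_empty);
        pthread_mutex_unlock(&lock);
        return NULL;
    }

    int
    main(void)
    {
        pthread_t   slave;
        long        consumed = 0;

        pthread_create(&slave, NULL, slave_scan, NULL);
        for (;;)
        {
            ItemPtr     it;

            pthread_mutex_lock(&lock);
            while (tail == head && !done)               /* ring empty: wait */
                pthread_cond_wait(&not_empty, &lock);
            if (tail == head && done)
            {
                pthread_mutex_unlock(&lock);
                break;
            }
            it = ring[tail];
            tail = (tail + 1) % RING_SIZE;
            pthread_cond_signal(&not_full);
            pthread_mutex_unlock(&lock);

            (void) it;          /* the master would fetch the HeapTuple here */
            consumed++;
        }
        pthread_join(slave, NULL);
        printf("master consumed %ld item pointers\n", consumed);
        return 0;
    }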

Thanks,

Myron Scott




---(end of broadcast)---
TIP 2: Don't 'kill -9' the postmaster


Re: [HACKERS] Support Parallel Query Execution in Executor

2006-04-08 Thread Luke Lonergan
Myron,

First, this sounds really good!

On 4/8/06 9:54 PM, Myron Scott [EMAIL PROTECTED] wrote:

 I added a little hack to the buffer code to force pages read into the
 buffer to stay at the back of the free buffer list until the master
 thread has had a chance to use it.

This is the part I'm curious about - is this using the shared_buffers region
in a circular buffer fashion to store pre-fetched pages?

One thing I've wondered about is: how much memory is required to get
efficient overlap?  Did you find that you had to tune the amount of buffer
memory to get the performance to work out?

- Luke



---(end of broadcast)---
TIP 2: Don't 'kill -9' the postmaster


Re: [HACKERS] Support Parallel Query Execution in Executor

2006-04-07 Thread Mike Rylander
On 4/6/06, Qingqing Zhou [EMAIL PROTECTED] wrote:

 Jonah H. Harris [EMAIL PROTECTED] wrote
 
  Great work!  I had looked into this a little bit and came to the same
  ideas/problems you did, but none of them seemed insurmountable at all.
   I'd be interested in working with you on this if you'd like.
 

First, I want to second Jonah's enthusiasm.  This is very exciting!


 Yes, I am happy to work with anyone on the topic.  The plan I have in mind
 is like this:
 (1) stabilize the master-slave seqscan: solve all the problems left;
 (2) parallelize the seqscan: AFAICS, this should not be very difficult based
 on 1, and may only need some scan-portion assignment;

This is really only a gut feeling for me (it can't be otherwise, since
we can't yet test), but I think parallelizing a single seqscan is
pretty much guaranteed to do nothing, because seqscans, especially on
large tables, are IO bound.

There was a plan some time ago (during 8.0 beta, I think) to allow
multiple seqscans from different queries to join each other, such that
scans that begin later start scanning the table at the point, or just
behind the point, that the first running scan is already at.  That
plan would reduce IO contention, and buffer and OS cache thrashing, by
having multiple readers pull from the same hose.

I can't see how asking for more than one stream from the same file
would do anything but increase both cache thrashing and IO bandwidth
contention.  Am I missing something here?

 (3) add an indexscan or one or two other node types to the master-slave
 solution: this is in order to make the framework extensible;
 (4) parallelize these nodes - this will be a big chunk of the job;

Now that could be a _big_ win!  Especially if tablespaces are used to
balance commonly combined tables and indexes.

 (5) add two-phase optimization to the server - we have to consider
 partitioned tables at this stage, yet another big chunk of the job;


Same here.  This would be a place where parallel seqscans of different
tables (instead of multi-headed scan of one table) could buy you a
lot, especially with proper tablespace use.

Thanks again, Qingqing, for the work on this.  I'm very excited about
where this could go. :)

 Regards,
 Qingqing



 ---(end of broadcast)---
 TIP 4: Have you searched our list archives?

http://archives.postgresql.org



--
Mike Rylander
[EMAIL PROTECTED]
GPLS -- PINES Development
Database Developer
http://open-ils.org

---(end of broadcast)---
TIP 2: Don't 'kill -9' the postmaster


Re: [HACKERS] Support Parallel Query Execution in Executor

2006-04-07 Thread Josh Berkus
Qingqing,

 First, I want to second Jonah's enthusiasm.  This is very exciting!

Me, three!  I didn't think this was ever going to come to Postgres absent 
major corporate funding.

 This is really only a gut feeling for me (it can't be otherwise, since
 we can't yet test), but I think parallelizing a single seqscan is
 pretty much guaranteed to do nothing, because seqscans, especially on
 large tables, are IO bound.

Actually, not true.  Our current seqscan performance suffers from 
producer-consumer fluctuation.  Greenplum and Sun did a whole bunch of 
testing on this.

Basically reading a large table off disk does this:

read some table while not processing
process in cpu while not reading
read some more table while not processing
process some more in cpu while not reading
etc.
resulting in an I/O throughput graph that looks like:

*   *   *
   * ** ** *
  ** ** **
 *   *   *  *

The really annoying part about this, for me personally, is that the peaks 
are significantly faster than comparable commercial DBMSes ... but our 
average is far less.   So even on a single seq scan, parallel query 
execution would make a significant difference in performance, possibly as 
much as +75% on seq scans of large tables.

-- 
--Josh

Josh Berkus
Aglio Database Solutions
San Francisco

---(end of broadcast)---
TIP 9: In versions below 8.0, the planner will ignore your desire to
   choose an index scan if your joining column's datatypes do not
   match


Re: [HACKERS] Support Parallel Query Execution in Executor

2006-04-07 Thread Jonah H. Harris
On 4/7/06, Josh Berkus josh@agliodbs.com wrote:
 So even on a single seq scan, parallel query
 execution would make a significant difference
 in performance, possibly as
 much as +75% on seq scans of large tables.

I've been looking at several commercial systems which employ dynamic
partitioning for sequential scans of large tables.  While 75% sounds a
little optimistic, there is most definitely a sizable performance
increase.


--
Jonah H. Harris, Database Internals Architect
EnterpriseDB Corporation
732.331.1324

---(end of broadcast)---
TIP 1: if posting/reading through Usenet, please send an appropriate
   subscribe-nomail command to [EMAIL PROTECTED] so that your
   message can get through to the mailing list cleanly


Re: [HACKERS] Support Parallel Query Execution in Executor

2006-04-06 Thread Martijn van Oosterhout
On Thu, Apr 06, 2006 at 06:28:33PM +0800, Qingqing Zhou wrote:
 I have written some experimental code of doing master-slave seqscan in
 PostgreSQL. During the work, I feel we had enough infrastructure to support
 parallel query execution.

Good work.  One question though: temporary tables.  Currently there is no
way to get a consistent view of a temporary table from another process.

You probably have to avoid shipping off non-immutable function calls.

Is it possible to deadlock?  (I don't see how, but I may not be looking
hard enough.)

The hardest work would be finding what can be safely farmed off.

Nice work though, I hadn't thought of this approach.

Have a nice day,
-- 
Martijn van Oosterhout   kleptog@svana.org   http://svana.org/kleptog/
 Patent. n. Genius is 5% inspiration and 95% perspiration. A patent is a
 tool for doing 5% of the work and then sitting around waiting for someone
 else to do the other 95% so you can sue them.




Re: [HACKERS] Support Parallel Query Execution in Executor

2006-04-06 Thread Jonah H. Harris
On 4/6/06, Qingqing Zhou [EMAIL PROTECTED] wrote:
 I have written some experimental code of doing master-slave seqscan in
 PostgreSQL. During the work, I feel we had enough infrastructure to support
 parallel query execution.

Great work!  I had looked into this a little bit and came to the same
ideas/problems you did, but none of them seemed insurmountable at all.
 I'd be interested in working with you on this if you'd like.

I'm interested in hearing Tom's suggestions on this topic too.

--
Jonah H. Harris, Database Internals Architect
EnterpriseDB Corporation
732.331.1324

---(end of broadcast)---
TIP 4: Have you searched our list archives?

   http://archives.postgresql.org


Re: [HACKERS] Support Parallel Query Execution in Executor

2006-04-06 Thread Andrew Dunstan

Jonah H. Harris wrote:

On 4/6/06, Qingqing Zhou [EMAIL PROTECTED] wrote:
  

I have written some experimental code of doing master-slave seqscan in
PostgreSQL. During the work, I feel we had enough infrastructure to support
parallel query execution.



Great work!  I had looked into this a little bit and came to the same
ideas/problems you did, but none of them seemed insurmountable at all.
 I'd be interested in working with you on this if you'd like.

I'm interested in hearing Tom's suggestions on this topic too.

  


From time to time there is (often ill-informed) discussion about making 
postgres multi-threaded - this strikes me as a possible candidate area 
for multithreading, and maybe that would be a way to overcome some of 
the problems mentioned.


Just a thought

cheers

andrew

---(end of broadcast)---
TIP 6: explain analyze is your friend


Re: [HACKERS] Support Parallel Query Execution in Executor

2006-04-06 Thread Qingqing Zhou

Jonah H. Harris [EMAIL PROTECTED] wrote

 Great work!  I had looked into this a little bit and came to the same
 ideas/problems you did, but none of them seemed insurmountable at all.
  I'd be interested in working with you on this if you'd like.


Yes, I am happy to work with anyone on the topic.  The plan I have in mind
is like this:
(1) stabilize the master-slave seqscan: solve all the problems left;
(2) parallelize the seqscan: AFAICS, this should not be very difficult based
on 1, and may only need some scan-portion assignment;
(3) add an indexscan or one or two other node types to the master-slave
solution: this is in order to make the framework extensible;
(4) parallelize these nodes - this will be a big chunk of the job;
(5) add two-phase optimization to the server - we have to consider
partitioned tables at this stage, yet another big chunk of the job;

Regards,
Qingqing



---(end of broadcast)---
TIP 4: Have you searched our list archives?

   http://archives.postgresql.org