Re: [PERFORM] getting the most of out multi-core systems for repeated complex SELECT statements

2011-02-07 Thread Vitalii Tymchyshyn

Hi, all

My small thoughts about parallelizing single query.
AFAIK, in the cases where it is needed there is usually one single 
operation that takes a lot of CPU, e.g. hashing or sorting. And these are 
usually tasks that have well-known algorithms for parallelizing them.
The main problem, as I see it, is thread safety. First of all, the operations 
that are going to be parallelized must be thread-safe. Then the functions 
and procedures they call must be thread-safe too. So, a marker for a 
procedure would have to be introduced, and all the standard ones 
checked/fixed for parallel processing with the marker set.
Then, one should not forget the optimizer checks for when to introduce 
parallelizing. How should it be accounted for in the query plan? Should it 
influence optimizer decisions (should the optimizer count CPU time or wall 
time when costing a query plan)?
Or can it simply be used by an operation when it can see that it will 
benefit from it?
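As a rough sketch of the sort of well-known parallel algorithm meant here (hypothetical Python, nothing to do with PostgreSQL internals): sort fixed-size chunks concurrently, then merge the sorted runs in a single pass.

```python
import heapq
from concurrent.futures import ThreadPoolExecutor

def parallel_sort(data, workers=4):
    """Sort chunks concurrently, then do one merge pass.

    A thread pool keeps this sketch portable; a real implementation
    would use separate processes (or backend workers) to get true
    CPU parallelism past the interpreter lock.
    """
    size = max(1, (len(data) + workers - 1) // workers)
    chunks = [data[i:i + size] for i in range(0, len(data), size)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        sorted_chunks = list(pool.map(sorted, chunks))
    # heapq.merge combines the pre-sorted runs lazily.
    return list(heapq.merge(*sorted_chunks))
```

The merge step is the part that stays serial, which is why the speedup from this scheme is bounded by it.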


Best regards, Vitalii Tymchyshyn

--
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] getting the most of out multi-core systems for repeated complex SELECT statements

2011-02-04 Thread Greg Smith

Andy Colson wrote:
Yes, I agree... for today.  If you gaze into 5 years... double the 
core count (but not the speed), double the IO rate.  What do you see?


Four more versions of PostgreSQL addressing problems people are having 
right now.  When we reach the point where parallel query is the only way 
around the actual bottlenecks in the software people are running into, 
someone will finish parallel query.  I am not a fan of speculative 
development in advance of real demand for it.  There are multiple much 
more serious bottlenecks impacting scalability in PostgreSQL that need 
to be addressed before this one is #1 on the development priority list 
to me.


--
Greg Smith   2ndQuadrant US   g...@2ndquadrant.com   Baltimore, MD
PostgreSQL Training, Services, and 24x7 Support  www.2ndQuadrant.us
PostgreSQL 9.0 High Performance: http://www.2ndQuadrant.com/books




Re: [PERFORM] getting the most of out multi-core systems for repeated complex SELECT statements

2011-02-04 Thread Chris Browne
gnuo...@rcn.com writes:
 Time for my pet meme to wiggle out of its hole (next to Phil's, and a
 day later).  For PG to prosper in the future, it has to embrace the
 multi-core/processor/SSD machine at the query level.  It has to.  And
 it has to because the Big Boys already do so, to some extent, and
 they've realized that the BCNF schema on such machines is supremely
 efficient.  PG/MySql/OSEngineOfChoice will get left behind simply
 because the efficiency offered will be worth the price.

 I know this is far from trivial, and my C skills are such that I can
 offer no help.  These machines have been the obvious current machine
 in waiting for at least 5 years, and those applications which benefit
 from parallelism (servers of all kinds, in particular) will filter out
 the winners and losers based on exploiting this parallelism.

 Much as it pains me to say it, but the MicroSoft approach to software:
 write to the next generation processor and force users to upgrade,
 will be the winning strategy for database engines.  There's just way
 too much to gain.

I'm not sure how true that is, really.  (e.g. - too much to gain.)

I know that Jan Wieck and I have been bouncing thoughts on valid use of
threading off each other for *years*, now, and it tends to be
interesting but difficult to the point of impracticality.

But how things play out are quite fundamentally different for different
usage models.

It's useful to cross items off the list, so we're left with the tough
ones that are actually a problem.

1.  For instance, OLTP applications, that generate a lot of concurrent
connections, already do perfectly well in scaling on multi-core systems.
Each connection is a separate process, and that already harnesses
multi-core systems perfectly well.  Things have improved a lot over the
last 10 years, and there may yet be further improvements to be found,
but it seems pretty reasonable to me to say that the OLTP scenario can
be treated as solved in this context.

The scenario where I can squint and see value in trying to multithread
is the contrast to that, of OLAP.  The case where we only use a single
core, today, is where there's only a single connection, and a single
query, running.

But that can reasonably be further constrained; not every
single-connection query could be improved by trying to spread work
across cores.  We need to add some further assumptions:

2.  The query needs to NOT be I/O-bound.  If it's I/O-bound, then your
system is waiting for the data to come off disk rather than processing
that data.

That condition can be somewhat further strengthened...  It further needs
to be a query where multi-processing would not increase the I/O burden.

Between those two assumptions, that cuts the scope of usefulness to a
very considerable degree.

And if we *are* multiprocessing, we introduce several new problems, each
of which is quite troublesome:

 - How do we decompose the query so that the pieces are processed in
   ways that improve processing time?

   In effect, how to generate a parallel query plan?

   It would be more than stupid to consider this to be obvious.  We've
   got 15-ish years worth of query optimization efforts that have gone
   into Postgres, and many of those changes were not obvious until
   after they got thought through carefully.  This multiplies the
   complexity, and opportunity for error.

 - Coordinating processing

   Becomes quite a bit more complex.  Multiple threads/processes are
   accessing parts of the same data concurrently, so a parallelized
   query that harnesses 8 CPUs might generate 8x as many locks and
   analogous coordination points.

 - Platform specificity

   Threading is a problem in that each OS platform has its own
   implementation, and even when they claim to conform to common
   standards, they still have somewhat different interpretations.  This
   tends to go in one of the following directions:

a) You have to pick one platform to do threading on.

   Oops.  There's now PostgreSQL-Linux, that is the only platform
   where our multiprocessing thing works.  It could be worse than
   that; it might work on a particular version of a particular OS...

b) You follow some apparently portable threading standard

   And find that things are hugely buggy because the platforms
   follow the standard a bit differently.  And perhaps this means
   that, analogous to a), you've got a set of platforms where this
   works (for some value of works), and others where it can't.
   That's almost as evil as a).

c) You follow some apparently portable threading standard

   And need to wrap things in a pretty thick safety blanket to make
   sure it is compatible with all the bugs in interpretation and
   implementation.  Complexity++, and performance probably suffers.

   None of these are particularly palatable, which is why threading
   proposals get a lot of pushback.

At the end of the day, if this is 

Re: [PERFORM] getting the most of out multi-core systems for repeated complex SELECT statements

2011-02-04 Thread david

On Fri, 4 Feb 2011, Chris Browne wrote:


2.  The query needs to NOT be I/O-bound.  If it's I/O bound, then your
system is waiting for the data to come off disk, rather than to do
processing of that data.


yes and no on this one.

it is very possible to have a situation where the process generating the 
I/O is waiting for the data to come off disk while there are still idle 
resources in the disk subsystem.


it may be that the best way to address this is to have the process 
generating the I/O send off more requests, but that is sometimes 
significantly more complicated than splitting the work between two 
processes and letting each of them generate I/O requests.


with rotating disks, ideally you want to have at least two requests 
outstanding, one that the disk is working on now, and one for it to start 
on as soon as it finishes the one that it's on (so that the disk doesn't 
sit idle while the process decides what the next read should be). In 
practice you tend to want to have even more outstanding from the 
application so that they can be optimized (combined, reordered, etc) by 
the lower layers.


if you end up with a largish RAID array (say 16 disks), this can translate 
into a lot of outstanding requests that you want to have active to fully 
utilize the array, but having the same number of requests outstanding 
against a single disk would be counterproductive, as the disk would not be 
able to see all the outstanding requests and therefore would not be able to 
optimize them as effectively.
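The queue-depth idea can be sketched like this (hypothetical Python; `read_block` is a stand-in for a real disk read): a fixed-size pool keeps that many requests outstanding against the storage layer, so the lower layers always have work queued and something to reorder or combine.

```python
from concurrent.futures import ThreadPoolExecutor

def read_block(block_no):
    # Stand-in for an actual disk read; returns a tagged result so the
    # sketch is self-contained.
    return ("block", block_no)

def read_with_queue_depth(block_numbers, depth=4):
    """Keep up to `depth` requests in flight at once.

    With depth=1 the device idles while the application decides on the
    next read; with a larger depth a disk (or a 16-disk array) always
    has the next request ready.
    """
    with ThreadPoolExecutor(max_workers=depth) as pool:
        # map() preserves input order even though reads complete
        # concurrently.
        return list(pool.map(read_block, block_numbers))
```

Tuning `depth` to the device is the point of the paragraph above: a single disk wants a small depth, a large array wants a much bigger one.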


David Lang



Re: [PERFORM] getting the most of out multi-core systems for repeated complex SELECT statements

2011-02-03 Thread Andy Colson

On 2/3/2011 9:08 AM, Mark Stosberg wrote:


Each night we run over 100,000 saved searches against PostgreSQL
9.0.x. These are all complex SELECTs using cube functions to perform a
geo-spatial search to help people find adoptable pets at shelters.

All of our machines in development and in production have at least 2 cores
in them, and I'm wondering about the best way to maximally engage all
the processors.

Now we simply run the searches in serial. I realize PostgreSQL may be
taking advantage of the multiple cores some in this arrangement, but I'm
seeking advice about the possibility and methods for running the
searches in parallel.

One naive approach I considered was to use parallel cron scripts. One
would run the odd searches and the other would run the even
searches. This would be easy to implement, but perhaps there is a better
way.  To those who have covered this area already, what's the best way
to put multiple cores to use when running repeated SELECTs with PostgreSQL?

Thanks!

 Mark




1) I'm assuming this is all server side processing.
2) One database connection will use one core.  To use multiple cores you 
need multiple database connections.
3) If your jobs are IO bound, then running multiple jobs may hurt 
performance.


Your naive approach is the best.  Just spawn off two jobs (or three, or 
whatever).  I think it's also the only method.  (If there is another 
method, I don't know what it would be.)
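A sketch of that approach (hypothetical Python; `run_search` stands in for opening a connection and executing one saved search): partition the search ids by modulo and give each partition to its own worker — the generalization of the odd/even cron-script split.

```python
from concurrent.futures import ThreadPoolExecutor

def run_search(search_id):
    # Stand-in for connecting to the database and running one saved
    # search; here it just squares the id so the sketch runs anywhere.
    return search_id * search_id

def run_partition(search_ids, worker, n_workers):
    # Worker w takes the ids where id % n_workers == w; with two
    # workers this is exactly the odd/even split.
    return [run_search(s) for s in search_ids if s % n_workers == worker]

def run_all(search_ids, n_workers=2):
    # Threads keep the sketch portable; in practice each worker would
    # be a separate script/process holding its own database connection,
    # since one connection uses only one core.
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        futures = [pool.submit(run_partition, search_ids, w, n_workers)
                   for w in range(n_workers)]
        return [r for f in futures for r in f.result()]
```

Raising `n_workers` past the core count (or past what the disks can serve) just adds contention, which is the I/O-bound caveat in point 3 above.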


-Andy



Re: [PERFORM] getting the most of out multi-core systems for repeated complex SELECT statements

2011-02-03 Thread gnuoytr
Time for my pet meme to wiggle out of its hole (next to Phil's, and a day 
later).  For PG to prosper in the future, it has to embrace the 
multi-core/processor/SSD machine at the query level.  It has to.  And it has to 
because the Big Boys already do so, to some extent, and they've realized that 
the BCNF schema on such machines is supremely efficient.  
PG/MySql/OSEngineOfChoice will get left behind simply because the efficiency 
offered will be worth the price.

I know this is far from trivial, and my C skills are such that I can offer no 
help.  These machines have been the obvious current machine in waiting for at 
least 5 years, and those applications which benefit from parallelism (servers 
of all kinds, in particular) will filter out the winners and losers based on 
exploiting this parallelism.

Much as it pains me to say it, but the MicroSoft approach to software: write to 
the next generation processor and force users to upgrade, will be the winning 
strategy for database engines.  There's just way too much to gain.

-- Robert

 Original message 
Date: Thu, 03 Feb 2011 09:44:03 -0600
From: pgsql-performance-ow...@postgresql.org (on behalf of Andy Colson 
a...@squeakycode.net)
Subject: Re: [PERFORM] getting the most of out multi-core systems for repeated 
complex SELECT statements  
To: Mark Stosberg m...@summersault.com
Cc: pgsql-performance@postgresql.org

On 2/3/2011 9:08 AM, Mark Stosberg wrote:

 Each night we run over 100,000 saved searches against PostgreSQL
 9.0.x. These are all complex SELECTs using cube functions to perform a
 geo-spatial search to help people find adoptable pets at shelters.

 All of our machines in development and in production have at least 2 cores
 in them, and I'm wondering about the best way to maximally engage all
 the processors.

 Now we simply run the searches in serial. I realize PostgreSQL may be
 taking advantage of the multiple cores some in this arrangement, but I'm
 seeking advice about the possibility and methods for running the
 searches in parallel.

 One naive approach I considered was to use parallel cron scripts. One
 would run the odd searches and the other would run the even
 searches. This would be easy to implement, but perhaps there is a better
 way.  To those who have covered this area already, what's the best way
 to put multiple cores to use when running repeated SELECTs with PostgreSQL?

 Thanks!

  Mark



1) I'm assuming this is all server side processing.
2) One database connection will use one core.  To use multiple cores you 
need multiple database connections.
3) If your jobs are IO bound, then running multiple jobs may hurt 
performance.

Your naive approach is the best.  Just spawn off two jobs (or three, or 
whatever).  I think it's also the only method.  (If there is another 
method, I don't know what it would be.)

-Andy



Re: [PERFORM] getting the most of out multi-core systems for repeated complex SELECT statements

2011-02-03 Thread Mark Stosberg
On 02/03/2011 10:54 AM, Oleg Bartunov wrote:
 Mark,
 
 you could try the gevel module to get the structure of the GiST index and
 look at whether items are distributed more or less homogeneously (see the
 different levels). You can visualize the index like
 http://www.sai.msu.su/~megera/wiki/Rtree_Index
 Also, if your searches are neighbourhood searches, then you could try
 KNN, available in the 9.1 development version.

Oleg,

Those are interesting details to consider. I read more about KNN here:

http://www.depesz.com/index.php/2010/12/11/waiting-for-9-1-knngist/

Will I be able to use it to improve the performance of finding nearby
zipcodes? It sounds like KNN has great potential for performance
improvements!

   Mark



Re: [PERFORM] getting the most of out multi-core systems for repeated complex SELECT statements

2011-02-03 Thread Aljoša Mohorović
On Thu, Feb 3, 2011 at 4:57 PM,  gnuo...@rcn.com wrote:
 Time for my pet meme to wiggle out of its hole (next to Phil's, and a day 
 later).  For PG to prosper in the future, it has to embrace the 
 multi-core/processor/SSD machine at the query level.  It has to.  And it has 
 to because the Big Boys already do so, to some extent, and they've realized 
 that the BCNF schema on such machines is supremely efficient.  
 PG/MySql/OSEngineOfChoice will get left behind simply because the efficiency 
 offered will be worth the price.

this kind of view on what the postgres community has to do can only be
true if postgres has no intention to support cloud environments or
any kind of hardware virtualization.
while i'm sure targeting specific hardware features can greatly
improve postgres performance, it should be an option, not a requirement.
forcing users to have specific hardware is basically telling users
that you can forget about using postgres in amazon/rackspace cloud
environments (or any similar environment).
i'm sure that a large part of postgres community doesn't care about
cloud environments (although this is only my personal impression)
but if the plan is to disable postgres usage in such environments you are
basically losing a large part of developers/companies targeting
global internet consumers with their online products.
cloud environments are currently the best platform for internet
oriented developers/companies to start a new project or even to
migrate from custom hardware/dedicated data center.

 Much as it pains me to say it, but the MicroSoft approach to software: write 
 to the next generation processor and force users to upgrade, will be the 
 winning strategy for database engines.  There's just way too much to gain.

it can arguably be said that because of this approach microsoft is
losing ground in most of their businesses/strategies.

Aljosa Mohorovic



Re: [PERFORM] getting the most of out multi-core systems for repeated complex SELECT statements

2011-02-03 Thread Scott Marlowe
On Thu, Feb 3, 2011 at 8:57 AM,  gnuo...@rcn.com wrote:
 Time for my pet meme to wiggle out of its hole (next to Phil's, and a day 
 later).  For PG to prosper in the future, it has to embrace the 
 multi-core/processor/SSD machine at the query level.  It has to.  And

I'm pretty sure multi-core query processing is in the TODO list.  Not
sure anyone's working on it tho.  Writing a big check might help.



Re: [PERFORM] getting the most of out multi-core systems for repeated complex SELECT statements

2011-02-03 Thread gnuoytr


 Original message 
Date: Thu, 3 Feb 2011 18:56:34 +0100
From: pgsql-performance-ow...@postgresql.org (on behalf of Aljoša Mohorović 
aljosa.mohoro...@gmail.com)
Subject: Re: [PERFORM] getting the most of out multi-core systems for repeated 
complex SELECT statements  
To: gnuo...@rcn.com
Cc: pgsql-performance@postgresql.org

On Thu, Feb 3, 2011 at 4:57 PM,  gnuo...@rcn.com wrote:
 Time for my pet meme to wiggle out of its hole (next to Phil's, and a day 
 later).  For PG to prosper in the future, it has to embrace the 
 multi-core/processor/SSD machine at the query level.  It has to.  And it has 
 to because the Big Boys already do so, to some extent, and they've realized 
 that the BCNF schema on such machines is supremely efficient.  
 PG/MySql/OSEngineOfChoice will get left behind simply because the efficiency 
 offered will be worth the price.

this kind of view on what postgres community has to do can only be
true if postgres has no intention to support cloud environments or
any kind of hardware virtualization.
while i'm sure targeting specific hardware features can greatly
improve postgres performance it should be an option not a requirement.

Being an option is just fine.  It's not there now.  Asserting that the cloud 
meme, based on lowest cost marginal hardware, should dictate a database engine 
is putting the cart before the horse.


forcing users to have specific hardware is basically telling users
that you can forget about using postgres in amazon/rackspace cloud
environments (or any similar environment).

Just not on cheap clouds, if they want maximal performance from the engine 
using BCNF schemas.  Replicating COBOL/VSAM/flatfile applications in any 
relational database engine is merely deluding oneself.  


i'm sure that a large part of postgres community doesn't care about
cloud environments (although this is only my personal impression)
but if the plan is to disable postgres usage in such environments you are
basically losing a large part of developers/companies targeting
global internet consumers with their online products.
cloud environments are currently the best platform for internet
oriented developers/companies to start a new project or even to
migrate from custom hardware/dedicated data center.

 Much as it pains me to say it, but the MicroSoft approach to software: write 
 to the next generation processor and force users to upgrade, will be the 
 winning strategy for database engines.  There's just way too much to gain.

it can arguably be said that because of this approach microsoft is
losing ground in most of their businesses/strategies.

Not really.  MicroSoft is losing ground for the same reason all other 
client/standalone applications are:  such applications don't run any better on 
multi-core/processor machines.  Add in the netbook/phone devices, and that they 
can't seem to make a version of windows that's markedly better than XP.  
Arguably MicroSoft is failing *because Office no longer requires* the next 
generation hardware to run right.  Hmm?  Linux prospers because it's a server 
OS, largely.  Desktop may, or may not, remain relevant.  Linux does make good 
use of such machines.  MicroSoft applications?  Not so much. 

Aljosa Mohorovic



Re: [PERFORM] getting the most of out multi-core systems for repeated complex SELECT statements

2011-02-03 Thread Greg Smith

Scott Marlowe wrote:

On Thu, Feb 3, 2011 at 8:57 AM,  gnuo...@rcn.com wrote:
  

Time for my pet meme to wiggle out of its hole (next to Phil's, and a day 
later).  For PG to prosper in the future, it has to embrace the 
multi-core/processor/SSD machine at the query level.  It has to.  And



I'm pretty sure multi-core query processing is in the TODO list.  Not
sure anyone's working on it tho.  Writing a big check might help.
  


Work on the exciting parts people are interested in is blocked behind 
completely mundane tasks like coordinating how the multiple sessions are 
going to end up with a consistent view of the database.  See "Export 
snapshots to other sessions" at 
http://wiki.postgresql.org/wiki/ClusterFeatures for details on that one.


Parallel query works well for accelerating CPU-bound operations that are 
executing in RAM.  The reality here is that while the feature sounds 
important, these situations don't actually show up that often.  There 
are exactly zero clients I deal with regularly who would be helped out 
by this.  The ones running web applications whose workloads do fit into 
memory are more concerned about supporting large numbers of users, not 
optimizing things for a single one.  And the ones who have so much data 
that single users running large reports would seemingly benefit from 
this are usually disk-bound instead.


The same sort of situation exists with SSDs.  Take out the potential 
users whose data can fit in RAM instead, take out those who can't 
possibly get an SSD big enough to hold all their stuff anyway, and 
what's left in the middle is not very many people.  In a database 
context I still haven't found anything better to do with a SSD than to 
put mid-sized indexes on them, ones a bit too large for RAM but not so 
big that only regular hard drives can hold them.


I would rather strongly disagree with the suggestion that embracing 
either of these fancy but not really as functional as they appear at 
first approaches is critical to PostgreSQL's future.  They're 
specialized techniques useful to only a limited number of people.


--
Greg Smith   2ndQuadrant US   g...@2ndquadrant.com   Baltimore, MD
PostgreSQL Training, Services, and 24x7 Support  www.2ndQuadrant.us
PostgreSQL 9.0 High Performance: http://www.2ndQuadrant.com/books



Re: [PERFORM] getting the most of out multi-core systems for repeated complex SELECT statements

2011-02-03 Thread Andy Colson

On 02/03/2011 04:56 PM, Greg Smith wrote:

Scott Marlowe wrote:

On Thu, Feb 3, 2011 at 8:57 AM,gnuo...@rcn.com  wrote:


Time for my pet meme to wiggle out of its hole (next to Phil's, and a day 
later).  For PG to prosper in the future, it has to embrace the 
multi-core/processor/SSD machine at the query level.  It has to.  And



I'm pretty sure multi-core query processing is in the TODO list.  Not
sure anyone's working on it tho.  Writing a big check might help.



Work on the exciting parts people are interested in is blocked behind completely mundane 
tasks like coordinating how the multiple sessions are going to end up with a consistent 
view of the database. See "Export snapshots to other sessions" at 
http://wiki.postgresql.org/wiki/ClusterFeatures for details on that one.

Parallel query works well for accelerating CPU-bound operations that are 
executing in RAM. The reality here is that while the feature sounds important, 
these situations don't actually show up that often. There are exactly zero 
clients I deal with regularly who would be helped out by this. The ones running 
web applications whose workloads do fit into memory are more concerned about 
supporting large numbers of users, not optimizing things for a single one. And 
the ones who have so much data that single users running large reports would 
seemingly benefit from this are usually disk-bound instead.

The same sort of situation exists with SSDs. Take out the potential users whose 
data can fit in RAM instead, take out those who can't possibly get an SSD big 
enough to hold all their stuff anyway, and what's left in the middle is not 
very many people. In a database context I still haven't found anything better 
to do with a SSD than to put mid-sized indexes on them, ones a bit too large 
for RAM but not so big that only regular hard drives can hold them.

I would rather strongly disagree with the suggestion that embracing either of 
these fancy but not really as functional as they appear at first approaches is 
critical to PostgreSQL's future. They're specialized techniques useful to only 
a limited number of people.

--
Greg Smith   2ndQuadrant US   g...@2ndquadrant.com   Baltimore, MD
PostgreSQL Training, Services, and 24x7 Support   www.2ndQuadrant.us
PostgreSQL 9.0 High Performance: http://www.2ndQuadrant.com/books



4 cores is cheap and popular now, 6 in a bit, 8 next year, 16/24 cores in 5 
years.  You can do 16 cores now, but it's a bit expensive.  I figure hundreds of 
cores will be expensive in 5 years, but possible, and available.

CPUs won't get faster, but HDs and SSDs will.  To have one database 
connection, which runs one query, run fast, it's going to need multi-core 
support.

That's not to say we need parallel queries, or that we need multiple backends to 
work on one query.  We need one backend, working on one query, using mostly the same 
architecture, to just use more than one core.

You'll notice I used _mostly_ and _just_, and have no knowledge of PG 
internals, so I fully expect to be wrong.

My point is, there must be levels of threading, yes?  If a backend has data to 
sort, has it collected, nothing locked, what would it hurt to use multi-core 
sorting?

-- OR --

Threading (and multicore), to me, always means queues.  What if new types of backends 
were created that did simple things, that normal backends could distribute 
work to, then go off and do other things, and come back to collect the results.

I thought I read a paper someplace that said shared-cache (L1/L2/etc) multicore 
CPUs would start getting really slow at 16/32 cores, and that message passing 
was the way forward past that.  If PG started aiming for 128 core support right 
now, it should use some kinda message passing with queues thing, yes?
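The message-passing-with-queues idea might be sketched like this (hypothetical Python; threads and in-process queues are used only to keep the example portable, where real helper backends would be separate processes exchanging messages):

```python
import queue
import threading

def worker(task_q, result_q):
    """A simple helper backend: pull work items, push results back."""
    while True:
        item = task_q.get()
        if item is None:            # sentinel: no more work for us
            break
        result_q.put(item * 2)      # stand-in for the real computation

def distribute(items, n_workers=2):
    """The 'normal backend': hand out work, go do other things,
    come back and collect the results."""
    task_q, result_q = queue.Queue(), queue.Queue()
    workers = [threading.Thread(target=worker, args=(task_q, result_q))
               for _ in range(n_workers)]
    for t in workers:
        t.start()
    for item in items:              # distribute the work...
        task_q.put(item)
    for _ in workers:
        task_q.put(None)            # one sentinel per worker
    results = [result_q.get() for _ in items]   # ...then collect
    for t in workers:
        t.join()
    return results
```

Note the results come back in completion order, not submission order; a real system would tag each message so they can be reassembled.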

-Andy



Re: [PERFORM] getting the most of out multi-core systems for repeated complex SELECT statements

2011-02-03 Thread Greg Smith

Andy Colson wrote:
Cpu's wont get faster, but HD's and SSD's will.  To have one database 
connection, which runs one query, run fast, it's going to need 
multi-core support.


My point was that situations where people need to run one query on one 
database connection that aren't in fact limited by disk I/O are far less 
common than people think.  My troublesome database servers aren't ones 
with a single CPU at its max but wishing there were more workers, 
they're the ones that have 25% waiting for I/O.  And even that crowd is 
still a subset, distinct from people who don't care about the speed of 
any one core, they need lots of connections to go at once.



That's not to say we need parallel query's.  Or we need multiple 
backends to work on one query.  We need one backend, working on one 
query, using mostly the same architecture, to just use more than one 
core.


That's exactly what we mean when we say "parallel query" in the context 
of a single server.


My point is, there must be levels of threading, yes?  If a backend has 
data to sort, has it collected, nothing locked, what would it hurt to 
use multi-core sorting?


Optimizer nodes don't run that way.  The executor pulls rows out of 
the top of the node tree, which then pulls from its children, etc.  If 
you just blindly ran off and executed every individual node to 
completion in parallel, that's not always going to be faster--could be a 
lot slower, if the original query never even needed to execute portions 
of the tree.


When you start dealing with all of the types of nodes that are out there 
it gets very messy in a hurry.  Decomposing the nodes of the query tree 
into steps that can be executed in parallel usefully is the hard problem 
hiding behind the simple idea of "use all the cores!"
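The pull model can be caricatured with generators (a toy sketch, not the actual executor): each node yields rows only as its parent demands them, so a LIMIT near the top stops the scan at the bottom early.

```python
def seq_scan(rows):
    # Leaf node: produce rows one at a time.
    for row in rows:
        yield row

def filter_node(child, predicate):
    # Pulls from its child only as fast as its own parent demands rows.
    for row in child:
        if predicate(row):
            yield row

def limit_node(child, n):
    # Demand-driven execution means a LIMIT stops the whole tree early:
    # the scan below never runs to completion.
    for i, row in enumerate(child):
        if i >= n:
            return
        yield row

# A toy plan: scan a 1000-row "table", keep even rows, take the first 3.
plan = limit_node(filter_node(seq_scan(range(1000)), lambda r: r % 2 == 0), 3)
result = list(plan)
```

Running every node eagerly in parallel would here read all 1000 rows instead of the handful the query actually needs, which is exactly the objection above.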


I thought I read a paper someplace that said shared cache (L1/L2/etc) 
multicore cpu's would start getting really slow at 16/32 cores, and 
that message passing was the way forward past that.  If PG started 
aiming for 128 core support right now, it should use some kinda 
message passing with queues thing, yes?


There already is a TupleStore type that is going to serve as the message 
being sent between the client backends.  Unfortunately we won't get 
anywhere near 128 cores without addressing the known scalability issues 
that are in the code right now, ones you can easily run into even with 8 
cores.


--
Greg Smith   2ndQuadrant US   g...@2ndquadrant.com   Baltimore, MD
PostgreSQL Training, Services, and 24x7 Support  www.2ndQuadrant.us
PostgreSQL 9.0 High Performance: http://www.2ndQuadrant.com/books




Re: [PERFORM] getting the most of out multi-core systems for repeated complex SELECT statements

2011-02-03 Thread Scott Marlowe
On Thu, Feb 3, 2011 at 9:00 PM, Greg Smith g...@2ndquadrant.com wrote:
 Andy Colson wrote:

 Cpu's wont get faster, but HD's and SSD's will.  To have one database
 connection, which runs one query, run fast, it's going to need multi-core
 support.

 My point was that situations where people need to run one query on one
 database connection, and that aren't in fact limited by disk I/O, are far less
 common than people think.  My troublesome database servers aren't ones with
 a single CPU at its max wishing there were more workers; they're the
 ones sitting at 25% I/O wait.  And even that crowd is still a subset,
 distinct from people who don't care about the speed of any one core because
 they need lots of connections to go at once.

The most common case where I can use more than one core is loading data, and
pg_restore supports parallel restore jobs, so that takes care of
that pretty well.



Re: [PERFORM] getting the most of out multi-core systems for repeated complex SELECT statements

2011-02-03 Thread Andy Colson

On 02/03/2011 10:00 PM, Greg Smith wrote:

Andy Colson wrote:

CPUs won't get faster, but HDs and SSDs will. To have one database 
connection, which runs one query, run fast, it's going to need multi-core 
support.


My point was that situations where people need to run one query on one database 
connection, and that aren't in fact limited by disk I/O, are far less common than people 
think. My troublesome database servers aren't ones with a single CPU at its max 
wishing there were more workers; they're the ones sitting at 25% I/O wait. And even 
that crowd is still a subset, distinct from people who don't care about the speed of 
any one core because they need lots of connections to go at once.



Yes, I agree... for today.  If you gaze into 5 years... double the core count 
(but not the speed), double the IO rate.  What do you see?



My point is, there must be levels of threading, yes?  If a backend has data to 
sort, already collected, with nothing locked, what would it hurt to use multi-core 
sorting?


Optimizer nodes don't run that way. The executor pulls rows out of the top of 
the node tree, which then pulls from its children, etc. If you just blindly ran off and 
executed every individual node to completion in parallel, that's not always going to be 
faster--it could be a lot slower, if the original query never even needed to execute 
portions of the tree.

When you start dealing with all of the types of nodes that are out there, it gets very 
messy in a hurry. Decomposing the nodes of the query tree into steps that can usefully be 
executed in parallel is the hard problem hiding behind the simple idea of "use all 
the cores!"




What if... the nodes were run in separate threads, interconnected via 
queues?  A node would not have to run to completion either.  A queue could be 
set up with a maximum number of items.  When a node adds the 5th of 5 items it would 
go to sleep; its parent node, removing one of the items, could wake it up.
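
The sleep/wake scheme described here is exactly what a bounded blocking
queue provides for free.  A toy sketch with Python threads (the node
names are invented; real executor nodes would stream rows, not toy
integers):

```python
import threading
import queue

def sort_node(out_q, rows):
    # Producer "node": pushes sorted rows downstream.  put() blocks
    # once the queue holds 5 items -- the node "goes to sleep" -- and
    # the parent's get() wakes it by freeing a slot.
    for row in sorted(rows):
        out_q.put(row)
    out_q.put(None)                  # sentinel: node finished

def parent_node(in_q):
    # Consumer "node": pulls rows; each get() may wake the child.
    received = []
    while (row := in_q.get()) is not None:
        received.append(row)
    return received

q = queue.Queue(maxsize=5)           # the "max items" from the proposal
rows = [3, 1, 4, 1, 5, 9, 2, 6]
child = threading.Thread(target=sort_node, args=(q, rows))
child.start()
result = parent_node(q)
child.join()
print(result)  # → [1, 1, 2, 3, 4, 5, 6, 9]
```

The bounded queue gives automatic backpressure: a fast producer can
never run arbitrarily far ahead of a slow consumer.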

-Andy



Re: [PERFORM] getting the most of out multi-core systems for repeated complex SELECT statements

2011-02-03 Thread Scott Marlowe
On Thu, Feb 3, 2011 at 9:19 PM, Andy Colson a...@squeakycode.net wrote:
 On 02/03/2011 10:00 PM, Greg Smith wrote:

 Andy Colson wrote:

 CPUs won't get faster, but HDs and SSDs will. To have one database
 connection, which runs one query, run fast, it's going to need multi-core
 support.

 My point was that situations where people need to run one query on one
 database connection, and that aren't in fact limited by disk I/O, are far less
 common than people think. My troublesome database servers aren't ones with a
 single CPU at its max wishing there were more workers; they're the ones
 sitting at 25% I/O wait. And even that crowd is still a subset,
 distinct from people who don't care about the speed of any one core because they
 need lots of connections to go at once.


 Yes, I agree... for today.  If you gaze into 5 years... double the core
 count (but not the speed), double the IO rate.  What do you see?

I run a cluster of pg servers under Slony replication, and we have 112
cores between three servers, soon to go to 144 cores.  We have no need
for individual queries to span the cores, honestly.  Our real limit is
the ability to get all those cores working at the same time on individual
queries efficiently without thundering-herd issues.  Yeah, it's only
one data point, but for us, with a lot of cores, we need each one to
run one query as fast as it can.
