Re: [PERFORM] getting the most of out multi-core systems for repeated complex SELECT statements
Hi, all.

My small thoughts about parallelizing a single query. AFAIK, in the cases where it is needed, there is usually one single operation that takes a lot of CPU, e.g. hashing or sorting. And these are usually tasks that have well-known algorithms to parallelize.

The main problem, as for me, is thread safety. First of all, the operations that are going to be parallelized must be thread-safe. Then the functions and procedures they call must be thread-safe too. So a marker for a procedure must be introduced, and all standard ones should be checked/fixed for parallel processing with the marker set.

Then, one should not forget optimizer checks for when to introduce parallelism. How should it be accounted for in the query plan? Should it influence optimizer decisions (should the optimizer count CPU time or wall time when costing a query plan)? Or can it simply be used by an operation when the operation can see it will benefit from it?

Best regards, Vitalii Tymchyshyn

-- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
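[Editor's illustration] Vitalii's observation that sorting has well-known parallel algorithms can be sketched in a few lines. This is a toy model in Python, not PostgreSQL code: it uses worker *processes* (which share nothing, so the thread-safety question he raises largely disappears), sorts chunks in parallel, and k-way merges the sorted runs.

```python
# Toy parallel sort: split the input, sort chunks in worker processes,
# then merge the sorted runs -- roughly the shape of a parallel sort node.
from heapq import merge
from multiprocessing import Pool

def parallel_sort(data, workers=4):
    if not data:
        return []
    chunk = max(1, len(data) // workers)
    chunks = [data[i:i + chunk] for i in range(0, len(data), chunk)]
    with Pool(workers) as pool:
        runs = pool.map(sorted, chunks)   # each worker sorts one chunk
    return list(merge(*runs))             # k-way merge of the sorted runs
```

The process-per-worker design mirrors how PostgreSQL already parallelizes connections, which is part of why later messages in this thread lean away from threads.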
Re: [PERFORM] getting the most of out multi-core systems for repeated complex SELECT statements
Andy Colson wrote:
> Yes, I agree... for today. If you gaze into 5 years... double the core count (but not the speed), double the IO rate. What do you see?

Four more versions of PostgreSQL addressing problems people are having right now. When we reach the point where parallel query is the only way around the actual bottlenecks in the software people are running into, someone will finish parallel query. I am not a fan of speculative development in advance of real demand for it. There are multiple much more serious bottlenecks impacting scalability in PostgreSQL that need to be addressed before this one is #1 on the development priority list to me.

-- Greg Smith   2ndQuadrant US   g...@2ndquadrant.com   Baltimore, MD
PostgreSQL Training, Services, and 24x7 Support   www.2ndQuadrant.us
PostgreSQL 9.0 High Performance: http://www.2ndQuadrant.com/books
Re: [PERFORM] getting the most of out multi-core systems for repeated complex SELECT statements
gnuo...@rcn.com writes:
> Time for my pet meme to wiggle out of its hole (next to Phil's, and a day later). For PG to prosper in the future, it has to embrace the multi-core/processor/SSD machine at the query level. It has to. And it has to because the Big Boys already do so, to some extent, and they've realized that the BCNF schema on such machines is supremely efficient. PG/MySql/OSEngineOfChoice will get left behind simply because the efficiency offered will be worth the price. I know this is far from trivial, and my C skills are such that I can offer no help. These machines have been the obvious current machine in waiting for at least 5 years, and those applications which benefit from parallelism (servers of all kinds, in particular) will filter out the winners and losers based on exploiting this parallelism. Much as it pains me to say it, but the MicroSoft approach to software: write to the next generation processor and force users to upgrade, will be the winning strategy for database engines. There's just way too much to gain.

I'm not sure how true that is, really. (e.g. - too much to gain.) I know that Jan Wieck and I have been bouncing thoughts on valid use of threading off each other for *years*, now, and it tends to be interesting but difficult to the point of impracticality. But how things play out are quite fundamentally different for different usage models. It's useful to cross items off the list, so we're left with the tough ones that are actually a problem.

1. For instance, OLTP applications, that generate a lot of concurrent connections, already do perfectly well in scaling on multi-core systems. Each connection is a separate process, and that already harnesses multi-core systems perfectly well. Things have improved a lot over the last 10 years, and there may yet be further improvements to be found, but it seems pretty reasonable to me to say that the OLTP scenario can be treated as solved in this context.

The scenario where I can squint and see value in trying to multithread is the contrast to that, of OLAP. The case where we only use a single core, today, is where there's only a single connection, and a single query, running. But that can reasonably be further constrained; not every single-connection query could be improved by trying to spread work across cores. We need to add some further assumptions:

2. The query needs to NOT be I/O-bound. If it's I/O bound, then your system is waiting for the data to come off disk, rather than to do processing of that data. That condition can be somewhat further strengthened... It further needs to be a query where multi-processing would not increase the I/O burden.

Between those two assumptions, that cuts the scope of usefulness to a very considerable degree. And if we *are* multiprocessing, we introduce several new problems, each of which is quite troublesome:

- How do we decompose the query so that the pieces are processed in ways that improve processing time? In effect, how to generate a parallel query plan? It would be more than stupid to consider this to be obvious. We've got 15-ish years' worth of query optimization efforts that have gone into Postgres, and many of those changes were not obvious until after they got thought through carefully. This multiplies the complexity, and the opportunity for error.

- Coordinating processing becomes quite a bit more complex. Multiple threads/processes are accessing parts of the same data concurrently, so a parallelized query that harnesses 8 CPUs might generate 8x as many locks and analogous coordination points.

- Platform specificity. Threading is a problem in that each OS platform has its own implementation, and even when they claim to conform to common standards, they still have somewhat different interpretations. This tends to go in one of the following directions:

  a) You have to pick one platform to do threading on. Oops. There's now PostgreSQL-Linux, that is the only platform where our multiprocessing thing works. It could be worse than that; it might work on a particular version of a particular OS...

  b) You follow some apparently portable threading standard, and find that things are hugely buggy because the platforms follow the standard a bit differently. And perhaps this means that, analogous to a), you've got a set of platforms where this works (for some value of "works"), and others where it can't. That's almost as evil as a).

  c) You follow some apparently portable threading standard, and need to wrap things in a pretty thick safety blanket to make sure it is compatible with all the bugs in interpretation and implementation. Complexity++, and performance probably suffers.

None of these are particularly palatable, which is why threading proposals get a lot of pushback. At the end of the day, if this is
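[Editor's illustration] The decomposition problem Chris describes can be made concrete with a toy parallel aggregate, here sketched with processes rather than threads (which sidesteps the a)/b)/c) portability options entirely). The "parallel plan" is nothing more than the split-plus-combine: each worker computes a partial aggregate over its slice, and the parent merges the partials. This is illustrative Python, not a proposal for the executor.

```python
# Each worker computes a partial (count, sum) over one partition of the
# input; the parent combines the partials. Splitting so that the combine
# step is cheap and correct is the easy case -- most plan nodes are not
# this friendly, which is Chris's point.
from multiprocessing import Pool

def partial_agg(rows):
    # per-worker step: count and sum of one partition
    return len(rows), sum(rows)

def parallel_sum_count(rows, workers=4):
    chunk = max(1, len(rows) // workers)
    parts = [rows[i:i + chunk] for i in range(0, len(rows), chunk)]
    with Pool(workers) as pool:
        partials = pool.map(partial_agg, parts)
    count = sum(c for c, _ in partials)
    total = sum(s for _, s in partials)
    return count, total
```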
Re: [PERFORM] getting the most of out multi-core systems for repeated complex SELECT statements
On Fri, 4 Feb 2011, Chris Browne wrote:
> 2. The query needs to NOT be I/O-bound. If it's I/O bound, then your system is waiting for the data to come off disk, rather than to do processing of that data.

Yes and no on this one. It is very possible to have a situation where the process generating the I/O is waiting for the data to come off disk, but there are still idle resources in the disk subsystem. It may be that the best way to address this is to have the process generating the I/O send off more requests, but that sometimes is significantly more complicated than splitting the work between two processes and letting them each generate I/O requests.

With rotating disks, ideally you want to have at least two requests outstanding: one that the disk is working on now, and one for it to start on as soon as it finishes the one that it's on (so that the disk doesn't sit idle while the process decides what the next read should be). In practice you tend to want to have even more outstanding from the application so that they can be optimized (combined, reordered, etc.) by the lower layers.

If you end up with a largish RAID array (say 16 disks), this can translate into a lot of outstanding requests that you want to have active to fully utilize the array, but having the same number of requests outstanding with a single disk would be counterproductive, as the disk would not be able to see all the outstanding requests and therefore would not be able to optimize them as effectively.

David Lang
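[Editor's illustration] David's "keep several requests outstanding" point is exactly what a small worker pool buys you: with a pool of depth N, the OS and device always have up to N requests queued to overlap and reorder, instead of one synchronous request at a time. A minimal Python sketch, with a `sleep` standing in for per-request device latency (the `fetch` function and its latency are stand-ins, not a real I/O API):

```python
# 'depth' plays the role of queue depth: that many requests are
# outstanding at any moment, so they can be serviced concurrently
# rather than strictly one after another.
import time
from concurrent.futures import ThreadPoolExecutor

def fetch(key):
    # stand-in for one disk read; the sleep models seek/transfer latency
    time.sleep(0.01)
    return "data:" + key

def fetch_all(keys, depth=4):
    with ThreadPoolExecutor(max_workers=depth) as pool:
        return list(pool.map(fetch, keys))
```

With `depth=4`, eight fetches take roughly two latency periods of wall time instead of eight; with a single disk, as David notes, a large depth stops paying off because the device can't see or reorder that many requests.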
Re: [PERFORM] getting the most of out multi-core systems for repeated complex SELECT statements
On 2/3/2011 9:08 AM, Mark Stosberg wrote:
> Each night we run over 100,000 saved searches against PostgreSQL 9.0.x. These are all complex SELECTs using cube functions to perform a geo-spatial search to help people find adoptable pets at shelters. All of our machines in development and production have at least 2 cores in them, and I'm wondering about the best way to maximally engage all the processors. Now we simply run the searches in serial. I realize PostgreSQL may be taking advantage of the multiple cores some in this arrangement, but I'm seeking advice about the possibility and methods for running the searches in parallel.
>
> One naive approach I considered was to use parallel cron scripts. One would run the odd searches and the other would run the even searches. This would be easy to implement, but perhaps there is a better way. To those who have covered this area already, what's the best way to put multiple cores to use when running repeated SELECTs with PostgreSQL? Thanks!
>
> Mark

1) I'm assuming this is all server-side processing.
2) One database connection will use one core. To use multiple cores you need multiple database connections.
3) If your jobs are IO-bound, then running multiple jobs may hurt performance.

Your naive approach is the best. Just spawn off two jobs (or three, or whatever). I think it's also the only method. (If there is another method, I don't know what it would be.)

-Andy
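[Editor's illustration] Mark's odd/even cron idea, generalized to N workers, is just partitioning the search ids by `id % N` and giving each worker its own slice on its own connection (one core per connection, per Andy's point 2). A hedged Python sketch; `run_search` is a hypothetical stand-in for opening a connection and running the real cube-based SELECT:

```python
# Partition saved-search ids across N worker processes by id modulo N.
# run_search is a placeholder -- in the real system it would open its
# own database connection and execute the geo-spatial SELECT.
from multiprocessing import Pool

def run_search(search_id):
    return (search_id, "done")   # placeholder result

def run_partition(args):
    search_ids, n, k = args
    # worker k handles exactly the ids where id % n == k
    return [run_search(s) for s in search_ids if s % n == k]

def run_all(search_ids, workers=2):
    with Pool(workers) as pool:
        jobs = [(search_ids, workers, k) for k in range(workers)]
        results = pool.map(run_partition, jobs)
    return [r for part in results for r in part]
```

With `workers=2` this is exactly the odd/even split; bumping `workers` to the core count scales it without new scripts.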
Re: [PERFORM] getting the most of out multi-core systems for repeated complex SELECT statements
Time for my pet meme to wiggle out of its hole (next to Phil's, and a day later). For PG to prosper in the future, it has to embrace the multi-core/processor/SSD machine at the query level. It has to. And it has to because the Big Boys already do so, to some extent, and they've realized that the BCNF schema on such machines is supremely efficient. PG/MySql/OSEngineOfChoice will get left behind simply because the efficiency offered will be worth the price.

I know this is far from trivial, and my C skills are such that I can offer no help. These machines have been the obvious current machine in waiting for at least 5 years, and those applications which benefit from parallelism (servers of all kinds, in particular) will filter out the winners and losers based on exploiting this parallelism.

Much as it pains me to say it, but the MicroSoft approach to software: write to the next generation processor and force users to upgrade, will be the winning strategy for database engines. There's just way too much to gain.

-- Robert

---- Original message ----
Date: Thu, 03 Feb 2011 09:44:03 -0600
From: pgsql-performance-ow...@postgresql.org (on behalf of Andy Colson a...@squeakycode.net)
Subject: Re: [PERFORM] getting the most of out multi-core systems for repeated complex SELECT statements
To: Mark Stosberg m...@summersault.com
Cc: pgsql-performance@postgresql.org

> On 2/3/2011 9:08 AM, Mark Stosberg wrote:
> > Each night we run over 100,000 saved searches against PostgreSQL 9.0.x. These are all complex SELECTs using cube functions to perform a geo-spatial search to help people find adoptable pets at shelters. [...]
>
> 1) I'm assuming this is all server side processing.
> 2) One database connection will use one core. To use multiple cores you need multiple database connections.
> 3) If your jobs are IO bound, then running multiple jobs may hurt performance.
>
> Your naive approach is the best. Just spawn off two jobs (or three, or whatever). I think its also the only method. (If there is another method, I dont know what it would be)
>
> -Andy
Re: [PERFORM] getting the most of out multi-core systems for repeated complex SELECT statements
On 02/03/2011 10:54 AM, Oleg Bartunov wrote:
> Mark, you could try the gevel module to get the structure of the GiST index and look if items are distributed more or less homogeneously (see different levels). You can visualize the index like http://www.sai.msu.su/~megera/wiki/Rtree_Index Also, if your searches are neighbourhood searches, then you could try knn, available in the 9.1 development version.

Oleg,

Those are interesting details to consider. I read more about KNN here: http://www.depesz.com/index.php/2010/12/11/waiting-for-9-1-knngist/

Will I be able to use it to improve the performance of finding nearby zipcodes? It sounds like KNN has great potential for performance improvements!

Mark
Re: [PERFORM] getting the most of out multi-core systems for repeated complex SELECT statements
On Thu, Feb 3, 2011 at 4:57 PM, gnuo...@rcn.com wrote:
> Time for my pet meme to wiggle out of its hole (next to Phil's, and a day later). For PG to prosper in the future, it has to embrace the multi-core/processor/SSD machine at the query level. It has to. And it has to because the Big Boys already do so, to some extent, and they've realized that the BCNF schema on such machines is supremely efficient. PG/MySql/OSEngineOfChoice will get left behind simply because the efficiency offered will be worth the price.

this kind of view on what the postgres community has to do can only be true if postgres has no intention to support cloud environments or any kind of hardware virtualization. while i'm sure targeting specific hardware features can greatly improve postgres performance, it should be an option, not a requirement. forcing users to have specific hardware is basically telling users they can forget about using postgres in amazon/rackspace cloud environments (or any similar environment).

i'm sure that a large part of the postgres community doesn't care about cloud environments (although this is only my personal impression), but if the plan is to disable postgres usage in such environments you are basically losing a large part of the developers/companies targeting global internet consumers with their online products. cloud environments are currently the best platform for internet-oriented developers/companies to start a new project, or even to migrate to from custom hardware/a dedicated data center.

> Much as it pains me to say it, but the MicroSoft approach to software: write to the next generation processor and force users to upgrade, will be the winning strategy for database engines. There's just way too much to gain.

it can arguably be said that because of this approach microsoft is losing ground in most of their businesses/strategies.
Aljosa Mohorovic
Re: [PERFORM] getting the most of out multi-core systems for repeated complex SELECT statements
On Thu, Feb 3, 2011 at 8:57 AM, gnuo...@rcn.com wrote:
> Time for my pet meme to wiggle out of its hole (next to Phil's, and a day later). For PG to prosper in the future, it has to embrace the multi-core/processor/SSD machine at the query level. It has to. And

I'm pretty sure multi-core query processing is in the TODO list. Not sure anyone's working on it tho. Writing a big check might help.
Re: [PERFORM] getting the most of out multi-core systems for repeated complex SELECT statements
---- Original message ----
Date: Thu, 3 Feb 2011 18:56:34 +0100
From: pgsql-performance-ow...@postgresql.org (on behalf of Aljoša Mohorović aljosa.mohoro...@gmail.com)
Subject: Re: [PERFORM] getting the most of out multi-core systems for repeated complex SELECT statements
To: gnuo...@rcn.com
Cc: pgsql-performance@postgresql.org

> this kind of view on what postgres community has to do can only be true if postgres has no intention to support cloud environments or any kind of hardware virtualization. while i'm sure targeting specific hardware features can greatly improve postgres performance it should be an option not a requirement.

Being an option is just fine. It's not there now. Asserting that the cloud meme, based on lowest-cost marginal hardware, should dictate a database engine is putting the cart before the horse.

> forcing users to have specific hardware is basically telling users that you can forget about using postgres in amazon/rackspace cloud environments (or any similar environment).

Just not on cheap clouds, if they want maximal performance from the engine using BCNF schemas. Replicating COBOL/VSAM/flatfile applications in any relational database engine is merely deluding oneself.

> i'm sure that a large part of postgres community doesn't care about cloud environments (although this is only my personal impression) but if plan is to disable postgres usage in such environments you are basically loosing a large part of developers/companies targeting global internet consumers with their online products. cloud environments are currently the best platform for internet oriented developers/companies to start a new project or even to migrate from custom hardware/dedicated data center.
> it can arguably be said that because of this approach microsoft is losing ground in most of their businesses/strategies.

Not really. MicroSoft is losing ground for the same reason all other client/standalone applications are: such applications don't run any better on multi-core/processor machines. Add in the netbook/phone devices, and that they can't seem to make a version of windows that's markedly better than XP. Arguably MicroSoft is failing *because Office no longer requires* the next generation hardware to run right. Hmm?

Linux prospers because it's a server OS, largely. Desktop may, or may not, remain relevant. Linux does make good use of such machines. MicroSoft applications? Not so much.

> Aljosa Mohorovic
Re: [PERFORM] getting the most of out multi-core systems for repeated complex SELECT statements
Scott Marlowe wrote:
> On Thu, Feb 3, 2011 at 8:57 AM, gnuo...@rcn.com wrote:
> > Time for my pet meme to wiggle out of its hole (next to Phil's, and a day later). For PG to prosper in the future, it has to embrace the multi-core/processor/SSD machine at the query level. It has to. And
> I'm pretty sure multi-core query processing is in the TODO list. Not sure anyone's working on it tho. Writing a big check might help.

Work on the exciting parts people are interested in is blocked behind completely mundane tasks like coordinating how the multiple sessions are going to end up with a consistent view of the database. See "Export snapshots to other sessions" at http://wiki.postgresql.org/wiki/ClusterFeatures for details on that one.

Parallel query works well for accelerating CPU-bound operations that are executing in RAM. The reality here is that while the feature sounds important, these situations don't actually show up that often. There are exactly zero clients I deal with regularly who would be helped out by this. The ones running web applications whose workloads do fit into memory are more concerned about supporting large numbers of users, not optimizing things for a single one. And the ones who have so much data that single users running large reports would seemingly benefit from this are usually disk-bound instead.

The same sort of situation exists with SSDs. Take out the potential users whose data can fit in RAM instead, take out those who can't possibly get an SSD big enough to hold all their stuff anyway, and what's left in the middle is not very many people. In a database context, I still haven't found anything better to do with an SSD than to put mid-sized indexes on it: ones a bit too large for RAM, but not so big that only regular hard drives can hold them.

I would rather strongly disagree with the suggestion that embracing either of these fancy but not really as functional as they appear at first approaches is critical to PostgreSQL's future. They're specialized techniques useful to only a limited number of people.

-- Greg Smith   2ndQuadrant US   g...@2ndquadrant.com   Baltimore, MD
PostgreSQL Training, Services, and 24x7 Support   www.2ndQuadrant.us
PostgreSQL 9.0 High Performance: http://www.2ndQuadrant.com/books
Re: [PERFORM] getting the most of out multi-core systems for repeated complex SELECT statements
On 02/03/2011 04:56 PM, Greg Smith wrote:
> Parallel query works well for accelerating CPU-bound operations that are executing in RAM. The reality here is that while the feature sounds important, these situations don't actually show up that often. [...]
> I would rather strongly disagree with the suggestion that embracing either of these fancy but not really as functional as they appear at first approaches is critical to PostgreSQL's future. They're specialized techniques useful to only a limited number of people.

4 cores is cheap and popular now, 6 in a bit, 8 next year, 16/24 cores in 5 years. You can do 16 cores now, but it's a bit expensive. I figure hundreds of cores will be expensive in 5 years, but possible, and available. CPUs won't get faster, but HDs and SSDs will. To have one database connection, which runs one query, run fast, it's going to need multi-core support.

That's not to say we need parallel queries. Or that we need multiple backends to work on one query. We need one backend, working on one query, using mostly the same architecture, to just use more than one core. You'll notice I used _mostly_ and _just_, and I have no knowledge of PG internals, so I fully expect to be wrong.

My point is, there must be levels of threading, yes? If a backend has data to sort, has it collected, nothing locked, what would it hurt to use multi-core sorting?

-- OR --

Threading (and multicore), to me, always means queues. What if new types of backends were created that did simple things, that normal backends could distribute work to, then go off and do other things, and come back to collect the results?

I thought I read a paper someplace that said shared-cache (L1/L2/etc.) multicore CPUs would start getting really slow at 16/32 cores, and that message passing was the way forward past that. If PG started aiming for 128-core support right now, it should use some kind of message passing with queues thing, yes?

-Andy
Re: [PERFORM] getting the most of out multi-core systems for repeated complex SELECT statements
Andy Colson wrote:
> Cpu's wont get faster, but HD's and SSD's will. To have one database connection, which runs one query, run fast, it's going to need multi-core support.

My point was that situations where people need to run one query on one database connection that aren't in fact limited by disk I/O are far less common than people think. My troublesome database servers aren't ones with a single CPU at its max but wishing there were more workers; they're the ones that have 25% waiting for I/O. And even that crowd is still a subset, distinct from people who don't care about the speed of any one core, they need lots of connections to go at once.

> That's not to say we need parallel query's. Or we need multiple backends to work on one query. We need one backend, working on one query, using mostly the same architecture, to just use more than one core.

That's exactly what we mean when we say "parallel query" in the context of a single server.

> My point is, there must be levels of threading, yes? If a backend has data to sort, has it collected, nothing locked, what would it hurt to use multi-core sorting?

Optimizer nodes don't run that way. The executor pulls rows out of the top of the node tree, which then pulls from its children, etc. If you just blindly ran off and executed every individual node to completion in parallel, that's not always going to be faster--could be a lot slower, if the original query never even needed to execute portions of the tree. When you start dealing with all of the types of nodes that are out there it gets very messy in a hurry. Decomposing the nodes of the query tree into steps that can be executed in parallel usefully is the hard problem hiding behind the simple idea of "use all the cores!"

> I thought I read a paper someplace that said shared cache (L1/L2/etc) multicore cpu's would start getting really slow at 16/32 cores, and that message passing was the way forward past that. If PG started aiming for 128 core support right now, it should use some kinda message passing with queues thing, yes?

There already is a TupleStore type that is going to serve as the message being sent between the client backends. Unfortunately we won't get anywhere near 128 cores without addressing the known scalability issues that are in the code right now, ones you can easily run into even with 8 cores.

-- Greg Smith   2ndQuadrant US   g...@2ndquadrant.com   Baltimore, MD
PostgreSQL Training, Services, and 24x7 Support   www.2ndQuadrant.us
PostgreSQL 9.0 High Performance: http://www.2ndQuadrant.com/books
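[Editor's illustration] Greg's point about the pull-based executor can be sketched with generators: each node yields rows only when its parent asks for one, so a node near the top (like a LIMIT) can stop pulling and the scan below it never does the rest of its work. Eagerly running every node to completion in parallel would throw that laziness away. This is a toy model in Python, not the actual C executor:

```python
# Volcano-style pull model: each node is a generator that produces rows
# on demand from its child. A Limit node stops pulling after n rows, so
# the seq_scan below never materializes the remaining 990+ rows.
def seq_scan(rows):
    for row in rows:
        yield row

def filter_node(child, pred):
    for row in child:
        if pred(row):
            yield row

def limit_node(child, n):
    for i, row in enumerate(child):
        if i >= n:
            break
        yield row

# LIMIT 3 over a filtered scan of 1000 rows: only a handful of rows
# ever flow through the tree.  list(plan) -> [0, 2, 4]
plan = limit_node(filter_node(seq_scan(range(1000)), lambda r: r % 2 == 0), 3)
```

This is why "run each node on its own core" is not a free win: the parallel version has to know how much of the tree is actually needed, which is part of the parallel-plan problem Greg describes.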
Re: [PERFORM] getting the most of out multi-core systems for repeated complex SELECT statements
On Thu, Feb 3, 2011 at 9:00 PM, Greg Smith g...@2ndquadrant.com wrote:
> Andy Colson wrote:
> > Cpu's wont get faster, but HD's and SSD's will. To have one database connection, which runs one query, run fast, it's going to need multi-core support.
> My point was that situations where people need to run one query on one database connection that aren't in fact limited by disk I/O are far less common than people think. My troublesome database servers aren't ones with a single CPU at its max but wishing there were more workers, they're the ones that have 25% waiting for I/O. And even that crowd is still a subset, distinct from people who don't care about the speed of any one core, they need lots of connections to go at once.

The most common case where I can use more than one core is loading data, and pg_restore supports parallel restore threads, so that takes care of that pretty well.
Re: [PERFORM] getting the most of out multi-core systems for repeated complex SELECT statements
On 02/03/2011 10:00 PM, Greg Smith wrote:
> My point was that situations where people need to run one query on one database connection that aren't in fact limited by disk I/O are far less common than people think. My troublesome database servers aren't ones with a single CPU at its max but wishing there were more workers, they're the ones that have 25% waiting for I/O. And even that crowd is still a subset, distinct from people who don't care about the speed of any one core, they need lots of connections to go at once.

Yes, I agree... for today. If you gaze into 5 years... double the core count (but not the speed), double the IO rate. What do you see?

> Optimizer nodes don't run that way. The executor pulls rows out of the top of the node tree, which then pulls from its children, etc. If you just blindly ran off and executed every individual node to completion in parallel, that's not always going to be faster--could be a lot slower, if the original query never even needed to execute portions of the tree. [...] Decomposing the nodes of the query tree into steps that can be executed in parallel usefully is the hard problem hiding behind the simple idea of "use all the cores!"

What if... the nodes were run in separate threads, and interconnected via queues? A node would not have to run to completion, either. A queue could be set up to have a max number of items. When a node adds 5 out of 5 items, it would go to sleep. Its parent node, removing one of the items, could wake it up.

-Andy
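[Editor's illustration] Andy's bounded-queue idea, where a producer node sleeps when the queue holds its max items and wakes when the parent removes one, is exactly the backpressure that a blocking bounded queue provides. A minimal sketch in Python (`queue.Queue(maxsize=...)` gives the block-when-full / wake-on-get behaviour for free):

```python
# A child "node" feeds a bounded queue; put() blocks (the node sleeps)
# once the queue holds max_items, and a get() by the parent wakes it --
# Andy's "5 out of 5 items" scenario, without explicit sleep/wake code.
import queue
import threading

SENTINEL = object()

def producer(out_q, rows):
    for row in rows:
        out_q.put(row)      # blocks while the queue is full
    out_q.put(SENTINEL)     # signal end-of-stream to the parent

def run_pipeline(rows, max_items=5):
    q = queue.Queue(maxsize=max_items)
    t = threading.Thread(target=producer, args=(q, rows))
    t.start()
    results = []
    while True:
        item = q.get()      # removing an item wakes a blocked producer
        if item is SENTINEL:
            break
        results.append(item * 2)   # the "parent node" does its work here
    t.join()
    return results
```

The caveat from Greg's reply still applies: wiring nodes together with queues is the easy part; deciding which subtrees are worth running, and keeping a consistent snapshot across workers, is where the hard problems live.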
Re: [PERFORM] getting the most of out multi-core systems for repeated complex SELECT statements
On Thu, Feb 3, 2011 at 9:19 PM, Andy Colson a...@squeakycode.net wrote:
> On 02/03/2011 10:00 PM, Greg Smith wrote:
> > My point was that situations where people need to run one query on one database connection that aren't in fact limited by disk I/O are far less common than people think. [...]
> Yes, I agree... for today. If you gaze into 5 years... double the core count (but not the speed), double the IO rate. What do you see?

I run a cluster of pg servers under Slony replication, and we have 112 cores between three servers, soon to go to 144 cores. We have no need for individual queries to span the cores, honestly. Our real limit is the ability to get all those cores working at the same time on individual queries efficiently, without thundering-herd issues. Yeah, it's only one datapoint, but for us, with a lot of cores, we need each core to run one query as fast as it can.