On Wed, Jan 16, 2013 at 12:03:50PM +1300, Gavin Flower wrote:
> On 16/01/13 11:14, Bruce Momjian wrote:
> > I mentioned last year that I wanted to start working on parallelism:
> >
> >     https://wiki.postgresql.org/wiki/Parallel_Query_Execution
> >
> > Years ago I added thread-safety to libpq.  Recently I added two parallel
> > execution paths to pg_upgrade.  The first parallel path allows execution
> > of external binaries pg_dump and psql (to restore).  The second parallel
> > path does copy/link by calling fork/thread-safe C functions.  I was able
> > to do each in 2-3 days.
> >
> > I believe it is time to start adding parallel execution to the backend.
> > We already have some parallelism in the backend:
> > effective_io_concurrency and helper processes.  I think it is time we
> > start to consider additional options.
> >
> > Parallelism isn't going to help all queries; in fact it might be just a
> > small subset, but it will be the larger queries.  The pg_upgrade
> > parallelism only helps clusters with multiple databases or tablespaces,
> > but the improvements are significant.
> >
> > I have summarized my ideas by updating our Parallel Query Execution wiki
> > page:
> >
> >     https://wiki.postgresql.org/wiki/Parallel_Query_Execution
> >
> > Please consider updating the page yourself or posting your ideas to this
> > thread.  Thanks.
>
> Hmm...
>
> How about being aware of multiple spindles - so if the requested data
> covers multiple spindles, then data could be extracted in parallel.
> This may, or may not, involve multiple I/O channels?
Well, we usually label these as tablespaces.  I don't know if
spindle-level awareness is a reasonable level to add.

> On large multiple-processor machines, there are different blocks of
> memory that might be accessed at different speeds depending on the
> processor.  Possibly a mechanism could be used to split a transaction
> over multiple processors to ensure the fastest memory is used?

That seems too far-out for an initial approach.

> Once a selection of rows has been made, then if there is a lot of
> reformatting going on, could this be done in parallel?  I can think of
> two very simplistic strategies: (A) use a different processor core for
> each column, or (B) farm out sets of rows to different cores.  I am
> sure in reality there are more subtleties, and aspects of both
> strategies will be used in a hybrid fashion along with other
> approaches.

Probably (B), but that is going to require making some of the modules
thread/fork-safe, and that is going to be tricky.

> I expect that before any parallel algorithm is invoked, some sort of
> threshold needs to be exceeded to make it worthwhile.  Different
> aspects of the parallel algorithm may have their own thresholds.  It
> may not be worth applying a parallel algorithm for 10 rows from a
> simple table, but selecting 10,000 records from multiple tables, each
> over 10 million rows, using joins may benefit from more extreme
> parallelism.

Right, I bet we will need some way to control when the overhead of
parallel execution is worth it.

> I expect that UNIONs, as well as the processing of partitioned tables,
> may be amenable to parallel processing.

Interesting idea on UNION.

--
  Bruce Momjian  <br...@momjian.us>        http://momjian.us
  EnterpriseDB                             http://enterprisedb.com

  + It's impossible for everything to be true. +


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers