Re: [HACKERS] more parallel query documentation
On Tue, Sep 20, 2016 at 11:18 AM, Peter Eisentraut wrote:
> On 9/19/16 1:22 PM, Robert Haas wrote:
>> On Fri, Sep 16, 2016 at 4:28 PM, Alvaro Herrera wrote:
>>> I agree it should be added. I suggest that it could even be added to
>>> the 9.6 docs, if you can make it.
>>
>> Here's a patch. I intend to commit this pretty quickly unless
>> somebody objects, and also to backpatch it into 9.6. I'm sure it's
>> not perfect, but imperfect documentation is better than no
>> documentation.
>
> Looks reasonable to me.

Cool. Committed after fixing a typo that Alvaro noted off-list and a
few others that I found after inspecting with an editor that features
spell-check.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] more parallel query documentation
On 9/19/16 1:22 PM, Robert Haas wrote:
> On Fri, Sep 16, 2016 at 4:28 PM, Alvaro Herrera wrote:
>> I agree it should be added. I suggest that it could even be added to
>> the 9.6 docs, if you can make it.
>
> Here's a patch. I intend to commit this pretty quickly unless
> somebody objects, and also to backpatch it into 9.6. I'm sure it's
> not perfect, but imperfect documentation is better than no
> documentation.

Looks reasonable to me.

--
Peter Eisentraut http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Re: [HACKERS] more parallel query documentation
On Fri, Sep 16, 2016 at 4:28 PM, Alvaro Herrera wrote:
> I agree it should be added. I suggest that it could even be added to
> the 9.6 docs, if you can make it.

Here's a patch. I intend to commit this pretty quickly unless somebody
objects, and also to backpatch it into 9.6. I'm sure it's not perfect,
but imperfect documentation is better than no documentation.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

parallel-query-doc-v1.patch
Description: invalid/octet-stream
Re: [HACKERS] more parallel query documentation
Robert Haas wrote:
> Hey, everybody: I intended to add this to the documentation before 9.6
> went out, but that didn't get done. Maybe it'll have to happen later
> at this point, but can I get some advice on WHERE in the documentation
> this stuff could be added? Assuming people agree it should be added?
> The major subsections of the documentation are "Tutorial", "The SQL
> Language", "Server Administration", "Client Interfaces", "Server
> Programming", "Reference", "Internals", and "Appendixes", and it's
> not clear to me that parallel query fits very well into any of those
> categories.

I agree it should be added. I suggest that it could even be added to
the 9.6 docs, if you can make it.

I think the sections "Tutorial" and "The SQL Language" are the most
reasonable places. The latter seems to be exclusively about how to word
the queries rather than how they are executed, though adding a new
section before or after "Performance Tips" seems not completely
off-topic. The "Tutorial" seems somewhat more than a tutorial these
days, but it seems much lighter reading than what you have in that wiki
page anyway. Perhaps it would be okay to add some simple text in the
"Advanced Features" section, and elaborate in the "The SQL Language"
chapter.

(Aside: it seems strange to have a "The SQL Language" section inside
the "Tutorial" chapter and a separate "The SQL Language" chapter.)

I gave a quick look to
https://wiki.postgresql.org/wiki/Parallel_Query
I think it still reads a little strange: it doesn't say that parallel
query is implemented on top of bgworkers, yet very early it suggests
that the max_parallel_degree value depends on the max_worker_processes
parameter without explaining why. I think that could be clearer. Also,
the blurb about VACUUM/CLUSTER looks like it belongs in the "When can
parallel query be used" section rather than the intro.

> I feel like we need a new major division for operational issues that
> don't qualify as server administration - e.g. query performance
> tuning, parallel query, how to decide what indexes to create...

I'm not opposed to this idea.

--
Álvaro Herrera https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Re: [HACKERS] more parallel query documentation
On Thu, Apr 21, 2016 at 9:16 PM, Amit Langote wrote:
> On 2016/04/15 12:02, Robert Haas wrote:
>> As previously threatened, I have written some user documentation for
>> parallel query. I put it up here:
>>
>> https://wiki.postgresql.org/wiki/Parallel_Query
>>
>> This is not totally comprehensive and I can think of a few more
>> details that could be added, but it's a pretty good high-level
>> overview of what got into 9.6. After this has gotten some feedback
>> and polishing, I would like to add a version of this to the SGML
>> documentation somewhere. I am not sure where it would fit, but I
>> think it's good to document stuff like this.
>
> Looking at the "Function Labeling For Parallel Safety" section. There is
> a sentence:
>
> "Functions must be marked PARALLEL UNSAFE if they write to the database,
> access sequences, change the transaction state even temporarily (e.g. a
> PL/pgsql function which establishes an EXCEPTION block to catch errors),
> or make persistent changes to settings."
>
> Then looking at the "postgres_fdw vs. force_parallel_mode on ppc" thread
> [1], I wonder if a note on the lines of "or a function that creates *new*
> connection(s) to remote server(s)" may be in order. Overkill?

That's not necessarily parallel-unsafe. It's probably
parallel-restricted at most.

Hey, everybody: I intended to add this to the documentation before 9.6
went out, but that didn't get done. Maybe it'll have to happen later at
this point, but can I get some advice on WHERE in the documentation
this stuff could be added? Assuming people agree it should be added?

The major subsections of the documentation are "Tutorial", "The SQL
Language", "Server Administration", "Client Interfaces", "Server
Programming", "Reference", "Internals", and "Appendixes", and it's not
clear to me that parallel query fits very well into any of those
categories. I suppose "Internals" is closest, but that's mostly stuff
that typical users won't care about, whereas what I'm trying to
document here is what parallel query will look like from a user
perspective, not how it looks under the hood.

I feel like we need a new major division for operational issues that
don't qualify as server administration - e.g. query performance tuning,
parallel query, how to decide what indexes to create...

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
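
The labeling rule quoted in the message above, together with Robert's reply about remote connections, can be sketched as a small decision function. This is only an illustration in Python, not PostgreSQL code: the `FunctionTraits` class, its field names, and `suggest_parallel_label` are all hypothetical, and the RESTRICTED branch encodes nothing more than the "probably parallel-restricted at most" remark. A result of `None` means the rules discussed in this thread do not decide the label.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class FunctionTraits:
    """Hypothetical description of what a user-defined function does."""
    writes_to_database: bool = False
    accesses_sequences: bool = False
    changes_transaction_state: bool = False   # e.g. an EXCEPTION block
    changes_settings_persistently: bool = False
    opens_remote_connection: bool = False     # the case Amit asked about


def suggest_parallel_label(traits: FunctionTraits) -> Optional[str]:
    """Apply the quoted wiki sentence, then Robert's remark, in that order."""
    if (traits.writes_to_database
            or traits.accesses_sequences
            or traits.changes_transaction_state
            or traits.changes_settings_persistently):
        return "PARALLEL UNSAFE"
    if traits.opens_remote_connection:
        # Assumption taken from the reply: restricted at most, not unsafe.
        return "PARALLEL RESTRICTED"
    # Nothing in the thread covers this case; do not guess a label.
    return None
```

For example, `suggest_parallel_label(FunctionTraits(accesses_sequences=True))` yields the UNSAFE label, while a function whose only notable behavior is opening a new remote connection falls into the RESTRICTED branch.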
Re: [HACKERS] more parallel query documentation
On 2016/04/15 12:02, Robert Haas wrote:
> As previously threatened, I have written some user documentation for
> parallel query. I put it up here:
>
> https://wiki.postgresql.org/wiki/Parallel_Query
>
> This is not totally comprehensive and I can think of a few more
> details that could be added, but it's a pretty good high-level
> overview of what got into 9.6. After this has gotten some feedback
> and polishing, I would like to add a version of this to the SGML
> documentation somewhere. I am not sure where it would fit, but I
> think it's good to document stuff like this.

Looking at the "Function Labeling For Parallel Safety" section. There is
a sentence:

"Functions must be marked PARALLEL UNSAFE if they write to the database,
access sequences, change the transaction state even temporarily (e.g. a
PL/pgsql function which establishes an EXCEPTION block to catch errors),
or make persistent changes to settings."

Then looking at the "postgres_fdw vs. force_parallel_mode on ppc" thread
[1], I wonder if a note on the lines of "or a function that creates *new*
connection(s) to remote server(s)" may be in order. Overkill?

Thanks,
Amit

[1] http://www.postgresql.org/message-id/CAEepm=1_saV7WJQWqgZfeNL954=nhtvcaoyuu6fxet01rm2...@mail.gmail.com
Re: [HACKERS] more parallel query documentation
On 4/14/16 10:02 PM, Robert Haas wrote:
> As previously threatened, I have written some user documentation for
> parallel query. I put it up here:

Yay! Definitely needed to be written. :)

There should be a section that summarizes the parallel machinery. I
think the most important points are that separate processes are spun
up, that they're limited by max_worker_processes and
max_parallel_degree, and that shared memory queues are used to move
data, results and errors between a regular backend (controlling
backend?) and its workers. The first section kind-of alludes to this,
but it doesn't actually explain any of it. I think it's OK for the very
first section to be a *brief* tl;dr summary on the basics of turning
the feature on, but after that laying down groundwork knowledge will
make the rest of the page much clearer.

I think the parts that talk about "parallel plan executed with no
workers" are confusing... it almost sounds like the query won't be
executed at all. It'd be better to say something like "executed in a
single process" or "executed with no parallelism" or similar. Maybe the
real issue is we need to pick a clear term for a non-parallel query and
stick with it. I would also expand the different scenarios into bullets
and explain why parallelism isn't used, like you did right above that.
(I think it's great that you explained *why* parallel plans wouldn't be
generated instead of just listing conditions.)

When describing SeqScan, it would be good to clarify whether
effective_io_concurrency has an effect. (For that matter, does
effective_io_concurrency interact with any of the other parallel
settings?)

"Functions must be marked PARALLEL UNSAFE ..., or make persistent
changes to settings."

What would be a non-persistent change? SET LOCAL? (This is another case
where it'd be good if we decided on specific terminology and referenced
the definition from the page.)

--
Jim Nasby, Data Architect, Blue Treble Consulting, Austin TX
Experts in Analytics, Data Architecture and PostgreSQL
Data in Trouble? Get it in Treble! http://BlueTreble.com
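
The two limits named in the message above can be illustrated with a deliberately simplified model. This is a sketch in Python under stated assumptions, not the server's actual logic: the function name `workers_launched` and the parameters `planned_workers` and `slots_in_use` are hypothetical, and the model assumes only that a query cannot get more workers than max_parallel_degree allows or than remain free under max_worker_processes.

```python
def workers_launched(planned_workers: int,
                     max_parallel_degree: int,
                     max_worker_processes: int,
                     slots_in_use: int) -> int:
    """Simplified model: how many workers a parallel plan might obtain.

    planned_workers and slots_in_use are hypothetical inputs standing in
    for what the planner asked for and what other background workers
    already occupy.
    """
    # Worker slots still free under the server-wide limit.
    free_slots = max(max_worker_processes - slots_in_use, 0)
    # The per-query cap and the free slots both bound the result.
    return max(min(planned_workers, max_parallel_degree, free_slots), 0)
```

In this model a result of 0 corresponds to the "parallel plan executed with no workers" case discussed above: no helper processes are obtained, which is why a phrase such as "executed with no parallelism" may describe it more clearly.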
[HACKERS] more parallel query documentation
As previously threatened, I have written some user documentation for
parallel query. I put it up here:

https://wiki.postgresql.org/wiki/Parallel_Query

This is not totally comprehensive and I can think of a few more details
that could be added, but it's a pretty good high-level overview of what
got into 9.6. After this has gotten some feedback and polishing, I
would like to add a version of this to the SGML documentation
somewhere. I am not sure where it would fit, but I think it's good to
document stuff like this.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company