Re: [HACKERS] more parallel query documentation

2016-09-21 Thread Robert Haas
On Tue, Sep 20, 2016 at 11:18 AM, Peter Eisentraut wrote:
> On 9/19/16 1:22 PM, Robert Haas wrote:
>> On Fri, Sep 16, 2016 at 4:28 PM, Alvaro Herrera
>>  wrote:
>>> I agree it should be added.  I suggest that it could even be added to
>>> the 9.6 docs, if you can make it.
>>
>> Here's a patch.  I intend to commit this pretty quickly unless
>> somebody objects, and also to backpatch it into 9.6.  I'm sure it's
>> not perfect, but imperfect documentation is better than no
>> documentation.
>
> Looks reasonable to me.

Cool.  Committed after fixing a typo that Alvaro noted off-list and a
few others that I found after inspecting with an editor that features
spell-check.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] more parallel query documentation

2016-09-20 Thread Peter Eisentraut
On 9/19/16 1:22 PM, Robert Haas wrote:
> On Fri, Sep 16, 2016 at 4:28 PM, Alvaro Herrera
>  wrote:
>> I agree it should be added.  I suggest that it could even be added to
>> the 9.6 docs, if you can make it.
> 
> Here's a patch.  I intend to commit this pretty quickly unless
> somebody objects, and also to backpatch it into 9.6.  I'm sure it's
> not perfect, but imperfect documentation is better than no
> documentation.

Looks reasonable to me.

-- 
Peter Eisentraut  http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: [HACKERS] more parallel query documentation

2016-09-19 Thread Robert Haas
On Fri, Sep 16, 2016 at 4:28 PM, Alvaro Herrera wrote:
> I agree it should be added.  I suggest that it could even be added to
> the 9.6 docs, if you can make it.

Here's a patch.  I intend to commit this pretty quickly unless
somebody objects, and also to backpatch it into 9.6.  I'm sure it's
not perfect, but imperfect documentation is better than no
documentation.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


parallel-query-doc-v1.patch
Description: invalid/octet-stream



Re: [HACKERS] more parallel query documentation

2016-09-16 Thread Alvaro Herrera
Robert Haas wrote:

> Hey, everybody: I intended to add this to the documentation before 9.6
> went out, but that didn't get done.  Maybe it'll have to happen later
> at this point, but can I get some advice on WHERE in the documentation
> this stuff could be added?  Assuming people agree it should be added?
> The major subsections of the documentation are "Tutorial", "The SQL
> Language", "Server Administration", "Client Interfaces", "Server
> Programming", "Reference",  "Internals", and "Appendixes", and it's
> not clear to me that parallel query fits very well into any of those
> categories.

I agree it should be added.  I suggest that it could even be added to
the 9.6 docs, if you can make it.

I think the sections "Tutorial" and "The SQL Language" are the most
reasonable places.  The latter seems to be exclusively about how to word
the queries rather than how they are executed, though adding a new
section before or after "Performance Tips" seems not completely
off-topic.

The "Tutorial" seems somewhat more than a tutorial these days, but it
seems much lighter reading than what you have in that wiki page
anyway.  Perhaps it would be okay to add some simple text in the
"Advanced Features" section, and elaborate in the "The SQL Language"
chapter.

(Aside: it seems strange to have a "The SQL Language" section inside the
"Tutorial" chapter and a separate "The SQL Language" chapter.)

I gave a quick look to https://wiki.postgresql.org/wiki/Parallel_Query and
I think it still reads a little strange: it doesn't say that parallel
query is implemented on top of bgworkers, yet very early it suggests
that the max_parallel_degree value depends on the max_worker_processes
parameter without explaining why.  I think that could be clearer.
Also, the blurb about VACUUM/CLUSTER looks like it belongs in the "When
can parallel query be used" section rather than the intro.

> I feel like we need a new major division for operational issues that
> don't qualify as server administration - e.g. query performance
> tuning, parallel query, how to decide what indexes to create...

I'm not opposed to this idea.

-- 
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: [HACKERS] more parallel query documentation

2016-09-16 Thread Robert Haas
On Thu, Apr 21, 2016 at 9:16 PM, Amit Langote wrote:
> On 2016/04/15 12:02, Robert Haas wrote:
>> As previously threatened, I have written some user documentation for
>> parallel query.  I put it up here:
>>
>> https://wiki.postgresql.org/wiki/Parallel_Query
>>
>> This is not totally comprehensive and I can think of a few more
>> details that could be added, but it's a pretty good high-level
>> overview of what got into 9.6.  After this has gotten some feedback
>> and polishing, I would like to add a version of this to the SGML
>> documentation somewhere.  I am not sure where it would fit, but I
>> think it's good to document stuff like this.
>
> Looking at the "Function Labeling For Parallel Safety" section.  There is
> a sentence:
>
> "Functions must be marked PARALLEL UNSAFE if they write to the database,
> access sequences, change the transaction state even temporarily (e.g. a
> PL/pgsql function which establishes an EXCEPTION block to catch errors),
> or make persistent changes to settings."
>
> Then looking at the "postgres_fdw vs. force_parallel_mode on ppc" thread
> [1], I wonder if a note on the lines of "or a function that creates *new*
> connection(s) to remote server(s)" may be in order.  Overkill?

That's not necessarily parallel-unsafe.  It's probably
parallel-restricted at most.
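
To make the three labels concrete, here is a minimal Python sketch of the
rule quoted above.  It is an illustration only, not how PostgreSQL decides
anything: FunctionTraits, its field names, and parallel_label() are
hypothetical, and treating "opens a new remote connection" as
parallel-restricted is an assumption based on the "probably" above.

```python
from dataclasses import dataclass


@dataclass
class FunctionTraits:
    """Hypothetical description of what a function does when it runs."""
    writes_database: bool = False
    accesses_sequences: bool = False
    changes_transaction_state: bool = False   # e.g. an EXCEPTION block
    changes_settings_persistently: bool = False
    opens_new_remote_connection: bool = False  # assumption, see lead-in


def parallel_label(traits: FunctionTraits) -> str:
    """Return the label suggested by the quoted wiki sentence."""
    if (traits.writes_database
            or traits.accesses_sequences
            or traits.changes_transaction_state
            or traits.changes_settings_persistently):
        return "PARALLEL UNSAFE"
    if traits.opens_new_remote_connection:
        # Assumed to be restricted at most; it may turn out to be safe.
        return "PARALLEL RESTRICTED"
    return "PARALLEL SAFE"


print(parallel_label(FunctionTraits(accesses_sequences=True)))
print(parallel_label(FunctionTraits(opens_new_remote_connection=True)))
print(parallel_label(FunctionTraits()))
```

The checks are ordered from most to least restrictive, so a function that
both writes to the database and opens a connection still comes out as
PARALLEL UNSAFE.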

Hey, everybody: I intended to add this to the documentation before 9.6
went out, but that didn't get done.  Maybe it'll have to happen later
at this point, but can I get some advice on WHERE in the documentation
this stuff could be added?  Assuming people agree it should be added?
The major subsections of the documentation are "Tutorial", "The SQL
Language", "Server Administration", "Client Interfaces", "Server
Programming", "Reference",  "Internals", and "Appendixes", and it's
not clear to me that parallel query fits very well into any of those
categories.  I suppose "Internals" is closest, but that's mostly stuff
that typical users won't care about, whereas what I'm trying to
document here is what parallel query will look like from a user
perspective, not how it looks under the hood.  I feel like we need a
new major division for operational issues that don't qualify as server
administration - e.g. query performance tuning, parallel query, how to
decide what indexes to create...

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] more parallel query documentation

2016-04-21 Thread Amit Langote
On 2016/04/15 12:02, Robert Haas wrote:
> As previously threatened, I have written some user documentation for
> parallel query.  I put it up here:
> 
> https://wiki.postgresql.org/wiki/Parallel_Query
> 
> This is not totally comprehensive and I can think of a few more
> details that could be added, but it's a pretty good high-level
> overview of what got into 9.6.  After this has gotten some feedback
> and polishing, I would like to add a version of this to the SGML
> documentation somewhere.  I am not sure where it would fit, but I
> think it's good to document stuff like this.

Looking at the "Function Labeling For Parallel Safety" section.  There is
a sentence:

"Functions must be marked PARALLEL UNSAFE if they write to the database,
access sequences, change the transaction state even temporarily (e.g. a
PL/pgsql function which establishes an EXCEPTION block to catch errors),
or make persistent changes to settings."

Then looking at the "postgres_fdw vs. force_parallel_mode on ppc" thread
[1], I wonder if a note along the lines of "or a function that creates *new*
connection(s) to remote server(s)" may be in order.  Overkill?

Thanks,
Amit

[1]
http://www.postgresql.org/message-id/CAEepm=1_saV7WJQWqgZfeNL954=nhtvcaoyuu6fxet01rm2...@mail.gmail.com






Re: [HACKERS] more parallel query documentation

2016-04-15 Thread Jim Nasby

On 4/14/16 10:02 PM, Robert Haas wrote:
> As previously threatened, I have written some user documentation for
> parallel query.  I put it up here:


Yay! Definitely needed to be written. :)

There should be a section that summarizes the parallel machinery. I 
think the most important points are that separate processes are spun up, 
that they're limited by max_worker_processes and max_parallel_degree, 
and that shared memory queues are used to move data, results and errors 
between a regular backend (controlling backend?) and its workers. The
first section kind of alludes to this, but it doesn't actually explain
any of it. I think it's OK for the very first section to be a *brief* 
tl;dr summary on the basics of turning the feature on, but after that 
laying down groundwork knowledge will make the rest of the page much 
clearer.
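
As a rough illustration of that summary, here is a toy Python model.  It
is a sketch under stated assumptions, not PostgreSQL's implementation:
threads and queue.Queue stand in for worker processes and shared memory
queues, the names workers_to_launch, worker and run_query are
hypothetical, and the leader here only collects results rather than
helping to run the plan.

```python
import queue
import threading


def workers_to_launch(max_parallel_degree, max_worker_processes, workers_in_use):
    """Toy model: the worker count is capped by both settings."""
    free_slots = max(max_worker_processes - workers_in_use, 0)
    return min(max_parallel_degree, free_slots)


def worker(chunk, out):
    """Process one chunk; send rows, then 'done' or 'error', to the leader."""
    try:
        for value in chunk:
            if value < 0:
                raise ValueError(f"bad value {value}")
            out.put(("row", value * 2))
        out.put(("done", None))
    except Exception as exc:
        out.put(("error", str(exc)))


def run_query(data, max_parallel_degree, max_worker_processes, workers_in_use=0):
    """Return (sorted rows, number of workers launched)."""
    n = workers_to_launch(max_parallel_degree, max_worker_processes, workers_in_use)
    out = queue.Queue()  # stand-in for the shared memory queues
    parts = max(n, 1)
    chunks = [data[i::parts] for i in range(parts)]
    threads = []
    if n == 0:
        # No worker slots: the leader does all the work itself.
        worker(chunks[0], out)
    else:
        threads = [threading.Thread(target=worker, args=(c, out)) for c in chunks]
        for t in threads:
            t.start()
    rows, errors, finished = [], [], 0
    while finished < parts:
        kind, payload = out.get()
        if kind == "row":
            rows.append(payload)
        else:
            finished += 1
            if kind == "error":
                errors.append(payload)
    for t in threads:
        t.join()
    if errors:
        raise RuntimeError(errors[0])  # the leader re-raises a worker's error
    return sorted(rows), n


print(run_query([1, 2, 3, 4], max_parallel_degree=2, max_worker_processes=8))
print(run_query([1, 2, 3], 2, 8, workers_in_use=8))
```

The second call models the case where every worker slot is already taken:
zero workers are launched, yet the query still returns its rows.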


I think the parts that talk about "parallel plan executed with no 
workers" are confusing... it almost sounds like the query won't be 
executed at all. It'd be better to say something like "executed in a
single process" or "executed with no parallelism" or similar. Maybe the real
issue is we need to pick a clear term for a non-parallel query and stick 
with it. I would also expand the different scenarios into bullets and 
explain why parallelism isn't used, like you did right above that. (I 
think it's great that you explained *why* parallel plans wouldn't be 
generated instead of just listing conditions.)


When describing SeqScan, it would be good to clarify whether 
effective_io_concurrency has an effect. (For that matter, does 
effective_io_concurrency interact with any of the other parallel settings?)


"Functions must be marked PARALLEL UNSAFE ..., or make persistent 
changes to settings." What would be a non-persistent change? SET LOCAL? 
(This is another case where it'd be good if we decided on specific 
terminology and referenced the definition from the page.)

--
Jim Nasby, Data Architect, Blue Treble Consulting, Austin TX
Experts in Analytics, Data Architecture and PostgreSQL
Data in Trouble? Get it in Treble! http://BlueTreble.com

