Re: [HACKERS] COPY (query) TO ... doesn't allow parallelism

2017-06-03 Thread Amit Kapila
On Sat, Jun 3, 2017 at 9:34 PM, Andres Freund  wrote:
> Hi,
>
> On 2017-06-03 17:40:08 +0530, Amit Kapila wrote:
>> The standard_planner check is sufficient to not generate parallel
>> plans for such statements, but it won't prevent if such commands
>> (which shouldn't be executed by parallel workers) are present in
>> functions.  Consider a hypothetical case as below:
>>
>> 1.  Create a parallel safe function containing Copy commands.
>> create or replace function parallel_copy(a integer) returns integer
>> as $$
>> begin
>> Copy (select * from t1 where c1 < 2) to 'e:\\f1';
>> return a;
>> end;
>> $$ language plpgsql Parallel Safe;
>>
>> 2. Now use this in some command which can be executed in parallel.
>> explain analyze select * from t1 where c1 < parallel_copy(10);
>>
>> This can allow Copy command to be executed by parallel workers if we
>> don't have sufficient safeguards.
>
> Yes.  But I'm unclear what does that have to do with the change
> discussed in this thread?
>

It is not related to the change you have proposed.  It just occurred
to me while reading the code in the area where you have proposed to
change, so mentioned here.  It might have been better to report it in
a separate thread.

>  The pg_plan_query in copy.c setting
> CURSOR_OPT_PARALLEL_OK doesn't meaningfully change the risk of this
> happening in one way or the other.
>
>> We already tried to prohibit it in
>> plpgsql like in function _SPI_execute_plan(), we call
>> PreventCommandIfParallelMode.  However, inspite of that, we have
>> safeguards in lower level calls, so that if the code flow reaches such
>> commands in parallel mode, we error out.  We have a similar check in
>> Copy From code flow  ( PreventCommandIfParallelMode("COPY FROM");) as
>> well, but I think we should have it in Copy To flow as well.
>
> Why?  What is it effectively preventing?  Multiple workers copying to
> the same file?
>

Yeah.  Also, the query (in Copy To command) can be a write a statement
as well.  I thought giving an error from COPY TO flow would be
appropriate for it.

>  Any such function would have the same risk for separate
> sessions.
>

You are right, but not sure if it makes sense to allow any form of
write statement in parallel workers without doing more analysis.

-- 
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] COPY (query) TO ... doesn't allow parallelism

2017-06-03 Thread Andres Freund
Hi,

On 2017-06-03 17:40:08 +0530, Amit Kapila wrote:
> The standard_planner check is sufficient to not generate parallel
> plans for such statements, but it won't prevent if such commands
> (which shouldn't be executed by parallel workers) are present in
> functions.  Consider a hypothetical case as below:
> 
> 1.  Create a parallel safe function containing Copy commands.
> create or replace function parallel_copy(a integer) returns integer
> as $$
> begin
> Copy (select * from t1 where c1 < 2) to 'e:\\f1';
> return a;
> end;
> $$ language plpgsql Parallel Safe;
> 
> 2. Now use this in some command which can be executed in parallel.
> explain analyze select * from t1 where c1 < parallel_copy(10);
> 
> This can allow Copy command to be executed by parallel workers if we
> don't have sufficient safeguards.

Yes.  But I'm unclear what does that have to do with the change
discussed in this thread?  The pg_plan_query in copy.c setting
CURSOR_OPT_PARALLEL_OK doesn't meaningfully change the risk of this
happening in one way or the other.

> We already tried to prohibit it in
> plpgsql like in function _SPI_execute_plan(), we call
> PreventCommandIfParallelMode.  However, inspite of that, we have
> safeguards in lower level calls, so that if the code flow reaches such
> commands in parallel mode, we error out.  We have a similar check in
> Copy From code flow  ( PreventCommandIfParallelMode("COPY FROM");) as
> well, but I think we should have it in Copy To flow as well.

Why?  What is it effectively preventing?  Multiple workers copying to
the same file?  Any such function would have the same risk for separate
sessions.

Greetings,

Andres Freund


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] COPY (query) TO ... doesn't allow parallelism

2017-06-03 Thread Amit Kapila
On Thu, Jun 1, 2017 at 10:16 PM, Andres Freund  wrote:
> On 2017-06-01 21:37:56 +0530, Amit Kapila wrote:
>> On Thu, Jun 1, 2017 at 9:34 PM, Andres Freund  wrote:
>> > On 2017-06-01 21:23:04 +0530, Amit Kapila wrote:
>> >> On a related note, I think it might be better to have an
>> >> IsInParallelMode() check in this case as we have at other places.
>> >> This is to ensure that if this command is invoked via plpgsql function
>> >> and that function runs is the parallel mode, it will act as a
>> >> safeguard.
>> >
>> > Hm? Which other places do it that way?  Isn't standard_planner()
>> > centralizing such a check?
>> >
>>
>> heap_insert->heap_prepare_insert, heap_update, heap_delete, etc.
>
> Those aren't comparable, they're not invoking the planner - and all the
> places that set PARALLEL_OK don't check for it.  The relevant check for
> planning is in standard_planner().
>

The standard_planner check is sufficient to not generate parallel
plans for such statements, but it won't prevent if such commands
(which shouldn't be executed by parallel workers) are present in
functions.  Consider a hypothetical case as below:

1.  Create a parallel safe function containing Copy commands.
create or replace function parallel_copy(a integer) returns integer
as $$
begin
Copy (select * from t1 where c1 < 2) to 'e:\\f1';
return a;
end;
$$ language plpgsql Parallel Safe;

2. Now use this in some command which can be executed in parallel.
explain analyze select * from t1 where c1 < parallel_copy(10);

This can allow Copy command to be executed by parallel workers if we
don't have sufficient safeguards.  We already tried to prohibit it in
plpgsql like in function _SPI_execute_plan(), we call
PreventCommandIfParallelMode.  However, inspite of that, we have
safeguards in lower level calls, so that if the code flow reaches such
commands in parallel mode, we error out.  We have a similar check in
Copy From code flow  ( PreventCommandIfParallelMode("COPY FROM");) as
well, but I think we should have it in Copy To flow as well.

I agree that at first place user shouldn't mark such functions as
parallel safe, but having such safeguards can prevent us from problems
where users have incorrectly marked some functions as parallel safe.

-- 
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] COPY (query) TO ... doesn't allow parallelism

2017-06-01 Thread Andres Freund
On 2017-06-01 15:56:35 -0400, Robert Haas wrote:
> > I personally think we should fix this in 9.6 and 10, but I've to admit
> > I'm not entirely impartial, because Citus hit this...
> 
> I guess it's a matter of judgement whether you want to call this a bug
> or a missing feature.  I wasn't really laboring under any illusion
> that I'd found every place that could benefit from a
> CURSOR_OPT_PARALLEL_OK marking, so it may be that in the future we'll
> find more such places and, well, where do you draw the line?

I agree that there's degrees here.  The reason I think this is on the
"backpatch" side of the fence is that IME COPY (query) is one of the
most common ways start start a longrunning query, which isn't the case
for some of the other ways to trigger them.


> That having been said, I don't know of any particular reason why this
> particular change would carry much risk.  My tolerance for optional
> changes in back branches is lower than many people here, so if it were
> me, I'd probably fix it only in master.  However, I can't get too
> excited about it either way.

So far I plan to push a fix to both branches, unless some other people
raise a bit stronger objections. I've some things I want to push first
(sequence stuff), so there's definitely some more time for protest.

- Andres


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] COPY (query) TO ... doesn't allow parallelism

2017-06-01 Thread Robert Haas
On Wed, May 31, 2017 at 7:19 PM, Andres Freund  wrote:
> To me that appears to be an oversight rather than intentional.  A
> somewhat annoying one at that, because it's not uncommong to use COPY to
> execute reports to a CSV file and such.
>
> Robert, am I missing a complication?

No, I think that would work.

> I personally think we should fix this in 9.6 and 10, but I've to admit
> I'm not entirely impartial, because Citus hit this...

I guess it's a matter of judgement whether you want to call this a bug
or a missing feature.  I wasn't really laboring under any illusion
that I'd found every place that could benefit from a
CURSOR_OPT_PARALLEL_OK marking, so it may be that in the future we'll
find more such places and, well, where do you draw the line?  That
having been said, I don't know of any particular reason why this
particular change would carry much risk.  My tolerance for optional
changes in back branches is lower than many people here, so if it were
me, I'd probably fix it only in master.  However, I can't get too
excited about it either way.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] COPY (query) TO ... doesn't allow parallelism

2017-06-01 Thread Andres Freund
On 2017-06-01 21:37:56 +0530, Amit Kapila wrote:
> On Thu, Jun 1, 2017 at 9:34 PM, Andres Freund  wrote:
> > On 2017-06-01 21:23:04 +0530, Amit Kapila wrote:
> >> On a related note, I think it might be better to have an
> >> IsInParallelMode() check in this case as we have at other places.
> >> This is to ensure that if this command is invoked via plpgsql function
> >> and that function runs is the parallel mode, it will act as a
> >> safeguard.
> >
> > Hm? Which other places do it that way?  Isn't standard_planner()
> > centralizing such a check?
> >
> 
> heap_insert->heap_prepare_insert, heap_update, heap_delete, etc.

Those aren't comparable, they're not invoking the planner - and all the
places that set PARALLEL_OK don't check for it.  The relevant check for
planning is in standard_planner().

- Andres


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] COPY (query) TO ... doesn't allow parallelism

2017-06-01 Thread Amit Kapila
On Thu, Jun 1, 2017 at 9:34 PM, Andres Freund  wrote:
> On 2017-06-01 21:23:04 +0530, Amit Kapila wrote:
>> On a related note, I think it might be better to have an
>> IsInParallelMode() check in this case as we have at other places.
>> This is to ensure that if this command is invoked via plpgsql function
>> and that function runs is the parallel mode, it will act as a
>> safeguard.
>
> Hm? Which other places do it that way?  Isn't standard_planner()
> centralizing such a check?
>

heap_insert->heap_prepare_insert, heap_update, heap_delete, etc.


-- 
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] COPY (query) TO ... doesn't allow parallelism

2017-06-01 Thread Andres Freund
On 2017-06-01 21:23:04 +0530, Amit Kapila wrote:
> On a related note, I think it might be better to have an
> IsInParallelMode() check in this case as we have at other places.
> This is to ensure that if this command is invoked via plpgsql function
> and that function runs is the parallel mode, it will act as a
> safeguard.

Hm? Which other places do it that way?  Isn't standard_planner()
centralizing such a check?

- Andres


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] COPY (query) TO ... doesn't allow parallelism

2017-06-01 Thread Amit Kapila
On Thu, Jun 1, 2017 at 4:49 AM, Andres Freund  wrote:
> Hi,
>
> At the moment $subject doesn't allow parallelism, because copy.c's
> pg_plan_query() invocation doesn't set the CURSOR_OPT_PARALLEL_OK
> flag.
>
> To me that appears to be an oversight rather than intentional.
>

I also don't see any problem in parallelizing it.  On a related note,
I think it might be better to have an IsInParallelMode() check in this
case as we have at other places.  This is to ensure that if this
command is invoked via plpgsql function and that function runs is the
parallel mode, it will act as a safeguard.

-- 
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] COPY (query) TO ... doesn't allow parallelism

2017-05-31 Thread Dilip Kumar
On Wed, May 31, 2017 at 7:19 PM, Andres Freund  wrote:
> At the moment $subject doesn't allow parallelism, because copy.c's
> pg_plan_query() invocation doesn't set the CURSOR_OPT_PARALLEL_OK
> flag.
>
> To me that appears to be an oversight rather than intentional.  A
> somewhat annoying one at that, because it's not uncommong to use COPY to
> execute reports to a CSV file and such.
>
> Robert, am I missing a complication?
>
> I personally think we should fix this in 9.6 and 10, but I've to admit
> I'm not entirely impartial, because Citus hit this...

IMHO, For copy_to I don't see any problem in parallelizing it.  I just
tested by changing the cursorOption and it's working in parallel.

-- 
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com


copy_to_parallel.patch
Description: Binary data

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


[HACKERS] COPY (query) TO ... doesn't allow parallelism

2017-05-31 Thread Andres Freund
Hi,

At the moment $subject doesn't allow parallelism, because copy.c's
pg_plan_query() invocation doesn't set the CURSOR_OPT_PARALLEL_OK
flag.

To me that appears to be an oversight rather than intentional.  A
somewhat annoying one at that, because it's not uncommong to use COPY to
execute reports to a CSV file and such.

Robert, am I missing a complication?

I personally think we should fix this in 9.6 and 10, but I've to admit
I'm not entirely impartial, because Citus hit this...

Greetings,

Andres Freund


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers