Re: [HACKERS] COPY (query) TO ... doesn't allow parallelism
On Sat, Jun 3, 2017 at 9:34 PM, Andres Freund wrote: > Hi, > > On 2017-06-03 17:40:08 +0530, Amit Kapila wrote: >> The standard_planner check is sufficient to not generate parallel >> plans for such statements, but it won't prevent if such commands >> (which shouldn't be executed by parallel workers) are present in >> functions. Consider a hypothetical case as below: >> >> 1. Create a parallel safe function containing Copy commands. >> create or replace function parallel_copy(a integer) returns integer >> as $$ >> begin >> Copy (select * from t1 where c1 < 2) to 'e:\\f1'; >> return a; >> end; >> $$ language plpgsql Parallel Safe; >> >> 2. Now use this in some command which can be executed in parallel. >> explain analyze select * from t1 where c1 < parallel_copy(10); >> >> This can allow Copy command to be executed by parallel workers if we >> don't have sufficient safeguards. > > Yes. But I'm unclear what does that have to do with the change > discussed in this thread? > It is not related to the change you have proposed. It just occurred to me while reading the code in the area where you have proposed to change, so mentioned here. It might have been better to report it in a separate thread. > The pg_plan_query in copy.c setting > CURSOR_OPT_PARALLEL_OK doesn't meaningfully change the risk of this > happening in one way or the other. > >> We already tried to prohibit it in >> plpgsql like in function _SPI_execute_plan(), we call >> PreventCommandIfParallelMode. However, inspite of that, we have >> safeguards in lower level calls, so that if the code flow reaches such >> commands in parallel mode, we error out. We have a similar check in >> Copy From code flow ( PreventCommandIfParallelMode("COPY FROM");) as >> well, but I think we should have it in Copy To flow as well. > > Why? What is it effectively preventing? Multiple workers copying to > the same file? > Yeah. Also, the query (in Copy To command) can be a write a statement as well. I thought giving an error from COPY TO flow would be appropriate for it. > Any such function would have the same risk for separate > sessions. > You are right, but not sure if it makes sense to allow any form of write statement in parallel workers without doing more analysis. -- With Regards, Amit Kapila. EnterpriseDB: http://www.enterprisedb.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] COPY (query) TO ... doesn't allow parallelism
Hi, On 2017-06-03 17:40:08 +0530, Amit Kapila wrote: > The standard_planner check is sufficient to not generate parallel > plans for such statements, but it won't prevent if such commands > (which shouldn't be executed by parallel workers) are present in > functions. Consider a hypothetical case as below: > > 1. Create a parallel safe function containing Copy commands. > create or replace function parallel_copy(a integer) returns integer > as $$ > begin > Copy (select * from t1 where c1 < 2) to 'e:\\f1'; > return a; > end; > $$ language plpgsql Parallel Safe; > > 2. Now use this in some command which can be executed in parallel. > explain analyze select * from t1 where c1 < parallel_copy(10); > > This can allow Copy command to be executed by parallel workers if we > don't have sufficient safeguards. Yes. But I'm unclear what does that have to do with the change discussed in this thread? The pg_plan_query in copy.c setting CURSOR_OPT_PARALLEL_OK doesn't meaningfully change the risk of this happening in one way or the other. > We already tried to prohibit it in > plpgsql like in function _SPI_execute_plan(), we call > PreventCommandIfParallelMode. However, inspite of that, we have > safeguards in lower level calls, so that if the code flow reaches such > commands in parallel mode, we error out. We have a similar check in > Copy From code flow ( PreventCommandIfParallelMode("COPY FROM");) as > well, but I think we should have it in Copy To flow as well. Why? What is it effectively preventing? Multiple workers copying to the same file? Any such function would have the same risk for separate sessions. Greetings, Andres Freund -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] COPY (query) TO ... doesn't allow parallelism
On Thu, Jun 1, 2017 at 10:16 PM, Andres Freund wrote: > On 2017-06-01 21:37:56 +0530, Amit Kapila wrote: >> On Thu, Jun 1, 2017 at 9:34 PM, Andres Freund wrote: >> > On 2017-06-01 21:23:04 +0530, Amit Kapila wrote: >> >> On a related note, I think it might be better to have an >> >> IsInParallelMode() check in this case as we have at other places. >> >> This is to ensure that if this command is invoked via plpgsql function >> >> and that function runs is the parallel mode, it will act as a >> >> safeguard. >> > >> > Hm? Which other places do it that way? Isn't standard_planner() >> > centralizing such a check? >> > >> >> heap_insert->heap_prepare_insert, heap_update, heap_delete, etc. > > Those aren't comparable, they're not invoking the planner - and all the > places that set PARALLEL_OK don't check for it. The relevant check for > planning is in standard_planner(). > The standard_planner check is sufficient to not generate parallel plans for such statements, but it won't prevent if such commands (which shouldn't be executed by parallel workers) are present in functions. Consider a hypothetical case as below: 1. Create a parallel safe function containing Copy commands. create or replace function parallel_copy(a integer) returns integer as $$ begin Copy (select * from t1 where c1 < 2) to 'e:\\f1'; return a; end; $$ language plpgsql Parallel Safe; 2. Now use this in some command which can be executed in parallel. explain analyze select * from t1 where c1 < parallel_copy(10); This can allow Copy command to be executed by parallel workers if we don't have sufficient safeguards. We already tried to prohibit it in plpgsql like in function _SPI_execute_plan(), we call PreventCommandIfParallelMode. However, inspite of that, we have safeguards in lower level calls, so that if the code flow reaches such commands in parallel mode, we error out. We have a similar check in Copy From code flow ( PreventCommandIfParallelMode("COPY FROM");) as well, but I think we should have it in Copy To flow as well. I agree that at first place user shouldn't mark such functions as parallel safe, but having such safeguards can prevent us from problems where users have incorrectly marked some functions as parallel safe. -- With Regards, Amit Kapila. EnterpriseDB: http://www.enterprisedb.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] COPY (query) TO ... doesn't allow parallelism
On 2017-06-01 15:56:35 -0400, Robert Haas wrote: > > I personally think we should fix this in 9.6 and 10, but I've to admit > > I'm not entirely impartial, because Citus hit this... > > I guess it's a matter of judgement whether you want to call this a bug > or a missing feature. I wasn't really laboring under any illusion > that I'd found every place that could benefit from a > CURSOR_OPT_PARALLEL_OK marking, so it may be that in the future we'll > find more such places and, well, where do you draw the line? I agree that there's degrees here. The reason I think this is on the "backpatch" side of the fence is that IME COPY (query) is one of the most common ways start start a longrunning query, which isn't the case for some of the other ways to trigger them. > That having been said, I don't know of any particular reason why this > particular change would carry much risk. My tolerance for optional > changes in back branches is lower than many people here, so if it were > me, I'd probably fix it only in master. However, I can't get too > excited about it either way. So far I plan to push a fix to both branches, unless some other people raise a bit stronger objections. I've some things I want to push first (sequence stuff), so there's definitely some more time for protest. - Andres -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] COPY (query) TO ... doesn't allow parallelism
On Wed, May 31, 2017 at 7:19 PM, Andres Freund wrote: > To me that appears to be an oversight rather than intentional. A > somewhat annoying one at that, because it's not uncommong to use COPY to > execute reports to a CSV file and such. > > Robert, am I missing a complication? No, I think that would work. > I personally think we should fix this in 9.6 and 10, but I've to admit > I'm not entirely impartial, because Citus hit this... I guess it's a matter of judgement whether you want to call this a bug or a missing feature. I wasn't really laboring under any illusion that I'd found every place that could benefit from a CURSOR_OPT_PARALLEL_OK marking, so it may be that in the future we'll find more such places and, well, where do you draw the line? That having been said, I don't know of any particular reason why this particular change would carry much risk. My tolerance for optional changes in back branches is lower than many people here, so if it were me, I'd probably fix it only in master. However, I can't get too excited about it either way. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] COPY (query) TO ... doesn't allow parallelism
On 2017-06-01 21:37:56 +0530, Amit Kapila wrote: > On Thu, Jun 1, 2017 at 9:34 PM, Andres Freund wrote: > > On 2017-06-01 21:23:04 +0530, Amit Kapila wrote: > >> On a related note, I think it might be better to have an > >> IsInParallelMode() check in this case as we have at other places. > >> This is to ensure that if this command is invoked via plpgsql function > >> and that function runs is the parallel mode, it will act as a > >> safeguard. > > > > Hm? Which other places do it that way? Isn't standard_planner() > > centralizing such a check? > > > > heap_insert->heap_prepare_insert, heap_update, heap_delete, etc. Those aren't comparable, they're not invoking the planner - and all the places that set PARALLEL_OK don't check for it. The relevant check for planning is in standard_planner(). - Andres -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] COPY (query) TO ... doesn't allow parallelism
On Thu, Jun 1, 2017 at 9:34 PM, Andres Freund wrote: > On 2017-06-01 21:23:04 +0530, Amit Kapila wrote: >> On a related note, I think it might be better to have an >> IsInParallelMode() check in this case as we have at other places. >> This is to ensure that if this command is invoked via plpgsql function >> and that function runs is the parallel mode, it will act as a >> safeguard. > > Hm? Which other places do it that way? Isn't standard_planner() > centralizing such a check? > heap_insert->heap_prepare_insert, heap_update, heap_delete, etc. -- With Regards, Amit Kapila. EnterpriseDB: http://www.enterprisedb.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] COPY (query) TO ... doesn't allow parallelism
On 2017-06-01 21:23:04 +0530, Amit Kapila wrote: > On a related note, I think it might be better to have an > IsInParallelMode() check in this case as we have at other places. > This is to ensure that if this command is invoked via plpgsql function > and that function runs is the parallel mode, it will act as a > safeguard. Hm? Which other places do it that way? Isn't standard_planner() centralizing such a check? - Andres -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] COPY (query) TO ... doesn't allow parallelism
On Thu, Jun 1, 2017 at 4:49 AM, Andres Freund wrote: > Hi, > > At the moment $subject doesn't allow parallelism, because copy.c's > pg_plan_query() invocation doesn't set the CURSOR_OPT_PARALLEL_OK > flag. > > To me that appears to be an oversight rather than intentional. > I also don't see any problem in parallelizing it. On a related note, I think it might be better to have an IsInParallelMode() check in this case as we have at other places. This is to ensure that if this command is invoked via plpgsql function and that function runs is the parallel mode, it will act as a safeguard. -- With Regards, Amit Kapila. EnterpriseDB: http://www.enterprisedb.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] COPY (query) TO ... doesn't allow parallelism
On Wed, May 31, 2017 at 7:19 PM, Andres Freund wrote: > At the moment $subject doesn't allow parallelism, because copy.c's > pg_plan_query() invocation doesn't set the CURSOR_OPT_PARALLEL_OK > flag. > > To me that appears to be an oversight rather than intentional. A > somewhat annoying one at that, because it's not uncommong to use COPY to > execute reports to a CSV file and such. > > Robert, am I missing a complication? > > I personally think we should fix this in 9.6 and 10, but I've to admit > I'm not entirely impartial, because Citus hit this... IMHO, For copy_to I don't see any problem in parallelizing it. I just tested by changing the cursorOption and it's working in parallel. -- Regards, Dilip Kumar EnterpriseDB: http://www.enterprisedb.com copy_to_parallel.patch Description: Binary data -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
[HACKERS] COPY (query) TO ... doesn't allow parallelism
Hi, At the moment $subject doesn't allow parallelism, because copy.c's pg_plan_query() invocation doesn't set the CURSOR_OPT_PARALLEL_OK flag. To me that appears to be an oversight rather than intentional. A somewhat annoying one at that, because it's not uncommong to use COPY to execute reports to a CSV file and such. Robert, am I missing a complication? I personally think we should fix this in 9.6 and 10, but I've to admit I'm not entirely impartial, because Citus hit this... Greetings, Andres Freund -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers