Re: [HACKERS] Parallel Seq Scan

2015-12-02 Thread Amit Kapila
On Wed, Dec 2, 2015 at 12:06 PM, Michael Paquier 
wrote:
>
> On Sun, Nov 22, 2015 at 3:25 PM, Amit Kapila 
wrote:
> > On Fri, Nov 20, 2015 at 11:34 PM, Robert Haas 
wrote:
> >>
> >> On Thu, Nov 19, 2015 at 11:59 PM, Amit Kapila 
> >> wrote:
> >> > Isn't it better to destroy the memory for readers array as that gets
> >> > allocated
> >> > even if there are no workers available for execution?
> >> >
> >> > Attached patch fixes the issue by just destroying readers array.
> >>
> >> Well, then you're making ExecGatherShutdownWorkers() not a no-op any
> >> more.  I'll go commit a combination of your two patches.
> >>
> >
> > Thanks!
>
> There is still an entry in the CF app for this thread as "Parallel Seq
> scan". The basic infrastructure has been committed, and I understand
> that this is a never-ending task and that there will be many
> optimizations. Still, are you guys fine with switching this entry to
> committed for now?
>

I am fine with it.  I think the further optimizations can be done
separately.



With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com


Re: [HACKERS] Parallel Seq Scan

2015-12-02 Thread Michael Paquier
On Wed, Dec 2, 2015 at 5:45 PM, Amit Kapila wrote:
> I am fine with it.  I think the further optimizations can be done
> separately.

Done.
-- 
Michael


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Parallel Seq Scan

2015-12-01 Thread Michael Paquier
On Sun, Nov 22, 2015 at 3:25 PM, Amit Kapila  wrote:
> On Fri, Nov 20, 2015 at 11:34 PM, Robert Haas  wrote:
>>
>> On Thu, Nov 19, 2015 at 11:59 PM, Amit Kapila 
>> wrote:
>> > Isn't it better to destroy the memory for readers array as that gets
>> > allocated
>> > even if there are no workers available for execution?
>> >
>> > Attached patch fixes the issue by just destroying readers array.
>>
>> Well, then you're making ExecGatherShutdownWorkers() not a no-op any
>> more.  I'll go commit a combination of your two patches.
>>
>
> Thanks!

There is still an entry in the CF app for this thread as "Parallel Seq
scan". The basic infrastructure has been committed, and I understand
that this is a never-ending task and that there will be many
optimizations. Still, are you guys fine with switching this entry to
committed for now?
-- 
Michael




Re: [HACKERS] Parallel Seq Scan

2015-11-21 Thread Amit Kapila
On Fri, Nov 20, 2015 at 11:34 PM, Robert Haas  wrote:
>
> On Thu, Nov 19, 2015 at 11:59 PM, Amit Kapila 
wrote:
> > Isn't it better to destroy the memory for readers array as that gets
> > allocated
> > even if there are no workers available for execution?
> >
> > Attached patch fixes the issue by just destroying readers array.
>
> Well, then you're making ExecGatherShutdownWorkers() not a no-op any
> more.  I'll go commit a combination of your two patches.
>

Thanks!


With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com


Re: [HACKERS] Parallel Seq Scan

2015-11-20 Thread Robert Haas
On Thu, Nov 19, 2015 at 11:59 PM, Amit Kapila  wrote:
> Isn't it better to destroy the memory for the readers array, as it gets
> allocated even if there are no workers available for execution?
>
> Attached patch fixes the issue by just destroying the readers array.

Well, then you're making ExecGatherShutdownWorkers() not a no-op any
more.  I'll go commit a combination of your two patches.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] Parallel Seq Scan

2015-11-19 Thread Amit Kapila
On Thu, Nov 19, 2015 at 9:29 PM, Robert Haas  wrote:
>
> On Wed, Nov 18, 2015 at 10:41 PM, Amit Kapila 
wrote:
> > I think whats going on here is that when any of the session doesn't
> > get any workers, we shutdown the Gather node which internally destroys
> > the dynamic shared memory segment as well.  However the same is
> > needed as per current design for doing scan by master backend as
> > well.  So I think the fix would be to just do shutdown of workers which
> > actually won't do anything in this scenario.
>
> It seems silly to call ExecGatherShutdownWorkers() here when that's
> going to be a no-op.  I think we should just remove that line and the
> if statement before it altogether and replace it with a comment
> explaining why we can't nuke the DSM at this stage.
>

Isn't it better to destroy the memory for the readers array, as it gets
allocated even if there are no workers available for execution?

Attached patch fixes the issue by just destroying the readers array.
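
The shape of that cleanup — an array allocated up front for the planned
worker count, which must be freed even when no workers ever attach — can be
sketched in plain C.  This is only an illustration of the pattern:
GatherState, setup_readers, and shutdown_readers are made-up names, not the
actual PostgreSQL symbols.

```c
#include <assert.h>
#include <stdlib.h>

/* Illustrative stand-in for the Gather node's reader bookkeeping;
 * these are NOT the actual PostgreSQL symbols. */
typedef struct GatherState
{
    int    nreaders;   /* readers actually attached to workers */
    void **reader;     /* array sized for the planned worker count */
} GatherState;

/* The array is allocated up front for the planned number of workers,
 * even if zero workers end up being available at execution time. */
static void
setup_readers(GatherState *node, int planned_workers)
{
    node->nreaders = 0;
    node->reader = malloc(planned_workers * sizeof(void *));
    assert(node->reader != NULL);
}

/* Shutdown therefore has to free the array unconditionally, and
 * NULL-ing the pointer makes repeated calls harmless no-ops. */
static void
shutdown_readers(GatherState *node)
{
    if (node->reader != NULL)
    {
        free(node->reader);
        node->reader = NULL;
    }
    node->nreaders = 0;
}
```

The point of the NULL check is that shutdown may be reached more than once
(at end of scan and again at node teardown), so the free must be safe to
repeat.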

With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com


fix_early_dsm_destroy_v2.patch
Description: Binary data



Re: [HACKERS] Parallel Seq Scan

2015-11-19 Thread Robert Haas
On Wed, Nov 18, 2015 at 10:41 PM, Amit Kapila  wrote:
> I think what's going on here is that when any of the sessions doesn't
> get any workers, we shut down the Gather node, which internally destroys
> the dynamic shared memory segment as well.  However, under the current
> design the segment is still needed for the scan done by the master
> backend.  So I think the fix would be to just shut down the workers,
> which actually won't do anything in this scenario.

It seems silly to call ExecGatherShutdownWorkers() here when that's
going to be a no-op.  I think we should just remove that line and the
if statement before it altogether and replace it with a comment
explaining why we can't nuke the DSM at this stage.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] Parallel Seq Scan

2015-11-18 Thread Robert Haas
On Wed, Nov 18, 2015 at 12:48 AM, Amit Kapila  wrote:
>> I suggest that we instead fix ExecParallelFinish() to be idempotent.
>> Add a "bool finished" flag to ParallelExecutorInfo and return at once
>> if it's already set. Get rid of the exposed
>> ExecParallelReinitializeTupleQueues() interface and have
>> ExecParallelReinitialize(pei) instead.  Have that call
>> ReinitializeParallelDSM(), ExecParallelSetupTupleQueues(pei->pcxt,
>> true), and set pei->finished = false.  I think that would give us a
>> slightly cleaner separation of concerns between nodeGather.c and
>> execParallel.c.
>
> Okay, attached patch fixes the issue as per above suggestion.

Thanks, committed.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] Parallel Seq Scan

2015-11-18 Thread Amit Kapila
On Tue, Nov 17, 2015 at 1:21 AM, Bert  wrote:

> Hey,
>
> I've just pulled and compiled the new code.
> I'm running a TPC-DS-like test on different PostgreSQL installations,
> running (at most) 12 queries in parallel on a server with 12 cores.
> I've configured max_parallel_degree to 2, and I get messages that backend
> processes crash.
> I am running the same test now with 6 queries in parallel, and parallel
> degree 2, and they seem to work. For now. :)
>
> This is the output I get in /var/log/messages
> Nov 16 20:40:05 woludwha02 kernel: postgres[22918]: segfault at
> 7fa3437bf104 ip 00490b56 sp 7ffdf2f083a0 error 6 in
> postgres[40+5b5000]
>
>
Thanks for reporting the issue.

I think what's going on here is that when any of the sessions doesn't
get any workers, we shut down the Gather node, which internally destroys
the dynamic shared memory segment as well.  However, under the current
design the segment is still needed for the scan done by the master
backend.  So I think the fix would be to just shut down the workers,
which actually won't do anything in this scenario.  I have tried to
reproduce this issue with a simpler test case as below:

Create two tables with large data:
CREATE TABLE t1(c1, c2) AS SELECT g, repeat('x', 5) FROM
generate_series(1, 1000) g;

CREATE TABLE t2(c1, c2) AS SELECT g, repeat('x', 5) FROM
generate_series(1, 100) g;

Set max_worker_processes = 2 in postgresql.conf

Session-1
set max_parallel_degree=4;
set parallel_tuple_cost=0;
set parallel_setup_cost=0;
Explain analyze select count(*) from t1 where c1 > 1;

Session-2
set max_parallel_degree=4;
set parallel_tuple_cost=0;
set parallel_setup_cost=0;
Explain analyze select count(*) from t2 where c1 > 1;

The trick to reproduce is that the Explain statement in Session-2
needs to be executed immediately after the Explain statement in
Session-1.

Attached patch fixes the issue for me.

I think we could also go for a somewhat more invasive fix here: if the
statement didn't find any workers, reset the DSM and also reset the
execution tree (which in the case of a seq scan means clearing the
parallel scan descriptor and perhaps a few more fields in the scan
descriptor) so that it performs a plain seq scan.  I am not sure how
future-proof such a change would be, because resetting some of the fields
in the execution tree and expecting it to work in all cases might not be
feasible for all nodes.


With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com


fix_early_dsm_destroy_v1.patch
Description: Binary data



Re: [HACKERS] Parallel Seq Scan

2015-11-17 Thread Bert
edit: maybe this is more useful? :)

(gdb) bt full
#0  0x00490b56 in heap_parallelscan_nextpage ()
No symbol table info available.
#1  0x00493fdf in heap_getnext ()
No symbol table info available.
#2  0x005c0733 in SeqNext ()
No symbol table info available.
#3  0x005ac5d9 in ExecScan ()
No symbol table info available.
#4  0x005a5c08 in ExecProcNode ()
No symbol table info available.
#5  0x005b5298 in ExecGather ()
No symbol table info available.
#6  0x005a5aa8 in ExecProcNode ()
No symbol table info available.
#7  0x005b68b9 in MultiExecHash ()
No symbol table info available.
#8  0x005b7256 in ExecHashJoin ()
No symbol table info available.
#9  0x005a5b18 in ExecProcNode ()
No symbol table info available.
#10 0x005b0ac9 in fetch_input_tuple ()
No symbol table info available.
#11 0x005b1eaf in ExecAgg ()
No symbol table info available.
#12 0x005a5ad8 in ExecProcNode ()
No symbol table info available.
#13 0x005c11e1 in ExecSort ()
No symbol table info available.
#14 0x005a5af8 in ExecProcNode ()
No symbol table info available.
#15 0x005ba164 in ExecLimit ()
No symbol table info available.
#16 0x005a5a38 in ExecProcNode ()
No symbol table info available.
#17 0x005a2343 in standard_ExecutorRun ()
No symbol table info available.
#18 0x0069cb08 in PortalRunSelect ()
No symbol table info available.
#19 0x0069de5f in PortalRun ()
No symbol table info available.
#20 0x0069bc16 in PostgresMain ()
No symbol table info available.
#21 0x00466f55 in ServerLoop ()
No symbol table info available.
#22 0x00648436 in PostmasterMain ()
No symbol table info available.
#23 0x004679f0 in main ()
No symbol table info available.


On Tue, Nov 17, 2015 at 12:38 PM, Bert  wrote:

> Hi,
>
> this is the backtrace:
> gdb /var/lib/pgsql/9.6/data/ /var/lib/pgsql/9.6/data/core.7877
> GNU gdb (GDB) Red Hat Enterprise Linux 7.6.1-64.el7
> Copyright (C) 2013 Free Software Foundation, Inc.
> License GPLv3+: GNU GPL version 3 or later <
> http://gnu.org/licenses/gpl.html>
> This is free software: you are free to change and redistribute it.
> There is NO WARRANTY, to the extent permitted by law.  Type "show copying"
> and "show warranty" for details.
> This GDB was configured as "x86_64-redhat-linux-gnu".
> For bug reporting instructions, please see:
> ...
> /var/lib/pgsql/9.6/data/: Success.
> [New LWP 7877]
> Missing separate debuginfo for the main executable file
> Try: yum --enablerepo='*debug*' install
> /usr/lib/debug/.build-id/02/20b77a9ab8f607b0610082794165fccedf210d
> Core was generated by `postgres: postgres tpcds [loca'.
> Program terminated with signal 11, Segmentation fault.
> #0  0x00490b56 in ?? ()
> (gdb) bt full
> #0  0x00490b56 in ?? ()
> No symbol table info available.
> #1  0x3668 in ?? ()
> No symbol table info available.
> #2  0x7f956249a008 in ?? ()
> No symbol table info available.
> #3  0x0228c498 in ?? ()
> No symbol table info available.
> #4  0x0001 in ?? ()
> No symbol table info available.
> #5  0x0228ad00 in ?? ()
> No symbol table info available.
> #6  0x00493fdf in ?? ()
> No symbol table info available.
> #7  0x021a8e50 in ?? ()
> No symbol table info available.
> #8  0x in ?? ()
> No symbol table info available.
> (gdb) q
>
> Is there something else I can do?
>
> On Mon, Nov 16, 2015 at 8:59 PM, Robert Haas 
> wrote:
>
>> On Mon, Nov 16, 2015 at 2:51 PM, Bert  wrote:
>> > I've just pulled and compiled the new code.
>> > I'm running a TPC-DS like test on different PostgreSQL installations,
>> but
>> > running (max) 12queries in parallel on a server with 12cores.
>> > I've configured max_parallel_degree to 2, and I get messages that
>> backend
>> > processes crash.
>> > I am running the same test now with 6queries in parallel, and parallel
>> > degree to 2, and they seem to work. for now. :)
>> >
>> > This is the output I get in /var/log/messages
>> > Nov 16 20:40:05 woludwha02 kernel: postgres[22918]: segfault at
>> 7fa3437bf104
>> > ip 00490b56 sp 7ffdf2f083a0 error 6 in
>> postgres[40+5b5000]
>> >
>> > Is there something else I should get?
>>
>> Can you enable core dumps e.g. by passing the -c option to pg_ctl
>> start?  If you can get a core file, you can then get a backtrace
>> using:
>>
>> gdb /path/to/postgres /path/to/core
>> bt full
>> q
>>
>> That should be enough to find and fix whatever the bug is.  Thanks for
>> testing.
>>
>> --
>> Robert Haas
>> EnterpriseDB: http://www.enterprisedb.com
>> The Enterprise PostgreSQL Company
>>
>
>
>
> --
> Bert Desmet
> 0477/305361
>



-- 
Bert Desmet
0477/305361


Re: [HACKERS] Parallel Seq Scan

2015-11-17 Thread Bert
Hi,

this is the backtrace:
gdb /var/lib/pgsql/9.6/data/ /var/lib/pgsql/9.6/data/core.7877
GNU gdb (GDB) Red Hat Enterprise Linux 7.6.1-64.el7
Copyright (C) 2013 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later 
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.  Type "show copying"
and "show warranty" for details.
This GDB was configured as "x86_64-redhat-linux-gnu".
For bug reporting instructions, please see:
...
/var/lib/pgsql/9.6/data/: Success.
[New LWP 7877]
Missing separate debuginfo for the main executable file
Try: yum --enablerepo='*debug*' install
/usr/lib/debug/.build-id/02/20b77a9ab8f607b0610082794165fccedf210d
Core was generated by `postgres: postgres tpcds [loca'.
Program terminated with signal 11, Segmentation fault.
#0  0x00490b56 in ?? ()
(gdb) bt full
#0  0x00490b56 in ?? ()
No symbol table info available.
#1  0x3668 in ?? ()
No symbol table info available.
#2  0x7f956249a008 in ?? ()
No symbol table info available.
#3  0x0228c498 in ?? ()
No symbol table info available.
#4  0x0001 in ?? ()
No symbol table info available.
#5  0x0228ad00 in ?? ()
No symbol table info available.
#6  0x00493fdf in ?? ()
No symbol table info available.
#7  0x021a8e50 in ?? ()
No symbol table info available.
#8  0x in ?? ()
No symbol table info available.
(gdb) q

Is there something else I can do?

On Mon, Nov 16, 2015 at 8:59 PM, Robert Haas  wrote:

> On Mon, Nov 16, 2015 at 2:51 PM, Bert  wrote:
> > I've just pulled and compiled the new code.
> > I'm running a TPC-DS like test on different PostgreSQL installations, but
> > running (max) 12queries in parallel on a server with 12cores.
> > I've configured max_parallel_degree to 2, and I get messages that backend
> > processes crash.
> > I am running the same test now with 6queries in parallel, and parallel
> > degree to 2, and they seem to work. for now. :)
> >
> > This is the output I get in /var/log/messages
> > Nov 16 20:40:05 woludwha02 kernel: postgres[22918]: segfault at
> 7fa3437bf104
> > ip 00490b56 sp 7ffdf2f083a0 error 6 in
> postgres[40+5b5000]
> >
> > Is there something else I should get?
>
> Can you enable core dumps e.g. by passing the -c option to pg_ctl
> start?  If you can get a core file, you can then get a backtrace
> using:
>
> gdb /path/to/postgres /path/to/core
> bt full
> q
>
> That should be enough to find and fix whatever the bug is.  Thanks for
> testing.
>
> --
> Robert Haas
> EnterpriseDB: http://www.enterprisedb.com
> The Enterprise PostgreSQL Company
>



-- 
Bert Desmet
0477/305361


Re: [HACKERS] Parallel Seq Scan

2015-11-17 Thread Robert Haas
On Tue, Nov 17, 2015 at 6:52 AM, Bert  wrote:
> edit: maybe this is more useful? :)

Definitely.  But if you've built with --enable-debug and not stripped
the resulting executable, we ought to get line numbers as well, plus
the arguments to each function on the stack.  That would help a lot
more.  The only things that get dereferenced in that function are
"scan" and "parallel_scan", so it's a good bet that one of those
pointers is pointing off into never-never land.  I can't immediately
guess how that's happening, though.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] Parallel Seq Scan

2015-11-17 Thread Robert Haas
On Mon, Nov 16, 2015 at 9:49 PM, Amit Kapila  wrote:
>> I don't understand this part.
>>
>
> The code in question is as below:
>
> tqueueReceiveSlot(TupleTableSlot *slot, DestReceiver *self)
> {
> ..
>     if (tqueue->tupledesc != tupledesc ||
>         tqueue->remapinfo->natts != tupledesc->natts)
>     {
>         if (tqueue->remapinfo != NULL)
>             pfree(tqueue->remapinfo);
>         tqueue->remapinfo = BuildRemapInfo(tupledesc);
>     }
> ..
> }
>
> Here the above check always passes, as tqueue->tupledesc is never
> set, due to which it always tries to rebuild the remap info.  Is there
> any reason for doing so?

Groan.  The problem here is that tqueue->tupledesc never gets set.  I
think this should be fixed as in the attached.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
diff --git a/src/backend/executor/tqueue.c b/src/backend/executor/tqueue.c
index 5735acf..d625b0d 100644
--- a/src/backend/executor/tqueue.c
+++ b/src/backend/executor/tqueue.c
@@ -127,15 +127,15 @@ tqueueReceiveSlot(TupleTableSlot *slot, DestReceiver *self)
 	 * new tupledesc.  This is a strange test both because the executor really
 	 * shouldn't change the tupledesc, and also because it would be unsafe if
 	 * the old tupledesc could be freed and a new one allocated at the same
-	 * address.  But since some very old code in printtup.c uses this test, we
-	 * adopt it here as well.
+	 * address.  But since some very old code in printtup.c uses a similar
+	 * test, we adopt it here as well.
 	 */
-	if (tqueue->tupledesc != tupledesc ||
-		tqueue->remapinfo->natts != tupledesc->natts)
+	if (tqueue->tupledesc != tupledesc)
 	{
 		if (tqueue->remapinfo != NULL)
 			pfree(tqueue->remapinfo);
 		tqueue->remapinfo = BuildRemapInfo(tupledesc);
+		tqueue->tupledesc = tupledesc;
 	}
 
 	tuple = ExecMaterializeSlot(slot);



Re: [HACKERS] Parallel Seq Scan

2015-11-17 Thread Robert Haas
On Thu, Nov 12, 2015 at 10:23 AM, Amit Kapila  wrote:
> Thanks for the report.  The reason for this problem is that instrumentation
> information from workers is getting aggregated multiple times.  In
> ExecShutdownGatherWorkers(), we call ExecParallelFinish where it
> will wait for workers to finish and then accumulate stats from workers.
> Now ExecShutdownGatherWorkers() could be called multiple times
> (once we read all tuples from workers, at end of node) and it should be
> ensured that repeated calls should not try to redo the work done by first
> call.
> The same is ensured for tuplequeues, but not for parallel executor info.
> I think we can safely assume that we need to call ExecParallelFinish() only
> when there are workers started by the Gather node, so along those lines
> the attached patch should fix the problem.

I suggest that we instead fix ExecParallelFinish() to be idempotent.
Add a "bool finished" flag to ParallelExecutorInfo and return at once
if it's already set. Get rid of the exposed
ExecParallelReinitializeTupleQueues() interface and have
ExecParallelReinitialize(pei) instead.  Have that call
ReinitializeParallelDSM(), ExecParallelSetupTupleQueues(pei->pcxt,
true), and set pei->finished = false.  I think that would give us a
slightly cleaner separation of concerns between nodeGather.c and
execParallel.c.
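
The idempotency Robert suggests — a "bool finished" guard that makes
repeated finish calls return at once, cleared again on reinitialization —
can be sketched as a small self-contained pattern.  The names below
(ExecutorInfo, finish_executor, reinitialize_executor) are illustrative
only, not the actual PostgreSQL symbols.

```c
#include <assert.h>
#include <stdbool.h>

/* Illustrative stand-in for ParallelExecutorInfo with the proposed
 * "finished" flag; these are NOT the actual PostgreSQL symbols. */
typedef struct ExecutorInfo
{
    bool finished;      /* has cleanup run since the last (re)init? */
    int  finish_count;  /* counts how often the real cleanup executed */
} ExecutorInfo;

/* Idempotent finish: repeated calls after the first return at once,
 * so stats are never accumulated twice. */
static void
finish_executor(ExecutorInfo *info)
{
    if (info->finished)
        return;
    info->finish_count++;   /* stands in for waiting on workers and
                             * accumulating their instrumentation */
    info->finished = true;
}

/* Re-arming for a rescan clears the flag, so the next finish call
 * performs the cleanup work again exactly once. */
static void
reinitialize_executor(ExecutorInfo *info)
{
    info->finished = false;
}
```

With this shape, callers such as a Gather node can shut down workers as
many times as convenient without worrying about double-accumulating
per-worker statistics.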

Your fix seems a little fragile.  You're relying on node->reader !=
NULL to tell you whether the readers need to be cleaned up, but in
fact node->reader is set to a non-NULL value AFTER the pei has been
created.  Granted, we currently always create a reader unless we don't
get any workers, and if we don't get any workers then failing to call
ExecParallelFinish is currently harmless, but nonetheless I think we
should be more explicit about this so it doesn't accidentally get
broken later.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] Parallel Seq Scan

2015-11-17 Thread Amit Kapila
On Wed, Nov 18, 2015 at 12:59 AM, Robert Haas  wrote:
>
> On Thu, Nov 12, 2015 at 10:23 AM, Amit Kapila 
wrote:
> > Thanks for the report.  The reason for this problem is that
instrumentation
> > information from workers is getting aggregated multiple times.  In
> > ExecShutdownGatherWorkers(), we call ExecParallelFinish where it
> > will wait for workers to finish and then accumulate stats from workers.
> > Now ExecShutdownGatherWorkers() could be called multiple times
> > (once we read all tuples from workers, at end of node) and it should be
> > ensured that repeated calls should not try to redo the work done by
first
> > call.
> > The same is ensured for tuplequeues, but not for parallel executor info.
> > I think we can safely assume that we need to call ExecParallelFinish()
only
> > when there are workers started by the Gathers node, so on those lines
> > attached patch should fix the problem.
>
> I suggest that we instead fix ExecParallelFinish() to be idempotent.
> Add a "bool finished" flag to ParallelExecutorInfo and return at once
> if it's already set. Get rid of the exposed
> ExecParallelReinitializeTupleQueues() interface and have
> ExecParallelReinitialize(pei) instead.  Have that call
> ReinitializeParallelDSM(), ExecParallelSetupTupleQueues(pei->pcxt,
> true), and set pei->finished = false.  I think that would give us a
> slightly cleaner separation of concerns between nodeGather.c and
> execParallel.c.
>

Okay, attached patch fixes the issue as per above suggestion.


With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com


fix_finish_parallel_executor_info_v1.patch
Description: Binary data



Re: [HACKERS] Parallel Seq Scan

2015-11-17 Thread Robert Haas
On Tue, Nov 17, 2015 at 4:51 PM, Bert  wrote:

> Hey Robert,
>
> Thank you for the help. As you might (not) know, I'm quite new to the
> community, but I'm learning. with the help from people like you.
> anyhow, find attached a third attempt to a valid backtrace file.
>
> This run is compiled from commit 5f10b7a604c87fc61a2c20a56552301f74c9bd5f
> and your latest patch atteched in this mailtrack.
>

Thanks.  This is great.  Can you also run these commands:

frame 1
p *scan

The first command should select the heap_parallelscan_nextpage frame.  The
second command should print the contents of the scan object.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


Re: [HACKERS] Parallel Seq Scan

2015-11-17 Thread Amit Kapila
On Tue, Nov 17, 2015 at 11:22 PM, Robert Haas  wrote:
>
> On Mon, Nov 16, 2015 at 9:49 PM, Amit Kapila 
wrote:
> >> I don't understand this part.
> >>
> >
> > Here the above check always passes as tqueue->tupledesc is not
> > set due to which it always try to build remap info.  Is there any reason
> > for doing so?
>
> Groan.  The problem here is that tqueue->tupledesc never gets set.

Yes that was the problem.

>  I
> think this should be fixed as in the attached.
>

Works for me!


With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com


Re: [HACKERS] Parallel Seq Scan

2015-11-17 Thread Bert
Hey Robert,

Thank you for the help. As you might (not) know, I'm quite new to the
community, but I'm learning, with the help of people like you.
Anyhow, find attached a third attempt at a valid backtrace file.

This run is compiled from commit 5f10b7a604c87fc61a2c20a56552301f74c9bd5f
and your latest patch attached in this mail thread.


cheers,
Bert

full_backtrace.log

On Tue, Nov 17, 2015 at 6:55 PM, Robert Haas  wrote:

> On Tue, Nov 17, 2015 at 6:52 AM, Bert  wrote:
> > edit: maybe this is more useful? :)
>
> Definitely.  But if you've built with --enable-debug and not stripped
> the resulting executable, we ought to get line numbers as well, plus
> the arguments to each function on the stack.  That would help a lot
> more.  The only things that get dereferenced in that function are
> "scan" and "parallel_scan", so it's a good bet that one of those
> pointers is pointing off into never-never land.  I can't immediately
> guess how that's happening, though.
>
> --
> Robert Haas
> EnterpriseDB: http://www.enterprisedb.com
> The Enterprise PostgreSQL Company
>



-- 
Bert Desmet
0477/305361


Re: [HACKERS] Parallel Seq Scan

2015-11-16 Thread Amit Kapila
On Mon, Nov 16, 2015 at 4:35 AM, Robert Haas  wrote:
>
> On Fri, Nov 13, 2015 at 10:46 AM, Thom Brown  wrote:
> >>> And perhaps associated PIDs?
> >>
> >> Yeah, that can be useful, if others also feel like it is important, I
can
> >> look into preparing a patch for the same.
> >
> > Thanks.
>
> Thom, what do you think the EXPLAIN output should look like,
> specifically?  Or anyone else who feels like answering.
>
> I don't think it would be very useful to repeat the entire EXPLAIN
> output n times, once per worker.  That sounds like a loser.
>

Yes, it doesn't seem like a good idea to repeat the information, but what
about the cases where different workers scan different relations
(partitions, in the case of an Append node), or perhaps perform a
different operation under Sort or join node parallelism?


With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com


Re: [HACKERS] Parallel Seq Scan

2015-11-16 Thread Pavel Stehule
2015-11-16 14:17 GMT+01:00 Amit Kapila :

> On Mon, Nov 16, 2015 at 4:35 AM, Robert Haas 
> wrote:
> >
> > On Fri, Nov 13, 2015 at 10:46 AM, Thom Brown  wrote:
> > >>> And perhaps associated PIDs?
> > >>
> > >> Yeah, that can be useful, if others also feel like it is important, I
> can
> > >> look into preparing a patch for the same.
> > >
> > > Thanks.
> >
> > Thom, what do you think the EXPLAIN output should look like,
> > specifically?  Or anyone else who feels like answering.
> >
> > I don't think it would be very useful to repeat the entire EXPLAIN
> > output n times, once per worker.  That sounds like a loser.
> >
>
> Yes, it doesn't seem like a good idea to repeat the information, but what
> about the cases where different workers scan different relations
> (partitions, in the case of an Append node), or perhaps perform a
> different operation under Sort or join node parallelism?
>

+1

Pavel


>
>
> With Regards,
> Amit Kapila.
> EnterpriseDB: http://www.enterprisedb.com
>


Re: [HACKERS] Parallel Seq Scan

2015-11-16 Thread Jeff Janes
On Sat, Nov 14, 2015 at 10:12 PM, Amit Kapila  wrote:
> On Fri, Nov 13, 2015 at 11:05 PM, Jeff Janes  wrote:
>>
>> On Wed, Nov 11, 2015 at 6:53 AM, Robert Haas 
>> wrote:
>> >
>> > I've committed most of this, except for some planner bits that I
>> > didn't like, and after a bunch of cleanup.  Instead, I committed the
>> > consider-parallel-v2.patch with some additional planner bits to make
>> > up for the ones I removed from your patch.  So, now we have parallel
>> > sequential scan!
>>
>> Pretty cool.  All I had to do is mark my slow plperl functions as
>> being parallel safe, and bang, parallel execution of them for seq
>> scans.
>>
>> But, there does seem to be a memory leak.
>>
>
> Thanks for the report.
>
> I think the main reason for the leak in the workers is that one of the
> buffers used while sending tuples from worker to master (in function
> BuildRemapInfo) is not getting freed, and it is allocated for each
> tuple the worker sends back to the master.  I couldn't find a use of
> such a buffer, so I think we can avoid the allocation, or at least we
> need to free it.  Attached patch remove_unused_buf_allocation_v1.patch
> should fix the issue.

Thanks, that patch (as committed) has fixed the problem for me.  I
don't understand the second one.

Cheers,

Jeff




Re: [HACKERS] Parallel Seq Scan

2015-11-16 Thread Bert
Hey,

I've just pulled and compiled the new code.
I'm running a TPC-DS-like test on different PostgreSQL installations,
running (at most) 12 queries in parallel on a server with 12 cores.
I've configured max_parallel_degree to 2, and I get messages that backend
processes crash.
I am running the same test now with 6 queries in parallel, and parallel
degree 2, and they seem to work. For now. :)

This is the output I get in /var/log/messages
Nov 16 20:40:05 woludwha02 kernel: postgres[22918]: segfault at
7fa3437bf104 ip 00490b56 sp 7ffdf2f083a0 error 6 in
postgres[40+5b5000]

Is there something else I should get?

cheers,
Bert

On Mon, Nov 16, 2015 at 6:06 PM, Jeff Janes  wrote:

> On Sat, Nov 14, 2015 at 10:12 PM, Amit Kapila 
> wrote:
> > On Fri, Nov 13, 2015 at 11:05 PM, Jeff Janes 
> wrote:
> >>
> >> On Wed, Nov 11, 2015 at 6:53 AM, Robert Haas 
> >> wrote:
> >> >
> >> > I've committed most of this, except for some planner bits that I
> >> > didn't like, and after a bunch of cleanup.  Instead, I committed the
> >> > consider-parallel-v2.patch with some additional planner bits to make
> >> > up for the ones I removed from your patch.  So, now we have parallel
> >> > sequential scan!
> >>
> >> Pretty cool.  All I had to do is mark my slow plperl functions as
> >> being parallel safe, and bang, parallel execution of them for seq
> >> scans.
> >>
> >> But, there does seem to be a memory leak.
> >>
> >
> > Thanks for the report.
> >
> > I think main reason of the leak in workers seems to be due the reason
> > that one of the buffer used while sending tuples (in function
> > BuildRemapInfo)
> > from worker to master is not getting freed and it is allocated for each
> > tuple worker sends back to master.  I couldn't find use of such a buffer,
> > so I think we can avoid the allocation of same or atleast we need to free
> > it.  Attached patch remove_unused_buf_allocation_v1.patch should fix the
> > issue.
>
> Thanks, that patch (as committed) has fixed the problem for me.  I
> don't understand the second one.
>
> Cheers,
>
> Jeff
>
>
>



-- 
Bert Desmet
0477/305361


Re: [HACKERS] Parallel Seq Scan

2015-11-16 Thread Robert Haas
On Mon, Nov 16, 2015 at 2:51 PM, Bert  wrote:
> I've just pulled and compiled the new code.
> I'm running a TPC-DS like test on different PostgreSQL installations, but
> running (max) 12queries in parallel on a server with 12cores.
> I've configured max_parallel_degree to 2, and I get messages that backend
> processes crash.
> I am running the same test now with 6queries in parallel, and parallel
> degree to 2, and they seem to work. for now. :)
>
> This is the output I get in /var/log/messages
> Nov 16 20:40:05 woludwha02 kernel: postgres[22918]: segfault at 7fa3437bf104
> ip 00490b56 sp 7ffdf2f083a0 error 6 in postgres[40+5b5000]
>
> Is there something else I should get?

Can you enable core dumps e.g. by passing the -c option to pg_ctl
start?  If you can get a core file, you can then get a backtrace
using:

gdb /path/to/postgres /path/to/core
bt full
q

That should be enough to find and fix whatever the bug is.  Thanks for testing.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] Parallel Seq Scan

2015-11-16 Thread Amit Kapila
On Mon, Nov 16, 2015 at 7:39 AM, Robert Haas  wrote:
> On Sun, Nov 15, 2015 at 1:12 AM, Amit Kapila 
wrote:
> > Thanks for the report.
> >
> > I think main reason of the leak in workers seems to be due the reason
> > that one of the buffer used while sending tuples (in function
> > BuildRemapInfo)
> > from worker to master is not getting freed and it is allocated for each
> > tuple worker sends back to master.  I couldn't find use of such a
buffer,
> > so I think we can avoid the allocation of same or atleast we need to
free
> > it.  Attached patch remove_unused_buf_allocation_v1.patch should fix the
> > issue.
>
> Oops.  Committed.
>

Thanks!

> > Another thing I have noticed is that we need to build the remap info
> > target list contains record type of attrs, so ideally it should not
even go
> > in
> > this path when such attrs are not present.  The reason for the same was
> > that the tuple descriptor stored in TQueueDestReceiver was not updated,
> > attached patch fix_initialization_tdesc_v1 fixes this issue.
>
> I don't understand this part.
>

The code in question is as below:

tqueueReceiveSlot(TupleTableSlot *slot, DestReceiver *self)
{
..
    if (tqueue->tupledesc != tupledesc ||
        tqueue->remapinfo->natts != tupledesc->natts)
    {
        if (tqueue->remapinfo != NULL)
            pfree(tqueue->remapinfo);
        tqueue->remapinfo = BuildRemapInfo(tupledesc);
    }
..
}

Here the above check always passes because tqueue->tupledesc is never
set, due to which it always tries to rebuild the remap info.  Is there any
reason for doing so?
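To see the effect, here is a toy model of that check (illustrative Python only — TQueue, receive_slot, and build_remap_info are stand-ins for the C structures, not PostgreSQL source): because the cached descriptor is never stored after a rebuild, the comparison fails for every tuple and the remap info is rebuilt each time; storing the descriptor once makes the rebuild happen only on a genuine descriptor change.

```python
# Toy model of the caching check in tqueueReceiveSlot() (stand-in names).
builds = 0

def build_remap_info(tupledesc):
    global builds
    builds += 1
    return {"natts": len(tupledesc)}

class TQueue:
    def __init__(self):
        self.tupledesc = None
        self.remapinfo = None

def receive_slot(tqueue, tupledesc, fixed):
    if (tqueue.tupledesc is not tupledesc or
            tqueue.remapinfo is None or
            tqueue.remapinfo["natts"] != len(tupledesc)):
        tqueue.remapinfo = build_remap_info(tupledesc)
        if fixed:
            tqueue.tupledesc = tupledesc   # the missing assignment

desc = ("a", "b", "c")

buggy = TQueue()
builds = 0
for _ in range(5):
    receive_slot(buggy, desc, fixed=False)
buggy_builds = builds       # rebuilt for every tuple

patched = TQueue()
builds = 0
for _ in range(5):
    receive_slot(patched, desc, fixed=True)
fixed_builds = builds       # built once, then cached
```

Under this model the unpatched path rebuilds five times for five tuples, while the patched path builds once.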


With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com


Re: [HACKERS] Parallel Seq Scan

2015-11-16 Thread Amit Kapila
On Mon, Nov 16, 2015 at 10:36 PM, Jeff Janes  wrote:
>
> On Sat, Nov 14, 2015 at 10:12 PM, Amit Kapila 
wrote:
> > On Fri, Nov 13, 2015 at 11:05 PM, Jeff Janes 
wrote:
> >
> > I think main reason of the leak in workers seems to be due the reason
> > that one of the buffer used while sending tuples (in function
> > BuildRemapInfo)
> > from worker to master is not getting freed and it is allocated for each
> > tuple worker sends back to master.  I couldn't find use of such a
buffer,
> > so I think we can avoid the allocation of same or atleast we need to
free
> > it.  Attached patch remove_unused_buf_allocation_v1.patch should fix the
> > issue.
>
> Thanks, that patch (as committed) has fixed the problem for me.
>

Thanks to you as well for verification.

With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com


Re: [HACKERS] Parallel Seq Scan

2015-11-15 Thread Gavin Flower

On 16/11/15 12:05, Robert Haas wrote:

On Fri, Nov 13, 2015 at 10:46 AM, Thom Brown  wrote:

And perhaps associated PIDs?

Yeah, that can be useful, if others also feel like it is important, I can
look into preparing a patch for the same.

Thanks.

Thom, what do you think the EXPLAIN output should look like,
specifically?  Or anyone else who feels like answering.

I don't think it would be very useful to repeat the entire EXPLAIN
output n times, once per worker.  That sounds like a loser.  But we
could add additional lines to the output for each node, like this:

  Parallel Seq Scan on foo  (cost=0.00..XXX rows=YYY width=ZZZ) (actual
time=AAA..BBB rows=CCC loops=1)
   Leader: actual time=AAA..BBB rows=CCC loops=1
   Worker 0: actual time=AAA..BBB rows=CCC loops=1
   Worker 1: actual time=AAA..BBB rows=CCC loops=1
   Worker 2: actual time=AAA..BBB rows=CCC loops=1

If "buffers" is specified, we could display the summary information
after the Parallel Seq Scan as normal and then display an additional
per-worker line after the "Leader" line and each "Worker N" line.  I
think displaying the worker index is more useful than displaying the
PID, especially if we think that a plan tree like this might ever get
executed multiple times with different PIDs on each pass.

Like?  Dislike?  Other ideas?


Possibly have an option to include the PID?

Consider altering the format field width of the Worker number (depending 
on the number of workers) so you don't get:

   Worker 9 ...
   Worker 10 ...
but something like
   Worker  9 ...
   Worker 10 ...
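Gavin's width suggestion can be sketched as follows (illustrative Python, not a patch): compute the field width from the highest worker index that will be printed and right-align.

```python
# Right-align worker indices so "Worker  9" lines up under "Worker 10".
nworkers = 12
width = len(str(nworkers - 1))        # widest index that will be printed
lines = ["Worker {0:>{1}} ...".format(i, width) for i in range(nworkers)]
print(lines[9])
print(lines[10])
```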



Cheers,
Gavin





Re: [HACKERS] Parallel Seq Scan

2015-11-15 Thread Robert Haas
On Fri, Nov 13, 2015 at 10:46 AM, Thom Brown  wrote:
>>> And perhaps associated PIDs?
>>
>> Yeah, that can be useful, if others also feel like it is important, I can
>> look into preparing a patch for the same.
>
> Thanks.

Thom, what do you think the EXPLAIN output should look like,
specifically?  Or anyone else who feels like answering.

I don't think it would be very useful to repeat the entire EXPLAIN
output n times, once per worker.  That sounds like a loser.  But we
could add additional lines to the output for each node, like this:

 Parallel Seq Scan on foo  (cost=0.00..XXX rows=YYY width=ZZZ) (actual
time=AAA..BBB rows=CCC loops=1)
  Leader: actual time=AAA..BBB rows=CCC loops=1
  Worker 0: actual time=AAA..BBB rows=CCC loops=1
  Worker 1: actual time=AAA..BBB rows=CCC loops=1
  Worker 2: actual time=AAA..BBB rows=CCC loops=1

If "buffers" is specified, we could display the summary information
after the Parallel Seq Scan as normal and then display an additional
per-worker line after the "Leader" line and each "Worker N" line.  I
think displaying the worker index is more useful than displaying the
PID, especially if we think that a plan tree like this might ever get
executed multiple times with different PIDs on each pass.

Like?  Dislike?  Other ideas?
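As a rough illustration only (the function name and tuple layout are invented, not part of any EXPLAIN patch), the proposed per-node worker lines could be assembled like this:

```python
# Render the proposed "Leader" / "Worker N" lines for one plan node.
def worker_lines(leader, workers):
    """leader and each worker entry are (start_ms, end_ms, rows) tuples."""
    out = ["Leader: actual time=%.3f..%.3f rows=%d loops=1" % leader]
    for i, w in enumerate(workers):
        out.append("Worker %d: actual time=%.3f..%.3f rows=%d loops=1"
                   % ((i,) + w))
    return out

for line in worker_lines((0.1, 2.5, 100), [(0.2, 2.4, 90), (0.3, 2.6, 95)]):
    print("   " + line)
```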

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] Parallel Seq Scan

2015-11-15 Thread Robert Haas
On Sun, Nov 15, 2015 at 1:12 AM, Amit Kapila  wrote:
> Thanks for the report.
>
> I think main reason of the leak in workers seems to be due the reason
> that one of the buffer used while sending tuples (in function
> BuildRemapInfo)
> from worker to master is not getting freed and it is allocated for each
> tuple worker sends back to master.  I couldn't find use of such a buffer,
> so I think we can avoid the allocation of same or atleast we need to free
> it.  Attached patch remove_unused_buf_allocation_v1.patch should fix the
> issue.

Oops.  Committed.

> Another thing I have noticed is that we need to build the remap info
> target list contains record type of attrs, so ideally it should not even go
> in
> this path when such attrs are not present.  The reason for the same was
> that the tuple descriptor stored in TQueueDestReceiver was not updated,
> attached patch fix_initialization_tdesc_v1 fixes this issue.

I don't understand this part.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] Parallel Seq Scan

2015-11-14 Thread Amit Kapila
On Fri, Nov 13, 2015 at 11:05 PM, Jeff Janes  wrote:
>
> On Wed, Nov 11, 2015 at 6:53 AM, Robert Haas 
wrote:
> >
> > I've committed most of this, except for some planner bits that I
> > didn't like, and after a bunch of cleanup.  Instead, I committed the
> > consider-parallel-v2.patch with some additional planner bits to make
> > up for the ones I removed from your patch.  So, now we have parallel
> > sequential scan!
>
> Pretty cool.  All I had to do is mark my slow plperl functions as
> being parallel safe, and bang, parallel execution of them for seq
> scans.
>
> But, there does seem to be a memory leak.
>

Thanks for the report.

I think the main reason for the leak in workers is that one of the buffers
used while sending tuples (in function BuildRemapInfo) from worker to
master is not getting freed, and it is allocated for each tuple the worker
sends back to the master.  I couldn't find any use of this buffer, so I
think we can avoid the allocation altogether, or at least we need to free
it.  Attached patch remove_unused_buf_allocation_v1.patch should fix the
issue.
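As a toy model of that diagnosis (illustrative Python only — MemoryContext and palloc here are stand-ins, not the real allocator): one scratch buffer allocated per tuple and never freed accumulates in the query-lifetime context, which matches the ever-growing ExecutorState Jeff observed.

```python
# A scratch buffer palloc'd per tuple and never pfree'd accumulates in the
# per-query context; the leak grows linearly with the number of tuples sent.
class MemoryContext:
    """Stand-in for a per-query memory context (tracks bytes only)."""
    def __init__(self):
        self.allocated = 0
    def palloc(self, n):
        self.allocated += n
        return bytearray(n)

ctx = MemoryContext()
for _ in range(1000):          # one tuple sent per iteration
    scratch = ctx.palloc(64)   # buggy path: allocated per tuple, never freed

leaked = ctx.allocated
print(leaked)
```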

Another thing I have noticed is that we need to build the remap info only
when the target list contains record-type attrs, so ideally this path
should not even be taken when such attrs are not present.  The reason it
was taken anyway is that the tuple descriptor stored in TQueueDestReceiver
was never updated; attached patch fix_initialization_tdesc_v1 fixes this
issue.


With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com


remove_unused_buf_allocation_v1.patch
Description: Binary data


fix_initialization_tdesc_v1.patch
Description: Binary data



Re: [HACKERS] Parallel Seq Scan

2015-11-13 Thread Thom Brown
On 13 November 2015 at 15:22, Amit Kapila  wrote:
> On Fri, Nov 13, 2015 at 7:59 PM, Thom Brown  wrote:
>>
>> On 13 November 2015 at 13:38, Amit Kapila  wrote:
>> > On Wed, Nov 11, 2015 at 11:40 PM, Pavel Stehule
>> > 
>> > wrote:
>> >>
>> >>
>> >> yes - the another little bit unclean in EXPLAIN is number of workers.
>> >> If I
>> >> understand to the behave, the query is processed by two processes if
>> >> workers
>> >> in the explain is one.
>> >>
>> >
>> > You are right and I think that is current working model of Gather
>> > node which seems okay. I think the more serious thing here
>> > is that there is possibility that Explain Analyze can show the
>> > number of workers as more than actual workers working for Gather
>> > node.  We have already discussed that Explain Analyze should
>> > the actual number of workers used in query execution, patch for
>> > the same is still pending.
>>
>> This may have already been discussed before, but in a verbose output,
>> would it be possible to see the nodes for each worker?
>>
>
> There will be hardly any difference in nodes for each worker and it could
> be very long plan for large number of workers.  What kind of additional
> information you want which can't be shown in current format.

For explain plans, not that useful, but it's useful to see how long
each worker took for explain analyse.  And I imagine as more
functionality is added to scan partitions and foreign scans, it will
perhaps be more useful when the plans won't be identical. (or would
they?)

>>
>> And perhaps associated PIDs?
>>
>
> Yeah, that can be useful, if others also feel like it is important, I can
> look into preparing a patch for the same.

Thanks.

Thom




Re: [HACKERS] Parallel Seq Scan

2015-11-13 Thread Amit Kapila
On Fri, Nov 13, 2015 at 7:59 PM, Thom Brown  wrote:
>
> On 13 November 2015 at 13:38, Amit Kapila  wrote:
> > On Wed, Nov 11, 2015 at 11:40 PM, Pavel Stehule 
> > wrote:
> >>
> >>
> >> yes - the another little bit unclean in EXPLAIN is number of workers.
If I
> >> understand to the behave, the query is processed by two processes if
workers
> >> in the explain is one.
> >>
> >
> > You are right and I think that is current working model of Gather
> > node which seems okay. I think the more serious thing here
> > is that there is possibility that Explain Analyze can show the
> > number of workers as more than actual workers working for Gather
> > node.  We have already discussed that Explain Analyze should
> > the actual number of workers used in query execution, patch for
> > the same is still pending.
>
> This may have already been discussed before, but in a verbose output,
> would it be possible to see the nodes for each worker?
>

There will be hardly any difference in the nodes for each worker, and it
could make for a very long plan with a large number of workers.  What kind
of additional information do you want that can't be shown in the current
format?

>
> And perhaps associated PIDs?
>

Yeah, that can be useful, if others also feel like it is important, I can
look into preparing a patch for the same.



With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com


Re: [HACKERS] Parallel Seq Scan

2015-11-13 Thread Jeff Janes
On Wed, Nov 11, 2015 at 6:53 AM, Robert Haas  wrote:
>
> I've committed most of this, except for some planner bits that I
> didn't like, and after a bunch of cleanup.  Instead, I committed the
> consider-parallel-v2.patch with some additional planner bits to make
> up for the ones I removed from your patch.  So, now we have parallel
> sequential scan!

Pretty cool.  All I had to do is mark my slow plperl functions as
being parallel safe, and bang, parallel execution of them for seq
scans.

But, there does seem to be a memory leak.

The setup (warning: 20GB of data):

create table foobar as select md5(floor(random()*150)::text) as
id, random() as volume
  from generate_series(1,2);

set max_parallel_degree TO 8;

explain select count(*) from foobar where volume >0.9;
                                         QUERY PLAN
---------------------------------------------------------------------------------------------
 Aggregate  (cost=2626202.44..2626202.45 rows=1 width=0)
   ->  Gather  (cost=1000.00..2576381.76 rows=19928272 width=0)
         Number of Workers: 7
         ->  Parallel Seq Scan on foobar  (cost=0.00..582554.56 rows=19928272 width=0)
               Filter: (volume > '0.9'::double precision)


Now running this query leads to an OOM condition:

explain (analyze, buffers) select count(*) from foobar where volume >0.9;
WARNING:  terminating connection because of crash of another server process


Running it without the explain also causes the problem.

Memory dump looks like at some point before the crash looks like:

TopMemoryContext: 62496 total in 9 blocks; 16976 free (60 chunks); 45520 used
  TopTransactionContext: 8192 total in 1 blocks; 4024 free (8 chunks); 4168 used
ExecutorState: 1795153920 total in 223 blocks; 4159872 free (880
chunks); 1790994048 used
  ExprContext: 0 total in 0 blocks; 0 free (0 chunks); 0 used
  Operator class cache: 8192 total in 1 blocks; 1680 free (0 chunks); 6512 used
  other insignificant stuff...

I don't have enough RAM for each of 7 workers to use all that much more than 2GB

work_mem is 25MB, maintenance work_mem is 64MB

Cheers,

Jeff




Re: [HACKERS] Parallel Seq Scan

2015-11-13 Thread Amit Kapila
On Fri, Nov 13, 2015 at 9:16 PM, Thom Brown  wrote:
>
> On 13 November 2015 at 15:22, Amit Kapila  wrote:
> >
> > There will be hardly any difference in nodes for each worker and it
could
> > be very long plan for large number of workers.  What kind of additional
> > information you want which can't be shown in current format.
>
> For explain plans, not that useful, but it's useful to see how long
> each worker took for explain analyse.
>

The statistics related to buffers, timing, and in fact rows filtered will
be different for each worker, so it sounds sensible to me to display
separate plan info for each worker (or at least display it in verbose or
some other mode) and then display aggregated information at the Gather
node.  The only point that needs more thought is that parallel plans will
look big for a large number of workers.  I think this will need somewhat
more substantial changes than what is done currently for parallel seq
scan, so it is better if others also share their opinion about this form
of displaying information for parallel queries.


With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com


Re: [HACKERS] Parallel Seq Scan

2015-11-13 Thread Thom Brown
On 13 November 2015 at 13:38, Amit Kapila  wrote:
> On Wed, Nov 11, 2015 at 11:40 PM, Pavel Stehule 
> wrote:
>>
>>
>> yes - the another little bit unclean in EXPLAIN is number of workers. If I
>> understand to the behave, the query is processed by two processes if workers
>> in the explain is one.
>>
>
> You are right and I think that is current working model of Gather
> node which seems okay. I think the more serious thing here
> is that there is possibility that Explain Analyze can show the
> number of workers as more than actual workers working for Gather
> node.  We have already discussed that Explain Analyze should
> the actual number of workers used in query execution, patch for
> the same is still pending.

This may have already been discussed before, but in a verbose output,
would it be possible to see the nodes for each worker?

e.g.

# explain (analyse, buffers, timing, verbose, costs) select count(*) from js where content->'tags'->>'title' like '%de%';
                                                                  QUERY PLAN
----------------------------------------------------------------------------------------------------------------------------
 Aggregate  (cost=105557.59..105557.60 rows=1 width=0) (actual time=400.752..400.752 rows=1 loops=1)
   Output: count(*)
   Buffers: shared hit=175333
   ->  Gather  (cost=1000.00..104931.04 rows=250621 width=0) (actual time=400.748..400.748 rows=0 loops=1)
         Output: content
         Number of Workers: 2
         Buffers: shared hit=175333
         ->  Parallel Seq Scan on public.js  (cost=0.00..39434.47 rows=125310 width=0) (actual time=182.256..398.14 rows=0 loops=1)
               Output: content
               Filter: (((js.content -> 'tags'::text) ->> 'title'::text) ~~ '%de%'::text)
               Rows Removed by Filter: 626486
               Buffers: shared hit=87666
         ->  Parallel Seq Scan on public.js  (cost=0.00..39434.47 rows=1253101 width=0) (actual time=214.11..325.31 rows=0 loops=1)
               Output: content
               Filter: (((js.content -> 'tags'::text) ->> 'title'::text) ~~ '%de%'::text)
               Rows Removed by Filter: 6264867
               Buffers: shared hit=876667
 Planning time: 0.085 ms
 Execution time: 414.713 ms
(14 rows)

And perhaps associated PIDs?

Thom




Re: [HACKERS] Parallel Seq Scan

2015-11-13 Thread Thom Brown
On 13 November 2015 at 03:39, Amit Kapila  wrote:
> On Thu, Nov 12, 2015 at 9:05 PM, Thom Brown  wrote:
>>
>> On 12 November 2015 at 15:23, Amit Kapila  wrote:
>> > On Wed, Nov 11, 2015 at 11:29 PM, Pavel Stehule
>> > 
>> > wrote:
>> >>
>> >> Hi
>> >>
>> >> I have a first query
>> >>
> >> >> I looked on EXPLAIN ANALYZE output and the numbers of filtered rows are
> >> >> different
>> >>
>> >
>> > Thanks for the report.  The reason for this problem is that
>> > instrumentation
>> > information from workers is getting aggregated multiple times.  In
>> > ExecShutdownGatherWorkers(), we call ExecParallelFinish where it
>> > will wait for workers to finish and then accumulate stats from workers.
>> > Now ExecShutdownGatherWorkers() could be called multiple times
>> > (once we read all tuples from workers, at end of node) and it should be
>> > ensured that repeated calls should not try to redo the work done by
>> > first
>> > call.
>> > The same is ensured for tuplequeues, but not for parallel executor info.
>> > I think we can safely assume that we need to call ExecParallelFinish()
>> > only
>> > when there are workers started by the Gathers node, so on those lines
>> > attached patch should fix the problem.
>>
>> That fixes the count issue for me, although not the number of buffers
>> hit,
>>
>
> The number of shared buffers hit could be different across different runs
> because the read sequence of parallel workers can't be guaranteed, also
> I don't think same is even guaranteed for Seq Scan node, the other
> operations
> in parallel could lead to different number, however the actual problem was
> that in one of the plans shown by you [1], the Buffers hit at Gather node
> (175288) is lesser than the Buffers hit at Parallel Seq Scan node (241201).
> Do you still (after applying above patch) see that Gather node is showing
> lesser hit buffers than Parallel Seq Scan node?

Hmm... that's odd, I'm not seeing the problem now, so maybe I'm mistaken there.

> [1]
>  # explain (analyse, buffers, timing, verbose, costs) select count(*) from js where content->'tags'->>'title' like '%design%';
>                                                            QUERY PLAN
> ------------------------------------------------------------------------------------------------------------------------
>  Aggregate  (cost=132489.34..132489.35 rows=1 width=0) (actual time=382.987..382.987 rows=1 loops=1)
>    Output: count(*)
>    Buffers: shared hit=175288
>    ->  Gather  (cost=1000.00..132488.34 rows=401 width=0) (actual time=382.983..382.983 rows=0 loops=1)
>          Output: content
>          Number of Workers: 1
>          Buffers: shared hit=175288
>          ->  Parallel Seq Scan on public.js  (cost=0.00..131448.24 rows=401 width=0) (actual time=379.407..1141.437 rows=0 loops=1)
>                Output: content
>                Filter: (((js.content -> 'tags'::text) ->> 'title'::text) ~~ '%design%'::text)
>                Rows Removed by Filter: 1724810
>                Buffers: shared hit=241201
>  Planning time: 0.104 ms
>  Execution time: 403.045 ms
> (14 rows)
>
> Time: 403.596 ms
>
>> or the actual time taken.
>>
>
> Exactly what time you are referring here, Execution Time or actual time
> shown on Parallel Seq Scan node and what problem do you see with
> the reported time?

I'm referring to the Parallel Seq Scan actual time, showing
"379.407..1141.437" with 1 worker, but the total execution time shows
403.045.  If one worker is taking over a second, how come the whole
query was less than half a second?

Thom




Re: [HACKERS] Parallel Seq Scan

2015-11-13 Thread Amit Kapila
On Fri, Nov 13, 2015 at 10:56 AM, Robert Haas  wrote:
>
> On Thu, Nov 12, 2015 at 10:39 PM, Amit Kapila 
wrote:
> > The number of shared buffers hit could be different across different
runs
> > because the read sequence of parallel workers can't be guaranteed, also
> > I don't think same is even guaranteed for Seq Scan node,
>
> The number of hits could be different.  However, it seems like any
> sequential scan, parallel or not, should have a number of accesses
> (hit + read) equal to the size of the relation.  Not sure if that's
> what is happening here.
>

After the patch provided above to fix the issue reported by Pavel, that is
the behaviour, but I think there are a few more things we might want to
consider; just refer to the below plan:

Total pages in table

postgres=# select relname,relpages from pg_class where relname like '%t2%';
 relname | relpages
-+--
 t2  | 5406
(1 row)


Parallel Plan
-
postgres=# explain (analyze,buffers,timing) select count(*) from t2 where c1 % 10 = 0;
                                                        QUERY PLAN
--------------------------------------------------------------------------------------------------------------------------
 Aggregate  (cost=8174.90..8174.91 rows=1 width=0) (actual time=1055.294..1055.294 rows=1 loops=1)
   Buffers: shared hit=446 read=5054
   ->  Gather  (cost=0.00..8162.40 rows=5000 width=0) (actual time=79.787..959.651 rows=10 loops=1)
         Number of Workers: 2
         Buffers: shared hit=446 read=5054
         ->  Parallel Seq Scan on t2  (cost=0.00..8162.40 rows=5000 width=0) (actual time=30.771..2518.844 rows=10 loops=1)
               Filter: ((c1 % 10) = 0)
               Rows Removed by Filter: 90
               Buffers: shared hit=352 read=5054
 Planning time: 0.170 ms
 Execution time: 1059.400 ms
(11 rows)

Let's focus on Buffers and actual time in the above plan:

Buffers - At the Parallel Seq Scan node, it shows a total of 5406 (352+5054)
buffers, which tallies with what is expected.  However, at the Gather node
it shows 5500 (446+5054), and the reason is that we accumulate the overall
buffer usage for the parallel execution of each worker, which includes node
startup as well (refer ParallelQueryMain()), and that gets counted towards
the buffer calculation of the Gather node.  The theory behind collecting
overall buffer usage for parallel execution was that we need it for
pg_stat_statements, where the stats are accumulated for the overall
execution, not on a node-by-node basis (refer the queryDesc->totaltime
usage in standard_ExecutorRun()).
I think here we need to decide what is the right value to display at
Gather node:
1. Display the same number of buffers at Gather node as at Parallel
Seq Scan node.
2. Display the number of buffers at Parallel Seq Scan node plus the
additional buffers used by parallel workers for miscellaneous work
like ExecutorStart(), etc.
3. Don't account for buffers used for parallel workers.
4. Anything better?

Also, in conjunction with the above, we need to decide what should be
accounted for in pg_stat_statements?
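The buffer arithmetic from the plan above, spelled out (the numbers are taken from the plan; the 94-buffer figure is derived from them, not measured separately):

```python
relpages = 5406                      # pages in t2, from pg_class
scan_hit, scan_read = 352, 5054      # Parallel Seq Scan node in the plan
gather_hit, gather_read = 446, 5054  # Gather node in the plan

# The scan's accesses exactly cover the relation...
assert scan_hit + scan_read == relpages
# ...while Gather additionally counts buffers touched during worker startup.
startup_overhead = (gather_hit + gather_read) - relpages
print(startup_overhead)
```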

actual_time -
Actual time at Gather node: actual time = 79.787..959.651
Actual time at Parallel Seq Scan node = 30.771..2518.844

The time at the Parallel Seq Scan node is more than the time at the Gather
node because the time spent in the parallel workers is also accumulated
into the Parallel Seq Scan node, whereas some of it doesn't get accounted
to the Gather node.  Now this could be confusing for users because the
time displayed at the Parallel Seq Scan node equals time_taken_by_worker-1 +
time_taken_by_worker-2 + ...
This time could be more than the actual time taken by the query because the
workers execute in parallel, yet we have accounted for the time as though
each one executed serially.
I think the time for fetching the tuples from workers is already accounted
to the Gather node, so maybe for the Parallel Seq Scan node we can omit
adding the time for each of the parallel workers.
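The confusion reduces to simple arithmetic (the per-worker timings below are made up, not taken from the plan): the node's accumulated time behaves like a sum over workers, while the query's wall-clock time behaves more like the maximum.

```python
worker_times = [0.379, 0.351, 0.362]  # seconds per worker (hypothetical)

accumulated = sum(worker_times)       # what the scan node currently reports
wall_clock = max(worker_times)        # roughly what "Execution time" reflects

# The per-node time exceeds the elapsed time whenever the workers overlap.
assert accumulated > wall_clock
```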

Thoughts?


With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com


Re: [HACKERS] Parallel Seq Scan

2015-11-13 Thread Amit Kapila
On Wed, Nov 11, 2015 at 11:40 PM, Pavel Stehule 
wrote:

>
> yes - the another little bit unclean in EXPLAIN is number of workers. If I
> understand to the behave, the query is processed by two processes if
> workers in the explain is one.
>
>
You are right, and I think that is the current working model of the Gather
node, which seems okay.  I think the more serious thing here is that there
is a possibility that Explain Analyze can show the number of workers as
more than the actual number of workers working for the Gather node.  We
have already discussed that Explain Analyze should show the actual number
of workers used in query execution; the patch for the same is still
pending.


With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com


Re: [HACKERS] Parallel Seq Scan

2015-11-13 Thread Amit Kapila
On Fri, Nov 13, 2015 at 6:17 PM, Thom Brown  wrote:
>
> >
> > The number of shared buffers hit could be different across different
runs
> > because the read sequence of parallel workers can't be guaranteed, also
> > I don't think same is even guaranteed for Seq Scan node, the other
> > operations
> > in parallel could lead to different number, however the actual problem
was
> > that in one of the plans shown by you [1], the Buffers hit at Gather
node
> > (175288) is lesser than the Buffers hit at Parallel Seq Scan node
(241201).
> > Do you still (after applying above patch) see that Gather node is
showing
> > lesser hit buffers than Parallel Seq Scan node?
>
> Hmm... that's odd, I'm not seeing the problem now, so maybe I'm mistaken
there.
>

Thanks for confirming the same.

> >
> >> or the actual time taken.
> >>
> >
> > Exactly what time you are referring here, Execution Time or actual time
> > shown on Parallel Seq Scan node and what problem do you see with
> > the reported time?
>
> I'm referring to the Parallel Seq Scan actual time, showing
> "379.407..1141.437" with 1 worker, but the total execution time shows
> 403.045.  If one worker is taking over a second, how come the whole
> query was less than half a second?
>

Yeah, this could be possible due to the way time is currently accumulated;
see the mail I sent just before this one.  I think we might need to do
something, else it could be confusing for users.


With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com


Re: [HACKERS] Parallel Seq Scan

2015-11-12 Thread Robert Haas
On Thu, Nov 12, 2015 at 10:39 PM, Amit Kapila  wrote:
> The number of shared buffers hit could be different across different runs
> because the read sequence of parallel workers can't be guaranteed, also
> I don't think same is even guaranteed for Seq Scan node,

The number of hits could be different.  However, it seems like any
sequential scan, parallel or not, should have a number of accesses
(hit + read) equal to the size of the relation.  Not sure if that's
what is happening here.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] Parallel Seq Scan

2015-11-12 Thread Amit Kapila
On Thu, Nov 12, 2015 at 9:05 PM, Thom Brown  wrote:
>
> On 12 November 2015 at 15:23, Amit Kapila  wrote:
> > On Wed, Nov 11, 2015 at 11:29 PM, Pavel Stehule 
> > wrote:
> >>
> >> Hi
> >>
> >> I have a first query
> >>
> >> I looked on EXPLAIN ANALYZE output and the numbers of filtered rows are
> >> different
> >>
> >
> > Thanks for the report.  The reason for this problem is that
instrumentation
> > information from workers is getting aggregated multiple times.  In
> > ExecShutdownGatherWorkers(), we call ExecParallelFinish where it
> > will wait for workers to finish and then accumulate stats from workers.
> > Now ExecShutdownGatherWorkers() could be called multiple times
> > (once we read all tuples from workers, at end of node) and it should be
> > ensured that repeated calls should not try to redo the work done by
first
> > call.
> > The same is ensured for tuplequeues, but not for parallel executor info.
> > I think we can safely assume that we need to call ExecParallelFinish()
only
> > when there are workers started by the Gathers node, so on those lines
> > attached patch should fix the problem.
>
> That fixes the count issue for me, although not the number of buffers
> hit,
>

The number of shared buffers hit could be different across different runs
because the read sequence of the parallel workers can't be guaranteed; I
don't think the same is even guaranteed for a Seq Scan node, as other
operations running in parallel could lead to a different number.  However,
the actual problem was that in one of the plans shown by you [1], the
buffers hit at the Gather node (175288) is less than the buffers hit at
the Parallel Seq Scan node (241201).  Do you still (after applying the
above patch) see the Gather node showing fewer hit buffers than the
Parallel Seq Scan node?

[1]
 # explain (analyse, buffers, timing, verbose, costs) select count(*)
from js where content->'tags'->>'title' like '%design%';
                                                        QUERY PLAN
--------------------------------------------------------------------------------------------------------------------
 Aggregate  (cost=132489.34..132489.35 rows=1 width=0) (actual
time=382.987..382.987 rows=1 loops=1)
   Output: count(*)
   Buffers: shared hit=175288
   ->  Gather  (cost=1000.00..132488.34 rows=401 width=0) (actual
time=382.983..382.983 rows=0 loops=1)
 Output: content
 Number of Workers: 1
 Buffers: shared hit=175288
 ->  Parallel Seq Scan on public.js  (cost=0.00..131448.24
rows=401 width=0) (actual time=379.407..1141.437 rows=0 loops=1)
   Output: content
   Filter: (((js.content -> 'tags'::text) ->>
'title'::text) ~~ '%design%'::text)
   Rows Removed by Filter: 1724810
   Buffers: shared hit=241201
 Planning time: 0.104 ms
 Execution time: 403.045 ms
(14 rows)

Time: 403.596 ms

> or the actual time taken.
>

Exactly which time are you referring to here, the Execution Time or the
actual time shown on the Parallel Seq Scan node, and what problem do you
see with the reported time?


With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com


Re: [HACKERS] Parallel Seq Scan

2015-11-12 Thread Amit Kapila
On Wed, Nov 11, 2015 at 11:29 PM, Pavel Stehule 
wrote:
>
> Hi
>
> I have a first query
>
> I looked on EXPLAIN ANALYZE output and the numbers of filtered rows are
different
>

Thanks for the report.  The reason for this problem is that instrumentation
information from workers is getting aggregated multiple times.  In
ExecShutdownGatherWorkers(), we call ExecParallelFinish where it
will wait for workers to finish and then accumulate stats from workers.
Now ExecShutdownGatherWorkers() could be called multiple times
(once we read all tuples from workers, at end of node) and it should be
ensured that repeated calls should not try to redo the work done by first
call.
The same is ensured for tuplequeues, but not for parallel executor info.
I think we can safely assume that we need to call ExecParallelFinish() only
when there are workers started by the Gather node, so on those lines
attached patch should fix the problem.
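To illustrate the idea, here is a simplified model (the struct and field
names are invented for illustration and are not the attached patch or the
real executor structures): worker statistics must be folded into the leader
exactly once, with repeated shutdown calls becoming no-ops.

```c
/*
 * Toy model of an idempotent shutdown routine: the first call
 * accumulates the workers' stats; any later call does nothing.
 * GatherModel and its fields are hypothetical, for illustration only.
 */
typedef struct GatherModel
{
    int  nworkers;       /* > 0 while workers are still attached */
    long worker_tuples;  /* stats reported by the workers */
    long leader_tuples;  /* leader-side accumulated total */
} GatherModel;

static void
shutdown_workers(GatherModel *node)
{
    /* Guard: nothing to do if the workers were already finished. */
    if (node->nworkers == 0)
        return;

    /* Accumulate worker stats exactly once... */
    node->leader_tuples += node->worker_tuples;

    /* ...then mark workers as gone so a second call is a no-op. */
    node->nworkers = 0;
}
```

Calling the routine twice then leaves the counters unchanged after the
first call, which is the behavior the patch is after.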



With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com


fix_agg_instr_issue_v1.patch
Description: Binary data



Re: [HACKERS] Parallel Seq Scan

2015-11-12 Thread Thom Brown
On 12 November 2015 at 15:23, Amit Kapila  wrote:
> On Wed, Nov 11, 2015 at 11:29 PM, Pavel Stehule 
> wrote:
>>
>> Hi
>>
>> I have a first query
>>
>> I looked on EXPLAIN ANALYZE output and the numbers of filtered rows are
>> different
>>
>
> Thanks for the report.  The reason for this problem is that instrumentation
> information from workers is getting aggregated multiple times.  In
> ExecShutdownGatherWorkers(), we call ExecParallelFinish where it
> will wait for workers to finish and then accumulate stats from workers.
> Now ExecShutdownGatherWorkers() could be called multiple times
> (once we read all tuples from workers, at end of node) and it should be
> ensured that repeated calls should not try to redo the work done by first
> call.
> The same is ensured for tuplequeues, but not for parallel executor info.
> I think we can safely assume that we need to call ExecParallelFinish() only
> when there are workers started by the Gathers node, so on those lines
> attached patch should fix the problem.

That fixes the count issue for me, although not the number of buffers
hit, or the actual time taken.

Thom




Re: [HACKERS] Parallel Seq Scan

2015-11-11 Thread Pavel Stehule
2015-11-11 19:03 GMT+01:00 Thom Brown :

> On 11 November 2015 at 17:59, Pavel Stehule 
> wrote:
> > Hi
> >
> > I have a first query
> >
> > I looked on EXPLAIN ANALYZE output and the numbers of filtered rows are
> > different
> >
> > postgres=# set max_parallel_degree to 4;
> > SET
> > Time: 0.717 ms
> > postgres=# EXPLAIN ANALYZE select count(*) from xxx where a % 10 = 0;
> >
> ┌───┐
> > │  QUERY PLAN
> > │
> >
> ╞═══╡
> > │ Aggregate  (cost=9282.50..9282.51 rows=1 width=0) (actual
> > time=142.541..142.541 rows=1 loops=1)   │
> > │   ->  Gather  (cost=1000.00..9270.00 rows=5000 width=0) (actual
> > time=0.633..130.926 rows=100000 loops=1)  │
> > │ Number of Workers: 2
> > │
> > │ ->  Parallel Seq Scan on xxx  (cost=0.00..7770.00 rows=5000
> > width=0) (actual time=0.052..411.303 rows=169631 loops=1) │
> > │   Filter: ((a % 10) = 0)
> > │
> > │   Rows Removed by Filter: 1526399
> > │
> > │ Planning time: 0.167 ms
> > │
> > │ Execution time: 144.519 ms
> > │
> >
> └───┘
> > (8 rows)
> >
> > Time: 145.374 ms
> > postgres=# set max_parallel_degree to 1;
> > SET
> > Time: 0.706 ms
> > postgres=# EXPLAIN ANALYZE select count(*) from xxx where a % 10 = 0;
> >
> ┌┐
> > │   QUERY PLAN
> > │
> >
> ╞╡
> > │ Aggregate  (cost=14462.50..14462.51 rows=1 width=0) (actual
> > time=163.355..163.355 rows=1 loops=1)  │
> > │   ->  Gather  (cost=1000.00..14450.00 rows=5000 width=0) (actual
> > time=0.485..152.827 rows=100000 loops=1)  │
> > │ Number of Workers: 1
> > │
> > │ ->  Parallel Seq Scan on xxx  (cost=0.00..12950.00 rows=5000
> > width=0) (actual time=0.043..309.740 rows=145364 loops=1) │
> > │   Filter: ((a % 10) = 0)
> > │
> > │   Rows Removed by Filter: 1308394
> > │
> > │ Planning time: 0.129 ms
> > │
> > │ Execution time: 165.102 ms
> > │
> >
> └┘
> > (8 rows)
> >
> > Rows removed by filter: 1308394 X 1526399. Is it expected?
>
> Yeah, I noticed the same thing, but more pronounced:
>
> With set max_parallel_degree = 4:
>
> # explain (analyse, buffers, timing, verbose, costs) select count(*)
> from js where content->'tags'->0->>'term' like 'design%' or
> content->'tags'->0->>'term' like 'web%';
>
> QUERY PLAN
>
> -
>  Aggregate  (cost=49575.51..49575.52 rows=1 width=0) (actual
> time=744.267..744.267 rows=1 loops=1)
>Output: count(*)
>Buffers: shared hit=175423
>->  Gather  (cost=1000.00..49544.27 rows=12496 width=0) (actual
> time=0.351..731.662 rows=55151 loops=1)
>  Output: content
>  Number of Workers: 4
>  Buffers: shared hit=175423
>  ->  Parallel Seq Scan on public.js  (cost=0.00..47294.67
> rows=12496 width=0) (actual time=0.030..5912.118 rows=96062 loops=1)
>Output: content
>                Filter: (((((js.content -> 'tags'::text) -> 0) ->>
> 'term'::text) ~~ 'design%'::text) OR ((((js.content -> 'tags'::text)
> -> 0) ->> 'term'::text) ~~ 'web%'::text))
>Rows Removed by Filter: 2085546
>Buffers: shared hit=305123
>  Planning time: 0.123 ms
>  Execution time: 759.313 ms
> (14 rows)
>
>
> With set max_parallel_degree = 0:
>
> # explain (analyse, buffers, timing, verbose, costs) select count(*)
> from js where content->'tags'->0->>'term' like 'design%' or
> content->'tags'->0->>'term' like 'web%';
>
>  QUERY PLAN
>
> ---
>  Aggregate  (cost=212857.25..212857.26 rows=1 width=0) (actual
> time=1235.082..1235.082 rows=1 loops=1)
>Output: count(*)
>Buffers: shared hit=175243
>->  Seq Scan on public.js  (cost=0.00..212826.01 rows=12496
> width=0) (actual time=0.019..1228.515 rows=55151 

Re: [HACKERS] Parallel Seq Scan

2015-11-11 Thread Pavel Stehule
Hi

I have a first query

I looked on EXPLAIN ANALYZE output and the numbers of filtered rows are
different

postgres=# set max_parallel_degree to 4;
SET
Time: 0.717 ms
postgres=# EXPLAIN ANALYZE select count(*) from xxx where a % 10 = 0;
                                                          QUERY PLAN
═══════════════════════════════════════════════════════════════════════════════════════════════════════════════════════
 Aggregate  (cost=9282.50..9282.51 rows=1 width=0) (actual time=142.541..142.541 rows=1 loops=1)
   ->  Gather  (cost=1000.00..9270.00 rows=5000 width=0) (actual time=0.633..130.926 rows=100000 loops=1)
         Number of Workers: 2
         ->  Parallel Seq Scan on xxx  (cost=0.00..7770.00 rows=5000 width=0) (actual time=0.052..411.303 rows=169631 loops=1)
               Filter: ((a % 10) = 0)
               Rows Removed by Filter: 1526399
 Planning time: 0.167 ms
 Execution time: 144.519 ms
(8 rows)

Time: 145.374 ms
postgres=# set max_parallel_degree to 1;
SET
Time: 0.706 ms
postgres=# EXPLAIN ANALYZE select count(*) from xxx where a % 10 = 0;
                                                          QUERY PLAN
═══════════════════════════════════════════════════════════════════════════════════════════════════════════════════════
 Aggregate  (cost=14462.50..14462.51 rows=1 width=0) (actual time=163.355..163.355 rows=1 loops=1)
   ->  Gather  (cost=1000.00..14450.00 rows=5000 width=0) (actual time=0.485..152.827 rows=100000 loops=1)
         Number of Workers: 1
         ->  Parallel Seq Scan on xxx  (cost=0.00..12950.00 rows=5000 width=0) (actual time=0.043..309.740 rows=145364 loops=1)
               Filter: ((a % 10) = 0)
               Rows Removed by Filter: 1308394
 Planning time: 0.129 ms
 Execution time: 165.102 ms
(8 rows)

Rows removed by filter: 1308394 X 1526399. Is it expected?


Re: [HACKERS] Parallel Seq Scan

2015-11-11 Thread Thom Brown
On 11 November 2015 at 17:59, Pavel Stehule  wrote:
> Hi
>
> I have a first query
>
> I looked on EXPLAIN ANALYZE output and the numbers of filtered rows are
> different
>
> postgres=# set max_parallel_degree to 4;
> SET
> Time: 0.717 ms
> postgres=# EXPLAIN ANALYZE select count(*) from xxx where a % 10 = 0;
> ┌───┐
> │  QUERY PLAN
> │
> ╞═══╡
> │ Aggregate  (cost=9282.50..9282.51 rows=1 width=0) (actual
> time=142.541..142.541 rows=1 loops=1)   │
> │   ->  Gather  (cost=1000.00..9270.00 rows=5000 width=0) (actual
> time=0.633..130.926 rows=100000 loops=1)  │
> │ Number of Workers: 2
> │
> │ ->  Parallel Seq Scan on xxx  (cost=0.00..7770.00 rows=5000
> width=0) (actual time=0.052..411.303 rows=169631 loops=1) │
> │   Filter: ((a % 10) = 0)
> │
> │   Rows Removed by Filter: 1526399
> │
> │ Planning time: 0.167 ms
> │
> │ Execution time: 144.519 ms
> │
> └───┘
> (8 rows)
>
> Time: 145.374 ms
> postgres=# set max_parallel_degree to 1;
> SET
> Time: 0.706 ms
> postgres=# EXPLAIN ANALYZE select count(*) from xxx where a % 10 = 0;
> ┌┐
> │   QUERY PLAN
> │
> ╞╡
> │ Aggregate  (cost=14462.50..14462.51 rows=1 width=0) (actual
> time=163.355..163.355 rows=1 loops=1)  │
> │   ->  Gather  (cost=1000.00..14450.00 rows=5000 width=0) (actual
> time=0.485..152.827 rows=100000 loops=1)  │
> │ Number of Workers: 1
> │
> │ ->  Parallel Seq Scan on xxx  (cost=0.00..12950.00 rows=5000
> width=0) (actual time=0.043..309.740 rows=145364 loops=1) │
> │   Filter: ((a % 10) = 0)
> │
> │   Rows Removed by Filter: 1308394
> │
> │ Planning time: 0.129 ms
> │
> │ Execution time: 165.102 ms
> │
> └┘
> (8 rows)
>
> Rows removed by filter: 1308394 X 1526399. Is it expected?

Yeah, I noticed the same thing, but more pronounced:

With set max_parallel_degree = 4:

# explain (analyse, buffers, timing, verbose, costs) select count(*)
from js where content->'tags'->0->>'term' like 'design%' or
content->'tags'->0->>'term' like 'web%';

QUERY PLAN
-
 Aggregate  (cost=49575.51..49575.52 rows=1 width=0) (actual
time=744.267..744.267 rows=1 loops=1)
   Output: count(*)
   Buffers: shared hit=175423
   ->  Gather  (cost=1000.00..49544.27 rows=12496 width=0) (actual
time=0.351..731.662 rows=55151 loops=1)
 Output: content
 Number of Workers: 4
 Buffers: shared hit=175423
 ->  Parallel Seq Scan on public.js  (cost=0.00..47294.67
rows=12496 width=0) (actual time=0.030..5912.118 rows=96062 loops=1)
   Output: content
               Filter: (((((js.content -> 'tags'::text) -> 0) ->>
'term'::text) ~~ 'design%'::text) OR ((((js.content -> 'tags'::text)
-> 0) ->> 'term'::text) ~~ 'web%'::text))
   Rows Removed by Filter: 2085546
   Buffers: shared hit=305123
 Planning time: 0.123 ms
 Execution time: 759.313 ms
(14 rows)


With set max_parallel_degree = 0:

# explain (analyse, buffers, timing, verbose, costs) select count(*)
from js where content->'tags'->0->>'term' like 'design%' or
content->'tags'->0->>'term' like 'web%';

 QUERY PLAN
---
 Aggregate  (cost=212857.25..212857.26 rows=1 width=0) (actual
time=1235.082..1235.082 rows=1 loops=1)
   Output: count(*)
   Buffers: shared hit=175243
   ->  Seq Scan on public.js  (cost=0.00..212826.01 rows=12496
width=0) (actual time=0.019..1228.515 rows=55151 loops=1)
 Output: content
         Filter: (((((js.content -> 'tags'::text) -> 0) ->>
'term'::text) ~~ 'design%'::text) OR ((((js.content -> 'tags'::text)
-> 0) ->> 'term'::text) ~~ 'web%'::text))
 Rows Removed by Filter: 1197822
 Buffers: shared hit=175243
 

Re: [HACKERS] Parallel Seq Scan

2015-11-11 Thread Amit Langote
On Wed, Nov 11, 2015 at 11:53 PM, Robert Haas  wrote:
> For those following along at home, here's a demo:
>
> rhaas=# \timing
> Timing is on.
> rhaas=# select * from pgbench_accounts where filler like '%a%';
>  aid | bid | abalance | filler
> -+-+--+
> (0 rows)
>
> Time: 743.061 ms
> rhaas=# set max_parallel_degree = 4;
> SET
> Time: 0.270 ms
> rhaas=# select * from pgbench_accounts where filler like '%a%';
>  aid | bid | abalance | filler
> -+-+--+
> (0 rows)
>
> Time: 213.412 ms
>
> This is all pretty primitive at this point - there are still lots of
> things that need to be fixed and improved, and it applies to only the
> very simplest cases at present, but, hey, parallel query.  Check it
> out.

Yay! Great work guys!

Thanks,
Amit




Re: [HACKERS] Parallel Seq Scan

2015-11-11 Thread Robert Haas
On Mon, Nov 9, 2015 at 11:15 AM, Amit Kapila  wrote:
> Okay, I have updated the patch to make seq scan node parallel aware.
> To make that happen we need to have parallel_aware flag both in Plan
> as well as Path, so that we can pass that information from Path to Plan.
> I think the right place to copy parallel_aware info from path to
> plan is copy_path_costsize and ideally we should change the name
> of function to something like copy_generic_path_info(), but for
> now I have retained it's original name as it is used at many places,
> let me know if you think we should goahead and change the name
> of function as well.
>
> I have changed Explain as well so that it adds Parallel for Seq Scan if
> SeqScan node is parallel_aware.
>
> I have not integrated it with consider-parallel patch, so that this and
> Partial Seq Scan version of the patch can be compared without much
> difficulity.
>
> Thoughts?

I've committed most of this, except for some planner bits that I
didn't like, and after a bunch of cleanup.  Instead, I committed the
consider-parallel-v2.patch with some additional planner bits to make
up for the ones I removed from your patch.  So, now we have parallel
sequential scan!

For those following along at home, here's a demo:

rhaas=# \timing
Timing is on.
rhaas=# select * from pgbench_accounts where filler like '%a%';
 aid | bid | abalance | filler
-+-+--+
(0 rows)

Time: 743.061 ms
rhaas=# set max_parallel_degree = 4;
SET
Time: 0.270 ms
rhaas=# select * from pgbench_accounts where filler like '%a%';
 aid | bid | abalance | filler
-+-+--+
(0 rows)

Time: 213.412 ms

This is all pretty primitive at this point - there are still lots of
things that need to be fixed and improved, and it applies to only the
very simplest cases at present, but, hey, parallel query.  Check it
out.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] Parallel Seq Scan

2015-11-11 Thread Thom Brown
On 11 November 2015 at 14:53, Robert Haas  wrote:
> On Mon, Nov 9, 2015 at 11:15 AM, Amit Kapila  wrote:
>> Okay, I have updated the patch to make seq scan node parallel aware.
>> To make that happen we need to have parallel_aware flag both in Plan
>> as well as Path, so that we can pass that information from Path to Plan.
>> I think the right place to copy parallel_aware info from path to
>> plan is copy_path_costsize and ideally we should change the name
>> of function to something like copy_generic_path_info(), but for
>> now I have retained it's original name as it is used at many places,
>> let me know if you think we should goahead and change the name
>> of function as well.
>>
>> I have changed Explain as well so that it adds Parallel for Seq Scan if
>> SeqScan node is parallel_aware.
>>
>> I have not integrated it with consider-parallel patch, so that this and
>> Partial Seq Scan version of the patch can be compared without much
>> difficulity.
>>
>> Thoughts?
>
> I've committed most of this, except for some planner bits that I
> didn't like, and after a bunch of cleanup.  Instead, I committed the
> consider-parallel-v2.patch with some additional planner bits to make
> up for the ones I removed from your patch.  So, now we have parallel
> sequential scan!
>
> For those following along at home, here's a demo:
>
> rhaas=# \timing
> Timing is on.
> rhaas=# select * from pgbench_accounts where filler like '%a%';
>  aid | bid | abalance | filler
> -+-+--+
> (0 rows)
>
> Time: 743.061 ms
> rhaas=# set max_parallel_degree = 4;
> SET
> Time: 0.270 ms
> rhaas=# select * from pgbench_accounts where filler like '%a%';
>  aid | bid | abalance | filler
> -+-+--+
> (0 rows)
>
> Time: 213.412 ms
>
> This is all pretty primitive at this point - there are still lots of
> things that need to be fixed and improved, and it applies to only the
> very simplest cases at present, but, hey, parallel query.  Check it
> out.

Congratulations to both you and Amit.  This is a significant landmark
in PostgreSQL feature development.

Thom




Re: [HACKERS] Parallel Seq Scan

2015-11-11 Thread Pavel Stehule
2015-11-11 16:18 GMT+01:00 Thom Brown :

> On 11 November 2015 at 14:53, Robert Haas  wrote:
> > On Mon, Nov 9, 2015 at 11:15 AM, Amit Kapila 
> wrote:
> >> Okay, I have updated the patch to make seq scan node parallel aware.
> >> To make that happen we need to have parallel_aware flag both in Plan
> >> as well as Path, so that we can pass that information from Path to Plan.
> >> I think the right place to copy parallel_aware info from path to
> >> plan is copy_path_costsize and ideally we should change the name
> >> of function to something like copy_generic_path_info(), but for
> >> now I have retained it's original name as it is used at many places,
> >> let me know if you think we should goahead and change the name
> >> of function as well.
> >>
> >> I have changed Explain as well so that it adds Parallel for Seq Scan if
> >> SeqScan node is parallel_aware.
> >>
> >> I have not integrated it with consider-parallel patch, so that this and
> >> Partial Seq Scan version of the patch can be compared without much
> >> difficulity.
> >>
> >> Thoughts?
> >
> > I've committed most of this, except for some planner bits that I
> > didn't like, and after a bunch of cleanup.  Instead, I committed the
> > consider-parallel-v2.patch with some additional planner bits to make
> > up for the ones I removed from your patch.  So, now we have parallel
> > sequential scan!
> >
> > For those following along at home, here's a demo:
> >
> > rhaas=# \timing
> > Timing is on.
> > rhaas=# select * from pgbench_accounts where filler like '%a%';
> >  aid | bid | abalance | filler
> > -+-+--+
> > (0 rows)
> >
> > Time: 743.061 ms
> > rhaas=# set max_parallel_degree = 4;
> > SET
> > Time: 0.270 ms
> > rhaas=# select * from pgbench_accounts where filler like '%a%';
> >  aid | bid | abalance | filler
> > -+-+--+
> > (0 rows)
> >
> > Time: 213.412 ms
> >
> > This is all pretty primitive at this point - there are still lots of
> > things that need to be fixed and improved, and it applies to only the
> > very simplest cases at present, but, hey, parallel query.  Check it
> > out.
>
> Congratulations to both you and Amit.  This is a significant landmark
> in PostgreSQL feature development.
>

+1

Pavel


>
> Thom
>
>
> --
> Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
> To make changes to your subscription:
> http://www.postgresql.org/mailpref/pgsql-hackers
>


Re: [HACKERS] Parallel Seq Scan

2015-11-11 Thread Amit Langote
On 2015/11/12 4:26, Robert Haas wrote:
> On Wed, Nov 11, 2015 at 12:59 PM, Pavel Stehule  
> wrote:
>> I have a first query
>>
>> I looked on EXPLAIN ANALYZE output and the numbers of filtered rows are
>> different
> 
> Hmm, I see I was right about people finding more bugs once this was
> committed.  That didn't take long.

I encountered one more odd behavior:

postgres=# EXPLAIN ANALYZE SELECT abalance FROM pgbench_accounts WHERE aid
= 23466;
QUERY PLAN

---
 Gather  (cost=1000.00..65207.88 rows=1 width=4) (actual
time=17450.595..17451.151 rows=1 loops=1)
   Number of Workers: 4
   ->  Parallel Seq Scan on pgbench_accounts  (cost=0.00..64207.78 rows=1
width=4) (actual time=55.934..157001.134 rows=2 loops=1)
 Filter: (aid = 23466)
 Rows Removed by Filter: 18047484
 Planning time: 0.198 ms
 Execution time: 17453.565 ms
(7 rows)


The #rows removed here is almost twice the number of rows in the table
(10m). Also, the #rows selected shown is 2 for Parallel Seq Scan whereas
only 1 row is selected.

Thanks,
Amit





Re: [HACKERS] Parallel Seq Scan

2015-11-11 Thread Thom Brown
On 11 November 2015 at 19:51, Thom Brown  wrote:
> On 11 November 2015 at 19:26, Robert Haas  wrote:
>> On Wed, Nov 11, 2015 at 12:59 PM, Pavel Stehule  
>> wrote:
>>> I have a first query
>>>
>>> I looked on EXPLAIN ANALYZE output and the numbers of filtered rows are
>>> different
>>
>> Hmm, I see I was right about people finding more bugs once this was
>> committed.  That didn't take long.
>>
>> There's supposed to be code to handle this - see the
>> SharedPlanStateInstrumentation stuff in execParallel.c - but it's
>> evidently a few bricks shy of a load.
>> ExecParallelReportInstrumentation is supposed to transfer the counts
>> from each worker to the DSM:
>>
>> ps_instrument = &instrumentation->ps_instrument[i];
>> SpinLockAcquire(&ps_instrument->mutex);
>> InstrAggNode(&ps_instrument->instr, planstate->instrument);
>> SpinLockRelease(&ps_instrument->mutex);
>>
>> And ExecParallelRetrieveInstrumentation is supposed to slurp those
>> counts back into the leader's PlanState objects:
>>
>> /* No need to acquire the spinlock here; workers have exited 
>> already. */
>> ps_instrument = &instrumentation->ps_instrument[i];
>> InstrAggNode(planstate->instrument, &ps_instrument->instr);
>>
>> This might be a race condition, or it might be just wrong logic.
>> Could you test what happens if you insert something like a 1-second
>> sleep in ExecParallelFinish just after the call to
>> WaitForParallelWorkersToFinish()?  If that makes the results
>> consistent, this is a race.  If it doesn't, something else is wrong:
>> then it would be useful to know whether the workers are actually
>> calling ExecParallelReportInstrumentation, and whether the leader is
>> actually calling ExecParallelRetrieveInstrumentation, and if so
>> whether they are doing it for the correct set of nodes.
>
> Hmm.. I made the change, but clearly it's not sleeping properly with
> my change (I'm expecting a total runtime in excess of 1 second):
>
> max_parallel_degree = 4:
>
> # explain (analyse, buffers, timing, verbose, costs) select count(*)
> from js where content->'tags'->0->>'term' like 'design%' or
> content->'tags'->0->>'term' like 'web%';
>
> QUERY PLAN
> -
>  Aggregate  (cost=49578.18..49578.19 rows=1 width=0) (actual
> time=797.518..797.518 rows=1 loops=1)
>Output: count(*)
>Buffers: shared hit=174883 read=540
>->  Gather  (cost=1000.00..49546.93 rows=12500 width=0) (actual
> time=0.245..784.959 rows=55151 loops=1)
>  Output: content
>  Number of Workers: 4
>  Buffers: shared hit=174883 read=540
>  ->  Parallel Seq Scan on public.js  (cost=0.00..47296.93
> rows=12500 width=0) (actual time=0.019..6153.679 rows=94503 loops=1)
>Output: content
>                Filter: (((((js.content -> 'tags'::text) -> 0) ->>
> 'term'::text) ~~ 'design%'::text) OR ((((js.content -> 'tags'::text)
> -> 0) ->> 'term'::text) ~~ 'web%'::text))
>Rows Removed by Filter: 2051330
>Buffers: shared hit=299224 read=907
>  Planning time: 0.086 ms
>  Execution time: 803.026 ms
>
>
> max_parallel_degree = 0:
>
> # explain (analyse, buffers, timing, verbose, costs) select count(*)
> from js where content->'tags'->0->>'term' like 'design%' or
> content->'tags'->0->>'term' like 'web%';
>
>  QUERY PLAN
> ---
>  Aggregate  (cost=212867.43..212867.44 rows=1 width=0) (actual
> time=1278.717..1278.717 rows=1 loops=1)
>Output: count(*)
>Buffers: shared hit=174671 read=572
>->  Seq Scan on public.js  (cost=0.00..212836.18 rows=12500
> width=0) (actual time=0.018..1272.030 rows=55151 loops=1)
>  Output: content
>          Filter: (((((js.content -> 'tags'::text) -> 0) ->>
> 'term'::text) ~~ 'design%'::text) OR ((((js.content -> 'tags'::text)
> -> 0) ->> 'term'::text) ~~ 'web%'::text))
>  Rows Removed by Filter: 1197822
>  Buffers: shared hit=174671 read=572
>  Planning time: 0.064 ms
>  Execution time: 1278.741 ms
> (10 rows)
>
> Time: 1279.145 ms
>
>
> I did, however, notice that repeated runs of the query with
> max_parallel_degree = 4 yields different counts of rows removed by
> filter:
>
> Run 1: 2051330
> Run 2: 2081252
> Run 3: 2065112
> Run 4: 2022045
> Run 5: 2025384
> Run 6: 2059360
> Run 7: 2079620
> Run 8: 2058541

Here's another oddity, with max_parallel_degree = 1:

# explain (analyse, buffers, timing, verbose, costs) select count(*)
from js where content->'tags'->>'title' like '%design%';
                                                        QUERY PLAN

Re: [HACKERS] Parallel Seq Scan

2015-11-11 Thread Pavel Stehule
2015-11-11 20:26 GMT+01:00 Robert Haas :

> On Wed, Nov 11, 2015 at 12:59 PM, Pavel Stehule 
> wrote:
> > I have a first query
> >
> > I looked on EXPLAIN ANALYZE output and the numbers of filtered rows are
> > different
>
> Hmm, I see I was right about people finding more bugs once this was
> committed.  That didn't take long.
>

It is a super feature; nobody can wait to check it :). Many more people
can now give feedback and run tests.


>
> There's supposed to be code to handle this - see the
> SharedPlanStateInstrumentation stuff in execParallel.c - but it's
> evidently a few bricks shy of a load.
> ExecParallelReportInstrumentation is supposed to transfer the counts
> from each worker to the DSM:
>
> ps_instrument = &instrumentation->ps_instrument[i];
> SpinLockAcquire(&ps_instrument->mutex);
> InstrAggNode(&ps_instrument->instr, planstate->instrument);
> SpinLockRelease(&ps_instrument->mutex);
>
> And ExecParallelRetrieveInstrumentation is supposed to slurp those
> counts back into the leader's PlanState objects:
>
> /* No need to acquire the spinlock here; workers have exited
> already. */
> ps_instrument = &instrumentation->ps_instrument[i];
> InstrAggNode(planstate->instrument, &ps_instrument->instr);
>
> This might be a race condition, or it might be just wrong logic.
> Could you test what happens if you insert something like a 1-second
> sleep in ExecParallelFinish just after the call to
> WaitForParallelWorkersToFinish()?  If that makes the results
> consistent, this is a race.  If it doesn't, something else is wrong:
> then it would be useful to know whether the workers are actually
> calling ExecParallelReportInstrumentation, and whether the leader is
> actually calling ExecParallelRetrieveInstrumentation, and if so
> whether they are doing it for the correct set of nodes.
>

I tried pg_usleep(1000000L) there, without success:

postgres=# EXPLAIN ANALYZE select count(*) from xxx where a % 10 = 0;
                                                          QUERY PLAN
═══════════════════════════════════════════════════════════════════════════════════════════════════════════════════════
 Aggregate  (cost=9282.50..9282.51 rows=1 width=0) (actual time=154.535..154.535 rows=1 loops=1)
   ->  Gather  (cost=1000.00..9270.00 rows=5000 width=0) (actual time=0.675..142.320 rows=100000 loops=1)
         Number of Workers: 2
         ->  Parallel Seq Scan on xxx  (cost=0.00..7770.00 rows=5000 width=0) (actual time=0.075..445.999 rows=168927 loops=1)
               Filter: ((a % 10) = 0)
               Rows Removed by Filter: 1520549
 Planning time: 0.117 ms
 Execution time: 1155.505 ms
(8 rows)

expected

postgres=# EXPLAIN ANALYZE select count(*) from xxx where a % 10 = 0;
                                                       QUERY PLAN
═══════════════════════════════════════════════════════════════════════════════════════════════════════════════
 Aggregate  (cost=19437.50..19437.51 rows=1 width=0) (actual time=171.233..171.233 rows=1 loops=1)
   ->  Seq Scan on xxx  (cost=0.00..19425.00 rows=5000 width=0) (actual time=0.187..162.627 rows=100000 loops=1)
         Filter: ((a % 10) = 0)
         Rows Removed by Filter: 900000
 Planning time: 0.119 ms
 Execution time: 171.322 ms
(6 rows)

The test is based on table xxx:

create table xxx(a int);
insert into xxx select generate_series(1,1000000);


>
> --
> Robert Haas
> EnterpriseDB: http://www.enterprisedb.com
> The Enterprise PostgreSQL Company
>


Re: [HACKERS] Parallel Seq Scan

2015-11-11 Thread Thom Brown
On 11 November 2015 at 19:26, Robert Haas  wrote:
> On Wed, Nov 11, 2015 at 12:59 PM, Pavel Stehule  
> wrote:
>> I have a first query
>>
>> I looked on EXPLAIN ANALYZE output and the numbers of filtered rows are
>> different
>
> Hmm, I see I was right about people finding more bugs once this was
> committed.  That didn't take long.
>
> There's supposed to be code to handle this - see the
> SharedPlanStateInstrumentation stuff in execParallel.c - but it's
> evidently a few bricks shy of a load.
> ExecParallelReportInstrumentation is supposed to transfer the counts
> from each worker to the DSM:
>
> ps_instrument = &instrumentation->ps_instrument[i];
> SpinLockAcquire(&ps_instrument->mutex);
> InstrAggNode(&ps_instrument->instr, planstate->instrument);
> SpinLockRelease(&ps_instrument->mutex);
>
> And ExecParallelRetrieveInstrumentation is supposed to slurp those
> counts back into the leader's PlanState objects:
>
> /* No need to acquire the spinlock here; workers have exited already. 
> */
> ps_instrument = &instrumentation->ps_instrument[i];
> InstrAggNode(planstate->instrument, &ps_instrument->instr);
>
> This might be a race condition, or it might be just wrong logic.
> Could you test what happens if you insert something like a 1-second
> sleep in ExecParallelFinish just after the call to
> WaitForParallelWorkersToFinish()?  If that makes the results
> consistent, this is a race.  If it doesn't, something else is wrong:
> then it would be useful to know whether the workers are actually
> calling ExecParallelReportInstrumentation, and whether the leader is
> actually calling ExecParallelRetrieveInstrumentation, and if so
> whether they are doing it for the correct set of nodes.

Hmm.. I made the change, but clearly it's not sleeping properly with
my change (I'm expecting a total runtime in excess of 1 second):

max_parallel_degree = 4:

# explain (analyse, buffers, timing, verbose, costs) select count(*)
from js where content->'tags'->0->>'term' like 'design%' or
content->'tags'->0->>'term' like 'web%';

QUERY PLAN
-
 Aggregate  (cost=49578.18..49578.19 rows=1 width=0) (actual
time=797.518..797.518 rows=1 loops=1)
   Output: count(*)
   Buffers: shared hit=174883 read=540
   ->  Gather  (cost=1000.00..49546.93 rows=12500 width=0) (actual
time=0.245..784.959 rows=55151 loops=1)
 Output: content
 Number of Workers: 4
 Buffers: shared hit=174883 read=540
 ->  Parallel Seq Scan on public.js  (cost=0.00..47296.93
rows=12500 width=0) (actual time=0.019..6153.679 rows=94503 loops=1)
   Output: content
               Filter: (((((js.content -> 'tags'::text) -> 0) ->>
'term'::text) ~~ 'design%'::text) OR ((((js.content -> 'tags'::text)
-> 0) ->> 'term'::text) ~~ 'web%'::text))
   Rows Removed by Filter: 2051330
   Buffers: shared hit=299224 read=907
 Planning time: 0.086 ms
 Execution time: 803.026 ms


max_parallel_degree = 0:

# explain (analyse, buffers, timing, verbose, costs) select count(*)
from js where content->'tags'->0->>'term' like 'design%' or
content->'tags'->0->>'term' like 'web%';

 QUERY PLAN
---
 Aggregate  (cost=212867.43..212867.44 rows=1 width=0) (actual
time=1278.717..1278.717 rows=1 loops=1)
   Output: count(*)
   Buffers: shared hit=174671 read=572
   ->  Seq Scan on public.js  (cost=0.00..212836.18 rows=12500
width=0) (actual time=0.018..1272.030 rows=55151 loops=1)
 Output: content
  Filter: (((((js.content -> 'tags'::text) -> 0) ->> 'term'::text) ~~ 'design%'::text) OR ((((js.content -> 'tags'::text) -> 0) ->> 'term'::text) ~~ 'web%'::text))
 Rows Removed by Filter: 1197822
 Buffers: shared hit=174671 read=572
 Planning time: 0.064 ms
 Execution time: 1278.741 ms
(10 rows)

Time: 1279.145 ms


I did, however, notice that repeated runs of the query with
max_parallel_degree = 4 yields different counts of rows removed by
filter:

Run 1: 2051330
Run 2: 2081252
Run 3: 2065112
Run 4: 2022045
Run 5: 2025384
Run 6: 2059360
Run 7: 2079620
Run 8: 2058541

-- 
Thom


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Parallel Seq Scan

2015-11-11 Thread Robert Haas
On Wed, Nov 11, 2015 at 12:59 PM, Pavel Stehule  wrote:
> I have a first query
>
> I looked on EXPLAIN ANALYZE output and the numbers of filtered rows are
> different

Hmm, I see I was right about people finding more bugs once this was
committed.  That didn't take long.

There's supposed to be code to handle this - see the
SharedPlanStateInstrumentation stuff in execParallel.c - but it's
evidently a few bricks shy of a load.
ExecParallelReportInstrumentation is supposed to transfer the counts
from each worker to the DSM:

ps_instrument = &instrumentation->ps_instrument[i];
SpinLockAcquire(&ps_instrument->mutex);
InstrAggNode(&ps_instrument->instr, planstate->instrument);
SpinLockRelease(&ps_instrument->mutex);

And ExecParallelRetrieveInstrumentation is supposed to slurp those
counts back into the leader's PlanState objects:

/* No need to acquire the spinlock here; workers have exited already. */
ps_instrument = &instrumentation->ps_instrument[i];
InstrAggNode(planstate->instrument, &ps_instrument->instr);

This might be a race condition, or it might be just wrong logic.
Could you test what happens if you insert something like a 1-second
sleep in ExecParallelFinish just after the call to
WaitForParallelWorkersToFinish()?  If that makes the results
consistent, this is a race.  If it doesn't, something else is wrong:
then it would be useful to know whether the workers are actually
calling ExecParallelReportInstrumentation, and whether the leader is
actually calling ExecParallelRetrieveInstrumentation, and if so
whether they are doing it for the correct set of nodes.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] Parallel Seq Scan

2015-11-09 Thread Amit Kapila
On Fri, Nov 6, 2015 at 10:13 AM, Robert Haas  wrote:
>
> On Thu, Nov 5, 2015 at 10:49 PM, Amit Kapila 
wrote:
> > On Thu, Nov 5, 2015 at 11:54 PM, Robert Haas 
wrote:
> >> I was thinking about this idea:
> >>
> >> 1. Add a parallel_aware flag to each plan.
> >
> > Okay, so shall we add it in generic Plan node or to specific plan nodes
> > like SeqScan, IndexScan, etc.  To me, it appears that parallelism is
> > a node specific property, so we should add it to specific nodes and
> > for now as we are parallelising seq scan, so we can add this flag in
> > SeqScan node.  What do you say?
>
> I think it should go in the Plan node itself.  Parallel Append is
> going to need a way to test whether a node is parallel-aware, and
> there's nothing simpler than if (plan->parallel_aware).  That makes
> life simple for EXPLAIN, too.
>

Okay, I have updated the patch to make seq scan node parallel aware.
To make that happen we need to have parallel_aware flag both in Plan
as well as Path, so that we can pass that information from Path to Plan.
I think the right place to copy parallel_aware info from path to
plan is copy_path_costsize and ideally we should change the name
of function to something like copy_generic_path_info(), but for
now I have retained its original name as it is used in many places;
let me know if you think we should go ahead and change the name
of the function as well.

I have changed Explain as well so that it adds Parallel for Seq Scan if
SeqScan node is parallel_aware.

I have not integrated it with consider-parallel patch, so that this and
Partial Seq Scan version of the patch can be compared without much
difficulty.

Thoughts?

With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com


parallel_seqscan_partialseqscan_v25.patch
Description: Binary data



Re: [HACKERS] Parallel Seq Scan

2015-11-08 Thread Amit Kapila
On Sat, Nov 7, 2015 at 4:11 AM, Robert Haas  wrote:
>
> On Fri, Oct 23, 2015 at 9:22 PM, Amit Kapila 
wrote:
> > The base rel's consider_parallel flag won't be percolated to childrels,
so
> > even
> > if we mark base rel as parallel capable, while generating the path it
won't
> > be considered.  I think we need to find a way to pass on that
information if
> > we want to follow this way.
>
> Fixed in the attached version.  I added a max_parallel_degree check,
> too, per your suggestion.
>

+ else if (IsA(node, SubPlan) || IsA(node, SubLink) ||
+ IsA(node, AlternativeSubPlan) || IsA(node, Param))
+ {
+ /*
+ * Since we don't have the ability to push subplans down to workers
+ * at present, we treat subplan references as parallel-restricted.
+ */
+ if (!context->allow_restricted)
+ return true;
+ }

You seem to have agreed upthread to change this check for PARAM_EXEC
paramkind. I think you have forgotten to change this code.

@@ -714,6 +860,9 @@ set_append_rel_size(PlannerInfo *root, RelOptInfo *rel,
  continue;
  }

+ /* Copy consider_parallel flag from parent. */
+ childrel->consider_parallel = rel->consider_parallel;
+

We are changing the childrel quals in this function and in function
adjust_appendrel_attrs()->adjust_appendrel_attrs_mutator(), we are
adding an extra conversion step for one of the cases to the Var
participating in qualification.  So even though it might be safe to
assume that it won't add any new parallel-restricted or parallel-unsafe
expression in qual, ideally we should have a check for parallel-safety in
childrel quals separately as those might not be identical to parent rel.

> > True, we can do that way.  What I was trying to convey by above is
> > that we anyway need checks during path creation at least in some
> > of the cases, so why not do all the checks at that time only as I
> > think all the information will be available at that time.
> >
> > I think if we store parallelism related info in RelOptInfo, that can
also
> > be made to work, but the only worry I have with that approach is we
> > need to have checks at two levels one at RelOptInfo formation time
> > and other at Path formation time.
>
> I don't really see that as a problem.  What I'm thinking about doing
> (but it's not implemented in the attached patch) is additionally
> adding a ppi_consider_parallel flag to the ParamPathInfo.  This would
> be meaningful only for baserels, and would indicate whether the
> ParamPathInfo's ppi_clauses are parallel-safe.
>
> If we're thinking about adding a parallel path to a baserel, we need
> the RelOptInfo to have consider_parallel set and, if there is a
> ParamPathInfo, we need the ParamPathInfo's ppi_consider_parallel flag
> to be set also.  That shows that both the rel's baserestrictinfo and
> the parameterized join clauses are parallel-safe.  For a joinrel, we
> can add a path if (1) the joinrel has consider_parallel set and (2)
> the paths being joined are parallel-safe.  Testing condition (2) will
> require a per-Path flag, so we'll end up with one flag in the
> RelOptInfo, a second in the ParamPathInfo, and a third in the Path.
>

I am already adding a parallel_aware flag in the patch to make seqscan
node parallel_aware, so if you find that patch better compare to partial
seq scan node patch, then I can take care of above in that patch.



With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com


Re: [HACKERS] Parallel Seq Scan

2015-11-06 Thread Robert Haas
On Fri, Oct 23, 2015 at 9:22 PM, Amit Kapila  wrote:
> The base rel's consider_parallel flag won't be percolated to childrels, so
> even
> if we mark base rel as parallel capable, while generating the path it won't
> be considered.  I think we need to find a way to pass on that information if
> we want to follow this way.

Fixed in the attached version.  I added a max_parallel_degree check,
too, per your suggestion.

> True, we can do that way.  What I was trying to convey by above is
> that we anyway need checks during path creation at least in some
> of the cases, so why not do all the checks at that time only as I
> think all the information will be available at that time.
>
> I think if we store parallelism related info in RelOptInfo, that can also
> be made to work, but the only worry I have with that approach is we
> need to have checks at two levels one at RelOptInfo formation time
> and other at Path formation time.

I don't really see that as a problem.  What I'm thinking about doing
(but it's not implemented in the attached patch) is additionally
adding a ppi_consider_parallel flag to the ParamPathInfo.  This would
be meaningful only for baserels, and would indicate whether the
ParamPathInfo's ppi_clauses are parallel-safe.

If we're thinking about adding a parallel path to a baserel, we need
the RelOptInfo to have consider_parallel set and, if there is a
ParamPathInfo, we need the ParamPathInfo's ppi_consider_parallel flag
to be set also.  That shows that both the rel's baserestrictinfo and
the parameterized join clauses are parallel-safe.  For a joinrel, we
can add a path if (1) the joinrel has consider_parallel set and (2)
the paths being joined are parallel-safe.  Testing condition (2) will
require a per-Path flag, so we'll end up with one flag in the
RelOptInfo, a second in the ParamPathInfo, and a third in the Path.
That doesn't seem like a problem, though: it's a sign that we're doing
this in a way that fits into the existing infrastructure, and it
should be pretty efficient.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
From e31d5f3f4c53d80e87a74925db88bcaf2e6fa564 Mon Sep 17 00:00:00 2001
From: Robert Haas 
Date: Fri, 2 Oct 2015 23:57:46 -0400
Subject: [PATCH 2/7] Strengthen planner infrastructure for parallelism.

Add a new flag, consider_parallel, to each RelOptInfo, indicating
whether a plan for that relation could conceivably be run inside of
a parallel worker.  Right now, we're pretty conservative: for example,
it might be possible to defer applying a parallel-restricted qual
in a worker, and later do it in the leader, but right now we just
don't try to parallelize access to that relation.  That's probably
the right decision in most cases, anyway.
---
 src/backend/nodes/outfuncs.c  |   1 +
 src/backend/optimizer/path/allpaths.c | 155 +++-
 src/backend/optimizer/plan/planmain.c |  12 +++
 src/backend/optimizer/plan/planner.c  |   9 +-
 src/backend/optimizer/util/clauses.c  | 183 +++---
 src/backend/optimizer/util/relnode.c  |  21 
 src/backend/utils/cache/lsyscache.c   |  22 
 src/include/nodes/relation.h  |   1 +
 src/include/optimizer/clauses.h   |   2 +-
 src/include/utils/lsyscache.h |   1 +
 10 files changed, 364 insertions(+), 43 deletions(-)

diff --git a/src/backend/nodes/outfuncs.c b/src/backend/nodes/outfuncs.c
index 3e75cd1..0030a9b 100644
--- a/src/backend/nodes/outfuncs.c
+++ b/src/backend/nodes/outfuncs.c
@@ -1868,6 +1868,7 @@ _outRelOptInfo(StringInfo str, const RelOptInfo *node)
 	WRITE_INT_FIELD(width);
 	WRITE_BOOL_FIELD(consider_startup);
 	WRITE_BOOL_FIELD(consider_param_startup);
+	WRITE_BOOL_FIELD(consider_parallel);
 	WRITE_NODE_FIELD(reltargetlist);
 	WRITE_NODE_FIELD(pathlist);
 	WRITE_NODE_FIELD(ppilist);
diff --git a/src/backend/optimizer/path/allpaths.c b/src/backend/optimizer/path/allpaths.c
index 8fc1cfd..105e544 100644
--- a/src/backend/optimizer/path/allpaths.c
+++ b/src/backend/optimizer/path/allpaths.c
@@ -21,6 +21,7 @@
 #include "access/tsmapi.h"
 #include "catalog/pg_class.h"
 #include "catalog/pg_operator.h"
+#include "catalog/pg_proc.h"
 #include "foreign/fdwapi.h"
 #include "nodes/makefuncs.h"
 #include "nodes/nodeFuncs.h"
@@ -71,6 +72,9 @@ static void set_rel_pathlist(PlannerInfo *root, RelOptInfo *rel,
  Index rti, RangeTblEntry *rte);
 static void set_plain_rel_size(PlannerInfo *root, RelOptInfo *rel,
    RangeTblEntry *rte);
+static void set_rel_consider_parallel(PlannerInfo *root, RelOptInfo *rel,
+		  RangeTblEntry *rte);
+static bool function_rte_parallel_ok(RangeTblEntry *rte);
 static void set_plain_rel_pathlist(PlannerInfo *root, RelOptInfo *rel,
 	   RangeTblEntry *rte);
 static void set_tablesample_rel_size(PlannerInfo *root, RelOptInfo *rel,
@@ -158,7 +162,8 @@ make_one_rel(PlannerInfo *root, List *joinlist)
 	

Re: [HACKERS] Parallel Seq Scan

2015-11-05 Thread Robert Haas
On Thu, Nov 5, 2015 at 12:52 AM, Haribabu Kommi
 wrote:
>> Now instead of displaying Partial Seq Scan, if we just display Seq Scan,
>> then it might confuse user, so it is better to add some thing indicating
>> parallel node if we want to go this route.
>
> IMO, the change from Partial Seq Scan to Seq Scan may not confuse user,
> if we clearly specify in the documentation that all plans under a Gather node
> are parallel plans.
>
> This is possible for the execution nodes that executes fully under a
> Gather node.
> The same is not possible for parallel aggregates, so we have to mention the
> aggregate node below Gather node as partial only.
>
> I feel this suggestion arises maybe because of some duplicate code between
> Partial Seq Scan and Seq Scan. What if we use only the Seq Scan node, and
> display it as Partial Seq Scan based on a flag stored in the plan? That
> avoids the need to add new plan nodes.

I was thinking about this idea:

1. Add a parallel_aware flag to each plan.

2. If the flag is set, have EXPLAIN print the word "Parallel" before
the node name.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] Parallel Seq Scan

2015-11-05 Thread Robert Haas
On Thu, Nov 5, 2015 at 10:49 PM, Amit Kapila  wrote:
> On Thu, Nov 5, 2015 at 11:54 PM, Robert Haas  wrote:
>> I was thinking about this idea:
>>
>> 1. Add a parallel_aware flag to each plan.
>
> Okay, so shall we add it in generic Plan node or to specific plan nodes
> like SeqScan, IndexScan, etc.  To me, it appears that parallelism is
> a node specific property, so we should add it to specific nodes and
> for now as we are parallelising seq scan, so we can add this flag in
> SeqScan node.  What do you say?

I think it should go in the Plan node itself.  Parallel Append is
going to need a way to test whether a node is parallel-aware, and
there's nothing simpler than if (plan->parallel_aware).  That makes
life simple for EXPLAIN, too.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] Parallel Seq Scan

2015-11-05 Thread Amit Kapila
On Thu, Nov 5, 2015 at 11:54 PM, Robert Haas  wrote:
>
> I was thinking about this idea:
>
> 1. Add a parallel_aware flag to each plan.
>

Okay, so shall we add it in generic Plan node or to specific plan nodes
like SeqScan, IndexScan, etc.  To me, it appears that parallelism is
a node specific property, so we should add it to specific nodes and
for now as we are parallelising seq scan, so we can add this flag in
SeqScan node.  What do you say?


With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com


Re: [HACKERS] Parallel Seq Scan

2015-11-04 Thread Haribabu Kommi
On Tue, Nov 3, 2015 at 9:41 PM, Amit Kapila  wrote:
> On Fri, Oct 23, 2015 at 4:41 PM, Amit Kapila 
> wrote:
>>
>> On Fri, Oct 23, 2015 at 10:33 AM, Robert Haas 
>> wrote:
>
> Please find the rebased partial seq scan patch attached with this
> mail.
>
> Robert suggested me off list that we should once try to see if we
> can use Seq Scan node instead of introducing a new Partial Seq Scan
> node. I have analyzed to see if we can use the SeqScan node (containing
> parallel flag) instead of introducing new partial seq scan and found that
> we primarily need to change most of the functions in nodeSeqScan.c to
> have a parallel flag check and do something special for Partial Seq Scan
> and apart from that we need special handling in function
> ExecSupportsBackwardScan().  In general, I think we can make
> SeqScan node parallel-aware by having some special paths without
> introducing much complexity and that can save us code-duplication
> between nodeSeqScan.c and nodePartialSeqScan.c.  One thing that makes
> me slightly uncomfortable with this approach is that for partial seq scan,
> currently the plan looks like:
>
> QUERY PLAN
> --
>  Gather  (cost=0.00..2588194.25 rows=9990667 width=4)
>Number of Workers: 1
>->  Partial Seq Scan on t1  (cost=0.00..89527.51 rows=9990667 width=4)
>  Filter: (c1 > 1)
> (4 rows)
>
> Now instead of displaying Partial Seq Scan, if we just display Seq Scan,
> then it might confuse user, so it is better to add some thing indicating
> parallel node if we want to go this route.

IMO, the change from Partial Seq Scan to Seq Scan may not confuse user,
if we clearly specify in the documentation that all plans under a Gather node
are parallel plans.

This is possible for the execution nodes that executes fully under a
Gather node.
The same is not possible for parallel aggregates, so we have to mention the
aggregate node below Gather node as partial only.

I feel this suggestion arises maybe because of some duplicate code between
Partial Seq Scan and Seq Scan. What if we use only the Seq Scan node, and
display it as Partial Seq Scan based on a flag stored in the plan? That
avoids the need to add new plan nodes.


Regards,
Hari Babu
Fujitsu Australia




Re: [HACKERS] Parallel Seq Scan

2015-11-03 Thread Amit Kapila
On Fri, Oct 23, 2015 at 4:41 PM, Amit Kapila 
wrote:
>
> On Fri, Oct 23, 2015 at 10:33 AM, Robert Haas 
wrote:

Please find the rebased partial seq scan patch attached with this
mail.

Robert suggested me off list that we should once try to see if we
can use Seq Scan node instead of introducing a new Partial Seq Scan
node. I have analyzed to see if we can use the SeqScan node (containing
parallel flag) instead of introducing new partial seq scan and found that
we primarily need to change most of the functions in nodeSeqScan.c to
have a parallel flag check and do something special for Partial Seq Scan
and apart from that we need special handling in function
ExecSupportsBackwardScan().  In general, I think we can make
SeqScan node parallel-aware by having some special paths without
introducing much complexity and that can save us code-duplication
between nodeSeqScan.c and nodePartialSeqScan.c.  One thing that makes
me slightly uncomfortable with this approach is that for partial seq scan,
currently the plan looks like:

QUERY PLAN
--
 Gather  (cost=0.00..2588194.25 rows=9990667 width=4)
   Number of Workers: 1
   ->  Partial Seq Scan on t1  (cost=0.00..89527.51 rows=9990667 width=4)
 Filter: (c1 > 1)
(4 rows)

Now instead of displaying Partial Seq Scan, if we just display Seq Scan,
then it might confuse user, so it is better to add some thing indicating
parallel node if we want to go this route.

Thoughts?



With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com


parallel_seqscan_partialseqscan_v24.patch
Description: Binary data



Re: [HACKERS] Parallel Seq Scan

2015-11-02 Thread Robert Haas
On Fri, Oct 30, 2015 at 11:12 PM, Noah Misch  wrote:
> On Wed, Oct 28, 2015 at 01:04:12AM +0100, Robert Haas wrote:
>> Well, OK.  That's not strictly a correctness issue, but here's an
>> updated patch along the lines you suggested.
>
>> Finally, have setup_param_list set a new ParamListInfo field,
>> paramMask, to the parameters actually used in the expression, so that
>> we don't try to fetch those that are not needed when serializing a
>> parameter list.  This isn't necessary for performance, but it makes
>
> s/performance/correctness/
>
>> the performance of the parallel executor code comparable to what we
>> do for cases involving cursors.
>
> With that, the patch is ready.

Thanks, committed.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] Parallel Seq Scan

2015-10-30 Thread Noah Misch
On Wed, Oct 28, 2015 at 01:04:12AM +0100, Robert Haas wrote:
> Well, OK.  That's not strictly a correctness issue, but here's an
> updated patch along the lines you suggested.


> Finally, have setup_param_list set a new ParamListInfo field,
> paramMask, to the parameters actually used in the expression, so that
> we don't try to fetch those that are not needed when serializing a
> parameter list.  This isn't necessary for performance, but it makes

s/performance/correctness/

> the performance of the parallel executor code comparable to what we
> do for cases involving cursors.

With that, the patch is ready.




Re: [HACKERS] Parallel Seq Scan

2015-10-27 Thread Robert Haas
On Sat, Oct 24, 2015 at 6:31 PM, Noah Misch  wrote:
> On Sat, Oct 24, 2015 at 07:49:07AM -0400, Robert Haas wrote:
>> On Fri, Oct 23, 2015 at 9:38 PM, Noah Misch  wrote:
>> > Since that specification permits ParamListInfo consumers to ignore 
>> > paramMask,
>> > the plpgsql_param_fetch() change from copy-paramlistinfo-fixes.patch is 
>> > still
>> > formally required.
>>
>> So why am I not just doing that, then?  Seems a lot more surgical.
>
> do $$
> declare
> param_unused text := repeat('a', 100 * 1024 * 1024);
> param_used oid := 403;
> begin
> perform count(*) from pg_am where oid = param_used;
> end
> $$;
>
> I expect that if you were to inspect the EstimateParamListSpace() return
> values when executing that, you would find that it serializes the irrelevant
> 100 MiB datum.  No possible logic in plpgsql_param_fetch() could stop that
> from happening, because copyParamList() and SerializeParamList() call the
> paramFetch hook only for dynamic parameters.  Cursors faced the same problem,
> which is the raison d'être for setup_unshared_param_list().

Well, OK.  That's not strictly a correctness issue, but here's an
updated patch along the lines you suggested.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
From 50895be5cdbb0fda41535be23700e5112585e1e3 Mon Sep 17 00:00:00 2001
From: Robert Haas 
Date: Thu, 22 Oct 2015 23:56:51 -0400
Subject: [PATCH 6/6] Fix problems with ParamListInfo serialization mechanism.

Commit d1b7c1ffe72e86932b5395f29e006c3f503bc53d introduced a mechanism
for serializing a ParamListInfo structure to be passed to a parallel
worker.  However, this mechanism failed to handle external expanded
values, as pointed out by Noah Misch.  Repair.

Moreover, plpgsql_param_fetch requires adjustment because the
serialization mechanism needs it to skip evaluating unused parameters
just as we would do when it is called from copyParamList, but params
== estate->paramLI in that case.  To fix, make the bms_is_member test
in that function unconditional.

Finally, have setup_param_list set a new ParamListInfo field,
paramMask, to the parameters actually used in the expression, so that
we don't try to fetch those that are not needed when serializing a
parameter list.  This isn't necessary for performance, but it makes
the performance of the parallel executor code comparable to what we
do for cases involving cursors.
---
 src/backend/commands/prepare.c   |  1 +
 src/backend/executor/functions.c |  1 +
 src/backend/executor/spi.c   |  1 +
 src/backend/nodes/params.c   | 54 
 src/backend/tcop/postgres.c  |  1 +
 src/backend/utils/adt/datum.c| 16 
 src/include/nodes/params.h   |  4 ++-
 src/pl/plpgsql/src/pl_exec.c | 40 -
 8 files changed, 89 insertions(+), 29 deletions(-)

diff --git a/src/backend/commands/prepare.c b/src/backend/commands/prepare.c
index fb33d30..0d4aa69 100644
--- a/src/backend/commands/prepare.c
+++ b/src/backend/commands/prepare.c
@@ -392,6 +392,7 @@ EvaluateParams(PreparedStatement *pstmt, List *params,
 	paramLI->parserSetup = NULL;
 	paramLI->parserSetupArg = NULL;
 	paramLI->numParams = num_params;
+	paramLI->paramMask = NULL;
 
 	i = 0;
 	foreach(l, exprstates)
diff --git a/src/backend/executor/functions.c b/src/backend/executor/functions.c
index 812a610..0919c04 100644
--- a/src/backend/executor/functions.c
+++ b/src/backend/executor/functions.c
@@ -910,6 +910,7 @@ postquel_sub_params(SQLFunctionCachePtr fcache,
 			paramLI->parserSetup = NULL;
 			paramLI->parserSetupArg = NULL;
 			paramLI->numParams = nargs;
+			paramLI->paramMask = NULL;
 			fcache->paramLI = paramLI;
 		}
 		else
diff --git a/src/backend/executor/spi.c b/src/backend/executor/spi.c
index 300401e..13ddb8f 100644
--- a/src/backend/executor/spi.c
+++ b/src/backend/executor/spi.c
@@ -2330,6 +2330,7 @@ _SPI_convert_params(int nargs, Oid *argtypes,
 		paramLI->parserSetup = NULL;
 		paramLI->parserSetupArg = NULL;
 		paramLI->numParams = nargs;
+		paramLI->paramMask = NULL;
 
 		for (i = 0; i < nargs; i++)
 		{
diff --git a/src/backend/nodes/params.c b/src/backend/nodes/params.c
index d093263..0351774 100644
--- a/src/backend/nodes/params.c
+++ b/src/backend/nodes/params.c
@@ -15,6 +15,7 @@
 
 #include "postgres.h"
 
+#include "nodes/bitmapset.h"
 #include "nodes/params.h"
 #include "storage/shmem.h"
 #include "utils/datum.h"
@@ -50,6 +51,7 @@ copyParamList(ParamListInfo from)
 	retval->parserSetup = NULL;
 	retval->parserSetupArg = NULL;
 	retval->numParams = from->numParams;
+	retval->paramMask = NULL;
 
 	for (i = 0; i < from->numParams; i++)
 	{
@@ -58,6 +60,17 @@ copyParamList(ParamListInfo from)
 		int16		typLen;
 		bool		typByVal;
 
+		/* Ignore parameters we don't need, to save cycles and space. */
+		if (retval->paramMask != NULL &&
+			!bms_is_member(i, retval->paramMask))

Re: [HACKERS] Parallel Seq Scan

2015-10-24 Thread Noah Misch
On Sat, Oct 24, 2015 at 07:49:07AM -0400, Robert Haas wrote:
> On Fri, Oct 23, 2015 at 9:38 PM, Noah Misch  wrote:
> > Since that specification permits ParamListInfo consumers to ignore 
> > paramMask,
> > the plpgsql_param_fetch() change from copy-paramlistinfo-fixes.patch is 
> > still
> > formally required.
> 
> So why am I not just doing that, then?  Seems a lot more surgical.

do $$
declare
param_unused text := repeat('a', 100 * 1024 * 1024);
param_used oid := 403;
begin
perform count(*) from pg_am where oid = param_used;
end
$$;

I expect that if you were to inspect the EstimateParamListSpace() return
values when executing that, you would find that it serializes the irrelevant
100 MiB datum.  No possible logic in plpgsql_param_fetch() could stop that
from happening, because copyParamList() and SerializeParamList() call the
paramFetch hook only for dynamic parameters.  Cursors faced the same problem,
which is the raison d'être for setup_unshared_param_list().




Re: [HACKERS] Parallel Seq Scan

2015-10-24 Thread Robert Haas
On Fri, Oct 23, 2015 at 9:38 PM, Noah Misch  wrote:
> Since that specification permits ParamListInfo consumers to ignore paramMask,
> the plpgsql_param_fetch() change from copy-paramlistinfo-fixes.patch is still
> formally required.

So why am I not just doing that, then?  Seems a lot more surgical.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] Parallel Seq Scan

2015-10-23 Thread Robert Haas
On Fri, Oct 23, 2015 at 3:35 AM, Amit Kapila  wrote:
> Considering parallelism at RelOptInfo level in the way as done in patch,
> won't consider the RelOptInfo's for child relations in case of Append node.
> Refer build_simple_rel().

Hmm, true, but what can go wrong there?  The same quals apply to both,
and either both are temp or neither is.

> Also for cases when parallelism is not enabled like max_parallel_degree = 0,
> the current way of doing could add an overhead of traversing the
> baserestrictinfo without need. I think one way to avoid that would be check
> that while setting parallelModeOK flag.

Good idea.

> Another point is that it will consider parallelism for cases where we really
> can't parallelize, for example foreign tables and sample scans.

As soon as we add the ability to push joins below Gather nodes, we
will be able to parallelize that stuff if it is joined to something we
can parallelize.  That's why this flag is so handy.

> One thing to note here is that we already have precedent of verifying qual
> push down safety while path generation (during subquery path generation),
> so it doesn't seem wrong to consider the same for parallel paths and it
> would
> minimize the cases where we need to evaluate parallelism.

Mmm, yeah.

>> The advantage of this is that the logic is centralized.  If we have
>> parallel seq scan and also, say, parallel bitmap heap scan, your
>> approach would require that we duplicate the logic to check for
>> parallel-restricted functions for each path generation function.
>
> Don't we anyway need that irrespective of caching it in RelOptInfo?
> During bitmappath creation, bitmapqual could contain something
> which needs to be evaluated for parallel-safety as it is built based
> on index paths which inturn can be based on some join clause.  As per
> patch, the join clause parallel-safety is checked much later than
> generation bitmappath.

Yes, it's possible there could be some additional checks needed here
for parameterized paths.  But we're not quite there yet, so I think we
can solve that problem when we get there.  I have it in mind that in
the future we may want a parallel_safe flag on each path, which would
normally match the consider_parallel flag on the RelOptInfo but could
instead be false if the path internally uses parallelism (since,
currently, Gather nodes cannot be nested) or if it's got
parallel-restricted parameterized quals.  However, that seems like
future work.

> + else if (IsA(node, SubPlan) || IsA(node, SubLink) ||
> + IsA(node, AlternativeSubPlan) || IsA(node, Param))
> + {
> + /*
> + * Since we don't have the ability to push subplans down to workers
> + * at present, we treat subplan references as parallel-restricted.
> + */
> + if (!context->allow_restricted)
> + return true;
> + }
>
> I think it is better to do this for PARAM_EXEC paramkind, as those are
> the cases where it would be subplan or initplan.

Right, OK.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] Parallel Seq Scan

2015-10-23 Thread Amit Kapila
On Fri, Oct 23, 2015 at 10:33 AM, Robert Haas  wrote:
>
> +   /*
> +    * We can't finish transaction commit or abort until all of the
> +    * workers are dead.  This means, in particular, that we can't
> +    * respond to interrupts at this stage.
> +    */
> +   HOLD_INTERRUPTS();
> +   status = WaitForBackgroundWorkerShutdown(pcxt->worker[i].bgwhandle);
> +   RESUME_INTERRUPTS();
>
> These comments are correct when this code is called from
> DestroyParallelContext(), but they're flat wrong when called from
> ReinitializeParallelDSM().  I suggest moving the comment back to
> DestroyParallelContext and following it with this:
>
> HOLD_INTERRUPTS();
> WaitForParallelWorkersToDie();
> RESUME_INTERRUPTS();
>
> Then ditch the HOLD/RESUME interrupts in WaitForParallelWorkersToDie()
itself.
>

Changed as per suggestion.

> This hunk is a problem:
>
> case 'X':           /* Terminate, indicating clean exit */
>     {
> -       pfree(pcxt->worker[i].bgwhandle);
>         pfree(pcxt->worker[i].error_mqh);
> -       pcxt->worker[i].bgwhandle = NULL;
>         pcxt->worker[i].error_mqh = NULL;
>         break;
>     }
>
> If you do that on receipt of the 'X' message, then
> DestroyParallelContext() might SIGTERM a worker that has supposedly
> exited cleanly.  That seems bad.  I think maybe the solution is to
> make DestroyParallelContext() terminate the worker only if
> pcxt->worker[i].error_mqh != NULL.

Changed as per suggestion.

>   So make error_mqh == NULL mean a
> clean loss of a worker: either we couldn't register it, or it exited
> cleanly.  And bgwhandle == NULL would mean it's actually gone.
>

I think even if error_mqh is NULL, it is not guaranteed that the worker
has exited; it only ensures that a clean worker shutdown is either
in progress or done.

> It makes sense to have ExecShutdownGather and
> ExecShutdownGatherWorkers, but couldn't the former call the latter
> instead of duplicating the code?
>

makes sense, so changed accordingly.

> I think ReInitialize should be capitalized as Reinitialize throughout.
>

Changed as per suggestion.

> ExecParallelReInitializeTupleQueues is almost a cut-and-paste
> duplicate of ExecParallelSetupTupleQueues.  Please refactor this to
> avoid duplication - e.g. change
> ExecParallelSetupTupleQueues(ParallelContext *pcxt) to take a second
> argument bool reinit. ExecParallelReInitializeTupleQueues can just do
> ExecParallelSetupTupleQueues(pxct, true).
>

Changed as per suggestion.


With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com


parallel_seqscan_partialseqscan_v23.patch
Description: Binary data



Re: [HACKERS] Parallel Seq Scan

2015-10-23 Thread Amit Kapila
On Fri, Oct 23, 2015 at 5:45 PM, Robert Haas  wrote:
>
> On Fri, Oct 23, 2015 at 3:35 AM, Amit Kapila 
wrote:
> > Considering parallelism at the RelOptInfo level in the way done in the
> > patch won't consider the RelOptInfos for child relations in the case of
> > an Append node.  Refer to build_simple_rel().
>
> Hmm, true, but what can go wrong there?  The same quals apply to both,
> and either both are temp or neither is.
>

The base rel's consider_parallel flag won't be propagated to child rels, so
even if we mark the base rel as parallel-capable, the child rels won't be
considered while generating paths.  I think we need to find a way to pass
on that information if we want to follow this approach.

>
> >> The advantage of this is that the logic is centralized.  If we have
> >> parallel seq scan and also, say, parallel bitmap heap scan, your
> >> approach would require that we duplicate the logic to check for
> >> parallel-restricted functions for each path generation function.
> >
> > Don't we anyway need that irrespective of caching it in RelOptInfo?
> > During bitmap path creation, bitmapqual could contain something
> > which needs to be evaluated for parallel-safety, as it is built based
> > on index paths which in turn can be based on some join clause.  As per
> > the patch, the join clause's parallel-safety is checked much later than
> > the bitmap path is generated.
>
> Yes, it's possible there could be some additional checks needed here
> for parameterized paths.  But we're not quite there yet, so I think we
> can solve that problem when we get there.  I have it in mind that in
> the future we may want a parallel_safe flag on each path, which would
> normally match the consider_parallel flag on the RelOptInfo but could
> instead be false if the path internally uses parallelism (since,
> currently, Gather nodes cannot be nested) or if it's got
> parallel-restricted parameterized quals.  However, that seems like
> future work.
>

True, we can do it that way.  What I was trying to convey above is
that we anyway need checks during path creation, at least in some
of the cases, so why not do all the checks at that time, since all
the information will be available then.

I think storing parallelism-related info in RelOptInfo can also
be made to work, but the only worry I have with that approach is that
we need to have checks at two levels: one at RelOptInfo formation time
and another at Path formation time.



With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com


Re: [HACKERS] Parallel Seq Scan

2015-10-23 Thread Noah Misch
On Thu, Oct 22, 2015 at 11:59:58PM -0400, Robert Haas wrote:
> On Thu, Oct 15, 2015 at 8:23 PM, Noah Misch  wrote:
> > Agreed.  More specifically, I had in mind for copyParamList() to check the
> > mask while e.g. ExecEvalParamExtern() would either check nothing or merely
> > assert that any mask included the requested parameter.  It would be
> > tricky to verify that as safe, so ...
> >
> >> Would it work to define this as "if non-NULL,
> >> params lacking a 1-bit may be safely ignored"?  Or some other tweak
> >> that basically says that you don't need to care about this, but you
> >> can if you want to.
> >
> > ... this is a better specification.
> 
> Here's an attempt to implement that.

Since that specification permits ParamListInfo consumers to ignore paramMask,
the plpgsql_param_fetch() change from copy-paramlistinfo-fixes.patch is still
formally required.

> @@ -50,6 +51,7 @@ copyParamList(ParamListInfo from)
>   retval->parserSetup = NULL;
>   retval->parserSetupArg = NULL;
>   retval->numParams = from->numParams;
> + retval->paramMask = bms_copy(from->paramMask);

Considering that this function squashes the masked params, I wonder if it
should just store NULL here.

>  
>   for (i = 0; i < from->numParams; i++)
>   {
> @@ -58,6 +60,20 @@ copyParamList(ParamListInfo from)
>   int16   typLen;
>   bool    typByVal;
>  
> + /*
> +  * Ignore parameters we don't need, to save cycles and space, 
> and
> +  * in case the fetch hook might fail.
> +  */
> + if (retval->paramMask != NULL &&
> + !bms_is_member(i, retval->paramMask))

The "and in case the fetch hook might fail" in this comment and its clones is
contrary to the above specification.  Under that specification, it would be a
bug in the ParamListInfo producer to rely on consumers checking paramMask.
Saving cycles/space would be the spec-approved paramMask use.

Consider adding an XXX comment to the effect that cursors ought to stop using
unshared param lists.  The leading comment at setup_unshared_param_list() is a
good home for such an addition.
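The contract described above can be spelled out as a tiny self-contained
model (a toy sketch only; the names here are invented and the real
ParamListInfo carries much more state): a NULL mask means copy everything,
while the copier may squash any parameter whose bit is clear, and consumers
remain free to ignore the mask entirely.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy model of the paramMask contract: NULL mask = copy everything;
 * otherwise params lacking a 1-bit may be safely squashed. */
typedef struct ToyParamList
{
    int  nparams;
    bool valid[8];              /* stand-in for value/isnull/ptype */
} ToyParamList;

static void
toy_copy_param_list(const ToyParamList *from, const unsigned *mask,
                    ToyParamList *to)
{
    to->nparams = from->nparams;
    for (int i = 0; i < from->nparams; i++)
    {
        if (mask != NULL && !((*mask >> i) & 1u))
        {
            to->valid[i] = false;   /* masked out: squash to "invalid" */
            continue;
        }
        to->valid[i] = from->valid[i];
    }
}
```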




Re: [HACKERS] Parallel Seq Scan

2015-10-23 Thread Amit Kapila
On Fri, Oct 23, 2015 at 5:14 AM, Robert Haas  wrote:
>
> On Tue, Oct 13, 2015 at 5:59 PM, Robert Haas 
wrote:
> > - Although the changes in parallelpaths.c are in a good direction, I'm
> > pretty sure this is not yet up to scratch.  I am less sure exactly
> > what needs to be fixed, so I'll have to give some more thought to
> > that.
>
> Please find attached a proposed set of changes that I think are
> better.  These changes compute a consider_parallel flag for each
> RelOptInfo, which is true if it's a non-temporary relation whose
> baserestrictinfo references no PARAM_EXEC parameters, sublinks, or
> parallel-restricted functions.  Actually, I made an effort to set the
> flag correctly even for baserels other than plain tables, and for
> joinrels, though we don't technically need that stuff until we get to
> the point of pushing joins beneath Gather nodes.  When we get there,
> it will be important - any joinrel for which consider_parallel = false
> needn't even try to generate parallel paths, while if
> consider_parallel = true then we can consider it, if the costing makes
> sense.
>

Considering parallelism at the RelOptInfo level in the way done in the patch
won't consider the RelOptInfos for child relations in the case of an Append
node.  Refer to build_simple_rel().

Also, for cases where parallelism is not enabled, like max_parallel_degree = 0,
the current approach could add the overhead of traversing baserestrictinfo
without need.  I think one way to avoid that would be to check this while
setting the parallelModeOK flag.

Another point is that it will consider parallelism for cases where we really
can't parallelize, for example for a foreign table or a sample scan.

One thing to note here is that we already have a precedent of verifying qual
push-down safety during path generation (for subquery paths), so it
doesn't seem wrong to do the same for parallel paths, and it would
minimize the cases where we need to evaluate parallelism.

> The advantage of this is that the logic is centralized.  If we have
> parallel seq scan and also, say, parallel bitmap heap scan, your
> approach would require that we duplicate the logic to check for
> parallel-restricted functions for each path generation function.
>

Don't we anyway need that irrespective of caching it in RelOptInfo?
During bitmap path creation, bitmapqual could contain something
which needs to be evaluated for parallel-safety, as it is built based
on index paths which in turn can be based on some join clause.  As per
the patch, the join clause's parallel-safety is checked much later than
the bitmap path is generated.


+ else if (IsA(node, SubPlan) || IsA(node, SubLink) ||
+ IsA(node, AlternativeSubPlan) || IsA(node, Param))
+ {
+ /*
+ * Since we don't have the ability to push subplans down to workers
+ * at present, we treat subplan references as parallel-restricted.
+ */
+ if (!context->allow_restricted)
+ return true;
+ }

I think it is better to do this only for the PARAM_EXEC paramkind, as those
are the cases where the parameter would come from a subplan or initplan.
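For illustration, the narrowed check would roughly amount to the following
(a toy sketch with invented names; the real test walks Param nodes defined
in primnodes.h): only executor-internal parameters, which are filled in at
run time by subplans/initplans, are treated as parallel-restricted, while
external parameters can simply be serialized and shipped to workers.

```c
#include <assert.h>
#include <stdbool.h>

/* Simplified stand-in for the planner's Param node; the real enum has
 * more members (PARAM_SUBLINK, PARAM_MULTIEXPR, ...). */
typedef enum ToyParamKind { TOY_PARAM_EXTERN, TOY_PARAM_EXEC } ToyParamKind;
typedef struct ToyParam { ToyParamKind paramkind; } ToyParam;

/* Treat only executor-internal params (subplan/initplan outputs) as
 * parallel-restricted; external params can be sent to workers. */
static bool
toy_param_is_parallel_restricted(const ToyParam *p)
{
    return p->paramkind == TOY_PARAM_EXEC;
}
```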





With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com


Re: [HACKERS] Parallel Seq Scan

2015-10-22 Thread Robert Haas
On Tue, Oct 13, 2015 at 5:59 PM, Robert Haas  wrote:
> - Although the changes in parallelpaths.c are in a good direction, I'm
> pretty sure this is not yet up to scratch.  I am less sure exactly
> what needs to be fixed, so I'll have to give some more thought to
> that.

Please find attached a proposed set of changes that I think are
better.  These changes compute a consider_parallel flag for each
RelOptInfo, which is true if it's a non-temporary relation whose
baserestrictinfo references no PARAM_EXEC parameters, sublinks, or
parallel-restricted functions.  Actually, I made an effort to set the
flag correctly even for baserels other than plain tables, and for
joinrels, though we don't technically need that stuff until we get to
the point of pushing joins beneath Gather nodes.  When we get there,
it will be important - any joinrel for which consider_parallel = false
needn't even try to generate parallel paths, while if
consider_parallel = true then we can consider it, if the costing makes
sense.

The advantage of this is that the logic is centralized.  If we have
parallel seq scan and also, say, parallel bitmap heap scan, your
approach would require that we duplicate the logic to check for
parallel-restricted functions for each path generation function.  By
caching it in the RelOptInfo, we don't have to do that.  The function
you wrote to generate parallel paths can just check the flag; if it's
false, return without generating any paths.  If it's true, then
parallel paths can be considered.

Ultimately, I think that each RelOptInfo should have a new List *
member containing a list of partial paths for that relation.  For a
baserel, we generate a partial path (e.g. Partial Seq Scan).  Then, we
can consider turning each partial path into a complete path by pushing
a Gather path on top of it.  For a joinrel, we can consider generating
a partial hash join or partial nest loop path by taking an outer
partial path and an ordinary inner path and putting the appropriate
path on top.  In theory it would also be correct to generate merge
join paths this way, but it's difficult to believe that such a plan
would ever be anything but a disaster.  These can then be used to
generate a complete path by putting a Gather node on top of them, or
they can bubble up to the next level of the join tree in the same way.
However, I think for the first version of this we can keep it simple:
if the consider_parallel flag is set on a relation, consider Gather ->
Partial Seq Scan.  If not, forget it.
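The simple first-version policy described above can be reduced to a toy
sketch (invented names, not the committed planner API): a rel with
consider_parallel set gets exactly one extra path, Gather over Partial Seq
Scan, and otherwise no parallel path is generated at all.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy model of the first-version policy for parallel path generation. */
typedef struct ToyRel
{
    bool consider_parallel;
    int  npaths;
} ToyRel;

static void
toy_generate_parallel_paths(ToyRel *rel)
{
    if (!rel->consider_parallel)
        return;                 /* forget it */
    rel->npaths++;              /* consider Gather -> Partial Seq Scan */
}
```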

Thoughts?

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
commit 79eb838ba87b19788617aac611712ed734e0102c
Author: Robert Haas 
Date:   Fri Oct 2 23:57:46 2015 -0400

Strengthen planner infrastructure for parallelism.

Add a new flag, consider_parallel, to each RelOptInfo, indicating
whether a plan for that relation could conceivably be run inside of
a parallel worker.  Right now, we're pretty conservative: for example,
it might be possible to defer applying a parallel-restricted qual
in a worker, and later do it in the leader, but right now we just
don't try to parallelize access to that relation.  That's probably
the right decision in most cases, anyway.

diff --git a/src/backend/nodes/outfuncs.c b/src/backend/nodes/outfuncs.c
index 3e75cd1..0030a9b 100644
--- a/src/backend/nodes/outfuncs.c
+++ b/src/backend/nodes/outfuncs.c
@@ -1868,6 +1868,7 @@ _outRelOptInfo(StringInfo str, const RelOptInfo *node)
 	WRITE_INT_FIELD(width);
 	WRITE_BOOL_FIELD(consider_startup);
 	WRITE_BOOL_FIELD(consider_param_startup);
+	WRITE_BOOL_FIELD(consider_parallel);
 	WRITE_NODE_FIELD(reltargetlist);
 	WRITE_NODE_FIELD(pathlist);
 	WRITE_NODE_FIELD(ppilist);
diff --git a/src/backend/optimizer/path/allpaths.c b/src/backend/optimizer/path/allpaths.c
index 8fc1cfd..f582b86 100644
--- a/src/backend/optimizer/path/allpaths.c
+++ b/src/backend/optimizer/path/allpaths.c
@@ -21,6 +21,7 @@
 #include "access/tsmapi.h"
 #include "catalog/pg_class.h"
 #include "catalog/pg_operator.h"
+#include "catalog/pg_proc.h"
 #include "foreign/fdwapi.h"
 #include "nodes/makefuncs.h"
 #include "nodes/nodeFuncs.h"
@@ -71,6 +72,9 @@ static void set_rel_pathlist(PlannerInfo *root, RelOptInfo *rel,
  Index rti, RangeTblEntry *rte);
 static void set_plain_rel_size(PlannerInfo *root, RelOptInfo *rel,
    RangeTblEntry *rte);
+static void set_rel_consider_parallel(PlannerInfo *root, RelOptInfo *rel,
+		  RangeTblEntry *rte);
+static bool function_rte_parallel_ok(RangeTblEntry *rte);
 static void set_plain_rel_pathlist(PlannerInfo *root, RelOptInfo *rel,
 	   RangeTblEntry *rte);
 static void set_tablesample_rel_size(PlannerInfo *root, RelOptInfo *rel,
@@ -158,7 +162,8 @@ make_one_rel(PlannerInfo *root, List *joinlist)
 	set_base_rel_consider_startup(root);
 
 	/*
-	 * Generate access paths for the base rels.
+	 * Generate access paths for the base 

Re: [HACKERS] Parallel Seq Scan

2015-10-22 Thread Robert Haas
On Tue, Oct 20, 2015 at 3:04 AM, Amit Kapila  wrote:
> I have rebased the partial seq scan patch based on the above committed
> patches.  Now, for rescanning, it reuses the DSM, and to achieve that we
> need to ensure that the workers have completely shut down before
> reinitializing the DSM.  To ensure complete shutdown of the workers, the
> current function WaitForParallelWorkersToFinish is not sufficient, as it
> just waits for the last message to be received from the worker backend,
> so I have written a new function, WaitForParallelWorkersToDie.  Also, on
> receiving the 'X' message in HandleParallelMessage, it just frees the
> worker handle without ensuring that the worker has died, after which it
> becomes difficult to even find out whether the worker has died, so I have
> removed that code from HandleParallelMessage.  Another change is that
> after receiving the last tuple in the Gather node, it just shuts down the
> workers without destroying the DSM.

+   /*
+    * We can't finish transaction commit or abort until all of the
+    * workers are dead.  This means, in particular, that we can't
+    * respond to interrupts at this stage.
+    */
+   HOLD_INTERRUPTS();
+   status = WaitForBackgroundWorkerShutdown(pcxt->worker[i].bgwhandle);
+   RESUME_INTERRUPTS();

These comments are correct when this code is called from
DestroyParallelContext(), but they're flat wrong when called from
ReinitializeParallelDSM().  I suggest moving the comment back to
DestroyParallelContext and following it with this:

HOLD_INTERRUPTS();
WaitForParallelWorkersToDie();
RESUME_INTERRUPTS();

Then ditch the HOLD/RESUME interrupts in WaitForParallelWorkersToDie() itself.

This hunk is a problem:

case 'X':           /* Terminate, indicating clean exit */
    {
-       pfree(pcxt->worker[i].bgwhandle);
        pfree(pcxt->worker[i].error_mqh);
-       pcxt->worker[i].bgwhandle = NULL;
        pcxt->worker[i].error_mqh = NULL;
        break;
    }

If you do that on receipt of the 'X' message, then
DestroyParallelContext() might SIGTERM a worker that has supposedly
exited cleanly.  That seems bad.  I think maybe the solution is to
make DestroyParallelContext() terminate the worker only if
pcxt->worker[i].error_mqh != NULL.  So make error_mqh == NULL mean a
clean loss of a worker: either we couldn't register it, or it exited
cleanly.  And bgwhandle == NULL would mean it's actually gone.
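That convention can be summed up in a tiny model (illustrative names only;
the real per-worker state lives in the ParallelContext's worker array):
error_mqh == NULL means the worker failed to register or exited cleanly, so
there is nothing to SIGTERM, while bgwhandle == NULL means the worker is
known to be gone entirely.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy model of the proposed meaning of the two handles. */
typedef struct ToyWorker
{
    void *bgwhandle;            /* NULL => worker actually gone */
    void *error_mqh;            /* NULL => never launched or clean exit */
} ToyWorker;

/* DestroyParallelContext() should only terminate workers that have not
 * already shut down cleanly. */
static bool
toy_should_terminate(const ToyWorker *w)
{
    return w->error_mqh != NULL;
}
```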

It makes sense to have ExecShutdownGather and
ExecShutdownGatherWorkers, but couldn't the former call the latter
instead of duplicating the code?

I think ReInitialize should be capitalized as Reinitialize throughout.

ExecParallelReInitializeTupleQueues is almost a cut-and-paste
duplicate of ExecParallelSetupTupleQueues.  Please refactor this to
avoid duplication - e.g. change
ExecParallelSetupTupleQueues(ParallelContext *pcxt) to take a second
argument bool reinit. ExecParallelReInitializeTupleQueues can just do
ExecParallelSetupTupleQueues(pxct, true).
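The shape of that refactoring, sketched with placeholder names and a
trivial stand-in for the queue state (not the actual execParallel.c code):
one setup routine takes a reinit flag, and the reinitialize entry point
becomes a one-line wrapper instead of a near-duplicate of the same loop.

```c
#include <assert.h>
#include <stdbool.h>

#define TOY_MAX_WORKERS 4

/* queue_state: 0 = no queue, 1 = freshly created, 2 = reset for reuse */
typedef struct ToyCtx
{
    int nworkers;
    int queue_state[TOY_MAX_WORKERS];
} ToyCtx;

static void
toy_setup_tuple_queues(ToyCtx *pcxt, bool reinit)
{
    for (int i = 0; i < pcxt->nworkers; i++)
        pcxt->queue_state[i] = reinit ? 2 : 1;
}

static void
toy_reinitialize_tuple_queues(ToyCtx *pcxt)
{
    /* Reuse the setup path instead of duplicating the loop. */
    toy_setup_tuple_queues(pcxt, true);
}
```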

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] Parallel Seq Scan

2015-10-22 Thread Tom Lane
Robert Haas  writes:
> On Thu, Oct 15, 2015 at 8:23 PM, Noah Misch  wrote:
>>> Would it work to define this as "if non-NULL,
>>> params lacking a 1-bit may be safely ignored"?  Or some other tweak
>>> that basically says that you don't need to care about this, but you
>>> can if you want to.

>> ... this is a better specification.

> Here's an attempt to implement that.

BTW, my Salesforce colleagues have been bit^H^H^Hgriping for quite some
time about the performance costs associated with translating between
plpgsql's internal PLpgSQL_datum-array format and the ParamListInfo
representation.  Maybe it's time to think about some wholesale redesign of
ParamListInfo?  Because TBH this patch doesn't seem like much but a kluge.
It's mostly layering still-another bunch of ad-hoc restrictions on
copyParamList, without removing any one of the kluges we had already.

regards, tom lane




Re: [HACKERS] Parallel Seq Scan

2015-10-22 Thread Robert Haas
On Thu, Oct 15, 2015 at 8:23 PM, Noah Misch  wrote:
> Agreed.  More specifically, I had in mind for copyParamList() to check the
> mask while e.g. ExecEvalParamExtern() would either check nothing or merely
> assert that any mask included the requested parameter.  It would be tricky to
> verify that as safe, so ...
>
>> Would it work to define this as "if non-NULL,
>> params lacking a 1-bit may be safely ignored"?  Or some other tweak
>> that basically says that you don't need to care about this, but you
>> can if you want to.
>
> ... this is a better specification.

Here's an attempt to implement that.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
commit 1ffc4a3a1bac686b46d47dfa40bd0eb3cb8b0be4
Author: Robert Haas 
Date:   Thu Oct 22 23:56:51 2015 -0400

Fix problems with ParamListInfo serialization mechanism.

Commit d1b7c1ffe72e86932b5395f29e006c3f503bc53d introduced a mechanism
for serializing a ParamListInfo structure to be passed to a parallel
worker.  However, this mechanism failed to handle external expanded
values, as pointed out by Noah Misch.  Moreover, plpgsql_param_fetch
requires adjustment because the serialization mechanism needs it to skip
evaluating unused parameters just as we would do when it is called from
copyParamList, but params == estate->paramLI in that case.  To fix,
have setup_param_list set a new ParamListInfo field, paramMask, to
the parameters actually used in the expression, so that we don't try
to fetch those that are not needed.

diff --git a/src/backend/commands/prepare.c b/src/backend/commands/prepare.c
index fb33d30..0d4aa69 100644
--- a/src/backend/commands/prepare.c
+++ b/src/backend/commands/prepare.c
@@ -392,6 +392,7 @@ EvaluateParams(PreparedStatement *pstmt, List *params,
 	paramLI->parserSetup = NULL;
 	paramLI->parserSetupArg = NULL;
 	paramLI->numParams = num_params;
+	paramLI->paramMask = NULL;
 
 	i = 0;
 	foreach(l, exprstates)
diff --git a/src/backend/executor/functions.c b/src/backend/executor/functions.c
index 812a610..0919c04 100644
--- a/src/backend/executor/functions.c
+++ b/src/backend/executor/functions.c
@@ -910,6 +910,7 @@ postquel_sub_params(SQLFunctionCachePtr fcache,
 			paramLI->parserSetup = NULL;
 			paramLI->parserSetupArg = NULL;
 			paramLI->numParams = nargs;
+			paramLI->paramMask = NULL;
 			fcache->paramLI = paramLI;
 		}
 		else
diff --git a/src/backend/executor/spi.c b/src/backend/executor/spi.c
index 300401e..13ddb8f 100644
--- a/src/backend/executor/spi.c
+++ b/src/backend/executor/spi.c
@@ -2330,6 +2330,7 @@ _SPI_convert_params(int nargs, Oid *argtypes,
 		paramLI->parserSetup = NULL;
 		paramLI->parserSetupArg = NULL;
 		paramLI->numParams = nargs;
+		paramLI->paramMask = NULL;
 
 		for (i = 0; i < nargs; i++)
 		{
diff --git a/src/backend/nodes/params.c b/src/backend/nodes/params.c
index d093263..9f5fd3a 100644
--- a/src/backend/nodes/params.c
+++ b/src/backend/nodes/params.c
@@ -15,6 +15,7 @@
 
 #include "postgres.h"
 
+#include "nodes/bitmapset.h"
 #include "nodes/params.h"
 #include "storage/shmem.h"
 #include "utils/datum.h"
@@ -50,6 +51,7 @@ copyParamList(ParamListInfo from)
 	retval->parserSetup = NULL;
 	retval->parserSetupArg = NULL;
 	retval->numParams = from->numParams;
+	retval->paramMask = bms_copy(from->paramMask);
 
 	for (i = 0; i < from->numParams; i++)
 	{
@@ -58,6 +60,20 @@ copyParamList(ParamListInfo from)
 		int16		typLen;
 		bool		typByVal;
 
+		/*
+		 * Ignore parameters we don't need, to save cycles and space, and
+		 * in case the fetch hook might fail.
+		 */
+		if (retval->paramMask != NULL &&
+			!bms_is_member(i, retval->paramMask))
+		{
+			nprm->value = (Datum) 0;
+			nprm->isnull = true;
+			nprm->pflags = 0;
+			nprm->ptype = InvalidOid;
+			continue;
+		}
+
 		/* give hook a chance in case parameter is dynamic */
 		if (!OidIsValid(oprm->ptype) && from->paramFetch != NULL)
 			(*from->paramFetch) (from, i + 1);
@@ -90,19 +106,31 @@ EstimateParamListSpace(ParamListInfo paramLI)
 	for (i = 0; i < paramLI->numParams; i++)
 	{
 		ParamExternData *prm = &paramLI->params[i];
+		Oid			typeOid;
 		int16		typLen;
 		bool		typByVal;
 
-		/* give hook a chance in case parameter is dynamic */
-		if (!OidIsValid(prm->ptype) && paramLI->paramFetch != NULL)
-			(*paramLI->paramFetch) (paramLI, i + 1);
+		/*
+		 * Ignore parameters we don't need, to save cycles and space, and
+		 * in case the fetch hook might fail.
+		 */
+		if (paramLI->paramMask != NULL &&
+			!bms_is_member(i, paramLI->paramMask))
+			typeOid = InvalidOid;
+		else
+		{
+			/* give hook a chance in case parameter is dynamic */
+			if (!OidIsValid(prm->ptype) && paramLI->paramFetch != NULL)
+(*paramLI->paramFetch) (paramLI, i + 1);
+			typeOid = prm->ptype;
+		}
 
 		sz = add_size(sz, sizeof(Oid));			/* space for type OID */
 		sz = add_size(sz, sizeof(uint16));		/* space for pflags */
 
 		/* space for datum/isnull 

Re: [HACKERS] Parallel Seq Scan

2015-10-22 Thread Robert Haas
On Fri, Oct 23, 2015 at 12:31 AM, Tom Lane  wrote:
> BTW, my Salesforce colleagues have been bit^H^H^Hgriping for quite some
> time about the performance costs associated with translating between
> plpgsql's internal PLpgSQL_datum-array format and the ParamListInfo
> representation.  Maybe it's time to think about some wholesale redesign of
> ParamListInfo?  Because TBH this patch doesn't seem like much but a kluge.
> It's mostly layering still-another bunch of ad-hoc restrictions on
> copyParamList, without removing any one of the kluges we had already.

I have no objection to some kind of a redesign there, but (1) I don't
think we're going to be better off doing that before getting Partial
Seq Scan committed and (2) I don't think I'm the best-qualified person
to do the work.  With respect to the first point, despite my best
efforts, this feature is going to have bugs, and getting it committed
in November without a ParamListInfo redesign is surely going to be
better for the overall stability of PostgreSQL and the timeliness of
our release schedule than getting it committed in February with such a
redesign -- never mind that this is far from the only redesign into
which I could get sucked.  I want to put in place some narrow fix for
this issue so that I can move forward.  Three alternatives have been
proposed so far: (1) this, (2) the fix I coded and posted previously,
which made plpgsql_param_fetch's bms_is_member test unconditional, and
(3) not allowing PL/pgsql to run parallel queries.  (3) sounds worse
to me than either (1) or (2); I defer to others on which of (1) and
(2) is preferable, or perhaps you have another proposal.

On the second point, I really don't know enough about the problems
with ParamListInfo to know what would be better, so I can't really
help there.  If you do and want to redesign it, fine, but I really
need whatever you replace it with have an easy way of serializing and
restoring it - be it nodeToString() and stringToNode(),
SerializeParamList and RestoreParamList, or whatever.  Without that,
parallel query is going to have to be disabled for any query involving
parameters, and that would be, uh, extremely sad.  Also, FWIW, in my
opinion, it would be far more useful to PostgreSQL for you to finish
the work on upper planner path-ification ... an awful lot of people
are waiting for that to be completed to start their own work, or are
doing work that may have to be completely redone when that lands.
YMMV, of course.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] Parallel Seq Scan

2015-10-20 Thread Amit Kapila
On Fri, Oct 16, 2015 at 9:51 AM, Robert Haas  wrote:
>
> On Mon, Oct 5, 2015 at 8:20 AM, Amit Kapila 
wrote:
> > [ new patch for heapam.c changes ]
>
> There's a second patch attached here as well, parallel-relaunch.patch,
> which makes it possible to relaunch workers with the same parallel
> context.  Currently, after you WaitForParallelWorkersToFinish(), you
> must proceed without fail to DestroyParallelContext().  With this
> rather simple patch, you have the option to instead go back and again
> LaunchParallelWorkers(), which is nice because it avoids the overhead
> of setting up a new DSM and filling it with all of your transaction
> state a second time.  I'd like to commit this as well, and I think we
> should revise execParallel.c to use it.
>

I have rebased the partial seq scan patch based on the above committed
patches.  Now, for rescanning, it reuses the DSM, and to achieve that we
need to ensure that the workers have completely shut down before
reinitializing the DSM.  To ensure complete shutdown of the workers, the
current function WaitForParallelWorkersToFinish is not sufficient, as it
just waits for the last message to be received from the worker backend,
so I have written a new function, WaitForParallelWorkersToDie.  Also, on
receiving the 'X' message in HandleParallelMessage, it just frees the
worker handle without ensuring that the worker has died, after which it
becomes difficult to even find out whether the worker has died, so I have
removed that code from HandleParallelMessage.  Another change is that
after receiving the last tuple in the Gather node, it just shuts down the
workers without destroying the DSM.


With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com


parallel_seqscan_partialseqscan_v22.patch
Description: Binary data



Re: [HACKERS] Parallel Seq Scan

2015-10-17 Thread Amit Kapila
On Sat, Oct 17, 2015 at 4:54 PM, Robert Haas  wrote:
>
> On Sat, Oct 17, 2015 at 2:44 AM, Amit Kapila 
wrote:
> > I am not denying from that fact, the point I wanted to convey here is
that
> > the logic guarded by "params != estate->paramLI" in plpgsql_param_fetch
> > is only needed if cursors are in use otherwise we won't need them even
> > for parallel query.
>
> Well, I think what Noah and are trying to explain is that this is not
> true.  The problem is that, even if there are no cursors anywhere in
> the picture, there might be some variable in the param list that is
> not used by the parallel query but which, if evaluated, leads to an
> error.
>

I understand what Noah wants to say; it's just that I am not able to see
how that is possible by looking at the current code.  The code is not
straightforward in this area, so there is a good chance that I might be
missing something.


With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com


Re: [HACKERS] Parallel Seq Scan

2015-10-17 Thread Amit Kapila
On Sat, Oct 17, 2015 at 2:41 AM, Robert Haas  wrote:
>
> On Fri, Oct 16, 2015 at 2:29 AM, Amit Kapila 
wrote:
> >> Yeah, but I think the scenario is legitimate.  When a query gets run
> >> from within PL/pgsql, parallelism is an option, at least as we have
> >> the code today.  So if a Gather were present, and the query used a
> >> parameter, then you could have this issue.  For example:
> >>
> >> SELECT * FROM bigtable WHERE unindexed_column = some_plpgsql_variable;
> >>
> >
> > I don't think for such statements the control flow will set up an
> > unshared param list.  I have tried a couple of such statements [1] and
> > found that such parameters are always set up by setup_param_list().  I
> > think there are only two possibilities which could lead to setting up
> > unshared params:
> >
> > 1. Usage of cursors - This is already prohibited for parallel-mode.
> > 2. Usage of read-write-param - This only happens for expressions like
> > x := array_append(x, foo) (Refer exec_check_rw_parameter()).  Read-write
> > params are not used for SQL statements. So this also won't be used for
> > parallel-mode
> >
> > There is a chance that I might be missing some case where unshared
> > params will be required for parallel-mode (as of today), but if not then
> > I think we can live without current changes.
>
> *shrug*
>
> The gather-test stuff isn't failing for no reason.  Either PL/pgsql
> shouldn't be passing CURSOR_OPT_PARALLEL_OK, or having a parallel plan
> get generated there should work.  There's not a third option.
>

Agreed, and on looking at the code, I think that in the code below, if we
pass parallelOK as true for cases where the Portal is non-null, such a
problem could happen.


static int
exec_run_select(PLpgSQL_execstate *estate,
                PLpgSQL_expr *expr, long maxtuples, Portal *portalP,
                bool parallelOK)
{
    ParamListInfo paramLI;
    int         rc;

    /*
     * On the first call for this expression generate the plan
     */
    if (expr->plan == NULL)
        exec_prepare_plan(estate, expr, parallelOK ?
                          CURSOR_OPT_PARALLEL_OK : 0);

    /*
     * If a portal was requested, put the query into the portal
     */
    if (portalP != NULL)
    {
        /*
         * Set up short-lived ParamListInfo
         */
        paramLI = setup_unshared_param_list(estate, expr);

        *portalP = SPI_cursor_open_with_paramlist(NULL, expr->plan,
                                                  paramLI,
                                                  estate->readonly_func);



and one such case is

exec_stmt_return_query()
{
..
    if (stmt->query != NULL)
    {
        /* static query */
        exec_run_select(estate, stmt->query, 0, &portal, true);
..
}


In this function we are using a controlled fetch mechanism (with a count of
50) to fetch the tuples, which we initially thought of not supporting for
parallelism, as such a mechanism is not built for parallel workers; that is
the reason we want to prohibit cases wherever a cursor is getting created.

Do we want to support parallelism for this case on the basis that this API
will eventually fetch all the tuples by using the controlled fetch mechanism?


With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com


Re: [HACKERS] Parallel Seq Scan

2015-10-17 Thread Amit Kapila
On Sat, Oct 17, 2015 at 12:06 PM, Noah Misch  wrote:
>
> On Sat, Oct 17, 2015 at 11:00:57AM +0530, Amit Kapila wrote:
> > On Sat, Oct 17, 2015 at 6:15 AM, Noah Misch  wrote:
> > > On Thu, Oct 15, 2015 at 04:30:01PM +0530, Amit Kapila wrote:
> > > > On Mon, Oct 12, 2015 at 9:16 PM, Robert Haas 
wrote:
> > > > > plpgsql_param_fetch() assumes that it can detect whether it's
being
> > > > > called from copyParamList() by checking whether params !=
> > > > > estate->paramLI.  I don't know why this works, but I do know that
this
> > > > > test fails to detect the case where it's being called from
> > > > > SerializeParamList(), which causes failures in exec_eval_datum()
as
> > > > > predicted.  Calls from SerializeParamList() need the same
treatment as
> > > > > calls from copyParamList() because it, too, will try to evaluate
every
> > > > > parameter in the list.
> > > >
> > > > From what I understood by looking at code in this area, I think the
> > check
> > > > params != estate->paramLI and code under it is required for
parameters
> > > > that are setup by setup_unshared_param_list().  Now unshared params
> > > > are only created for Cursors and expressions that are passing a R/W
> > > > object pointer; for cursors we explicitly prohibit the parallel
> > > > plan generation
> > > > and I am not sure if it makes sense to generate parallel plans for
> > > > expressions
> > > > involving R/W object pointer, if we don't generate parallel plan
where
> > > > expressions involve such parameters, then SerializeParamList()
should
> > not
> > > > be affected by the check mentioned by you.
> > >
> > > The trouble comes from the opposite direction.  A
setup_unshared_param_list()
> > > list is fine under today's code, but a shared param list needs more
help.  To
> > > say it another way, parallel queries that use the shared
estate->paramLI need,
> > > among other help, the logic now guarded by "params !=
estate->paramLI".
> > >
> >
> > Why would a parallel query need such a logic, that logic is needed
mainly
> > for cursor with params and we don't want a parallelize such cases?
>
> This is not about mixing cursors with parallelism.  Cursors get special
> treatment because each cursor copies its param list.  Parallel query also
> copies (more precisely, serializes) its param list.  You need certain
logic
> for every param list subject to being copied.
>

I am not denying that fact; the point I wanted to convey here is that
the logic guarded by "params != estate->paramLI" in plpgsql_param_fetch
is only needed if cursors are in use; otherwise we won't need it even
for parallel query.



With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com


Re: [HACKERS] Parallel Seq Scan

2015-10-17 Thread Amit Kapila
On Sat, Oct 17, 2015 at 11:45 AM, Amit Kapila 
wrote:
>
> Agreed and on looking at code, I think in below code, if we pass
> parallelOK as true for the cases where Portal is non-null, such a
> problem could happen.
>
>
> static int
> exec_run_select(PLpgSQL_execstate *estate,
>                 PLpgSQL_expr *expr, long maxtuples, Portal *portalP,
>                 bool parallelOK)
> {
>     ParamListInfo paramLI;
>     int         rc;
>
>     /*
>      * On the first call for this expression generate the plan
>      */
>     if (expr->plan == NULL)
>         exec_prepare_plan(estate, expr, parallelOK ?
>                           CURSOR_OPT_PARALLEL_OK : 0);
>
>     /*
>      * If a portal was requested, put the query into the portal
>      */
>     if (portalP != NULL)
>     {
>         /*
>          * Set up short-lived ParamListInfo
>          */
>         paramLI = setup_unshared_param_list(estate, expr);
>
>         *portalP = SPI_cursor_open_with_paramlist(NULL, expr->plan,
>                                                   paramLI,
>                                                   estate->readonly_func);
>
> and one such case is
>
> exec_stmt_return_query()
> {
> ..
>     if (stmt->query != NULL)
>     {
>         /* static query */
>         exec_run_select(estate, stmt->query, 0, &portal, true);
> ..
> }
>
>
> In this function we are using controlled fetch mechanism (count as 50) to
> fetch the tuples which we initially thought of not supporting for
parallelism,
> as such a mechanism is not built for parallel workers and that is the
> reason we want to prohibit cases where ever cursor is getting created.
>

Here, I wanted to say that this is one of the reasons for prohibiting
cursors for parallelism; certainly there are others, like backward scans.


With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com


Re: [HACKERS] Parallel Seq Scan

2015-10-17 Thread Noah Misch
On Sat, Oct 17, 2015 at 11:00:57AM +0530, Amit Kapila wrote:
> On Sat, Oct 17, 2015 at 6:15 AM, Noah Misch  wrote:
> > On Thu, Oct 15, 2015 at 04:30:01PM +0530, Amit Kapila wrote:
> > > On Mon, Oct 12, 2015 at 9:16 PM, Robert Haas  
> > > wrote:
> > > > plpgsql_param_fetch() assumes that it can detect whether it's being
> > > > called from copyParamList() by checking whether params !=
> > > > estate->paramLI.  I don't know why this works, but I do know that this
> > > > test fails to detect the case where it's being called from
> > > > SerializeParamList(), which causes failures in exec_eval_datum() as
> > > > predicted.  Calls from SerializeParamList() need the same treatment as
> > > > calls from copyParamList() because it, too, will try to evaluate every
> > > > parameter in the list.
> > >
> > > From what I understood by looking at code in this area, I think the
> check
> > > params != estate->paramLI and code under it is required for parameters
> > > that are setup by setup_unshared_param_list().  Now unshared params
> > > are only created for Cursors and expressions that are passing a R/W
> > > object pointer; for cursors we explicitly prohibit the parallel
> > > plan generation
> > > and I am not sure if it makes sense to generate parallel plans for
> > > expressions
> > > involving R/W object pointer, if we don't generate parallel plan where
> > > expressions involve such parameters, then SerializeParamList() should
> not
> > > be affected by the check mentioned by you.
> >
> > The trouble comes from the opposite direction.  A 
> > setup_unshared_param_list()
> > list is fine under today's code, but a shared param list needs more help.  
> > To
> > say it another way, parallel queries that use the shared estate->paramLI 
> > need,
> > among other help, the logic now guarded by "params != estate->paramLI".
> >
> 
> Why would a parallel query need such a logic, that logic is needed mainly
> for cursor with params and we don't want a parallelize such cases?

This is not about mixing cursors with parallelism.  Cursors get special
treatment because each cursor copies its param list.  Parallel query also
copies (more precisely, serializes) its param list.  You need certain logic
for every param list subject to being copied.  If PostgreSQL had no concept of
cursors, we'd be writing that same logic from scratch for parallel query.


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Parallel Seq Scan

2015-10-17 Thread Robert Haas
On Sat, Oct 17, 2015 at 2:44 AM, Amit Kapila  wrote:
> I am not denying from that fact, the point I wanted to convey here is that
> the logic guarded by "params != estate->paramLI" in plpgsql_param_fetch
> is only needed if cursors are in use otherwise we won't need them even
> for parallel query.

Well, I think what Noah and I are trying to explain is that this is not
true.  The problem is that, even if there are no cursors anywhere in
the picture, there might be some variable in the param list that is
not used by the parallel query but which, if evaluated, leads to an
error.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] Parallel Seq Scan

2015-10-17 Thread Robert Haas
On Sat, Oct 17, 2015 at 2:15 AM, Amit Kapila  wrote:
> Agreed and on looking at code, I think in below code, if we pass
> parallelOK as true for the cases where Portal is non-null, such a
> problem could happen.
>
> and one such case is
>
> exec_stmt_return_query()
> {
> ..
>
> if (stmt->query != NULL)
>
> {
>
> /* static query */
>
> exec_run_select(estate, stmt->query, 0, &portal, true);
>
> ..
> }
>
>
> In this function we are using controlled fetch mechanism (count as 50) to
> fetch the tuples which we initially thought of not supporting for
> parallelism,
> as such a mechanism is not built for parallel workers and that is the
> reason we want to prohibit cases where ever cursor is getting created.
>
> Do we want to support parallelism for this case on the basis that this API
> will eventually fetch all the tuples by using controlled fetch mechanism?

That was my idea when I made that change, but I think it's not going
to work out well given the way the rest of the code works.  Possibly
that should be reverted for now, but maybe only after testing it.

It's worth noting that, as of commit
bfc78d7196eb28cd4e3d6c24f7e607bacecf1129, the consequences of invoking
the executor with a fetch count have been greatly reduced.
Previously, the assumption was that doing that was broken, and if you
did it you got to keep both pieces.  But that commit rejiggered things
so that your parallel plan just gets run serially in that case.  That
might not be great from a performance perspective, but it beats
undefined behavior by a wide margin.  So I suspect that there are some
decisions about where to pass CURSOR_OPT_PARALLEL_OK that need to be
revisited in the light of that change.  I haven't had time to do that
yet, but we should do it as soon as we get time.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] Parallel Seq Scan

2015-10-16 Thread Robert Haas
On Fri, Oct 16, 2015 at 2:29 AM, Amit Kapila  wrote:
>> Yeah, but I think the scenario is legitimate.  When a query gets run
>> from within PL/pgsql, parallelism is an option, at least as we have
>> the code today.  So if a Gather were present, and the query used a
>> parameter, then you could have this issue.  For example:
>>
>> SELECT * FROM bigtable WHERE unindexed_column = some_plpgsql_variable;
>>
>
> I don't think for such statements the control flow will set up an unshared
> param list.  I have tried couple of such statements [1] and found that
> always such parameters are set up by setup_param_list().  I think there
> are only two possibilities which could lead to setting up of unshared
> params:
>
> 1. Usage of cursors - This is already prohibited for parallel-mode.
> 2. Usage of read-write-param - This only happens for expressions like
> x := array_append(x, foo) (Refer exec_check_rw_parameter()).  Read-write
> params are not used for SQL statements. So this also won't be used for
> parallel-mode
>
> There is a chance that I might be missing some case where unshared
> params will be required for parallel-mode (as of today), but if not then
> I think we can live without current changes.

*shrug*

The gather-test stuff isn't failing for no reason.  Either PL/pgsql
shouldn't be passing CURSOR_OPT_PARALLEL_OK, or having a parallel plan
get generated there should work.  There's not a third option.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] Parallel Seq Scan

2015-10-16 Thread Amit Kapila
On Thu, Oct 15, 2015 at 6:52 PM, Robert Haas  wrote:
>
> On Thu, Oct 15, 2015 at 7:00 AM, Amit Kapila 
wrote:
> >
> > From what I understood by looking at code in this area, I think the
check
> > params != estate->paramLI and code under it is required for parameters
> > that are setup by setup_unshared_param_list().  Now unshared params
> > are only created for Cursors and expressions that are passing a R/W
> > object pointer; for cursors we explicitly prohibit the parallel plan
> > generation
> > and I am not sure if it makes sense to generate parallel plans for
> > expressions
> > involving R/W object pointer, if we don't generate parallel plan where
> > expressions involve such parameters, then SerializeParamList() should
not
> > be affected by the check mentioned by you.  Is by anychance, this is
> > happening because you are testing by forcing gather node on top of
> > all kind of plans?
>
> Yeah, but I think the scenario is legitimate.  When a query gets run
> from within PL/pgsql, parallelism is an option, at least as we have
> the code today.  So if a Gather were present, and the query used a
> parameter, then you could have this issue.  For example:
>
> SELECT * FROM bigtable WHERE unindexed_column = some_plpgsql_variable;
>

I don't think for such statements the control flow will set up an unshared
param list.  I have tried a couple of such statements [1] and found that
such parameters are always set up by setup_param_list().  I think there
are only two possibilities which could lead to setting up of unshared
params:

1. Usage of cursors - This is already prohibited for parallel-mode.
2. Usage of read-write-param - This only happens for expressions like
x := array_append(x, foo) (Refer exec_check_rw_parameter()).  Read-write
params are not used for SQL statements. So this also won't be used for
parallel-mode.

There is a chance that I might be missing some case where unshared
params will be required for parallel-mode (as of today), but if not then
I think we can live without current changes.

[1] -
1.
create or replace function parallel_func_params() returns integer
as $$
declare
param_val int;
ret_val int;
begin
 param_val := 1000;
 Select c1 into ret_val from t1 where c1 = param_val;
 return ret_val;
end;
$$ language plpgsql;

For such a query, it will go in setup_param_list()

2.
create or replace function parallel_func_params_1() returns integer
as $$
declare
param_val int;
ret_val int;
begin
 param_val := 1000;
 Execute 'Select count(c1) from t1 where c1 = $1' Into ret_val Using
param_val;
 return ret_val;
end;
$$ language plpgsql;


3.
create or replace function parallel_func_params_2() returns integer
as $$
declare
param_val int;
ret_val int;
row_var t1%ROWTYPE;
begin
 param_val := 1000;
 Select * into row_var from t1 where c1 = param_val;
 return ret_val;
end;
$$ language plpgsql;


With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com


Re: [HACKERS] Parallel Seq Scan

2015-10-16 Thread Amit Kapila
On Fri, Oct 16, 2015 at 9:51 AM, Robert Haas  wrote:
>
> On Mon, Oct 5, 2015 at 8:20 AM, Amit Kapila 
wrote:
> > [ new patch for heapam.c changes ]
>
> I went over this in a fair amount of detail and reworked it somewhat.
> The result is attached as parallel-heapscan-revised.patch.  I think
> the way I did this is a bit cleaner than what you had, although it's
> basically the same thing.  There are fewer changes to initscan, and we
> don't need one function to initialize the starting block that must be
> called in each worker and then another one to get the next block, and
> generally the changes are a bit more localized.  I also went over the
> comments and, I think, improved them.  I tweaked the logic for
> reporting the starting scan position as the last position report; I
> think the way you had it the last report would be for one block
> earlier.  I'm pretty happy with this version and hope to commit it
> soon.
>

static BlockNumber
+heap_parallelscan_nextpage(HeapScanDesc scan)
+{
+ BlockNumber page = InvalidBlockNumber;
+ BlockNumber sync_startpage = InvalidBlockNumber;
+ BlockNumber report_page = InvalidBlockNumber;

..
..
+ * Report scan location.  Normally, we report the current page number.
+ * When we reach the end of the scan, though, we report the starting page,
+ * not the ending page, just so the starting positions for later scans
+ * doesn't slew backwards.  We only report the position at the end of the
+ * scan once, though: subsequent callers will have report nothing, since
+ * they will have page == InvalidBlockNumber.
+ */
+ if (scan->rs_syncscan)
+ {
+ if (report_page == InvalidBlockNumber)
+ report_page = page;
+ if (report_page != InvalidBlockNumber)
+ ss_report_location(scan->rs_rd, report_page);
+ }
..
}

I think due to the above changes it will report the sync location on each
page scan; don't we want to report it only once, at the end of the scan?

Other than that, patch looks okay to me.

With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com


Re: [HACKERS] Parallel Seq Scan

2015-10-16 Thread Robert Haas
On Fri, Oct 16, 2015 at 7:42 AM, Amit Kapila  wrote:
> I think due to above changes it will report sync location on each page
> scan, don't we want to report it once at end of scan?

I think reporting for each page is correct.  Isn't that what the
non-parallel case does?

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] Parallel Seq Scan

2015-10-16 Thread Robert Haas
On Thu, Oct 15, 2015 at 11:38 PM, Haribabu Kommi
 wrote:
> Some more tests that failed in similar configuration settings.
> 1. Table that is created under a begin statement is not visible in the worker.
> 2. permission problem in worker side for set role command.

The second problem, too, I have already posted a bug fix for, on a
thread which also contains a whole bunch of other bug fixes.  I'll get
those committed today so you can avoid wasting time finding bugs I've
already found and fixed.  Thanks for testing.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] Parallel Seq Scan

2015-10-16 Thread Amit Kapila
On Fri, Oct 16, 2015 at 5:31 PM, Robert Haas  wrote:
>
> On Fri, Oct 16, 2015 at 7:42 AM, Amit Kapila 
wrote:
> > I think due to above changes it will report sync location on each page
> > scan, don't we want to report it once at end of scan?
>
> I think reporting for each page is correct.  Isn't that what the
> non-parallel case does?
>

Yes, sorry I got confused.


With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com


Re: [HACKERS] Parallel Seq Scan

2015-10-16 Thread Noah Misch
On Thu, Oct 15, 2015 at 04:30:01PM +0530, Amit Kapila wrote:
> On Mon, Oct 12, 2015 at 9:16 PM, Robert Haas  wrote:
> > plpgsql_param_fetch() assumes that it can detect whether it's being
> > called from copyParamList() by checking whether params !=
> > estate->paramLI.  I don't know why this works, but I do know that this
> > test fails to detect the case where it's being called from
> > SerializeParamList(), which causes failures in exec_eval_datum() as
> > predicted.  Calls from SerializeParamList() need the same treatment as
> > calls from copyParamList() because it, too, will try to evaluate every
> > parameter in the list.
> 
> From what I understood by looking at code in this area, I think the check
> params != estate->paramLI and code under it is required for parameters
> that are setup by setup_unshared_param_list().  Now unshared params
> are only created for Cursors and expressions that are passing a R/W
> object pointer; for cursors we explicitly prohibit the parallel
> plan generation
> and I am not sure if it makes sense to generate parallel plans for
> expressions
> involving R/W object pointer, if we don't generate parallel plan where
> expressions involve such parameters, then SerializeParamList() should not
> be affected by the check mentioned by you.

The trouble comes from the opposite direction.  A setup_unshared_param_list()
list is fine under today's code, but a shared param list needs more help.  To
say it another way, parallel queries that use the shared estate->paramLI need,
among other help, the logic now guarded by "params != estate->paramLI".




Re: [HACKERS] Parallel Seq Scan

2015-10-16 Thread Amit Kapila
On Sat, Oct 17, 2015 at 6:15 AM, Noah Misch  wrote:
>
> On Thu, Oct 15, 2015 at 04:30:01PM +0530, Amit Kapila wrote:
> > On Mon, Oct 12, 2015 at 9:16 PM, Robert Haas 
wrote:
> > > plpgsql_param_fetch() assumes that it can detect whether it's being
> > > called from copyParamList() by checking whether params !=
> > > estate->paramLI.  I don't know why this works, but I do know that this
> > > test fails to detect the case where it's being called from
> > > SerializeParamList(), which causes failures in exec_eval_datum() as
> > > predicted.  Calls from SerializeParamList() need the same treatment as
> > > calls from copyParamList() because it, too, will try to evaluate every
> > > parameter in the list.
> >
> > From what I understood by looking at code in this area, I think the
check
> > params != estate->paramLI and code under it is required for parameters
> > that are setup by setup_unshared_param_list().  Now unshared params
> > are only created for Cursors and expressions that are passing a R/W
> > object pointer; for cursors we explicitly prohibit the parallel
> > plan generation
> > and I am not sure if it makes sense to generate parallel plans for
> > expressions
> > involving R/W object pointer, if we don't generate parallel plan where
> > expressions involve such parameters, then SerializeParamList() should
not
> > be affected by the check mentioned by you.
>
> The trouble comes from the opposite direction.  A
setup_unshared_param_list()
> list is fine under today's code, but a shared param list needs more
help.  To
> say it another way, parallel queries that use the shared estate->paramLI
need,
> among other help, the logic now guarded by "params != estate->paramLI".
>

Why would a parallel query need such logic?  That logic is needed mainly
for cursors with params, and we don't want to parallelize such cases.


With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com


Re: [HACKERS] Parallel Seq Scan

2015-10-15 Thread Amit Kapila
On Thu, Oct 15, 2015 at 5:39 PM, Haribabu Kommi 
wrote:
>
>
>
> On Thu, Oct 15, 2015 at 6:32 PM, Amit Kapila 
wrote:
> > On Wed, Oct 14, 2015 at 3:29 AM, Robert Haas 
wrote:
> > I think this got messed up while rebasing on top of Gather node
> > changes, but nonetheless, I have changed it such that PartialSeqScan
> > node handling is after SeqScan.
>
> Currently, the explain analyze of parallel seq scan plan is not showing
the allocated number of workers
> including the planned workers.I feel this information is good for users
in understanding the performance
> difference that is coming with parallel seq scan. It may be missed in
recent patch series. It was discussed
> in[1].
>

I am aware of that and purposely left it for a subsequent patch.
There are other things as well which I have left out of this patch,
and those are:
a. Early stop of executor for Rescan purpose
b. Support of pushdown for plans containing InitPlan and SubPlans

Then there is more related work like
a. Support for prepared statements

Basically I think this could be done as add on patches once we are
done with basic patch.

> Currently there is no qualification evaluation at Result and Gather
nodes, because of this reason, if any
> query that contains any parallel restricted functions is not chosen for
parallel scan. Because of
> this reason, there is no difference between parallel restricted and
parallel unsafe functions currently.
>

This requires new infrastructure to pull restricted functions from
lower nodes to Gather node, so I have left it for another day.



With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com


Re: [HACKERS] Parallel Seq Scan

2015-10-15 Thread Haribabu Kommi
On Thu, Oct 15, 2015 at 6:32 PM, Amit Kapila 
wrote:
> On Wed, Oct 14, 2015 at 3:29 AM, Robert Haas 
wrote:
> I think this got messed up while rebasing on top of Gather node
> changes, but nonetheless, I have changed it such that PartialSeqScan
> node handling is after SeqScan.

Currently, the explain analyze of a parallel seq scan plan is not showing
the allocated number of workers, including the planned workers.  I feel this
information is good for users in understanding the performance difference
that comes with parallel seq scan.  It may have been missed in the recent
patch series.  It was discussed in [1].

Currently there is no qualification evaluation at the Result and Gather
nodes; because of this, any query that contains parallel-restricted
functions is not chosen for a parallel scan.  As a consequence, there is
currently no difference between parallel-restricted and parallel-unsafe
functions.  Is that fine for the first version?

[1]
http://www.postgresql.org/message-id/ca+tgmobhq0_+yobmlbjexvt4qef6xblfudax1owl-ivgan5...@mail.gmail.com


Regards,
Hari Babu
Fujitsu Australia


Re: [HACKERS] Parallel Seq Scan

2015-10-15 Thread Amit Kapila
On Mon, Oct 12, 2015 at 9:16 PM, Robert Haas  wrote:
>
> On Sun, Oct 11, 2015 at 7:56 PM, Noah Misch  wrote:
> > I see no mention in this thread of varatt_indirect, but I anticipated
> > datumSerialize() reacting to it the same way datumCopy() reacts.  If
> > datumSerialize() can get away without doing so, why is that?
>
> Good point.  I don't think it can.  Attached is a patch to fix that.
> This patch also includes some somewhat-related changes to
> plpgsql_param_fetch() upon which I would appreciate any input you can
> provide.
>
> plpgsql_param_fetch() assumes that it can detect whether it's being
> called from copyParamList() by checking whether params !=
> estate->paramLI.  I don't know why this works, but I do know that this
> test fails to detect the case where it's being called from
> SerializeParamList(), which causes failures in exec_eval_datum() as
> predicted.  Calls from SerializeParamList() need the same treatment as
> calls from copyParamList() because it, too, will try to evaluate every
> parameter in the list.
>

From what I understood by looking at the code in this area, I think the check
params != estate->paramLI and code under it is required for parameters
that are setup by setup_unshared_param_list().  Now unshared params
are only created for Cursors and expressions that are passing a R/W
object pointer; for cursors we explicitly prohibit the parallel
plan generation
and I am not sure if it makes sense to generate parallel plans for
expressions
involving R/W object pointer, if we don't generate parallel plan where
expressions involve such parameters, then SerializeParamList() should not
be affected by the check mentioned by you.  Is this by any chance
happening because you are testing by forcing a Gather node on top of
all kinds of plans?


With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com


Re: [HACKERS] Parallel Seq Scan

2015-10-15 Thread Amit Kapila
On Wed, Oct 14, 2015 at 3:29 AM, Robert Haas  wrote:
>
> On Tue, Oct 13, 2015 at 2:45 AM, Amit Kapila 
wrote:
> > Attached is rebased patch for partial seqscan support.
>
> Review comments:
>
> - If you're going to pgindent execParallel.c, you need to add some
> entries to typedefs.list so it doesn't mangle the formatting.
> ExecParallelEstimate's parameter list is misformatted, for example.
> Also, I think if we're going to do this we had better extract the
> pgindent changes and commit those first.  It's pretty distracting the
> way you have it.
>

Agreed, we can do those separately. So, I have reverted those changes.

> - Instead of inlining the work needed by each parallel mode in
> ExecParallelEstimate(), I think you should mimic the style of
> ExecProcNode and call a node-type specific function that is part of
> that node's public interface - here, ExecPartialSeqScanEstimate,
> perhaps.  Similarly for ExecParallelInitializeDSM.  Perhaps
> ExecPartialSeqScanInitializeDSM.
>
> - I continue to think GetParallelShmToc is the wrong approach.
> Instead, each time ExecParallelInitializeDSM or
> ExecParallelInitializeDSM calls a nodetype-specific initialized
> function (as described in the previous point), have it pass d->pcxt as
> an argument.  The node can get the toc from there if it needs it.  I
> suppose it could store a pointer to the toc in its scanstate if it
> needs it, but it really shouldn't.  Instead, it should store a pointer
> to, say, the ParallelHeapScanDesc in the scanstate.  It really should
> only care about its own shared data, so once it finds that, the toc
> shouldn't be needed any more. Then ExecPartialSeqScan doesn't need to
> look up pscan; it's already recorded in the scanstate.
>
> - ExecParallelInitializeDSMContext's new pscan_len member is 100%
> wrong.  Individual scan nodes don't get to add stuff to that context
> object.  They should store details like this in their own ScanState as
> needed.
>

Changed the patch as per above suggestions.

> - The positioning of the new nodes in various lists doesn't seem to
> be entirely consistent.  nodes.h adds them after SampleScan, which isn't
> terrible, though maybe immediately after SeqScan would be better, but
> _outNode has it right after BitmapOr and the switch in _copyObject has
> it somewhere else again.
>

I think this got messed up while rebasing on top of Gather node
changes, but nonetheless, I have changed it such that PartialSeqScan
node handling is after SeqScan.


With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com


parallel_seqscan_partialseqscan_v21.patch
Description: Binary data

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Parallel Seq Scan

2015-10-15 Thread Robert Haas
On Thu, Oct 15, 2015 at 1:51 AM, Noah Misch  wrote:
> Tests of prm->prmtype and paramLI->paramFetch appear superfluous.  Given that
> the paramListCopyHook callback would return a complete substitute
> ParamListInfo, I wouldn't expect SerializeParamList() to examine the the
> original paramLI->params at all.  If that's correct, the paramListCopyHook
> design sounds fine.  However, its implementation will be more complex than
> paramMask would have been.

Well, I think there are two use cases we care about.  If the
ParamListInfo came from Bind parameters sent via a protocol message,
then it will neither have a copy method nor require one.  If it came
from some source that plays fancy games, like PL/pgsql, then it needs
a safe way to copy the list.

>> This wouldn't require any
>> modification to the current plpgsql_param_fetch() at all, but the new
>> function would steal its bms_is_member() test.  Furthermore, no user
>> of ParamListInfo other than plpgsql needs to care at all -- which,
>> with your proposals, they would.
>
> To my knowledge, none of these approaches would compel existing users to care.
> They would leave paramMask or paramListCopyHook NULL and get today's behavior.

Well, looking at this proposal:

Bitmapset  *paramMask;  /* if non-NULL, ignore params lacking a 1-bit */

I read that to imply that every consumer of ParamListInfo objects
would need to account for the possibility of getting one with a
non-NULL paramMask.  Would it work to define this as "if non-NULL,
params lacking a 1-bit may be safely ignored"?  Or some other tweak
that basically says that you don't need to care about this, but you
can if you want to.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] Parallel Seq Scan

2015-10-15 Thread Robert Haas
On Thu, Oct 15, 2015 at 7:00 AM, Amit Kapila  wrote:
> On Mon, Oct 12, 2015 at 9:16 PM, Robert Haas  wrote:
>> On Sun, Oct 11, 2015 at 7:56 PM, Noah Misch  wrote:
>> > I see no mention in this thread of varatt_indirect, but I anticipated
>> > datumSerialize() reacting to it the same way datumCopy() reacts.  If
>> > datumSerialize() can get away without doing so, why is that?
>>
>> Good point.  I don't think it can.  Attached is a patch to fix that.
>> This patch also includes some somewhat-related changes to
>> plpgsql_param_fetch() upon which I would appreciate any input you can
>> provide.
>>
>> plpgsql_param_fetch() assumes that it can detect whether it's being
>> called from copyParamList() by checking whether params !=
>> estate->paramLI.  I don't know why this works, but I do know that this
>> test fails to detect the case where it's being called from
>> SerializeParamList(), which causes failures in exec_eval_datum() as
>> predicted.  Calls from SerializeParamList() need the same treatment as
>> calls from copyParamList() because it, too, will try to evaluate every
>> parameter in the list.
>
> From what I understood by looking at the code in this area, I think the
> check params != estate->paramLI and the code under it are required for
> parameters that are set up by setup_unshared_param_list().  Now, unshared
> params are only created for cursors and for expressions that pass a R/W
> object pointer; for cursors we explicitly prohibit parallel plan
> generation, and I am not sure it makes sense to generate parallel plans
> for expressions involving a R/W object pointer.  If we don't generate
> parallel plans where expressions involve such parameters, then
> SerializeParamList() should not be affected by the check you mentioned.
> Is this by any chance happening because you are testing by forcing a
> Gather node on top of all kinds of plans?

Yeah, but I think the scenario is legitimate.  When a query gets run
from within PL/pgsql, parallelism is an option, at least as we have
the code today.  So if a Gather were present, and the query used a
parameter, then you could have this issue.  For example:

SELECT * FROM bigtable WHERE unindexed_column = some_plpgsql_variable;

So this can happen, I think, even with parallel sequential scan only,
and even if a Gather node is not otherwise forced.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] Parallel Seq Scan

2015-10-15 Thread Haribabu Kommi
On Fri, Oct 16, 2015 at 2:10 PM, Haribabu Kommi
 wrote:
> On Thu, Oct 15, 2015 at 11:45 PM, Amit Kapila  wrote:
>> On Thu, Oct 15, 2015 at 5:39 PM, Haribabu Kommi 
>> wrote:
>>>
>>>
>>>
>>> On Thu, Oct 15, 2015 at 6:32 PM, Amit Kapila 
>>> wrote:
>>> > On Wed, Oct 14, 2015 at 3:29 AM, Robert Haas 
>>> > wrote:
>>> > I think this got messed up while rebasing on top of Gather node
>>> > changes, but nonetheless, I have changed it such that PartialSeqScan
>>> > node handling is after SeqScan.
>>>
>>> Currently, the explain analyze output of a parallel seq scan plan does not
>>> show the number of workers allocated, or the number planned.  I feel this
>>> information helps users understand the performance difference that comes
>>> with parallel seq scan.  It may have been missed in the recent patch
>>> series; it was discussed in [1].
>>>
>>
>> I am aware of that and have deliberately left it for a subsequent patch.
>> There are other things as well which I have left out from this patch
>> and those are:
>> a. Early stop of executor for Rescan purpose
>> b. Support of pushdown for plans containing InitPlan and SubPlans
>>
>> Then there is more related work like
>> a. Support for prepared statements
>>
>
> OK.
>
> During testing with the latest patch, I found a deadlock between a worker
> and the backend on a relation lock.  To minimize the test scenario, I
> changed the number of pages required to start one worker to 1 and set all
> parallel cost parameters to zero.
>
> The backend is waiting for tuples from the workers, while the workers are
> waiting on a lock on the relation.
> Attached is an SQL script that can reproduce this issue.

Some more tests that failed with similar configuration settings:
1. A table created inside a BEGIN block is not visible in the worker.
2. A permission problem on the worker side for the SET ROLE command.


Regards,
Hari Babu
Fujitsu Australia


parallel_table_doesn't_exist_problem.sql
Description: Binary data


parallel_set_role_permission_problem.sql
Description: Binary data


