Re: [HACKERS] Parallel query execution with SPI
On 31.03.2017 13:48, Robert Haas wrote:
> On Fri, Mar 31, 2017 at 3:33 AM, Konstantin Knizhnik wrote:
>> Is it possible to execute a query in parallel using SPI? If so, how can it be enforced?
>> I tried to open a cursor with the CURSOR_OPT_PARALLEL_OK flag, but it doesn't help:
>> the query is executed by a single backend, while the same query launched at top level
>> uses a parallel plan:
>>
>> fsstate->portal = SPI_cursor_open_with_args(NULL, fsstate->query,
>> fsstate->numParams, argtypes, values, nulls, true, CURSOR_OPT_PARALLEL_OK);
>> ...
>> SPI_cursor_fetch(fsstate->portal, true, 1);
>
> Parallel execution isn't possible if you are using a cursor-type interface, because a
> parallel query can't be suspended and resumed like a non-parallel query. If you use a
> function that executes the query to completion in one go, like SPI_execute_plan, then
> it's cool. See also commit 61c2e1a95f94bb904953a6281ce17a18ac38ee6d.

Thank you very much for the explanation. With SPI_execute, the query is indeed executed in parallel. But this means that when I execute a query using SPI, I need to somehow predict the number of returned tuples. If it is small, then it is better to use SPI_execute to allow parallel execution of the query. But if it is large, SPI_execute without a limit can cause memory overflow. Certainly I can specify some reasonable limit and, if it is reached, switch to a cursor instead. But that is neither convenient nor efficient. I wonder if somebody can suggest a better solution?

-- Konstantin Knizhnik
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company

-- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
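[Editor's note: as a reference for later readers, here is a minimal sketch of the approach Robert describes above -- prepare the statement with CURSOR_OPT_PARALLEL_OK and run it to completion with SPI_execute_plan, instead of fetching through a cursor. This is an illustrative fragment, not tested code: it assumes it runs inside extension code that has already called SPI_connect(), and the fsstate fields simply mirror the snippet quoted above.]

```c
/* Sketch only: run the query to completion so a parallel plan is usable. */
SPIPlanPtr plan = SPI_prepare_cursor(fsstate->query,
                                     fsstate->numParams,
                                     argtypes,
                                     CURSOR_OPT_PARALLEL_OK);

/*
 * count = 0 means "fetch all rows".  A nonzero count (like a cursor
 * fetch) would let execution stop early, which at the time of this
 * thread also disabled parallelism in ExecutePlan.
 */
int ret = SPI_execute_plan(plan, values, nulls,
                           true /* read_only */ , 0 /* count */ );

if (ret == SPI_OK_SELECT)
{
    /*
     * SPI_processed rows are now materialized in SPI_tuptable.  A large
     * result set is held fully in memory -- the trade-off Konstantin
     * raises in his follow-up.
     */
}
```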
Re: [HACKERS] Parallel query execution with SPI
On Fri, Mar 31, 2017 at 4:18 PM, Robert Haas wrote:
> On Fri, Mar 31, 2017 at 3:33 AM, Konstantin Knizhnik wrote:
>> It is possible to execute query concurrently using SPI? If so, how it can be enforced?
>> I tried to open cursor with CURSOR_OPT_PARALLEL_OK flag but it doesn't help:
>> query is executed by single backend while the same query been launched at
>> top level uses parallel plan:
>>
>> fsstate->portal = SPI_cursor_open_with_args(NULL, fsstate->query,
>> fsstate->numParams, argtypes, values, nulls, true, CURSOR_OPT_PARALLEL_OK);
>> ...
>> SPI_cursor_fetch(fsstate->portal, true, 1);
>
> Parallel execution isn't possible if you are using a cursor-type
> interface, because a parallel query can't be suspended and resumed
> like a non-parallel query. If you use a function that executes the
> query to completion in one go, like SPI_execute_plan, then it's cool.
> See also commit 61c2e1a95f94bb904953a6281ce17a18ac38ee6d.

Adding to that, for your case, passing CURSOR_OPT_PARALLEL_OK is not enough, because PortalRun for the cursor would have portal->run_once set to false, which restricts parallelism in ExecutePlan:

    if (!execute_once || dest->mydest == DestIntoRel)
        use_parallel_mode = false;

You may check [1] for the discussion on this.

[1] https://www.postgresql.org/message-id/flat/CAFiTN-vxhvvi-rMJFOxkGzNaQpf%2BKS76%2Bsu7-sG_NQZGRPJkQg%40mail.gmail.com

-- Regards, Rafia Sabih
EnterpriseDB: http://www.enterprisedb.com/
Re: [HACKERS] Parallel query execution with SPI
On Fri, Mar 31, 2017 at 3:33 AM, Konstantin Knizhnik wrote:
> It is possible to execute query concurrently using SPI? If so, how it can be enforced?
> I tried to open cursor with CURSOR_OPT_PARALLEL_OK flag but it doesn't help:
> query is executed by single backend while the same query been launched at
> top level uses parallel plan:
>
> fsstate->portal = SPI_cursor_open_with_args(NULL, fsstate->query,
> fsstate->numParams, argtypes, values, nulls, true, CURSOR_OPT_PARALLEL_OK);
> ...
> SPI_cursor_fetch(fsstate->portal, true, 1);

Parallel execution isn't possible if you are using a cursor-type interface, because a parallel query can't be suspended and resumed like a non-parallel query. If you use a function that executes the query to completion in one go, like SPI_execute_plan, then it's cool. See also commit 61c2e1a95f94bb904953a6281ce17a18ac38ee6d.

-- Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
Re: [HACKERS] [parallel query] random server crash while running tpc-h query on power2
On Tue, Aug 16, 2016 at 1:05 AM, Rushabh Lathia wrote:
> I agree, this makes sense.
>
> Here is the patch to allocate worker instrumentation in the same context
> as the regular instrumentation, which is the per-query context.

Looks good, committed. I am not sure it was a very good idea for af33039317ddc4a0e38a02e2255c2bf453115fd2 by Tom Lane to change the current memory context for the entire execution of gather_readnext(); this might not be the only or the last bug that results from that decision. However, I don't really want to get into an argument about that right now, and this at least fixes the problem we know about. Thanks for the report and patch.

-- Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
Re: [HACKERS] [parallel query] random server crash while running tpc-h query on power2
On Mon, Aug 15, 2016 at 6:02 PM, Robert Haas wrote:
> On Sat, Aug 13, 2016 at 4:36 AM, Amit Kapila wrote:
>> AFAICS, your patch seems to be the right fix for this issue, unless we
>> need the instrumentation information during execution (other than for
>> explain) for some purpose.
>
> Hmm, I disagree. It should be the job of
> ExecParallelRetrieveInstrumentation to allocate its data in the
> correct context, not the responsibility of nodeGather.c to work around
> the fact that it doesn't. The worker instrumentation should be
> allocated in the same context as the regular instrumentation
> information, which I assume is probably the per-query context.

I agree, this makes sense.

Here is the patch to allocate the worker instrumentation in the same context as the regular instrumentation, which is the per-query context. PFA patch.

-- Rushabh Lathia
www.EnterpriseDB.com

diff --git a/src/backend/executor/execParallel.c b/src/backend/executor/execParallel.c
index 380d743..5aa6f02 100644
--- a/src/backend/executor/execParallel.c
+++ b/src/backend/executor/execParallel.c
@@ -500,6 +500,7 @@ ExecParallelRetrieveInstrumentation(PlanState *planstate,
 	int			n;
 	int			ibytes;
 	int			plan_node_id = planstate->plan->plan_node_id;
+	MemoryContext oldcontext;
 
 	/* Find the instumentation for this node. */
 	for (i = 0; i < instrumentation->num_plan_nodes; ++i)
@@ -514,10 +515,19 @@ ExecParallelRetrieveInstrumentation(PlanState *planstate,
 	for (n = 0; n < instrumentation->num_workers; ++n)
 		InstrAggNode(planstate->instrument, &instrument[n]);
 
-	/* Also store the per-worker detail. */
+	/*
+	 * Also store the per-worker detail.
+	 *
+	 * Worker instrumentation should be allocated in the same context as
+	 * the regular instrumentation information, which is the per-query
+	 * context. Switch into per-query memory context.
+	 */
+	oldcontext = MemoryContextSwitchTo(planstate->state->es_query_cxt);
 	ibytes = mul_size(instrumentation->num_workers, sizeof(Instrumentation));
 	planstate->worker_instrument =
 		palloc(ibytes + offsetof(WorkerInstrumentation, instrument));
+	MemoryContextSwitchTo(oldcontext);
+
 	planstate->worker_instrument->num_workers = instrumentation->num_workers;
 	memcpy(&planstate->worker_instrument->instrument, instrument, ibytes);
Re: [HACKERS] [parallel query] random server crash while running tpc-h query on power2
On Sat, Aug 13, 2016 at 4:36 AM, Amit Kapila wrote:
> AFAICS, your patch seems to be the right fix for this issue, unless we
> need the instrumentation information during execution (other than for
> explain) for some purpose.

Hmm, I disagree. It should be the job of ExecParallelRetrieveInstrumentation to allocate its data in the correct context, not the responsibility of nodeGather.c to work around the fact that it doesn't. The worker instrumentation should be allocated in the same context as the regular instrumentation information, which I assume is probably the per-query context.

-- Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
Re: [HACKERS] [parallel query] random server crash while running tpc-h query on power2
On Sat, Aug 13, 2016 at 11:10 AM, Rushabh Lathia wrote:
> Hi All,
>
> Recently, while running tpc-h queries on the postgresql master branch, I
> noticed random server crashes. Most of the time the crash comes while
> running tpc-h query number 3 (but it's very random).
>
> Here it is clear that worker_instrument is either corrupted or
> uninitialized, and that is the reason it ends up with a server crash.
>
> With a bit more debugging and a look at the git history, I found that the
> issue started with commit af33039317ddc4a0e38a02e2255c2bf453115fd2.
> gather_readnext() calls ExecShutdownGatherWorkers() when nreaders == 0.
> ExecShutdownGatherWorkers() calls ExecParallelFinish(), which collects the
> instrumentation before marking ParallelExecutorInfo as finished.
> ExecParallelRetrieveInstrumentation() does the allocation of
> planstate->worker_instrument.
>
> With commit af33039317 we now call gather_readnext() with the per-tuple
> context, but with nreaders == 0, through ExecShutdownGatherWorkers() we
> end up allocating planstate->worker_instrument in the per-tuple context --
> which is wrong.
>
> Now the fix can be:
>
> 1) Avoid calling ExecShutdownGatherWorkers() from gather_readnext() and
> let ExecEndGather() do those things.

I don't think we can wait till ExecEndGather() to collect statistics, as we need it before that for the explain path. However, we do call ExecShutdownNode() from ExecutePlan() when there are no more tuples, which can take care of ensuring the shutdown of the Gather node. I think the advantage of calling it in gather_readnext() is that it allows resources to be released early and populates the instrumentation/statistics as early as possible.

> But with this change, gather_readnext() and gather_getnext() depend on
> the planstate->reader structure to continue reading tuples. Now either we
> can change those conditions to depend on planstate->nreaders, or just
> pfree(planstate->reader) in gather_readnext() instead of calling
> ExecShutdownGatherWorkers().
>
> Attaching patch, which fixes the issue with approach 1).

AFAICS, your patch seems to be the right fix for this issue, unless we need the instrumentation information during execution (other than for explain) for some purpose.

-- With Regards, Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com
Re: [HACKERS] Parallel query and temp_file_limit
On Tue, Jul 5, 2016 at 3:59 PM, Peter Geoghegan wrote:
> On Tue, Jul 5, 2016 at 12:58 PM, Tom Lane wrote:
>> Perhaps we could change the wording of temp_file_limit's description
>> from "space that a session can use" to "space that a process can use"
>> to help clarify this?
>
> That's all that I was looking for, really.

OK, done that way.

-- Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
Re: [HACKERS] Parallel query and temp_file_limit
Peter Geoghegan writes:
> On Tue, Jul 5, 2016 at 12:00 PM, Robert Haas wrote:
>> I think that it is not worth mentioning specifically for
>> temp_file_limit; to me that seems to be a hole with no bottom. We'll
>> end up arguing about which GUCs should mention it specifically and
>> there will be no end to it.
>
> I don't think that you need it for any other GUC, so I really don't
> know why you're concerned about a slippery slope.

FWIW, I agree with Robert on this. It seems just weird to call out temp_file_limit specifically. Also, I don't agree that that's the only interesting per-process resource consumption; max_files_per_process seems much more likely to cause trouble in practice.

Perhaps we could change the wording of temp_file_limit's description from "space that a session can use" to "space that a process can use" to help clarify this?

regards, tom lane
Re: [HACKERS] Parallel query and temp_file_limit
On Tue, Jul 5, 2016 at 12:58 PM, Tom Lane wrote:
> Perhaps we could change the wording of temp_file_limit's description
> from "space that a session can use" to "space that a process can use"
> to help clarify this?

That's all that I was looking for, really.

-- Peter Geoghegan
Re: [HACKERS] Parallel query and temp_file_limit
On Tue, Jul 5, 2016 at 12:00 PM, Robert Haas wrote:
> I think that it is not worth mentioning specifically for
> temp_file_limit; to me that seems to be a hole with no bottom. We'll
> end up arguing about which GUCs should mention it specifically and
> there will be no end to it.

I don't think that you need it for any other GUC, so I really don't know why you're concerned about a slippery slope. The only other resource GUC that is scoped per session that I can see is temp_buffers, but that doesn't need to change, since parallel workers cannot use temp_buffers directly in practice. max_files_per_process is already clearly per process, so no change is needed there either. I don't see a case other than temp_file_limit that appears to be even marginally in need of a specific note.

-- Peter Geoghegan
Re: [HACKERS] Parallel query and temp_file_limit
On Tue, Jul 5, 2016 at 1:58 PM, Peter Geoghegan wrote:
> On Tue, Jul 5, 2016 at 7:45 AM, Robert Haas wrote:
>> Since Peter doesn't seem in a hurry to produce a patch for this issue,
>> I wrote one. It is attached. I'll commit this in a day or two if
>> nobody objects.
>
> Sorry about the delay.
>
> Your patch seems reasonable, but I thought we'd also want to change
> "per session" to "per session (with an additional temp_file_limit
> allowance within each parallel worker)" for temp_file_limit.
>
> I think it's worthwhile noting this for temp_file_limit specifically,
> since it's explicitly a per-session limit, whereas users are quite
> used to the idea that work_mem might be doled out multiple times for
> multiple executor nodes.

I think that it is not worth mentioning specifically for temp_file_limit; to me that seems to be a hole with no bottom. We'll end up arguing about which GUCs should mention it specifically and there will be no end to it. We can see what other people think, but that's my position.

-- Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
Re: [HACKERS] Parallel query and temp_file_limit
On Tue, Jul 5, 2016 at 7:45 AM, Robert Haas wrote:
> Since Peter doesn't seem in a hurry to produce a patch for this issue,
> I wrote one. It is attached. I'll commit this in a day or two if
> nobody objects.

Sorry about the delay.

Your patch seems reasonable, but I thought we'd also want to change "per session" to "per session (with an additional temp_file_limit allowance within each parallel worker)" for temp_file_limit.

I think it's worthwhile noting this for temp_file_limit specifically, since it's explicitly a per-session limit, whereas users are quite used to the idea that work_mem might be doled out multiple times for multiple executor nodes.

-- Peter Geoghegan
Re: [HACKERS] Parallel query and temp_file_limit
On Tue, Jun 21, 2016 at 8:15 AM, Robert Haas wrote:
> On Mon, Jun 20, 2016 at 11:01 PM, Tom Lane wrote:
>> Peter Geoghegan writes:
>>> On Wed, May 18, 2016 at 3:40 AM, Robert Haas wrote:
>>>> What I'm tempted to do is to try to document that, as a point of
>>>> policy, parallel query in 9.6 uses up to (workers + 1) times the
>>>> resources that a single session might use. That includes not only CPU
>>>> but also things like work_mem and temp file space. This obviously
>>>> isn't ideal, but it's what could be done by the ship date.
>>>
>>> Where would that be documented, though? Would it need to be noted in
>>> the case of each such GUC?
>>
>> Why can't we just note this in the number-of-workers GUCs? It's not like
>> there even *is* a GUC for many of our per-process resource consumption
>> behaviors.
>
> +1.

Since Peter doesn't seem in a hurry to produce a patch for this issue, I wrote one. It is attached. I'll commit this in a day or two if nobody objects.

-- Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

Attachment: parallel-workers-guc-doc.patch
Re: [HACKERS] Parallel query and temp_file_limit
On Mon, Jun 20, 2016 at 11:01 PM, Tom Lane wrote:
> Peter Geoghegan writes:
>> On Wed, May 18, 2016 at 3:40 AM, Robert Haas wrote:
>>> What I'm tempted to do is to try to document that, as a point of
>>> policy, parallel query in 9.6 uses up to (workers + 1) times the
>>> resources that a single session might use. That includes not only CPU
>>> but also things like work_mem and temp file space. This obviously
>>> isn't ideal, but it's what could be done by the ship date.
>>
>> Where would that be documented, though? Would it need to be noted in
>> the case of each such GUC?
>
> Why can't we just note this in the number-of-workers GUCs? It's not like
> there even *is* a GUC for many of our per-process resource consumption
> behaviors.

+1.

-- Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
Re: [HACKERS] Parallel query and temp_file_limit
Peter Geoghegan writes:
> On Wed, May 18, 2016 at 3:40 AM, Robert Haas wrote:
>> What I'm tempted to do is to try to document that, as a point of
>> policy, parallel query in 9.6 uses up to (workers + 1) times the
>> resources that a single session might use. That includes not only CPU
>> but also things like work_mem and temp file space. This obviously
>> isn't ideal, but it's what could be done by the ship date.
>
> Where would that be documented, though? Would it need to be noted in
> the case of each such GUC?

Why can't we just note this in the number-of-workers GUCs? It's not like there even *is* a GUC for many of our per-process resource consumption behaviors.

regards, tom lane
Re: [HACKERS] Parallel query and temp_file_limit
On Wed, May 18, 2016 at 3:40 AM, Robert Haas wrote:
> What I'm tempted to do is to try to document that, as a point of
> policy, parallel query in 9.6 uses up to (workers + 1) times the
> resources that a single session might use. That includes not only CPU
> but also things like work_mem and temp file space. This obviously
> isn't ideal, but it's what could be done by the ship date.

Where would that be documented, though? Would it need to be noted in the case of each such GUC?

-- Peter Geoghegan
Re: [HACKERS] Parallel query and temp_file_limit
On Tue, Jun 7, 2016 at 8:32 AM, Robert Haas wrote:
> You previously offered to write a patch for this. Are you still
> planning to do that?

OK, I'll get to that in the next few days. I'm slightly concerned that I might have missed a real problem in the code. I'll need to examine the issue more closely.

-- Peter Geoghegan
Re: [HACKERS] Parallel query and temp_file_limit
On Sun, Jun 5, 2016 at 4:32 PM, Peter Geoghegan wrote:
> On Wed, May 18, 2016 at 12:01 PM, Peter Geoghegan wrote:
>>> I think for 9.6 we just have to document this issue. In the next
>>> release, we could (and might well want to) try to do something more
>>> clever.
>>
>> Works for me. You may wish to update comments within fd.c at the same time.
>
> I've created a 9.6 open issue for this.

You previously offered to write a patch for this. Are you still planning to do that?

-- Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
Re: [HACKERS] Parallel query and temp_file_limit
On Wed, May 18, 2016 at 12:01 PM, Peter Geoghegan wrote:
>> I think for 9.6 we just have to document this issue. In the next
>> release, we could (and might well want to) try to do something more
>> clever.
>
> Works for me. You may wish to update comments within fd.c at the same time.

I've created a 9.6 open issue for this.

-- Peter Geoghegan
Re: [HACKERS] Parallel query
On Sun, May 22, 2016 at 10:36 AM, Tatsuo Ishii wrote:
>> The brief introduction of MPI (Message Passing Interface) is at the
>> following URLs. It is a message protocol used for parallel computing,
>> just like DSM does in parallel query. The DSM plays a message-passing
>> role (in fact, by passing the query plan/raw node tree to another
>> worker) in parallel query. I think parallel query resembles MPI, so I
>> mentioned that we can refer to the MPI benchmark and use the ideas
>> that are used to test parallel computing systems. If parallel query is
>> to be a feature in the future, I think we must have another benchmark
>> for this feature, just like tpcc does. So, I mentioned MPI.
>>
>> https://www.open-mpi.org/
>>
>> https://en.wikipedia.org/wiki/Message_Passing_Interface
>
> Thank you for the info.

Ishii-san is doing so... Please be sure to press "reply-all" when answering an email on the community mailing lists. It is hard to follow this discussion.

-- Michael
Re: [HACKERS] Parallel query
> The brief introduction of MPI (Message Passing Interface) is at the
> following URLs. It is a message protocol used for parallel computing,
> just like DSM does in parallel query. The DSM plays a message-passing
> role (in fact, by passing the query plan/raw node tree to another
> worker) in parallel query. I think parallel query resembles MPI, so I
> mentioned that we can refer to the MPI benchmark and use the ideas
> that are used to test parallel computing systems. If parallel query is
> to be a feature in the future, I think we must have another benchmark
> for this feature, just like tpcc does. So, I mentioned MPI.
>
> https://www.open-mpi.org/
>
> https://en.wikipedia.org/wiki/Message_Passing_Interface

Thank you for the info.

Best regards,
-- Tatsuo Ishii
SRA OSS, Inc. Japan
English: http://www.sraoss.co.jp/index_en.php
Japanese: http://www.sraoss.co.jp
Re: [HACKERS] Parallel query
What's MPI?

Best regards,
-- Tatsuo Ishii
SRA OSS, Inc. Japan
English: http://www.sraoss.co.jp/index_en.php
Japanese: http://www.sraoss.co.jp

> Maybe we can refer to the MPI test cases.
>
> On Sun, May 22, 2016 at 3:19 PM, Hao Lee wrote:
>> What kind of cases do you want to run? Besides the multiple cores, I
>> think the working memory and its access rate are also main criteria.
>> As you know, parallel query uses DSM as its IPC tool, which means it
>> will hit the memory access barrier; the memory bus has its access-rate
>> limitation. Different system architectures, such as different CPU
>> architectures, will also be considered when we do the performance
>> test. Do we need to consider what I mentioned above?
>>
>> Best Regards,
>> Hao LEE.
>>
>> On Thu, May 19, 2016 at 11:07 PM, Tatsuo Ishii wrote:
>>> Robert,
>>> (and others who are involved in parallel query of PostgreSQL)
>>>
>>> PostgreSQL Enterprise Consortium (one of the PostgreSQL communities in
>>> Japan, in short "PGECons") is planning to test the parallel query
>>> performance of PostgreSQL 9.6. Besides TPC-H (I know you have already
>>> tested on an IBM box), what kind of tests would you like performed?
>>>
>>> We are planning to use a big Intel box (with more than 60 cores).
>>> Any suggestions are welcome.
>>>
>>> Best regards,
>>> -- Tatsuo Ishii
>>> SRA OSS, Inc. Japan
Re: [HACKERS] Parallel query
Thank you for the suggestion. Currently no particular test cases are in my mind; that's the reason why I need input from the community. Regarding the test schedule, PGECons starts the planning from next month or so, so I guess testing starts no earlier than July.

Best regards,
-- Tatsuo Ishii
SRA OSS, Inc. Japan
English: http://www.sraoss.co.jp/index_en.php
Japanese: http://www.sraoss.co.jp

> What kind of cases do you want to run? Besides the multiple cores, I
> think the working memory and its access rate are also main criteria.
> As you know, parallel query uses DSM as its IPC tool, which means it
> will hit the memory access barrier; the memory bus has its access-rate
> limitation. Different system architectures, such as different CPU
> architectures, will also be considered when we do the performance
> test. Do we need to consider what I mentioned above?
>
> Best Regards,
> Hao LEE.
>
> On Thu, May 19, 2016 at 11:07 PM, Tatsuo Ishii wrote:
>> Robert,
>> (and others who are involved in parallel query of PostgreSQL)
>>
>> PostgreSQL Enterprise Consortium (one of the PostgreSQL communities in
>> Japan, in short "PGECons") is planning to test the parallel query
>> performance of PostgreSQL 9.6. Besides TPC-H (I know you have already
>> tested on an IBM box), what kind of tests would you like performed?
>>
>> We are planning to use a big Intel box (with more than 60 cores).
>> Any suggestions are welcome.
>>
>> Best regards,
>> -- Tatsuo Ishii
>> SRA OSS, Inc. Japan
Re: [HACKERS] Parallel query and temp_file_limit
On Wed, May 18, 2016 at 3:40 AM, Robert Haas wrote:
>> I'll write a patch to fix the issue, if there is a consensus on a solution.
>
> I think for 9.6 we just have to document this issue. In the next
> release, we could (and might well want to) try to do something more
> clever.

Works for me. You may wish to update comments within fd.c at the same time.

-- Peter Geoghegan
Re: [HACKERS] Parallel query and temp_file_limit
On 18 May 2016 at 22:40, Robert Haas wrote:
> On Tue, May 17, 2016 at 6:40 PM, Peter Geoghegan wrote:
>> On Tue, May 17, 2016 at 3:33 PM, Peter Geoghegan wrote:
>>> Fundamentally, since temporary_files_size enforcement simply
>>> piggy-backs on low-level fd.c file management, without any
>>> consideration of what the temp files contain, it'll be hard to be sure
>>> that parallel workers will not have issues. I think it'll be far
>>> easier to fix the problem than it would be to figure out if it's
>>> possible to get away with it.
>>
>> I'll write a patch to fix the issue, if there is a consensus on a solution.
>
> I think for 9.6 we just have to document this issue. In the next
> release, we could (and might well want to) try to do something more
> clever.
>
> What I'm tempted to do is to try to document that, as a point of
> policy, parallel query in 9.6 uses up to (workers + 1) times the
> resources that a single session might use. That includes not only CPU
> but also things like work_mem and temp file space. This obviously
> isn't ideal, but it's what could be done by the ship date.

I was asked (internally, I believe) about abuse of work_mem during my work on parallel aggregates; at the time I didn't really feel I was abusing it any more than parallel hash join was. My thought was that one day it would be nice if work_mem could be granted to a query, and we had some query-marshalling system which ensured that the total grants did not exceed the server-wide memory dedicated to work_mem. Of course that's lots of work, as there's at least one node (HashAgg) which can still blow out work_mem on bad estimates.

For this release, I assumed it wouldn't be too big an issue if we're shipping with max_parallel_degree = 0, as we could just decorate the docs with warnings that work_mem is per node / per worker, to caution users against setting this setting any higher. That might be enough to give us wriggle room for the future where we can make improvements, so I agree with Robert: the docs seem like the best solution for 9.6.

-- David Rowley
http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
Re: [HACKERS] Parallel query and temp_file_limit
On Tue, May 17, 2016 at 6:40 PM, Peter Geoghegan wrote:
> On Tue, May 17, 2016 at 3:33 PM, Peter Geoghegan wrote:
>> Fundamentally, since temporary_files_size enforcement simply
>> piggy-backs on low-level fd.c file management, without any
>> consideration of what the temp files contain, it'll be hard to be sure
>> that parallel workers will not have issues. I think it'll be far
>> easier to fix the problem than it would be to figure out if it's
>> possible to get away with it.
>
> I'll write a patch to fix the issue, if there is a consensus on a solution.

I think for 9.6 we just have to document this issue. In the next release, we could (and might well want to) try to do something more clever.

What I'm tempted to do is to try to document that, as a point of policy, parallel query in 9.6 uses up to (workers + 1) times the resources that a single session might use. That includes not only CPU but also things like work_mem and temp file space. This obviously isn't ideal, but it's what could be done by the ship date.

-- Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
Re: [HACKERS] Parallel query and temp_file_limit
On Tue, May 17, 2016 at 3:33 PM, Peter Geoghegan wrote: > Fundamentally, since temporary_files_size enforcement simply > piggy-backs on low-level fd.c file management, without any > consideration of what the temp files contain, it'll be hard to be sure > that parallel workers will not have issues. I think it'll be far > easier to fix the problem than it would be to figure out if it's > possible to get away with it. I'll write a patch to fix the issue, if there is a consensus on a solution. -- Peter Geoghegan
Re: [HACKERS] Parallel query and temp_file_limit
On Tue, May 17, 2016 at 1:53 PM, Amit Kapila wrote: > What kind of special treatment are you expecting for temporary_files_size, > and why do you think it is required? Currently we neither build hashes in > parallel nor is there any form of parallel sort work. I expect only that temporary_files_size be described accurately, and have new behavior for parallel query that is not surprising. There are probably several solutions that would meet that standard, and I am not attached to any particular one of them. I wrote a parallel sort patch already (CREATE INDEX for the B-Tree AM), and will post it at an opportune time. So, I think we can expect your observations about there not being parallel sort work to no longer apply in a future release, which we should get ahead of now. Also, won't parallel workers that build their own copy of the hash table (for a hash join) also use their own temp files, if there is a need for temp files? I think parallel query will end up sharing temp files fairly often, and not just out of convenience to implementers (that is, not just to avoid using shared memory extensively). Fundamentally, since temporary_files_size enforcement simply piggy-backs on low-level fd.c file management, without any consideration of what the temp files contain, it'll be hard to be sure that parallel workers will not have issues. I think it'll be far easier to fix the problem than it would be to figure out if it's possible to get away with it. -- Peter Geoghegan
Re: [HACKERS] Parallel query and temp_file_limit
On Wed, May 18, 2016 at 12:55 AM, Peter Geoghegan wrote: > > temp_file_limit "specifies the maximum amount of disk space that a > session can use for temporary files, such as sort and hash temporary > files", according to the documentation. That's not true when parallel > query is in use, since the global variable temporary_files_size > receives no special treatment for parallel query. > What kind of special treatment are you expecting for temporary_files_size, and why do you think it is required? Currently we neither build hashes in parallel nor is there any form of parallel sort work. With Regards, Amit Kapila. EnterpriseDB: http://www.enterprisedb.com
Re: [HACKERS] parallel query vs extensions
On Mon, Apr 18, 2016 at 09:56:28AM -0400, Robert Haas wrote: > On Fri, Apr 15, 2016 at 12:45 AM, Jeff Janes wrote: > > Should every relevant contrib extension get a version bump with a > > transition file which is nothing but a list of "alter function blah > > blah blah parallel safe" ? > > Yes, I think that's what we would need to do. It's a lot of work, > albeit mostly mechanical. This is in the open items list, but I think it is too late to include such a change in 9.6. This is an opportunity for further optimization, not a defect.
Re: [HACKERS] parallel query vs extensions
On Fri, Apr 15, 2016 at 12:45 AM, Jeff Janes wrote: > Should every relevant contrib extension get a version bump with a > transition file which is nothing but a list of "alter function blah > blah blah parallel safe" ? Yes, I think that's what we would need to do. It's a lot of work, albeit mostly mechanical. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
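Such a transition file would be almost purely mechanical. For the xml2 example in the thread it might look like the following (hypothetical version numbers; only functions whose bodies are actually known to be parallel safe should be listed):

```sql
-- xml2--1.0--1.1.sql: an upgrade script whose only job is to mark
-- existing functions parallel safe (PARALLEL SAFE requires 9.6+).
ALTER FUNCTION xml_valid(text) PARALLEL SAFE;
ALTER FUNCTION xpath_string(text, text) PARALLEL SAFE;
ALTER FUNCTION xpath_number(text, text) PARALLEL SAFE;
-- ...one line per function, for each function in the extension.
```

The control file's default_version would then be bumped to match, so a plain ALTER EXTENSION ... UPDATE picks up the new markings.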
Re: [HACKERS] parallel query vs extensions
On 15 April 2016 at 12:45, Jeff Janes wrote: > I think there are a lot of extensions which create functions which > could benefit from being declared parallel safe. But how does one go > about doing that? > > create extension xml2; > select xml_valid(filler),count(*) from pgbench_accounts group by 1; > Time: 3205.830 ms > > alter function xml_valid (text) parallel safe; > > select xml_valid(filler),count(*) from pgbench_accounts group by 1; > Time: 1803.936 ms > > (Note that I have no particular interest in the xml2 extension, it > just provides a convenient demonstration of the general principle) > > Should every relevant contrib extension get a version bump with a > transition file which is nothing but a list of "alter function blah > blah blah parallel safe" ? > > And what of non-contrib extensions? Is there some clever alternative > to having a bunch of pseudo versions, like "1.0", "1.0_96", "1.1", > "1.1_9.6", "1.2", "1.2_96", etc.? > What I've done in the past for similar problems is preprocess the extension--x.y.sql files in the Makefile to conditionally remove unsupported syntax, functions, etc. It's rather less than perfect, because if the user pg_upgrades they won't get the now-supported options. They'll have the old-version extension on the new version and would have to drop & re-create to get the new version contents. You could create variant pseudo-extensions to make this clearer - myext95--1.0.sql, myext96--1.0.sql, etc - but there's still no way to ALTER EXTENSION to upgrade. Pseudo-versions like you suggest are probably going to work, but the extension machinery doesn't understand them and you can only specify one of them as the default in the control file. -- Craig Ringer http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services
Re: [HACKERS] Parallel query fails on standby server
On Wed, Mar 9, 2016 at 12:34 AM, Robert Haas wrote: > On Tue, Mar 8, 2016 at 8:23 AM, Michael Paquier > wrote: >> On Tue, Mar 8, 2016 at 9:51 PM, Craig Ringer wrote: >>> On 8 March 2016 at 20:30, Ashutosh Sharma wrote: While testing a parallel scan feature on standby server, it is found that the parallel query fails with an error "ERROR: failed to initialize transaction_read_only to 0". >>> >>> Looks like it might be a good idea to add some tests to src/test/recovery >>> for parallel query on standby servers... >> >> An even better thing would be a set of read-only tests based on the >> database "regression" generated by make check, itself run with >> pg_regress. > > I'm not sure anything in the main regression suite actually goes > parallel right now, which is probably the first thing to fix. > > Unless, of course, you use force_parallel_mode=regress, max_parallel_degree>0. I was thinking about a test in src/test/recovery that runs a master and a standby. pg_regress with the main recovery test suite is run on the master, then a second pg_regress run happens with a set of read-only queries, with sql/ and expected/ located directly in src/test/recovery, for example. Do we actually have a buildfarm animal using those parameters in extra_config? -- Michael
Re: [HACKERS] Parallel query fails on standby server
On Tue, Mar 8, 2016 at 8:23 AM, Michael Paquier wrote: > On Tue, Mar 8, 2016 at 9:51 PM, Craig Ringer wrote: >> On 8 March 2016 at 20:30, Ashutosh Sharma wrote: >>> >>> While testing a parallel scan feature on standby server, it is found that >>> the parallel query fails with an error "ERROR: failed to initialize >>> transaction_read_only to 0". >>> >> >> Looks like it might be a good idea to add some tests to src/test/recovery >> for parallel query on standby servers... > > An even better thing would be a set of read-only tests based on the > database "regression" generated by make check, itself run with > pg_regress. I'm not sure anything in the main regression suite actually goes parallel right now, which is probably the first thing to fix. Unless, of course, you use force_parallel_mode=regress, max_parallel_degree>0. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
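For reference, the settings Robert mentions would look like this in a test session (9.6-era GUC names; max_parallel_degree was later renamed max_parallel_workers_per_gather):

```sql
-- force_parallel_mode=regress pushes a Gather node on top of any
-- parallel-safe plan while suppressing it from EXPLAIN output, so the
-- regression suite's expected files still match.
SET force_parallel_mode = regress;
SET max_parallel_degree = 2;  -- allow up to two workers per query
```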
Re: [HACKERS] Parallel query fails on standby server
On Tue, Mar 8, 2016 at 9:51 PM, Craig Ringer wrote: > On 8 March 2016 at 20:30, Ashutosh Sharma wrote: >> >> >> While testing a parallel scan feature on standby server, it is found that >> the parallel query fails with an error "ERROR: failed to initialize >> transaction_read_only to 0". >> > > Looks like it might be a good idea to add some tests to src/test/recovery > for parallel query on standby servers... An even better thing would be a set of read-only tests based on the database "regression" generated by make check, itself run with pg_regress. -- Michael
Re: [HACKERS] Parallel query fails on standby server
On 8 March 2016 at 20:30, Ashutosh Sharma wrote: > > While testing a parallel scan feature on standby server, it is found that > the parallel query fails with an error "ERROR: failed to initialize > transaction_read_only to 0". > > Looks like it might be a good idea to add some tests to src/test/recovery for parallel query on standby servers... -- Craig Ringer http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services
Re: [HACKERS] Parallel query execution
On Tuesday, January 15, 2013 at 2:14 PM, Bruce Momjian wrote: I mentioned last year that I wanted to start working on parallelism: https://wiki.postgresql.org/wiki/Parallel_Query_Execution I believe it is time to start adding parallel execution to the backend. We already have some parallelism in the backend: effective_io_concurrency and helper processes. I think it is time we start to consider additional options. Parallelism isn't going to help all queries; in fact it might be just a small subset, but it will be the larger queries. The pg_upgrade parallelism only helps clusters with multiple databases or tablespaces, but the improvements are significant. I just got out of a meeting that included Oracle Spatial folks, who were boasting of big performance increases from enabling parallel query on their spatial queries. Basically the workloads on things like big spatial joins are entirely CPU bound, so they are seeing that adding 15 processors makes things 15x faster. Spatial folks would love love love to see parallel query execution. -- Paul Ramsey http://cleverelephant.ca http://postgis.net
Re: [HACKERS] Parallel query execution
On Thu, Jan 24, 2013 at 02:34:49PM -0800, Paul Ramsey wrote: On Tuesday, January 15, 2013 at 2:14 PM, Bruce Momjian wrote: I mentioned last year that I wanted to start working on parallelism: https://wiki.postgresql.org/wiki/Parallel_Query_Execution I believe it is time to start adding parallel execution to the backend. We already have some parallelism in the backend: effective_io_concurrency and helper processes. I think it is time we start to consider additional options. Parallelism isn't going to help all queries; in fact it might be just a small subset, but it will be the larger queries. The pg_upgrade parallelism only helps clusters with multiple databases or tablespaces, but the improvements are significant. I just got out of a meeting that included Oracle Spatial folks, who were boasting of big performance increases from enabling parallel query on their spatial queries. Basically the workloads on things like big spatial joins are entirely CPU bound, so they are seeing that adding 15 processors makes things 15x faster. Spatial folks would love love love to see parallel query execution. I added PostGIS under the Expensive Functions opportunity: https://wiki.postgresql.org/wiki/Parallel_Query_Execution#Specific_Opportunities -- Bruce Momjian br...@momjian.us http://momjian.us EnterpriseDB http://enterprisedb.com + It's impossible for everything to be true. +
Re: [HACKERS] Parallel query execution
On Tue, Jan 15, 2013 at 11:07 PM, Tom Lane t...@sss.pgh.pa.us wrote: Alvaro Herrera alvhe...@2ndquadrant.com writes: There are still 34 items needing attention in CF3. I suggest that, if you have some spare time, your help would be very much appreciated there. The commitfest that started on Jan 15th has 65 extra items. Anything currently listed in CF3 can rightfully be considered to be part of CF4, too. In case you hadn't noticed, we've totally lost control of the CF process. Quite aside from the lack of progress on closing CF3, major hackers who should know better are submitting significant new feature patches now, despite our agreement in Ottawa that nothing big would be accepted after CF3. At this point I'd bet against releasing 9.3 during 2013. I have been skimming the commitfest application, and unlike some of the previous commitfests a huge number of patches have had review at some point in time, but probably need more... So looking for the red "Nobody" in the 'reviewers' column probably understates the shortage of review. I'm curious what the qualitative feelings are on patches or clusters thereof, and what kind of review would be helpful in clearing the field. -- fdr
Re: [HACKERS] Parallel query execution
On Wed, Jan 16, 2013 at 12:03 AM, Bruce Momjian br...@momjian.us wrote: On Tue, Jan 15, 2013 at 11:01:04PM +, Simon Riggs wrote: On 15 January 2013 22:55, Bruce Momjian br...@momjian.us wrote: Why is this being discussed now? It is for 9.4 and will take months. I didn't think there was a better time. We don't usually discuss features during beta testing. Bruce, there are many, many patches on the queue. How will we ever get to beta testing if we begin open ended discussions on the next release? If we can't finish what we've started for 9.3, why talk about 9.4? Yes, it's a great topic for discussion, but there are better times. Like when? I don't remember a policy of not discussing things now. Does anyone else remember this? Are you saying feature discussion is only between commit-fests? Is this written down anywhere? I only remember beta-time as a time not to discuss features. We kind of do - when in a CF we should do reviewing of existing patches, when outside a CF we should do discussions and work on new features. It's on http://wiki.postgresql.org/wiki/CommitFest. It doesn't specifically say do this and don't do that, but it says focus on review, and discussing things that will happen that far ahead is definitely not focusing on review. -- Magnus Hagander Me: http://www.hagander.net/ Work: http://www.redpill-linpro.com/
Re: [HACKERS] Parallel query execution
On Wed, Jan 16, 2013 at 2:07 AM, Tom Lane t...@sss.pgh.pa.us wrote: Alvaro Herrera alvhe...@2ndquadrant.com writes: There are still 34 items needing attention in CF3. I suggest that, if you have some spare time, your help would be very much appreciated there. The commitfest that started on Jan 15th has 65 extra items. Anything currently listed in CF3 can rightfully be considered to be part of CF4, too. In case you hadn't noticed, we've totally lost control of the CF process. Quite aside from the lack of progress on closing CF3, major hackers who should know better are submitting significant new feature patches now, despite our agreement in Ottawa that nothing big would be accepted after CF3. At this point I'd bet against releasing 9.3 during 2013. Or we could reject all of those patches. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Re: [HACKERS] Parallel query execution
On Wed, Jan 16, 2013 at 6:52 AM, Magnus Hagander mag...@hagander.net wrote: On Wed, Jan 16, 2013 at 12:03 AM, Bruce Momjian br...@momjian.us wrote: On Tue, Jan 15, 2013 at 11:01:04PM +, Simon Riggs wrote: On 15 January 2013 22:55, Bruce Momjian br...@momjian.us wrote: Why is this being discussed now? It is for 9.4 and will take months. I didn't think there was a better time. We don't usually discuss features during beta testing. Bruce, there are many, many patches on the queue. How will we ever get to beta testing if we begin open ended discussions on the next release? If we can't finish what we've started for 9.3, why talk about 9.4? Yes, it's a great topic for discussion, but there are better times. Like when? I don't remember a policy of not discussing things now. Does anyone else remember this? Are you saying feature discussion is only between commit-fests? Is this written down anywhere? I only remember beta-time as a time not to discuss features. We kind of do - when in a CF we should do reviewing of existing patches, when outside a CF we should do discussions and work on new features. It's on http://wiki.postgresql.org/wiki/CommitFest. It doesn't specifically say do this and don't do that, but it says focus on review, and discussing things that will happen that far ahead is definitely not focusing on review. Bruce is evidently under the impression that he's no longer under any obligation to review or commit other people's patches, or participate in the CommitFest process in any way. I believe that he has not committed a significant patch written by someone else in several years. If the committers on the core team aren't committed to the process, it doesn't stand much chance of working. The fact that I have been completely buried for the last six months is perhaps not helping, either, but even at the very low level of engagement I've been at recently, I've still done more reviews (a few) than patch submissions (none). 
I view it as everyone's responsibility to maintain a similar balance in their own work. And some people do, but not enough, especially among the committers. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Re: [HACKERS] Parallel query execution
* Claudio Freire (klaussfre...@gmail.com) wrote: Well, there's the fault in your logic. It won't be as linear. I really don't see how this has become so difficult to communicate. It doesn't have to be linear. We're currently doing massive amounts of parallel processing by hand using partitioning, tablespaces, and client-side logic to split up the jobs. It's certainly *much* faster than doing it in a single thread. It's also faster with 10 processes going than 5 (we've checked). With 10 going, we've hit the FC fabric limit (and these are spinning disks in the SAN, not SSDs). I'm also sure it'd be much slower if all 10 processes were trying to read data through a single process that's reading from the I/O system. We've got some processes which essentially end up doing that, and we don't come anywhere near the total FC fabric bandwidth when just scanning through the system because, at that point, you do hit the limits of how fast the individual drive sets can provide data. To be clear: I'm not suggesting that we would parallelize a SeqScan node and have the nodes above it be single-threaded. As I said upthread, we want to parallelize reading and processing the data coming in. Perhaps at some level that works out to not change how we actually *do* seqscans at all, and instead something higher in the plan tree just creates multiple of them on independent threads, but it's still going to end up being parallel I/O in the end. I'm done with this thread for now; as brought up, we need to focus on getting 9.3 out the door. Thanks, Stephen
Re: [HACKERS] Parallel query execution
* Tom Lane (t...@sss.pgh.pa.us) wrote: In case you hadn't noticed, we've totally lost control of the CF process. I concur. Quite aside from the lack of progress on closing CF3, major hackers who should know better are submitting significant new feature patches now, despite our agreement in Ottawa that nothing big would be accepted after CF3. For my small part, it wasn't my intent to drop a contentious patch at the end. I had felt it was pretty minor and relatively simple. My arguments regarding the popen patch were simply that it didn't address one of the use-cases that I was hoping to. I'll hold off on working on the compressed transport for now in favor of doing reviews and trying to help get 9.3 wrapped up. Thanks, Stephen
Re: [HACKERS] Parallel query execution
* Daniel Farina (dan...@heroku.com) wrote: I have been skimming the commitfest application, and unlike some of the previous commitfests a huge number of patches have had review at some point in time, but probably need more... so looking for the red Nobody in the 'reviewers' column probably understates the shortage of review. I've been frustrated by that myself. I realize we don't want to duplicate work, but I'm really starting to think that having the Reviewers column has turned out to actually work against us. I'm curious what the qualitative feelings are on patches or clusters thereof and what kind of review would be helpful in clearing the field. I haven't been thrilled with the patches that I've looked at, but they've also been ones that hadn't been reviewed before, so perhaps that's what should be expected. It'd be neat if we had some idea of what committers were actively working on and kept off of *those*, but kept working on the ones which aren't being worked on by a committer currently. Thanks, Stephen
Re: [HACKERS] Parallel query execution
On 01/15/2013 11:32 PM, Bruce Momjian wrote: On Wed, Jan 16, 2013 at 01:28:18PM +0900, Michael Paquier wrote: On Wed, Jan 16, 2013 at 1:22 PM, Josh Berkus j...@agliodbs.com wrote: Claudio, Stephen, It really seems like the areas where we could get the most bang for the buck in parallelism would be: 1. Parallel sort 2. Parallel aggregation (for commutative aggregates) 3. Parallel nested loop join (especially for expression joins, like GIS) parallel data load? :/ We have that in pg_restore, and I think we are getting parallel dump in 9.3, right? Unfortunately, I don't see it in the last 9.3 commit-fest. Is it still being worked on? I am about half way through reviewing it. Unfortunately paid work takes precedence over unpaid work. cheers andrew
Re: [HACKERS] Parallel query execution
On Wed, Jan 16, 2013 at 10:33 AM, Stephen Frost sfr...@snowman.net wrote: * Claudio Freire (klaussfre...@gmail.com) wrote: Well, there's the fault in your logic. It won't be as linear. I really don't see how this has become so difficult to communicate. It doesn't have to be linear. We're currently doing massive amounts of parallel processing by hand using partitioning, tablespaces, and client-side logic to split up the jobs. It's certainly *much* faster than doing it in a single thread. It's also faster with 10 processes going than 5 (we've checked). With 10 going, we've hit the FC fabric limit (and these are spinning disks in the SAN, not SSDs). I'm also sure it'd be much slower if all 10 processes were trying to read data through a single process that's reading from the I/O system. We've got some processes which essentially end up doing that and we don't come anywhere near the total FC fabric bandwidth when just scanning through the system because, at that point, you do hit the limits of how fast the individual drive sets can provide data. Well... just closing then (to let people focus on 9.3's CF): that's a level of hardware I haven't had experience with, but it seems to behave much differently than regular (big and small) RAID arrays. In any case, perhaps tablespaces are a hint here: if nodes are working on different tablespaces, there's an indication that they *can* be parallelized efficiently. That could be fleshed out in a parallel execution node, but for that to work the whole execution engine needs to be thread-safe (or it has to fork). It won't be easy. It's best to concentrate on lower-hanging fruit, like sorting and aggregates. Now back to the CF.
Re: [HACKERS] Parallel query execution
On Wed, Jan 16, 2013 at 08:42:29AM -0500, Stephen Frost wrote: * Daniel Farina (dan...@heroku.com) wrote: I have been skimming the commitfest application, and unlike some of the previous commitfests a huge number of patches have had review at some point in time, but probably need more... so looking for the red Nobody in the 'reviewers' column probably understates the shortage of review. I've been frustrated by that myself. I realize we don't want to duplicate work but I'm really starting to think that having the Reviewers column has turned out to actually work against us. That column tells the CF manager whom to browbeat. Without a CF manager, a stale entry can indeed make a patch look under control when it isn't.
Re: [HACKERS] Parallel query execution
On Wed, Jan 16, 2013 at 01:37:28PM +0900, Michael Paquier wrote: On Wed, Jan 16, 2013 at 1:32 PM, Bruce Momjian br...@momjian.us wrote: On Wed, Jan 16, 2013 at 01:28:18PM +0900, Michael Paquier wrote: On Wed, Jan 16, 2013 at 1:22 PM, Josh Berkus j...@agliodbs.com wrote: Claudio, Stephen, It really seems like the areas where we could get the most bang for the buck in parallelism would be: 1. Parallel sort 2. Parallel aggregation (for commutative aggregates) 3. Parallel nested loop join (especially for expression joins, like GIS) parallel data load? :/ We have that in pg_restore, and I think we are getting parallel dump in 9.3, right? Unfortunately, I don't see it in the last 9.3 commit-fest. Is it still being worked on? Not exactly, I meant something like being able to use parallel processing when doing INSERT or COPY directly in core. If there is a parallel processing infrastructure, it could also be used for such write operations. I agree that the cases mentioned by Josh are far more appealing though... I am not sure how a COPY could be easily parallelized, but I suppose it could be done as part of the 1GB segment feature. People have complained that COPY is CPU-bound, so it might be very interesting to see if we could offload some of that parsing overhead to a child. -- Bruce Momjian br...@momjian.us http://momjian.us EnterpriseDB http://enterprisedb.com + It's impossible for everything to be true. +
Re: [HACKERS] Parallel query execution
On Wed, Jan 16, 2013 at 01:48:29AM -0300, Alvaro Herrera wrote: Bruce Momjian escribió: On Wed, Jan 16, 2013 at 01:28:18PM +0900, Michael Paquier wrote: On Wed, Jan 16, 2013 at 1:22 PM, Josh Berkus j...@agliodbs.com wrote: Claudio, Stephen, It really seems like the areas where we could get the most bang for the buck in parallelism would be: 1. Parallel sort 2. Parallel aggregation (for commutative aggregates) 3. Parallel nested loop join (especially for expression joins, like GIS) parallel data load? :/ We have that in pg_restore, and I think we are getting parallel dump in 9.3, right? Unfortunately, I don't see it in the last 9.3 commit-fest. Is it still being worked on? It's in the previous-to-last commitfest. IIRC that patch required review and testing from people with some Windows background. There are still 34 items needing attention in CF3. I suggest that, if you have some spare time, your help would be very much appreciated there. The commitfest that started on Jan 15th has 65 extra items. Anything currently listed in CF3 can rightfully be considered to be part of CF4, too. Wow, I had no idea we were that far behind. I have avoided commit-fest work because I often travel and so might leave items abandoned, and I try to do cleanup of items that never make the commit-fest --- I thought that was something that needed doing too, and I rarely can complete that task. -- Bruce Momjian br...@momjian.us http://momjian.us EnterpriseDB http://enterprisedb.com + It's impossible for everything to be true. +
Re: [HACKERS] Parallel query execution
On Wed, Jan 16, 2013 at 08:11:06AM -0500, Robert Haas wrote: We kind of do - when in a CF we should do reviewing of existing patches, when outside a CF we should do discussions and work on new features. It's on http://wiki.postgresql.org/wiki/CommitFest. It doesn't specifically say do this and don't do that, but it says focus on review, and discussing things that will happen that far ahead is definitely not focusing on review. Bruce is evidently under the impression that he's no longer under any obligation to review or commit other people's patches, or participate in the CommitFest process in any way. I believe that he has not committed a significant patch written by someone else in several years. If the committers on the core team aren't committed to the process, it doesn't stand much chance of working. I assume you know I was the most frequent committer of other people's patches for years before the commit-fests started, so I thought I would move on to other things. -- Bruce Momjian br...@momjian.us http://momjian.us EnterpriseDB http://enterprisedb.com + It's impossible for everything to be true. +
Re: [HACKERS] Parallel query execution
On Wed, Jan 16, 2013 at 09:05:39AM -0500, Andrew Dunstan wrote: On 01/15/2013 11:32 PM, Bruce Momjian wrote: On Wed, Jan 16, 2013 at 01:28:18PM +0900, Michael Paquier wrote: On Wed, Jan 16, 2013 at 1:22 PM, Josh Berkus j...@agliodbs.com wrote: Claudio, Stephen, It really seems like the areas where we could get the most bang for the buck in parallelism would be: 1. Parallel sort 2. Parallel aggregation (for commutative aggregates) 3. Parallel nested loop join (especially for expression joins, like GIS) parallel data load? :/ We have that in pg_restore, and I think we are getting parallel dump in 9.3, right? Unfortunately, I don't see it in the last 9.3 commit-fest. Is it still being worked on? I am about half way through reviewing it. Unfortunately paid work takes precedence over unpaid work. Do you think it will make it into 9.3? -- Bruce Momjian br...@momjian.us http://momjian.us EnterpriseDB http://enterprisedb.com + It's impossible for everything to be true. +
Re: [HACKERS] Parallel query execution
On 01/16/2013 12:20 PM, Bruce Momjian wrote: On Wed, Jan 16, 2013 at 09:05:39AM -0500, Andrew Dunstan wrote: On 01/15/2013 11:32 PM, Bruce Momjian wrote: On Wed, Jan 16, 2013 at 01:28:18PM +0900, Michael Paquier wrote: On Wed, Jan 16, 2013 at 1:22 PM, Josh Berkus j...@agliodbs.com wrote: Claudio, Stephen, It really seems like the areas where we could get the most bang for the buck in parallelism would be: 1. Parallel sort 2. Parallel aggregation (for commutative aggregates) 3. Parallel nested loop join (especially for expression joins, like GIS) parallel data load? :/ We have that in pg_restore, and I think we are getting parallel dump in 9.3, right? Unfortunately, I don't see it in the last 9.3 commit-fest. Is it still being worked on? I am about half way through reviewing it. Unfortunately paid work takes precedence over unpaid work. Do you think it will make it into 9.3? Yes, I hope it will. cheers andrew -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Parallel query execution
* Bruce Momjian (br...@momjian.us) wrote: I am not sure how a COPY could be easily parallelized, but I suppose it could be done as part of the 1GB segment feature. People have complained that COPY is CPU-bound, so it might be very interesting to see if we could offload some of that parsing overhead to a child. COPY can certainly be CPU bound but before we can parallelize that usefully we need to solve the problem around extent locking when trying to do multiple COPY's to the same table. Thanks, Stephen
Re: [HACKERS] Parallel query execution
2013/1/16 Stephen Frost sfr...@snowman.net: * Bruce Momjian (br...@momjian.us) wrote: I am not sure how a COPY could be easily parallelized, but I suppose it could be done as part of the 1GB segment feature. People have complained that COPY is CPU-bound, so it might be very interesting to see if we could offload some of that parsing overhead to a child. COPY can certainly be CPU bound but before we can parallelize that usefully we need to solve the problem around extent locking when trying to do multiple COPY's to the same table. Probably updating any related indexes and constraint checking should be parallelized as well. Regards Pavel Thanks, Stephen -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Parallel query execution
On Wed, Jan 16, 2013 at 10:06:51PM +0100, Pavel Stehule wrote: 2013/1/16 Stephen Frost sfr...@snowman.net: * Bruce Momjian (br...@momjian.us) wrote: I am not sure how a COPY could be easily parallelized, but I suppose it could be done as part of the 1GB segment feature. People have complained that COPY is CPU-bound, so it might be very interesting to see if we could offload some of that parsing overhead to a child. COPY can certainly be CPU bound but before we can parallelize that usefully we need to solve the problem around extent locking when trying to do multiple COPY's to the same table. Probably updating any related indexes and constraint checking should be parallelized as well. Wiki updated: https://wiki.postgresql.org/wiki/Parallel_Query_Execution -- Bruce Momjian br...@momjian.ushttp://momjian.us EnterpriseDB http://enterprisedb.com + It's impossible for everything to be true. + -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Parallel query execution
2013/1/16 Bruce Momjian br...@momjian.us: Wiki updated: https://wiki.postgresql.org/wiki/Parallel_Query_Execution Could we add CTEs to that opportunities list? I think that some kinds of CTE queries could be easily parallelized. []s -- Dickson S. Guedes mail/xmpp: gue...@guedesoft.net - skype: guediz http://guedesoft.net - http://www.postgresql.org.br -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Parallel query execution
On Wed, Jan 16, 2013 at 07:57:01PM -0200, Dickson S. Guedes wrote: 2013/1/16 Bruce Momjian br...@momjian.us: Wiki updated: https://wiki.postgresql.org/wiki/Parallel_Query_Execution Could we add CTEs to that opportunities list? I think that some kinds of CTE queries could be easily parallelized. I added CTEs with joins. -- Bruce Momjian br...@momjian.ushttp://momjian.us EnterpriseDB http://enterprisedb.com + It's impossible for everything to be true. + -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Parallel query execution
On Tuesday, January 15, 2013, Stephen Frost wrote: * Gavin Flower (gavinflo...@archidevsys.co.nz) wrote: How about being aware of multiple spindles - so if the requested data covers multiple spindles, then data could be extracted in parallel. This may, or may not, involve multiple I/O channels? Yes, this should dovetail with partitioning and tablespaces to pick up on exactly that. I'd rather not have the benefits of parallelism be tied to partitioning if we can help it. Hopefully implementing parallelism in core would result in something more transparent than that. Cheers, Jeff
Re: [HACKERS] Parallel query execution
On Tuesday, January 15, 2013, Gavin Flower wrote: On 16/01/13 11:14, Bruce Momjian wrote: I mentioned last year that I wanted to start working on parallelism: https://wiki.postgresql.org/wiki/Parallel_Query_Execution Years ago I added thread-safety to libpq. Recently I added two parallel execution paths to pg_upgrade. The first parallel path allows execution of external binaries pg_dump and psql (to restore). The second parallel path does copy/link by calling fork/thread-safe C functions. I was able to do each in 2-3 days. I believe it is time to start adding parallel execution to the backend. We already have some parallelism in the backend: effective_io_concurrency and helper processes. I think it is time we start to consider additional options. Parallelism isn't going to help all queries, in fact it might be just a small subset, but it will be the larger queries. The pg_upgrade parallelism only helps clusters with multiple databases or tablespaces, but the improvements are significant. I have summarized my ideas by updating our Parallel Query Execution wiki page: https://wiki.postgresql.org/wiki/Parallel_Query_Execution Please consider updating the page yourself or posting your ideas to this thread. Thanks. Hmm... How about being aware of multiple spindles - so if the requested data covers multiple spindles, then data could be extracted in parallel. This may, or may not, involve multiple I/O channels? effective_io_concurrency does this for bitmap scans. I thought there was a patch in the commitfest to extend this to ordinary index scans, but now I can't find it. But it still doesn't give you CPU parallelism. The nice thing about CPU parallelism is that it usually brings some amount of IO parallelism for free, while the reverse is less likely to be so. Cheers, Jeff
Re: [HACKERS] Parallel query execution
On Wed, Jan 16, 2013 at 10:04 PM, Jeff Janes jeff.ja...@gmail.com wrote: Hmm... How about being aware of multiple spindles - so if the requested data covers multiple spindles, then data could be extracted in parallel. This may, or may not, involve multiple I/O channels? effective_io_concurrency does this for bitmap scans. I thought there was a patch in the commitfest to extend this to ordinary index scans, but now I can't find it. I never pushed it to the CF since it interacts so badly with the kernel. I was thinking about pushing the small part that is a net win in all cases, the back-sequential patch, but that's independent of any spindle count. It's more related to rotating media and read request merges than it is to multiple spindles or parallelization. The kernel guys basically are waiting for me to patch the kernel. I think I convinced our IT guy at the office to lend me a machine for tests... so it might happen soon. -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Parallel query execution
On Wed, Jan 16, 2013 at 05:04:05PM -0800, Jeff Janes wrote: On Tuesday, January 15, 2013, Stephen Frost wrote: * Gavin Flower (gavinflo...@archidevsys.co.nz) wrote: How about being aware of multiple spindles - so if the requested data covers multiple spindles, then data could be extracted in parallel. This may, or may not, involve multiple I/O channels? Yes, this should dovetail with partitioning and tablespaces to pick up on exactly that. I'd rather not have the benefits of parallelism be tied to partitioning if we can help it. Hopefully implementing parallelism in core would result in something more transparent than that. We will need a way to know we are not saturating the I/O channel with random I/O that could have been sequential if it was single-threaded. Tablespaces give us that info; not sure what else does. -- Bruce Momjian br...@momjian.ushttp://momjian.us EnterpriseDB http://enterprisedb.com + It's impossible for everything to be true. + -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Parallel query execution
On Wed, Jan 16, 2013 at 11:44 PM, Bruce Momjian br...@momjian.us wrote: On Wed, Jan 16, 2013 at 05:04:05PM -0800, Jeff Janes wrote: On Tuesday, January 15, 2013, Stephen Frost wrote: * Gavin Flower (gavinflo...@archidevsys.co.nz) wrote: How about being aware of multiple spindles - so if the requested data covers multiple spindles, then data could be extracted in parallel. This may, or may not, involve multiple I/O channels? Yes, this should dovetail with partitioning and tablespaces to pick up on exactly that. I'd rather not have the benefits of parallelism be tied to partitioning if we can help it. Hopefully implementing parallelism in core would result in something more transparent than that. We will need a way to know we are not saturating the I/O channel with random I/O that could have been sequential if it was single-threaded. Tablespaces give us that info; not sure what else does. I do also think tablespaces are a safe bet. But it wouldn't help for parallelizing sorts or other operations with tempfiles (tempfiles reside on the same tablespace), or even over a single table (same tablespace again). And when the query is CPU-bound, it could be parallelized by simply making a multithreaded memory sort. Well, not so simply, but I do think it's an important building block. -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
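The "multithreaded memory sort" building block Claudio mentions amounts to sorting chunks in parallel workers and then doing a k-way merge of the sorted runs. A toy model of that idea (illustrative Python, with OS processes standing in for backend workers; the function name and chunking scheme are invented for this sketch, not a proposed API):

```python
from heapq import merge
from multiprocessing import Pool

def parallel_sort(data, n_workers=4):
    """Sort slices of the input in worker processes, then k-way merge
    the sorted runs back into one ordered list."""
    step = max(1, len(data) // n_workers)
    chunks = [data[i:i + step] for i in range(0, len(data), step)]
    with Pool(n_workers) as pool:
        sorted_chunks = pool.map(sorted, chunks)  # one sort per worker
    return list(merge(*sorted_chunks))            # cheap sequential merge
```

The merge step is sequential, so the speedup is bounded by the chunk-sort phase; a real backend implementation would also have to decide when the data set is large enough to amortize worker startup.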
Re: [HACKERS] Parallel query execution
On Wed, Jan 16, 2013 at 11:56:21PM -0300, Claudio Freire wrote: On Wed, Jan 16, 2013 at 11:44 PM, Bruce Momjian br...@momjian.us wrote: On Wed, Jan 16, 2013 at 05:04:05PM -0800, Jeff Janes wrote: On Tuesday, January 15, 2013, Stephen Frost wrote: * Gavin Flower (gavinflo...@archidevsys.co.nz) wrote: How about being aware of multiple spindles - so if the requested data covers multiple spindles, then data could be extracted in parallel. This may, or may not, involve multiple I/O channels? Yes, this should dovetail with partitioning and tablespaces to pick up on exactly that. I'd rather not have the benefits of parallelism be tied to partitioning if we can help it. Hopefully implementing parallelism in core would result in something more transparent than that. We will need a way to know we are not saturating the I/O channel with random I/O that could have been sequential if it was single-threaded. Tablespaces give us that info; not sure what else does. I do also think tablespaces are a safe bet. But it wouldn't help for parallelizing sorts or other operations with tempfiles (tempfiles reside on the same tablespace), or even over a single table (same We can round-robin temp tablespace usage if you list multiple entries. tablespace again). And when the query is CPU-bound, it could be parallelized by simply making a multithreaded memory sort. Well, not so simply, but I do think it's an important building block. Yes, and detecting when to use these parallel features will be hard. -- Bruce Momjian br...@momjian.ushttp://momjian.us EnterpriseDB http://enterprisedb.com + It's impossible for everything to be true. + -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Parallel query execution
On Wednesday, January 16, 2013, Stephen Frost wrote: * Bruce Momjian (br...@momjian.us) wrote: I am not sure how a COPY could be easily parallelized, but I suppose it could be done as part of the 1GB segment feature. People have complained that COPY is CPU-bound, so it might be very interesting to see if we could offload some of that parsing overhead to a child. COPY can certainly be CPU bound but before we can parallelize that usefully we need to solve the problem around extent locking when trying to do multiple COPY's to the same table. I think that is rather over-stating it. Even with unindexed untriggered tables, I can get some benefit from doing hand-rolled parallel COPY before the extension lock becomes an issue, at least on some machines. And with triggered or indexed tables, all the more so. Cheers, Jeff
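The hand-rolled parallel COPY Jeff describes hinges on splitting the input file at line boundaries so each session loads a disjoint slice. A minimal sketch of that splitting step (illustrative Python, outside the backend; `chunk_offsets` is a hypothetical helper, and the per-slice COPY sessions themselves are not shown):

```python
import os

def chunk_offsets(path, n_workers):
    """Split a file into contiguous byte ranges aligned to line
    boundaries, so each worker/session could COPY its own slice."""
    size = os.path.getsize(path)
    step = max(1, size // n_workers)
    offsets = [0]
    with open(path, "rb") as f:
        for i in range(1, n_workers):
            f.seek(i * step)
            f.readline()              # advance to the next line boundary
            pos = f.tell()
            if pos >= size:
                break
            if pos > offsets[-1]:     # skip degenerate (empty) slices
                offsets.append(pos)
    offsets.append(size)
    return list(zip(offsets, offsets[1:]))   # (start, end) pairs
```

Each (start, end) range could then be streamed to a separate connection; as the thread notes, the extension lock on the target table still limits how far this scales.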
Re: [HACKERS] Parallel query execution
* Bruce Momjian (br...@momjian.us) wrote: Parallelism isn't going to help all queries, in fact it might be just a small subset, but it will be the larger queries. The pg_upgrade parallelism only helps clusters with multiple databases or tablespaces, but the improvements are significant. This would be fantastic and I'd like to help. Parallel query and real partitioning are two of our biggest holes for OLAP and data warehouse users. Please consider updating the page yourself or posting your ideas to this thread. Thanks. Will do. Thanks, Stephen
Re: [HACKERS] Parallel query execution
On 15 January 2013 22:14, Bruce Momjian br...@momjian.us wrote: I believe it is time to start adding parallel execution to the backend. We already have some parallelism in the backend: effective_io_concurrency and helper processes. I think it is time we start to consider additional options. A few months back, I remarked [1] that speeding up sorting using pipelining and asynchronous I/O was probably parallelism low-hanging fruit. That hasn't changed, though I personally still don't have the bandwidth to look into it in a serious way. [1] http://www.postgresql.org/message-id/caeylb_vezpkdx54vex3x30oy_uoth89xoejjw6aucjjiujs...@mail.gmail.com -- Peter Geoghegan http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training and Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Parallel query execution
On Tue, Jan 15, 2013 at 10:39:10PM +, Peter Geoghegan wrote: On 15 January 2013 22:14, Bruce Momjian br...@momjian.us wrote: I believe it is time to start adding parallel execution to the backend. We already have some parallelism in the backend: effective_io_concurrency and helper processes. I think it is time we start to consider additional options. A few months back, I remarked [1] that speeding up sorting using pipelining and asynchronous I/O was probably parallelism low-hanging fruit. That hasn't changed, though I personally still don't have the bandwidth to look into it in a serious way. [1] http://www.postgresql.org/message-id/caeylb_vezpkdx54vex3x30oy_uoth89xoejjw6aucjjiujs...@mail.gmail.com OK, I added the link to the wiki. -- Bruce Momjian br...@momjian.ushttp://momjian.us EnterpriseDB http://enterprisedb.com + It's impossible for everything to be true. + -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Parallel query execution
On Tue, Jan 15, 2013 at 10:53:29PM +, Simon Riggs wrote: On 15 January 2013 22:14, Bruce Momjian br...@momjian.us wrote: I mentioned last year that I wanted to start working on parallelism: We don't normally begin discussing topics for next release just as a CF is starting. Why is this being discussed now? It is for 9.4 and will take months. I didn't think there was a better time. We don't usually discuss features during beta testing. -- Bruce Momjian br...@momjian.ushttp://momjian.us EnterpriseDB http://enterprisedb.com + It's impossible for everything to be true. + -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Parallel query execution
On 15 January 2013 22:14, Bruce Momjian br...@momjian.us wrote: I mentioned last year that I wanted to start working on parallelism: We don't normally begin discussing topics for next release just as a CF is starting. Why is this being discussed now? -- Simon Riggs http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Parallel query execution
On 15 January 2013 22:55, Bruce Momjian br...@momjian.us wrote: Why is this being discussed now? It is for 9.4 and will take months. I didn't think there was a better time. We don't usually discuss features during beta testing. Bruce, there are many, many patches on the queue. How will we ever get to beta testing if we begin open-ended discussions on the next release? If we can't finish what we've started for 9.3, why talk about 9.4? Yes, it's a great topic for discussion, but there are better times. -- Simon Riggs http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Parallel query execution
On 16/01/13 11:14, Bruce Momjian wrote: I mentioned last year that I wanted to start working on parallelism: https://wiki.postgresql.org/wiki/Parallel_Query_Execution Years ago I added thread-safety to libpq. Recently I added two parallel execution paths to pg_upgrade. The first parallel path allows execution of external binaries pg_dump and psql (to restore). The second parallel path does copy/link by calling fork/thread-safe C functions. I was able to do each in 2-3 days. I believe it is time to start adding parallel execution to the backend. We already have some parallelism in the backend: effective_io_concurrency and helper processes. I think it is time we start to consider additional options. Parallelism isn't going to help all queries, in fact it might be just a small subset, but it will be the larger queries. The pg_upgrade parallelism only helps clusters with multiple databases or tablespaces, but the improvements are significant. I have summarized my ideas by updating our Parallel Query Execution wiki page: https://wiki.postgresql.org/wiki/Parallel_Query_Execution Please consider updating the page yourself or posting your ideas to this thread. Thanks. Hmm... How about being aware of multiple spindles - so if the requested data covers multiple spindles, then data could be extracted in parallel. This may, or may not, involve multiple I/O channels? On large multiple processor machines, there are different blocks of memory that might be accessed at different speeds depending on the processor. Possibly a mechanism could be used to split a transaction over multiple processors to ensure the fastest memory is used? Once a selection of rows has been made, then if there is a lot of reformatting going on, then could this be done in parallel? I can think of 2 very simplistic strategies: (A) use a different processor core for each column, or (B) farm out sets of rows to different cores.
I am sure that in reality there are more subtleties, and aspects of both strategies will be used in a hybrid fashion along with other approaches. I expect that before any parallel algorithm is invoked, some sort of threshold needs to be exceeded to make it worthwhile. Different aspects of the parallel algorithm may have their own thresholds. It may not be worth applying a parallel algorithm for 10 rows from a simple table, but selecting 10,000 records from multiple tables each over 10 million rows using joins may benefit from more extreme parallelism. I expect that UNIONs, as well as the processing of partitioned tables, may be amenable to parallel processing. Cheers, Gavin
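Gavin's strategy (B) — farming sets of rows out to different cores — can be modeled as chunking the row set, "reformatting" each chunk in a worker, and reassembling the results in order. A hedged sketch (illustrative Python with invented helper names; real backend code would face the fork/thread-safety issues raised in the thread):

```python
from concurrent.futures import ProcessPoolExecutor

def format_row(row):
    """Per-row 'reformatting' work; a stand-in for whatever per-tuple
    output conversion a real executor would perform."""
    return ",".join(str(v) for v in row)

def _format_chunk(chunk):
    return [format_row(r) for r in chunk]

def format_rows_parallel(rows, n_workers=4):
    """Strategy B: farm contiguous sets of rows out to worker
    processes, preserving the original row order in the result."""
    step = max(1, len(rows) // n_workers)
    chunks = [rows[i:i + step] for i in range(0, len(rows), step)]
    with ProcessPoolExecutor(n_workers) as ex:
        results = list(ex.map(_format_chunk, chunks))  # order-preserving
    return [line for chunk in results for line in chunk]
```

Row-set chunking keeps each worker's memory access local, which is also why strategy (B) fits a row-based storage layout better than per-column strategy (A).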
Re: [HACKERS] Parallel query execution
On Tue, Jan 15, 2013 at 11:01:04PM +, Simon Riggs wrote: On 15 January 2013 22:55, Bruce Momjian br...@momjian.us wrote: Why is this being discussed now? It is for 9.4 and will take months. I didn't think there was a better time. We don't usually discuss features during beta testing. Bruce, there are many, many patches on the queue. How will we ever get to beta testing if we begin open ended discussions on next release? If we can't finish what we've started for 9.3, why talk about 9.4? Yes, its a great topic for discussion, but there are better times. Like when? I don't remember a policy of not discussing things now. Does anyone else remember this? Are you saying feature discussion is only between commit-fests? Is this written down anywhere? I only remember beta-time as a time not to discuss features. -- Bruce Momjian br...@momjian.ushttp://momjian.us EnterpriseDB http://enterprisedb.com + It's impossible for everything to be true. + -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Parallel query execution
On Wed, Jan 16, 2013 at 12:03:50PM +1300, Gavin Flower wrote: On 16/01/13 11:14, Bruce Momjian wrote: I mentioned last year that I wanted to start working on parallelism: https://wiki.postgresql.org/wiki/Parallel_Query_Execution Years ago I added thread-safety to libpq. Recently I added two parallel execution paths to pg_upgrade. The first parallel path allows execution of external binaries pg_dump and psql (to restore). The second parallel path does copy/link by calling fork/thread-safe C functions. I was able to do each in 2-3 days. I believe it is time to start adding parallel execution to the backend. We already have some parallelism in the backend: effective_io_concurrency and helper processes. I think it is time we start to consider additional options. Parallelism isn't going to help all queries, in fact it might be just a small subset, but it will be the larger queries. The pg_upgrade parallelism only helps clusters with multiple databases or tablespaces, but the improvements are significant. I have summarized my ideas by updating our Parallel Query Execution wiki page: https://wiki.postgresql.org/wiki/Parallel_Query_Execution Please consider updating the page yourself or posting your ideas to this thread. Thanks. Hmm... How about being aware of multiple spindles - so if the requested data covers multiple spindles, then data could be extracted in parallel. This may, or may not, involve multiple I/O channels? Well, we usually label these as tablespaces. I don't know if spindle-level is a reasonable level to add. On large multiple processor machines, there are different blocks of memory that might be accessed at different speeds depending on the processor. Possibly a mechanism could be used to split a transaction over multiple processors to ensure the fastest memory is used? That seems too far-out for an initial approach. Once a selection of rows has been made, then if there is a lot of reformatting going on, then could this be done in parallel? 
I can think of 2 very simplistic strategies: (A) use a different processor core for each column, or (B) farm out sets of rows to different cores. I am sure in reality, there are more subtleties and aspects of both the strategies will be used in a hybrid fashion along with other approaches. Probably #2, but that is going to require having some of the modules thread/fork-safe, and that is going to be tricky. I expect that before any parallel algorithm is invoked, then some sort of threshold needs to be exceeded to make it worth while. Different aspects of the parallel algorithm may have their own thresholds. It may not be worth applying a parallel algorithm for 10 rows from a simple table, but selecting 10,000 records from multiple tables each over 10 million rows using joins may benefit for more extreme parallelism. Right, I bet we will need some way to control when the overhead of parallel execution is worth it. I expect that UNIONs, as well as the processing of partitioned tables, may be amenable to parallel processing. Interesting idea on UNION. -- Bruce Momjian br...@momjian.ushttp://momjian.us EnterpriseDB http://enterprisedb.com + It's impossible for everything to be true. + -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Parallel query execution
* Gavin Flower (gavinflo...@archidevsys.co.nz) wrote: How about being aware of multiple spindles - so if the requested data covers multiple spindles, then data could be extracted in parallel. This may, or may not, involve multiple I/O channels? Yes, this should dovetail with partitioning and tablespaces to pick up on exactly that. We're implementing our own poor-man's parallelism using exactly this to use as much of the CPU and I/O bandwidth as we can. I have every confidence that it could be done better and be simpler for us if it was handled in the backend. On large multiple processor machines, there are different blocks of memory that might be accessed at different speeds depending on the processor. Possibly a mechanism could be used to split a transaction over multiple processors to ensure the fastest memory is used? Let's work on getting it working on the h/w that PG is most commonly deployed on first.. I agree that we don't want to paint ourselves into a corner with this, but I don't think massive NUMA systems are what we should focus on first (are you familiar with any that run PG today..?). I don't expect we're going to be trying to fight with the Linux (or whatever) kernel over what threads run on what processors with access to what memory on small-NUMA systems (x86-based). Once a selection of rows has been made, then if there is a lot of reformatting going on, then could this be done in parallel? I can of think of 2 very simplistic strategies: (A) use a different processor core for each column, or (B) farm out sets of rows to different cores. I am sure in reality, there are more subtleties and aspects of both the strategies will be used in a hybrid fashion along with other approaches. Given our row-based storage architecture, I can't imagine we'd do anything other than take a row-based approach to this.. 
I would think we'd do two things: parallelize based on partitioning, and parallelize seqscans across the individual heap files which are split on a per-1G boundary already. Perhaps we can generalize that and scale it based on the number of available processors and the size of the relation but I could see advantages in matching up with what the kernel thinks are independent files. I expect that before any parallel algorithm is invoked, then some sort of threshold needs to be exceeded to make it worth while. Certainly. That needs to be included in the optimization model to support this. Thanks, Stephen
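PostgreSQL already splits large heaps into 1GB segment files named after the relfilenode ("16384", "16384.1", "16384.2", ...), so one natural unit of work is a segment per scan worker. A small sketch of enumerating those per-segment work units (illustrative Python; `segment_files` is a hypothetical helper, not backend code):

```python
SEGMENT_SIZE = 1 << 30   # heap relations are split on 1GB boundaries

def segment_files(relfilenode, relation_bytes):
    """List the per-1GB segment files a relation occupies, in scan
    order; each file could be handed to one parallel seqscan worker."""
    n_segments = max(1, -(-relation_bytes // SEGMENT_SIZE))  # ceil div
    names = [str(relfilenode)]
    names += ["%d.%d" % (relfilenode, i) for i in range(1, n_segments)]
    return names
```

As Stephen notes, matching workers to these files lines up with what the kernel already treats as independent files, which may help readahead behave sensibly.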
Re: [HACKERS] Parallel query execution
On Tue, Jan 15, 2013 at 06:15:57PM -0500, Stephen Frost wrote: * Gavin Flower (gavinflo...@archidevsys.co.nz) wrote: How about being aware of multiple spindles - so if the requested data covers multiple spindles, then data could be extracted in parallel. This may, or may not, involve multiple I/O channels? Yes, this should dovetail with partitioning and tablespaces to pick up on exactly that. We're implementing our own poor-man's parallelism using exactly this to use as much of the CPU and I/O bandwidth as we can. I have every confidence that it could be done better and be simpler for us if it was handled in the backend. Yes, I have listed tablespaces and partitions as possible parallel options on the wiki. On large multiple processor machines, there are different blocks of memory that might be accessed at different speeds depending on the processor. Possibly a mechanism could be used to split a transaction over multiple processors to ensure the fastest memory is used? Let's work on getting it working on the h/w that PG is most commonly deployed on first.. I agree that we don't want to paint ourselves into a corner with this, but I don't think massive NUMA systems are what we should focus on first (are you familiar with any that run PG today..?). I don't expect we're going to be trying to fight with the Linux (or whatever) kernel over what threads run on what processors with access to what memory on small-NUMA systems (x86-based). Agreed. Once a selection of rows has been made, then if there is a lot of reformatting going on, then could this be done in parallel? I can of think of 2 very simplistic strategies: (A) use a different processor core for each column, or (B) farm out sets of rows to different cores. I am sure in reality, there are more subtleties and aspects of both the strategies will be used in a hybrid fashion along with other approaches. 
Given our row-based storage architecture, I can't imagine we'd do anything other than take a row-based approach to this.. I would think we'd do two things: parallelize based on partitioning, and parallelize seqscan's across the individual heap files which are split on a per-1G boundary already. Perhaps we can generalize that and scale it based on the number of available processors and the size of the relation but I could see advantages in matching up with what the kernel thinks are independent files. The 1GB idea is interesting. I found in pg_upgrade that file copy would just overwhelm the I/O channel, and that doing multiple copies on the same device had no win, but those were pure I/O operations --- a sequential scan might be enough of a mix of I/O and CPU that parallelism might help. I expect that before any parallel algorithm is invoked, then some sort of threshold needs to be exceeded to make it worth while. Certainly. That's need to be included in the optimization model to support this. I have updated the wiki to reflect the ideas mentioned above. -- Bruce Momjian br...@momjian.ushttp://momjian.us EnterpriseDB http://enterprisedb.com + It's impossible for everything to be true. + -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Parallel query execution
On Wed, Jan 16, 2013 at 7:14 AM, Bruce Momjian br...@momjian.us wrote: I mentioned last year that I wanted to start working on parallelism: https://wiki.postgresql.org/wiki/Parallel_Query_Execution Years ago I added thread-safety to libpq. Recently I added two parallel execution paths to pg_upgrade. The first parallel path allows execution of external binaries pg_dump and psql (to restore). The second parallel path does copy/link by calling fork/thread-safe C functions. I was able to do each in 2-3 days. I believe it is time to start adding parallel execution to the backend. We already have some parallelism in the backend: effective_io_concurrency and helper processes. I think it is time we start to consider additional options. Parallelism isn't going to help all queries, in fact it might be just a small subset, but it will be the larger queries. The pg_upgrade parallelism only helps clusters with multiple databases or tablespaces, but the improvements are significant. I have summarized my ideas by updating our Parallel Query Execution wiki page: https://wiki.postgresql.org/wiki/Parallel_Query_Execution Please consider updating the page yourself or posting your ideas to this thread. Thanks. Honestly that would be a great feature, and I would be happy to help work on it. Taking advantage of parallelism in a server with multiple cores, especially for things like large sorting operations, would be great. Just thinking out loud, but wouldn't it be the role of the planner to determine whether a given query is worth using parallelism? The executor would then be in charge of actually firing the tasks in parallel that the planner has determined are necessary. -- Michael Paquier http://michael.otacoo.com
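Michael's division of labor — the planner decides, the executor fires — implies a cost comparison: parallelism pays only when the per-worker saving exceeds the fixed startup and communication overhead. A toy decision function (all constants and names here are invented for illustration, not actual planner knobs):

```python
# Hypothetical knobs: workers add a fixed startup cost that must be
# amortized over the tuples they process.
PARALLEL_SETUP_COST = 1000.0   # assumed per-worker startup overhead
PER_TUPLE_COST = 0.01          # assumed cost to process one tuple

def choose_workers(est_rows, max_workers=4):
    """Pick a worker count only when the estimated parallel cost beats
    the serial cost; 0 means run the plan serially."""
    serial_cost = est_rows * PER_TUPLE_COST
    best, best_cost = 0, serial_cost
    for w in range(1, max_workers + 1):
        # leader participates too, hence division by (w + 1)
        cost = serial_cost / (w + 1) + w * PARALLEL_SETUP_COST
        if cost < best_cost:
            best, best_cost = w, cost
    return best
```

With these assumed constants, small row estimates stay serial while large scans justify the maximum worker count, which matches the thread's point that only the larger queries benefit.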
Re: [HACKERS] Parallel query execution
On Wed, Jan 16, 2013 at 09:11:20AM +0900, Michael Paquier wrote: Just thinking out loud, but wouldn't it be the planner's role to determine whether a given query would benefit from parallelism? The executor would then be in charge of actually firing in parallel the tasks the planner has determined necessary. Yes, it would probably be driven off of the optimizer statistics. -- Bruce Momjian br...@momjian.us http://momjian.us EnterpriseDB http://enterprisedb.com + It's impossible for everything to be true. 
+
Re: [HACKERS] Parallel query execution
On Tue, Jan 15, 2013 at 8:19 PM, Bruce Momjian br...@momjian.us wrote: The 1GB idea is interesting. I found in pg_upgrade that file copy would just overwhelm the I/O channel, and that doing multiple copies on the same device had no win, but those were pure I/O operations --- a sequential scan might be enough of a mix of I/O and CPU that parallelism might help. AFAIR, synchroscans were introduced because multiple large sequential scans were counterproductive (big time).
Re: [HACKERS] Parallel query execution
* Claudio Freire (klaussfre...@gmail.com) wrote: On Tue, Jan 15, 2013 at 8:19 PM, Bruce Momjian br...@momjian.us wrote: The 1GB idea is interesting. I found in pg_upgrade that file copy would just overwhelm the I/O channel, and that doing multiple copies on the same device had no win, but those were pure I/O operations --- a sequential scan might be enough of a mix of I/O and CPU that parallelism might help. AFAIR, synchroscans were introduced because multiple large sequential scans were counterproductive (big time). Sequentially scanning the *same* data over and over is certainly counterproductive. Synchroscans fixed that, yes. That's not what we're talking about though --- we're talking about scanning and processing independent sets of data using multiple processes. It's certainly possible that in some cases that won't be as good, but there will be quite a few cases where it's much, much better. Consider a very complicated function running against each row which makes the CPU the bottleneck instead of the I/O system. That type of query will never run faster than a single CPU in a single-process environment, regardless of whether you have synch-scans, while in a multi-process environment you'll take advantage of the extra CPUs which are available and use more of the I/O bandwidth that isn't yet exhausted. Thanks, Stephen
Re: [HACKERS] Parallel query execution
On Wed, Jan 16, 2013 at 12:13 AM, Stephen Frost sfr...@snowman.net wrote: * Claudio Freire (klaussfre...@gmail.com) wrote: On Tue, Jan 15, 2013 at 8:19 PM, Bruce Momjian br...@momjian.us wrote: The 1GB idea is interesting. I found in pg_upgrade that file copy would just overwhelm the I/O channel, and that doing multiple copies on the same device had no win, but those were pure I/O operations --- a sequential scan might be enough of a mix of I/O and CPU that parallelism might help. AFAIR, synchroscans were introduced because multiple large sequential scans were counterproductive (big time). Sequentially scanning the *same* data over and over is certainly counterproductive. Synchroscans fixed that, yes. That's not what we're talking about though --- we're talking about scanning and processing independent sets of data using multiple processes. I don't see the difference. Blocks are blocks (unless they're cached). It's certainly possible that in some cases that won't be as good If memory serves me correctly (and it does, I suffered it a lot), the performance hit is quite considerable. Enough to make it a lot worse rather than not as good. but there will be quite a few cases where it's much, much better. Just cached segments.
Re: [HACKERS] Parallel query execution
* Claudio Freire (klaussfre...@gmail.com) wrote: On Wed, Jan 16, 2013 at 12:13 AM, Stephen Frost sfr...@snowman.net wrote: Sequentially scanning the *same* data over and over is certainly counterproductive. Synchroscans fixed that, yes. That's not what we're talking about though --- we're talking about scanning and processing independent sets of data using multiple processes. I don't see the difference. Blocks are blocks (unless they're cached). Not quite. Having to go out to the kernel isn't free. Additionally, the seq scans used to pollute our shared buffers prior to synch-scanning, which didn't help things. It's certainly possible that in some cases that won't be as good If memory serves me correctly (and it does, I suffered it a lot), the performance hit is quite considerable. Enough to make it a lot worse rather than not as good. I feel like we must not be communicating very well. If the CPU is pegged at 100% and the I/O system is at 20%, adding another CPU at 100% will bring the I/O load up to 40% and you're now processing data twice as fast overall. If you're running a single CPU at 20% and your I/O system is at 100%, then adding another CPU isn't going to help and may even degrade performance by causing problems for the I/O system. The goal of the optimizer will be to model the plan to account for exactly that, as best it can. but there will be quite a few cases where it's much, much better. Just cached segments. No, certainly not just cached segments. Any situation where the CPU is the bottleneck. Thanks, Stephen
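Stephen's 100%/20% arithmetic can be put into a small model. This is a toy of my own (the numbers and the linear-until-saturation assumption are mine, not from the thread): throughput scales with the number of workers until their aggregate I/O demand hits the I/O system's capacity, after which extra CPUs buy nothing.

```python
# Toy model of the CPU-bound vs I/O-bound trade-off discussed above.
# Each worker consumes `io_per_worker` of the I/O system's capacity
# (as a fraction); the I/O system saturates at `io_capacity`.

def throughput(n_workers, io_per_worker, io_capacity=1.0):
    """Relative throughput: linear in workers while CPU-bound,
    capped by I/O bandwidth once the disks saturate."""
    io_demand = n_workers * io_per_worker
    if io_demand <= io_capacity:
        return float(n_workers)          # CPU-bound: linear speedup
    return io_capacity / io_per_worker   # I/O-bound: bandwidth cap

# Each worker needs 20% of the I/O system (Stephen's example):
print(throughput(1, 0.2))  # 1.0
print(throughput(2, 0.2))  # 2.0  -- twice as fast, I/O now at 40%
print(throughput(8, 0.2))  # 5.0  -- capped; extra CPUs add nothing
```

Claudio's objection, addressed below, is that on spinning disks `io_capacity` is not a constant: concurrent sequential scans interfere, so the cap itself shrinks as workers are added.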
Re: [HACKERS] Parallel query execution
but there will be quite a few cases where it's much, much better. Just cached segments. Actually, thanks to much faster storage (think SSD, SAN), it's easily possible for PostgreSQL to become CPU-limited on a seq scan query, even when reading from disk. -- Josh Berkus PostgreSQL Experts Inc. http://pgexperts.com
Re: [HACKERS] Parallel query execution
* Josh Berkus (j...@agliodbs.com) wrote: Actually, thanks to much faster storage (think SSD, SAN), it's easily possible for PostgreSQL to become CPU-limited on a seq scan query, even when reading from disk. Particularly with a complex filter being applied, or if it's feeding into something above that's expensive. Thanks, Stephen
Re: [HACKERS] Parallel query execution
Claudio, Stephen, It really seems like the areas where we could get the most bang for the buck in parallelism would be: 1. Parallel sort 2. Parallel aggregation (for commutative aggregates) 3. Parallel nested loop join (especially for expression joins, like GIS) -- Josh Berkus PostgreSQL Experts Inc. http://pgexperts.com
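Item 2 deserves a word on why commutativity (really, commutativity plus associativity) is the enabling property: each worker can aggregate its own chunk of rows into a partial state, and the partial states can then be combined in any order. A sketch of my own (not PostgreSQL code, and the chunking scheme is arbitrary):

```python
# Why commutative/associative aggregates parallelize: per-worker
# partial aggregation followed by an order-independent combine step.

def parallel_sum(rows, n_workers=4):
    """Split rows into chunks, compute per-worker partial sums, then
    combine.  Because '+' commutes and associates, combine order is
    irrelevant -- which is what lets workers run independently."""
    chunks = [rows[i::n_workers] for i in range(n_workers)]
    partials = [sum(chunk) for chunk in chunks]  # one per worker
    return sum(partials)                         # final combine

rows = list(range(1, 101))
print(parallel_sum(rows))  # 5050, same as serial sum(rows)

# AVG needs a decomposable state (sum, count) rather than an average
# of averages, which would be wrong for unequal chunk sizes:
def parallel_avg(rows, n_workers=4):
    chunks = [rows[i::n_workers] for i in range(n_workers)]
    partials = [(sum(c), len(c)) for c in chunks]
    total, count = map(sum, zip(*partials))
    return total / count

print(parallel_avg(rows))  # 50.5
```

Non-commutative aggregates (e.g. string_agg with a meaningful order) can't be split this way without extra machinery to preserve ordering.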
Re: [HACKERS] Parallel query execution
On Wed, Jan 16, 2013 at 1:22 PM, Josh Berkus j...@agliodbs.com wrote: Claudio, Stephen, It really seems like the areas where we could get the most bang for the buck in parallelism would be: 1. Parallel sort 2. Parallel aggregation (for commutative aggregates) 3. Parallel nested loop join (especially for expression joins, like GIS) parallel data load? :/ -- Michael Paquier http://michael.otacoo.com
Re: [HACKERS] Parallel query execution
On Wed, Jan 16, 2013 at 01:28:18PM +0900, Michael Paquier wrote: On Wed, Jan 16, 2013 at 1:22 PM, Josh Berkus j...@agliodbs.com wrote: Claudio, Stephen, It really seems like the areas where we could get the most bang for the buck in parallelism would be: 1. Parallel sort 2. Parallel aggregation (for commutative aggregates) 3. Parallel nested loop join (especially for expression joins, like GIS) parallel data load? :/ We have that in pg_restore, and I think we are getting parallel dump in 9.3, right? Unfortunately, I don't see it in the last 9.3 commit-fest. Is it still being worked on? -- Bruce Momjian br...@momjian.us http://momjian.us EnterpriseDB http://enterprisedb.com + It's impossible for everything to be true. +
Re: [HACKERS] Parallel query execution
On Wed, Jan 16, 2013 at 1:32 PM, Bruce Momjian br...@momjian.us wrote: On Wed, Jan 16, 2013 at 01:28:18PM +0900, Michael Paquier wrote: On Wed, Jan 16, 2013 at 1:22 PM, Josh Berkus j...@agliodbs.com wrote: Claudio, Stephen, It really seems like the areas where we could get the most bang for the buck in parallelism would be: 1. Parallel sort 2. Parallel aggregation (for commutative aggregates) 3. Parallel nested loop join (especially for expression joins, like GIS) parallel data load? :/ We have that in pg_restore, and I thinnk we are getting parallel dump in 9.3, right? Unfortunately, I don't see it in the last 9.3 commit-fest. Is it still being worked on? Not exactly, I meant something like being able to use parallel processing when doing INSERT or COPY directly in core. If there is a parallel processing infrastructure, it could also be used for such write operations. I agree that the cases mentioned by Josh are far more appealing though... -- Michael Paquier http://michael.otacoo.com
Re: [HACKERS] Parallel query execution
On Wed, Jan 16, 2013 at 12:55 AM, Stephen Frost sfr...@snowman.net wrote: If memory serves me correctly (and it does, I suffered it a lot), the performance hit is quite considerable. Enough to make it a lot worse rather than not as good. I feel like we must not be communicating very well. If the CPU is pegged at 100% and the I/O system is at 20%, adding another CPU at 100% will bring the I/O load up to 40% and you're now processing data twice as fast overall. Well, there's the fault in your logic. It won't be as linear. Adding another sequential scan will decrease bandwidth; if the I/O system was doing say 10MB/s at 20% load, now it will be doing 20MB/s at 80% load (maybe even worse). Quite suddenly you'll meet diminishing returns, and the I/O subsystem which wasn't the bottleneck will become it, bandwidth being the key. You might end up with less bandwidth than you started with, if you go far enough past that knee. Add some concurrent operations (connections) to the mix and it just gets worse. Figuring out where the knee is may be the hardest problem you'll face. I don't think it'll be predictable enough to make I/O parallelization in that case worth the effort. If you instead think of parallelizing random I/O (say index scans within nested loops), that might work (or it might not). Again it depends a helluva lot on what else is contending with the I/O resources and how far ahead of optimum you push it. I've faced this problem when trying to prefetch on index scans. If you try to prefetch too much, you induce extra delays and it's a bad tradeoff. Feel free to do your own testing.
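The nonlinearity Claudio describes can be illustrated with a toy seek-penalty model (entirely my own numbers, not measurements): on spinning disks, interleaving sequential streams forces seeks between them, so aggregate bandwidth not only fails to scale but can fall below the single-scan figure.

```python
# Toy model of interleaved sequential scans on a spinning disk: each
# extra concurrent stream forces seeks, shrinking effective bandwidth.
# seq_mb_s and seek_penalty are made-up illustrative values.

def aggregate_bandwidth(n_scans, seq_mb_s=100.0, seek_penalty=0.35):
    """Total MB/s across all scans: each additional stream multiplies
    effective bandwidth by (1 - seek_penalty)."""
    return seq_mb_s * ((1 - seek_penalty) ** (n_scans - 1))

for n in range(1, 6):
    total = aggregate_bandwidth(n)
    print(f"{n} scans: {total:6.1f} MB/s total, {total / n:5.1f} MB/s each")
```

Past the knee, both total and per-scan bandwidth decline, which is Claudio's point; on SSDs or when the data is cached, seek_penalty is near zero and Stephen's linear picture holds much further out.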
Re: [HACKERS] Parallel query execution
Bruce Momjian wrote: On Wed, Jan 16, 2013 at 01:28:18PM +0900, Michael Paquier wrote: On Wed, Jan 16, 2013 at 1:22 PM, Josh Berkus j...@agliodbs.com wrote: Claudio, Stephen, It really seems like the areas where we could get the most bang for the buck in parallelism would be: 1. Parallel sort 2. Parallel aggregation (for commutative aggregates) 3. Parallel nested loop join (especially for expression joins, like GIS) parallel data load? :/ We have that in pg_restore, and I think we are getting parallel dump in 9.3, right? Unfortunately, I don't see it in the last 9.3 commit-fest. Is it still being worked on? It's in the previous-to-last commitfest. IIRC that patch required review and testing from people with some Windows background. There are still 34 items needing attention in CF3. I suggest that, if you have some spare time, your help would be very much appreciated there. The commitfest that started on Jan 15th has 65 extra items. Anything currently listed in CF3 can rightfully be considered to be part of CF4, too. -- Álvaro Herrera http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services
Re: [HACKERS] Parallel query execution
On Tuesday, January 15, 2013, Simon Riggs wrote: On 15 January 2013 22:55, Bruce Momjian br...@momjian.us wrote: Why is this being discussed now? It is for 9.4 and will take months. I didn't think there was a better time. We don't usually discuss features during beta testing. Bruce, there are many, many patches on the queue. How will we ever get to beta testing if we begin open-ended discussions on the next release? If we can't finish what we've started for 9.3, why talk about 9.4? Yes, it's a great topic for discussion, but there are better times. Possibly so. But unless we are to introduce a thinkfest, how do we know when such a better time would be? Lately commit-fests have been basically a continuous thing, except during beta, which would be an even worse time to discuss it. I think that parallel execution is huge and probably more likely for 9.5 (10.0?) than 9.4 for the general case (maybe some special cases for 9.4, like index builds). Yet the single biggest risk I see to the future of the project is the lack of parallel execution. Cheers, Jeff
Re: [HACKERS] Parallel query execution
Alvaro Herrera alvhe...@2ndquadrant.com writes: There are still 34 items needing attention in CF3. I suggest that, if you have some spare time, your help would be very much appreciated there. The commitfest that started on Jan 15th has 65 extra items. Anything currently listed in CF3 can rightfully be considered to be part of CF4, too. In case you hadn't noticed, we've totally lost control of the CF process. Quite aside from the lack of progress on closing CF3, major hackers who should know better are submitting significant new feature patches now, despite our agreement in Ottawa that nothing big would be accepted after CF3. At this point I'd bet against releasing 9.3 during 2013. regards, tom lane
Re: [HACKERS] Parallel Query Execution Project
Hi, On 09/28/2010 07:24 AM, Li Jie wrote: I'm interested in this parallel project, http://wiki.postgresql.org/wiki/Parallel_Query_Execution But I can't find any discussion or current progress on the website; it seems to have stopped for nearly a year? Yeah, I don't know of anybody really working on it ATM. If you are interested in a process-based design, please have a look at the bgworker infrastructure stuff. It could be of help for a process-based implementation. Regards Markus Wanner
Re: [HACKERS] Parallel Query Execution Project
On Sep 28, 2010, at 10:15 AM, Markus Wanner wrote: Hi, On 09/28/2010 07:24 AM, Li Jie wrote: I'm interested in this parallel project, http://wiki.postgresql.org/wiki/Parallel_Query_Execution But I can't find any discussion or current progress on the website; it seems to have stopped for nearly a year? Yeah, I don't know of anybody really working on it ATM. If you are interested in a process-based design, please have a look at the bgworker infrastructure stuff. It could be of help for a process-based implementation. Regards Markus Wanner yes, i don't know of anybody either. in addition to that it is more than a giant task. it means working on more than just one isolated part. practically i cannot think of any stage of query execution which would not need some changes. i don't see a feature like that within a realistic timeframe. regards, hans -- Cybertec Schönig & Schönig GmbH Gröhrmühlgasse 26 A-2700 Wiener Neustadt Web: http://www.postgresql-support.de