Re: [HACKERS] [POC] Faster processing at Gather node

2017-10-16 Thread Rafia Sabih
On Tue, Oct 17, 2017 at 3:22 AM, Andres Freund <and...@anarazel.de> wrote:
> Hi Rafia,
>
> On 2017-05-19 17:25:38 +0530, Rafia Sabih wrote:
>> head:
>> explain  analyse select * from t where i < 3000;
>>  QUERY PLAN
>
> Could you share how exactly you generated the data? Just so others can
> compare a bit better with your results?
>

Sure. I used generate_series(1, 1000);
Please find the attached script for the detailed steps.

-- 
Regards,
Rafia Sabih
EnterpriseDB: http://www.enterprisedb.com/


large_tbl_gen.sql
Description: Binary data



Re: [HACKERS] Effect of changing the value for PARALLEL_TUPLE_QUEUE_SIZE

2017-09-25 Thread Rafia Sabih
On Thu, Sep 21, 2017 at 10:34 PM, Dilip Kumar <dilipbal...@gmail.com> wrote:
> On Thu, Sep 21, 2017 at 4:50 PM, Rafia Sabih
> <rafia.sa...@enterprisedb.com> wrote:
>> On Sun, Sep 17, 2017 at 9:10 PM, Dilip Kumar <dilipbal...@gmail.com> wrote:
>>> On Wed, Sep 6, 2017 at 4:14 PM, Rafia Sabih
>>> <rafia.sa...@enterprisedb.com> wrote:
>>>
>>
>> Please find the attached file for the revised version.
>
> Thanks for the updated patch, I have some more comments.
>
Again, thanks for the review. Please find attached the updated patch.

> +static shm_mq_handle *local_mq_attach(shm_mq_handle *mqh);
> +static uint64 space_in_local(local_mq * lq, Size tuple_size);
> +static bool read_local_queue(local_mq * lq, bool shm_mq_full);
>
> local_mq * lq  -> local_mq *lq
> same for other places as well.
>
> ---
>
Done. This is something pgindent does; anyhow, I have corrected it.

> +static uint64 space_in_shm(shm_mq *mq);
> +
> +static uint64 space_in_local(local_mq * lq, Size tuple_size);
>
> we better use Size here instead of uint64
>
> ---
Done
>
> + available = ringsize - used;
> +
> + ringsize = lq->mq_ring_size;
> + writer_offset = lq->mq_bytes_written % ringsize;
> + reader_offset = lq->mq_bytes_read % ringsize;
> +
> + if (writer_offset + tuple_size < ringsize && reader_offset < writer_offset)
> + available = (ringsize - writer_offset);
>
> even though there is space in queue but tuple need rotation then we
> are not allowing it to
> write into the local queue.  If there is some strong reason behind
> that, please add comments
> to explain the same.
> ---
>
Corrected; it now just returns the available space.
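For reference, the simplified computation amounts to something like
the sketch below (based on the fields in the quoted hunk, not the
patch itself; tuple_size is kept only to match the patch's prototype):

    /* free space in the local ring; wraparound is left to the writer */
    static uint64
    space_in_local(local_mq *lq, Size tuple_size)
    {
        uint64  used = lq->mq_bytes_written - lq->mq_bytes_read;

        return lq->mq_ring_size - used;
    }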

> + if (shm_mq_full || (written - read) >= .05 * lq->mq_ring_size)
> + return true;
> +
> + else
> + return true;
> +}
>
> Seems like you want to return 'false' in the else case.
>
> 
Yes and done.
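For clarity, the corrected check would read roughly as follows (a
sketch reusing the fields from the quoted hunk; the 5% drain threshold
is the constant shown above):

    /* decide whether the local queue should be drained now */
    static bool
    read_local_queue(local_mq *lq, bool shm_mq_full)
    {
        uint64  pending = lq->mq_bytes_written - lq->mq_bytes_read;

        /* drain if the shared queue is full or >= 5% of the ring is used */
        if (shm_mq_full || pending >= .05 * lq->mq_ring_size)
            return true;
        else
            return false;
    }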
>
> + read_offset = lq->mq_bytes_read % lq->mq_ring_size;
> + available = space_in_shm(mqh->mqh_queue);
> +
> + /* always read data in the aligned form */
> + to_read = MAXALIGN_DOWN(Min(used, available));
> +
> + /*
> + * if the amount of data to be sent from local queue involves wrapping of
> + * local queue, then send only the data till the end of queue right now
> + * and rest later.
> + */
> + if (lq->mq_bytes_read % lq->mq_ring_size + to_read > lq->mq_ring_size)
>
> You can directly use "read_offset" instead of recalculating
> lq->mq_bytes_read % lq->mq_ring_size.
>
> 
Done.
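As an aside, the wrap could also be handled at a single call site with
two copies instead of deferring the tail, roughly as below (a sketch;
mq_ring stands for the ring array and dest for the destination buffer,
both of which are assumptions here):

    /* copy a possibly-wrapping region in at most two memcpy calls */
    Size    first = Min(to_read, lq->mq_ring_size - read_offset);

    memcpy(dest, &lq->mq_ring[read_offset], first);
    if (to_read > first)        /* the part that wrapped, if any */
        memcpy(dest + first, &lq->mq_ring[0], to_read - first);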
> + do
> + {
> + if (shm_mq_is_detached(mqh->mqh_queue))
> + return SHM_MQ_DETACHED;
> +
> + shm_space = space_in_shm(mqh->mqh_queue);
> +
> + /*
> + * cannot send data to shared queue, unless there is required
> + * space, so wait till we get some space, since we cannot
> + * write anymore in local queue as of now
> + */
> + WaitLatch(MyLatch, WL_LATCH_SET, 0, WAIT_EVENT_MQ_SEND);
> +
> + /* Reset the latch so we don't spin. */
> + ResetLatch(MyLatch);
> +
> + /* An interrupt may have occurred while we were waiting. */
> + CHECK_FOR_INTERRUPTS();
> +
> + shm_space = space_in_shm(mqh->mqh_queue);
> +
> + if (read_local_queue(lq, true) && shm_space > 0)
> + copy_local_to_shared(lq, mqh, false);
> +
> + local_space = space_in_local(lq, tuple_size);
> +
> + } while (local_space <= tuple_size);
> +
>
> 1. Just after getting the shm_space, you are calling WaitLatch,
> without even checking whether
> that space is sufficient to send the tuple.
> 2. Every time after latch is set (maybe some space freed in the shared
> queue) you are calling
> copy_local_to_shared to send as much data as possible from local to
> shared queue, but that doesn't
> even guarantee that we will have sufficient space in the local queue
> to accommodate the current tuple.
>
> I think calling copy_local_to_shared multiple time (which will
> internally acquire mutex), after latch
> is set you can check the shared queue space, don't attempt
> copy_local_to_shared unless
> shm_space >=tuple_size
>
Done.
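Following that suggestion, the reworked loop looks roughly like this
(a sketch using the helper names from the quoted hunk, not the actual
patch text):

    do
    {
        if (shm_mq_is_detached(mqh->mqh_queue))
            return SHM_MQ_DETACHED;

        /* sleep until the receiver frees some space and sets our latch */
        WaitLatch(MyLatch, WL_LATCH_SET, 0, WAIT_EVENT_MQ_SEND);
        ResetLatch(MyLatch);
        CHECK_FOR_INTERRUPTS();

        shm_space = space_in_shm(mqh->mqh_queue);

        /* attempt the (mutex-acquiring) copy only if a tuple can fit */
        if (shm_space >= tuple_size && read_local_queue(lq, true))
            copy_local_to_shared(lq, mqh, false);

        local_space = space_in_local(lq, tuple_size);
    } while (local_space < tuple_size);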
> --
> Regards,
> Dilip Kumar
> EnterpriseDB: http://www.enterprisedb.com



-- 
Regards,
Rafia Sabih
EnterpriseDB: http://www.enterprisedb.com/


faster_gather_v3.patch
Description: Binary data



Re: [HACKERS] Effect of changing the value for PARALLEL_TUPLE_QUEUE_SIZE

2017-09-21 Thread Rafia Sabih
On Sun, Sep 17, 2017 at 9:10 PM, Dilip Kumar <dilipbal...@gmail.com> wrote:
> On Wed, Sep 6, 2017 at 4:14 PM, Rafia Sabih
> <rafia.sa...@enterprisedb.com> wrote:
>
>> I worked on this idea of using local queue as a temporary buffer to
>> write the tuples when master is busy and shared queue is full, and it
>> gives quite some improvement in the query performance.
>>
>
> I have done some initial review of this patch and I have some comments.
>
Thanks for the review.

> /* this is actual size for this tuple which will be written in queue */
> + tuple_size = MAXALIGN(sizeof(Size)) + MAXALIGN(iov.len);
> +
> + /* create and attach a local queue, if it is not yet created */
> + if (mqh->mqh_local_queue == NULL)
> + mqh = local_mq_attach(mqh);
>
> I think we can create the local queue when first time we need it. So
> basically you
> can move this code inside else part where we first identify that there is no
> space in the shared queue.
>
Done.
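The lazy placement suggested above amounts to something like this
sketch (shm_mq_send_bytes is the existing internal writer in shm_mq.c;
the write_in_local_queue argument list here is a guess):

    /* fast path: enough room in the shared queue, write the tuple there */
    if (space_in_shm(mqh->mqh_queue) >= tuple_size)
        res = shm_mq_send_bytes(mqh, iov.len, iov.data, nowait, &bytes_written);
    else
    {
        /* shared queue is full: create and attach the local queue lazily */
        if (mqh->mqh_local_queue == NULL)
            mqh = local_mq_attach(mqh);

        write_in_local_queue(mqh->mqh_local_queue, &iov);
    }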
> --
> + /* write in local queue if there is enough space*/
> + if (local_space > tuple_size)
>
> I think the condition should be if (local_space >= tuple_size)
>
I did this to be on the safer side; anyhow, modified.
> --
> + while(shm_space <= 0)
> + {
> + if (shm_mq_is_detached(mqh->mqh_queue))
> + return SHM_MQ_DETACHED;
> +
> + shm_space = space_in_shm(mqh->mqh_queue);
> + }
>
> Instead of waiting in CPU intensive while loop we should wait on some latch, 
> why
> can't we use wait latch of the shared queue and whenever some space
> free, the latch will
> be set then we can recheck the space and if it's enough we can write
> to share queue
> otherwise wait on the latch again.
>
> Check other similar occurrences.
Done.
> -
>
> + if (read_local_queue(lq, true) && shm_space > 0)
> + copy_local_to_shared(lq, mqh, false);
> +
>
> Instead of adding shm_space > 0 in check it can be Assert or nothing
> because above loop will
> only break if shm_space > 0.
> 
Done
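With the wait loop guaranteeing free shared space on exit, the check
reduces to (a sketch):

    /* the wait loop above only terminates once shm_space > 0 */
    Assert(shm_space > 0);
    if (read_local_queue(lq, true))
        copy_local_to_shared(lq, mqh, false);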
>
> + /*
> + * create a local queue, the size of this queue should be way higher
> + * than PARALLEL_TUPLE_QUEUE_SIZE
> + */
> + char *mq;
> + Size len;
> +
> + len = 6553600;
>
> Create some macro which is multiple of PARALLEL_TUPLE_QUEUE_SIZE,
>
Done.
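For instance (a hypothetical name and factor; 100 x
PARALLEL_TUPLE_QUEUE_SIZE, i.e. 100 x 65536 bytes, gives the 6553600
quoted above):

    /* local queue is a fixed multiple of the shared tuple queue size */
    #define LOCAL_TUPLE_QUEUE_SIZE  (100 * PARALLEL_TUPLE_QUEUE_SIZE)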
> ---
>
> + /* this local queue is not required anymore, hence free the space. */
> + pfree(mqh->mqh_local_queue);
> + return;
> +}
>
> I think you can remove the return at the end of the void function.
> -
Done.
>
> In empty_queue(shm_mq_handle *mqh) function I see that only last exit path 
> frees
> the local queue but not all even though local queue is created.
> 
>
Modified.
> Other cosmetic issues.
> -
> +/* check the space availability in local queue */
> +static uint64
> +space_in_local(local_mq *lq, Size tuple_size)
> +{
> + uint64 read, written, used, available, ringsize, writer_offset, 
> reader_offset;
> +
> + ringsize = lq->mq_ring_size;
> + read = lq->mq_bytes_read;
> + written = lq->mq_bytes_written;
> + used = written - read;
> + available = ringsize - used;
> +
> Alignment is not correct, I think you can run pgindent once.
> 
>
> + /* check is there is required space in shared queue */
> statement need refactoring. "check if there is required space in shared 
> queue" ?
>
> -
>
> + /* write in local queue if there is enough space*/
> space missing before comment end.
>
> -
>
> +
> +/* Routines required for local queue */
> +
> +/*
> + * Initialize a new local message queue, this is kept quite similar
> to shm_mq_create.
> + */
>
> Header comments formatting is not in sync with other functions.
>
> -
>
> +}
> +/* routine to create and attach local_mq to the shm_mq_handle */
>
> one blank line between two functions.
>
> -
>
Ran pgindent for these issues.

> + bool detached;
> +
> + detached = false;
>
> a better way is bool detached = false;
>
> -
>
Done.
> Compilation warning
> 
> shm_mq.c: In function ‘write_in_local_queue’:
> shm_mq.c:1489:32: warning: variable ‘tuple_size’ set but not used
> [-Wunused-but-set-variable]
>   uint64 bytes_written, nbytes, tuple_size;
>
That might be the case when not configured with cassert; however, it
is removed in the current version anyway.

Please find the attached file for the revised version.

-- 
Regards,
Rafia Sabih
EnterpriseDB: http://www.enterprisedb.com/


faster_gather_v2.patch
Description: Binary data



Re: [HACKERS] Partition-wise join for join between (declaratively) partitioned tables

2017-09-20 Thread Rafia Sabih
On Tue, Sep 19, 2017 at 3:50 PM, Alvaro Herrera <alvhe...@alvh.no-ip.org> wrote:
> Rafia Sabih wrote:
>
>> On completing the benchmark for all queries for the above mentioned
>> setup, following performance improvement can be seen,
>> Query | Patch | Head
>> 3  | 1455  |  1631
>> 4  |  499  |  4344
>> 5  |  1464  |  1606
>> 10  |  1475  |  1599
>> 12  |  1465  |  1790
>>
>> Note that all values of execution time are in seconds.
>> To summarise, apart from Q4, all other queries are showing somewhat
>> 10-20% improvement.
>
> Saving 90% of time on the slowest query looks like a worthy improvement
> in its own right.  However, you're reporting execution time only, right?
> What happens to planning time?  In a quick look,

Definitely. The planning time issue has been discussed upthread,

On Mon, Mar 20, 2017 at 12:07 PM, Rafia Sabih
<rafia.sa...@enterprisedb.com> wrote:

> Another minor thing to note is that planning time is almost twice with
> this patch, though I understand that this is for scenarios with really
> big 'big data', so this may not be a serious issue in such cases; but
> it'd be good if we can keep an eye on this so that it doesn't exceed
> the computational bounds for a really large number of tables.

To which Robert replied:

Yes, this is definitely going to use significant additional planning
time and memory.  There are several possible strategies for improving
that situation, but I think we need to get the basics in place first.
That's why the proposal is now to have this turned off by default.
People joining really big tables that happen to be equipartitioned are
likely to want to turn it on, though, even before those optimizations
are done.

-- 
Regards,
Rafia Sabih
EnterpriseDB: http://www.enterprisedb.com/




Re: [HACKERS] utility commands benefiting from parallel plan

2017-09-13 Thread Rafia Sabih
On Wed, Sep 13, 2017 at 2:29 PM, Haribabu Kommi
<kommi.harib...@gmail.com> wrote:
>
>
> On Wed, Sep 13, 2017 at 4:17 PM, Rafia Sabih <rafia.sa...@enterprisedb.com>
> wrote:
>>
>> On Fri, Sep 1, 2017 at 12:31 PM, Haribabu Kommi
>> <kommi.harib...@gmail.com> wrote:
>> >
>> > Hi All,
>> >
>> > Attached a rebased patch that supports parallelism for the queries
>> > that are underneath of some utility commands such as CREATE TABLE AS
>> > and CREATE MATERIALIZED VIEW.
>> >
>> > Note: This patch doesn't make the utility statement (insert operation)
>> > to run in parallel. It only allows the select query to be parallel if
>> > the
>> > query
>> > is eligible for parallel.
>> >
>>
>> Here is my feedback for this patch,
>>
>> - The patch is working as expected, all regression tests are passing
>
>
> Thanks for the review.
>
>>
>> - I agree with Dilip that having a similar mechanism for 'insert into
>> select ...' statements would add more value to the patch, but even then
>> this looks like a good idea to extend parallelism to at least a few of
>> the write operations
>
>
> Yes, I also agree that supporting 'insert into select' will provide more
> benefit. I already tried to support the same in [1], but it has many
> drawbacks, especially with triggers. To provide proper parallel support
> for DML queries, I feel the ParallelMode logic needs an update to
> avoid the errors from the PreventCommandIfParallelMode() function and to
> identify whether it is a nested query operation that should execute
> only in the backend, etc.
>
> As the current patch falls into the DDL category that benefits from
> parallel query, I didn't add the 'insert into select' support into
> this patch. Even without it, the patch provides a benefit.
> I will work on DML write support with parallel query as a separate
> patch.
>
Sounds sensible. In that case, I'll be marking this patch ready for committer.
>
> [1] -
> https://www.postgresql.org/message-id/CAJrrPGfo58TrYxnqwnFAo4%2BtYr8wUH-oC0dJ7V9x7gAOZeaz%2BQ%40mail.gmail.com
>
>

-- 
Regards,
Rafia Sabih
EnterpriseDB: http://www.enterprisedb.com/




Re: [HACKERS] utility commands benefiting from parallel plan

2017-09-13 Thread Rafia Sabih
On Fri, Sep 1, 2017 at 12:31 PM, Haribabu Kommi
<kommi.harib...@gmail.com> wrote:
>
> Hi All,
>
> Attached a rebased patch that supports parallelism for the queries
> that are underneath of some utility commands such as CREATE TABLE AS
> and CREATE MATERIALIZED VIEW.
>
> Note: This patch doesn't make the utility statement (insert operation)
> to run in parallel. It only allows the select query to be parallel if the
> query
> is eligible for parallel.
>

Here is my feedback for this patch,

- The patch is working as expected, all regression tests are passing
- I agree with Dilip that having a similar mechanism for 'insert into
select ...' statements would add more value to the patch, but even then
this looks like a good idea to extend parallelism to at least a few of
the write operations

-- 
Regards,
Rafia Sabih
EnterpriseDB: http://www.enterprisedb.com/




Re: [HACKERS] [POC] Faster processing at Gather node

2017-09-11 Thread Rafia Sabih
On Sat, Sep 9, 2017 at 8:14 AM, Amit Kapila <amit.kapil...@gmail.com> wrote:
> On Fri, Sep 8, 2017 at 11:07 PM, Alexander Kuzmenkov
> <a.kuzmen...@postgrespro.ru> wrote:
>> Hi Rafia,
>>
>> I like the idea of reducing locking overhead by sending tuples in bulk. The
>> implementation could probably be simpler: you could extend the API of shm_mq
>> to decouple notifying the receiver from actually putting data into the queue
>> (i.e., make shm_mq_notify_receiver public and make a variant of shm_mq_sendv
>> that doesn't send the notification).
>>
>
> Rafia can comment on details, but I would like to bring it to your
> notice that we need kind of local buffer (queue) for gathermerge
> processing as well where the data needs to be fetched in order from
> queues.  So, there is always a chance that some of the workers have
> filled their queues while waiting for the master to extract the data.
> I think the patch posted by Rafia on the nearby thread [1] addresses
> both the problems by one patch.
>
>
> [1] - 
> https://www.postgresql.org/message-id/CAOGQiiNiMhq5Pg3LiYxjfi2B9eAQ_q5YjS%3DfHiBJmbSOF74aBQ%40mail.gmail.com
>

Thanks, Alexander, for your interest in this work. As rightly pointed
out by Amit, when experimenting with this patch we found that there
are cases when the master is busy and unable to read tuples from the
shared queue, and the workers get stuck as they cannot process any
more tuples. When experimenting along these lines, I found that Q12 of
TPC-H shows a great performance improvement on increasing
shared_tuple_queue_size [1].
It was then that we realised that merging this with the idea of giving
an illusion of a larger tuple queue size with a local queue [1] could
be more beneficial. To explain precisely what merging the two ideas
means: we now write tuples to the local queue once the shared queue is
full, and as soon as enough tuples have accumulated in the local queue
we copy them from the local to the shared queue in one memcpy call.
This gives good performance improvements in quite a few cases.
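In pseudo-C, the merged write path is roughly the following (a sketch
only, not the patch text; write_in_local_queue is named in the patch,
while the other helpers match the hunks quoted on the review thread):

    /* try the shared queue first; overflow to the local queue when full */
    if (space_in_shm(mqh->mqh_queue) >= tuple_size)
        return shm_mq_sendv(mqh, &iov, 1, nowait);

    /* shared queue is full: create and attach the local queue lazily */
    if (mqh->mqh_local_queue == NULL)
        mqh = local_mq_attach(mqh);

    write_in_local_queue(mqh->mqh_local_queue, &iov);

    /*
     * Once enough tuples have accumulated (or the receiver has caught
     * up), push them out to the shared queue in a single bulk memcpy.
     */
    if (read_local_queue(mqh->mqh_local_queue, false))
        copy_local_to_shared(mqh->mqh_local_queue, mqh, true);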

I'd be glad if you could have a look at this patch and enlighten me
with your suggestions. :-)

[1] - 
https://www.postgresql.org/message-id/CAOGQiiNiMhq5Pg3LiYxjfi2B9eAQ_q5YjS%3DfHiBJmbSOF74aBQ%40mail.gmail.com

-- 
Regards,
Rafia Sabih
EnterpriseDB: http://www.enterprisedb.com/




Re: [HACKERS] Parallel Append implementation

2017-09-08 Thread Rafia Sabih
On Wed, Aug 30, 2017 at 5:32 PM, Amit Khandekar <amitdkhan...@gmail.com> wrote:
> Hi Rafia,
>
> On 17 August 2017 at 14:12, Amit Khandekar <amitdkhan...@gmail.com> wrote:
>> But for all of the cases here, partial
>> subplans seem possible, and so even on HEAD it executed Partial
>> Append. So between a Parallel Append having partial subplans and a
>> Partial Append having partial subplans , the cost difference would not
>> be significant. Even if we assume that Parallel Append was chosen
>> because its cost turned out to be a bit cheaper, the actual
>> performance gain seems quite large as compared to the expected cost
>> difference. So it might be even possible that the performance gain
>> might be due to some other reasons. I will investigate this, and the
>> other queries.
>>
>
> I ran all the queries that were showing performance benefits in your
> run. But for me, the ParallelAppend benefits are shown only for plans
> that use Partition-Wise-Join.
>
> For all the queries that use only PA plans but not PWJ plans, I got
> the exact same plan for HEAD as for PA+PWJ patch, except that for the
> latter, the Append is a ParallelAppend. Whereas, for you, the plans
> have join-order changed.
>
> Regarding actual costs: consequently, for me the actual costs are more
> or less the same for HEAD and PA+PWJ. Whereas, for your runs, you have
> quite different costs naturally because the plans themselves are
> different on head versus PA+PWJ.
>
> My PA+PWJ plan outputs (and actual costs) match exactly what you get
> with PA+PWJ patch. But like I said, I get the same join order and same
> plans (and actual costs) for HEAD as well (except
> ParallelAppend=>Append).
>
> Maybe, if you have the latest HEAD code with your setup, you can
> yourself check some of the queries again to see if they are still
> seeing higher costs as compared to PA ? I suspect that some changes in
> latest code might be causing this discrepancy; because when I tested
> some of the explains with a HEAD-branch server running with your
> database, I got results matching PA figures.
>
> Attached is my explain-analyze outputs.
>

Now, when I compare your results with the ones I posted, I can see
one major difference between them -- selectivity estimation errors.
In the results I posted, e.g. for Q3, on head it gives the following:

->  Finalize GroupAggregate  (cost=41131358.89..101076015.45
rows=455492628 width=44) (actual time=126436.642..129247.972
rows=226765 loops=1)
   Group Key: lineitem_001.l_orderkey,
orders_001.o_orderdate, orders_001.o_shippriority
   ->  Gather Merge  (cost=41131358.89..90637642.73
rows=379577190 width=44) (actual time=126436.602..127791.768
rows=235461 loops=1)
 Workers Planned: 2
 Workers Launched: 2

and in your results it is,
->  Finalize GroupAggregate  (cost=4940619.86..6652725.07
rows=13009521 width=44) (actual time=89573.830..91956.956 rows=226460
loops=1)
   Group Key: lineitem_001.l_orderkey,
orders_001.o_orderdate, orders_001.o_shippriority
   ->  Gather Merge  (cost=4940619.86..6354590.21
rows=10841268 width=44) (actual time=89573.752..90747.393 rows=235465
loops=1)
 Workers Planned: 2
 Workers Launched: 2

However, for the results with the patch(es) this is not the case.

In my results, with the patch:

 ->  Finalize GroupAggregate  (cost=4933450.21..663.01
rows=12899766 width=44) (actual time=87250.039..90593.716 rows=226765
loops=1)
   Group Key: lineitem_001.l_orderkey,
orders_001.o_orderdate, orders_001.o_shippriority
   ->  Gather Merge  (cost=4933450.21..6335491.38
rows=10749804 width=44) (actual time=87250.020..89125.279 rows=227291
loops=1)
 Workers Planned: 2
 Workers Launched: 2

I think this explains the drastic difference in the plan choices, and
thus the performance, between the two cases.

Since I was using the same database for both cases, I don't see many
reasons for such a difference in selectivity estimation for these
queries. The only thing might be a missing vacuum analyse, but since I
checked it a couple of times I am not sure even that could be the
reason. Additionally, it is not the case for all the queries; in Q10
and Q21, for instance, the estimates are similar.

However, on a fresh database the selectivity estimates and plans, as
reported by you and with the patched version I posted, seem to be the
correct ones. I'll check the performance of these queries once again
to verify this.

-- 
Regards,
Rafia Sabih
EnterpriseDB: http://www.enterprisedb.com/




Re: [HACKERS] Parallel Append implementation

2017-09-07 Thread Rafia Sabih
On Wed, Aug 30, 2017 at 5:32 PM, Amit Khandekar <amitdkhan...@gmail.com> wrote:
> Hi Rafia,
>
> On 17 August 2017 at 14:12, Amit Khandekar <amitdkhan...@gmail.com> wrote:
>> But for all of the cases here, partial
>> subplans seem possible, and so even on HEAD it executed Partial
>> Append. So between a Parallel Append having partial subplans and a
>> Partial Append having partial subplans , the cost difference would not
>> be significant. Even if we assume that Parallel Append was chosen
>> because its cost turned out to be a bit cheaper, the actual
>> performance gain seems quite large as compared to the expected cost
>> difference. So it might be even possible that the performance gain
>> might be due to some other reasons. I will investigate this, and the
>> other queries.
>>
>
> I ran all the queries that were showing performance benefits in your
> run. But for me, the ParallelAppend benefits are shown only for plans
> that use Partition-Wise-Join.
>
> For all the queries that use only PA plans but not PWJ plans, I got
> the exact same plan for HEAD as for PA+PWJ patch, except that for the
> latter, the Append is a ParallelAppend. Whereas, for you, the plans
> have join-order changed.
>
> Regarding actual costs: consequently, for me the actual costs are more
> or less the same for HEAD and PA+PWJ. Whereas, for your runs, you have
> quite different costs naturally because the plans themselves are
> different on head versus PA+PWJ.
>
> My PA+PWJ plan outputs (and actual costs) match exactly what you get
> with PA+PWJ patch. But like I said, I get the same join order and same
> plans (and actual costs) for HEAD as well (except
> ParallelAppend=>Append).
>
> Maybe, if you have the latest HEAD code with your setup, you can
> yourself check some of the queries again to see if they are still
> seeing higher costs as compared to PA ? I suspect that some changes in
> latest code might be causing this discrepancy; because when I tested
> some of the explains with a HEAD-branch server running with your
> database, I got results matching PA figures.
>
> Attached is my explain-analyze outputs.
>

Strange. Please let me know the commit id you were experimenting on;
I think we need to investigate this a little further. Additionally, I
want to point out that I also applied patch [1], which I forgot to
mention before.

[1]  
https://www.postgresql.org/message-id/CAEepm%3D3%3DNHHko3oOzpik%2BggLy17AO%2Bpx3rGYrg3x_x05%2BBr9-A%40mail.gmail.com

> On 16 August 2017 at 18:34, Robert Haas <robertmh...@gmail.com> wrote:
>> Thanks for the benchmarking results!
>>
>> On Tue, Aug 15, 2017 at 11:35 PM, Rafia Sabih
>> <rafia.sa...@enterprisedb.com> wrote:
>>> Q4 | 244 | 12 | PA and PWJ, time by only PWJ - 41
>>
>> 12 seconds instead of 244?  Whoa.  I find it curious that we picked a
>> Parallel Append with a bunch of non-partial plans when we could've
>> just as easily picked partial plans, or so it seems to me.  To put
>> that another way, why did we end up with a bunch of Bitmap Heap Scans
>> here instead of Parallel Bitmap Heap Scans?
>
> Actually, the cost difference would be quite low for Parallel Append
> with partial plans and Parallel Append with non-partial plans with 2
> workers. But yes, I should take a look at why it is consistently
> taking non-partial Bitmap Heap Scan.
>
> 
>
>> Q6 | 29 | 12 | PA only
>
> This one needs to be analysed, because here, the plan cost is the
> same, but actual cost for PA is almost half the cost for HEAD. This is
> the same observation for my run also.
>
> Thanks
> -Amit



-- 
Regards,
Rafia Sabih
EnterpriseDB: http://www.enterprisedb.com/




Re: [HACKERS] Effect of changing the value for PARALLEL_TUPLE_QUEUE_SIZE

2017-09-06 Thread Rafia Sabih
al time=42770.489..42770.489 rows=1 loops=1)
 Group Key: lineitem.l_shipmode
 ->  Gather Merge  (cost=1001.19..2969127.63 rows=575231
width=27) (actual time=11.355..42224.843 rows=311095 loops=1)
   Workers Planned: 4
   Workers Launched: 4
   ->  Nested Loop  (cost=1.13..2899612.01 rows=143808
width=27) (actual time=0.346..10385.472 rows=62906 loops=5)
 ->  Parallel Index Scan using idx_l_shipmode on
lineitem  (cost=0.57..2796168.46 rows=143808 width=19) (actual
time=0.280..9004.095 rows=62906 loops=5)
   Index Cond: (l_shipmode = ANY ('{"REG
AIR",RAIL}'::bpchar[]))
   Filter: ((l_commitdate < l_receiptdate) AND
(l_shipdate < l_commitdate) AND (l_receiptdate >= '1995-01-01'::date)
AND (l_receiptdate < '1996-01-01 00:00:00'::timestamp without time
zone))
   Rows Removed by Filter: 3402367
 ->  Index Scan using orders_pkey on orders
(cost=0.56..0.72 rows=1 width=20) (actual time=0.020..0.020 rows=1
loops=314530)
   Index Cond: (o_orderkey = lineitem.l_orderkey)
 Planning time: 1.202 ms
 Execution time: 42841.895 ms
(15 rows)

Patch:
--
 Limit  (cost=1001.19..426457.34 rows=1 width=27) (actual
time=19461.653..19461.654 rows=1 loops=1)
   ->  GroupAggregate  (cost=1001.19..2979194.24 rows=7 width=27)
(actual time=19461.651..19461.651 rows=1 loops=1)
 Group Key: lineitem.l_shipmode
 ->  Gather Merge  (cost=1001.19..2969127.63 rows=575231
width=27) (actual time=10.239..18783.386 rows=311095 loops=1)
   Workers Planned: 4
   Workers Launched: 4
   ->  Nested Loop  (cost=1.13..2899612.01 rows=143808
width=27) (actual time=0.376..19109.107 rows=66104 loops=5)
 ->  Parallel Index Scan using idx_l_shipmode on
lineitem  (cost=0.57..2796168.46 rows=143808 width=19) (actual
time=0.310..16615.236 rows=66104 loops=5)
   Index Cond: (l_shipmode = ANY ('{"REG
AIR",RAIL}'::bpchar[]))
   Filter: ((l_commitdate < l_receiptdate) AND
(l_shipdate < l_commitdate) AND (l_receiptdate >= '1995-01-01'::date)
AND (l_receiptdate < '1996-01-01 00:00:00'::timestamp without time
zone))
   Rows Removed by Filter: 3574492
 ->  Index Scan using orders_pkey on orders
(cost=0.56..0.72 rows=1 width=20) (actual time=0.034..0.034 rows=1
loops=330519)
   Index Cond: (o_orderkey = lineitem.l_orderkey)
 Planning time: 3.498 ms
 Execution time: 19661.054 ms
(15 rows)

This suggests that with such an idea, the selectivity range in which
parallelism is used can be extended, improving the performance of
these queries.

Credits:
I would like to thank my colleagues Dilip Kumar, Amit Kapila, and
Robert Haas for their discussions and words of encouragement
throughout the development of this patch.

Feedback and suggestions are welcome.

-- 
Regards,
Rafia Sabih
EnterpriseDB: http://www.enterprisedb.com/


faster_gather_v1.patch
Description: Binary data



Re: [HACKERS] Partition-wise join for join between (declaratively) partitioned tables

2017-07-26 Thread Rafia Sabih
On Wed, Jul 26, 2017 at 12:02 PM, Ashutosh Bapat
<ashutosh.ba...@enterprisedb.com> wrote:
>
> Ok. If those queries have equi-join between partitioned tables and are
> not picking up partition-wise join, that case needs to be
> investigated. Q21 for example has join between three lineitem
> instances. Those joins can be executed by partition-wise join. But it
> may so happen that optimal join order doesn't join partitioned tables
> with each other, thus interleaving partitioned tables with
> unpartitioned or differently partitioned tables in join order.
> Partition-wise join is not possible then. A different partitioning
> scheme may be required there.
>
Good point; I will look in this direction as well.

-- 
Regards,
Rafia Sabih
EnterpriseDB: http://www.enterprisedb.com/




Re: [HACKERS] Partition-wise join for join between (declaratively) partitioned tables

2017-07-26 Thread Rafia Sabih
On Wed, Jul 26, 2017 at 10:38 AM, Ashutosh Bapat
<ashutosh.ba...@enterprisedb.com> wrote:
>
> On Tue, Jul 25, 2017 at 9:39 PM, Dilip Kumar <dilipbal...@gmail.com> wrote:
> > On Tue, Jul 25, 2017 at 8:59 PM, Robert Haas <robertmh...@gmail.com> wrote:
> >> On Tue, Jul 25, 2017 at 1:31 AM, Rafia Sabih
> >> <rafia.sa...@enterprisedb.com> wrote:
> >>> - other queries show a good 20-30% improvement in performance. Performance
> >>> numbers are as follows,
> >>>
> >>> Query| un_part_head (seconds) | part_head (seconds) | part_patch 
> >>> (seconds) |
> >>> 3 | 76 |127 | 88 |
> >>> 4 |17 | 244 | 41 |
> >>> 5 | 52 | 123 | 84 |
> >>> 7 | 73 | 134 | 103 |
> >>> 10 | 67 | 111 | 89 |
> >>> 12 | 53 | 114 | 99 |
> >>> 18 | 447 | 709 | 551 |
> >>
> >> Hmm.  This certainly shows that benefit of the patch, although it's
> >> rather sad that we're still slower than if we hadn't partitioned the
> >> data in the first place.  Why does partitioning hurt performance so
> >> much?
> >
> > I was analysing some of the plans (without partition and with
> > partition), Seems like one of the reasons of performing worse with the
> > partitioned table is that we can not use an index on the partitioned
> > table.
> >
> > Q4 is taking 17s without partition whereas it's taking 244s with partition.
> >
> > Now if we analyze the plan
> >
> > Without partition, it can use parameterize index scan on lineitem
> > table which is really big in size. But with partition, it has to scan
> > this table completely.
> >
> >   ->  Nested Loop Semi Join
> >  ->  Parallel Bitmap Heap Scan on orders
> >->  Bitmap Index Scan on
> > idx_orders_orderdate  (cost=0.00..24378.88 r
> >->  Index Scan using idx_lineitem_orderkey on
> > lineitem  (cost=0.57..29.29 rows=105 width=8) (actual
> > time=0.031..0.031 rows=1 loops=1122364)
> >Index Cond: (l_orderkey =
> > orders.o_orderkey)
> >Filter: (l_commitdate < 
> > l_receiptdate)
> >Rows Removed by Filter: 1
> >
>
> If the partitions have the same indexes as the unpartitioned table,
> planner manages to create parameterized plans for each partition and
> thus parameterized plan for the whole partitioned table. Do we have
> same indexes on unpartitioned table and each of the partitions? The

Yes, both lineitem and orders have the same number of partitions,
viz. 17, on the same partitioning key (*_orderkey) and with the same
ranges for each partition. However, I had missed creating the index on
o_orderdate for the partitions. But even after creating it, the plan
with the bitmap heap scan is used and the query still completes in
some 200 seconds; check the attached file for the query plan.

> difference between the two cases is the parameterized path on an
> unpartitioned table scans only one index whereas that on the
> partitioned table scans the indexes on all the partitions. My guess is
> the planner thinks those many scans are costlier than hash/merge join
> and chooses those strategies over parameterized nest loop join. In
> case of partition-wise join, only one index on the inner partition is
> involved and thus partition-wise join picks up parameterized nest loop
> join. Notice, that this index is much smaller than the index on the
> partitioned table, so the index scan will be a bit faster. But only a
> bit, since the depth of the index doesn't increase linearly with the
> size of index.
>
As I have observed, the thing with this query is that the selectivity
estimate is much higher than the actual value. When an index scan is
chosen for lineitem on the inner side of the NLJ, the query completes
quickly, since the number of rows actually returned is very low.
However, if we pick a seq scan, or lineitem is on the outer side, the
query takes a really long time. Now, when hash join is picked in the
partitioned database and no partition-wise join is available, a seq
scan is preferred over an index scan, hence the elongated query
execution time.

I tried this query with random_page_cost = 0, forcing a NLJ, and the
chosen plan completes the query in 45 seconds; check the attached file
for the explain analyse output.

-- 
Regards,
Rafia Sabih
EnterpriseDB: http://www.enterprisedb.com/


q4_idx_orderdate.out
Description: Binary data


Q4_low_random_page_cost.out
Description: Binary data



Re: [HACKERS] Partition-wise join for join between (declaratively) partitioned tables

2017-07-25 Thread Rafia Sabih
On Wed, Jul 26, 2017 at 11:06 AM, Ashutosh Bapat <ashutosh.ba...@enterprisedb.com> wrote:

> On Wed, Jul 26, 2017 at 11:00 AM, Rafia Sabih
> <rafia.sa...@enterprisedb.com> wrote:
> >
> >
> > On Wed, Jul 26, 2017 at 10:58 AM, Ashutosh Bapat
> > <ashutosh.ba...@enterprisedb.com> wrote:
> >>
> >> On Tue, Jul 25, 2017 at 11:01 AM, Rafia Sabih
> >> <rafia.sa...@enterprisedb.com> wrote:
> >>
> >> > Query plans for the above mentioned queries is attached.
> >> >
> >>
> >> Can you please share plans for all the queries, even if they haven't
> >> chosen partition-wise join when run on partitioned tables with
> >> enable_partition_wise_join ON? Also, please include the query in
> >> explain analyze output using -a or -e flats to psql. That way we will
> >> have query and its plan in the same file for ready reference.
> >>
> > I didn't run the query not using partition-wise join, for now.
>
> parse-parse error, sorry. Do you mean, you haven't run the queries
> which do not use partition-wise join?
>
> Yes, that's what I mean.
-- 
Regards,
Rafia Sabih
EnterpriseDB: http://www.enterprisedb.com/


Re: [HACKERS] Partition-wise join for join between (declaratively) partitioned tables

2017-07-25 Thread Rafia Sabih
On Wed, Jul 26, 2017 at 10:58 AM, Ashutosh Bapat <ashutosh.ba...@enterprisedb.com> wrote:

> On Tue, Jul 25, 2017 at 11:01 AM, Rafia Sabih
> <rafia.sa...@enterprisedb.com> wrote:
>
> > Query plans for the above mentioned queries is attached.
> >
>
> Can you please share plans for all the queries, even if they haven't
> chosen partition-wise join when run on partitioned tables with
> enable_partition_wise_join ON? Also, please include the query in
> explain analyze output using -a or -e flats to psql. That way we will
> have query and its plan in the same file for ready reference.
>
> I didn't run the query not using partition-wise join, for now.


-- 
Regards,
Rafia Sabih
EnterpriseDB: http://www.enterprisedb.com/


Re: [HACKERS] Partition-wise join for join between (declaratively) partitioned tables

2017-07-21 Thread Rafia Sabih
On Thu, Jul 20, 2017 at 2:44 PM, Ashutosh Bapat
<ashutosh.ba...@enterprisedb.com> wrote:
>
> On Thu, Jul 20, 2017 at 11:46 AM, Amit Langote
> <langote_amit...@lab.ntt.co.jp> wrote:
> > On 2017/07/20 15:05, Ashutosh Bapat wrote:
> >> On Wed, Jul 19, 2017 at 9:54 AM, Rafia Sabih
> >> <rafia.sa...@enterprisedb.com> wrote:
> >>>
> >>> Partition information:
> >>> Type of partitioning - single column range partition
> >>> Tables partitioned - Lineitem and orders
> >>>
> >>> Lineitem -
> >>> Partition key = l_orderkey
> >>> No of partitions = 18
> >>>
> >>> Orders -
> >>> Partition key = o_orderkey
> >>> No of partitions = 11
> >>>
> >>
> >> The patch set upto 0015 would refuse to join two partitioned relations
> >> using a partition-wise join if they have different number of
> >> partitions. Next patches implement a more advanced partition matching
> >> algorithm only for list partitions. Those next patches would refuse to
> >> apply partition-wise join for range partitioned tables. So, I am
> >> confused as to how come partition-wise join is being chosen even when
> >> the number of partitions differ.
> >
> > In 21_part_patched.out, I see that lineitem is partitionwise-joined with
> > itself.
> >
> >  >  Append
> >
> >->  Hash Semi Join
> >Hash Cond: (l1.l_orderkey = l2.l_orderkey)
> >Join Filter: (l2.l_suppkey <> l1.l_suppkey)
> >Rows Removed by Join Filter: 395116
> >
> >->  Parallel Seq Scan on lineitem_001 l1
> >Filter: (l_receiptdate > l_commitdate)
> >Rows Removed by Filter: 919654
> >
> >->  Hash
> >Buckets: 8388608  Batches: 1  Memory Usage: 358464kB
> >->  Seq Scan on lineitem_001 l2
> >
> Ah, I see now.
>
> We need the same number of partitions in all partitioned tables, for
> joins to pick up partition-wise join.
>
Oh, I missed this limitation; I will modify my setup to have the same
number of partitions, with the same ranges, in the partitioned tables.
So, does this also mean that a partitioned table will not join with an
unpartitioned table without an append of its partitions?

-- 
Regards,
Rafia Sabih
EnterpriseDB: http://www.enterprisedb.com/




Re: [HACKERS] Partition-wise join for join between (declaratively) partitioned tables

2017-07-21 Thread Rafia Sabih
On Thu, Jul 20, 2017 at 8:53 AM, Thomas Munro <thomas.mu...@enterprisedb.com> wrote:

> On Thu, Jul 20, 2017 at 2:02 PM, Robert Haas <robertmh...@gmail.com>
> wrote:
> > On Wed, Jul 19, 2017 at 7:45 PM, Thomas Munro
> > <thomas.mu...@enterprisedb.com> wrote:
> >> Isn't this the same as the issue reported here?
> >>
>> https://www.postgresql.org/message-id/flat/CAEepm%3D270ze2hVxWkJw-5eKzc3AB4C9KpH3L2kih75R5pdSogg%40mail.gmail.com
> >
> > Hmm, possibly.  But why would that affect the partition-wise join case
> only?
>
> It doesn't.  From Rafia's part_reg.zip we see a bunch of rows=1 that
> turn out to be wrong by several orders of magnitude:
>
> 21_nopart_head.out:  Hash Semi Join  (cost=5720107.25..9442574.55
> rows=1 width=50)
> 21_part_head.out:Hash Semi Join  (cost=5423094.06..8847638.36
> rows=1 width=38)
> 21_part_patched.out: Hash Semi Join  (cost=309300.53..491665.60 rows=1
> width=12)
>
> My guess is that the consequences of that bad estimate are sensitive
> to arbitrary other parameters moving around, as you can see from the
> big jump in execution time I showed in that message, measured on
> unpatched master of the day:
>
>   4 workers = 9.5s
>   3 workers = 39.7s
>
> That's why both parallel hash join and partition-wise join are
> showing regressions on Q21: it's just flip-flopping between various
> badly costed plans.  Note that even without parallelism, the fix that
> Tom Lane suggested gives a much better plan:
>
> https://www.postgresql.org/message-id/CAEepm%3D11BiYUkgXZNzMtYhXh4S3a9DwUP8O%2BF2_ZPeGzzJFPbw%40mail.gmail.com
>
>
Following the discussion at [1], with the patch Thomas posted there,
Q21 now completes in some 160 seconds. The plan has changed for the
better but does not use partition-wise join. The output of explain
analyse is attached.

Not just the join order but the join strategy itself changed: with the
patch, the hash semi join that was consuming most of the time there is
no longer picked; a nested loop semi join is in the picture now. The
estimates are still way off, but the change in join order improved
them from horrible to merely terrible. It appears that this query
performs efficiently now particularly because the hash join was even
more badly underestimated than the nested loop join.

For the hash-semi-join:
->  Hash  (cost=3449457.34..3449457.34 rows=119994934 width=8) (actual
time=180858.448..180858.448 rows=119994608 loops=3)
   Buckets: 33554432
 Batches: 8  Memory Usage: 847911kB

Overall, this doesn't look like a problem with the partition-wise
join patch itself.

[1]
https://www.postgresql.org/message-id/CAEepm%3D3%3DNHHko3oOzpik%2BggLy17AO%2Bpx3rGYrg3x_x05%2BBr9-A%40mail.gmail.com


-- 
Regards,
Rafia Sabih
EnterpriseDB: http://www.enterprisedb.com/


Q21_SE_patch.out
Description: Binary data



Re: [HACKERS] Parallel Append implementation

2017-06-30 Thread Rafia Sabih
On Tue, Apr 4, 2017 at 12:37 PM, Amit Khandekar <amitdkhan...@gmail.com>
wrote:

> Attached is an updated patch v13 that has some comments changed as per
> your review, and also rebased on latest master.
>

This does not apply on the latest head, i.e. commit
08aed6604de2e6a9f4d499818d7c641cbf5eb9f7; looks like it needs rebasing.

-- 
Regards,
Rafia Sabih
EnterpriseDB: http://www.enterprisedb.com/


Re: [HACKERS] Partition-wise join for join between (declaratively) partitioned tables

2017-06-30 Thread Rafia Sabih
On Mon, May 22, 2017 at 12:02 PM, Ashutosh Bapat <ashutosh.ba...@enterprisedb.com> wrote:

>
> Here's set of patches rebased on latest head.
>

In an attempt to test this set of patches, I found that not all of
them could be applied on the latest head (commit
08aed6604de2e6a9f4d499818d7c641cbf5eb9f7); they might be in need of
rebasing.

-- 
Regards,
Rafia Sabih
EnterpriseDB: http://www.enterprisedb.com/


Re: [HACKERS] Proposal : For Auto-Prewarm.

2017-06-06 Thread Rafia Sabih
On Mon, Jun 5, 2017 at 8:06 PM, Robert Haas <robertmh...@gmail.com> wrote:
>
> Many of these seem worse, like these ones:
>
> - * Quit if we've reached records for another database. Unless the
> + * Quit if we've reached records of another database. Unless the
>
Why is it worse? I've always encountered "records of database" and
not "records for database". Anyhow, I tried reframing the sentence
altogether.
> - * When we reach a new relation, close the old one.  Note, however,
> - * that the previous try_relation_open may have failed, in which case
> - * rel will be NULL.
> + * On reaching a new relation, close the old one.  Note, that the
> + * previous try_relation_open may have failed, in which case rel will
> + * be NULL.
>
Reframed the sentence.
> - * Try to open each new relation, but only once, when we first
> - * encounter it.  If it's been dropped, skip the associated blocks.
> + * Each relation is open only once at it's first encounter. If it's
> + * been dropped, skip the associated blocks.
>
Reframed.

> IMHO, there's still a good bit of work needed here to make this sound
> like American English.  For example:
>
> - *It is a bgworker which automatically records information about 
> blocks
> - *which were present in buffer pool before server shutdown and then
> - *prewarm the buffer pool upon server restart with those blocks.
> + *It is a bgworker process that automatically records information 
> about
> + *blocks which were present in buffer pool before server
> shutdown and then
> + *prewarms the buffer pool upon server restart with those blocks.
>
> This construction "It is a..." without a clear referent seems to be
> standard in Indian English, but it looks wrong to English speakers
> from other parts of the world, or at least to me.
>
Agreed, tried reframing the sentence.
> + * Since there could be at max one worker who could do a prewarm, hence,
> + * acquiring locks is not required before setting 
> skip_prewarm_on_restart.
>
> To me, adding a comma before hence looks like a significant
> improvement, but the word hence itself seems out-of-place.  Also, I'd
> change "at max" to "at most" and maybe reword the sentence a little.
> There's a lot of little things like this which I have tended be quite
> strict about changing before commit; I occasionally wonder whether
> it's really worth the effort.  It's not really wrong, it just sounds
> weird to me as an American.
>
Agreed, sentence reframed.
I am attaching the patch with the modifications I made on a second look.

-- 
Regards,
Rafia Sabih
EnterpriseDB: http://www.enterprisedb.com/


cosmetic_autoprewarm_v2.patch
Description: Binary data



Re: [HACKERS] Proposal : For Auto-Prewarm.

2017-06-05 Thread Rafia Sabih
r things that need to be included
>> by multiple .c files, but there's no such need here.
>
> -- This was done to fix one of the previous review comments. I have
> moved them back to .c file.
>
>> + * load. If there are no other block infos than the global 
>> objects
>> + * we silently ignore them. Should I throw error?
>>
>> Silently ignoring them seems fine.  Throwing an error doesn't seem
>> like it would improve things.
>
> -- Okay thanks.
>
>>
>> +/*
>> + * Register a sub-worker to load new database's block. Wait until 
>> the
>> + * sub-worker finish its job before launching next sub-worker.
>> + */
>> +launch_prewarm_subworker();
>>
>> The function name implies that it launches the worker, but the comment
>> implies that it also waits for it to terminate.  Functions should be
>> named in a way that matches what they do.
>
> -- Have renamed it to launch_and_wait_for_per_database_worker
>
>> I feel like the get_autoprewarm_task() function is introducing fairly
>> baroque logic for something that really ought to be more simple.  All
>> autoprewarm_main() really needs to do is:
>>
>> if (!state->autoprewarm_done)
>> autoprewarm();
>> dump_block_info_periodically();
>
> -- Have simplified things as suggested now. Function
> get_autoprewarm_task has been removed.
>
>> The locking in autoprewarm_dump_now() is a bit tricky.  There are two
>> trouble cases.  One is that we try to rename() our new dump file on
>> top of the existing one while a background worker is still using it to
>> perform an autoprewarm.  The other is that we try to write a new
>> temporary dump file while some other process is trying to do the same
>> thing.  I think we should fix this by storing a PID in
>> AutoPrewarmSharedState; a process which wants to perform a dump or an
>> autoprewarm must change the PID from 0 to their own PID, and change it
>> back to 0 on successful completion or error exit.  If we go to perform
>> an immediate dump process and finds a non-zero value already just does
>> ereport(ERROR, ...), including the PID of the other process in the
>> message (e.g. "unable to perform block dump because dump file is being
>> used by PID %d").  In a background worker, if we go to dump and find
>> the file in use, log a message (e.g. "skipping block dump because it
>> is already being performed by PID %d", "skipping prewarm because block
>> dump file is being rewritten by PID %d").
>
> -- Fixed as suggested.
>
>> I also think we should change is_bgworker_running to a PID, so that if
>> we try to launch a new worker we can report something like
>> "autoprewarm worker is already running under PID %d".
>
> -- Fixed. I could only "LOG" about another autoprewarm worker already
> running and then exit. Because on ERROR we try to restart the worker,
> so do not want to restart such workers.
>
>> So putting that all together, I suppose AutoPrewarmSharedState should
>> end up looking like this:
>>
>> LWLock lock; /* mutual exclusion */
>> pid_t bgworker_pid;  /* for main bgworker */
>> pid_t pid_using_dumpfile; /* for autoprewarm or block dump */
>
> -- I think one more member is required which states whether prewarm can
> be done when the worker restarts.
>
>> /* following items are for communication with per-database worker */
>> dsm_handle block_info_handle;
>> Oid database;
>> int prewarm_start_idx;
>> int prewarm_stop_idx;
>
> -- Fixed as suggested
>
>> I suggest going through and changing "subworker" to "per-database
>> worker" throughout.
>
> -- Fixed as suggested.
>
>>
>> BTW, have you tested how long this takes to run with, say, shared_buffers = 
>> 8GB?
>
> I have tried same on my local machine with ssd as a storage.
>
> settings: shared_buffers = 8GB, loaded data with pg_bench scale_factor=1000.
>
> Total blocks got dumped
> autoprewarm_dump_now
> --
> 1048576
>
> 5 different load time based logs
>
> 1.
> 2017-06-04 11:30:26.460 IST [116253] LOG:  autoprewarm has started
> 2017-06-04 11:30:43.443 IST [116253] LOG:  autoprewarm load task ended
> -- 17 secs
>
> 2
> 2017-06-04 11:31:13.565 IST [116291] LOG:  autoprewarm has started
> 2017-06-04 11:31:30.317 IST [116291] LOG:  autoprewarm load task ended
> -- 17 secs
>
> 3.
> 2017-06-04 11:32:12.995 IST [116329] LOG:  autoprewarm has started
> 2017-06-04 11:32:29.982 IST [116329] LOG:  autoprewarm load task ended
> -- 17 secs
>
> 4.
> 2017-06-04 11:32:58.974 IST [116361] LOG:  autoprewarm has started
> 2017-06-04 11:33:15.017 IST [116361] LOG:  autoprewarm load task ended
> -- 17secs
>
> 5.
> 2017-06-04 12:15:49.772 IST [117936] LOG:  autoprewarm has started
> 2017-06-04 12:16:11.012 IST [117936] LOG:  autoprewarm load task ended
> -- 22 secs.
>
> So mostly from 17 to 22 secs.
>
> But I think I need to do tests on a larger set of configuration on
> different storage types. I shall do same and upload later. I have also
> uploaded latest performance test results (on my local machine ssd
> drive)
> configuration: shared_buffer = 8GB,
> test setting: scale_factor=300 (data fits to shared_buffers) pgbench clients 
> =1
>
> TEST
> PGBENCH_RUN="./pgbench --no-vacuum --protocol=prepared --time=5 -j 1
> -c 1 --select-only postgres"
> START_TIME=$SECONDS; echo TIME, TPS; while true; do TPS=$($PGBENCH_RUN
> | grep excluding | cut -d ' ' -f 3); TIME=$((SECONDS-START_TIME));
> echo $TIME, $TPS; done
>
>
I had a look at the patch from a stylistic/formatting point of view;
please find attached a patch with the suggested modifications.

-- 
Regards,
Rafia Sabih
EnterpriseDB: http://www.enterprisedb.com/


cosmetic_autoprewarm.patch
Description: Binary data



Re: [HACKERS] Default Partition for Range

2017-06-03 Thread Rafia Sabih
On Fri, Jun 2, 2017 at 5:48 PM, Robert Haas <robertmh...@gmail.com> wrote:
> On Fri, Jun 2, 2017 at 8:09 AM, Amit Kapila <amit.kapil...@gmail.com> wrote:
>> I think if you have found spelling mistakes unrelated to this patch,
>> then it is better to submit those as a separate patch in a new thread.
>
> +1.

Sure, attached is the version without those changes.

-- 
Regards,
Rafia Sabih
EnterpriseDB: http://www.enterprisedb.com/


cosmetic_range_default_partition_v2.patch
Description: Binary data



Re: [HACKERS] Default Partition for Range

2017-06-02 Thread Rafia Sabih
On Wed, May 31, 2017 at 5:53 PM, Beena Emerson <memissemer...@gmail.com> wrote:
> Hello,
>
> PFA the updated patch.
> Dependent patch default_partition_v17.patch [1]
> This patch adds regression tests and removes the separate function to
> get default partition qual.
>
>
> On Mon, May 29, 2017 at 3:22 PM, Jeevan Ladhe
> <jeevan.la...@enterprisedb.com> wrote:
>> Hi Beena,
>>
>> I went through your patch, and here are some of my comments:
>>
>> - For generating a qual for default range partition, instead of scanning for
>> all
>> the children and collecting all the boundspecs, how about creating and
>> negating
>> an expression from the lists of lowerdatums and upperdatums in boundinfo.
>>
>
> Unlike list, range partition can be for multiple columns and the
> expressions get complicated. I have used the same logic of looping
> through different partitions and negating the ORed expr. However, this
> is now done under get_qual_for_range.
>
>
>> - Wrong comment:
>> + int default_index; /* Index of the default list partition. */
>
> Comment is made generic in the dependent patch.
>
>>
>> - Suggested by Robert earlier on default list partitioning thread, instead
>> of
>> abbreviating is_def/found_def use is(found)_default etc.
>
> corrected.
>
>>
>> - unrelated change:
>> - List   *all_parts;
>> + List   *all_parts;
>>
>
> undone.
>
>> - typo: partiton should be partition, similar typo at other places too.
>> +  * A non-default range partiton table does not currently allow partition
>> keys
>>
>
> rectified.
>
>> - Useless hunk for this patch:
>> - Oiddefid;
>> + Oid defid;
>
> undone.
>
>>
>> - better to use IsA here, instead of doing explicit check:
>> + bound->content[i] = (datum->type == T_DefElem)
>> + ? RANGE_DATUM_DEFAULT
>
> Modified
>
>>
>> - It is better to use head of either lowerdatums or upperdatums list to
>> verify
>> if its a default partition and get rid of the constant PARTITION_DEFAULT
>> altogether.
>>
>
> modified this part as necessary.
>
>
>> - In the function get_qual_from_partbound() is_def is always going to be
>> false
>> for range partition, the function get_qual_for_range can be directly passed
>> false instead.
>>
>> - Following comment for function get_qual_for_range_default() implies that
>> this
>> function returns bool, but the actually it returns a List.
>> + *
>> + * If DEFAULT is the only partiton for the table then this returns TRUE.
>> + *
>>
> Updated.
>
> [1] http://www.mail-archive.com/pgsql-hackers@postgresql.org/msg315573.html
>

Hi Beena,

I had a look at the patch from the angle of aesthetics, and there are
a few cosmetic changes I might suggest. Please have a look at the
attached patch, and if you agree with those changes you may merge it
into your patch. It also fixes a couple more spelling mistakes
unrelated to this patch.

-- 
Regards,
Rafia Sabih
EnterpriseDB: http://www.enterprisedb.com/


cosmetic_range_default_partition.patch
Description: Binary data



Re: [HACKERS] statement_timeout is not working as expected with postgres_fdw

2017-06-02 Thread Rafia Sabih
On Tue, May 16, 2017 at 9:39 PM, Robert Haas <robertmh...@gmail.com> wrote:
> On Sun, May 7, 2017 at 11:54 AM, Robert Haas <robertmh...@gmail.com> wrote:
>> I'm having second thoughts based on some more experimentation I did
>> this morning.  I'll update again once I've had a bit more time to poke
>> at it.
>
> So the issue that I noticed here is that this problem really isn't
> confined to abort processing.  If we ROLLBACK TO SAVEPOINT or ABORT
> TRANSACTION on an open connection, we really do not know what the
> state of that connection is until we get an acknowledgement that the
> command completed successfully.  The patch handles that.  However, the
> same is true if we're sending a SAVEPOINT or RELEASE SAVEPOINT
> command, and the patch that I posted doesn't do anything about those
> cases.  I think it would be best to fix all transaction control
> commands in a symmetric manner.
>
> Concretely, I think we should replace the abort_cleanup_incomplete
> flag from my previous patch with a changing_xact_state flag and set
> that flag around all transaction state changes, clearing it when such
> changes have succeeded.  On error, the flag remains set, so we know
> that the state of that connection is unknown and that we must close it
> (killing outer transaction levels as needed).
>
> Thoughts?

I tried following the sketch stated above, and here's what I added to
v2 of the patch. In begin_remote_xact, when a savepoint is sent to the
remote server through do_sql_command, I set the changing_xact_state
flag, and when the call returns successfully the flag is unset.
Similar behaviour is followed in pgfdw_subxact_callback for RELEASE
SAVEPOINT. Additionally, I added this flag set/unset around the start
transaction command in begin_remote_xact. Enlighten me if I've
missed/misunderstood something here. I ran the regression tests for
postgres_fdw and they passed.
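
For illustration, a minimal sketch of the bracketing described above;
the flag and helper names follow this thread, but the exact call sites
inside postgres_fdw's connection.c are my assumption:

	/* Mark the remote transaction state as in flux before sending any
	 * transaction-control command; disarm the flag only on success. */
	entry->changing_xact_state = true;
	do_sql_command(entry->conn, "SAVEPOINT s2");
	entry->changing_xact_state = false;

	/* If do_sql_command() throws, the flag stays set, so later cleanup
	 * knows this connection's state is unknown and must close it. */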

I have a couple of concerns here. Since there is only one flag
required for the purpose, its name changing_xact_state doesn't suit at
some places, particularly in pgfdw_subxact_callback; I am not sure
whether to change the comment or the variable name:

/* Disarm abort_cleanup_incomplete if it all worked. */
+ entry->changing_xact_state = abort_cleanup_failure;

Also, should we add a test case for this, by any chance?

-- 
Regards,
Rafia Sabih
EnterpriseDB: http://www.enterprisedb.com/


improve-pgfdw-abort-behavior-v3.patch
Description: Binary data

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Effect of changing the value for PARALLEL_TUPLE_QUEUE_SIZE

2017-06-01 Thread Rafia Sabih
On Tue, May 30, 2017 at 4:57 PM, Robert Haas <robertmh...@gmail.com> wrote:

> I did a little bit of brief experimentation on this same topic a long
> time ago and didn't see an improvement from boosting the queue size
> beyond 64k but Rafia is testing Gather rather than Gather Merge and,
> as I say, my test was very brief.  I think it would be a good idea to
> try to get a complete picture here.  Does this help on any query that
> returns many tuples through the Gather?  Only the ones that use Gather
> Merge?  Some queries but not others with no obvious pattern?  Only
> this query?
>
I did further exploration, trying other values of
PARALLEL_TUPLE_QUEUE_SIZE and trying different queries, and here are
my findings:
- even on setting PARALLEL_TUPLE_QUEUE_SIZE to 655360, there isn't
much improvement in q12 itself.
- no other TPC-H query shows significant improvement at 6553600
either. There is a small improvement in q3, which also uses
gather-merge.
- as per perf analysis of q12 on head and patch, the percentage for
ExecGatherMerge is 18% with the patch and 98% on head, and similarly
for gather_merge_readnext and gather_merge_writenext.

As per my understanding, it looks like this increase in tuple queue
size helps only gather-merge, particularly in cases where there is a
lot of stalling because the master in gather-merge is maintaining the
sort order. In q12 the index is unclustered and gather-merge sits just
above a parallel index scan, so to maintain the order a worker is
likely to wait a long time before its next in-sequence tuple is taken
by the master. Something like this might be happening: the master
takes one tuple from worker 1, then the next (say) 10 tuples from
worker 2, and so on, before finally returning to worker 1; so once
worker 1 has done enough work to fill its queue, it sits idle. Hence,
increasing the tuple queue size lets the workers keep working for
longer, and this improves performance.

In other cases like q3, q18, etc., gather-merge is above a sort,
partial group aggregate, etc.; here the chances of stalls are
comparatively lower, since the scan of the relation uses the primary
key and the tuples in the blocks are therefore likely to be in order
already. The case was similar for many other TPC-H queries. Another
thing is that in the TPC-H benchmark queries the number of tuples at
gather-merge is mostly fairly low, so I'll try to test this on some
custom queries which exhibit the aforementioned case.

> Blindly adding a GUC because we found one query that would be faster
> with a different value is not the right solution.   If we don't even
> know why a larger value is needed here and (maybe) not elsewhere, then
> how will any user possibly know how to tune the GUC?  And do we really
> want the user to have to keep adjusting a GUC before each query to get
> maximum performance?  I think we need to understand the whole picture
> here, and then decide what to do.  Ideally this would auto-tune, but
> we can't write code for that without a more complete picture of the
> behavior.
>
Yeah, maybe for the scenario discussed above a GUC is not the best
idea. Instead, something which can tell the planner the relation
between the index ordering and the physical ordering of the tuples,
along with the number of tuples, etc., might help it decide the value
of PARALLEL_TUPLE_QUEUE_SIZE. E.g. if the index is the primary key,
then the physical order is the same as the index order, and if this is
the sort key then stalls at gather-merge will be few; but if this is
an unclustered index, then the physical order is very different from
the index order, so it is likely that workers will stall more, and a
higher value of PARALLEL_TUPLE_QUEUE_SIZE, based on the number of
tuples, would be appropriate.

Again, I am not concluding anything yet, as this is too little
experimentation to ascertain much. I'll continue the experiments and
would be grateful for more suggestions.

-- 
Regards,
Rafia Sabih
EnterpriseDB: http://www.enterprisedb.com/


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


[HACKERS] Effect of changing the value for PARALLEL_TUPLE_QUEUE_SIZE

2017-05-30 Thread Rafia Sabih
Hello everybody,

Here is a thing I observed in my recent experimentation: on changing
the value of PARALLEL_TUPLE_QUEUE_SIZE to 6553600, the performance of
a TPC-H query improved by more than 50%.

Specifically, with this change, q12 completes in 14 seconds, where it
was taking 45 seconds on head. There wasn't any change in the plan
structure; just the time at gather-merge reduced, which gave this
improvement.

This clearly says that the current value of PARALLEL_TUPLE_QUEUE_SIZE
is not the best one for all queries; rather, some modification to it
is very likely to improve performance significantly. One way to do
this is to expose the parameter as another GUC, just like
min_parallel_table_scan_size, etc.
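
For reference, the change being benchmarked amounts to a one-line
edit; a sketch follows (the constant lives in the executor's
parallel-query support code, src/backend/executor/execParallel.c to
the best of my knowledge, so treat the exact location as an
assumption):

-#define PARALLEL_TUPLE_QUEUE_SIZE		65536
+#define PARALLEL_TUPLE_QUEUE_SIZE		6553600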

The attached .txt file gives the plans on head and with this patch; a
patch setting PARALLEL_TUPLE_QUEUE_SIZE to 6553600 is attached as
well.

Thoughts?
-- 
Regards,
Rafia Sabih
EnterpriseDB: http://www.enterprisedb.com/
Head:
 \i /data/rafia.sabih/dss/queries/12.sql
                                  QUERY PLAN
---------------------------------------------------------------------------
 Limit  (cost=1001.19..391123.34 rows=1 width=27) (actual time=45511.755..45511.755 rows=1 loops=1)
   ->  GroupAggregate  (cost=1001.19..2731856.24 rows=7 width=27) (actual time=45511.753..45511.753 rows=1 loops=1)
         Group Key: lineitem.l_shipmode
         ->  Gather Merge  (cost=1001.19..2721491.60 rows=592261 width=27) (actual time=7.806..44794.334 rows=311095 loops=1)
               Workers Planned: 4
               Workers Launched: 4
               ->  Nested Loop  (cost=1.13..2649947.55 rows=148065 width=27) (actual time=0.342..9071.892 rows=62257 loops=5)
                     ->  Parallel Index Scan using idx_l_shipmode on lineitem  (cost=0.57..2543794.01 rows=148065 width=19) (actual time=0.284..7936.015 rows=62257 loops=5)
                           Index Cond: (l_shipmode = ANY ('{"REG AIR",RAIL}'::bpchar[]))
                           Filter: ((l_commitdate < l_receiptdate) AND (l_shipdate < l_commitdate) AND (l_receiptdate >= '1995-01-01'::date) AND (l_receiptdate < '1996-01-01 00:00:00'::timestamp without time zone))
                           Rows Removed by Filter: 3368075
                     ->  Index Scan using orders_pkey on orders  (cost=0.56..0.71 rows=1 width=20) (actual time=0.016..0.016 rows=1 loops=311285)
                           Index Cond: (o_orderkey = lineitem.l_orderkey)
 Planning time: 1.143 ms
 Execution time: 45522.278 ms
(15 rows)



PATCH:
TPC-H queries:
\i /data/rafia.sabih/dss/queries/12.sql
                                  QUERY PLAN
---------------------------------------------------------------------------
 Limit  (cost=1001.19..391123.34 rows=1 width=27) (actual time=14427.289..14427.290 rows=1 loops=1)
   ->  GroupAggregate  (cost=1001.19..2731856.24 rows=7 width=27) (actual time=14427.287..14427.287 rows=1 loops=1)
         Group Key: lineitem.l_shipmode
         ->  Gather Merge  (cost=1001.19..2721491.60 rows=592261 width=27) (actual time=8.656..13469.925 rows=311095 loops=1)
               Workers Planned: 4
               Workers Launched: 4
               ->  Nested Loop  (cost=1.13..2649947.55 rows=148065 width=27) (actual time=0.418..14050.843 rows=67321 loops=5)
                     ->  Parallel Index Scan using idx_l_shipmode on lineitem  (cost=0.57..2543794.01 rows=148065 width=19) (actual time=0.354..12291.338 rows=67321 loops=5)
                           Index Cond: (l_shipmode = ANY ('{"REG AIR",RAIL}'::bpchar[]))
                           Filter: ((l_commitdate < l_receiptdate) AND (l_shipdate < l_commitdate) AND (l_receiptdate >= '1995-01-01'::date) AND (l_receiptdate < '1996-01-01 00:00:00'::timestamp without time zone))
                           Rows Removed by Filter: 3639282
                     ->  Index Scan using orders_pkey on orders  (cost=0.56..0.71 rows=1 width=20) (actual time=0.023..0.024 rows=1 loops=336606)
                           Index Cond: (o_orderkey = lineitem.l_orderkey)
 Planning time: 1.201 ms
 Execution time: 14569.550 ms
(15 rows)


change_parallel_tuple_q_size.patch
Description: Binary data

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Increasing parallel workers at runtime

2017-05-23 Thread Rafia Sabih
On Tue, May 23, 2017 at 4:41 PM, Amit Kapila <amit.kapil...@gmail.com> wrote:

> Isn't this point contradictory?  Basically, on one side you are
> suggesting to calculate additional workers (work_remaining) based on
> selectivity and on another side you are saying that it will fix
> estimation errors.  IIUC, then you are talking about some sort of
> executor feedback to adjust a number of workers which might be a good
> thing to do but I that is a different problem to solve.  As of now, I
> don't think we have that type of mechanism even for non-parallel
> execution.
>
Yes, I am talking of an executor-feedback mechanism of sorts. For
non-parallel execution it's a lot more challenging, since the
execution plan might need to be changed at runtime; but in the
parallel query case only the addition/removal of workers has to be
dealt with, which appears less complex.
Why I am thinking in this direction is that, in my experience, it
seems very easy to regress with more workers when over-estimation
happens, and this runtime calculation of workers could largely
mitigate that. However, I understand that this might require more
proof than my experience alone. Let's see if anybody else shares my
gut feeling. :)

-- 
Regards,
Rafia Sabih
EnterpriseDB: http://www.enterprisedb.com/


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Increasing parallel workers at runtime

2017-05-22 Thread Rafia Sabih
On Wed, May 17, 2017 at 2:57 PM, Amit Kapila <amit.kapil...@gmail.com> wrote:
> On Tue, May 16, 2017 at 2:14 PM, Ashutosh Bapat
> <ashutosh.ba...@enterprisedb.com> wrote:
>> On Mon, May 15, 2017 at 9:23 PM, Robert Haas <robertmh...@gmail.com> wrote:
>>
>> Also, looking at the patch, it doesn't look like it take enough care
>> to build execution state of new worker so that it can participate in a
>> running query. I may be wrong, but the execution state initialization
>> routines are written with the assumption that all the workers start
>> simultaneously?
>>
>
> No such assumptions, workers started later can also join the execution
> of the query.
>
If we are talking of run-time allocation of workers, I'd like to
propose an idea to safeguard parallelism from selectivity-estimation
errors. Start each query (if it qualifies for the use of parallelism)
with a minimum number of workers (say 2), irrespective of the number
of planned workers. Then, as the query proceeds and we find that there
is more work to do, we allocate more workers.

Let's get into the details a little. We'll have the following new
variables:
- T_int - a time interval at which we'll periodically check if the
query requires more workers.
- work_remaining - a variable which estimates the work yet to be done.
This will use the selectivity estimates to find the total work done
and the remaining work accordingly. Once the actual number of rows
crosses the estimated number of rows, take the maximum possible tuples
for that operator as the new estimate.

Now, at the gather, we'll check after each T_int whether work remains,
and if so allocate another 2 (say) workers. This way we'll keep adding
workers in small chunks rather than all in one go, saving resources in
case of over-estimation.
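
A purely illustrative sketch of that periodic check, as it might run
in the leader; none of these function names exist in the tree, they
are placeholders for the proposal:

	/* Hypothetical leader-side loop, woken every T_int milliseconds. */
	while (!ExecutionDone(gather))
	{
		SleepMilliseconds(T_int);
		if (EstimateWorkRemaining(gather) > 0 &&
			gather->nworkers_launched < gather->nworkers_planned)
			AddWorkers(gather, 2);	/* allocate another small chunk */
	}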

Some of the things we may improvise upon are:
- check whether we want to increase workers or kill some of them; e.g.
if the filtering is not happening as estimated at the node, i.e. the
number of output tuples is the same as the number of input tuples,
then do not add any more workers, since that will only increase the
work at the gather.
- instead of just starting with 2 or 4 workers, allocate x% of the
planned workers at the start.
- as the query progresses, we may alter the value of T_int and/or the
number of workers to allocate; e.g. while a query is less than 50%
done, check at every T_int; after that, increase T_int to T_int * 1.5,
and similarly for the number of workers, because by then allocating
new workers might not accomplish much while the cost of adding them
could dominate.

This scheme is likely to safeguard parallelism against selectivity
estimation errors, in the sense that it uses resources only when
required.
Certainly there are many more things to think about in terms of
implementation and the cases where this can help or regress; I will be
glad to know the opinion of more people on this.

-- 
Regards,
Rafia Sabih
EnterpriseDB: http://www.enterprisedb.com/


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


[HACKERS] [POC] Faster processing at Gather node

2017-05-19 Thread Rafia Sabih
...which case currently parallelism is not selected.

I am testing the performance impact of this on TPC-H queries; in the
meantime I would appreciate some feedback on the design, etc.

-- 
Regards,
Rafia Sabih
EnterpriseDB: http://www.enterprisedb.com/


faster_gather.patch
Description: Binary data

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] PG 10 release notes

2017-04-24 Thread Rafia Sabih
On Tue, Apr 25, 2017 at 9:15 AM, Tom Lane <t...@sss.pgh.pa.us> wrote:
> Andres Freund <and...@anarazel.de> writes:
>> On 2017-04-24 23:37:42 -0400, Bruce Momjian wrote:
>>> I remember seeing those and those are normally details I do not put in
>>> the release notes as there isn't a clear user experience change except
>>> "Postgres is faster".  Yeah, a bummer, and I can change my filter, but
>>> it would require discussion.
>
>> I think "postgres is faster" is one of the bigger user demands, so I
>> don't think that policy makes much sense.  A large number of the changes
>> over the next few releases will focus solely on that.  Nor do I think
>> past release notes particularly filtered such changes out.
>
> I think it has been pretty common to accumulate a lot of such changes
> into generic entries like, say, "speedups for hash joins".  More detail
> than that simply isn't useful to end users; and as a rule, our release
> notes are too long anyway.
>
> regards, tom lane
>
>
Just wondering if the mention of commit
0414b26bac09379a4cbf1fbd847d1cee2293c5e4 has been missed? Not sure if
this requires a separate entry or could be merged with -- Support
parallel btree index scans.

-- 
Regards,
Rafia Sabih
EnterpriseDB: http://www.enterprisedb.com/


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Range Merge Join v1

2017-04-20 Thread Rafia Sabih
On Thu, Apr 6, 2017 at 2:13 PM, Jeff Davis <pg...@j-davis.com> wrote:

> 
> Example:
> 
>
> Find different people using the same website at the same time:
>
> create table session(sessionid text, username text, during tstzrange);
> SELECT s1.username, s2.username, s1.during * s2.during
>   FROM session s1, session s2
>   WHERE s1.during && s2.during AND s1.username < s2.username
>
>
> =
> Brief summary of previous discussion:
> =
>
> http://www.postgresql.org/message-id/1334554850.10878.52.camel@jdavis
>
> - can indexes solve it already (parameterized paths, etc.)?
> - planner complexity (track path keys, mergejoinable equality, etc.)
> - spatial join algorithm choice
> - compare range merge join against existing indexes to see if any
>   indexing approach is viable
>
> ==
> Externals:
> ==
>
> No new syntax or other externals. Just improved execution strategy for
> joining ranges using the overlaps operator (&&).
>
> New syntax is possible, but I don't see a strong reason to support
> special range join syntax at this time.
>
> =
> Optimizer Design
> =
>
> I took the path of least resistance, so if I am doing something wrong
> please let me know. I hardwired it to look for the range overlaps
> operator when checking if it could do a merge join, and then add
> range_ops to restrictinfo->mergeopfamilies.
>
> Then, I have to suppress adding it as an equivalence class, because
> overlaps is not equality. It still adds single-member ECs to
> restrictinfo->right_ec and left_ec, but (I think) that's OK.
>
> Also, I have to prevent generating paths for right/full join, because
> the range join algorithm can't (currently) handle those.
>
> =
> Costing
> =
>
> Needs more consideration. Seems to do reasonable things in the few
> examples I tried.
>
> =
> Executor Design
> =
>
> See detailed comments in nodeMergejoin.c
>
> =
> Performance
> =
>
> Seems much better than index nest loop join when the join is not
> selective. I will post detailed numbers soon.
>
> If no index is available, range merge join is the only reasonable way
> to execute a range join. For instance, if the inner is not a leaf in
> the plan.
>
> =
> Alternatives:
> =
>
> It was suggested that I approach it as a general spatial-join
> problem. That has merit, but I rejected it for now because the
> algorithm that I think has the most promise is range-oriented
> anyway. By that I mean we would need to extract all of the dimensions
> into ranges first, and then perform the join. With that in mind, it
> makes sense to implement range joins first; and then later provide the
> APIs to get at the spatial dimensions so that we can implement other
> spatial joins as range joins.
>
> Another suggestion was to base it on hash join, but I never understood
> that proposal and it didn't seem to map very well to a spatial join.
>
> Yet another suggestion was to use some kind of temporary index. Some
> brief experiments I did indicated that it would be fairly slow (though
> more investigation might be useful here). Also, it doesn't provide any
> alternative to the nestloop-with-inner-index we already offer at the
> leaf level today.
>
> Regards,
>  Jeff Davis
>

Looks like an interesting idea; however, in an attempt to test this
patch I found the following error when compiling,
selfuncs.c: In function ‘mergejoinscansel’:
selfuncs.c:2901:12: error: ‘op_strategy’ undeclared (first use in this
function)
    &op_strategy,

-- 
Regards,
Rafia Sabih
EnterpriseDB: http://www.enterprisedb.com/


Re: [HACKERS] parallel explain analyze support not exercised

2017-04-02 Thread Rafia Sabih
On Sat, Apr 1, 2017 at 12:25 AM, Andres Freund <and...@anarazel.de> wrote:
> Hi,
>
> As visible in [1], the explain analyze codepaths of parallel query isn't
> exercised in the tests.  That used to be not entirely trivial if the
> output was to be displayed (due to timing), but we should be able to do
> that now that we have the SUMMARY option.
>
> E.g.
> SET max_parallel_workers = 0;
> EXPLAIN (analyze, timing off, summary off, costs off) SELECT * FROM blarg2 
> WHERE generate_series < 0;
> ┌───┐
> │QUERY PLAN │
> ├───┤
> │ Gather (actual rows=0 loops=1)│
> │   Workers Planned: 10 │
> │   Workers Launched: 0 │
> │   ->  Parallel Seq Scan on blarg2 (actual rows=0 loops=1) │
> │ Filter: (generate_series < 0) │
> │ Rows Removed by Filter: 1000  │
> └───┘
>
> should be reproducible.  I'd suggest additionally adding one tests that
> throws the EXPLAIN output away, but actually enables paralellism.
>
> Greetings,
>
> Andres Freund
>
> [1] 
> https://coverage.postgresql.org/src/backend/executor/execParallel.c.gcov.html

Please find the attached patch for the same.

-- 
Regards,
Rafia Sabih
EnterpriseDB: http://www.enterprisedb.com/


code_coverage.patch
Description: Binary data

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] TPC-H Q20 from 1 hour to 19 hours!

2017-03-31 Thread Rafia Sabih
On Fri, Mar 31, 2017 at 5:13 PM, Amit Kapila <amit.kapil...@gmail.com> wrote:
> On Thu, Mar 30, 2017 at 8:24 AM, Robert Haas <robertmh...@gmail.com> wrote:
>> On Wed, Mar 29, 2017 at 8:00 PM, Tomas Vondra
>> <tomas.von...@2ndquadrant.com> wrote:
>>> What is however strange is that changing max_parallel_workers_per_gather
>>> affects row estimates *above* the Gather node. That seems a bit, um,
>>> suspicious, no? See the parallel-estimates.log.
>>
>> Thanks for looking at this!  Comparing the parallel plan vs. the
>> non-parallel plan:
>>
>> part: parallel rows (after Gather) 20202, non-parallel rows 20202
>> partsupp: parallel rows 18, non-parallel rows 18
>> part-partsupp join: parallel rows 88988, non-parallel rows 355951
>> lineitem: parallel rows 59986112, non-parallel rows 59986112
>> lineitem after aggregation: parallel rows 5998611, non-parallel rows 5998611
>> final join: parallel rows 131, non-parallel rows 524
>>
>> I agree with you that that looks mighty suspicious.  Both the
>> part-partsupp join and the final join have exactly 4x as many
>> estimated rows in the non-parallel plan as in the parallel plan, and
>> it just so happens that the parallel divisor here will be 4.
>>
>> Hmm... it looks like the parallel_workers value from the Gather node
>> is being erroneously propagated up to the higher levels of the plan
>> tree.   Wow.   Somehow, Gather Merge managed to get the logic correct
>> here, but Gather is totally wrong.  Argh.   Attached is a draft patch,
>> which I haven't really tested beyond checking that it passes 'make
>> check'.
>>
>
> Your patch looks good to me.  I have verified some join cases as well
> where the behaviour is sane after patch.  I have also done testing
> with force_parallel_mode=regress (ran make check-world) and everything
> seems good.
>
> --

I tried checking the plan of Q20 with this patch, and got the following results,
with patch,
->  Merge Join  (cost=3025719.98..3499235.22 rows=6 width=16) (actual time=176440.801..245903.143 rows=118124 loops=1)
      Merge Cond: ((lineitem.l_partkey = partsupp.ps_partkey) AND (lineitem.l_suppkey = partsupp.ps_suppkey))
      Join Filter: ((partsupp.ps_availqty)::numeric > ((0.5 * sum(lineitem.l_quantity
and without patch,
->  Merge Join  (cost=3014830.12..3511637.54 rows=2 width=16) (actual time=130994.237..208887.149 rows=118124 loops=1)
      Merge Cond: ((partsupp.ps_partkey = lineitem.l_partkey) AND (partsupp.ps_suppkey = lineitem.l_suppkey))
      Join Filter: ((partsupp.ps_availqty)::numeric > ((0.5 * sum(lineitem.l_quantity

So, it looks like the estimate in the problematic area is not
improving much. Please find the attached file for the query plans of
Q20 with and without the patch. I haven't evaluated the performance of
this query with the patch.
-- 
Regards,
Rafia Sabih
EnterpriseDB: http://www.enterprisedb.com/
 with patch:
                                  QUERY PLAN
---------------------------------------------------------------------------
 Limit  (cost=3500605.72..3500605.73 rows=1 width=52) (actual time=572114.740..572114.740 rows=1 loops=1)
   ->  Sort  (cost=3500605.72..3500605.73 rows=1 width=52) (actual time=572114.738..572114.738 rows=1 loops=1)
         Sort Key: supplier.s_name
         Sort Method: top-N heapsort  Memory: 25kB
         ->  Nested Loop Semi Join  (cost=3025720.40..3500605.71 rows=1 width=52) (actual time=178675.638..572096.079 rows=3600 loops=1)
               Join Filter: (supplier.s_suppkey = lineitem.l_suppkey)
               Rows Removed by Join Filter: 723939540
               ->  Nested Loop  (cost=0.42..650.46 rows=8000 width=56) (actual time=5.210..128.099 rows=8090 loops=1)
                     ->  Seq Scan on nation  (cost=0.00..0.41 rows=1 width=4) (actual time=0.016..0.039 rows=1 loops=1)
                           Filter: (n_name = 'EGYPT'::bpchar)
                           Rows Removed by Filter: 24
                     ->  Index Scan using idx_supplier_nation_key on supplier  (cost=0.42..570.04 rows=8000 width=64) (actual time=5.189..123.582 rows=8090 loops=1)
                           Index Cond: (s_nationkey = nation.n_nationkey)
               ->  Materialize  (cost=3025719.98..3499235.25 rows=6 width=16) (actual time=21.810..40.888 rows=89486 loops=8090)
                     ->  Merge Join  (cost=3025719.98..3499235.22 rows=6 width=16) (actual time=176440.801..245903.143 rows=118124 loops=1)
                           Merge Cond: ((lineitem.l

Re: [HACKERS] Parallel query execution with SPI

2017-03-31 Thread Rafia Sabih
On Fri, Mar 31, 2017 at 4:18 PM, Robert Haas <robertmh...@gmail.com> wrote:
> On Fri, Mar 31, 2017 at 3:33 AM, Konstantin Knizhnik
> <k.knizh...@postgrespro.ru> wrote:
>> It is possible to execute query concurrently using SPI?
>> If so, how it can be enforced?
>> I tried to open cursor with CURSOR_OPT_PARALLEL_OK flag but it doesn't help:
>> query is executed by single backend while the same query been launched at
>> top level uses parallel plan:
>>
>> fsstate->portal = SPI_cursor_open_with_args(NULL, fsstate->query,
>> fsstate->numParams, argtypes, values, nulls, true, CURSOR_OPT_PARALLEL_OK);
>> ...
>> SPI_cursor_fetch(fsstate->portal, true, 1);
>
> Parallel execution isn't possible if you are using a cursor-type
> interface, because a parallel query can't be suspended and resumed
> like a non-parallel query.  If you use a function that executes the
> query to completion in one go, like SPI_execute_plan, then it's cool.
> See also commit 61c2e1a95f94bb904953a6281ce17a18ac38ee6d.
>
> --

Adding to that, for your case, passing CURSOR_OPT_PARALLEL_OK is not
enough, because PortalRun for the cursor would have portal->run_once
set to false, which restricts parallelism in ExecutePlan:

if (!execute_once || dest->mydest == DestIntoRel)
  use_parallel_mode = false;
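
For completeness, a sketch of the cursor-free SPI path that allows
parallelism, based on the documented SPI API; the variables are
borrowed from the snippet quoted above, and error handling is elided:

	/* Prepare with parallelism allowed, then run the query to completion
	 * in one go -- no suspend/resume, so use_parallel_mode survives. */
	SPIPlanPtr plan = SPI_prepare_cursor(fsstate->query, fsstate->numParams,
										 argtypes, CURSOR_OPT_PARALLEL_OK);
	int ret = SPI_execute_plan(plan, values, nulls,
							   true /* read_only */, 0 /* no row limit */);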

You may check [1] for the discussion on this.

[1] 
https://www.postgresql.org/message-id/flat/CAFiTN-vxhvvi-rMJFOxkGzNaQpf%2BKS76%2Bsu7-sG_NQZGRPJkQg%40mail.gmail.com#cafitn-vxhvvi-rmjfoxkgznaqpf+ks76+su7-sg_nqzgrpj...@mail.gmail.com

-- 
Regards,
Rafia Sabih
EnterpriseDB: http://www.enterprisedb.com/


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] WIP: [[Parallel] Shared] Hash

2017-03-30 Thread Rafia Sabih
On Tue, Mar 28, 2017 at 11:11 AM, Rafia Sabih
<rafia.sa...@enterprisedb.com> wrote:
> On Mon, Mar 27, 2017 at 12:20 PM, Thomas Munro
> <thomas.mu...@enterprisedb.com> wrote:
>>
>> On Sun, Mar 26, 2017 at 3:56 PM, Thomas Munro
>> <thomas.mu...@enterprisedb.com> wrote:
>> > But... what you said above must be a problem for Windows.  I believe
>> > it doesn't allow files to be unlinked if they are open, and I see that
>> > DSM segments are cleaned up in resowner's phase ==
>> > RESOURCE_RELEASE_BEFORE_LOCKS and files are closed in phase ==
>> > RESOURCE_RELEASE_AFTER_LOCKS.
>>
>> I thought this last point about Windows might be fatal to my design,
>> but it seems that Windows since at least version 2000 has support for
>> Unixoid unlinkability via the special flag FILE_SHARE_DELETE.
>
> On testing v10 of this patch over commit
> b54aad8e34bd6299093e965c50f4a23da96d7cc3 and applying the tweak
> mentioned in [1], for TPC-H queries I found the results quite
> encouraging,
>
> Experimental setup:
> TPC-H scale factor - 20
> work_mem = 1GB
> shared_buffers = 10GB
> effective_cache_size = 10GB
> random_page_cost = seq_page_cost = 0.1
> max_parallel_workers_per_gather = 4
>
> Performance numbers:
> (Time in seconds)
> Query  |  Head | Patch |
> ---
> Q3   | 73   | 37  |
> Q5   | 56   | 31  |
> Q7   | 40   | 30  |
> Q8   | 8 | 8|
> Q9   | 85   | 42  |
> Q10 | 86   | 46  |
> Q14 | 11   | 6|
> Q16 | 32   | 11  |
> Q21 | 53   | 56  |
>
> Please find the attached file for the explain analyse output of these
> queries on head as well as patch.
> Would be working on analysing the performance of this patch on 300 scale 
> factor.
>
> [1] 
> https://www.postgresql.org/message-id/flat/CAEepm%3D270ze2hVxWkJw-5eKzc3AB4C9KpH3L2kih75R5pdSogg%40mail.gmail.com
> --

Before moving to a higher scale factor I tried playing around with the
effects of work_mem on this patch, and came across the following
results.
All settings are kept as before, with the exception of work_mem, which
is set to 64MB.

Most of the queries showed similar performance except a few; details
are as follows (all times are given in ms):

Query |   Head |  Patch
------+--------+--------
Q8    |   8720 |   8839
Q18   | 370710 | 384347
Q21   |  53270 |  65189

Clearly, the regression in Q8 and Q18 is minor, but that in Q21 is
significant. Just to confirm, I have applied the tweak mentioned in
[1] as before.
For the explain analyse output of Q21 on head and with the patch,
please check the attached file.

 [1] 
https://www.postgresql.org/message-id/flat/CAEepm%3D270ze2hVxWkJw-5eKzc3AB4C9KpH3L2kih75R5pdSogg%40mail.gmail.com

-- 
Regards,
Rafia Sabih
EnterpriseDB: http://www.enterprisedb.com/


q21_reg_ph.tar.gz
Description: GNU Zip compressed data

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] TPC-H Q20 from 1 hour to 19 hours!

2017-03-29 Thread Rafia Sabih
On Thu, Mar 30, 2017 at 12:30 AM, Robert Haas <robertmh...@gmail.com> wrote:

> I don't think the problem originates at the Merge Join, though,
> because the commit says that at is fixing semi and anti-join estimates
> - this is a plain inner join, so in theory it shouldn't change.
> However, it's a bit hard for me to piece through these plans, the
> formatting kind of got messed up - things are wrapped.  Could you
> possibly attach the plans as attachments?
>
Sure, please find the attached file for the plans before and after the commit.

-- 
Regards,
Rafia Sabih
EnterpriseDB: http://www.enterprisedb.com/
The plan that completes in 19 hours after commit:
commit 7fa93eec4e0c9c3e801e3c51aa4bae3a38aaa218
Author: Tom Lane <t...@sss.pgh.pa.us>
Date:   Sat Dec 17 15:28:54 2016 -0500
 Fix FK-based join selectivity estimation for semi/antijoins.

                                  QUERY PLAN
---------------------------------------------------------------------------
 Limit  (cost=60187550.59..60187550.59 rows=1 width=52) (actual time=71064602.963..71064602.964 rows=1 loops=1)
   ->  Sort  (cost=60187550.59..60187550.59 rows=3 width=52) (actual time=71064602.961..71064602.961 rows=1 loops=1)
         Sort Key: supplier.s_name
         Sort Method: top-N heapsort  Memory: 25kB
         ->  Nested Loop Semi Join  (cost=52960550.15..60187550.57 rows=3 width=52) (actual time=2163639.699..71063153.953 rows=52263 loops=1)
               Join Filter: (supplier.s_suppkey = lineitem.l_suppkey)
               Rows Removed by Join Filter: 155706242106
               ->  Nested Loop  (cost=963.43..10081.54 rows=12 width=56) (actual time=178.589..6282.852 rows=120358 loops=1)
                     ->  Seq Scan on nation  (cost=0.00..0.41 rows=1 width=4) (actual time=0.018..0.042 rows=1 loops=1)
                           Filter: (n_name = 'EGYPT'::bpchar)
                           Rows Removed by Filter: 24
                     ->  Bitmap Heap Scan on supplier  (cost=963.43..8881.13 rows=12 width=64) (actual time=178.563..6143.786 rows=120358 loops=1)
                           Recheck Cond: (s_nationkey = nation.n_nationkey)
                           Heap Blocks: exact=57317
                           ->  Bitmap Index Scan on idx_supplier_nation_key  (cost=0.00..933.43 rows=12 width=0) (actual time=133.218..133.218 rows=120358 loops=1)
                                 Index Cond: (s_nationkey = nation.n_nationkey)
               ->  Materialize  (cost=52959586.72..60024469.24 rows=85 width=16) (actual time=12.679..175.456 rows=1293693 loops=120358)
                     ->  Merge Join  (cost=52959586.72..60024468.82 rows=85 width=16) (actual time=1525322.753..2419045.641 rows=1696742 loops=1)
                           Merge Cond: ((lineitem.l_partkey = partsupp.ps_partkey) AND (lineitem.l_suppkey = partsupp.ps_suppkey))
                           Join Filter: ((partsupp.ps_availqty)::numeric > ((0.5 * sum(lineitem.l_quantity
                           Rows Removed by Join Filter: 3771
                           ->  GroupAggregate  (cost=48448912.90..53325163.98 rows=144696357 width=48) (actual time=1342085.116..2178225.340 rows=156665300 loops=1)
                                 Group Key: lineitem.l_partkey, lineitem.l_suppkey
                                 ->  Sort  (cost=48448912.90..49125364.33 rows=270580572 width=21) (actual time=1342085.072..1495863.254 rows=273057944 loops=1)
                                       Sort Key: lineitem.l_partkey, lineitem.l_suppkey
                                       Sort Method: external merge  Disk: 8282600kB
                                       ->  Bitmap Heap Scan on lineitem  (cost=2847641.84..10552097.42 rows=270580572 width=21) (actual time=241170.637..650122.225 rows=273058377 loops=1)
                                             Recheck Cond: ((l_shipdate >= '1994-01-01'::date) AND (l_shipdate < '1995-01-01 00:00:00'::timestamp without time zone))
                                             Heap Blocks: exact=3333
                                             ->  Bitmap Index Scan on idx_l_shipdate  (cost=0.00..2779996.70 rows=270580572 width=0) (actual time=190321.155..190321.155 rows=273058377 loops=1)
                                                   Index Cond: ((l_shipdate >= '1994-01-01'::date) AND (l_shipdate < '1995-01-01 00:00:00'::timestamp without time zone))
                           ->  Sort  (cost=4510673.81..4516734.42 rows=2424244 width=24) (actual time=183237.183..184039.914 rows=2602656 loops=1)
                                 Sort Key: partsupp.ps_partkey, partsupp.ps_suppkey
                                 Sort Method: quicksort  Memory: 301637kB
                                 ->  Hash Join  (cost=379620.36.

Re: [HACKERS] [COMMITTERS] pgsql: Improve access to parallel query from procedural languages.

2017-03-28 Thread Rafia Sabih
On Tue, Mar 28, 2017 at 9:05 PM, Robert Haas <robertmh...@gmail.com> wrote:
> OK, but don't pg_event_trigger_dropped_objects and
> pg_event_trigger_ddl_commands need the same treatment?
>
Done.
I was only concentrating on the buildfarm failure cases; otherwise, I
think more work might be required in this direction.

-- 
Regards,
Rafia Sabih
EnterpriseDB: http://www.enterprisedb.com/


system_defined_fn_update_v3.patch
Description: Binary data

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [COMMITTERS] pgsql: Improve access to parallel query from procedural languages.

2017-03-27 Thread Rafia Sabih
On Mon, Mar 27, 2017 at 5:54 PM, Robert Haas <robertmh...@gmail.com> wrote:
>
> If it's just that they are relying on unsynchronized global variables,
> then it's sufficient to mark them parallel-restricted ('r').  Do we
> really need to go all the way to parallel-unsafe ('u')?
>
Done.

-- 
Regards,
Rafia Sabih
EnterpriseDB: http://www.enterprisedb.com/


system_defined_fn_update_v2.patch
Description: Binary data

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] pgbench - allow to store select results into variables

2017-03-27 Thread Rafia Sabih
On Fri, Mar 24, 2017 at 8:59 PM, Fabien COELHO <coe...@cri.ensmp.fr> wrote:
>
> Hello Rafia,
>
>> if (my_command->argc > 2)
>> + syntax_error(source, lineno, my_command->line, my_command->argv[0],
>> + "at most on argument expected", NULL, -1);
>>
>> I suppose you mean 'one' argument here.
>
>
> Indeed.
>
>> Apart from that indentation is not correct as per pgindent, please check.
>
>
> I guess that you are refering to switch/case indentation which my emacs does
> not do as expected.
>
> Please find attached a v8 which hopefully fixes these two issues.
Looks good to me, marking as ready for committer.
-- 
Regards,
Rafia Sabih
EnterpriseDB: http://www.enterprisedb.com/


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Parallel bitmap heap scan

2017-03-27 Thread Rafia Sabih
On Wed, Mar 8, 2017 at 11:52 PM, Robert Haas <robertmh...@gmail.com> wrote:
> On Wed, Mar 8, 2017 at 1:18 PM, Tom Lane <t...@sss.pgh.pa.us> wrote:
>> Robert Haas <robertmh...@gmail.com> writes:
>>> What I'm using is:
>>
>>> Configured with:
>>> --prefix=/Applications/Xcode.app/Contents/Developer/usr
>>> --with-gxx-include-dir=/usr/include/c++/4.2.1
>>> Apple LLVM version 7.0.2 (clang-700.1.81)
>>> Target: x86_64-apple-darwin14.5.0
>>> Thread model: posix
>>
>> Hm.  I noticed that longfin didn't spit up on it either, despite having
>> -Werror turned on.  That's a slightly newer version, but still Apple's
>> clang:
>>
>> $ gcc -v
>> Configured with: --prefix=/Applications/Xcode.app/Contents/Developer/usr 
>> --with-gxx-include-dir=/usr/include/c++/4.2.1
>> Apple LLVM version 8.0.0 (clang-800.0.42.1)
>> Target: x86_64-apple-darwin16.4.0
>> Thread model: posix
>> InstalledDir: 
>> /Applications/Xcode.app/Contents/Developer/Toolchains/XcodeDefault.xctoolchain/usr/bin
>
> Yeah.  I think on my previous MacBook Pro, you could do this without
> generating a warning:
>
> int x;
> printf("%d\n", x);
>
> The compiler on this one detects that case, but that seems to be about
> as far as it goes.

Recently, while testing TPC-H at 300 scale factor, I stumbled onto an
error for Q6. The test environment is as follows:
work_mem = 1GB
shared_buffers = 10GB
effective_cache_size = 10GB
random_page_cost = seq_page_cost = 0.1

The error message is: ERROR:  invalid DSA memory alloc request size 1610612740
In case required, the stack trace is as follows,

#0  errfinish (dummy=0) at elog.c:415
#1  0x10738820 in elog_finish (elevel=20, fmt=0x109115d0 "invalid DSA memory alloc request size %zu") at elog.c:1378
#2  0x10778824 in dsa_allocate_extended (area=0x1001b53b138, size=1610612740, flags=4) at dsa.c:676
#3  0x103a3e60 in pagetable_allocate (pagetable=0x1001b52a590, size=1610612736) at tidbitmap.c:1521
#4  0x103a0488 in pagetable_grow (tb=0x1001b52a590, newsize=33554432) at ../../../src/include/lib/simplehash.h:379
#5  0x103a07b0 in pagetable_insert (tb=0x1001b52a590, key=34962730, found=0x3fffc705ba48 "") at ../../../src/include/lib/simplehash.h:504
#6  0x103a3354 in tbm_get_pageentry (tbm=0x1001b526fa0, pageno=34962730) at tidbitmap.c:1235
#7  0x103a1614 in tbm_add_tuples (tbm=0x1001b526fa0, tids=0x1001b51d22a, ntids=1, recheck=0 '\000') at tidbitmap.c:425
#8  0x100fdf24 in btgetbitmap (scan=0x1001b51c4a0, tbm=0x1001b526fa0) at nbtree.c:460
#9  0x100f2c48 in index_getbitmap (scan=0x1001b51c4a0, bitmap=0x1001b526fa0) at indexam.c:726
#10 0x1033c7d8 in MultiExecBitmapIndexScan (node=0x1001b51c180) at nodeBitmapIndexscan.c:91
#11 0x1031c0b4 in MultiExecProcNode (node=0x1001b51c180) at execProcnode.c:611
#12 0x1033a8a8 in BitmapHeapNext (node=0x1001b5187a8) at nodeBitmapHeapscan.c:143
#13 0x1032a094 in ExecScanFetch (node=0x1001b5187a8, accessMtd=0x1033a6c8 <BitmapHeapNext>, recheckMtd=0x1033bab8 <BitmapHeapRecheck>) at execScan.c:95
#14 0x1032a194 in ExecScan (node=0x1001b5187a8, accessMtd=0x1033a6c8 <BitmapHeapNext>, recheckMtd=0x1033bab8 <BitmapHeapRecheck>) at execScan.c:162
#15 0x1033bb88 in ExecBitmapHeapScan (node=0x1001b5187a8) at nodeBitmapHeapscan.c:667
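
For context, the ERROR above comes from the allocation-size sanity
check in dsa_allocate_extended(); roughly (paraphrasing dsa.c, so
treat the exact form as an assumption), the ~1.5GB pagetable request
exceeds MaxAllocSize (~1GB) and the caller did not pass DSA_ALLOC_HUGE
(flags=4 is DSA_ALLOC_ZERO):

	/* Paraphrase of the check in dsa_allocate_extended() (dsa.c). */
	if (!(flags & DSA_ALLOC_HUGE) && !AllocSizeIsValid(size))
		elog(ERROR, "invalid DSA memory alloc request size %zu", size);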

Please feel free to ask if any more information is required for this.
-- 
Regards,
Rafia Sabih
EnterpriseDB: http://www.enterprisedb.com/


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [COMMITTERS] pgsql: Improve access to parallel query from procedural languages.

2017-03-26 Thread Rafia Sabih
On Sun, Mar 26, 2017 at 3:34 AM, Tom Lane <t...@sss.pgh.pa.us> wrote:
> I wrote:
>> It doesn't seem to be a platform-specific problem: I can duplicate the
>> failure here by applying same settings mandrill uses, ie build with
>> -DRANDOMIZE_ALLOCATED_MEMORY and set force_parallel_mode = regress.
>
> Oh ... scratch that: you don't even need -DRANDOMIZE_ALLOCATED_MEMORY.
> For some reason I just assumed that any parallelism-related patch
> would have been tested with force_parallel_mode = regress.  This one
> evidently was not.
>
> regards, tom lane
>

This is caused because trigger-related functions are marked
parallel-safe while relying on global variables; hence, when executed
in parallel, they give incorrect output. The attached patch fixes
this. I have modified only those functions that showed incorrect
behaviour in the regress output, and I checked the regression tests
with force_parallel_mode = regress; all test cases pass now.
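
As a catalog-level illustration of what the patch effects (the actual
patch edits the pg_proc.h entries directly; pg_trigger_depth() is just
one plausible example of a function that reads backend-local trigger
state):

	ALTER FUNCTION pg_trigger_depth() PARALLEL RESTRICTED;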

This concerns me, though: should we re-check all the system-defined
functions to see whether they are actually parallel-safe?

-- 
Regards,
Rafia Sabih
EnterpriseDB: http://www.enterprisedb.com/


system_defined_fn_update.patch
Description: Binary data

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Enabling parallelism for queries coming from SQL or other PL functions

2017-03-24 Thread Rafia Sabih
On Thu, Mar 23, 2017 at 11:26 PM, Robert Haas <robertmh...@gmail.com> wrote:

>
> The changes to the plpgsql code don't look so good to me.  The change
> to exec_stmt_return_query fixes the same bug that I mentioned in the
> email linked above, but only half of it -- it corrects the RETURN
> QUERY EXECUTE case but not the RETURN QUERY case.  And it's clearly a
> separate change; that part is a bug fix, not an enhancement.

My bad. Since you have given this as a separate patch in the link
upthread, I suppose there's nothing expected from me regarding it
right now.

> Some of
> the other changes depend on whether we're in a trigger, which seems
> irrelevant to whether we can use parallelism. Even if the outer query
> is doing writes, we can still use parallelism for queries inside the
> trigger function if warranted.  It's probably a rare case to have
> queries inside a trigger that are expensive enough to justify such
> handling but I don't see the value of putting in special-case logic to
> prevent it.
>
Fixed. I had confused it with not allowing parallel workers when an
update command is in progress.

> I suspect that code fails to achieve its goals anyway.  At the top of
> exec_eval_expr(), you call exec_prepare_plan() and unconditionally
> pass CURSOR_OPT_PARALLEL_OK, so when that function returns, expr->plan
> might now be a parallel plan.  If we reach the call to
> exec_run_select() further down in that function, and if we happen to
> pass false, it's not going to matter, because exec_run_select() is
> going to find the plan already initialized.
>
True, fixed.
The attached patch is to be applied over [1].

[1]  
https://www.postgresql.org/message-id/CA%2BTgmoZ_ZuH%2BauEeeWnmtorPsgc_SmP%2BXWbDsJ%2BcWvWBSjNwDQ%40mail.gmail.com
-- 
Regards,
Rafia Sabih
EnterpriseDB: http://www.enterprisedb.com/


pl_parallel_opt_support_v4.patch
Description: Binary data

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] pgbench - allow to store select results into variables

2017-03-23 Thread Rafia Sabih
On Thu, Mar 16, 2017 at 12:45 AM, Fabien COELHO <coe...@cri.ensmp.fr> wrote:
>
> Hello Rafia,
>
>> I was reviewing v7 of this patch, to start with I found following white
>> space errors when applying with git apply,
>> /home/edb/Desktop/patches/others/pgbench-into-7.patch:66: trailing
>> whitespace.
>
>
> Yep.
>
> I do not know why "git apply" sometimes complains. All is fine for me both
> with "git apply" and "patch".
>
> Last time it was because my mailer uses text/x-diff for the mime type, as
> define by the system in "/etc/mime.types", which some mailer then interpret
> as a license to change eol-style when saving, resulting in this kind of
> behavior. Could you tell your mailer just to save the file as is?
>
>> Apart from that, on executing SELECT 1 AS a \gset \set i debug(:a) SELECT
>> 2
>> AS a \gcset SELECT 3; given in your provided script gset-1.sql. it is
>> giving error Invalid command \gcset.
>
>
> Are you sure that you are using the compiled pgbench, not a previously
> installed one?
>
>   bin/pgbench> pgbench -t 1 -f SQL/gset-1.sql
> SQL/gset-1.sql:1: invalid command in command "gset"
> \gset
>
>   bin/pgbench> ./pgbench -t 1 -f SQL/gset-1.sql
> starting vacuum...end.
> debug(script=0,command=2): int 1
> debug(script=0,command=4): int 2
> ...
>
>> Not sure what is the intention of this script anyway?
>
>
> The intention is to test that gset & gcset work as expected in various
> settings, especially with combined queries (\;) the right result must be
> extracted in the sequence.
>
>> Also, instead of so many different files for error why don't you combine
>> it into one.
>
>
> Because a pgbench scripts stops on the first error, and I wanted to test
> what happens with several kind of errors.
>
if (my_command->argc > 2)
+ syntax_error(source, lineno, my_command->line, my_command->argv[0],
+ "at most on argument expected", NULL, -1);

I suppose you mean 'one' argument here.
Apart from that, the indentation is not correct as per pgindent; please check.

-- 
Regards,
Rafia Sabih
EnterpriseDB: http://www.enterprisedb.com/


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Enabling parallelism for queries coming from SQL or other PL functions

2017-03-22 Thread Rafia Sabih
On Thu, Mar 23, 2017 at 5:23 AM, Dilip Kumar <dilipbal...@gmail.com> wrote:
> On Wed, Mar 22, 2017 at 10:33 PM, Robert Haas <robertmh...@gmail.com> wrote:
>> So couldn't we actually make this test !fcache->returnsSet || !es->lazyEval?
>> That would let us allow parallel execution for all non-set-returning
>> functions, and also for set-returning functions that end up with
>> es->lazyEval set to false.
>
> Yes, this is the right thing to do although we may not enable
> parallelism for any more queries by adding "|| !es->lazyEval". Because
> SELECT are always marked as es->lazyEval=true(And so far we have
> parallelism only for select).  But here we calling the parameter to
> ExecutorRun as execute_once so  !fcache->returnsSet || !es->lazyEval
> is the correct one and future proof.
>
Agreed, done.

-- 
Regards,
Rafia Sabih
EnterpriseDB: http://www.enterprisedb.com/


execute-once-v3.patch
Description: Binary data


pl_parallel_opt_support_v3.patch
Description: Binary data

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Partition-wise join for join between (declaratively) partitioned tables

2017-03-22 Thread Rafia Sabih
On Wed, Mar 22, 2017 at 3:19 PM, Ashutosh Bapat
<ashutosh.ba...@enterprisedb.com> wrote:
>>>>
>> In an attempt to test the geqo side of this patch, I reduced
>> geqo_threshold to 6 and set enable_partitionwise_join to to true and
>> tried following query, which crashed,
>>
>> explain select * from prt, prt2, prt3, prt32, prt4, prt42 where prt.a
>> = prt2.b and prt3.a = prt32.b and prt4.a = prt42.b and prt2.a > 1000
>> order by prt.a desc;
>>
>> Stack-trace for the crash is as follows,
>>
> Nice catch. When reparameterize_path_by_child() may be running in a
> temporary memory context while running in GEQO mode. It may add a new
> PPI to base relation all in the temporary context. In the next GEQO
> cycle, the ppilist will be clobbered since the temporary context is
> reset for each geqo cycle. The fix is to allocate PPI in the same
> memory context as the RelOptInfo similar to mark_dummy_rel().
>
> I also found another problem. In geqo, we never call
> generate_partition_wise_join_paths() which set cheapest paths for each
> child-join. Because of this cheapest_*_paths are never set for those
> rels, thus segfaulting in functions like sort_inner_and_outer() which
> use those.
>
> Here's patch fixing both the issues. Please let me know if it fixes
> the issues you are seeing.

I tested with the patch applied; it fixes the reported issue.
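
For reference, here is the shape of the fix described above, modeled
on mark_dummy_rel(); this is a sketch, with the details subject to the
actual patch:

	MemoryContext oldcontext;
	ParamPathInfo *ppi;

	/* Allocate the new ParamPathInfo in the same memory context as the
	 * RelOptInfo itself, so that a GEQO cycle's temporary-context reset
	 * cannot clobber the rel's ppilist. */
	oldcontext = MemoryContextSwitchTo(GetMemoryChunkContext(rel));
	ppi = makeNode(ParamPathInfo);
	ppi->ppi_req_outer = bms_copy(required_outer);
	rel->ppilist = lappend(rel->ppilist, ppi);
	MemoryContextSwitchTo(oldcontext);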

-- 
Regards,
Rafia Sabih
EnterpriseDB: http://www.enterprisedb.com/


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Partition-wise join for join between (declaratively) partitioned tables

2017-03-22 Thread Rafia Sabih
In an attempt to test the geqo side of this patch, I reduced
geqo_threshold to 6, set enable_partitionwise_join to true, and tried
the following query, which crashed:

explain select * from prt, prt2, prt3, prt32, prt4, prt42 where prt.a
= prt2.b and prt3.a = prt32.b and prt4.a = prt42.b and prt2.a > 1000
order by prt.a desc;

Stack-trace for the crash is as follows,

Program received signal SIGSEGV, Segmentation fault.
0x007a43d1 in find_param_path_info (rel=0x2d3fe30, required_outer=0x2ff6d30) at relnode.c:1534
1534 if (bms_equal(ppi->ppi_req_outer, required_outer))
(gdb) bt
#0  0x007a43d1 in find_param_path_info (rel=0x2d3fe30, required_outer=0x2ff6d30) at relnode.c:1534
#1  0x0079b8bb in reparameterize_path_by_child (root=0x2df7550, path=0x2f6dec0, child_rel=0x2d4a860) at pathnode.c:3455
#2  0x0075be30 in try_nestloop_path (root=0x2df7550, joinrel=0x2ff51b0, outer_path=0x2f96540, inner_path=0x2f6dec0, pathkeys=0x0, jointype=JOIN_INNER, extra=0x7fffe6b4e130) at joinpath.c:344
#3  0x0075d55b in match_unsorted_outer (root=0x2df7550, joinrel=0x2ff51b0, outerrel=0x2d4a860, innerrel=0x2d3fe30, jointype=JOIN_INNER, extra=0x7fffe6b4e130) at joinpath.c:1389
#4  0x0075bc5f in add_paths_to_joinrel (root=0x2df7550, joinrel=0x2ff51b0, outerrel=0x2d4a860, innerrel=0x2d3fe30, jointype=JOIN_INNER, sjinfo=0x3076bc8, restrictlist=0x3077168) at joinpath.c:234
#5  0x0075f1d5 in populate_joinrel_with_paths (root=0x2df7550, rel1=0x2d3fe30, rel2=0x2d4a860, joinrel=0x2ff51b0, sjinfo=0x3076bc8, restrictlist=0x3077168) at joinrels.c:793
#6  0x00760107 in try_partition_wise_join (root=0x2df7550, rel1=0x2d3f6d8, rel2=0x2d4a1a0, joinrel=0x30752f0, parent_sjinfo=0x7fffe6b4e2d0, parent_restrictlist=0x3075768) at joinrels.c:1401
#7  0x0075f0e6 in make_join_rel (root=0x2df7550, rel1=0x2d3f6d8, rel2=0x2d4a1a0) at joinrels.c:744
#8  0x00742053 in merge_clump (root=0x2df7550, clumps=0x3075270, new_clump=0x30752a8, force=0 '\000') at geqo_eval.c:260
#9  0x00741f1c in gimme_tree (root=0x2df7550, tour=0x2ff2430, num_gene=6) at geqo_eval.c:199
#10 0x00741df5 in geqo_eval (root=0x2df7550, tour=0x2ff2430, num_gene=6) at geqo_eval.c:102
#11 0x0074288a in random_init_pool (root=0x2df7550, pool=0x2ff23d0) at geqo_pool.c:109
#12 0x007422a6 in geqo (root=0x2df7550, number_of_rels=6, initial_rels=0x2ff22d0) at geqo_main.c:114
#13 0x00747f19 in make_rel_from_joinlist (root=0x2df7550, joinlist=0x2dce940) at allpaths.c:2333
#14 0x00744e7e in make_one_rel (root=0x2df7550, joinlist=0x2dce940) at allpaths.c:182
#15 0x00772df9 in query_planner (root=0x2df7550, tlist=0x2dec2c0, qp_callback=0x777ce1 <standard_qp_callback>, qp_extra=0x7fffe6b4e700) at planmain.c:254

Please let me know if any more information is required on this.
-- 
Regards,
Rafia Sabih
EnterpriseDB: http://www.enterprisedb.com/


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Enabling parallelism for queries coming from SQL or other PL functions

2017-03-22 Thread Rafia Sabih
On Wed, Mar 22, 2017 at 12:55 AM, Robert Haas <robertmh...@gmail.com> wrote:
> On Tue, Mar 21, 2017 at 6:36 AM, Dilip Kumar <dilipbal...@gmail.com> wrote:
>> How about taking the decision of execute_once based on
>> fcache->returnsSet instead of based on lazyEval?
>>
>> change
>> + ExecutorRun(es->qd, ForwardScanDirection, count, !es->lazyEval);
>> to
>> + ExecutorRun(es->qd, ForwardScanDirection, count, !fcache->returnsSet);
>>
>> IMHO, Robert have the same thing in mind?
>
> Yeah, something like that.
>
Done in execute-once-v2.patch

Apart from this, I also observed that in the case of SQL functions,
cursorOptions are not set properly; note, in init_execution_state:

stmt = pg_plan_query(queryTree,
fcache->readonly_func ? CURSOR_OPT_PARALLEL_OK : 0,
NULL)

Now, this fcache->readonly_func is set in init_sql_fcache:

/* Remember if function is STABLE/IMMUTABLE */
fcache->readonly_func =
    (procedureStruct->provolatile != PROVOLATILE_VOLATILE);

so, if STABLE or IMMUTABLE is missing from the function definition,
it is not considered read-only as per this code. Also, note that this
is set per function rather than per query, as is done for other PL
functions. So, I did:

stmt = pg_plan_query(queryTree,
-  fcache->readonly_func ? CURSOR_OPT_PARALLEL_OK : 0,
+  CURSOR_OPT_PARALLEL_OK,
   NULL);

Now, if the query is an update/insert/delete statement then it is
anyway taken care of by the planner and is not parallelised. This also
enables parallelism for the case where one query is a select and
another is an update in an SQL function, which couldn't be done
before.

This is done in pl_parallel_opt_v3.patch.

>>>SELECT * FROM blah() LIMIT 3
>>>
>>>...will trigger three separate calls to ExecutorRun(), which is a
>>>problem if the plan is a parallel plan.
>>
>> And you also need to test this case what Robert have mentioned up thread.
>
> +1
Checked: ExecutorRun is called only once in this case, and
execute_once is true here.

-- 
Regards,
Rafia Sabih
EnterpriseDB: http://www.enterprisedb.com/


pl_parallel_opt_support_v3.patch
Description: Binary data


execute-once-v2.patch
Description: Binary data

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Enabling parallelism for queries coming from SQL or other PL functions

2017-03-21 Thread Rafia Sabih
On Wed, Mar 15, 2017 at 8:55 PM, Robert Haas <robertmh...@gmail.com> wrote:
> Note this:
>
> if (completed || !fcache->returnsSet)
> postquel_end(es);
>
> When the SQL function doesn't return a set, then we can allow
> parallelism even when lazyEval is set, because we'll only call
> ExecutorStart() once.  But my impression is that something like this:

Well, when I test the following SQL function I see that it cannot be
parallelised, because lazyEval is true for it even though it does not
return a set:

CREATE OR REPLACE FUNCTION not_parallel()
RETURNS bigint AS $$
  SELECT count(distinct i) FROM t WHERE j = 12;
$$ LANGUAGE sql;

Query Text:
 SELECT count(distinct i) FROM t WHERE j = 12;
Aggregate  (cost=34.02..34.02 rows=1 width=8) (actual time=0.523..0.523 rows=1 loops=1)
  ->  Seq Scan on t  (cost=0.00..34.01 rows=1 width=4) (actual time=0.493..0.493 rows=0 loops=1)
        Filter: (j = 12)
        Rows Removed by Filter: 2001
2017-03-21 15:24:03.378 IST [117823] CONTEXT:  SQL function "not_parallel" statement 1
2017-03-21 15:24:03.378 IST [117823] LOG:  duration: 94868.181 ms  plan:
Query Text: select not_parallel();
Result  (cost=0.00..0.26 rows=1 width=8) (actual time=87981.047..87981.048 rows=1 loops=1)

 not_parallel
--------------
            0
(1 row)

As far as my understanding goes, lazyEvalOK is set in fmgr_sql
irrespective of whether the function returns a set or not:

else
{
    randomAccess = false;
    lazyEvalOK = true;
}

then it is passed to init_sql_fcache, which in turn passes it to
init_execution_state, where fcache->lazyEval is set:

if (lasttages && fcache->junkFilter)
{
lasttages->setsResult = true;
if (lazyEvalOK &&
lasttages->stmt->commandType == CMD_SELECT &&
!lasttages->stmt->hasModifyingCTE)
fcache->lazyEval = lasttages->lazyEval = true;
}

Finally, this lazyEval is passed to ExecutorRun in postquel_getnext,
which restricts parallelism by making execute_once false:
/* Run regular commands to completion unless lazyEval */
uint64 count = (es->lazyEval) ? 1 : 0;

ExecutorRun(es->qd, ForwardScanDirection, count, !es->lazyEval);

So, this is my concern: why should such a query not execute in
parallel when it is in an SQL function? If I run the same query from a
PL/pgSQL function, it can run in parallel:

CREATE OR REPLACE FUNCTION not_parallel()
RETURNS bigint AS $$
declare cnt int:=0;
BEGIN
  SELECT count(distinct i) into cnt FROM t WHERE j = 12;
  RETURN cnt;
END;
$$ LANGUAGE plpgsql;

select not_parallel();
2017-03-21 15:28:56.282 IST [123086] LOG:  duration: 0.003 ms  plan:
Query Text: SELECT count(distinct i)  FROM t WHERE j = 12
Parallel Seq Scan on t  (cost=0.00..19.42 rows=1 width=4) (actual
time=0.001..0.001 rows=0 loops=1)
 Filter: (j = 12)
2017-03-21 15:28:56.282 IST [123087] LOG:  duration: 0.003 ms  plan:
Query Text: SELECT count(distinct i)  FROM t WHERE j = 12
Parallel Seq Scan on t  (cost=0.00..19.42 rows=1 width=4) (actual
time=0.001..0.001 rows=0 loops=1)
 Filter: (j = 12)
2017-03-21 15:28:57.530 IST [117823] LOG:  duration: 1745.372 ms  plan:
Query Text: SELECT count(distinct i)  FROM t WHERE j = 12
Aggregate  (cost=19.42..19.43 rows=1 width=8) (actual
time=1255.743..1255.743 rows=1 loops=1)
 ->  Gather  (cost=0.00..19.42 rows=1 width=4) (actual
time=1255.700..1255.700 rows=0 loops=1)
   Workers Planned: 2
   Workers Launched: 2
   ->  Parallel Seq Scan on t  (cost=0.00..19.42 rows=1 width=4)
(actual time=418.443..418.443 rows=0 loops=3)
 Filter: (j = 12)
 Rows Removed by Filter: 667
2017-03-21 15:28:57.530 IST [117823] CONTEXT:  SQL statement "SELECT
count(distinct i)  FROM t WHERE j = 12"
PL/pgSQL function not_parallel() line 4 at SQL statement
2017-03-21 15:28:57.531 IST [117823] LOG:  duration: 2584.282 ms  plan:
Query Text: select not_parallel();
Result  (cost=0.00..0.26 rows=1 width=8) (actual
time=2144.315..2144.316 rows=1 loops=1)
 not_parallel
--
0
(1 row)

Hence, it appears lazyEval is the main reason behind this, and in my
opinion it should definitely be fixed.
Please enlighten me with your comments/opinions.

Regards,
Rafia Sabih
EnterpriseDB: http://www.enterprisedb.com/


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Partition-wise join for join between (declaratively) partitioned tables

2017-03-20 Thread Rafia Sabih
On Mon, Mar 20, 2017 at 8:21 AM, Robert Haas <robertmh...@gmail.com> wrote:
> On Fri, Mar 17, 2017 at 8:10 PM, Robert Haas <robertmh...@gmail.com> wrote:
>> While I was studying what you did with reparameterize_path_by_child(),
>> I started to wonder whether reparameterize_path() doesn't need to
>> start handling join paths.  I think it only handles scan paths right
>> now because that's the only thing that can appear under an appendrel
>> created by inheritance expansion, but you're changing that.  Maybe
>> it's not critical -- I think the worst consequences of missing some
>> handling there is that we won't consider a parameterized path in some
>> case where it would be advantageous to do so.  Still, you might want
>> to investigate a bit.
>
> I spent a fair amount of time this weekend musing over
> reparameterize_path_by_child().  I think a key question for this patch
> - as you already pointed out - is whether we're happy with that
> approach.  When we discover that we want to perform a partitionwise
> parameterized nestloop, and therefore that we need the paths for each
> inner appendrel to get their input values from the corresponding outer
> appendrel members rather than from the outer parent, we've got two
> choices.  The first is to do what the patch actually does, which is to
> build a new path tree for the nestloop inner path parameterized by the
> appropriate childrel.  The second is to use the existing paths, which
> are parameterized by the parent rel, and then somehow allow make that
> work.  For example, you can imagine that create_plan_recurse() could
> pass down a list of parameterized nestloops above the current point in
> the path tree, and a parent-child mapping for each, and then we could
> try to substitute everything while actually generating the plan
> instead of creating paths sooner.  Which is better?
>
> It would be nice to hear opinions from anyone else who cares, but
> after some thought I think the approach you've picked is probably
> better, because it's more like what we do already.  We have existing
> precedent for reparameterizing a path, but none for allowing a Var for
> one relation (the parent) to in effect refer to another relation (the
> child).
>
> That having been said, having try_nestloop_path() perform the
> reparameterization at the very top of the function seems quite
> undesirable.  You're creating a new path there before you know whether
> it's going to be rejected by the invalid-parameterization test and
> also before you know whether initial_cost_nestloop is going to reject
> it.  It would be much better if you could find a way to postpone the
> reparameterization until after those steps, and only do it if you're
> going to try add_path().

On further testing of this patch I found another case where it shows
a regression: the time taken with the patch is around 160 secs, and
without it 125 secs.
Another minor thing to note is that planning time is almost twice as
much with this patch. I understand this targets scenarios with really
big 'big data', so it may not be a serious issue in such cases, but it
would be good to keep an eye on it so that it doesn't exceed the
computational bounds for a really large number of tables.
Please find the attached .out file to check the output I witnessed,
and let me know if any more information is required.
Schema and data were similar to the previously shared schema, with the
addition of more data for this case. Parameter settings used were:
work_mem = 1GB
random_page_cost = seq_page_cost = 0.1

-- 
Regards,
Rafia Sabih
EnterpriseDB: http://www.enterprisedb.com/


pwj_regress_2.out
Description: Binary data

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Partition-wise join for join between (declaratively) partitioned tables

2017-03-18 Thread Rafia Sabih
On Sat, Mar 18, 2017 at 5:40 AM, Robert Haas <robertmh...@gmail.com> wrote:
> On Fri, Mar 17, 2017 at 9:15 AM, Ashutosh Bapat
> <ashutosh.ba...@enterprisedb.com> wrote:
>> This set of patches fixes both of those things.
>
> 0001 changes the purpose of a function and then 0007 renames it.  It
> would be better to include the renaming in 0001 so that you're not
> taking multiple whacks at the same function in the same patch series.
> I believe it would also be best to include 0011's changes to
> adjust_appendrel_attrs_multilevel in 0001.
>
> 0002 should either add find_param_path_info() to the relevant header
> file as extern from the beginning, or it should declare and define it
> as static and then 0007 can remove those markings.  It makes no sense
> to declare it as extern but put the prototype in the .c file.
>
> 0004 still needs to be pared down.  If you want to get something
> committed this release cycle, you have to get these details taken care
> of, uh, more or less immediately.  Actually, preferably, several weeks
> ago.  You're welcome to maintain your own test suite locally but what
> you submit should be what you are proposing for commit -- or if not,
> then you should separate the part proposed for commit and the part
> included for dev testing into two different patches.
>
> In 0005's README, the part about planning partition-wise joins in two
> phases needs to be removed.  This patch also contains a small change
> to partition_join.sql that belongs in 0004.
>
> 0008 removes direct tests against RELOPT_JOINREL almost everywhere,
> but it overlooks the new ones added to postgres_fdw.c by
> b30fb56b07a885f3476fe05920249f4832ca8da5.  It should be updated to
> cover those as well, I suspect.  The commit message claims that it
> will "Similarly replace RELOPT_OTHER_MEMBER_REL test with
> IS_OTHER_REL() where we want to test for child relations of all kinds,
> but in fact it makes exactly zero such substitutions.
>
> While I was studying what you did with reparameterize_path_by_child(),
> I started to wonder whether reparameterize_path() doesn't need to
> start handling join paths.  I think it only handles scan paths right
> now because that's the only thing that can appear under an appendrel
> created by inheritance expansion, but you're changing that.  Maybe
> it's not critical -- I think the worst consequences of missing some
> handling there is that we won't consider a parameterized path in some
> case where it would be advantageous to do so.  Still, you might want
> to investigate a bit.
>
I was trying to play around with this patch and came across the
following case, where without the patch the query completes in 9 secs
and with it in 15 secs. Essentially, I tried to capture the case where
each partition produces a good number of rows in its output and each
has to build its own hash; here the cost of building so many hashes
turns out to be higher than doing an append and then the join. I
thought it might be helpful to consider this case in a better design
of the algorithm. Please feel free to point out if I missed something.

Test details:
commit: b4ff8609dbad541d287b332846442b076a25a6df
Please find the attached .sql file for the complete schema and data,
and the .out file for the explain analyse results with and without
the patch.

-- 
Regards,
Rafia Sabih
EnterpriseDB: http://www.enterprisedb.com/


pwj_regress_test.out
Description: Binary data


test_case_pwj.sql
Description: Binary data

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Enabling parallelism for queries coming from SQL or other PL functions

2017-03-15 Thread Rafia Sabih
On Wed, Mar 15, 2017 at 10:52 AM, Dilip Kumar <dilipbal...@gmail.com> wrote:
> I have reviewed the patch, I have some questions.
>
> @@ -3031,7 +3031,7 @@ exec_stmt_return_query(PLpgSQL_execstate *estate,
>   Assert(stmt->dynquery != NULL);
>   portal = exec_dynquery_with_params(estate, stmt->dynquery,
> stmt->params, NULL,
> -   CURSOR_OPT_PARALLEL_OK);
> +   0);
>
> After this patch, I have noticed that In exec_stmt_return_query, we
> allow parallel query if it's a static query
> but not for the dynamic query.  Any specific reason for different
> handling for these 2 cases?
>
The reason for such behaviour is given upthread: in
exec_stmt_return_query, the query may be executed either by
exec_run_select, which uses SPI_execute_plan_with_paramlist(), which
calls _SPI_execute_plan(), which calls _SPI_pquery(); hence, if
parallelOK is set, passing CURSOR_OPT_PARALLEL_OK is safe.
Otherwise, exec_stmt_return_query executes with a call to
SPI_cursor_fetch() with a fetch count of 50, and that calls
_SPI_cursor_operation(), which calls PortalRunFetch(); hence, passing
CURSOR_OPT_PARALLEL_OK would not be safe in that case, and it is not
passed.
>
> @ -314,7 +314,7 @@ SPI_execute(const char *src, bool read_only, long tcount)
>
>   memset(&plan, 0, sizeof(_SPI_plan));
>   plan.magic = _SPI_PLAN_MAGIC;
> - plan.cursor_options = 0;
> + plan.cursor_options = CURSOR_OPT_PARALLEL_OK;
>
> In SPI_Execute, we are setting cursor_options to
> CURSOR_OPT_PARALLEL_OK. I have analysed call to this function from PL
> and seems fine to me. But, I have a question have you analyzed all the
> calls to this functions?
> e.g. build_tuplestore_recursively, get_crosstab_tuplestore.
>
The thing with SPI_execute is that it calls pg_plan_query through
_SPI_execute_plan; there, if the query is parallel unsafe, parallelism
will be restricted by the planner itself, otherwise it is enabled when
CURSOR_OPT_PARALLEL_OK is set. So, yes, I evaluated all the calls to
this function, and here is the analysis of why passing
CURSOR_OPT_PARALLEL_OK should be safe in some of the
suspicious-looking calls (the other calls are directly related to SPI
functions):
In crosstab: ret = SPI_execute(sql, true, 0);
In load_categories_hash: ret = SPI_execute(cats_sql, true, 0);
In get_crosstab_tuplestore: ret = SPI_execute(sql, true, 0);
In build_tuplestore_recursively: ret = SPI_execute(sql.data, true, 0);
In query_to_oid_list: SPI_execute(query, true, 0);

In all of these calls, the read_only flag is passed as true, hence
enabling parallelism will not cause any anomalous behaviour.

In refresh_by_match_merge: if (SPI_execute(querybuf.data, false, 1) !=
SPI_OK_SELECT)
In this case, since read_only is set to false, the planner will
disable parallelism when it is called for such a query from
SPI_execute; hence, no issues.

For reference, this is the condition in standard_planner that gates
parallelism:

if ((cursorOptions & CURSOR_OPT_PARALLEL_OK) != 0 &&
IsUnderPostmaster &&
dynamic_shared_memory_type != DSM_IMPL_NONE &&
parse->commandType == CMD_SELECT &&
!parse->hasModifyingCTE &&
max_parallel_workers_per_gather > 0 &&
!IsParallelWorker() &&
!IsolationIsSerializable())

>
> On Fri, Mar 10, 2017 at 5:38 PM, Rafia Sabih
> <rafia.sa...@enterprisedb.com> wrote:
>> I wanted to clarify a few things here, I noticed that call of ExecutorRun in
>> postquel_getnext() uses !es->lazyEval as execute_once, this is confusing, as
>> it is true even in cases when a simple query like "select count(*) from t"
>> is used in a sql function. Hence, restricting parallelism for cases when it
>> shouldn't. It seems to me that es->lazyEval is not set properly or it should
>> not be true for simple select statements. I found that in the definition of
>> execution_state
>> bool lazyEval; /* true if should fetch one row at a time
>
> Hmm, It seems that es->lazyEval is not set properly, Ideally, it
> should be set true only if it's lazy evaluation but in this case, it's
> used for identifying the tupcount as well.  IMHO, it should be fixed.
>
> Any other opinion on this?
>
Hmmm... I tried investigating this, but it looks like there isn't
much scope here. lazyEvalOK is set for SELECT commands in
init_execution_state, as per:
/*
* Mark the last canSetTag query as delivering the function result; then,
* if it is a plain SELECT, mark it for lazy evaluation. If it's not a
* SELECT we must always run it to completion.
...
if (lasttages && fcache->junkFilter)
{
lasttages->setsResult = true;
if (lazyEvalOK &&
lasttages->stmt->commandType == CMD_SELECT &&
!lasttages->stmt->hasModifyingCTE)
fcache->lazyEval = lasttages->lazyEval = true;
}
and then in postquel_getnext we set execute_once = !es->lazyEval [1],
which also makes sense since,
/* Run regular commands to completion unless lazyEval */
uint64 count = (es->lazyEval) ? 1 : 0;

Re: [HACKERS] pgbench - allow to store select results into variables

2017-03-14 Thread Rafia Sabih
On Tue, Jan 31, 2017 at 11:54 AM, Fabien COELHO <coe...@cri.ensmp.fr> wrote:

>
> Bonjour Michaël,
>
> Attached are the patch, a test script for the feature, and various test
>>> scripts to trigger error cases.
>>>
>>
>> I have moved this patch to next CF
>>
>
> Ok.
>
> as the last status is a new patch set with no further reviews.
>>
>
> Indeed.
>
> I did not check if the comments have been applied though, this is a bit
>> too much for me now...
>>
>
> Sure.


I was reviewing v7 of this patch; to start with, I found the following
whitespace errors when applying it with git apply:
 /home/edb/Desktop/patches/others/pgbench-into-7.patch:66: trailing
whitespace.
char   *line; /* first line for short display */
/home/edb/Desktop/patches/others/pgbench-into-7.patch:67: trailing
whitespace.
char   *lines; /* full multi-line text of command */
/home/edb/Desktop/patches/others/pgbench-into-7.patch:72: trailing
whitespace.
int compound;   /* last compound command (number of \;) */
/home/edb/Desktop/patches/others/pgbench-into-7.patch:73: trailing
whitespace.
char  **gset;   /* per-compound command prefix */
/home/edb/Desktop/patches/others/pgbench-into-7.patch:81: trailing
whitespace.
/* read all responses from backend */
error: patch failed: doc/src/sgml/ref/pgbench.sgml:815
error: doc/src/sgml/ref/pgbench.sgml: patch does not apply
error: patch failed: src/bin/pgbench/pgbench.c:375
error: src/bin/pgbench/pgbench.c: patch does not apply
error: patch failed: src/bin/pgbench/pgbench.h:11
error: src/bin/pgbench/pgbench.h: patch does not apply
error: patch failed: src/fe_utils/psqlscan.l:678
error: src/fe_utils/psqlscan.l: patch does not apply
error: patch failed: src/include/fe_utils/psqlscan_int.h:112
error: src/include/fe_utils/psqlscan_int.h: patch does not apply

Apart from that, on executing SELECT 1 AS a \gset \set i debug(:a) SELECT 2
AS a \gcset SELECT 3; as given in your provided script gset-1.sql, it
gives the error: invalid command \gcset. What is the intention of this
script, anyway? Also, instead of so many different files for the error
cases, why don't you combine them into one?

-- 
Regards,
Rafia Sabih
EnterpriseDB: http://www.enterprisedb.com/


Re: [HACKERS] Enabling parallelism for queries coming from SQL or other PL functions

2017-03-13 Thread Rafia Sabih
On Wed, Mar 8, 2017 at 1:54 AM, Robert Haas <robertmh...@gmail.com> wrote:

> There are two places where we currently set CURSOR_OPT_PARALLEL_OK in
> PL/pgsql: exec_stmt_return_query() sets it when calling
> exec_dynquery_with_params(), and exec_run_select() calls it when
> calling exec_prepare_plan() if parallelOK is set.  The latter is OK,
> because exec_run_select() runs the plan via
> SPI_execute_plan_with_paramlist(), which calls _SPI_execute_plan(),
> which calls _SPI_pquery().  But the former is broken, because
> exec_stmt_return_query() executes the query by calling
> SPI_cursor_fetch() with a fetch count of 50, and that calls
> _SPI_cursor_operation() which calls PortalRunFetch() -- and of course
> each call to PortalRunFetch() is going to cause a separate call to
> PortalRunSelect(), resulting in a separate call to ExecutorRun().
> Oops.
>

Fixed. The attached patch applies on top of execute_once.patch [1].
I haven't addressed the issue regarding the confusion I raised upthread
about the incorrect value of !es->lazyEval that restricts parallelism
for simple queries coming from SQL functions, as I am not sure whether
that is within the scope of this patch.

[1]
https://www.postgresql.org/message-id/ca+tgmobxehvhbjtwdupzm9bvslitj-kshxqj2um5gpdze9f...@mail.gmail.com
-- 
Regards,
Rafia Sabih
EnterpriseDB: http://www.enterprisedb.com/


pl_parallel_opt_support_v2.patch
Description: Binary data

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] WIP: [[Parallel] Shared] Hash

2017-03-13 Thread Rafia Sabih
owing existing
> > usages of the "pattern", it's probably hard to grasp.
>
> Examples added.
>
> > + *-------------------------------------------------------------------------
> > + */
> > +
> > +#include "storage/barrier.h"
> >
> > Aren't you missing an include of postgres.h here?
>
> Fixed.
>
> > +bool
> > +BarrierWait(Barrier *barrier, uint32 wait_event_info)
> > +{
> > +   bool first;
> > +   bool last;
> > +   int start_phase;
> > +   int next_phase;
> > +
> > +   SpinLockAcquire(&barrier->mutex);
> > +   start_phase = barrier->phase;
> > +   next_phase = start_phase + 1;
> > +   ++barrier->arrived;
> > +   if (barrier->arrived == 1)
> > +   first = true;
> > +   else
> > +   first = false;
> > +   if (barrier->arrived == barrier->participants)
> > +   {
> > +   last = true;
> > +   barrier->arrived = 0;
> > +   barrier->phase = next_phase;
> > +   }
> > +   else
> > +   last = false;
> > +   SpinLockRelease(&barrier->mutex);
> >
> > Hm. So what's the defined concurrency protocol for non-static barriers,
> > when they attach after the spinlock here has been released?  I think the
> > concurrency aspects deserve some commentary.  Afaics it'll correctly
> > just count as the next phase - without any blocking - but that shouldn't
> > have to be inferred.
>
> It may join at start_phase or next_phase depending on what happened
> above.  If it we just advanced the phase (by being the last to arrive)
> then another backend that attaches will be joining at phase ==
> next_phase, and if that new backend calls BarrierWait it'll be waiting
> for the phase after that.
>
> > Things might get wonky if that new participant
> > then starts waiting for the new phase, violating the assert below...
>
> > +   Assert(barrier->phase == start_phase || barrier->phase == next_phase);
>
> I've added a comment near that assertion that explains the reason the
> assertion holds.
>
> Short version:  The caller is attached, so there is no way for the
> phase to advance beyond next_phase without the caller's participation;
> the only possibilities to consider in the wait loop are "we're still
> waiting" or "the final participant arrived or detached, advancing the
> phase and releasing me".
>
> Put another way, no waiting backend can ever see phase advance beyond
> next_phase, because in order to do so, the waiting backend would need
> to run BarrierWait again; barrier->arrived can never reach
> barrier->participants a second time while we're in that wait loop.
>
> > +/*
> > + * Detach from a barrier.  This may release other waiters from
> BarrierWait and
> > + * advance the phase, if they were only waiting for this backend.
> Return
> > + * true if this participant was the last to detach.
> > + */
> > +bool
> > +BarrierDetach(Barrier *barrier)
> > +{
> > +   bool release;
> > +   bool last;
> > +
> > +   SpinLockAcquire(&barrier->mutex);
> > +   Assert(barrier->participants > 0);
> > +   --barrier->participants;
> > +
> > +   /*
> > +* If any other participants are waiting and we were the last
> participant
> > +* waited for, release them.
> > +*/
> > +   if (barrier->participants > 0 &&
> > +   barrier->arrived == barrier->participants)
> > +   {
> > +   release = true;
> > +   barrier->arrived = 0;
> > +   barrier->phase++;
> > +   }
> > +   else
> > +   release = false;
> > +
> > +   last = barrier->participants == 0;
> > +   SpinLockRelease(&barrier->mutex);
> > +
> > +   if (release)
> > +   ConditionVariableBroadcast(&barrier->condition_variable);
> > +
> > +   return last;
> > +}
> >
> > Doesn't this, again, run into danger of leading to an assert failure in
> > the loop in BarrierWait?
>
> I believe this code is correct.  The assertion in BarrierWait can't
> fail, because waiters know that there is no way for the phase to get
> any further ahead without their help (because they are attached):
> again, the only possibilities are phase == start_phase (implying that
> they received a spurious condition variable signal) or phase ==
> next_phase (the last backend being waited on has finally arrived or
> detached, allowing other participants to proceed).
>
> I've attached a test module that starts N workers, and makes the
> workers attach, call BarrierWait a random number of times, then
> detach, and then rinse and repeat, until the phase reaches some large
> number and they all exit.  This exercises every interleaving of the
> attach, wait, detach.  CREATE EXTENSION test_barrier, then something
> like SELECT test_barrier_reattach_random(4, 100) to verify that no
> assertions are thrown and it always completes.
>
> > +#include "postgres.h"
> >
> > Huh, that normally shouldn't be in a header.  I see you introduced that
> > in a bunch of other places too - that really doesn't look right to me.
>
> Fixed.
>
In an attempt to test v7 of this patch on TPC-H 20 scale factor, I
found a few regressions:
Q21: 52 secs on HEAD and  400 secs with this patch
Q8: 8 secs on HEAD to 14 secs with patch

However, to avoid being framed as some sort of "jinx" [;)], I'd also
like to report a few cases of improvements:
Q3: improved to 44 secs from 58 secs on HEAD
Q9: 81 secs on HEAD to 48 secs with patch
Q10: improved to 47 secs from 57 secs on HEAD
Q14: 9 secs on HEAD to 5 secs with patch

The details of this experimental setup is as follows,
scale-factor: 20
work_mem = 1GB
shared_buffers = 10GB

For the output plans on head and with the patch, please find the attached
tar file. In case you require any more information, please let me know.
-- 
Regards,
Rafia Sabih
EnterpriseDB: http://www.enterprisedb.com/


ph.tar.gz
Description: GNU Zip compressed data

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Enabling parallelism for queries coming from SQL or other PL functions

2017-03-10 Thread Rafia Sabih
On Wed, Mar 8, 2017 at 1:54 AM, Robert Haas <robertmh...@gmail.com> wrote:

>
> Here's a draft patch showing the sort of thing I have in mind.  I
> think it needs more work, but it gives you the idea, I hope.  This is
> loosely based on your pl_parallel_exec_support_v1.patch, but what I've
> done here is added some flags that carry the information about whether
> there will be only one or maybe more than one call to ExecutorRun to a
> bunch of relevant places.
>
> I think this might have the effect of disabling parallel query in some
> cases where PL/pgsql currently allows it, and I think that may be
> necessary.  (We may need to back-patch a different fix into 9.6.)
>

I wanted to clarify a few things here. I noticed that the call of
ExecutorRun in postquel_getnext() uses !es->lazyEval as execute_once;
this is confusing, as it is true even when a simple query like "select
count(*) from t" is used in an SQL function, hence restricting
parallelism in cases where it shouldn't be. It seems to me that
es->lazyEval is not set properly, or it should not be true for simple
select statements. I found this in the definition of execution_state:
bool lazyEval; /* true if should fetch one row at a time */
and in init_execution_state, there is a comment saying,
* Mark the last canSetTag query as delivering the function result; then,
* if it is a plain SELECT, mark it for lazy evaluation. If it's not a
* SELECT we must always run it to completion.

I find these two things contradictory to each other. So, was this
point missed, or is there some deep reasoning behind it?


-- 
Regards,
Rafia Sabih
EnterpriseDB: http://www.enterprisedb.com/


Re: [HACKERS] Enabling parallelism for queries coming from SQL or other PL functions

2017-03-07 Thread Rafia Sabih
On Sun, Feb 26, 2017 at 7:09 PM, Robert Haas <robertmh...@gmail.com> wrote:
>
> I think I see the problem that you're trying to solve, but I agree
> that this doesn't seem all that elegant.  The reason why we have that
> numberTuples check is because we're afraid that we might be in a
> context like the extended-query protocol, where the caller can ask for
> 1 tuple, and then later ask for another tuple.  That won't work,
> because once we shut down the workers we can't reliably generate the
> rest of the query results.  However, I think it would probably work
> fine to let somebody ask for less than the full number of tuples if
> it's certain that they won't later ask for any more.
>
> So maybe what we ought to do is allow CURSOR_OPT_PARALLEL_OK to be set
> any time we know that ExecutorRun() will be called for the QueryDesc
> at most once rather than (as at present) only where we know it will be
> executed only once with a tuple-count of zero.  Then we could change
> things in ExecutePlan so that it doesn't disable parallel query when
> the tuple-count is non-zero, but does take an extra argument "bool
> execute_only_once", and it disables parallel execution if that is not
> true.  Also, if ExecutorRun() is called a second time for the same
> QueryDesc when execute_only_once is specified as true, it should
> elog(ERROR, ...).  Then exec_execute_message(), for example, can pass
> that argument as false when the tuple-count is non-zero, but other
> places that are going to fetch a limited number of rows could pass it
> as true even though they also pass a row-count.
>
> I'm not sure if that's exactly right, but something along those lines
> seems like it should work.
>

IIUC, this needs an additional bool execute_once in the QueryDesc,
which is set to true in standard_ExecutorRun when the query is detected
to be coming from a PL function or the supplied count is zero (i.e.
execute till the end); in case execute_once is already true, the error
is reported.

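A minimal sketch of what that could look like in standard_ExecutorRun
(the field name already_executed and the error text are my assumption
of the shape, not the final patch):

void
standard_ExecutorRun(QueryDesc *queryDesc, ScanDirection direction,
                     uint64 count, bool execute_once)
{
    ...
    if (!ScanDirectionIsNoMovement(direction))
    {
        /* refuse a second run of a query flagged for single execution */
        if (execute_once && queryDesc->already_executed)
            elog(ERROR, "can't re-execute query flagged for single execution");
        queryDesc->already_executed = true;

        ExecutePlan(..., count, direction, dest, execute_once);
    }
    ...
}
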
>
> I think that a final patch for this functionality should involve
> adding CURSOR_OPT_PARALLEL_OK to appropriate places in each PL, plus
> maybe some infrastructure changes like the ones mentioned above.
> Maybe it can be divided into two patches, one to make the
> infrastructure changes and a second to add CURSOR_OPT_PARALLEL_OK to
> more places.
>

I have split the patch into two. One allows the optimiser to select a
parallel plan for queries in PL functions
(pl_parallel_opt_support_v1.patch), wherein CURSOR_OPT_PARALLEL_OK is
passed at the required places.

Next is the patch allowing execution of such queries in parallel mode,
which involves infrastructural changes along the lines mentioned
upthread (pl_parallel_exec_support_v1.patch).

-- 
Regards,
Rafia Sabih
EnterpriseDB: http://www.enterprisedb.com/


pl_parallel_exec_support_v1.patch
Description: Binary data


pl_parallel_opt_support_v1.patch
Description: Binary data

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


[HACKERS] TPC-H Q20 from 1 hour to 19 hours!

2017-03-05 Thread Rafia Sabih
n idx_l_shipdate  (cost=0.00..2779996.70 rows=270580572
width=0) (actual time=146519.142..146519.142 rows=273058377 loops=1)
   Index
Cond: ((l_shipdate >= '1994-01-01'::date) AND (l_shipdate <
'1995-01-01 00:00:00'::timestamp without time zone))
   ->  Index Scan using
partsupp_pkey on partsupp  (cost=0.57..12722651.69 rows=24000
width=20) (actual time=0.160..197858.090 rows=24000 loops=1)
 ->  Index Scan using part_pkey on
part  (cost=0.56..0.75 rows=1 width=4) (actual time=0.012..0.012
rows=0 loops=156321526)
   Index Cond: (p_partkey =
lineitem.l_partkey)
   Filter: ((p_name)::text ~~
'beige%'::text)
   Rows Removed by Filter: 1
 ->  Index Scan using supplier_pkey on supplier
(cost=0.43..0.61 rows=1 width=64) (actual time=0.011..0.012 rows=1
loops=1300126)
   Index Cond: (s_suppkey = lineitem.l_suppkey)
 Planning time: 2.440 ms
 Execution time: 6057329.179 ms

I hope there might be some way out for such a case that keeps the
benefits of the commit without hurting other cases (like this one)
this badly.
Thoughts, comments...

-- 
Regards,
Rafia Sabih
EnterpriseDB: http://www.enterprisedb.com/


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Enabling parallelism for queries coming from SQL or other PL functions

2017-02-22 Thread Rafia Sabih
On Wed, Feb 22, 2017 at 10:22 PM, Dilip Kumar <dilipbal...@gmail.com> wrote:
>
> Some initial comments.
>
> --
> if (numberTuples || dest->mydest == DestIntoRel)
> use_parallel_mode = false;
>
> if (use_parallel_mode)
> EnterParallelMode();
> + else if (estate->es_plannedstmt->parallelModeNeeded &&
> + (dest->mydest == DestSPI || dest->mydest == DestSQLFunction))
> + {
> + use_parallel_mode = true;
> + EnterParallelMode();
> + }
>
> I think we can simplify this, can we replace above code with something
> like this?
>
> if (dest->mydest == DestIntoRel ||
> numberTuples && (dest->mydest != DestSPI || dest->mydest !=
> DestSQLFunction))
> use_parallel_mode = false;

Yes, it can be simplified to
if (dest->mydest == DestIntoRel || (numberTuples && (dest->mydest !=
DestSPI && dest->mydest != DestSQLFunction)))
Thanks.
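
In context, the gate in ExecutePlan would then read roughly like this
(a sketch of the simplification above, not the exact committed hunk):

if (dest->mydest == DestIntoRel ||
    (numberTuples &&
     dest->mydest != DestSPI &&
     dest->mydest != DestSQLFunction))
    use_parallel_mode = false;

if (use_parallel_mode)
    EnterParallelMode();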
>
> -
>
> + {
> + /* Allow parallelism if the function is not trigger type. */
> + if (estate->func->fn_is_trigger == PLPGSQL_NOT_TRIGGER)
> + exec_res = SPI_execute(querystr, estate->readonly_func,
> CURSOR_OPT_PARALLEL_OK);
> + else
> + exec_res = SPI_execute(querystr, estate->readonly_func, 0);
> + }
>
> The last parameter of SPI_execute is tuple count, not cursorOption,
> you need to fix this. Also, this is crossing the 80 line boundary.
>
Oops, corrected.

> ---
> Any specific reason for not changing SPI_execute_with_args, EXECUTE
> with USING will take this path.
>
Fixed.


-- 
Regards,
Rafia Sabih
EnterpriseDB: http://www.enterprisedb.com/


pl_parallel_v2.patch
Description: Binary data

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Parallel Index-only scan

2017-02-21 Thread Rafia Sabih
On Sun, Feb 19, 2017 at 4:02 PM, Robert Haas <robertmh...@gmail.com> wrote:

> Committed, although I neglected to incorporate this change.  Not sure
> if I should go back and do that; it doesn't read too badly as-is.
>
Thanks Robert for committing, and thanks Rahila, Amit, and Tushar for
reviewing and testing the patch.

-- 
Regards,
Rafia Sabih
EnterpriseDB: http://www.enterprisedb.com/


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Passing query string to workers

2017-02-21 Thread Rafia Sabih
On Wed, Feb 22, 2017 at 12:25 PM, Robert Haas <robertmh...@gmail.com> wrote:
> Looks fine to me.  Committed.  I did move es_queryText to what I think
> is a more appropriate location in the structure definition.
>
> Thanks.
>
Many thanks to Robert for committing and to Kuntal and Amit for reviewing.

-- 
Regards,
Rafia Sabih
EnterpriseDB: http://www.enterprisedb.com/


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


[HACKERS] Enabling parallelism for queries coming from SQL or other PL functions

2017-02-21 Thread Rafia Sabih
Hello everybody,

In the current version, queries in SQL or PL functions cannot
leverage parallelism. To relax this restriction I prepared a patch;
the approach used in the patch is explained next.

Approach:

1. Allow parallelism for queries in PL functions by passing
CURSOR_OPT_PARALLEL_OK instead of 0 to exec_prepare_plan called from
exec_stmt_execsql or exec_stmt_dynexecute. Similarly, pass
CURSOR_OPT_PARALLEL_OK instead of 0 to SPI_execute and exec_run_select
called from exec_stmt_dynexecute. CURSOR_OPT_PARALLEL_OK is passed to
these functions only after checking that the statement does not belong
to a trigger, since in that case using parallelism may not be
efficient (see the sketch after this list).

2. In ExecutePlan there is an additional check to see whether the
query is coming from an SQL or PL function and has a parallel plan. In
that case we ignore the check on numberTuples, since it is always one
for these functions and the existing checks would restrict parallelism
for these cases. Though, I understand this may not be the most elegant
way to do it, and I would be pleased to hear a better alternative.
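
As a minimal sketch of the point-1 change (hand-written to illustrate
the shape, not an actual hunk from the patch):

/* Allow a parallel plan unless we are running inside a trigger. */
if (estate->func->fn_is_trigger == PLPGSQL_NOT_TRIGGER)
    exec_prepare_plan(estate, expr, CURSOR_OPT_PARALLEL_OK);
else
    exec_prepare_plan(estate, expr, 0);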


I have attached a sql file containing cases for some plpgsql, perl,
and python functions, and an .out file which contains the parallel
plans for the queries in these functions after this patch. This might
be helpful in understanding the level of parallelism this patch
enables for PL functions.

Thanks to my colleagues Amit Kapila and Dilip Kumar for discussions in
this regard.

-- 
Regards,
Rafia Sabih
EnterpriseDB: http://www.enterprisedb.com/


pl_parallel_v1.patch
Description: Binary data


pl_parallel.out
Description: Binary data


pl_parallel_test.sql
Description: Binary data

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Passing query string to workers

2017-02-20 Thread Rafia Sabih
On Mon, Feb 20, 2017 at 8:35 PM, Kuntal Ghosh <kuntalghosh.2...@gmail.com>
wrote:
>
> +   char   *query_data;
> +   query_data = estate->es_sourceText;
> estate->es_sourceText is a const char* variable. Assigning this const
> pointer to a non-const pointer violates the rules
> constant-correctness. So, either you should change query_data as const
> char*, or as Robert suggested, you can directly use
> estate->es_sourceText.
>
Done.

-- 
Regards,
Rafia Sabih
EnterpriseDB: http://www.enterprisedb.com/


pass_queryText_to_workers_v7.patch
Description: Binary data

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Passing query string to workers

2017-02-19 Thread Rafia Sabih
On Sun, Feb 19, 2017 at 10:11 PM, Robert Haas <robertmh...@gmail.com> wrote:

> On Thu, Feb 16, 2017 at 6:41 PM, Kuntal Ghosh
> <kuntalghosh.2...@gmail.com> wrote:
> > On Thu, Feb 16, 2017 at 5:47 PM, Rafia Sabih
> > <rafia.sa...@enterprisedb.com> wrote:
> >> Other than that, I updated some comments and other cleanup things. Please
> >> find the attached patch for the revised version.
> > Looks good.
> >
> > It has passed the regression tests (with and without regress mode).
> > Query is getting displayed for parallel workers in pg_stat_activity,
> > log statements and failed-query statements. Moved status to Ready for
> > committer.
>
> The first hunk of this patch says:
>
> +estate->es_queryString = (char *)
> palloc0(strlen(queryDesc->sourceText) + 1);
> +estate->es_queryString = queryDesc->sourceText;
>
> This is all kinds of wrong.
>
> 1. If you assign a value to a variable, and then immediately assign
> another value to a variable, the first assignment might as well not
> have happened.  In other words, this is allocating a string and then
> immediately leaking the allocated memory.
>
> 2. If the intent was to copy the string in into
> estate->es_queryString, ignoring for the moment that you'd need to use
> strcpy() rather than an assignment to make that happen, the use of
> palloc0 would be unnecessary.  Regular old palloc would be fine,
> because you don't need to zero the memory if you're about to overwrite
> it with some new contents.  Zeroing isn't free, so it's best not to do
> it unnecessarily.
>
> 3. Instead of using palloc and then copying the string, you could just
> use pstrdup(), which does the whole thing for you.
>
> 4. I don't actually think you need to copy the query string at all.
> Tom's email upthread referred to copying the POINTER to the string,
> not the string itself, and src/backend/executor/README seems to
> suggest that FreeQueryDesc can't be called until after ExecutorEnd().
> So I think you could replace these two lines of code with
> estate->es_queryString = queryDesc->sourceText and call it good.
> That's essentially what this is doing anyway, as the code is written.
>
Fixed.

> +/* For passing query text to the workers */
>
> Unnecessary, because it's clear from the name PARALLEL_KEY_QUERY_TEXT.
>

True, done.

>
>  #define PARALLEL_TUPLE_QUEUE_SIZE  65536
> -
>  /*
>
> Don't remove this blank line.
>

Done.

>
> +   query_data = (char *) palloc0(strlen(estate->es_queryString) + 1);
> +   strcpy(query_data, estate->es_queryString);
>
> It's unnecessary to copy the query string here; you're going to use it
> later on in the very same function.  That code can just refer to
> estate->es_queryString directly.
>

Done.

>
> +   const char *es_queryString; /* Query string for passing to workers */
>
> Maybe this should be called es_sourceText, since it's being copied
> from querydesc->sourceText?
>

Done.

Please find the attached patch for the revised version.

-- 
Regards,
Rafia Sabih
EnterpriseDB: http://www.enterprisedb.com/


pass_queryText_to_workers_v6.patch
Description: Binary data

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Parallel Index-only scan

2017-02-16 Thread Rafia Sabih
On Thu, Feb 16, 2017 at 9:25 PM, Amit Kapila <amit.kapil...@gmail.com>
wrote:
>
> Few comments:
>
> 1.
> + * ioss_PscanLen   This is needed for parallel index scan
>   * 
>   */
>  typedef struct IndexOnlyScanState
> @@ -1427,6 +1428,7 @@ typedef struct IndexOnlyScanState
>   IndexScanDesc ioss_ScanDesc;
>   Buffer ioss_VMBuffer;
>   long ioss_HeapFetches;
> + Size ioss_PscanLen; /* This is needed for parallel index scan */
>
> No need to mention same comment at multiple places.  I think you keep
> it on top of structure.  The explanation could be some thing like
> "size of parallel index only scan descriptor"
>

Fixed.

>
> 2.
> + node->ioss_ScanDesc->xs_want_itup = true;
>
> I think wherever you are initializing xs_want_itup, you should
> initialize ioss_VMBuffer as well.  Is there a reason for not doing so?
>

Done.
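
For the record, the change keeps the two initializations together at
every place where the parallel scan descriptor is set up, roughly:

node->ioss_ScanDesc->xs_want_itup = true;
node->ioss_VMBuffer = InvalidBuffer;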

>
>
> 3.
> explain (costs off)
>   select  sum(parallel_restricted(unique1)) from tenk1
>   group by(parallel_restricted(unique1));
> - QUERY PLAN
> -
> +QUERY PLAN
> +---
>   HashAggregate
> Group Key: parallel_restricted(unique1)
> -   ->  Index Only Scan using tenk1_unique1 on tenk1
> -(3 rows)
> +   ->  Gather
> + Workers Planned: 4
> + ->  Parallel Index Only Scan using tenk1_unique1 on tenk1
> +(5 rows)
>
> It doesn't look good that you want to test parallel index only scan
> for a test case that wants to test restricted function.  The comments
> atop test looks odd. I suggest add a separate test (both explain and
> actual execution of query) for parallel index only scan as we have for
> parallel plans for other queries and see if we can keep the output of
> existing test same.
>

Agreed, but actually the objective of this test case is met even with
this plan. To restrict parallel index-only scan here, modifications to
the query or other parameters would be required. However, for proper
code coverage and otherwise, I have added a test case for parallel
index-only scan.

>
> 4.
> ExecReScanIndexOnlyScan(IndexOnlyScanState *node)
> {
> ..
> + /*
> + * if we are here to just update the scan keys, then don't reset parallel
> + * scan
> + */
> + if (node->ioss_NumRuntimeKeys != 0 && !node->ioss_RuntimeKeysReady)
> + reset_parallel_scan = false;
> ..
> }
>
> I think here you can update the comment to indicate that for detailed
> reason refer ExecReScanIndexScan.
>

Done.
Please find the attached patch for the revised version.
Just an FYI, in my recent tests on TPC-H 300 scale factor, Q16 showed
improved execution time from 830 seconds to 730 seconds with this patch
when used with parallel merge-join patch [1].

[1]
https://www.postgresql.org/message-id/CAFiTN-tX3EzDw7zYvi97eNADG9PH-nmhLa24Y3uWdzy_Y4SkfQ%40mail.gmail.com
Regards,
Rafia Sabih
EnterpriseDB: http://www.enterprisedb.com/


parallel_index_only_v8.patch
Description: Binary data

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Passing query string to workers

2017-02-16 Thread Rafia Sabih
On Thu, Feb 16, 2017 at 5:06 PM, Kuntal Ghosh <kuntalghosh.2...@gmail.com>
wrote:
>
> >>
> >> Another question is don't we need to set debug_query_string in worker?
> >
> > In the updated version I am setting it in ParallelQueryMain.
> >
> Ahh, I missed that. debug_query_string is used to show the log
> statements. Hence, it should be set.
>
> +   queryString = shm_toc_lookup(toc, PARALLEL_KEY_QUERY_TEXT);
> +   debug_query_string = shm_toc_lookup(toc, PARALLEL_KEY_QUERY_TEXT);
> +   pgstat_report_activity(STATE_RUNNING, shm_toc_lookup(toc,
> PARALLEL_KEY_QUERY_TEXT));
> Just one lookup is sufficient.
>
Fixed.

Other than that, I updated some comments and other cleanup things. Please
find the attached patch for the revised version.

-- 
Regards,
Rafia Sabih
EnterpriseDB: http://www.enterprisedb.com/


pass_queryText_to_workers_v5.patch
Description: Binary data

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Parallel Index-only scan

2017-02-16 Thread Rafia Sabih
On Thu, Feb 16, 2017 at 3:40 PM, Rafia Sabih <rafia.sa...@enterprisedb.com>
wrote:

>
> On Thu, Feb 16, 2017 at 1:26 PM, Rahila Syed <rahilasye...@gmail.com>
> wrote:
>
>> I  reviewed the patch. Overall it looks fine to me.
>>
>> One comment,
>>
>> >-   if (index->amcanparallel &&
>> >-   !index_only_scan &&
>> >+   if ((index->amcanparallel ||
>> >+   index_only_scan) &&
>>
>> Why do we need to check for index_only_scan in the above condition. IIUC,
>> index->amcanparallel is necessary for
>> parallel index scan to be chosen. We cannot chose parallel index only
>> scan if index_only_scan is happening
>> without worrying about index->amcanparallel value. So OR-ing
>> index->amcanparallel with index_only_scan is probably not
>> correct.
>>
>> True, we do not need this, only removing !index_only_scan should work.
> Fixed
>
>>
>>
>> On Thu, Feb 16, 2017 at 1:06 AM, Robert Haas <robertmh...@gmail.com>
>> wrote:
>>>
>>>
>>> This again needs minor rebasing but basically looks fine.  It's a
>>> pretty straightforward extension of the parallel index scan work.
>>>
>>> Please make sure that this is pgindent-clean - i.e. that when you
>>> pgindent the files that it touches, pgindent doesn't change anything
>>> of the same parts of the file that you've changed in the patch.  Also,
>>> I believe Amit may have made some adjustments to the logic in
>>> nodeIndexScan.c; if so, it would be good to make sure that the
>>> nodeIndexOnlyScan.c changes match what was done there.  In particular,
>>> he's got this:
>>>
>>> if (reset_parallel_scan && node->iss_ScanDesc->parallel_scan)
>>>     index_parallelrescan(node->iss_ScanDesc);
>>>
>>> And you've got this:
>>>
>>> +   if (reset_parallel_scan)
>>> +   index_parallelrescan(node->ioss_ScanDesc);
>>>
>>
> Fixed.
> Please find the attached patch for rebased and cleaner version.
>
Please find the attached patch with a minor comment update.

-- 
Regards,
Rafia Sabih
EnterpriseDB: http://www.enterprisedb.com/


parallel_index_only_v7.patch
Description: Binary data

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Parallel Index-only scan

2017-02-16 Thread Rafia Sabih
On Thu, Feb 16, 2017 at 1:26 PM, Rahila Syed <rahilasye...@gmail.com> wrote:

> I  reviewed the patch. Overall it looks fine to me.
>
> One comment,
>
> >-   if (index->amcanparallel &&
> >-   !index_only_scan &&
> >+   if ((index->amcanparallel ||
> >+   index_only_scan) &&
>
> Why do we need to check for index_only_scan in the above condition. IIUC,
> index->amcanparallel is necessary for
> parallel index scan to be chosen. We cannot chose parallel index only scan
> if index_only_scan is happening
> without worrying about index->amcanparallel value. So OR-ing
> index->amcanparallel with index_only_scan is probably not
> correct.
>
True, we do not need this; only removing !index_only_scan should work.
Fixed

>
>
> On Thu, Feb 16, 2017 at 1:06 AM, Robert Haas <robertmh...@gmail.com>
> wrote:
>>
>>
>> This again needs minor rebasing but basically looks fine.  It's a
>> pretty straightforward extension of the parallel index scan work.
>>
>> Please make sure that this is pgindent-clean - i.e. that when you
>> pgindent the files that it touches, pgindent doesn't change anything
>> of the same parts of the file that you've changed in the patch.  Also,
>> I believe Amit may have made some adjustments to the logic in
>> nodeIndexScan.c; if so, it would be good to make sure that the
>> nodeIndexOnlyScan.c changes match what was done there.  In particular,
>> he's got this:
>>
>> if (reset_parallel_scan && node->iss_ScanDesc->parallel_scan)
>> index_parallelrescan(node->iss_ScanDesc);
>>
>> And you've got this:
>>
>> +   if (reset_parallel_scan)
>> +   index_parallelrescan(node->ioss_ScanDesc);
>>
>
Fixed.
Please find attached the rebased and cleaned-up version of the patch.
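
For the last point, the adjusted guard now mirrors what Amit did in
nodeIndexScan.c, i.e. roughly:

if (reset_parallel_scan && node->ioss_ScanDesc->parallel_scan)
    index_parallelrescan(node->ioss_ScanDesc);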

-- 
Regards,
Rafia Sabih
EnterpriseDB: http://www.enterprisedb.com/


parallel_index_only_v6.patch
Description: Binary data

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Passing query string to workers

2017-02-10 Thread Rafia Sabih
> >
> > There are some spacing issues in the code. For example,
> > +   estate->es_queryString = (char
> > *)palloc0(strlen(queryDesc->sourceText) + 1);
> > +   /*Estimate space for query text. */
> > pgindent might be helpful to track all such changes.
> >
>
Fixed.


> > +#define PARALLEL_KEY_QUERY_TEXT UINT64CONST(0xE010)
> > I'm uncomfortable with declaring the same macro in two
> > files(parallel.c, execParallel.c). My suggestion would be to move
> > pgstat_report_activity in ParallelQueryMain instead of
> > ParallelWorkerMain. Then, you can remove the macro definition from
> > parallel.c. Thoughts?
> >

> Yes, I also don't see any need of defining it in parallel.c.  I think
> she has kept to report it in pg_stat_activity, but I feel that code
> can also be moved to execParallel.c.
>
Agree and fixed.


> Another question is don't we need to set debug_query_string in worker?

In the updated version I am setting it in ParallelQueryMain.
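
A sketch of the worker-side setup there (the exact placement inside
ParallelQueryMain is my paraphrase; the key name is as in the patch):

char       *queryString;

/* Look the query text up once and reuse the pointer. */
queryString = shm_toc_lookup(toc, PARALLEL_KEY_QUERY_TEXT);
debug_query_string = queryString;
pgstat_report_activity(STATE_RUNNING, queryString);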

Please find the attached file for the revised version.

-- 
Regards,
Rafia Sabih
EnterpriseDB: http://www.enterprisedb.com/


pass_queryText_to_workers_v4.patch
Description: Binary data

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Passing query string to workers

2017-02-06 Thread Rafia Sabih
On Mon, Jan 23, 2017 at 2:46 PM, Kuntal Ghosh
<kuntalghosh.2...@gmail.com> wrote:
> I've looked into the patch. I've some comments regarding that.
>
> +#define PARALLEL_KEY_QUERY_TEXTUINT64CONST(0xE010)
> It should be UINT64CONST(0xE00A)
>
> +   query_len = strlen(query_data) + 1;
> +   shm_toc_estimate_chunk(>estimator, query_len);
> +   shm_toc_estimate_keys(>estimator, 1);
> +
> Adding a comment before this will be helpful. You should maintain the
> same order for estimating and storing the query string. For example,
> as you've estimated the space for query string after estimation of dsa
> space, you should store the same after creating dsa. Besides, I think
> the appropriate place for this would be before planned statement.
Fixed.
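
A sketch of the paired estimate/store steps after the fix (field and
key names as in the v3 patch, not the exact hunks):

/* Estimate space for the query text. */
query_len = strlen(estate->es_queryString) + 1;
shm_toc_estimate_chunk(&pcxt->estimator, query_len);
shm_toc_estimate_keys(&pcxt->estimator, 1);

/* ... and, once the toc is created, store it at the matching point. */
query_string = shm_toc_allocate(pcxt->toc, query_len);
memcpy(query_string, estate->es_queryString, query_len);
shm_toc_insert(pcxt->toc, PARALLEL_KEY_QUERY_TEXT, query_string);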

> The regression test for select_parallel is failing after applying the
> patch. You can reproduce the scenario by the following sql statements:
>
> CREATE TABLE t1(a int);
> INSERT INTO t1 VALUES (generate_series(1,10));
> SET parallel_setup_cost=0;
> SET parallel_tuple_cost=0;
> SET min_parallel_relation_size=0;
> SET max_parallel_workers_per_gather=4;
> SELECT count(*) FROM t1;
>
> It is showing the following warning.
> WARNING:  problem in alloc set ExecutorState: detected write past
> chunk end in block 0x14f5310, chunk 0x14f6c50
Fixed.

Thanks a lot, Kuntal, for the review. Please find attached the patch
for the revised version.
-- 
Regards,
Rafia Sabih
EnterpriseDB: http://www.enterprisedb.com/


pass_queryText_to_workers_v3.patch
Description: Binary data

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


[HACKERS] Redundant comment in ExecutePlan

2017-02-03 Thread Rafia Sabih
Hello all,
I noticed there is a redundant comment in ExecutePlan,

/*
* If a tuple count was supplied, we must force the plan to run without
* parallelism, because we might exit early.  Also disable parallelism
* when writing into a relation, because no database changes are allowed
* in parallel mode.
*/
if (numberTuples || dest->mydest == DestIntoRel)
use_parallel_mode = false;

/*
* If a tuple count was supplied, we must force the plan to run without
* parallelism, because we might exit early.
*/
if (use_parallel_mode)
EnterParallelMode();

I felt it is not required the second time; hence, attached is a patch
to remove the second comment.
-- 
Regards,
Rafia Sabih
EnterpriseDB: http://www.enterprisedb.com/


redundant_comment.patch
Description: Binary data

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] WIP: [[Parallel] Shared] Hash

2017-02-01 Thread Rafia Sabih
On Thu, Feb 2, 2017 at 1:19 AM, Thomas Munro
<thomas.mu...@enterprisedb.com> wrote:
> On Thu, Feb 2, 2017 at 3:34 AM, Rafia Sabih
> <rafia.sa...@enterprisedb.com> wrote:
>> 9 | 62928.88   | 59077.909
>
> Thanks Rafia.  At first glance this plan is using the Parallel Shared
> Hash in one place where it should pay off, that is loading the orders
> table, but the numbers are terrible.  I noticed that it uses batch
> files and then has to increase the number of batch files, generating a
> bunch of extra work, even though it apparently overestimated the
> number of rows, though that's only ~9 seconds of ~60.  I am
> investigating.

Hi Thomas,
Apart from the previously reported regression, there appears to be one
more issue in this set of patches. At times, a query using parallel
hash hangs, and all the workers, including the master, show the
following backtrace:

#0  0x3fff880c7de8 in __epoll_wait_nocancel () from /lib64/power8/libc.so.6
#1  0x104e2718 in WaitEventSetWaitBlock (set=0x100157bde90,
cur_timeout=-1, occurred_events=0x3fffdbe69698, nevents=1) at
latch.c:998
#2  0x104e255c in WaitEventSetWait (set=0x100157bde90,
timeout=-1, occurred_events=0x3fffdbe69698, nevents=1,
wait_event_info=134217745) at latch.c:950
#3  0x10512970 in ConditionVariableSleep (cv=0x3ffd736e05a4,
wait_event_info=134217745) at condition_variable.c:132
#4  0x104dbb1c in BarrierWaitSet (barrier=0x3ffd736e0594,
new_phase=1, wait_event_info=134217745) at barrier.c:97
#5  0x104dbb9c in BarrierWait (barrier=0x3ffd736e0594,
wait_event_info=134217745) at barrier.c:127
#6  0x103296a8 in ExecHashShrink (hashtable=0x3ffd73747dc0) at
nodeHash.c:1075
#7  0x1032c46c in dense_alloc_shared
(hashtable=0x3ffd73747dc0, size=40, shared=0x3fffdbe69eb8,
respect_work_mem=1 '\001') at nodeHash.c:2618
#8  0x1032a2f0 in ExecHashTableInsert
(hashtable=0x3ffd73747dc0, slot=0x100158f9e90, hashvalue=2389907270)
at nodeHash.c:1476
#9  0x10327fd0 in MultiExecHash (node=0x100158f9800) at nodeHash.c:296
#10 0x10306730 in MultiExecProcNode (node=0x100158f9800) at
execProcnode.c:577

The issue is not deterministic or straightforwardly reproducible;
sometimes, after make clean, etc., queries run fine, and sometimes
they hang again. I wanted to bring this to your notice hoping you
might be faster than me in pinning down the exact reason behind this
anomaly.

-- 
Regards,
Rafia Sabih
EnterpriseDB: http://www.enterprisedb.com/


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Parallel Index-only scan

2017-01-19 Thread Rafia Sabih
On Fri, Jan 13, 2017 at 2:19 PM, Rafia Sabih
<rafia.sa...@enterprisedb.com> wrote:
> On Thu, Jan 12, 2017 at 5:39 PM, Rahila Syed <rahilasye...@gmail.com> wrote:
>> Hello,
>>
>> On applying the patch on latest master branch and running regression tests
>> following failure occurs.
>> I applied it on latest parallel index scan patches as given in the link
>> above.
>>
>> ***
>> /home/rahila/postgres/postgres/src/test/regress/expected/select_parallel.out
>> 2017-01-03 14:06:29.122022780 +0530
>> ---
>> /home/rahila/postgres/postgres/src/test/regress/results/select_parallel.out
>> 2017-01-12 14:35:56.652712622 +0530
>> ***
>> *** 92,103 
>>   explain (costs off)
>> select  sum(parallel_restricted(unique1)) from tenk1
>> group by(parallel_restricted(unique1));
>> !  QUERY PLAN
>> ! 
>>HashAggregate
>>  Group Key: parallel_restricted(unique1)
>> !->  Index Only Scan using tenk1_unique1 on tenk1
>> ! (3 rows)
>>
>>   set force_parallel_mode=1;
>>   explain (costs off)
>> --- 92,105 
>>   explain (costs off)
>> select  sum(parallel_restricted(unique1)) from tenk1
>> group by(parallel_restricted(unique1));
>> ! QUERY PLAN
>> ! ---
>>HashAggregate
>>  Group Key: parallel_restricted(unique1)
>> !->  Gather
>> !  Workers Planned: 4
>> !  ->  Parallel Index Only Scan using tenk1_unique1 on tenk1
>> ! (5 rows)
>>
>> IIUC, parallel operation being performed here is fine as parallel restricted
>> function occurs above Gather node
>> and just the expected output needs to be changed.
>>
>
> True, fixed it, please find the attached file for the latest version.
>

Please find attached the rebased patch of parallel index-only scan
over the latest parallel index-scan patch [1].

[1] 
https://www.postgresql.org/message-id/CAA4eK1L-gb0Fum3mQN4c5PWJXNE7xs7pzwMDWsrDYLucKqvJ2A%40mail.gmail.com
-- 
Regards,
Rafia Sabih
EnterpriseDB: http://www.enterprisedb.com/


parallel_index_only_v5.patch
Description: Binary data

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] pgbench - allow backslash continuations in \set expressions

2017-01-16 Thread Rafia Sabih
On Fri, Jan 13, 2017 at 9:15 PM, Fabien COELHO <coe...@cri.ensmp.fr> wrote:
>
>>> Could it be related to transformations operated when the file is
>>> downloaded
>>> and saved, because it is a text file?
>>
>>
>> I think this is delaying the patch unnecessarily, I have attached a
>> version, please see if you can apply it successfully, we can proceed
>> with that safely then...
>
>
> This is the same file I sent:
>
>  sh> sha1sum pgbench-continuation-4.patch pgbench-continuation-3.patch
>  97fe805a89707565210699694467f9256ca02dab  pgbench-continuation-4.patch
>  97fe805a89707565210699694467f9256ca02dab  pgbench-continuation-3.patch
>
> The difference is that your version is mime-typed with a generic
>
> Application/OCTET-STREAM
>
> While the one I sent was mime-typed as
>
> text/x-diff
>
> as defined by the system in /etc/mime.types on my xenial laptop.
>
> My guess is that with the later your mail client changes the file when
> saving it, because it sees "text".
>

Okay, I am marking it as 'Ready for committer' with the following note
to the committer: I am getting whitespace errors in v3 of the patch,
which I corrected in v4; however, Fabien is of the opinion that v3 is
clean and that the whitespace errors come from the downloader, etc., in
my setup.

-- 
Regards,
Rafia Sabih
EnterpriseDB: http://www.enterprisedb.com/




Re: [HACKERS] pgbench - allow backslash continuations in \set expressions

2017-01-13 Thread Rafia Sabih
On Fri, Jan 13, 2017 at 4:57 PM, Fabien COELHO <coe...@cri.ensmp.fr> wrote:
>
> Hello Rafia,
>
>>>   sh> git apply ~/pgbench-continuation-3.patch
>>>   # ok
>>
>>
>> It still gives me whitespace errors with git apply,
>
>
> Strange.
>
Yes, quite strange.

>> /Users/edb/Downloads/pgbench-continuation-3.patch:31: trailing whitespace.
>> continuation \\{newline}
>
>
>> Looks like an editor issue, I used awk '{ sub("\r$", ""); print }'
>> patch1 > patch2 to clean it and it was applying then.
>
>
> Doing that does not change the file for me.
>
> I see no \r in the patch file according to "od", I just see "nl" (0x0a).
>
> sha1sum: 97fe805a89707565210699694467f9256ca02dab
> pgbench-continuation-3.patch
>
> Could it be related to transformations operated when the file is downloaded
> and saved, because it is a text file?
>
I think this is delaying the patch unnecessarily. I have attached a
version; please see if you can apply it successfully, and then we can
proceed safely...

-- 
Regards,
Rafia Sabih
EnterpriseDB: http://www.enterprisedb.com/


pgbench-continuation-4.patch
Description: Binary data



Re: [HACKERS] Passing query string to workers

2017-01-13 Thread Rafia Sabih
On Thu, Jan 12, 2017 at 6:24 PM, Robert Haas <robertmh...@gmail.com> wrote:
> On Wed, Jan 11, 2017 at 11:12 PM, Tom Lane <t...@sss.pgh.pa.us> wrote:
>> Robert Haas <robertmh...@gmail.com> writes:
>>> On Wed, Jan 11, 2017 at 6:36 PM, Tom Lane <t...@sss.pgh.pa.us> wrote:
>>>> That would work, if you had a way to get at the active QueryDesc ...
>>>> but we don't pass that down to executor nodes.
>>
>>> Hmm, that is a bit of a problem.  Do you have a suggestion?
>>
>> Copy that string pointer into the EState, perhaps?
>
> OK, that sounds good.

Thanks a lot, Tom and Robert, for the suggestions. Please find attached
the revised version of the patch, wherein I added an extra pointer for
the query string in EState and populated it with queryDesc->sourceText
in standard_ExecutorStart. Later, in ExecInitParallelPlan, a token for
the query string is created in shared memory, which is used in
ExecParallelGetQueryDesc and ParallelWorkerMain as before (as in
version 1 of the patch).
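
To make the shape concrete, here is a rough sketch of that flow (the
field and key names are illustrative, not necessarily the ones used in
the patch; estimator bookkeeping is omitted):

    /* In EState: carry the source text of the current query along. */
    const char *es_queryString;   /* query text, or NULL if unknown */

    /* In standard_ExecutorStart(): remember the text. */
    estate->es_queryString = queryDesc->sourceText;

    /* In ExecInitParallelPlan(): copy the text into the parallel DSM
     * under a dedicated toc key, so workers can look it up later. */
    query_len = strlen(estate->es_queryString) + 1;
    query_data = shm_toc_allocate(pcxt->toc, query_len);
    memcpy(query_data, estate->es_queryString, query_len);
    shm_toc_insert(pcxt->toc, PARALLEL_KEY_QUERY_TEXT, query_data);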

Please let me know your feedback over the same.

-- 
Regards,
Rafia Sabih
EnterpriseDB: http://www.enterprisedb.com/


pass_queryText_to_workers_v2.patch
Description: Binary data



Re: [HACKERS] pgbench - allow backslash continuations in \set expressions

2017-01-13 Thread Rafia Sabih
On Wed, Jan 11, 2017 at 5:39 PM, Fabien COELHO <coe...@cri.ensmp.fr> wrote:
>
> Hello,
>
>>> The attached patch adds backslash-return (well newline really)
>>> continuations
>>> to all pgbench backslash-commands.
>>>
>>> The attached test uses continuations on all such commands (sleep set
>>> setshell and shell).
>>>
>>> I think that adding continuations to psql should be a distinct patch.
>
>
>> The patch does not apply on the latest head, I guess this requires
>> rebasing since yours is posted in December.
>
>
> Strange. Here is a new version and a test for all known backslash-commands
> in pgbench.
>
>   sh> git br test master
>   sh> git apply ~/pgbench-continuation-3.patch
>   # ok
>   sh> git diff
>   # ok
>   sh> cd src/bin/pgbench
>   sh> make
>   sh> ./pgbench -t 1 -f SQL/cont.sql
>   starting vacuum...end.
>   debug(script=0,command=1): int 0
>   debug(script=0,command=2): int 1
>   debug(script=0,command=4): int 2
>   3
>   debug(script=0,command=8): int 4
>   transaction type: SQL/cont.sql
>
>> Again, it is giving trailing whitespace errors (as I reported for the
>> earlier version), plus it does not apply with git apply,
>
>
> It does above with the attached version.

It still gives me whitespace errors with git apply,

/Users/edb/Downloads/pgbench-continuation-3.patch:31: trailing whitespace.
continuation \\{newline}
/Users/edb/Downloads/pgbench-continuation-3.patch:39: trailing whitespace.
{continuation} { /* ignore */ }
/Users/edb/Downloads/pgbench-continuation-3.patch:40: trailing whitespace.
/Users/edb/Downloads/pgbench-continuation-3.patch:48: trailing whitespace.
{continuation} { /* ignore */ }
/Users/edb/Downloads/pgbench-continuation-3.patch:49: trailing whitespace.

error: patch failed: doc/src/sgml/ref/pgbench.sgml:810
error: doc/src/sgml/ref/pgbench.sgml: patch does not apply
error: patch failed: src/bin/pgbench/exprscan.l:65
error: src/bin/pgbench/exprscan.l: patch does not apply

Looks like an editor issue, I used awk '{ sub("\r$", ""); print }'
patch1 > patch2 to clean it and it was applying then.

>> hopefully that would be fixed once rebased. Other than that, I observed
>> that if after backslash space is there, then the command fails.
>
>
> Yes, this is expected.
>
>> I think it should be something like if after backslash some spaces are
>> there, followed by end-of-line then it should ignore these spaces and read
>> next line, atleast with this new meaning of backslash.
>
>
> Hmmm. This is not the behavior of backslash continuation in bash or python,
> I do not think that this is desirable to have a different behavior.
>
Okay, seems sensible.
>> Otherwise, it should be mentioned in the docs that backslash should not be
>> followed by space.
>
>
> I'm not sure. Doc says that continuation is "backslash-return", it cannot be
> more explicit. If it must say what it is not, where should it stop?
>
> --
> Fabien.

Please post the clean version and I'll mark it as ready for committer then.

-- 
Regards,
Rafia Sabih
EnterpriseDB: http://www.enterprisedb.com/




Re: [HACKERS] Parallel Index-only scan

2017-01-13 Thread Rafia Sabih
On Thu, Jan 12, 2017 at 5:39 PM, Rahila Syed <rahilasye...@gmail.com> wrote:
> Hello,
>
> On applying the patch on latest master branch and running regression tests
> following failure occurs.
> I applied it on latest parallel index scan patches as given in the link
> above.
>
> ***
> /home/rahila/postgres/postgres/src/test/regress/expected/select_parallel.out
> 2017-01-03 14:06:29.122022780 +0530
> ---
> /home/rahila/postgres/postgres/src/test/regress/results/select_parallel.out
> 2017-01-12 14:35:56.652712622 +0530
> ***
> *** 92,103 
>   explain (costs off)
> select  sum(parallel_restricted(unique1)) from tenk1
> group by(parallel_restricted(unique1));
> !  QUERY PLAN
> ! --------------------------------------------------
>HashAggregate
>  Group Key: parallel_restricted(unique1)
> !->  Index Only Scan using tenk1_unique1 on tenk1
> ! (3 rows)
>
>   set force_parallel_mode=1;
>   explain (costs off)
> --- 92,105 
>   explain (costs off)
> select  sum(parallel_restricted(unique1)) from tenk1
> group by(parallel_restricted(unique1));
> ! QUERY PLAN
> ! ---------------------------------------------------------------
>HashAggregate
>  Group Key: parallel_restricted(unique1)
> !->  Gather
> !  Workers Planned: 4
> !  ->  Parallel Index Only Scan using tenk1_unique1 on tenk1
> ! (5 rows)
>
> IIUC, parallel operation being performed here is fine as parallel restricted
> function occurs above Gather node
> and just the expected output needs to be changed.
>

True, fixed it; please find attached the latest version.

> Thank you,
> Rahila Syed
>
Thanks a lot for the review, Rahila.
-- 
Regards,
Rafia Sabih
EnterpriseDB: http://www.enterprisedb.com/


parallel_index_only_v4.patch
Description: Binary data



Re: [HACKERS] Parallel bitmap heap scan

2017-01-12 Thread Rafia Sabih
On Wed, Jan 11, 2017 at 5:33 PM, tushar <tushar.ah...@enterprisedb.com> wrote:
>
> On 01/10/2017 05:16 PM, Dilip Kumar wrote:
>>
>>   Please try attached patch and confirm from your
>> side.
>
> Thanks,issue seems to be fixed now.
>
>
> --
> regards,tushar
> EnterpriseDB  https://www.enterprisedb.com/
> The Enterprise PostgreSQL Company
>
>
>
> --
> Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
> To make changes to your subscription:
> http://www.postgresql.org/mailpref/pgsql-hackers


Hello Dilip,
I was trying to test the performance of all the parallel-query-related
patches on TPC-H queries. I found that when parallel hash [1] and your
patch are applied, Q10 and Q14 hang; however, without either of these
patches, these queries run fine. Following are the stack traces:

At the master:

#0  0x3fff8bef7de8 in __epoll_wait_nocancel () from /lib64/power8/libc.so.6
#1  0x104ea184 in WaitEventSetWaitBlock (set=0x10039b2d360,
cur_timeout=-1, occurred_events=0x3fffd4a55a98, nevents=1) at
latch.c:998
#2  0x104e9fc8 in WaitEventSetWait (set=0x10039b2d360,
timeout=-1, occurred_events=0x3fffd4a55a98, nevents=1,
wait_event_info=134217730) at latch.c:950
#3  0x104e9484 in WaitLatchOrSocket (latch=0x3fff85128ab4,
wakeEvents=1, sock=-1, timeout=-1, wait_event_info=134217730) at
latch.c:350
#4  0x104e931c in WaitLatch (latch=0x3fff85128ab4,
wakeEvents=1, timeout=0, wait_event_info=134217730) at latch.c:304
#5  0x1032a378 in gather_readnext (gatherstate=0x10039aeeb98)
at nodeGather.c:393
#6  0x1032a044 in gather_getnext (gatherstate=0x10039aeeb98)
at nodeGather.c:293
#7  0x10329e94 in ExecGather (node=0x10039aeeb98) at nodeGather.c:234
#8  0x103086f0 in ExecProcNode (node=0x10039aeeb98) at
execProcnode.c:521
#9  0x1031f550 in fetch_input_tuple (aggstate=0x10039aed220)
at nodeAgg.c:587
#10 0x103229a4 in agg_retrieve_direct (aggstate=0x10039aed220)
at nodeAgg.c:2211

At one of the worker processes:

#0  0x103879f8 in pagetable_insert (tb=0x10039abcac0,
key=6491, found=0x3fffd4a54b88 "") at
../../../src/include/lib/simplehash.h:571
#1  0x10389964 in tbm_get_pageentry (tbm=0x10039ab5778,
pageno=6491) at tidbitmap.c:876
#2  0x10388608 in tbm_add_tuples (tbm=0x10039ab5778,
tids=0x10039a94c30, ntids=1, recheck=0 '\000') at tidbitmap.c:340
#3  0x100f5e54 in btgetbitmap (scan=0x10039ab3c80,
tbm=0x10039ab5778) at nbtree.c:439
#4  0x100eab7c in index_getbitmap (scan=0x10039ab3c80,
bitmap=0x10039ab5778) at indexam.c:687
#5  0x10328acc in MultiExecBitmapIndexScan
(node=0x10039a98630) at nodeBitmapIndexscan.c:98
#6  0x103088d8 in MultiExecProcNode (node=0x10039a98630) at
execProcnode.c:591
#7  0x10326c70 in BitmapHeapNext (node=0x10039a98770) at
nodeBitmapHeapscan.c:164
#8  0x10316440 in ExecScanFetch (node=0x10039a98770,
accessMtd=0x10326b70 <BitmapHeapNext>, recheckMtd=0x10327bb4
<BitmapHeapRecheck>) at execScan.c:95
#9  0x10316590 in ExecScan (node=0x10039a98770,
accessMtd=0x10326b70 <BitmapHeapNext>, recheckMtd=0x10327bb4
<BitmapHeapRecheck>) at execScan.c:180
#10 0x10327c84 in ExecBitmapHeapScan (node=0x10039a98770) at
nodeBitmapHeapscan.c:623

At the other workers:

#0  0x3fff8bef7de8 in __epoll_wait_nocancel () from /lib64/power8/libc.so.6
#1  0x104ea184 in WaitEventSetWaitBlock (set=0x10039a24670,
cur_timeout=-1, occurred_events=0x3fffd4a55ed8, nevents=1) at
latch.c:998
#2  0x104e9fc8 in WaitEventSetWait (set=0x10039a24670,
timeout=-1, occurred_events=0x3fffd4a55ed8, nevents=1,
wait_event_info=134217737) at latch.c:950
#3  0x1051a3dc in ConditionVariableSleep (cv=0x3fff7c2d01ac,
wait_event_info=134217737) at condition_variable.c:132
#4  0x103283e0 in pbms_is_leader (pbminfo=0x3fff7c2d0178) at
nodeBitmapHeapscan.c:900
#5  0x10326c38 in BitmapHeapNext (node=0x10039a95f50) at
nodeBitmapHeapscan.c:153
#6  0x10316440 in ExecScanFetch (node=0x10039a95f50,
accessMtd=0x10326b70 <BitmapHeapNext>, recheckMtd=0x10327bb4
<BitmapHeapRecheck>) at execScan.c:95
#7  0x10316590 in ExecScan (node=0x10039a95f50,
accessMtd=0x10326b70 <BitmapHeapNext>, recheckMtd=0x10327bb4
<BitmapHeapRecheck>) at execScan.c:180
#8  0x10327c84 in ExecBitmapHeapScan (node=0x10039a95f50) at
nodeBitmapHeapscan.c:623
#9  0x10308588 in ExecProcNode (node=0x10039a95f50) at
execProcnode.c:443
#10 0x103310d8 in ExecHashJoinOuterGetTuple
(outerNode=0x10039a95f50, hjstate=0x10039a96808,
hashvalue=0x3fffd4a5639c) at nodeHashjoin.c:936

I was using TPC-H with scale factor 20; please let me know if there is
anything more you require in this regard.

[1] 
https://www.postgresql.org/message-id/CAEepm%3D1vGcv6LBrxZeqPb_rPxfraidWAF_8_4z2ZMQ%2B7DOjj9w%40mail.gmail.com
-- 
Regards,
Rafia Sabih
EnterpriseDB: http://www.enterprisedb.com/




Re: [HACKERS] WIP: [[Parallel] Shared] Hash

2017-01-12 Thread Rafia Sabih
 warning: variable ‘size_before_shrink’ set but not used
> > [-Wunused-but-set-variable]
> >   Size size_before_shrink = 0;
> >^
>
> In this case it was only used in dtrace builds; I will make sure any
> such code is compiled out when in non-dtrace builds.
>
> > /home/pg/pgbuild/builds/root/../../postgresql/src/backend/
> executor/nodeHashjoin.c:
> > In function ‘ExecHashJoinCloseBatch’:
> > /home/pg/pgbuild/builds/root/../../postgresql/src/backend/
> executor/nodeHashjoin.c:1548:28:
> > warning: variable ‘participant’ set but not used
> > [-Wunused-but-set-variable]
> >   HashJoinParticipantState *participant;
> > ^
> > /home/pg/pgbuild/builds/root/../../postgresql/src/backend/
> executor/nodeHashjoin.c:
> > In function ‘ExecHashJoinRewindBatches’:
> > /home/pg/pgbuild/builds/root/../../postgresql/src/backend/
> executor/nodeHashjoin.c:1587:23:
> > warning: variable ‘batch_reader’ set but not used
> > [-Wunused-but-set-variable]
> >   HashJoinBatchReader *batch_reader;
> >^
> >
> > Is this change really needed?:
> >
> >> --- a/src/backend/executor/nodeSeqscan.c
> >> +++ b/src/backend/executor/nodeSeqscan.c
> >> @@ -31,6 +31,8 @@
> >>  #include "executor/nodeSeqscan.h"
> >>  #include "utils/rel.h"
> >>
> >> +#include 
> >> +
> >>  static void InitScanRelation(SeqScanState *node, EState *estate, int
> eflags);
> >>  static TupleTableSlot *SeqNext(SeqScanState *node);
>
> Right, will clean up.
>
> > That's all I have for now...
>
> Thanks!  I'm away from my computer for a couple of days but will have
> a new patch series early next week, and hope to have a better handle
> on what's involved in adopting the 'unification' approach here
> instead.
>
> --
> Thomas Munro
> http://www.enterprisedb.com
>
>
> --
> Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
> To make changes to your subscription:
> http://www.postgresql.org/mailpref/pgsql-hackers
>

Hi Thomas,
I was trying to analyse the performance of TPC-H queries with your
patch and came across the following results: Q9 and Q21 were crashing,
and both of them had the following backtrace in the core dump (I
thought it might be helpful):

#0  0x10757da4 in pfree (pointer=0x3fff78d11000) at mcxt.c:1012
#1  0x1032c574 in ExecHashIncreaseNumBatches
(hashtable=0x1003af6da60) at nodeHash.c:1124
#2  0x1032d518 in ExecHashTableInsert (hashtable=0x1003af6da60,
slot=0x1003af695c0, hashvalue=2904801109, preload=1 '\001') at
nodeHash.c:1700
#3  0x10330fd4 in ExecHashJoinPreloadNextBatch
(hjstate=0x1003af39118) at nodeHashjoin.c:886
#4  0x103301fc in ExecHashJoin (node=0x1003af39118) at
nodeHashjoin.c:376
#5  0x10308644 in ExecProcNode (node=0x1003af39118) at
execProcnode.c:490
#6  0x1031f530 in fetch_input_tuple (aggstate=0x1003af38910) at
nodeAgg.c:587
#7  0x10322b50 in agg_fill_hash_table (aggstate=0x1003af38910) at
nodeAgg.c:2304
#8  0x1032239c in ExecAgg (node=0x1003af38910) at nodeAgg.c:1942
#9  0x10308694 in ExecProcNode (node=0x1003af38910) at
execProcnode.c:509
#10 0x10302a1c in ExecutePlan (estate=0x1003af37fa0,
planstate=0x1003af38910, use_parallel_mode=0 '\000', operation=CMD_SELECT,
sendTuples=1 '\001', numberTuples=0,
direction=ForwardScanDirection, dest=0x1003af19390) at execMain.c:1587

In case you want to know, I was using TPC-H with scale factor 20.
Please let me know if you want any more information on this.

-- 
Regards,
Rafia Sabih
EnterpriseDB: http://www.enterprisedb.com/


[HACKERS] Passing query string to workers

2017-01-10 Thread Rafia Sabih
Hello everybody,

Currently, the query string is not passed to the workers; only the
master has it. When multiple queries are running on a server and a
worker crashes for one of them, it becomes quite burdensome to find the
query associated with the crashed worker, since on a worker crash no
query is displayed.

To fix this, I propose a patch wherein the query string is passed to
the workers as well, and hence displayed when a worker crashes.

Approach:
A token for the query string is created in shared memory; this token
is populated with the query string using the global string
debug_query_string. Now, for each worker, when
ExecGetParallelQueryDesc is called, we retrieve the query text from
shared memory and pass it to CreateQueryDesc.

Next, to ensure that the query gets displayed at the time of a crash,
BackendStatusArray needs to be populated correctly; specifically, for
our purpose, the activity field needs to be filled with the current
query. For this I called pgstat_report_activity in ParallelWorkerMain
with the query string; this also populates the workers' rows in the
system view pg_stat_activity. Previously, pgstat_report_activity was
called only for the master, in exec_simple_query; hence, for workers
pg_stat_activity remained null.
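
A minimal sketch of the worker-side half (function placement and the
key name are again illustrative, not taken verbatim from the patch):

    /* In ParallelWorkerMain(), after attaching to the toc of the
     * dynamic shared memory segment: fetch the query text and report
     * it, so crash reports and pg_stat_activity show the query. */
    char *query_string = (char *) shm_toc_lookup(toc,
                                                 PARALLEL_KEY_QUERY_TEXT);

    debug_query_string = query_string;
    pgstat_report_activity(STATE_RUNNING, query_string);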

Results:
Here is the output for an artificially created worker crash, with and
without the patch.

Without the patch, the error report on a worker crash:
 LOG:  worker process: parallel worker for PID 49739 (PID 49741) was
terminated by signal 11: Segmentation fault

Error report with the patch:
 LOG:  worker process: parallel worker for PID 51757 (PID 51758) was
terminated by signal 11: Segmentation fault
2017-01-11 11:10:27.630 IST [51742] DETAIL:  Failed process was running:
explain analyse select
l_returnflag,
l_linestatus,
sum(l_quantity) as sum_qty,
sum(l_extendedprice) as sum_base_price,
sum(l_extendedprice * (1 - l_discount)) as sum_disc_price,
sum(l_extendedprice * (1 - l_discount) * (1 + l_tax)) as sum_charge,
avg(l_quantity) as avg_qty,
avg(l_extendedprice) as avg_price,
avg(l_discount) as avg_disc,
count(*) as count_order
from
lineitem
where
l_shipdate <= date '1998-12-01' - interval '119' day
group by
l_returnflag,
l_linestatus
order by
l_returnflag,
l_linestatus
LIMIT 1;

Inputs of all sorts are encouraged.
-- 
Regards,
Rafia Sabih
EnterpriseDB: http://www.enterprisedb.com/


pass_queryText_to_workers_v1.patch
Description: Binary data



Re: [HACKERS] pgbench - allow backslash continuations in \set expressions

2017-01-10 Thread Rafia Sabih
On Sat, Dec 3, 2016 at 1:52 PM, Fabien COELHO <coe...@cri.ensmp.fr> wrote:
>
>> FWIW, I looked a bit further and concluded that probably psqlscan.l
>> doesn't need to be modified; so likely you could do this across all of
>> pgbench's commands without touching psql.  That might be an acceptable
>> compromise for now, though I still think that as soon as we have this
>> for pgbench, users will start wanting it in psql.
>
>
> The attached patch adds backslash-return (well newline really) continuations
> to all pgbench backslash-commands.
>
> The attached test uses continuations on all such commands (sleep set
> setshell and shell).
>
> I think that adding continuations to psql should be a distinct patch.
>
> --
> Fabien.

Hello Fabien,
The patch does not apply on the latest head; I guess it requires
rebasing, since yours was posted in December. Again, it is giving
trailing whitespace errors (as I reported for the earlier version),
plus it does not apply with git apply; hopefully that will be fixed
once rebased.
Other than that, I observed that if a space follows the backslash,
then the command fails. I think that if the backslash is followed by
some spaces and then the end of line, the spaces should be ignored and
the next line read, at least with this new meaning of backslash.
Otherwise, it should be mentioned in the docs that the backslash must
not be followed by a space.
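
As an illustration only (not part of the posted patch), the
continuation rule in exprscan.l could be relaxed to also swallow
whitespace between the backslash and the newline, along these lines:

    continuation    \\[ \t]*{newline}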

-- 
Regards,
Rafia Sabih
EnterpriseDB: http://www.enterprisedb.com/




Re: [HACKERS] parallelize queries containing subplans

2016-12-28 Thread Rafia Sabih
Evaluating these patches on TPC-H queries at different scale factors,
we came across query 16: for TPC-H scale factor 20 and the
aforementioned parameter settings, with max_parallel_workers_per_gather
= 2, it took 37 seconds with the patches and 49 seconds on head. Though
the improvement in overall query performance does not appear
significantly high, the point to note here is that the time taken by
the join was around 26 seconds on head, which reduced to 14 seconds
with the patches. The overall benefit in performance is not high
because the sort node dominates the execution time. The plan
information for this query is given in the attached file
h_q16_20_2.txt.

On increasing the scale factor to 300, setting work_mem to 1GB,
increasing max_parallel_workers_per_gather to 6, and disabling the
parallel sequential scan for the supplier table with 'alter table
supplier set (parallel_workers = 0)', Q16 completes in 645 seconds
with the patches, where it was taking 812 seconds on head. We need to
disable parallel_workers for the supplier table because at higher
worker counts it was choosing a parallel seq scan, and hence the scan
node using the subplan could not be parallelised. This query needs
both pushdown of subplans and parallel merge-join; without either of
these, the benefits of parallelism might not be leveraged fully. The
output of explain analyse for this query is given in h_q16_300.txt.

Overall, pushdown of uncorrelated sub-plans to workers enables
parallelism in joins that was restricted before, and hence a good
improvement in query performance can be witnessed.

> Now, we can further extend this to parallelize queries containing
> correlated subplans like below:
>
> explain select * from t1 where t1.i in (select t2.i from t2 where t2.i=t1.i);
>  QUERY PLAN
> -
>  Seq Scan on t1  (cost=0.00..831049.09 rows=8395 width=12)
>Filter: (SubPlan 1)
>SubPlan 1
>  ->  Seq Scan on t2  (cost=0.00..97.73 rows=493 width=4)
>Filter: (i = t1.i)
> (5 rows)
>
As per my analysis, this extension to correlated subplans is likely to
improve the performance of the following queries: Q2 in TPC-H and Q6
in TPC-DS.

> Yet, another useful enhancement in this area could be to consider both
> parallel and non-parallel paths for subplans.  As of now, we consider
> the cheapest/best path and form subplan from it, but it is quite
> possible that instead of choosing parallel path (in case it is
> cheapest) at subplan level, the non-parallel path at subplan level
> could be beneficial when upper plan can use parallelism.  I think this
> will be a separate project in itself if we want to do this and based
> on my study of TPC-H and TPC-DS queries, I am confident that this will
> be helpful in certain queries at higher scale factors.
>
I agree, as then we would not need to disable parallelism for
particular relations, as we currently do for the supplier relation in
Q16 of TPC-H.

[1] 
https://www.postgresql.org/message-id/CAA4eK1KthrAvNjmB2cWuUHz%2Bp3ZTTtbD7o2KUw49PC8HK4r1Wg%40mail.gmail.com
[2] 
https://www.postgresql.org/message-id/CAOGQiiOv9NA1VrAo8PmENfGi-y%3DCj_DiTF4vyjp%2BfmuEzovwEA%40mail.gmail.com
[3] 
https://www.postgresql.org/message-id/CAFiTN-tdYpcsk7Lpv0HapcmvSnMN_TgKjC7RkmvVLZAF%2BXfmPg%40mail.gmail.com

-- 
Regards,
Rafia Sabih
EnterpriseDB: http://www.enterprisedb.com/
On head:

                                 QUERY PLAN
--------------------------------------------------------------------------------
 Limit  (cost=539939.00..539941.25 rows=100 width=46) (actual 
time=35194.017..35194.358 rows=100 loops=1)
   ->  GroupAggregate  (cost=539939.00..541004.96 rows=47376 width=46) (actual 
time=35194.015..35194.345 rows=100 loops=1)
 Group Key: customer_address.ca_zip, customer_address.ca_state
 ->  Sort  (cost=539939.00..540057.44 rows=47376 width=20) (actual 
time=35193.994..35194.019 rows=101 loops=1)
   Sort Key: customer_address.ca_zip, customer_address.ca_state
   Sort Method: quicksort  Memory: 48kB
   ->  Hash Join  (cost=32899.51..536259.81 rows=47376 width=20) 
(actual time=1735.259..35192.430 rows=303 loops=1)
 Hash Cond: (web_sales.ws_item_sk = item.i_item_sk)
 Join Filter: ((substr((customer_address.ca_zip)::text, 1, 
5) = ANY ('{85669,86197,88274,83405,86475,85392,85460,80348,81792}'::text[])) 
OR (hashed SubPlan 1))
 Rows Removed by Join Filter: 2070923
 ->  Hash Join  (cost=26911.21..527324.70 rows=90671 
width=24) (actual time=1406.355..25817.821 rows=2071226 loops=1)
   Hash Cond: (customer.c_current_addr_sk = 
cus

Re: [HACKERS] Parallel Index-only scan

2016-12-27 Thread Rafia Sabih
Rebased patch of parallel index-only scan, based on the latest version
of parallel index scan [1], is attached.

[1] https://www.postgresql.org/message-id/CAA4eK1LiNi7_Z1%2BPCV4y06o_v%3DZdZ1UThE%2BW9JhthX4B8uifnA%40mail.gmail.com

On Sat, Dec 24, 2016 at 7:55 PM, Rafia Sabih <rafia.sa...@enterprisedb.com>
wrote:

> Extremely sorry for the inconvenience caused, please find the attached
> file for the latest version of the patch.
>
> On Sat, Dec 24, 2016 at 1:41 AM, Robert Haas <robertmh...@gmail.com>
> wrote:
>
>> On Fri, Dec 23, 2016 at 3:03 PM, Tom Lane <t...@sss.pgh.pa.us> wrote:
>> > Or in words of one syllable: please do not use nabble to post to the
>> > community mailing lists.
>>
>> Many of those words have two syllables, and one has four.
>>
>> Anyhow, I think three castigating emails from committers in a span of
>> 62 minutes is probably enough for the OP to get the point, unless
>> someone else REALLY feels the need to pile on.
>>
>> --
>> Robert Haas
>> EnterpriseDB: http://www.enterprisedb.com
>> The Enterprise PostgreSQL Company
>>
>
>
>
> --
> Regards,
> Rafia Sabih
> EnterpriseDB: http://www.enterprisedb.com/
>



-- 
Regards,
Rafia Sabih
EnterpriseDB: http://www.enterprisedb.com/


parallel_index_only_v3.patch
Description: Binary data



Re: [HACKERS] Parallel Index-only scan

2016-12-24 Thread Rafia Sabih
Extremely sorry for the inconvenience caused; please find attached the
latest version of the patch.

On Sat, Dec 24, 2016 at 1:41 AM, Robert Haas <robertmh...@gmail.com> wrote:

> On Fri, Dec 23, 2016 at 3:03 PM, Tom Lane <t...@sss.pgh.pa.us> wrote:
> > Or in words of one syllable: please do not use nabble to post to the
> > community mailing lists.
>
> Many of those words have two syllables, and one has four.
>
> Anyhow, I think three castigating emails from committers in a span of
> 62 minutes is probably enough for the OP to get the point, unless
> someone else REALLY feels the need to pile on.
>
> --
> Robert Haas
> EnterpriseDB: http://www.enterprisedb.com
> The Enterprise PostgreSQL Company
>



-- 
Regards,
Rafia Sabih
EnterpriseDB: http://www.enterprisedb.com/


parallel_index_only_v2.patch
Description: Binary data



[HACKERS] Parallel Index-only scan

2016-12-11 Thread Rafia Sabih
 Group Key: lineitem.l_shipmode
>->  Nested Loop  (cost=1.13..6084.10 rows=10312
> width=27) (actual time=0.139..16.352 rows=1993 loops=5)
>  ->  Parallel Index Only Scan using
> orders_pkey on orders  (cost=0.56..27.14 rows=644 width=4) (actual
> time=0.082..0.366 rows=501 loops=5)
>Index Cond: (o_orderkey < 1)
>Heap Fetches: 0
>  ->  Index Scan using
> idx_lineitem_orderkey on lineitem  (cost=0.57..6.45 rows=296 width=31)
> (actual time=0.020..0.025 rows=4 loops=2503)
>Index Cond: (l_orderkey =
> orders.o_orderkey)
>  Planning time: 1.170 ms
>  Execution time: 40.898 ms


Test configuration and machine detail:
--------------------------------------
TPC-H scale factor               : 20
work_mem                         : 64MB
shared_buffers                   : 1GB
max_parallel_workers_per_gather  : 4
Machine                          : IBM POWER, 4-socket, 512 GB RAM
random_page_cost = seq_page_cost = 0.1
We kept the random and sequential page costs the same since a
warm-cache environment was ensured; thus, keeping random_page_cost
higher than seq_page_cost doesn't make much sense in this setup.
Also, some additional indexes were built: on the lineitem table for
l_shipdate, l_shipmode, and l_returnflag, and on the orders table for
o_comment and o_orderdate individually.

The point to note here is that with parallel index-only scan, not only
are heap fetches saved, but aggregates/sorts can also be pushed down to
workers, and thus the total query time can be improved. Clearly, query
performance improved significantly with this new operator, and the
changes required on top of the parallel index scan patches are small
when evaluated against the performance improvement they offer.
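
For context, the heap fetches are saved by the per-tuple
visibility-map check that index-only scans already perform; an
abridged sketch of that shape (modelled on nodeIndexonlyscan.c, with
details elided):

    /* Return tuples straight from the index whenever their heap page
     * is known all-visible; only otherwise is the heap touched. */
    while ((tid = index_getnext_tid(scandesc, direction)) != NULL)
    {
        if (!VM_ALL_VISIBLE(scandesc->heapRelation,
                            ItemPointerGetBlockNumber(tid),
                            &node->ioss_VMBuffer))
        {
            /* Page not all-visible: must recheck in the heap. */
            if (index_fetch_heap(scandesc) == NULL)
                continue;       /* tuple is dead, try the next one */
            node->ioss_HeapFetches++;
        }
        /* ... form a virtual tuple from index data and return it ... */
    }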

Attached file:
--
1. parallel_index_only_v1.patch

This patch is to be applied over parallel index scan [1].

* Thanks to my colleagues Dilip Kumar and Amit Kapila for their support and
feedback.

[1]
https://www.postgresql.org/message-id/CAOGQiiOneen9WEppO6V_myKpQ97CrjBQJ0Pv7ri0rxmMYvLcTg%40mail.gmail.com
-- 
Regards,
Rafia Sabih
EnterpriseDB: http://www.enterprisedb.com/


parallel_index_only_v1.patch
Description: Binary data



Re: [HACKERS] pgbench - allow backslash continuations in \set expressions

2016-11-09 Thread Rafia Sabih
On Wed, Nov 9, 2016 at 3:28 PM, Fabien COELHO <coe...@cri.ensmp.fr> wrote:

>
> +1.  My vote is for backslash continuations.
>>
>
> I'm fine with that!
>
> --
> Fabien.
>

Looks good to me also.

-- 
Regards,
Rafia Sabih
EnterpriseDB: http://www.enterprisedb.com/


Re: [HACKERS] pgbench - allow backslash continuations in \set expressions

2016-11-01 Thread Rafia Sabih
Well, with this new approach, the example you gave previously for
better readability:

> \set bid
>  CASE WHEN random(0, 99) < 85
>THEN :tbid
>ELSE :abid + (:abid >= :tbid)
>  END

will give an error at the first line. In general, this new approach is
likely to create confusion in such cases. As an end user, one needs to
be really careful about which portions can be split between lines.
Keeping this in mind, I'd prefer the previous approach.
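
For comparison, with the previous approach the same example reads
naturally, since each line states explicitly that it continues (a
sketch only, adding backslash continuations to the quoted example):

    \set bid \
      CASE WHEN random(0, 99) < 85 \
        THEN :tbid \
        ELSE :abid + (:abid >= :tbid) \
      END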

On Tue, Nov 1, 2016 at 4:23 PM, Fabien COELHO <coe...@cri.ensmp.fr> wrote:

>
> Attached patch does what is described in the title, hopefully.
>> Continuations in other pgbench backslash-commands should be dealt with
>> elsewhere...
>>
>> Also attached is a small test script.
>>
>
> Here is another approach, with "infered" continuations: no backslash is
> needed, the parsing is pursued if the last token of the line cannot end an
> expression (eg an operator) or if there is an unclosed parenthesis.
>
> I think that backslashes are less surprising for the classically minded
> user, but this one is more fun:-) Also, this version changes a little more
> the scanner because on each token the next state (continued or not) must be
> decided.
>
> --
> Fabien.
>
> --
> Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
> To make changes to your subscription:
> http://www.postgresql.org/mailpref/pgsql-hackers
>
>


-- 
Regards,
Rafia Sabih
EnterpriseDB: http://www.enterprisedb.com/


Re: [HACKERS] pgbench - allow backslash continuations in \set expressions

2016-11-01 Thread Rafia Sabih
Hi Fabien,
This seems like an interesting addition to the pgbench interface, but I
am not sure where it's required; it would be good if you could provide
some cases where its utility can be witnessed -- something like where
you absolutely need continuations in an expression.

While applying, it gives some trailing whitespace errors; please
correct them.
As an additional comment, you may consider reformatting the following
snippet

> {continuation} { /* ignore */ }
>
>   as

> {continuation} {
> /* ignore */
> }

Thanks.

On Mon, Oct 3, 2016 at 6:16 PM, Christoph Berg <m...@debian.org> wrote:

> Re: Fabien COELHO 2016-10-03 <alpine.DEB.2.20.1610031259400.19411@lancre>
> > > I "\set" a bunch of lengthy SQL commands in there, e.g.
> >
> > I agree that this looks like a desirable feature, however I would tend to
> > see that as material for another independent patch.
>
> Sure, my question was by no means intending to stop your pgbench patch
> from going forward by adding extra requirements.
>
> > Hmmm. I'm not sure how this is parsed. If this is considered a string
> '...',
> > then maybe \set should wait for the end of the string instead of the end
> of
> > the line, i.e. no continuation would be needed...
> >
> >  \set config '
> > SELECT name, ...
> >CASE ... END
> > FROM pg_settings
> > WHERE ...;'
>
> I guess that would be the sane solution here, yes. Not adding any \
> chars at the end of the line would also mean cut-and-paste of the RHS
> content would work.
>
> Thanks for the feedback!
>
> Christoph
>
>
> --
> Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
> To make changes to your subscription:
> http://www.postgresql.org/mailpref/pgsql-hackers
>



-- 
Regards,
Rafia Sabih
EnterpriseDB: http://www.enterprisedb.com/


[HACKERS] "Re: Question about grant create on database and pg_dump/pg_dumpall

2016-09-25 Thread Rafia Sabih
On Tue, Jul 5, 2016 at 06:39 AM, Haribabu Kommi
kommi(dot)haribabu(at)gmail(dot)com wrote:

Still i feel the GRANT statements should be present, as the create
database statement
is generated only with -C option. So attached patch produces the GRANT
statements based
on the -x option.


The attached patch does the job fine. However, I am a little skeptical
about this addition, since it is clearly mentioned in the documentation
of pg_dump that it does not restore global objects, so why expect this?
This addition makes pg_dump -C somewhat special, as it now restores
these GRANT statements. It would be helpful to have this patch only if
we consider the popular method of dump-restore mentioned in the thread
(https://www.postgresql.org/message-id/E1VYMqi-0001P4-P4%40wrigleys.postgresql.org),
i.e. pg_dumpall -g followed by individual pg_dump runs.

Regards,
Rafia Sabih
EnterpriseDB: http://www.enterprisedb.com