[HACKERS] BUG: pg_dump generates corrupted gzip file in Windows

2017-03-23 Thread Kuntal Ghosh
Hello,
On Windows, if one needs to take a dump in plain-text format (this is
the default option, or can be specified using -Fp) with some level of
compression (-Z[0-9]), an output file has to
be specified. Otherwise, if the output is redirected to stdout, it'll
create a corrupted dump (cmd is set to ASCII mode, so it'll put
carriage returns in the file).

I'm referring to the following part of the pg_dump code:

/*
 * On Windows, we need to use binary mode to read/write non-text archive
 * formats.  Force stdin/stdout into binary mode if that is what we are
 * using.
 */
#ifdef WIN32
	if (fmt != archNull &&
		(AH->fSpec == NULL || strcmp(AH->fSpec, "") == 0))
	{
		if (mode == archModeWrite)
			setmode(fileno(stdout), O_BINARY);
		else
			setmode(fileno(stdin), O_BINARY);
	}
#endif

For the plain-text format, fmt is set to archNull, so binary mode is
not forced (I think). To fix this, I've attached a patch which adds one
extra check to the 'if' condition to look at the compression level. PFA.
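
The idea, roughly (a sketch rather than the exact patch text;
"compression" stands for whatever variable carries the -Z level at that
point in the code):

#ifdef WIN32
	if ((fmt != archNull || compression != 0) &&
		(AH->fSpec == NULL || strcmp(AH->fSpec, "") == 0))
	{
		if (mode == archModeWrite)
			setmode(fileno(stdout), O_BINARY);
		else
			setmode(fileno(stdin), O_BINARY);
	}
#endif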


-- 
Thanks & Regards,
Kuntal Ghosh
EnterpriseDB: http://www.enterprisedb.com


0001-Fix-pg_dump-for-Windows-cmd.patch
Description: binary/octet-stream

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] create_unique_path and GEQO

2017-03-23 Thread Ashutosh Bapat
On Wed, Mar 22, 2017 at 3:43 PM, Ashutosh Bapat
 wrote:
> Hi,
> In create_unique_path() there's comment
> /*
>  * We must ensure path struct and subsidiary data are allocated in main
>  * planning context; otherwise GEQO memory management causes trouble.
>  */
> oldcontext = MemoryContextSwitchTo(root->planner_cxt);
>
> pathnode = makeNode(UniquePath);
>
> This means that when GEQO resets the memory context, the RelOptInfo
> for which this path is created (and whose cheapest_unique_path may be
> set to it) goes away, while the unique path lingers on in the planner
> context.
> Shouldn't we instead allocate the path in the same context as the
> RelOptInfo similar to mark_dummy_rel()?
>

I tried this change in the attached patch. I ran make installcheck with
geqo_threshold = 2. Only join.sql failed: several plans changed, which
is expected. There was one difference related to changed output order,
but that's because of the changed plan.
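
For readers without the attachment, the change is essentially this
(a sketch; the attached patch is authoritative):

	/*
	 * Allocate the UniquePath in the context the parent RelOptInfo lives
	 * in, the way mark_dummy_rel() does, instead of always using
	 * root->planner_cxt.
	 */
	oldcontext = MemoryContextSwitchTo(GetMemoryChunkContext(rel));

	pathnode = makeNode(UniquePath);
	/* ... fill in the path fields as before ... */

	MemoryContextSwitchTo(oldcontext);

That way the path goes away together with its RelOptInfo when GEQO
resets its memory context.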

Adding this to the commitfest.

-- 
Best Wishes,
Ashutosh Bapat
EnterpriseDB Corporation
The Postgres Database Company


pg_upath_geqo.patch
Description: Binary data

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] identity columns

2017-03-23 Thread Peter Eisentraut
On 3/23/17 06:09, Vitaly Burovoy wrote:
> I think we'll end up with "DROP IDENTITY IF EXISTS" to avoid raising
> an exception and "ADD OR SET" if your grammar remains.

That sounds reasonable to me.

> Right. From that PoV IDENTITY also changes a default value: "SET (ADD
> ... AS?) IDENTITY" works as setting a default to "nextval(...)"
> whereas "DROP IDENTITY" works as setting it back to NULL.

But dropping and re-adding an identity destroys state, so it's not quite
the same.

-- 
Peter Eisentraut  http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] segfault in hot standby for hash indexes

2017-03-23 Thread Amit Kapila
On Thu, Mar 23, 2017 at 4:26 PM, Ashutosh Sharma  wrote:
> On Thu, Mar 23, 2017 at 9:17 AM, Amit Kapila  wrote:
>>
>> On Thu, Mar 23, 2017 at 8:43 AM, Amit Kapila  wrote:
>> >
>> > I think this will work, but I am not sure there is merit in deviating
>> > from what btree does to handle this case.  One thing I find slightly
>> > awkward in hash_xlog_vacuum_get_latestRemovedXid() is that you are
>> > using a number of tuples registered as part of fixed data
>> > (xl_hash_vacuum_one_page) to traverse the data registered as buf data.
>> > I think it will be better if we register the offsets in the fixed part
>> > of the data too, as we do in the btree case.
>
> Agreed. I have made the changes accordingly. Please check the attached v2 patch.
>

Changes look good to me.  I think you can modify the comment in the
struct xl_hash_vacuum_one_page to mention "TARGET OFFSET NUMBERS
FOLLOW AT THE END".

>>
>> >
>> >
>>
>> Also another small point in this regard: do we need two separate
>> variables to track the number of deleted items in the code below?  I
>> think one variable is sufficient.
>>
>> _hash_vacuum_one_page()
>> {
>> ..
>> deletable[ndeletable++] = offnum;
>> tuples_removed += 1;
>> ..
>> }
>>
>
> Yes, I think 'ndeletable' alone should be fine.
>

It would probably have been okay to use *int* for ntuples, as that
matches what you are actually assigning in the function.
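
With a single counter, the tail of _hash_vacuum_one_page() would look
something like this (an illustrative sketch only):

	deletable[ndeletable++] = offnum;
	...

	if (ndeletable > 0)
	{
		PageIndexMultiDelete(page, deletable, ndeletable);
		xlrec.ntuples = ndeletable;	/* no separate tuples_removed */
	}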


-- 
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Monitoring roles patch

2017-03-23 Thread Peter Eisentraut
On 3/22/17 09:17, Stephen Frost wrote:
>> If we do it via GRANTs instead, then users can easily extend it.
> The intent here is that users will *also* be able to do it via GRANTs if
> they wish to.

But why not do it with GRANTs in the first place then?

-- 
Peter Eisentraut  http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] delta relations in AFTER triggers

2017-03-23 Thread Thomas Munro
On Fri, Mar 24, 2017 at 1:14 PM, Thomas Munro
 wrote:
> If that's fixed and the permissions question can be waved away by
> saying it's the same as the per-row situation, my only other comment
> would be a bikeshed issue: Enr isn't a great name for a struct.

One more thought: should this be allowed?

postgres=# create table mytab (i int) partition by list (i);
CREATE TABLE
postgres=# create table mytab1 partition of mytab for values in (42);
CREATE TABLE
postgres=# create function my_trigger_function() returns trigger as $$
begin end; $$ language plpgsql;
CREATE FUNCTION
postgres=# create trigger my_trigger after update on mytab referencing
old table as my_old for each statement execute procedure
my_trigger_function();
CREATE TRIGGER

I haven't really looked into the interaction of triggers and the new
partitioning feature very much, but it looks like the intention is that
you shouldn't be allowed to do something that would need access to the
actual row data from a trigger that is attached to the top-level
partitioned table:

postgres=# create trigger my_trigger before update on mytab for each
row execute procedure my_trigger_function();
ERROR:  "mytab" is a partitioned table
DETAIL:  Partitioned tables cannot have ROW triggers.

By the same logic, I guess that we shouldn't allow transition table
triggers to be attached to the top level partitioned table, because it
can't really work.

You can attach ROW triggers to the concrete tables that hold real
data, which makes sense because they actually have data to capture and
feed to the trigger function:

postgres=# create trigger my_trigger before update on mytab1 for each
row execute procedure my_trigger_function();
CREATE TRIGGER

Perhaps the moral equivalent should be possible for statement triggers
with transition tables, and that already works with your patch as far
as I know.  So I think your patch probably just needs to reject them
on partitioned tables.
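
Something like this in CreateTrigger() would do it (a sketch; the error
wording is mine, and I'm assuming the patch's CreateTrigStmt field is
called transitionRels):

	if (stmt->transitionRels != NIL &&
		rel->rd_rel->relkind == RELKIND_PARTITIONED_TABLE)
		ereport(ERROR,
				(errcode(ERRCODE_WRONG_OBJECT_TYPE),
				 errmsg("\"%s\" is a partitioned table",
						RelationGetRelationName(rel)),
				 errdetail("Partitioned tables cannot have triggers with transition tables.")));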

-- 
Thomas Munro
http://www.enterprisedb.com


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] increasing the default WAL segment size

2017-03-23 Thread Peter Eisentraut
On 3/23/17 21:47, Jeff Janes wrote:
> I have a pg_restore which predicts the file 5 files ahead of the one it
> was asked for, and initiates a pre-fetch and decompression of it. Then
> it delivers the file it was asked for, either by pulling it out of the
> pre-staging area set up by the N-5th invocation, or by going directly to
> the archive to get it.  This speeds up play-back dramatically when the
> files are stored compressed and non-local.

Yeah, some better support for prefetching would be necessary to avoid
having to have any knowledge of the file naming.

-- 
Peter Eisentraut  http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] increasing the default WAL segment size

2017-03-23 Thread Peter Eisentraut
On 3/23/17 16:58, Stephen Frost wrote:
> The backup tools need to also get the LSN from the pg_stop_backup and
> verify that they have the WAL file associated with that LSN.

There is a function for that.

> They also
> need to make sure that they have all of the WAL files between the
> starting LSN and the ending LSN.  Doing that requires understanding how
> the files are named to make sure there aren't any missing.

There is not a function for that, but there could be one.

-- 
Peter Eisentraut  http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] error handling in RegisterBackgroundWorker

2017-03-23 Thread Peter Eisentraut
On 2/15/17 12:11, Robert Haas wrote:
> On Wed, Feb 15, 2017 at 11:30 AM, Peter Eisentraut
>  wrote:
>> If RegisterBackgroundWorker() (the non-dynamic kind that is only
>> loadable from shared_preload_libraries) fails to register the worker, it
>> writes a log message and proceeds, ignoring the registration request.  I
>> think that is a mistake; it should be a hard error.  The only way in
>> practice to fix the problem is to change shared_preload_libraries or
>> max_worker_processes, both requiring a restart anyway, so proceeding
>> without the worker is not useful.
> 
> I guess the question is whether people will prefer to have the
> database start up and be missing the worker, or to have it not start.
> As you point out, the former is likely to result in an eventual
> restart, but the latter may lead to a longer period of downtime RIGHT
> NOW.  People tend to really hate things that make the database not
> start, so I'm not sure what's best here.

Any other thoughts on this?  Seems like a potential usability issue.
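
To be concrete, the change would essentially be a severity bump at the
points where registration currently fails with just a log message, e.g.
(a paraphrased sketch, not the exact source text):

	if (++numworkers > max_worker_processes)
		ereport(ERROR,		/* currently LOG, followed by "return" */
				(errcode(ERRCODE_CONFIGURATION_LIMIT_EXCEEDED),
				 errmsg("too many background workers"),
				 errhint("Consider increasing the configuration parameter \"max_worker_processes\".")));

Since ERROR is promoted to FATAL in the postmaster, the server would
then refuse to start instead of running without the worker.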

-- 
Peter Eisentraut  http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] pageinspect and hash indexes

2017-03-23 Thread Ashutosh Sharma
On Fri, Mar 24, 2017 at 9:46 AM, Ashutosh Sharma  wrote:
> On Fri, Mar 24, 2017 at 9:21 AM, Amit Kapila  wrote:
>> On Thu, Mar 23, 2017 at 6:46 PM, Ashutosh Sharma  
>> wrote:

 Oh, okay, but my main objection was that we should not check the hash
 page type (hasho_flag) without first ensuring that it is a hash page.
 Can you try to adjust the above code so that this check can be moved
 after the hasho_page_id check?
>>>
>>> Yes, I got your point. I have done that, but then I had to remove the
>>> check for PageIsEmpty(). Anyway, I think the PageIsEmpty() condition
>>> will only be true for one page in the entire hash index and can be
>>> ignored. If you wish, I could mention it in the documentation.
>>>
>>
>> Yeah, I think it is worth adding a Note in docs, but we can do that
>> separately if required.
>>

> To avoid
> this, at the start of the verify_hash_page function itself, if we
> recognise the page as an UNUSED page, we return immediately.
>
>>
>> 2.
>> + /* Check if it is an empty hash page. */
>> + if (PageIsEmpty(page))
>> + ereport(ERROR,
>> + (errcode(ERRCODE_INDEX_CORRUPTED),
>> + errmsg("index table contains empty page")));
>>
>>
>> Do we want to give a separate message for EMPTY and NEW pages?  Isn't
>> it better to give the same error message for both of them, as from the
>> user's perspective there is not much difference between the two
>> messages?
>
> I think we should show separate messages because they are two
> different types of pages: one is initialised whereas the other is
> completely zero.
>

 I understand your point, but I'm not sure it makes any difference to the user.

>>>
>>> okay, I have now anyway removed the check for PageIsEmpty. Please
>>> refer to the attached '0002
>>> allow_pageinspect_handle_UNUSED_hash_pages.patch'.
>>>
>>
>> +
>> if (pageopaque->hasho_page_id != HASHO_PAGE_ID)
>> ereport(ERROR,
>>
>> spurious white space.
>>
>>>
>>> Also, I have attached
>>> '0001-Mark-freed-overflow-page-as-UNUSED-pagetype-v2.patch' that
>>> handles your comment mentioned in [1].
>>>
>>
>> In general, we have to initialize prevblock with max_bucket, but here
>> it is okay, because we initialize it anyway after this page is
>> allocated as an overflow page.
>>
>> Your patches look good to me except for small white space change.
>
> Thanks for reviewing my patch. I have removed the extra white space.
> Attached are both the patches.

Sorry, I mistakenly attached the wrong patch. Here is the correct set
of patches.
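
For the archives, the check ordering that the pageinspect patch (0002)
establishes in verify_hash_page() is roughly this (a sketch, not the
patch text):

	pageopaque = (HashPageOpaque) PageGetSpecialPointer(page);

	/* establish that this is a hash page before trusting other fields */
	if (pageopaque->hasho_page_id != HASHO_PAGE_ID)
		ereport(ERROR,
				(errcode(ERRCODE_INDEX_CORRUPTED),
				 errmsg("page is not a hash page")));

	/* freed overflow pages are stamped LH_UNUSED_PAGE; return early */
	if (pageopaque->hasho_flag == LH_UNUSED_PAGE)
		return page;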

--
With Regards,
Ashutosh Sharma
EnterpriseDB:http://www.enterprisedb.com
From 86eb711ea80b495c9d2f5228558682d0e95c41eb Mon Sep 17 00:00:00 2001
From: ashu 
Date: Thu, 23 Mar 2017 16:47:17 +0530
Subject: [PATCH] Mark freed overflow page as UNUSED pagetype v2

---
 src/backend/access/hash/hash_xlog.c |  9 +
 src/backend/access/hash/hashovfl.c  | 13 -
 2 files changed, 21 insertions(+), 1 deletion(-)

diff --git a/src/backend/access/hash/hash_xlog.c b/src/backend/access/hash/hash_xlog.c
index d9ac42c..2ccaf46 100644
--- a/src/backend/access/hash/hash_xlog.c
+++ b/src/backend/access/hash/hash_xlog.c
@@ -697,11 +697,20 @@ hash_xlog_squeeze_page(XLogReaderState *record)
	if (XLogReadBufferForRedo(record, 2, &ovflbuf) == BLK_NEEDS_REDO)
 	{
 		Page		ovflpage;
+		HashPageOpaque ovflopaque;
 
 		ovflpage = BufferGetPage(ovflbuf);
 
 		_hash_pageinit(ovflpage, BufferGetPageSize(ovflbuf));
 
+		ovflopaque = (HashPageOpaque) PageGetSpecialPointer(ovflpage);
+
+		ovflopaque->hasho_prevblkno = InvalidBlockNumber;
+		ovflopaque->hasho_nextblkno = InvalidBlockNumber;
+		ovflopaque->hasho_bucket = -1;
+		ovflopaque->hasho_flag = LH_UNUSED_PAGE;
+		ovflopaque->hasho_page_id = HASHO_PAGE_ID;
+
 		PageSetLSN(ovflpage, lsn);
 		MarkBufferDirty(ovflbuf);
 	}
diff --git a/src/backend/access/hash/hashovfl.c b/src/backend/access/hash/hashovfl.c
index a3cae21..d3eaee0 100644
--- a/src/backend/access/hash/hashovfl.c
+++ b/src/backend/access/hash/hashovfl.c
@@ -591,9 +591,20 @@ _hash_freeovflpage(Relation rel, Buffer bucketbuf, Buffer ovflbuf,
 	/*
 	 * Initialize the freed overflow page.  Just zeroing the page won't work,
 	 * because WAL replay routines expect pages to be initialized. See
-	 * explanation of RBM_NORMAL mode atop XLogReadBufferExtended.
+	 * explanation of RBM_NORMAL mode atop XLogReadBufferExtended. Also, mark
+	 * the page as UNUSED type and retain its page id. This allows tools
+	 * like pageinspect to consider it as a hash page.
 	 */
 	_hash_pageinit(ovflpage, BufferGetPageSize(ovflbuf));
+
+	ovflopaque = (HashPageOpaque) PageGetSpecialPointer(ovflpage);
+
+	ovflopaque->hasho_prevblkno = InvalidBlockNumber;
+	ovflopaque->hasho_nextblkno = InvalidBlockNumber;
+	ovflopaque->hasho_bucket = -1;
+	ovflopaque->hasho_flag = LH_UNUSED_PAGE;
+	ovflopaque->hasho_page_id = HASHO_PAGE_ID;

[HACKERS] comment/security label for publication/subscription

2017-03-23 Thread Peter Eisentraut
Here is a patch to add COMMENT support for publications and subscriptions.

On a similar issue, do we need SECURITY LABEL support for those?  Does
that make sense?

-- 
Peter Eisentraut  http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


0001-Add-COMMENT-support-for-publications-and-subscriptio.patch
Description: invalid/octet-stream

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] pageinspect and hash indexes

2017-03-23 Thread Ashutosh Sharma
On Fri, Mar 24, 2017 at 9:21 AM, Amit Kapila  wrote:
> On Thu, Mar 23, 2017 at 6:46 PM, Ashutosh Sharma  
> wrote:
>>>
>>> Oh, okay, but my main objection was that we should not check the hash
>>> page type (hasho_flag) without first ensuring that it is a hash page.
>>> Can you try to adjust the above code so that this check can be moved
>>> after the hasho_page_id check?
>>
>> Yes, I got your point. I have done that, but then I had to remove the
>> check for PageIsEmpty(). Anyway, I think the PageIsEmpty() condition
>> will only be true for one page in the entire hash index and can be
>> ignored. If you wish, I could mention it in the documentation.
>>
>
> Yeah, I think it is worth adding a Note in docs, but we can do that
> separately if required.
>
>>>
 To avoid
 this, at the start of the verify_hash_page function itself, if we
 recognise the page as an UNUSED page, we return immediately.

>
> 2.
> + /* Check if it is an empty hash page. */
> + if (PageIsEmpty(page))
> + ereport(ERROR,
> + (errcode(ERRCODE_INDEX_CORRUPTED),
> + errmsg("index table contains empty page")));
>
>
> Do we want to give a separate message for EMPTY and NEW pages?  Isn't
> it better to give the same error message for both of them, as from the
> user's perspective there is not much difference between the two
> messages?

 I think we should show separate messages because they are two
 different types of pages: one is initialised whereas the other is
 completely zero.

>>>
>>> I understand your point, but I'm not sure it makes any difference to the user.
>>>
>>
>> okay, I have now anyway removed the check for PageIsEmpty. Please
>> refer to the attached '0002
>> allow_pageinspect_handle_UNUSED_hash_pages.patch'.
>>
>
> +
> if (pageopaque->hasho_page_id != HASHO_PAGE_ID)
> ereport(ERROR,
>
> spurious white space.
>
>>
>> Also, I have attached
>> '0001-Mark-freed-overflow-page-as-UNUSED-pagetype-v2.patch' that
>> handles your comment mentioned in [1].
>>
>
> In general, we have to initialize prevblock with max_bucket, but here
> it is okay, because we initialize it anyway after this page is
> allocated as an overflow page.
>
> Your patches look good to me except for small white space change.

Thanks for reviewing my patch. I have removed the extra white space.
Attached are both the patches.

--
With Regards,
Ashutosh Sharma
EnterpriseDB:http://www.enterprisedb.com
From 86eb711ea80b495c9d2f5228558682d0e95c41eb Mon Sep 17 00:00:00 2001
From: ashu 
Date: Thu, 23 Mar 2017 16:47:17 +0530
Subject: [PATCH] Mark freed overflow page as UNUSED pagetype v2

---
 src/backend/access/hash/hash_xlog.c |  9 +
 src/backend/access/hash/hashovfl.c  | 13 -
 2 files changed, 21 insertions(+), 1 deletion(-)

diff --git a/src/backend/access/hash/hash_xlog.c b/src/backend/access/hash/hash_xlog.c
index d9ac42c..2ccaf46 100644
--- a/src/backend/access/hash/hash_xlog.c
+++ b/src/backend/access/hash/hash_xlog.c
@@ -697,11 +697,20 @@ hash_xlog_squeeze_page(XLogReaderState *record)
	if (XLogReadBufferForRedo(record, 2, &ovflbuf) == BLK_NEEDS_REDO)
 	{
 		Page		ovflpage;
+		HashPageOpaque ovflopaque;
 
 		ovflpage = BufferGetPage(ovflbuf);
 
 		_hash_pageinit(ovflpage, BufferGetPageSize(ovflbuf));
 
+		ovflopaque = (HashPageOpaque) PageGetSpecialPointer(ovflpage);
+
+		ovflopaque->hasho_prevblkno = InvalidBlockNumber;
+		ovflopaque->hasho_nextblkno = InvalidBlockNumber;
+		ovflopaque->hasho_bucket = -1;
+		ovflopaque->hasho_flag = LH_UNUSED_PAGE;
+		ovflopaque->hasho_page_id = HASHO_PAGE_ID;
+
 		PageSetLSN(ovflpage, lsn);
 		MarkBufferDirty(ovflbuf);
 	}
diff --git a/src/backend/access/hash/hashovfl.c b/src/backend/access/hash/hashovfl.c
index a3cae21..d3eaee0 100644
--- a/src/backend/access/hash/hashovfl.c
+++ b/src/backend/access/hash/hashovfl.c
@@ -591,9 +591,20 @@ _hash_freeovflpage(Relation rel, Buffer bucketbuf, Buffer ovflbuf,
 	/*
 	 * Initialize the freed overflow page.  Just zeroing the page won't work,
 	 * because WAL replay routines expect pages to be initialized. See
-	 * explanation of RBM_NORMAL mode atop XLogReadBufferExtended.
+	 * explanation of RBM_NORMAL mode atop XLogReadBufferExtended. Also, mark
+	 * the page as UNUSED type and retain its page id. This allows tools
+	 * like pageinspect to consider it as a hash page.
 	 */
 	_hash_pageinit(ovflpage, BufferGetPageSize(ovflbuf));
+
+	ovflopaque = (HashPageOpaque) PageGetSpecialPointer(ovflpage);
+
+	ovflopaque->hasho_prevblkno = InvalidBlockNumber;
+	ovflopaque->hasho_nextblkno = InvalidBlockNumber;
+	ovflopaque->hasho_bucket = -1;
+	ovflopaque->hasho_flag = LH_UNUSED_PAGE;
+	ovflopaque->hasho_page_id = HASHO_PAGE_ID;
+
 	MarkBufferDirty(ovflbuf);
 
 	if (BufferIsValid(prevbuf))
-- 
1.8.3.1

diff --git a/contrib/pageinspect/hashfuncs.c b/contrib/pageinspect/hashfuncs.c
index 812a03f..6ba8847 

Re: [HACKERS] free space map and visibility map

2017-03-23 Thread Masahiko Sawada
On Fri, Mar 24, 2017 at 11:01 AM, Kyotaro HORIGUCHI
 wrote:
> At Wed, 22 Mar 2017 02:15:26 +0900, Masahiko Sawada  
> wrote in 
>> On Mon, Mar 20, 2017 at 11:28 PM, Robert Haas  wrote:
>> > On Sat, Mar 18, 2017 at 5:42 PM, Jeff Janes  wrote:
>> >> Isn't HEAP2_CLEAN only issued before an intended HOT update?  (Which then
>> >> can't leave the block as all visible or all frozen).  I think the issue
>> >> here is HEAP2_VISIBLE or HEAP2_FREEZE_PAGE.  Am I reading this correctly,
>> >> that neither of those ever update the FSM, regardless of FPI?
>> >
>> > Yes, updates to the FSM are never logged.  Forcing replay of
>> > HEAP2_FREEZE_PAGE to update the FSM might be a good idea.
>> >
>>
>> I think I was missing something. I imagined your situation is that an
>> FPI is replayed during crash recovery after the crashed server vacuumed
>> the page and marked it as all-frozen. But this situation is also
>> resolved by that solution.
>
> # HEAP2_CLEAN is issued in lazy_vacuum_page
>
> It will work, but I'm not sure it is the right direction for
> HEAP2_FREEZE_PAGE to touch the FSM.
>
> As Masahiko said, the situation must be created by HEAP2_VISIBLE
> without preceding HEAP2_CLEAN, or with HEAP2_CLEAN with FPI. I
> think only the latter can happen. The comment in heap_xlog_clean
> below is generally right, but if a page filled with tuples becomes
> almost empty and freezable by this cleanup, a problematic
> situation like this occurs.
>
>> /*
>>  * Update the FSM as well.
>>  *
>>  * XXX: Don't do this if the page was restored from full page image. We
>>  * don't bother to update the FSM in that case, it doesn't need to be
>>  * totally accurate anyway.
>>  */
>> if (action == BLK_NEEDS_REDO)
>>   XLogRecordPageWithFreeSpace(rnode, blkno, freespace);
>
> HEAP_INSERT/HEAP2_MULTI_INSERT/UPDATE do something similar. All of
> these reduce free space, but HEAP2_CLEAN increases it. HEAP2_CLEAN
> occurs less frequently than the other three, so I suppose HEAP2_CLEAN
> could always update the FSM.
>
> Even if the page is not frozen, a similar situation arises with just
> ALL_VISIBLE. Without any updates on the page, the free-space
> information for the page won't be corrected until the next freezing
> (or 'aggressive') vacuum occurs.
>
> From this point of view, HEAP2_FREEZE_PAGE is not responsible for
> updating FSM. But if we see that always updating FSM on
> HEAP2_CLEAN is too much, HEAP2_FREEZE_PAGE would be the next way
> to go.
>
> (I don't understand the reason for skipping the FSM update only for
>  FPIs. This seems to have been introduced by f8f42279.)
>

This code was introduced by e9816533e39be464227b748ee5eeb3d9f688cd76,
and the discussion is here [1].
ISTM that this code was written on the assumption that every page will
be vacuumed eventually. But now that we have the freeze map and some
pages may never be vacuumed, it would be worth considering that
behavior again.

[1] 
https://www.postgresql.org/message-id/flat/49072021.7010801%40enterprisedb.com#49072021.7010...@enterprisedb.com

Regards,

--
Masahiko Sawada
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] pageinspect and hash indexes

2017-03-23 Thread Amit Kapila
On Thu, Mar 23, 2017 at 6:46 PM, Ashutosh Sharma  wrote:
>>
>> Oh, okay, but my main objection was that we should not check the hash
>> page type (hasho_flag) without first ensuring that it is a hash page.
>> Can you try to adjust the above code so that this check can be moved
>> after the hasho_page_id check?
>
> Yes, I got your point. I have done that, but then I had to remove the
> check for PageIsEmpty(). Anyway, I think the PageIsEmpty() condition
> will only be true for one page in the entire hash index and can be
> ignored. If you wish, I could mention it in the documentation.
>

Yeah, I think it is worth adding a Note in docs, but we can do that
separately if required.

>>
>>> To avoid
>>> this, at the start of the verify_hash_page function itself, if we
>>> recognise the page as an UNUSED page, we return immediately.
>>>

 2.
 + /* Check if it is an empty hash page. */
 + if (PageIsEmpty(page))
 + ereport(ERROR,
 + (errcode(ERRCODE_INDEX_CORRUPTED),
 + errmsg("index table contains empty page")));


 Do we want to give a separate message for EMPTY and NEW pages?  Isn't
 it better to give the same error message for both of them, as from the
 user's perspective there is not much difference between the two
 messages?
>>>
>>> I think we should show separate messages because they are two
>>> different types of pages: one is initialised whereas the other is
>>> completely zero.
>>>
>>
>> I understand your point, but I'm not sure it makes any difference to the user.
>>
>
> okay, I have now anyway removed the check for PageIsEmpty. Please
> refer to the attached '0002
> allow_pageinspect_handle_UNUSED_hash_pages.patch'.
>

+
if (pageopaque->hasho_page_id != HASHO_PAGE_ID)
ereport(ERROR,

spurious white space.

>
> Also, I have attached
> '0001-Mark-freed-overflow-page-as-UNUSED-pagetype-v2.patch' that
> handles your comment mentioned in [1].
>

In general, we have to initialize prevblock with max_bucket, but here
it is okay, because we initialize it anyway after this page is
allocated as an overflow page.

Your patches look good to me except for small white space change.


-- 
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


[HACKERS] pg_dump --sequence-data option

2017-03-23 Thread Peter Eisentraut
At the conclusion of
, pg_upgrade was
changed to upgrade sequences "logically".  We initially did that by
adding a pg_dump option --sequence-data that would dump sequence data
(setval calls) in spite of --schema-only.  Later, that option was
removed as a separate option and made automatic in --binary-upgrade,
because we didn't think it was useful independently.

For logical replication, sequence data is not replicated at the moment.
So it could be useful in some cases to have a simple way to dump just
the sequence data.  So I think it would be worth adding that option back.

Comments?

-- 
Peter Eisentraut  http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Allow pg_dumpall to work without pg_authid

2017-03-23 Thread Peter Eisentraut
On 3/21/17 23:34, Tom Lane wrote:
> Peter Eisentraut  writes:
>> No answer.  Can we remove this chunk?
> 
>>> +   if (no_role_passwords && binary_upgrade)
> 
> Perhaps, but why?  ISTM that trying to run pg_upgrade as non-superuser
> is a nonstarter for a number of reasons, while if you're superuser you
> do not need --no-role-passwords.

Well, this code was added, apparently without reason.  We don't need to
actively prohibit option combinations just because they are unusual.

-- 
Peter Eisentraut  http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] PATCH: pageinspect / add page_checksum and bt_page_items(bytea)

2017-03-23 Thread Peter Eisentraut
On 3/17/17 18:35, Tomas Vondra wrote:
> On 03/17/2017 05:23 PM, Peter Eisentraut wrote:
>> I'm struggling to find a good way to share code between
>> bt_page_items(text, int4) and bt_page_items(bytea).
>>
>> If we do it via the SQL route, as I had suggested, it makes the
>> extension non-relocatable, and it will also create a bit of a mess
>> during upgrades.
>>
>> If doing it in C, it will be a bit tricky to pass the SRF context
>> around.  There is no "DirectFunctionCall within SRF context", AFAICT.
> 
> Not sure what it has to do with DirectFunctionCall? You want to call the 
> bytea variant from the existing one? Wouldn't it be easier to simply 
> define a static function with the shared parts, and pass around the 
> fctx/fcinfo? Not quite pretty, but should work.

Perhaps what was added in

would actually work here.

-- 
Peter Eisentraut  http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [COMMITTERS] pgsql: Sync pg_dump and pg_dumpall output

2017-03-23 Thread Michael Paquier
On Thu, Mar 23, 2017 at 1:10 AM, Stephen Frost  wrote:
> Andrew,
>
> * Andrew Dunstan (andrew.duns...@2ndquadrant.com) wrote:
>> On 03/22/2017 11:39 AM, Stephen Frost wrote:
>> > * Andrew Dunstan (and...@dunslane.net) wrote:
>> >> Sync pg_dump and pg_dumpall output
>> > This probably should have adjusted all callers of pg_dump in the
>> > regression tests to use the --no-sync option, otherwise we'll end up
>> > spending possibly a good bit of time calling fsync() during the
>> > regression tests unnecessarily.
>>
>> All of them? The impact is not likely to be huge in most cases
>> (possibly different on Windows). On crake, the bin-check stage actually
>> took less time after the change than before, so I suspect that the
>> impact will be pretty small.
>
> Well, perhaps not all, but I'd think --no-sync would be better to have
> in most cases.  We turn off fsync for all of the TAP tests and all
> initdb calls, so it seems like we should here too.  Perhaps it won't be
> much overhead on an unloaded system, but not all of the buildfarm
> animals seem to be on unloaded systems, nor are they particularly fast
> in general, nor do they have fast I/O.

Agreed.

>> Still I agree that we should have tests for both cases.
>
> Perhaps, though if I recall correctly, we've basically had zero calls
> for fsync() until this.  If we don't feel like we need to test that in
> the backend then it seems a bit silly to feel like we need it for
> pg_dump's regression coverage.
>
> That said, perhaps the right answer is to have a couple tests for both
> the backend and for pg_dump which do exercise the fsync-enabled paths.

And agreed.

The attached patch uses --no-sync in most places for pg_dump, except
in 4 places in 002_pg_dump.pl, so as to also stress the sync code path.
-- 
Michael


test-dump-nosync.patch
Description: Binary data

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [pgsql-www] [HACKERS] Small issue in online devel documentation build

2017-03-23 Thread Peter Eisentraut
I think the fix belongs into the web site CSS, so there is nothing to
commit into PostgreSQL here.  I will close the commit fest entry, but I
have added a section to the open items list so we keep track of it.
(https://wiki.postgresql.org/wiki/PostgreSQL_10_Open_Items#Documentation_tool_chain)

-- 
Peter Eisentraut  http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Do we create a new roadmap page for development?

2017-03-23 Thread Tsunakawa, Takayuki
From: pgsql-hackers-ow...@postgresql.org
> [mailto:pgsql-hackers-ow...@postgresql.org] On Behalf Of Robert Haas
> On Tue, Mar 21, 2017 at 2:36 AM, Tsunakawa, Takayuki
>  wrote:
> > Should I create a page for PostgreSQL 11 likewise?  Or, do you want a
> more stable page named "PostgreSQL Roadmap" instead of "PostgreSQL11
> Roadmap" and attach the target version to each feature as in "Feature X
> (V11)", so that we can represent longer-term goals?
> 
> I think a page that's just called PostgreSQL Roadmap will get out of date
> pretty quickly.  Ultimately it'll end up like the TODO list, which is not
> really a list of things TO DO.  The advantage of the version-specific pages
> is that old information kind of ages its way out of the system.

Maybe so.  I created a page for PostgreSQL 11 and added a link to our roadmap:

https://wiki.postgresql.org/wiki/PostgreSQL11_Roadmap

Please add your roadmaps when you can.


> > And, why don't we add a link to the above roadmap page in the "Development
> information" page?
> >
> > Development information
> > https://wiki.postgresql.org/wiki/Development_information
> 
> I'm sure nobody will object to you doing that.  It's a wiki, and intended
> to be edited.

Thanks, done. Also, I moved the existing link to "Development projects"
from "Past PgCon Developer Meeting Notes" to the new header "Roadmaps
and Projects", because the development projects page doesn't appear to
fit the PgCon notes.

Regards
Takayuki


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] free space map and visibility map

2017-03-23 Thread Kyotaro HORIGUCHI
At Wed, 22 Mar 2017 02:15:26 +0900, Masahiko Sawada  
wrote in 
> On Mon, Mar 20, 2017 at 11:28 PM, Robert Haas  wrote:
> > On Sat, Mar 18, 2017 at 5:42 PM, Jeff Janes  wrote:
> >> Isn't HEAP2_CLEAN only issued before an intended HOT update?  (Which then
> >> can't leave the block as all visible or all frozen).  I think the issue
> >> here is HEAP2_VISIBLE or HEAP2_FREEZE_PAGE.  Am I reading this correctly,
> >> that neither of those ever update the FSM, regardless of FPI?
> >
> > Yes, updates to the FSM are never logged.  Forcing replay of
> > HEAP2_FREEZE_PAGE to update the FSM might be a good idea.
> >
> 
> I think I was missing something. I imagined your situation is that an
> FPI is replayed during crash recovery after the crashed server vacuumed
> the page and marked it as all-frozen. But this situation is also
> resolved by that solution.

# HEAP2_CLEAN is issued in lazy_vacuum_page

It will work, but I'm not sure it is the right direction for
HEAP2_FREEZE_PAGE to touch the FSM.

As Masahiko said, the situation must be created by HEAP2_VISIBLE
without preceding HEAP2_CLEAN, or with HEAP2_CLEAN with FPI. I
think only the latter can happen. The comment in heap_xlog_clean
below is generally right, but if a page filled with tuples becomes
almost empty and freezable by this cleanup, a problematic
situation like this occurs.

> /*
>  * Update the FSM as well.
>  *
>  * XXX: Don't do this if the page was restored from full page image. We
>  * don't bother to update the FSM in that case, it doesn't need to be
>  * totally accurate anyway.
>  */
> if (action == BLK_NEEDS_REDO)
>   XLogRecordPageWithFreeSpace(rnode, blkno, freespace);

HEAP_INSERT/HEAP2_MULTI_INSERT/UPDATE do something similar. All of
these reduce free space, but HEAP2_CLEAN increases it. HEAP2_CLEAN
occurs less frequently than the other three, so I suppose HEAP2_CLEAN
could always update the FSM.
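
A minimal sketch of that idea in heap_xlog_clean(), assuming we simply
extend the FSM update to FPI-restored pages as well (the free space
then has to be computed before the buffer is released):

	if (action == BLK_NEEDS_REDO || action == BLK_RESTORED)
		freespace = PageGetHeapFreeSpace(BufferGetPage(buffer));

	if (BufferIsValid(buffer))
		UnlockReleaseBuffer(buffer);

	if (action == BLK_NEEDS_REDO || action == BLK_RESTORED)
		XLogRecordPageWithFreeSpace(rnode, blkno, freespace);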

Even if the page is not frozen, a similar situation arises with just
ALL_VISIBLE. Without any updates on the page, the free-space
information for the page won't be corrected until the next freezing
(or 'aggressive') vacuum occurs.

From this point of view, HEAP2_FREEZE_PAGE is not responsible for
updating FSM. But if we see that always updating FSM on
HEAP2_CLEAN is too much, HEAP2_FREEZE_PAGE would be the next way
to go.

(I don't understand the reason for skipping the FSM update only for
 FPIs. This seems to have been introduced by f8f42279.)

regards,

-- 
Kyotaro Horiguchi
NTT Open Source Software Center



-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] WIP: Faster Expression Processing v4

2017-03-23 Thread Andres Freund
On 2017-03-23 21:58:03 -0400, Tom Lane wrote:
> Andres Freund  writes:
> > On 2017-03-23 21:26:19 -0400, Tom Lane wrote:
> >> Hmm, I see ... but that only works in the cases where the caller of
> >> ExecBuildProjectionInfo supplied a source slot, and a lot of 'em
> >> don't.
> 
> > Right, the old and new code comment on that:
> 
> >  * inputDesc can be NULL, but if it is not, we check to see whether simple
> >  * Vars in the tlist match the descriptor.  It is important to provide
> >  * inputDesc for relation-scan plan nodes, as a cross check that the 
> > relation
> >  * hasn't been changed since the plan was made.  At higher levels of a plan,
> >  * there is no need to recheck.
> 
> Ah, I'd forgotten the assumption that we only need to check this at scan
> level.
> 
> >> I'd already put in the infrastructure to add ASSIGN_FOO_VAR_FIRST
> >> step types.  I could take it back out, but I wonder if it wouldn't be
> >> smarter to keep it and remove the restriction in ExecBuildProjectionInfo.
> >> Or maybe we could have ExecBuildProjectionInfo emit either
> >> ASSIGN_FOO_VAR_FIRST or ASSIGN_FOO_VAR depending on whether it can prove
> >> the reference safe.
> 
> > I think it's probably ok to just leave the check in, and remove those
> > comments, and simplify the isSimpleVar stuff to only check if
> >  IsA(tle->expr, Var) && ((Var *) tle->expr)->varattno > 0
> 
> Not sure.  It's a pretty fair amount of duplicative code, once you finish
> dealing with all the ExecJustFoo functions in addition to the main code
> paths.  At this point I'm inclined to take it back out and improve the
> comments around ExecBuildProjectionInfo.

I'm ok with both.

- Andres


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] WIP: Faster Expression Processing v4

2017-03-23 Thread Tom Lane
Andres Freund  writes:
> On 2017-03-23 21:26:19 -0400, Tom Lane wrote:
>> Hmm, I see ... but that only works in the cases where the caller of
>> ExecBuildProjectionInfo supplied a source slot, and a lot of 'em
>> don't.

> Right, the old and new code comment on that:

>  * inputDesc can be NULL, but if it is not, we check to see whether simple
>  * Vars in the tlist match the descriptor.  It is important to provide
>  * inputDesc for relation-scan plan nodes, as a cross check that the relation
>  * hasn't been changed since the plan was made.  At higher levels of a plan,
>  * there is no need to recheck.

Ah, I'd forgotten the assumption that we only need to check this at scan
level.

>> I'd already put in the infrastructure to add ASSIGN_FOO_VAR_FIRST
>> step types.  I could take it back out, but I wonder if it wouldn't be
>> smarter to keep it and remove the restriction in ExecBuildProjectionInfo.
>> Or maybe we could have ExecBuildProjectionInfo emit either
>> ASSIGN_FOO_VAR_FIRST or ASSIGN_FOO_VAR depending on whether it can prove
>> the reference safe.

> I think it's probably ok to just leave the check in, and remove those
> comments, and simplify the isSimpleVar stuff to only check if
>  IsA(tle->expr, Var) && ((Var *) tle->expr)->varattno > 0

Not sure.  It's a pretty fair amount of duplicative code, once you finish
dealing with all the ExecJustFoo functions in addition to the main code
paths.  At this point I'm inclined to take it back out and improve the
comments around ExecBuildProjectionInfo.

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] increasing the default WAL segment size

2017-03-23 Thread Jeff Janes
On Thu, Mar 23, 2017 at 1:45 PM, Peter Eisentraut
<peter.eisentr...@2ndquadrant.com> wrote:

> On 3/22/17 17:33, David Steele wrote:
>
> > and I doubt that most tool writers would be quick to
> > add support for a feature that very few people (if any) use.
>
> I'm not one of those tool writers, although I have written my share of
> DBA scripts over the years, but I wonder why those tools would really
> care.  They are handed files with predetermined names to archive, and
> for restore files with predetermined names are requested back from them.
>  What else do they need?  If something is missing that requires them to
> parse file names, then maybe that should be added.
>


I have a pg_restore which predicts the file 5 files ahead of the one it was
asked for, and initiates a pre-fetch and decompression of it. Then it
delivers the file it was asked for, either by pulling it out of the
pre-staging area set up by the N-5th invocation, or by going directly to
the archive to get it.  This speeds up play-back dramatically when the
files are stored compressed and non-local.

That is why I need to know how the files are numbered.  I don't think that
that makes much of a difference, though.  Any change is going to break
that, no matter which change.  Then I'll fix it.

If we are going to break it, I'd prefer to just do away with the 'segment'
thing altogether.  You have timelines, and you have files.  That's it.

Cheers,

Jeff


Re: [HACKERS] No more libedit?! - openssl plans to switch to APL2

2017-03-23 Thread Andres Freund
On 2017-03-24 02:31:47 +0100, Andreas Karlsson wrote:
> On 08/01/2015 05:14 PM, Andres Freund wrote:
> > According to https://www.openssl.org/blog/blog/2015/08/01/cla/ openssl
> > is planning to relicense to the apache license 2.0. While APL2 is not
> compatible with GPL2, it *is* compatible with GPL3.
> 
> Great! This means that the Debian packages will eventually be able to drop
> their LD_PRELOAD hack, which never worked perfectly, since compiling
> against the libedit or libreadline headers results in different binaries.

Well, relicensing is hard. It's far from guaranteed to succeed...

Greetings,

Andres Freund


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] WIP: Faster Expression Processing v4

2017-03-23 Thread Andres Freund
On 2017-03-23 21:26:19 -0400, Tom Lane wrote:
> Andres Freund  writes:
> > On 2017-03-23 20:36:32 -0400, Tom Lane wrote:
> >> The problem here is that once we drop the buffer pin, any pointers we may
> >> have into on-disk data are dangling pointers --- we're at risk of some
> >> other backend taking away that shared buffer.  (So I'm afraid that the
> >> commentary there is overly optimistic.)  So even though those pointers
> >> may still be there beyond tts_nvalid, subsequent references to them are
> >> very dangerous.
> 
> > This applies to the code in master as well, no?  An ExecEvalScalarVar()
> > followed by an ExecEvalWholeRowVar() would have precisely the same
> > effect?
> 
> Yeah.  The other order would be safe, because ExecEvalScalarVar would do
> slot_getattr which would re-extract the value from the newly materialized
> tuple.  But there definitely seems to be a hazard for the order you
> mentioned.
> 
> > Do we need to do anything about this in the back-branches,
> > given how unlikely this is going to be in practice?
> 
> Probably not.  As I mentioned, I think this may be only theoretical rather
> than real, if you believe that buffer pins would only be associated with
> slots holding references to regular tuples.  And even if it's not
> theoretical, the odds of seeing a failure in the field seem pretty tiny
> given that a just-released buffer shouldn't be subject to recycling for
> a fair while.  But I don't want to leave it like this going forward.

Ok.


> >> Also, while trying to test the above scenario, I realized that the patch
> >> as submitted was being way too cavalier about where it was applying 
> >> CheckVarSlotCompatibility and so on.  The ASSIGN_FOO_VAR steps, for
> >> instance, had no protection at all.
> 
> > I don't think that's true - the assign checks had copied the code from
> > the old ExecBuildProjectionInfo, setting isSimpleVar iff
> > (!attr->attisdropped && variable->vartype == attr->atttypid) - we can
> > check that for projections in contrast to normal expressions because we
> > already know the slot.
> 
> Hmm, I see ... but that only works in the cases where the caller of
> ExecBuildProjectionInfo supplied a source slot, and a lot of 'em
> don't.

Right, the old and new code comment on that:

 * inputDesc can be NULL, but if it is not, we check to see whether simple
 * Vars in the tlist match the descriptor.  It is important to provide
 * inputDesc for relation-scan plan nodes, as a cross check that the relation
 * hasn't been changed since the plan was made.  At higher levels of a plan,
 * there is no need to recheck.

and that seems reasonable to me.  That said, I think we can remove
that assumption by checking once.


> As the code stands, we are unable to use ASSIGN_FOO_VAR in quite a lot
> of places, including everywhere above the relation scan level.

Hm? If inputDesc isn't given we just, before and after, do:
	if (!inputDesc)
		isSimpleVar = true;		/* can't check type, assume OK */



> I'd already put in the infrastructure to add ASSIGN_FOO_VAR_FIRST
> step types.  I could take it back out, but I wonder if it wouldn't be
> smarter to keep it and remove the restriction in ExecBuildProjectionInfo.
> Or maybe we could have ExecBuildProjectionInfo emit either
> ASSIGN_FOO_VAR_FIRST or ASSIGN_FOO_VAR depending on whether it can prove
> the reference safe.

I think it's probably ok to just leave the check in, and remove those
comments, and simplify the isSimpleVar stuff to only check if
 IsA(tle->expr, Var) && ((Var *) tle->expr)->varattno > 0

- Andres


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] No more libedit?! - openssl plans to switch to APL2

2017-03-23 Thread Andreas Karlsson

On 08/01/2015 05:14 PM, Andres Freund wrote:

According to https://www.openssl.org/blog/blog/2015/08/01/cla/ openssl
is planning to relicense to the apache license 2.0. While APL2 is not
compatible with GPL2, it *is* compatible with GPL3.


Great! This means that the Debian packages will eventually be able to
drop their LD_PRELOAD hack, which never worked perfectly, since
compiling against the libedit or libreadline headers results in
different binaries.


Andreas


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] ICU integration

2017-03-23 Thread Craig Ringer
On 24 March 2017 at 04:54, Peter Eisentraut
 wrote:

> Fixed.

Congratulations on getting this done. It's great work, and it'll make
a whole class of potential bugs and platform portability warts go away
if widely adopted.

-- 
 Craig Ringer   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Measuring replay lag

2017-03-23 Thread Craig Ringer
On 24 March 2017 at 05:39, Thomas Munro  wrote:

> Fujii-san for the idea of tracking write and flush lag too

You mentioned wishing that logical replication would update sent lag
as the decoding position.

It appears to do just that already; see the references to restart_lsn
in StartLogicalReplication, and the update of sentPtr in
XLogSendLogical .

It's a bit misleading, since it hasn't *sent* anything; it buffers the
data until commit. But it's useful.
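
For reference, the relevant bit of XLogSendLogical() looks roughly like
this (paraphrased from memory, so check the source):

	record = XLogReadRecord(logical_decoding_ctx->reader,
							logical_startptr, &errm);
	logical_startptr = InvalidXLogRecPtr;

	if (record != NULL)
	{
		LogicalDecodingProcessRecord(logical_decoding_ctx,
									 logical_decoding_ctx->reader);
		sentPtr = logical_decoding_ctx->reader->EndRecPtr;
	}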

-- 
 Craig Ringer   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] WIP: Faster Expression Processing v4

2017-03-23 Thread Tom Lane
Andres Freund  writes:
> On 2017-03-23 20:36:32 -0400, Tom Lane wrote:
>> The problem here is that once we drop the buffer pin, any pointers we may
>> have into on-disk data are dangling pointers --- we're at risk of some
>> other backend taking away that shared buffer.  (So I'm afraid that the
>> commentary there is overly optimistic.)  So even though those pointers
>> may still be there beyond tts_nvalid, subsequent references to them are
>> very dangerous.

> This applies to the code in master as well, no?  An ExecEvalScalarVar()
> followed by an ExecEvalWholeRowVar() would have precisely the same
> effect?

Yeah.  The other order would be safe, because ExecEvalScalarVar would do
slot_getattr which would re-extract the value from the newly materialized
tuple.  But there definitely seems to be a hazard for the order you
mentioned.
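
To illustrate the hazard concretely (a made-up fragment, not code from
the patch):

	/* d may point into a shared buffer pinned by the slot */
	Datum		d = slot_getattr(slot, attnum, &isnull);

	/* materializes the slot and drops the buffer pin */
	Datum		wr = ExecFetchSlotTupleDatum(slot);

	/* any further use of d is now a potential dangling reference */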

> Do we need to do anything about this in the back-branches,
> given how unlikely this is going to be in practice?

Probably not.  As I mentioned, I think this may be only theoretical rather
than real, if you believe that buffer pins would only be associated with
slots holding references to regular tuples.  And even if it's not
theoretical, the odds of seeing a failure in the field seem pretty tiny
given that a just-released buffer shouldn't be subject to recycling for
a fair while.  But I don't want to leave it like this going forward.

>> So I think that we have got to fix ExecEvalWholeRowVar so that it doesn't
>> clobber the state of the slot.

> That seems like a good plan.

Yeah.  I have the stopgap code in my working copy, and will look at
refactoring the tuptoaster code for better performance later.

>> Also, while trying to test the above scenario, I realized that the patch
>> as submitted was being way too cavalier about where it was applying 
>> CheckVarSlotCompatibility and so on.  The ASSIGN_FOO_VAR steps, for
>> instance, had no protection at all.

> I don't think that's true - the assign checks had copied the code from
> the old ExecBuildProjectionInfo, setting isSimpleVar iff
> (!attr->attisdropped && variable->vartype == attr->atttypid) - we can
> check that for projections in contrast to normal expressions because we
> already know the slot.

Hmm, I see ... but that only works in the cases where the caller of
ExecBuildProjectionInfo supplied a source slot, and a lot of 'em don't.
As the code stands, we are unable to use ASSIGN_FOO_VAR in quite a lot
of places, including everywhere above the relation scan level.

I'd already put in the infrastructure to add ASSIGN_FOO_VAR_FIRST
step types.  I could take it back out, but I wonder if it wouldn't be
smarter to keep it and remove the restriction in ExecBuildProjectionInfo.
Or maybe we could have ExecBuildProjectionInfo emit either
ASSIGN_FOO_VAR_FIRST or ASSIGN_FOO_VAR depending on whether it can prove
the reference safe.

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] No more libedit?! - openssl plans to switch to APL2

2017-03-23 Thread Andres Freund
On 2015-08-01 17:14:10 +0200, Andres Freund wrote:
> According to https://www.openssl.org/blog/blog/2015/08/01/cla/ openssl
> is planning to relicense to the apache license 2.0. While APL2 is not
> compatible with GLP2 it *is* compatible with GPL3.

Just 5 minutes later, some progress on that front:

https://www.openssl.org/blog/blog/2017/03/20/license/

- Andres


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] WIP: Faster Expression Processing v4

2017-03-23 Thread Andres Freund
On 2017-03-23 20:36:32 -0400, Tom Lane wrote:
> So I think that we have got to fix ExecEvalWholeRowVar so that it doesn't
> clobber the state of the slot.  Right at the moment, the only way to do
> that seems to be to do this instead of ExecFetchSlotTupleDatum:
> 
> tuple = ExecCopySlotTuple(slot);
> dtuple = (HeapTupleHeader)
>     DatumGetPointer(heap_copy_tuple_as_datum(tuple,
>                                              slot->tts_tupleDescriptor));
> heap_freetuple(tuple);

Hm.  One disadvantage would be that repeated whole-row references to the
same table would be a bit slower, because we'd repeatedly form a tuple
from a virtual one - but I have a hard time coming up with a scenario
where that'd matter.  I'd suspect that in the end it'd probably even
have a *positive* performance impact, because right now the next scalar
access will have to deform the whole tuple again, and that seems like a
much more likely scenario.

Greetings,

Andres Freund


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Partitioned tables and relfilenode

2017-03-23 Thread Amit Langote
On 2017/03/23 23:47, Amit Langote wrote:
> On Thu, Mar 23, 2017 at 11:27 PM, Maksim Milyutin
>  wrote:
>> Hi!
>>
>> I have noticed that DROP TABLE on a partitioned table schedules
>> unlinking of nonexistent physical storage. Although this doesn't
>> generate an error under typical behavior of postgres, because the
>> missing storage is caught by an if-statement [1], I think it is not
>> safe.
>>
>> My patch fixes this issue.
>>
>> 1. src/backend/storage/smgr/md.c:1385
> 
> Good catch, will incorporate that in the main patch.

And here is the updated patch.

Thanks,
Amit
>From aed0685b6bfb99a301e674221fb393cc4e660461 Mon Sep 17 00:00:00 2001
From: amit 
Date: Tue, 24 Jan 2017 14:22:34 +0900
Subject: [PATCH] Do not allocate storage for partitioned tables.

Currently, it is not possible to insert any data into a partitioned
table.  So they're empty at all times, which means it is wasteful to
allocate relfilenode and related storage objects.

Also documented the fact that specifying storage options for partitioned
tables is ineffective at the moment.
---
 doc/src/sgml/ref/create_table.sgml |  6 ++
 src/backend/catalog/heap.c | 17 ++---
 2 files changed, 16 insertions(+), 7 deletions(-)

diff --git a/doc/src/sgml/ref/create_table.sgml b/doc/src/sgml/ref/create_table.sgml
index 9ed25c05da..7ac55b17fa 100644
--- a/doc/src/sgml/ref/create_table.sgml
+++ b/doc/src/sgml/ref/create_table.sgml
@@ -1040,6 +1040,12 @@ CREATE [ [ GLOBAL | LOCAL ] { TEMPORARY | TEMP } | UNLOGGED ] TABLE [ IF NOT EXI
 will use the table's parameter value.

 
+   <para>
+    Note that specifying the following parameters for partitioned tables has
+    no effect at the moment.  You must specify them for individual leaf
+    partitions if necessary.
+   </para>
+

 

diff --git a/src/backend/catalog/heap.c b/src/backend/catalog/heap.c
index 41c0056556..2f5090b183 100644
--- a/src/backend/catalog/heap.c
+++ b/src/backend/catalog/heap.c
@@ -291,6 +291,7 @@ heap_create(const char *relname,
 		case RELKIND_VIEW:
 		case RELKIND_COMPOSITE_TYPE:
 		case RELKIND_FOREIGN_TABLE:
+		case RELKIND_PARTITIONED_TABLE:
 			create_storage = false;
 
 			/*
@@ -1345,14 +1346,13 @@ heap_create_with_catalog(const char *relname,
 	if (oncommit != ONCOMMIT_NOOP)
 		register_on_commit_action(relid, oncommit);
 
-	if (relpersistence == RELPERSISTENCE_UNLOGGED)
-	{
-		Assert(relkind == RELKIND_RELATION || relkind == RELKIND_MATVIEW ||
-			   relkind == RELKIND_TOASTVALUE ||
-			   relkind == RELKIND_PARTITIONED_TABLE);
-
+	/*
+	 * We do not want to create any storage objects for a partitioned
+	 * table, including the init fork.
+	 */
+	if (relpersistence == RELPERSISTENCE_UNLOGGED &&
+		relkind != RELKIND_PARTITIONED_TABLE)
 		heap_create_init_fork(new_rel_desc);
-	}
 
 	/*
 	 * ok, the relation has been cataloged, so close our relations and return
@@ -1376,6 +1376,9 @@ heap_create_with_catalog(const char *relname,
 void
 heap_create_init_fork(Relation rel)
 {
+	Assert(rel->rd_rel->relkind == RELKIND_RELATION ||
+		   rel->rd_rel->relkind == RELKIND_MATVIEW ||
+		   rel->rd_rel->relkind == RELKIND_TOASTVALUE);
 	RelationOpenSmgr(rel);
 	smgrcreate(rel->rd_smgr, INIT_FORKNUM, false);
	log_smgrcreate(&rel->rd_smgr->smgr_rnode.node, INIT_FORKNUM);
-- 
2.11.0




Re: [HACKERS] WIP: Faster Expression Processing v4

2017-03-23 Thread Andres Freund
On 2017-03-23 20:36:32 -0400, Tom Lane wrote:
> I found a rather nasty bug :-( ... the comment in EEOP_INNER_VAR_FIRST about
> 
> +/*
> + * Can't assert tts_nvalid, as wholerow var evaluation or such
> + * could have materialized the slot - but the contents are still
> + * valid :/
> + */
> +Assert(op->d.var.attnum >= 0);
> 
> is actually averting its eyes from a potentially serious problem.  If we
> go through ExecEvalWholeRowVar, which calls ExecFetchSlotTupleDatum, and
> ExecFetchSlotTuple decides it has to materialize the tuple, then
> ExecMaterializeSlot does this:
> 
> /*
>  * Drop the pin on the referenced buffer, if there is one.
>  */
> if (BufferIsValid(slot->tts_buffer))
> ReleaseBuffer(slot->tts_buffer);
> 
> slot->tts_buffer = InvalidBuffer;
> 
> /*
>  * Mark extracted state invalid.  This is important because the slot is
>  * not supposed to depend any more on the previous external data; we
>  * mustn't leave any dangling pass-by-reference datums in tts_values.
>  * However, we have not actually invalidated any such datums, if there
>  * happen to be any previously fetched from the slot.  (Note in particular
>  * that we have not pfree'd tts_mintuple, if there is one.)
>  */
> slot->tts_nvalid = 0;
> 
> 
> The problem here is that once we drop the buffer pin, any pointers we may
> have into on-disk data are dangling pointers --- we're at risk of some
> other backend taking away that shared buffer.  (So I'm afraid that the
> commentary there is overly optimistic.)  So even though those pointers
> may still be there beyond tts_nvalid, subsequent references to them are
> very dangerous.

This applies to the code in master as well, no?  An ExecEvalScalarVar()
followed by an ExecEvalWholeRowVar() would have precisely the same
effect?  Do we need to do anything about this in the back-branches,
given how unlikely this is going to be in practice?


> So I think that we have got to fix ExecEvalWholeRowVar so that it doesn't
> clobber the state of the slot.

That seems like a good plan.


> Also, while trying to test the above scenario, I realized that the patch
> as submitted was being way too cavalier about where it was applying 
> CheckVarSlotCompatibility and so on.  The ASSIGN_FOO_VAR steps, for
> instance, had no protection at all.

I don't think that's true - the assign checks had copied the code from
the old ExecBuildProjectionInfo, setting isSimpleVar iff
(!attr->attisdropped && variable->vartype == attr->atttypid) - we can
check that for projections in contrast to normal expressions because we
already know the slot.  The relevant comment for that, from before the
patch, is:
 * inputDesc.  (Note: if there is a type mismatch then ExecEvalScalarVar
 * will probably throw an error at runtime, but we leave that to it.)
 */


> I hope to have a fully reviewed patch to pass back to you tomorrow.
> Or Saturday at the latest.

Cool.

Greetings,

Andres Freund




Re: [HACKERS] WIP: Faster Expression Processing v4

2017-03-23 Thread Andres Freund
Hi,

On 2017-03-23 17:40:55 -0400, Tom Lane wrote:
> Stylistic thought ... I am wondering if it wouldn't be a good idea
> to replace EEOP_CASE_WHEN_STEP, EEOP_CASE_THEN_STEP, EEOP_COALESCE,
> and EEOP_ARRAYREF_CHECKINPUT with instructions defined in a less
> usage-dependent way as
> 
>   EEOP_JUMP   unconditional jump
>   EEOP_JUMP_IF_NULL   jump if step result is null
>   EEOP_JUMP_IF_NOT_NULL   jump if step result isn't null
>   EEOP_JUMP_IF_NOT_TRUE   jump if step result isn't TRUE
> 
> One could imagine later filling out this set with the other BoolTest
> condition types, but that seems to be all we need right now.

Hm, no arguments against, but I'm also not particularly excited about
the change.


> These are basically just renamings of the step types that exist now,
> although EEOP_ARRAYREF_CHECKINPUT would have to drop its not-very-
> necessary Assert(!op->d.arrayref.state->isassignment).

I won't shed a tear about that assert's removal.


> Well, I guess I should say that they're renamings of the semantics
> that I have for these steps in my working copy; for instance, I got
> rid of casewhen.value/casewhen.isnull in favor of letting CASE WHEN
> expressions evaluate into the CASE's final output variable.

That sounds like a sensible change (in the abstract, I obviously haven't
seen your working copy).


Greetings,

Andres Freund




Re: [HACKERS] WIP: Faster Expression Processing v4

2017-03-23 Thread Tom Lane
I found a rather nasty bug :-( ... the comment in EEOP_INNER_VAR_FIRST about

+/*
+ * Can't assert tts_nvalid, as wholerow var evaluation or such
+ * could have materialized the slot - but the contents are still
+ * valid :/
+ */
+Assert(op->d.var.attnum >= 0);

is actually averting its eyes from a potentially serious problem.  If we
go through ExecEvalWholeRowVar, which calls ExecFetchSlotTupleDatum, and
ExecFetchSlotTuple decides it has to materialize the tuple, then
ExecMaterializeSlot does this:

/*
 * Drop the pin on the referenced buffer, if there is one.
 */
if (BufferIsValid(slot->tts_buffer))
ReleaseBuffer(slot->tts_buffer);

slot->tts_buffer = InvalidBuffer;

/*
 * Mark extracted state invalid.  This is important because the slot is
 * not supposed to depend any more on the previous external data; we
 * mustn't leave any dangling pass-by-reference datums in tts_values.
 * However, we have not actually invalidated any such datums, if there
 * happen to be any previously fetched from the slot.  (Note in particular
 * that we have not pfree'd tts_mintuple, if there is one.)
 */
slot->tts_nvalid = 0;


The problem here is that once we drop the buffer pin, any pointers we may
have into on-disk data are dangling pointers --- we're at risk of some
other backend taking away that shared buffer.  (So I'm afraid that the
commentary there is overly optimistic.)  So even though those pointers
may still be there beyond tts_nvalid, subsequent references to them are
very dangerous.

I think that it's pretty hard to hit this in practice, maybe impossible,
because the normal case for an "on-disk" tuple is that
TTS_HAS_PHYSICAL_TUPLE is true, so that ExecFetchSlotTuple won't change
the state of the slot.  If we have a virtual tuple that has to be
materialized, then by that very token it won't have a buffer pin to drop.
But I find this fragile as heck, and the aforesaid patch comment surely
isn't adequately documenting the safety considerations.  Also, if there
ever were a live bug here, reproducing it would be damn hard because of
the low probability that a just-unpinned buffer would get replaced any
time soon.  (Hm, I wonder whether the buffer cache needs something
analogous to the syscaches' CLOBBER_CACHE_ALWAYS behavior...)

Besides which, I really really don't like the lack of an "attnum <
tts_nvalid" assertion there; that's just obviously failing to check for
very simple bugs, such as getting the FETCHSOME steps wrong.

So I think that we have got to fix ExecEvalWholeRowVar so that it doesn't
clobber the state of the slot.  Right at the moment, the only way to do
that seems to be to do this instead of ExecFetchSlotTupleDatum:

tuple = ExecCopySlotTuple(slot);
dtuple = (HeapTupleHeader)
DatumGetPointer(heap_copy_tuple_as_datum(tuple,
 slot->tts_tupleDescriptor));
heap_freetuple(tuple);

That's kind of annoying because of the double copying involved, but
tuptoaster.c doesn't expose any functionality for this except
heap_copy_tuple_as_datum().  I figure we can improve it later --- it looks
like we can refactor heap_copy_tuple_as_datum to expose a function that
reads from Datum/isnull arrays, and then call it on the slot's
tts_values/tts_isnull arrays.  Seems like it might be a good idea to
think a bit harder about the TupleTableSlot APIs, too, and reduce the
number of cases where execTuples exposes destructive changes to the
state of a slot.
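
For concreteness, a minimal sketch of what such a refactored helper might
look like; the function name is invented here purely for illustration, it
reuses the existing heap_form_tuple() and HeapTupleHeaderSet* APIs from
access/htup_details.h, and unlike heap_copy_tuple_as_datum() it glosses
over flattening external TOAST values, which a real version would still
have to handle:

    /* hypothetical helper, not actual PostgreSQL code */
    static Datum
    slot_values_as_datum(TupleDesc tupdesc, Datum *values, bool *isnull)
    {
        HeapTuple       tuple;
        HeapTupleHeader td;

        /* form a tuple directly from the deconstructed arrays */
        tuple = heap_form_tuple(tupdesc, values, isnull);
        td = tuple->t_data;

        /* label it as a composite Datum, as heap_copy_tuple_as_datum does */
        HeapTupleHeaderSetDatumLength(td, tuple->t_len);
        HeapTupleHeaderSetTypeId(td, tupdesc->tdtypeid);
        HeapTupleHeaderSetTypMod(td, tupdesc->tdtypmod);

        return PointerGetDatum(td);
    }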

Also, while trying to test the above scenario, I realized that the patch
as submitted was being way too cavalier about where it was applying 
CheckVarSlotCompatibility and so on.  The ASSIGN_FOO_VAR steps, for
instance, had no protection at all.  Think I have that all fixed up
though.

I hope to have a fully reviewed patch to pass back to you tomorrow.
Or Saturday at the latest.

regards, tom lane




Re: [HACKERS] Backend crash on non-exclusive backup cancel

2017-03-23 Thread Michael Paquier
On Fri, Mar 24, 2017 at 1:45 AM, Teodor Sigaev  wrote:
> I believe patch looks good and it's ready to commit.

Thanks for the review!

> As I understand, it fixes bug introduced by
> commit 7117685461af50f50c03f43e6a622284c8d54694
> Date:   Tue Apr 5 20:03:49 2016 +0200
>
> Implement backup API functions for non-exclusive backups

Indeed.

> And, suppose, it should be backpatched to 9.6?

Yes, that's where non-exclusive backups have been introduced.
-- 
Michael




Re: [HACKERS] delta relations in AFTER triggers

2017-03-23 Thread Thomas Munro
On Tue, Mar 14, 2017 at 7:51 AM, Kevin Grittner  wrote:
> On Sun, Mar 12, 2017 at 4:08 PM, Thomas Munro
>  wrote:
>> I found a new way to break it: run the trigger function so
>> that the plan is cached by plpgsql, then ALTER TABLE incompatibly,
>> then run the trigger function again.  See attached.
>
> [...]
>
> I expected that existing mechanisms would have forced re-planning of
> a trigger function if the table the function was attached to was
> altered.  Either that was a bit "optimistic", or the old TupleDesc
> is used for the new plan.  Will track down which it is, and fix it.

When PlanCacheRelCallback runs, I don't think it understands that
these named tuplestore RangeTblEntry objects are dependent on the
subject table.  Could that be fixed like this?

@@ -2571,6 +2582,9 @@ extract_query_dependencies_walker(Node *node, PlannerInfo *context)
 		if (rte->rtekind == RTE_RELATION)
 			context->glob->relationOids =
 				lappend_oid(context->glob->relationOids, rte->relid);
+		else if (rte->rtekind == RTE_NAMEDTUPLESTORE)
+			context->glob->relationOids =
+				lappend_oid(context->glob->relationOids, [subject table's OID]);
 	}

> I'll post a new patch once I figure out the dropped column on the
> cached function plan.

If that's fixed and the permissions question can be waved away by
saying it's the same as the per-row situation, my only other comment
would be a bikeshed issue: Enr isn't a great name for a struct.

Very keen to see this feature in PostgreSQL 10!

-- 
Thomas Munro
http://www.enterprisedb.com




Re: [HACKERS] PL/Python: Add cursor and execute methods to plan object

2017-03-23 Thread Jim Nasby

On 2/25/17 10:27 AM, Peter Eisentraut wrote:
> So I'm also wondering here which style people prefer so
> I can implement it there.


I think the more OO style is definitely better. I expect it would 
simplify the code as well.

--
Jim C. Nasby, Data Architect   j...@nasby.net
512.569.9461 (cell) http://jim.nasby.net




Re: [HACKERS] Logical replication existing data copy

2017-03-23 Thread Petr Jelinek
On 24/03/17 00:14, Mark Kirkwood wrote:
> On 24/03/17 02:00, Peter Eisentraut wrote:
>> On 3/21/17 21:38, Peter Eisentraut wrote:
>>> This patch is looking pretty good to me, modulo the failing pg_dump
>>> tests.
>>>
>>> Attached is a fixup patch.  I have mainly updated some comments and
>>> variable naming for (my) clarity.  No functional changes.
>> Committed all that.
>>
> 
> Testing now that this patch is in, I'm unable to create a subscription:
> 
> (master)
> 
> bench=# CREATE PUBLICATION pgbench
> FOR TABLE pgbench_accounts , pgbench_branches,
>   pgbench_tellers, pgbench_history
> WITH (PUBLISH INSERT, PUBLISH UPDATE, PUBLISH DELETE);
> 
> (slave)
> 
> bench=# CREATE SUBSCRIPTION pgbench
> CONNECTION 'port=5447 user=postgres dbname=bench'
> PUBLICATION pgbench
> WITH (COPY DATA);
> ERROR:  duplicate key value violates unique constraint
> "pg_subscription_rel_srrelid_srsubid_index"
> DETAIL:  Key (srrelid, srsubid)=(0, 16389) already exists.
> 
> This is a pair of freshly initdb'ed instances, the master has a size 100
> pgbench schema.
> 
> I'm guessing this is a different bug from the segfault also reported
> 

Yes, I also forgot to check if the table actually exists on subscriber
when fetching them in CREATE SUBSCRIPTION (we have check during
replication but not there).

Attached patches should fix both issues.

-- 
  Petr Jelinek  http://www.2ndQuadrant.com/
  PostgreSQL Development, 24x7 Support, Training & Services
From cdb2590e8b42d0dec4c00d8ae7a6affa2bb41372 Mon Sep 17 00:00:00 2001
From: Petr Jelinek 
Date: Fri, 24 Mar 2017 00:24:47 +0100
Subject: [PATCH 2/2] Check that published table exists on subscriber

---
 src/backend/commands/subscriptioncmds.c | 11 +++
 1 file changed, 11 insertions(+)

diff --git a/src/backend/commands/subscriptioncmds.c b/src/backend/commands/subscriptioncmds.c
index 8f201b2..ad11610 100644
--- a/src/backend/commands/subscriptioncmds.c
+++ b/src/backend/commands/subscriptioncmds.c
@@ -417,6 +417,11 @@ CreateSubscription(CreateSubscriptionStmt *stmt, bool isTopLevel)
 Oid			relid;
 
 relid = RangeVarGetRelid(rv, AccessShareLock, true);
+if (relid == InvalidOid)
+	ereport(ERROR,
+			(errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
+			 errmsg("published table \"%s.%s\" not found on subscriber",
+	rv->schemaname, rv->relname)));
 
 SetSubscriptionRelState(subid, relid, table_state,
 		InvalidXLogRecPtr);
@@ -513,6 +518,12 @@ AlterSubscription_refresh(Subscription *sub, bool copy_data)
 		Oid			relid;
 
 		relid = RangeVarGetRelid(rv, AccessShareLock, false);
+		if (relid == InvalidOid)
+			ereport(ERROR,
+	(errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
+	 errmsg("published table \"%s.%s\" not found on subscriber",
+			rv->schemaname, rv->relname)));
+
 		pubrel_local_oids[off++] = relid;
 
		if (!bsearch(&relid, subrel_local_oids,
-- 
2.7.4

From e492ead4b8036f03b4624093d64280aa6ea23129 Mon Sep 17 00:00:00 2001
From: Petr Jelinek 
Date: Thu, 23 Mar 2017 23:39:25 +0100
Subject: [PATCH 1/2] Always return tupleslot and tupledesc from libpqrcv_exec

---
 src/backend/replication/libpqwalreceiver/libpqwalreceiver.c | 8 
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/src/backend/replication/libpqwalreceiver/libpqwalreceiver.c b/src/backend/replication/libpqwalreceiver/libpqwalreceiver.c
index 4dd8eef..9d7bb25 100644
--- a/src/backend/replication/libpqwalreceiver/libpqwalreceiver.c
+++ b/src/backend/replication/libpqwalreceiver/libpqwalreceiver.c
@@ -803,10 +803,6 @@ libpqrcv_processTuples(PGresult *pgres, WalRcvExecResult *walres,
 	MemoryContext	rowcontext;
 	MemoryContext	oldcontext;
 
-	/* No point in doing anything here if there were no tuples returned. */
-	if (PQntuples(pgres) == 0)
-		return;
-
 	/* Make sure we got expected number of fields. */
 	if (nfields != nRetTypes)
 		ereport(ERROR,
@@ -824,6 +820,10 @@ libpqrcv_processTuples(PGresult *pgres, WalRcvExecResult *walres,
 		   PQfname(pgres, coln), retTypes[coln], -1, 0);
 	attinmeta = TupleDescGetAttInMetadata(walres->tupledesc);
 
+	/* No point in doing more here if there were no tuples returned. */
+	if (PQntuples(pgres) == 0)
+		return;
+
 	/* Create temporary context for local allocations. */
 	rowcontext = AllocSetContextCreate(CurrentMemoryContext,
 	   "libpqrcv query result context",
-- 
2.7.4




Re: [HACKERS] Privilege checks on array coercions

2017-03-23 Thread Jim Nasby

On 3/23/17 12:37 PM, Andres Freund wrote:
> On 2017-03-23 15:26:51 -0400, Tom Lane wrote:
>> There is a test in privileges.sql (currently lines 589-625 in
>> privileges.out) that seems to be dependent on the fact that the
>> ArrayCoerceExpr logic doesn't check for EXECUTE privilege on the
>> per-element type coercion function if it's dealing with a NULL input
>> array.
>>
>> ...
>>
>> Does anyone want to defend this
>> privileges test case as testing for some behavior that users expect?
>
> Not me - that seems quite sensible to change.

I'd even argue that existing behavior is a bug.
--
Jim C. Nasby, Data Architect   j...@nasby.net
512.569.9461 (cell) http://jim.nasby.net




Re: [HACKERS] Logical replication existing data copy

2017-03-23 Thread Mark Kirkwood

On 24/03/17 02:00, Peter Eisentraut wrote:
> On 3/21/17 21:38, Peter Eisentraut wrote:
>> This patch is looking pretty good to me, modulo the failing pg_dump tests.
>>
>> Attached is a fixup patch.  I have mainly updated some comments and
>> variable naming for (my) clarity.  No functional changes.
>
> Committed all that.



Testing now that this patch is in, I'm unable to create a subscription:

(master)

bench=# CREATE PUBLICATION pgbench
FOR TABLE pgbench_accounts , pgbench_branches,
  pgbench_tellers, pgbench_history
WITH (PUBLISH INSERT, PUBLISH UPDATE, PUBLISH DELETE);

(slave)

bench=# CREATE SUBSCRIPTION pgbench
CONNECTION 'port=5447 user=postgres dbname=bench'
PUBLICATION pgbench
WITH (COPY DATA);
ERROR:  duplicate key value violates unique constraint "pg_subscription_rel_srrelid_srsubid_index"
DETAIL:  Key (srrelid, srsubid)=(0, 16389) already exists.

This is a pair of freshly initdb'ed instances, the master has a size 100 
pgbench schema.


I'm guessing this is a different bug from the segfault also reported


regards


Mark






Re: [HACKERS] Hash support for grouping sets

2017-03-23 Thread Andrew Gierth
> "Andres" == Andres Freund  writes:

 Andres> a) cast result of lfirst/lnext/whatnot.

Again, what we need here is something like

#define lfirst_node(_type_, l) (castNode(_type_, lfirst(l)))

etc.
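
As a usage sketch, assuming castNode() from nodes/nodes.h (which Asserts
the node tag) and an illustrative "pathlist" variable, call sites would go
from an unchecked cast to a checked one:

    ListCell   *lc;

    foreach(lc, pathlist)
    {
        /* before: Path *path = (Path *) lfirst(lc);  -- unchecked cast */
        Path       *path = lfirst_node(Path, lc);

        /* ... use path ... */
    }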

-- 
Andrew (irc:RhodiumToad)




Re: [HACKERS] Hash support for grouping sets

2017-03-23 Thread Andrew Gierth
> "Andres" == Andres Freund  writes:

 Andres> We usually cast the result of palloc.

 >> Rough count in the backend has ~400 without casts to ~1350 with, so
 >> this doesn't seem to have been consistently enforced.

 Andres> Yea, but we're still trying.

Well, a lot of the uncasted ones are in fairly new code, from quite a
number of different committers.

So if this is a big deal, why don't we already have

#define palloc_array(etype,ecount) ((etype *) palloc((ecount) * sizeof(etype)))
#define palloc_object(otype) ((otype *) palloc(sizeof(otype)))

or something of that ilk?
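
As a usage sketch (the two assignments below are equivalent given the
macros above; "buckets" and "nbuckets" are illustrative variables), the
element type is written once and the cast cannot drift out of sync with
the sizeof:

    HashJoinTuple *buckets;

    /* spelled out by hand today: */
    buckets = (HashJoinTuple *) palloc(nbuckets * sizeof(HashJoinTuple));

    /* with the proposed macro: */
    buckets = palloc_array(HashJoinTuple, nbuckets);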

-- 
Andrew (irc:RhodiumToad)




Re: [HACKERS] Logical replication existing data copy

2017-03-23 Thread Petr Jelinek
Hi,

On 23/03/17 23:06, Michael Banck wrote:
> Hi,
> 
> On Thu, Mar 23, 2017 at 09:00:16AM -0400, Peter Eisentraut wrote:
>> On 3/21/17 21:38, Peter Eisentraut wrote:
>>> This patch is looking pretty good to me, modulo the failing pg_dump tests.
>>>
>>> Attached is a fixup patch.  I have mainly updated some comments and
>>> variable naming for (my) clarity.  No functional changes.
>>
>> Committed all that.
> 
> Maybe I'm doing something wrong, but I'm getting a segfault trying to
> start logical replication since that patch went in.
> 

Thanks for the report. Looks like we don't correctly handle an empty result
set when fetching the table list. Will post a fix tomorrow.

-- 
  Petr Jelinek  http://www.2ndQuadrant.com/
  PostgreSQL Development, 24x7 Support, Training & Services




Re: [HACKERS] createlang/droplang deprecated

2017-03-23 Thread Euler Taveira
2017-03-18 12:29 GMT-03:00 Tom Lane :

>
> But createuser/dropuser are a real problem, because they certainly could
> be mistaken for system-level utilities.


I proposed something along those lines [1] to fix this historical mistake,
but we didn't reach a consensus. createuser/dropuser could be candidates
for removal because they are easily replaced by psql -c "command here",
like Simon said. If we go down this road, other binaries that are just a
wrapper around an SQL command (such as createdb, dropdb, clusterdb and
reindexdb) could be removed too.


[1]
https://www.postgresql.org/message-id/bdd1adb1-c26d-ad1f-2f15-cc5205606...@timbira.com.br


-- 
   Euler Taveira   Timbira -
http://www.timbira.com.br/
   PostgreSQL: Consultoria, Desenvolvimento, Suporte 24x7 e Treinamento



Re: [HACKERS] Logical replication existing data copy

2017-03-23 Thread Michael Banck
Hi,

On Thu, Mar 23, 2017 at 09:00:16AM -0400, Peter Eisentraut wrote:
> On 3/21/17 21:38, Peter Eisentraut wrote:
> > This patch is looking pretty good to me, modulo the failing pg_dump tests.
> > 
> > Attached is a fixup patch.  I have mainly updated some comments and
> > variable naming for (my) clarity.  No functional changes.
> 
> Committed all that.

Maybe I'm doing something wrong, but I'm getting a segfault trying to
start logical replication since that patch went in.

I've blown away my build tree and started over with this minimal
example, any obvious mistakes here?  Quoting the publisher/subscriber
names doesn't seem to change things:

$ initdb --pgdata=data_pg1
$ sed -i -e 's/^#wal_level.=.replica/wal_level = logical/' data_pg1/postgresql.conf
$ pg_ctl --pgdata=data_pg1 -l pg1.log start
$ pg_basebackup --pgdata=data_pg2
$ sed -i -e 's/^#port.=.5432/port = 5433/' data_pg2/postgresql.conf
$ pg_ctl --pgdata=data_pg2 -l pg2.log start
$ psql -c "CREATE PUBLICATION pub1;"
$ psql --port=5433 -c "CREATE SUBSCRIPTION sub1 CONNECTION 'dbname=postgres' PUBLICATION pub1;"

Backtrace:

Program received signal SIGSEGV, Segmentation fault.
ExecSetSlotDescriptor (slot=slot@entry=0xfc4d38, tupdesc=tupdesc@entry=0x0)
at 
/home/mba/Projekte/OSS/PostgreSQL/git/postgresql/build/../src/backend/executor/execTuples.c:269
269 PinTupleDesc(tupdesc);
(gdb) bt
#0  ExecSetSlotDescriptor (slot=slot@entry=0xfc4d38, tupdesc=tupdesc@entry=0x0)
at 
/home/mba/Projekte/OSS/PostgreSQL/git/postgresql/build/../src/backend/executor/execTuples.c:269
#1  0x005dc4fc in MakeSingleTupleTableSlot (tupdesc=0x0)
at 
/home/mba/Projekte/OSS/PostgreSQL/git/postgresql/build/../src/backend/executor/execTuples.c:203
#2  0x005a16ff in fetch_table_list (wrconn=wrconn@entry=0xc0030001,
publications=publications@entry=0xffb448)
at 
/home/mba/Projekte/OSS/PostgreSQL/git/postgresql/build/../src/backend/commands/subscriptioncmds.c:996
#3  0x005a1efa in CreateSubscription (stmt=0x101cd40, 
isTopLevel=)
at 
/home/mba/Projekte/OSS/PostgreSQL/git/postgresql/build/../src/backend/commands/subscriptioncmds.c:396
#4  0x006ecf96 in ProcessUtilitySlow (pstate=0x0, 
pstate@entry=0xfc4818, pstmt=0x0,
pstmt@entry=0x101d080, queryString=0xf9d248 ,
queryString@entry=0x101c0b8 "CREATE SUBSCRIPTION \"sub1\" CONNECTION 
'dbname=postgres' PUBLICATION pub1 WITH (COPY DATA);", context=(unknown: 
16534992), context@entry=PROCESS_UTILITY_TOPLEVEL,
params=0x7ffd7e6263d0, params@entry=0x0,
completionTag=0x45 ,
completionTag@entry=0x7ffd7e626b00 "", dest=)
at 
/home/mba/Projekte/OSS/PostgreSQL/git/postgresql/build/../src/backend/tcop/utility.c:1612
#5  0x006ec4e9 in standard_ProcessUtility (pstmt=0x101d080,
queryString=0x101c0b8 "CREATE SUBSCRIPTION \"sub1\" CONNECTION 
'dbname=postgres' PUBLICATION pub1 WITH (COPY DATA);", 
context=PROCESS_UTILITY_TOPLEVEL, params=0x0, dest=0x101d160,
completionTag=0x7ffd7e626b00 "")
at 
/home/mba/Projekte/OSS/PostgreSQL/git/postgresql/build/../src/backend/tcop/utility.c:922
#6  0x006e9f5f in PortalRunUtility (portal=0xfbb9d8, pstmt=0x101d080,
isTopLevel=, setHoldSnapshot=, 
dest=,
completionTag=0x7ffd7e626b00 "")
at 
/home/mba/Projekte/OSS/PostgreSQL/git/postgresql/build/../src/backend/tcop/pquery.c:1165
#7  0x006ea977 in PortalRunMulti (portal=portal@entry=0xfbb9d8,
isTopLevel=isTopLevel@entry=1 '\001', 
setHoldSnapshot=setHoldSnapshot@entry=0 '\000',
dest=dest@entry=0x101d160, altdest=altdest@entry=0x101d160,
completionTag=completionTag@entry=0x7ffd7e626b00 "")
at 
/home/mba/Projekte/OSS/PostgreSQL/git/postgresql/build/../src/backend/tcop/pquery.c:1315
#8  0x006eb4cb in PortalRun (portal=portal@entry=0xfbb9d8,
count=count@entry=9223372036854775807, isTopLevel=isTopLevel@entry=1 '\001',
dest=dest@entry=0x101d160, altdest=altdest@entry=0x101d160,
completionTag=completionTag@entry=0x7ffd7e626b00 "")
at 
/home/mba/Projekte/OSS/PostgreSQL/git/postgresql/build/../src/backend/tcop/pquery.c:788
#9  0x006e7868 in exec_simple_query (
query_string=0x101c0b8 "CREATE SUBSCRIPTION \"sub1\" CONNECTION 
'dbname=postgres' PUBLICATION pub1 WITH (COPY DATA);")
at 
/home/mba/Projekte/OSS/PostgreSQL/git/postgresql/build/../src/backend/tcop/postgres.c:1101
#10 0x006e94a9 in PostgresMain (argc=1, argv=0x101c0b8, dbname=0xfc8478 
"postgres",
username=0xfc8458 "postgres")
at 
/home/mba/Projekte/OSS/PostgreSQL/git/postgresql/build/../src/backend/tcop/postgres.c:4069
#11 0x004770ad in BackendRun (port=0xfc2970)
at 
/home/mba/Projekte/OSS/PostgreSQL/git/postgresql/build/../src/backend/postmaster/postmaster.c:4317
#12 BackendStartup (port=0xfc2970)
at 
/home/mba/Projekte/OSS/PostgreSQL/git/postgresql/build/../src/backend/postmaster/postmaster.c:3989
#13 ServerLoop ()
at 

Re: [HACKERS] [PATCH] few fts functions for jsonb

2017-03-23 Thread Andrew Dunstan


On 03/21/2017 06:28 PM, Dmitry Dolgov wrote:
> On 21 March 2017 at 03:03, Andrew Dunstan wrote:
> >
> > However, I think it should probably be broken up into a couple of pieces -
> > one for the generic json/jsonb transforms infrastructure (which probably
> > needs some more comments) and one for the FTS functions that will use it.
>
> Sure, here are two patches with separated functionality and a bit more
> commentaries for the transform functions.

I'm not through looking at this. However, here are a few preliminary
comments

  * we might need to rationalize the header locations a bit
  * iterate_json(b) and transform_json(b) are a bit too generally named.
Really what they do is iterate over or transform string values in
the json(b). They ignore / preserve the structure, keys, and
non-string scalar values in the json(b). A general iterate or
transform function would be called in effect with a stream of all
the elements in the json, not just scalar strings.
  * Unless I'm missing something the iterate_json(b)_values return value
is ignored. Instead of returning the state it looks to me like it
should return nothing and be declared as void instead of void *
  * transform_jsonb and transform_json are somewhat asymmetrical. The
latter should probably return a text* instead of a StringInfo, to be
consistent with the former.

cheers

andrew

-- 
Andrew Dunstan                https://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services





Re: [HACKERS] pg_dump, pg_dumpall and data durability

2017-03-23 Thread Michael Paquier
On Wed, Mar 22, 2017 at 7:00 AM, Michael Paquier
 wrote:
> On Wed, Mar 22, 2017 at 6:24 AM, Andrew Dunstan
>  wrote:
>> This is really a pretty small patch all things considered, and pretty
>> low-risk (although I haven't been through the code in fine detail yet).
>> In the end I'm persuaded by Andres' point that there's actually no
>> practical alternative way to make sure the data is actually synced to disk.
>>
>> If nobody else wants to pick it up I will, unless there is a strong
>> objection.
>
> Thanks!

Thanks Andrew, I can see that this has been committed as 96a7128b.

I also saw that:
https://www.postgresql.org/message-id/75e1b6ff-c3d5-9a26-e38b-3cb22a099...@2ndquadrant.com
I'll send a patch in a bit for the regression tests.
-- 
Michael




Re: [HACKERS] Potential data loss of 2PC files

2017-03-23 Thread Michael Paquier
On Fri, Mar 24, 2017 at 5:08 AM, Teodor Sigaev  wrote:
> Hmm, it doesn't work (but applies) on current HEAD:
> [...]
> Data page checksums are disabled.
>
> fixing permissions on existing directory /spool/pg_data ... ok
> creating subdirectories ... ok
> selecting default max_connections ... 100
> selecting default shared_buffers ... 128MB
> selecting dynamic shared memory implementation ... posix
> creating configuration files ... ok
> running bootstrap script ... 2017-03-23 23:07:14.705 MSK [40286] FATAL:
> could not open file "pg_clog": No such file or directory
> child process exited with exit code 1
> initdb: data directory "/spool/pg_data" not removed at user's request

And the renaming of pg_clog to pg_xact is also my fault. Attached is
an updated patch.
-- 
Michael
diff --git a/src/backend/access/transam/clog.c b/src/backend/access/transam/clog.c
index 2d33510930..7a007a6ba5 100644
--- a/src/backend/access/transam/clog.c
+++ b/src/backend/access/transam/clog.c
@@ -577,6 +577,13 @@ ShutdownCLOG(void)
 	/* Flush dirty CLOG pages to disk */
 	TRACE_POSTGRESQL_CLOG_CHECKPOINT_START(false);
 	SimpleLruFlush(ClogCtl, false);
+
+	/*
+	 * fsync pg_xact to ensure that any files flushed previously are durably
+	 * on disk.
+	 */
+	fsync_fname("pg_xact", true);
+
 	TRACE_POSTGRESQL_CLOG_CHECKPOINT_DONE(false);
 }
 
@@ -589,6 +596,13 @@ CheckPointCLOG(void)
 	/* Flush dirty CLOG pages to disk */
 	TRACE_POSTGRESQL_CLOG_CHECKPOINT_START(true);
 	SimpleLruFlush(ClogCtl, true);
+
+	/*
+	 * fsync pg_xact to ensure that any files flushed previously are durably
+	 * on disk.
+	 */
+	fsync_fname("pg_xact", true);
+
 	TRACE_POSTGRESQL_CLOG_CHECKPOINT_DONE(true);
 }
 
diff --git a/src/backend/access/transam/commit_ts.c b/src/backend/access/transam/commit_ts.c
index 8e1df6e0ea..03ffa20908 100644
--- a/src/backend/access/transam/commit_ts.c
+++ b/src/backend/access/transam/commit_ts.c
@@ -746,6 +746,12 @@ ShutdownCommitTs(void)
 {
 	/* Flush dirty CommitTs pages to disk */
 	SimpleLruFlush(CommitTsCtl, false);
+
+	/*
+	 * fsync pg_commit_ts to ensure that any files flushed previously are durably
+	 * on disk.
+	 */
+	fsync_fname("pg_commit_ts", true);
 }
 
 /*
@@ -756,6 +762,12 @@ CheckPointCommitTs(void)
 {
 	/* Flush dirty CommitTs pages to disk */
 	SimpleLruFlush(CommitTsCtl, true);
+
+	/*
+	 * fsync pg_commit_ts to ensure that any files flushed previously are durably
+	 * on disk.
+	 */
+	fsync_fname("pg_commit_ts", true);
 }
 
 /*
diff --git a/src/backend/access/transam/twophase.c b/src/backend/access/transam/twophase.c
index 4b4999fd7b..831693 100644
--- a/src/backend/access/transam/twophase.c
+++ b/src/backend/access/transam/twophase.c
@@ -1650,6 +1650,14 @@ CheckPointTwoPhase(XLogRecPtr redo_horizon)
 	}
 	LWLockRelease(TwoPhaseStateLock);
 
+	/*
+	 * Flush unconditionally the parent directory to make any information
+	 * durable on disk.  Two-phase files could have been removed and those
+	 * removals need to be made persistent as well as any files newly created
+	 * previously since the last checkpoint.
+	 */
+	fsync_fname(TWOPHASE_DIR, true);
+
 	TRACE_POSTGRESQL_TWOPHASE_CHECKPOINT_DONE();
 
 	if (log_checkpoints && serialized_xacts > 0)
diff --git a/src/backend/access/transam/xlog.c b/src/backend/access/transam/xlog.c
index b99ded5df6..11b1929c94 100644
--- a/src/backend/access/transam/xlog.c
+++ b/src/backend/access/transam/xlog.c
@@ -3469,7 +3469,7 @@ InstallXLogFileSegment(XLogSegNo *segno, char *tmppath,
 	if (!find_free)
 	{
 		/* Force installation: get rid of any pre-existing segment file */
-		unlink(path);
+		durable_unlink(path, DEBUG1);
 	}
 	else
 	{
@@ -4020,16 +4020,13 @@ RemoveXlogFile(const char *segname, XLogRecPtr PriorRedoPtr, XLogRecPtr endptr)
 	  path)));
 			return;
 		}
-		rc = unlink(newpath);
+		rc = durable_unlink(newpath, LOG);
 #else
-		rc = unlink(path);
+		rc = durable_unlink(path, LOG);
 #endif
 		if (rc != 0)
 		{
-			ereport(LOG,
-	(errcode_for_file_access(),
-			   errmsg("could not remove old transaction log file \"%s\": %m",
-	  path)));
+			/* Message already logged by durable_unlink() */
 			return;
 		}
 		CheckpointStats.ckpt_segs_removed++;
@@ -10752,17 +10749,13 @@ do_pg_stop_backup(char *labelfile, bool waitforarchive, TimeLineID *stoptli_p)
 		(errcode_for_file_access(),
 		 errmsg("could not read file \"%s\": %m",
 BACKUP_LABEL_FILE)));
-			if (unlink(BACKUP_LABEL_FILE) != 0)
-ereport(ERROR,
-		(errcode_for_file_access(),
-		 errmsg("could not remove file \"%s\": %m",
-BACKUP_LABEL_FILE)));
+			durable_unlink(BACKUP_LABEL_FILE, ERROR);
 
 			/*
 			 * Remove tablespace_map file if present, it is created only if there
 			 * are tablespaces.
 			 */
-			unlink(TABLESPACE_MAP);
+			durable_unlink(TABLESPACE_MAP, DEBUG1);
 		}
 		PG_END_ENSURE_ERROR_CLEANUP(pg_stop_backup_callback, (Datum) BoolGetDatum(exclusive));
 	}
diff --git a/src/backend/storage/file/fd.c b/src/backend/storage/file/fd.c

Re: [HACKERS] WIP: Faster Expression Processing v4

2017-03-23 Thread Tom Lane
Stylistic thought ... I am wondering if it wouldn't be a good idea
to replace EEOP_CASE_WHEN_STEP, EEOP_CASE_THEN_STEP, EEOP_COALESCE,
and EEOP_ARRAYREF_CHECKINPUT with instructions defined in a less
usage-dependent way as

EEOP_JUMP   unconditional jump
EEOP_JUMP_IF_NULL   jump if step result is null
EEOP_JUMP_IF_NOT_NULL   jump if step result isn't null
EEOP_JUMP_IF_NOT_TRUE   jump if step result isn't TRUE

One could imagine later filling out this set with the other BoolTest
condition types, but that seems to be all we need right now.

These are basically just renamings of the step types that exist now,
although EEOP_ARRAYREF_CHECKINPUT would have to drop its not-very-
necessary Assert(!op->d.arrayref.state->isassignment).  Well, I guess
I should say that they're renamings of the semantics that I have
for these steps in my working copy; for instance, I got rid of
casewhen.value/casewhen.isnull in favor of letting CASE WHEN expressions
evaluate into the CASE's final output variable.

At least to me, I think the compiling code would be more readable
this way.  I find WHEN_STEP and THEN_STEP a bit odd because they are
emitted after, not before, the expressions you'd think they control.
ARRAYREF_CHECKINPUT is pretty vaguely named, too.
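
To illustrate how compiled steps would read with these ops, a two-arm
CASE might come out roughly like this (step numbering and layout are
hypothetical, purely for exposition):

    /*
     * CASE WHEN cond THEN a ELSE b END, with the proposed jump ops:
     *
     *   0: <evaluate cond into the step's resvalue/resnull>
     *   1: EEOP_JUMP_IF_NOT_TRUE  -> 4
     *   2: <evaluate a into the CASE's result>
     *   3: EEOP_JUMP              -> 5
     *   4: <evaluate b into the CASE's result>
     *   5: <done>
     */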

regards, tom lane




Re: [HACKERS] Measuring replay lag

2017-03-23 Thread Thomas Munro
On Thu, Mar 23, 2017 at 10:50 PM, Simon Riggs  wrote:
>> Second thoughts... I'll just make LagTrackerWrite externally
>> available, so a plugin can send anything it wants to the tracker.
>> Which means I'm explicitly removing the "logical replication support"
>> from this patch.
>
> Done.
>
> Here's the patch I'm looking to commit, with some docs and minor code
> changes as discussed.

Thank you for committing this.  Time-based replication lag tracking
seems to be a regular topic on mailing lists and IRC, so I hope that
this will provide what people are looking for and not simply replace
that discussion with a new discussion about what lag really means :-)

Many thanks to Simon and Fujii-san for convincing me to move the
buffer to the sender (which now seems so obviously better), to
Fujii-san for the idea of tracking write and flush lag too, and to
Abhijit, Sawada-san, Ian, Craig and Robert for valuable feedback.

-- 
Thomas Munro
http://www.enterprisedb.com




Re: [HACKERS] increasing the default WAL segment size

2017-03-23 Thread Stephen Frost
Peter,

* Peter Eisentraut (peter.eisentr...@2ndquadrant.com) wrote:
> On 3/22/17 17:33, David Steele wrote:
> > and I doubt that most tool writers would be quick to 
> > add support for a feature that very few people (if any) use.
> 
> I'm not one of those tool writers, although I have written my share of
> DBA scripts over the years, but I wonder why those tools would really
> care.  They are handed files with predetermined names to archive, and
> for restore, files with predetermined names are requested back from them.
> What else do they need?  If something is missing that requires them to
> parse file names, then maybe that should be added.

PG backup technology has come a fair ways from that simple
characterization of it. :)

The backup tools need to also get the LSN from the pg_stop_backup and
verify that they have the WAL file associated with that LSN.  They also
need to make sure that they have all of the WAL files between the
starting LSN and the ending LSN.  Doing that requires understanding how
the files are named to make sure there aren't any missing.
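
For illustration, the mapping such tools have to reimplement is essentially
what the XLogFileName() macro in xlog_internal.h does: timeline and segment
number rendered as three 8-digit hex groups (here "tli" and "segno" are
illustrative variables, and the example assumes the default 16MB segments):

    char    fname[MAXFNAMELEN];

    /* e.g. tli = 1, segno = 0x16 yields "000000010000000000000016" */
    XLogFileName(fname, tli, segno);

Checking for gaps then means walking segno from the start LSN's segment to
the stop LSN's segment and verifying each generated name is present.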

David will probably point out other reasons that the backup tools need
to understand the file naming, but those are ones I know of off-hand.

Thanks!

Stephen


signature.asc
Description: Digital signature


Re: [HACKERS] ICU integration

2017-03-23 Thread Peter Eisentraut
On 3/23/17 16:45, Thomas Munro wrote:
> On Fri, Mar 24, 2017 at 8:34 AM, Peter Eisentraut
>  wrote:
>> Committed.
> 
> On a macOS system using MacPorts, I have "icu" installed.  MacPorts is
> a package manager that installs stuff under /opt/local.  I have
> /opt/local/bin in $PATH so that pkg-config can be found.  I run
> ./configure with --with-icu but without mentioning any special paths
> and it says:
> 
> checking whether to build with ICU support... yes
> checking for pkg-config... /opt/local/bin/pkg-config
> checking pkg-config is at least version 0.9.0... yes
> checking for icu-uc icu-i18n... yes
> 
> ... but then building fails, because there are no headers in the search path:

Fixed.

-- 
Peter Eisentraut  http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: [HACKERS] ICU integration

2017-03-23 Thread Thomas Munro
On Fri, Mar 24, 2017 at 8:34 AM, Peter Eisentraut
 wrote:
> Committed.

On a macOS system using MacPorts, I have "icu" installed.  MacPorts is
a package manager that installs stuff under /opt/local.  I have
/opt/local/bin in $PATH so that pkg-config can be found.  I run
./configure with --with-icu but without mentioning any special paths
and it says:

checking whether to build with ICU support... yes
checking for pkg-config... /opt/local/bin/pkg-config
checking pkg-config is at least version 0.9.0... yes
checking for icu-uc icu-i18n... yes

... but then building fails, because there are no headers in the search path:

gcc -Wall -Wmissing-prototypes -Wpointer-arith
-Wdeclaration-after-statement -Wendif-labels
-Wmissing-format-attribute -Wformat-security -fno-strict-aliasing
-fwrapv -Wno-unused-command-line-argument -g -O2 -Wall -Werror
-I../../../src/include/snowball
-I../../../src/include/snowball/libstemmer -I../../../src/include   -c
-o dict_snowball.o dict_snowball.c -MMD -MP -MF .deps/dict_snowball.Po
...
../../../src/include/utils/pg_locale.h:19:10: fatal error:
'unicode/ucol.h' file not found

But pkg-config does know where to find those headers:

$ pkg-config --cflags icu-i18n
-I/opt/local/include

... and it's not wrong:

$ ls /opt/local/include/unicode/ucol.h
/opt/local/include/unicode/ucol.h

So I think there may be a bug in the configure script: I think it
should be putting pkg-config's --cflags output into one of the
CFLAGS-like variables.

Or am I misunderstanding what pkg-config is supposed to be doing for us here?

-- 
Thomas Munro
http://www.enterprisedb.com




Re: [HACKERS] increasing the default WAL segment size

2017-03-23 Thread Peter Eisentraut
On 3/22/17 17:33, David Steele wrote:
> I think if we don't change the default size it's very unlikely I would 
> use alternate WAL segment sizes or recommend that anyone else does, at 
> least in v10.
> 
> I simply don't think it would get the level of testing required to be 
> production worthy

I think we could tweak the test harnesses to run all the tests with
different segment sizes.  That would get pretty good coverage.

More generally, the methodology that says we should not add an option
unless we also change the default (because the option would otherwise not
get enough testing) is a bit dubious.

> and I doubt that most tool writers would be quick to 
> add support for a feature that very few people (if any) use.

I'm not one of those tool writers, although I have written my share of
DBA scripts over the years, but I wonder why those tools would really
care.  They are handed files with predetermined names to archive, and
for restore, files with predetermined names are requested back from them.
What else do they need?  If something is missing that requires them to
parse file names, then maybe that should be added.

-- 
Peter Eisentraut  http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: [HACKERS] Re: [BUGS] Problem in using pgbench's --connect(-C) and --rate=rate(-R rate) options together.

2017-03-23 Thread Teodor Sigaev

Hi, the patch looks good, except: why do you remove the initialization of
is_throttled? I suppose that's just a typo?

 --- a/src/bin/pgbench/pgbench.c
+++ b/src/bin/pgbench/pgbench.c
@@ -1967,7 +1967,6 @@ top:
st->listen = false;
st->sleeping = false;
st->throttling = false;
-   st->is_throttled = false;
memset(st->prepared, 0, sizeof(st->prepared));
}


--
Teodor Sigaev   E-mail: teo...@sigaev.ru
   WWW: http://www.sigaev.ru/




Re: [HACKERS] Other formats in pset like markdown, rst, mediawiki

2017-03-23 Thread Jan Michálek
2017-03-23 17:26 GMT+01:00 Pierre Ducroquet :

> The following review has been posted through the commitfest application:
> make installcheck-world:  tested, passed
> Implements feature:   tested, passed
> Spec compliant:   tested, passed
> Documentation:tested, passed
>
> Hi
>
> This is my first review (Magnus said in his presentation in PGDay Paris
> that volunteers should just come and help, so here I am), so please notify
> me of any mistakes I make when using the review tools...
>
> The feature seems to work as expected, but I don't claim to be a markdown
> and rst expert.
> Some minor issues with the code itself :
> - some indentation issues (documentation and code alike mix space-based
> and tab-based indentation) and a few trailing spaces in code
> - typographic issues in the documentation :
>   - "The html, asciidoc, latex, latex-longtable, troff-ms, and markdown
> and rst formats" ==> duplicated and
>   - "Sets the output format to one of unaligned, aligned, wrapped, html,
> asciidoc, latex (uses tabular), latex-longtable, rst, markdown, or
> troff-ms." ==> extra comma at the end of the list
> - the comment " dont add line after last row, because line is added after
> every row" is misleading, it should warn that it's only for rst
> - there is a block of commented out code left
> - in the print_aligned_vertical function, there is a mix between
> "cont->opt->format == PRINT_RST" and "format == _rst" and I don't see
> any obvious reason for that
> - the documentation doesn't mention (but ok, it's kind of obvious) that
> the linestyle option will not work with rst and markdown
>
> Thanks !
>

Thanks.
I will work on it this weekend. I need to rebase it onto current master,
and I will fix the indentation issues at the same time.
I also need to add \x support to the markdown format, plus the title
handling discussed in older posts in this thread.

Nice Day Je;



>
> The new status of this patch is: Waiting on Author
>



-- 
Jelen
Starší čeledín datovýho chlíva


Re: [HACKERS] Potential data loss of 2PC files

2017-03-23 Thread Teodor Sigaev

Hmm, it doesn't work (but applies) on current HEAD:

% uname -a
FreeBSD *** 11.0-RELEASE-p8 FreeBSD 11.0-RELEASE-p8 #0 r315651: Tue Mar 21 
02:44:23 MSK 2017 teodor@***:/usr/obj/usr/src/sys/XOR  amd64

% pg_config --configure
'--enable-depend' '--enable-cassert' '--enable-debug' '--enable-tap-tests' 
'--with-perl' 'CC=clang' 'CFLAGS=-O0'

% rm -rf /spool/pg_data/*
% initdb -n -E UTF8 --locale ru_RU.UTF-8 --lc-messages C --lc-monetary C 
--lc-numeric C --lc-time C -D /spool/pg_data

Running in no-clean mode.  Mistakes will not be cleaned up.
The files belonging to this database system will be owned by user "teodor".
This user must also own the server process.

The database cluster will be initialized with locales
  COLLATE:  ru_RU.UTF-8
  CTYPE:ru_RU.UTF-8
  MESSAGES: C
  MONETARY: C
  NUMERIC:  C
  TIME: C
The default text search configuration will be set to "russian".

Data page checksums are disabled.

fixing permissions on existing directory /spool/pg_data ... ok
creating subdirectories ... ok
selecting default max_connections ... 100
selecting default shared_buffers ... 128MB
selecting dynamic shared memory implementation ... posix
creating configuration files ... ok
running bootstrap script ... 2017-03-23 23:07:14.705 MSK [40286] FATAL:  could not open file "pg_clog": No such file or directory

child process exited with exit code 1
initdb: data directory "/spool/pg_data" not removed at user's request


--
Teodor Sigaev   E-mail: teo...@sigaev.ru
   WWW: http://www.sigaev.ru/




Re: [HACKERS] ICU integration

2017-03-23 Thread Jeff Janes
On Thu, Mar 23, 2017 at 12:34 PM, Peter Eisentraut <
peter.eisentr...@2ndquadrant.com> wrote:

> On 3/23/17 05:34, Andreas Karlsson wrote:
> > I am fine with this version of the patch. The issues I have with it,
> > which I mentioned earlier in this thread, seem to be issues with ICU
> > rather than with this patch. For example there seems to be no way for
> > ICU to validate the syntax of the BCP 47 locales (or ICU's old format).
> > But I think we will just have to accept the weirdness of how ICU handles
> > locales.
> >
> > I think this patch is ready to be committed.
> >
> > Found a typo in the documentation:
> >
> > "The inspect the currently available locales" should be "To inspect the
> > currently available locales".
>
> Committed.
>

This has broken the C locale, and the build farm.

    if (pg_database_encoding_max_length() > 1 || locale->provider == COLLPROVIDER_ICU)

segfaults because locale is null (locale_is_c is true).
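
One possible shape for a fix -- offered only as a sketch here, not
necessarily what will be committed -- is to test the pointer before
dereferencing it, so the C-locale case (locale == NULL) keeps working:

    if (pg_database_encoding_max_length() > 1 ||
        (locale && locale->provider == COLLPROVIDER_ICU))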

Cheers,

Jeff


Re: [HACKERS] Hash support for grouping sets

2017-03-23 Thread Andrew Gierth
> "Mark" == Mark Dilger  writes:

 Mark> Is there a performance test case where this patch should shine
 Mark> brightest?  I'd like to load a schema with lots of data, and run
 Mark> a grouping sets query, both before and after applying the patch,
 Mark> to see what the performance advantage is.

The area which I think is most important for performance is the handling
of small cubes; without this patch, a 2d cube needs 2 full sorts, a 3d
one needs 3, and a 4d one needs 6. In many real-world data sets these
would all hash entirely in memory.

So here's a very simple example (deliberately using integers for
grouping to minimize the advantage; grouping by text columns in a non-C
locale would show a much greater speedup for the patch):

create table sales (
  id serial,
  product_id integer,
  store_id integer,
  customer_id integer,
  qty integer);

-- random integer function
create function d(integer) returns integer language sql
 as $f$ select floor(random()*$1)::integer + 1; $f$;

-- 10 million random rows
insert into sales (product_id,store_id,customer_id,qty)
  select d(20), d(6), d(10), d(100) from generate_series(1,10000000);

-- example 2d cube:
select product_id, store_id, count(*), sum(qty)
  from sales
 group by cube(product_id, store_id);

-- example 3d cube:
select product_id, store_id, customer_id, count(*), sum(qty)
  from sales
 group by cube(product_id, store_id, customer_id);

-- example 4d cube with a computed column:
select product_id, store_id, customer_id, (qty / 10), count(*), sum(qty)
  from sales
 group by cube(product_id, store_id, customer_id, (qty / 10));

On my machine, the 2d cube is about 3.6 seconds with the patch, and
about 8 seconds without it; the 4d is about 18 seconds with the patch
and about 32 seconds without it (all with work_mem=1GB, compiled with
-O2 and assertions off).

-- 
Andrew (irc:RhodiumToad)




Re: [HACKERS] [POC] A better way to expand hash indexes.

2017-03-23 Thread Mithun Cy
Hi Amit, please find the new patch attached.

On Tue, Mar 21, 2017 at 8:16 AM, Amit Kapila  wrote:
> It is not only about above calculation, but also what the patch is
> doing in function _hash_get_tbuckets().  By the way function name also
> seems unclear (mainly *tbuckets* in the name).

Fixed. I have introduced some macros for readability and added more
comments to explain why some calculations are made. Please let me know
if you think more improvements are needed.

>spelling.
>/forth/fourth
>/at time/at a time
Thanks fixed.

-- 
Thanks and Regards
Mithun C Y
EnterpriseDB: http://www.enterprisedb.com


expand_hashbucket_efficiently_04.patch
Description: Binary data



Re: [HACKERS] Privilege checks on array coercions

2017-03-23 Thread Andres Freund
On 2017-03-23 15:26:51 -0400, Tom Lane wrote:
> There is a test in privileges.sql (currently lines 589-625 in
> privileges.out) that seems to be dependent on the fact that the
> ArrayCoerceExpr logic doesn't check for EXECUTE privilege on the
> per-element type coercion function if it's dealing with a NULL input
> array.
> 
> While fooling with Andres' faster-expressions patch, I moved the
> pg_proc_aclcheck call for this into expression compilation, causing
> that privileges.sql test to fail.
> 
> Since Andres' patch moves ACL checks for regular function calls into
> expression compilation, I think it would be weird and inconsistent not
> to do so for ArrayCoerceExpr as well.  Does anyone want to defend this
> privileges test case as testing for some behavior that users expect?

Not me - that seems quite sensible to change.

Andres




Re: [HACKERS] ICU integration

2017-03-23 Thread Peter Eisentraut
On 3/23/17 05:34, Andreas Karlsson wrote:
> I am fine with this version of the patch. The issues I have with it, 
> which I mentioned earlier in this thread, seem to be issues with ICU 
> rather than with this patch. For example there seems to be no way for 
> ICU to validate the syntax of the BCP 47 locales (or ICU's old format). 
> But I think we will just have to accept the weirdness of how ICU handles 
> locales.
> 
> I think this patch is ready to be committed.
> 
> Found a typo in the documentation:
> 
> "The inspect the currently available locales" should be "To inspect the 
> currently available locales".

Committed.

-- 
Peter Eisentraut  http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




[HACKERS] Privilege checks on array coercions

2017-03-23 Thread Tom Lane
There is a test in privileges.sql (currently lines 589-625 in
privileges.out) that seems to be dependent on the fact that the
ArrayCoerceExpr logic doesn't check for EXECUTE privilege on the
per-element type coercion function if it's dealing with a NULL input
array.

While fooling with Andres' faster-expressions patch, I moved the
pg_proc_aclcheck call for this into expression compilation, causing
that privileges.sql test to fail.

Since Andres' patch moves ACL checks for regular function calls into
expression compilation, I think it would be weird and inconsistent not
to do so for ArrayCoerceExpr as well.  Does anyone want to defend this
privileges test case as testing for some behavior that users expect?

regards, tom lane




Re: [HACKERS] [PATCH] Move all am-related reloption code into src/backend/access/[am-name] and get rid of relopt_kind for custom AM

2017-03-23 Thread Fabrízio de Royes Mello
On Thu, Mar 23, 2017 at 3:58 PM, Alvaro Herrera 
wrote:
>
> Copying Fabrízio Mello here, who spent some time trying to work on
> reloptions too.  He may have something to say about the new
> functionality that this patch provides.
>

Thanks Álvaro, I'll look the patch and try to help in some way.

Regards,

--
   Fabrízio de Royes Mello Timbira - http://www.timbira.com.br/
   PostgreSQL: Consultoria, Desenvolvimento, Suporte 24x7 e Treinamento


Re: [HACKERS] Parallel Append implementation

2017-03-23 Thread Amit Khandekar
On 23 March 2017 at 16:26, Amit Khandekar  wrote:
> On 23 March 2017 at 05:55, Robert Haas  wrote:
>> On Wed, Mar 22, 2017 at 4:49 AM, Amit Khandekar wrote:
>>> Attached is the updated patch that handles the changes for all the
>>> comments except the cost changes part. Details about the specific
>>> changes are after the cost-related points discussed below.
>>>
>>> For non-partial paths, I was checking following 3 options :
>>>
>>> Option 1. Just take the sum of total non-partial child costs and
>>> divide it by number of workers. It seems to be getting close to the
>>> actual cost.
>>
>> If the costs for all children are about equal, then that works fine.
>> But when they are very unequal, then it's highly misleading.
>>
>>> Option 2. Calculate exact cost by an algorithm which I mentioned
>>> before, which is pasted below for reference :
>>> Per-subpath cost: 20 16 10 8 3 1, with 3 workers.
>>> After 10 time units (this is minimum of first 3 i.e. 20, 16, 10), the
>>> times remaining are :
>>> 10  6  0 8 3 1
>>> After 6 units (minimum of 10, 06, 08), the times remaining are :
>>> 4  0  0 2 3 1
>>> After 2 units (minimum of 4, 2, 3), the times remaining are :
>>>  2  0  0 0 1 1
>>> After 1 units (minimum of 2, 1, 1), the times remaining are :
>>>  1  0  0 0 0 0
>>> After 1 units (minimum of 1, 0 , 0), the times remaining are :
>>>  0  0  0 0 0 0
>>> Now add up above time chunks : 10 + 6 + 2 + 1 + 1 = 20
>>
>
>> This gives the same answer as what I was proposing
>
> Ah I see.
>
>> but I believe it's more complicated to compute.
> Yes a bit, particularly because in my algorithm, I would have to do
> 'n' subtractions each time, in case of 'n' workers. But it looked more
> natural because it follows exactly the way we manually calculate.
>
>> The way my proposal would work in this
>> case is that we would start with an array C[3] (since there are three
>> workers], with all entries 0.  Logically C[i] represents the amount of
>> work to be performed by worker i.  We add each path in turn to the
>> worker whose array entry is currently smallest; in the case of a tie,
>> just pick the first such entry.
>>
>> So in your example we do this:
>>
>> C[0] += 20;
>> C[1] += 16;
>> C[2] += 10;
>> /* C[2] is smaller than C[0] or C[1] at this point, so we add the next
>> path to C[2] */
>> C[2] += 8;
>> /* after the previous line, C[1] is now the smallest, so add to that
>> entry next */
>> C[1] += 3;
>> /* now we've got C[0] = 20, C[1] = 19, C[2] = 18, so add to C[2] */
>> C[2] += 1;
>> /* final result: C[0] = 20, C[1] = 19, C[2] = 19 */
>>
>> Now we just take the highest entry that appears in any array, which in
>> this case is C[0], as the total cost.
>
> Wow. The way your final result exactly tallies with my algorithm
> result is very interesting. This looks like some maths or computer
> science theory that I am not aware of.
>
> I am currently coding the algorithm using your method.

While I was coding this, I was considering whether Path->rows should also
be calculated the way total cost is, i.e. separately for non-partial and
partial subpaths. I think for rows we can just take total_rows divided by
the number of workers for non-partial paths, and that approximation should
suffice. It seems odd for rows to be treated with the same algorithm we
chose for the total cost of non-partial paths.

Meanwhile, attached is a WIP patch v10. The only change in this patch
w.r.t. the last patch (v9) is that this one defines a new function,
append_nonpartial_cost(). I'm just sending this to show what the algorithm
looks like; it isn't called from anywhere yet.
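
For reference, here is a minimal self-contained sketch of that greedy
assignment written as backend-style C; the function and variable names are
illustrative only, not taken from the actual patch:

/*
 * Illustrative sketch of the greedy assignment discussed above -- not
 * the actual append_nonpartial_cost().  Assumes costs[] holds the total
 * costs of the non-partial subpaths in descending order.  Each subpath
 * goes to the worker with the least work assigned so far (ties go to
 * the first such worker); the result is the finishing time of the
 * busiest worker.
 */
static double
nonpartial_cost_sketch(const double *costs, int ncosts, int nworkers)
{
	double	   *work = (double *) palloc0(nworkers * sizeof(double));
	double		result = 0.0;
	int			i,
				j;

	for (i = 0; i < ncosts; i++)
	{
		int			min_index = 0;

		/* find the least-loaded worker */
		for (j = 1; j < nworkers; j++)
			if (work[j] < work[min_index])
				min_index = j;

		work[min_index] += costs[i];
	}

	for (i = 0; i < nworkers; i++)
		result = Max(result, work[i]);

	pfree(work);
	return result;
}

For the costs 20 16 10 8 3 1 with 3 workers this returns 20, matching both
of the manual calculations above.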


ParallelAppend_v10_WIP.patch
Description: Binary data

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [PATCH] Move all am-related reloption code into src/backend/access/[am-name] and get rid of relopt_kind for custom AM

2017-03-23 Thread Alvaro Herrera
Copying Fabrízio Mello here, who spent some time trying to work on
reloptions too.  He may have something to say about the new
functionality that this patch provides.

-- 
Álvaro Herrerahttps://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Patch: Write Amplification Reduction Method (WARM)

2017-03-23 Thread Pavan Deolasee
On Thu, Mar 23, 2017 at 7:53 PM, Amit Kapila 
wrote:

> On Thu, Mar 23, 2017 at 3:44 PM, Pavan Deolasee
>
> >
> > Yes, this is a very fair point. The way I proposed to address this
> upthread
> > is by introducing a set of threshold/scale GUCs specific to WARM. So
> users
> > can control when to invoke WARM cleanup. Only if the WARM cleanup is
> > required, we do 2 index scans. Otherwise vacuum will work the way it
> works
> > today without any additional overhead.
> >
>
> I am not sure on what basis a user can set such parameters; it will be
> quite difficult to tune them.  I think the point is that whatever
> threshold we keep, once that is crossed, it will perform two scans for
> all the indexes.


Well, that applies even to vacuum parameters, no? The general sense I've
got here is that we're ok with pushing some work into the background if it
helps real-time queries, and I kinda agree with that. If WARM improves
things in a significant manner even with this additional maintenance work,
it's still worth doing.

Having said that, I see many ways we can improve on this later. For
example, we can track information elsewhere about tuples which may have
received WARM updates (I think it will need to be a per-index bitmap or
so) and use that to do WARM chain conversion in a single index pass. But
this is clearly not PG 10 material.


>   IIUC, this conversion of WARM chains is
> required so that future updates can be WARM, or is there any other
> reason?  I see this as a big penalty for future updates.
>

It's also necessary for index-only scans. But I don't see this as a big
penalty for future updates, because if there are indeed significant WARM
updates then not preparing for future updates will result in
write-amplification, something we are trying to solve here and something
which seems to be showing good gains.

Thanks,
Pavan

-- 
 Pavan Deolasee   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


Re: [HACKERS] Hash support for grouping sets

2017-03-23 Thread Andres Freund
On 2017-03-23 05:09:46 +, Andrew Gierth wrote:
> > "Andres" == Andres Freund  writes:
> 
>  >> - Assert(newphase == 0 || newphase == aggstate->current_phase + 1);
>  >> + Assert(newphase <= 1 || newphase == aggstate->current_phase + 1);
> 
>  Andres> I think this somewhere in the file header needs an expanded
>  Andres> explanation about what these "special" phases mean.
> 
> I could move most of the block comment for ExecInitAgg up into the file
> header; would that be better?

Yes, I think so.


>  >> + values = palloc((1 + max_weight) * sizeof(double));
>  >> + sets = palloc((1 + max_weight) * sizeof(Bitmapset *));
> 
>  Andres> We usually cast the result of palloc.
> 
> Rough count in the backend has ~400 without casts to ~1350 with, so this
> doesn't seem to have been consistently enforced.

Yea, but we're still trying.

>  >> + RelOptInfo *grouped_rel,
>  >> + Path *path,
>  >> + bool is_sorted,
>  >> + bool can_hash,
>  >> + PathTarget *target,
>  >> + grouping_sets_data *gd,
>  >> + const AggClauseCosts 
> *agg_costs,
>  >> + double dNumGroups)
>  >> +{
>  >> + Query  *parse = root->parse;
>  >> +
>  >> + /*
>  >> +  * If we're not being offered sorted input, then only consider plans 
> that
>  >> +  * can be done entirely by hashing.
>  >> +  *
>  >> +  * We can hash everything if it looks like it'll fit in work_mem. But if
>  >> +  * the input is actually sorted despite not being advertised as such, we
>  >> +  * prefer to make use of that in order to use less memory.
>  >> +  * If none of the grouping sets are sortable, then ignore the work_mem
>  >> +  * limit and generate a path anyway, since otherwise we'll just fail.
>  >> +  */
>  >> + if (!is_sorted)
>  >> + {
>  >> + List   *new_rollups = NIL;
>  >> + RollupData *unhashable_rollup = NULL;
>  >> + List   *sets_data;
>  >> + List   *empty_sets_data = NIL;
>  >> + List   *empty_sets = NIL;
>  >> + ListCell   *lc;
>  >> + ListCell   *l_start = list_head(gd->rollups);
>  >> + AggStrategy strat = AGG_HASHED;
>  >> + Sizehashsize;
>  >> + double  exclude_groups = 0.0;
>  >> +
>  >> + Assert(can_hash);
>  >> +
>  >> + if (pathkeys_contained_in(root->group_pathkeys, path->pathkeys))
>  >> + {
>  >> + unhashable_rollup = lfirst(l_start);
> 
>  Andres> a) cast result of lfirst/lnext/whatnot.
> 
> casting lfirst is defensible (and only about 10% of existing uses don't
> cast it), but why would you ever cast the result of lnext?

Obviously lnext is bogus, was thinking of linitial...

>  >> + if (can_hash && gd->any_hashable)
>  >> + {
>  >> + List   *rollups = NIL;
>  >> + List   *hash_sets = list_copy(gd->unsortable_sets);
>  >> + double  availspace = (work_mem * 1024.0);
> 
>  Andres> Why the parens, and why the .0?
> 
> The .0 is because the * should be done as a double, not an int.

I was just missing why - but now that I haven't been awake for 20 hours,
8 of those crammed into a plane, it's obviously beneficial because of
integer overflow concerns.


>  Andres> I think you need a higher level explanation of how you're
>  Andres> mapping the work_mem issue to knapsack somewhere.
> 
> The general overview ("work_mem is the knapsack size, and we need to
> figure out how best to pack it with hashtables"), or the specific issue
> of the scale factor for discrete chunking of the size?

The general overview - the scaling thing doesn't seem that important to
understand in detail, to understand the approach.  Briefly explaining
what we're trying to minimize (sort steps), what the constraint is
(memory), that the problem is representable as the classic "knapsack
problem".


>  Andres> Hm.  So we're solely optimizing for the number of sorts, rather
>  Andres> the cost of the sorts.  Probably an acceptable tradeoff.
> 
> The existing cost model for sorts doesn't actually seem to help us here;
> all of our sorts sort the same data, just with different comparison
> columns, but cost_sort doesn't actually account for that (all its
> callers except the CLUSTER planning code pass in 0.0 for
> comparison_cost).
> 
> If the sort costs were correct, we could compute a "value" for each
> rollup that reflected the cost difference between the sort+unique
> comparisons for it, and the hash functions that would be called if we
> hashed instead; and just pass that to the knapsack function.

That'd indeed be desirable - but I think we can punt on that for now.

Greetings,


Re: [HACKERS] Hash support for grouping sets

2017-03-23 Thread Mark Dilger

> On Mar 23, 2017, at 11:22 AM, Andrew Gierth  
> wrote:
> 
>> "Mark" == Mark Dilger  writes:
> 
> Mark> You define DiscreteKnapsack to take integer weights and double
> Mark> values, and perform the usual Dynamic Programming algorithm to
> Mark> solve.  But the only place you call this, you pass in NULL for
> Mark> the values, indicating that all the values are equal.  I'm
> Mark> confused why in this degenerate case you bother with the DP at
> Mark> all.  Can't you just load the knapsack from lightest to heaviest
> Mark> after sorting, reducing the runtime complexity to O(nlogn)?  It's
> Mark> been a long day.  Sorry if I am missing something obvious.
> 
> That's actually a very good question.
> 
> (Though in passing I should point out that the runtime cost is going to
> be negligible in all practical cases. Even an 8-way cube (256 grouping
> sets) has only 70 rollups, and a 4-way cube has only 6; the limit of
> 4096 grouping sets was only chosen because it was a nice round number
> that was larger than comparable limits in other database products. Other
> stuff in the grouping sets code has worse time bounds; the
> bipartite-match code used to minimize the number of rollups is
> potentially O(n^2.5) in the number of grouping sets.)
> 
> Thinking about it, it's not at all obvious what we should be optimizing
> for here in the absence of accurate sort costs.

Is there a performance test case where this patch should shine
brightest?  I'd like to load a schema with lots of data, and run
a grouping sets query, both before and after applying the patch,
to see what the performance advantage is.

mark

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Hash support for grouping sets

2017-03-23 Thread Andres Freund
On 2017-03-23 03:43:57 +, Andrew Gierth wrote:
> > "Andres" == Andres Freund  writes:
> 
>  Andres> Changes to advance_aggregates() are, in my experience, quite
>  Andres> likely to have performance effects.  This needs some
>  Andres> performance tests.
>  [...]
>  Andres> Looks like it could all be noise, but it seems worthwhile to
>  Andres> look into some.
> 
> Trying to sort out signal from noise when dealing with performance
> impacts of no more than a few percent is _extremely hard_ these days.

Indeed. But that doesn't mean we needn't try.  With some determination
and profiling you can often separate signal from noise - I managed to
track down 0.12% regressions in the expression evaluation work...


> I will go ahead and do this, out of sheer curiosity if nothing else,
> but the preliminary results suggest there's probably nothing worth
> holding up the patch for.

Agreed. I'd want to run one more profile, checking whether the profiles
indicate new hotspots, but other than that...

Greetings,

Andres Freund


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Patch: Write Amplification Reduction Method (WARM)

2017-03-23 Thread Pavan Deolasee
On Thu, Mar 23, 2017 at 11:44 PM, Mithun Cy 
wrote:

> Hi Pavan,
> On Thu, Mar 23, 2017 at 12:19 AM, Pavan Deolasee
>  wrote:
> > Ok, no problem. I did some tests on AWS i2.xlarge instance (4 vCPU, 30GB
> > RAM, attached SSD) and results are shown below. But I think it is
> important
> > to get independent validation from your side too, just to ensure I am not
> > making any mistake in measurement. I've attached naively put together
> > scripts which I used to run the benchmark. If you find them useful,
> please
> > adjust the paths and run on your machine.
>
> I did a similar test. Your v19 looks fine to me; it does not cause any
> regression. On the other hand, I also ran tests reducing the table
> fillfactor to 80, where I can see a small regression of 2-3% on average
> when updating col2; on updating col9, again I do not see any
> regression.
>
>
Thanks Mithun for repeating the tests and confirming that the v19 patch
looks ok.

Thanks,
Pavan

-- 
 Pavan Deolasee   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


Re: [HACKERS] Page Scan Mode in Hash Index

2017-03-23 Thread Ashutosh Sharma
> Hi,
>
> On 03/23/2017 02:11 PM, Ashutosh Sharma wrote:
>>
>> On Thu, Mar 23, 2017 at 8:29 PM, Jesper Pedersen
>>  wrote:
>>>
>>> 0001v2:
>>>
>>> In hashgettuple() you can remove the 'currItem' and 'offnum' from the
>>> 'else'
>>> part, and do the assignment inside
>>>
>>>   if (so->numKilled < MaxIndexTuplesPerPage)
>>>
>>> instead.
>>>
>>
>> Done. Please have a look at the attached v3 patch.
>>
>>>
>>> No new comments for 0002 and 0003.
>>
>>
>> okay. Thanks.
>>
>
> I'll keep the entry in 'Needs Review' if Alexander, or others, want to add
> their feedback.

okay. Thanks.

>
> (Best to post the entire patch series each time)

I'll take your suggestion. Please find the attachments.

--
With Regards,
Ashutosh Sharma
EnterpriseDB:http://www.enterprisedb.com
From 4e953c35da2274165b00d763500b83e0f3f9e2a9 Mon Sep 17 00:00:00 2001
From: ashu 
Date: Thu, 23 Mar 2017 23:36:05 +0530
Subject: [PATCH] Rewrite hash index scans to work a page at a timev3

Patch by Ashutosh Sharma
---
 src/backend/access/hash/README   |   9 +-
 src/backend/access/hash/hash.c   | 121 +++--
 src/backend/access/hash/hashpage.c   |  14 +-
 src/backend/access/hash/hashsearch.c | 330 ++-
 src/backend/access/hash/hashutil.c   |  23 ++-
 src/include/access/hash.h|  44 +
 6 files changed, 385 insertions(+), 156 deletions(-)

diff --git a/src/backend/access/hash/README b/src/backend/access/hash/README
index 1541438..f0a7bdf 100644
--- a/src/backend/access/hash/README
+++ b/src/backend/access/hash/README
@@ -243,10 +243,11 @@ The reader algorithm is:
 -- then, per read request:
 	reacquire content lock on current page
 	step to next page if necessary (no chaining of content locks, but keep
- the pin on the primary bucket throughout the scan; we also maintain
- a pin on the page currently being scanned)
-	get tuple
-	release content lock
+ the pin on the primary bucket throughout the scan)
+save all the matching tuples from current index page into an items array
+release pin and content lock (but if it is primary bucket page retain
+its pin till the end of scan)
+get tuple from an item array
 -- at scan shutdown:
 	release all pins still held
 
diff --git a/src/backend/access/hash/hash.c b/src/backend/access/hash/hash.c
index 34cc08f..8c28fbd 100644
--- a/src/backend/access/hash/hash.c
+++ b/src/backend/access/hash/hash.c
@@ -268,66 +268,23 @@ bool
 hashgettuple(IndexScanDesc scan, ScanDirection dir)
 {
 	HashScanOpaque so = (HashScanOpaque) scan->opaque;
-	Relation	rel = scan->indexRelation;
-	Buffer		buf;
-	Page		page;
 	OffsetNumber offnum;
-	ItemPointer current;
 	bool		res;
+	HashScanPosItem	*currItem;
 
 	/* Hash indexes are always lossy since we store only the hash code */
 	scan->xs_recheck = true;
 
 	/*
-	 * We hold pin but not lock on current buffer while outside the hash AM.
-	 * Reacquire the read lock here.
-	 */
-	if (BufferIsValid(so->hashso_curbuf))
-		LockBuffer(so->hashso_curbuf, BUFFER_LOCK_SHARE);
-
-	/*
 	 * If we've already initialized this scan, we can just advance it in the
 	 * appropriate direction.  If we haven't done so yet, we call a routine to
 	 * get the first item in the scan.
 	 */
-	current = &(so->hashso_curpos);
-	if (ItemPointerIsValid(current))
+	if (!HashScanPosIsValid(so->currPos))
+		res = _hash_first(scan, dir);
+	else
 	{
 		/*
-		 * An insertion into the current index page could have happened while
-		 * we didn't have read lock on it.  Re-find our position by looking
-		 * for the TID we previously returned.  (Because we hold a pin on the
-		 * primary bucket page, no deletions or splits could have occurred;
-		 * therefore we can expect that the TID still exists in the current
-		 * index page, at an offset >= where we were.)
-		 */
-		OffsetNumber maxoffnum;
-
-		buf = so->hashso_curbuf;
-		Assert(BufferIsValid(buf));
-		page = BufferGetPage(buf);
-
-		/*
-		 * We don't need test for old snapshot here as the current buffer is
-		 * pinned, so vacuum can't clean the page.
-		 */
-		maxoffnum = PageGetMaxOffsetNumber(page);
-		for (offnum = ItemPointerGetOffsetNumber(current);
-			 offnum <= maxoffnum;
-			 offnum = OffsetNumberNext(offnum))
-		{
-			IndexTuple	itup;
-
-			itup = (IndexTuple) PageGetItem(page, PageGetItemId(page, offnum));
-			if (ItemPointerEquals(&(so->hashso_heappos), &(itup->t_tid)))
-break;
-		}
-		if (offnum > maxoffnum)
-			elog(ERROR, "failed to re-find scan position within index \"%s\"",
- RelationGetRelationName(rel));
-		ItemPointerSetOffsetNumber(current, offnum);
-
-		/*
 		 * Check to see if we should kill the previously-fetched tuple.
 		 */
 		if (scan->kill_prior_tuple)
@@ -346,9 +303,11 @@ hashgettuple(IndexScanDesc scan, ScanDirection dir)
 
 			if (so->numKilled < MaxIndexTuplesPerPage)
 			{
-so->killedItems[so->numKilled].heapTid = so->hashso_heappos;
-so->killedItems[so->numKilled].indexOffset =
-			

Re: [HACKERS] [PATCH] Transaction traceability - txid_status(bigint)

2017-03-23 Thread Robert Haas
On Tue, Mar 21, 2017 at 11:35 PM, Craig Ringer  wrote:
> Changes made per discussion.

Committed 0001.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] createlang/droplang deprecated

2017-03-23 Thread Peter Eisentraut
On 3/23/17 06:41, Daniel Gustafsson wrote:
>> On 20 Mar 2017, at 01:37, Peter Eisentraut 
>>  wrote:
>>
>> On 3/18/17 09:00, Peter Eisentraut wrote:
>>> I just noticed that createlang and droplang have been listed as
>>> deprecated since PG 9.1.
>>>
>>> Do we dare remove them?
>>
>> Patch
> 
> LGTM, +1

Committed.

-- 
Peter Eisentraut  http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Hash support for grouping sets

2017-03-23 Thread Andrew Gierth
> "Mark" == Mark Dilger  writes:

 Mark> You define DiscreteKnapsack to take integer weights and double
 Mark> values, and perform the usual Dynamic Programming algorithm to
 Mark> solve.  But the only place you call this, you pass in NULL for
 Mark> the values, indicating that all the values are equal.  I'm
 Mark> confused why in this degenerate case you bother with the DP at
 Mark> all.  Can't you just load the knapsack from lightest to heaviest
 Mark> after sorting, reducing the runtime complexity to O(nlogn)?  It's
 Mark> been a long day.  Sorry if I am missing something obvious.

That's actually a very good question.

(Though in passing I should point out that the runtime cost is going to
be negligible in all practical cases. Even an 8-way cube (256 grouping
sets) has only 70 rollups, and a 4-way cube has only 6; the limit of
4096 grouping sets was only chosen because it was a nice round number
that was larger than comparable limits in other database products. Other
stuff in the grouping sets code has worse time bounds; the
bipartite-match code used to minimize the number of rollups is
potentially O(n^2.5) in the number of grouping sets.)

Thinking about it, it's not at all obvious what we should be optimizing
for here in the absence of accurate sort costs. Right now what matters
is saving as many sort steps as possible, since that's a saving likely
to be many orders of magnitude greater than the differences between two
sorts. The one heuristic that might be useful is to prefer the smaller
estimated size if other factors are equal, just on memory usage grounds,
but even that seems a bit tenuous.

I'm inclined to leave things as they are in the current patch, and maybe
revisit it during beta if we get any relevant feedback from people
actually using grouping sets?

 Mark> I'm guessing you intend to extend the code at some later date to
 Mark> have different values for items.  Is that right?

Well, as I mentioned in a reply to Andres, we probably should eventually
optimize for greatest reduction in cost, and the code as it stands would
allow that to be added easily, but that would require having more
accurate cost info than we have now. cost_sort doesn't take into
consideration the number or types of sort columns or the costs of their
comparison functions, so all sorts of the same input data are costed
equally.
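
For anyone following along, here is a minimal self-contained sketch of the
dynamic-programming formulation under discussion. The names are
illustrative, and it omits the Bitmapset bookkeeping the real
DiscreteKnapsack() needs in order to report *which* items were chosen:

/*
 * 0/1 knapsack DP, illustrative sketch only.  best[w] is the best total
 * value achievable using capacity at most w.  With values == NULL every
 * item counts as 1.0, i.e. we just maximize the number of items that
 * fit -- the degenerate case raised above, which a lightest-first
 * greedy load would indeed also solve.
 */
static double
knapsack_sketch(int max_weight, int nitems,
				const int *weights, const double *values)
{
	double	   *best = (double *) palloc0((max_weight + 1) * sizeof(double));
	double		result;
	int			i,
				w;

	for (i = 0; i < nitems; i++)
	{
		/* walk capacities downward so each item is used at most once */
		for (w = max_weight; w >= weights[i]; w--)
		{
			double		cand = best[w - weights[i]] +
			(values ? values[i] : 1.0);

			if (cand > best[w])
				best[w] = cand;
		}
	}

	result = best[max_weight];
	pfree(best);
	return result;
}

The point of keeping the DP form even in the degenerate case is the one
made above: once better sort costs become available, per-rollup values can
be passed in without restructuring anything.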

-- 
Andrew (irc:RhodiumToad)


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Page Scan Mode in Hash Index

2017-03-23 Thread Jesper Pedersen

Hi,

On 03/23/2017 02:11 PM, Ashutosh Sharma wrote:

On Thu, Mar 23, 2017 at 8:29 PM, Jesper Pedersen
 wrote:

0001v2:

In hashgettuple() you can remove the 'currItem' and 'offnum' from the 'else'
part, and do the assignment inside

  if (so->numKilled < MaxIndexTuplesPerPage)

instead.



Done. Please have a look at the attached v3 patch.



No new comments for 0002 and 0003.


okay. Thanks.



I'll keep the entry in 'Needs Review' if Alexander, or others, want to 
add their feedback.


(Best to post the entire patch series each time)

Best regards,
 Jesper



--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Patch: Write Amplification Reduction Method (WARM)

2017-03-23 Thread Mithun Cy
Hi Pavan,
On Thu, Mar 23, 2017 at 12:19 AM, Pavan Deolasee
 wrote:
> Ok, no problem. I did some tests on AWS i2.xlarge instance (4 vCPU, 30GB
> RAM, attached SSD) and results are shown below. But I think it is important
> to get independent validation from your side too, just to ensure I am not
> making any mistake in measurement. I've attached naively put together
> scripts which I used to run the benchmark. If you find them useful, please
> adjust the paths and run on your machine.

I did a similar test. Your v19 looks fine to me; it does not cause any
regression. On the other hand, I also ran tests reducing the table
fillfactor to 80, where I can see a small regression of 2-3% on average
when updating col2; on updating col9, again I do not see any regression.

-- 
Thanks and Regards
Mithun C Y
EnterpriseDB: http://www.enterprisedb.com


WARM_test_02.ods
Description: application/vnd.oasis.opendocument.spreadsheet

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Page Scan Mode in Hash Index

2017-03-23 Thread Ashutosh Sharma
On Thu, Mar 23, 2017 at 8:29 PM, Jesper Pedersen
 wrote:
> Hi,
>
> On 03/22/2017 09:32 AM, Ashutosh Sharma wrote:
>>
>> Done. Please refer to the attached v2 version of patch.
>>
>
> Thanks.
>
 1) 0001-Rewrite-hash-index-scans-to-work-a-page-at-a-time.patch: this
 patch rewrites the hash index scan module to work in page-at-a-time
 mode. It basically introduces two new functions-- _hash_readpage() and
 _hash_saveitem(). The former is used to load all the qualifying tuples
 from a target bucket or overflow page into an items array. The latter
 one is used by _hash_readpage to save all the qualifying tuples found
 in a page into an items array. Apart from that, this patch basically
 cleans _hash_first(), _hash_next() and hashgettuple().

>
> 0001v2:
>
> In hashgettuple() you can remove the 'currItem' and 'offnum' from the 'else'
> part, and do the assignment inside
>
>   if (so->numKilled < MaxIndexTuplesPerPage)
>
> instead.
>

Done. Please have a look at the attached v3 patch.

>
> No new comments for 0002 and 0003.

okay. Thanks.

--
With Regards,
Ashutosh Sharma
EnterpriseDB:http://www.enterprisedb.com
From 4e953c35da2274165b00d763500b83e0f3f9e2a9 Mon Sep 17 00:00:00 2001
From: ashu 
Date: Thu, 23 Mar 2017 23:36:05 +0530
Subject: [PATCH] Rewrite hash index scans to work a page at a timev3

Patch by Ashutosh Sharma
---
 src/backend/access/hash/README   |   9 +-
 src/backend/access/hash/hash.c   | 121 +++--
 src/backend/access/hash/hashpage.c   |  14 +-
 src/backend/access/hash/hashsearch.c | 330 ++-
 src/backend/access/hash/hashutil.c   |  23 ++-
 src/include/access/hash.h|  44 +
 6 files changed, 385 insertions(+), 156 deletions(-)

diff --git a/src/backend/access/hash/README b/src/backend/access/hash/README
index 1541438..f0a7bdf 100644
--- a/src/backend/access/hash/README
+++ b/src/backend/access/hash/README
@@ -243,10 +243,11 @@ The reader algorithm is:
 -- then, per read request:
 	reacquire content lock on current page
 	step to next page if necessary (no chaining of content locks, but keep
- the pin on the primary bucket throughout the scan; we also maintain
- a pin on the page currently being scanned)
-	get tuple
-	release content lock
+ the pin on the primary bucket throughout the scan)
+save all the matching tuples from current index page into an items array
+release pin and content lock (but if it is primary bucket page retain
+its pin till the end of scan)
+get tuple from an item array
 -- at scan shutdown:
 	release all pins still held
 
diff --git a/src/backend/access/hash/hash.c b/src/backend/access/hash/hash.c
index 34cc08f..8c28fbd 100644
--- a/src/backend/access/hash/hash.c
+++ b/src/backend/access/hash/hash.c
@@ -268,66 +268,23 @@ bool
 hashgettuple(IndexScanDesc scan, ScanDirection dir)
 {
 	HashScanOpaque so = (HashScanOpaque) scan->opaque;
-	Relation	rel = scan->indexRelation;
-	Buffer		buf;
-	Page		page;
 	OffsetNumber offnum;
-	ItemPointer current;
 	bool		res;
+	HashScanPosItem	*currItem;
 
 	/* Hash indexes are always lossy since we store only the hash code */
 	scan->xs_recheck = true;
 
 	/*
-	 * We hold pin but not lock on current buffer while outside the hash AM.
-	 * Reacquire the read lock here.
-	 */
-	if (BufferIsValid(so->hashso_curbuf))
-		LockBuffer(so->hashso_curbuf, BUFFER_LOCK_SHARE);
-
-	/*
 	 * If we've already initialized this scan, we can just advance it in the
 	 * appropriate direction.  If we haven't done so yet, we call a routine to
 	 * get the first item in the scan.
 	 */
-	current = &(so->hashso_curpos);
-	if (ItemPointerIsValid(current))
+	if (!HashScanPosIsValid(so->currPos))
+		res = _hash_first(scan, dir);
+	else
 	{
 		/*
-		 * An insertion into the current index page could have happened while
-		 * we didn't have read lock on it.  Re-find our position by looking
-		 * for the TID we previously returned.  (Because we hold a pin on the
-		 * primary bucket page, no deletions or splits could have occurred;
-		 * therefore we can expect that the TID still exists in the current
-		 * index page, at an offset >= where we were.)
-		 */
-		OffsetNumber maxoffnum;
-
-		buf = so->hashso_curbuf;
-		Assert(BufferIsValid(buf));
-		page = BufferGetPage(buf);
-
-		/*
-		 * We don't need test for old snapshot here as the current buffer is
-		 * pinned, so vacuum can't clean the page.
-		 */
-		maxoffnum = PageGetMaxOffsetNumber(page);
-		for (offnum = ItemPointerGetOffsetNumber(current);
-			 offnum <= maxoffnum;
-			 offnum = OffsetNumberNext(offnum))
-		{
-			IndexTuple	itup;
-
-			itup = (IndexTuple) PageGetItem(page, PageGetItemId(page, offnum));
-			if (ItemPointerEquals(&(so->hashso_heappos), &(itup->t_tid)))
-break;
-		}
-		if (offnum > maxoffnum)
-			elog(ERROR, "failed to re-find scan position within index \"%s\"",
- RelationGetRelationName(rel));
-		

Re: [HACKERS] Enabling parallelism for queries coming from SQL or other PL functions

2017-03-23 Thread Robert Haas
On Thu, Mar 23, 2017 at 12:41 AM, Rafia Sabih
 wrote:
> Agree, done.

OK, committed execute-once-v3.patch after some further study.  See
also 
https://www.postgresql.org/message-id/CA%2BTgmoZ_ZuH%2BauEeeWnmtorPsgc_SmP%2BXWbDsJ%2BcWvWBSjNwDQ%40mail.gmail.com
which resulted from that study.

Regarding pl_parallel_opt_support_v3.patch, the change to
init_execution_state looks good.  Whether or not it's safe to use
parallel query doesn't depend on whether the function is marked
volatile, so the current code is just silly (and, arguably,
readonly_func is misnamed).  The changes to spi.c also seem fine; both
of those functions execute the plan to completion and don't allow
cursor operations, so we're good.

The changes to the plpgsql code don't look so good to me.  The change
to exec_stmt_return_query fixes the same bug that I mentioned in the
email linked above, but only half of it -- it corrects the RETURN
QUERY EXECUTE case but not the RETURN QUERY case.  And it's clearly a
separate change; that part is a bug fix, not an enhancement.  Some of
the other changes depend on whether we're in a trigger, which seems
irrelevant to whether we can use parallelism. Even if the outer query
is doing writes, we can still use parallelism for queries inside the
trigger function if warranted.  It's probably a rare case to have
queries inside a trigger that are expensive enough to justify such
handling but I don't see the value of putting in special-case logic to
prevent it.

I suspect that code fails to achieve its goals anyway.  At the top of
exec_eval_expr(), you call exec_prepare_plan() and unconditionally
pass CURSOR_OPT_PARALLEL_OK, so when that function returns, expr->plan
might now be a parallel plan.  If we reach the call to
exec_run_select() further down in that function, and if we happen to
pass false, it's not going to matter, because exec_run_select() is
going to find the plan already initialized.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Add pgstathashindex() to get hash index table statistics.

2017-03-23 Thread Ashutosh Sharma
Hi,

On Tue, Feb 7, 2017 at 9:23 AM, Robert Haas  wrote:
> On Mon, Feb 6, 2017 at 10:40 PM, Amit Kapila  wrote:
>>> Maybe we should call them "unused pages".
>>
>> +1.  If we consider some more names for that column then probably one
>> alternative could be "empty pages".
>
> Yeah, but I think "unused" might be better.  Because a page could be
> in use (as an overflow page or primary bucket page) and still be
> empty.
>

Based on the earlier discussions, I have prepared a patch that would
allow pgstathashindex() to show the number of unused pages in a hash
index. Please find the patch attached. Thanks.

--
With Regards,
Ashutosh Sharma
EnterpriseDB:http://www.enterprisedb.com
From e3b59fa85f16d6d15be5360e85b7faf63e8683a9 Mon Sep 17 00:00:00 2001
From: ashu 
Date: Thu, 23 Mar 2017 23:02:26 +0530
Subject: [PATCH] Allow pgstathashindex to show unused pages v1

---
 contrib/pgstattuple/expected/pgstattuple.out  | 12 ++--
 contrib/pgstattuple/pgstatindex.c | 19 ---
 contrib/pgstattuple/pgstattuple--1.4--1.5.sql |  1 +
 doc/src/sgml/pgstattuple.sgml | 17 -
 4 files changed, 31 insertions(+), 18 deletions(-)

diff --git a/contrib/pgstattuple/expected/pgstattuple.out b/contrib/pgstattuple/expected/pgstattuple.out
index 2c3515b..1f1ff46 100644
--- a/contrib/pgstattuple/expected/pgstattuple.out
+++ b/contrib/pgstattuple/expected/pgstattuple.out
@@ -132,9 +132,9 @@ select * from pgstatginindex('test_ginidx');
 
 create index test_hashidx on test using hash (b);
 select * from pgstathashindex('test_hashidx');
- version | bucket_pages | overflow_pages | bitmap_pages | zero_pages | live_items | dead_items | free_percent 
--+--++--++++--
-   2 |4 |  0 |1 |  0 |  0 |  0 |  100
+ version | bucket_pages | overflow_pages | bitmap_pages | zero_pages | unused_pages | live_items | dead_items | free_percent 
+-+--++--++--+++--
+   2 |4 |  0 |1 |  0 |0 |  0 |  0 |  100
 (1 row)
 
 -- these should error with the wrong type
@@ -233,9 +233,9 @@ select pgstatindex('test_partition_idx');
 (1 row)
 
 select pgstathashindex('test_partition_hash_idx');
-   pgstathashindex   
--
- (2,8,0,1,0,0,0,100)
+pgstathashindex
+---
+ (2,8,0,1,0,0,0,0,100)
 (1 row)
 
 drop table test_partitioned;
diff --git a/contrib/pgstattuple/pgstatindex.c b/contrib/pgstattuple/pgstatindex.c
index d448e9e..6fc41d6 100644
--- a/contrib/pgstattuple/pgstatindex.c
+++ b/contrib/pgstattuple/pgstatindex.c
@@ -120,6 +120,7 @@ typedef struct HashIndexStat
 	BlockNumber overflow_pages;
 	BlockNumber bitmap_pages;
 	BlockNumber zero_pages;
+	BlockNumber unused_pages;
 
 	int64	live_items;
 	int64	dead_items;
@@ -588,8 +589,8 @@ pgstathashindex(PG_FUNCTION_ARGS)
 	BufferAccessStrategy bstrategy;
 	HeapTuple	tuple;
 	TupleDesc	tupleDesc;
-	Datum		values[8];
-	bool		nulls[8];
+	Datum		values[9];
+	bool		nulls[9];
 	Buffer		metabuf;
 	HashMetaPage	metap;
 	float8		free_percent;
@@ -667,6 +668,8 @@ pgstathashindex(PG_FUNCTION_ARGS)
 			}
 			else if (opaque->hasho_flag & LH_BITMAP_PAGE)
 stats.bitmap_pages++;
+			else if (PageIsEmpty(page))
+stats.unused_pages++;
 			else
 ereport(ERROR,
 		(errcode(ERRCODE_INDEX_CORRUPTED),
@@ -680,8 +683,9 @@ pgstathashindex(PG_FUNCTION_ARGS)
 	/* Done accessing the index */
 	index_close(rel, AccessShareLock);
 
-	/* Count zero pages as free space. */
-	stats.free_space += stats.zero_pages * stats.space_per_page;
+	/* Count zero and unused pages as free space. */
+	stats.free_space += (stats.zero_pages + stats.unused_pages) *
+		 stats.space_per_page;
 
 	/*
 	 * Total space available for tuples excludes the metapage and the bitmap
@@ -711,9 +715,10 @@ pgstathashindex(PG_FUNCTION_ARGS)
 	values[2] = Int64GetDatum((int64) stats.overflow_pages);
 	values[3] = Int64GetDatum((int64) stats.bitmap_pages);
 	values[4] = Int64GetDatum((int64) stats.zero_pages);
-	values[5] = Int64GetDatum(stats.live_items);
-	values[6] = Int64GetDatum(stats.dead_items);
-	values[7] = Float8GetDatum(free_percent);
+	values[5] = Int64GetDatum((int64) stats.unused_pages);
+	values[6] = Int64GetDatum(stats.live_items);
+	values[7] = Int64GetDatum(stats.dead_items);
+	values[8] = Float8GetDatum(free_percent);
 	tuple = heap_form_tuple(tupleDesc, values, nulls);
 
 	PG_RETURN_DATUM(HeapTupleGetDatum(tuple));
diff --git a/contrib/pgstattuple/pgstattuple--1.4--1.5.sql b/contrib/pgstattuple/pgstattuple--1.4--1.5.sql
index 84e112e..eda719a 100644
--- a/contrib/pgstattuple/pgstattuple--1.4--1.5.sql
+++ 

Re: [HACKERS] parallel "return query" is no good

2017-03-23 Thread Alvaro Herrera
Robert Haas wrote:

> I guess the downside of back-patching this is that it could cause a
> plan change for somebody which ends up being worse.  On the whole,
> serial execution of queries intended to be run in parallel isn't
> likely to work out well, but it's always possible somebody has a cases
> where it happens to be winning, and this could break it.  So maybe I
> should do this only in master?  Thoughts?

I think that the chances of someone depending on a parallel plan which,
running serially by accident, is better than the non-parallel plan are
pretty slim.

+1 for back-patching.

-- 
Álvaro Herrerahttps://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] WIP: Faster Expression Processing v4

2017-03-23 Thread Andres Freund
Hi,

On 2017-03-23 13:14:59 -0400, Tom Lane wrote:
> Looking at some of the coding choices you made here, I see that you're
> making a hard assumption that called functions never scribble on their
> fcinfo->arg/argnull arrays.  I do not believe that we've ever had such
> an assumption before.

I think we did that before, e.g. ExecEvalScalarArrayOp(). I think there
are others too.


> Are we comfortable with that?  If so, we'd better document it.

I think it's ok, but we should indeed document it. I recall a note
somewhere... I can't find it anywhere, however; I might have misremembered
a note about pass-by-ref arguments.  fmgr/README? A note in
FunctionCallInfoData's definition?

Greetings,

Andres Freund


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] parallel "return query" is no good

2017-03-23 Thread Joshua D. Drake

On 03/23/2017 10:03 AM, Robert Haas wrote:

On Thu, Mar 23, 2017 at 12:50 PM, Robert Haas  wrote:

Commit 7aea8e4f2daa4b39ca9d1309a0c4aadb0f7ed81b allowed a parallel
plan to be generated for a RETURN QUERY or RETURN QUERY EXECUTE
statement in a PL/pgsql block.  As it turns out, the analysis that led
to this decision was totally wrong-headed, because the plan will
always be executed using SPI_cursor_fetch(portal, true, 50), which
will cause ExecutePlan() to get invoked with a count of 50, which will
cause it to run the parallel plan serially, without workers.
Therefore, passing CURSOR_OPT_PARALLEL_OK is a bad idea here; all it
can do is cause us to pick a parallel plan that's slow when executed
serially instead of the best serial plan.

The attached patch fixes it.  I plan to commit this and back-patch it
to 9.6, barring objections or better ideas.


I guess the downside of back-patching this is that it could cause a
plan change for somebody which ends up being worse.  On the whole,
serial execution of queries intended to be run in parallel isn't
likely to work out well, but it's always possible somebody has a cases
where it happens to be winning, and this could break it.  So maybe I
should do this only in master?  Thoughts?


I think the greater good of a fix applies here. +1 to 9.6.

--
Command Prompt, Inc.  http://the.postgres.company/
+1-503-667-4564
PostgreSQL Centered full stack support, consulting and development.
Everyone appreciates your honesty, until you are honest with them.
Unless otherwise stated, opinions are my own.


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] WIP: Faster Expression Processing v4

2017-03-23 Thread Tom Lane
Looking at some of the coding choices you made here, I see that you're
making a hard assumption that called functions never scribble on their
fcinfo->arg/argnull arrays.  I do not believe that we've ever had such
an assumption before.  Are we comfortable with that?  If so, we'd
better document it.
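
For concreteness, here is a hypothetical example of the kind of function
the new assumption forbids; everything below is illustrative and not taken
from any existing code:

#include "postgres.h"
#include "fmgr.h"

PG_MODULE_MAGIC;

PG_FUNCTION_INFO_V1(scribble_add_one);

/*
 * A function that scribbles on its own argument array.  Under the
 * assumption being discussed, the caller's fcinfo->arg/argnull arrays
 * must still hold the original values after the call returns, so an
 * implementation like this would violate the contract.
 */
Datum
scribble_add_one(PG_FUNCTION_ARGS)
{
	/* overwriting the caller-supplied argument slot is the problem */
	fcinfo->arg[0] = Int32GetDatum(DatumGetInt32(fcinfo->arg[0]) + 1);

	PG_RETURN_DATUM(fcinfo->arg[0]);
}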

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] parallel "return query" is no good

2017-03-23 Thread Andres Freund
On 2017-03-23 13:03:19 -0400, Robert Haas wrote:
> On Thu, Mar 23, 2017 at 12:50 PM, Robert Haas  wrote:
> > Commit 7aea8e4f2daa4b39ca9d1309a0c4aadb0f7ed81b allowed a parallel
> > plan to be generated for a RETURN QUERY or RETURN QUERY EXECUTE
> > statement in a PL/pgsql block.  As it turns out, the analysis that led
> > to this decision was totally wrong-headed, because the plan will
> > always be executed using SPI_cursor_fetch(portal, true, 50), which
> > will cause ExecutePlan() to get invoked with a count of 50, which will
> > cause it to run the parallel plan serially, without workers.
> > Therefore, passing CURSOR_OPT_PARALLEL_OK is a bad idea here; all it
> > can do is cause us to pick a parallel plan that's slow when executed
> > serially instead of the best serial plan.
> >
> > The attached patch fixes it.  I plan to commit this and back-patch it
> > to 9.6, barring objections or better ideas.
> 
> I guess the downside of back-patching this is that it could cause a
> plan change for somebody which ends up being worse.  On the whole,
> serial execution of queries intended to be run in parallel isn't
> likely to work out well, but it's always possible somebody has a cases
> where it happens to be winning, and this could break it.  So maybe I
> should do this only in master?  Thoughts?

I'm +0.5 for backpatching.

- Andres


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] parallel "return query" is no good

2017-03-23 Thread Stephen Frost
* Robert Haas (robertmh...@gmail.com) wrote:
> On Thu, Mar 23, 2017 at 12:50 PM, Robert Haas  wrote:
> > Commit 7aea8e4f2daa4b39ca9d1309a0c4aadb0f7ed81b allowed a parallel
> > plan to be generated for a RETURN QUERY or RETURN QUERY EXECUTE
> > statement in a PL/pgsql block.  As it turns out, the analysis that led
> > to this decision was totally wrong-headed, because the plan will
> > always be executed using SPI_cursor_fetch(portal, true, 50), which
> > will cause ExecutePlan() to get invoked with a count of 50, which will
> > cause it to run the parallel plan serially, without workers.
> > Therefore, passing CURSOR_OPT_PARALLEL_OK is a bad idea here; all it
> > can do is cause us to pick a parallel plan that's slow when executed
> > serially instead of the best serial plan.
> >
> > The attached patch fixes it.  I plan to commit this and back-patch it
> > to 9.6, barring objections or better ideas.
> 
> I guess the downside of back-patching this is that it could cause a
> plan change for somebody which ends up being worse.  On the whole,
> serial execution of queries intended to be run in parallel isn't
> likely to work out well, but it's always possible somebody has a cases
> where it happens to be winning, and this could break it.  So maybe I
> should do this only in master?  Thoughts?

For my 2c, I'd back-patch it.

Thanks!

Stephen


signature.asc
Description: Digital signature


Re: [HACKERS] parallel "return query" is no good

2017-03-23 Thread Robert Haas
On Thu, Mar 23, 2017 at 12:50 PM, Robert Haas  wrote:
> Commit 7aea8e4f2daa4b39ca9d1309a0c4aadb0f7ed81b allowed a parallel
> plan to be generated for a RETURN QUERY or RETURN QUERY EXECUTE
> statement in a PL/pgsql block.  As it turns out, the analysis that led
> to this decision was totally wrong-headed, because the plan will
> always be executed using SPI_cursor_fetch(portal, true, 50), which
> will cause ExecutePlan() to get invoked with a count of 50, which will
> cause it to run the parallel plan serially, without workers.
> Therefore, passing CURSOR_OPT_PARALLEL_OK is a bad idea here; all it
> can do is cause us to pick a parallel plan that's slow when executed
> serially instead of the best serial plan.
>
> The attached patch fixes it.  I plan to commit this and back-patch it
> to 9.6, barring objections or better ideas.

I guess the downside of back-patching this is that it could cause a
plan change for somebody which ends up being worse.  On the whole,
serial execution of queries intended to be run in parallel isn't
likely to work out well, but it's always possible somebody has a cases
where it happens to be winning, and this could break it.  So maybe I
should do this only in master?  Thoughts?

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


[HACKERS] parallel "return query" is no good

2017-03-23 Thread Robert Haas
Commit 7aea8e4f2daa4b39ca9d1309a0c4aadb0f7ed81b allowed a parallel
plan to be generated for a RETURN QUERY or RETURN QUERY EXECUTE
statement in a PL/pgsql block.  As it turns out, the analysis that led
to this decision was totally wrong-headed, because the plan will
always be executed using SPI_cursor_fetch(portal, true, 50), which
will cause ExecutePlan() to get invoked with a count of 50, which will
cause it to run the parallel plan serially, without workers.
Therefore, passing CURSOR_OPT_PARALLEL_OK is a bad idea here; all it
can do is cause us to pick a parallel plan that's slow when executed
serially instead of the best serial plan.

The attached patch fixes it.  I plan to commit this and back-patch it
to 9.6, barring objections or better ideas.

I previously remarked on this in
http://postgr.es/m/ca+tgmobxehvhbjtwdupzm9bvslitj-kshxqj2um5gpdze9f...@mail.gmail.com
but I wasn't quite so clear what the whole picture was in that email
as I am now.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


no-parallel-return-query.patch
Description: Binary data

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Backend crash on non-exclusive backup cancel

2017-03-23 Thread Teodor Sigaev

Hi!

I believe the patch looks good and is ready to commit. As I understand
it, it fixes a bug introduced by

commit 7117685461af50f50c03f43e6a622284c8d54694
Date:   Tue Apr 5 20:03:49 2016 +0200

Implement backup API functions for non-exclusive backups

And, I suppose, it should be backpatched to 9.6?

--
Teodor Sigaev   E-mail: teo...@sigaev.ru
   WWW: http://www.sigaev.ru/


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Review: GIN non-intrusive vacuum of posting tree

2017-03-23 Thread Teodor Sigaev

Thank you, pushed

Andrew Borodin wrote:

2017-03-22 22:48 GMT+05:00 Teodor Sigaev :

hasEmptyChild? and hasNonEmptyChild (BTW, isAnyNonempy is missing a 't')


Yes, I think this naming is good. It's clear what's in common in these
flags and what's different.


And if the whole posting tree is empty, then we could mark the root page as
leaf and remove all other pages in the tree without any locking. Although, it
could be a task for a separate patch.


From the performance point of view, this is a very good idea. Both,
performance of VACUUM and performance of Scans. But doing so we risk
to leave some garbage pages in case of a crash. And I do not see how
to avoid these without unlinking pages one by one. I agree, that
leaving this trick for a separate patch is quite reasonable.

Best regards, Andrey Borodin.




--
Teodor Sigaev   E-mail: teo...@sigaev.ru
   WWW: http://www.sigaev.ru/


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


[HACKERS] Order-preserving function transforms and EquivalenceClass

2017-03-23 Thread Mat Arye
Hi,

I am on a team developing an open-source extension for time-series data
storage in PostgreSQL (https://github.com/timescaledb/timescaledb).

We are trying to extend/hook into the planner so that it understands that
date_trunc('minute', time) has the same ordering as time (or rather that a
sort ordering on the latter is always a valid sort ordering on the former).
But this question really applies to any order-preserving transform such as
(time+1) vs (time).

Let's assume we can detect such cases. How do we tell the planner about it?

I see two ways of doing this:

(1) Having an EquivalenceClass with two members - the "time" member and
the "date_trunc" member. Then the pathkey for the sort would reference this
class and understand the equivalence. I think this is a pretty clean way
of doing things. But this equivalence between "time" and "date_trunc" only
applies to sorts. It should not apply to joins (for which EquivalenceClass
is also used) because these values are not actually equivalent. They are
only equivalent for sorting purposes. I know an EquivalenceClass has
a ec_has_volatile field which marks the EquivalenceClass as only for sorts
and not for joins. But is it an incorrect hack to set this for cases where
the EquivalenceMembers are not volatile? Is there some other way of marking
an EquivalenceClass as only for sorts that I am missing? Another wrinkle is
that while a date_trunc sort can be safely represented by a time sort, the
reverse isn't true. Has there been any discussion on supporting such cases
in EquivalenceClasses? I'd be happy to submit a patch to the core if people
think this is worthwhile.
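
To make option (1) a little more concrete, here is a rough, untested sketch
of building such a sort-only class by hand. The field names are from the
planner structures as I understand them, but this only illustrates where
ec_has_volatile would sit; canonicalizing the class and building pathkeys
from it, plus all error handling, is omitted:

/*
 * Rough sketch: an EquivalenceClass holding both "time" and
 * date_trunc('minute', time), marked ec_has_volatile so the planner
 * will use it for sort pathkeys but never generate join clauses from
 * it, even though neither member is actually volatile.
 */
static EquivalenceClass *
make_sort_only_eclass(PlannerInfo *root, Expr *time_expr, Expr *trunc_expr,
					  Oid opfamily, Oid datatype, Relids relids)
{
	EquivalenceClass *ec = makeNode(EquivalenceClass);
	EquivalenceMember *em1 = makeNode(EquivalenceMember);
	EquivalenceMember *em2 = makeNode(EquivalenceMember);

	em1->em_expr = time_expr;
	em1->em_relids = relids;
	em1->em_datatype = datatype;

	em2->em_expr = trunc_expr;
	em2->em_relids = relids;
	em2->em_datatype = datatype;

	ec->ec_opfamilies = list_make1_oid(opfamily);
	ec->ec_collation = InvalidOid;
	ec->ec_members = list_make2(em1, em2);
	ec->ec_relids = relids;

	/* the "hack" under discussion: sorts only, never join clauses */
	ec->ec_has_volatile = true;

	root->eq_classes = lappend(root->eq_classes, ec);
	return ec;
}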

(2) I can create new EquivalenceClass objects and pathkeys and use planner
hooks to substitute the appropriate pathkeys for doing things like finding
indexes etc. I have a prototype where this works, but I just think approach
1 is much cleaner.

Any thoughts would be greatly appreciated. I am pretty new to the planner
codebase and just want to avoid any obvious pitfalls.

Thanks,
Matvey Arye


Re: [HACKERS] Other formats in pset like markdown, rst, mediawiki

2017-03-23 Thread Pierre Ducroquet
The following review has been posted through the commitfest application:
make installcheck-world:  tested, passed
Implements feature:   tested, passed
Spec compliant:   tested, passed
Documentation:        tested, passed

Hi

This is my first review (Magnus said in his presentation at PGDay Paris that
volunteers should just come and help, so here I am), so please let me know
about any mistakes I make when using the review tools...

The feature seems to work as expected, but I don't claim to be a markdown and 
rst expert.
Some minor issues with the code itself:
- some indentation issues (documentation and the code itself mix space-based
and tab-based indentation) and a few trailing spaces in the code
- typographic issues in the documentation:
  - "The html, asciidoc, latex, latex-longtable, troff-ms, and markdown and rst 
formats" ==> duplicated and
  - "Sets the output format to one of unaligned, aligned, wrapped, html, 
asciidoc, latex (uses tabular), latex-longtable, rst, markdown, or troff-ms." 
==> extra comma at the end of the list
- the comment " dont add line after last row, because line is added after every
row" is misleading; it should warn that it applies only to rst
- there is a block of commented out code left
- in the print_aligned_vertical function, there is a mix between 
"cont->opt->format == PRINT_RST" and "format == _rst" and I don't see any 
obvious reason for that
- the documentation doesn't mention (but ok, it's kind of obvious) that the 
linestyle option will not work with rst and markdown

Thanks !

The new status of this patch is: Waiting on Author

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


[HACKERS] Fwd: GSOC 2017 project ideas

2017-03-23 Thread Charles Cui
Given the low activity on pgsql-students, forwarding here to get more
feedback.

-- Forwarded message --
From: Charles Cui 
Date: 2017-03-23 8:58 GMT-07:00
Subject: GSOC 2017 project ideas
To: pgsql-stude...@postgresql.org


Hi guys,
This is Charles Cui from Tsinghua University, China. I am quite interested
in the ideas listed on the GSoC 2017 page for PostgreSQL, especially the
idea "Eliminate O(N^2) scaling from rw-conflict tracking in serializable
transactions". In my PhD program at Tsinghua University, I focus on
operating system scalability and performance on multicore processors. One
of my jobs is to design benchmark suites that stress various operating
system interfaces from user level and to analyze their performance; after
that, I propose solutions to the problems found. For example, I have
published papers on improving synchronization primitives in the Linux
kernel, and I am quite familiar with various kinds of locks and lock-free
techniques. I think this project is similar to my previous work, and I
believe I have the confidence and ability to do it well. Besides, I speak
fluent English and I am fully available for the whole summer. I am
wondering whether I can join this project and work with you? In case this
project is allocated to someone else, I am also open to picking another
PostgreSQL project. Let me know if you have questions or feedback.


Thanks, Charles.


Re: [HACKERS] perlcritic

2017-03-23 Thread Daniel Gustafsson
> On 21 Mar 2017, at 19:20, David Steele  wrote:
> 
> On 3/6/17 12:02 PM, Dagfinn Ilmari Mannsåker wrote:
>> ilm...@ilmari.org (Dagfinn Ilmari Mannsåker) writes:
>> 
>>> Hi Peter,
>>> 
>>> Peter Eisentraut  writes:
>>> 
 I posted this about 18 months ago but then ran out of steam. [ ] Here
 is an updated patch.  The testing instructions below still apply.
 Especially welcome would be ideas on how to address some of the places
 I have marked with ## no critic.
>>> 
>>> Attached is a patch on top of yours that addresses all the ## no critic
>>> annotations except RequireFilenameMatchesPackage, which can't be fixed
>>> without more drastic reworking of the plperl build process.
>>> 
>>> Tested on perl 5.8.1 and 5.24.0 by configuring with --with-perl and
>>> --enable-tap-tests followed by make check-world, and running pgindent
>>> --build.
>> 
>> Attached is an updated version of the patch, in which
>> src/tools/msvc/gendef.pl actually compiles.  If someone on Windows could
>> test it, that would be great.
> 
> You are signed up to review this patch.  Do you know when you'll have a 
> chance to do that?

Below is a review of the two patches attached to the commitfest entry:

The v2-0001-Clean-up-Perl-code-according-to-perlcritic-severi.patch didn’t
apply cleanly due to later commits, but the fixes to get there were trivial.
The followup 0001-Fix-most-perlcritic-exceptions-v2.patch applied clean on top
of that.  The attached patch contains these two patches, rebased on top of
current master, with the below small nitpicks.

Since the original submission, most things have been addressed already, leaving
this patch mostly changing two-clause opens to three-clause opens.  The no critic exceptions
left are quite harmless: two cases of RequireFilenameMatchesPackage and one
ProhibitStringyEval.  All three could be fixed at the expense of complicating
things without much (or any) benefit (as mentioned up-thread by Dagfinn Ilmari
Mannsåker), so I’m fine with leaving them in.

A few small nitpicks on the patch:

## In src/interfaces/libpq/test/regress.pl:

-open(REGRESS_IN, "<", $regress_in)
+open(my $regress_in_fh, "<", $regress_in)

Reading from and closing this file still used the old REGRESS_IN bareword
handle; this is fixed in the attached updated patch.
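
To make the nitpick concrete, here is a minimal sketch of the consistent form
(illustrative only, not the actual regress.pl code): once the open uses a
lexical handle, every subsequent read and the close must use the same
variable, or they operate on a handle that was never opened:

  open(my $regress_in_fh, "<", $regress_in)
      or die "could not open $regress_in: $!";
  while (my $line = <$regress_in_fh>)
  {
      # process one line of the input file
  }
  close($regress_in_fh);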

## In src/test/locale/sort-test.pl:

-open(INFILE, "<$ARGV[0]");
-chop(my (@words) = <INFILE>);
-close(INFILE);
+chop(my (@words) = <>);

While this hunk does provide the same functionality, thanks to the magic
handling of ARGV in <>, it also carries the side “benefit” that arbitrary
commands can be executed: appending a | to the argument makes <> read the
output of a program rather than a file:

  $ src/test/locale/sort-test.pl "rm README |"

  $ cat README
  cat: README: No such file or directory

A silly example for sure, but since the intent of the patch is to apply best
and safe practices, I’d argue that a normal three-argument open is safer here.
The risk of misuse is very low, but the explicit form also makes the code less
magic and more readable IMO.
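
A minimal sketch of the safer spelling (assuming sort-test.pl keeps taking a
single filename argument; the error wording is mine):

  # Open the named file explicitly; a "|" in the argument is then just
  # an odd character in a filename, never an instruction to run a command.
  open(my $in_fh, "<", $ARGV[0])
      or die "could not open $ARGV[0]: $!";
  chop(my @words = <$in_fh>);
  close($in_fh);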

Reading the thread, most of the discussion was around the use of the
three-argument open instead of the older two-argument form.  Without diving
into the arguments, there are a few places where we should use the
three-argument open, so simply applying it everywhere rather than on a
case-by-case basis seems reasonable to me.  Consistency across the codebase
helps when reading the code.

There is no measurable performance impact from the changes, and no
user-visible change in functionality.  With this applied, make check-world
passes and perlcritic returns a clean run (except on the autogenerated
Gen_dummy_probes.pl, which is kept out of this work).  The intent of the patch
is to make the code consistent and readable, and it achieves that.  I see no
reason not to go ahead with these changes, if only to keep the codebase
consistent with what modern Perl code is expected to look like.

Given the nitpick nature of the comments, bumping status to ready for
committer.

cheers ./daniel



perlcritic-with-review-comments.patch
Description: Binary data


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [COMMITTERS] pgsql: Add missing support for new node fields

2017-03-23 Thread Peter Eisentraut
Another idea here:

Instead of making COPY_PARSE_PLAN_TREES a compile-time option, make it a
run-time option, and make that somehow selectable while running the test
suite.  It would be much easier to run the test suite with an option on the
command line than to manually edit a header file, recompile, run, and then
switch it back again afterwards.
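
A rough sketch of what I have in mind (the GUC name and placement are
illustrative only, nothing here is settled): the macro could become a
developer-level boolean GUC, with an entry in guc.c's ConfigureNamesBool
table along these lines:

  /* hypothetical developer option replacing COPY_PARSE_PLAN_TREES */
  bool        Debug_copy_parse_plan_trees = false;

  {
      {"debug_copy_parse_plan_trees", PGC_USERSET, DEVELOPER_OPTIONS,
          gettext_noop("Passes all parse and plan trees through copyObject()."),
          NULL,
          GUC_NOT_IN_SAMPLE
      },
      &Debug_copy_parse_plan_trees,
      false,
      NULL, NULL, NULL
  },

The #ifdef COPY_PARSE_PLAN_TREES blocks in postgres.c would then become
ordinary run-time tests, e.g.

  if (Debug_copy_parse_plan_trees)
      querytree_list = copyObject(querytree_list);

and the test suite could enable the check without a rebuild, along the lines
of PGOPTIONS='-c debug_copy_parse_plan_trees=on' make installcheck.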

-- 
Peter Eisentraut  http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] minor spelling error fix (btis -> bits)

2017-03-23 Thread Simon Riggs
On 23 March 2017 at 15:05, Jon Nelson  wrote:
> Attached please find a minor spelling error fix, changing "btis" to "bits".

Applied, thanks.

-- 
Simon Riggs  http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Guidelines for GSoC student proposals / Eliminate O(N^2) scaling from rw-conflict tracking in serializable transactions

2017-03-23 Thread Kevin Grittner
On Wed, Mar 22, 2017 at 2:24 AM, Mengxing Liu
 wrote:

> I've finished a draft proposal for "Eliminate O(N^2) scaling from
> rw-conflict tracking in serializable transactions".  You can find
> it from GSOC website or by the link below.
> https://docs.google.com/document/d/17TAs3EJIokwPU7UTUmnlVY3ElB-VHViyX1zkQJmrD1A/edit?usp=sharing
>
> I was wondering if you have time to review the proposal and give me some 
> comments?

Will take a look and give you an initial review in a day or two.

--
Kevin Grittner


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


[HACKERS] minor spelling error fix (btis -> bits)

2017-03-23 Thread Jon Nelson
Attached please find a minor spelling error fix, changing "btis" to "bits".




-- 
Jon
From f590a6dce6677bc5b8a409d40fd651ecb69b27bb Mon Sep 17 00:00:00 2001
From: Jon Nelson 
Date: Sun, 23 Mar 2014 08:23:48 -0500
Subject: [PATCH] - fix minor spelling error

---
 src/backend/utils/adt/network.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/src/backend/utils/adt/network.c b/src/backend/utils/adt/network.c
index 1f8469a..b126494 100644
--- a/src/backend/utils/adt/network.c
+++ b/src/backend/utils/adt/network.c
@@ -1136,7 +1136,7 @@ network_scan_first(Datum in)
 
 /*
  * return "last" IP on a given network. It's the broadcast address,
- * however, masklen has to be set to its max btis, since
+ * however, masklen has to be set to its max bits, since
  * 192.168.0.255/24 is considered less than 192.168.0.255/32
  *
  * inet_set_masklen() hacked to max out the masklength to 128 for IPv6
-- 
2.10.2


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

