Re: [HACKERS] WAL consistency check facility

2016-11-03 Thread Michael Paquier
On Thu, Nov 3, 2016 at 3:24 PM, Kuntal Ghosh  wrote:
> On Thu, Nov 3, 2016 at 2:35 AM, Michael Paquier wrote:
>>> - Another suggestion was to remove wal_consistency from PostgresNode.pm
>>> because small buildfarm machines may suffer on it. Although I've no
>>> experience in this matter, I would like to be certain that nothing breaks
>>> in recovery tests after some modifications.
>>
>> An extra idea that I have here would be to extend the TAP tests to
>> accept an environment variable that would be used to specify extra
>> options when starting Postgres instances. Buildfarm machines could use
>> it.
>>
> It can be added as a separate feature.
>
>>
>> -/* If it's a full-page image, restore it. */
>> -if (XLogRecHasBlockImage(record, block_id))
>> +/* If full-page image should be restored, do it. */
>> +if (XLogRecBlockImageApply(record, block_id))
>> Hm. It seems to me that this modification is incorrect. If the page
>> does not need to be applied, aka if it needs to be used for
>> consistency checks, what should be done is more something like the
>> following in XLogReadBufferForRedoExtended:
>> if (Apply(record, block_id))
>> return;
>> if (HasImage)
>> {
>> //do what needs to be done with an image
>> }
>> else
>> {
>> //do default things
>> }
>>
> XLogReadBufferForRedoExtended should return a redo action (block
> restored, done, block needs redo, or block not found). So, we can't
> just return from the function if the BLKIMAGE_APPLY flag is not set.
> It still has to check whether a redo is required or not.

Wouldn't the definition of a new redo action make sense then? Say
SKIPPED. None of the existing actions match the non-apply case.
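
For illustration, the new action could be one more member of the
XLogRedoAction enum in xlogutils.h; BLK_SKIPPED is a hypothetical name,
the other members are the existing ones:

typedef enum
{
	BLK_NEEDS_REDO,		/* changes from WAL record need to be applied */
	BLK_DONE,		/* block is already up to date */
	BLK_RESTORED,		/* block was restored from a full-page image */
	BLK_NOTFOUND,		/* block was not found */
	BLK_SKIPPED		/* hypothetical: image present but not applied */
} XLogRedoAction;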
-- 
Michael




Re: [HACKERS] Exclude pg_largeobject from pg_dump

2016-11-03 Thread amul sul
Hi Guillaume,

With your v2 patch, the -B option works as expected, but the --no-blobs
option is still unrecognized. This happens because the patch forgot to
add an entry for 'no-blobs' to the long_options[] array.
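
For reference, a minimal sketch of the missing entry, assuming the
patch already declares a no_blobs flag variable (field layout per
getopt_long(3)):

#include <getopt.h>

static int	no_blobs = 0;		/* hypothetical flag variable */

static struct option long_options[] = {
	/* ... existing entries ... */
	{"no-blobs", no_argument, &no_blobs, 1},
	{NULL, 0, NULL, 0}
};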

Apart from this concern, the patch looks good to me. Thanks.

Regards,
Amul




Re: [HACKERS] Add radiustimeout parameter for RADIUS HBA

2016-11-03 Thread Haribabu Kommi
On Mon, Oct 24, 2016 at 2:03 PM, Samuel D. Leslie  wrote:

> Hello everyone,
>
> I’d like to submit the attached patch for feedback from the PostgreSQL
> community and potential future inclusion in the codebase. The patch adds a
> new parameter to the RADIUS authentication method named “radiustimeout”,
> allowing the database administrator to configure the timeout in seconds to
> wait for responses from a configured RADIUS server. Until now, this has
> been hardcoded to three seconds by the RADIUS_TIMEOUT define in auth.c.
> While this is usually sufficient for typical RADIUS server configurations,
> there are some more unusual configurations where a higher timeout is
> required. Examples include:
>  - Authenticating against a RADIUS server over a high latency link
>  - Authenticating against a RADIUS server that is performing additional
> out-of-band authentication
>
> The latter case is applicable to a server I admin and spurred the
> development of this patch. We implemented multi-factor authentication for
> user access to a sensitive database via a RADIUS server implementation
> which performs the standard username & password verification, and if it
> succeeds, subsequently performs a second factor of authentication via a
> configured mobile app. The RADIUS response confirming successful
> authentication is only returned after both authentication factors have
> completed. In our deployment, a timeout of 60 seconds seems to work well,
> but certainly three seconds is not at all workable.
>
> Thanks in advance for any and all feedback.
>

I reviewed and tested the patch. It works as expected.
Following are my observations from testing.

1. If radiustimeout is greater than authentication_timeout, the client
connection is terminated only when the radiustimeout expires.

Do we need to add a CHECK_FOR_INTERRUPTS() call, or should we document
this behavior instead?

2. When the PostgreSQL backend is waiting for a response from the
RADIUS server and the client disconnects, the backend still waits for
the RADIUS response before closing.

I feel the second case is rare and may not be a problem.
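
For context, with the patch the timeout becomes a per-entry pg_hba.conf
option, along the lines of this hypothetical sketch (the server name and
secret are placeholders):

host  all  all  0.0.0.0/0  radius  radiusserver=radius.example.com radiussecret=sharedsecret radiustimeout=60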

Regards,
Hari Babu
Fujitsu Australia


Re: [HACKERS] Exclude pg_largeobject from pg_dump

2016-11-03 Thread amul sul
Kindly ignore this; I've added this note to the original thread.

Sorry for noise.

Regards,
Amul




Re: [HACKERS] Substantial bloat in postgres_fdw regression test runtime

2016-11-03 Thread Jeevan Chalke
On Wed, Nov 2, 2016 at 10:09 PM, Tom Lane  wrote:

> In 9.6, "make installcheck" in contrib/postgres_fdw takes a shade
> under 3 seconds on my machine.  In HEAD, it's taking 10 seconds.
> I am not happy, especially not since there's no parallelization
> of the contrib regression tests.  That's a direct consumption of
> my time and all other developers' time too.  This evidently came
> in with commit 7012b132d (Push down aggregates to remote servers),
> and while that's a laudable feature, I cannot help thinking that
> it does not deserve this much of an imposition on every make check
> that's ever done for the rest of eternity.
>

Thanks, Tom, for reporting this substantial bloat in the postgres_fdw
regression test runtime. On my machine, "make installcheck" in
contrib/postgres_fdw takes 6.2 seconds on master (HEAD:
770671062f130a830aa89100c9aa2d26f8d4bf32). However, if I remove all the
tests added for aggregate push down, it drops to 2.2 seconds, i.e. the
new tests add 4 seconds.

I have timed each of the tests added as part of the aggregate push down
patch and observed that one of them (the LATERAL one) takes around 3.5
seconds. This is caused by the parameterization. I have added a filter
so that fewer rows are used for parameterization; with that, the
lateral query in question now runs in 100ms. I also updated a few more
queries that were taking more than 100ms so that they run in around
30ms. So effectively, with these changes "make installcheck" now takes
around 2.5 seconds.

Attached patch with test-case modification.

Thanks


>
> regards, tom lane


-- 
Jeevan B Chalke
Principal Software Engineer, Product Development
EnterpriseDB Corporation
The Enterprise PostgreSQL Company


speedup_agg_push_down_tests.patch
Description: binary/octet-stream



Re: [HACKERS] WAL consistency check facility

2016-11-03 Thread Kuntal Ghosh
On Thu, Nov 3, 2016 at 12:34 PM, Michael Paquier wrote:
> On Thu, Nov 3, 2016 at 3:24 PM, Kuntal Ghosh wrote:
>> On Thu, Nov 3, 2016 at 2:35 AM, Michael Paquier wrote:
>>> -/* If it's a full-page image, restore it. */
>>> -if (XLogRecHasBlockImage(record, block_id))
>>> +/* If full-page image should be restored, do it. */
>>> +if (XLogRecBlockImageApply(record, block_id))
>>> Hm. It seems to me that this modification is incorrect. If the page
>>> does not need to be applied, aka if it needs to be used for
>>> consistency checks, what should be done is more something like the
>>> following in XLogReadBufferForRedoExtended:
>>> if (Apply(record, block_id))
>>> return;
>>> if (HasImage)
>>> {
>>> //do what needs to be done with an image
>>> }
>>> else
>>> {
>>> //do default things
>>> }
>>>
>> XLogReadBufferForRedoExtended should return a redo action (block
>> restored, done, block needs redo, or block not found). So, we can't
>> just return from the function if the BLKIMAGE_APPLY flag is not set.
>> It still has to check whether a redo is required or not.
>
> Wouldn't the definition of a new redo action make sense then? Say
> SKIPPED. None of the existing actions match the non-apply case.

As per my understanding, XLogReadBufferForRedoExtended works as follows:
1.  if the WAL record has a backup block
2.  {
3.      restore the backup block;
4.      return BLK_RESTORED;
5.  }
6.  else
7.  {
8.      if the block is found in the buffer
9.          if the LSN of the block is less than the last replayed record
10.             return BLK_DONE;
11.         else
12.             return BLK_NEEDS_REDO;
13.     else
14.         return BLK_NOT_FOUND;
15. }
Now, we can just change step 1 as follows:
1.  if the WAL record has a backup block and it needs to be restored
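
In code, the modified step 1 might look like this (a sketch only, using
the XLogRecBlockImageApply() macro introduced by the patch):

if (XLogRecHasBlockImage(record, block_id) &&
    XLogRecBlockImageApply(record, block_id))
{
    /* restore the backup block ... */
    return BLK_RESTORED;
}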

I'm not getting why we should introduce a new redo action and return
from the function beforehand.

-- 
Thanks & Regards,
Kuntal Ghosh
EnterpriseDB: http://www.enterprisedb.com




Re: [HACKERS] WAL consistency check facility

2016-11-03 Thread Michael Paquier
On Thu, Nov 3, 2016 at 4:04 PM, Michael Paquier wrote:
> Wouldn't the definition of a new redo action make sense then? Say
> SKIPPED. None of the existing actions match the non-apply case.

I just took 5 minutes to look at the code and reason about it, and
something like what your patch is doing would be actually fine. Still
I don't think that checking for the apply flag in the macro routine
should look for has_image. Let's keep things separate.
-- 
Michael




Re: [HACKERS] WAL consistency check facility

2016-11-03 Thread Kuntal Ghosh
On Thu, Nov 3, 2016 at 2:27 PM, Michael Paquier wrote:
> On Thu, Nov 3, 2016 at 4:04 PM, Michael Paquier wrote:
>> Wouldn't the definition of a new redo action make sense then? Say
>> SKIPPED. None of the existing actions match the non-apply case.
>
> I just took 5 minutes to look at the code and reason about it, and
> something like what your patch is doing would be actually fine. Still
> I don't think that checking for the apply flag in the macro routine
> should look for has_image. Let's keep things separate.
Actually, I just verified that bimg_info is not even valid if
has_image is not set. In DecodeXLogRecord, we initialize bimg_info
only when the has_image flag is set. So, keeping them separate doesn't
look like a good approach to me. If we keep them separate, the outcome
of the following assertion is undefined:

Assert(XLogRecHasBlockImage(record, block_id) ||
       !XLogRecBlockImageApply(record, block_id));

Thoughts?
-- 
Thanks & Regards,
Kuntal Ghosh
EnterpriseDB: http://www.enterprisedb.com




Re: [HACKERS] WAL consistency check facility

2016-11-03 Thread Michael Paquier
On Thu, Nov 3, 2016 at 5:56 PM, Kuntal Ghosh  wrote:
> I'm not getting why we should introduce a new redo action and return
> from the function beforehand.

Per my last email, same conclusion from here :)
Sorry if I am picky and noisy on many points; I am trying to think
about the value of each change introduced in this patch, particularly
whether each is meaningful, can be improved in some way, or can be
simplified to make the code simpler.
-- 
Michael




Re: [HACKERS] WAL consistency check facility

2016-11-03 Thread Michael Paquier
On Thu, Nov 3, 2016 at 6:15 PM, Kuntal Ghosh  wrote:
> Actually, I just verified that bimg_info is not even valid if
> has_image is not set. In DecodeXLogRecord, we initialize bimg_info
> only when the has_image flag is set. So, keeping them separate doesn't
> look like a good approach to me. If we keep them separate, the outcome
> of the following assertion is undefined:
>
> Assert(XLogRecHasBlockImage(record, block_id) ||
>        !XLogRecBlockImageApply(record, block_id));
>
> Thoughts?

Yes, that's exactly the reason why we should keep both macros checking
for separate things: apply implies that has_image is set, and that's
normal; hence we could use sanity checks by just using those macros
and not propagating xlogreader.h.
-- 
Michael




Re: [HACKERS] WAL consistency check facility

2016-11-03 Thread Kuntal Ghosh
On Thu, Nov 3, 2016 at 2:52 PM, Michael Paquier wrote:
> On Thu, Nov 3, 2016 at 6:15 PM, Kuntal Ghosh wrote:
>> Actually, I just verified that bimg_info is not even valid if
>> has_image is not set. In DecodeXLogRecord, we initialize bimg_info
>> only when the has_image flag is set. So, keeping them separate doesn't
>> look like a good approach to me. If we keep them separate, the outcome
>> of the following assertion is undefined:
>>
>> Assert(XLogRecHasBlockImage(record, block_id) ||
>>        !XLogRecBlockImageApply(record, block_id));
>>
>> Thoughts?
>
> Yes, that's exactly the reason why we should keep both macros as
> checking for separate things: apply implies that has_image is set and
> that's normal, hence we could use sanity checks by just using those
> macros and not propagating xlogreader.h.
>
No, apply doesn't mean has_image is set. If has_image is not set,
apply/bimg_info is invalid (undefined) and we should not use it. For
example, in XLogDumpDisplayRecord we use bimg_info as follows:

if (XLogRecHasBlockImage(record, block_id))
{
	if (record->blocks[block_id].bimg_info & BKPIMAGE_IS_COMPRESSED)
	{
		/* ... report details of the compressed image ... */
	}
}

So, whenever we need to use the bimg_info flags, we should make sure
that has_image is set.


-- 
Thanks & Regards,
Kuntal Ghosh
EnterpriseDB: http://www.enterprisedb.com




Re: [HACKERS] Remove the comment on the countereffectiveness of large shared_buffers on Windows

2016-11-03 Thread amul sul
Hi Takayuki-san,

IMHO, we could remove the third paragraph completely and generalize the
start of the second paragraph, somewhat like the following:


-If you have a dedicated database server with 1GB or more of RAM, a
-reasonable starting value for shared_buffers is 25%
-of the memory in your system.  There are some workloads where even
+A reasonable starting value for shared_buffers is 25%
+of the RAM in your system.  There are some workloads where even
 large settings for shared_buffers are effective, but
 because PostgreSQL also relies on the
 operating system cache, it is unlikely that an allocation of more than

I may be wrong here; I would like to know your and/or the community's
thoughts on this. Thanks.

Regards,
Amul Sul


The new status of this patch is: Waiting on Author




Re: [HACKERS] Push down more full joins in postgres_fdw

2016-11-03 Thread Ashutosh Bapat
> Here is the updated version, which includes the restructuring you proposed.
> Other than the above issue and the alias issue we discussed, I addressed all
> your comments except one on testing; I tried to add test cases where the
> remote query is deparsed as nested subqueries, but I couldn't because IIUC,
> reduce_outer_joins reduced full joins to inner joins or left joins.

Not always. It will do so in some cases, as explained in the prologue
of reduce_outer_joins():

 * More generally, an outer join can be reduced in strength if there is a
 * strict qual above it in the qual tree that constrains a Var from the
 * nullable side of the join to be non-null.  (For FULL joins this applies
 * to each side separately.)

>  So, I
> added two test cases: (1) the joining relations are both base relations
> (actually, we already have that) and (2) one of the joining relations is a
> base relation and the other is a join relation.  I rebased the patch to
> HEAD, so I added a test case with aggregate pushdown also.
>

This patch again has a lot of unrelated changes, esp. in
deparseSelectStmtForRel(). What was earlier part of deparseSelectSql()
and deparseFromExpr() is now flattened in deparseSelectStmtForRel(),
which seems unnecessary. Or there are changes like

-deparseSelectStmtForRel(StringInfo buf, PlannerInfo *root, RelOptInfo *rel,
-   List *tlist, List *remote_conds, List *pathkeys,
+deparseSelectStmtForRel(StringInfo buf, PlannerInfo *root,
+   RelOptInfo *foreignrel, List *tlist,
+   List *remote_conds, List *pathkeys,

which arise because rel has been renamed as foreignrel. The patch will
work even without such renaming.

I think such refactoring should be done in a separate patch.

-- 
Best Wishes,
Ashutosh Bapat
EnterpriseDB Corporation
The Postgres Database Company




Re: [HACKERS] WAL consistency check facility

2016-11-03 Thread Michael Paquier
On Thu, Nov 3, 2016 at 6:48 PM, Kuntal Ghosh  wrote:
> So, whenever we are required to use bimg_info flag, we should make
> sure that has_image
> is set.

OK, we are talking past each other here. There are two possible patterns:
- has_image is set but not apply, meaning that the image block is used
for consistency checks.
- has_image is set, as well as apply, meaning that the block needs to
be applied at redo.
So I mean exactly the same thing as you do. The point I am trying to
raise is that it would be meaningful to put, in some code paths, checks
of the type (apply && !has_image) and ERROR on them. Perhaps we could
just do that in xlogreader.c though. If having those checks external
to xlogreader.c makes sense, then using separate macros is more
portable.
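
A minimal sketch of such a sanity check, using the two macros from the
patch (whether it lives in xlogreader.c or in its callers being the open
question):

if (XLogRecBlockImageApply(record, block_id) &&
    !XLogRecHasBlockImage(record, block_id))
    elog(ERROR, "block %d has the apply flag set but no full-page image",
         block_id);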
-- 
Michael




Re: [HACKERS] Hash Indexes

2016-11-03 Thread Amit Kapila
On Wed, Nov 2, 2016 at 6:39 AM, Robert Haas  wrote:
> On Mon, Oct 24, 2016 at 10:30 AM, Amit Kapila  wrote:
>> [ new patches ]
>
> I looked over parts of this today, mostly the hashinsert.c changes.
>
> +/*
> + * Copy bucket mapping info now;  The comment in _hash_expandtable where
> + * we copy this information and calls _hash_splitbucket explains why this
> + * is OK.
> + */
>
> So, I went and tried to find the comments to which this comment is
> referring and didn't have much luck.
>

I guess you have just tried to find it in the patch. However, the
comment I am referring to above is an existing comment in
_hash_expandtable().  See this comment:
/*
* Copy bucket mapping info now; this saves re-accessing the meta page
* inside _hash_splitbucket's inner loop. ...

> At the point this code is
> running, we have a pin but no lock on the metapage, so this is only
> safe if changing any of these fields requires a cleanup lock on the
> metapage.  If that's true,
>

No, that's not true; we need just an exclusive content lock to update
those fields, and these fields should be copied while we hold a share
content lock on the metapage.  In version 8 of the patch this was
correct, but in the last version it seems I moved it during code
rearrangement.  I will change it so that these values are copied under
the metapage share content lock.  I think moving it just before the
preceding for loop should be okay; let me know if you think otherwise.


> +nblkno = _hash_get_newblk(rel, pageopaque);
>
> I think this is not a great name for this function.  It's not clear
> what "new blocks" refers to, exactly.  I suggest
> FIND_SPLIT_BUCKET(metap, bucket) or OLD_BUCKET_TO_NEW_BUCKET(metap,
> bucket) returning a new bucket number.  I think that macro can be
> defined as something like this: bucket + (1 <<
> (fls(metap->hashm_maxbucket) - 1)).
>

I think such a macro would not work when completing incomplete splits.
The reason is that by the time we try to complete the split of the
current old bucket, the table half (lowmask, highmask, maxbucket) may
have changed, and the macro could give you a bucket in the new table
half.
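
For reference, a sketch of the mapping Robert suggests (the macro name
is hypothetical; fls() is supplied by src/port on platforms that lack
it). Per the caveat above, it is only valid while hashm_maxbucket still
describes the table half in which the split began:

#define OLD_BUCKET_TO_NEW_BUCKET(metap, bucket) \
	((bucket) + (1 << (fls((metap)->hashm_maxbucket) - 1)))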

> Then do nblkno =
> BUCKET_TO_BLKNO(metap, newbucket) to get the block number.  That'd all
> be considerably simpler than what you have for hash_get_newblk().
>

I think that to use BUCKET_TO_BLKNO we need either a share or an
exclusive lock on the metapage, and since we need a metapage lock to
find the new block from the old block anyway, I thought it better to
do this inside _hash_get_newblk().

>
> Moving on ...
>
>  /*
>   * ovfl page exists; go get it.  if it doesn't have room, we'll
> - * find out next pass through the loop test above.
> + * find out next pass through the loop test above.  Retain the
> + * pin, if it is a primary bucket page.
>   */
> -_hash_relbuf(rel, buf);
> +if (pageopaque->hasho_flag & LH_BUCKET_PAGE)
> +_hash_chgbufaccess(rel, buf, HASH_READ, HASH_NOLOCK);
> +else
> +_hash_relbuf(rel, buf);
>
> It seems like it would be cheaper, safer, and clearer to test whether
> buf != bucket_buf here, rather than examining the page opaque data.
> That's what you do down at the bottom of the function when you ensure
> that the pin on the primary bucket page gets released, and it seems
> like it should work up here, too.
>
> +boolretain_pin = false;
> +
> +/* page flags must be accessed before releasing lock on a page. 
> */
> +retain_pin = pageopaque->hasho_flag & LH_BUCKET_PAGE;
>
> Similarly.
>

Agreed, will change the usage as per your suggestion.

> I have also attached a patch with some suggested comment changes.
>

I will include it in next version of patch.


-- 
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com




Re: [HACKERS] delta relations in AFTER triggers

2016-11-03 Thread Kevin Grittner
On Wed, Nov 2, 2016 at 4:09 PM, Adam Brusselback wrote:
>> There may be some situations where crawling the indexes a row at a
>> time will perform better than this by enough to want to retain that
>> option.
>
> If an index existed, wouldn't it still be able to use that in the set-based
> implementation?

Yes.  The optimizer would compare plans and pick the lowest cost.

> Is there something which would make doing the check
> set-based ever worse than row based inherently?

I'm not sure.  I doubt that it would ever lose by very much, but
only benchmarking can really answer that question.

Anyway, it's probably premature to get too far into it now.  It
just occurred to me that it might be a worthwhile project once the
transition tables are available, so I did a quick set of triggers
to see what the potential was in a "best case" scenario.

--
Kevin Grittner
EDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




[HACKERS] Improve hash-agg performance

2016-11-03 Thread Andres Freund
Hi,

There's two things I found while working on faster expression
evaluation, slot deforming and batched execution. As those two issues
often seemed quite dominant cost-wise it seemed worthwhile to evaluate
them independently.

1) We atm do one ExecProject() to compute each aggregate's
   arguments. Turns out it's noticeably faster to compute the arguments
   for all aggregates in one go, both because it reduces the number of
   function calls / moves more things into a relatively tight loop, and
   because it allows deforming all the required columns at once rather
   than one-by-one.  For a single aggregate it'd be faster to avoid
   ExecProject altogether (i.e. directly evaluate the expression as we
   used to), but as soon as you have two, the new approach is faster.

2) For hash-aggs, right now we store the representative tuple using
   the input tuple's format, with unneeded columns set to NULL. That
   turns out to be expensive if the aggregated-on columns are not
   leading columns, because we have to skip over a potentially large
   number of NULLs.  The fix here is to simply use a different tuple
   format for the hashtable.  That doesn't cause overhead, because we
   already move columns in/out of the hashslot explicitly anyway.

Comments?

Regards,

Andres Freund
From 36b022c8a0f74a037c01c085810365b20830ba34 Mon Sep 17 00:00:00 2001
From: Andres Freund 
Date: Tue, 12 Jul 2016 01:01:28 -0700
Subject: [PATCH 1/2] Perform only one projection to compute agg arguments.

Previously we did an ExecProject() for each individual aggregate
argument. Doing it once for all arguments is quite a bit cheaper,
because ExecProject's fastpath can do the work in one relatively tight
loop, and because it allows us to easily slot_getsomeattrs the right
number of columns for all the aggregates.
---
 src/backend/executor/nodeAgg.c | 136 ++---
 src/include/nodes/execnodes.h  |   4 ++
 2 files changed, 105 insertions(+), 35 deletions(-)

diff --git a/src/backend/executor/nodeAgg.c b/src/backend/executor/nodeAgg.c
index 28c15ba..3ed6fb2 100644
--- a/src/backend/executor/nodeAgg.c
+++ b/src/backend/executor/nodeAgg.c
@@ -213,6 +213,9 @@ typedef struct AggStatePerTransData
 	 */
 	int			numInputs;
 
+	/* offset of input columns in AggState->evalslot */
+	int			inputoff;
+
 	/*
 	 * Number of aggregated input columns to pass to the transfn.  This
 	 * includes the ORDER BY columns for ordered-set aggs, but not for plain
@@ -297,7 +300,6 @@ typedef struct AggStatePerTransData
 	 * well go whole hog and use ExecProject too.
 	 */
 	TupleDesc	evaldesc;		/* descriptor of input tuples */
-	ProjectionInfo *evalproj;	/* projection machinery */
 
 	/*
 	 * Slots for holding the evaluated input arguments.  These are set up
@@ -847,14 +849,20 @@ advance_aggregates(AggState *aggstate, AggStatePerGroup pergroup)
 	int			setno = 0;
 	int			numGroupingSets = Max(aggstate->phase->numsets, 1);
 	int			numTrans = aggstate->numtrans;
+	TupleTableSlot *slot = aggstate->evalslot;
+	AggStatePerTrans pertrans;
 
-	for (transno = 0; transno < numTrans; transno++)
+	/* compute input for all aggregates */
+	if (aggstate->evalproj)
+		aggstate->evalslot = ExecProject(aggstate->evalproj, NULL);
+
+	for (transno = 0, pertrans = aggstate->pertrans; transno < numTrans;
+		 transno++, pertrans++)
 	{
-		AggStatePerTrans pertrans = &aggstate->pertrans[transno];
 		ExprState  *filter = pertrans->aggfilter;
 		int			numTransInputs = pertrans->numTransInputs;
 		int			i;
-		TupleTableSlot *slot;
+		int			inputoff = pertrans->inputoff;
 
 		/* Skip anything FILTERed out */
 		if (filter)
@@ -868,13 +876,10 @@ advance_aggregates(AggState *aggstate, AggStatePerGroup pergroup)
 continue;
 		}
 
-		/* Evaluate the current input expressions for this aggregate */
-		slot = ExecProject(pertrans->evalproj, NULL);
-
 		if (pertrans->numSortCols > 0)
 		{
 			/* DISTINCT and/or ORDER BY case */
-			Assert(slot->tts_nvalid == pertrans->numInputs);
+			Assert(slot->tts_nvalid >= pertrans->numInputs);
 
 			/*
 			 * If the transfn is strict, we want to check for nullity before
@@ -887,7 +892,7 @@ advance_aggregates(AggState *aggstate, AggStatePerGroup pergroup)
 			{
 for (i = 0; i < numTransInputs; i++)
 {
-	if (slot->tts_isnull[i])
+	if (slot->tts_isnull[i + inputoff])
 		break;
 }
 if (i < numTransInputs)
@@ -899,10 +904,25 @@ advance_aggregates(AggState *aggstate, AggStatePerGroup pergroup)
 /* OK, put the tuple into the tuplesort object */
 if (pertrans->numInputs == 1)
 	tuplesort_putdatum(pertrans->sortstates[setno],
-	   slot->tts_values[0],
-	   slot->tts_isnull[0]);
+	   slot->tts_values[inputoff],
+	   slot->tts_isnull[inputoff]);
 else
-	tuplesort_puttupleslot(pertrans->sortstates[setno], slot);
+{
+	/*
+	 * Copy slot contents, starting from inputoff, into sort
+	 * slot.
+	 */
+	ExecClearTuple(pertrans->evalslot);
+	memcpy(pertrans->e

Re: [HACKERS] Danger of automatic connection reset in psql

2016-11-03 Thread Ashutosh Bapat
On Thu, Oct 20, 2016 at 3:58 PM, Oleksandr Shulgin wrote:
> Hi Hackers!
>
> When using psql interactively one might be tempted to guard potentially
> destructive commands such as "UPDATE / DELETE / DROP " by starting
> the input line with an explicit "BEGIN; ...".  This has the added benefit
> that then you invoke the command by reverse-searching the command history,
> you get it together with the guarding transaction open statement.
>
> This, however, is not 100% safe as I've found out a few days ago.  Should
> the connection to the server get lost, the first of semicolon-separated
> statements, "BEGIN;" will only trigger connection reset, and if that is
> successful the following command(s) are going to be executed on the newly
> opened connection, but without the transaction guard.
>
> I'm not the first one to discover that, a search in archives gives at least
> 3 results:
>
> https://www.postgresql.org/message-id/flat/1097174870.9273.8.camel%40ipso.snappymail.ca#1097174870.9273.8.ca...@ipso.snappymail.ca
> https://www.postgresql.org/message-id/flat/4BF6A496.2090106%40comgate.c
> https://www.postgresql.org/message-id/flat/CAD3a31U%2BfSBsq%3Dtxw2G-D%2BfPND_UN-nSojrGyaD%2BhkYUzvxusQ%40mail.gmail.com
>
> The second one even resulted in a TODO item:
>
>   Prevent psql from sending remaining single-line multi-statement queries
>   after reconnection
>
> I was thinking that simply adding a bool flag in the pset struct, to
> indicate that connection was reset during attempt to execute the last query
> would do the trick, but it only helps in exactly the case described above.
>
> Since this is already an improvement, I'm attaching a patch.
>

A couple of doubts/suggestions:
1. Should pset.conn_was_reset be set to false when we make a
connection in do_connect()? Usually pset.conn_was_reset would be
initialised to 0 since it's a global variable, so this may not be
necessary, but should we do it as a precaution? (A sketch follows
below, after point 3.)

2. The comment below should be updated to reflect the new change:
/* fall out of loop if lexer reached EOL */
-   if (scan_result == PSCAN_INCOMPLETE ||
+   if (pset.conn_was_reset ||
+   scan_result == PSCAN_INCOMPLETE ||

3. What happens when the connection is reset while the source is a
file, say one specified by \i in an interactive shell?
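
A minimal sketch of suggestion 1, assuming the patch's
pset.conn_was_reset flag (do_connect() is in src/bin/psql/command.c):

/* in do_connect(), once the new connection has been established: */
pset.db = n_conn;
pset.conn_was_reset = false;	/* a fresh connection has not been reset */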

-- 
Best Wishes,
Ashutosh Bapat
EnterpriseDB Corporation
The Postgres Database Company




Re: [HACKERS] Declarative partitioning - another take

2016-11-03 Thread alvherre

On 2016-10-28 07:53, Amit Langote wrote:

@@ -6267,6 +6416,12 @@ ATAddForeignKeyConstraint(AlteredTableInfo *tab, Relation rel,

 * Validity checks (permission checks wait till we have the column
 * numbers)
 */
+   if (pkrel->rd_rel->relkind == RELKIND_PARTITIONED_TABLE)
+   ereport(ERROR,
+   (errcode(ERRCODE_WRONG_OBJECT_TYPE),
+errmsg("cannot reference relation \"%s\"", RelationGetRelationName(pkrel)),
+errdetail("Referencing partitioned tables in foreign key constraints is not supported.")));


Is there a plan for fixing this particular limitation?  It's a pretty
serious problem for users, and the suggested workaround (to create a
separate non-partitioned table which carries only the PK columns, is
kept updated by triggers, and has the FKs directed to it instead of to
the partitioned table) is not only very ugly, but also very slow.




Re: [HACKERS] Using a latch between a background worker process and a thread

2016-11-03 Thread Abbas Butt
Thanks, everybody, for the detailed advice.
Let me try replacing the latches with condition variables.
I will report the results here.

On Wed, Nov 2, 2016 at 11:54 AM, Craig Ringer  wrote:

> On 2 November 2016 at 02:10, Robert Haas  wrote:
> > On Tue, Nov 1, 2016 at 12:35 PM, Abbas Butt wrote:
> >> Hi,
> >> Consider this situation:
> >> 1. I have a background worker process.
> >> 2. The process creates a latch, initializes it using InitLatch & resets
> it.
> >> 3. It then creates a thread and passes the latch created in step 2 to
> it.
> >> To pass it, the process uses the last argument of pthread_create.
> >> 4. The thread blocks by calling WaitLatch.
> >> 5. The process after some time sets the latch using SetLatch.
> >>
> >> The thread does not notice that the latch has been set and keeps
> waiting.
> >>
> >> My question is:
> >> Are latches supposed to work between a process and a thread created by
> that
> >> process?
> >
> > Nothing in the entire backend is guaranteed to work if you spawn
> > multiple threads within the same process.
> >
> > Including this.
>
> Yep.
>
> You could have the main thread wait on the latch, then signal the
> other threads via appropriate pthread primitives. But you must ensure
> your other threads do nothing that calls into backend code. Including
> things like atexit handlers. They need to coordinate with the main
> thread to do everything PostgreSQL related, and you'd need to make
> sure the main thread handles all signals. That's the default for Linux
> - the main thread gets first chance at all signals and other threads'
> sigmasks are only checked if the main thread has masked the signal,
> but that means your other threads should be sure to mask all signals
> used by PostgreSQL. Good luck doing that portably.
>
> There are exceptions where you can call some backend functions and
> macros from other threads. But you'd have to analyse each on a case by
> case basis, and there's no guarantee they'll _stay_ safe.
>
> I'd just avoid using threads in the backend if at all possible.
>
> --
>  Craig Ringer   http://www.2ndQuadrant.com/
>  PostgreSQL Development, 24x7 Support, Training & Services
>
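
For reference, masking PostgreSQL's signals in a worker thread, as Craig
describes, might look like the following sketch (the thread must still
avoid calling into backend code entirely):

#include <pthread.h>
#include <signal.h>

static void *
worker_main(void *arg)
{
	sigset_t	mask;

	/*
	 * Block all signals in this thread, so the main thread, which runs
	 * PostgreSQL's signal handlers, receives them instead.
	 */
	sigfillset(&mask);
	pthread_sigmask(SIG_BLOCK, &mask, NULL);

	/* ... work that never calls into backend code ... */
	return NULL;
}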



-- 
Abbas
Architect

Ph: 92.334.5100153
Skype ID: gabbasb
www.enterprisedb.com

Follow us on Twitter: @EnterpriseDB

Visit EnterpriseDB for tutorials, webinars, whitepapers, and more.



Re: [HACKERS] Remove the comment on the countereffectiveness of large shared_buffers on Windows

2016-11-03 Thread Brad DeJong
Change "even large" to "even larger" because it is used in a comparison.

> ... a reasonable starting value for shared_buffers is 25%
> of the memory in your system. There are some workloads where even large 
> settings
> for shared_buffers are effective, ...

... are some workloads where even larger settings ...

Regards,
Brad DeJong





Re: [HACKERS] Logical Replication WIP

2016-11-03 Thread Petr Jelinek
On 02/11/16 17:22, Peter Eisentraut wrote:
> On 10/24/16 9:22 AM, Petr Jelinek wrote:
>> I added one more prerequisite patch (the first one) which adds ephemeral
>> slots (or well implements UI on top of the code that was mostly already
>> there). The ephemeral slots are different in that they go away either on
>> error or when session is closed. This means the initial data sync does
>> not have to worry about cleaning up the slots after itself. I think this
>> will be useful in other places as well (for example basebackup). I
>> originally wanted to call them temporary slots in the UI but since the
>> behavior is bit different from temp tables I decided to go with what the
>> underlying code calls them in UI as well.
> 
> I think it makes sense to expose this.
> 
> Some of the comments need some polishing.
> 
> Eventually, we might want to convert the option list in
> CREATE_REPLICATION_SLOT into a list instead of adding more and more
> keywords (see also VACUUM), but not necessarily now.
> 
> I find the way Acquire and Release are handled now quite confusing.
> Because Release of an ephemeral slot means to delete it, you have
> changed most code to never release them until the end of the session.
> So there is a lot of ugly and confusing code that needs to know this
> difference.  I think we need to use some different verbs for different
> purposes here.  Acquire and release should keep their meaning of "I'm
> using this", and the calls in proc.c and postgres.c should be something
> like ReplicationSlotCleanup().
> 

Release does not really change behavior; it has always dropped the
ephemeral slot.

So, if I understand correctly, what you are proposing is to change the
behavior of Release to not remove the ephemeral slot, add a function
that removes the ephemeral slots of the current session, and add
tracking of ephemeral slots created in the current session? That seems
quite a bit more complicated than what the patch does, with little
gain.

What about just releasing the ephemeral slot if a different one is
being acquired, instead of the current error?

-- 
  Petr Jelinek  http://www.2ndQuadrant.com/
  PostgreSQL Development, 24x7 Support, Training & Services




Re: [HACKERS] Patch: Implement failover on libpq connect level.

2016-11-03 Thread Robert Haas
On Wed, Nov 2, 2016 at 1:59 PM, Mithun Cy  wrote:
> On Tue, Nov 1, 2016 at 9:42 PM, Robert Haas  wrote:
>> Ah, nuts.  Thanks, good catch.  Should be fixed in the attached version.
>
> I repeated the test on new patch, It works fine now, Also did some more
> negative tests forcibly failing some internal calls. All tests have passed.
> This patch works as described and look good to me.

Great, committed.  There's still potentially more work to be done
here, because my patch omits some features that were present in
Victor's original submission, like setting the failover timeout,
optionally randomizing the order of the hosts, and distinguishing
between master and standby servers; Victor, or anyone, please feel
free to submit separate patches for those things.

Thanks.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] who calls InitializeTimeouts() ?

2016-11-03 Thread Robert Haas
On Wed, Nov 2, 2016 at 11:10 PM, Chapman Flack  wrote:
> It looks like for about 3 years, PL/Java has been calling
> InitializeTimeouts before calling RegisterTimeout. Looking over
> the callers of InitializeTimeouts in core, though, it appears
> that an extension like PL/Java should be able to assume it has
> already been called, and in fact would be rude to call it again,
> as it isn't idempotent and could conceivably clobber somebody
> else's timeouts.

Yep.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] plan_rows confusion with parallel queries

2016-11-03 Thread Robert Haas
On Wed, Nov 2, 2016 at 2:42 PM, Tomas Vondra wrote:
> BTW is it really a good idea to use nloops to track the number of workers
> executing a given node? How will that work if once we get parallel nested
> loops and index scans?

We already have parallel nested loops with inner index scans.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] Proposal: scan key push down to heap [WIP]

2016-11-03 Thread Robert Haas
On Tue, Nov 1, 2016 at 8:31 PM, Kouhei Kaigai  wrote:
> By the way, I'm a bit skeptical about whether this enhancement is really
> worth the work it requires, because we can now easily increase the number
> of processor cores to run a seq-scan with a qualifier, especially when it
> has high selectivity.
> What are your thoughts?

Are you saying we don't need to bother making sequential scans faster
because we could just use parallel sequential scan instead?  That
doesn't sound right to me.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] WAL consistency check facility

2016-11-03 Thread Kuntal Ghosh
I've updated the patch for review.

-- 
Thanks & Regards,
Kuntal Ghosh
EnterpriseDB: http://www.enterprisedb.com


walconsistency_v12.patch
Description: application/download



Re: [HACKERS] Stopping logical replication protocol

2016-11-03 Thread Craig Ringer
On 21 October 2016 at 19:38, Vladimir Gordiychuk  wrote:
> Craig, Andres what do you thinks about previous message?

I haven't had a chance to look further to be honest.

Since a downstream disconnect works, though it's ugly, it's not
something I can justify spending a lot of time on, and I already did
spend a lot on it in patch review/updating/testing etc.

I don't know what Andres wants, but I think CopyFail with
ERRCODE_QUERY_CANCELED is fine.

As for plugins that collect changes in memory and only send them on
commit, I'd call them "broken". Not an interesting case IMO. Don't do
that.

-- 
 Craig Ringer   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services




Re: [HACKERS] [PATCH] Reload SSL certificates on SIGHUP

2016-11-03 Thread Michael Banck
Hi,

this is a first review of this patch.

As a feature, I think this functionality is very welcome.  Having to
schedule a downtime in order to enable SSL or change the SSL certificate
is a nuisance and it might make admins think twice, reducing security.

The patch applies cleanly (modulo fuzz in the last hunk, but see below)
to master as of 00a86856, it compiles cleanly when using --with-openssl
and without that option and passes make check for both cases.

It does not update the documentation, I think at least section 18.9
"Secure TCP/IP Connections with SSL" needs updating as it says: "The
files server.key, server.crt, root.crt, and root.crl (or their
configured alternative names) are only examined during server start; so
you must restart the server for changes in them to take effect".

I have (so far) lightly tested the functionality and could not find
any serious logic issues.  Changing or replacing the server certificate
with an expired one makes psql no longer connect with sslmode=require.

However see below for segfaults during testing.

Some code comments on the patch:

On Mon, Oct 31, 2016 at 10:40:18AM +0100, Andreas Karlsson wrote:
> diff --git a/src/backend/libpq/be-secure-openssl.c 
> b/src/backend/libpq/be-secure-openssl.c
> index 668f217..a1b582f 100644
> --- a/src/backend/libpq/be-secure-openssl.c
> +++ b/src/backend/libpq/be-secure-openssl.c
> @@ -77,12 +77,14 @@ static DH  *generate_dh_parameters(int prime_len, int 
> generator);
[...] 
> + if ((context = initialize_context()) != NULL)
>   {
[...]
> + be_tls_destroy();

I am getting segfaults on reloading the configuration for this call.

|Program received signal SIGSEGV, Segmentation fault.
|X509_VERIFY_PARAM_free (param=0x21) at x509_vpm.c:105
|105     x509_vpm.c: No such file or directory.
|(gdb) bt full
|#0  X509_VERIFY_PARAM_free (param=0x21) at x509_vpm.c:105
|No locals.
|#1  0x7f3b59745a09 in SSL_CTX_free (a=0x21) at ssl_lib.c:1955
|i = -1
|#2  0x005ee314 in be_tls_destroy () at be-secure-openssl.c:197
|No locals.
|#3  be_tls_init () at be-secure-openssl.c:181
|No locals.
|#4  0x005e3f55 in secure_initialize () at be-secure.c:74
|No locals.
|#5  0x0065f377 in SIGHUP_handler (postgres_signal_arg=) 
at postmaster.c:2524
|save_errno = 4
|__func__ = "SIGHUP_handler"
|#6  <signal handler called>
|No locals.
|#7  0x7f3b58938873 in __select_nocancel () at 
../sysdeps/unix/syscall-template.S:81
|No locals.
|#8  0x0046e027 in ServerLoop () at postmaster.c:1672
|timeout = {tv_sec = 54, tv_usec = 568533}
|rmask = {fds_bits = {56, 0 }}
|selres = 
|now = 
|readmask = {fds_bits = {56, 0 }}
|last_lockfile_recheck_time = 1478169442
|last_touch_time = 1478169442
|__func__ = "ServerLoop"
|#9  0x0066263e in PostmasterMain (argc=argc@entry=3, 
argv=argv@entry=0x2405df0)
|at postmaster.c:1316
|opt = 
|status = 
|userDoption = 
|listen_addr_saved = 
|i = 
|output_config_variable = 
|__func__ = "PostmasterMain"
|#10 0x0046f6b7 in main (argc=3, argv=0x2405df0) at main.c:228
|No locals.

What I'm doing is:

|postgres=# SHOW ssl;
| ssl 
|-
| on
|(1 row)
|
|postgres=# ALTER SYSTEM SET ssl = off;
|ALTER SYSTEM
|postgres=# SELECT pg_reload_conf();
| pg_reload_conf 
|
| t
|(1 row)
|
|postgres=# SHOW ssl;
| ssl 
|-
| off
|(1 row)
|
|postgres=# ALTER SYSTEM SET ssl = on;
|ALTER SYSTEM
|postgres=# SELECT pg_reload_conf();
| pg_reload_conf 
|
| t
|(1 row)
|
|postgres=# SHOW ssl;
|FATAL:  terminating connection due to unexpected postmaster exit
|server closed the connection unexpectedly
|   This probably means the server terminated abnormally
|   before or while processing the request.
|The connection to the server was lost. Attempting reset: Failed.
|!> 

The other one I got:

|Program received signal SIGSEGV, Segmentation fault.
|sk_pop_free (st=0x17f0002, func=0x7fdbac018ed0 ) at 
stack.c:291
|291     stack.c: No such file or directory.
|(gdb) bt full
|#0  sk_pop_free (st=0x17f0002, func=0x7fdbac018ed0 ) 
at stack.c:291
|i = 
|#1  0x7fdbac04324a in x509_verify_param_zero (param=0x10d4b40) at 
x509_vpm.c:85
|No locals.
|#2  X509_VERIFY_PARAM_free (param=0x10d4b40) at x509_vpm.c:105
|No locals.
|#3  0x7fdbac33ca09 in SSL_CTX_free (a=0x17f0002) at ssl_lib.c:1955
|i = -1
|#4  0x005ee84c in be_tls_destroy () at be-secure-openssl.c:197
|No locals.
|#5  0x005e3f65 in secure_destroy () at be-secure.c:87
|No locals.
|#6  0x0065f395 in SIGHUP_handler (postgres_signal_arg=) 
at postmaster.c:2532
|save_errno = 4
|__func__ = "SIGHUP_handler"
|#7  <signal handler called>
|No locals.
|#8  0x7fdbab52f873 in __select_nocancel () at 
../sysdeps/unix/syscall-template.S:81
|No locals.
|#9  0x0046e027 in ServerLoop () at pos

Re: [HACKERS] WAL consistency check facility

2016-11-03 Thread Kuntal Ghosh
On Thu, Nov 3, 2016 at 7:47 PM, Kuntal Ghosh  wrote:
> I've updated the patch for review.
>
If an inconsistency is found, it is just logged for now. Once the
patch is finalized, we can change it back to FATAL. I was making sure
that all regression tests pass with the patch, and it seems there is
some inconsistency in the regression tests for the BRIN index:

LOG:  Inconsistent page found, rel 1663/16384/30607, forknum 0, blkno 1
CONTEXT:  xlog redo at 0/9BAE08C8 for BRIN/UPDATE+INIT: heapBlk 100
pagesPerRange 1 old offnum 11, new offnum 1

-- 
Thanks & Regards,
Kuntal Ghosh
EnterpriseDB: http://www.enterprisedb.com




Re: [HACKERS] Making table reloading easier

2016-11-03 Thread Corey Huinker
>   ALTER TABLE my_table
>   DISABLE INDEX ALL;

+1
This very thing came up in a conversation with PeterG early last year. I
was in favor then and I was surprised that the only thing standing in the
way was a lack of ALTER TABLE syntax.
Creating temporary data structures to mimic existing metadata structures is
a pain.


>
> It'd be even better to also add a REINDEX flag to COPY, where it
> disables indexes and re-creates them after it finishes. But that could
> be done separately.
>

I'm iffy on the COPY change. If we add index rebuilding, why not disabling
as well? If the COPY fails, what state do we leave the indexes in?

> I'm not sure I can tackle this in the current dev cycle,


I may have some spare cycles to devote to this, but it's unfamiliar
territory. I'm happy to do the grunt work if I can get some
higher-level guidance.


Re: [HACKERS] Confusing docs about GetForeignUpperPaths in fdwhandler.sgml

2016-11-03 Thread Robert Haas
On Wed, Oct 26, 2016 at 3:18 PM, Tom Lane  wrote:
> Robert Haas  writes:
>> I'm not interested in committing this patch.  I don't believe it is an
>> improvement on what we've got today.
>> Tom, any chance you could offer an opinion?
>
> I have no objection to this patch as such, but I think that the docs
> around FDW direct modify need significantly more work than this.
> I've had a to-do item for awhile to work on that, but it hasn't gotten
> to the top of the list.

Well, the question is whether you or someone else is willing to commit
it.  I am not.  If no one else is either, then let's call it Rejected
and move on.

> A larger issue is that I think the API itself is poorly designed, as
> I stated awhile ago (<31706.1457547...@sss.pgh.pa.us>) and was told it
> was too late to object.  So that's kind of discouraged me from bothering.

Apparently, you were already fairly discouraged, because you were
weighing in about once per release cycle to say "nope, still not
right".  I'd really be quite happy to see you take a more active hand
in the FDW discussions; clearly, you've got a lot of understanding of
that area that is pretty much unique to you.  I'd even support an
effort to rewrite the work that has already been done in a form more
to your liking, but I think you'd need to actually pay attention to
the threads on a somewhat regular basis in order for that to be practical.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] plan_rows confusion with parallel queries

2016-11-03 Thread Robert Haas
On Wed, Nov 2, 2016 at 4:00 PM, Tom Lane  wrote:
> Tomas Vondra  writes:
>> while eye-balling some explain plans for parallel queries, I got a bit
>> confused by the row count estimates. I wonder whether I'm alone.
>
> I got confused by that a minute ago, so no you're not alone.  The problem
> is even worse in join cases.  For example:
>
>  Gather  (cost=34332.00..53265.35 rows=100 width=8)
>    Workers Planned: 2
>    ->  Hash Join  (cost=2.00..52255.35 rows=100 width=8)
>          Hash Cond: ((pp.f1 = cc.f1) AND (pp.f2 = cc.f2))
>          ->  Append  (cost=0.00..8614.96 rows=417996 width=8)
>                ->  Parallel Seq Scan on pp  (cost=0.00..8591.67 rows=416667 width=8)
>                ->  Parallel Seq Scan on pp1  (cost=0.00..23.29 rows=1329 width=8)
>          ->  Hash  (cost=14425.00..14425.00 rows=100 width=8)
>                ->  Seq Scan on cc  (cost=0.00..14425.00 rows=100 width=8)
>
> There are actually 100 rows in pp, and none in pp1.  I'm not bothered
> particularly by the nonzero estimate for pp1, because I know where that
> came from, but I'm not very happy that nowhere here does it look like
> it's estimating a million-plus rows going into the join.

I welcome suggestions for improvement, but you will note that if the
row count didn't reflect some kind of guess about the number of rows
that each individual worker will see, the costing would be hopelessly
broken.  The cost needs to reflect a guess about the time the query
will finish, not the total amount of effort expended.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] plan_rows confusion with parallel queries

2016-11-03 Thread Robert Haas
On Wed, Nov 2, 2016 at 10:44 PM, Tomas Vondra wrote:
> Although - it is estimating 1M rows, but only "per worker" estimates are
> shown, and because there are 2 workers planned it says 1M/2.4 which is the
> 416k. I agree it's a bit unclear, but at least it's consistent with how we
> treat loops (i.e. that the numbers are per loop).

Right.  Which I think was a horrible decision.  I think that it would
be best to change EXPLAIN so that the row counts and costs are never
divided by nloops.  That would be a backward-incompatible change, but
I think it would be worth it.  What you typically want to understand
is the total effort expended in a particular plan node, and the
current system makes that incredibly difficult to understand,
especially because we then round off the row count estimates to the
nearest integer, so that you can't even reverse the division if you
want to (which you always do).

> But there's more fun with joins - consider for example this simple join:
>
>QUERY PLAN
> --
>  Gather  (cost=19515.96..43404.82 rows=96957 width=12)
>  (actual time=295.167..746.312 rows=9 loops=1)
>Workers Planned: 2
>Workers Launched: 2
>->  Hash Join  (cost=18515.96..32709.12 rows=96957 width=12)
>   (actual time=249.281..670.309 rows=3 loops=3)
>  Hash Cond: (t2.a = t1.a)
>  ->  Parallel Seq Scan on t2
>  (cost=0.00..8591.67 rows=416667 width=8)
>  (actual time=0.100..184.315 rows=33 loops=3)
>  ->  Hash  (cost=16925.00..16925.00 rows=96957 width=8)
>(actual time=246.760..246.760 rows=9 loops=3)
>Buckets: 131072  Batches: 2  Memory Usage: 2976kB
>->  Seq Scan on t1
>(cost=0.00..16925.00 rows=96957 width=8)
>(actual time=0.065..178.385 rows=9 loops=3)
>  Filter: (b < 10)
>  Rows Removed by Filter: 91
>  Planning time: 0.763 ms
>  Execution time: 793.653 ms
> (13 rows)
>
> Suddenly we don't show per-worker estimates for the hash join - both the
> Hash Join and the Gather have exactly the same cardinality estimate.

I'm not sure why that's happening, but I haven't made any changes to
the costing for a node like hash join.  It doesn't treat the parallel
sequential scan that is coming as its first input any differently than
it would if that were a non-parallel plan.  It's just costing the join
normally, based on an input row count that is lower than what it would
be if it were going to see every row from t2 rather than only some of
them.

> So, different join method but same result - 2 workers, loops=3. But let's
> try with small tables (100k rows instead of 1M rows):
>
>   QUERY PLAN
> 
>  Gather  (cost=0.29..36357.94 rows=100118 width=12) (actual
> time=13.219..589.723 rows=10 loops=1)
>Workers Planned: 1
>Workers Launched: 1
>Single Copy: true
>->  Nested Loop  (cost=0.29..36357.94 rows=100118 width=12)
> (actual time=0.288..442.821 rows=10 loops=1)
>  ->  Seq Scan on t1  (cost=0.00..1444.18 rows=100118 width=8)
>   (actual time=0.148..49.308 rows=10 loops=1)
>  ->  Index Scan using t2_a_idx on t2
>   (cost=0.29..0.34 rows=1 width=8)
>   (actual time=0.002..0.002 rows=1 loops=10)
>Index Cond: (a = t1.a)
>  Planning time: 0.483 ms
>  Execution time: 648.941 ms
> (10 rows)
>
> Suddenly, we get nworkers=1 with loops=1 (and not nworkers+1 as before).
> FWIW I've only seen this with force_parallel_mode=on, and the row counts are
> correct, so perhaps that's OK. single_copy seems a bit underdocumented,
> though.

This is certainly entirely as expected.  Single-copy means that
there's one process running the non-parallel plan beneath it, and
that's it.  So the Gather is just a pass-through node here, like a
Materialize or Sort: the number of input rows and the number of output
rows have to be the same.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




[HACKERS] Copying Permissions

2016-11-03 Thread Corey Huinker
Craig's post yesterday about exposing syntax for disabling indexes reminded
me of another feature I think we're lacking in areas where we have to do
table management.

The issue is to create a *something* that has the exact permissions of
another *something*. Usually it's creating a table related to (but not yet
inheriting) a parent, but it could also be to drop and recreate a
*something*, making sure it has the same permissions it had before.

BEGIN;

CREATE VIEW dummy AS SELECT 1::text as dummy;

UPDATE pg_class
SET relacl = ( SELECT relacl FROM pg_class
   WHERE oid = 'foo'::regclass)
WHERE oid = 'dummy'::regclass;

DROP VIEW foo;

CREATE VIEW foo AS ...;  -- new definition

UPDATE pg_class
SET relacl = ( SELECT relacl FROM pg_class
   WHERE oid = 'dummy'::regclass)
WHERE oid = 'foo'::regclass;

END;

I suppose I could have learned how to store a relacl as a scalar and just
saved it to a variable, but even then I'd have to twiddle with (and have
the permissions to twiddle with) pg_class.

So it'd be nice to:
ALTER TABLE bar SET PERMISSIONS FROM foo;
or maybe even
GRANT SAME PERMISSIONS ON VIEW bar FROM foo;

Thoughts?


Re: [HACKERS] WAL consistency check facility

2016-11-03 Thread Robert Haas
On Wed, Nov 2, 2016 at 10:30 AM, Kuntal Ghosh
 wrote:
> - Another suggestion was to remove wal_consistency from PostgresNode.pm
> because small buildfarm machines may suffer on it. Although I've no
> experience in this matter, I would like to be certain that nothing breaks
> in recovery tests after some modifications.

I think running the whole test suite with this enabled is going to
provoke complaints from buildfarm owners.  That's too bad, because I
agree with you that it would be nice to have the test coverage, but it
seems that many of the buildfarm machines are VMs with very minimal
resource allocations -- or very old physical machines -- or running
with settings like CLOBBER_CACHE_ALWAYS that make runs very slow.  If
you blow on them too hard, they fall over.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] auto_explain vs. parallel query

2016-11-03 Thread Robert Haas
On Wed, Nov 2, 2016 at 12:49 PM, Tomas Vondra
 wrote:
> On 11/01/2016 08:32 PM, Robert Haas wrote:
>> On Tue, Nov 1, 2016 at 10:58 AM, Tomas Vondra
>>  wrote:
>>>
>>> Damn! You're right of course. Who'd guess I need more coffee this early?
>>>
>>> Attached is a fix replacing the flag with an array of flags, indexed by
>>> ParallelMasterBackendId. Hopefully that makes it work with multiple
>>> concurrent parallel queries ... still, I'm not sure this is the right
>>> solution.
>>
>> I feel like it isn't.  I feel like this ought to go in the DSM for
>> that parallel query, not the main shared memory segment, but I'm not
>> sure how to accomplish that offhand.  Also, if we do solve it this
>> way, surely we don't need the locking.  The flag's only set before any
>> workers have started and never changes thereafter.
>
> I'm not sure what you mean by "DSM for that parallel query" - I thought the
> segments are created for Gather nodes, no? Or is there a DSM for the whole
> query that we could use?

Sure, the Gather node creates it.  There's generally only one per
query, though, and that's how most information is communicated from
leader to workers.

> Another thing is that maybe we don't really want to give extensions access
> to any of those segments - my impression was those segments are considered
> internal (is there RequestAddinShmemSpace for them?). And hacking something
> just for auto_explain seems a bit ugly.

Yeah.  I thought that there wouldn't be any reason for third-party
code to need to get into these segments, but maybe that was
short-sighted of me.  If we fix this without that, then we've got to
force pg_stat_statements to be loaded through
shared_preload_libraries, as you mentioned, and that doesn't seem nice
either.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


[HACKERS] Re: [COMMITTERS] pgsql: Add make rules to download raw Unicode mapping files

2016-11-03 Thread Peter Eisentraut
On 11/2/16 3:33 AM, Michael Paquier wrote:
> On Wed, Nov 2, 2016 at 12:58 AM, Peter Eisentraut  wrote:
>> Add make rules to download raw Unicode mapping files
>>
>> This serves as implicit documentation and is handy if someone wants to
>> tweak things.  The rules are not part of a normal build, like this
>> entire directory.
> 
> If the goal is to prevent to have those files in the code tree,
> wouldn't it make sense as well to have a .gitignore with all those
> files in it?

Well, the goal is to document where the files come from.  If someone
wants to refine this more, go ahead, but I have accomplished what I set
out to do. :)


-- 
Peter Eisentraut  http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Speed up Clog Access by increasing CLOG buffers

2016-11-03 Thread Robert Haas
On Tue, Nov 1, 2016 at 11:31 PM, Tomas Vondra
 wrote:
> I don't think I've suggested not committing any of the clog patches (or
> other patches in general) because shifting the contention somewhere else
> might cause regressions. At the end of the last CF I've however stated that
> we need to better understand the impact on various workloads, and I think
> Amit agreed with that conclusion.
>
> We have that understanding now, I believe - also thanks to your idea of
> sampling wait events data.
>
> You're right we can't fix all the contention points in one patch, and that
> shifting the contention may cause regressions. But we should at least
> understand what workloads might be impacted, how serious the regressions may
> get etc. Which is why all the testing was done.

OK.

> Sure, I understand that. My main worry was that people will get worse
> performance with the next major version than what they get now (assuming we
> don't manage to address the other contention points). Which is difficult to
> explain to users & customers, no matter how reasonable it seems to us.
>
> The difference is that both the fast-path locks and msgNumLock went into
> 9.2, so that end users probably never saw that regression. But we don't know
> if that happens for clog and WAL.
>
> Perhaps you have a working patch addressing the WAL contention, so that we
> could see how that changes the results?

I don't think we do, yet.  Amit or Kuntal might know more.  At some
level I think we're just hitting the limits of the hardware's ability
to lay bytes on a platter, and fine-tuning the locking may not help
much.

> I might be wrong, but I doubt the kernel guys are running a particularly wide
> set of tests, so how likely is it they will notice issues with specific
> workloads? Wouldn't it be great if we could tell them there's a bug and
> provide a workload that reproduces it?
>
> I don't see how "it's a Linux issue" makes it someone else's problem. The
> kernel guys can't really test everything (and are not obliged to). It's up
> to us to do more testing in this area, and report issues to the kernel guys
> (which is not happening as much as it should).

I don't exactly disagree with any of that.  I just want to find a
course of action that we can agree on and move forward.  This has been
cooking for a long time, and I want to converge on some resolution.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Patch to implement pg_current_logfile() function

2016-11-03 Thread Robert Haas
On Wed, Oct 26, 2016 at 11:25 PM, Karl O. Pinc  wrote:
> What it comes down to is I don't buy the adequacy of the
> ".csv" suffix test and think that "keeping things simple" now
> is a recipe for future breakage, or at least significant future
> complication and confusion when it come to processing logfile
> content.

Sounds like a plausible argument (although I haven't looked at the code).

> My thoughts are as follows:  Either you know the log format because
> you configured the cluster or you don't.  If you don't know the log
> format having the log file is halfway useless.  You can do something
> like back it up, but you can't ever look at its contents (in some
> sense) without knowing what data structure you're looking at.
>
> Therefore pg_current_logfile() without any arguments is, in the sense
> of any sort of automated processing of the logfile content, useless.

Yeah, but it's not useless in general.  I've certainly had cases where
somebody gave me access to their PostgreSQL cluster to try to fix a
problem, and I'm struggling to find the log files.  Being able to ask
the PostgreSQL cluster "where are all of the files to which you are
logging?" sounds helpful.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Row level security implementation in Foreign Table in Postgres

2016-11-03 Thread Robert Haas
On Wed, Nov 2, 2016 at 10:46 PM, Sounak Chakraborty  wrote:
> Row level security in Postgres is implemented through policies: the row
> security qualifier is attached as a subquery to the main query before
> query planning, and RLS is enabled through an ALTER TABLE statement.
> My question is why this feature is not enabled for foreign tables
> (ALTER FOREIGN TABLE doesn't have an option for enabling row level security).
> Is this not implemented due to some limitations in the current design?
> Because at a quick glance it looks like the security subquery could also be
> easily attached to the main query and passed for processing in the foreign
> database.

Yeah, I don't see why that couldn't be made to work.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] pageinspect: Hash index support

2016-11-03 Thread Peter Eisentraut
On 11/2/16 1:54 PM, Tom Lane wrote:
> I think the right thing is likely to be to copy the presented bytea
> into a palloc'd (and therefore properly aligned) buffer.  And not
> just in this one function.

Does the attached look reasonable?

-- 
Peter Eisentraut  http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
>From 086c603016de5018a8baddc88a8932472e0fcd1e Mon Sep 17 00:00:00 2001
From: Peter Eisentraut 
Date: Thu, 3 Nov 2016 12:00:00 -0400
Subject: [PATCH] pageinspect: Fix unaligned struct access in GIN functions

The raw page data that is passed into the functions will not be aligned
at 8-byte boundaries.  Casting that to a struct and accessing int64
fields will result in unaligned access.  On most platforms, you get away
with it, but it will result in a crash on pickier platforms such as ia64
and sparc64.
---
 contrib/pageinspect/ginfuncs.c | 15 ---
 1 file changed, 12 insertions(+), 3 deletions(-)

diff --git a/contrib/pageinspect/ginfuncs.c b/contrib/pageinspect/ginfuncs.c
index 145e2ce..660d236 100644
--- a/contrib/pageinspect/ginfuncs.c
+++ b/contrib/pageinspect/ginfuncs.c
@@ -51,7 +51,10 @@ gin_metapage_info(PG_FUNCTION_ARGS)
 		ereport(ERROR,
 (errcode(ERRCODE_INVALID_PARAMETER_VALUE),
  errmsg("input page too small (%d bytes)", raw_page_size)));
-	page = VARDATA(raw_page);
+
+	/* make a copy so that the page is properly aligned for struct access */
+	page = palloc(raw_page_size);
+	memcpy(page, VARDATA(raw_page), raw_page_size);
 
 	opaq = (GinPageOpaque) PageGetSpecialPointer(page);
 	if (opaq->flags != GIN_META)
@@ -115,7 +118,10 @@ gin_page_opaque_info(PG_FUNCTION_ARGS)
 		ereport(ERROR,
 (errcode(ERRCODE_INVALID_PARAMETER_VALUE),
  errmsg("input page too small (%d bytes)", raw_page_size)));
-	page = VARDATA(raw_page);
+
+	/* make a copy so that the page is properly aligned for struct access */
+	page = palloc(raw_page_size);
+	memcpy(page, VARDATA(raw_page), raw_page_size);
 
 	opaq = (GinPageOpaque) PageGetSpecialPointer(page);
 
@@ -195,7 +201,10 @@ gin_leafpage_items(PG_FUNCTION_ARGS)
 			ereport(ERROR,
 	(errcode(ERRCODE_INVALID_PARAMETER_VALUE),
   errmsg("input page too small (%d bytes)", raw_page_size)));
-		page = VARDATA(raw_page);
+
+		/* make a copy so that the page is properly aligned for struct access */
+		page = palloc(raw_page_size);
+		memcpy(page, VARDATA(raw_page), raw_page_size);
 
 		if (PageGetSpecialSize(page) != MAXALIGN(sizeof(GinPageOpaqueData)))
 			ereport(ERROR,
-- 
2.10.2


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] commitfest 2016-11 status summary

2016-11-03 Thread Robert Haas
On Tue, Nov 1, 2016 at 3:48 PM, Haribabu Kommi  wrote:
> There are plenty of patches that are in the "ready for committer" state;
> committers, please have a look at those patches and come to some conclusion
> on them.

Yes, we really need some more committer attention on a lot of these
patches.  I've been trying to do what I can to whittle that list down,
but it's long and contains many complex patches, as well as quite a
few patches in areas that I don't really understand.  For example, I
have no idea whether these things are good ideas, because I don't know
Windows:

https://commitfest.postgresql.org/11/604/ - pgwin32_is_service not
checking if SECURITY_SERVICE_SID is disabled
https://commitfest.postgresql.org/11/620/ - Updating Windows
environment variables

Any chance that some Windows-savvy committer can take a look at those?

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Minor code improvement to postgresGetForeignJoinPaths

2016-11-03 Thread Robert Haas
On Tue, Oct 25, 2016 at 1:59 AM, Tatsuro Yamada
 wrote:
> Hi Ashutosh,
>> You are right. Every call except that one is using NIL, so better to
>> fix it. The pattern was repeated in the recent aggregate pushdown
>> support. Here's patch to fix create_foreignscan_path() call in
>> add_foreign_grouping_paths() as well.
>
> Thanks for the review!

Committed.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Declarative partitioning - another take

2016-11-03 Thread Robert Haas
On Thu, Nov 3, 2016 at 7:46 AM,   wrote:
> On 2016-10-28 07:53, Amit Langote wrote:
>> @@ -6267,6 +6416,12 @@ ATAddForeignKeyConstraint(AlteredTableInfo *tab,
>> Relation rel,
>>  * Validity checks (permission checks wait till we have the column
>>  * numbers)
>>  */
>> +   if (pkrel->rd_rel->relkind == RELKIND_PARTITIONED_TABLE)
>> +   ereport(ERROR,
>> +   (errcode(ERRCODE_WRONG_OBJECT_TYPE),
>> +errmsg("cannot reference relation
>> \"%s\"", RelationGetRelationName(pkrel)),
>> +errdetail("Referencing partitioned tables
>> in foreign key constraints is not supported.")));
>
> Is there a plan for fixing this particular limitation?  It's a pretty
> serious problem for users, and the suggested workaround (to create a
> separate non-partitioned table which carries only the PK columns and is
> updated by triggers, and to direct the FKs to it instead of to the
> partitioned table) is not only very ugly, but also very slow.

If you have two compatibly partitioned tables, and the foreign key
matches the partitioning keys, you could implement a foreign key
between the two tables as a foreign key between each pair of matching
partitions.  Otherwise, isn't the only way to handle this a global
index?

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Complete LOCK TABLE ... IN ACCESS|ROW|SHARE

2016-11-03 Thread Robert Haas
On Thu, Oct 6, 2016 at 2:28 PM, marllius ribeiro
 wrote:
> This was my first test, which had help from Gerdan.
>
> I did some tests and found nothing special.  The stated feature is
> implemented correctly.
> It passes all regression tests and enables the use of the new features
> specified.

Committed.  I tweaked the comments just slightly.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Declarative partitioning - another take

2016-11-03 Thread Robert Haas
On Tue, Nov 1, 2016 at 2:36 PM, Corey Huinker  wrote:
> On Tue, Nov 1, 2016 at 2:24 PM, Robert Haas  wrote:
>> Well, I'm not sure we've exactly reached consensus here, and you're
>> making me feel like I just kicked a puppy.
>
> It was hyperbole. I hope you found it as funny to read as I did to write.
> This is a great feature and I'm not going to make "perfect" the enemy of
> "good".

Oh, OK.  Sorry, the tone did not transfer to my brain in the way you
intended it - I thought you were actually upset.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Declarative partitioning - another take

2016-11-03 Thread Robert Haas
On Wed, Nov 2, 2016 at 6:47 AM, Amit Langote
 wrote:
> OK, I changed things so that DROP TABLE a_partition no longer complains
> about requiring to detach first.  Much like how index_drop() locks the
> parent table ('parent' in a different sense, of course) and later
> invalidates its relcache, heap_drop_with_catalog(), if the passed in relid
> is a partition, locks the parent table using AccessExclusiveLock, performs
> its usual business, and finally invalidates the parent's relcache before
> closing it without relinquishing the lock.  Does that sound sane?

Yes.

> One
> downside is that if the specified command is DROP TABLE parent CASCADE,
> the above described invalidation is a waste of cycles because the parent
> will be dropped anyway after all the partitions are dropped.

I don't think that's even slightly worth worrying about.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] auto_explain vs. parallel query

2016-11-03 Thread Tomas Vondra

On 11/03/2016 03:59 PM, Robert Haas wrote:
> On Wed, Nov 2, 2016 at 12:49 PM, Tomas Vondra
>  wrote:
>> On 11/01/2016 08:32 PM, Robert Haas wrote:
>>> On Tue, Nov 1, 2016 at 10:58 AM, Tomas Vondra
>>>  wrote:
>>>> Damn! You're right of course. Who'd guess I need more coffee this early?
>>>>
>>>> Attached is a fix replacing the flag with an array of flags, indexed by
>>>> ParallelMasterBackendId. Hopefully that makes it work with multiple
>>>> concurrent parallel queries ... still, I'm not sure this is the right
>>>> solution.
>>>
>>> I feel like it isn't.  I feel like this ought to go in the DSM for
>>> that parallel query, not the main shared memory segment, but I'm not
>>> sure how to accomplish that offhand.  Also, if we do solve it this
>>> way, surely we don't need the locking.  The flag's only set before any
>>> workers have started and never changes thereafter.
>>
>> I'm not sure what you mean by "DSM for that parallel query" - I thought the
>> segments are created for Gather nodes, no? Or is there a DSM for the whole
>> query that we could use?
>
> Sure, the Gather node creates it. There's generally only one per
> query, though, and that's how most information is communicated from
> leader to workers.

Ah, right. I hadn't realized there's just a single Gather per query.

>> Another thing is that maybe we don't really want to give extensions access
>> to any of those segments - my impression was those segments are considered
>> internal (is there RequestAddinShmemSpace for them?). And hacking something
>> just for auto_explain seems a bit ugly.
>
> Yeah.  I thought that there wouldn't be any reason for third-party
> code to need to get into these segments, but maybe that was
> short-sighted of me.  If we fix this without that, then we've got to
> force pg_stat_statements to be loaded through
> shared_preload_libraries, as you mentioned, and that doesn't seem nice
> either.

Well, I was talking about auto_explain, and pg_stat_statements already
has to be loaded through shared_preload_libraries. I'm not sure how many
other extensions would need a similar hack (I don't see any in contrib,
but there may be external ones).


regards

--
Tomas Vondra  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Declarative partitioning - another take

2016-11-03 Thread Robert Haas
On Wed, Nov 2, 2016 at 3:41 AM, Amit Langote
 wrote:
> As for which parts of the system need to know about these implicit
> partition constraints to *enforce* them for data integrity, we could say
> that it's really just one site - ExecConstraints() called from
> ExecInsert()/ExecUpdate().

Well, that's a slightly optimistic view of the situation, because the
issue is whether ExecConstraints() is going to get called in the first
place.  But now that I look at it, there are only a handful of
callers, so maybe it's not so bad.

> Admittedly, the current error message style as in this patch exposes the
> implicit constraint approach to a certain criticism: "ERROR:  new row
> violates the partition boundary specification of \"%s\"".  It would say
> the following if it were a named constraint: "ERROR: new row for relation
> \"%s\" violates check constraint \"%s\""
>
> For constraint exclusion optimization, we teach get_relation_constraints()
> to look at these constraints.  Although greatly useful, it's not the case
> of being absolutely critical.
>
> Beside the above two cases, there is bunch of code (relcache, DDL) that
> looks at regular constraints, but IMHO, we need not let any of that code
> concern itself with the implicit partition constraints.  Especially, I
> wonder why the client tools should want the implicit partitioning
> constraint to be shown as a CHECK constraint?  As the proposed patch 0004
> (psql) currently does, isn't it better to instead show the partition
> bounds as follows?
>
> +CREATE TABLE part_b PARTITION OF parted (
> +   b WITH OPTIONS NOT NULL DEFAULT 1 CHECK (b >= 0),
> +   CONSTRAINT check_a CHECK (length(a) > 0)
> +) FOR VALUES IN ('b');

Good point.

> +\d part_b
> + Table "public.part_b"
> + Column |  Type   | Modifiers
> ++-+
> + a  | text|
> + b  | integer | not null default 1
> +Partition of: parted FOR VALUES IN ('b')
> +Check constraints:
> +"check_a" CHECK (length(a) > 0)
> +"part_b_b_check" CHECK (b >= 0)
>
> Needless to say, that could save a lot of trouble thinking about
> generating collision-free names of these constraints, their dependency
> handling, unintended altering of these constraints, pg_dump, etc.

Well, that's certainly true, but those problems don't strike me as so
serious that they couldn't be solved with a reasonable amount of
labor.

>> In fact, as far as I can see, the only advantage of this approach is
>> that when the insert arrives through the parent and is routed to the
>> child by whatever tuple-routing code we end up with (I guess that's
>> what 0008 does), we get to skip checking the constraint, saving CPU
>> cycles.  That's probably an important optimization, but I don't think
>> that putting the partitioning constraint in the catalog in any way
>> rules out the possibility of performing that optimization.  It's just
>> that instead of having the partitioning excluded-by-default and then
>> sometimes choosing to include it, you'll have it included-by-default
>> and then sometimes choose to exclude it.
>
> Hmm, doesn't it seem like we would be making *more* modifications to the
> existing code (backend or otherwise) to teach it about excluding the
> implicit partition constraints than the other way around?  The other way
> around being to modify the existing code to include the implicit
> constraints which is what this patch is about.

I'm not sure which way is actually going to be more code modification,
but I guess you've persuaded me that the way you have it is
reasonable, so let's stick with that.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] auto_explain vs. parallel query

2016-11-03 Thread Robert Haas
On Thu, Nov 3, 2016 at 12:11 PM, Tomas Vondra
 wrote:
>> Sure, the Gather node creates it. There's generally only one per
>> query, though, and that's how most information is communicated from
>> leader to workers.
>
> Ah, right. I hadn't realized there's just a single Gather per query.

That's not a hard and fast rule; it just usually works out that way.

>> Yeah.  I thought that there wouldn't be any reason for third-party
>> code to need to get into these segments, but maybe that was
>> short-sighted of me.  If we fix this without that, then we've got to
>> force pg_stat_statements to be loaded through
>> shared_preload_libraries, as you mentioned, and that doesn't seem nice
>> either.
>
> Well, I was talking about auto_explain, and pg_stat_statements already has
> to be loaded through shared_preload_libraries. I'm not sure how many other
> extensions would need a similar hack (I don't see any in contrib, but there
> may be external ones).

Sorry, auto_explain, then.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Applying XLR_INFO_MASK correctly when looking at WAL record information

2016-11-03 Thread Ashutosh Sharma
Hi,

> What about the patch attached to make things more consistent?

I have reviewed this patch. It does serve the purpose and looks sane
to me. I am marking it as ready for committer.

With Regards,
Ashutosh Sharma
EnterpriseDB: http://www.enterprisedb.com


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] add more NLS to bin

2016-11-03 Thread Peter Eisentraut
On 10/31/16 1:58 AM, Michael Paquier wrote:
> 2) For 0002 and pg_test_fsync, I am seeing a missing entry:
> printf(NA_FORMAT, "n/a*\n");

ok

> 4) 0004 and pg_upgrade... In check.c, three places like that:
> if (!db_used)
> {
> fprintf(script, "Database: %s\n", active_db->db_name);
> db_used = true;
> }
> 
> In exec.c:
> #endif
> fprintf(log, "command: %s\n", cmd);
> #ifdef WIN32

These and several of the other places write into log or script files.  I
have chosen not to do anything about those for now.  That might be a
future change, or we just leave them.

> +GETTEXT_FLAGS= \
> +pg_fatal:1:c-format \
> +pg_log:2:c-format \
> +prep_status:1:c-format \
> +report_stats:2:c-format
> s/report_stats/report_status/

ok

> In info.c, missing some entries in report_unmatched_relation() when
> reporting unmatching relations?

Yeah, that will need a bit of a rewrite, so FIXME later?

> In util.c, doesn't pg_log_v() need to handle strings used in fprintf?

Which specific lines do you have in mind?

-- 
Peter Eisentraut  http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Patch to implement pg_current_logfile() function

2016-11-03 Thread Gilles Darold
Le 03/11/2016 à 16:15, Robert Haas a écrit :
> On Wed, Oct 26, 2016 at 11:25 PM, Karl O. Pinc  wrote:
>> What it comes down to is I don't buy the adequacy of the
>> ".csv" suffix test and think that "keeping things simple" now
>> is a recipe for future breakage, or at least significant future
>> complication and confusion when it comes to processing logfile
>> content.
> Sounds like a plausible argument (although I haven't looked at the code).

Yes, the current v11 patch doesn't rely on the file-name extension anymore;
instead, the log format is now stored along with the log file name.

>> My thoughts are as follows:  Either you know the log format because
>> you configured the cluster or you don't.  If you don't know the log
>> format having the log file is halfway useless.  You can do something
>> like back it up, but you can't ever look at its contents (in some
>> sense) without knowing what data structure you're looking at.
>>
>> Therefore pg_current_logfile() without any arguments is, in the sense
>> of any sort of automated processing of the logfile content, useless.
> Yeah, but it's not useless in general.  I've certainly had cases where
> somebody gave me access to their PostgreSQL cluster to try to fix a
> problem, and I'm struggling to find the log files.  Being able to ask
> the PostgreSQL cluster "where are all of the files to which you are
> logging?" sounds helpful.
>

+1

The current implementation always returns the log file in use by the logging
collector when pg_current_logfile() is called without arguments. If
stderr and csvlog are both enabled in log_destination, then it will
return the stderr log. If you need the csvlog file specifically, you can
pass that format as the pg_current_logfile() parameter.


-- 
Gilles Darold
Consultant PostgreSQL
http://dalibo.com - http://dalibo.org



-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Microvacuum support for Hash Index

2016-11-03 Thread Jesper Pedersen

Hi,

On 11/02/2016 01:38 AM, Ashutosh Sharma wrote:
>> While replaying the delete/vacuum record on standby, it can conflict
>> with some already running queries.  Basically the replay can remove
>> some row which can be visible on standby.  You need to resolve
>> conflicts similar to what we do in btree delete records (refer
>> btree_xlog_delete).
>
> Agreed. Thanks for putting this point. I have taken care of it in the
> attached v2 patch.
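
For reference, btree_xlog_delete() resolves such conflicts with just a few
calls before replaying the page change. Below is a rough sketch of what the
hash equivalent might look like; the xl_hash_vacuum record name and its
latestRemovedXid field are assumptions here, since the patch itself is not
shown:

static void
hash_xlog_vacuum_one_page(XLogReaderState *record)
{
    xl_hash_vacuum *xlrec = (xl_hash_vacuum *) XLogRecGetData(record);

    if (InHotStandby)
    {
        RelFileNode rnode;

        /* cancel standby queries that might still see the removed rows */
        XLogRecGetBlockTag(record, 0, &rnode, NULL, NULL);
        ResolveRecoveryConflictWithSnapshot(xlrec->latestRemovedXid, rnode);
    }

    /* ... then replay the page modifications as usual ... */
}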



Some initial comments.

_hash_vacuum_one_page:

+   END_CRIT_SECTION();
+   _hash_chgbufaccess(rel, metabuf, HASH_READ, HASH_NOLOCK);

The _hash_chgbufaccess() needs a comment.

You also need a place where you call pfree for so->killedItems - maybe 
in hashkillitems().


Best regards,
 Jesper



--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] split up psql \d Modifiers column

2016-11-03 Thread Peter Eisentraut
On 11/2/16 12:24 PM, Kuntal Ghosh wrote:
> On Fri, Oct 28, 2016 at 9:00 AM, Peter Eisentraut
>  wrote:
>> I propose to change the psql \d output a bit, best shown with an example:
>>
>>  \d persons3
>> -   Table "public.persons3"
>> - Column |  Type   |Modifiers
>> -+-+--
>> - id | integer | not null
>> - name   | text| default ''::text
>> +  Table "public.persons3"
>> + Column |  Type   | Collation | Nullable | Default
>> ++-+---+--+--
>> + id | integer |   | not null |
>> + name   | text|   |  | ''::text
>>
> +1.
> psql-ref.sgml(line 4085) needs to be modified. Otherwise, the patch
> looks good to me.

Good catch.  I didn't see that because it said "Modifier" instead of
"Modifiers".  I also saw that the domain listing \dD also used
"Modifier", so I updated that as well to the new style.  I have
committed the patch.  Thanks for reviewing it.

-- 
Peter Eisentraut  http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] WAL consistency check facility

2016-11-03 Thread Kuntal Ghosh
On Thu, Nov 3, 2016 at 8:24 PM, Robert Haas  wrote:
> On Wed, Nov 2, 2016 at 10:30 AM, Kuntal Ghosh
>  wrote:
>> - Another suggestion was to remove wal_consistency from PostgresNode.pm
>> because small buildfarm machines may suffer on it. Although I've no
>> experience in this matter, I would like to be certain that nothing breaks
>> in recovery tests after some modifications.
>
> I think running the whole test suite with this enabled is going to
> provoke complaints from buildfarm owners.  That's too bad, because I
> agree with you that it would be nice to have the test coverage, but it
> seems that many of the buildfarm machines are VMs with very minimal
> resource allocations -- or very old physical machines -- or running
> with settings like CLOBBER_CACHE_ALWAYS that make runs very slow.  If
> you blow on them too hard, they fall over.
>
Thanks Robert, I get your point. Then, as Michael has suggested, it would be
nice to have some environment variable to pass optional conf parameters
during the TAP tests.
Implementing that feature would actually solve the problem.

-- 
Thanks & Regards,
Kuntal Ghosh
EnterpriseDB: http://www.enterprisedb.com


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] IPv6 link-local addresses and init data type

2016-11-03 Thread Peter Eisentraut
On 6/7/16 2:43 PM, Peter Eisentraut wrote:
> On 6/7/16 1:19 AM, Haribabu Kommi wrote:
>> How about the following case, Do we treat them as same or different?
>>
>> select 'fe80::%eth1'::inet = 'fe80::%ETH1'::inet;
>>
>> Only fe80::%2/64 is treated as a valid address, not the other ordering,
>> fe80::/64%2.
>> Do we need to throw an error in this case or just ignore it?
> 
> I suspect questions like these are already solved in someone else's IP 
> address type/object/class implementation, and we could look there.

I have been doing some testing and reviewed the RFCs again.  I'm having
some second thoughts about this.

I tried the IP address parsing modules in Perl, Python, Ruby, and they
all reject zone IDs.

The Java class java.net.Inet6Address accepts zone IDs, but only numbers
and names actually present on the local host.

RFC 4007 specifies that the zone IDs "should" not be used for global
addresses or the loopback address.  The presented patch does not check
for that.

The original message in this thread mentioned an address "::1%0", so
that is not even valid.  0 is supposed to be the default zone, so one
could argue that that can be silently accepted.

A zone ID is only meaningful inside a node.  So storing it in a table
without an associated node is meaningless.  (compare: storing a time
with time zone without a date)

There are also questions about comparing addresses with zone IDs, such
as in the example at the very top.  We don't have a good answer on
whether those addresses should be equal.  (My OS seems to think that
interface names are case-insensitive.)  Also, since there is no node
associated with the address, it's questionable what equal really means
anyway.

So I'm having doubts that the proposed change to allow any string to be
appended to any address is really sound.  This could invite a lot of
shenanigans, where equality comparisons or range operations are
subverted, for example.

-- 
Peter Eisentraut  http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [PATCH] Generic type subscription

2016-11-03 Thread Artur Zakirov
Hello,

Do you have an updated version of the patch?

2016-10-18 20:41 GMT+03:00 Dmitry Dolgov <9erthali...@gmail.com>:
>
>
> > The term "subscription" is confusing me
>
> Yes, you're right. "container" is too general I think, so I renamed
> everything
> to "subscripting".
>

There are other occurrences of "subscription", including in file names.
Please fix them.

Also, as I mentioned in a personal email, we have a performance degradation
in SELECT queries:

create table test (
a int2[],
b int4[][][]);

With patch:

=> insert into test (a[1:5], b[1:1][1:2][1:2])
  select '{1,2,3,4,5}', '{{{0,0},{1,2}}}'
  from generate_series(1,10);
Time: 936,285 ms

=> UPDATE test SET a[0] = '2';
Time: 1605,406 ms (00:01,605)

=> UPDATE test SET b[1:1][1:1][1:2] = '{113, 117}';
Time: 1603,076 ms (00:01,603)

=> explain analyze select a[1], b[1][1][1] from test;
 QUERY PLAN
-------------------------------------------------------------------------------
 Seq Scan on test  (cost=0.00..3858.00 rows=10 width=6) (actual
time=0.035..241.280 rows=10 loops=1)
 Planning time: 0.087 ms
 Execution time: 246.916 ms
(3 rows)

And without patch:

=> insert into test (a[1:5], b[1:1][1:2][1:2])
  select '{1,2,3,4,5}', '{{{0,0},{1,2}}}'
  from generate_series(1,10);
Time: 971,677 ms

=> UPDATE test SET a[0] = '2';
Time: 1508,262 ms (00:01,508)

=> UPDATE test SET b[1:1][1:1][1:2] = '{113, 117}';
Time: 1473,459 ms (00:01,473)

=> explain analyze select a[1], b[1][1][1] from test;
 QUERY PLAN
-------------------------------------------------------------------------------
 Seq Scan on test  (cost=0.00..5286.00 rows=10 width=6) (actual
time=0.024..98.475 rows=10 loops=1)
 Planning time: 0.081 ms
 Execution time: 105.055 ms
(3 rows)

-- 
Artur Zakirov
Postgres Professional: http://www.postgrespro.com
Russian Postgres Company


[HACKERS] C based plugins, clocks, locks, and configuration variables

2016-11-03 Thread Clifford Hammerschmidt
Hi all,

Apologies in advance if this isn't the right place to be posting this.

I've started work on a plugin in C (https://github.com/tanglebones/pg_tuid)
for generating generally monotonically ascending UUIDs (aka TUIDs), and
after googling around I couldn't find any guidance on a few things. (It's
hard to google for anything in the postgres C api as most results coming
back are for using postgres itself, not developing plugins for postgres.)

I'm looking for the idiomatic (and portable) way of:

1) getting microseconds (or nanoseconds) from UTC epoch in a plugin
2) getting an exclusive lock for a user plugin to serialize access to its
shared state (I'm assuming that plugins must be reentrant)
3) creating a configuration variable for a plugin and accessing its values
in the plugin. (e.g. `set plugin.configuration_variable=1` or somesuch)

Thanks,

-- 
Clifford Hammerschmidt, P.Eng.


Re: [HACKERS] Improving RLS planning

2016-11-03 Thread Tom Lane
Dean Rasheed  writes:
> On 25 October 2016 at 22:58, Tom Lane  wrote:
>> The alternative I'm now thinking about pursuing is to get rid of the
>> conversion of RLS quals to subqueries.  Instead, we can label individual
>> qual clauses with security precedence markings.

> +1 for this approach. It looks like it could potentially be much
> simpler. There's some ugly code in the inheritance planner (and
> probably one or two other places) that it might be possible to chop
> out, which would probably also speed up planning times.

Here's a draft patch for this.  I've only addressed the RLS use-case
so far, so this doesn't get into managing the order of application of
join quals, only restriction quals.

>> 2. In order_qual_clauses, sort first by security_level and second by cost.

> I think that ordering might be sub-optimal if you had a mix of
> leakproof quals and security quals and the cost of some security quals
> were significantly higher than the cost of some other quals. Perhaps
> all leakproof quals should be assigned security_level 0, to allow them
> to be checked earlier if they have a lower cost (whether or not they
> are security quals), and only leaky quals would have a security_level
> greater than zero. Rule 1 would then not need to check whether the
> qual was leakproof, and you probably wouldn't need the separate
> "leakproof" bool field on RestrictInfo.

Hm, but it would also force leakproof quals to be evaluated in front
of potentially-cheaper leaky quals, whether or not that's semantically
necessary.

I experimented with ignoring security_level altogether for leakproof
quals, but I couldn't make it work properly, because that didn't lead to
a comparison rule that satisfies transitivity.  For instance, consider
three quals:
A: cost 1, security_level 1, leaky
B: cost 2, security_level 1, leakproof
C: cost 3, security_level 0, leakproof
A should sort before B, since same security_level and lower cost;
B should sort before C, since lower cost and leakproof;
but A must sort after C, since higher security_level and leaky.

So what I ended up doing was using your idea of forcing the security
level to 0 for leakproof quals, but only if they have cost below a
threshold (which I set at 10X cpu_operator_cost, which should take in
most built-in functions).  That at least limits the possible damage
from forcing early evaluation of a leakproof qual.  There may be
some better way to do it, though, so I didn't go so far as to remove
the separate leakproof flag.
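
To make the resulting rule concrete, here is a minimal sketch of the
comparison logic described above; QualItem and the function names are
hypothetical placeholders, not the identifiers used in the actual patch:

typedef struct QualItem
{
    Node       *clause;
    Cost        cost;
    Index       security_level; /* 0 = may be evaluated first */
    bool        leakproof;
} QualItem;

/*
 * Cheap leakproof quals are demoted to security level 0 so they can be
 * evaluated early; 10 * cpu_operator_cost is the threshold mentioned above.
 */
static Index
effective_security_level(const QualItem *q)
{
    if (q->leakproof && q->cost < 10 * cpu_operator_cost)
        return 0;
    return q->security_level;
}

/* Sort first by (effective) security level, then by cost. */
static int
qual_order_cmp(const void *a, const void *b)
{
    const QualItem *qa = (const QualItem *) a;
    const QualItem *qb = (const QualItem *) b;
    Index       la = effective_security_level(qa);
    Index       lb = effective_security_level(qb);

    if (la != lb)
        return (la < lb) ? -1 : 1;
    if (qa->cost != qb->cost)
        return (qa->cost < qb->cost) ? -1 : 1;
    return 0;
}

Because the level adjustment is folded into the sort key itself, the
comparison is a plain lexicographic order on (level, cost) and therefore
transitive, which is exactly what fails in the A/B/C example if
leakproof-ness is consulted only at comparison time.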

Some other notes:

* This creates a requirement on scan-planning code (and someday on
join-planning code) to be sure it doesn't create violations of the qual
ordering rule.  Currently only indxpath.c and tidpath.c have to worry
about that AFAICS.  FDWs would need to worry about it too, except that
we don't currently allow RLS to be enabled on foreign tables.  I'm a
little concerned about whether FDWs could create security holes by
not accounting for this, but it's moot for now.  Custom scan providers
will need to pay attention as well.

* prepsecurity.c is now dead code and should be removed, but I did not
include that in this patch, since it would just bloat the patch.

* Accounting for the removal of prepsecurity.c, this is actually a net
savings of about 300 lines of code.  So I feel pretty good about that.
It also gets rid of some really messy kluges, particularly the behavior
of generating new subquery RTEs as late as halfway through
grouping_planner.  I find it astonishing that that worked at all.

* Since the planner is now depending on Query.hasRowSecurity to be set
whenever there are any securityQuals, I put in an Assert about that,
and promptly found three places in prepjointree.c and the rewriter where
we'd been failing to set it.  I have not looked to see if these represent
live bugs in existing releases, but they might.  Or am I misunderstanding
what the flag is supposed to mean?

* Aside from plan changes, there's one actual behavioral change in the
regression test results, but I think it's okay because it's a question
of whether to put a non-RLS qual before or after a leaky qual.  That's
not something we are promising anything about.

* There's one test query in updatable_views.sql where the plan collapses
to a dummy (Result with constant false qual) because the planner is now
able to see that the qual conditions are mutually contradictory.  Maybe
that query needs adjustment; I'm not sure what it's intending to test
exactly.

regards, tom lane

PS: I've been slacking on the commitfest because I wanted to push this
to a reasonably finished state before I set it aside to do CF work.
I'm not expecting it to be reviewed in this fest.



new-rls-planning-implementation-1.patch.gz
Description: new-rls-planning-implementation-1.patch.gz

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Row level security implementation in Foreign Table in Postgres

2016-11-03 Thread Tom Lane
Robert Haas  writes:
> On Wed, Nov 2, 2016 at 10:46 PM, Sounak Chakraborty  wrote:
>> My question is why this feature is not enabled for foreign tables
>> (ALTER FOREIGN TABLE doesn't have an option for enabling row level security).
>> Is this not implemented due to some limitations in the current design?
>> Because at a quick glance it looks like the security subquery could also be
>> easily attached to the main query and passed for processing in the foreign
>> database.

> Yeah, I don't see why that couldn't be made to work.

Once the patch at <30304.1478211...@sss.pgh.pa.us> gets in, the major
issue will be that FDWs will have to be careful not to select quals for
optimization (ie pushing down to a remote server) unless they satisfy
restriction_is_securely_promotable().  In most cases that should be
about a one-line change in the FDW, but I'm not sure that it'd be a good
idea to just blindly assume that FDWs are doing that.  We could perhaps
add some sort of "supports RLS" flag to the FDW API, which would not
get set unless the FDW author takes positive action to do so.

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] add more NLS to bin

2016-11-03 Thread Michael Paquier
On Fri, Nov 4, 2016 at 1:44 AM, Peter Eisentraut
 wrote:
> On 10/31/16 1:58 AM, Michael Paquier wrote:
>> In info.c, missing some entries in report_unmatched_relation() when
>> reporting unmatching relations?
>
> Yeah, that will need a bit of a rewrite, so FIXME later?

This patch is not complicated, so I would vote for those being addressed
now, so they are not forgotten, even if a FIXME flag is added. Perhaps you
don't think so, and as that's a minor issue I'll be fine with your judgement
as well.

>> In util.c, doesn't pg_log_v() need to handle strings used in fprintf?
>
> Which specific lines do you have in mind?

The verbose logs at the top. In pg_rewind for example those logs are
getting translated via the pg_log() calls used with PG_DEBUG.
-- 
Michael


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] pageinspect: Hash index support

2016-11-03 Thread Tom Lane
Peter Eisentraut  writes:
> On 11/2/16 1:54 PM, Tom Lane wrote:
>> I think the right thing is likely to be to copy the presented bytea
>> into a palloc'd (and therefore properly aligned) buffer.  And not
>> just in this one function.

> Does the attached look reasonable?

I'd be inclined to wrap it in some kind of support function for
conciseness; and you could put the other boilerplate (the length check)
there too.  Otherwise, +1.
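
Something like this, perhaps; a minimal sketch of such a support function,
where the name get_page_from_raw and the exact length check are placeholders:

static Page
get_page_from_raw(bytea *raw_page)
{
    Page        page;
    int         raw_page_size;

    raw_page_size = VARSIZE(raw_page) - VARHDRSZ;
    if (raw_page_size < BLCKSZ)
        ereport(ERROR,
                (errcode(ERRCODE_INVALID_PARAMETER_VALUE),
                 errmsg("input page too small (%d bytes)", raw_page_size)));

    /* make a copy so that the page is properly aligned for struct access */
    page = palloc(raw_page_size);
    memcpy(page, VARDATA(raw_page), raw_page_size);

    return page;
}

Each caller then reduces to "page = get_page_from_raw(raw_page);" and can
drop its local copy of the boilerplate.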

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Patch to implement pg_current_logfile() function

2016-11-03 Thread Karl O. Pinc
On Mon, 31 Oct 2016 09:26:27 +0100
Gilles Darold  wrote:

> Le 30/10/2016 à 08:04, Karl O. Pinc a écrit :

> > Have you given any thought to my proposal to change
> > CURRENT_LOG_FILENAME to LOG_METAINFO_FILE?  
> Yes, I don't think the information logged in this file is really meta
> information, and CURRENT_LOG_FILENAME seems obvious.

To me, the CURRENT_LOG_FILENAME symbol should contain the name 
of the current log file.  It does not.  The CURRENT_LOG_FILENAME symbol
holds the name of the file which itself contains the name of 
the log file(s) being written, plus the log file
structure of each log file.

IMO, the name of the log files being written, as well as
the type of data structure written into each log file,
are meta-information about the logging data.  So maybe
the right name is LOG_METAINFO_DATAFILE.

If you're not happy with making this change, that's fine.  In that case,
though, I'd like to mention the symbol name to the committers.

Regards,

Karl 
Free Software:  "You don't pay back, you pay forward."
 -- Robert A. Heinlein


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Making table reloading easier

2016-11-03 Thread Michael Paquier
On Thu, Nov 3, 2016 at 11:37 PM, Corey Huinker  wrote:
>>   ALTER TABLE my_table
>>   DISABLE INDEX ALL;
>
> +1
> This very thing came up in a conversation with PeterG early last year. I was
> in favor then and I was surprised that the only thing standing in the way
> was a lack of ALTER TABLE syntax.
> Creating temporary data structures to mimic existing metadata structures is
> a pain.

As long as the exclusive lock is kept until the end of the transaction,
that would work. I got scared when reading $subject that you were
going to propose an ENABLE ALL. This command should be a no-op outside
a transaction block, though.

>> It'd be even better to also add a REINDEX flag to COPY, where it
>> disables indexes and re-creates them after it finishes. But that could
>> be done separately.
>
> I'm iffy on the COPY change. If we add index rebuilding, why not disabling
> as well? If the COPY fails, what state do we leave the indexes in?

Yeah, I am -1 on that. Leaving the indexes in an unclean state would
force the users to REINDEX anyway afterwards.
-- 
Michael


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] WAL consistency check facility

2016-11-03 Thread Michael Paquier
On Fri, Nov 4, 2016 at 4:16 AM, Kuntal Ghosh  wrote:
> On Thu, Nov 3, 2016 at 8:24 PM, Robert Haas  wrote:
>> On Wed, Nov 2, 2016 at 10:30 AM, Kuntal Ghosh
>>  wrote:
>>> - Another suggestion was to remove wal_consistency from PostgresNode.pm
>>> because small buildfarm machines may suffer on it. Although I've no
>>> experience in this matter, I would like to be certain that nothing breaks
>>> in recovery tests after some modifications.
>>
>> I think running the whole test suite with this enabled is going to
>> provoke complaints from buildfarm owners.  That's too bad, because I
>> agree with you that it would be nice to have the test coverage, but it
>> seems that many of the buildfarm machines are VMs with very minimal
>> resource allocations -- or very old physical machines -- or running
>> with settings like CLOBBER_CACHE_ALWAYS that make runs very slow.  If
>> you blow on them too hard, they fall over.

Count me in. My RPIs won't like that! Actually I have a couple of
things internally mimicking the buildfarm client code on machines with
far higher capacity. And FWIW I am definitely going to enable this
option in the test suite and will follow up with reports here.

> Thanks Robert. I got your point. Then, as Michael has suggested, it is nice to
> have some environment variable to pass optional conf parameters during
> tap-tests.
> Implementing this feature actually solves the problem.

We just need to make PostgresNode.pm aware of something like PGTAPOPTIONS
so that servers started by the TAP tests append those options. There is
already PGCTLTIMEOUT, which behaves similarly. Even if this brings extra
load to buildfarm owners, it will limit complaints.
-- 
Michael


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] macaddr 64 bit (EUI-64) datatype support

2016-11-03 Thread Haribabu Kommi
On Tue, Oct 25, 2016 at 11:40 PM, Peter Eisentraut <
peter.eisentr...@2ndquadrant.com> wrote:

> On 10/25/16 1:38 AM, Haribabu Kommi wrote:
> > Here I attached the first version of a patch that supports both EUI-48
> > and EUI-64 type MAC addresses with a single datatype called macaddr.
> > This is a variable-length datatype, similar to inet, and it can store
> > both 6- and 8-byte addresses. A variable-length type is used so that,
> > if MAC addresses get enhanced in the future, this type can still
> > support them without breaking on-disk compatibility.
>
> Since the world knows the 6-byte variant as MAC-48, shouldn't it be
> renamed to macaddr48 or even mac48?


Yes. Before doing this change, it is better to confirm the approach and
then do all the changes.

1. Does everyone agree with renaming the existing datatype without
changing the OID?

2. What should the old macaddr datatype be renamed to: mac48, macaddr48,
macaddr6, or mac6?

3. Add the new datatype, under whatever name is chosen, supporting both
48-bit and 64-bit MAC addresses.

4. The new datatype is a variable-length datatype, similar to inet, so it
can handle any future changes.



> > Currently the patch lacks of documentation. Comments?
>
> For patches like this, it would be good if you included a mock commit
> message so that someone who comes in late knows what's going on.


Thanks, I will do it from now onward.


Regards,
Hari Babu
Fujitsu Australia


Re: [HACKERS] C based plugins, clocks, locks, and configuration variables

2016-11-03 Thread Craig Ringer
On 4 Nov. 2016 06:05, "Clifford Hammerschmidt" 
wrote:
>
> Hi all,
>
> Apologies in advance if this isn't the right place to be posting this.
>
> I've started work on a plugin in C (https://github.com/tanglebones/pg_tuid)
> for generating generally monotonically ascending UUIDs (aka TUIDs), and
> after googling around I couldn't find any guidance on a few things. (It's
> hard to google for anything in the postgres C api as most results coming
> back are for using postgres itself, not developing plugins for postgres.)
>
> I'm looking for the idiomatic (and portable) way of:
>
> 1) getting microseconds (or nanoseconds) from UTC epoch in a plugin

GetCurrentIntegerTimestamp()

> 2) getting an exclusive lock for a user plugin to serialize access to its
> shared state (I'm assuming that plugins must be reentrant)

Allocate an LWLock in your shared memory segment and use it to arbitrate
access. There are multiple examples in contrib, and LWLock allocation info
is in the developer docs.

> 3) creating a configuration variable for a plugin and accessing its
> values in the plugin. (e.g. `set plugin.configuration_variable=1` or
> somesuch)

DefineCustomIntegerVariable etc (I think, name not exactly right? On
phone). See guc.h .
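
For the record, the function declared in guc.h is DefineCustomIntVariable.
Here is a minimal sketch tying the three answers together; the
TuidShmemState layout, the "pg_tuid" tranche name, and the tuid.node_id GUC
are illustrative assumptions, not anything taken from the actual extension:

#include "postgres.h"
#include "fmgr.h"
#include "miscadmin.h"
#include "storage/ipc.h"
#include "storage/lwlock.h"
#include "storage/shmem.h"
#include "utils/guc.h"
#include "utils/timestamp.h"

PG_MODULE_MAGIC;

void        _PG_init(void);

/* shared state, serialized by an LWLock (point 2) */
typedef struct TuidShmemState
{
    LWLock     *lock;
    uint64      last_micros;
    uint32      seq;
} TuidShmemState;

static TuidShmemState *tuid_state = NULL;
static int  tuid_node_id = 0;   /* GUC for point 3 */
static shmem_startup_hook_type prev_shmem_startup_hook = NULL;

/* point 1: microseconds since the Unix epoch */
static uint64
current_unix_epoch_micros(void)
{
    /* GetCurrentIntegerTimestamp() counts from 2000-01-01; shift the epoch */
    return (uint64) GetCurrentIntegerTimestamp() +
        (uint64) (POSTGRES_EPOCH_JDATE - UNIX_EPOCH_JDATE) * USECS_PER_DAY;
}

static void
tuid_shmem_startup(void)
{
    bool        found;

    if (prev_shmem_startup_hook)
        prev_shmem_startup_hook();

    tuid_state = ShmemInitStruct("pg_tuid", sizeof(TuidShmemState), &found);
    if (!found)
    {
        tuid_state->lock = &(GetNamedLWLockTranche("pg_tuid"))->lock;
        tuid_state->last_micros = 0;
        tuid_state->seq = 0;
    }
}

void
_PG_init(void)
{
    /* shared memory can only be requested from shared_preload_libraries */
    if (!process_shared_preload_libraries_in_progress)
        return;

    DefineCustomIntVariable("tuid.node_id",
                            "Node id mixed into generated TUIDs.",
                            NULL,
                            &tuid_node_id,
                            0, 0, 255,
                            PGC_POSTMASTER,
                            0,
                            NULL, NULL, NULL);

    RequestAddinShmemSpace(MAXALIGN(sizeof(TuidShmemState)));
    RequestNamedLWLockTranche("pg_tuid", 1);

    prev_shmem_startup_hook = shmem_startup_hook;
    shmem_startup_hook = tuid_shmem_startup;
}

Backends would then bracket reads and updates of last_micros/seq with
LWLockAcquire(tuid_state->lock, LW_EXCLUSIVE) and LWLockRelease(). Note
that the shmem/LWLock requests only work when the library is loaded via
shared_preload_libraries, hence the
process_shared_preload_libraries_in_progress check.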


Re: [HACKERS] Declarative partitioning - another take

2016-11-03 Thread Robert Haas
Apologies if I've made some of these comments before and/or missed
comments you've made on these topics.  The size of this patch set is
so large that it's hard to keep track of everything.

Re-reviewing 0001:

+   indicate which table columns are used as partition key.  For example,

s/are used as/are part of the/

+   third table columns make up the partition key.  A zero in this array
+   indicates that the corresponding partition key column is an expression
+   over the table columns, rather than a simple column reference.

I think you can leave out "over the table columns".

+  columns or expressions forms the partitioning key

s/forms/form/

+  The table itself is empty.  A data row inserted into the table is routed

s/The table/The partitioned table/

+ * Anything mentioned in the expressions.  We must ignore the column
+ * references which will count as self-dependency items; in this case,
+ * the depender is the table itself (there is no such thing as partition
+ * key object).

"depender" is not really a word, and the parenthetical note isn't very
clear.  Maybe: We must ignore the column references, which will depend
on the table itself; there is no separate partition key object.

+heap_close(pg_partitioned_table, RowExclusiveLock);

It seems like it would be safe to do this right after
CatalogUpdateIndexes(pg_partitioned_table, tuple), and I'd recommend
that you do.  Not for performance or anything, but just to keep
related code together.

 /*
  * Resolve possibly-defaulted operator class specification
  */
-static Oid
+Oid
 GetIndexOpClass(List *opclass, Oid attrType,

Perhaps we should rename this function to ResolveOpClass, since it's
now going to be used for both indexes and partitioning keys.

+ * Sets *is_expr if attnum is found to be referenced in some partition key
+ * expression.

is_expr doesn't seem quite as clear as, say, used_by_expr or used_in_expr.

Also, the specification for this function doesn't seem to be very
clear about what this is supposed to do if the same column is both an
explicit partitioning column and also used in an expression, and the
code looks like it'll return with *is_expr set based on whichever use
it encounters first.  If that's intended behavior, maybe add a comment
like: It's possible for a column to be used both directly and as part
of a partition key expression; if that happens, *is_expr may end up as
either true or false.  That's OK for current uses of this function,
because *is_expr is only used to tailor the error message text.

+if (is_expr)
+*is_expr = false;
+if (attnum == partattno)
+return true;

I think you should adjust this (and the other bit in the same
function) so that you don't set *is_expr until you're committed to
returning.
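
Presumably something like this, so that *is_expr is only written on the
path that actually returns:

if (attnum == partattno)
{
    if (is_expr)
        *is_expr = false;
    return true;
}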

+index = -1;
+while ((index = bms_next_member(expr_attrs, index)) > 0)
+{
+AttrNumber attno = index + FirstLowInvalidHeapAttributeNumber;
+
+if (attno == attnum)
+return true;
+}

How about bms_is_member(attnum - FirstLowInvalidHeapAttributeNumber,
expr_attrs), instead of looping?
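
That is, presumably:

if (bms_is_member(attnum - FirstLowInvalidHeapAttributeNumber, expr_attrs))
    return true;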

+ errmsg("cannot reference relation \"%s\"",
RelationGetRelationName(pkrel)),
+ errdetail("Referencing partitioned tables in foreign
key constraints is not supported.")));

I think you could get rid of the errdetail and just have the error
message be "cannot reference partitioned table \"%s\"".

+ errmsg("column \"%s\" appears twice in
partition key", pelem->name),

It could be there three times!  How about column \"%s\" appears more
than once in partition key?  (I see that you seem to have adapted this
from some code in parse_utilcmd.c, which perhaps should also be
adjusted, but that's a job for another patch.)

+/*
+ * Strip any top-level COLLATE clause.  This ensures that we treat
+ * "x COLLATE y" and "(x COLLATE y)" alike.
+ */

But you don't, right?  Unless I am confused, which is possible, the
latter COLLATE will be ignored, while the former one will set the
collation to be used in the context of partitioning comparisons.

Re-reviewing 0002:

+   if (fout->remoteVersion >= 10)
+   {
+   PQExpBuffer acl_subquery = createPQExpBuffer();
+   PQExpBuffer racl_subquery = createPQExpBuffer();
+   PQExpBuffer initacl_subquery = createPQExpBuffer();
+   PQExpBuffer initracl_subquery = createPQExpBuffer();
+
+   PQExpBuffer attacl_subquery = createPQExpBuffer();
+   PQExpBuffer attracl_subquery = createPQExpBuffer();
+   PQExpBuffer attinitacl_subquery = createPQExpBuffer();
+   PQExpBuffer attinitracl_subquery = createPQExpBuffer();

It seems unnecessary to repeat all of this.  The only differences
between the 1 version and the 9600 v

Re: [HACKERS] Logical Replication WIP

2016-11-03 Thread Peter Eisentraut
On 11/3/16 9:31 AM, Petr Jelinek wrote:
> Release does not really change behavior, it has always dropped ephemeral
> slot.

Well, currently ephemeral is just a temporary state while a slot is
being created.  It's not really something that can exist independently.
You might as well call it RS_NOTREADY.  Therefore, dropping the slot
when you de-acquire (release) it makes sense.

But what you want is a slot that exists across acquire/release but is
dropped at the end of the session.  And what is implicit is that the
slot is only usable by one session, so you don't really need to ever
"release" it for use by other sessions.  And so half the Release calls
have been changed to Release-if-persistent, but it's not explained why
in each case.  It all seems to work OK, but there are a lot of hidden
assumptions in each case that make it hard to follow.

> So if I understand correctly what you are proposing is to change
> behavior of Release to not remove ephemeral slot, add function that
> removes the ephemeral slots of current session and add tracking of
> ephemeral slots created in the current session? That seems quite a bit
> more complicated than what the patch does, with little gain.
> 
> What about just releasing the ephemeral slot if a different one is
> being acquired, instead of the current error?

Maybe that would help reduce some of the mystery about when you have
to call Release and when ReleasePersistent (better called
ReleaseIfPersistent).

-- 
Peter Eisentraut  http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




[HACKERS] Re: [BUGS] BUG #14350: VIEW with INSTEAD OF INSERT TRIGGER and COPY. Missing feature or working as designed.

2016-11-03 Thread Haribabu Kommi
On Thu, Nov 3, 2016 at 5:23 PM, Dilip Kumar  wrote:

> On Thu, Nov 3, 2016 at 9:57 AM, Haribabu Kommi 
> wrote:
> > Thanks for your suggestion. Yes, I agree that adding a hint is good.
> > Updated patch is attached with addition of hint message.
> >
> > 2016-11-03 14:56:28.685 AEDT [7822] ERROR:  cannot copy to view "ttt_v"
> > 2016-11-03 14:56:28.685 AEDT [7822] HINT:  To enable copy to view,
> provide
> > an INSTEAD OF INSERT trigger
> > 2016-11-03 14:56:28.685 AEDT [7822] STATEMENT:  COPY ttt_v FROM stdin;
>
> Okay, the patch in general looks fine to me. One cosmetic comment: IMHO,
> in PG we put the operator at the end of the line, so move '&&' to the end
> of the previous line.
>
> + if (cstate->rel->rd_rel->relkind != RELKIND_RELATION
> + && (!cstate->rel->trigdesc ||
> + !cstate->rel->trigdesc->trig_insert_instead_row))
>

changed.
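
With the operator moved, the condition presumably now reads:

if (cstate->rel->rd_rel->relkind != RELKIND_RELATION &&
    (!cstate->rel->trigdesc ||
     !cstate->rel->trigdesc->trig_insert_instead_row))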


> Meanwhile I will test it and give the feedback.


Thanks.

Updated patch is attached with added regression tests.

Regards,
Hari Babu
Fujitsu Australia


copy_to_view_3.patch
Description: Binary data



Re: [HACKERS] Logical Replication WIP

2016-11-03 Thread Peter Eisentraut
On 10/24/16 9:22 AM, Petr Jelinek wrote:
> I also split out the libpqwalreceiver rewrite to separate patch which
> does just the re-architecture and does not really add new functionality.
> And I did the re-architecture bit differently based on the review.

That looks good to me, and it appears to address the previous discussions.

I wouldn't change walrcv_xxx to walrcvconn_xxx.  If we're going to have
macros to hide the internals, we might as well keep the names the same.

-- 
Peter Eisentraut  http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: [HACKERS] Gather Merge

2016-11-03 Thread Thomas Munro
On Thu, Oct 27, 2016 at 10:50 PM, Rushabh Lathia
 wrote:
> Please find attached the latest patch, which fixes the review points as
> well as additional clean-up.

I've signed up to review this patch and I'm planning to do some
testing.  Here's some initial feedback after a quick read-through:

+ if (gather_merge_readnext(gm_state, i, initialize ? false : true))

Clunky ternary operator... how about "!initialize".
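
I.e.:

if (gather_merge_readnext(gm_state, i, !initialize))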

+/*
+ * Function clear out a slot in the tuple table for each gather merge
+ * slots and returns the clear clear slot.
+ */

Maybe better like this?  "_Clear_ out a slot in the tuple table for
each gather merge _slot_ and _return_ the _cleared_ slot."

+ /* Free tuple array as we no more need it */

"... as we don't need it any more"

+/*
+ * Read the next tuple for gather merge.
+ *
+ * Function fetch the sorted tuple out of the heap.
+ */

"_Fetch_ the sorted tuple out of the heap."

+ * Otherwise, pull the next tuple from whichever participate we
+ * returned from last time, and reinsert the index into the heap,
+ * because it might now compare differently against the existing

s/participate/participant/

+ * Portions Copyright (c) 1996-2015, PostgreSQL Global Development Group
+ * Portions Copyright (c) 1994, Regents of the University of California

Shouldn't this say just "(c) 2016, PostgreSQL Global Development
Group"?  Are we supposed to be blaming the University of California
for new files?

+#include "executor/tqueue.h"
+#include "miscadmin.h"
+#include "utils/memutils.h"
+#include "utils/rel.h"
+#include "lib/binaryheap.h"

Not correctly sorted.

+ /*
+ * store the tuple descriptor into gather merge state, so we can use it
+ * later while initilizing the gather merge slots.
+ */

s/initilizing/initializing/

+/* 
+ * ExecEndGatherMerge
+ *
+ * frees any storage allocated through C routines.
+ * 

The convention in Postgres code seems to be to use a form like "Free
any storage ..." in function documentation.  Not sure if that's an
imperative, an infinitive, or if the word "we" is omitted since
English is so fuzzy like that, but it's inconsistent with other
documentation to use "frees" here.  Oh, I see that exact wording is in
several other files.  I guess I'll just leave this as a complaint
about all those files then :-)

+ * Pull atleast single tuple from each worker + leader and set up the heap.

s/atleast single/at least a single/

+ * Read the tuple for given reader into nowait mode, and form the tuple array.

s/ into / in /

+ * Function attempt to read tuple for the given reader and store it into reader

s/Function attempt /Attempt /

+ * Function returns true if found tuple for the reader, otherwise returns

s/Function returns /Return /

+ * First try to read tuple for each worker (including leader) into nowait
+ * mode, so that we initialize read from each worker as well as leader.

I wonder if it would be good to standardise on the terminology we use
when we mean workers AND the leader.  In my Parallel Shared Hash work,
I've been saying 'participants' if I mean = workers + leader.  What do
you think?

+ * After this if all active worker unable to produce the tuple, then
+ * re-read and this time read the tuple into wait mode. For the worker,
+ * which was able to produced single tuple in the earlier loop and still
+ * active, just try fill the tuple array if more tuples available.
+ */

How about this?  "After this, if all active workers are unable to
produce a tuple, then re-read and this time use wait mode.  For workers
that were able to produce a tuple in the earlier loop and are still
active, just try to fill the tuple array if more tuples are
available."

+ * The heap is never spilled to disk, since we assume N is not very large. So
+ * this is much simple then cost_sort.

s/much simple then/much simpler than/

+ /*
+ * Avoid log(0)...
+ */
+ N = (path->num_workers < 2) ? 2.0 : (double) path->num_workers;
+ logN = LOG2(N);
...
+ /* Per-tuple heap maintenance cost */
+ run_cost += path->path.rows * comparison_cost * 2.0 * logN;

Why multiply by two?  The comment above this code says "about log2(N)
comparisons to delete the top heap entry and another log2(N)
comparisons to insert its successor".  In fact gather_merge_getnext
calls binaryheap_replace_first, which replaces the top element without
any comparisons at all and then performs a sift-down in log2(N)
comparisons to find its new position.  There is no per-tuple "delete"
involved.  We "replace" the top element with the value it already had,
just to trigger the sift-down, because we know that our comparator
function might have a new opinion of the sort order of this element.
Very clever!  The comment and the 2.0 factor in cost_gather_merge seem
to be wrong though -- or am I misreading the code?
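
Under that reading, the per-tuple maintenance cost would presumably be a
single sift-down, i.e. something like:

/* Per-tuple heap maintenance cost: one sift-down per returned tuple */
run_cost += path->path.rows * comparison_cost * logN;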

Also, shouldn't we add 1 to N to account for the leader?  Suppose
there are 2 workers.  There are 3 elements in

Re: [HACKERS] Confusing docs about GetForeignUpperPaths in fdwhandler.sgml

2016-11-03 Thread Etsuro Fujita

On 2016/11/03 23:39, Robert Haas wrote:

On Wed, Oct 26, 2016 at 3:18 PM, Tom Lane  wrote:

A larger issue is that I think the API itself is poorly designed, as
I stated awhile ago (<31706.1457547...@sss.pgh.pa.us>)


I agree on that point.  I plan to rewrite direct modify using upper 
planner path-ification; I think we would no longer need the planner API 
PlanDirectModify, but I expect the executor/explain APIs (ie, 
BeginDirectModify, IterateDirectModify, EndDirectModify, and 
ExplainDirectModify) would still be useful after that rewrite.  Before 
that, however, I'd like to work on extending postgres_fdw to handle 
more cases such as UPDATE/DELETE on a join and INSERT, with the existing 
API.



I'd really be quite happy to see you take a more active hand
in the FDW discussions; clearly, you've got a lot of understanding of
that area that is pretty much unique to you.  I'd even support an
effort to rewrite the work that has already been done in a form more
to your liking, but I think you'd need to actually pay attention to
the threads on a somewhat regular basis in order for that to be practical.


+1

Best regards,
Etsuro Fujita






Re: [HACKERS] Push down more full joins in postgres_fdw

2016-11-03 Thread Etsuro Fujita

On 2016/11/03 18:52, Ashutosh Bapat wrote:

Here is the updated version, which includes the restructuring you proposed.
Other than the above issue and the alias issue we discussed, I addressed all
your comments except one on testing; I tried to add test cases where the
remote query is deparsed as nested subqueries, but I couldn't because IIUC,
reduce_outer_joins reduced full joins to inner joins or left joins.



Not always. It will do so in some cases, as explained in the prologue of
reduce_outer_joins():

 * More generally, an outer join can be reduced in strength if there is a
 * strict qual above it in the qual tree that constrains a Var from the
 * nullable side of the join to be non-null.  (For FULL joins this applies
 * to each side separately.)


Hmm.  Would it be necessary for this patch to include test cases for 
nested subqueries?  As mentioned in a previous email, those test cases 
can be added by the split patch that allows PHVs to be evaluated on the 
remote side.



This patch again has a lot of unrelated changes, esp. in
deparseSelectStmtForRel(). What was earlier part of deparseSelectSql()
and deparseFromExpr() is now flattened in deparseSelectStmtForRel(),
which seems unnecessary.


IIUC, I think this change was done to address your comment (see the 
comment #2 in [1]).  Am I missing something?



Or there are changes like

-deparseSelectStmtForRel(StringInfo buf, PlannerInfo *root, RelOptInfo *rel,
-   List *tlist, List *remote_conds, List *pathkeys,
+deparseSelectStmtForRel(StringInfo buf, PlannerInfo *root,
+   RelOptInfo *foreignrel, List *tlist,
+   List *remote_conds, List *pathkeys,

which arise because rel has been renamed as foreignrel. The patch will
work even without such renaming.


I did that because we have a Relation named rel (the same name!) within 
that function, used to execute deparseTargetList in the base-relation case. 
Another reason is that (1) it would match the other function definitions 
in deparse.c and (2) it would be consistent with the existing declaration 
of that function in postgres_fdw.h.


Best regards,
Etsuro Fujita

[1] 
https://www.postgresql.org/message-id/CAFjFpRfwoSsJr9418b2jA7g0nwagjZSWhPQPUmK9M6z5XSOAqQ%40mail.gmail.com







Re: [HACKERS] Push down more full joins in postgres_fdw

2016-11-03 Thread Ashutosh Bapat
On Fri, Nov 4, 2016 at 9:19 AM, Etsuro Fujita
 wrote:
> On 2016/11/03 18:52, Ashutosh Bapat wrote:
>>>
>>> Here is the updated version, which includes the restructuring you
>>> proposed.
>>> Other than the above issue and the alias issue we discussed, I addressed
>>> all
>>> your comments except one on testing; I tried to add test cases where the
>>> remote query is deparsed as nested subqueries, but I couldn't because
>>> IIUC,
>>> reduce_outer_joins reduced full joins to inner joins or left joins.
>
>
>> Not always. It will do so in some cases, as explained in the prologue of
>> reduce_outer_joins():
>>
>>  * More generally, an outer join can be reduced in strength if there is a
>>  * strict qual above it in the qual tree that constrains a Var from the
>>  * nullable side of the join to be non-null.  (For FULL joins this applies
>>  * to each side separately.)
>
>
> Hmm.  Would it be necessary for this patch to include test cases for nested
> subqueries?  As mentioned in a previous email, those test cases can be added
> by the split patch that allows PHVs to be evaluated on the remote side.

A patch should have testcases exercising the functionality it adds. This
patch adds functionality to deparse nested subqueries, so there should
be a testcase to test it. If we cannot come up with a testcase, then
it's very much possible that the code for that functionality is not
needed. PHV is a separate patch, and we do not know whether or when it
will be committed after this patch is committed. It's better to have a
self-sufficient patch.

>
>> This patch again has a lot of unrelated changes, esp. in
>> deparseSelectStmtForRel(). What was earlier part of deparseSelectSql()
>> and deparseFromExpr() is now flattened in deparseSelectStmtForRel(),
>> which seems unnecessary.
>
>
> IIUC, I think this change was done to address your comment (see the comment
> #2 in [1]).  Am I missing something?

There is some misunderstanding here. That comment basically says that
deparseRangeTblRef() duplicates code in deparseSelectStmtForRel(), so
we should either remove deparseRangeTblRef() altogether or keep it
minimal, avoiding duplication of code. What might have confused
you is the last sentence in that comment: "This will also make the
current changes to deparseSelectSql unnecessary." By current changes I
meant the changes to deparseSelectSql() in your patch, not the version
that's in the repository. I don't think we should flatten
deparseSelectStmtForRel() in this patch.

>
>> Or there are changes like
>>
>> -deparseSelectStmtForRel(StringInfo buf, PlannerInfo *root, RelOptInfo
>> *rel,
>> -   List *tlist, List *remote_conds, List *pathkeys,
>> +deparseSelectStmtForRel(StringInfo buf, PlannerInfo *root,
>> +   RelOptInfo *foreignrel, List *tlist,
>> +   List *remote_conds, List *pathkeys,
>>
>> which arise because rel has been renamed as foreignrel. The patch will
>> work even without such renaming.
>
>
> I did that because we have a Relation named rel (the same name!) within
> that function, used to execute deparseTargetList in the base-relation case.

That's because you have flattened deparseSelectStmtForRel(). If we
don't flatten it out, the variable name conflict doesn't arise.

> Another
> reason is that (1) it would match the other function definitions in
> deparse.c and (2) it would be consistent with the existing declaration
> of that function in postgres_fdw.h.
>

That would be a separate patch, I guess.

-- 
Best Wishes,
Ashutosh Bapat
EnterpriseDB Corporation
The Postgres Database Company




[HACKERS] Contents of "backup_label" and "*.backup" in pg_wal location

2016-11-03 Thread Venkata B Nagothi
Hello Hackers,

I have a question regarding the contents written to the backup_label
file and to the .backup file generated in the pg_wal directory when an
online backup is taken.

The backup_label file contents are as follows; they do not contain the
backup stop position (stop timestamp and STOP WAL location). The
information below is written when pg_start_backup() is executed.

START WAL LOCATION: 0/4460 (file 00010044)
CHECKPOINT LOCATION: 0/4498
BACKUP METHOD: pg_start_backup
BACKUP FROM: master
START TIME: 2016-11-04 14:24:25 AEDT
LABEL: 222

I see the following contents in the file
"00010044.0060.backup" which was generated in the
pg_wal location during the online backup. When pg_stop_backup() is
executed, the following content is written which includes the content
copied from the backup_label file.


START WAL LOCATION: 0/4460 (file 00010044)
STOP WAL LOCATION: 0/44000168 (file 00010044)
CHECKPOINT LOCATION: 0/4498
BACKUP METHOD: pg_start_backup
BACKUP FROM: master
START TIME: 2016-11-04 14:24:25 AEDT
LABEL: 222
STOP TIME: 2016-11-04 14:25:09 AEDT


Can someone please help me understand the importance of the above file?
How about having the same contents in the backup_label file as well?

I am working on another patch and was thinking it would be good if the same
contents could be written to the backup_label file as well. Actually, it
would be more beneficial to do so.

As of now, the backup_label file does not have any information related to
when and at what position the backup actually completed.

Any thoughts / apprehensions?

Regards,

Venkata B N
Database Consultant

Fujitsu Australia


Re: [HACKERS] Contents of "backup_label" and "*.backup" in pg_wal location

2016-11-03 Thread Michael Paquier
On Fri, Nov 4, 2016 at 1:18 PM, Venkata B Nagothi  wrote:
> I see the following contents in the file
> "00010044.0060.backup" which was generated in the pg_wal
> location during the online backup. When pg_stop_backup() is executed, the
> following content is written which includes the content copied from the
> backup_label file.
>
> [...]
>
> Can someone please help me understand the importance of the above file?

It is not actually critical, but it is useful for debugging (you could say
the same about backup_label.old).

> How about having the same contents in the backup_label file as well?

> As of now, the backup_label file does not have any information related to
> when and at what position the backup actually completed.

Yes, and it is not actually possible to write the stop information,
because when a backup finishes the backup_label file is simply removed,
and it has already been included in the backup by that point. The role
of this file is to provide the LSN start location from which the backup
is able to replay things cleanly. The stop position, i.e. the point at
which everything on disk is consistent, is determined at replay time by
the XLOG_BACKUP_END record. This stop position is not something you can
know when the backup_label file is generated. And I am of course
talking about exclusive backups here.

For non-exclusive backups, it would be possible to add some
information about the backup stop location directly to the
backup_label file generated and returned to the caller, though that
information does not seem very pertinent, as what really matters is
the LSN end location, and that is already available.
-- 
Michael




Re: [HACKERS] Gather Merge

2016-11-03 Thread Michael Paquier
On Fri, Nov 4, 2016 at 12:00 PM, Thomas Munro
 wrote:
> + * Portions Copyright (c) 1996-2015, PostgreSQL Global Development Group
> + * Portions Copyright (c) 1994, Regents of the University of California
>
> Shouldn't this say just "(c) 2016, PostgreSQL Global Development
> Group"?  Are we supposed to be blaming the University of California
> for new files?

If the new file contains a portion of code from that era, yes. If
it's something completely new, using only PGDG is fine. At least
that's what I can conclude by looking at git log -p and searching for
"new file mode".
-- 
Michael

