Re: [HACKERS] Additional logging for VACUUM and ANALYZE

2017-10-04 Thread Michael Paquier
On Thu, Oct 5, 2017 at 1:23 AM, Bossart, Nathan  wrote:
> Presently, there are a few edge cases in vacuum_rel() and analyze_rel() that I
> believe do not have sufficient logging.  This was discussed a bit in the
> vacuum-multiple-relations thread [0], but it was ultimately decided that any
> logging changes should be proposed separately.

I think that I agree with that, especially now that VACUUM allows
multiple relations. The discussion then would be how much logging we
want. WARNING looks appropriate per the discussions we had on the other
thread, as manual VACUUMs can now involve many more relations, even
with partitioned tables. More opinions would be welcome.

> So, the attached patch changes the existing lock contention message to be
> emitted for non-autovacuum sessions if necessary, and it adds a "skipping"
> message when a specified relation disappears before it is processed.  For
> consistency, autovacuum logs are emitted at LOG, and logs for manual commands
> are emitted at WARNING.  This patch also includes a minor documentation change

This is here:
 250ms or longer will be logged.  In addition, when this parameter is
 set to any value other than -1, a message will be
-logged if an autovacuum action is skipped due to the existence of a
-conflicting lock.  Enabling this parameter can be helpful
+logged if an autovacuum action is skipped due to a conflicting lock or a
+concurrently dropped relation.  Enabling this parameter can be helpful
 in tracking autovacuum activity.  This parameter can only be set in
So that looks appropriate for the patch.

> and a test that exercises a bit of this functionality.

My take on those tests would be to not include them. This is a lot just
to test two logging lines where the relation has been dropped.

> If this change were to be considered for back-patching, we would likely want
> to also apply Michael's RangeVar fix for partitioned tables to 10 [1].
> Without this change, log messages for unspecified partitions will be emitted
> with the parent's RangeVar information.

Well, that's assuming that we begin logging some information for
manual VACUUMs using the specified RangeVar, something that does not
happen at the top of upstream REL_10_STABLE, but could happen if we
were to include the patch you are proposing on this thread in
REL_10_STABLE. But the latter is not going to happen. Or did you patch
your own build of v10 to do so? For v10 the ship has already sailed,
so I think that it would be better to just let it go and rely on v11,
which has added all the facilities we wanted.
-- 
Michael




Re: [HACKERS] Logging idle checkpoints

2017-10-04 Thread Kyotaro HORIGUCHI
At Tue, 3 Oct 2017 08:22:27 -0400, Stephen Frost  wrote in 
<2017100317.gj4...@tamriel.snowman.net>
> Greetings,
> 
> * Kyotaro HORIGUCHI (horiguchi.kyot...@lab.ntt.co.jp) wrote:
> > At Tue, 3 Oct 2017 10:23:08 +0900, Michael Paquier wrote in
> >
> > > On Tue, Oct 3, 2017 at 12:01 AM, Stephen Frost  wrote:
> > > Since their introduction in
> > > 335feca4, m_timed_checkpoints and m_requested_checkpoints track the
> > > number of checkpoint requests, not if a checkpoint has been actually
> > > executed or not, I am not sure that this should be changed after 10
> > > years. So, to put it in other words, wouldn't we want a way to track
> > > checkpoints that are *executed*, meaning that we could increment a
> > > counter after doing the skip checks in CreateRestartPoint() and
> > > CreateCheckPoint()?
> > 
> > This sounds reasonable to me.
> 
> I agree that tracking executed checkpoints is valuable, but, and perhaps
> I'm missing something, isn't that the same as tracking non-skipped
> checkpoints? I suppose we could have both, if we really feel the need,
> provided that doesn't result in more work or effort being done than
> simply keeping the count.  I'd hate to end up in a situation where we're
> writing things out unnecessairly just to keep track of checkpoints that
> were requested but ultimately skipped because there wasn't anything to
> do.

I'm fine with counting both executed and skipped. But perhaps the
time of the latest checkpoint fits the concern better, as with
vacuum: it is visible in the control file but not in the system
views. If we count skipped checkpoints, I'd also like to see the time
(or LSN) of the last checkpoint in the system views.

  checkpoints_timed   | bigint   |   |  |
  checkpoints_req     | bigint   |   |  |
+ checkpoints_skipped | bigint
+ last_checkpoint     | timestamp with time zone or LSN?
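
For illustration, a rough sketch of where such counters could be
bumped -- the skip test and the counter names below are assumed for
this sketch, not actual PostgreSQL code:

    /*
     * Hypothetical sketch inside CreateCheckPoint(): bump an
     * "executed" counter only once the existing skip checks have
     * passed, so skipped requests can be told apart from checkpoints
     * that actually ran.
     */
    if (nothing_logged_since_last_checkpoint)   /* assumed skip test */
    {
        checkpoints_skipped++;          /* assumed counter */
        return;                         /* request counted, work skipped */
    }
    checkpoints_executed++;             /* assumed counter */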


# This reminded me of a concern. I'd like to count vacuums that
# are required but skipped due to lock failure, or killed by
# another backend.

regards,

-- 
Kyotaro Horiguchi
NTT Open Source Software Center





[HACKERS] Re: [COMMITTERS] pgsql: Allow multiple tables to be specified in one VACUUM or ANALYZE c

2017-10-04 Thread Michael Paquier
On Wed, Oct 4, 2017 at 7:53 AM, Tom Lane  wrote:
> Allow multiple tables to be specified in one VACUUM or ANALYZE command.
>
> Not much to say about this; does what it says on the tin.
>
> However, formerly, if there was a column list then the ANALYZE action was
> implied; now it must be specified, or you get an error.  This is because
> it would otherwise be a bit unclear what the user meant if some tables
> have column lists and some don't.
>
> Nathan Bossart, reviewed by Michael Paquier and Masahiko Sawada, with some
> editorialization by me

Tom, it seems to me that in the portions you have editorialized, you
have forgotten to update two comments still mentioning get_rel_oids()
in vacuum.c and analyze.c. Those should now refer to
expand_vacuum_rel() instead. Please see the attached.
-- 
Michael


vacuum-fix-comments.patch
Description: Binary data



Re: [HACKERS] [COMMITTERS] pgsql: Fix freezing of a dead HOT-updated tuple

2017-10-04 Thread Michael Paquier
On Thu, Oct 5, 2017 at 10:39 AM, Wood, Dan  wrote:
> Whatever you do, make sure to also test 250 clients running lock.sql.  Even
> with the community's fix plus YiWen's fix I can still get duplicate rows.
> What works for "in-block" HOT chains may not work when spanning blocks.

Interesting. Which version did you test? Only 9.6?

> Once nearly all 250 clients have done their updates and everybody is waiting
> to vacuum, which one by one will take a while, I usually just "pkill -9 psql".
> After that I have many duplicate "id=3" rows.  On top of that I think we
> might have a lock leak.  After the pkill I tried to rerun setup.sql to
> drop/create the table and it hangs.  I see an autovacuum process starting and
> exiting every couple of seconds.  Only by killing and restarting PG can I
> drop the table.

Yeah, that's more or less what I have been doing. My tests involve
using your initial script with way more sessions triggering lock.sql,
minus the kill -9 portion (good idea, actually). I can of course see
the sessions queuing for VACUUM, but still I cannot see duplicated
rows, even if I headshot Postgres in the middle of the VACUUM waiting
queue. Note that I have just tested Alvaro's patch on 9.3.
-- 
Michael




Re: [HACKERS] document and use SPI_result_code_string()

2017-10-04 Thread Peter Eisentraut
On 10/2/17 03:28, Daniel Gustafsson wrote:
>> On 06 Sep 2017, at 14:25, Tom Lane  wrote:
>>
>> Michael Paquier  writes:
>>> Fine for 0002. This reminds me of LockGXact and RemoveGXact in
>>> twophase.c, as well as _hash_squeezebucket that have some code paths
>>> that cannot return... Any thoughts about having some kind of
>>> PG_NOTREACHED defined to 0 which could be put in an assertion?
>>
>> Generally we just do "Assert(false)", maybe with "not reached" in a
>> comment.  I don't feel a strong need to invent a new way to do that.
> 
> Moving this to the next commitfest and bumping status to Ready for committer
> based on the discussion in this thread.

committed

-- 
Peter Eisentraut  http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: [HACKERS] Block level parallel vacuum WIP

2017-10-04 Thread Masahiko Sawada
On Tue, Sep 19, 2017 at 4:31 PM, Masahiko Sawada  wrote:
> On Tue, Sep 19, 2017 at 3:33 PM, Thomas Munro
>  wrote:
>> On Fri, Sep 8, 2017 at 10:37 PM, Masahiko Sawada  
>> wrote:
>>> Since the v4 patch conflicts with current HEAD, I attached the latest
>>> version of the patch.
>>
>> Hi Sawada-san,
>>
>> Here is an interesting failure with this patch:
>>
>> test rowsecurity  ... FAILED
>> test rules... FAILED
>>
>> Down at the bottom of the build log in the regression diffs file you can see:
>>
>> ! ERROR: cache lookup failed for relation 32893
>>
>> https://travis-ci.org/postgresql-cfbot/postgresql/builds/277165907
>>
>
> Thank you for letting me know.
>
> Hmm, it's an interesting failure. I'll investigate it and post the new patch.
>

Since the patch conflicts with current HEAD, I've rebased the patch
and fixed a bug. Please review it.

Regards,

--
Masahiko Sawada
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center


parallel_vacuum_v5.patch
Description: Binary data



Re: [HACKERS] [COMMITTERS] pgsql: Fix freezing of a dead HOT-updated tuple

2017-10-04 Thread Wood, Dan
Whatever you do, make sure to also test 250 clients running lock.sql.  Even
with the community's fix plus YiWen's fix I can still get duplicate rows.
What works for "in-block" HOT chains may not work when spanning blocks.

Once nearly all 250 clients have done their updates and everybody is waiting
to vacuum, which one by one will take a while, I usually just "pkill -9 psql".
After that I have many duplicate "id=3" rows.  On top of that I think we
might have a lock leak.  After the pkill I tried to rerun setup.sql to
drop/create the table and it hangs.  I see an autovacuum process starting and
exiting every couple of seconds.  Only by killing and restarting PG can I
drop the table.

On 10/4/17, 6:31 PM, "Michael Paquier"  wrote:

On Wed, Oct 4, 2017 at 10:46 PM, Alvaro Herrera wrote:
> Wong, Yi Wen wrote:
>> My interpretation of README.HOT is the check is just to ensure the chain
>> is continuous; in which case the condition should be:
>>
>> > if (TransactionIdIsValid(priorXmax) &&
>> >     !TransactionIdEquals(priorXmax, HeapTupleHeaderGetRawXmin(htup)))
>> >     break;
>>
>> So the difference is GetRawXmin vs GetXmin, because otherwise we get the
>> FreezeId instead of the Xmin when the transaction happened
>
> I independently arrived at the same conclusion.  Since I was trying with
> 9.3, the patch differs -- in the old version we must explicitly test
> for the FrozenTransactionId value, instead of using GetRawXmin.
> Attached is the patch I'm using, and my own oneliner test (pretty much
> the same I posted earlier) seems to survive dozens of iterations without
> showing any problem in REINDEX.

Confirmed, the problem goes away with this patch on 9.3.

> This patch is incomplete, since I think there are other places that need
> to be patched in the same way (EvalPlanQualFetch? heap_get_latest_tid?).
> Of course, for 9.4 and onwards we need to patch like you described.

I have just done a lookup of the source code, and here is an
exhaustive list of things in need of surgery:
- heap_hot_search_buffer
- heap_get_latest_tid
- heap_lock_updated_tuple_rec
- heap_prune_chain
- heap_get_root_tuples
- rewrite_heap_tuple
- EvalPlanQualFetch (twice)

> This bit in EvalPlanQualFetch caught my attention ... why is it saying
> xmin never changes?  It does change with freezing.
>
>     /*
>      * If xmin isn't what we're expecting, the slot must have been
>      * recycled and reused for an unrelated tuple.  This implies that
>      * the latest version of the row was deleted, so we need do
>      * nothing.  (Should be safe to examine xmin without getting
>      * buffer's content lock, since xmin never changes in an existing
>      * tuple.)
>      */
>     if (!TransactionIdEquals(HeapTupleHeaderGetXmin(tuple.t_data),
>                              priorXmax))

Agreed. That's not good.
-- 
Michael






Re: [HACKERS] [COMMITTERS] pgsql: Fix freezing of a dead HOT-updated tuple

2017-10-04 Thread Michael Paquier
On Wed, Oct 4, 2017 at 10:46 PM, Alvaro Herrera  wrote:
> Wong, Yi Wen wrote:
>> My interpretation of README.HOT is the check is just to ensure the chain is 
>> continuous; in which case the condition should be:
>>
>> > if (TransactionIdIsValid(priorXmax) &&
>> >     !TransactionIdEquals(priorXmax, HeapTupleHeaderGetRawXmin(htup)))
>> >     break;
>>
>> So the difference is GetRawXmin vs GetXmin, because otherwise we get the 
>> FreezeId instead of the Xmin when the transaction happened
>
> I independently arrived at the same conclusion.  Since I was trying with
> 9.3, the patch differs -- in the old version we must explicitly test
> for the FrozenTransactionId value, instead of using GetRawXmin.
> Attached is the patch I'm using, and my own oneliner test (pretty much
> the same I posted earlier) seems to survive dozens of iterations without
> showing any problem in REINDEX.

Confirmed, the problem goes away with this patch on 9.3.

> This patch is incomplete, since I think there are other places that need
> to be patched in the same way (EvalPlanQualFetch? heap_get_latest_tid?).
> Of course, for 9.4 and onwards we need to patch like you described.

I have just done a lookup of the source code, and here is an
exhaustive list of things in need of surgery:
- heap_hot_search_buffer
- heap_get_latest_tid
- heap_lock_updated_tuple_rec
- heap_prune_chain
- heap_get_root_tuples
- rewrite_heap_tuple
- EvalPlanQualFetch (twice)

> This bit in EvalPlanQualFetch caught my attention ... why is it saying
> xmin never changes?  It does change with freezing.
>
>     /*
>      * If xmin isn't what we're expecting, the slot must have been
>      * recycled and reused for an unrelated tuple.  This implies that
>      * the latest version of the row was deleted, so we need do
>      * nothing.  (Should be safe to examine xmin without getting
>      * buffer's content lock, since xmin never changes in an existing
>      * tuple.)
>      */
>     if (!TransactionIdEquals(HeapTupleHeaderGetXmin(tuple.t_data),
>                              priorXmax))

Agreed. That's not good.
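
To make the 9.3 variant concrete, a minimal sketch of the
chain-continuity test under discussion (a sketch only, the actual
patch may differ; pre-9.4 there is no raw-xmin accessor, so a frozen
tuple has to be recognized explicitly):

    TransactionId xmin = HeapTupleHeaderGetXmin(htup);

    /*
     * The chain is continuous if the next tuple's xmin matches
     * priorXmax, or if that xmin has been overwritten by freezing.
     */
    if (TransactionIdIsValid(priorXmax) &&
        !TransactionIdEquals(priorXmax, xmin) &&
        !TransactionIdEquals(xmin, FrozenTransactionId))
        break;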
-- 
Michael




Re: [HACKERS] 64-bit queryId?

2017-10-04 Thread Michael Paquier
On Thu, Oct 5, 2017 at 4:12 AM, Robert Haas  wrote:
> On Wed, Oct 4, 2017 at 10:11 AM, Michael Paquier
>  wrote:
>> On Wed, Oct 4, 2017 at 11:04 PM, Robert Haas  wrote:
>>> Not really; dynahash won't merge two keys just because their hash
>>> codes come out the same.  But you're right; that's probably not the
>>> best way to do it.   TBH, why do we even have pgss_hash_fn?  It seems
>>> like using tag_hash would be superior.
>>
>> Yes, using tag_hash would be just better than any custom formula.
>
> OK, here's v4, which does it that way.

v4 looks correct to me. Testing it (pgbench and some custom queries),
I have not spotted any issues. If the final decision is to use
64-bit query IDs, then this patch could be pushed.
-- 
Michael




Re: [HACKERS] PATCH: Batch/pipelining support for libpq

2017-10-04 Thread Vaishnavi Prabakaran
On Mon, Oct 2, 2017 at 8:31 PM, Daniel Gustafsson  wrote:

> > On 13 Sep 2017, at 07:44, Vaishnavi Prabakaran
> > <vaishnaviprabaka...@gmail.com> wrote:
> >
> > On Wed, Sep 13, 2017 at 3:33 PM, Craig Ringer wrote:
> >
> > I really do not like calling it "commit", as that conflates with a
> > database commit.
> >
> > A batch can embed multiple BEGINs and COMMITs. It's entirely possible
> > for an earlier part of the batch to succeed and commit, and then a later
> > part to fail. So that name is IMO wrong.
> >
> > Ok, SendQueue seems ok to me as well. Will change it in next version.
> >
> > +"a"?
> >
> > Hmm, Can you explain the question please. I don't understand.
> >
> > s/of new query/of a new query/
> >
> > Thanks for explaining. Will change this too in next version.
>
> Based on the discussions in this thread, and that a new version hasn’t been
> submitted, I’m marking this Returned with Feedback.  Please re-submit the
> new version in an upcoming commitfest when ready.



Thanks for the suggestion. OK, I will create a new patch in an upcoming
commitfest, with the attached patch addressing the above review comments.
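
For reference, minimal batch usage from an application would look
something like the sketch below.  The names used here
(PQenterPipelineMode, PQpipelineSync, PQexitPipelineMode,
PGRES_PIPELINE_SYNC) are illustrative assumptions for this sketch and
not necessarily the names exposed by the attached patch:

    #include <stdio.h>
    #include <libpq-fe.h>

    int
    main(void)
    {
        PGconn   *conn = PQconnectdb("dbname=postgres");
        PGresult *res;

        if (PQstatus(conn) != CONNECTION_OK)
            return 1;

        PQenterPipelineMode(conn);

        /* Queue two queries without waiting for their results. */
        PQsendQueryParams(conn, "SELECT 1", 0, NULL, NULL, NULL, NULL, 0);
        PQsendQueryParams(conn, "SELECT 2", 0, NULL, NULL, NULL, NULL, 0);
        PQpipelineSync(conn);   /* flush and mark a synchronization point */

        /* Each query yields its results followed by a NULL; the sync
         * point yields its own result type, which ends the batch. */
        for (;;)
        {
            res = PQgetResult(conn);
            if (res == NULL)
                continue;       /* boundary between queued queries */
            if (PQresultStatus(res) == PGRES_PIPELINE_SYNC)
            {
                PQclear(res);
                break;
            }
            PQclear(res);
        }

        PQexitPipelineMode(conn);
        PQfinish(conn);
        return 0;
    }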

Thanks & Regards,
Vaishnavi,
Fujitsu Australia.


0001-Pipelining-batch-support-for-libpq-code-v14.patch
Description: Binary data



Re: [HACKERS] Re: protocol version negotiation (Re: Libpq PGRES_COPY_BOTH - version compatibility)

2017-10-04 Thread Badrul Chowdhury
Okay, I will add a mechanism to try connecting with 3.0 if 3.1 fails;
that should be a few lines of code in fe-connect.c, and it will
eliminate the need for a back-patch. What do you think of the rest of
the change?
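
A minimal sketch of what that fallback could look like (PG_PROTOCOL()
and conn->pversion exist in libpq today; the two helper routines below
are hypothetical placeholders, not actual code):

    /*
     * Hypothetical: if the server rejected our 3.1 startup packet,
     * retry the whole connection attempt once using protocol 3.0.
     */
    if (startup_rejected_for_protocol(conn) &&
        conn->pversion == PG_PROTOCOL(3, 1))
    {
        conn->pversion = PG_PROTOCOL(3, 0);
        return retry_connection_attempt(conn);
    }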

Thanks,
Badrul

-Original Message-
From: Robert Haas [mailto:robertmh...@gmail.com]
Sent: Wednesday, October 4, 2017 4:54 AM
To: Tom Lane
Cc: Badrul Chowdhury; Satyanarayana Narlapuram; Craig Ringer;
Peter Eisentraut; Magnus Hagander; PostgreSQL-development
Subject: Re: [HACKERS] Re: protocol version negotiation (Re: Libpq
PGRES_COPY_BOTH - version compatibility)

On Tue, Oct 3, 2017 at 9:46 PM, Tom Lane  wrote:
> Badrul Chowdhury  writes:
>> 1. Pgwire protocol v3.0 with negotiation is called v3.1.
>> 2. There are 2 patches for the change: a BE-specific patch that will be 
>> backported and a FE-specific patch that is only for pg10 and above.
>
> TBH, anything that presupposes a backported change in the backend is 
> broken by definition.  We expect libpq to be able to connect to older 
> servers, and that has to include servers that didn't get this memo.
>
> It would be all right for libpq to make a second connection attempt if 
> its first one fails, as we did in the 2.0 -> 3.0 change.

Hmm, that's another approach, but I prefer the one advocated by Tom Lane.

https://www.postgresql.org/message-id/30788.1498672033%40sss.pgh.pa.us
https://www.postgresql.org/message-id/24357.1498703265%40sss.pgh.pa.us

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: [HACKERS] [COMMITTERS] pgsql: Fix freezing of a dead HOT-updated tuple

2017-10-04 Thread Peter Geoghegan
On Wed, Oct 4, 2017 at 6:46 AM, Alvaro Herrera  wrote:
> Wong, Yi Wen wrote:
>> My interpretation of README.HOT is the check is just to ensure the chain is 
>> continuous; in which case the condition should be:
>>
>> > if (TransactionIdIsValid(priorXmax) &&
>> >     !TransactionIdEquals(priorXmax, HeapTupleHeaderGetRawXmin(htup)))
>> >     break;
>>
>> So the difference is GetRawXmin vs GetXmin, because otherwise we get the 
>> FreezeId instead of the Xmin when the transaction happened

As you know, on version 9.4+, as of commit 37484ad2a, we decided that
we are "largely ignoring the value to which it [xmin] is set". The
expectation became that raw xmin is available after freezing, but
mostly for forensic purposes. I think Alvaro should now memorialize
the idea that its value is actually critical in some place
(htup_details.h?).

> I independently arrived at the same conclusion.  Since I was trying with
> 9.3, the patch differs -- in the old version we must explicitly test
> for the FrozenTransactionId value, instead of using GetRawXmin.

Obviously you're going to have to be prepared for a raw xmin of
FrozenTransactionId, even on 9.4+, due to pg_upgrade. I can see why it
would be safe (or at least no more dangerous) to rely on
HeapTupleHeaderGetRawXmin() in the way mentioned here, at least on
installations that initdb'd on a version after commit 37484ad2a
(version 9.4+). However, I'm not sure why what you propose here would
be safe when even raw xmin happens to be FrozenTransactionId. Are you
sure that that's truly race-free? If it's really true that we only
need to check for FrozenTransactionId on 9.3, why not just do that on
all versions, and never bother with HeapTupleHeaderGetRawXmin()?
("Sheer paranoia" is a valid answer; I just want us to be clear on the
reasoning.)

Obviously any race would have a ridiculously tiny window, but it's not
obvious why this protocol would be completely race-free (in the event
of a FrozenTransactionId raw xmin).

-- 
Peter Geoghegan




Re: [HACKERS] postgres_fdw super user checks

2017-10-04 Thread Jeff Janes
On Thu, Sep 14, 2017 at 1:08 PM, Robert Haas  wrote:

> On Thu, Sep 14, 2017 at 2:33 PM, Jeff Janes  wrote:
> > I think that foreign tables ought to behave as views do, where they run
> as
> > the owner rather than the invoker.  No one has talked me out of it, but
> no
> > one has supported me on it either.  But I think it is too late to change
> > that now.
>
> That's an interesting point.  I think that you can imagine use cases
> for either method.  Obviously, if what you want to do is drill a hole
> through the Internet to another server and then expose it to some of
> your fellow users, having the FDW run with the owner's permissions
> (and credentials) is exactly right.  But there's another use case too,
> which is where you have something that looks like a multi-user
> sharding cluster.  You want each person's own credentials to carry
> over to everything they do remotely.
>

OK.  And if you want the first one, you can currently wrap it in a view;
but if that were changed, I don't know what you would do if you wanted
the second one (other than having every user create their own set of
foreign tables).  So I guess the current situation is more flexible.

It does seem like it would then be a good idea to have a user mapping
option of "pass_the_hash", which would look up the md5 hash in pg_authid
(if the local user name is the same as the remote user name) and use that
to connect to the foreign server, as an alternative to recording the
password in plain text in the mapping itself.  But that would require
some changes to libpq, not just postgres_fdw.

And that wouldn't work for SCRAM.  I guess that SCRAM does have some
feature to allow this kind of delegation, but I don't know enough about it
to know how hard it would be to implement in postgres_fdw or how useful it
would be to have.


>
> I feel like the USER MAPPING stuff is a pretty clunky and annoying way
> of trying to make this work, no matter which of those use cases you
> happen to have.  But I'm not exactly sure what would be better,
> either, and like you say, it's a bit late to be breaking compatibility
> at this point.
>

Yeah, I have not been finding it enjoyable.  How much flexibility does the
SQL/MED spec even give us (I don't have access to the spec)?  From what I
could tell, it requires USER MAPPING to exist but doesn't give any details,
and doesn't say there can't be something else one could optionally use
instead.

Cheers,

Jeff


Re: [HACKERS] Possible SSL improvements for a newcomer to tackle

2017-10-04 Thread Jeff Janes
On Tue, Oct 3, 2017 at 6:44 AM, Tom Lane  wrote:

> Magnus Hagander writes:
> > On Tue, Oct 3, 2017 at 6:33 AM, Tom Lane wrote:
> >> I'm not an SSL expert, so insert appropriate grain of salt, but AIUI the
> >> question is what are you going to verify against?
>
> > One way to do it would be to default to the "system global certificate
> > store", which is what most other SSL apps do. For example on a typical
> > debian/ubuntu, that'd be the store in /etc/ssl/certs/ca-certificates.crt.
> > Exactly where to find them would be distribution-specific though, and we
> > would need to actually add support for a second certificate store. But
> > that would probably be a useful feature in itself.
>
> Maybe.  The impression I have is that it's very common for installations
> to use a locally-run CA to generate server and client certs.  I would not
> expect them to put such certs into /etc/ssl/certs.


Well, I would do it that way if it worked.  Not directly /etc/ssl/certs,
but /etc/pki/ca-trust/source/anchors/.

I would like the locally-run CA to be able to sign not just postgresql
server certs, but also apache server certs.  And then install the CA cert
file in one place per client and have it work for psql, curl, wget, etc.

Cheers,

Jeff


Re: [HACKERS] [sqlsmith] crash in RestoreLibraryState during low-memory testing

2017-10-04 Thread Robert Haas
On Tue, Oct 3, 2017 at 3:04 AM, Andreas Seltenreich  wrote:
> Tom Lane writes:
>> Presumably somebody could dig into the libc source code and prove or
>> disprove this, though it would sure help to know exactly what platform
>> and version Andreas is testing on.
>
> This is the code in glibc-2.24 around the crash site:
>
> ,----[ glibc-2.24/elf/dl-load.c:442 ]
> |   to_free = cp = expand_dynamic_string_token (l, cp, 1);
> |
> |   size_t len = strlen (cp);
> `----
>
> …while expand_dynamic_string_token will indeed return NULL on a failed
> malloc.  Code in the most recent glibc looks the same, so I'll carry
> this issue over to the glibc bugzilla then.

You know, I was pretty impressed with sqlsmith when it was only
finding bugs in PostgreSQL.  Finding bugs in glibc is even more
impressive.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] why subplan is 10x faster then function?

2017-10-04 Thread Pavel Stehule
2017-10-01 12:45 GMT+02:00 Sokolov Yura:

> On 1 October 2017 at 12:42:14 GMT+03:00, Pavel Stehule
> <pavel.steh...@gmail.com> wrote:
> >2017-09-30 23:23 GMT+02:00 Pavel Stehule:
> >
> >> Hi
> >>
> >> I have some strange slow queries based on usage of "view" functions
> >>
> >> one function looks like this:
> >>
> >> CREATE OR REPLACE FUNCTION
> >> ides_funcs.najdatsplt_cislo_exekuce(mid_najdatsplt bigint)
> >>  RETURNS character varying
> >>  LANGUAGE sql
> >>  STABLE
> >> AS $function$
> >> select CISLOEXEKUCE
> >>   from najzalobpr MT, najvzallok A1,
> >> NAJZALOBST A2, NAJZALOBCE A3 where
> >> MT.ID_NAJVZALLOK= A1.ID_NAJVZALLOK AND
> >> A1.ID_NAJZALOBST=A2.ID_NAJZALOBST AND
> >> A2.ID_NAJZALOBCE= A3.ID_NAJZALOBCE AND
> >> MT.ID_NAJDATSPLT = mID_NAJDATSPLT  LIMIT 1;
> >> $function$ cost 20
> >> ;
> >>
> >> I know that using this kind of function is not a good idea - it is old
> >> customer code generated from Oracle. I had an idea about possible
> >> planner issues, but this is an executor issue.
> >>
> >> When this function is evaluated as a function, the execution needs
> >> about 46 sec:
> >>
> >> ->  Nested Loop Left Join  (cost=0.71..780360.31 rows=589657 width=2700)
> >>     (actual time=47796.588..47796.588 rows=0 loops=1)
> >>   ->  Nested Loop  (cost=0.29..492947.20 rows=589657 width=2559)
> >>       (actual time=47796.587..47796.587 rows=0 loops=1)
> >>     ->  Seq Scan on najdatsplt mt  (cost=0.00..124359.24 rows=1106096
> >>         width=1013) (actual time=47796.587..47796.587 rows=0 loops=1)
> >>       Filter: (najdatsplt_cislo_exekuce(id_najdatsplt) IS NOT NULL)
> >>       Rows Removed by Filter: 654
> >>
> >> When I use a correlated subquery, then:
> >>
> >> ->  Nested Loop  (cost=0.29..19876820.11 rows=589657 width=2559)
> >>     (actual time=3404.154..3404.154 rows=0 loops=1)
> >>   ->  Seq Scan on najdatsplt mt  (cost=0.00..19508232.15 rows=1106096
> >>       width=1013) (actual time=3404.153..3404.153 rows=0 loops=1)
> >>     Filter: ((SubPlan 11) IS NOT NULL)
> >>     Rows Removed by Filter: 654
> >>     SubPlan 11
> >>       ->  Limit  (cost=1.10..17.49 rows=1 width=144)
> >>           (actual time=0.002..0.002 rows=0 loops=654)
> >>         ->  Nested Loop  (cost=1.10..17.49 rows=1 width=144)
> >>             (actual time=0.002..0.002 rows=0 loops=654)
> >>           ->  Nested Loop  (cost=0.83..17.02 rows=1 width=8)
> >>               (actual time=0.002..0.002 rows=0 loops=654)
> >>             ->  Nested Loop  (cost=0.56..16.61 rows=1 width=8)
> >>                 (actual time=0.002..0.002 rows=0 loops=654)
> >>
> >> The execution plan is +/- the same - the bottleneck is in the function
> >> execution.
> >>
> >> Tested with the same result on 9.6 and 10.
> >>
> >> Is the overhead of function execution known?
> >>
> >profile of slow execution looks like
> >
> >+   24,71%  24,40%  48235  postmaster  [.] SearchCatCache
> >+   14,25%   0,00%      0  postmaster  [unknown]   [.]
> >+    9,76%   9,65%  19071  postmaster  [.] TupleDescInitEntry
> >+    3,91%   3,86%   7625  postmaster  [.] ExecAssignScanProjectionInfoWithVarno
> >+    3,56%   3,52%   6955  postmaster  [.] AllocSetAlloc
> >+    2,66%   2,63%   5193  postmaster  [.] FunctionCall2Coll
> >+    2,65%   2,62%   5183  postmaster  [.] ResourceArrayRemove
> >+    2,42%   2,39%   4719  postmaster  [.] ExecTypeFromTLInternal
> >+    2,21%   2,19%   4321  postmaster  [.] DirectFunctionCall1Coll
> >+    2,02%   2,00%   3961  postmaster  [.] heap_getsysattr
> >+    1,85%   1,82%   3604  postmaster  [.] exprTypmod
> >+    1,81%   1,79%   3540  postmaster  [.] ResourceArrayAdd
> >+    1,68%   1,66%   3282  postmaster  [.] hash_uint32
> >+    1,65%   1,63%   3214  postmaster  [.] hash_search_with_hash_value
> >+    1,64%   1,62%   3208  postmaster  [.] CatalogCacheComputeHashValue
> >+    1,28%   1,26%   2498  postmaster  [.] MemoryContextAllocZeroAligned
> >+    1,25%   1,24%   2446  postmaster  [.] palloc0
> >
> >Any ideas why SearchCatCache is called so often?
> >
> >
> >
> >> Regards
> >>
> >> Pavel
> >>
>
> Looks like you've already collected a profile with a call graph, so you
> can tell us where it was called from.
>

I have more info now. Probably ExecInitIndexScan/ExecAssignResultTypeFromTL
is very expensive when it is called often (in this case, for every row of
the table):

  - 20,88% ExecInitIndexScan
     - 16,31% ExecAssignResultTypeFromTL
        - 16,22% ExecTypeFromTL
           - 15,87% ExecTypeFromTLInternal
              - 13,39% 

Re: [HACKERS] 64-bit queryId?

2017-10-04 Thread Robert Haas
On Wed, Oct 4, 2017 at 10:11 AM, Michael Paquier
 wrote:
> On Wed, Oct 4, 2017 at 11:04 PM, Robert Haas  wrote:
>> Not really; dynahash won't merge two keys just because their hash
>> codes come out the same.  But you're right; that's probably not the
>> best way to do it.   TBH, why do we even have pgss_hash_fn?  It seems
>> like using tag_hash would be superior.
>
> Yes, using tag_hash would be just better than any custom formula.

OK, here's v4, which does it that way.
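
For anyone following along, the dynahash setup with tag_hash looks
roughly like this (paraphrased; not the exact patch text):

    HASHCTL     info;

    memset(&info, 0, sizeof(info));
    info.keysize = sizeof(pgssHashKey);
    info.entrysize = sizeof(pgssEntry);
    info.hash = tag_hash;           /* hash every byte of the key */

    pgss_hash = ShmemInitHash("pg_stat_statements hash",
                              pgss_max, pgss_max,
                              &info,
                              HASH_ELEM | HASH_FUNCTION);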

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


64-bit-queryid-v4.patch
Description: Binary data



Re: [HACKERS] Possible SSL improvements for a newcomer to tackle

2017-10-04 Thread Nico Williams
On Wed, Oct 04, 2017 at 11:47:45AM -0700, Jeff Janes wrote:
> On Mon, Oct 2, 2017 at 9:33 PM, Tom Lane  wrote:
> > It's possible that we could adopt some policy like "if the root.crt file
> > exists then default to verify" ... but that seems messy and unreliable,
> > so I'm not sure it would really add any security.
> 
> That is what we do.  If root.crt exists, we default to verify-ca.
> 
> And yes, it is messy and unreliable.  I don't know if it adds any security
> or not.
> 
> Or do you mean we could default to verify-full instead of verify-ca?

I would rather psql defaulted to verify-full and let users deal with
errors by either a) configuring appropriate trust anchors and
provisioning appropriate certificates, or b) disabling verify-full.

Users should know that they are using psql(1) insecurely -- it has to be
obvious.

Yes, this would be a backwards-incompatible change, but security tends
to justify this sort of change.

Another possibility would be to make this default change applicable only
when using postgresql-scheme URIs (which I use almost religiously -- they
are much easier to use than all the alternative connection data
specifications).

Nico
-- 




Re: [HACKERS] Partition-wise join for join between (declaratively) partitioned tables

2017-10-04 Thread Robert Haas
On Wed, Oct 4, 2017 at 8:23 AM, Ashutosh Bapat
 wrote:
> I am not sure whether your assumption that an expression with no Vars
> would have em_relids empty is correct. I wonder whether we will add
> any em_is_child members with empty em_relids; looking at
> process_equivalence(), those come from RestrictInfo::left/right_relids,
> which just indicate the relids at which that particular expression
> can be evaluated. Placeholder vars are an example where that can
> happen, but there may be others. To verify this, I tried the attached
> patch on master and ran make check. The assertion didn't trip. If
> em_relids is not NULL, bms_is_subset() is fine.

I spent some more time experimenting with this.  I found that cases
where an em_is_child equivalence class contains multiple relids are
quite easy to generate, e.g. select * from foo, bar where foo.a +
bar.a = 0, where foo and bar are partitioned.  However, I wasn't able
to generate a case where an em_is_child equivalence class has no
relids at all, and I'm out of ideas about how such a thing could
occur.  I suspect it can't.  I wondered whether there was some problem
with the multiple-relids case, but I can't find an example where that
misbehaves either.  So maybe it's fine (or maybe I'm just not smart
enough to find the case where it breaks).

>> I don't think I believe that comment, either.  In the case from which
>> that comment was copied (mark_dummy_rel), it was talking about a
>> RelOptInfo, and geqo_eval() takes care to remove any leftover pointers
>> to joinrels creating during a GEQO cycle.  But there's no similar
>> logic for ppilist, so I think what will happen here is that you'll end
>> up with a freed node in the middle of the list.
>
> In mark_dummy_rel() it's not about RelOptInfo, it's about the pathlist
> with dummy path being created in the same context as the RelOptInfo.
> Same applies here.

Oops.  I was thinking that the ppilist was attached to some
planner-global structure, but it's not; it's hanging off the
RelOptInfo.  So you're entirely right, and I'm just being dumb.

> We need to reparameterize any path which contains further paths and/or
> contains expressions that point to the parent relation. For a given
> path we need to reparameterize any paths that it contains and
> translate any expressions that are specific to that path. Expressions
> common across the paths are translated after the switch case. I have
> added this rule to the comment just above the switch case
> /*
>  * Copy of the given path. Reparameterize any paths referenced by the 
> given
>  * path. Replace parent Vars in path specific expressions by corresponding
>  * child Vars.
>  */
> Does that look fine or we want to add explanation for every node handled here.

No, I don't think we want something for every node, just a general
explanation at the top of the function.  Maybe something like this:

Most fields from the original path can simply be flat-copied, but any
expressions must be adjusted to refer to the correct varnos, and any
paths must be recursively reparameterized.  Other fields that refer to
specific relids also need adjustment.

>> I don't see much point in the T_SubqueryScanPath and T_ResultPath
>> cases in reparameterize_path_by_child().  It's just falling through to
>> the default case.
>
> I added those cases separately to explain why we should not see those
> cases in that switch case. I think that explanation is important
> (esp. considering your comment above) and associating those comment
> with "case" statement looks better. Are you suggesting that we should
> add that explanation in default case?

Or leave the explanation out altogether.

>> I wonder if reparameterize_path_by_child() ought to default to
>> returning NULL rather than throwing an error; the caller would then
>> have to be prepared for that and skip building the path.  But that
>> would be more like what reparameterize_path() does, and it would make
>> failure to include some relevant path type here a corner-case
>> performance bug rather than a correctness issue.  It seems like
>> someone adding a new path type could quite easily fail to realize that
>> it might need to be added here, or might be unsure whether it's
>> necessary to add it here.
>
> I am OK with that. However reparameterize_path_by_child() and
> reparameterize_paths_by_child() are callers of
> reparameterize_path_by_child() so they will need to deal with NULL
> return. I am fine with that too, but making sure that we are on the
> same page. If we do that, we could simply assert that the switch case
> doesn't see T_SubqueryScanPath and T_ResultPath.

Or do nothing at all about those cases.

I noticed today that the version of the patchset I have here says in
the header comments for reparameterize_path_by_child() that it returns
NULL if it can't reparameterize, but that's not what it actually does.
If you make this change, the existing comment will become correct.
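
In code form, the suggested shape would be roughly this (a sketch of
the proposal, not actual source):

    switch (nodeTag(path))
    {
        /* ... recognized path types reparameterized above ... */

        default:
            /*
             * Unrecognized path type: give up on reparameterizing
             * instead of erroring out; the caller must then skip
             * building this partition-wise join path.
             */
            return NULL;
    }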

The 

Re: [HACKERS] [COMMITTERS] pgsql: Fix freezing of a dead HOT-updated tuple

2017-10-04 Thread Peter Geoghegan
On Wed, Oct 4, 2017 at 11:00 AM, Wood, Dan  wrote:
> The early “break;” here is likely for the frozen-xmin reason, as I found
> in the other loop.

It looks that way.

Since we're already very defensive when it comes to this xmin/xmax
matching business, and we're defensive while following an update chain
more generally, I wonder if we should be checking
HeapTupleHeaderIsSpeculative() on versions >= 9.5 (versions with ON
CONFLICT DO UPDATE, where t_ctid block number might actually be a
speculative insertion token). Or, at least acknowledging that case in
comments. I remember expressing concern that something like this was
possible at the time that that went in.

We certainly don't want to have heap_abort_speculative() "super
delete" the wrong tuple in the event of item pointer recycling. There
are at least defensive sanity checks that would turn that into an
error within heap_abort_speculative(), so it wouldn't be a disaster if
it was attempted. I don't think that it's possible in practice, and
maybe it's sufficient that routines like heap_get_latest_tid() check
for a sane item offset, which will discriminate against
SpecTokenOffsetNumber, which must be well out of range for ItemIds on
the page. Could be worth a comment.

(I guess that heap_prune_chain() wouldn't need to be changed if we
decide to add such comments, because the speculative tuple ItemId is
going to be skipped over due to being ItemIdIsUsed() before we even
get there.)
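
In other words, the kind of defensive check being suggested for
update-chain followers on 9.5+ would look something like this
(placement hypothetical; a sketch, not a patch):

    /*
     * On 9.5+, t_ctid may hold a speculative insertion token rather
     * than a real TID; don't try to follow it as a chain link.
     */
    if (HeapTupleHeaderIsSpeculative(htup))
        break;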
-- 
Peter Geoghegan




Re: [HACKERS] Possible SSL improvements for a newcomer to tackle

2017-10-04 Thread Jeff Janes
On Mon, Oct 2, 2017 at 9:33 PM, Tom Lane  wrote:

>
> It's possible that we could adopt some policy like "if the root.crt file
> exists then default to verify" ... but that seems messy and unreliable,
> so I'm not sure it would really add any security.
>

That is what we do.  If root.crt exists, we default to verify-ca.

And yes, it is messy and unreliable.  I don't know if it adds any security
or not.

Or do you mean we could default to verify-full instead of verify-ca?

Cheers,

Jeff


Re: [HACKERS] [PATCH] Add ALWAYS DEFERRED option for constraints

2017-10-04 Thread Nico Williams
Ah, David Fetter points out that I should also update tab completion
for psql.  I'll do that at some point.  I notice there's no tab
completion for column constraint attributes...  If it's obvious enough
I'll try to fix that too.




Re: [HACKERS] [PATCH] Add ALWAYS DEFERRED option for constraints

2017-10-04 Thread Nico Williams
Ay, NOT WIP -- I left that in the Subject: by accident.




[HACKERS] [PATCH] WIP Add ALWAYS DEFERRED option for constraints

2017-10-04 Thread Nico Williams

[make check-world passes.  Tests and docs included.  Should be ready for
code review.]

Attached are patches to add an ALWAYS DEFERRED option to CONSTRAINTs and
CONSTRAINT TRIGGERs, meaning: SET CONSTRAINTS .. IMMEDIATE will not make
immediate any constraint/trigger that is declared as ALWAYS DEFERRED.

I.e., the opposite of NOT DEFERRED.

Motivation:

 - Security.

   One may have triggers that need to always be deferred, yet one
   cannot give direct PG access because of SET CONSTRAINTS .. IMMEDIATE.

   I have such triggers that must run at the end of the transaction
   (after the last statement prior to COMMIT sent by the client/user),
   which I make DEFERRABLE, INITIALLY DEFERRED CONSTRAINT TRIGGERs.

   I have written SQL code to detect that constraint triggers have fired
   too soon, but I'd rather not need it as it does slow things down (it
   uses DDL event triggers and per-table triggers).

   Making it easier to write secure DEFERRED CONSTRAINT TRIGGERs
   seems like a good idea to me.

 - Symmetry.

   Not using NOT DEFERRABLE is not the inverse of NOT DEFERRABLE.  There
   is no inverse at this time.

   If we can have NOT DEFERRABLE constraints, why not also the inverse,
   a constraint that cannot be made IMMEDIATE with SET CONSTRAINTs?


I've *not* cleaned up C style issues in the surrounding code -- I'm not
sure if that's desired.  Not cleaning up makes it easier to see what I
changed.

Some questions for experienced PostgreSQL developers:

Q0: Is this sort of patch welcomed?

Q1: Should new columns for pg_catalog tables go at the end, or may they
be added in the middle?

FYI, I'm adding them in the middle, so they are next to related
columns.

Q2: Can I add new columns to information_schema tables, or are there
standards-compliance issues with that?

This is done in the second patch, and it can be dropped safely.

Q3: Perhaps I should make this NOT IMMEDIATE rather than ALWAYS DEFERRED?
Making it NOT IMMEDIATE has the benefit of not having to change the
precedence of ALWAYS to avoid a shift/reduce conflict...  It may
also be more in keeping with NOT DEFERRED.  Thoughts?

Nico
-- 
From 1d04483511f99cd3417df571ecc0498e928ace35 Mon Sep 17 00:00:00 2001
From: Nicolas Williams 
Date: Tue, 3 Oct 2017 00:33:09 -0500
Subject: [PATCH 1/2] Add ALWAYS DEFERRED option for CONSTRAINTs

and CONSTRAINT TRIGGERs.

This is important so that one can have triggers and constraints that
must run after all of the user/client's statements in a transaction
(i.e., at COMMIT time), so that the user/client may make no further
changes (triggers, of course, still can).
---
 doc/src/sgml/catalogs.sgml | 17 -
 doc/src/sgml/ref/alter_table.sgml  |  4 +-
 doc/src/sgml/ref/create_table.sgml | 10 ++-
 doc/src/sgml/ref/create_trigger.sgml   |  2 +-
 doc/src/sgml/trigger.sgml  |  1 +
 src/backend/bootstrap/bootparse.y  |  2 +
 src/backend/catalog/heap.c |  1 +
 src/backend/catalog/index.c|  8 +++
 src/backend/catalog/information_schema.sql |  8 +++
 src/backend/catalog/pg_constraint.c|  2 +
 src/backend/catalog/toasting.c |  2 +-
 src/backend/commands/indexcmds.c   |  2 +-
 src/backend/commands/tablecmds.c   | 20 +-
 src/backend/commands/trigger.c | 28 +++--
 src/backend/commands/typecmds.c|  3 +
 src/backend/nodes/copyfuncs.c  |  3 +
 src/backend/nodes/outfuncs.c   |  4 ++
 src/backend/parser/gram.y  | 99 ++
 src/backend/parser/parse_utilcmd.c | 46 +-
 src/backend/utils/adt/ruleutils.c  |  4 ++
 src/bin/pg_dump/pg_dump.c  | 31 --
 src/bin/pg_dump/pg_dump.h  |  2 +
 src/bin/psql/describe.c| 34 +++---
 src/include/catalog/index.h|  2 +
 src/include/catalog/pg_constraint.h| 42 +++--
 src/include/catalog/pg_constraint_fn.h |  1 +
 src/include/catalog/pg_trigger.h   | 16 ++---
 src/include/commands/trigger.h |  1 +
 src/include/nodes/parsenodes.h |  6 +-
 src/include/utils/reltrigger.h |  1 +
 src/test/regress/input/constraints.source  | 51 +++
 src/test/regress/output/constraints.source | 54 +++-
 32 files changed, 416 insertions(+), 91 deletions(-)

diff --git a/doc/src/sgml/catalogs.sgml b/doc/src/sgml/catalogs.sgml
index 9af77c1..2c3ed23 100644
--- a/doc/src/sgml/catalogs.sgml
+++ b/doc/src/sgml/catalogs.sgml
@@ -2202,6 +2202,13 @@ SCRAM-SHA-256$iteration count:salt<
  
 
  
+  conalwaysdeferred
+  bool
+  
+  Is the constraint always deferred?
+ 
+
+ 
   convalidated
   bool
   
@@ -6948,6 +6955,13 @@ SCRAM-SHA-256$iteration count:salt<
  
 
  
+  tgalwaysdeferred
+  bool
+  
+  True 

[HACKERS] Re: [BUGS] Re: [PATCH] BUG #13416: Postgres >= 9.3 doesn't use optimized shared memory on Solaris anymore

2017-10-04 Thread Andres Freund
Hi,

On 2017-10-04 10:47:06 -0700, Sean Chittenden wrote:
> Hello.  We identified the same problem.  Sam Gwydir and Josh Clulow were able 
> to put together the appropriate fix after.
> 
> The breakage in src/backend/port/sysv_shmem.c and 
> src/include/storage/dsm_impl.h should be back ported to all supported 
> versions (the regression was introduced between the 9.2 and 9.3 branches).

Personally I don't think "breakage" is quite the right word.

I also don't like that we're unconditionally not using
USE_ANONYMOUS_SHMEM - doesn't that run into similar config limits on
Solaris-based systems as it does on Linux, etc.?

I think if we want to do this, we should rather go with a patch like
https://www.postgresql.org/message-id/20140422121921.gd4...@awork2.anarazel.de

Greetings,

Andres Freund




[HACKERS] Re: [PATCH] BUG #13416: Postgres >= 9.3 doesn't use optimized shared memory on Solaris anymore

2017-10-04 Thread Sean Chittenden
Hello.  We identified the same problem.  Sam Gwydir and Josh Clulow were
able to put together the appropriate fix afterward.

The breakage in src/backend/port/sysv_shmem.c and 
src/include/storage/dsm_impl.h should be back ported to all supported versions 
(the regression was introduced between the 9.2 and 9.3 branches).

The only real question remaining is: do we want to change the default behavior, 
as detected by initdb(1), to use ISM on Illumos/Solaris derived systems?  Given 
the memory bloat experienced when using POSIX shared memory (potentially very 
significant for systems with larger shared_buffers), we think it's probably 
prudent to change the default from dynamic_shared_memory_type=posix to sysv.  
Unfortunately I think it's not worth changing the default behavior of initdb(1) 
in the current supported branches, but it should be changed for 10.

Please let us know if there are any questions.  -sc


--
Sean Chittenden
se...@joyent.com



Re: [HACKERS] Horrible CREATE DATABASE Performance in High Sierra

2017-10-04 Thread Tom Lane
Brent Dearth  writes:
> Is there an issue tracker I could be looking at to follow along on the
> progress on this issue?

This email thread is pretty much it ...

Current status is that I've filed a bug report with Apple and am waiting
to see their response before deciding what to do next.  If they fix the
issue promptly then there's little need for us to do anything.

If you want to help move things along, it would be useful to run some
experiments and see if there's a way to ameliorate the problem short of
the brute-force answer of disabling copy_file()'s pg_flush_data() call.
One thing that occurs to me is that even a 64KB file copy buffer is pretty
small on any modern machine.  If we were to up that to, say, 1MB, does
that help any?  See COPY_BUF_SIZE in src/backend/storage/file/copydir.c.
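
For anyone running that experiment, the relevant loop in copy_file()
looks roughly like this (paraphrased from copydir.c, with error paths
and flush batching simplified; only the buffer size constant would
need to change):

    #define COPY_BUF_SIZE (8 * BLCKSZ)      /* 64KB today; try 1MB */

    buffer = palloc(COPY_BUF_SIZE);
    while ((nbytes = read(srcfd, buffer, COPY_BUF_SIZE)) > 0)
    {
        if (write(dstfd, buffer, nbytes) != nbytes)
            elog(ERROR, "could not write to file");
        /* The pg_flush_data() call implicated in this thread. */
        pg_flush_data(dstfd, offset, nbytes);
        offset += nbytes;
    }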

regards, tom lane




Re: [HACKERS] Horrible CREATE DATABASE Performance in High Sierra

2017-10-04 Thread Brent Dearth
Tom, Andres -

Is there an issue tracker I could be looking at to follow along on the
progress on this issue?

Thanks so much!

On Mon, Oct 2, 2017 at 9:06 PM, Tom Lane  wrote:

> Andres Freund  writes:
> > On 2017-10-02 19:50:51 -0400, Tom Lane wrote:
> >> What I saw was that the backend process was consuming 100% of (one) CPU,
> >> while the I/O transaction rate viewed by "iostat 1" started pretty low
> >> --- under 10% of what the machine is capable of --- and dropped from
> >> there as the copy proceeded.  I did not think to check if that was user
> >> or kernel-space CPU, but I imagine it has to be the latter.
>
> > So that's pretty clearly a kernel bug... Hm. I wonder if it's mmap() or
> > msync() that's the problem here. I guess you didn't run a profile?
>
> Interestingly, profiling with Activity Monitor seems to blame the problem
> entirely on munmap() ... which squares with the place I hit every time
> when randomly stopping the process with gdb^Hlldb, so I'm inclined to
> believe it.
>
> This still offers no insight as to why CREATE DATABASE is hitting the
> problem while regular flush activity doesn't.
>
> > One interesting thing here is that in the CREATE DATABASE case there'll
> > probably be a lot larger contiguous mappings than in *_flush_after
> > cases. So it might be related to the size of the mapping / flush "unit".
>
> Meh, the mapping is only 64K in this case vs. 8K in the other.  Hard
> to credit that it breaks that easily.
>
> regards, tom lane
>


Re: [HACKERS] [PATCH] Improve geometric types

2017-10-04 Thread Emre Hasegeli
> Now, it's also not clear that anything in PG really cares.  But if we
> do care, I think we should keep pg_hypot() ... and maybe clarify the
> comment a bit more.

I am not sure how useful NaNs are in the geometric types context, but we
allow them, so an inconsistent hypot() would be a problem.  I will change
my patches to keep pg_hypot().
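
For context, pg_hypot() computes sqrt(x*x + y*y) while avoiding
unnecessary overflow/underflow and keeping NaN/Inf behavior
platform-independent.  Roughly, paraphrased from
src/backend/utils/adt/float.c (the real code may differ in detail):

    double
    pg_hypot(double x, double y)
    {
        double      yx;

        /* hypot(inf, nan) is inf, so Inf must be tested first. */
        if (isinf(x) || isinf(y))
            return get_float8_infinity();
        if (isnan(x) || isnan(y))
            return get_float8_nan();

        x = fabs(x);
        y = fabs(y);
        if (x < y)
        {
            double      temp = x;

            x = y;
            y = temp;
        }
        if (x == 0.0)
            return 0.0;

        /* Scale by the larger input to keep y/x <= 1. */
        yx = y / x;
        return x * sqrt(1.0 + yx * yx);
    }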




Re: [HACKERS] Partition-wise join for join between (declaratively) partitioned tables

2017-10-04 Thread Robert Haas
On Wed, Oct 4, 2017 at 11:34 AM, Robert Haas  wrote:
> +Enables or disables the query planner's use of partition-wise join
> +plans. When enabled, it spends time in creating paths for joins 
> between
> +partitions and consumes memory to construct expression nodes to be 
> used
> +for those joins, even if partition-wise join does not result in the
> +cheapest path. The time and memory increase exponentially with the
> +number of partitioned tables being joined and they increase linearly
> +with the number of partitions. The default is off.
>
> I think this is too scary and too much technical detail.  I think you
> could just say something like: Enables or disables use of
> partition-wise join, which allows a join between partitioned tables to
> be performed by joining the matching partitions.  Partition-wise join
> currently applies only when the join conditions include all the
> columns of the partition keys, which must be of the same data type and
> have exactly matching sets of child partitions.  Because
> partition-wise join planning can use significantly increase CPU time
> and memory usage during planning, the default is off.

Not enough caffeine, obviously: should have been something like --
Because partition-wise join can significantly increase the CPU and
memory costs of planning...

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] [COMMITTERS] pgsql: Fix freezing of a dead HOT-updated tuple

2017-10-04 Thread Peter Geoghegan
On Tue, Oct 3, 2017 at 8:43 PM, Wong, Yi Wen  wrote:
> My interpretation of README.HOT is the check is just to ensure the chain is 
> continuous; in which case the condition should be:
>
>> if (TransactionIdIsValid(priorXmax) &&
>>     !TransactionIdEquals(priorXmax, HeapTupleHeaderGetRawXmin(htup)))
>>     break;
>
> So the difference is GetRawXmin vs GetXmin, because otherwise we get the 
> FreezeId instead of the Xmin when the transaction happened

I was thinking along similar lines.

> The interesting consequence of changing that is the prune seems to get the 
> entire chain altogether with Dan's repro... I've run it a couple of times and 
> have consistently gotten the following page
>
>  lp | t_ctid | lp_off | lp_flags | t_infomask | t_infomask2
> ----+--------+--------+----------+------------+-------------
>   1 | (0,1)  |   8152 |        1 |       2818 |           3
>   2 |        |      7 |        2 |            |
>   3 |        |      0 |        0 |            |
>   4 |        |      0 |        0 |            |
>   5 |        |      0 |        0 |            |
>   6 |        |      0 |        0 |            |
>   7 | (0,7)  |   8112 |        1 |      11010 |       32771
> (7 rows)

That's also what I see. This is a good thing, I think; that's how we
ought to prune.

-- 
Peter Geoghegan




Re: [HACKERS] Partition-wise join for join between (declaratively) partitioned tables

2017-10-04 Thread Robert Haas
On Tue, Oct 3, 2017 at 3:27 PM, Robert Haas  wrote:
> I decided to skip over 0001 for today and spend some time looking at
> 0002-0006.

Back to 0001.

+Enables or disables the query planner's use of partition-wise join
+plans. When enabled, it spends time in creating paths for joins between
+partitions and consumes memory to construct expression nodes to be used
+for those joins, even if partition-wise join does not result in the
+cheapest path. The time and memory increase exponentially with the
+number of partitioned tables being joined and they increase linearly
+with the number of partitions. The default is off.

I think this is too scary and too much technical detail.  I think you
could just say something like: Enables or disables use of
partition-wise join, which allows a join between partitioned tables to
be performed by joining the matching partitions.  Partition-wise join
currently applies only when the join conditions include all the
columns of the partition keys, which must be of the same data type and
have exactly matching sets of child partitions.  Because
partition-wise join planning can use significantly increase CPU time
and memory usage during planning, the default is off.

+partitioned table. The join partners can not be found in other partitions. This
+condition allows the join between partitioned tables to be broken into joins
+between the matching partitions. The resultant join is partitioned in the same

"The join partners can not be found in other partitions." is redundant
with the previous sentence.  I suggest deleting it.  I also suggest
"This condition allows the join between partitioned tables to be
broken" -> "Because of this, the join between partitioned tables can
be broken".

+relation" for both partitioned table as well as join between partitioned tables
+which can use partition-wise join technique.

for either a partitioned table or a join between compatibly partitioned tables

+Partitioning properties of a partitioned relation are stored in
+PartitionSchemeData structure. Planner maintains a list of canonical partition
+schemes (distinct PartitionSchemeData objects) so that any two partitioned
+relations with same partitioning scheme share the same PartitionSchemeData
+object. This reduces memory consumed by PartitionSchemeData objects and makes
+it easy to compare the partition schemes of joining relations.

Not all of the partitioning properties are stored in the
PartitionSchemeData structure any more.  I think this needs some
rethinking and maybe some expansion.  As written, each of the first
two sentences needs a "the" at the beginning.

+   /*
+* Create "append" paths for
partitioned joins. Do this before
+* creating GatherPaths so that
partial "append" paths in
+* partitioned joins will be considered.
+*/

I think you could shorten this to a single-line comment and just keep
the first sentence.  Similarly in the other location where you have
the same sort of thing.

+ * child-joins. Otherwise, add_path might delete a path that some "append"
+ * path has reference to.

to which some path generated here has a reference.

Here and elsewhere, you use "append" rather than Append to refer to
the paths added.  I suppose that's weasel-wording to work around the
fact that they might be either Append or MergeAppend paths, but I'm
not sure it's really going to convey that to anyone.  I suggest
rephrasing those comments more generically, e.g.:

+   /* Add "append" paths containing paths from child-joins. */

You could say: Build additional paths for this rel from child-join paths.

Or something.

+   if (!REL_HAS_ALL_PART_PROPS(rel))
+   return;

Isn't this an unnecessarily expensive test?  I mean, it shouldn't be
possible for it to have some arbitrary subset.

+   /*
+* Every pair of joining relations we see here should have an equi-join
+* between partition keys if this join has been deemed as a partitioned
+* join. See build_joinrel_partition_info() for reasons.
+*/
+   Assert(have_partkey_equi_join(rel1, rel2, parent_sjinfo->jointype,
+
parent_restrictlist));

I suggest removing this assertion.  Seems like overkill to me.

+   child_sjinfo = build_child_join_sjinfo(root, parent_sjinfo,
+
child_rel1->relids,
+
child_rel2->relids);

It seems like we might end up doing this multiple times for the same
child join, if there are more than 2 tables involved.  Not sure if
there's a good way to avoid that.  Similarly for child_restrictlist.

+   pk_has_clause = (bool *) palloc0(sizeof(bool) * num_pks);

Just do bool pk_has_clause[PARTITION_MAX_KEYS] instead.  Stack
allocation is a lot faster, and then you don't need to pfree it.
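
For concreteness, a sketch of the suggested change (illustrative only;
num_pks and pk_has_clause are the names from the patch under review):

/* Before: heap allocation that the caller must pfree */
bool	   *pk_has_clause = (bool *) palloc0(sizeof(bool) * num_pks);

/* After: fixed-size stack array; PARTITION_MAX_KEYS bounds num_pks */
bool		pk_has_clause[PARTITION_MAX_KEYS];

memset(pk_has_clause, 0, sizeof(pk_has_clause));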


Re: [HACKERS] Partition-wise join for join between (declaratively) partitioned tables

2017-10-04 Thread Robert Haas
On Thu, Sep 21, 2017 at 8:52 AM, Robert Haas  wrote:
> On Thu, Sep 21, 2017 at 8:21 AM, Ashutosh Bapat
>  wrote:
>> About your earlier comment of making build_joinrel_partition_info()
>> simpler. Right now, the code assumes that partexprs or
>> nullable_partexpr can be NULL when either of them is not populated.
>> That may be saves a sizeof(pointer) * (number of keys) byes of memory.
>> Saving that much memory may not be worth the complexity of code. So,
>> we may always allocate memory for those arrays and fill it with NIL
>> values when there are no key expressions to populate those. That will
>> simplify the code. I haven't done that change in this patchset. I was
>> busy debugging the Q7 regression. Let me know your comments about
>> that.
>
> Hmm, I'm not sure that's the best approach, but let me look at it more
> carefully before I express a firm opinion.

Having studied this a bit more, I now think your proposed approach is
a good idea.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] datetime.h defines like PM conflict with external libraries

2017-10-04 Thread Michael Meskes
> Isn't pgtypeslib/*.h exposed to ecpg-using applications?

No, the public interface is include/*.h; pgtypeslib/*.h is only
internal.

Michael
-- 
Michael Meskes
Michael at Fam-Meskes dot De, Michael at Meskes dot (De|Com|Net|Org)
Meskes at (Debian|Postgresql) dot Org
Jabber: michael at xmpp dot meskes dot org
VfL Borussia! Força Barça! SF 49ers! Use Debian GNU/Linux, PostgreSQL




Re: [HACKERS] 64-bit queryId?

2017-10-04 Thread Michael Paquier
On Wed, Oct 4, 2017 at 11:04 PM, Robert Haas  wrote:
> Not really; dynahash won't merge two keys just because their hash
> codes come out the same.  But you're right; that's probably not the
> best way to do it.   TBH, why do we even have pgss_hash_fn?  It seems
> like using tag_hash would be superior.

Yes, using tag_hash would be just better than any custom formula.
-- 
Michael




Re: [HACKERS] datetime.h defines like PM conflict with external libraries

2017-10-04 Thread Tom Lane
Michael Meskes  writes:
>> Maybe it'd be good idea to unify some of that stuff so that ecpg can
>> use it, too?  Having a second copy of the same stuff in
>> src/interfaces/ecpg/pgtypeslib/dt.h is pretty terrible.  Even if not,
>> let's make sure they don't diverge.

> Please let's unify whatever we can. The fewer manual sync we need, the
> better.

Isn't pgtypeslib/*.h exposed to ecpg-using applications?

regards, tom lane




Re: [HACKERS] 64-bit queryId?

2017-10-04 Thread Robert Haas
On Wed, Oct 4, 2017 at 9:49 AM, Michael Paquier
 wrote:
> I am still on the learning curve with pg_stat_statements... This still
> does not look complete to me. pgss_hash_fn only makes use of the last
> four bytes of the query ID. What about computing the hash using the
> first four bytes as well? With the current code, if the last four
> bytes of two query IDs match, then the queries would be counted
> together, looking at pgss_store().

Not really; dynahash won't merge two keys just because their hash
codes come out the same.  But you're right; that's probably not the
best way to do it.   TBH, why do we even have pgss_hash_fn?  It seems
like using tag_hash would be superior.
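
For illustration, a sketch of what dropping the custom hash function
might look like (based on the standard dynahash API; pgssHashKey and
pgssEntry are pg_stat_statements' existing types, the rest is an
assumption, not the committed code):

HASHCTL		info;

memset(&info, 0, sizeof(info));
info.keysize = sizeof(pgssHashKey);	/* covers the full 64-bit queryid */
info.entrysize = sizeof(pgssEntry);

/* HASH_BLOBS tells dynahash to apply tag_hash to the whole key */
pgss_hash = ShmemInitHash("pg_stat_statements hash",
						  pgss_max, pgss_max,
						  &info,
						  HASH_ELEM | HASH_BLOBS);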

> I have spotted as well this comment in pg_stat_statements.c:
> /* Increment the counts, except when jstate is not NULL */
> if (!jstate)
> I think that this should be "when jstate is NULL".

I think that you're right, but that's unrelated to this patch.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] 64-bit queryId?

2017-10-04 Thread Michael Paquier
On Wed, Oct 4, 2017 at 9:08 PM, Robert Haas  wrote:
> On Tue, Oct 3, 2017 at 9:39 PM, Michael Paquier
>  wrote:
>>> I'm sorry, but I don't understand this comment.
>>
>> I just mean that your patch is correct here. I don't always complain :)
>
> Oh, OK.  I'm all right with my patch being correct.
>
> Here's a new version that hopefully fixes the things that you noticed
> were incorrect.

I am still on the learning curve with pg_stat_statements... This still
does not look complete to me. pgss_hash_fn only makes use of the last
four bytes of the query ID. What about computing the hash using the
first four bytes as well? With the current code, if the last four
bytes of two query IDs match, then the queries would be counted
together, looking at pgss_store().

I have spotted as well this comment in pg_stat_statements.c:
/* Increment the counts, except when jstate is not NULL */
if (!jstate)
I think that this should be "when jstate is NULL".
-- 
Michael




Re: [HACKERS] [COMMITTERS] pgsql: Fix freezing of a dead HOT-updated tuple

2017-10-04 Thread Alvaro Herrera
Wong, Yi Wen wrote:
> My interpretation of README.HOT is the check is just to ensure the chain is 
> continuous; in which case the condition should be:
> 
> > if (TransactionIdIsValid(priorXmax) &&
> > !TransactionIdEquals(priorXmax, 
> > HeapTupleHeaderGetRawXmin(htup)))
> > break;
> 
> So the difference is GetRawXmin vs GetXmin, because otherwise we get the 
> FreezeId instead of the Xmin when the transaction happened

I independently arrived at the same conclusion.  Since I was trying with
9.3, the patch differs -- in the old version we must explicitely test
for the FrozenTransactionId value, instead of using GetRawXmin.
Attached is the patch I'm using, and my own oneliner test (pretty much
the same I posted earlier) seems to survive dozens of iterations without
showing any problem in REINDEX.

This patch is incomplete, since I think there are other places that need
to be patched in the same way (EvalPlanQualFetch? heap_get_latest_tid?).
Of course, for 9.4 and onwards we need to patch like you described.


This bit in EvalPlanQualFetch caught my attention ... why is it saying
xmin never changes?  It does change with freezing.

		/*
		 * If xmin isn't what we're expecting, the slot must have been
		 * recycled and reused for an unrelated tuple.  This implies that
		 * the latest version of the row was deleted, so we need do
		 * nothing.  (Should be safe to examine xmin without getting
		 * buffer's content lock, since xmin never changes in an existing
		 * tuple.)
		 */
		if (!TransactionIdEquals(HeapTupleHeaderGetXmin(tuple.t_data),
								 priorXmax))


-- 
Álvaro Herrerahttps://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
diff --git a/src/backend/access/heap/heapam.c b/src/backend/access/heap/heapam.c
index e00dc6c1ca..e68746fc3b 100644
--- a/src/backend/access/heap/heapam.c
+++ b/src/backend/access/heap/heapam.c
@@ -5500,7 +5500,10 @@ FreezeMultiXactId(MultiXactId multi, uint16 t_infomask,
 			if (TransactionIdPrecedes(xid, cutoff_xid))
 			{
 				if (TransactionIdDidCommit(xid))
+				{
+					xid = FrozenTransactionId;
 					*flags = FRM_MARK_COMMITTED | FRM_RETURN_IS_XID;
+				}
 				else
 				{
 					*flags |= FRM_INVALIDATE_XMAX;
diff --git a/src/backend/access/heap/pruneheap.c b/src/backend/access/heap/pruneheap.c
index 9a8db74cb9..561acd144a 100644
--- a/src/backend/access/heap/pruneheap.c
+++ b/src/backend/access/heap/pruneheap.c
@@ -435,7 +435,8 @@ heap_prune_chain(Relation relation, Buffer buffer, OffsetNumber rootoffnum,
 	 * Check the tuple XMIN against prior XMAX, if any
 	 */
 	if (TransactionIdIsValid(priorXmax) &&
-		!TransactionIdEquals(HeapTupleHeaderGetXmin(htup), priorXmax))
+		!TransactionIdEquals(HeapTupleHeaderGetXmin(htup), priorXmax) &&
+		HeapTupleHeaderGetXmin(htup) != FrozenTransactionId)
 		break;
 
 		/*



Re: [HACKERS] Possible SSL improvements for a newcomer to tackle

2017-10-04 Thread Zeus Kronion
On Tue, Oct 3, 2017 at 11:39 AM, Nico Williams 
wrote:

> On Tue, Oct 03, 2017 at 12:33:00AM -0400, Tom Lane wrote:
> > So to default to verification would be to default to failing to
> > connect at all until user has created a ~/.postgresql/root.crt file with
> > valid, relevant entries.  That seems like a nonstarter.
> >
> > It's possible that we could adopt some policy like "if the root.crt file
> > exists then default to verify" ... but that seems messy and unreliable,
> > so I'm not sure it would really add any security.
>
> Still, it would be safer to refuse to connect until the lack of trust
> anchors is rectified than to connect without warning about the inability
> to verify a server.  By forcing the user (admins) to take action to
> remediate the problem, the problem then gets fixed, whereas plowing on
> creates an invisible (for many users) security problem.


I agree with Nico. If the server certificate can't be validated, the client
should fail to connect unless specifically opting out of MITM protection.
Why not change DefaultSSLMode from "prefer," even if it isn't backwards
compatible? Is there a policy for deprecating default settings?
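
For reference, the opt-in being discussed is spelled with existing libpq
keywords (the host name and paths here are placeholders):

# current default: encrypt when possible, but no server identity check
psql "host=db.example.com sslmode=prefer"

# verify the server certificate and host name against a trusted root
psql "host=db.example.com sslmode=verify-full sslrootcert=$HOME/.postgresql/root.crt"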


Re: [HACKERS] PoC: full merge join on comparison clause

2017-10-04 Thread Alexander Kuzmenkov
As discussed earlier, I changed the way we work with mergeopfamilies. I 
use the "is_equality" flag to indicate whether the clause is an equality 
one, and fill mergeopfamilies for both equality and inequality operators.

The updated patch is attached (rebased to 20b6552242).

--
Alexander Kuzmenkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company

diff --git a/src/backend/executor/nodeMergejoin.c b/src/backend/executor/nodeMergejoin.c
index 925b4cf553..73e6a4ca74 100644
--- a/src/backend/executor/nodeMergejoin.c
+++ b/src/backend/executor/nodeMergejoin.c
@@ -172,31 +172,32 @@ typedef enum
  * to the opfamily and collation, with nulls at the indicated end of the range.
  * This allows us to obtain the needed comparison function from the opfamily.
  */
-static MergeJoinClause
+static void
 MJExamineQuals(List *mergeclauses,
 			   Oid *mergefamilies,
 			   Oid *mergecollations,
 			   int *mergestrategies,
 			   bool *mergenullsfirst,
-			   PlanState *parent)
+			   MergeJoinState *parent)
 {
-	MergeJoinClause clauses;
 	int			nClauses = list_length(mergeclauses);
 	int			iClause;
 	ListCell   *cl;
 
-	clauses = (MergeJoinClause) palloc0(nClauses * sizeof(MergeJoinClauseData));
+	parent->mj_Clauses = (MergeJoinClause) palloc0(nClauses * sizeof(MergeJoinClauseData));
+	parent->mj_UseEqual = (bool *) palloc0(nClauses * sizeof(bool));
+	parent->mj_UseLesser = (bool *) palloc0(nClauses * sizeof(bool));
 
 	iClause = 0;
 	foreach(cl, mergeclauses)
 	{
 		OpExpr	   *qual = (OpExpr *) lfirst(cl);
-		MergeJoinClause clause = &clauses[iClause];
+		MergeJoinClause clause = &parent->mj_Clauses[iClause];
 		Oid			opfamily = mergefamilies[iClause];
 		Oid			collation = mergecollations[iClause];
-		StrategyNumber opstrategy = mergestrategies[iClause];
+		StrategyNumber sort_op_strategy = mergestrategies[iClause];
 		bool		nulls_first = mergenullsfirst[iClause];
-		int			op_strategy;
+		int			join_op_strategy;
 		Oid			op_lefttype;
 		Oid			op_righttype;
 		Oid			sortfunc;
@@ -207,28 +208,55 @@ MJExamineQuals(List *mergeclauses,
 		/*
 		 * Prepare the input expressions for execution.
 		 */
-		clause->lexpr = ExecInitExpr((Expr *) linitial(qual->args), parent);
-		clause->rexpr = ExecInitExpr((Expr *) lsecond(qual->args), parent);
+		clause->lexpr = ExecInitExpr((Expr *) linitial(qual->args), (PlanState *) parent);
+		clause->rexpr = ExecInitExpr((Expr *) lsecond(qual->args), (PlanState *) parent);
 
 		/* Set up sort support data */
 		clause->ssup.ssup_cxt = CurrentMemoryContext;
 		clause->ssup.ssup_collation = collation;
-		if (opstrategy == BTLessStrategyNumber)
+		if (sort_op_strategy == BTLessStrategyNumber)
 			clause->ssup.ssup_reverse = false;
-		else if (opstrategy == BTGreaterStrategyNumber)
+		else if (sort_op_strategy == BTGreaterStrategyNumber)
 			clause->ssup.ssup_reverse = true;
 		else	/* planner screwed up */
-			elog(ERROR, "unsupported mergejoin strategy %d", opstrategy);
+			elog(ERROR, "unsupported mergejoin strategy %d", sort_op_strategy);
 		clause->ssup.ssup_nulls_first = nulls_first;
 
 		/* Extract the operator's declared left/right datatypes */
 		get_op_opfamily_properties(qual->opno, opfamily, false,
-								   &op_strategy,
+								   &join_op_strategy,
 								   &op_lefttype,
 								   &op_righttype);
-		if (op_strategy != BTEqualStrategyNumber)	/* should not happen */
-			elog(ERROR, "cannot merge using non-equality operator %u",
-				 qual->opno);
+
+		/*
+		 * Determine whether we accept lesser and/or equal tuples of the inner
+		 * relation.
+		 */
+		switch (join_op_strategy)
+		{
+			case BTEqualStrategyNumber:
+				parent->mj_UseEqual[iClause] = true;
+				break;
+
+			case BTLessEqualStrategyNumber:
+				parent->mj_UseEqual[iClause] = true;
+				/* fall through */
+
+			case BTLessStrategyNumber:
+				parent->mj_UseLesser[iClause] = true;
+				break;
+
+			case BTGreaterEqualStrategyNumber:
+				parent->mj_UseEqual[iClause] = true;
+				/* fall through */
+
+			case BTGreaterStrategyNumber:
+				parent->mj_UseLesser[iClause] = true;
+				break;
+
+			default:
+				elog(ERROR, "unsupported join strategy %d", join_op_strategy);
+		}
 
 		/*
 		 * sortsupport routine must know if abbreviation optimization is
@@ -265,8 +293,6 @@ MJExamineQuals(List *mergeclauses,
 
 		iClause++;
 	}
-
-	return clauses;
 }
 
 /*
@@ -378,6 +404,14 @@ MJEvalInnerValues(MergeJoinState *mergestate, TupleTableSlot *innerslot)
 	return result;
 }
 
+/* Tuple comparison result */
+typedef enum
+{
+	MJCR_NextInner = 1,
+	MJCR_NextOuter = -1,
+	MJCR_Join = 0
+} MJCompareResult;
+
 /*
  * MJCompare
  *
@@ -388,10 +422,10 @@ MJEvalInnerValues(MergeJoinState *mergestate, TupleTableSlot *innerslot)
  * MJEvalOuterValues and MJEvalInnerValues must already have been called
  * for the current outer and inner tuples, respectively.
  */
-static int
+static MJCompareResult
 MJCompare(MergeJoinState *mergestate)
 {
-	int			result = 0;
+	MJCompareResult result = MJCR_Join;
 	bool		nulleqnull = false;
 	

Re: [HACKERS] Another oddity in handling of WCO constraints in postgres_fdw

2017-10-04 Thread Ashutosh Bapat
On Wed, Oct 4, 2017 at 5:32 PM, Robert Haas  wrote:
> On Wed, Oct 4, 2017 at 6:40 AM, Ashutosh Bapat
>  wrote:
>> Just like the local constraints on a foreign table are not ensured on
>> remote table (unless user takes steps to make that sure), WCO defined
>> locally need not be (and probably can not be) ensured remotely. We can
>> check whether a row being sent from the local server to the foreign
>> server obeys WCO, but what foreign server does to that row is beyond
>> local server's scope.
>
> But I think right now we're not checking the row being sent from the
> local server, either.

Didn't 7086be6e3627c1ad797e32ebbdd232905b5f577f fix that?

> The WCO that is being ignored isn't a
> constraint on the foreign table; it's a constraint on a view which
> happens to reference the foreign table.  It seems quite odd for the
> "assume constraints are valid" property of the foreign table to
> propagate back up into the view that references it.
>

The view with WCO is local, but the modification which violates the WCO
is being made on the remote server by a trigger on the remote table.
Trying to control that doesn't seem to be a good idea, just like we can't
control what rows get inserted on the foreign server when they violate
local constraints. I am using local constraints as an example of
precedent, where we ignore what's happening on the remote side and enforce
whatever we can enforce locally. The local server should make sure that
any rows sent from the local server to the remote server do not violate
any local WCO. But once a row is handed over to the foreign server, we
shouldn't worry about what happens there. That behaviour is ensured by
the above commit, isn't it?  I am not suggesting that we use local
constraints to enforce WCO or something like that.

-- 
Best Wishes,
Ashutosh Bapat
EnterpriseDB Corporation
The Postgres Database Company




Re: [HACKERS] Partition-wise join for join between (declaratively) partitioned tables

2017-10-04 Thread Ashutosh Bapat
On Wed, Oct 4, 2017 at 12:57 AM, Robert Haas  wrote:
>
> 0003:
>
> The commit message mentions estimate_num_groups but the patch doesn't touch 
> it.

This was fixed when we converted many rel->reloptkind ==
RELOPT_BASEREL to IS_SIMPLE_REL(). I have removed this section from
the commit message.

>
> I am concerned that this patch might introduce some problem fixed by
> commit dd4134ea56cb8855aad3988febc45eca28851cd8.  The comment in that
> patch say, at one place, that "This protects against possible
> incorrect matches to child expressions that contain no Vars."
> However, if a child expression has no Vars, then I think em->em_relids
> will be empty, so the bms_is_equal() test that is there now will fail
> but your proposed bms_is_subset() test will pass.

bms_is_equal() was enough when there was only a single member in
relids, but it doesn't work now that there can be multiple of them.
bms_is_equal() was replaced with bms_is_subset() to accommodate
ec_members with only a subset of relids when we are searching for a
join relation.
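
For clarity, a sketch of the test in question (simplified; em and relids
as in the equivalence-class matching code, not the exact source):

if (em->em_is_child &&
	!bms_is_subset(em->em_relids, relids))	/* was: !bms_is_equal(...) */
	continue;		/* member doesn't belong to the given relation */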

I am not sure whether your assumption that an expression with no Vars
would have empty em_relids is correct. I wonder whether we would add
any em_is_child members with empty em_relids; looking at
process_equivalence(), those come from RestrictInfo::left/right_relids,
which just indicate the relids at which that particular expression
can be evaluated. Placeholder vars are an example of when that can
happen, but there may be others. To verify this, I tried the attached
patch on master and ran make check. The assertion didn't trip. If
em_relids is not NULL, bms_is_subset() is fine.

If em_relids could indeed go NULL when em_is_child is true, passing
NULL relids (for parent rels) to that function could cause unwanted
behaviour. bms_is_equal(em->em_relids, relids) would return true,
turning the if (em->em_is_child && !bms_is_equal(em->em_relids,
relids)) to false. This means that we would consider a child member
with NULL em_relids even while matching a parent relation. What
surprises me is that that commit added a bunch of testcases and none
of them failed with this change.

Nonetheless, I have changed "matches" to "belongs to" in the
prologue of those functions, since an exact match won't be possible
with child-joins.

>
> 0004:
>
> I suggest renaming get_wholerow_ref_from_convert_row_type to
> is_converted_whole_row_reference and making it return a bool.

Done.

>
> The coding of that function is a little strange; why not move Var to
> an inner scope?  Like this: if (IsA(convexpr->arg, var)) { Var *var =
> castNode(Var, convexpr->arg; if (var->varattno == 0) return var; }

I probably went too far to avoid indented code :). Fixed now.

>
> Will the statement that "In case of multi-level partitioning, we will
> have as many nested ConvertRowtypeExpr as there are levels in
> partition hierarchy" be falsified by Amit Khandekar's pending patch to
> avoid sticking a ConvertRowTypeExpr on top of another
> ConvertRowTypeExpr?  Even if the answer is "no", I think it might be
> better to drop this part of the comment; it would be easy for it to
> become false in the future, because we might want to optimize that
> case in the future and we'll probably forget to update this comment
> when we do.

That might leave someone wondering where the nested
ConvertRowtypeExprs came from. But maybe in the future those could
arise from something other than a multi-level partition hierarchy, and
in that case too the comment would be rendered inaccurate. So done.

>
> In fix_upper_expr_mutator(), you have an if statement whose entire
> contents are another if statement.  I think you should use && instead,
> and maybe reverse the order of the tests, since
> context->subplan_itlist->has_conv_whole_rows is probably cheaper to
> test than a function call.  It's also a little strange that this code
> isn't adjacent too, or merged with, the existing has_non_vars case.
> Maybe:
>
> converted_whole_row = is_converted_whole_row_reference(node);
> if (context->outer_itlist && (context->outer_itlist->has_non_vars ||
> (context->outer_itlist->has_conv_whole_rows && converted_whole_row))
> ...
> if (context->inner_itlist && (context->inner_itlist->has_non_vars ||
> (context->inner_itlist->has_conv_whole_rows && converted_whole_row))

I placed it with the other node types since it's for a specific node
type, but I guess your suggestion avoids duplicates and looks better.
Done.

> ...
>
> 0005:
>
> The comment explaining why the ParamPathInfo is allocated in the same
> context as the RelOptInfo is a modified copy of an existing comment
> that still reads like the original, a manner of commenting I find a
> bit undesirable as it leads to filling up the source base with
> duplicate comments.

I have pointed to mark_dummy_rel() in that comment instead of
duplicating the whole paragraph.

>
> I don't think I believe that comment, either.  In the case from which
> that comment was copied (mark_dummy_rel), it was talking about 

Re: [HACKERS] [PATCH] WIP Add ALWAYS DEFERRED option for constraints

2017-10-04 Thread Andreas Joseph Krogh
On Wednesday, October 4, 2017 at 00:24:19, Vik Fearing
<vik.fear...@2ndquadrant.com> wrote:
On 10/03/2017 10:10 PM, Andreas Joseph Krogh wrote:
 > While we're in deferrable constraints land...;
 > I even more often need deferrable /conditional /unique-indexes.
 > In PG you now may have:
 >
 > ALTER TABLE email_folder ADD CONSTRAINT some_uk UNIQUE (owner_id, 
folder_type, name) DEFERRABLE INITIALLY DEFERRED;
 >
 > 
 > But this isn't supported:
 >
 > CREATE UNIQUE INDEX some_uk ON email_folder(owner_id, folder_type, name) 
WHERE parent_id IS NULL DEFERRABLE INITIALLY DEFERRED;
 >
 > Are there any plans to support this?

 I don't want to hijack the thread, but you can do that with exclusion
 constraints.
 
 
True.
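
For reference, a sketch of that workaround against the email_folder
example above (a partial exclusion constraint, unlike a partial unique
index, can be declared deferrable):

ALTER TABLE email_folder ADD CONSTRAINT some_uk
    EXCLUDE USING btree (owner_id WITH =, folder_type WITH =, name WITH =)
    WHERE (parent_id IS NULL)
    DEFERRABLE INITIALLY DEFERRED;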
 
--
 Andreas Joseph Krogh
 


Re: [HACKERS] 64-bit queryId?

2017-10-04 Thread Robert Haas
On Tue, Oct 3, 2017 at 9:39 PM, Michael Paquier
 wrote:
>> I'm sorry, but I don't understand this comment.
>
> I just mean that your patch is correct here. I don't always complain :)

Oh, OK.  I'm all right with my patch being correct.

Here's a new version that hopefully fixes the things that you noticed
were incorrect.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


64-bit-queryid-v3.patch



Re: [HACKERS] [COMMITTERS] pgsql: Fix freezing of a dead HOT-updated tuple

2017-10-04 Thread Alvaro Herrera
Alvaro Herrera wrote:
> Peter Geoghegan wrote:
> 
> > I thought that we no longer store FrozenTransactionId (xid 2) as our
> > "raw" xmin while freezing, and yet that's what we see here.
> 
> I'm doing this in 9.3.  I can't tell if the new tuple freezing stuff
> broke things more thoroughly, but it is already broken in earlier
> releases.

In fact, I think in 9.3 we should include this patch, to set the Xmin to
FrozenXid.  9.4 and onwards have commit 37484ad2a "Change the way we
mark tuples as frozen" which uses a combination of infomask bits, but in
9.3 I think leaving the unfrozen value in the xmax field is a bad idea
even if we set the HEAP_XMAX_COMMITTED bit.

-- 
Álvaro Herrerahttps://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
diff --git a/src/backend/access/heap/heapam.c b/src/backend/access/heap/heapam.c
index e00dc6c1ca..e68746fc3b 100644
--- a/src/backend/access/heap/heapam.c
+++ b/src/backend/access/heap/heapam.c
@@ -5500,7 +5500,10 @@ FreezeMultiXactId(MultiXactId multi, uint16 t_infomask,
 			if (TransactionIdPrecedes(xid, cutoff_xid))
 			{
 				if (TransactionIdDidCommit(xid))
+				{
+					xid = FrozenTransactionId;
 					*flags = FRM_MARK_COMMITTED | FRM_RETURN_IS_XID;
+				}
 				else
 				{
 					*flags |= FRM_INVALIDATE_XMAX;



Re: [HACKERS] Another oddity in handling of WCO constraints in postgres_fdw

2017-10-04 Thread Robert Haas
On Wed, Oct 4, 2017 at 6:40 AM, Ashutosh Bapat
 wrote:
> Just like the local constraints on a foreign table are not ensured on
> remote table (unless user takes steps to make that sure), WCO defined
> locally need not be (and probably can not be) ensured remotely. We can
> check whether a row being sent from the local server to the foreign
> server obeys WCO, but what foreign server does to that row is beyond
> local server's scope.

But I think right now we're not checking the row being sent from the
local server, either.  The WCO that is being ignored isn't a
constraint on the foreign table; it's a constraint on a view which
happens to reference the foreign table.  It seems quite odd for the
"assume constraints are valid" property of the foreign table to
propagate back up into the view that references it.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] Warnings in objectaddress.c

2017-10-04 Thread Robert Haas
On Tue, Oct 3, 2017 at 9:41 PM, Tom Lane  wrote:
> Robert Haas  writes:
>> Perhaps we should apply some glorified version of this:
>
>> +if (list_length(object) < 2)
>> +elog(ERROR, "fail");
>
>> However, I'm not 100% sure that would be sufficient to suppress these
>> warnings, because the compiler has got to be smart enough to know that
>> elog() doesn't return and that i >= 2 is sufficient to guarantee that
>> everything is initialized.
>
> I'm betting it wouldn't help.  I was considering something along the line
> of unrolling the loop:
>
> Assert(list_length(object) == 2);
>
> assign typenames[0] and typeoids[0] from linitial(object)
>
> assign typenames[1] and typeoids[1] from lsecond(object)
>
> This would involve duplicating the loop body, but that's only 3 lines,
> so I think it wouldn't even net out as more code.

Yeah, that's an idea, too.
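
For concreteness, a sketch of the unrolled form (identifiers are
illustrative, following the quoted outline rather than the actual
objectaddress.c code):

Assert(list_length(object) == 2);

typenames[0] = linitial_node(TypeName, object);
typeoids[0] = LookupTypeNameOid(NULL, typenames[0], missing_ok);

typenames[1] = lsecond_node(TypeName, object);
typeoids[1] = LookupTypeNameOid(NULL, typenames[1], missing_ok);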

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] Re: protocol version negotiation (Re: Libpq PGRES_COPY_BOTH - version compatibility)

2017-10-04 Thread Robert Haas
On Tue, Oct 3, 2017 at 9:46 PM, Tom Lane  wrote:
> Badrul Chowdhury  writes:
>> 1. Pgwire protocol v3.0 with negotiation is called v3.1.
>> 2. There are 2 patches for the change: a BE-specific patch that will be 
>> backported and a FE-specific patch that is only for pg10 and above.
>
> TBH, anything that presupposes a backported change in the backend is
> broken by definition.  We expect libpq to be able to connect to older
> servers, and that has to include servers that didn't get this memo.
>
> It would be all right for libpq to make a second connection attempt
> if its first one fails, as we did in the 2.0 -> 3.0 change.

Hmm, that's another approach, but I prefer the one advocated by Tom Lane.

https://www.postgresql.org/message-id/30788.1498672...@sss.pgh.pa.us
https://www.postgresql.org/message-id/24357.1498703265%40sss.pgh.pa.us

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] Another oddity in handling of WCO constraints in postgres_fdw

2017-10-04 Thread Ashutosh Bapat
On Wed, Oct 4, 2017 at 3:45 PM, Etsuro Fujita
 wrote:
> On 2017/10/03 18:16, Ashutosh Bapat wrote:
>>
>> Enforcing WCO constraints imposed by the local server on the row/DML
>> being passed to the foreign server is fine, but trying to impose them
>> on the row being inserted/updated at the foreign server looks odd. May
>> be we should just leave this case as it is. I am comparing this case
>> with the way we handle constraints on a foreign table.
>
>
> Hmm, I think that would be okay in the case where WCO constraints match
> constraints on the foreign table, but I'm not sure that would be okay even
> in the case where WCO constraints don't match?  Consider:
>
> create table bt (a int check (a % 2 = 0));
> create foreign table ft (a int check (a % 2 = 0)) server loopback options
> (table_name 'bt');
> create view rw_view_2 as select * from ft where a % 2 = 0 with check option;
>
> In that case the WCO constraint matches the constraint on the foreign table,
> so there would be no need to ensure the WCO constraint locally (to make the
> explanation simple, we assume here that we don't have triggers on the remote
> end).  BUT: for another auto-updatable view defined using the same foreign
> table like this:
>
> create view rw_view_4 as select * from ft where a % 4 = 0 with check option;
>
> how is the WCO constraint (ie, a % 4 = 0) ensured remotely, which is
> different from the constraint on the foreign table (ie, a % 2 = 0)? Maybe
> I'm missing something, though.

Just like the local constraints on a foreign table are not ensured on
remote table (unless user takes steps to make that sure), WCO defined
locally need not be (and probably can not be) ensured remotely. We can
check whether a row being sent from the local server to the foreign
server obeys WCO, but what foreign server does to that row is beyond
local server's scope.

-- 
Best Wishes,
Ashutosh Bapat
EnterpriseDB Corporation
The Postgres Database Company




Re: [HACKERS] Another oddity in handling of WCO constraints in postgres_fdw

2017-10-04 Thread Etsuro Fujita

On 2017/10/03 18:16, Ashutosh Bapat wrote:

Enforcing WCO constraints imposed by the local server on the row/DML
being passed to the foreign server is fine, but trying to impose them
on the row being inserted/updated at the foreign server looks odd. May
be we should just leave this case as it is. I am comparing this case
with the way we handle constraints on a foreign table.


Hmm, I think that would be okay in the case where WCO constraints match 
constraints on the foreign table, but I'm not sure that would be okay 
even in the case where WCO constraints don't match?  Consider:


create table bt (a int check (a % 2 = 0));
create foreign table ft (a int check (a % 2 = 0)) server loopback 
options (table_name 'bt');

create view rw_view_2 as select * from ft where a % 2 = 0 with check option;

In that case the WCO constraint matches the constraint on the foreign 
table, so there would be no need to ensure the WCO constraint locally 
(to make the explanation simple, we assume here that we don't have 
triggers on the remote end).  BUT: for another auto-updatable view 
defined using the same foreign table like this:


create view rw_view_4 as select * from ft where a % 4 = 0 with check option;

how is the WCO constraint (ie, a % 4 = 0) ensured remotely, which is 
different from the constraint on the foreign table (ie, a % 2 = 0)? 
Maybe I'm missing something, though.


Best regards,
Etsuro Fujita





Re: [HACKERS] datetime.h defines like PM conflict with external libraries

2017-10-04 Thread Michael Meskes
> Maybe it'd be good idea to unify some of that stuff so that ecpg can
> use
> it, too?  Having a second copy of the same stuff in
> src/interfaces/ecpg/pgtypeslib/dt.h is pretty terrible.  Even if not,
> let's make sure they don't diverge.

Please let's unify whatever we can. The fewer manual sync we need, the
better.

Michael
-- 
Michael Meskes
Michael at Fam-Meskes dot De, Michael at Meskes dot (De|Com|Net|Org)
Meskes at (Debian|Postgresql) dot Org
Jabber: michael at xmpp dot meskes dot org
VfL Borussia! Força Barça! SF 49ers! Use Debian GNU/Linux, PostgreSQL




Re: [HACKERS] [COMMITTERS] pgsql: Fix freezing of a dead HOT-updated tuple

2017-10-04 Thread Alvaro Herrera
Wood, Dan wrote:
> One minor side note…   Is it weird for xmin/xmax to go backwards in a hot row 
> chain?
> 
>  lp | t_ctid | lp_off | t_infomask | t_infomask2 | t_xmin | t_xmax
> ----+--------+--------+------------+-------------+--------+--------
>   1 | (0,1)  |   8152 |       2818 |           3 |  36957 |      0
>   2 |        |      5 |            |             |        |
>   3 |        |      0 |            |             |        |
>   4 |        |      0 |            |             |        |
>   5 | (0,6)  |   8112 |       9986 |       49155 |  36962 |  36963
>   6 | (0,7)  |   8072 |       9986 |       49155 |  36963 |  36961
>   7 | (0,7)  |   8032 |      11010 |       32771 |  36961 |      0
> (7 rows)

No, it just means transaction A got its XID before transaction B, but B
executed the update first and A updated the tuple second.

-- 
Álvaro Herrerahttps://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: [HACKERS] [COMMITTERS] pgsql: Fix freezing of a dead HOT-updated tuple

2017-10-04 Thread Alvaro Herrera
Wood, Dan wrote:

> There is a tangled web of issues here.  With the community fix we get a
> corrupted page (invalid redirect ptr from indexed item).  The cause of that is:
> pruneheap.c:
> 
>   /*
>* Check the tuple XMIN against prior XMAX, if any
>*/
>   if (TransactionIdIsValid(priorXmax) &&
>   !TransactionIdEquals(HeapTupleHeaderGetXmin(htup), 
> priorXmax))
>   break;
> 
>   chainitems[nchain++] = offnum;
> 
> The priorXmax is a multixact key share lock,

Uhh, what?  That certainly shouldn't happen -- the priorXmax comes from

priorXmax = HeapTupleHeaderGetUpdateXid(htup);

so only the XID of the update itself should be reported, not a multixact
and certainly not just a tuple lock XID.

-- 
Álvaro Herrerahttps://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: [HACKERS] [COMMITTERS] pgsql: Fix freezing of a dead HOT-updated tuple

2017-10-04 Thread Alvaro Herrera
Peter Geoghegan wrote:

> I thought that we no longer store FrozenTransactionId (xid 2) as our
> "raw" xmin while freezing, and yet that's what we see here.

I'm doing this in 9.3.  I can't tell if the new tuple freezing stuff
broke things more thoroughly, but it is already broken in earlier
releases.

-- 
Álvaro Herrerahttps://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: [HACKERS] datetime.h defines like PM conflict with external libraries

2017-10-04 Thread Alvaro Herrera
Andrew Dunstan wrote:

> On 10/03/2017 04:43 PM, Tom Lane wrote:

> > I like the new-header-file idea because it will result in minimal code
> > churn and thus minimal back-patching hazards.
> >
> > I do *not* like "PG_PM".  For our own purposes that adds no uniqueness
> > at all.  If we're to touch these symbols then I'd go for names like
> > "DATETIME_PM".  Or maybe "DT_PM" ... there's a little bit of precedent
> > for the DT_ prefix already.
> 
> Yeah. If we use a prefix +1 for DT_. If we do that then I think they
> should *all* be prefixed, not just the ones we know of conflicts for.

Maybe it'd be good idea to unify some of that stuff so that ecpg can use
it, too?  Having a second copy of the same stuff in
src/interfaces/ecpg/pgtypeslib/dt.h is pretty terrible.  Even if not,
let's make sure they don't diverge.

-- 
Álvaro Herrerahttps://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: [HACKERS] list of credits for release notes

2017-10-04 Thread Magnus Hagander
On Wed, Oct 4, 2017 at 8:51 AM, Laurenz Albe 
wrote:

> Peter Eisentraut wrote:
> > At the PGCon Developer Meeting it was agreed[0] to add a list of credits
> > to the release notes, including everyone who was mentioned in a commit
> > message.  I have now completed that list.
> >
> > Attached is the proposed documentation commit as well as the raw list.
>
> > The list is sorted using COLLATE "en-x-icu".
>
> It would be awesome if the list could be sorted by last name,
> as name lists traditionally are, but maybe that's too much to ask.
>

Whether that's traditional or not very much depends on which part of the
world you are in, I believe. Let's try to avoid going down that rabbit
hole :)

-- 
 Magnus Hagander
 Me: https://www.hagander.net/ 
 Work: https://www.redpill-linpro.com/ 


Re: [HACKERS] Add TOAST to system tables with ACL?

2017-10-04 Thread Alexander Korotkov
On Tue, Oct 3, 2017 at 9:19 PM, Tom Lane  wrote:

> For pg_database, you'd have to make sure that the startup-time check of
> database CONNECT privilege still works if the ACL's been pushed out of
> line.
>

I've tried this case.

At first, I make database temp with no connect privilege from public,
and create the users.
create database temp;
revoke connect on database temp from public;
\copy (select 'create user u' || i || ';' from generate_series(1,1) i)
to 'script.sql'
\i script.sql

I've checked that user u1 can't login to database temp.
$ psql temp -U u1
psql: FATAL:  permission denied for database "temp"
DETAIL:  User does not have CONNECT privilege.

Then I grant the connect privilege to all those users.
\copy (select 'grant connect on database temp to u' || i || ';' from
generate_series(1,1) i) to 'script.sql'
\i script.sql

Then user u1 can login successfully.
$ psql temp -U u1
psql (11devel)
Type "help" for help.

u1@temp=#

Thus, in this simple case database CONNECT privilege works with out-of-line
ACL for me.

--
Alexander Korotkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company


Re: [HACKERS] JIT compiling - v4.0

2017-10-04 Thread Ants Aasma
On Wed, Oct 4, 2017 at 9:48 AM, Andres Freund  wrote:
> Here's an updated version of the patchset.  There are some substantial
> changes here, but it's still very obviously very far from committable as
> a whole. There are some helper commits that are simple and independent
> enough to be committable earlier on.

Looks pretty impressive already.

I wanted to take it for a spin, but got errors about the following
symbols being missing:

LLVMOrcUnregisterPerf
LLVMOrcRegisterGDB
LLVMOrcRegisterPerf
LLVMOrcGetSymbolAddressIn
LLVMLinkModules2Needed

As far as I can tell these are not in mainline LLVM. Is there a branch
or patchset of LLVM available somewhere that I need to use this?

Regards,
Ants Aasma




Re: [HACKERS] Refactor handling of database attributes between pg_dump and pg_dumpall

2017-10-04 Thread Haribabu Kommi
On Sat, Sep 30, 2017 at 3:31 AM, Robert Haas  wrote:

> On Fri, Sep 29, 2017 at 12:44 AM, Vaishnavi Prabakaran
>  wrote:
> > Option name "--enable-pgdumpall-behaviour"  is very generic
>
> Yeah, that's a terrible name, at least in my opinion.
>

OK. I will use a new name based on the discussion.


> and it is better
> > to rename it to something that reflects its functionality like
> > --skip-default-db-create/--no-default-db-create
>
> But I wonder why this patch needs a new option at all?
>

There are some differences in handling database objects
between pg_dump and pg_dumpall. To retain both pg_dump
and pg_dumpall behavior even after refactoring, this option
was added. Currently this option is used mainly for three
purposes.

1. Don't print unnecessary CREATE DATABASE options like
ENCODING, LC_COLLATE and LC_CTYPE if those values are the
same as the defaults.

The above behavior is as per pg_dumpall, but it can be
changed to print them irrespective of the defaults.

2. Do not dump postgres and template0 databases.

3. Set default_transaction_read_only = off.

As per the following comment in pg_dumpall, the GUC is set based
on that flag, to retain the same behavior even after this
refactoring.

/*
 * Restore will need to write to the target cluster.  This connection
 * setting is emitted for pg_dumpall rather than in the code also used
 * by pg_dump, so that a cluster with databases or users which have
 * this flag turned on can still be replicated through pg_dumpall
 * without editing the file or stream.  With pg_dump there are many
 * other ways to allow the file to be used, and leaving it out allows
 * users to protect databases from being accidental restore targets.
 */
fprintf(OPF, "SET default_transaction_read_only = off;\n\n");

We can remove usage 1 and retain usage 2 with a modified option
name such as --no-default-database or similar.

Any opinions about usage 3? In case we need to retain that change,
what would be the best name for the option?

Regards,
Hari Babu
Fujitsu Australia


Re: [HACKERS] Binary search in fmgr_isbuiltin() is a bottleneck.

2017-10-04 Thread Andres Freund
On 2017-10-02 15:01:36 -0700, Andres Freund wrote:
> On 2017-10-02 17:57:51 -0400, Tom Lane wrote:
> > Andres Freund  writes:
> > > Done that way. It's a bit annoying, because we've to take care to
> > > initialize the "unused" part of the array with a valid signalling it's
> > > an unused mapping. Can't use 0 for that because fmgr_builtins[0] is a
> > > valid entry.
> > 
> > The prototype code I posted further upthread just used -1 as the "unused"
> > marker. There's no reason the array can't be int16 rather than uint16,
> > and "if (index < 0)" is probably a faster test anyway.
> 
> Right, but whether we use -1 or UINT16_MAX or such doesn't matter. The
> relevant bit is that we can't use 0, so we can't rely on the rest of the
> array being zero initialized, but instead of to initialize all of it
> explicitly.  I've no real feelings about using -1 or UINT16_MAX - I'd be
> very surprised if there's any sort of meaningful performance difference.

I pushed a further cleaned up version of these two patches.  If you see
a way to avoid initializing the "trailing" part of the
fmgr_builtin_oid_index in a different manner, I'm all ears ;)
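
For readers following along, a simplified sketch of the scheme (the
array bound and helper names are illustrative; the committed code may
differ in the details):

#define InvalidOidBuiltinMapping	((int16) -1)

static int16 fmgr_builtin_oid_index[FMGR_OID_MAX + 1];

static void
init_fmgr_oid_index(void)
{
	int			i;

	/* fmgr_builtins[0] is valid, so every slot needs an explicit sentinel */
	for (i = 0; i <= FMGR_OID_MAX; i++)
		fmgr_builtin_oid_index[i] = InvalidOidBuiltinMapping;

	/* then record each builtin's array index, keyed by its OID */
	for (i = 0; i < fmgr_nbuiltins; i++)
		fmgr_builtin_oid_index[fmgr_builtins[i].foid] = (int16) i;
}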

Greetings,

Andres Freund




Re: [HACKERS] parallelize queries containing initplans

2017-10-04 Thread Amit Kapila
On Wed, Oct 4, 2017 at 12:55 PM, Amit Kapila  wrote:
> On Wed, Oct 4, 2017 at 3:40 AM, Robert Haas  wrote:
>> On Tue, Oct 3, 2017 at 7:33 AM, Amit Kapila  wrote:
>>
>> Having said all that, I think that this patch only wants to handle the
>> subset of cases (2) and (4) where the relevant InitPlan is attached
>> ABOVE the Gather node -- which seems very reasonable, because
>> evaluating an InitPlan at a level of the plan tree above the level at
>> which it is defined sounds like it might be complex.  But I still
>> don't quite see why we need these tests.  I mean, if we only allow
>> Param references from a set of safe parameter IDs, and the safe
>> parameter IDs include only those IDs that can be generated in a
>> worker, then won't other InitPlans in the workers anyway be ruled out?
>
> It doesn't always happen.  There are cases where part of the required
> conditions is pushed one query level below, so when we check during
> max_parallel_hazard_walker, they look safe, but actually they are
> not.  I will try to explain by example:
>
> postgres=# explain (costs off, verbose) select * from t1 where t1.i in
> ( select 1 + (select max(j) from t3));

If you want to reproduce this case, then you can use the script posted
by Kuntal up thread [1].

[1] - 
https://www.postgresql.org/message-id/CAGz5QC%2BuHOq78GCika3fbgRyN5zgiDR9Dd1Th5kENF%2BUpnPomQ%40mail.gmail.com

-- 
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com




Re: [HACKERS] parallelize queries containing initplans

2017-10-04 Thread Amit Kapila
On Wed, Oct 4, 2017 at 3:40 AM, Robert Haas  wrote:
> On Tue, Oct 3, 2017 at 7:33 AM, Amit Kapila  wrote:
>
> Having said all that, I think that this patch only wants to handle the
> subset of cases (2) and (4) where the relevant InitPlan is attached
> ABOVE the Gather node -- which seems very reasonable, because
> evaluating an InitPlan at a level of the plan tree above the level at
> which it is defined sounds like it might be complex.  But I still
> don't quite see why we need these tests.  I mean, if we only allow
> Param references from a set of safe parameter IDs, and the safe
> parameter IDs include only those IDs that can be generated in a
> worker, then won't other InitPlans in the workers anyway be ruled out?

It doesn't always happen.  There are cases where part of the required
conditions is pushed one query level below, so when we check during
max_parallel_hazard_walker, they look safe, but actually they are
not.  I will try to explain by example:

postgres=# explain (costs off, verbose) select * from t1 where t1.i in
( select 1 + (select max(j) from t3));
  QUERY PLAN
------------------------------------------
 Hash Semi Join
   Output: t1.i, t1.j, t1.k
   Hash Cond: (t1.i = ((1 + $1)))
   ->  Seq Scan on public.t1
 Output: t1.i, t1.j, t1.k
   ->  Hash
 Output: ((1 + $1))
 ->  Result
   Output: (1 + $1)
   InitPlan 1 (returns $1)
 ->  Finalize Aggregate
   Output: max(t3.j)
   ->  Gather
 Output: (PARTIAL max(t3.j))
 Workers Planned: 2
 ->  Partial Aggregate
   Output: PARTIAL max(t3.j)
   ->  Parallel Seq Scan on public.t3
 Output: t3.j
(19 rows)

In the above example, you can see that the condition referring to
the initplan (1 + $1) is pushed one level below.  So when we check
parallel safety for the join condition, we won't see the Param node.
Now, consider what happens if we don't check contains_parallel_unsafe_param
during generate_gather_paths; it will lead to the plan below.


postgres=# explain (costs off, verbose) select * from t1 where t1.i in
( select 1 + (select max(j) from t3));
 QUERY PLAN
------------------------------------------------------------
 Gather
   Output: t1.i, t1.j, t1.k
   Workers Planned: 2
   ->  Hash Semi Join
 Output: t1.i, t1.j, t1.k
 Hash Cond: (t1.i = ((1 + $1)))
 ->  Parallel Seq Scan on public.t1
   Output: t1.i, t1.j, t1.k
 ->  Hash
   Output: ((1 + $1))
   ->  Result
 Output: (1 + $1)
 InitPlan 1 (returns $1)
   ->  Finalize Aggregate
 Output: max(t3.j)
 ->  Gather
   Output: (PARTIAL max(t3.j))
   Workers Planned: 2
   ->  Partial Aggregate
 Output: PARTIAL max(t3.j)
 ->  Parallel Seq Scan on public.t3
   Output: t3.j
(22 rows)

This is wrong because when we try to evaluate the params that are
required at the Gather node, we won't find them, as there is
no initplan at that level.

>
> If I am all mixed up, please help straighten me out.
>

I think whatever you said is right and very clear.

-- 
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com




Re: [HACKERS] list of credits for release notes

2017-10-04 Thread Laurenz Albe
Peter Eisentraut wrote:
> At the PGCon Developer Meeting it was agreed[0] to add a list of credits
> to the release notes, including everyone who was mentioned in a commit
> message.  I have now completed that list.
> 
> Attached is the proposed documentation commit as well as the raw list.

> The list is sorted using COLLATE "en-x-icu".

It would be awesome if the list could be sorted by last name,
as name lists traditionally are, but maybe that's too much to ask.

Yours,
Laurenz Albe




Re: [HACKERS] JIT compiling - v4.0

2017-10-04 Thread Andres Freund
Hi,

Here's an updated version of the patchset.  There are some substantial
changes here, but it's still very obviously very far from committable as
a whole. There are some helper commits that are simple and independent
enough to be committable earlier on.

The git tree of this work, which is *frequently* rebased, is at:
https://git.postgresql.org/gitweb/?p=users/andresfreund/postgres.git;a=shortlog;h=refs/heads/jit

The biggest changes are:

- The JIT "infrastructure" is less bad than before, and starting to
  shape up.
- The tuple deforming logic is considerably faster than before due to
  various optimizations. The optimizations are:
  - build deforming exactly to the required natts for the specific caller
  - avoid checking the tuple's natts for attributes that have
    "following" NOT NULL columns.
  - a bunch of minor codegen improvements.
- The tuple deforming codegen also got simpler by relying on LLVM to
  promote a stack variable to a register, instead of working with a
  register manually - the need to keep IR in SSA form makes doing so
  manually rather painful.
- WIP patch to do execGrouping.c TupleHashTableMatch() via JIT. That
  makes the column comparison faster, but more importantly it JITs the
  deforming (at least one side is always a MinimalTuple).
- All tests pass with JITed expressions, tuple deforming, agg transition
  value computation and execGrouping logic. There were a number of bugs;
  who would have imagined that.
- some more experimental changes later in the series to address some
  bottlenecks.

Functionally this covers all of what I think a sensible goal for v11
is. There's a lot of details to figure out, and the inlining
*implementation* isn't what I think we should do.  I'll follow up, not
tonight though, with an email outlining the first few design decisions
we're going to have to finalize, which'll be around the memory/lifetime
management of functions, and other infrastructure pieces (currently
patch 0006).

As the patchset is pretty large already, and not going to get any
smaller, I'll make smaller adjustments solely via the git tree, rather
than full reposts.

Greetings,

Andres Freund


0001-Rely-on-executor-utils-to-build-targetlist-for-DM.v4.patch.gz
0002-WIP-Allow-tupleslots-to-have-a-fixed-tupledesc-us.v4.patch.gz
0003-Perform-slot-validity-checks-in-a-separate-pass-o.v4.patch.gz
0004-Pass-through-PlanState-parent-to-expression-insta.v4.patch.gz
0005-Add-configure-infrastructure-to-enable-LLVM.v4.patch.gz
0006-Beginning-of-a-LLVM-JIT-infrastructure.v4.patch.gz
0007-JIT-compile-expressions.v4.patch.gz
0008-Centralize-slot-deforming-logic-a-bit.v4.patch.gz
0009-WIP-Make-scan-desc-available-for-all-PlanStates.v4.patch.gz
0010-JITed-tuple-deforming.v4.patch.gz
0011-Simplify-aggregate-code-a-bit.v4.patch.gz
0012-More-efficient-AggState-pertrans-iteration.v4.patch.gz
0013-Avoid-dereferencing-tts_values-nulls-repeatedly-i.v4.patch.gz
0014-WIP-Expression-based-agg-transition.v4.patch.gz
0015-Hacky-Preliminary-inlining-implementation.v4.patch.gz
0016-WIP-Inline-ExecScan-mostly-to-make-profiles-easie.v4.patch.gz
0017-WIP-Do-execGrouping.c-via-expression-eval-machine.v4.patch.gz
0018-WIP-deduplicate-int-float-overflow-handling-code.v4.patch.gz
0019-Make-timestamp_cmp_internal-an-inline-function.v4.patch.gz
0020-Make-hot-path-of-pg_detoast_datum-an-inline-funct.v4.patch.gz
0021-WIP-Inline-additional-function.v4.patch.gz
0022-WIP-Faster-order.v4.patch.gz



Re: [HACKERS] PATCH: pgbench - option to build using ppoll() for larger connection counts

2017-10-04 Thread Fabien COELHO



This patch enables building pgbench to use ppoll() instead of select()
to allow for more than (FD_SETSIZE - 10) connections.  As implemented,
when using ppoll(), the only connection limitation is system resources.

One based on 'master', which can also apply to REL_10_STABLE.
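
For readers unfamiliar with the limitation, a minimal sketch of the
difference (illustrative only, not the patch's code): select() cannot
watch descriptors at or above FD_SETSIZE, while ppoll() takes an array
of arbitrary size.

#define _GNU_SOURCE
#include <poll.h>
#include <stdlib.h>

static int
wait_for_readable(const int *socks, int nsocks)
{
	struct pollfd *fds = malloc(nsocks * sizeof(struct pollfd));
	int			rc,
				i;

	for (i = 0; i < nsocks; i++)
	{
		fds[i].fd = socks[i];
		fds[i].events = POLLIN;	/* wake when the socket is readable */
	}

	/* NULL timeout and sigmask: block indefinitely, as select() would */
	rc = ppoll(fds, nsocks, NULL, NULL);

	free(fds);
	return rc;					/* ready count, or -1 on error */
}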


 /home/fabien/pgbench-ppoll.patch:137: trailing whitespace.
 #define PFD_THREAD_INIT(t,s,n) { do ...
 error: patch failed: configure:13024
 error: configure: patch does not apply
 error: patch failed: configure.in:1430
 error: configure.in: patch does not apply
 error: patch failed: src/bin/pgbench/pgbench.c:4588
 error: src/bin/pgbench/pgbench.c: patch does not apply

--
Fabien.

