Re: using index or check in ALTER TABLE SET NOT NULL

2018-03-05 Thread Ildar Musin

Hello Sergei,

I couldn't find any case where your code doesn't work properly, so it 
seems OK to me.



@@ -220,6 +220,13 @@ WITH ( MODULUS numeric_literal, REM
  

  
+  Full table scan is performed to check that no existing row
+  in the table has null values in given column.  It is possible to avoid
+  this scan by adding a valid CHECK constraint to
+  the table that would allow only NOT NULL values for given column.
+ 


Adding a check constraint will also force a full table scan, so I think 
it would be better to rephrase it as follows:


"Full table scan is performed to check that column doesn't contain NULL 
values unless there are valid check constraints that prohibit NULL 
values for specified column. In the latter case table scan is skipped."


Input from a native English speaker would be helpful here.

Regarding regression tests, it may be useful to set client_min_messages 
to 'debug1' before setting the NOT NULL attribute for a column. That way 
you can tell for sure that NotNullImpliedByRelConstraints() returned 
true (i.e. your code actually did the job), as the special debug message 
is printed to the log.
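
For example, something along these lines (just a sketch; the exact DEBUG
message text is whatever the patch emits):

-- sketch of such a regression test
CREATE TABLE nn_test (a int, CONSTRAINT a_nn CHECK (a IS NOT NULL));
SET client_min_messages TO debug1;
-- a DEBUG line here confirms the table scan was skipped
ALTER TABLE nn_test ALTER COLUMN a SET NOT NULL;
RESET client_min_messages;
DROP TABLE nn_test;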


Thanks!

--
Ildar Musin
i.mu...@postgrespro.ru



Re: public schema default ACL

2018-03-05 Thread Noah Misch
On Sat, Mar 03, 2018 at 02:31:58AM -0800, Joe Conway wrote:
> On 03/03/2018 01:56 AM, Noah Misch wrote:
> > If we do that alone, databases reaching v11 via dump/reload or pg_upgrade
> > will get the new default ACL if they had not changed the ACL of schema
> > public.  If they had GRANTed or REVOKEd on schema public, pg_dump will
> > recreate the resulting ACL.  This is the standard pg_dump behavior for
> > ACLs on system objects.  I think that's okay for the public schema, too,
> > and I like preserving that usual rule.  However, if we wanted to minimize
> > upgrade-time surprises, we could make pg_dump include GRANT for schema
> > public unconditionally.  That way, the default ACL change would apply to
> > new databases only.  Does anyone want to argue for that?
> 
> What about a pg_dump option to do that and then a big note in the
> release notes telling people why they might want to use it?

I'd want any new pg_dump option to have use beyond this one case.  That is,
not --old-public-schema-acl, but perhaps --old-system-acls-for=OBJECT-PATTERN.
But it's a simple task to loop over your databases and run a GRANT, so I
somewhat doubt that particular idea should win.  Hmm.
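
For instance, something like this (a sketch; adjust the GRANT to whatever
ACL you want to restore):

for db in $(psql -XAt -c "SELECT datname FROM pg_database WHERE datallowconn")
do
    psql -X -d "$db" -c "GRANT CREATE, USAGE ON SCHEMA public TO PUBLIC;"
done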



Re: PATCH: psql tab completion for SELECT

2018-03-05 Thread Edmund Horner
I've reworked the SELECT completion patch to use the VersionedQuery
infrastructure.

I've also made it a schema query (for the functions), with an addon
for the attributes.  This provides schema-aware completion.

Previously, addons to schema queries were appended verbatim; I've
changed this to use the same interpolation as in the simple_query
case, so that the attributes can be filtered in the query.  Otherwise,
relevant items can be omitted when they don't make it into the first
1000 rows in the query result, even when only a small number of items
are ultimately presented for completion.

Edmund


psql-select-tab-completion-v5.patch
Description: Binary data


Re: STATISTICS retained in CREATE TABLE ... LIKE (INCLUDING ALL)?

2018-03-05 Thread David Rowley
On 6 March 2018 at 11:43, Alvaro Herrera  wrote:
> Pushed now, to branches master and pg10, with Tomas' changes.  I made a
> few changes of my own

Great! Many thanks to both of you for making those changes and thanks
Alvaro for pushing.

> 3. I chose not to backpatch the node->stxcomment thing.  It makes me
> nervous to modify a parse node.  So cloning the comments is a PG11
> thing.  Hopefully it's not *too* bad ...

Makes sense. We've had other objects in the past where comments were
sometimes lost, and I don't think that was noticed very quickly, so
perhaps this is an unlikely case that won't bother too many people.

> 4. See elsewhere in the thread about list_copy vs. list_concat :-)

I saw that. Thanks for fixing. The only weird thing I see in the
changes is that the comment here claims it makes a copy, but it does
not.

+ * Right now, there's nothing to do here, so we just copy the list.
+ */
+static void
+transformExtendedStatistics(CreateStmtContext *cxt)
+{
+   cxt->alist = list_concat(cxt->alist, cxt->extstats);


-- 
 David Rowley   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services



Re: inserts into partitioned table may cause crash

2018-03-05 Thread Amit Langote
Fujita-san,

Thanks for the review.

On 2018/03/05 22:00, Etsuro Fujita wrote:
> I started reviewing this.  I think the analysis you mentioned upthread
> would be correct, but I'm not sure the patch is the right way to go
> because I think that exception handling added by the patch throughout
> ExecInsert such as:
> 
> @@ -408,7 +408,10 @@ ExecInsert(ModifyTableState *mtstate,
>     slot = ExecBRInsertTriggers(estate, resultRelInfo, slot);
> 
>     if (slot == NULL)   /* "do nothing" */
> -   return NULL;
> +   {
> +   result = NULL;
> +   goto restore_estate_result_rel;
> +   }
> 
> would reduce the code readability.

Hmm yeah, it is a bit hacky.

> An alternative fix for this would be to handle the set/reset of
> estate->es_result_relation_info in a higher level ie, ExecModifyTable,
> like the attached: (1) before calling ExecInsert, we do the preparation
> work for tuple routing of one tuple (eg, determine the partition for the
> tuple and convert the format of the tuple in a separate function, which
> also sets estate->es_result_relation_info to the partition for ExecInsert
> to work on it; then (2) we call ExecInsert, which just inserts into the
> partition; then (3) we reset estate->es_result_relation_info back to the
> root partitioned table.  One good thing about the alternative is that it
> returns ExecInsert roughly to its PG 9.6 form, which would make
> maintenance easier.  COPY
> has the same issue, so the attached also fixes that.  (Maybe we could do
> some refactoring to use the same code in both cases, but I'm not sure we
> should do that as a fix.)  What do you think about the alternative?

Your patch seems like a good cleanup overall, fixes this bug, and seems to
make future maintenance easier.  So, +1.  I agree that doing surgery
like this on COPY is better left for later.

I noticed (as you may have too) that it cannot be applied to the 10 branch
as is.  I made the necessary changes so that it can be.  See attached
patch with filename suffixed "-10backpatch".  Also attached is your patch
unchanged to have both of them together.

Thanks,
Amit
*** a/src/backend/commands/copy.c
--- b/src/backend/commands/copy.c
***
*** 2768,2779  CopyFrom(CopyState cstate)
 * tuples inserted by an INSERT command.
 */
processed++;
  
!   if (saved_resultRelInfo)
!   {
!   resultRelInfo = saved_resultRelInfo;
!   estate->es_result_relation_info = resultRelInfo;
!   }
}
}
  
--- 2768,2780 
 * tuples inserted by an INSERT command.
 */
processed++;
+   }
  
!   /* Restore the saved ResultRelInfo */
!   if (saved_resultRelInfo)
!   {
!   resultRelInfo = saved_resultRelInfo;
!   estate->es_result_relation_info = resultRelInfo;
}
}
  
*** a/src/backend/executor/nodeModifyTable.c
--- b/src/backend/executor/nodeModifyTable.c
***
*** 67,72  static void ExecSetupChildParentMapForTcs(ModifyTableState *mtstate);
--- 67,77 
  static void ExecSetupChildParentMapForSubplan(ModifyTableState *mtstate);
  static TupleConversionMap *tupconv_map_for_subplan(ModifyTableState *node,
int whichplan);
+ static TupleTableSlot *ExecPrepareTupleRouting(ModifyTableState *mtstate,
+   PartitionTupleRouting *proute,
+   EState *estate,
+   ResultRelInfo *rootRelInfo,
+   TupleTableSlot *slot);
  
  /*
   * Verify that the tuples to be produced by INSERT or UPDATE match the
***
*** 265,271  ExecInsert(ModifyTableState *mtstate,
  {
HeapTuple   tuple;
ResultRelInfo *resultRelInfo;
-   ResultRelInfo *saved_resultRelInfo = NULL;
RelationresultRelationDesc;
Oid newId;
List   *recheckIndexes = NIL;
--- 270,275 
***
*** 282,381  ExecInsert(ModifyTableState *mtstate,
 * get information on the (current) result relation
 */
resultRelInfo = estate->es_result_relation_info;
- 
-   /* Determine the partition to heap_insert the tuple into */
-   if (mtstate->mt_partition_tuple_routing)
-   {
-   int leaf_part_index;
-   PartitionTupleRouting *proute = mtstate->mt_partition_tuple_routing;
- 
-   /*
-* Away we go ... If we end up not finding a partition after all,
-* 

Re: pg_get_functiondef forgets about most GUC_LIST_INPUT GUCs

2018-03-05 Thread Pavel Stehule
2018-03-06 2:32 GMT+01:00 Michael Paquier :

> On Mon, Mar 05, 2018 at 09:25:09PM +0100, Pavel Stehule wrote:
> > I'm afraid there is no good solution. Would it be possible to store the
> > options in their original form? When the function is displayed, the
> > original value would be displayed. The current patch (with known
> > extensions) can be used as a bug fix and back-patched. In a new version
> > we would store the original value with quotes.
>
> You mean storing the values in pg_proc.proconfig at the creation time of
> the function?  That would basically work, except for the case of a
> function using a parameter which is not from the same PL.  The function
> creation would fail because it cannot find the parameter it is looking
> for, as we need to look at loaded parameters to know whether it uses a
> list or not :(
>

Yes. It can fail at execution time, but that is something like a runtime
error.

The dump should just produce the same form as the input. So if we got
quotes on input, we should use quotes on output, without querying anything.

Any quotes can be removed at function compile time.

Pavel

> --
> Michael
>


Re: [HACKERS] GSoC 2017: weekly progress reports (week 4) and patch for hash index

2018-03-05 Thread Amit Kapila
On Mon, Mar 5, 2018 at 8:58 AM, Thomas Munro wrote:
> On Sun, Mar 4, 2018 at 12:53 AM, Amit Kapila  wrote:
>> Yes, but I think it would be better if we call this once we are sure
>> that at least one tuple from the old bucket has been transferred
>> (consider if all tuples in the old bucket are dead).  Apart from this,
>> I think this patch has missed handling the cases where we scan the
>> buckets when the split is in progress.  In such cases, we scan both
>> old and new bucket, so I think we need to ensure that we take
>> PredicateLock on both the buckets during such scans.
>
> Hmm.  Yeah.
>
> So, in _hash_first(), do you think we might just need this?
>
>   if (H_BUCKET_BEING_POPULATED(opaque))
>   {
>   ...
>   old_blkno = _hash_get_oldblock_from_newbucket(rel, bucket);
>   ...
>   old_buf = _hash_getbuf(rel, old_blkno, HASH_READ, LH_BUCKET_PAGE);
> + PredicateLockPage(rel, BufferGetBlockNumber(old_buf),
> scan->xs_snapshot);
>   TestForOldSnapshot(scan->xs_snapshot, rel, BufferGetPage(old_buf));
>
> That is, if you begin scanning a 'new' bucket, we remember the old
> bucket and go and scan that too, so we'd better predicate-lock both up
> front (or I suppose we could do it later when we visit that page, but
> here it can be done in a single place).
>

Yeah, that can work, but I am slightly worried that we might actually
never scan the old bucket (say for queries with a LIMIT clause), in which
case it might give false positives for insertions into old buckets.

> What if we begin scanning an 'old' bucket that is being split?  I
> think we'll only do that for tuples that actually belong in the old
> bucket after the split, so no need to double-lock?  And I don't think
> a split could begin while we are scanning.  Do I have that right?
>

Right.

> As for inserting, I'm not sure if any special treatment is needed, as
> long as the scan code path (above) and the split code path are
> correct.  I'm not sure though.
>

I also don't think we need any special handling for insertions.

> I'm wondering how to test all this.  I'm thinking about a program that
> repeatedly creates a hash index and then slowly adds more things to it
> so that buckets split (maybe using distinct keys carefully crafted to
> hit the same bucket?), while concurrently hammering it with a ton of
> scans and then ... somehow checking correctness...
>

Yeah, that will generate the required errors, but I am not sure how to
verify correctness.  One idea could be to somehow stop the split
in-between while it is in progress (say by a cancel request) and then run
targeted selects and inserts on the bucket being scanned and the bucket
being populated.

-- 
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com



Re: [HACKERS] Lazy hash table for XidInMVCCSnapshot (helps Zipfian a bit)

2018-03-05 Thread Yura Sokolov
On 05.03.2018 18:00, Tom Lane wrote:
> Tomas Vondra  writes:
>> Snapshots are static (we don't really add new XIDs into existing ones,
>> right?), so why don't we simply sort the XIDs and then use bsearch to
>> lookup values? That should fix the linear search, without need for any
>> local hash table.
> 
> +1 for looking into that, since it would avoid adding any complication
> to snapshot copying.  In principle, with enough XIDs in the snap, an
> O(1) hash probe would be quicker than an O(log N) bsearch ... but I'm
> dubious that we are often in the range where that would matter.
> We do need to worry about the cost of snapshot copying, too.

Sorting is the first thing I tried. Unfortunately, sorting itself eats
too much CPU; filling a hash table is much faster.

Can I rely on snapshots being static? Maybe then there is no need for a
separate raw representation and hash table. I could try to fill the hash
table directly in GetSnapshotData instead of filling it lazily, though
that would increase the time spent under the lock, so it is debatable and
should be measured carefully.
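
For reference, a lookup over a sorted xip array of the kind Tomas
suggests could look like this (a standalone sketch; plain integer
comparison is assumed, which holds within a single snapshot's xip array,
and subxids are ignored):

#include <stdbool.h>
#include <stdint.h>

typedef uint32_t TransactionId;

/* Return true if xid is present in the sorted array xip[0..xcnt-1]. */
static bool
XidInSortedSnapshot(TransactionId xid, const TransactionId *xip,
                    uint32_t xcnt)
{
    uint32_t    lo = 0;
    uint32_t    hi = xcnt;

    while (lo < hi)
    {
        uint32_t    mid = lo + (hi - lo) / 2;

        if (xip[mid] < xid)
            lo = mid + 1;
        else if (xip[mid] > xid)
            hi = mid;
        else
            return true;
    }
    return false;
}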

I'll look at the bug in CopySnapshot soon. Excuse the delay.

With regards,
Sokolov Yura





Re: Faster inserts with mostly-monotonically increasing values

2018-03-05 Thread Peter Geoghegan
On Mon, Mar 5, 2018 at 7:10 PM, Claudio Freire  wrote:
>> Introducing any case that allows us to land on a recycled page, and
>> reason that it must at least not be the page we were looking for based
>> on *any* criteria about the page itself seems very brittle. Yeah, it
>> probably won't break in practice, but it's a bad design.
>
> How is this any different from btvacuumscan?
>
> I don't think it can be argued that somehow btvacuumscan has
> permission to be brittle and the rest of the code doesn't.

VACUUM doesn't have to worry about concurrent page recycling because
it is already the only thing that performs page deletion. It's already
the process that has the exclusive right to give index pages back to
the FSM.

-- 
Peter Geoghegan



Re: constraint exclusion and nulls in IN (..) clause

2018-03-05 Thread Amit Langote
Hi.

Thanks for reviewing again.

On 2018/03/05 23:04, Emre Hasegeli wrote:
>>> Shouldn't we check if we consumed all elements (state->next_elem >=
>>> state->num_elems) inside the while loop?
>>
>> You're right.  Fixed.
> 
> I don't think the fix is correct.  arrayconst_next_fn() can still
> execute state->next_elem++ without checking if we consumed all
> elements.  I couldn't manage to crash it though.

I couldn't get it to crash either, but it's wrong anyway.  What happens is
that the calling code performs constraint exclusion with a garbage clause
(that is, one containing a garbage value picked from elem_values after
next_elem has exceeded num_elems), and that may alter the result of
partition pruning.  I verified that with the following example.

create table p (a int) partition by list (a);
create table p1 partition of p for values in (1);
create table p2 partition of p for values in (2);

-- before fixing
explain (costs off) select * from p where a in (2, null);
QUERY PLAN
---
 Append
   ->  Seq Scan on p1
 Filter: (a = ANY ('{2,NULL}'::integer[]))
   ->  Seq Scan on p2
 Filter: (a = ANY ('{2,NULL}'::integer[]))
(5 rows)

So, it looks as if the proposed patch has no effect at all.

-- after fixing
explain (costs off) select * from p where a in (2, null);
QUERY PLAN
--
 Append
   ->  Seq Scan on p2
 Filter: (a = ANY ('{2,NULL}'::integer[]))

I was a bit surprised that the calling code didn't crash while handling
a garbage clause.

> I am also not sure if it is proper to skip some items inside the
> "next_fn", but I don't know the code to suggest a better alternative.

The patch teaches it to ignore nulls when the operator being used is
known to be strict.  That is harmless and has the benefit that constraint
exclusion gives an answer consistent with what actually running such a
qual against the table's rows would do.

Consider this example.

create table foo (a int check (a = 1));

insert into foo values (1), (null);

select * from foo where a in (2, null);
 a
---
(0 rows)

set constraint_exclusion to on;

-- you'd think constraint exclusion would avoid the scan
explain select * from foo where a in (2, null);
 QUERY PLAN
-
 Seq Scan on foo  (cost=0.00..41.88 rows=13 width=4)
   Filter: (a = ANY ('{2,NULL}'::integer[]))
(2 rows)

But it didn't, because the null in that list caused constraint exclusion
to mistakenly decide that the clause as a whole doesn't refute the
constraint (check (a = 1)).

-- with the patch
explain select * from foo where a in (2, null);
QUERY PLAN
--
 Result  (cost=0.00..0.00 rows=0 width=4)
   One-Time Filter: false
(2 rows)

After ignoring null, only (a = 2) is left to consider and that does refute
(a = 1), so pruning works as desired.

BTW, if you compose the qual using an OR directly, it works as desired
even with HEAD:

explain select * from foo where a = 2 or a = null;
QUERY PLAN
--
 Result  (cost=0.00..0.00 rows=0 width=4)
   One-Time Filter: false
(2 rows)

That's because a = null is const-simplified in an earlier planning phase
to false, so (a = 2 or false) reduces to just a = 2, which does refute
foo's constraint (a = 1).  I think the same thing should happen when
constraint exclusion internally turns an IN (..) clause into a set of OR
clauses.  Skipping nulls in these next_fn functions is, in a way, the
same as const-simplifying them away.

>> +   /* skip nulls if ok to do so */
>> +   if (state->opisstrict && state->next)
> 
> Do we need && state->next in here?  It is checked for (state->next ==
> NULL) 2 lines above.

Yeah, it wasn't needed.

>> +   {
>> +   Node   *node = (Node *) lfirst(state->next);
>> +
>> +   while (node && IsA(node, Const) && ((Const *) node)->constisnull)
> 
> Should we spell the first check like node != NULL as the rest of the code 
> does.

OK.

>> +   {
>> +   state->next = lnext(state->next);
>> +   if (state->next)
>> +   node = (Node *) lfirst(state->next);
>> +   }
>> +   }
> 
> I think this function is also not correct as it can call
> lnext(state->next) without checking.

Yeah.  Rearranged the code to fix that.

Attached updated patch.  Thanks again.

Regards,
Amit
From 1c16473da04dfa2d41fdcf5de6983b65aabd5ff4 Mon Sep 17 00:00:00 2001
From: amit 
Date: Thu, 1 Feb 2018 11:32:23 +0900
Subject: [PATCH v4] Disregard nulls in SAOP rightarg array/list during CE

---
 src/backend/optimizer/util/predtest.c | 45 +++
 src/test/regress/expected/inherit.out | 34 ++
 src/test/regress/sql/inherit.sql  |  1 +
 3 files changed, 75 insertions(+), 5 deletions(-)

diff --git 

Re: Faster inserts with mostly-monotonically increasing values

2018-03-05 Thread Pavan Deolasee
On Tue, Mar 6, 2018 at 7:29 AM, Peter Geoghegan  wrote:

> On Mon, Mar 5, 2018 at 5:48 PM, Claudio Freire wrote:
>
> > I believe PKs are a prime candidate for this optimization, and
> > expecting it to apply only when no concurrency is involved is severely
> > dumbing down the optimization.
>
> Pavan justified the patch using a benchmark that only involved a
> single client -- hardly typical for a patch that changes the B-Tree
> code. If the benefits with many clients can be shown to matter, that
> will make this much more interesting to me.


Ok. I will repeat those tests with more clients and report back.

Regarding your suggestion about using the page LSN to detect intermediate
activity, my concern is that unless we store that value in shared memory,
concurrent backends, even if inserting values in order, will invalidate
the backend-local cached page LSN, and the optimisation will not kick in.

I have yet to digest the entire conversation between you and Claudio; you
guys clearly understand b-tree internals better than I do. It seems that
while you're worried about missing something, Claudio feels that we can
find a safe way just by looking at the information available in the
current page. I feel the same way, but will need to re-read the
discussion carefully again.

Simon had raised concerns about DESC indexes and whether we need to do
the checks for the leftmost page in that case. I haven't yet figured out
whether DESC indexes are actually stored in reverse order. I am going to
look at that too.

Thanks,
Pavan

-- 
 Pavan Deolasee   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


Re: [HACKERS] MERGE SQL Statement for PG11

2018-03-05 Thread Peter Geoghegan
On Mon, Mar 5, 2018 at 3:02 AM, Pavan Deolasee  wrote:
>> Again, I have to ask: is such an UPDATE actually meaningfully
>> different from a concurrent DELETE + INSERT? If so, why is a special
>> error better than a dup violation, or maybe even having the INSERT
>> (and whole MERGE statement) succeed?
>>
>
> Ok, I agree. I have updated the patch to remove the serialization error.

Cool. This is really shaping up. Thank you for working so hard to
address my concerns.

> If MATCHED changes to NOT MATCHED because of concurrent update/delete, we now
> simply retry from the top and execute the first NOT MATCHED action, if WHEN
> AND qual passes on the updated version. Of course, if the MERGE does not
> contain any NOT MATCHED action then we simply ignore the target row and move
> to the next row. Since a NOT MATCHED case can never turn into a MATCHED
> case, there is no risk of a live lock.

Makes sense. We either follow an UPDATE chain, which must always make
forward progress, or do NOT MATCHED handling/insert a row, which only
happens when NOT MATCHED handling has been abandoned, and so similarly
must make forward progress. I think that this needs to be documented
in a central place, though. I'd like to see arguments for why the
design is livelock safe, as well as an explanation for how EPQ
handling works, what the goals of how it works are, and so on. I would
perhaps summarize it by saying something like the following within
ExecMerge():

"""
ExecMerge() EPQ handling repeatedly rechecks all WHEN MATCHED actions
for each new candidate tuple as an update chain is followed, either
until a tuple is actually updated/deleted, or until we decide that the
new join input row actually requires WHEN NOT MATCHED handling after
all. Rechecking join quals for a single candidate tuple is performed
by ExecUpdate() and ExecDelete(), which can specially instruct us to
advance to the next tuple in the update chain so that we can recheck
our own WHEN ... AND quals (when the join quals no longer pass due to
a concurrent UPDATE), or to switch to the WHEN NOT MATCHED case (when
join quals no longer pass due to a concurrent DELETE).

EPQ only ever installs a new tuple for the target relation (never the
source), so changing from one WHEN MATCHED action to another during
READ COMMITTED conflict handling must be due to a concurrent UPDATE
that changes WHEN ... AND qual referenced attribute(s). If it was due
to a concurrent DELETE, or, equivalently, some concurrent UPDATE that
affects the target's primary key attribute(s) (or whatever the outer
join's target qual attributes are), then we switch to WHEN NOT MATCHED
handling, which will never switch back to WHEN MATCHED. ExecMerge()
avoids livelocks by always either walking the UPDATE chain, which
makes forward progress, or by finally switching to WHEN NOT MATCHED
processing.

"""

ExecUpdate()/ExecDelete() are not "driving" EPQ here, which is new --
they're doing one small part of it (the join quals for one tuple) on
behalf of ExecMerge(). I don't expect you to just take my wording
without editing or moving parts around a bit; I think that you'll get
the general idea from what I've written, though.

Actually, my wording may need to be changed to reflect a new code
structure for EPQ. The current control flow makes the reader think
about UPDATE quals, as well as DELETE quals. Instead, the reader
should think about WHEN ... AND quals, as well as MERGE's outer JOIN
quals (the join quals are the same, regardless of whether an UPDATE or
DELETE happens to be involved). The control flow between ExecMerge(),
ExecUpdate(), and ExecDelete() is just confusing.

Sure, ExecUpdate() and ExecDelete() *do* require a little special
handling for MERGE, but ExecMerge() should itself call EvalPlanQual()
for the join quals, rather than getting ExecUpdate()/ExecDelete() to
do it. All that ExecUpdate()/ExecDelete() need to tell ExecMerge() is
"you need to do your HeapTupleUpdated stuff". This not only clarifies
the new EPQ design -- it also lets you do that special case rti EPQ
stuff in the right place (BTW, I'm not a huge fan of that detail in
general, but haven't thought about it enough to say more). The new EPQ
code within ExecUpdate() and ExecDelete() is already identical, so why
not do it that way?
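
To illustrate, that division of labor might look roughly like the sketch
below. This is purely a hypothetical restructuring, not the patch's
current code; in particular, the HTSU_Result return value from
ExecUpdate() is the proposed change:

/* In ExecMerge(), WHEN MATCHED path -- sketch of the suggested structure */
HTSU_Result result;

result = ExecUpdate(mtstate, tupleid, oldtuple, slot, planSlot,
                    epqstate, estate, canSetTag);

if (result == HeapTupleUpdated)
{
    /*
     * Concurrent update/delete: ExecMerge() drives EPQ itself,
     * rechecking MERGE's join quals against the latest tuple in the
     * update chain.
     */
    TupleTableSlot *epqslot;

    epqslot = EvalPlanQual(estate, epqstate, resultRelationDesc,
                           resultRelInfo->ri_RangeTableIndex,
                           LockTupleExclusive, &hufd.ctid, hufd.xmax);
    if (TupIsNull(epqslot))
    {
        /* join quals fail due to concurrent DELETE: WHEN NOT MATCHED */
    }
    else
    {
        /* re-evaluate WHEN ... AND quals against the new tuple */
    }
}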

I especially like this organization because ExecOnConflictUpdate()
already calls ExecUpdate() without expecting it to do EPQ handling at
all, which is kind of similar to what I suggest -- in both cases, the
caller "takes care of all of the details of the scan". And, by
returning a HTSU_Result value to ExecMerge() from
ExecUpdate()/ExecDelete(), you could also do the new cardinality
violation stuff from within ExecMerge() in exactly one place -- no
need to invent new error_on_SelfUpdate arguments.

I would also think about organizing ExecMerge() handling so that
CMD_INSERT handling became WHEN NOT MATCHED handling. It's
unnecessarily confusing to have that included in the same switch

Re: PATCH: Configurable file mode mask

2018-03-05 Thread Michael Paquier
On Mon, Mar 05, 2018 at 03:07:20PM -0500, David Steele wrote:
> On 2/28/18 2:28 AM, Michael Paquier wrote:
>> On Tue, Feb 27, 2018 at 03:52:32PM -0500, David Steele wrote:
>> I don't quite understand here.  I have no objection into extending
>> setup_cluster() with a group_access argument so as the tests run both
>> the group access and without it, but I'd rather keep the low-level API
>> clean of anything fancy if we can use the facility which already
>> exists.
> 
> I'm not sure what you mean by, "facility that already exists".  I think
> I implemented this the same as the allows_streaming and has_archiving
> flags.  The only difference is that this option must be set at initdb
> time rather than in postgresql.conf.
> 
> Could you be more specific?

Let's remove has_group_access from PostgresNode::init and instead use the
existing parameter called "extra".  In your patch, this setup:
has_group_access => 1
is equivalent to that:
extra => [ -g ]
You can also guess the value of has_group_access by parsing the
arguments from the array "extra".
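
That is, the test could simply do something like this (sketch):

use PostgresNode;

# request group access through the existing "extra" parameter
my $node = get_new_node('test');
$node->init(extra => ['-g']);
$node->start;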

> I think I prefer grouping 1 and 2.  It produces less churn, as you say,
> and I don't think MakeDirectoryDefaultPerm really needs its own patch.
> Based on comments so far, nobody has an objection to it.
> 
> In theory, the first two patches could be applied just for refactoring
> without adding group permissions at all.  There are a lot of new tests
> to make sure permissions are set correctly so it seems like a win
> right off.

Okay, that's fine for me.

>> check_pg_data_perm() looks useful even without $allow_group even if the
>> grouping facility is reverted at the end.  S_ISLNK should be considered
>> as well for tablespaces or symlinks of pg_wal?  We have such cases in
>> the regression tests.
> 
> Hmmm, the link is just pointing to a directory whose permissions have
> been changed.  Why do we need to worry about the link?

Oh, perhaps I misread your code here, but this ignores symlinks.  For
example, take an instance where pg_wal is symlinked: we'd likely want to
make sure that at least archive_status is using a correct set of
permissions, no?  There is a "follow" option in File::Find which could
help.
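
Something like this, for instance (an untested sketch; the wanted routine
just prints modes where the real test would check them):

use File::Find;

find(
	{
		# follow symlinks, e.g. a relocated pg_wal
		follow => 1,
		wanted => sub {
			my $mode = (stat $File::Find::name)[2] & 07777;
			printf "%04o %s\n", $mode, $File::Find::name;
		}
	},
	$ENV{PGDATA});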

>> -if (chmod(path, S_IRUSR | S_IWUSR) != 0)
>> -{
>> -fprintf(stderr, _("%s: could not change permissions of \"%s\": %s\n"),
>> -progname, path, strerror(errno));
>> -exit_nicely();
>> -}
>> Hm.  Why are those removed?
> 
> When I started working on this, pg_upgrade did not set umask and instead
> did chmod for each dir created.  umask() has since been added (even
> before my patch) and so these are now noops.  Seemed easier to remove
> them than to change them all.
>
>> setup_signals();
>> 
>> -   umask(S_IRWXG | S_IRWXO);
>> -
>> create_data_directory();
>> This should not be moved around.
> 
> Hmmm - I moved it much earlier in the process which seems like a good
> thing.  Consider if there were an option to fix up permissions, like
> there is for fsync.  Isn't it best to set the mode as soon as possible to
> prevent code from being inserted before it?

Those two are separate issues.  Could you begin a new thread on the
matter?  This will attract more attention.

The indentation in RewindTest.pm is a bit wrong.

-   if (chmod(location, S_IRWXU) != 0)
+   current_umask = umask(0);
+   umask(current_umask);
[...]
-   if (chmod(pg_data, S_IRWXU) != 0)
+   if (chmod(pg_data, PG_DIR_MODE_DEFAULT & ~file_mode_mask) != 0)
Such modifications should be part of patch 3, not 2, where the group
features really apply.

my $test_mode = shift;

+   umask(0077);
+
RewindTest::setup_cluster($test_mode);
What's that for?  A comment would be welcome.

+# make sure all directories and files have group permissions
+if [ $(find ${PGDATA} -type f ! -perm 600 | wc -l) -ne 0 ]; then
+   echo "files in PGDATA with permission != 600";
+   exit 1;
+fi
Perhaps on hold if we are able to move pg_upgrade tests to a TAP
suite...  We'll see what happens.

Perhaps the tests should be cut into a separate patch?  Those are not
directly related to the refactoring done in patch 2.

Patch 2 begins to look fine for me.  I still need to look at patch 3 in
more details.
--
Michael




Re: Faster inserts with mostly-monotonically increasing values

2018-03-05 Thread Claudio Freire
On Mon, Mar 5, 2018 at 10:59 PM, Peter Geoghegan  wrote:
> On Mon, Mar 5, 2018 at 5:48 PM, Claudio Freire  wrote:
>> But in this case there's no right link to follow, so it's a non-issue.
>>
>> BTree doesn't truncate AFAIK, so the cached block number can't be
>> nonexistent (beyond the end of the relation). It's probably fragile to
>> assume BTree will never truncate, so maybe it would be wise to check
>> for that. But if the block exists, I believe it contains a page that
>> can be safely interpreted by that condition. As long as there can be
>> no other pages with the same flags on the index while the cached page
>> is write-locked, that code will be safe.
>
> Introducing any case that allows us to land on a recycled page, and
> reason that it must at least not be the page we were looking for based
> on *any* criteria about the page itself seems very brittle. Yeah, it
> probably won't break in practice, but it's a bad design.

How is this any different from btvacuumscan?

I don't think it can be argued that somehow btvacuumscan has
permission to be brittle and the rest of the code doesn't.

The point is, it's not brittle. The on-disk format of the tree is such
that any block can be catalogued by inspecting its header;
btvacuumscan depends on that. Questions that can be answered by looking
at a page header alone are:

* Is it deleted or new?
* Is it a leaf?
* Is it rightmost?

The only question remaining is whether it's the *only* rightmost leaf,
and that can be guaranteed by locking.
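
Concretely, the kind of check being discussed might look roughly like
this (a sketch only, not the actual patch; it assumes a _bt_doinsert-like
context with rel, natts and itup_scankey in scope):

/* "cached" is a remembered rightmost block, e.g. the relation's
 * target block; InvalidBlockNumber means no cached block yet. */
BlockNumber cached = RelationGetTargetBlock(rel);

if (cached != InvalidBlockNumber)
{
	Buffer		buf = ReadBuffer(rel, cached);
	Page		page;
	BTPageOpaque opaque;

	LockBuffer(buf, BT_WRITE);
	page = BufferGetPage(buf);
	opaque = (BTPageOpaque) PageGetSpecialPointer(page);

	/*
	 * Use the cached page only if it is still the rightmost leaf and
	 * the new key is strictly greater than its last key.
	 */
	if (P_ISLEAF(opaque) && P_RIGHTMOST(opaque) && !P_IGNORE(opaque) &&
		PageGetMaxOffsetNumber(page) >= P_FIRSTDATAKEY(opaque) &&
		_bt_compare(rel, natts, itup_scankey, page,
					PageGetMaxOffsetNumber(page)) > 0)
	{
		/* fast path: insert here, skipping the descent from the root */
	}
	else
		UnlockReleaseBuffer(buf);	/* fall back to a normal descent */
}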

So, can a flag check result in a random outcome? No. That would also
cause a random outcome for btvacuumscan and then vacuum would corrupt
the index just as well.

And now that we mention this... why is the code not using _bt_getbuf?
It already does quite a bit of sanity checking that is IMO quite
desirable here.



Re: [PATCH] GET DIAGNOSTICS FUNCTION_NAME

2018-03-05 Thread Yugo Nagata
On Thu, 1 Mar 2018 14:29:58 -0800
Andres Freund  wrote:

> Hi,
> 
> On 2018-01-11 11:03:26 +0900, Yugo Nagata wrote:
> > However, I don't insist on this patch, so if nobody else needs this
> > feature, I'll withdraw it.
> 
> Given this is where the discussion dried up more than a month ago I'm
> inclined to mark this as rejected unless somebody wants to argue
> otherwise?

I have no objection.

Thanks,

> 
> Greetings,
> 
> Andres Freund


-- 
Yugo Nagata 



Re: pgsql: Clone extended stats in CREATE TABLE (LIKE INCLUDING ALL)

2018-03-05 Thread Alvaro Herrera
Thomas Munro wrote:
> On Tue, Mar 6, 2018 at 11:39 AM, Alvaro Herrera wrote:
> > Clone extended stats in CREATE TABLE (LIKE INCLUDING ALL)
> 
> Is this commit responsible for this valgrind error?
> 
> https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=lousyjack&dt=2018-03-05%2023%3A03%3A01
> 
> ==430== Invalid read of size 2
> ==430==at 0x4C33F6F: memcpy@GLIBC_2.2.5 (vg_replace_strmem.c:1021)
> ==430==by 0x483419: heap_fill_tuple (heaptuple.c:276)
> ==430==by 0x4842C6: heap_form_tuple (heaptuple.c:771)
> ==430==by 0x6474B5: CreateStatistics (statscmds.c:322)
> ==430==by 0x865784: ProcessUtilitySlow (utility.c:1668)

Uh ... it probably is.  I'll take a look tomorrow, but my guess is that
my last-minute removal of the conversion to Name is to blame.

-- 
Álvaro Herrerahttps://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: Kerberos test suite

2018-03-05 Thread Michael Paquier
On Mon, Mar 05, 2018 at 02:45:07PM -0500, Peter Eisentraut wrote:
> On 2/27/18 00:21, Michael Paquier wrote:
>> +my ($stdout, $krb5_version);
>> +IPC::Run::run [ 'krb5-config', '--version' ], '>', \$stdout or die
>> "could not execute krb5-config";
>> +$stdout =~ m/Kerberos 5 release ([0-9]+\.[0-9]+)/ or die "could not get
>> Kerberos version";
>> +$krb5_version = $1;
>> Time for a new routine command_log which executes the command, then
>> returns stdout and stderr to the caller?
> 
> I didn't do anything about this.  Do we have any more places where that
> would be needed right now?

I don't mind if this happens later on.  I can send a patch as well if
necessary.  There are a couple of places where things can be
consolidated using a single entry point:
- 010_pg_archivecleanup.pl
- PostgresNode::psql, where we could merge things.
- PostgresNode::poll_query_until
- PostgresNode::pg_recvlogical_upto
- TestLib::check_pg_config
- TestLib::program_help_ok
- TestLib::program_version_ok
- TestLib::program_options_handling_ok
- TestLib::command_like
- TestLib::command_like_safe
- TestLib::command_fails_like
- TestLib::command_checks_all
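
Such a command_log routine could be as simple as this (an untested
sketch):

sub command_log
{
	my @cmd = @_;
	my ($stdout, $stderr);

	IPC::Run::run(\@cmd, '>', \$stdout, '2>', \$stderr)
	  or die "command \"@cmd\" failed";
	return ($stdout, $stderr);
}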

+if ($ENV{with_gssapi} eq 'yes')
+{
+   plan tests => 4;
+}
+else
+{
+   plan skip_all => 'GSSAPI/Kerberos not supported by this build';
+}
Thanks for adding this sanity check.

+my $kdc_port = int(rand() * 16384) + 49152;
That may not be worth worrying about, as that's an ephemeral port range,
but it can make the test a bit unstable...

Perhaps the tests should be skipped on Windows or just produce an error?
As with the LDAP tests, the libraries are supported on Windows, but the
hardcoded paths make things harder to handle there.
--
Michael




Re: Faster inserts with mostly-monotonically increasing values

2018-03-05 Thread Peter Geoghegan
On Mon, Mar 5, 2018 at 5:48 PM, Claudio Freire  wrote:
> From what I read, both phase 1 & 2 are served by the !P_IGNORE check.

True.

> For the third phase:
>
>> A deleted page can only be reclaimed once there is no scan or search that
>> has a reference to it; until then, it must stay in place with its
>> right-link undisturbed.
>
> That's because:
>
>> A deleted page cannot be reclaimed immediately, since there may be other
>> processes waiting to reference it (ie, search processes that just left the
>> parent, or scans moving right or left from one of the siblings).  These
>> processes must observe that the page is marked dead and recover
>> accordingly.  Searches and forward scans simply follow the right-link
>> until they find a non-dead page --- this will be where the deleted page's
>> key-space moved to.
>
> But in this case there's no right link to follow, so it's a non-issue.
>
> BTree doesn't truncate AFAIK, so the cached block number can't be
> nonexistent (beyond the end of the relation). It's probably fragile to
> assume BTree will never truncate, so maybe it would be wise to check
> for that. But if the block exists, I believe it contains a page that
> can be safely interpreted by that condition. As long as there can be
> no other pages with the same flags on the index while the cached page
> is write-locked, that code will be safe.

Introducing any case that allows us to land on a recycled page, and
reason that it must at least not be the page we were looking for based
on *any* criteria about the page itself seems very brittle. Yeah, it
probably won't break in practice, but it's a bad design.

> Yes, if the scankey is equal to the leftmost key, the first occurrence
> of that key could be on the left, so the check would have to be for
> strictly greater. But that's about as complex as it would get.

That's pretty complicated.

>> I didn't say that the patch is necessarily wrong here. Just that it
>> makes me nervous.
>
> Any change to btree would make anyone nervous ;-)

True. Anyway, this isn't the thing that I'm most concerned about right now.

>> I didn't get what the point of checking the first item on the page
>> (your proposal) was.
>
> Right now, if you've got concurrent transactions, there's absolutely
> no chance this optimization would kick in. For it to happen,
> insertions on the index need to happen in order, each insertion a key
> greater than any other key (visible or not) on the index.
>
> Concurrent inserts on PKs are unlikely to arrive in order; a slight
> disorder is to be expected, as serial sequence numbers get generated a
> significant amount of time before the insert actually takes place.
> Different backends can get to the index insertion at different times,
> causing unordered inserts.
>
> I believe PKs are a prime candidate for this optimization, and
> expecting it to apply only when no concurrency is involved is severely
> dumbing down the optimization.

Pavan justified the patch using a benchmark that only involved a
single client -- hardly typical for a patch that changes the B-Tree
code. If the benefits with many clients can be shown to matter, that
will make this much more interesting to me.

-- 
Peter Geoghegan



Re: BUG #14941: Vacuum crashes

2018-03-05 Thread Andres Freund
On 2018-03-03 16:12:52 -0800, Andres Freund wrote:
> On 2018-01-11 08:14:42 +0900, Michael Paquier wrote:
> > On Wed, Jan 10, 2018 at 05:26:43PM +, Bossart, Nathan wrote:
> > > Perhaps any logging changes for VACOPT_NOWAIT could be handled in a
> > > separate thread.  For now, I've updated 0003 to remove the logging
> > > changes.
> > 
> > Thanks. I am marking those as ready for committer; you are providing the
> > patch set which offers the most consistent experience.
> 
> I was working on committing 0002 and 0003, when I noticed that the
> second patch doesn't actually fully work.  NOWAIT does what it says on
> the tin iff the table is locked with a lower lock level than access
> exclusive.  But if AEL is used, the command is blocked in

I've pushed the first one now. There seems to be no reason to wait with
it, even it takes a bit till we can get the second part right.

- Andres



Re: Re: BUGFIX: standby disconnect can corrupt serialized reorder buffers

2018-03-05 Thread Craig Ringer
On 5 March 2018 at 23:25, David Steele  wrote:

> Hi Craig,
>
> On 1/21/18 5:45 PM, Craig Ringer wrote:
> > On 6 January 2018 at 08:28, Alvaro Herrera wrote:
> >
> > I think this should use ReadDirExtended with an elevel less than ERROR,
> > and do nothing.
> >
> > Why have strcmp(.) and strcmp(..)?  These are going to be skipped by the
> > comparison to "xid" prefix anyway.  Looks like a waste of strcmp
> > processing power.
> >
> > Please don't use bare sprintf() -- upgrade to snprintf.
> >
> > With this coding, if I put a root-owned file "xidfoo" in a replslot
> > directory, it will PANIC the server.  Is that okay?  Why not read the
> > file name with sscanf(), since we know the precise format it has?  Then
> > we don't need to bother with random crap left around.  Maybe a good time
> > to put the "xid-%u-lsn-%X-%X.snap" literal in a macro?   I guess the
> > rationale is that if you let random people put "xidfoo" files in your
> > replication slot dirs, you deserve a PANIC anyway, but I'm not sure.
> >
> > I'm happy to address those comments.
> >
> > The PANIC probably made sense when it was only done on startup, but not
> > now that it's at runtime.
> >
> > The rest is mainly retained from the prior code, but it's a good chance
> > to make those changes.
> This patch was marked Waiting on Author last December.  Do you know when
> you'll have a chance to provide an updated patch?
>
> Given that it's a bug fix it would be good to see a patch for this CF,
> or very soon after.
>
>
Thanks for the reminder. I'll address it today if I can.

-- 
 Craig Ringer   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


Re: 2018-03 CFM

2018-03-05 Thread Andres Freund
On 2018-03-05 20:49:02 -0500, David Steele wrote:
> On 3/5/18 8:33 PM, Thomas Munro wrote:
> > On Tue, Mar 6, 2018 at 1:00 PM, David Steele  wrote:
> >> I've been using commitfest.cputube.org for reference since the last CF,
> >> though I'm unsure if it rechecks patches when master changes, so I do
> >> that manually.  Anyway, that's probably too much to ask since every push
> >> to master would set off 100+ builds on Travis.
> >>
> >> Maybe just reapply once a day when master changes?  And only rebuild if
> >> the patch changes?  Not perfect, but it would work in most cases.
> >>
> >> I use Travis for the pgBackRest project, and I try to be considerate of
> >> the free resources.  I dislike the idea of throwing a ton of builds at
> >> them without considering what would be most useful.
> > 
> > It's triggered by new patches AND new commits.  Since I want to be
> > polite, I try to avoid having more than 2 builds going at a time.
> > They generously invite open source projects like us to run up to 5
> > builds concurrently for free, so I thought I was being at least a
> > little bit considerate, though I guess I could drop it down to 1.  It
> > goes in rotating order of Commitfest ID and waits in between to limit
> > the rate. The constant stream of both commits and patches during a
> > 200+ entry Commitfest mean that it's effectively building around the
> > clock and each CF entry gets retested about twice a day.  Perhaps I
> > could make it smarter... I'll think about that.  Thanks!
> 
> Another option would be to look at their graphs and figure out when
> their peak times are, then ramp up the builds outside that time.
> 
> Even so, 2 builds at a time sounds pretty moderate to me.

Have we pinged them about what they're ok with? In previous interactions
I had with them they were pretty responsive.

Greetings,

Andres Freund



Re: 2018-03 CFM

2018-03-05 Thread David Steele
On 3/5/18 8:33 PM, Thomas Munro wrote:
> On Tue, Mar 6, 2018 at 1:00 PM, David Steele  wrote:
>> I've been using commitfest.cputube.org for reference since the last CF,
>> though I'm unsure if it rechecks patches when master changes, so I do
>> that manually.  Anyway, that's probably too much to ask since every push
>> to master would set off 100+ builds on Travis.
>>
>> Maybe just reapply once a day when master changes?  And only rebuild if
>> the patch changes?  Not perfect, but it would work in most cases.
>>
>> I use Travis for the pgBackRest project, and I try to be considerate of
>> the free resources.  I dislike the idea of throwing a ton of builds at
>> them without considering what would be most useful.
> 
> It's triggered by new patches AND new commits.  Since I want to be
> polite, I try to avoid having more than 2 builds going at a time.
> They generously invite open source projects like us to run up to 5
> builds concurrently for free, so I thought I was being at least a
> little bit considerate, though I guess I could drop it down to 1.  It
> goes in rotating order of Commitfest ID and waits in between to limit
> the rate. The constant stream of both commits and patches during a
> 200+ entry Commitfest mean that it's effectively building around the
> clock and each CF entry gets retested about twice a day.  Perhaps I
> could make it smarter... I'll think about that.  Thanks!

Another option would be to look at their graphs and figure out when
their peak times are, then ramp up the builds outside that time.

Even so, 2 builds at a time sounds pretty moderate to me.

-- 
-David
da...@pgmasters.net



Re: Faster inserts with mostly-monotonically increasing values

2018-03-05 Thread Claudio Freire
On Mon, Mar 5, 2018 at 9:52 PM, Peter Geoghegan  wrote:
> On Mon, Mar 5, 2018 at 4:37 PM, Claudio Freire  wrote:
>> Correct me if I'm wrong, but there's lots of code in nbtree already
>> that risks reading recycled pages for various reasons. Presumably,
>> checking P_ISDELETED and P_ISHALFDEAD should be enough, which is what
>> !P_IGNORE implies.
>
> You're mistaken. Read the nbtree README on page deletion, and the
> RecentGlobalXmin interlock. Note also that there are 3 distinct page
> states in play as far as deletion/recycling is concerned:
>
> 1. The first phase of deletion
> 2. The second phase of deletion.
> 3. Post-recycling, when in general the page could look like just about 
> anything.
>
> It's page deletion's job to make sure that an index scan cannot land
> on a page after it was recycled. The corollary is that it is not
> everyone else's job to make sure that that didn't happen -- how could
> that possibly work?

From what I read, both phase 1 & 2 are served by the !P_IGNORE check.

For the third phase:

> A deleted page can only be reclaimed once there is no scan or search that
> has a reference to it; until then, it must stay in place with its
> right-link undisturbed.

That's because:

> A deleted page cannot be reclaimed immediately, since there may be other
> processes waiting to reference it (ie, search processes that just left the
> parent, or scans moving right or left from one of the siblings).  These
> processes must observe that the page is marked dead and recover
> accordingly.  Searches and forward scans simply follow the right-link
> until they find a non-dead page --- this will be where the deleted page's
> key-space moved to.

But in this case there's no right link to follow, so it's a non-issue.

BTree doesn't truncate AFAIK, so the cached block number can't be
nonexistent (beyond the end of the relation). It's probably fragile to
assume BTree will never truncate, so maybe it would be wise to check
for that. But if the block exists, I believe it contains a page that
can be safely interpreted by that condition. As long as there can be
no other pages with the same flags on the index while the cached page
is write-locked, that code will be safe.

>>>> You could allow for some slack, by comparing against the leftmost key
>>>> instead. You just need to know whether the key fits in the page. Then,
>>>> the insert can proceed as usual, doing the binsearch to find the
>>>> proper insertion point, etc.
>>>
>>> The whole way that this patch opts out of using an exclusive buffer
>>> lock on the "first page the value could be on" (the _bt_check_unique()
>>> + _bt_findinsertloc() stuff) makes me uneasy. I know that that isn't
>>> terribly concrete.
>>
>> Assuming the rightmost page is the first page the value could be on,
>> it already does get an exclusive buffer lock.
>
> This can be quite fiddly. The downlink in the parent can be equal to
> the value in the target page, meaning that we actually lock the
> target's left sibling (see _bt_binsrch() special case for internal
> pages).

I'm not sure which special case you're referring to; I don't see
anything relevant in _bt_binsrch.

Yes, if the scankey is equal to the leftmost key, the first occurrence
of that key could be on the left, so the check would have to be for
strictly greater. But that's about as complex as it would get.

> I didn't say that the patch is necessarily wrong here. Just that it
> makes me nervous.

Any change to btree would make anyone nervous ;-)

>> If one wanted to relax the condition as I proposed, that uniqueness
>> check is necessary, so that's why I propose shortcircuiting the search
>> only, and not the actual insertion on the page. I believe IMO it's
>> more important to try and maximize the conditions under which the
>> optimization can be applied.
>
> I didn't get what the point of checking the first item on the page
> (your proposal) was.

Right now, if you've got concurrent transactions, there's absolutely
no chance this optimization would kick in. For it to happen,
insertions on the index need to happen in order, each insertion a key
greater than any other key (visible or not) on the index.

Concurrent inserts on PKs are unlikely to arrive in order; a slight
disorder is to be expected, as serial sequence numbers get generated a
significant amount of time before the insert actually takes place.
Different backends can get to the index insertion at different times,
causing unordered inserts.

I believe PKs are a prime candidate for this optimization, and
expecting it to apply only when no concurrency is involved is severely
dumbing down the optimization.

Consider 3 transactions on a table like:

CREATE TABLE test ( id serial, d integer );

T1: INSERT INTO test (d) VALUES (3), (4), (5), (6);
T2: INSERT INTO test (d) SELECT generate_series(1,10);
T3: INSERT INTO test (d) VALUES (1);

Each transaction will take values from the sequence at some point of
the evaluation, 

Re: PATCH: Configurable file mode mask

2018-03-05 Thread David Steele
On 3/5/18 8:03 PM, Michael Paquier wrote:
> On Mon, Mar 05, 2018 at 05:11:29PM -0500, Tom Lane wrote:
>> David Steele  writes:
>>> On 2/28/18 2:28 AM, Michael Paquier wrote:
 That's basically a recursive chmod, so chmod_recursive is more adapted?
 I could imagine that this is useful as well for removing group
 permissions, so the new mode could be specified as an argument.
>>
>>> The required package (File::chmod::Recursive) for chmod_recursive is not
>>> in use anywhere else and was not installed when I installed build
>>> dependencies.
> 
> Woah.  I didn't even know that chmod_recursive existed and was part of a
> module.  What I commented on here was to rename the routine you are
> implementing to a more generic name so that other tests could use it.

OK, that is pretty funny.  I thought you were directing me to a Perl
function I hadn't heard of (but did exist).

>>> I'm not sure what the protocol for introducing a new Perl module is?  I
>>> couldn't find packages for the major OSes.  Are we OK with using CPAN?
>>
>> I don't think that's cool.  Anything that's not part of a standard Perl
>> installation is a bit of a lift already, and if it's not packaged by
>> major distros then it's really a problem for many people.  (Yeah, they
>> may know what CPAN is, but they might have local policy issues about
>> installing directly from there.)
> 
> Yes, that's not cool.  I am not pushing in this direction.  Sorry for
> creating confusion with fuzzy wording.
No worries, I'll just make it more generic as suggested.

-- 
-David
da...@pgmasters.net





Re: User defined data types in Logical Replication

2018-03-05 Thread Masahiko Sawada
On Tue, Mar 6, 2018 at 8:35 AM, Alvaro Herrera  wrote:
> I noticed that logicalrep_typmap_gettypname's only caller is
> slot_store_error_callback, which is an error callback.  So by raising an
> error from the former, we would effectively just hide the actual reason
> for the error behind the error that the cache entry cannot be found.
>
> Therefore, I'm inclined to make this function raise a warning, then
> return a substitute value (something like "unrecognized type XYZ").  Do
> the same for the case with the mismatched major versions.  Lastly, if
> LogicalRepTypMap is not defined, also throw a warning and return a
> substitute value rather than try to initialize the hash table:
> initializing the table is not going to make the entry show up in it,
> anyway, so it's pretty pointless; and there's plenty chance for things
> to go wrong, which is not a good idea in an error callback.

I agree with you about not hiding the actual reason for the error, but
if we raise a warning in logicalrep_typmap_gettypname, don't we call
slot_store_error_callback recursively?

>
> Do we need a new TAP test for this?  Judging by
> https://coverage.postgresql.org/src/backend/replication/logical/relation.c.gcov.html
> showing all zeroes for the last function, we do.  Same with
> slot_store_error_callback in
> https://coverage.postgresql.org/src/backend/replication/logical/worker.c.gcov.html
>

Agreed. Will add a TAP test.

Regards,

--
Masahiko Sawada
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center



Re: Change RangeVarGetRelidExtended() to take flags argument?

2018-03-05 Thread Michael Paquier
On Mon, Mar 05, 2018 at 05:21:11PM -0800, Andres Freund wrote:
> Hi,
> 
> On 2018-03-06 10:17:49 +0900, Michael Paquier wrote:
> > On Mon, Mar 05, 2018 at 05:06:59PM -0800, Andres Freund wrote:
> > > Yea, that's a concern. OTOH, it doesn't seem nice to grow duplicates of
> > > similar code. It'd not be too hard to move RangeVarGetRelidExtended()
> > > code into RangeVarGetRelidInternal() and add
> > > RangeVarGetRelidTryLock(). Not sure if that's any better.  Or just add
> > > RangeVarGetRelidExtended2() :)
> > 
> > FWIW, it would have been nice to switch RangeVarGetRelidExtended
> 
> What exactly do you mean with the paste tense here?

s/paste/past/?  I mean "When RangeVarGetRelidExtended was created."

>> so as it handles a set of uint8 flags as one of its arguments.
> 
> Right, that's what I was proposing. Although I'd just go for uint32,
> there's no benefit in uint8 here.

No objection to what you are suggesting here.

>> Avoiding a new flavor of RangeVarGet would also be nice, now that
>> RangeVarGetRelidExtended() is likely popular enough in extensions that
>> many things would break.
> 
> I can't follow?

Please, let's not have RangeVarGetRelidTryLock(), RangeVarGetRelidFoo()
or RangeVarGetRelidHoge().  That would make a third API designed to do
the same thing...
--
Michael




Re: 2018-03 CFM

2018-03-05 Thread Thomas Munro
On Tue, Mar 6, 2018 at 1:00 PM, David Steele  wrote:
> I've been using commitfest.cputube.org for reference since the last CF,
> though I'm unsure if it rechecks patches when master changes, so I do
> that manually.  Anyway, that's probably too much to ask since every push
> to master would set off 100+ builds on Travis.
>
> Maybe just reapply once a day when master changes?  And only rebuild if
> the patch changes?  Not perfect, but it would work in most cases.
>
> I use Travis for the pgBackRest project, and I try to be considerate of
> the free resources.  I dislike the idea of throwing a ton of builds at
> them without considering what would be most useful.

It's triggered by new patches AND new commits.  Since I want to be
polite, I try to avoid having more than 2 builds going at a time.
They generously invite open source projects like us to run up to 5
builds concurrently for free, so I thought I was being at least a
little bit considerate, though I guess I could drop it down to 1.  It
goes in rotating order of Commitfest ID and waits in between to limit
the rate. The constant stream of both commits and patches during a
200+ entry Commitfest mean that it's effectively building around the
clock and each CF entry gets retested about twice a day.  Perhaps I
could make it smarter... I'll think about that.  Thanks!

-- 
Thomas Munro
http://www.enterprisedb.com



Re: pg_get_functiondef forgets about most GUC_LIST_INPUT GUCs

2018-03-05 Thread Michael Paquier
On Mon, Mar 05, 2018 at 09:25:09PM +0100, Pavel Stehule wrote:
> I'm afraid there is no good solution. Would it be possible to store the
> options in their original form? When the function is displayed, the
> original value would be displayed. The current patch (with known
> extensions) can be used as a bug fix and back-patched. In a new version
> we would store the original value with quotes.

You mean storing the values in pg_proc.proconfig at the creation time of
the function?  That would basically work, except for the case of a
function using a parameter which is not from the same PL.  The function
creation would fail because it cannot find the parameter it is looking
for, as we need to look at the loaded parameters to know whether it uses
a list or not :(
--
Michael




Re: Change RangeVarGetRelidExtended() to take flags argument?

2018-03-05 Thread Andres Freund
Hi,

On 2018-03-06 10:17:49 +0900, Michael Paquier wrote:
> On Mon, Mar 05, 2018 at 05:06:59PM -0800, Andres Freund wrote:
> > Yea, that's a concern. OTOH, it doesn't seem nice to grow duplicates of
> > similar code. It'd not be too hard to move RangeVarGetRelidExtended()
> > code into RangeVarGetRelidInternal() and add
> > RangeVarGetRelidTryLock(). Not sure if that's any better.  Or just add
> > RangeVarGetRelidExtended2() :)
> 
> FWIW, it would have been nice to switch RangeVarGetRelidExtended

What exactly do you mean with the paste tense here?


> so that it handles a set of uint8 flags as one of its arguments.

Right, that's what I was proposing. Although I'd just go for uint32,
there's no benefit in uint8 here.


> Avoiding a new flavor of RangeVarGet would also be nice; now
> RangeVarGetRelidExtended() is likely popular enough in extensions that
> many things would break.

I can't follow?


Greetings,

Andres Freund



Re: Change RangeVarGetRelidExtended() to take flags argument?

2018-03-05 Thread Michael Paquier
On Mon, Mar 05, 2018 at 05:06:59PM -0800, Andres Freund wrote:
> Yea, that's a concern. OTOH, it doesn't seem nice to grow duplicates of
> similar code. It'd not be too hard to move RangeVarGetRelidExtended()
> code into RangeVarGetRelidInternal() and add
> RangeVarGetRelidTryLock(). Not sure if that's any better.  Or just add
> RangeVarGetRelidExtended2() :)

FWIW, it would have been nice to switch RangeVarGetRelidExtended so that
it handles a set of uint8 flags as one of its arguments.  missing_ok and
nowait could be part of that.  Avoiding a new flavor of RangeVarGet
would also be nice; now RangeVarGetRelidExtended() is likely popular
enough in extensions that many things would break.
--
Michael




Re: Change RangeVarGetRelidExtended() to take flags argument?

2018-03-05 Thread Andres Freund
On 2018-03-05 19:57:44 -0500, Tom Lane wrote:
> Andres Freund  writes:
> > One wrinkle in that plan is that it'd not be trivial to discern whether
> > a lock couldn't be acquired or whether the object vanished.  I don't
> > really have a good idea how to tackle that yet.
> 
> Do we really care which case applies?

I think there might be cases where we do. As expand_vacuum_rel()
wouldn't use missing_ok = true, it'd not be an issue for this specific
callsite though.


> But having to mess with the semantics of RangeVarGetRelidExtended seems
> like a good reason not to go down this path...

Yea, that's a concern. OTOH, it doesn't seem nice to grow duplicates of
similar code. It'd not be too hard to move RangeVarGetRelidExtended()
code into RangeVarGetRelidInternal() and add
RangeVarGetRelidTryLock(). Not sure if that's any better.  Or just add
RangeVarGetRelidExtended2() :)

Greetings,

Andres Freund



Re: PATCH: Configurable file mode mask

2018-03-05 Thread Michael Paquier
On Mon, Mar 05, 2018 at 05:11:29PM -0500, Tom Lane wrote:
> David Steele  writes:
>> On 2/28/18 2:28 AM, Michael Paquier wrote:
>>> That's basically a recursive chmod, so chmod_recursive is more adapted?
>>> I could imagine that this is useful as well for removing group
>>> permissions, so the new mode could be specified as an argument.
> 
>> The required package (File::chmod::Recursive) for chmod_recursive is not
>> in use anywhere else and was not installed when I installed build
>> dependencies.

Woah.  I didn't even know that chmod_recursive existed and was part of a
module.  What I commented on here was to rename the routine you are
implementing to a more generic name so that other tests could use it.

>> I'm not sure what the protocol for introducing a new Perl module is?  I
>> couldn't find packages for the major OSes.  Are we OK with using CPAN?
> 
> I don't think that's cool.  Anything that's not part of a standard Perl
> installation is a bit of a lift already, and if it's not packaged by
> major distros then it's really a problem for many people.  (Yeah, they
> may know what CPAN is, but they might have local policy issues about
> installing directly from there.)

Yes, that's not cool.  I am not pushing in this direction.  Sorry for
creating confusion with fuzzy wording.
--
Michael




Re: [COMMITTERS] pgsql: Fix inadequate locking during get_rel_oids().

2018-03-05 Thread Michael Paquier
On Mon, Mar 05, 2018 at 04:34:09PM -0800, Andres Freund wrote:
> Scratch that, we should be going down the
>   /* If caller supplied OID, there's nothing we need do here. */
>   if (OidIsValid(vrel->oid))
>   {
>   oldcontext = MemoryContextSwitchTo(vac_context);
>   vacrels = lappend(vacrels, vrel);
>   MemoryContextSwitchTo(oldcontext);
>   }
> branch in expand_vacuum_rel() for autovacuum, so this shouldn't
> matter. Sorry for the noise

Yes, I don't see a problem.  However I can understand that it is easy to
be confused on those code paths if you are not used to them, and this
area has changed quite a bit over the last few years.  Perhaps we could add an
Assert(IsAutoVacuumWorkerProcess()) to reduce the confusion?
--
Michael




Re: Change RangeVarGetRelidExtended() to take flags argument?

2018-03-05 Thread Tom Lane
Andres Freund  writes:
> One wrinkle in that plan is that it'd not be trivial to discern whether
> a lock couldn't be acquired or whether the object vanished.  I don't
> really have a good idea how to tackle that yet.

Do we really care which case applies?

But having to mess with the semantics of RangeVarGetRelidExtended seems
like a good reason not to go down this path...

regards, tom lane



Re: [COMMITTERS] pgsql: Fix inadequate locking during get_rel_oids().

2018-03-05 Thread Andres Freund
On 2018-03-05 19:53:23 -0500, Tom Lane wrote:
> Andres Freund  writes:
> > Scratch that, we should be going down the
> > /* If caller supplied OID, there's nothing we need do here. */
> > branch in expand_vacuum_rel() for autovacuum, so this shouldn't
> > matter. Sorry for the noise
> 
> But you said you'd seen blocking behind AEL with NOWAIT, so there's
> still a problem for manual vacuums no?

Right. But that facility isn't yet exposed to SQL, just in the patch I'm
reviewing. So there's no issue with the code from the commit I'm
replying to.

In [1] I wrote about one idea how to resolve this for the proposed
patch.

Greetings,

Andres Freund

[1] 
https://www.postgresql.org/message-id/20180306005349.b65whmvj7z6hb...@alap3.anarazel.de



Change RangeVarGetRelidExtended() to take flags argument?

2018-03-05 Thread Andres Freund
Hi,

https://www.postgresql.org/message-id/7327b413-1a57-477f-a6a0-6fd80cb54...@amazon.com
adds a SKIP LOCKED option to vacuum. To avoid blocking in
expand_vacuum_rel()'s call of RangeVarGetRelid(), it open-codes a
SKIP_LOCKED variant (i.e. RangeVarGetRelidExtended()'s nowait, but
without erroring out). I don't like that much.

Therefore I wonder if we should consolidate the existing
RangeVarGetRelidExtended() arguments (missing_ok and nowait) into a
flags argument. That would then allow adding SKIP_LOCKED behaviour.
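
(A minimal sketch of what such a consolidated signature could look like;
the RVR_* flag names here are illustrative assumptions, not a committed
API:

    #define RVR_MISSING_OK   (1 << 0)  /* don't error if relation is gone */
    #define RVR_NOWAIT       (1 << 1)  /* error instead of waiting for lock */
    #define RVR_SKIP_LOCKED  (1 << 2)  /* give up instead of waiting for lock */

    extern Oid RangeVarGetRelidExtended(const RangeVar *relation,
                                        LOCKMODE lockmode, uint32 flags,
                                        RangeVarGetRelidCallback callback,
                                        void *callback_arg);

With something like this, a SKIP_LOCKED caller would get InvalidOid back
when the lock cannot be obtained, which is also where the ambiguity
described below comes from.)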

One wrinkle in that plan is that it'd not be trivial to discern whether
a lock couldn't be acquired or whether the object vanished.  I don't
really have a good idea how to tackle that yet.  We could make the return
value more complicated (and return the relation OID via a parameter) but that
seems like it'd be more invasive.

Greetings,

Andres Freund



Re: [COMMITTERS] pgsql: Fix inadequate locking during get_rel_oids().

2018-03-05 Thread Tom Lane
Andres Freund  writes:
> Scratch that, we should be going down the
>   /* If caller supplied OID, there's nothing we need do here. */
> branch in expand_vacuum_rel() for autovacuum, so this shouldn't
> matter. Sorry for the noise

But you said you'd seen blocking behind AEL with NOWAIT, so there's
still a problem for manual vacuums no?

regards, tom lane



Re: Faster inserts with mostly-monotonically increasing values

2018-03-05 Thread Peter Geoghegan
On Mon, Mar 5, 2018 at 4:37 PM, Claudio Freire  wrote:
> Correct me if I'm wrong, but there's lots of code in nbtree already
> that risks reading recycled pages for various reasons. Presumably,
> checking P_ISDELETED and P_ISHALFDEAD should be enough, which is what
> !P_IGNORE implies.

You're mistaken. Read the nbtree README on page deletion, and the
RecentGlobalXmin interlock. Note also that there are 3 distinct page
states in play as far as deletion/recycling is concerned:

1. The first phase of deletion
2. The second phase of deletion.
3. Post-recycling, when in general the page could look like just about anything.

It's page deletion's job to make sure that an index scan cannot land
on a page after it was recycled. The corollary is that it is not
everyone else's job to make sure that that didn't happen -- how could
that possibly work?

Quite a few places know a bit and must reason about the first
phase of deletion, and a few know about the second, but that's it.

>>> You could allow for some slack, by comparing against the leftmost key
>>> instead. You just need to know whether the key fits in the page. Then,
>>> the insert can proceed as usual, doing the binsearch to find the
>>> proper insertion point, etc.
>>
>> The whole way that this patch opts out of using an exclusive buffer
>> lock on the "first page the value could be on" (the _bt_check_unique()
>> + _bt_findinsertloc() stuff) makes me uneasy. I know that that isn't
>> terribly concrete.
>
> Assuming the rightmost page is the first page the value could be on,
> it already does get an exclusive buffer lock.

This can be quite fiddly. The downlink in the parent can be equal to
the value in the target page, meaning that we actually lock the
target's left sibling (see _bt_binsrch() special case for internal
pages).

I didn't say that the patch is necessarily wrong here. Just that it
makes me nervous.

> If one wanted to relax the condition as I proposed, that uniqueness
> check is necessary, so that's why I propose shortcircuiting the search
> only, and not the actual insertion on the page. IMO it's
> more important to try and maximize the conditions under which the
> optimization can be applied.

I didn't get what the point of checking the first item on the page
(your proposal) was.

-- 
Peter Geoghegan



Re: Faster inserts with mostly-monotonically increasing values

2018-03-05 Thread Claudio Freire
On Mon, Mar 5, 2018 at 9:37 PM, Claudio Freire  wrote:
> Assuming the rightmost page is the first page the value could be on,
> it already does get an exclusive buffer lock.

That made me check, and:

+/*
+ * Acquire exclusive lock on the buffer before doing any checks. This
+ * ensures that the index state cannot change, as far as the rightmost
+ * part of the index is concerned.
+ */
+LockBuffer(buf, BUFFER_LOCK_EXCLUSIVE);

BTree code uses BT_READ and BT_WRITE instead, so that should be:

LockBuffer(buf, BT_WRITE)
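
(For reference, the two spellings are numerically identical; nbtree.h
defines, approximately:

    #define BT_READ   BUFFER_LOCK_SHARE
    #define BT_WRITE  BUFFER_LOCK_EXCLUSIVE

so this is a consistency fix rather than a behavioural one.)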



Re: [COMMITTERS] pgsql: Fix inadequate locking during get_rel_oids().

2018-03-05 Thread Tom Lane
Andres Freund  writes:
> On 2017-09-29 20:26:42 +, Tom Lane wrote:
>> get_rel_oids used to not take any relation locks at all, but that stopped
>> being a good idea with commit 3c3bb9933, which inserted a syscache lookup
>> into the function.  A concurrent DROP TABLE could now produce "cache lookup
>> failed", which we don't want to have happen in normal operation.  The best
>> solution seems to be to transiently take a lock on the relation named by
>> the RangeVar (which also makes the result of RangeVarGetRelid a lot less
>> spongy).  But we shouldn't hold the lock beyond this function, because we
>> don't want VACUUM to lock more than one table at a time.  (That would not
>> be a big problem right now, but it will become one after the pending
>> feature patch to allow multiple tables to be named in VACUUM.)

> I'm not sure how big a problem this is, but I suspect it is
> one. Autovacuum used to skip relations when they're locked and the
> vacuum isn't an anti-wraparound one.  But after this change it appears
> we'll get stuck behind this new lock, even if VACOPT_NOWAIT is
> specified.  That's bad because now relations that are AEL locked
> (e.g. because of some longrunning DDL) can block global autovacuum
> progress.

Hm.  Maybe we should take this lock with nowait if the vacuum option
is specified?

Another idea is to revert both this patch and 3c3bb9933, and instead
handle partitioning more like we handle recursion for toast tables, i.e.
make no decisions until after we do have a lock on the relation.  The
really fundamental problem here is that 3c3bb9933 thought it was OK
to do a syscache lookup on a table we have no lock on, which is flat
wrong.  But we don't necessarily have to do it exactly like that.

regards, tom lane



Re: Cache lookup errors with functions manipulation object addresses

2018-03-05 Thread Michael Paquier
On Mon, Mar 05, 2018 at 12:57:38PM +, Aleksander Alekseev wrote:
> It looks like that this patch doesn't apply anymore: 
> http://commitfest.cputube.org/
> 
> The new status of this patch is: Waiting on Author

Yes, this happens because patch 0003 from the last series has been
committed as a26116c6.  Attached is a rebased set, though the patches
themselves have no actual changes, as they still apply correctly.
--
Michael
From 7fedf5f334acf0bddcf2f5672df368302ceb698b Mon Sep 17 00:00:00 2001
From: Michael Paquier 
Date: Tue, 13 Feb 2018 11:12:44 +0900
Subject: [PATCH 1/3] Extend lookup routines for FDW and foreign server with
 NULL handling

The cache lookup routines for foreign-data wrappers and foreign servers
are extended with an extra argument able to control if an error or a
NULL object is returned to the caller in the event of an undefined
object.
---
 contrib/dblink/dblink.c |  2 +-
 contrib/file_fdw/file_fdw.c |  4 ++--
 contrib/postgres_fdw/connection.c   |  4 ++--
 contrib/postgres_fdw/postgres_fdw.c |  8 
 doc/src/sgml/fdwhandler.sgml| 10 --
 src/backend/catalog/objectaddress.c | 12 ++--
 src/backend/commands/foreigncmds.c  | 14 --
 src/backend/commands/tablecmds.c|  8 
 src/backend/foreign/foreign.c   | 28 ++--
 src/include/foreign/foreign.h   |  4 ++--
 10 files changed, 59 insertions(+), 35 deletions(-)

diff --git a/contrib/dblink/dblink.c b/contrib/dblink/dblink.c
index ae7e24ad08..a67febac6f 100644
--- a/contrib/dblink/dblink.c
+++ b/contrib/dblink/dblink.c
@@ -2784,7 +2784,7 @@ get_connect_string(const char *servername)
 		Oid			userid = GetUserId();
 
 		user_mapping = GetUserMapping(userid, serverid);
-		fdw = GetForeignDataWrapper(fdwid);
+		fdw = GetForeignDataWrapper(fdwid, false);
 
 		/* Check permissions, user must have usage on the server. */
 		aclresult = pg_foreign_server_aclcheck(serverid, userid, ACL_USAGE);
diff --git a/contrib/file_fdw/file_fdw.c b/contrib/file_fdw/file_fdw.c
index cf0a3629bc..d837f977e8 100644
--- a/contrib/file_fdw/file_fdw.c
+++ b/contrib/file_fdw/file_fdw.c
@@ -357,8 +357,8 @@ fileGetOptions(Oid foreigntableid,
 	 * Simplify?)
 	 */
 	table = GetForeignTable(foreigntableid);
-	server = GetForeignServer(table->serverid);
-	wrapper = GetForeignDataWrapper(server->fdwid);
+	server = GetForeignServer(table->serverid, false);
+	wrapper = GetForeignDataWrapper(server->fdwid, false);
 
 	options = NIL;
 	options = list_concat(options, wrapper->options);
diff --git a/contrib/postgres_fdw/connection.c b/contrib/postgres_fdw/connection.c
index 00c926b983..3503595b7b 100644
--- a/contrib/postgres_fdw/connection.c
+++ b/contrib/postgres_fdw/connection.c
@@ -182,7 +182,7 @@ GetConnection(UserMapping *user, bool will_prep_stmt)
 	 */
 	if (entry->conn == NULL)
 	{
-		ForeignServer *server = GetForeignServer(user->serverid);
+		ForeignServer *server = GetForeignServer(user->serverid, false);
 
 		/* Reset all transient state fields, to be sure all are clean */
 		entry->xact_depth = 0;
@@ -1003,7 +1003,7 @@ pgfdw_reject_incomplete_xact_state_change(ConnCacheEntry *entry)
 	if (!HeapTupleIsValid(tup))
 		elog(ERROR, "cache lookup failed for user mapping %u", entry->key);
 	umform = (Form_pg_user_mapping) GETSTRUCT(tup);
-	server = GetForeignServer(umform->umserver);
+	server = GetForeignServer(umform->umserver, false);
 	ReleaseSysCache(tup);
 
 	ereport(ERROR,
diff --git a/contrib/postgres_fdw/postgres_fdw.c b/contrib/postgres_fdw/postgres_fdw.c
index e8a0d5482a..590469fe9e 100644
--- a/contrib/postgres_fdw/postgres_fdw.c
+++ b/contrib/postgres_fdw/postgres_fdw.c
@@ -520,7 +520,7 @@ postgresGetForeignRelSize(PlannerInfo *root,
 
 	/* Look up foreign-table catalog info. */
 	fpinfo->table = GetForeignTable(foreigntableid);
-	fpinfo->server = GetForeignServer(fpinfo->table->serverid);
+	fpinfo->server = GetForeignServer(fpinfo->table->serverid, false);
 
 	/*
 	 * Extract user-settable option values.  Note that per-table setting of
@@ -2053,7 +2053,7 @@ postgresIsForeignRelUpdatable(Relation rel)
 	updatable = true;
 
 	table = GetForeignTable(RelationGetRelid(rel));
-	server = GetForeignServer(table->serverid);
+	server = GetForeignServer(table->serverid, false);
 
 	foreach(lc, server->options)
 	{
@@ -3998,7 +3998,7 @@ postgresAcquireSampleRowsFunc(Relation relation, int elevel,
 	 * owner, even if the ANALYZE was started by some other user.
 	 */
 	table = GetForeignTable(RelationGetRelid(relation));
-	server = GetForeignServer(table->serverid);
+	server = GetForeignServer(table->serverid, false);
 	user = GetUserMapping(relation->rd_rel->relowner, table->serverid);
 	conn = GetConnection(user, false);
 
@@ -4221,7 +4221,7 @@ postgresImportForeignSchema(ImportForeignSchemaStmt *stmt, Oid serverOid)
 	 * Get connection to the foreign server.  Connection manager will
 	 * establish new connection if necessary.
 	 */
-	server = 

Re: Faster inserts with mostly-monotonically increasing values

2018-03-05 Thread Claudio Freire
On Mon, Mar 5, 2018 at 9:12 PM, Peter Geoghegan  wrote:
> On Thu, Mar 1, 2018 at 3:18 PM, Claudio Freire  wrote:
>> The key there is strictly greater than the rightmost key. As such, it
>> must precede the first page with an equal key, and since it's the
>> rightmost page... there's no such key. But if there was, the check for
>> uniqueness only needs the insert point to point to the first such
>> page, and this guarantees it.
>
> This code assumes that it can use block number as a stable way of
> finding the same point in the B-Tree over time, without any interlock.
> That may actually work out in practice, since even if the page is
> recycled there can only ever be one rightmost leaf page, but it seems
> like a brittle approach to me. The page image may be recycled by the
> FSM already, and we really shouldn't be depending on the page not
> looking a particular way once that has happened. I guess that you can
> say the same thing about using page LSN to determine cached block
> staleness instead, but that still seems a lot safer.
>
> Basically, I'm worried that we could do something entirely
> unpredictable when a page has actually been recycled, since we're
> entirely opting out of the RecentGlobalXmin interlock business on the
> assumption that we can be sure that recycling hasn't occurred in some
> other way. Imagine what could happen if we ever teach the FSM to share
> freespace among small relations, which seems quite possible to me.

Correct me if I'm wrong, but there's lots of code in nbtree already
that risks reading recycled pages for various reasons. Presumably,
checking P_ISDELETED and P_ISHALFDEAD should be enough, which is what
!P_IGNORE implies.
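
(For reference, P_IGNORE is defined in src/include/access/nbtree.h
roughly as follows:

    #define P_ISDELETED(opaque)  ((opaque)->btpo_flags & BTP_DELETED)
    #define P_ISHALFDEAD(opaque) ((opaque)->btpo_flags & BTP_HALF_DEAD)
    #define P_IGNORE(opaque)     ((opaque)->btpo_flags & (BTP_DELETED|BTP_HALF_DEAD))

Note these bits only mark the two deletion phases; a fully recycled page
carries neither, which is the crux of the disagreement in this thread.)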

>> You could allow for some slack, by comparing against the leftmost key
>> instead. You just need to know whether the key fits in the page. Then,
>> the insert can proceed as usual, doing the binsearch to find the
>> proper insertion point, etc.
>
> The whole way that this patch opts out of using an exclusive buffer
> lock on the "first page the value could be on" (the _bt_check_unique()
> + _bt_findinsertloc() stuff) makes me uneasy. I know that that isn't
> terribly concrete.

Assuming the rightmost page is the first page the value could be on,
it already does get an exclusive buffer lock.

And it seems that assumption is correct in the patch as-is. In fact,
the current patch checks for a much stronger situation than needed to
apply this optimization, so it can even skip checking for conflicting
concurrent transactions. With an exclusive lock on the buffer, and
being the rightmost page, it means there can be no conflicting key
since the check is that the scan key is strictly greater than the
rightmost key (let's forget for a moment empty rightmost pages). Unless
there can be a new rightmost page in construction somehow (which I
don't believe can happen; new rightmost pages would be created by
splitting the rightmost page, and that would conflict with the
exclusive lock), I don't see how this can fail.

If one wanted to relax the condition as I proposed, that uniqueness
check is necessary, so that's why I propose shortcircuiting the search
only, and not the actual insertion on the page. IMO it's
more important to try and maximize the conditions under which the
optimization can be applied.



Re: [COMMITTERS] pgsql: Fix inadequate locking during get_rel_oids().

2018-03-05 Thread Andres Freund
On 2018-03-05 16:11:52 -0800, Andres Freund wrote:
> Hi Tom,
> 
> On 2017-09-29 20:26:42 +, Tom Lane wrote:
> > get_rel_oids used to not take any relation locks at all, but that stopped
> > being a good idea with commit 3c3bb9933, which inserted a syscache lookup
> > into the function.  A concurrent DROP TABLE could now produce "cache lookup
> > failed", which we don't want to have happen in normal operation.  The best
> > solution seems to be to transiently take a lock on the relation named by
> > the RangeVar (which also makes the result of RangeVarGetRelid a lot less
> > spongy).  But we shouldn't hold the lock beyond this function, because we
> > don't want VACUUM to lock more than one table at a time.  (That would not
> > be a big problem right now, but it will become one after the pending
> > feature patch to allow multiple tables to be named in VACUUM.)
> 
> I'm not sure how big a problem this is, but I suspect it is
> one. Autovacuum used to skip relations when they're locked and the
> vacuum isn't an anti-wraparound one.  But after this change it appears
> we'll get stuck behind this new lock, even if VACOPT_NOWAIT is
> specified.  That's bad because now relations that are AEL locked
> (e.g. because of some longrunning DDL) can block global autovacuum
> progress.

Scratch that, we should be going down the
/* If caller supplied OID, there's nothing we need do here. */
if (OidIsValid(vrel->oid))
{
oldcontext = MemoryContextSwitchTo(vac_context);
vacrels = lappend(vacrels, vrel);
MemoryContextSwitchTo(oldcontext);
}
branch in expand_vacuum_rel() for autovacuum, so this shouldn't
matter. Sorry for the noise

Greetings,

Andres Freund



Re: Faster inserts with mostly-monotonically increasing values

2018-03-05 Thread Peter Geoghegan
On Thu, Mar 1, 2018 at 3:18 PM, Claudio Freire  wrote:
> given...
>
> +if (P_ISLEAF(lpageop) && P_RIGHTMOST(lpageop) &&
> +!P_INCOMPLETE_SPLIT(lpageop) &&
> +!P_IGNORE(lpageop) &&
> +(PageGetFreeSpace(page) > itemsz) &&
> +_bt_compare(rel, natts, itup_scankey, page,
> PageGetMaxOffsetNumber(page)) > 0)

One simple problem with this code is that it assumes that there must
be at least one item on a leaf page, which isn't guaranteed. There has
to be a highkey on non-rightmost pages, but rightmost leaf pages can be
totally empty. While btvacuumpage() is very unlikely to not be able to
begin deleting the page there and then, it can happen (see remarks at
the end of btvacuumpage() about recursing).
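
(A hypothetical guard against the empty-page case, reusing the names from
the quoted hunk; on a rightmost page P_FIRSTDATAKEY is P_HIKEY, so this
simply requires at least one live item. An illustration only, not the
author's patch:

    if (P_ISLEAF(lpageop) && P_RIGHTMOST(lpageop) && !P_IGNORE(lpageop) &&
        PageGetMaxOffsetNumber(page) >= P_FIRSTDATAKEY(lpageop) &&
        PageGetFreeSpace(page) > itemsz &&
        _bt_compare(rel, natts, itup_scankey, page,
                    PageGetMaxOffsetNumber(page)) > 0)
    {
        /* safe to take the rightmost-page shortcut */
    }
)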

Other issues spotted:

* The variable name "lpageop" seems like it was lifted from somewhere
dealing with a page to the left of some other page, which is not the
case here.

* P_INCOMPLETE_SPLIT() checks a bit that can only be set on a
non-rightmost page -- a page that has a sibling to its right that
doesn't have a downlink in parent. The bit is only ever set on the
"target" of a (still unfinished) page split. This is why
_bt_moveright() doesn't bother with completing a page split when it
reaches a rightmost page. I don't see why you need this part of the
test at all.

> The key there is strictly greater than the rightmost key. As such, it
> must precede the first page with an equal key, and since it's the
> rightmost page... there's no such key. But if there was, the check for
> uniqueness only needs the insert point to point to the first such
> page, and this guarantees it.

This code assumes that it can use block number as a stable way of
finding the same point in the B-Tree over time, without any interlock.
That may actually work out in practice, since even if the page is
recycled there can only ever be one rightmost leaf page, but it seems
like a brittle approach to me. The page image may be recycled by the
FSM already, and we really shouldn't be depending on the page not
looking a particular way once that has happened. I guess that you can
say the same thing about using page LSN to determine cached block
staleness instead, but that still seems a lot safer.

Basically, I'm worried that we could do something entirely
unpredictable when a page has actually been recycled, since we're
entirely opting out of the RecentGlobalXmin interlock business on the
assumption that we can be sure that recycling hasn't occurred in some
other way. Imagine what could happen if we ever teach the FSM to share
freespace among small relations, which seems quite possible to me.

> You could allow for some slack, by comparing against the leftmost key
> instead. You just need to know whether the key fits in the page. Then,
> the insert can proceed as usual, doing the binsearch to find the
> proper insertion point, etc.

The whole way that this patch opts out of using an exclusive buffer
lock on the "first page the value could be on" (the _bt_check_unique()
+ _bt_findinsertloc() stuff) makes me uneasy. I know that that isn't
terribly concrete.

-- 
Peter Geoghegan



Re: [COMMITTERS] pgsql: Fix inadequate locking during get_rel_oids().

2018-03-05 Thread Andres Freund
Hi Tom,

On 2017-09-29 20:26:42 +, Tom Lane wrote:
> get_rel_oids used to not take any relation locks at all, but that stopped
> being a good idea with commit 3c3bb9933, which inserted a syscache lookup
> into the function.  A concurrent DROP TABLE could now produce "cache lookup
> failed", which we don't want to have happen in normal operation.  The best
> solution seems to be to transiently take a lock on the relation named by
> the RangeVar (which also makes the result of RangeVarGetRelid a lot less
> spongy).  But we shouldn't hold the lock beyond this function, because we
> don't want VACUUM to lock more than one table at a time.  (That would not
> be a big problem right now, but it will become one after the pending
> feature patch to allow multiple tables to be named in VACUUM.)

I'm not sure how big a problem this is, but I suspect it is
one. Autovacuum used to skip relations when they're locked and the
vacuum isn't an anti-wraparound one.  But after this change it appears
we'll get stuck behind this new lock, even if VACOPT_NOWAIT is
specified.  That's bad because now relations that are AEL locked
(e.g. because of some longrunning DDL) can block global autovacuum
progress.

I noticed this when reviewing a patch that adds NOWAIT (now renamed to
SKIP LOCKED) as a user visible knob and it getting stuck behind an
AEL. The updated version of the patch
http://archives.postgresql.org/message-id/7327B413-1A57-477F-A6A0-6FD80CB5482D%40amazon.com
has an attempt at fixing the issue, although I've not yet looked at it
in any sort of detail.

I suspect we should break out that part once reviewed, and backpatch to
10?

Greetings,

Andres Freund



Re: 2018-03 CFM

2018-03-05 Thread David Steele
On 3/5/18 6:06 PM, Thomas Munro wrote:
> On Sat, Mar 3, 2018 at 1:18 AM, Aleksander Alekseev
>>
>> I don't think commitfest.cputube.org has the SQL data on whether patches
>> pass the tests. It just displays SVG images from travis-ci.org. Also
>> unfortunately both commitfest.postgresql.org and commitfest.cputube.org
>> currently don't have any kind of public API and don't allow to export
>> data, e.g. in CSV or JSON.
> 
> My current plan is to propose that we post automated apply/build
> results into the Commitfest app itself so that it can show them, and
> then shut down commitfest.cputube.org.

+1

> The reason commitfest.cputube.org is so limited, not integrated yet
> and running on a strange domain name is because I decided to build an
> independent proof-of-concept first.  I imagined that negotiating with
> many busy people on how such a thing should work and what would even
> be possible first would be more likely to stall, based on previous
> experiences with humans :-D

I've been using commitfest.cputube.org for reference since the last CF,
though I'm unsure if it rechecks patches when master changes, so I do
that manually.  Anyway, that's probably too much to ask since every push
to master would set off 100+ builds on Travis.

Maybe just reapply once a day when master changes?  And only rebuild if
the patch changes?  Not perfect, but it would work in most cases.

I use Travis for the pgBackRest project, and I try to be considerate of
the free resources.  I dislike the idea of throwing a ton of builds at
them without considering what would be most useful.

Regards,
-- 
-David
da...@pgmasters.net



RE: [bug fix] pg_rewind takes long time because it mistakenly copies data files

2018-03-05 Thread Tsunakawa, Takayuki
From: Fujii Masao [mailto:masao.fu...@gmail.com]
> Thanks for the patch! Pushed.

Thank you.  I'm glad to see you again on this list.

Regards
Takayuki Tsunakawa



Re: PATCH: Configurable file mode mask

2018-03-05 Thread David Steele
On 3/5/18 5:51 PM, Tom Lane wrote:
> David Steele  writes:
>> On 3/5/18 5:11 PM, Tom Lane wrote:
>>> David Steele  writes:
 I'm not sure what the protocol for introducing a new Perl module is?  I
 couldn't find packages for the major OSes.  Are we OK with using CPAN?
> 
>>> I don't think that's cool.  Anything that's not part of a standard Perl
>>> installation is a bit of a lift already, and if it's not packaged by
>>> major distros then it's really a problem for many people.  (Yeah, they
>>> may know what CPAN is, but they might have local policy issues about
>>> installing directly from there.)
> 
>> It's a little different here, because these packages are only used for
>> testing/development.
> 
> True, but if we want this test to be part of either check-world or the
> buildfarm run, that's still a lot of people we're asking to install
> the extra module.  If we're talking about saving just a few dozen
> lines of code, it ain't worth it.

+1.

-- 
-David
da...@pgmasters.net



Re: jsonpath

2018-03-05 Thread Nikita Glukhov

On 06.03.2018 00:29, Tomas Vondra wrote:


Hi,

The patch no longer applies - it got broken by fd1a421fe66 which changed
columns in pg_proc. A rebase is needed.

Fixing it is pretty simle, so I've done that locally and tried to run
'make check' under valgrind. And I got a bunch of reports about
uninitialised values.


Thank you very much for your discovery. It is my fault that I never ran
PostgreSQL under valgrind. Rebased and fixed patches are attached.


  Full report attached, but in general there seem to
be two types of failures:

   Conditional jump or move depends on uninitialised value(s)
  at 0x57BB47A: vfprintf (in /usr/lib64/libc-2.24.so)
  by 0x57E5478: vsnprintf (in /usr/lib64/libc-2.24.so)
  by 0xA926D3: pvsnprintf (psprintf.c:121)
  by 0x723E03: appendStringInfoVA (stringinfo.c:130)
  by 0x723D58: appendStringInfo (stringinfo.c:87)
  by 0x76BEFF: _outCoerceViaIO (outfuncs.c:1413)
  by 0x776F99: outNode (outfuncs.c:3978)
  by 0x76D7E7: _outJsonCoercion (outfuncs.c:1779)
  by 0x777CB9: outNode (outfuncs.c:4398)
  by 0x76D507: _outJsonExpr (outfuncs.c:1752)
  by 0x777CA1: outNode (outfuncs.c:4395)
  by 0x767000: _outList (outfuncs.c:187)
  by 0x776874: outNode (outfuncs.c:3753)
  by 0x76A4D2: _outTableFunc (outfuncs.c:1068)
  by 0x776D89: outNode (outfuncs.c:3912)
  by 0x7744FD: _outRangeTblEntry (outfuncs.c:3209)
  by 0x777959: outNode (outfuncs.c:4290)
  by 0x767000: _outList (outfuncs.c:187)
  by 0x776874: outNode (outfuncs.c:3753)
  by 0x773713: _outQuery (outfuncs.c:3049)
Uninitialised value was created by a stack allocation
  at 0x5B0C19: base_yyparse (gram.c:26287)

This happens when _outCoerceViaIO tries to output 'location' field
(that's line 1413), so I guess it's not set/copied somewhere.


Yes, JSON FORMAT location was not set in gram.y.
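
(That is, the usual gram.y pattern of capturing the token position was
presumably missing somewhere; with hypothetical node and field names:

    n = makeNode(JsonFormat);   /* node name illustrative */
    n->format = fmt;
    n->location = @1;           /* this assignment was the missing piece */

so _outCoerceViaIO later printed an uninitialised location.)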


The second failure looks like this:

   Conditional jump or move depends on uninitialised value(s)
  at 0x49E58B: ginFillScanEntry (ginscan.c:72)
  by 0x49EB56: ginFillScanKey (ginscan.c:221)
  by 0x49EF72: ginNewScanKey (ginscan.c:369)
  by 0x4A3875: gingetbitmap (ginget.c:1807)
  by 0x4F620B: index_getbitmap (indexam.c:727)
  by 0x6EE342: MultiExecBitmapIndexScan (nodeBitmapIndexscan.c:104)
  by 0x6DA8F8: MultiExecProcNode (execProcnode.c:506)
  by 0x6EC53D: BitmapHeapNext (nodeBitmapHeapscan.c:118)
  by 0x6DC26D: ExecScanFetch (execScan.c:95)
  by 0x6DC308: ExecScan (execScan.c:162)
  by 0x6ED7E5: ExecBitmapHeapScan (nodeBitmapHeapscan.c:730)
  by 0x6DA80A: ExecProcNodeFirst (execProcnode.c:446)
  by 0x6E5961: ExecProcNode (executor.h:239)
  by 0x6E5E25: fetch_input_tuple (nodeAgg.c:406)
  by 0x6E8091: agg_retrieve_direct (nodeAgg.c:1736)
  by 0x6E7C84: ExecAgg (nodeAgg.c:1551)
  by 0x6DA80A: ExecProcNodeFirst (execProcnode.c:446)
  by 0x6D1361: ExecProcNode (executor.h:239)
  by 0x6D3BB7: ExecutePlan (execMain.c:1721)
  by 0x6D1917: standard_ExecutorRun (execMain.c:361)
Uninitialised value was created by a heap allocation
  at 0xA64FDC: palloc (mcxt.c:858)
  by 0x938636: gin_extract_jsonpath_query (jsonb_gin.c:630)
  by 0x938AB6: gin_extract_jsonb_query (jsonb_gin.c:746)
  by 0xA340C0: FunctionCall7Coll (fmgr.c:1201)
  by 0x49EE7F: ginNewScanKey (ginscan.c:313)
  by 0x4A3875: gingetbitmap (ginget.c:1807)
  by 0x4F620B: index_getbitmap (indexam.c:727)
  by 0x6EE342: MultiExecBitmapIndexScan (nodeBitmapIndexscan.c:104)
  by 0x6DA8F8: MultiExecProcNode (execProcnode.c:506)
  by 0x6EC53D: BitmapHeapNext (nodeBitmapHeapscan.c:118)
  by 0x6DC26D: ExecScanFetch (execScan.c:95)
  by 0x6DC308: ExecScan (execScan.c:162)
  by 0x6ED7E5: ExecBitmapHeapScan (nodeBitmapHeapscan.c:730)
  by 0x6DA80A: ExecProcNodeFirst (execProcnode.c:446)
  by 0x6E5961: ExecProcNode (executor.h:239)
  by 0x6E5E25: fetch_input_tuple (nodeAgg.c:406)
  by 0x6E8091: agg_retrieve_direct (nodeAgg.c:1736)
  by 0x6E7C84: ExecAgg (nodeAgg.c:1551)
  by 0x6DA80A: ExecProcNodeFirst (execProcnode.c:446)
  by 0x6D1361: ExecProcNode (executor.h:239)

So the extra_data allocated in gin_extract_jsonpath_query() get to
ginFillScanEntry() uninitialised.


Yes, only the first element of extra_data[] was initialized and used later in
gin_consistent_jsonb(), but ginFillScanEntry() wants all of them to be
initialized.
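
(The class of fix implied here is small; one sketch, not necessarily the
exact patch, is to allocate the array zeroed so every element starts
initialised:

    /* in gin_extract_jsonpath_query(), instead of a plain palloc() */
    *extra_data = (Pointer *) palloc0(sizeof(Pointer) * nentries);

or else to explicitly assign all nentries elements, not just the first.)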


Both seem like valid issues, I think.

regards


--
Nikita Glukhov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company



sqljson_jsonpath_v11.tar.gz
Description: application/gzip


Re: 2018-03 Commitfest Summary (Andres #1)

2018-03-05 Thread Andres Freund
Hi,

On 2018-03-04 11:03:05 +0100, Fabien COELHO wrote:
> > A lot of contributors, including serial ones, don't even remotely put in
> > as much resources reviewing other people's patches as they use up in
> > reviewer and committer bandwidth. You certainly have contributed more
> > patches than you've reviewed for example.
> 
> Please check your numbers before criticising someone unduly.

I did.  I filtered emails by threads, and counted the number of
messages.


> > Note that pgbench code does add work, even if one is not using the new
> > features. As you know, I was working on performance and robustness
> > improvements, and to make sure they are and stay correct I was
> > attempting to compile postgres with -fsanitize=overflow - which fails
> > because pgbench isn't overflow safe. I reported that, but you didn't
> > follow up with fixes.
> 
> Indeed. AFAICR you did it before, I think that I reviewed it, it was not a
> period for which I had a lot of available time, and I did not feel it was
> something that urgent to fix because there was no practical impact. I would
> have done it later, probably.

It's still not fixed.
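
(For context, the kind of change being asked for is mechanical; a sketch
using the checked-arithmetic helpers from src/include/common/int.h, with
hypothetical operand names:

    #include "common/int.h"

    int64   result;

    if (pg_add_s64_overflow(lval, rval, &result))
    {
        fprintf(stderr, "bigint add out of range\n");
        exit(1);
    }

i.e. replace raw int64 operators, which trip -fsanitize=overflow, with
pg_add_s64_overflow() and friends, which report overflow explicitly.)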

Greetings,

Andres Freund



Re: User defined data types in Logical Replication

2018-03-05 Thread Alvaro Herrera
I noticed that logicalrep_typmap_gettypname's only caller is
slot_store_error_callback, which is an error callback.  So by raising an
error from the former, we would effectively just hide the actual reason
for the error behind the error that the cache entry cannot be found.

Therefore, I'm inclined to make this function raise a warning, then
return a substitute value (something like "unrecognized type XYZ").  Do
the same for the case with the mismatched major versions.  Lastly, if
LogicalRepTypMap is not defined, also throw a warning and return a
substitute value rather than try to initialize the hash table:
initializing the table is not going to make the entry show up in it,
anyway, so it's pretty pointless; and there's plenty chance for things
to go wrong, which is not a good idea in an error callback.
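
(A minimal sketch of that fallback shape; the messages and exact
structure here are illustrative, not committed code:

    char *
    logicalrep_typmap_gettypname(Oid remoteid)
    {
        LogicalRepTyp *entry;
        bool        found;

        if (LogicalRepTypMap == NULL)
        {
            elog(WARNING, "logical replication type map is not initialized");
            return psprintf("unrecognized type %u", remoteid);
        }

        entry = hash_search(LogicalRepTypMap, (void *) &remoteid,
                            HASH_FIND, &found);
        if (!found)
        {
            elog(WARNING, "no type map entry for remote type %u", remoteid);
            return psprintf("unrecognized type %u", remoteid);
        }

        return entry->typname;
    }
)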

Do we need a new TAP test for this?  Judging by
https://coverage.postgresql.org/src/backend/replication/logical/relation.c.gcov.html
showing all zeroes for the last function, we do.  Same with
slot_store_error_callback in
https://coverage.postgresql.org/src/backend/replication/logical/worker.c.gcov.html

Thanks,

-- 
Álvaro Herrerahttps://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: [HACKERS] Creating backup history files for backups taken from standbys

2018-03-05 Thread Michael Paquier
On Tue, Mar 06, 2018 at 02:23:19AM +0900, Fujii Masao wrote:
> I have no objection to mark the patch "returned with feedback".

Yes, I have done so; it's way too late.
--
Michael




Re: [HACKERS] Restrict concurrent update/delete with UPDATE of partition key

2018-03-05 Thread Andres Freund
Hi,

On 2018-02-13 12:41:26 +0530, amul sul wrote:
> From 08c8c7ece7d9411e70a780dbeed89d81419db6b6 Mon Sep 17 00:00:00 2001
> From: Amul Sul 
> Date: Tue, 13 Feb 2018 12:37:52 +0530
> Subject: [PATCH 1/2] Invalidate ip_blkid v5
> 
> v5:
>  - Added code in heap_mask to skip wal_consistency_checking[7]
>  - Fixed previous todos.
> 
> v5-wip2:
>  - Minor changes in RelationFindReplTupleByIndex() and
>RelationFindReplTupleSeq()
> 
>  - TODO;
>Same as the previous
> 
> v5-wip: Update w.r.t Amit Kapila's comments[6].
>  - Reverted error message in nodeModifyTable.c from 'tuple to be locked'
>to 'tuple to be updated'.
> 
>  - TODO:
>  1. Yet to made a decision of having LOG/ELOG/ASSERT in the
> RelationFindReplTupleByIndex() and RelationFindReplTupleSeq().
> 
> v4: Rebased on "UPDATE of partition key v35" patch[5].
> 
> v3: Update w.r.t Amit Kapila's[3] & Alvaro Herrera[4] comments
>  - typo in all error message and comment : "to an another" -> "to another"
>  - error message change : "tuple to be updated" -> "tuple to be locked"
>  - In ExecOnConflictUpdate(), error report converted into assert &
>   comments added.
> 
> v2: Updated w.r.t Robert review comments[2]
>  - Updated couple of comment of heap_delete argument and ItemPointerData
>  - Added same concurrent update error logic in ExecOnConflictUpdate,
>RelationFindReplTupleByIndex and RelationFindReplTupleSeq
> 
> v1: Initial version -- as per Amit Kapila's suggestions[1]
>  - When tuple is being moved to another partition then ip_blkid in the
>tuple header mark to InvalidBlockNumber.

Very nice and instructive to keep this in a submission's commit message.



> diff --git a/src/backend/access/heap/heapam.c 
> b/src/backend/access/heap/heapam.c
> index 8a846e7dba..f4560ee9cb 100644
> --- a/src/backend/access/heap/heapam.c
> +++ b/src/backend/access/heap/heapam.c
> @@ -3037,6 +3037,8 @@ xmax_infomask_changed(uint16 new_infomask, uint16 
> old_infomask)
>   *   crosscheck - if not InvalidSnapshot, also check tuple against this
>   *   wait - true if should wait for any conflicting update to commit/abort
>   *   hufd - output parameter, filled in failure cases (see below)
> + *   row_moved - true iff the tuple is being moved to another partition
> + *   table due to an update of partition key.  Otherwise, false.
>   *

I don't think 'row_moved' is a good variable name for this. Moving a row
in our heap format can mean a lot of things. Maybe 'to_other_part' or
'changing_part'?


> + /*
> +  * Sets a block identifier to the InvalidBlockNumber to indicate such an
> +  * update being moved tuple to another partition.
> +  */
> + if (row_moved)
> + BlockIdSet(&((tp.t_data->t_ctid).ip_blkid), InvalidBlockNumber);

The parens here are set in a bit weird way. I assume that's from copying
it out of ItemPointerSet()?  Why aren't you just using 
ItemPointerSetBlockNumber()?


I think it'd be better if we followed the example of speculative inserts
and created an equivalent of HeapTupleHeaderSetSpeculativeToken. That'd
be a heck of a lot easier to grep for...
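
(A sketch of such grep-able helpers, by analogy with the
speculative-insert macros; the names are hypothetical, not the eventual
committed form:

    #define HeapTupleHeaderSetMovedPartitions(tup) \
        BlockIdSet(&((tup)->t_ctid.ip_blkid), InvalidBlockNumber)

    #define HeapTupleHeaderIndicatesMovedPartitions(tup) \
        (!BlockNumberIsValid(BlockIdGetBlockNumber(&((tup)->t_ctid.ip_blkid))))
)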


> @@ -9314,6 +9323,18 @@ heap_mask(char *pagedata, BlockNumber blkno)
>*/
>   if (HeapTupleHeaderIsSpeculative(page_htup))
>   ItemPointerSet(&page_htup->t_ctid, blkno, off);
> +
> + /*
> +  * For a deleted tuple, a block identifier is set to the

I think this 'the' is superfluous.


> +  * InvalidBlockNumber to indicate that the tuple has been moved to
> +  * another partition due to an update of partition key.

But I think it should be 'the partition key'.


> +  * Like speculative tuple, to ignore any inconsistency set block
> +  * identifier to current block number.

This doesn't quite parse.


> +  */
> + if (!BlockNumberIsValid(
> + BlockIdGetBlockNumber((&((page_htup->t_ctid).ip_blkid)))))
> + BlockIdSet(&((page_htup->t_ctid).ip_blkid), blkno);
>   }

That formatting looks wrong. I think it should be replaced by a macro
as mentioned above.


>   /*
> diff --git a/src/backend/commands/trigger.c b/src/backend/commands/trigger.c
> index 160d941c00..a770531e14 100644
> --- a/src/backend/commands/trigger.c
> +++ b/src/backend/commands/trigger.c
> @@ -3071,6 +3071,11 @@ ltrmark:;
>   ereport(ERROR,
>   (errcode(ERRCODE_T_R_SERIALIZATION_FAILURE),
>errmsg("could not serialize access due to concurrent update")));
> + if (!BlockNumberIsValid(BlockIdGetBlockNumber(&((hufd.ctid).ip_blkid))))
> +  

Re: Weird failures on lorikeet

2018-03-05 Thread Thomas Munro
On Thu, Feb 22, 2018 at 7:06 AM, Andres Freund  wrote:
> Hi Andrew,
>
> I noticed your animal lorikeet failed in the last two runs:
> https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=lorikeet&dt=2018-02-21%2009%3A47%3A17
> TRAP: FailedAssertion("!(((PageHeader) (page))->pd_special >= 
> (__builtin_offsetof (PageHeaderData, pd_linp)))", File: 
> "/home/andrew/bf64/root/HEAD/pgsql/src/include/storage/bufpage.h", Line: 313)
>
> https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=lorikeet&dt=2018-02-20%2012%3A46%3A17
> TRAP: FailedAssertion("!(PMSignalState->PMChildFlags[slot] == 1)", File: 
> "/home/andrew/bf64/root/HEAD/pgsql.build/../pgsql/src/backend/storage/ipc/pmsignal.c",
>  Line: 229)
> 2018-02-20 08:07:14.054 EST [5a8c1c3b.21d0:3] LOG:  select() failed in 
> postmaster: Bad address
> 2018-02-20 08:07:14.073 EST [5a8c1c3b.21d0:4] LOG:  database system is shut 
> down
>
> The difference between the last successfull and the last failing build
> is a single comment typo commit.
>
> It kinda looks like there might be some underlying issue on the machine
> with shared memory going away or such?

Bad memory?  Assorted output from recent runs:

+ ERROR:  index "pg_toast_28546_index" contains corrupted page at block 1

TRAP: FailedAssertion("!(PMSignalState->PMChildFlags[slot] == 1)",
File: 
"/home/andrew/bf64/root/HEAD/pgsql.build/../pgsql/src/backend/storage/ipc/pmsignal.c",
Line: 229)

TRAP: FailedAssertion("!(entry->trans == ((void *)0))", File:
"/home/andrew/bf64/root/HEAD/pgsql.build/../pgsql/src/backend/postmaster/pgstat.c",
Line: 871)

-- 
Thomas Munro
http://www.enterprisedb.com



Re: 2018-03 CFM

2018-03-05 Thread Thomas Munro
On Sat, Mar 3, 2018 at 1:18 AM, Aleksander Alekseev
 wrote:
>> You do realize we have the actual source database available, I hope? Since
>> it's our own system... There is no need to scrape the data back out -- if
>> we can just define what kind of reports we want, we can trivially run it on
>> the source database. Or if we want it more often, we can easily make a
>> webview for it. It's basically just a "map this URL to a SQL query"...
>
> I don't think commitfest.cputube.org has the SQL data on whether patches
> pass the tests. It just displays SVG images from travis-ci.org. Also
> unfortunately both commitfest.postgresql.org and commitfest.cputube.org
> currently don't have any kind of public API and don't allow to export
> data, e.g. in CSV or JSON.
>
> I guess it would be nice if both services supported export, in any
> format, so anyone could build any kind of reports or automation tools
> without parsing HTML with regular expressions or depending on other
> people.
>
> If I'm not mistaken, there was a discussion regarding public APIs.
> I wonder what prevents adding it, at least a simple export of everything.
> After all, it is indeed just mapping URL to a SQL query. For instance,
> this one:
>
> select array_to_json(array_agg(row_to_json(tbl))) from tbl;

Hi Aleksander,

My current plan is to propose that we post automated apply/build
results into the Commitfest app itself so that it can show them, and
then shut down commitfest.cputube.org.  I don't have all the details
figured out yet and I won't be working on a proposal until after this
Commitfest is over.  If you have ideas on how it should receive, store
and show them, please clone the app (pgcommitfest2.git?) and start a
thread (maybe on pgsql-www?).  Here's a question for you to ponder:  I
think there should probably be one "apply" result (success/fail, and a
URL to find the log), but should there be one or many "build/test"
results?  It could show results from different OSes separately (Travis
macOS, Travis Ubuntu, AppVeyor Windows, ...), and it could track more
than one recent result (useful for seeing things that used to pass and
now fail or vice versa).

The reason commitfest.cputube.org is so limited, not integrated yet
and running on a strange domain name is because I decided to build an
independent proof-of-concept first.  I imagined that negotiating with
many busy people on how such a thing should work and what would even
be possible first would be more likely to stall, based on previous
experiences with humans :-D

-- 
Thomas Munro
http://www.enterprisedb.com



Re: jsonpath

2018-03-05 Thread Oleg Bartunov
Thanks, Tomas!

Will publish a new version really soon!

Regards,
Oleg

On Tue, Mar 6, 2018 at 12:29 AM, Tomas Vondra
 wrote:
> Hi,
>
> The patch no longer applies - it got broken by fd1a421fe66 which changed
> columns in pg_proc. A rebase is needed.
>
> Fixing it is pretty simle, so I've done that locally and tried to run
> 'make check' under valgrind. And I got a bunch of reports about
> uninitialised values. Full report attached, but in general there seem to
> be two types of failures:
>
>   Conditional jump or move depends on uninitialised value(s)
>  at 0x57BB47A: vfprintf (in /usr/lib64/libc-2.24.so)
>  by 0x57E5478: vsnprintf (in /usr/lib64/libc-2.24.so)
>  by 0xA926D3: pvsnprintf (psprintf.c:121)
>  by 0x723E03: appendStringInfoVA (stringinfo.c:130)
>  by 0x723D58: appendStringInfo (stringinfo.c:87)
>  by 0x76BEFF: _outCoerceViaIO (outfuncs.c:1413)
>  by 0x776F99: outNode (outfuncs.c:3978)
>  by 0x76D7E7: _outJsonCoercion (outfuncs.c:1779)
>  by 0x777CB9: outNode (outfuncs.c:4398)
>  by 0x76D507: _outJsonExpr (outfuncs.c:1752)
>  by 0x777CA1: outNode (outfuncs.c:4395)
>  by 0x767000: _outList (outfuncs.c:187)
>  by 0x776874: outNode (outfuncs.c:3753)
>  by 0x76A4D2: _outTableFunc (outfuncs.c:1068)
>  by 0x776D89: outNode (outfuncs.c:3912)
>  by 0x7744FD: _outRangeTblEntry (outfuncs.c:3209)
>  by 0x777959: outNode (outfuncs.c:4290)
>  by 0x767000: _outList (outfuncs.c:187)
>  by 0x776874: outNode (outfuncs.c:3753)
>  by 0x773713: _outQuery (outfuncs.c:3049)
>Uninitialised value was created by a stack allocation
>  at 0x5B0C19: base_yyparse (gram.c:26287)
>
> This happens when _outCoerceViaIO tries to output 'location' field
> (that's line 1413), so I guess it's not set/copied somewhere.
>
> The second failure looks like this:
>
>   Conditional jump or move depends on uninitialised value(s)
>  at 0x49E58B: ginFillScanEntry (ginscan.c:72)
>  by 0x49EB56: ginFillScanKey (ginscan.c:221)
>  by 0x49EF72: ginNewScanKey (ginscan.c:369)
>  by 0x4A3875: gingetbitmap (ginget.c:1807)
>  by 0x4F620B: index_getbitmap (indexam.c:727)
>  by 0x6EE342: MultiExecBitmapIndexScan (nodeBitmapIndexscan.c:104)
>  by 0x6DA8F8: MultiExecProcNode (execProcnode.c:506)
>  by 0x6EC53D: BitmapHeapNext (nodeBitmapHeapscan.c:118)
>  by 0x6DC26D: ExecScanFetch (execScan.c:95)
>  by 0x6DC308: ExecScan (execScan.c:162)
>  by 0x6ED7E5: ExecBitmapHeapScan (nodeBitmapHeapscan.c:730)
>  by 0x6DA80A: ExecProcNodeFirst (execProcnode.c:446)
>  by 0x6E5961: ExecProcNode (executor.h:239)
>  by 0x6E5E25: fetch_input_tuple (nodeAgg.c:406)
>  by 0x6E8091: agg_retrieve_direct (nodeAgg.c:1736)
>  by 0x6E7C84: ExecAgg (nodeAgg.c:1551)
>  by 0x6DA80A: ExecProcNodeFirst (execProcnode.c:446)
>  by 0x6D1361: ExecProcNode (executor.h:239)
>  by 0x6D3BB7: ExecutePlan (execMain.c:1721)
>  by 0x6D1917: standard_ExecutorRun (execMain.c:361)
>Uninitialised value was created by a heap allocation
>  at 0xA64FDC: palloc (mcxt.c:858)
>  by 0x938636: gin_extract_jsonpath_query (jsonb_gin.c:630)
>  by 0x938AB6: gin_extract_jsonb_query (jsonb_gin.c:746)
>  by 0xA340C0: FunctionCall7Coll (fmgr.c:1201)
>  by 0x49EE7F: ginNewScanKey (ginscan.c:313)
>  by 0x4A3875: gingetbitmap (ginget.c:1807)
>  by 0x4F620B: index_getbitmap (indexam.c:727)
>  by 0x6EE342: MultiExecBitmapIndexScan (nodeBitmapIndexscan.c:104)
>  by 0x6DA8F8: MultiExecProcNode (execProcnode.c:506)
>  by 0x6EC53D: BitmapHeapNext (nodeBitmapHeapscan.c:118)
>  by 0x6DC26D: ExecScanFetch (execScan.c:95)
>  by 0x6DC308: ExecScan (execScan.c:162)
>  by 0x6ED7E5: ExecBitmapHeapScan (nodeBitmapHeapscan.c:730)
>  by 0x6DA80A: ExecProcNodeFirst (execProcnode.c:446)
>  by 0x6E5961: ExecProcNode (executor.h:239)
>  by 0x6E5E25: fetch_input_tuple (nodeAgg.c:406)
>  by 0x6E8091: agg_retrieve_direct (nodeAgg.c:1736)
>  by 0x6E7C84: ExecAgg (nodeAgg.c:1551)
>  by 0x6DA80A: ExecProcNodeFirst (execProcnode.c:446)
>  by 0x6D1361: ExecProcNode (executor.h:239)
>
> So the extra_data allocated in gin_extract_jsonpath_query() get to
> ginFillScanEntry() uninitialised.
>
>
> Both seem like valid issues, I think.
>
> regards
>
> --
> Tomas Vondra  http://www.2ndQuadrant.com
> PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: postgres_fdw: perform UPDATE/DELETE .. RETURNING on a join directly

2018-03-05 Thread Joe Conway
On 03/05/2018 02:07 PM, Tom Lane wrote:
> So you can revert the rhinoceros config change if you like --- thanks
> for making it so quickly!

Ok, reverted.

> Meanwhile, I'm back to wondering what could possibly have affected
> the planner's estimates, if pg_proc and pg_statistic didn't change.
> I confess bafflement ... but we've now eliminated the autovacuum-
> did-it theory entirely, so it's time to start looking someplace else.
> I wonder if something in the postgres_fdw remote join machinery
> is not as deterministic as it should be.

Do you want me to do anything manual locally on this VM?

Joe

-- 
Crunchy Data - http://crunchydata.com
PostgreSQL Support for Secure Enterprises
Consulting, Training, & Open Source Development





Re: [HACKERS] user-defined numeric data types triggering ERROR: unsupported type

2018-03-05 Thread Tomas Vondra


On 03/05/2018 08:34 PM, Tom Lane wrote:
> Tomas Vondra  writes:
>> On 03/04/2018 09:46 PM, Tom Lane wrote:
>>> Well, I think the existing bytea bug is a counterexample to that.  If
>>> someone were to repeat that mistake with, say, UUID, these tests would not
>>> catch it, because none of them would exercise UUID-vs-something-else.
>>> For that matter, your statement is false on its face, because even if
>>> somebody tried to add say uuid-versus-int8, these tests would not catch
>>> lack of support for that in convert_to_scalar unless the specific case of
>>> uuid-versus-int8 were added to the tests.
> 
>> I suspect we're simply having different expectations what the tests
>> could/should protect against - my intention was to make sure someone
>> does not break convert_to_scalar for the currently handled types.
> 
> I concur that we could use better test coverage for the existing
> code --- the coverage report is pretty bleak there.  But I think we
> could improve that just by testing with the existing operators.  I do
> not see much use in checking that unsupported cross-type cases fail
> cleanly, because there isn't a practical way to have full coverage
> for such a concern.
> 

Hmmm, OK. I'm sure we could improve the coverage of the file in general
by using existing operators, no argument here.

Obviously, that does not work for failure cases in convert_to_scalar(),
because no existing operator can exercise those. I wouldn't call those
cases 'unsupported' - they are pretty obviously supported, just meant to
handle unknown user-defined data types. Admittedly, the operators in the
tests were rather silly, but I find them practical.

In any case, thanks for the discussion! I now understand your reasoning
for not committing the tests, at least.


regards

-- 
Tomas Vondra  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: PATCH: Configurable file mode mask

2018-03-05 Thread Tom Lane
David Steele  writes:
> On 3/5/18 5:11 PM, Tom Lane wrote:
>> David Steele  writes:
>>> I'm not sure what the protocol for introducing a new Perl module is?  I
>>> couldn't find packages for the major OSes.  Are we OK with using CPAN?

>> I don't think that's cool.  Anything that's not part of a standard Perl
>> installation is a bit of a lift already, and if it's not packaged by
>> major distros then it's really a problem for many people.  (Yeah, they
>> may know what CPAN is, but they might have local policy issues about
>> installing directly from there.)

> It's a little different here, because these packages are only used for
> testing/development.

True, but if we want this test to be part of either check-world or the
buildfarm run, that's still a lot of people we're asking to install
the extra module.  If we're talking about saving just a few dozen
lines of code, it ain't worth it.

regards, tom lane



Re: 2018-03 CFM

2018-03-05 Thread Stephen Frost
Greetings,

* David Steele (da...@pgmasters.net) wrote:
> On 3/5/18 4:55 PM, Magnus Hagander wrote:
> > 
> > I would like to get a list of submitter patches totals vs the total
> > number of patches they are reviewing.  In the past I have done this by
> > eyeball.
> > 
> > I think that's pretty much the part that's available under "reports" to
> > use as CF manager? Link at the bottom of the CF page, reports -> Author
> > Stats?
> 
> Aha, I think I found that a long time ago and forgot about it.  Maybe it
> would be more obvious at the top of the page?
> 
> On my wishlist would be a way to mark that I looked at a patch (and
> maybe a comment), then be able to sort by that date.  Right now I'm
> keeping spreadsheets for that.

Agreed.  I ended up using "last email on this thread date" but that
isn't always quite the same.

I'd suggest a way to provide a couple of email addresses for that,
though, in case we actually have a CFM + helpers.

Thanks!

Stephen


signature.asc
Description: PGP signature


Re: STATISTICS retained in CREATE TABLE ... LIKE (INCLUDING ALL)?

2018-03-05 Thread Alvaro Herrera
Pushed now, to branches master and pg10, with Tomas changes.  I made a
few changes of my own

1. you forgot to update various src/backend/nodes/ files
2. I got rid of "NameData stxname" variable in CreateStatistics, which
seems pointless now.  We can work with a cstring only.  Not sure why we
had that one ...
3. I chose not to backpatch the node->stxcomment thing.  It makes me
nervous to modify a parse node.  So cloning the comments is a PG11
thing.  Hopefully it's not *too* bad ...
4. See elsewhere in the thread about list_copy vs. list_concat :-)
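
For readers following along, a minimal illustration of the behavior this
fixes (table and statistics names are made up):

CREATE TABLE t1 (a int, b int);
CREATE STATISTICS t1_stats (dependencies) ON a, b FROM t1;
-- before the fix, the LIKE clause silently dropped the extended
-- statistics; with it, t2 gets a cloned statistics object:
CREATE TABLE t2 (LIKE t1 INCLUDING ALL);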

Thanks,

-- 
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: [HACKERS] GUC for cleanup indexes threshold.

2018-03-05 Thread Alexander Korotkov
On Mon, Mar 5, 2018 at 5:56 AM, Masahiko Sawada 
wrote:

> On Sun, Mar 4, 2018 at 8:59 AM, Alexander Korotkov
>  wrote:
> > On Fri, Mar 2, 2018 at 10:53 AM, Masahiko Sawada 
> > wrote:
> >>
> >> > 2) In the append-only case, index statistics can lag indefinitely.
> >>
> >> The original proposal proposed a new GUC that specifies a fraction of
> >> the modified pages to trigger a cleanup indexes.
> >
> >
> > Regarding original proposal, I didn't get what exactly it's intended to
> be.
> > You're checking if vacuumed_pages >= nblocks *
> vacuum_cleanup_index_scale.
> > But vacuumed_pages is the variable which could be incremented when
> > no indexes exist on the table.  When indexes are present, this variable
> is
> > always
> > zero.  I can assume, that it's intended to compare number of pages where
> > at least one tuple is deleted to nblocks * vacuum_cleanup_index_scale.
> > But that is also not an option for us, because we're going to optimize
> the
> > case when exactly zero tuples is deleted by vacuum.
>
> In the latest v4 patch, I compare scanned_pages and the threshold,
> which means if the number of pages that are modified since the last
> vacuum is larger than the threshold we force cleanup index.
>

Right, sorry, I overlooked that.  However, even if we use the number of
pages, I would still prefer a cumulative measure, so that multiple
vacuums are taken into account even if each of them touched only a small
number of pages.


> > The thing I'm going to propose is to add estimated number of tuples in
> > table to IndexVacuumInfo.  Then B-tree can memorize that number of tuples
> > when last time index was scanned in the meta-page.  If pass value
> > is differs from the value in meta-page too much, then cleanup is forced.
> >
> > Any better ideas?
>
> I think that would work. But I'm concerned about metapage format
> compatibility.


That's not a show-stopper.  The B-tree meta page has a version number, so
it's no problem to provide an online update.


> And since I've not fully investigated about cleanup
> index of other index types I'm not sure that interface makes sense. It
> might not be better but an alternative idea is to add a condition
> (Irel[i]->rd_rel->relam == BTREE_AM_OID) in lazy_scan_heap.


I meant putting this logic *inside* btvacuumcleanup(), while passing the
required measure via IndexVacuumInfo, which is accessible from
btvacuumcleanup().
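
To make that concrete, here is a hedged sketch of the trigger condition I
have in mind; the names (last_cleanup_num_tuples for the value memorized
in the meta-page, cleanup_scale_factor for the proposed GUC) are
illustrative, not actual identifiers from any patch:

#include <stdbool.h>

static bool
need_index_cleanup(double num_heap_tuples,         /* from IndexVacuumInfo */
                   double last_cleanup_num_tuples, /* from the meta-page */
                   double cleanup_scale_factor)    /* proposed GUC */
{
    /* never cleaned up yet, or the meta-page predates the new format */
    if (last_cleanup_num_tuples < 0)
        return true;

    /* force cleanup once the table has grown past the threshold */
    return num_heap_tuples >
           last_cleanup_num_tuples * (1.0 + cleanup_scale_factor);
}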

--
Alexander Korotkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company


Re: PATCH: Configurable file mode mask

2018-03-05 Thread Stephen Frost
Tom, all,

* Tom Lane (t...@sss.pgh.pa.us) wrote:
> David Steele  writes:
> > On 2/28/18 2:28 AM, Michael Paquier wrote:
> >> That's basically a recursive chmod, so chmod_recursive is more adapted?
> >> I could imagine that this is useful as well for removing group
> >> permissions, so the new mode could be specified as an argument.
> 
> > The required package (File::chmod::Recursive) for chmod_recursive is not
> > in use anywhere else and was not installed when I installed build
> > dependencies.
> 
> > I'm not sure what the protocol for introducing a new Perl module is?  I
> > couldn't find packages for the major OSes.  Are we OK with using CPAN?
> 
> I don't think that's cool.  Anything that's not part of a standard Perl
> installation is a bit of a lift already, and if it's not packaged by
> major distros then it's really a problem for many people.  (Yeah, they
> may know what CPAN is, but they might have local policy issues about
> installing directly from there.)

Agreed.

Thanks!

Stephen


signature.asc
Description: PGP signature


Re: PATCH: Configurable file mode mask

2018-03-05 Thread David Steele
Hi Tom,

On 3/5/18 5:11 PM, Tom Lane wrote:
> David Steele  writes:
>> On 2/28/18 2:28 AM, Michael Paquier wrote:
>>> That's basically a recursive chmod, so chmod_recursive is more adapted?
>>> I could imagine that this is useful as well for removing group
>>> permissions, so the new mode could be specified as an argument.
> 
>> The required package (File::chmod::Recursive) for chmod_recursive is not
>> in use anywhere else and was not installed when I installed build
>> dependencies.
> 
>> I'm not sure what the protocol for introducing a new Perl module is?  I
>> couldn't find packages for the major OSes.  Are we OK with using CPAN?
> 
> I don't think that's cool.  Anything that's not part of a standard Perl
> installation is a bit of a lift already, and if it's not packaged by
> major distros then it's really a problem for many people.  (Yeah, they
> may know what CPAN is, but they might have local policy issues about
> installing directly from there.)

This is my view as well.  I don't think CPAN should ever be on a
production box, mostly because of all the other tools that need to be
installed to make it work.

It's a little different here, because these packages are only used for
testing/development.

Of course, maybe I have this wrong and Michael will point out my error.
If not, I'm happy to rework the function (about 15 lines) to be more
generic.

-- 
-David
da...@pgmasters.net



Re: PATCH: Configurable file mode mask

2018-03-05 Thread Tom Lane
David Steele  writes:
> On 2/28/18 2:28 AM, Michael Paquier wrote:
>> That's basically a recursive chmod, so chmod_recursive is more adapted?
>> I could imagine that this is useful as well for removing group
>> permissions, so the new mode could be specified as an argument.

> The required package (File::chmod::Recursive) for chmod_recursive is not
> in use anywhere else and was not installed when I installed build
> dependencies.

> I'm not sure what the protocol for introducing a new Perl module is?  I
> couldn't find packages for the major OSes.  Are we OK with using CPAN?

I don't think that's cool.  Anything that's not part of a standard Perl
installation is a bit of a lift already, and if it's not packaged by
major distros then it's really a problem for many people.  (Yeah, they
may know what CPAN is, but they might have local policy issues about
installing directly from there.)

regards, tom lane



Re: STATISTICS retained in CREATE TABLE ... LIKE (INCLUDING ALL)?

2018-03-05 Thread Alvaro Herrera
I admit to much head-scratching, erasing my entire ccache cache, the
autoconf cache and doing two complete rebuilds from scratch, because 
I was seeing 40 errors in regression tests.  But it
turned out to be about this hunk, which was identical to the idea I had
while skimming David's original, "hey why don't we just copy the list":

> +/*
> + * transformExtendedStatistics
> + *   handle extended statistics
> + *
> + * Right now, there's nothing to do here, so we just copy the list.
> + */
>  static void
>  transformExtendedStatistics(CreateStmtContext *cxt)
>  {
> - ListCell *lc;
> -
> - foreach(lc, cxt->extstats)
> - cxt->alist = lappend(cxt->alist, lfirst(lc));
> + cxt->alist = list_copy(cxt->extstats);
>  }
>  
>  /*

But as it turns out, it's wrong!  list_concat() is what is needed here,
not list_copy.
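
To spell out the distinction (a sketch; whether the inner list_copy is
needed depends on whether cxt->extstats is used again afterwards):

/* list_copy() *replaces* alist, silently dropping anything queued there: */
cxt->alist = list_copy(cxt->extstats);

/* list_concat() preserves the existing entries, matching what the
 * original foreach/lappend loop did: */
cxt->alist = list_concat(cxt->alist, list_copy(cxt->extstats));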

-- 
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: postgres_fdw: perform UPDATE/DELETE .. RETURNING on a join directly

2018-03-05 Thread Tom Lane
Joe Conway  writes:
> On 03/05/2018 11:19 AM, Tom Lane wrote:
>> Joe, I wonder if you could add "log_autovacuum_min_duration = 0" to
>> rhinoceros' extra_config options, temporarily?  Correlating that log
>> output with the log_statement output from the test proper would let
>> us confirm or deny whether it's autovacuum.

> Done just now. Do you want me to force a run?

Both of these runs

https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=rhinoceros&dt=2018-03-05%2020%3A35%3A00
https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=rhinoceros&dt=2018-03-05%2021%3A45%3A02

appear to exonerate autovacuum, and the second seems to destroy the
behind-the-scenes-ANALYZE theory entirely, since there was no change
in the outputs of the extra instrumentation queries.  (In theory
there's a window for ANALYZE to have occurred in between, but that's
not very credible.)

So you can revert the rhinoceros config change if you like --- thanks
for making it so quickly!

Meanwhile, I'm back to wondering what could possibly have affected
the planner's estimates, if pg_proc and pg_statistic didn't change.
I confess bafflement ... but we've now eliminated the autovacuum-
did-it theory entirely, so it's time to start looking someplace else.
I wonder if something in the postgres_fdw remote join machinery
is not as deterministic as it should be.

regards, tom lane



Re: 2018-03 CFM

2018-03-05 Thread David Steele
Hi Magnus,

On 3/5/18 4:55 PM, Magnus Hagander wrote:
> 
> I would like to get a list of submitter patches totals vs the total
> number of patches they are reviewing.  In the past I have done this by
> eyeball.
> 
> I think that's pretty much the part that's available under "reports" to
> use as CF manager? Link at the bottom of the CF page, reports -> Author
> Stats?

Aha, I think I found that a long time ago and forgot about it.  Maybe it
would be more obvious at the top of the page?

On my wishlist would be a way to mark that I looked at a patch (and
maybe a comment), then be able to sort by that date.  Right now I'm
keeping spreadsheets for that.

Thanks,
-- 
-David
da...@pgmasters.net



Re: JIT compiling with LLVM v11

2018-03-05 Thread Andres Freund
On 2018-03-05 13:36:04 -0800, Andres Freund wrote:
> On 2018-03-05 16:19:52 -0500, Peter Eisentraut wrote:
> > Testing 0732ee73cf3ffd18d0f651376d69d4798d351ccc on Debian testing ...
> > 
> > The build works out of the box with whatever the default system packages
> > are.
> > 
> > Regression tests crash many times.  One backtrace looks like this:
> > 
> > #0  __GI_raise (sig=sig@entry=6) at ../sysdeps/unix/sysv/linux/raise.c:51
> > #1  0x7fd5b1730231 in __GI_abort () at abort.c:79
> > #2  0x55c10a1555e3 in ExceptionalCondition
> > (conditionName=conditionName@entry=0x7fd5a245c2d8
> > "!(LLVMGetIntrinsicID(fn))",
> > errorType=errorType@entry=0x7fd5a245bb1d "FailedAssertion",
> > fileName=fileName@entry=0x7fd5a245c294 "llvmjit_expr.c",
> > lineNumber=lineNumber@entry=193) at assert.c:54
> > #3  0x7fd5a245510f in get_LifetimeEnd (mod=mod@entry=0x55c10b1db670)
> > at llvmjit_expr.c:193
> > #4  0x7fd5a24553c8 in get_LifetimeEnd (mod=0x55c10b1db670) at
> > llvmjit_expr.c:233
> > #5  BuildFunctionCall (context=context@entry=0x55c10b0ca340,
> > builder=builder@entry=0x55c10b225160,
> > mod=mod@entry=0x55c10b1db670, fcinfo=0x55c10b1a08b0,
> > v_fcinfo_isnull=v_fcinfo_isnull@entry=0x7ffc701f5c60)
> > at llvmjit_expr.c:244
> 
> Hm, that should be trivial to fix.  Which version of llvm are you
> building against? There appear to be a lot of them in testing:
https://packages.debian.org/search?keywords=llvm+dev&searchon=names&suite=testing&section=all

On Debian unstable, I built against a wide variety of branches:

for v in 3.9 4.0 5.0 6.0; do
    rm -f ../config.cache
    CLANG="ccache clang-$v" LLVM_CONFIG=/usr/lib/llvm-$v/bin/llvm-config \
        ../config.sh --with-llvm && make -j16 -s install && make -s check
done

All of those pass. I'll create a testing chroot.

Regards,

Andres



Re: 2018-03 CFM

2018-03-05 Thread Magnus Hagander
On Mon, Mar 5, 2018 at 3:48 PM, David Steele  wrote:

> Hi Aleksander,
>
> On 3/2/18 7:18 AM, Aleksander Alekseev wrote:
> >
> >> You do realize we have the actual source database available, I hope?
> Since
> >> it's our own system... There is no need to scrape the data back out --
> if
> >> we can just define what kind of reports we want, we can trivially run
> it on
> >> the source database. Or if we want it more often, we can easily make a
> >> webview for it. It's basically just a "map this URL to a SQL query"...
> >
> > I don't think commitfest.cputube.org has the SQL data on whether patch
> > pass the tests. It just displays SVG images from travis-ci.org. Also
> > unfortunately both commitfest.postgresql.org and commitfest.cputube.org
> > currently don't have any kind of public API and don't allow to export
> > data, e.g. in CSV or JSON.
> >
> > I guess it would be nice if both services supported export, in any
> > format, so anyone could build any kind of reports or automation tools
> > without parsing HTML with regular expressions or depending on other
> > people.
>
> Yes, that would be good.  I just had a chance to look through the data
> and the thing I was most hoping to do with it would be a bit complicated.
>
> I would like to get a list of submitter patches totals vs the total
> number of patches they are reviewing.  In the past I have done this by
> eyeball.
>
>
I think that's pretty much the part that's available under "reports" to use
as CF manager? Link at the bottom of the CF page, reports -> Author Stats?


-- 
 Magnus Hagander
 Me: https://www.hagander.net/ 
 Work: https://www.redpill-linpro.com/ 


Re: JIT compiling with LLVM v11

2018-03-05 Thread Andres Freund
Hi,

On 2018-03-05 12:17:30 -0800, Andres Freund wrote:
> Writing up a patch that I can actually push.  Thanks both to Thomas and
> Peter for pointing me towards this issue!

After screwing up the first attempt at a fix, the second one seems to work
nicely. With optimizations, inlining, etc all core tests pass on
Thomas' machine.

Greetings,

Andres Freund



Re: JIT compiling with LLVM v11

2018-03-05 Thread Andres Freund
On 2018-03-05 16:19:52 -0500, Peter Eisentraut wrote:
> Testing 0732ee73cf3ffd18d0f651376d69d4798d351ccc on Debian testing ...
> 
> The build works out of the box with whatever the default system packages
> are.
> 
> Regression tests crash many times.  One backtrace looks like this:
> 
> #0  __GI_raise (sig=sig@entry=6) at ../sysdeps/unix/sysv/linux/raise.c:51
> #1  0x7fd5b1730231 in __GI_abort () at abort.c:79
> #2  0x55c10a1555e3 in ExceptionalCondition
> (conditionName=conditionName@entry=0x7fd5a245c2d8
> "!(LLVMGetIntrinsicID(fn))",
> errorType=errorType@entry=0x7fd5a245bb1d "FailedAssertion",
> fileName=fileName@entry=0x7fd5a245c294 "llvmjit_expr.c",
> lineNumber=lineNumber@entry=193) at assert.c:54
> #3  0x7fd5a245510f in get_LifetimeEnd (mod=mod@entry=0x55c10b1db670)
> at llvmjit_expr.c:193
> #4  0x7fd5a24553c8 in get_LifetimeEnd (mod=0x55c10b1db670) at
> llvmjit_expr.c:233
> #5  BuildFunctionCall (context=context@entry=0x55c10b0ca340,
> builder=builder@entry=0x55c10b225160,
> mod=mod@entry=0x55c10b1db670, fcinfo=0x55c10b1a08b0,
> v_fcinfo_isnull=v_fcinfo_isnull@entry=0x7ffc701f5c60)
> at llvmjit_expr.c:244

Hm, that should be trivial to fix.  Which version of llvm are you
building against? There appear to be a lot of them in testing:
https://packages.debian.org/search?keywords=llvm+dev&searchon=names&suite=testing&section=all

Greetings,

Andres Freund



Re: Kerberos test suite

2018-03-05 Thread Thomas Munro
On Tue, Mar 6, 2018 at 8:45 AM, Peter Eisentraut
 wrote:
> New patch attached.

Passes here.  LGTM.

My only complaint is your assumption that 'darwin' implies Homebrew
installation paths, but you already did that in other tests before
this one.  Perhaps we can sort that out globally in some future patch.

PS Once this lands I'll adjust my Commitfest testing bot to use
PG_TEST_EXTRA='kerberos ldap ssl'.

-- 
Thomas Munro
http://www.enterprisedb.com



Re: STATISTICS retained in CREATE TABLE ... LIKE (INCLUDING ALL)?

2018-03-05 Thread Tomas Vondra


On 03/05/2018 08:57 PM, Alvaro Herrera wrote:
> Tomas Vondra wrote:
> 
>> 4) I see you've added generateClonedExtStatsStmt to parse_utilcmd.h, but
>> it was only really called in parse_utilcmd.c, so I've made it static. I
>> don't think we need to expose stuff unnecessarily.
> 
>> BTW the last point made me thinking, because parse_utilcmd.h also
>> exposes generateClonedIndexStmt. That is however necessary, because it's
>> called from DefineRelation when copying indexes from partitioned table
>> to partitions. I'm wondering - shouldn't we do the same thing for
>> extended statistics?
> 
> Maybe, but that would not be a bugfix anymore.  So if we do want that,
> that is definitely a new feature, so it should be its own patch; the
> copying of indexes to partitions is a new feature in pg11.
> 

Sure, I wasn't really suggesting squashing that into this bugfix.
-- 
Tomas Vondra  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: select_parallel test failure: gather sometimes losing tuples (maybe during rescans)?

2018-03-05 Thread Tomas Vondra


On 03/05/2018 09:37 PM, Thomas Munro wrote:
> On Tue, Mar 6, 2018 at 9:17 AM, Robert Haas  wrote:
>> The optimistic approach seems a little bit less likely to slow this
>> down on systems where barriers are expensive, so I committed that one.
>> Thanks for debugging this; I hope this fixes it, but I guess we'll
>> see.
> 
> Thanks.
> 
> For the record, the commit message (written by me) should have
> acknowledged Tomas's help as reviewer and tester.  Sorry about that.
> 

Meh. You've done the hard work of figuring out what's wrong. The commit
message is perfectly fine.

regards

-- 
Tomas Vondra  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: jsonpath

2018-03-05 Thread Tomas Vondra
Hi,

The patch no longer applies - it got broken by fd1a421fe66 which changed
columns in pg_proc. A rebase is needed.

Fixing it is pretty simple, so I've done that locally and tried to run
'make check' under valgrind. And I got a bunch of reports about
uninitialised values. Full report attached, but in general there seem to
be two types of failures:

  Conditional jump or move depends on uninitialised value(s)
 at 0x57BB47A: vfprintf (in /usr/lib64/libc-2.24.so)
 by 0x57E5478: vsnprintf (in /usr/lib64/libc-2.24.so)
 by 0xA926D3: pvsnprintf (psprintf.c:121)
 by 0x723E03: appendStringInfoVA (stringinfo.c:130)
 by 0x723D58: appendStringInfo (stringinfo.c:87)
 by 0x76BEFF: _outCoerceViaIO (outfuncs.c:1413)
 by 0x776F99: outNode (outfuncs.c:3978)
 by 0x76D7E7: _outJsonCoercion (outfuncs.c:1779)
 by 0x777CB9: outNode (outfuncs.c:4398)
 by 0x76D507: _outJsonExpr (outfuncs.c:1752)
 by 0x777CA1: outNode (outfuncs.c:4395)
 by 0x767000: _outList (outfuncs.c:187)
 by 0x776874: outNode (outfuncs.c:3753)
 by 0x76A4D2: _outTableFunc (outfuncs.c:1068)
 by 0x776D89: outNode (outfuncs.c:3912)
 by 0x7744FD: _outRangeTblEntry (outfuncs.c:3209)
 by 0x777959: outNode (outfuncs.c:4290)
 by 0x767000: _outList (outfuncs.c:187)
 by 0x776874: outNode (outfuncs.c:3753)
 by 0x773713: _outQuery (outfuncs.c:3049)
   Uninitialised value was created by a stack allocation
 at 0x5B0C19: base_yyparse (gram.c:26287)

This happens when _outCoerceViaIO tries to output the 'location' field
(that's line 1413), so I guess it's not set/copied somewhere.

The second failure looks like this:

  Conditional jump or move depends on uninitialised value(s)
 at 0x49E58B: ginFillScanEntry (ginscan.c:72)
 by 0x49EB56: ginFillScanKey (ginscan.c:221)
 by 0x49EF72: ginNewScanKey (ginscan.c:369)
 by 0x4A3875: gingetbitmap (ginget.c:1807)
 by 0x4F620B: index_getbitmap (indexam.c:727)
 by 0x6EE342: MultiExecBitmapIndexScan (nodeBitmapIndexscan.c:104)
 by 0x6DA8F8: MultiExecProcNode (execProcnode.c:506)
 by 0x6EC53D: BitmapHeapNext (nodeBitmapHeapscan.c:118)
 by 0x6DC26D: ExecScanFetch (execScan.c:95)
 by 0x6DC308: ExecScan (execScan.c:162)
 by 0x6ED7E5: ExecBitmapHeapScan (nodeBitmapHeapscan.c:730)
 by 0x6DA80A: ExecProcNodeFirst (execProcnode.c:446)
 by 0x6E5961: ExecProcNode (executor.h:239)
 by 0x6E5E25: fetch_input_tuple (nodeAgg.c:406)
 by 0x6E8091: agg_retrieve_direct (nodeAgg.c:1736)
 by 0x6E7C84: ExecAgg (nodeAgg.c:1551)
 by 0x6DA80A: ExecProcNodeFirst (execProcnode.c:446)
 by 0x6D1361: ExecProcNode (executor.h:239)
 by 0x6D3BB7: ExecutePlan (execMain.c:1721)
 by 0x6D1917: standard_ExecutorRun (execMain.c:361)
   Uninitialised value was created by a heap allocation
 at 0xA64FDC: palloc (mcxt.c:858)
 by 0x938636: gin_extract_jsonpath_query (jsonb_gin.c:630)
 by 0x938AB6: gin_extract_jsonb_query (jsonb_gin.c:746)
 by 0xA340C0: FunctionCall7Coll (fmgr.c:1201)
 by 0x49EE7F: ginNewScanKey (ginscan.c:313)
 by 0x4A3875: gingetbitmap (ginget.c:1807)
 by 0x4F620B: index_getbitmap (indexam.c:727)
 by 0x6EE342: MultiExecBitmapIndexScan (nodeBitmapIndexscan.c:104)
 by 0x6DA8F8: MultiExecProcNode (execProcnode.c:506)
 by 0x6EC53D: BitmapHeapNext (nodeBitmapHeapscan.c:118)
 by 0x6DC26D: ExecScanFetch (execScan.c:95)
 by 0x6DC308: ExecScan (execScan.c:162)
 by 0x6ED7E5: ExecBitmapHeapScan (nodeBitmapHeapscan.c:730)
 by 0x6DA80A: ExecProcNodeFirst (execProcnode.c:446)
 by 0x6E5961: ExecProcNode (executor.h:239)
 by 0x6E5E25: fetch_input_tuple (nodeAgg.c:406)
 by 0x6E8091: agg_retrieve_direct (nodeAgg.c:1736)
 by 0x6E7C84: ExecAgg (nodeAgg.c:1551)
 by 0x6DA80A: ExecProcNodeFirst (execProcnode.c:446)
 by 0x6D1361: ExecProcNode (executor.h:239)

So the extra_data allocated in gin_extract_jsonpath_query() gets to
ginFillScanEntry() uninitialised.


Both seem like valid issues, I think.

regards

-- 
Tomas Vondra  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

Re: [HACKERS] Optional message to user when terminating/cancelling backend

2018-03-05 Thread Daniel Gustafsson
> On 26 Jan 2018, at 00:05, Daniel Gustafsson  wrote:
> 
>> On 24 Jan 2018, at 16:45, Alvaro Herrera  wrote:

>> Maybe have two patches, 0001 creates the files moving the contents over,
>> then 0002 adds your new stuff on top.
> 
> The two attached patches implements this.

Attached are rebased patches to cope with the recent pgproc changes.  I also
made the function cope with NULL messages, not because NULL is a sensible
value, but because I can see this function being fed from a sub-SELECT which
could return NULL.

As per the previous mail, 0001 refactors the signal code according to Alvaro's
suggestion and 0002 implements $subject on top of the refactoring.

cheers ./daniel



0001-Refactor-backend-signalling-code-v7.patch
Description: Binary data


0002-Support-optional-message-in-backend-cancel-terminate-v7.patch
Description: Binary data


Re: JIT compiling with LLVM v11

2018-03-05 Thread Peter Eisentraut
Testing 0732ee73cf3ffd18d0f651376d69d4798d351ccc on Debian testing ...

The build works out of the box with whatever the default system packages
are.

Regression tests crash many times.  One backtrace looks like this:

#0  __GI_raise (sig=sig@entry=6) at ../sysdeps/unix/sysv/linux/raise.c:51
#1  0x7fd5b1730231 in __GI_abort () at abort.c:79
#2  0x55c10a1555e3 in ExceptionalCondition
(conditionName=conditionName@entry=0x7fd5a245c2d8
"!(LLVMGetIntrinsicID(fn))",
errorType=errorType@entry=0x7fd5a245bb1d "FailedAssertion",
fileName=fileName@entry=0x7fd5a245c294 "llvmjit_expr.c",
lineNumber=lineNumber@entry=193) at assert.c:54
#3  0x7fd5a245510f in get_LifetimeEnd (mod=mod@entry=0x55c10b1db670)
at llvmjit_expr.c:193
#4  0x7fd5a24553c8 in get_LifetimeEnd (mod=0x55c10b1db670) at
llvmjit_expr.c:233
#5  BuildFunctionCall (context=context@entry=0x55c10b0ca340,
builder=builder@entry=0x55c10b225160,
mod=mod@entry=0x55c10b1db670, fcinfo=0x55c10b1a08b0,
v_fcinfo_isnull=v_fcinfo_isnull@entry=0x7ffc701f5c60)
at llvmjit_expr.c:244

...

#16 0x55c10a0433ad in exec_simple_query (
query_string=0x55c10b096358 "SELECT COUNT(*) FROM test_tsquery WHERE
keyword <  'new & york';") at postgres.c:1082

-- 
Peter Eisentraut  http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: [PATCH] Verify Checksums during Basebackups

2018-03-05 Thread David Steele
Hi Michael,

On 3/5/18 6:36 AM, Stephen Frost wrote:
> * Michael Banck (michael.ba...@credativ.de) wrote:
>
>> So I guess this would have to be sent back via the replication protocol,
>> but I don't see an off-hand way to do this easily?
> 
> The final ordinary result set could be extended to include the
> information about checksum failures..?  I'm a bit concerned about what
> to do when there are a lot of checksum failures though..  Ideally, you'd
> identify all of the pages in all of the files where a checksum failed
> (just throwing an error such as the one proposed above is really rather
> terrible since you have no idea what block, or even what table, failed
> the checksum...).

I agree that knowing the name of the file that failed validation is
really important, with a list of the pages that failed validation being
a nice thing to have as well, though I would be fine having the latter
added in a future version.

For instance, in pgBackRest we output validation failures this way:

[from a regression test]
WARN: invalid page checksums found in file
[TEST_PATH]/db-primary/db/base/base/32768/33001 at pages 0, 3-5, 7

Note that we collate ranges of errors to keep the output from being too
overwhelming.

I think the file names are very important because there's a rather large
chance that corruption may happen in an index, unlogged table, or
something else that can be rebuilt or reloaded.  Knowing where the
corruption is can save a lot of headaches.

> Reviewing the original patch and considering this issue, I believe there
> may be a larger problem- while very unlikely, there's been concern that
> it's possible to read a half-written page (and possibly only the second
> half) and end up with a checksum failure due to that.  In pgBackRest, we
> address that by doing another read of the page and by checking the LSN
> vs. where we started the backup (if the LSN is more recent than when the
> backup started then we don't have to care about the page- it'll be in
> the WAL).

The need to reread pages can be drastically reduced by skipping
validation of any page that has an LSN >= the backup start LSN, because
such pages will be replayed from WAL during recovery.

The rereads are still necessary because of the possible transposition of
page read vs. page write as Stephen notes above.  We have not been able
to reproduce this case but can't discount it.
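
To make the rule concrete, a minimal sketch of the per-page decision
(pg_checksum_page() and PageGetLSN() are the real backend helpers; the
function itself and its reread policy are illustrative, not the patch's
actual code):

#include "postgres.h"
#include "access/xlogdefs.h"
#include "storage/bufpage.h"
#include "storage/checksum.h"

/* Returns true if the page can be considered valid for backup purposes. */
static bool
backup_page_ok(Page page, BlockNumber blkno, XLogRecPtr backup_start_lsn)
{
    /*
     * Pages with an LSN at or past the backup start will be overwritten
     * by WAL replay during recovery, so their on-disk state is irrelevant.
     */
    if (PageGetLSN(page) >= backup_start_lsn)
        return true;

    /* Otherwise the stored checksum must match the computed one. */
    if (pg_checksum_page((char *) page, blkno) ==
        ((PageHeader) page)->pd_checksum)
        return true;

    /*
     * Mismatch: possibly a torn (half-written) read.  The caller should
     * reread the page once and retry before reporting this block.
     */
    return false;
}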

Regards,
-- 
-David
da...@pgmasters.net



signature.asc
Description: OpenPGP digital signature


Re: [HACKERS] Partition-wise aggregation/grouping

2018-03-05 Thread Robert Haas
On Mon, Mar 5, 2018 at 3:56 AM, Jeevan Chalke
 wrote:
> However, to perform Gather or Gather Merge once we have all partial paths
> ready, and to avoid too many existing code rearrangement, I am calling
> try_partitionwise_grouping() before we do any aggregation/grouping on whole
> relation. By doing this, we will be having all partial paths in
> partially_grouped_rel and then existing code will do required finalization
> along with any Gather or Gather Merge, if required.
>
> Please have a look over attached patch-set and let me know if it needs
> further changes.

This does look better.

+  enable_partitionwise_agg (boolean)

Please don't abbreviate "aggregate" to "agg".

-   /* Build final grouping paths */
-   add_paths_to_grouping_rel(root, input_rel, grouped_rel, target,
-                             partially_grouped_rel, agg_costs,
-                             &agg_final_costs, gd, can_sort, can_hash,
-                             dNumGroups, (List *) parse->havingQual);
+   if (isPartialAgg)
+   {
+       Assert(agg_partial_costs && agg_final_costs);
+       add_paths_to_partial_grouping_rel(root, input_rel,
+                                         partially_grouped_rel,
+                                         agg_partial_costs,
+                                         gd, can_sort, can_hash,
+                                         false, true);
+   }
+   else
+   {
+       double  dNumGroups;
+
+       /* Estimate number of groups. */
+       dNumGroups = get_number_of_groups(root,
+                                         cheapest_path->rows,
+                                         gd,
+                                         child_data ? make_tlist_from_pathtarget(target) : parse->targetList);
+
+       /* Build final grouping paths */
+       add_paths_to_grouping_rel(root, input_rel, grouped_rel, target,
+                                 partially_grouped_rel, agg_costs,
+                                 agg_final_costs, gd, can_sort, can_hash,
+                                 dNumGroups, (List *) havingQual);
+   }

This looks strange.  Why do we go down two completely different code
paths here?  It seems to me that the set of paths we add to the
partial_pathlist shouldn't depend at all on isPartialAgg.  I might be
confused, but it seems to me that any aggregate path we construct that
is going to run in parallel must necessarily be partial, because even
if each group will occur only in one partition, it might still occur
in multiple workers for that partition, so finalization would be
needed.  On the other hand, for non-partial paths, we can add then to
partially_grouped_rel when isPartialAgg = true and to grouped_rel when
isPartialAgg = false, with the only difference being AGGSPLIT_SIMPLE
vs. AGGSPLIT_INITIAL_SERIAL.  But that doesn't appear to be what this
is doing.

+   /*
+* If there are any fully aggregated partial paths present,
may be because
+* of parallel Append over partitionwise aggregates, we must stick a
+* Gather or Gather Merge path atop the cheapest partial path.
+*/
+   if (grouped_rel->partial_pathlist)

This comment is copied from someplace where the code does what the
comment says, but here it doesn't do any such thing.

More tomorrow...

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: PATCH: psql tab completion for SELECT

2018-03-05 Thread Tom Lane
I wrote:
> If people like this approach, I propose to commit this more or less
> as-is.  The select-tab-completion patch would then need to be rewritten
> to use this infrastructure, but I think that should be straightforward.
> As a separate line of work, the infrastructure could be applied to fix
> the pre-existing places where tab completion fails against old servers.
> But that's probably work for v12 or beyond, unless somebody's really
> motivated to do it right now.

Pushed, but while looking it over a second time, I noticed a discrepancy
that ought to be resolved.  In Query_for_list_of_functions, we now have
this for server version >= 11

/* selcondition */
"p.prokind IN ('f', 'w')",

and this for prior versions:

/* selcondition */
NULL,

The prokind variant is as Peter had it, and NULL is what we were using
here in v10 and earlier.  But it strikes me that these are inconsistent,
in that the prokind variant rejects aggregates while the other variant
doesn't.  We should decide which behavior we want and make that happen
consistently regardless of server version.

I believe the primary reason why the old code didn't reject aggregates
is that there is no GRANT ON AGGREGATE syntax, so that we really need to
include aggregates when completing GRANT ... ON FUNCTION.  Also, other
commands such as DROP FUNCTION will accept an aggregate, although that's
arguably just historical laxity.

If we wanted to tighten this up, one way would be to create a separate
Query_for_list_of_functions_or_aggregates that allows both, and use it
for (only) the GRANT/REVOKE case.  I'm not sure it's worth the trouble
though; I do not recall hearing field complaints about the laxity of
the existing completion rules.  So my inclination is to change the
v11 code to be "p.prokind != 'p'" and leave it at that.
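
Concretely, the two filters under discussion (in v11, prokind is 'f' for
a plain function, 'p' for a procedure, 'a' for an aggregate, 'w' for a
window function):

-- current v11 selcondition: rejects both aggregates and procedures
SELECT p.oid::regprocedure FROM pg_proc p WHERE p.prokind IN ('f', 'w');

-- proposed replacement: rejects only procedures, keeping aggregates,
-- which matches the laxer pre-v11 behavior
SELECT p.oid::regprocedure FROM pg_proc p WHERE p.prokind != 'p';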

Thoughts?

regards, tom lane



Re: select_parallel test failure: gather sometimes losing tuples (maybe during rescans)?

2018-03-05 Thread Thomas Munro
On Tue, Mar 6, 2018 at 9:17 AM, Robert Haas  wrote:
> The optimistic approach seems a little bit less likely to slow this
> down on systems where barriers are expensive, so I committed that one.
> Thanks for debugging this; I hope this fixes it, but I guess we'll
> see.

Thanks.

For the record, the commit message (written by me) should have
acknowledged Tomas's help as reviewer and tester.  Sorry about that.

-- 
Thomas Munro
http://www.enterprisedb.com



Re: postgres_fdw: perform UPDATE/DELETE .. RETURNING on a join directly

2018-03-05 Thread Tom Lane
Joe Conway  writes:
> On 03/05/2018 11:19 AM, Tom Lane wrote:
>> Joe, I wonder if you could add "log_autovacuum_min_duration = 0" to
>> rhinoceros' extra_config options, temporarily?  Correlating that log
>> output with the log_statement output from the test proper would let
>> us confirm or deny whether it's autovacuum.

> Done just now. Do you want me to force a run?

Thanks.  I'm about to push something, so no need for a force.

regards, tom lane



Re: PATCH: Configurable file mode mask

2018-03-05 Thread David Steele
On 3/1/18 11:18 PM, Michael Paquier wrote:
> 
> Based on my recent lookup at code level for this feature, the patch for
> pg_resetwal (which could have been discussed on its own thread as well),
> would be fine for commit.  The thing could be extended a bit more but
> there is nothing opposing even a basic test suite to be in.  

There are no core changes, so it doesn't seem like the tests can hurt
anything.

> Then you
> have a set of refactoring patches, which still need some work.

New patches posted today, hopefully those address most of your concerns.

> And
> finally there is a rather invasive patch on top of the whole thing.  

I'm not sure if I would call it invasive since it's an optional feature
that is off by default.  Honestly, I think the refactor in 02 is more
likely to cause problems even if the goal there is *not* to change the
behavior.

> The
> refactoring work shows much more value only after the main feature is
> in, still I think that unifying the default permissions allowed for
> files and directories, as well as mkdir() calls has some value in
> itself to think it as an mergeable, independent, change.  

I agree.

> I think that
> it would be hard to get the whole patch set into the tree by the end of
> the CF though

I hope it does make it, it's a pretty big win for security.

> but cutting the refactoring pieces would be doable.  At
> least it would provide some base for integration in v12.  And the
> refactoring patch has some pieces that would be helpful for TAP tests as
> well.

I've gone pretty big on tests in this patch because I recognize it is a
pretty fundamental behavior change.

Thanks,
-- 
-David
da...@pgmasters.net



signature.asc
Description: OpenPGP digital signature


Re: pg_get_functiondef forgets about most GUC_LIST_INPUT GUCs

2018-03-05 Thread Pavel Stehule
2018-02-21 8:35 GMT+01:00 Michael Paquier :

> On Tue, Feb 20, 2018 at 06:46:57PM +0300, Arthur Zakirov wrote:
> > Just 2 cents from me. It seems that there is a problem with extensions
> > GUCs.
> >
> > [...]
> >
> > =# SELECT pg_get_functiondef('func_with_set_params'::regproc);
> > ERROR:  unrecognized configuration parameter "plpgsql.extra_errors"
>
> You have an excellent point here, and I did not consider it.
> For pg_dump and pg_dumpall, something like what I described in
> https://www.postgresql.org/message-id/20180112012440.ga2...@paquier.xyz
> to extend pg_settings so as GUC types are visible via SQL could make
> sense, now it is really a lot for just being able to quote parameters
> correctly.  On top of that, what I suggested previously would not work
> reliably except if we have pg_get_functiondef load the library
> associated to a given parameter.  Even in this case there could
> perfectly be a custom parameter from another plugin and not a parameter
> associated to the function language itself.
>
> It seems to me that this brings us back to a best-effort for the backend
> and the frontend by maintaining a list of hardcoded parameter names,
> which could be refined a maximum by considering lists for in-core
> extensions and perhaps the most famous extensions around :(
>

I'm afraid there is no good solution.  Would it be possible to store the
options in their original form?  When the function is displayed, the
original value would then be shown.  The current patch (with known
extensions) can be used as a bug fix and back-patched; in a new version we
would store the original value with quotes.
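
To restate the underlying problem for archive readers (a hedged
illustration, not code from the patch):

CREATE FUNCTION f() RETURNS int LANGUAGE sql AS 'SELECT 1'
    SET search_path = public, pg_temp   -- list GUC: two items
    SET work_mem = '4MB';               -- scalar GUC: one quoted literal

-- pg_get_functiondef() must emit "SET search_path TO public, pg_temp"
-- but "SET work_mem TO '4MB'"; for an extension GUC whose library is
-- not loaded, it has no way to tell which form is correct.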

Regards

Pavel



>
>
> Thoughts?
> --
> Michael
>


RE: [HACKERS] Client Connection redirection support for PostgreSQL

2018-03-05 Thread Satyanarayana Narlapuram
Please see the attached patch with the comments.

Changes in the patch:
A client-side PGREDIRECTLIMIT parameter has been introduced to control 
the maximum number of retries. 
BE_v3.1 sends a ProtocolNegotiation message. FE_v3.1 downgrades to v3.0 
upon receipt of this message.
FE falls back to v3.0 if 3.1 is not supported by the server.


>> I hadn't really thought deeply about whether redirection should 
happen before or after authentication.  For the most part, before seems better, 
because it seems a bit silly to force people to authenticate just so that you 
can tell them to go someplace else.  Also, that would lead to double 
authentication, which might for example result in multiple password 
prompts, which users might either dislike or find confusing.  

Yes, redirection before authentication would avoid multiple password 
prompts.

Thanks,
Satya


redirection_v2.patch
Description: redirection_v2.patch


Re: JIT compiling with LLVM v11

2018-03-05 Thread Andres Freund
Hi,

On 2018-03-04 21:00:06 -0800, Andres Freund wrote:
> > Looking at llvm_get_function(), the function that raises that error, I
> > see that there are a few different paths here.  I don't have
> > HAVE_DECL_LLVMORCGETSYMBOLADDRESSIN defined, and I don't have LLVM <
> > 5, so I should be getting the symbol address with
> > LLVMOrcGetSymbolAddress(llvm_opt0_orc, , mangled) or
> > LLVMOrcGetSymbolAddress(llvm_opt3_orc, , mangled), but clearly
> > those are returning NULL.
> 
> Yep. I wonder if this is some symbol naming issue or such, because
> emitting and relocating the object worked without an error.

Thanks to Thomas helping me get access to an OSX machine, I was able to
discover what the issue is.  OSX prepends, for reasons unbeknownst to me,
a leading underscore to all function names.  That led to two issues:
first, JITed functions do not have that underscore, making us look up a
non-existent symbol, because llvm_get_function applied the mangling.
Secondly, llvm_resolve_symbol failed looking up symbol names, because for
$reason dlsym() etc. do *not* have the names prefixed by the
underscore.   Easily enough fixed.
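
For the archives, a hedged sketch of the second half of the fix; the
function name and shape are illustrative, the point is only the
leading-underscore adjustment before dlsym():

#include <dlfcn.h>
#include <stdint.h>

static uint64_t
resolve_symbol(const char *symname)
{
    /*
     * Mach-O mangles C symbol names with a leading underscore, but
     * dlsym() expects the unprefixed name, so strip it on macOS
     * (PostgreSQL itself spells this test __darwin__).
     */
#ifdef __APPLE__
    if (symname[0] == '_')
        symname++;
#endif
    return (uint64_t) (uintptr_t) dlsym(RTLD_DEFAULT, symname);
}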

After that I discovered another problem, the bitcode files for core pg /
contrib modules weren't installed. That turned out to be a make version
issue, I'd used
define install_llvm_module =
# body
but older make only likes
define install_llvm_module
# body

Writing up a patch that I can actually push.  Thanks both to Thomas and
Peter for pointing me towards this issue!

Greetings,

Andres Freund



Re: select_parallel test failure: gather sometimes losing tuples (maybe during rescans)?

2018-03-05 Thread Robert Haas
On Sun, Mar 4, 2018 at 4:46 PM, Thomas Munro
 wrote:
> Thanks!  Here are a couple of patches.  I'm not sure which I prefer.
> The "pessimistic" one looks simpler and is probably the way to go, but
> the "optimistic" one avoids doing an extra read until it has actually
> run out of data and seen mq_detached == true.
>
> I realised that the pg_write_barrier() added to
> shm_mq_detach_internal() from the earlier demonstration/hack patch was
> not needed... I had a notion that SpinLockAcquire() might not include
> a strong enough barrier (unlike SpinLockRelease()), but after reading
> s_lock.h I think it's not needed (since you get either TAS() or a
> syscall-based slow path, both expected to include a full fence).  I
> haven't personally tested this on a weak memory order system.

The optimistic approach seems a little bit less likely to slow this
down on systems where barriers are expensive, so I committed that one.
Thanks for debugging this; I hope this fixes it, but I guess we'll
see.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: postgres_fdw: perform UPDATE/DELETE .. RETURNING on a join directly

2018-03-05 Thread Joe Conway
On 03/05/2018 11:19 AM, Tom Lane wrote:
> Joe, I wonder if you could add "log_autovacuum_min_duration = 0" to
> rhinoceros' extra_config options, temporarily?  Correlating that log
> output with the log_statement output from the test proper would let
> us confirm or deny whether it's autovacuum.


Done just now. Do you want me to force a run?


Joe

-- 
Crunchy Data - http://crunchydata.com
PostgreSQL Support for Secure Enterprises
Consulting, Training, & Open Source Development



signature.asc
Description: OpenPGP digital signature


Re: PATCH: Configurable file mode mask

2018-03-05 Thread David Steele
On 2/28/18 2:28 AM, Michael Paquier wrote:
> On Tue, Feb 27, 2018 at 03:52:32PM -0500, David Steele wrote:
>> On 1/30/18 3:01 AM, Michael Paquier wrote:
> 
>>> +command_ok(
>>> +   ['chmod', "-R", 'g+rX', "$pgdata"],
>>> +   'add group perms to PGDATA');
>>>
>>> That would blow up on Windows.  You should replace that by perl's chmod
>>> and look at its return result for sanity checks, likely with ($cnt != 0).
>>> And let's not use icacls please...
>>
>> Fixed with a new function, add_pg_data_group_perm().
> 
> That's basically a recursive chmod, so chmod_recursive is more adapted?
> I could imagine that this is useful as well for removing group
> permissions, so the new mode could be specified as an argument.

The required package (File::chmod::Recursive) for chmod_recursive is not
in use anywhere else and was not installed when I installed build
dependencies.

I'm not sure what the protocol for introducing a new Perl module is?  I
couldn't find packages for the major OSes.  Are we OK with using CPAN?

>>> +if ($params{has_group_access})
>>> +{
>>> +push(@{$params{extra}}, '-g');
>>> +}
>>> No need to introduce a new parameter here, please use directly extra =>
>>> [ '-g' ] in the routines calling PostgresNode::init.
>>
>> Other code needs to know if group access is enabled on the node (see
>> RewindTest.pm) so it's not just a matter of passing the option.
> 
> I don't quite understand here.  I have no objection into extending
> setup_cluster() with a group_access argument so as the tests run both
> the group access and without it, but I'd rather keep the low-level API
> clean of anything fancy if we can use the facility which already
> exists.

I'm not sure what you mean by, "facility that already exists".  I think
I implemented this the same as the allows_streaming and has_archiving
flags.  The only difference is that this option must be set at initdb
time rather than in postgresql.conf.

Could you be more specific?

>> New patches are attached.  The end result is almost identical to v6 but
>> code was moved from 03 to 02 as per Michael's suggestion.
> 
> Thanks for the updated versions.
> 
>> 1) 01-pgresetwal-test
>>
>> Adds a very basic test suite for pg_resetwal.
> 
> Okay.  This one looks fine to me.  It could be passed down to a
> committer.

Agreed.

>> 2) 02-file-perm
>>
>> Adds a new header (file_perm.h) and makes all front-end utilities use
>> the new constants for setting umask.  Converts mkdir() calls in the
>> backend to MakeDirectoryDefaultPerm() if the original
>> call used default permissions.  Adds tests to make sure permissions in
>> PGDATA are set correctly by the front-end tools.
> 
> I had a more simple (complicated?) thing in mind for the split, which
> would actually cause this patch to be split into two more:
> 1) Introduce file_perm.h and replace all the existing values for modes
> and masks to what the header introduces.  This allows to check if the
> new header is not forgotten anywhere, especially for the frontends.
> 2) Introduce MakeDirectoryDefaultPerm, which switches the backend calls
> of mkdir() to the new API.
> 3) Add the grouping facilities, which updates the default masks and
> modes of file_perm.h.
> 
> Grouping 1) and 2) together as you do makes sense as well I guess, as in
> my proposal the mkdir calls in the backend would be touched twice, still
> things get more incremental.

I think I prefer grouping 1 and 2.  It produces less churn, as you say,
and I don't think MakeDirectoryDefaultPerm really needs its own patch.
Based on comments so far, nobody has an objection to it.
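
For anyone skimming the thread, a minimal sketch of what
MakeDirectoryDefaultPerm amounts to; the constant matches the file_perm.h
hunk quoted below, while the body is my reading of the patch's intent
(error handling elided):

#include <sys/stat.h>

/* from the proposed file_perm.h (group-access variant) */
#define PG_DIR_MODE_DEFAULT (S_IRWXU | S_IRGRP | S_IXGRP)

/* mkdir() with the default mode, so callers stop hardcoding modes */
int
MakeDirectoryDefaultPerm(const char *directoryName)
{
    return mkdir(directoryName, PG_DIR_MODE_DEFAULT);
}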

In theory, the first two patches could be applied just for refactoring
without adding group permissions at all.  There are a lot of new tests
to make sure permissions are set correctly so it seems like a win right off.

> 
> +/*
> + * Default mode for created files.
> + */
> +#define PG_FILE_MODE_DEFAULT   (S_IRUSR | S_IWUSR | S_IRGRP)
> +
> +/*
> + * Default mode for directories.
> + */
> +#define PG_DIR_MODE_DEFAULT    (S_IRWXU | S_IRGRP | S_IXGRP)
> For the first cut, this should not use S_IRGRP in the first declaration,
> and (S_IRGRP | S_IXGRP) in the second declaration.

Done.

> 
> -if (script == NULL && (script = fopen_priv(output_path, "w")) == NULL)
> +if (script == NULL && (script = fopen(output_path, "w")) == NULL)
> Why are those calls in pg_upgrade changed?

Done.

Those should have been removed already.  Originally I had removed
fopen_priv() because it didn't serve a purpose anymore.  In the
meantime, somebody else made it a macro to fopen().  I decided my patch
would be less noisy if I reverted to fopen_priv() but I missed a few places.

> TestLib.pm does not need the group-related extensions, right?

Done.

> check_pg_data_perm() looks useful even without $allow_group even if the
> grouping facility is reverted at the end.  S_ISLNK should be considered
> as well for tablespaces or symlinks of pg_wal?  We have such cases in
> the 

Re: select_parallel test failure: gather sometimes losing tuples (maybe during rescans)?

2018-03-05 Thread Thomas Munro
On Tue, Mar 6, 2018 at 5:04 AM, Alvaro Herrera  wrote:
> Thomas Munro wrote:
>> On Sun, Mar 4, 2018 at 10:46 PM, Magnus Hagander  wrote:
>> > Um. Have you actually seen the "mail archive app" cut long threads off in
>> > other cases? Because it's certainly not supposed to do that...
>>
>> Hi Magnus,
>>
>> I mean the "flat" thread view:
>>
>> https://www.postgresql.org/message-id/flat/CAFjFpRfQ8GrQvzp3jA2wnLqrHmaXna-urjm_UY9BqXj=ead...@mail.gmail.com
>>
>> The final message on that page is not the final message that appears
>> in my mail client for the thread.  I guessed that might have been cut
>> off due to some hard-coded limit, but perhaps there is some other
>> reason (different heuristics for thread following?)
>
> You're thinking of message
> https://www.postgresql.org/message-id/cafjfprfa6_n10cn3vxjn9hdtqneh6a1rfnlxy0pncp63t2p...@mail.gmail.com
> but that is not the same thread -- it doesn't have the References or
> In-Reply-To headers (see "raw"; user/pwd is archives/antispam).  Don't
> know why though -- maybe Gmail trimmed References because it no longer
> fit in the DKIM signature?  Yours had a long one:
> https://www.postgresql.org/message-id/raw/CAEepm%3D0VCrC-WfzZkq3YSvJXf225rDnp1ypjv%2BrjKO5d0%3DXqFg%40mail.gmail.com

Huh.  Interesting.  It seems that Gmail uses fuzzier heuristics, not
just "In-Reply-To", which explains why I considered that to be the same
thread while our archive didn't:

http://www.sensefulsolutions.com/2010/08/how-does-email-threading-work-in-gmail.html

I wonder why it dropped the In-Reply-To header when Ashutosh replied...

-- 
Thomas Munro
http://www.enterprisedb.com



Re: STATISTICS retained in CREATE TABLE ... LIKE (INCLUDING ALL)?

2018-03-05 Thread Alvaro Herrera
Tomas Vondra wrote:

> 4) I see you've added generateClonedExtStatsStmt to parse_utilcmd.h, but
> it was only really called in parse_utilcmd.c, so I've made it static. I
> don't think we need to expose stuff unnecessarily.

> BTW the last point made me thinking, because parse_utilcmd.h also
> exposes generateClonedIndexStmt. That is however necessary, because it's
> called from DefineRelation when copying indexes from partitioned table
> to partitions. I'm wondering - shouldn't we do the same thing for
> extended statistics?

Maybe, but that would not be a bugfix anymore.  So if we do want that,
that is definitely a new feature, so it should be its own patch; the
copying of indexes to partitions is a new feature in pg11.

-- 
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: chained transactions

2018-03-05 Thread Andres Freund
On 2018-03-05 09:21:33 +, Simon Riggs wrote:
> On 2 March 2018 at 07:18, Andres Freund  wrote:
> > Hi,
> >
> > On 2018-02-28 22:35:52 -0500, Peter Eisentraut wrote:
> >> This feature is meant to help manage transaction isolation in
> >> procedures.
> >
> > This is a major new feature, submitted the evening before the last CF
> > for v11 starts. Therefore I think it should just be moved to the next
> > fest, it doesn't seems realistic to attempt to get this into v11.
> 
> Looks like a useful patch that adds fairly minor syntax that follows
> the SQL Standard.
> 
> It introduces no new internal infrastructure, so I can't call this a
> major feature.

You can avoid calling it new infrastructure, but it certainly modifies
existing infrastructure.  And it adds quite a bit of new user interface,
which certainly makes it important to get it right.
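
For readers unfamiliar with the feature, the user-visible syntax being
added (per the SQL standard; the example is illustrative, not taken from
the patch):

BEGIN ISOLATION LEVEL REPEATABLE READ;
INSERT INTO audit_log VALUES ('step 1');
COMMIT AND CHAIN;   -- commits, then immediately starts a new transaction
                    -- with the same characteristics (isolation level etc.)
INSERT INTO audit_log VALUES ('step 2');
ROLLBACK;           -- ROLLBACK AND CHAIN is the aborting counterpart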

 doc/src/sgml/plpgsql.sgml   |9 
 doc/src/sgml/ref/abort.sgml |   14 +
 doc/src/sgml/ref/commit.sgml|   14 +
 doc/src/sgml/ref/end.sgml   |   14 +
 doc/src/sgml/ref/rollback.sgml  |   14 +
 doc/src/sgml/spi.sgml   |   14 -
 src/backend/access/transam/xact.c   |  210 +++-
 src/backend/commands/cluster.c  |2 
 src/backend/commands/dbcommands.c   |2 
 src/backend/commands/discard.c  |2 
 src/backend/commands/portalcmds.c   |2 
 src/backend/commands/subscriptioncmds.c |4 
 src/backend/commands/typecmds.c |2 
 src/backend/commands/vacuum.c   |4 
 src/backend/commands/variable.c |   57 -
 src/backend/executor/spi.c  |   25 ++
 src/backend/nodes/copyfuncs.c   |2 
 src/backend/nodes/equalfuncs.c  |2 
 src/backend/parser/gram.y   |   34 +--
 src/backend/replication/walsender.c |6 
 src/backend/tcop/utility.c  |   58 ++---
 src/backend/utils/misc/guc.c|   35 +--
 src/backend/utils/time/snapmgr.c|2 
 src/bin/psql/common.c   |2 
 src/include/access/xact.h   |   18 -
 src/include/commands/variable.h |4 
 src/include/executor/spi.h  |4 
 src/include/nodes/parsenodes.h  |4 
 src/pl/plperl/plperl.c  |4 
 src/pl/plpgsql/src/expected/plpgsql_transaction.out |   31 ++
 src/pl/plpgsql/src/pl_exec.c|   10 
 src/pl/plpgsql/src/pl_funcs.c   |   10 
 src/pl/plpgsql/src/pl_gram.y|   18 +
 src/pl/plpgsql/src/pl_scanner.c |2 
 src/pl/plpgsql/src/plpgsql.h|2 
 src/pl/plpgsql/src/sql/plpgsql_transaction.sql  |   23 ++
 src/pl/plpython/plpy_plpymodule.c   |4 
 src/pl/tcl/pltcl.c  |4 
 src/test/regress/expected/transactions.out  |  141 +
 src/test/regress/sql/transactions.sql   |   49 
 40 files changed, 596 insertions(+), 262 deletions(-)


Greetings,

Andres Freund



Re: [HACKERS] generated columns

2018-03-05 Thread Peter Eisentraut
On 2/1/18 21:25, Michael Paquier wrote:
> On Thu, Feb 01, 2018 at 09:29:09AM -0500, Peter Eisentraut wrote:
>> That would be nice.  I'm going to study this some more to see what can
>> be done.
> 
> Thanks for the update.  Note: Peter has moved the patch to next CF.

I didn't get to updating this patch, so I'm closing it in this CF.

-- 
Peter Eisentraut  http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: Kerberos test suite

2018-03-05 Thread Peter Eisentraut
On 2/27/18 00:21, Michael Paquier wrote:
> Thanks.  Could you document that on the README please?  krb5-user and
> krb5-kdc is a split from Debian.  For darwin, are you using macports or
> homebrew?  I would assume the later, and it would be nice to precise
> that in the README as well.  On Debian you need to install as well
> krb5-admin-server as it includes kadmin.local which the test needs.
> Once I understood that I have been able to run the tests.

Added to README.

> You have forgotten to update ALWAYS_SUBDIRS in src/test/Makefile.

Fixed, and updated to reflect recent changes with PG_TEST_EXTRA etc.

> +my ($stdout, $krb5_version);
> +IPC::Run::run [ 'krb5-config', '--version' ], '>', \$stdout or die
> "could not execute krb5-config";
> +$stdout =~ m/Kerberos 5 release ([0-9]+\.[0-9]+)/ or die "could not get
> Kerberos version";
> +$krb5_version = $1;
> Time for a new routine command_log which executes the command, then
> returns stdout and stderr to the caller?

I didn't do anything about this.  Do we have any more places where that
would be needed right now?

> +system_or_bail 'echo secret1 | kinit test1';
> Using IPC::Run stuff would be better here.

done

> @@ -1153,6 +1152,11 @@ sub psql
> $params{on_error_stop} = 1 unless defined $params{on_error_stop};
> $params{on_error_die}  = 0 unless defined $params{on_error_die};
> 
> +   $connstr .= ' host=localhost' if defined $params{tcpip};
> +
> +   my @psql_params =
> + ('psql', '-XAtq', '-d', $connstr, '-f', '-');
> This bit I don't like.  Wouldn't it be enough to abuse of extra_params
> and use a custom connection string?  The last value wins in a psql
> command. 

done

Also included feedback from Thomas Munro.

New patch attached.

-- 
Peter Eisentraut  http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
From f694b67401d93051c3442f60c06f8327391239cf Mon Sep 17 00:00:00 2001
From: Peter Eisentraut 
Date: Mon, 5 Mar 2018 14:42:11 -0500
Subject: [PATCH v2] Tests for Kerberos/GSSAPI authentication

---
 configure   |   4 +
 configure.in|   2 +
 doc/src/sgml/regress.sgml   |  12 ++-
 src/Makefile.global.in  |   2 +
 src/test/Makefile   |   7 +-
 src/test/kerberos/.gitignore|   2 +
 src/test/kerberos/Makefile  |  25 ++
 src/test/kerberos/README|  35 
 src/test/kerberos/t/001_auth.pl | 177 
 9 files changed, 264 insertions(+), 2 deletions(-)
 create mode 100644 src/test/kerberos/.gitignore
 create mode 100644 src/test/kerberos/Makefile
 create mode 100644 src/test/kerberos/README
 create mode 100644 src/test/kerberos/t/001_auth.pl

diff --git a/configure b/configure
index 1242e310b4..3943711283 100755
--- a/configure
+++ b/configure
@@ -709,7 +709,9 @@ with_systemd
 with_selinux
 with_openssl
 with_ldap
+with_krb_srvnam
 krb_srvtab
+with_gssapi
 with_python
 with_perl
 with_tcl
@@ -5788,6 +5790,7 @@ $as_echo "$with_gssapi" >&6; }
 
 
 
+
 #
 # Kerberos configuration parameters
 #
@@ -5815,6 +5818,7 @@ fi
 
 
 
+
 cat >>confdefs.h <<_ACEOF
 #define PG_KRB_SRVNAM "$with_krb_srvnam"
 _ACEOF
diff --git a/configure.in b/configure.in
index aee3ab0867..1babdbb755 100644
--- a/configure.in
+++ b/configure.in
@@ -638,6 +638,7 @@ PGAC_ARG_BOOL(with, gssapi, no, [build with GSSAPI support],
   krb_srvtab="FILE:\$(sysconfdir)/krb5.keytab"
 ])
 AC_MSG_RESULT([$with_gssapi])
+AC_SUBST(with_gssapi)
 
 
 AC_SUBST(krb_srvtab)
@@ -650,6 +651,7 @@ PGAC_ARG_REQ(with, krb-srvnam,
  [NAME], [default service principal name in Kerberos (GSSAPI) 
[postgres]],
  [],
  [with_krb_srvnam="postgres"])
+AC_SUBST(with_krb_srvnam)
 AC_DEFINE_UNQUOTED([PG_KRB_SRVNAM], ["$with_krb_srvnam"],
[Define to the name of the default PostgreSQL service 
principal in Kerberos (GSSAPI). (--with-krb-srvnam=NAME)])
 
diff --git a/doc/src/sgml/regress.sgml b/doc/src/sgml/regress.sgml
index 3c448dc5bc..673a8c2164 100644
--- a/doc/src/sgml/regress.sgml
+++ b/doc/src/sgml/regress.sgml
@@ -220,10 +220,20 @@ Additional Test Suites
PG_TEST_EXTRA to a whitespace-separated list, for
example:
 
-make check-world PG_TEST_EXTRA='ldap ssl'
+make check-world PG_TEST_EXTRA='kerberos ldap ssl'
 
The following values are currently supported:

+
+ kerberos
+ 
+  
+   Runs the test suite under src/test/kerberos.  This
+   requires an MIT Kerberos installation and opens TCP/IP listen sockets.
+  
+ 
+
+
 
  ldap
  
diff --git a/src/Makefile.global.in b/src/Makefile.global.in
index dcb8dc5d90..15c14951e8 100644
--- a/src/Makefile.global.in
+++ b/src/Makefile.global.in
@@ -186,6 +186,8 @@ with_tcl= @with_tcl@
 with_openssl   = @with_openssl@
 with_selinux   = @with_selinux@
 with_systemd   = @with_systemd@
+with_gssapi= @with_gssapi@
+with_krb_srvnam= @with_krb_srvnam@

Re: [HACKERS] user-defined numeric data types triggering ERROR: unsupported type

2018-03-05 Thread Tom Lane
Tomas Vondra  writes:
> On 03/04/2018 09:46 PM, Tom Lane wrote:
>> Well, I think the existing bytea bug is a counterexample to that.  If
>> someone were to repeat that mistake with, say, UUID, these tests would not
>> catch it, because none of them would exercise UUID-vs-something-else.
>> For that matter, your statement is false on its face, because even if
>> somebody tried to add say uuid-versus-int8, these tests would not catch
>> lack of support for that in convert_to_scalar unless the specific case of
>> uuid-versus-int8 were added to the tests.

> I suspect we're simply having different expectations about what the tests
> could/should protect against - my intention was to make sure someone
> does not break convert_to_scalar for the currently handled types.

I concur that we could use better test coverage for the existing
code --- the coverage report is pretty bleak there.  But I think we
could improve that just by testing with the existing operators.  I do
not see much use in checking that unsupported cross-type cases fail
cleanly, because there isn't a practical way to have full coverage
for such a concern.
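
For illustration, an untested sketch (the table name is made up):
exercising convert_to_scalar through an existing operator needs only a
histogram and an inequality comparison, e.g.

    CREATE TABLE conv_cov (b bytea);
    INSERT INTO conv_cov
      SELECT decode(lpad(to_hex(i), 8, '0'), 'hex')
      FROM generate_series(1, 1000) i;
    ANALYZE conv_cov;
    -- planning this runs scalarltsel, and hence convert_to_scalar,
    -- over the bytea histogram
    EXPLAIN SELECT * FROM conv_cov WHERE b < '\x000001f4'::bytea;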

regards, tom lane



Re: [HACKERS] PoC: custom signal handler for extensions

2018-03-05 Thread David Steele
Hi Maksim,

On 3/5/18 11:24 AM, Maksim Milyutin wrote:
> Hello David,
> 
> 
> On 05.03.2018 18:50, David Steele wrote:
>> Hello Maksim,
>>
>> On 1/27/18 2:19 PM, Arthur Zakirov wrote:
>>
>>> Is there an actual need for UnregisterCustomProcSignal() within _PG_init()?
>>> An extension registers a handler and then unregisters it, having done
>>> nothing in between. It seems useless.
>>>
>>> Also process_shared_preload_libraries_in_progress within _PG_fini() will
>>> be false I think. _PG_fini() won't be called though, because
>>> the implementation of internal_unload_library() is disabled.
>>>
>>> Actually, is there a need for UnregisterCustomProcSignal() at all? It
>>> unregisters a handler only in the current backend; for actual
>>> unregistering we need to call it everywhere, if I'm not mistaken.
>> This patch has been in Waiting on Author state for almost three weeks.
>> Have you had a chance to consider Arthur's suggestions?
> 
> Yes, I want to rework my patch to enable setup of custom signals on a
> working backend without preload initialization.
> 
>> Do you know when you'll have an updated patch available?
> 
> I want to resume work on this patch for the next release (12). Sorry,
> for now I cannot keep up with the current release.

Understood.  I'll mark it Returned with Feedback and you can enter it in
a CF when you have a new patch.

Regards,
-- 
-David
da...@pgmasters.net



Re: postgres_fdw: perform UPDATE/DELETE .. RETURNING on a join directly

2018-03-05 Thread Tom Lane
I wrote:
> The thing that I find curious, now that we've shut off autovacuum
> altogether on those tables, is that we *still* aren't getting stable
> results.  How can that be?

I spent some time trying to figure out what's going on here.  I've still
not been able to replicate the failure on any of my machines, but I have
learned a thing or two.

On the first query that's still failing on rhinoceros (and has also been
seen to fail on other machines, before the last round of changes), which
is at line 1142 in postgres_fdw.sql in current HEAD:

EXPLAIN (verbose, costs off)
UPDATE ft2 SET c3 = 'baz'
  FROM ft4 INNER JOIN ft5 ON (ft4.c1 = ft5.c1)
  WHERE ft2.c1 > 2000 AND ft2.c2 === ft4.c1
  RETURNING ft2.*, ft4.*, ft5.*;
-- can't be pushed down

(call this Q1), we are expecting to get a plan of the shape of

->  Nested Loop
  ->  Foreign Scan on public.ft2
Remote SQL: SELECT "C 1", c2, c4, c5, c6, c7, c8, ctid FROM "S 
1"."T 1" WHERE (("C 1" > 2000)) FOR UPDATE
  ->  Foreign Scan
Relations: (public.ft4) INNER JOIN (public.ft5)

Now, there is no possible way that the planner would pick that plan if its
estimate of the number of rows out of the ft2 scan was more than 1.
Re-executing the foreign join would have high enough overhead to push the
plan to some other shape.  In fact, probably the shape we see in the
actual plan choice, on the failing machines:

->  Nested Loop
  ->  Foreign Scan
        Relations: (public.ft4) INNER JOIN (public.ft5)
  ->  Materialize
        ->  Foreign Scan on public.ft2
              Remote SQL: SELECT "C 1", c2, c4, c5, c6, c7, c8, ctid FROM "S 1"."T 1" WHERE (("C 1" > 2000)) FOR UPDATE

I instrumented costsize.c, and determined that the estimated size of
"S 1"."T 1" WHERE (("C 1" > 2000)), before the clamp_row_est() rounding,
is only 0.1159 rows on my machine.  So there's no way to explain the plan
change as the result of small platform-specific roundoff differences ---
we need something more than a 12X change in the selectivity estimate
before the plan shape would change like this.  There are a bunch of things
going on under the hood, such as that the table's raw rowcount estimate
gets scaled up from the original 1000 rows because it's now got more pages
than before, but none come close to explaining 12X.

However, the planner is working with quite old statistics, dating back to
the manual ANALYZE at postgres_fdw.sql line 93.  If I stick in another
manual ANALYZE just before Q1, I get exactly the same plan change reported
by rhinoceros.  (And underneath, the rowcount estimate for ft2 has gone
from 1 row to 8 rows, which is much closer to the true value of 10 rows,
so the plan change is not surprising.)  What's more, doing this also
reproduces the one other plan change seen later in rhinoceros' output.

It is, therefore, very hard to avoid the conclusion that something is
causing an ANALYZE to happen while the script runs, despite our fooling
about with the table's reloptions.  I'm not sure that that something is
autovacuum.  A platform-specific bug in reloptions handling doesn't seem
out of the question, but poking around in the code didn't spot anything
obvious.

Joe, I wonder if you could add "log_autovacuum_min_duration = 0" to
rhinoceros' extra_config options, temporarily?  Correlating that log
output with the log_statement output from the test proper would let
us confirm or deny whether it's autovacuum.

Another thing I'd like to do is temporarily add

select relpages, reltuples from pg_class where relname = 'T 1';

to the test script, both just after the manual ANALYZE and just before Q1.
If we see a change between those reports on any of the affected machines,
we'll know that *something* is changing the stats.  Now the problem with
doing that is that the expected value of relpages is platform-dependent
(I see 11 on 64-bit and 10 on 32-bit on my machines).  We can work around
that, perhaps by capturing the initial value in a temp table and printing
only the delta, but I'm not sure if it's worth the effort as opposed to
just letting it fail on 32-bit critters for a day or two.
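
For concreteness, a sketch of that capture-the-delta approach (untested,
temp table name invented):

    -- just after the manual ANALYZE:
    CREATE TEMP TABLE pgc_snap AS
      SELECT relpages, reltuples FROM pg_class WHERE relname = 'T 1';

    -- just before Q1; should print zeros on every platform unless
    -- something re-analyzed the table in between:
    SELECT c.relpages - s.relpages AS relpages_delta,
           c.reltuples - s.reltuples AS reltuples_delta
    FROM pg_class c, pgc_snap s
    WHERE c.relname = 'T 1';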

regards, tom lane



Re: parallel append vs. simple UNION ALL

2018-03-05 Thread Robert Haas
On Tue, Feb 27, 2018 at 6:21 AM, Rajkumar Raghuwanshi
 wrote:
> I have applied 0001 on pg-head, and while playing with regression tests, it
> crashed with below test case.
>
> postgres=# SET min_parallel_table_scan_size=0;
> SET
> postgres=# SELECT * FROM information_schema.foreign_data_wrapper_options
> ORDER BY 1, 2, 3;
> server closed the connection unexpectedly
> This probably means the server terminated abnormally
> before or while processing the request.
> The connection to the server was lost. Attempting reset: Failed.

Hmm, nice catch.  I think part of the problem here is that commit
69f4b9c85f168ae006929eec44fc44d569e846b9, wherein Andres introduced
the ProjectSet node, didn't really do the right thing in terms of
testing parallel-safety.  Before that commit, is_parallel_safe(root,
(Node *) scanjoin_target->exprs)) was really testing whether the tlist
produced during the scan/join phase was parallel-safe.  However, after
that commit, scanjoin_target->exprs wasn't really the final target for
the scan/join phase any more; instead, it was the first of possibly
several targets computed by split_pathtarget_at_srfs().  Really, the
right thing to do is to test the *last* element in that list for
parallel-safety, but as the code stands we end up testing the *first*
element.  So, if there's a parallel-restricted item in the target list
(this query ends up putting a CoerceToDomain in the target list, which
we currently consider parallel-restricted), it looks like we can
nevertheless end up trying to project it in what is supposed to be a
partial path.
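
For instance, a hypothetical (untested) reduction with that same shape,
i.e. a parallel-safe scan target followed by a parallel-restricted final
target (object names invented):

    CREATE DOMAIN posint AS int CHECK (VALUE > 0);
    CREATE TABLE t (x int);
    SET min_parallel_table_scan_size = 0;
    -- the SRF makes split_pathtarget_at_srfs() produce several targets,
    -- and the CoerceToDomain lands in a later target than the one whose
    -- parallel-safety actually gets tested
    EXPLAIN SELECT (generate_series(1, x))::posint FROM t;

Whether this misbehaves in practice depends on the rest of the planner,
but it has the structure described above.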

There are a large number of cases where this doesn't end up mattering
because the next upper_rel created will not get marked
consider_parallel because its target list will also contain the same
parallel-restricted construct, and therefore the partial paths
generated for the scan/join rel will never get used -- except to
generate Gather/Gather Merge paths, which has already happened; but
that step didn't know about the rest of the scan/join targets either,
so it won't have used them.  However, both create_distinct_paths() and
the code in grouping_planner that populates final_rel assume that they
don't need to retest the target for parallel-safety because no
projection is done at those levels; they just inherit the
parallel-safety marking of the input rel, so in those cases if the
input rel's marking is wrong, the error is propagated upward.

There's another way final_rel->consider_parallel can be wrong, too: if
the FROM-list is empty, then we create a join rel and set its
consider_parallel flag without regard to the parallel-safety of the
target list.  There are comments in query_planner() says that this
will be dealt with "later", but this seems not to be true. (Before
Tom's commit da1c91631e3577ea5818f855ebb5bd206d559006, the comments
simply ignored the question of whether a check was needed here, but
Tom seems to have inserted an incorrect justification for the
already-wrong code.)

I'm not sure to what degree, if at all, any of these problems are
visible given that we don't use final_rel->consider_parallel for much
of anything.  Certainly, it gets much easier to trigger a problem with
0001 applied, as the test case shows.  But I'm not entirely convinced
that there's no problem even without that.  It seems like every upper
rel that is setting its consider_parallel flag based on the first
element of some list of targets rather than the last is potentially
vulnerable to ending up with the wrong answer, and I'm afraid that
might have some adverse consequence that I haven't quite pinned down
yet.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: IndexJoin memory problem using spgist and boxes

2018-03-05 Thread Alexander Kuzmenkov

On 04.03.2018 20:20, Anton Dignös wrote:

> The better alternative may be to have two temporary memory contexts,
> one per-tuple for calling the inner consistent method and one
> per-index-scan for the traversal memory.


Yes, this seems to be a better way of fixing the problem without
introducing the regressions mentioned by Tom. We'd keep a separate
traversal context in ScanOpaque and run most of spgWalk in it, except
for calling storeRes in the query context and the inner consistent
method in the short-lived context.


Also, I think it would be worthwhile to test the resulting patch with 
valgrind. The allocations are not very apparent in the code, so it's 
easy to miss something.


--
Alexander Kuzmenkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company



