Re: [HACKERS] path toward faster partition pruning

2017-11-07 Thread Rajkumar Raghuwanshi
On Mon, Nov 6, 2017 at 3:31 PM, Amit Langote wrote:

> Attached updated set of patches, including the fix to make the new pruning
> code handle Boolean partitioning.
>

Hi Amit,

I have tried pruning with different values of the constraint_exclusion
GUC. I am not sure exactly how it should behave, but I can see that with
a DELETE statement pruning does not happen when constraint_exclusion is
off, while SELECT works as expected. Is this the expected behaviour?

create table lp (c1 int, c2 text) partition by list(c1);
create table lp1 partition of lp for values in (1,2);
create table lp2 partition of lp for values in (3,4);
create table lp3 partition of lp for values in (5,6);
insert into lp values (1,'p1'),(2,'p1'),(3,'p2'),(4,'p2'),(5,'p3');

show constraint_exclusion ;
 constraint_exclusion
--
 partition
(1 row)

explain select c1 from lp where c1 >= 1 and c1 < 2;
QUERY PLAN
--
 Append  (cost=0.00..29.05 rows=6 width=4)
   ->  Seq Scan on lp1  (cost=0.00..29.05 rows=6 width=4)
 Filter: ((c1 >= 1) AND (c1 < 2))
(3 rows)

explain delete from lp where c1 >= 1 and c1 < 2;
QUERY PLAN
--
 Delete on lp  (cost=0.00..29.05 rows=6 width=6)
   Delete on lp1
   ->  Seq Scan on lp1  (cost=0.00..29.05 rows=6 width=6)
 Filter: ((c1 >= 1) AND (c1 < 2))
(4 rows)

set constraint_exclusion = off;

explain select c1 from lp where c1 >= 1 and c1 < 2;
QUERY PLAN
--
 Append  (cost=0.00..29.05 rows=6 width=4)
   ->  Seq Scan on lp1  (cost=0.00..29.05 rows=6 width=4)
 Filter: ((c1 >= 1) AND (c1 < 2))
(3 rows)

explain delete from lp where c1 >= 1 and c1 < 2;
QUERY PLAN
--
 Delete on lp  (cost=0.00..87.15 rows=18 width=6)
   Delete on lp1
   Delete on lp2
   Delete on lp3
   ->  Seq Scan on lp1  (cost=0.00..29.05 rows=6 width=6)
 Filter: ((c1 >= 1) AND (c1 < 2))
   ->  Seq Scan on lp2  (cost=0.00..29.05 rows=6 width=6)
 Filter: ((c1 >= 1) AND (c1 < 2))
   ->  Seq Scan on lp3  (cost=0.00..29.05 rows=6 width=6)
 Filter: ((c1 >= 1) AND (c1 < 2))
(10 rows)

Thanks & Regards,
Rajkumar Raghuwanshi
QMG, EnterpriseDB Corporation


Re: [HACKERS] MERGE SQL Statement for PG11

2017-11-07 Thread Nico Williams
Ah, there is one reason not to use a mapping to CTEs to implement MERGE:
it might be faster to use a single query that is a FULL OUTER JOIN of the
source and target to drive the update/insert/delete operations.
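
For instance -- a minimal sketch, assuming hypothetical tables t (target)
and s (source) matched on an id column -- a single FULL OUTER JOIN can
classify every row into the three MERGE actions:

  -- Rows found only in s need an INSERT, rows found only in t need a
  -- DELETE (WHEN NOT MATCHED BY SOURCE), and rows found in both need
  -- an UPDATE.
  SELECT CASE
           WHEN t.id IS NULL THEN 'insert'
           WHEN s.id IS NULL THEN 'delete'
           ELSE 'update'
         END AS action,
         COALESCE(t.id, s.id) AS id
  FROM s
  FULL OUTER JOIN t ON t.id = s.id;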




Re: [HACKERS] UPDATE of partition key

2017-11-07 Thread Thomas Munro
On Wed, Nov 8, 2017 at 5:57 PM, Amit Khandekar  wrote:
> Thomas, can you please try the attached incremental patch
> regress_locale_changes.patch and check if the test passes? The patch
> is to be applied on the main v22 patch. If the test passes, I will
> include these changes (also for list_parted) in the upcoming v23
> patch.

That looks good.  Thanks.

-- 
Thomas Munro
http://www.enterprisedb.com




Re: [Sender Address Forgery]Re: [HACKERS] path toward faster partition pruning

2017-11-07 Thread Amit Langote
Hi David.

Thanks for the review.

(..also looking at the comments you sent earlier today.)

On 2017/11/07 11:14, David Rowley wrote:
> On 7 November 2017 at 01:52, David Rowley wrote:
>> Thanks. I'll look over it all again starting my Tuesday morning. (UTC+13)
> 
> I have a little more review to share:
> 
> 1. Missing "in" in comment. Should be "mentioned in"
> 
>  * get_append_rel_partitions
>  * Return the list of partitions of rel that pass the clauses mentioned
>  * rel->baserestrictinfo
> 
> 2. Variable should be declared in inner scope with the following fragment:
> 
> void
> set_basic_child_rel_properties(PlannerInfo *root,
>RelOptInfo *rel,
>RelOptInfo *childrel,
>AppendRelInfo *appinfo)
> {
> AttrNumber attno;
> 
> if (rel->part_scheme)
> {
> 
> which makes the code the same as where you moved it from.

It seems like you included the above changes in your attached C file,
which I will incorporate into my repository.

> 3. Normally lfirst() is assigned to a variable at the start of a
> foreach() loop. You have code which does not follow this.
> 
> foreach(lc, clauses)
> {
> Expr   *clause;
> int i;
> 
> if (IsA(lfirst(lc), RestrictInfo))
> {
> RestrictInfo *rinfo = lfirst(lc);
> 
> You could assign this to a Node * since the type is unknown to you at
> the start of the loop.

Will make the suggested changes to match_clauses_to_partkey().

> 4.
> /*
> * Useless if what we're thinking of as a constant is actually
> * a Var coming from this relation.
> */
> if (bms_is_member(rel->relid, constrelids))
> continue;
> 
> should this be moved to just above the op_strict() test? This one seems 
> cheaper.

Agreed, will do.  Also makes sense to move the PartCollMatchesExprColl()
test together.

> 5. Typo "paritions": /* No clauses to prune paritions, so scan all
> partitions. */
> 
> But thinking about it more, the comment should say something more along
> the lines of /* No useful clauses for partition pruning. Scan all
> partitions. */

You fixed it. :)

> The key difference is that there might be clauses, just without Consts.
> 
> Actually, the more I look at get_append_rel_partitions() I think it
> would be better if you re-shaped that if/else if test so that it only
> performs the loop over the partindexes if it's been set.
> 
> I ended up with the attached version of the function after moving
> things around a little bit.

Thanks a lot for that.  Looks much better now.

> I'm still reviewing but thought I'd share this part so far.

As mentioned at the top, I'm looking at your latest comments and they all
seem to be good points to me, so will address those in the next version.

Thanks,
Amit





Re: [HACKERS] UPDATE of partition key

2017-11-07 Thread Amit Khandekar
On 8 November 2017 at 07:55, Thomas Munro  wrote:
> On Tue, Nov 7, 2017 at 8:03 AM, Robert Haas  wrote:
>> The changes to trigger.c still make me super-nervous.  Hey THOMAS
>> MUNRO, any chance you could review that part?
>
> Looking, but here's one silly thing that jumped out at me while
> getting started with this patch.  I cannot seem to convince my macOS
> system to agree with the expected sort order from :show_data, where
> underscores precede numbers:
>
>   part_a_10_a_20 | a | 10 | 200 |  1 |
>   part_a_1_a_10  | a |  1 |   1 |  1 |
> - part_d_1_15| b | 15 | 146 |  1 |
> - part_d_1_15| b | 16 | 147 |  2 |
>   part_d_15_20   | b | 17 | 155 | 16 |
>   part_d_15_20   | b | 19 | 155 | 19 |
> + part_d_1_15| b | 15 | 146 |  1 |
> + part_d_1_15| b | 16 | 147 |  2 |
>
> It seems that macOS (like older BSDs) just doesn't know how to sort
> Unicode and falls back to sorting the bits.  I expect that means that
> the test will also fail on any other OS with "make check
> LC_COLLATE=C".  I believe our regression tests are supposed to pass
> with a wide range of collations including C, so I wonder if this means
> we should stick a leading zero on those single digit numbers, or
> something, to stabilise the output.

I would prefer to retain the partition names. I have now added COLLATE
"C" for partname, like this:

-\set show_data 'select tableoid::regclass::text partname, * from
range_parted order by 1, 2, 3, 4, 5, 6'
+\set show_data 'select tableoid::regclass::text COLLATE "C" partname,
* from range_parted order by 1, 2, 3, 4, 5, 6'

Thomas, can you please try the attached incremental patch
regress_locale_changes.patch and check if the test passes? The patch
is to be applied on the main v22 patch. If the test passes, I will
include these changes (also for list_parted) in the upcoming v23
patch.

Thanks
-Amit Khandekar


regress_locale_changes.patch
Description: Binary data



Re: [HACKERS] path toward faster partition pruning

2017-11-07 Thread David Rowley
On 7 November 2017 at 01:52, David Rowley  wrote:
> Thanks. I'll look over it all again starting my Tuesday morning. (UTC+13)

Hi Amit,

I had another look over this today. Apologies if any of the review seems petty.

Here goes:

1. The if test seems to be testing for a child that's a partitioned
table, but it actually tests for a non-NULL part_scheme.

/*
* If childrel is itself partitioned, add it and its partitioned
* children to the list being propagated up to the root rel.
*/
if (childrel->part_scheme && rel->part_scheme)

Should this code use IS_PARTITIONED_REL() instead? Seems a bit strange
to test for a NULL part_scheme.

2. There's a couple of mistakes in my bms_add_range() code. I've
attached bms_add_range_fix.patch. Can you apply this to your tree?

3. This assert seems to be Asserting the same thing twice:

Assert(rel->live_partitioned_rels != NIL &&
   list_length(rel->live_partitioned_rels) > 0);

A List with length == 0 is always NIL.

4. In get_partitions_from_clauses(), can you add a comment explaining
why you perform the list_concat() there?

I believe this is there so that the partition bound from the parent is
passed down to the child so that we can properly eliminate all child
partitions when the 2nd level of partitioning is using the same
partition key as the 1st level. I think this deserves a paragraph of
comment to explain this.

5. Please add a comment to explain what's going on here in
classify_partition_bounding_keys()

if (partattno == 0)
{
partexpr = lfirst(partexprs_item);
partexprs_item = lnext(partexprs_item);
}

It looks like, similar to index expressions, partition expressions use
attno 0 to mark that the next expression should be consumed from the list.

Does this need validation that there are enough partexprs_item items
like what is done in get_range_key_properties()? Or is this validated
somewhere else?

6. Comment claims the if test will test something which it does not
seem to test for:

/*
* Redundant key elimination using btree-semantics based tricks.
*
* Only list and range partitioning use btree operator semantics, so
* skip otherwise.   Also, if there are expressions whose value is yet
* unknown, skip this step, because we need to compare actual values
* below.
*/
memset(keyclauses, 0, PARTITION_MAX_KEYS * sizeof(List *));
if (partkey->strategy == PARTITION_STRATEGY_LIST ||
partkey->strategy == PARTITION_STRATEGY_RANGE)

I was expecting this to be skipped when the clauses contained a
non-const, but it does not seem to.

7. Should be "compare them"

/*
* If the leftarg and rightarg clauses' constants are both of the type
* expected by "op" clause's operator, then compare then using the
* latter's comparison function.
*/

But if I look at the code, "compare then using the latter's comparison
function." is not true; it seems to use op's comparison function, not
rightarg's. In most of the calls op and rightarg are the same, but not
all of them. The function shouldn't make that assumption even if op were
always the same as rightarg.

8. remove_redundant_clauses() needs an overview comment of what the
function does.

9. The comment should explain what we would do in the case of key < 3
AND key <= 2 using some examples.

/* try to keep only one of <, <= */
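
For example (illustrating with a hypothetical table t (key int), not
taken from the patch):

  -- key < 3 AND key <= 2 reduces to key <= 2: every key <= 2 already
  -- satisfies key < 3, so only the tighter bound needs to be kept.
  SELECT * FROM t WHERE key < 3 AND key <= 2;  -- same as: WHERE key <= 2

  -- key < 3 AND key <= 3 reduces to key < 3 for the same reason.
  SELECT * FROM t WHERE key < 3 AND key <= 3;  -- same as: WHERE key < 3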

10. Wondering why this loop runs backward?

for (s = BTMaxStrategyNumber; --s >= 0;)

Why not just:

for (s = 0; s < BTMaxStrategyNumber; s++)

I can't see a special reason for it to run backward. It seems unusual,
but if there's a good reason that I've failed to realise then it's
maybe worth a comment.

11. Please comment on why *constfalse = true is set here:

if (!chk || s == (BTEqualStrategyNumber - 1))
continue;

if (partition_cmp_args(partopfamily, partopcintype, chk, eq, chk,
   &test_result))
{
if (!test_result)
{
*constfalse = true;
return;
}
/* discard the redundant key. */
xform[s] = NULL;
}

Looks like we'd hit this in a case such as: WHERE key = 1 AND key > 1.

Also, please add a comment when discarding the redundant key; maybe
explain that equality is more useful than the other strategies when
there's an overlap.
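
For example (again with a hypothetical table t (key int)):

  -- key = 1 AND key > 1 cannot both hold for any row, so *constfalse can
  -- be set and every partition pruned without scanning anything.
  SELECT * FROM t WHERE key = 1 AND key > 1;  -- provably empty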

-- 
 David Rowley   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


bms_add_range_fix.patch
Description: Binary data



Re: [HACKERS] Restricting maximum keep segments by repslots

2017-11-07 Thread Kyotaro HORIGUCHI
Hello,

At Mon, 6 Nov 2017 05:20:50 -0800, Andres Freund  wrote in 
<20171106132050.6apzynxrqrzgh...@alap3.anarazel.de>
> Hi,
> 
> On 2017-10-31 18:43:10 +0900, Kyotaro HORIGUCHI wrote:
> >   - distance:
> > how many bytes LSN can advance before the margin defined by
> > max_slot_wal_keep_size (and wal_keep_segments) is exhausted,
> > or how many bytes of xlog this slot has lost from restart_lsn.
> 
> I don't think 'distance' is a good metric - that's going to continually
> change. Why not store the LSN that's available and provide a function
> that computes this? Or just rely on the lsn - lsn operator?

That seems reasonable.  The 'secured minimum LSN' is common among
all slots, so showing it in the view may look a bit odd, but I
could not find another suitable place for it.  In the previous patch,
distance = 0 meant that the slot was still alive but insecured, and
that information is lost by changing 'distance' to
'min_secure_lsn'.

Thus I changed the 'live' column to 'status' and show the status
in text representation.

status: secured | insecured | broken

So this looks like the following (max_slot_wal_keep_size = 8MB,
which is half of the default segment size):

-- slots whose required WAL is surely available
select restart_lsn, status, min_secure_lsn, pg_current_wal_lsn() from 
pg_replication_slots;
restart_lsn | status  | min_secure_lsn | pg_current_wal_lsn 
------------+---------+----------------+--------------------
0/1A60  | secured | 0/1A00 | 0/1B42BC78

-- slots whose required WAL is still available but insecured
restart_lsn | status    | min_secure_lsn | pg_current_wal_lsn 
------------+-----------+----------------+--------------------
0/1A60  | insecured | 0/1C00 | 0/1D76C948

-- slots whose required WAL is lost
# We should have seen the log 'Some replication slots have lost...'

restart_lsn | status | min_secure_lsn | pg_current_wal_lsn 
------------+--------+----------------+--------------------
0/1A60  | broken | 0/1C00 | 0/1D76C9F0


I noticed that I had dropped the segment fragment of
max_slot_wal_keep_size from the calculations in these routines. The
current patch honors the fragment part of max_slot_wal_keep_size.

regards,

-- 
Kyotaro Horiguchi
NTT Open Source Software Center
>From 109f056e257aba70dddc8d466767ed0a1db371e2 Mon Sep 17 00:00:00 2001
From: Kyotaro Horiguchi 
Date: Tue, 28 Feb 2017 11:39:48 +0900
Subject: [PATCH 1/2] Add WAL relief vent for replication slots

Adds a capability to limit the number of segments kept by replication
slots by a GUC variable.
---
 src/backend/access/transam/xlog.c | 39 +++
 src/backend/utils/misc/guc.c  | 11 
 src/backend/utils/misc/postgresql.conf.sample |  1 +
 src/include/access/xlog.h |  1 +
 4 files changed, 52 insertions(+)

diff --git a/src/backend/access/transam/xlog.c b/src/backend/access/transam/xlog.c
index dd028a1..cfdae39 100644
--- a/src/backend/access/transam/xlog.c
+++ b/src/backend/access/transam/xlog.c
@@ -105,6 +105,7 @@ int			wal_level = WAL_LEVEL_MINIMAL;
 int			CommitDelay = 0;	/* precommit delay in microseconds */
 int			CommitSiblings = 5; /* # concurrent xacts needed to sleep */
 int			wal_retrieve_retry_interval = 5000;
+int			max_slot_wal_keep_size_mb = 0;
 
 #ifdef WAL_DEBUG
 bool		XLOG_DEBUG = false;
@@ -9432,9 +9433,47 @@ KeepLogSeg(XLogRecPtr recptr, XLogSegNo *logSegNo)
 	if (max_replication_slots > 0 && keep != InvalidXLogRecPtr)
 	{
 		XLogSegNo	slotSegNo;
+		int			slotlimitsegs;
+		uint64		recptroff;
+		uint64		slotlimitbytes;
+		uint64		slotlimitfragment;
+
+		recptroff = XLogSegmentOffset(recptr, wal_segment_size);
+		slotlimitbytes = 1024 * 1024 * max_slot_wal_keep_size_mb;
+		slotlimitfragment =	XLogSegmentOffset(slotlimitbytes,
+			  wal_segment_size);
+
+		/* calculate segments to keep by max_slot_wal_keep_size_mb */
+		slotlimitsegs = ConvertToXSegs(max_slot_wal_keep_size_mb,
+	   wal_segment_size);
+		/* honor the fragment */
+		if (recptroff < slotlimitfragment)
+			slotlimitsegs++;
 
 		XLByteToSeg(keep, slotSegNo, wal_segment_size);
 
+		/*
+		 * ignore slots if too many wal segments are kept.
+		 * max_slot_wal_keep_size is just accumulated on wal_keep_segments.
+		 */
+		if (max_slot_wal_keep_size_mb > 0 && slotSegNo + slotlimitsegs < segno)
+		{
+			segno = segno - slotlimitsegs; /* must be positive */
+
+			/*
+			 * warn only if the checkpoint flushes the required segment.
+			 * we assume here that *logSegNo is calculated keep location.
+			 */
+			if (slotSegNo < *logSegNo)
+ereport(WARNING,
+	(errmsg ("restart LSN of replication slots is ignored by checkpoint"),
+	 errdetail("Some replication slots have lost required WAL segments to continue by up to %ld segments.",
+	   (segno < *logSegNo ? segno : *logSegNo) - slotSegNo)));
+
+			/* emergency vent */
+			slotSegNo = segno;
+		}
+
 		if (slotSegNo <= 

Re: [HACKERS] Parallel Hash take II

2017-11-07 Thread Andres Freund
Hi,

 * avoids wasting memory on duplicated hash tables
 * avoids wasting disk space on duplicated batch files
 * avoids wasting CPU executing duplicate subplans

What's the last one referring to?





+static void
+MultiExecParallelHash(HashState *node)
+{

+   switch (BarrierPhase(build_barrier))
+   {
+   case PHJ_BUILD_ALLOCATING:
+
+   /*
+    * Either I just allocated the initial hash table in
+    * ExecHashTableCreate(), or someone else is doing that.  Either
+    * way, wait for everyone to arrive here so we can proceed, and
+    * then fall through.
+    */
+   BarrierArriveAndWait(build_barrier, WAIT_EVENT_HASH_BUILD_ALLOCATING);

Can you add a /* fallthrough */ comment here? Gcc is warning if you
don't. While we currently have lotsa other places not having the
annotation, it seem reasonable to have it in new code.


+   case PHJ_BUILD_HASHING_INNER:
+
+   /*
+    * It's time to begin hashing, or if we just arrived here then
+    * hashing is already underway, so join in that effort.  While
+    * hashing we have to be prepared to help increase the number of
+    * batches or buckets at any time, and if we arrived here when
+    * that was already underway we'll have to help complete that
+    * work immediately so that it's safe to access batches and
+    * buckets below.
+    */
+   if (PHJ_GROW_BATCHES_PHASE(BarrierAttach(&hashtable->grow_batches_barrier)) !=
+       PHJ_GROW_BATCHES_ELECTING)
+       ExecParallelHashIncreaseNumBatches(hashtable);
+   if (PHJ_GROW_BUCKETS_PHASE(BarrierAttach(&hashtable->grow_buckets_barrier)) !=
+       PHJ_GROW_BUCKETS_ELECTING)
+       ExecParallelHashIncreaseNumBuckets(hashtable);
+   ExecParallelHashEnsureBatchAccessors(hashtable);

"accessors" sounds a bit weird for a bunch of pointers, but maybe that's
just my ESL senses tingling wrongly.



/* 
@@ -240,12 +427,15 @@ ExecEndHash(HashState *node)
  * 
  */
 HashJoinTable
-ExecHashTableCreate(Hash *node, List *hashOperators, bool keepNulls)
+ExecHashTableCreate(HashState *state, List *hashOperators, bool keepNulls)

+   /*
+    * Parallel Hash tries to use the combined work_mem of all workers to
+    * avoid the need to batch.  If that won't work, it falls back to
+    * work_mem per worker and tries to process batches in parallel.
+    */

One day we're going to need a better approach to this. I have no idea
how, but this per-node, and now per_node * max_parallelism, approach has
only implementation simplicity as its benefit.





+static HashJoinTuple
+ExecParallelHashLoadTuple(HashJoinTable hashtable, MinimalTuple tuple,
+ dsa_pointer *shared)
+{

+static void
+ExecParallelHashJoinSetUpBatches(HashJoinTable hashtable, int nbatch)
+{



+/*
+ * Get the first tuple in a given bucket identified by number.
+ */
+static HashJoinTuple
+ExecHashFirstTupleInBucket(HashJoinTable hashtable, int bucketno)
+{
+   if (hashtable->parallel_state)
+   {
+   dsa_pointer p =
+   dsa_pointer_atomic_read(&hashtable->buckets.shared[bucketno]);

Can you make this, and possibly a few other places, more readable by
introducing a temporary variable?


+/*
+ * Insert a tuple at the front of a chain of tuples in DSA memory atomically.
+ */
+static void
+ExecParallelHashPushTuple(dsa_pointer_atomic *head,
+ HashJoinTuple tuple,
+ dsa_pointer tuple_shared)
+{
+   do
+   {
+   tuple->next.shared = dsa_pointer_atomic_read(head);
+   } while (!dsa_pointer_atomic_compare_exchange(head,
+                                                 &tuple->next.shared,
+                                                 tuple_shared));
+}

This is hard to read.


+ * While in the phase PHJ_BUILD_HASHING_INNER a separate pair of barriers may
+ * be used repeatedly as required to coordinate expansions in the number of
+ * batches or buckets.  Their phases are as follows:
+ *
+ *   PHJ_GROW_BATCHES_ELECTING   -- initial state
+ *   PHJ_GROW_BATCHES_ALLOCATING -- one allocates new batches
+ *   PHJ_GROW_BATCHES_REPARTITIONING -- all rep
s/rep/repartition/?

 #include "access/htup_details.h"
+#include "access/parallel.h"
 #include 

Re: [HACKERS] Fix bloom WAL tap test

2017-11-07 Thread Masahiko Sawada
On Wed, Nov 8, 2017 at 11:20 AM, Michael Paquier wrote:
> On Wed, Nov 8, 2017 at 1:58 AM, Alexander Korotkov wrote:
>> On Tue, Nov 7, 2017 at 4:34 PM, Masahiko Sawada wrote:
>>> I understood the necessity of this patch and reviewed two patches.
>>
>> Good, thank you.
>
> That's clearly a bug fix.
>
>>> diff --git a/contrib/bloom/Makefile b/contrib/bloom/Makefile
>>> index 13bd397..c1566d4 100644
>>> --- a/contrib/bloom/Makefile
>>> +++ b/contrib/bloom/Makefile
>>> @@ -20,5 +20,7 @@ include $(top_builddir)/src/Makefile.global
>>>  include $(top_srcdir)/contrib/contrib-global.mk
>>>  endif
>>>
>>> +check: wal-check
>>> +
>>>  wal-check: temp-install
>>> $(prove_check)
>>
>>
>> I've tried this version of the Makefile.  The only difference I've seen:
>> when TAP tests are enabled, this version of the Makefile runs the TAP
>> tests before the regression tests, while my version runs them after the
>> regression tests.  That seems like the more desirable behavior to me.
>> But it would still be nice to simplify the Makefile.
>
> Why do you care about the order of actions? There is no dependency
> between the tests, and it seems to me that it should remain so, to keep
> maximum flexibility. This does not sacrifice coverage either. In
> short, Sawada-san's suggestion looks like the right thing to me. One piece
> that is missing though is that installcheck triggers only the
> SQL-based tests, while it should also trigger the TAP tests.

+1

> So I think
> that you should instead do something like that:
>
> --- a/contrib/bloom/Makefile
> +++ b/contrib/bloom/Makefile
> @@ -20,5 +20,12 @@ include $(top_builddir)/src/Makefile.global
>  include $(top_srcdir)/contrib/contrib-global.mk
>  endif
>
> +installcheck: wal-installcheck
> +
> +check: wal-check
> +
> +wal-installcheck:
> +   $(prove_installcheck)
> +
>  wal-check: temp-install
> $(prove_check)
>
> This works for me as I would expect it should.

Looks good to me too.

Regards,

--
Masahiko Sawada
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center




Re: [HACKERS] [PATCH] A hook for session start

2017-11-07 Thread Michael Paquier
On Tue, Nov 7, 2017 at 9:58 PM, Fabrízio de Royes Mello wrote:
> On Tue, Nov 7, 2017 at 1:15 AM, Michael Paquier wrote:
>> On Sun, Nov 5, 2017 at 3:14 AM, Fabrízio de Royes Mello wrote:
>> I was going to hack something like that. That's interesting for the
>> use case Robert has mentioned.
>>
>> Well, in the case of the end session hook, those variables are passed
>> to the hook by being directly taken from the context from MyProcPort:
>> +   (*session_end_hook) (MyProcPort->database_name,
>> MyProcPort->user_name);
>> In the case of the start hook, those are directly taken from the
>> command outer caller, but similarly MyProcPort is already set, so
>> those are strings available (your patch does so in the end session
>> hook)... Variables in hooks are useful if those are not available
>> within the memory context, and refer to a specific state where the
>> hook is called. For example, take the password hook, this uses the
>> user name and the password because those values are not available
>> within the session context. The same stands for other hooks as well.
>> Keeping the interface minimal helps in readability and maintenance.
>> See for the attached example that can be applied on top of 0003, which
>> makes use of the session context, the set regression tests does not
>> pass but this shows how I think those hooks had better be shaped.
>>
>
> Makes sense... fixed.

Thanks for considering it.

>> +   (*session_end_hook) (MyProcPort->database_name,
>> MyProcPort->user_name);
>> +
>> +   /* Make sure we don't leave any active transactions and/or locks behind */
>> +   AbortOutOfAnyTransaction();
>> +   LockReleaseAll(USER_LOCKMETHOD, true);
>> Let's leave this work to people actually implementing the hook contents.
>>
>
> Fixed.
>
> Attached new version. I unify all three patches in just one because this is
> a small patch and it's most part is test code.

Thanks. This makes sense to me.

+   /* Hook just normal backends */
+   if (session_end_hook && MyBackendId != InvalidBackendId)
+   (*session_end_hook) ();
I have been wondering about the necessity of this restriction.
Couldn't it be useful to just let the people implementing the hook do
any decision-making? Tracking some non-backend shutdowns may be useful
as well for logging purposes.

The patch is beginning to take shape (I really like the test module
you are implementing here!), but it still needs a bit more work.

+CREATE ROLE session_hook_usr1 LOGIN;
+CREATE ROLE session_hook_usr2 LOGIN;
Roles created during regression tests should be prefixed with regress_.

+   if (prev_session_start_hook)
+   prev_session_start_hook();
+
+   if (session_start_hook_enabled)
+   (void) register_session_hook("START");
Shouldn't both be reversed?

+/* sample sessoin end hook function */
Typo here.

+CREATE DATABASE session_hook_db;
[...]
+   if (!strcmp(dbname, "session_hook_db"))
+   {
Creating a database is usually avoided in sql files, as those can be
run on existing servers using installcheck. I gather that this
restriction is in place so that it is possible to create an initial
connection to the server, so that the base table used to log the
activity can be created. That's a fine assumption to me. Still, I think that the
following changes should be done:
- Let's restrict the logging to a role name instead of a database
name, and let's parametrize it with a setting in the temporary
configuration file. Let's not bother about multiple-role support with
a list; for the sake of tests and simplicity, defining only one role
looks fine to me. Comments in the code should be clear about the
dependency.
- The GUCs test_session_hooks.start_enabled and
test_session_hooks.end_enabled are actually redundant with the
database restriction (well, role restriction per the previous comment), so
let's remove them. Let's also avoid ALTER SYSTEM calls in tests,
as they would impact existing installations with installcheck.

It's impossible to tell how a committer will feel about this test suite,
but my opinion is to keep it, as it's a good template example of what can
be done with those two hooks.
-- 
Michael




Re: [HACKERS] why not parallel seq scan for slow functions

2017-11-07 Thread Amit Kapila
On Wed, Nov 8, 2017 at 2:51 AM, Robert Haas  wrote:
> On Mon, Nov 6, 2017 at 9:57 PM, Amit Kapila  wrote:
>
>>>  Also, even if inheritance is used, we might still be the
>>> topmost scan/join target.
>>
>> Sure, but in that case, it won't generate the gather path here (due to
>> this part of check "!root->append_rel_list").  I am not sure whether I
>> have understood the second part of your question, so if my answer
>> appears inadequate, then can you provide more details on what you are
>> concerned about?
>
> I don't know why the question of why root->append_rel_list is empty is
> the relevant thing here for deciding whether to generate gather paths
> at this point.
>

This is required to prohibit generating a gather path for the top rel in
the case of inheritance (an Append node) at this place (we want to
generate it later, when the scan/join target is available).  For such a
case, the reloptkind will be RELOPT_BASEREL and simple_rel_array_size
will be greater than two, as it includes child rels as well.  So, the
check for root->append_rel_list will prohibit generating a gather path
for such a rel.  Now, for all the child rels of the Append, the
reloptkind will be RELOPT_OTHER_MEMBER_REL, so it won't generate a gather
path here because of the first part of the check (rel->reloptkind ==
RELOPT_BASEREL).


-- 
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com




Re: [HACKERS] UPDATE of partition key

2017-11-07 Thread Thomas Munro
On Tue, Nov 7, 2017 at 8:03 AM, Robert Haas  wrote:
> The changes to trigger.c still make me super-nervous.  Hey THOMAS
> MUNRO, any chance you could review that part?

Looking, but here's one silly thing that jumped out at me while
getting started with this patch.  I cannot seem to convince my macOS
system to agree with the expected sort order from :show_data, where
underscores precede numbers:

  part_a_10_a_20 | a | 10 | 200 |  1 |
  part_a_1_a_10  | a |  1 |   1 |  1 |
- part_d_1_15| b | 15 | 146 |  1 |
- part_d_1_15| b | 16 | 147 |  2 |
  part_d_15_20   | b | 17 | 155 | 16 |
  part_d_15_20   | b | 19 | 155 | 19 |
+ part_d_1_15| b | 15 | 146 |  1 |
+ part_d_1_15| b | 16 | 147 |  2 |

It seems that macOS (like older BSDs) just doesn't know how to sort
Unicode and falls back to sorting the bits.  I expect that means that
the test will also fail on any other OS with "make check
LC_COLLATE=C".  I believe our regression tests are supposed to pass
with a wide range of collations including C, so I wonder if this means
we should stick a leading zero on those single digit numbers, or
something, to stabilise the output.
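
A minimal standalone illustration of the difference (using the names from
the diff above): under the "C" collation the digit '5' (0x35) compares
before the underscore (0x5F), so part_d_15_20 sorts ahead of part_d_1_15,
while linguistic collations typically put part_d_1_15 first.

  SELECT name
  FROM (VALUES ('part_d_1_15'), ('part_d_15_20')) AS v(name)
  ORDER BY name COLLATE "C";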

-- 
Thomas Munro
http://www.enterprisedb.com




Re: [HACKERS] [COMMITTERS] pgsql: Expand empty end tag

2017-11-07 Thread Michael Paquier
On Wed, Nov 8, 2017 at 11:15 AM, Peter Eisentraut  wrote:
> Expand empty end tag

Perhaps you missed this patch?
https://www.postgresql.org/message-id/CAJrrPGdkL8TFk+-VivrW637js0v_KM=ub4pBFy=nf0bpafb...@mail.gmail.com
It seems to me that the information within brackets should not be
inside the filename markup :)
-- 
Michael




Re: [HACKERS] Fix bloom WAL tap test

2017-11-07 Thread Michael Paquier
On Wed, Nov 8, 2017 at 1:58 AM, Alexander Korotkov wrote:
> On Tue, Nov 7, 2017 at 4:34 PM, Masahiko Sawada wrote:
>> I understood the necessity of this patch and reviewed two patches.
>
> Good, thank you.

That's clearly a bug fix.

>> diff --git a/contrib/bloom/Makefile b/contrib/bloom/Makefile
>> index 13bd397..c1566d4 100644
>> --- a/contrib/bloom/Makefile
>> +++ b/contrib/bloom/Makefile
>> @@ -20,5 +20,7 @@ include $(top_builddir)/src/Makefile.global
>>  include $(top_srcdir)/contrib/contrib-global.mk
>>  endif
>>
>> +check: wal-check
>> +
>>  wal-check: temp-install
>> $(prove_check)
>
>
> I've tried this version of the Makefile.  The only difference I've seen:
> when TAP tests are enabled, this version of the Makefile runs the TAP
> tests before the regression tests, while my version runs them after the
> regression tests.  That seems like the more desirable behavior to me.
> But it would still be nice to simplify the Makefile.

Why do you care about the order of actions? There is no dependency
between the tests, and it seems to me that it should remain so, to keep
maximum flexibility. This does not sacrifice coverage either. In
short, Sawada-san's suggestion looks like the right thing to me. One piece
that is missing though is that installcheck triggers only the
SQL-based tests, while it should also trigger the TAP tests. So I think
that you should instead do something like that:

--- a/contrib/bloom/Makefile
+++ b/contrib/bloom/Makefile
@@ -20,5 +20,12 @@ include $(top_builddir)/src/Makefile.global
 include $(top_srcdir)/contrib/contrib-global.mk
 endif

+installcheck: wal-installcheck
+
+check: wal-check
+
+wal-installcheck:
+   $(prove_installcheck)
+
 wal-check: temp-install
$(prove_check)

This works for me as I would expect it should. Could you update the
patch? That's far simpler than having to replicate rules like
regresscheck and regressioncheck-install again. I am switching the
patch back to "waiting on author" for now.

I can see that src/test/modules/commit_ts misses this as well, and I
think that's also the case for other test modules, like pg_dump's
suite.
-- 
Michael




Re: [HACKERS] Fix bloom WAL tap test

2017-11-07 Thread Michael Paquier
On Wed, Nov 8, 2017 at 1:49 AM, Alexander Korotkov wrote:
> On Tue, Nov 7, 2017 at 4:26 PM, Fabrízio Mello wrote:
>> The patch doesn't apply against master:
>>
>> fabrizio@macanudo:/d/postgresql (master)
>> $ git apply /tmp/wal-check-on-bloom-check.patch
>> error: contrib/bloom/Makefile: already exists in working directory
>
> Apparently I sent patches which are OK for "patch -p1", but not OK for
> "git apply". Sorry for that.  I resubmit both of them.

I would not worry about that as long as there is a way to apply
patches. git apply fails too easily, to be honest, while patch -p1 is
more permissive. I use the latter extensively, by the way, because it
has the merit of not being a PITA.
-- 
Michael




Re: [HACKERS] Remove duplicate setting in test/recovery/Makefile

2017-11-07 Thread Michael Paquier
On Wed, Nov 8, 2017 at 10:38 AM, Masahiko Sawada  wrote:
> Hi,
>
> I found that EXTRA_INSTALL is doubly set at both top and bottom of the
> src/test/recovery/Makefile. Is it necessary?
>
> Attached patch fixes this.

Indeed, there is some bad overlap between d851bef and 1148e22a. Better
to CC Simon who committed both things.
--
Michael




Re: [HACKERS] Fix a typo in dsm_impl.c

2017-11-07 Thread Masahiko Sawada
On Wed, Nov 8, 2017 at 6:36 AM, Robert Haas  wrote:
> On Mon, Nov 6, 2017 at 11:22 PM, Masahiko Sawada wrote:
>> Attached the patch for $subject.
>
> Committed.
>

Thank you!

Regards,

--
Masahiko Sawada
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center




[HACKERS] Remove duplicate setting in test/recovery/Makefile

2017-11-07 Thread Masahiko Sawada
Hi,

I found that EXTRA_INSTALL is doubly set at both top and bottom of the
src/test/recovery/Makefile. Is it necessary?

Attached patch fixes this.

Regards,

--
Masahiko Sawada
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center


remove_duplicate_setting.patch
Description: Binary data



[HACKERS] OpenTemporaryFile() vs resowner.c

2017-11-07 Thread Thomas Munro
Hi hackers,

Andres, Robert and Peter G rightly complained[1] that my shared
temporary file patch opens a file, then calls
ResourceOwnerEnlargeFiles() which can fail due to lack of memory, and
then registers the file handle to make sure we don't leak it.  Doh.
The whole point of the separate ResourceOwnerEnlargeXXX() interface is
to be able to put it before resource acquisition.

The existing OpenTemporaryFile() coding has the same mistake.  Please
see attached.

[1] 
https://www.postgresql.org/message-id/20171107210155.kuksdd324kgz5oev%40alap3.anarazel.de

-- 
Thomas Munro
http://www.enterprisedb.com


fix-file-leak.patch
Description: Binary data



Re: [HACKERS] Exclude pg_internal.init from base backup

2017-11-07 Thread Michael Paquier
On Wed, Nov 8, 2017 at 9:50 AM, Haribabu Kommi  wrote:
> Thanks for the correction. I was not much aware of SGML markup usage.
> While building the documentation, it raised a warning message about an
> "empty end-tag", so I just added the end tag. Attached is the updated
> patch with the suggested correction.

Ah, I can see the warning as well. Using empty tags is forbidden since
c29c5789, which is really recent. Sorry for missing it. Simon got
trapped by that as well visibly. Your patch looks good to me.
-- 
Michael




Re: [HACKERS] Exclude pg_internal.init from base backup

2017-11-07 Thread Haribabu Kommi
On Wed, Nov 8, 2017 at 11:11 AM, Michael Paquier wrote:

> On Wed, Nov 8, 2017 at 9:04 AM, Haribabu Kommi wrote:
> > Commit 98267e missed checking for an empty SGML tag; the attached patch
> > fixes this.
>
> 
>  
> - <filename>pg_internal.init (found in multiple directories)</>
> + <filename>pg_internal.init (found in multiple directories)</filename>
>  
> 
> What has been committed in 98267ee and what is proposed here both look
> incorrect to me. The markup filename ought to be used only with file
> names, so "(found in multiple directories)" should not be within it.
> Simon's commit is not wrong with the markup usage by the way.
>

Thanks for the correction. I was not much aware of SGML markup usage.
While building the documentation, it raised a warning message about an
"empty end-tag", so I just added the end tag. Attached is the updated
patch with the suggested correction.

Regards,
Hari Babu
Fujitsu Australia


sgml_empty_tag_fix_v2.patch
Description: Binary data



Re: [HACKERS] MERGE SQL Statement for PG11

2017-11-07 Thread Nico Williams
On Tue, Nov 07, 2017 at 03:31:22PM -0800, Peter Geoghegan wrote:
> On Tue, Nov 7, 2017 at 3:29 PM, Nico Williams  wrote:
> > On Thu, Nov 02, 2017 at 03:25:48PM -0700, Peter Geoghegan wrote:
> >> Nico Williams  wrote:
> >> >A MERGE mapped to a DML like this:
> >
> > I needed to spend more time reading MERGE docs from other RDBMSes.
> 
> Please don't hijack this thread. It's about the basic question of
> semantics, and is already hard enough for others to follow as-is.

I'm absolutely not.  If you'd like a pithy summary devoid of detail, it
is this:

  I'm making the argument that using ON CONFLICT to implement MERGE
  cannot produce a complete implementation [you seem to agree], but
  there is at least one light-weight way to implement MERGE with
  _existing_ machinery in PG: CTEs.

  It's perfectly fine to implement an executor for MERGE, but I think
  that's a bit silly and I explain why.

Further, I explored your question regarding order of events, which you
(and I) think is a very important semantics question.  You thought order
of execution / trigger firing should be defined, whereas I think it
should not, because MERGE explicitly says so -- at least MSFT's does!

MSFT's MERGE says:

| For every insert, update, or delete action specified in the MERGE
| statement, SQL Server fires any corresponding AFTER triggers defined
| on the target table, but does not guarantee on which action to fire
| triggers first or last. Triggers defined for the same action honor the
| order you specify.

Impliedly (though not stated explicitly), the actual updates, inserts,
and deletes can happen in any order, just as the triggers can fire in
any order.

As usual, in the world of programming language design, leaving order of
execution undefined as much as possible increases the level of available
opportunities to parallelize.  Presumably MSFT is leaving the door open
to parallelizing MERGE, if they haven't already.

Impliedly, CTEs that have no dependencies on each other are also ripe
for parallelization.  This is important too, for one of my goals is to
improve CTE performance.  If implementing MERGE as a mapping to CTEs
leads to improvements in CTEs, so much the better.  But also this *is* a
simple implementation of MERGE, and simplicity seems like a good thing.

Nico
-- 




Re: [HACKERS] Exclude pg_internal.init from base backup

2017-11-07 Thread Michael Paquier
On Wed, Nov 8, 2017 at 9:04 AM, Haribabu Kommi  wrote:
> Commit 98267e missed checking for an empty SGML tag; the attached patch
> fixes this.


 
 - <filename>pg_internal.init (found in multiple directories)</>
 + <filename>pg_internal.init (found in multiple directories)</filename>
 

What has been committed in 98267ee and what is proposed here both look
incorrect to me. The markup filename ought to be used only with file
names, so "(found in multiple directories)" should not be within it.
Simon's commit is not wrong with the markup usage by the way.
-- 
Michael




Re: [HACKERS] Exclude pg_internal.init from base backup

2017-11-07 Thread Haribabu Kommi
On Wed, Nov 8, 2017 at 3:03 AM, Simon Riggs  wrote:

> On 5 November 2017 at 11:55, Magnus Hagander  wrote:
> > On Sat, Nov 4, 2017 at 4:04 AM, Michael Paquier <michael.paqu...@gmail.com> wrote:
> >>
> >> On Fri, Nov 3, 2017 at 4:04 PM, Petr Jelinek wrote:
> >> > Not specific problem to this patch, but I wonder if it should be made
> >> > more clear that those files (there are couple above of what you added)
> >> > are skipped no matter which directory they reside in.
> >>
> >> Agreed, it is a good idea to tell in the docs how this behaves. We
> >> could always change things so as the comparison is based on the full
> >> path like what is done for pg_control, but that does not seem worth
> >> complicating the code.
> >
> >
> > pg_internal.init can, and do, appear in multiple different directories.
> > pg_control is always in the same place. So they're not the same thing.
> >
> > So +1 for documenting the difference in how these are handled, as this is
> > important to know for somebody writing an external tool for it.
>
> Changes made, moving to commit the attached patch.
>

Commit 98267e missed checking for an empty SGML tag; the attached patch
fixes this.

Regards,
Hari Babu
Fujitsu Australia


sgml_empty_tag_fix.patch
Description: Binary data



Re: [HACKERS] Parallel Hash take II

2017-11-07 Thread Thomas Munro
Hi Peter,

See responses to a couple of points below.  I'll respond to the other
points separately (ie with code/comment changes).

On Wed, Nov 8, 2017 at 10:32 AM, Peter Geoghegan  wrote:
> On Tue, Nov 7, 2017 at 1:01 PM, Andres Freund  wrote:
>> +/*
>> + * Delete a BufFile that was created by BufFileCreateShared in the given
>> + * SharedFileSet using the given name.
>> + *
>> + * It is not necessary to delete files explicitly with this function.  It is
>> + * provided only as a way to delete files proactively, rather than waiting 
>> for
>> + * the SharedFileSet to be cleaned up.
>> + *
>> + * Only one backend should attempt to delete a given name, and should know
>> + * that it exists and has been exported or closed.
>> + */
>
> This part is new to me. We now want one backend to delete a given
> filename. What changed? Please provide a Message-Id reference if that
> will help me to understand.
>
> For now, I'm going to guess that this development had something to do
> with the need to deal with virtual FDs that do a close() on an FD to
> keep under backend limits. Do I have that right?

No -- this is simply an option available to client code that wants to
clean up individual temporary files earlier.  Such client code is
responsible for meeting the synchronisation requirements described in
the comment, but it's entirely optional.  For example:
SharedTuplestore is backed by multiple files and it could take the
opportunity to delete individual files after they've been scanned, if
you told it you were going to scan only once
(SHARED_TUPLESTORE_SINGLE_PASS) and if it could prove that no other
backend could still need to read from it, as mentioned in a comment in
sts_end_parallel_scan().

However, since I changed to the page/chunk based model (spinlock while
advancing block counter, mimicking Parallel Seq Scan) instead of the
earlier Fisher Price version (LWLock while reading each tuple with a
shared read head maintained with tell/seek), I don't actually do that.
Hitting the end of the file no longer means that no one else is
reading from the file (someone else might still be reading from an
earlier chunk even though you've finished reading the final chunk in
the file, and the vfd systems means that they must be free to close
and reopen the file at any time).  In the current patch version the
files are cleaned up wholesale at two times: SharedFileSet cleanup
triggered by DSM destruction, and SharedFileSet reset triggered by
rescan.  Practically, it's always the former case.  It's vanishingly
rare that you'd actually want to be rescanning a Parallel Hash that
spills to disk but in that case we delete the old files and recreate,
and that case is tested in the regression tests.

If it bothers you that I have an API there that I'm not actually using
yet, I will remove it.

>> +   if (vfdP->fdstate & FD_TEMP_FILE_LIMIT)
>> +   {
>> +   /* Subtract its size from current usage (do first in case of 
>> error) */
>> +   temporary_files_size -= vfdP->fileSize;
>> +   vfdP->fileSize = 0;
>> +   }
>>
>> So, is it right to do so unconditionally and without regard for errors?
>> If the file isn't deleted, it shouldn't be subtracted from fileSize. I
>> guess you're managing that through the flag, but that's not entirely
>> obvious.
>
> I think that the problem here is that the accounting is expected to
> always work. It's not like there is a resowner style error path in
> which temporary_files_size gets reset to 0.

But there is a resowner error path in which File handles get
automatically closed and temporary_files_size gets adjusted.  The
counter goes up when you create, and goes down when you close or
resowner closes for you.  Eventually you either close and the
bookkeeping is consistent or you crash and it doesn't matter.  And
some kind of freak multiple close attempt is guarded against by
setting the file size to 0 so we can't double-subtract.  Do you see a
bug?

None of this has any impact on whether files are leaked: either
SharedFileSet removes the files, or you crash (or take a filesystem
snapshot, etc) and RemovePgTempFiles() mops them up at the next clean
startup.

-- 
Thomas Munro
http://www.enterprisedb.com




Re: [HACKERS] Refactor handling of database attributes between pg_dump and pg_dumpall

2017-11-07 Thread Haribabu Kommi
On Wed, Nov 8, 2017 at 8:48 AM, Robert Haas  wrote:

> On Tue, Nov 7, 2017 at 4:35 AM, Haribabu Kommi wrote:
> > The newly added option is not recommended to be used in normal cases and
> > it is used only for upgrade utilities.
>
> I don't know why it couldn't be used in normal cases.  That seems like
> a totally legitimate thing for somebody to want.  Maybe nobody does,
> but I see no reason to worry if they do.


OK. I removed the documentation changes saying that it cannot be used in
normal scenarios, and also added a Note section explaining the problem of
using the dump with the pg_restore command with the --clean and --create
options.

Regards,
Hari Babu
Fujitsu Australia


pg_dump-and-pg_dumpall-database-handling-refactoring_v10.patch
Description: Binary data



Re: [HACKERS] MERGE SQL Statement for PG11

2017-11-07 Thread Peter Geoghegan
On Tue, Nov 7, 2017 at 3:29 PM, Nico Williams  wrote:
> On Thu, Nov 02, 2017 at 03:25:48PM -0700, Peter Geoghegan wrote:
>> Nico Williams  wrote:
>> >A MERGE mapped to a DML like this:
>
> I needed to spend more time reading MERGE docs from other RDBMSes.

Please don't hijack this thread. It's about the basic question of
semantics, and is already hard enough for others to follow as-is.

-- 
Peter Geoghegan




Re: [HACKERS] [PATCH] Add ALWAYS DEFERRED option for constraints

2017-11-07 Thread Nico Williams
On Mon, Nov 06, 2017 at 05:50:21PM +1300, Thomas Munro wrote:
> On Fri, Oct 20, 2017 at 9:05 AM, Nico Williams  wrote:
> > Rebased (there were conflicts in the SGML files).
> 
> Hi Nico
> 
> FYI that version has some stray absolute paths in constraints.source:
> 
> -COPY COPY_TBL FROM '@abs_srcdir@/data/constro.data';
> +COPY COPY_TBL FROM 
> '/home/nico/ws/postgres/src/test/regress/data/constro.data';
> 
> -COPY COPY_TBL FROM '@abs_srcdir@/data/constrf.data';
> +COPY COPY_TBL FROM 
> '/home/nico/ws/postgres/src/test/regress/data/constrf.data';

Oops!  Thanks for catching that!




Re: [HACKERS] MERGE SQL Statement for PG11

2017-11-07 Thread Nico Williams
On Thu, Nov 02, 2017 at 03:25:48PM -0700, Peter Geoghegan wrote:
> Nico Williams  wrote:
> >A MERGE mapped to a DML like this:

I needed to spend more time reading MERGE docs from other RDBMSes.

The best MERGE so far is MS SQL Server's, which looks like:

  MERGE INTO <target table>
  USING <source table, view, or query>
  ON (<join condition>)

  -- optional:
  WHEN MATCHED THEN UPDATE SET ...

  -- optional:
  WHEN NOT MATCHED [ BY TARGET ] THEN INSERT ...

  -- optional:
  WHEN NOT MATCHED BY SOURCE THEN DELETE

  -- optional:
  OUTPUT ...
  ;

(The other MERGEs are harder to use because they lack a WHEN NOT MATCHED
BY SOURCE THEN DELETE, instead having a DELETE clause on the UPDATE,
which is then difficult to use.)

This is *trivial* to map to a CTE, and, in fact, I and my colleagues
have resorted to hand-coded CTEs like this precisely because PG lacks
MERGE (though we ourselves didn't know about MERGE -- it's new to us).

If <source> is a query, then we start with a CTE for that; if it's
a view or table, then we don't set up a CTE for it.  Each of the UPDATE,
INSERT, and/or DELETE can be its own CTE.  If there's an OUTPUT clause,
that can be a final SELECT that queries from the CTEs that ran the DMLs
with RETURNING.  If there's no OUTPUT then none of the DMLs need to have
RETURNING, and one of them will be the main statement, rather than a
CTE.

The pattern is:

  WITH
    -- IFF <source> is a query:
    <source> AS (<source query>),

    -- IFF there's a WHEN MATCHED THEN UPDATE
    updates AS (
      UPDATE <target> AS <target alias> SET ...
      FROM <source>
      WHERE <join condition>
      -- IFF there's an OUTPUT clause, then:
      RETURNING 'update' as "@action", ...
    ),

    inserts AS (
      INSERT INTO <target> (<column list>)
      SELECT ...
      FROM <source>
      LEFT JOIN <target> ON <join condition>
      WHERE <target row> IS NOT DISTINCT FROM NULL
      -- IFF there's a CONCURRENTLY clause:
      ON CONFLICT DO NOTHING
      -- IFF there's an OUTPUT clause, then:
      RETURNING 'insert' as "@action", ...
    ),

    deletes AS (
      DELETE FROM <target>
      WHERE NOT EXISTS (SELECT * FROM <source> WHERE <join condition>)
      -- IFF there's an OUTPUT clause, then:
      RETURNING 'delete' as "@action", ...
    ),

  -- IFF there's an OUTPUT clause
  SELECT * FROM updates
  UNION
  SELECT * FROM inserts
  UNION
  SELECT * FROM deletes;

If there's not an output clause then one of the DMLs has to be the main
statement:

  WITH ...
  DELETE ...; -- or UPDATE, or INSERT
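
To make the pattern concrete, here is a minimal instance of the mapping
(hypothetical tables accounts as target and staged_accounts as source,
matched on id; no OUTPUT clause, so the DELETE is the main statement):

  WITH
  updates AS (
    UPDATE accounts a SET balance = s.balance
    FROM staged_accounts s
    WHERE a.id = s.id
  ),
  inserts AS (
    INSERT INTO accounts (id, balance)
    SELECT s.id, s.balance
    FROM staged_accounts s
    LEFT JOIN accounts a ON a.id = s.id
    WHERE a.id IS NOT DISTINCT FROM NULL
    ON CONFLICT DO NOTHING  -- only if the MERGE had a CONCURRENTLY clause
  )
  DELETE FROM accounts a
  WHERE NOT EXISTS (SELECT 1 FROM staged_accounts s WHERE s.id = a.id);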

Note that if the source is a view or table and there's no OUTPUT clause,
then it's one DML with up to (but not referring to) two CTEs, and in all
cases the CTEs do not refer to each other.  This means that the executor
can parallelize all of the DMLs.

If the source is a query, then that could be made a temp view to avoid
having to run the query first.  The CTE executor needs to learn to
sometimes do this anyways, so this is good.

The deletes CTE can be equivalently written without a NOT EXISTS:

to_be_deleted AS (
  SELECT <target key>
  FROM <target>
  LEFT JOIN <source> ON (<join condition>)
  WHERE <source row> IS NOT DISTINCT FROM NULL
),
deletes AS (
  DELETE FROM <target>
  USING to_be_deleted tbd
  WHERE <target key> = tbd.<key>
)

if that were to run faster (probably not, since PG today would first run
the to_be_deleted CTE, then the deletes CTE).  I mention it only because
it's nice to see the symmetry of LEFT JOINs for the two WHEN NOT MATCHED
cases.

(Here <target alias> is the alias for <target>, if one was given.)

***

This mapping triggers triggers as one would expect (at least FOR EACH
ROW; I expect the DMLs in CTEs should also trigger FOR EACH STATEMENT
triggers, and if they don't I consider that a bug).

> This is a bad idea. An implementation like this is not at all
> maintainable.

I beg to differ.  First of all, not having to add an executor for MERGE
is a win: much, much less code to maintain.  The code to map MERGE to
CTEs can easily be contained entirely in src/backend/parser/gram.y,
which is a maintainability win: any changes to how CTEs are compiled
will fail to compile if they break the MERGE mapping to CTEs.

> >can handle concurrency via ON CONFLICT DO NOTHING in the INSERT CTE.
> 
> That's not handling concurrency -- it's silently ignoring an error. Who
> is to say that the conflict that IGNORE ignored is associated with a row
> visible to the MVCC snapshot of the statement? IOW, why should the DELETE
> affect any row?

That was me misunderstanding MERGE.  The DELETE is independent of the
INSERT -- if an INSERT does nothing because of an ON CONFLICT DO
NOTHING clause, then that won't cause that row to be deleted -- the
inserts and deletes CTEs are independent in the latest mapping (see
above).

I believe adding ON CONFLICT DO NOTHING to the INSERT in this mapping is
all that's needed to support concurrency.

> There are probably a great many reasons why you need a ModifyTable
> executor node that keeps around state, and explicitly indicates that a
> MERGE is a MERGE. For example, we'll probably want statement level
> triggers to execute in a fixed order, regardless of the MERGE, RLS will
> probably require explicitly knowledge of MERGE 

Re: [HACKERS] Exclude pg_internal.init from base backup

2017-11-07 Thread Michael Paquier
On Wed, Nov 8, 2017 at 1:42 AM, David Steele  wrote:
> On 11/7/17 11:03 AM, Simon Riggs wrote:
>> On 5 November 2017 at 11:55, Magnus Hagander  wrote:
>>>
>>> So +1 for documenting the difference in how these are handled, as this is
>>> important to know for somebody writing an external tool for it.
>>
>> Changes made, moving to commit the attached patch.
>
> Thanks, Simon!  This was on my to do list today -- glad I checked my
> email first.


+<filename>pg_internal.init</filename> files can be omitted from the
+backup whenever a file of that name is found.  These files contain
+relation cache data that is always rebuilt when recovering.
+   
Do we want to mention in the docs that the same decision-making is
done for *all* files with matching names, aka the fact that if a file
is listed and found in a sub-folder it is skipped? postmaster.opts or
similar friends are unlikely to be found in anything but the root of
the data folder, still the upthread argument of documenting precisely
what basebackup.c does sounded rather convincing to me.
-- 
Michael


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Remove secondary checkpoint

2017-11-07 Thread Michael Paquier
On Wed, Nov 8, 2017 at 2:23 AM, Simon Riggs  wrote:
> On 31 October 2017 at 12:01, Michael Paquier  
> wrote:
>> While the mention about a manual checkpoint happening after a timed
>> one will cause a full range of WAL segments to be recycled, it is not
>> actually true that segments of the prior's prior checkpoint are not
>> needed, because with your patch the segments of the prior checkpoint
>> are getting recycled. So it seems to me that based on that the formula
>> ought to use 1.0 instead of 2.0...
>
> I think the argument in the comment is right, in that
> CheckPointDistanceEstimate is better if we use multiple checkpoint
> cycles.

Yes, the theory behind it is correct. No argument about that.

> But the implementation of that is bogus and multiplying by 2.0
> wouldn't make it better if CheckPointDistanceEstimate is wrong.

Yes, this is wrong. My apologies if my words looked confusing. By
reading your message I can see that our thoughts are on the same page.
-- 
Michael


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Small improvement to compactify_tuples

2017-11-07 Thread Peter Geoghegan
On Tue, Nov 7, 2017 at 2:40 PM, Юрий Соколов  wrote:
>> The same is true of unique indexes vs. non-unique.
>
> offtopic: recently I had a look at setting LP_DEAD in indexes.
> I didn't find a huge difference between unique and non-unique indexes.
> There is a codepath that works only for unique indexes, but it is called
> less frequently than the common codepath that also sets LP_DEAD.

I meant to say that this is only important with UPDATEs + contention.
The extra LP_DEAD setting within _bt_check_unique() makes quite a
noticeable difference, at least in terms of index bloat (though less
so in terms of raw TPS).

-- 
Peter Geoghegan


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Small improvement to compactify_tuples

2017-11-07 Thread Peter Geoghegan
On Tue, Nov 7, 2017 at 2:36 PM, Tom Lane  wrote:
> Peter Geoghegan  writes:
>> My point is only that it's worth considering that this factor affects
>> how representative your sympathetic case is. It's not clear how many
>> PageIndexMultiDelete() calls are from opportunistic calls to
>> _bt_vacuum_one_page(), how important that subset of calls is, and so
>> on. Maybe it doesn't matter at all.
>
> According to the perf measurements I took earlier, essentially all the
> compactify_tuple calls in this test case are from PageRepairFragmentation
> (from heap_page_prune), not PageIndexMultiDelete.

For a workload with high contention (e.g., lots of updates that follow
a Zipfian distribution) lots of important cleanup has to occur within
_bt_vacuum_one_page(), and with an exclusive buffer lock held. It may
be that making PageIndexMultiDelete() faster pays off
disproportionately well there, but I'd only expect to see that at
higher client count workloads with lots of contention -- workloads
that we still do quite badly on (note that we have never done
well here, even prior to commit 2ed5b87f9 -- Yura showed this at one
point).

It's possible that this work influenced Yura in some way.

When Postgres Pro did some benchmarking of this at my request, we saw
that the bloat got really bad past a certain client count. IIRC there
was a clear point at around 32 or 64 clients where TPS nosedived,
presumably because cleanup could not keep up. This was a 128 core box,
or something like that, so you'll probably have difficulty recreating
it with what's at hand.

-- 
Peter Geoghegan


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [PATCH] Assert that the correct locks are held when calling PageGetLSN()

2017-11-07 Thread Michael Paquier
On Wed, Nov 8, 2017 at 2:26 AM, Jacob Champion  wrote:
> On Mon, Nov 6, 2017 at 6:18 PM, Michael Paquier
>  wrote:
>> Did you really test WAL replay?
>
> Is there a way to test this other than installcheck-world? The only
> failure we've run into at the moment is in the snapshot-too-old tests.
> Maybe we're not configuring with all the tests enabled?

Not automatically. The simplest method I use in this case is
installcheck with a standby replaying changes behind.

>>> The assertion added caught at least one code path where TestForOldSnapshot
>>> calls PageGetLSN without holding any lock.  The snapshot_too_old test in
>>> "check-world" failed due to the assertion failure.  This needs to be fixed,
>>> see the open question in the opening mail on this thread.
>>
>> Good point. I am looping Kevin Grittner here for his input, as the
>> author and committer of old_snapshot_threshold. Things can be
>> addressed with a separate patch by roughly moving the check on
>> PageGetLSN() to TestForOldSnapshot_impl() and use BufferGetLSNAtomic()
>> instead.
>
> It still doesn't pass the tests, as not all code paths ensure that a
> content lock is held before calling TestForOldSnapshot.
> BufferGetLSNAtomic only adds the spinlock.

I would prefer waiting for more input from Kevin here. This may prove
to be a more invasive change. There are no objections to fixing the
existing callers in index AMs though.
-- 
Michael


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Small improvement to compactify_tuples

2017-11-07 Thread Юрий Соколов
2017-11-08 1:11 GMT+03:00 Peter Geoghegan :
>
> The same is true of unique indexes vs. non-unique.

offtopic: recently I had a look at setting LP_DEAD in indexes.
I didn't find a huge difference between unique and non-unique indexes.
There is a codepath that works only for unique indexes, but it is called
less frequently than the common codepath that also sets LP_DEAD.


Re: [HACKERS] Small improvement to compactify_tuples

2017-11-07 Thread Tom Lane
Peter Geoghegan  writes:
> My point is only that it's worth considering that this factor affects
> how representative your sympathetic case is. It's not clear how many
> PageIndexMultiDelete() calls are from opportunistic calls to
> _bt_vacuum_one_page(), how important that subset of calls is, and so
> on. Maybe it doesn't matter at all.

According to the perf measurements I took earlier, essentially all the
compactify_tuple calls in this test case are from PageRepairFragmentation
(from heap_page_prune), not PageIndexMultiDelete.

I'd be the first to agree that I doubt that test case is really
representative.  I'd been whacking around Yura's original case to
try to get PageRepairFragmentation's runtime up to some measurable
fraction of the total, and while I eventually succeeded, I'm not
sure that too many real workloads will look like that.  However,
if we can make it smaller as well as faster, that seems like a win
even if it's not a measurable fraction of most workloads.

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [PATCH] Assert that the correct locks are held when calling PageGetLSN()

2017-11-07 Thread Jacob Champion
On Tue, Nov 7, 2017 at 9:26 AM, Jacob Champion  wrote:
> On Mon, Nov 6, 2017 at 6:18 PM, Michael Paquier
>  wrote:
>> It seems to me that 0001 is good for a committer lookup, that will get
>> rid of all existing bugs. For 0002, what you are proposing is still
>> not a good idea for anything using page copies.
>
> I think there is still significant confusion here. To be clear: this
> patch is intended to add no changes for local page copies.

Maybe a better way to put this is: we see no failures with the
pageinspect regression tests, which exercise PageGetLSN() via the
page_header() function. Are there other tests we should be paying
attention to that might show a problem with our local-copy logic?

--Jacob


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Small improvement to compactify_tuples

2017-11-07 Thread Peter Geoghegan

On Tue, Nov 7, 2017 at 1:39 PM, Tom Lane  wrote:
> So I think we should seriously consider the attached, but it'd be a
> good idea to benchmark it on a wider variety of platforms and test
> cases.

> create unlogged table test3 (
>  id integer PRIMARY KEY with (fillfactor=85),
>  val text
>  ) WITH (fillfactor=85);

Passing observation:  Unlogged table B-Tree indexes have a much
greater tendency for LP_DEAD setting/kill_prior_tuple() working out
following commit 2ed5b87f9 [1], because unlogged tables were
unaffected by that commit. (I've been meaning to follow up with my
analysis of that regression, actually.)

The same is true of unique indexes vs. non-unique. There are workloads
where the opportunistic LP_DEAD setting performed by
_bt_check_unique() is really important (it calls ItemIdMarkDead()).
Think high contention workloads, like when Postgres is used to
implement a queue table.

My point is only that it's worth considering that this factor affects
how representative your sympathetic case is. It's not clear how many
PageIndexMultiDelete() calls are from opportunistic calls to
_bt_vacuum_one_page(), how important that subset of calls is, and so
on. Maybe it doesn't matter at all.

[1] 
https://postgr.es/m/cah2-wzmyry7mnjf0gw5wtk3cszh3gqfhhoxvsyuno5pk8cu...@mail.gmail.com
-- 
Peter Geoghegan


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Parallel Hash take II

2017-11-07 Thread Thomas Munro
On Wed, Nov 8, 2017 at 10:32 AM, Robert Haas  wrote:
> On Tue, Nov 7, 2017 at 4:01 PM, Andres Freund  wrote:
>> diff --git a/src/backend/utils/resowner/resowner.c 
>> b/src/backend/utils/resowner/resowner.c
>> index 4c35ccf65eb..8b91d5a6ebe 100644
>> --- a/src/backend/utils/resowner/resowner.c
>> +++ b/src/backend/utils/resowner/resowner.c
>> @@ -528,16 +528,6 @@ ResourceOwnerReleaseInternal(ResourceOwner owner,
>> PrintRelCacheLeakWarning(res);
>> RelationClose(res);
>> }
>> -
>> -   /* Ditto for dynamic shared memory segments */
>> -   while (ResourceArrayGetAny(&(owner->dsmarr), &foundres))
>> -   {
>> -   dsm_segment *res = (dsm_segment *) 
>> DatumGetPointer(foundres);
>> -
>> -   if (isCommit)
>> -   PrintDSMLeakWarning(res);
>> -   dsm_detach(res);
>> -   }
>> }
>> else if (phase == RESOURCE_RELEASE_LOCKS)
>> {
>> @@ -654,6 +644,16 @@ ResourceOwnerReleaseInternal(ResourceOwner owner,
>> PrintFileLeakWarning(res);
>> FileClose(res);
>> }
>> +
>> +   /* Ditto for dynamic shared memory segments */
>> +   while (ResourceArrayGetAny(&(owner->dsmarr), &foundres))
>> +   {
>> +   dsm_segment *res = (dsm_segment *) 
>> DatumGetPointer(foundres);
>> +
>> +   if (isCommit)
>> +   PrintDSMLeakWarning(res);
>> +   dsm_detach(res);
>> +   }
>> }
>>
>> Is that entirely unproblematic? Are there any DSM callbacks that rely on
>> locks still being held? Please split this part into a separate commit
>> with such analysis.
>
> FWIW, I think this change is a really good idea (I recommended it to
> Thomas at some stage, I think).

Yeah, it was Robert's suggestion; I thought I needed *something* like
this but was hesitant for the niggling reason that Andres mentions:
what if someone somewhere (including code outside our source tree)
depends on this ordering because of unlocking etc?

At that time I thought that my clean-up logic wasn't going to work on
Windows without this reordering, because we were potentially closing
file handles after unlinking the files, and I was under the impression
that Windows wouldn't like that.  Since then I've learned that Windows
does actually allow it, but only if all file handles were opened with
the FILE_SHARE_DELETE flag.  We always do that (see src/port/open.c),
so in fact this change is probably not needed for my patch set (theory
not tested).  I will put it in a separate patch as requested by
Andres, because it's generally a good idea anyway for the reasons that
Robert explained (ie you probably always want to clean up memory last,
since it might contain the meta-data/locks/control objects/whatever
you'll need to clean up anything else).
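
(For the archives, the relevant bit is the share mode. A minimal sketch of
an open that allows unlink-while-open on Windows -- src/port/open.c passes
these share flags for every file we open; exact parameters here are
illustrative:)

HANDLE      h;

h = CreateFile(path,
               GENERIC_READ | GENERIC_WRITE,
               FILE_SHARE_READ | FILE_SHARE_WRITE | FILE_SHARE_DELETE,
               NULL,            /* default security attributes */
               OPEN_EXISTING,
               FILE_ATTRIBUTE_NORMAL,
               NULL);           /* no template file */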

-- 
Thomas Munro
http://www.enterprisedb.com


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Refactor handling of database attributes between pg_dump and pg_dumpall

2017-11-07 Thread Robert Haas
On Tue, Nov 7, 2017 at 4:35 AM, Haribabu Kommi  wrote:
> The newly added option is not recommended to be used in normal cases and
> it is used only for upgrade utilities.

I don't know why it couldn't be used in normal cases.  That seems like
a totally legitimate thing for somebody to want.  Maybe nobody does,
but I see no reason to worry if they do.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [PATCH] Overestimated filter cost and its mitigation

2017-11-07 Thread Tom Lane
Robert Haas  writes:
> I think it would be a good idea, as Thomas says, to order the qual
> clauses at an earlier stage and then remember our decision.  However,
> we have to think about whether that's going to increase planning time
> in a noticeable way.  I wonder why we currently postpone this until
> such a late phase of planning.

Because (1) up to now there's been no need to consider the qual ordering
till later, and (2) re-doing that sort for each path seemed unduly
expensive.  If we're to try to estimate whether later quals will be
reached, then sure the ordering becomes important.  I'm still concerned
about (2) though.  If there were a way to pre-sort the quals once for
all paths, the problem goes away, but I don't think that works ---
indexscans may segregate some quals, and in join cases different paths
will actually have different sets of quals they need to check depending
on the join order.

So the bottom line here is how much is it going to cost us to add this
additional refinement in cost estimation, and is it worth it given our
extremely poor level of accuracy in expression cost estimation.  Color
me dubious.

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] pg_stat_wal_write statistics view

2017-11-07 Thread Robert Haas
On Tue, Nov 7, 2017 at 4:31 AM, Haribabu Kommi  wrote:
>> Updated patch attached.
> Patch rebased.

I think the earlier concerns about the performance impact of this are
probably very valid concerns, and I don't see how the new version of
the patch gets us much closer to solving them.

I am also not sure I understand how the backend_write_blocks column is
intended to work.  The only call to pgstat_send_walwrites() is in
WalWriterMain, so where do the other backends report anything?

Also, if there's only ever one global set of counters (as opposed to
one per table, say) then why use the stats collector machinery for
this at all, vs. having a structure in shared memory that can be
updated directly?  It seems like adding a lot of overhead for no
functional benefit.
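
(For illustration, the shared-memory alternative could be as simple as the
following sketch -- all names here are hypothetical, not from the patch:)

typedef struct WalWriteStats
{
	pg_atomic_uint64 writes;			/* number of WAL write() calls */
	pg_atomic_uint64 written_bytes;		/* bytes passed to write() */
	pg_atomic_uint64 backend_writes;	/* writes done by ordinary backends */
} WalWriteStats;

static WalWriteStats *walWriteStats;	/* allocated in shared memory */

static void
CountWalWrite(uint64 bytes, bool by_backend)
{
	/* update the global counters directly, no stats collector traffic */
	pg_atomic_fetch_add_u64(&walWriteStats->writes, 1);
	pg_atomic_fetch_add_u64(&walWriteStats->written_bytes, bytes);
	if (by_backend)
		pg_atomic_fetch_add_u64(&walWriteStats->backend_writes, 1);
}

A view function would then just read the struct, with no messages sent to
the stats collector at all.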

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Small improvement to compactify_tuples

2017-11-07 Thread Tom Lane
I've been getting less and less excited about this patch, because I still
couldn't measure any above-the-noise performance improvement without
artificial exaggerations, and some cases seemed actually slower.

However, this morning I had an epiphany: why are we sorting at all?

There is no requirement that these functions preserve the physical
ordering of the tuples' data areas, only that the line-pointer ordering be
preserved.  Indeed, reorganizing the data areas into an ordering matching
the line pointers is probably a good thing, because it should improve
locality of access in future scans of the page.  This is trivial to
implement if we copy the data into a workspace area and back again, as
I was already proposing to do to avoid memmove.  Moreover, at that point
there's little value in a separate compactify function at all: we can
integrate the data-copying logic into the line pointer scan loops in
PageRepairFragmentation and PageIndexMultiDelete, and get rid of the
costs of constructing the intermediate itemIdSortData arrays.
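
(For concreteness, a minimal standalone sketch of the no-sort idea, still
using the itemIdSortData array for clarity -- the actual patch integrates
the copying into the scan loops as just described:)

static void
compactify_without_sort(itemIdSort itemidbase, int nitems, Page page)
{
	PageHeader	phdr = (PageHeader) page;
	char		workspace[BLCKSZ];
	Offset		upper = phdr->pd_special;
	int			i;

	/* lay the tuples out in the workspace in line-pointer order */
	for (i = 0; i < nitems; i++)
	{
		itemIdSort	itemidptr = &itemidbase[i];
		ItemId		lp = PageGetItemId(page, itemidptr->offsetindex + 1);

		upper -= itemidptr->alignedlen;
		memcpy(workspace + upper,
			   (char *) page + itemidptr->itemoff,
			   itemidptr->alignedlen);
		lp->lp_off = upper;
	}

	/* copy the rebuilt data area back onto the page in one pass */
	memcpy((char *) page + upper, workspace + upper,
		   phdr->pd_special - upper);
	phdr->pd_upper = upper;
}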

That led me to the attached patch, which is the first version of any
of this work that produces an above-the-noise performance win for me.
I'm seeing 10-20% gains on this modified version of Yura's original
example:

psql -f test3setup.sql
pgbench -M prepared -c 3 -s 1000 -T 300 -P 3 -n -f test3.sql

(sql scripts also attached below; I'm using 1GB shared_buffers and
fsync off, other parameters stock.)

However, there are a couple of objections that could be raised to
this patch:

1. It's trading off per-byte work, in the form of an extra memcpy,
to save sorting work that has per-tuple costs.  Therefore, the relatively
narrow tuples used in Yura's example offer a best-case scenario;
with wider tuples the performance might be worse.

2. On a platform with memmove not so much worse than memcpy as I'm
seeing on my RHEL6 server, trading memmove for memcpy might not be
such a win.

To address point 1, I tried some measurements on the standard pgbench
scenario, which uses significantly wider tuples.  In hopes of addressing
point 2, I also ran the measurements on a laptop running Fedora 25
(gcc 6.4.1, glibc 2.24); I haven't actually checked memmove vs memcpy
on that machine, but at least it's a reasonably late-model glibc.

What I'm getting from the standard pgbench measurements, on both machines,
is that this patch might be a couple percent slower than HEAD, but that is
barely above the noise floor so I'm not too sure about it.

So I think we should seriously consider the attached, but it'd be a
good idea to benchmark it on a wider variety of platforms and test
cases.

regards, tom lane

drop table if exists test3;

create unlogged table test3 (
 id integer PRIMARY KEY with (fillfactor=85),
 val text
 ) WITH (fillfactor=85);

insert into test3 select i, '!'||i from generate_series(1, 1000) as i;

vacuum analyze; checkpoint;

create or replace function dotest3(n int, scale float8) returns void
language plpgsql as $$
begin
for i in 1..n loop
  declare
id1 int := random() * scale;
id2 int := random() * scale;
  begin
perform * from test3 where id = id1;
update test3 set val = '!'|| id2 where id = id1;
  end;
end loop;
end $$;
select dotest3(100, :scale);
diff --git a/src/backend/storage/page/bufpage.c b/src/backend/storage/page/bufpage.c
index b6aa2af..73b73de 100644
*** a/src/backend/storage/page/bufpage.c
--- b/src/backend/storage/page/bufpage.c
*** PageRestoreTempPage(Page tempPage, Page 
*** 415,471 
  }
  
  /*
-  * sorting support for PageRepairFragmentation and PageIndexMultiDelete
-  */
- typedef struct itemIdSortData
- {
- 	uint16		offsetindex;	/* linp array index */
- 	int16		itemoff;		/* page offset of item data */
- 	uint16		alignedlen;		/* MAXALIGN(item data len) */
- } itemIdSortData;
- typedef itemIdSortData *itemIdSort;
- 
- static int
- itemoffcompare(const void *itemidp1, const void *itemidp2)
- {
- 	/* Sort in decreasing itemoff order */
- 	return ((itemIdSort) itemidp2)->itemoff -
- 		((itemIdSort) itemidp1)->itemoff;
- }
- 
- /*
-  * After removing or marking some line pointers unused, move the tuples to
-  * remove the gaps caused by the removed items.
-  */
- static void
- compactify_tuples(itemIdSort itemidbase, int nitems, Page page)
- {
- 	PageHeader	phdr = (PageHeader) page;
- 	Offset		upper;
- 	int			i;
- 
- 	/* sort itemIdSortData array into decreasing itemoff order */
- 	qsort((char *) itemidbase, nitems, sizeof(itemIdSortData),
- 		  itemoffcompare);
- 
- 	upper = phdr->pd_special;
- 	for (i = 0; i < nitems; i++)
- 	{
- 		itemIdSort	itemidptr = &itemidbase[i];
- 		ItemId		lp;
- 
- 		lp = PageGetItemId(page, itemidptr->offsetindex + 1);
- 		upper -= itemidptr->alignedlen;
- 		memmove((char *) page + upper,
- (char *) page + itemidptr->itemoff,
- itemidptr->alignedlen);
- 		lp->lp_off = upper;
- 	}
- 
- 	phdr->pd_upper = upper;
- }
- 
- /*
   * PageRepairFragmentation
   *
   * 

Re: [HACKERS] Fix a typo in dsm_impl.c

2017-11-07 Thread Robert Haas
On Mon, Nov 6, 2017 at 11:22 PM, Masahiko Sawada  wrote:
> Attached the patch for $subject.

Committed.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Parallel Hash take II

2017-11-07 Thread Peter Geoghegan
On Tue, Nov 7, 2017 at 1:01 PM, Andres Freund  wrote:
> +/*
> + * Build the name for a given segment of a given BufFile.
> + */
> +static void
> +MakeSharedSegmentName(char *name, const char *buffile_name, int segment)
> +{
> +   snprintf(name, MAXPGPATH, "%s.%d", buffile_name, segment);
> +}
>
> Not a fan of this name - you're not "making" a filename here (as in
> allocating or such). I think I'd just remove the Make prefix.

+1

Can we document the theory behind file naming here, if that isn't in
the latest version? This is a routine private to parallel hash join
(or shared tuplestore), not BufFile. Maybe BufFile should have some
opinion on this, though. Just as a matter of style.

> +/*
> + * Delete a BufFile that was created by BufFileCreateShared in the given
> + * SharedFileSet using the given name.
> + *
> + * It is not necessary to delete files explicitly with this function.  It is
> + * provided only as a way to delete files proactively, rather than waiting 
> for
> + * the SharedFileSet to be cleaned up.
> + *
> + * Only one backend should attempt to delete a given name, and should know
> + * that it exists and has been exported or closed.
> + */

This part is new to me. We now want one backend to delete a given
filename. What changed? Please provide a Message-Id reference if that
will help me to understand.

For now, I'm going to guess that this development had something to do
with the need to deal with virtual FDs that do a close() on an FD to
keep under backend limits. Do I have that right?

> +   /*
> +* We don't set FD_DELETE_AT_CLOSE for files opened this way, but we 
> still
> +* want to make sure they get closed at end of xact.
> +*/
> +   ResourceOwnerEnlargeFiles(CurrentResourceOwner);
> +   ResourceOwnerRememberFile(CurrentResourceOwner, file);
> +   VfdCache[file].resowner = CurrentResourceOwner;
>
> So maybe I'm being pedantic here, but wouldn't the right order be to do
> ResourceOwnerEnlargeFiles() *before* creating the file? It's a memory
> allocating operation, so it can fail, which'd leak the file.

I remember going to pains to get this right with my own unifiable
BufFile concept. I'm going to wait for an answer to my question about file
deletion + resowners before commenting further, though.

> +   if (vfdP->fdstate & FD_TEMP_FILE_LIMIT)
> +   {
> +   /* Subtract its size from current usage (do first in case of 
> error) */
> +   temporary_files_size -= vfdP->fileSize;
> +   vfdP->fileSize = 0;
> +   }
>
> So, is it right to do so unconditionally and without regard for errors?
> If the file isn't deleted, it shouldn't be subtracted from fileSize. I
> guess you're managing that through the flag, but that's not entirely
> obvious.

I think that the problem here is that the accounting is expected to
always work. It's not like there is a resowner style error path in
which temporary_files_size gets reset to 0.

> Is that entirely unproblematic? Are there any DSM callbacks that rely on
> locks still being held? Please split this part into a separate commit
> with such analysis.

I was always confused about the proper use of DSM callbacks myself.
They seemed generally underdocumented.

> +   /* Create the output buffer. */
> +   if (accessor->write_chunk != NULL)
> +   pfree(accessor->write_chunk);
> +   accessor->write_chunk = (SharedTuplestoreChunk *)
> +   palloc0(accessor->write_pages * BLCKSZ);
>
> Are we guaranteed to be in a long-lived memory context here?

I imagine that Thomas looked to tuplestore_begin_heap() + interXact as
a kind of precedent here. See comments above that function.

-- 
Peter Geoghegan


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Parallel Hash take II

2017-11-07 Thread Robert Haas
On Tue, Nov 7, 2017 at 4:01 PM, Andres Freund  wrote:
> +   ResourceOwnerEnlargeFiles(CurrentResourceOwner);
> +   ResourceOwnerRememberFile(CurrentResourceOwner, file);
> +   VfdCache[file].resowner = CurrentResourceOwner;
>
> So maybe I'm being pedantic here, but wouldn't the right order be to do
> ResourceOwnerEnlargeFiles() *before* creating the file? It's a memory
> allocating operation, so it can fail, which'd leak the file.

That's not pedantic ... that's a very sound criticism.  IMHO, anyway.

> diff --git a/src/backend/utils/resowner/resowner.c 
> b/src/backend/utils/resowner/resowner.c
> index 4c35ccf65eb..8b91d5a6ebe 100644
> --- a/src/backend/utils/resowner/resowner.c
> +++ b/src/backend/utils/resowner/resowner.c
> @@ -528,16 +528,6 @@ ResourceOwnerReleaseInternal(ResourceOwner owner,
> PrintRelCacheLeakWarning(res);
> RelationClose(res);
> }
> -
> -   /* Ditto for dynamic shared memory segments */
> -   while (ResourceArrayGetAny(&(owner->dsmarr), &foundres))
> -   {
> -   dsm_segment *res = (dsm_segment *) 
> DatumGetPointer(foundres);
> -
> -   if (isCommit)
> -   PrintDSMLeakWarning(res);
> -   dsm_detach(res);
> -   }
> }
> else if (phase == RESOURCE_RELEASE_LOCKS)
> {
> @@ -654,6 +644,16 @@ ResourceOwnerReleaseInternal(ResourceOwner owner,
> PrintFileLeakWarning(res);
> FileClose(res);
> }
> +
> +   /* Ditto for dynamic shared memory segments */
> +   while (ResourceArrayGetAny(&(owner->dsmarr), &foundres))
> +   {
> +   dsm_segment *res = (dsm_segment *) 
> DatumGetPointer(foundres);
> +
> +   if (isCommit)
> +   PrintDSMLeakWarning(res);
> +   dsm_detach(res);
> +   }
> }
>
> Is that entirely unproblematic? Are there any DSM callbacks that rely on
> locks still being held? Please split this part into a separate commit
> with such analysis.

FWIW, I think this change is a really good idea (I recommended it to
Thomas at some stage, I think).  The current positioning was decided
by me at a very early stage of parallel query development where I
reasoned as follows: (1) the first thing we're going to implement is
going to be parallel quicksort, (2) that's going to allocate a huge
amount of DSM, (3) therefore we should try to free it as early as
possible.  However, I now think that was wrongheaded, and not just
because parallel quicksort didn't turn out to be the first thing we
developed.  Memory is the very last resource we should release when
aborting a transaction, because any other resource we have is tracked
using data structures that are stored in memory.  Throwing the memory
away before the end therefore makes life very difficult. That's why,
for backend-private memory, we clean up most everything else in
AbortTransaction() and then only destroy memory contexts in
CleanupTransaction().  This change doesn't go as far, but it's in the
same general direction, and I think rightly so.  My error was in
thinking that the primary use of memory would be for storing data, but
really it's about where you put your control structures.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] why not parallel seq scan for slow functions

2017-11-07 Thread Robert Haas
On Mon, Nov 6, 2017 at 9:57 PM, Amit Kapila  wrote:
>> Well, I suppose that test will fire for a baserel when the total
>> number of baserels is at least 3 and there's no inheritance involved.
>> But if there are 2 baserels, we're still not the topmost scan/join
>> target.
>
> No, if there are 2 baserels, then simple_rel_array_size will be 3.
> The simple_rel_array_size is always the "number of relations" plus
> "one".  See setup_simple_rel_arrays.

Oh, wow.  That's surprising.

>>  Also, even if inheritance is used, we might still be the
>> topmost scan/join target.
>
> Sure, but in that case, it won't generate the gather path here (due to
> this part of check "!root->append_rel_list").  I am not sure whether I
> have understood the second part of your question, so if my answer
> appears inadequate, then can you provide more details on what you are
> concerned about?

I don't know why the question of why root->append_rel_list is empty is
the relevant thing here for deciding whether to generate gather paths
at this point.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Parallel Hash take II

2017-11-07 Thread Andres Freund
Hi,

Here's a review of v24

+set min_parallel_table_scan_size = 0;
+set parallel_setup_cost = 0;
+-- Make a simple relation with well distributed keys and correctly
+-- estimated size.
+create table simple as
+  select generate_series(1, 20000) AS id, 'aa';
+alter table simple set (parallel_workers = 2);
+analyze simple;
+-- Make a relation whose size we will under-estimate.  We want stats
+-- to say 1000 rows, but actually there are 20,000 rows.
+create table bigger_than_it_looks as
+  select generate_series(1, 20000) as id, 'aa';
+alter table bigger_than_it_looks set (autovacuum_enabled = 'false');
+alter table bigger_than_it_looks set (parallel_workers = 2);
+delete from bigger_than_it_looks where id <= 19000;
+vacuum bigger_than_it_looks;
+analyze bigger_than_it_looks;
+insert into bigger_than_it_looks
+  select generate_series(1, 19000) as id, 'aa';

It seems kinda easier to just manipulate ndistinct and reltuples...


+set max_parallel_workers_per_gather = 0;
+set work_mem = '4MB';

I hope there's a fair amount of slop here - with different archs you're
going to see quite some size differences.

+-- The "good" case: batches required, but we plan the right number; we
+-- plan for 16 batches, and we stick to that number, and peak memory
+-- usage stays within our work_mem budget
+-- non-parallel
+set max_parallel_workers_per_gather = 0;
+set work_mem = '128kB';

So how do we know that's actually the case we're testing rather than
something arbitrarily different? There's IIRC tests somewhere that just
filter the json explain output to the right parts...





+/*
+ * Build the name for a given segment of a given BufFile.
+ */
+static void
+MakeSharedSegmentName(char *name, const char *buffile_name, int segment)
+{
+   snprintf(name, MAXPGPATH, "%s.%d", buffile_name, segment);
+}

Not a fan of this name - you're not "making" a filename here (as in
allocating or such). I think I'd just remove the Make prefix.



+/*
+ * Open a file that was previously created in another backend with
+ * BufFileCreateShared in the same SharedFileSet using the same name.  The
+ * backend that created the file must have called BufFileClose() or
+ * BufFileExport() to make sure that it is ready to be opened by other
+ * backends and render it read-only.
+ */

Is it actually guaranteed that it's another backend / do we rely on
that?

+BufFile *
+BufFileOpenShared(SharedFileSet *fileset, const char *name)
+{

+   /*
+* If we didn't find any files at all, then no BufFile exists with this
+* tag.
+*/
+   if (nfiles == 0)
+   return NULL;

s/tag/name/?


+/*
+ * Delete a BufFile that was created by BufFileCreateShared in the given
+ * SharedFileSet using the given name.
+ *
+ * It is not necessary to delete files explicitly with this function.  It is
+ * provided only as a way to delete files proactively, rather than waiting for
+ * the SharedFileSet to be cleaned up.
+ *
+ * Only one backend should attempt to delete a given name, and should know
+ * that it exists and has been exported or closed.
+ */
+void
+BufFileDeleteShared(SharedFileSet *fileset, const char *name)
+{
+   charsegment_name[MAXPGPATH];
+   int segment = 0;
+   boolfound = false;
+
+   /*
+* We don't know how many segments the file has.  We'll keep deleting
+* until we run out.  If we don't manage to find even an initial
+* segment, raise an error.
+*/
+   for (;;)
+   {
+   MakeSharedSegmentName(segment_name, name, segment);
+   if (!SharedFileSetDelete(fileset, segment_name, true))
+   break;
+   found = true;
+   ++segment;
+   }

Hm. Do we properly delete all the files via the resowner mechanism if
this fails midway? I.e. if there are no leading segments? Also wonder if
this doesn't need a CFI check.

+void
+PathNameCreateTemporaryDir(const char *basedir, const char *directory)
+{
+   if (mkdir(directory, S_IRWXU) < 0)
+   {
+   if (errno == EEXIST)
+   return;
+
+   /*
+* Failed.  Try to create basedir first in case it's missing.
+* Tolerate ENOENT to close a race against another process
+* following the same algorithm.
+*/
+   if (mkdir(basedir, S_IRWXU) < 0 && errno != ENOENT)
+   elog(ERROR, "cannot create temporary directory \"%s\": %m",
+basedir);

ENOENT or EEXIST?



+File
+PathNameCreateTemporaryFile(const char *path, bool error_on_failure)
+{
+   Filefile;
+
+   /*
+* Open the file.  Note: we don't use O_EXCL, in case there is an 
orphaned
+* temp file that can be reused.
+*/
+   file = PathNameOpenFile(path, 

Re: [HACKERS] [PATCH] Overestimated filter cost and its mitigation

2017-11-07 Thread Robert Haas
On Mon, Nov 6, 2017 at 5:19 AM, Ashutosh Bapat
 wrote:
> IIRC, the only thing that changes between plan time quals and execution
> time quals is constant folding of constant parameters. But I don't
> think we change the selectivity estimates when that's done. At the
> same time, I don't think we should make a lot of effort to make sure
> that the order used during the estimation is same as the order at the
> execution; we are anyway estimating. There can always be some
> difference between what's estimated and what actually happens.

I don't think that's a good justification for allowing the order to
vary.  It's true that, at execution time, we may find row counts and
selectivities to be different than what we predicted, but that is a
case of the real data being different from our idealized data -- which
is difficult to avoid in general.  Allowing our actual decisions to be
different from the decisions we thought we would make seems like
programmer sloppiness.  It would also be very confusing if the planner
uses one ordering to estimate the cost and then a different order at
execution time and in the EXPLAIN ANALYZE output.  How could anybody
understand what had happened in such a case?

I think it would be a good idea, as Thomas says, to order the qual
clauses at an earlier stage and then remember our decision.  However,
we have to think about whether that's going to increase planning time
in a noticeable way.  I wonder why we currently postpone this until
such a late phase of planning.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Moving relation extension locks out of heavyweight lock manager

2017-11-07 Thread Robert Haas
On Mon, Nov 6, 2017 at 4:42 AM, Masahiko Sawada  wrote:
>>> I suggest that a good thing to do more or less immediately, regardless
>>> of when this patch ends up being ready, would be to insert an
>>> assertion that LockAcquire() is never called while holding a lock of
>>> one of these types.  If that assertion ever fails, then the whole
>>> theory that these lock types don't need deadlock detection is wrong,
>>> and we'd like to find out about that sooner or later.
>>
>> I understood. I'll check that first.
>
> I've checked whether LockAcquire is called while holding a lock of one
> of four types: LOCKTAG_RELATION_EXTENSION, LOCKTAG_PAGE,
> LOCKTAG_TUPLE, and LOCKTAG_SPECULATIVE_TOKEN. To summarize, I think that
> we cannot move these four lock types together out of heavy-weight
> lock, but can move only the relation extension lock with tricks.
>
> Here is detail of the survey.

Thanks for these details, but I'm not sure I fully understand.

> * LOCKTAG_RELATION_EXTENSION
> There is a path where LockRelationForExtension() could be called while
> holding another relation extension lock. In brin_getinsertbuffer(), we
> acquire a relation extension lock for an index relation and could
> initialize a new buffer (brin_initialize_empty_new_buffer()). During
> initialization of a new buffer, we call RecordPageWithFreeSpace(), which
> eventually can call fsm_readbuf(rel, addr, true) where the third
> argument is "extend". We can work around this problem by keeping a list
> (or local hash) of acquired locks and skipping the acquisition if we
> already hold the lock. For other call paths calling
> LockRelationForExtension, I don't see any problem.

Does calling fsm_readbuf(rel,addr,true) take some heavyweight lock?

Basically, what matters here in the end is whether we can articulate a
deadlock-proof rule around the order in which these locks are
acquired.  The simplest such rule would be "you can only acquire one
lock of any of these types at a time, and you can't subsequently
acquire a heavyweight lock".  But a more complicated rule would be OK
too, e.g. "you can acquire as many heavyweight locks as you want, and
after that you can optionally acquire one page, tuple, or speculative
token lock, and after that you can acquire a relation extension lock".
The latter rule, although more complex, is still deadlock-proof,
because the heavyweight locks still use the deadlock detector, and the
rest has a consistent order of lock acquisition that precludes one
backend taking A then B while another backend takes B then A.  I'm not
entirely clear whether your survey leads us to a place where we can
articulate such a deadlock-proof rule.
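
(Going back to the assertion idea upthread, a sketch of what it could look
like -- HoldingNonDeadlockDetectedLock() is a hypothetical helper inside
lock.c, using the LockTagType names from lock.h:)

static bool
HoldingNonDeadlockDetectedLock(void)
{
	HASH_SEQ_STATUS status;
	LOCALLOCK  *locallock;

	/* scan this backend's local lock table */
	hash_seq_init(&status, LockMethodLocalHash);
	while ((locallock = (LOCALLOCK *) hash_seq_search(&status)) != NULL)
	{
		uint8		type = locallock->tag.lock.locktag_type;

		if (locallock->nLocks > 0 &&
			(type == LOCKTAG_RELATION_EXTEND ||
			 type == LOCKTAG_PAGE ||
			 type == LOCKTAG_TUPLE ||
			 type == LOCKTAG_SPECULATIVE_TOKEN))
		{
			hash_seq_term(&status);
			return true;
		}
	}
	return false;
}

LockAcquire() would then begin with Assert(!HoldingNonDeadlockDetectedLock()),
which is exactly the kind of check that would have fired on the
brin_getinsertbuffer() path described above.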

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [POC] Faster processing at Gather node

2017-11-07 Thread Andres Freund
Hi,

On 2017-11-06 10:56:43 +0530, Amit Kapila wrote:
> On Sun, Nov 5, 2017 at 6:54 AM, Andres Freund  wrote
> > On 2017-11-05 01:05:59 +0100, Robert Haas wrote:
> >> skip-gather-project-v1.patch does what it says on the tin.  I still
> >> don't have a test case for this, and I didn't find that it helped very
> >> much,
> 
> I am also wondering in which case it can help, and I can't think of
> one.

I'm confused?  Isn't it fairly obvious that unnecessarily projecting
at the gather node is wasteful? Obviously depending on the query you'll
see smaller / bigger gains, but that there are benefits should be fairly obvious?


> Basically, as part of projection in the gather, I think we are just
> deforming the tuple, which we anyway need to do before sending the
> tuple to the client (printtup) or probably at the upper level of the
> node.

But in most cases you're not going to print millions of tuples, instead
you're going to apply some further operators on top (e.g. the
OFFSET/LIMIT in my example).

> >> and you said this looked like a big bottleneck in your
> >> testing, so here you go.

> Is it possible that it shows the bottleneck only for 'explain analyze'
> statement as we don't deform the tuple for that at a later stage?

Doesn't matter, there's an OFFSET/LIMIT on top of the query. Could just as
well be a sort node or something.


> > The query where that showed a big benefit was
> >
> > SELECT * FROM lineitem WHERE l_suppkey > '5012' OFFSET 10 LIMIT 1;
> >
> > (i.e a not very selective filter, and then just throwing the results away)
> >
> > still shows quite massive benefits:
> 
> Do you see the benefit if the query is executed without using Explain Analyze?

Yes.

Before:
tpch_5[11878][1]=# SELECT * FROM lineitem WHERE l_suppkey > '5012' OFFSET 
10 LIMIT 1;
...
Time: 7590.196 ms (00:07.590)

After:
Time: 3862.955 ms (00:03.863)


Greetings,

Andres Freund


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [PATCH] Assert that the correct locks are held when calling PageGetLSN()

2017-11-07 Thread Jacob Champion
Hi Michael,

On Mon, Nov 6, 2017 at 6:18 PM, Michael Paquier
 wrote:
> Did you really test WAL replay?

Is there a way to test this other than installcheck-world? The only
failure we've run into at the moment is in the snapshot-too-old tests.
Maybe we're not configuring with all the tests enabled?

>> The assertion added caught at least one code path where TestForOldSnapshot
>> calls PageGetLSN without holding any lock.  The snapshot_too_old test in
>> "check-world" failed due to the assertion failure.  This needs to be fixed,
>> see the open question in the opening mail on this thread.
>
> Good point. I am looping Kevin Grittner here for his input, as the
> author and committer of old_snapshot_threshold. Things can be
> addressed with a separate patch by roughly moving the check on
> PageGetLSN() to TestForOldSnapshot_impl() and use BufferGetLSNAtomic()
> instead.

It still doesn't pass the tests, as not all code paths ensure that a
content lock is held before calling TestForOldSnapshot.
BufferGetLSNAtomic only adds the spinlock.

> The commit fest has lost view of this entry, and so did I. So I have
> added a new entry:
> https://commitfest.postgresql.org/16/1371/

Thank you!

> BufferGetLSNAtomic() could really use LWLockHeldByMe(). Could you
> consider it with an extra patch on top of 0001?

The LWLockHeldByMe() calls are added to BufferGetLSNAtomic in patch
0002 (because it does all its work through PageGetLSN).

> It seems to me that 0001 is good for a committer lookup, that will get
> rid of all existing bugs. For 0002, what you are proposing is still
> not a good idea for anything using page copies.

I think there is still significant confusion here. To be clear: this
patch is intended to add no changes for local page copies. As I've
tried to say (three or four times now): to our understanding, local
page copies are not allocated in the shared BufferBlocks region and
are therefore not subject to the new assertions. Am I missing a corner
case, or completely misunderstanding your point? I never got a direct
response to my earlier questions on this.

--Jacob


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Remove secondary checkpoint

2017-11-07 Thread Simon Riggs
On 31 October 2017 at 12:01, Michael Paquier  wrote:
> On Tue, Oct 31, 2017 at 2:00 PM, Simon Riggs  wrote:
>> On 30 October 2017 at 18:58, Simon Riggs  wrote:
>>> On 30 October 2017 at 15:22, Simon Riggs  wrote:
>>>
> You forgot to update this formula in xlog.c:
> distance = (2.0 + CheckPointCompletionTarget) * 
> CheckPointDistanceEstimate;
> /* add 10% for good measure. */
> distance *= 1.10;
> You can guess easily where the mistake is.

 Doh - fixed that before posting, so I must have sent the wrong patch.

 It's the 10%, right? ;-)
>>>
>>> So, there are 2 locations that mention 2.0 in xlog.c
>>>
>>> I had already fixed one, which is why I remembered editing it.
>>>
>>> The other one you mention has a detailed comment above it explaining
>>> why it should be 2.0 rather than 1.0, so I left it.
>>>
>>> Let me know if you still think it should be removed? I can see the
>>> argument both ways.
>>> Or maybe we need another patch to account for manual checkpoints.
>>
>> Revised patch to implement this
>
> Here is the comment and the formula:
>  * The reason this calculation is done from the prior
> checkpoint, not the
>  * one that just finished, is that this behaves better if some
> checkpoint
>  * cycles are abnormally short, like if you perform a manual 
> checkpoint
>  * right after a timed one. The manual checkpoint will make
> almost a full
>  * cycle's worth of WAL segments available for recycling, because the
>  * segments from the prior's prior, fully-sized checkpoint cycle are 
> no
>  * longer needed. However, the next checkpoint will make only
> few segments
>  * available for recycling, the ones generated between the timed
>  * checkpoint and the manual one right after that. If at the manual
>  * checkpoint we only retained enough segments to get us to
> the next timed
>  * one, and removed the rest, then at the next checkpoint we would not
>  * have enough segments around for recycling, to get us to the
> checkpoint
>  * after that. Basing the calculations on the distance from
> the prior redo
>  * pointer largely fixes that problem.
>  */
> distance = (2.0 + CheckPointCompletionTarget) *
> CheckPointDistanceEstimate;
>
> While the mention about a manual checkpoint happening after a timed
> one will cause a full range of WAL segments to be recycled, it is not
> actually true that segments of the prior's prior checkpoint are not
> needed, because with your patch the segments of the prior checkpoint
> are getting recycled. So it seems to me that based on that the formula
> ought to use 1.0 instead of 2.0...

I think the argument in the comment is right, in that
CheckPointDistanceEstimate is better if we use multiple checkpoint
cycles.

But the implementation of that is bogus and multiplying by 2.0
wouldn't make it better if CheckPointDistanceEstimate is wrong.

So I will change to 1.0 as you say and rewrite the comment. Thanks for
your review.
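
Presumably the resulting change is just the formula quoted upthread with
the constant adjusted (shown for clarity; not the committed patch):

	distance = (1.0 + CheckPointCompletionTarget) * CheckPointDistanceEstimate;
	/* add 10% for good measure. */
	distance *= 1.10;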

-- 
Simon Riggshttp://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Fix bloom WAL tap test

2017-11-07 Thread Alexander Korotkov
Hi!

On Tue, Nov 7, 2017 at 4:34 PM, Masahiko Sawada 
wrote:

> I understood the necessity of this patch and reviewed two patches.
>

Good, thank you.


> For /fix-bloom-wal-check.patch, it looks good to me. I found no
> problem. But for wal-check-on-bloom-check.patch, if you want to run
> wal-check even when running 'make check' or 'make check-world', we can
> just add the wal-check test as follows?
>
> diff --git a/contrib/bloom/Makefile b/contrib/bloom/Makefile
> index 13bd397..c1566d4 100644
> --- a/contrib/bloom/Makefile
> +++ b/contrib/bloom/Makefile
> @@ -20,5 +20,7 @@ include $(top_builddir)/src/Makefile.global
>  include $(top_srcdir)/contrib/contrib-global.mk
>  endif
>
> +check: wal-check
> +
>  wal-check: temp-install
> $(prove_check)
>

I've tried this version of the Makefile.  And I've seen only one difference:
when tap tests are enabled, this version of the Makefile runs the tap tests
before the regression tests, while my version runs the tap tests after the
regression tests.  The latter seems like more desirable behavior to me.  But
it would still be nice to simplify the Makefile.

--
Alexander Korotkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company


Re: [HACKERS] Fix bloom WAL tap test

2017-11-07 Thread Alexander Korotkov
On Tue, Nov 7, 2017 at 4:26 PM, Fabrízio Mello 
wrote:

> The following review has been posted through the commitfest application:
> make installcheck-world:  not tested
> Implements feature:   not tested
> Spec compliant:   not tested
> Documentation:not tested
>
> The patch doesn't apply against master:
>
> fabrizio@macanudo:/d/postgresql (master)
> $ git apply /tmp/wal-check-on-bloom-check.patch
> error: contrib/bloom/Makefile: already exists in working directory
>

Apparently I sent patches which are OK for "patch -p1", but not OK for "git
apply".
Sorry for that.  I have resubmitted both of them.

--
Alexander Korotkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company


fix-bloom-wal-check-2.patch
Description: Binary data


wal-check-on-bloom-check-2.patch
Description: Binary data

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Exclude pg_internal.init from base backup

2017-11-07 Thread David Steele
On 11/7/17 11:03 AM, Simon Riggs wrote:
> On 5 November 2017 at 11:55, Magnus Hagander  wrote:
>>
>> So +1 for documenting the difference in how these are handled, as this is
>> important to know for somebody writing an external tool for it.
> 
> Changes made, moving to commit the attached patch.

Thanks, Simon!  This was on my to do list today -- glad I checked my
email first.

-- 
-David
da...@pgmasters.net


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] MERGE SQL Statement for PG11

2017-11-07 Thread Geoff Winkless
On 6 November 2017 at 17:35, Simon Riggs  wrote:
> I read that step 3 in Approach2 is some kind of problem in MVCC
> semantics. My understanding is that SQL Standard allows us to define
> what the semantics of the statement are in relation to concurrency, so
> any semantic issue can be handled by defining it to work the way we
> want. The semantics are:
> a) when a unique index is available we avoid errors by using semantics
> of INSERT .. ON CONFLICT UPDATE.
> b) when a unique index is not available we use other semantics.

I'm obviously being obtuse.

If a unique index is not available, then surely there won't _be_ a
failure? The INSERT (or indeed UPDATE) that results in two similar
records will simply happen, and you will end up with two records the
same. That's OK, based on the semantics of MERGE, no? At the
transaction start, INSERT was the correct thing to do.

Geoff


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Exclude pg_internal.init from base backup

2017-11-07 Thread Simon Riggs
On 5 November 2017 at 11:55, Magnus Hagander  wrote:
> On Sat, Nov 4, 2017 at 4:04 AM, Michael Paquier 
> wrote:
>>
>> On Fri, Nov 3, 2017 at 4:04 PM, Petr Jelinek
>>  wrote:
>> > Not specific problem to this patch, but I wonder if it should be made
>> > more clear that those files (there are couple above of what you added)
>> > are skipped no matter which directory they reside in.
>>
>> Agreed, it is a good idea to tell in the docs how this behaves. We
>> could always change things so that the comparison is based on the full
>> path like what is done for pg_control, but that does not seem worth
>> complicating the code.
>
>
> pg_internal.init can, and do, appear in multiple different directories.
> pg_control is always in the same place. So they're not the same thing.
>
> So +1 for documenting the difference in how these are handled, as this is
> important to know for somebody writing an external tool for it.

Changes made, moving to commit the attached patch.

-- 
Simon Riggshttp://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


pg_basebackup-exclusion-v2.patch
Description: Binary data

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Additional logging for VACUUM and ANALYZE

2017-11-07 Thread Fabrízio de Royes Mello
On Tue, Nov 7, 2017 at 1:35 PM, Bossart, Nathan  wrote:
>
> On 11/7/17, 9:13 AM, "Fabrízio Mello"  wrote:
> >> int save_nestlevel;
> >> +   bool        rel_lock;
> >>
> >
> > Just remove the additional tab indentation before rel_lock variable.
>
> I've removed the extra tab in v4.
>

Great. Changed status to ready for committer.

Regards,


--
Fabrízio de Royes Mello
Consultoria/Coaching PostgreSQL
>> Timbira: http://www.timbira.com.br
>> Blog: http://fabriziomello.github.io
>> Linkedin: http://br.linkedin.com/in/fabriziomello
>> Twitter: http://twitter.com/fabriziomello
>> Github: http://github.com/fabriziomello


Re: [HACKERS] Small improvement to compactify_tuples

2017-11-07 Thread Andres Freund
On 2017-11-07 12:12:02 -0300, Claudio Freire wrote:
> If you need it. I'm not particularly fond of writing code before it's needed.

+1

> Otherwise, if it's a rarely-encountered corner case, I'd recommend
> simply calling the stdlib's qsort.

FWIW, we always map qsort onto our own implementation:

#define qsort(a,b,c,d) pg_qsort(a,b,c,d)

Greetings,

Andres Freund


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Small improvement to compactify_tuples

2017-11-07 Thread Claudio Freire
On Tue, Nov 7, 2017 at 11:42 AM, Юрий Соколов  wrote:
>
>
> 2017-11-07 17:15 GMT+03:00 Claudio Freire :
>> Aside from requiring all that include magic, if you place specialized
>> sort functions in a reusable header, using it is as simple as
>> including the type-specific header (or declaring the type macros and
>> including the template), and using them as regular functions. There's
>> no runtime overhead involved, especially if you declare the comparison
>> function as a macro or a static inline function. The sort itself can
>> be declared static inline as well, and the compiler will decide
>> whether it's worth inlining.
>
> OK, if no one complains about yet another qsort implementation,
> I will add a template header for qsort. Since qsort needs insertion sort,
> it will be in the same file.
> Do you approve of this?
>
> With regards,
> Sokolov Yura

If you need it. I'm not particularly fond of writing code before it's needed.

If you can measure the impact for that particular case where qsort
might be needed, and it's a real-world case, then by all means.

Otherwise, if it's a rarely-encountered corner case, I'd recommend
simply calling the stdlib's qsort.


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Additional logging for VACUUM and ANALYZE

2017-11-07 Thread Fabrízio Mello
The following review has been posted through the commitfest application:
make installcheck-world:  not tested
Implements feature:   tested, passed
Spec compliant:   tested, passed
Documentation:tested, passed

> int save_nestlevel;
> +   bool        rel_lock;
> 

Just remove the additional tab indentation before the rel_lock variable. 

The rest looks good to me.

Regards,

Fabrízio Mello


The new status of this patch is: Waiting on Author

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Small improvement to compactify_tuples

2017-11-07 Thread Юрий Соколов
2017-11-07 17:15 GMT+03:00 Claudio Freire :
>
> On Mon, Nov 6, 2017 at 9:08 PM, Юрий Соколов 
wrote:
> > 2017-11-07 1:14 GMT+03:00 Claudio Freire :
> >>
> >> I haven't seen this trick used in postgres, nor do I know whether it
> >> would be well received, so this is more like throwing an idea to see
> >> if it sticks...
> >>
> >> But a way to do this without macros is to have an includable
> >> "template" algorithm that simply doesn't define the comparison
> >> function/type, it rather assumes it:
> >>
> >> qsort_template.h
> >>
> >> #define QSORT_NAME qsort_ ## QSORT_SUFFIX
> >>
> >> static void QSORT_NAME(ELEM_TYPE arr, size_t num_elems)
> >> {
> >> ... if (ELEM_LESS(arr[a], arr[b]))
> >> ...
> >> }
> >>
> >> #undef QSORT_NAME
> >>
> >> Then, in "offset_qsort.h":
> >>
> >> #define QSORT_SUFFIX offset
> >> #define ELEM_TYPE offset
> >> #define ELEM_LESS(a,b) ((a) < (b))
> >>
> >> #include "qsort_template.h"
> >>
> >> #undef QSORT_SUFFIX
> >> #undef ELEM_TYPE
> >> #undef ELEM_LESS
> >>
> >> Now, I realize this may have its cons, but it does simplify
> >> maintenance of type-specific or parameterized variants of
> >> performance-critical functions.
> >>
> >> > I can do a specialized qsort for this case. But it will be a larger
> >> > bunch of code than shell sort.
> >> >
> >> >> And I'd recommend doing that when there is a need, and I don't think
> >> >> this patch really needs it, since bucket sort handles most cases
> >> >> anyway.
> >> >
> >> > And it still needs insertion sort for buckets.
> >> > I can agree to get rid of shell sort. But insertion sort is
necessary.
> >>
> >> I didn't suggest getting rid of insertion sort. But the trick above is
> >> equally applicable to insertion sort.
> >
> > This trick is used in simplehash.h. I agree, it could be useful for
> > qsort. This will not make qsort inlineable, but it will reduce the
> > overhead a lot.
> >
> > This trick is too heavy-weight for insertion sort alone, though. Without
> > shellsort, insertion sort can be expressed in a 14-line macro (8 lines
> > without curly braces). But if insertion sort is defined together with
> > qsort (because qsort still needs it), then it is justifiable.
>
> What do you mean by heavy-weight?


I mean, I've already made a reusable sort implementation with macros
that is called like a function (with a type parameter). If we are talking
only about insertion sort, then such a macro looks much prettier than
an included file.

But qsort is better implemented with an included template header.

BTW, there is an example of defining many functions with calls to a
template macro instead of including a template header:
https://github.com/attractivechaos/klib/blob/master/khash.h
But it looks ugly.

>
> Aside from requiring all that include magic, if you place specialized
> sort functions in a reusable header, using it is as simple as
> including the type-specific header (or declaring the type macros and
> including the template), and using them as regular functions. There's
> no runtime overhead involved, especially if you declare the comparison
> function as a macro or a static inline function. The sort itself can
> be declared static inline as well, and the compiler will decide
> whether it's worth inlining.

Ok, if no one complains about yet another qsort implementation,
I will add a template header for qsort. Since qsort needs insertion sort,
it will be in the same file.
Do you approve of this?

With regards,
Sokolov Yura
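
To make the trade-off concrete, here is a minimal self-contained sketch
(not from any posted patch; all names are illustrative) of the
khash-style "template macro" approach mentioned above. Token pasting
needs a two-level concatenation so the suffix argument gets
macro-expanded first; the inline QSORT_NAME sketch quoted in this
thread would need the same indirection.

#include <stddef.h>
#include <stdio.h>

/* two-level concat so SUFFIX is expanded before pasting */
#define SORT_CONCAT_(a, b) a##b
#define SORT_CONCAT(a, b) SORT_CONCAT_(a, b)

/* emits a type-specialized insertion sort named isort_<SUFFIX> */
#define DEFINE_ISORT(SUFFIX, TYPE, LESS) \
	static void \
	SORT_CONCAT(isort_, SUFFIX)(TYPE *arr, size_t n) \
	{ \
		for (size_t i = 1; i < n; i++) \
		{ \
			TYPE	tmp = arr[i]; \
			size_t	j = i; \
			/* shift greater elements right to open tmp's slot */ \
			while (j > 0 && LESS(tmp, arr[j - 1])) \
			{ \
				arr[j] = arr[j - 1]; \
				j--; \
			} \
			arr[j] = tmp; \
		} \
	}

#define OFFSET_LESS(a, b) ((a) < (b))
DEFINE_ISORT(offset, unsigned short, OFFSET_LESS)	/* emits isort_offset() */

int
main(void)
{
	unsigned short offs[] = {40, 10, 30, 20};

	isort_offset(offs, 4);
	for (int i = 0; i < 4; i++)
		printf("%u ", offs[i]);
	putchar('\n');
	return 0;
}

Whether that reads better than the include-a-template-header variant
sketched in the message below is exactly the style question under
discussion.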


Re: [HACKERS] Small improvement to compactify_tuples

2017-11-07 Thread Claudio Freire
On Mon, Nov 6, 2017 at 9:08 PM, Юрий Соколов  wrote:
> 2017-11-07 1:14 GMT+03:00 Claudio Freire :
>>
>> I haven't seen this trick used in postgres, nor do I know whether it
>> would be well received, so this is more like throwing an idea to see
>> if it sticks...
>>
>> But a way to do this without macros is to have an includable
>> "template" algorithm that simply doesn't define the comparison
>> function/type, it rather assumes it:
>>
>> qsort_template.h
>>
>> #define QSORT_NAME qsort_ ## QSORT_SUFFIX
>>
>> static void QSORT_NAME(ELEM_TYPE arr, size_t num_elems)
>> {
>> ... if (ELEM_LESS(arr[a], arr[b]))
>> ...
>> }
>>
>> #undef QSORT_NAME
>>
>> Then, in "offset_qsort.h":
>>
>> #define QSORT_SUFFIX offset
>> #define ELEM_TYPE offset
>> #define ELEM_LESS(a,b) ((a) < (b))
>>
>> #include "qsort_template.h"
>>
>> #undef QSORT_SUFFIX
>> #undef ELEM_TYPE
>> #undef ELEM_LESS
>>
>> Now, I realize this may have its cons, but it does simplify
>> maintenance of type-specific or parameterized variants of
>> performance-critical functions.
>>
>> > I can do a specialized qsort for this case. But it will be a larger
>> > bunch of code than shell sort.
>> >
>> >> And I'd recommend doing that when there is a need, and I don't think
>> >> this patch really needs it, since bucket sort handles most cases
>> >> anyway.
>> >
>> > And it still needs insertion sort for buckets.
>> > I can agree to get rid of shell sort. But insertion sort is necessary.
>>
>> I didn't suggest getting rid of insertion sort. But the trick above is
>> equally applicable to insertion sort.
>
> This trick is used in simplehash.h. I agree, it could be useful for qsort.
> This will not make qsort inlineable, but it will reduce the overhead a lot.
>
> This trick is too heavy-weight for insertion sort alone, though. Without
> shellsort, insertion sort can be expressed in a 14-line macro (8 lines
> without curly braces). But if insertion sort is defined together with
> qsort (because qsort still needs it), then it is justifiable.

What do you mean by heavy-weight?

Aside from requiring all that include magic, if you place specialized
sort functions in a reusable header, using it is as simple as
including the type-specific header (or declaring the type macros and
including the template), and using them as regular functions. There's
no runtime overhead involved, especially if you declare the comparison
function as a macro or a static inline function. The sort itself can
be declared static inline as well, and the compiler will decide
whether it's worth inlining.




Re: [HACKERS] Additional logging for VACUUM and ANALYZE

2017-11-07 Thread Fabrízio Mello
The following review has been posted through the commitfest application:
make installcheck-world:  not tested
Implements feature:   not tested
Spec compliant:   not tested
Documentation:not tested

The patch doesn't apply against master anymore:

fabrizio@macanudo:/d/postgresql (master) 
$ git apply /tmp/new_vacuum_log_messages_v2.patch
error: patch failed: doc/src/sgml/config.sgml:5899
error: doc/src/sgml/config.sgml: patch does not apply

Regards,

Fabrízio Mello

The new status of this patch is: Waiting on Author



Re: [HACKERS] Fix bloom WAL tap test

2017-11-07 Thread Masahiko Sawada
On Fri, Sep 29, 2017 at 10:32 PM, Alexander Korotkov  wrote:
> On Wed, Sep 6, 2017 at 5:06 PM, Alexander Korotkov
>  wrote:
>>
>> On Wed, Sep 6, 2017 at 4:08 PM, Alexander Korotkov
>>  wrote:
>>>
>>> I just realized that these lines of contrib/bloom/t/001_wal.pl don't
>>> check that queries give same results on master and standby.  They just check
>>> that *return codes* of psql are equal.
>>>
 # Run test queries and compare their result
 my $master_result = $node_master->psql("postgres", $queries);
 my $standby_result = $node_standby->psql("postgres", $queries);
 is($master_result, $standby_result, "$test_name: query result matches");
>>>
>>>
>>> Attached patch fixes this problem by using safe_psql() which returns psql
>>> output instead of return code.  For safety, this patch replaces psql() with
>>> safe_psql() in other places too.
>>>
>>> I think this should be backpatched to 9.6 as bugfix.
>>
>>
>> Also, it would be nice to call wal-check on bloom check (now wal-check
>> isn't called even on check-world).
>> See attached patch.  My changes to Makefile could be cumbersome.  Sorry
>> for that, I don't have much experience with them...
>
>
> This topic didn't receive any attention yet.  Apparently, that's because
> of the in-progress commitfest and upcoming release.
> I've registered it on upcoming commitfest as bug fix.
> https://commitfest.postgresql.org/15/1313/
>

I understand the necessity of this patch and have reviewed both patches.

fix-bloom-wal-check.patch looks good to me; I found no problem. But for
wal-check-on-bloom-check.patch, if you want to run wal-check even when
running 'make check' or 'make check-world', couldn't we just add a
wal-check dependency as follows?

diff --git a/contrib/bloom/Makefile b/contrib/bloom/Makefile
index 13bd397..c1566d4 100644
--- a/contrib/bloom/Makefile
+++ b/contrib/bloom/Makefile
@@ -20,5 +20,7 @@ include $(top_builddir)/src/Makefile.global
 include $(top_srcdir)/contrib/contrib-global.mk
 endif

+check: wal-check
+
 wal-check: temp-install
 	$(prove_check)

Regards,

--
Masahiko Sawada
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center
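
For reference, the first patch boils down to the difference sketched
below (variable names follow the quoted test). In scalar context
PostgresNode::psql() returns psql's exit status, while safe_psql() dies
on failure and returns the query output, which is what actually has to
be compared:

# Before: any two successful psql runs "match", whatever they print.
my $master_status  = $node_master->psql("postgres", $queries);
my $standby_status = $node_standby->psql("postgres", $queries);
is($master_status, $standby_status, "$test_name: exit codes compared");

# After: compare the actual query output from master and standby.
my $master_result  = $node_master->safe_psql("postgres", $queries);
my $standby_result = $node_standby->safe_psql("postgres", $queries);
is($master_result, $standby_result, "$test_name: query result matches");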




Re: [HACKERS] Fix bloom WAL tap test

2017-11-07 Thread Fabrízio Mello
The following review has been posted through the commitfest application:
make installcheck-world:  not tested
Implements feature:   not tested
Spec compliant:   not tested
Documentation:not tested

The patch doesn't apply against master:

fabrizio@macanudo:/d/postgresql (master) 
$ git apply /tmp/wal-check-on-bloom-check.patch
error: contrib/bloom/Makefile: already exists in working directory

Regards,

Fabrízio Mello

The new status of this patch is: Waiting on Author



Re: [HACKERS] [PATCH] A hook for session start

2017-11-07 Thread Fabrízio de Royes Mello
On Tue, Nov 7, 2017 at 1:15 AM, Michael Paquier  wrote:
>
> On Sun, Nov 5, 2017 at 3:14 AM, Fabrízio de Royes Mello
>  wrote:
> > On Sat, Nov 4, 2017 at 1:23 AM, Michael Paquier  wrote:
> >> On Fri, Nov 3, 2017 at 1:55 PM, Fabrízio de Royes Mello
> >>  wrote:
> >> >> Passing the database name and user name does not look very useful
> >> >> to me. You already have access to this data with CurrentUserId and
> >> >> MyDatabaseId.
> >> >
> >> > This way we don't need to convert oid to names inside hook code.
> >>
> >> Well, arguments of hooks are useful if they are used. Now if I look at
> >> the two examples mainly proposed in this thread, be it in your set of
> >> patches or the possibility to do some SQL transaction, I see nothing
> >> using them. So I'd vote for keeping an interface minimal.
> >>
> >
> > Maybe the attached patch with an improved test module can illustrate
> > the feature better.
>
> I was going to hack something like that. That's interesting for the
> use case Robert has mentioned.
>
> Well, in the case of the end session hook, those variables are passed
> to the hook by being directly taken from the context from MyProcPort:
> +   (*session_end_hook) (MyProcPort->database_name, MyProcPort->user_name);
> In the case of the start hook, those are directly taken from the
> command outer caller, but similarly MyProcPort is already set, so
> those are strings available (your patch does so in the end session
> hook)... Variables in hooks are useful if those are not available
> within the memory context, and refer to a specific state where the
> hook is called. For example, take the password hook, this uses the
> user name and the password because those values are not available
> within the session context. The same stands for other hooks as well.
> Keeping the interface minimal helps in readability and maintenance.
> See the attached example that can be applied on top of 0003, which
> makes use of the session context; the regression tests do not pass,
> but this shows how I think those hooks had better be shaped.
>

Makes sense... fixed.


> +   (*session_end_hook) (MyProcPort->database_name, MyProcPort->user_name);
> +
> +   /* Make sure we don't leave any active transactions and/or locks behind */
> +   AbortOutOfAnyTransaction();
> +   LockReleaseAll(USER_LOCKMETHOD, true);
> Let's leave this work to people actually implementing the hook contents.
>

Fixed.

Attached is a new version. I unified all three patches into one because
the patch is small and most of it is test code.

Regards,

--
Fabrízio de Royes Mello
Consultoria/Coaching PostgreSQL
>> Timbira: http://www.timbira.com.br
>> Blog: http://fabriziomello.github.io
>> Linkedin: http://br.linkedin.com/in/fabriziomello
>> Twitter: http://twitter.com/fabriziomello
>> Github: http://github.com/fabriziomello
diff --git a/src/backend/tcop/postgres.c b/src/backend/tcop/postgres.c
index 2c7260e..52a9641 100644
--- a/src/backend/tcop/postgres.c
+++ b/src/backend/tcop/postgres.c
@@ -169,6 +169,9 @@ static ProcSignalReason RecoveryConflictReason;
 static MemoryContext row_description_context = NULL;
 static StringInfoData row_description_buf;
 
+/* Hook for plugins to get control at start of session */
+session_start_hook_type session_start_hook = NULL;
+
 /* ----------------------------------------------------------------
  *		decls for routines only used in this file
  * ----------------------------------------------------------------
@@ -3857,6 +3860,9 @@ PostgresMain(int argc, char *argv[],
 	if (!IsUnderPostmaster)
 		PgStartTime = GetCurrentTimestamp();
 
+	if (session_start_hook)
+		(*session_start_hook) ();
+
 	/*
 	 * POSTGRES main processing loop begins here
 	 *
diff --git a/src/backend/utils/init/postinit.c b/src/backend/utils/init/postinit.c
index 20f1d27..ffaf51f 100644
--- a/src/backend/utils/init/postinit.c
+++ b/src/backend/utils/init/postinit.c
@@ -76,6 +76,8 @@ static bool ThereIsAtLeastOneRole(void);
 static void process_startup_options(Port *port, bool am_superuser);
 static void process_settings(Oid databaseid, Oid roleid);
 
+/* Hook for plugins to get control at end of session */
+session_end_hook_type session_end_hook = NULL;
 
 /*** InitPostgres support ***/
 
@@ -1154,6 +1156,10 @@ ShutdownPostgres(int code, Datum arg)
 	 * them explicitly.
 	 */
 	LockReleaseAll(USER_LOCKMETHOD, true);
+
+	/* Only hook regular backends */
+	if (session_end_hook && MyBackendId != InvalidBackendId)
+		(*session_end_hook) ();
 }
 
 
diff --git a/src/include/tcop/tcopprot.h b/src/include/tcop/tcopprot.h
index f8c535c..9f05bfb 100644
--- a/src/include/tcop/tcopprot.h
+++ b/src/include/tcop/tcopprot.h
@@ -35,6 +35,13 @@ extern PGDLLIMPORT const char *debug_query_string;
 extern int	max_stack_depth;
 extern int	PostAuthDelay;
 
+/* Hook for plugins to get control at start and end of session */
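
The tcopprot.h hunk above is cut off before its remaining lines. For
illustration, a minimal consumer of the proposed hooks might look like
the sketch below. It assumes the no-argument hook signatures visible in
the patch hunks and that both hook variables are declared in
tcop/tcopprot.h; the extension name and functions are hypothetical.

/* session_log.c -- hypothetical example consumer of the proposed hooks */
#include "postgres.h"

#include "fmgr.h"
#include "libpq/libpq-be.h"		/* struct Port, for MyProcPort fields */
#include "miscadmin.h"			/* MyProcPort */
#include "tcop/tcopprot.h"		/* proposed hook declarations */

PG_MODULE_MAGIC;

void		_PG_init(void);

static session_start_hook_type prev_session_start_hook = NULL;
static session_end_hook_type prev_session_end_hook = NULL;

static void
session_log_start(void)
{
	if (prev_session_start_hook)
		prev_session_start_hook();

	if (MyProcPort)				/* only regular client backends have a Port */
		elog(LOG, "session start: db=%s user=%s",
			 MyProcPort->database_name, MyProcPort->user_name);
}

static void
session_log_end(void)
{
	if (prev_session_end_hook)
		prev_session_end_hook();

	elog(LOG, "session end");
}

void
_PG_init(void)
{
	prev_session_start_hook = session_start_hook;
	session_start_hook = session_log_start;

	prev_session_end_hook = session_end_hook;
	session_end_hook = session_log_end;
}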

Re: [HACKERS] [PATCH] Improve geometric types

2017-11-07 Thread Kyotaro HORIGUCHI
Hello, thanks for the new patch.

0004 failed to apply on top of the underlying patches.

At Sun, 5 Nov 2017 15:54:19 +0100, Emre Hasegeli  wrote in 

> > I am not sure how useful NaNs are in geometric types context, but we
> > allow them, so inconsistent hypot() would be a problem.  I will change
> > my patches to keep pg_hypot().
> 
> New versions of the patches are attached with 2 additional ones.  The
> new versions leave pg_hypot() in place.  One of the new patches
> improves the test coverage.  The line coverage of geo_ops.c increases
> from 55% to 81%.  The other one fixes -0 values to 0 on float
> operators.  I am not sure about the performance implications of this, so
> I kept it separate.  It may be a better idea to check this only on the
> platforms that have a tendency to produce -0.
> 
> While working on the tests, I found some unreachable code and removed
> it.  I also found the lseg ## lseg operator returning wrong results.
> It is defined as "closest point to first segment on the second
> segment", but:
> 
> > # select '[(1,2),(3,4)]'::lseg ## '[(0,0),(6,6)]'::lseg;
> >  ?column?
> > --
> >  (1,2)
> > (1 row)
> 
> I appended the fix to the patches.  This is also affecting the lseg ## box
> operator.

Mmm.. It returns (1.5, 1.5) with the 0004 patch. It is surely a
point on the second operand but I'm not sure it's right that the
operator returns a specific point for two parallel segments.

> I also changed the recently band-aided point ## lseg operator to return
> the point instead of NULL when it cannot find the correct result, to
> keep the operators that depend on this one from crashing.

For now, I'd like to comment on 0001 and 0004 only:

 - Adding [LR]DELIM_L looks good but they should be described in
   the comment just above.

 - Renaming float8_slope to slope seems fine.

 - I'm not sure the change of box_construct is good, but currently
   all callers fit the new interface (giving two points, not
   four coordinates).

 - close_lseg seems to be forgetting the cases where the two given
   segments are crossing (or parallel). For example,

   select '(-8,-8),(1,1)'::lseg ## '(-10,0),(2,0)'::lseg;

   is expected to return (0,0), which is the crossing point of
   the two segments, but it returns (1,0) (and returned (1,1)
   before the patch), which is the point on l2 closest to the
   end of l1 that is nearer to l2.

   As mentioned above, it is difficult to decide what the
   proper result for parallel segments is. I suppose that the
   following operation should return an invalid value as a point.

   select '(-1,0),(1,0)'::lseg ## '(-1,1),(1,1)'::lseg;

   close_lseg does the following operations in the else case of
   'if (float8_...)'. If I'm not missing something, the result of
   the following operation is always the closer end of l2. In
   other words, it seems a waste of cycles.
   
   | point = DirectFunctionCall2(close_ps,
   |                             PointPGetDatum(&l2->p[closer_end2]),
   |                             LsegPGetDatum(l1));
   | return DirectFunctionCall2(close_ps, point, LsegPGetDatum(l2));


- make_bound_box operates directly on the poly->boundbox. I'm
  afraid that such coding hinders compilers from using registers.

  This is a bit apart from this patch, but it would be better if we
  could eliminate internal calls that use DirectFunctionCall.

regards,

-- 
Kyotaro Horiguchi
NTT Open Source Software Center





Re: [HACKERS] Flexible configuration for full-text search

2017-11-07 Thread Aleksandr Parfenov
On Tue, 31 Oct 2017 09:47:57 +0100
Emre Hasegeli  wrote:

> > If we want to save this behavior, we should somehow pass a stopword
> > to the tsvector composition function (parsetext in ts_parse.c) for
> > the counter increment, or increment it in another way. Currently, an
> > empty lexemes array is passed as a result of LexizeExec.
> >
> > One of possible way to do so is something like:
> > CASE polish_stopword
> > WHEN MATCH THEN KEEP -- stopword counting
> > ELSE polish_isspell
> > END  
> 
> This would mean keeping the stopwords.  What we want is
> 
> CASE polish_stopword-- stopword counting
> WHEN NO MATCH THEN polish_isspell
> END
> 
> Do you think it is possible?

Hi Emre,

I thought about how it could be implemented. The way I see it is to
increment the word counter if any checked dictionary matched the word,
even without returning a lexeme. The main drawback is that the counter
increment is implicit.

--
Aleksandr Parfenov
Postgres Professional: http://www.postgrespro.com
Russian Postgres Company
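
For orientation, a hypothetical sketch of where such a CASE expression
would sit in the proposed configuration syntax (the exact grammar is
what this thread is negotiating; the configuration and dictionary names
are illustrative):

-- Consult the stopword dictionary first; only words it does not match
-- go on to the ispell dictionary.  The open question above is how a
-- stopword match should still bump the word counter even though no
-- lexeme is returned.
ALTER TEXT SEARCH CONFIGURATION polish_cfg
    ALTER MAPPING FOR asciiword WITH
        CASE polish_stopword
            WHEN NO MATCH THEN polish_isspell
        END;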




Re: [HACKERS] parallelize queries containing initplans

2017-11-07 Thread Amit Kapila
On Mon, Oct 30, 2017 at 9:00 AM, Amit Kapila  wrote:
> On Wed, Oct 11, 2017 at 9:24 PM, Robert Haas  wrote:
>> On Mon, Oct 9, 2017 at 5:56 AM, Amit Kapila  wrote:
>>> How about always returning false for PARAM_EXTERN?
>>
>> Yeah, I think that's what we should do.  Let's do that first as a
>> separate patch, which we might even want to back-patch, then return to
>> this.
>>
>
> Now that the PARAM_EXTERN issue is fixed, I have rebased this patch.
> This patch had been switched to Ready For Committer in the last CF;
> then Robert had comments, which I have addressed, so I think the status
> should be switched back to Ready For Committer.
>

As mentioned, I have changed the status of the patch in the CF app.

-- 
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com




Re: [HACKERS] Custom compression methods

2017-11-07 Thread Ildus Kurbangaliev
On Thu, 2 Nov 2017 23:02:34 +0800
Craig Ringer  wrote:

> On 2 November 2017 at 17:41, Ildus Kurbangaliev
>  wrote:
> 
> > In this patch, compression methods are suitable for MAIN and EXTENDED
> > storage like in the current implementation in postgres. Just that
> > instead of only LZ4 you can specify any other compression method.
> 
> We've had this discussion before.
> 
> Please read the "pluggable compression support" thread. See you in a
> few days ;) sorry, it's kinda long.
> 
> https://www.postgresql.org/message-id/flat/20130621000900.GA12425%40alap2.anarazel.de#20130621000900.ga12...@alap2.anarazel.de
> 
> IIRC there were some concerns about what happened with pg_upgrade,
> with consuming precious toast bits, and a few other things.
> 

Thank you for the link, I didn't see that thread when I looked over the
mailing lists. I read it briefly, and I can address a few things
relating to my patch.

Most concerns were related to legal issues. That was actually the
reason I did not include any new compression algorithms in my patch.
Unlike that patch, mine only provides the syntax: it is just a way to
let users plug in their own compression algorithms and deal with any
legal issues themselves.

I use only one unused bit in the header (there's still one free ;);
that's enough to determine whether data is compressed or not.

I did find out that pg_upgrade doesn't work properly with my patch;
I will send a fix for it soon.

-- 
---
Ildus Kurbangaliev
Postgres Professional: http://www.postgrespro.com
Russian Postgres Company
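
For orientation, the "syntax only" design described above would
presumably be used along these lines. The statement shapes here are
hypothetical, and the handler function is something the user supplies:

-- Register a user-provided compression handler, then select it for a
-- column in place of the built-in pglz compression.
CREATE COMPRESSION METHOD lz4 HANDLER lz4_compression_handler;

ALTER TABLE blobs
    ALTER COLUMN payload SET COMPRESSION lz4;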




Re: [HACKERS] Refactor handling of database attributes between pg_dump and pg_dumpall

2017-11-07 Thread Haribabu Kommi
On Thu, Oct 26, 2017 at 10:01 PM, Robert Haas  wrote:

> On Mon, Oct 23, 2017 at 7:36 AM, Haribabu Kommi
>  wrote:
> > Apologies for not providing much detail.
> >
> > pg_dumpall is used to produce the following statements for a database,
> >
> > "Create database" (other than default database) or
> > "Alter database set tablespace" for default database (if required)
> >
> > ACL queries related to database
> > Alter database config
> > Alter database role config
> >
> > whereas pg_dump used to produce only the "create database" statement.
>
> How about adding a new flag --set-db-properties that doesn't produce
> CREATE DATABASE but does dump the other stuff?  -C would dump both
> CREATE DATABASE *and* the other stuff.  Then you could dump built-in
> databases with --set-db-properties and others with -C.


Thanks for the idea. Attached is a patch that implements it.

The newly added option is not recommended for use in normal cases; it is
intended only for upgrade utilities.

If the user issues pg_dump with the --set-db-properties option along with
the --create or --clean options, an error is raised. Currently there is
no way to throw an error if the dump is generated with --set-db-properties
and then restored with the --clean option. To cover that case, we may need
to add additional details to the archive handler, but is it really needed?

Regards,
Hari Babu
Fujitsu Australia


pg_dump-and-pg_dumpall-database-handling-refactoring_v9.patch
Description: Binary data
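
A short sketch of how the two modes would divide the work under this
scheme (database names are illustrative):

# Regular database: -C/--create emits CREATE DATABASE plus the ACLs,
# ALTER DATABASE ... SET and per-role-per-database settings.
pg_dump --create mydb > mydb.sql

# Built-in database: emit only the properties, without CREATE DATABASE,
# so they can be applied to the already-existing database on restore.
pg_dump --set-db-properties postgres > postgres-props.sql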



Re: [HACKERS] pg_stat_wal_write statistics view

2017-11-07 Thread Haribabu Kommi
On Wed, Sep 27, 2017 at 6:58 PM, Haribabu Kommi  wrote:

>
> Updated patch attached.
>

Patch rebased.

Regards,
Hari Babu
Fujitsu Australia


pg_stat_walwrites-statistics-view_v10.patch
Description: Binary data
