Re: [HACKERS] [PATCH] Push limit to sort through a subquery

2017-08-23 Thread Douglas Doole
>
> Here's a not-really-tested draft patch for that.
>

I don't know the parallel code so I'm not going to comment on the overall
patch, but a thought on ExecSetTupleBound().

That function is getting a number of cases where we're doing recursion for
a single child (result, gather, gather-merge). Recursion for a single child
isn't particularly efficient. I know that if there's a single case of
recursion like this, compilers will frequently turn it into a loop, but I
don't know if compilers can optimize a branched case like this.

Would we be better off moving those cases into the while loop I added to
avoid the recursion? So we'd end up with something like:

while ()
{
if subquery
else if result
else if gather
else if gather merge
}

if sort
else if merge append
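In C terms, a toy sketch of that shape (stand-in node types only, not the actual PostgreSQL planstate structures or ExecSetTupleBound itself):

```c
#include <assert.h>
#include <stddef.h>

/* Toy stand-ins for planstate nodes; not the real PostgreSQL types. */
typedef enum { T_Subquery, T_Result, T_Gather, T_Sort, T_Other } NodeTag;

typedef struct Node
{
	NodeTag		tag;
	struct Node *child;
	long		bound;			/* tuple bound received, -1 if none */
} Node;

/*
 * Instead of recursing once per pass-through node, walk down the chain
 * of single-child nodes in a loop, then handle the terminal cases
 * (here just Sort) once at the bottom.
 */
static void
SetTupleBound(Node *node, long bound)
{
	/* descend through pass-through nodes iteratively, not recursively */
	while (node->tag == T_Subquery ||
		   node->tag == T_Result ||
		   node->tag == T_Gather)
		node = node->child;

	if (node->tag == T_Sort)
		node->bound = bound;
	/* else: the bound is not useful below this node; do nothing */
}
```

This avoids relying on the compiler to turn the branched recursion into a loop.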

And a nit from my original fix now that I've looked at the pg10 code more.
The two casts I added (to SubqueryScanState and Node) should probably be
changed to castNode() calls.

- Doug
Salesforce


Re: [HACKERS] Combination with priority-based and quorum-based synchronous replications

2017-08-23 Thread Michael Paquier
On Thu, Aug 24, 2017 at 12:38 PM, Masahiko Sawada  wrote:
> PostgreSQL 9.6 introduced priority-based multiple synchronous
> replication and PostgreSQL 10 introduced the quorum-based variant.
> Initially I was thinking of using both synchronous replication
> methods in combination, but that is not supported yet. It would be
> useful for some use cases where, for example, we have three replicas:
> nodeA, nodeB and nodeC. Two of them (nodeA and nodeC) are read
> replicas and the other one (nodeB) is for backup. In this case we
> would want to replicate the data synchronously to nodeB while
> replicating data to nodeA and nodeC using quorum-based synchronous
> replication. To cover such a use case we need a feature allowing us
> to use both in combination. IIUC other use cases are also mentioned
> in earlier discussion.

Yes, I would imagine that users with two sites, each holding a set of
standbys, are a common use-case for this feature. Complicated as it is
(are there really that many people with such complicated scenarios?),
it makes it possible to fail over to one site or the other, and to
have at hand a set of read-only standbys just after the promotion of
the new primary on one of the two sites.

> To implement this there are two ideas.
> 1. Use the two synchronous replication methods in combination. For
> the above example, we can set s_s_names = 'First 2 (nodeB,
> Any1(nodeA, nodeC))'. This approach allows us to set a nested set of
> nodes in s_s_names. We could consider supporting deeper nesting, but
> one level would be enough for most cases. Also, it would be useful if
> one synchronization method could contain multiple other
> synchronization methods. For example, I can imagine a use case where
> each of two backup data centers has two replicas and we want to send
> the data synchronously to one replica in each data center. We can set
> s_s_names = 'First 2( Any 1(nodeA, nodeB), Any 1(nodeC, nodeD))'. It
> might be over-engineering but it would be worth considering.

That would cover the use-case written above. After a failover, it can
matter to have ready-to-use standbys at the same site as the new
primary. Cascading replication could be used, but you really don't
want a single standby entry connecting to the primary on each backup
site either.

> 2. Extend quorum-based synchronous replication so that we can specify
> nodes that we want to replicate the data to synchronously. For the
> above example, if we can set s_s_names = 'Any 2 (nodeA, *nodeB,
> nodeC)' then the master server waits for nodeB and either nodeA or
> nodeC. That is, a special mark '*' means that the marked nodes are
> preferentially selected as synchronous servers. This approach is
> simpler than #1, but we cannot nest more than one level and a
> synchronization method cannot contain another method.

Nor can you select multiple groups, which sounds limited to me. Then
again, it's hard to imagine people using more than one level of
nesting.

> Feedback and comment are very welcome.

Approach 1 looks more promising; there is no need for more than one
level of nesting. I would be curious to know whether many people are
interested in such complex availability strategies, or whether that's
a niche case. For example, I would not need such complexity for what I
develop lately, as there are already enough complexities to consider
with one level of standbys.
-- 
Michael


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Page Scan Mode in Hash Index

2017-08-23 Thread Ashutosh Sharma
On Wed, Aug 23, 2017 at 7:39 PM, Jesper Pedersen
 wrote:
> On 08/23/2017 07:38 AM, Amit Kapila wrote:
>>
>> Thanks for the new version.  I again looked at the patches and fixed
>> quite a few comments in the code and ReadMe.  You have forgotten to
>> update README for the changes in vacuum patch
>> (0003-Improve-locking-startegy-during-VACUUM-in-Hash-Index_v7).  I
>> don't have anything more to add.  If you are okay with changes, then
>> we can move it to Ready For Committer unless someone else has some
>> more comments.
>>
>
> Just some minor comments.

Thanks for the review.

>
> README:
> +   it's pin till the end of scan)
>
> its pin till the end of the scan)

Corrected.

>
> +To minimize lock/unlock traffic, hash index scan always searches entire
> hash
>
> To minimize lock/unlock traffic, hash index scan always searches the entire
> hash

Done.

>
> hashsearch.c:
>
> +static inline void _hash_saveitem(HashScanOpaque so, int itemIndex,
> +  OffsetNumber offnum, IndexTuple itup);
>
> There are other instances of "inline" in the code base, so I guess that this
> is ok.
>
> +* Advance to next tuple on current page; or if there's no more, try
> to
>
> Advance to the next tuple on the current page; or if done, try to
>

Done.

Attached are the patches with above changes.
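As a rough illustration of the page-at-a-time approach described in the README changes (toy types only, not the real hash-index code):

```c
#include <assert.h>
#include <stddef.h>

/*
 * Toy model of the page-at-a-time reader: while the page lock is held
 * (here just one pass over the page), collect every matching item into
 * backend-local storage; afterwards, hand the items out one by one
 * with no lock held, so concurrent insertions cannot invalidate the
 * scan position.
 */
#define MAX_ITEMS 16

typedef struct
{
	int			hashes[MAX_ITEMS];	/* hash codes stored on the "page" */
	int			nitems;
} Page;

typedef struct
{
	int			matches[MAX_ITEMS]; /* saved positions (like saved TIDs) */
	int			nmatches;
	int			cur;
} ScanPos;

/* one pass under the "lock": save all matching offsets locally */
static void
scan_page(const Page *page, int wanted_hash, ScanPos *pos)
{
	pos->nmatches = 0;
	pos->cur = 0;
	for (int off = 0; off < page->nitems; off++)
		if (page->hashes[off] == wanted_hash)
			pos->matches[pos->nmatches++] = off;
}

/* returns the next saved match, or -1 when the page is exhausted */
static int
next_match(ScanPos *pos)
{
	return (pos->cur < pos->nmatches) ? pos->matches[pos->cur++] : -1;
}
```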

--
With Regards,
Ashutosh Sharma
EnterpriseDB:http://www.enterprisedb.com
From 501d48ef3b566569c687a9a4ac4b239b2278c789 Mon Sep 17 00:00:00 2001
From: ashu 
Date: Thu, 24 Aug 2017 10:19:58 +0530
Subject: [PATCH] Rewrite hash index scan to work page at a time.

Patch by Ashutosh Sharma.
---
 src/backend/access/hash/README   |  25 +-
 src/backend/access/hash/hash.c   | 146 ++--
 src/backend/access/hash/hashpage.c   |  10 +-
 src/backend/access/hash/hashsearch.c | 426 +++
 src/backend/access/hash/hashutil.c   |  74 +-
 src/include/access/hash.h|  55 -
 6 files changed, 549 insertions(+), 187 deletions(-)

diff --git a/src/backend/access/hash/README b/src/backend/access/hash/README
index c8a0ec7..3b1f719 100644
--- a/src/backend/access/hash/README
+++ b/src/backend/access/hash/README
@@ -259,10 +259,11 @@ The reader algorithm is:
 -- then, per read request:
 	reacquire content lock on current page
 	step to next page if necessary (no chaining of content locks, but keep
- the pin on the primary bucket throughout the scan; we also maintain
- a pin on the page currently being scanned)
-	get tuple
-	release content lock
+	the pin on the primary bucket throughout the scan)
+	save all the matching tuples from current index page into an items array
+	release pin and content lock (but if it is primary bucket page retain
+	its pin till the end of the scan)
+	get tuple from an item array
 -- at scan shutdown:
 	release all pins still held
 
@@ -270,15 +271,13 @@ Holding the buffer pin on the primary bucket page for the whole scan prevents
 the reader's current-tuple pointer from being invalidated by splits or
 compactions.  (Of course, other buckets can still be split or compacted.)
 
-To keep concurrency reasonably good, we require readers to cope with
-concurrent insertions, which means that they have to be able to re-find
-their current scan position after re-acquiring the buffer content lock on
-page.  Since deletion is not possible while a reader holds the pin on bucket,
-and we assume that heap tuple TIDs are unique, this can be implemented by
-searching for the same heap tuple TID previously returned.  Insertion does
-not move index entries across pages, so the previously-returned index entry
-should always be on the same page, at the same or higher offset number,
-as it was before.
+To minimize lock/unlock traffic, hash index scan always searches the entire
+hash page to identify all the matching items at once, copying their heap tuple
+IDs into backend-local storage. The heap tuple IDs are then processed while not
+holding any page lock within the index thereby, allowing concurrent insertion
+to happen on the same index page without any requirement of re-finding the
+current scan position for the reader. We do continue to hold a pin on the
+bucket page, to protect against concurrent deletions and bucket split.
 
 To allow for scans during a bucket split, if at the start of the scan, the
 bucket is marked as bucket-being-populated, it scan all the tuples in that
diff --git a/src/backend/access/hash/hash.c b/src/backend/access/hash/hash.c
index d89c192..45a3a5a 100644
--- a/src/backend/access/hash/hash.c
+++ b/src/backend/access/hash/hash.c
@@ -268,66 +268,21 @@ bool
 hashgettuple(IndexScanDesc scan, ScanDirection dir)
 {
 	HashScanOpaque so = (HashScanOpaque) scan->opaque;
-	Relation	rel = scan->indexRelation;
-	Buffer		buf;
-	Page		page;
-	OffsetNumber offnum;
-	ItemPointer current;
 	bool		res;
 
 	/* Hash indexes are always lossy since we store only the hash code */
 	

Re: [HACKERS] Silent bug in transformIndexConstraint

2017-08-23 Thread Serge Rielau
Never mind. I take that back. The problem is not in community code.

Cheers
Serge




Re: [HACKERS] Page Scan Mode in Hash Index

2017-08-23 Thread Ashutosh Sharma
Hi Amit,

On Wed, Aug 23, 2017 at 5:08 PM, Amit Kapila  wrote:
> On Tue, Aug 22, 2017 at 7:24 PM, Ashutosh Sharma  
> wrote:
>> On Tue, Aug 22, 2017 at 3:55 PM, Amit Kapila  wrote:
>>
>> Okay, I got your point now. I think, currently in _hash_kill_items(),
>> if an overflow page is pinned we do not check whether it got modified
>> since the last read. Hence, if vacuum runs on an overflow page that
>> is pinned and also has some dead tuples in it, it could create a
>> problem for the scan when the scan attempts to mark the killed items
>> as dead. To get rid of such a problem, I think, even if an overflow
>> page is pinned we should check whether it got modified since the last
>> read was performed on the page. If yes, then do not allow the scan to
>> mark killed items as dead. Attached is the newer version with these
>> changes along with some other cosmetic changes mentioned in your
>> earlier email. Thanks.
>>
>
> Thanks for the new version.  I again looked at the patches and fixed
> quite a few comments in the code and ReadMe.  You have forgotten to
> update README for the changes in vacuum patch
> (0003-Improve-locking-startegy-during-VACUUM-in-Hash-Index_v7).  I
> don't have anything more to add.  If you are okay with changes, then
> we can move it to Ready For Committer unless someone else has some
> more comments.
>

Thanks for reviewing my patches. I've gone through the changes done by
you in the README file and the few changes in code comments. The
changes look valid to me. But it seems there are some more minor
review comments from Jesper, which I will fix, and then I'll share the
new set of patches shortly.
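For intuition, the modified-since-last-read check discussed above can be sketched standalone. This is a toy model: the real code would compare something like the page LSN remembered when the page was read, not a plain counter.

```c
#include <assert.h>
#include <stdint.h>
#include <stdbool.h>

/* Toy stand-ins; not the real hash-index structures. */
typedef struct
{
	uint64_t	lsn;			/* bumped on every modification */
	bool		dead[8];
} ToyPage;

typedef struct
{
	uint64_t	read_lsn;		/* page "LSN" when the scan read it */
	int			killed[8];		/* offsets of items to mark dead */
	int			nkilled;
} ToyScan;

/*
 * Mark remembered items dead only if the page is unchanged since we
 * read it; a vacuum in between may have moved or removed tuples, so
 * the saved offsets could point at the wrong items.
 */
static bool
kill_items(ToyPage *page, const ToyScan *scan)
{
	if (page->lsn != scan->read_lsn)
		return false;			/* page modified since read: skip */
	for (int i = 0; i < scan->nkilled; i++)
		page->dead[scan->killed[i]] = true;
	return true;
}
```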

--
With Regards,
Ashutosh Sharma
EnterpriseDB:http://www.enterprisedb.com




Re: [HACKERS] [Proposal] Allow users to specify multiple tables in VACUUM commands

2017-08-23 Thread Michael Paquier
On Thu, Aug 24, 2017 at 12:38 PM, Bossart, Nathan  wrote:
> On 8/18/17, 12:56 AM, "Michael Paquier"  wrote:
> According to the docs, VACUUM and ANALYZE "do not support recursing over
> inheritance hierarchies" [1].  However, we need a list of OIDs for
> partitioned tables.  Namely, this piece of code in get_rel_oids(...):
>
> if (include_parts)
> oid_list = list_concat(oid_list,
>
> find_all_inheritors(relid, NoLock, NULL));
> else
> oid_list = lappend_oid(oid_list, relid);
>
> Since all of these OIDs should correspond to the same partitioned table,
> it makes sense to me to leave them together instead of breaking out each
> partition into a VacuumRelation.  If we did so, I think we would also need
> to duplicate the va_cols list for each partition.  What do you think?

Robert, Amit and the other folks working on extending the existing
partitioning facility would be in a better position to answer that,
but I would think that we should have something as flexible as
possible, and storing a list of relation OIDs in each VacuumRelation
makes it harder to track the uniqueness of relations vacuumed. I agree
that the concept of a partition with multiple parents induces a lot of
problems. But the patch as proposed worries me, as it complicates
vacuum() with a double loop: one over each relation vacuumed, and one
inside it over the list of OIDs. Modules calling vacuum() could also
use the flexibility; being able to analyze a custom list of columns
for each relation has value as well.
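The uniqueness concern can be illustrated with a toy stand-in for the OID list (not the real PostgreSQL List API or list_member_oid):

```c
#include <assert.h>
#include <stddef.h>

/* Toy stand-ins for PostgreSQL's Oid and List; not the real API. */
typedef unsigned int Oid;

typedef struct
{
	Oid			oids[64];
	int			len;
} OidList;

static int
member(const OidList *list, Oid oid)
{
	for (int i = 0; i < list->len; i++)
		if (list->oids[i] == oid)
			return 1;
	return 0;
}

/*
 * Keeping one flat list of relations to vacuum makes it cheap to skip
 * duplicates (e.g. a partition named explicitly and also reached via
 * its parent), which is harder if each entry carries its own OID list.
 */
static void
append_unique(OidList *list, Oid oid)
{
	if (!member(list, oid))
		list->oids[list->len++] = oid;
}
```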

>> - * relid, if not InvalidOid, indicate the relation to process; otherwise,
>> - * the RangeVar is used.  (The latter must always be passed, because it's
>> - * used for error messages.)
>> + * If we intend to process all relations, the 'relations' argument may be
>> + * NIL.
>> This comment actually applies to RelationAndColumns. If the OID is
>> invalid, then RangeVar is used, and should always be set. You are
>> actually breaking that promise for autovacuum. The comment here
>> should say that if relations is NIL, all the relations of the
>> database are processed, and for ANALYZE all the columns are done.
>
> Makes sense, I've tried to make this comment clearer.

+ * relations is a list of VacuumRelations to process.  If it is NIL, all
+ * relations in the database are processed.
Typo here, VacuumRelation is singular.
-- 
Michael




Re: [HACKERS] proposal: psql command \graw

2017-08-23 Thread Pavel Stehule
Hi

2017-08-24 5:50 GMT+02:00 Fabien COELHO :

>
> Hello Pavel,
>
> I have added the patch to the next commitfest.
>
> Patch applies, compiles, works.
>
> I'm okay with the names graw/graw+, and for having such short-hands.
>
> Missing break in switch, even if last item and useless, because other
> items do it... Also should be added at its place in alphabetical order?
>

I can do that.


>
> "column_header" is somehow redundant with "tuples_only". Use the
> existing one instead of adding a new one?
>

It is different: the result of tuples_only is just tuples, with no
column names, no title, no footer. I needed a new special flag to
enhance tuples_only to print column names.

>
> More generally, ISTM that the same effect could be achieved without
> adding a new print function, but by setting more options (separator,
> ...) and calling an existing print function. If so, I think it would
> reduce the code size.
>

Maybe, maybe not. Removing PRINT_RAW, you need to enhance
PRINT_UNALIGNED to use one-shot parameters, and you have to teach it to
print column names in tuples_only mode. The code's length would be
about the same. The form of this patch is not final.
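A toy stand-alone sketch of the intended output (tab-separated tuples with a column-header flag independent of tuples_only; this is illustrative only, not the psql print code, and fixes the row width at two columns for brevity):

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Write rows tab-separated, one per line. The column names can be
 * printed even though title and footer are suppressed, which is why a
 * separate column_header flag is needed on top of tuples_only.
 */
static int
print_raw(char *out, size_t outsz,
		  const char *cols[], int ncols,
		  const char *rows[][2], int nrows,
		  int column_header)
{
	size_t		used = 0;

	if (column_header)
		for (int c = 0; c < ncols; c++)
			used += snprintf(out + used, outsz - used, "%s%s",
							 cols[c], c == ncols - 1 ? "\n" : "\t");
	for (int r = 0; r < nrows; r++)
		for (int c = 0; c < ncols; c++)
			used += snprintf(out + used, outsz - used, "%s%s",
							 rows[r][c], c == ncols - 1 ? "\n" : "\t");
	return (int) used;
}
```

Output in this shape feeds directly into tools like gnuplot, which is the use case mentioned below.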


>
> Missing help entry.
>
> Missing non regression tests.
>
> Missing documentation.


Yes, I wrote it as a proof of concept, to make it possible (for me and
for others) to verify the usability of this command (and its design).
I tested it against gnuplot and it looks like it works.

Regards

Pavel


>
>
> --
> Fabien.
>


Re: [HACKERS] proposal: psql command \graw

2017-08-23 Thread Fabien COELHO


Hello Pavel,

I have added the patch to the next commitfest.

Patch applies, compiles, works.

I'm okay with the names graw/graw+, and for having such short-hands.

Missing break in switch, even if last item and useless, because other
items do it... Also should be added at its place in alphabetical order?

"column_header" is somehow redundant with "tuples_only". Use the
existing one instead of adding a new one?

More generally, ISTM that the same effect could be achieved without
adding a new print function, but by setting more options (separator,
...) and calling an existing print function. If so, I think it would
reduce the code size.

Missing help entry.

Missing non regression tests.

Missing documentation.

--
Fabien.




[HACKERS] Combination with priority-based and quorum-based synchronous replications

2017-08-23 Thread Masahiko Sawada
Hi all,

PostgreSQL 9.6 introduced priority-based multiple synchronous
replication and PostgreSQL 10 introduced the quorum-based variant.
Initially I was thinking of using both synchronous replication methods
in combination, but that is not supported yet. It would be useful for
some use cases where, for example, we have three replicas: nodeA,
nodeB and nodeC. Two of them (nodeA and nodeC) are read replicas and
the other one (nodeB) is for backup. In this case we would want to
replicate the data synchronously to nodeB while replicating data to
nodeA and nodeC using quorum-based synchronous replication. To cover
such a use case we need a feature allowing us to use both in
combination. IIUC other use cases are also mentioned in earlier
discussion.

To implement this there are two ideas.
1. Use the two synchronous replication methods in combination. For the
above example, we can set s_s_names = 'First 2 (nodeB, Any1(nodeA,
nodeC))'. This approach allows us to set a nested set of nodes in
s_s_names. We could consider supporting deeper nesting, but one level
would be enough for most cases. Also, it would be useful if one
synchronization method could contain multiple other synchronization
methods. For example, I can imagine a use case where each of two
backup data centers has two replicas and we want to send the data
synchronously to one replica in each data center. We can set
s_s_names = 'First 2( Any 1(nodeA, nodeB), Any 1(nodeC, nodeD))'. It
might be over-engineering but it would be worth considering.

2. Extend quorum-based synchronous replication so that we can specify
nodes that we want to replicate the data to synchronously. For the
above example, if we can set s_s_names = 'Any 2 (nodeA, *nodeB,
nodeC)' then the master server waits for nodeB and either nodeA or
nodeC. That is, a special mark '*' means that the marked nodes are
preferentially selected as synchronous servers. This approach is
simpler than #1, but we cannot nest more than one level and a
synchronization method cannot contain another method.
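A toy evaluator for the nested-group semantics of idea 1 (illustrative only: 'First n' is simplified here to 'any n of the members', whereas real priority semantics additionally fix which members count, in list order):

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Toy evaluator for nested synchronous-replication groups, roughly the
 * semantics of s_s_names = 'First 2 (nodeB, Any1(nodeA, nodeC))'.
 * Names and structure are hypothetical, not PostgreSQL code.
 */
typedef enum { LEAF, ANY_N, FIRST_N } Kind;

typedef struct Group
{
	Kind		kind;
	int			need;			/* n for ANY_N / FIRST_N */
	bool		acked;			/* for LEAF: has this standby acked? */
	const struct Group *members[4];
	int			nmembers;
} Group;

/* true when enough members (or this standby) have confirmed the flush */
static bool
satisfied(const Group *g)
{
	if (g->kind == LEAF)
		return g->acked;

	int			ok = 0;

	/* FIRST_N is simplified to ANY_N over its members in this toy */
	for (int i = 0; i < g->nmembers; i++)
		if (satisfied(g->members[i]))
			ok++;
	return ok >= g->need;
}
```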

Feedback and comment are very welcome.

Regards,

--
Masahiko Sawada
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center




Re: [HACKERS] [PATCH] Push limit to sort through a subquery

2017-08-23 Thread Robert Haas
On Wed, Aug 23, 2017 at 6:55 PM, Douglas Doole  wrote:
> From previous experience, pushing the limit to the workers has the potential
> to be a big win.

Here's a not-really-tested draft patch for that.

> I haven't played with parallel mode at all yet. Is it possible to disable
> force_parallel_mode for the new test?

Yeah, I suppose that's another alternative.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


push-down-bound-to-workers.patch
Description: Binary data



Re: [HACKERS] [WIP] Zipfian distribution in pgbench

2017-08-23 Thread Fabien COELHO


Hello Alik,


I am attaching patch v7.


Patch generates multiple warnings with "git apply", apparently because of 
end-of-line spaces, and fails:


  pgbench-zipf-07v.patch:52: trailing whitespace.
{
  pgbench-zipf-07v.patch:53: trailing whitespace.
"random_zipfian", 3, PGBENCH_RANDOM_ZIPFIAN
  pgbench-zipf-07v.patch:54: trailing whitespace.
},
  pgbench-zipf-07v.patch:66: trailing whitespace.
  #define ZIPF_CACHE_SIZE 15  /* cache cells number */
  pgbench-zipf-07v.patch:67: trailing whitespace.

  error: patch failed: doc/src/sgml/ref/pgbench.sgml:1013
  error: doc/src/sgml/ref/pgbench.sgml: patch does not apply
  error: patch failed: src/bin/pgbench/exprparse.y:191
  error: src/bin/pgbench/exprparse.y: patch does not apply
  error: patch failed: src/bin/pgbench/pgbench.c:93
  error: src/bin/pgbench/pgbench.c: patch does not apply
  error: patch failed: src/bin/pgbench/pgbench.h:75
  error: src/bin/pgbench/pgbench.h: patch does not apply

It also complains with "patch", although it seems to work:

 (Stripping trailing CRs from patch; use --binary to disable.)
 patching file doc/src/sgml/ref/pgbench.sgml
 (Stripping trailing CRs from patch; use --binary to disable.)
 patching file src/bin/pgbench/exprparse.y
 (Stripping trailing CRs from patch; use --binary to disable.)
 patching file src/bin/pgbench/pgbench.c
 (Stripping trailing CRs from patch; use --binary to disable.)
 patching file src/bin/pgbench/pgbench.h
 patch unexpectedly ends in middle of line
 patch unexpectedly ends in middle of line

Please do not put spaces at the end of lines, eg there is a lonely tab on 
the second line of "computeHarmonicZipfian".


It seems that "CR" characters are used:

  sh> sha1sum ~/pgbench-zipf-07v.patch
  439ad1de0a960b3b415ff6de9412b3cc4d6922e7

  sh> file ~/pgbench-zipf-07v.patch
  pgbench-zipf-07v.patch: unified diff output, ASCII text, with CRLF line 
terminators

Compilation issues one warning:

 pgbench.c: In function ‘computeHarmonicZipfian’:
 pgbench.c:894:2: warning: ISO C90 forbids mixed declarations and code 
[-Wdeclaration-after-statement]
double  uniform = pg_erand48(thread->random_state);

Please conform to pg standard for portability, even if it is a very old one:-)


About the documentation:

I would suggest to replace 0.5 in the first example by 1.5, as a zipfian 
distribution with a lower-than-one parameter is not that well defined.


I'm fine with using references to papers or books for the algorithm.

The documentation lines in the sgml file are too long. At least
limit text paragraphs to maybe 80 columns as done elsewhere.

typo: " Zipfian" (remove space)

typo: "used(described" (missing space)

typo: "numbers(algorithm" (again)

The sentence "It's recommended to pick parameter in range 
(0, 1) for better accuracy." does not make sense with the updated 
generation algorithm.


The text would benefit from a review by a native English speaker, who I am 
not. Anyway here is an attempt at improving/simplifying the text (I have 
removed sgml formatting). If some native English speaker could review it, 
that would be great.


"""
  "random_zipfian" generates an approximated bounded zipfian distribution.
  For parameter in (0,1), an approximated algorithm is taken from
  "Quickly Generating Billion-Record Synthetic Databases", Jim Gray et al, 
SIGMOD 1994.
  For parameter in (1, 1000), a rejection method is used, based on
  "Non-Uniform Random Variate Generation", Luc Devroye, p. 550-551, Springer 
1986.
  The distribution is not defined when the parameter's value is 1.0.
  The drawing performance is poor for parameter values close and above 1.0
  and on a small range.

  "parameter" defines how skewed the distribution is.
  The larger the "parameter", the more frequently values close to the beginning
  of the interval are drawn.
  The closer to 0 "parameter" is, the flatter (more uniform) the access 
distribution.
"""


English in comments and code:

"inited" is not English, I think you want "initialized". Maybe "nb_cells" 
would do.


"it is not exist" (multiple instances) -> "it does not exist".

I think that the references to the paper/book should appear as comments in 
the code, for instance in front of the functions which implement the 
algorithm.


"current_time", "last_touch_time" are counter based, they are not "time". 
I suggest to update the name to "current" and "last_touch" or "last_used".


"time when cell was used last time" -> "last used logical time"


About the code:

I would prefer that you use "1.0" instead of "1." for double constants.

I think that getZipfianRand should be symmetric. Maybe a direct formula
could do:


  return min - 1 + ((s > 1) ? xxx() : yyy());

or at least use an explicit "else" for symmetry.

Function "zipfn" asks for s and n as arguments, but ISTM that they are 
already included as fields in cell, so this seems redundant. Moreover I do 
not think that the zipfn function is that useful. I would suggest to 

Re: [HACKERS] pgbench tap tests & minor fixes

2017-08-23 Thread Fabien COELHO


While reviewing another patch, I figured out at least one potential issue 
on Windows when handling execution status.


The attached v11 small update is more likely to pass on Windows.
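The Windows fix centers on trimming a trailing newline that may be "\r\n" rather than "\n". A standalone sketch of that chomp logic (a toy function; the actual patch works on the scanner's buffer offsets):

```c
#include <assert.h>

/*
 * Trim one trailing newline from buf, where end is the offset of the
 * candidate newline character and start bounds the window. Handles
 * both "\n" and Windows "\r\n" line endings; returns the new end
 * offset.
 */
static int
chomp_end(const char *buf, int start, int end)
{
	if (end > start && buf[end] == '\n')
	{
		end--;
		if (end > start && buf[end] == '\r')
			end--;
	}
	return end;
}
```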

--
Fabien.

diff --git a/src/bin/pgbench/exprscan.l b/src/bin/pgbench/exprscan.l
index dc1367b..8255eff 100644
--- a/src/bin/pgbench/exprscan.l
+++ b/src/bin/pgbench/exprscan.l
@@ -360,6 +360,23 @@ expr_scanner_get_substring(PsqlScanState state,
 }
 
 /*
+ * get current expression line without one ending newline
+ */
+char *
+expr_scanner_chomp_substring(PsqlScanState state, int start_offset, int end_offset)
+{
+	const char *p = state->scanbuf;
+	/* chomp eol */
+	if (end_offset > start_offset && p[end_offset] == '\n')
+	{
+		end_offset--;
+		if (end_offset > start_offset && p[end_offset] == '\r')
+			end_offset--;
+	}
+	return expr_scanner_get_substring(state, start_offset, end_offset);
+}
+
+/*
  * Get the line number associated with the given string offset
  * (which must not be past the end of where we've lexed to).
  */
diff --git a/src/bin/pgbench/pgbench.c b/src/bin/pgbench/pgbench.c
index ae78c7b..b72fe04 100644
--- a/src/bin/pgbench/pgbench.c
+++ b/src/bin/pgbench/pgbench.c
@@ -229,7 +229,7 @@ typedef struct SimpleStats
 typedef struct StatsData
 {
 	time_t		start_time;		/* interval start time, for aggregates */
-	int64		cnt;			/* number of transactions */
+	int64		cnt;			/* number of transactions, including skipped */
 	int64		skipped;		/* number of transactions skipped under --rate
  * and --latency-limit */
 	SimpleStats latency;
@@ -329,7 +329,7 @@ typedef struct
 	bool		prepared[MAX_SCRIPTS];	/* whether client prepared the script */
 
 	/* per client collected stats */
-	int64		cnt;			/* transaction count */
+	int64		cnt;			/* client transaction count, for -t */
 	int			ecnt;			/* error count */
 } CState;
 
@@ -2045,7 +2045,8 @@ doCustom(TState *thread, CState *st, StatsData *agg)
 	if (INSTR_TIME_IS_ZERO(now))
 		INSTR_TIME_SET_CURRENT(now);
 	now_us = INSTR_TIME_GET_MICROSEC(now);
-	while (thread->throttle_trigger < now_us - latency_limit)
+	while (thread->throttle_trigger < now_us - latency_limit &&
+		   (nxacts <= 0 || st->cnt < nxacts)) /* with -t, do not overshoot */
 	{
 		processXactStats(thread, st, , true, agg);
 		/* next rendez-vous */
@@ -2053,6 +2054,12 @@ doCustom(TState *thread, CState *st, StatsData *agg)
 		thread->throttle_trigger += wait;
 		st->txn_scheduled = thread->throttle_trigger;
 	}
+
+	if (nxacts > 0 && st->cnt >= nxacts)
+	{
+		st->state = CSTATE_FINISHED;
+		break;
+	}
 }
 
 st->state = CSTATE_THROTTLE;
@@ -2364,15 +2371,8 @@ doCustom(TState *thread, CState *st, StatsData *agg)
  */
 			case CSTATE_END_TX:
 
-/*
- * transaction finished: calculate latency and log the
- * transaction
- */
-if (progress || throttle_delay || latency_limit ||
-	per_script_stats || use_log)
-	processXactStats(thread, st, , false, agg);
-else
-	thread->stats.cnt++;
+/* transaction finished: calculate latency and do log */
+processXactStats(thread, st, , false, agg);
 
 if (is_connect)
 {
@@ -2381,7 +2381,6 @@ doCustom(TState *thread, CState *st, StatsData *agg)
 	INSTR_TIME_SET_ZERO(now);
 }
 
-++st->cnt;
 if ((st->cnt >= nxacts && duration <= 0) || timer_exceeded)
 {
 	/* exit success */
@@ -2519,17 +2518,20 @@ processXactStats(TState *thread, CState *st, instr_time *now,
 {
 	double		latency = 0.0,
 lag = 0.0;
+	bool		detailed = progress || throttle_delay || latency_limit ||
+		per_script_stats || use_log;
 
-	if ((!skipped) && INSTR_TIME_IS_ZERO(*now))
-		INSTR_TIME_SET_CURRENT(*now);
-
-	if (!skipped)
+	if (detailed && !skipped)
 	{
+		if (INSTR_TIME_IS_ZERO(*now))
+			INSTR_TIME_SET_CURRENT(*now);
+
 		/* compute latency & lag */
 		latency = INSTR_TIME_GET_MICROSEC(*now) - st->txn_scheduled;
 		lag = INSTR_TIME_GET_MICROSEC(st->txn_begin) - st->txn_scheduled;
 	}
 
+	/* detailed thread stats */
 	if (progress || throttle_delay || latency_limit)
 	{
 		accumStats(>stats, skipped, latency, lag);
@@ -2539,7 +2541,13 @@ processXactStats(TState *thread, CState *st, instr_time *now,
 			thread->latency_late++;
 	}
 	else
+	{
+		/* no detailed stats, just count */
 		thread->stats.cnt++;
+	}
+
+	/* client stat is just counting */
+	st->cnt ++;
 
 	if (use_log)
 		doLog(thread, st, agg, skipped, latency, lag);
@@ -3026,8 +3034,7 @@ process_backslash_command(PsqlScanState sstate, const char *source)
 	PQExpBufferData word_buf;
 	int			word_offset;
 	int			offsets[MAX_ARGS];	/* offsets of argument words */
-	int			start_offset,
-end_offset;
+	int			start_offset;
 	int			lineno;
 	int			j;
 
@@ -3081,13 +3088,10 @@ process_backslash_command(PsqlScanState sstate, const char *source)
 
 		my_command->expr = expr_parse_result;
 
-		/* Get location of the ending newline */
-		end_offset = expr_scanner_offset(sstate) - 1;
-
-		/* Save 

Re: [HACKERS] One-shot expanded output in psql using \gx

2017-08-23 Thread Stephen Frost
* Noah Misch (n...@leadboat.com) wrote:
> On Tue, Aug 15, 2017 at 10:24:34PM +0200, Tobias Bussmann wrote:
> > I've tested the new \gx against 10beta and current git HEAD. Actually one 
> > of my favourite features of PostgreSQL 10! However in my environment it was 
> > behaving strangely. After some debugging I found that \gx does not work if 
> > you have \set FETCH_COUNT n before. Please find attached a patch that fixes 
> > this incl. new regression test.
> 
> [Action required within three days.  This is a generic notification.]
> 
> The above-described topic is currently a PostgreSQL 10 open item.  Stephen,
> since you committed the patch believed to have created it, you own this open
> item.  If some other commit is more relevant or if this does not belong as a
> v10 open item, please let us know.  Otherwise, please observe the policy on
> open item ownership[1] and send a status update within three calendar days of
> this message.  Include a date for your subsequent status update.  Testers may
> discover new open items at any time, and I want to plan to get them all fixed
> well in advance of shipping v10.  Consequently, I will appreciate your efforts
> toward speedy resolution.  Thanks.

Just to preempt the coming nagging email, yes, I'm aware I didn't get to
finish this today (though at least today, unlike yesterday, I looked at
it and started to play with it..  I blame the celestial bodies that
caused an impressive traffic jam on Monday night causing me to lose
basically all of Tuesday too..).

I'll provide an update tomorrow.

Thanks!

Stephen


signature.asc
Description: Digital signature


Re: [HACKERS] CONNECTION LIMIT and Parallel Query don't play well together

2017-08-23 Thread Peter Eisentraut
On 4/6/17 15:01, Peter Eisentraut wrote:
> On 2/15/17 11:19, Peter Eisentraut wrote:
>> So I would like to have a background worker limit per user, as you
>> allude to.  Attached is a patch that implements a GUC setting
>> max_worker_processes_per_user.
>>
>> Besides the uses for background sessions, it can also be useful for
>> parallel workers, logical replication apply workers, or things like
>> third-party partitioning extensions.
> 
> Given that background sessions have been postponed, is there still
> interest in this separate from that?  It would be useful for per-user
> parallel worker limits, for example.

Here is a slightly updated patch for consideration in the upcoming
commit fest.

-- 
Peter Eisentraut  http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
From e99d8ccf9dfd007c054b20e8d10ffb35cfb722b7 Mon Sep 17 00:00:00 2001
From: Peter Eisentraut 
Date: Wed, 23 Aug 2017 19:10:21 -0400
Subject: [PATCH v2] Add max_worker_processes_per_user setting

---
 doc/src/sgml/config.sgml  | 28 
 doc/src/sgml/ref/create_role.sgml |  3 ++-
 src/backend/postmaster/bgworker.c | 28 
 src/backend/utils/init/globals.c  |  1 +
 src/backend/utils/misc/guc.c  | 12 
 src/include/miscadmin.h   |  1 +
 6 files changed, 72 insertions(+), 1 deletion(-)

diff --git a/doc/src/sgml/config.sgml b/doc/src/sgml/config.sgml
index 2b6255ed95..24f1d13f8a 100644
--- a/doc/src/sgml/config.sgml
+++ b/doc/src/sgml/config.sgml
@@ -2041,6 +2041,34 @@ Asynchronous Behavior

   
 
+  
+   max_worker_processes_per_user 
(integer)
+   
+max_worker_processes_per_user configuration 
parameter
+   
+   
+   
+
+ Sets the maximum number of background processes allowed per user.  If
+ the setting is -1, then there is no limit.  That is also the default.
+
+
+
+ Only superusers can change this setting.
+ Unlike max_worker_processes, which controls the
+ overall instance limit, this setting can also be changed at run time
+ and can be set differently for different users by
+ using ALTER ROLE ... SET.
+
+
+
+ This setting is checked at the time the worker process is started
+ using the setting in the session that causes it to start.  Changes to
+ the setting will not affect already running workers.
+
+   
+  
+
   
max_parallel_workers_per_gather 
(integer)

diff --git a/doc/src/sgml/ref/create_role.sgml 
b/doc/src/sgml/ref/create_role.sgml
index 36772b678a..a7c547d907 100644
--- a/doc/src/sgml/ref/create_role.sgml
+++ b/doc/src/sgml/ref/create_role.sgml
@@ -204,7 +204,8 @@ Parameters
 the role can make.  -1 (the default) means no limit. Note that only
 normal connections are counted towards this limit. Neither prepared
 transactions nor background worker connections are counted towards
-this limit.
+this limit (see 
+for the latter).

   
  
diff --git a/src/backend/postmaster/bgworker.c 
b/src/backend/postmaster/bgworker.c
index 28af6f0f07..443dcaa4f3 100644
--- a/src/backend/postmaster/bgworker.c
+++ b/src/backend/postmaster/bgworker.c
@@ -79,6 +79,7 @@ typedef struct BackgroundWorkerSlot
boolin_use;
boolterminate;
pid_t   pid;/* InvalidPid = not started 
yet; 0 = dead */
+   Oid roleid; /* user responsible for 
it */
uint64  generation; /* incremented when slot is 
recycled */
BackgroundWorker worker;
 } BackgroundWorkerSlot;
@@ -976,6 +977,32 @@ RegisterDynamicBackgroundWorker(BackgroundWorker *worker,
return false;
}
 
+   /*
+* Check number of used slots for user
+*/
+   if (max_worker_processes_per_user >= 0)
+   {
+   int count = 0;
+
+		for (slotno = 0; slotno < BackgroundWorkerData->total_slots; ++slotno)
+		{
+			BackgroundWorkerSlot *slot = &BackgroundWorkerData->slot[slotno];
+
+   if (slot->in_use && slot->roleid == GetUserId())
+   count++;
+   }
+
+		if (count > max_worker_processes_per_user)
+		{
+			ereport(LOG,
+					(errcode(ERRCODE_CONFIGURATION_LIMIT_EXCEEDED),
+					 errmsg("too many worker processes for role \"%s\"",
+							GetUserNameFromId(GetUserId(), false))));
+			LWLockRelease(BackgroundWorkerLock);
+			return false;
+		}
+	}
+
/*
 * Look for an 

Re: [HACKERS] [PATCH] Push limit to sort through a subquery

2017-08-23 Thread Douglas Doole
>
> Buildfarm members with force_parallel_mode=regress are failing now.  I
> haven't had a chance to investigate exactly what's going on here, but
> I think there are probably several issues:
>

Not an auspicious beginning for my first patch :-(


> 3. However, even if it did that, it would only affect the leader, not
> the workers, because each worker will of course have a separate copy
> of each executor state node.  We could fix that by having the Gather
> or Gather Merge node also store the bound and propagate that to each
> worker via DSM, and then have each worker call pass_down_bound itself.
> (This would require a bit of refactoring of the API for
> pass_down_bound, but it looks doable.)
>

From previous experience, pushing the limit to the workers has the
potential to be a big win.

> In the short run, I'm not sure we have a better alternative than
> removing this test - unless somebody has a better idea? - but it would
> be good to work on all of the above problems.
>

I haven't played with parallel mode at all yet. Is it possible to disable
force_parallel_mode for the new test? If not, then I'd agree that removing
the test is probably the only reasonable short term fix.

- Doug
Salesforce


Re: [HACKERS] [PATCH] Push limit to sort through a subquery

2017-08-23 Thread Robert Haas
On Mon, Aug 21, 2017 at 2:43 PM, Robert Haas  wrote:
> Works for me.  While I'm sure this won't eclipse previous achievements
> in this area, it still seems worthwhile.  So committed with one minor
> change to shorten a long-ish line.

Buildfarm members with force_parallel_mode=regress are failing now.  I
haven't had a chance to investigate exactly what's going on here, but
I think there are probably several issues:

1. It's definitely the case that the details about a sort operation
aren't propagated from a worker back to the leader.  This has elicited
complaint previously.

2. pass_down_bound() probably gets the Gather node at the top of the
tree and stops, since it doesn't know how to pass down a bound through
a Gather.  We could probably teach it to pass down the bound through
Gather or Gather Merge on the same theory as Append/MergeAppend.

3. However, even if it did that, it would only affect the leader, not
the workers, because each worker will of course have a separate copy
of each executor state node.  We could fix that by having the Gather
or Gather Merge node also store the bound and propagate that to each
worker via DSM, and then have each worker call pass_down_bound itself.
(This would require a bit of refactoring of the API for
pass_down_bound, but it looks doable.)

In the short run, I'm not sure we have a better alternative than
removing this test - unless somebody has a better idea? - but it would
be good to work on all of the above problems.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] POC: Sharing record typmods between backends

2017-08-23 Thread Robert Haas
On Wed, Aug 23, 2017 at 12:42 PM, Andres Freund  wrote:
> I don't think that's sufficient. make, and especially check-world,
> should have a decent coverage of the code locally. Without having to
> know about options like force_parallel_mode=regress. As e.g. evidenced
> by the fact that Thomas's latest version crashed if you ran the tests
> that way.  If there's a few lines that aren't covered by the plain
> tests, and more than a few node + parallelism combinations, I'm not
> bothered much. But this is (soon hopefully was) a fairly complicated
> piece of infrastructure - that should be exercised.  If necessary that
> can just be a BEGIN; SET LOCAL force_parallel_mode=on; query with
> blessed descs;COMMIT or whatnot - it's not like we need something hugely
> complicated here.

Yeah, we've been bitten before by changes that seemed OK when run
without force_parallel_mode but misbehaved with that option, so it
would be nice to improve things.  Now, I'm not totally convinced that
just adding a test around blessed tupledescs is really going to help
very much - that option exercises a lot of code, and this is only one
relatively small bit of it.  But I'm certainly not objecting to the
idea.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] Out of date comment in predicate.c

2017-08-23 Thread Peter Eisentraut
On 7/6/17 21:06, Thomas Munro wrote:
> On Sat, Jul 1, 2017 at 6:38 AM, Peter Eisentraut wrote:
>> On 6/27/17 01:21, Thomas Munro wrote:
>>> Commit ea9df812d8502fff74e7bc37d61bdc7d66d77a7f got rid of
>>> FirstPredicateLockMgrLock, but it's still referred to in a comment in
>>> predicate.c where the locking protocol is documented.  I think it's
>>> probably best to use the name of the macro that's usually used to
>>> access the lock array in the code.  Please see attached.
>>
>> Does this apply equally to PredicateLockHashPartitionLock() and
>> PredicateLockHashPartitionLockByIndex()?  Should the comment mention or
>> imply both?
> 
> Yeah, I guess so.  How about listing the hashcode variant, as it's the
> more commonly used and important for a reader to understand of the
> two, but mentioning the ByIndex variant in a bullet point below?  Like
> this.

Committed and backpatched to 9.4.

-- 
Peter Eisentraut  http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: [HACKERS] Quorum commit for multiple synchronous replication.

2017-08-23 Thread Josh Berkus
On 08/22/2017 11:04 PM, Masahiko Sawada wrote:
> WARNING:  what you did is ok, but you might have wanted to do something else
> 
> First of all, whether or not that can properly be called a warning is
> highly debatable.  Also, if you do that sort of thing to your spouse
> and/or children, they call it "nagging".  I don't think users will
> like it any more than family members do.

Realistically, we'll support the backwards-compatible syntax for 3-5
years.  Which is fine.

I suggest that we just gradually deprecate the old syntax from the docs,
and then around Postgres 16 eliminate it.  I posit that that's better
than changing the meaning of the old syntax out from under people.

-- 
Josh Berkus
Containers & Databases Oh My!




[HACKERS] Silent bug in transformIndexConstraint

2017-08-23 Thread Serge Rielau
In parse_utilcmd.c, transformIndexConstraint() contains the following piece
of code:

    /*
     * For UNIQUE and PRIMARY KEY, we just have a list of column names.
     *
     * Make sure referenced keys exist.  If we are making a PRIMARY KEY index,
     * also make sure they are NOT NULL, if possible.  (Although we could leave
     * it to DefineIndex to mark the columns NOT NULL, it's more efficient to
     * get it right the first time.)
     */
    foreach(lc, constraint->keys)
    {
        char *key = strVal(lfirst(lc));

The strVal() is wrong since lfirst(lc) returns an IndexElem * and not a
Value *, and we should be doing:

    char *key = ((IndexElem *) lfirst(lc))->name;

The existing code only works by luck because Value.val.str happens to sit at
the same offset as IndexElem.name.

Re: [HACKERS] POC: Sharing record typmods between backends

2017-08-23 Thread Andres Freund
On 2017-08-23 09:45:38 -0400, Robert Haas wrote:
> On Wed, Aug 23, 2017 at 1:46 AM, Andres Freund  wrote:
> > For later commits in the series:
> > - Afaict the whole shared tupledesc stuff, as tqueue.c before, is
> >   entirely untested. This baffles me. See also [1]. I can force the code
> >   to be reached with force_parallel_mode=regress/1, but this absolutely
> >   really totally needs to be reached by the default tests. Robert?
> 
> force_parallel_mode=regress is a good way of testing this because it
> keeps the leader from doing the work, which would likely dodge any
> bugs that happened to exist.  If you want to test something in the
> regular regression tests, using force_parallel_mode=on is probably a
> good way to do it.
> 
> Also note that there are 3 buildfarm members that test with
> force_parallel_mode=regress on a regular basis, so it's not like there
> is no automated coverage of this area.

I don't think that's sufficient. make, and especially check-world,
should have a decent coverage of the code locally. Without having to
know about options like force_parallel_mode=regress. As e.g. evidenced
by the fact that Thomas's latest version crashed if you ran the tests
that way.  If there's a few lines that aren't covered by the plain
tests, and more than a few node + parallelism combinations, I'm not
bothered much. But this is (soon hopefully was) a fairly complicated
piece of infrastructure - that should be exercised.  If necessary that
can just be a BEGIN; SET LOCAL force_parallel_mode=on; query with
blessed descs;COMMIT or whatnot - it's not like we need something hugely
complicated here.

Greetings,

Andres Freund




Re: [HACKERS] Row level security (RLS) for updatable views

2017-08-23 Thread Daurnimator
On 24 December 2015 at 11:03, Caleb Meredith  wrote:
> There should be a way to do separate read/write security barriers for
> updatable views. I'll start by addressing the problem, state some potential
> solutions with the current software, and finally end with 2 proposals to
> solve the problem in the best way possible.
>
> ## Problem
> I want the user to see more rows then they can edit, a common scenario. Like
> a blog, the user can read all the posts but they can only edit their own.
>
> Users of my database are directly reading and writing to and from views, I
> have chosen to use views to hide implementation details from the user and
> add extra metadata columns. However, in doing so I have lost the row level
> security capabilities for tables that postgres 9.5 provides.
>
> More specifically I'm using the [PostgREST][1] API which detects relations
> in a postgres schema and exposes an HTTP REST interface.
>
> ## Exploration
> I asked [this][2] question on stack overflow for clarification on why
> currently postgres does not allow row level security for views. I also
> explored some other mechanisms to provide this functionality:
>
> 1. Row level security on the parent table: This removes information about
> the user making the request and mixes view schema details with table schema
> details, I'd prefer to not have to do that.
> 2. Two views: One which is the general selection view, and the second which
> is a security definer view which selects everything from the general view
> and adds a where clause. This is what I'm currently using, but it's not
> optimal because it requires a naming convention (I'm using "people" and
> "~people") and it requires a little more domain knowledge + decreases
> interoperability.
> 3. Triggers/rules: Use a trigger to override the behavior of the view when
> writing to the database. This requires 3 triggers/rules (INSERT, UPDATE,
> DELETE) and kinda defeats the entire purpose of having an updatable view.
> 4. Conditional triggers/rules: Have a trigger which throws an error when the
> condition is true (using the WHEN keyword). This just doesn't work because
> a) triggers can only replace operations on views (no BEFORE or AFTER) and b)
> the WHEN keyword doesn't work on triggers which replace operations.
>
> ## Proposal 1: Add RLS to views
> Therefore I propose adding support for to views. The syntax would be the
> same:
>
> ALTER VIEW … ENABLE ROW LEVEL SECURITY;
>
> and the corresponding:
>
> CREATE POLICY …
>
> command would work the same. The most important part of this implementation
> would be that the row level security `current_user` be the invoker and *not*
> the definer.
>
> Theoretically I think this would be simple enough to implement as row level
> security seemingly is just adding a couple extra WHERE conditions to a query
> on the relation, and there is already some support for views which are
> security definers. Row level security of this nature could only be enabled
> on updatable views.
>
> This would be my preferred solution to the problem.
>
> ## Proposal 2: Different where condition for reads and writes
> This might be simpler to implement, but also not as verbose as the first
> proposal. It involves extending the CREATE VIEW syntax for updatable views
> with a WITH BARRIER expression. Similar to how WITH CHECK works for RLS
> policies it would be added to the view's select statement on INSERT, UPDATE,
> and DELETE. It might look like the following:
>
> CREATE VIEW posts
>   WITH (check_option = 'cascaded', security_barrier)
>   AS SELECT p.id, p.headline, p.text
>FROM private.posts as p
>   WITH BARRIER (p.author = current_user);
>
> This would allow any user to look at all the views, but only ever write to
> their own. All operations of the view are the same except the barrier is
> appended to INSERT, UPDATE, and DELETEs.
>
> The weakness of this approach comes in the following:
>
> CREATE VIEW posts
>   WITH (check_option = 'cascaded', security_barrier)
>   AS SELECT p.id, p.headline, p.text
>FROM private.posts as p
>   WHERE p.published = true
>   WITH BARRIER (p.author = current_user);
>
> The above view would show all published posts to all users, but owners of
> unpublished posts could not edit their posts. This might be solved by making
> the barrier action specific so maybe WITH BARRIER INSERT, UPDATE, DELETE (…)
> and WITH BARRIER SELECT (…)?
>
> This second proposal might be easier to implement and works well with how
> views currently function, however it is not preferred because it cannot add
> different barriers for different users.
>
> Thanks for your time, these are just some rough ideas I have had to solve my
> problem. I hope this can be resolved for all developers looking to build
> advanced systems with postgres.
>
> – Caleb Meredith
>
> [1]: https://github.com/begriffs/postgrest
> [2]:
> 

Re: [HACKERS] Page Scan Mode in Hash Index

2017-08-23 Thread Jesper Pedersen

On 08/23/2017 07:38 AM, Amit Kapila wrote:

> Thanks for the new version.  I again looked at the patches and fixed
> quite a few comments in the code and ReadMe.  You have forgotten to
> update README for the changes in vacuum patch
> (0003-Improve-locking-startegy-during-VACUUM-in-Hash-Index_v7).  I
> don't have anything more to add.  If you are okay with changes, then
> we can move it to Ready For Committer unless someone else has some
> more comments.



Just some minor comments.

README:
+   it's pin till the end of scan)

its pin till the end of the scan)

+To minimize lock/unlock traffic, hash index scan always searches entire hash

To minimize lock/unlock traffic, hash index scan always searches the entire hash


hashsearch.c:

+static inline void _hash_saveitem(HashScanOpaque so, int itemIndex,
+  OffsetNumber offnum, IndexTuple itup);

There are other instances of "inline" in the code base, so I guess that 
this is ok.


+* Advance to next tuple on current page; or if there's no more, try to

Advance to the next tuple on the current page; or if done, try to

Best regards,
 Jesper




Re: [HACKERS] Pluggable storage

2017-08-23 Thread Amit Kapila
On Wed, Aug 23, 2017 at 11:05 AM, Haribabu Kommi wrote:
>
>
>> On Mon, Aug 21, 2017 at 7:25 PM, Amit Kapila wrote:
>>
>> On Mon, Aug 21, 2017 at 12:58 PM, Haribabu Kommi wrote:
>> >
>> > On Sun, Aug 13, 2017 at 5:17 PM, Amit Kapila wrote:
>> >>
>> >>
>> >> Also, it is quite possible that some of the storage Am's don't even
>> >> want to return bool as a parameter from HeapTupleSatisfies* API's.  I
>> >> guess what we need here is to provide a way so that different storage
>> >> am's can register their function pointer for an equivalent to
>> >> satisfies function.  So, we need to change
>> >> SnapshotData.SnapshotSatisfiesFunc in some way so that different
>> >> handlers can register their function instead of using that directly.
>> >> I think that should address the problem you are planning to solve by
>> >> omitting buffer parameter.
>> >
>> >
>> > Thanks for your suggestion. Yes, it is better to go in the direction of
>> > SnapshotSatisfiesFunc.
>> >
>> > I verified the above idea of implementing the Tuple visibility functions
>> > and assign them into the snapshotData structure based on the snapshot.
>> >
>> > The Tuple visibility functions that are specific to the relation are
>> > available
>> > with the RelationData structure and this structure may not be available,
>> >
>>
>> Which functions are you referring here?  I don't see anything in
>> tqual.h that uses RelationData.
>
>
>
> With storage API's, the tuple visibility functions are available with
> RelationData
> and those are needs used to update the SnapshotData structure
> SnapshotSatisfiesFunc member.
>
> But the RelationData is not available everywhere, where the snapshot is
> created,
> but it is available every place where the tuple visibility is checked. So I
> just changed
> the way of checking the tuple visibility with the information of snapshot by
> calling
> the corresponding tuple visibility function from RelationData.
>
> If SnapshotData provides MVCC, then the MVCC specific tuple visibility
> function from
> RelationData is called. The SnapshotSatisfiesFunc member is changed to a
> enum
> that holds the tuple visibility type such as MVCC, DIRTY, SELF and etc.
> Whenever
> the visibility check is needed, the corresponding function is called.
>

It will be easy to understand and see if there is some better
alternative once you have something in the form of a patch.


-- 
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com




Re: [HACKERS] POC: Sharing record typmods between backends

2017-08-23 Thread Robert Haas
On Wed, Aug 23, 2017 at 1:46 AM, Andres Freund  wrote:
> For later commits in the series:
> - Afaict the whole shared tupledesc stuff, as tqueue.c before, is
>   entirely untested. This baffles me. See also [1]. I can force the code
>   to be reached with force_parallel_mode=regress/1, but this absolutely
>   really totally needs to be reached by the default tests. Robert?

force_parallel_mode=regress is a good way of testing this because it
keeps the leader from doing the work, which would likely dodge any
bugs that happened to exist.  If you want to test something in the
regular regression tests, using force_parallel_mode=on is probably a
good way to do it.

Also note that there are 3 buildfarm members that test with
force_parallel_mode=regress on a regular basis, so it's not like there
is no automated coverage of this area.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] locale problem of bgworker: logical replication launcher and worker process

2017-08-23 Thread Peter Eisentraut
On 8/22/17 01:19, Ioseph Kim wrote:
> 2017-08-22 14:06:21.697 KST [306] 로그:  logical replication apply
> worker for subscription "replica_a" has started
> 2017-08-22 14:06:21.698 KST [306] 오류:  발행 서버에 연결 할 수 없음:
> ??? ??? ? ??: ??? ???
>         "localhost" (::1)  ??? ?? ???,
>         5432 ??? TCP/IP ???  ??.
>     ??? ??? ? ??: ??? ???
>         "localhost" (127.0.0.1)  ??? ?? ???,
>         5432 ??? TCP/IP ???  ??.
> -
> 
> main postmaster messages are printed  in korean well,
> but bgworker process message is not.
> 
> This problem seems to have occurred because the server locale
> environment and the client's that are different.

I have tried it locally with a ko_KR locale, and it seems to work
correctly for me.  Still, I can imagine there are all kinds of ways this
could go wrong in particular configurations.  Could you construct a
reproducible test setup, including specific initdb and locale settings,
operating system, etc.?

-- 
Peter Eisentraut  http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: [HACKERS] obsolete code in pg_upgrade

2017-08-23 Thread Robert Haas
On Tue, Aug 22, 2017 at 8:28 PM, Peter Eisentraut wrote:
> It seems to me that this code in pg_upgrade/check.c has been useless
> since at least version 9.1:
>
> /* Is it 9.0 but without tablespace directories? */
> if (GET_MAJOR_VERSION(new_cluster.major_version) == 900 &&
> new_cluster.controldata.cat_ver < TABLE_SPACE_SUBDIRS_CAT_VER)
> pg_fatal("This utility can only upgrade to PostgreSQL version
> 9.0 after 2010-01-11\n"
>  "because of backend API changes made during
> development.\n");

I think I agree.  It seems to me that the version of pg_upgrade
shipped with release N only needs to support upgrades to release N,
not older releases.  There's probably room for debate about whether a
pg_upgrade needs to support direct upgrades FROM very old releases,
but we need not decide anything about that right now.

I think you could go ahead and rip out this code, but as it seems like
a non-critical change, -1 for back-patching it.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] proposal: psql command \graw

2017-08-23 Thread Pavel Stehule
Hi

2017-08-22 11:22 GMT+02:00 Pavel Stehule :

>
>
> 2017-08-22 10:46 GMT+02:00 Fabien COELHO :
>
>>
>> Hello Pavel,
>>
>> One my idea is introduction new simple output format and execution command
>>> with result in this format.
>>>
>>> It should work something like
>>>
>>> \setenv GNUPLOT_OPTION '..'
>>> SELECT * FROM data
>>> \graw | gnuplot ...
>>>
>>
>> I understand that it is kind of a shortcut for:
>>
>>   \pset fieldsep ' '
>>   \pset format unaligned
>>   \pset tuples_only on
>>   -- possibly other settings...
>>   SELECT * FROM data \g | gnuplot '...'
>
>
>> And then you have to persuade gnuplot to take its data from stdin?
>
>
> There are some methods
>
> https://stackoverflow.com/questions/17543386/pipe-plot-data-to-gnuplot-script/17576571#17576571
>
>
>
postgres=# select pocet_muzu + pocet_zen from obce
postgres-# \graw | gnuplot -p -e "set terminal dumb; plot '-' with boxes"





[gnuplot "dumb"-terminal output: an ASCII-art histogram of pocet_muzu +
pocet_zen appeared here; the character plot was mangled by line wrapping
in the archive and is omitted]


postgres=#




>
>>
>> --
>> Fabien.
>>
>
>
diff --git a/src/bin/psql/command.c b/src/bin/psql/command.c
index 96f3a402a4..90f88a8280 100644
--- a/src/bin/psql/command.c
+++ b/src/bin/psql/command.c
@@ -335,7 +335,8 @@ exec_command(const char *cmd,
 		status = exec_command_errverbose(scan_state, active_branch);
 	else if (strcmp(cmd, "f") == 0)
 		status = exec_command_f(scan_state, active_branch);
-	else if (strcmp(cmd, "g") == 0 || strcmp(cmd, "gx") == 0)
+	else if (strcmp(cmd, "g") == 0 || strcmp(cmd, "gx") == 0 ||
+			 strcmp(cmd, "graw") == 0 || strcmp(cmd, "graw+") == 0)
 		status = exec_command_g(scan_state, active_branch, cmd);
 	else if (strcmp(cmd, "gexec") == 0)
 		status = exec_command_gexec(scan_state, active_branch);
@@ -1300,6 +1301,7 @@ exec_command_f(PsqlScanState scan_state, bool active_branch)
 
 /*
  * \g [filename] -- send query, optionally with output to file/pipe
+ * \graw [filename] -- same as \g with raw format
  * \gx [filename] -- same as \g, with expanded mode forced
  */
 static backslashResult
@@ -1322,6 +1324,10 @@ exec_command_g(PsqlScanState scan_state, bool active_branch, const char *cmd)
 		free(fname);
 		if (strcmp(cmd, "gx") == 0)
 			pset.g_expanded = true;
+		else if (strcmp(cmd, "graw") == 0)
+			pset.g_raw = true;
+		else if (strcmp(cmd, "graw+") == 0)
+			pset.g_raw_header = true;
 		status = PSQL_CMD_SEND;
 	}
 	else
@@ -3701,6 +3707,8 @@ _align2string(enum printFormat in)
 		case PRINT_TROFF_MS:
 			return "troff-ms";
 			break;
+		case PRINT_RAW:
+			return "raw";
 	}
 	return "unknown";
 }
diff --git a/src/bin/psql/common.c b/src/bin/psql/common.c
index 044cdb82a7..05af28ee40 100644
--- a/src/bin/psql/common.c
+++ b/src/bin/psql/common.c
@@ -816,6 +816,12 @@ PrintQueryTuples(const PGresult *results)
 	/* one-shot expanded output requested via \gx */
 	if (pset.g_expanded)
 		my_popt.topt.expanded = 1;
+	else if (pset.g_raw || pset.g_raw_header)
+	{
+		my_popt.topt.format = PRINT_RAW;
+		my_popt.topt.tuples_only = true;
+		my_popt.topt.column_header = pset.g_raw_header;
+	}
 
 	/* write output to \g argument, if any */
 	if (pset.gfname)
@@ -845,6 +851,8 @@ PrintQueryTuples(const PGresult *results)
 }
 
 
+
+
 /*
  * StoreQueryTuple: assuming query result is OK, save data into variables
  *
@@ -1460,6 +1468,10 @@ sendquery_cleanup:
 	/* reset \gx's expanded-mode flag */
 	pset.g_expanded = false;
 
+	/* reset \graw flags */
+	pset.g_raw = false;
+	pset.g_raw_header = false;
+
 	/* reset \gset trigger */
 	if (pset.gset_prefix)
 	{
diff --git a/src/bin/psql/settings.h b/src/bin/psql/settings.h
index b78f151acd..6a2fb6b2dd 100644
--- a/src/bin/psql/settings.h
+++ b/src/bin/psql/settings.h
@@ -92,6 +92,8 @@ typedef struct _psqlSettings
 
 	char	   *gfname;			/* one-shot file output argument for \g */
 	bool		g_expanded;		/* one-shot expanded output requested via \gx */
+	bool		g_raw;			/* one-shot raw format for \graw */
+	bool		g_raw_header;	/* one-shot show header for \graw+ */
 	char	   *gset_prefix;	/* one-shot prefix argument for \gset 

Re: [HACKERS] POC: Sharing record typmods between backends

2017-08-23 Thread Thomas Munro
On Wed, Aug 23, 2017 at 11:58 PM, Thomas Munro wrote:
> On Wed, Aug 23, 2017 at 5:46 PM, Andres Freund  wrote:
>> - Afaict GetSessionDsmHandle() uses the current rather than
>>   TopMemoryContext. Try running the regression tests under
>>   force_parallel_mode - crashes immediately for me without fixing that.
>
> Gah, right.  Fixed.

That version missed an early return case where dsm_create failed.
Here's a version that restores the caller's memory context in that
case too.

-- 
Thomas Munro
http://www.enterprisedb.com


shared-record-typmods-v8.patchset.tgz
Description: GNU Zip compressed data



Re: [HACKERS] POC: Sharing record typmods between backends

2017-08-23 Thread Thomas Munro
On Wed, Aug 23, 2017 at 5:46 PM, Andres Freund  wrote:
> Committing 0003. This'll probably need further adjustment, but I think
> it's good to make progress here.

Thanks!

> Changes:
> - pgindent'ed after adding the necessary typedefs to typedefs.list
> - replaced INT64CONST w UINT64CONST
> - moved count assertion in delete_item to before decrementing - as count
>   is unsigned, it'd just wrap around on underflow not triggering the 
> assertion.
> - documented and asserted resize is called without partition lock held
> - removed reference to iterator in dshash_find comments
> - removed stray references to dshash_release (rather than dshash_release_lock)
> - reworded dshash_find_or_insert reference to dshash_find to also
>   mention error handling.

Doh.  Thanks.

> Notes for possible followup commits of the dshash API:
> - nontrivial portions of dsahash are essentially critical sections lest
>   dynamic shared memory is leaked. Should we, short term, introduce
>   actual critical section markers to make that more obvious? Should we,
>   longer term, make this more failsafe / easier to use, by
>   extending/emulating memory contexts for dsa memory?

Hmm.  I will look into this.

> - I'm very unconvinced of supporting both {compare,hash}_arg_function
>   and the non-arg version. Why not solely support the _arg_ version, but
>   add the size argument? On all relevant platforms that should still be
>   register arg callable, and the branch isn't free either.

Well, the idea was that both versions were compatible with existing
functions: one with DynaHash's hash and compare functions and the
other with qsort_arg's compare function type.  In the attached version
I've done as you suggested in 0001.  Since I guess many users will
finish up wanting raw memory compare and hash I've provided
dshash_memcmp() and dshash_memhash().  Thoughts?

Since there is no attempt to be compatible with anything else, I was
slightly tempted to make equal functions return true for a match,
rather than the memcmp-style return value but figured it was still
better to be consistent.

> - might be worthwhile to try to reduce duplication between
>   delete_item_from_bucket, delete_key_from_bucket, delete_item
>   dshash_delete_key.

Yeah.  I will try this and send a separate refactoring patch.

> For later commits in the series:
> - Afaict the whole shared tupledesc stuff, as tqueue.c before, is
>   entirely untested. This baffles me. See also [1]. I can force the code
>   to be reached with force_parallel_mode=regress/1, but this absolutely
>   really totally needs to be reached by the default tests. Robert?

A fair point.  0002 is a simple patch to push some blessed records
through a TupleQueue in select_parallel.sql.  It doesn't do ranges and
arrays (special cases in the tqueue.c code that 0004 rips out), but
for exercising the new shared code I believe this is enough.  If you
apply just 0002 and 0004 then this test fails with a strange confused
record decoding error as expected.

> - gcc wants static before const (0004).

Fixed.

> - Afaict GetSessionDsmHandle() uses the current rather than
>   TopMemoryContext. Try running the regression tests under
>   force_parallel_mode - crashes immediately for me without fixing that.

Gah, right.  Fixed.

> - SharedRecordTypmodRegistryInit() is called from GetSessionDsmHandle()
>   which calls EnsureCurrentSession(), but
>   SharedRecordTypmodRegistryInit() does so again - sprinkling those
>   around liberally seems like it could hide bugs.

Yeah.  Will look into this.

-- 
Thomas Munro
http://www.enterprisedb.com


shared-record-typmods-v7.patchset.tgz
Description: GNU Zip compressed data



Re: [HACKERS] Page Scan Mode in Hash Index

2017-08-23 Thread Amit Kapila
On Tue, Aug 22, 2017 at 7:24 PM, Ashutosh Sharma  wrote:
> On Tue, Aug 22, 2017 at 3:55 PM, Amit Kapila  wrote:
>
> Okay, I got your point now. I think, currently in _hash_kill_items(),
> if an overflow page is pinned we do not check if it got modified since
> the last read or
> not. Hence, if the vacuum runs on an overflow page that is pinned and
> also has some dead tuples in it then it could create a problem for
> scan basically,
> when scan would attempt to mark the killed items as dead. To get rid
> of such problem, i think, even if an overflow page is pinned we should
> check if it got
> modified or not since the last read was performed on the page. If yes,
> then do not allow scan to mark killed items as dead. Attached is the
> newer version with these changes along with some other cosmetic
> changes mentioned in your earlier email. Thanks.
>

Thanks for the new version.  I again looked at the patches and fixed
quite a few comments in the code and README.  You have forgotten to
update the README for the changes in the vacuum patch
(0003-Improve-locking-startegy-during-VACUUM-in-Hash-Index_v7).  I
don't have anything more to add.  If you are okay with the changes,
then we can move it to Ready For Committer unless someone else has
more comments.

-- 
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com


0001-Rewrite-hash-index-scan-to-work-page-at-a-time_v13.patch
Description: Binary data


0002-Remove-redundant-hash-function-_hash_step-and-do-som.patch
Description: Binary data


0003-Improve-locking-startegy-during-VACUUM-in-Hash-Index_v7.patch
Description: Binary data



Re: [HACKERS] path toward faster partition pruning

2017-08-23 Thread Ashutosh Bapat
On Mon, Aug 21, 2017 at 12:07 PM, Amit Langote
 wrote:
> I've been working on implementing a way to perform plan-time
> partition-pruning that is hopefully faster than the current method of
> using constraint exclusion to prune each of the potentially many
> partitions one-by-one.  It's not fully cooked yet though.
>
> Meanwhile, I thought I'd share a couple of patches that implement some
> restructuring of the planner code related to partitioned table inheritance
> planning that I think would be helpful.  They are to be applied on top of
> the patches being discussed at [1].  Note that these patches themselves
> don't implement the actual code that replaces constraint exclusion as a
> method of performing partition pruning.  I will share that patch after
> debugging it some more.
>
> The main design goal of the patches I'm sharing here now is to defer the
> locking and  opening of leaf partitions in a given partition tree to a
> point after set_append_rel_size() is called on the root partitioned table.
>  Currently, AFAICS, we need to lock and open the child tables in
> expand_inherited_rtentry() only to set the translated_vars field in
> AppendRelInfo that we create for the child.  ISTM, we can defer the
> creation of a child AppendRelInfo to a point when it (its field
> translated_vars in particular) will actually be used and so lock and open
> the child tables only at such a time.  Although we don't lock and open the
> partition child tables in expand_inherited_rtentry(), their RT entries are
> still created and added to root->parse->rtable, so that
> setup_simple_rel_arrays() knows the maximum number of entries
> root->simple_rel_array will need to hold and allocate the memory for that
> array accordingly.   Slots in simple_rel_array[] corresponding to
> partition child tables will be empty until they are created when
> set_append_rel_size() is called on the root parent table and it determines
> the partitions that will be scanned after all.

The partition pruning can happen only after the quals have been
distributed to Rels i.e. after deconstruct_jointree(),
reconsider_outer_join_clauses() and generate_base_implied_equalities()
have been called. If the goal is to not heap_open() the partitions
which are pruned, we can't do that in expand_inherited_rtentry(). One
reason why I think we don't want to heap_open() partition relations is
to avoid relcache bloat because of opened partition relations, which
are ultimately pruned. But please note that according to your patches,
we still need to populate catalog caches to get relkind and reltype
etc.

There are many functions that traverse simple_rel_array[] after it's
created. Most of them assume that the empty entries in that array
correspond to non-simple range entries like join RTEs. But now we are
breaking that assumption. Most of these functions also skip "other"
relations, so that may be OK now, but I am not sure if it's really
going to be fine if we keep empty slots in place of partition
relations. There may be three options here 1. add placeholder
RelOptInfos for partition relations (may be mark those specially) and
mark the ones which get pruned as dummy later. 2. Prune the partitions
before any functions scans simple_rel_array[] or postpone creating
simple_rel_array till pruning. 3. Examine all the current scanners
esp. the ones which will be called before pruning to make sure that
skipping "other" relations is going to be kosher.

>
> Patch augments the existing PartitionedChildRelInfo node, which currently
> holds only the partitioned child rel RT indexes, to carry some more
> information about the partition tree, which includes the information
> returned by RelationGetPartitionDispatchInfo() when it is called from
> expand_inherited_rtentry() (per the proposed patch in [1], we call it to
> be able to add partitions to the query tree in the bound order).
> Actually, since PartitionedChildRelInfo now contains more information
> about the partition tree than it used to before, I thought the struct's
> name is no longer relevant, so renamed it to PartitionRootInfo and renamed
> root->pcinfo_list accordingly to prinfo_list.  That seems okay because we
> only use that node internally.
>
> Then during the add_base_rels_to_query() step, when build_simple_rel()
> builds a RelOptInfo for the root partitioned table, it also initializes
> some newly introduced fields in RelOptInfo from the information contained
> in PartitionRootInfo of the table.  The aforementioned fields are only
> initialized in RelOptInfos of root partitioned tables.  Note that the
> add_base_rels_to_query() step won't add the partition "otherrel"
> RelOptInfos yet (unlike the regular inheritance case, where they are,
> after looking them up in root->append_rel_list).

Partition-wise join requires the partition hierarchy to be expanded
level-by-level keeping in-tact the parent-child relationship between
partitioned table and its partitions. Your patch 

Re: [HACKERS] Partition-wise aggregation/grouping

2017-08-23 Thread Jeevan Chalke
Hi,

Attached is the patch to implement partition-wise aggregation/grouping.

As explained earlier, we produce a full aggregation for each partition when
partition keys are leading group by clauses and then append is performed.
Else we do a partial aggregation on each partition, append them and then add
finalization step over it.

I have observed that the cost estimated for partition-wise aggregation
and the cost for the plans without it are almost the same. However,
execution time shows significant improvement (as explained in my very
first email) with partition-wise aggregates. The planner chooses a plan
according to the costs, and thus most of the time the plan without
partition-wise aggregation is chosen. Hence, to force partition-wise
plans and for the regression runs, I have added a GUC named
partition_wise_agg_cost_factor to adjust the costings.

This feature is only used when enable_partition_wise_agg GUC is set to on.

Here are the details of the patches in the patch-set:

0001 - Refactors sort and hash final grouping paths into separate functions.
Since partition-wise aggregation too builds paths same as that of
create_grouping_paths(), separated path creation for sort and hash agg into
separate functions. These functions later used by main partition-wise
aggregation/grouping patch.

0002 - Passes targetlist to get_number_of_groups().
We need to estimate groups for individual child relations and thus need to
pass targetlist corresponding to the child rel.

0003 - Adds enable_partition_wise_agg and partition_wise_agg_cost_factor
GUCs.

0004 - Implements partition-wise aggregation.

0005 - Adds test-cases.

0006 - postgres_fdw changes which enable pushing aggregation for other upper
relations.


Since this patch is highly dependent on partition-wise join [1], one needs
to
apply all those patches on HEAD (my repository head was at:
66ed3829df959adb47f71d7c903ac59f0670f3e1) before applying these patches in
order.

Suggestions / feedback / inputs ?

[1]
https://www.postgresql.org/message-id/CAFjFpRd9Vqh_=-Ldv-XqWY006d07TJ+VXuhXCbdj=p1juky...@mail.gmail.com


On Tue, Mar 21, 2017 at 12:47 PM, Jeevan Chalke <
jeevan.cha...@enterprisedb.com> wrote:

> Hi all,
>
> Declarative partitioning is supported in PostgreSQL 10 and work is already
> in
> progress to support partition-wise joins. Here is a proposal for
> partition-wise
> aggregation/grouping.  Our initial performance measurement has shown 7
> times
> performance when partitions are on foreign servers and approximately 15%
> when
> partitions are local.
>
> Partition-wise aggregation/grouping computes aggregates for each partition
> separately.  If the group clause contains the partition key, all the rows
> belonging to a given group come from one partition, thus allowing
> aggregates
> to be computed completely for each partition.  Otherwise, partial
> aggregates
> computed for each partition are combined across the partitions to produce
> the
> final aggregates. This technique improves performance because:
> i. When partitions are located on foreign server, we can push down the
> aggregate to the foreign server.
> ii. If hash table for each partition fits in memory, but that for the whole
> relation does not, each partition-wise aggregate can use an in-memory hash
> table.
> iii. Aggregation at the level of partitions can exploit properties of
> partitions like indexes, their storage etc.
>
> Attached an experimental patch for the same based on the partition-wise
> join
> patches posted in [1].
>
> This patch currently implements partition-wise aggregation when group
> clause
> contains the partitioning key.  A query below, involving a partitioned
> table
> with 3 partitions containing 1M rows each, producing total 30 groups showed
> 15% improvement over non-partition-wise aggregation. Same query showed 7
> times
> improvement when the partitions were located on the foreign servers.
>
> Here is the sample plan:
>
> postgres=# set enable_partition_wise_agg to true;
> SET
> postgres=# EXPLAIN ANALYZE SELECT a, count(*) FROM plt1 GROUP BY a;
>                                   QUERY PLAN
> ---------------------------------------------------------------------------
>  Append  (cost=5100.00..61518.90 rows=30 width=12) (actual
> time=324.837..944.804 rows=30 loops=1)
>->  Foreign Scan  (cost=5100.00..20506.30 rows=10 width=12) (actual
> time=324.837..324.838 rows=10 loops=1)
>  Relations: Aggregate on (public.fplt1_p1 plt1)
>->  Foreign Scan  (cost=5100.00..20506.30 rows=10 width=12) (actual
> time=309.954..309.956 rows=10 loops=1)
>  Relations: Aggregate on (public.fplt1_p2 plt1)
>->  Foreign Scan  (cost=5100.00..20506.30 rows=10 width=12) (actual
> time=310.002..310.004 rows=10 loops=1)
>  Relations: Aggregate on (public.fplt1_p3 plt1)
>  Planning time: 0.370 ms
>  Execution time: 945.384 ms
> (9 rows)
>
> postgres=# set enable_partition_wise_agg to 

Re: [HACKERS] PATCH: Batch/pipelining support for libpq

2017-08-23 Thread Andres Freund
Hi,

On 2017-08-10 15:23:06 +1000, Vaishnavi Prabakaran wrote:
> Andres Freund wrote :
> > If you were to send a gigabyte of queries, it'd buffer them all up in
> memory... So some
> >more intelligence is going to be needed.
> 
> Firstly, sorry for the delayed response as I got busy with other
> activities.

No worries - development of new features was slowed down anyway, due to
the v10 feature freeze.


> To buffer up the queries before flushing them to the socket, I think
> "conn->outCount>=65536" is ok to use, as 64k is considered to be safe in
> Windows as per comments in pqSendSome() API.
> 
> Attached the code patch to replace pqFlush calls with pqBatchFlush in the
> asynchronous libpq function calls flow.
> Still pqFlush is used in "PQbatchSyncQueue" and
> "PQexitBatchMode" functions.


> I am failing to see the benefit in allowing the user to set the
> PQBatchAutoFlush(true|false) property. Is it really needed?

I'm inclined not to introduce that for now. If somebody comes up with a
convincing usecase and numbers, we can add it later. Libpq API is set in
stone, so I'd rather not introduce unnecessary stuff...



> +   
> +Much like asynchronous query mode, there is no performance disadvantage 
> to
> +using batching and pipelining. It increases client application complexity
> +and extra caution is required to prevent client/server deadlocks but
> +can sometimes offer considerable performance improvements.
> +   

That's not necessarily true, is it? Unless you count always doing
batches of exactly size 1.


> +   
> +Batching is most useful when the server is distant, i.e. network latency
> +(ping time) is high, and when many small operations are 
> being performed in
> +rapid sequence. There is usually less benefit in using batches when each
> +query takes many multiples of the client/server round-trip time to 
> execute.
> +A 100-statement operation run on a server 300ms round-trip-time away 
> would take
> +30 seconds in network latency alone without batching; with batching it 
> may spend
> +as little as 0.3s waiting for results from the server.
> +   

I'd add a remark that this is frequently beneficial even in cases of
minimal latency - as e.g. shown by the numbers I presented upthread.


> +   
> +Use batches when your application does lots of small
> +INSERT, UPDATE and
> +DELETE operations that can't easily be transformed 
> into
> +operations on sets or into a
> +COPY operation.
> +   

Aren't SELECTs also a major beneficiary of this?


> +   
> +Batching is less useful when information from one operation is required 
> by the
> +client before it knows enough to send the next operation.

s/less/not/


> +   
> +
> + The batch API was introduced in PostgreSQL 10.0, but clients using 
> PostgreSQL 10.0 version of libpq can
> + use batches on server versions 8.4 and newer. Batching works on any 
> server
> + that supports the v3 extended query protocol.
> +
> +   

Where's the 8.4 coming from?


> +   
> +The client uses libpq's asynchronous query functions to dispatch work,
> +marking the end of each batch with PQbatchSyncQueue.
> +And to get results, it uses PQgetResult and
> +PQbatchProcessQueue. It may eventually exit
> +batch mode with PQexitBatchMode once all results are
> +processed.
> +   
> +
> +   
> +
> + It is best to use batch mode with libpq in
> + non-blocking mode. If 
> used in
> + blocking mode it is possible for a client/server deadlock to occur. The
> + client will block trying to send queries to the server, but the server 
> will
> + block trying to send results from queries it has already processed to 
> the
> + client. This only occurs when the client sends enough queries to fill 
> its
> + output buffer and the server's receive buffer before switching to
> + processing input from the server, but it's hard to predict exactly when
> + that'll happen so it's best to always use non-blocking mode.
> +
> +   

Mention that nonblocking only actually helps if send/recv is done as
required, and can essentially require unbounded memory?  We probably
should either document or implement some smarts about when to signal
read/write readyness. Otherwise we e.g. might be receiving tons of
result data without having sent the next query - or the other way round.


> +   
> +Issuing queries
> +
> +
> + After entering batch mode the application dispatches requests
> + using normal asynchronous libpq functions 
> such as 
> + PQsendQueryParams, 
> PQsendPrepare,
> + PQsendQueryPrepared, 
> PQsendDescribePortal,
> + PQsendDescribePrepared.
> + The asynchronous requests are followed by a <link
> + linkend="libpq-PQbatchSyncQueue">PQbatchSyncQueue(conn)</link> call to mark
> + the end of the batch. The client does not need to 
> call
> + PQgetResult immediately after dispatching each
> + operation. Result 

Re: [HACKERS] Explicit relation name in VACUUM VERBOSE log

2017-08-23 Thread Simon Riggs
On 23 August 2017 at 08:18, Michael Paquier  wrote:
> On Wed, Aug 23, 2017 at 10:59 AM, Masahiko Sawada  
> wrote:
>> On Tue, Aug 22, 2017 at 3:23 PM, Simon Riggs  wrote:
>>> e.g.
>>> replace RelationGetRelationName() with
>>> RelationGetOptionallyQualifiedRelationName()
>>> and then control whether we include this new behaviour with
>>> log_qualified_object_names = on | off
>>
>> Is there any case where we don't want to get non-qualified object
>> names? If users want to get the same log message as what they got so
>> far, it would be better to have a GUC that allows us to switch between
>> the existing behavior and the forcibly logging qualified object names.
>
> I can imagine plenty of cases where providing more information is
> valuable, but not really any where it makes more sense to provide less
> information, so -1 for a GUC to control such behavior. I would imagine
> that people are not going to set it anyway. A RangeVar may not set the
> schema_name, so I would suggest to rely on that to decide if the error
> messages show the schema name or name.

We can put the GUC in later if there are objections about backwards
compatibility, so I am OK to leave it out.

> Still we are only talking about
> two messages in the vacuum code paths, which are the ones close to the
> checks where is assigned the OID of the relation with a RangeVar.

The proposal is to change all log messages so we have consistency, not
just VACUUM.

-- 
Simon Riggs    http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: [HACKERS] [PATCH] Push limit to sort through a subquery

2017-08-23 Thread Ashutosh Bapat
On Wed, Aug 23, 2017 at 12:56 PM, Konstantin Knizhnik
 wrote:
>
>
> On 22.08.2017 17:27, Konstantin Knizhnik wrote:
>
>
>
> On 18.08.2017 04:33, Robert Haas wrote:
>
>
> It seems like a somewhat ad-hoc approach; it supposes that we can take any
> query produced by deparseSelectStmtForRel() and stick a LIMIT clause onto
> the very end and all will be well.  Maybe that's not a problematic
> assumption, not sure.  The grammar happens to allow both FOR UPDATE LIMIT n
> and LIMIT n FOR UPDATE even though only the latter syntax is documented.
>
> --
> Robert Haas
> EnterpriseDB: http://www.enterprisedb.com
> The Enterprise PostgreSQL Company
>
>
>
> I am not absolutely sure that it is possible to append any query which can
> be constructed by postgres_fdw for foreign scan with "LIMIT n" clause.
> But I also do not know of an example where it is not possible. As you have
> mentioned, "FOR UPDATE LIMIT n" is currently recognized by Postgres.
>
>
> I have inspected deparseSelectStmtForRel function and now I am sure that
> appending LIMIT to the SQL statement generated by this function will not
> cause any problems.
> It can produce only the following subset of SELECT:
>
> select  FROM  [GROUP BY ... [ HAVING ... ]] [
> ORDER BY ... ] [ FOR UPDATE ... ];
>
>
> The only suspicious clause is FOR UPDATE, but I have checked that "FOR
> UPDATE ... LIMIT n" is really accepted by the Postgres parser.
>

There are two ways we can do this.
1. Implement limit handling in postgresGetForeignUpperPaths() when it
gets UPPERREL_FINAL (or we might want to break this rel into final and
limit rel). We then add paths for limit processing to the final rel
and add corresponding handling in deparseSelectStmtForRel(). If
foreign path happens to be cheaper, LIMIT gets pushed down to the
foreign server. But this approach does not take care of LIMITs passed
down by pass_down_bound(). But it will take care of deparsing the
query correctly and handle OFFSET clause as well.

2. Konstantin's approach takes care of LIMITs passed down by
pass_down_bound(), but "always" pushes down the LIMIT and assumes that
a LIMIT clause can be appended to the query already generated. Both of
these seem sane choices. But then it doesn't push down OFFSET clause
even when it's possible to push it down.

If we could defer deparsing the query to execution time we might be
able to achieve both the targets. It has one more advantage: we could
pass parameters embedded in the query, rather than binding them.

-- 
Best Wishes,
Ashutosh Bapat
EnterpriseDB Corporation
The Postgres Database Company




Re: [HACKERS] Tuple-routing for certain partitioned tables not working as expected

2017-08-23 Thread Etsuro Fujita

On 2017/08/22 9:55, Amit Langote wrote:

On 2017/08/22 1:08, Robert Haas wrote:

On Mon, Aug 21, 2017 at 7:45 AM, Etsuro Fujita
 wrote:

If there are no objections, I'll add this to the open item list for v10.


This seems fairly ad-hoc to me.  I mean, now you have
CheckValidResultRel not being called in just this one case -- but that
bypasses all the checks that function might do, not just this one.  It
so happens that's OK at the moment because CheckCmdReplicaIdentity()
doesn't do anything in the insert case.

I'm somewhat inclined to just view this as a limitation of v10 and fix
it in v11.


That limitation seems too restrictive to me.


If you want to fix it in v10, I think we need a different
approach -- just ripping the CheckValidResultRel checks out entirely
doesn't seem like a good idea to me.


Before 389af951552f, the relkind check that is now performed by
CheckValidResultRel(), used to be done in InitResultRelInfo().  ISTM, it
was separated out so that certain ResultRelInfos could be initialized
without the explicit relkind check, either because it's taken care of
elsewhere or the table in question is *known* to be a valid result
relation.  Maybe, mostly just the former of the two reasons when that
commit went in.

IMO, the latter case applies when initializing ResultRelInfos for
partitions during tuple-routing, because the table types we allow to
become partitions are fairly restricted.


Agreed.


Also, it seems okay to show the error messages that CheckValidResultRel()
shows when the concerned table is *directly* addressed in a query, but the
same error does not seem so user-friendly when emitted for one of the
partitions while tuple-routing is being set up.  IMHO, there should be
"tuple routing" somewhere in the error message shown in that case, even if
it's for the total lack of support for inserts by a FDW.


Agreed, but I'd vote for fixing this in v10 as proposed; I agree that 
just ripping the CheckValidResultRel checks out entirely is not a good 
idea, but that seems OK to me at least as a fix just for v10.


Best regards,
Etsuro Fujita





Re: [HACKERS] Regression stoping PostgreSQL 9.4.13 if a walsender is running

2017-08-23 Thread Andres Freund
On 2017-08-23 09:52:45 +0900, Michael Paquier wrote:
> On Wed, Aug 23, 2017 at 2:28 AM, Marco Nenciarini
>  wrote:
> > I have noticed that after the 9.4.13 release PostgreSQL reliably fails
> > to shutdown with smart and fast method if there is a running walsender.
> >
> > The postmaster continues waiting forever for the walsender termination.
> >
> > It works perfectly with all the other major releases.
> 
> Right. A similar issue has been reported yesterday:
> https://www.postgresql.org/message-id/caa5_dud0o1xym8onozhrepypu-t8nzklzs1pt2jpzp0ns+v...@mail.gmail.com
> Thanks for digging into the origin of the problem, I was lacking of
> time yesterday to look at it.
> 
> > I bisected the issue to commit 1cdc0ab9c180222a94e1ea11402e728688ddc37d
> >
> > After some investigation I discovered that the instruction that sets
> > got_SIGUSR2 was lost during the backpatch in the WalSndLastCycleHandler
> > function.

Yea, that's an annoying screwup (by me) - there were merge conflicts on
every single version, so apparently I screwed up at least one of them
:(. Sorry for that.

Will fix tomorrow.

> That looks correct to me, only REL9_4_STABLE is impacted. This bug
> breaks many use cases like failovers :(

Well, scheduled failovers, that is.

- Andres




Re: [HACKERS] [PATCH] Push limit to sort through a subquery

2017-08-23 Thread Konstantin Knizhnik



On 22.08.2017 17:27, Konstantin Knizhnik wrote:



On 18.08.2017 04:33, Robert Haas wrote:


It seems like a somewhat ad-hoc approach; it supposes that we can 
take any query produced by deparseSelectStmtForRel() and stick a 
LIMIT clause onto the very end and all will be well.  Maybe that's 
not a problematic assumption, not sure.  The grammar happens to allow 
both FOR UPDATE LIMIT n and LIMIT n FOR UPDATE even though only the 
latter syntax is documented.


--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



I am not absolutely sure that it is possible to append any query which 
can be constructed by postgres_fdw for foreign scan with "LIMIT n" clause.
But I also do not know of an example where it is not possible. As you have 
mentioned, "FOR UPDATE LIMIT n" is currently recognized by Postgres.




I have inspected deparseSelectStmtForRel function and now I am sure that 
appending LIMIT to the SQL statement generated by this function will not 
cause any problems.

It can produce only the following subset of SELECT:

select  FROM  [GROUP BY ... [ HAVING ... ]] [ 
ORDER BY ... ] [ FOR UPDATE ... ];



The only suspicious clause is FOR UPDATE, but I have checked that "FOR 
UPDATE ... LIMIT n" is really accepted by the Postgres parser.


--
Konstantin Knizhnik
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company



Re: [HACKERS] Explicit relation name in VACUUM VERBOSE log

2017-08-23 Thread Michael Paquier
On Wed, Aug 23, 2017 at 10:59 AM, Masahiko Sawada  wrote:
> On Tue, Aug 22, 2017 at 3:23 PM, Simon Riggs  wrote:
>> e.g.
>> replace RelationGetRelationName() with
>> RelationGetOptionallyQualifiedRelationName()
>> and then control whether we include this new behaviour with
>> log_qualified_object_names = on | off
>
> Is there any case where we don't want to get non-qualified object
> names? If users want to get the same log message as what they got so
> far, it would be better to have a GUC that allows us to switch between
> the existing behavior and the forcibly logging qualified object names.

I can imagine plenty of cases where providing more information is
valuable, but not really any where it makes more sense to provide less
information, so -1 for a GUC to control such behavior. I would imagine
that people are not going to set it anyway. A RangeVar may not set the
schema_name, so I would suggest to rely on that to decide if the error
messages show the schema name or name. Still we are only talking about
two messages in the vacuum code paths, which are the ones close to the
checks where is assigned the OID of the relation with a RangeVar.
-- 
Michael




Re: [HACKERS] Proposal: global index

2017-08-23 Thread Chris Travers
On Aug 21, 2017 07:47, "Simon Riggs"  wrote:

On 18 August 2017 at 15:40, Alvaro Herrera  wrote:
> Ildar Musin wrote:
>
>> While we've been developing pg_pathman extension one of the most frequent
>> questions we got from our users was about global index support. We cannot
>> provide it within an extension. And I couldn't find any recent discussion
>> about someone implementing it. So I'm thinking about giving it a shot and
>> start working on a patch for postgres.
>>
>> One possible solution is to create an extended version of item pointer
>> which would store relation oid along with block number and position:
>
> I've been playing with the index code in order to allow indirect tuples,
> which are stored in a format different from IndexTupleData.
>
> I've been adding an "InMemoryIndexTuple" (maybe there's a better name)
> which internally has pointers to both IndexTupleData and
> IndirectIndexTupleData, which makes it easier to pass around the index
> tuple in either format.

> It's very easy to add an OID to that struct,
> which then allows to include the OID in either an indirect index tuple
> or a regular one.

If there is a unique index then there is no need for that. Additional
data to the index makes it even bigger and even less useful, so we
need to count that as a further disadvantage of global indexes.

I have a very clear statement from a customer recently that "We will
never use global indexes", based upon their absolute uselessness in
Oracle.


It is worth noting that the only use case I can see where global indexes
fill a functionality gap is with unique indexes, which would allow you to
enforce uniqueness across an inheritance tree where the uniqueness is
orthogonal to any partition key.

I could find large numbers of uses for that. It could also allow
referential integrity to check against a root table rather than force
partition explosion.

Otherwise the following mostly works:

Create table foo_child (like foo including all) inherits (foo);

So the gap this addresses is very real even if it is narrow.
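
To make the gap concrete (names are illustrative): with inheritance-based
partitioning today, each child carries its own copy of the unique index,
so nothing enforces uniqueness across the tree:

```sql
CREATE TABLE foo (id int PRIMARY KEY);
CREATE TABLE foo_child (LIKE foo INCLUDING ALL) INHERITS (foo);

-- Each table has its own index on id; the same key can exist in both.
INSERT INTO foo VALUES (1);
INSERT INTO foo_child VALUES (1);  -- succeeds despite the "PRIMARY KEY"
```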


> Then, wherever we're using IndexTupleData in the index AM code, we would
> replace it with InMemoryIndexTuple.  This should satisfy both your use
> case and mine.

Global indexes are a subset of the indirect-index use case, but luckily
not the only one.

--
Simon Riggs                   http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: [HACKERS] Quorum commit for multiple synchronous replication.

2017-08-23 Thread Michael Paquier
On Wed, Aug 23, 2017 at 3:04 PM, Masahiko Sawada  wrote:
> It seems to me that we should discuss whether we want to keep syntax
> such as 'a,b' and 'N(a,b)' before deciding whether to make quorum
> commit the default behavior of the 'N(a,b)' syntax. If we plan to
> remove such syntax in a future release we can live with the current
> code and should document it.

The parsing code of repl_gram.y represents zero maintenance at the
end, so let me suggest just living with what we have and doing nothing.
Keeping things as they are is not bad either. By changing the default,
people may have their failover flows silently broken. So if we change
the default we will perhaps make some users happy, but I think we are
also going to make some people angry. Silent failover issues are no fun
to debug.

At the end of the day, we could just add one sentence to the docs
saying that the use of ANY and FIRST is encouraged over the older
grammar because they are clearer to understand.
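
For reference, the three spellings being compared (standby names are
illustrative):

```
# PostgreSQL 10 explicit forms:
synchronous_standby_names = 'FIRST 2 (nodeA, nodeB, nodeC)'  # priority-based
synchronous_standby_names = 'ANY 2 (nodeA, nodeB, nodeC)'    # quorum-based
# Legacy 9.6-style form, interpreted as FIRST 2 (...):
synchronous_standby_names = '2 (nodeA, nodeB, nodeC)'
```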
-- 
Michael




Re: [HACKERS] Quorum commit for multiple synchronous replication.

2017-08-23 Thread Masahiko Sawada
On Sat, Aug 19, 2017 at 12:28 AM, Robert Haas  wrote:
> On Thu, Aug 17, 2017 at 1:13 AM, Michael Paquier
>  wrote:
>> I had in mind a ereport(WARNING) in create_syncrep_config. Extra
>> thoughts/opinions welcome.
>
> I think for v10 we should just document the behavior we've got; I
> think it's too late to be whacking things around now.
>
> For v11, we could emit a warning if we plan to deprecate and
> eventually remove the syntax without ANY/FIRST, but let's not do:
>
> WARNING:  what you did is ok, but you might have wanted to do something else
>
> First of all, whether or not that can properly be called a warning is
> highly debatable.  Also, if you do that sort of thing to your spouse
> and/or children, they call it "nagging".  I don't think users will
> like it any more than family members do.
>

It seems to me that we should discuss whether we want to keep syntax
such as 'a,b' and 'N(a,b)' before deciding whether to make quorum
commit the default behavior of the 'N(a,b)' syntax. If we plan to
remove such syntax in a future release we can live with the current
code and should document it.

Regards,

--
Masahiko Sawada
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center

