Re: UCT (Re: pgsql: Update time zone data files to tzdata release 2019a.)
> "Tom" == Tom Lane writes:

 Tom> I'm dubious that relying on zone[1970].tab would improve matters
 Tom> substantially; it would fix some cases, but I don't think it would
 Tom> fix all of them. Resolving all ambiguous zone-name choices is not
 Tom> the charter of those files.

Allowing zone matching by _content_ (as we do) rather than by name does
not seem to be supported in any respect whatever by the upstream data;
we've always been basically on our own with that.

[tl;dr for what follows: my proposal reduces the number of discrepancies
from 91 (see previously posted list) to 16 or 7, none of which are new]

So here are the ambiguities that are not resolvable at all:

Africa/Abidjan -> GMT

This happens because the Africa/Abidjan zone is literally just GMT, even
down to the abbreviation, and we don't want to guess Africa/Abidjan for
all GMT installs.

America/Argentina/Rio_Gallegos -> America/Argentina/Ushuaia
Asia/Kuala_Lumpur -> Asia/Singapore

These are cases where zone1970.tab, despite its name, includes
distinctly-named zones which are distinct only for times in the far past
(before 1920 or 1905 respectively). They are otherwise identical by
content. We therefore end up choosing arbitrarily.

In addition, the following collection of random islands have timezones
which lack local abbreviation names, recent offset changes, or DST, and
are therefore indistinguishable by content from fixed-offset zones like
Etc/GMT+2:

Etc/GMT-4  == Indian/Mahe Indian/Reunion
Etc/GMT-7  == Indian/Christmas
Etc/GMT-9  == Pacific/Palau
Etc/GMT-10 == Pacific/Port_Moresby
Etc/GMT-11 == Pacific/Guadalcanal
Etc/GMT-12 == Pacific/Funafuti Pacific/Tarawa Pacific/Wake Pacific/Wallis
Etc/GMT+10 == Pacific/Tahiti
Etc/GMT+9  == Pacific/Gambier
Etc/GMT+2  == Atlantic/South_Georgia

We currently map all of these to the Etc/GMT+x names on the grounds of
length. If we chose to prefer zone.tab names over Etc/* names for all of
these, we'd be ambiguous only for a handful of relatively small islands.
-- Andrew (irc:RhodiumToad)
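The tie-break Andrew describes — content-identical zones resolved today by shortest name (which is how the Etc/GMT+x aliases win), or alternatively by preferring geographic zone.tab names over Etc/* aliases — can be sketched as follows. This is a hypothetical illustration in Python; `pick_zone_name` and its signature are invented for the example and are not PostgreSQL code.

```python
def pick_zone_name(candidates, prefer_geographic=False):
    """Choose one canonical name among content-identical zone names.

    By default the shortest name wins (ties broken alphabetically),
    mirroring the current rule that maps equivalent zones to Etc/GMT+x.
    With prefer_geographic=True, zone.tab-style geographic names beat
    the Etc/* aliases, as proposed in the message above.
    """
    geographic = [n for n in candidates if not n.startswith("Etc/")]
    pool = geographic if (prefer_geographic and geographic) else candidates
    # shortest name first; remaining ties broken alphabetically
    return min(pool, key=lambda n: (len(n), n))
```

For instance, among the content-identical trio Etc/GMT-4, Indian/Mahe, and Indian/Reunion, the default rule picks the Etc alias, while the geographic preference would pick Indian/Mahe — still arbitrary between the two islands, which is exactly the residual ambiguity the message points out.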
RE: Run-time pruning for ModifyTable
Hi, Amit > If I understand the details of [1] correctly, ModifyTable will no longer > have N subplans for N result relations as there are today. So, it doesn't > make sense for ModifyTable to contain PartitionedRelPruneInfos and for > ExecInitModifyTable/ExecModifyTable > to have to perform initial and execution-time pruning, respectively. Does this mean that the generic plan will not have N subplans for N result relations? I thought [1] would make creating generic plans faster, but is this correct? regards, kato sho > -Original Message- > From: Amit Langote [mailto:amitlangot...@gmail.com] > Sent: Wednesday, July 3, 2019 5:41 PM > To: David Rowley > Cc: PostgreSQL Hackers > Subject: Re: Run-time pruning for ModifyTable > > On Wed, Jul 3, 2019 at 4:34 PM David Rowley > wrote: > > On Wed, 3 Jul 2019 at 17:27, Amit Langote > wrote: > > > A cursory look at the patch suggests that most of its changes will > > > be for nothing if [1] materializes. What do you think about that? > > > > Yeah, I had this in mind when writing the patch, but kept going > > anyway. I think it's only really a small patch of this patch that > > would get wiped out with that change. Just the planner.c stuff. > > Everything else is still required, as far as I understand. > > If I understand the details of [1] correctly, ModifyTable will no longer > have N subplans for N result relations as there are today. So, it doesn't > make sense for ModifyTable to contain PartitionedRelPruneInfos and for > ExecInitModifyTable/ExecModifyTable > to have to perform initial and execution-time pruning, respectively. > As I said, bottom expansion of target inheritance will mean pruning (both > plan-time and run-time) will occur at the bottom too, so the run-time > pruning capabilities of nodes that already have it will be used for UPDATE > and DELETE too. > > Thanks, > Amit >
Re: Tid scan improvements
On Mon, 1 Jul 2019 at 23:29, Thomas Munro wrote: > The new CF is here. I'm going through poking threads for submissions > that don't apply, but it sounds like this needs more than a rebase? > Perhaps this belongs in the next CF? 0001 and 0004 of v7 got pushed in PG12. The CFbot will be trying to apply 0001 still, but on testing 0002, no joy there either. It would be good to see this back in PG13. For now, I'll mark it as waiting on author. -- David Rowley http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services
Re: TopoSort() fix
I'll try to figure out some scenarios to do the test. A parallel process group is needed for the test. Actually I was trying to do some testing against the locking mechanism. I happened to see this issue. On Wed, Jul 3, 2019 at 9:38 PM Robert Haas wrote: > On Tue, Jul 2, 2019 at 11:23 AM Tom Lane wrote: > > Rui Hai Jiang writes: > > > I think I found an issue in the TopoSort() function. > > > > This indeed seems like a live bug introduced by a1c1af2a. > > Robert? > > This is pretty thoroughly swapped out of my head, but it looks like > that analysis might be correct. > > Is it practical to come up with a test case that demonstrates the problem? > > -- > Robert Haas > EnterpriseDB: http://www.enterprisedb.com > The Enterprise PostgreSQL Company >
Re: Where is SSPI auth username determined for TAP tests?
On Wed, Jul 03, 2019 at 09:53:14AM -0400, Tom Lane wrote: > I haven't checked that this actually works, but it looks plausible, > and I agree it's simpler/easier. Thanks, committed. While testing on Windows, I have been trapped by the fact that IPC::Run mishandles double quotes, causing the tests to fail for the environment variable part because of a mismatching pg_hba.conf entry. The difference is that we run pg_regress --config-auth using IPC::Run::run on HEAD, but the patch switches to system(). So I have finished by removing the double-quote handling from the restore user name, which makes the whole test suite more consistent. The patch also has the advantage of removing from pg_ident.conf the entry related to the OS user running the scripts, which makes the environment more restricted by default. -- Michael signature.asc Description: PGP signature
Re: Memory-Bounded Hash Aggregation
On Wed, 2019-07-03 at 02:17 +0200, Tomas Vondra wrote:
> What does "partitioned hash strategy" do? It's probably explained in one
> of the historical discussions, but I'm not sure which one. I assume it
> simply hashes the group keys and uses that to partition the data, and then
> passing it to hash aggregate.

Yes. When spilling, it is cheap to partition on the hash value at the
same time, which dramatically reduces the need to spill multiple times.

Previous discussions:

> Unfortunately the second link does not work :-(

It's supposed to be:
https://www.postgresql.org/message-id/CAGTBQpa__-NP7%3DkKwze_enkqw18vodRxKkOmNhxAPzqkruc-8g%40mail.gmail.com

> I'm not going to block Approach 1, although I'd really like to see
> something that helps with array_agg.

I have a WIP patch that I just posted. It doesn't yet work with
ARRAY_AGG, but I think it can be made to work by evicting the entire
hash table, serializing the transition states, and then later combining
them.

> Aren't all three approaches a way to "fix" hash aggregate? In any case,
> it's certainly reasonable to make incremental changes. The question is
> whether "approach 1" is a sensible step towards some form of "approach 3"

Disk-based hashing certainly seems like a reasonable algorithm on paper
that has some potential advantages over sorting. It certainly seems
sensible to me that we explore the disk-based hashing strategy first,
and then we would at least know what we are missing (if anything) by
going with the hybrid approach later.

There's also a fair amount of design space to explore in the hybrid
strategy. That could take a while to converge, especially if we don't
have anything in place to compare against.

> > * It means we have a hash table and sort running concurrently, each
> > using memory. Andres said this might not be a problem[3], but I'm
> > not convinced that the problem is zero. If you use small work_mem
> > for the write phase of sorting, you'll end up with a lot of runs to
> > merge later and that has some kind of cost.
>
> Why would we need to do both concurrently? I thought we'd empty the hash
> table before doing the sort, no?

So you are saying we spill the tuples into a tuplestore, then feed the
tuplestore through a tuplesort? Seems inefficient, but I guess we can.

> Do we actually need to handle that case? How many such aggregates are
> there? I think it's OK to just ignore that case (and keep doing what we do
> now), and require serial/deserial functions for anything better.

Punting on a few cases is fine with me, if the user has a way to fix it.

Regards,
Jeff Davis
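The "partition on the hash value while spilling" idea discussed above can be sketched in miniature. This is a toy Python model, not the WIP patch: `hashagg_with_spill` and its parameters are invented names, `max_groups` stands in for work_mem, and a real implementation would use different hash bits at each recursion level.

```python
def hashagg_with_spill(rows, key, agg, init, max_groups, npartitions=4):
    """Toy memory-bounded hash aggregation.

    Groups live in an in-memory table until max_groups is reached (a
    stand-in for work_mem). Tuples whose group is not already in the
    table are spilled, partitioned by the group key's hash, so that each
    spill partition can later be aggregated independently -- which is
    what reduces the need to spill the same data multiple times.
    """
    table = {}
    spill = [[] for _ in range(npartitions)]
    for row in rows:
        k = key(row)
        if k in table:
            table[k] = agg(table[k], row)        # group already in memory
        elif len(table) < max_groups:
            table[k] = agg(init, row)            # room for a new group
        else:
            spill[hash(k) % npartitions].append(row)  # route by hash
    results = dict(table)
    for part in spill:
        if part:
            # each partition is aggregated on its own; recursion handles
            # partitions that are themselves too large
            results.update(hashagg_with_spill(part, key, agg, init,
                                              max_groups, npartitions))
    return results
```

Note that a group is never split between the in-memory table and the spill files: once a key is in the table, all its later tuples are aggregated in place, and a key that was spilled keeps spilling to the same partition.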
Re: Cleaning up and speeding up string functions
On Sun, 26 May 2019 at 04:50, Tom Lane wrote: > > David Rowley writes: > > Here's a small patch series aimed to both clean up a few misuses of > > string functions and also to optimise a few things along the way. > > > 0001: Converts various calls that use appendPQExpBuffer() that really > > should use appendPQExpBufferStr(). If there's no formatting then > > using the former function is a waste of effort. > > > 0002: Similar to 0001 but replaces various appendStringInfo calls with > > appendStringInfoString calls. > > Agreed on these; we've applied such transformations before. I've pushed 0001 and 0002. Instead of having 0004, how about the attached? Most of the calls won't improve much performance-wise since they're so cheap anyway, but there is xmlconcat(); I imagine that should see some speedup. -- David Rowley http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services use_binary_string_info_when_len_is_known.patch Description: Binary data
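The point of 0001/0002 — that routing a constant string through the printf-style append function wastes a trip through the format machinery — can be modeled in a few lines. This is a toy Python analogy, not libpq code: `append_fmt` stands in for appendStringInfo/appendPQExpBuffer and `append_str` for appendStringInfoString/appendPQExpBufferStr. It also shows a second hazard of the format path: a literal '%' in the text gets misinterpreted.

```python
def append_fmt(buf, fmt, *args):
    # stand-in for appendStringInfo/appendPQExpBuffer: the format string
    # is always scanned for % escapes, even when there are no arguments
    buf.append(fmt % args)

def append_str(buf, s):
    # stand-in for appendStringInfoString/appendPQExpBufferStr: a plain
    # append, with no format scan at all
    buf.append(s)
```

With no formatting needed, `append_str` does strictly less work, and text such as "100% done" passes through untouched, where `append_fmt` would trip over the stray '%'.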
Re: GiST VACUUM
On Thu, Apr 4, 2019 at 8:15 AM Heikki Linnakangas wrote: > I think we should fix this in a similar manner in B-tree, too, but that > can be done separately. For B-tree, we need to worry about > backwards-compatibility, but that seems simple enough; we just need to > continue to understand old deleted pages, where the deletion XID is > stored in the page opaque field. What Postgres versions will the B-Tree fix end up targeting? Sounds like you plan to backpatch all the way? -- Peter Geoghegan
Re: Memory-Bounded Hash Aggregation
On Mon, 2019-07-01 at 12:13 -0700, Jeff Davis wrote: > This is for design review. I have a patch (WIP) for Approach 1, and > if > this discussion starts to converge on that approach I will polish and > post it. WIP patch attached (based on 9a81c9fa); targeting September CF. Not intended for detailed review yet, but it seems to work in enough cases (including grouping sets and JIT) to be a good proof-of-concept for the algorithm and its complexity. Initial performance numbers put it at 2X slower than sort for grouping 10M distinct integers. There are quite a few optimizations I haven't tried yet and quite a few tunables I haven't tuned yet, so hopefully I can close the gap a bit for the small-groups case. I will offer more details soon when I have more confidence in the numbers. It does not attempt to spill ARRAY_AGG at all yet. Regards, Jeff Davis diff --git a/doc/src/sgml/config.sgml b/doc/src/sgml/config.sgml index 84341a30e5..9f978e5a90 100644 --- a/doc/src/sgml/config.sgml +++ b/doc/src/sgml/config.sgml @@ -1702,6 +1702,23 @@ include_dir 'conf.d' + + hashagg_mem_overflow (boolean) + + hashagg_mem_overflow configuration parameter + + + + + If hash aggregation exceeds work_mem at query + execution time, and hashagg_mem_overflow is set + to on, continue consuming more memory rather than + performing disk-based hash aggregation. The default + is off. + + + + max_stack_depth (integer) @@ -4354,6 +4371,24 @@ ANY num_sync ( + enable_hashagg_spill (boolean) + + enable_hashagg_spill configuration parameter + + + + +Enables or disables the query planner's use of hashed aggregation plan +types when the memory usage is expected to +exceed work_mem. This only affects the planner +choice; actual behavior at execution time is dictated by +. The default +is on. 
+ + + + enable_hashjoin (boolean) diff --git a/src/backend/commands/explain.c b/src/backend/commands/explain.c index 92969636b7..6d6481a75f 100644 --- a/src/backend/commands/explain.c +++ b/src/backend/commands/explain.c @@ -102,6 +102,7 @@ static void show_tablesample(TableSampleClause *tsc, PlanState *planstate, List *ancestors, ExplainState *es); static void show_sort_info(SortState *sortstate, ExplainState *es); static void show_hash_info(HashState *hashstate, ExplainState *es); +static void show_hashagg_info(AggState *hashstate, ExplainState *es); static void show_tidbitmap_info(BitmapHeapScanState *planstate, ExplainState *es); static void show_instrumentation_count(const char *qlabel, int which, @@ -1826,6 +1827,7 @@ ExplainNode(PlanState *planstate, List *ancestors, case T_Agg: show_agg_keys(castNode(AggState, planstate), ancestors, es); show_upper_qual(plan->qual, "Filter", planstate, ancestors, es); + show_hashagg_info((AggState *) planstate, es); if (plan->qual) show_instrumentation_count("Rows Removed by Filter", 1, planstate, es); @@ -2715,6 +2717,55 @@ show_hash_info(HashState *hashstate, ExplainState *es) } } +/* + * Show information on hash aggregate buckets and batches + */ +static void +show_hashagg_info(AggState *aggstate, ExplainState *es) +{ + Agg *agg = (Agg *)aggstate->ss.ps.plan; + long memPeakKb = (aggstate->hash_mem_peak + 1023) / 1024; + long diskKb= (aggstate->hash_disk_used + 1023) / 1024; + + + Assert(IsA(aggstate, AggState)); + + if (agg->aggstrategy != AGG_HASHED && + agg->aggstrategy != AGG_MIXED) + return; + + if (es->format == EXPLAIN_FORMAT_TEXT) + { + appendStringInfoSpaces(es->str, es->indent * 2); + appendStringInfo( + es->str, + "Memory Usage: %ldkB", + memPeakKb); + + if (aggstate->hash_batches_used > 0) + { + appendStringInfo( +es->str, +" Batches: %d Disk Usage:%ldkB", +aggstate->hash_batches_used, diskKb); + } + + appendStringInfo( + es->str, + "\n"); + } + else + { + ExplainPropertyInteger("Peak Memory Usage", "kB", 
memPeakKb, es); + if (aggstate->hash_batches_used > 0) + { + ExplainPropertyInteger("HashAgg Batches", NULL, + aggstate->hash_batches_used, es); + ExplainPropertyInteger("Disk Usage", "kB", diskKb, es); + } + } +} + /* * If it's EXPLAIN ANALYZE, show exact/lossy pages for a BitmapHeapScan node */ diff --git a/src/backend/executor/execExprInterp.c b/src/backend/executor/execExprInterp.c index 66a67c72b2..19e1127627 100644 --- a/src/backend/executor/execExprInterp.c +++ b/src/backend/executor/execExprInterp.c @@ -1570,7 +1570,7 @@ ExecInterpExpr(ExprState *state, ExprContext *econtext, bool *isnull) [op->d.agg_init_trans.transno]; /* If transValue has not yet been initialized, do so now. */ - if (pergroup->noTransValue) + if (pergroup != NULL && pergroup->noTransValue) {
Re: Replacing the EDH SKIP primes
On Wed, Jul 03, 2019 at 08:56:42PM +0200, Daniel Gustafsson wrote: > Agreed, I’ve updated the patch with a comment on this formulated such that it > should stand the test of time even as OpenSSL changes etc. I'd rather we mention the warning issue explicitly, so as people don't get surprised, like that for example: * This is the 2048-bit DH parameter from RFC 3526. The generation of the * prime is specified in RFC 2412, which also discusses the design choice * of the generator. Note that when loaded with OpenSSL this causes * DH_check() to fail with DH_NOT_SUITABLE_GENERATOR, where leaking * a bit is preferred. Now this makes an OpenSSL-specific issue pop up within a section of the code where we want to make things more generic with SSL, so your simpler version has good arguments as well. I have just rechecked the shape of the key, and we have an exact match. -- Michael diff --git a/src/include/libpq/libpq-be.h b/src/include/libpq/libpq-be.h index 96415a9c8b..93581acb26 100644 --- a/src/include/libpq/libpq-be.h +++ b/src/include/libpq/libpq-be.h @@ -206,19 +206,20 @@ typedef struct Port * Hardcoded DH parameters, used in ephemeral DH keying. (See also * README.SSL for more details on EDH.) * - * If you want to create your own hardcoded DH parameters - * for fun and profit, review "Assigned Number for SKIP - * Protocols" (http://www.skip-vpn.org/spec/numbers.html) - * for suggestions. + * This is the 2048-bit DH parameter from RFC 3526. The generation of the + * prime is specified in RFC 2412, which also discusses the design choice + * of the generator. Note that when loaded with OpenSSL this causes + * DH_check() to fail with DH_NOT_SUITABLE_GENERATOR, where leaking + * a bit is preferred.
*/ #define FILE_DH2048 \ "-BEGIN DH PARAMETERS-\n\ -MIIBCAKCAQEA9kJXtwh/CBdyorrWqULzBej5UxE5T7bxbrlLOCDaAadWoxTpj0BV\n\ -89AHxstDqZSt90xkhkn4DIO9ZekX1KHTUPj1WV/cdlJPPT2N286Z4VeSWc39uK50\n\ -T8X8dryDxUcwYc58yWb/Ffm7/ZFexwGq01uejaClcjrUGvC/RgBYK+X0iP1YTknb\n\ -zSC0neSRBzZrM2w4DUUdD3yIsxx8Wy2O9vPJI8BD8KVbGI2Ou1WMuF040zT9fBdX\n\ -Q6MdGGzeMyEstSr/POGxKUAYEY18hKcKctaGxAMZyAcpesqVDNmWn6vQClCbAkbT\n\ -CD1mpF1Bn5x8vYlLIhkmuquiXsNV6TILOwIBAg==\n\ +MIIBCAKCAQEA///JD9qiIWjCNMTGYouA3BzRKQJOCIpnzHQCC76mOxOb\n\ +IlFKCHmONATd75UZs806QxswKwpt8l8UN0/hNW1tUcJF5IW1dmJefsb0TELppjft\n\ +awv/XLb0Brft7jhr+1qJn6WunyQRfEsf5kkoZlHs5Fs9wgB8uKFjvwWY2kg2HFXT\n\ +mmkWP6j9JM9fg2VdI9yjrZYcYvNWIIVSu57VKQdwlpZtZww1Tkq8mATxdGwIyhgh\n\ +fDKQXkYuNs474553LBgOhgObJ4Oi7Aeij7XFXfBvTFLJ3ivL9pVYFxg5lUl86pVq\n\ +5RXSJhiY+gUQFXKOWoqsqmj//wIBAg==\n\ -END DH PARAMETERS-\n" /* signature.asc Description: PGP signature
Re: Index Skip Scan
On Thu, Jul 04, 2019 at 10:06:11AM +1200, David Rowley wrote: > On Thu, 4 Jul 2019 at 09:02, James Coleman wrote: > > I think that example is the opposite direction of what David (Rowley) > > is saying. Unique on {a, b} implies unique on {a, b, c} while you're > > correct that the inverse doesn't hold. > > > > Unique on {a, b} also implies unique on {b, a} as well as on {b, a, c} > > and {c, a, b} and {c, b, a} and {a, c, b}, which is what makes this > > different from pathkeys. > > Yeah, exactly. A superset of the unique columns is still unique. Thanks for clarifying! Best, David. -- David Fetter http://fetter.org/ Phone: +1 415 235 3778 Remember to vote! Consider donating to Postgres: http://www.postgresql.org/about/donate
Weird intermittent ECPG test failures
Observe the following buildfarm failures: https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=conchuela=2019-07-03%2013%3A33%3A59 https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=dory=2019-04-30%2014%3A45%3A26 https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=dory=2019-06-02%2015%3A15%3A26 https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=dory=2019-06-05%2006%3A15%3A26 https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=dory=2019-06-26%2002%3A00%3A26 https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=dory=2019-07-03%2022%3A15%3A27 https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=hamerkop=2019-04-10%2011%3A00%3A09 https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=hamerkop=2019-04-14%2012%3A42%3A31 https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=hamerkop=2019-05-01%2011%3A00%3A08 https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=hamerkop=2019-06-29%2011%3A33%3A25 (These represent all but a couple of the ECPG-Check failures in the last 90 days, excluding a spate around 31-May that were caused by an ill-considered code change.) In every one of these cases, the visible symptom is that one of the ECPG test programs produced precisely nothing on stdout. stderr seems okay though. (Not all of these cases have nonempty expected stderr, but enough do that we can say it's not a case of the program just failing entirely.) We've been seeing this sort of thing for a *long* time, although my recollection is that in the past it's almost always been the thread-thread case that failed, so that I'd supposed that there was some problem with that particular test. It's now clear that that's not true though, and there's seemingly some generic issue with the tests or test infrastructure. The other thing that I thought was invariably true was that it was a Windows-only thing. But here we have conchuela failing in the exact same way, and it is, um, not Windows. Anyone have a theory what might be going on here? regards, tom lane
Re: POC: converting Lists into arrays
On Thu, 4 Jul 2019 at 06:15, Tom Lane wrote: > > David Rowley writes: > > I've now read over the entire patch and have noted down the following: > > Thanks for the review! Attached is a v8 responding to most of your > comments. Anything not quoted below I just accepted. Thanks for the speedy turnaround. I've looked at v8, as far as a diff between the two patches and I'm happy. I've marked as ready for committer. -- David Rowley http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services
Re: Index Skip Scan
On Thu, 4 Jul 2019 at 09:02, James Coleman wrote: > > On Wed, Jul 3, 2019 at 3:46 PM David Fetter wrote: > > > > On Wed, Jul 03, 2019 at 12:27:09AM +1200, David Rowley wrote: > > > On Tue, 2 Jul 2019 at 21:00, Thomas Munro wrote: > > > > > > The more I think about these UniqueKeys, the more I think they need to > > > be a separate concept to PathKeys. For example, UniqueKeys: { x, y } > > > should be equivalent to { y, x }, but with PathKeys, that's not the > > > case, since the order of each key matters. UniqueKeys equivalent > > > version of pathkeys_contained_in() would not care about the order of > > > individual keys, it would say things like, { a, b, c } is contained in > > > { b, a }, since if the path is unique on columns { b, a } then it must > > > also be unique on { a, b, c }. > > > > Is that actually true, though? I can see unique {a, b, c} => unique > > {a, b}, but for example: > > > > a | b | c > > --|---|-- > > 1 | 2 | 3 > > 1 | 2 | 4 > > 1 | 2 | 5 > > > > is unique on {a, b, c} but not on {a, b}, at least as I understand the > > way "unique" is used here, which is 3 distinct {a, b, c}, but only one > > {a, b}. > > > > Or I could be missing something obvious, and in that case, please > > ignore. > > I think that example is the opposite direction of what David (Rowley) > is saying. Unique on {a, b} implies unique on {a, b, c} while you're > correct that the inverse doesn't hold. > > Unique on {a, b} also implies unique on {b, a} as well as on {b, a, c} > and {c, a, b} and {c, b, a} and {a, c, b}, which is what makes this > different from pathkeys. Yeah, exactly. A superset of the unique columns is still unique. -- David Rowley http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services
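The conclusion of this subthread — uniqueness on a set of columns implies uniqueness on any superset, in any order, unlike PathKeys where order matters — is just set containment. A minimal sketch (hypothetical Python, not the planner code; `uniquekeys_imply` is an invented name):

```python
def uniquekeys_imply(known_unique, other):
    """A path known to be unique on known_unique is also unique on any
    superset of those columns, regardless of column order -- the
    UniqueKeys analogue of pathkeys_contained_in, but order-insensitive."""
    return set(known_unique) <= set(other)
```

So unique on {a, b} implies unique on {b, a}, {b, a, c}, {c, b, a}, and so on, while the reverse implication (from the larger set down to the smaller) does not hold — which is David Fetter's counterexample.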
Re: Avoiding hash join batch explosions with extreme skew and weird stats
On Tue, Jun 18, 2019 at 3:24 PM Melanie Plageman wrote: > > These questions will probably make a lot more sense with corresponding > code, so I will follow up with the second version of the state machine > patch once I finish it. > > I have changed the state machine and resolved the questions I had raised in the previous email. This seems to work for the parallel and non-parallel cases. I have not yet rewritten the unmatched outer tuple status as a bitmap in a spill file (for ease of debugging). Before doing that, I wanted to ask what a desirable fallback condition would be. In this patch, fallback to hashloop join happens only when inserting tuples into the hashtable after batch 0 when inserting another tuple from the batch file would exceed work_mem. This means you can't increase nbatches, which, I would think is undesirable. I thought a bit about when fallback should happen. So, let's say that we would like to fallback to hashloop join when we have increased nbatches X times. At that point, since we do not want to fall back to hashloop join for all batches, we have to make a decision. After increasing nbatches the Xth time, do we then fall back for all batches for which inserting inner tuples exceeds work_mem? Do we use this strategy but work_mem + some fudge factor? Or, do we instead try to determine if data skew led us to increase nbatches both times and then determine which batch, given new nbatches, contains that data, set fallback to true only for that batch, and let all other batches use the existing logic (with no fallback option) unless they contain a value which leads to increasing nbatches X number of times? -- Melanie Plageman From 2d6fec7d2bac90a41d4ec88ad5ac2011562a14a1 Mon Sep 17 00:00:00 2001 From: Melanie Plageman Date: Mon, 10 Jun 2019 10:54:42 -0700 Subject: [PATCH v2] hashloop fallback First part is to "chunk" the inner file into arbitrary partitions of work_mem size This chunks inner file and makes it so that the offset is along tuple bounds. 
Note that this makes it impossible to increase nbatches during the loading of batches after initial hashtable creation In preparation for doing this chunking, separate advance batch and load batch. advance batch only if page offset is reset to 0, then load that part of the batch Second part was to: implement outer tuple batch rewinding per chunk of inner batch Would be a simple rewind and replay of outer side for each chunk of inner if it weren't for LOJ. Because we need to wait to emit NULL-extended tuples for LOJ until after all chunks of inner have been processed. To do this, make a list with an entry for each outer tuple and keep track of its match status. Also, keep track of its offset so that we can access the file at that offset in case the tuples are not processed in order (like in parallel case) For non-hashloop fallback scenario, this list should not be constructed and unmatched outer tuples should be emitted as they are encountered. --- src/backend/executor/nodeHashjoin.c | 379 src/include/executor/hashjoin.h | 10 + src/include/nodes/execnodes.h | 12 + src/test/regress/expected/adaptive_hj.out | 402 ++ src/test/regress/parallel_schedule| 2 +- src/test/regress/serial_schedule | 1 + src/test/regress/sql/adaptive_hj.sql | 39 +++ 7 files changed, 770 insertions(+), 75 deletions(-) create mode 100644 src/test/regress/expected/adaptive_hj.out create mode 100644 src/test/regress/sql/adaptive_hj.sql diff --git a/src/backend/executor/nodeHashjoin.c b/src/backend/executor/nodeHashjoin.c index 8484a287e7..e46b453a9b 100644 --- a/src/backend/executor/nodeHashjoin.c +++ b/src/backend/executor/nodeHashjoin.c @@ -124,9 +124,11 @@ #define HJ_BUILD_HASHTABLE 1 #define HJ_NEED_NEW_OUTER 2 #define HJ_SCAN_BUCKET 3 -#define HJ_FILL_OUTER_TUPLE 4 -#define HJ_FILL_INNER_TUPLES 5 -#define HJ_NEED_NEW_BATCH 6 +#define HJ_FILL_INNER_TUPLES4 +#define HJ_NEED_NEW_BATCH 5 +#define HJ_NEED_NEW_INNER_CHUNK 6 +#define HJ_ADAPTIVE_EMIT_UNMATCHED_OUTER_INIT 7 +#define 
HJ_ADAPTIVE_EMIT_UNMATCHED_OUTER 8 /* Returns true if doing null-fill on outer relation */ #define HJ_FILL_OUTER(hjstate) ((hjstate)->hj_NullInnerTupleSlot != NULL) @@ -143,10 +145,15 @@ static TupleTableSlot *ExecHashJoinGetSavedTuple(HashJoinState *hjstate, BufFile *file, uint32 *hashvalue, TupleTableSlot *tupleSlot); -static bool ExecHashJoinNewBatch(HashJoinState *hjstate); +static bool ExecHashJoinAdvanceBatch(HashJoinState *hjstate); static bool ExecParallelHashJoinNewBatch(HashJoinState *hjstate); static void ExecParallelHashJoinPartitionOuter(HashJoinState *node); +static bool LoadInner(HashJoinState *hjstate); +static TupleTableSlot *ExecHashJoinGetOuterTupleAtOffset(HashJoinState *hjstate, off_t offset); +static void rewindOuter(BufFile *bufFile); +static TupleTableSlot * +emitUnmatchedOuterTuple(ExprState
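The core of the hashloop fallback Melanie describes — process the inner side one work_mem-sized chunk at a time, rescan the outer side per chunk, and track outer-tuple match status so NULL-extended rows for a left outer join are emitted only after the last chunk — can be sketched in miniature. This is a toy Python model with invented names, not the patch's C code; the `matched` bytearray plays the role of the proposed per-outer-tuple match bitmap.

```python
def hashloop_left_join(outer, inner, key, chunk_size):
    """Toy chunked ("hashloop") left outer join.

    The inner relation is loaded one chunk at a time (chunk_size stands
    in for work_mem), the outer side is rescanned for each chunk, and a
    bitmap records which outer tuples found a match in *any* chunk.
    Only after all chunks are done can unmatched outer tuples safely be
    NULL-extended -- emitting them per chunk would produce duplicates.
    """
    matched = bytearray((len(outer) + 7) // 8)  # one bit per outer tuple
    out = []
    for start in range(0, len(inner), chunk_size):
        chunk = {}                               # hash table over this chunk
        for row in inner[start:start + chunk_size]:
            chunk.setdefault(key(row), []).append(row)
        for i, orow in enumerate(outer):         # rescan outer per chunk
            for irow in chunk.get(key(orow), []):
                matched[i // 8] |= 1 << (i % 8)  # remember the match
                out.append((orow, irow))
    for i, orow in enumerate(outer):             # emit NULL-extended rows
        if not matched[i // 8] & (1 << (i % 8)):
            out.append((orow, None))
    return out
```

In the non-fallback case this bookkeeping is unnecessary, since the whole inner batch fits in one hash table and unmatched outer tuples can be emitted as they are encountered — which matches the patch's note about constructing the match-status list only for the fallback path.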
Re: [PATCH][PROPOSAL] Add enum releation option type
On 2019-Jul-03, Nikolay Shaplov wrote: > In a message of Monday, 1 July 2019, 14:06:28 MSK, Alvaro Herrera wrote: > > + GistBufferingValidMsg = gettext_noop("Valid values are \"on\", \"off\", > > and \"auto\"."); > > > > I think that's the most contentious point on this patch at this point > > (though I may be misremembering). > > I actually removed "and" from the list and let it be a simple comma-separated > list > > ERROR: invalid value for "check_option" option > > DETAIL: Valid values are: "local", "cascaded". > > Now we can translate the left part and substitute the list into the right part Yes, I saw that, and you know what? Nobody said they liked this idea. > You do not see it that way? I think this is easier to sell if you change that. -- Álvaro Herrerahttps://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Re: Index Skip Scan
On Wed, Jul 3, 2019 at 3:46 PM David Fetter wrote: > > On Wed, Jul 03, 2019 at 12:27:09AM +1200, David Rowley wrote: > > On Tue, 2 Jul 2019 at 21:00, Thomas Munro wrote: > > > > The more I think about these UniqueKeys, the more I think they need to > > be a separate concept to PathKeys. For example, UniqueKeys: { x, y } > > should be equivalent to { y, x }, but with PathKeys, that's not the > > case, since the order of each key matters. UniqueKeys equivalent > > version of pathkeys_contained_in() would not care about the order of > > individual keys, it would say things like, { a, b, c } is contained in > > { b, a }, since if the path is unique on columns { b, a } then it must > > also be unique on { a, b, c }. > > Is that actually true, though? I can see unique {a, b, c} => unique > {a, b}, but for example: > > a | b | c > --|---|-- > 1 | 2 | 3 > 1 | 2 | 4 > 1 | 2 | 5 > > is unique on {a, b, c} but not on {a, b}, at least as I understand the > way "unique" is used here, which is 3 distinct {a, b, c}, but only one > {a, b}. > > Or I could be missing something obvious, and in that case, please > ignore. I think that example is the opposite direction of what David (Rowley) is saying. Unique on {a, b} implies unique on {a, b, c} while you're correct that the inverse doesn't hold. Unique on {a, b} also implies unique on {b, a} as well as on {b, a, c} and {c, a, b} and {c, b, a} and {a, c, b}, which is what makes this different from pathkeys. James Coleman
Re: SQL/JSON path issues/questions
Hi! On Wed, Jul 3, 2019 at 5:27 PM Liudmila Mantrova wrote: > > I have rechecked the standard and I agree that we should use "filter > expression" whenever possible. > "A filter expression must be enclosed in parentheses..." looks like an > oversight, so I fixed it. As for what's actually enclosed, I believe we > can still use the word "condition" here as it's easy to understand and > is already used in our docs, e.g. in description of the WHERE clause > that serves a similar purpose. > The new version of the patch fixes the terminology, tweaks the examples, > and provides some grammar and style fixes in the jsonpath-related chapters. It looks good to me. But this sentence looks a bit too complicated. "It can be followed by one or more accessor operators to define the JSON element on a lower nesting level by which to filter the result." Could we phrase this as following? "In order to filter the result by values lying on lower nesting level, @ operator can be followed by one or more accessor operators." -- Alexander Korotkov Postgres Professional: http://www.postgrespro.com The Russian Postgres Company
Re: Custom Scan coverage.
On 2019-Jul-03, didier wrote: > Currently there's 0 coverage of CustomScan code path in core. Yeah :-( > What about adding a no-op custom_scan test in src/test/modules/ ? Or > is it out of pg perimeter and each extension using it should take care > of themselves? > > If there's an interest I'm willing to write and propose such a test suite. We'd certainly like it if there was coverage of that infrastructure. -- Álvaro Herrerahttps://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Custom Scan coverage.
Hi, Currently there's 0 coverage of the CustomScan code path in core. What about adding a no-op custom_scan test in src/test/modules/ ? Or is it out of pg's perimeter and each extension using it should take care of themselves? If there's an interest I'm willing to write and propose such a test suite. Regards Didier.
Re: Index Skip Scan
On Wed, Jul 03, 2019 at 12:27:09AM +1200, David Rowley wrote:
> On Tue, 2 Jul 2019 at 21:00, Thomas Munro wrote:
>
> The more I think about these UniqueKeys, the more I think they need to
> be a separate concept to PathKeys. For example, UniqueKeys: { x, y }
> should be equivalent to { y, x }, but with PathKeys, that's not the
> case, since the order of each key matters. UniqueKeys equivalent
> version of pathkeys_contained_in() would not care about the order of
> individual keys, it would say things like, { a, b, c } is contained in
> { b, a }, since if the path is unique on columns { b, a } then it must
> also be unique on { a, b, c }.

Is that actually true, though? I can see unique {a, b, c} => unique
{a, b}, but for example:

a | b | c
--|---|--
1 | 2 | 3
1 | 2 | 4
1 | 2 | 5

is unique on {a, b, c} but not on {a, b}, at least as I understand the
way "unique" is used here, which is 3 distinct {a, b, c}, but only one
{a, b}. Or I could be missing something obvious, and in that case,
please ignore.

Best,
David.

--
David Fetter http://fetter.org/
Phone: +1 415 235 3778

Remember to vote!
Consider donating to Postgres: http://www.postgresql.org/about/donate
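[Editorial aside, not part of the thread: the direction of the implication above is easy to check mechanically. The sketch below, in Python rather than PostgreSQL's C, counts distinct projections of the example rows; `is_unique` and the row literals are illustrative names, not anything from the patch.]

```python
# A quick way to check "uniqueness on a column set": a relation is unique on
# a set of columns iff projecting each row onto those columns yields no
# duplicates. The rows are the ones from the example in the mail above.

def is_unique(rows, cols):
    """True if no two rows share the same values for the given columns."""
    projected = [tuple(row[c] for c in cols) for row in rows]
    return len(set(projected)) == len(projected)

rows = [
    {"a": 1, "b": 2, "c": 3},
    {"a": 1, "b": 2, "c": 4},
    {"a": 1, "b": 2, "c": 5},
]

# Unique on {a, b, c}, but NOT on the subset {a, b}: uniqueness on a column
# set carries over to supersets of columns, not to subsets.
print(is_unique(rows, ["a", "b", "c"]))  # True
print(is_unique(rows, ["a", "b"]))       # False
```

This matches the table in the mail: three distinct {a, b, c} tuples, but only one distinct {a, b} pair.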
Re: POC: converting Lists into arrays
David Rowley writes:
> I also did some more benchmarking of the patch. ...
> Which makes the patched version 2.2% faster than master on that run.

BTW, further on the subject of performance --- I'm aware of at least
these topics for follow-on patches:

* Fix places that are maintaining arrays parallel to Lists for
access-speed reasons (at least simple_rte_array, append_rel_array,
es_range_table_array).

* Look at places using lcons/list_delete_first to maintain FIFO lists.
The patch makes these O(N^2) for long lists. If we can reverse the list
order and use lappend/list_truncate instead, it'd be better. Possibly in
some places the list ordering is critical enough to make this
impractical, but I suspect it's an easy win in most.

* Rationalize places that are using combinations of list_copy and
list_concat, probably by inventing an additional list-concatenation
primitive that modifies neither input.

* Adjust API for list_qsort(), as discussed, to save indirections and
avoid constructing an intermediate pointer array. I also seem to recall
one place in the planner that's avoiding using list_qsort by manually
flattening a list into an array, qsort'ing, and rebuilding the list :-(

I don't think that any one of these fixes would move the needle very
much on "typical" simple workloads, but it's reasonable to hope that in
aggregate they'd make for a noticeable improvement. In the meantime, I'm
gratified that the initial patch at least doesn't seem to have lost any
ground.

regards, tom lane
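[Editorial aside, not part of the thread: the second bullet's "reverse the list order" trick can be sketched in a few lines. This is Python for brevity, not the C List code; both function names are made up for illustration.]

```python
# Once Lists are array-backed, prepending (lcons) shifts every existing
# element, so building an N-element FIFO by prepending costs O(N^2). The
# trick discussed above: keep the list in reversed order and append instead
# (O(1) amortized); consumers then take elements from the tail, analogous to
# lappend/list_truncate.

def build_with_prepend(items):
    result = []
    for x in items:
        result.insert(0, x)       # O(n) per call: shifts everything right
    return result

def build_reversed_with_append(items):
    result = []
    for x in items:
        result.append(x)          # O(1) amortized per call
    return result                 # logically the reverse of the above

items = [1, 2, 3, 4]
assert build_with_prepend(items) == list(reversed(build_reversed_with_append(items)))
```

The catch Tom notes applies here too: this only works when callers can tolerate (or compensate for) the reversed physical order.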
Re: Replacing the EDH SKIP primes
> On 3 Jul 2019, at 12:11, Michael Paquier wrote: > It would be nice to add a comment on that though, perhaps in > libpq-be.h where the key is defined. Agreed, I’ve updated the patch with a comment on this formulated such that it should stand the test of time even as OpenSSL changes etc. cheers ./daniel skip_primes-v2.patch Description: Binary data
Re: POC: converting Lists into arrays
On 2019-Jul-03, Tom Lane wrote:

> David Rowley writes:
> > 10. I wonder if we can reduce a bit of pain for extension authors by
> > back patching a macro that wraps up a lnext() call adding a dummy
> > argument for the List.
>
> I was wondering about a variant of that yesterday; specifically,
> I thought of naming the new 2-argument function list_next() not lnext().
> Then we could add "#define list_next(l,c) lnext(c)" in the back branches.
> This would simplify back-patching code that used the new definition, and
> it might save some effort for extension authors who are trying to maintain
> cross-branch code. On the other hand, it's more keystrokes forevermore,
> and I'm not entirely convinced that code that's using lnext() isn't likely
> to need other adjustments anyway. So I didn't pull the trigger on that,
> but if people like the idea I'd be okay with doing it like that.

I was thinking about this issue too earlier today, and my conclusion is
that the way you have it in v7 is fine, because lnext() callsites are
not that numerous, so the cost to third-party code authors is not that
high; the other arguments trump this consideration IMO. I say this as
someone who curses every time he has to backpatch things across the
heap_open / table_open change -- but there are a lot more calls of that.

> Indeed. I don't want to expend a lot of effort keeping it in sync
> with master over a long period, either. Opinions?

Yeah, let's get it done soon.

--
Álvaro Herrera    https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Re: pg_waldump and PREPARE
On Wed, Jul 3, 2019 at 5:21 PM Fujii Masao wrote: > > On Tue, Jul 2, 2019 at 7:16 PM Julien Rouhaud wrote: > > > > On Fri, Apr 26, 2019 at 5:38 AM Michael Paquier wrote: > > > > > > On Thu, Apr 25, 2019 at 03:08:36PM -0400, Alvaro Herrera wrote: > > > > On 2019-Apr-26, Fujii Masao wrote: > > > >> I did that arrangement because the format of PREPARE TRANSACTION > > > >> record, > > > >> i.e., that struct, also needs to be accessed in backend and frontend. > > > >> But, of course, if there is smarter way, I'm happy to adopt that! > > > > > > > > I don't know. I spent some time staring at the code involved, and it > > > > seems it'd be possible to improve just a little bit on cleanliness > > > > grounds, with a lot of effort, but not enough practical value. > > > > > > Describing those records is something we should do. There are other > > > parsing routines in xactdesc.c for commit and abort records, so having > > > that extra routine for prepare at the same place does not sound > > > strange to me. > > > > > > +typedef xl_xact_prepare TwoPhaseFileHeader; > > > I find this mapping implementation a bit lazy, and your > > > newly-introduced xl_xact_prepare does not count for all the contents > > > of the actual WAL record for PREPARE TRANSACTION. Wouldn't it be > > > better to put all the contents of the record in the same structure, > > > and not only the 2PC header information? > > > > This patch doesn't apply anymore, could you send a rebase? > > Yes, attached is the updated version of the patch. Thanks! So the patch compiles and works as intended. I don't have much to say about it, it all looks good to me, since the concerns about xactdesc.c aren't worth the trouble. I'm not sure that I understand Michael's objection though, as xl_xact_prepare is not a new definition and AFAICS it couldn't contain the records anyway. So I'll let him say if he has further objections or if it's ready for committer!
Re: Index Skip Scan
Hi Thomas and David, Thanks for the feedback ! On 7/2/19 8:27 AM, David Rowley wrote: On Tue, 2 Jul 2019 at 21:00, Thomas Munro wrote: I took this for a quick spin today. The DISTINCT ON support is nice and I think it will be very useful. I've signed up to review it and will have more to say later. But today I had a couple of thoughts after looking into how src/backend/optimizer/plan/planagg.c works and wondering how to do some more skipping tricks with the existing machinery. 1. SELECT COUNT(DISTINCT i) FROM t could benefit from this. (Or AVG(DISTINCT ...) or any other aggregate). Right now you get a seq scan, with the sort/unique logic inside the Aggregate node. If you write SELECT COUNT(*) FROM (SELECT DISTINCT i FROM t) ss then you get a skip scan that is much faster in good cases. I suppose you could have a process_distinct_aggregates() in planagg.c that recognises queries of the right form and generates extra paths a bit like build_minmax_path() does. I think it's probably better to consider that in the grouping planner proper instead. I'm not sure. I think to make that happen we'd need to do a bit of an overhaul in nodeAgg.c to allow it to make use of presorted results instead of having the code blindly sort rows for each aggregate that has a DISTINCT or ORDER BY. The planner would also then need to start requesting paths with pathkeys that suit the aggregate and also probably dictate the order the AggRefs should be evaluated to allow all AggRefs to be evaluated that can be for each sort order. Once that part is done then the aggregates could then also request paths with certain "UniqueKeys" (a feature I mentioned in [1]), however we'd need to be pretty careful with that one since there may be other parts of the query that require that all rows are fed in, not just 1 row per value of "i", e.g SELECT COUNT(DISTINCT i) FROM t WHERE z > 0; can't just feed through 1 row for each "i" value, since we need only the ones that have "z > 0". 
Getting the first part of this solved is much more important than making skip scans work here, I'd say. I think we need to be able to walk before we can run with DISTINCT / ORDER BY aggs. I agree that the above is outside of scope for the first patch -- I think the goal should be the simple use-cases for IndexScan and IndexOnlyScan. Maybe we should expand [1] with possible cases, so we don't lose track of the ideas. 2. SELECT i, MIN(j) FROM t GROUP BY i could benefit from this if you're allowed to go forwards. Same for SELECT i, MAX(j) FROM t GROUP BY i if you're allowed to go backwards. Those queries are equivalent to SELECT DISTINCT ON (i) i, j FROM t ORDER BY i [DESC], j [DESC] (though as Floris noted, the backwards version gives the wrong answers with v20). That does seem like a much more specific thing applicable only to MIN and MAX, and I think preprocess_minmax_aggregates() could be taught to handle that sort of query, building an index only scan path with skip scan in build_minmax_path(). For the MIN query you just need a path with Pathkeys: { i ASC, j ASC }, UniqueKeys: { i, j }, doing the MAX query you just need j DESC. Ok. The more I think about these UniqueKeys, the more I think they need to be a separate concept to PathKeys. For example, UniqueKeys: { x, y } should be equivalent to { y, x }, but with PathKeys, that's not the case, since the order of each key matters. UniqueKeys equivalent version of pathkeys_contained_in() would not care about the order of individual keys, it would say things like, { a, b, c } is contained in { b, a }, since if the path is unique on columns { b, a } then it must also be unique on { a, b, c }. I'm looking at this, and will keep this in mind. Thanks ! [1] https://wiki.postgresql.org/wiki/Loose_indexscan Best regards, Jesper
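[Editorial aside, not part of the thread: for readers new to the feature, the "skip" in index skip scan can be illustrated with a toy model. The sketch below uses Python's bisect on a sorted list as a stand-in for a btree; the real patch re-descends the btree inside the index AM, so this is only the idea, not the implementation.]

```python
# Rough sketch of the loose-index-scan idea: instead of walking every index
# entry, jump past the whole run of duplicates of the current leading key to
# reach the next distinct value. On a btree the "jump" is a fresh descent;
# here it is a binary search over the sorted list.

from bisect import bisect_right

def distinct_leading_keys(sorted_keys):
    """Return the distinct values of a sorted list, skipping duplicate runs."""
    result = []
    i = 0
    while i < len(sorted_keys):
        value = sorted_keys[i]
        result.append(value)
        # Skip to the first entry greater than the current value.
        i = bisect_right(sorted_keys, value, i)
    return result

print(distinct_leading_keys([1, 1, 1, 2, 2, 5, 5, 5, 5, 9]))  # [1, 2, 5, 9]
```

With few distinct values over many rows, this does O(distinct * log n) work instead of O(n), which is why DISTINCT and DISTINCT ON benefit.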
Re: proposal: pg_restore --convert-to-text
Alvaro Herrera wrote:

> So you suggest that it should be
>
> pg_restore: error: one of -d/--dbname, -f/--file and -l/--list must be
> specified
> ?

I'd suggest this minimal fix:

 (int argc, char **argv)
 	/* Complain if neither -f nor -d was specified (except if dumping TOC) */
 	if (!opts->dbname && !opts->filename && !opts->tocSummary)
 	{
-		pg_log_error("one of -d/--dbname and -f/--file must be specified");
+		pg_log_error("-d/--dbname or -f/--file or -l/--list must be specified");
+		fprintf(stderr, _("Try \"%s --help\" for more information.\n"),
+				progname);
 		exit_nicely(1);
 	}

> So you suggest that it should be
> pg_restore [connection-option...] { -d | -f | -l } [option...] [filename]
> ?

Looking at the other commands, it doesn't seem that we use this form for
any of those that require at least one argument, for instance:

===
$ ./pg_basebackup
pg_basebackup: error: no target directory specified
Try "pg_basebackup --help" for more information.

$ ./pg_basebackup --help
pg_basebackup takes a base backup of a running PostgreSQL server.

Usage:
  pg_basebackup [OPTION]...

$ ./pg_checksums
pg_checksums: error: no data directory specified
Try "pg_checksums --help" for more information.

$ ./pg_checksums --help
pg_checksums enables, disables, or verifies data checksums in a PostgreSQL
database cluster.

Usage:
  pg_checksums [OPTION]... [DATADIR]

$ ./pg_rewind
pg_rewind: error: no source specified (--source-pgdata or --source-server)
Try "pg_rewind --help" for more information.

$ ./pg_rewind --help
pg_rewind resynchronizes a PostgreSQL cluster with another copy of the cluster.

Usage:
  pg_rewind [OPTION]...
===

"pg_restore [OPTION]... [FILE]" appears to be consistent with those, even
with the new condition that no option is an error, so it's probably okay.

> Output target options:
>   -l, --list             print summarized TOC of the archive
>   -d, --dbname=NAME      connect to database name
>   -f, --file=FILENAME    output file name (- for stdout)

That makes sense.
I would put this section first, before "General options". Best regards, -- Daniel Vérité PostgreSQL-powered mailer: http://www.manitou-mail.org Twitter: @DanielVerite
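[Editorial aside, not part of the thread: the validation rule under discussion is simply "require at least one of three output targets, else print the error plus the --help hint". A minimal Python stand-in for the C snippet above, with illustrative names:]

```python
# Sketch of the "at least one of -d, -f, -l" check being discussed. The
# function name and parameters are hypothetical; the real check lives in
# pg_restore's C main() and calls exit_nicely(1) instead of returning.

import sys

def check_output_target(dbname=None, filename=None, toc_summary=False):
    """Return True if some output target was given; print the error otherwise."""
    if not (dbname or filename or toc_summary):
        print("pg_restore: error: -d/--dbname or -f/--file or -l/--list "
              "must be specified", file=sys.stderr)
        print('Try "pg_restore --help" for more information.', file=sys.stderr)
        return False
    return True
```

The point of the thread: the error text should name all three accepted options, and follow the `Try "--help"` convention the other tools use.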
Re: Optimize partial TOAST decompression
Paul Ramsey wrote on Tue, Jul 2, 2019 at 10:46 PM:
> This looks good to me. A little commentary around why
> pglz_maximum_compressed_size() returns a universally correct answer
> (there's no way the compressed size can ever be larger than this
> because...) would be nice for peasants like myself.
>
> If you're looking to continue down this code line in your next patch,
> the next TODO item is a little more involved: a user-land (ala
> PG_DETOAST_DATUM) iterator API for access of TOAST datums would allow
> the optimization of searching of large objects like JSONB types, and
> so on, where the thing you are looking for is not at a known location
> in the object. So, things like looking for a particular substring in a
> string, or looking for a particular key in a JSONB. "Iterate until you
> find the thing." would allow optimization of some code lines that
> currently require full decompression of the objects.
>
> P.

Thanks for your comment. I've updated the patch. As for the iterator
API, I have actually implemented a de-TOAST iterator[0], and I'm looking
for more application scenarios for it while continuing to refine it. Any
comments would be much appreciated.

Best Regards, Binguo Bao.
[0] https://www.postgresql.org/message-id/flat/cal-ogks_onzpc9m9bxpcztmofwulcfkyecekiagxzwrl8kx...@mail.gmail.com

From 2e4e2838937ec6fa1404fe529e7ed303e391d1b2 Mon Sep 17 00:00:00 2001
From: BBG
Date: Sun, 2 Jun 2019 19:18:46 +0800
Subject: [PATCH] Optimize partial TOAST decompression

---
 src/backend/access/heap/tuptoaster.c | 24 +---
 src/common/pg_lzcompress.c           | 26 ++
 src/include/common/pg_lzcompress.h   |  1 +
 3 files changed, 44 insertions(+), 7 deletions(-)

diff --git a/src/backend/access/heap/tuptoaster.c b/src/backend/access/heap/tuptoaster.c
index 55d6e91..684f1b2 100644
--- a/src/backend/access/heap/tuptoaster.c
+++ b/src/backend/access/heap/tuptoaster.c
@@ -266,6 +266,7 @@ heap_tuple_untoast_attr_slice(struct varlena *attr,
 	if (VARATT_IS_EXTERNAL_ONDISK(attr))
 	{
 		struct varatt_external toast_pointer;
+		int32		max_size;

 		VARATT_EXTERNAL_GET_POINTER(toast_pointer, attr);

@@ -273,8 +274,13 @@ heap_tuple_untoast_attr_slice(struct varlena *attr,
 		if (!VARATT_EXTERNAL_IS_COMPRESSED(toast_pointer))
 			return toast_fetch_datum_slice(attr, sliceoffset, slicelength);

-		/* fetch it back (compressed marker will get set automatically) */
-		preslice = toast_fetch_datum(attr);
+		max_size = pglz_maximum_compressed_size(sliceoffset + slicelength,
+												toast_pointer.va_rawsize);
+		/*
+		 * Be sure to get enough compressed slice
+		 * and compressed marker will get set automatically
+		 */
+		preslice = toast_fetch_datum_slice(attr, 0, max_size);
 	}
 	else if (VARATT_IS_EXTERNAL_INDIRECT(attr))
 	{
@@ -2031,7 +2037,8 @@ toast_fetch_datum(struct varlena *attr)
  * Reconstruct a segment of a Datum from the chunks saved
  * in the toast relation
  *
- * Note that this function only supports non-compressed external datums.
+ * Note that this function supports non-compressed external datums
+ * and compressed external datum slices at the start of the object.
  * --
  */
 static struct varlena *
@@ -2072,10 +2079,9 @@ toast_fetch_datum_slice(struct varlena *attr, int32 sliceoffset, int32 length)
 	VARATT_EXTERNAL_GET_POINTER(toast_pointer, attr);

 	/*
-	 * It's nonsense to fetch slices of a compressed datum -- this isn't lo_*
-	 * we can't return a compressed datum which is meaningful to toast later
+	 * It's meaningful to fetch slices at the start of a compressed datum.
 	 */
-	Assert(!VARATT_EXTERNAL_IS_COMPRESSED(toast_pointer));
+	Assert(!VARATT_EXTERNAL_IS_COMPRESSED(toast_pointer) || 0 == sliceoffset);

 	attrsize = toast_pointer.va_extsize;
 	totalchunks = ((attrsize - 1) / TOAST_MAX_CHUNK_SIZE) + 1;
@@ -2091,7 +2097,11 @@ toast_fetch_datum_slice(struct varlena *attr, int32 sliceoffset, int32 length)

 	result = (struct varlena *) palloc(length + VARHDRSZ);

-	SET_VARSIZE(result, length + VARHDRSZ);
+	if (VARATT_EXTERNAL_IS_COMPRESSED(toast_pointer)) {
+		SET_VARSIZE_COMPRESSED(result, length + VARHDRSZ);
+	} else {
+		SET_VARSIZE(result, length + VARHDRSZ);
+	}

 	if (length == 0)
 		return result;			/* Can save a lot of work at this point! */
diff --git a/src/common/pg_lzcompress.c b/src/common/pg_lzcompress.c
index 988b398..80ed17a 100644
--- a/src/common/pg_lzcompress.c
+++ b/src/common/pg_lzcompress.c
@@ -771,3 +771,29 @@ pglz_decompress(const char *source, int32 slen, char *dest,
 	 */
 	return (char *) dp - dest;
 }
+
+
+
+/* --
+ * pglz_max_compressed_size -
+ *
+ *	Calculate the maximum size of the compressed slice corresponding to the
+ *	raw slice. Return the maximum size, or raw size if maximum size is larger
+ *	than raw size.
+ * --
+ */
+int32
+pglz_maximum_compressed_size(int32 raw_slice_size, int32 raw_size)
+{
+	int32		result;
+
+	/*
+	 * Use int64 to prevent overflow during calculation.
+	 */
+	result =
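[Editorial aside, not part of the thread: the patch above is cut off mid-function, so here is a hedged sketch of the bound it is computing, based on my reading of the thread rather than the actual C code. The assumption used: pglz emits one control bit per item, i.e. one control byte per 8 bytes of output, so in the worst case (pure literals, no back-references) producing N raw bytes consumes about N * 9/8 compressed bytes.]

```python
# Illustrative Python model of pglz_maximum_compressed_size: the smallest
# prefix of the compressed data guaranteed to contain at least
# raw_slice_size bytes of decompressed output. In the worst case every byte
# is a literal, costing 1 data byte plus 1/8 of a control byte. The answer
# never needs to exceed the size of the whole compressed datum.

def pglz_maximum_compressed_size(raw_slice_size: int, total_size: int) -> int:
    # Python ints don't overflow; the C patch computes this in int64 for the
    # same reason its comment mentions overflow.
    worst_case = (raw_slice_size * 9 + 8) // 8
    return min(worst_case, total_size)

# Fetching the first 100 raw bytes needs at most 113 compressed bytes:
print(pglz_maximum_compressed_size(100, 1000))  # 113
```

This is why fetching only a prefix of the compressed chunks (as the patch does in toast_fetch_datum_slice) is always safe for slices starting at offset 0.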
Re: [PATCH v5] Show detailed table persistence in \dt+
Daniel Gustafsson writes: >> On 2 Jul 2019, at 22:35, Alvaro Herrera wrote: >> Anyway I'm not objecting to the patch -- I agree that we're already not >> testing translatability and that this patch shouldn't be forced to start >> doing it. > I forgot to add that to my previous email, the patch as it stands in v8 looks > good to me. I’ve missed having this on many occasions. OK, pushed. For the record, I did verify that the translatability logic worked by adding some bogus entries to psql/po/es.po and seeing that the display changed to match. regards, tom lane
Re: Increasing default value for effective_io_concurrency?
On Wed, Jul 3, 2019 at 11:24 AM Tomas Vondra wrote: > Maybe. And it would probably work for the systems I used for benchmarks. > > It however assumes two things: (a) the storage system actually has > spindles and (b) you know how many spindles there are. Which is becoming > less and less safe these days - flash storage becomes pretty common, and > even when there are spindles they are often hidden behind the veil of > virtualization in a SAN, or something. Yeah, that's true. > I wonder if we might provide something like pg_test_prefetch which would > measure performance with different values, similarly to pg_test_fsync. That's not a bad idea, but I'm not sure if the results that we got in a synthetic test - presumably unloaded - would be a good guide to what to use in a production situation. Maybe it would; I'm just not sure. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Re: Increasing default value for effective_io_concurrency?
On Wed, Jul 03, 2019 at 11:04:59AM -0400, Robert Haas wrote: On Mon, Jul 1, 2019 at 7:32 PM Andres Freund wrote: On 2019-06-29 22:15:19 +0200, Tomas Vondra wrote: > I think we should consider changing the effective_io_concurrency default > value, i.e. the guc that determines how many pages we try to prefetch in > a couple of places (the most important being Bitmap Heap Scan). Maybe we need improve the way it's used / implemented instead - it seems just too hard to determine the correct setting as currently implemented. Perhaps the translation from effective_io_concurrency to a prefetch distance, which is found in the slightly-misnamed ComputeIoConcurrency function, should be changed. The comments therein say: * Experimental results show that both of these formulas aren't aggressive * enough, but we don't really have any better proposals. Perhaps we could test experimentally what works well with N spindles and then fit a formula to that curve and stick it in here, so that our tuning is based on practice rather than theory. I'm not sure if that approach is adequate or not. It just seems like something to try. Maybe. And it would probably work for the systems I used for benchmarks. It however assumes two things: (a) the storage system actually has spindles and (b) you know how many spindles there are. Which is becoming less and less safe these days - flash storage becomes pretty common, and even when there are spindles they are often hidden behind the veil of virtualization in a SAN, or something. I wonder if we might provide something like pg_test_prefetch which would measure performance with different values, similarly to pg_test_fsync. regards -- Tomas Vondra http://www.2ndQuadrant.com PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Re: pg_waldump and PREPARE
On Tue, Jul 2, 2019 at 7:16 PM Julien Rouhaud wrote: > > On Fri, Apr 26, 2019 at 5:38 AM Michael Paquier wrote: > > > > On Thu, Apr 25, 2019 at 03:08:36PM -0400, Alvaro Herrera wrote: > > > On 2019-Apr-26, Fujii Masao wrote: > > >> I did that arrangement because the format of PREPARE TRANSACTION record, > > >> i.e., that struct, also needs to be accessed in backend and frontend. > > >> But, of course, if there is smarter way, I'm happy to adopt that! > > > > > > I don't know. I spent some time staring at the code involved, and it > > > seems it'd be possible to improve just a little bit on cleanliness > > > grounds, with a lot of effort, but not enough practical value. > > > > Describing those records is something we should do. There are other > > parsing routines in xactdesc.c for commit and abort records, so having > > that extra routine for prepare at the same place does not sound > > strange to me. > > > > +typedef xl_xact_prepare TwoPhaseFileHeader; > > I find this mapping implementation a bit lazy, and your > > newly-introduced xl_xact_prepare does not count for all the contents > > of the actual WAL record for PREPARE TRANSACTION. Wouldn't it be > > better to put all the contents of the record in the same structure, > > and not only the 2PC header information? > > This patch doesn't apply anymore, could you send a rebase? Yes, attached is the updated version of the patch. Regards, -- Fujii Masao prepare-xact-desc_v2.patch Description: Binary data
Re: Increasing default value for effective_io_concurrency?
On Mon, Jul 1, 2019 at 7:32 PM Andres Freund wrote: > On 2019-06-29 22:15:19 +0200, Tomas Vondra wrote: > > I think we should consider changing the effective_io_concurrency default > > value, i.e. the guc that determines how many pages we try to prefetch in > > a couple of places (the most important being Bitmap Heap Scan). > > Maybe we need improve the way it's used / implemented instead - it seems > just too hard to determine the correct setting as currently implemented. Perhaps the translation from effective_io_concurrency to a prefetch distance, which is found in the slightly-misnamed ComputeIoConcurrency function, should be changed. The comments therein say: * Experimental results show that both of these formulas aren't aggressive * enough, but we don't really have any better proposals. Perhaps we could test experimentally what works well with N spindles and then fit a formula to that curve and stick it in here, so that our tuning is based on practice rather than theory. I'm not sure if that approach is adequate or not. It just seems like something to try. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
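[Editorial aside, not part of the thread: for readers without the source handy, the heuristic in ComputeIoConcurrency can be sketched as follows. This is a paraphrase of my reading of the code comments, not the exact C code: with n spindles, the target prefetch distance is roughly n times the n-th harmonic number, aiming to keep each drive busy.]

```python
# Sketch of the spindle-count heuristic discussed above (assumed formula:
# distance ~= n * H(n), where H(n) is the n-th harmonic number). This is the
# "theory" side that the experimental comment says is not aggressive enough.

def prefetch_distance(num_spindles: int) -> float:
    harmonic = sum(1.0 / i for i in range(1, num_spindles + 1))
    return num_spindles * harmonic

# 1 spindle -> 1.0; the distance grows a bit faster than linearly:
print(round(prefetch_distance(1), 2))   # 1.0
print(round(prefetch_distance(4), 2))   # 8.33
```

Curve-fitting against measurements, as Robert suggests, would replace this closed form with an empirically tuned one.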
Re: [PATCH v4] Add \warn to psql
Hello Tom,

>> The point is that there would be at least *one* TAP tests so that many
>> other features of psql, although not all, can be tested. [...]

> Yeah, but the point I was trying to make is that that's mostly down to
> laziness.

Not always. I agree that using a TAP test when another simpler option is
available is not a good move. However, in the current state, as soon as
there is some variation a test is removed and coverage is lost, but they
could be kept if the check could be against a regexp.

> I see no reason that we couldn't be covering a lot of these features in
> src/test/regress/sql/psql.sql, with far less overhead. The interactive
> aspects of psql can't be tested that way ... but since this patch
> doesn't actually provide any way to test those, it's not much of a
> proof-of-concept.

The PoC is checking against a set of regexps instead of expecting an
exact output. Ok, it does not solve all possible test scenarios, that is
life.

> IOW, the blocking factor here is not "does src/bin/psql/t/ exist", it's
> "has somebody written a test that moves the coverage needle
> meaningfully". I'm not big on adding a bunch of overhead first and just
> hoping somebody will do something to make it worthwhile later.

I do intend to add coverage once a psql TAP test is available, as I have
done with pgbench. Ok, some of the changes are still in the long CF
queue, but at least pgbench coverage is around 90%. I also intend to
direct submitted patches to use the TAP infra when appropriate, instead
of "no tests, too bad".

--
Fabien.
Re: SQL/JSON path issues/questions
On 6/28/19 6:47 AM, Alexander Korotkov wrote: On Tue, Jun 25, 2019 at 6:38 PM Liudmila Mantrova wrote: Thank you for the catch! Please see the modified version of patch 0004 attached. I tried to review and revise the part related to filters, but I failed because I don't understand the notions used in the documentation. What is the difference between filter expression and filter condition? I can guess that filter expression contains question mark, parentheses and filter condition inside. But this sentence is in contradiction with my guess: "A filter expression must be enclosed in parentheses and preceded by a question mark". So, filter expression is inside the parentheses. Then what is filter condition? The same? Each filter expression can provide one or more filters that are applied to the result of the path evaluation. So additionally to filter condition and filter expression we introduce the notion of just filter. What is it? Could we make it without introduction of new notion? -- Alexander Korotkov Postgres Professional: http://www.postgrespro.com The Russian Postgres Company Hi, I have rechecked the standard and I agree that we should use "filter expression" whenever possible. "A filter expression must be enclosed in parentheses..." looks like an oversight, so I fixed it. As for what's actually enclosed, I believe we can still use the word "condition" here as it's easy to understand and is already used in our docs, e.g. in description of the WHERE clause that serves a similar purpose. The new version of the patch fixes the terminology, tweaks the examples, and provides some grammar and style fixes in the jsonpath-related chapters. 
--
Liudmila Mantrova
Technical writer at Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company

diff --git a/doc/src/sgml/func.sgml b/doc/src/sgml/func.sgml
index 3a8581d..b0de624 100644
--- a/doc/src/sgml/func.sgml
+++ b/doc/src/sgml/func.sgml
@@ -11538,7 +11538,8 @@ table2-mapping
    from the JSON data, similar to XPath expressions used for SQL access
    to XML. In PostgreSQL, path expressions are implemented as the jsonpath
-   data type, described in .
+   data type and can use any elements described in
+   .

    JSON query functions and operators

@@ -11585,7 +11586,7 @@
 }, {
   "location": [ 47.706, 13.2635 ],
   "start time": "2018-10-14 10:39:21",
-  "HR": 130
+  "HR": 135
 } ] } }

@@ -11637,23 +11638,33 @@ table2-mapping
    When defining the path, you can also use one or more
-   filter expressions, which work similar to
-   the WHERE clause in SQL. Each filter expression
-   can provide one or more filtering conditions that are applied
-   to the result of the path evaluation. Each filter expression must
-   be enclosed in parentheses and preceded by a question mark.
-   Filter expressions are evaluated from left to right and can be nested.
-   The @ variable denotes the current path evaluation
-   result to be filtered, and can be followed by one or more accessor
-   operators to define the JSON element by which to filter the result.
-   Functions and operators that can be used in the filtering condition
-   are listed in .
-   SQL/JSON defines three-valued logic, so the result of the filter
-   expression may be true, false,
+   filter expressions that work similar to the
+   WHERE clause in SQL. A filter expression begins with
+   a question mark and provides a condition in parentheses:
+
+? (condition)
+
+   Filter expressions must be specified right after the path evaluation step
+   to which they are applied. The result of this step is filtered to include
+   only those items that satisfy the provided condition. SQL/JSON defines
+   three-valued logic, so the condition can be true, false,
    or unknown. The unknown value
-   plays the same role as SQL NULL. Further path
+   plays the same role as SQL NULL and can be tested
+   for with the is unknown predicate. Further path
    evaluation steps use only those items for which filter expressions
-   return true.
+   return true.
+
+   Functions and operators that can be used in filter expressions are listed
+   in . The path
+   evaluation result to be filtered is denoted by the @
+   variable. It can be followed by one or more accessor operators to define
+   the JSON element on a lower nesting level by which to filter the result.

@@ -11667,8 +11678,8 @@ table2-mapping
    To get the start time of segments with such values instead, you have to
    filter out irrelevant segments before returning the start time, so the
-   filter is applied to the previous step and the path in the filtering
-   condition is different:
+   filter expression is applied to the previous step, and the path used
+   in the condition is different:

'$.track.segments[*] ? (@.HR > 130)."start time"'

@@ -11693,9 +11704,9 @@ table2-mapping
-
Re: [PATCH v4] Add \warn to psql
Fabien COELHO writes: > The point is that there would be at least *one* TAP tests so that many > other features of psql, although not all, can be tested. I have been > reviewing quite a few patches without tests because of this lack of > infrastructure, and no one patch is ever going to justify a TAP test on > its own. It has to start somewhere. Currently psql coverage is abysmal, > around 40% of lines & functions are called by the whole non regression > tests, despite the hundreds of psql-relying tests. Yeah, but the point I was trying to make is that that's mostly down to laziness. I see no reason that we couldn't be covering a lot of these features in src/test/regress/sql/psql.sql, with far less overhead. The interactive aspects of psql can't be tested that way ... but since this patch doesn't actually provide any way to test those, it's not much of a proof-of-concept. IOW, the blocking factor here is not "does src/bin/psql/t/ exist", it's "has somebody written a test that moves the coverage needle meaningfully". I'm not big on adding a bunch of overhead first and just hoping somebody will do something to make it worthwhile later. regards, tom lane
Re: Where is SSPI auth username determined for TAP tests?
Michael Paquier writes: > I have been reviewing that part, and the part to split the bootstrap > user from the set of extra roles created looks fine to me. Now, it > seems to me that you can simplify 010_dump_connstr.pl as per the > attached because PostgresNode::Init can take care of --auth-config > part with the correct options using auth_extra. What do you think > about the cleanup attached? I haven't checked that this actually works, but it looks plausible, and I agree it's simpler/easier. regards, tom lane
Re: Another way to fix inherited UPDATE/DELETE
Amit Langote writes: > On Wed, Feb 20, 2019 at 6:49 AM Tom Lane wrote: >> Obviously this'd be a major rewrite with no chance of making it into v12, >> but it doesn't sound too big to get done during v13. > Are you planning to work on this? It's on my list, but so are a lot of other things. If you'd like to work on it, feel free. regards, tom lane
Re: TopoSort() fix
On Tue, Jul 2, 2019 at 11:23 AM Tom Lane wrote: > Rui Hai Jiang writes: > > I think I found an issue in the TopoSort() function. > > This indeed seems like a live bug introduced by a1c1af2a. > Robert? This is pretty thoroughly swapped out of my head, but it looks like that analysis might be correct. Is it practical to come up with a test case that demonstrates the problem? -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Re: Conflict handling for COPY FROM
Hi, I'm very interested in this patch and would like to give a review within a week. On the feature side, how about simply using the less verbose "ERRORS" instead of "ERROR LIMIT" ? On Wed, Jul 3, 2019 at 1:42 PM Surafel Temesgen wrote: > Hi Alexey, > Thank you for looking at it > > On Tue, Jul 2, 2019 at 7:57 PM Alexey Kondratov < > a.kondra...@postgrespro.ru> wrote: > >> On 28.06.2019 16:12, Alvaro Herrera wrote: >> >> On Wed, Feb 20, 2019 at 7:04 PM Andres Freund >> wrote: >> >>> Or even just return it as a row. CopyBoth is relatively widely >> supported >> >>> these days. >> >> i think generating warning about it also sufficiently meet its propose >> of >> >> notifying user about skipped record with existing logging facility >> >> and we use it for similar propose in other place too. The different >> >> i see is the number of warning that can be generated >> > Warnings seem useless for this purpose. I'm with Andres: returning rows >> > would make this a fine feature. If the user wants the rows in a table >> > as Andrew suggests, she can use wrap the whole thing in an insert. >> >> I agree with previous commentators that returning rows will make this >> feature more versatile. > > > I agree. am looking at the options > > Also, I would prefer having an option to ignore all errors, e.g. with >> option ERROR_LIMIT set to -1. Because it is rather difficult to estimate >> a number of future errors if you are playing with some badly structured >> data, while always setting it to 100500k looks ugly. >> >> > Good idea > > I also +1 having an option to ignore all errors. Other RDBMS might use a large number, but "-1" seems cleaner so far. > > >> 1) Calculation of processed rows isn't correct (I've checked). You do it >> in two places, and >> >> -processed++; >> +if (!cstate->error_limit) >> +processed++; >> >> is never incremented if ERROR_LIMIT is specified and no errors >> occurred/no constraints exist, so the result will always be 0. 
However, >> if primary column with constraints exists, then processed is calculated >> correctly, since another code path is used: >> >> > Correct. i will fix > > +if (specConflict) >> +{ >> +... >> +} >> +else >> +processed++; >> >> I would prefer this calculation in a single place (as it was before >> patch) for simplicity and in order to avoid such problems. >> >> > ok > > >> 2) This ExecInsertIndexTuples call is only executed now if ERROR_LIMIT >> is specified and was exceeded, which doesn't seem to be correct, does it? >> >> -if (resultRelInfo->ri_NumIndices > 0) >> +if (resultRelInfo->ri_NumIndices > 0 && >> cstate->error_limit == 0) >> recheckIndexes = >> ExecInsertIndexTuples(myslot, >> >> > No it alwase executed . I did it this way to avoid > inserting index tuple twice but i see its unlikely > > >> 3) Trailing whitespaces added to error messages and tests for some reason: >> >> +ereport(WARNING, >> +(errcode(ERRCODE_BAD_COPY_FILE_FORMAT), >> + errmsg("skipping \"%s\" --- missing data >> for column \"%s\" ", >> >> +ereport(ERROR, >> +(errcode(ERRCODE_BAD_COPY_FILE_FORMAT), >> + errmsg("missing data for column \"%s\" ", >> >> -ERROR: missing data for column "e" >> +ERROR: missing data for column "e" >> CONTEXT: COPY x, line 1: "20002302323" >> >> -ERROR: missing data for column "e" >> +ERROR: missing data for column "e" >> CONTEXT: COPY x, line 1: "2001231\N\N" >> >> > > regards > Surafel > Thanks, Anthony
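For what it's worth, the error-limit semantics being discussed (skip and warn on up to N bad rows, fail beyond that, with -1 meaning "ignore all errors") can be modeled outside the server. A minimal Python sketch; the function name and shape are mine, not the patch's:

```python
def copy_rows(rows, error_limit):
    """Toy model of COPY with an error limit: each input is (ok, row).

    Good rows are loaded; bad rows are skipped (the patch warns about
    them) until more than error_limit rows have failed, at which point
    the whole command errors out.  error_limit = -1 means "ignore all
    errors", the option suggested in the review above.
    """
    loaded, skipped = [], []
    for lineno, (ok, row) in enumerate(rows, start=1):
        if ok:
            loaded.append(row)
        elif error_limit != -1 and len(skipped) >= error_limit:
            raise ValueError(f"error limit exceeded at line {lineno}")
        else:
            skipped.append(lineno)
    return loaded, skipped

rows = [(True, "a"), (False, None), (True, "b")]
print(copy_rows(rows, 1))    # (['a', 'b'], [2])
print(copy_rows(rows, -1))   # (['a', 'b'], [2])
```

Returning the skipped rows to the client, as Andres suggests, would correspond to returning the contents of `skipped` (or the offending rows themselves) instead of merely warning.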
pguint Installation error in PostgreSQL server version 11.2
Hi Everyone, I have installed PostgreSQL 11.2 on CentOS 7 and tried to build *pguint* from source to get the *TINYINT* datatype, but the build could not resolve its dependency packages. I followed the steps below; please help me resolve this issue.

1. yum install centos-release-scl-rh
2. yum install llvm-toolset-7-clang-tools-extra
3. yum install devtoolset-7
4. yum install llvm-toolset-5
5. make PG_CONFIG=/usr/pgsql-11/bin/pg_config
6. make PG_CONFIG=/usr/pgsql-11/bin/pg_config install

The llvm 5.0 packages installed successfully, but when I try to create the extension I get the error below:

[postgres] # CREATE EXTENSION uint;
ERROR: XX000: could not load library "/usr/pgsql-11/lib/uint.so": /usr/pgsql-11/lib/uint.so: undefined symbol: GET_1_BYTE
LOCATION: internal_load_library, dfmgr.c:240
Time: 17.247 ms

Regards, Suresh Seema -- Thanks & Regards Suresh S
Re: benchmarking Flex practices
On Wed, Jul 3, 2019 at 5:35 AM Tom Lane wrote: > > John Naylor writes: > > 0001 is a small patch to remove some unneeded generality from the > > current rules. This lowers the number of elements in the yy_transition > > array from 37045 to 36201. > > I don't particularly like 0001. The two bits like this > > -whitespace ({space}+|{comment}) > +whitespace ({space}|{comment}) > > seem likely to create performance problems for runs of whitespace, in that > the lexer will now have to execute the associated action once per space > character not just once for the whole run. Okay. > There are a bunch of higher-order productions that use "{whitespace}*", > which is surely a bit redundant given the contents of {whitespace}. > But maybe we could address that by replacing "{whitespace}*" with > "{opt_whitespace}" defined as > > opt_whitespace ({space}*|{comment}) > > Not sure what impact if any that'd have on table size, but I'm quite sure > that {whitespace} was defined with an eye to avoiding unnecessary > lexer action cycles. It turns out that {opt_whitespace} as defined above is not equivalent to {whitespace}* , since the former is either a single comment or a single run of 0 or more whitespace chars (if I understand correctly). Using {opt_whitespace} for the UESCAPE rules on top of v3-0002, the regression tests pass, but queries like this fail with a syntax error: # select U&'d!0061t!+61' uescape --comment '!'; There was in fact a substantial size reduction, though, so for curiosity's sake I tried just replacing {whitespace}* with {space}* in the UESCAPE rules, and the table shrank from 30367 (that's with 0002 only) to 24661. > As for the other two bits that are like > > -. { > - /* This is only needed for \ just > before EOF */ > +\\ { > > my recollection is that those productions are defined that way to avoid a > flex warning about not all possible input characters being accounted for > in the (resp. ) state. 
Maybe that warning is > flex-version-dependent, or maybe this was just a worry and not something > that actually produced a warning ... but I'm hesitant to change it. > If we ever did get to flex's default action, that action is to echo the > current input character to stdout, which would be Very Bad. FWIW, I tried Flex 2.5.35 and 2.6.4 with no warnings, and I did get a warning when I deleted any of those two rules. I'll leave them out for now, since this change was only good for ~500 fewer elements in the transition array. > As far as I can see, the point of 0002 is to have just one set of > flex rules for the various variants of quotecontinue processing. > That sounds OK, though I'm a bit surprised it makes this much difference > in the table size. I would suggest that "state_before" needs a less > generic name (maybe "state_before_xqs"?) and more than no comment. > Possibly more to the point, it's not okay to have static state variables > in the core scanner, so that variable needs to be kept in yyextra. > (Don't remember offhand whether it's any more acceptable in the other > scanners.) Ah yes, I got this idea from the ECPG scanner, which is not reentrant. Will fix. -- John Naylorhttps://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
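The inequivalence John describes, ({space}*|{comment}) used once versus ({space}+|{comment}) repeated, can be demonstrated with ordinary regexes. This Python sketch is a simplification of the real flex rules (only -- comments, only a small whitespace class), but it shows exactly why whitespace followed by a comment stops matching:

```python
import re

# {whitespace}* : (one-or-more spaces OR a comment), repeated any number of times
whitespace_star = re.compile(r"(?:[ \t\n]+|--[^\n]*)*")

# {opt_whitespace}: a SINGLE run of zero-or-more spaces OR a SINGLE comment
opt_whitespace = re.compile(r"(?:[ \t\n]*|--[^\n]*)")

text = " --comment "  # whitespace, then a comment

# The repeated form consumes the whole string in two iterations...
print(whitespace_star.fullmatch(text) is not None)  # True
# ...but the single-alternative form cannot mix spaces and a comment,
# which is why "uescape --comment '!'" raises a syntax error.
print(opt_whitespace.fullmatch(text) is not None)   # False
```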
Re: Conflict handling for COPY FROM
Hi Alexey, Thank you for looking at it On Tue, Jul 2, 2019 at 7:57 PM Alexey Kondratov wrote: > On 28.06.2019 16:12, Alvaro Herrera wrote: > >> On Wed, Feb 20, 2019 at 7:04 PM Andres Freund > wrote: > >>> Or even just return it as a row. CopyBoth is relatively widely > supported > >>> these days. > >> i think generating warning about it also sufficiently meet its propose > of > >> notifying user about skipped record with existing logging facility > >> and we use it for similar propose in other place too. The different > >> i see is the number of warning that can be generated > > Warnings seem useless for this purpose. I'm with Andres: returning rows > > would make this a fine feature. If the user wants the rows in a table > > as Andrew suggests, she can use wrap the whole thing in an insert. > > I agree with previous commentators that returning rows will make this > feature more versatile. I agree. am looking at the options Also, I would prefer having an option to ignore all errors, e.g. with > option ERROR_LIMIT set to -1. Because it is rather difficult to estimate > a number of future errors if you are playing with some badly structured > data, while always setting it to 100500k looks ugly. > > Good idea > 1) Calculation of processed rows isn't correct (I've checked). You do it > in two places, and > > -processed++; > +if (!cstate->error_limit) > +processed++; > > is never incremented if ERROR_LIMIT is specified and no errors > occurred/no constraints exist, so the result will always be 0. However, > if primary column with constraints exists, then processed is calculated > correctly, since another code path is used: > > Correct. i will fix +if (specConflict) > +{ > +... > +} > +else > +processed++; > > I would prefer this calculation in a single place (as it was before > patch) for simplicity and in order to avoid such problems. 
> > ok > 2) This ExecInsertIndexTuples call is only executed now if ERROR_LIMIT > is specified and was exceeded, which doesn't seem to be correct, does it? > > -if (resultRelInfo->ri_NumIndices > 0) > +if (resultRelInfo->ri_NumIndices > 0 && > cstate->error_limit == 0) > recheckIndexes = > ExecInsertIndexTuples(myslot, > > No, it is always executed. I did it this way to avoid inserting the index tuple twice, but I see that it's unlikely. > 3) Trailing whitespaces added to error messages and tests for some reason: > > +ereport(WARNING, > +(errcode(ERRCODE_BAD_COPY_FILE_FORMAT), > + errmsg("skipping \"%s\" --- missing data > for column \"%s\" ", > > +ereport(ERROR, > +(errcode(ERRCODE_BAD_COPY_FILE_FORMAT), > + errmsg("missing data for column \"%s\" ", > > -ERROR: missing data for column "e" > +ERROR: missing data for column "e" > CONTEXT: COPY x, line 1: "20002302323" > > -ERROR: missing data for column "e" > +ERROR: missing data for column "e" > CONTEXT: COPY x, line 1: "2001231\N\N" > > regards Surafel
Re: FETCH FIRST clause WITH TIES option
Hi Erik, Thank you for looking at it. Can you have a look? > > It is because the patch didn't handle that edge case. The attached patch contains a fix for it. regards Surafel diff --git a/doc/src/sgml/ref/select.sgml b/doc/src/sgml/ref/select.sgml index 06d611b64c..e83d309c5b 100644 --- a/doc/src/sgml/ref/select.sgml +++ b/doc/src/sgml/ref/select.sgml @@ -44,7 +44,7 @@ SELECT [ ALL | DISTINCT [ ON ( expressionexpression [ ASC | DESC | USING operator ] [ NULLS { FIRST | LAST } ] [, ...] ] [ LIMIT { count | ALL } ] [ OFFSET start [ ROW | ROWS ] ] -[ FETCH { FIRST | NEXT } [ count ] { ROW | ROWS } ONLY ] +[ FETCH { FIRST | NEXT } [ count ] { ROW | ROWS } { ONLY | WITH TIES } ] [ FOR { UPDATE | NO KEY UPDATE | SHARE | KEY SHARE } [ OF table_name [, ...] ] [ NOWAIT | SKIP LOCKED ] [...] ] where from_item can be one of: @@ -1430,7 +1430,7 @@ OFFSET start which PostgreSQL also supports. It is: OFFSET start { ROW | ROWS } -FETCH { FIRST | NEXT } [ count ] { ROW | ROWS } ONLY +FETCH { FIRST | NEXT } [ count ] { ROW | ROWS } { ONLY | WITH TIES } In this syntax, the start or count value is required by @@ -1440,7 +1440,10 @@ FETCH { FIRST | NEXT } [ count ] { ambiguity. If count is omitted in a FETCH clause, it defaults to 1. -ROW +ROW . +WITH TIES option is used to return two or more rows +that tie for the last place in the result set according to ORDER BY +clause (ORDER BY clause must be specified in this case). and ROWS as well as FIRST and NEXT are noise words that don't influence the effects of these clauses.
diff --git a/src/backend/executor/nodeLimit.c b/src/backend/executor/nodeLimit.c index baa669abe8..cf557ecb1c 100644 --- a/src/backend/executor/nodeLimit.c +++ b/src/backend/executor/nodeLimit.c @@ -41,6 +41,7 @@ static TupleTableSlot * /* return: a tuple or NULL */ ExecLimit(PlanState *pstate) { LimitState *node = castNode(LimitState, pstate); + ExprContext *econtext = node->ps.ps_ExprContext; ScanDirection direction; TupleTableSlot *slot; PlanState *outerPlan; @@ -102,6 +103,16 @@ ExecLimit(PlanState *pstate) node->lstate = LIMIT_EMPTY; return NULL; } + +/* + * Tuple at limit is needed for comparation in subsequent execution + * to detect ties. + */ +if (node->limitOption == WITH_TIES && + node->position - node->offset == node->count - 1) +{ + ExecCopySlot(node->last_slot, slot); +} node->subSlot = slot; if (++node->position > node->offset) break; @@ -126,12 +137,16 @@ ExecLimit(PlanState *pstate) { /* * Forwards scan, so check for stepping off end of window. If - * we are at the end of the window, return NULL without - * advancing the subplan or the position variable; but change - * the state machine state to record having done so. + * we are at the end of the window, the behavior depends whether + * ONLY or WITH TIES was specified. In case of ONLY, we return + * NULL without advancing the subplan or the position variable; + * but change the state machine state to record having done so. + * In the WITH TIES mode, we need to advance the subplan until + * we find the first row with different ORDER BY pathkeys. */ if (!node->noCount && - node->position - node->offset >= node->count) + node->position - node->offset >= node->count && + node->limitOption == EXACT_NUMBER) { node->lstate = LIMIT_WINDOWEND; @@ -144,18 +159,69 @@ ExecLimit(PlanState *pstate) return NULL; } +else if (!node->noCount && + node->position - node->offset >= node->count && + node->limitOption == WITH_TIES) +{ + /* + * Get next tuple from subplan, if any. 
+ */ + slot = ExecProcNode(outerPlan); + if (TupIsNull(slot)) + { + node->lstate = LIMIT_SUBPLANEOF; + return NULL; + } + /* + * Test if the new tuple and the last tuple match. + * If so we return the tuple. + */ + econtext->ecxt_innertuple = slot; + econtext->ecxt_outertuple = node->last_slot; + if (ExecQualAndReset(node->eqfunction, econtext)) + { + node->subSlot = slot; + node->position++; + } + else + { + node->lstate = LIMIT_WINDOWEND; + + /* + * If we know we won't need to back up, we can release + * resources at this point. + */ + if (!(node->ps.state->es_top_eflags & EXEC_FLAG_BACKWARD)) + (void) ExecShutdownNode(outerPlan); + + return NULL; + } -/* - * Get next tuple from subplan, if any. - */ -slot = ExecProcNode(outerPlan); -if (TupIsNull(slot)) +} +else { - node->lstate = LIMIT_SUBPLANEOF; - return NULL; + /* + * Get next tuple from subplan, if any. + */ + slot = ExecProcNode(outerPlan); + if (TupIsNull(slot)) + { + node->lstate = LIMIT_SUBPLANEOF; + return NULL;
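The WITH TIES behavior that the nodeLimit.c changes above implement can be modeled outside the executor: after emitting count rows, keep emitting rows while they compare equal to the last in-window row on the ORDER BY key. A minimal Python sketch of just the semantics (my own model, not the patch code):

```python
def fetch_first_with_ties(rows, count, key):
    """rows must already be sorted on `key` (hence the ORDER BY requirement).

    Returns the first `count` rows plus any following rows that tie with
    the last in-window row on the sort key, mirroring
    FETCH FIRST count ROWS WITH TIES.
    """
    if count <= 0 or not rows:
        return []
    out = list(rows[:count])
    boundary = key(out[-1])
    # keep consuming while the sort key still matches the boundary row
    for row in rows[count:]:
        if key(row) != boundary:
            break
        out.append(row)
    return out

# FETCH FIRST 2 ROWS WITH TIES over values in ascending order:
print(fetch_first_with_ties([1, 2, 2, 3], 2, key=lambda r: r))  # [1, 2, 2]
```

This corresponds to the new WITH_TIES branch in ExecLimit: the stored `last_slot` plays the role of `boundary`, and `eqfunction` plays the role of the key comparison.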
Re: Replacing the EDH SKIP primes
On Wed, Jul 03, 2019 at 10:56:41AM +0200, Daniel Gustafsson wrote: > OpenSSL provides DH_check() which we use in load_dh_file() to ensure that the > user is passing a valid prime in the DH file. Adding this to the hardcoded > blob seems overkill though, once the validity has been verified before it > being > committed. Agreed, and I didn't notice this check... There could be an argument for having DH_check within an assertion block but as this changes once per decade there is little value. > A DH param in PEM (or DER) format can be checked with the openssl dhparam > tool. > Assuming the PEM is extracted from the patch into a file, one can do: > > openssl dhparam -inform PEM -in /tmp/dh.pem -check -text > > The prime is returned and can be diffed against the one in the RFC. If you > modify the blob you will see that the check complains about it not being > prime. Ah, thanks. I can see that the new key matches the RFC. > There is an expected warning in the output however: "the g value is not a > generator” (this is also present when subjecting the PEM for the 2048 MODP in > OpenSSL). Indeed, I saw that also from OpenSSL. That looks to come from dh.c (there are two other code paths, haven't checked in details). Thanks for the pointer. > I’m far from a cryptographer, but AFAICT from reading it essentially means > that > the RFC authors chose to limit the search space of the shared secret rather > than leaking a bit of it, and OpenSSL has chosen in DH_check() that leaking a > bit is preferrable. (This makes me wonder if we should downgrade the check in > load_dh_file() "codes & DH_NOT_SUITABLE_GENERATOR” to WARNING, but the lack of > reports of it being a problem either shows that most people are just using > openssl dhparam generated parameters which can leak a bit, or aren’t using DH > files at all.) Yeah, no objections per the arguments from the RFC. I am not sure if we actually need to change that part though. 
We'd still get a complaint for a key which is not a prime, and for the default one this is not something we care much about, because we know its properties, no? It would be nice to add a comment on that though, perhaps in libpq-be.h where the key is defined. -- Michael
Re: Attached partition not considering altered column properties of root partition.
Thanks Amit for the fix patch, I have applied the patch and verified the issue. The attached partition with altered column properties shows error as below: postgres=# alter table p attach partition p2 for values in (2); psql: ERROR: child table "p2" has different storage option for column "b" than parent DETAIL: EXTENDED versus MAIN Thanks, Prabhat Sahu On Wed, Jul 3, 2019 at 7:23 AM Amit Langote wrote: > Hi Prabhat, > > On Tue, Jul 2, 2019 at 5:12 PM Prabhat Sahu > wrote: > > > > Hi, > > > > In below testcase when I changed the staorage option for root partition, > newly attached partition not including the changed staorage option. > > Is this an expected behavior? > > Thanks for the report. This seems like a bug. Documentation claims > that the child tables inherit column storage options from the parent > table. That's actually enforced in only some cases. > > 1. If you create the child table as a child to begin with (that is, > not attach it as child after the fact): > > create table parent (a text); > create table child () inherits (parent); > select attrelid::regclass, attname, attstorage from pg_attribute where > attrelid in ('parent'::regclass, 'child'::regclass) and attname = 'a'; > attrelid │ attname │ attstorage > ──┼─┼ > parent │ a │ x > child│ a │ x > (2 rows) > > > 2. If you change the parent's column's storage option, child's column > is recursively changed. 
> > alter table parent alter a set storage main; > select attrelid::regclass, attname, attstorage from pg_attribute where > attrelid in ('parent'::regclass, 'child'::regclass) and attname = 'a'; > attrelid │ attname │ attstorage > ──┼─┼ > parent │ a │ m > child│ a │ m > (2 rows) > > However, we fail to enforce the rule when the child is attached after the > fact: > > create table child2 (a text); > alter table child2 inherit parent; > select attrelid::regclass, attname, attstorage from pg_attribute where > attrelid in ('parent'::regclass, 'child'::regclass, > 'child2'::regclass) and attname = 'a'; > attrelid │ attname │ attstorage > ──┼─┼ > parent │ a │ m > child│ a │ m > child2 │ a │ x > (3 rows) > > To fix this, MergeAttributesIntoExisting() should check that the > attribute options of a child don't conflict with the parent, which the > attached patch implements. Note that partitioning uses the same code > as inheritance, so the fix applies to it too. After the patch: > > create table p (a int, b text) partition by list (a); > create table p1 partition of p for values in (1); > select attrelid::regclass, attname, attstorage from pg_attribute where > attrelid in ('p'::regclass, 'p1'::regclass) and attname = 'b'; > attrelid │ attname │ attstorage > ──┼─┼ > p│ b │ x > p1 │ b │ x > (2 rows) > > alter table p alter b set storage main; > select attrelid::regclass, attname, attstorage from pg_attribute where > attrelid in ('p'::regclass, 'p1'::regclass) and attname = 'b'; > attrelid │ attname │ attstorage > ──┼─┼ > p│ b │ m > p1 │ b │ m > (2 rows) > > create table p2 (like p); > select attrelid::regclass, attname, attstorage from pg_attribute where > attrelid in ('p'::regclass, 'p1'::regclass, 'p2'::regclass) and > attname = 'b'; > attrelid │ attname │ attstorage > ──┼─┼ > p│ b │ m > p1 │ b │ m > p2 │ b │ x > (3 rows) > > alter table p attach partition p2 for values in (2); > ERROR: child table "p2" has different storage option for column "b" than > parent > DETAIL: 
EXTENDED versus MAIN > > -- ok after changing p2 to match > alter table p2 alter b set storage main; > alter table p attach partition p2 for values in (2); > > Thanks, > Amit
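The rule the patch adds to MergeAttributesIntoExisting() amounts to: at ATTACH/INHERIT time, each shared column's storage option in the child must match the parent's. A hypothetical Python model of just that check (the dict shapes and function name are mine, for illustration only):

```python
# pg_attribute.attstorage codes and their display names
STORAGE_NAMES = {"p": "PLAIN", "e": "EXTERNAL", "m": "MAIN", "x": "EXTENDED"}

def check_storage_match(parent_cols, child_cols):
    """parent_cols / child_cols map column name -> attstorage code.

    Raise if a column present in both has differing storage, mirroring
    the error the patch produces at ATTACH PARTITION / INHERIT time.
    """
    for col, parent_storage in parent_cols.items():
        child_storage = child_cols.get(col)
        if child_storage is not None and child_storage != parent_storage:
            raise ValueError(
                f'child table has different storage option for column "{col}" '
                f"than parent: {STORAGE_NAMES[child_storage]} versus "
                f"{STORAGE_NAMES[parent_storage]}"
            )

# p2 was created with default EXTENDED storage while p's column b is MAIN,
# so attaching it should fail, as in the example above.
check_storage_match({"b": "m"}, {"b": "m"})  # ok after ALTER ... SET STORAGE
```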
Re: [PATCH] Speedup truncates of relation forks
On 7/1/19 12:55 PM, Jamison, Kirk wrote: > On Wednesday, June 26, 2019 6:10 PM(GMT+9), Adrien Nayrat wrote: >> As far as I remember, you should see "relation" wait events (type lock) on >> standby server. This is due to startup process acquiring AccessExclusiveLock >> for the truncation and other backend waiting to acquire a lock to read the >> table. > > Hi Adrien, thank you for taking time to reply. > > I understand that RelationTruncate() can block read-only queries on > standby during redo. However, it's difficult for me to reproduce the > test case where I need to catch that wait for relation lock, because > one has to execute SELECT within the few milliseconds of redoing the > truncation of one table. Yes, that why your test by measuring vacuum execution time is better as it is more reproductible. > > Instead, I just measured the whole recovery time, smgr_redo(), > to show the recovery improvement compared to head. Please refer below. > > [Recovery Test] > I used the same stored functions and configurations in the previous email > & created "test" db. > > $ createdb test > $ psql -d test > > 1. [Primary] Create 10,000 relations. > test=# SELECT create_tables(1); > > 2. [P] Insert one row in each table. > test=# SELECT insert_tables(1); > > 3. [P] Delete row of each table. > test=# SELECT delfrom_tables(1); > > 4. [Standby] WAL application is stopped at Standby server. > test=# SELECT pg_wal_replay_pause(); > > 5. [P] VACUUM is executed at Primary side, and measure its execution time. > > test=# \timing on > test=# VACUUM; > > Alternatively, you may use: > $ time psql -d test -c 'VACUUM;' > (Note: WAL has not replayed on standby because it's been paused.) > > 6. [P] Wait until VACUUM has finished execution. Then, stop primary server. > test=# pg_ctl stop -w > > 7. [S] Resume WAL replay, then promote standby (failover). > I used a shell script to execute recovery & promote standby server > because it's kinda difficult to measure recovery time. 
Please refer to the
> script below.
> - "SELECT pg_wal_replay_resume();" is executed and the WAL application is resumed.
> - "pg_ctl promote" to promote standby.
> - The time difference of "select pg_is_in_recovery();" from "t" to "f" is measured.
>
> shell script:
>
> PGDT=/path_to_storage_directory/
>
> if [ "$1" = "resume" ]; then
>   psql -c "SELECT pg_wal_replay_resume();" test
>   date +%Y/%m/%d_%H:%M:%S.%3N
>   pg_ctl promote -D ${PGDT}
>   set +x
>   date +%Y/%m/%d_%H:%M:%S.%3N
>   while [ 1 ]
>   do
>     RS=`psql -Atc "select pg_is_in_recovery();" test`
>     if [ ${RS} = "f" ]; then
>       break
>     fi
>   done
>   date +%Y/%m/%d_%H:%M:%S.%3N
>   set -x
>   exit 0
> fi
>
> [Test Results]
> shared_buffers = 24GB
>
> 1. HEAD
> (wal replay resumed)
> 2019/07/01_08:48:50.326
> server promoted
> 2019/07/01_08:49:50.482
> 2019/07/01_09:02:41.051
>
> Recovery Time:
> 13 min 50.725 s -> Time difference from WAL replay to complete recovery
> 12 min 50.569 s -> Time difference of "select pg_is_in_recovery();" from "t" to "f"
>
> 2. PATCH
> (wal replay resumed)
> 2019/07/01_07:34:26.766
> server promoted
> 2019/07/01_07:34:57.790
> 2019/07/01_07:34:57.809
>
> Recovery Time:
> 31.043 s -> Time difference from WAL replay to complete recovery
> 00.019 s -> Time difference of "select pg_is_in_recovery();" from "t" to "f"
>
> [Conclusion]
> The recovery time significantly improved compared to head, from 13 minutes to 30 seconds.
>
> Any thoughts?
> I'd really appreciate your comments/feedback about the patch and/or test.

Thanks for the time you spent on this test; it is a huge win! Although creating 10k tables and deleting tuples is not a common use case, it is still good to know how your patch performs. I will try to look deeper into your patch, but my knowledge of postgres internals is limited :) -- Adrien
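As a sanity check, the reported recovery times can be re-derived from the logged timestamps (the format is what the script's date +%Y/%m/%d_%H:%M:%S.%3N prints):

```python
from datetime import datetime

def delta_seconds(start, end):
    """Difference in seconds between two timestamps from the test script."""
    fmt = "%Y/%m/%d_%H:%M:%S.%f"
    return (datetime.strptime(end, fmt) - datetime.strptime(start, fmt)).total_seconds()

# HEAD: WAL replay resumed -> recovery complete
head = delta_seconds("2019/07/01_08:48:50.326", "2019/07/01_09:02:41.051")
# PATCH: same interval
patched = delta_seconds("2019/07/01_07:34:26.766", "2019/07/01_07:34:57.809")

print(round(head, 3))     # 830.725 seconds, i.e. 13 min 50.725 s as reported
print(round(patched, 3))  # 31.043 seconds as reported
```

Both values match the "Recovery Time" figures in the test results, so the quoted ~28x improvement is internally consistent.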
Re: [PATCH][PROPOSAL] Add enum relation option type
In a message of Monday, 1 July 2019 14:06:28 MSK, Alvaro Herrera wrote: > It strikes me that the way to avoid sentence construction is to have > each enum reloption declare a string that it uses to list the values it > accepts. So for example we would have > > +#define GIST_OPTION_BUFFERING_ENUM_DEF { \ > + { "on", GIST_OPTION_BUFFERING_ON }, \ > + { "off",GIST_OPTION_BUFFERING_OFF },\ > + { "auto", GIST_OPTION_BUFFERING_AUTO }, \ > + { (const char *) NULL, 0 } \ > +} > + > + GistBufferingValidMsg = gettext_noop("Valid values are \"on\", \"off\", > and \"auto\"."); > > I think that's the most contentious point on this patch at this point > (though I may be misremembering). I actually removed "and" from the list and made it a simple comma-separated list: ERROR: invalid value for "check_option" option DETAIL: Valid values are: "local", "cascaded". Now we can translate the left part and substitute the list into the right part: errdetail("Valid values are: %s.", buf.data))); It is not as nice as before, but quite acceptable, as I see it. Do you not see it that way?
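For reference, building the translatable detail string as a plain comma-separated list, as described above, is straightforward; a Python sketch of the formatting that the C patch does with a StringInfo:

```python
def valid_values_detail(values):
    """Render the payload for: errdetail("Valid values are: %s.", buf.data).

    The sentence frame is translatable on its own, and the quoted list is
    substituted in without needing a language-specific "and".
    """
    return "Valid values are: " + ", ".join(f'"{v}"' for v in values) + "."

print(valid_values_detail(["local", "cascaded"]))
# Valid values are: "local", "cascaded".
```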
Re: [PATCH][PROPOSAL] Add enum relation option type
In a message of Monday, 1 July 2019 21:25:30 MSK, Thomas Munro wrote: > > It's not clear to me that all of Michael's and Álvaro's issues have been > > addressed, so I've marked this Waiting on Author. > To help reviewers for the Commitfest that is starting, could you > please rebase this patch? Oh, yes, sure, thank you for the reminder. diff --git a/src/backend/access/common/reloptions.c b/src/backend/access/common/reloptions.c index 5773021..fffab3a 100644 --- a/src/backend/access/common/reloptions.c +++ b/src/backend/access/common/reloptions.c @@ -433,32 +433,41 @@ static relopt_real realRelOpts[] = {{NULL}} }; -static relopt_string stringRelOpts[] = +static relopt_enum_elt_def gist_buffering_enum_def[] = +GIST_OPTION_BUFFERING_ENUM_DEF; +static relopt_enum_elt_def view_check_option_enum_def[] = +VIEW_OPTION_CHECK_OPTION_ENUM_DEF; +static relopt_enum enumRelOpts[] = { { { "buffering", - "Enables buffering build for this GiST index", - RELOPT_KIND_GIST, - AccessExclusiveLock +"Enables buffering build for this GiST index", +RELOPT_KIND_GIST, +AccessExclusiveLock }, - 4, - false, - gistValidateBufferingOption, - "auto" + gist_buffering_enum_def, + GIST_OPTION_BUFFERING_AUTO }, { { "check_option", - "View has WITH CHECK OPTION defined (local or cascaded).", - RELOPT_KIND_VIEW, - AccessExclusiveLock +"View has WITH CHECK OPTION defined (local or cascaded).", +RELOPT_KIND_VIEW, +AccessExclusiveLock }, - 0, - true, - validateWithCheckOption, - NULL + view_check_option_enum_def, + VIEW_OPTION_CHECK_OPTION_NOT_SET }, + { + { + NULL + } + } +}; + +static relopt_string stringRelOpts[] = +{ /* list terminator */ {{NULL}} }; @@ -505,6 +514,12 @@ initialize_reloptions(void) realRelOpts[i].gen.lockmode)); j++; } + for (i = 0; enumRelOpts[i].gen.name; i++) + { + Assert(DoLockModesConflict(enumRelOpts[i].gen.lockmode, + enumRelOpts[i].gen.lockmode)); + j++; + } for (i = 0; stringRelOpts[i].gen.name; i++) { Assert(DoLockModesConflict(stringRelOpts[i].gen.lockmode,
@@ -543,6 +558,14 @@ initialize_reloptions(void) j++; } + for (i = 0; enumRelOpts[i].gen.name; i++) + { + relOpts[j] = [i].gen; + relOpts[j]->type = RELOPT_TYPE_ENUM; + relOpts[j]->namelen = strlen(relOpts[j]->name); + j++; + } + for (i = 0; stringRelOpts[i].gen.name; i++) { relOpts[j] = [i].gen; @@ -640,6 +663,9 @@ allocate_reloption(bits32 kinds, int type, const char *name, const char *desc) case RELOPT_TYPE_REAL: size = sizeof(relopt_real); break; + case RELOPT_TYPE_ENUM: + size = sizeof(relopt_enum); + break; case RELOPT_TYPE_STRING: size = sizeof(relopt_string); break; @@ -719,6 +745,24 @@ add_real_reloption(bits32 kinds, const char *name, const char *desc, double defa } /* + * add_enuum_reloption + * Add a new enum reloption + */ +void +add_enum_reloption(bits32 kinds, const char *name, const char *desc, + relopt_enum_elt_def *members, int default_val) +{ + relopt_enum *newoption; + + newoption = (relopt_enum *) allocate_reloption(kinds, RELOPT_TYPE_ENUM, + name, desc); + newoption->members = members; + newoption->default_val = default_val; + + add_reloption((relopt_gen *) newoption); +} + +/* * add_string_reloption * Add a new string reloption * @@ -1234,6 +1278,51 @@ parse_one_reloption(relopt_value *option, char *text_str, int text_len, optreal->min, optreal->max))); } break; + case RELOPT_TYPE_ENUM: + { +relopt_enum *optenum = (relopt_enum *) option->gen; +relopt_enum_elt_def *elt; + +parsed = false; +for (elt = optenum->members; elt->string_val; elt++) +{ + if (pg_strcasecmp(value, elt->string_val) == 0) + { + option->values.enum_val = elt->symbol_val; + parsed = true; + break; + } +} +if (!parsed) +{ + /* + * If value is not among the allowed string values, but we + * are not asked to validate, just use default numeric + * value. Otherwise raise an error. 
+ */ + if (!validate) + option->values.enum_val = optenum->default_val; + else + { + StringInfoData buf; + + initStringInfo(); + for (elt = optenum->members; elt->string_val; elt++) + { + appendStringInfo(, "\"%s\"", elt->string_val); + if (elt[1].string_val) +appendStringInfo(, ", "); + } + ereport(ERROR, +(errcode(ERRCODE_INVALID_PARAMETER_VALUE), + errmsg("invalid value for \"%s\" option", + option->gen->name), + errdetail("Valid values are: %s.", buf.data))); + pfree(buf.data); + } +} + } + break; case RELOPT_TYPE_STRING: { relopt_string *optstring = (relopt_string *) option->gen; @@ -1327,6 +1416,11 @@ fillRelOptions(void *rdopts, Size basesize, options[i].values.real_val : ((relopt_real *) options[i].gen)->default_val;
Re: postgres_fdw: Minor improvement to postgresAcquireSampleRowsFunc
On Fri, Jun 28, 2019 at 7:15 PM Etsuro Fujita wrote: > On Fri, Jun 28, 2019 at 6:54 PM Julien Rouhaud wrote: > > On Fri, Jun 28, 2019 at 11:39 AM Etsuro Fujita > > wrote: > > > In postgresAcquireSampleRowsFunc, we 1) determine the fetch size and > > > then 2) construct the fetch command in each iteration of fetching some > > > rows from the remote, but that would be totally redundant. > > > > Indeed. > > > > > Attached > > > is a patch for removing that redundancy. > > > > It all looks good to me! I marked it as ready for committer. > > Cool! I'll commit the patch if there are no objections. Thanks for > reviewing! Pushed. Best regards, Etsuro Fujita
Re: Replacing the EDH SKIP primes
> On 2 Jul 2019, at 09:49, Michael Paquier wrote: > On Tue, Jul 02, 2019 at 08:14:25AM +0100, Peter Eisentraut wrote: >> >> I was wondering whether the provided binary blob contained any checksums >> or other internal checks. How would we know whether it contains >> transposed characters or replaces a 1 by a I or a l? If I just randomly >> edit the blob, the ssl tests still pass. (The relevant load_dh_buffer() >> call does get called by the tests.) How can we make sure we actually >> got a good copy? >> > > PEM_read_bio_DHparams() has some checks on the Diffie-Hellman key, but > it is up to the caller to make sure that it is normally providing a > prime number in this case to make the cracking harder, no? OpenSSL provides DH_check() which we use in load_dh_file() to ensure that the user is passing a valid prime in the DH file. Adding this to the hardcoded blob seems overkill though, once the validity has been verified before it being committed. > RFC 3526 > has a small formula in this case, which we can use to double-check the > patch. A DH param in PEM (or DER) format can be checked with the openssl dhparam tool. Assuming the PEM is extracted from the patch into a file, one can do: openssl dhparam -inform PEM -in /tmp/dh.pem -check -text The prime is returned and can be diffed against the one in the RFC. If you modify the blob you will see that the check complains about it not being prime. There is an expected warning in the output however: "the g value is not a generator” (this is also present when subjecting the PEM for the 2048 MODP in OpenSSL). From reading RFC2412 which outlines how the primes are generated, this is by design. In Appendix E: "Because these two primes are congruent to 7 (mod 8), 2 is a quadratic residue of each prime. All powers of 2 will also be quadratic residues. This prevents an opponent from learning the low order bit of the Diffie-Hellman exponent (AKA the subgroup confinement problem). 
Using 2 as a generator is efficient for some modular exponentiation algorithms. [Note that 2 is technically not a generator in the number theory sense, because it omits half of the possible residues mod P. From a cryptographic viewpoint, this is a virtue.]"

I'm far from a cryptographer, but AFAICT this essentially means that the RFC authors chose to limit the search space of the shared secret rather than leak a bit of it, while OpenSSL has chosen in DH_check() that leaking a bit is preferable. (This makes me wonder if we should downgrade the "codes & DH_NOT_SUITABLE_GENERATOR" check in load_dh_file() to a WARNING, but the lack of reports of it being a problem shows that most people are either just using openssl dhparam-generated parameters, which can leak a bit, or aren't using DH files at all.)

cheers ./daniel
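The quadratic-residue property quoted above from RFC 2412 is easy to check numerically. The following is a small illustrative sketch (not part of any patch, and not the OpenSSL code): by Euler's criterion, 2 is a quadratic residue mod an odd prime p exactly when 2^((p-1)/2) ≡ 1 (mod p), which holds for every prime p ≡ 7 (mod 8).

```python
# Illustrative sketch only: for odd primes p with p % 8 == 7, the value 2
# is a quadratic residue mod p (Euler's criterion: 2**((p-1)//2) % p == 1).
# This mirrors the RFC 2412 Appendix E remark quoted above.

def is_prime(n: int) -> bool:
    """Naive trial-division primality test (fine for small demo values)."""
    if n < 2:
        return False
    d = 2
    while d * d <= n:
        if n % d == 0:
            return False
        d += 1
    return True

def two_is_quadratic_residue(p: int) -> bool:
    """Euler's criterion for the value 2 modulo an odd prime p."""
    return pow(2, (p - 1) // 2, p) == 1

# Every prime congruent to 7 mod 8 in a small range has 2 as a residue.
primes_7_mod_8 = [p for p in range(3, 2000) if p % 8 == 7 and is_prime(p)]
assert all(two_is_quadratic_residue(p) for p in primes_7_mod_8)

# By contrast, 2 is never a residue for primes congruent to 3 or 5 mod 8.
primes_3_5 = [p for p in range(3, 2000) if p % 8 in (3, 5) and is_prime(p)]
assert not any(two_is_quadratic_residue(p) for p in primes_3_5)
```

The actual 2048-bit MODP prime is far too large for trial division, of course; this only demonstrates the number-theoretic property the RFC relies on.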
Re: MSVC Build support with visual studio 2019
On Wed, 3 Jul 2019 at 10:01, Michael Paquier wrote: > On Tue, Jul 02, 2019 at 02:10:11PM +0900, Michael Paquier wrote: > > OK, committed to HEAD for now after perltidy'ing the patch. Let's see > > what the buildfarm has to say about it first. Once we are sure that > > the thing is stable, I'll try to backpatch it. This works on my own > > dev machines with VS 2015 and 2019, but who knows what hides in the > > shadows... > > The buildfarm did not have much to say, so backpatched down to 9.4, > adjusting things on the way. Thanks Michael. Regards, Haribabu Kommi
Re: Run-time pruning for ModifyTable
On Wed, Jul 3, 2019 at 4:34 PM David Rowley wrote: > On Wed, 3 Jul 2019 at 17:27, Amit Langote wrote: > > A cursory look at the patch suggests that most of its changes will be > > for nothing if [1] materializes. What do you think about that? > > Yeah, I had this in mind when writing the patch, but kept going > anyway. I think it's only really a small part of this patch that > would get wiped out with that change. Just the planner.c stuff. > Everything else is still required, as far as I understand. If I understand the details of [1] correctly, ModifyTable will no longer have N subplans for N result relations as it does today. So, it doesn't make sense for ModifyTable to contain PartitionedRelPruneInfos and for ExecInitModifyTable/ExecModifyTable to have to perform initial and execution-time pruning, respectively. As I said, bottom expansion of target inheritance will mean pruning (both plan-time and run-time) will occur at the bottom too, so the run-time pruning capabilities of nodes that already have it will be used for UPDATE and DELETE as well. Thanks, Amit
RE: extension patch of CREATE OR REPLACE TRIGGER
Dear Mr. Thomas

> The July Commitfest is now beginning. To give the patch the best chance of
> attracting reviewers, could you please post a rebased version? The last version
> doesn't apply.

I really appreciate your comments. I have been very busy with work recently, so I'll fix this within a couple of weeks.

Regards, Osumi Takamichi
Re: FETCH FIRST clause WITH TIES option
On 2019-07-01 19:38, Surafel Temesgen wrote:

> Thank you for informing. Attached is a rebased patch against current master
[...]
[fetch_first_with_ties_v10.patch]

Hi Surafel,

The patch applies OK, make check is OK, compiles OK. But I get:

TRAP: FailedAssertion("!(!(((slot)->tts_flags & (1 << 1)) != 0))", File: "execTuples.c", Line: 491)

when running a variant ('fetch 1' instead of 'fetch 2') of the test SQL against src/test/regress/data/onek.data (in the script below the location of the file 'onek.data' will have to be changed):

- 8< -
#!/bin/bash
echo "
drop table if exists onek;
create table onek (
  unique1     int4,
  unique2     int4,
  two         int4,
  four        int4,
  ten         int4,
  twenty      int4,
  hundred     int4,
  thousand    int4,
  twothousand int4,
  fivethous   int4,
  tenthous    int4,
  odd         int4,
  even        int4,
  stringu1    name,
  stringu2    name,
  string4     name
);
copy onek from '/home/aardvark/pg_stuff/pg_sandbox/pgsql.fetch_first_with_ties/src/test/regress/data/onek.data';
create index onek_unique1 on onek using btree(unique1 int4_ops);
create index onek_unique2 on onek using btree(unique2 int4_ops);
create index onek_hundred on onek using btree(hundred int4_ops);
create index onek_stringu1 on onek using btree(stringu1 name_ops);

-- OK:
select * from onek where thousand < 5 order by thousand fetch first 1 rows only;

-- crashes:
select * from onek where thousand < 5 order by thousand fetch first 1 rows with ties;
" | psql -qXa
- 8< -

Can you have a look?

thanks, Erik Rijkers
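For readers unfamiliar with the feature under test: FETCH FIRST n ROWS WITH TIES returns the first n rows of the ordered result plus any further rows that tie with the n-th row on the ORDER BY key. A rough illustrative model in Python (this is only a sketch of the semantics, not the patch's executor implementation):

```python
# Toy model of FETCH FIRST n ROWS WITH TIES over pre-sorted rows:
# emit the first n rows, then keep emitting while the sort key still
# equals the sort key of the n-th row.

def fetch_first_with_ties(sorted_rows, key, n):
    if n <= 0 or not sorted_rows:
        return []
    result = list(sorted_rows[:n])
    last_key = key(result[-1])
    for row in sorted_rows[n:]:
        if key(row) != last_key:
            break                  # first non-tying row ends the scan
        result.append(row)
    return result

rows = sorted([(0, 'a'), (0, 'b'), (1, 'c'), (2, 'd')])
# FETCH FIRST 1 ROWS WITH TIES: both rows with the lowest key come back.
assert fetch_first_with_ties(rows, key=lambda r: r[0], n=1) == [(0, 'a'), (0, 'b')]
# FETCH FIRST 3 ROWS WITH TIES: no tie extends past the third row here.
assert fetch_first_with_ties(rows, key=lambda r: r[0], n=3) == [(0, 'a'), (0, 'b'), (1, 'c')]
```

The crash reported above happens precisely in the "plus ties" continuation path, which is why 'fetch 1' (where a tie exists) trips it while the ROWS ONLY form does not.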
Re: Problem with default partition pruning
Hosoya-san,

Thanks for updating the patches. I have no comment in particular about v4_default_partition_pruning.patch, but let me reiterate my position about v5_ignore_contradictory_where_clauses_at_partprune_step.patch, which I first stated in the following email a few months ago:

https://www.postgresql.org/message-id/d2c38e4e-ade4-74de-f686-af37e4a5f1c1%40lab.ntt.co.jp

This patch proposes to apply constraint exclusion to check whether it would be wasteful to generate pruning steps from a given clause against a given sub-partitioned table, because the clause contradicts its partition constraint. Actually, the patch started out as a generalization of the existing usage of constraint exclusion in partprune.c that's used to skip processing useless arguments of an OR clause.

The problem with steps generated from such contradictory clauses is that they fail to prune the default partition of a sub-partitioned table, because the value extracted from such a clause appears to the pruning logic to fall in the default partition, given that the pruning logic proper is unaware of the partition constraint of the partitioned table that pruning is being applied to. Here is an example similar to one that Hosoya-san shared earlier on this thread that shows the problem.

create table p (a int) partition by range (a);
create table p1 partition of p for values from (0) to (20) partition by range (a);
create table p11 partition of p1 for values from (0) to (10);
create table p1_def partition of p1 default;

-- p11 correctly pruned, but p1_def not
explain select * from p1 where a = 25;
                         QUERY PLAN
─────────────────────────────────────────────────────────────
 Append  (cost=0.00..41.94 rows=13 width=4)
   ->  Seq Scan on p1_def  (cost=0.00..41.88 rows=13 width=4)
         Filter: (a = 25)
(3 rows)

Here, without the knowledge that p1's range is restricted to 0 <= a < 20 by way of its partition constraint, the pruning logic, when handed the value 25, concludes that p1_def must be scanned.
With the patch, partprune.c concludes without performing pruning that scanning any of p1's partitions is unnecessary.

explain select * from p1 where a = 25;
               QUERY PLAN
─────────────────────────────────────────
 Result  (cost=0.00..0.00 rows=0 width=0)
   One-Time Filter: false
(2 rows)

Actually, as of 11.4, setting constraint_exclusion = on, by way of relation_excluded_by_constraints(), will give you the same result even without the patch. My argument earlier was that we shouldn't have two places that will do essentially the same processing -- partprune.c with the patch applied and relation_excluded_by_constraints(). That is, we should only keep the latter, with the trade-off that users have to live with the default partition of sub-partitioned tables not being pruned in some corner cases like this one.

Note that there's still a problem with the existing usage of constraint exclusion in partprune.c, which Thibaut first reported on this thread [1].

explain select * from p1 where a = 25 or a = 5;
                       QUERY PLAN
─────────────────────────────────────────────────────────
 Append  (cost=0.00..96.75 rows=50 width=4)
   ->  Seq Scan on p11  (cost=0.00..48.25 rows=25 width=4)
         Filter: ((a = 25) OR (a = 5))
   ->  Seq Scan on p1_def  (cost=0.00..48.25 rows=25 width=4)
         Filter: ((a = 25) OR (a = 5))
(5 rows)

Here only one of the OR's arguments can be true for p1's partitions, but partprune.c's current usage of constraint exclusion fails to notice that. I had posted a patch [2] to solve this specific problem. Hosoya-san's patch is a generalization of my patch.

Thanks,
Amit

[1] https://www.postgresql.org/message-id/bd03f475-30d4-c4d0-3d7f-d2fbde755971%40dalibo.com
[2] https://www.postgresql.org/message-id/9bb31dfe-b0d0-53f3-3ea6-e64b811424cf%40lab.ntt.co.jp
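The failure mode described above can be sketched abstractly: a pruner that only looks at each sub-partition's own bounds routes any value outside those bounds to the default partition, unless it also accounts for the parent's partition constraint. A toy Python model of the example (hypothetical names and structure; not PostgreSQL's actual pruning code):

```python
# Toy model of range partition pruning for the example above:
# p1 accepts 0 <= a < 20, with sub-partition p11 covering [0, 10)
# and p1_def as the default. Illustrative only.

P1_CONSTRAINT = (0, 20)        # p1's own partition constraint
SUBPARTS = {'p11': (0, 10)}    # explicit sub-partitions of p1
DEFAULT = 'p1_def'

def prune(value, check_parent_constraint):
    """Return the partitions of p1 that must be scanned for a = value."""
    lo, hi = P1_CONSTRAINT
    if check_parent_constraint and not (lo <= value < hi):
        return []              # contradicts p1's constraint: scan nothing
    for name, (plo, phi) in SUBPARTS.items():
        if plo <= value < phi:
            return [name]
    return [DEFAULT]           # no explicit range matched

# Without the parent-constraint check, a = 25 wrongly lands in the default.
assert prune(25, check_parent_constraint=False) == ['p1_def']
# With it (what the v5 patch effectively achieves), nothing is scanned.
assert prune(25, check_parent_constraint=True) == []
# In-range values behave the same either way.
assert prune(5, check_parent_constraint=False) == ['p11']
assert prune(5, check_parent_constraint=True) == ['p11']
```

The model makes the trade-off concrete: the check only matters for values that contradict the parent's constraint, which is exactly the corner case where p1_def would otherwise be scanned.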
Re: Custom table AMs need to include heapam.h because of BulkInsertState
On Wed, 3 Jul 2019 at 19:35, Michael Paquier wrote:
>
> On Tue, Jul 02, 2019 at 01:26:26AM +1200, David Rowley wrote:
> > I've pushed the original patch plus a small change to only call
> > table_finish_bulk_insert() for the target of the copy when we're using
> > bulk inserts.
>
> Yes, sorry for coming late at the party here. What I meant previously
> is that I did not find the version published at [1] to be natural with
> its structure to define an executor callback which then calls a
> callback for table AMs, still I get your point that it would be better
> to try to avoid unnecessary fsync calls on partitions where no tuples
> have been redirected with a COPY. The version 1 of the patch attached
> at [2] felt much more straight-forward and cleaner by keeping all the
> table AM callbacks within copy.c.
>
> This has been reverted as of f5db56f, still it seems to me that this
> was moving in the right direction.

I think the only objection to doing it the way [2] did was that, if there are more than MAX_PARTITION_BUFFERS partitions, then we may end up evicting the CopyMultiInsertBuffer out of the CopyMultiInsertInfo and thus cause a call to table_finish_bulk_insert() before we're done with the copy. It's not impossible that this could happen many times for a given partition. I agree that a working version of [2] is cleaner than [1], but it's just the thought of those needless calls. For [1], I wasn't very happy with the way it turned out, which is why I ended up suggesting a few other ideas. I just don't really like either of them any better than [1], so I didn't chase those up, and that's why I ended up going for [2]. If you think any of the other ideas I suggested are better (apart from [2]) then let me know and I can see about writing a patch. Otherwise, I plan to just fix [2] and push.
> [1]: > https://postgr.es/m/CAKJS1f95sB21LBF=1mcsev+xlta_jc3mtxx5kgduhdsogow...@mail.gmail.com > [2]: > https://postgr.es/m/cakjs1f_0t-k0_3xe+erxpq-jgaob6trzayercxf2rpgduvm...@mail.gmail.com -- David Rowley http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services
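The eviction concern raised above can be sketched as a toy model: with a bounded set of per-partition multi-insert buffers, round-robin inserts across more partitions than the cap can trigger the finish callback for the same partition repeatedly within a single COPY. Names here are illustrative, not PostgreSQL's actual structures:

```python
# Toy model of the concern above: a bounded buffer set evicts the oldest
# partition buffer when full, and eviction triggers an early
# "finish_bulk_insert" for that partition even though the COPY continues.

MAX_PARTITION_BUFFERS = 2

def run_copy(rows):
    """rows is a sequence of target partition names; returns the log of
    finish_bulk_insert-style calls, in order."""
    buffers = []          # open buffers, oldest first
    finish_calls = []
    for part in rows:
        if part not in buffers:
            if len(buffers) >= MAX_PARTITION_BUFFERS:
                evicted = buffers.pop(0)
                finish_calls.append(evicted)   # early finish on eviction
            buffers.append(part)
    finish_calls.extend(buffers)               # flush at end of COPY
    return finish_calls

# Round-robin across three partitions with room for only two buffers:
# p1 gets "finished" twice even though the COPY is a single command.
calls = run_copy(['p1', 'p2', 'p3', 'p1', 'p2', 'p3'])
assert calls.count('p1') == 2
```

This is the "needless calls" worry: each early finish may translate into an extra fsync-like cost per evicted partition.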
Re: Custom table AMs need to include heapam.h because of BulkInsertState
On Tue, Jul 02, 2019 at 01:26:26AM +1200, David Rowley wrote: > I've pushed the original patch plus a small change to only call > table_finish_bulk_insert() for the target of the copy when we're using > bulk inserts. Yes, sorry for coming late at the party here. What I meant previously is that I did not find the version published at [1] to be natural with its structure to define an executor callback which then calls a callback for table AMs, still I get your point that it would be better to try to avoid unnecessary fsync calls on partitions where no tuples have been redirected with a COPY. The version 1 of the patch attached at [2] felt much more straight-forward and cleaner by keeping all the table AM callbacks within copy.c. This has been reverted as of f5db56f, still it seems to me that this was moving in the right direction. [1]: https://postgr.es/m/CAKJS1f95sB21LBF=1mcsev+xlta_jc3mtxx5kgduhdsogow...@mail.gmail.com [2]: https://postgr.es/m/cakjs1f_0t-k0_3xe+erxpq-jgaob6trzayercxf2rpgduvm...@mail.gmail.com -- Michael signature.asc Description: PGP signature
Re: Run-time pruning for ModifyTable
On Wed, 3 Jul 2019 at 17:27, Amit Langote wrote: > A cursory look at the patch suggests that most of its changes will be > for nothing if [1] materializes. What do you think about that? Yeah, I had this in mind when writing the patch, but kept going anyway. I think it's only really a small part of this patch that would get wiped out with that change. Just the planner.c stuff. Everything else is still required, as far as I understand. > [1] https://www.postgresql.org/message-id/357.1550612935%40sss.pgh.pa.us -- David Rowley http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services
Re: POC: converting Lists into arrays
On Tue, Jul 2, 2019 at 5:12 PM Tom Lane wrote: > Oleksandr Shulgin writes: > > Not related to the diff v6..v7, but shouldn't we throw additionally a > > memset() with '\0' before calling pfree(): > > I don't see the point of that. In debug builds CLOBBER_FREED_MEMORY will > take care of it, and in non-debug builds I don't see why we'd expend > the cycles. > This is what I was wondering about, thanks for providing a reference. -- Alex
Re: TopoSort() fix
On Wed, Jul 03, 2019 at 10:41:59AM +0800, Rui Hai Jiang wrote:
> Could the attached patch fix this issue? Or does anyone else plan to fix it?
>
> If people are busy and have no time, I can go ahead to fix it. To fix
> this issue, do we need a patch for each official branch?

Only a committer could merge any fix you produce. What you have sent looks fine to me, so let's wait for Robert, who has visibly broken this part, to comment. Back-patched versions are usually taken care of by the committer merging the fix, and from experience it is better to agree on the shape of a patch on HEAD before working on other branches. Depending on the review done, the patch's shape may change slightly...
-- Michael

signature.asc Description: PGP signature
Re: [PATCH v4] Add \warn to psql
Hello Tom,

> I took a look at this. I have no quibble with the proposed feature, and
> the implementation is certainly simple enough. But I'm unconvinced about
> the proposed test scaffolding. Spinning up a new PG instance is a *hell*
> of a lot of overhead to pay for testing something that could be tested
> as per attached. Admittedly, the attached doesn't positively prove which
> pipe each output string went down, but that does not strike me as a
> concern large enough to justify adding a TAP test for.

Sure. The point is that there would be at least *one* TAP test so that many other features of psql, although not all, can be tested. I have been reviewing quite a few patches without tests because of this lack of infrastructure, and no one patch is ever going to justify a TAP test on its own. It has to start somewhere. Currently psql coverage is abysmal: around 40% of lines & functions are called by the whole non-regression tests, despite the hundreds of psql-relying tests. Pg is around 80% coverage overall. Basically, I really think that one psql-dedicated TAP test should be added, not for \warn per se, but for other features.

> I'd be happier about adding TAP infrastructure if it looked like it'd be
> usable to test some of the psql areas that are unreachable by the
> existing test methodology, particularly tab-complete.c and prompt.c. But
> I don't see anything here that looks like it'll work for that.

Tab completion and the prompt are special interactive cases and probably require special infrastructure to make a test believe it is running against a tty while it is not. The point of this proposal is not to address these special needs, but to lay a basic infra.

> I don't like what you did to command_checks_all,

Yeah, probably my fault, not David's.

> either --- it could hardly say "bolted on after the fact" more clearly
> if you'd written that in text. If we need an input-stream argument,
> let's just add it in a rational place and adjust the callers. There
> aren't that many of 'em, nor has the subroutine been around all that
> long.

I wanted to avoid breaking the function signature in case it is used by some external packages. Not caring is an option.

-- Fabien.
Re: POC: converting Lists into arrays
On Tue, 2 Jul 2019 at 11:27, Tom Lane wrote:
> My previous patch would have had you replace this with a loop using
> an integer list-position index. You can still do that if you like,
> but it's less change to convert the loop to a foreach(), drop the
> prev/next variables, and replace the list_delete_cell call with
> foreach_delete_current:
>
> foreach(cell, state->enterKeys)
> {
>     TrgmStateKey *existingKey = (TrgmStateKey *) lfirst(cell);
>
>     if (need to delete)
>         state->enterKeys = foreach_delete_current(state->enterKeys, cell);
> }
>
> So I think this is a win, and attached is v7.

It's pretty nice to get rid of those. I also like how you've handled the changes in SRFs.

I've now read over the entire patch and have noted down the following:

1. MergeAttributes() could do with a comment mentioning why you're not using foreach() on the outer loop. I almost missed the list_delete_nth_cell() call that's a few branches deep in the outer loop.

2. In expandTupleDesc(), couldn't you just change the following:

    int i;

    for (i = 0; i < offset; i++)
    {
        if (aliascell)
            aliascell = lnext(eref->colnames, aliascell);
    }

to:

    aliascell = offset < list_length(eref->colnames) ?
        list_nth_cell(eref->colnames, offset) : NULL;

3. Worth Assert(list != NIL); in new_head_cell() and new_tail_cell()?

4. Do you think it would be a good idea to document that the 'pos' arg in list_insert_nth and co must be <= list_length()? I know you've mentioned that in insert_new_cell, but that's static and list_insert_nth is not. I think it would be better not to have to chase down comments of static functions to find out how to use an external function.

5. Why does enlarge_list() return List *? Can't it just return void? I noticed this after looking at add_new_cell_after() and reading your cautionary comment and then looking at lappend_cell().
At first, it seemed that lappend_cell() could end up reallocating List to make way for the new cell, but from looking at enlarge_list() it seems we always maintain the original allocation of the header. So why bother returning List * in that function? 6. Is there a reason to use memmove() in list_concat() rather than just memcpy()? I don't quite believe the comment you've written. As far as I can see you're not overwriting any useful memory so the order of the copy should not matter. 7. The last part of the following comment might not age well. /* * Note: unlike the individual-list-cell deletion functions, we never make * any effort to move the surviving list cells to new storage. This is * because none of them can move in this operation, so it's the same as * the old implementation in terms of what callers may assume. */ The old comment about extending the list seems more fitting. 9. I see your XXX comment in list_qsort(), but wouldn't it be better to just invent list_qsort_internal() as a static function and just have it qsort the list in-place, then make list_qsort just return list_qsort_internal(list_copy(list)); and keep the XXX comment so that the fixup would just remove the list_copy()? That way, when it comes to fixing that inefficiency we can just cut and paste the internal implementation into list_qsort(). It'll be much less churn, especially if you put the internal version directly below the external one. 10. I wonder if we can reduce a bit of pain for extension authors by back patching a macro that wraps up a lnext() call adding a dummy argument for the List. That way they don't have to deal with their own pre-processor version dependent code. Downsides are we'd need to keep the macro into the future, however, it's just 1 line of code... I also did some more benchmarking of the patch. Basically, I patched with the attached file (converted to .txt not to upset the CFbot) then ran make installcheck. This was done on an AWS m5d.large instance. 
The patch runs the planner 10 times then LOGs the average time of those 10 runs. Taking the sum of those averages I see:

Master:  0.482479 seconds
Patched: 0.471949 seconds

which makes the patched version 2.2% faster than master on that run. I've resisted attaching the spreadsheet since there are almost 22k data points per run.

Apart from the 10 points above, I think the patch is good to go. I also agree with keeping the further improvements, like getting rid of the needless list_copy() in list_concat() calls, as a followup patch. I also agree with Tom about moving quickly with this one. Reviewing it in detail took me a long time; I'd really rather not do it again after leaving it to rot for a while.

-- David Rowley http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services

diff --git a/src/backend/tcop/postgres.c b/src/backend/tcop/postgres.c
index 1bcfdd67e0..750320cb3b 100644
--- a/src/backend/tcop/postgres.c
+++ b/src/backend/tcop/postgres.c
@@
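The foreach_delete_current() pattern discussed in this review -- deleting the current element of an array-backed list without breaking the ongoing loop -- can be sketched in Python (illustrative only; the real implementation is PostgreSQL's C list code):

```python
# Toy model of deleting elements from an array-backed list while
# iterating it by position, the way foreach_delete_current() lets a
# foreach() loop do: after deleting index i, stay at index i so the
# element shifted into that slot is not skipped.

def delete_matching(items, predicate):
    """Remove all elements satisfying predicate, in a single pass."""
    i = 0
    while i < len(items):
        if predicate(items[i]):
            del items[i]   # tail shifts left; do not advance i
        else:
            i += 1
    return items

keys = [1, 2, 2, 3, 2, 4]
assert delete_matching(keys, lambda k: k == 2) == [1, 3, 4]

# The naive version that advances unconditionally skips shifted elements:
def delete_matching_buggy(items, predicate):
    i = 0
    while i < len(items):
        if predicate(items[i]):
            del items[i]
        i += 1             # bug: skips the element moved into slot i
    return items

assert delete_matching_buggy([1, 2, 2, 3, 2, 4], lambda k: k == 2) == [1, 2, 3, 4]
```

This is the same hazard the prev/next-variable dance in the old cell-based API was guarding against, now handled by the macro instead.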
Re: [HACKERS] advanced partition matching algorithm for partition-wise join
On Tue, Jul 2, 2019 at 1:47 PM amul sul wrote: > Attached version is rebase atop of the latest master head(c74d49d41c), thanks. Thanks! Will review. Best regards, Etsuro Fujita
Re: [HACKERS] WIP: Aggregation push-down
On Fri, Mar 1, 2019 at 12:01 AM Antonin Houska wrote: > Tom Lane wrote: > > > Antonin Houska writes: > > > Michael Paquier wrote: > > >> Latest patch set fails to apply, so moved to next CF, waiting on > > >> author. > > > > > Rebased. > > > > This is in need of rebasing again :-(. I went ahead and pushed the 001 > > part, since that seemed fairly uncontroversial. > > ok, thanks. > Another rebase is needed for the patches. Thanks, Richard
Re: Where is SSPI auth username determined for TAP tests?
On Sun, Jun 30, 2019 at 12:09:18PM -0400, Tom Lane wrote:
> We could apply the same hack on the source node, but I wonder if it
> wouldn't be better to make this less of a kluge. I'm tempted to
> propose that "pg_regress --config-auth --user XXX" should understand
> XXX as the bootstrap superuser name, and then we could clean up
> 010_dump_connstr.pl by using that.

I have been reviewing that part, and the part to split the bootstrap user from the set of extra roles created looks fine to me. Now, it seems to me that you can simplify 010_dump_connstr.pl as per the attached, because PostgresNode::Init can take care of the --config-auth part with the correct options using auth_extra. What do you think about the cleanup attached?
-- Michael

diff --git a/src/bin/pg_dump/t/010_dump_connstr.pl b/src/bin/pg_dump/t/010_dump_connstr.pl
index 048ecf24eb..e9f0ff8211 100644
--- a/src/bin/pg_dump/t/010_dump_connstr.pl
+++ b/src/bin/pg_dump/t/010_dump_connstr.pl
@@ -177,15 +177,11 @@ my $restore_super = qq{regress_a'b\\c=d\\ne"f};
 # dbname/user connection parameters
 my $envar_node = get_new_node('destination_envar');
-$envar_node->init(extra =>
-	  [ '-U', $dst_bootstrap_super, '--locale=C', '--encoding=LATIN1' ]);
-$envar_node->run_log(
-	[
-		$ENV{PG_REGRESS}, '--config-auth',
-		$envar_node->data_dir, '--user',
-		$dst_bootstrap_super, '--create-role',
-		$restore_super
-	]);
+$envar_node->init(
+	extra =>
+	  [ '-U', $dst_bootstrap_super, '--locale=C', '--encoding=LATIN1' ],
+	auth_extra =>
+	  [ '--user', $dst_bootstrap_super, '--create-role', $restore_super ]);
 $envar_node->start;
 # make superuser for restore
@@ -210,15 +206,11 @@ is($stderr, '', 'no dump errors');
 $restore_super =~ s/"//g
   if $TestLib::windows_os;    # IPC::Run mishandles '"' on Windows
 my $cmdline_node = get_new_node('destination_cmdline');
-$cmdline_node->init(extra =>
-	  [ '-U', $dst_bootstrap_super, '--locale=C', '--encoding=LATIN1' ]);
-$cmdline_node->run_log(
-	[
-		$ENV{PG_REGRESS}, '--config-auth',
-		$cmdline_node->data_dir, '--user',
-		$dst_bootstrap_super, '--create-role',
-		$restore_super
-	]);
+$cmdline_node->init(
+	extra =>
+	  [ '-U', $dst_bootstrap_super, '--locale=C', '--encoding=LATIN1' ],
+	auth_extra =>
+	  [ '--user', $src_bootstrap_super, '--create-role', $restore_super ]);
 $cmdline_node->start;
 $cmdline_node->run_log(
 	[ 'createuser', '-U', $dst_bootstrap_super, '-s', $restore_super ]);

signature.asc Description: PGP signature