Re: pgbench - allow to create partitioned tables

2019-09-18 Thread Fabien COELHO



Hello Amit,


Yes, on -i it will fail because the syntax will not be recognized.


Maybe we should be checking the server version, which would allow us to
produce more useful error messages when these options are used against
older servers, like

if (sversion < 1)
   fprintf(stderr, "cannot use --partitions/--partitions-method
against servers older than 10");

We would also have to check that partition-method=hash is not used against v10.

Maybe overkill?


Yes, I think so: the error detection and messages would be more or less 
replicated from the server and would vary from version to version.


I do not think that it is worth going this path because the use case is 
virtually void as people in 99.9% of cases would use a pgbench matching 
the server version. For those who do not, the error message should be 
clear enough to let them guess what the issue is. Also, it would be 
untestable.


One thing we could eventually do is just to check pgbench version against 
the server version like psql does and output a generic warning if they 
differ, but frankly I do not think it is worth the effort: ISTM that
nobody ever complained about such issues. Also, that would be matter for 
another patch.


--
Fabien.




RE: Proposal: Add more compile-time asserts to expose inconsistencies.

2019-09-18 Thread Smith, Peter
-Original Message-
From: Michael Paquier  Sent: Thursday, 19 September 2019 
11:08 AM

>On Wed, Sep 18, 2019 at 04:46:30PM +0100, Dagfinn Ilmari Mannsåker wrote:
>> Postgres doesn't seem to have it, but it would be possible to define a 
>> StaticAssertDecl macro that can be used at the file level, outside any 
>> function.  See for example Perl's STATIC_ASSERT_DECL:
>> 
>> https://github.com/Perl/perl5/blob/v5.30.0/perl.h#L3455-L3488
>
>That sounds like a cleaner alternative.  Thanks for the pointer.

In the attached patch example I have defined a new macro StaticAssertDecl. A 
typical usage of it is shown in the relpath.c file.

The goal was to leave all existing Postgres static assert macros unchanged, but 
to allow static asserts to be added in the code at file scope without the need 
for the explicit ct_asserts function.

Notice, in reality the StaticAssertDecl macro still uses a static function as a 
wrapper for the StaticAssertStmt,  but now the function is not visible in the 
source.

If this strategy is acceptable I will update my original patch to remove all 
those ct_asserts functions, and instead put each StaticAssertDecl nearby the 
array that it is asserting (e.g. just like relpath.c)

Kind Regards,
Peter Smith
---
Fujitsu Australia


Attachment: add_more_ct_asserts_StaticAssertDecl.patch


Re: Allow CLUSTER, VACUUM FULL and REINDEX to change tablespace on the fly

2019-09-18 Thread Michael Paquier
On Wed, Sep 18, 2019 at 03:46:20PM +0300, Alexey Kondratov wrote:
> Currently in Postgres SET TABLESPACE always comes with [ NOWAIT ] option, so
> I hope it is worth adding this option here for convenience. Added in the new
> version.

It seems to me that it would be good to keep the patch as simple as
possible for its first version, and split it into two if you would
like to add this new option instead of bundling both together.  This
makes the review of one and the other more simple.  Anyway, regarding
the grammar, is SET TABLESPACE really our best choice here?  What
about:
- TABLESPACE = foo, in parentheses only?
- Only using TABLESPACE, without SET at the end of the query?

SET is used in ALTER TABLE per the set of subqueries available there,
but that's not the case of REINDEX.

+-- check that all relations moved to new tablespace
+SELECT relname FROM pg_class
+WHERE reltablespace=(SELECT oid FROM pg_tablespace WHERE
spcname='regress_tblspace')
+AND relname IN ('regress_tblspace_test_tbl_idx');
+relname
+---
+ regress_tblspace_test_tbl_idx
+(1 row)
Just to check one relation you could use \d with the relation (index
or table) name.

-   if (RELATION_IS_OTHER_TEMP(iRel))
-   ereport(ERROR,
-   (errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
-errmsg("cannot reindex temporary tables of other
-   sessions")))
I would keep the order of this operation in order with
CheckTableNotInUse().
--
Michael




Re: Feature request: binary NOTIFY

2019-09-18 Thread Mitar
Hi!

On Wed, Sep 18, 2019 at 9:22 PM Tom Lane  wrote:
> [ shrug... ]  We can put that on the list of things we might want
> to do if the wire protocol ever gets changed.  I urgently recommend
> not holding your breath.

What is the process to add it to the list?

And yes, I will not expect it soon. :-) Thanks.


Mitar

-- 
http://mitar.tnode.com/
https://twitter.com/mitar_m




Re: Feature request: binary NOTIFY

2019-09-18 Thread Tom Lane
Mitar  writes:
> What about adding NOTIFYB and LISTENB commands? And
> NotificationBinaryResponse? For binary?

[ shrug... ]  We can put that on the list of things we might want
to do if the wire protocol ever gets changed.  I urgently recommend
not holding your breath.

regards, tom lane




Re: Optimization of some jsonb functions

2019-09-18 Thread Alvaro Herrera
On 2019-Sep-19, Alvaro Herrera wrote:

> On 2019-Sep-18, Alvaro Herrera wrote:
> 
> > Well, I think that was useless, so I rebased again -- attached.
> 
> ... which is how you find out that 0001 as an independent patch is not
> really a valid one, since it depends on an API change that does not
> happen until 0005.

... and there were other compilation problems too, presumably fixed
silently by Joe in his rebase, but which I fixed again for this series
which now seems more credible.  I tested compile and regression tests
after each patch, it all works locally.

-- 
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
>From d4fcb682356f21652e6adfdf8df741a56e745377 Mon Sep 17 00:00:00 2001
From: Nikita Glukhov 
Date: Thu, 21 Feb 2019 03:04:13 +0300
Subject: [PATCH v4 1/3] Optimize jsonb operator #>> using extracted
 JsonbValueAsText()

---
 src/backend/utils/adt/jsonfuncs.c | 180 ++
 1 file changed, 57 insertions(+), 123 deletions(-)

diff --git a/src/backend/utils/adt/jsonfuncs.c b/src/backend/utils/adt/jsonfuncs.c
index 667f9d9563..64bcf61daa 100644
--- a/src/backend/utils/adt/jsonfuncs.c
+++ b/src/backend/utils/adt/jsonfuncs.c
@@ -747,6 +747,47 @@ json_object_field_text(PG_FUNCTION_ARGS)
 		PG_RETURN_NULL();
 }
 
+static text *
+JsonbValueAsText(JsonbValue *v)
+{
+	switch (v->type)
+	{
+		case jbvNull:
+			return NULL;
+
+		case jbvBool:
+			return v->val.boolean ?
+cstring_to_text_with_len("true", 4) :
+cstring_to_text_with_len("false", 5);
+
+		case jbvString:
+			return cstring_to_text_with_len(v->val.string.val,
+			v->val.string.len);
+
+		case jbvNumeric:
+			{
+Datum		cstr = DirectFunctionCall1(numeric_out,
+	   PointerGetDatum(v->val.numeric));
+
+return cstring_to_text(DatumGetCString(cstr));
+			}
+
+		case jbvBinary:
+			{
+StringInfoData jtext;
+
+initStringInfo(&jtext);
+(void) JsonbToCString(&jtext, v->val.binary.data, -1);
+
+return cstring_to_text_with_len(jtext.data, jtext.len);
+			}
+
+		default:
+			elog(ERROR, "unrecognized jsonb type: %d", (int) v->type);
+			return NULL;
+	}
+}
+
 Datum
 jsonb_object_field_text(PG_FUNCTION_ARGS)
 {
@@ -761,39 +802,9 @@ jsonb_object_field_text(PG_FUNCTION_ARGS)
 	   VARDATA_ANY(key),
 	   VARSIZE_ANY_EXHDR(key));
 
-	if (v != NULL)
-	{
-		text	   *result = NULL;
 
-		switch (v->type)
-		{
-			case jbvNull:
-break;
-			case jbvBool:
-result = cstring_to_text(v->val.boolean ? "true" : "false");
-break;
-			case jbvString:
-result = cstring_to_text_with_len(v->val.string.val, v->val.string.len);
-break;
-			case jbvNumeric:
-result = cstring_to_text(DatumGetCString(DirectFunctionCall1(numeric_out,
-			 PointerGetDatum(v->val.numeric;
-break;
-			case jbvBinary:
-{
-	StringInfo	jtext = makeStringInfo();
-
-	(void) JsonbToCString(jtext, v->val.binary.data, -1);
-	result = cstring_to_text_with_len(jtext->data, jtext->len);
-}
-break;
-			default:
-elog(ERROR, "unrecognized jsonb type: %d", (int) v->type);
-		}
-
-		if (result)
-			PG_RETURN_TEXT_P(result);
-	}
+	if (v != NULL && v->type != jbvNull)
+		PG_RETURN_TEXT_P(JsonbValueAsText(v));
 
 	PG_RETURN_NULL();
 }
@@ -878,39 +889,9 @@ jsonb_array_element_text(PG_FUNCTION_ARGS)
 	}
 
 	v = getIthJsonbValueFromContainer(>root, element);
-	if (v != NULL)
-	{
-		text	   *result = NULL;
 
-		switch (v->type)
-		{
-			case jbvNull:
-break;
-			case jbvBool:
-result = cstring_to_text(v->val.boolean ? "true" : "false");
-break;
-			case jbvString:
-result = cstring_to_text_with_len(v->val.string.val, v->val.string.len);
-break;
-			case jbvNumeric:
-result = cstring_to_text(DatumGetCString(DirectFunctionCall1(numeric_out,
-			 PointerGetDatum(v->val.numeric;
-break;
-			case jbvBinary:
-{
-	StringInfo	jtext = makeStringInfo();
-
-	(void) JsonbToCString(jtext, v->val.binary.data, -1);
-	result = cstring_to_text_with_len(jtext->data, jtext->len);
-}
-break;
-			default:
-elog(ERROR, "unrecognized jsonb type: %d", (int) v->type);
-		}
-
-		if (result)
-			PG_RETURN_TEXT_P(result);
-	}
+	if (v != NULL && v->type != jbvNull)
+		PG_RETURN_TEXT_P(JsonbValueAsText(v));
 
 	PG_RETURN_NULL();
 }
@@ -1388,7 +1369,6 @@ get_jsonb_path_all(FunctionCallInfo fcinfo, bool as_text)
 {
 	Jsonb	   *jb = PG_GETARG_JSONB_P(0);
 	ArrayType  *path = PG_GETARG_ARRAYTYPE_P(1);
-	Jsonb	   *res;
 	Datum	   *pathtext;
 	bool	   *pathnulls;
 	int			npath;
@@ -1396,7 +1376,6 @@ get_jsonb_path_all(FunctionCallInfo fcinfo, bool as_text)
 	bool		have_object = false,
 have_array = false;
 	JsonbValue *jbvp = NULL;
-	JsonbValue	tv;
 	JsonbContainer *container;
 
 	/*
@@ -1508,41 +1487,30 @@ get_jsonb_path_all(FunctionCallInfo fcinfo, bool as_text)
 
 		if (jbvp->type == jbvBinary)
 		{
-			JsonbIterator *it = 

Re: Feature request: binary NOTIFY

2019-09-18 Thread Mitar
Hi!

On Tue, Sep 17, 2019 at 7:10 AM Tom Lane  wrote:
> Yeah it is ... the internal async-queue data structure assumes
> null-terminated strings.  What's a lot worse, so does the
> wire protocol's NotificationResponse message, as well as every
> existing client that can read it.  (For instance, libpq's exposed
> API for notify messages hands back the payload as a null-terminated
> string.)  I don't think this is going to happen.

Ahh. Any particular reason for this design decision at that time?

What about adding NOTIFYB and LISTENB commands? And
NotificationBinaryResponse? For binary?


Mitar

-- 
http://mitar.tnode.com/
https://twitter.com/mitar_m




Re: Optimization of some jsonb functions

2019-09-18 Thread Alvaro Herrera
On 2019-Sep-18, Alvaro Herrera wrote:

> Well, I think that was useless, so I rebased again -- attached.

... which is how you find out that 0001 as an independent patch is not
really a valid one, since it depends on an API change that does not
happen until 0005.

-- 
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: [HACKERS] [PATCH] pageinspect function to decode infomasks

2019-09-18 Thread Amit Kapila
On Thu, Sep 19, 2019 at 7:34 AM Michael Paquier  wrote:
>
> On Wed, Sep 18, 2019 at 10:37:27AM +0900, Michael Paquier wrote:
> > I am attaching an updated patch for now that I would like to commit.
> > Are there more comments about the shape of the patch, the name of the
> > columns for the function, etc.?
>
> Okay, I have done an extra round of review, and committed it.
>

Thanks.  I was about to have a look today, but anyway I checked the
committed patch and it looks fine.

-- 
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com




Re: Proposal: Add more compile-time asserts to expose inconsistencies.

2019-09-18 Thread Kyotaro Horiguchi
At Thu, 19 Sep 2019 10:07:40 +0900, Michael Paquier  wrote 
in <20190919010740.gc22...@paquier.xyz>
> On Wed, Sep 18, 2019 at 04:46:30PM +0100, Dagfinn Ilmari Mannsåker wrote:
> > Postgres doesn't seem to have it, but it would be possible to define a
> > StaticAssertDecl macro that can be used at the file level, outside any
> > function.  See for example Perl's STATIC_ASSERT_DECL:
> > 
> > https://github.com/Perl/perl5/blob/v5.30.0/perl.h#L3455-L3488
> 
> That sounds like a cleaner alternative.  Thanks for the pointer.

The reason StaticAssertStmt is not usable outside of functions is the
enclosing do-while, which is needed to avoid the "mixed declarations
and code" warnings that we currently do not allow. Therefore, just
defining another macro as plain _Static_assert() works fine.

I could not find an alternative for tool chains that lack the static
assertion feature. In the attached diff the macro is defined as nothing
in that case; I could not find a way to warn that the assertion is
ignored.

regards.
diff --git a/src/backend/utils/misc/guc.c b/src/backend/utils/misc/guc.c
index 90ffd89339..822b9846bf 100644
--- a/src/backend/utils/misc/guc.c
+++ b/src/backend/utils/misc/guc.c
@@ -4601,7 +4601,6 @@ static void write_auto_conf_file(int fd, const char *filename, ConfigVariable *h
 static void replace_auto_config_value(ConfigVariable **head_p, ConfigVariable **tail_p,
 	  const char *name, const char *value);
 
-
 /*
  * Some infrastructure for checking malloc/strdup/realloc calls
  */
diff --git a/src/include/c.h b/src/include/c.h
index f461628a24..a386a81a19 100644
--- a/src/include/c.h
+++ b/src/include/c.h
@@ -836,11 +836,14 @@ extern void ExceptionalCondition(const char *conditionName,
 #ifdef HAVE__STATIC_ASSERT
 #define StaticAssertStmt(condition, errmessage) \
 	do { _Static_assert(condition, errmessage); } while(0)
+#define StaticAssertDecl(condition, errmessage) \
+	_Static_assert(condition, errmessage)
 #define StaticAssertExpr(condition, errmessage) \
 	((void) ({ StaticAssertStmt(condition, errmessage); true; }))
 #else			/* !HAVE__STATIC_ASSERT */
 #define StaticAssertStmt(condition, errmessage) \
 	((void) sizeof(struct { int static_assert_failure : (condition) ? 1 : -1; }))
+#define StaticAssertDecl(condition, errmessage) /* no alternatives exist */
 #define StaticAssertExpr(condition, errmessage) \
 	StaticAssertStmt(condition, errmessage)
 #endif			/* HAVE__STATIC_ASSERT */
@@ -848,11 +851,14 @@ extern void ExceptionalCondition(const char *conditionName,
 #if defined(__cpp_static_assert) && __cpp_static_assert >= 200410
 #define StaticAssertStmt(condition, errmessage) \
 	static_assert(condition, errmessage)
+#define StaticAssertDecl(condition, errmessage) \
+	static_assert(condition, errmessage)
 #define StaticAssertExpr(condition, errmessage) \
 	({ static_assert(condition, errmessage); })
 #else
 #define StaticAssertStmt(condition, errmessage) \
 	do { struct static_assert_struct { int static_assert_failure : (condition) ? 1 : -1; }; } while(0)
+#define StaticAssertDecl(condition, errmessage) /* no alternatives exist */
 #define StaticAssertExpr(condition, errmessage) \
 	((void) ({ StaticAssertStmt(condition, errmessage); }))
 #endif
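To make the file-scope limitation concrete, a minimal sketch (assuming a compiler
with C11 _Static_assert, and the macros defined as in the diff above):

/* At file scope, outside any function: */

/* StaticAssertStmt(sizeof(int) >= 4, "int too small");   -- will not compile:
 * the do { ... } while(0) wrapper is only legal inside a function body. */

StaticAssertDecl(sizeof(int) >= 4, "int too small");	/* compiles fine */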


Re: Zedstore - compressed in-core columnar storage

2019-09-18 Thread Alexandra Wang
On Tue, Sep 17, 2019 at 4:15 AM Ashutosh Sharma wrote:

> create table t1(a int, b int) using zedstore;
> insert into t1 select i, i+10 from generate_series(1, 100) i;
> postgres=# update t1 set b = 200;
> server closed the connection unexpectedly
> This probably means the server terminated abnormally
> before or while processing the request.
> The connection to the server was lost. Attempting reset: Failed.
>
> Above update statement crashed due to some extensive memory leak.
>

Thank you for reporting! We have located the memory leak and also
noticed some other memory-related bugs. We are working on the fixes,
so please stay tuned!


> I also found some typos when going through the writeup in
> zedstore_internal.h and thought of correcting those. Attached is the
> patch with the changes.
>

Applied. Thank you!


Re: pgbench - allow to create partitioned tables

2019-09-18 Thread Amit Langote
Hi Fabien,

On Thu, Sep 19, 2019 at 2:03 AM Fabien COELHO  wrote:
> > If that is the case, then I think if user gives --partitions for the old
> > server version, it will also give an error?
>
> Yes, on -i it will fail because the syntax will not be recognized.

Maybe we should be checking the server version, which would allow us to
produce more useful error messages when these options are used against
older servers, like

if (sversion < 1)
fprintf(stderr, "cannot use --partitions/--partitions-method
against servers older than 10");

We would also have to check that partition-method=hash is not used against v10.

Maybe overkill?
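For what it's worth, a fleshed-out version of that sketch might use libpq's
PQserverVersion(), which reports 100000 for version 10 and 110000 for version 11
(illustrative only; the connection/option variable names here are made up, not
taken from the actual pgbench patch, and hash partitioning requires v11):

	int			sversion = PQserverVersion(con);

	if (partitions > 0 && sversion < 100000)
	{
		fprintf(stderr,
				"cannot use --partitions/--partition-method against servers older than 10\n");
		exit(1);
	}
	if (strcmp(partition_method, "hash") == 0 && sversion < 110000)
	{
		fprintf(stderr,
				"--partition-method=hash requires server version 11 or newer\n");
		exit(1);
	}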

Thanks,
Amit




Re: [HACKERS] [PATCH] pageinspect function to decode infomasks

2019-09-18 Thread Michael Paquier
On Wed, Sep 18, 2019 at 10:37:27AM +0900, Michael Paquier wrote:
> I am attaching an updated patch for now that I would like to commit.
> Are there more comments about the shape of the patch, the name of the
> columns for the function, etc.?

Okay, I have done an extra round of review, and committed it.
--
Michael




Re: backup manifests

2019-09-18 Thread David Steele
Hi Robert,

On 9/18/19 1:48 PM, Robert Haas wrote:
> That whole approach might still be dead on
> arrival if it's possible to add new blocks with old LSNs to existing
> files,[7] but there seems to be room to hope that there are no such
> cases.[8]

I sure hope there are no such cases, but we should be open to the idea
just in case.

> So, let's suppose we invent a backup manifest. What should it contain?
> I imagine that it would consist of a list of files, and the lengths of
> those files, and a checksum for each file. 

These are essential.

Also consider adding the timestamp.  You have justifiable concerns about
using timestamps for deltas and I get that.  However, there are a number
of methods that can be employed to make it *much* safer.  I won't go
into that here since it is an entire thread in itself.  Suffice to say
we can detect many anomalies in the timestamps and require a checksum
backup when we see them.  I'm really interested in scanning the WAL for
changed files but that method is very complex and getting it right might
be harder than ensuring FS checksums are reliable.  Still worth trying,
though, since the benefits are enormous.  We are planning to use
timestamp + size + wal data to do incrementals if we get there.

Consider adding a reference to each file that specifies where the file
can be found in if it is not in this backup.  As I understand the
pg_basebackup proposal, it would only be implementing differential
backups, i.e. an incremental that is *only* based on the last full
backup.  So, the reference can be inferred in this case.  However, if
the user selects the wrong full backup on restore, and we have labeled
each backup, then a differential restore with references against the
wrong full backup would result in a hard error rather than corruption.

> I think you should have a
> choice of what kind of checksums to use, because algorithms that used
> to seem like good choices (e.g. MD5) no longer do; this trend can
> probably be expected to continue. Even if we initially support only
> one kind of checksum -- presumably SHA-something since we have code
> for that already for SCRAM -- I think that it would also be a good
> idea to allow for future changes. And maybe it's best to just allow a
> choice of SHA-224, SHA-256, SHA-384, and SHA-512 right out of the
> gate, so that we can avoid bikeshedding over which one is secure
> enough. I guess we'll still have to argue about the default. 

Based on my original calculations (which sadly I don't have anymore),
the combination of SHA1, size, and file name is *extremely* unlikely to
generate a collision.  As in, unlikely to happen before the end of the
universe kind of unlikely.  Though, I guess it depends on your
expectations for the lifetime of the universe.

These checksums don't have to be cryptographically secure, in the sense
that it wouldn't matter if the plaintext could be inferred from the
checksum.  They just need to have a suitably low collision rate.  These
days I would choose something with more bits because the computation
time is similar, though the larger size requires more storage.
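A rough birthday-bound sketch of why such a collision is so unlikely, assuming a
well-behaved 160-bit hash such as SHA1 and k files in the backup:

    P(collision) ~= k^2 / (2 * 2^160)

    k = 10^9 files  =>  P ~= 10^18 / 2^161 ~= 3e-31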

> I also
> think that it should be possible to build a manifest with no
> checksums, so that one need not pay the overhead of computing
> checksums if one does not wish. 

Our benchmarks have indicated that checksums only account for about 1%
of total cpu time when gzip -6 compression is used.  Without compression
the percentage may be higher of course, but in that case we find network
latency is the primary bottleneck.

For S3 backups we do a SHA1 hash for our manifest, a SHA256 hash for
authv4 and a good-old-fashioned MD5 checksum for each upload part.  This
is barely noticeable when compression is enabled.

> Of course, such a manifest is of much
> less utility for checking backup integrity, but you can still check
> that you've got the right files, which is noticeably better than
> nothing.  

Absolutely -- and yet.  There was a time when we made checksums optional
but eventually gave up on that once we profiled and realized how low the
cost was vs. the benefit.

> The manifest should probably also contain a checksum of its
> own contents so that the integrity of the manifest itself can be
> verified. 

This is a good idea.  Amazingly we've never seen a manifest checksum
error in the field but it's only a matter of time.

> And maybe a few other bits of metadata, but I'm not sure
> exactly what.  Ideas?

A backup label for sure.  You can also use this as the directory/tar
name to save the user coming up with one.  We use MMDDHH24MMSSF for
full backups and MMDDHH24MMSSF_MMDDHH24MMSS(D|I) for
incrementals and have logic to prevent two backups from having the same
label.  This is unlikely outside of testing but still a good idea.

Knowing the start/stop time of the backup is useful in all kinds of
ways, especially monitoring and time-targeted PITR.  Start/stop LSN is
also good.  I know this is also in backup_label but having it all in one
place is nice.
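Pulling the pieces above together, a manifest might carry something like the
following fields (purely illustrative; the struct and field names are invented
here, not part of any proposed pg_basebackup format, and List/XLogRecPtr are the
usual PostgreSQL types):

/* Illustrative only -- one per-file entry plus manifest-level metadata. */
typedef struct ManifestFileEntry
{
	char	   *path;			/* file name relative to the data directory */
	uint64		size;			/* length in bytes */
	time_t		mtime;			/* last-modification timestamp */
	char	   *checksum_algo;	/* e.g. "sha256", or NULL if checksums disabled */
	char	   *checksum;		/* hex digest of the file contents */
	char	   *reference;		/* label of the backup that holds this file,
								 * if it is not stored in this (differential) backup */
} ManifestFileEntry;

typedef struct Manifest
{
	char	   *backup_label;	/* unique name, doubles as directory/tar name */
	time_t		start_time;
	time_t		stop_time;
	XLogRecPtr	start_lsn;
	XLogRecPtr	stop_lsn;
	List	   *files;			/* list of ManifestFileEntry */
	char	   *manifest_checksum;	/* checksum over the manifest itself */
} Manifest;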


Re: Proposal: Add more compile-time asserts to expose inconsistencies.

2019-09-18 Thread Michael Paquier
On Wed, Sep 18, 2019 at 04:46:30PM +0100, Dagfinn Ilmari Mannsåker wrote:
> Postgres doesn't seem to have it, but it would be possible to define a
> StaticAssertDecl macro that can be used at the file level, outside any
> function.  See for example Perl's STATIC_ASSERT_DECL:
> 
> https://github.com/Perl/perl5/blob/v5.30.0/perl.h#L3455-L3488

That sounds like a cleaner alternative.  Thanks for the pointer.
--
Michael




RE: Proposal: Add more compile-time asserts to expose inconsistencies.

2019-09-18 Thread Smith, Peter
-Original Message-
From: Michael Paquier   Sent: Wednesday, 18 September 2019 
5:40 PM

> For some of them it could help, and we could think about a better location 
> for that stuff than an unused routine.  The indentation of your patch is 
> weird, with "git diff --check" complaining a lot.
>
> If you want to discuss more about that, could you add that to the next commit 
> fest?  Here it is: https://commitfest.postgresql.org/25/

Hi Michael,

Thanks for your feedback and advice.

I have modified the patch to clean up the whitespace issues, and added it to 
the next commit fest.

Kind Regards,
---
Peter Smith
Fujitsu Australia


Attachment: add_more_ct_asserts.patch


RE: [PATCH] Speedup truncates of relation forks

2019-09-18 Thread Jamison, Kirk
On Wednesday, September 18, 2019 8:38 PM, Fujii Masao wrote:
> On Tue, Sep 17, 2019 at 10:44 AM Jamison, Kirk 
> wrote:
> >
> > On Friday, September 13, 2019 10:06 PM (GMT+9), Fujii Masao wrote:
> > > On Fri, Sep 13, 2019 at 9:51 PM Alvaro Herrera
> > > 
> > > wrote:
> > > >
> > > > On 2019-Sep-13, Fujii Masao wrote:
> > > >
> > > > > On Mon, Sep 9, 2019 at 3:52 PM Jamison, Kirk
> > > > > 
> > > wrote:
> > > >
> > > > > > > Please add a preliminary patch that removes the function.
> > > > > > > Dead code is good, as long as it is gone.  We can get it
> > > > > > > pushed ahead of
> > > the rest of this.
> > > > > >
> > > > > > Alright. I've attached a separate patch removing the
> smgrdounlinkfork.
> > > > >
> > > > > Per the past discussion, some people want to keep this "dead"
> > > > > function for some reasons. So, in my opinion, it's better to
> > > > > just enclose the function with #if NOT_USED and #endif, to keep
> > > > > the function itself as it is, and then to start new discussion
> > > > > on hackers about the removal of that separatedly from this patch.
> > > >
> > > > I searched for anybody requesting to keep the function.  I
> > > > couldn't find anything.  Tom said in 2012:
> > > > https://www.postgresql.org/message-id/1471.1339106...@sss.pgh.pa.u
> > > > s
> > >
> > > Yes. And I found Andres.
> > > https://www.postgresql.org/message-id/20180621174129.hogefyopje4xazn
> > > u@al
> > > ap3.anarazel.de
> > >
> > > > > As committed, the smgrdounlinkfork case is actually dead code;
> > > > > it's never called from anywhere.  I left it in place just in
> > > > > case we want it someday.
> > > >
> > > > but if no use has appeared in 7 years, I say it's time to kill it.
> > >
> > > +1
> >
> > The consensus is we remove it, right?
> > Re-attaching the patch that removes the deadcode: smgrdounlinkfork().
> >
> > ---
> > I've also fixed Fujii-san's comments below in the latest attached speedup
> truncate rel patch (v8).
> 
> Thanks for updating the patch!
> 
> + block = visibilitymap_truncate_prepare(rel, 0);
> + if (BlockNumberIsValid(block))
>   {
> - xl_smgr_truncate xlrec;
> + fork = VISIBILITYMAP_FORKNUM;
> + smgrtruncate(rel->rd_smgr, &fork, 1, &block);
> +
> + if (RelationNeedsWAL(rel))
> + {
> + xl_smgr_truncate xlrec;
> 
> I don't think this fix is right. Originally, WAL is generated even in the
> case where visibilitymap_truncate_prepare() returns InvalidBlockNumber. But
> the patch unexpectedly changed the logic so that WAL is not generated in that
> case.
> 
> + if (fsm)
> + FreeSpaceMapVacuumRange(rel, first_removed_nblocks,
> + InvalidBlockNumber);
> 
> This code means that FreeSpaceMapVacuumRange() is called if FSM exists even
> if FreeSpaceMapLocateBlock() returns InvalidBlockNumber.
> This seems not right. Originally, FreeSpaceMapVacuumRange() is not called
> in the case where InvalidBlockNumber is returned.
> 
> So I updated the patch based on yours and fixed the above issues.
> Attached. Could you review this one? If there is no issue in that, I'm 
> thinking
> to commit that.

Oops. Thanks for catching and correcting my fix, and for revising some of the
descriptions. I also noticed you reordered the truncation of the forks, so that
the main fork will be truncated first instead of the FSM. I'm not sure if the
order matters now given that we're truncating the forks simultaneously, so I'm
ok with that change.

Just one minor comment:
+ * Return the number of blocks of new FSM after it's truncated.

"after it's truncated" is quite confusing. 
How about, "as a result of previous truncation" or just end the sentence after 
new FSM?


Thank you for committing the other patch as well!


Regards,
Kirk Jamison


Re: Define jsonpath functions as stable

2019-09-18 Thread Tom Lane
Chapman Flack  writes:
> On 09/18/19 17:12, Tom Lane wrote:
>> As such, I think this doesn't apply to SQL/JSON.  The SQL/JSON spec
>> seems to defer to Javascript/ECMAscript for syntax details, and
>> in either of those languages you have backslash escape sequences
>> for writing weird characters, *not* XML entities.  You certainly
>> wouldn't have use of such entities in a native implementation of
>> LIKE_REGEX in SQL.

> So yeah, that seems to be correct.

Thanks for double-checking.  I removed that para from the patch.

>> So now I'm thinking we can just remove the handwaving about entities.
>> On the other hand, this points up a large gap in our docs about
>> SQL/JSON, which is that nowhere does it even address the question of
>> what the string literal syntax is within a path expression.

> That does seem like it ought to be covered.

I found a spot that seemed like a reasonable place, and added some
coverage of the point.  Updated patch attached.

It seems to me that there are some discrepancies between what the spec
says and what jsonpath_scan.l actually does, so maybe we should take a
hard look at that code too.  The biggest issue is that jsonpath_scan.l
seems to allow single- and double-quoted strings interchangeably, which is
OK per ECMAScript, but then the SQL/JSON spec seems to be saying that only
double-quoted strings are allowed.  I'd rather be conservative about this
than get out in front of the spec and use syntax space that they might do
something else with someday.

regards, tom lane

diff --git a/doc/src/sgml/func.sgml b/doc/src/sgml/func.sgml
index 2b4fe0c..16e41a6 100644
--- a/doc/src/sgml/func.sgml
+++ b/doc/src/sgml/func.sgml
@@ -5970,6 +5970,145 @@ SELECT regexp_match('abc01234xyz', '(?:(.*?)(\d+)(.*)){1,1}');
 
 
 
+   
+   Differences From XQuery (LIKE_REGEX)
+
+   
+LIKE_REGEX
+   
+
+   
+XQuery regular expressions
+   
+
+
+ Since SQL:2008, the SQL standard includes
+ a LIKE_REGEX operator that performs pattern
+ matching according to the XQuery regular expression
+ standard.  PostgreSQL does not yet
+ implement this operator, but you can get very similar behavior using
+ the regexp_match() function, since XQuery
+ regular expressions are quite close to the ARE syntax described above.
+
+
+
+ Notable differences between the existing POSIX-based
+ regular-expression feature and XQuery regular expressions include:
+
+ 
+  
+   
+XQuery character class subtraction is not supported.  An example of
+this feature is using the following to match only English
+consonants: [a-z-[aeiou]].
+   
+  
+  
+   
+XQuery character class shorthands \c,
+\C, \i,
+and \I are not supported.
+   
+  
+  
+   
+XQuery character class elements
+using \p{UnicodeProperty} or the
+inverse \P{UnicodeProperty} are not supported.
+   
+  
+  
+   
+POSIX interprets character classes such as \w
+(see )
+according to the prevailing locale (which you can control by
+attaching a COLLATE clause to the operator or
+function).  XQuery specifies these classes by reference to Unicode
+character properties, so equivalent behavior is obtained only with
+a locale that follows the Unicode rules.
+   
+  
+  
+   
+The SQL standard (not XQuery itself) attempts to cater for more
+variants of newline than POSIX does.  The
+newline-sensitive matching options described above consider only
+ASCII NL (\n) to be a newline, but SQL would have
+us treat CR (\r), CRLF (\r\n)
+(a Windows-style newline), and some Unicode-only characters like
+LINE SEPARATOR (U+2028) as newlines as well.
+Notably, . and \s should
+count \r\n as one character not two according to
+SQL.
+   
+  
+  
+   
+Of the character-entry escapes described in
+,
+XQuery supports only \n, \r,
+and \t.
+   
+  
+  
+   
+XQuery does not support
+the [:name:] syntax
+for character classes within bracket expressions.
+   
+  
+  
+   
+XQuery does not have lookahead or lookbehind constraints,
+nor any of the constraint escapes described in
+.
+   
+  
+  
+   
+The metasyntax forms described in 
+do not exist in XQuery.
+   
+  
+  
+   
+The regular expression flag letters defined by XQuery are
+related to but not the same as the option letters for POSIX
+().  While the
+i and q options behave the
+same, others do not:
+
+ 
+  
+   XQuery's s (allow dot to match newline)
+   and m (allow ^
+   and $ to match at newlines) flags provide
+   

Re: [PATCH] src/test/modules/dummy_index -- way to test reloptions from inside of access method

2019-09-18 Thread Thomas Munro
On Thu, Sep 19, 2019 at 7:58 AM Nikolay Shaplov  wrote:
On Fri, 2 Aug 2019 11:12:35 +1200,
Thomas Munro wrote:
>
> > While moving this to the September CF, I noticed this failure:
> >
> > test reloptions   ... FAILED   32 ms
>
> Do you have any idea, how to reproduce this? I tried this patch on
> current master, and did not get result you are talking about.
> Is it still there for you BTW?

Hi Nikolay,

Yeah, it's still happening on Travis:

https://travis-ci.org/postgresql-cfbot/postgresql/builds/586714100

Although the "reloptions" tests passes when it's run as part of the
regular test schedule (ie "make check"), the patch also runs it from
src/test/modules/dummy_index_am/Makefile ("REGRESS = reloptions"), and
when it runs in that context it fails.  Cfbot is simply running "make
check-world".

Let's see if I can see this on my Mac... yep, with "make -C
src/test/modules/dummy_index_am directory check" I see a similar
failure with "ERROR:  unrecognized lock mode: 2139062143".  I changed
that to a PANIC and got a core file containing this stack:

frame #4: 0x0001051e6572 postgres`elog_finish(elevel=22,
fmt="unrecognized lock mode: %d") at elog.c:1365:2
frame #5: 0x000104ff033a
postgres`LockAcquireExtended(locktag=0x7ffeeb14bc28,
lockmode=2139062143, sessionLock=false, dontWait=false,
reportMemoryError=true, locallockp=0x7ffeeb14bc20) at lock.c:756:3
frame #6: 0x000104fedaed postgres`LockRelationOid(relid=16397,
lockmode=2139062143) at lmgr.c:116:8
frame #7: 0x000104c056f2
postgres`RangeVarGetRelidExtended(relation=0x7fbd0f000b58,
lockmode=2139062143, flags=0,
callback=(postgres`RangeVarCallbackForAlterRelation at
tablecmds.c:14834), callback_arg=0x7fbd0f000d60) at
namespace.c:379:4
frame #8: 0x000104d4b14d
postgres`AlterTableLookupRelation(stmt=0x7fbd0f000d60,
lockmode=2139062143) at tablecmds.c:3445:9
frame #9: 0x00010501ff8b
postgres`ProcessUtilitySlow(pstate=0x7fbd10800d18,
pstmt=0x7fbd0f0010b0, queryString="ALTER INDEX test_idx SET
(int_option = 3);", context=PROCESS_UTILITY_TOPLEVEL,
params=0x, queryEnv=0x,
dest=0x7fbd0f0011a0, completionTag="") at utility.c::14
frame #10: 0x00010501f480
postgres`standard_ProcessUtility(pstmt=0x7fbd0f0010b0,
queryString="ALTER INDEX test_idx SET (int_option = 3);",
context=PROCESS_UTILITY_TOPLEVEL, params=0x,
queryEnv=0x, dest=0x7fbd0f0011a0,
completionTag="") at utility.c:927:4

AlterTableGetLockLevel() returns that crazy lockmode value, becase it
calls AlterTableGetRelOptionsLockLevel(), I suspect with a garbage
defList, but I didn't dig further.
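For what it's worth, the bogus lock mode itself points the same way (a sketch of
the arithmetic, assuming an assert-enabled build):

/* 2139062143 == 0x7F7F7F7F: every byte is 0x7F, the pattern that
 * assert-enabled builds write over freed palloc'd memory (wipe_mem),
 * so the lockmode was very likely read from clobbered/uninitialized
 * memory -- consistent with the garbage-defList suspicion above. */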

-- 
Thomas Munro
https://enterprisedb.com




Re: Define jsonpath functions as stable

2019-09-18 Thread Chapman Flack
On 09/18/19 17:12, Tom Lane wrote:

> After further reading, it seems like what that text is talking about
> is not actually a regex feature, but an outgrowth of the fact that
> the regex pattern is being expressed as a string literal in a language
> for which XML character entities are a native aspect of the string
> literal syntax.  So it looks to me like the entities get folded to
> raw characters in a string-literal parser before the regex engine
> ever sees them.

Hmm. That occurred to me too, but I thought the explicit mention of
'character reference' in the section specific to regexes[1] might not
mean that. It certainly could have been clearer.

But you seem to have the practical agreement of both BaseX:

let $foo := codepoints-to-string((38,35,120,54,49,59))
return ($foo, matches('a', $foo))
--
&#x61;
false

and the Saxon-based pljava example:

select occurrences_regex('&#x61;', 'a', w3cNewlines => true);
 occurrences_regex
---
 0

> As such, I think this doesn't apply to SQL/JSON.  The SQL/JSON spec
> seems to defer to Javascript/ECMAscript for syntax details, and
> in either of those languages you have backslash escape sequences
> for writing weird characters, *not* XML entities.  You certainly
> wouldn't have use of such entities in a native implementation of
> LIKE_REGEX in SQL.

So yeah, that seems to be correct.

The upshot seems to be a two-parter:

1. Whatever string literal syntax is used in front of the regex engine
   had better have some way to represent any character you could want
   to match, and
2. There is only one way to literally match a character that is a regex
   metacharacter, namely, to precede it with a backslash (that the regex
   engine will see; therefore doubled if necessary). Whatever codepoint
   escape form might be available in the string literal syntax does not
   offer another way to do that, because it happens too early, before
   the regex engine can see it.

> So now I'm thinking we can just remove the handwaving about entities.
> On the other hand, this points up a large gap in our docs about
> SQL/JSON, which is that nowhere does it even address the question of
> what the string literal syntax is within a path expression.

That does seem like it ought to be covered.

Regards,
-Chap




Re: Optimization of some jsonb functions

2019-09-18 Thread Alvaro Herrera
On 2019-Jul-26, Joe Nelson wrote:

> Thomas Munro wrote:
> > This doesn't apply -- to attract reviewers, could we please have a rebase?
> 
> To help the review go forward, I have rebased the patch on 27cd521e6e.
> It passes `make check` for me, but that's as far as I've verified the
> correctness.
> 
> I squashed the changes into a single patch, sorry if that makes it
> harder to review than the original set of five patch files...

Well, I think that was useless, so I rebased again -- attached.
(Thanks, git-imerge).

-- 
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
>From 26ef2d6940dbba84b5e027f7d3a19f9dce948c7e Mon Sep 17 00:00:00 2001
From: Nikita Glukhov 
Date: Thu, 21 Feb 2019 02:52:24 +0300
Subject: [PATCH v3 1/5] Optimize JsonbExtractScalar()

---
 src/backend/utils/adt/jsonb.c | 25 +++--
 1 file changed, 3 insertions(+), 22 deletions(-)

diff --git a/src/backend/utils/adt/jsonb.c b/src/backend/utils/adt/jsonb.c
index 69f41ab455..9e1ad0a097 100644
--- a/src/backend/utils/adt/jsonb.c
+++ b/src/backend/utils/adt/jsonb.c
@@ -1873,9 +1873,7 @@ jsonb_object_agg_finalfn(PG_FUNCTION_ARGS)
 bool
 JsonbExtractScalar(JsonbContainer *jbc, JsonbValue *res)
 {
-	JsonbIterator *it;
-	JsonbIteratorToken tok PG_USED_FOR_ASSERTS_ONLY;
-	JsonbValue	tmp;
+	JsonbValue	*scalar PG_USED_FOR_ASSERTS_ONLY;
 
 	if (!JsonContainerIsArray(jbc) || !JsonContainerIsScalar(jbc))
 	{
@@ -1884,25 +1882,8 @@ JsonbExtractScalar(JsonbContainer *jbc, JsonbValue *res)
 		return false;
 	}
 
-	/*
-	 * A root scalar is stored as an array of one element, so we get the array
-	 * and then its first (and only) member.
-	 */
-	it = JsonbIteratorInit(jbc);
-
-	tok = JsonbIteratorNext(&it, &tmp, true);
-	Assert(tok == WJB_BEGIN_ARRAY);
-	Assert(tmp.val.array.nElems == 1 && tmp.val.array.rawScalar);
-
-	tok = JsonbIteratorNext(&it, res, true);
-	Assert(tok == WJB_ELEM);
-	Assert(IsAJsonbScalar(res));
-
-	tok = JsonbIteratorNext(&it, &tmp, true);
-	Assert(tok == WJB_END_ARRAY);
-
-	tok = JsonbIteratorNext(&it, &tmp, true);
-	Assert(tok == WJB_DONE);
+	scalar = getIthJsonbValueFromContainer(jbc, 0, res);
+	Assert(scalar);
 
 	return true;
 }
-- 
2.17.1

>From 7010862a80420f6407badbd814ef1378b0bca290 Mon Sep 17 00:00:00 2001
From: Nikita Glukhov 
Date: Thu, 21 Feb 2019 03:04:13 +0300
Subject: [PATCH v3 2/5] Optimize jsonb operator #>> using extracted
 JsonbValueAsText()

---
 src/backend/utils/adt/jsonfuncs.c | 164 +-
 1 file changed, 50 insertions(+), 114 deletions(-)

diff --git a/src/backend/utils/adt/jsonfuncs.c b/src/backend/utils/adt/jsonfuncs.c
index 667f9d9563..c7f71408d5 100644
--- a/src/backend/utils/adt/jsonfuncs.c
+++ b/src/backend/utils/adt/jsonfuncs.c
@@ -747,6 +747,47 @@ json_object_field_text(PG_FUNCTION_ARGS)
 		PG_RETURN_NULL();
 }
 
+static text *
+JsonbValueAsText(JsonbValue *v)
+{
+	switch (v->type)
+	{
+		case jbvNull:
+			return NULL;
+
+		case jbvBool:
+			return v->val.boolean ?
+cstring_to_text_with_len("true", 4) :
+cstring_to_text_with_len("false", 5);
+
+		case jbvString:
+			return cstring_to_text_with_len(v->val.string.val,
+			v->val.string.len);
+
+		case jbvNumeric:
+			{
+Datum		cstr = DirectFunctionCall1(numeric_out,
+	   PointerGetDatum(v->val.numeric));
+
+return cstring_to_text(DatumGetCString(cstr));
+			}
+
+		case jbvBinary:
+			{
+StringInfoData jtext;
+
+initStringInfo(&jtext);
+(void) JsonbToCString(&jtext, v->val.binary.data, -1);
+
+return cstring_to_text_with_len(jtext.data, jtext.len);
+			}
+
+		default:
+			elog(ERROR, "unrecognized jsonb type: %d", (int) v->type);
+			return NULL;
+	}
+}
+
 Datum
 jsonb_object_field_text(PG_FUNCTION_ARGS)
 {
@@ -761,39 +802,9 @@ jsonb_object_field_text(PG_FUNCTION_ARGS)
 	   VARDATA_ANY(key),
 	   VARSIZE_ANY_EXHDR(key));
 
-	if (v != NULL)
-	{
-		text	   *result = NULL;
 
-		switch (v->type)
-		{
-			case jbvNull:
-break;
-			case jbvBool:
-result = cstring_to_text(v->val.boolean ? "true" : "false");
-break;
-			case jbvString:
-result = cstring_to_text_with_len(v->val.string.val, v->val.string.len);
-break;
-			case jbvNumeric:
-result = cstring_to_text(DatumGetCString(DirectFunctionCall1(numeric_out,
-			 PointerGetDatum(v->val.numeric;
-break;
-			case jbvBinary:
-{
-	StringInfo	jtext = makeStringInfo();
-
-	(void) JsonbToCString(jtext, v->val.binary.data, -1);
-	result = cstring_to_text_with_len(jtext->data, jtext->len);
-}
-break;
-			default:
-elog(ERROR, "unrecognized jsonb type: %d", (int) v->type);
-		}
-
-		if (result)
-			PG_RETURN_TEXT_P(result);
-	}
+	if (v != NULL && v->type != jbvNull)
+		PG_RETURN_TEXT_P(JsonbValueAsText(v));
 
 	PG_RETURN_NULL();
 }
@@ -878,39 +889,9 @@ jsonb_array_element_text(PG_FUNCTION_ARGS)
 	}
 
 	v = getIthJsonbValueFromContainer(>root, element);
-	if (v != NULL)
-	{
-		text	   *result = 

Re: subscriptionCheck failures on nightjar

2019-09-18 Thread Tomas Vondra

On Wed, Sep 18, 2019 at 04:25:14PM +0530, Kuntal Ghosh wrote:

Hello Michael,

On Wed, Sep 18, 2019 at 6:28 AM Michael Paquier  wrote:


On my side, I have let this thing run for a couple of hours with a
patched version to include a sleep between the rename and the sync but
I could not reproduce it either:
#!/bin/bash
attempt=0
while true; do
attempt=$((attempt+1))
echo "Attempt $attempt"
cd $HOME/postgres/src/test/recovery/
PROVE_TESTS=t/006_logical_decoding.pl make check > /dev/null 2>&1
ERRNUM=$?
if [ $ERRNUM != 0 ]; then
echo "Failed at attempt $attempt"
exit $ERRNUM
fi
done

I think the failing test is src/test/subscription/t/010_truncate.pl.
I've tried to reproduce the same failure using your script in OS X
10.14 and Ubuntu 18.04.2 (Linux version 5.0.0-23-generic), but
couldn't reproduce the same.



I kinda suspect it might be just a coincidence that it fails during that
particular test. What likely plays a role here is checkpoint timing
(AFAICS that's the thing removing the file).  On most systems the tests
complete before any checkpoint is triggered, hence no issue.

Maybe aggressively triggering checkpoints on the running cluster from
another session would do the trick ...

regards

--
Tomas Vondra  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services 





Re: Memory Accounting

2019-09-18 Thread Melanie Plageman
I wanted to address a couple of questions Jeff asked me off-list about
Greenplum's implementations of Memory Accounting.

Greenplum has two memory accounting sub-systems -- one is the
MemoryContext-based system proposed here.
The other memory accounting system tracks "logical" memory owners in
global accounts. For example, every node in a plan has an account,
however, there are other execution contexts, such as the parser, which
have their own logical memory accounts.
Notably, this logical memory account system also tracks chunks instead
of blocks.

The rationale for tracking memory at the logical owner level was that
memory for a logical owner may allocate memory across multiple
contexts and a single context may contain memory belonging to several
of these logical owners.

More compellingly, many of the allocations done during execution are
done directly in the per query or per tuple context--as opposed to
being done in their own uniquely named context. Arguably, this is
because those logical owners (a Result node, for example) are not
memory-intensive and thus do not require specific memory accounting.
However, when debugging a memory leak or OOM, the specificity of
logical owner accounts was seen as desirable. A discrepancy between
memory allocated and memory freed in the per query context doesn't
provide a lot of hints as to the source of the leak.
At the least, there was no meaningful way to represent MemoryContext
account balances in EXPLAIN ANALYZE. Where would the TopMemoryContext
be represented, for example.

Also, by using logical accounts, each node in the plan could be
assigned a quota at plan time--because many memory intensive operators
will not have relinquished the memory they hold when other nodes are
executing (e.g. Materialize)--so, instead of granting each node
work_mem, work_mem is divided up into quotas for each operator in a
particular way. This was meant to pave the way for work_mem
enforcement. This is a topic that has come up in various ways in other
forums. For example, in the XPRS thread, the discussion of erroring
out for queries with no "escape mechanism" brought up by Thomas Munro
[1] is a kind of work_mem enforcement (this discussion was focused
more on a proposed session-level memory setting, but it is still
enforcement of a memory setting).
It was also discussed at PGCon this year in an unconference session on
OOM-detection and debugging, runaway query termination, and
session-level memory consumption tracking [2].

The motivation for tracking chunks instead of blocks was to understand
the "actual" memory consumption of different components in the
database. Then, eventually, memory consumption patterns would emerge
and improvements could be made to memory allocation strategies to suit
different use cases--perhaps other implementations of the
MemoryContext API similar to Slab and Generation were envisioned.
Apparently, it did lead to the discovery of some memory fragmentation
issues which were tuned.

I bring these up not just to answer Jeff's question but also to
provide some anecdotal evidence that the patch here is a good base for
other memory accounting and tracking schemes.

Even if HashAgg is the only initial consumer of the memory accounting
framework, we know that tuplesort can make use of it in its current
state as well. And, if another node or component requires
chunk-tracking, they could implement a different MemoryContext API
implementation which uses the MemoryContextData->mem_allocated field
to track chunks instead of blocks by tracking chunks in its alloc/free
functions.
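As a very rough sketch of what that could look like (hypothetical: the
mem_allocated field comes from the proposed patch, and my_underlying_alloc/
my_underlying_free are placeholders, not real functions), a chunk-tracking
implementation would adjust the counter in its per-chunk callbacks rather than
when blocks are acquired or released:

static void *
chunktrack_alloc(MemoryContext context, Size size)
{
	void	   *chunk = my_underlying_alloc(context, size);

	/* count the chunk itself, not the enclosing block */
	context->mem_allocated += GetMemoryChunkSpace(chunk);
	return chunk;
}

static void
chunktrack_free(MemoryContext context, void *pointer)
{
	context->mem_allocated -= GetMemoryChunkSpace(pointer);
	my_underlying_free(context, pointer);
}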

Ideas like logical memory accounting could leverage the mem_allocated
field and build on top of it.

[1]
https://www.postgresql.org/message-id/CA%2BhUKGJEMT7SSZRqt-knu_3iLkdscBCe9M2nrhC259FdE5bX7g%40mail.gmail.com
[2] https://wiki.postgresql.org/wiki/PgCon_2019_Developer_Unconference


Re: Fix parsing of identifiers in jsonpath

2019-09-18 Thread Tom Lane
Nikita Glukhov  writes:
> I don't know if it is possible to check Unicode properties "ID_Start" and
> "ID_Continue" in Postgres, and what ZWNJ/ZWJ is.  Now, identifier's starting
> character set is simply determined by the exclusion of all recognized special
> characters.

TBH, I think you should simply ignore any aspect of any of these standards
that is defined by reference to Unicode.  We are not necessarily dealing
with a Unicode character set, so at best, references to things like ZWNJ
are unreachable no-ops in a lot of environments.

As a relevant example, modern SQL defines whitespace in terms of Unicode[1],
a fact that we have ignored from the start and will likely continue to
do so.

You could do a lot worse than to just consider identifiers to be the same
strings as our SQL lexer would do (modulo things like "$" that have
special status in the path language).

regards, tom lane

[1] cf 4.2.4 "Character repertoires" in SQL:2011




Re: Define jsonpath functions as stable

2019-09-18 Thread Tom Lane
"Jonathan S. Katz"  writes:
> On 9/17/19 6:40 PM, Tom Lane wrote:
>> After a re-read of the XQuery spec, it seems to me that the character
>> entry form that they have and we don't is actually 

Re: log bind parameter values on error

2019-09-18 Thread Alvaro Herrera
Nice patch, thanks.

I didn't like abusing testlibpq3.c for your new stuff, so I moved it off
to a new file testlibpq5.c.  I cleaned up a few other cosmetic things
about this -- v10 attached.  I eventually noticed that this patch fails
to initialize each param's textValue to NULL, which probably explains
why you have to be so careful about only setting hasTextValues after the
whole loop.  That seems a bit too trusting; I think it would be better
to set all these to NULL in makeParamList instead of leaving the memory
undefined.  One way would be to have a for() loop in makeParamList that
nullifies the member; another would be to use palloc0().

A third possibility is to inspect each caller of makeParamList and have
them all set textValue to NULL for each parameter.
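A minimal sketch of the palloc0() route (the textValue member and hasTextValues
flag are from the proposed patch, not the current sources; the other fields are
assigned exactly as today):

ParamListInfo
makeParamList(int numParams)
{
	ParamListInfo retval;

	/* zero the whole allocation, so every params[i].textValue starts as NULL */
	retval = (ParamListInfo) palloc0(offsetof(ParamListInfoData, params) +
									 numParams * sizeof(ParamExternData));

	retval->numParams = numParams;
	retval->hasTextValues = false;	/* already zero, kept for clarity */
	/* ... the hook pointers are likewise left as NULL ... */

	return retval;
}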

I'm marking this waiting on author.

-- 
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
>From 19c80d631694366e8098a564ba347fb1e937056b Mon Sep 17 00:00:00 2001
From: Alvaro Herrera 
Date: Wed, 18 Sep 2019 16:56:42 -0300
Subject: [PATCH v10] Allow logging of portal parameters on error

---
 doc/src/sgml/config.sgml  |  17 ++
 src/backend/nodes/params.c|   3 +
 src/backend/tcop/postgres.c   | 233 +-
 src/backend/utils/error/elog.c|  28 +++
 src/backend/utils/misc/guc.c  |  10 +
 src/backend/utils/misc/postgresql.conf.sample |   1 +
 src/include/nodes/params.h|   3 +
 src/include/tcop/tcopprot.h   |   4 +
 src/include/utils/guc.h   |   1 +
 src/test/README   |   3 +-
 src/test/examples/Makefile|   2 +-
 src/test/examples/testlibpq5.c| 116 +
 12 files changed, 355 insertions(+), 66 deletions(-)
 create mode 100644 src/test/examples/testlibpq5.c

diff --git a/doc/src/sgml/config.sgml b/doc/src/sgml/config.sgml
index 6612f95f9f..81dfee5fe7 100644
--- a/doc/src/sgml/config.sgml
+++ b/doc/src/sgml/config.sgml
@@ -6414,6 +6414,23 @@ log_line_prefix = '%m [%p] %q%u@%d/%a '
   
  
 
+ 
+  log_parameters_on_error (boolean)
+  
+   log_parameters_on_error configuration parameter
+  
+  
+  
+   
+Controls whether the statement is logged with bind parameter values. 
+It adds some overhead, as postgres will cache textual
+representations of parameter values in memory for all statements,
+even if they eventually do not get logged. The default is
+off. Only superusers can change this setting.
+   
+  
+ 
+
  
   log_statement (enum)
   
diff --git a/src/backend/nodes/params.c b/src/backend/nodes/params.c
index cf4387e40f..00ad6347c7 100644
--- a/src/backend/nodes/params.c
+++ b/src/backend/nodes/params.c
@@ -45,6 +45,7 @@ makeParamList(int numParams)
 	retval->parserSetup = NULL;
 	retval->parserSetupArg = NULL;
 	retval->numParams = numParams;
+	retval->hasTextValues = false;
 
 	return retval;
 }
@@ -58,6 +59,8 @@ makeParamList(int numParams)
  * set of parameter values.  If dynamic parameter hooks are present, we
  * intentionally do not copy them into the result.  Rather, we forcibly
  * instantiate all available parameter values and copy the datum values.
+ *
+ * We don't bother copying text values, since no caller needs that at present.
  */
 ParamListInfo
 copyParamList(ParamListInfo from)
diff --git a/src/backend/tcop/postgres.c b/src/backend/tcop/postgres.c
index e8d8e6f828..ec1e4092f5 100644
--- a/src/backend/tcop/postgres.c
+++ b/src/backend/tcop/postgres.c
@@ -86,6 +86,12 @@
  */
 const char *debug_query_string; /* client-supplied query string */
 
+/*
+ * The top-level portal that the client is immediately working with:
+ * creating, binding, executing, or all at one using simple protocol
+ */
+Portal current_top_portal = NULL;
+
 /* Note: whereToSendOutput is initialized for the bootstrap/standalone case */
 CommandDest whereToSendOutput = DestDebug;
 
@@ -1714,6 +1720,9 @@ exec_bind_message(StringInfo input_message)
 	else
 		portal = CreatePortal(portal_name, false, false);
 
+	Assert(current_top_portal == NULL);
+	current_top_portal = portal;
+
 	/*
 	 * Prepare to copy stuff into the portal's memory context.  We do all this
 	 * copying first, because it could possibly fail (out-of-memory) and we
@@ -1751,6 +1760,9 @@ exec_bind_message(StringInfo input_message)
 	 */
 	if (numParams > 0)
 	{
+		/* GUC value can change, so we remember its state to be consistent */
+		bool need_text_values = log_parameters_on_error;
+
 		params = makeParamList(numParams);
 
 		for (int paramno = 0; paramno < numParams; paramno++)
@@ -1818,9 +1830,18 @@ exec_bind_message(StringInfo input_message)
 
 pval = OidInputFunctionCall(typinput, pstring, typioparam, -1);
 
-/* Free result of encoding conversion, if any */
-if (pstring && pstring != 

Re: [PATCH] src/test/modules/dummy_index -- way to test reloptions from inside of access method

2019-09-18 Thread Nikolay Shaplov
On Fri, 2 Aug 2019 11:12:35 +1200,
Thomas Munro wrote:
 
> While moving this to the September CF, I noticed this failure:
> 
> test reloptions   ... FAILED   32 ms

Do you have any idea, how to reproduce this? I tried this patch on
current master, and did not get result you are talking about.
Is it still there for you BTW?





Re: dropdb --force

2019-09-18 Thread Pavel Stehule
Hi

I am sending an updated version - there are two changes against the last patch.
I use pg_terminate_backend for terminating the other sessions, as Tom proposed.
I am not sure whether it is 100% practical - on the other hand, the necessary
right to kill other sessions is almost always available, and consistency with
pg_terminate_backend makes sense. Moreover, a native implementation is
significantly better than a solution implemented externally. I also fixed the
docs a little - the timeout for logging out (as a reaction to SIGTERM) is only
5 seconds.

Regards

Pavel
diff --git a/doc/src/sgml/ref/drop_database.sgml b/doc/src/sgml/ref/drop_database.sgml
index 3ac06c984a..11a31899d2 100644
--- a/doc/src/sgml/ref/drop_database.sgml
+++ b/doc/src/sgml/ref/drop_database.sgml
@@ -21,7 +21,11 @@ PostgreSQL documentation
 
  
 
-DROP DATABASE [ IF EXISTS ] name
+DROP DATABASE [ ( option [, ...] ) ] [ IF EXISTS ] name
+
+where option can be one of:
+
+FORCE
 
  
 
@@ -32,9 +36,11 @@ DROP DATABASE [ IF EXISTS ] name
DROP DATABASE drops a database. It removes the
catalog entries for the database and deletes the directory
containing the data.  It can only be executed by the database owner.
-   Also, it cannot be executed while you or anyone else are connected
-   to the target database.  (Connect to postgres or any
-   other database to issue this command.)
+   Also, it cannot be executed while you are connected to the target database.
+   (Connect to postgres or any other database to issue this
+   command.)
+   If anyone else is connected to the target database, this command will fail
+   - unless you use the FORCE option described below.
   
 
   
@@ -64,6 +70,24 @@ DROP DATABASE [ IF EXISTS ] name
  
 

+
+   
+FORCE
+
+ 
+  Attempt to terminate all existing connections to the target database.
+ 
+ 
+  This will fail, if current user has no permissions to terminate other
+  connections. Required permissions are the same as with 
+  pg_terminate_backend, described
+  in .
+
+  This will also fail, if the connections do not terminate in 5 seconds.
+ 
+
+   
+
   
  
 
diff --git a/doc/src/sgml/ref/dropdb.sgml b/doc/src/sgml/ref/dropdb.sgml
index 3fbdb33716..5d10ad1c92 100644
--- a/doc/src/sgml/ref/dropdb.sgml
+++ b/doc/src/sgml/ref/dropdb.sgml
@@ -86,6 +86,20 @@ PostgreSQL documentation
   
  
 
+ 
+  -f
+  --force
+  
+   
+Force termination of connected backends before removing the database.
+   
+   
+This will add the FORCE option to the DROP
+DATABASE command sent to the server.
+   
+  
+ 
+
  
-V
--version
diff --git a/src/backend/commands/dbcommands.c b/src/backend/commands/dbcommands.c
index 95881a8550..faa5bdbf71 100644
--- a/src/backend/commands/dbcommands.c
+++ b/src/backend/commands/dbcommands.c
@@ -810,7 +810,7 @@ createdb_failure_callback(int code, Datum arg)
  * DROP DATABASE
  */
 void
-dropdb(const char *dbname, bool missing_ok)
+dropdb(const char *dbname, bool missing_ok, bool force)
 {
 	Oid			db_id;
 	bool		db_istemplate;
@@ -895,6 +895,9 @@ dropdb(const char *dbname, bool missing_ok)
   nslots_active, nslots_active)));
 	}
 
+	if (force)
+		TerminateOtherDBBackends(db_id);
+
 	/*
 	 * Check for other backends in the target database.  (Because we hold the
 	 * database lock, no new ones can start after this.)
diff --git a/src/backend/nodes/copyfuncs.c b/src/backend/nodes/copyfuncs.c
index 3432bb921d..2f267e4bb6 100644
--- a/src/backend/nodes/copyfuncs.c
+++ b/src/backend/nodes/copyfuncs.c
@@ -3868,6 +3868,7 @@ _copyDropdbStmt(const DropdbStmt *from)
 
 	COPY_STRING_FIELD(dbname);
 	COPY_SCALAR_FIELD(missing_ok);
+	COPY_NODE_FIELD(options);
 
 	return newnode;
 }
diff --git a/src/backend/nodes/equalfuncs.c b/src/backend/nodes/equalfuncs.c
index 18cb014373..da0e1d139a 100644
--- a/src/backend/nodes/equalfuncs.c
+++ b/src/backend/nodes/equalfuncs.c
@@ -1676,6 +1676,7 @@ _equalDropdbStmt(const DropdbStmt *a, const DropdbStmt *b)
 {
 	COMPARE_STRING_FIELD(dbname);
 	COMPARE_SCALAR_FIELD(missing_ok);
+	COMPARE_NODE_FIELD(options);
 
 	return true;
 }
diff --git a/src/backend/parser/gram.y b/src/backend/parser/gram.y
index 3f67aaf30e..f7bab61175 100644
--- a/src/backend/parser/gram.y
+++ b/src/backend/parser/gram.y
@@ -310,6 +310,7 @@ static Node *makeRecursiveViewSelect(char *relname, List *aliases, Node *query);
 %type 	vac_analyze_option_elem
 %type 	vac_analyze_option_list
 %type 	vac_analyze_option_arg
+%type 	drop_option
 %type 	opt_or_replace
 opt_grant_grant_option opt_grant_admin_option
 opt_nowait opt_if_exists opt_with_data
@@ -406,6 +407,7 @@ static Node *makeRecursiveViewSelect(char *relname, List *aliases, Node *query);
 TriggerTransitions TriggerReferencing
 publication_name_list
 vacuum_relation_list opt_vacuum_relation_list
+drop_option_list opt_drop_option_list
 
 %type 	group_by_list
 %type 	group_by_item empty_grouping_set rollup_clause 

Re: [HACKERS] [WIP] Effective storage of duplicates in B-tree index.

2019-09-18 Thread Peter Geoghegan
On Wed, Sep 18, 2019 at 10:43 AM Peter Geoghegan  wrote:
> This also suggests that making _bt_dedup_one_page() do raw page adds
> and page deletes to the page in shared_buffers (i.e. don't use a temp
> buffer page) could pay off. As I went into at the start of this
> e-mail, unnecessarily doing expensive things like copying large
> posting lists around is a real concern. Even if it isn't truly useful
> for _bt_dedup_one_page() to operate in a very incremental fashion,
> incrementalism is probably still a good thing to aim for -- it seems
> to make deduplication faster in all cases.

I think that I forgot to mention that I am concerned that the
kill_prior_tuple/LP_DEAD optimization could be applied less often
because _bt_dedup_one_page() operates too aggressively. That is a big
part of my general concern.

Maybe I'm wrong about this -- who knows? I definitely think that
LP_DEAD setting by _bt_check_unique() is generally a lot more
important than LP_DEAD setting by the kill_prior_tuple optimization,
and the patch won't affect unique indexes. Only very serious
benchmarking can give us a clear answer, though.

-- 
Peter Geoghegan




[DOC] Document concurrent index builds waiting on each other

2019-09-18 Thread James Coleman
In my experience it's not immediately obvious (even after reading the
documentation) what the implications are of how concurrent index builds
manage transactions with respect to multiple concurrent index builds in
flight at the same time.

Specifically, as I understand it, multiple concurrent index builds running
at the same time will all return at the same time as the longest-running
one.

I've attached a small patch to call this caveat out specifically in
the documentation. I think the description in the patch is accurate,
but please let me know if there's some intricacies around how the
various stages might change the results.

James Coleman
commit 9e28e704820eebb81ff94c1c3cbfb7db087b2c45
Author: James Coleman 
Date:   Wed Sep 18 13:36:22 2019 -0400

Document concurrent indexes waiting on each other

It's not immediately obvious that because concurrent index building
waits on previously running transactions to complete, running multiple
concurrent index builds at the same time will result in each of them
taking as long to return as the longest takes, so, document this caveat.

diff --git a/doc/src/sgml/ref/create_index.sgml b/doc/src/sgml/ref/create_index.sgml
index 629a31ef79..35f15abb0e 100644
--- a/doc/src/sgml/ref/create_index.sgml
+++ b/doc/src/sgml/ref/create_index.sgml
@@ -616,6 +616,13 @@ Indexes:
 cannot.

 
+   
+Because the second table scan must wait for any transactions having a
+snapshot preceding the start of that scan to finish before completing the
+scan, concurrent index builds on multiple tables at the same time will
+not return on any one table until all have completed.
+   
+

 Concurrent builds for indexes on partitioned tables are currently not
 supported.  However, you may concurrently build the index on each


backup manifests

2019-09-18 Thread Robert Haas
In the lengthy thread on block-level incremental backup,[1] both
Vignesh C[2] and Stephen Frost[3] have suggested storing a manifest as
part of each backup, something that could be useful not only for
incremental backups but also for full backups. I initially didn't
think this was necessary,[4] but some of my colleagues figured out
that my design was broken, because my proposal was to detect new
blocks just using LSN, and that ignores the fact that CREATE DATABASE
and ALTER TABLE .. SET TABLESPACE do physical copies without bumping
page LSNs, which I knew but somehow forgot about.  Fortunately, some
of my colleagues realized my mistake in testing.[5] Because of this
problem, for an LSN-based approach to work, we'll need to send not
only an LSN, but also a list of files (and file sizes) that exist in
the previous full backup; so, some kind of backup manifest now seems
like a good idea to me.[6] That whole approach might still be dead on
arrival if it's possible to add new blocks with old LSNs to existing
files,[7] but there seems to be room to hope that there are no such
cases.[8]

So, let's suppose we invent a backup manifest. What should it contain?
I imagine that it would consist of a list of files, and the lengths of
those files, and a checksum for each file. I think you should have a
choice of what kind of checksums to use, because algorithms that used
to seem like good choices (e.g. MD5) no longer do; this trend can
probably be expected to continue. Even if we initially support only
one kind of checksum -- presumably SHA-something since we have code
for that already for SCRAM -- I think that it would also be a good
idea to allow for future changes. And maybe it's best to just allow a
choice of SHA-224, SHA-256, SHA-384, and SHA-512 right out of the
gate, so that we can avoid bikeshedding over which one is secure
enough. I guess we'll still have to argue about the default. I also
think that it should be possible to build a manifest with no
checksums, so that one need not pay the overhead of computing
checksums if one does not wish. Of course, such a manifest is of much
less utility for checking backup integrity, but you can still check
that you've got the right files, which is noticeably better than
nothing.  The manifest should probably also contain a checksum of its
own contents so that the integrity of the manifest itself can be
verified. And maybe a few other bits of metadata, but I'm not sure
exactly what.  Ideas?
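
To make "a list of files, and the lengths of those files, and a checksum for
each file" concrete, here is a minimal standalone sketch of what one manifest
entry and one output line might look like. This is purely illustrative and not
a proposal for the actual format; the struct, the field names, and the sample
path and digest are all made up.

#include <stdio.h>
#include <stdint.h>

/* Hypothetical representation of one manifest entry. */
typedef struct ManifestEntry
{
	const char *path;			/* path relative to the data directory */
	uint64_t	size;			/* file size in bytes */
	const char *algorithm;		/* e.g. "sha256", or NULL if no checksum */
	const char *checksum;		/* hex digest, or NULL if no checksum */
} ManifestEntry;

/* Emit one tab-separated line per file; the layout is only an example. */
static void
emit_entry(FILE *out, const ManifestEntry *e)
{
	fprintf(out, "%s\t%llu\t%s\t%s\n",
			e->path,
			(unsigned long long) e->size,
			e->algorithm ? e->algorithm : "-",
			e->checksum ? e->checksum : "-");
}

int
main(void)
{
	ManifestEntry entry = {
		"base/13593/16384", 8192, "sha256",
		"9f86d081884c7d659a2feaa0c55ad015a3bf4f1b2b0b822cd15d6c15b0f00a08"
	};

	emit_entry(stdout, &entry);
	return 0;
}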

Once we invent the concept of a backup manifest, what do we need to do
with them? I think we'd want three things initially:

(1) When taking a backup, have the option (perhaps enabled by default)
to include a backup manifest.
(2) Given an existing backup that has not got a manifest, construct one.
(3) Cross-check a manifest against a backup and complain about extra
files, missing files, size differences, or checksum mismatches.

One thing I'm not quite sure about is where to store the backup
manifest. If you take a base backup in tar format, you get base.tar,
pg_wal.tar (unless -Xnone), and an additional tar file per tablespace.
Does the backup manifest go into base.tar? Get written into a separate
file outside of any tar archive? Something else? And what about a
plain-format backup? I suppose then we should just write the manifest
into the top level of the main data directory, but perhaps someone has
another idea.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

[1] 
https://www.postgresql.org/message-id/flat/CA%2BTgmoYxQLL%3DmVyN90HZgH0X_EUrw%2BaZ0xsXJk7XV3-3LygTvA%40mail.gmail.com
[2] 
https://www.postgresql.org/message-id/CALDaNm310fUZ72nM2n%3DcD0eSHKRAoJPuCyvvR0dhTEZ9Oytyzg%40mail.gmail.com
[3] 
https://www.postgresql.org/message-id/20190916143817.GA6962%40tamriel.snowman.net
[4] 
https://www.postgresql.org/message-id/CA%2BTgmoaj-zw4Mou4YBcJSkHmQM%2BJA-dAVJnRP8zSASP1S4ZVgw%40mail.gmail.com
[5] 
https://www.postgresql.org/message-id/CAM2%2B6%3DXfJX%3DKXvpTgDvgd1rQjya_Am27j4UvJtL3nA%2BJMCTGVQ%40mail.gmail.com
[6] 
https://www.postgresql.org/message-id/CA%2BTgmoYg9i8TZhyjf8MqCyU8unUVuW%2B03FeBF1LGDu_-eOONag%40mail.gmail.com
[7] 
https://www.postgresql.org/message-id/CA%2BTgmoYT9xODgEB6y6j93hFHqobVcdiRCRCp0dHh%2BfFzZALn%3Dw%40mail.gmail.com
and nearby messages
[8] 
https://www.postgresql.org/message-id/20190916173933.GE6962%40tamriel.snowman.net




Re: [HACKERS] [WIP] Effective storage of duplicates in B-tree index.

2019-09-18 Thread Peter Geoghegan
On Tue, Sep 17, 2019 at 9:43 AM Anastasia Lubennikova
 wrote:
> > 3. Third, there is incremental writing of the page itself -- avoiding
> > using a temp buffer. Not sure where I stand on this.
>
> I think it's a good idea.  memmove must be much faster than copying
> items tuple by tuple.
> I'll send next patch by the end of the week.

I think that the biggest problem is that we copy all of the tuples,
including existing posting list tuples that can't be merged with
anything. Even if you assume that we'll never finish early (e.g. by
using logic like the "if (pagesaving >= newitemsz) deduplicate =
false;" thing), this can still unnecessarily slow down deduplication.
Very often,  _bt_dedup_one_page() is called when 1/2 - 2/3 of the
space on the page is already used by a small number of very large
posting list tuples.

> > The loop within _bt_dedup_one_page() is very confusing in both v13 and
> > v14 -- I couldn't figure out why the accounting worked like this:

> I'll look at it.

I'm currently working on merging my refactored version of
_bt_dedup_one_page() with your v15 WAL-logging. This is a bit tricky.
(I have finished merging the other WAL-logging stuff, though -- that
was easy.)

The general idea is that the loop in _bt_dedup_one_page() now
explicitly operates with a "base" tuple, rather than *always* saving
the prev tuple from the last loop iteration. We always have a "pending
posting list", which won't be written-out as a posting list if it
isn't possible to merge at least one existing page item. The "base"
tuple doesn't change. "pagesaving" space accounting works in a way
that doesn't care about whether or not the base tuple was already a
posting list -- it saves the size of the IndexTuple without any
existing posting list size, and calculates the contribution to the
total size of the new posting list separately (heap TIDs from the
original base tuple and subsequent tuples are counted together).

This has a number of advantages:

* The loop is a lot clearer now, and seems to have slightly better
space utilization because of improved accounting (with or without the
"if (pagesaving >= newitemsz) deduplicate = false;" thing).

* I think that we're going to need to be disciplined about which tuple
is the "base" tuple for correctness reasons -- we should always use
the leftmost existing tuple to form a new posting list tuple. I am
concerned about rare cases where we deduplicate tuples that are equal
according to _bt_keep_natts_fast()/datum_image_eq() that nonetheless
have different sizes (and are not bitwise equal). There are rare cases
involving TOAST compression where that is just about possible (see the
temp comments I added to  _bt_keep_natts_fast() in the patch).

* It's clearly faster, because there is far less palloc() overhead --
the "land" unlogged table test completes in about 95.5% of the time
taken by v15 (I disabled "if (pagesaving >= newitemsz) deduplicate =
false;" for both versions here, to keep it simple and fair).

This also suggests that making _bt_dedup_one_page() do raw page adds
and page deletes to the page in shared_buffers (i.e. don't use a temp
buffer page) could pay off. As I went into at the start of this
e-mail, unnecessarily doing expensive things like copying large
posting lists around is a real concern. Even if it isn't truly useful
for _bt_dedup_one_page() to operate in a very incremental fashion,
incrementalism is probably still a good thing to aim for -- it seems
to make deduplication faster in all cases.

-- 
Peter Geoghegan




Re: dropdb --force

2019-09-18 Thread Pavel Stehule
st 18. 9. 2019 v 19:11 odesílatel vignesh C  napsal:

> On Wed, Sep 18, 2019 at 9:41 AM Pavel Stehule 
> wrote:
> >
> >
> >
> > st 18. 9. 2019 v 5:59 odesílatel vignesh C  napsal:
> >>
> >> On Wed, Sep 18, 2019 at 8:30 AM Pavel Stehule 
> wrote:
> >> >
> >> >
> >> Hi Pavel,
> >>
> >> One Comment:
> >> In the documentation we say drop database will fail after 60 seconds
> >>   
> >> FORCE
> >> 
> >>  
> >>   Attempt to terminate all existing connections to the target
> database.
> >>  
> >>  
> >>   This will fail, if current user has no permissions to terminate
> other
> >>   connections. Required permissions are the same as with
> >>   pg_terminate_backend, described
> >>   in .
> >>
> >>   This will also fail, if the connections do not terminate in 60
> seconds.
> >>  
> >> 
> >>
> >
> >
> > This is not valid. With FORCE flag the clients are closed immediately
> >
> >>
> >>
> >> But in TerminateOtherDBBackends:
> >> foreach (lc, pids)
> >> + {
> >> + int pid = lfirst_int(lc);
> >> +
> >> + (void) kill(pid, SIGTERM); /* ignore any error */
> >> + }
> >> +
> >> + /* sleep 100ms */
> >> + pg_usleep(100 * 1000L);
> >> + }
> >>
> >> We check for any connected backends after sending kill signal in
> >> CountOtherDBBackends and throw error immediately.
> >>
> >> I had also tested this scenario to get the following error immediately:
> >> test=# drop database (force) test1;
> >> ERROR:  database "test1" is being accessed by other users
> >> DETAIL:  There is 1 other session using the database.
> >>
> >
> > sure - you cannot to kill self
> >
> This was not a case where we try to do drop database from the same
> session, I got this error when one of the process took longer time to
> terminate the other connected process.
> But this scenario was simulated using gdb, I'm not sure if similar
> scenario is possible without gdb in production environment. If
> terminating process does not happen immediately then the above
> scenario can happen.
>

maybe - gdb can handle the SIGTERM signal.

I think the current design should be OK, because it immediately sends the
SIGTERM signals and then checks for up to 5 seconds whether these signals have
really been processed. If not, it raises an error and does nothing.
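
For what it's worth, here is a rough standalone sketch of that "send SIGTERM
to everyone, then poll for up to 5 seconds" flow. It is not the patch itself;
still_running(), terminate_and_wait(), and the pid array are hypothetical
stand-ins for the real backend bookkeeping, and the main() only exists to make
the sketch runnable.

#include <signal.h>
#include <stdbool.h>
#include <sys/types.h>
#include <unistd.h>

/* Is any of the given processes still alive?  kill(pid, 0) only probes. */
static bool
still_running(const pid_t *pids, int npids)
{
	for (int i = 0; i < npids; i++)
	{
		if (kill(pids[i], 0) == 0)
			return true;
	}
	return false;
}

/* Send SIGTERM to every pid, then poll every 100ms for at most 5 seconds. */
static bool
terminate_and_wait(const pid_t *pids, int npids)
{
	for (int i = 0; i < npids; i++)
		(void) kill(pids[i], SIGTERM);	/* ignore any error */

	for (int tries = 0; tries < 50; tries++)
	{
		if (!still_running(pids, npids))
			return true;
		usleep(100 * 1000);
	}
	return false;				/* caller reports the error */
}

int
main(void)
{
	pid_t		child;

	signal(SIGCHLD, SIG_IGN);	/* auto-reap, so the dead child disappears */

	child = fork();
	if (child < 0)
		return 1;
	if (child == 0)
	{
		sleep(60);				/* child just waits to be terminated */
		return 0;
	}

	return terminate_and_wait(&child, 1) ? 0 : 1;
}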

Pavel



> Regards,
> Vignesh
> EnterpriseDB: http://www.enterprisedb.com
>


Re: dropdb --force

2019-09-18 Thread vignesh C
On Wed, Sep 18, 2019 at 9:41 AM Pavel Stehule  wrote:
>
>
>
> st 18. 9. 2019 v 5:59 odesílatel vignesh C  napsal:
>>
>> On Wed, Sep 18, 2019 at 8:30 AM Pavel Stehule  
>> wrote:
>> >
>> >
>> Hi Pavel,
>>
>> One Comment:
>> In the documentation we say drop database will fail after 60 seconds
>>   
>> FORCE
>> 
>>  
>>   Attempt to terminate all existing connections to the target database.
>>  
>>  
>>   This will fail, if current user has no permissions to terminate other
>>   connections. Required permissions are the same as with
>>   pg_terminate_backend, described
>>   in .
>>
>>   This will also fail, if the connections do not terminate in 60 seconds.
>>  
>> 
>>
>
>
> This is not valid. With FORCE flag the clients are closed immediately
>
>>
>>
>> But in TerminateOtherDBBackends:
>> foreach (lc, pids)
>> + {
>> + int pid = lfirst_int(lc);
>> +
>> + (void) kill(pid, SIGTERM); /* ignore any error */
>> + }
>> +
>> + /* sleep 100ms */
>> + pg_usleep(100 * 1000L);
>> + }
>>
>> We check for any connected backends after sending kill signal in
>> CountOtherDBBackends and throw error immediately.
>>
>> I had also tested this scenario to get the following error immediately:
>> test=# drop database (force) test1;
>> ERROR:  database "test1" is being accessed by other users
>> DETAIL:  There is 1 other session using the database.
>>
>
> sure - you cannot to kill self
>
This was not a case where we tried to drop the database from the same
session; I got this error when one of the processes took a longer time to
terminate the other connected process.
But this scenario was simulated using gdb; I'm not sure if a similar
scenario is possible without gdb in a production environment. If the
terminating process does not finish immediately, then the above scenario
can happen.

Regards,
Vignesh
EnterpriseDB: http://www.enterprisedb.com




Re: pgbench - allow to create partitioned tables

2019-09-18 Thread Fabien COELHO


Hello Amit,


  - use search_path to find at most one pgbench_accounts
It still uses left join because I still think that it is appropriate.
I added a lateral to avoid repeating the array_position call
to manage the search_path, and use explicit pg_catalog everywhere.


It would be good if you can add some more comments to explain the
intent of query.


Indeed, I put too few comments on the query.


+ if (ps == NULL)
+ partition_method = PART_NONE;

When can we expect ps as NULL?  If this is not a valid case, then
probably and Assert would be better.


No, ps is really NULL if there is no partitioning, because of the LEFT 
JOIN and pg_partitioned_table is just empty in that case.


The last else where there is an unexpected entry is different, see 
comments about v11 below.



+ else if (PQntuples(res) == 0)
+ {
+ /* no pgbench_accounts found, builtin script should fail later */
+ partition_method = PART_NONE;
+ partitions = -1;

If we don't find pgbench_accounts, let's give error here itself rather
than later unless you have a valid case in mind.


I thought of it, but decided not to: Someone could add a builtin script 
which does not use pgbench_accounts, or a parallel running script could 
create a table dynamically, whatever, so I prefer the error to be raised 
by the script itself, rather than deciding that it will fail before even 
trying.



+ /*
+ * Partition information. Assume no partitioning on any failure, so as
+ * to avoid failing on an older version.
+ */
..
+ if (PQresultStatus(res) != PGRES_TUPLES_OK)
+ {
+ /* probably an older version, coldly assume no partitioning */
+ partition_method = PART_NONE;
+ partitions = 0;
+ }

So, here we are silently absorbing the error when pgbench is executed
against older server version which doesn't support partitioning.


Yes, exactly.

If that is the case, then I think if user gives --partitions for the old 
server version, it will also give an error?


Yes, on -i it will fail because the syntax will not be recognized.

It is not clear in documentation whether we support or not using pgbench 
with older server versions.


Indeed. We more or less do in practice. Command "psql" works back to 8 
AFAICR, and pgbench as well.


I guess it didn't matter, but with this feature, it can matter.  Do we 
need to document this?


This has been discussed in the past, and the conclusion was that it was 
not worth the effort. We just try not to break things if it is avoidable. 
In this regard, the patch slightly changes the FILLFACTOR output, which is 
removed if the value is 100 (%) as it is the default, which means that 
table creation would work on very old versions which did not support 
fillfactor, unless you specify a lower percentage.


Attached v11:

 - add quite a few comments on the pg_catalog query

 - reverts the partitions >= 1 test; If some new partition method is
   added that pgbench does not know about, the failure mode will be that
   nothing is printed rather than printing something strange like
   "method none with 2 partitions".

--
Fabien.diff --git a/doc/src/sgml/ref/pgbench.sgml b/doc/src/sgml/ref/pgbench.sgml
index c857aa3cba..e3a0abb4c7 100644
--- a/doc/src/sgml/ref/pgbench.sgml
+++ b/doc/src/sgml/ref/pgbench.sgml
@@ -306,6 +306,31 @@ pgbench  options  d
   
  
 
+ 
+  --partitions=NUM
+  
+   
+Create a partitioned pgbench_accounts table with
+NUM partitions of nearly equal size for
+the scaled number of accounts.
+Default is 0, meaning no partitioning.
+   
+  
+ 
+
+ 
+  --partition-method=NAME
+  
+   
+Create a partitioned pgbench_accounts table with
+NAME method.
+Expected values are range or hash.
+This option requires that --partitions is set to non-zero.
+If unspecified, default is range.
+   
+  
+ 
+
  
   --tablespace=tablespace
   
diff --git a/src/bin/pgbench/pgbench.c b/src/bin/pgbench/pgbench.c
index ed7652bfbf..c07ed42bbb 100644
--- a/src/bin/pgbench/pgbench.c
+++ b/src/bin/pgbench/pgbench.c
@@ -186,6 +186,15 @@ int64		latency_limit = 0;
 char	   *tablespace = NULL;
 char	   *index_tablespace = NULL;
 
+/* partitioning for pgbench_accounts table, 0 for no partitioning, -1 for bad */
+static int 		partitions = 0;
+
+typedef enum { PART_NONE, PART_RANGE, PART_HASH }
+  partition_method_t;
+
+static partition_method_t partition_method = PART_NONE;
+static const char *PARTITION_METHOD[] = { "none", "range", "hash" };
+
 /* random seed used to initialize base_random_sequence */
 int64		random_seed = -1;
 
@@ -617,6 +626,9 @@ usage(void)
 		   "  --foreign-keys   create foreign key constraints between tables\n"
 		   "  --index-tablespace=TABLESPACE\n"
 		   "   create indexes in the specified tablespace\n"
+		   "  --partitions=NUM partition pgbench_accounts in NUM parts (default: 0)\n"
+		   "  --partition-method=(range|hash)\n"
+	

Re: Proposal: Add more compile-time asserts to expose inconsistencies.

2019-09-18 Thread Dagfinn Ilmari Mannsåker
Michael Paquier  writes:

> On Wed, Sep 18, 2019 at 06:46:24AM +, Smith, Peter wrote:
>> I have identified some OSS code where more compile-time asserts could be 
>> added. 
>> 
>> Mostly these are asserting that arrays have the necessary length to
>> accommodate the enums that are used to index into them.
>> 
>> In general the code is already commented with warnings such as:
>> * "If you add a new entry, remember to ..."
>> * "When modifying this enum, update the table in ..."
>> * "Display names for enums in ..."
>> * etc.
>> 
>> But comments can be accidentally overlooked, so adding the
>> compile-time asserts can help eliminate human error. 
>
> For some of them it could help, and we could think about a better
> location for that stuff than an unused routine.

Postgres doesn't seem to have it, but it would be possible to define a
StaticAssertDecl macro that can be used at the file level, outside any
function.  See for example Perl's STATIC_ASSERT_DECL:

https://github.com/Perl/perl5/blob/v5.30.0/perl.h#L3455-L3488

- ilmari
-- 
"The surreality of the universe tends towards a maximum" -- Skud's Law
"Never formulate a law or axiom that you're not prepared to live with
 the consequences of."  -- Skud's Meta-Law




Relation extension lock bottleneck

2019-09-18 Thread Konstantin Knizhnik



Hi hackers,

We have a customer who suffers from Postgres performance degradation 
when there are a large number of connections performing inserts into the 
same table.
In 2016 Robert Haas committed an optimization of relation extension, 
719c84c1:


Author: Robert Haas 
Date:   Fri Apr 8 02:04:46 2016 -0400

    Extend relations multiple blocks at a time to improve scalability.

    Contention on the relation extension lock can become quite fierce when
    multiple processes are inserting data into the same relation at the same
    time at a high rate.  Experimentation shows the extending the relation
    multiple blocks at a time improves scalability.

But this optimization is applied only for heap relations 
(RelationGetBufferForTuple).

And here most of the backends are competing for the index relation extension lock:

        /*
         * Extend the relation by one page.
         *
         * We have to use a lock to ensure no one else is extending the 
rel at

         * the same time, else we will both try to initialize the same new
         * page.  We can skip locking for new or temp relations, however,
         * since no one else could be accessing them.
         */
        needLock = !RELATION_IS_LOCAL(rel);

        if (needLock)
            LockRelationForExtension(rel, ExclusiveLock);


#0  0x7ff1787065e3 in __epoll_wait_nocancel () from /lib64/libc.so.6
#1  0x0072e39e in WaitEventSetWaitBlock (nevents=1, occurred_events=0x7ffd03c0ddf0, cur_timeout=-1, set=0x2cb1838) at latch.c:1048
#2  WaitEventSetWait (set=set@entry=0x2cb1838, timeout=timeout@entry=-1, occurred_events=occurred_events@entry=0x7ffd03c0ddf0, nevents=nevents@entry=1, wait_event_info=wait_event_info@entry=50331649) at latch.c:1000
#3  0x0072e7fb in WaitLatchOrSocket (latch=0x2aec0cf5c844, wakeEvents=wakeEvents@entry=1, sock=sock@entry=-1, timeout=-1, timeout@entry=0, wait_event_info=50331649) at latch.c:385
#4  0x0072e8b0 in WaitLatch (latch=<optimized out>, wakeEvents=wakeEvents@entry=1, timeout=timeout@entry=0, wait_event_info=<optimized out>) at latch.c:339
#5  0x0073e2c6 in ProcSleep (locallock=locallock@entry=0x2ace708, lockMethodTable=lockMethodTable@entry=0x9cee80 ) at proc.c:1284
#6  0x00738d92 in WaitOnLock (locallock=locallock@entry=0x2ace708, owner=owner@entry=0x28f2d10) at lock.c:1750
#7  0x0073a216 in LockAcquireExtended (locktag=locktag@entry=0x7ffd03c0e170, lockmode=lockmode@entry=7, sessionLock=sessionLock@entry=false, dontWait=dontWait@entry=false, reportMemoryError=reportMemoryError@entry=true, locallockp=locallockp@entry=0x0) at lock.c:1032
#8  0x0073a8d4 in LockAcquire (locktag=locktag@entry=0x7ffd03c0e170, lockmode=lockmode@entry=7, sessionLock=sessionLock@entry=false, dontWait=dontWait@entry=false) at lock.c:695
#9  0x00737c36 in LockRelationForExtension (relation=relation@entry=0x3089c30, lockmode=lockmode@entry=7) at lmgr.c:362
#10 0x004d2209 in _bt_getbuf (rel=rel@entry=0x3089c30, blkno=blkno@entry=4294967295, access=access@entry=2) at nbtpage.c:829
#11 0x004d013b in _bt_split (newitemonleft=true, newitem=0x2cb15b8, newitemsz=24, newitemoff=63, firstright=138, cbuf=0, buf=27480727, rel=0x3089c30) at nbtinsert.c:1156
#12 _bt_insertonpg (rel=rel@entry=0x3089c30, buf=buf@entry=27480727, cbuf=cbuf@entry=0, stack=stack@entry=0x2cb1758, itup=itup@entry=0x2cb15b8, newitemoff=63, split_only_page=split_only_page@entry=false) at nbtinsert.c:909
#13 0x004d1b1c in _bt_doinsert (rel=rel@entry=0x3089c30, itup=itup@entry=0x2cb15b8, checkUnique=checkUnique@entry=UNIQUE_CHECK_NO, heapRel=heapRel@entry=0x3088d70) at nbtinsert.c:306
#14 0x004d4651 in btinsert (rel=0x3089c30, values=<optimized out>, isnull=<optimized out>, ht_ctid=0x2bd230c, heapRel=0x3088d70, checkUnique=UNIQUE_CHECK_NO, indexInfo=0x2caf828) at nbtree.c:205
#15 0x0060a41a in ExecInsertIndexTuples (slot=slot@entry=0x2c875f0, tupleid=tupleid@entry=0x2bd230c, estate=estate@entry=0x2c85dc0, noDupErr=noDupErr@entry=false, specConflict=specConflict@entry=0x0, arbiterIndexes=arbiterIndexes@entry=0x0) at execIndexing.c:386
#16 0x0062d132 in ExecInsert (mtstate=mtstate@entry=0x2c86110, slot=0x2c875f0, planSlot=planSlot@entry=0x2c875f0, estate=estate@entry=0x2c85dc0, canSetTag=<optimized out>) at nodeModifyTable.c:535
#17 0x0062e1b9 in ExecModifyTable (pstate=0x2c86110) at nodeModifyTable.c:2159



I wonder whether such an optimization should also be used for index relations.
Can we just make RelationAddExtraBlocks public (non-static) and use it 
in the B-Tree code in the same way as in hio.c?
Or would it be better to provide some special function for extending an 
arbitrary relation?


--
Konstantin Knizhnik
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company





Re: Commit fest 2019-09

2019-09-18 Thread Alvaro Herrera
On 2019-Sep-16, Alvaro Herrera wrote:

> The third week for this commitfest starts.  The numbers now:
> 
>   statusstring      │ week1 │ week2 │ week3
> ────────────────────┼───────┼───────┼───────
>  Needs review       │   165 │   138 │   116
>  Waiting on Author  │    30 │    44 │    51

FWIW I have just gone over all the waiting-on-author patches and marked
a few as needs-review in cases where new versions had been posted and the
CF app had not been updated.  So now we have NR: 127, WoA: 40.

Patch authors, please make sure to update the status of your patch in
the CF app when posting new versions!

-- 
Álvaro Herrerahttps://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: Fix parsing of identifiers in jsonpath

2019-09-18 Thread Chapman Flack
On 9/18/19 11:10 AM, Nikita Glukhov wrote:

> 4. Even if the Unicode escape sequence '\u' is used, it cannot produce
>    special symbols or whitespace, because the identifiers are displayed
> ...
> I don't know if it is possible to check Unicode properties "ID_Start" and
> "ID_Continue" in Postgres, and what ZWNJ/ZWJ is.

ZWNJ and ZWJ are U+200C and U+200D (mentioned in [1]).

Also, it's not just that a Unicode escape sequence can't make a
special symbol or whitespace; it can't make any character that's
not allowed there by the other rules:

"A UnicodeEscapeSequence cannot be used to put a code point into an
IdentifierName that would otherwise be illegal. In other words, if a \
UnicodeEscapeSequence sequence were replaced by the SourceCharacter it
contributes, the result must still be a valid IdentifierName that has
the exact same sequence of SourceCharacter elements as the original
IdentifierName. All interpretations of IdentifierName within this
specification are based upon their actual code points regardless of
whether or not an escape sequence was used to contribute any particular
code point."


A brief glance through src/backend/utils/mb/Unicode shows that the
Makefile does download a bunch of stuff, but maybe not the Unicode
character data that would allow testing ID_Start and ID_Continue?
I'm not sure.

Regards,
-Chap




Re: PGCOLOR? (Re: pgsql: Unified logging system for command-line programs)

2019-09-18 Thread Tom Lane
Peter Eisentraut  writes:
> On 2019-06-06 11:08, Masahiko Sawada wrote:
>>> Do we need two variables to control this? I was only looking at
>>> PG_COLOR, and noticed PG_COLORS only later. Keeping PG_COLORS aligned
>>> with {GCC,LS}_COLORS makes sense. How about removing PG_COLOR, and
>>> making "auto" the default? (Maybe we could still support "PG_COLORS=off")

>> I think the if we keep two variables user can set the same value to
>> both GCC_COLORS and PG_COLORS. Rather I think it's a problem that
>> there is no documentation of PG_COLORS. Thoughts?

> It looks like there is documentation for PG_COLORS in the release notes
> now, which seems like an odd place.  Suggestions for a better place?

I stuck that in because Bruce's text didn't make any sense to me,
so I went and read the code to see what it was actually doing.
I didn't know that it hadn't been correctly documented in the first
place ;-)

I'm not for forcing "auto" mode all the time; that will surely break
things for some people.  So I think the behavior is fine and
we should just fix the docs.  (Possibly my opinion is biased here
by the fact that I hate all forms of colorized output with a deep,
abiding passion, as Robert would put it.  So off-by-default is just
fine with me.)

> And any more opinions for PG_COLORS vs PGCOLORS naming?

Following the precedent of LS_COLORS makes sense from here.

regards, tom lane




Fix parsing of identifiers in jsonpath

2019-09-18 Thread Nikita Glukhov

Hi!

Unfortunately, the jsonpath lexer, in contrast to the jsonpath parser, was written
by Teodor and me without proper attention to the standard.  The JSON path lexical
structure is borrowed from the ECMAScript standard [1], and we did not study it carefully.

There were numerous deviations from the ECMAScript standard in our jsonpath
implementation that were mostly fixed in the attached patch:

1. Identifiers (unquoted JSON key names) should start from the one of (see [2]):
   - Unicode symbol having Unicode property "ID_Start" (see [3])
   - Unicode escape sequence '\u' or '\u{X...}'
   - '$'
   - '_'

   And they should continue with the one of:
   - Unicode symbol having Unicode property "ID_Continue" (see [3])
   - Unicode escape sequence
   - '$'
   - ZWNJ
   - ZWJ

2. '$' is also allowed inside the identifiers, so it is possible to write
   something like '$.a$$b'.

3. Variable references '$var' are regular identifiers simply starting from the
   '$' sign, and there is no syntax like '$"var"', because quotes are not
   allowed in identifiers.

4. Even if the Unicode escape sequence '\u' is used, it cannot produce
   special symbols or whitespace, because the identifiers are displayed without
   quoting (i.e. '$\u{20}' cannot be displayed as '$" "', let alone as the
   string '"$ "').

5. All codepoints in '\u{XX}' greater than 0x10FFFF should be forbidden.

6. 6 single-character escape sequences (\b \t \r \f \n \v) should only be
   supported inside quoted strings.


I don't know if it is possible to check the Unicode properties "ID_Start" and
"ID_Continue" in Postgres, or what ZWNJ/ZWJ are.  For now, the identifier's
starting character set is simply determined by the exclusion of all recognized
special characters.
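
As a tiny illustration of that exclusion-based rule, here is a standalone
sketch using the JSONPATH_SPECIAL_CHARS set from the patch; the helper name is
made up for illustration and this is not the lexer code itself.

#include <stdbool.h>
#include <stdio.h>
#include <string.h>

/* Copied from the patch: everything the lexer treats as special. */
#define JSONPATH_SPECIAL_CHARS "?%.[]{}()|&!=<>@#,*:-+/~`;\\\"' \b\f\n\r\t\v"

/* A character may start an identifier iff it is not special (and not NUL). */
static bool
can_start_identifier(char c)
{
	return c != '\0' && strchr(JSONPATH_SPECIAL_CHARS, c) == NULL;
}

int
main(void)
{
	printf("'a' -> %d\n", can_start_identifier('a'));	/* 1 */
	printf("'$' -> %d\n", can_start_identifier('$'));	/* 1, '$' is allowed */
	printf("'.' -> %d\n", can_start_identifier('.'));	/* 0, special */
	return 0;
}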


The patch is not so simple, but I believe that it's not too late to fix v12.


[1] 
https://www.ecma-international.org/ecma-262/10.0/index.html#sec-ecmascript-language-lexical-grammar
[2] 
https://www.ecma-international.org/ecma-262/10.0/index.html#sec-names-and-keywords
[3] https://unicode.org/reports/tr31/

--
Nikita Glukhov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company
>From 2387de10cf241f86f6987ba310f59594cae6b64f Mon Sep 17 00:00:00 2001
From: Nikita Glukhov 
Date: Fri, 22 Mar 2019 15:15:38 +0300
Subject: [PATCH] Fix parsing of identifiers in jsonpath

---
 src/backend/utils/adt/jsonpath.c   |  11 ++-
 src/backend/utils/adt/jsonpath_gram.y  |   6 +-
 src/backend/utils/adt/jsonpath_scan.l  | 125 +++--
 src/test/regress/expected/jsonpath.out | 138 +
 src/test/regress/sql/jsonpath.sql  |  21 +
 5 files changed, 216 insertions(+), 85 deletions(-)

diff --git a/src/backend/utils/adt/jsonpath.c b/src/backend/utils/adt/jsonpath.c
index 7f32248..f43aeef 100644
--- a/src/backend/utils/adt/jsonpath.c
+++ b/src/backend/utils/adt/jsonpath.c
@@ -494,9 +494,14 @@ printJsonPathItem(StringInfo buf, JsonPathItem *v, bool inKey,
 			escape_json(buf, jspGetString(v, NULL));
 			break;
 		case jpiVariable:
-			appendStringInfoChar(buf, '$');
-			escape_json(buf, jspGetString(v, NULL));
-			break;
+			{
+int32		len;
+char	   *name = jspGetString(v, );
+
+appendStringInfoChar(buf, '$');
+appendBinaryStringInfo(buf, name, len);
+break;
+			}
 		case jpiNumeric:
 			appendStringInfoString(buf,
    DatumGetCString(DirectFunctionCall1(numeric_out,
diff --git a/src/backend/utils/adt/jsonpath_gram.y b/src/backend/utils/adt/jsonpath_gram.y
index 1725502..196a191 100644
--- a/src/backend/utils/adt/jsonpath_gram.y
+++ b/src/backend/utils/adt/jsonpath_gram.y
@@ -334,8 +334,10 @@ makeItemVariable(JsonPathString *s)
 	JsonPathParseItem  *v;
 
 	v = makeItemType(jpiVariable);
-	v->value.string.val = s->val;
-	v->value.string.len = s->len;
+
+	/* skip leading '$' */
+	v->value.string.val = >val[1];
+	v->value.string.len = s->len - 1;
 
 	return v;
 }
diff --git a/src/backend/utils/adt/jsonpath_scan.l b/src/backend/utils/adt/jsonpath_scan.l
index 2165ffc..2c79bf0 100644
--- a/src/backend/utils/adt/jsonpath_scan.l
+++ b/src/backend/utils/adt/jsonpath_scan.l
@@ -20,6 +20,8 @@
 #include "mb/pg_wchar.h"
 #include "nodes/pg_list.h"
 
+#define JSONPATH_SPECIAL_CHARS "?%.[]{}()|&!=<>@#,*:-+/~`;\\\"' \b\f\n\r\t\v"
+
 static JsonPathString scanstring;
 
 /* Handles to the buffer that the lexer uses internally */
@@ -63,7 +65,7 @@ fprintf_to_ereport(const char *fmt, const char *msg)
  * quoted variable names and C-tyle comments.
  * Exclusive states:
  *   - quoted strings
- *   - non-quoted strings
+ *   - non-quoted identifiers
  *   - quoted variable names
  *   - single-quoted strings
  *   - C-style comment
@@ -75,9 +77,12 @@ fprintf_to_ereport(const char *fmt, const char *msg)
 %x xsq
 %x xc
 
-special		 [\?\%\$\.\[\]\{\}\(\)\|\&\!\=\<\>\@\#\,\*:\-\+\/]
-any			[^\?\%\$\.\[\]\{\}\(\)\|\&\!\=\<\>\@\#\,\*:\-\+\/\\\"\' \t\n\r\f]
-blank		[ \t\n\r\f]
+special		 

Re: Nondeterministic collations vs. text_pattern_ops

2019-09-18 Thread Tom Lane
Peter Eisentraut  writes:
> Here is a draft patch.

> It will require a catversion change because those operator classes don't
> have assigned OIDs so far.

That's slightly annoying given where we are with v12.  We could
avoid it by looking up the opclass's opfamily and seeing if it's
TEXT_BTREE_FAM_OID etc, which do already have hand-assigned OIDs.
But maybe avoiding a catversion bump now is not worth the cost of
an extra syscache lookup.  (It'd give me an excuse to shove the
leakproofness-marking changes from the other thread into v12, so
there's that.)

Speaking of extra syscache lookups, I don't like that you rearranged
the if-test to check nondeterminism before the opclass identity checks.
That's usually going to be a wasted lookup.

> The comment block I just moved over for the time being.  It should
> probably be rephrased a bit.

Indeed.  Maybe like

 * text_pattern_ops uses text_eq as the equality operator, which is
 * fine as long as the collation is deterministic; text_eq then
 * reduces to bitwise equality and so it is semantically compatible
 * with the other operators and functions in the opclass.  But with a
 * nondeterministic collation, text_eq could yield results that are
 * incompatible with the actual behavior of the index (which is
 * determined by the opclass's comparison function).  We prevent
 * such problems by refusing creation of an index with this opclass
 * and a nondeterministic collation.
 *
 * The same applies to varchar_pattern_ops and bpchar_pattern_ops.
 * If we find more cases, we might decide to create a real mechanism
 * for marking opclasses as incompatible with nondeterminism; but
 * for now, this small hack suffices.
 *
 * Another solution is to use a special operator, not text_eq, as the
 * equality opclass member, but that is undesirable because it would
 * prevent index usage in many queries that work fine today.

regards, tom lane




Re: pg_upgrade check fails on Solaris 10

2019-09-18 Thread Alvaro Herrera
On 2019-Sep-17, Marina Polyakova wrote:

> Hello, hackers!
> 
> We got an error for pg_upgrade check on the branch REL_11_STABLE (commit
> 40ad4202513c72f5c1beeb03e26dfbc8890770c0) on Solaris 10 because IIUC the
> argument to the sed command is not enclosed in quotation marks (see [1]):

Hmm, I'm surprised it has taken this long to detect the problem.

> Attached diff.patch fixes the problem.

I have pushed it to all branches that have src/bin/pg_upgrade (namely,
9.5 onwards), thanks.  I hope this won't make the msys/mingw machines
angry ;-)

-- 
Álvaro Herrerahttps://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: allow online change primary_conninfo

2019-09-18 Thread Fujii Masao
On Wed, Aug 28, 2019 at 6:50 PM Sergei Kornilov  wrote:
>
> Hello
>
> Updated patch attached. (also I merged into one file)

Thanks for updating the patch! Here are some comments from me.

#primary_conninfo = '' # connection string to sending server
# (change requires restart)
#primary_slot_name = '' # replication slot on sending server
# (change requires restart)

ISTM that you need to update the above parts in postgresql.conf.sample.


/* we don't expect primary_conninfo to change */

ISTM that you need to update the above comment in walreceiver.c.


+ 
+  The WAL receiver is restarted after an update of
primary_conninfo.
+ 

If primary_conninfo is set to an empty string, walreceiver just shuts down,
not restarts. The above description in the doc is not always true.
So I'm thinking something like the following description is better.
Thought?

If primary_conninfo is changed while WAL receiver is running,
the WAL receiver shuts down and then restarts with new setting,
except when primary_conninfo is an empty string.


There is the case where walreceiver doesn't shut down immediately
after the change of primary_conninfo. If the change happens while
the startup process in paused state (e.g., by pg_wal_replay_pause(),
recovery_min_apply_delay, recovery conflict, etc), the startup
process tries to terminate walreceiver after it gets out of such state.
Shouldn't this fact be documented as a note?

Regards,

-- 
Fujii Masao




Re: Psql patch to show access methods info

2019-09-18 Thread Alvaro Herrera
On 2019-Sep-18, Alexander Korotkov wrote:

> On Tue, Sep 17, 2019 at 9:01 PM Alvaro Herrera  
> wrote:

> > I think \dAf is just as critical as \dAo; the former lets you know which
> > opfamilies you can use in CREATE INDEX, while the latter lets you know
> > which operators would be helped by such an index.  (But, really, only if
> > the opfamily name is printed in \d of the index, which we currently
> > don't print unless it's non-default ... which is an omission that
> > perhaps we should consider fixing).

> I think you have a point.  Will add \dAf command to the patch.

Great, thanks.

I think in order for this feature to be more complete "\d index" should
show the opfamily name, also, even when it's the default one.  (Let's
not put the opfamily when it's the default in "\d table", as we do when
the opfamily is not default; that would lead, I think, to too much
clutter.)

> > On the other hand, from a user perspective, what you really want to know
> > is: what opfamilies exist for datatype T, and what operators are
> > supported by the opfamily I have chosen?  The current patch doesn't
> > really help you find that out.

I hope that in some future somebody will contribute towards this, which
I think is more important (from users POV) than the below one:

> > I think \dAp isn't terribly informative from a user perspective.  The
> > support procs are just an opfamily implementation detail.
> 
> I've expressed my opinion regarding \dAp in [1].  In my observations,
> some advanced users can write btree/hash opclasses in pl/* languages.
> This doesn't require knowledge of core developer.  And they may find
> \dAp command useful.  What do you think?

I have never tried or had the need to do that.  I'll take your word for
it, so I have no objection.

I do wonder if \? is going to end up with too much clutter, and if so do
we need to make \? show only the most important commands and relegate
some others to \?+ ... however, going over the existing \? I see no
command that I would move to \?+ so \dAp would be alone there, which
would be pretty strange.  So let's forget this angle for now; but if
psql acquires too much "system innards" functionality then I say we
should consider it.

Thanks

-- 
Álvaro Herrerahttps://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: Minimal logical decoding on standbys

2019-09-18 Thread Robert Haas
On Fri, Sep 13, 2019 at 7:20 AM Amit Khandekar  wrote:
> > Thanks for notifying about this. Will work this week on rebasing this
> > patchset and putting it into the 2019-11 commit fest.
>
> Rebased patch set attached.
>
> Added in the Nov commitfest : https://commitfest.postgresql.org/25/2283/

I took a bit of a look at
0004-New-TAP-test-for-logical-decoding-on-standby.patch and saw some
things I don't like in terms of general code quality:

- Not many comments. I think each set of tests should have a block
comment at the top explaining clearly what it's trying to test.
- print_phys_xmin and print_logical_xmin don't print anything.
- They are also identical to each other except that they each operate
on a different hard-coded slot name.
- They are also identical to wait_for_xmins except that they don't wait.
- create_logical_slots creates two slots whose names are hard-coded
using code that is cut-and-pasted.
- The same code is also cut-and-pasted into two other places in the file.
- Why does that cut-and-pasted code use BAIL_OUT(), which aborts the
entire test run, instead of die, which just aborts the current test
file?
- cmp_ok() message in check_slots_dropped() has trailing whitespace.
- make_slot_active() and check_slots_dropped(), at least, use global
variables; is that really necessary?
- In particular, $return is used only in one function and doesn't need
to survive across calls; why is it not a local variable?
- Depending on whether $return ends up true or false, the number of
executed tests will differ; so besides any actual test failures,
you'll get complaints about not executing exactly 58 tests.
- $backup_name only ever has one value, but for some reason the
variable is created at the top of the test file and then initialized
later. Just do my $backup_name = 'b1' near where it's first used, or
ditch the variable and write 'b1' in each of the three places it's
used.
- Some of the calls to wait_for_xmins() save the return values into
local variables but then do nothing with those values before they are
overwritten. Either it's wrong that we're saving them into local
variables, or it's wrong that we're not doing anything with them.
- test_oldest_xid_retention() is called only once; it basically acts
as a wrapper for one group of tests. You could argue against that
approach, but I actually think it's a nice style which makes the code
more self-documenting. However, it's not used consistently; all the
other groups of tests are written directly as toplevel code.
- The code in that function verifies that oldestXid is found in
pg_controldata's output, but does not check the same for NextXID.
- Is there a reason the code in that function prints debugging output?
Seems like a leftover.
- I think it might be an idea to move the tests for recovery
conflict/slot drop to a separate test file, so that we have one file
for the xmin-related testing and another for the recovery conflict
testing.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: pgbench - allow to create partitioned tables

2019-09-18 Thread Fabien COELHO




+ "group by 1, 2 "

I have a question, wouldn't it be sufficient to just group by 1?


Conceptually yes, that is what happens in practice, but SQL requires 
that non-aggregated columns appear explicitly in the GROUP BY 
clause, so I have to put it there even though it will not change the groups.


--
Fabien.




Re: another look at macOS SIP

2019-09-18 Thread Robert Haas
On Tue, Sep 17, 2019 at 1:52 PM Andres Freund  wrote:
> On 2019-09-10 19:14:19 +0200, Peter Eisentraut wrote:
> > I think the way forward here is to get rid of all uses of system() for
> > calling between PostgreSQL programs.  There are only a handful of those,
> > and we already have well-tested replacement code like spawn_process() in
> > pg_regress.c that could be used.  (Perhaps we could also use that
> > opportunity to get rid of the need for shell quoting?)
>
> Yea, I think that'd be good, regardless of SIP.

+1, and making some progress on the SIP issue would be good, too, even
if we don't fix everything right away.  It seems entirely possible
that Apple will make this even more annoying to disable than it
already is.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: patch: psql - enforce constant width of last column

2019-09-18 Thread Pavel Stehule
st 18. 9. 2019 v 12:52 odesílatel Ahsan Hadi  napsal:

>
>
> On Tue, Sep 17, 2019 at 8:16 PM Pavel Stehule 
> wrote:
>
>>
>>
>> út 17. 9. 2019 v 17:06 odesílatel Ahsan Hadi 
>> napsal:
>>
>>> Hi Pavel,
>>>
>>> I have been trying to reproduce the case of badly displaying last
>>> columns of a query result-set. I played around with the legal values for
>>> psql border variable but not able to find a case where last columns are
>>> badly displayed. Can you please share an example that I can use to
>>> reproduce this problem. I will try out your patch once I am able to
>>> reproduce the problem.
>>>
>>
>> you need to use pspg, and vertical cursor.
>>
>> https://github.com/okbob/pspg
>> vertical cursor should be active
>>
>
> okay thanks for the info. I don't think it was possible to figure this out
> by reading the initial post. I will check it out.
>
> does this patch have any value for psql without pspg?
>

The benefit of this patch is just for pspg users today.

Pavel



>
>> \pset border 1
>> \pset linestyle ascii
>> \pset pager always
>>
>> select * from generate_series(1,3);
>>
>> Regards
>>
>> Pavel
>>
>>
>>> Thanks,
>>>
>>> -- Ahsan
>>>
>>>
>>> On Mon, Sep 9, 2019 at 2:32 PM Pavel Stehule 
>>> wrote:
>>>
 Hi

 When I played with vertical cursor support I got badly displayed last
 columns when border was not 2. Only when border is 2, then psql displays
 last column with same width for each row.

 I think so we can force column width alignment for any border styles
 today (for alignment and wrapping styles) or as minimum this behave can be
 optional.

 I wrote a patch with pset option "final_spaces", but I don't see a
 reason why we trim rows today.

 Regards

 Pavel

>>>


Re: Allow CLUSTER, VACUUM FULL and REINDEX to change tablespace on the fly

2019-09-18 Thread Alexey Kondratov

Hi Surafel,

Thank you for looking at the patch!

On 17.09.2019 14:04, Surafel Temesgen wrote:
* There are NOWAIT option in alter index, is there a reason not to 
have similar option here?


Currently in Postgres SET TABLESPACE always comes with a [ NOWAIT ] 
option, so I hope it is worth adding this option here for convenience. 
Added in the new version.



* SET TABLESPACE command is not documented


Actually, new_tablespace parameter was documented, but I've added a more 
detailed section for SET TABLESPACE too.


* There are multiple checking for whether the relation is temporary 
tables of other sessions, one in check_relation_is_movable and other 
independently


Yes, and there is a comment section in the code describing why. There is 
a repeated bunch of checks verifying whether the relation is movable 
or not, so I put them into a separate function -- 
check_relation_is_movable. However, if we want to do only REINDEX, then 
some of them are unnecessary, so only the RELATION_IS_OTHER_TEMP check is used. 
Thus, RELATION_IS_OTHER_TEMP is never executed twice, just via different 
code paths.



*+ char *tablespacename;

calling it new_tablespacename will make it consistent with other places



OK, changed, although I don't think it is important, since this is the 
only tablespace variable there.


*The patch did't applied cleanly 
http://cfbot.cputube.org/patch_24_2269.log




Patch is rebased and attached with all the fixes described above.


Regards

--
Alexey Kondratov

Postgres Professional https://www.postgrespro.com
Russian Postgres Company

>From 7a19b1fd945502ad55f1fa9e61c3014d8715e404 Mon Sep 17 00:00:00 2001
From: Alexey Kondratov 
Date: Wed, 18 Sep 2019 15:22:04 +0300
Subject: [PATCH v2] Allow REINDEX and REINDEX CONCURRENTLY to SET TABLESPACE

---
 doc/src/sgml/ref/reindex.sgml |  25 +
 src/backend/catalog/index.c   | 109 ++
 src/backend/commands/cluster.c|   2 +-
 src/backend/commands/indexcmds.c  |  38 +---
 src/backend/commands/tablecmds.c  |  59 +++-
 src/backend/parser/gram.y |  29 --
 src/backend/tcop/utility.c|  22 -
 src/include/catalog/index.h   |   7 +-
 src/include/commands/defrem.h |   6 +-
 src/include/commands/tablecmds.h  |   2 +
 src/include/nodes/parsenodes.h|   2 +
 src/test/regress/input/tablespace.source  |  32 +++
 src/test/regress/output/tablespace.source |  44 +
 13 files changed, 308 insertions(+), 69 deletions(-)

diff --git a/doc/src/sgml/ref/reindex.sgml b/doc/src/sgml/ref/reindex.sgml
index 10881ab03a..192243e58f 100644
--- a/doc/src/sgml/ref/reindex.sgml
+++ b/doc/src/sgml/ref/reindex.sgml
@@ -22,6 +22,7 @@ PostgreSQL documentation
  
 
 REINDEX [ ( VERBOSE ) ] { INDEX | TABLE | SCHEMA | DATABASE | SYSTEM } [ CONCURRENTLY ] name
+REINDEX [ ( VERBOSE ) ] { INDEX | TABLE } name [ SET TABLESPACE new_tablespace [NOWAIT] ]
 
  
 
@@ -165,6 +166,30 @@ REINDEX [ ( VERBOSE ) ] { INDEX | TABLE | SCHEMA | DATABASE | SYSTEM } [ CONCURR
 

 
+   
+SET TABLESPACE
+
+ 
+  This specifies a tablespace, where all rebuilt indexes will be created.
+  Can be used only with REINDEX INDEX and
+  REINDEX TABLE, since the system indexes are not
+  movable, but SCHEMA, DATABASE or
+  SYSTEM very likely will has one.  If the
+  NOWAIT option is specified then the command will fail
+  if it is unable to acquire all of the locks required immediately.
+ 
+
+   
+
+   
+new_tablespace
+
+ 
+  The name of the specific tablespace to store rebuilt indexes.
+ 
+
+   
+

 VERBOSE
 
diff --git a/src/backend/catalog/index.c b/src/backend/catalog/index.c
index 54288a498c..715abfdf65 100644
--- a/src/backend/catalog/index.c
+++ b/src/backend/catalog/index.c
@@ -1194,7 +1194,8 @@ index_create(Relation heapRelation,
  * on.  This is called during concurrent reindex processing.
  */
 Oid
-index_concurrently_create_copy(Relation heapRelation, Oid oldIndexId, const char *newName)
+index_concurrently_create_copy(Relation heapRelation, Oid oldIndexId,
+			   Oid tablespaceOid, const char *newName)
 {
 	Relation	indexRelation;
 	IndexInfo  *oldInfo,
@@ -1324,7 +1325,7 @@ index_concurrently_create_copy(Relation heapRelation, Oid oldIndexId, const char
 			  newInfo,
 			  indexColNames,
 			  indexRelation->rd_rel->relam,
-			  indexRelation->rd_rel->reltablespace,
+			  tablespaceOid ? tablespaceOid : indexRelation->rd_rel->reltablespace,
 			  indexRelation->rd_indcollation,
 			  indclass->values,
 			  indcoloptions->values,
@@ -3297,16 +3298,22 @@ IndexGetRelation(Oid indexId, bool missing_ok)
  * reindex_index - This routine is used to recreate a single index
  */
 void
-reindex_index(Oid indexId, bool skip_constraint_checks, char persistence,
+reindex_index(Oid indexId, Oid tablespaceOid, bool 

Re: Nondeterministic collations vs. text_pattern_ops

2019-09-18 Thread Peter Eisentraut
On 2019-09-17 17:17, Tom Lane wrote:
> My recommendation is to get rid of the run-time checks and instead
> put a hack like this into DefineIndex or some minion thereof:
> 
>   if ((opclass == TEXT_PATTERN_BTREE_CLASS_OID ||
>opclass == VARCHAR_PATTERN_BTREE_CLASS_OID ||
>opclass == BPCHAR_PATTERN_BTREE_CLASS_OID) &&
>   !get_collation_isdeterministic(collid))
>  ereport(ERROR, ...);

Here is a draft patch.

It will require a catversion change because those operator classes don't
have assigned OIDs so far.

The comment block I just moved over for the time being.  It should
probably be rephrased a bit.

-- 
Peter Eisentraut  http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
From 3b1f52e5292303cba8804ae1f6676973ef903ec0 Mon Sep 17 00:00:00 2001
From: Peter Eisentraut 
Date: Wed, 18 Sep 2019 14:28:31 +0200
Subject: [PATCH v1] Prohibit creating text_pattern_ops indexes with
 nondeterministic collations

Discussion: https://www.postgresql.org/message-id/22566.1568675...@sss.pgh.pa.us
---
 src/backend/catalog/index.c| 40 +++
 src/backend/utils/adt/varchar.c| 32 +-
 src/backend/utils/adt/varlena.c| 43 +-
 src/include/catalog/pg_opclass.dat |  9 ---
 4 files changed, 58 insertions(+), 66 deletions(-)

diff --git a/src/backend/catalog/index.c b/src/backend/catalog/index.c
index 1dac2803b0..f2adee3156 100644
--- a/src/backend/catalog/index.c
+++ b/src/backend/catalog/index.c
@@ -762,6 +762,46 @@ index_create(Relation heapRelation,
(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
 errmsg("user-defined indexes on system catalog 
tables are not supported")));
 
+   /*
+* XXX We cannot use a text_pattern_ops index for nondeterministic
+* collations, because these operators intentionally ignore the 
collation.
+* However, the planner has no way to know that, so it might choose such
+* an index for an "=" clause, which would lead to wrong results.  This
+* check here doesn't prevent choosing the index, but it will at least
+* error out if the index is chosen.  A text_pattern_ops index on a 
column
+* with nondeterministic collation is pretty useless anyway, since LIKE
+* etc. won't work there either.  A future possibility would be to
+* annotate the operator class or its members in the catalog to avoid 
the
+* index.  Another alternative is to stay away from the *_pattern_ops
+* operator classes and prefer creating LIKE-supporting indexes with
+* COLLATE "C".
+*/
+   for (i = 0; i < indexInfo->ii_NumIndexAttrs; i++)
+   {
+   Oid collation = collationObjectId[i];
+   Oid opclass = classObjectId[i];
+
+   if (collation)
+   {
+   if (!get_collation_isdeterministic(collation) &&
+   (opclass == TEXT_BTREE_PATTERN_OPS_OID ||
+opclass == VARCHAR_BTREE_PATTERN_OPS_OID ||
+opclass == BPCHAR_BTREE_PATTERN_OPS_OID))
+   {
+   HeapTuple   classtup;
+
+   classtup = SearchSysCache1(CLAOID, 
ObjectIdGetDatum(opclass));
+   if (!HeapTupleIsValid(classtup))
+   elog(ERROR, "cache lookup failed for 
operator class %u", opclass);
+   ereport(ERROR,
+   
(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+errmsg("nondeterministic 
collations are not supported for operator class \"%s\"",
+   
NameStr(((Form_pg_opclass) GETSTRUCT(classtup))->opcname;
+   ReleaseSysCache(classtup);
+   }
+   }
+   }
+
/*
 * Concurrent index build on a system catalog is unsafe because we tend 
to
 * release locks before committing in catalogs.
diff --git a/src/backend/utils/adt/varchar.c b/src/backend/utils/adt/varchar.c
index 332dc860c4..b21137c583 100644
--- a/src/backend/utils/adt/varchar.c
+++ b/src/backend/utils/adt/varchar.c
@@ -1105,23 +1105,12 @@ hashbpcharextended(PG_FUNCTION_ARGS)
  */
 
 static int
-internal_bpchar_pattern_compare(BpChar *arg1, BpChar *arg2, Oid collid)
+internal_bpchar_pattern_compare(BpChar *arg1, BpChar *arg2)
 {
int result;
int len1,
len2;
 
-   check_collation_set(collid);
-
-   /*
-* see internal_text_pattern_compare()
-*/
-   if 

Re: enhance SPI to support EXECUTE commands

2019-09-18 Thread Ahsan Hadi
I don't see much use for this because the documentation says that the
server's EXECUTE command cannot be used directly within a PL/pgSQL function
(and it is not needed). Within PL/pgSQL you can execute update/delete
commands using PL/pgSQL's own EXECUTE command and get results like the row
count using GET DIAGNOSTICS.

Why would somebody do what you have shown in your example in pl/pgsql? Or
do you have a more general use-case for this enhancement?

On Thu, Sep 5, 2019 at 11:39 AM Quan Zongliang <
zongliang.q...@postgresdata.com> wrote:

> Dear hackers,
>
> I found that such a statement would get 0 in PL/pgSQL.
>
> PREPARE smt_del(int) AS DELETE FROM t1;
> EXECUTE 'EXECUTE smt_del(100)';
> GET DIAGNOSTICS j = ROW_COUNT;
>
> In fact, this is a problem with SPI, it does not support getting result
> of the EXECUTE command. I made a little enhancement. Support for the
> number of rows processed when executing INSERT/UPDATE/DELETE statements
> dynamically.
>
> Regards,
> Quan Zongliang
>


-- 
Highgo Software (Canada/China/Pakistan)
URL : http://www.highgo.ca
ADDR: 10318 WHALLEY BLVD, Surrey, BC
EMAIL: mailto: ahsan.h...@highgo.ca


Re: [PATCH] Speedup truncates of relation forks

2019-09-18 Thread Fujii Masao
On Tue, Sep 17, 2019 at 2:25 PM Michael Paquier  wrote:
>
> On Tue, Sep 17, 2019 at 01:44:12AM +, Jamison, Kirk wrote:
> > On Friday, September 13, 2019 10:06 PM (GMT+9), Fujii Masao wrote:
> >> On Fri, Sep 13, 2019 at 9:51 PM Alvaro Herrera 
> >> wrote:
>  As committed, the smgrdounlinkfork case is actually dead code; it's
>  never called from anywhere.  I left it in place just in case we want
>  it someday.
> >>>
> >>> but if no use has appeared in 7 years, I say it's time to kill it.
> >>
> >> +1
> >
> > The consensus is we remove it, right?
>
> Yes.  Just adding my +1 to nuke the function.

Okay, so committed.

Regards,

-- 
Fujii Masao




RE: A problem presentaion about ECPG, DECLARE STATEMENT

2019-09-18 Thread kuroda.hay...@fujitsu.com
Dear Peter,

I want to add a note about another purpose: declaring an SQL identifier.

In Oracle (and maybe DB2), the following example is not allowed:

...
EXEC SQL DECLARE cursor CURSOR FOR stmt;

EXEC SQL PREPARE stmt FOR "SELECT ..."
...

This happens because those preprocessors cannot recognize stmt as an SQL
identifier and throw an error.
I think DB2 might focus on this case, so the AT clause is not important for them.
But ECPG can accept these statements, so this is not an issue for Postgres.
That is why I did not mention it and focused instead on the omission of the AT
clause.

Hayato Kuroda
Fujitsu LIMITED



Re: [PATCH] Speedup truncates of relation forks

2019-09-18 Thread Fujii Masao
On Tue, Sep 17, 2019 at 10:44 AM Jamison, Kirk  wrote:
>
> On Friday, September 13, 2019 10:06 PM (GMT+9), Fujii Masao wrote:
> > On Fri, Sep 13, 2019 at 9:51 PM Alvaro Herrera 
> > wrote:
> > >
> > > On 2019-Sep-13, Fujii Masao wrote:
> > >
> > > > On Mon, Sep 9, 2019 at 3:52 PM Jamison, Kirk 
> > wrote:
> > >
> > > > > > Please add a preliminary patch that removes the function.  Dead
> > > > > > code is good, as long as it is gone.  We can get it pushed ahead of
> > the rest of this.
> > > > >
> > > > > Alright. I've attached a separate patch removing the smgrdounlinkfork.
> > > >
> > > > Per the past discussion, some people want to keep this "dead"
> > > > function for some reasons. So, in my opinion, it's better to just
> > > > enclose the function with #if NOT_USED and #endif, to keep the
> > > > function itself as it is, and then to start new discussion on
> > > > hackers about the removal of that separatedly from this patch.
> > >
> > > I searched for anybody requesting to keep the function.  I couldn't
> > > find anything.  Tom said in 2012:
> > > https://www.postgresql.org/message-id/1471.1339106...@sss.pgh.pa.us
> >
> > Yes. And I found Andres.
> > https://www.postgresql.org/message-id/20180621174129.hogefyopje4xaznu@al
> > ap3.anarazel.de
> >
> > > > As committed, the smgrdounlinkfork case is actually dead code; it's
> > > > never called from anywhere.  I left it in place just in case we want
> > > > it someday.
> > >
> > > but if no use has appeared in 7 years, I say it's time to kill it.
> >
> > +1
>
> The consensus is we remove it, right?
> Re-attaching the patch that removes the deadcode: smgrdounlinkfork().
>
> ---
> I've also fixed Fujii-san's comments below in the latest attached speedup 
> truncate rel patch (v8).

Thanks for updating the patch!

+ block = visibilitymap_truncate_prepare(rel, 0);
+ if (BlockNumberIsValid(block))
  {
- xl_smgr_truncate xlrec;
+ fork = VISIBILITYMAP_FORKNUM;
+ smgrtruncate(rel->rd_smgr, &fork, 1, &block);
+
+ if (RelationNeedsWAL(rel))
+ {
+ xl_smgr_truncate xlrec;

I don't think this fix is right. Originally, WAL is generated
even in the case where visibilitymap_truncate_prepare() returns
InvalidBlockNumber. But the patch unexpectedly changed the logic
so that WAL is not generated in that case.

+ if (fsm)
+ FreeSpaceMapVacuumRange(rel, first_removed_nblocks,
+ InvalidBlockNumber);

This code means that FreeSpaceMapVacuumRange() is called if FSM exists
even if FreeSpaceMapLocateBlock() returns InvalidBlockNumber.
This seems not right. Originally, FreeSpaceMapVacuumRange() is not called
in the case where InvalidBlockNumber is returned.

So I updated the patch based on yours and fixed the above issues.
Attached. Could you review this one? If there is no issue in that,
I'm thinking to commit that.

Regards,

-- 
Fujii Masao


speedup_truncate_forks_fujii.patch
Description: Binary data


Re: PGCOLOR? (Re: pgsql: Unified logging system for command-line programs)

2019-09-18 Thread Peter Eisentraut
On 2019-06-06 11:08, Masahiko Sawada wrote:
> On Tue, Apr 9, 2019 at 9:01 PM Christoph Berg  wrote:
>>
>> Re: Peter Eisentraut 2019-04-09 
>> 
>>> I'm okay with changing it.  As you indicate, I chose the name so that it
>>> doesn't look like a libpq variable.  There are some other PG_ variables
>>> throughout the code, but those appear to be mostly for internal use.
>>> Also, there is GCC_COLORS, LS_COLORS, etc.  But perhaps this wisdom will
>>> be lost on users who just read the man page and get confused. ;-)
>>
>> Do we need two variables to control this? I was only looking at
>> PG_COLOR, and noticed PG_COLORS only later. Keeping PG_COLORS aligned
>> with {GCC,LS}_COLORS makes sense. How about removing PG_COLOR, and
>> making "auto" the default? (Maybe we could still support "PG_COLORS=off")
>>
> 
> I think the if we keep two variables user can set the same value to
> both GCC_COLORS and PG_COLORS. Rather I think it's a problem that
> there is no documentation of PG_COLORS. Thoughts?

It looks like there is documentation for PG_COLORS in the release notes
now, which seems like an odd place.  Suggestions for a better place?

And any more opinions for PG_COLORS vs PGCOLORS naming?

-- 
Peter Eisentraut  http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: Documentation updates for direct foreign table modification

2019-09-18 Thread Etsuro Fujita
On Tue, Sep 17, 2019 at 3:45 PM Etsuro Fujita  wrote:
> * Commit 7086be6e3 should have documented the limitation that the
> direct modification is disabled when WCO constraints are present, but
> didn't, which is definitely my fault.
>
> * Commit fc22b6623 should have documented the limitation that the
> direct modification is disabled when generated columns are defined,
> but didn't.
>
> Attached is a patch for updating the documentation.

Committed.

Best regards,
Etsuro Fujita




Re: subscriptionCheck failures on nightjar

2019-09-18 Thread Kuntal Ghosh
Hello Michael,

On Wed, Sep 18, 2019 at 6:28 AM Michael Paquier  wrote:
>
> On my side, I have let this thing run for a couple of hours with a
> patched version to include a sleep between the rename and the sync but
> I could not reproduce it either:
> #!/bin/bash
> attempt=0
> while true; do
> attempt=$((attempt+1))
> echo "Attempt $attempt"
> cd $HOME/postgres/src/test/recovery/
> PROVE_TESTS=t/006_logical_decoding.pl make check > /dev/null 2>&1
> ERRNUM=$?
> if [ $ERRNUM != 0 ]; then
> echo "Failed at attempt $attempt"
> exit $ERRNUM
> fi
> done
I think the failing test is src/test/subscription/t/010_truncate.pl.
I've tried to reproduce the same failure using your script in OS X
10.14 and Ubuntu 18.04.2 (Linux version 5.0.0-23-generic), but
couldn't reproduce the same.



-- 
Thanks & Regards,
Kuntal Ghosh
EnterpriseDB: http://www.enterprisedb.com




Re: patch: psql - enforce constant width of last column

2019-09-18 Thread Ahsan Hadi
On Tue, Sep 17, 2019 at 8:16 PM Pavel Stehule 
wrote:

>
>
> út 17. 9. 2019 v 17:06 odesílatel Ahsan Hadi 
> napsal:
>
>> Hi Pavel,
>>
>> I have been trying to reproduce the case of badly displaying last columns
>> of a query result-set. I played around with the legal values for psql
>> border variable but not able to find a case where last columns are badly
>> displayed. Can you please share an example that I can use to reproduce this
>> problem. I will try out your patch once I am able to reproduce the problem.
>>
>
> you need to use pspg, and vertical cursor.
>
> https://github.com/okbob/pspg
> vertical cursor should be active
>

Okay, thanks for the info. I don't think it was possible to figure this out
by reading the initial post. I will check it out.

Does this patch have any value for psql without pspg?


> \pset border 1
> \pset linestyle ascii
> \pset pager always
>
> select * from generate_series(1,3);
>
> Regards
>
> Pavel
>
>
>> Thanks,
>>
>> -- Ahsan
>>
>>
>> On Mon, Sep 9, 2019 at 2:32 PM Pavel Stehule 
>> wrote:
>>
>>> Hi
>>>
>>> When I played with vertical cursor support I got badly displayed last
>>> columns when border was not 2. Only when border is 2, then psql displays
>>> last column with same width for each row.
>>>
>>> I think so we can force column width alignment for any border styles
>>> today (for alignment and wrapping styles) or as minimum this behave can be
>>> optional.
>>>
>>> I wrote a patch with pset option "final_spaces", but I don't see a
>>> reason why we trim rows today.
>>>
>>> Regards
>>>
>>> Pavel
>>>
>>


Re: some PostgreSQL 12 release notes comments

2019-09-18 Thread Magnus Hagander
On Tue, Sep 17, 2019 at 7:09 PM Peter Eisentraut <
peter.eisentr...@2ndquadrant.com> wrote:

>
> > * Discovery of LDAP servers if PostgreSQL is built with OpenLDAP
>
> I would remove the "if" part from the major features list, since it's a
> qualification of minor importance.  Instead I'd write something like
>
> * Discovery of LDAP servers using DNS SRV
>

-> "DNS SRV records" I think?


which is a clearer concept that people can easily recognize.
>

+1. The "discovery" part isn't actually part of LDAP, it's part of DNS, so
leaving that part out does a disservice.

-- 
 Magnus Hagander
 Me: https://www.hagander.net/ 
 Work: https://www.redpill-linpro.com/ 


Re: pgbench - allow to create partitioned tables

2019-09-18 Thread Amit Kapila
On Wed, Sep 18, 2019 at 12:19 AM Fabien COELHO  wrote:
>
>
> Attached v9:
>
>   - remove the PART_UNKNOWN and use partitions = -1 to tell
> that there is an error, and partitions >= 1 to print info
>   - use search_path to find at most one pgbench_accounts
> It still uses left join because I still think that it is appropriate.
> I added a lateral to avoid repeating the array_position call
> to manage the search_path, and use explicit pg_catalog everywhere.

It would be good if you can add some more comments to explain the
intent of query.

Few more comments:
*
else
+ {
+ /* PQntuples(res) == 1: normal case, extract the partition status */
+ char *ps = PQgetvalue(res, 0, 1);
+
+ if (ps == NULL)
+ partition_method = PART_NONE;


When can we expect ps to be NULL?  If this is not a valid case, then
probably an Assert would be better.

*
+ else if (PQntuples(res) == 0)
+ {
+ /* no pgbench_accounts found, builtin script should fail later */
+ partition_method = PART_NONE;
+ partitions = -1;
+ }

If we don't find pgbench_accounts, let's raise the error right here rather
than later, unless you have a valid case in mind.

*
+
+ /*
+ * Partition information. Assume no partitioning on any failure, so as
+ * to avoid failing on an older version.
+ */
..
+ if (PQresultStatus(res) != PGRES_TUPLES_OK)
+ {
+ /* probably an older version, coldly assume no partitioning */
+ partition_method = PART_NONE;
+ partitions = 0;
+ }

So, here we are silently absorbing the error when pgbench is executed
against an older server version which doesn't support partitioning. If
that is the case, then I think that if the user gives --partitions for the
old server version, it will also give an error?  It is not clear in the
documentation whether or not we support using pgbench with older
server versions.  I guess it didn't matter before, but with this feature, it
can matter.  Do we need to document this?


-- 
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com




Re: pgbench - allow to create partitioned tables

2019-09-18 Thread Dilip Kumar
On Wed, Sep 18, 2019 at 1:02 PM Fabien COELHO  wrote:
>
*/
+ res = PQexec(con,
+ "select o.n, p.partstrat, pg_catalog.count(p.partrelid) "
+ "from pg_catalog.pg_class as c "
+ "join pg_catalog.pg_namespace as n on (n.oid = c.relnamespace) "
+ "cross join lateral (select
pg_catalog.array_position(pg_catalog.current_schemas(true),
n.nspname)) as o(n) "
+ "left join pg_catalog.pg_partitioned_table as p on (p.partrelid = c.oid) "
+ "left join pg_catalog.pg_inherits as i on (c.oid = i.inhparent) "
+ /* right name and schema in search_path */
+ "where c.relname = 'pgbench_accounts' and o.n is not null "
+ "group by 1, 2 "
+ "order by 1 asc "

I have a question: wouldn't it be sufficient to just group by 1?  Are
you expecting multiple pgbench_accounts tables partitioned by different
strategies under the same schema?

-- 
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com




Re: Psql patch to show access methods info

2019-09-18 Thread Alexander Korotkov
On Tue, Sep 17, 2019 at 9:01 PM Alvaro Herrera  wrote:
> It seems strange that there's a way to display AMs, and a way to display
> ops and procs in an opfamily; but there's no way to list what opfamilies
> exist (possibly given an AM as pattern).  Should we add that too?  We
> had \dAf in the original submission, but that seems to have lost along
> the way, not sure why.
>
> I think \dAf is just as critical as \dAo; the former lets you know which
> opfamilies you can use in CREATE INDEX, while the latter lets you know
> which operators would be helped by such an index.  (But, really, only if
> the opfamily name is printed in \d of the index, which we currently
> don't print unless it's non-default ... which is an omission that
> perhaps we should consider fixing).
>
> On the other hand, from a user perspective, what you really want to know
> is: what opfamilies exist for datatype T, and what operators are
> supported by the opfamily I have chosen?  The current patch doesn't
> really help you find that out.

I think you have a point.  Will add \dAf command to the patch.
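For the archives, the kind of catalog query such a \dAf could issue is
roughly the following (illustrative only, not necessarily what the patch
will end up running):

SELECT am.amname AS "AM",
       opf.opfname AS "Operator family"
FROM pg_catalog.pg_opfamily opf
     JOIN pg_catalog.pg_am am ON am.oid = opf.opfmethod
ORDER BY 1, 2;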

> I think \dAp isn't terribly informative from a user perspective.  The
> support procs are just an opfamily implementation detail.

I've expressed my opinion regarding \dAp in [1].  In my observation,
some advanced users can write btree/hash opclasses in pl/* languages.
This doesn't require core-developer knowledge.  And they may find the
\dAp command useful.  What do you think?

Links
1. 
https://www.postgresql.org/message-id/CAPpHfdtj_w20hTr4fHW4MnpL-pPGU3Mw0A9pRTRBL_XP-WGsyQ%40mail.gmail.com

--
Alexander Korotkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company




Re: Define jsonpath functions as stable

2019-09-18 Thread Alexander Korotkov
On Wed, Sep 18, 2019 at 4:13 AM Jonathan S. Katz  wrote:
> Here is a v4. I added some more paragraphs the bullet point that
> explains the different flags to make it feel a bit less dense.

Sorry that I didn't participate in this discussion till now.  FWIW, I
agree with the selected approach to document the differences from XQuery
regexes and forbid 'x' in jsonpath like_regex.  The patch also looks good
to me at first glance.
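To make the 'x' point concrete, the flag in question is the one passed in
like_regex's flag string, e.g. (illustrative only):

-- fine: case-insensitive match
SELECT jsonb_path_exists('{"a": "abc"}', '$.a ? (@ like_regex "^AB" flag "i")');
-- expected to be rejected once 'x' is forbidden
SELECT jsonb_path_exists('{"a": "abc"}', '$.a ? (@ like_regex "^a b c$" flag "x")');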


--
Alexander Korotkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company




Re: Global temporary tables

2019-09-18 Thread Konstantin Knizhnik



On 21.08.2019 11:54, Konstantin Knizhnik wrote:



On 20.08.2019 20:01, Pavel Stehule wrote:
Another solution is wait on ZHeap storage and replica can to have own 
UNDO log.





I thought about implementation of special table access method for
temporary tables.


+1

Unfortunately implementing special table access method for temporary 
tables doesn't solve all problems.

XID generation is not part of table access methods.
So we still need to assign some XID to write transaction at replica 
which will not conflict with XIDs received from master.
Actually only global temp tables can be updated at replica and so 
assigned XIDs can be stored only in tuples of such relations.
But still I am not sure that we can use arbitrary XID for such 
transactions at replica.


Also I upset by amount of functionality which has to be reimplemented 
for global temp tables if we really want to provide access method for 
them:


1. CLOG
2. vacuum
3. MVCC visibility

And still it is not possible to encapsulate all changes need to 
support writes to temp tables at replica inside table access method.
XID assignment, transaction commit and abort, subtransactions - all 
this places need to be patched.




I was able to fully support working with global temp tables at a replica
(including subtransactions).
The patch is attached. You can also find this version at
https://github.com/postgrespro/postgresql.builtin_pool/tree/global_temp_hot


Right now, transactions at the replica updating global temp tables are
assigned a special kind of GIDs which are not related to the XIDs received
from the master.
So special visibility rules are used for such tables at the replica. Also I
had to patch the TransactionIdIsInProgress, TransactionIdDidCommit and
TransactionIdGetCurrent
functions to correctly handle such XIDs. In principle it is possible to
implement global temp tables as a special heap access method, but that would
require copying a lot of code (heapam.c),

so I prefer to add a few checks to the existing functions.

There are still some limitations:
- The number of transactions at the replica which update temp tables is
limited to 2^32 (the wraparound problem is not addressed).
- I have to maintain an in-memory analog of CLOG for such transactions,
which is also never truncated. It means that for 2^32 transactions the size
of the bitmap can grow up to 0.5GB.


I am trying to understand what the next steps in global temp table support
should be.
This is why I want to run a short survey of what people expect
from global temp tables:


1. I do not need them at all.
2. Eliminate catalog bloating.
3. Mostly needed for compatibility with Oracle (simplify porting,...).
4. Parallel query execution.
5. Can be used at replica.
6. More efficient use of resources (first of all memory).

--
Konstantin Knizhnik
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company

diff --git a/contrib/pg_buffercache/pg_buffercache_pages.c b/contrib/pg_buffercache/pg_buffercache_pages.c
index 1bd579f..2d93f6f 100644
--- a/contrib/pg_buffercache/pg_buffercache_pages.c
+++ b/contrib/pg_buffercache/pg_buffercache_pages.c
@@ -153,9 +153,9 @@ pg_buffercache_pages(PG_FUNCTION_ARGS)
 			buf_state = LockBufHdr(bufHdr);
 
 			fctx->record[i].bufferid = BufferDescriptorGetBuffer(bufHdr);
-			fctx->record[i].relfilenode = bufHdr->tag.rnode.relNode;
-			fctx->record[i].reltablespace = bufHdr->tag.rnode.spcNode;
-			fctx->record[i].reldatabase = bufHdr->tag.rnode.dbNode;
+			fctx->record[i].relfilenode = bufHdr->tag.rnode.node.relNode;
+			fctx->record[i].reltablespace = bufHdr->tag.rnode.node.spcNode;
+			fctx->record[i].reldatabase = bufHdr->tag.rnode.node.dbNode;
 			fctx->record[i].forknum = bufHdr->tag.forkNum;
 			fctx->record[i].blocknum = bufHdr->tag.blockNum;
 			fctx->record[i].usagecount = BUF_STATE_GET_USAGECOUNT(buf_state);
diff --git a/contrib/pg_prewarm/autoprewarm.c b/contrib/pg_prewarm/autoprewarm.c
index 38ae240..8a04954 100644
--- a/contrib/pg_prewarm/autoprewarm.c
+++ b/contrib/pg_prewarm/autoprewarm.c
@@ -608,9 +608,9 @@ apw_dump_now(bool is_bgworker, bool dump_unlogged)
 		if (buf_state & BM_TAG_VALID &&
 			((buf_state & BM_PERMANENT) || dump_unlogged))
 		{
-			block_info_array[num_blocks].database = bufHdr->tag.rnode.dbNode;
-			block_info_array[num_blocks].tablespace = bufHdr->tag.rnode.spcNode;
-			block_info_array[num_blocks].filenode = bufHdr->tag.rnode.relNode;
+			block_info_array[num_blocks].database = bufHdr->tag.rnode.node.dbNode;
+			block_info_array[num_blocks].tablespace = bufHdr->tag.rnode.node.spcNode;
+			block_info_array[num_blocks].filenode = bufHdr->tag.rnode.node.relNode;
 			block_info_array[num_blocks].forknum = bufHdr->tag.forkNum;
 			block_info_array[num_blocks].blocknum = bufHdr->tag.blockNum;
 			++num_blocks;
diff --git a/contrib/pgrowlocks/pgrowlocks.c b/contrib/pgrowlocks/pgrowlocks.c
index a2c44a9..43b4c66 100644
--- a/contrib/pgrowlocks/pgrowlocks.c
+++ b/contrib/pgrowlocks/pgrowlocks.c
@@ -158,7 +158,8 @@ pgrowlocks(PG_FUNCTION_ARGS)
 		/* 

Re: Add a GUC variable that control logical replication

2019-09-18 Thread Quan Zongliang

On 2019/9/18 17:11, Peter Eisentraut wrote:

On 2019-09-18 10:39, Quan Zongliang wrote:

Sybase has a feature to turn off replication at the session level: set
replication = off, which can be temporarily turned off when there is a
maintenance action on the table. Our users also want this feature.


These kinds of feature requests are always dubious because just because
Sybase behaves this way for some implementation or architectural reason
doesn't necessarily mean it makes sense for PostgreSQL too.


Agree

Why do you need to turn off replication when there is "maintenance" on a
table?  What does that even mean?

In one table, the user only keeps data for a period of time and deletes
expired records every day, involving about 10 million to 20 million
records at a time. They do not want such bulk operations to be passed
through logical replication.

My English is not good; I used the wrong word "maintenance" in my description.





Re: some PostgreSQL 12 release notes comments

2019-09-18 Thread Peter Eisentraut
On 2019-09-17 22:22, Tom Lane wrote:
> Peter Eisentraut  writes:
>> * Add GSSAPI encryption support (Robbie Harwood, Stephen Frost)
>>   This allows TCP/IP connections to be encrypted when using GSSAPI
>>   authentication without having to set up a separate encryption facility
>>   like SSL.
> Hmm, does that imply that you don't have to have compiled --with-openssl,
> or just that you don't have to bother with setting up SSL certificates?
> But you already don't have to do the latter.  I'd be the first to admit
> that I know nothing about GSSAPI, but this text still doesn't enlighten
> me about why I should learn.

It means, more or less, if you already have the client and the server do
the GSS dance for authentication, you just have to turn on an additional
flag and they'll also encrypt the communication while they're at it.

This does not require SSL support.

So if you already have a Kerberos infrastructure set up, you can get
wire encryption for almost free without having to set up a parallel SSL
CA infrastructure.  Which is great for administration.
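For the record, the "additional flag" boils down to something like this on
top of an existing Kerberos setup (host name and network made up):

# pg_hba.conf: only accept GSSAPI-encrypted TCP connections from this network
hostgssenc  all  all  192.0.2.0/24  gss

# libpq side: insist on GSSAPI encryption
psql "host=db.example.com dbname=postgres gssencmode=require"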

-- 
Peter Eisentraut  http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: Add a GUC variable that control logical replication

2019-09-18 Thread Peter Eisentraut
On 2019-09-18 10:39, Quan Zongliang wrote:
> Sybase has a feature to turn off replication at the session level: set 
> replication = off, which can be temporarily turned off when there is a 
> maintenance action on the table. Our users also want this feature.

These kinds of feature requests are always dubious because just because
Sybase behaves this way for some implementation or architectural reason
doesn't necessarily mean it makes sense for PostgreSQL too.

Why do you need to turn off replication when there is "maintenance" on a
table?  What does that even mean?

-- 
Peter Eisentraut  http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: Allow to_date() and to_timestamp() to accept localized names

2019-09-18 Thread Juan José Santamaría Flecha
On Fri, Sep 13, 2019 at 10:31 PM Alvaro Herrera
 wrote:
>
Thanks for taking a look at this.

> I'm confused why we acquire the MONTH_DIM / etc definitions.  Can't we
> just use lengthof() of the corresponding array?  AFAICS it should work
> just as well.
>

It was because of the length difference between the ASCII-name arrays,
which were all null-terminated, and the localized-name arrays. The attached
version uses lengthof().

> I wonder if the "compare first char" thing (seq_search_localized) really
> works when there are multibyte chars in the day/month names.  I think
> the code compares just the first char ... but what if the original
> string uses those funny Unicode non-normalized letters and the locale
> translation uses normalized letters?  My guess is that the first-char
> comparison will fail, but surely you'll want the name to match.
> (There's no month/day name in Spanish that doesn't start with an ASCII
> letter, but I bet there are some in other languages.)  I think the
> localized comparison should avoid the first-char optimization, just
> compare the whole string all the time, and avoid possible weird issues.

The localized search is reformulated in this version to deal with
multibyte normalization. A regression test for this issue is included.
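As a usage illustration (assuming an es_ES lc_time and that, as with
to_char(), the TM prefix is what switches to localized names):

SET lc_time = 'es_ES.UTF-8';
SELECT to_date('12 FEBRERO 2011', 'DD TMMONTH YYYY');  -- should yield 2011-02-12 with the patch applied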


Regards,

Juan José Santamaría Flecha


0001-Allow-localized-month-names-to_date-v2.patch
Description: Binary data


Add a GUC variable that control logical replication

2019-09-18 Thread Quan Zongliang



Sybase has a feature to turn off replication at the session level: set
replication = off. With it, replication can be temporarily turned off when
there is a maintenance action on a table. Our users also want this feature.
I added a new flag bit in xinfo and control it with a session-level variable:
when the variable is set to off, the flag is written when the transaction is
committed, and logical decoding then skips the transaction just like an
aborted transaction. Since PostgreSQL has two types of
replication, I call the variable "logical_replication" to avoid
confusion, and its default value is true.


Sample SQL

insert into a values(100);
set logical_replication to off;
insert into a values(200);
reset logical_replication;
insert into a values(300);

pg_recvlogical output (the second insert is not output):
BEGIN 492
table public.a: INSERT: col1[integer]:100
COMMIT 492
BEGIN 494
table public.a: INSERT: col1[integer]:300
COMMIT 494

I'm not sure this is the most appropriate way. What do you think?

Regards,
Quan Zongliang
diff --git a/src/backend/access/transam/xact.c 
b/src/backend/access/transam/xact.c
index 9162286c98..d7a04539d6 100644
--- a/src/backend/access/transam/xact.c
+++ b/src/backend/access/transam/xact.c
@@ -82,6 +82,8 @@ bool  XactDeferrable;
 
 intsynchronous_commit = SYNCHRONOUS_COMMIT_ON;
 
+bool   logical_replication = true;
+
 /*
  * When running as a parallel worker, we place only a single
  * TransactionStateData on the parallel worker's state stack, and the XID
@@ -5535,6 +5537,9 @@ XactLogCommitRecord(TimestampTz commit_time,
xl_origin.origin_timestamp = 
replorigin_session_origin_timestamp;
}
 
+   if (!logical_replication)
+   xl_xinfo.xinfo |= XACT_XINFO_LOGIREPL_OFF;
+
if (xl_xinfo.xinfo != 0)
info |= XLOG_XACT_HAS_INFO;
 
diff --git a/src/backend/replication/logical/decode.c 
b/src/backend/replication/logical/decode.c
index c53e7e2279..b11c89ff74 100644
--- a/src/backend/replication/logical/decode.c
+++ b/src/backend/replication/logical/decode.c
@@ -618,6 +618,7 @@ DecodeCommit(LogicalDecodingContext *ctx, XLogRecordBuffer 
*buf,
 */
if (SnapBuildXactNeedsSkip(ctx->snapshot_builder, buf->origptr) ||
(parsed->dbId != InvalidOid && parsed->dbId != 
ctx->slot->data.database) ||
+   (parsed->xinfo & XACT_XINFO_LOGIREPL_OFF) ||
ctx->fast_forward || FilterByOrigin(ctx, origin_id))
{
for (i = 0; i < parsed->nsubxacts; i++)
diff --git a/src/backend/utils/misc/guc.c b/src/backend/utils/misc/guc.c
index 90ffd89339..0880a2765f 100644
--- a/src/backend/utils/misc/guc.c
+++ b/src/backend/utils/misc/guc.c
@@ -1952,6 +1952,16 @@ static struct config_bool ConfigureNamesBool[] =
NULL, NULL, NULL
},
 
+   {
+   {"logical_replication", PGC_USERSET, REPLICATION_MASTER,
+   gettext_noop("Close logical replication of transactions 
in the session."),
+   NULL
+   },
+   &logical_replication,
+   true,
+   NULL, NULL, NULL
+   },
+
/* End-of-list marker */
{
{NULL, 0, 0, NULL, NULL}, NULL, false, NULL, NULL, NULL
diff --git a/src/include/access/xact.h b/src/include/access/xact.h
index d714551704..1a7f52ac50 100644
--- a/src/include/access/xact.h
+++ b/src/include/access/xact.h
@@ -81,6 +81,9 @@ typedef enum
 /* Synchronous commit level */
 extern int synchronous_commit;
 
+/* turn on/off logical replication */
+extern boollogical_replication;
+
 /*
  * Miscellaneous flag bits to record events which occur on the top level
  * transaction. These flags are only persisted in MyXactFlags and are intended
@@ -168,6 +171,12 @@ typedef void (*SubXactCallback) (SubXactEvent event, 
SubTransactionId mySubid,
 #define XACT_XINFO_HAS_AE_LOCKS(1U << 6)
 #define XACT_XINFO_HAS_GID (1U << 7)
 
+/*
+ * It indicates that the data affected by this transaction does not need
+ * to be included in the logical replication.
+ */
+#define XACT_XINFO_LOGIREPL_OFF(1U << 28)
+
 /*
  * Also stored in xinfo, these indicating a variety of additional actions that
  * need to occur when emulating transaction effects during recovery.


Re: POC: Cleaning up orphaned files using undo logs

2019-09-18 Thread Amit Kapila
On Mon, Sep 16, 2019 at 10:37 PM Robert Haas  wrote:
>
> It seems to me that zheap went wrong in ending up with separate undo
> types for in-place and non-in-place updates. Why not just have ONE
> kind of undo record that describes an update, and allow that update to
> have either one TID or two TIDs depending on which kind of update it
> is? There may be a reason, but I don't know what it is, unless it's
> just that the UnpackedUndoRecord idea that I invented wasn't flexible
> enough and nobody thought of generalizing it. Curious to hear your
> thoughts on this.
>

I think we need not only the TIDs, but also two uur_prevundo (previous
undo of the block) pointers.  This is required both when we have to
perform page-wise undo and for chain traversal during visibility checks.
So, we can keep a combination of TID and prevundo.

The other thing is that during rollback when we collect the undo for
each page, applying the action for this undo need some thoughts.  For
example, we can't apply the undo to rollback both Insert and
non-inplace-update as both are on different pages.  The reason is that
the page where non-inplace-update has happened might have more undos
that need to be applied before this.  We can somehow make this undo
available to apply while collecting undo for both the heap pages.  I
think there is also a need to
identify which TID is for Insert and which is for non-inplace-update
part of the operation because we won't know that while applying undo
unless we check the state of a tuple on the page.

So, with this idea, we will make one undo record part of multiple
chains which might need some consideration at different places like
above.



--
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com




Re: Proposal: Add more compile-time asserts to expose inconsistencies.

2019-09-18 Thread Michael Paquier
On Wed, Sep 18, 2019 at 06:46:24AM +, Smith, Peter wrote:
> I have identified some OSS code where more compile-time asserts could be 
> added. 
> 
> Mostly these are asserting that arrays have the necessary length to
> accommodate the enums that are used to index into them.
> 
> In general the code is already commented with warnings such as:
> * "If you add a new entry, remember to ..."
> * "When modifying this enum, update the table in ..."
> * "Display names for enums in ..."
> * etc.
> 
> But comments can be accidentally overlooked, so adding the
> compile-time asserts can help eliminate human error. 

For some of them it could help, and we could think about a better
location for that stuff than an unused routine.  The indentation of
your patch is weird, with "git diff --check" complaining a lot.

If you want to discuss more about that, could you add that to the next
commit fest?  Here it is:
https://commitfest.postgresql.org/25/
--
Michael


signature.asc
Description: PGP signature


Re: pgbench - allow to create partitioned tables

2019-09-18 Thread Fabien COELHO


Hello Amit,


+fprintf(stderr, "invalid partition type,
expecting \"range\" or \"hash\","

How about "partitioning method" instead of "partition type"?


Indeed, this is a left over from a previous version.


+fprintf(stderr, "--partition-method requires actual
partitioning with --partitions\n");

[...] "--partition-method requires --partitions to be greater than zero"



I think the first suggestion is clear enough. I've put a shorter variant 
in the same spirit:


  "--partitions-method requires greater than zero --partitions"

Attached v10 fixes both messages.

--
Fabien.

diff --git a/doc/src/sgml/ref/pgbench.sgml b/doc/src/sgml/ref/pgbench.sgml
index c857aa3cba..e3a0abb4c7 100644
--- a/doc/src/sgml/ref/pgbench.sgml
+++ b/doc/src/sgml/ref/pgbench.sgml
@@ -306,6 +306,31 @@ pgbench  options  d
   
  
 
+ 
+  --partitions=NUM
+  
+   
+Create a partitioned pgbench_accounts table with
+NUM partitions of nearly equal size for
+the scaled number of accounts.
+Default is 0, meaning no partitioning.
+   
+  
+ 
+
+ 
+  --partition-method=NAME
+  
+   
+Create a partitioned pgbench_accounts table with
+NAME method.
+Expected values are range or hash.
+This option requires that --partitions is set to non-zero.
+If unspecified, default is range.
+   
+  
+ 
+
  
   --tablespace=tablespace
   
diff --git a/src/bin/pgbench/pgbench.c b/src/bin/pgbench/pgbench.c
index ed7652bfbf..dd5bb5c215 100644
--- a/src/bin/pgbench/pgbench.c
+++ b/src/bin/pgbench/pgbench.c
@@ -186,6 +186,15 @@ int64		latency_limit = 0;
 char	   *tablespace = NULL;
 char	   *index_tablespace = NULL;
 
+/* partitioning for pgbench_accounts table, 0 for no partitioning, -1 for bad */
+static int 		partitions = 0;
+
+typedef enum { PART_NONE, PART_RANGE, PART_HASH }
+  partition_method_t;
+
+static partition_method_t partition_method = PART_NONE;
+static const char *PARTITION_METHOD[] = { "none", "range", "hash" };
+
 /* random seed used to initialize base_random_sequence */
 int64		random_seed = -1;
 
@@ -617,6 +626,9 @@ usage(void)
 		   "  --foreign-keys   create foreign key constraints between tables\n"
 		   "  --index-tablespace=TABLESPACE\n"
 		   "   create indexes in the specified tablespace\n"
+		   "  --partitions=NUM partition pgbench_accounts in NUM parts (default: 0)\n"
+		   "  --partition-method=(range|hash)\n"
+		   "   partition pgbench_accounts with this method (default: range)\n"
 		   "  --tablespace=TABLESPACE  create tables in the specified tablespace\n"
 		   "  --unlogged-tablescreate tables as unlogged tables\n"
 		   "\nOptions to select what to run:\n"
@@ -3601,6 +3613,17 @@ initDropTables(PGconn *con)
 	 "pgbench_tellers");
 }
 
+/*
+ * add fillfactor percent option if not 100.
+ */
+static void
+append_fillfactor(char *opts, int len)
+{
+	if (fillfactor < 100)
+		snprintf(opts + strlen(opts), len - strlen(opts),
+ " with (fillfactor=%d)", fillfactor);
+}
+
 /*
  * Create pgbench's standard tables
  */
@@ -3664,9 +3687,15 @@ initCreateTables(PGconn *con)
 
 		/* Construct new create table statement. */
 		opts[0] = '\0';
-		if (ddl->declare_fillfactor)
+
+		/* Partition pgbench_accounts table */
+		if (partitions >= 1 && strcmp(ddl->table, "pgbench_accounts") == 0)
 			snprintf(opts + strlen(opts), sizeof(opts) - strlen(opts),
-	 " with (fillfactor=%d)", fillfactor);
+	 " partition by %s (aid)", PARTITION_METHOD[partition_method]);
+		else if (ddl->declare_fillfactor)
+			/* fillfactor is only expected on actual tables */
+			append_fillfactor(opts, sizeof(opts));
+
 		if (tablespace != NULL)
 		{
 			char	   *escape_tablespace;
@@ -3686,6 +3715,57 @@ initCreateTables(PGconn *con)
 
 		executeStatement(con, buffer);
 	}
+
+	/* if needed, pgbench_accounts partitions must be created manually */
+	if (partitions >= 1)
+	{
+		char		ff[64];
+
+		ff[0] = '\0';
+		append_fillfactor(ff, sizeof(ff));
+
+		fprintf(stderr, "creating %d partitions...\n", partitions);
+
+		for (int p = 1; p <= partitions; p++)
+		{
+			char		query[256];
+
+			if (partition_method == PART_RANGE)
+			{
+int64		part_size = (naccounts * (int64) scale + partitions - 1) / partitions;
+char		minvalue[32], maxvalue[32];
+
+/* For RANGE, we use open-ended partitions at the beginning and end */
+if (p == 1)
+	sprintf(minvalue, "minvalue");
+else
+	sprintf(minvalue, INT64_FORMAT, (p - 1) * part_size + 1);
+
+if (p < partitions)
+	sprintf(maxvalue, INT64_FORMAT, p * part_size + 1);
+else
+	sprintf(maxvalue, "maxvalue");
+
+snprintf(query, sizeof(query),
+		 "create%s table pgbench_accounts_%d\n"
+		 "  partition of pgbench_accounts\n"
+		 "  for values from (%s) to (%s)%s\n",
+		 unlogged_tables ? " unlogged" : "", p,
+		 minvalue, 

Re: Efficient output for integer types

2019-09-18 Thread Kyotaro Horiguchi
Hello.

At Wed, 18 Sep 2019 05:42:01 +0200, David Fetter  wrote in 
<20190918034201.gx31...@fetter.org>
> On Tue, Sep 17, 2019 at 09:01:57AM +0200, David Fetter wrote:
> > On Tue, Sep 17, 2019 at 08:55:05AM +0200, David Fetter wrote:
> > > On Sun, Sep 15, 2019 at 09:18:49AM +0200, David Fetter wrote:
> > > > Folks,
> > > > 
> > > > Please find attached a couple of patches intended to $subject.
> > > > 
> > > > This patch set cut the time to copy ten million rows of randomly sized
> > > > int8s (10 of them) by about a third, so at least for that case, it's
> > > > pretty decent.
> > > 
> > > Added int4 output, removed the sprintf stuff, as it didn't seem to
> > > help in any cases I was testing.
> > 
> > Found a couple of "whiles" that should have been "ifs."
> 
> Factored out some inefficient functions and made the guts use the more
> efficient function.

I'm not sure this is in line with the KISS principle, but I looked at it and
have several random comments.


+numutils.o: CFLAGS += $(PERMIT_DECLARATION_AFTER_STATEMENT)

I don't think that we are allowing that as project coding
policy. It seems to have been introduced only to accept external
code as-is.


-charstr[23];/* sign, 21 digits and '\0' */
+charstr[MAXINT8LEN];

It's awkward that MAXINT8LEN includes the trailing NUL; a name like
MAXINT8BUFLEN could mean that. I think MAXINT8LEN should be 20 and the
definition should be str[MAXINT8LEN + 1].



+static const char DIGIT_TABLE[200] = {
+   '0', '0', '0', '1', '0', '2', '0', '3', '0', '4', '0', '5', '0', '6', 
'0', '7', '0', '8', '0', '9',

Wouldn't it be simpler if it were defined as a constant string?

static const char DIGIT_TABLE[201] =
"00010203040519"
"20212223242539"
..


+pg_ltoa_n(int32 value, char *a)
...
+   /* Compute the result string. */
+   while (value >= 1)

We have only two digits above the value. Isn't the stuff inside
the while loop a waste of cycles?


+   /* Expensive 64-bit division. Optimize? */

I believe the compiler handles such trivial optimizations (concretely,
it converts them into shifts and subtractions if that is faster).


+   memcpy(a + olength - i - 2, DIGIT_TABLE + c0, 2);

Maybe it'd be easier to read if 'a + olength - i' were a single variable.


+   i += adjust;
+   return i;

If 'a + olength - i' is replaced with a variable, the return
statement is replaceable with "return olength + adjust;".


+   return t + (v >= PowersOfTen[t]);

I think it would be better if it were 't - (v < POT[t]) + 1; /*
log10(v) + 1 */'. At least we need an explanation of the
difference.  (I didn't check whether the algorithm is truly right,
though.)


> void
> pg_lltoa(int64 value, char *a)
> {
..
>   memcpy(a, "-9223372036854775808", 21);
..
>   memcpy(a, "0", 2);

The lines need a comment like "/* length contains trailing '\0'
*/"


+   if (value >= 0)
...
+   else
+ {
+   if (value == PG_INT32_MIN)
+   memcpy(str, "-2147483648", 11);
+   return str + 11;
>   }
+   *str++ = '-';
+   return pg_ltostr_zeropad(str, -value, minwidth - 1);

If the then-block of the if statement were (value < 0), we wouldn't
need to re-enter the function.


+   len = pg_ltoa_n(value, str);
+   if (minwidth <= len)
+   return str + len;
+
+   memmove(str + minwidth - len, str, len);

If the function were given a str buffer with room for only two
digits and a NUL, 2 as minwidth, but 1000 as value, the function
would overrun the buffer. The original function just ignores the
overflowing digits.


regards.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center




Don't codegen deform code for virtual tuples in expr eval for scan fetch

2019-09-18 Thread Soumyadeep Chakraborty
Hello Hackers,

This is to address a TODO I found in the JIT expression evaluation
code (opcode =
EEOP_INNER_FETCHSOME/EEOP_OUTER_FETCHSOME/EEOP_SCAN_FETCHSOME):

* TODO: skip nvalid check if slot is fixed and known to
* be a virtual slot.

Not only should we skip the nvalid check if the tuple is virtual, the
whole basic block should be a no-op. There is no deforming to be done,
JIT (slot_compile_deform) or otherwise (slot_getsomeattrs) for virtual
tuples.

Attached is a patch 0001 that achieves exactly that by:

1. Performing the check at the very beginning of the case.
2. Emitting an unconditional branch to the next opblock if we have a
virtual tuple. We end up with a block with just the single
unconditional branch instruction in this case. This block is optimized
away when we run llvm_optimize_module().

Also enclosed is another patch 0002 that adds on some minor
refactoring of conditionals around whether to JIT deform or not.

---
Soumyadeep Chakraborty


0001-Don-t-codegen-if-tuple-is-virtual-in-expr-eval-of-sc.patch
Description: Binary data


0002-Minor-refactor-JIT-deform-or-not.patch
Description: Binary data


Re: logical decoding : exceeded maxAllocatedDescs for .spill files

2019-09-18 Thread Amit Khandekar
On Tue, 17 Sep 2019 at 21:19, Andres Freund  wrote:
> On 2019-09-14 14:34:21 -0400, Tom Lane wrote:
> > Amit Khandekar  writes:
> > > Yeah, something like the attached patch. I think this tracking of
> > > offsets would have been cleaner if we add in-built support in VFD. But
> > > yeah, for bank branches at least, we need to handle it outside of VFD.
> > > Or may be we would add it if we find one more use-case.
> >
> > Again, we had that and removed it, for what seem to me to be solid
> > reasons.  It adds cycles when we're forced to close/reopen a file,
> > and it also adds failure modes that we could do without (ie, failure
> > of either the ftell or the lseek, which are particularly nasty because
> > they shouldn't happen according to the VFD abstraction).

Ok. So you mean, when the caller would call FileRead() for sequential
reading, underneath VFD would do a pread(), but if pread() returns
error, the errno can belong to read() or it might as well belong to
lseek(). If it's due to lseek(), it's not expected from the caller
because for the caller it's just a sequential read. Yeah, makes sense.

>>  I do not
> > think there is going to be any argument strong enough to make us
> > put it back, especially not for non-mainstream callers like logical
> > decoding.

Ok. Also, more about putting back is in the below comments ...

>
> Yea, I think that's the right call. Avoiding kernel seeks is quite
> worthwhile, and we shouldn't undo it just because of this usecase. And
> that'll become more and more important performance-wise (and has already
> done so, with all the intel fixes making syscalls much slower).

By the way, I was not thinking about adding back the read() and
lseek() calls. I was saying we continue to use the pread() call, so
it's just a single system call. FileReadAt(..., offset) would do
pread() with user-supplied offset, and FileRead() would do pread()
using internally tracked offset. So for the user, FileReadAt() is like
pread(), and FileRead() would be like read().

But I agree with Tom's objection about having to unnecessarily handle
lseek error codes.

>
> I could see an argument for adding a separate generic layer providing
> position tracking ontop of the VFD abstraction however. Seems quite
> possible that there's some other parts of the system that could benefit
> from using VFDs rather than plain fds. And they'd probably also need the
> positional tracking.

Yeah, that also could be done.

Probably, for now at least, what everyone seems to agree is to take my
earlier attached patch forward.

I am going to see if I can add a TAP test for the patch, and will add
the patch into the commitfest soon.

-- 
Thanks,
-Amit Khandekar
EnterpriseDB Corporation
The Postgres Database Company




Proposal: Add more compile-time asserts to expose inconsistencies.

2019-09-18 Thread Smith, Peter
Dear Hackers,

I have identified some OSS code where more compile-time asserts could be added. 

Mostly these are asserting that arrays have the necessary length to accommodate 
the enums that are used to index into them.

In general the code is already commented with warnings such as:
* "If you add a new entry, remember to ..."
* "When modifying this enum, update the table in ..."
* "Display names for enums in ..."
* etc.

But comments can be accidentally overlooked, so adding the compile-time asserts 
can help eliminate human error.
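For illustration, the general shape is something like this (made-up enum and
array, not taken from the patch):

/* in a .c file, after the usual postgres.h include */
typedef enum TrafficLight
{
	TL_RED,
	TL_YELLOW,
	TL_GREEN,
	TL_COUNT					/* must be last */
} TrafficLight;

static const char *const tl_names[] = {"red", "yellow", "green"};

/* complain at compile time if the enum and the array fall out of sync */
static inline void
tl_compile_time_asserts(void)
{
	StaticAssertStmt(lengthof(tl_names) == TL_COUNT,
					 "tl_names[] is out of sync with enum TrafficLight");
}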

Please refer to the attached patch.

Kind Regards,
Peter Smith
---
Fujitsu Australia


0001-Add-compile-time-asserts.patch
Description: 0001-Add-compile-time-asserts.patch


Re: pgbench - allow to create partitioned tables

2019-09-18 Thread Amit Langote
Hi Fabien,

On Wed, Sep 18, 2019 at 3:49 AM Fabien COELHO  wrote:
> Attached v9:

Thanks.  This seems to work well.

Couple of nitpicks on parameter error messages.

+fprintf(stderr, "invalid partition type,
expecting \"range\" or \"hash\","

How about "partitioning method" instead of "partition type"?

+fprintf(stderr, "--partition-method requires actual
partitioning with --partitions\n");

Assuming that this error message is to direct the user to fix a
mistake they might have inadvertently made in specifying --partitions,
I don't think the message is very clear.  How about:

"--partition-method requires --partitions to be greater than zero"

but this wording might suggest to some users that some partitioning
methods do allow zero partitions.  So, maybe:

"specifying --partition-method requires --partitions to be greater than zero"

Thanks,
Amit




Re: Efficient output for integer types

2019-09-18 Thread David Fetter
On Wed, Sep 18, 2019 at 07:51:42AM +0200, David Fetter wrote:
> On Wed, Sep 18, 2019 at 05:42:01AM +0200, David Fetter wrote:
> > On Tue, Sep 17, 2019 at 09:01:57AM +0200, David Fetter wrote:
> > > On Tue, Sep 17, 2019 at 08:55:05AM +0200, David Fetter wrote:
> > > > On Sun, Sep 15, 2019 at 09:18:49AM +0200, David Fetter wrote:
> > > > > Folks,
> > > > > 
> > > > > Please find attached a couple of patches intended to $subject.
> > > > > 
> > > > > This patch set cut the time to copy ten million rows of randomly sized
> > > > > int8s (10 of them) by about a third, so at least for that case, it's
> > > > > pretty decent.
> > > > 
> > > > Added int4 output, removed the sprintf stuff, as it didn't seem to
> > > > help in any cases I was testing.
> > > 
> > > Found a couple of "whiles" that should have been "ifs."
> > 
> > Factored out some inefficient functions and made the guts use the more
> > efficient function.
> 
> Fix copy-paste-o that introduced some unneeded 64-bit math.

Removed static annotation that shouldn't have been present.

Best,
David.
-- 
David Fetter  http://fetter.org/
Phone: +1 415 235 3778

Remember to vote!
Consider donating to Postgres: http://www.postgresql.org/about/donate
>From ba1b57b230b08582bb3cc9ec91baca5177e63ff5 Mon Sep 17 00:00:00 2001
From: David Fetter 
Date: Sun, 15 Sep 2019 00:06:29 -0700
Subject: [PATCH v6] Make int4 and int8 operations more efficent
To: hackers
MIME-Version: 1.0
Content-Type: multipart/mixed; boundary="2.21.0"

This is a multi-part message in MIME format.
--2.21.0
Content-Type: text/plain; charset=UTF-8; format=fixed
Content-Transfer-Encoding: 8bit


- Output routines now do more digits per iteration, and
- Code determines the number of decimal digits in int4/int8 efficiently
- Split off pg_ltoa_n from pg_ltoa
- Use same to make other functions shorter

diff --git a/src/backend/access/common/printsimple.c b/src/backend/access/common/printsimple.c
index 651ade14dd..17ca533b87 100644
--- a/src/backend/access/common/printsimple.c
+++ b/src/backend/access/common/printsimple.c
@@ -112,7 +112,7 @@ printsimple(TupleTableSlot *slot, DestReceiver *self)
 			case INT8OID:
 {
 	int64		num = DatumGetInt64(value);
-	char		str[23];	/* sign, 21 digits and '\0' */
+	char		str[MAXINT8LEN];
 
 	pg_lltoa(num, str);
 	pq_sendcountedtext(, str, strlen(str), false);
diff --git a/src/backend/utils/adt/Makefile b/src/backend/utils/adt/Makefile
index 580043233b..3818dbaa85 100644
--- a/src/backend/utils/adt/Makefile
+++ b/src/backend/utils/adt/Makefile
@@ -39,6 +39,8 @@ jsonpath_scan.c: FLEX_NO_BACKUP=yes
 # jsonpath_scan is compiled as part of jsonpath_gram
 jsonpath_gram.o: jsonpath_scan.c
 
+numutils.o: CFLAGS += $(PERMIT_DECLARATION_AFTER_STATEMENT)
+
 # jsonpath_gram.c and jsonpath_scan.c are in the distribution tarball,
 # so they are not cleaned here.
 clean distclean maintainer-clean:
diff --git a/src/backend/utils/adt/int8.c b/src/backend/utils/adt/int8.c
index 0ff9394a2f..6230807906 100644
--- a/src/backend/utils/adt/int8.c
+++ b/src/backend/utils/adt/int8.c
@@ -27,8 +27,6 @@
 #include "utils/builtins.h"
 
 
-#define MAXINT8LEN		25
-
 typedef struct
 {
 	int64		current;
diff --git a/src/backend/utils/adt/numutils.c b/src/backend/utils/adt/numutils.c
index 70138feb29..2c6f39cbc7 100644
--- a/src/backend/utils/adt/numutils.c
+++ b/src/backend/utils/adt/numutils.c
@@ -20,6 +20,58 @@
 
 #include "common/int.h"
 #include "utils/builtins.h"
+#include "port/pg_bitutils.h"
+
+/*
+ * A table of all two-digit numbers. This is used to speed up decimal digit
+ * generation by copying pairs of digits into the final output.
+ */
+static const char DIGIT_TABLE[200] = {
+	'0', '0', '0', '1', '0', '2', '0', '3', '0', '4', '0', '5', '0', '6', '0', '7', '0', '8', '0', '9',
+	'1', '0', '1', '1', '1', '2', '1', '3', '1', '4', '1', '5', '1', '6', '1', '7', '1', '8', '1', '9',
+	'2', '0', '2', '1', '2', '2', '2', '3', '2', '4', '2', '5', '2', '6', '2', '7', '2', '8', '2', '9',
+	'3', '0', '3', '1', '3', '2', '3', '3', '3', '4', '3', '5', '3', '6', '3', '7', '3', '8', '3', '9',
+	'4', '0', '4', '1', '4', '2', '4', '3', '4', '4', '4', '5', '4', '6', '4', '7', '4', '8', '4', '9',
+	'5', '0', '5', '1', '5', '2', '5', '3', '5', '4', '5', '5', '5', '6', '5', '7', '5', '8', '5', '9',
+	'6', '0', '6', '1', '6', '2', '6', '3', '6', '4', '6', '5', '6', '6', '6', '7', '6', '8', '6', '9',
+	'7', '0', '7', '1', '7', '2', '7', '3', '7', '4', '7', '5', '7', '6', '7', '7', '7', '8', '7', '9',
+	'8', '0', '8', '1', '8', '2', '8', '3', '8', '4', '8', '5', '8', '6', '8', '7', '8', '8', '8', '9',
+	'9', '0', '9', '1', '9', '2', '9', '3', '9', '4', '9', '5', '9', '6', '9', '7', '9', '8', '9', '9'
+};
+
+/*
+ * Adapted from http://graphics.stanford.edu/~seander/bithacks.html#IntegerLog10
+ */
+static inline uint32
+decimalLength32(const uint32 v)
+{
+	uint32			t;
+	static uint64	PowersOfTen[] =
+	{1,10,100,
+	 1000, 

Re: Creating partitions automatically at least on HASH?

2019-09-18 Thread Amit Langote
Hello Fabien, Rafia,

Thanks for starting this discussion.

On Tue, Aug 27, 2019 at 5:36 PM Rafia Sabih  wrote:
> On Mon, 26 Aug 2019 at 19:46, Fabien COELHO  wrote:
>> > I happen to start a similar discussion [1] being unaware of this one
>> > and there Ashutosh Sharma talked about interval partitioning in Oracle.
>> > Looking
>> > closely it looks like we can have this automatic partitioning more
>> > convenient by having something similar. Basically, it is creating
>> > partitions on demand or lazy partitioning.
>>
>> Yep, the "what" of dynamic partitioning is more or less straightforward,
>> along the line you are describing.
>>
>> For me there are really two questions:
>>
>>   - having a extendable syntax, hence the mail I sent, which would cover
>> both automatic static & dynamic partitioning and their parameters,
>> given that we already have manual static, automatic static should
>> be pretty easy.
>>
>>   - implementing the stuff, with limited performance impact if possible
>> for the dynamic case, which is non trivial.
>>
>> > To explain a bit more, let's take range partition for example, first
>> > parent table is created and it's interval and start and end values are
>> > specified and it creates only the parent table just like it works today.
>>
>> > Now, if there comes a insertion that does not belong to the existing (or
>> > any, in the case of first insertion) partition(s), then the
>> > corresponding partition is created,
>>
>> Yep. Now, you also have to deal with race conditions issues, i.e. two
>> parallel session inserting tuples that must create the same partition, and
>> probably you would like to avoid a deadlock.
>>
> Hmmm, that shouldn't be very hard. Postgres handles many such things and I 
> think mostly by a mutex guarded shared memory structure. E.g. we can have a 
> shared memory structure associated with the parent table holding the 
> information of all the available partitions, and keep this structure guarded 
> by mutex. Anytime a new partition has to be created the relevant information 
> is first entered in this structure before actually creating it.

I like Fabien's approach of focusing on automatic creation of
partitions only "statically" at first, deferring any complex matters
of the "dynamic" counterpart to a later date.  One advantage is that
we get to focus on the details of the UI for this feature, which has
complexities of its own.  Speaking of which, how about the following
variant of the syntax that Fabien proposed earlier:

CREATE TABLE ... PARTITION BY partition_method (list_of_columns)
partition_auto_create_clause

where partition_auto_create_clause is:

PARTITIONS { IMMEDIATE | DEFERRED } USING (partition_descriptor)

where partition_descriptor is:

MODULUS integer | FROM (range_start) END (range_end) INTERVAL
(range_step) | list_values

where range_start/end/step is:

(expr [,...])

and list_values is:

(expr [,...]) [, ]

Note that list_values contains one parenthesized list per partition.
This is slightly different from what Robert suggested upthread in that
even a single value needs parentheses.
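Spelled out with the proposed (not yet existing) syntax, that would read
something like:

-- hash: 8 partitions created immediately
CREATE TABLE accounts (aid int, abalance int)
  PARTITION BY HASH (aid)
  PARTITIONS IMMEDIATE USING (MODULUS 8);

-- range: monthly partitions covering 2019
CREATE TABLE log (ts timestamptz, line text)
  PARTITION BY RANGE (ts)
  PARTITIONS IMMEDIATE
    USING (FROM ('2019-01-01') END ('2020-01-01') INTERVAL ('1 month'));

-- list: one parenthesized value list per partition
CREATE TABLE t (c text)
  PARTITION BY LIST (c)
  PARTITIONS IMMEDIATE USING (('a', 'b'), ('c'));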

Automatic creation of multi-column range partitions seems a bit tricky,
as it is hard to think about what a multi-column "interval" means.

Needless to say, PARTITIONS DEFERRED will cause an unsupported feature
error in the first cut.

Thanks,
Amit