Re: speed up unicode decomposition and recomposition
On Fri, Oct 23, 2020 at 08:24:06PM -0400, Tom Lane wrote: > I'd advise not putting conv_compare() between get_code_entry() and > the header comment for get_code_entry(). Looks good otherwise. Indeed. I have adjusted the position of the comment, and applied the fix. Thanks for the report. -- Michael signature.asc Description: PGP signature
Re: Tab complete for alter table rls
Thanks Michael! -- Best regards Japin Li > On Oct 24, 2020, at 9:49 AM, Michael Paquier wrote: > > On Fri, Oct 23, 2020 at 04:37:18PM +0900, Michael Paquier wrote: >> No worries. Good catch. I'll try to test that and apply it later, >> but by reading the code it looks like you got that right. > > Checked and applied on HEAD, thanks! > -- > Michael
Re: [doc] remove reference to pg_dump pre-8.1 switch behaviour
Heikki Linnakangas writes: > On 23/10/2020 17:51, Tom Lane wrote: >> Seems like we could have gotten rid of that by now, but when exactly >> does it become fair game? And can we have a non-ad-hoc process for >> getting rid of such cruft? > I did some grepping for strings like "version 7", "pre-8" and so on. I > couldn't come up with a clear rule on what could be removed. Context > matters. Yeah, that's unsurprising. But thanks for all the effort you put into this review! > Findings in detail follow. And attached is a patch about the stuff that > I think can be removed pretty straightforwardly. I agree with the patch, and with your other thoughts, except as noted below. > config.sgml (on synchronized_scans): > have no ORDER BY clause. Setting this parameter > to > off ensures the pre-8.3 behavior in which a > sequential > scan always starts from the beginning of the table. The default > is on. > We could remove the reference to 8.3 version. I'm inclined to keep it > though. Maybe s/pre-8.3/simple/, or some similar adjective? > func.sgml: > > > Before PostgreSQL 8.2, the containment > operators @> and <@ > were respectively > called ~ and @. These names > are still > available, but are deprecated and will eventually be removed. > > > The old names are still available, so should keep this. Perhaps it's time to actually remove those operators as threatened here? That's material for a separate discussion, though. > If the contents of two arrays are equal but the dimensionality is > different, the first difference in the dimensionality information > determines the sort order. (This is a change from versions of > PostgreSQL prior to 8.2: older versions > would claim > that two arrays with the same contents were equal, even if the > number of dimensions or subscript ranges were different.) > > Could remove it. Yeah, I'm OK with removing the parenthetical comment. > There are two differences in the behavior of > string_to_array > from pre-9.1 versions of PostgreSQL. > Feels too early to remove. +1. 9.1 was in support till ~4 years ago; 8.2 EOL'd 9 years ago. I'm not sure where to put the cutoff, but 4 years seems too little. > > > Prior to PostgreSQL 8.2, the > <, <=, > > and >= > cases were not handled per SQL specification. A comparison like > ROW(a,b) < ROW(c,d) > was implemented as > a < c AND b < d > whereas the correct behavior is equivalent to > a < c OR (a = c AND b < d). > > > Important incompatibility. Although very old. I'm inclined to keep it. > If we remove it, it'd still be useful to explain the new behavior. Yeah, even if we don't care about 8.2, some of this text is useful to clarify the behavior of row comparisons. I haven't looked at the surrounding material, but I'd not want to just delete this unless it's clearly duplicative. > As of PostgreSQL 8.4, this advice is less > necessary since delayed indexing is used (seelinkend="gin-fast-update"/> for details). But for very large updates > it may still be best to drop and recreate the index. > I think that's old enough, but the paragraph would need some > copy-editing, not just removal. Right, same deal, needs a bit of wordsmithing not just deletion. > mode is unused and > ignored as of PostgreSQL 8.1; however, for > backward compatibility with earlier releases it is best to > set it to INV_READ, INV_WRITE, > or INV_READ | > INV_WRITE. > We need to say something about 'mode'. Keep. Maybe s/as of/since/, but otherwise fine. > Data of a particular data type might be transmitted in any of several > different formats. 
As of > PostgreSQL 7.4 > the only supported formats are text and > binary, > but the protocol makes provision for future extensions. The desired > Could replace the "as of PostgreSQL 7.4" with "Currently", but it's not > much shorter. While it's not shorter, I think it's clearer in this context. 7.4 is far enough back that a reader might expect the next sentence to offer updated info. > > > Another example — the PostgreSQL > mailing > list archives contained 910,989 unique words with 57,491,343 lexemes in > 461,020 messages. > > Refresh the numbers. I agree with the comment: if we keep this, there should be an "as of" date associated with the numbers. Thanks again for slogging through that! regards, tom lane
Re: Tab complete for alter table rls
On Fri, Oct 23, 2020 at 04:37:18PM +0900, Michael Paquier wrote: > No worries. Good catch. I'll try to test that and apply it later, > but by reading the code it looks like you got that right. Checked and applied on HEAD, thanks! -- Michael signature.asc Description: PGP signature
Re: speed up unicode decomposition and recomposition
Michael Paquier writes: > On Fri, Oct 23, 2020 at 04:18:13PM -0700, Mark Dilger wrote: >> On Oct 23, 2020, at 9:07 AM, Tom Lane wrote: >>> genhtml: WARNING: function data mismatch at >>> /home/postgres/pgsql/src/common/unicode_norm.c:102 > I can see the problem on Debian GID with lcov 1.14-2. This points to > the second declaration of get_code_entry(). I think that genhtml, > because it considers the code of unicode_norm.c as a whole without its > CFLAGS, gets confused because it finds the same function to index as > defined twice. It expects only one definition, hence the warning. So > I think that this can lead to some incorrect data in the HTML report, > and the attached patch takes care of fixing that. Tom, does it take > care of the issue on your side? Good catch! Yeah, that fixes it for me. I'd advise not putting conv_compare() between get_code_entry() and the header comment for get_code_entry(). Looks good otherwise. regards, tom lane
Re: minor problem in boolean cast
Cary Huang writes: > I noticed that when casting a string to boolean value with input 'of' it > still cast it to 'f'. I think with 'of', it should give an error because > 'off' is the expected candidate. This may not be intended so I made a simple > patch to address this. It's absolutely intended, and documented: https://www.postgresql.org/docs/devel/datatype-boolean.html Note the bit about "Unique prefixes of these strings are also accepted". The code comment just above parse_bool() says the same. regards, tom lane
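As a quick illustration of the prefix rule described above (output sketched from the documented behavior; not part of the original exchange):

```
postgres=# select 'of'::boolean as a, 'tru'::boolean as b, 'y'::boolean as c;
 a | b | c
---+---+---
 f | t | t
(1 row)
```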
Re: speed up unicode decomposition and recomposition
On Fri, Oct 23, 2020 at 04:18:13PM -0700, Mark Dilger wrote: > On Oct 23, 2020, at 9:07 AM, Tom Lane wrote: >> genhtml: WARNING: function data mismatch at >> /home/postgres/pgsql/src/common/unicode_norm.c:102 >> >> I've never seen anything like that before. I suppose it means that >> something about 783f0cc64 is a bit fishy, but I don't know what. >> >> The emitted coverage report looks fairly normal anyway. It says >> unicode_norm.c has zero test coverage, which is very possibly correct >> since I wasn't running in UTF8 encoding, but I'm not entirely sure of >> that either. > > I don't see it on mac nor on ubuntu64. I get 70.6% coverage of > lines and 90.9% of functions on ubuntu. I can see the problem on Debian GID with lcov 1.14-2. This points to the second declaration of get_code_entry(). I think that genhtml, because it considers the code of unicode_norm.c as a whole without its CFLAGS, gets confused because it finds the same function to index as defined twice. It expects only one definition, hence the warning. So I think that this can lead to some incorrect data in the HTML report, and the attached patch takes care of fixing that. Tom, does it take care of the issue on your side? -- Michael diff --git a/src/common/unicode_norm.c b/src/common/unicode_norm.c index 4ffce0e619..7cc8faa63a 100644 --- a/src/common/unicode_norm.c +++ b/src/common/unicode_norm.c @@ -53,11 +53,26 @@ * The backend version of this code uses a perfect hash function for the * lookup, while the frontend version uses a binary search. */ -#ifndef FRONTEND +#ifdef FRONTEND +/* comparison routine for bsearch() of decomposition lookup table. */ +static int +conv_compare(const void *p1, const void *p2) +{ + uint32 v1, +v2; + + v1 = *(const uint32 *) p1; + v2 = ((const pg_unicode_decomposition *) p2)->codepoint; + return (v1 > v2) ? 1 : ((v1 == v2) ? 0 : -1); +} + +#endif static const pg_unicode_decomposition * get_code_entry(pg_wchar code) { +#ifndef FRONTEND + int h; uint32 hashkey; pg_unicode_decompinfo decompinfo = UnicodeDecompInfo; @@ -82,33 +97,17 @@ get_code_entry(pg_wchar code) /* Success! */ return &decompinfo.decomps[h]; -} #else -/* comparison routine for bsearch() of decomposition lookup table. */ -static int -conv_compare(const void *p1, const void *p2) -{ - uint32 v1, -v2; - - v1 = *(const uint32 *) p1; - v2 = ((const pg_unicode_decomposition *) p2)->codepoint; - return (v1 > v2) ? 1 : ((v1 == v2) ? 0 : -1); -} - -static const pg_unicode_decomposition * -get_code_entry(pg_wchar code) -{ return bsearch(&(code), UnicodeDecompMain, lengthof(UnicodeDecompMain), sizeof(pg_unicode_decomposition), conv_compare); +#endif } -#endif /* !FRONTEND */ /* * Given a decomposition entry looked up earlier, get the decomposed signature.asc Description: PGP signature
minor problem in boolean cast
Hi

I noticed that when casting a string to boolean value with input 'of' it still casts it to 'f'. I think with 'of', it should give an error because 'off' is the expected candidate. This may not be intended so I made a simple patch to address this.

```
postgres=# select cast('of' as boolean);
 bool
------
 f
(1 row)
```

Cary Huang - HighGo Software Inc. (Canada) mailto:cary.hu...@highgo.ca http://www.highgo.ca

0001-boolean-type-cast-fix.patch Description: Binary data
Re: new heapcheck contrib module
> On Oct 23, 2020, at 4:12 PM, Tom Lane wrote: > > Mark Dilger writes: >> You certainly appear to be right about that. I've added the extra checks, >> and extended the regression test to include them. Patch attached. > > Pushed with some more fooling about. The "bit reversal" idea is not > a sufficient guide to picking values that will hit all the code checks. > For instance, I was seeing out-of-range warnings on one endianness and > not the other because on the other one the maxalign check rejected the > value first. I ended up manually tweaking the corruption patterns > until they hit all the tests on both endiannesses. Pretty much the > opposite of black-box testing, but it's not like our notions of line > pointer layout are going to change anytime soon. > > I made some logic rearrangements in the C code, too. Thanks Tom! And Peter, your comment earlier saved me some time. Thanks to you, also! — Mark Dilger EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Re: speed up unicode decomposition and recomposition
> On Oct 23, 2020, at 9:07 AM, Tom Lane wrote: > > I chanced to do an --enable-coverage test run today, and I got this > weird message during "make coverage-html": > > genhtml: WARNING: function data mismatch at > /home/postgres/pgsql/src/common/unicode_norm.c:102 > > I've never seen anything like that before. I suppose it means that > something about 783f0cc64 is a bit fishy, but I don't know what. > > The emitted coverage report looks fairly normal anyway. It says > unicode_norm.c has zero test coverage, which is very possibly correct > since I wasn't running in UTF8 encoding, but I'm not entirely sure of > that either. > > This is with RHEL8's lcov-1.13-4.el8 package. I suppose the first > question is does anybody else see that? I don't see it on mac nor on ubuntu64. I get 70.6% coverage of lines and 90.9% of functions on ubuntu. — Mark Dilger EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Re: new heapcheck contrib module
Mark Dilger writes: > You certainly appear to be right about that. I've added the extra checks, > and extended the regression test to include them. Patch attached. Pushed with some more fooling about. The "bit reversal" idea is not a sufficient guide to picking values that will hit all the code checks. For instance, I was seeing out-of-range warnings on one endianness and not the other because on the other one the maxalign check rejected the value first. I ended up manually tweaking the corruption patterns until they hit all the tests on both endiannesses. Pretty much the opposite of black-box testing, but it's not like our notions of line pointer layout are going to change anytime soon. I made some logic rearrangements in the C code, too. regards, tom lane
Re: new heapcheck contrib module
Mark Dilger writes: >> On Oct 23, 2020, at 11:51 AM, Tom Lane wrote: >> I do not have 64-bit big-endian hardware to play with unfortunately. >> But what I suspect is happening here is less about endianness and >> more about alignment pickiness; or maybe we were unlucky enough to >> index off the end of the shmem segment. > You certainly appear to be right about that. I've added the extra checks, > and extended the regression test to include them. Patch attached. Meanwhile, I've replicated the SIGBUS problem on gaur's host, so that's definitely what's happening. (Although PPC is also alignment-picky on the hardware level, I believe that both macOS and Linux try to mask that by having kernel trap handlers execute unaligned accesses, leaving only a nasty performance loss behind. That's why I failed to see this effect when checking your previous patch on an old Apple box. We likely won't see it in the buildfarm either, unless maybe on Noah's AIX menagerie.) I'll check this patch on gaur and push it if it's clean. regards, tom lane
Re: Mop-up around psql's \connect behavior
Kyotaro Horiguchi writes: > At Thu, 22 Oct 2020 15:23:04 -0400, Tom Lane wrote in >> ... The only real objection I can see is that it could >> hold a server connection open when the user thinks there is none; >> but that could only happen in a non-interactive script, and it does >> not seem like a big problem in that case. We could alternatively >> not stash the "dead" connection after a non-interactive \connect >> failure, but I doubt that's better. > Agreed. Thanks! After further thought I decided we *must* do it as per my "alternative" idea. Consider a script containing \c db1 user1 live_server \c db2 user2 dead_server \c db3 The script would be expecting to connect to db3 at dead_server, but if we re-use parameters from the first connection then it might successfully connect to db3 at live_server. This'd defeat the goal of not letting a script accidentally execute commands against the wrong database. So we have to not save the connection after a failed script \connect. However, it seems OK to save after a connection loss whether we're in a script or not; that is, \c db1 user1 server1 ... (connection dies here) ... --- these commands will fail \c db2 The script will be expecting the second \c to re-use parameters from the first one, and that will still work as expected. I went ahead and pushed it after adjusting that. regards, tom lane
Re: new heapcheck contrib module
> On Oct 23, 2020, at 11:51 AM, Tom Lane wrote: > > Hmm, we're not out of the woods yet: thorntail is even less happy > than before. > > https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=thorntail&dt=2020-10-23%2018%3A08%3A11 > > I do not have 64-bit big-endian hardware to play with unfortunately. > But what I suspect is happening here is less about endianness and > more about alignment pickiness; or maybe we were unlucky enough to > index off the end of the shmem segment. I see that verify_heapam > does this for non-redirect tuples: > >/* Set up context information about this next tuple */ >ctx.lp_len = ItemIdGetLength(ctx.itemid); >ctx.tuphdr = (HeapTupleHeader) PageGetItem(ctx.page, ctx.itemid); >ctx.natts = HeapTupleHeaderGetNatts(ctx.tuphdr); > > with absolutely no thought for the possibility that lp_off is out of > range or not maxaligned. The checks for a sane lp_len seem to have > gone missing as well. You certainly appear to be right about that. I've added the extra checks, and extended the regression test to include them. Patch attached. v23-0001-Sanity-checking-line-pointers.patch Description: Binary data — Mark Dilger EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Re: [doc] remove reference to pg_dump pre-8.1 switch behaviour
On 23/10/2020 17:51, Tom Lane wrote: But anyway, this was about documentation not code. What I'm wondering about is when to drop things like, say, this bit in the regex docs: Two significant incompatibilities exist between AREs and the ERE syntax recognized by pre-7.4 releases of PostgreSQL: (etc etc) Seems like we could have gotten rid of that by now, but when exactly does it become fair game? And can we have a non-ad-hoc process for getting rid of such cruft? Let's try to zoom in on a rule: Anything that talks about 9.4 or above (min supported version - 1) should definitely be left in place. Something around 9.0 is possibly still useful to someone upgrading or updating an application. Or someone might still bump into old blog posts from that era. Before that, I don't see much value. Although you could argue that I jumped the gun on the notice about pre-8.2 pg_dump -t behavior. pg_dump still supports servers down to 8.0, so someone might also have an 8.0 pg_dump binary lying around, and might be confused that -t behaves differently. On the whole though, I think removing it was fair game. I did some grepping for strings like "version 7", "pre-8" and so on. I couldn't come up with a clear rule on what could be removed. Context matters. In text that talks about protocol versions or libpq functions like PQlibVersion() it seems sensible to go back as far as possible, for the completeness. And subtle user-visible differences in behavior are more important to document than changes in internal C APIs that cause a compiler failure, for example. Other notices are about old syntax that's kept for backwards compatibility, but still works. It makes sense to mention the old version in those cases, even if it's very old, because the alternative would be to just say something like "very old version", which is not any shorter, just less precise. Findings in detail follow. And attached is a patch about the stuff that I think can be removed pretty straightforwardly. array.sgml: If the value written for an element is NULL (in any case variant), the element is taken to be NULL. The presence of any quotes or backslashes disables this and allows the literal string value NULL to be entered. Also, for backward compatibility with pre-8.2 versions of PostgreSQL, the configuration parameter can be turned off to suppress recognition of NULL as a NULL. The GUC still exists, so we should keep this. catalogs.sgml: The view pg_group exists for backwards compatibility: it emulates a catalog that existed in PostgreSQL before version 8.1. It shows the names and members of all roles that are marked as not rolcanlogin, which is an approximation to the set of roles that are being used as groups. pg_group still exists, and that paragraph explains why. We should keep it. (There's a similar paragraph for pg_shadow) config.sgml (on synchronized_scans): This allows sequential scans of large tables to synchronize with each other, so that concurrent scans read the same block at about the same time and hence share the I/O workload. When this is enabled, a scan might start in the middle of the table and then wrap around the end to cover all rows, so as to synchronize with the activity of scans already in progress. This can result in unpredictable changes in the row ordering returned by queries that have no ORDER BY clause. Setting this parameter to off ensures the pre-8.3 behavior in which a sequential scan always starts from the beginning of the table. The default is on. We could remove the reference to 8.3 version. 
I'm inclined to keep it though. func.sgml (String Functions and Operators): Before PostgreSQL 8.3, these functions would silently accept values of several non-string data types as well, due to the presence of implicit coercions from those data types to text. Those coercions have been removed because they frequently caused surprising behaviors. However, the string concatenation operator (||) still accepts non-string input, so long as at least one input is of a string type, as shown in . For other cases, insert an explicit coercion to text if you need to duplicate the previous behavior. Could remove the reference to 8.3, but the information about || still makes sense. I'm inclined to just keep it. func.sgml: Before PostgreSQL 8.2, the containment operators @> and <@ were respectively called ~ and @. These names are still available, but are deprecated and will eventually be removed. The old names are still available, so should keep this. func.sgml: Before PostgreSQL 8.1, the arguments of the sequence functions were of type text, not regclass, and the above-described conve
Re: new heapcheck contrib module
On Fri, Oct 23, 2020 at 11:51 AM Tom Lane wrote: > /* Set up context information about this next tuple */ > ctx.lp_len = ItemIdGetLength(ctx.itemid); > ctx.tuphdr = (HeapTupleHeader) PageGetItem(ctx.page, ctx.itemid); > ctx.natts = HeapTupleHeaderGetNatts(ctx.tuphdr); > > with absolutely no thought for the possibility that lp_off is out of > range or not maxaligned. The checks for a sane lp_len seem to have > gone missing as well. That is surprising. verify_nbtree.c has PageGetItemIdCareful() for this exact reason. -- Peter Geoghegan
Re: new heapcheck contrib module
Hmm, we're not out of the woods yet: thorntail is even less happy than before. https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=thorntail&dt=2020-10-23%2018%3A08%3A11 I do not have 64-bit big-endian hardware to play with unfortunately. But what I suspect is happening here is less about endianness and more about alignment pickiness; or maybe we were unlucky enough to index off the end of the shmem segment. I see that verify_heapam does this for non-redirect tuples:

    /* Set up context information about this next tuple */
    ctx.lp_len = ItemIdGetLength(ctx.itemid);
    ctx.tuphdr = (HeapTupleHeader) PageGetItem(ctx.page, ctx.itemid);
    ctx.natts = HeapTupleHeaderGetNatts(ctx.tuphdr);

with absolutely no thought for the possibility that lp_off is out of range or not maxaligned. The checks for a sane lp_len seem to have gone missing as well. regards, tom lane
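To make the missing checks concrete, here is a rough sketch of the kind of line-pointer validation being asked for (hypothetical code, not the fix that was actually committed; the helper name is an invention for illustration):

```
#include "postgres.h"
#include "storage/bufpage.h"
#include "storage/itemid.h"

/*
 * Hypothetical helper sketching the sanity checks discussed above: a line
 * pointer must be maxaligned, must not point into the page header, and the
 * item it describes must fit within the block, before PageGetItem() is safe.
 */
static bool
line_pointer_is_sane(ItemId itemid)
{
	uint16		lp_off = ItemIdGetOffset(itemid);
	uint16		lp_len = ItemIdGetLength(itemid);

	if (lp_off != MAXALIGN(lp_off))			/* must be maxaligned */
		return false;
	if (lp_off < SizeOfPageHeaderData)		/* must not overlap the page header */
		return false;
	if ((uint32) lp_off + lp_len > BLCKSZ)	/* must stay within the page */
		return false;
	return true;
}
```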
Re: [var]char versus character [varying]
James Coleman writes: > I've been wondering recently why the external canonical form of types > like char and varchar doesn't match the typname in pg_type. Mostly because the SQL standard wants certain spellings, some of which aren't even single words (e.g. DOUBLE PRECISION). There are cases where we could have changed internal names to match up with the spec name, but that won't work for all cases, and people have some attachment to the existing names anyway. > But I'm not following what would actually break if it weren't done > this way. Is the issue that a user defined type (in a different > schema, perhaps?) could overshadow the system type? That's one thing, and the rules about typmods are another. For instance the spec says that BIT without any other decoration means BIT(1), so that we have this: regression=# select '111'::bit; bit - 1 (1 row) versus regression=# select '111'::"bit"; bit - 111 (1 row) The latter means "bit without any length constraint", which is something the spec doesn't actually support. So when we have bit with typmod -1, we must spell it "bit" with quotes. > And would it make more sense (though I'm not volunteering right now to > write such a patch :D) to have these names be an additional column on > pg_type so that they can be queried by the user? Not particularly, because some of these types actually have several different spec-approved spellings, eg VARCHAR, CHAR VARYING, CHARACTER VARYING are all in the standard. regards, tom lane
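To see the mapping concretely, a quick catalog query along these lines shows the internal name versus the SQL-spec spelling (hypothetical illustration, not from the original mail):

```
regression=# SELECT typname, format_type(oid, NULL) AS sql_spelling
regression-# FROM pg_type
regression-# WHERE typname IN ('bool', 'bpchar', 'float8', 'int4', 'varchar')
regression-# ORDER BY typname;
 typname |    sql_spelling
---------+--------------------
 bool    | boolean
 bpchar  | character
 float8  | double precision
 int4    | integer
 varchar | character varying
(5 rows)
```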
Re: new heapcheck contrib module
> On Oct 23, 2020, at 11:04 AM, Tom Lane wrote: > > I wrote: >> Mark Dilger writes: >>> The patch I *should* have attached last night this time: > >> Thanks, I'll do some big-endian testing with this. > > Seems to work, so I pushed it (after some compulsive fooling > about with whitespace and perltidy-ing). Thanks for all the help! > It appears to me that > the code coverage for verify_heapam.c is not very good though, > only circa 50%. Do we care to expend more effort on that? Part of the issue here is that I developed the heapcheck code as a sequence of patches, and there is much greater coverage in the tests in the 0002 patch, which hasn't been committed yet. (Nor do I know that it ever will be.) Over time, the patch set became:

0001 -- adds verify_heapam.c to contrib/amcheck, with basic test coverage
0002 -- adds pg_amcheck command line interface to contrib/pg_amcheck, with more extensive test coverage
0003 -- creates a non-throwing interface to clog
0004 -- uses the non-throwing clog interface from within verify_heapam
0005 -- adds some controversial acl checks to verify_heapam

Your question doesn't have much to do with 3,4,5 above, but it definitely matters whether we're going to commit 0002. The test in that patch, in contrib/pg_amcheck/t/004_verify_heapam.pl, does quite a bit of bit twiddling of heap tuples and toast records and checks that the right corruption messages come back. Part of the reason I was trying to keep 0001's t/001_verify_heapam.pl test ignorant of the exact page layout information is that I already had this other test that covers that. So, should I port that test from (currently non-existent) contrib/pg_amcheck into contrib/amcheck, or should we wait to see if the 0002 patch is going to get committed? — Mark Dilger EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Re: vacuum -vs reltuples on insert only index
On Fri, Oct 23, 2020 at 8:51 AM Jehan-Guillaume de Rorthais wrote: > Before 0d861bbb70, btvacuumpage was adding to relation stats the number of > leaving lines in the block using: > > stats->num_index_tuples += maxoff - minoff + 1; > > After 0d861bbb70, it is set using new variable nhtidslive: > > stats->num_index_tuples += nhtidslive > > However, nhtidslive is only incremented if callback (IndexBulkDeleteCallback) > is set, which seems not to be the case on select-only workload. I agree that that's a bug. > A naive fix might be to use "maxoff - minoff + 1" when callback==NULL. The problem with that is that we really should use nhtidslive (or something like it), and we're not really willing to do the work to get that information when callback==NULL. We could use "maxoff - minoff + 1" in the way you suggest, but that will be only ~30% of what nhtidslive would be in pages where deduplication is maximally effective (which is not at all uncommon -- you only need about 10 TIDs per distinct value for the space savings to saturate like this). GIN does this for cleanup (but not for delete, which has a real count available): /* * XXX we always report the heap tuple count as the number of index * entries. This is bogus if the index is partial, but it's real hard to * tell how many distinct heap entries are referenced by a GIN index. */ stats->num_index_tuples = Max(info->num_heap_tuples, 0); stats->estimated_count = info->estimated_count; I suspect that we need to move in this direction within nbtree. I'm a bit concerned about the partial index problem, though. I suppose maybe we could do it the old way (which won't account for posting list tuples) during cleanup as you suggest, but only use the final figure when it turns out to have been a partial indexes. For other indexes we could do what GIN does here. Anybody else have thoughts on this? -- Peter Geoghegan
[var]char versus character [varying]
I've been wondering recently why the external canonical form of types like char and varchar doesn't match the typname in pg_type. Additionally, the alternative/extended names are hardcoded in format_type.c rather than being an additional column in that catalog table. I would have assumed there were largely historical reasons for this, but I see the following relevant comments in that file: /* * See if we want to special-case the output for certain built-in types. * Note that these special cases should all correspond to special * productions in gram.y, to ensure that the type name will be taken as a * system type, not a user type of the same name. * * If we do not provide a special-case output here, the type name will be * handled the same way as a user type name --- in particular, it will be * double-quoted if it matches any lexer keyword. This behavior is * essential for some cases, such as types "bit" and "char". */ But I'm not following what would actually break if it weren't done this way. Is the issue that a user defined type (in a different schema, perhaps?) could overshadow the system type? And would it make more sense (though I'm not volunteering right now to write such a patch :D) to have these names be an additional column on pg_type so that they can be queried by the user? Thanks, James
Re: new heapcheck contrib module
I wrote: > Mark Dilger writes: >> The patch I *should* have attached last night this time: > Thanks, I'll do some big-endian testing with this. Seems to work, so I pushed it (after some compulsive fooling about with whitespace and perltidy-ing). It appears to me that the code coverage for verify_heapam.c is not very good though, only circa 50%. Do we care to expend more effort on that? regards, tom lane
Re: Deleting older versions in unique indexes to avoid page splits
On Fri, Oct 23, 2020 at 9:03 AM Simon Riggs wrote: > Please publish details of how long a pre-split cleaning operation > takes and what that does to transaction latency. It *might* be true > that the cost of avoiding splits is worth it in balance against the > cost of splitting, but it might not. I don't think that you understand how the patch works. I cannot very well isolate that cost because the patch is designed to only pay it when there is a strong chance of getting a much bigger reward, and when the only alternative is to split the page. When it fails the question naturally doesn't come up again for the same two pages that follow from the page split. As far as I know the only cases that are regressed all involve small indexes with lots of contention, which is not surprising. And not necessarily due to the heap page accesses - making indexes smaller sometimes has that effect, even when it happens due to things like better page split heuristics. If anybody finds a serious problem with my patch then it'll be a weakness or hole in the argument I just made -- it won't have much to do with how expensive any of these operations are in isolation. It usually isn't sensible to talk about page splits as isolated things. Most of my work on B-Trees in the past couple of years built on the observation that sometimes successive page splits are related to each other in one way or another. It is a fallacy of composition to think of the patch as a thing that prevents some page splits. The patch is valuable because it more or less eliminates *unnecessary* page splits (and makes it so that there cannot be very many TIDs for each logical row in affected indexes). The overall effect is not linear. If you added code to artificially make the mechanism fail randomly 10% of the time (at the point where it becomes clear that the current operation would otherwise be successful) that wouldn't make the resulting code 90% as useful as the original. It would actually make it approximately 0% as useful. On human timescales the behavioral difference between this hobbled version of my patch and the master branch would be almost imperceptible. It's obvious that a page split is more expensive than the delete operation (when it works out). It doesn't need a microbenchmark (and I really can't think of one that would make any sense). Page splits typically have WAL records that are ~4KB in size, whereas the opportunistic delete records are almost always less than 100 bytes, and typically close to 64 bytes -- which is the same size as most individual leaf page insert WAL records. Plus you have almost double the FPIs going forward with the page split. > You've shown us a very nice paper analysing the page split waves, but > we need a similar detailed analysis so we can understand if what you > propose is better or not (and in what situations). That paper was just referenced in passing. It isn't essential to the main argument. > The leaf page locks are held for longer, so we need to perform > sensible tests that show if this has a catastrophic effect on related > workloads, or not. > > The SELECT tests proposed need to be aimed at the same table, at the same > time. But that's exactly what I did the first time! I had two SELECT statements against the same table. 
They use almost the same distribution as the UPDATE, so that they'd hit the same part of the key space without it being exactly the same as the UPDATE from the same xact in each case (I thought that if it was exactly the same part of the table then that might unfairly favor my patch). > > The strength of the design is in how clever it isn't. > > What it doesn't do could be good or bad so we need to review more > details on behavior. Since the whole idea of the patch is to change > behavior, that seems a reasonable ask. I don't have any doubt about > the validity of the approach or coding. I agree, but the patch isn't the sum of its parts. You need to talk about a workload or a set of conditions, and how things develop over time. -- Peter Geoghegan
Re: The ultimate extension hook.
On Thu, 24 Sep 2020 17:08:44 +1200 David Rowley wrote: [...] > I wondered if there was much in the way of use-cases like a traffic > filter, or statement replication. I wasn't sure if it was a solution > looking for a problem or not, but it seems like it could be productive > to talk about possibilities here and make a judgement call based on if > any alternatives exist today that will allow that problem to be solved > sufficiently in another way. If I understand the proposal correctly, this might enable traffic capture using a loadable extension. This kind of usage would allow replaying and validating any kind of traffic from one major version to another, e.g. to look for regressions from the application point of view before a major upgrade. I did such regression tests in the past: we captured production traffic using libpcap and replayed it using pgshark on an upgraded test environment. Very handy. However:

* libpcap can drop network packets during high load, which makes the capture painful to recover past the hole.
* it is useless with encrypted traffic.

So, +1 for such hooks. Regards,
Re: [PATCH] Add section headings to index types doc
On Fri, Oct 23, 2020 at 3:18 AM Jürgen Purtz wrote: > and add a link to the "CREATE INDEX" command from the chapter preamble. > > is the link necessary? > I suppose it would make more sense to add it to the previous section - the introduction page. I do think having a link (or more than one) to CREATE INDEX from the Indexes chapter is reader friendly. Having links to SQL commands is never truly necessary - the reader knows a SQL command reference exists and the name of the command allows them to find the correct page. David J.
Re: speed up unicode decomposition and recomposition
I chanced to do an --enable-coverage test run today, and I got this weird message during "make coverage-html": genhtml: WARNING: function data mismatch at /home/postgres/pgsql/src/common/unicode_norm.c:102 I've never seen anything like that before. I suppose it means that something about 783f0cc64 is a bit fishy, but I don't know what. The emitted coverage report looks fairly normal anyway. It says unicode_norm.c has zero test coverage, which is very possibly correct since I wasn't running in UTF8 encoding, but I'm not entirely sure of that either. This is with RHEL8's lcov-1.13-4.el8 package. I suppose the first question is does anybody else see that? regards, tom lane
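For anyone who wants to reproduce this, the usual workflow is roughly as follows (a sketch based on the documented coverage targets; adjust configure options as needed):

```
./configure --enable-coverage ... other options ...
make
make check            # or any other test run that exercises the code of interest
make coverage-html    # runs lcov/genhtml and produces the HTML report
```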
Re: Deleting older versions in unique indexes to avoid page splits
On Thu, 22 Oct 2020 at 18:42, Peter Geoghegan wrote: > > The average latency is x2. What is the distribution of latencies? > > Occasional very long or all uniformly x2? > > The latency is generally very even with the patch. There is a constant > hum of cleanup by the new mechanism in the case of the benchmark > workload. As opposed to a cascade of page splits, which occur in > clearly distinct correlated waves. Please publish details of how long a pre-split cleaning operation takes and what that does to transaction latency. It *might* be true that the cost of avoiding splits is worth it in balance against the cost of splitting, but it might not. You've shown us a very nice paper analysing the page split waves, but we need a similar detailed analysis so we can understand if what you propose is better or not (and in what situations). > > I would guess that holding the page locks will also slow down SELECT > > workload, so I think you should also report that workload as well. > > > > Hopefully that will be better in the latest version. > > But the same benchmark that you're asking about here has two SELECT > statements and only one UPDATE. It already is read-heavy in that > sense. And we see that the latency is also significantly improved for > the SELECT queries. > > Even if there was often a performance hit rather than a benefit (which > is definitely not what we see), it would still probably be worth it. > Users create indexes for a reason. I believe that we are obligated to > maintain the indexes to a reasonable degree, and not just when it > happens to be convenient to do so in passing. The leaf page locks are held for longer, so we need to perform sensible tests that show if this has a catastrophic effect on related workloads, or not. The SELECT tests proposed need to be aimed at the same table, at the same time. > The strength of the design is in how clever it isn't. What it doesn't do could be good or bad so we need to review more details on behavior. Since the whole idea of the patch is to change behavior, that seems a reasonable ask. I don't have any doubt about the validity of the approach or coding. What you've done so far is very good and I am very positive about this, well done. -- Simon Riggshttp://www.EnterpriseDB.com/
vacuum -vs reltuples on insert only index
Hello, I've found a behavior change with pg_class.reltuples on btree indexes. With only insert activity on a table, when an index is processed, its related reltuples is set to 0. Here is a demo script:

-- force index cleanup
set vacuum_cleanup_index_scale_factor to 0;

drop table if exists t;
create table t as select i from generate_series(1, 100) i;
create index t_i on t(i);

-- after index creation its reltuples is correct
select reltuples from pg_class where relname = 't_i'
-- result: reltuples | 100

-- vacuum set index reltuples to 0
vacuum t;
select reltuples from pg_class where relname = 't_i'
-- result: reltuples | 0

-- analyze set it back to correct value
analyze t;
select reltuples from pg_class where relname = 't_i'
-- result: reltuples | 100

-- insert + vacuum reset it again to 0
insert into t values(101);
vacuum (verbose off, analyze on, index_cleanup on) t;
select reltuples from pg_class where relname = 't_i'
-- result: reltuples | 0

-- delete + vacuum set it back to correct value
delete from t where i=10;
vacuum (verbose off, analyze on, index_cleanup on) t;
select reltuples from pg_class where relname = 't_i'
-- result: reltuples | 100

-- and back to 0 again with insert+vacuum
insert into t values(102);
vacuum (verbose off, analyze on, index_cleanup on) t;
select reltuples from pg_class where relname = 't_i'
-- result: reltuples | 0

Before 0d861bbb70, btvacuumpage was adding to the relation stats the number of live lines in the block using:

    stats->num_index_tuples += maxoff - minoff + 1;

After 0d861bbb70, it is set using the new variable nhtidslive:

    stats->num_index_tuples += nhtidslive

However, nhtidslive is only incremented if callback (IndexBulkDeleteCallback) is set, which seems not to be the case on a select-only workload. A naive fix might be to use "maxoff - minoff + 1" when callback==NULL. Thoughts? Regards,
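To make the proposed fix concrete, it would look roughly like this inside btvacuumpage() (a sketch only, not a tested patch):

```
	/* Hypothetical sketch of the naive fix described above. */
	if (callback == NULL)
	{
		/*
		 * Cleanup-only scan: nhtidslive was never incremented, so fall back
		 * to counting the remaining line pointers, as before 0d861bbb70.
		 */
		stats->num_index_tuples += maxoff - minoff + 1;
	}
	else
		stats->num_index_tuples += nhtidslive;
```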
Re: [doc] remove reference to pg_dump pre-8.1 switch behaviour
Greetings, * Tom Lane (t...@sss.pgh.pa.us) wrote: > Stephen Frost writes: > > We do need to decide at what point we're going to move forward pg_dump's > > oldest server version support. > > I'm not really in a big hurry to move it forward at all. There were > good solid reasons to drop support for pre-schema and pre-pg_depend > servers, because of the messy kluges pg_dump had to implement > to provide only-partial workarounds for those lacks. But I don't > see comparable reasons or code savings that we'll get from dropping > later versions. > > There is an argument for dropping support for server versions that > fail to build anymore with modern toolchains, since once that happens > it becomes difficult to test, unless you have old executables already > laying around. But I don't think we're at that point yet for 8.0 or > later. (I rebuilt 7.4 and later when I updated my workstation to > RHEL8 a few months ago, and they seem fine, though I did use -O0 out of > fear of -faggressive-loop-optimizations bugs for anything before 8.2.) Along those same lines though- keeping all of the versions working with pg_dump requires everyone who is working with pg_dump to have those old versions not just able to compile but to also take the time to test against those older versions when making changes. > But anyway, this was about documentation not code. Perhaps it didn't come across very well, but I was making an argument that we should consider them both under a general "every 5 years, go through and clean out anything that's older than 10 years" type of policy. I don't know that we need to spend time doing it every year, but I wouldn't be against it either. > What I'm wondering > about is when to drop things like, say, this bit in the regex docs: > > Two significant incompatibilities exist between AREs and the ERE syntax > recognized by pre-7.4 releases of PostgreSQL: > (etc etc) > > Seems like we could have gotten rid of that by now, but when exactly > does it become fair game? And can we have a non-ad-hoc process for > getting rid of such cruft? I agree we should get rid of it and I'm suggesting our policy be that we only go back about 10 years. As for the process part, I suggested that we make it a every-5-year thing, but we could make it be part of the annual process instead. We have a number of general tasks that go into each major release and some of that process is documented, though it seems like a lot isn't as explicitly spelled out as perhaps it should be. Here I'm thinking about things like: - Get a CFM for each commitfest - Form an RMT for each major release - Figure out who will run each major/minor release - Get translations done - Review contributors to see who might become a committer - other things, I'm sure "Clean up documentation and remove things older than 10 years" could be another item to get checked off each year. We might consider looking at Debian- https://wiki.debian.org/Teams/ReleaseTeam and https://wiki.debian.org/Teams/ReleaseTeam/ReleaseCheckList Perhaps the past RMTs have thought about this also. Having these things written down and available would be good though, and then we should make sure that they're assigned out and get addressed (maybe that becomes part of what the RMT does, maybe not). Thanks, Stephen signature.asc Description: PGP signature
Re: [doc] remove reference to pg_dump pre-8.1 switch behaviour
Stephen Frost writes: > We do need to decide at what point we're going to move forward pg_dump's > oldest server version support. I'm not really in a big hurry to move it forward at all. There were good solid reasons to drop support for pre-schema and pre-pg_depend servers, because of the messy kluges pg_dump had to implement to provide only-partial workarounds for those lacks. But I don't see comparable reasons or code savings that we'll get from dropping later versions. There is an argument for dropping support for server versions that fail to build anymore with modern toolchains, since once that happens it becomes difficult to test, unless you have old executables already laying around. But I don't think we're at that point yet for 8.0 or later. (I rebuilt 7.4 and later when I updated my workstation to RHEL8 a few months ago, and they seem fine, though I did use -O0 out of fear of -faggressive-loop-optimizations bugs for anything before 8.2.) But anyway, this was about documentation not code. What I'm wondering about is when to drop things like, say, this bit in the regex docs: Two significant incompatibilities exist between AREs and the ERE syntax recognized by pre-7.4 releases of PostgreSQL: (etc etc) Seems like we could have gotten rid of that by now, but when exactly does it become fair game? And can we have a non-ad-hoc process for getting rid of such cruft? regards, tom lane
Re: [doc] remove reference to pg_dump pre-8.1 switch behaviour
2020年10月23日(金) 23:12 Stephen Frost : > > Greetings, > > * Tom Lane (t...@sss.pgh.pa.us) wrote: > > Stephen Frost writes: > > > Isn't this a bit pre-mature as we still support running pg_dump against > > > 8.0 clusters..? > > > > The removed para was discussing the behavior of pg_dump itself. What > > server version you run it against isn't relevant. > > Ah, alright, that makes a bit more sense then.. Yes, it's removing a note regarding a behavioural change between pg_dump introduced in 8.2. This will severely inconvenience anyone who has emerged from a coma they fell into before December 2006 and who is just getting to grips with the brave new world of post-8.1 pg_dump, but anyone running pg_dump against an 8.x server has hopefully caught up with the change sometime during the last 14 years. > > Having said that, there are a *lot* of past-their-sell-by-date bits > > of info throughout our documentation, because we don't have any sort > > of policy or mechanism for getting rid of this kind of backwards > > compatibility note. Maybe we should first try to agree on a policy > > for when it's okay to remove such info. > > I would have thought the general policy would be "match what the tool > works with", so if we've got references to things about how pg_dump > works against older-than-8.0 then we should clearly remove those as > pg_dump no londer will run against versions that old. > > Extending that to more general notes would probably make sense though. > That is- we'll keep anything relevant to the oldest version that pg_dump > runs against (since I'm pretty sure pg_dump's compatibility goes the > farthest back of anything we've got in core and probably always will). Obviously any references to supporting functionality which is no longer actually supported should be updated/removed. Any notes about behavioural differences between two versions no longer under community support (such as the bit removed by this patch) seems like fair game (though I'm sure there are exceptions). However I'm not sure what else there is out there which needs consideration. > We do need to decide at what point we're going to move forward pg_dump's > oldest server version support. (...) I suggest starting a new thread for that. Regards Ian Barwick -- EnterpriseDB: https://www.enterprisedb.com
Re: new heapcheck contrib module
Mark Dilger writes: > The patch I *should* have attached last night this time: Thanks, I'll do some big-endian testing with this. regards, tom lane
Re: [doc] remove reference to pg_dump pre-8.1 switch behaviour
Greetings, * Tom Lane (t...@sss.pgh.pa.us) wrote: > Stephen Frost writes: > > Isn't this a bit pre-mature as we still support running pg_dump against > > 8.0 clusters..? > > The removed para was discussing the behavior of pg_dump itself. What > server version you run it against isn't relevant. Ah, alright, that makes a bit more sense then.. > Having said that, there are a *lot* of past-their-sell-by-date bits > of info throughout our documentation, because we don't have any sort > of policy or mechanism for getting rid of this kind of backwards > compatibility note. Maybe we should first try to agree on a policy > for when it's okay to remove such info. I would have thought the general policy would be "match what the tool works with", so if we've got references to things about how pg_dump works against older-than-8.0 then we should clearly remove those as pg_dump no londer will run against versions that old. Extending that to more general notes would probably make sense though. That is- we'll keep anything relevant to the oldest version that pg_dump runs against (since I'm pretty sure pg_dump's compatibility goes the farthest back of anything we've got in core and probably always will). We do need to decide at what point we're going to move forward pg_dump's oldest server version support. I had thought we would do that with each top-level major version change (eg: support 8.0+ until we reach 11.0 or someting), but that doesn't work since we've moved to a single integer for major versions. Looking at the timeline though: 2016-10-12: 64f3524e2c8deebc02808aa5ebdfa17859473add Removed pre-8.0 2005-01-19: 8.0 released So, that's about 10 years. 2010-09-20: 9.0 released Or about 10 years from today, which seems to me to imply we should probably be considering moving pg_dump forward already. I'm not really inclined to do this every year as I don't really think it's helpful, but once every 5 years or so probably makes sense. To be a bit more specific about my thoughts: - Move pg_dump up to 9.0 as the required minimum, starting with v14. - In about 5 years or so, move pg_dump up to minimum of v10. (clean up all documentation with older references and such too) If we wanted to be particularly cute about it, we could wait until v15 to drop support for older-than-9.0, and then v20 would remove support for older-than-10, and then v25 would remove support for older-than-v15, etc. Thanks, Stephen signature.asc Description: PGP signature
Re: new heapcheck contrib module
> On Oct 22, 2020, at 9:21 PM, Mark Dilger wrote: > > > >> On Oct 22, 2020, at 7:01 PM, Tom Lane wrote: >> >> Mark Dilger writes: >>> Ahh, crud. It's because >>> syswrite($fh, '\x77\x77\x77\x77', 500) >>> is wrong twice. The 500 was wrong, but the string there isn't the bit >>> pattern we want -- it's just a string literal with backslashes and such. >>> It should have been double-quoted. >> >> Argh. So we really have, using same test except >> >> memcpy(&lp, "\\x77", sizeof(lp)); >> >> little endian: off = 785c, flags = 2, len = 1b9b >> big endian: off = 2e3c, flags = 0, len = 3737 >> >> which explains the apparent LP_DEAD result. >> >> I'm not particularly on board with your suggestion of "well, if it works >> sometimes then it's okay". Then we have no idea of what we really tested. >> >> regards, tom lane > > Ok, I've pruned it down to something you may like better. Instead of just > checking that *some* corruption occurs, it checks the returned corruption > against an expected regex, and if it fails to match, you should see in the > logs what you got vs. what you expected. > > It only corrupts the first two line pointers, the first one with 0x > and the second one with 0x, which are consciously chosen to be > bitwise reverses of each other and just strings of alternating bits rather > than anything that could have a more complicated interpretation. > > On my little-endian mac, the 0x value creates a line pointer which > redirects to an invalid offset 0x, which gets reported as decimal 30583 > in the corruption report, "line pointer redirection to item at offset 30583 > exceeds maximum offset 38". The test is indifferent to whether the > corruption it is looking for is reported relative to the first line pointer > or the second one, so if endian-ness matters, it may be the 0x that > results in that corruption message. I don't have a machine handy to test > that. It would be nice to determine the minimum amount of paranoia necessary > to make this portable and not commit the rest. Obviously, that should have said 0x and 0x. After writing the patch that way, I checked that the old value 0x also works on my mac, which it does, and checked that writing the line pointers starting at offset 24 rather than 32 works on my mac, which it does, and then went on to write this rather confused email and attached the patch with those changes, which all work (at least on my mac) but are potentially less portable than what I had before testing those changes. I apologize for any confusion my email from last night may have caused. The patch I *should* have attached last night this time: regress.patch.WIP.2 Description: Binary data — Mark Dilger EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
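Since the quoting pitfall bit twice in this thread, here is a tiny standalone illustration of single- versus double-quoted \x escapes in Perl (not part of any attached patch):

```
use strict;
use warnings;

# Single quotes do not interpolate \x escapes, so '\x77' is the four
# characters backslash, x, 7, 7 -- not the byte 0x77.
my $literal = '\x77\x77\x77\x77';
my $bytes   = "\x77\x77\x77\x77";

printf "single-quoted: %2d chars, hex %s\n", length($literal), unpack('H*', $literal);
printf "double-quoted: %2d chars, hex %s\n", length($bytes), unpack('H*', $bytes);
# single-quoted: 16 chars, hex 5c7837375c7837375c7837375c783737
# double-quoted:  4 chars, hex 77777777
```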
Re: [doc] remove reference to pg_dump pre-8.1 switch behaviour
Stephen Frost writes: > Isn't this a bit pre-mature as we still support running pg_dump against > 8.0 clusters..? The removed para was discussing the behavior of pg_dump itself. What server version you run it against isn't relevant. Having said that, there are a *lot* of past-their-sell-by-date bits of info throughout our documentation, because we don't have any sort of policy or mechanism for getting rid of this kind of backwards compatibility note. Maybe we should first try to agree on a policy for when it's okay to remove such info. regards, tom lane
Re: git clone failed in windows
Thanks All, it worked with git://git.postgresql.org/git/postgresql.git Thanks Sridhar On Fri, Oct 23, 2020 at 7:02 PM Amit Kapila wrote: > On Fri, Oct 23, 2020 at 6:21 PM Dave Page wrote: > > > > On Fri, Oct 23, 2020 at 1:39 PM Amit Kapila > wrote: > >> > >> On Fri, Oct 23, 2020 at 4:39 PM Sridhar N Bamandlapally > >> wrote: > >> > > >> > Am trying to clone postgresql git, getting error > >> > > >> > D:\sridhar>git clone https://git.postgresql.org/git/postgresql.git > >> > Cloning into 'postgresql'... > >> > remote: Enumerating objects: 806507, done. > >> > remote: Counting objects: 100% (806507/806507), done. > >> > remote: Compressing objects: 100% (122861/122861), done. > >> > error: RPC failed; curl 18 transfer closed with 3265264 bytes remaining to read > >> > fatal: the remote end hung up unexpectedly > >> > fatal: early EOF > >> > fatal: index-pack failed > >> > > >> > >> I have also just tried this and it failed with same error. However, it > >> worked when I tried 'git clone > >> git://git.postgresql.org/git/postgresql.git'. I don't know what is the > >> issue. > > > > > > It worked for me with https. Can you try again? > > > > This time it worked but on the third try. > > -- > With Regards, > Amit Kapila. >
Re: [doc] remove reference to pg_dump pre-8.1 switch behaviour
Greetings, * Heikki Linnakangas (hlinn...@iki.fi) wrote: > On 06/10/2020 15:15, Ian Lawrence Barwick wrote: > >2020年10月6日(火) 21:13 Ian Lawrence Barwick : > >>The pg_dump doc page [1], under the -t/--table option, contains a Note > >>documenting the behavioural differences introduced in PostgreSQL 8.2. > >> > >>As it's been almost exactly 14 years since that note was added [2], I > >>suggest > >>it can be removed entirely. > >> > >>[1] https://www.postgresql.org/docs/current/app-pgdump.html > >>[2] > >>https://git.postgresql.org/gitweb/?p=postgresql.git;a=blobdiff;f=doc/src/sgml/ref/pg_dump.sgml;h=9aa4baf84e74817a3c3e8359b2c4c8a847fda987;hp=deafd7c9a989c2cbce3979d94416a298609f5e84;hb=24e97528631e7e810ce61fc0f5fbcaca0c001c4c;hpb=77d2b1b625c7decd7a25ec865bced3b927de6d4b > > > > > >Oh yes, I was planning to attach an ultra-trivial patch for that too. > > Applied, thanks. Isn't this a bit pre-mature as we still support running pg_dump against 8.0 clusters..? Removing support for older clusters is certainly something we can discuss but I don't know that it makes sense to just piecemeal pull things out. I get that this was just a documentation note, but, still, we do support pg_dump run against 8.0 and 8.1 clusters, at least today. Thanks, Stephen signature.asc Description: PGP signature
Re: git clone failed in windows
On Fri, Oct 23, 2020 at 6:21 PM Dave Page wrote: > > On Fri, Oct 23, 2020 at 1:39 PM Amit Kapila wrote: >> >> On Fri, Oct 23, 2020 at 4:39 PM Sridhar N Bamandlapally >> wrote: >> > >> > Am trying to clone postgresql git, getting error >> > >> > D:\sridhar>git clone https://git.postgresql.org/git/postgresql.git >> > Cloning into 'postgresql'... >> > remote: Enumerating objects: 806507, done. >> > remote: Counting objects: 100% (806507/806507), done. >> > remote: Compressing objects: 100% (122861/122861), done. >> > error: RPC failed; curl 18 transfer closed with 3265264 bytes remaining to >> > read >> > fatal: the remote end hung up unexpectedly >> > fatal: early EOF >> > fatal: index-pack failed >> > >> >> I have also just tried this and it failed with same error. However, it >> worked when I tried 'git clone >> git://git.postgresql.org/git/postgresql.git'. I don't know what is the >> issue. > > > It worked for me with https. Can you try again? > This time it worked but on the third try. -- With Regards, Amit Kapila.
Re: Parallel copy
On Fri, Oct 23, 2020 at 5:42 PM Ashutosh Sharma wrote: > > Hi Vignesh, > > Thanks for the updated patches. Here are some more comments that I can > find after reviewing your latest patches: > > +/* > + * This structure helps in storing the common data from CopyStateData that > are > + * required by the workers. This information will then be allocated and > stored > + * into the DSM for the worker to retrieve and copy it to CopyStateData. > + */ > +typedef struct SerializedParallelCopyState > +{ > + /* low-level state data */ > + CopyDestcopy_dest; /* type of copy source/destination */ > + int file_encoding; /* file or remote side's character encoding */ > + boolneed_transcoding; /* file encoding diff from server? */ > + boolencoding_embeds_ascii; /* ASCII can be non-first byte? */ > + > ... > ... > + > + /* Working state for COPY FROM */ > + AttrNumber num_defaults; > + Oid relid; > +} SerializedParallelCopyState; > > Can the above structure not be part of the CopyStateData structure? I > am just asking this question because all the fields present in the > above structure are also present in the CopyStateData structure. So, > including it in the CopyStateData structure will reduce the code > duplication and will also make CopyStateData a bit shorter. > > -- > > + pcxt = BeginParallelCopy(cstate->nworkers, cstate, stmt->attlist, > +relid); > > Do we need to pass cstate->nworkers and relid to BeginParallelCopy() > function when we are already passing cstate structure, using which > both of these information can be retrieved ? > > -- > > +/* DSM keys for parallel copy. */ > +#define PARALLEL_COPY_KEY_SHARED_INFO 1 > +#define PARALLEL_COPY_KEY_CSTATE 2 > +#define PARALLEL_COPY_WAL_USAGE3 > +#define PARALLEL_COPY_BUFFER_USAGE 4 > > DSM key names do not appear to be consistent. For shared info and > cstate structures, the key name is prefixed with "PARALLEL_COPY_KEY", > but for WalUsage and BufferUsage structures, it is prefixed with > "PARALLEL_COPY". I think it would be better to make them consistent. > > -- > > if (resultRelInfo->ri_TrigDesc != NULL && > (resultRelInfo->ri_TrigDesc->trig_insert_before_row || > resultRelInfo->ri_TrigDesc->trig_insert_instead_row)) > { > /* > * Can't support multi-inserts when there are any BEFORE/INSTEAD OF > * triggers on the table. Such triggers might query the table we're > * inserting into and act differently if the tuples that have already > * been processed and prepared for insertion are not there. > */ > insertMethod = CIM_SINGLE; > } > else if (proute != NULL && resultRelInfo->ri_TrigDesc != NULL && > resultRelInfo->ri_TrigDesc->trig_insert_new_table) > { > /* > * For partitioned tables we can't support multi-inserts when there > * are any statement level insert triggers. It might be possible to > * allow partitioned tables with such triggers in the future, but for > * now, CopyMultiInsertInfoFlush expects that any before row insert > * and statement level insert triggers are on the same relation. > */ > insertMethod = CIM_SINGLE; > } > else if (resultRelInfo->ri_FdwRoutine != NULL || > cstate->volatile_defexprs) > { > ... > ... > > I think, if possible, all these if-else checks in CopyFrom() can be > moved to a single function which can probably be named as > IdentifyCopyInsertMethod() and this function can be called in > IsParallelCopyAllowed(). This will ensure that in case of Parallel > Copy when the leader has performed all these checks, the worker won't > do it again. I also feel that it will make the code look a bit > cleaner. 
> Just rewriting above comment to make it a bit more clear: I think, if possible, all these if-else checks in CopyFrom() should be moved to a separate function which can probably be named as IdentifyCopyInsertMethod() and this function called from IsParallelCopyAllowed() and CopyFrom() functions. It will only be called from CopyFrom() when IsParallelCopy() returns false. This will ensure that in case of Parallel Copy if the leader has performed all these checks, the worker won't do it again. I also feel that having a separate function containing all these checks will make the code look a bit cleaner. > -- > > +void > +ParallelCopyMain(dsm_segment *seg, shm_toc *toc) > +{ > ... > ... > + InstrEndParallelQuery(&bufferusage[ParallelWorkerNumber], > + &walusage[ParallelWorkerNumber]); > + > + MemoryContextSwitchTo(oldcontext); > + pfree(cstate); > + return; > +} > > It seems like you also need to delete the memory context > (cstate->copycontext) here. > > -- > > +void > +ExecBeforeStmtTrigger(CopyState cstate) > +{ > + EState
Re: Enumize logical replication message actions
On Fri, 23 Oct 2020 at 18:23, Amit Kapila wrote: > On Fri, Oct 23, 2020 at 11:50 AM Kyotaro Horiguchi > wrote: > > > > At Thu, 22 Oct 2020 22:31:41 -0300, Alvaro Herrera < > alvhe...@alvh.no-ip.org> wrote in > > > On 2020-Oct-22, Ashutosh Bapat wrote: > > > > > > > On Thu, 22 Oct 2020 at 14:46, Kyotaro Horiguchi < > horikyota@gmail.com> > > > > wrote: > > > > > > > > pg_send_logicalrep_msg_type() looks somewhat too-much. If we need > > > > > something like that we shouldn't do this refactoring, I think. > > > > > > > > Enum is an integer, and we want to send byte. The function asserts > that the > > > > enum fits a byte. If there's a way to declare byte long enums I > would use > > > > that. But I didn't find a way to do that. > > > > > > I didn't look at the code, but maybe it's sufficient to add a > > > StaticAssert? > > > > That check needs to visit all symbols in a enum and confirm that each > > of them is in a certain range. > > > > Can we define something like LOGICAL_REP_MSG_LAST (also add a comment > indicating this is a fake message and must be the last one) as the > last and just check that? > > I don't think that's required once I applied suggestions from Kyotaro and Peter. Please check the latest patch. Usually LAST is added to an enum when we need to cap the number of symbols or want to find the number of symbols. None of that is necessary here. Do you see any other use? -- Best Wishes, Ashutosh
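For illustration, the usual trailing-sentinel idiom being referred to looks like this (names here are made up):

typedef enum WorkerState
{
    WORKER_STATE_IDLE,
    WORKER_STATE_RUNNING,
    WORKER_STATE_STOPPED,
    NUM_WORKER_STATES           /* sentinel: symbol count / array-size bound */
} WorkerState;

/* e.g. a lookup table kept in sync with the enum */
static const char *worker_state_names[NUM_WORKER_STATES];

That idiom relies on the members taking the default consecutive values, so it does not map directly onto LogicalRepMsgType, whose members are tied to the single-byte codes sent on the wire.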
Re: Enumize logical replication message actions
On Fri, 23 Oct 2020 at 17:02, Kyotaro Horiguchi wrote: > At Fri, 23 Oct 2020 19:53:00 +1100, Peter Smith > wrote in > > On Fri, Oct 23, 2020 at 5:20 PM Kyotaro Horiguchi > > wrote: > > > > > > At Thu, 22 Oct 2020 22:31:41 -0300, Alvaro Herrera < > alvhe...@alvh.no-ip.org> wrote in > > > > On 2020-Oct-22, Ashutosh Bapat wrote: > > > > > > > > > On Thu, 22 Oct 2020 at 14:46, Kyotaro Horiguchi < > horikyota@gmail.com> > > > > > wrote: > > > > > > > > > > pg_send_logicalrep_msg_type() looks somewhat too-much. If we > need > > > > > > something like that we shouldn't do this refactoring, I think. > > > > > > > > > > Enum is an integer, and we want to send byte. The function asserts > that the > > > > > enum fits a byte. If there's a way to declare byte long enums I > would use > > > > > that. But I didn't find a way to do that. > > > > The pq_send_logicalrep_msg_type() function seemed a bit overkill to me. > > Ah, yes, it is what I meant. I didn't come up with the word "overkill". > > > The comment in the LogicalRepMsgType enum will sufficiently ensure > > nobody is going to accidentally add any bad replication message codes. > > And it's not like these are going to be changed often. > > Agreed. > > > Why not simply downcast your enums when calling pq_sendbyte? > > There are only a few of them. > > > > e.g. pq_sendbyte(out, (uint8)LOGICAL_REP_MSG_STREAM_COMMIT); > > If you are worried about compiler warning, that explicit cast is not > required. Even if the symbol is larger than 0xff, the upper bytes are > silently truncated off. > > I agree with Peter that the prologue of LogicalRepMsgType is enough. I also agree with Kyotaro, that explicit cast is unnecessary. All this together makes the second patch useless. Removed it. Instead used Kyotaro's idea in previous mail. PFA updated patch. -- Best Wishes, Ashutosh From 76f578416503478bf5e39993eec4bbd0f17d5a17 Mon Sep 17 00:00:00 2001 From: Ashutosh Bapat Date: Fri, 16 Oct 2020 12:31:35 +0530 Subject: [PATCH] Enumize top level logical replication actions Logical replication protocol uses single byte character to identify different chunks of logical repliation messages. The code uses string literals for the same. Enumize those so that 1. All the string literals used can be found at a single place. This makes it easy to add more actions without the risk of conflicts. 2. It's easy to locate the code handling a given action. 
Ashutosh Bapat --- src/backend/replication/logical/proto.c | 26 +++ src/backend/replication/logical/worker.c | 87 src/include/replication/logicalproto.h | 25 +++ 3 files changed, 81 insertions(+), 57 deletions(-) diff --git a/src/backend/replication/logical/proto.c b/src/backend/replication/logical/proto.c index eb19142b48..fdb31182d7 100644 --- a/src/backend/replication/logical/proto.c +++ b/src/backend/replication/logical/proto.c @@ -44,7 +44,7 @@ static const char *logicalrep_read_namespace(StringInfo in); void logicalrep_write_begin(StringInfo out, ReorderBufferTXN *txn) { - pq_sendbyte(out, 'B'); /* BEGIN */ + pq_sendbyte(out, LOGICAL_REP_MSG_BEGIN); /* fixed fields */ pq_sendint64(out, txn->final_lsn); @@ -76,7 +76,7 @@ logicalrep_write_commit(StringInfo out, ReorderBufferTXN *txn, { uint8 flags = 0; - pq_sendbyte(out, 'C'); /* sending COMMIT */ + pq_sendbyte(out, LOGICAL_REP_MSG_COMMIT); /* send the flags field (unused for now) */ pq_sendbyte(out, flags); @@ -112,7 +112,7 @@ void logicalrep_write_origin(StringInfo out, const char *origin, XLogRecPtr origin_lsn) { - pq_sendbyte(out, 'O'); /* ORIGIN */ + pq_sendbyte(out, LOGICAL_REP_MSG_ORIGIN); /* fixed fields */ pq_sendint64(out, origin_lsn); @@ -141,7 +141,7 @@ void logicalrep_write_insert(StringInfo out, TransactionId xid, Relation rel, HeapTuple newtuple, bool binary) { - pq_sendbyte(out, 'I'); /* action INSERT */ + pq_sendbyte(out, LOGICAL_REP_MSG_INSERT); /* transaction ID (if not valid, we're not streaming) */ if (TransactionIdIsValid(xid)) @@ -185,7 +185,7 @@ void logicalrep_write_update(StringInfo out, TransactionId xid, Relation rel, HeapTuple oldtuple, HeapTuple newtuple, bool binary) { - pq_sendbyte(out, 'U'); /* action UPDATE */ + pq_sendbyte(out, LOGICAL_REP_MSG_UPDATE); Assert(rel->rd_rel->relreplident == REPLICA_IDENTITY_DEFAULT || rel->rd_rel->relreplident == REPLICA_IDENTITY_FULL || @@ -263,7 +263,7 @@ logicalrep_write_delete(StringInfo out, TransactionId xid, Relation rel, rel->rd_rel->relreplident == REPLICA_IDENTITY_FULL || rel->rd_rel->relreplident == REPLICA_IDENTITY_INDEX); - pq_sendbyte(out, 'D'); /* action DELETE */ + pq_sendbyte(out, LOGICAL_REP_MSG_DELETE); /* transaction ID (if not valid, we're not streaming) */ if (TransactionIdIsValid(xid)) @@ -317,7 +317,7 @@ logicalrep_write_truncate(StringInfo out, int i; uint8 flags = 0; - pq_sendbyte(out, 'T'); /* action TRUNCATE */ + pq_sendbyte(out, LOGICAL_RE
Re: Enumize logical replication message actions
On Fri, Oct 23, 2020 at 11:50 AM Kyotaro Horiguchi wrote: > > At Thu, 22 Oct 2020 22:31:41 -0300, Alvaro Herrera > wrote in > > On 2020-Oct-22, Ashutosh Bapat wrote: > > > > > On Thu, 22 Oct 2020 at 14:46, Kyotaro Horiguchi > > > wrote: > > > > > > pg_send_logicalrep_msg_type() looks somewhat too-much. If we need > > > > something like that we shouldn't do this refactoring, I think. > > > > > > Enum is an integer, and we want to send byte. The function asserts that > > > the > > > enum fits a byte. If there's a way to declare byte long enums I would use > > > that. But I didn't find a way to do that. > > > > I didn't look at the code, but maybe it's sufficient to add a > > StaticAssert? > > That check needs to visit all symbols in a enum and confirm that each > of them is in a certain range. > Can we define something like LOGICAL_REP_MSG_LAST (also add a comment indicating this is a fake message and must be the last one) as the last and just check that? -- With Regards, Amit Kapila.
Re: git clone failed in windows
On Fri, Oct 23, 2020 at 1:39 PM Amit Kapila wrote: > On Fri, Oct 23, 2020 at 4:39 PM Sridhar N Bamandlapally > wrote: > > > > Am trying to clone postgresql git, getting error > > > > D:\sridhar>git clone https://git.postgresql.org/git/postgresql.git > > Cloning into 'postgresql'... > > remote: Enumerating objects: 806507, done. > > remote: Counting objects: 100% (806507/806507), done. > > remote: Compressing objects: 100% (122861/122861), done. > > error: RPC failed; curl 18 transfer closed with 3265264 bytes remaining > to read > > fatal: the remote end hung up unexpectedly > > fatal: early EOF > > fatal: index-pack failed > > > > I have also just tried this and it failed with same error. However, it > worked when I tried 'git clone > git://git.postgresql.org/git/postgresql.git'. I don't know what is the > issue. > It worked for me with https. Can you try again? It may be that the Varnish cache was doing it's meditation thing for some reason. I can't see anything obvious on the system though - nothing in the logs, and the services have all been up for days. -- Dave Page Blog: http://pgsnake.blogspot.com Twitter: @pgsnake EDB: http://www.enterprisedb.com
Re: Enumize logical replication message actions
On Fri, 23 Oct 2020 at 06:50, Kyotaro Horiguchi wrote: > > Those two switch()es are apparently redundant. That code is exactly > equivalent to: > > apply_dispatch(s) > { > LogicalRepMsgType msgtype = pq_getmsgtype(s); > > switch (msgtype) > { > case LOGICAL_REP_MSG_BEGIN: > apply_handle_begin(s); > ! return; > ... > case LOGICAL_REP_MSG_STREAM_COMMIT: > apply_handle_stream_commit(s); > ! return; > } > > ereport(ERROR, (errmsg("invalid logical replication message type".. > } > > which is smaller and faster. > Good idea. Implemented in the latest patch posted with the next mail. -- Best Wishes, Ashutosh
Re: git clone failed in windows
On Fri, Oct 23, 2020 at 4:39 PM Sridhar N Bamandlapally wrote: > > Am trying to clone postgresql git, getting error > > D:\sridhar>git clone https://git.postgresql.org/git/postgresql.git > Cloning into 'postgresql'... > remote: Enumerating objects: 806507, done. > remote: Counting objects: 100% (806507/806507), done. > remote: Compressing objects: 100% (122861/122861), done. > error: RPC failed; curl 18 transfer closed with 3265264 bytes remaining to > read > fatal: the remote end hung up unexpectedly > fatal: early EOF > fatal: index-pack failed > I have also just tried this and it failed with same error. However, it worked when I tried 'git clone git://git.postgresql.org/git/postgresql.git'. I don't know what is the issue. -- With Regards, Amit Kapila.
Re: git clone failed in windows
Hi Sridhar! > On Oct 23, 2020, at 16:09, Sridhar N Bamandlapally > wrote: > > Am trying to clone postgresql git, getting error > > D:\sridhar>git clone https://git.postgresql.org/git/postgresql.git > Cloning into 'postgresql'... > remote: Enumerating objects: 806507, done. > remote: Counting objects: 100% (806507/806507), done. > remote: Compressing objects: 100% (122861/122861), done. > error: RPC failed; curl 18 transfer closed with 3265264 bytes remaining to > read > fatal: the remote end hung up unexpectedly > fatal: early EOF > fatal: index-pack failed It seems like your internet connection is not stable enough. As an alternative, you can try to clone https://github.com/postgres/postgres It's synced with the official repository you mentioned and allows you to have your own fork for personal branches. Thanks! Best regards, Andrey Borodin.
Logical replication from HA cluster
Hi! I'm working on providing smooth failover for a CDC system in an HA cluster. Currently, logical slots are not replicated, so when we promote a replica, continuation of change data capture (CDC) from the new primary after failover becomes impossible. We cannot start logical replication from an LSN different from the slot's LSN, and we cannot create a slot at an LSN in the past, particularly before or right after promotion. This leads to a massive waste of network bandwidth in our installations, due to the necessity of an initial table sync. We are considering using the extension that creates a replication slot with an LSN in the past [0]. I understand that there might be some caveats with logical replication, but I do not see the scale of possible implications of this approach. The user gets an error if the WAL has been rotated, or waits if the LSN has not been reached yet; this seems perfectly fine for us. In most of our cases, when the CDC agent detects failover and goes to the new primary, there are plenty of old WALs to restart CDC from. Are there strong reasons why we do not allow creation of slots at given LSNs, possibly within a narrow LSN range (but wider than just GetXLogInsertRecPtr())? Thanks! Best regards, Andrey Borodin. [0] https://github.com/x4m/pg_tm_aux/blob/master/pg_tm_aux.c#L74-L77
Re: git clone failed in windows
The git clone of the repository is also showing as failed from Visual Studio. [image: git-clone-error.PNG] Please let me know if there is any issue. Thanks Sridhar BN On Fri, Oct 23, 2020 at 4:39 PM Sridhar N Bamandlapally < sridhar@gmail.com> wrote: > Am trying to clone postgresql git, getting error > > D:\sridhar>git clone https://git.postgresql.org/git/postgresql.git > Cloning into 'postgresql'... > remote: Enumerating objects: 806507, done. > remote: Counting objects: 100% (806507/806507), done. > remote: Compressing objects: 100% (122861/122861), done. > error: RPC failed; curl 18 transfer closed with 3265264 bytes remaining to > read > fatal: the remote end hung up unexpectedly > fatal: early EOF > fatal: index-pack failed > > Please let me know, as I am doing this for the first time > > Thanks > Sridhar BN > > >
Re: Error in pg_restore (could not close data file: Success)
Maybe it would be better to commit these patches to mainstream, but I don't really know. On Wed, Oct 21, 2020 at 09:20, Kyotaro Horiguchi wrote: > At Wed, 21 Oct 2020 13:45:15 +0900 (JST), Kyotaro Horiguchi < horikyota@gmail.com> wrote in > > https://www.postgresql.org/message-id/flat/20200416.181945.759179589924840062.horikyota.ntt%40gmail.com#ed85c5fda64873c45811be4e3027a2ea > > > > Me> Hmm. Sounds reasonable. I'm going to do that. Thanks! > > > > But somehow that hasn't happened, I'll come up with a new version. > > pg_restore shows the following error instead of "Success" for a broken > compressed file. > > pg_restore: error: could not close data file "d/3149.dat": zlib error: > error reading or writing compressed file > > > 0001: > > cfclose() calls fatal() instead of returning the result to the callers > on error, which is obviously okay for all existing callers that are > handling errors from the function. Other callers ignored the returned > value, but we should fatal() on error of the function. > > At least for me, gzerror doesn't return a message (specifically, > returns NULL) after gzclose failure so currently cfclose shows its own > messages for errors of gzclose(). Am I missing something? > > 0002: > > cfread has the same code as get_cfp_error() and cfgetc uses > strerror() after gzgetc(). It would be suitable for a separate patch, > but 0002 fixes those bugs. I changed _EndBlob() to show the cause of > an error. > > Did not do in this patch: > > We could do further small refactoring to remove temporary variables in > pg_backup_directory.c for _StartData(), InitArchiveFmt_Directory, > _LoadBlobs(), _StartBlobs() and _CloseArchive(), but I left them as is > for the ease of back-patching. > > Now that we have the file name in the context variable, we could > show the file name in all error messages, but that change was large > and there's a part where that change is a bit more complex so I didn't > do that. > > > regards. > > -- > Kyotaro Horiguchi > NTT Open Source Software Center >
Re: Parallel copy
Hi Vignesh, Thanks for the updated patches. Here are some more comments that I can find after reviewing your latest patches: +/* + * This structure helps in storing the common data from CopyStateData that are + * required by the workers. This information will then be allocated and stored + * into the DSM for the worker to retrieve and copy it to CopyStateData. + */ +typedef struct SerializedParallelCopyState +{ + /* low-level state data */ + CopyDestcopy_dest; /* type of copy source/destination */ + int file_encoding; /* file or remote side's character encoding */ + boolneed_transcoding; /* file encoding diff from server? */ + boolencoding_embeds_ascii; /* ASCII can be non-first byte? */ + ... ... + + /* Working state for COPY FROM */ + AttrNumber num_defaults; + Oid relid; +} SerializedParallelCopyState; Can the above structure not be part of the CopyStateData structure? I am just asking this question because all the fields present in the above structure are also present in the CopyStateData structure. So, including it in the CopyStateData structure will reduce the code duplication and will also make CopyStateData a bit shorter. -- + pcxt = BeginParallelCopy(cstate->nworkers, cstate, stmt->attlist, +relid); Do we need to pass cstate->nworkers and relid to BeginParallelCopy() function when we are already passing cstate structure, using which both of these information can be retrieved ? -- +/* DSM keys for parallel copy. */ +#define PARALLEL_COPY_KEY_SHARED_INFO 1 +#define PARALLEL_COPY_KEY_CSTATE 2 +#define PARALLEL_COPY_WAL_USAGE3 +#define PARALLEL_COPY_BUFFER_USAGE 4 DSM key names do not appear to be consistent. For shared info and cstate structures, the key name is prefixed with "PARALLEL_COPY_KEY", but for WalUsage and BufferUsage structures, it is prefixed with "PARALLEL_COPY". I think it would be better to make them consistent. -- if (resultRelInfo->ri_TrigDesc != NULL && (resultRelInfo->ri_TrigDesc->trig_insert_before_row || resultRelInfo->ri_TrigDesc->trig_insert_instead_row)) { /* * Can't support multi-inserts when there are any BEFORE/INSTEAD OF * triggers on the table. Such triggers might query the table we're * inserting into and act differently if the tuples that have already * been processed and prepared for insertion are not there. */ insertMethod = CIM_SINGLE; } else if (proute != NULL && resultRelInfo->ri_TrigDesc != NULL && resultRelInfo->ri_TrigDesc->trig_insert_new_table) { /* * For partitioned tables we can't support multi-inserts when there * are any statement level insert triggers. It might be possible to * allow partitioned tables with such triggers in the future, but for * now, CopyMultiInsertInfoFlush expects that any before row insert * and statement level insert triggers are on the same relation. */ insertMethod = CIM_SINGLE; } else if (resultRelInfo->ri_FdwRoutine != NULL || cstate->volatile_defexprs) { ... ... I think, if possible, all these if-else checks in CopyFrom() can be moved to a single function which can probably be named as IdentifyCopyInsertMethod() and this function can be called in IsParallelCopyAllowed(). This will ensure that in case of Parallel Copy when the leader has performed all these checks, the worker won't do it again. I also feel that it will make the code look a bit cleaner. -- +void +ParallelCopyMain(dsm_segment *seg, shm_toc *toc) +{ ... ... 
+ InstrEndParallelQuery(&bufferusage[ParallelWorkerNumber], + &walusage[ParallelWorkerNumber]); + + MemoryContextSwitchTo(oldcontext); + pfree(cstate); + return; +} It seems like you also need to delete the memory context (cstate->copycontext) here. -- +void +ExecBeforeStmtTrigger(CopyState cstate) +{ + EState *estate = CreateExecutorState(); + ResultRelInfo *resultRelInfo; This function has a lot of comments which have been copied as-is from the CopyFrom function. I think it would be good to remove those comments from here and mention that the code in this function has been taken from the CopyFrom function; anyone with questions can then refer to CopyFrom. This will again avoid unnecessary code in the patch. -- As Heikki rightly pointed out in his previous email, we need some high-level description of how Parallel Copy works somewhere in the copyparallel.c file. For reference, please see how a brief description of parallel vacuum has been added in the vacuumlazy.c file. * Lazy vacuum supports parallel execution with parallel worker processes. In * a parallel vacuum, we perform both index vacuum and index cleanup with * parallel worker processes.
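To make the IdentifyCopyInsertMethod() suggestion above more concrete, here is a rough sketch of how the factored-out function could look, built only from the branches quoted earlier; the parameter list, the trailing CIM_MULTI case, and the exact call sites are assumptions, not the patch's actual code:

static CopyInsertMethod
IdentifyCopyInsertMethod(CopyState cstate, ResultRelInfo *resultRelInfo,
                         PartitionTupleRouting *proute)
{
    /* BEFORE/INSTEAD OF row triggers rule out multi-inserts */
    if (resultRelInfo->ri_TrigDesc != NULL &&
        (resultRelInfo->ri_TrigDesc->trig_insert_before_row ||
         resultRelInfo->ri_TrigDesc->trig_insert_instead_row))
        return CIM_SINGLE;

    /* partitioned target with statement-level insert triggers */
    if (proute != NULL && resultRelInfo->ri_TrigDesc != NULL &&
        resultRelInfo->ri_TrigDesc->trig_insert_new_table)
        return CIM_SINGLE;

    /* foreign tables and volatile default expressions */
    if (resultRelInfo->ri_FdwRoutine != NULL || cstate->volatile_defexprs)
        return CIM_SINGLE;

    /* remaining cases; partitioned targets may want CIM_MULTI_CONDITIONAL */
    return CIM_MULTI;
}

The leader could call this once (for instance from IsParallelCopyAllowed()), and CopyFrom() could reuse the result instead of repeating the checks in each worker.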
Re: Enumize logical replication message actions
At Fri, 23 Oct 2020 19:53:00 +1100, Peter Smith wrote in > On Fri, Oct 23, 2020 at 5:20 PM Kyotaro Horiguchi > wrote: > > > > At Thu, 22 Oct 2020 22:31:41 -0300, Alvaro Herrera > > wrote in > > > On 2020-Oct-22, Ashutosh Bapat wrote: > > > > > > > On Thu, 22 Oct 2020 at 14:46, Kyotaro Horiguchi > > > > > > > > wrote: > > > > > > > > pg_send_logicalrep_msg_type() looks somewhat too-much. If we need > > > > > something like that we shouldn't do this refactoring, I think. > > > > > > > > Enum is an integer, and we want to send byte. The function asserts that > > > > the > > > > enum fits a byte. If there's a way to declare byte long enums I would > > > > use > > > > that. But I didn't find a way to do that. > > The pq_send_logicalrep_msg_type() function seemed a bit overkill to me. Ah, yes, it is what I meant. I didn't come up with the word "overkill". > The comment in the LogicalRepMsgType enum will sufficiently ensure > nobody is going to accidentally add any bad replication message codes. > And it's not like these are going to be changed often. Agreed. > Why not simply downcast your enums when calling pq_sendbyte? > There are only a few of them. > > e.g. pq_sendbyte(out, (uint8)LOGICAL_REP_MSG_STREAM_COMMIT); If you are worried about compiler warning, that explicit cast is not required. Even if the symbol is larger than 0xff, the upper bytes are silently truncated off. regards. -- Kyotaro Horiguchi NTT Open Source Software Center
Re: Autovacuum on partitioned table (autoanalyze)
Thank you for the new version. At Fri, 23 Oct 2020 15:12:51 +0900, yuzuko wrote in > Hello, > > I reconsidered a way based on the v5 patch in line with > Horiguchi-san's comment. > > This approach is as follows: > - A partitioned table is checked whether it needs analyze like a plain > table in relation_needs_vacanalyze(). To do this, we should store > partitioned table's stats (changes_since_analyze). > - Partitioned table's changes_since_analyze is updated when > analyze a leaf partition by propagating its changes_since_analyze. > In the next scheduled analyze time, it is used in the above process. > That is, the partitioned table is analyzed behind leaf partitions. > - The propagation process differs between autoanalyze or plain analyze. > In autoanalyze, a leaf partition's changes_since_analyze is propagated > to *all* ancestors. Whereas, in plain analyze on an inheritance tree, > propagates to ancestors not included the tree to avoid needless counting. > > Attach the latest patch to this email. > Could you check it again please? + /* +* Get its all ancestors to propagate changes_since_analyze count. +* However, when ANALYZE inheritance tree, we get ancestors of +* toprel_oid to avoid needless counting. +*/ + if (!OidIsValid(toprel_oid)) + ancestors = get_partition_ancestors(RelationGetRelid(rel)); + else + ancestors = get_partition_ancestors(toprel_oid); This comment doesn't explain what the code intends, only what the code does. The reason for the difference is that if we have a valid toprel_oid, we analyze all descendants of the relation this time, and if we propagate the number to the descendants of the top relation, the next analyze on the relations could happen sooner than expected. - msg.m_live_tuples = livetuples; - msg.m_dead_tuples = deadtuples; + msg.m_live_tuples = (rel->rd_rel->relkind == RELKIND_PARTITIONED_TABLE) + ? 0 /* if this is a partitioned table, skip modifying */ + : livetuples; + msg.m_dead_tuples = (rel->rd_rel->relkind == RELKIND_PARTITIONED_TABLE) + ? 0 /* if this is a partitioned table, skip modifying */ + : deadtuples; Two successive branches with the same condition look odd. And we need an explanation of *why* we don't set the values for partitioned tables. + foreach(lc, ancestors) + { + Oid parentOid = lfirst_oid(lc); + Relation parentrel; + + parentrel = table_open(parentOid, AccessShareLock); I'm not sure, but not all of the ancestors are necessarily parents (in other words, a parent of a parent of mine is not a parent of mine). Isn't just rel sufficient? -* Report ANALYZE to the stats collector, too. However, if doing -* inherited stats we shouldn't report, because the stats collector only -* tracks per-table stats. Reset the changes_since_analyze counter only -* if we analyzed all columns; otherwise, there is still work for -* auto-analyze to do. +* Report ANALYZE to the stats collector, too. Reset the +* changes_since_analyze counter only if we analyzed all columns; +* otherwise, there is still work for auto-analyze to do. */ - if (!inh) + if (!inh || onerel->rd_rel->relkind == RELKIND_PARTITIONED_TABLE) pgstat_report_analyze(onerel, totalrows, totaldeadrows, This still rejects traditional inheritance nonleaf relations. But if we remove the description about that completely in the comment above, we should support traditional inheritance parents here. I think we can do that as far as we need to propagate only per-tuple stats (that means not per-attribute) like changes_since_analyze.
Whichever way we take, do we need the description about the behavior in the documentation? regards. -- Kyotaro Horiguchi NTT Open Source Software Center
git clone failed in windows
Am trying to clone postgresql git, getting error D:\sridhar>git clone https://git.postgresql.org/git/postgresql.git Cloning into 'postgresql'... remote: Enumerating objects: 806507, done. remote: Counting objects: 100% (806507/806507), done. remote: Compressing objects: 100% (122861/122861), done. error: RPC failed; curl 18 transfer closed with 3265264 bytes remaining to read fatal: the remote end hung up unexpectedly fatal: early EOF fatal: index-pack failed Please let me know, as I am doing this for the first time Thanks Sridhar BN
Re: [PATCH] Add extra statistics to explain for Nested Loop
wrote: You should update the explain_parallel_append() plpgsql function created in that test file to make sure that both "rows" and the two new counters are changed to "N". There might be other similar changes needed. Thank you for watching this issue. I made the necessary changes in tests following your advice. Now, for the changes themselves. For the min/max time, you're aggregating "totaltime - instr->firsttuple". Why removing the startup time from the loop execution time? I think this should be kept. I think it's right remark. I fixed it. Also, in explain.c you display the counters in the "Nested loop" node, but this should be done in the inner plan node instead, as this is the one being looped on. So the condition should probably be "nloops > 1" rather than testing if it's a NestLoop. Condition "nloops > 1" is not the same as checking if it's NestLoop. This condition will lead to printing extra statistics for nodes with different types of join, not only Nested Loop Join. If this statistic is useful for other plan nodes, it makes sense to change the condition. wrote: I'm a bit worried that further increasing the size of struct Instrumentation will increase the overhead of EXPLAIN ANALYZE further - in some workloads we spend a fair bit of time in code handling that. It would be good to try to find a few bad cases, and see what the overhead is. Thank you for this comment, I will try to figure it out. Do you have some examples of large overhead dependent on this struct? I think I need some sample to know which way to think. Thank you all for the feedback. I hope the new version of my patch will be more correct and useful. Please don't hesitate to share any thoughts on this topic! -- Ekaterina Sokolova Postgres Professional: http://www.postgrespro.com The Russian Postgres CompanyFrom: "Ekaterina Sokolova" diff --git a/src/backend/commands/explain.c b/src/backend/commands/explain.c index 41317f18374..2132d82fe79 100644 --- a/src/backend/commands/explain.c +++ b/src/backend/commands/explain.c @@ -1568,29 +1568,64 @@ ExplainNode(PlanState *planstate, List *ancestors, double startup_ms = 1000.0 * planstate->instrument->startup / nloops; double total_ms = 1000.0 * planstate->instrument->total / nloops; double rows = planstate->instrument->ntuples / nloops; + double min_r = planstate->instrument->min_tuples; + double max_r = planstate->instrument->max_tuples; + double min_t_ms = 1000.0 * planstate->instrument->min_t; + double max_t_ms = 1000.0 * planstate->instrument->max_t; if (es->format == EXPLAIN_FORMAT_TEXT) { - if (es->timing) -appendStringInfo(es->str, - " (actual time=%.3f..%.3f rows=%.0f loops=%.0f)", - startup_ms, total_ms, rows, nloops); + if (nodeTag(plan) == T_NestLoop) { +if (es->timing) + appendStringInfo(es->str, + " (actual time=%.3f..%.3f min_time=%.3f max_time=%.3f min_rows=%.0f rows=%.0f max_rows=%.0f loops=%.0f)", + startup_ms, total_ms, min_t_ms, max_t_ms, min_r, rows, max_r, nloops); +else + appendStringInfo(es->str, + " (actual min_rows=%.0f rows=%.0f max_rows=%.0f loops=%.0f)", + min_r, rows, max_r, nloops); + } else -appendStringInfo(es->str, - " (actual rows=%.0f loops=%.0f)", - rows, nloops); + { +if (es->timing) + appendStringInfo(es->str, + " (actual time=%.3f..%.3f rows=%.0f loops=%.0f)", + startup_ms, total_ms, rows, nloops); +else + appendStringInfo(es->str, + " (actual rows=%.0f loops=%.0f)", + rows, nloops); + } } else { - if (es->timing) + if (nodeTag(plan) == T_NestLoop) { +if (es->timing) { + ExplainPropertyFloat("Actual Startup Time", "s", startup_ms, + 3, 
es); + ExplainPropertyFloat("Actual Total Time", "s", total_ms, + 3, es); + ExplainPropertyFloat("Min Time", "s", min_t_ms, + 3, es); + ExplainPropertyFloat("Max Time", "s", max_t_ms, + 3, es); +} +ExplainPropertyFloat("Actual Rows", NULL, rows, 0, es); +ExplainPropertyFloat("Min Rows", NULL, rows, 0, es); +ExplainPropertyFloat("Max Rows", NULL, rows, 0, es); +ExplainPropertyFloat("Actual Loops", NULL, nloops, 0, es); + } + else { -ExplainPropertyFloat("Actual Startup Time", "s", startup_ms, - 3, es); -ExplainPropertyFloat("Actual Total Time", "s", total_ms, - 3, es); +if (es->timing) { + ExplainPropertyFloat("Actual Startup Time", "s", startup_ms, + 3, es); + ExplainPropertyFloat("Actual Total Time", "s", total_ms, + 3, es); +} +ExplainPropertyFloat("Actual Rows", NULL, rows, 0, es); +ExplainPropertyFloat("Actual Loops", NULL, nloops, 0, es); } - ExplainPropertyFloat("Actual Rows", NULL, rows, 0, es); - ExplainPropertyFloat("Actual Loops", NULL, nloops, 0, es); } } else if (es->analyze) @@ -1599,6 +1634,7 @@ ExplainNode
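For readers following the min/max discussion without the instrument.c hunk (not shown above), one way the per-loop extrema could be maintained is sketched below. This only illustrates the idea, reusing the field names visible in the explain.c hunk; it is not necessarily what the attached patch does:

void
InstrEndLoop(Instrumentation *instr)
{
    double      totaltime;

    /* Skip if nothing has happened, or already shut down */
    if (!instr->running)
        return;

    totaltime = INSTR_TIME_GET_DOUBLE(instr->counter);

    /* track the fastest/slowest iteration, including startup time */
    if (instr->nloops == 0 || totaltime < instr->min_t)
        instr->min_t = totaltime;
    if (totaltime > instr->max_t)
        instr->max_t = totaltime;

    /* and the smallest/largest row count produced by one iteration */
    if (instr->nloops == 0 || instr->tuplecount < instr->min_tuples)
        instr->min_tuples = instr->tuplecount;
    if (instr->tuplecount > instr->max_tuples)
        instr->max_tuples = instr->tuplecount;

    /* existing aggregation, unchanged */
    instr->startup += instr->firsttuple;
    instr->total += totaltime;
    instr->ntuples += instr->tuplecount;
    instr->nloops += 1;

    /* Reset for next cycle (if any) */
    instr->running = false;
    INSTR_TIME_SET_ZERO(instr->starttime);
    INSTR_TIME_SET_ZERO(instr->counter);
    instr->firsttuple = 0;
    instr->tuplecount = 0;
}

Keeping the extrema based on totaltime (rather than totaltime - instr->firsttuple) matches the review comment above about not subtracting the startup time.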
Re: [HACKERS] logical decoding of two-phase transactions
On Fri, Oct 23, 2020 at 3:41 PM Ajin Cherian wrote: > > > Amit, > I have also modified the stream callback APIs to not include > stream_commit_prepared and stream_rollback_prepared, instead use the > non-stream APIs for the same functionality. > I have also updated the test_decoding and pgoutput plugins accordingly. > Thanks, I think you forgot to address one of my comments in the previous email[1] (See "One minor comment .."). You have not even responded to it. [1] - https://www.postgresql.org/message-id/CAA4eK1JzRvUX2XLEKo2f74Vjecnt6wq-kkk1OiyMJ5XjJN%2BGvQ%40mail.gmail.com -- With Regards, Amit Kapila.
Re: [PATCH] Add section headings to index types doc
On 21.10.20 23:12, David G. Johnston wrote: On Wed, Sep 30, 2020 at 4:25 AM Dagfinn Ilmari Mannsåker <ilm...@ilmari.org> wrote: Michael Paquier <mich...@paquier.xyz> writes: > On Mon, Aug 10, 2020 at 12:52:17PM +, Jürgen Purtz wrote: >> The new status of this patch is: Waiting on Author > > This has not been answered yet, so I have marked the patch as returned > with feedback. Updated patch attached, which reformats the operator lists as requested by Jürgen, A couple of things: One, I would place the equality operator for hash inside a standalone synopsis just like all of the others. ok Two, why is hash special in having its create index command provided here? (I notice this isn't the fault of this patch but it stands out when reviewing it) yes, this looks odd. I would suggest rewording hash to look more like the others ok and add a link to the "CREATE INDEX" command from the chapter preamble. is the link necessary? and skips the reindentation as suggested by Tom. Agreed David J. -- J. Purtz
Re: speed up unicode decomposition and recomposition
On Thu, Oct 22, 2020 at 10:11 PM Michael Paquier wrote: > On Thu, Oct 22, 2020 at 05:50:52AM -0400, John Naylor wrote: > > Looks good to me. > > Thanks. Committed, then. Great work! > Thank you! -- John Naylor EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Re: partition routing layering in nodeModifyTable.c
On Fri, Oct 23, 2020 at 4:04 PM Heikki Linnakangas wrote: > On 22/10/2020 16:49, Amit Langote wrote: > > On Tue, Oct 20, 2020 at 9:57 PM Amit Langote > > wrote: > >> On Mon, Oct 19, 2020 at 8:55 PM Heikki Linnakangas wrote: > >>> It's probably true that there's no performance gain from initializing > >>> them more lazily. But the reasoning and logic around the initialization > >>> is complicated. After tracing through various path through the code, I'm > >>> convinced enough that it's correct, or at least these patches didn't > >>> break it, but I still think some sort of lazy initialization on first > >>> use would make it more readable. Or perhaps there's some other > >>> refactoring we could do. > >> > >> So the other patch I have mentioned is about lazy initialization of > >> the ResultRelInfo itself, not the individual fields, but maybe with > >> enough refactoring we can get the latter too. > > > > So, I tried implementing a lazy-initialization-on-first-access > > approach for both the ResultRelInfos themselves and some of the > > individual fields of ResultRelInfo that don't need to be set right > > away. You can see the end result in the attached 0003 patch. This > > slims down ExecInitModifyTable() significantly, both in terms of code > > footprint and the amount of work that it does. > > Have you done any performance testing? I'd like to know how much of a > difference this makes in practice. I have shown some numbers here: https://www.postgresql.org/message-id/ca+hiwqg7zrubmmih3wpsbz4s0h2ehywrnxeducky5hr3fwz...@mail.gmail.com To reiterate, if you apply the following patch: > Does this patch become moot if we do the "Overhaul UPDATE/DELETE > processing"? > (https://www.postgresql.org/message-id/CA%2BHiwqHpHdqdDn48yCEhynnniahH78rwcrv1rEX65-fsZGBOLQ%40mail.gmail.com)? ...and run this benchmark with plan_cache_mode = force_generic_plan pgbench -i -s 10 --partitions={0, 10, 100, 1000} pgbench -T120 -f test.sql -M prepared test.sql: \set aid random(1, 100) update pgbench_accounts set abalance = abalance + 1 where aid = :aid; you may see roughly the following results: HEAD: 0 tps = 13045.485121 (excluding connections establishing) 10 tps = 9358.157433 (excluding connections establishing) 100 tps = 1878.274500 (excluding connections establishing) 1000 tps = 84.684695 (excluding connections establishing) Patched (overhaul update/delete processing): 0 tps = 12743.487196 (excluding connections establishing) 10 tps = 12644.240748 (excluding connections establishing) 100 tps = 4158.123345 (excluding connections establishing) 1000 tps = 391.248067 (excluding connections establishing) And if you apply the patch being discussed here, TPS shoots up a bit, especially for higher partition counts: Patched (lazy-ResultRelInfo-initialization) 0 tps = 13419.283168 (excluding connections establishing) 10 tps = 12588.016095 (excluding connections establishing) 100 tps = 8560.824225 (excluding connections establishing) 1000 tps = 1926.553901 (excluding connections establishing) To explain these numbers a bit, "overheaul update/delete processing" patch improves the performance of that benchmark by allowing the updates to use run-time pruning when executing generic plans, which they can't today. However without "lazy-ResultRelInfo-initialization" patch, ExecInitModifyTable() (or InitPlan() when I ran those benchmarks) can be seen to be spending time initializing all of those result relations, whereas only one of those will actually be used. 
As mentioned further in that email, it's really the locking of all relations by AcquireExecutorLocks() that occurs even before we enter the executor that's a much thornier bottleneck for this benchmark. But the ResultRelInfo initialization bottleneck sounded like it could get alleviated in a relatively straightforward manner. The patches that were developed for attacking the locking bottleneck would require further reflection on whether they are correct. (Note: I've just copy pasted the numbers I reported in that email. To reproduce, I'll have to rebase the "overhaul update/delete processing" patch on this one, which I haven't yet done.) > Another alternative is to continue to create the ResultRelInfos in > ExecInitModify(), but initialize the individual fields in them lazily. If you consider the above, maybe you can see how that will not really eliminate the bottleneck I'm aiming to fix here. -- Amit Langote EDB: http://www.enterprisedb.com
Re: Online checksums verification in the backend
On Fri, Oct 23, 2020 at 04:31:56PM +0800, Julien Rouhaud wrote: > I agree. However I'm assuming that this refactor is relying on the > not yet committed patch (in the nearby thread dealing with base backup > checksums check) to also refactor PageIsVerified? As all the logic > you removed was done to avoid spamming a lot of warnings when calling > the function. Yeah, it should use a refactored version, but I was as well in the mood of looking at a version based on what we have now on HEAD. Even if I am not completely clear where the patch for page verification and base backups will go, I was thinking as well of doing the refactoring introducing PageIsVerifiedExtended() first, before considering the next steps for this thread. It seems to me that the path where we generate no WARNINGs at all makes the whole experience more consistent for the user with this function. > Mmm, is it really an improvement to report warnings during this > function execution? Note also that PageIsVerified as-is won't report > a warning if a page is found as PageIsNew() but isn't actually all > zero, while still being reported as corrupted by the SRF. Yep, that joins with the point above about just having no WARNINGs at all. > Have you also considered that it's possible to execute > pg_relation_check_pages with ignore_checksum_failure = on? That's > evidently a bad idea, but doing so would report some of the data > corruption as warnings while still not reporting anything in the SRF. Yeah, I thought about that as well, but I did not see a strong argument against preventing this behavior either, even if it sounds a bit strange. We could always tune that later depending on the feedback. -- Michael signature.asc Description: PGP signature
Re: dynamic result sets support in extended query protocol
On 2020-10-20 12:24, Dave Cramer wrote: Finally, we could do it on a best-effort basis. We use binary format for registered types, until there is some invalidation event for the type, at which point we revert to default/text format until the end of a session (or until another protocol message arrives re-registering the type). Does the driver tell the server what registered types it wants in binary? Yes, the driver tells the server, "whenever you send these types, send them in binary" (all other types keep sending in text). This should work, because the result row descriptor contains the actual format type, and there is no guarantee that it's the same one that was requested. So how about that last option? I imagine a new protocol message, say, TypeFormats, that contains a number of type/format pairs. The message would typically be sent right after the first ReadyForQuery, gets no response. This seems a bit hard to control. How long do you wait for no response? In this design, you don't need a response. It could also be sent at any other time, but I expect that to be less used in practice. Binary format is used for registered types if they have binary format support functions, otherwise text continues to be used. There is no error response for types without binary support. (There should probably be an error response for registering a type that does not exist.) I'm not sure we (pgjdbc) want all types with binary support functions sent automatically. Turns out that decoding binary is sometimes slower than decoding the text and the on-wire overhead isn't significant. Timestamps/dates with timezone are also interesting as the binary output does not include the timezone. In this design, you pick the types you want. -- Peter Eisentraut http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
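To make the shape of that idea concrete: no such message exists today, and the layout, the handler name, and remember_session_type_format() below are all invented for illustration, but a server-side handler for a TypeFormats message carrying (type OID, format code) pairs might look roughly like this:

static void
HandleTypeFormats(StringInfo msg)
{
    int         npairs = pq_getmsgint(msg, 2);

    for (int i = 0; i < npairs; i++)
    {
        Oid         typid = pq_getmsgint(msg, 4);
        int16       format = pq_getmsgint(msg, 2);

        /* error only for a type that does not exist, as discussed above */
        if (!SearchSysCacheExists1(TYPEOID, ObjectIdGetDatum(typid)))
            ereport(ERROR,
                    (errcode(ERRCODE_UNDEFINED_OBJECT),
                     errmsg("type with OID %u does not exist", typid)));

        /*
         * Remember the preference; binary would only be used if the type
         * has binary output support, otherwise text is silently kept.
         */
        remember_session_type_format(typid, format);
    }

    pq_getmsgend(msg);
}

Since the result-row descriptor always carries the actual format code, a driver that sends such a message does not need any acknowledgement to stay correct.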
Re: [doc] remove reference to pg_dump pre-8.1 switch behaviour
2020年10月23日(金) 17:52 Heikki Linnakangas : > > On 06/10/2020 15:15, Ian Lawrence Barwick wrote: > > 2020年10月6日(火) 21:13 Ian Lawrence Barwick : > >> > >> Hi > >> > >> The pg_dump doc page [1], under the -t/--table option, contains a Note > >> documenting the behavioural differences introduced in PostgreSQL 8.2. > >> > >> As it's been almost exactly 14 years since that note was added [2], I > >> suggest > >> it can be removed entirely. > >> > >> [1] https://www.postgresql.org/docs/current/app-pgdump.html > >> [2] > >> https://git.postgresql.org/gitweb/?p=postgresql.git;a=blobdiff;f=doc/src/sgml/ref/pg_dump.sgml;h=9aa4baf84e74817a3c3e8359b2c4c8a847fda987;hp=deafd7c9a989c2cbce3979d94416a298609f5e84;hb=24e97528631e7e810ce61fc0f5fbcaca0c001c4c;hpb=77d2b1b625c7decd7a25ec865bced3b927de6d4b > > > > > > Oh yes, I was planning to attach an ultra-trivial patch for that too. > > Applied, thanks. > > - Heikki Thanks! Regards Ian Barwick -- EnterpriseDB: https://www.enterprisedb.com
Re: [doc] remove reference to pg_dump pre-8.1 switch behaviour
On 06/10/2020 15:15, Ian Lawrence Barwick wrote: 2020年10月6日(火) 21:13 Ian Lawrence Barwick : Hi The pg_dump doc page [1], under the -t/--table option, contains a Note documenting the behavioural differences introduced in PostgreSQL 8.2. As it's been almost exactly 14 years since that note was added [2], I suggest it can be removed entirely. [1] https://www.postgresql.org/docs/current/app-pgdump.html [2] https://git.postgresql.org/gitweb/?p=postgresql.git;a=blobdiff;f=doc/src/sgml/ref/pg_dump.sgml;h=9aa4baf84e74817a3c3e8359b2c4c8a847fda987;hp=deafd7c9a989c2cbce3979d94416a298609f5e84;hb=24e97528631e7e810ce61fc0f5fbcaca0c001c4c;hpb=77d2b1b625c7decd7a25ec865bced3b927de6d4b Oh yes, I was planning to attach an ultra-trivial patch for that too. Applied, thanks. - Heikki
Re: Enumize logical replication message actions
On Fri, Oct 23, 2020 at 5:20 PM Kyotaro Horiguchi wrote: > > At Thu, 22 Oct 2020 22:31:41 -0300, Alvaro Herrera > wrote in > > On 2020-Oct-22, Ashutosh Bapat wrote: > > > > > On Thu, 22 Oct 2020 at 14:46, Kyotaro Horiguchi > > > wrote: > > > > > > pg_send_logicalrep_msg_type() looks somewhat too-much. If we need > > > > something like that we shouldn't do this refactoring, I think. > > > > > > Enum is an integer, and we want to send byte. The function asserts that > > > the > > > enum fits a byte. If there's a way to declare byte long enums I would use > > > that. But I didn't find a way to do that. The pq_send_logicalrep_msg_type() function seemed a bit overkill to me. The comment in the LogicalRepMsgType enum will sufficiently ensure nobody is going to accidentally add any bad replication message codes. And it's not like these are going to be changed often. Why not simply downcast your enums when calling pq_sendbyte? There are only a few of them. e.g. pq_sendbyte(out, (uint8)LOGICAL_REP_MSG_STREAM_COMMIT); Kind Regards. Peter Smith Fujitsu Australia.
Re: Parallel copy
I had a brief look at this patch. Important work! A couple of first impressions: 1. The split between patches 0002-Framework-for-leader-worker-in-parallel-copy.patch and 0003-Allow-copy-from-command-to-process-data-from-file.patch is quite artificial. All the stuff introduced in the first is unused until the second patch is applied. The first patch introduces a forward declaration for ParallelCopyData(), but the function only comes in the second patch. The comments in the first patch talk about LINE_LEADER_POPULATING and LINE_LEADER_POPULATED, but the enum only comes in the second patch. I think these have to be merged into one. If you want to split it somehow, I'd suggest having a separate patch just to move CopyStateData from copy.c to copy.h. The subsequent patch would then be easier to read as you could see more easily what's being added to CopyStateData. Actually I think it would be better to have a new header file, copy_internal.h, to hold CopyStateData and the other structs, and keep copy.h as it is. 2. This desperately needs some kind of a high-level overview of how it works. What is a leader, what is a worker? Which process does each step of COPY processing, like reading from the file/socket, splitting the input into lines, handling escapes, calling input functions, and updating the heap and indexes? What data structures are used for the communication? How is the work synchronized between the processes? There are comments on those individual aspects scattered in the patch, but if you're not already familiar with it, you don't know where to start. There's some of that in the commit message, but it needs to be somewhere in the source code, maybe in a long comment at the top of copyparallel.c. 3. I'm surprised there's a separate ParallelCopyLineBoundary struct for every input line. Doesn't that incur a lot of synchronization overhead? I haven't done any testing, this is just my gut feeling, but I assumed you'd work in batches of, say, 100 or 1000 lines each. - Heikki
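To illustrate the batching idea in point 3 (purely a sketch with invented names and sizes, not a concrete proposal), the shared state could group many input lines behind a single synchronization point:

#define LINES_PER_BATCH     1000

typedef enum ParallelCopyBatchState
{
    BATCH_STATE_EMPTY,          /* free for the leader to fill */
    BATCH_STATE_POPULATED,      /* filled, waiting for a worker to claim it */
    BATCH_STATE_CONSUMED        /* processed, can be reused */
} ParallelCopyBatchState;

typedef struct ParallelCopyLineBatch
{
    pg_atomic_uint32 state;     /* one atomic per batch instead of per line */
    uint32      nlines;         /* number of lines stored in this batch */

    /* offset and length of each line within the shared data buffer */
    uint32      line_offset[LINES_PER_BATCH];
    uint32      line_size[LINES_PER_BATCH];
} ParallelCopyLineBatch;

A worker would then claim a whole batch at a time, amortizing the synchronization cost over LINES_PER_BATCH lines instead of paying it per line.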
Re: Online checksums verification in the backend
On Fri, Oct 23, 2020 at 3:28 PM Michael Paquier wrote: > > On Mon, Oct 19, 2020 at 04:52:48PM +0900, Michael Paquier wrote: > > No issues with reporting the block number and the fork type in the SRF > > at least of course as this is helpful to detect the position of the > > broken blocks. For the checksum found in the header and the one > > calculated (named expected in the patch), I am less sure how to put a > > clear definition on top of that but we could always consider that > > later and extend the SRF as needed. Once the user knows that both do > > not match, he/she knows that there is a problem. > > So, I have reviewed your patch set, and heavily reworked the logic to > be more consistent on many thinks, resulting in a largely simplified > patch without sacrificing its usefulness: Thanks! > - Removal of the dependency with checksums for this feature. While > simplifying the code, I have noticed that this feature can also be > beneficial for clusters that do not have have data checksums, as > PageIsVerified() is perfectly able to run some page header checks and > the zero-page case. That's of course less useful than having the > checksums, but there is no need to add a dependency here. The file > for the SQL functions is renamed from checksumfuncs.c to pagefuncs.c. I agree. However I'm assuming that this refactor is relying on the not yet committed patch (in the nearby thread dealing with base backup checksums check) to also refactor PageIsVerified? As all the logic you removed was done to avoid spamming a lot of warnings when calling the function. > - The function is changed to return no tuples if the relkind is not > supported, and the same applies for temporary relations. That's more > consistent with other system functions like the ones in charge of > partition information, and this makes full scans of pg_class much > easier to work with. Temporary tables were not handled correctly > anyway as these are in local buffers, but the use case of this > function in this case is not really obvious to me. Agreed > - Having the forknum in the SRF is kind of confusing, as the user > would need to map a number with the physical on-disk name. Instead, I > have made the choice to return the *path* of the corrupted file with a > block number. This way, an operator can know immediately where a > problem comes from. The patch does not append the segment number, and > I am not sure if we should actually do that, but adding it is > straight-forward as we have the block number. There is a dependency > with table AMs here as well, as this goes down to fd.c, explaining why > I have not added it and just. That's a clear improvement, thanks! > - I really don't know what kind of default ACL should apply for such > functions, but I am sure that SCAN_TABLES is not what we are looking > for here, and there is nothing preventing us from having a safe > default from the start of times, so I moved the function to be > superuser-only by default, and GRANT can be used to allow its > execution to other roles. We could relax that in the future, of > course, this can be discussed separately. I don't have a strong opinion here, SCAN_TABLES was maybe not ideal. No objections. > - The WARNING report for each block found as corrupted gains an error > context, as a result of a switch to PageIsVerified(), giving a user > all the information needed in the logs on top of the result in the > SRF. 
That's useful as well if combined with CHECK_FOR_INTERRUPTS(), > and I got to wonder if we should have some progress report for this > stuff, though that's a separate discussion. Mmm, is it really an improvement to report warnings during this function execution? Note also that PageIsVerified as-is won't report a warning if a page is found as PageIsNew() but isn't actually all zero, while still being reported as corrupted by the SRF. Have you also considered that it's possible to execute pg_relation_check_pages with ignore_checksum_failure = on? That's evidently a bad idea, but doing so would report some of the data corruption as warnings while still not reporting anything in the SRF. Having some progress report would be nice to have, but +1 to have a separate discussion for that. > - The function is renamed to something less generic, > pg_relation_check_pages(), and I have reduced the number of functions > from two to one, where the user can specify the fork name with a new > option. The default of NULL means that all the forks of a relation > are checked. Ok. > - The TAP tests are rather bulky. I have moved all the non-corruption > test cases into a new SQL test file. That's useful for people willing > to do some basic sanity checks with a non-default table AM. At least > it provides a minimum coverage. Agreed
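For readers wanting to see the interface at a glance, usage would look something like the following; the regclass argument, the text fork-name argument and the output column names are guesses based on the description above, not the patch's actual definitions:

-- check every fork of a relation (superuser by default, or a role GRANTed EXECUTE)
SELECT * FROM pg_relation_check_pages('my_table'::regclass);

-- check only the main fork
SELECT * FROM pg_relation_check_pages('my_table'::regclass, 'main');

--          path          | failed_block_num
-- ------------------------+------------------
--  base/16384/16409       |               12
-- (1 row)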
Re: warn_unused_results
On 2020-10-17 17:58, Tom Lane wrote: Peter Eisentraut writes: Forgetting to assign the return value of list APIs such as lappend() is a perennial favorite. The compiler can help point out such mistakes. GCC has an attribute warn_unused_results. Also C++ has standardized this under the name "nodiscard", and C has a proposal to do the same [0]. In my patch I call the symbol pg_nodiscard, so that perhaps in a distant future one only has to do s/pg_nodiscard/nodiscard/ or something similar. Also, the name is short enough that it doesn't mess up the formatting of function declarations too much. +1 in principle (I've not read the patch in detail); but I wonder what pgindent does with these added keywords. pgindent doesn't seem to want to change anything about the patched files as I had sent them. -- Peter Eisentraut http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
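For illustration, a minimal sketch of how such a marker could be defined and applied (the actual patch may differ in both spelling and placement; note that the underlying GCC attribute is warn_unused_result, singular):

/* warn when a caller ignores the function's return value */
#if defined(__GNUC__) || defined(__clang__)
#define pg_nodiscard __attribute__((warn_unused_result))
#else
#define pg_nodiscard
#endif

/* applied to a list API whose result must not be dropped on the floor */
extern pg_nodiscard List *lappend(List *list, void *datum);

With that in place, a call like lappend(list, datum); whose result is thrown away draws a compiler warning, catching the classic mistake mentioned above.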
Re: Bizarre coding in recovery target GUC management
On 2020-10-12 18:00, Tom Lane wrote: > Would someone explain to me why assign_recovery_target_lsn and related GUC assign hooks throw errors, rather than doing so in the associated check hooks? > An assign hook is not supposed to throw an error. Full stop, no exceptions. We wouldn't bother to separate those hooks otherwise. That code is checking whether more than one recovery target GUC has been set. I don't think the check hook sees the right state to be able to check that. -- Peter Eisentraut http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
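As background on the division of labor being discussed, a hedged sketch of the conventional pattern for a string GUC follows; all names below are invented for illustration and are not the actual recovery-target code. Validation and error reporting live in the check hook, while the assign hook only records state and must never error out.

/*
 * Illustrative sketch only; check_/assign_my_target and value_is_parsable()
 * are invented names.  The signatures match the string-GUC hook types
 * declared in utils/guc.h.
 */
static bool my_target_is_set = false;

static bool
check_my_target(char **newval, void **extra, GucSource source)
{
    /* all validation and error reporting happens here */
    if (*newval != NULL && !value_is_parsable(*newval)) /* hypothetical check */
    {
        GUC_check_errdetail("\"%s\" cannot be parsed.", *newval);
        return false;           /* guc.c turns this into the user-visible error */
    }
    return true;
}

static void
assign_my_target(const char *newval, void *extra)
{
    /* side-effect-free bookkeeping only; never ereport(ERROR) here */
    my_target_is_set = (newval != NULL && newval[0] != '\0');
}

Cross-GUC consistency checks such as "only one recovery target may be set" are the awkward part: a check hook only sees its own tentative value, which is the difficulty pointed out above.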
Re: "unix_socket_directories" should be GUC_LIST_INPUT?
On Fri, Oct 23, 2020 at 12:49:57AM -0400, Tom Lane wrote: > Although ... just to argue against myself for a moment, how likely > is it that pg_dump is going to be faced with the need to dump a > value for unix_socket_directories? I am trying to think about some scenarios here, but honestly I cannot.. > So maybe we could get away with just changing it. It'd be good to > verify though that this doesn't break existing string values for > the variable, assuming they contain no unlikely characters that'd > need quoting. Yeah, that's the kind of thing I wanted to check anyway before considering doing the switch. -- Michael signature.asc Description: PGP signature
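The mechanical side of such a switch would be small: roughly, the flag is added to the variable's entry in guc.c. The following is an abbreviated sketch of what that entry could look like, not a tested diff, and the surrounding field values are approximate.

/*
 * Abbreviated sketch of a string-GUC table entry in
 * src/backend/utils/misc/guc.c with GUC_LIST_INPUT added; layout and
 * neighboring flags are approximate.
 */
{
    {"unix_socket_directories", PGC_POSTMASTER, CONN_AUTH_SETTINGS,
        gettext_noop("Sets the directories where Unix-domain sockets will be created."),
        NULL,
        GUC_LIST_INPUT | GUC_SUPERUSER_ONLY /* GUC_LIST_INPUT is the new part */
    },
    &Unix_socket_directories,
    DEFAULT_PGSOCKET_DIR,
    NULL, NULL, NULL
},

The open question in the thread is not the flag itself but whether existing single-path values keep round-tripping unchanged through pg_dump and ALTER SYSTEM once the value is treated as a list.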
Re: Implementing Incremental View Maintenance
Hi Adam, On Thu, 22 Oct 2020 10:07:29 -0400 Adam Brusselback wrote: > Hey there Yugo, > I've asked a coworker to prepare a self-contained example that encapsulates > our multiple use cases. Thank you very much! > The immediate/eager approach is exactly what we need, as within the same > transaction we have statements that can cause one of those "materialized > tables" to be updated, and then sometimes have the need to query that > "materialized table" in a subsequent statement and need to see the changes > reflected. The proposed patch provides exactly this feature, and I think it will meet your needs. > As soon as my coworker gets that example built up I'll send a followup with > it attached. Great! We are looking forward to it. Regards, Yugo Nagata -- Yugo NAGATA
Re: Hash support for row types
On 2020-10-20 17:10, Peter Eisentraut wrote: On 2020-10-20 01:32, Andres Freund wrote: How does this deal with row types with a field that doesn't have a hash function? Erroring out at runtime could cause queries that used to succeed, e.g. because all fields have btree ops, to fail, if we just have a generic unconditionally present hash opclass? Is that an OK "regression"? Good point. There is actually code in the type cache that is supposed to handle that, so I'll need to adjust that. Here is an updated patch with the type cache integration added. To your point, this now checks each fields hashability before considering a row type as hashable. It can still have run-time errors for untyped record datums, but that's not something we can do anything about. -- Peter Eisentraut http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services From eda6f7182cfe8ac7317d6874317ace24c7c7d5f6 Mon Sep 17 00:00:00 2001 From: Peter Eisentraut Date: Fri, 23 Oct 2020 09:33:41 +0200 Subject: [PATCH v2] Hash support for row types Add hash functions for the record type as well as a hash operator family and operator class for the record type. This enables all the hash functionality for the record type such as UNION DISTINCT, hash joins, and hash partitioning. Discussion: https://www.postgresql.org/message-id/flat/38eccd35-4e2d-6767-1b3c-dada1eac3124%402ndquadrant.com --- doc/src/sgml/queries.sgml | 9 - src/backend/utils/adt/rowtypes.c| 249 src/backend/utils/cache/lsyscache.c | 7 +- src/backend/utils/cache/typcache.c | 78 +--- src/include/catalog/pg_amop.dat | 5 + src/include/catalog/pg_amproc.dat | 4 + src/include/catalog/pg_opclass.dat | 2 + src/include/catalog/pg_operator.dat | 2 +- src/include/catalog/pg_opfamily.dat | 2 + src/include/catalog/pg_proc.dat | 7 + src/test/regress/expected/hash_func.out | 12 ++ src/test/regress/expected/join.out | 1 + src/test/regress/expected/with.out | 38 src/test/regress/sql/hash_func.sql | 9 + src/test/regress/sql/join.sql | 1 + src/test/regress/sql/with.sql | 10 + 16 files changed, 402 insertions(+), 34 deletions(-) diff --git a/doc/src/sgml/queries.sgml b/doc/src/sgml/queries.sgml index f06afe2c3f..8e70e57b72 100644 --- a/doc/src/sgml/queries.sgml +++ b/doc/src/sgml/queries.sgml @@ -2182,15 +2182,6 @@ Search Order - - - The queries shown in this and the following section involving - ROW constructors in the target list only support - UNION ALL (not plain UNION) in the - current implementation. 
- - - Omit the ROW() syntax in the common case where only one diff --git a/src/backend/utils/adt/rowtypes.c b/src/backend/utils/adt/rowtypes.c index 674cf0a55d..28bbce128f 100644 --- a/src/backend/utils/adt/rowtypes.c +++ b/src/backend/utils/adt/rowtypes.c @@ -19,6 +19,7 @@ #include "access/detoast.h" #include "access/htup_details.h" #include "catalog/pg_type.h" +#include "common/hashfn.h" #include "funcapi.h" #include "libpq/pqformat.h" #include "miscadmin.h" @@ -1766,3 +1767,251 @@ btrecordimagecmp(PG_FUNCTION_ARGS) { PG_RETURN_INT32(record_image_cmp(fcinfo)); } + + +/* + * Row type hash functions + */ + +Datum +hash_record(PG_FUNCTION_ARGS) +{ + HeapTupleHeader record = PG_GETARG_HEAPTUPLEHEADER(0); + uint32 result = 0; + Oid tupType; + int32 tupTypmod; + TupleDesc tupdesc; + HeapTupleData tuple; + int ncolumns; + RecordCompareData *my_extra; + Datum *values; + bool *nulls; + + check_stack_depth();/* recurses for record-type columns */ + + /* Extract type info from tuple */ + tupType = HeapTupleHeaderGetTypeId(record); + tupTypmod = HeapTupleHeaderGetTypMod(record); + tupdesc = lookup_rowtype_tupdesc(tupType, tupTypmod); + ncolumns = tupdesc->natts; + + /* Build temporary HeapTuple control structure */ + tuple.t_len = HeapTupleHeaderGetDatumLength(record); + ItemPointerSetInvalid(&(tuple.t_self)); + tuple.t_tableOid = InvalidOid; + tuple.t_data = record; + + /* +* We arrange to look up the needed hashing info just once per series +* of calls, assuming the record type doesn't change underneath us. +*/ + my_extra = (RecordCompareData *) fcinfo->flinfo->fn_extra; + if (my_extra == NULL || + my_extra->ncolumns < ncolumns) + { + fcinfo->flinfo->fn_extra = + MemoryContextAlloc(fcinfo->flinfo->fn_mcxt, + offsetof(RecordCompareData, columns) + + ncolumns * sizeof(ColumnCo
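As a rough illustration of the general technique in the patch above (not its exact code), hashing a row type amounts to hashing each field with its type's own hash function and folding the per-field hashes together; field_hash() below is a hypothetical stand-in for that per-type lookup.

/*
 * Illustrative only.  values[]/nulls[] hold the deconstructed row;
 * field_hash() is a hypothetical helper that applies the i-th
 * attribute's type-specific hash function.
 */
static uint32
row_hash_sketch(Datum *values, bool *nulls, int ncolumns)
{
    uint32      result = 0;

    for (int i = 0; i < ncolumns; i++)
    {
        uint32      element_hash;

        /* a NULL field contributes a fixed value */
        element_hash = nulls[i] ? 0 : field_hash(values[i], i);

        /* multiply-and-add style mixing of the running hash */
        result = (result << 5) - result + element_hash;
    }
    return result;
}

The interesting part of the patch is less this mixing loop than the typcache work mentioned above: a row type is only advertised as hashable if every one of its fields has a hash opclass, so that the hash-based plans this enables (hash joins, UNION DISTINCT, hash partitioning) do not fail at execution time.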
Re: Tab complete for alter table rls
On Fri, Oct 23, 2020 at 05:22:57AM +, Li Japin wrote: > Sorry, I forgot to add the subject. No worries. Good catch. I'll try to test that and apply it later, but by reading the code it looks like you got that right. -- Michael signature.asc Description: PGP signature
Re: Online checksums verification in the backend
On Mon, Oct 19, 2020 at 04:52:48PM +0900, Michael Paquier wrote: > No issues with reporting the block number and the fork type in the SRF > at least of course as this is helpful to detect the position of the > broken blocks. For the checksum found in the header and the one > calculated (named expected in the patch), I am less sure how to put a > clear definition on top of that but we could always consider that > later and extend the SRF as needed. Once the user knows that both do > not match, he/she knows that there is a problem. So, I have reviewed your patch set, and heavily reworked the logic to be more consistent on many things, resulting in a largely simplified patch without sacrificing its usefulness: - The logic of the core routine of bufmgr.c is unchanged. I have simplified it a bit though by merging the subroutines that were part of the patch. SMgrRelation is used as argument instead of a Relation. That's more consistent with the surroundings. The initial read of a page without locks is still on the table as an extra optimization, though I am not completely sure if this should be part of CheckBuffer() or not. I also thought about the previous benchmarks and I think that not using the most-optimized approach improved performance, because it reduced the successive runs of the SQL functions, reducing the back-pressure on the partition locks (we held one of them at a time for a longer period, meaning that the other 127 ran free for a longer time). Please note that this part still referred to a "module", which was incorrect. - Removal of the expected and found checksums from the SRF of the function. Based on my recent business with the page checks for base backups, I have arrived at the conclusion that the SRF should return data that we can absolutely trust, and the minimum I think we have to trust here is whether a given page is thought to be safe or not, considering all the sanity checks done by PageIsVerified() as the main entry point for everything. This has led to a bit of confusion with the addition of NoComputedChecksum for a page that was empty as of the initial version of the patch, so it happens that we don't need it anymore. - Removal of the dependency with checksums for this feature. While simplifying the code, I have noticed that this feature can also be beneficial for clusters that do not have data checksums, as PageIsVerified() is perfectly able to run some page header checks and handle the zero-page case. That's of course less useful than having the checksums, but there is no need to add a dependency here. The file for the SQL functions is renamed from checksumfuncs.c to pagefuncs.c. - The function is changed to return no tuples if the relkind is not supported, and the same applies for temporary relations. That's more consistent with other system functions like the ones in charge of partition information, and this makes full scans of pg_class much easier to work with. Temporary tables were not handled correctly anyway as these are in local buffers, but the use case of this function in this case is not really obvious to me. - Having the forknum in the SRF is kind of confusing, as the user would need to map a number with the physical on-disk name. Instead, I have made the choice to return the *path* of the corrupted file with a block number. This way, an operator can know immediately where a problem comes from. The patch does not append the segment number, and I am not sure if we should actually do that, but adding it is straightforward as we have the block number.
There is a dependency with table AMs here as well, as this goes down to fd.c, explaining why I have not added it. - I really don't know what kind of default ACL should apply for such functions, but I am sure that SCAN_TABLES is not what we are looking for here, and there is nothing preventing us from having a safe default from the start of times, so I moved the function to be superuser-only by default, and GRANT can be used to allow its execution to other roles. We could relax that in the future, of course, this can be discussed separately. - The WARNING report for each block found as corrupted gains an error context, as a result of a switch to PageIsVerified(), giving a user all the information needed in the logs on top of the result in the SRF. That's useful as well if combined with CHECK_FOR_INTERRUPTS(), and I got to wonder if we should have some progress report for this stuff, though that's a separate discussion. - The function is renamed to something less generic, pg_relation_check_pages(), and I have reduced the number of functions from two to one, where the user can specify the fork name with a new option. The default of NULL means that all the forks of a relation are checked. - The TAP tests are rather bulky. I have moved all the non-corruption test cases into a new SQL test file. That's useful for people willing to do some basic sanity checks with a non-default table AM. At least it provides a minimum coverage.
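To make the overall shape of the feature easier to picture, here is a greatly simplified sketch of the per-block loop described above. It is not the patch itself: the buffer-pool interaction, locking, error context, and SRF plumbing are all omitted, and the exact PageIsVerified() interface is being reworked in a nearby thread.

/*
 * Simplified sketch only: walk every block of one fork through smgr
 * and run the standard page sanity checks on it, reporting broken
 * blocks with a WARNING.
 */
static void
check_fork_sketch(SMgrRelation smgr, ForkNumber forknum)
{
    BlockNumber nblocks = smgrnblocks(smgr, forknum);
    PGAlignedBlock buf;

    for (BlockNumber blkno = 0; blkno < nblocks; blkno++)
    {
        CHECK_FOR_INTERRUPTS();

        smgrread(smgr, forknum, blkno, buf.data);
        if (!PageIsVerified((Page) buf.data, blkno))
            ereport(WARNING,
                    (errmsg("invalid page in block %u of relation %s",
                            blkno,
                            relpathbackend(smgr->smgr_rnode.node,
                                           smgr->smgr_rnode.backend,
                                           forknum))));
    }
}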
Re: partition routing layering in nodeModifyTable.c
On 22/10/2020 16:49, Amit Langote wrote: > On Tue, Oct 20, 2020 at 9:57 PM Amit Langote wrote: >> On Mon, Oct 19, 2020 at 8:55 PM Heikki Linnakangas wrote: >>> It's probably true that there's no performance gain from initializing them more lazily. But the reasoning and logic around the initialization is complicated. After tracing through various paths through the code, I'm convinced enough that it's correct, or at least these patches didn't break it, but I still think some sort of lazy initialization on first use would make it more readable. Or perhaps there's some other refactoring we could do. >> So the other patch I have mentioned is about lazy initialization of the ResultRelInfo itself, not the individual fields, but maybe with enough refactoring we can get the latter too. > So, I tried implementing a lazy-initialization-on-first-access approach for both the ResultRelInfos themselves and some of the individual fields of ResultRelInfo that don't need to be set right away. You can see the end result in the attached 0003 patch. This slims down ExecInitModifyTable() significantly, both in terms of code footprint and the amount of work that it does. Have you done any performance testing? I'd like to know how much of a difference this makes in practice. Another alternative is to continue to create the ResultRelInfos in ExecInitModifyTable(), but initialize the individual fields in them lazily. Does this patch become moot if we do the "Overhaul UPDATE/DELETE processing"? (https://www.postgresql.org/message-id/CA%2BHiwqHpHdqdDn48yCEhynnniahH78rwcrv1rEX65-fsZGBOLQ%40mail.gmail.com) - Heikki
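For readers skimming the thread, the "lazy initialization on first access" idea being weighed here is essentially the following pattern. This is a hypothetical sketch with invented names, not code from the 0003 patch, and the "already initialized" test on ri_RelationDesc is illustrative only.

/*
 * Hypothetical sketch of initialize-on-first-use: callers go through a
 * getter that builds the ResultRelInfo the first time it is needed.
 */
static ResultRelInfo *
GetNthResultRelInfo(ModifyTableState *mtstate, int whichrel)
{
    ResultRelInfo *rri = &mtstate->resultRelInfo[whichrel];

    if (rri->ri_RelationDesc == NULL)   /* illustrative "not yet set up" test */
        InitResultRelLazily(mtstate, rri, whichrel);    /* hypothetical helper */

    return rri;
}

The readability gain is that ExecInitModifyTable() no longer has to anticipate every field a later executor step might need; the trade-off, as the review notes above suggest, is proving that every access path really does go through the getter.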