Re: builtin functions, parameter names and psql's \df

2020-09-02 Thread Oleksandr Shulgin
On Wed, Sep 2, 2020 at 7:35 AM Andres Freund  wrote:

> Hi,
>
> on a regular basis I remember a builtin function's name, or can figure it
> out
> using \df etc, but can't remember the argument order. A typical example is
> regexp_*, where I never remember whether the pattern or the input string
> comes
> first.
>
> Unfortunately, \df does not really help with that:
>
> =# \df regexp_split_to_table
>
> ┌────────────┬───────────────────────┬──────────────────┬─────────────────────┬──────┐
> │   Schema   │         Name          │ Result data type │ Argument data types │ Type │
> ├────────────┼───────────────────────┼──────────────────┼─────────────────────┼──────┤
> │ pg_catalog │ regexp_split_to_table │ SETOF text       │ text, text          │ func │
> │ pg_catalog │ regexp_split_to_table │ SETOF text       │ text, text, text    │ func │
> └────────────┴───────────────────────┴──────────────────┴─────────────────────┴──────┘
>
> If the parameters were named however, it'd be clear:
>
> =# CREATE OR REPLACE FUNCTION pg_catalog.regexp_split_to_table(string
> text, pattern text)
>  RETURNS SETOF text
>  LANGUAGE internal
>  IMMUTABLE PARALLEL SAFE STRICT
> AS $function$regexp_split_to_table_no_flags$function$
>
> =# \df regexp_split_to_table
>
> ┌────────────┬───────────────────────┬──────────────────┬───────────────────────────┬──────┐
> │   Schema   │         Name          │ Result data type │    Argument data types    │ Type │
> ├────────────┼───────────────────────┼──────────────────┼───────────────────────────┼──────┤
> │ pg_catalog │ regexp_split_to_table │ SETOF text       │ string text, pattern text │ func │
> │ pg_catalog │ regexp_split_to_table │ SETOF text       │ text, text, text          │ func │
> └────────────┴───────────────────────┴──────────────────┴───────────────────────────┴──────┘
>
> (I intentionally left the three parameter version unchanged, to show the
> difference)
>
>
> In the docs we already name the parameters using SQL like syntax, see [1].
> How
> about we actually do so for at least the more common / complicated
> functions?
>

+many

I find myself in the same situation a lot.
I've never realized that's an implementation detail and not something
fundamental preventing the parameters from being named in the built-in
functions.

--
Alex


Re: builtin functions, parameter names and psql's \df

2020-09-02 Thread Julien Rouhaud
On Wed, Sep 2, 2020 at 9:13 AM Oleksandr Shulgin
 wrote:
>
> On Wed, Sep 2, 2020 at 7:35 AM Andres Freund  wrote:
>>
>> Hi,
>>
>> on a regular basis I remember a builtin function's name, or can figure it out
>> using \df etc, but can't remember the argument order. A typical example is
>> regexp_*, where I never remember whether the pattern or the input string 
>> comes
>> first.
>>
>> Unfortunately, \df does not really help with that:
>>
>> =# \df regexp_split_to_table
>> ┌────────────┬───────────────────────┬──────────────────┬─────────────────────┬──────┐
>> │   Schema   │         Name          │ Result data type │ Argument data types │ Type │
>> ├────────────┼───────────────────────┼──────────────────┼─────────────────────┼──────┤
>> │ pg_catalog │ regexp_split_to_table │ SETOF text       │ text, text          │ func │
>> │ pg_catalog │ regexp_split_to_table │ SETOF text       │ text, text, text    │ func │
>> └────────────┴───────────────────────┴──────────────────┴─────────────────────┴──────┘
>>
>> If the parameters were named however, it'd be clear:
>>
>> =# CREATE OR REPLACE FUNCTION pg_catalog.regexp_split_to_table(string text, 
>> pattern text)
>>  RETURNS SETOF text
>>  LANGUAGE internal
>>  IMMUTABLE PARALLEL SAFE STRICT
>> AS $function$regexp_split_to_table_no_flags$function$
>>
>> =# \df regexp_split_to_table
>> ┌────────────┬───────────────────────┬──────────────────┬───────────────────────────┬──────┐
>> │   Schema   │         Name          │ Result data type │    Argument data types    │ Type │
>> ├────────────┼───────────────────────┼──────────────────┼───────────────────────────┼──────┤
>> │ pg_catalog │ regexp_split_to_table │ SETOF text       │ string text, pattern text │ func │
>> │ pg_catalog │ regexp_split_to_table │ SETOF text       │ text, text, text          │ func │
>> └────────────┴───────────────────────┴──────────────────┴───────────────────────────┴──────┘
>>
>> (I intentionally left the three parameter version unchanged, to show the 
>> difference)
>>
>>
>> In the docs we already name the parameters using SQL like syntax, see [1]. 
>> How
>> about we actually do so for at least the more common / complicated functions?
>
>
> +many
>
> I find myself in the same situation a lot.
> I've never realized that's an implementation detail and not something 
> fundamental preventing the parameters from being named in the built-in 
> functions.

Same here, it would be a very nice improvement.


Re: builtin functions, parameter names and psql's \df

2020-09-02 Thread Gavin Flower

On 02/09/2020 19:15, Julien Rouhaud wrote:

On Wed, Sep 2, 2020 at 9:13 AM Oleksandr Shulgin
 wrote:

On Wed, Sep 2, 2020 at 7:35 AM Andres Freund  wrote:

Hi,

on a regular basis I remember a builtin function's name, or can figure it out
using \df etc, but can't remember the argument order. A typical example is
regexp_*, where I never remember whether the pattern or the input string comes
first.

Unfortunately, \df does not really help with that:

=# \df regexp_split_to_table
┌────────────┬───────────────────────┬──────────────────┬─────────────────────┬──────┐
│   Schema   │         Name          │ Result data type │ Argument data types │ Type │
├────────────┼───────────────────────┼──────────────────┼─────────────────────┼──────┤
│ pg_catalog │ regexp_split_to_table │ SETOF text       │ text, text          │ func │
│ pg_catalog │ regexp_split_to_table │ SETOF text       │ text, text, text    │ func │
└────────────┴───────────────────────┴──────────────────┴─────────────────────┴──────┘

If the parameters were named however, it'd be clear:

=# CREATE OR REPLACE FUNCTION pg_catalog.regexp_split_to_table(string text, 
pattern text)
  RETURNS SETOF text
  LANGUAGE internal
  IMMUTABLE PARALLEL SAFE STRICT
AS $function$regexp_split_to_table_no_flags$function$

=# \df regexp_split_to_table
┌────────────┬───────────────────────┬──────────────────┬───────────────────────────┬──────┐
│   Schema   │         Name          │ Result data type │    Argument data types    │ Type │
├────────────┼───────────────────────┼──────────────────┼───────────────────────────┼──────┤
│ pg_catalog │ regexp_split_to_table │ SETOF text       │ string text, pattern text │ func │
│ pg_catalog │ regexp_split_to_table │ SETOF text       │ text, text, text          │ func │
└────────────┴───────────────────────┴──────────────────┴───────────────────────────┴──────┘

(I intentionally left the three parameter version unchanged, to show the 
difference)


In the docs we already name the parameters using SQL like syntax, see [1]. How
about we actually do so for at least the more common / complicated functions?


+many

I find myself in the same situation a lot.
I've never realized that's an implementation detail and not something 
fundamental preventing the parameters from being named in the built-in 
functions.

Same here, it would be a very nice improvement.


+1





Re: Kerberos support broken on MSVC builds for Windows x64?

2020-09-02 Thread Dave Page
Hi

On Tue, Sep 1, 2020 at 5:29 PM Stephen Frost  wrote:

> Greetings,
>
> * Dave Page (dp...@pgadmin.org) wrote:
> > Attached is a patch against 12.4 for the build system in case anyone
> wants
> > to play (I'll do it properly against the head branch later). I'm guessing
> > this will work for < 12, as with 12 I'm now getting the following which
> > looks like it's related to GSS encryption:
> >
> > "C:\Users\dpage\Downloads\postgresql-12.4\pgsql.sln" (default target)
> (1) ->
> > "C:\Users\dpage\Downloads\postgresql-12.4\pgcrypto.vcxproj" (default
> > target) (2) ->
> > "C:\Users\dpage\Downloads\postgresql-12.4\postgres.vcxproj" (default
> > target) (3) ->
> > (Link target) ->
> >   be-secure-gssapi.obj : error LNK2019: unresolved external symbol setenv
> > referenced in function secure_open_gssapi
> > [C:\Users\dpage\Downloads\postgresql-12.4\postgres.vcxproj]
> >   .\Release\postgres\postgres.exe : fatal error LNK1120: 1 unresolved
> > externals [C:\Users\dpage\Downloads\postgresql-12.4\postgres.vcxproj]
> >
> > I'll dig into that some more.
>
> Yes, that'd be in the GSSENC code, which I hadn't been expecting to be
> used under Windows.  If you're successful, I don't have any issue
> helping to make that work, though I'm curious if you're trying to build
> with MIT KfW (which is rather ancient these days, being based on krb5
> 1.13 and not updated since..) or with a more current release...?
>

I'm currently using the KFW 4.1 build from MIT. I've tried building it
myself but it requires a very old toolchain (which defeated the point of
what I was trying to do at the time).

I haven't yet looked to see if the source for krb5-1.8.2 will build or even
has the right bits in it for Windows - as I'm sure you know MIT seem to
maintain an entirely different version for Windows for which I assume
there's a reason.


>
> Of course, it'd be good to get a buildfarm animal in place that's
> actually testing this if we're going to make it work.
>

Fixing the config on hamerkop should deal with that I think. Though I am
confused as to why the Buildfarm UI thinks it has Kerberos support enabled
- did we change the config parameter from krb5 to gss some time prior to
9.5? If so, that could explain it.


>
> Regarding the setenv() call, should be able to use pgwin32_putenv() in
> place on Windows, I'd think..?
>

Right, I imagine so. It's on my todo...
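For illustration only, here is a rough, untested sketch of what such a shim
could look like, layered on the existing pgwin32_putenv(); the function name
and where it would live are my assumptions, not part of any patch:

/*
 * Hypothetical setenv() replacement for the MSVC build, built on
 * pgwin32_putenv().  Leaking envstr for the handful of calls involved
 * is harmless in this sketch.
 */
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

extern int	pgwin32_putenv(const char *envval);

int
pgwin32_setenv(const char *name, const char *value, int overwrite)
{
	char	   *envstr;

	/* Keep POSIX semantics: don't clobber an existing value unless asked. */
	if (!overwrite && getenv(name) != NULL)
		return 0;

	/* putenv-style interfaces take a single "NAME=VALUE" string. */
	envstr = malloc(strlen(name) + strlen(value) + 2);
	if (envstr == NULL)
		return -1;
	sprintf(envstr, "%s=%s", name, value);

	return pgwin32_putenv(envstr);
}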


-- 
Dave Page
Blog: http://pgsnake.blogspot.com
Twitter: @pgsnake

EDB: http://www.enterprisedb.com


Re: Include access method in listTables output

2020-09-02 Thread Michael Paquier
On Tue, Sep 01, 2020 at 10:27:31AM +, Georgios wrote:
> On Tuesday, 1 September 2020 07:41, Michael Paquier  
> wrote:
>> Adding \dE as there are no foreign tables does not make much sense,
>> and also I wondered why \dt+ was not added.
>>
>> Does the attached look correct to you?
> 
> You have my :+1:

OK, applied then.
--
Michael




Re: ALTER TABLE .. DETACH PARTITION CONCURRENTLY

2020-09-02 Thread Hao Wu
Not related to DETACH PARTITION, but I found a bug in ATTACH PARTITION.
Here are the steps to reproduce the issue:


create table tpart(i int, j int) partition by range(i);
create table tpart_1(like tpart);
create table tpart_2(like tpart);
create table tpart_default(like tpart);
alter table tpart attach partition tpart_1 for values from(0) to (100);
alter table tpart attach partition tpart_default default;
insert into tpart_2 values(110,110),(120,120),(150,150);

1: begin;
1: alter table tpart attach partition tpart_2 for values from(100) to (200);
2: begin;
-- insert will be blocked by ALTER TABLE ATTACH PARTITION
2: insert into tpart values (110,110),(120,120),(150,150);
1: end;
2: select * from tpart_default;
2: end;

After that the partition tpart_default contains (110,110),(120,120),(150,150)
inserted in session 2, which obviously violates the partition constraint.

Regards,
Hao Wu


Re: More tests with USING INDEX replident and dropped indexes

2020-09-02 Thread Rahila Syed
>
>
>
> On the other hand, it looks appealing to make index_set_state_flags()
> transactional.  This would solve the consistency problem, and looking
> at the code scanning pg_index, I don't see a reason why we could not
> do that.  What do you think?
>

TBH, I am not sure.  I think it is a reasonable change. It is even
indicated in the
comment above index_set_state_flags() that it can be made transactional.
At the same time, we can probably just go ahead with the current
inconsistent update of the relisreplident and indisvalid flags; I can't see
what will break with that.

Thank you,
-- 
Rahila Syed
Performance Engineer
2ndQuadrant
http://www.2ndQuadrant.com 
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


Re: clarify "rewritten" in pg_checksums docs

2020-09-02 Thread Michael Paquier
On Tue, Sep 01, 2020 at 03:44:06PM +0200, Daniel Gustafsson wrote:
> Well, I was thinking less technically accurate and more descriptive for end
> users, hiding the implementation details.  "Rewrite" sounds to me more like
> changing data rather than amending pages with a checksum keeping data intact.
> Either way, adding "in-place" is an improvement IMO.

Using "rewritten" still sounds more suitable to me, as we still write the
thing in chunks of size BLCKSZ.  No objections to adding
"in-place" to that sentence.  Any other opinions?
--
Michael




Re: factorial function/phase out postfix operators?

2020-09-02 Thread John Naylor
On Tue, Sep 1, 2020 at 10:00 PM Mark Dilger
 wrote:
>
> Some changes were made on another thread [1] for the deprecation notices, 
> committed recently by Tom, and I think this patch set is compatible with what 
> was done there.  This patch set is intended for commit against master, 
> targeted for PostgreSQL 14, so the deprecation notices are removed along with 
> the things that were deprecated.  The references to right-unary operators 
> that you call out, above, have been removed.

Hi Mark,

Looks good. Just a couple things I found in 0001:

The factorial operators should now be removed from func.sgml.

For pg_dump, should we issue a pg_log_warning() (or stronger)
somewhere if user-defined postfix operators are found? I'm looking at
the example of "WITH OIDS" in pg_dump.c.
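Something along these lines is what I have in mind — a rough, untested
sketch only, assuming dumpOpr() still has the OprInfo with its oprkind
field at that point:

	/* Hypothetical sketch: warn when dumping a user-defined postfix operator */
	if (oprinfo->oprkind == 'r')	/* pg_operator.oprkind 'r' = right unary */
		pg_log_warning("operator \"%s\" is a postfix operator and cannot be restored on servers without postfix-operator support",
					   oprinfo->dobj.name);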

Nitpick: these can be removed, since we already test factorial() in this file:

-SELECT 4!;
-SELECT !!3;
+SELECT factorial(4);
+SELECT factorial(3);

-- 
John Naylorhttps://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: [Patch] Optimize dropping of relation buffers using dlist

2020-09-02 Thread Amit Kapila
On Wed, Sep 2, 2020 at 9:17 AM Tom Lane  wrote:
>
> Amit Kapila  writes:
> > Even if the relation is locked, background processes like checkpointer
> > can still touch the relation which might cause problems. Consider a
> > case where we extend the relation but didn't flush the newly added
> > pages. Now during truncate operation, checkpointer can still flush
> > those pages which can cause trouble for truncate. But, I think in the
> > recovery path such cases won't cause a problem.
>
> I wouldn't count on that staying true ...
>
> https://www.postgresql.org/message-id/ca+hukgj8nrsqgkzensnrc2mfrobv-jcnacbyvtpptk2a9yy...@mail.gmail.com
>

I don't think that proposal will matter after commit c5315f4f44
because we are caching the size/blocks for recovery while doing extend
(smgrextend). In the above scenario, we would have cached the blocks
which will be used at later point of time.

-- 
With Regards,
Amit Kapila.




Re: New statistics for tuning WAL buffer size

2020-09-02 Thread Masahiro Ikeda

+/* --
+ * Backend types
+ * --

You seem to have forgotten to add "*/" at the end of the above comment.
This issue could cause the following compiler warning.
../../src/include/pgstat.h:761:1: warning: '/*' within block comment 
[-Wcomment]


Thanks for the comment. I fixed it.


The contents of pg_stat_walwrites are reset when the server
is restarted. Isn't this problematic? IMO since pg_stat_walwrites
is a collected statistics view, basically its contents should be
kept even in the case of server restart.


I agree with your opinion.
I modified the patch to use the statistics collector and persist the WAL
statistics.



I changed the view name from pg_stat_walwrites to pg_stat_walwriter.
I think it is better to match the naming scheme of other views like
pg_stat_bgwriter, which is nominally for bgwriter statistics but also
has statistics related to backends.


The pg_stat_walwriter view is not security restricted now, so ordinary users
can access it.
It has the same security level as pg_stat_archiver. If you have any
comments, please let me know.



Regards,
--
Masahiro Ikeda
NTT DATA CORPORATION

diff --git a/doc/src/sgml/monitoring.sgml b/doc/src/sgml/monitoring.sgml
index d973e1149a..9ea5c9b2e2 100644
--- a/doc/src/sgml/monitoring.sgml
+++ b/doc/src/sgml/monitoring.sgml
@@ -424,6 +424,13 @@ postgres   27093  0.0  0.0  30096  2752 ?Ss   11:34   0:00 postgres: ser
  
  
 
+ 
+  pg_stat_walwriterpg_stat_walwriter
+  One row only, showing statistics about the WAL writing activity. See
+for details.
+  
+ 
+
  
   pg_stat_databasepg_stat_database
   One row per database, showing database-wide statistics. See
@@ -3264,6 +3271,56 @@ SELECT pid, wait_event_type, wait_event FROM pg_stat_activity WHERE wait_event i
 
  
 
+ 
+   pg_stat_walwriter
+
+  
+   pg_stat_walwriter
+  
+
+  
+   The pg_stat_walwriter view will always have a
+   single row, containing data about the WAL writing activity of the cluster.
+  
+
+  
+   pg_stat_walwriter View
+   
+
+ 
+  
+   Column Type
+  
+  
+   Description
+  
+ 
+
+
+
+ 
+  
+   dirty_writes bigint
+  
+  
+   Number of dirty WAL writes when the  are full
+  
+ 
+
+ 
+  
+   stats_reset timestamp with time zone
+  
+  
+   Time at which these statistics were last reset
+  
+ 
+ 
+   
+  
+
+
+
  
   pg_stat_database
 
@@ -4652,8 +4709,9 @@ SELECT pid, wait_event_type, wait_event FROM pg_stat_activity WHERE wait_event i
 argument.  The argument can be bgwriter to reset
 all the counters shown in
 the pg_stat_bgwriter
-view, or archiver to reset all the counters shown in
-the pg_stat_archiver view.
+view, archiver to reset all the counters shown in
+the pg_stat_archiver view ,or walwriter
+to reset all the counters shown in the pg_stat_walwriter view.


 This function is restricted to superusers by default, but other users
diff --git a/src/backend/access/transam/xlog.c b/src/backend/access/transam/xlog.c
index 09c01ed4ae..47b148b3b5 100644
--- a/src/backend/access/transam/xlog.c
+++ b/src/backend/access/transam/xlog.c
@@ -2193,6 +2193,7 @@ AdvanceXLInsertBuffer(XLogRecPtr upto, bool opportunistic)
 	WriteRqst.Write = OldPageRqstPtr;
 	WriteRqst.Flush = 0;
 	XLogWrite(WriteRqst, false);
+	WalWriterStats.m_xlog_dirty_writes++;
 	LWLockRelease(WALWriteLock);
 	TRACE_POSTGRESQL_WAL_BUFFER_WRITE_DIRTY_DONE();
 }
diff --git a/src/backend/catalog/system_views.sql b/src/backend/catalog/system_views.sql
index a2d61302f9..6b43ad61be 100644
--- a/src/backend/catalog/system_views.sql
+++ b/src/backend/catalog/system_views.sql
@@ -1000,6 +1000,11 @@ CREATE VIEW pg_stat_progress_analyze AS
 FROM pg_stat_get_progress_info('ANALYZE') AS S
 LEFT JOIN pg_database D ON S.datid = D.oid;
 
+CREATE VIEW pg_stat_walwriter AS
+SELECT
+pg_stat_get_xlog_dirty_writes() AS dirty_writes,
+pg_stat_get_walwriter_stat_reset_time() AS stats_reset;
+
 CREATE VIEW pg_stat_progress_vacuum AS
 SELECT
 S.pid AS pid, S.datid AS datid, D.datname AS datname,
diff --git a/src/backend/postmaster/pgstat.c b/src/backend/postmaster/pgstat.c
index 8116b23614..154cdc5ddb 100644
--- a/src/backend/postmaster/pgstat.c
+++ b/src/backend/postmaster/pgstat.c
@@ -141,6 +141,14 @@ char	   *pgstat_stat_tmpname = NULL;
  */
 PgStat_MsgBgWriter BgWriterStats;
 
+/*
+ * WalWriter global statistics counter.
+ * This counter is incremented by each XLogWrite call,
+ * both in the wal writer process and each backend.
+ * And then, sent to the stat collector process.
+ */
+PgStat_MsgWalWriter WalWriterStats;
+
 /*
  * List of SLRU names that we keep stats for.  There is no central registry of
  * SLRUs, so we use this fixed list instead.  The "other" entry is used for
@@ -281,6 +289,7 @@ static int	localNumBackends = 0;
  */
 static PgStat_Ar

Re: PATCH: logical_work_mem and logical streaming of large in-progress transactions

2020-09-02 Thread Dilip Kumar
On Tue, Sep 1, 2020 at 8:33 PM Amit Kapila  wrote:
>
> On Tue, Sep 1, 2020 at 9:28 AM Amit Kapila  wrote:
> >
> > On Mon, Aug 31, 2020 at 7:28 PM Dilip Kumar  wrote:
> > >
> > > On Mon, Aug 31, 2020 at 1:24 PM Amit Kapila  
> > > wrote:
> >
> > In functions cleanup_rel_sync_cache and
> > get_schema_sent_in_streamed_txn, lets cast the result of lfirst_int to
> > uint32 as suggested by Tom [1]. Also, lets keep the way we compare
> > xids consistent in both functions, i.e, if (xid == lfirst_int(lc)).
> >
>
> Fixed this in the attached patch.
>
> > The behavior tested by the test case added for this is not clear
> > primarily because of comments.
> >
> > +++ b/src/test/subscription/t/021_stream_schema.pl
> > @@ -0,0 +1,80 @@
> > +# Test behavior with streaming transaction exceeding 
> > logical_decoding_work_mem
> > ...
> > +# large (streamed) transaction with DDL, DML and ROLLBACKs
> > +$node_publisher->safe_psql('postgres', q{
> > +BEGIN;
> > +ALTER TABLE test_tab ADD COLUMN c INT;
> > +INSERT INTO test_tab SELECT i, md5(i::text), i FROM
> > generate_series(3,3000) s(i);
> > +ALTER TABLE test_tab ADD COLUMN d INT;
> > +COMMIT;
> > +});
> > +
> > +# large (streamed) transaction with DDL, DML and ROLLBACKs
> > +$node_publisher->safe_psql('postgres', q{
> > +BEGIN;
> > +INSERT INTO test_tab SELECT i, md5(i::text), i, i FROM
> > generate_series(3001,3005) s(i);
> > +COMMIT;
> > +});
> > +wait_for_caught_up($node_publisher, $appname);
> >
> > I understand that how this test will test the functionality related to
> > schema_sent stuff but neither the comments atop of file nor atop the
> > test case explains it clearly.
> >
>
> Added comments for this test.
>
> > > > Few more comments:
> >
> > >
> > > > 2.
> > > > 009_stream_simple.pl
> > > > +# Insert, update and delete enough rows to exceed the 64kB limit.
> > > > +$node_publisher->safe_psql('postgres', q{
> > > > +BEGIN;
> > > > +INSERT INTO test_tab SELECT i, md5(i::text) FROM generate_series(3, 
> > > > 5000) s(i);
> > > > +UPDATE test_tab SET b = md5(b) WHERE mod(a,2) = 0;
> > > > +DELETE FROM test_tab WHERE mod(a,3) = 0;
> > > > +COMMIT;
> > > > +});
> > > >
> > > > How much above this data is 64kB limit? I just wanted to see that it
> > > > should not be on borderline and then due to some alignment issues the
> > > > streaming doesn't happen on some machines? Also, how such a test
> > > > ensures that the streaming has happened because the way we are
> > > > checking results, won't it be the same for the non-streaming case as
> > > > well?
> > >
> > > Only for this case, or you mean for all the tests?
> > >
> >
>
> I have not done this yet.
Most of the test cases generate above 100kB and a few are around
72kB. Please find the per-test-case data sizes:

015 - 200kb
016 - 150kb
017 - 72kb
018 - 72kb before first rollback to sb and total ~100kb
019 - 76kb before first rollback to sb and total ~100kb
020 - 150kb
021 - 100kb

> > It is better to do it for all tests and I have clarified this in my
> > next email sent yesterday [2] where I have raised a few more comments
> > as well. I hope you have not missed that email.

I saw that; I think I replied to this before seeing that email.

> > > > 3.
> > > > +# Change the local values of the extra columns on the subscriber,
> > > > +# update publisher, and check that subscriber retains the expected
> > > > +# values
> > > > +$node_subscriber->safe_psql('postgres', "UPDATE test_tab SET c =
> > > > 'epoch'::timestamptz + 987654321 * interval '1s'");
> > > > +$node_publisher->safe_psql('postgres', "UPDATE test_tab SET b = 
> > > > md5(a::text)");
> > > > +
> > > > +wait_for_caught_up($node_publisher, $appname);
> > > > +
> > > > +$result =
> > > > +  $node_subscriber->safe_psql('postgres', "SELECT count(*),
> > > > count(extract(epoch from c) = 987654321), count(d = 999) FROM
> > > > test_tab");
> > > > +is($result, qq(3334|3334|3334), 'check extra columns contain locally
> > > > changed data');
> > > >
> > > > Again, how this test is relevant to streaming mode?
> > >
> > > I agree, it is not specific to the streaming.
> > >
>
> I think we can leave this as of now. After committing the stats
> patches by Sawada-San and Ajin, we might be able to improve this test.

Makes sense to me.

> > +sub wait_for_caught_up
> > +{
> > + my ($node, $appname) = @_;
> > +
> > + $node->poll_query_until('postgres',
> > +"SELECT pg_current_wal_lsn() <= replay_lsn FROM pg_stat_replication
> > WHERE application_name = '$appname';"
> > + ) or die "Timed ou
> >
> > The patch has added this in all the test files if it is used in so
> > many tests then we need to add this in some generic place
> > (PostgresNode.pm) but actually, I am not sure if need this at all. Why
> > can't the existing wait_for_catchup in PostgresNode.pm serve the same
> > purpose.
> >
>
> Changed as per this suggestion.

Okay.

> > 2.
> > In system_views.sql,
> >
> > -- All columns of pg_subscription except subconninfo are readable.
> > REVOKE ALL ON pg_subscription FROM pub

Re: [HACKERS] Custom compression methods

2020-09-02 Thread Dilip Kumar
On Wed, Sep 2, 2020 at 4:57 AM Mark Dilger  wrote:
>
>
>
> > On Aug 13, 2020, at 4:48 AM, Dilip Kumar  wrote:
> >
> > v1-0001: As suggested by Robert, it provides the syntax support for
> > setting the compression method for a column while creating a table and
> > adding columns.  However, we don't support changing the compression
> > method for the existing column.  As part of this patch, there is only
> > one built-in compression method that can be set (pglz).  In this, we
> > have one in-build am (pglz) and the compressed attributes will directly
> > store the oid of the AM.  In this patch, I have removed the
> > pg_attr_compresion as we don't support changing the compression
> > for the existing column so we don't need to preserve the old
> > compressions.
>
> I do not like the way pglz compression is handled in this patch.  After 
> upgrading PostgreSQL to the first version with this patch included, 
> pre-existing on-disk compressed data will not include any custom compression 
> Oid in the header, and toast_decompress_datum will notice that and decompress 
> the data directly using pglz_decompress.   If the same data were then written 
> back out, perhaps to another table, into a column with no custom compression 
> method defined, it will get compressed by toast_compress_datum using 
> DefaultCompressionOid, which is defined as PGLZ_COMPRESSION_AM_OID.  That 
> isn't a proper round-trip for the data, as when it gets re-compressed, the 
> PGLZ_COMPRESSION_AM_OID gets written into the header, which makes the data a 
> bit longer, but also means that it is not byte-for-byte the same as it was, 
> which is counter-intuitive.  Given that any given pglz compressed datum now 
> has two totally different formats that might occur on disk, code may have to 
> consider both of them, which increases code complexity, and regression tests 
> will need to be written with coverage for both of them, which increases test 
> complexity.  It's also not easy to write the extra tests, as there isn't any 
> way (that I see) to intentionally write out the traditional shorter form from 
> a newer database server; you'd have to do something like a pg_upgrade test 
> where you install an older server to write the older format, upgrade, and 
> then check that the new server can handle it.
>
> The cleanest solution to this would seem to be removal of the compression 
> am's Oid from the header for all compression ams, so that pre-patch written 
> data and post-patch written data look exactly the same.  The other solution 
> is to give pglz pride-of-place as the original compression mechanism, and 
> just say that when pglz is the compression method, no Oid gets written to the 
> header, and only when other compression methods are used does the Oid get 
> written.  This second option seems closer to the implementation that you 
> already have, because you already handle the decompression of data where the 
> Oid is lacking, so all you have to do is intentionally not write the Oid when 
> compressing using pglz.
>
> Or did I misunderstand your implementation?

Thanks for looking into it.  Actually, I am planning to change this
patch such that we will use the upper 2 bits of the size field instead
of storing the amoid for the builtin compression methods.
e.g. 0x00 = pglz, 0x01 = zlib, 0x10 = other built-in, 0x11 = custom
compression method (these are binary bit patterns).  Only when 0x11 is set
will we store the amoid in the toast header.  I think after a week or two I
will make these changes and post my updated patch.
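Just to illustrate the bit layout I have in mind — the macro and function
names below are invented, not from the actual patch, and it assumes the top
two bits of a 32-bit rawsize field are free:

#include <stdint.h>
#include <stdbool.h>

#define TOAST_COMPRESS_METHOD_PGLZ      0x00	/* binary 00 */
#define TOAST_COMPRESS_METHOD_ZLIB      0x01	/* binary 01 */
#define TOAST_COMPRESS_METHOD_BUILTIN   0x02	/* binary 10: other built-in */
#define TOAST_COMPRESS_METHOD_CUSTOM    0x03	/* binary 11: amoid stored in header */

#define TOAST_COMPRESS_FLAG_BITS        2
#define TOAST_COMPRESS_SIZE_MASK \
	((UINT32_C(1) << (32 - TOAST_COMPRESS_FLAG_BITS)) - 1)

/* Pack the method flag into the top two bits of the 32-bit rawsize field. */
static inline uint32_t
toast_pack_rawsize(uint32_t rawsize, uint32_t method)
{
	return (rawsize & TOAST_COMPRESS_SIZE_MASK) |
		(method << (32 - TOAST_COMPRESS_FLAG_BITS));
}

/* Only the "custom" flag value means an access method OID follows. */
static inline bool
toast_method_needs_amoid(uint32_t packed_rawsize)
{
	return (packed_rawsize >> (32 - TOAST_COMPRESS_FLAG_BITS)) ==
		TOAST_COMPRESS_METHOD_CUSTOM;
}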


--
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com




Re: Re: [HACKERS] Custom compression methods

2020-09-02 Thread Dilip Kumar
On Mon, Aug 31, 2020 at 10:45 AM Amit Khandekar  wrote:
>
> On Thu, 13 Aug 2020 at 17:18, Dilip Kumar  wrote:
> > I have rebased the patch on the latest head and currently, broken into 3 
> > parts.
> >
> > v1-0001: As suggested by Robert, it provides the syntax support for
> > setting the compression method for a column while creating a table and
> > adding columns.  However, we don't support changing the compression
> > method for the existing column.  As part of this patch, there is only
> > one built-in compression method that can be set (pglz).  In this, we
> > have one in-build am (pglz) and the compressed attributes will directly
> > store the oid of the AM.  In this patch, I have removed the
> > pg_attr_compresion as we don't support changing the compression
> > for the existing column so we don't need to preserve the old
> > compressions.
> > v1-0002: Add another built-in compression method (zlib)
> > v1:0003: Remaining patch set (nothing is changed except rebase on the
> > current head, stabilizing check-world  and  0001 and 0002 are pulled
> > out of this)
> >
> > Next, I will be working on separating out the remaining patches as per
> > the suggestion by Robert.
>
> Thanks for this new feature. Looks promising and very useful, with so
> many good compression libraries already available.

Thanks for looking into it.

> I see that with the patch-set, I would be able to create an extension
> that defines a PostgreSQL C handler function which assigns all the
> required hook function implementations for compressing, decompressing
> and validating, etc. In short, I would be able to use a completely
> different compression algorithm to compress toast data if I write such
> an extension. Correct me if I am wrong with my interpretation.
>
> Just a quick superficial set of review comments 
>
> A minor re-base is required due to a conflict in a regression test

Okay, I will do this.

> -
>
> In heap_toast_insert_or_update() and in other places, the comments for
> new parameter preserved_am_info are missing.
>
> -

ok

> +toast_compress_datum(Datum value, Oid acoid)
>  {
> struct varlena *tmp = NULL;
> int32 valsize;
> -   CompressionAmOptions cmoptions;
> +   CompressionAmOptions *cmoptions = NULL;
>
> I think tmp and cmoptions need not be initialized to NULL

Right

> -
>
> - TOAST_COMPRESS_SET_RAWSIZE(tmp, valsize);
> - SET_VARSIZE_COMPRESSED(tmp, len + TOAST_COMPRESS_HDRSZ);
>   /* successful compression */
> + toast_set_compressed_datum_info(tmp, amoid, valsize);
>   return PointerGetDatum(tmp);
>
> Any particular reason why is this code put in a new extern function ?
> Is there a plan to re-use it ? Otherwise, it's not necessary to do
> this.
>
> 
>
> Also, not sure why "HTAB *amoptions_cache" and "MemoryContext
> amoptions_cache_mcxt" aren't static declarations. They are being used
> only in toast_internals.c

> ---
>
> The tab-completion doesn't show COMPRESSION :
> postgres=# create access method my_method TYPE
> INDEX  TABLE
> postgres=# create access method my_method TYPE
>
> Also, the below syntax also would better be tab-completed  so as to
> display all the installed compression methods, in line with how we
> show all the storage methods like plain,extended,etc:
> postgres=# ALTER TABLE lztab ALTER COLUMN t SET COMPRESSION
>
> 

I will fix these comments in the next version of the patch.

> I could see the differences in compression ratio, and the compression
> and decompression speed when I use pglz versus zlib:
>
> CREATE TABLE zlibtab(t TEXT COMPRESSION zlib WITH (level '4'));
> create table lztab(t text);
> ALTER TABLE lztab ALTER COLUMN t SET COMPRESSION pglz;
>
> pgg:s2:pg$ time psql -c "\copy zlibtab from text.data"
> COPY 13050
>
> real0m1.344s
> user0m0.031s
> sys 0m0.026s
>
> pgg:s2:pg$ time psql -c "\copy lztab from text.data"
> COPY 13050
>
> real0m2.088s
> user0m0.008s
> sys 0m0.050s
>
>
> pgg:s2:pg$ time psql -c "select pg_table_size('zlibtab'::regclass),
> pg_table_size('lztab'::regclass)"
>  pg_table_size | pg_table_size
> ---+---
>1261568 |   1687552
>
> pgg:s2:pg$ time psql -c "select NULL from zlibtab where t like ''"
>  > /dev/null
>
> real0m0.127s
> user0m0.000s
> sys 0m0.002s
>
> pgg:s2:pg$ time psql -c "select NULL from lztab where t like ''"
> > /dev/null
>
> real0m0.050s
> user0m0.002s
> sys 0m0.000s
>

Thanks for testing this.

-- 
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com




Refactor ReindexStmt and its "concurrent" boolean

2020-09-02 Thread Michael Paquier
Hi all,

$subject has been mentioned a couple of times, including today:
https://www.postgresql.org/message-id/20200902010012.ge1...@paquier.xyz

We have a boolean argument in ReindexStmt to control a concurrent
run, and we also have in parallel of that a bitmask to control the
options of the statement, which feels like a duplicate.  Attached is a
patch to refactor the whole, adding CONCURRENTLY as a member of the
available options.  This simplifies a bit the code.

Any thoughts?
--
Michael
diff --git a/src/include/commands/defrem.h b/src/include/commands/defrem.h
index c26a102b17..3129b684f6 100644
--- a/src/include/commands/defrem.h
+++ b/src/include/commands/defrem.h
@@ -34,10 +34,10 @@ extern ObjectAddress DefineIndex(Oid relationId,
  bool check_not_in_use,
  bool skip_build,
  bool quiet);
-extern void ReindexIndex(RangeVar *indexRelation, int options, bool concurrent);
-extern Oid	ReindexTable(RangeVar *relation, int options, bool concurrent);
+extern void ReindexIndex(RangeVar *indexRelation, int options);
+extern Oid	ReindexTable(RangeVar *relation, int options);
 extern void ReindexMultipleTables(const char *objectName, ReindexObjectType objectKind,
-  int options, bool concurrent);
+  int options);
 extern char *makeObjectName(const char *name1, const char *name2,
 			const char *label);
 extern char *ChooseRelationName(const char *name1, const char *name2,
diff --git a/src/include/nodes/parsenodes.h b/src/include/nodes/parsenodes.h
index d52c563305..e83329fd6d 100644
--- a/src/include/nodes/parsenodes.h
+++ b/src/include/nodes/parsenodes.h
@@ -3353,6 +3353,7 @@ typedef struct ConstraintsSetStmt
 #define REINDEXOPT_VERBOSE (1 << 0) /* print progress info */
 #define REINDEXOPT_REPORT_PROGRESS (1 << 1) /* report pgstat progress */
 #define REINDEXOPT_MISSING_OK (1 << 2)	/* skip missing relations */
+#define REINDEXOPT_CONCURRENTLY (1 << 3)	/* concurrent mode */
 
 typedef enum ReindexObjectType
 {
@@ -3371,7 +3372,6 @@ typedef struct ReindexStmt
 	RangeVar   *relation;		/* Table or index to reindex */
 	const char *name;			/* name of database to reindex */
 	int			options;		/* Reindex options flags */
-	bool		concurrent;		/* reindex concurrently? */
 } ReindexStmt;
 
 /* --
diff --git a/src/backend/commands/indexcmds.c b/src/backend/commands/indexcmds.c
index b3a92381f9..040185dc54 100644
--- a/src/backend/commands/indexcmds.c
+++ b/src/backend/commands/indexcmds.c
@@ -97,7 +97,7 @@ static bool CompareOpclassOptions(Datum *opts1, Datum *opts2, int natts);
  */
 struct ReindexIndexCallbackState
 {
-	bool		concurrent;		/* flag from statement */
+	bool		options;		/* flag from statement */
 	Oid			locked_table_oid;	/* tracks previously locked table */
 };
 
@@ -2420,7 +2420,7 @@ ChooseIndexColumnNames(List *indexElems)
  *		Recreate a specific index.
  */
 void
-ReindexIndex(RangeVar *indexRelation, int options, bool concurrent)
+ReindexIndex(RangeVar *indexRelation, int options)
 {
 	struct ReindexIndexCallbackState state;
 	Oid			indOid;
@@ -2437,10 +2437,11 @@ ReindexIndex(RangeVar *indexRelation, int options, bool concurrent)
 	 * upgrade the lock, but that's OK, because other sessions can't hold
 	 * locks on our temporary table.
 	 */
-	state.concurrent = concurrent;
+	state.options = options;
 	state.locked_table_oid = InvalidOid;
 	indOid = RangeVarGetRelidExtended(indexRelation,
-	  concurrent ? ShareUpdateExclusiveLock : AccessExclusiveLock,
+	  (options & REINDEXOPT_CONCURRENTLY) != 0 ?
+	  ShareUpdateExclusiveLock : AccessExclusiveLock,
 	  0,
 	  RangeVarCallbackForReindexIndex,
 	  &state);
@@ -2460,7 +2461,8 @@ ReindexIndex(RangeVar *indexRelation, int options, bool concurrent)
 	persistence = irel->rd_rel->relpersistence;
 	index_close(irel, NoLock);
 
-	if (concurrent && persistence != RELPERSISTENCE_TEMP)
+	if ((options & REINDEXOPT_CONCURRENTLY) != 0 &&
+		persistence != RELPERSISTENCE_TEMP)
 		ReindexRelationConcurrently(indOid, options);
 	else
 		reindex_index(indOid, false, persistence,
@@ -2485,7 +2487,8 @@ RangeVarCallbackForReindexIndex(const RangeVar *relation,
 	 * non-concurrent case and table locks used by index_concurrently_*() for
 	 * concurrent case.
 	 */
-	table_lockmode = state->concurrent ? ShareUpdateExclusiveLock : ShareLock;
+	table_lockmode = ((state->options & REINDEXOPT_CONCURRENTLY) != 0) ?
+		ShareUpdateExclusiveLock : ShareLock;
 
 	/*
 	 * If we previously locked some other index's heap, and the name we're
@@ -2542,7 +2545,7 @@ RangeVarCallbackForReindexIndex(const RangeVar *relation,
  *		Recreate all indexes of a table (and of its toast table, if any)
  */
 Oid
-ReindexTable(RangeVar *relation, int options, bool concurrent)
+ReindexTable(RangeVar *relation, int options)
 {
 	Oid			heapOid;
 	bool		result;
@@ -2556,11 +2559,13 @@ ReindexTable(RangeVar *relation, int options, bool concurrent)
 	 * locks on our temporary table.
 	 */
 	heapOid 

Re: Refactor ReindexStmt and its "concurrent" boolean

2020-09-02 Thread Julien Rouhaud
On Wed, Sep 2, 2020 at 1:03 PM Michael Paquier  wrote:
>
> Hi all,
>
> $subject has been mentioned a couple of times, including today:
> https://www.postgresql.org/message-id/20200902010012.ge1...@paquier.xyz
>
> We have a boolean argument in ReindexStmt to control a concurrent
> run, and we also have in parallel of that a bitmask to control the
> options of the statement, which feels like a duplicate.  Attached is a
> patch to refactor the whole, adding CONCURRENTLY as a member of the
> available options.  This simplifies a bit the code.
>
> Any thoughts?


+1

 struct ReindexIndexCallbackState
 {
- bool concurrent; /* flag from statement */
+ bool options; /* flag from statement */
  Oid locked_table_oid; /* tracks previously locked table */
 };

Shouldn't options be an int?  The rest of the patch looks good to me.




Re: [PATCH] Covering SPGiST index

2020-09-02 Thread Andrey M. Borodin



> On 31 Aug 2020, at 16:57, Pavel Borisov wrote:
> 
> I agree with all of your proposals and integrated them into v9.

I have a wild idea of renaming nextOffset in SpGistLeafTupleData.
Or at least documenting in comments that this field is more than just an offset.

This looks like an assert rather than a real runtime check in spgLeafTupleSize():

+   if (state->includeTupdesc->natts + 1 >= INDEX_MAX_KEYS)
+       ereport(ERROR,
+               (errcode(ERRCODE_TOO_MANY_COLUMNS),
+                errmsg("number of index columns (%d) exceeds limit (%d)",
+                       state->includeTupdesc->natts, INDEX_MAX_KEYS)));

Even if you go with the check, the number of columns is
state->includeTupdesc->natts + 1.

Also I'd refactor this

+   /* Form descriptor for INCLUDE columns if any */
+   if (IndexRelationGetNumberOfAttributes(index) > 1)
+   {
+       int i;
+
+       cache->includeTupdesc = CreateTemplateTupleDesc(
+                       IndexRelationGetNumberOfAttributes(index) - 1);
+
+       for (i = 0; i < IndexRelationGetNumberOfAttributes(index) - 1; i++)
+       {
+           TupleDescInitEntry(cache->includeTupdesc, i + 1, NULL,
+                              TupleDescAttr(index->rd_att, i + 1)->atttypid,
+                              -1, 0);
+       }
+   }
+   else
+       cache->includeTupdesc = NULL;

into something like

cache->includeTupdesc = NULL;
for (i = 0; i < IndexRelationGetNumberOfAttributes(index) - 1; i++)
{
    if (cache->includeTupdesc == NULL)
        // init tuple descriptor
    // init entry
}
But, probably it's only a matter of taste.

Besides this, I think the patch is ready for committer. If Anastasia has no
objections, let's flip the CF entry state.

Thanks!

Best regards, Andrey Borodin.



Re: Kerberos support broken on MSVC builds for Windows x64?

2020-09-02 Thread Stephen Frost
Greetings,

* Dave Page (dp...@pgadmin.org) wrote:
> On Tue, Sep 1, 2020 at 5:29 PM Stephen Frost  wrote:
> > * Dave Page (dp...@pgadmin.org) wrote:
> > > Attached is a patch against 12.4 for the build system in case anyone
> > wants
> > > to play (I'll do it properly against the head branch later). I'm guessing
> > > this will work for < 12, as with 12 I'm now getting the following which
> > > looks like it's related to GSS encryption:
> > >
> > > "C:\Users\dpage\Downloads\postgresql-12.4\pgsql.sln" (default target)
> > (1) ->
> > > "C:\Users\dpage\Downloads\postgresql-12.4\pgcrypto.vcxproj" (default
> > > target) (2) ->
> > > "C:\Users\dpage\Downloads\postgresql-12.4\postgres.vcxproj" (default
> > > target) (3) ->
> > > (Link target) ->
> > >   be-secure-gssapi.obj : error LNK2019: unresolved external symbol setenv
> > > referenced in function secure_open_gssapi
> > > [C:\Users\dpage\Downloads\postgresql-12.4\postgres.vcxproj]
> > >   .\Release\postgres\postgres.exe : fatal error LNK1120: 1 unresolved
> > > externals [C:\Users\dpage\Downloads\postgresql-12.4\postgres.vcxproj]
> > >
> > > I'll dig into that some more.
> >
> > Yes, that'd be in the GSSENC code, which I hadn't been expecting to be
> > used under Windows.  If you're successful, I don't have any issue
> > helping to make that work, though I'm curious if you're trying to build
> > with MIT KfW (which is rather ancient these days, being based on krb5
> > 1.13 and not updated since..) or with a more current release...?
> 
> I'm currently using the KFW 4.1 build from MIT. I've tried building it
> myself but it requires a very old toolchain (which defeated the point of
> what I was trying to do at the time).

> I haven't yet looked to see if the source for krb5-1.8.2 will build or even
> has the right bits in it for Windows - as I'm sure you know MIT seem to
> maintain an entirely different version for Windows for which I assume
> there's a reason.

I'm a bit confused as to why you'd consider trying 1.8.2 - did you mean
1.18.2 there, perhaps?  That's what I would think to try, since, as I
understand it from following the Kerberos dev list (which is pretty
responsive), it has been updated to work with newer Windows build
toolchains.

> > Of course, it'd be good to get a buildfarm animal in place that's
> > actually testing this if we're going to make it work.
> 
> Fixing the config on hamerkop should deal with that I think. Though I am
> confused as to why the Buildfarm UI thinks it has Kerberos support enabled
> - did we change the config parameter from krb5 to gss some time prior to
> 9.5? If so, that could explain it.

Looks to be run by SRA OSS..  Perhaps reaching out to them to ask about
it would help?

> > Regarding the setenv() call, should be able to use pgwin32_putenv() in
> > place on Windows, I'd think..?
> 
> Right, I imagine so. It's on my todo...

Alright.

Thanks,

Stephen




Re: PATCH: logical_work_mem and logical streaming of large in-progress transactions

2020-09-02 Thread Amit Kapila
On Wed, Sep 2, 2020 at 3:41 PM Dilip Kumar  wrote:
>
> On Wed, Sep 2, 2020 at 10:55 AM Amit Kapila  wrote:
> >
> > >
> >
> > We can combine the tests in 015_stream_simple.pl and
> > 020_stream_binary.pl as I can't see a good reason to keep them
> > separate. Then, I think we can keep only this part with the main patch
> > and extract other tests into a separate patch. Basically, we can
> > commit the basic tests with the main patch and then keep the advanced
> > tests separately. I am afraid that there are some tests that don't add
> > much value so we can review them separately.
>
> Fixed
>

I have slightly adjusted this test and ran pgindent on the patch. I am
planning to push this tomorrow unless you have more comments.

-- 
With Regards,
Amit Kapila.


v60-0001-Add-support-for-streaming-to-built-in-logical-re.patch
Description: Binary data


Re: Hybrid Hash/Nested Loop joins and caching results from subplans

2020-09-02 Thread Alvaro Herrera
On 2020-Sep-02, David Rowley wrote:

> v7 (Separate Result Cache node)
> Query 1:
>  Execution Time: 894.003 ms
> 
> Query 2:
>  Execution Time: 854.950 ms

> v7 + hacks_V3 (caching done in Nested Loop)
> Query 1:
>  Execution Time: 770.470 ms
>
> Query 2
>  Execution Time: 779.181 ms

Wow, this is a *significant* change.

-- 
Álvaro Herrerahttps://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: [patch] Fix checksum verification in base backups for zero page headers

2020-09-02 Thread Anastasia Lubennikova

On 01.09.2020 13:22, Michael Banck wrote:

Hi,

as a continuation of [1], I've split-off the zero page header case from
the last patch, as this one seems less contentious.


Michael

[1] https://commitfest.postgresql.org/28/2308/

I've looked through the previous discussion. As far as I got it, most of 
the controversy was about online checksums improvements.


The warning about pd_upper inconsistency that you've added is a good 
addition. The patch is a bit messy, though, because a huge code block 
was shifted.


Will it be different, if you just leave
    "if (!PageIsNew(page) && PageGetLSN(page) < startptr)"
block as it was, and add
    "else if (PageIsNew(page) && !PageIsZero(page))" ?


While on it, I also have a few questions about the code around:

1) Maybe we can use other header sanity checks from PageIsVerified() as 
well? Or even the function itself.


2) > /* Reset loop to validate the block again */
How many times do we try to reread the block? Is one reread enough?
Speaking of which, 'reread_cnt' name looks confusing to me. I would 
expect that this variable contains a number of attempts, not the number 
of bytes read.


If everyone agrees, that for basebackup purpose it's enough to rely on a 
single reread, I'm ok with it.
Another approach is to read the page directly from shared buffers to 
ensure that the page is fine. This way is more complicated, but should 
be almost bullet-proof. Using it we can also check pages with lsn >= 
startptr.


3) Judging by warning messages, we count checksum failures per file, not 
per page, and don't report after a fifth failure. Why so?  Is it a 
common case that so many pages of one file are corrupted?


--
Anastasia Lubennikova
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company





Re: Problems with the FSM, heap fillfactor, and temporal locality

2020-09-02 Thread John Naylor
On Wed, Sep 2, 2020 at 1:57 AM Peter Geoghegan  wrote:
>
> On Wed, Aug 26, 2020 at 1:46 AM John Naylor  
> wrote:
> > The fact that that logic extends by 20 * numwaiters to get optimal
> > performance is a red flag that resources aren't being allocated
> > efficiently.
>
> I agree that that's pretty suspicious.

Per Simon off-list some time ago (if I understood him correctly),
counting the lock waiters makes the lock contention worse. I haven't
tried to measure this, but I just had an idea instead to keep track of
how many times the table has previously been extended by multiple
blocks, and extend by a number calculated from that. Something like
(pow2(2 + num-times-ext-mult-blocks)), with some ceiling perhaps much
smaller than 512. Maybe a bit off topic, but the general problem you
brought up has many moving parts, as you've mentioned.

> > I have an idea to ignore fp_next_slot entirely if we have
> > extended by multiple blocks: The backend that does the extension
> > stores in the FSM root page 1) the number of blocks added and 2) the
> > end-most block number. Any request for space will look for a valid
> > value here first before doing the usual search. If there is then the
> > block to try is based on a hash of the xid. Something like:
> >
> > candidate-block = prev-end-of-relation + 1 + (xid % (num-new-blocks))
>
> I was thinking of doing something in shared memory, and not using the
> FSM here at all. If we're really giving 20 pages out to each backend,
> we will probably benefit from explicitly assigning contiguous ranges
> of pages to each backend, and making some effort to respect that they
> own the blocks in some general sense.

That would give us flexibility and precise control. I suspect it would
also have more cognitive and maintenance cost, by having more than one
source of info.

> Hopefully without also losing
> access to the free space in corner cases (e.g. one of the backend's
> has an error shortly after receiving its contiguous range of blocks).

Right, you'd need some way of resetting or retiring the shared memory
info when it is no longer useful. That was my thinking with the
collision counter -- go back to using the FSM when we no longer have a
reasonable chance of getting a fresh block.

> > Also num-new-blocks above could be scaled down from the actual number
> > of blocks added, just to make sure writes aren't happening all over
> > the place.
>
> Or scaled up, perhaps.

I don't think I explained this part well, so here's a concrete
example. Let's say 128 blocks were added at once. Then xid % 128 would
give a number to be added to the previous last block in the relation,
so new target block allocations could be anywhere in this 128. By
"scale down", I mean compute (say) xid % 32. That would limit deviance
of spatial locality for those backends that were waiting on extension.
Doing the opposite, like xid % 256, would give you blocks past the end
of the relation. Further thinking out loud, after detecting enough
collisions in the first 32, we could iterate through the other ranges
and finally revert to conventional FSM search.
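In rough pseudo-C, with invented names (not from any posted patch, and
assuming PostgreSQL's Min() macro and usual typedefs):

/*
 * "range" caps how widely concurrent backends spread out within the
 * newly extended block range; past that we'd fall back to the regular
 * FSM search.
 */
static BlockNumber
candidate_new_block(BlockNumber prev_last_block,
					uint32 num_new_blocks,
					TransactionId xid)
{
	uint32		range;

	Assert(num_new_blocks > 0);
	range = Min(num_new_blocks, 32);	/* the "scaled down" range */

	return prev_last_block + 1 + (xid % range);
}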

-- 
John Naylorhttps://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: PATCH: logical_work_mem and logical streaming of large in-progress transactions

2020-09-02 Thread Dilip Kumar
On Wed, Sep 2, 2020 at 7:19 PM Amit Kapila  wrote:

> On Wed, Sep 2, 2020 at 3:41 PM Dilip Kumar  wrote:
> >
> > On Wed, Sep 2, 2020 at 10:55 AM Amit Kapila 
> wrote:
> > >
> > > >
> > >
> > > We can combine the tests in 015_stream_simple.pl and
> > > 020_stream_binary.pl as I can't see a good reason to keep them
> > > separate. Then, I think we can keep only this part with the main patch
> > > and extract other tests into a separate patch. Basically, we can
> > > commit the basic tests with the main patch and then keep the advanced
> > > tests separately. I am afraid that there are some tests that don't add
> > > much value so we can review them separately.
> >
> > Fixed
> >
>
> I have slightly adjusted this test and ran pgindent on the patch. I am
> planning to push this tomorrow unless you have more comments.
>

Looks good to me.

-- 
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com


Re: A micro-optimisation for walkdir()

2020-09-02 Thread Tom Lane
Thomas Munro  writes:
> You don't need to call stat() just to find out if a dirent is a file
> or directory, most of the time.  Please see attached.

Hm.  If we do this, I can see wanting to apply the knowledge in more
places than walkdir().  Is it possible to extract out the nonstandard
bits into a reusable subroutine?  I'm envisioning an API more or less
like

   extern enum PGFileType identify_file_type(const char *path,
                                             const struct dirent *de,
                                             bool look_thru_symlinks);
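For illustration, one possible shape of such a helper — a sketch only, where
PGFileType and its values are assumptions rather than an existing API, and
the usual <dirent.h>/<sys/stat.h> includes (and an S_ISLNK fallback on
Windows) are presumed:

typedef enum PGFileType
{
	PGFILETYPE_ERROR,
	PGFILETYPE_UNKNOWN,
	PGFILETYPE_REG,
	PGFILETYPE_DIR,
	PGFILETYPE_LNK
} PGFileType;

PGFileType
identify_file_type(const char *path, const struct dirent *de,
				   bool look_thru_symlinks)
{
	struct stat fst;

#if defined(DT_REG) && defined(DT_DIR) && defined(DT_LNK)
	/* Fast path: many filesystems return the type in the dirent itself. */
	if (de->d_type == DT_REG)
		return PGFILETYPE_REG;
	if (de->d_type == DT_DIR)
		return PGFILETYPE_DIR;
	if (de->d_type == DT_LNK && !look_thru_symlinks)
		return PGFILETYPE_LNK;
#endif

	/* Slow path: DT_UNKNOWN, symlinks we must follow, or no d_type support. */
	if ((look_thru_symlinks ? stat(path, &fst) : lstat(path, &fst)) < 0)
		return PGFILETYPE_ERROR;

	if (S_ISREG(fst.st_mode))
		return PGFILETYPE_REG;
	if (S_ISDIR(fst.st_mode))
		return PGFILETYPE_DIR;
	if (S_ISLNK(fst.st_mode))
		return PGFILETYPE_LNK;

	return PGFILETYPE_UNKNOWN;
}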

regards, tom lane




Re: describe-config issue

2020-09-02 Thread Tom Lane
vignesh C  writes:
> Postgres's describe-config option prints reset_val for int & real
> configuration parameters which is not useful as it is not updated.

Uh, what?

> Printing boot_val is better in this case.

Please defend that claim.  Otherwise this seems like a pretty
random change.

regards, tom lane




Re: builtin functions, parameter names and psql's \df

2020-09-02 Thread Tom Lane
Oleksandr Shulgin  writes:
> On Wed, Sep 2, 2020 at 7:35 AM Andres Freund  wrote:
>> In the docs we already name the parameters using SQL like syntax, see [1].
>> How about we actually do so for at least the more common / complicated
>> functions?

> I find myself in the same situation a lot.
> I've never realized that's an implementation detail and not something
> fundamental preventing the parameters from being named in the built-in
> functions.

Yeah, it's not really hard to fix; somebody just has to do the legwork.
The attached is enough to get me to

regression=# \df regexp_split_to_table
                                           List of functions
   Schema   |         Name          | Result data type |          Argument data types          | Type 
------------+-----------------------+------------------+---------------------------------------+------
 pg_catalog | regexp_split_to_table | SETOF text       | string text, pattern text             | func
 pg_catalog | regexp_split_to_table | SETOF text       | string text, pattern text, flags text | func
(2 rows)

I don't think we should go overboard on this, but +1 for labeling all the
cases where the usage isn't obvious.

regards, tom lane

diff --git a/src/include/catalog/pg_proc.dat b/src/include/catalog/pg_proc.dat
index 1dd325e0e6..ecf1299ef3 100644
--- a/src/include/catalog/pg_proc.dat
+++ b/src/include/catalog/pg_proc.dat
@@ -3551,10 +3551,12 @@
 { oid => '2765', descr => 'split string by pattern',
   proname => 'regexp_split_to_table', prorows => '1000', proretset => 't',
   prorettype => 'text', proargtypes => 'text text',
+  proargnames => '{string,pattern}',
   prosrc => 'regexp_split_to_table_no_flags' },
 { oid => '2766', descr => 'split string by pattern',
   proname => 'regexp_split_to_table', prorows => '1000', proretset => 't',
   prorettype => 'text', proargtypes => 'text text text',
+  proargnames => '{string,pattern,flags}',
   prosrc => 'regexp_split_to_table' },
 { oid => '2767', descr => 'split string by pattern',
   proname => 'regexp_split_to_array', prorettype => '_text',


Re: [PATCH] Covering SPGiST index

2020-09-02 Thread Pavel Borisov
>
> I have a wild idea of renaming nextOffset in SpGistLeafTupleData.
> Or at least documenting in comments that this field is more than just an
> offset.
>
Seems reasonable as previous changes localized mentions of nextOffset only
to leaf tuple definition and access macros. So I've done this renaming.


> This looks like an assert rather than a real runtime check in spgLeafTupleSize():
>
> +   if (state->includeTupdesc->natts + 1 >= INDEX_MAX_KEYS)
> +       ereport(ERROR,
> +               (errcode(ERRCODE_TOO_MANY_COLUMNS),
> +                errmsg("number of index columns (%d) exceeds limit (%d)",
> +                       state->includeTupdesc->natts, INDEX_MAX_KEYS)));
>
> Even if you go with the check, the number of columns is
> state->includeTupdesc->natts + 1.
>
I placed this check only once, at SP-GiST state creation, and replaced the
other checks with asserts.


> Also I'd refactor this
> +   /* Form descriptor for INCLUDE columns if any */
>
Also done. Thanks a lot!
See v10.


-- 
Best regards,
Pavel Borisov

Postgres Professional: http://postgrespro.com 


v10-0001-Covering-SP-GiST-index-support-for-INCLUDE-colum.patch
Description: Binary data


v1-0002-Add-VACUUM-ANALYZE-to-index-including-test.patch
Description: Binary data


Re: A micro-optimisation for walkdir()

2020-09-02 Thread Juan José Santamaría Flecha
On Wed, Sep 2, 2020 at 4:35 PM Tom Lane  wrote:

> Thomas Munro  writes:
> > You don't need to call stat() just to find out if a dirent is a file
> > or directory, most of the time.  Please see attached.
>
> Hm.  If we do this, I can see wanting to apply the knowledge in more
> places than walkdir().
>

Win32 could also benefit from this micro-optimisation if we expanded the
dirent port to include d_type. Please find attached a patch that does so.

Regards,

Juan José Santamaría Flecha


0002-Add-d_type-to-WIN32-dirent-port.patch
Description: Binary data


Re: Group by reordering optimization

2020-09-02 Thread Tomas Vondra

On Tue, Sep 01, 2020 at 03:09:14PM -0700, Peter Geoghegan wrote:

On Tue, Sep 1, 2020 at 2:09 PM Tomas Vondra
 wrote:

>* Instead of changing the order directly, the patch now creates another path with
>  modified order of clauses. It does so for the normal sort as well as for
>  incremental sort. The whole thing is done in two steps: first it finds a
>  potentially better ordering taking into account number of groups, widths and
>  comparison costs; afterwards this information is used to produce a cost
>  estimation. This is implemented via a separate create_reordered_sort_path to
>  not introduce too many changes, I couldn't find any better place.
>

I haven't tested the patch with any queries, but I agree this seems like
the right approach in general.


If we're creating a new sort path anyway, then perhaps we can also
change the collation -- it might be possible to "reduce" it to the "C"
collation without breaking queries.

This is admittedly pretty hard to do well. It could definitely work
out when we have to do a sort anyway -- a sort with high cardinality
abbreviated keys will be very fast (though we can't use abbreviated
keys with libc collations right now). OTOH, it would be quite
counterproductive if we were hoping to get an incremental sort that
used some available index that happens to use the default collation
(which is not the C collation in cases where this optimization is
expected to help).



Even if reducing collations like this was possible (I have no idea how
tricky it is, my knowledge of collations is pretty minimal and from what
I know I'm not dying to learn more), I suggest we consider that out of
scope for this particular patch.

There are multiple open issues already - deciding which pathkeys are
interesting, reasonable costing, etc. Once those issues are solved, we
can consider tweaking collations as an additional optimizations.

Or maybe we can consider it entirely separately, i.e. why would it
matter if we re-order the GROUP BY keys? The collation reduction can
just as well help even if we use the same pathkeys.


regards

--
Tomas Vondra  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: Kerberos support broken on MSVC builds for Windows x64?

2020-09-02 Thread Dave Page
Hi

On Wed, Sep 2, 2020 at 9:05 AM Dave Page  wrote:

>
>> Yes, that'd be in the GSSENC code, which I hadn't been expecting to be
>> used under Windows.
>
>
Here's a patch to make it build successfully (against head). I believe the
changes to Solution.pm should be back patched all the way, and the rest to
12.

Testing however, has been more problematic - I suspect at least partly
because of my Kerberos newbie-ness. I have a test server in an Ubuntu VM,
which I've used quite successfully to authenticate against another VM
running PG 12  on Ubuntu, from both Ubuntu and Windows clients. Using that,
but with a Windows client running MIT Kerberos I find that getting a ticket
takes a good 30 seconds or so. Postgres also seems to get its ticket
successfully via the keytab file:

C:\pg>"c:\Program Files\MIT\Kerberos\bin\klist.exe"
Ticket cache: API:Initial default ccache
Default principal: dp...@pgadmin.org

Valid starting ExpiresService principal
09/02/20 15:06:49  09/03/20 01:06:49  krbtgt/pgadmin@pgadmin.org
renew until 09/03/20 15:06:31
09/02/20 15:07:06  09/03/20 01:06:49  postgres/win-ilt1arj8a9c@
renew until 09/03/20 15:06:31
09/02/20 15:07:06  09/03/20 01:06:49  postgres/win-ilt1arj8...@pgadmin.org
renew until 09/03/20 15:06:31

However, if I try to log in using host + gss in the pg_hba.conf file, I then
get:

C:\pg>bin\psql postgres
psql: error: could not connect to server: SSPI continuation error: No
credentials are available in the security package
 (8009030e)

If I try to use hostgssenc + gss, it looks like it's not even trying to
encrypt:

C:\pg>bin\psql postgres
psql: error: could not connect to server: FATAL:  no pg_hba.conf entry for
host "::1", user "dpage", database "postgres", SSL off

Any ideas?


-- 
Dave Page
Blog: http://pgsnake.blogspot.com
Twitter: @pgsnake

EDB: http://www.enterprisedb.com


msvc64-kerberos-v2.diff
Description: Binary data


Re: Kerberos support broken on MSVC builds for Windows x64?

2020-09-02 Thread Dave Page
On Wed, Sep 2, 2020 at 2:47 PM Stephen Frost  wrote:

> Greetings,
>
> * Dave Page (dp...@pgadmin.org) wrote:
> > On Tue, Sep 1, 2020 at 5:29 PM Stephen Frost  wrote:
> > > * Dave Page (dp...@pgadmin.org) wrote:
> > > > Attached is a patch against 12.4 for the build system in case anyone
> > > wants
> > > > to play (I'll do it properly against the head branch later). I'm
> guessing
> > > > this will work for < 12, as with 12 I'm now getting the following
> which
> > > > looks like it's related to GSS encryption:
> > > >
> > > > "C:\Users\dpage\Downloads\postgresql-12.4\pgsql.sln" (default target)
> > > (1) ->
> > > > "C:\Users\dpage\Downloads\postgresql-12.4\pgcrypto.vcxproj" (default
> > > > target) (2) ->
> > > > "C:\Users\dpage\Downloads\postgresql-12.4\postgres.vcxproj" (default
> > > > target) (3) ->
> > > > (Link target) ->
> > > >   be-secure-gssapi.obj : error LNK2019: unresolved external symbol
> setenv
> > > > referenced in function secure_open_gssapi
> > > > [C:\Users\dpage\Downloads\postgresql-12.4\postgres.vcxproj]
> > > >   .\Release\postgres\postgres.exe : fatal error LNK1120: 1 unresolved
> > > > externals [C:\Users\dpage\Downloads\postgresql-12.4\postgres.vcxproj]
> > > >
> > > > I'll dig into that some more.
> > >
> > > Yes, that'd be in the GSSENC code, which I hadn't been expecting to be
> > > used under Windows.  If you're successful, I don't have any issue
> > > helping to make that work, though I'm curious if you're trying to build
> > > with MIT KfW (which is rather ancient these days, being based on krb5
> > > 1.13 and not updated since..) or with a more current release...?
> >
> > I'm currently using the KFW 4.1 build from MIT. I've tried building it
> > myself but it requires a very old toolchain (which defeated the point of
> > what I was trying to do at the time).
>
> > I haven't yet looked to see if the source for krb5-1.8.2 will build or
> even
> > has the right bits in it for Windows - as I'm sure you know MIT seem to
> > maintain an entirely different version for Windows for which I assume
> > there's a reason.
>
> I'm a bit confused as to why you'd consider trying 1.8.2- did you mean
> 1.18.2 there, perhaps..?


Yes, typo.


> That's what I would think to try, since, as I
> understand it from following the Kerberos Dev list (which is pretty
> responsive...), it has been updated to work with newer Windows build
> toolchains.
>

OK, will try to do that tomorrow.

Thanks!


>
> > > Of course, it'd be good to get a buildfarm animal in place that's
> > > actually testing this if we're going to make it work.
> >
> > Fixing the config on hamerkop should deal with that I think. Though I am
> > confused as to why the Buildfarm UI thinks it has Kerberos support
> enabled
> > - did we change the config parameter from krb5 to gss some time prior to
> > 9.5? If so, that could explain it.
>
> Looks to be run by SRA OSS..  Perhaps reaching out to them to ask about
> it would help?
>
> > > Regarding the setenv() call, should be able to use pgwin32_putenv() in
> > > place on Windows, I'd think..?
> >
> > Right, I imagine so. It's on my todo...
>
> Alright.
>
> Thanks,
>
> Stephen
>


-- 
Dave Page
Blog: http://pgsnake.blogspot.com
Twitter: @pgsnake

EDB: http://www.enterprisedb.com


Re: builtin functions, parameter names and psql's \df

2020-09-02 Thread Alvaro Herrera
On 2020-Sep-02, Tom Lane wrote:

> I don't think we should go overboard on this, but +1 for labeling all the
> cases where the usage isn't obvious.

+1

-- 
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: Kerberos support broken on MSVC builds for Windows x64?

2020-09-02 Thread Stephen Frost
Greetings,

* Dave Page (dp...@pgadmin.org) wrote:
> On Wed, Sep 2, 2020 at 9:05 AM Dave Page  wrote:
> >> Yes, that'd be in the GSSENC code, which I hadn't been expecting to be
> >> used under Windows.
>
> Here's a patch to make it build successfully (against head). I believe the
> changes to Solution.pm should be back patched all the way, and the rest to
> 12.

Looks about right..  I might pull out the code from both places that are
setting that variable into a dedicated function to be used from both
though.

> Testing however, has been more problematic - I suspect at least partly
> because of my Kerberos newbie-ness. I have a test server in an Ubuntu VM,
> which I've used quite successfully to authenticate against another VM
> running PG 12  on Ubuntu, from both Ubuntu and Windows clients. Using that,
> but with a Windows client running MIT Kerberos I find that getting a ticket
> takes a good 30 seconds or so. Postgres also seems to get its ticket
> successfully via the keytab file:

So, from Windows clients that don't have MIT KfW installed, you're able
to authenticate against PG 12 on Ubuntu using Kerberos, right..?  With
PG built using SSPI on the client side, I'm guessing?

Kerberos uses reverse DNS to try to check what hostname to use when
requesting a ticket, I wonder if what you're seeing here is a delay due
to there not being reverse DNS functional in the environment, perhaps..?

> C:\pg>"c:\Program Files\MIT\Kerberos\bin\klist.exe"
> Ticket cache: API:Initial default ccache
> Default principal: dp...@pgadmin.org
> 
> Valid starting ExpiresService principal
> 09/02/20 15:06:49  09/03/20 01:06:49  krbtgt/pgadmin@pgadmin.org
> renew until 09/03/20 15:06:31
> 09/02/20 15:07:06  09/03/20 01:06:49  postgres/win-ilt1arj8a9c@
> renew until 09/03/20 15:06:31
> 09/02/20 15:07:06  09/03/20 01:06:49  postgres/win-ilt1arj8...@pgadmin.org
> renew until 09/03/20 15:06:31
> 
> However, if I try to log in using host + gss in the pg_hba.conf file, I then
> get:
> 
> C:\pg>bin\psql postgres
> psql: error: could not connect to server: SSPI continuation error: No
> credentials are available in the security package
>  (8009030e)

This is with PG compiled with GSS on the client side and using MIT KfW?

This particular error from SSPI seems to possibly be coming from the
constrained delegation system.  While not directly about this issue,
Microsoft has some documentation about configuring constrained
delegation (and how to turn it off) here:

https://docs.microsoft.com/en-us/windows-server/virtualization/hyper-v/deploy/Set-up-hosts-for-live-migration-without-Failover-Clustering

Now, we aren't actually delegating credentials here, so it seems a bit
odd for it to be complaining about that, but perhaps it's throwing this
error because the MIT KfW library has no clue about constrained
delegation and therefore wouldn't be trying to enforce it.

> If I try to use hostgssenc + gss, it looks like it's not even trying to
> encrypt:
> 
> C:\pg>bin\psql postgres
> psql: error: could not connect to server: FATAL:  no pg_hba.conf entry for
> host "::1", user "dpage", database "postgres", SSL off
> 
> Any ideas?

If it's not trying then I would be suspicious that the
gss_acquire_creds() call is saying that there isn't a credential cache,
though that would be a bit odd given that klist seems to be working.

Would certainly be interesting to see if 1.18.2 changes anything in this
regard.

Thanks,

Stephen


signature.asc
Description: PGP signature


PostgreSQL 13 Release Timeline

2020-09-02 Thread Jonathan S. Katz
Hi,

First, on behalf of the PostgreSQL 13 RMT, thank you everyone for your
hard work not only building all of the features that are going into
PostgreSQL 13, but for the diligence and work in stabilizing the
release. We have now reached the point of the release cycle where we can
start making decisions about GA (general availability).

Based on the current state and progress of the open items[1], the RMT
with consultation from the release team has decided to forgo a Beta 4
release. Instead, we propose the following schedule:

* PostgreSQL 13 Release Candidate 1 (RC1) will be released on September
17, 2020.

* In absence of any critical issues, PostgreSQL 13 will become generally
available on September 24, 2020.

The aim is to have all outstanding open items committed before RC1.
Please ensure everything is committed by September 16, 2020 AoE. We will
send out a reminder prior to that date.

We're very excited that we are in the final phase of the PostgreSQL 13
launch. Thank you again for making this possible.

Sincerely,

Alvaro, Peter, Jonathan

[1] https://wiki.postgresql.org/wiki/PostgreSQL_13_Open_Items



signature.asc
Description: OpenPGP digital signature


Re: Kerberos support broken on MSVC builds for Windows x64?

2020-09-02 Thread Dave Page
Hi

On Wed, Sep 2, 2020 at 7:08 PM Stephen Frost  wrote:

> Greetings,
>
> * Dave Page (dp...@pgadmin.org) wrote:
> > On Wed, Sep 2, 2020 at 9:05 AM Dave Page  wrote:
> > >> Yes, that'd be in the GSSENC code, which I hadn't been expecting to be
> > >> used under Windows.
> >
> > Here's a patch to make it build successfully (against head). I believe
> the
> > changes to Solution.pm should be back patched all the way, and the rest
> to
> > 12.
>
> Looks about right..  I might pull out the code from both places that are
> setting that variable into a dedicated function to be used from both
> though.
>
> > Testing however, has been more problematic - I suspect at least partly
> > because of my Kerberos newbie-ness. I have a test server in an Ubuntu VM,
> > which I've used quite successfully to authenticate against another VM
> > running PG 12  on Ubuntu, from both Ubuntu and Windows clients. Using
> that,
> > but with a Windows client running MIT Kerberos I find that getting a
> ticket
> > takes a good 30 seconds or so. Postgres also seems to get its ticket
> > successfully via the keytab file:
>
> So, from Windows clients that don't have MIT KfW installed, you're able
> to authenticate against PG 12 on Ubuntu using Kerberos, right..?  With
> PG built using SSPI on the client side, I'm guessing?
>

Yes, with the workstation configured to authenticate windows login with
Kerberos (e.g.
https://www.garyhawkins.me.uk/non-domain-mit-kerberos-logins-on-windows-10/)


>
> Kerberos uses reverse DNS to try to check what hostname to use when
> requesting a ticket, I wonder if what you're seeing here is a delay due
> to there not being reverse DNS functional in the environment, perhaps..?
>

Ahh, probably. I'm just using host files on these VMs, but I'll bet I
forgot to add the client to the kdc's file. Will try that tomorrow.


>
> > C:\pg>"c:\Program Files\MIT\Kerberos\bin\klist.exe"
> > Ticket cache: API:Initial default ccache
> > Default principal: dp...@pgadmin.org
> >
> > Valid starting ExpiresService principal
> > 09/02/20 15:06:49  09/03/20 01:06:49  krbtgt/pgadmin@pgadmin.org
> > renew until 09/03/20 15:06:31
> > 09/02/20 15:07:06  09/03/20 01:06:49  postgres/win-ilt1arj8a9c@
> > renew until 09/03/20 15:06:31
> > 09/02/20 15:07:06  09/03/20 01:06:49  postgres/
> win-ilt1arj8...@pgadmin.org
> > renew until 09/03/20 15:06:31
> >
> > However, if I try to log in using host + gss in the pg_hba.conf file, I
> then
> > get:
> >
> > C:\pg>bin\psql postgres
> > psql: error: could not connect to server: SSPI continuation error: No
> > credentials are available in the security package
> >  (8009030e)
>
> This is with PG compiled with GSS on the client side and using MIT KfW?
>

Yes.


>
> This particular error from SSPI seems to possibly be coming from the
> constrained delegation system.  While not directly about this issue,
> Microsoft has some documentation about configuring constrained
> delegation (and how to turn it off) here:
>
>
> https://docs.microsoft.com/en-us/windows-server/virtualization/hyper-v/deploy/Set-up-hosts-for-live-migration-without-Failover-Clustering
>
> Now, we aren't actually delegating credentials here, so it seems a bit
> odd for it to be complaining about that, but perhaps it's throwing this
> error because the MIT KfW library has no clue about constrained
> delegation and therefore wouldn't be trying to enforce it.
>

OK, I'll look into that.


>
> > If I try to use hostgssenc + gss, it looks like it's not even trying to
> > encrypt:
> >
> > C:\pg>bin\psql postgres
> > psql: error: could not connect to server: FATAL:  no pg_hba.conf entry
> for
> > host "::1", user "dpage", database "postgres", SSL off
> >
> > Any ideas?
>
> If it's not trying then I would be suspicious that the
> gss_acquire_creds() call is saying that there isn't a credential cache,
> though that would be a bit odd given that klist seems to be working.
>
> Would certainly be interesting to see if 1.18.2 changes anything in this
> regard.
>

I'll let you know how that goes. Thanks for the tips!

-- 
Dave Page
Blog: http://pgsnake.blogspot.com
Twitter: @pgsnake

EDB: http://www.enterprisedb.com


Re: Allow continuations in "pg_hba.conf" files

2020-09-02 Thread Tom Lane
Fabien COELHO  writes:
> [ pg-hba-cont-4.patch ]

I looked this over and I think it seems reasonable, but there's
something else we should do.  If people are writing lines long
enough that they need to continue them, how long will it be
before they overrun the line length limit?  Admittedly, there's
a good deal of daylight between 80 characters and 8K, but if
we're busy removing restrictions on password length in an adjacent
thread [1], I think we ought to get rid of pg_hba.conf's line length
restriction while we're at it.

Accordingly, I borrowed some code from that thread and present
the attached revision.  I also added some test coverage, since
that was lacking before, and wordsmithed docs and comments slightly.

regards, tom lane

[1] 
https://www.postgresql.org/message-id/flat/09512C4F-8CB9-4021-B455-EF4C4F0D55A0%40amazon.com

diff --git a/doc/src/sgml/client-auth.sgml b/doc/src/sgml/client-auth.sgml
index 5cd88b462d..d62d1a061c 100644
--- a/doc/src/sgml/client-auth.sgml
+++ b/doc/src/sgml/client-auth.sgml
@@ -77,13 +77,15 @@
The general format of the pg_hba.conf file is
a set of records, one per line. Blank lines are ignored, as is any
text after the # comment character.
-   Records cannot be continued across lines.
+   A record can be continued onto the next line by ending the line with
+   a backslash. (Backslashes are not special except at the end of a line.)
A record is made
up of a number of fields which are separated by spaces and/or tabs.
Fields can contain white space if the field value is double-quoted.
Quoting one of the keywords in a database, user, or address field (e.g.,
all or replication) makes the word lose its special
meaning, and just match a database, user, or host with that name.
+   Backslash line continuation applies even within quoted text or comments.
   
 
   
@@ -821,7 +823,7 @@ local   db1,db2,@demodbs  all   md5
 
 map-name system-username database-username
 
-   Comments and whitespace are handled in the same way as in
+   Comments, whitespace and line continuations are handled in the same way as in
pg_hba.conf.  The
map-name is an arbitrary name that will be used to
refer to this mapping in pg_hba.conf. The other
diff --git a/src/backend/libpq/hba.c b/src/backend/libpq/hba.c
index 9d63830553..5991a21cf2 100644
--- a/src/backend/libpq/hba.c
+++ b/src/backend/libpq/hba.c
@@ -29,6 +29,7 @@
 #include "catalog/pg_collation.h"
 #include "catalog/pg_type.h"
 #include "common/ip.h"
+#include "common/string.h"
 #include "funcapi.h"
 #include "libpq/ifaddr.h"
 #include "libpq/libpq.h"
@@ -54,7 +55,6 @@
 
 
 #define MAX_TOKEN	256
-#define MAX_LINE	8192
 
 /* callback data for check_network_callback */
 typedef struct check_network_data
@@ -166,11 +166,19 @@ pg_isblank(const char c)
 /*
  * Grab one token out of the string pointed to by *lineptr.
  *
- * Tokens are strings of non-blank
- * characters bounded by blank characters, commas, beginning of line, and
- * end of line. Blank means space or tab. Tokens can be delimited by
- * double quotes (this allows the inclusion of blanks, but not newlines).
- * Comments (started by an unquoted '#') are skipped.
+ * Tokens are strings of non-blank characters bounded by blank characters,
+ * commas, beginning of line, and end of line.  Blank means space or tab.
+ *
+ * Tokens can be delimited by double quotes (this allows the inclusion of
+ * blanks or '#', but not newlines).  As in SQL, write two double-quotes
+ * to represent a double quote.
+ *
+ * Comments (started by an unquoted '#') are skipped, i.e. the remainder
+ * of the line is ignored.
+ *
+ * (Note that line continuation processing happens before tokenization.
+ * Thus, if a continuation occurs within quoted text or a comment, the
+ * quoted text or comment is considered to continue to the next line.)
  *
  * The token, if any, is returned at *buf (a buffer of size bufsz), and
  * *lineptr is advanced past the token.
@@ -470,6 +478,7 @@ static MemoryContext
 tokenize_file(const char *filename, FILE *file, List **tok_lines, int elevel)
 {
 	int			line_number = 1;
+	StringInfoData buf;
 	MemoryContext linecxt;
 	MemoryContext oldcxt;
 
@@ -478,47 +487,72 @@ tokenize_file(const char *filename, FILE *file, List **tok_lines, int elevel)
 	ALLOCSET_SMALL_SIZES);
 	oldcxt = MemoryContextSwitchTo(linecxt);
 
+	initStringInfo(&buf);
+
 	*tok_lines = NIL;
 
 	while (!feof(file) && !ferror(file))
 	{
-		char		rawline[MAX_LINE];
 		char	   *lineptr;
 		List	   *current_line = NIL;
 		char	   *err_msg = NULL;
+		int			last_backslash_buflen = 0;
+		int			continuations = 0;
 
-		if (!fgets(rawline, sizeof(rawline), file))
-		{
-			int			save_errno = errno;
+		/* Collect the next input line, handling backslash continuations */
+		resetStringInfo(&buf);
 
-			if (!ferror(file))
-break;			/* normal EOF */
-			/* I/O error! */
-			ereport(elevel,
-	(errcode_for_file_access(),
-	

Re: Fix for configure error in 9.5/9.6 on macOS 11.0 Big Sur

2020-09-02 Thread Jesse Zhang
Hi Peter,

Yeah it's funny I got this immediately after upgrading to Big Sur (beta
5). I found commit 1c0cf52b39ca3 but couldn't quite find the mailing
list thread on it from 4 years ago (it lists Heikki and Thomas Munro as
reviewers). Was it prompted by a similar error you encountered?

On Tue, Aug 25, 2020 at 6:57 AM Peter Eisentraut
 wrote:
>
> On 2020-08-02 23:18, Tom Lane wrote:
> > Thomas Gilligan  writes:
> >> Under the next version of macOS (11.0 unreleased beta 3), configuring 
> >> Postgres 9.5 and 9.6 fails with
> >
> >>> checking test program... ok
> >>> checking whether long int is 64 bits... no
> >>> checking whether long long int is 64 bits... no
> >>> configure: error: Cannot find a working 64-bit integer type.
> >
> > Hm, could we see the config.log output for this?  I'm not 100% convinced
> > that you've diagnosed the problem accurately, because it'd imply that
> > Apple made some fundamentally incompatible changes in libc, which
> > seems like stirring up trouble for nothing.
>
> It looks like the new compiler errors out on calling undeclared
> functions.  Might be good to see config.log to confirm this.

Yeah here's an excerpt from config.log verbatim (I'm not wrapping the
lines):

| configure:13802: checking whether long int is 64 bits
| configure:13860: ccache clang -o conftest -Wall -Wmissing-prototypes
-Wpointer-arith -Wdeclaration-after-statement -Wendif-labels
-Wmissing-format-attribute -Wformat-security -fno-strict-aliasing
-fwrapv -Wno-unused-command-line-argument -g -O0 conftest.c -lz
-lreadline -lm  >&5
| conftest.c:169:5: warning: no previous prototype for function
'does_int64_work' [-Wmissing-prototypes]
| int does_int64_work()
| ^
| conftest.c:169:1: note: declare 'static' if the function is not
intended to be used outside of this translation unit
| int does_int64_work()
| ^
| static
| conftest.c:183:1: warning: type specifier missing, defaults to 'int'
[-Wimplicit-int]
| main() {
| ^
| conftest.c:184:3: error: implicitly declaring library function
'exit' with type 'void (int) __attribute__((noreturn))'
[-Werror,-Wimplicit-function-declaration]
|   exit(! does_int64_work());
|   ^
| conftest.c:184:3: note: include the header  or explicitly
provide a declaration for 'exit'
| 2 warnings and 1 error generated.

Cheers,
Jesse




Re: Fix for configure error in 9.5/9.6 on macOS 11.0 Big Sur

2020-09-02 Thread Alvaro Herrera
On 2020-Sep-02, Jesse Zhang wrote:

> Hi Peter,
> 
> Yeah it's funny I got this immediately after upgrading to Big Sur (beta
> 5). I found commit 1c0cf52b39ca3 but couldn't quite find the mailing
> list thread on it from 4 years ago (it lists Heikki and Thomas Munro as
> reviewers). Was it prompted by a similar error you encountered?

https://postgr.es/m/bf9de63c-b669-4b8c-d33b-4a5ed11cd...@2ndquadrant.com

-- 
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: Fix for configure error in 9.5/9.6 on macOS 11.0 Big Sur

2020-09-02 Thread Jesse Zhang
Wow thanks Alvaro! My search of "most obvious keywords" didn't turn this
up.

On Wed, Sep 2, 2020 at 2:18 PM Alvaro Herrera wrote:
>
> On 2020-Sep-02, Jesse Zhang wrote:
>
> > Hi Peter,
> >
> > Yeah it's funny I got this immediately after upgrading to Big Sur (beta
> > 5). I found commit 1c0cf52b39ca3 but couldn't quite find the mailing
> > list thread on it from 4 years ago (it lists Heikki and Thomas Munro as
> > reviewers). Was it prompted by a similar error you encountered?
>
> https://postgr.es/m/bf9de63c-b669-4b8c-d33b-4a5ed11cd...@2ndquadrant.com

Cheers,
Jesse




[NBTREE] Possible NULL pointer dereference (backend/access/nbtree/nbutils.c)

2020-09-02 Thread Ranier Vilela
Hi,

It is possible that BTreeTupleSetNAtts leaves everything tidy, so that
BTreeTupleGetHeapTID doesn't fail.
BTreeTupleGetHeapTID can return NULL.

But, as we can see:
1. Line 2085 (nbtutils.c):
if (BTreeTupleGetHeapTID(itup) != NULL && tupnatts != nkeyatts)
2. Line 803 (nbtsearch.c):
if (heapTid == NULL)

Maybe it's better to make sure, because:
3. Line 2285 (nbtutils.c):
ItemPointerCopy(BTreeTupleGetMaxHeapTID(lastleft), pivotheaptid);
4. Line 2316 (nbtutils.c) :
ItemPointerCopy(BTreeTupleGetHeapTID(firstright), pivotheaptid);

Can dereference NULL pointer (pivotheaptid) at runtime (release version).

itemptr.h:
#define ItemPointerCopy(fromPointer, toPointer) \
( \
AssertMacro(PointerIsValid(toPointer)), \
AssertMacro(PointerIsValid(fromPointer)), \
*(toPointer) = *(fromPointer) \
)

regards,
Ranier Vilela


Re: [NBTREE] Possible NULL pointer dereference (backend/access/nbtree/nbutils.c)

2020-09-02 Thread Ranier Vilela
Also, in:
5. Line 2671 (nbtutils.c):
  ItemPointerGetBlockNumber(BTreeTupleGetHeapTID(newtup)),
  ItemPointerGetOffsetNumber(BTreeTupleGetHeapTID(newtup)),

itemptr.h:
/*
 * ItemPointerGetBlockNumberNoCheck
 * Returns the block number of a disk item pointer.
 */
#define ItemPointerGetBlockNumberNoCheck(pointer) \
( \
BlockIdGetBlockNumber(&(pointer)->ip_blkid) \
)

#define ItemPointerGetOffsetNumberNoCheck(pointer) \
( \
(pointer)->ip_posid \
)

regards,
Ranier Vilela


Re: [NBTREE] Possible NULL pointer dereference (backend/access/nbtree/nbutils.c)

2020-09-02 Thread Peter Geoghegan
On Wed, Sep 2, 2020 at 2:41 PM Ranier Vilela  wrote:
> Maybe, better make sure, because:
> 3. Line 2285 (nbtutils.c):
> ItemPointerCopy(BTreeTupleGetMaxHeapTID(lastleft), pivotheaptid);
> 4. Line 2316 (nbtutils.c) :
> ItemPointerCopy(BTreeTupleGetHeapTID(firstright), pivotheaptid);
>
> Can dereference NULL pointer (pivotheaptid) at runtime (release version).

The entire codepath in question exists to set a new pivot tuple's heap
TID, in the case where we have to include a heap TID in a new leaf
page high key. This is a tuple in palloc()'d memory that we ourselves
just created.

We know that BTreeTupleGetHeapTID() will return a valid heap TID
pointer (a pointer into the end of the new pivot tuple buffer) because
we just marked the pivot tuple as having space for one ourselves -- we
still completely own the tuple. While it's true that in general
BTreeTupleGetHeapTID() can return a NULL pointer, it does not matter
here. Even if BTreeTupleGetHeapTID() did somehow return a NULL
pointer, then the user would be getting off lightly by experiencing a
hard crash instead of data corruption.

You should spend more time (as in more than zero time) trying to
understand the intent of the code that you write these reports about.

-- 
Peter Geoghegan




Re: [NBTREE] Possible NULL pointer dereference (backend/access/nbtree/nbutils.c)

2020-09-02 Thread Ranier Vilela
> Even if BTreeTupleGetHeapTID() did somehow return a NULL
> pointer, then the user would be getting off lightly by experiencing a
> hard crash instead of data corruption.
>
Oh, I'm sorry, I thought that "hard crash" was a bad thing.

Ranier Vilela


Re: proposal - function string_to_table

2020-09-02 Thread Tom Lane
Pavel Stehule  writes:
> [ string_to_table-20200825.patch ]

I reviewed this, whacked it around a little, and pushed it.

Possibly the most controversial thing I did was to move the existing
documentation entry for string_to_array() into the string-functions
table.  I did not like it one bit that the patch was documenting
string_to_table() far away from string_to_array(), and on reflection
I concluded that you'd picked the right place and the issue here is
that string_to_array() was in the wrong place.

Also, I pared the proposed regression tests a great deal, ending up
with something that matches the existing tests for string_to_array().
The proposed tests seemed mighty duplicative, and they even contained
syntax errors, so I didn't believe that they were carefully considered.

regards, tom lane




Re: Allow CLUSTER, VACUUM FULL and REINDEX to change tablespace on the fly

2020-09-02 Thread Justin Pryzby
On Thu, Sep 03, 2020 at 12:00:17AM +0300, Alexey Kondratov wrote:
> On 2020-09-02 07:56, Justin Pryzby wrote:
> > 
> > Done in the attached, which is also rebased on 1d6541666.
> > 
> > And added RelationAssumeNewRelfilenode() as I mentioned - but I'm hoping
> > to
> > hear from Michael about any reason not to call
> > RelationSetNewRelfilenode()
> > instead of directly calling the things it would itself call.
> 
> The latest patch set immediately got a conflict with Michael's fixup
> 01767533, so I've rebased it first of all.

On my side, I've also rearranged function parameters to make the diff more
readable.  And squashed your changes into the respective patches.

Michael started a new thread about retiring ReindexStmt->concurrent, which I
guess will cause more conflicts (although I don't see why we wouldn't implement
a generic List grammar now rather than only after a preliminary patch).

-- 
Justin
>From 704d230463deca5303b0eae6c55e0e197c6fa473 Mon Sep 17 00:00:00 2001
From: Justin Pryzby 
Date: Fri, 27 Mar 2020 17:50:46 -0500
Subject: [PATCH v25 1/6] Change REINDEX/CLUSTER to accept an option list..

..like EXPLAIN (..), VACUUM (..), and ANALYZE (..).

Change docs in the style of VACUUM.  See also: 52dcfda48778d16683c64ca4372299a099a15b96
---
 doc/src/sgml/ref/cluster.sgml| 27 +---
 doc/src/sgml/ref/reindex.sgml| 43 +---
 src/backend/commands/cluster.c   | 23 -
 src/backend/commands/indexcmds.c | 41 --
 src/backend/nodes/copyfuncs.c|  2 ++
 src/backend/nodes/equalfuncs.c   |  2 ++
 src/backend/parser/gram.y| 33 +---
 src/backend/tcop/utility.c   | 36 +++---
 src/bin/psql/tab-complete.c  | 22 
 src/include/commands/cluster.h   |  3 ++-
 src/include/commands/defrem.h|  7 +++---
 src/include/nodes/parsenodes.h   |  2 ++
 12 files changed, 175 insertions(+), 66 deletions(-)

diff --git a/doc/src/sgml/ref/cluster.sgml b/doc/src/sgml/ref/cluster.sgml
index 4da60d8d56..e6ebce27e6 100644
--- a/doc/src/sgml/ref/cluster.sgml
+++ b/doc/src/sgml/ref/cluster.sgml
@@ -21,8 +21,13 @@ PostgreSQL documentation
 
  
 
-CLUSTER [VERBOSE] table_name [ USING index_name ]
-CLUSTER [VERBOSE]
+CLUSTER [VERBOSE] [ ( option [, ...] ) ] table_name [ USING index_name ]
+CLUSTER [VERBOSE] [ ( option [, ...] ) ]
+
+where option can be one of:
+
+VERBOSE [ boolean ]
+
 
  
 
@@ -81,6 +86,15 @@ CLUSTER [VERBOSE]
   Parameters
 
   
+   
+VERBOSE
+
+ 
+  Prints a progress report as each table is clustered.
+ 
+
+   
+

 table_name
 
@@ -100,10 +114,15 @@ CLUSTER [VERBOSE]

 

-VERBOSE
+boolean
 
  
-  Prints a progress report as each table is clustered.
+  Specifies whether the selected option should be turned on or off.
+  You can write TRUE, ON, or
+  1 to enable the option, and FALSE,
+  OFF, or 0 to disable it.  The
+  boolean value can also
+  be omitted, in which case TRUE is assumed.
  
 

diff --git a/doc/src/sgml/ref/reindex.sgml b/doc/src/sgml/ref/reindex.sgml
index c16f223e4e..a32e192a87 100644
--- a/doc/src/sgml/ref/reindex.sgml
+++ b/doc/src/sgml/ref/reindex.sgml
@@ -25,7 +25,7 @@ REINDEX [ ( option [, ...] ) ] { IN
 
 where option can be one of:
 
-VERBOSE
+VERBOSE [ boolean ]
 
  
 
@@ -141,19 +141,6 @@ REINDEX [ ( option [, ...] ) ] { IN
 

 
-   
-name
-
- 
-  The name of the specific index, table, or database to be
-  reindexed.  Index and table names can be schema-qualified.
-  Presently, REINDEX DATABASE and REINDEX SYSTEM
-  can only reindex the current database, so their parameter must match
-  the current database's name.
- 
-
-   
-

 CONCURRENTLY
 
@@ -181,6 +168,34 @@ REINDEX [ ( option [, ...] ) ] { IN
  
 

+
+   
+name
+
+ 
+  The name of the specific index, table, or database to be
+  reindexed.  Index and table names can be schema-qualified.
+  Presently, REINDEX DATABASE and REINDEX SYSTEM
+  can only reindex the current database, so their parameter must match
+  the current database's name.
+ 
+
+   
+
+   
+boolean
+
+ 
+  Specifies whether the selected option should be turned on or off.
+  You can write TRUE, ON, or
+  1 to enable the option, and FALSE,
+  OFF, or 0 to disable it.  The
+  boolean value can also
+  be omitted, in which case TRUE is assumed.
+ 
+
+   
+
   
  
 
diff --git a/src/backend/commands/cluster.c b/src/backend/commands/cluster.c
index 0d647e912c..a42dfe98f4 100644
--- a/src/backend/commands/cluster.c
+++ b/src/backend/commands/cluster.c
@@ -35,6 +35,7 @@
 #include "catalog/pg_am.h"
 #include "catalog/toasting.h"
 #include "commands/cluster.h"
+#include "commands/defrem.h"
 #include "commands/progress.h"
 #include "commands/tablecmds.h"
 #i

Re: [NBTREE] Possible NULL pointer dereference (backend/access/nbtree/nbutils.c)

2020-09-02 Thread Peter Geoghegan
On Wed, Sep 2, 2020 at 3:16 PM Ranier Vilela  wrote:
> Oh, I'm sorry, I thought that "hard crash" was a bad thing.

I think that you are being sarcastic here, but just in case I'm wrong
I'll clarify what I meant: a good sign that a static analysis tool has
produced a useless complaint is that it appears to prove something far
stronger than you first thought possible. You should look out for
that. This is just a heuristic, but in practice it is a good one.
Perhaps you recall our discussion of a similar false positive in
nbtsplitloc.c; that had a similar feel to it. For example, if your
static analysis tool says that code that has been around for many
years is riddled with bugs, then the problem is almost certainly with
the tool (or with how the tool has been used).

High performance C code very often relies on representational
invariants that may not be readily apparent to a compiler --
especially when dealing with on-disk state. Barring some obvious
problem like a hard crash, you have to do the work of assessing if the
code is correct based on the actual assumptions/context of the code. A
static analysis tool is of very little use if you're not going to put
the work in. I went to a lot of trouble to clearly document the code
in question here (the heap TID stuff in _bt_truncate()) -- did you
even bother to read those comments once?

-- 
Peter Geoghegan




Re: Two fsync related performance issues?

2020-09-02 Thread Thomas Munro
On Wed, May 27, 2020 at 12:31 AM Craig Ringer  wrote:
> On Tue, 12 May 2020, 08:42 Paul Guo,  wrote:
>> 1. StartupXLOG() does fsync on the whole data directory early in the crash 
>> recovery. I'm wondering if we could skip some directories (at least the 
>> pg_log/, table directories) since wal, etc could ensure consistency. Here is 
>> the related code.
>>
>>   if (ControlFile->state != DB_SHUTDOWNED &&
>>   ControlFile->state != DB_SHUTDOWNED_IN_RECOVERY)
>>   {
>>   RemoveTempXlogFiles();
>>   SyncDataDirectory();
>>   }
>
> This would actually be a good candidate for a thread pool. Dispatch sync 
> requests and don't wait. Come back later when they're done.
>
> Unsure if that's at all feasible given that pretty much all the Pg APIs 
> aren't thread safe though. No palloc, no elog/ereport, etc. However I don't 
> think we're ready to run bgworkers or use shm_mq etc at that stage.

We could run auxiliary processes.  I think it'd be very useful if we
had a general purpose worker pool that could perform arbitrary tasks
and could even replace current and future dedicated launcher and
worker processes, but in this case I think the problem is quite
closely linked to infrastructure that we already have.  I think we
should:

1.  Run the checkpointer during crash recovery (see
https://commitfest.postgresql.org/29/2706/).
2.  In walkdir(), don't call stat() on all the files, so that we don't
immediately fault in all 42 bazillion inodes synchronously on a cold
system (see 
https://www.postgresql.org/message-id/flat/CA%2BhUKG%2BFzxupGGN4GpUdbzZN%2Btn6FQPHo8w0Q%2BAPH5Wz8RG%2Bww%40mail.gmail.com).
3.  In src/common/relpath.c, add a function ParseRelationPath() that
does the opposite of GetRelationPath().
4.  In datadir_fsync_fname(), if ParseRelationPath() is able to
recognise a file as being a relation file, build a FileTag and call
RegisterSyncRequest() to tell the checkpointer to sync this file as
part of the end checkpoint (currently the end-of-recovery checkpoint,
but that could also be relaxed).

There are a couple of things I'm not sure about though, which is why I
don't have a patch for 3 and 4:
1.  You have to move a few things around to avoid hideous modularity
violations: it'd be weird if fd.c knew how to make md.c's sync
requests.  So you'd need to find a new place to put the crash-recovery
data-dir-sync routine, but then it can't use walkdir().
2.  I don't know how to fit the pre_sync_fname() part into this
scheme.  Or even if you still need it: if recovery is short, it
probably doesn't help much, and if it's long then the kernel is likely
to have started writing back before the checkpoint anyway due to dirty
writeback timer policies.  On the other hand, it'd be nice to start
the work of *opening* the files sooner than the start of the
checkpoint, if cold inode access is slow, so perhaps a little bit more
infrastructure is needed; a new way to queue a message for the
checkpointer that says "hey, why don't you start presyncing stuff".
On the third hand, it's arguably better to wait for more pages to get
dirtied by recovery before doing any pre-sync work anyway, because the
WAL will likely be redirtying the same pages again and we'd ideally not
like to write our data out twice, which is one of the reasons to want
to collapse the work into the next checkpoint.  So I'm not sure what
the best plan is here.

As I mentioned earlier, I think it's also possible to do smarter
analysis based on WAL information so that we don't even need to open
all 42 bazillion files, but just the ones touched since the last
checkpoint, if you're prepared to ignore the
previously-running-with-fsync=off scenario Robert mentioned.  I'm not
too sure about that.  But something like the above scheme would at
least get some more concurrency going, without changing the set of
hazards we believe our scheme protects against (I mean, you could
argue that SyncDataDirectory() is a bit like using a sledgehammer to
crack an unspecified nut, and then not even quite hitting it if it's a
Linux nut, but I'm trying to practise kaizen here).




Re: list of extended statistics on psql

2020-09-02 Thread Tatsuro Yamada

Hi,


 >> I wonder if trying to list info about all stats from the statistics
 >> object in a single line is necessary. Maybe we should split the info
 >> into one line per statistics, so for example
 >>
 >>     CREATE STATISTICS s (mcv, ndistinct, dependencies) ON ...
 >>
 >> would result in three lines in the \dX output. The statistics name would
 >> identify which lines belong together, but other than that the pieces are
 >> mostly independent.
 >
 >Yeah, that's what I'm suggesting.  I don't think we need to repeat the
 >name/definition for each line though.
 >
 >It might be useful to know how does pspg show a single entry that's
 >split in three lines, though.
 >

Ah, I didn't realize you're proposing that - I assumed it's broken
simply to make it readable, or something like that. I think the lines
are mostly independent, so I'd suggest to include the name of the object
on each line. The question is whether this independence will remain true
in the future - for example histograms would be built only on data not
represented by the MCV list, so there's a close dependency there.

Not sure about pspg, and I'm not sure it matters too much.


pspg almost ignores multiline rows - the horizontal cursor covers one row at a
time. There is only one use case where pspg detects multiline rows - sorting -
and there pspg ensures correct content for multiline rows displayed in a
different order than the input.




Let me try to summarize the discussion so far.
Is my understanding right? Please correct it if anything is wrong.


* Summary

  1. "\dX[+]" doesn't display the Size of extended stats since the size is
  useful only for the development process of the stats.

  2. each row should have stats name, definition, type, and status.
 For example:

 statname  | definition        | type
-----------+-------------------+-----------------------------
 someobj   | (a, b) FROM tab   | n-distinct: built
 someobj   | (a, b) FROM tab   | func-dependencies: built
 someobj   | (a, b) FROM tab   | mcv: built
 sttshoge  | (a, b) FROM hoge  | n-distinct: required
 sttshoge  | (a, b) FROM hoge  | func-dependencies: required
 sttscross | (a, b) FROM t1,t2 | n-distinct: required


My opinion is below:

  For 1, agreed. I will remove it in the next patch.
  For 2, I feel the design is not beautiful, so I'd like to change it.
The reasons are:

- I think that even if we expect the number of types to double,
   each type would be better placed in its own column, not on separate lines.
  Repeating items (the stats name and definition) should be removed.
  It's okay if there are many columns in the future, as with "\df+", because
  we can use "\x" mode to display them if we need to.

- The type column holds two kinds of data: the stats type and its status.
  We know the principle "one fact in one place" from data modeling in
  the RDBMS world, so it would be better to split them.
  I'd like to suggest the design of the view below.

 statname  | definition        | n-distinct | func-dependencies | mcv
-----------+-------------------+------------+-------------------+-------
 someobj   | (a, b) FROM tab   | built      | built             | built
 sttshoge  | (a, b) FROM hoge  | required   | required          |
 sttscross | (a, b) FROM t1,t2 | required   |                   |


Any thoughts?


Thanks,
Tatsuro Yamada







Re: Two fsync related performance issues?

2020-09-02 Thread Thomas Munro
On Tue, May 12, 2020 at 12:43 PM Paul Guo  wrote:
> 2.  CheckPointTwoPhase()
>
> This may be a small issue.
>
> See the code below,
>
> for (i = 0; i < TwoPhaseState->numPrepXacts; i++)
> RecreateTwoPhaseFile(gxact->xid, buf, len);
>
> RecreateTwoPhaseFile() writes a state file for a prepared transaction and 
> does fsync. It might be good to do fsync for all files once after writing 
> them, given the kernel is able to do asynchronous flush when writing those 
> file contents. If the TwoPhaseState->numPrepXacts is large we could do 
> batching to avoid the fd resource limit. I did not test them yet but this 
> should be able to speed up checkpoint/restartpoint a bit.

Hi Paul,

I hadn't previously focused on this second part of your email.  I
think the fsync() call in RecreateTwoPhaseFile() might be a candidate
for processing by the checkpoint code through the new facilities in
sync.c, which effectively does something like what you describe.  Take
a look at the code I'm proposing for slru.c in
https://commitfest.postgresql.org/29/2669/ and also in md.c.  You'd
need a way to describe the path of these files in a FileTag struct, so
you can defer the work; it will invoke your callback to sync the file
as part of the next checkpoint (or panic if it can't).  You also need
to make sure to tell it to forget the sync request before you unlink
the file.  Although it still has to call fsync one-by-one on the files
at checkpoint time, by deferring it until then there is a good chance
the kernel has already done the work so you don't have to go off-CPU
at all. I hope that means we'd often never have to perform the fsync
at all, because the file is usually gone before we reach the
checkpoint.  Do I have that right?
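
Something vaguely like this, perhaps (a sketch only -- there is no
SYNC_HANDLER_TWOPHASE today, so the handler, the callback and the tag layout
are all made up here):

    /* in twophase.c, instead of fsync'ing in RecreateTwoPhaseFile(): */
    static void
    register_twophase_sync_request(TransactionId xid, SyncRequestType type)
    {
        FileTag     tag;

        memset(&tag, 0, sizeof(tag));
        tag.handler = SYNC_HANDLER_TWOPHASE;    /* hypothetical new handler */
        tag.segno = xid;                        /* enough to rebuild the path */

        RegisterSyncRequest(&tag, type, true);
    }

    /* after writing a state file */
    register_twophase_sync_request(xid, SYNC_REQUEST);

    /* and before unlinking it, so the checkpointer forgets about it */
    register_twophase_sync_request(xid, SYNC_FORGET_REQUEST);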




Re: Disk-based hash aggregate's cost model

2020-09-02 Thread Jeff Davis
On Sun, 2020-08-30 at 17:03 +0200, Tomas Vondra wrote:
> So I'm wondering if the hashagg is not ignoring similar non-I/O costs
> for the spilling case. In particular, the initial section computing
> startup_cost seems to ignore that we may need to do some of the stuff
> repeatedly - for example we'll repeat hash lookups for spilled
> tuples,
> and so on.

I tried to analyze this using a slightly different approach: cost units
per second of runtime. Obviously this will vary based on the machine,
but it's interesting anyway.

All of the following fit in system memory. Schema, data, and queries
are at the end of this email.

A low value of cost-units/second-runtime means "more likely to be
chosen incorrectly" and a high value means "more likely to be missed
incorrectly".

Plan    | work_mem | 10M  | 100M INT4    | 100M | 10M
        |          | INT4 | (10M groups) | INT4 | TEXT
--------+----------+------+--------------+------+------
HashAgg | 4MB      |  88  |  63          |  82  |  78
HashAgg | 1TB      |  41  |  37          |  33  |  38
Sort    | 4MB      | 182  | 188          | 174  |  37
Sort    | 1TB      | 184  | 231          | 189  |  30


Sort is consistent for a given datatype, but it seems that the
comparison cost is far from proportionate between data types.

HashAgg is consistent between the data types, but the disk costs play a
larger role (in this case, there is no actual disk access to worry
about, because it fits in system memory).

At least for these simple queries, Sort is punished unfairly for INT4,
but gets an unfair boost for TEXT.

It seems that we should make a change here, but to be conservative for
13, we should limit changes to the new plans, which are the first line
(HashAgg that spills).

The attached patch makes two adjustments: a 2X disk penalty for
HashAgg, and I also add:

  spill_cost = depth * input_tuples * 2.0 * cpu_tuple_cost
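
For scale: with depth = 1, 10M input tuples, and the default cpu_tuple_cost of
0.01, that adds 1 * 10,000,000 * 2.0 * 0.01 = 200,000 cost units on top of the
doubled page costs.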

The new results:

Plan    | work_mem | 10M  | 100M INT4    | 100M | 10M
        |          | INT4 | (10M groups) | INT4 | TEXT
--------+----------+------+--------------+------+------
HashAgg | 4MB      | 192  | 131          | 178  | 166


That's much more comparable to Sort for INT4, but makes the gap wider
for TEXT. Fortunately, at least for my simple queries, it just barely
avoids a plan change to Sort for the TEXT case (which is nearly 5X
slower than HashAgg).

I think this approach is reasonable for 13: it only changes the costing
for HashAgg plans that are expected to spill, it's fairly conservative
so we will not get lots of new bad plans, and it still seems to use
HashAgg for cases where it clearly should.

Note: in-memory HashAgg is still unfairly favored over Sort, at least
for these cases.

Regards,
Jeff Davis


create table t10m(i int);
insert into t10m select (random()*10)::int from
generate_series(1,1000);
explain analyze select distinct i from t10m;

create table t100m(i int);
insert into t100m select (random()*20)::int from
generate_series(1,1);
explain analyze select distinct i from t100m;

-- 100m tuples, 10m groups, 10tuples/group
create table t100m_10(i int);
insert into t100m_10 select (random()*1000)::int from
generate_series(1,1);
explain analyze select distinct i from t100m_10;

create table text10m(t text collate "C.UTF-8", i int, n numeric);
insert into text10m select s.g::text, s.g, s.g::numeric from (select
(random()*10)::int as g from generate_series(1,1000)) s;
explain analyze select distinct t from text10m;


diff --git a/src/backend/optimizer/path/costsize.c b/src/backend/optimizer/path/costsize.c
index 104e779f6ac..f39e6a9f80d 100644
--- a/src/backend/optimizer/path/costsize.c
+++ b/src/backend/optimizer/path/costsize.c
@@ -2416,6 +2416,7 @@ cost_agg(Path *path, PlannerInfo *root,
 		double		pages;
 		double		pages_written = 0.0;
 		double		pages_read = 0.0;
+		double		spill_cost;
 		double		hashentrysize;
 		double		nbatches;
 		Size		mem_limit;
@@ -2453,9 +2454,21 @@ cost_agg(Path *path, PlannerInfo *root,
 		pages = relation_byte_size(input_tuples, input_width) / BLCKSZ;
 		pages_written = pages_read = pages * depth;
 
+		/*
+		 * HashAgg has somewhat worse IO behavior than Sort on typical
+		 * hardware/OS combinations. Account for this with a generic penalty.
+		 */
+		pages_read *= 2.0;
+		pages_written *= 2.0;
+
 		startup_cost += pages_written * random_page_cost;
 		total_cost += pages_written * random_page_cost;
 		total_cost += pages_read * seq_page_cost;
+
+		/* account for CPU cost of spilling a tuple and reading it back */
+		spill_cost = depth * input_tuples * 2.0 * cpu_tuple_cost;
+		startup_cost += spill_cost;
+		total_cost += spill_cost;
 	}
 
 	/*


Re: Disk-based hash aggregate's cost model

2020-09-02 Thread Peter Geoghegan
On Wed, Sep 2, 2020 at 5:18 PM Jeff Davis  wrote:
> create table text10m(t text collate "C.UTF-8", i int, n numeric);
> insert into text10m select s.g::text, s.g, s.g::numeric from (select
> (random()*10)::int as g from generate_series(1,1000)) s;
> explain analyze select distinct t from text10m;

Note that you won't get what Postgres considers to be the C collation
unless you specify "collate C" -- "C.UTF-8" is the C collation exposed
by glibc. The difference matters a lot, because only the former can
use abbreviated keys (unless you manually #define TRUST_STRXFRM). And
even without abbreviated keys it's probably still significantly faster
for other reasons.

This doesn't undermine your point, because we don't take the
difference into account in cost_sort() -- even though abbreviated keys
will regularly make text sorts 2x-3x faster. My point is only that it
would be more accurate to say that the costing unfairly boosts sorts
on collated texts specifically. Though maybe not when an ICU collation
is used (since abbreviated keys will be enabled generally).

-- 
Peter Geoghegan




Re: describe-config issue

2020-09-02 Thread vignesh C
On Wed, Sep 2, 2020 at 8:06 PM Tom Lane  wrote:

> Please defend that claim.  Otherwise this seems like a pretty
> random change.

I had seen that there is a discrepancy between postgres --describe-config and
the value displayed from pg_settings, as in the case below:
postgres=# select name,min_val, max_val, boot_val,reset_val from
pg_settings where name = 'checkpoint_timeout';
        name        | min_val | max_val | boot_val | reset_val
--------------------+---------+---------+----------+-----------
 checkpoint_timeout | 30      | 86400   | 300      | 300
(1 row)

[vignesh@localhost bin]$ ./postgres --describe-config | grep
checkpoint_timeout
checkpoint_timeout sighup Write-Ahead Log / Checkpoints INTEGER 0 30
86400 Sets the maximum time between automatic WAL checkpoints.

In the case of pg_settings we display 300 for boot_val/reset_val whereas in
the case of describe-config we display 0; shouldn't it be 300 here?
Thoughts?

 Regards,
Vignesh
EnterpriseDB: http://www.enterprisedb.com


Re: proposal - function string_to_table

2020-09-02 Thread Peter Smith
On Thu, Sep 3, 2020 at 8:30 AM Tom Lane  wrote:
> The proposed tests seemed mighty duplicative, and they even contained
> syntax errors, so I didn't believe that they were carefully considered.

Can you please share examples of what syntax errors were in those
previous tests?

Kind Regards,
Peter Smith.
Fujitsu Australia




Re: proposal - function string_to_table

2020-09-02 Thread Tom Lane
Peter Smith  writes:
> On Thu, Sep 3, 2020 at 8:30 AM Tom Lane  wrote:
>> The proposed tests seemed mighty duplicative, and they even contained
>> syntax errors, so I didn't believe that they were carefully considered.

> Can you please share examples of what syntax errors were in those
> previous tests?

At about line 415 of string_to_table-20200825.patch:

+select v, v is null as "is null" from string_to_table('1,2,3,4,,6', ',') g(v) 
g(v);
+ERROR:  syntax error at or near "g"
+LINE 1: ..."is null" from string_to_table('1,2,3,4,,6', ',') g(v) g(v);
+  ^

Without the duplicate "g(v)", this is identical to the preceding test
case.

regards, tom lane




Re: Refactor ReindexStmt and its "concurrent" boolean

2020-09-02 Thread Michael Paquier
On Wed, Sep 02, 2020 at 01:17:32PM +0200, Julien Rouhaud wrote:
> Shouldn't options be an int?  The rest of the patch looks good to me.

It should, thanks for looking at it.  Let's wait a couple of days and
see if others have any comments.  If there are no objections, I'll try
to commit this one.
--
Michael
diff --git a/src/include/commands/defrem.h b/src/include/commands/defrem.h
index c26a102b17..3129b684f6 100644
--- a/src/include/commands/defrem.h
+++ b/src/include/commands/defrem.h
@@ -34,10 +34,10 @@ extern ObjectAddress DefineIndex(Oid relationId,
  bool check_not_in_use,
  bool skip_build,
  bool quiet);
-extern void ReindexIndex(RangeVar *indexRelation, int options, bool concurrent);
-extern Oid	ReindexTable(RangeVar *relation, int options, bool concurrent);
+extern void ReindexIndex(RangeVar *indexRelation, int options);
+extern Oid	ReindexTable(RangeVar *relation, int options);
 extern void ReindexMultipleTables(const char *objectName, ReindexObjectType objectKind,
-  int options, bool concurrent);
+  int options);
 extern char *makeObjectName(const char *name1, const char *name2,
 			const char *label);
 extern char *ChooseRelationName(const char *name1, const char *name2,
diff --git a/src/include/nodes/parsenodes.h b/src/include/nodes/parsenodes.h
index d52c563305..e83329fd6d 100644
--- a/src/include/nodes/parsenodes.h
+++ b/src/include/nodes/parsenodes.h
@@ -3353,6 +3353,7 @@ typedef struct ConstraintsSetStmt
 #define REINDEXOPT_VERBOSE (1 << 0) /* print progress info */
 #define REINDEXOPT_REPORT_PROGRESS (1 << 1) /* report pgstat progress */
 #define REINDEXOPT_MISSING_OK (1 << 2)	/* skip missing relations */
+#define REINDEXOPT_CONCURRENTLY (1 << 3)	/* concurrent mode */
 
 typedef enum ReindexObjectType
 {
@@ -3371,7 +3372,6 @@ typedef struct ReindexStmt
 	RangeVar   *relation;		/* Table or index to reindex */
 	const char *name;			/* name of database to reindex */
 	int			options;		/* Reindex options flags */
-	bool		concurrent;		/* reindex concurrently? */
 } ReindexStmt;
 
 /* --
diff --git a/src/backend/commands/indexcmds.c b/src/backend/commands/indexcmds.c
index b3a92381f9..9e1566812d 100644
--- a/src/backend/commands/indexcmds.c
+++ b/src/backend/commands/indexcmds.c
@@ -97,7 +97,7 @@ static bool CompareOpclassOptions(Datum *opts1, Datum *opts2, int natts);
  */
 struct ReindexIndexCallbackState
 {
-	bool		concurrent;		/* flag from statement */
+	int			options;			/* options from statement */
 	Oid			locked_table_oid;	/* tracks previously locked table */
 };
 
@@ -2420,7 +2420,7 @@ ChooseIndexColumnNames(List *indexElems)
  *		Recreate a specific index.
  */
 void
-ReindexIndex(RangeVar *indexRelation, int options, bool concurrent)
+ReindexIndex(RangeVar *indexRelation, int options)
 {
 	struct ReindexIndexCallbackState state;
 	Oid			indOid;
@@ -2437,10 +2437,11 @@ ReindexIndex(RangeVar *indexRelation, int options, bool concurrent)
 	 * upgrade the lock, but that's OK, because other sessions can't hold
 	 * locks on our temporary table.
 	 */
-	state.concurrent = concurrent;
+	state.options = options;
 	state.locked_table_oid = InvalidOid;
 	indOid = RangeVarGetRelidExtended(indexRelation,
-	  concurrent ? ShareUpdateExclusiveLock : AccessExclusiveLock,
+	  (options & REINDEXOPT_CONCURRENTLY) != 0 ?
+	  ShareUpdateExclusiveLock : AccessExclusiveLock,
 	  0,
 	  RangeVarCallbackForReindexIndex,
 	  &state);
@@ -2460,7 +2461,8 @@ ReindexIndex(RangeVar *indexRelation, int options, bool concurrent)
 	persistence = irel->rd_rel->relpersistence;
 	index_close(irel, NoLock);
 
-	if (concurrent && persistence != RELPERSISTENCE_TEMP)
+	if ((options & REINDEXOPT_CONCURRENTLY) != 0 &&
+		persistence != RELPERSISTENCE_TEMP)
 		ReindexRelationConcurrently(indOid, options);
 	else
 		reindex_index(indOid, false, persistence,
@@ -2485,7 +2487,8 @@ RangeVarCallbackForReindexIndex(const RangeVar *relation,
 	 * non-concurrent case and table locks used by index_concurrently_*() for
 	 * concurrent case.
 	 */
-	table_lockmode = state->concurrent ? ShareUpdateExclusiveLock : ShareLock;
+	table_lockmode = ((state->options & REINDEXOPT_CONCURRENTLY) != 0) ?
+		ShareUpdateExclusiveLock : ShareLock;
 
 	/*
 	 * If we previously locked some other index's heap, and the name we're
@@ -2542,7 +2545,7 @@ RangeVarCallbackForReindexIndex(const RangeVar *relation,
  *		Recreate all indexes of a table (and of its toast table, if any)
  */
 Oid
-ReindexTable(RangeVar *relation, int options, bool concurrent)
+ReindexTable(RangeVar *relation, int options)
 {
 	Oid			heapOid;
 	bool		result;
@@ -2556,11 +2559,13 @@ ReindexTable(RangeVar *relation, int options, bool concurrent)
 	 * locks on our temporary table.
 	 */
 	heapOid = RangeVarGetRelidExtended(relation,
-	   concurrent ? ShareUpdateExclusiveLock : ShareLock,
+	   (options & REINDEXOPT_CONCURRENTLY) != 0 ?
+	   ShareUpdateEx

Re: proposal - function string_to_table

2020-09-02 Thread Pavel Stehule
čt 3. 9. 2020 v 0:30 odesílatel Tom Lane  napsal:

> Pavel Stehule  writes:
> > [ string_to_table-20200825.patch ]
>
> I reviewed this, whacked it around a little, and pushed it.
>
> Possibly the most controversial thing I did was to move the existing
> documentation entry for string_to_array() into the string-functions
> table.  I did not like it one bit that the patch was documenting
> string_to_table() far away from string_to_array(), and on reflection
> I concluded that you'd picked the right place and the issue here is
> that string_to_array() was in the wrong place.
>
> Also, I pared the proposed regression tests a great deal, ending up
> with something that matches the existing tests for string_to_array().
> The proposed tests seemed mighty duplicative, and they even contained
> syntax errors, so I didn't believe that they were carefully considered.
>

Thank you

Pavel


> regards, tom lane
>


Re: Fix for configure error in 9.5/9.6 on macOS 11.0 Big Sur

2020-09-02 Thread Peter Eisentraut

On 2020-09-02 22:43, Jesse Zhang wrote:

| conftest.c:184:3: error: implicitly declaring library function
'exit' with type 'void (int) __attribute__((noreturn))'
[-Werror,-Wimplicit-function-declaration]
|   exit(! does_int64_work());
|   ^
| conftest.c:184:3: note: include the header  or explicitly
provide a declaration for 'exit'
| 2 warnings and 1 error generated.


Where did the -Werror come from?

--
Peter Eisentraut  http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: A micro-optimisation for walkdir()

2020-09-02 Thread Thomas Munro
On Thu, Sep 3, 2020 at 3:52 AM Juan José Santamaría Flecha
 wrote:
> On Wed, Sep 2, 2020 at 4:35 PM Tom Lane  wrote:
>> Thomas Munro  writes:
>> > You don't need to call stat() just to find out if a dirent is a file
>> > or directory, most of the time.  Please see attached.
>>
>> Hm.  If we do this, I can see wanting to apply the knowledge in more
>> places than walkdir().

Good idea.  Here's a new version that defines a new function
get_dirent_type() in src/common/file_utils_febe.c and uses it for both
frontend and backend walkdir().

> Win32 could also benefit from this micro-optimisation if we expanded the 
> dirent port to include d_type. Please find attached a patch that does so.

Is GetFileAttributes() actually faster than stat()?  Oh, I see that
our pgwin32_safestat() makes extra system calls.  Huh.  Ok then.
Thanks!
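
For anyone unfamiliar with the trick: many filesystems report the entry type
directly in struct dirent's d_type, so stat()/lstat() is only needed when the
kernel says DT_UNKNOWN (or the field doesn't exist at all).  A standalone
POSIX-ish sketch of that pattern, separate from the patch below:

#include <dirent.h>
#include <stdio.h>
#include <string.h>
#include <sys/stat.h>

/* Classify one directory entry, falling back to lstat() only when needed. */
static void
classify(const char *dirpath, const struct dirent *de)
{
	char		path[4096];
	struct stat st;

	snprintf(path, sizeof(path), "%s/%s", dirpath, de->d_name);

#ifdef DT_UNKNOWN
	if (de->d_type == DT_REG)
	{
		printf("%s: regular file (from d_type)\n", path);
		return;
	}
	if (de->d_type == DT_DIR)
	{
		printf("%s: directory (from d_type)\n", path);
		return;
	}
	if (de->d_type != DT_UNKNOWN)
	{
		printf("%s: other (from d_type)\n", path);
		return;
	}
#endif
	/* d_type unavailable or unknown: pay for the lstat() call. */
	if (lstat(path, &st) < 0)
		perror(path);
	else if (S_ISREG(st.st_mode))
		printf("%s: regular file (from lstat)\n", path);
	else if (S_ISDIR(st.st_mode))
		printf("%s: directory (from lstat)\n", path);
	else
		printf("%s: other (from lstat)\n", path);
}

int
main(void)
{
	DIR		   *dir = opendir(".");
	struct dirent *de;

	if (dir == NULL)
		return 1;
	while ((de = readdir(dir)) != NULL)
		if (strcmp(de->d_name, ".") != 0 && strcmp(de->d_name, "..") != 0)
			classify(".", de);
	closedir(dir);
	return 0;
}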
From 19a5c980370f89d1340cdec7d74659715d8b4a17 Mon Sep 17 00:00:00 2001
From: Thomas Munro 
Date: Thu, 3 Sep 2020 13:58:17 +1200
Subject: [PATCH v2 1/4] Skip unnecessary stat() calls in walkdir().
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

Some kernels can tell us the type of a "dirent", so we can avoid a call
to stat() or lstat() in many cases.  In order to be able to apply this
change to both frontend and backend versions of walkdir(), define a new
function get_dirent_type() in a new translation unit file_utils_febe.c.

Reviewed-by: Tom Lane 
Reviewed-by: Juan José Santamaría Flecha 
Discussion: https://postgr.es/m/CA%2BhUKG%2BFzxupGGN4GpUdbzZN%2Btn6FQPHo8w0Q%2BAPH5Wz8RG%2Bww%40mail.gmail.com
---
 src/backend/storage/file/fd.c   | 26 +--
 src/common/Makefile |  1 +
 src/common/file_utils.c | 28 +---
 src/common/file_utils_febe.c| 81 +
 src/include/common/file_utils.h | 21 -
 src/tools/msvc/Mkvcbuild.pm |  2 +-
 6 files changed, 126 insertions(+), 33 deletions(-)
 create mode 100644 src/common/file_utils_febe.c

diff --git a/src/backend/storage/file/fd.c b/src/backend/storage/file/fd.c
index f376a97ed6..b9acd38d23 100644
--- a/src/backend/storage/file/fd.c
+++ b/src/backend/storage/file/fd.c
@@ -89,6 +89,7 @@
 #include "access/xlog.h"
 #include "catalog/pg_tablespace.h"
 #include "common/file_perm.h"
+#include "common/file_utils.h"
 #include "miscadmin.h"
 #include "pgstat.h"
 #include "portability/mem.h"
@@ -3340,8 +3341,6 @@ walkdir(const char *path,
 	while ((de = ReadDirExtended(dir, path, elevel)) != NULL)
 	{
 		char		subpath[MAXPGPATH * 2];
-		struct stat fst;
-		int			sret;
 
 		CHECK_FOR_INTERRUPTS();
 
@@ -3351,23 +3350,22 @@ walkdir(const char *path,
 
 		snprintf(subpath, sizeof(subpath), "%s/%s", path, de->d_name);
 
-		if (process_symlinks)
-			sret = stat(subpath, &fst);
-		else
-			sret = lstat(subpath, &fst);
-
-		if (sret < 0)
+		switch (get_dirent_type(subpath, de, process_symlinks))
 		{
+		case PGFILETYPE_REG:
+			(*action) (subpath, false, elevel);
+			break;
+		case PGFILETYPE_DIR:
+			walkdir(subpath, action, false, elevel);
+			break;
+		case PGFILETYPE_ERROR:
 			ereport(elevel,
 	(errcode_for_file_access(),
 	 errmsg("could not stat file \"%s\": %m", subpath)));
-			continue;
+			break;
+		default:
+			break;
 		}
-
-		if (S_ISREG(fst.st_mode))
-			(*action) (subpath, false, elevel);
-		else if (S_ISDIR(fst.st_mode))
-			walkdir(subpath, action, false, elevel);
 	}
 
 	FreeDir(dir);/* we ignore any error here */
diff --git a/src/common/Makefile b/src/common/Makefile
index 16619e4ba8..aac92aabe1 100644
--- a/src/common/Makefile
+++ b/src/common/Makefile
@@ -56,6 +56,7 @@ OBJS_COMMON = \
 	exec.o \
 	f2s.o \
 	file_perm.o \
+	file_utils_febe.o \
 	hashfn.o \
 	ip.o \
 	jsonapi.o \
diff --git a/src/common/file_utils.c b/src/common/file_utils.c
index a2faafdf13..5a427e246c 100644
--- a/src/common/file_utils.c
+++ b/src/common/file_utils.c
@@ -2,7 +2,7 @@
  *
  * File-processing utility routines.
  *
- * Assorted utility functions to work on files.
+ * Assorted utility functions to work on files, frontend only.
  *
  *
  * Portions Copyright (c) 1996-2020, PostgreSQL Global Development Group
@@ -27,7 +27,6 @@
 #include "common/file_utils.h"
 #include "common/logging.h"
 
-
 /* Define PG_FLUSH_DATA_WORKS if we have an implementation for pg_flush_data */
 #if defined(HAVE_SYNC_FILE_RANGE)
 #define PG_FLUSH_DATA_WORKS 1
@@ -167,8 +166,6 @@ walkdir(const char *path,
 	while (errno = 0, (de = readdir(dir)) != NULL)
 	{
 		char		subpath[MAXPGPATH * 2];
-		struct stat fst;
-		int			sret;
 
 		if (strcmp(de->d_name, ".") == 0 ||
 			strcmp(de->d_name, "..") == 0)
@@ -176,21 +173,20 @@ walkdir(const char *path,
 
 		snprintf(subpath, sizeof(subpath), "%s/%s", path, de->d_name);
 
-		if (process_symlinks)
-			sret = stat(subpath, &fst);
-		else
-			sret = lstat(subpath, &fst);
-
-		if (sret < 0)
+		switch (get_dirent_type(subpath, de, process_symlinks))
 		{
-			pg_log_error("could not stat file \"%s\": %m", subpath);
-			continue;
-

Re: Ideas about a better API for postgres_fdw remote estimates

2020-09-02 Thread Andrey V. Lepikhov

On 8/31/20 6:19 PM, Ashutosh Bapat wrote:

On Mon, Aug 31, 2020 at 3:36 PM Andrey V. Lepikhov
 wrote:


Thanks for this helpful feedback.

I think the patch has some other problems like it works only for
regular tables on foreign server but a foreign table can be pointing
to any relation like a materialized view, partitioned table or a
foreign table on the foreign server all of which have statistics
associated with them. I didn't look closely but it does not consider
that the foreign table may not have all the columns from the relation
on the foreign server or may have different names. But I think those
problems are kind of secondary. We have to agree on the design first.


In accordance with the discussion I made some changes in the patch:
1. The statistics extraction routine has been moved into core.
2. The serialized statistics now contain a 'version' field to indicate the
format of the statistics received.
3. ANALYZE and VACUUM ANALYZE use this approach only in the case of
implicit analysis of the relation.


I am currently keeping the limitation of using this approach to regular
relations only, because I haven't studied the specifics of other relation
types.

But I don't see any reason to keep this limit in the future.

The attached patch is very raw; I am publishing it for further substantive
discussion.


--
regards,
Andrey Lepikhov
Postgres Professional
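
To make the call shape concrete: with this patch applied, postgres_fdw would
issue roughly the following against the remote server (the function exists
only with the patch; the query text mirrors deparseGetStatSql() in the
attachment, and the table name here is just a placeholder):

SELECT * FROM extract_relation_statistics('public.mytable');

-- on the local side, a plain "ANALYZE;" (no relation name) would then try to
-- import the serialized statistics instead of sampling the remote rows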
From 9cfd9b8a43691f1dacd0967dcc32fdf8414ddb56 Mon Sep 17 00:00:00 2001
From: "Andrey V. Lepikhov" 
Date: Tue, 4 Aug 2020 09:29:37 +0500
Subject: [PATCH] Pull statistic for a foreign table from remote server.

Add the extract_relation_statistics() routine that converts statistics
on a relation into JSON format.
All OIDs - starelid, staop[], stacoll[] - are converted to a portable
representation. An operator is uniquely defined by a set of features:
namespace, operator name, and the namespace and name of its left and
right operand types.
A collation is uniquely defined by namespace, collation name and encoding.
The new FDW API routine GetForeignRelStat() provides access to this
machinery and returns a JSON string to the caller.
This function is called by the ANALYZE command (without an explicit
relation name) in an attempt to reduce the cost of updating statistics.
If the attempt fails, ANALYZE goes the expensive way. This feature is
also added to VACUUM ANALYZE and autovacuum.
In accordance with discussion [1], the extract_relation_statistics()
routine is moved into core.

ToDo: tests on custom operations and collations.

1. https://www.postgresql.org/message-id/flat/1155731.1593832096%40sss.pgh.pa.us
---
 contrib/postgres_fdw/Makefile |   2 +-
 contrib/postgres_fdw/deparse.c|   8 +
 .../postgres_fdw/expected/foreign_stat.out| 112 +++
 contrib/postgres_fdw/postgres_fdw.c   |  49 ++
 contrib/postgres_fdw/postgres_fdw.h   |   1 +
 contrib/postgres_fdw/sql/foreign_stat.sql |  46 +
 src/backend/commands/analyze.c| 794 ++
 src/backend/commands/vacuum.c |  13 +-
 src/backend/utils/adt/json.c  |   6 +
 src/backend/utils/cache/lsyscache.c   | 167 
 src/include/catalog/pg_proc.dat   |   3 +
 src/include/catalog/pg_statistic.h|   1 +
 src/include/foreign/fdwapi.h  |   2 +
 src/include/utils/json.h  |   1 +
 src/include/utils/lsyscache.h |   8 +
 15 files changed, 1211 insertions(+), 2 deletions(-)
 create mode 100644 contrib/postgres_fdw/expected/foreign_stat.out
 create mode 100644 contrib/postgres_fdw/sql/foreign_stat.sql

diff --git a/contrib/postgres_fdw/Makefile b/contrib/postgres_fdw/Makefile
index ee8a80a392..a5a838b8fc 100644
--- a/contrib/postgres_fdw/Makefile
+++ b/contrib/postgres_fdw/Makefile
@@ -16,7 +16,7 @@ SHLIB_LINK_INTERNAL = $(libpq)
 EXTENSION = postgres_fdw
 DATA = postgres_fdw--1.0.sql
 
-REGRESS = postgres_fdw
+REGRESS = postgres_fdw foreign_stat
 
 ifdef USE_PGXS
 PG_CONFIG = pg_config
diff --git a/contrib/postgres_fdw/deparse.c b/contrib/postgres_fdw/deparse.c
index ad37a74221..e63cb4982f 100644
--- a/contrib/postgres_fdw/deparse.c
+++ b/contrib/postgres_fdw/deparse.c
@@ -2053,6 +2053,14 @@ deparseAnalyzeSizeSql(StringInfo buf, Relation rel)
 	appendStringInfo(buf, "::pg_catalog.regclass) / %d", BLCKSZ);
 }
 
+void
+deparseGetStatSql(StringInfo buf, Relation rel)
+{
+	appendStringInfo(buf, "SELECT * FROM extract_relation_statistics('");
+	deparseRelation(buf, rel);
+	appendStringInfoString(buf, "');");
+}
+
 /*
  * Construct SELECT statement to acquire sample rows of given relation.
  *
diff --git a/contrib/postgres_fdw/expected/foreign_stat.out b/contrib/postgres_fdw/expected/foreign_stat.out
new file mode 100644
index 00..46fd2f8427
--- /dev/null
+++ b/contrib/postgres_fdw/expected/foreign_stat.out
@@ -0,0 +1,112 @@
+CREATE TABLE ltable (a int, b real);
+CREATE FOREIGN TABLE ftable (a int) server loopback options (table_name 'ltable');
+VACUUM ANALYZE;
+-- Check statistic interface r

Re: Switch to multi-inserts for pg_depend

2020-09-02 Thread Michael Paquier
On Tue, Sep 01, 2020 at 11:53:36AM +0200, Daniel Gustafsson wrote:
> On 14 Aug 2020, at 20:23, Alvaro Herrera  wrote:
> 
>> The logic to keep track of the number of slots used is baroque, though -- that
>> could use a lot of simplification.
> 
> What if slot_init was an integer which increments together with the loop
> variable until max_slots is reached?  If so, maybe it should be renamed
> slot_init_count and slotCount renamed slot_stored_count to make the their use
> clearer?

Good idea, removing slot_init looks like a good thing for readability.
And the same can be done for pg_shdepend.

> On 31 Aug 2020, at 09:56, Michael Paquier  wrote:
> +#define MAX_CATALOG_INSERT_BYTES 65535
> This name, and inclusion in a header file, implies that the definition is
> somewhat generic as opposed to its actual use.  Using MULTIINSERT rather than
> INSERT in the name would clarify, I reckon.

Makes sense, I have switched to MAX_CATALOG_MULTI_INSERT_BYTES. 
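
To spell out what the constant does: the number of pre-allocated slots is
bounded by both the number of tuples to insert and the 64kB budget.  A
standalone sketch of that calculation (the tuple size here is a stand-in,
not the real sizeof(FormData_pg_depend)):

#include <stdio.h>

#define MAX_CATALOG_MULTI_INSERT_BYTES 65535

/* Stand-in for sizeof(FormData_pg_depend); illustrative only. */
#define TUPLE_BYTES 32

static int
cap_slots(int ntuples)
{
	int			max_slots = MAX_CATALOG_MULTI_INSERT_BYTES / TUPLE_BYTES;

	return ntuples < max_slots ? ntuples : max_slots;
}

int
main(void)
{
	printf("%d slots for 10 tuples\n", cap_slots(10));			/* small batches stay small */
	printf("%d slots for 100000 tuples\n", cap_slots(100000));	/* big batches hit the cap */
	return 0;
}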

> A few other comments:
> 
> + /*
> +  * Allocate the slots to use, but delay initialization until we know 
> that
> +  * they will be used.
> +  */
> I think this comment warrants a longer explanation on why they won't all be
> used, or perhaps none of them (which is the real optimization win here).

Okay, I have updated the comments where this formulation is used.
Does that look suitable to you?

> + ObjectAddresses *addrs_auto;
> + ObjectAddresses *addrs_normal;
> It's not for this patch, but it seems a logical next step would be to be able
> to record the DependencyType as well when collecting deps rather than having 
> to
> create multiple buckets.

Yeah, agreed.  I am not sure yet how to design those APIs.  One option
is to use an array indexed by DependencyType, with each element storing
a list of dependencies of the same type.

> + /* Normal dependency from a domain to its collation. */
> + /* We know the default collation is pinned, so don't bother recording 
> it */
> It's just moved and not introduced in this patch, but shouldn't these two 
> lines
> be joined into a normal block comment?

Okay, done.
--
Michael
From 71cfefec57388d7d90ae169a8122bdb97a7bcd14 Mon Sep 17 00:00:00 2001
From: Michael Paquier 
Date: Thu, 3 Sep 2020 14:23:48 +0900
Subject: [PATCH v3 1/2] Use multi-inserts for pg_depend

This is a follow-up of the work done in e3931d01.  This case is a bit
different than pg_attribute and pg_shdepend: the maximum number of items
to insert is known in advance, but there is no need to handle pinned
dependencies.  Hence, the base allocation for slots is done based on the
number of items and the maximum allowed with a cap at 64kB, and items
are initialized once used to minimize the overhead of the operation.

Some of the multi-insert logic is also simplified for pg_shdepend, as
per the feedback discussed for this specific patch.

Author: Daniel Gustafsson, Michael Paquier
Discussion: https://postgr.es/m/XXX
---
 src/include/catalog/indexing.h|  6 +++
 src/backend/catalog/heap.c|  8 +--
 src/backend/catalog/pg_depend.c   | 88 +++
 src/backend/catalog/pg_shdepend.c | 66 +++
 4 files changed, 106 insertions(+), 62 deletions(-)

diff --git a/src/include/catalog/indexing.h b/src/include/catalog/indexing.h
index a7e2a9b26b..d86729dc6c 100644
--- a/src/include/catalog/indexing.h
+++ b/src/include/catalog/indexing.h
@@ -29,6 +29,12 @@
  */
 typedef struct ResultRelInfo *CatalogIndexState;
 
+/*
+ * Cap the maximum amount of bytes allocated for multi-inserts with system
+ * catalogs, limiting the number of slots used.
+ */
+#define MAX_CATALOG_MULTI_INSERT_BYTES 65535
+
 /*
  * indexing.c prototypes
  */
diff --git a/src/backend/catalog/heap.c b/src/backend/catalog/heap.c
index abd5bdb866..1201a93361 100644
--- a/src/backend/catalog/heap.c
+++ b/src/backend/catalog/heap.c
@@ -709,12 +709,6 @@ CheckAttributeType(const char *attname,
 	}
 }
 
-/*
- * Cap the maximum amount of bytes allocated for InsertPgAttributeTuples()
- * slots.
- */
-#define MAX_PGATTRIBUTE_INSERT_BYTES 65535
-
 /*
  * InsertPgAttributeTuples
  *		Construct and insert a set of tuples in pg_attribute.
@@ -750,7 +744,7 @@ InsertPgAttributeTuples(Relation pg_attribute_rel,
 
 	/* Initialize the number of slots to use */
 	nslots = Min(tupdesc->natts,
- (MAX_PGATTRIBUTE_INSERT_BYTES / sizeof(FormData_pg_attribute)));
+ (MAX_CATALOG_MULTI_INSERT_BYTES / sizeof(FormData_pg_attribute)));
 	slot = palloc(sizeof(TupleTableSlot *) * nslots);
 	for (int i = 0; i < nslots; i++)
 		slot[i] = MakeSingleTupleTableSlot(td, &TTSOpsHeapTuple);
diff --git a/src/backend/catalog/pg_depend.c b/src/backend/catalog/pg_depend.c
index 70baf03178..406e36ec00 100644
--- a/src/backend/catalog/pg_depend.c
+++ b/src/backend/catalog/pg_depend.c
@@ -59,10 +59,11 @@ recordMultipleDependencies(const ObjectAddress *depender,
 {
 	Relation	dependDesc;
 	CatalogIndexState indstate;
-	HeapTuple	tup;
-	int			i;
-	bool		nulls[Natts_p

Re: A micro-optimisation for walkdir()

2020-09-02 Thread Tom Lane
Thomas Munro  writes:
>> On Wed, Sep 2, 2020 at 4:35 PM Tom Lane  wrote:
>>> Hm.  If we do this, I can see wanting to apply the knowledge in more
>>> places than walkdir().

> Good idea.  Here's a new version that defines a new function
> get_dirent_type() in src/common/file_utils_febe.c and uses it for both
> frontend and backend walkdir().

Quick thoughts on this patch:

* The API spec for get_dirent_type() needs to say that errno is
meaningful when the return value is PGFILETYPE_ERROR.  That's
something that would not be hard to break, so not documenting
the point at all doesn't seem great.  More generally, I don't
especially like having the callers know that the errno is from
stat() rather than something else.

* I don't quite like the calling code you have that covers some
return values and then has a default: case without any comment.
It's not really obvious that the default: case is expected to be
hit in non-error situations, especially when there is a separate
switch path for errors.  I can't find fault with the code as such,
but I think it'd be good to have a comment there.  Maybe along
the lines of "Ignore file types other than regular files and
directories".

Both of these concerns would abate if we had get_dirent_type()
just throw an error itself when stat() fails, thereby removing the
PGFILETYPE_ERROR result code.  I'm not 100% sold either way on
that, but it's something to think about.  Is there ever going
to be a reason for the caller to ignore an error?

regards, tom lane




Re: Fix for configure error in 9.5/9.6 on macOS 11.0 Big Sur

2020-09-02 Thread Tom Lane
Peter Eisentraut  writes:
> On 2020-09-02 22:43, Jesse Zhang wrote:
>> | conftest.c:184:3: error: implicitly declaring library function
>> 'exit' with type 'void (int) __attribute__((noreturn))'
>> [-Werror,-Wimplicit-function-declaration]

> Where did the -Werror come from?

Peter wasn't entirely explicit here, but note the advice at the end of

https://www.postgresql.org/docs/devel/install-procedure.html

that you cannot include -Werror in any CFLAGS you tell configure
to use.  It'd be nice if autoconf was more robust about that,
but it is not our bug to fix.
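
In practice that means leaving -Werror out of anything passed to configure
and adding it only at build time, for example via COPT, which the PostgreSQL
makefiles append to CFLAGS; something along these lines:

./configure --enable-debug --enable-cassert
make COPT='-Werror'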

regards, tom lane




Re: Get memory contexts of an arbitrary backend process

2020-09-02 Thread torikoshia

On 2020-09-01 03:29, Pavel Stehule wrote:

Hi

po 31. 8. 2020 v 17:03 odesílatel Kasahara Tatsuhito
 napsal:


Hi,

On Mon, Aug 31, 2020 at 8:22 PM torikoshia
 wrote:

As discussed in the thread[1], it'll be useful to make it
possible to get the memory contexts of an arbitrary backend
process.

+1


Attached PoC patch makes pg_get_backend_memory_contexts()
display the memory contexts of the process with the specified PID.

Thanks, it's a very good patch for discussion.


It doesn't display the contexts of all the backends but only
the contexts of the specified process.

or we can  "SELECT (pg_get_backend_memory_contexts(pid)).* FROM
pg_stat_activity WHERE ...",
so I don't think it's a big deal.
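
Spelled out, that per-backend lookup would read roughly as follows once the
PoC is applied (the pid argument is what the patch adds; the column names are
those of the existing pg_backend_memory_contexts view):

SELECT a.pid, m.name, m.parent, m.level, m.total_bytes, m.used_bytes
FROM pg_stat_activity a,
     LATERAL pg_get_backend_memory_contexts(a.pid) AS m
WHERE a.backend_type = 'client backend'
ORDER BY a.pid, m.total_bytes DESC;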


The rough idea of implementation is like below:

1. send  a signal to the specified process
2. signaled process dumps its memory contexts to a file
3. read the dumped file and display it to the user

I agree with the overview of the idea.
Here are some comments and questions.


Thanks for the comments!



- Currently, "the signal transmission for dumping memory
information"
and "the read & output of dump information "
are on the same interface, but I think it would be better to
separate them.
How about providing the following three types of functions for
users?
- send a signal to specified pid
- check the status of the signal sent and received
- read the dumped information


Is this for future extensibility to make it possible to get
other information like the current execution plan which was
suggested by Pavel?

If so, I agree with considering extensibility, but I'm not
sure whether it's necessary to provide these types of
functions to 'users'.


- How about managing the status of signal send/receive and dump
operations in a shared hash or something similar?
Sending and receiving signals, dumping memory information, and
referencing dump information all work asynchronously.
Therefore, it would be good to have management information to check
the status of each process.
A simple idea is that ..
- when a signal to dump is sent to a PID, first record the following
information in the shared hash:
pid (specified pid)
loc (dump location, currently might be ASAP)
recv (did the pid process receive a signal? initially false)
dumped (did the pid process dump its memory information? initially false)
- the specified process receives the signal, updates the status in the
shared hash, then dumps at the specified location.
- when the specified process finishes dumping memory information, it
updates the status in the shared hash.


Adding management information in shared memory seems necessary
when we want to have more control over dumping, like the 'dump
location' or any other information such as the 'current execution
plan'.
I'm going to consider this.



- Does it allow one process to output multiple dump files?
At present the dump appears to be overwritten, but I thought it
would be good to be able to generate multiple dump files in
different phases (e.g., planning phase and execution phase) in the
future.
- How is the dump file cleaned up?


 For a very long time there has been similar discussion about taking the
session query and session execution plans from other sessions.

I am not sure how necessary the information in the memory dump is, but I
am sure that taking the current execution plan and the complete text of
the current query is pretty necessary information.

But it would be great if this infrastructure could be used for any
debugging purpose.


Thanks!
It would be good if some part of this effort could serve as infrastructure
for other debugging.
It may be hard, but I will keep your comment in mind.


Regards,

--
Atsushi Torikoshi
NTT DATA CORPORATION



Regards

Pavel


Best regards,

--
Tatsuhito Kasahara
kasahara.tatsuhito _at_ gmail.com








Re: A micro-optimisation for walkdir()

2020-09-02 Thread Thomas Munro
On Thu, Sep 3, 2020 at 5:36 PM Tom Lane  wrote:
> [request for better comments]

Ack.

> Both of these concerns would abate if we had get_dirent_type()
> just throw an error itself when stat() fails, thereby removing the
> PGFILETYPE_ERROR result code.  I'm not 100% sold either way on
> that, but it's something to think about.  Is there ever going
> to be a reason for the caller to ignore an error?

Hmm.  Well I had it like that in an earlier version, but then I
couldn't figure out the right way to write code that would work in
both frontend and backend code, without writing two copies in two
translation units, or putting the whole thing in a header.  What
approach do you prefer?




Re: Get memory contexts of an arbitrary backend process

2020-09-02 Thread torikoshia

Thanks for reviewing!

I'm going to modify the patch according to your comments.

On 2020-09-01 10:54, Andres Freund wrote:

Hi,

On 2020-08-31 20:22:18 +0900, torikoshia wrote:

After commit 3e98c0bafb28de, we can display the usage of the
memory contexts using pg_backend_memory_contexts system
view.

However, its target is limited to the  process attached to
the current session.

As discussed in the thread[1], it'll be useful to make it
possible to get the memory contexts of an arbitrary backend
process.

Attached PoC patch makes pg_get_backend_memory_contexts()
display the memory contexts of the process with the specified PID.


Awesome!



It doesn't display the contexts of all the backends but only
the contexts of the specified process.
I think that would be enough, because I suppose this function
is used after investigations using the ps command or other
OS-level utilities.


It can be used as a building block if all are needed. Getting the
infrastructure right is the big thing here, I think. Adding more
detailed views on top of that data later is easier.



diff --git a/src/backend/catalog/system_views.sql 
b/src/backend/catalog/system_views.sql

index a2d61302f9..88fb837ecd 100644
--- a/src/backend/catalog/system_views.sql
+++ b/src/backend/catalog/system_views.sql
@@ -555,10 +555,10 @@ REVOKE ALL ON pg_shmem_allocations FROM PUBLIC;
 REVOKE EXECUTE ON FUNCTION pg_get_shmem_allocations() FROM PUBLIC;

 CREATE VIEW pg_backend_memory_contexts AS
-SELECT * FROM pg_get_backend_memory_contexts();
+SELECT * FROM pg_get_backend_memory_contexts(-1);


-1 is odd. Why not use NULL or even 0?


+   else
+   {
+   int rc;
+   int parent_len = strlen(parent);
+   int name_len = strlen(name);
+
+   /*
+* write out the current memory context information.
+* Since some elements of values are reusable, we write it out.


Not sure what the second comment line here is supposed to mean?



+*/
+   fputc('D', fpout);
+   rc = fwrite(values, sizeof(values), 1, fpout);
+   rc = fwrite(nulls, sizeof(nulls), 1, fpout);
+
+		/* write out information which is not resuable from serialized 
values */


s/resuable/reusable/



+   rc = fwrite(&name_len, sizeof(int), 1, fpout);
+   rc = fwrite(name, name_len, 1, fpout);
+   rc = fwrite(&idlen, sizeof(int), 1, fpout);
+   rc = fwrite(clipped_ident, idlen, 1, fpout);
+   rc = fwrite(&level, sizeof(int), 1, fpout);
+   rc = fwrite(&parent_len, sizeof(int), 1, fpout);
+   rc = fwrite(parent, parent_len, 1, fpout);
+   (void) rc;  /* we'll check for 
error with ferror */
+
+   }


This format is not descriptive. How about serializing to json or
something? Or at least having field names?

Alternatively, build the same tuple we build for the SRF, and serialize
that. Then there's basically no conversion needed.


@@ -117,6 +157,8 @@ PutMemoryContextsStatsTupleStore(Tuplestorestate 
*tupstore,

 Datum
 pg_get_backend_memory_contexts(PG_FUNCTION_ARGS)
 {
+   int pid =  PG_GETARG_INT32(0);
+
ReturnSetInfo *rsinfo = (ReturnSetInfo *) fcinfo->resultinfo;
TupleDesc   tupdesc;
Tuplestorestate *tupstore;
@@ -147,11 +189,258 @@ 
pg_get_backend_memory_contexts(PG_FUNCTION_ARGS)


MemoryContextSwitchTo(oldcontext);

-   PutMemoryContextsStatsTupleStore(tupstore, tupdesc,
-   
TopMemoryContext, NULL, 0);
+   if (pid == -1)
+   {
+   /*
+* Since pid -1 indicates target is the local process, simply
+* traverse memory contexts.
+*/
+   PutMemoryContextsStatsTupleStore(tupstore, tupdesc,
+   TopMemoryContext, 
"", 0, NULL);
+   }
+   else
+   {
+   /*
+* Send signal for dumping memory contexts to the target 
process,
+* and read the dumped file.
+*/
+   FILE   *fpin;
+   chardumpfile[MAXPGPATH];
+
+   SendProcSignal(pid, PROCSIG_DUMP_MEMORY, InvalidBackendId);
+
+   snprintf(dumpfile, sizeof(dumpfile), "pg_memusage/%d", pid);
+
+   while (true)
+   {
+   CHECK_FOR_INTERRUPTS();
+
+   pg_usleep(1L);
+


Need better signalling back/forth here.


Do you mean I should also send another signal from the dumped
process to the caller of the pg_get_backend_memory_contexts()
when it finishes dumping?

Regards,



--
Atsushi Torikoshi
NTT DATA CORPORATION






+/*
+ * dump_memory_contexts
+ * Dumping local memory contexts to a file.
+ * 	This function does not delete the file as i

Re: Parallel copy

2020-09-02 Thread Greg Nancarrow
>On Wed, Sep 2, 2020 at 3:40 PM vignesh C  wrote:
> I have attached the scripts that I used for the test results I
> mentioned in my previous mail. create.sql file has the table that I
> used, insert_data_gen.txt has the insert data generation scripts. I
> varied the count in insert_data_gen to generate csv files of 1GB, 2GB
> & 5GB & varied the data to generate 1 char, 10 char & 100 char for
> each column for various testing. You can rename insert_data_gen.txt to
> insert_data_gen.sh & generate the csv file.


Hi Vignesh,

I used your script and table definition, multiplying the number of
records to produce a 5GB and 9.5GB CSV file.
I got the following results:


(1) Postgres default settings, 5GB CSV (53 rows):

Copy Type      Duration (s)   Load factor
=========================================
Normal Copy    132.197        -

Parallel Copy
(#workers)
 1             98.428         1.34
 2             52.753         2.51
 3             37.630         3.51
 4             33.554         3.94
 5             33.636         3.93
 6             33.821         3.91
 7             34.270         3.86
 8             34.465         3.84
 9             34.315         3.85
10             33.543         3.94


(2) Postgres increased resources, 5GB CSV (53 rows):

shared_buffers = 20% of RAM (total RAM = 376GB) = 76GB
work_mem = 10% of RAM = 38GB
maintenance_work_mem = 10% of RAM = 38GB
max_worker_processes = 16
max_parallel_workers = 16
checkpoint_timeout = 30min
max_wal_size=2GB


Copy Type      Duration (s)   Load factor
=========================================
Normal Copy    131.835        -

Parallel Copy
(#workers)
 1             98.301         1.34
 2             53.261         2.48
 3             37.868         3.48
 4             34.224         3.85
 5             33.831         3.90
 6             34.229         3.85
 7             34.512         3.82
 8             34.303         3.84
 9             34.690         3.80
10             34.479         3.82



(3) Postgres default settings, 9.5GB CSV (100 rows):

Copy Type      Duration (s)   Load factor
=========================================
Normal Copy    248.503        -

Parallel Copy
(#workers)
 1             185.724        1.34
 2             99.832         2.49
 3             70.560         3.52
 4             63.328         3.92
 5             63.182         3.93
 6             64.108         3.88
 7             64.131         3.87
 8             64.350         3.86
 9             64.293         3.87
10             63.818         3.89


(4) Postgres increased resources, 9.5GB CSV (100 rows):

shared_buffers = 20% of RAM (total RAM = 376GB) = 76GB
work_mem = 10% of RAM = 38GB
maintenance_work_mem = 10% of RAM = 38GB
max_worker_processes = 16
max_parallel_workers = 16
checkpoint_timeout = 30min
max_wal_size=2GB


Copy Type      Duration (s)   Load factor
=========================================
Normal Copy    248.647        -

Parallel Copy
(#workers)
 1             182.236        1.36
 2             92.814         2.68
 3             67.347         3.69
 4             63.839         3.89
 5             62.672         3.97
 6             63.873         3.89
 7             64.930         3.83
 8             63.885         3.89
 9             62.397         3.98
10             64.477         3.86



So as you found, with this particular table definition and data, 1
parallel worker always performs better than normal copy.
The different result obtained for this particular case seems to be
caused by the following factors:
- different table definition (I used a variety of column types)
- amount of data per row (I used less data per row, so more rows for the
same size of data file)

As I previously observed, if the target table has no indexes,
increasing resources beyond the default settings makes little
difference to the performance.
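
For completeness, invoking the patched parallel COPY looks roughly like this;
the option spelling below is an assumption based on this patch set and is not
part of core PostgreSQL, and the table name and file path are placeholders:

-- plain copy
COPY test_table FROM '/path/to/data.csv' WITH (FORMAT csv);

-- parallel copy with 4 workers (PARALLEL is the option added by this patch set)
COPY test_table FROM '/path/to/data.csv' WITH (FORMAT csv, PARALLEL 4);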

Regards,
Greg Nancarrow
Fujitsu Australia