Re: Dump public schema ownership & seclabels

2021-05-02 Thread Noah Misch
On Sat, May 01, 2021 at 09:43:36AM -0700, Zhihong Yu wrote:
> On Sat, May 1, 2021 at 8:16 AM Asif Rehman  wrote:
> > The following review has been posted through the commitfest application:
> > make installcheck-world:  tested, passed
> > Implements feature:   tested, passed
> > Spec compliant:   tested, passed
> > Documentation:not tested
> >
> > Hi,
> >
> > I have tested this patch. This patch still applies and it works well.
> >
> > Regards,
> > Asif
> >
> > The new status of this patch is: Ready for Committer

Thanks.  Later, I saw that "pg_dump --schema=public" traditionally has yielded
"CREATE SCHEMA public" and "COMMENT ON SCHEMA public".  I've updated the
patches to preserve that behavior.

I'll push this when v15 branches.  I do think it's a bug fix and could argue
for including it in v14.  On the other hand, I mailed three total patch
versions now known to be wrong, so it would be imprudent to count on no
surprises remaining.

> For public-schema-comment-dump-v2.patch :
> 
> +   if (ncomments == 0)
> +   {
> +   comments = _comment;
> +   ncomments = 1;
> +   }
> +   else if (strcmp(comments->descr, (fout->remoteVersion >= 80300 ?
> + "standard public schema" :
> + "Standard public schema")) == 0)
> +   {
> +   ncomments = 0;
> 
> Is it possible that, in the case ncomments > 0, there are more than one
> comment ?

Yes, I think that's normal when the search terms include an objsubid (subid !=
InvalidOid).
Author: Noah Misch 
Commit: Noah Misch 

Dump public schema ownership and security labels.

As a side effect, this corrects dumps of public schema ACLs in databases
where the DBA dropped and recreated that schema.

Reviewed by Asif Rehman.

Discussion: https://postgr.es/m/20201229134924.ga1431...@rfd.leadboat.com

diff --git a/src/bin/pg_dump/dumputils.c b/src/bin/pg_dump/dumputils.c
index 60d306e..ea67e52 100644
--- a/src/bin/pg_dump/dumputils.c
+++ b/src/bin/pg_dump/dumputils.c
@@ -725,6 +725,7 @@ void
 buildACLQueries(PQExpBuffer acl_subquery, PQExpBuffer racl_subquery,
 				PQExpBuffer init_acl_subquery, PQExpBuffer init_racl_subquery,
 				const char *acl_column, const char *acl_owner,
+				const char *initprivs_expr,
 				const char *obj_kind, bool binary_upgrade)
 {
 	/*
@@ -765,23 +766,25 @@ buildACLQueries(PQExpBuffer acl_subquery, PQExpBuffer racl_subquery,
 					  "WITH ORDINALITY AS perm(acl,row_n) "
 					  "WHERE NOT EXISTS ( "
 					  "SELECT 1 FROM "
-					  "pg_catalog.unnest(coalesce(pip.initprivs,pg_catalog.acldefault(%s,%s))) "
+					  "pg_catalog.unnest(coalesce(%s,pg_catalog.acldefault(%s,%s))) "
 					  "AS init(init_acl) WHERE acl = init_acl)) as foo)",
 					  acl_column,
 					  obj_kind,
 					  acl_owner,
+					  initprivs_expr,
 					  obj_kind,
 					  acl_owner);
 
 	printfPQExpBuffer(racl_subquery,
 					  "(SELECT pg_catalog.array_agg(acl ORDER BY row_n) FROM "
 					  "(SELECT acl, row_n FROM "
-					  "pg_catalog.unnest(coalesce(pip.initprivs,pg_catalog.acldefault(%s,%s))) "
+					  "pg_catalog.unnest(coalesce(%s,pg_catalog.acldefault(%s,%s))) "
 					  "WITH ORDINALITY AS initp(acl,row_n) "
 					  "WHERE NOT EXISTS ( "
 					  "SELECT 1 FROM "
 					  "pg_catalog.unnest(coalesce(%s,pg_catalog.acldefault(%s,%s))) "
 					  "AS permp(orig_acl) WHERE acl = orig_acl)) as foo)",
+					  initprivs_expr,
 					  obj_kind,
 					  acl_owner,
 					  acl_column,
@@ -807,12 +810,13 @@ buildACLQueries(PQExpBuffer acl_subquery, PQExpBuffer racl_subquery,
 	printfPQExpBuffer(init_acl_subquery,
 					  "CASE WHEN privtype = 'e' THEN "
 					  "(SELECT pg_catalog.array_agg(acl ORDER BY row_n) FROM "
-					  "(SELECT acl, row_n FROM pg_catalog.unnest(pip.initprivs) "
+					  

Re: WIP: WAL prefetch (another approach)

2021-05-02 Thread Thomas Munro
On Sun, May 2, 2021 at 3:16 PM Tom Lane  wrote:
> That last point means that there was some hard-to-hit problem even
> before any of the recent WAL-related changes.  However, 323cbe7c7
> (Remove read_page callback from XLogReader) increased the failure
> rate by at least a factor of 5, and 1d257577e (Optionally prefetch
> referenced data) seems to have increased it by another factor of 4.
> But it looks like f003d9f87 (Add circular WAL decoding buffer)
> didn't materially change the failure rate.

Oh, wow.  There are several surprising results there.  Thanks for
running those tests for so long so that we could see the rarest
failures.

Even if there are somehow *two* causes of corruption, one preexisting
and one added by the refactoring or decoding patches, I'm struggling
to understand how the chance increases with 1d2575, since that only
adds code that isn't reached when not enabled (though I'm going to
re-review that).

> Considering that 323cbe7c7 was supposed to be just refactoring,
> and 1d257577e is allegedly disabled-by-default, these are surely
> not the results I was expecting to get.

+1

> It seems like it's still an open question whether all this is
> a real bug, or flaky hardware.  I have seen occasional kernel
> freezeups (or so I think -- machine stops responding to keyboard
> or network input) over the past year or two, so I cannot in good
> conscience rule out the flaky-hardware theory.  But it doesn't
> smell like that kind of problem to me.  I think what we're looking
> at is a timing-sensitive bug that was there before (maybe long
> before?) and these commits happened to make it occur more often
> on this particular hardware.  This hardware is enough unlike
> anything made in the past decade that it's not hard to credit
> that it'd show a timing problem that nobody else can reproduce.

Hmm, yeah that does seem plausible.  It would be nice to see a report
from any other system though.  I'm still trying, and reviewing...




Re: Identify missing publications from publisher while create/alter subscription.

2021-05-02 Thread Bharath Rupireddy
On Sun, May 2, 2021 at 10:04 PM vignesh C  wrote:
> > 5) Instead of adding a new file 021_validate_publications.pl for
> > tests, spawning a new test database which would make the overall
> > regression slower, can't we add with the existing database nodes in
> > 0001_rep_changes.pl? I would suggest adding the tests in there even if
> > the number of tests are many, I don't mind.
>
> 001_rep_changes.pl has the changes mainly for checking the replicated
> data. I did not find an appropriate file in the current tap tests, I
> preferred these tests to be in a separate file. Thoughts?

If 001_rep_changes.pl is not the right place, how about adding them
into 007_ddl.pl? That file seems to be only for DDL changes, and since
the feature tests cases are for CREATE/ALTER SUBSCRIPTION, it's the
right place. I strongly feel that we don't need a new file for these
tests.

Comment on the tests:
1) Instead of the "pub_doesnt_exist" name, how about "non_existent_pub" or
just "pub_non_existent" or some other?
+   "CREATE SUBSCRIPTION testsub2 CONNECTION '$publisher_connstr'
PUBLICATION pub_doesnt_exist WITH (VALIDATE_PUBLICATION = TRUE)"
The error message with this name looks a bit odd to me.
+ /ERROR:  publication "pub_doesnt_exist" does not exist in
the publisher/,

With Regards,
Bharath Rupireddy.
EnterpriseDB: http://www.enterprisedb.com




Re: Replication slot stats misgivings

2021-05-02 Thread Amit Kapila
On Fri, Apr 30, 2021 at 1:47 PM Amit Kapila  wrote:
>
> LGTM. I have slightly edited the comments in the attached. I'll push
> this early next week unless there are more comments.
>

Pushed.

-- 
With Regards,
Amit Kapila.




Re: Replication slot stats misgivings

2021-05-02 Thread Amit Kapila
On Thu, Apr 29, 2021 at 10:37 AM Amit Kapila  wrote:
>
> On Wed, Apr 28, 2021 at 7:43 PM Masahiko Sawada  wrote:
> >
> > On Wed, Apr 28, 2021 at 3:25 PM Amit Kapila  wrote:
> > >
> >
> > > I am not sure if any of these alternatives are a good idea. What do
> > > you think? Do you have any other ideas for this?
> >
> > I've been considering some ideas but don't come up with a good one
> > yet. It’s just an idea and not tested but how about having
> > CreateDecodingContext() register before_shmem_exit() callback with the
> > decoding context to ensure that we send slot stats even on
> > interruption. And FreeDecodingContext() cancels the callback.
> >
>
> Is it a good idea to send stats while exiting and rely on the same? I
> think before_shmem_exit is mostly used for the cleanup purpose so not
> sure if we can rely on it for this purpose. I think we can't be sure
> that in all cases we will send all stats, so maybe Vignesh's patch is
> sufficient to cover the cases where we avoid losing it in cases where
> we would have sent a large amount of data.
>

Sawada-San, any thoughts on this point? Apart from this, I think you
have suggested somewhere in this thread to slightly update the
description of stream_bytes. I would like to update the description of
stream_bytes and total_bytes as below:

stream_bytes
Amount of transaction data decoded for streaming in-progress
transactions to the decoding output plugin while decoding changes from
WAL for this slot. This and other streaming counters for this slot can
be used to tune logical_decoding_work_mem.

total_bytes
Amount of transaction data decoded for sending transactions to the
decoding output plugin while decoding changes from WAL for this slot.
Note that this includes data that is streamed and/or spilled.

This update considers two points:
a. we don't say the data is sent across the network, because the plugin
might decide to filter it, e.g. based on publications.
b. not all of the decoded changes are sent to the plugin; consider
REORDER_BUFFER_CHANGE_INTERNAL_COMMAND_ID,
REORDER_BUFFER_CHANGE_INTERNAL_SNAPSHOT, etc.
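For readers following along, the counters under discussion live in the
pg_stat_replication_slots view (new in PG 14). A quick way to relate the
streamed and spilled volumes to total_bytes per slot (column list per the
committed view definition; this is just an illustrative query, not part of
the patch):

```sql
-- Compare streamed/spilled volumes against the total decoded volume for
-- each logical slot; large stream_bytes relative to total_bytes suggests
-- logical_decoding_work_mem is being exceeded frequently.
SELECT slot_name,
       total_txns,  total_bytes,
       stream_txns, stream_bytes,
       spill_txns,  spill_bytes
FROM pg_stat_replication_slots;
```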

-- 
With Regards,
Amit Kapila.




Re: Identify missing publications from publisher while create/alter subscription.

2021-05-02 Thread Dilip Kumar
On Sun, May 2, 2021 at 10:04 PM vignesh C  wrote:
>
> Thanks for the comments.
> The Attached patch has the fixes for the same.

I was reviewing the documentation part, I think in the below paragraph
we should include validate_publication as well?

   
connect (boolean)

 
  Specifies whether the CREATE SUBSCRIPTION
  should connect to the publisher at all.  Setting this to
  false will change default values of
  enabled, create_slot and
  copy_data to false.
 

I will review/test the other parts of the patch and let you know.

-- 
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com




Logical Replication - behavior of TRUNCATE ... CASCADE

2021-05-02 Thread Bharath Rupireddy
Hi,

In apply_handle_truncate, the following comment before ExecuteTruncateGuts
says that it defaults to RESTRICT even if the CASCADE option has been
specified in publisher's TRUNCATE command.
/*
 * Even if we used CASCADE on the upstream primary we explicitly default
 * to replaying changes without further cascading. This might be later
 * changeable with a user specified option.
 */
I tried the following use case to see if that's actually true:
1) Created two tables tbl_pk (primary key), tbl_fk(references tbl_pk
primary key via foreign key) on both publisher and subscriber.
2) In general, TRUNCATE tbl_pk; or TRUNCATE tbl_pk RESTRICT; would fail
because tbl_fk is dependent on tbl_pk.
3) TRUNCATE tbl_pk, tbl_fk; would work because the dependent table is
specified in the command.
4) TRUNCATE tbl_pk CASCADE; would work because of the CASCADE option and
both tbl_pk and tbl_fk are truncated. When this command is run on the
publisher, the CASCADE option is sent to the subscriber, see
DecodeTruncate. But the apply worker ignores it and passes DROP_RESTRICT to
ExecuteTruncateGuts. Therefore, the expectation (per the comment) is that on
the subscriber, the behavior should be equivalent to TRUNCATE tbl_pk;, so
an error is expected. But we are also receiving tbl_fk in the remote
rels along with tbl_pk, so the behavior is equivalent to (3) and both
tbl_pk and tbl_fk are truncated.

Does the comment still hold true? Does ignoring the CASCADE option make
sense in apply_handle_truncate, as we are receiving all the dependent
relations in the remote rels from the publisher? Am I missing something?
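A minimal script for steps (1)-(4), assuming both nodes are already set up
for logical replication (object names and the elided connection string are
illustrative):

```sql
-- On both publisher and subscriber:
CREATE TABLE tbl_pk (a int PRIMARY KEY);
CREATE TABLE tbl_fk (a int PRIMARY KEY REFERENCES tbl_pk (a));

-- On the publisher:
CREATE PUBLICATION pub_trunc FOR TABLE tbl_pk, tbl_fk;

-- On the subscriber (connection string elided):
CREATE SUBSCRIPTION sub_trunc
    CONNECTION 'host=... dbname=...'
    PUBLICATION pub_trunc;

-- On the publisher:
TRUNCATE tbl_pk;            -- (2) fails: tbl_fk references tbl_pk
TRUNCATE tbl_pk, tbl_fk;    -- (3) works: dependent table listed explicitly
TRUNCATE tbl_pk CASCADE;    -- (4) works; both tbl_pk and tbl_fk arrive in
                            -- the remote rels, so the subscriber truncates
                            -- both even though it passes DROP_RESTRICT
```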

The commit id of the feature "Logical replication support for TRUNCATE" is
039eb6e92f, and adding relevant people in cc.

With Regards,
Bharath Rupireddy.
EnterpriseDB: http://www.enterprisedb.com


Re: Remove redundant variable from transformCreateStmt

2021-05-02 Thread Amul Sul
On Fri, Apr 30, 2021 at 10:49 AM Bharath Rupireddy
 wrote:
>
> On Fri, Apr 30, 2021 at 7:07 AM Justin Pryzby  wrote:
> >
> > On Thu, Apr 29, 2021 at 02:39:42PM -0400, Alvaro Herrera wrote:
> > > On 2021-Apr-29, Tom Lane wrote:
> > > > Alvaro Herrera  writes:
> > > > > I'd do it like this.  Note I removed an if/else block in addition to
> > > > > your changes.
> > > >
> > > > > I couldn't convince myself that this is worth pushing though; either 
> > > > > we
> > > > > push it to all branches (which seems unwarranted) or we create
> > > > > back-patching hazards.
> > > >
> > > > Yeah ... an advantage of the if/else coding is that it'd likely be
> > > > simple to extend to cover additional statement types, should we ever
> > > > wish to do that.  The rendering you have here is nice and compact,
> > > > but it would not scale up well.
> > >
> > > That makes sense.  But that part is not in Amul's patch -- he was only
> > > on about removing the is_foreign_table Boolean.  If I remove the if/else
> > > block change, does the rest of the patch looks something we'd want to
> > > have?  I kinda agree that the redundant variable is "ugly".  Is it worth
> > > removing?  My hunch is no.
> >
> > Getting rid of a redundant, boolean variable is good not because it's more
> > efficient but because it's one fewer LOC to read and maintain (and an
> > opportunity for inconsistency, I suppose).
>
> Yes.
>
> > Also, this is a roundabout and too-verbose way to invert a boolean:
> > | transformCheckConstraints(&cxt, !is_foreign_table ? true : false);
>
> I agree to remove only the redundant variable, is_foreign_table but
> not the if else block as Tom said: it's not scalable.

+1.

Regards,
Amul




Re: WIP: WAL prefetch (another approach)

2021-05-02 Thread Thomas Munro
On Thu, Apr 29, 2021 at 12:24 PM Tom Lane  wrote:
> Andres Freund  writes:
> > On 2021-04-28 19:24:53 -0400, Tom Lane wrote:
> >> IOW, we've spent over twice as many CPU cycles shipping data to the
> >> standby as we did in applying the WAL on the standby.
>
> > I don't really know how the time calculation works on mac. Is there a
> > chance it includes time spent doing IO?

For comparison, on a modern Linux system I see numbers like this,
while running that 025_stream_rep_regress.pl test I posted in a nearby
thread:

USER         PID %CPU %MEM    VSZ   RSS TTY   STAT START   TIME COMMAND
tmunro   2150863 22.5  0.0  55348  6752 ?     Ss   12:59   0:07 postgres: standby_1: startup recovering 00000001000000020000003C
tmunro   2150867 17.5  0.0  55024  6364 ?     Ss   12:59   0:05 postgres: standby_1: walreceiver streaming 2/3C675D80
tmunro   2150868 11.7  0.0  55296  7192 ?     Ss   12:59   0:04 postgres: primary: walsender tmunro [local] streaming 2/3C675D80

Those ratios are better but it's still hard work, and perf shows the
CPU time is all in page cache schlep:

  22.44%  postgres  [kernel.kallsyms]   [k] copy_user_enhanced_fast_string
  20.12%  postgres  [kernel.kallsyms]   [k] __add_to_page_cache_locked
   7.30%  postgres  [kernel.kallsyms]   [k] iomap_set_page_dirty

That was with all three patches reverted, so it's nothing new.
Definitely room for improvement... there have been a few discussions
about not using a buffered file for high-frequency data exchange and
relaxing various timing rules, which we should definitely look into,
but I wouldn't be at all surprised if HFS+ was just much worse at
this.

Thinking more about good old HFS+... I guess it's remotely possible
that there might have been coherency bugs in it that could be exposed by
our usage pattern, but that doesn't fit too well with the clues I
have from light reading: this is a non-SMP system, and it's said that
HFS+ used to serialise pretty much everything on big filesystem locks
anyway.




Re: pg_upgrade not preserving comments on predefined roles

2021-05-02 Thread Tom Lane
Chapman Flack  writes:
> Is there an inherent technical or policy reason for pg_upgrade not to
> preserve comments on predefined roles (or on predefined objects generally)?

I think this is absolutely out of scope for pg_dump.  We generally expect
that system objects' properties are not dumped, because they might be
different in a newer version, and overwriting the system definition with
a possibly-obsolete version would be a bad thing.

You could quibble about comments being a different matter, but I don't
buy it.

Also, our one venture into this space (allowing custom modifications of
system-object privileges to be propagated by pg_dump) has IMV been an
unmitigated disaster.  Years later, it *still* has unresolved bugs and
definitional issues.  So I'm going to run away screaming from any proposal
to do likewise for other object properties.

> For that matter, would it be objectionable for the predefined roles to
> come with comments right out of the box?

That, however, seems reasonable enough.  We deliver built-in functions and
operators with comments, so why not roles?

> Another objection might be that they'd presumably be subject to translation,
> and would need some way for initdb to install the proper localized versions.

We've not worried about that for functions/operators.

> I've appended the comments we use for them at $work, anyway.

IMO these would have to be shortened quite a bit to be friendly for
"\du+" displays.  I'm not against the concept though.

regards, tom lane




pg_upgrade not preserving comments on predefined roles

2021-05-02 Thread Chapman Flack
Hi hackers,

I recently did a pg_upgrade to 13 at $work, and noticed it did not
preserve the comments I had added locally on the pg_* predefined roles.

We have a bgworker that runs periodically and makes a report of existing
roles, memberships, and grants, showing the comments on the roles, so
I had added comments on the predefined ones so they would not look naked
and unexplained in the report.

All I had to do was go back to a pre-upgrade version of the report
and re-add the comments.

Is there an inherent technical or policy reason for pg_upgrade not to
preserve comments on predefined roles (or on predefined objects generally)?

For that matter, would it be objectionable for the predefined roles to
come with comments right out of the box? I guess one possible objection
could be "what next? comments on everything in pg_catalog?", but perhaps
there is a way to distinguish the case of predefined roles: they are a
relatively recent, um, encroachment into a namespace traditionally
managed by the admin, so maybe there's that much extra reason for them
to come with explanations attached.

Another objection might be that they'd presumably be subject to translation,
and would need some way for initdb to install the proper localized versions.

So maybe it is simpler to leave them uncommented by default, but perhaps
desirable for pg_upgrade to preserve comments locally added.

I've appended the comments we use for them at $work, anyway.

Regards,
-Chap





COMMENT ON ROLE pg_execute_server_program IS 'Allow executing programs on
the database server as the user the database runs as with COPY and other
functions which allow executing a server-side program. Since PG 11.';

COMMENT ON ROLE pg_monitor IS 'Read/execute various monitoring views and
functions. This role is a member of pg_read_all_settings, pg_read_all_stats
and pg_stat_scan_tables. Since PG 10.';

COMMENT ON ROLE pg_read_all_settings IS 'Read all configuration variables,
even those normally visible only to superusers. Since PG 10.';

COMMENT ON ROLE pg_read_all_stats IS 'Read all pg_stat_* views and use
various statistics related extensions, even those normally visible only to
superusers. Since PG 10.';

COMMENT ON ROLE pg_read_server_files IS 'Allow reading files from any
location the database user can access on the server with COPY and other
file-access functions. Since PG 11.';

COMMENT ON ROLE pg_signal_backend IS 'Send signals to other backends (eg:
cancel query, terminate). Since PG 9.6.';

COMMENT ON ROLE pg_stat_scan_tables IS 'Execute monitoring functions that
may take ACCESS SHARE locks on tables, potentially for a long time. Since PG
10.';

COMMENT ON ROLE pg_write_server_files IS 'Allow writing to files in any
location the database user can access on the server with COPY and other
file-access functions. Since PG 11.';




Re: websearch_to_tsquery() returns queries that don't match to_tsvector()

2021-05-02 Thread Alexander Korotkov
On Sun, May 2, 2021 at 9:37 PM Tom Lane  wrote:
> Alexander Korotkov  writes:
> > On Sun, May 2, 2021 at 9:04 PM Tom Lane  wrote:
> >> -   state.in_quotes = false;
> >>
> >> This change seems wrong/unsafe too.
>
> > It seems OK, because this patch removes in_quotes field altogether.
>
> Oh, sorry, I misread the patch --- I thought that earlier hunk
> was removing a local variable.  Agreed, if you can do without this
> state field altogether, that's fine.

OK, thank you for review!

--
Regards,
Alexander Korotkov




Re: websearch_to_tsquery() returns queries that don't match to_tsvector()

2021-05-02 Thread Tom Lane
Alexander Korotkov  writes:
> On Sun, May 2, 2021 at 9:04 PM Tom Lane  wrote:
>> -   state.in_quotes = false;
>> 
>> This change seems wrong/unsafe too.

> It seems OK, because this patch removes in_quotes field altogether.

Oh, sorry, I misread the patch --- I thought that earlier hunk
was removing a local variable.  Agreed, if you can do without this
state field altogether, that's fine.

regards, tom lane




Re: websearch_to_tsquery() returns queries that don't match to_tsvector()

2021-05-02 Thread Alexander Korotkov
On Sun, May 2, 2021 at 9:17 PM Zhihong Yu  wrote:
> One minor comment:
> +   /* iterate to the closing quotes or end of the string*/
>
> closing quotes -> closing quote

Yep, I missed the third place to change from plural to singular form :)

--
Regards,
Alexander Korotkov


0001-Make-websearch_to_tsquery-parse-text-in-quotes-as-v4.patch
Description: Binary data


Re: websearch_to_tsquery() returns queries that don't match to_tsvector()

2021-05-02 Thread Zhihong Yu
On Sun, May 2, 2021 at 11:12 AM Alexander Korotkov 
wrote:

> On Sun, May 2, 2021 at 9:06 PM Zhihong Yu  wrote:
> > +   /* Everything is quotes is processed as a single
> token */
> >
> > is quotes -> in quotes
> >
> > +   /* iterate to the closing quotes or end of the
> string*/
> >
> > closing quotes -> closing quote
> >
> > +   /* or else ƒtsvector() will raise an error */
> >
> > The character before tsvector() seems to be special.
>
> Thank you for catching.  Fixed in v3.
>
> --
> Regards,
> Alexander Korotkov
>

Hi,
One minor comment:
+   /* iterate to the closing quotes or end of the string*/

closing quotes -> closing quote

Cheers


Re: websearch_to_tsquery() returns queries that don't match to_tsvector()

2021-05-02 Thread Alexander Korotkov
On Sun, May 2, 2021 at 9:06 PM Zhihong Yu  wrote:
> +   /* Everything is quotes is processed as a single token */
>
> is quotes -> in quotes
>
> +   /* iterate to the closing quotes or end of the string*/
>
> closing quotes -> closing quote
>
> +   /* or else ƒtsvector() will raise an error */
>
> The character before tsvector() seems to be special.

Thank you for catching.  Fixed in v3.

--
Regards,
Alexander Korotkov




Re: websearch_to_tsquery() returns queries that don't match to_tsvector()

2021-05-02 Thread Alexander Korotkov
On Sun, May 2, 2021 at 9:04 PM Tom Lane  wrote:
> Alexander Korotkov  writes:
> > Ooops, I've included this by oversight.  The next revision is attached.
> > Anything besides that?
>
> Some quick eyeball review:
>
> +/* Everything is quotes is processed as a single token */
>
> Should read "Everything in quotes ..."
>
> -/* or else gettoken_tsvector() will raise an error */
> +/* or else ƒtsvector() will raise an error */
>
> Looks like an unintentional change?

Thank you for catching this!

> @@ -846,7 +812,6 @@ parse_tsquery(char *buf,
> state.buffer = buf;
> state.buf = buf;
> state.count = 0;
> -   state.in_quotes = false;
> state.state = WAITFIRSTOPERAND;
> state.polstr = NIL;
>
> This change seems wrong/unsafe too.

It seems OK, because this patch removes the in_quotes field altogether.
We don't have to track in the state whether we are in quotes, since we
process everything in quotes as a single token.

--
Regards,
Alexander Korotkov


0001-Make-websearch_to_tsquery-parse-text-in-quotes-as-v3.patch
Description: Binary data


Re: websearch_to_tsquery() returns queries that don't match to_tsvector()

2021-05-02 Thread Zhihong Yu
On Sun, May 2, 2021 at 10:57 AM Alexander Korotkov 
wrote:

> On Sun, May 2, 2021 at 8:52 PM Tom Lane  wrote:
> > Alexander Korotkov  writes:
> > > It seems there is another bug with phrase search and query parsing.
> > > It seems to me that since 0c4f355c6a websearch_to_tsquery() should
> > > just parse text in quotes as a single token.  Besides fixing this bug,
> > > it simplifies the code.
> >
> > OK ...
> >
> > > Trying to fix this bug before 0c4f355c6a doesn't seem to worth the
> efforts.
> >
> > Agreed, plus it doesn't sound like the sort of behavior change that
> > we want to push out in minor releases.
>
> +1
>
> > > I propose to push the attached patch to v14.  Objections?
> >
> > This patch seems to include some unrelated fooling around in GiST?
>
> Ooops, I've included this by oversight.  The next revision is attached.
>
> Anything besides that?
>
> --
> Regards,
> Alexander Korotkov
>

Hi,
+   /* Everything is quotes is processed as a single token
*/

is quotes -> in quotes

+   /* iterate to the closing quotes or end of the string*/

closing quotes -> closing quote

+   /* or else ƒtsvector() will raise an error */

The character before tsvector() seems to be special.

Cheers


Re: websearch_to_tsquery() returns queries that don't match to_tsvector()

2021-05-02 Thread Tom Lane
Alexander Korotkov  writes:
> Ooops, I've included this by oversight.  The next revision is attached.
> Anything besides that?

Some quick eyeball review:

+/* Everything is quotes is processed as a single token */

Should read "Everything in quotes ..."

-/* or else gettoken_tsvector() will raise an error */
+/* or else ƒtsvector() will raise an error */

Looks like an unintentional change?

@@ -846,7 +812,6 @@ parse_tsquery(char *buf,
state.buffer = buf;
state.buf = buf;
state.count = 0;
-   state.in_quotes = false;
state.state = WAITFIRSTOPERAND;
state.polstr = NIL;

This change seems wrong/unsafe too.

regards, tom lane




Re: websearch_to_tsquery() returns queries that don't match to_tsvector()

2021-05-02 Thread Alexander Korotkov
On Sun, May 2, 2021 at 8:52 PM Tom Lane  wrote:
> Alexander Korotkov  writes:
> > It seems there is another bug with phrase search and query parsing.
> > It seems to me that since 0c4f355c6a websearch_to_tsquery() should
> > just parse text in quotes as a single token.  Besides fixing this bug,
> > it simplifies the code.
>
> OK ...
>
> > Trying to fix this bug before 0c4f355c6a doesn't seem to worth the efforts.
>
> Agreed, plus it doesn't sound like the sort of behavior change that
> we want to push out in minor releases.

+1

> > I propose to push the attached patch to v14.  Objections?
>
> This patch seems to include some unrelated fooling around in GiST?

Ooops, I've included this by oversight.  The next revision is attached.

Anything besides that?

--
Regards,
Alexander Korotkov


0001-Make-websearch_to_tsquery-parse-text-in-quotes-as-v2.patch
Description: Binary data


Re: websearch_to_tsquery() returns queries that don't match to_tsvector()

2021-05-02 Thread Tom Lane
Alexander Korotkov  writes:
> It seems there is another bug with phrase search and query parsing.
> It seems to me that since 0c4f355c6a websearch_to_tsquery() should
> just parse text in quotes as a single token.  Besides fixing this bug,
> it simplifies the code.

OK ...

> Trying to fix this bug before 0c4f355c6a doesn't seem to worth the efforts.

Agreed, plus it doesn't sound like the sort of behavior change that
we want to push out in minor releases.

> I propose to push the attached patch to v14.  Objections?

This patch seems to include some unrelated fooling around in GiST?

regards, tom lane




Re: websearch_to_tsquery() returns queries that don't match to_tsvector()

2021-05-02 Thread Alexander Korotkov
Hi!

On Mon, Apr 19, 2021 at 9:57 AM Valentin Gatien-Baron
 wrote:
> Looking at the tsvector and tsquery, we can see that the problem is
> that the ":" counts as one position for the ts_query but not the
> ts_vector:
>
> select to_tsvector('english', 'aaa: bbb'), websearch_to_tsquery('english', 
> '"aaa: bbb"');
>to_tsvector   | websearch_to_tsquery
> -+--
>  'aaa':1 'bbb':2 | 'aaa' <2> 'bbb'
> (1 row)

It seems there is another bug with phrase search and query parsing.
It seems to me that since 0c4f355c6a websearch_to_tsquery() should
just parse text in quotes as a single token.  Besides fixing this bug,
it simplifies the code.

Trying to fix this bug before 0c4f355c6a doesn't seem to worth the efforts.

I propose to push the attached patch to v14.  Objections?
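To make the failure in the original report concrete, it boils down to a
phrase match that can never succeed (returns false on unpatched servers):

```sql
-- The vector assigns 'aaa':1 'bbb':2 (distance 1), but the query demands
-- 'aaa' <2> 'bbb' (distance 2) because the colon consumed a position in
-- the query parser only, so the match fails.
SELECT to_tsvector('english', 'aaa: bbb') @@
       websearch_to_tsquery('english', '"aaa: bbb"');
```

With quoted text parsed as a single token, websearch_to_tsquery() should
assign positions the same way to_tsvector() does, making the two sides
agree.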

--
Regards,
Alexander Korotkov


0001-Make-websearch_to_tsquery-parse-text-in-quotes-as-a-.patch
Description: Binary data


Re: Regex performance regression induced by match-all code

2021-05-02 Thread Tom Lane
"Joel Jacobson"  writes:
> On Sat, May 1, 2021, at 21:46, Tom Lane wrote:
>> I found a nasty performance problem in commit 824bf7190: given the
>> right sort of regex, checkmatchall() takes an unreasonable amount
>> of time.

> Nice catch.
> I've successfully tested the patch on the regex corpus:

Thanks for testing!

>> The main thing I find a bit ugly about this solution is that
>> I'm abusing the state->tmp fields (which are declared struct state *)
>> to hold the memoization results (which are "bool *").  It'd be
>> possible to avoid that by allocating an additional "bool **" array
>> indexed by state number, but whether it's worth that depends on how
>> allergic you are to weird pointer casts.

I tried rewriting it like that, and I have to say I do like it better
that way.  I think it's clearer, which seems well worth one extra
malloc.

Also, I poked a little more at the question of the heuristic limit
on number of states, by checking the actual numbers of states in
various ways of writing matchall regexes.  It looks to me like
we can cut the limit to DUPINF*2 and still have lots of daylight,
because reasonable (and even not so reasonable) ways to write a
pattern that can match up to K characters seem to come out with
not much more than K states.

Hence, PFA v2.  I also added a couple of test cases based on
looking at code coverage in this area, as well as a case that
takes an unreasonably long time without this fix.

regards, tom lane

diff --git a/src/backend/regex/regc_nfa.c b/src/backend/regex/regc_nfa.c
index 77b860cb0f..6d77c59e12 100644
--- a/src/backend/regex/regc_nfa.c
+++ b/src/backend/regex/regc_nfa.c
@@ -3032,96 +3032,189 @@ analyze(struct nfa *nfa)
 static void
 checkmatchall(struct nfa *nfa)
 {
-	bool		hasmatch[DUPINF + 1];
-	int			minmatch,
-				maxmatch,
-				morematch;
+	bool	  **haspaths;
+	struct state *s;
+	int			i;
 
 	/*
-	 * hasmatch[i] will be set true if a match of length i is feasible, for i
-	 * from 0 to DUPINF-1.  hasmatch[DUPINF] will be set true if every match
-	 * length of DUPINF or more is feasible.
+	 * If there are too many states, don't bother trying to detect matchall.
+	 * This limit serves to bound the time and memory we could consume below.
+	 * Note that even if the graph is all-RAINBOW, if there are significantly
+	 * more than DUPINF states then it's likely that there are paths of length
+	 * more than DUPINF, which would force us to fail anyhow.  In practice,
+	 * plausible ways of writing a matchall regex with maximum finite path
+	 * length K tend not to have very many more than K states.
 	 */
-	memset(hasmatch, 0, sizeof(hasmatch));
+	if (nfa->nstates > DUPINF * 2)
+		return;
 
 	/*
-	 * Recursively search the graph for all-RAINBOW paths to the "post" state,
-	 * starting at the "pre" state.  The -1 initial depth accounts for the
-	 * fact that transitions out of the "pre" state are not part of the
-	 * matched string.  We likewise don't count the final transition to the
-	 * "post" state as part of the match length.  (But we still insist that
-	 * those transitions have RAINBOW arcs, otherwise there are lookbehind or
-	 * lookahead constraints at the start/end of the pattern.)
+	 * First, scan all the states to verify that only RAINBOW arcs appear,
+	 * plus pseudocolor arcs adjacent to the pre and post states.  This lets
+	 * us quickly eliminate most cases that aren't matchall NFAs.
 	 */
-	if (!checkmatchall_recurse(nfa, nfa->pre, false, -1, hasmatch))
-		return;
+	for (s = nfa->states; s != NULL; s = s->next)
+	{
+		struct arc *a;
+
+		for (a = s->outs; a != NULL; a = a->outchain)
+		{
+			if (a->type != PLAIN)
+				return;			/* any LACONs make it non-matchall */
+			if (a->co != RAINBOW)
+			{
+				if (nfa->cm->cd[a->co].flags & PSEUDO)
+				{
+					/*
+					 * Pseudocolor arc: verify it's in a valid place (this
+					 * seems quite unlikely to fail, but let's be sure).
+					 */
+					if (s == nfa->pre &&
+						(a->co == nfa->bos[0] || a->co == nfa->bos[1]))
+						 /* okay BOS/BOL arc */ ;
+					else if (a->to == nfa->post &&
+							 (a->co == nfa->eos[0] || a->co == nfa->eos[1]))
+						 /* okay EOS/EOL arc */ ;
+					else
+						return; /* unexpected pseudocolor arc */
+					/* We'll check these arcs some more below. */
+				}
+				else
+					return;		/* any other color makes it non-matchall */
+			}
+		}
+		/* Also, assert that the tmp fields are available for use. */
+		assert(s->tmp == NULL);
+	}
 
 	/*
-	 * We found some all-RAINBOW paths, and not anything that we couldn't
-	 * handle.  Now verify that pseudocolor arcs adjacent to the pre and post
-	 * states match the RAINBOW arcs there.  (We could do this while
-	 * recursing, but it's expensive and unlikely to fail, so do it last.)
+	 * The next cheapest check we can make is to verify that the BOS/BOL
+	 * outarcs of the pre state reach the same states as its RAINBOW outarcs.
+	 * If they don't, the NFA expresses some constraints on the character
+	 * before 

Re: Identify missing publications from publisher while create/alter subscription.

2021-05-02 Thread vignesh C
On Sat, May 1, 2021 at 7:58 PM Bharath Rupireddy wrote:
>
> On Sat, May 1, 2021 at 12:49 PM vignesh C  wrote:
> > Thanks for the comments, Attached patch has the fixes for the same.
> > Thoughts?
>
> Few more comments on v5:
>
> 1) Deletion of below empty new line is spurious:
> -
>  /*
>   * Common option parsing function for CREATE and ALTER SUBSCRIPTION commands.
>   *
>

Modified.

> 2) I think we could just do as below to save indentation of the code
> for validate_publication == true.
> static void
> +connect_and_check_pubs(Subscription *sub, List *publications,
> +                       bool validate_publication)
> +{
> +    char       *err;
> +
> +    if (validate_publication == false)
> +        return;
> +
> +    /* Load the library providing us libpq calls. */
> +    load_file("libpqwalreceiver", false);
>

Modified.

> 3) To be consistent, either we pass in validate_publication to both
> connect_and_check_pubs and check_publications, return immediately from
> them if it is false or do the checks outside. I suggest to pass in the
> bool parameter to check_publications like you did for
> connect_and_check_pubs. Or remove validate_publication from
> connect_and_check_pubs and do the check outside.
> +    if (validate_publication)
> +        check_publications(wrconn, publications);
> +    if (check_pub)
> +        check_publications(wrconn, sub->publications);
>

Modified.

> 4) Below line of code is above 80-char limit:
> +else if (strcmp(defel->defname, "validate_publication") == 0
> && validate_publication)
>

Modified.

> 5) Instead of adding a new file 021_validate_publications.pl for
> tests, spawning a new test database which would make the overall
> regression slower, can't we add with the existing database nodes in
> 0001_rep_changes.pl? I would suggest adding the tests in there even if
> the number of tests are many, I don't mind.

001_rep_changes.pl mainly checks the replicated data. I did not find an
appropriate file among the current tap tests, so I preferred to keep
these tests in a separate file. Thoughts?
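[Editor's note: for reference, a usage sketch of the proposed option; the connection string and publication names are invented, and the syntax follows the v6 patch.]

```sql
-- Errors out at CREATE time if "nonexistent_pub" is absent on the publisher:
CREATE SUBSCRIPTION mysub
    CONNECTION 'host=publisher dbname=postgres'
    PUBLICATION existing_pub, nonexistent_pub
    WITH (validate_publication = true);
```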

Thanks for the comments.
The attached patch has the fixes for the same.

Regards,
Vignesh
From 89a1e47c8be015fac520d2f94d16f86bf78a481c Mon Sep 17 00:00:00 2001
From: vignesh 
Date: Wed, 7 Apr 2021 22:05:53 +0530
Subject: [PATCH v6] Identify missing publications from publisher while
 create/alter subscription.

Creating or altering a subscription currently succeeds even when a specified
publication does not exist on the publisher. This patch checks that the
specified publications are present on the publisher and throws an error if
any of them is missing.
---
 doc/src/sgml/ref/alter_subscription.sgml  |  13 +
 doc/src/sgml/ref/create_subscription.sgml |  13 +
 src/backend/commands/subscriptioncmds.c   | 232 +++---
 src/bin/psql/tab-complete.c   |   7 +-
 .../t/021_validate_publications.pl|  82 +++
 5 files changed, 315 insertions(+), 32 deletions(-)
 create mode 100644 src/test/subscription/t/021_validate_publications.pl

diff --git a/doc/src/sgml/ref/alter_subscription.sgml b/doc/src/sgml/ref/alter_subscription.sgml
index 367ac814f4..81e156437b 100644
--- a/doc/src/sgml/ref/alter_subscription.sgml
+++ b/doc/src/sgml/ref/alter_subscription.sgml
@@ -160,6 +160,19 @@ ALTER SUBSCRIPTION name RENAME TO <
  
 

+
+   
+validate_publication (boolean)
+
+ 
+  When true, the command verifies that all the specified publications
+  being subscribed to are present on the publisher and throws an
+  error if any of them doesn't exist. The default is
+  false.
+ 
+
+   
+
   
 

diff --git a/doc/src/sgml/ref/create_subscription.sgml b/doc/src/sgml/ref/create_subscription.sgml
index e812beee37..2f1f541253 100644
--- a/doc/src/sgml/ref/create_subscription.sgml
+++ b/doc/src/sgml/ref/create_subscription.sgml
@@ -239,6 +239,19 @@ CREATE SUBSCRIPTION subscription_name
 

+
+   
+validate_publication (boolean)
+
+ 
+  When true, the command verifies that all the specified publications
+  being subscribed to are present on the publisher and throws an
+  error if any of them doesn't exist. The default is
+  false.
+ 
+
+   
+
   
 

diff --git a/src/backend/commands/subscriptioncmds.c b/src/backend/commands/subscriptioncmds.c
index 517c8edd3b..d731ba3e14 100644
--- a/src/backend/commands/subscriptioncmds.c
+++ b/src/backend/commands/subscriptioncmds.c
@@ -69,7 +69,9 @@ parse_subscription_options(List *options,
 		   char **synchronous_commit,
 		   bool *refresh,
 		   bool *binary_given, bool *binary,
-		   bool *streaming_given, bool *streaming)
+		   bool *streaming_given, bool *streaming,
+		  

Re: Enhanced error message to include hint messages for redundant options error

2021-05-02 Thread vignesh C
On Sat, May 1, 2021 at 9:02 PM Bharath Rupireddy wrote:
>
> On Sat, May 1, 2021 at 7:25 PM vignesh C  wrote:
> > > > I'm not attaching above one line change as a patch, maybe Vignesh can
> > > > merge this into the main patch.
> >
> > Thanks for the comments. I have merged the change into the attached patch.
> > Thoughts?
>
> Thanks! v4 basically LGTM. Can we park this in the current commitfest
> if not done already?
>
> Upon looking at the number of places where we have the "option \"%s\"
> specified more than once" error, I, now strongly feel that we should
> use goto duplicate_error approach like in compute_common_attribute, so
> that we will have only one ereport(ERROR. We can change it in
> following files: copy.c, dbcommands.c, extension.c,
> compute_function_attributes, sequence.c, subscriptioncmds.c,
> typecmds.c, user.c, walsender.c, pgoutput.c. This will reduce the LOC
> greatly.
>
> Thoughts?

I have made these changes and posted them in the v5 patch attached to
my earlier mail.

Regards,
Vignesh




Re: Enhanced error message to include hint messages for redundant options error

2021-05-02 Thread vignesh C
On Sat, May 1, 2021 at 10:54 PM Alvaro Herrera  wrote:
>
> On 2021-May-01, vignesh C wrote:
>
> > On Thu, Apr 29, 2021 at 10:44 PM Alvaro Herrera wrote:
> > >
> > > On 2021-Apr-29, vignesh C wrote:
> > >
> > > > Thanks for the comments, please find the attached v3 patch which has
> > > > the change for the first part.
> > >
> > > Looks good to me.  I would only add parser_errposition() to the few
> > > error sites missing that.
> >
> > I have not included parser_errposition as ParseState was not available
> > for these errors.
>
> Yeah, it's tough to do that in a few of those such as validator
> functions, and I don't think we'd want to do that.  However there are
> some cases where we can easily add the parsestate as an argument -- for
> example CreatePublication can get it in ProcessUtilitySlow and pass it
> down to parse_publication_options; likewise for ExecuteDoStmt.  I didn't
> check other places.

Thanks for the comments. I have made this change in most places, except
for a few such as plugin functions and internal commands, and for cases
that would require changing more levels of function callers. The
attached patch has these changes.
Thoughts?

Regards,
Vignesh
From 7c56cea92947b82107e2298f6d48d934df6fd7d8 Mon Sep 17 00:00:00 2001
From: vignesh 
Date: Mon, 26 Apr 2021 18:40:36 +0530
Subject: [PATCH v5] Enhance error message.

Enhance the error message so that the user can easily identify which
option was specified redundantly.
---
 contrib/file_fdw/file_fdw.c |  20 +--
 src/backend/catalog/aclchk.c|  22 ++--
 src/backend/commands/copy.c |  66 --
 src/backend/commands/dbcommands.c   |  90 +-
 src/backend/commands/extension.c|  27 ++--
 src/backend/commands/foreigncmds.c  |  30 +++--
 src/backend/commands/functioncmds.c |  55 
 src/backend/commands/publicationcmds.c  |  39 +++---
 src/backend/commands/sequence.c |  57 +++--
 src/backend/commands/subscriptioncmds.c |  75 +--
 src/backend/commands/tablecmds.c|   2 +-
 src/backend/commands/typecmds.c |  38 +++---
 src/backend/commands/user.c | 131 +++-
 src/backend/parser/parse_utilcmd.c  |   2 +-
 src/backend/replication/pgoutput/pgoutput.c |  31 +++--
 src/backend/replication/walsender.c |  23 ++--
 src/backend/tcop/utility.c  |  20 +--
 src/include/commands/defrem.h   |   6 +-
 src/include/commands/publicationcmds.h  |   4 +-
 src/include/commands/subscriptioncmds.h |   4 +-
 src/include/commands/typecmds.h |   2 +-
 src/include/commands/user.h |   2 +-
 src/test/regress/expected/copy2.out |  24 ++--
 src/test/regress/expected/foreign_data.out  |   8 +-
 src/test/regress/expected/publication.out   |   4 +-
 25 files changed, 352 insertions(+), 430 deletions(-)

diff --git a/contrib/file_fdw/file_fdw.c b/contrib/file_fdw/file_fdw.c
index 2c2f149fb0..f857d5af97 100644
--- a/contrib/file_fdw/file_fdw.c
+++ b/contrib/file_fdw/file_fdw.c
@@ -200,6 +200,7 @@ file_fdw_validator(PG_FUNCTION_ARGS)
 	char	   *filename = NULL;
 	DefElem    *force_not_null = NULL;
 	DefElem    *force_null = NULL;
+	DefElem    *def;
 	List	   *other_options = NIL;
 	ListCell   *cell;
 
@@ -209,7 +210,7 @@ file_fdw_validator(PG_FUNCTION_ARGS)
 	 */
 	foreach(cell, options_list)
 	{
-		DefElem    *def = (DefElem *) lfirst(cell);
+		def = (DefElem *) lfirst(cell);
 
 		if (!is_valid_option(def->defname, catalog))
 		{
@@ -290,10 +291,7 @@ file_fdw_validator(PG_FUNCTION_ARGS)
 		else if (strcmp(def->defname, "force_not_null") == 0)
 		{
 			if (force_not_null)
-				ereport(ERROR,
-						(errcode(ERRCODE_SYNTAX_ERROR),
-						 errmsg("conflicting or redundant options"),
-						 errhint("Option \"force_not_null\" supplied more than once for a column.")));
+				goto duplicate_error;
 			force_not_null = def;
 			/* Don't care what the value is, as long as it's a legal boolean */
 			(void) defGetBoolean(def);
@@ -302,10 +300,7 @@ file_fdw_validator(PG_FUNCTION_ARGS)
 		else if (strcmp(def->defname, "force_null") == 0)
 		{
 			if (force_null)
-				ereport(ERROR,
-						(errcode(ERRCODE_SYNTAX_ERROR),
-						 errmsg("conflicting or redundant options"),
-						 errhint("Option \"force_null\" supplied more than once for a column.")));
+				goto duplicate_error;
 			force_null = def;
 			(void) defGetBoolean(def);
 		}
@@ -328,6 +323,13 @@ file_fdw_validator(PG_FUNCTION_ARGS)
 				 errmsg("either filename or program is required for file_fdw foreign tables")));
 
 	PG_RETURN_VOID();
+
+duplicate_error:
+	ereport(ERROR,
+			(errcode(ERRCODE_SYNTAX_ERROR),
+			 errmsg("option \"%s\" specified more than once", def->defname)));
+	PG_RETURN_VOID();			/* keep compiler quiet */
+
 }
 
 /*
diff --git a/src/backend/catalog/aclchk.c b/src/backend/catalog/aclchk.c
index e1573eb398..be38488934 100644
--- a/src/backend/catalog/aclchk.c
+++ 

Re: Enhanced error message to include hint messages for redundant options error

2021-05-02 Thread Julien Rouhaud
On Sun, May 2, 2021 at 6:31 PM vignesh C wrote:

> On Sun, May 2, 2021 at 4:27 AM Alvaro Herrera 
> wrote:
> >
> > On 2021-May-01, Bharath Rupireddy wrote:
> >
> > > IMO, it's not good to change the function API just for showing up
> > > parse_position (which is there for cosmetic reasons I feel) in an error
> > > which actually has the option name clearly mentioned in the error
> message.
> >
> > Why not?  We've done it before, I'm sure you can find examples in the
> > git log.  The function API is not that critical -- these functions are
> > mostly only called from ProcessUtility and friends.
>
> I feel it is better to include it wherever possible.
>

+1



Re: Enhanced error message to include hint messages for redundant options error

2021-05-02 Thread vignesh C
On Sun, May 2, 2021 at 4:27 AM Alvaro Herrera  wrote:
>
> On 2021-May-01, Bharath Rupireddy wrote:
>
> > IMO, it's not good to change the function API just for showing up
> > parse_position (which is there for cosmetic reasons I feel) in an error
> > which actually has the option name clearly mentioned in the error message.
>
> Why not?  We've done it before, I'm sure you can find examples in the
> git log.  The function API is not that critical -- these functions are
> mostly only called from ProcessUtility and friends.

I feel it is better to include it wherever possible.

Regards,
Vignesh