Re: row filtering for logical replication

2021-09-24 Thread Amit Kapila
On Mon, Sep 20, 2021 at 3:17 PM Ajin Cherian  wrote:
>
> I have not changed any of the first 5 patches, just added my patch 006
> at the end. Do let me know of any comments on this approach.
>

I have a question regarding v29-0003-PS-ExprState-cache-modifications.
In pgoutput_row_filter, for the row filter, we traverse the ancestors of
a partition to find pub_relid, but isn't that already available in
RelationSyncEntry as publish_as_relid?

-- 
With Regards,
Amit Kapila.




Re: row filtering for logical replication

2021-09-24 Thread Amit Kapila
On Sat, Sep 25, 2021 at 3:07 AM Tomas Vondra
 wrote:
>
> On 9/24/21 8:09 AM, Amit Kapila wrote:
> > On Thu, Sep 23, 2021 at 6:03 PM Tomas Vondra
> >  wrote:
> >>
> >> 13) turning update into insert
> >>
> >> I agree with Ajin Cherian [4] that looking at just old or new row for
> >> updates is not the right solution, because each option will "break" the
> >> replica in some case. So I think the goal "keeping the replica in sync"
> >> is the right perspective, and converting the update to insert/delete if
> >> needed seems appropriate.
> >>
> >> This seems a somewhat similar to what pglogical does, because that may
> >> also convert updates (although only to inserts, IIRC) when handling
> >> replication conflicts. The difference is pglogical does all this on the
> >> subscriber, while this makes the decision on the publisher.
> >>
> >> I wonder if this might have some negative consequences, or whether
> >> "moving" this to downstream would be useful for other purposes in the
> >> future (e.g. it might be reused for handling other conflicts).
> >>
> >
> > Apart from additional traffic, I am not sure how we will handle all
> > the conditions on subscribers, say if the new row doesn't match, how
> > will subscribers know about this unless we pass the row_filter or some
> > additional information along with the tuple. Previously, I have done some
> > research and shared in one of the emails above that IBM's InfoSphere
> > Data Replication [1] performs filtering in this way, which also
> > suggests that we won't be off here.
> >
>
> I'm certainly not suggesting what we're doing is wrong. Given the design
> of built-in logical replication it makes sense doing it this way, I was
> just thinking aloud about what we might want to do in the future (e.g.
> pglogical uses this to deal with conflicts between multiple sources, and
> so on).
>

Fair enough.

> >>
> >>
> >> 15) pgoutput_row_filter initializing filter
> >>
> >> I'm not sure I understand why the filter initialization gets moved from
> >> get_rel_sync_entry. Presumably, most of what the replication does is
> >> replicating rows, so I see little point in not initializing this along
> >> with the rest of the rel_sync_entry.
> >>
> >
> > Sorry, IIRC, this has been suggested by me and I thought it was best
> > to do any expensive computation the first time it is required. I have
> > shared a few cases like in [2] where it would lead to additional cost
> > without any gain. Unless I am missing something, I don't see any
> > downside of doing it in a delayed fashion.
> >
>
> Not sure, but the arguments presented there seem a bit wonky ...
>
> Yes, the work would be wasted if we discard the cached data without
> using it (it might happen for truncate, I'm not sure). But how likely is
> it that such operations happen *in isolation*? I'd bet the workload is
> almost never just a stream of truncates - there are always some
> operations in between that would actually use this.
>

It could also happen with a mix of truncates and other operations,
because we decide whether to publish an operation or not only after
get_rel_sync_entry.

> Similarly for the errors - IIRC hitting an error means the replication
> restarts, which is orders of magnitude more expensive than anything we
> can save by this delayed evaluation.
>
> I'd keep it simple, for the sake of simplicity of the whole patch.
>

The current version proposed by Peter is not reviewed yet, and looking
at it I have some questions too, which I'll clarify in a separate
email. I am not sure whether you are against delaying the expression
initialization because of the current code or because of the concept in
general. If it is the latter, then we already have other instances
where we don't do all the work in get_rel_sync_entry, such as building
the tuple conversion map, which is cached as well.

-- 
With Regards,
Amit Kapila.




Re: Fixing WAL instability in various TAP tests

2021-09-24 Thread Noah Misch
On Fri, Sep 24, 2021 at 05:33:13PM -0700, Mark Dilger wrote:
> A few TAP tests in the project appear to be sensitive to reductions of the
> PostgresNode's max_wal_size setting, resulting in tests failing due to wal
> files having been removed too soon.  The failures in the logs typically are
> of the "requested WAL segment %s has already been removed" variety.  I would
> expect tests which fail under legal alternate GUC settings to be hardened to
> explicitly set the GUCs as they need, rather than implicitly relying on the
> defaults.

That is not the general practice in PostgreSQL tests today.  The buildfarm
exercises some settings, so we keep the tests clean for those.  Coping with
max_wal_size=2 that way sounds reasonable.  I'm undecided about the value of
hardening tests against all possible settings.  On the plus side, it would let
us run one buildfarm member that sets every setting to its min_val or
enumvals[1] and another member that elects enumvals[cardinality(enumvals)] or
max_val.  We'd catch some real bugs that way.  On the minus side, every
nontrivial test suite addition would need to try those two cases before commit
or risk buildfarm wrath.  I don't know whether the bugs found would pay for
that trouble.  (There's also a less-important loss around the ability to
exercise a setting and manually inspect the results.  For example, I sometimes
test parallel_tuple_cost=0 parallel_setup_cost=0 and confirm a lack of
crashes.  After hardening, that would require temporary source code edits to
remove the hardening.  That's fine, though.)
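
(For concreteness, such an "extreme settings" member could derive its
configuration straight from pg_settings -- a rough sketch, ignoring
GUCs that would need special-casing:

    SELECT name,
           CASE WHEN vartype IN ('integer', 'real') THEN min_val
                WHEN vartype = 'enum' THEN enumvals[1]
           END AS low_setting
    FROM pg_settings
    WHERE vartype IN ('integer', 'real', 'enum');

The max_val / enumvals[cardinality(enumvals)] member would be the
mirror image of that.)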




Re: row filtering for logical replication

2021-09-24 Thread Amit Kapila
On Sat, Sep 25, 2021 at 3:30 AM Tomas Vondra
 wrote:
>
> On 9/24/21 7:20 AM, Amit Kapila wrote:
> >
> > I think the right way to support functions is by the explicit marking
> > of functions and in one of the emails above Jeff Davis also agreed
> > with the same. I think we should probably introduce a new marking for
> > this. I feel this is important because without this it won't be safe
> > to access even some of the built-in functions that can access/update
> > database (non-immutable functions) due to logical decoding environment
> > restrictions.
> >
>
> I agree that seems reasonable. Is there any reason why not to just use
> IMMUTABLE for this purpose? Seems like a good match to me.
>

It will solve just one part of the puzzle (related to database access),
but it is better to avoid the risk of broken replication by explicit
marking, especially for UDFs or other user-defined objects. You seem to
be okay with documenting such a risk, but I am not sure we have an
agreement on that, especially because it was one of the key points of
discussion in this thread and various people said that we need to do
something about it. I personally feel we should do something if we
want to allow user-defined functions or operators, because this problem
has been reported multiple times in the thread. I think we can go ahead
with IMMUTABLE built-ins for the first version and then allow UDFs
later, or try to find a way for explicit marking.

> Yes, the user can lie and label something that is not really IMMUTABLE,
> but that's his fault. Yes, it's harder to fix than e.g. for indexes.
>

Agreed and I think we can't do anything about this.
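
To make that hazard concrete, here is a sketch of the kind of row
filter under discussion (publication WHERE syntax as proposed in this
patch series; the function and table names are made up, and whether a
UDF like this would be accepted at all is exactly the open question):

    CREATE FUNCTION lookup_region(cust_id int) RETURNS text
        LANGUAGE sql IMMUTABLE   -- mislabeled: it actually reads a table
        AS $$ SELECT region FROM customers WHERE id = cust_id $$;

    CREATE PUBLICATION pub_emea FOR TABLE orders
        WHERE (lookup_region(customer_id) = 'EMEA');

The function is labeled IMMUTABLE but reads another table, which is
exactly the sort of thing that fails under the historic snapshots used
by logical decoding.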

> >>
> >> 12) misuse of REPLICA IDENTITY
> >>
> >> The more I think about this, the more I think we're actually misusing
> >> REPLICA IDENTITY for something entirely different. The whole purpose of
> >> RI was to provide a row identifier for the subscriber.
> >>
> >> But now we're using it to ensure we have all the necessary columns,
> >> which is entirely orthogonal to the original purpose. I predict this
> >> will have rather negative consequences.
> >>
> >> People will either switch everything to REPLICA IDENTITY FULL, or create
> >> bogus unique indexes with extra columns. Which is really silly, because
> >> it wastes network bandwidth (transfers more data) or local resources
> >> (CPU and disk space to maintain extra indexes).
> >>
> >> IMHO this needs more infrastructure to request extra columns to decode
> >> (e.g. for the filter expression), and then remove them before sending
> >> the data to the subscriber.
> >>
> >
> > Yeah, but that would have an additional load on write operations and I
> > am not sure at this stage but maybe there could be other ways to
> > extend the current infrastructure wherein we build the snapshots using
> > which we can access the user tables instead of only catalog tables.
> > Such enhancements if feasible would be useful not only for allowing
> > additional column access in row filters but for other purposes like
> > allowing access to functions that access user tables. I feel we can
> > extend this later as well seeing the usage and requests. For the first
> > version, this doesn't sound too limiting to me.
> >
>
> I'm not really buying the argument that this means overhead for write
> operations. Well, it does, but the current RI approach is forcing users
> to either use RIF or add an index covering the filter attributes.
> Neither of those options is free, and I'd bet the extra overhead of
> adding just the row filter columns would be actually lower.
>
> If the argument is merely to limit the scope of this patch, fine.
>

Yeah, that is one reason, and I am not sure that adding extra WAL is
the best or only solution for this problem. As mentioned in my previous
response, I think we eventually need to find a way to access user
tables to support UDFs (that access the database) or sub-queries, which
other databases already support. For that, we might need to enhance the
current snapshot mechanism, after which we might not need any
additional WAL even for additional columns in the row filter. I don't
think any of us has evaluated in detail the different ways this
problem can be solved and the pros/cons of each approach, so limiting
the scope for this purpose doesn't seem like a bad idea to me.

-- 
With Regards,
Amit Kapila.




Re: decoupling table and index vacuum

2021-09-24 Thread Peter Geoghegan
On Fri, Sep 24, 2021 at 7:44 PM Peter Geoghegan  wrote:
> The scheduling of autovacuum is itself a big problem for the two big
> BenchmarkSQL tables I'm always going on about -- though it did get a
> lot better with the introduction of the
> autovacuum_vacuum_insert_scale_factor stuff in Postgres 13. I recently
> noticed that the tables have *every* autovacuum driven by inserts
> (i.e. by the new autovacuum_vacuum_scale_factor stuff), and never by
> updates -- even though updates obviously produce significant bloat in
> the two tables. BenchmarkSQL on Postgres was far worse than it is now
> a few releases ago [1], and I think that this stats business was a big
> factor (on top of everything else). I can clearly see that
> autovacuum_vacuum_scale_factor is certainly accidentally protective
> with BenchmarkSQL today, in a way that wasn't particularly anticipated
> by anybody.

> So if this was a real app, the DBA would
> somehow have to work out that they should aggressively tune
> autovacuum_vacuum_scale_factor to clean up bloat from updates. I doubt
> any DBA could ever figure that out, because it doesn't make any sense.

Correction: I meant that the autovacuum_vacuum_insert_scale_factor GUC is
accidentally protective with the BenchmarkSQL tables, and that no DBA
could be expected to figure this out. That is, it helps to lower
autovacuum_vacuum_insert_scale_factor from its default of 0.20, just
to get autovacuum to better handle bloat from *updates*. This has
nothing to do with inserts, or with freezing or setting VM bits -- and so
overall it doesn't make any sense.


--
Peter Geoghegan




Re: decoupling table and index vacuum

2021-09-24 Thread Peter Geoghegan
On Fri, Sep 24, 2021 at 11:48 AM Robert Haas  wrote:
> Actually, I have. I've been focusing on trying to create a general
> infrastructure for conveyor belt storage. An incomplete and likely
> quite buggy version of this can be found here:
>
> https://git.postgresql.org/gitweb/?p=users/rhaas/postgres.git;a=shortlog;h=refs/heads/conveyor

That's great news! I think that this is the right high level direction.

> Mark Dilger has been helping me debug it, but it's still very early
> days. I was planning to wait until it was a little more baked before
> posting it to the list, but since you asked...

Reminds me of my FSM patch, in a way. It's ambitious, and still very
rough, but maybe I should bite the bullet and post it as a POC soon.

> Once that infrastructure is sufficiently mature, then the next step, I
> think, would be to try to use it to store dead TIDs.

+1.

> And then after that, one has to think about how autovacuum scheduling
> ought to work in a world where table vacuuming and index vacuuming are
> decoupled.

I'm excited about the possibility of using this infrastructure as a
springboard for driving autovacuum's behavior using more or less
authoritative information, rather than dubious statistics that can
consistently lead us down the wrong path. ANALYZE style statistics are
something that can only work under specific conditions that take their
obvious limitations into account -- and even there (even within the
optimizer) it's amazing that they work as well as they do. I fear that
we assumed that the statistics driving autovacuum were good enough at
some point in the distant past, and never really validated that
assumption. Perhaps because anti-wraparound VACUUM was *accidentally*
protective.

The scheduling of autovacuum is itself a big problem for the two big
BenchmarkSQL tables I'm always going on about -- though it did get a
lot better with the introduction of the
autovacuum_vacuum_insert_scale_factor stuff in Postgres 13. I recently
noticed that the tables have *every* autovacuum driven by inserts
(i.e. by the new autovacuum_vacuum_scale_factor stuff), and never by
updates -- even though updates obviously produce significant bloat in
the two tables. BenchmarkSQL on Postgres was far worse than it is now
a few releases ago [1], and I think that this stats business was a big
factor (on top of everything else). I can clearly see that
autovacuum_vacuum_scale_factor is certainly accidentally protective
with BenchmarkSQL today, in a way that wasn't particularly anticipated
by anybody.

The fact that the intellectual justifications for a lot of these
things are so vague concerns me. For example, why do we apply
autovacuum_vacuum_scale_factor based on reltuples at the end of the
last VACUUM? That aspect of the design will make much less sense once
we have this decoupling in place. Even with the happy accident of
autovacuum_vacuum_insert_scale_factor helping BenchmarkSQL, the
conventional dead tuples based approach to VACUUM still doesn't drive
autovacuum sensibly -- we still systematically undercount LP_DEAD
stubs because (with this workload) they're systemically concentrated
in relatively few heap pages. So if this was a real app, the DBA would
somehow have to work out that they should aggressively tune
autovacuum_vacuum_scale_factor to clean up bloat from updates. I doubt
any DBA could ever figure that out, because it doesn't make any sense.
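
(For reference, the scheduling thresholds being discussed are, per the
documentation:

    updates/deletes: dead_tuples > autovacuum_vacuum_threshold
                        + autovacuum_vacuum_scale_factor * reltuples
    inserts (v13+):  inserted_tuples > autovacuum_vacuum_insert_threshold
                        + autovacuum_vacuum_insert_scale_factor * reltuples

where reltuples is the estimate left behind by the previous
VACUUM/ANALYZE -- the "reltuples at the end of the last VACUUM"
referred to above.)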

The problem goes both ways: in addition to undercounting dead tuples,
we effectively overcount, which can lead to autovacuum chasing its own
tail [2].

I think that we could do *way* better than we do today without
enormous effort, and I think that it really matters. Maybe we could
select from a few standard models for autovacuum scheduling using
Bayesian inference -- converge on the more predictive model for a
given table over time, using actual outcomes for each autovacuum. Be
sensitive to how LP_DEAD stub line pointers can become concentrated in
relatively few heap pages, and stuff like that. Maybe keep a little
history to work off of. The problem with the current model is not that
it might be wrong. The problem is that it might *never* be right (for
a given table). The scheduling never learns any lessons, because it's
fundamentally static -- it ought to be dynamic. How things change is
much more informative than where things are at an arbitrary point in
time.

[1] 
https://www.postgresql.org/message-id/flat/0265f9e2-3e32-e67d-f106-8abde596c0e4%40commandprompt.com
[2] 
https://postgr.es/m/CAH2-Wz=sJm3tm+FpXbyBhEhX5tbz1trQrhG6eOhYk4-+5uL=w...@mail.gmail.com
--
Peter Geoghegan




RE: Column Filtering in Logical Replication

2021-09-24 Thread houzj.f...@fujitsu.com
From Fri, Sep 24, 2021 9:25 PM Alvaro Herrera  wrote:
> On 2021-Sep-23, Amit Kapila wrote:
> 
> > Alvaro, do you have any thoughts on these proposed grammar changes?
> 
> Yeah, I think pubobj_name remains a problem in that you don't know its return
> type -- could be a String or a RangeVar, and the user of that production can't
> distinguish.  So you're still (unnecessarily, IMV) stashing an object of
> undetermined type into ->object.
> 
> I think you should get rid of both pubobj_name and pubobj_expr and do
> something like this:
>
> PublicationObjSpec:	TABLE ColId
> 			{
> 				$$ = makeNode(PublicationObjSpec);
> 				$$->pubobjtype = PUBLICATIONOBJ_TABLE;
> 				$$->rangevar = makeRangeVarFromQualifiedName($1, NULL, @1, yyscanner);
> 			}
> 		| TABLE ColId indirection
> 			{
> 				$$ = makeNode(PublicationObjSpec);
> 				$$->pubobjtype = PUBLICATIONOBJ_TABLE;
> 				$$->rangevar = makeRangeVarFromQualifiedName($1, $2, @1, yyscanner);
> 			}

Hi,

IIRC, the above grammar doesn't support extended relation expressions (like
"tablename *", "ONLY tablename", "ONLY ( tablename )"), which are part of the
relation_expr rule. I think we should add these too. And if we move forward with the
design you proposed, we should do something like the following:

/* FOR TABLE and FOR ALL TABLES IN SCHEMA specifications */
PublicationObjSpec:
		TABLE relation_expr
			{
				$$ = makeNode(PublicationObjSpec);
				$$->pubobjtype = PUBLICATIONOBJ_TABLE;
				$$->rangevar = $2;
			}
		| ALL TABLES IN_P SCHEMA ColId
			{
				$$ = makeNode(PublicationObjSpec);
				$$->pubobjtype = PUBLICATIONOBJ_REL_IN_SCHEMA;
				$$->name = $5;
			}
		| ALL TABLES IN_P SCHEMA CURRENT_SCHEMA
			{
				$$ = makeNode(PublicationObjSpec);
				$$->pubobjtype = PUBLICATIONOBJ_CURRSCHEMA;
				$$->name = $5;
			}
		| extended_relation_expr	/* grammar like tablename *, ONLY tablename, ONLY ( tablename ) */
			{
				$$ = makeNode(PublicationObjSpec);
				$$->rangevar = makeRangeVarFromQualifiedName($1, $2, @1, yyscanner);
				$$->pubobjtype = PUBLICATIONOBJ_CONTINUATION;
			}
		| ColId
			{
				$$ = makeNode(PublicationObjSpec);
				$$->name = $1;
				$$->pubobjtype = PUBLICATIONOBJ_CONTINUATION;
			}
		| ColId indirection
			{
				$$ = makeNode(PublicationObjSpec);
				$$->rangevar = makeRangeVarFromQualifiedName($1, $2, @1, yyscanner);
				$$->pubobjtype = PUBLICATIONOBJ_CONTINUATION;
			}

Best regards,
Hou zj


Re: decoupling table and index vacuum

2021-09-24 Thread Peter Geoghegan
On Thu, Sep 23, 2021 at 10:42 PM Masahiko Sawada  wrote:
> On Thu, Sep 16, 2021 at 7:09 AM Peter Geoghegan  wrote:
> > Enabling index-only scans is a good enough reason to pursue this
> > project, even on its own.
>
> +1

I was hoping that you might be able to work on opportunistically
freezing whole pages for Postgres 15. I think that it would make sense
to opportunistically make a page that is about to become all_visible
during VACUUM become all_frozen instead. Our goal is to make most
pages skip all_visible, and go straight to all_frozen directly. Often
the page won't need to be dirtied again, ever.
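
As an aside, the current all_visible/all_frozen split for a table is
easy to observe with the pg_visibility extension, e.g. (table name is
only an example):

    CREATE EXTENSION pg_visibility;
    SELECT * FROM pg_visibility_map_summary('bmsql_order_line');

which shows how many pages are marked all_visible but never reach
all_frozen until an aggressive VACUUM comes along.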

Right now freezing is something that we mostly just think about as
occurring at the level of tuples, which doesn't seem ideal. This seems
related to Robert's project because both projects are connected to the
question of how autovacuum scheduling works in general. We will
probably need to rethink things like the vacuum_freeze_min_age GUC. (I
also think that we might need to reconsider how
aggressive/anti-wraparound VACUUMs work, but that's another story.)

Obviously this is a case of performing work eagerly; a form of
speculation that tries to lower costs in the aggregate, over time.
Heuristics that work well on average seem possible, but even excellent
heuristics could be wrong -- in the end we're trying to predict the
future, which is inherently impossible to do reliably for all
workloads. I think that that will be okay, provided that the cost of
being wrong is kept low and *fixed* (the exact definition of "fixed"
will need to be defined, but the basic idea is that any regression is
once per page, not once per page per VACUUM or something).

Once it becomes cheap enough to freeze a whole page early (i.e. all
tuple headers from all tuples), then the implementation can be wrong
95%+ of the time, and maybe we'll still win by a lot. That may sound
bad, until you realize that it's 95% *per VACUUM* -- the entire
situation is much better once you think about the picture for the
entire table over time and across many different VACUUM operations,
and once you think about FPIs in the WAL stream. We'll be paying the
cost of freezing in smaller and more predictable increments, too,
which can make the whole system more robust. Many pages that all go
from all_visible to all_frozen at the same time (just because they
crossed some usually-meaningless XID-based threshold) is actually
quite risky (this is why I mentioned aggressive VACUUMs in passing).

The hard part is getting the cost way down. lazy_scan_prune() uses
xl_heap_freeze_tuple records for each tuple it freezes. These
obviously have a lot of redundancy across tuples from the same page in
practice. And the WAL overhead is much larger just because these are
per-tuple records, not per-page records. Getting the cost down is hard
because of issues with MultiXacts, freezing xmin but not freezing xmax
at the same time, etc.

> Logging how vacuum uses and sets VM bits seems a good idea.

> I think that we will end up doubly counting the page as scanned_pages
> and allfrozen_pages due to the newly added latter change. This seems
> wrong to me because we calculate as follows:

I agree that that's buggy. Oops.

It was just a prototype that I wrote for my own work. I do think that
we should have a patch that has some of this, for users, but I am not
sure about the details just yet. This is probably too much information
for users, but I think it will take me more time to decide what really
does matter to users.

-- 
Peter Geoghegan




Fixing WAL instability in various TAP tests

2021-09-24 Thread Mark Dilger
Hackers,

A few TAP tests in the project appear to be sensitive to reductions of the 
PostgresNode's max_wal_size setting, resulting in tests failing due to wal 
files having been removed too soon.  The failures in the logs typically are of 
the "requested WAL segment %s has already been removed" variety.  I would 
expect tests which fail under legal alternate GUC settings to be hardened to 
explicitly set the GUCs as they need, rather than implicitly relying on the 
defaults.  As far as missing WAL files go, I would expect the TAP test to 
prevent this with the use of replication slots or some other mechanism, and not 
simply to rely on checkpoints not happening too soon.  I'm curious if others on 
this list disagree with that point of view.

Failures in src/test/recovery/t/015_promotion_pages.pl can be fixed by creating 
a physical replication slot on node "alpha" and using it from node "beta", a 
technique already used in other TAP tests and apparently merely overlooked in 
this one.
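
The gist of that technique, with illustrative node/slot names (in the
TAP test itself this would go through the PostgresNode helpers rather
than raw SQL):

    -- on "alpha", before "beta" starts streaming:
    SELECT pg_create_physical_replication_slot('beta_slot');

    -- and in beta's postgresql.conf:
    --   primary_slot_name = 'beta_slot'

With the slot in place, alpha retains any WAL that beta still needs, so
a small max_wal_size can no longer trigger the "has already been
removed" failures.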

The first two tests in src/bin/pg_basebackup/t fail, and it's not clear that 
physical replication slots are the appropriate solution, since no replication 
is happening.  It's not immediately obvious that the tests are at fault anyway. 
 On casual inspection, it seems they might be detecting a live bug which simply 
doesn't manifest under larger values of max_wal_size.  Test 010 appears to show 
a bug with `pg_basebackup -X`, and test 020 with `pg_receivewal`.

The test in contrib/bloom/t/ is deliberately disabled in contrib/bloom/Makefile 
with a comment that the test is unstable in the buildfarm, but I didn't find 
anything to explain what exactly those buildfarm failures might have been when 
I chased down the email thread that gave rise to the related commit.  That test 
happens to be stable on my laptop until I change GUC settings to both reduce 
max_wal_size=32MB and to set wal_consistency_checking=all.

Thoughts?

—
Mark Dilger
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company







Re: Column Filtering in Logical Replication

2021-09-24 Thread Alvaro Herrera
On 2021-Sep-25, Tomas Vondra wrote:

> On 9/25/21 12:24 AM, Alvaro Herrera wrote:
> > On 2021-Sep-24, Tomas Vondra wrote:
> > 
> > > But that's not the column filtering patch, right? Why would this patch
> > > depend on "schema level support", but maybe the consensus is there's some
> > > common part that we need to get in first?
> > 
> > Yes, the grammar needs to be common.  I posted a proposed grammar in
> > https://www.postgresql.org/message-id/202109241325.eag5g6mpvoup%40alvherre.pgsql
> > (this thread) which should serve both.  I forgot to test the addition of
> > a WHERE clause for row filtering, though, and I didn't think to look at
> > adding SEQUENCE support either.
> 
> Fine with me, but I still don't know which version of the column filtering
> patch should I look at ... maybe there's none up to date, at the moment?

I don't think there is one.  I think the latest is what I posted in
https://postgr.es/m/202109061751.3qz5xpugwx6w@alvherre.pgsql (At least I
don't see any reply from Rahila with attachments after that), but that
wasn't addressing a bunch of review comments that had been made; and I
suspect that Amit K has already committed a few conflicting patches
after that.

> > (I'm not sure what's going to be the proposal regarding FOR ALL TABLES
> > IN SCHEMA for sequences.  Are we going to have "FOR ALL SEQUENCES IN
> > SCHEMA" and "FOR ALL TABLES AND SEQUENCES IN SCHEMA"?)
> 
> Should be "FOR ABSOLUTELY EVERYTHING IN SCHEMA" of course ;-)

hahah ...

> On a more serious note, a comma-separated list of objects seems like the
> best / most flexible choice, i.e. "FOR TABLES, SEQUENCES IN SCHEMA"?

Hmm, not sure if bison is going to like that.  Maybe it's OK if
SEQUENCES is a fully reserved word?  But nothing beats experimentation!

-- 
Álvaro Herrera  Valdivia, Chile  —  https://www.EnterpriseDB.com/
Thou shalt check the array bounds of all strings (indeed, all arrays), for
surely where thou typest "foo" someone someday shall type
"supercalifragilisticexpialidocious" (5th Commandment for C programmers)




Re: Proposal: Save user's original authenticated identity for logging

2021-09-24 Thread Michael Paquier
On Fri, Sep 24, 2021 at 05:37:48PM -0400, Andrew Dunstan wrote:
> It probably wouldn't be a bad thing to have something somewhere
> (src/test/perl/README ?) that explains when and why we need to bump
> $Test::Builder::Level.

I have some ideas about that.  So I propose to move the discussion to
a new thread.
--
Michael




Re: Column Filtering in Logical Replication

2021-09-24 Thread Tomas Vondra

On 9/25/21 12:24 AM, Alvaro Herrera wrote:

On 2021-Sep-24, Tomas Vondra wrote:


But that's not the column filtering patch, right? Why would this patch
depend on "schema level support", but maybe the consensus is there's some
common part that we need to get in first?


Yes, the grammar needs to be common.  I posted a proposed grammar in
https://www.postgresql.org/message-id/202109241325.eag5g6mpvoup%40alvherre.pgsql
(this thread) which should serve both.  I forgot to test the addition of
a WHERE clause for row filtering, though, and I didn't think to look at
adding SEQUENCE support either.



Fine with me, but I still don't know which version of the column 
filtering patch should I look at ... maybe there's none up to date, at 
the moment?



(I'm not sure what's going to be the proposal regarding FOR ALL TABLES
IN SCHEMA for sequences.  Are we going to have "FOR ALL SEQUENCES IN
SCHEMA" and "FOR ALL TABLES AND SEQUENCES IN SCHEMA"?)



Should be "FOR ABSOLUTELY EVERYTHING IN SCHEMA" of course ;-)

On a more serious note, a comma-separated list of objects seems like the 
best / most flexible choice, i.e. "FOR TABLES, SEQUENCES IN SCHEMA"?



regards

--
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: Column Filtering in Logical Replication

2021-09-24 Thread Alvaro Herrera
On 2021-Sep-24, Tomas Vondra wrote:

> But that's not the column filtering patch, right? Why would this patch
> depend on "schema level support", but maybe the consensus is there's some
> common part that we need to get in first?

Yes, the grammar needs to be common.  I posted a proposed grammar in
https://www.postgresql.org/message-id/202109241325.eag5g6mpvoup%40alvherre.pgsql
(this thread) which should serve both.  I forgot to test the addition of
a WHERE clause for row filtering, though, and I didn't think to look at
adding SEQUENCE support either.

(I'm not sure what's going to be the proposal regarding FOR ALL TABLES
IN SCHEMA for sequences.  Are we going to have "FOR ALL SEQUENCES IN
SCHEMA" and "FOR ALL TABLES AND SEQUENCES IN SCHEMA"?)

-- 
Álvaro Herrera PostgreSQL Developer  —  https://www.EnterpriseDB.com/
Thou shalt study thy libraries and strive not to reinvent them without
cause, that thy code may be short and readable and thy days pleasant
and productive. (7th Commandment for C Programmers)




Re: row filtering for logical replication

2021-09-24 Thread Tomas Vondra




On 9/24/21 7:20 AM, Amit Kapila wrote:

On Thu, Sep 23, 2021 at 6:03 PM Tomas Vondra
 wrote:


6) parse_oper.c

I'm having some second thoughts about (not) allowing UDFs ...

Yes, I get that if the function starts failing, e.g. because querying a
dropped table or something, that breaks the replication and can't be
fixed without a resync.



The other problem is that users can access/query any table inside the
function and that also won't work in a logical decoding environment as
we use historic snapshots using which we can access only catalog
tables.



True. I always forget about some of these annoying issues. Let's 
document all of this in some comment / README. I see we still don't have


  src/backend/replication/logical/README

which is a bit surprising, considering how complex this code is.


That's pretty annoying, but maybe disallowing anything user-defined
(functions and operators) is overly anxious? Also, extensibility
is one of the hallmarks of Postgres, and disallowing all custom UDF and
operators seems to contradict that ...

Perhaps just explaining what the expression can / can't do in the docs,
with clear warnings of the risks, would be acceptable.



I think the right way to support functions is by the explicit marking
of functions and in one of the emails above Jeff Davis also agreed
with the same. I think we should probably introduce a new marking for
this. I feel this is important because without this it won't be safe
to access even some of the built-in functions that can access/update
database (non-immutable functions) due to logical decoding environment
restrictions.



I agree that seems reasonable. Is there any reason why not to just use 
IMMUTABLE for this purpose? Seems like a good match to me.


Yes, the user can lie and label something that is not really IMMUTABLE, 
but that's his fault. Yes, it's harder to fix than e.g. for indexes.




12) misuse of REPLICA IDENTITY

The more I think about this, the more I think we're actually misusing
REPLICA IDENTITY for something entirely different. The whole purpose of
RI was to provide a row identifier for the subscriber.

But now we're using it to ensure we have all the necessary columns,
which is entirely orthogonal to the original purpose. I predict this
will have rather negative consequences.

People will either switch everything to REPLICA IDENTITY FULL, or create
bogus unique indexes with extra columns. Which is really silly, because
it wastes network bandwidth (transfers more data) or local resources
(CPU and disk space to maintain extra indexes).

IMHO this needs more infrastructure to request extra columns to decode
(e.g. for the filter expression), and then remove them before sending
the data to the subscriber.



Yeah, but that would have an additional load on write operations and I
am not sure at this stage but maybe there could be other ways to
extend the current infrastructure wherein we build the snapshots using
which we can access the user tables instead of only catalog tables.
Such enhancements if feasible would be useful not only for allowing
additional column access in row filters but for other purposes like
allowing access to functions that access user tables. I feel we can
extend this later as well seeing the usage and requests. For the first
version, this doesn't sound too limiting to me.



I'm not really buying the argument that this means overhead for write 
operations. Well, it does, but the current RI approach is forcing users 
to either use RIF or add an index covering the filter attributes. 
Neither of those options is free, and I'd bet the extra overhead of 
adding just the row filter columns would be actually lower.
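
For context, the two workarounds being referred to look like this
(table/column names are illustrative; the indexed columns must be NOT
NULL for the second variant):

    -- ship all old-row columns with every update/delete
    ALTER TABLE orders REPLICA IDENTITY FULL;

    -- or maintain a unique index padded with the filter column
    CREATE UNIQUE INDEX orders_ri_idx ON orders (order_id, region);
    ALTER TABLE orders REPLICA IDENTITY USING INDEX orders_ri_idx;

Neither is free, as noted above.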


If the argument is merely to limit the scope of this patch, fine. But 
I'd bet the amount of code we'd have to add to ExtractReplicaIdentity 
(or maybe somewhere close to it) would be fairly small. We'd need to 
cache which columns are needed (like RelationGetIndexAttrBitmap), and 
this might be a bit more complex, due to having to consider all the 
publications etc.


regards

--
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: pg_stat_bgwriter.buffers_backend is pretty meaningless (and more?)

2021-09-24 Thread Melanie Plageman
On Thu, Sep 23, 2021 at 5:05 PM Melanie Plageman
 wrote:
>
> The attached v8 patchset is rewritten to add in an additional dimension
> -- buffer type. Now, a backend keeps track of how many buffers of a
> particular type (e.g. shared, local) it has accessed in a particular way
> (e.g. alloc, write). It also changes the naming of various structures
> and the view members.
>
> Previously, stats reset did not work since it did not consider live
> backends' counters. Now, the reset message includes the current live
> backends' counters to be tracked by the stats collector and used when
> the view is queried.
>
> The reset message is one of the areas in which I still need to do some
> work -- I shoved the array of PgBufferAccesses into the existing reset
> message used for checkpointer, bgwriter, etc. Before making a new type
> of message, I would like feedback from a reviewer about the approach.
>
> There are various TODOs in the code which are actually questions for the
> reviewer. Once I have some feedback, it will be easier to address these
> items.
>
> There are a few other items which may be material for other commits that
> I would also like to do:
> 1) write wrapper functions for smgr* functions which count buffer
> accesses of the appropriate type. I wasn't sure if these should
> literally just take all the parameters that the smgr* functions take +
> buffer type. Once these exist, there will be less possibility for
> regressions in which new code is added using smgr* functions without
> counting this buffer activity. Once I add these, I was going to go
> through and replace existing calls to smgr* functions and thereby start
> counting currently uncounted buffer type accesses (direct, local, etc).
>
> 2) Separate checkpointer and bgwriter into two views and add additional
> stats to the bgwriter view.
>
> 3) Consider adding a helper function to pgstatfuncs.c to help create the
> tuplestore. These functions all have quite a few lines which are exactly
> the same, and I thought it might be nice to do something about that:
>   pg_stat_get_progress_info(PG_FUNCTION_ARGS)
>   pg_stat_get_activity(PG_FUNCTION_ARGS)
>   pg_stat_get_buffers_accesses(PG_FUNCTION_ARGS)
>   pg_stat_get_slru(PG_FUNCTION_ARGS)
>   pg_stat_get_progress_info(PG_FUNCTION_ARGS)
> I can imagine a function that takes a Datums array, a nulls array, and a
> ResultSetInfo and then makes the tuplestore -- though I think that will
> use more memory. Perhaps we could make a macro which does the initial
> error checking (checking if caller supports returning a tuplestore)? I'm
> not sure if there is something meaningful here, but I thought I would
> ask.
>
> Finally, I haven't removed the test in pg_stats and haven't done a final
> pass for comment clarity, alphabetization, etc on this version.
>

I have addressed almost all of the issues mentioned above in v9.
The only remaining TODOs are described in the commit message; the
most critical one is that the reset message doesn't work.
From 9747484ad0b6f1fe97f98cfb681fa117982dfb2f Mon Sep 17 00:00:00 2001
From: Melanie Plageman 
Date: Thu, 2 Sep 2021 11:47:41 -0400
Subject: [PATCH v9 3/3] Remove superfluous bgwriter stats

Remove stats from pg_stat_bgwriter which are now more clearly expressed
in pg_stat_buffers.

TODO:
- make pg_stat_checkpointer view and move relevant stats into it
- add additional stats to pg_stat_bgwriter
---
 doc/src/sgml/monitoring.sgml  | 47 ---
 src/backend/catalog/system_views.sql  |  6 +---
 src/backend/postmaster/checkpointer.c | 26 ---
 src/backend/postmaster/pgstat.c   |  5 ---
 src/backend/storage/buffer/bufmgr.c   |  6 
 src/backend/utils/adt/pgstatfuncs.c   | 30 -
 src/include/catalog/pg_proc.dat   | 22 -
 src/include/pgstat.h  | 10 --
 src/test/regress/expected/rules.out   |  5 ---
 9 files changed, 1 insertion(+), 156 deletions(-)

diff --git a/doc/src/sgml/monitoring.sgml b/doc/src/sgml/monitoring.sgml
index 60627c692a..08772652ac 100644
--- a/doc/src/sgml/monitoring.sgml
+++ b/doc/src/sgml/monitoring.sgml
@@ -3416,24 +3416,6 @@ SELECT pid, wait_event_type, wait_event FROM pg_stat_activity WHERE wait_event i
   
  
 
- 
-  
-   buffers_checkpoint bigint
-  
-  
-   Number of buffers written during checkpoints
-  
- 
-
- 
-  
-   buffers_clean bigint
-  
-  
-   Number of buffers written by the background writer
-  
- 
-
  
   
maxwritten_clean bigint
@@ -3444,35 +3426,6 @@ SELECT pid, wait_event_type, wait_event FROM pg_stat_activity WHERE wait_event i
   
  
 
- 
-  
-   buffers_backend bigint
-  
-  
-   Number of buffers written directly by a backend
-  
- 
-
- 
-  
-   buffers_backend_fsync bigint
-  
-  
-   Number of times a backend had to execute its own
-   fsync call (normally the background writer 

typos

2021-09-24 Thread Justin Pryzby
A compilation of fixes for master.

The first patch should be applied to v13 - the typo was already fixed in master
but not backpatched.
>From 8496bfa328d40cce9afdc4b491243a3ab9b0b528 Mon Sep 17 00:00:00 2001
From: Justin Pryzby 
Date: Fri, 24 Sep 2021 16:19:54 -0500
Subject: [PATCH] unused COLUMNS (plural)

This was fixed in master since 1ed6b8956, but the typo should be backpatched
since it's user-facing documentation publicized on the website for the next 5
years
---
 doc/src/sgml/catalogs.sgml | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/doc/src/sgml/catalogs.sgml b/doc/src/sgml/catalogs.sgml
index 304196c216..0e4d7617c4 100644
--- a/doc/src/sgml/catalogs.sgml
+++ b/doc/src/sgml/catalogs.sgml
@@ -4704,7 +4704,7 @@ SCRAM-SHA-256$iteration count:
   
 
   
-   Unused column contain zeroes. For example, oprleft
+   Unused columns contain zeroes. For example, oprleft
is zero for a prefix operator.
   
 
-- 
2.17.0

>From f65f3b4d7523ba2e40bb4913fd225308426f7a9c Mon Sep 17 00:00:00 2001
From: Justin Pryzby 
Date: Wed, 28 Apr 2021 14:12:49 -0500
Subject: [PATCH 01/10] Avoid double parens

git grep -l '\tuphdr->t_hoff));
-		else if ((infomask & HEAP_HASNULL))
+		else if (infomask & HEAP_HASNULL)
 			report_corruption(ctx,
 			  psprintf("tuple data should begin at byte %u, but actually begins at byte %u (%u attributes, has nulls)",
 	   expected_hoff, ctx->tuphdr->t_hoff, ctx->natts));
diff --git a/contrib/ltree/ltree_io.c b/contrib/ltree/ltree_io.c
index 15115cb29f..b75f35d5b5 100644
--- a/contrib/ltree/ltree_io.c
+++ b/contrib/ltree/ltree_io.c
@@ -661,17 +661,17 @@ deparse_lquery(const lquery *in)
 }
 memcpy(ptr, curtlevel->name, curtlevel->len);
 ptr += curtlevel->len;
-if ((curtlevel->flag & LVAR_SUBLEXEME))
+if (curtlevel->flag & LVAR_SUBLEXEME)
 {
 	*ptr = '%';
 	ptr++;
 }
-if ((curtlevel->flag & LVAR_INCASE))
+if (curtlevel->flag & LVAR_INCASE)
 {
 	*ptr = '@';
 	ptr++;
 }
-if ((curtlevel->flag & LVAR_ANYEND))
+if (curtlevel->flag & LVAR_ANYEND)
 {
 	*ptr = '*';
 	ptr++;
diff --git a/src/backend/access/transam/xact.c b/src/backend/access/transam/xact.c
index 6597ec45a9..8350a8fd19 100644
--- a/src/backend/access/transam/xact.c
+++ b/src/backend/access/transam/xact.c
@@ -2427,7 +2427,7 @@ PrepareTransaction(void)
 	 * cases, such as a temp table created and dropped all within the
 	 * transaction.  That seems to require much more bookkeeping though.
 	 */
-	if ((MyXactFlags & XACT_FLAGS_ACCESSEDTEMPNAMESPACE))
+	if (MyXactFlags & XACT_FLAGS_ACCESSEDTEMPNAMESPACE)
 		ereport(ERROR,
 (errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
  errmsg("cannot PREPARE a transaction that has operated on temporary objects")));
@@ -5540,7 +5540,7 @@ XactLogCommitRecord(TimestampTz commit_time,
 		xl_xinfo.xinfo |= XACT_COMPLETION_UPDATE_RELCACHE_FILE;
 	if (forceSyncCommit)
 		xl_xinfo.xinfo |= XACT_COMPLETION_FORCE_SYNC_COMMIT;
-	if ((xactflags & XACT_FLAGS_ACQUIREDACCESSEXCLUSIVELOCK))
+	if (xactflags & XACT_FLAGS_ACQUIREDACCESSEXCLUSIVELOCK)
 		xl_xinfo.xinfo |= XACT_XINFO_HAS_AE_LOCKS;
 
 	/*
@@ -5691,7 +5691,7 @@ XactLogAbortRecord(TimestampTz abort_time,
 
 	xlrec.xact_time = abort_time;
 
-	if ((xactflags & XACT_FLAGS_ACQUIREDACCESSEXCLUSIVELOCK))
+	if (xactflags & XACT_FLAGS_ACQUIREDACCESSEXCLUSIVELOCK)
 		xl_xinfo.xinfo |= XACT_XINFO_HAS_AE_LOCKS;
 
 	if (nsubxacts > 0)
diff --git a/src/backend/commands/tablecmds.c b/src/backend/commands/tablecmds.c
index dbee6ae199..e018cdfd9e 100644
--- a/src/backend/commands/tablecmds.c
+++ b/src/backend/commands/tablecmds.c
@@ -16309,7 +16309,7 @@ PreCommit_on_commit_actions(void)
  * relations, we can skip truncating ON COMMIT DELETE ROWS
  * tables, as they must still be empty.
  */
-if ((MyXactFlags & XACT_FLAGS_ACCESSEDTEMPNAMESPACE))
+if (MyXactFlags & XACT_FLAGS_ACCESSEDTEMPNAMESPACE)
 	oids_to_truncate = lappend_oid(oids_to_truncate, oc->relid);
 break;
 			case ONCOMMIT_DROP:
diff --git a/src/backend/nodes/nodeFuncs.c b/src/backend/nodes/nodeFuncs.c
index e276264882..e5e82cb85f 100644
--- a/src/backend/nodes/nodeFuncs.c
+++ b/src/backend/nodes/nodeFuncs.c
@@ -2390,7 +2390,7 @@ query_tree_walker(Query *query,
 	 * don't contain actual expressions. However they do contain OIDs which
 	 * may be needed by dependency walkers etc.
 	 */
-	if ((flags & QTW_EXAMINE_SORTGROUP))
+	if (flags & QTW_EXAMINE_SORTGROUP)
 	{
 		if (walker((Node *) query->groupClause, context))
 			return true;
@@ -3328,7 +3328,7 @@ query_tree_mutator(Query *query,
 	 * may be of interest to some mutators.
 	 */
 
-	if ((flags & QTW_EXAMINE_SORTGROUP))
+	if (flags & QTW_EXAMINE_SORTGROUP)
 	{
 		MUTATE(query->groupClause, query->groupClause, List *);
 		MUTATE(query->windowClause, query->windowClause, List *);
diff --git a/src/backend/replication/logical/decode.c b/src/backend/replication/logical/decode.c
index 2874dc0612..fc9a0d67c9 100644
--- 

Re: Proposal: Save user's original authenticated identity for logging

2021-09-24 Thread Andrew Dunstan


On 9/23/21 5:20 PM, Peter Eisentraut wrote:
> On 23.09.21 12:34, Michael Paquier wrote:
>> On Wed, Sep 22, 2021 at 03:18:43PM +, Jacob Champion wrote:
>>> On Wed, 2021-09-22 at 10:20 +0200, Peter Eisentraut wrote:
 This should be added to each level of a function call that
 represents a
 test.  This ensures that when a test fails, the line number points to
 the top-level location of the test_role() call.  Otherwise it would
 point to the connect_ok() call inside test_role().
>>>
>>> Patch LGTM, sorry about that. Thanks!
>>
>> For the places of the patch, that seems fine then.  Thanks!
>
> committed
>
>> Do we need to care about that in other places?  We have tests in
>> src/bin/ using subroutines that call things from PostgresNode.pm or
>> TestLib.pm, like pg_checksums, pg_ctl or pg_verifybackup, just to name
>> three.
>
> Yeah, at first glance, there is probably more that could be done. 
> Here, I was just looking at a place where it was already and was
> accidentally removed.



It probably wouldn't be a bad thing to have something somewhere
(src/test/perl/README ?) that explains when and why we need to bump
$Test::Builder::Level.


cheers


andrew

--
Andrew Dunstan
EDB: https://www.enterprisedb.com





Re: row filtering for logical replication

2021-09-24 Thread Tomas Vondra




On 9/24/21 8:09 AM, Amit Kapila wrote:

On Thu, Sep 23, 2021 at 6:03 PM Tomas Vondra
 wrote:


13) turning update into insert

I agree with Ajin Cherian [4] that looking at just old or new row for
updates is not the right solution, because each option will "break" the
replica in some case. So I think the goal "keeping the replica in sync"
is the right perspective, and converting the update to insert/delete if
needed seems appropriate.

This seems a somewhat similar to what pglogical does, because that may
also convert updates (although only to inserts, IIRC) when handling
replication conflicts. The difference is pglogical does all this on the
subscriber, while this makes the decision on the publisher.

I wonder if this might have some negative consequences, or whether
"moving" this to downstream would be useful for other purposes in the
future (e.g. it might be reused for handling other conflicts).



Apart from additional traffic, I am not sure how we will handle all
the conditions on subscribers, say if the new row doesn't match, how
will subscribers know about this unless we pass the row_filter or some
additional information along with the tuple. Previously, I have done some
research and shared in one of the emails above that IBM's InfoSphere
Data Replication [1] performs filtering in this way, which also
suggests that we won't be off here.



I'm certainly not suggesting what we're doing is wrong. Given the design 
of built-in logical replication it makes sense doing it this way, I was 
just thinking aloud about what we might want to do in the future (e.g. 
pglogical uses this to deal with conflicts between multiple sources, and 
so on).





15) pgoutput_row_filter initializing filter

I'm not sure I understand why the filter initialization gets moved from
get_rel_sync_entry. Presumably, most of what the replication does is
replicating rows, so I see little point in not initializing this along
with the rest of the rel_sync_entry.



Sorry, IIRC, this has been suggested by me and I thought it was best
to do any expensive computation the first time it is required. I have
shared a few cases like in [2] where it would lead to additional cost
without any gain. Unless I am missing something, I don't see any
downside of doing it in a delayed fashion.



Not sure, but the arguments presented there seem a bit wonky ...

Yes, the work would be wasted if we discard the cached data without 
using it (it might happen for truncate, I'm not sure). But how likely is 
it that such operations happen *in isolation*? I'd bet the workload is 
almost never just a stream of truncates - there are always some 
operations in between that would actually use this.


Similarly for the errors - IIRC hitting an error means the replication 
restarts, which is orders of magnitude more expensive than anything we 
can save by this delayed evaluation.


I'd keep it simple, for the sake of simplicity of the whole patch.

regards

--
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: Column Filtering in Logical Replication

2021-09-24 Thread Tomas Vondra




On 9/24/21 7:05 AM, vignesh C wrote:

On Fri, Sep 24, 2021 at 8:40 AM Amit Kapila  wrote:


On Fri, Sep 24, 2021 at 12:45 AM Tomas Vondra
 wrote:


Hi,

I wanted to do a review of this patch, but I'm a bit confused about
which patch(es) to review. There's the v5 patch, and then these two
patches - which seem to be somewhat duplicate, though.

Can anyone explain what's the "current" patch version, or perhaps tell
me which of the patches to combine?



I think v5 won't work atop a common grammar patch. Some adjustments
are needed in v5. I think it would be good if we can first get the
common grammar patch reviewed/committed and then build this on top of
it. The common grammar and the corresponding implementation are being
accomplished in the Schema support patch, the latest version of which
is at [1].


I have posted an updated patch with the fixes at [1], please review
the updated patch.
[1] - 
https://www.postgresql.org/message-id/CALDaNm1R-xbQvz4LU5OXu3KKwbWOz3uDcT_YjRU6V0R5FZDYDg%40mail.gmail.com



But that's not the column filtering patch, right? Why would this patch 
depend on "schema level support", but maybe the consensus is there's 
some common part that we need to get in first?


regards

--
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: Gather performance analysis

2021-09-24 Thread Tomas Vondra

On 9/24/21 7:08 PM, Robert Haas wrote:

On Fri, Sep 24, 2021 at 3:50 AM Dilip Kumar  wrote:

Tomas, can you share your test script, I would like to repeat the same
test in my environment and with different batching sizes.


I think it's probably the run.sh file attached to
http://postgr.es/m/d76a759d-9240-94f5-399e-ae244e5f0...@enterprisedb.com



Yep. I've used a slightly modified version, which also varies the queue 
size, but we can ignore that and the rest of the script is the same.


As for the config, I've used only slightly tuned postgresql.conf, with 
shared_buffers set to 1GB, and increased worker limits.



regards

--
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: Gather performance analysis

2021-09-24 Thread Tomas Vondra

On 9/24/21 1:43 AM, Robert Haas wrote:

On Thu, Sep 23, 2021 at 5:36 PM Tomas Vondra
 wrote:

(c) This can't explain the slowdown for cases without any Gather nodes
(and it's ~17%, so unlikely due to binary layout).


Yeah, but none of the modified code would even execute in those cases,
so it's either binary layout, something wrong in your test
environment, or gremlins.



I'm not going to the office all that often these days, but I'm pretty
sure I'd notice a bunch of gremlins running around ... so it's probably
the binary layout thing. But still, strange.


regards

--
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: .ready and .done files considered harmful

2021-09-24 Thread David Steele

On 9/24/21 12:28 PM, Robert Haas wrote:

On Thu, Sep 16, 2021 at 7:26 PM Bossart, Nathan  wrote:

What do you think?


I think this is committable. I also went back and looked at your
previous proposal to do files in batches, and I think that's also
committable. After some reflection, I think I have a slight preference
for the batching approach.
It seems like it might lend itself to archiving multiple files in a
single invocation of the archive_command, and Alvaro just suggested it
again apparently not having realized that it had been previously
proposed by Andres, so I guess it has the further advantage of being
the thing that several committers intuitively feel like we ought to be
doing to solve this problem.


I also prefer this approach. Reducing directory scans is an excellent 
optimization, but from experience I know that execution time for the 
archive_command can also be a significant bottleneck. Being able to
archive multiple segments per execution would be a big win in certain 
scenarios.



So what I am inclined to do is commit
v1-0001-Improve-performance-of-pgarch_readyXlog-with-many.patch.


I read the patch and it looks good to me.

I do wish we had a way to test that history files get archived first, 
but as I recall I was not able to figure out how to do that reliably for [1]
without writing a custom archive_command just for testing. That is 
something we might want to consider as we make this logic more complex.


Regards,
--
-David
da...@pgmasters.net

[1] 
https://git.postgresql.org/gitweb/?p=postgresql.git;a=commitdiff;h=b981df4cc09aca978c5ce55e437a74913d09





Re: Corruption during WAL replay

2021-09-24 Thread Tom Lane
Robert Haas  writes:
> On Fri, Sep 24, 2021 at 3:42 PM Tom Lane  wrote:
>> I think the basic idea is about right, but I'm not happy with the
>> three-way delayChkpt business; that seems too cute by three-quarters.

> Nobody, but the version of the patch that I was looking at uses a
> separate bit for each one:

> +/* symbols for PGPROC.delayChkpt */
> +#define DELAY_CHKPT_START (1<<0)
> +#define DELAY_CHKPT_COMPLETE (1<<1)

Hm, that's not in the patch version that the CF app claims to be
latest [1].  It does this:

+/* type for PGPROC.delayChkpt */
+typedef enum DelayChkptType
+{
+   DELAY_CHKPT_NONE = 0,
+   DELAY_CHKPT_START,
+   DELAY_CHKPT_COMPLETE
+} DelayChkptType;

which seems like a distinct disimprovement over what you're quoting.

regards, tom lane

[1] 
https://www.postgresql.org/message-id/20210106.173327.1444585955309078930.horikyota@gmail.com




Re: Corruption during WAL replay

2021-09-24 Thread Robert Haas
On Fri, Sep 24, 2021 at 3:42 PM Tom Lane  wrote:
> Robert Haas  writes:
> > I like this patch.
>
> I think the basic idea is about right, but I'm not happy with the
> three-way delayChkpt business; that seems too cute by three-quarters.
> I think two independent boolean flags, one saying "I'm preventing
> checkpoint start" and one saying "I'm preventing checkpoint completion",
> would be much less confusing and also more future-proof.  Who's to say
> that we won't ever need both states to be set in the same process?

Nobody, but the version of the patch that I was looking at uses a
separate bit for each one:

+/* symbols for PGPROC.delayChkpt */
+#define DELAY_CHKPT_START (1<<0)
+#define DELAY_CHKPT_COMPLETE (1<<1)

One could instead use separate Booleans, but there doesn't seem to be
anything three-way about this?

-- 
Robert Haas
EDB: http://www.enterprisedb.com




Re: Corruption during WAL replay

2021-09-24 Thread Tom Lane
Robert Haas  writes:
> I like this patch.

I think the basic idea is about right, but I'm not happy with the
three-way delayChkpt business; that seems too cute by three-quarters.
I think two independent boolean flags, one saying "I'm preventing
checkpoint start" and one saying "I'm preventing checkpoint completion",
would be much less confusing and also more future-proof.  Who's to say
that we won't ever need both states to be set in the same process?

I also dislike the fact that the patch has made procarray.h depend
on proc.h ... maybe I'm wrong, but I thought that there was a reason
for keeping those independent, if indeed this hasn't actually resulted
in a circular-includes situation.  If we avoid inventing that enum type
then there's no need for that.  If we do need an enum, maybe it could
be put in some already-common prerequisite header.

regards, tom lane




Re: logical decoding and replication of sequences

2021-09-24 Thread Tomas Vondra

Hi,

On 9/23/21 12:27 PM, Peter Eisentraut wrote:

On 30.07.21 20:26, Tomas Vondra wrote:
Here's a an updated version of this patch - rebased to current master, 
and fixing some of the issues raised in Peter's review.


This patch needs an update, as various conflicts have arisen now.

As was discussed before, it might be better to present a separate patch 
for just the logical decoding part for now, since the replication and 
DDL stuff has the potential to conflict heavily with other patches being 
discussed right now.  It looks like cutting this patch in two should be 
doable easily.




Attached is the rebased patch, split into three parts:

1) basic decoding infrastructure (decoding, reorderbuffer etc.)
2) support for sequences in test_decoding
3) support for sequences in built-in replication (catalogs, syntax, ...)

The last part is the largest one - I'm sure we'll have discussions about 
the grammar, adding sequences automatically, etc. But as you said, let's 
focus on the first part, which deals with the required decoding stuff.


I've added a couple comments, explaining how we track sequences, why we 
need the XID in nextval() etc. I've also added streaming support.



I looked through the test cases in test_decoding again.  It all looks 
pretty sensible.  If anyone can think of any other tricky or dubious 
cases, we can add them there.  It's easiest to discuss these things with 
concrete test cases rather than in theory.


One slightly curious issue is that this can make sequence values go 
backwards, when seen by the logical decoding consumer, like in the test 
case:


+ BEGIN
+ sequence: public.test_sequence transactional: 1 created: 1 last_value: 1, log_cnt: 0 is_called: 0

+ COMMIT
+ sequence: public.test_sequence transactional: 0 created: 0 last_value: 33, log_cnt: 0 is_called: 1

+ BEGIN
+ sequence: public.test_sequence transactional: 1 created: 1 last_value: 4, log_cnt: 0 is_called: 1

+ COMMIT
+ sequence: public.test_sequence transactional: 0 created: 0 last_value: 334, log_cnt: 0 is_called: 1


I suppose that's okay, since it's not really the intention that someone 
is concurrently consuming sequence values on the subscriber.  Maybe 
something for the future.  Fixing that would require changing the way 
transactional sequence DDL updates these values, so it's not directly 
the job of the decoding to address this.




Yeah, that's due to how ALTER SEQUENCE does things, and I agree redoing 
that seems well out of scope for this patch. What seems a bit suspicious 
is that some of the ALTER SEQUENCE changes have "created: 1" - it's 
probably correct, though, because those ALTER SEQUENCE statements can be 
rolled back, so we see it as if a new sequence is created. The flag name 
might be a bit confusing, though.


A small thing I found: Maybe the text that test_decoding produces for 
sequences can be made to look more consistent with the one for tables. 
For example, in


+ BEGIN
+ sequence: public.test_table_a_seq transactional: 1 created: 1 last_value: 1, log_cnt: 0 is_called: 0
+ sequence: public.test_table_a_seq transactional: 1 created: 0 last_value: 33, log_cnt: 0 is_called: 1

+ table public.test_table: INSERT: a[integer]:1 b[integer]:100
+ table public.test_table: INSERT: a[integer]:2 b[integer]:200
+ COMMIT

note how the punctuation is different.


I did tweak this a bit, hopefully it's more consistent.


regards

--
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
>From 9b21a107dee280333e1f849134c22b2f4038d5b9 Mon Sep 17 00:00:00 2001
From: Tomas Vondra 
Date: Fri, 24 Sep 2021 00:41:33 +0200
Subject: [PATCH 1/3] Logical decoding of sequences

This adds the infrastructure for logical decoding of sequences. Support
for built-in logical replication and test_decoding is added separately.

When decoding sequences, we differentiate between sequences created in
a (running) transaction, and sequences created in other (already
committed) transactions. Changes for sequences created in the same
top-level transaction are treated as "transactional" i.e. just like any
other change from that transaction (and discarded in case of a
rollback). Changes for sequences created earlier are treated as not
transactional - they are processed immediately, as if performed outside any
transaction (and thus not rolled back).

This mixed behavior is necessary - sequences are non-transactional (e.g.
ROLLBACK does not undo the sequence increments). But for new sequences,
we need to handle them in a transactional way, because if we ever get
some DDL support, the sequence won't exist until the transaction gets
applied. So we need to ensure the increments don't happen until the
sequence gets created.

To differentiate which sequences are "old" and which were created in a
still-running transaction, we track sequences created in running
transactions in a hash table. Each sequence is identified by its
relfilenode, and at transaction commit we discard all sequences created
in 

Re: BUG #16583: merge join on tables with different DB collation behind postgres_fdw fails

2021-09-24 Thread Tom Lane
Etsuro Fujita  writes:
> One thing I noticed is that collatable operators/functions sent to the
> remote might also cause an unexpected result when the default
> collations are not compatible.  Consider this example (even with your
> patch):
> ...
> where ft1 is a foreign table with an integer column c1.  As shown
> above, the sort using the collatable function chr() is performed
> remotely, so the select query might produce the result in an
> unexpected sort order when the default collations are not compatible.

I don't think there's anything really new there --- it's still assuming
that COLLATE "default" means the same locally and remotely.

As a short-term answer, I propose that we apply (and back-patch) the
attached documentation changes.

Longer-term, it seems like we really have to be able to represent
the notion of a remote column that has an "unknown" collation (that
is, one that doesn't match any local collation, or at least is not
known to do so).  My previous patch essentially makes "default" act
that way, but conflating "unknown" with "default" has too many
downsides.  A rough sketch for making this happen is:

1. Create a built-in "unknown" entry in pg_collation.  Insert some
hack or other to prevent this from being applied to any real, local
column; but allow foreign-table columns to have it.

2. Apply mods, probably fairly similar to my patch, that prevent
postgres_fdw from believing that "unknown" matches any local
collation.  (Hm, actually maybe no special code change will be
needed here, once "unknown" has its own OID?)

3. Change postgresImportForeignSchema so that it can substitute
the "unknown" collation at need.  The exact rules for this could
be debated depending on whether you'd rather prioritize safety or
ease-of-use, but I think at least we should use "unknown" whenever
import_collate is turned off.  Perhaps there should be an option
to substitute it for remote "default" as well.  (Further down the
road, perhaps that could be generalized to allow a user-controlled
mapping from remote to local collations.)
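
For concreteness, a hypothetical example of how step 3 might look from the
user's side (nothing below is implemented; the server, schema, and column
names are made up):

    -- with import_collate turned off, imported text columns would get the
    -- proposed "unknown" collation instead of COLLATE "default"
    IMPORT FOREIGN SCHEMA public FROM SERVER remote_srv INTO local_schema
        OPTIONS (import_collate 'false');
    -- an imported column definition might then read:  c2 text COLLATE "unknown"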

Anyway, I think I should withdraw the upthread patch; we don't
want to go that way.

regards, tom lane

diff --git a/doc/src/sgml/postgres-fdw.sgml b/doc/src/sgml/postgres-fdw.sgml
index bf95da9721..dbc11694a0 100644
--- a/doc/src/sgml/postgres-fdw.sgml
+++ b/doc/src/sgml/postgres-fdw.sgml
@@ -95,8 +95,8 @@
   referenced columns of the remote table.  Although postgres_fdw
   is currently rather forgiving about performing data type conversions at
   need, surprising semantic anomalies may arise when types or collations do
-  not match, due to the remote server interpreting WHERE clauses
-  slightly differently from the local server.
+  not match, due to the remote server interpreting query conditions
+  differently from the local server.
  
 
  
@@ -537,6 +537,17 @@ OPTIONS (ADD password_required 'false');
need to turn this off if the remote server has a different set of
collation names than the local server does, which is likely to be the
case if it's running on a different operating system.
+   If you do so, however, there is a very severe risk that the imported
+   table columns' collations will not match the underlying data, resulting
+   in anomalous query behavior.
+  
+
+  
+   Even when this parameter is set to true, importing
+   columns whose collation is the remote server's default can be risky.
+   They will be imported with COLLATE "default", which
+   will select the local server's default collation, which could be
+   different.
   
  
 


Re: decoupling table and index vacuum

2021-09-24 Thread Robert Haas
On Wed, Sep 15, 2021 at 6:08 PM Peter Geoghegan  wrote:
> Have you started any work on this project? I think that it's a very good idea.

Actually, I have. I've been focusing on trying to create a general
infrastructure for conveyor belt storage. An incomplete and likely
quite buggy version of this can be found here:

https://git.postgresql.org/gitweb/?p=users/rhaas/postgres.git;a=shortlog;h=refs/heads/conveyor

Mark Dilger has been helping me debug it, but it's still very early
days. I was planning to wait until it was a little more baked before
posting it to the list, but since you asked...

Once that infrastructure is sufficiently mature, then the next step, I
think, would be to try to use it to store dead TIDs.

And then after that, one has to think about how autovacuum scheduling
ought to work in a world where table vacuuming and index vacuuming are
decoupled.

This is a very hard problem, and I don't expect to solve it quickly. I
do hope to keep plugging away at it, though.

-- 
Robert Haas
EDB: http://www.enterprisedb.com




Re: extensible options syntax for replication parser?

2021-09-24 Thread Robert Haas
On Fri, Sep 24, 2021 at 2:28 PM tushar  wrote:
> Please refer this scenario where publication on v14RC1  and subscription
> on HEAD (w/patch)
>
> --create a subscription with parameter two_phase=1 on HEAD
>
> postgres=# CREATE SUBSCRIPTION r1015 CONNECTION 'dbname=postgres
> host=localhost port=5454' PUBLICATION p WITH (two_phase=1);
> NOTICE:  created replication slot "r1015" on publisher
> CREATE SUBSCRIPTION
> postgres=#
>
> --check on 14RC1
>
> postgres=# select two_phase from pg_replication_slots where
> slot_name='r105';
>   two_phase
> ---
>   f
> (1 row)
>
> so are we silently ignoring this parameter as it is not supported on
> v14RC/HEAD? And if yes, then why don't we just throw a message like
> ERROR:  unrecognized subscription parameter: "two_phase"

two_phase is new in v15, something you could also find out by checking
the documentation. Now if the patch changes the way two_phase
interacts with older versions, that's a bug in the patch and we should
fix it. But if the same issue exists without the patch then I'm not
sure why you are raising it here rather than on the thread where that
feature was developed.

-- 
Robert Haas
EDB: http://www.enterprisedb.com




Re: extensible options syntax for replication parser?

2021-09-24 Thread tushar

On 9/24/21 11:57 PM, tushar wrote:
postgres=# select two_phase from pg_replication_slots where 
slot_name='r105'; 


Correction - please read 'r105' as 'r1015'

--
regards,tushar
EnterpriseDB  https://www.enterprisedb.com/
The Enterprise PostgreSQL Company





Re: extensible options syntax for replication parser?

2021-09-24 Thread tushar

On 9/24/21 10:36 PM, Robert Haas wrote:

Here's v9, fixing the issue reported by Fujii Masao.


Please refer this scenario where publication on v14RC1  and subscription 
on HEAD (w/patch)


--create a subscription with parameter two_phase=1 on HEAD

postgres=# CREATE SUBSCRIPTION r1015 CONNECTION 'dbname=postgres 
host=localhost port=5454' PUBLICATION p WITH (two_phase=1);

NOTICE:  created replication slot "r1015" on publisher
CREATE SUBSCRIPTION
postgres=#

--check on 14RC1

postgres=# select two_phase from pg_replication_slots where 
slot_name='r105';

 two_phase
---
 f
(1 row)

so are we silently ignoring this parameter as it is not supported on 
v14RC/HEAD? And if yes, then why don't we just throw a message like

ERROR:  unrecognized subscription parameter: "two_phase"

--
regards,tushar
EnterpriseDB  https://www.enterprisedb.com/
The Enterprise PostgreSQL Company





Re: standby recovery fails (tablespace related) (tentative patch and discussion)

2021-09-24 Thread Tom Lane
Robert Haas  writes:
> The commit message for 0001 is not clear enough for me to understand
> what problem it's supposed to be fixing. The code comments aren't
> really either. They make it sound like there's some problem with
> copying symlinks but mostly they just talk about callbacks, which
> doesn't really help me understand what problem we'd have if we just
> didn't commit this (or reverted it later).

> I am not really convinced by Álvaro's claim that 0004 is a "fix"; I
> think I'd call it an improvement. But either way I agree that could
> just be committed.

> I haven't analyzed 0002 and 0003 yet.

I took a quick look through this:

* I don't like 0001 either, though it seems like the issue is mostly
documentation.  sub _srcsymlink should have a comment explaining
what it's doing and why.  The documentation of copypath's new parameter
seems like gobbledegook too --- I suppose it should read more like
"By default, copypath fails if a source item is a symlink.  But if
B is provided, that subroutine is called to process any
symlink."

* I'm allergic to 0002's completely undocumented changes to
poll_query_until, especially since I don't see anything in the
patch that actually uses them.  Can't we just drop these diffs
in PostgresNode.pm?  BTW, the last error message in the patch,
talking about a 5-second timeout, seems wrong.  With or without
these changes, poll_query_until's default timeout is 180 sec.
The actual test case might be okay other than that nit and a
comment typo or two.

* 0003 might actually be okay.  I've not read it line-by-line,
but it seems like it's implementing a sane solution and it's
adequately commented.

* I'm inclined to reject 0004 out of hand, because I don't
agree with what it's doing.  The purpose of the rmgrdesc
functions is to show you what is in the WAL records, and
everywhere else we interpret that as "show the verbatim,
numeric field contents".  heapdesc.c, for example, doesn't
attempt to look up the name of the table being operated on.
0004 isn't adhering to that style, and aside from being
inconsistent I'm afraid that it's adding failure modes
we don't want.

regards, tom lane




Re: extensible options syntax for replication parser?

2021-09-24 Thread tushar

On 9/24/21 10:36 PM, Robert Haas wrote:

I am not able to reproduce this failure. I suspect you made a mistake
in testing, because my test case before sending the patch was
basically the same as yours, except that I was testing with v13. But I
tried again with v12 and it seems fine:


Right, on a clean setup I am also not able to reproduce this issue. 
Thanks for checking at your end.


--
regards,tushar
EnterpriseDB  https://www.enterprisedb.com/
The Enterprise PostgreSQL Company





Re: prevent immature WAL streaming

2021-09-24 Thread Hannu Krosing
Hi Alvaro,

I just started reading this thread, but maybe you can confirm or
refute my understanding of what was done.

In the first email you write

> As mentioned in the course of thread [1], we're missing a fix for
streaming replication to avoid sending records that the primary hasn't
fully flushed yet.  This patch is a first attempt at fixing that problem
by retreating the LSN reported as FlushPtr whenever a segment is
registered, based on the understanding that if no registration exists
then the LogwrtResult.Flush pointer can be taken at face value; but if a
registration exists, then we have to stream only till the start LSN of
that registered entry.

So did we end up holding back the wal_sender to not send anything that
is not confirmed as flushed on the master?

Are there measurements on how much this slows down replication
compared to allowing sending the moment it is written in buffers but
not necessarily flushed locally ?

Did we investigate possibility of sending as fast as possible and
controlling the flush synchronisation by sending separate flush
pointers *both* ways ?

And maybe there was even an alternative considered where we are
looking at a more general Durability, for example 2-out-of-3 where
primary is one of the 3 and not necessarily the most durable one?


-
Hannu Krosing
Google Cloud - We have a long list of planned contributions and we are hiring.
Contact me if interested.




On Fri, Sep 24, 2021 at 4:33 AM Alvaro Herrera  wrote:
>
> On 2021-Sep-23, Alvaro Herrera wrote:
>
> > However, I notice now that the pg_rewind tests reproducibly fail in
> > branch 14 for reasons I haven't yet understood.  It's strange that no
> > other branch fails, even when run quite a few times.
>
> Turns out that this is a real bug (setting EndOfLog seems insufficient).
> I'm looking into it.
>
> --
> Álvaro Herrera  Valdivia, Chile  —  https://www.EnterpriseDB.com/
> "No necesitamos banderas
>  No reconocemos fronteras"  (Jorge González)
>
>




Re: Gather performance analysis

2021-09-24 Thread Robert Haas
On Fri, Sep 24, 2021 at 1:19 AM Dilip Kumar  wrote:
> I am looking at the "query-results.ods" file shared by Tomas; with a
> million tuples I do not really see where the patch hurts, because I am
> seeing that in most of the cases the time taken by the patch is 60-80%
> compared to the head.  And the worst case with a million tuples is
> 100.32%, so are we pointing to that 0.32%, or is there something else
> that I am missing here?

The spreadsheet has two tabs. Flip to "xeon e5-2620v3" and scroll all
the way down.

-- 
Robert Haas
EDB: http://www.enterprisedb.com




Re: Gather performance analysis

2021-09-24 Thread Robert Haas
On Fri, Sep 24, 2021 at 3:50 AM Dilip Kumar  wrote:
> Tomas, can you share your test script? I would like to repeat the same
> test in my environment and with different batching sizes.

I think it's probably the run.sh file attached to
http://postgr.es/m/d76a759d-9240-94f5-399e-ae244e5f0...@enterprisedb.com

-- 
Robert Haas
EDB: http://www.enterprisedb.com




Re: extensible options syntax for replication parser?

2021-09-24 Thread Robert Haas
On Fri, Sep 24, 2021 at 7:28 AM tushar  wrote:
> On 9/23/21 8:35 PM, Robert Haas wrote:
> > Thanks. Looks like that version had some stupid mistakes. Here's a new one.
>
> Thanks, the reported issue seems to be fixed now for HEAD w/patch
> (publication) to HEAD w/patch (subscription) but still getting the same
> error if we try to perform v12(publication) to HEAD
> w/patch(subscription) . I checked there is no such issue for
> v12(publication) to v14 RC1 (subscription)
>
> postgres=#  create subscription sub123s CONNECTION 'host=127.0.0.1
> user=edb  port= dbname=postgres' PUBLICATION pp with (slot_name =
> from_v14);
> ERROR:  could not create replication slot "from_v14": ERROR: syntax error
> postgres=#

I am not able to reproduce this failure. I suspect you made a mistake
in testing, because my test case before sending the patch was
basically the same as yours, except that I was testing with v13. But I
tried again with v12 and it seems fine:

[rhaas pgsql]$ createdb -p 5412
[rhaas pgsql]$ psql -c 'select version()' -p 5412
version

 PostgreSQL 12.3 on x86_64-apple-darwin19.4.0, compiled by clang
version 5.0.2 (tags/RELEASE_502/final), 64-bit
(1 row)
[rhaas pgsql]$ psql
psql (15devel)
Type "help" for help.

rhaas=# create subscription sub123s CONNECTION 'port=5412' PUBLICATION
pp with (slot_name =
from_v14);
NOTICE:  created replication slot "from_v14" on publisher
CREATE SUBSCRIPTION

Here's v9, fixing the issue reported by Fujii Masao.

-- 
Robert Haas
EDB: http://www.enterprisedb.com


v9-0001-Flexible-options-for-BASE_BACKUP.patch
Description: Binary data


v9-0002-Flexible-options-for-CREATE_REPLICATION_SLOT.patch
Description: Binary data


Re: DELETE CASCADE

2021-09-24 Thread Tom Lane
[ a couple of random thoughts after quickly scanning this thread ... ]

David Christensen  writes:
> I assume this would look something like:
> ALTER TABLE foo ALTER CONSTRAINT my_fkey ON UPDATE CASCADE ON DELETE RESTRICT
> with omitted referential_action implying preserving the existing one.

I seem to remember somebody working on exactly that previously, though
it's evidently not gotten committed.  In any case, we already have

ALTER TABLE ... ALTER CONSTRAINT constraint_name
[ DEFERRABLE | NOT DEFERRABLE ] [ INITIALLY DEFERRED | INITIALLY IMMEDIATE ]

which has to modify pg_trigger rows, so it's hard to see why it'd
be at all difficult to implement this using code similar to that
(maybe even sharing much of the code).

Returning to the original thought of a DML statement option to temporarily
override the referential_action, I wonder why only temporarily-set-CASCADE
was considered.  It seems to me like there might also be use-cases for
temporarily selecting the SET NULL or SET DEFAULT actions.

Another angle is that if we consider the deferrability properties as
precedent, there already is a way to override an FK constraint's
deferrability for the duration of a transaction: see SET CONSTRAINTS.
So I wonder if maybe the way to treat this is to invent something like

SET CONSTRAINTS my_fk_constraint [,...] ON DELETE referential_action

which would override the constraints' action for the remainder of the
transaction.  (Permission needs TBD, but probably the same as you
would need to create a new FK constraint on the relevant table.)

In comparison to the original proposal, this'd force you to be explicit
about which constraint(s) you intend to override, but TBH I think that's
a good thing.
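
For concreteness, usage of the syntax sketched above might look like this
(constraint and table names are invented, and none of this is implemented
today):

    BEGIN;
    -- treat the FK from order_items to orders as ON DELETE CASCADE,
    -- for this transaction only
    SET CONSTRAINTS order_items_order_id_fkey ON DELETE CASCADE;
    DELETE FROM orders WHERE order_id = 42;  -- dependent order_items rows go too
    COMMIT;  -- afterwards the constraint's declared action applies again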

One big practical problem, which we've never addressed in the context of
SET CONSTRAINTS but maybe it's time to tackle, is that the SQL spec
defines the syntax like that because it thinks constraint names are
unique per-schema; thus a possibly-schema-qualified name is sufficient
ID.  Of course we say that constraint names are only unique per-table,
so SET CONSTRAINTS has always had this issue of not being very carefully
targeted.  I think we could do something like extending the syntax
to be

SET CONSTRAINTS conname [ON tablename] [,...] new_properties

Anyway, just food for thought --- I'm not necessarily set on any
of this.

regards, tom lane




Re: extensible options syntax for replication parser?

2021-09-24 Thread Robert Haas
On Fri, Sep 24, 2021 at 12:01 AM Fujii Masao
 wrote:
> You seem to accidentally remove the index term for BASE_BACKUP.

Good catch.

> +ident_or_keyword:
> +   IDENT                                   { $$ = $1; }
>
> ident_or_keyword seems to be used only for generic options,
> but it includes the keywords for legacy options like "FAST".
> Isn't it better to remove the keywords for legacy options from
> ident_or_keyword?

I don't think so. I mean, if we do, then it's not really an
ident_or_keyword production any more, because it would only allow some
keywords, not all. Now if the keywords that are not included aren't
used as options anywhere then it won't matter, but it seems cleaner to
me to make the list complete.

> OTOH, the keywords for newly-introduced generic options like
> CHECKPOINT should be defined in repl_scanner.l and repl_gram.y?

One big advantage of this approach is that we don't need to make
changes to those files in order to add new options, so I feel like
that would be missing the point completely.

-- 
Robert Haas
EDB: http://www.enterprisedb.com




how to distinguish between using the server as a standby or for executing a targeted recovery in PG 11?

2021-09-24 Thread Bharath Rupireddy
Hi,

I'm trying to set up a postgres server with version 11 in targeted
recovery mode (for the first time after my journey started with
postgres), and I came across the explanation at [1], which for PG 12 and
newer versions gives a clear differentiation between "standby" mode and
"targeted recovery" mode. How do we differentiate
these two modes in PG 11? Can anyone please help me with it?

PS: hackers-list may not be the right place to ask, but I'm used to
seeking help from it.

[1] From the https://www.postgresql.org/docs/12/runtime-config-wal.html:

"19.5.4. Archive Recovery

This section describes the settings that apply only for the duration
of the recovery. They must be reset for any subsequent recovery you
wish to perform.

“Recovery” covers using the server as a standby or for executing a
targeted recovery. Typically, standby mode would be used to provide
high availability and/or read scalability, whereas a targeted recovery
is used to recover from data loss.

To start the server in standby mode, create a file called
standby.signal in the data directory. The server will enter recovery
and will not stop recovery when the end of archived WAL is reached,
but will keep trying to continue recovery by connecting to the sending
server as specified by the primary_conninfo setting and/or by fetching
new WAL segments using restore_command. For this mode, the parameters
from this section and Section 19.6.3 are of interest. Parameters from
Section 19.5.5 will also be applied but are typically not useful in
this mode.

To start the server in targeted recovery mode, create a file called
recovery.signal in the data directory. If both standby.signal and
recovery.signal files are created, standby mode takes precedence.
Targeted recovery mode ends when the archived WAL is fully replayed,
or when recovery_target is reached. In this mode, the parameters from
both this section and Section 19.5.5 will be used."

Regards,
Bharath Rupireddy.




Re: .ready and .done files considered harmful

2021-09-24 Thread Robert Haas
On Thu, Sep 16, 2021 at 7:26 PM Bossart, Nathan  wrote:
> What do you think?

I think this is committable. I also went back and looked at your
previous proposal to do files in batches, and I think that's also
committable. After some reflection, I think I have a slight preference
for the batching approach.
It seems like it might lend itself to archiving multiple files in a
single invocation of the archive_command, and Alvaro just suggested it
again apparently not having realized that it had been previously
proposed by Andres, so I guess it has the further advantage of being
the thing that several committers intuitively feel like we ought to be
doing to solve this problem.

So what I am inclined to do is commit
v1-0001-Improve-performance-of-pgarch_readyXlog-with-many.patch.
However, v6-0001-Do-fewer-directory-scans-of-archive_status.patch has
perhaps evolved a bit more than the other one, so I thought I should
first ask whether any of those changes have influenced your thinking
about the batching approach and whether you want to make any updates
to that patch first. I don't really see that this is needed, but I
might be missing something.

Thanks,

-- 
Robert Haas
EDB: http://www.enterprisedb.com




Re: pg_upgrade test for binary compatibility of core data types

2021-09-24 Thread Andrew Dunstan


On 9/15/21 3:28 PM, Andrew Dunstan wrote:
> On 9/13/21 9:20 AM, Andrew Dunstan wrote:
>> On 9/12/21 2:41 PM, Andrew Dunstan wrote:
>>> On 9/11/21 8:51 PM, Justin Pryzby wrote:
 @Andrew: did you have any comment on this part ?

 |Subject: buildfarm xversion diff
 |Forking 
 https://www.postgresql.org/message-id/20210328231433.gi15...@telsasoft.com
 |
 |I gave suggestion how to reduce the "lines of diff" metric almost to 
 nothing,
 |allowing a very small "fudge factor", and which I think makes this a 
 pretty
 |good metric rather than a passable one.

>>> Somehow I missed that. Looks like some good suggestions. I'll
>>> experiment. (Note: we can't assume the presence of sed, especially on
>>> Windows).
>>>
>>>
>> I tried with the attached patch on crake, which tests back as far as
>> 9.2. Here are the diff counts from HEAD:
>>
>>
>> andrew@emma:HEAD $ grep -c '^[+-]' dumpdiff-REL9_* dumpdiff-REL_1*
>> dumpdiff-HEAD
>> dumpdiff-REL9_2_STABLE:514
>> dumpdiff-REL9_3_STABLE:169
>> dumpdiff-REL9_4_STABLE:185
>> dumpdiff-REL9_5_STABLE:221
>> dumpdiff-REL9_6_STABLE:11
>> dumpdiff-REL_10_STABLE:11
>> dumpdiff-REL_11_STABLE:73
>> dumpdiff-REL_12_STABLE:73
>> dumpdiff-REL_13_STABLE:73
>> dumpdiff-REL_14_STABLE:0
>> dumpdiff-HEAD:0
>>
>>
>> I've also attached those non-empty dumpdiff files for information, since
>> they are quite small.
>>
>>
>> There is still work to do, but this is promising. Next step: try it on
>> Windows.
>>
>>
> It appears to do the right thing on Windows. yay!
>
>
> We probably need to get smarter about the heuristics, though, e.g. by
> taking into account the buildfarm options and the platform. It would
> also help a lot if we could make vcregress.pl honor USE_MODULE_DB.
> That's on my TODO list, but it just got a lot higher priority.
>
>


Here's what I've committed:



cheers


andrew

--
Andrew Dunstan
EDB: https://www.enterprisedb.com





Document atthasmissing default optimization avoids verification table scan

2021-09-24 Thread James Coleman
When PG11 added the ability for ALTER TABLE ADD COLUMN to set a constant
default value without rewriting the table, the doc changes did not note
how the new feature interacts with ADD COLUMN DEFAULT NOT NULL.
Previously such a new column required a verification table scan to
ensure no values were null. That scan happens under an exclusive lock on
the table, so it can have a meaningful impact on database "accessible
uptime".

I've attached a patch to document that the new mechanism also
precludes that scan.
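
For illustration, the case being documented is (table and column names are
made up):

    -- on v11 and later this neither rewrites nor scans big_table: the constant
    -- default is stored in attmissingval, so every existing row is already
    -- known to satisfy the NOT NULL constraint
    ALTER TABLE big_table ADD COLUMN flag boolean NOT NULL DEFAULT false;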

Thanks,
James Coleman


v1-0001-Document-atthasmissing-default-avoids-verificatio.patch
Description: Binary data


Re: error_severity of brin work item

2021-09-24 Thread Justin Pryzby
On Fri, Sep 24, 2021 at 10:42:36AM -0300, Alvaro Herrera wrote:
> I think the most reasonable action is to push the patch in
> https://postgr.es/m/20201123193957.GA21810@alvherre.pgsql to all
> branches, closing the immediate hole, and we can see about the xact-hook
> stuff (when we have it) to master only, which I view as a better
> mechanism to protect work-items going forward.  That's what I understand
> Justin was suggesting in his last reply.

I suggested using the 20-line solution rather than inventing new infrastructure
just to avoid corner case errors.

But I also don't understand why changing to use an INFO message doesn't work.
vacuum_open_relation() outputs a WARNING for manual vacuum and an INFO for
autovacuum when log_min_duration >= 0.  Why couldn't this do the same thing?

I found a few instances of this in my logs, indicating that my understanding of
the logic is not wrong.

log_time   | 2021-01-13 15:23:40.687-05
session_line   | 2
session_start_time | 2021-01-13 15:23:19-05
error_severity | LOG
sql_state_code | 42P01
message| skipping vacuum of "pg_toast_984781820" --- relation no longer exists
backend_type   | autovacuum worker

-- 
Justin




Re: Bug fix for tab completion of ALTER TABLE ... VALIDATE CONSTRAINT ...

2021-09-24 Thread Aleksander Alekseev
The following review has been posted through the commitfest application:
make installcheck-world:  tested, passed
Implements feature:   tested, passed
Spec compliant:   tested, passed
Documentation:tested, passed

The cfbot seems to be happy with the updated patch.

The new status of this patch is: Ready for Committer


Re: error_severity of brin work item

2021-09-24 Thread Alvaro Herrera
On 2021-Sep-24, Jaime Casanova wrote:

> Do you plan to work on this for this CF?

I think the most reasonable action is to push the patch in
https://postgr.es/m/20201123193957.GA21810@alvherre.pgsql to all
branches, closing the immediate hole, and we can see about the xact-hook
stuff (when we have it) to master only, which I view as a better
mechanism to protect work-items going forward.  That's what I understand
Justin was suggesting in his last reply.

-- 
Álvaro Herrera   39°49'30"S 73°17'W  —  https://www.EnterpriseDB.com/
"La fuerza no está en los medios físicos
sino que reside en una voluntad indomable" (Gandhi)




Re: Column Filtering in Logical Replication

2021-09-24 Thread Alvaro Herrera
On 2021-Sep-23, Amit Kapila wrote:

> Alvaro, do you have any thoughts on these proposed grammar changes?

Yeah, I think pubobj_name remains a problem in that you don't know its
return type -- could be a String or a RangeVar, and the user of that
production can't distinguish.  So you're still (unnecessarily, IMV)
stashing an object of undetermined type into ->object.

I think you should get rid of both pubobj_name and pubobj_expr and do
something like this:

/* FOR TABLE and FOR ALL TABLES IN SCHEMA specifications */
PublicationObjSpec: TABLE ColId
            {
                $$ = makeNode(PublicationObjSpec);
                $$->pubobjtype = PUBLICATIONOBJ_TABLE;
                $$->rangevar = makeRangeVarFromQualifiedName($1, NULL, @1, yyscanner);
            }
        | TABLE ColId indirection
            {
                $$ = makeNode(PublicationObjSpec);
                $$->pubobjtype = PUBLICATIONOBJ_TABLE;
                $$->rangevar = makeRangeVarFromQualifiedName($1, $2, @1, yyscanner);
            }
        | ALL TABLES IN_P SCHEMA ColId
            {
                $$ = makeNode(PublicationObjSpec);
                $$->pubobjtype = PUBLICATIONOBJ_REL_IN_SCHEMA;
                $$->name = $4;
            }
        | ALL TABLES IN_P SCHEMA CURRENT_SCHEMA   /* XXX should this be "IN_P CURRENT_SCHEMA"? */
            {
                $$ = makeNode(PublicationObjSpec);
                $$->pubobjtype = PUBLICATIONOBJ_CURRSCHEMA;
                $$->name = $4;
            }
        | ColId
            {
                $$ = makeNode(PublicationObjSpec);
                $$->name = $1;
                $$->pubobjtype = PUBLICATIONOBJ_CONTINUATION;
            }
        | ColId indirection
            {
                $$ = makeNode(PublicationObjSpec);
                $$->rangevar = makeRangeVarFromQualifiedName($1, $2, @1, yyscanner);
                $$->pubobjtype = PUBLICATIONOBJ_CONTINUATION;
            }
        | CURRENT_SCHEMA
            {
                $$ = makeNode(PublicationObjSpec);
                $$->pubobjtype = PUBLICATIONOBJ_CURRSCHEMA;
            }
        ;

so in AlterPublicationStmt you would have stanzas like

        | ALTER PUBLICATION name ADD_P pub_obj_list
            {
                AlterPublicationStmt *n = makeNode(AlterPublicationStmt);
                n->pubname = $3;
                n->pubobjects = preprocess_pubobj_list($5);
                n->action = DEFELEM_ADD;
                $$ = (Node *)n;
            }

where preprocess_pubobj_list (defined right after processCASbits and
somewhat mimicking it and SplitColQualList) takes all
PUBLICATIONOBJ_CONTINUATION and turns them into either
PUBLICATIONOBJ_TABLE entries or PUBLICATIONOBJ_REL_IN_SCHEMA entries,
depending on what the previous entry was.  (And of course if there is no
previous entry, raise an error immediately).  Note that node
PublicationObjSpec now has two fields, one for RangeVar and another for
a plain name, and tables always use the second one, except when they are
continuations, but of course those continuations that use name are
turned into rangevars in the preprocess step.  I think that would make
the code in ObjectsInPublicationToOids less messy.

(I don't think using the string "CURRENT_SCHEMA" is a great solution.
Did you try having a schema named CURRENT_SCHEMA?)

I verified that bison is happy with the grammar I proposed; I also
verified that you can add opt_column_list to the stanzas for tables, and
it remains happy.
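
For reference, the sort of statements this production is meant to accept
(syntax still under discussion in this thread; publication, table, and schema
names are invented) would be things like

    CREATE PUBLICATION pub1 FOR TABLE public.t1, t2, ALL TABLES IN SCHEMA s1;
    ALTER PUBLICATION pub1 ADD TABLE t3 (a, b);  -- with the optional column list

where the bare names continue whichever kind of list preceded them.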

-- 
Álvaro Herrera  Valdivia, Chile  —  https://www.EnterpriseDB.com/
Y una voz del caos me habló y me dijo
"Sonríe y sé feliz, podría ser peor".
Y 

Re: Skipping logical replication transactions on subscriber side

2021-09-24 Thread Masahiko Sawada
On Fri, Sep 24, 2021 at 5:53 PM houzj.f...@fujitsu.com
 wrote:
>
> On Tuesday, September 21, 2021 12:53 PM Masahiko Sawada 
>  wrote:
> >
> > I've attached the updated version patches. Please review them.
>
> Thanks for updating the patch,
> here are a few comments on the v14-0001 patch.

Thank you for the comments!

>
> 1)
> +   hash_ctl.keysize = sizeof(Oid);
> +   hash_ctl.entrysize = sizeof(SubscriptionRelState);
> +   not_ready_rels_htab = hash_create("not ready relations in subscription",
> +                                     64,
> +                                     &hash_ctl,
> +                                     HASH_ELEM | HASH_BLOBS);
> +
>
> ISTM we can pass list_length(not_ready_rels_list) as the nelem to hash_create.

As Amit pointed out, it seems not necessary to build a temporary hash
table for this purpose.

>
> 2)
>
> +   /*
> +* Search for all the dead subscriptions and error entries in stats
> +* hashtable and tell the stats collector to drop them.
> +*/
> +   if (subscriptionHash)
> +   {
> ...
> +   HTAB   *htab;
> +
>
> It seems we already declare a "HTAB *htab;" in function pgstat_vacuum_stat();
> can we use the existing htab here?

Right. Will remove it.

>
>
> 3)
>
> PGSTAT_MTYPE_RESETREPLSLOTCOUNTER,
> +   PGSTAT_MTYPE_SUBSCRIPTIONERR,
> +   PGSTAT_MTYPE_SUBSCRIPTIONERRRESET,
> +   PGSTAT_MTYPE_SUBSCRIPTIONERRPURGE,
> +   PGSTAT_MTYPE_SUBSCRIPTIONPURGE,
> PGSTAT_MTYPE_AUTOVAC_START,
>
> Can we append these values at the end of the Enum struct so that they won't
> affect the other Enum values?

Yes, I'll move them to the end of the Enum struct.

Regards,

-- 
Masahiko Sawada
EDB:  https://www.enterprisedb.com/




Re: Skipping logical replication transactions on subscriber side

2021-09-24 Thread Masahiko Sawada
On Fri, Sep 24, 2021 at 5:27 PM Greg Nancarrow  wrote:
>
> On Tue, Sep 21, 2021 at 2:53 PM Masahiko Sawada  wrote:
> >
> > I've attached the updated version patches. Please review them.
> >
>
> A few review comments for the v14-0002 patch:

Thank you for the comments!

>
> (1)
> I suggest a small update to the patch comment:
>
> BEFORE:
> ALTER SUBSCRIPTION ... RESET command resets subscription
> parameters. The parameters that can be set are streaming, binary,
> synchronous_commit.
>
> AFTER:
> ALTER SUBSCRIPTION ... RESET command resets subscription
> parameters to their default value. The parameters that can be reset
> are streaming, binary, and synchronous_commit.
>
> (2)
> In the documentation, the RESETable parameters should be listed in the
> same way and order as for SET:
>
> BEFORE:
> + 
> +   The parameters that can be reset are: streaming,
> +   binary, synchronous_commit.
> + 
> AFTER:
> + 
> +   The parameters that can be reset are
> synchronous_commit,
> +   binary, and streaming.
> +  
>
> Also, I'm thinking it would be beneficial to say the following before this:
>
> RESET is used to set parameters back to their default value.
>

I agreed with all of the above comments. I'll incorporate them into
the next version patch that I'm going to submit next Monday.
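
For reference, the command shape under discussion (proposed by the 0002
patch, not part of any release; the subscription name is invented) is:

    ALTER SUBSCRIPTION mysub SET (streaming = on, binary = true);
    ALTER SUBSCRIPTION mysub RESET (streaming, binary);  -- back to the defaults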

> (3)
> I notice that if you try to reset the slot_name, you get the following 
> message:
> postgres=# alter subscription sub reset (slot_name);
> ERROR:  unrecognized subscription parameter: "slot_name"
>
> This is a bit misleading, because "slot_name" actually IS a
> subscription parameter, just not resettable.
> It would be better in this case if it said something like:
> ERROR: not a resettable subscription parameter: "slot_name"
>
> However, it seems that this is also an existing issue with SET (e.g.
> for "refresh" or "two_phase"):
> postgres=# alter subscription sub set (refresh=true);
> ERROR:  unrecognized subscription parameter: "refresh"

Good point. Maybe we can improve it in a separate patch?

Regards,

-- 
Masahiko Sawada
EDB:  https://www.enterprisedb.com/




Re: Skipping logical replication transactions on subscriber side

2021-09-24 Thread Masahiko Sawada
On Fri, Sep 24, 2021 at 8:01 PM Amit Kapila  wrote:
>
> On Tue, Sep 21, 2021 at 10:23 AM Masahiko Sawada  
> wrote:
> >
> > I've attached the updated version patches. Please review them.
> >
>
> Review comments for v14-0001-Add-pg_stat_subscription_errors-statistics-view
> ==
> 1.
> 
> +   command text
> +  
> +  
> +Name of command being applied when the error occurred.  This
> +field is always NULL if the error is reported by the
> +tablesync worker.
> +  
> + 
> ..
> ..
> + 
> +  
> +   xid xid
> +  
> +  
> +Transaction ID of the publisher node being applied when the error
> +occurred.  This field is always NULL if the error is reported
> +by the tablesync worker.
> +  
>
> Shouldn't we display command and transaction id even for table sync
> worker if it occurs during sync phase (syncing with apply worker
> position)?

Right. I'll fix it.

>
> 2.
> + /*
> + * The number of not-ready relations can be high for example right
> + * after creating a subscription, so we load the list of
> + * SubscriptionRelState into the hash table for faster lookups.
> + */
>
> I am not sure this optimization of converting to not-ready relations
> list to hash table is worth it. Are we expecting thousands of
> relations per subscription? I think that will be a rare case even if
> it is there.

Yeah, it seems overkill. I'll use the simple list. If this becomes a
problem, we can add such optimization later.

>
> 3.
> +static void
> +pgstat_recv_subscription_purge(PgStat_MsgSubscriptionPurge *msg, int len)
> +{
> + if (subscriptionHash == NULL)
> + return;
> +
> + for (int i = 0; i < msg->m_nentries; i++)
> + {
> + PgStat_StatSubEntry *subent;
> +
> + subent = pgstat_get_subscription_entry(msg->m_subids[i], false);
> +
> + /*
> + * Nothing to do if the subscription entry is not found.  This could
> + * happen when the subscription is dropped and the message for
> + * dropping subscription entry arrived before the message for
> + * reporting the error.
> + */
> + if (subent == NULL)
>
> Is the above comment true even during the purge? I can think of this
> during normal processing but not during the purge.

Right, the comment is not true during the purge. Since subent could be
NULL if concurrent autovacuum workers do pgstat_vacuum_stat() I'll
change the comment.

>
> 4.
> +typedef struct PgStat_MsgSubscriptionErr
> +{
> + PgStat_MsgHdr m_hdr;
> +
> + /*
> + * m_subid and m_subrelid are used to determine the subscription and the
> + * reporter of this error.  m_subrelid is InvalidOid if reported by the
> + * apply worker, otherwise by the table sync worker.  In table sync worker
> + * case, m_subrelid must be the same as m_relid.
> + */
> + Oid m_subid;
> + Oid m_subrelid;
> +
> + /* Error information */
> + Oid m_relid;
>
> Is m_subrelid is used only to distinguish the type of worker? I think
> it could be InvalidOid during the syncing phase in the table sync
> worker.

Right. I'll fix it.

>
> 5.
> +/*
> + * Subscription error statistics kept in the stats collector, representing
> + * an error that occurred during application of logical replication or
>
> The part of the message " ... application of logical replication ..."
> sounds a little unclear. Shall we instead write: " ... application of
> logical message ..."?

Will fix.

>
> 6.
> +typedef struct PgStat_StatSubEntry
> +{
> + Oid subid; /* hash table key */
> +
> + /*
> + * Statistics of errors that occurred during logical replication.  While
> + * having the hash table for table sync errors we have a separate
> + * statistics value for apply error (apply_error), because we can avoid
> + * building a nested hash table for table sync errors in the case where
> + * there is no table sync error, which is the common case in practice.
> + *
>
> The above comment is not clear to me. Why do you need to have a
> separate hash table for table sync errors? And what makes it avoid
> building nested hash table?

In the previous patch, a subscription stats entry
(PgStat_StatSubEntry) had one hash table holding the error entries of
both the apply worker and the table sync workers. Since a subscription
can have one apply worker and multiple table sync workers, it makes
sense to me to have the subscription entry keep a hash table for them.
The reason we now have a single error entry for the apply error plus a
hash table for table sync errors is that the common case in practice is
an apply error occurring without any table sync error. With this
optimization, if the subscription has only an apply error, we can store
it in the apply_error field and avoid building a hash table for sync
errors.

Regards,

-- 
Masahiko Sawada
EDB:  https://www.enterprisedb.com/




Re: Empty string in lexeme for tsvector

2021-09-24 Thread Ranier Vilela
On Fri, Sep 24, 2021 at 09:39, Jean-Christophe Arnu wrote:

>
>
> On Fri, Sep 24, 2021 at 13:03, Ranier Vilela wrote:
>
>>
>> Comments are more than welcome!
>>>
>> 1. Would be better to add this test-and-error before tsvector_bsearch
>> call.
>>
>> + if (lex_len == 0)
>> + ereport(ERROR,
>> + (errcode(ERRCODE_ZERO_LENGTH_CHARACTER_STRING),
>> + errmsg("lexeme array may not contain empty strings")));
>> +
>>
>> If lex_len is equal to zero, better get out soon.
>>
>> 2. The second test-and-error can use lex_len, just like the first test,
>> I don't see the point in recalculating the size of lex_len if that's
>> already done.
>>
>> + if (VARSIZE(dlexemes[i]) - VARHDRSZ == 0)
>> + ereport(ERROR,
>> + (errcode(ERRCODE_ZERO_LENGTH_CHARACTER_STRING),
>> + errmsg("lexeme array may not contain empty strings")));
>> +
>>
>
> Hello Ranier,
> Thank you for your comments.
> Here's a new patch file taking your comments into account.
>
Thanks.


> I was just wondering if empty string eviction is done in the right place.
>
> As you rightfully commented, lex_len is calculated later (once again for a
> right purpose) and my code checks for empty strings as soon as possible.
> To me, it seems to be the right thing to do (prevent further processing on
> lexemes as soon as possible) but I might be overlooking something.
>
It's always good to avoid unnecessary processing.

regards,
Ranier Vilela


Re: refactoring basebackup.c

2021-09-24 Thread Jeevan Ladhe
>
> Still, there's got to be a simple way to make this work, and it can't
> involve setting autoFlush. Like, look at this:
>
> https://github.com/lz4/lz4/blob/dev/examples/frameCompress.c
>
> That uses the same APIs that we're here and a fixed-size input buffer
> and a fixed-size output buffer, just as we have here, to compress a
> file. And it probably works, because otherwise it likely wouldn't be
> in the "examples" directory. And it sets autoFlush to 0.
>

Thanks, Robert. I have seen this example, and it is similar to what we have.
I went through each of the steps and it appears that I have done it correctly.
I am still trying to debug and figure out where it is going wrong.

I am going to try hooking pg_basebackup up to the lz4 sources so that I can
debug both sides.

Regards,
Jeevan Ladhe


Re: rand48 replacement

2021-09-24 Thread Aleksander Alekseev
Hi Fabien,

> Attached a v10 which is some kind of compromise where the interface uses
> inclusive min and max bounds, so that all values can be reached.

Just wanted to let you know that cfbot [1] doesn't seem to be happy with
the patch. Apparently, some tests are failing. To be honest, I didn't
invest too much time into investigating this. Hopefully, it's not a big
deal.

[1]: http://cfbot.cputube.org/

-- 
Best regards,
Aleksander Alekseev


Re: psql - add SHOW_ALL_RESULTS option

2021-09-24 Thread Peter Eisentraut

On 22.07.21 16:58, Fabien COELHO wrote:
Ok. I noticed. The patch got significantly broken by the watch pager 
commit. I also have to enhance the added tests (per Peter's request).


I wrote a test to check psql query cancel support.  I checked that it 
fails against the patch that was reverted.  Maybe this is useful.


Here is the updated version (v8? I'm not sure what the right count is), 
which works for me and for "make check", including some tests added for 
uncovered paths.


I was looking at adding test coverage for the issue complained about in 
[0].  That message said that the autocommit logic was broken, but 
actually the issue was with the ON_ERROR_ROLLBACK logic.  However, it 
turned out that neither feature had any test coverage, and they are 
easily testable using the pg_regress setup, so I wrote tests for both 
and another little thing I found nearby.


It turns out that your v8 patch still has the issue complained about in 
[0].  The issue is that after COMMIT AND CHAIN, the internal savepoint 
is gone, but the patched psql still thinks it should be there and tries 
to release it, which leads to errors.



[0]: https://www.postgresql.org/message-id/2671235.1618154...@sss.pgh.pa.us
>From c6042975c9cb802c5e6017c5f2452cea21beac1e Mon Sep 17 00:00:00 2001
From: Peter Eisentraut 
Date: Fri, 24 Sep 2021 14:27:35 +0200
Subject: [PATCH] psql: Add various tests

Add tests for psql features

- AUTOCOMMIT
- ON_ERROR_ROLLBACK
- ECHO errors
---
 src/test/regress/expected/psql.out | 99 ++
 src/test/regress/sql/psql.sql  | 63 +++
 2 files changed, 162 insertions(+)

diff --git a/src/test/regress/expected/psql.out 
b/src/test/regress/expected/psql.out
index 1b2f6bc418..930ce8597a 100644
--- a/src/test/regress/expected/psql.out
+++ b/src/test/regress/expected/psql.out
@@ -5179,3 +5179,102 @@ List of access methods
  pg_catalog | &&   | anyarray  | anyarray   | boolean | overlaps
 (1 row)
 
+-- AUTOCOMMIT
+CREATE TABLE ac_test (a int);
+\set AUTOCOMMIT off
+INSERT INTO ac_test VALUES (1);
+COMMIT;
+SELECT * FROM ac_test;
+ a 
+---
+ 1
+(1 row)
+
+COMMIT;
+INSERT INTO ac_test VALUES (2);
+ROLLBACK;
+SELECT * FROM ac_test;
+ a 
+---
+ 1
+(1 row)
+
+COMMIT;
+BEGIN;
+INSERT INTO ac_test VALUES (3);
+COMMIT;
+SELECT * FROM ac_test;
+ a 
+---
+ 1
+ 3
+(2 rows)
+
+COMMIT;
+BEGIN;
+INSERT INTO ac_test VALUES (4);
+ROLLBACK;
+SELECT * FROM ac_test;
+ a 
+---
+ 1
+ 3
+(2 rows)
+
+COMMIT;
+\set AUTOCOMMIT on
+DROP TABLE ac_test;
+SELECT * FROM ac_test;  -- should be gone now
+ERROR:  relation "ac_test" does not exist
+LINE 1: SELECT * FROM ac_test;
+  ^
+-- ON_ERROR_ROLLBACK
+\set ON_ERROR_ROLLBACK on
+CREATE TABLE oer_test (a int);
+BEGIN;
+INSERT INTO oer_test VALUES (1);
+INSERT INTO oer_test VALUES ('foo');
+ERROR:  invalid input syntax for type integer: "foo"
+LINE 1: INSERT INTO oer_test VALUES ('foo');
+ ^
+INSERT INTO oer_test VALUES (3);
+COMMIT;
+SELECT * FROM oer_test;
+ a 
+---
+ 1
+ 3
+(2 rows)
+
+BEGIN;
+INSERT INTO oer_test VALUES (4);
+ROLLBACK;
+SELECT * FROM oer_test;
+ a 
+---
+ 1
+ 3
+(2 rows)
+
+BEGIN;
+INSERT INTO oer_test VALUES (5);
+COMMIT AND CHAIN;
+INSERT INTO oer_test VALUES (6);
+COMMIT;
+SELECT * FROM oer_test;
+ a 
+---
+ 1
+ 3
+ 5
+ 6
+(4 rows)
+
+DROP TABLE oer_test;
+\set ON_ERROR_ROLLBACK off
+-- ECHO errors
+\set ECHO errors
+ERROR:  relation "notexists" does not exist
+LINE 1: SELECT * FROM notexists;
+  ^
+STATEMENT:  SELECT * FROM notexists;
diff --git a/src/test/regress/sql/psql.sql b/src/test/regress/sql/psql.sql
index 68121d171c..e9d504baf2 100644
--- a/src/test/regress/sql/psql.sql
+++ b/src/test/regress/sql/psql.sql
@@ -1241,3 +1241,66 @@ CREATE MATERIALIZED VIEW mat_view_heap_psql USING 
heap_psql AS SELECT f1 from tb
 \dfa bit* small*
 \do - pg_catalog.int4
 \do && anyarray *
+
+-- AUTOCOMMIT
+
+CREATE TABLE ac_test (a int);
+\set AUTOCOMMIT off
+
+INSERT INTO ac_test VALUES (1);
+COMMIT;
+SELECT * FROM ac_test;
+COMMIT;
+
+INSERT INTO ac_test VALUES (2);
+ROLLBACK;
+SELECT * FROM ac_test;
+COMMIT;
+
+BEGIN;
+INSERT INTO ac_test VALUES (3);
+COMMIT;
+SELECT * FROM ac_test;
+COMMIT;
+
+BEGIN;
+INSERT INTO ac_test VALUES (4);
+ROLLBACK;
+SELECT * FROM ac_test;
+COMMIT;
+
+\set AUTOCOMMIT on
+DROP TABLE ac_test;
+SELECT * FROM ac_test;  -- should be gone now
+
+-- ON_ERROR_ROLLBACK
+
+\set ON_ERROR_ROLLBACK on
+CREATE TABLE oer_test (a int);
+
+BEGIN;
+INSERT INTO oer_test VALUES (1);
+INSERT INTO oer_test VALUES ('foo');
+INSERT INTO oer_test VALUES (3);
+COMMIT;
+SELECT * FROM oer_test;
+
+BEGIN;
+INSERT INTO oer_test VALUES (4);
+ROLLBACK;
+SELECT * FROM oer_test;
+
+BEGIN;
+INSERT INTO oer_test VALUES (5);
+COMMIT AND CHAIN;
+INSERT INTO oer_test VALUES (6);
+COMMIT;
+SELECT * FROM oer_test;
+
+DROP TABLE oer_test;
+\set ON_ERROR_ROLLBACK off
+
+-- ECHO errors
+\set ECHO errors
+SELECT * FROM notexists;
+\set ECHO none
-- 
2.33.0



Re: Empty string in lexeme for tsvector

2021-09-24 Thread Jean-Christophe Arnu
On Fri, Sep 24, 2021 at 13:03, Ranier Vilela wrote:

>
> Comments are more than welcome!
>>
> 1. Would be better to add this test-and-error before tsvector_bsearch call.
>
> + if (lex_len == 0)
> + ereport(ERROR,
> + (errcode(ERRCODE_ZERO_LENGTH_CHARACTER_STRING),
> + errmsg("lexeme array may not contain empty strings")));
> +
>
> If lex_len is equal to zero, better get out soon.
>
> 2. The second test-and-error can use lex_len, just like the first test,
> I don't see the point in recalculating the size of lex_len if that's
> already done.
>
> + if (VARSIZE(dlexemes[i]) - VARHDRSZ == 0)
> + ereport(ERROR,
> + (errcode(ERRCODE_ZERO_LENGTH_CHARACTER_STRING),
> + errmsg("lexeme array may not contain empty strings")));
> +
>

Hello Ranier,
Thank you for your comments.
Here's a new patch file taking your comments into account.

I was just wondering if empty string eviction is done in the right place.
As you rightfully commented, lex_len is calculated later (once again for a
right purpose) and my code checks for empty strings as soon as possible.
To me, it seems to be the right thing to do (prevent further processing on
lexemes as soon as possible) but I might be overlooking something.

Regards


Jean-Christophe Arnu
diff --git a/src/backend/utils/adt/tsvector_op.c b/src/backend/utils/adt/tsvector_op.c
index 9236ebcc8f..00f80ffcbc 100644
--- a/src/backend/utils/adt/tsvector_op.c
+++ b/src/backend/utils/adt/tsvector_op.c
@@ -329,6 +329,12 @@ tsvector_setweight_by_filter(PG_FUNCTION_ARGS)
 
 		lex = VARDATA(dlexemes[i]);
 		lex_len = VARSIZE(dlexemes[i]) - VARHDRSZ;
+
+		if (lex_len == 0)
+			ereport(ERROR,
+	(errcode(ERRCODE_ZERO_LENGTH_CHARACTER_STRING),
+	 errmsg("lexeme array may not contain empty strings")));
+
 		lex_pos = tsvector_bsearch(tsout, lex, lex_len);
 
 		if (lex_pos >= 0 && (j = POSDATALEN(tsout, entry + lex_pos)) != 0)
@@ -609,6 +615,12 @@ tsvector_delete_arr(PG_FUNCTION_ARGS)
 
 		lex = VARDATA(dlexemes[i]);
 		lex_len = VARSIZE(dlexemes[i]) - VARHDRSZ;
+
+		if (lex_len == 0)
+			ereport(ERROR,
+	(errcode(ERRCODE_ZERO_LENGTH_CHARACTER_STRING),
+	 errmsg("lexeme array may not contain empty strings")));
+
 		lex_pos = tsvector_bsearch(tsin, lex, lex_len);
 
 		if (lex_pos >= 0)
@@ -761,13 +773,18 @@ array_to_tsvector(PG_FUNCTION_ARGS)
 
 	deconstruct_array(v, TEXTOID, -1, false, TYPALIGN_INT, , , );
 
-	/* Reject nulls (maybe we should just ignore them, instead?) */
+	/* Reject nulls and zero length strings (maybe we should just ignore them, instead?) */
 	for (i = 0; i < nitems; i++)
 	{
 		if (nulls[i])
 			ereport(ERROR,
 	(errcode(ERRCODE_NULL_VALUE_NOT_ALLOWED),
 	 errmsg("lexeme array may not contain nulls")));
+
+		if (VARSIZE(dlexemes[i]) - VARHDRSZ == 0)
+			ereport(ERROR,
+	(errcode(ERRCODE_ZERO_LENGTH_CHARACTER_STRING),
+	 errmsg("lexeme array may not contain empty strings")));
 	}
 
 	/* Sort and de-dup, because this is required for a valid tsvector. */
diff --git a/src/test/regress/expected/tstypes.out b/src/test/regress/expected/tstypes.out
index 2601e312df..a258179ef0 100644
--- a/src/test/regress/expected/tstypes.out
+++ b/src/test/regress/expected/tstypes.out
@@ -1260,6 +1260,8 @@ SELECT ts_delete('base hidden rebel spaceship strike'::tsvector, ARRAY['spaceshi
 
 SELECT ts_delete('base hidden rebel spaceship strike'::tsvector, ARRAY['spaceship','leya','rebel', NULL]);
 ERROR:  lexeme array may not contain nulls
+SELECT ts_delete('base hidden rebel spaceship strike'::tsvector, ARRAY['spaceship','leya','rebel', '']);
+ERROR:  lexeme array may not contain empty strings
 SELECT unnest('base:7 hidden:6 rebel:1 spaceship:2,33A,34B,35C,36D strike:3'::tsvector);
unnest
 -
@@ -1330,6 +1332,8 @@ SELECT array_to_tsvector(ARRAY['base','hidden','rebel','spaceship','strike']);
 
 SELECT array_to_tsvector(ARRAY['base','hidden','rebel','spaceship', NULL]);
 ERROR:  lexeme array may not contain nulls
+SELECT array_to_tsvector(ARRAY['base','hidden','rebel','spaceship', '']);
+ERROR:  lexeme array may not contain empty strings
 -- array_to_tsvector must sort and de-dup
 SELECT array_to_tsvector(ARRAY['foo','bar','baz','bar']);
  array_to_tsvector 
@@ -1375,6 +1379,8 @@ SELECT setweight('a asd w:5,6,12B,13A zxc'::tsvector, 'c', '{a,zxc}');
 
 SELECT setweight('a asd w:5,6,12B,13A zxc'::tsvector, 'c', ARRAY['a', 'zxc', NULL]);
 ERROR:  lexeme array may not contain nulls
+SELECT setweight('a asd w:5,6,12B,13A zxc'::tsvector, 'c', ARRAY['a', 'zxc', '']);
+ERROR:  lexeme array may not contain empty strings
 SELECT ts_filter('base:7A empir:17 evil:15 first:11 galact:16 hidden:6A rebel:1A spaceship:2A strike:3A victori:12 won:9'::tsvector, '{a}');
   ts_filter  
 -
diff --git a/src/test/regress/sql/tstypes.sql b/src/test/regress/sql/tstypes.sql
index 30c8c702f0..6be0b26117 100644
--- 

Re: Bug fix for tab completion of ALTER TABLE ... VALIDATE CONSTRAINT ...

2021-09-24 Thread Aleksander Alekseev
Hi David,

> Please find attached the next revision :)

The patch didn't apply and couldn't pass cfbot [1]. The (hopefully)
corrected patch is attached. Other than that it looks OK to me but let's
see what cfbot will tell.

[1]: http://cfbot.cputube.org/patch_34_3113.log

-- 
Best regards,
Aleksander Alekseev


v5-0001-psql-Fix-tab-completion-for-ALTER-TABLE.patch
Description: Binary data


Re: very long record lines in expanded psql output

2021-09-24 Thread Andrew Dunstan


On 9/24/21 12:49 AM, Platon Pronko wrote:
> On 2021-09-23 22:28, Andrew Dunstan wrote:
>>
>> 2. It would possibly be better to pass the relevant parts of the options
>> to print_aligned_vertical_line() rather than the whole options
>> structure. It feels odd to pass both that and opt_border.
>
> What do you think about doing it the other way around - passing only the
> whole options structure? That way we will roll 4 parameters (opt_border,
> printTextFormat,
> and two xheader ones) into only one argument.
> This increases code coupling a bit, but I'm not sure if that's
> relevant here.
>

Sure, as long as it doesn't result in duplicated computation.


cheers


andrew

--
Andrew Dunstan
EDB: https://www.enterprisedb.com





Re: [Patch] ALTER SYSTEM READ ONLY

2021-09-24 Thread Amul Sul
On Thu, Sep 23, 2021 at 11:56 PM Robert Haas  wrote:
>
> On Mon, Sep 20, 2021 at 11:20 AM Amul Sul  wrote:
> > Ok, understood, I have separated my changes into 0001 and 0002 patch,
> > and the refactoring patches start from 0003.
>
> I think it would be better in the other order, with the refactoring
> patches at the beginning of the series.
>

Ok, will do that. I did it the other way around to minimize the diff, e.g.
the deletion diff of the RecoveryXlogAction enum and
DetermineRecoveryXlogAction(), etc.

> > In the 0001 patch, I have copied ArchiveRecoveryRequested to shared
> > memory as said previously. Coping ArchiveRecoveryRequested value to
> > shared memory is not really interesting, and I think somehow we should
> > reuse existing variable, (perhaps, with some modification of the
> > information it can store, e.g. adding one more enum value for
> > SharedRecoveryState or something else), thinking on the same.
> >
> > In addition to that, I tried to turn down the scope of
> > ArchiveRecoveryRequested global variable. Now, this is a static
> > variable, and the scope is limited to xlog.c file like
> > LocalXLogInsertAllowed and can be accessed through the newly added
> > function ArchiveRecoveryIsRequested() (like PromoteIsTriggered()). Let
> > me know what you think about the approach.
>
> I'm not sure yet whether I like this or not, but it doesn't seem like
> a terrible idea. You spelled UNKNOWN wrong, though, which does seem
> like a terrible idea. :-) "acccsed" is not correct either.
>
> Also, the new comments for ArchiveRecoveryRequested /
> ARCHIVE_RECOVERY_REQUEST_* are really not very clear. All you did is
> substitute the new terminology into the existing comment, but that
> means that the purpose of the new "unknown" value is not at all clear.
>

Ok, will fix those typos and try to improve the comments.

> Consider the following two patch fragments:
>
> + * SharedArchiveRecoveryRequested indicates whether an archive recovery is
> + * requested. Protected by info_lck.
> ...
> + * Remember archive recovery request in shared memory state.  A lock is not
> + * needed since we are the only ones who updating this.
>
> These two comments directly contradict each other.
>

Okay, the first comment is not clear enough; I will fix that too. I
meant that we don't need the lock now since we are the only ones updating
it at this point.

> + SpinLockAcquire(&XLogCtl->info_lck);
> + XLogCtl->SharedArchiveRecoveryRequested = false;
> + ArchiveRecoveryRequested = ARCHIVE_RECOVERY_REQUEST_UNKOWN;
> + SpinLockRelease(&XLogCtl->info_lck);
>
> This seems odd to me. In the first place, there doesn't seem to be any
> value in clearing this -- we're just expending extra CPU cycles to get
> rid of a value that wouldn't be used anyway. In the second place, if
> somehow someone checked the value after this point, with this code,
> they might get the wrong answer, whereas if you just deleted this,
> they would get the right answer.
>

Previously, this flag was only valid in the startup process. But now
it will be valid for all the processes and will stay until the whole
server gets restarted. I don't want anybody to use this flag after the
cleanup point, and to be sure of that I am explicitly clearing it.

By the way, I also don't expect that we should go with this approach.  I
proposed this by referring to the PromoteIsTriggered() implementation,
but IMO, it is better to have something else since we just want to perform
the archive cleanup operation, and most of the work related to archive
recovery is done inside StartupXLOG().

Rather than the proposed design, I was thinking of adding one or two
more RecoveryState enum values. And while skipping XLogAcceptWrites(), set
XLogCtl->SharedRecoveryState appropriately, so that we can easily
identify that archive recovery was requested previously and that we now
need to perform its pending cleanup operation. Thoughts?
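
For illustration, a rough sketch of that idea (the existing members are the
ones currently in xlog.c; the added member's name is only a placeholder, not
something from the patch):

/*
 * Sketch only: extend the existing RecoveryState enum in xlog.c with a
 * value recording that archive recovery has finished but its cleanup
 * work is still pending.  The new member's name is just a placeholder.
 */
typedef enum RecoveryState
{
	RECOVERY_STATE_CRASH = 0,	/* crash recovery */
	RECOVERY_STATE_ARCHIVE,		/* archive recovery */
	RECOVERY_STATE_ARCHIVE_CLEANUP_PENDING,	/* proposed: cleanup deferred */
	RECOVERY_STATE_DONE			/* currently in production */
} RecoveryState;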

> > In 0002 patch is a mixed one where I tried to remove the dependencies
> > on global variables and local variables belonging to StartupXLOG(). I
> > am still worried about the InRecovery value that needs to be deduced
> > afterward inside XLogAcceptWrites().  Currently, relying on
> > ControlFile->state != DB_SHUTDOWNED check but I think that will not be
> > good for ASRO where we plan to skip XLogAcceptWrites() work only and
> > let the StartupXLOG() do the rest of the work as it is where it will
> > going to update ControlFile's DBState to DB_IN_PRODUCTION, then we
> > might need some ugly kludge to call PerformRecoveryXLogAction() in
> > checkpointer irrespective of DBState, which makes me a bit
> > uncomfortable.
>
> I think that replacing the if (InRecovery) test with if
> (ControlFile->state != DB_SHUTDOWNED) is probably just plain wrong. I
> mean, there are three separate places where we set InRecovery = true.
> One of those executes if ControlFile->state != DB_SHUTDOWNED, matching
> what you have here, but it also can happen if checkPoint.redo <
> RecPtr, or if read_backup_label is true and 

Re: extensible options syntax for replication parser?

2021-09-24 Thread tushar

On 9/23/21 8:35 PM, Robert Haas wrote:

Thanks. Looks like that version had some stupid mistakes. Here's a new one.


Thanks, the reported issue seems to be fixed now for HEAD w/patch
(publication) to HEAD w/patch (subscription), but we still get the same
error if we try to replicate from v12 (publication) to HEAD
w/patch (subscription). I checked that there is no such issue from
v12 (publication) to v14 RC1 (subscription).


postgres=#  create subscription sub123s CONNECTION 'host=127.0.0.1 
user=edb  port= dbname=postgres' PUBLICATION pp with (slot_name = 
from_v14);

ERROR:  could not create replication slot "from_v14": ERROR: syntax error
postgres=#

--
regards, tushar
EnterpriseDB  https://www.enterprisedb.com/
The Enterprise PostgreSQL Company





Re: POC: Cleaning up orphaned files using undo logs

2021-09-24 Thread Antonin Houska
Amit Kapila  wrote:

> On Mon, Sep 20, 2021 at 10:24 AM Antonin Houska  wrote:
> >
> > Amit Kapila  wrote:
> >
> > > On Fri, Sep 17, 2021 at 9:50 PM Dmitry Dolgov <9erthali...@gmail.com> 
> > > wrote:
> > > >
> > > > > On Tue, Sep 14, 2021 at 10:51:42AM +0200, Antonin Houska wrote:
> > > >
> > > > * What happened with the idea of abandoning discard worker for the sake
> > > >   of simplicity? From what I see limiting everything to foreground undo
> > > >   could reduce the core of the patch series to the first four patches
> > > >   (forgetting about test and docs, but I guess it would be enough at
> > > >   least for the design review), which is already less overwhelming.
 
> > What we can miss, at least for the cleanup of the orphaned files, is the 
> > *undo
> > worker*. In this patch series the cleanup is handled by the startup process.
> >
> 
> Okay, I think various people at different point of times has suggested
> that idea. I think one thing we might need to consider is what to do
> in case of a FATAL error? In case of FATAL error, it won't be
> advisable to execute undo immediately, so would we upgrade the error
> to PANIC in such cases. I remember vaguely that for clean up of
> orphaned files that can happen rarely and someone has suggested
> upgrading the error to PANIC in such a case but I don't remember the
> exact details.

Do you mean a FATAL error during normal operation? As far as I understand, even
zheap does not rely on immediate UNDO execution (otherwise it'd never have
introduced the undo worker), so FATAL only means that the undo needs to be
applied later so it can be discarded.

As for the orphaned files cleanup feature with no undo worker, we might need
PANIC to ensure that the undo is applied during restart and that it can be
discarded; otherwise the unapplied undo log would stay there until the next
(regular) restart and it would block discarding. However, upgrading FATAL to
PANIC just because the current transaction created a table seems quite
rude. So the undo worker might be needed even for this patch?

Or do you mean a FATAL error when executing the UNDO?

-- 
Antonin Houska
Web: https://www.cybertec-postgresql.com




Re: Empty string in lexeme for tsvector

2021-09-24 Thread Ranier Vilela
On Fri, Sep 24, 2021 at 5:47 AM Jean-Christophe Arnu wrote:

> Hello Hackers,
>
> This is my second proposal for a patch, so I hope not to make "rookie"
> mistakes.
>
> This proposal patch is based on a simple use case :
>
> If one creates a table this way
> CREATE TABLE tst_table AS (SELECT
> array_to_tsvector(ARRAY['','abc','def']));
>
> the table content is :
>  array_to_tsvector
> ---
>  '' 'abc' 'def'
> (1 row)
>
> First it can be strange to have an empty string for tsvector lexeme but
> anyway, keep going to the real point.
>
> Once dumped, this table dump contains that empty string that can't be
> restored.
> tsvector_parse (./utils/adt/tsvector_parser.c) raises an error.
>
> Thus it is not possible for data to be restored this way.
>
> There are two ways to consider this : is it alright to have empty strings
> in lexemes ?
>* If so, empty strings should be correctly parsed by tsvector_parser.
>* If not, one should prevent empty strings from being stored into
> tsvectors.
>
> Since "empty strings" seems not to be a valid lexeme, I undertook to
> change some functions dealing with tsvector to check whether string
> arguments are empty. This might be the wrong path as I'm not familiar with
> tsvector usage... (OTOH, I can provide a fix patch for tsvector_parser() if
> I'm wrong).
>
> This involved changing the way functions like array_to_tsvector(),
> ts_delete() and setweight() behave. As for NULL values, empty string values
> are checked and an error is raised for such a value. It appears to me that
> ERRCODE_ZERO_LENGTH_CHARACTER_STRING (2200F) matched this behaviour but I
> may be wrong.
>
> Since this patch changes the way functions behave, consider it as a simple
> try to overcome a strange situation we've noticed on a specific use case.
>
> This included patch manages that checks for empty strings on the pointed
> out functions. It comes with modified regression tests. Patch applies along
> head/master and 14_RC1.
>
> Comments are more than welcome!
>
1. It would be better to add this test-and-error before the tsvector_bsearch call.

+ if (lex_len == 0)
+ ereport(ERROR,
+ (errcode(ERRCODE_ZERO_LENGTH_CHARACTER_STRING),
+ errmsg("lexeme array may not contain empty strings")));
+

If lex_len is equal to zero, it's better to get out early.

2. The second test-and-error can use lex_len, just like the first test;
I don't see the point in recalculating the length if it has
already been computed.

+ if (VARSIZE(dlexemes[i]) - VARHDRSZ == 0)
+ ereport(ERROR,
+ (errcode(ERRCODE_ZERO_LENGTH_CHARACTER_STRING),
+ errmsg("lexeme array may not contain empty strings")));
+
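
To illustrate point 1, a sketch of the reordering in
tsvector_setweight_by_filter (reusing the variable names from the quoted
hunk; this is only an illustration, not a tested patch):

		lex = VARDATA(dlexemes[i]);
		lex_len = VARSIZE(dlexemes[i]) - VARHDRSZ;

		/* bail out on an empty lexeme before doing the bsearch */
		if (lex_len == 0)
			ereport(ERROR,
					(errcode(ERRCODE_ZERO_LENGTH_CHARACTER_STRING),
					 errmsg("lexeme array may not contain empty strings")));

		lex_pos = tsvector_bsearch(tsout, lex, lex_len);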

regards,
Ranier Vilela


Re: Skipping logical replication transactions on subscriber side

2021-09-24 Thread Amit Kapila
On Tue, Sep 21, 2021 at 10:23 AM Masahiko Sawada  wrote:
>
> I've attached the updated version patches. Please review them.
>

Review comments for v14-0001-Add-pg_stat_subscription_errors-statistics-view
==
1.

+   command text
+  
+  
+Name of command being applied when the error occurred.  This
+field is always NULL if the error is reported by the
+tablesync worker.
+  
+ 
..
..
+ 
+  
+   xid xid
+  
+  
+Transaction ID of the publisher node being applied when the error
+occurred.  This field is always NULL if the error is reported
+by the tablesync worker.
+  

Shouldn't we display the command and transaction id even for the table sync
worker if the error occurs during the sync phase (syncing with the apply
worker position)?

2.
+ /*
+ * The number of not-ready relations can be high for example right
+ * after creating a subscription, so we load the list of
+ * SubscriptionRelState into the hash table for faster lookups.
+ */

I am not sure this optimization of converting the not-ready relations
list to a hash table is worth it. Are we expecting thousands of
relations per subscription? I think that will be a rare case even if
it happens.
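
If the list were kept as-is, the lookup would just be a linear scan,
something like this sketch (variable names are assumptions):

	ListCell   *lc;
	SubscriptionRelState *found = NULL;

	/* scan the not-ready relations list instead of building a hash table */
	foreach(lc, not_ready_rels_list)
	{
		SubscriptionRelState *relstate = (SubscriptionRelState *) lfirst(lc);

		if (relstate->relid == relid)
		{
			found = relstate;
			break;
		}
	}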

3.
+static void
+pgstat_recv_subscription_purge(PgStat_MsgSubscriptionPurge *msg, int len)
+{
+ if (subscriptionHash == NULL)
+ return;
+
+ for (int i = 0; i < msg->m_nentries; i++)
+ {
+ PgStat_StatSubEntry *subent;
+
+ subent = pgstat_get_subscription_entry(msg->m_subids[i], false);
+
+ /*
+ * Nothing to do if the subscription entry is not found.  This could
+ * happen when the subscription is dropped and the message for
+ * dropping subscription entry arrived before the message for
+ * reporting the error.
+ */
+ if (subent == NULL)

Is the above comment true even during the purge? I can think of this
during normal processing but not during the purge.

4.
+typedef struct PgStat_MsgSubscriptionErr
+{
+ PgStat_MsgHdr m_hdr;
+
+ /*
+ * m_subid and m_subrelid are used to determine the subscription and the
+ * reporter of this error.  m_subrelid is InvalidOid if reported by the
+ * apply worker, otherwise by the table sync worker.  In table sync worker
+ * case, m_subrelid must be the same as m_relid.
+ */
+ Oid m_subid;
+ Oid m_subrelid;
+
+ /* Error information */
+ Oid m_relid;

Is m_subrelid used only to distinguish the type of worker? I think
it could be InvalidOid during the syncing phase in the table sync
worker.

5.
+/*
+ * Subscription error statistics kept in the stats collector, representing
+ * an error that occurred during application of logical replication or

The part of the message " ... application of logical replication ..."
sounds a little unclear. Shall we instead write: " ... application of
logical message ..."?

6.
+typedef struct PgStat_StatSubEntry
+{
+ Oid subid; /* hash table key */
+
+ /*
+ * Statistics of errors that occurred during logical replication.  While
+ * having the hash table for table sync errors we have a separate
+ * statistics value for apply error (apply_error), because we can avoid
+ * building a nested hash table for table sync errors in the case where
+ * there is no table sync error, which is the common case in practice.
+ *

The above comment is not clear to me. Why do you need to have a
separate hash table for table sync errors? And how does that avoid
building a nested hash table?

-- 
With Regards,
Amit Kapila.




RE: Skipping logical replication transactions on subscriber side

2021-09-24 Thread houzj.f...@fujitsu.com
On Tuesday, September 21, 2021 12:53 PM Masahiko Sawada  
wrote:
> 
> I've attached the updated version patches. Please review them.

Thanks for updating the patch; here are a few comments on the v14-0001 patch.

1)
+		hash_ctl.keysize = sizeof(Oid);
+		hash_ctl.entrysize = sizeof(SubscriptionRelState);
+		not_ready_rels_htab = hash_create("not ready relations in subscription",
+										  64,
+										  &hash_ctl,
+										  HASH_ELEM | HASH_BLOBS);
+

ISTM we can pass list_length(not_ready_rels_list) as the nelem to hash_create.
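
For example, something like this (a sketch only, reusing the names from the
quoted hunk):

		hash_ctl.keysize = sizeof(Oid);
		hash_ctl.entrysize = sizeof(SubscriptionRelState);
		not_ready_rels_htab = hash_create("not ready relations in subscription",
										  list_length(not_ready_rels_list),
										  &hash_ctl,
										  HASH_ELEM | HASH_BLOBS);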

2)

+   /*
+* Search for all the dead subscriptions and error entries in stats
+* hashtable and tell the stats collector to drop them.
+*/
+   if (subscriptionHash)
+   {
...
+   HTAB   *htab;
+

It seems we already declare a "HTAB *htab;" in function pgstat_vacuum_stat();
can we use the existing htab here?


3)

PGSTAT_MTYPE_RESETREPLSLOTCOUNTER,
+   PGSTAT_MTYPE_SUBSCRIPTIONERR,
+   PGSTAT_MTYPE_SUBSCRIPTIONERRRESET,
+   PGSTAT_MTYPE_SUBSCRIPTIONERRPURGE,
+   PGSTAT_MTYPE_SUBSCRIPTIONPURGE,
PGSTAT_MTYPE_AUTOVAC_START,

Can we append these values at the end of the enum, which won't affect the
other enum values?

Best regards,
Hou zj


Empty string in lexeme for tsvector

2021-09-24 Thread Jean-Christophe Arnu
Hello Hackers,

This is my second proposal for a patch, so I hope not to make "rookie"
mistakes.

This proposal patch is based on a simple use case :

If one creates a table this way
CREATE TABLE tst_table AS (SELECT array_to_tsvector(ARRAY['','abc','def']));

the table content is :
 array_to_tsvector
---
 '' 'abc' 'def'
(1 row)

First, it can be strange to have an empty string as a tsvector lexeme, but
anyway, let's keep going to the real point.

Once dumped, the table dump contains that empty string, which can't be
restored: tsvector_parse() (./utils/adt/tsvector_parser.c) raises an error.

Thus it is not possible for data to be restored this way.

There are two ways to consider this: is it alright to have empty strings
in lexemes?
   * If so, empty strings should be correctly parsed by tsvector_parser.
   * If not, one should prevent empty strings from being stored into
tsvectors.

Since "empty strings" seems not to be a valid lexeme, I undertook to change
some functions dealing with tsvector to check whether string arguments are
empty. This might be the wrong path as I'm not familiar with tsvector
usage... (OTOH, I can provide a fix patch for tsvector_parser() if I'm
wrong).

This involved changing the way functions like array_to_tsvector(),
ts_delete() and setweight() behave. As with NULL values, empty string values
are checked and an error is raised for such a value. It appears to me that
ERRCODE_ZERO_LENGTH_CHARACTER_STRING (2200F) matches this behaviour, but I
may be wrong.

Since this patch changes the way these functions behave, consider it a simple
attempt to overcome a strange situation we've noticed in a specific use case.

The included patch adds checks for empty strings in the functions pointed
out above. It comes with modified regression tests. The patch applies on
head/master and 14_RC1.

Comments are more than welcome!
Thank you,

-- 
Jean-Christophe Arnu
diff --git a/src/backend/utils/adt/tsvector_op.c b/src/backend/utils/adt/tsvector_op.c
index 9236ebcc8f..5a33e1bb10 100644
--- a/src/backend/utils/adt/tsvector_op.c
+++ b/src/backend/utils/adt/tsvector_op.c
@@ -331,6 +331,11 @@ tsvector_setweight_by_filter(PG_FUNCTION_ARGS)
 		lex_len = VARSIZE(dlexemes[i]) - VARHDRSZ;
 		lex_pos = tsvector_bsearch(tsout, lex, lex_len);
 
+		if (lex_len == 0)
+			ereport(ERROR,
+	(errcode(ERRCODE_ZERO_LENGTH_CHARACTER_STRING),
+	 errmsg("lexeme array may not contain empty strings")));
+
 		if (lex_pos >= 0 && (j = POSDATALEN(tsout, entry + lex_pos)) != 0)
 		{
 			WordEntryPos *p = POSDATAPTR(tsout, entry + lex_pos);
@@ -607,6 +612,11 @@ tsvector_delete_arr(PG_FUNCTION_ARGS)
 	(errcode(ERRCODE_NULL_VALUE_NOT_ALLOWED),
 	 errmsg("lexeme array may not contain nulls")));
 
+		if (VARSIZE(dlexemes[i]) - VARHDRSZ == 0)
+			ereport(ERROR,
+	(errcode(ERRCODE_ZERO_LENGTH_CHARACTER_STRING),
+	 errmsg("lexeme array may not contain empty strings")));
+
 		lex = VARDATA(dlexemes[i]);
 		lex_len = VARSIZE(dlexemes[i]) - VARHDRSZ;
 		lex_pos = tsvector_bsearch(tsin, lex, lex_len);
@@ -761,13 +771,18 @@ array_to_tsvector(PG_FUNCTION_ARGS)
 
 	deconstruct_array(v, TEXTOID, -1, false, TYPALIGN_INT, , , );
 
-	/* Reject nulls (maybe we should just ignore them, instead?) */
+	/* Reject nulls and zero length strings (maybe we should just ignore them, instead?) */
 	for (i = 0; i < nitems; i++)
 	{
 		if (nulls[i])
 			ereport(ERROR,
 	(errcode(ERRCODE_NULL_VALUE_NOT_ALLOWED),
 	 errmsg("lexeme array may not contain nulls")));
+
+		if (VARSIZE(dlexemes[i]) - VARHDRSZ == 0)
+			ereport(ERROR,
+	(errcode(ERRCODE_ZERO_LENGTH_CHARACTER_STRING),
+	 errmsg("lexeme array may not contain empty strings")));
 	}
 
 	/* Sort and de-dup, because this is required for a valid tsvector. */
diff --git a/src/test/regress/expected/tstypes.out b/src/test/regress/expected/tstypes.out
index 2601e312df..a258179ef0 100644
--- a/src/test/regress/expected/tstypes.out
+++ b/src/test/regress/expected/tstypes.out
@@ -1260,6 +1260,8 @@ SELECT ts_delete('base hidden rebel spaceship strike'::tsvector, ARRAY['spaceshi
 
 SELECT ts_delete('base hidden rebel spaceship strike'::tsvector, ARRAY['spaceship','leya','rebel', NULL]);
 ERROR:  lexeme array may not contain nulls
+SELECT ts_delete('base hidden rebel spaceship strike'::tsvector, ARRAY['spaceship','leya','rebel', '']);
+ERROR:  lexeme array may not contain empty strings
 SELECT unnest('base:7 hidden:6 rebel:1 spaceship:2,33A,34B,35C,36D strike:3'::tsvector);
unnest
 -
@@ -1330,6 +1332,8 @@ SELECT array_to_tsvector(ARRAY['base','hidden','rebel','spaceship','strike']);
 
 SELECT array_to_tsvector(ARRAY['base','hidden','rebel','spaceship', NULL]);
 ERROR:  lexeme array may not contain nulls
+SELECT array_to_tsvector(ARRAY['base','hidden','rebel','spaceship', '']);
+ERROR:  lexeme array may not contain empty strings
 -- array_to_tsvector must sort and de-dup
 SELECT 

Re: row filtering for logical replication

2021-09-24 Thread Amit Kapila
On Fri, Sep 24, 2021 at 12:19 PM Dilip Kumar  wrote:
>
> On Fri, Sep 24, 2021 at 12:04 PM Amit Kapila  wrote:
> >
> > One possibility in this regard could be that we enhance Replica
> > Identity .. Include (column_list) where all the columns in the include
> > list won't be sent
>
> Instead of RI's include column list why we can not think of
> row_filter's columns list?  I mean like we log the old RI column can't
> we make similar things for the row filter columns?  With that, we
> don't have to all the columns instead we only log the columns which
> are in row filter, or is this too hard to identify during write
> operation?
>

Yeah, we can do that as well, but my guess is that it will require some
additional work (to find the common columns and log them only once) in
heap_delete/update and then probably during decoding (to assemble the
required filter and RI key). I am not very sure on this point; one has
to write the code and test it.

-- 
With Regards,
Amit Kapila.




Re: error_severity of brin work item

2021-09-24 Thread Jaime Casanova
On Wed, Mar 10, 2021 at 08:24:50AM -0500, David Steele wrote:
> On 12/1/20 5:25 PM, Justin Pryzby wrote:
> > On Tue, Dec 01, 2020 at 03:57:24PM -0300, Alvaro Herrera wrote:
> > 
> > > > Another idea is if perform_work_item() were responsible for discarding
> > > > relations which disappear.  Currently it does this, which is racy since 
> > > > it
> > > > holds no lock.
> > > 
> > > That has the property that it remains contained in autovacuum.c, but no
> > > other advantages I think.
> > 
> > It has the advantage that it moves all the try_open stuff out of brin.
> > 
> > I started implementing this, and then realized that the try_open stuff 
> > *has* to
> > be in the brin_summarize function, to handle the case that someone passes a
> > non-index, since it's SQL exposed.
> > So maybe we should use your LockOid patch now, and refactor in the future 
> > if we
> > add additional work-item types.
> 
> Thoughts on this, Álvaro? I can see that the first version of this patch was
> not ideal but the rework seems to have stalled. Since it is a bug perhaps it
> would be better to get something in as Justin suggests?
> 

Hi Álvaro,

Do you plan to work on this for this CF?

-- 
Jaime Casanova
Director de Servicios Profesionales
SystemGuards - Consultores de PostgreSQL




Re: BUG #16583: merge join on tables with different DB collation behind postgres_fdw fails

2021-09-24 Thread Etsuro Fujita
On Fri, Sep 10, 2021 at 8:42 PM Etsuro Fujita  wrote:
> On Fri, Sep 10, 2021 at 1:00 AM Tom Lane  wrote:
> > Etsuro Fujita  writes:
> > > Having said that, I think another option for this would be to left the
> > > code as-is; assume that 1) the foreign var has "COLLATE default”, not
> > > an unknown collation, when labeled with "COLLATE default”, and 2)
> > > "COLLATE default” on the local database matches "COLLATE default” on
> > > the remote database.
> >
> > The fundamental complaint that started this thread was exactly that
> > assumption (2) isn't safe.  So it sounds to me like you're proposing
> > that we do nothing, which isn't a great answer either.  I suppose
> > we could try documenting our way out of this, but people will
> > continue to get bit because they won't read or won't understand
> > the limitation.
>
> Yeah, but I think it’s the user’s responsibility to make sure that the
> local and remote default collations match if labeling collatable
> columns with “COLLATE default” when defining foreign tables manually
> IMO.

One thing I noticed is that collatable operators/functions sent to the
remote might also cause an unexpected result when the default
collations are not compatible.  Consider this example (even with your
patch):

explain verbose select chr(c1) from ft1 order by chr(c1);
   QUERY PLAN

 Foreign Scan on public.ft1  (cost=100.00..212.91 rows=2925 width=32)
   Output: chr(c1)
   Remote SQL: SELECT c1 FROM public.t1 ORDER BY chr(c1) ASC NULLS LAST
(3 rows)

where ft1 is a foreign table with an integer column c1.  As shown
above, the sort using the collatable function chr() is performed
remotely, so the select query might produce the result in an
unexpected sort order when the default collations are not compatible.

ISTM that we rely heavily on assumption (2).

Best regards,
Etsuro Fujita




Re: Skipping logical replication transactions on subscriber side

2021-09-24 Thread Greg Nancarrow
On Tue, Sep 21, 2021 at 2:53 PM Masahiko Sawada  wrote:
>
> I've attached the updated version patches. Please review them.
>

A few review comments for the v14-0002 patch:

(1)
I suggest a small update to the patch comment:

BEFORE:
ALTER SUBSCRIPTION ... RESET command resets subscription
parameters. The parameters that can be set are streaming, binary,
synchronous_commit.

AFTER:
ALTER SUBSCRIPTION ... RESET command resets subscription
parameters to their default value. The parameters that can be reset
are streaming, binary, and synchronous_commit.

(2)
In the documentation, the RESETable parameters should be listed in the
same way and order as for SET:

BEFORE:
+ 
+   The parameters that can be reset are: streaming,
+   binary, synchronous_commit.
+ 
AFTER:
+ 
+   The parameters that can be reset are
synchronous_commit,
+   binary, and streaming.
+  

Also, I'm thinking it would be beneficial to say the following before this:

RESET is used to set parameters back to their default value.

(3)
I notice that if you try to reset the slot_name, you get the following message:
postgres=# alter subscription sub reset (slot_name);
ERROR:  unrecognized subscription parameter: "slot_name"

This is a bit misleading, because "slot_name" actually IS a
subscription parameter, just not resettable.
It would be better in this case if it said something like:
ERROR: not a resettable subscription parameter: "slot_name"

However, it seems that this is also an existing issue with SET (e.g.
for "refresh" or "two_phase"):
postgres=# alter subscription sub set (refresh=true);
ERROR:  unrecognized subscription parameter: "refresh"


Regards,
Greg Nancarrow
Fujitsu Australia




Re: Use simplehash.h instead of dynahash in SMgr

2021-09-24 Thread Jaime Casanova
On Tue, Jun 22, 2021 at 02:15:26AM +1200, David Rowley wrote:
[...]
> 
> I've come up with a new hash table implementation that I've called
> generichash.   It works similarly to simplehash in regards to the

Hi David,

Are you planning to work on this in this CF?
This is marked as "Ready for committer" but it doesn't apply anymore.

-- 
Jaime Casanova
Director de Servicios Profesionales
SystemGuards - Consultores de PostgreSQL




pgcrypto support for bcrypt $2b$ hashes

2021-09-24 Thread Daniel Fone
Hello,

I've recently been working with a database containing bcrypt hashes generated 
by a 3rd-party which use the $2b$ prefix. This prefix was introduced in 2014 
and has since been recognised by a number of bcrypt implementations. 
[1][2][3][4]

At the moment, pgcrypto’s `crypt` doesn’t recognise this prefix. However, 
simply `replace`ing the prefix with $2a$ allows crypt to validate the hashes. 
This patch simply adds recognition for the prefix and treats the hash 
identically to the $2a$ hashes.

Is this a reasonable change to pgcrypto? I note that the last upstream change 
brought into crypt-blowfish.c was in 2011, predating this prefix. [5] Are there 
deeper concerns or other upstream changes that need to be addressed alongside 
this? Is there a better approach to this? 

At the moment, the $2x$ variant is supported but not mentioned in the docs, so 
I haven’t included any documentation updates.

Thanks,

Daniel


[1]: https://marc.info/?l=openbsd-misc=139320023202696
[2]: https://www.openwall.com/lists/announce/2014/08/31/1
[3]: 
https://github.com/kelektiv/node.bcrypt.js/pull/549/files#diff-c55280c5e4da52b0f86244d3b95c5ae0abf2fcd121a071dba1363540875b32bc
[4]: 
https://github.com/bcrypt-ruby/bcrypt-ruby/commit/d19ea481618420922b533a8b0ed049109404cb13
[5]: 
https://github.com/postgres/postgres/commit/ca59dfa6f727fe3bf3a01904ec30e87f7fa5a67e

diff --git a/contrib/pgcrypto/crypt-blowfish.c 
b/contrib/pgcrypto/crypt-blowfish.c
index a663852c..9012ca43 100644
--- a/contrib/pgcrypto/crypt-blowfish.c
+++ b/contrib/pgcrypto/crypt-blowfish.c
@@ -618,7 +618,7 @@ _crypt_blowfish_rn(const char *key, const char *setting,
 
if (setting[0] != '$' ||
setting[1] != '2' ||
-   (setting[2] != 'a' && setting[2] != 'x') ||
+   (setting[2] != 'a' && setting[2] != 'b' && setting[2] != 'x') ||
setting[3] != '$' ||
setting[4] < '0' || setting[4] > '3' ||
setting[5] < '0' || setting[5] > '9' ||
diff --git a/contrib/pgcrypto/expected/crypt-blowfish.out 
b/contrib/pgcrypto/expected/crypt-blowfish.out
index d79b0c04..3d6b215c 100644
--- a/contrib/pgcrypto/expected/crypt-blowfish.out
+++ b/contrib/pgcrypto/expected/crypt-blowfish.out
@@ -13,6 +13,13 @@ SELECT crypt('foox', '$2a$06$RQiOJ.3ELirrXwxIZY8q0O');
  $2a$06$RQiOJ.3ELirrXwxIZY8q0OR3CVJrAfda1z26CCHPnB6mmVZD8p0/C
 (1 row)
 
+-- support $2b$ hashes
+SELECT crypt('password', 
'$2b$06$cqhNY4qXkVeKliLWOxCFieDY6/J4SmCEojsXfHs7iF4X1r3tuNepW');
+crypt 
+--
+ $2b$06$cqhNY4qXkVeKliLWOxCFieDY6/J4SmCEojsXfHs7iF4X1r3tuNepW
+(1 row)
+
 -- error, salt too short:
 SELECT crypt('foox', '$2a$');
 ERROR:  invalid salt
diff --git a/contrib/pgcrypto/px-crypt.c b/contrib/pgcrypto/px-crypt.c
index 0913ff2c..9272d409 100644
--- a/contrib/pgcrypto/px-crypt.c
+++ b/contrib/pgcrypto/px-crypt.c
@@ -78,6 +78,7 @@ struct px_crypt_algo
 static const struct px_crypt_algo
px_crypt_list[] = {
{"$2a$", 4, run_crypt_bf},
+   {"$2b$", 4, run_crypt_bf},
{"$2x$", 4, run_crypt_bf},
{"$2$", 3, NULL},   /* N/A */
{"$1$", 3, run_crypt_md5},
diff --git a/contrib/pgcrypto/sql/crypt-blowfish.sql 
b/contrib/pgcrypto/sql/crypt-blowfish.sql
index 3b5a681c..0e96ee0c 100644
--- a/contrib/pgcrypto/sql/crypt-blowfish.sql
+++ b/contrib/pgcrypto/sql/crypt-blowfish.sql
@@ -6,6 +6,9 @@ SELECT crypt('', '$2a$06$RQiOJ.3ELirrXwxIZY8q0O');
 
 SELECT crypt('foox', '$2a$06$RQiOJ.3ELirrXwxIZY8q0O');
 
+-- support $2b$ hashes
+SELECT crypt('password', 
'$2b$06$cqhNY4qXkVeKliLWOxCFieDY6/J4SmCEojsXfHs7iF4X1r3tuNepW');
+
 -- error, salt too short:
 SELECT crypt('foox', '$2a$');
 


RE: [BUG] Logical replication failure "ERROR: could not map filenode "base/13237/442428" to relation OID" with catalog modifying txns

2021-09-24 Thread osumi.takami...@fujitsu.com
Hi


On Tuesday, March 16, 2021 1:35 AM Oh, Mike  wrote:
> We noticed that the logical replication could fail when the
> Standby::RUNNING_XACT record is generated in the middle of a catalog
> modifying transaction and if the logical decoding has to restart from the
> RUNNING_XACT
> WAL entry.
...
> Proposed solution:
> If we’re decoding a catalog modifying commit record, then check whether
> it’s part of the RUNNING_XACT xid’s processed @ the restart_lsn. If so,
> then add its xid & subxacts in the committed txns list in the logical decoding
> snapshot.
> 
> Please refer to the attachment for the proposed patch.


Let me share some review comments for the patch.

(1) last_running declaration

Isn't it better to add static for this variable,
because we don't use it in other places?

@@ -85,6 +86,9 @@ static bool DecodeTXNNeedSkip(LogicalDecodingContext *ctx,
  XLogRecordBuffer *buf, Oid dbId,
  RepOriginId origin_id);

+/* record previous restart_lsn running xacts */
+xl_running_xacts *last_running = NULL;


(2) DecodeStandbyOp's memory free

I'm not sure when we would pass this condition with last_running already
allocated, but do you need to free its xid array here as well, if
last_running isn't null? Otherwise, we'll miss the chance after this.

+	/* record restart_lsn running xacts */
+	if (MyReplicationSlot && (buf->origptr == MyReplicationSlot->data.restart_lsn))
+	{
+		if (last_running)
+			free(last_running);
+
+		last_running = NULL;

(3) suggestion of a small readability improvement

We calculate the same size twice, here and in DecodeCommit.
I suggest you declare a variable that stores the computed size,
which might shorten this code.

+			/*
+			 * xl_running_xacts contains a xids Flexible Array
+			 * and its size is subxcnt + xcnt.
+			 * Take that into account while allocating
+			 * the memory for last_running.
+			 */
+			last_running = (xl_running_xacts *) malloc(sizeof(xl_running_xacts)
+													   + sizeof(TransactionId)
+													   * (running->subxcnt + running->xcnt));
+			memcpy(last_running, running, sizeof(xl_running_xacts)
+				   + (sizeof(TransactionId)
+					  * (running->subxcnt + running->xcnt)));
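
A sketch of what that could look like, using the same variables as the
quoted hunk:

			Size		sz;

			/* compute the size (header plus xids flexible array) only once */
			sz = sizeof(xl_running_xacts) +
				sizeof(TransactionId) * (running->subxcnt + running->xcnt);

			last_running = (xl_running_xacts *) malloc(sz);
			memcpy(last_running, running, sz);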

Best Regards,
Takamichi Osumi



Re: Gather performance analysis

2021-09-24 Thread Dilip Kumar
On Fri, Sep 24, 2021 at 1:30 AM Tomas Vondra
 wrote:
>
> On 9/23/21 9:31 PM, Robert Haas wrote:
> > On Wed, Sep 8, 2021 at 2:06 AM Dilip Kumar  wrote:
> >> But I am attaching both the patches in case you want to play around.
> >
> > I don't really see any reason not to commit 0001. Perhaps some very
> > minor grammatical nitpicking is in order here, but apart from that I
> > can't really see anything to criticize with this approach. It seems
> > safe enough, it's not invasive in any way that matters, and we have
> > benchmark results showing that it works well. If someone comes up with
> > something even better, no harm done, we can always change it again.
> >
> > Objections?
>
> Yeah, it seems like a fairly clear win, according to the benchmarks.
>
> I did find some suspicious behavior on the bigger box I have available
> (with 2x xeon e5-2620v3), see the attached spreadsheet. But it seems
> pretty weird because the worst affected case is with no parallel workers
> (so the queue changes should affect it). Not sure how to explain it, but
> the behavior seems consistent.
>
> Anyway, the other machine with a single CPU seems perfectly clean.

Tomas, can you share your test script? I would like to repeat the same
test in my environment and with different batching sizes.

-- 
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com




Re: a comment in joinrel.c: compute_partition_bounds()

2021-09-24 Thread Etsuro Fujita
Hi Amit-san,

On Fri, Sep 24, 2021 at 3:34 PM Amit Langote  wrote:
> I think there's a word missing in the following comment:
>
> /*
>  * See if the partition bounds for inputs are exactly the same, in
>  * which case we don't need to work hard: the join rel have the same
>  * partition bounds as inputs, and the partitions with the same
>  * cardinal positions form the pairs.
>
> ": the join rel have the same..." seems to be missing a "will".
>
> Attached a patch to fix.

Good catch!  Will fix.

Thanks!

Best regards,
Etsuro Fujita




Re: row filtering for logical replication

2021-09-24 Thread Dilip Kumar
On Fri, Sep 24, 2021 at 12:04 PM Amit Kapila  wrote:
>
> One possibility in this regard could be that we enhance Replica
> Identity .. Include (column_list) where all the columns in the include
> list won't be sent

Instead of the RI's include column list, why can we not think of the
row filter's column list?  I mean, like we log the old RI columns, can't
we do something similar for the row filter columns?  With that, we
don't have to log all the columns; instead we only log the columns which
are in the row filter. Or is this too hard to identify during the write
operation?  So now the WAL logging requirements for the RI and the row
filter are orthogonal, and if some columns are common then we can log
them only once?

-- 
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com




Re: row filtering for logical replication

2021-09-24 Thread Amit Kapila
On Fri, Sep 24, 2021 at 11:52 AM Amit Kapila  wrote:
>
> On Fri, Sep 24, 2021 at 11:06 AM Dilip Kumar  wrote:
> >
> > On Fri, Sep 24, 2021 at 10:50 AM Amit Kapila  
> > wrote:
> >
> > > > 12) misuse of REPLICA IDENTITY
> > > >
> > > > The more I think about this, the more I think we're actually misusing
> > > > REPLICA IDENTITY for something entirely different. The whole purpose of
> > > > RI was to provide a row identifier for the subscriber.
> > > >
> > > > But now we're using it to ensure we have all the necessary columns,
> > > > which is entirely orthogonal to the original purpose. I predict this
> > > > will have rather negative consequences.
> > > >
> > > > People will either switch everything to REPLICA IDENTITY FULL, or create
> > > > bogus unique indexes with extra columns. Which is really silly, because
> > > > it wastes network bandwidth (transfers more data) or local resources
> > > > (CPU and disk space to maintain extra indexes).
> > > >
> > > > IMHO this needs more infrastructure to request extra columns to decode
> > > > (e.g. for the filter expression), and then remove them before sending
> > > > the data to the subscriber.
> > > >
> > >
> > > Yeah, but that would have an additional load on write operations and I
> > > am not sure at this stage but maybe there could be other ways to
> > > extend the current infrastructure wherein we build the snapshots using
> > > which we can access the user tables instead of only catalog tables.
> > > Such enhancements if feasible would be useful not only for allowing
> > > additional column access in row filters but for other purposes like
> > > allowing access to functions that access user tables. I feel we can
> > > extend this later as well seeing the usage and requests. For the first
> > > version, this doesn't sound too limiting to me.
> >
> > I agree with one point from Tomas, that if we bind the row filter with
> > the RI, then if the user has to use the row filter on any column 1)
> > they have to add an unnecessary column to the index 2) Since they have
> > to add it to RI so now we will have to send it over the network as
> > well.  3). We anyway have to WAL log it if it is modified because now
> > we forced users to add some columns to RI because they wanted to use
> > the row filter on that.   Now suppose we remove that limitation and we
> > somehow make these changes orthogonal to RI, i.e. if we have a row
> > filter on some column then we WAL log it, so now the only extra cost
> > we are paying is to just WAL log that column, but the user is not
> > forced to add it to index, not forced to send it over the network.
> >
>
> I am not suggesting adding additional columns to RI just for using
> filter expressions. If most users that intend to publish delete/update
> wanted to use filter conditions apart from replica identity then we
> can later extend this functionality but not sure if the only way to
> accomplish that is to log additional data in WAL.
>

One possibility in this regard could be that we enhance Replica
Identity .. Include (column_list), where all the columns in the include
list won't be sent, but I think it is better to postpone such
enhancements to a later version. Like I suggested above, we might
want to extend our infrastructure in a way where not only can this extra
columns request be accomplished, but we can also allow
UDFs (where user tables can be accessed) and probably sub-queries as
well.

-- 
With Regards,
Amit Kapila.




a comment in joinrel.c: compute_partition_bounds()

2021-09-24 Thread Amit Langote
Hi,

I think there's a word missing in the following comment:

/*
 * See if the partition bounds for inputs are exactly the same, in
 * which case we don't need to work hard: the join rel have the same
 * partition bounds as inputs, and the partitions with the same
 * cardinal positions form the pairs.

": the join rel have the same..." seems to be missing a "will".

Attached a patch to fix.

-- 
Amit Langote
EDB: http://www.enterprisedb.com


comment-missing-word.patch
Description: Binary data


Re: Hook for extensible parsing.

2021-09-24 Thread Julien Rouhaud
On Thu, Sep 23, 2021 at 10:21:20AM -0400, Tom Lane wrote:
> 
> I do have sympathy for the idea that extensions would like to define
> their own statement types.  I just don't see a practical way to do it
> in our existing parser infrastructure.  This patch certainly doesn't
> offer that.

Allowing extensions to define their own (utility) statement type is just a
matter of allowing ExtensibleNode as a top-level statement.  As far as I can
see, the only change required for that is to give those a specific command tag
in CreateCommandTag(), since transformStmt() defaults to emitting a utility
command.  You can then easily intercept such a statement in the utility hook
and fetch your custom struct.
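
For instance, a rough sketch of such an interception inside a
ProcessUtility_hook implementation (the hook boilerplate is elided, pstmt is
the hook's PlannedStmt argument, and the extension node name is a made-up
example):

	Node	   *parsetree = pstmt->utilityStmt;

	if (parsetree != NULL && IsA(parsetree, ExtensibleNode))
	{
		ExtensibleNode *ext = (ExtensibleNode *) parsetree;

		/* "MyCustomStmt" is only an illustrative extension node name */
		if (strcmp(ext->extnodename, "MyCustomStmt") == 0)
		{
			/* handle the custom statement and return without calling
			 * standard_ProcessUtility() */
			return;
		}
	}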

I could do that, but I'm assuming that you still wouldn't be satisfied, as a
custom parser would still be needed, which may or may not require copying and
pasting chunks of the core grammar?

If so, do you have any suggestion for an approach you would accept?




Re: row filtering for logical replication

2021-09-24 Thread Amit Kapila
On Fri, Sep 24, 2021 at 11:06 AM Dilip Kumar  wrote:
>
> On Fri, Sep 24, 2021 at 10:50 AM Amit Kapila  wrote:
>
> > > 12) misuse of REPLICA IDENTITY
> > >
> > > The more I think about this, the more I think we're actually misusing
> > > REPLICA IDENTITY for something entirely different. The whole purpose of
> > > RI was to provide a row identifier for the subscriber.
> > >
> > > But now we're using it to ensure we have all the necessary columns,
> > > which is entirely orthogonal to the original purpose. I predict this
> > > will have rather negative consequences.
> > >
> > > People will either switch everything to REPLICA IDENTITY FULL, or create
> > > bogus unique indexes with extra columns. Which is really silly, because
> > > it wastes network bandwidth (transfers more data) or local resources
> > > (CPU and disk space to maintain extra indexes).
> > >
> > > IMHO this needs more infrastructure to request extra columns to decode
> > > (e.g. for the filter expression), and then remove them before sending
> > > the data to the subscriber.
> > >
> >
> > Yeah, but that would have an additional load on write operations and I
> > am not sure at this stage but maybe there could be other ways to
> > extend the current infrastructure wherein we build the snapshots using
> > which we can access the user tables instead of only catalog tables.
> > Such enhancements if feasible would be useful not only for allowing
> > additional column access in row filters but for other purposes like
> > allowing access to functions that access user tables. I feel we can
> > extend this later as well seeing the usage and requests. For the first
> > version, this doesn't sound too limiting to me.
>
> I agree with one point from Tomas, that if we bind the row filter with
> the RI, then if the user has to use the row filter on any column 1)
> they have to add an unnecessary column to the index 2) Since they have
> to add it to RI so now we will have to send it over the network as
> well.  3). We anyway have to WAL log it if it is modified because now
> we forced users to add some columns to RI because they wanted to use
> the row filter on that.   Now suppose we remove that limitation and we
> somehow make these changes orthogonal to RI, i.e. if we have a row
> filter on some column then we WAL log it, so now the only extra cost
> we are paying is to just WAL log that column, but the user is not
> forced to add it to index, not forced to send it over the network.
>

I am not suggesting adding additional columns to RI just for using
filter expressions. If most users that intend to publish delete/update
wanted to use filter conditions apart from replica identity then we
can later extend this functionality but not sure if the only way to
accomplish that is to log additional data in WAL. I am just trying to
see if we can provide meaningful functionality without extending too
much the scope of this work.

-- 
With Regards,
Amit Kapila.




Re: Added schema level support for publication.

2021-09-24 Thread vignesh C
On Thu, Sep 23, 2021 at 12:32 PM houzj.f...@fujitsu.com
 wrote:
>
> On Thur, Sep 23, 2021 11:06 AM Greg Nancarrow  wrote:
> > On Wed, Sep 22, 2021 at 9:33 PM houzj.f...@fujitsu.com 
> >  wrote:
> > >
> > > >
> > > > How do you suggest changing it?
> > >
> > > Personally, I think we'd better move the code about changing
> > > publication's tablelist into AlterPublicationTables and the code about
> > > changing publication's schemalist into AlterPublicationSchemas. It's
> > > similar to what the v29-patchset did, the difference is the SET
> > > action, I suggest we drop all the tables in function
> > > AlterPublicationTables when user only set schemas and drop all the
> > > schema in AlterPublicationSchemas when user only set tables. In this
> > > approach, we can keep schema and relation code separate and don't need to
> > worry about the locking order.
> > >
> > > Attach a top-up patch which refactor the code like above.
> > > Thoughts ?
> > >
> >
> > Sounds like a good idea.
> > Is it possible to incorporate the existing
> > CheckPublicationAlterTables() and CheckPublicationAlterSchemas() functions
> > into your suggested update?
> > I think it might tidy up the error-checking a bit.
>
> I agreed we can put the check about ALL TABLE and superuser into a function
> like what the v30-patchset did. But I have some hesitations about the code
> related to CheckObjSchemaNotAlreadyInPublication(). Currently, we need to open
> and lock the table before invoking the CheckObjxxx function, ISTM we'd better
> open the table in function AlterPublicationTables. Maybe we can wait for the
> author's(Vignesh) opinion.

I felt keeping the code related to
CheckObjSchemaNotAlreadyInPublication as it is in
AlterPublicationTables and AlterPublicationSchemas is better.

Regards,
Vignesh




Re: Added schema level support for publication.

2021-09-24 Thread Masahiko Sawada
On Thu, Sep 23, 2021 at 1:56 PM Amit Kapila  wrote:
>
> On Wed, Sep 22, 2021 at 8:52 AM Masahiko Sawada  wrote:
> >
> > On Wed, Sep 22, 2021 at 3:02 AM vignesh C  wrote:
> > >
> > ---
> > This patch introduces some new static functions to publicationcmds.c
> > but some have function prototypes but some don't. As far as I checked,
> >
> > ObjectsInPublicationToOids()
> > CheckObjSchemaNotAlreadyInPublication()
> > GetAlterPublicationDelRelations()
> > AlterPublicationSchemas()
> > CheckPublicationAlterTables()
> > CheckPublicationAlterSchemas()
> > LockSchemaList()
> > OpenReliIdList()
> > PublicationAddSchemas()
> > PublicationDropSchemas()
> >
> > are newly introduced but only four functions:
> >
> > OpenReliIdList()
> > LockSchemaList()
> > PublicationAddSchemas()
> > PublicationDropSchemas()
> >
> > have function prototypes. Actually, there already are functions that
> > don't have their prototype in publicationcmds.c. But it seems better
> > to clarify what kind of functions don't need to have a prototype at
> > least in this file.
> >
>
> I think if the function is defined after its use then we declare it at
> the top. Do you prefer to declare all static functions to allow ease
> of usage? Do you have something else in mind?

I prefer to declare all static functions: if a function already has a
prototype, a future (unrelated) commit that adds a new caller defined
before that function doesn't need to also add the prototype. But it seems to
me that the policy varies per file. For instance, all functions in
vacuumlazy.c have their function prototype but functions in
publicationcmds.c seem not to. I'm not going to insist on that, so please
ignore this comment.

>
> >
> > ---
> > +   if ((action == DEFELEM_ADD || action == DEFELEM_SET) && 
> > !superuser())
> > +   ereport(ERROR,
> > +   (errcode(ERRCODE_INSUFFICIENT_PRIVILEGE),
> > +errmsg("must be superuser to add or
> > set schemas")));
> >
> > Why do we require the superuser privilege only for ADD and SET but not for 
> > DROP?
> >
>
> For Add/Set of for all tables of Schema is similar to all tables
> publication requirement. For Drop, I don't think it is mandatory to
> allow only to superuser. The same applies to Alter Publication ...
> Drop table case where you don't need to be table owner whereas, for
> Add, you need to be. We had a discussion on these points in this
> thread. See [1] and some emails prior to it.

Thank you for sharing the link. That makes sense.

Regards,

-- 
Masahiko Sawada
EDB:  https://www.enterprisedb.com/




Re: row filtering for logical replication

2021-09-24 Thread Amit Kapila
On Thu, Sep 23, 2021 at 6:03 PM Tomas Vondra
 wrote:
>
> 13) turning update into insert
>
> I agree with Ajin Cherian [4] that looking at just old or new row for
> updates is not the right solution, because each option will "break" the
> replica in some case. So I think the goal "keeping the replica in sync"
> is the right perspective, and converting the update to insert/delete if
> needed seems appropriate.
>
> This seems a somewhat similar to what pglogical does, because that may
> also convert updates (although only to inserts, IIRC) when handling
> replication conflicts. The difference is pglogical does all this on the
> subscriber, while this makes the decision on the publisher.
>
> I wonder if this might have some negative consequences, or whether
> "moving" this to downstream would be useful for other purposes in the
> fuure (e.g. it might be reused for handling other conflicts).
>

Apart from additional traffic, I am not sure how will we handle all
the conditions on subscribers, say if the new row doesn't match, how
will subscribers know about this unless we pass row_filter or some
additional information along with tuple. Previously, I have done some
research and shared in one of the emails above that IBM's InfoSphere
Data Replication [1] performs filtering in this way which also
suggests that we won't be off here.

>
>
> 15) pgoutput_row_filter initializing filter
>
> I'm not sure I understand why the filter initialization gets moved from
> get_rel_sync_entry. Presumably, most of what the replication does is
> replicating rows, so I see little point in not initializing this along
> with the rest of the rel_sync_entry.
>

Sorry, IIRC, this has been suggested by me and I thought it was best
to do any expensive computation the first time it is required. I have
shared few cases like in [2] where it would lead to additional cost
without any gain. Unless I am missing something, I don't see any
downside of doing it in a delayed fashion.

[1] - https://www.ibm.com/docs/en/idr/11.4.0?topic=rows-search-conditions
[2] - 
https://www.postgresql.org/message-id/CAA4eK1JBHo2U2sZemFdJmcwEinByiJVii8wzGCDVMxOLYB3CUw%40mail.gmail.com

-- 
With Regards,
Amit Kapila.




Re: Added schema level support for publication.

2021-09-24 Thread vignesh C
On Thu, Sep 23, 2021 at 12:22 PM houzj.f...@fujitsu.com
 wrote:
>
> From Thurs, Sep 23, 2021 12:09 PM Amit Kapila  wrote:
> > On Wed, Sep 22, 2021 at 5:03 PM Hou Zhijie  wrote:
> > >
> > > Personally, I think we'd better move the code about changing publication's
> > > tablelist into AlterPublicationTables and the code about changing
> > publication's
> > > schemalist into AlterPublicationSchemas. It's similar to what the
> > v29-patchset
> > > did, the difference is the SET action, I suggest we drop all the tables in
> > > function AlterPublicationTables when user only set schemas and drop all 
> > > the
> > > schema in AlterPublicationSchemas when user only set tables. In this
> > approach,
> > > we can keep schema and relation code separate and don't need to worry
> > > about the locking order.
> > >
> > > Attach a top-up patch which refactor the code like above.
> > >
> >
> > Good suggestion. I think it would still be better if we can move the
> > checks related to superuser and puballtables into a separate function
> > that gets called before taking a lock on publication.
>
> I agreed.
>
> I noticed v30-0001 has been committed with some minor changes, and the 
> V30-0002
> patchset need to be rebased accordingly. Attach a rebased version patch set to
> make cfbot happy. Also Attach the two top-up patches which refactor the code 
> as
> suggested. (top-up patch 1 is to keep schema and table code separate, top-up
> patch 2 is to move some cheap check into a function and invoke it before
> locking.)

Thanks for the patches; the changes simplify the AlterPublication code
and handle the dropped objects in a better way. I have merged them into
the 0001 patch at [1].
[1] - 
https://www.postgresql.org/message-id/CALDaNm1R-xbQvz4LU5OXu3KKwbWOz3uDcT_YjRU6V0R5FZDYDg%40mail.gmail.com

Regards,
Vignesh